Hands on Machine Learning with a Practical Approach

In this article we are going to predict whether a passenger survived, using the Titanic survival dataset.


This project covers only some basic parts of Machine Learning, and this blog is still incomplete: I have not yet explained the functions used, such as how and where to use accuracy_score(), train_test_split(), etc. Please feel free to follow and comment here if you want the dataset file.
Your valuable suggestions are cordially welcomed and greatly appreciated.

-Bhaaavre

Importing all the necessary libraries

import numpy as np
import pandas as pd

Importing the dataset

dataset = pd.read_csv('file_path')

Analyzing the first 5 entries

dataset.head()
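To give a feel for what pd.read_csv() and head() do, here is a minimal plain-Python sketch using the standard csv module. The sample rows are made up for illustration and only mimic a few Titanic columns; the helper names read_rows and head are hypothetical. Unlike pandas, this sketch keeps every value as a string (a blank Age stays ''), whereas read_csv infers numeric types and turns blanks into NaN.

```python
import csv
import io

# Hypothetical sample in the Titanic CSV layout (values are made up).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,Passenger A,male,22
2,1,1,Passenger B,female,38
3,1,3,Passenger C,female,
4,1,1,Passenger D,female,35
5,0,3,Passenger E,male,35
6,0,3,Passenger F,male,
"""

def read_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))

def head(rows, n=5):
    """Return the first n rows, like DataFrame.head() with its default n=5."""
    return rows[:n]

rows = read_rows(SAMPLE_CSV)
for row in head(rows):
    print(row)
```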

Dropping all the unnecessary columns

Many columns in this dataset are not useful for our prediction, and since we want to keep this model beginner-friendly, we'll drop them and work with as few columns as possible.

dataset = dataset.drop(['PassengerId', 'Name', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'SibSp', 'Parch'], axis=1)
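The drop() call with axis=1 removes columns and returns a new DataFrame, which is why the result is assigned back to dataset. A rough plain-Python sketch of the same idea, assuming rows stored as dicts (the rows and the drop_columns helper are hypothetical), looks like this. It also mirrors pandas' default behaviour of raising a KeyError when asked to drop a column that does not exist.

```python
# Hypothetical rows; column names follow the Titanic CSV, values are made up.
rows = [
    {"PassengerId": 1, "Survived": 0, "Pclass": 3, "Name": "A", "Sex": "male", "Age": 22.0, "Fare": 7.25},
    {"PassengerId": 2, "Survived": 1, "Pclass": 1, "Name": "B", "Sex": "female", "Age": 38.0, "Fare": 71.28},
]

def drop_columns(rows, columns):
    """Return new rows without the given keys; the input rows are left unchanged."""
    unwanted = set(columns)
    missing = unwanted - set(rows[0]) if rows else set()
    if missing:
        # pandas also raises KeyError for unknown labels by default (errors='raise').
        raise KeyError(f"columns not found: {sorted(missing)}")
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

trimmed = drop_columns(rows, ["PassengerId", "Name", "Fare"])
print(trimmed[0])  # {'Survived': 0, 'Pclass': 3, 'Sex': 'male', 'Age': 22.0}
```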

Analyze the data again to check whether the columns have been dropped.

dataset.head()

Checking the shape of the dataset

dataset.shape

This tells us how many rows and columns are present in our dataset, as a (rows, columns) tuple.
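Note that shape is an attribute, not a method, so it is written without parentheses. A minimal sketch of what it reports, assuming the data is held as a list of equal-length dict rows (the shape helper and the rows are hypothetical), could be:

```python
def shape(rows):
    """Return (n_rows, n_columns) for a list of equal-length dict rows, like DataFrame.shape."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    return (n_rows, n_cols)

# Hypothetical rows after dropping columns: Survived, Pclass, Sex, Age.
rows = [
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": 22.0},
    {"Survived": 1, "Pclass": 1, "Sex": "female", "Age": 38.0},
    {"Survived": 1, "Pclass": 3, "Sex": "female", "Age": None},
]
print(shape(rows))  # (3, 4)
```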

Label Encoding

Our 'Sex' column contains string values ('male' or 'female'), so we have to convert them into numeric values: male = 0 and female = 1.

dataset['Sex'] = dataset['Sex'].replace({'male': 0, 'female': 1})
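One thing to watch with replace(): values that are not in the mapping (for example a stray 'Male' with a capital M) are left unchanged, so the column could silently stay partly text. The plain-Python sketch below applies the same mapping but fails loudly on unexpected values; the encode_sex helper is hypothetical. In pandas, Series.map() with the same dict behaves differently from replace(): unmapped values become NaN, which the missing-value check would then reveal.

```python
SEX_CODES = {"male": 0, "female": 1}  # same mapping as the replace() call above

def encode_sex(values):
    """Map 'male'/'female' to 0/1 and raise on anything else."""
    encoded = []
    for v in values:
        if v not in SEX_CODES:
            raise ValueError(f"unexpected Sex value: {v!r}")
        encoded.append(SEX_CODES[v])
    return encoded

print(encode_sex(["male", "female", "female"]))  # [0, 1, 1]
```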

Checking how many missing values are present

dataset.isnull().sum()

We can see that the 'Age' column contains many missing values.
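isnull() returns a table of True/False values, and sum() adds them up per column (True counts as 1), giving the number of missing entries in each column. A plain-Python sketch of that computation, assuming missing values are stored as None or float NaN (the helpers and rows are hypothetical), might look like this:

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing, roughly like pandas.isnull()."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def missing_counts(rows):
    """Count missing values per column, like DataFrame.isnull().sum().

    Assumes every row has the same keys as the first row.
    """
    counts = {col: 0 for col in rows[0]} if rows else {}
    for row in rows:
        for col, value in row.items():
            if is_missing(value):
                counts[col] += 1  # True/1 for each missing cell
    return counts

rows = [
    {"Survived": 0, "Pclass": 3, "Sex": 0, "Age": 22.0},
    {"Survived": 1, "Pclass": 1, "Sex": 1, "Age": float("nan")},
    {"Survived": 1, "Pclass": 3, "Sex": 1, "Age": None},
]
print(missing_counts(rows))  # {'Survived': 0, 'Pclass': 0, 'Sex': 0, 'Age': 2}
```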

Handling Missing Values

We'll fill these missing values with the mean of the values that are present in the 'Age' column.

dataset['Age'] = dataset['Age'].fillna(dataset['Age'].mean())
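In pandas, mean() skips NaN by default, so the fill value is the average of the ages that are actually present. A minimal plain-Python sketch of mean imputation, assuming missing ages are stored as None (the fill_with_mean helper and the ages are hypothetical), is:

```python
def fill_with_mean(values):
    """Replace None entries with the mean of the non-missing entries, like fillna(mean())."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot compute a mean: every value is missing")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

ages = [22.0, None, 38.0, None, 27.0]
print(fill_with_mean(ages))  # [22.0, 29.0, 38.0, 29.0, 27.0]
```

The mean is easy to understand but is pulled by extreme ages; the median (dataset['Age'].median()) is a common alternative when a column has outliers.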

Checking again whether the missing values have been handled

dataset.isnull().sum()

Splitting Features and Target

Here X = features (all the columns except 'Survived') and Y = target ('Survived').

X = dataset.drop(['Survived'], axis=1)
Y = dataset['Survived']
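Separating features from the target simply means setting the answer column aside so the model cannot see it as an input. A plain-Python sketch over dict rows (the split_features_target helper and the rows are hypothetical) shows the same split:

```python
def split_features_target(rows, target="Survived"):
    """Separate each row into a feature dict (X) and a target value (Y)."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    Y = [row[target] for row in rows]
    return X, Y

rows = [
    {"Survived": 0, "Pclass": 3, "Sex": 0, "Age": 22.0},
    {"Survived": 1, "Pclass": 1, "Sex": 1, "Age": 38.0},
]
X, Y = split_features_target(rows)
print(X)  # [{'Pclass': 3, 'Sex': 0, 'Age': 22.0}, {'Pclass': 1, 'Sex': 1, 'Age': 38.0}]
print(Y)  # [0, 1]
```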

Splitting the dataset into Training and Testing parts

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
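train_test_split() shuffles the rows and holds out a fraction of them (here test_size=0.2, i.e. 20%) for testing; random_state fixes the shuffle so the split is the same on every run. The sketch below illustrates that idea in plain Python. The simple_train_test_split helper is hypothetical and does not reproduce scikit-learn's exact shuffling, so it will not select the same rows as the real function; it does keep the same return order (X_train, X_test, Y_train, Y_test) and rounds the test size up, as scikit-learn does for a fractional test_size.

```python
import math
import random

def simple_train_test_split(X, Y, test_size=0.2, random_state=None):
    """Shuffle row indices, then hold out ceil(test_size * n) rows for testing."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    n = len(X)
    n_test = math.ceil(test_size * n)
    indices = list(range(n))
    random.Random(random_state).shuffle(indices)  # seeded shuffle gives a repeatable split
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [Y[i] for i in train_idx], [Y[i] for i in test_idx])

X = [[i] for i in range(10)]    # ten made-up one-feature rows
Y = [i % 2 for i in range(10)]  # made-up labels
X_train, X_test, Y_train, Y_test = simple_train_test_split(X, Y, test_size=0.2, random_state=2)
print(len(X_train), len(X_test))  # 8 2
```

Because the same index list is used for X and Y, each feature row stays paired with its own label after shuffling.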

Training the model using DecisionTreeClassifier

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=4)
model.fit(X_train, Y_train)
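A decision tree learns by repeatedly choosing the feature and threshold whose split makes the resulting groups as "pure" as possible; max_depth=4 stops it after four levels of such questions, which limits how closely it can memorise the training data. DecisionTreeClassifier measures purity with Gini impurity by default (criterion='gini'). The sketch below computes Gini impurity for one candidate split in plain Python; the helpers and the toy data are hypothetical, and the real tree additionally searches over all features and thresholds at every node.

```python
def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(feature, labels, threshold):
    """Weighted Gini impurity of the two children from the test feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Made-up toy data: Sex encoded as 0 = male, 1 = female; label = Survived.
sex = [0, 0, 0, 1, 1, 1]
survived = [0, 0, 1, 1, 1, 0]
print(gini(survived))  # 0.5 (three 0s and three 1s)
print(split_impurity(sex, survived, 0.5))  # lower than 0.5, so this split helps
```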

Evaluating the model on the Training data

from sklearn.metrics import accuracy_score

train_prediction = model.predict(X_train)
train_accuracy = accuracy_score(Y_train, train_prediction)  # accuracy_score(y_true, y_pred)

print("Training data accuracy = ", train_accuracy)
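Accuracy is simply the fraction of predictions that match the true labels. scikit-learn's signature is accuracy_score(y_true, y_pred); for plain accuracy the order does not change the result, but following it keeps the code consistent with other metrics where order matters. A plain-Python sketch (the accuracy helper is hypothetical) of the same calculation:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75 (3 of 4 correct)
```

Comparing the training accuracy with the testing accuracy is the useful part: a training score far above the testing score suggests the model has memorised the training data rather than learned a general pattern.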

Evaluating the model on the Testing data

test_prediction = model.predict(X_test)
test_accuracy = accuracy_score(Y_test, test_prediction)  # accuracy_score(y_true, y_pred)

print("Testing data accuracy = ", test_accuracy)