Hands-On Machine Learning with a Practical Approach

In this article we are going to predict whether a passenger survived, and for this we'll be using the Titanic survival dataset.

This project covers only some basic parts of machine learning, and this blog is still incomplete, as I have not yet explained the functions used, such as how and where to use accuracy_score(), train_test_split(), etc. Please feel free to follow and comment here if you want the dataset file.

Importing all the necessary libraries

import numpy as np
import pandas as pd

Importing the dataset

dataset = pd.read_csv('file_path')

Analyzing the first 5 entries
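
The first rows can be previewed with head(), which returns five rows by default. Since the real file path is not given here, the sketch below uses a tiny made-up frame (the name sample and all its values are hypothetical) with a few Titanic-style columns.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; values are made up for illustration
sample = pd.DataFrame({
    'PassengerId': [1, 2, 3, 4, 5, 6],
    'Survived': [0, 1, 1, 1, 0, 0],
    'Pclass': [3, 1, 3, 1, 3, 3],
    'Sex': ['male', 'female', 'female', 'female', 'male', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0, None],
})

# head() with no argument returns the first 5 rows
first_rows = sample.head()
print(first_rows)
```

With the real data loaded above, the equivalent call would be dataset.head().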


Dropping all the unnecessary columns

As we can see, many columns in this dataset are not meaningful for our prediction, and since we are keeping this model beginner-friendly, we'll drop a larger number of columns than strictly necessary.

dataset = dataset.drop(['PassengerId', 'Name', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'SibSp', 'Parch'], axis=1)

Analyze the data again to check whether the columns were dropped
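
One way to run this check is to look at the column names that remain after the drop. The sketch below does so on a small made-up frame (sample and its values are hypothetical), dropping only three columns to keep it short.

```python
import pandas as pd

# Hypothetical mini dataset; values are made up for illustration
sample = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Survived': [0, 1, 1],
    'Pclass': [3, 1, 3],
    'Name': ['A', 'B', 'C'],
    'Sex': ['male', 'female', 'female'],
    'Age': [22.0, 38.0, 26.0],
    'Fare': [7.25, 71.28, 7.92],
})

# drop() returns a new frame, so the result has to be assigned back
sample = sample.drop(['PassengerId', 'Name', 'Fare'], axis=1)

remaining = list(sample.columns)
print(remaining)
```

With the real data, dataset.head() or dataset.columns would show the same information.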


Checking the shape of the dataset


This tells us how many rows and columns are present in our dataset.
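
The shape attribute is a (rows, columns) tuple. A minimal sketch on a made-up frame (sample is a hypothetical name, the values are invented):

```python
import pandas as pd

# Hypothetical mini dataset: 3 rows, 4 columns
sample = pd.DataFrame({
    'Survived': [0, 1, 1],
    'Pclass': [3, 1, 3],
    'Sex': ['male', 'female', 'female'],
    'Age': [22.0, 38.0, 26.0],
})

# shape is an attribute, not a method, so there are no parentheses
rows, cols = sample.shape
print(rows, cols)
```

With the real data, the equivalent would be dataset.shape.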

Label Encoding

Our 'Sex' column contains string values, i.e. male or female, so we have to convert them into numeric values: male = 0 and female = 1.

dataset['Sex'] = dataset['Sex'].replace({'male' : 0 , 'female' : 1})
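
As an alternative sketch (not the method used above), the same encoding can be written with map(). The series sex below is made up for illustration. One difference worth knowing: map() turns any value missing from the dictionary into NaN, while replace() leaves it unchanged.

```python
import pandas as pd

# Made-up values for illustration only
sex = pd.Series(['male', 'female', 'female', 'male'])

# map() looks each string up in the dictionary and returns the matching number
encoded = sex.map({'male': 0, 'female': 1})
print(encoded.tolist())
```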

Checking how many missing values are present


We can see here that the 'Age' column contains many missing values.
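
A count like this can be obtained with isnull().sum(), which reports the number of missing entries per column. The sketch below uses a made-up frame (sample is hypothetical, with two ages deliberately left empty).

```python
import numpy as np
import pandas as pd

# Hypothetical mini dataset with two missing ages
sample = pd.DataFrame({
    'Survived': [0, 1, 1, 0],
    'Sex': [0, 1, 1, 0],
    'Age': [22.0, np.nan, 26.0, np.nan],
})

# isnull() marks each missing cell as True; sum() counts the True values per column
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

With the real data, the equivalent would be dataset.isnull().sum().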

Handling Missing Values

We will fill those missing values with the mean of the values that are present in the column.

dataset['Age'] = dataset['Age'].fillna(dataset['Age'].mean())

Checking again whether the missing values have been handled
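
A small worked example of mean filling followed by the re-check, using a made-up series (ages is hypothetical). The mean ignores missing entries, so here it is (20 + 40) / 2 = 30.

```python
import numpy as np
import pandas as pd

# Made-up ages with one missing entry
ages = pd.Series([20.0, np.nan, 40.0])

# mean() skips NaN by default, so the fill value is (20 + 40) / 2 = 30
filled = ages.fillna(ages.mean())

# After filling, the count of missing values should be zero
remaining_missing = int(filled.isnull().sum())
print(filled.tolist(), remaining_missing)
```

With the real data, running dataset.isnull().sum() again should report 0 for 'Age'.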


Splitting Features and Target

Here X = features (all the columns except 'Survived') and Y = target.

X = dataset.drop(['Survived'], axis=1)
Y = dataset['Survived']

Splitting the dataset for Training and Testing parts

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
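
To see what train_test_split() does, here is a sketch on ten made-up rows (features, target and the f_/t_ names are hypothetical). With test_size=0.2, 20% of the rows go to the test set and the rest to the training set, and random_state fixes the shuffle so the split is the same on every run.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten made-up rows for illustration
features = pd.DataFrame({
    'Pclass': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    'Sex':    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})
target = pd.Series([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# 20% of 10 rows = 2 test rows, leaving 8 training rows
f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.2, random_state=2
)
print(f_train.shape, f_test.shape)
```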

Training the model using DecisionTreeClassifier

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=4)
model.fit(X_train, Y_train)  # the tree must be fitted before it can predict
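
A decision tree has to be fitted with fit() before predict() can be called on it. The sketch below shows that on a toy problem where the label simply equals the single feature; toy_X, toy_y and toy_model are hypothetical names, and random_state=0 is an added assumption to keep the result reproducible.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy data: the label is identical to the 'Sex' feature
toy_X = pd.DataFrame({'Sex': [0, 0, 1, 1]})
toy_y = [0, 0, 1, 1]

toy_model = DecisionTreeClassifier(max_depth=4, random_state=0)
toy_model.fit(toy_X, toy_y)  # learn the split from the toy data

# Predict for two new rows
toy_pred = toy_model.predict(pd.DataFrame({'Sex': [1, 0]}))
print(toy_pred.tolist())
```

Here max_depth limits how many levels of questions the tree may ask; a smaller value gives a simpler tree.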

Evaluating the model for the Training data

from sklearn.metrics import accuracy_score

train_prediction = model.predict(X_train)
train_accuracy = accuracy_score(Y_train, train_prediction)  # accuracy_score expects (y_true, y_pred)

print("Training data accuracy = ", train_accuracy)

Evaluating the model for the Testing data

test_prediction = model.predict(X_test)
test_accuracy = accuracy_score(Y_test, test_prediction)  # accuracy_score expects (y_true, y_pred)

print("Testing data accuracy = ", test_accuracy)
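
To make the metric concrete: accuracy_score() returns the fraction of predictions that match the true labels. A small worked example with made-up labels (y_true and y_pred are hypothetical), where 3 of 4 predictions are correct:

```python
from sklearn.metrics import accuracy_score

# Made-up labels for illustration
y_true = [1, 0, 1, 1]   # what actually happened
y_pred = [1, 0, 0, 1]   # what the model predicted (third one is wrong)

# 3 correct out of 4 gives 3 / 4 = 0.75
acc = accuracy_score(y_true, y_pred)
print(acc)
```

When comparing the two numbers printed above, a training accuracy much higher than the testing accuracy usually suggests the model is overfitting.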