0% found this document useful (0 votes)

96 views

Import Numpy As NP Import Pandas As PD

This document discusses building a random forest classifier model to predict bank customer defaults. It loads banking customer data, preprocesses columns from string to numeric, splits data into train and test sets, builds a random forest classifier with grid search to tune hyperparameters, and evaluates the model's performance on both training and test sets.

Uploaded by

Shripad H

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

96 views

Import Numpy As NP Import Pandas As PD

Uploaded by

Shripad H

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 7

import numpy as np

import pandas as pd
In [2]:
#Only execute this cell if the directory in which your dataset is different
from the directory that you are running the
#Jupyter Notebook

#import os
#os.chdir('C:\\Shripad\\Personal\\DataScience\\DSBA\\Curricumulum\\4 Data
Mining\\3 Random Forest')
In [2]:
from sklearn.ensemble import RandomForestClassifier
In [3]:
bank_df = pd.read_csv("Banking Dataset.csv")
In [4]:
bank_df.head(10)
Out[4]:
Cust_I Targ Ag Gende Occupati No_OF_CR_TX AGE_BK SC Holding_Peri
Balance
D et e r on NS T R od

160378.
0 C1 0 30 M SAL 2 26-30 826 9
60

84370.5 SELF-
1 C10 1 41 M 14 41-45 843 9
9 EMP

60849.2
2 C100 0 49 F PROF 49 46-50 328 26
6

10558.8
3 C1000 0 49 M SAL 23 46-50 619 19
1

C1000 97100.4
4 0 43 M SENP 3 41-45 397 8
0 8

C1000 160378.
5 0 30 M SAL 2 26-30 781 11
1 60

C1000 26275.5
6 0 43 M PROF 23 41-45 354 12
2 5

7 C1000 0 53 M 33616.4 SAL 45 >50 239 5

Cust_I Targ Ag Gende Occupati No_OF_CR_TX AGE_BK SC Holding_Peri
Balance
D et e r on NS T R od

3 7

C1000
8 0 45 M 1881.37 PROF 3 41-45 339 13
4

C1000
9 0 37 M 3274.37 PROF 33 36-40 535 9
5

In [5]:
bank_df.shape
Out[5]:
(20000, 10)
In [6]:
bank_df.info() # many columns are of type object i.e. strings. These need to
be converted to ordinal type
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Cust_ID 20000 non-null object
1 Target 20000 non-null int64
2 Age 20000 non-null int64
3 Gender 20000 non-null object
4 Balance 20000 non-null float64
5 Occupation 20000 non-null object
6 No_OF_CR_TXNS 20000 non-null int64
7 AGE_BKT 20000 non-null object
8 SCR 20000 non-null int64
9 Holding_Period 20000 non-null int64
dtypes: float64(1), int64(5), object(4)
memory usage: 1.5+ MB
In [33]:
## For RandomForestClassifier, none of the data type need to be Object, but
everything should be integers
In [7]:
# Decision tree in Python can take only numerical / categorical colums. It
cannot take string / object types.
# The following code loops through each column and checks if the column type
is object then converts those columns
# into categorical with each distinct value becoming a category or code.

for feature in bank_df.columns:

if bank_df[feature].dtype == 'object':
bank_df[feature] = pd.Categorical(bank_df[feature]).codes
In [8]:
bank_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Cust_ID 20000 non-null int16
1 Target 20000 non-null int64
2 Age 20000 non-null int64
3 Gender 20000 non-null int8
4 Balance 20000 non-null float64
5 Occupation 20000 non-null int8
6 No_OF_CR_TXNS 20000 non-null int64
7 AGE_BKT 20000 non-null int8
8 SCR 20000 non-null int64
9 Holding_Period 20000 non-null int64
dtypes: float64(1), int16(1), int64(5), int8(3)
memory usage: 1.0 MB
In [9]:
# capture the target column ("default") into separate vectors for training
set and test set

X = bank_df.drop(["Target","Cust_ID"] , axis=1)

y = bank_df.pop("Target")
In [10]:
# splitting data into training and test set for independent attributes
# X_train = independent variable for Train, X_test = independent variable for
Test,
# train_labels = dependent varliable for Train, test_labels = dependent
variable for Test
from sklearn.model_selection import train_test_split

X_train, X_test, train_labels, test_labels = train_test_split(X, y,

test_size=.30, random_state=1)

Ensemble RandomForest Classifier

In [22]:
rfcl = RandomForestClassifier(n_estimators = 501,
oob_score=True,
max_depth=10,
max_features=5,
min_samples_leaf = 50,
min_samples_split = 110,
)
In [23]:
# n_estimators = 501 i.e.number of trees that want to build within Random
Forest classifier
#rfcl = RandomForestClassifier(n_estimators = 501, oob_score=True,
max_depth=10, max_features=3, min_samples_leaf: 50)
# rfcl = RandomForestClassifier(n_estimators = 501, oob_score=True,)
rfcl = rfcl.fit(X_train, train_labels)
In [24]:
#out of bag (oob) score, by default False means oob is not stored in the
random forest classifier
rfcl.oob_score
Out[24]:
True
In [25]:
rfcl.oob_score_
Out[25]:
0.9155714285714286
In [26]:
#max_features = out of total 8 independent features, at random 4 variables
are chosen for split
#min_samples_split approx. 3 times min_samples_leaf

from sklearn.model_selection import GridSearchCV

param_grid = {
'max_depth': [7, 10],
'max_features': [4, 6],
'min_samples_leaf': [50, 100],
'min_samples_split': [150, 300],
'n_estimators': [301, 501]
}
In [27]:
rfcl = RandomForestClassifier()
In [28]:
# cv = cross validation, value of 3 i.e. Number of combination is 3.
# Random forest model will be created with first as 7, 4, 50, 150 and 301 and
split data into 3 (fold) parts
grid_search = GridSearchCV(estimator = rfcl, param_grid = param_grid, cv = 3)
In [29]:
grid_search.fit(X_train, train_labels)
Out[29]:
GridSearchCV(cv=3, estimator=RandomForestClassifier(),
param_grid={'max_depth': [7, 10], 'max_features': [4, 6],
'min_samples_leaf': [50, 100],
'min_samples_split': [150, 300],
'n_estimators': [301, 501]})
In [30]:
grid_search.best_params_
Out[30]:
{'max_depth': 7,
'max_features': 6,
'min_samples_leaf': 50,
'min_samples_split': 150,
'n_estimators': 501}
In [ ]:
best_grid = grid_search.best_estimator_
In [ ]:
ytrain_predict = best_grid.predict_proba(X_train)
ytest_predict = best_grid.predict_proba(X_test)
In [ ]:
ytrain_predict = best_grid.predict(X_train)
ytest_predict = best_grid.predict(X_test)
In [29]:
from sklearn.metrics import confusion_matrix,classification_report
In [30]:
confusion_matrix(train_labels,ytrain_predict)
Out[30]:
array([[12754, 28],
[ 1152, 66]], dtype=int64)
In [31]:
confusion_matrix(test_labels,ytest_predict)
Out[31]:
array([[5475, 10],
[ 490, 25]], dtype=int64)
In [32]:
print(classification_report(train_labels,ytrain_predict))
precision recall f1-score support

0 0.92 1.00 0.96 12782

1 0.70 0.05 0.10 1218

accuracy 0.92 14000

macro avg 0.81 0.53 0.53 14000
weighted avg 0.90 0.92 0.88 14000

In [33]:
print(classification_report(test_labels,ytest_predict))
precision recall f1-score support

0 0.92 1.00 0.96 5485

1 0.71 0.05 0.09 515

accuracy 0.92 6000

macro avg 0.82 0.52 0.52 6000
weighted avg 0.90 0.92 0.88 6000

In [34]:
import matplotlib.pyplot as plt
In [35]:
# AUC and ROC for the training data
# predict probabilities
probs = best_grid.predict_proba(X_train)
# keep probabilities for the positive outcome only
probs = probs[:, 1]
# calculate AUC
from sklearn.metrics import roc_auc_score
auc = roc_auc_score(train_labels, probs)
print('AUC: %.3f' % auc)
# calculate roc curve
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(train_labels, probs)
plt.plot([0, 1], [0, 1], linestyle='--')
# plot the roc curve for the model
plt.plot(fpr, tpr, marker='.')
# show the plot
plt.show()
AUC: 0.844

In [36]:
# AUC and ROC for the test data

# predict probabilities
probs = best_grid.predict_proba(X_test)
# keep probabilities for the positive outcome only
probs = probs[:, 1]
# calculate AUC
from sklearn.metrics import roc_auc_score
auc = roc_auc_score(test_labels, probs)
print('AUC: %.3f' % auc)
# calculate roc curve
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(test_labels, probs)
plt.plot([0, 1], [0, 1], linestyle='--')
# plot the roc curve for the model
plt.plot(fpr, tpr, marker='.')
# show the plot
plt.show()
AUC: 0.777

Model Evaluation and Selection Cheatsheet 1708023215
No ratings yet
Model Evaluation and Selection Cheatsheet 1708023215
7 pages
Diabetes Case Study - Jupyter Notebook
100% (1)
Diabetes Case Study - Jupyter Notebook
10 pages
How Many Virtual CPUs Can A Single M7 Processor Support Considering That Each Thread Can Be A Virtual CPU
No ratings yet
How Many Virtual CPUs Can A Single M7 Processor Support Considering That Each Thread Can Be A Virtual CPU
127 pages
Sample Discharge Summary Template
100% (1)
Sample Discharge Summary Template
3 pages
5) Randomforest - Ipynb - Colaboratory
No ratings yet
5) Randomforest - Ipynb - Colaboratory
12 pages
Decision Tree, Random Forest
No ratings yet
Decision Tree, Random Forest
37 pages
FB Models PDF
No ratings yet
FB Models PDF
14 pages
RandomForest
No ratings yet
RandomForest
8 pages
23BCE7092_ML_Lab_Assignment[1]
No ratings yet
23BCE7092_ML_Lab_Assignment[1]
14 pages
RANDOM_FOREST__1737667979
No ratings yet
RANDOM_FOREST__1737667979
11 pages
AAM 6th Prac
No ratings yet
AAM 6th Prac
3 pages
Random Forest: The Algorithm in A Nutshell
No ratings yet
Random Forest: The Algorithm in A Nutshell
10 pages
DWDM Lab 3
No ratings yet
DWDM Lab 3
10 pages
05 E RandomForest LoanData
No ratings yet
05 E RandomForest LoanData
8 pages
Random Forest: Implementaciones de Scikit-Learn Sobre QSAR
100% (1)
Random Forest: Implementaciones de Scikit-Learn Sobre QSAR
11 pages
Random Forest
No ratings yet
Random Forest
11 pages
10 Random - Forest - Algo
No ratings yet
10 Random - Forest - Algo
6 pages
Slip
No ratings yet
Slip
5 pages
AML_code_for_m2
No ratings yet
AML_code_for_m2
7 pages
Hyperparameter Tuning
No ratings yet
Hyperparameter Tuning
9 pages
8 To 12 Jaimeen
No ratings yet
8 To 12 Jaimeen
34 pages
Recsify Technologies Assignment
No ratings yet
Recsify Technologies Assignment
10 pages
PCA2-1
No ratings yet
PCA2-1
26 pages
Rev Insurance Business Report
No ratings yet
Rev Insurance Business Report
4 pages
DT RF
No ratings yet
DT RF
7 pages
A) What Is Motivation Behind Ensemble Methods? Give Your Answer in Probabilistic Terms
100% (1)
A) What Is Motivation Behind Ensemble Methods? Give Your Answer in Probabilistic Terms
6 pages
AML_lab[1] (1)
No ratings yet
AML_lab[1] (1)
14 pages
Hands-On Activity 3.3 Random Forest Mantaring - Ipynb - Mantaring
No ratings yet
Hands-On Activity 3.3 Random Forest Mantaring - Ipynb - Mantaring
13 pages
Random Forest
No ratings yet
Random Forest
3 pages
lab3
No ratings yet
lab3
6 pages
Decision tree
No ratings yet
Decision tree
3 pages
Experiment 2 FDL - Jupyter Notebook
No ratings yet
Experiment 2 FDL - Jupyter Notebook
2 pages
St. John College of Engineering and Management, Palghar - Maharashtra
No ratings yet
St. John College of Engineering and Management, Palghar - Maharashtra
11 pages
Decision Tree Algorithm in Machine Learning
No ratings yet
Decision Tree Algorithm in Machine Learning
13 pages
Know Your Dataset: Season Holiday Weekday Workingday CNT 726 727 728 729 730
No ratings yet
Know Your Dataset: Season Holiday Weekday Workingday CNT 726 727 728 729 730
1 page
Linearregression SVM
No ratings yet
Linearregression SVM
3 pages
Decision Tree
No ratings yet
Decision Tree
9 pages
NF Assighment4
No ratings yet
NF Assighment4
5 pages
MlLabManualdocx 2024 09 04 22 02 58
No ratings yet
MlLabManualdocx 2024 09 04 22 02 58
19 pages
Modelling and Simulation Sample Model 4
No ratings yet
Modelling and Simulation Sample Model 4
3 pages
ml lab programs 2
No ratings yet
ml lab programs 2
16 pages
23BCE7199 ML Lab Assignment[1]
No ratings yet
23BCE7199 ML Lab Assignment[1]
15 pages
Supple Maximizing Performance in Cs CuBiCl
No ratings yet
Supple Maximizing Performance in Cs CuBiCl
5 pages
CSET301 LabW8L2
No ratings yet
CSET301 LabW8L2
1 page
ML Shristi File
No ratings yet
ML Shristi File
49 pages
Decision Tree - Jupyter Notebook
No ratings yet
Decision Tree - Jupyter Notebook
4 pages
Programs Lab Bca
No ratings yet
Programs Lab Bca
16 pages
Python Implementation of Random Forest Algorithm
No ratings yet
Python Implementation of Random Forest Algorithm
10 pages
reast-cancer-prediction-using-debt
No ratings yet
reast-cancer-prediction-using-debt
18 pages
CCD.ipynb - Colab
No ratings yet
CCD.ipynb - Colab
6 pages
m1
No ratings yet
m1
10 pages
Machine Learning
No ratings yet
Machine Learning
3 pages
Aiml Ex 4-7
No ratings yet
Aiml Ex 4-7
8 pages
Import Pandas As PD DF PD - Read - CSV ("Titanic - Train - CSV") DF - Head
No ratings yet
Import Pandas As PD DF PD - Read - CSV ("Titanic - Train - CSV") DF - Head
20 pages
ML Practical 205160694034
No ratings yet
ML Practical 205160694034
33 pages
Prathamesh KRAI
No ratings yet
Prathamesh KRAI
38 pages
Setup: This Notebook Contains All The Sample Code and Solutions To The Exercises in Chapter 7
No ratings yet
Setup: This Notebook Contains All The Sample Code and Solutions To The Exercises in Chapter 7
23 pages
Anemia Word
No ratings yet
Anemia Word
7 pages
ML_Prac1-10
No ratings yet
ML_Prac1-10
32 pages
ml using python programs
No ratings yet
ml using python programs
12 pages
Aiml 5-8
No ratings yet
Aiml 5-8
19 pages
Advanced C Concepts and Programming: First Edition
From Everand
Advanced C Concepts and Programming: First Edition
Gayatri
3/5 (1)
The Measurement of Destination Image
No ratings yet
The Measurement of Destination Image
11 pages
Computational Thinking - GR 9
No ratings yet
Computational Thinking - GR 9
10 pages
Performance Appraisal - Review of Literature
No ratings yet
Performance Appraisal - Review of Literature
4 pages
CISCO CIP Migration Scenario 2
No ratings yet
CISCO CIP Migration Scenario 2
28 pages
Grade 10: Academic Excellence Awards
No ratings yet
Grade 10: Academic Excellence Awards
2 pages
Alkali and Alkaline Earth Metals
No ratings yet
Alkali and Alkaline Earth Metals
23 pages
Kiss Me Kate - To Darn Hot Lyrics
No ratings yet
Kiss Me Kate - To Darn Hot Lyrics
4 pages
Network Simulator NS2-Beginner's Guide - Eexploria
No ratings yet
Network Simulator NS2-Beginner's Guide - Eexploria
8 pages
Two Triangles
No ratings yet
Two Triangles
8 pages
UG Question Paper Descriptive Component
No ratings yet
UG Question Paper Descriptive Component
10 pages
Emily Dickinson Poetry
100% (1)
Emily Dickinson Poetry
11 pages
Robin Camp Opening Documpents
No ratings yet
Robin Camp Opening Documpents
14 pages
Friday, 3 July 2020: Molar Incisor Hypomineralisation"
No ratings yet
Friday, 3 July 2020: Molar Incisor Hypomineralisation"
6 pages
Project Cola Next
No ratings yet
Project Cola Next
32 pages
Trees Reviewer
No ratings yet
Trees Reviewer
4 pages
My IPCR July - Dec 2020
No ratings yet
My IPCR July - Dec 2020
9 pages
Losses in Valves and Fittings
No ratings yet
Losses in Valves and Fittings
17 pages
[FREE PDF sample] A Beginner s Guide to Circuits Nine Simple Projects with Lights Sounds and More Dahl ebooks
100% (3)
[FREE PDF sample] A Beginner s Guide to Circuits Nine Simple Projects with Lights Sounds and More Dahl ebooks
62 pages
Lesson Plan (Phonics)
0% (1)
Lesson Plan (Phonics)
4 pages
Chapter 8 Measures of Economic Activity
No ratings yet
Chapter 8 Measures of Economic Activity
14 pages
After I Was Thrown in the River
No ratings yet
After I Was Thrown in the River
7 pages
Instant download Oxford Textbook of Psychopathology 2nd Edition Paul H Blaney pdf all chapter
100% (9)
Instant download Oxford Textbook of Psychopathology 2nd Edition Paul H Blaney pdf all chapter
85 pages
Air Asia Report PDF
No ratings yet
Air Asia Report PDF
24 pages
System Center 2022 v23.10
No ratings yet
System Center 2022 v23.10
1 page
Vertex Standard Vx-351pmr446
No ratings yet
Vertex Standard Vx-351pmr446
30 pages
LESSON-3 History of Global Politics Creating An International Order
100% (1)
LESSON-3 History of Global Politics Creating An International Order
40 pages
SETS DPP 2 Solution Min
No ratings yet
SETS DPP 2 Solution Min
3 pages
Accelerating Adult Language Learning-Spiderweb Method-Cecilia Sassone
100% (2)
Accelerating Adult Language Learning-Spiderweb Method-Cecilia Sassone
7 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Import Numpy As NP Import Pandas As PD

Uploaded by

Import Numpy As NP Import Pandas As PD

Uploaded by

import numpy as np

7 C1000 0 53 M 33616.4 SAL 45 >50 239 5

for feature in bank_df.columns:

X_train, X_test, train_labels, test_labels = train_test_split(X, y,

Ensemble RandomForest Classifier

from sklearn.model_selection import GridSearchCV

0 0.92 1.00 0.96 12782

accuracy 0.92 14000

0 0.92 1.00 0.96 5485

accuracy 0.92 6000

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.