DL Lab-final
Implement a multilayer perceptron algorithm for MNIST Handwritten Digit Classification.
The human visual system is remarkable: people recognize handwritten digits effortlessly. The task is not
as simple as it looks, however. The human brain has tens of billions of neurons and trillions of
connections between them, which is what makes this exceptionally complex image-processing task feel easy.
For computers, recognizing digits is challenging. Simple intuitions about how to recognize a digit are
difficult to express algorithmically, and handwriting varies significantly from person to person, which
makes the problem harder still.
A handwritten digit recognition system is a machine that is trained to recognize digits from different
sources such as emails, bank cheques, papers, and images.
Google Colab
Google Colab was used to implement the network. It is a free cloud service for developing deep learning
applications with popular libraries such as Keras, TensorFlow, PyTorch, and OpenCV. What distinguishes
Colab from other free cloud services is that it provides GPU access at no cost. If your PC does not meet
the hardware requirements or has no GPU, Colab is a good option, because a stable internet connection is
the only requirement.
MNIST Datasets
MNIST stands for “Modified National Institute of Standards and Technology”. It is a dataset of 70,000
images of handwritten digits. Each image is 28x28 pixels, i.e. exactly 784 features. Each feature is the
intensity of one pixel, an integer from 0 (background) to 255 (full stroke intensity). The dataset is
divided into 60,000 training and 10,000 testing images.
Phases of Implementation
We imported TensorFlow, a free, open-source library for machine learning applications such as neural
networks. We also imported the pyplot module from matplotlib, which is used for plotting and
visualization, and NumPy (Numerical Python), which is used for numerical operations on arrays.
The MNIST dataset ships with Keras, so we imported it from keras.datasets and assigned it to the variable
“objects”. The objects.load_data() method returns the training data (train_img) with its labels
(train_lab) and the testing data (test_img) with its labels (test_lab). Of the 70,000 images in the
dataset, 60,000 are for training and 10,000 are for testing.
Before preprocessing the data, we displayed the first 20 images of the training set using a for loop.
subplot() adds a subplot to the current figure in a grid-like layout. The first argument is the number
of rows, the second is the number of columns, and the third is the position index in the grid, counted
from 1.
For example, to plot 10 images in a 4x5 grid starting from the second position, the position index runs
from 2 to 11.
imshow() displays data as an image, here the training image train_img[i]. cmap stands for colour map and
is optional. If the image is an array of shape (M, N), cmap controls the colour map used to display the
values: cmap='gray' displays the image in grayscale, while cmap='gray_r' displays it in inverse
grayscale.
title() sets the title of each image. We set “Digit: train_lab[i]” as the title of each image in the
subplot.
subplots_adjust() tunes the subplot layout. We used hspace to change the space between two rows; use
wspace to change the space between two columns.
By default, the subplot layout parameters are left=0.125, right=0.9, bottom=0.11, top=0.88, wspace=0.2
and hspace=0.2.
To hide the axes of each image, plt.axis('off') was used.
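The plotting calls described above can be combined as in the following sketch. It is not the original lab code: the images are random stand-ins for train_img, the labels are made up, and the figure size and hspace value are arbitrary choices. It plots 10 images in a 4x5 grid starting from the second position.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in Colab
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumption): 10 random 28x28 "images" and made-up labels, not real MNIST samples
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)
labels = np.arange(10)

fig = plt.figure(figsize=(8, 7))
for i in range(10):
    # 4 rows, 5 columns; positions are 1-based, so i + 2 starts at the second cell
    plt.subplot(4, 5, i + 2)
    plt.imshow(images[i], cmap="gray_r")      # inverse grayscale
    plt.title("Digit: {}".format(labels[i]))  # one title per image
    plt.axis("off")                           # hide the axes
plt.subplots_adjust(hspace=0.5)               # extra vertical space between rows
```

In a notebook, replacing the stand-in arrays with train_img and train_lab and calling plt.show() at the end would display the grid.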
After that, we displayed the shapes of the training and testing sets.
(60000,28,28) means the training set contains 60,000 images, each of size 28x28 pixels. Similarly, the
testing set contains 10,000 images of the same size.
Each image therefore has 28x28 = 784 features, and each feature is the intensity of one pixel, from 0 to
255.
You can use print(train_img[0]) to print the first training image as a 28x28 matrix.
Before normalization, we plotted the pixel intensities of the first training image as a histogram.
hist() plots the histogram of the first training image, train_img[0], after it has been reshaped into a
1-D array of size 784. facecolor is an optional parameter that specifies the colour of the histogram.
The title of the histogram, the Y-axis and the X-axis are named “Pixel vs its intensity”, “PIXEL” and
“Intensity”.
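The reshape-then-histogram step can be illustrated with NumPy alone. This is a sketch under assumptions: the image is a random stand-in for train_img[0], and np.histogram is used here to compute the bin counts that a call such as plt.hist would draw as bars.

```python
import numpy as np

# Stand-in for train_img[0] (assumption): a random 28x28 image with intensities 0..255
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten the 28x28 grid into a 1-D array of 784 pixel intensities
pixels = image.reshape(784)

# Count how many pixels fall into each of 10 equal-width intensity bins
counts, bin_edges = np.histogram(pixels, bins=10, range=(0, 255))
print(pixels.shape)
print(counts)
```

Every pixel lands in exactly one bin, so the counts add up to 784.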
Pre-process the data
Before feeding the data to the network, we normalize it. Normalizing the input data helps to speed up
training. It also reduces the chance of getting stuck in a poor local optimum, since the weights are
found with a stochastic gradient-based optimizer.
The pixel values lie between 0 and 255. Because this scale is known and well behaved, we can normalize
the pixel values to the range 0 to 1 simply by dividing each value by the maximum intensity, 255.
After normalization, all pixel values lie in the range 0 to 1.
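The scaling step can be sketched as follows. The array below is a random stand-in for train_img, and the cast to float32 is an assumption about the dtype one would typically feed to the network.

```python
import numpy as np

# Stand-in for train_img (assumption): three 28x28 images with raw intensities 0..255
rng = np.random.default_rng(0)
train_img = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)
train_img[0, 0, 0] = 0    # make sure both ends of the range are present
train_img[0, 0, 1] = 255

# Cast to float32, then divide by the maximum intensity to map 0..255 onto 0..1
train_img_norm = train_img.astype("float32") / 255.0
print(train_img_norm.min(), train_img_norm.max())
```

The same division would be applied to the testing images so that both sets share one scale.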
Creating the model
The output layer has 10 neurons, one for each class from 0 to 9. A softmax activation function is used
on the output layer to turn the outputs into probability-like values.
Note: You can add more neurons in the hidden layers, or even increase the number of hidden layers in the
model, which may improve accuracy. However, training will take more time.
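The text does not show the hidden layers, so the following NumPy sketch assumes a single hidden layer of 128 ReLU units (both the width and the activation are assumptions made for illustration). It shows how a 28x28 image flows through a multilayer perceptron to the 10-way softmax output. The weights are random, so the probabilities carry no meaning until the model is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 128  # assumed hidden-layer width, chosen only for illustration

# Randomly initialised (untrained) parameters
W1 = rng.normal(0.0, 0.05, size=(784, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.05, size=(n_hidden, 10))
b2 = np.zeros(10)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(images):
    x = images.reshape(len(images), -1)   # flatten each 28x28 image into 784 features
    h = relu(x @ W1 + b1)                 # hidden layer
    return softmax(h @ W2 + b2)           # 10 probability-like outputs per image

batch = rng.random((4, 28, 28))           # stand-in for four normalized images
probs = forward(batch)
print(probs.shape)
print(probs.argmax(axis=1))               # index of the most probable class for each image
```

Each row of probs is non-negative and sums to 1, which is what lets the largest entry be read as the predicted digit.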
Next, we compile the model. Compiling takes three parameters: optimizer, loss and metrics. The optimizer
determines how the weights are updated, including the learning rate. We use 'adam' as our optimizer. It
is generally a good choice for many cases, and it adapts the learning rate throughout training.
We use 'sparse_categorical_crossentropy' as our loss function because it saves memory and computation:
it represents each class with a single integer rather than a whole one-hot vector. A lower loss
indicates that the model is performing better.
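The saving from integer labels can be seen in a small NumPy sketch. It is not the Keras implementation: the probabilities and labels are made up, and the two formulas are written out by hand to show that the sparse form and the one-hot form give the same loss.

```python
import numpy as np

# Made-up predicted probabilities for 2 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])  # integer class labels, one per sample

# Sparse form: pick the probability of the true class directly by integer index
sparse_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()

# Dense form: the same loss computed from one-hot label vectors
one_hot = np.eye(probs.shape[1])[labels]
dense_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, dense_loss)
```

Both values agree; the sparse form simply skips building and multiplying the one-hot matrix.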
To measure performance, we use the 'accuracy' metric, which reports the accuracy score on the validation
set while the model trains.
After training, we evaluate the model on the testing set.
To make a prediction for a new image that is not part of the MNIST dataset, we first create a function
named “load_image”.
This function converts the image into an array of pixels, which is fed to the model as input.
In order to upload a file from local drive, we used the code:
It will lead you to select a file. Click on “Choose Files” then select and upload the file and wait for the file
to be uploaded 100%. You will see the name of the file once Colab has uploaded it.
In order to display image file, we used the code:
from IPython.display import Image
Image('5img.jpeg', width=250, height=250)
5img.jpeg is the file name.
If we want to run the model again after a few days, we would have to run the whole code again, which is
time-consuming.
In that case, you can use the saved model, i.e. project.h5.
So, before closing the Colab notebook, download the model using the folder symbol.
When you want to run the model again, all you have to do is upload the project.h5 file from your
computer by using the code:
When the file is 100% uploaded, use the following code; after that, you can predict the digit for new
images without running the whole code.
model = tf.keras.models.load_model('project.h5')
In [1]:
from tensorflow.keras.datasets import imdb
# Load the data, keeping only the 10,000 most frequently occurring words
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = 10000)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17465344/17464789 [==============================] - 0s 0us/step
17473536/17464789 [==============================] - 0s 0us/step
The argument num_words=10000 means you’ll only keep the top 10,000 most frequently occurring
words in the training data. Rare words will be discarded. This allows you to work with vector data of
manageable size.
In [2]:
train_data[0]
Out[2]:
[1,
14,
22,
16,
43,
530,
973,
1622,
1385,
65,
458,
4468,
66,
3941,
4,
173,
36,
256,
5,
25,
100,
43,
838,
112,
50,
670,
2,
9,
35,
480,
284,
5,
150,
4,
172,
112,
167,
2,
336,
385,
39,
4,
172,
4536,
1111,
17,
546,
38,
13,
447,
4,
192,
50,
16,
6,
147,
2025,
19,
14,
22,
4,
1920,
4613,
469,
4,
22,
71,
87,
12,
16,
43,
530,
38,
76,
15,
13,
1247,
4,
22,
17,
515,
17,
12,
16,
626,
18,
2,
5,
62,
386,
12,
8,
316,
8,
106,
5,
4,
2223,
5244,
16,
480,
66,
3785,
33,
4,
130,
12,
16,
38,
619,
5,
25,
124,
51,
36,
135,
48,
25,
1415,
33,
6,
22,
12,
215,
28,
77,
52,
5,
14,
407,
16,
82,
2,
8,
4,
107,
117,
5952,
15,
256,
4,
2,
7,
3766,
5,
723,
36,
71,
43,
530,
476,
26,
400,
317,
46,
7,
4,
2,
1029,
13,
104,
88,
4,
381,
15,
297,
98,
32,
2071,
56,
26,
141,
6,
194,
7486,
18,
4,
226,
22,
21,
134,
476,
26,
480,
5,
144,
30,
5535,
18,
51,
36,
28,
224,
92,
25,
104,
4,
226,
65,
16,
38,
1334,
88,
12,
16,
283,
5,
16,
4472,
113,
103,
32,
15,
16,
5345,
19,
178,
32]
In [3]:
train_labels[0]
Out[3]:
1
Because you’re restricting yourself to the top 10,000 most frequent words, no word index will exceed
10,000:
In [4]:
max([max(sequence) for sequence in train_data])
Out[4]:
9999
In [5]:
# Let's quickly decode a review
# step 1: load the dictionary mapping words to integer indexes
word_index = imdb.get_word_index()
# step 2: reverse word index to map integer indexes to their respective words
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
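The remaining decoding step can be sketched with a toy dictionary. The words, ranks and encoded review below are hypothetical stand-ins for imdb.get_word_index() and train_data[0]. The offset of 3 reflects the default encoding of the Keras IMDB loader, where 0, 1 and 2 are reserved for padding, start of sequence and unknown words.

```python
# Toy word index standing in for imdb.get_word_index() (hypothetical words and ranks)
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}
reverse_word_index = {value: key for key, value in word_index.items()}

# An encoded review: 1 marks the start, 2 marks an unknown word,
# and real words are stored as their rank + 3
encoded_review = [1, 4, 5, 6, 2, 7]

# step 3: map each index back to a word; reserved indices become '?'
decoded_review = " ".join(reverse_word_index.get(i - 3, "?") for i in encoded_review)
print(decoded_review)  # ? the movie was ? great
```

With the real dataset, the same expression applied to train_data[0] would produce the text of the first review.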
You cannot feed lists of integers directly into a neural network; the lists have to be turned into
tensors first. There are two ways to do that:
Pad your lists so that they all have the same length, turn them into an integer tensor of shape (samples,
word_indices), and then use as the first layer in your network a layer capable of handling such integer
tensors (the Embedding layer).
One-hot encode your lists to turn them into vectors of 0s and 1s. This would mean, for instance, turning
the sequence [3, 5] into a 10,000-dimensional vector that would be all 0s except for indices 3 and 5,
which would be 1s. Then you could use as the first layer in your network a Dense layer, capable of
handling floating-point vector data.
In [7]:
# Encoding the integer sequences into a binary matrix
'''Explanation: I first created a 2D matrix of shape (number of examples, 10000),
then I looped over each example and put 1 at the index of every word it contains;
all other entries are left as 0.
It is just a one-hot encoder.'''
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    # Creates an all-zero matrix of shape (len(sequences), 10K)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1  # Sets specific indices of results[i] to 1s
    return results
# Vectorize the training data and labels (X_train and y_train are used in the cells below)
X_train = vectorize_sequences(train_data)
y_train = np.asarray(train_labels).astype('float32')
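A small usage example makes the encoding concrete. It repeats the function so that it runs on its own; the two toy sequences and the dimension of 8 are made up for illustration.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1  # set every index that occurs in the sequence to 1
    return results

# Two toy "reviews" over a vocabulary of 8 word indices (hypothetical data)
toy = vectorize_sequences([[3, 5], [0, 3, 3]], dimension=8)
print(toy)
# [[0. 0. 0. 1. 0. 1. 0. 0.]
#  [1. 0. 0. 1. 0. 0. 0. 0.]]
```

A repeated index, such as the second 3 in the second sequence, still yields a single 1, so the encoding records which words are present and not how often.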
from tensorflow.keras import models
from tensorflow.keras import layers
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
2022-11-28 21:10:52.669899: I tensorflow/core/common_runtime/process_util.cc:146] Creating new
thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best
performance.
Compiling the model
In [11]:
from tensorflow.keras import optimizers
from tensorflow.keras import losses
from tensorflow.keras import metrics
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss=losses.binary_crossentropy,
              metrics=[metrics.binary_accuracy])
Validating your approach
In order to monitor during training the accuracy of the model on data it has never seen before, you’ll
create a validation set by setting apart 10,000 samples from the original training data.
In [12]:
# Input for Validation
X_val = X_train[:10000]
partial_X_train = X_train[10000:]
# Labels for validation
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
In [13]:
history = model.fit(partial_X_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(X_val, y_val))
2022-11-28 21:10:54.194950: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of
the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/20
30/30 [==============================] - 2s 35ms/step - loss: 0.5318 - binary_accuracy: 0.7740
- val_loss: 0.4200 - val_binary_accuracy: 0.8359
Epoch 2/20
30/30 [==============================] - 0s 16ms/step - loss: 0.3170 - binary_accuracy: 0.9026
- val_loss: 0.3175 - val_binary_accuracy: 0.8845
Epoch 3/20
30/30 [==============================] - 1s 17ms/step - loss: 0.2310 - binary_accuracy: 0.9278
- val_loss: 0.2862 - val_binary_accuracy: 0.8873
Epoch 4/20
30/30 [==============================] - 1s 17ms/step - loss: 0.1842 - binary_accuracy: 0.9407
- val_loss: 0.2772 - val_binary_accuracy: 0.8879
Epoch 5/20
30/30 [==============================] - 0s 17ms/step - loss: 0.1470 - binary_accuracy: 0.9537
- val_loss: 0.2971 - val_binary_accuracy: 0.8813
Epoch 6/20
30/30 [==============================] - 1s 17ms/step - loss: 0.1239 - binary_accuracy: 0.9617
- val_loss: 0.3034 - val_binary_accuracy: 0.8823
Epoch 7/20
30/30 [==============================] - 1s 17ms/step - loss: 0.1032 - binary_accuracy: 0.9697
- val_loss: 0.3133 - val_binary_accuracy: 0.8810
Epoch 8/20
30/30 [==============================] - 0s 15ms/step - loss: 0.0846 - binary_accuracy: 0.9776
- val_loss: 0.3281 - val_binary_accuracy: 0.8787
Epoch 9/20
30/30 [==============================] - 0s 16ms/step - loss: 0.0730 - binary_accuracy: 0.9800
- val_loss: 0.3470 - val_binary_accuracy: 0.8785
Epoch 10/20
30/30 [==============================] - 1s 17ms/step - loss: 0.0611 - binary_accuracy: 0.9844
- val_loss: 0.3704 - val_binary_accuracy: 0.8763
Epoch 11/20
30/30 [==============================] - 1s 17ms/step - loss: 0.0491 - binary_accuracy: 0.9889
- val_loss: 0.3981 - val_binary_accuracy: 0.8734
Epoch 12/20
30/30 [==============================] - 0s 17ms/step - loss: 0.0410 - binary_accuracy: 0.9911
- val_loss: 0.4173 - val_binary_accuracy: 0.8748
Epoch 13/20
30/30 [==============================] - 1s 17ms/step - loss: 0.0323 - binary_accuracy: 0.9935
- val_loss: 0.4495 - val_binary_accuracy: 0.8740
Epoch 14/20
30/30 [==============================] - 1s 18ms/step - loss: 0.0281 - binary_accuracy: 0.9936
- val_loss: 0.4782 - val_binary_accuracy: 0.8711
Epoch 15/20
30/30 [==============================] - 0s 17ms/step - loss: 0.0207 - binary_accuracy: 0.9969
- val_loss: 0.5064 - val_binary_accuracy: 0.8721
Epoch 16/20
30/30 [==============================] - 1s 18ms/step - loss: 0.0174 - binary_accuracy: 0.9975
- val_loss: 0.5392 - val_binary_accuracy: 0.8694
Epoch 17/20
30/30 [==============================] - 1s 18ms/step - loss: 0.0157 - binary_accuracy: 0.9969
- val_loss: 0.5629 - val_binary_accuracy: 0.8693
Epoch 18/20
30/30 [==============================] - 0s 16ms/step - loss: 0.0084 - binary_accuracy: 0.9995
- val_loss: 0.5886 - val_binary_accuracy: 0.8685
Epoch 19/20
30/30 [==============================] - 0s 16ms/step - loss: 0.0091 - binary_accuracy: 0.9993
- val_loss: 0.6248 - val_binary_accuracy: 0.8673
Epoch 20/20
30/30 [==============================] - 0s 15ms/step - loss: 0.0109 - binary_accuracy: 0.9971
- val_loss: 0.6583 - val_binary_accuracy: 0.8662
Note that the call to model.fit() returns a History object. This object has a member history, which is a
dictionary containing data about everything that happened during training. Let’s look at it:
In [14]:
history_dict = history.history
history_dict.keys()
Out[14]:
dict_keys(['loss', 'binary_accuracy', 'val_loss', 'val_binary_accuracy'])
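One way to use this dictionary is to find the epoch at which the validation loss is lowest. The sketch below is a stand-in for history.history: the val_loss values are copied from the 20-epoch training log above, so it runs without retraining the model.

```python
# val_loss values copied from the 20-epoch training log above
val_loss = [0.4200, 0.3175, 0.2862, 0.2772, 0.2971, 0.3034, 0.3133, 0.3281,
            0.3470, 0.3704, 0.3981, 0.4173, 0.4495, 0.4782, 0.5064, 0.5392,
            0.5629, 0.5886, 0.6248, 0.6583]
history_dict = {"val_loss": val_loss}  # stand-in for history.history

# Epochs are numbered from 1, list positions from 0
values = history_dict["val_loss"]
best_epoch = min(range(len(values)), key=values.__getitem__) + 1
print(best_epoch, values[best_epoch - 1])  # 4 0.2772
```

In this run the validation loss bottoms out at epoch 4 and rises afterwards while the training loss keeps falling, which is the usual sign of overfitting.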
Plotting the training and validation loss
In [15]:
import matplotlib.pyplot as plt
%matplotlib inline
In [16]:
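# Plotting losses
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, 'bo', label='Training loss')
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
plt.legend()
plt.show()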