Machine Learning Mini-Project Report
Group:
Saurav Rai (17558)
Saichand A V R P (17552)
Akhilesh Pandey (17551)
AIM :
Variations of Neural Network Models for image classification: fashion-MNIST.
Experimental Procedure :
Platforms Used :
1. (Saurav Rai): Python 3 (TensorFlow backend).
2. (Saichand): IPython (Anaconda 3), Keras, Python (TensorFlow backend).
3. (Akhilesh Pandey): Python 3, PyTorch.
Dataset Description :
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from one of 10 classes. (A figure with sample images from the dataset appeared here.)
We have used 3 distinct models, with varied activation functions and optimization techniques. The 3 models, each followed by a short code sketch after this list, are:
√ Convolutional Neural Network (CNN): The network has 3 convolution layers with different activation functions, each followed by a max-pooling phase with pool_size (2, 2), which gives translation invariance. A dropout phase is used after every max-pool phase for regularization. The network uses around 2.5 lakh parameters in total: roughly 1.5 lakh of them sit in the first fully connected layer, and 1,290 in the second fully connected layer.
√ Multi-Layer Perceptron (MLP): A simple multi-layer perceptron with one hidden layer. The input layer has 784 nodes (one per pixel), the hidden layer has 256 nodes, and the output layer has one node per label (10). Weights are initialized using the random_normal function of the tensorflow package, and softmax_cross_entropy_with_logits from TensorFlow's neural network module is used as the cost function.
√ Logistic Regression (LR): Implemented with the LogisticRegression model from scikit-learn's linear_model module. We use the 'lbfgs' solver as the optimizer for the model, and the softmax function for predicting probabilities. We also implemented the k-class LR model with multi_class='ovr', which fits a binary problem for each label; the default solver for this model is 'liblinear', an open-source library for large-scale linear classification.
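A minimal Keras sketch of the CNN described above. The filter counts (32/64/128), the per-layer activations, and the dropout rate are assumptions, chosen so the parameter totals land near the quoted figures (about 2.5 lakh overall, about 1.5 lakh in the first fully connected layer, 1,290 in the second):

from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the 3-conv-layer CNN; filter counts and activations are
# assumptions chosen to roughly match the quoted parameter counts.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', activation='relu',
                  input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=(2, 2)),    # gives translation invariance
    layers.Dropout(0.25),                     # dropout after every max-pool
    layers.Conv2D(64, (3, 3), padding='same', activation='tanh'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),                         # 3 * 3 * 128 = 1152 features
    layers.Dense(128, activation='relu'),     # 1152*128 + 128 = 147,584 (~1.5 lakh)
    layers.Dense(10, activation='softmax'),   # 128*10 + 10 = 1290 parameters
])
model.summary()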
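A minimal TensorFlow sketch of the MLP, written against the TF 1.x-style API that the named functions (random_normal, softmax_cross_entropy_with_logits) belong to; the hidden-layer activation is an assumption:

import tensorflow.compat.v1 as tf   # TF 1.x-style API, as used in the report
tf.disable_v2_behavior()

n_input, n_hidden, n_classes = 784, 256, 10

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Weights initialized with random_normal, as described above
w1 = tf.Variable(tf.random_normal([n_input, n_hidden]))
b1 = tf.Variable(tf.random_normal([n_hidden]))
w2 = tf.Variable(tf.random_normal([n_hidden, n_classes]))
b2 = tf.Variable(tf.random_normal([n_classes]))

hidden = tf.nn.relu(tf.matmul(x, w1) + b1)   # hidden activation is an assumption
logits = tf.matmul(hidden, w2) + b2

# Cost function named in the report
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))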
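And a scikit-learn sketch of the two logistic-regression variants (max_iter is an added assumption so that lbfgs converges on this data):

from sklearn.linear_model import LogisticRegression

# Multinomial (softmax) LR with the lbfgs solver
lr = LogisticRegression(solver='lbfgs', multi_class='multinomial',
                        max_iter=1000)

# k-class LR: one binary problem per label, with the liblinear solver
klr = LogisticRegression(solver='liblinear', multi_class='ovr')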
Experiments Conducted:
❖ At first, we dealt with the fashion-MNIST image classification problem using simple Logistic Regression (LR), with the images stored as features in .csv files for the train and test data. The fixed train set (60,000) and test set (10,000) given by the creators of fashion-MNIST (Zalando) are used for the model. The accuracies observed with this model are not satisfactory. We observe that LR works slightly better than k-class LR, which uses the ovr (one-versus-rest) policy: LR gives only 85.19 percent and KLR (k-class LR) 84.55 percent. A data-loading and evaluation sketch appears after this list.
❖ Then we used a CNN in Keras with the TensorFlow backend, which showed a good improvement in accuracy. The maximum test accuracy we reached was 93.4 percent, with the Adamax optimizer. The CNN-based model performed better overall, and it also trained in only about 15 minutes, compared to double that time for the MLP and even more for the LR models. A training sketch appears after this list.
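Continuing the scikit-learn sketch above, a hedged sketch of how the .csv-based LR experiment could be run; the file names and the label-plus-784-pixel-columns layout are assumptions based on the standard fashion-MNIST csv distribution:

import pandas as pd

# File names and column layout (label first, then 784 pixels) are assumptions
train = pd.read_csv('fashion-mnist_train.csv')
test = pd.read_csv('fashion-mnist_test.csv')

X_train, y_train = train.iloc[:, 1:] / 255.0, train.iloc[:, 0]
X_test, y_test = test.iloc[:, 1:] / 255.0, test.iloc[:, 0]

lr.fit(X_train, y_train)
print('LR test accuracy:', lr.score(X_test, y_test))    # 85.19 reported above

klr.fit(X_train, y_train)
print('KLR test accuracy:', klr.score(X_test, y_test))  # 84.55 reported above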
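A sketch of how the Keras CNN from the earlier sketch would be compiled and trained with Adamax; x_train is assumed reshaped to (N, 28, 28, 1) and scaled to [0, 1], and the epoch count is an assumption:

# The 80/20 train/validation split described in the inferences below
# maps to validation_split=0.2.
model.compile(optimizer='adamax',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=256,        # 256 and 512 were both tried
                    epochs=20,             # epoch count is an assumption
                    validation_split=0.2)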
The comparison of all three models shows that CNN-based models outperform the others, owing to their multi-layer processing. The hyperparameters of both the MLP and CNN models were also tuned to improve accuracy.
The figures 256 and 512 in the CNN_adamax model correspond to the batch size. ROC curves were plotted for the models, and the training and test accuracies were noted down. In the accompanying figure, the polynomial trend line shows the shift in the test accuracy.
Observations:
The configurations compared (their accuracy results appeared as figures) were:
- Tanh activation, Adamax optimizer, softmax output (batch size 256)
- Tanh activation, Adamax optimizer, softmax output (batch size 512)
- Sigmoid activation, Adam optimizer, relu
Inferences:
From our analysis, we infer that CNN boosts accuracy with less computation time than the other classifiers, because of its multi-layer processing. The validation phase of the CNN models also helps boost accuracy, since we split the training data into 80 percent training and 20 percent validation. The area under the curve for CNN with the Adam and Adamax optimizers stands out at about 93 percent.
We have not experimented thoroughly with variations of batch_size for the convolutional neural networks, and not much was done with the MLP either; perhaps increasing the number of hidden layers would improve its accuracy. Changing the learning rates for CNN and MLP gave some intuition: if the learning rate is low (0.001), training takes more time and learns very slowly; if it is too high (0.1, 0.2, ...), training halts much earlier, but the drastic updates lead to more bias and a reduction in accuracy can be observed. Hence we found that a good learning rate for CNN with Adagrad and Adam is 0.01. Strangely, though, for CNN with the Adamax optimizer, a learning rate of 0.2 gave 93 percent test accuracy. A sketch of such a learning-rate sweep follows.
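A hedged sketch of the learning-rate sweep described above; build_model() is a hypothetical helper returning a freshly initialized copy of the CNN, not something from the original code:

from tensorflow import keras

for lr in [0.001, 0.01, 0.1, 0.2]:        # the rates discussed above
    m = build_model()                      # hypothetical model-builder helper
    m.compile(optimizer=keras.optimizers.Adamax(learning_rate=lr),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
    m.fit(x_train, y_train, batch_size=256, epochs=10,
          validation_split=0.2, verbose=0)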
Conclusion:
In this mini-project, we performed image classification on the fashion-MNIST dataset using several classifiers: CNN, MLP, and LR. Having studied their accuracies, we conclude from our minimal experiments that CNN performs best. The work can be extended to improve performance by tuning the batch_size for the CNN, the number of hidden layers for the MLP, and a better solver for Logistic Regression.
Models used: (Work done by Akhilesh Pandey (17551))
CNN

Sl. No  Model             Pool  Optimizer  Batch norm  Batch size  Iters  Acc (%)
1       2 CL, 1 FC, Relu  Max   Adam       No          100         7000   83.42
2       2 CL, 1 FC, Relu  Avg   Adam       Yes         256         4000   88.45
3       2 CL, 1 FC, Relu  Avg   Adam       Yes         256         5500   88.62
4       2 CL, 1 FC, Relu  Avg   Adam       Yes         256         4500   89.32
5       2 CL, 1 FC, Relu  Avg   RMSprop    Yes         300         4000   88.35
6       2 CL, 1 FC, Relu  Max   RMSprop    No          100         7000   83.56
7       2 CL, 1 FC, Relu  Avg   RMSprop    Yes         300         8000   88.31
8       2 CL, 1 FC, Relu  Avg   RMSprop    Yes         100         3000   87.21
9       2 CL, 1 FC, Relu  Avg   Adamax     Yes         100         3000   85.22
10      2 CL, 1 FC, Relu  Avg   Adamax     Yes         512         4000   89.07

(CL = convolution layers, FC = fully connected layer.)
We want to build a model that classifies each image into one of the ten classes (clothes, shoes, etc.).
I preferred a CNN with 2 convolution layers and one fully connected layer. The activation function I have used is Relu, defined as Relu(x) = x if x >= 0 and Relu(x) = 0 otherwise.
I used two different CNN models with slightly different architectures: one without batch normalization and the other with batch normalization. A PyTorch sketch of this architecture is given below.
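A minimal PyTorch sketch of the architecture; the channel widths (32, 64) are assumptions, and, per the table above, average pooling is paired with the batch-norm runs and max pooling with the no-batch-norm runs:

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    # 2 convolution layers + 1 fully connected layer, Relu activations;
    # channel widths are assumptions.
    def __init__(self, batch_norm=True):
        super().__init__()
        def block(cin, cout):
            layers = [nn.Conv2d(cin, cout, kernel_size=3, padding=1)]
            if batch_norm:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.ReLU())
            # avg pooling in the batch-norm runs, max pooling otherwise
            layers.append(nn.AvgPool2d(2) if batch_norm else nn.MaxPool2d(2))
            return layers
        self.features = nn.Sequential(*block(1, 32), *block(32, 64))
        self.fc = nn.Linear(64 * 7 * 7, 10)  # 28 -> 14 -> 7 after two pools

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))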
Both of these models were run while varying hyperparameters such as batch_size, the number of iterations, and the optimizer. For each optimizer, the model was run with varying batch_size and number of iterations.
In the table above, yes/no refers to whether batch normalization was used. For example, we can observe the improvement in accuracy for the model using the Adam optimizer with a batch_size of 256 when run for 5.5k iterations. From the accuracy-versus-iterations figure, we can observe that the accuracy initially improves faster than it does in the later iterations.
In this experiment we tried various optimizers: Adam, Adamax, and RMSprop. Observations regarding each optimizer are given below, after a short sketch of how the optimizers are swapped.
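A small sketch of the optimizer swap in PyTorch; the default learning rate here is an assumption, since the report varies it per run:

import torch.optim as optim

def make_optimizer(name, params, lr=1e-3):
    # lr default is an assumption; the report varies it per run
    opts = {'adam': optim.Adam, 'rmsprop': optim.RMSprop,
            'adamax': optim.Adamax}
    return opts[name](params, lr=lr)

model = SmallCNN(batch_norm=True)
optimizer = make_optimizer('rmsprop', model.parameters())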
ADAM Optimizer:
From the results above we can observe that accuracy increases with batch size: as the batch size increases from 100 to 256, the accuracy increases from about 83 to 88 percent. However, this is not the case with the number of iterations. Even though the number of iterations increased from 4k to 5.5k, the accuracy barely increased, and when the iterations increased from 4.5k to 5.5k, the accuracy actually decreased.
RMSprop Optimizer:
Here too, accuracy increases with batch size: as the batch size increases from 100 to 300, the accuracy increases from about 83 to 88 percent. However, this is not the case with the number of iterations. Keeping the batch_size constant, even though the number of iterations increased from 4k to 8k, there is a slight decrease in accuracy.
Adamax Optimizer:
Again, accuracy increases with batch size: as the batch size increases from 100 to 512, the accuracy increases from about 85 to 89 percent. The same holds for the number of iterations: when the number of iterations increased from 3k to 4k, there was a large increase in accuracy.
From the above results a few things can be observed:
❖ Batch normalization gives better results compared to no batch normalization.
❖ As the batch size increases, the accuracy also increases.
❖ RMSprop does a better job in terms of accuracy.
Adam:
As we can see from the ROC curve, the area under the curve for the Adam optimizer is only 89 percent, indicating that the true positive rate is not up to the mark, which is what reduces the area.
Adamax:
In this model, we observe from the graph that the area under the curve has increased compared to the previous one, as the true positive rate increased from around 79 percent to 81 percent.
RMSprop:
In this final model, we conclude that it has the maximum area under the curve, with a true positive rate of 84 percent.
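A sketch of how single AUC figures like these can be computed with scikit-learn; micro-averaging over the 10 classes is an assumption about how the quoted numbers were obtained:

import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

# y_true: integer labels; y_score: (N, 10) class probabilities from the model
y_bin = label_binarize(y_true, classes=np.arange(10))
fpr, tpr, _ = roc_curve(y_bin.ravel(), y_score.ravel())
print('micro-average AUC:', auc(fpr, tpr))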
Conclusion:
From the above experiments and observations we can conclude the following:
❖ Accuracy depends on batch_size: when batch_size increases, the accuracy also increases.
❖ With an increasing number of iterations, the accuracy tends to oscillate. Hence a proper choice of the number of iterations to run through is important.
❖ Apart from these, a larger batch_size makes the model take more time to train, which is natural.
IMPLEMENTATION
ABOUT SNN (Self-Normalizing Neural Networks)
3. MODELS USED:
MODEL:
COST FUNCTION:
OPTIMIZER:
• AdagradOptimizer
tf.train.AdagradOptimizer(learning_rate=learning_rate).minimize(cost)
ACTIVATION FUNCTION:
selu(x) = λ * x                  if x > 0
        = λ * α * (exp(x) − 1)   if x <= 0
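For reference, a NumPy sketch of this activation with the constants from the self-normalizing networks paper (TensorFlow also ships it as tf.nn.selu):

import numpy as np

LAMBDA, ALPHA = 1.0507, 1.6733  # constants from Klambauer et al. (2017)

def selu(x):
    # lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, LAMBDA * x, LAMBDA * ALPHA * (np.exp(x) - 1.0))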
5. OBSERVATIONS:
Parameters
Learning_rate = 0.05
training_epochs = 10
batch_size = 100
display_step = 1
Network parameters

Sl. No  Model                         Optimizer                   Epochs  Batch size  Acc (%)  Time (s)
1       Multilayer Perceptron (SELU)  Adagrad Optimizer           10      50          89.59    139.51
2       Multilayer Perceptron (SELU)  Adagrad Optimizer           20      100         89.61    122.17
3       Multilayer Perceptron (SELU)  Adadelta Optimizer          10      50          87.49    150.63
4       Multilayer Perceptron (SELU)  Adadelta Optimizer          20      100         88.58    303.89
5       Multilayer Perceptron (RELU)  Adagrad Optimizer           10      50          81.21    141.21
6       Multilayer Perceptron (SELU)  Proximal Adagrad Optimizer  (values not recoverable)
7       Multilayer Perceptron (SELU)  Proximal Adagrad Optimizer  20      50          90.33    561.68
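A hedged sketch of the TF 1.x training loop behind these runs, reusing the optimizer line from section 3; x, y, and cost are assumed to come from an MLP graph like the one sketched earlier, learning_rate, training_epochs, and batch_size come from the parameter list above, and next_batch() is a hypothetical mini-batch helper:

optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        for step in range(60000 // batch_size):
            batch_x, batch_y = next_batch(batch_size)  # hypothetical helper
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})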
Observation:
7. CONCLUSIONS: