Machine Learning Unit 5 Notes
Introduction to Artificial Neural Networks with Keras, Implementing MLPs with Keras, Installing TensorFlow 2,
Loading and Preprocessing Data with TensorFlow
A Neural Network is a network of artificial neurons, modelled on the neurons found in human brains, for solving artificial intelligence problems such as image identification. It may be realized as a physical device or as a mathematical construct.
In other words, an Artificial Neural Network is a parallel computational system consisting of many simple processing elements connected together to perform a particular task.
Biological Motivation
The motivation behind neural networks is the human brain. The human brain is often called the best processor, even though it works more slowly than other computers. Many researchers have therefore tried to build machines that work on the principles of the human brain.
The human brain contains billions of neurons, each connected to many other neurons to form a network, so that when it sees an image it recognizes the image and processes the output.
Dendrites receive signals from other neurons.
When the sum of the incoming signals reaches a threshold value, the neuron fires and the signal travels down the axon to the other neurons.
The amount of signal transmitted depends upon the strength of the connections.
Connections can be inhibitory (decreasing signal strength) or excitatory (increasing signal strength) in nature.
In a similar manner, the idea arose of building artificial interconnected neurons, like biological neurons, to make up an Artificial Neural Network (ANN). Each biological neuron is capable of taking a number of inputs and producing an output.
Neurons in the human brain are capable of making very complex decisions, which means they run many parallel processes for a particular task. One motivation for ANNs is to perform a particular task, such as identification, through many parallel processes.
Artificial Neuron
Artificial neurons are also called perceptrons. A perceptron consists of the following basic components:
Input
Weight
Bias
Activation Function
Output
The artificial neuron, also called a linear threshold unit (LTU), was introduced by McCulloch and Pitts in 1943: it takes one or more numeric inputs, produces a weighted sum of them, applies an activation function, and outputs the result.
3. Perceptron
The original Perceptron was designed to take a number of binary inputs, and produce one binary output (0 or
1).
The idea was to use different weights to represent the importance of each input, and that the sum of the values
should be greater than a threshold value before making a decision like true or false (0 or 1).
Perceptron Example
Threshold = 1.5
x1 * w1 = 1 * 0.7 = 0.7
x2 * w2 = 0 * 0.6 = 0
x3 * w3 = 1 * 0.5 = 0.5
x4 * w4 = 0 * 0.3 = 0
x5 * w5 = 1 * 0.4 = 0.4
Sum = 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6
Return true if the sum > 1.5: since 1.6 > 1.5, the output is true ("Yes I will go to the Concert").
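As a quick check of this arithmetic, here is a minimal Python sketch of the same perceptron decision (the inputs, weights, and threshold are the ones from the example above):

# Inputs, weights, and threshold from the example above
x = [1, 0, 1, 0, 1]
w = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

weighted_sum = sum(xi * wi for xi, wi in zip(x, w))   # 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6
output = 1 if weighted_sum > threshold else 0         # 1, i.e. "Yes I will go to the Concert"
print(weighted_sum, output)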
Limitations of the Perceptron:
1. The output of a perceptron can only be a binary number (0 or 1) due to the hard-limit transfer function.
2. It can only be used to classify linearly separable sets of input vectors. If the input vectors are not linearly separable, it cannot classify them correctly.
A set of data points is said to be linearly separable if the data can be divided into two classes using a straight line. If the data cannot be divided into two classes using a straight line, the data points are called non-linearly separable.
Although the perceptron rule finds a successful weight vector when the training examples are linearly separable,
it can fail to converge if the examples are not linearly separable.
A second training rule, called the delta rule, is designed to overcome this difficulty.
If the training examples are not linearly separable, the delta rule converges toward a best-fit approximation to
the target concept.
The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight
vectors to find the weights that best fit the training examples.
This rule is important because gradient descent provides the basis for the BACKPROPAGATION algorithm, which
can learn networks with many interconnected units.
The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by
o(x) = w · x = w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n
Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.
In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training
error of a hypothesis (weight vector), relative to the training examples.
Although there are many ways to define this error, one common measure is
E(w) = (1/2) Σ_{d ∈ D} (t_d − o_d)²
where D is the set of training examples, t_d is the target output for training example d, and o_d is the output of the linear unit for training example d.
The direction of steepest descent can be found by computing the derivative of E with respect to each component of the vector w. This vector derivative is called the gradient of E with respect to w, written as
∇E(w) = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]
Since the gradient specifies the direction of steepest increase of E, the training rule for gradient descent is
w ← w + Δw, where Δw = −η ∇E(w)
Here η is a positive constant called the learning rate, which determines the step size in the gradient descent
search.
The negative sign is present because we want to move the weight vector in the direction that decreases E.
Here, the components of the gradient are obtained by differentiating E:
∂E/∂w_i = Σ_{d ∈ D} (t_d − o_d)(−x_id)
Finally, substituting this into the training rule gives the weight-update rule for gradient descent over a linear unit:
Δw_i = η Σ_{d ∈ D} (t_d − o_d) x_id
Gradient descent is a strategy for searching through a large or infinite hypothesis space that can be applied whenever
1. the hypothesis space contains continuously parameterized hypotheses (e.g., the weights in a linear unit), and
2. the error can be differentiated with respect to these hypothesis parameters.
The key practical difficulties in applying gradient descent are:
1. Converging to a local minimum can sometimes be quite slow (i.e., it can require many thousands of gradient descent steps), and
2. If there are multiple local minima in the error surface, then there is no guarantee that the procedure will find the global minimum.
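To make the update rule concrete, here is a minimal NumPy sketch of batch gradient descent with the delta rule on a linear unit; the toy data, learning rate, and number of iterations are illustrative choices, not part of the original notes:

import numpy as np

# Toy training set: 4 examples with 2 features; targets follow t = 2*x1 + 1*x2
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0.0, 1.0, 2.0, 3.0])
w = np.zeros(2)      # weight vector (bias omitted for brevity)
eta = 0.1            # learning rate

for step in range(1000):
    o = X @ w                      # linear unit output o_d = w · x_d for every example
    w += eta * (t - o) @ X         # delta rule: Δw_i = η Σ_d (t_d − o_d) x_id

print(w)                           # converges towards [2., 1.]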
3.2 Multilayer networks and backpropagation
First, let us see what a multi-layer neural network looks like. In a multi-layer neural network, there is one input layer, one output layer and one or more hidden layers.
Each node in the nth layer is connected to every node in the (n-1)th layer (n > 1). The input from the input layer is multiplied by the associated weights of every link and propagated forward until the final output is produced at the output layer. In case of an error, unlike the perceptron, we may need to update several weight vectors across many hidden layers. This is where Back Propagation comes into place.
It is nothing but the updating of the weight vectors in the hidden layers according to the training error, or loss, produced at the output layer.
Here, we consider multiple output units rather than a single output unit as discussed earlier. Therefore, the formula for calculating the training error for a neural network can be represented as follows:
E(w) = (1/2) Σ_{d ∈ D} Σ_{k ∈ outputs} (t_kd − o_kd)²
where t_kd and o_kd are the target value and the output value produced by the network for the kth output unit on data point d.
Now that we have the error function and the input and output units, we need to know the rule for updating the weight vectors. Before that, let us look at one of the most common activation functions used in multi-layer neural networks, the sigmoid function.
The sigmoid (logistic) function is a continuously differentiable, S-shaped function that squashes its input into the range 0 to 1 (not including 0 and 1); the hyperbolic tangent (tanh) is a closely related S-shaped function whose range is −1 to 1. The sigmoid can be represented as
σ(y) = 1 / (1 + e^(−y))
where y is the linear combination of the input vector and the weight vector at a given node.
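As a small illustration, here is a minimal NumPy sketch of the sigmoid and its derivative, σ'(y) = σ(y)(1 − σ(y)), which appears in the backpropagation error terms below (the function names are chosen here for illustration):

import numpy as np

def sigmoid(y):
    # squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_derivative(y):
    # the derivative can be written in terms of the output itself
    s = sigmoid(y)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25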
Now, let us see how the weight vectors are updated in multi-layer networks according to the Back Propagation algorithm.
Updating the weights in Back Propagation
Input a data point into the network and calculate the output for each output unit; let the output be o_u for every unit u.
For each output unit k, the training error δ_k can be calculated by the formula:
δ_k = o_k (1 − o_k)(t_k − o_k)
For each hidden unit h, the training error δ_h is calculated by a formula that takes into account the training errors of the output units to which the hidden unit is connected:
δ_h = o_h (1 − o_h) Σ_{k ∈ outputs} w_kh δ_k
The weight on the connection from node i to node j is then updated as
w_ji ← w_ji + η δ_j x_ji
where η is the learning rate, δ_j is the training error at node j, and x_ji is the input from node i to node j.
The above algorithm is applied repeatedly to all data points until a termination criterion is met, which can be chosen in one of these three ways:
Setting a threshold on the error: if the error falls below the given threshold, we can stop training the neural network further.
Running the algorithm for a fixed number of iterations (epochs).
Creating a validation sample of data: after every iteration we validate our model on this data, and the iteration with the highest accuracy is taken as the final model.
The first way of termination might not yield the best results; the most recommended way is the third, since it tells us how accurate our model is so far.
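To make these update rules concrete, here is a minimal NumPy sketch of one backpropagation step for a network with one sigmoid hidden layer and one sigmoid output unit; the toy data, layer sizes, and learning rate are illustrative choices, not part of the original notes:

import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# Toy example: 3 inputs -> 2 sigmoid hidden units -> 1 sigmoid output unit
x = np.array([1.0, 0.0, 1.0])         # input vector
t = 1.0                               # target output
W_h = np.random.randn(2, 3) * 0.1     # hidden-layer weights
W_o = np.random.randn(1, 2) * 0.1     # output-layer weights
eta = 0.5                             # learning rate

# Forward pass
o_h = sigmoid(W_h @ x)                # hidden-unit outputs
o_k = sigmoid(W_o @ o_h)              # output-unit output

# Error terms (delta values)
delta_k = o_k * (1 - o_k) * (t - o_k)            # output unit: o_k(1 - o_k)(t_k - o_k)
delta_h = o_h * (1 - o_h) * (W_o.T @ delta_k)    # hidden units: o_h(1 - o_h) Σ_k w_kh δ_k

# Weight updates: w_ji <- w_ji + η δ_j x_ji
W_o += eta * np.outer(delta_k, o_h)
W_h += eta * np.outer(delta_h, x)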
Q) Implementing MLPs with Keras
1. Sequential API: As the name suggests, this API is used to build the model in a sequential style, i.e. the output of one layer is fed to the next layer.
2. Functional API: This API allows us to completely control the flow of data (it doesn't need to be sequential).
3. Subclassing the tf.keras.Model class: This gives complete control of the overall model. We can have loops, conditional branching, and other dynamic behaviors in the model.
1. Sequential API
Multi-Layer Perceptron (MLP) is a type of artificial neural network that consists of several layers of nodes, where
each node is a neuron that performs a nonlinear operation on its input. Keras is a popular high-level neural
networks API that makes it easy to build, train, and deploy deep learning models.
First, we need to install Keras and its dependencies. You can do this by running the following command in your
terminal:
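pip install keras
(With TensorFlow 2, installing TensorFlow itself, e.g. pip install tensorflow, also provides Keras as tf.keras.)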
Once you have installed Keras, you can start building your MLP. Here's an example of a simple MLP with one
hidden layer:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=32)  # x_train/y_train, epochs and batch_size are placeholders
In the above code, we first import the necessary Keras modules, which include the Sequential model and the
Dense layer. Then, we define the architecture of our MLP by adding layers to our model. The first layer is a
dense layer with 64 neurons, ReLU activation, and an input dimension of 100. The second layer is a dense layer
with 10 neurons and a softmax activation.
Next, we compile the model by specifying the loss function, the optimizer, and the evaluation metric. In this
example, we use categorical cross-entropy as the loss function, stochastic gradient descent (SGD) as the
optimizer, and accuracy as the evaluation metric.
Finally, we train the model by calling the fit() method on our model object, passing in our training data (x_train
and y_train), the number of epochs to train for, and the batch size to use during training.
You can modify the architecture of your MLP by adding more layers, changing the number of neurons in each
layer, and using different activation functions. You can also experiment with different loss functions, optimizers,
and evaluation metrics to see how they affect the performance of your model.
2. Functional API
In addition to the Sequential API, Keras also provides a Functional API for building neural networks. The
Functional API offers more flexibility and allows you to build more complex models with shared layers, multiple
inputs, and multiple outputs.
Next, we define the architecture of our MLP using the Functional API. Here's an example of an MLP with one hidden layer:
inputs = Input(shape=(100,))
x = Dense(64, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
In the above code, we first define the input shape of our MLP by creating an Input object with the shape (100,).
Then, we define the first hidden layer by creating a Dense object with 64 neurons and a ReLU activation function.
We connect this layer to the input layer by calling it with the input object. This returns a new object, which we
assign to the variable x.
Next, we define the output layer by creating another Dense object with 10 neurons and a softmax activation
function. We connect this layer to the previous layer (x) by calling it with x. This returns a new object, which we
assign to the variable outputs.
Finally, we create a Model object by specifying the input and output layers. We pass the input object to the inputs
parameter, and the output object to the outputs parameter.
We can compile and train the model in the same way as we did with the Sequential API:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=32)  # placeholder training data and settings, as before
In this example, we use categorical cross-entropy as the loss function, stochastic gradient descent (SGD) as the
optimizer, and accuracy as the evaluation metric.
You can modify the architecture of your MLP by adding more layers, changing the number of neurons in each
layer, and using different activation functions. You can also experiment with different loss functions, optimizers,
and evaluation metrics to see how they affect the performance of your model. The Functional API offers even
more flexibility than the Sequential API, so feel free to experiment and build more complex models.
3. Subclassing API
Keras also provides a Subclassing API for building neural networks. The Subclassing API allows for even greater
flexibility and customization than the Sequential and Functional APIs. In this tutorial, we will learn how to
implement MLPs with the Keras Subclassing API.
Next, we define a custom class for our MLP by subclassing the Layer class:
class MLP(Layer):
    def __init__(self):
        super().__init__()
        self.dense1 = Dense(64, activation='relu')
        self.dense2 = Dense(10, activation='softmax')
    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.dense2(x)
        return x
In the above code, we define a custom class called MLP, which subclasses the Layer class. In the constructor (__init__),
we create two Dense layers, one with 64 neurons and a ReLU activation function, and another with 10 neurons
and a softmax activation function. We store these layers as member variables of the class using the self keyword.
In the call method, we define the forward pass of our MLP. We first pass the input tensor (inputs) through the
first Dense layer (dense1), and then pass the output through the second Dense layer (dense2). We return the
output tensor (x).
We can create an instance of our MLP class and use it to build a Keras model:
inputs = Input(shape=(100,))
mlp = MLP()
outputs = mlp(inputs)
model = Model(inputs=inputs, outputs=outputs)
In the above code, we first define the input shape of our MLP by creating an Input object with the shape (100,).
Then, we create an instance of our MLP class called mlp, and pass the input object to it. This returns the output
tensor of our MLP, which we assign to the variable outputs.
Finally, we create a Model object by specifying the input and output layers. We pass the input object to the inputs
parameter, and the output tensor to the outputs parameter.
We can compile and train the model in the same way as we did with the other APIs:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=32)  # placeholder training data and settings, as before
In this example, we use categorical cross-entropy as the loss function, stochastic gradient descent (SGD) as the
optimizer, and accuracy as the evaluation metric.
You can modify the architecture of your MLP by adding more layers, changing the number of neurons in each
layer, and using different activation functions. You can also experiment with different loss functions, optimizers,
and evaluation metrics to see how they affect the performance of your model. The Subclassing API offers the most
flexibility and customization, so feel free to experiment and build even more complex models.
Q) Installing TensorFlow 2
1. First, you need to install Python on your computer. You can download the latest version of Python from
the official website: https://www.python.org/downloads/
2. Once you have installed Python, you can use pip (the package installer for Python) to install TensorFlow.
Open a terminal or command prompt and type the following command:
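pip install tensorflow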
This will install the latest version of TensorFlow available on the PyPI package index.
3. Alternatively, you can install a specific version of TensorFlow by specifying the version number in the pip
command. For example, to install TensorFlow version 2.7.0, you would use the following command:
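pip install tensorflow==2.7.0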
If you want to take advantage of your GPU for faster computation, you can install the GPU version of TensorFlow.
First, make sure you have the necessary CUDA and cuDNN libraries installed on your system. Then, use the
following command to install the GPU version of TensorFlow
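pip install tensorflow-gpu
(For TensorFlow 2.1 and later, the standard tensorflow package already includes GPU support, so pip install tensorflow is usually sufficient.)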
Once TensorFlow is installed, you can test it by opening a Python shell and running the following command:
import tensorflow as tf
print(tf.__version__)
This should output the version number of TensorFlow that you have installed.
That's it! You should now have TensorFlow 2 installed on your system.
Q) Loading and Preprocessing Data with TensorFlow
Loading and preprocessing data is a crucial step in machine learning and deep learning pipelines. TensorFlow
provides several tools and APIs for loading and preprocessing data efficiently. Here are some common methods:
1. Loading Data: TensorFlow provides a tf.data API that makes it easy to load and process data. This API
provides several classes for reading data from different sources, such as files, numpy arrays, or tensors. For
example, to read data from a CSV file, you can use tf.data.experimental.CsvDataset.
2. Data Augmentation: Data augmentation is a technique used to increase the diversity of the training set by
applying various transformations to the data. TensorFlow provides several APIs for data augmentation,
such as tf.image, which provides methods for image processing, such as flipping, rotating, and resizing.
3. Normalization: Normalizing the data is an essential preprocessing step that helps to ensure that the input
features are in the same scale. You can use the tf.keras.layers.Normalization API to normalize the data.
4. Batching: Batching is a technique used to split the data into small batches and process them in parallel.
TensorFlow provides several APIs for batching, such as tf.data.Dataset.batch.
5. Shuffling: Shuffling the data helps to prevent the model from overfitting to the order of the examples in
the training set. You can use the tf.data.Dataset.shuffle API to shuffle the data.
6. Caching: Caching the data in memory can help to speed up the training process. You can use the
tf.data.Dataset.cache API to cache the data.
import tensorflow as tf

# Loading data from a CSV file (file name and column types are placeholders)
dataset = tf.data.experimental.CsvDataset(
    'data.csv', record_defaults=[tf.float32, tf.float32, tf.float32])

# Data augmentation would be applied here (e.g. tf.image operations for image data)

# Normalization (adapted on the first CSV column; axis=None computes a single scalar mean/variance)
normalization_layer = tf.keras.layers.Normalization(axis=None)
normalization_layer.adapt(dataset.map(lambda x, y, z: x))

# Batching
dataset = dataset.batch(32)

# Shuffling (in practice, shuffling is usually applied before batching)
dataset = dataset.shuffle(buffer_size=1000)

# Caching (in practice, caching is usually applied before shuffling so each epoch is reshuffled)
dataset = dataset.cache()

# Training (assumes a compiled Keras `model` and a dataset mapped to (features, label) pairs)
model.fit(dataset, epochs=10)
1. The Data API
The Data API in TensorFlow is a set of tools and APIs for building efficient and scalable input pipelines for machine
learning models. It provides a simple and flexible way to read and preprocess data from various sources, such as
CSV files, image files, and databases, and feed it to a machine learning model for training or inference.
The Data API is built on top of the tf.data module, which provides a collection of classes for building input
pipelines. These classes can be used to read and preprocess data in a highly parallel and efficient manner, allowing
you to take full advantage of your hardware resources. Key features of the Data API include:
1. Efficient data loading: The Data API provides a number of tools for efficient data loading, including
support for reading data from various file formats, shuffling, batching, and prefetching. These tools allow
you to minimize the time spent on input processing and maximize the time spent on training your model.
2. Flexible data preprocessing: The Data API supports a wide range of data preprocessing operations, such
as data augmentation, normalization, and feature engineering. These operations can be applied in a highly
parallel and efficient manner, allowing you to easily preprocess your data on the fly as it is being read
from disk.
3. Customizable pipeline construction: The Data API provides a flexible and extensible pipeline construction
framework, allowing you to easily build custom input pipelines that suit your specific needs. You can easily
add new preprocessing operations, data sources, or data formats to your pipeline, and the API will take
care of the rest.
4. Multi-GPU and distributed training: The Data API provides built-in support for multi-GPU and distributed
training, allowing you to scale your input pipeline to multiple GPUs or even multiple machines. This can
significantly reduce the time required to train large models on large datasets.
Here is an example of how to use the Data API to build an input pipeline for a machine learning model:
import tensorflow as tf
import numpy as np

# Toy in-memory data: 100 samples with 2 features and 1 target value (placeholders)
x = np.random.rand(100, 2).astype('float32')
y = np.random.rand(100, 1).astype('float32')
dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(32)
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)

# Define, compile and train a simple model
inputs = tf.keras.layers.Input(shape=(2,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(dataset, epochs=10)
2. TFRecord Format
TFRecord is a binary file format used in TensorFlow for efficient storage and loading of large datasets. It is a
flexible and extensible format that is designed to be fast, scalable, and platform-independent. The TFRecord
format consists of a sequence of binary records, each containing a serialized protocol buffer message. Protocol
buffers are a language-neutral, platform-neutral, extensible way of serializing structured data, developed by
Google.
The TFRecord format is commonly used for large-scale machine learning tasks, where datasets can be very large
and difficult to store and load efficiently. The format is especially useful when dealing with datasets that contain
a large number of small files, such as image datasets.
Here are the main steps for creating and reading TFRecord files in TensorFlow:
Creating a TFRecord file: To create a TFRecord file, you need to define a schema for your data and then serialize
the data into binary records. Here is an example of how to create a TFRecord file:
import tensorflow as tf

# Example record to write (placeholder values)
data = {'feature1': 0.5, 'feature2': 'hello'}

with tf.io.TFRecordWriter('data.tfrecord') as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        'feature1': tf.train.Feature(float_list=tf.train.FloatList(value=[data['feature1']])),
        'feature2': tf.train.Feature(bytes_list=tf.train.BytesList(value=[data['feature2'].encode('utf-8')])),
    }))
    writer.write(example.SerializeToString())
Reading a TFRecord file: To read a TFRecord file, you need to define a schema for the data and then parse the
binary records into tensors. Here is an example of how to read a TFRecord file:
import tensorflow as tf

feature_description = {
    'feature1': tf.io.FixedLenFeature([], tf.float32),
    'feature2': tf.io.FixedLenFeature([], tf.string),
}
def parse_example(example_proto):
    return tf.io.parse_single_example(example_proto, feature_description)
dataset = tf.data.TFRecordDataset('data.tfrecord')
dataset = dataset.map(parse_example)
The parse_example function takes a binary record and returns a dictionary of tensors. The TFRecordDataset class
reads the TFRecord file and returns a dataset of binary records. The map method applies the parse_example
function to each binary record to parse it into a dictionary of tensors.
3. Features of TensorFlow's APIs
APIs in TensorFlow, an open-source machine learning framework developed by Google, have some unique
features that make it easier for developers to build and deploy machine learning models. Some of the key features
of TensorFlow's API include:
1. High-level abstractions: TensorFlow's API provides high-level abstractions for building machine learning
models, allowing developers to focus on the overall structure of the model rather than the low-level
details of the implementation.
2. Compatibility with multiple programming languages: TensorFlow's API supports multiple programming
languages, including Python, C++, and Java, making it easier for developers to integrate machine learning
models into their existing software systems.
3. Flexibility: TensorFlow's API is designed to be flexible and modular, allowing developers to build custom
models and integrate them with other tools and frameworks.
4. Distributed computing: TensorFlow's API supports distributed computing, allowing developers to train
and deploy machine learning models across multiple machines and devices.
5. Pre-built models: TensorFlow's API includes a library of pre-built models for common use cases, such as
image classification and natural language processing, making it easier for developers to get started with
machine learning.
6. Visualization tools: TensorFlow's API includes tools for visualizing and monitoring the performance of
machine learning models, making it easier for developers to identify and fix issues.
4. TF Transform
TF Transform (TFT) is a library in TensorFlow that enables data preprocessing, cleaning, and feature engineering
for machine learning models. TFT is designed to be used as part of the TensorFlow Extended (TFX) platform,
which is a production-ready framework for building scalable and deployable machine learning pipelines.
TFT provides a set of pre-built transformations for cleaning and preprocessing data, such as scaling, normalization,
and one-hot encoding. It also enables custom transformations to be created and added to the pipeline, allowing
developers to tailor the preprocessing steps to the specific needs of their application.
One of the key features of TFT is its ability to process data in a distributed and scalable manner. It uses Apache
Beam, a distributed data processing framework, to parallelize the preprocessing steps across multiple machines.
This makes it possible to handle large datasets and improve the efficiency of the preprocessing pipeline.
TFT also integrates with other TensorFlow components, such as Estimators and Keras, making it easy to
incorporate data preprocessing and feature engineering into the overall machine learning workflow.
Overall, TF Transform is a powerful tool for data preprocessing and feature engineering in TensorFlow, enabling
developers to clean and transform data in a distributed and scalable manner, while also providing the flexibility
to create custom transformations tailored to their specific application.
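As a brief illustration, here is a minimal sketch of a TFT preprocessing_fn; the feature names and the particular transformations are placeholders chosen for illustration, and a full TFX / Apache Beam pipeline would be needed to actually run it:

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # `inputs` is a dict mapping feature names to tensors (the names here are hypothetical)
    return {
        # scale a numeric feature to zero mean and unit variance
        'age_scaled': tft.scale_to_z_score(inputs['age']),
        # map a string feature to integer ids using a vocabulary computed over the whole dataset
        'city_id': tft.compute_and_apply_vocabulary(inputs['city']),
        # pass the label through unchanged
        'label': inputs['label'],
    }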
5. The TensorFlow Datasets (TFDS) Project
The TensorFlow Datasets (TFDS) project is a collection of preprocessed, ready-to-use datasets for machine learning
applications, provided by the TensorFlow team. The goal of TFDS is to make it easy for developers to access and
use high-quality datasets, without the need for extensive preprocessing or data cleaning.
TFDS includes a wide range of datasets, covering topics such as computer vision, natural language processing, and
audio analysis. Some of the popular datasets available in TFDS include MNIST, CIFAR-10, and the IMDb movie
review dataset.
One of the key features of TFDS is its ease of use. The datasets are packaged as TensorFlow Dataset objects, which
can be easily imported into a TensorFlow workflow using a simple API. The datasets also come with predefined splits (such as training and test sets) and can be shuffled and batched consistently.
TFDS also provides a range of useful features for working with datasets, including data augmentation, caching,
and preprocessing. These features can help improve the quality of machine learning models, while also reducing
the time and effort required for data preprocessing.
Another benefit of TFDS is its community-driven approach. The project is open source, and developers are
encouraged to contribute new datasets, as well as improvements and bug fixes to existing datasets.
Using TensorFlow Datasets (TFDS) is a straightforward process that involves importing the dataset, configuring
any required parameters, and then using it in a TensorFlow workflow. Here's an example of how to use the
CIFAR-10 dataset in TensorFlow:
import tensorflow as tf
import tensorflow_datasets as tfds

# Load dataset (as_supervised=True yields (image, label) pairs)
dataset, info = tfds.load('cifar10', split='train', as_supervised=True, with_info=True)

# Configure dataset: scale images to [0, 1], then shuffle, batch and prefetch
dataset = dataset.map(lambda image, label: (tf.cast(image, tf.float32) / 255.0, label))
dataset = dataset.shuffle(1024).batch(32).prefetch(tf.data.AUTOTUNE)

# Define model
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax')
])
# Compile model
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train model
model.fit(dataset, epochs=5)
In this example, we first import TensorFlow and TensorFlow Datasets. We then load the CIFAR-10 dataset using the load function, specifying the train split, the as_supervised flag (so that each example is an (image, label) pair), and the with_info flag, which includes information about the dataset such as the number of classes.
We then configure the dataset by scaling the images to the range 0 to 1, shuffling the data, batching it into groups of 32, and prefetching it to improve performance.
Next, we define a simple convolutional neural network model using the Keras Sequential API, and compile it with
the Adam optimizer and the sparse categorical cross-entropy loss function.
Finally, we train the model on the CIFAR-10 dataset using the fit function, passing in the configured dataset and
specifying the number of epochs to train for.
This is just a simple example, but it demonstrates how easy it is to use TensorFlow Datasets in a TensorFlow
workflow. By using preprocessed datasets like CIFAR-10, we can focus on building and training machine learning
models, rather than worrying about data preprocessing and cleaning.