
MACHINE LEARNING

UNIT-III: Advanced Supervised Learning

Neural Networks: Introduction, Perceptron, Multilayer Perceptron.
Support Vector Machines: Linear and Non-Linear, Kernel Functions, K-Nearest Neighbors.
Probabilistic Models: Bayesian Learning, Bayes Optimal Classifier, Naïve Bayes Classifier, Bayesian Belief Networks.

Neural Networks:
A neural network consists of connected units or nodes called artificial neurons, which loosely model
the neurons in the brain. Artificial neuron models that mimic biological neurons more closely have also
been recently investigated and shown to significantly improve performance. These are connected
by edges, which model the synapses in the brain. Each artificial neuron receives signals from connected
neurons, then processes them and sends a signal to other connected neurons. The "signal" is a real
number, and the output of each neuron is computed by some non-linear function of the sum of its inputs,
called the activation function. The strength of the signal at each connection is determined by a weight,
which adjusts during the learning process.
Typically, neurons are aggregated into layers. Different layers may perform different transformations on
their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly
passing through multiple intermediate layers (hidden layers). A network is typically called a deep neural
network if it has at least two hidden layers.
Artificial neural networks are used for various tasks, including predictive modeling, adaptive control, and
solving problems in artificial intelligence. They can learn from experience, and can derive conclusions from
a complex and seemingly unrelated set of information.

Artificial and biological neurons:

Feature | Biological Neural Network (BNN) | Artificial Neural Network (ANN)
Building Block | Real neurons with dendrites, cell bodies, and axons. | Digital units (nodes) that handle numeric weights and activations.
Signal Type | Electrical impulses (action potentials) plus chemical transmitters. | Numeric values passed between layers.
Learning Process | Local changes in synapse strength (e.g., neurons that fire together wire together). | Data-driven adjustments of weights guided by error calculations.
Adaptation | Continuous remodeling of connections through growth or pruning (plasticity). | Structured training phases that depend on large datasets and backpropagation.
Energy Use | About 20 watts to run a vast parallel system. | Often requires powerful processors, which can draw far more energy.
Fault Tolerance | Can reroute signals if neurons or areas are damaged. | Sensitive to broken nodes unless special redundancies are designed.
Memory Storage | Distributed across connected neurons; shaped by daily experiences. | Encoded in numeric weights; may forget old tasks when retrained.
Speed | Slower signal transfer but extensive parallelism. | Can compute very quickly on specialized hardware, but often in a more linear sequence.
Interpretability | Hard to measure individual neuron contributions in complex thoughts. | Sometimes seen as a "black box" of weights and biases.

In summary, while both biological neurons and artificial neurons share the foundational concept of
transmitting and processing signals to achieve specific outcomes, they differ in terms of their physical
structure, signalling mechanisms, learning processes, and overall computational capabilities.
The process of training a neural network involves providing it with a labeled dataset, known as training
data, and adjusting the weights of the connections iteratively to minimize the difference between the
predicted outputs and the actual target outputs. This optimization process is typically achieved using
gradient descent algorithms.
Think of each individual node as its own linear regression model, composed of input data, weights, a bias
(or threshold), and an output. The formula would look something like this:
∑wixi + bias = w1x1 + w2x2 + w3x3 + bias
Output = f(x) = 1 if ∑wixi + b >= 0
                0 if ∑wixi + b < 0
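As a quick worked example (the numbers here are assumed purely for illustration): with weights w = (0.5, -0.6, 0.3), inputs x = (1, 2, 3), and bias = 0.4, the weighted sum is 0.5·1 - 0.6·2 + 0.3·3 + 0.4 = 0.6. Since 0.6 >= 0, the step activation outputs 1.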

Perceptron:
A perceptron is one of the simplest types of artificial neural networks. It's a mathematical model that's
used for binary classification tasks, where the goal is to separate input data into two classes. The
perceptron takes a set of inputs, each multiplied by corresponding weights, and then applies an activation
function to produce an output. This output is used to classify the input into one of the two classes. The
perceptron is very useful for classifying data sets that are linearly separable.

The basic structure of a perceptron consists of the following components:


Inputs: Each input is associated with a weight, which represents the importance of that input in the
classification process.
Weights: Each input is multiplied by a weight, and the weighted sum of inputs is calculated.
Summation Function: The weighted sum of inputs is passed through a summation function to calculate a value, which is then used as the input to the activation function.
Activation Function: The activation function introduces non-linearity to the model. The most common
activation function used in perceptrons is the step function (also known as the Heaviside step function),
which produces a binary output based on whether the input is greater than a certain threshold.
Output: The output of the activation function serves as the final classification result. If the output exceeds
the threshold, the perceptron assigns one class to the input; otherwise, it assigns the other class.
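The following is a minimal from-scratch sketch of a perceptron in Python, with a step activation and the classic error-driven weight update. The learning rate, epoch count, and AND-gate data are illustrative assumptions, not part of the notes above.

import numpy as np

def step(z):
    # Heaviside step activation: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def train_perceptron(X, y, lr=0.1, epochs=20):
    # Perceptron learning rule: nudge the weights toward the target
    # whenever the prediction is wrong.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - step(np.dot(w, xi) + b)   # 0 if correct, +/-1 if wrong
            w += lr * error * xi
            b += lr * error
    return w, b

# Learn the logical AND function, a linearly separable problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([step(np.dot(w, xi) + b) for xi in X])   # expected: [0, 0, 0, 1]

Because the AND data are linearly separable, the perceptron convergence theorem guarantees that this update rule finds a separating hyperplane after finitely many mistakes.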
Multilayer Perceptron:
Multi-layer perceptron is also known as MLP. It consists of fully connected dense layers, which transform an input of any dimension into the desired dimension.
A multi-layer perceptron is a neural network that has multiple layers. To create a neural network, we combine neurons together so that the outputs of some neurons are inputs of other neurons.
The typical architecture of a neural network consists of three main types of layers:
Input Layer: This layer receives raw data or features as inputs and passes them on to the subsequent layers
for processing.
Hidden Layers: These intermediate layers process the input data through a series of weighted calculations
and activation functions. Each neuron in a hidden layer is connected to every neuron in the previous layer,
and their interconnectedness allows the network to learn intricate patterns in the data.
Output Layer: This layer produces the final output of the neural network. The number of neurons in the
output layer depends on the specific task the network is designed for. For example, a neural network for
image classification might have neurons corresponding to different classes (e.g., "cat," "dog," "car").
Neurons in each layer apply weights to the input data and sum up the results. The sum then undergoes an
activation function that introduces non-linearity into the network, allowing it to learn complex
relationships between input features. Common activation functions include the sigmoid function, ReLU
(Rectified Linear Unit), and tanh (hyperbolic tangent).

The algorithm for the MLP is as follows:


1. Just as with the perceptron, the inputs are pushed forward through the MLP by taking the dot product of
the input with the weights that exist between the input layer and the hidden layer (WH).
2. MLPs utilize activation functions at each of their calculated layers. There are many activation functions
to discuss: rectified linear units (ReLU), sigmoid function, tanh. Push the calculated output at the current
layer through any of these activation functions.
3. Once the calculated output at the hidden layer has been pushed through the activation function, push it
to the next layer in the MLP by taking the dot product with the corresponding weights.
4. Repeat steps 2 and 3 until the output layer is reached.
5. At the output layer, the calculations will either be used for a backpropagation algorithm that corresponds to the activation function that was selected for the MLP (in the case of training), or a decision will be made based on the output (in the case of testing).
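The forward pass described by steps 1-4 can be sketched in a few lines of NumPy. The layer sizes, random weights, and the ReLU/sigmoid choices below are illustrative assumptions.

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)               # one input sample with 3 features

# Input -> hidden layer (4 units): dot product with W_H, then ReLU (steps 1-2).
W_H = rng.normal(size=(3, 4))
b_H = np.zeros(4)
hidden = relu(x @ W_H + b_H)

# Hidden -> output layer (1 unit): dot product with W_O, then sigmoid (step 3).
W_O = rng.normal(size=(4, 1))
b_O = np.zeros(1)
output = sigmoid(hidden @ W_O + b_O)
print(output)                        # a value in (0, 1), e.g. a class probability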
Support Vector Machines:
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and
regression tasks. It's particularly effective for binary classification problems, where the goal is to separate
data points into two classes. The primary objective of an SVM is to find a hyperplane (or decision
boundary) that best separates the data points belonging to different classes.
The "support vectors" in SVM refer to the data points that lie closest to the decision boundary. These
support vectors play a critical role in defining the decision boundary because they determine the position
and orientation of the hyperplane.
The key idea of SVM is to maximize the margin between the classes, which is the distance between the
decision boundary and the nearest data points from both classes. Maximizing this margin helps the SVM
achieve better generalization to unseen data and enhances its ability to handle outliers.
SVMs can handle both linearly separable and non-linearly separable data through the use of the "kernel
trick." The kernel trick involves transforming the original feature space into a higher-dimensional space,
where the data might become linearly separable. Common kernel functions include linear, polynomial,
radial basis function (RBF or Gaussian), and sigmoid kernels.
SVMs are applied in image classification, natural language processing, credit scoring, gesture recognition, handwriting recognition, fraud detection, text classification, and sentiment analysis.
The general steps of building an SVM model are as follows:
Data Collection and Preprocessing: Gather labeled data for training. Ensure the data is properly
preprocessed, which may involve scaling, normalization, and handling missing values.
Choosing a Kernel and Model Parameters: Select an appropriate kernel function and tune the associated
hyperparameters, such as the regularization parameter (C) and kernel-specific parameters.
Training the Model: Feed the labeled training data into the SVM algorithm. The algorithm finds the
hyperplane that best separates the classes while maximizing the margin and considering the support
vectors.
Prediction: Once the SVM model is trained, it can be used to make predictions on new, unseen data.
Evaluation: Assess the performance of the model using appropriate evaluation metrics (e.g., accuracy,
precision, recall, F1-score) on a separate validation or test dataset.
Fine-Tuning and Optimization: If the model's performance is not satisfactory, fine-tune the model by
adjusting the hyperparameters and exploring different kernel functions. SVMs are widely used in various
domains such as image classification, text categorization, bioinformatics, and finance due to their ability to
handle complex data and their theoretical grounding in optimization and statistical learning theory.
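The steps above can be sketched with scikit-learn roughly as follows; the synthetic dataset and the specific hyperparameter values (kernel, C, gamma) are illustrative assumptions, not prescriptions.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Data collection and preprocessing: synthetic labeled data, split, and scaling.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Choosing a kernel and model parameters, then training the model.
model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_train, y_train)

# Prediction and evaluation on the held-out test set.
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))

Fine-tuning would then repeat the kernel/hyperparameter choice (e.g., via a grid search) if this accuracy is not satisfactory.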
Linear and Non-Linear Kernels:
1. Linear Kernel:
• Decision Boundary:
o Form: The linear kernel produces a decision boundary that is a hyperplane in the feature space. This hyperplane separates data points from different classes in a linear fashion.
o Assumption: It assumes that the relationship between the features and the target variable is linear.
• Use Cases:
o Linearly Separable Data: Linear kernels are most effective when the data can be effectively separated by a straight line, plane (in 3D), or hyperplane (in higher dimensions).
o Simplicity: They are computationally less expensive and are suitable for simple, linear relationships.
• Example:
o 2D Data: Imagine a dataset with two features, where the classes are separated by a straight line.
2. Non-linear Kernel:
• Decision Boundary:
o Form: Non-linear kernels, such as Polynomial or Radial Basis Function (RBF) kernels, allow for more complex decision boundaries. These can be curves, circles, or more intricate shapes.
o Flexibility: They provide more flexibility in capturing complex relationships in the data.
• Use Cases:
o Non-linear Relationships: Non-linear kernels are beneficial when the relationship between features and the target variable is not linear. They can capture more intricate patterns.
o Complex Data: In scenarios where the data is not easily separable with a straight line, a non-linear kernel can perform better.
• Example:
o Circles or Spirals: Imagine a dataset where classes are arranged in circles or spirals, and a straight line cannot effectively separate them.
Common non-linear kernel functions include:
1. Polynomial Kernel:
• Decision Boundary:
o Form: The polynomial kernel introduces non-linearity by using polynomial functions of the original features. The decision boundary can have curves and turns.
o Degree Parameter: The degree of the polynomial is a parameter that determines the complexity of the decision boundary.
• Use Cases:
o Polynomial Relationships: Suitable for data with polynomial relationships between features and classes.
• Example:
o Data with Curves: Consider a dataset where the relationship between features and classes follows a polynomial curve.
2. Radial Basis Function (RBF) Kernel:
• Decision Boundary:
o Form: The RBF kernel, also known as the Gaussian kernel, creates a decision boundary based on the similarity of data points to a reference point (or points). It can create complex, non-linear decision boundaries.
o Bandwidth Parameter: The bandwidth parameter determines the reach or influence of a data point in defining the decision boundary.
• Use Cases:
o Complex Relationships: Effective for capturing complex and non-linear relationships in the data.
• Example:
o Data with Clusters: In a dataset with clusters of data points, the RBF kernel can create decision boundaries around these clusters.
3. Overfitting and Underfitting:
• Linear Kernel: Linear kernels may underfit if the data has complex non-linear patterns.
• Non-linear Kernel: Non-linear kernels, especially with a high degree or bandwidth, can lead to overfitting if not properly tuned.
4. Computational Complexity:
• Linear Kernel: Training an SVM with a linear kernel is often computationally less expensive.
• Non-linear Kernel: Non-linear kernels, especially RBF, can be computationally more intensive, especially with large datasets.
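A small sketch contrasting the two kernel types on data arranged in concentric circles; the dataset and parameter choices are illustrative assumptions.

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two classes arranged as an inner and an outer circle: not linearly separable.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(kernel, "test accuracy:", clf.score(X_test, y_test))
# The linear kernel typically struggles here, while the RBF kernel separates
# the inner and outer circles with a curved decision boundary.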
Pros and cons of SVM:
Pros:
• It is very effective in high-dimensional spaces.
• It is effective when the number of features is greater than the number of training examples.
• It is among the best algorithms when the classes are separable.
• The hyperplane is affected only by the support vectors, so outliers have less impact. SVM is suited to extreme-case binary classification.
Cons:
• For larger datasets, it requires a large amount of time to train.
• It does not perform well when classes overlap.
• Selecting appropriate hyperparameters for the SVM that allow sufficient generalization performance can be difficult.
• Selecting the appropriate kernel function can be tricky.

K-Nearest Neighbours Algorithm:


K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning technique. The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
The K-NN algorithm stores all of the available data and classifies a new data point based on this similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
K-NN can be used for regression as well as classification, but it is mostly used for classification problems. K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
The K-NN working can be explained on the basis of the below algorithm:
o Step-1: Select the number K of the neighbours
o Step-2: Calculate the Euclidean distance of K number of neighbours
o Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
o Step-4: Among these k neighbours, count the number of the data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbours is maximum.
o Step-6: Our model is ready.
Suppose there are two categories, Category A and Category B, and we have a new data point x1; we need to decide which of these categories it belongs to. To solve this type of problem, we need the K-NN algorithm: with its help, we can easily identify the category or class of a new data point, as in the sketch below.
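A minimal from-scratch sketch of these steps in Python; the toy points for Category A and Category B and the choice of K are illustrative assumptions.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 2: Euclidean distance from the new point to every stored point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: take the K nearest neighbours.
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count the categories among the neighbours and pick the majority.
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))   # expected: "A"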
Some points to remember while selecting the value of K in the K-NN algorithm:
o There is no particular way to determine the best value for "K", so we need to try several values to find the best among them. The most commonly preferred value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and make the model sensitive to outliers.
o Large values for K reduce the effect of noise, but the model may then have difficulty capturing finer class boundaries.
Advantages of KNN Algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective when the training data is large.
Disadvantages of KNN Algorithm:
• The value of K always needs to be determined, which can sometimes be complex.
• The computation cost is high because the distance from the new data point to all of the training samples must be calculated.
Bayesian Learning
Bayesian learning is a statistical approach used to model uncertainty and update beliefs based on new
evidence. It employs Bayesian reasoning, which provides a probabilistic framework for inference. This
approach is based on the assumption that the quantities of interest are governed by probability
distributions, and that optimal decisions can be made by reasoning about these probabilities in conjunction
with observed data.
Bayesian learning methods are significant in the study of machine learning for two main reasons:
1. Practical Application: Bayesian algorithms that compute explicit probabilities for hypotheses—such
as the Naive Bayes classifier—are among the most practical and effective methods for solving
certain types of learning problems.
2. Theoretical Insight: Bayesian methods also offer a valuable perspective for understanding many
learning algorithms that do not explicitly handle probabilities. This probabilistic viewpoint helps to
deepen the theoretical understanding of various machine learning techniques.
Bayesian learning is commonly applied in machine learning, especially to problems involving classification and regression. The Bayesian method, an important concept built on Bayes' theorem, is used to calculate conditional probabilities in machine learning applications such as classification tasks.
Bayes' theorem is also known by other names, such as Bayes' rule or Bayes' law. It helps determine the probability of an event given prior knowledge: it is used to calculate the probability of one event occurring given that another event has already occurred, and it relates conditional probability to marginal probability.
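In its standard form, Bayes' theorem gives the posterior probability of a hypothesis H given observed data D as

P(H | D) = P(D | H) · P(H) / P(D)

where P(H) is the prior probability of the hypothesis, P(D | H) is the likelihood of the data under that hypothesis, and P(D) is the marginal probability of the data. As a small worked example (the numbers are assumed for illustration only): if 20% of emails are spam, the word "offer" appears in 50% of spam emails, and "offer" appears in 25% of all emails, then P(spam | "offer") = 0.5 × 0.2 / 0.25 = 0.4.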
Naïve Bayes Classification:
Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for
solving classification problems. It is mainly used in text classification that includes a high-dimensional
training dataset. Naïve Bayes Classifier is one of the simple and most effective Classification algorithms
which helps in building the fast machine learning models that can make quick predictions. It is a
probabilistic classifier, which means it predicts on the basis of the probability of an object. Popular applications of the Naïve Bayes algorithm include spam filtering, sentiment analysis, and classifying articles.
Naive Bayes classifier is a probabilistic machine learning algorithm that is used for classification tasks. It is
based on Bayes' theorem and makes the assumption that the features used to predict the class labels are
independent of each other, which is a strong and often unrealistic assumption. Despite this simplifying
assumption, Naive Bayes classifiers work surprisingly well in many real-world situations, especially in text
classification problems.

Naive Bayes Algorithm


The Naive Bayes algorithm is a probabilistic machine learning model based on Bayes' theorem. It is
particularly effective for classification tasks. The algorithm works in two main phases: Training and
Prediction.
1. Training Phase:
Step 1: Calculate Class Priors (Prior Probabilities of Classes)
 Determine the probability of each class occurring in the dataset.
 This is done by dividing the number of instances in each class by the total number of instances.
Step 2: Calculate Likelihoods of Features
 For each feature, compute the probability of that feature occurring given a specific class.
 These are called the likelihoods and are calculated for all features across all classes.
2. Prediction Phase:
Step 1: Calculate Posterior Probabilities for Each Class
 For a new data point, use Bayes' theorem to calculate the posterior probability of it belonging to
each class.
 This involves multiplying the prior probability of each class by the likelihoods of the observed
features.
Step 2: Predict the Class
 The algorithm assigns the class label with the highest posterior probability as the predicted class
for the new data point.
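A minimal from-scratch sketch that mirrors the two phases above on a tiny categorical dataset; the toy weather/play data and the single feature are illustrative assumptions.

from collections import Counter, defaultdict

# Each example is (feature value, class label); assumed toy data.
data = [("sunny", "yes"), ("sunny", "no"), ("rainy", "no"),
        ("rainy", "no"), ("overcast", "yes"), ("sunny", "yes")]

# Training phase, step 1: class priors P(class).
class_counts = Counter(label for _, label in data)
priors = {c: n / len(data) for c, n in class_counts.items()}

# Training phase, step 2: likelihoods P(feature | class).
feature_counts = defaultdict(Counter)
for feature, label in data:
    feature_counts[label][feature] += 1
likelihoods = {c: {f: n / class_counts[c] for f, n in counts.items()}
               for c, counts in feature_counts.items()}

# Prediction phase: posterior score = prior * likelihood, pick the largest.
def predict(feature):
    scores = {c: priors[c] * likelihoods[c].get(feature, 0) for c in priors}
    return max(scores, key=scores.get)

print(predict("sunny"))   # expected: "yes" (2 of the 3 sunny days are "yes")

With several features, the same idea applies: the prior is multiplied by the likelihood of each observed feature, relying on the independence assumption.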
Bayesian Belief Networks:
In machine learning, a Bayesian belief network (BBN), also known as a Bayesian network, is a probabilistic
graphical model that represents the probabilistic relationships among a set of variables.
BBNs are used for various tasks, including classification, regression, and decision-making, especially when
dealing with uncertain or incomplete information.
BBN is a probabilistic graphical model (PGM) that represents conditional dependencies between variables through a Directed Acyclic Graph (DAG). It is well suited to representing probabilistic relationships between multiple events (more than two).
In a BBN, each attribute (node) is conditionally independent of its non-descendants given its parents in the graph. Because the network encodes the joint probability in this factorised form, the probability attached to each node is a conditional probability, P(attribute | parent), i.e., the probability of an attribute given the values of its parent attributes.
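A small sketch of a two-node belief network, Rain -> GrassWet, where the joint probability factorises as P(Rain) · P(GrassWet | Rain); the probability tables below are illustrative assumptions.

# P(Rain) and P(GrassWet | Rain), stored as plain dictionaries.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: {True: 0.9, False: 0.1},    # if it rains
                    False: {True: 0.1, False: 0.9}}   # if it does not rain

def joint(rain, wet):
    # Joint probability from the DAG factorisation: P(Rain) * P(GrassWet | Rain).
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Inference by enumeration: P(Rain=True | GrassWet=True).
numerator = joint(True, True)
denominator = joint(True, True) + joint(False, True)
print(numerator / denominator)   # 0.18 / 0.26, roughly 0.692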
