Machine Learning in 10 Pages PDF


Cheat Sheet – Bias-Variance Tradeoff

What is Bias?
• Error between average model prediction and ground truth
• The bias of the estimated function tells us the capacity of the underlying model to
predict the values
What is Variance?
• Average variability in the model prediction for the given dataset
• The variance of the estimated function tells you how much the function can adjust
to the change in the dataset
High Bias (overly-simplified model)
• Under-fitting
• High error on both train and test data

High Variance (overly-complex model)
• Over-fitting
• Low error on train data, high error on test data
• Starts modelling the noise in the input

[Figure: Error vs. model complexity. Total error is the sum of bias and variance; the under-fitting side is high bias/low variance, the over-fitting side is low bias/high variance, and the minimum error lies at the "just right" point in between. The simpler (high-bias) model is preferred if the dataset is small; the more complex (high-variance) model is preferred if the dataset is large.]
Bias-Variance Trade-off
• Increasing bias (not always) reduces variance, and vice versa
• Error = bias² + variance + irreducible error
• The best model is the one that minimizes the total error
• It is a compromise between bias and variance
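A minimal numpy sketch of this decomposition: the ground-truth function, noise level, and polynomial degrees below are illustrative assumptions (not from the cheat sheet), chosen only to show an under-fitting model with high bias and an over-fitting model with high variance.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)           # assumed ground-truth function
x_test = np.linspace(0, 1, 50)

def bias_variance(degree, n_datasets=200, n_points=25, noise=0.3):
    # Fit the same model class on many noisy training sets drawn from f
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = f(x) + rng.normal(0, noise, n_points)
        coeffs = np.polyfit(x, y, degree)      # polynomial fit of the given complexity
        preds[i] = np.polyval(coeffs, x_test)
    bias_sq = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)   # (average prediction - truth)^2
    variance = np.mean(preds.var(axis=0))                      # spread of the predictions
    return bias_sq, variance

for d in (1, 3, 9):    # under-fitting, about right, over-fitting
    b2, v = bias_variance(d)
    print(f"degree {d}: bias^2 = {b2:.3f}, variance = {v:.3f}")
```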
Source: https://www.cheatsheets.aqeel-anwar.com Tutorial: Click here
Cheat Sheet – Imbalanced Data in Classification
[Figure: an imbalanced dataset (Blue: Label 1, Green: Label 0) with far more blue samples than green.]

Accuracy = Correct Predictions / Total Predictions

A classifier that always predicts label blue yields a prediction accuracy of 90%.
Accuracy doesn't always give the correct insight about your trained model.
Accuracy (%age of correct predictions): correct predictions over total predictions; one value for the entire network
Precision (exactness of the model): of the detected cats, how many were actually cats; one value per class/label
Recall (completeness of the model): correctly detected cats over total cats; one value per class/label
F1 Score (combines precision and recall): harmonic mean of precision and recall; one value per class/label

Performance metrics associated with Class 1 (from the confusion matrix of predicted vs. actual labels):
• True Positive (TP): predicted 1, actual 1    • False Positive (FP): predicted 1, actual 0
• False Negative (FN): predicted 0, actual 1   • True Negative (TN): predicted 0, actual 0

Precision = TP / (TP + FP)
Recall (Sensitivity, True +ve rate) = TP / (TP + FN)
Specificity = TN / (TN + FP)
False +ve rate = FP / (TN + FP)
F1 score = 2 x (Prec x Rec) / (Prec + Rec)
Accuracy = (TP + TN) / (TP + FN + FP + TN)
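A small sketch that evaluates these formulas; the confusion-matrix counts are made up purely for illustration.

```python
# Hypothetical confusion-matrix counts for class 1
TP, FP, FN, TN = 80, 15, 10, 895

precision   = TP / (TP + FP)
recall      = TP / (TP + FN)                  # sensitivity / true positive rate
specificity = TN / (TN + FP)
fpr         = FP / (TN + FP)                  # false positive rate
f1          = 2 * precision * recall / (precision + recall)
accuracy    = (TP + TN) / (TP + FP + FN + TN)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```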

Possible solutions
1. Data Replication: Replicate the available (minority-class) data until the number of samples is comparable.
2. Synthetic Data: For images, rotate, dilate, crop, or add noise to existing input images to create new data.
3. Modified Loss: Modify the loss to reflect greater error when misclassifying the smaller sample set, e.g.
   Loss = α · Loss_green + β · Loss_blue, with α > β   (see the sketch after this list)
4. Change the algorithm: Increase the model/algorithm complexity so that the two classes are perfectly
   separable (con: overfitting). In the toy example, no straight line y = ax passing through the origin can
   perfectly separate the data (the best such solution is the line y = 0, predicting all labels blue), whereas a
   straight line y = ax + b can perfectly separate the data, so the green class is no longer predicted as blue.
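A minimal sketch of the "modified loss" idea: a class-weighted binary cross-entropy in which the rarer green class (label 0) gets a larger weight α than the majority blue class (label 1). The weights, data, and function name are illustrative assumptions.

```python
import numpy as np

def weighted_bce(y_true, y_pred, alpha=0.9, beta=0.1, eps=1e-12):
    """Binary cross-entropy with a larger weight (alpha) on the minority class (label 0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss_green = -(1 - y_true) * np.log(1 - y_pred)   # errors on label 0 (minority)
    loss_blue  = -y_true * np.log(y_pred)             # errors on label 1 (majority)
    return np.mean(alpha * loss_green + beta * loss_blue)

y_true = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0])     # 90% blue, 10% green
y_pred = np.full(10, 0.99)                            # "always predict blue"
print(weighted_bce(y_true, y_pred))                   # heavily penalizes the one missed green sample
```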

Source: https://www.cheatsheets.aqeel-anwar.com Tutorial: Click here


Cheat Sheet – PCA Dimensionality Reduction
What is PCA?
• Based on the dataset, find a new set of orthogonal feature vectors such that the data spread (variance)
is maximum in the direction of each feature vector (or dimension)
• Ranks the feature vectors in decreasing order of data spread (variance)
• The datapoints have maximum variance along the first feature vector and minimum variance along the
last feature vector
• The variance of the datapoints in the direction of a feature vector can be taken as a measure of the
information in that direction
Steps
1. Standardize the datapoints
2. Find the covariance matrix from the given datapoints
3. Carry out eigen-value decomposition of the covariance matrix
4. Sort the eigenvalues and eigenvectors
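A minimal numpy sketch of steps 1-4; the toy data and variable names are illustrative assumptions, with X taken to be an (n_samples x n_features) array.

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 3))   # toy data: 100 samples, 3 features

# 1. Standardize the datapoints
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-value decomposition (symmetric matrix, so eigh is appropriate)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenvalues/eigenvectors in decreasing order of variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals)   # variance captured along each principal feature vector
```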

Dimensionality Reduction with PCA


• Keep the first m out of n feature vectors ranked by PCA. These m vectors are the best m vectors,
preserving the maximum information that could have been preserved with any m vectors on the given
dataset.
Steps:
1. Carry out steps 1-4 from above
2. Keep the first m feature vectors from the sorted eigenvector matrix
3. Transform the data to the new basis (feature vectors)
4. The importance of a feature vector is proportional to the magnitude of its eigenvalue
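As a cross-check of the steps above, the same reduction can be done with scikit-learn's PCA; the toy data and the choice m = 2 are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(100, 3))   # toy data
m = 2                                                # keep the first m feature vectors

X_std = StandardScaler().fit_transform(X)            # step 1: standardize
pca = PCA(n_components=m)
X_reduced = pca.fit_transform(X_std)                 # steps 2-4 plus projection onto the new basis

print(X_reduced.shape)                   # (100, 2)
print(pca.explained_variance_ratio_)     # importance is proportional to the eigenvalue magnitude
```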

Figure 1: Datapoints plotted against the original feature vectors (Feature #1 and Feature #2) as the x and y axes.
Figure 2: The cartesian coordinate system is rotated (New Feature #1, New Feature #2) to maximize the standard deviation along one axis (new feature #2).
Figure 3: The feature vector with the minimum standard deviation of datapoints (new feature #1) is removed and the data is projected onto new feature #2.

Source: https://www.cheatsheets.aqeel-anwar.com
Cheat Sheet – Bayes Theorem and Classifier
What is Bayes' Theorem?
• Describes the probability of an event based on prior knowledge of conditions that might be related to
the event, i.e., how the probability of an event changes when we have knowledge of another event.
• Bayes' Theorem: P(A|B) = P(B|A) · P(A) / P(B)
where P(A|B) is the posterior probability (usually a better estimate than P(A)), P(B|A) is the likelihood,
P(A) is the prior probability, and P(B) is the evidence.

Example
• Probability of fire P(F) = 1%
• Probability of smoke P(S) = 10%
• Probability of smoke given there is a fire P(S|F) = 90%
• What is the probability that there is a fire given we see smoke, P(F|S)?
  P(F|S) = P(S|F) · P(F) / P(S) = (0.9 × 0.01) / 0.1 = 9%

Maximum A Posteriori (MAP) Estimation

The MAP estimate of the random variable y, given that we have observed iid samples (x1, x2, x3, …), is
given by
    ŷ_MAP = argmax_y P(x1, x2, … | y) · P(y)
i.e., the y that maximizes the product of the prior and the likelihood. We try to accommodate our prior
knowledge when estimating.

Maximum Likelihood Estimation (MLE)

The MLE estimate of the random variable y, given that we have observed iid samples (x1, x2, x3, …), is
given by
    ŷ_MLE = argmax_y P(x1, x2, … | y)
i.e., the y that maximizes only the likelihood. We assume we don't have any prior knowledge of the
quantity being estimated.

MLE is a special case of MAP where our prior is uniform (all values are equally likely)
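A minimal numerical sketch of the difference, assuming a Bernoulli likelihood with a Beta(a, b) prior; the distributions and numbers are illustrative and not from the cheat sheet.

```python
import numpy as np

x = np.array([1, 1, 1, 0, 1])        # observed iid samples (e.g. coin flips)
heads, n = x.sum(), len(x)

# MLE: maximize only the likelihood
theta_mle = heads / n

# MAP: maximize prior * likelihood; a Beta(a, b) prior encodes the prior knowledge
a, b = 2.0, 2.0                      # with a uniform prior (a = b = 1) MAP equals MLE
theta_map = (heads + a - 1) / (n + a + b - 2)   # mode of the Beta posterior

print(f"MLE estimate: {theta_mle:.2f}, MAP estimate: {theta_map:.2f}")
```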

Naïve Bayes' Classifier (instantiation of MAP as a classifier)

Suppose we have two classes, y = y1 and y = y2, and more than one evidence/feature (x1, x2, x3, …).
Using Bayes' theorem,
    P(y | x1, x2, …) = P(x1, x2, … | y) · P(y) / P(x1, x2, …)
The Naïve Bayes classifier assumes the features (x1, x2, …) are conditionally independent given the class, i.e.
    P(x1, x2, … | y) = P(x1 | y) · P(x2 | y) · …
so the predicted class is the y that maximizes P(y) · Π_i P(xi | y).
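A minimal scikit-learn sketch of such a classifier; the Gaussian feature likelihoods and the synthetic two-class data are assumptions made for illustration, not specified by the cheat sheet.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two classes (y1 = 0, y2 = 1) with three features (x1, x2, x3) each
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

clf = GaussianNB()                 # P(y | x1..x3) ∝ P(y) * Π P(xi | y), Gaussian P(xi | y)
clf.fit(X, y)
print(clf.predict([[2.0, 1.8, 2.2]]))        # MAP class for a new evidence vector
print(clf.predict_proba([[2.0, 1.8, 2.2]]))  # posterior probabilities per class
```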

Source: https://www.cheatsheets.aqeel-anwar.com
Cheat Sheet – Regression Analysis
What is Regression Analysis?
Fitting a function f(.) to datapoints yi=f(xi) under some error function. Based on the estimated
function and error, we have the following types of regression
1. Linear Regression:
Fits a line by minimizing the sum of squared errors over the datapoints.
2. Polynomial Regression:
Fits a polynomial of order k (k+1 unknowns) by minimizing the sum of squared errors over the datapoints.
3. Bayesian Regression:
Fits a gaussian distribution for each datapoint by minimizing the mean-squared error. As the number of
datapoints xi increases, the predictions converge to point estimates.
4. Ridge Regression:
Fits either a line or a polynomial by minimizing the sum of squared errors plus the weighted L2 norm of
the function parameters beta.
5. LASSO Regression:
Fits either a line or a polynomial by minimizing the sum of squared errors plus the weighted L1 norm of
the function parameters beta.
6. Logistic Regression:
Fits either a line or a polynomial with a sigmoid activation by minimizing the binary cross-entropy loss for
each datapoint. The labels y are binary class labels.
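A brief scikit-learn sketch of how these variants are typically fit; the synthetic data, polynomial degree, and regularization strengths are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import (BayesianRidge, Lasso, LinearRegression,
                                  LogisticRegression, Ridge)
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(100, 1))
y = 1.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(0, 0.3, 100)    # noisy quadratic

linear   = LinearRegression().fit(x, y)                        # 1. line, least squares
X_poly   = PolynomialFeatures(degree=2).fit_transform(x)       # 2. polynomial of order k = 2
poly     = LinearRegression().fit(X_poly, y)
bayes    = BayesianRidge().fit(x, y)                           # 3. Bayesian linear regression
ridge    = Ridge(alpha=1.0).fit(X_poly, y)                     # 4. + weighted L2 norm of beta
lasso    = Lasso(alpha=0.1).fit(X_poly, y)                     # 5. + weighted L1 norm of beta
logistic = LogisticRegression().fit(x, (y > y.mean()).astype(int))   # 6. binary labels, sigmoid

print(poly.coef_, lasso.coef_)
```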
Visual representation (left to right): Linear Regression, Polynomial Regression, Bayesian Linear
Regression (y vs. x), and Logistic Regression (Label 0 / Label 1 vs. x).

Summary:
Type            | What does it fit?                    | Estimated function | Error function
Linear          | A line in n dimensions               | ŷ = βᵀx            | Σ (yᵢ - ŷᵢ)²
Polynomial      | A polynomial of order k              | ŷ = Σⱼ βⱼ xʲ       | Σ (yᵢ - ŷᵢ)²
Bayesian Linear | Gaussian distribution for each point | ŷ ~ N(βᵀx, σ²)     | Mean-squared error (with a prior on β)
Ridge           | Linear/polynomial                    | as above           | Σ (yᵢ - ŷᵢ)² + λ‖β‖₂²
LASSO           | Linear/polynomial                    | as above           | Σ (yᵢ - ŷᵢ)² + λ‖β‖₁
Logistic        | Linear/polynomial with sigmoid       | ŷ = σ(βᵀx)         | Binary cross-entropy

Source: https://www.cheatsheets.aqeel-anwar.com Tutorial: Click here


Cheat Sheet – Regularization in ML

What is Regularization in ML?


• Regularization is an approach to address over-fitting in ML.
• An overfitted model fails to generalize its estimations to test data.
• When the underlying model to be learned is low bias/high variance, or when we have a small amount
of data, the estimated model is prone to over-fitting.
• Regularization reduces the variance of the model.

[Figure 1. Overfitting: bias, variance, and total error across under-fitting, "just right", and over-fitting;
the simpler model is preferred if the dataset is small, the more complex one if the dataset is large.]

Types of Regularization:
1. Modify the loss function:
• L2 Regularization: Prevents the weights from getting too large (as measured by the L2 norm). The
larger the weights, the more complex the model and the higher the chance of overfitting.

• L1 Regularization: Prevents the weights from getting too large (as measured by the L1 norm). L1
regularization also introduces sparsity in the weights: it forces more weights to be exactly zero rather
than just reducing the average magnitude of all weights.

• Entropy: Used for models that output a probability distribution. Forces the probability distribution
towards the uniform distribution.
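A small numpy sketch of how these penalties modify a mean-squared-error loss; the λ values and function names are illustrative assumptions.

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def l2_regularized_loss(y, y_hat, weights, lam=0.01):
    # L2: penalize large weights via the squared L2 norm
    return mse(y, y_hat) + lam * np.sum(weights ** 2)

def l1_regularized_loss(y, y_hat, weights, lam=0.01):
    # L1: penalize the absolute magnitude of weights; pushes many weights to exactly zero
    return mse(y, y_hat) + lam * np.sum(np.abs(weights))

def entropy_regularized_loss(base_loss, probs, lam=0.01):
    # Rewarding high entropy pushes the predicted distribution toward uniform
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return base_loss - lam * entropy
```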

2. Modify data sampling:


• Data augmentation: Create more data from available data by randomly cropping, dilating,
rotating, adding small amount of noise etc.
• K-fold Cross-validation: Divide the data into k groups. Train on (k-1) groups and test on 1
group. Try all k possible combinations.
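A short scikit-learn sketch of k-fold cross-validation with k = 5; the model and synthetic dataset are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
# Divide the data into k = 5 groups; train on 4 and test on 1, for all 5 combinations
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())
```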

3. Change training approach:


• Injecting noise: Add random noise to the weights while they are being learned. This pushes the model
to be relatively insensitive to small variations in the weights, hence acting as regularization.
• Dropout: Generally used for neural networks. Connections between consecutive layers are randomly
dropped based on a dropout ratio, and the remaining network is trained in the current iteration. In the
next iteration, another random set of connections is dropped.
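A minimal numpy sketch of inverted dropout applied to one layer's activations, using the 30% dropout ratio from Figure 3; the function and shapes are illustrative assumptions.

```python
import numpy as np

def dropout(activations, drop_ratio=0.3, training=True):
    """Randomly zero a fraction of units; rescale the rest so the expected value is unchanged."""
    if not training:
        return activations                        # dropout is disabled at test time
    keep_prob = 1.0 - drop_ratio
    mask = np.random.random(activations.shape) < keep_prob
    return activations * mask / keep_prob         # inverted-dropout scaling

h = np.ones((4, 8))                               # pretend hidden-layer activations
print(dropout(h))                                 # about 70% of the units remain active
```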
[Figure 2. K-fold CV: 5-fold cross-validation, where each of the 5 folds takes a turn as the test set while
the remaining folds are used for training.]
[Figure 3. Drop-out: original network with 16 connections vs. a dropout-ratio of 30%, leaving 11 (70%)
connections active in each iteration.]


Source: https://www.cheatsheets.aqeel-anwar.com Tutorial: Click here
Cheat Sheet – Convolutional Neural Network
Convolutional Neural Network:
The data gets into the CNN through the input layer and passes
through various hidden layers before getting to the output layer.
The output of the network is compared to the actual labels in
terms of loss or error. The partial derivatives of this loss w.r.t the
trainable weights are calculated, and the weights are updated
through one of the various methods using backpropagation.
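A minimal PyTorch sketch of this forward / loss / backward / update cycle; the layer sizes, input shape (3x32x32), number of classes, and optimizer are placeholder assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional hidden layer
    nn.BatchNorm2d(16),                           # normalization
    nn.ReLU(),                                    # activation
    nn.MaxPool2d(2),                              # pooling
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                  # fully connected output layer
)

images = torch.randn(8, 3, 32, 32)                # dummy input batch
labels = torch.randint(0, 10, (8,))               # dummy actual labels
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

logits = model(images)                            # data passes through the hidden layers
loss = criterion(logits, labels)                  # output compared to actual labels (loss)
loss.backward()                                   # partial derivatives w.r.t. trainable weights
optimizer.step()                                  # weights updated via backpropagation + SGD
```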

CNN Template:
Most of the commonly used hidden layers (not all) follow a
pattern
1. Layer function: Basic transforming function such as a convolutional or fully connected layer.
a. Fully Connected: Linear function between the input and the output.
b. Convolutional Layers: These layers are applied to 2D (3D) input feature maps. The trainable weights are a 2D (3D)
kernel/filter that moves across the input feature map, generating dot products with the overlapping region of the input
feature map.
c. Transposed Convolutional (DeConvolutional) Layer: Usually used to increase the size of the output feature map
(upsampling). The idea behind the transposed convolutional layer is to undo (not exactly) the convolutional layer.
[Figure: a fully connected layer (input nodes x1..x3, weights w, bias b, output node y1) vs. a convolutional
layer (input map, kernel, output map).]

2. Pooling: Non-trainable layer to change the size of the feature map


a. Max/Average Pooling: Decreases the spatial size of the input layer by selecting the maximum/average
value in the receptive field defined by the kernel.
b. UnPooling: A non-trainable layer used to increase the spatial size of the input layer by placing the
input pixel at a certain index in the receptive field of the output defined by the kernel.
3. Normalization: Usually used just before the activation functions to prevent unbounded activations
from pushing the output layer values too high.
a. Local Response Normalization (LRN): A non-trainable layer that square-normalizes the pixel values in a feature map
within a local neighborhood.
b. Batch Normalization: A trainable approach to normalizing the data by learning scale and shift variables during training.
4. Activation: Introduces non-linearity so the CNN can efficiently model complex non-linear mappings.
a. Non-parametric/static functions: Linear, ReLU
b. Parametric functions: ELU, tanh, sigmoid, Leaky ReLU
c. Bounded functions: tanh, sigmoid

5. Loss function: Quantifies how far off the CNN prediction is from the actual labels.
a. Regression loss functions: MAE, MSE, Huber loss
b. Classification loss functions: Cross entropy, Hinge loss

MSE loss:           mse = (x - x̂)²
MAE loss:           mae = |x - x̂|
Huber loss:         ½(x - x̂)² if |x - x̂| < δ;   δ|x - x̂| - ½δ² otherwise   (δ = 1.9 in the plot)
Hinge loss:         max(0, 1 - x̂) if x = 1;   max(0, 1 + x̂) if x = -1
Cross entropy loss: -y log(p) - (1 - y) log(1 - p)

[Plots: MSE, MAE, and Huber loss vs. the error (x - x̂); hinge loss vs. the prediction x̂; cross entropy loss
vs. the predicted probability p.]
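A compact numpy sketch of the loss functions listed above; δ for the Huber loss is set to 1.9 to match the plot, and the label conventions (y in {-1, +1} for hinge, {0, 1} for cross entropy) follow the formulas.

```python
import numpy as np

def mse(x, x_hat):
    return np.mean((x - x_hat) ** 2)

def mae(x, x_hat):
    return np.mean(np.abs(x - x_hat))

def huber(x, x_hat, delta=1.9):
    err = np.abs(x - x_hat)
    quadratic = 0.5 * err ** 2                       # used where |error| < delta
    linear = delta * err - 0.5 * delta ** 2          # used otherwise
    return np.mean(np.where(err < delta, quadratic, linear))

def hinge(y, y_hat):
    return np.mean(np.maximum(0, 1 - y * y_hat))     # labels y in {-1, +1}

def cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)                     # labels y in {0, 1}, p = predicted probability
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))
```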

Source: https://www.cheatsheets.aqeel-anwar.com Tutorial: Click here


Cheat Sheet – Famous CNNs
AlexNet – 2012
Why: AlexNet was born out of the need to improve the results of
the ImageNet challenge.
What: The network consists of 5 Convolutional (CONV) layers and 3
Fully Connected (FC) layers. The activation used is the Rectified
Linear Unit (ReLU).
How: Data augmentation is carried out to reduce over-fitting. Uses
local response normalization.

VGGNet – 2014
Why: VGGNet was born out of the need to reduce the # of
parameters in the CONV layers and improve on training time
What: There are multiple variants of VGGNet (VGG16, VGG19, etc.)
How: The important point to note here is that all the conv kernels are
of size 3x3 and maxpool kernels are of size 2x2 with a stride of two.

ResNet – 2015
Why: Neural Networks are notorious for not being able to find a
simpler mapping when it exists. ResNet solves that.
What: There are multiple versions of ResNetXX architectures where
‘XX’ denotes the number of layers. The most used ones are ResNet50
and ResNet101. Since the vanishing gradient problem was taken care of
(more about it in the How part), CNNs started to get deeper and deeper.
How: The ResNet architecture makes use of shortcut connections to solve
the vanishing gradient problem. The basic building block of ResNet is
a residual block that is repeated throughout the network, as sketched below.
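A minimal PyTorch sketch of the residual-block idea; the channel counts are placeholders, and real ResNet blocks also handle strides and projection shortcuts, which are omitted here.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two weight layers plus a shortcut connection: output = f(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        f = torch.relu(self.bn1(self.conv1(x)))
        f = self.bn2(self.conv2(f))
        return torch.relu(f + x)          # the skip connection eases gradient flow

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```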
[Figure 1: ResNet block. The input x passes through weight layers producing f(x); a skip connection adds
x, giving f(x) + x.]
[Figure 2: Inception block. The previous layer feeds four parallel branches (1x1 conv; 1x1 conv followed
by 3x3 conv; 1x1 conv followed by 5x5 conv; maxpool followed by 1x1 conv), whose outputs are
filter-concatenated.]


Inception – 2014
Why: Larger kernels are preferred for more global features; on the other
hand, smaller kernels provide good results in detecting area-specific
features. For effective recognition of such variable-sized features, we
need kernels of different sizes. That is what Inception does.
What: The Inception network architecture consists of several inception
modules of the following structure. Each inception module consists of
four operations in parallel, 1x1 conv layer, 3x3 conv layer, 5x5 conv
layer, max pooling
How: Inception increases the network space from which the best
network is to be chosen via training. Each inception module can
capture salient features at different levels.

Source: https://www.cheatsheets.aqeel-anwar.com Tutorial: Click here


Cheat Sheet – Ensemble Learning in ML
What is Ensemble Learning? Wisdom of the crowd
Combine multiple weak models/learners into one predictive model to reduce bias, variance and/or improve accuracy.

Types of Ensemble Learning (each combines N weak learners):


1. Bagging: Trains N different weak models (usually of the same type – homogenous) with N non-overlapping subsets of the
input dataset in parallel. In the test phase, each model is evaluated and the label with the greatest number of predictions is
selected as the prediction. Bagging methods reduce the variance of the prediction.

2. Boosting: Trains N different weak models (usually of the same type – homogenous) with the complete dataset in a
sequential order. The datapoints wrongly classified by the previous weak model are given higher weights so that they can
be classified properly by the next weak learner. In the test phase, each model is evaluated and, based on the test error of
each weak model, its prediction is weighted for voting. Boosting methods decrease the bias of the prediction.

3. Stacking: Trains N different weak models (usually of different types – heterogenous) with one of two subsets of the
dataset in parallel. Once the weak learners are trained, they are used to train a meta-learner that combines their
predictions and carries out the final prediction using the other subset. In the test phase, each model predicts its label, and
this set of labels is fed to the meta-learner, which generates the final prediction.
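A brief scikit-learn sketch of the three methods; the base estimators, hyperparameters, and synthetic dataset are arbitrary choices, and AdaBoost is used here as one standard boosting implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(n_estimators=20, random_state=0)      # homogenous learners, trained in parallel
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)    # homogenous learners, trained sequentially
stacking = StackingClassifier(                                    # heterogenous learners + meta-learner
    estimators=[("tree", DecisionTreeClassifier()),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(),
)

for name, model in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
    print(name, model.fit(X_tr, y_tr).score(X_te, y_te))
```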

The block diagrams, and comparison table for each of these three methods can be seen below.
Ensemble Method – Bagging (block diagram)
Step #1: Create N subsets from the original dataset, one for each weak model (Subset #1 … Subset #4).
Step #2: Train each weak model (#1 … #4) with an independent subset, in parallel.
Step #3: In the test phase, predict with each weak model and vote their predictions to get the final prediction.

Ensemble Method – Boosting (block diagram)
Step #1: Assign equal (uniform) weights to all the datapoints in the complete dataset.
Step #2a: Train weak model #1 with equal weights on all the datapoints.
Step #2b: Based on the final error of the trained weak model, calculate a scalar alpha1. Use alpha to increase the weights of
wrongly classified points and decrease the weights of correctly classified points.
Step #3a: Train weak model #2 with the adjusted weights on all the datapoints in the dataset.
Step #3b: As in Step #2b, calculate alpha2 and adjust the weights again; repeat for the remaining weak models.
Step #(n+1)a: Train the last weak model with the adjusted weights on all the datapoints in the dataset.
Step #(n+2): In the test phase, predict with each weak model and vote their predictions, weighted by the corresponding
alpha, to get the final prediction.

Ensemble Method – Stacking (block diagram)
Step #1: Create 2 subsets from the original dataset, one for training the weak models (Subset #1 – Weak Learners) and one
for the meta-learner (Subset #2 – Meta Learner).
Step #2: Train each weak model (#1 … #4) with the weak-learner subset.
Step #3: Train a meta-learner whose input is the outputs of the trained weak models on the meta-learner subset.
Step #4: In the test phase, feed the input to the weak models, collect their outputs, and feed them to the meta-model. The
output of the meta-model (meta-learner) is the final prediction.

Parameter                       | Bagging           | Boosting        | Stacking
Focuses on                      | Reducing variance | Reducing bias   | Improving accuracy
Nature of weak learners         | Homogenous        | Homogenous      | Heterogenous
Weak learners are aggregated by | Simple voting     | Weighted voting | Learned voting (meta-learner)
Source: https://www.cheatsheets.aqeel-anwar.com Tutorial: Click here


Cheat Sheet – Autoencoder & Variational Autoencoder
Context – Data Compression
• Data compression is an essential phase in training a network. The idea is to compress the data so
that the same amount of information can be represented by fewer bits.
Auto Encoder (AE)
• An autoencoder is used to learn efficient embeddings of unlabeled data for a given network
configuration. It consists of two parts, an encoder and a decoder.
• The encoder compresses the data from a higher-dimensional space to a lower-dimensional space (also
called the latent space), while the decoder converts the latent space back to the higher-dimensional space.
• The entire encoder-decoder architecture is trained end-to-end on a loss function that encourages the
input to be reconstructed at the output. Hence the loss function is the mean squared error between the
encoder input and the decoder output (see the sketch below).
• The latent variable is not regularized, so picking a random latent variable will generate garbage output.
• The latent variables are deterministic values, and the latent space lacks generative capability.
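A minimal PyTorch sketch of this encoder/decoder structure trained with an MSE reconstruction loss; the layer dimensions (784-dimensional flattened inputs, a 16-dimensional latent space) and optimizer are placeholder assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 16))   # 784 -> 16-d latent vector
decoder = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 784))   # latent vector -> 784

autoencoder = nn.Sequential(encoder, decoder)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

x = torch.rand(32, 784)                         # dummy batch of flattened images
x_hat = autoencoder(x)                          # reconstructed input
loss = nn.functional.mse_loss(x_hat, x)         # MSE between encoder input and decoder output
loss.backward()
optimizer.step()
```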

[Autoencoder block diagram: input → encoder → latent vector → decoder → reconstructed input]

Variational Auto Encoder (VAE)


• A variational autoencoder addresses the issue of the non-regularized latent space of the autoencoder
and provides generative capability over the entire space.
• Instead of outputting vectors in the latent space directly, the encoder of a VAE outputs the parameters
of a pre-defined distribution in the latent space for every input.
• The VAE then imposes a constraint on this latent distribution, forcing it to be a normal distribution.
• The compressed latent representation consists of a mean and a variance.
• The training loss of a VAE is defined as the sum of the reconstruction loss and the similarity loss (the
KL divergence between the encoder's latent distribution and the unit gaussian).
• The latent space is smooth and continuous, i.e., random values of the latent variable generate
meaningful output at the decoder, hence the latent space has generative capability.
• The input of the decoder is sampled from a gaussian with the mean/variance output by the encoder
(see the sketch below).
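A minimal PyTorch sketch of the VAE-specific pieces: the loss adds a KL term against the unit Gaussian, and the decoder input is sampled from the encoder's predicted distribution via the reparameterization trick. The tensor shapes and the MSE reconstruction term are placeholder assumptions.

```python
import torch

def vae_loss(x, x_hat, mu, logvar):
    recon = torch.nn.functional.mse_loss(x_hat, x, reduction="sum")   # reconstruction loss
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())      # KL(q(z|x) || N(0, I))
    return recon + kl

# Reparameterization: sample the decoder input from the predicted latent distribution
mu, logvar = torch.zeros(32, 16), torch.zeros(32, 16)     # pretend encoder outputs (mean, log-variance)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # z ~ N(mu, sigma^2)
```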
[Variational autoencoder block diagram: input → encoder → latent distribution → sampling → latent
vector → decoder → reconstructed input]

Source: https://www.cheatsheets.aqeel-anwar.com Tutorial: Click here
