Dimensionality Reduction
There are two main methods of dimensionality reduction:
1. Feature Selection
2. Feature Extraction
Feature Selection
To be clear, some supervised algorithms already have built-in feature selection, such
as Regularized Regression and Random Forests. As a stand-alone task, feature selection
can be unsupervised (e.g. Variance Thresholds) or supervised (e.g. Genetic Algorithms).
You can also combine multiple methods if needed.
Variance Thresholds
Variance thresholds remove features whose values don't change much from
observation to observation (i.e. their variance falls below a threshold). These
features provide little value. Because variance is dependent on scale, you should
always normalize your features first.
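For example, here's a minimal sketch using scikit-learn's VarianceThreshold; the tiny feature matrix and the 0.05 cutoff are made up purely for illustration:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: the third column never changes.
X = np.array([[0.0, 10.0, 5.0],
              [1.0, 20.0, 5.0],
              [2.0, 30.0, 5.0],
              [3.0, 40.0, 5.0]])

# Normalize first so variances are comparable across features.
X_scaled = MinMaxScaler().fit_transform(X)

# Drop features whose (scaled) variance falls below the threshold.
selector = VarianceThreshold(threshold=0.05)
X_reduced = selector.fit_transform(X_scaled)

print(selector.get_support())  # mask of the features that were kept
print(X_reduced.shape)         # the constant column is gone
```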
Correlation Thresholds
Correlation thresholds remove features that are highly correlated with others
(i.e. their values change very similarly to another feature's). These features provide
redundant information.
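scikit-learn doesn't ship a correlation-threshold transformer, but a rough pandas sketch looks like this; the synthetic DataFrame and the 0.95 cutoff are arbitrary choices for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical data: height_cm and height_in carry the same information.
rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, size=200)
df = pd.DataFrame({
    "height_cm": height_cm,
    "height_in": height_cm / 2.54,          # perfectly correlated duplicate
    "weight_kg": rng.normal(70, 8, size=200),
})

# Absolute pairwise correlations, upper triangle only so each pair is counted once.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop one feature from every pair whose correlation exceeds the cutoff.
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df.drop(columns=to_drop)

print(to_drop)                         # ['height_in']
print(df_reduced.columns.tolist())
```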
Genetic Algorithms
Genetic algorithms (GA) are a broad class of algorithms that can be adapted to
different purposes. They are search algorithms that are inspired by evolutionary
biology and natural selection, combining mutation and cross-over to efficiently
traverse large solution spaces.
In machine learning, GAs have two main uses. The first is for optimization, such
as finding the best weights for a neural network.
The second is for supervised feature selection. In this use case, "genes" represent
individual features and the "organism" represents a candidate set of features.
Each organism in the "population" is graded on a fitness score such as model
performance on a hold-out set. The fittest organisms survive and reproduce,
repeating until the population converges on a solution some generations later.
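As a rough illustration of that loop, here's a minimal GA-style feature-selection sketch; the population size, mutation rate, and the choice of logistic regression as the fitness model are all arbitrary assumptions, not a prescribed recipe:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Toy dataset; only a few of the 20 features are actually informative.
X, y = make_classification(n_samples=400, n_features=20, n_informative=4,
                           random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

def fitness(mask):
    """Score a candidate feature set by accuracy on a hold-out set."""
    if not mask.any():
        return 0.0
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:, mask], y_train)
    return model.score(X_val[:, mask], y_val)

# Each "organism" is a boolean mask over the features ("genes").
pop_size, n_generations, mutation_rate = 20, 15, 0.05
population = rng.random((pop_size, X.shape[1])) < 0.5

for generation in range(n_generations):
    scores = np.array([fitness(ind) for ind in population])
    # Selection: keep the fitter half of the population.
    survivors = population[np.argsort(scores)[::-1][: pop_size // 2]]
    children = []
    while len(children) < pop_size - len(survivors):
        # Cross-over: splice two random survivors at a random point.
        a, b = survivors[rng.integers(len(survivors), size=2)]
        point = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:point], b[point:]])
        # Mutation: flip each gene with a small probability.
        flip = rng.random(X.shape[1]) < mutation_rate
        children.append(np.where(flip, ~child, child))
    population = np.vstack([survivors, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print("selected features:", np.flatnonzero(best))
```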
Stepwise Search
Stepwise search is a supervised feature selection method that comes in two flavors:
forward and backward. For forward stepwise search, you start without any features.
Then, you'd train a 1-feature model using each of your candidate features and keep
the version with the best performance. You'd continue adding features, one at a time,
until your performance improvements stall.
Backward stepwise search is the same process, just reversed: start with all
features in your model and then remove one at a time until performance starts
to drop substantially.
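scikit-learn's SequentialFeatureSelector covers both directions; note that it selects a fixed number of features rather than stopping automatically when improvements stall, and the dataset and estimator below are just illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Forward search: start with zero features and add the best one at each step.
forward = SequentialFeatureSelector(estimator, n_features_to_select=5,
                                    direction="forward", cv=5)
forward.fit(X, y)
print("forward picks:", forward.get_support(indices=True))

# Backward search: start with all features and drop the least useful one each step.
backward = SequentialFeatureSelector(estimator, n_features_to_select=5,
                                     direction="backward", cv=5)
backward.fit(X, y)
print("backward picks:", backward.get_support(indices=True))
```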
Feature Extraction
As with feature selection, some algorithms already have built-in feature extraction.
The best example is Deep Learning, which extracts increasingly useful representations
of the raw input data through each hidden neural layer.
As a stand-alone task, feature extraction can be unsupervised (e.g. PCA) or supervised
(e.g. LDA).
Principal Component Analysis (PCA)
Principal component analysis (PCA) is an unsupervised method that creates new features
(principal components) as linear combinations of your original features, ordered by how
much of the data's variance they explain. You should always normalize your dataset
before performing PCA because the transformation is dependent on scale. If you don't,
the features that are on the largest scale will dominate your new principal components.
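A minimal sketch, assuming scikit-learn and a target of 95% explained variance (both the dataset and that 0.95 figure are illustrative):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)

# Standardize so no single large-scale feature dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# Keep the smallest number of components that explain 95% of the variance.
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_scaled)

print(X.shape, "->", X_pca.shape)
print(pca.explained_variance_ratio_)
```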
Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) also creates linear combinations of your original
features, but unlike PCA it maximizes the separability between known classes. Because
it needs class labels, LDA is a supervised method that can only be used with labeled data.
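For comparison, a minimal LDA sketch on the same kind of labeled data (the dataset and the choice of two components are illustrative):

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# LDA needs the labels y; with 3 classes it can produce at most 2 components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_scaled, y)

print(X.shape, "->", X_lda.shape)
```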
So which is better: LDA or PCA? Well, results will vary from problem to
problem, and the same "No Free Lunch" theorem applies (more on that below).
Autoencoders
Autoencoders are neural networks that are trained to reconstruct their original
inputs. For example, image autoencoders are trained to reproduce the original
images instead of classifying them as "dog" or "cat".
So how is this helpful? Well, the key is to structure the hidden layer to have fewer
neurons than the input/output layers. Thus, that hidden layer will learn to
produce a smaller representation of the original image.
Because you use the input image as the target output, autoencoders are
considered unsupervised. They can be used directly (e.g. image compression) or
stacked in sequence (e.g. deep learning).
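A minimal Keras sketch of the idea; the layer sizes and the random stand-in "images" are placeholders, and a real image autoencoder would typically be convolutional:

```python
import numpy as np
from tensorflow import keras

# Placeholder "images": 1000 flattened 28x28 inputs with values in [0, 1].
X = np.random.rand(1000, 784).astype("float32")

# Bottleneck hidden layer with far fewer neurons than the 784 inputs/outputs.
autoencoder = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(32, activation="relu"),     # compressed representation
    keras.layers.Dense(784, activation="sigmoid"), # reconstruction
])

# The input itself is the target output, so no separate labels are needed.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)
```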
In machine learning, there’s something called the “No Free Lunch” theorem. In a
nutshell, it states that no one algorithm works best for every problem, and it’s especially
relevant for supervised learning (i.e. predictive modelling).
For example, you can’t say that neural networks are always better than decision trees or
vice-versa. There are many factors at play, such as the size and structure of your dataset.
As a result, you should try many different algorithms for your problem, while using
a hold-out “test set” of data to evaluate performance and select the winner.
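For instance, a sketch of that workflow with a hold-out test set; the candidate models and dataset are arbitrary choices for the example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

candidates = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}

# Train each candidate, then compare all of them on the same hold-out test set.
for name, model in candidates.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```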
Of course, the algorithms you try must be appropriate for your problem, which is where
picking the right machine learning task comes in. As an analogy, if you need to clean
your house, you might use a vacuum, a broom, or a mop, but you wouldn't bust out
a shovel and start digging.