Statistical Learning Theory
1 Introduction
The goals of learning are prediction and understanding. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood.[4] Supervised learning involves learning from a training set of data. Every point in the training set is an input-output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict output from future input.
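As a minimal illustration of this setup, the sketch below (Python; the training pairs and the choice of a linear model are assumptions for illustration, not part of the article) infers a function from observed input-output pairs and then uses it to predict the output for a future input.

import numpy as np

# Hypothetical training set of input-output pairs (x_i, y_i);
# the values are invented for illustration.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.1, 3.9, 7.2, 9.8])

# Infer a function from the pairs (here: a line fitted by least squares).
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def learned_f(x):
    """The learned function mapping an input to a predicted output."""
    return slope * x + intercept

# Use the learned function to predict the output for a future input.
print(learned_f(4.0))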
2 Formal description

Take X to be the vector space of all possible inputs and Y to be the vector space of all possible outputs. The training set consists of n samples drawn from an unknown probability distribution p(x, y) over the product space X × Y:

S = \{(x_1, y_1), \dots, (x_n, y_n)\} = \{z_1, \dots, z_n\}

The learning problem consists of finding a function f: X → Y. Given a loss function V(f(x), y) that measures the cost of predicting f(x) when the true output is y, the expected risk of a candidate function f is

I[f] = \int_{X \times Y} V(f(x), y) \, p(x, y) \, dx \, dy

Because the probability distribution p(x, y) is unknown, the expected risk cannot be computed directly; a proxy measure, the empirical risk, averages the loss over the training set:

I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)
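As a concrete illustration, the empirical risk can be evaluated directly on a finite sample. The sketch below (Python; the synthetic data, the candidate function f, and the use of the square loss are all assumptions for illustration) computes I_S[f] for a fixed hypothesis.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set S = {(x_i, y_i)}: n noisy samples of an
# assumed underlying relationship y = 2x.
n = 50
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 * x + rng.normal(scale=0.1, size=n)

def V(prediction, target):
    """Square loss V(f(x), y) = (y - f(x))^2 (see Section 3.1)."""
    return (target - prediction) ** 2

def f(x):
    """A candidate hypothesis; the slope 1.8 is an arbitrary guess."""
    return 1.8 * x

# Empirical risk I_S[f] = (1/n) * sum_i V(f(x_i), y_i)
I_S = np.mean(V(f(x), y))
print(f"empirical risk I_S[f] = {I_S:.4f}")

Since p(x, y) is unknown in practice, this sample average is exactly the quantity that empirical risk minimization drives down.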
3 Loss functions

3.1 Regression
The most common loss function for regression is the square loss (also known as the L2-norm):

V(f(x), y) = (y - f(x))^2
The absolute value loss (also known as the L1-norm) is also sometimes used:
V(f(x), y) = |y - f(x)|
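The two losses weight residuals differently: the square loss penalizes large errors quadratically, while the absolute value loss grows only linearly, making it less sensitive to outliers. A minimal numerical comparison (Python; the residual values are invented for illustration):

# Hypothetical residuals y - f(x)
residuals = [0.1, 0.5, 1.0, 5.0]

for r in residuals:
    square_loss = r ** 2      # V(f(x), y) = (y - f(x))^2
    absolute_loss = abs(r)    # V(f(x), y) = |y - f(x)|
    print(f"residual {r:4.1f}:  L2 = {square_loss:6.2f}  L1 = {absolute_loss:4.1f}")

Note how the residual of 5.0 dominates the square loss (25.0) but contributes only 5.0 under the absolute value loss.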
3.2 Classification

In some sense, the 0-1 indicator function is the most natural loss function for classification: it takes the value 0 if the predicted output is the same as the actual output, and the value 1 if the predicted output is different from the actual output.
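A direct rendering of this indicator loss (Python; the label values are invented for illustration):

def zero_one_loss(prediction, target):
    """0-1 indicator loss: 0 if the prediction matches the actual
    output, 1 otherwise."""
    return 0 if prediction == target else 1

predictions = [1, -1, 1, 1]
targets = [1, -1, -1, 1]
errors = [zero_one_loss(p, t) for p, t in zip(predictions, targets)]
print(errors, "-> empirical risk:", sum(errors) / len(errors))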
4 Regularization
Figure: an example of overfitting in machine learning. The red dots represent training set data. The green line represents the true functional relationship, while the blue line shows the learned function, which has fallen victim to overfitting.
In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well.

Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. Regularization can solve the overfitting problem and give the problem stability. One way to accomplish this is Tikhonov regularization, which consists of minimizing

\frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \gamma \|f\|_H^2

where γ is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution.[8]
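For linear hypotheses with the square loss, Tikhonov regularization reduces to ridge regression, which admits a closed-form solution. The sketch below (Python; the synthetic data and the choice gamma = 0.1 are assumptions for illustration, and the n * gamma scaling follows the (1/n) empirical-risk convention used above) contrasts the regularized solution with plain empirical risk minimization.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n samples, d features, known true weights plus noise.
n, d = 30, 5
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=n)

gamma = 0.1  # regularization parameter (an illustrative choice)

# Tikhonov regularization: minimize (1/n) ||X w - y||^2 + gamma ||w||^2.
# Setting the gradient to zero gives w = (X^T X + n gamma I)^{-1} X^T y.
w_reg = np.linalg.solve(X.T @ X + n * gamma * np.eye(d), X.T @ y)

# Unregularized empirical risk minimization, for comparison.
w_erm = np.linalg.lstsq(X, y, rcond=None)[0]

print("regularized:  ", np.round(w_reg, 2))
print("unregularized:", np.round(w_erm, 2))

The regularized solution trades a small amount of bias for stability: a small perturbation of the training data perturbs w_reg less than w_erm, in line with the existence, uniqueness, and stability guarantees cited above.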
5 See also

• Reproducing kernel Hilbert spaces are a useful choice for H.
• Proximal gradient methods for learning
6 References