Statistical Learning Theory
1 Introduction
The goals of learning are prediction and understanding. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood.[4] Supervised learning involves learning from a training set of data. Every point in the training set is an input-output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict output from future input.
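As a minimal illustration of this setup, the sketch below (Python; the training pairs and the choice of a linear model are assumptions for illustration, not part of the article) infers a function from observed input-output pairs and then uses it to predict the output for a future input.

import numpy as np

# Hypothetical training set of input-output pairs (x_i, y_i);
# the values are invented for illustration.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.1, 3.9, 7.2, 9.8])

# Infer a function from the pairs (here: a line fitted by least squares).
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def learned_f(x):
    """The learned function mapping an input to a predicted output."""
    return slope * x + intercept

# Use the learned function to predict the output for a future input.
print(learned_f(4.0))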
2 Formal description

Take X to be the vector space of all possible inputs and Y to be the vector space of all possible outputs. The training set consists of n samples drawn from an unknown probability distribution p(x, y) over the product space X × Y:

S = \{(x_1, y_1), \dots, (x_n, y_n)\} = \{z_1, \dots, z_n\}

The learning problem consists of finding a function f: X → Y. Given a loss function V(f(x), y) that measures the cost of predicting f(x) when the true output is y, the expected risk of a candidate function f is

I[f] = \int_{X \times Y} V(f(x), y) \, p(x, y) \, dx \, dy

Because the probability distribution p(x, y) is unknown, the expected risk cannot be computed directly; a proxy measure, the empirical risk, averages the loss over the training set:

I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)
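As a concrete illustration, the empirical risk can be evaluated directly on a finite sample. The sketch below (Python; the synthetic data, the candidate function f, and the use of the square loss are all assumptions for illustration) computes I_S[f] for a fixed hypothesis.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set S = {(x_i, y_i)}: n noisy samples of an
# assumed underlying relationship y = 2x.
n = 50
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 * x + rng.normal(scale=0.1, size=n)

def V(prediction, target):
    """Square loss V(f(x), y) = (y - f(x))^2 (see Section 3.1)."""
    return (target - prediction) ** 2

def f(x):
    """A candidate hypothesis; the slope 1.8 is an arbitrary guess."""
    return 1.8 * x

# Empirical risk I_S[f] = (1/n) * sum_i V(f(x_i), y_i)
I_S = np.mean(V(f(x), y))
print(f"empirical risk I_S[f] = {I_S:.4f}")

Since p(x, y) is unknown in practice, this sample average is exactly the quantity that empirical risk minimization drives down.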
3 Loss functions

3.1 Regression
The most common loss function for regression is the square loss (also known as the L2-norm):

V(f(x), y) = (y - f(x))^2
The absolute value loss (also known as the L1-norm) is also sometimes used:
V(f(x), y) = |y - f(x)|
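The two losses weight residuals differently: the square loss penalizes large errors quadratically, while the absolute value loss grows only linearly, making it less sensitive to outliers. A minimal numerical comparison (Python; the residual values are invented for illustration):

# Hypothetical residuals y - f(x)
residuals = [0.1, 0.5, 1.0, 5.0]

for r in residuals:
    square_loss = r ** 2      # V(f(x), y) = (y - f(x))^2
    absolute_loss = abs(r)    # V(f(x), y) = |y - f(x)|
    print(f"residual {r:4.1f}:  L2 = {square_loss:6.2f}  L1 = {absolute_loss:4.1f}")

Note how the residual of 5.0 dominates the square loss (25.0) but contributes only 5.0 under the absolute value loss.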
3.2 Classification

In some sense, the 0-1 indicator function is the most natural loss function for classification: it takes the value 0 if the predicted output is the same as the actual output, and the value 1 if the predicted output is different from the actual output.
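A direct rendering of this indicator loss (Python; the label values are invented for illustration):

def zero_one_loss(prediction, target):
    """0-1 indicator loss: 0 if the prediction matches the actual
    output, 1 otherwise."""
    return 0 if prediction == target else 1

predictions = [1, -1, 1, 1]
targets = [1, -1, -1, 1]
errors = [zero_one_loss(p, t) for p, t in zip(predictions, targets)]
print(errors, "-> empirical risk:", sum(errors) / len(errors))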
4 Regularization
Figure: an example of overfitting in machine learning. The red dots represent training set data. The green line represents the true functional relationship, while the blue line shows the learned function, which has fallen victim to overfitting.
In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well.

Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. Regularization can solve the overfitting problem and give the problem stability. One way to accomplish this is Tikhonov regularization, which consists of minimizing

\frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \gamma \|f\|_H^2

where γ is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution.[8]
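For linear hypotheses with the square loss, Tikhonov regularization reduces to ridge regression, which admits a closed-form solution. The sketch below (Python; the synthetic data and the choice gamma = 0.1 are assumptions for illustration, and the n * gamma scaling follows the (1/n) empirical-risk convention used above) contrasts the regularized solution with plain empirical risk minimization.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n samples, d features, known true weights plus noise.
n, d = 30, 5
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=n)

gamma = 0.1  # regularization parameter (an illustrative choice)

# Tikhonov regularization: minimize (1/n) ||X w - y||^2 + gamma ||w||^2.
# Setting the gradient to zero gives w = (X^T X + n gamma I)^{-1} X^T y.
w_reg = np.linalg.solve(X.T @ X + n * gamma * np.eye(d), X.T @ y)

# Unregularized empirical risk minimization, for comparison.
w_erm = np.linalg.lstsq(X, y, rcond=None)[0]

print("regularized:  ", np.round(w_reg, 2))
print("unregularized:", np.round(w_erm, 2))

The regularized solution trades a small amount of bias for stability: a small perturbation of the training data perturbs w_reg less than w_erm, in line with the existence, uniqueness, and stability guarantees cited above.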
5 See also

• Reproducing kernel Hilbert spaces are a useful choice for H.
• Proximal gradient methods for learning
6 References