
Foundations of Machine Learning

Kernel Methods

Mehryar Mohri
Courant Institute and Google Research
mohri@cims.nyu.edu
Motivation
• Efficient computation of inner products in high dimension.
• Non-linear decision boundary.
• Non-vectorial inputs.
• Flexible selection of more complex features.


This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels



Non-Linear Separation

Linear separation impossible in most problems.
Non-linear mapping from input space to high-dimensional feature space: Φ: X → F.
Generalization ability: independent of dim(F), depends only on margin and sample size.
Kernel Methods
Idea:
• Define K: X × X → ℝ, called a kernel, such that:
Φ(x) · Φ(y) = K(x, y).
• K is often interpreted as a similarity measure.
Benefits:
• Efficiency: K is often more efficient to compute than Φ and the dot product.
• Flexibility: K can be chosen arbitrarily so long as the existence of Φ is guaranteed (PDS condition or Mercer's condition).
PDS Condition
Definition: a kernel K: X × X → ℝ is positive definite symmetric (PDS) if for any {x1, ..., xm} ⊆ X, the matrix K = [K(xi, xj)]ij ∈ ℝ^{m×m} is symmetric positive semi-definite (SPSD).
K is SPSD if it is symmetric and one of the two equivalent conditions holds:
• its eigenvalues are non-negative;
• for any c ∈ ℝ^{m×1}, c⊤Kc = Σ_{i,j=1}^{m} ci cj K(xi, xj) ≥ 0.
Terminology: PDS for kernels, SPSD for kernel matrices (see (Berg et al., 1984)).
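As a quick numerical illustration of the PDS condition (a minimal sketch that is not part of the original slides, assuming NumPy), one can build the kernel matrix of a Gaussian kernel on a small sample and check both equivalent SPSD conditions:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))          # sample {x_1, ..., x_m}, m = 10

# Kernel matrix K = [K(x_i, x_j)]_ij
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Condition 1: symmetry and non-negative eigenvalues (up to round-off)
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) >= -1e-10

# Condition 2: c^T K c >= 0 for arbitrary coefficient vectors c
for _ in range(100):
    c = rng.normal(size=10)
    assert c @ K @ c >= -1e-10
```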
Example - Polynomial Kernels
Definition: for all x, y ∈ ℝ^N, K(x, y) = (x · y + c)^d, with c > 0.
Example: for N = 2 and d = 2,
K(x, y) = (x1 y1 + x2 y2 + c)²
        = (x1², x2², √2 x1 x2, √(2c) x1, √(2c) x2, c) · (y1², y2², √2 y1 y2, √(2c) y1, √(2c) y2, c).
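The explicit 6-dimensional feature map above can be checked directly (a minimal sketch, not from the slides, assuming NumPy):

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

def phi(x, c=1.0):
    """Explicit feature map for N = 2, d = 2 (6 dimensions)."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1,
                     np.sqrt(2 * c) * x2,
                     c])

x = np.array([0.5, -1.2])
y = np.array([2.0, 0.3])
# The kernel value and the explicit inner product agree (up to round-off).
assert np.isclose(poly_kernel(x, y), phi(x) @ phi(y))
```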
XOR Problem
Use the second-degree polynomial kernel with c = 1. The corresponding feature map sends the four XOR points to
(1, 1)   → (1, 1, +√2, +√2, +√2, 1)
(-1, 1)  → (1, 1, −√2, −√2, +√2, 1)
(1, -1)  → (1, 1, −√2, +√2, −√2, 1)
(-1, -1) → (1, 1, +√2, −√2, −√2, 1)
[Figure: the four points in the input space (x1, x2), where they are linearly non-separable, and in the (√2 x1, √2 x1 x2) plane of the feature space, where they are linearly separable by x1 x2 = 0.]
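A small check of this separation (a sketch that is not part of the slides, assuming NumPy; phi is the degree-2 feature map from the previous slide):

```python
import numpy as np

def phi(x, c=1.0):
    """Degree-2 polynomial feature map, c = 1."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                     np.sqrt(2*c)*x1, np.sqrt(2*c)*x2, c])

points = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
labels = [1, -1, -1, 1]                 # XOR: y = +1 iff x1 * x2 > 0

# In feature space the single coordinate sqrt(2) x1 x2 separates the classes:
# the hyperplane w . phi(x) = 0 with w = e_3 classifies all four points correctly.
w = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
for x, y in zip(points, labels):
    assert np.sign(w @ phi(np.array(x))) == y
```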


Normalized Kernels
Definition: the normalized kernel K' associated to a kernel K is defined for all x, x' ∈ X by
K'(x, x') = 0 if K(x, x) = 0 or K(x', x') = 0,
K'(x, x') = K(x, x') / √(K(x, x) K(x', x')) otherwise.
• If K is PDS, then K' is PDS:
Σ_{i,j=1}^{m} ci cj K(xi, xj) / √(K(xi, xi) K(xj, xj)) = Σ_{i,j=1}^{m} ci cj ⟨Φ(xi), Φ(xj)⟩_H / (∥Φ(xi)∥_H ∥Φ(xj)∥_H) = ∥ Σ_{i=1}^{m} ci Φ(xi) / ∥Φ(xi)∥_H ∥²_H ≥ 0.
• By definition, for all x with K(x, x) ≠ 0, K'(x, x) = 1.
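In matrix form, normalization divides each entry of the kernel matrix by the square roots of the corresponding diagonal entries (a minimal sketch, not from the slides, assuming NumPy):

```python
import numpy as np

def normalize_gram(K):
    """Normalized kernel matrix K'_ij = K_ij / sqrt(K_ii K_jj), 0 where a diagonal entry is 0."""
    d = np.sqrt(np.diag(K))
    with np.errstate(divide="ignore", invalid="ignore"):
        Kn = K / np.outer(d, d)
    Kn[~np.isfinite(Kn)] = 0.0
    return Kn

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K = (X @ X.T + 1.0) ** 2              # polynomial-kernel Gram matrix
Kn = normalize_gram(K)
assert np.allclose(np.diag(Kn), 1.0)                 # K'(x, x) = 1
assert np.min(np.linalg.eigvalsh(Kn)) >= -1e-10      # still PSD
```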
Other Standard PDS Kernels
Gaussian kernels:
K(x, y) = exp(−∥x − y∥² / (2σ²)),  σ ≠ 0.
• Normalized kernel associated to the kernel (x, x') ↦ exp(x · x' / σ²).
Sigmoid kernels:
K(x, y) = tanh(a(x · y) + b),  a, b ≥ 0.
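The first bullet can be verified numerically: normalizing exp(x · x'/σ²) recovers the Gaussian kernel (a sketch that is not part of the slides, assuming NumPy):

```python
import numpy as np

def gaussian(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def exp_inner(x, y, sigma=1.0):
    """Un-normalized kernel (x, x') -> exp(x . x' / sigma^2)."""
    return np.exp(np.dot(x, y) / sigma ** 2)

x, y = np.array([0.3, -1.0]), np.array([1.5, 0.2])
lhs = gaussian(x, y)
rhs = exp_inner(x, y) / np.sqrt(exp_inner(x, x) * exp_inner(y, y))
assert np.isclose(lhs, rhs)   # the Gaussian kernel is the normalized kernel of exp(x.x'/sigma^2)
```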


Reproducing Kernel Hilbert Space
(Aronszajn, 1950)
Theorem: let K: X × X → ℝ be a PDS kernel. Then, there exists a Hilbert space H and a mapping Φ from X to H such that
∀x, y ∈ X, K(x, y) = Φ(x) · Φ(y).
Proof: for any x ∈ X, define Φ(x) ∈ ℝ^X as follows:
∀y ∈ X, Φ(x)(y) = K(x, y).
• Let H0 = { Σ_{i∈I} ai Φ(xi) : ai ∈ ℝ, xi ∈ X, card(I) < ∞ }.
• We are going to define an inner product ⟨·, ·⟩ on H0.


• Definition: for any f = Σ_{i∈I} ai Φ(xi) and g = Σ_{j∈J} bj Φ(yj),
⟨f, g⟩ = Σ_{i∈I, j∈J} ai bj K(xi, yj) = Σ_{j∈J} bj f(yj) = Σ_{i∈I} ai g(xi).
• ⟨·, ·⟩ does not depend on the representations of f and g.
• ⟨·, ·⟩ is bilinear and symmetric.
• ⟨·, ·⟩ is positive semi-definite since K is PDS: for any f,
⟨f, f⟩ = Σ_{i,j∈I} ai aj K(xi, xj) ≥ 0.
• Note: for any f1, ..., fm and c1, ..., cm,
Σ_{i,j=1}^{m} ci cj ⟨fi, fj⟩ = ⟨ Σ_{i=1}^{m} ci fi, Σ_{j=1}^{m} cj fj ⟩ ≥ 0.
Thus, ⟨·, ·⟩ is a PDS kernel on H0.
• ⟨·, ·⟩ is definite:
• first, a Cauchy-Schwarz inequality for PDS kernels: if K is PDS, then the matrix M = [K(x, x), K(x, y); K(y, x), K(y, y)] is SPSD for all x, y ∈ X. In particular, the product of its eigenvalues, det(M), is non-negative:
det(M) = K(x, x) K(y, y) − K(x, y)² ≥ 0.
• since ⟨·, ·⟩ is a PDS kernel, for any f ∈ H0 and x ∈ X,
⟨f, Φ(x)⟩² ≤ ⟨f, f⟩ ⟨Φ(x), Φ(x)⟩.
• observe the reproducing property of ⟨·, ·⟩:
∀f ∈ H0, ∀x ∈ X, f(x) = Σ_{i∈I} ai K(xi, x) = ⟨f, Φ(x)⟩.
Thus, [f(x)]² ≤ ⟨f, f⟩ K(x, x) for all x ∈ X, which shows the definiteness of ⟨·, ·⟩.
• Thus, ⟨·, ·⟩ defines an inner product on H0, which thereby becomes a pre-Hilbert space.
• H0 can be completed to form a Hilbert space H in which it is dense.
Notes:
• H is called the reproducing kernel Hilbert space (RKHS) associated to K.
• A Hilbert space such that there exists Φ: X → H with K(x, y) = Φ(x) · Φ(y) for all x, y ∈ X is also called a feature space associated to K. Φ is called a feature mapping.
• Feature spaces associated to K are in general not unique.
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels



SVMs with PDS Kernels
(Boser, Guyon, and Vapnik, 1992)
Constrained optimization (K(xi, xj) plays the role of Φ(xi) · Φ(xj)):
max_α Σ_{i=1}^{m} αi − (1/2) Σ_{i,j=1}^{m} αi αj yi yj K(xi, xj)
subject to: 0 ≤ αi ≤ C ∧ Σ_{i=1}^{m} αi yi = 0, i ∈ [1, m].
Solution:
h(x) = sgn( Σ_{i=1}^{m} αi yi K(xi, x) + b ),
with b = yi − Σ_{j=1}^{m} αj yj K(xj, xi) for any xi with 0 < αi < C.
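In practice this dual is what a kernel SVM solver optimizes; a minimal sketch (not part of the slides) of training with a precomputed Gaussian kernel matrix, assuming NumPy and scikit-learn:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] * X[:, 1])          # XOR-like labels, not linearly separable

def gram(A, B, sigma=1.0):
    """Gaussian Gram matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

clf = SVC(kernel="precomputed", C=1.0)  # solves the dual above over the alpha_i
clf.fit(gram(X, X), y)

X_test = rng.normal(size=(5, 2))
pred = clf.predict(gram(X_test, X))     # kernel values between test and training points
```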
Rad. Complexity of Kernel-Based Hypotheses

Theorem: let K: X × X → ℝ be a PDS kernel and let Φ: X → H be a feature mapping associated to K. Let S ⊆ {x : K(x, x) ≤ R²} be a sample of size m, and let H = {x ↦ w · Φ(x) : ∥w∥_H ≤ Λ}. Then,
R̂_S(H) ≤ Λ √(Tr[K]) / m ≤ √(R² Λ² / m).
Proof:
R̂_S(H) = (1/m) E_σ[ sup_{∥w∥ ≤ Λ} w · Σ_{i=1}^{m} σi Φ(xi) ] ≤ (Λ/m) E_σ[ ∥ Σ_{i=1}^{m} σi Φ(xi) ∥_H ]
≤ (Λ/m) [ E_σ ∥ Σ_{i=1}^{m} σi Φ(xi) ∥²_H ]^{1/2}      (Jensen's inequality)
= (Λ/m) [ Σ_{i=1}^{m} K(xi, xi) ]^{1/2} = Λ √(Tr[K]) / m ≤ √(R² Λ² / m).
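The empirical Rademacher complexity here equals (Λ/m) E_σ[√(σ⊤Kσ)], which can be estimated by Monte Carlo and compared with the bound Λ√(Tr[K])/m (a numerical sketch that is not part of the slides, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-d2 / 2.0)                   # Gaussian kernel matrix of the sample
m, Lam = len(X), 1.0

# sup_{||w|| <= Lam} w . sum_i s_i Phi(x_i) = Lam * sqrt(s^T K s)
sigmas = rng.choice([-1.0, 1.0], size=(10000, m))
rad_hat = Lam / m * np.mean(np.sqrt(np.einsum("si,ij,sj->s", sigmas, K, sigmas)))

bound = Lam * np.sqrt(np.trace(K)) / m
print(rad_hat, bound)                   # the estimate lies below the bound (Jensen's inequality)
```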
Generalization: Representer Theorem
(Kimeldorf and Wahba, 1971)
Theorem: let K: X × X → ℝ be a PDS kernel with H the corresponding RKHS. Then, for any non-decreasing function G: ℝ → ℝ and any L: ℝ^m → ℝ ∪ {+∞}, the problem
argmin_{h∈H} F(h) = argmin_{h∈H} G(∥h∥_H) + L(h(x1), ..., h(xm))
admits a solution of the form h* = Σ_{i=1}^{m} αi K(xi, ·).
If G is further assumed to be increasing, then any solution has this form.


• Proof: let H1 = span({K(xi, ·) : i ∈ [1, m]}). Any h ∈ H admits the decomposition h = h1 + h⊥ according to H = H1 ⊕ H1⊥.
• Since G is non-decreasing,
G(∥h1∥_H) ≤ G(√(∥h1∥²_H + ∥h⊥∥²_H)) = G(∥h∥_H).
• By the reproducing property, for all i ∈ [1, m],
h(xi) = ⟨h, K(xi, ·)⟩ = ⟨h1, K(xi, ·)⟩ = h1(xi).
• Thus, L(h(x1), ..., h(xm)) = L(h1(x1), ..., h1(xm)) and F(h1) ≤ F(h).
• If G is increasing, then F(h1) < F(h) when h⊥ ≠ 0, and any solution of the optimization problem must be in H1.


Kernel-Based Algorithms
PDS kernels used to extend a variety of algorithms
in classification and other areas:
• regression.
• ranking.
• dimensionality reduction.
• clustering.
But, how do we define PDS kernels?



This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels



Closure Properties of PDS Kernels
Theorem: Positive definite symmetric (PDS)
kernels are closed under:
• sum,
• product,
• tensor product,
• pointwise limit,
• composition with a power series with non-
negative coefficients.



Closure Properties - Proof
Proof: closure under sum:
(c⊤Kc ≥ 0) ∧ (c⊤K'c ≥ 0) ⇒ c⊤(K + K')c ≥ 0.
• Closure under product: writing K = MM⊤,
Σ_{i,j=1}^{m} ci cj (Kij K'ij) = Σ_{i,j=1}^{m} ci cj [ Σ_{k=1}^{m} Mik Mjk ] K'ij
= Σ_{k=1}^{m} [ Σ_{i,j=1}^{m} ci cj Mik Mjk K'ij ]
= Σ_{k=1}^{m} zk⊤ K' zk ≥ 0,   with zk = (c1 M1k, ..., cm Mmk)⊤.


• Closure under tensor product:
• definition: for all x1, x2, y1, y2 ∈ X,
(K1 ⊗ K2)(x1, y1, x2, y2) = K1(x1, x2) K2(y1, y2).
• thus, it is a PDS kernel as the product of the two kernels
(x1, y1, x2, y2) ↦ K1(x1, x2) and (x1, y1, x2, y2) ↦ K2(y1, y2).
• Closure under pointwise limit: if for all x, y ∈ X,
lim_{n→∞} Kn(x, y) = K(x, y),
then (∀n, c⊤Kn c ≥ 0) ⇒ lim_{n→∞} c⊤Kn c = c⊤Kc ≥ 0.


• Closure under composition with a power series:
• assumptions: K is a PDS kernel with |K(x, y)| < ρ for all x, y ∈ X, and f(x) = Σ_{n=0}^{∞} an xⁿ, an ≥ 0, is a power series with radius of convergence ρ.
• f ∘ K is a PDS kernel: Kⁿ is PDS by closure under product, Σ_{n=0}^{N} an Kⁿ is PDS by closure under sum, and f ∘ K follows by closure under pointwise limit.
Example: for any PDS kernel K, exp(K) is PDS.
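These closure properties can be sanity-checked on kernel matrices (a numerical sketch, not part of the slides, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K1 = np.exp(-d2 / 2.0)                 # Gaussian kernel matrix (PDS)
K2 = (X @ X.T + 1.0) ** 2              # polynomial kernel matrix (PDS)

def min_eig(K):
    return np.min(np.linalg.eigvalsh(K))

# Closure properties, checked up to round-off:
assert min_eig(K1 + K2) >= -1e-8       # sum
assert min_eig(K1 * K2) >= -1e-8       # (entrywise) product
assert min_eig(np.exp(K1)) >= -1e-8    # composition with the power series of exp
```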


This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels



Sequence Kernels
Definition: kernels defined over pairs of strings.
• Motivation: computational biology, text and speech classification.
• Idea: two sequences are related when they share some common substrings or subsequences.
• Example: bigram kernel,
K(x, y) = Σ_{bigram u} countx(u) × county(u).
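A direct implementation of the bigram kernel (a minimal sketch, not from the slides; the rational-kernel machinery below computes the same quantity via transducer composition):

```python
from collections import Counter

def bigram_counts(s):
    """Multiset of bigrams (length-2 substrings) of s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def bigram_kernel(x, y):
    """K(x, y) = sum over bigrams u of count_x(u) * count_y(u)."""
    cx, cy = bigram_counts(x), bigram_counts(y)
    return sum(cx[u] * cy[u] for u in cx)

print(bigram_kernel("abab", "bab"))   # 'ab': 2*1, 'ba': 1*1  ->  3
```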


Weighted Transducers
[Figure: a weighted transducer with states 0, 1, 2 and final state 3/0.1, with transitions such as a:b/0.1, b:a/0.2, a:b/0.5, a:a/0.4, b:a/0.3, b:a/0.6.]
T(x, y) = sum of the weights of all accepting paths with input x and output y.
T(abb, baa) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1.


Rational Kernels over Strings
(Cortes et al., 2004)
Definition: a kernel K: Σ* × Σ* → ℝ is rational if K = T for some weighted transducer T.
Definition: let T1: Σ* × Δ* → ℝ and T2: Δ* × Ω* → ℝ be two weighted transducers. Then, the composition of T1 and T2 is defined for all x ∈ Σ*, y ∈ Ω* by
(T1 ∘ T2)(x, y) = Σ_z T1(x, z) T2(z, y).
Definition: the inverse of a transducer T: Σ* × Δ* → ℝ is the transducer T⁻¹: Δ* × Σ* → ℝ obtained from T by swapping input and output labels.
PDS Rational Kernels
General Construction
Theorem: for any weighted transducer T: Σ* × Δ* → ℝ, the function K = T ∘ T⁻¹ is a PDS rational kernel.
Proof: by definition, for all x, y ∈ Σ*,
K(x, y) = Σ_z T(x, z) T(y, z).
• K is the pointwise limit of (Kn)_{n≥0} defined by
∀x, y ∈ Σ*, Kn(x, y) = Σ_{|z|≤n} T(x, z) T(y, z).
• Kn is PDS since for any sample (x1, ..., xm),
Kn = AA⊤ with A = (T(xi, zj))_{i∈[1,m], j∈[1,N]}, where z1, ..., zN enumerate the strings of length at most n.


PDS Sequence Kernels
PDS sequence kernels in computational biology, text classification, and other applications:
• special instances of PDS rational kernels.
• PDS rational kernels are easy to define and modify.
• single general algorithm for their computation: composition + shortest-distance computation.
• no need for a specific 'dynamic-programming' algorithm and proof for each kernel instance.
• general sub-family: based on counting transducers.
Counting Transducers
[Figure: counting transducer TX with a start state 0 and a final state 1/1, self-loops a:ε/1 and b:ε/1 on both states, and a transition X:X/1 from 0 to 1. Example: X = ab, Z = bbabaabba, with the two accepting alignments εεabεεεεε and εεεεεabεε.]
X may be a string or an automaton representing a regular expression.
Count of X in Z: sum of the weights of the accepting paths of Z ∘ TX.


Transducer Counting Bigrams
[Figure: bigram-counting transducer Tbigram with states 0, 1, and final state 2/1; self-loops a:ε/1 and b:ε/1 on states 0 and 2, and transitions a:a/1, b:b/1 from 0 to 1 and from 1 to 2.]
Counts of the bigram ab in Z given by Z ∘ Tbigram ∘ ab.


Transducer Counting Gappy Bigrams

[Figure: gappy-bigram-counting transducer Tgappy bigram, identical to Tbigram except that state 1 carries self-loops a:ε/λ and b:ε/λ penalizing the gap.]
Counts of the gappy bigram ab in Z given by Z ∘ Tgappy bigram ∘ ab, with gap penalty λ ∈ (0, 1).
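What this transducer computes can be reproduced by brute-force enumeration, each occurrence weighted by λ raised to the number of skipped symbols (a minimal sketch, not from the slides; the transducer approach obtains the same counts via composition and shortest-distance computation):

```python
def gappy_bigram_count(z, u, lam):
    """Weighted count of the gappy bigram u = u0 u1 in z: each occurrence
    contributes lam ** (number of symbols skipped between u0 and u1)."""
    total = 0.0
    for i, a in enumerate(z):
        if a != u[0]:
            continue
        for j in range(i + 1, len(z)):
            if z[j] == u[1]:
                total += lam ** (j - i - 1)
    return total

def gappy_bigram_kernel(x, y, lam=0.5, alphabet="ab"):
    """K(x, y) = sum over bigrams u of the gappy counts in x and y."""
    return sum(gappy_bigram_count(x, a + b, lam) * gappy_bigram_count(y, a + b, lam)
               for a in alphabet for b in alphabet)

print(gappy_bigram_count("abb", "ab", 0.5))   # adjacent 'ab' (1) + one-gap 'a_b' (0.5) = 1.5
```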


Composition
Theorem: the composition of two weighted transducers is also a weighted transducer.
Proof: constructive proof based on the composition algorithm.
• states identified with pairs.
• ε-free case: transitions defined by
E = ⋃_{(q1, a, b, w1, q2) ∈ E1, (q1', b, c, w2, q2') ∈ E2} { ((q1, q1'), a, c, w1 w2, (q2, q2')) }.
• general case: use of an intermediate ε-filter.
Composition Algorithm
ε-Free Case
[Figure: an example of ε-free composition of two weighted transducers; each state of the result is a pair of states of the operands, and each transition weight is the product of the weights of the matched transitions.]
Complexity: O(|T1| |T2|) in general, linear in some cases.
Redundant ε-Paths Problem
(Mohri, Pereira, and Riley, 1996; Pereira and Riley, 1997)
[Figure: composition with ε-transitions; the output ε's of T1 are marked as ε2 and the input ε's of T2 as ε1, yielding T̃1 and T̃2, and a filter transducer F removes the redundant ε-paths, so that the composition is computed as T = T̃1 ∘ F ∘ T̃2.]
Kernels for Other Discrete Structures
Similarly, PDS kernels can be defined on other
discrete structures:

• images,
• graphs,
• parse trees,
• automata,
• weighted automata.
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels



Questions
Gaussian kernels have the form exp(−d²) where d is a metric.
• For what other functions d does exp(−d²) define a PDS kernel?
• What other PDS kernels can we construct from a metric in a Hilbert space?


Negative Definite Kernels
(Schoenberg, 1938)
Definition: a function K: X × X → ℝ is said to be a negative definite symmetric (NDS) kernel if it is symmetric and if for all {x1, ..., xm} ⊆ X and c ∈ ℝ^{m×1} with 1⊤c = 0,
c⊤Kc ≤ 0.
Clearly, if K is PDS, then −K is NDS, but the converse does not hold in general.


Examples
The squared distance ∥x − y∥² in a Hilbert space H defines an NDS kernel. If Σ_{i=1}^{m} ci = 0,
Σ_{i,j=1}^{m} ci cj ∥xi − xj∥² = Σ_{i,j=1}^{m} ci cj (xi − xj) · (xi − xj)
= Σ_{i,j=1}^{m} ci cj (∥xi∥² + ∥xj∥² − 2 xi · xj)
= Σ_{i,j=1}^{m} ci cj (∥xi∥² + ∥xj∥²) − 2 Σ_{i=1}^{m} ci xi · Σ_{j=1}^{m} cj xj
≤ Σ_{i,j=1}^{m} ci cj (∥xi∥² + ∥xj∥²)
= Σ_{j=1}^{m} cj Σ_{i=1}^{m} ci ∥xi∥² + Σ_{i=1}^{m} ci Σ_{j=1}^{m} cj ∥xj∥² = 0.
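The defining inequality can be checked numerically on random coefficient vectors with zero sum (a sketch, not part of the slides, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 4))
K = ((X[:, None] - X[None]) ** 2).sum(-1)   # squared-distance matrix ||x_i - x_j||^2

# NDS condition: c^T K c <= 0 for every c with 1^T c = 0
for _ in range(1000):
    c = rng.normal(size=15)
    c -= c.mean()                           # project onto the hyperplane sum(c) = 0
    assert c @ K @ c <= 1e-8
```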


NDS Kernels - Property
(Schoenberg, 1938)
Theorem: let K: X × X → ℝ be an NDS kernel such that for all x, y ∈ X, K(x, y) = 0 iff x = y. Then, there exists a Hilbert space H and a mapping Φ: X → H such that
∀x, y ∈ X, K(x, y) = ∥Φ(x) − Φ(y)∥².
Thus, under the hypothesis of the theorem, √K defines a metric.


PDS and NDS Kernels
(Schoenberg, 1938)
Theorem: let K: X × X → ℝ be a symmetric kernel. Then:
• K is NDS iff exp(−tK) is a PDS kernel for all t > 0.
• Let K' be defined for any x0 by
K'(x, y) = K(x, x0) + K(y, x0) − K(x, y) − K(x0, x0)
for all x, y ∈ X. Then, K is NDS iff K' is PDS.
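Both statements can be illustrated in the NDS ⇒ PDS direction with the squared distance, which is NDS (a numerical sketch, not part of the slides, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 4))
K = ((X[:, None] - X[None]) ** 2).sum(-1)     # NDS kernel: squared distance

def min_eig(M):
    return np.min(np.linalg.eigvalsh(M))

# K NDS  =>  exp(-t K) PDS for every t > 0 (no significantly negative eigenvalue)
for t in (0.1, 1.0, 10.0):
    assert min_eig(np.exp(-t * K)) >= -1e-8

# K' construction with x0 = x_1:  K'(x, y) = K(x, x0) + K(y, x0) - K(x, y) - K(x0, x0)
Kp = K[:, [0]] + K[[0], :] - K - K[0, 0]
assert min_eig(Kp) >= -1e-8                   # K NDS  =>  K' PDS
```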


Example
The kernel defined by K(x, y) = exp(−t∥x − y∥²) is PDS for all t > 0 since ∥x − y∥² is NDS.
The kernel exp(−|x − y|^p) is not PDS for p > 2. Otherwise, for any t > 0, {x1, ..., xm} ⊆ X and c ∈ ℝ^{m×1},
Σ_{i,j=1}^{m} ci cj e^{−t|xi − xj|^p} = Σ_{i,j=1}^{m} ci cj e^{−|t^{1/p} xi − t^{1/p} xj|^p} ≥ 0.
This would imply that |x − y|^p is NDS for p > 2, but that cannot be (see past homework assignments).


Conclusion
PDS kernels:
• rich mathematical theory and foundation.
• general idea for extending many linear
algorithms to non-linear prediction.
• flexible method: any PDS kernel can be used.
• widely used in modern algorithms and
applications.
• can we further learn a PDS kernel and a hypothesis based on that kernel from labeled data? (see tutorial: http://www.cs.nyu.edu/~mohri/icml2011-tutorial/).
References
• N. Aronszajn, Theory of Reproducing Kernels, Trans. Amer. Math. Soc., 68, 337-404, 1950.

• Peter Bartlett and John Shawe-Taylor. Generalization performance of support vector


machines and other pattern classifiers. In Advances in kernel methods: support vector learning,
pages 43–54. MIT Press, Cambridge, MA, USA, 1999.

• Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic Analysis on
Semigroups. Springer-Verlag: Berlin-New York, 1984.

• Bernhard Boser, Isabelle M. Guyon, and Vladimir Vapnik. A training algorithm for optimal
margin classifiers. In proceedings of COLT 1992, pages 144-152, Pittsburgh, PA, 1992.

• Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Rational Kernels: Theory and
Algorithms. Journal of Machine Learning Research (JMLR), 5:1035-1062, 2004.

• Corinna Cortes and Vladimir Vapnik, Support-Vector Networks, Machine Learning, 20,
1995.

• Kimeldorf, G. and Wahba, G. Some results on Tchebycheffian Spline Functions, J. Mathematical


Analysis and Applications, 33, 1 (1971) 82-95.



References
• James Mercer. Functions of Positive and Negative Type, and Their Connection with the Theory of Integral Equations. In Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, Vol. 83, No. 559, pp. 69-70, 1909.

• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and Speech Processing, In Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language. Budapest, Hungary, 1996.

• Fernando C. N. Pereira and Michael D. Riley. Speech Recognition by Composition of


Weighted Finite Automata. In Finite-State Language Processing, pages 431-453. MIT Press,
1997.

• I. J. Schoenberg, Metric Spaces and Positive Definite Functions. Transactions of the American Mathematical Society, Vol. 44, No. 3, pp. 522-536, 1938.

• Vladimir N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer, Berlin, 1982.

• Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

• Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.


Appendix
Mercer’s Condition
(Mercer, 1909)
Theorem: let X ⊆ ℝ^N be a compact set and let K: X × X → ℝ be in L∞(X × X) and symmetric. Then, K admits a uniformly convergent expansion
K(x, y) = Σ_{n=0}^{∞} an φn(x) φn(y), with an > 0,
iff for any function c in L2(X),
∫_{X×X} c(x) c(y) K(x, y) dx dy ≥ 0.


SVMs with PDS Kernels
Constrained optimization (vector form, with ∘ denoting the Hadamard product):
max_α 2 1⊤α − (α ∘ y)⊤ K (α ∘ y)
subject to: 0 ≤ α ≤ C ∧ α⊤y = 0.
Solution:
h = sgn( Σ_{i=1}^{m} αi yi K(xi, ·) + b ),
with b = yi − (α ∘ y)⊤ K ei for any xi with 0 < αi < C.
