ML Kernel Methods
Kernel Methods
Mehryar Mohri
Courant Institute and Google Research
mohri@cims.nyu.edu
Motivation
Efficient computation of inner products in high
dimension.
Non-linear decision boundary.
Non-vectorial inputs.
Flexible selection of more complex features.
$\sum_{i,j=1}^{m} c_i c_j K(x_i, x_j) \ge 0.$
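A PDS kernel requires $\sum_{i,j=1}^{m} c_i c_j K(x_i, x_j) \ge 0$ for every finite sample and every coefficient vector. The following is a minimal sketch of a randomized spot check of that condition, assuming a Gaussian kernel as the example. The helper names (`gaussian_kernel`, `gram_matrix`, `quadratic_form`, `passes_pds_spot_check`) are illustrative and not from the lecture. Passing the check is only a necessary condition, not a proof.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)), used here as a familiar PDS example."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(kernel, points):
    """Kernel matrix [K(x_i, x_j)] for a finite sample."""
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, c):
    """Value of sum_{i,j} c_i c_j K[i][j]."""
    m = len(c)
    return sum(c[i] * c[j] * K[i][j] for i in range(m) for j in range(m))

def passes_pds_spot_check(kernel, points, trials=200, tol=1e-9, seed=0):
    """Return False as soon as some random c gives a clearly negative quadratic form."""
    rng = random.Random(seed)
    K = gram_matrix(kernel, points)
    for _ in range(trials):
        c = [rng.uniform(-1.0, 1.0) for _ in points]
        if quadratic_form(K, c) < -tol:
            return False
    return True

points = [(0.0, 0.0), (1.0, 0.5), (-0.3, 2.0), (1.5, -1.0)]
ok_gaussian = passes_pds_spot_check(gaussian_kernel, points)
```

A kernel such as the negated squared distance fails the condition already for $c = (1, \dots, 1)$, which makes it a convenient counterexample when trying the helpers.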
Sigmoid Kernels:
$\langle \cdot, \cdot \rangle$ is a PDS kernel on $H_0$.
Mehryar Mohri - Foundations of Machine Learning page 12
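• $\langle \cdot, \cdot \rangle$ is definite:
• by the reproducing property, for any $f = \sum_{i \in I} a_i K(x_i, \cdot)$ in $H_0$ and any $x \in X$, $f(x) = \sum_{i \in I} a_i K(x_i, x) = \langle f, K(x, \cdot) \rangle$.
• Thus, by the Cauchy-Schwarz inequality for PDS kernels, $[f(x)]^2 \le \langle f, f \rangle K(x, x)$ for all $x \in X$, which
shows the definiteness of $\langle \cdot, \cdot \rangle$: if $\langle f, f \rangle = 0$, then $f(x) = 0$ for all $x$.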
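The bound $[f(x)]^2 \le \langle f, f \rangle K(x, x)$ can be checked numerically. The sketch below assumes the standard construction in which $f = \sum_i a_i K(x_i, \cdot)$ and $\langle f, f \rangle = \sum_{i,j} a_i a_j K(x_i, x_j)$. The Gaussian kernel, the centers and the coefficients are arbitrary illustrative choices, not values from the lecture.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """One-dimensional Gaussian kernel, used only as an example PDS kernel."""
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

# Hypothetical expansion f = sum_i a_i K(x_i, .)
centers = [-1.0, 0.0, 2.0]
coeffs = [0.5, -1.2, 0.8]

def f(x):
    """Evaluate f(x) = sum_i a_i K(x_i, x)."""
    return sum(a * gaussian_kernel(xi, x) for a, xi in zip(coeffs, centers))

def squared_norm():
    """<f, f> = sum_{i,j} a_i a_j K(x_i, x_j) under the assumed inner product."""
    return sum(
        ai * aj * gaussian_kernel(xi, xj)
        for ai, xi in zip(coeffs, centers)
        for aj, xj in zip(coeffs, centers)
    )

def bound_holds(x, tol=1e-12):
    """Check [f(x)]^2 <= <f, f> K(x, x) up to rounding."""
    return f(x) ** 2 <= squared_norm() * gaussian_kernel(x, x) + tol

# Evaluate the bound on a grid of test points in [-5, 5].
checks = [bound_holds(k / 10.0) for k in range(-50, 51)]
```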
• Thus, $\langle \cdot, \cdot \rangle$ defines an inner product on $H_0$, which
thereby becomes a pre-Hilbert space.
• $H_0$ can be completed to form a Hilbert space $H$ in
which it is dense.
Notes:
• H is called the reproducing kernel Hilbert space
(RKHS) associated to K.
• A Hilbert space $H$ such that there exists $\Phi: X \to H$
with $K(x, y) = \Phi(x) \cdot \Phi(y)$ for all $x, y \in X$ is also
called a feature space associated to $K$. $\Phi$ is called
a feature mapping.
• Feature spaces associated to K are in general not
unique.
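To illustrate that feature spaces are not unique, the sketch below takes the degree-2 polynomial kernel $K(x, y) = (x \cdot y)^2$ on $\mathbb{R}^2$ as an assumed example and compares two different feature mappings, one into $\mathbb{R}^3$ and one into $\mathbb{R}^4$. The function names are illustrative only.

```python
import math

def poly2_kernel(x, y):
    """K(x, y) = (x . y)^2 for two-dimensional inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi_3d(x):
    """Feature mapping into R^3: (x1^2, sqrt(2) x1 x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def phi_4d(x):
    """Feature mapping into R^4: (x1^2, x1 x2, x2 x1, x2^2)."""
    return (x[0] ** 2, x[0] * x[1], x[1] * x[0], x[1] ** 2)

def dot(u, v):
    """Euclidean inner product of two tuples."""
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
values = (
    poly2_kernel(x, y),
    dot(phi_3d(x), phi_3d(y)),
    dot(phi_4d(x), phi_4d(y)),
)
```

All three entries of `values` agree up to rounding, although the two mappings land in spaces of different dimension.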
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels
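Then, $(\forall n,\ \mathbf{c}^\top \mathbf{K}_n \mathbf{c} \ge 0) \Rightarrow \lim_{n \to \infty} \mathbf{c}^\top \mathbf{K}_n \mathbf{c} = \mathbf{c}^\top \mathbf{K} \mathbf{c} \ge 0.$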
• Example: bigram kernel;
$K(x, y) = \sum_{\text{bigram } u} \text{count}_x(u) \times \text{count}_y(u).$
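The bigram kernel sums, over all bigrams $u$, the product of the number of occurrences of $u$ in $x$ and in $y$. A minimal sketch for plain Python strings follows. It counts overlapping character bigrams directly and does not use the transducer-based computation; `bigram_counts` and `bigram_kernel` are illustrative names.

```python
from collections import Counter

def bigram_counts(s):
    """Count overlapping bigrams (substrings of length 2) in s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def bigram_kernel(x, y):
    """K(x, y) = sum over bigrams u of count_x(u) * count_y(u)."""
    cx, cy = bigram_counts(x), bigram_counts(y)
    # Counter returns 0 for bigrams that never occur in y.
    return sum(n * cy[u] for u, n in cx.items())

k_xy = bigram_kernel("abab", "ab")    # "ab" occurs twice in x, once in y
k_xx = bigram_kernel("abab", "abab")  # 2*2 for "ab" plus 1*1 for "ba"
```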
[Figure: weighted transducer with pair states (0, 0), (1, 1), (2, 1), (3, 1), (0, 1), (3, 2) and transitions labeled a:a/.04, a:a/.02, a:b/.18, a:b/.01, b:a/.06, a:a/0.1, a:b/.24.]
Redundant ε-Paths Problem
[Figure: transducers $T_1$ (0 -a:a-> 1 -b:ε-> 2 -c:ε-> 3 -d:d-> 4) and $T_2$ (0 -a:d-> 1 -ε:e-> 2 -d:a-> 3), with panels labeled $A'$ and $B'$ and the grid of ε-paths leading to state (4,3).]
(MM, Pereira, and Riley, 1996; Pereira and Riley, 1997)
$T = \tilde{T}_1 \circ F \circ \tilde{T}_2.$
Kernels for Other Discrete Structures
Similarly, PDS kernels can be defined on other
discrete structures:
• images,
• graphs,
• parse trees,
• automata,
• weighted automata.
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels
with $\mathbf{1}^\top \mathbf{c} = 0$,
$\mathbf{c}^\top \mathbf{K} \mathbf{c} \le 0.$
defines a metric.
$\sum_{i,j=1}^{m} c_i c_j e^{-t|x_i - x_j|^p} = \sum_{i,j=1}^{m} c_i c_j e^{-|t^{1/p} x_i - t^{1/p} x_j|^p} \ge 0.$
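The rescaling identity $e^{-t|x_i - x_j|^p} = e^{-|t^{1/p} x_i - t^{1/p} x_j|^p}$ and the two sign conditions can be checked numerically. The sketch below assumes $p = 2$ on the real line, where $e^{-|x - y|^2}$ is the Gaussian kernel and $|x - y|^2$ is negative definite symmetric. The sample, the value of $t$ and all function names are illustrative choices.

```python
import math
import random

def exp_kernel(x, y, t=1.0, p=2.0):
    """exp(-t |x - y|^p) on the real line."""
    return math.exp(-t * abs(x - y) ** p)

def quadratic_form(kernel, xs, c):
    """sum_{i,j} c_i c_j kernel(x_i, x_j)."""
    return sum(
        ci * cj * kernel(xi, xj)
        for ci, xi in zip(c, xs)
        for cj, xj in zip(c, xs)
    )

rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(5)]
c = [rng.uniform(-1.0, 1.0) for _ in xs]
t, p = 0.7, 2.0

# Left side: kernel with parameter t on the original points.
lhs = quadratic_form(lambda x, y: exp_kernel(x, y, t=t, p=p), xs, c)

# Right side: kernel with t = 1 on the points rescaled by t^(1/p).
scale = t ** (1.0 / p)
rescaled = [scale * x for x in xs]
rhs = quadratic_form(lambda x, y: exp_kernel(x, y, t=1.0, p=p), rescaled, c)

# NDS condition for |x - y|^p: coefficients must sum to zero.
mean_c = sum(c) / len(c)
c0 = [ci - mean_c for ci in c]
nds_value = quadratic_form(lambda x, y: abs(x - y) ** p, xs, c0)
```

Here `lhs` and `rhs` agree up to rounding and are non-negative, while `nds_value` is non-positive.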
• Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic Analysis on
Semigroups. Springer-Verlag: Berlin-New York, 1984.
• Bernhard Boser, Isabelle M. Guyon, and Vladimir Vapnik. A training algorithm for optimal
margin classifiers. In Proceedings of COLT 1992, pages 144-152, Pittsburgh, PA, 1992.
• Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Rational Kernels: Theory and
Algorithms. Journal of Machine Learning Research (JMLR), 5:1035-1062, 2004.
• Corinna Cortes and Vladimir Vapnik. Support-Vector Networks. Machine Learning, 20,
1995.
• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and
Speech Processing. In Proceedings of the 12th biennial European Conference on Artificial
Intelligence (ECAI-96), Workshop on Extended finite state models of language. Budapest,
Hungary, 1996.
• I. J. Schoenberg. Metric Spaces and Positive Definite Functions. Transactions of the American
Mathematical Society, Vol. 44, No. 3, pp. 522-536, 1938.
$\int_{X \times X} c(x) c(y) K(x, y) \, dx \, dy \ge 0.$
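$\max_{\boldsymbol{\alpha}} \ 2 \, \mathbf{1}^\top \boldsymbol{\alpha} - (\boldsymbol{\alpha} \circ \mathbf{y})^\top \mathbf{K} (\boldsymbol{\alpha} \circ \mathbf{y})$
subject to: $\mathbf{0} \le \boldsymbol{\alpha} \le \mathbf{C} \ \wedge \ \boldsymbol{\alpha}^\top \mathbf{y} = 0.$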
Solution:
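$h = \mathrm{sgn}\left( \sum_{i=1}^{m} \alpha_i y_i K(x_i, \cdot) + b \right),$
with $b = y_i - (\boldsymbol{\alpha} \circ \mathbf{y})^\top \mathbf{K} \mathbf{e}_i$ for any $x_i$ with
$0 < \alpha_i < C$.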
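Given dual variables $\alpha$, the hypothesis $h = \mathrm{sgn}(\sum_i \alpha_i y_i K(x_i, \cdot) + b)$ and the offset $b = y_i - \sum_j \alpha_j y_j K(x_j, x_i)$ can be evaluated directly. The sketch below does not solve the dual problem; it assumes $\alpha$ is already known. It uses a hypothetical two-point sample with a linear kernel, for which $\alpha = (1/2, 1/2)$ maximizes $2 \, \mathbf{1}^\top \alpha - (\alpha \circ y)^\top K (\alpha \circ y)$ under the constraints with $C = 1$. All function names are illustrative, and returning $+1$ for a zero score is a convention chosen here.

```python
def linear_kernel(x, y):
    """Plain inner product, used as the example kernel."""
    return sum(a * b for a, b in zip(x, y))

def dual_objective(alpha, y, X, kernel):
    """2 * 1^T alpha - (alpha o y)^T K (alpha o y)."""
    m = len(X)
    quad = sum(
        alpha[i] * y[i] * alpha[j] * y[j] * kernel(X[i], X[j])
        for i in range(m)
        for j in range(m)
    )
    return 2.0 * sum(alpha) - quad

def offset(alpha, y, X, kernel, C, tol=1e-12):
    """b = y_i - sum_j alpha_j y_j K(x_j, x_i) for some i with 0 < alpha_i < C."""
    m = len(X)
    for i in range(m):
        if tol < alpha[i] < C - tol:
            return y[i] - sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(m))
    raise ValueError("no index with 0 < alpha_i < C")

def hypothesis(alpha, y, X, kernel, b):
    """Return h(x) = sgn(sum_i alpha_i y_i K(x_i, x) + b)."""
    def h(x):
        score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
        return 1 if score >= 0 else -1
    return h

# Hypothetical toy sample: one positive and one negative point on the line.
X = [(1.0,), (-1.0,)]
y = [1, -1]
alpha = [0.5, 0.5]
C = 1.0

b = offset(alpha, y, X, linear_kernel, C)
h = hypothesis(alpha, y, X, linear_kernel, b)
```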