
4dfefbbefe1360fd931c6b1768fa9d39.ppt
- Number of slides: 17
Slide 1: Title
The Hebrew University, Jerusalem, Israel
Online Learning with a Memory Harness using the Forgetron
Shai Shalev-Shwartz, joint work with Ofer Dekel and Yoram Singer
Large Scale Kernel Machines workshop, NIPS 2005, Whistler
Slide 2: Overview
• Online learning with kernels
• Goal: a strict limit on the number of "support vectors"
• The Forgetron algorithm
• Analysis
• Experiments
Slides 3–4: Kernel-Based Perceptron for Online Learning
• On each round t, the online learner predicts ŷ_t = sign(f_t(x_t)) using the current classifier f_t(x) = Σ_{i ∈ I} y_i K(x_i, x)
• The active set I holds the indices of past mistakes (e.g., I = {1, 3} grows to I = {1, 3, 4} after a mistake on round 4)
• |I| therefore equals M, the number of mistakes made so far
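The prediction and update rule on these slides can be sketched in code. This is a minimal illustration, not the authors' implementation; the Gaussian kernel and all names are my assumptions, chosen so that K(x, x) ≤ 1 holds, as the mistake bound on a later slide requires.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    # Hypothetical choice of kernel; note K(x, x) = 1 for any x.
    return np.exp(-gamma * np.dot(a - b, a - b))

class KernelPerceptron:
    """Online kernel Perceptron: predicts sign(f_t(x)) with
    f_t(x) = sum_{i in I} y_i K(x_i, x), and adds (x_t, y_t) to the
    active set I only when the prediction is a mistake."""

    def __init__(self, kernel=gaussian_kernel):
        self.kernel = kernel
        self.support = []  # active set I, stored as (x_i, y_i) pairs

    def score(self, x):
        return sum(y_i * self.kernel(x_i, x) for x_i, y_i in self.support)

    def step(self, x, y):
        """Process one example (x, y) with y in {-1, +1}; return True on a mistake."""
        mistake = np.sign(self.score(x)) != y  # sign(0) counts as a mistake
        if mistake:
            self.support.append((x, y))
        return mistake
```

The active set grows by one on every mistake, which is exactly the unbounded-memory problem the next slide raises.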
Slide 5: Learning on a Budget
• |I| = number of mistakes made up to round t
• Memory- and time-inefficient: |I| may grow without bound
• Goal: construct a kernel-based online algorithm for which:
  • |I| ≤ B on every round t
  • it still performs "well", i.e., comes with a performance guarantee
Slide 6: Mistake Bound for the Perceptron
• {(x_1, y_1), …, (x_T, y_T)}: a sequence of examples
• A kernel K such that K(x_t, x_t) ≤ 1
• g: a fixed competitor classifier in the RKHS
• Define the hinge loss ℓ_t(g) = max(0, 1 − y_t g(x_t))
• Then the number of mistakes M satisfies M ≤ ‖g‖² + 2 Σ_t ℓ_t(g)
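The loss ℓ_t(g) used in the bound is the hinge loss; a tiny helper (the function name is mine, not from the slides) makes its behavior concrete:

```python
def hinge_loss(y, g_of_x):
    # l_t(g) = max(0, 1 - y_t * g(x_t)): zero exactly when g predicts
    # the correct label y with a margin of at least 1.
    return max(0.0, 1.0 - y * g_of_x)
```

A confident correct prediction incurs zero loss, a correct but low-margin prediction incurs a small loss, and a wrong prediction incurs a loss greater than 1, which is why Σ_t ℓ_t(g) upper-bounds (twice, in the theorem) the mistakes of a good competitor.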
Slide 7: Previous Work
• Previous online budget algorithms do not come with a mistake bound:
  • Crammer, Kandola, Singer (2003)
  • Kivinen, Smola, Williamson (2004)
  • Weston, Bordes, Bottou (2005)
• Is our goal attainable at all?
Slide 8: Mission Impossible
• Input space: the standard basis vectors {e_1, …, e_{B+1}}
• Linear kernel: K(e_i, e_j) = e_i · e_j = δ_{i,j}
• Budget constraint |I| ≤ B: hence there always exists a j with Σ_{i ∈ I} α_i K(e_i, e_j) = 0
• An adversary can present that e_j, so a budget algorithm might err on every round
• But the competitor g = Σ_i e_i never errs!
• The (unbudgeted) Perceptron makes B+1 mistakes
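The counterexample can be checked mechanically. A small sketch (B = 3 is an arbitrary choice): since K(e_i, e_j) = δ_{i,j}, the score Σ_{i ∈ I} α_i K(e_i, e_j) is zero for any weights whenever j ∉ I, so every budget-B active set leaves some basis vector with score exactly zero, while g = Σ_i e_i classifies every (e_j, +1) with margin 1.

```python
import itertools

import numpy as np

B = 3                  # budget; the input space is e_1, ..., e_{B+1}
E = np.eye(B + 1)      # row E[j] is the basis vector e_j

def uncovered(I):
    """Basis indices whose score is exactly 0 for active set I under the
    linear kernel K(e_i, e_j) = delta_ij (true for any weights alpha_i)."""
    return [j for j in range(B + 1) if all(E[i] @ E[j] == 0 for i in I)]

# Every size-B active set leaves at least one basis vector uncovered,
# so an adversary can force a mistake on every round.
all_sets_leak = all(
    uncovered(I) for I in itertools.combinations(range(B + 1), B))

# The competitor g = sum_i e_i has margin 1 on every (e_j, +1), never errs,
# and has squared norm exactly B + 1.
g = E.sum(axis=0)
```

This is what motivates the next slide: the only escape is to restrict the competitor's norm below √(B+1).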
Slide 9: Redefine the Goal
• We must restrict the competitor g somehow; one way is to restrict ‖g‖
• The counterexample implies that we cannot compete with any g with ‖g‖ ≥ (B+1)^{1/2}
• Main result: the Forgetron algorithm can compete with any classifier g such that ‖g‖ ≤ ¼ √((B+1) / log(B+1))
Slide 10: The Forgetron
• Current classifier: f_t(x) = Σ_{i ∈ I} σ_i y_i K(x_i, x)
• Update on a mistake at round t:
  • Step (1) – Perceptron: add the new example, I′ = I ∪ {t}
  • Step (2) – Shrinking: scale every coefficient, σ_i ← φ_t σ_i
  • Step (3) – Removal: if |I′| > B, remove the oldest example, r = min I′
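The three-step update can be sketched as follows. This is a simplified illustration under a loud assumption: the shrinking coefficient is a fixed constant `phi` here, whereas the actual Forgetron chooses φ_t per round (that choice is the subject of the analysis slides); kernel and names are mine.

```python
from collections import deque

import numpy as np

def linear_kernel(a, b):
    # Illustrative kernel choice; any Mercer kernel could be substituted.
    return float(np.dot(a, b))

class Forgetron:
    """Sketch of the Forgetron's three-step update, triggered on a mistake:
      (1) Perceptron step: add (x_t, y_t) to the active set with weight 1
      (2) Shrinking: scale every weight by phi in (0, 1]
      (3) Removal: if the active set exceeds the budget B, drop the oldest
    Assumption: phi is a fixed constant here; the real algorithm tunes
    phi_t each round to keep the cumulative deviation from g controlled."""

    def __init__(self, budget, kernel=linear_kernel, phi=0.9):
        self.B = budget
        self.kernel = kernel
        self.phi = phi
        self.support = deque()  # oldest-first: (x_i, y_i, weight sigma_i)

    def score(self, x):
        return sum(s * y * self.kernel(xi, x) for xi, y, s in self.support)

    def step(self, x, y):
        """Process one example; return True iff a mistake was made."""
        if np.sign(self.score(x)) == y:
            return False
        self.support.append((x, y, 1.0))                           # (1) Perceptron
        self.support = deque(
            (xi, yi, self.phi * s) for xi, yi, s in self.support)  # (2) shrinking
        if len(self.support) > self.B:
            self.support.popleft()                                 # (3) remove oldest
        return True
```

Using a `deque` makes "remove the oldest" (r = min I) a constant-time `popleft`, and the active set can never exceed B entries.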
Slide 12: Quantifying Deviation
• "Progress" measure: Δ_t = ‖f_t − g‖² − ‖f_{t+1} − g‖²
• Δ_t decomposes into the progress of each update step:
  • ‖f_t − g‖² − ‖f′ − g‖² after the Perceptron step
  • ‖f′ − g‖² − ‖f″ − g‖² after shrinking
  • ‖f″ − g‖² − ‖f_{t+1} − g‖² after removal
• "Deviation" is measured by negative progress
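The three per-step differences above telescope within a round, and the per-round terms Δ_t telescope across the whole sequence; since f_1 = 0, the total progress toward g can never exceed ‖g‖². A sketch of this standard step in LaTeX:

```latex
\Delta_t = \|f_t - g\|^2 - \|f_{t+1} - g\|^2
         = \underbrace{\|f_t - g\|^2 - \|f' - g\|^2}_{\text{Perceptron step}}
         + \underbrace{\|f' - g\|^2 - \|f'' - g\|^2}_{\text{shrinking}}
         + \underbrace{\|f'' - g\|^2 - \|f_{t+1} - g\|^2}_{\text{removal}},
\qquad
\sum_{t=1}^{T} \Delta_t = \|f_1 - g\|^2 - \|f_{T+1} - g\|^2 \le \|g\|^2 .
```

The gain and damage terms on the next slide bound these three differences individually.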
Slide 13: Quantifying Deviation (cont.)
• Gain from the Perceptron step
• Damage from shrinking
• Damage from removal
• The Forgetron sets the shrinking coefficient φ_t to balance these terms
Slide 14: Resulting Mistake Bound
• For any g with ‖g‖ ≤ ¼ √((B+1) / log(B+1)), the number of prediction mistakes the Forgetron makes is bounded in terms of ‖g‖², Σ_t ℓ_t(g), and the budget B
Slide 22: Experiment I – the MNIST dataset
Slide 23: Experiment II – Census-Income (Adult) … (the Perceptron makes 16,000 mistakes)
Slide 24: Experiment III – synthetic data with label noise
Slide 25: Summary
• No budget algorithm can compete with arbitrary hypotheses
• The Forgetron can compete with norm-bounded hypotheses
• It works well in practice
• It does not require any parameters
• Future work: the Forgetron for batch learning