Title: CS621: Artificial Intelligence, Lecture 15: Perceptron Training (contd.), Proof of Convergence of PTA
- Pushpak Bhattacharyya
- Computer Science and Engineering Department
- IIT Bombay
Perceptron Training Algorithm (PTA)
- Preprocessing
- 1. The computation law is modified to:
-    y = 1 if Σ wi·xi > θ
-    y = 0 if Σ wi·xi < θ
- [Figure: threshold unit with inputs x1 … xn, weights w1 … wn and threshold θ]
PTA Preprocessing (contd.)
- 2. Absorb θ as a weight: set w0 = θ and prepend x0 = -1 to every input vector, so the test becomes Σ wi·xi > 0 (summing from i = 0).
- 3. Negate all the zero-class examples, so that w·X > 0 becomes the single test for every example.
Example to Demonstrate Preprocessing
- OR perceptron
- 1-class: <1,1>, <1,0>, <0,1>
- 0-class: <0,0>
- Augmented x vectors:
- 1-class: <-1,1,1>, <-1,1,0>, <-1,0,1>
- 0-class: <-1,0,0>
- Negate 0-class: <1,0,0>
Example to Demonstrate Preprocessing (contd.)
- Now the vectors are:

      x0   x1   x2
  X1  -1    0    1
  X2  -1    1    0
  X3  -1    1    1
  X4   1    0    0
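The three preprocessing steps can be sketched in Python as follows (the function name `preprocess` is my own; the data is the OR example above):

```python
def preprocess(one_class, zero_class):
    """Augment each vector with x0 = -1 (absorbing the threshold
    theta as weight w0), then negate the zero-class examples so
    that w.X > 0 becomes the single test for every example."""
    augmented_one = [(-1,) + x for x in one_class]
    # negate (-1, x1, ..., xn) component-wise for the zero class
    negated_zero = [tuple(-v for v in (-1,) + x) for x in zero_class]
    return augmented_one + negated_zero

# OR example from the slides
X = preprocess(one_class=[(1, 1), (1, 0), (0, 1)], zero_class=[(0, 0)])
print(X)  # → [(-1, 1, 1), (-1, 1, 0), (-1, 0, 1), (1, 0, 0)]
```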
Perceptron Training Algorithm
- 1. Start with a random value of w, e.g. <0,0,0>.
- 2. Test for w·Xi > 0.
- If the test succeeds for i = 1, 2, …, n, then return w.
- 3. Otherwise modify w: w_next = w_prev + X_fail.
- 4. Goto 2.
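A minimal sketch of this loop (the function name `pta` is my own; after every weight update, testing restarts from the first vector, which is what "Goto 2" implies):

```python
def pta(X, w=None):
    """Perceptron Training Algorithm on preprocessed vectors X:
    repeat until w.Xi > 0 for every Xi."""
    w = list(w) if w is not None else [0] * len(X[0])  # e.g. start at <0,0,0>
    while True:
        for x in X:                       # step 2: test each vector in turn
            if sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + xi for wi, xi in zip(w, x)]  # w_next = w_prev + X_fail
                break                     # goto step 2
        else:
            return w                      # all tests succeeded

# Preprocessed OR example (X1..X4 from the previous slide)
X = [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)]
print(pta(X))  # → [1, 2, 2]
```

With this scan order the run reproduces the trace on the next slide, ending at w = <1,2,2>.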
Tracing PTA on the OR Example
- w = <0,0,0>; w·X1 fails
- w = <-1,0,1>; w·X4 fails
- w = <0,0,1>; w·X2 fails
- w = <-1,1,1>; w·X4 fails
- w = <0,1,1>; w·X4 fails
- w = <1,1,1>; w·X1 fails
- w = <0,1,2>; w·X4 fails
- w = <1,1,2>; w·X2 fails
- w = <0,2,2>; w·X4 fails
- w = <1,2,2>; success
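As a quick sanity check (not part of the original slides), the final weight vector from the trace can be verified to satisfy w·Xi > 0 on all four preprocessed vectors:

```python
w = (1, 2, 2)  # final weight vector from the trace
X = [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)]
dots = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
print(dots)  # → [1, 1, 3, 1], all strictly positive
```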
Proof of Convergence of PTA
- Statement:
- Whatever the initial choice of weights and whatever the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.
Proof of Convergence of PTA
- Suppose wn is the weight vector at the nth step of the algorithm.
- At the beginning, the weight vector is w0.
- Go from wi to wi+1 when a vector Xj fails the test wi·Xj > 0, and update wi as: wi+1 = wi + Xj.
- Since the Xj's form a linearly separable function,
- ∃ w* s.t. w*·Xj > 0 ∀ j.
Proof of Convergence of PTA
- Consider the expression
-   G(wn) = (wn·w*) / |wn|
- where wn = weight at the nth iteration.
-   G(wn) = |wn|·|w*|·cos θ / |wn|
- where θ = angle between wn and w*.
-   G(wn) = |w*|·cos θ
-   G(wn) ≤ |w*|   (as -1 ≤ cos θ ≤ 1)
Behavior of the Numerator of G
- wn·w* = (wn-1 + Xfail(n-1))·w*
-        = wn-1·w* + Xfail(n-1)·w*
-        = (wn-2 + Xfail(n-2))·w* + Xfail(n-1)·w* = …
-        = w0·w* + (Xfail(0) + Xfail(1) + … + Xfail(n-1))·w*
- w*·Xfail(i) is always positive: note carefully.
- Let δ = min over j of w*·Xj, the minimum margin; δ > 0 since every w*·Xj is positive.
- Numerator of G ≥ w0·w* + n·δ
- So, the numerator of G grows with n.
Behavior of the Denominator of G
- |wn| = √(wn·wn)
- |wn|² = (wn-1 + Xfail(n-1))²
-        = |wn-1|² + 2·wn-1·Xfail(n-1) + |Xfail(n-1)|²
-        ≤ |wn-1|² + |Xfail(n-1)|²   (as wn-1·Xfail(n-1) ≤ 0, since the test failed)
-        ≤ |w0|² + |Xfail(0)|² + |Xfail(1)|² + … + |Xfail(n-1)|²
- |Xj| ≤ ε, where ε is the maximum magnitude.
- So, the denominator |wn| ≤ √(|w0|² + n·ε²).
Some Observations
- The numerator of G grows as n.
- The denominator of G grows as √n.
- ⇒ the numerator grows faster than the denominator.
- If PTA does not terminate, the G(wn) values will become unbounded.
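These growth rates can be checked numerically on the OR trace; here w* = <1,2,2> serves as the separating vector (an illustrative choice, since any vector with w*·Xj > 0 for all j would do):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w_star = (1, 2, 2)        # separating vector: w_star.Xj > 0 for all j
# Failing vectors in the order of the trace: X1, X4, X2, X4, X4, X1, X4, X2, X4
fails = [(-1, 0, 1), (1, 0, 0), (-1, 1, 0), (1, 0, 0), (1, 0, 0),
         (-1, 0, 1), (1, 0, 0), (-1, 1, 0), (1, 0, 0)]

delta = 1                 # min over j of w_star.Xj for this example
w = (0, 0, 0)
for n, x in enumerate(fails, start=1):
    w = tuple(wi + xi for wi, xi in zip(w, x))
    numerator = dot(w, w_star)
    G = numerator / math.sqrt(dot(w, w))
    assert numerator >= n * delta                 # numerator grows at least as n * delta
    assert G <= math.sqrt(dot(w_star, w_star))    # G(wn) <= |w_star|, always bounded
print(w)  # → (1, 2, 2)
```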
Some Observations (contd.)
- But G(wn) ≤ |w*|, which is finite, so this is impossible!
- Hence, PTA has to converge.
- The proof is due to Marvin Minsky.
Convergence of PTA: Proved
- Whatever the initial choice of weights and whatever the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.