Markov Chains as a Learning Tool


1
Markov Chains as a Learning Tool
2
Markov Process: Simple Example
  • Weather
  • raining today: 40% rain tomorrow
  • 60% no rain tomorrow
  • not raining today: 20% rain tomorrow
  • 80% no rain tomorrow

Stochastic Finite State Machine
3
Markov Process: Simple Example
  • Weather
  • raining today: 40% rain tomorrow
  • 60% no rain tomorrow
  • not raining today: 20% rain tomorrow
  • 80% no rain tomorrow

The transition matrix
  • Stochastic matrix: rows sum up to 1
  • Doubly stochastic matrix: rows and columns sum up to 1

          Rain   No rain
Rain       0.4     0.6
No rain    0.2     0.8
4
Markov Process
Let X_i be the weather on day i, 1 ≤ i ≤ t. We may determine the probability of X_{t+1} from X_i, 1 ≤ i ≤ t.
Markov Property: X_{t+1}, the state of the system at time t+1, depends only on the state of the system at time t:
Pr(X_{t+1} = x_{t+1} | X_1 = x_1, ..., X_t = x_t) = Pr(X_{t+1} = x_{t+1} | X_t = x_t)
Stationary Assumption: transition probabilities are independent of time t.
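A minimal Python sketch (not from the slides; the state names and function names are illustrative) of a random walk on the weather chain above, using its transition matrix:

import random

# Transition probabilities from the weather example: each row is today's state.
P = {
    "Rain":    {"Rain": 0.4, "No rain": 0.6},
    "No rain": {"Rain": 0.2, "No rain": 0.8},
}

def step(state):
    """Sample tomorrow's weather given only today's state (Markov property)."""
    r = random.random()
    total = 0.0
    for nxt, p in P[state].items():
        total += p
        if r < total:
            return nxt
    return nxt  # guard against floating-point rounding

def simulate(start, days):
    """A random walk on the chain: one sampled path of the given length."""
    path = [start]
    for _ in range(days):
        path.append(step(path[-1]))
    return path

print(simulate("Rain", 7))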
5
Markov Process: Gambler's Example
The gambler starts with $10 (the initial state). At each play, one of the following happens: the gambler wins $1 with probability p, or loses $1 with probability 1 - p. The game ends when the gambler goes broke or gains a fortune of $100. (Both $0 and $100 are absorbing states.)
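A small Python sketch of this gambler's ruin chain (the win probability p = 0.5 and the function name are illustrative choices; the slides leave p unspecified):

import random

def gamblers_ruin(start=10, goal=100, p=0.5):
    """Play $1 rounds until the gambler hits an absorbing state: 0 or `goal`."""
    fortune = start
    while 0 < fortune < goal:
        fortune += 1 if random.random() < p else -1
    return fortune

# Estimate the chance of reaching $100 before going broke (about 0.1 when p = 0.5).
trials = 10_000
print(sum(gamblers_ruin() == 100 for _ in range(trials)) / trials)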
6
Markov Process
  • Markov process - described by a stochastic FSM
  • Markov chain - a random walk on this graph
  • (distribution over paths)
  • Edge weights give us the one-step transition probabilities Pr(X_{t+1} = b | X_t = a)
  • We can ask more complex questions, like the two-step probability Pr(X_{t+2} = b | X_t = a)

7
Markov Process: Coke vs. Pepsi Example
  • Given that a person's last cola purchase was Coke, there is a 90% chance that his next cola
    purchase will also be Coke.
  • If a person's last cola purchase was Pepsi, there is an 80% chance that his next cola
    purchase will also be Pepsi.

The transition matrix:
          Coke   Pepsi
Coke       0.9     0.1
Pepsi      0.2     0.8
8
Markov Process: Coke vs. Pepsi Example (cont.)
Given that a person is currently a Pepsi purchaser, what is the probability that he will purchase Coke two purchases from now?
Pr[Pepsi → ? → Coke] = Pr[Pepsi → Coke → Coke] + Pr[Pepsi → Pepsi → Coke]
                     = 0.2 × 0.9 + 0.8 × 0.2 = 0.34
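The same answer can be read off the square of the transition matrix. A minimal plain-Python sketch (the state ordering [Coke, Pepsi] is an assumption of this snippet, matching the matrix above):

P = [[0.9, 0.1],
     [0.2, 0.8]]

def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)
print(P2[1][0])   # Pr(Pepsi -> Coke in two steps) = 0.34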
9
Markov Process: Coke vs. Pepsi Example (cont.)
Given that a person is currently a Coke purchaser, what is the probability that he will buy Pepsi at the third purchase from now? Computing P³ = P² · P gives Pr(Coke → Pepsi in three steps) = 0.219.
10
Markov Process: Coke vs. Pepsi Example (cont.)
  • Assume each person makes one cola purchase per week
  • Suppose 60% of all people now drink Coke, and 40% drink Pepsi
  • What fraction of people will be drinking Coke three weeks from now?

Pr(X_3 = Coke) = 0.6 × 0.781 + 0.4 × 0.438 = 0.6438
Q_i - the distribution in week i
Q_0 = (0.6, 0.4) - the initial distribution
Q_3 = Q_0 · P³ = (0.6438, 0.3562)
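A minimal plain-Python sketch of Q_3 = Q_0 · P³ (variable names are illustrative):

P  = [[0.9, 0.1],
      [0.2, 0.8]]
Q0 = [0.6, 0.4]

def vec_mat(q, M):
    """One week of evolution: multiply a row vector by the transition matrix."""
    return [sum(q[i] * M[i][j] for i in range(len(q))) for j in range(len(M[0]))]

Q = Q0
for _ in range(3):   # three weeks
    Q = vec_mat(Q, P)
print(Q)             # [0.6438, 0.3562] up to floating-point rounding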
11
Markov Process: Coke vs. Pepsi Example (cont.)
Simulation: Pr(X_i = Coke) plotted against week i converges to 2/3, the stationary probability of Coke.
12
How to Obtain a Stochastic Matrix?
  • Solve linear equations, e.g.,
  • Learn from examples, e.g., what letters follow what letters in English words: mast, tame,
    same, teams, team, meat, steam, stem

13
How to Obtain a Stochastic Matrix?
  • Counts table vs. stochastic matrix

P     a     s     t     m     e     \0
a     0    1/7   1/7   5/7    0     0
e    4/7    0     0    1/7    0    2/7
m    1/8   1/8    0     0    3/8   3/8
s    1/5    0    3/5    0     0    1/5
t    1/7    0     0     0    4/7   2/7
@     0    3/8   3/8   2/8    0     0
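A minimal Python sketch of how such a matrix can be learned from the example words ('@' marks the start of a word and '\0' the end, as in the table above; the names are illustrative):

from collections import defaultdict

words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]

# Count letter-to-letter transitions, including the start and end markers.
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    chars = ["@"] + list(w) + ["\0"]
    for cur, nxt in zip(chars, chars[1:]):
        counts[cur][nxt] += 1

# Normalize each row so it sums to 1: the counts table becomes a stochastic matrix.
P = {cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
     for cur, row in counts.items()}

print(P["a"]["m"])   # 5/7, matching the table: 'm' follows 'a' in 5 of 7 cases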
14
Application of a Stochastic Matrix
  • Using the stochastic matrix to generate a random word:
  • Generate the most likely first letter
  • For each current letter, generate the most likely next letter

A     a     s     t     m     e     \0
a     -     1     2     7     -     -
e     4     -     -     5     -     7
m     1     2     -     -     5     8
s     1     -     4     -     -     5
t     1     -     -     -     5     7
@     -     3     6     8     -     -
If C[r,j] > 0, let A[r,j] = C[r,1] + C[r,2] + ... + C[r,j], where C is the counts table.

15
Application of a Stochastic Matrix
  • Using the stochastic matrix to generate a random word:
  • Generate the most likely first letter: generate a random number x between 1 and 8.
    If 1 ≤ x ≤ 3, the letter is s; if 4 ≤ x ≤ 6, the letter is t; otherwise, it's m.
  • For each current letter, generate the most likely next letter: suppose the current letter
    is s and we generate a random number x between 1 and 5. If x = 1, the next letter is a;
    if 2 ≤ x ≤ 4, the next letter is t; otherwise, the current letter is an ending letter.

A     a     s     t     m     e     \0
a     -     1     2     7     -     -
e     4     -     -     5     -     7
m     1     2     -     -     5     8
s     1     -     4     -     -     5
t     1     -     -     -     5     7
@     -     3     6     8     -     -
If C[r,j] > 0, let A[r,j] = C[r,1] + C[r,2] + ... + C[r,j], where C is the counts table.
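A minimal Python sketch of this procedure. It draws the next letter in proportion to the counts from slide 13, which is equivalent to using the cumulative table A with a random integer; the dict literal and function name are illustrative:

import random

counts = {
    "@": {"s": 3, "t": 3, "m": 2},
    "a": {"s": 1, "t": 1, "m": 5},
    "e": {"a": 4, "m": 1, "\0": 2},
    "m": {"a": 1, "s": 1, "e": 3, "\0": 3},
    "s": {"a": 1, "t": 3, "\0": 1},
    "t": {"a": 1, "e": 4, "\0": 2},
}

def random_word():
    """Start at '@' and sample letters until the end marker is drawn."""
    word, cur = "", "@"
    while True:
        nxt = random.choices(list(counts[cur]), weights=list(counts[cur].values()))[0]
        if nxt == "\0":
            return word
        word, cur = word + nxt, nxt

print([random_word() for _ in range(5)])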

16
Supervised vs Unsupervised
  • Decision tree learning is supervised learning, as we know the correct output of each example.
  • Learning based on Markov chains is unsupervised learning, as we don't know which next letter
    is the correct output.

17
K-Nearest Neighbor
  • Features:
  • All instances correspond to points in an n-dimensional Euclidean space
  • Classification is delayed until a new instance arrives
  • Classification is done by comparing feature vectors of the different points
  • Target function may be discrete or real-valued

18
1-Nearest Neighbor
19
3-Nearest Neighbor
20
Example: Identify Animal Type
14 examples, 10 attributes, 5 types. What's the type of this new animal?
21
K-Nearest Neighbor
  • An arbitrary instance x is represented by (a_1(x), a_2(x), a_3(x), ..., a_n(x))
  • a_i(x) denotes the i-th feature of x
  • Euclidean distance between two instances:
  • d(x_i, x_j) = sqrt( sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2 )
  • Continuous-valued target function:
  • the mean value of the k nearest training examples
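A minimal k-nearest-neighbor sketch in Python (the toy data and names are illustrative, not the animal dataset from the slides):

import math
from collections import Counter

def euclidean(xi, xj):
    """d(xi, xj) = sqrt(sum over r of (a_r(xi) - a_r(xj))^2)"""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def knn_classify(query, examples, k=3):
    """examples: list of (feature_vector, label); classification is delayed
    until the query arrives, then the majority label of the k nearest wins."""
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

data = [((0, 0), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify((0.5, 0.2), data, k=3))   # -> "A"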

22
Distance-Weighted Nearest Neighbor Algorithm
  • Assign weights to the neighbors based on their distance from the query point
  • The weight may be the inverse square of the distance
  • All training points may influence a particular instance
  • Shepard's method
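A minimal sketch of such inverse-square distance weighting (illustrative names and toy data):

import math
from collections import defaultdict

def euclidean(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def weighted_classify(query, examples):
    """All training points vote; each vote is weighted by 1/d^2."""
    votes = defaultdict(float)
    for x, label in examples:
        d = euclidean(query, x)
        if d == 0:
            return label              # an exact match decides immediately
        votes[label] += 1.0 / d ** 2  # inverse-square distance weight
    return max(votes, key=votes.get)

data = [((0, 0), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(weighted_classify((0.5, 0.2), data))   # -> "A"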

23
Remarks
  • A highly effective inductive inference method for noisy training data and complex
    target functions
  • The target function for the whole space may be described as a combination of less complex
    local approximations
  • Learning is very simple
  • Classification is time-consuming (except for 1-NN)