Active Learning for Hidden Markov Models - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Active Learning for Hidden Markov Models
  • Brigham Anderson, Andrew Moore
  • brigham@cmu.edu, awm@cs.cmu.edu
  • Computer Science
  • Carnegie Mellon University

2
Outline
  • Active Learning
  • Hidden Markov Models
  • Active Learning Hidden Markov Models

3
Notation
  • We Have
  • Dataset, D
  • Model parameter space, W
  • Query algorithm, q

4
Dataset (D) Example
5
Notation
  • We Have
  • Dataset, D
  • Model parameter space, W
  • Query algorithm, q

6
Model Example
(Diagram: a probabilistic classifier in which class St generates observation vector Ot)
Notation: T = number of examples; Ot = vector of features of example t; St = class of example t
7
Model Example
  • Patient state (St): St = DiseaseState
  • Patient observations (Ot): Ot1 = Gender, Ot2 = Age,
    Ot3 = TestA, Ot4 = TestB, Ot5 = TestC
8
Possible Model Structures
9
Model Space
(Diagram: class St generates observation Ot; the model comprises P(St) and the parameters of P(Ot | St))
Generative Model: must be able to compute P(St = i, Ot = ot | w)
10
Model Parameter Space (W)
  • W = space of possible parameter values
  • Prior on parameters: P(W)
  • Posterior over models: P(W | D) ∝ P(D | W) P(W)

11
Notation
  • We Have
  • Dataset, D
  • Model parameter space, W
  • Query algorithm, q

q(W,D) returns t, the next sample to label
12
Game
  • while NotDone
  • Learn P(W | D)
  • q chooses next example to label
  • Expert adds label to D

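The while-loop above can be sketched as a generic pool-based routine. The stub posterior, the "pick the largest value" query function, and the parity "expert" below are hypothetical stand-ins for illustration, not the talk's actual components:

```python
def active_learning_loop(pool, fit_posterior, q, expert, rounds=10):
    """Generic active-learning game: repeatedly fit the model
    posterior P(W | D), let the query algorithm q pick the next
    example, and ask the expert for its label."""
    D = []                               # labeled dataset
    for _ in range(rounds):
        if not pool:
            break
        posterior = fit_posterior(D)     # learn P(W | D)
        t = q(posterior, pool)           # q chooses next example to label
        pool.remove(t)
        D.append((t, expert(t)))         # expert adds label to D
    return D

# Toy run: stub posterior, greedy "largest value" query, parity labels.
D = active_learning_loop([1, 2, 3], fit_posterior=lambda D: None,
                         q=lambda w, pool: max(pool),
                         expert=lambda t: t % 2, rounds=2)
```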
13
Simulation
(Diagram: an HMM with observations O1..O7 and hidden states S1..S7; the query algorithm q picks which state to ask about)
14
Active Learning Flavors
  • Pool
  • (random access to patients)
  • Sequential
  • (must decide as patients walk in the door)

15
q?
  • Recall q(W,D) returns the most interesting
    unlabelled example.
  • Well, what makes a doctor curious about a patient?

16
1994
17
Score Function
18
Uncertainty Sampling Example
(Diagram: dataset with one example labeled FALSE)
19
Uncertainty Sampling Example
(Diagram: dataset with examples labeled FALSE and TRUE; the most uncertain remaining example is queried next)
21
Uncertainty Sampling
  • GOOD: couldn't be easier
  • GOOD: often performs pretty well
  • BAD: H(St) measures information gain about the
    samples, not the model
  • Sensitive to noisy samples

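Uncertainty sampling itself is only a few lines: query the unlabeled example whose predicted class distribution has the highest entropy. A minimal sketch, with made-up toy probabilities and hypothetical patient names:

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def uncertainty_sampling(predict_proba, unlabeled):
    """Return the unlabeled example whose predicted class
    distribution has the highest entropy H(St)."""
    return max(unlabeled, key=lambda x: entropy(predict_proba(x)))

# Toy classifier: P(class | x) for three hypothetical patients.
probs = {"patient_a": [0.9, 0.1],    # confident
         "patient_b": [0.55, 0.45],  # most uncertain
         "patient_c": [0.7, 0.3]}
chosen = uncertainty_sampling(lambda x: probs[x], probs.keys())
```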
22
Can we do better than uncertainty sampling?
23
1992
24
Strategy 2: Query by Committee
  • Temporary Assumptions
  • Pool → Sequential
  • P(W | D) → Version Space
  • Probabilistic → Noiseless
  • QBC attacks the size of the Version Space

25
(Diagram: an HMM with observations O1..O7 and states S1..S7; committee members Model 1 and Model 2 both predict FALSE, so they agree)
26
(Diagram: the same sequence; Model 1 and Model 2 both predict TRUE, so again they agree)
27
(Diagram: the same sequence; Model 1 predicts FALSE and Model 2 predicts TRUE)
Ooh, now we're going to learn something for sure!
One of them is definitely wrong.
28
The Original QBC Algorithm
  • As each example arrives
  • Choose a committee, C (usually of size 2),
    randomly from the Version Space
  • Have each member of C classify it
  • If the committee disagrees, select it.

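A sketch of this stream version of QBC, using a toy version space of hypothetical 1-D threshold classifiers (the thresholds and example stream are invented for illustration):

```python
import random

def qbc_stream(examples, version_space, committee_size=2, seed=0):
    """Original Query-by-Committee: for each arriving example,
    draw a random committee from the version space and select
    the example whenever the committee members disagree."""
    rng = random.Random(seed)
    selected = []
    for x in examples:
        committee = rng.sample(version_space, committee_size)
        labels = {h(x) for h in committee}
        if len(labels) > 1:          # disagreement -> query its label
            selected.append(x)
    return selected

# Toy version space: threshold classifiers on a 1-D feature.
version_space = [lambda x, t=t: x > t for t in (0.2, 0.4, 0.6, 0.8)]
queried = qbc_stream([0.1, 0.5, 0.9], version_space)
```

Only the middle example (0.5) can ever be selected: every hypothesis agrees on 0.1 and 0.9, so no committee disagrees there.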
29
1992
30
QBC: Choose Controversial Examples
STOP! Doesn't model disagreement mean uncertainty?
Why not use Uncertainty Sampling?
(Diagram: the Version Space)
31
  • Remember our whiny objection to Uncertainty
    Sampling?
  • H(St) measures information gain about the
    samples, not the model.
  • BUT: If the source of the sample uncertainty is
    model uncertainty, then they are equivalent!
  • Why?
  • Symmetry of mutual information.

32
(1995)
33
Dagan-Engelson QBC
  • For each example
  • Choose a committee, C (usually of size 2),
    randomly from P(W | D)
  • Have each member of C classify it
  • Compute the Vote Entropy to measure
    disagreement

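Vote entropy is easy to compute from the committee's predicted labels; the patient labels below are illustrative:

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Vote entropy (in bits) of a committee's predicted labels:
    H = -sum_c (V_c / |C|) log2 (V_c / |C|), where V_c is the
    number of committee members voting for class c."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((v / n) * math.log2(v / n) for v in counts.values())

# Full agreement gives 0 bits; an even two-way split gives 1 bit.
agree = vote_entropy(["sick", "sick", "sick", "sick"])
split = vote_entropy(["sick", "well", "sick", "well"])
```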
34
How to Generate the Committee?
  • This important point is not covered in the talk.
  • Vague Suggestions
  • Good conjugate priors for parameters
  • Importance sampling

35
  • OK, we could keep extending QBC, but let's cut to
    the chase

36
1992
37
Model Entropy
(Diagram: three posteriors P(W | D) plotted over W, from flat to sharply peaked; H(W) high on the left, H(W) ≈ 0 on the right; more peaked is better)
38
Information-Gain
  • Choose the example that is expected to most
    reduce H(W)
  • I.e., maximize H(W) - H(W | St)

39
Score Function
40
  • We usually can't just sum over all models to get
    H(St | W)
  • but we can sample from P(W | D)

41
Conditional Model Entropy
42
Score Function
44
Amazing Entropy Fact
Symmetry of Mutual Information:
MI(A;B) = H(A) - H(A|B) = H(B) - H(B|A)
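This symmetry can be checked numerically on any joint distribution; the joint table below is made up purely for illustration:

```python
import math

def H(p):
    """Entropy in bits of a distribution given as probabilities."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical joint distribution P(A, B) over two binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

pA = [sum(p for (a, b), p in joint.items() if a == i) for i in (0, 1)]
pB = [sum(p for (a, b), p in joint.items() if b == j) for j in (0, 1)]

def cond_entropy(target):
    """H(X | Y): expected entropy of one variable given the other."""
    total = 0.0
    for j in (0, 1):
        py = sum(p for k, p in joint.items() if k[1 - target] == j)
        cond = [joint[k] / py for k in joint if k[1 - target] == j]
        total += py * H(cond)
    return total

mi_a = H(pA) - cond_entropy(0)   # H(A) - H(A|B)
mi_b = H(pB) - cond_entropy(1)   # H(B) - H(B|A)
```

Both routes give the same mutual information, as the symmetry promises.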
45
Score Function
Familiar?
46
Uncertainty Sampling = Information Gain
47
  • The information gain framework is cleaner than
    the QBC framework, and easy to build on
  • For instance, we don't need to restrict St to be
    the class variable

48
Any Missing Feature is Fair Game
49
Outline
  • Active Learning
  • Hidden Markov Models
  • Active Learning Hidden Markov Models

50
HMMs
Model parameters: W = {p0, A, B}
(Diagram: HMM with hidden states S0..S3 emitting observations O0..O3; p0 is the initial state distribution, A the transition model, B the observation model)
51
HMM Light Switch
  • OUTPUT
  • Probability distribution over
    Absent, Meeting, Computer, and Other
  • E.g., "There is an 86% chance
    that the user is in a meeting
    right now."

INPUT: Binary stream of motion / no-motion
52
Light Switch HMM
53
Canonical HMM Tasks
  • State Estimation
  • For each timestep today, what were the
    probabilities of each state?
  • P(St | O1O2O3...OT, W)
  • ML Path
  • Given today's observations, what was the most
    likely path?
  • S* = argmaxS P(O1O2O3...OT | S, W)
  • ML Model Learning
  • Given the last 30 days of data, what are the best
    model parameters?
  • W* = argmaxW P(O1O2O3...OT | W)
Forward-Backward Algorithm
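The state-estimation task is solved by Forward-Backward. Here is a minimal, unscaled sketch for a discrete HMM; the two-state "light switch" parameters are invented for illustration, and a real implementation would rescale alpha/beta to avoid underflow on long sequences:

```python
import numpy as np

def forward_backward(obs, p0, A, B):
    """Posterior state marginals P(St | O_1..T, W) for a discrete HMM.
    p0: initial state distribution, A[i, j] = P(S_t+1 = j | S_t = i),
    B[i, o] = P(O_t = o | S_t = i)."""
    T, N = len(obs), len(p0)
    alpha = np.zeros((T, N))          # forward messages
    beta = np.ones((T, N))            # backward messages
    alpha[0] = p0 * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Toy 2-state "light switch": states (busy, away), binary motion obs.
p0 = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.1, 0.9]])   # P(observation | state)
post = forward_backward([0, 0, 1, 0], p0, A, B)
```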
54
HMM Light Switch
55
Outline
  • Active Learning
  • Hidden Markov Models
  • Active Learning Hidden Markov Models

56
Active Learning!
Good Morning Sir! Here's the video footage of
yesterday. Could you just go through it and
label each frame?
Good Morning Sir! Can you tell me what you are
doing in this frame of video?
57
HMMs and Active Learning
(Diagram: an HMM with hidden states S1..S7 and an observed binary motion stream 1, 0, 0, 1, 1, 0, 1; the query algorithm asks for individual state labels)
58
  • Note: the dependencies between states do not
    affect the basic algorithm!
  • the only change is in how we compute P(St | O1:T)
  • (we have to use Forward-Backward.)

59
HMM Active Learning
  • Choose a committee, C, randomly from P(W | D)
  • Run Forward-Backward for each member of C
  • For each timestep, compute H(St) - H(St | C)

Done!
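Putting the committee recipe together: given one Forward-Backward posterior per committee member, score each timestep by H(St) - H(St | C), where the consensus distribution is the members' average. The toy posteriors below are invented; note that the last timestep scores zero even though both members are maximally uncertain, because that uncertainty is not model disagreement:

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def timestep_scores(committee_posteriors):
    """Score each timestep by H(St) - H(St | C).  Input: one
    posterior array [T][num_states] per committee member, e.g.
    from one Forward-Backward run per member."""
    n = len(committee_posteriors)
    T = len(committee_posteriors[0])
    scores = []
    for t in range(T):
        num_states = len(committee_posteriors[0][t])
        # Consensus distribution: average the members' posteriors.
        mean = [sum(m[t][s] for m in committee_posteriors) / n
                for s in range(num_states)]
        # H(St | C): average entropy of the individual members.
        cond = sum(entropy(m[t]) for m in committee_posteriors) / n
        scores.append(entropy(mean) - cond)
    return scores

# Two hypothetical committee members over 3 timesteps, 2 states.
m1 = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]
m2 = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
scores = timestep_scores([m1, m2])
```

Only the middle timestep, where the members confidently disagree, gets a positive score.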
60
Actively Selecting Excerpts
Good Morning Sir! I'm still trying to learn
your HMM. Could you please label the following
scene from yesterday?
61
  • Finding the optimal scene is useful for
  • Selecting scenes from video
  • Selecting utterances from audio
  • Selecting excerpts from text
  • Selecting sequences from DNA

62
Which sequence should I get labeled?
There are O(T²) of them!
63
Excerpt Selection
  • Let's maximize H(S) - H(S | C)
  • Trick question
  • Which subsequence maximizes H(S) - H(S | C)?
  • (The whole sequence does, which is why labeling
    cost must enter the picture.)

64
Sequence Selection
We have to include the cost incurred when we
force an expert to sit down and label 1000
examples
65
What is the Entropy of a Sequence?
  • H(S1:4) = H(S1,S2,S3,S4) = ?

66
Amazing Entropy Fact
The Chain Rule:
H(A,B,C,D) = H(A) + H(B|A) + H(C|A,B) + H(D|A,B,C)
67
  • and even better, when A → B → C → D form a Markov chain:

H(A,B,C,D) = H(A) + H(B|A) + H(C|B) + H(D|C)
68
Entropy of a Sequence
We still get the components of these expressions,
P(St = i | O1:T) and P(St+1 = i | St = j, O1:T),
from a Forward-Backward run.
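With the Markov chain rule, the excerpt entropy is one marginal entropy plus per-step conditional entropies. A sketch using hypothetical posteriors (in practice both inputs would come from a Forward-Backward run):

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def sequence_entropy(marginals, pairwise):
    """H(S_1..T | O) via the Markov chain rule:
    H(S1) + sum_t H(S_t+1 | S_t), using posterior marginals
    P(St = i | O) and transition posteriors
    P(S_t+1 = j | S_t = i, O)."""
    h = entropy(marginals[0])
    for t in range(len(marginals) - 1):
        # H(S_t+1 | S_t) = sum_i P(S_t = i) * H( P(S_t+1 | S_t = i) )
        h += sum(marginals[t][i] * entropy(pairwise[t][i])
                 for i in range(len(marginals[t])))
    return h

# Hypothetical 3-step, 2-state posteriors.
marg = [[0.5, 0.5], [0.7, 0.3], [0.6, 0.4]]
pair = [[[0.8, 0.2], [0.6, 0.4]],   # P(S2 | S1 = i, O)
        [[0.9, 0.1], [0.3, 0.7]]]   # P(S3 | S2 = i, O)
h_seq = sequence_entropy(marg, pair)
```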
69
Score of a Sequence
70
Finding Best Excerpt of Length k
71
Find Best Sequence of Length k
  • Draw committee C from P(W | D)
  • Run Forward-Backward for each c in C
  • Scan the entire sequence using score_seqIG(S)

(Diagram: sliding a window of length k = 5 along the sequence)
O(T)!
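The fixed-length scan can be sketched with a sliding window; for illustration the excerpt score is treated as a simple sum of hypothetical per-timestep information-gain terms, so each step updates the window in O(1) and the whole scan is O(T):

```python
def best_window(scores, k):
    """Best-scoring excerpt of fixed length k: slide a window over
    per-timestep scores, updating the running sum in O(1) per step.
    Returns (start index, window score)."""
    window = sum(scores[:k])
    best, best_start = window, 0
    for t in range(k, len(scores)):
        window += scores[t] - scores[t - k]   # slide window by one
        if window > best:
            best, best_start = window, t - k + 1
    return best_start, best

# Hypothetical per-timestep scores H(St) - H(St | C):
scores = [0.1, 0.0, 0.4, 0.9, 0.8, 0.2, 0.1]
start, total = best_window(scores, k=3)
```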
72
Find Best Excerpt of Any Length
73
Find Best Sequence of Any Length
  • Score all possible intervals
  • Pick the best one

Hmm... That's O(T²). We could cleverly cache some
of the computation as we go, but we're still
going to be O(T²).

74
Similar Problem
Find the interval of f(t) that has the largest integral.
(Plot: a curve f(t) over t with positive and negative regions)
(Note: this was a Google interview question!)
75
Similar Problem
Can be done using Dynamic Programming in O(T)!
76
  • [a,b]: best interval so far
  • atemp: start of best interval ending at t
  • maintain sum(a,b)
  • and sum(atemp, t)

Rules:
  if ( sum(atemp, t-1) + y(t) < 0 ) then atemp = t
  if ( sum(atemp, t) > sum(a,b) ) then [a,b] = [atemp, t]
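These rules are a Kadane-style dynamic program; one common formulation is sketched below over hypothetical per-timestep scores (positive values mark informative timesteps):

```python
def best_interval(y):
    """Largest-sum contiguous interval (Kadane-style DP), O(T).
    Returns (a, b, total) for the best inclusive interval [a, b]."""
    a = b = a_temp = 0
    best = run = y[0]
    for t in range(1, len(y)):
        if run < 0:              # best interval ending at t restarts here
            run, a_temp = 0.0, t
        run += y[t]
        if run > best:           # new overall best interval
            best, a, b = run, a_temp, t
    return a, b, best

# Hypothetical per-timestep scores:
y = [-1.0, 2.0, 3.0, -4.0, 2.0, 1.5, -3.0]
a, b, total = best_interval(y)
```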
77
Find Best Sequence of Any Length
  • Draw committee C from P(W | D)
  • Run Forward-Backward for each c in C
  • Find best-scoring interval using DP


78
Not Just HMMs
  • The max-MI Excerpt can be applied to any
    sequential process with the Markov property
  • E.g., Kalman filters

79
Aside: Active Diagnosis
  • What if we're not trying to learn a model?
  • What if we have a good model already, and we just
    want to learn the most about the sequence itself?
  • E.g., an HMM is trying to translate a news
    broadcast. It doesn't want to learn the model,
    it just wants the best transcription possible.

80
…we can use the same DP trick to find the
optimal subsequence too.
81
Conclusion
  • Uncertainty sampling is sometimes correct
  • QBC is an approximation to Information Gain
  • Finding the most-informative subsequence of a
    Markov time series is O(T)

83
Light Switch HMM