Inference and Learning via Integer Linear Programming (PPT Transcript)

1
Inference and Learning via Integer Linear Programming
  • Vasin Punyakanok, Dan Roth, Scott Wen-tau Yih, Dav Zimak

2
Outline
  • Problem Definition
  • Integer Linear Programming (ILP)
  • Its generality
  • Learning and Inference via ILP
  • Experiments
  • Extension to hierarchical learning
  • Future Direction
  • Hidden Variables

3
Problem Definition
  • X = (X1, ..., Xk) ∈ 𝒳^k, Xi ∈ 𝒳
  • Y = (Y1, ..., Yl) ∈ 𝒴^l, Yi ∈ 𝒴
  • Given X = x, find Y = y
  • Notation agreements
  • Capital letters mean variables
  • Non-capital letters mean values
  • Bold indicates vectors or matrices
  • 𝒳, 𝒴 denote sets

4
Example (Text Chunking)
(Figure: a chunked sentence)
  x = "The guy presenting now is so tired"
  y = [NP The guy] [VP presenting] [ADVP now] [VP is] [ADJP so tired]
5
Classifiers
  • A classifier
  • h : 𝒳 × 𝒴^(l-1) × 𝒴 × {1, ..., l} → ℝ
  • Example
  • score(x, y-3, NP, 3) = 0.3
  • score(x, y-3, VP, 3) = 0.5
  • score(x, y-3, ADVP, 3) = 0.2
  • score(x, y-3, ADJP, 3) = 1.2
  • score(x, y-3, NULL, 3) = 0.1

6
Inference
  • Goal: x → y
  • Given
  • x: input
  • score(x, y-t, y, t) for all (y-t, y) ∈ 𝒴^l, t ∈
    {1, ..., l}
  • C: a set of constraints over Y
  • Find y that
  • maximizes the global function
  • score(x, y) = Σt score(x, y-t, yt, t)
  • satisfies the constraints C

7
Integer Linear Programming
  • Boolean variables: U = (U1, ..., Ud) ∈ {0, 1}^d
  • Cost vector: p = (p1, ..., pd) ∈ ℝ^d
  • Cost function: p·U
  • Constraint matrix: c ∈ ℝ^(e×d)
  • Maximize p·U
  • Subject to cU ≥ 0 (cU = 0, cU ≤ 3, etc. are also possible)

8
ILP (Example)
  • U = (U1, U2, U3)
  • p = (0.3, 0.5, 0.8)
  • c = (  1   2   3
         -1  -2   2
          0  -3   2 )
  • Maximize p·U
  • Subject to cU ≥ 0

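(A quick check of this toy problem, not part of the original deck: the sketch below hands it to SciPy's MILP solver, reading the constraints as cU ≥ 0. It recovers U = (1, 0, 1) with objective value 1.1.)

```python
# Toy ILP from the slide above, solved with SciPy's MILP interface
# (scipy >= 1.9; an illustration, the original work predates this API).
# Maximizing p.U is done by minimizing -p.U.
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

p = np.array([0.3, 0.5, 0.8])           # cost vector
c = np.array([[ 1,  2, 3],
              [-1, -2, 2],
              [ 0, -3, 2]])             # constraint matrix

res = milp(c=-p,                        # maximize p.U
           constraints=LinearConstraint(c, lb=0, ub=np.inf),  # cU >= 0
           integrality=np.ones(3),      # U must be integer
           bounds=Bounds(0, 1))         # so U is in {0,1}^3
print(res.x, -res.fun)                  # [1. 0. 1.]  1.1
```
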
9
Boolean Functions as Linear Constraints
  • Conjunction
  • U1 ∧ U2 ∧ U3 ⇔ U1 = 1, U2 = 1, U3 = 1
  • Disjunction
  • U1 ∨ U2 ∨ U3 ⇔ U1 + U2 + U3 ≥ 1
  • CNF
  • (U1 ∨ U2) ∧ (U3 ∨ U4) ⇔ U1 + U2 ≥ 1, U3 + U4 ≥ 1

10
Text Chunking
  • Indicator Variables
  • U1,NP, U1,NULL, U2,VP, ... ⇔ y1 = NP, y1 = NULL, y2 = VP, ...
  • U1,NP indicates that phrase 1 is labeled NP
  • Cost Vector
  • p1,NP = score(x, NP, 1)
  • p1,NULL = score(x, NULL, 1)
  • p2,VP = score(x, VP, 2)
  • ...
  • p·U = score(x, y) = Σt score(x, yt, t), subject to
    constraints

11
Structural Constraints
  • Coherency
  • yt can take only one value
  • Σy∈{NP,...,NULL} Ut,y = 1
  • Non-Overlapping
  • If y1 and y2 overlap, at least one of them must be NULL
  • U1,NULL + U2,NULL ≥ 1

12
Linguistic Constraints
  • Every sentence must have at least one VP
  • Σt Ut,VP ≥ 1
  • Every sentence must have at least one NP
  • Σt Ut,NP ≥ 1
  • ...

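(To make slides 10-12 concrete, a sketch that is not from the deck: encode the coherency and at-least-one-VP constraints as ILP rows over the flattened indicators Ut,y. The phrase count and classifier scores below are stand-ins.)

```python
# Chunking ILP sketch: indicators U[t, y] flattened into one vector.
# Coherency: each phrase gets exactly one label. Linguistic: >= 1 VP.
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

labels = ["NP", "VP", "ADVP", "ADJP", "NULL"]
n_phrases, n_labels = 3, len(labels)
scores = np.random.default_rng(0).random(n_phrases * n_labels)  # stand-ins

cons = []
for t in range(n_phrases):               # coherency: sum_y U[t, y] = 1
    row = np.zeros(n_phrases * n_labels)
    row[t * n_labels:(t + 1) * n_labels] = 1
    cons.append(LinearConstraint(row, lb=1, ub=1))

row = np.zeros(n_phrases * n_labels)     # linguistic: sum_t U[t, VP] >= 1
row[labels.index("VP")::n_labels] = 1
cons.append(LinearConstraint(row, lb=1, ub=np.inf))

res = milp(c=-scores, constraints=cons,
           integrality=np.ones(n_phrases * n_labels), bounds=Bounds(0, 1))
y = res.x.reshape(n_phrases, n_labels).argmax(axis=1)
print([labels[t] for t in y])            # one label per phrase, >= 1 VP
```
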
13
Interacting Classifiers
  • Classifier for an output yt uses other outputs
    y-t as inputs
  • score(x,y-t,y,t)
  • Need to ensure that the final output from the ILP
    is computed from a consistent y
  • Introduce additional variables
  • Introduce additional coherency constraints

14
Interacting Classifiers
  • Additional variables
  • UY,y = 1 ⇔ Y = y, for every possible joint assignment y = (y-t, y)
  • Additional coherency constraints
  • UY,y = 1 iff Ut,yt = 1 for all yt in y
  • Σyt in y Ut,yt - UY,y ≤ l - 1
  • Σyt in y Ut,yt - l·UY,y ≥ 0

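(A quick check of these constraints, with l = 2 and y = (NP, VP): if U1,NP = U2,VP = 1, the first constraint reads 2 - UY,y ≤ 1 and forces UY,y = 1; if instead U1,NP = 0, the sum is at most 1, so the second constraint forces UY,y = 0.)
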
15
Learning Classifiers
  • score(x, y-t, y, t) = αy · Φy(x, y-t, t)
  • Learn αy for all y ∈ 𝒴
  • Multi-class learning
  • Example (x, y) → {(Φy(x, y-t, t), yt)}, t = 1, ..., l
  • Learn each classifier independently

16
Learn with Inference Feedback
  • Learn by observing global behavior
  • For each example (x,y)
  • Make prediction with the current classifiers and
    ILP
  • ŷ = argmaxy Σt score(x, y-t, yt, t)
  • For each t, update
  • If ŷt ≠ yt
  • Promote score(x, y-t, yt, t)
  • Demote score(x, ŷ-t, ŷt, t)

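(A minimal sketch of this loop in the style of a structured perceptron; this is one reading of the slide, not the authors' code. `inference` stands in for the ILP solve and `phi` for the feature map Φ of slide 15; labels are integer ids.)

```python
# Learning with inference feedback: predict with the full ILP, then
# promote/demote only the per-position scores that disagree with the gold.
import numpy as np

def train(examples, phi, inference, n_labels, dim, epochs=10):
    w = np.zeros((n_labels, dim))        # one weight vector per label y
    for _ in range(epochs):
        for x, y_gold in examples:
            y_pred = inference(w, x)     # global argmax via ILP
            for t, (yp, yg) in enumerate(zip(y_pred, y_gold)):
                if yp != yg:
                    w[yg] += phi(x, y_pred, t)   # promote correct label
                    w[yp] -= phi(x, y_pred, t)   # demote predicted label
    return w
```
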
17
Experiments
  • Semantic Role Labeling
  • Assume correct boundaries are given
  • Only sentences with more than 5 arguments are
    included

18
Experimental Results
(Charts: results for Winnow and Perceptron)
  • For the difficult task
  • Inference feedback during training improves
    performance
  • For the easy task
  • Learning without inference feedback is better

19
Conservative Updating
  • Update only if necessary
  • Example
  • Constraint: U1 + U2 = 1
  • Predict (U1, U2) = (1, 0)
  • Correct (U1, U2) = (0, 1)
  • Feedback → demote class 1, promote class 2
  • But U1 = 0 forces U2 = 1 by the constraint, so only demote class 1

20
Conservative Updating
  • S = minset(Constraints)
  • The set of functions that, if changed, would make the
    global prediction correct
  • Promote (Demote) only those functions in the
    minset S

21
Hierarchical Learning
  • Given x
  • Compute hierarchically
  • z1 = h1(x)
  • z2 = h2(x, z1)
  • y = hs+1(x, z1, ..., zs)
  • Assume all z are known in training

22
Hierarchical Learning
  • Assume each hj can be computed via ILP
  • pj, Uj, cj
  • y = argmaxy maxz1,...,zs Σj λj · pj·Uj
  • Subject to
  • c1U1 ≥ 0, c2U2 ≥ 0, ..., cs+1Us+1 ≥ 0
  • where λj is a large enough constant to preserve the
    hierarchy

23
Hidden Variables
  • Given x
  • y = h(x, z)
  • z is not known in training
  • y = argmaxy maxz Σt score(x, z, y-t, yt, t)
  • Subject to some constraints

24
Learning with Hidden Variables
  • Truncated EM styled learning
  • For each example (x,y)
  • Compute z with the current classifiers and ILP
  • z = argmaxz Σt score(x, z, y-t, yt, t)
  • Make prediction with the current classifiers and
    ILP
  • (ŷ, ẑ) = argmaxy,z Σt score(x, z, y-t, yt, t)
  • For each t, update
  • If ŷt ≠ yt
  • Promote score(x, z, y-t, yt, t)
  • Demote score(x, ẑ, ŷ-t, ŷt, t)

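(A sketch of this truncated-EM-style loop; the helpers `infer_z`, `infer_yz`, `promote`, and `demote` are hypothetical stand-ins for the ILP solves and classifier updates.)

```python
# Hidden-variable training: an E-like step picks z given the gold y,
# then the usual promote/demote update runs against the joint prediction.
def train_hidden(examples, infer_z, infer_yz, promote, demote, epochs=5):
    for _ in range(epochs):
        for x, y_gold in examples:
            z = infer_z(x, y_gold)        # argmax_z score(x, z, y_gold)
            y_pred, z_pred = infer_yz(x)  # joint argmax over (y, z) via ILP
            for t, (yp, yg) in enumerate(zip(y_pred, y_gold)):
                if yp != yg:
                    promote(x, z, t, yg)       # raise score of gold label
                    demote(x, z_pred, t, yp)   # lower predicted label
```
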
25
Conclusion
  • ILP is
  • powerful
  • general
  • learnable
  • useful
  • fast (or at least not too slow)
  • extendable

26
(No Transcript)
27
(No Transcript)
28
Boolean Functions as Linear Constraints
  • Conjunction
  • a ∧ b ∧ c ⇔ Ua + Ub + Uc ≥ 3
  • Disjunction
  • a ∨ b ∨ c ⇔ Ua + Ub + Uc ≥ 1
  • DNF
  • (a ∧ b) ∨ (c ∧ d) ⇔ Iab + Icd ≥ 1
  • Introduce new variables Iab, Icd

29
Helper Variables
  • We must link Ia, Ib, and Iab
  • Iab ≡ a ∧ b
  • Ia ∧ Ib → Iab:
  • Ia + Ib ≤ Iab + 1
  • Iab → Ia ∧ Ib:
  • 2·Iab ≤ Ia + Ib

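(These two inequalities are the standard linearization of AND; a brute-force check, added for illustration:)

```python
# Verify that the two linear constraints hold iff Iab = Ia AND Ib.
from itertools import product

for Ia, Ib, Iab in product((0, 1), repeat=3):
    linear = (Ia + Ib <= Iab + 1) and (2 * Iab <= Ia + Ib)
    assert linear == (Iab == min(Ia, Ib))   # min of 0/1 values is AND
print("linearization matches AND on all 8 assignments")
```
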
30
Semantic Role Labeling
  • a, b, c, ... ⇔ ph1 = A0, ph1 = A1, ph2 = A0, ...
  • Cost Vector
  • pa = score(ph1 = A0)
  • pb = score(ph1 = A1)
  • ...
  • Indicator Variables
  • Ia indicates that phrase 1 is labeled A0
  • pa·Ia = 0.3 if Ia = 1, and 0 otherwise

31
(No Transcript)
32
Learning
  • X = (X1, ..., Xk) ∈ 𝒳1 × ... × 𝒳k = 𝒳
  • Y-t = (Y1, ..., Yt-1, Yt+1, ..., Yl)
  • ∈ 𝒴1 × ... × 𝒴t-1 × 𝒴t+1 × ... × 𝒴l = 𝒴-t
  • Yt ∈ 𝒴t
  • Given X = x and Y-t = y-t, find Yt = yt, or a score
    for each possible yt
  • 𝒳 × 𝒴-t → 𝒴t, or 𝒳 × 𝒴-t × 𝒴t → ℝ

33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
SRL via Generalized Inference
39
Outline
  • Find potential argument candidates
  • Classify arguments to types
  • Inference for Argument Structure
  • Integer linear programming (ILP)
  • Cost Function
  • Constraints
  • Features

40
Find Potential Arguments
  • Every chunk can be an argument
  • Restrict potential arguments
  • BEGIN(word)
  • BEGIN(word) = 1 ⇔ word begins an argument
  • END(word)
  • END(word) = 1 ⇔ word ends an argument
  • Argument
  • (wi, ..., wj) is a potential argument iff
  • BEGIN(wi) = 1 and END(wj) = 1
  • Reduces the set of potential arguments

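(A small illustration, not from the deck: enumerate candidate spans from per-word BEGIN/END flags; the helper `potential_args` is hypothetical.)

```python
# Enumerate candidate argument spans from per-word BEGIN/END predictions:
# keep every span whose first word has BEGIN = 1 and last word has END = 1.
def potential_args(begin, end):
    """begin, end: lists of 0/1 flags, one per word."""
    n = len(begin)
    return [(i, j) for i in range(n) if begin[i]
                   for j in range(i, n) if end[j]]

print(potential_args([1, 0, 1, 0], [0, 1, 0, 1]))  # [(0, 1), (0, 3), (2, 3)]
```
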
41
Details...
  • BEGIN(word)
  • Learn a function
  • ΦB(word, context, structure) → {0, 1}
  • END(word)
  • Learn a function
  • ΦE(word, context, structure) → {0, 1}
  • POTARG = {arg : BEGIN(first(arg)) = 1 and
    END(last(arg)) = 1}

42
Argument Type Likelihood
  • Assign a type likelihood
  • How likely is it that arg a is of type t?
  • For all a ∈ POTARG, t ∈ T
  • P(argument a = type t)

        A0    C-A1   A1    Ø
  a1    0.3   0.2    0.2   0.3
  a2    0.6   0.0    0.0   0.4
43
Details...
  • Learn a classifier
  • ARGTYPE(arg)
  • ΦP(arg) → {A0, A1, ..., C-A0, ..., LOC, ...}
  • argmaxt∈{A0, A1, ..., C-A0, ..., LOC, ...} wt · ΦP(arg)
  • Estimate probabilities
  • P(a = t) = wt · ΦP(a) / Z

44
What is a Good Assignment?
  • Likelihood of being correct
  • P(Arg a = Type t)
  • if t is the correct type for argument a
  • For a set of arguments a1, a2, ..., an
  • Expected number of arguments correct
  • Σi P(ai = ti)
  • We search for the assignment with the maximum
    expected number correct

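(Why the sum is an expectation: the number of correct arguments is Σi 1[ti is correct], so by linearity of expectation its expected value is Σi P(ai = ti).)
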
45
Inference
  • Maximize expected number correct
  • T = argmaxT Σi P(ai = ti)
  • Subject to some constraints
  • Structural and Linguistic

(Example sentence, shown with two candidate argument segmentations: "I left my nice pearls to her")
46
Everything is Linear
  • Cost function
  • Σa∈POTARG P(a = ta) = Σa∈POTARG, t∈T P(a = t) · Ia=t
  • Constraints
  • Non-Overlapping
  • a and a′ overlap → Ia=Ø + Ia′=Ø > 0
  • Linguistic
  • ∃ C-A0 → ∃ A0: Σa Ia=C-A0 ≥ 1 → Σa Ia=A0 ≥ 1
  • Integer Linear Programming

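(Putting slides 42-46 together, a runnable sketch of my own construction: it uses the probability table from slide 42 and assumes a1 and a2 overlap. Maximizing the expected number correct under coherency and non-overlap labels a1 as Ø and a2 as A0, with objective 0.9.)

```python
# SRL inference sketch: indicators I[a, t], expected-correct objective,
# coherency (one type per argument), non-overlap (one of a1, a2 is null).
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

types = ["A0", "C-A1", "A1", "NULL"]
P = np.array([[0.3, 0.2, 0.2, 0.3],      # P(a1 = t), from slide 42
              [0.6, 0.0, 0.0, 0.4]])     # P(a2 = t)
n_args, n_types = P.shape

cons = []
for a in range(n_args):                  # coherency: sum_t I[a, t] = 1
    row = np.zeros(P.size)
    row[a * n_types:(a + 1) * n_types] = 1
    cons.append(LinearConstraint(row, lb=1, ub=1))

row = np.zeros(P.size)                   # non-overlap: >= 1 NULL among a1, a2
row[types.index("NULL")::n_types] = 1
cons.append(LinearConstraint(row, lb=1, ub=np.inf))

res = milp(c=-P.ravel(), constraints=cons,
           integrality=np.ones(P.size), bounds=Bounds(0, 1))
I = res.x.reshape(n_args, n_types)
print([types[t] for t in I.argmax(axis=1)], -res.fun)  # ['NULL', 'A0'] 0.9
```
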
47
Features are Important
  • Here, a discussion of the features should go.
  • Which are most important?
  • Comparison with other approaches.

48
(No Transcript)