Title: Constrained Conditional Models: Learning and Inference for Information Extraction and Natural Language Understanding


1
Constrained Conditional Models Learning and
Inference for Information Extraction and
Natural Language Understanding
  • Dan Roth
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign

With thanks to collaborators: Ming-Wei Chang, Dan Goldwasser, Vasin Punyakanok, Lev Ratinov, Nick Rizzolo, Mark Sammons, Ivan Titov, Scott Yih, Dav Zimak

Funding: ARDA, under the AQUAINT program; NSF ITR IIS-0085836, ITR IIS-0428472, ITR IIS-0085980, SoD-HCER-0613885; a DOI grant under the Reflex program; DHS; DASH Optimization (Xpress-MP)

June 2009: ILPNLP Workshop @ NAACL-HLT
2
Constrained Conditional Models (CCMs)
  • Informally
  • Everything that has to do with global constraints
    (and learning models)
  • A bit more formally
  • We typically make decisions based on models such
    as
  • With CCMs we make decisions based on models such
    as
  • This is a global inference problem (you can solve
    it multiple ways)
  • We do not dictate how models are learned.
  • but we'll discuss it and make suggestions

CCMs assign values to variables in the
presence/guided by constraints
3
Constraints Driven Learning and Decision Making
  • Why Constraints?
  • The goal: building good NLP systems easily
  • We have prior knowledge at hand
  • How can we use it?
  • Often knowledge can be injected directly and be
    used to
  • improve decision making
  • guide learning
  • simplify the models we need to learn
  • How useful are constraints?
  • Useful for supervised learning
  • Useful for semi-supervised learning
  • Sometimes more efficient than labeling data
    directly

4
Make my day
5
Learning and Inference
  • Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcomes.
  • E.g., structured output problems: multiple dependent output variables
  • (Main playground for these methods so far)
  • (Learned) models/classifiers for different
    sub-problems
  • In some cases, not all local models can be
    learned simultaneously
  • Key examples in NLP are Textual Entailment and QA
  • In these cases, constraints may appear only at
    evaluation time
  • Incorporate the models' information, along with prior knowledge/constraints, in making coherent decisions
  • decisions that respect the local models as well as domain- and context-specific knowledge/constraints.

6
Comprehension
A process that maintains and updates a collection
of propositions about the state of affairs.
  • (ENGLAND, June, 1989) - Christopher Robin is
    alive and well. He lives in England. He is the
    same person that you read about in the book,
    Winnie the Pooh. As a boy, Chris lived in a
    pretty home called Cotchfield Farm. When Chris
    was three years old, his father wrote a poem
    about him. The poem was printed in a magazine
    for others to read. Mr. Robin then wrote a book.
    He made up a fairy tale land where Chris lived.
    His friends were animals. There was a bear
    called Winnie the Pooh. There was also an owl
    and a young pig, called a piglet. All the
    animals were stuffed toys that Chris owned. Mr.
    Robin made them come to life with his words. The
    places in the story were all near Cotchfield
    Farm. Winnie the Pooh was written in 1925.
    Children still love to read about Christopher
    Robin and his animal friends. Most people don't
    know he is a real person who is grown now. He
    has written two books of his own. They tell what
    it is like to be famous.

1. Christopher Robin was born in England.
2. Winnie the Pooh is a title of a book.
3. Christopher Robin's dad was a magician.
4. Christopher Robin must be at least 65 now.
This is an Inference Problem
7
This Talk Constrained Conditional Models
  • A general inference framework that combines
  • Learning conditional models with using
    declarative expressive constraints
  • Within a constrained optimization framework
  • Formulate a decision process as a constrained
    optimization problem
  • Break up a complex problem into a set of sub-problems and require the components' outcomes to be consistent modulo constraints
  • Has been shown useful in the context of many NLP
    problems
  • SRL, summarization, co-reference, information extraction, transliteration
  • [Roth & Yih 04, 07; Punyakanok et al. 05, 08; Chang et al. 07, 08; Clarke & Lapata 06, 07; Denis & Baldridge 07; Goldwasser & Roth 08]
  • Here: focus on learning and inference for structured NLP problems
  • Issues to attend to:
  • While we formulate the problem as an ILP, inference can be done in multiple ways
  • Search, sampling, dynamic programming, SAT, ILP
  • The focus is on joint global inference
  • Learning may or may not be joint.
  • Decomposing models is often beneficial

8
Outline
  • Constrained Conditional Models
  • Motivation
  • Examples
  • Training Paradigms: investigate ways for training models and combining constraints
  • Joint Learning and Inference vs. decoupling Learning & Inference
  • Training with Hard and Soft Constraints
  • Guiding Semi-Supervised Learning with Constraints
  • Examples
  • Semantic Parsing
  • Information Extraction
  • Pipeline processes

9
Pipeline
  • Most problems are not single classification
    problems

[Pipeline diagram: Raw Data → POS Tagging → Phrases → Semantic Entities → Relations; further stages: Parsing, WSD, Semantic Role Labeling]
  • Conceptually, Pipelining is a crude approximation
  • Interactions occur across levels, and downstream decisions often interact with previous decisions.
  • Leads to propagation of errors
  • Occasionally, later stage problems are easier but
    cannot correct earlier errors.
  • But, there are good reasons to use pipelines
  • Putting everything in one basket may not be right
  • How about choosing some stages and thinking about them jointly?

10
Inference with General Constraint Structure
[Roth & Yih 04]: Recognizing Entities and Relations
Improvement over no inference: 2-5%
[Figure: for each candidate entity in the sentence, the classifier's distribution over labels {other, per, loc}, e.g. (other 0.05, per 0.85, loc 0.10), (other 0.05, per 0.50, loc 0.45), (other 0.10, per 0.60, loc 0.30)]
$x^* = \arg\max_x \sum_v c(x{=}v)\, x_v
     = \arg\max_x\; c_{E_1=\text{per}}\, x_{E_1=\text{per}} + c_{E_1=\text{loc}}\, x_{E_1=\text{loc}} + \dots + c_{R_{12}=\text{spouse\_of}}\, x_{R_{12}=\text{spouse\_of}} + \dots$,
subject to constraints.
Non-Sequential
  • Key Components:
  • Write down a (linear) objective function.
  • Write down constraints as linear inequalities.
  • (A solver sketch for this example appears at the end of this slide.)

[Figure: for each candidate relation, the classifier's distribution over labels {irrelevant, spouse_of, born_in}, e.g. (irrelevant 0.10, spouse_of 0.05, born_in 0.85), (irrelevant 0.05, spouse_of 0.45, born_in 0.50)]
  • Some Questions
  • How to guide the global inference?
  • Why not learn Jointly?

Models could be learned separately; constraints may come up only at decision time.
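Not part of the original slides: a minimal sketch of the inference above written as an explicit ILP, using the open-source PuLP package. The entity/relation names, the scores (taken loosely from the figure), and the coherence constraints (spouse_of needs two persons; born_in needs a person and a location) are illustrative assumptions, not the talk's exact setup.

```python
import pulp

# Illustrative local-classifier scores (loosely following the figure above).
ent_scores = {
    "E1": {"other": 0.05, "per": 0.85, "loc": 0.10},
    "E2": {"other": 0.10, "per": 0.60, "loc": 0.30},
    "E3": {"other": 0.05, "per": 0.50, "loc": 0.45},
}
rel_scores = {
    ("E1", "E2"): {"irrelevant": 0.05, "spouse_of": 0.45, "born_in": 0.50},
    ("E1", "E3"): {"irrelevant": 0.10, "spouse_of": 0.05, "born_in": 0.85},
}

prob = pulp.LpProblem("entities_and_relations", pulp.LpMaximize)
xe = {(e, l): pulp.LpVariable(f"x_{e}_{l}", cat="Binary")
      for e, ls in ent_scores.items() for l in ls}
xr = {(r, l): pulp.LpVariable(f"x_{r[0]}_{r[1]}_{l}", cat="Binary")
      for r, ls in rel_scores.items() for l in ls}

# Linear objective: total score of the selected joint assignment.
prob += (pulp.lpSum(ent_scores[e][l] * xe[e, l] for (e, l) in xe)
         + pulp.lpSum(rel_scores[r][l] * xr[r, l] for (r, l) in xr))

# Each entity and each relation takes exactly one label.
for e, ls in ent_scores.items():
    prob += pulp.lpSum(xe[e, l] for l in ls) == 1
for r, ls in rel_scores.items():
    prob += pulp.lpSum(xr[r, l] for l in ls) == 1

# Coherence constraints as linear inequalities (assumed for illustration):
# spouse_of(a, b) needs per(a) and per(b); born_in(a, b) needs per(a) and loc(b).
for (a, b) in rel_scores:
    prob += xr[(a, b), "spouse_of"] <= xe[a, "per"]
    prob += xr[(a, b), "spouse_of"] <= xe[b, "per"]
    prob += xr[(a, b), "born_in"] <= xe[a, "per"]
    prob += xr[(a, b), "born_in"] <= xe[b, "loc"]

prob.solve()
print([k for k, v in {**xe, **xr}.items() if v.value() == 1])
```

The point of the sketch: the local classifiers only supply the scores; global coherence is enforced entirely by the linear constraints at decision time.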
11
Problem Setting
  • Random Variables Y
  • Conditional Distributions P (learned by
    models/classifiers)
  • Constraints C: any Boolean function defined over partial assignments (possibly with weights W)
  • Goal: find the best assignment
  • the assignment that achieves the highest global performance.
  • This is an Integer Programming Problem

[Figure: output variables (y1, ..., y7) defined over the observations]
$Y^* = \arg\max_Y P(Y)$, subject to constraints $C$
12
Formal Model
Subject to constraints
(Soft) constraints component
How to solve? This is an Integer Linear Program. Solving it with ILP packages gives an exact solution; search techniques are also possible.
How to train? How to decompose the global objective function? Should we incorporate constraints in the learning process?
(The objective is written out below.)
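The formula on this slide is an image in the original deck; a reconstruction of the CCM objective it refers to, consistent with the summary slide near the end of the talk, is:

```latex
\[
y^* \;=\; \arg\max_{y}\;
\underbrace{\sum_i w_i\,\phi_i(x,y)}_{\text{learned conditional models}}
\;-\;
\underbrace{\sum_k \rho_k\, d_{C_k}(x,y)}_{\text{(soft) constraints component}}
\]
```

Here $d_{C_k}(x,y)$ measures how much the assignment $y$ violates constraint $C_k$, and $\rho_k$ is its penalty (taken to be infinite for hard constraints).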
13
Example: Semantic Role Labeling
Who did what to whom, when, where, why, ...
  • I left my pearls to my daughter in my will .
  • [I]_A0 left [my pearls]_A1 to [my daughter]_A2 [in my will]_AM-LOC .
  • A0: Leaver
  • A1: Things left
  • A2: Benefactor
  • AM-LOC: Location
  • I left my pearls to my daughter in my will
    .

Special case (a structured output problem): here, all the data is available at one time; in general, classifiers might be learned from different sources, at different times, in different contexts. This has implications for the training paradigms.
Example constraints: no overlapping arguments; if A2 is present, A1 must also be present.
14
Semantic Role Labeling (2/2)
  • PropBank [Palmer et al. 05] provides a large human-annotated corpus of semantic verb-argument relations.
  • It adds a layer of generic semantic labels to
    Penn Tree Bank II.
  • (Almost) all the labels are on the constituents
    of the parse trees.
  • Core arguments: A0-A5 and AA
  • different semantics for each verb
  • specified in the PropBank Frame files
  • 13 types of adjuncts, labeled AM-arg
  • where arg specifies the adjunct type

15
Algorithmic Approach
Identify Vocabulary
candidate arguments
  • Identify argument candidates
  • Pruning [Xue & Palmer, EMNLP 04]
  • Argument Identifier
  • Binary classification (SNoW)
  • Classify argument candidates
  • Argument Classifier
  • Multi-class classification (SNoW)
  • Inference
  • Use the estimated probability distribution given
    by the argument classifier
  • Use structural and linguistic constraints
  • Infer the optimal global output

EASY
Inference over (old and new) Vocabulary
I left my nice pearls to her
16
Inference
I left my nice pearls to her
  • The output of the argument classifier often
    violates some constraints, especially when the
    sentence is long.
  • Finding the best legitimate output is formalized as an optimization problem and solved via Integer Linear Programming [Punyakanok et al. 04; Roth & Yih 04, 05, 07].
  • Input
  • The probability estimation (by the argument
    classifier)
  • Structural and linguistic constraints
  • Allows incorporating expressive (non-sequential) constraints on the variables (the arguments' types).

17
Semantic Role Labeling (SRL)
  • I left my pearls to my daughter in my will .

[Figure: for each candidate argument of the sentence, the argument classifier's score for each argument type (values such as 0.5, 0.15, 0.6, 0.7, ...)]
18
Semantic Role Labeling (SRL)
  • I left my pearls to my daughter in my will .

[Figure: the same candidate-argument score table as the previous slide]
19
Semantic Role Labeling (SRL)
  • I left my pearls to my daughter in my will .

[Figure: the same candidate-argument score table as the previous slides]
One inference problem for each verb predicate.
20
Integer Linear Programming Inference
  • For each argument a_i:
  • set up a Boolean variable a_{i,t} indicating whether a_i is classified as t
  • Goal is to maximize  $\sum_i \sum_t \mathrm{score}(a_i = t)\; a_{i,t}$
  • subject to the (linear) constraints (written out after this slide)
  • If score(a_i = t) = P(a_i = t), the objective is to find the assignment that maximizes the expected number of arguments that are correct and satisfies the constraints.

The Constrained Conditional Model is completely
decomposed during training
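The objective on the slide is an image; written out (with $x_{i,t}$ standing for the Boolean indicator variable called $a_{i,t}$ above), the inference problem is roughly:

```latex
\[
\max_{x}\;\sum_i \sum_{t} \mathrm{score}(a_i = t)\; x_{i,t}
\qquad \text{s.t.} \qquad
\sum_{t} x_{i,t} = 1 \;\;\forall i, \qquad
x_{i,t} \in \{0,1\},
\]
```

together with the structural constraints of the next slide (e.g., $\sum_i x_{i,\mathrm{A0}} \le 1$).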
21
Constraints
Any Boolean rule can be encoded as a linear
constraint.
  • No duplicate argument classes:  $\sum_{a \in \text{POTARG}} x_{a=\mathrm{A0}} \le 1$
  • R-ARG:  $\forall a_2 \in \text{POTARG}:\;\; \sum_{a \in \text{POTARG}} x_{a=\mathrm{A0}} \ge x_{a_2=\text{R-A0}}$
  • C-ARG:  $\forall a_2 \in \text{POTARG}:\;\; \sum_{a \in \text{POTARG},\, a \text{ before } a_2} x_{a=\mathrm{A0}} \ge x_{a_2=\text{C-A0}}$
  • (A solver-code sketch of these constraints follows this slide.)
  • Many other possible constraints
  • Unique labels
  • No overlapping or embedding
  • Relations between the number of arguments; order constraints
  • If verb is of type A, no argument of type B

If there is an R-ARG phrase, there is an ARG phrase
If there is a C-ARG phrase, there is an ARG before it
Universally quantified rules
LBJ allows a developer to encode constraints in FOL; these are compiled into linear inequalities automatically.
Joint inference can also be used to combine different SRL systems.
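Not from the talk: a minimal sketch of how the three constraints above can be stated as linear inequalities over the ILP variables, again with PuLP. The candidate list, label set, and uniform placeholder scores are assumed for illustration; the real scores would come from the argument classifier.

```python
import pulp

candidates = ["c0", "c1", "c2", "c3"]          # POTARG, in sentence order
labels = ["A0", "A1", "A2", "R-A0", "C-A0", "null"]
scores = {(c, t): 1.0 / len(labels) for c in candidates for t in labels}  # placeholder

prob = pulp.LpProblem("srl_inference", pulp.LpMaximize)
x = {(c, t): pulp.LpVariable(f"x_{c}_{t}".replace("-", "_"), cat="Binary")
     for c in candidates for t in labels}

# Objective from the previous slide: sum_i sum_t score(a_i = t) * x_{i,t}
prob += pulp.lpSum(scores[c, t] * x[c, t] for c in candidates for t in labels)

# Each candidate takes exactly one label (possibly "null").
for c in candidates:
    prob += pulp.lpSum(x[c, t] for t in labels) == 1

# No duplicate argument classes: at most one A0.
prob += pulp.lpSum(x[c, "A0"] for c in candidates) <= 1

# R-ARG: an R-A0 may be selected only if some A0 is selected.
for c2 in candidates:
    prob += pulp.lpSum(x[c, "A0"] for c in candidates) >= x[c2, "R-A0"]

# C-ARG: a C-A0 may be selected only if some A0 is selected before it.
for i, c2 in enumerate(candidates):
    prob += pulp.lpSum(x[c, "A0"] for c in candidates[:i]) >= x[c2, "C-A0"]

prob.solve()
```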
22
Learning Based Java (LBJ)
http://L2R.cs.uiuc.edu/cogcomp/software.php
  • A modeling language for Constrained Conditional
    Models
  • Supports programming along with building learned
    models, high level specification of constraints
    and inference with constraints
  • Learning operator
  • Functions defined in terms of data
  • Learning happens at compile time
  • Integrated constraint language
  • Declarative, FOL-like syntax defines constraints
    in terms of your Java objects
  • Compositionality
  • Use any function as feature extractor
  • Easily combine existing model specifications/learned models with each other

23
Example: Semantic Role Labeling
The LBJ site provides example code for NER, a POS tagger, etc.
Declarative, FOL-style constraints written in terms of functions applied to Java objects [Rizzolo & Roth 07]
Inference produces new functions that respect the
constraints
24
Semantic Role Labeling
Screenshot from a CCG demo: http://L2R.cs.uiuc.edu/cogcomp
Semantic parsing reveals several relations in the sentence along with their arguments.
This approach produces a very good semantic parser: F1 ~90. Easy and fast: ~7 sentences/sec (using Xpress-MP).
Top-ranked system in the CoNLL 05 shared task; the key difference is the inference.
25
Textual Entailment
Phrasal verb paraphrasing [Connor & Roth 07]
Semantic Role Labeling [Punyakanok et al. 05, 08]
Entity matching [Li et al., AAAI 04, NAACL 04]
Inference for Entailment [Braz et al. 05, 07]
Is it true that? (Textual Entailment)
Eyeing the huge market potential, currently led
by Google, Yahoo took over search company
Overture Services Inc. last year
⇒
Yahoo acquired Overture
Overture is a search company
Google is a search company
Google owns Overture
.
26
Outline
  • Constrained Conditional Models
  • Motivation
  • Examples
  • Training Paradigms: investigate ways for training models and combining constraints
  • Joint Learning and Inference vs. decoupling Learning & Inference
  • Training with Hard and Soft Constraints
  • Guiding Semi-Supervised Learning with Constraints
  • Examples
  • Semantic Parsing
  • Information Extraction
  • Pipeline processes

27
Training Paradigms that Support Global Inference
  • Algorithmic approach: incorporating general constraints
  • Allow both statistical and expressive declarative constraints [ICML 05]
  • Allow non-sequential constraints (generally difficult) [CoNLL 04]
  • Coupling vs. Decoupling Training and Inference.
  • Incorporating global constraints is important, but should it be done only at evaluation time or also at training time?
  • How to decompose the objective function and train
    in parts?
  • Issues related to
  • Modularity, efficiency and performance,
    availability of training data
  • Problem specific considerations

28
Training in the presence of Constraints
  • General training paradigm:
  • First term: learning from data (could be further decomposed)
  • Second term: guiding the model by constraints
  • One can choose whether the constraints' weights are trained, when and how, or taken into account only at evaluation time.

Decompose Model (SRL case)
Decompose Model from constraints
29
Comparing Training Methods
  • Option 1: Learning + Inference (L+I, with constraints)
  • Ignore constraints during training
  • Option 2: Inference (with Constraints) Based Training (IBT)
  • Consider constraints during training
  • In both cases: global decision making with constraints
  • Question: Isn't Option 2 always better?
  • Not so simple
  • Next: the local-model story

30
Training Methods
Cartoon: each model can be more complex and may have a view on a set of output variables.
Learning + Inference (L+I): learn the models independently.
Inference Based Training (IBT): learn all models together!
[Figure: the models map the input X to the structured output Y]
Intuition: learning with constraints may make learning more difficult.
31
Training with Constraints Example
Perceptron-based Global Learning
[Figure: local models f1(x), ..., f5(x) over the input X, jointly determining the structured output Y]
Which one is better? When and Why?
32
Claims [Punyakanok et al., IJCAI 2005]
  • When the local models are easy to learn, L+I outperforms IBT.
  • In many applications, the components are identifiable and easy to learn (e.g., argument, open-close, PER).
  • Only when the local problems become difficult to solve in isolation does IBT outperform L+I, but it needs a larger number of training examples.
  • Other training paradigms are possible
  • Pipeline-like sequential models [Roth, Small & Titov, AIStats 09]
  • Identify a preferred ordering among components
  • Learn k-th model jointly with previously learned
    models

L+I is computationally cheaper and modular; IBT is better in the limit and in other extreme cases.
33
Bound Prediction
L+I vs. IBT: the more identifiable the individual problems are, the better the overall performance is with L+I
  • Local:  $\epsilon \le \epsilon_{opt} + O\big( \sqrt{ ( d \log m + \log 1/\delta ) / m } \big)$
  • Global:  $\epsilon \le 0 + O\big( \sqrt{ ( c\,d \log m + c^2 d \log 1/\delta ) / m } \big)$

Indication for hardness of problem
34
Relative Merits SRL
[Figure: relative performance of L+I vs. IBT as a function of the difficulty of the learning problem (# of features), from easy to hard]
35
Comparing Training Methods (Cont.)
  • Local Models (train independently) vs.
    Structured Models
  • In many cases, structured models might be better
    due to expressivity
  • But, what if we use constraints?
  • Local Models + Constraints vs. Structured Models + Constraints
  • Hard to tell; constraints are expressive
  • For tractability reasons, structured models have
    less expressivity than the use of constraints.
  • Local can be better, because local models are
    easier to learn

Decompose Model (SRL case)
Decompose Model from constraints
36
Example: CRFs are CCMs
But, you can do better
  • Consider a common model for sequential inference: HMM/CRF
  • Inference in this model is done via the Viterbi algorithm.
  • Viterbi is a special case of Linear Programming based inference.
  • Viterbi is a shortest-path problem, which is an LP with a canonical constraint matrix that is totally unimodular; therefore, you get integral solutions for free.
  • One can now incorporate non-sequential/expressive/declarative constraints by modifying this canonical matrix
  • No value can appear twice; a specific value must appear at least once; A → B
  • And then run the inference as an ILP. (A decoding sketch follows this slide.)

Learn a rather simple model; make decisions with a more expressive model
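Not from the talk: a minimal Viterbi decoder for HMM/CRF-style scores (the emission and transition values below are illustrative only). With only the sequential structure, this dynamic program is exact; once a non-sequential constraint such as "no label may appear twice" is added, the DP decomposition breaks, which is exactly when the shortest-path LP view plus extra linear constraints (i.e., ILP inference) becomes useful.

```python
import numpy as np

def viterbi(emission, transition):
    """emission: (n_steps, n_labels) log-scores; transition: (n_labels, n_labels) log-scores."""
    n, k = emission.shape
    score = np.full((n, k), -np.inf)
    back = np.zeros((n, k), dtype=int)
    score[0] = emission[0]
    for t in range(1, n):
        for y in range(k):
            prev = score[t - 1] + transition[:, y]     # best way to reach label y at time t
            back[t, y] = int(np.argmax(prev))
            score[t, y] = prev[back[t, y]] + emission[t, y]
    # follow back-pointers from the best final label
    path = [int(np.argmax(score[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

emission = np.log(np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]))
transition = np.log(np.full((3, 3), 1.0 / 3))
print(viterbi(emission, transition))
```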
37
Example: Semantic Role Labeling Revisited
Sequential models: Conditional Random Field, global perceptron. Training: sentence based. Testing: find the shortest path, with constraints.
Local models: logistic regression, local averaged perceptron. Training: token based. Testing: find the best assignment locally, with constraints.
38
Which Model is Better? Semantic Role Labeling
  • Experiments on SRL [Roth and Yih, ICML 2005]
  • Story: inject constraints into conditional random field models

[Column annotations from the slide: Sequential Models (CRF, CRF-D, CRF-IBT) vs. Local (Avg. P); training paradigms noted: L+I, L+I, IBT]

Model           CRF     CRF-D   CRF-IBT   Avg. P
Baseline        66.46   69.14   69.14     58.15
+ Constraints   71.94   73.91   69.82     74.49
Training Time   48      38      145       0.8

Local models are now better than sequential models! (With constraints)
Sequential models are better than local models! (No constraints)
39
Summary Training Methods
  • Many choices for training a CCM
  • Learning + Inference (L+I: training without constraints)
  • Inference Based Training (IBT: training with constraints)
  • Model decomposition
  • Advantages of L+I:
  • Requires fewer training examples
  • More efficient; most of the time, better performance
  • Modularity: easier to incorporate already-learned models.
  • Advantages of IBT:
  • Better in the limit
  • Better when there are strong interactions among the y's

Learn a rather simple model; make decisions with a more expressive model
40
Outline
  • Constrained Conditional Models
  • Motivation
  • Examples
  • Training Paradigms: investigate ways for training models and combining constraints
  • Joint Learning and Inference vs. decoupling Learning & Inference
  • Training with Hard and Soft Constraints
  • Guiding Semi-Supervised Learning with Constraints
  • Examples
  • Semantic Parsing
  • Information Extraction
  • Pipeline processes

41
Constrained Conditional Models: Soft Constraints
Subject to constraints
(Soft) constraints component
(1) Why use soft constraints? (2) How to model the degree of violation?
(3) How to solve? This is an Integer Linear Program. Solving it with ILP packages gives an exact solution; search techniques are also possible.
(4) How to train? How to decompose the global objective function? Should we incorporate constraints in the learning process?
42
(1) Why Are Soft Constraints Important?
  • Some constraints may be violated by the data.
  • Even when the gold data violates no constraints,
    the model may prefer illegal solutions.
  • If all the solutions considered by the model violate constraints, we still want to rank solutions based on their level of constraint violation.
  • Important when beam search is used
  • Rather than eliminating illegal assignments,
    re-rank them
  • Working with soft constraints [Chang et al., ACL 07]
  • Need to define the degree of violation (one common definition is sketched after this slide)
  • May be problem specific
  • Need to assign penalties to constraints

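Not spelled out on the slide: one common way to define the degree of violation (used, for example, in the constraint-driven learning line of work cited above) is the distance of the assignment y from the set of assignments that satisfy constraint C, e.g. the minimal Hamming distance; as the slide says, the exact choice may be problem specific.

```latex
\[
d_{C}(x, y) \;=\; \min_{y' \in \mathbf{1}_{C}(x)} H(y, y'),
\qquad
\mathbf{1}_{C}(x) \;=\; \{\, y' \;:\; C(x, y') = \text{true} \,\}
\]
```

In practice this is often approximated, e.g. by counting the violated instantiations of C.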
43
Information extraction without Prior Knowledge
Lars Ole Andersen . Program analysis and
specialization for the C Programming language.
PhD thesis. DIKU , University of Copenhagen, May
1994 .
Violates lots of natural constraints!
44
Examples of Constraints
  • Each field must be a consecutive list of words
    and can appear at most once in a citation.
  • State transitions must occur on punctuation
    marks.
  • The citation can only start with AUTHOR or
    EDITOR.
  • The words pp., pages correspond to PAGE.
  • Four digits starting with 20xx and 19xx are DATE.
  • Quotations can appear only in TITLE
  • ...

Easy to express pieces of knowledge
Non-propositional; may use quantifiers
45
Information Extraction with Constraints
  • Adding constraints, we get correct results!
  • Without changing the model
  • AUTHOR Lars Ole Andersen .
  • TITLE Program analysis and
    specialization for the
  • C Programming language .
  • TECH-REPORT PhD thesis .
  • INSTITUTION DIKU , University of Copenhagen
    ,
  • DATE May, 1994 .

46
Hard Constraints vs. Weighted Constraints
Constraints are close to perfect
Labeled data might not follow the constraints
47
Training with Soft Constraints
  • Need to figure out the penalty as well
  • Option 1: Learning + Inference (with constraints)
  • Learn the weights and penalties separately
  • Penalty(c) = -log P(C is violated)
  • Option 2: Inference (with Constraints) Based Training
  • Learn the weights and penalties together
  • The tradeoff between L+I and IBT is similar to what we saw earlier.

48
Inference Based Training With Soft Constraints
  • Example: Perceptron
  • Update the penalties as well!
  • For each iteration
  •   For each (X, Y_GOLD) in the training data
  •     If Y_PRED != Y_GOLD
  •       λ ← λ + F(X, Y_GOLD) - F(X, Y_PRED)
  •       ρ_i ← ρ_i + d(Y_GOLD, 1_{C_i}(X)) - d(Y_PRED, 1_{C_i}(X)),  i = 1, ...
  •     endif
  •   endfor
  • (A code sketch follows this slide.)

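Not from the slides: a minimal sketch of the update loop above in code. `phi`, `violation_counts`, and `predict_with_constraints` are hypothetical helpers; the weight and penalty vectors are plain numpy arrays. Because the penalties enter the CCM score with a negative sign, the penalty update here adds d(pred) - d(gold), which is the standard structured-perceptron step for that objective (the slide writes the same update with the sign folded in differently).

```python
import numpy as np

def ibt_soft_constraint_perceptron(data, n_constraints, n_features,
                                   phi, violation_counts,
                                   predict_with_constraints, n_iters=10):
    """data: list of (x, y_gold); y structures are assumed comparable (e.g., tuples).
    phi(x, y) -> feature vector of shape (n_features,).
    violation_counts(x, y) -> vector of d(y, 1_{C_i}(x)) for each constraint i.
    predict_with_constraints(x, w, rho) -> argmax_y  w.phi(x, y) - rho.d(x, y)."""
    w = np.zeros(n_features)
    rho = np.zeros(n_constraints)           # constraint penalties
    for _ in range(n_iters):
        for x, y_gold in data:
            y_pred = predict_with_constraints(x, w, rho)
            if y_pred != y_gold:
                # standard structured-perceptron update on the model weights
                w += phi(x, y_gold) - phi(x, y_pred)
                # increase rho_i when the prediction violates C_i more than gold does
                rho += violation_counts(x, y_pred) - violation_counts(x, y_gold)
    return w, rho
```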
49
L+I vs. IBT for Soft Constraints
  • Test on citation recognition
  • L+I: HMM + weighted constraints
  • IBT: Perceptron + weighted constraints
  • Same feature set
  • With constraints
  • Factored model is better
  • More significant with a small # of examples
  • Without constraints
  • Few labeled examples: HMM > perceptron
  • Many labeled examples: perceptron > HMM

50
Outline
  • Constrained Conditional Models
  • Motivation
  • Examples
  • Training Paradigms: investigate ways for training models and combining constraints
  • Joint Learning and Inference vs. decoupling Learning & Inference
  • Training with Hard and Soft Constraints
  • Guiding Semi-Supervised Learning with Constraints
  • Examples
  • Semantic Parsing
  • Information Extraction
  • Pipeline processes

51
Outline
  • Constrained Conditional Models
  • Motivation
  • Examples
  • Training Paradigms: investigate ways for training models and combining constraints
  • Joint Learning and Inference vs. decoupling Learning & Inference
  • Guiding Semi-Supervised Learning with Constraints
  • Features vs. Constraints
  • Hard and Soft Constraints
  • Examples
  • Semantic Parsing
  • Information Extraction
  • Pipeline processes

52
Constraints As a Way To Encode Prior Knowledge
  • Consider encoding the knowledge that
  • Entities of type A and B cannot occur
    simultaneously in a sentence
  • The Feature Way
  • Requires larger models
  • The Constraints Way
  • Keeps the model simple; add expressive constraints directly
  • A small set of constraints
  • Allows for decision time incorporation of
    constraints

Need more training data
An effective way to inject knowledge
We can use constraints as a way to replace
training data
53
Guiding Semi-Supervised Learning with Constraints
  • In traditional Semi-Supervised learning the model
    can drift away from the correct one.
  • Constraints can be used to generate better
    training data
  • At decision time, to bias the objective function
    towards favoring constraint satisfaction.
  • At training time, to improve the labeling of unlabeled data (and thus improve the model)

[Diagram: the constraints interact with the model and the unlabeled data during training, and act as decision-time constraints]
54
Semi-supervised Learning with Constraints
[Chang, Ratinov & Roth, ACL 07; ICML 08]

θ = learn(T)
For N iterations do:
    T' = ∅
    For each x in the unlabeled dataset:
        {y_1, ..., y_K} = InferenceWithConstraints(x, C, θ)
        T' = T' ∪ {(x, y_i)}, i = 1..K
    θ = γ·θ + (1 - γ)·learn(T')

Learn from the new training data; weigh the supervised and unsupervised models. (A runnable sketch follows.)
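Not from the slides: the same loop as a sketch in code, under assumed helpers. `learn` is taken to return a weight vector (numpy array) so that the supervised and newly learned models can be averaged, and `infer_top_k` stands for the constrained inference step that returns the K best (approximately) constraint-satisfying assignments.

```python
import numpy as np

def codl(labeled, unlabeled, constraints, learn, infer_top_k,
         n_iters=10, k=5, gamma=0.9):
    """Constraint-driven semi-supervised loop sketched on the slide above."""
    theta = np.asarray(learn(labeled))            # supervised starting model
    for _ in range(n_iters):
        new_data = []
        for x in unlabeled:
            # label x with the K best assignments that respect the constraints
            for y in infer_top_k(x, constraints, theta, k):
                new_data.append((x, y))
        # weigh the supervised model against the model learned on self-labels
        theta = gamma * theta + (1.0 - gamma) * np.asarray(learn(new_data))
    return theta
```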
55
Value of Constraints in Semi-Supervised Learning
[Figure: objective function value vs. # of available labeled examples, comparing learning w/o constraints (300 examples) against learning with 10 constraints (factored model)]
Constraints are used to bootstrap a semi-supervised learner: a poor model plus constraints is used to annotate unlabeled data, which in turn is used to keep training the model.
56
Constraints in a hidden layer
Single output problem: only one output (y1).
[Figure: a single output variable Y over the input X]
57
Adding Constraints Through Hidden Variables
Single output problem with hidden variables.
[Figure: a single output variable Y (y1) over the input X, with hidden variables/models (e.g., f5) in between]
58
Learning Good Feature Representation for
Discriminative Transliteration
  • (<word in the source script>, Italy) → Yes/No
  • Learning the feature representation is a structured learning problem
  • Features are the graph edges; the problem is choosing the optimal subset
  • Many constraints on the legitimacy of the active feature representation
  • → Formalize the problem as a constrained optimization problem
  • A successful solution depends on:

[Figure: a bipartite character graph between the two words; the features are the edges]
Subject to: one-to-one mapping; non-crossing; length difference restriction; language-specific constraints.
Learning a good objective function: an iterative unsupervised learning algorithm, starting from a good initial objective function (a Romanization table).
59
Iterative Objective Function Learning
[Flowchart: initial objective function (from the Romanization Table) → generate features → inference → prediction (predict labels for all word pairs) → training (update the weight vector) → repeat]
Language pair            UCDL   Prev. Sys
English-Russian (ACC)    73     63
English-Hebrew (MRR)     89.9   51
60
Summary Constrained Conditional Models
  • $y^* = \arg\max_y \sum_i w_i\,\phi_i(x, y) \;-\; \sum_i \rho_i\, d_{C_i}(x, y)$
  • First term (a Conditional Markov Random Field): a linear objective function; typically $\phi(x,y)$ will be local functions, or $\phi(x,y) = \phi(x)$
  • Second term (the constraints network): expressive constraints over output variables; soft, weighted constraints; specified declaratively as FOL formulae
  • Clearly, there is a joint probability
    distribution that represents this mixed model.
  • We would like to
  • Learn a simple model or several simple models
  • Make decisions with respect to a complex model

A key difference from MLNs, which provide a concise definition of a model, but of the whole joint model.
61
Conclusion
  • Constrained Conditional Models combine
  • Learning conditional models with using
    declarative expressive constraints
  • Within a constrained optimization framework
  • Use constraints! The framework supports
  • A clean way of incorporating constraints to bias
    and improve decisions of supervised learning
    models
  • Significant success on several NLP and IE tasks
    (often, with ILP)
  • A clean way to use (declarative) prior knowledge
    to guide semi-supervised learning
  • Training protocol matters
  • More work needed here

LBJ (Learning Based Java): http://L2R.cs.uiuc.edu/cogcomp
A modeling language for Constrained Conditional Models. Supports programming along with building learned models, high-level specification of constraints, and inference with constraints.
62
Nice to Meet You