CRFs and Joint Inference in NLP (presentation transcript)

1
CRFs and Joint Inference in NLP
  • Andrew McCallum
  • Computer Science Department
  • University of Massachusetts Amherst

Joint work with Charles Sutton, Aron Culotta,
Xuerui Wang, Ben Wellner, Fuchun Peng, Michael
Hay.
2
From Text to Actionable Knowledge
[Figure: pipeline from a document collection through a spider, filter, IE (segment, classify, associate, cluster), and database, to data mining (discover patterns: entity types, links/relations, events), yielding actionable knowledge (prediction, outlier detection, decision support)]
3
Joint Inference
[Figure: the same pipeline, now with uncertainty information flowing forward from IE into data mining, and emerging patterns flowing back from data mining into IE]
4
An HLT Pipeline
[Figure: a stack of HLT components: ASR, MT, parsing, NER, relations, coreference, TDT/summarization, SNA/KDD/events]
5
An HLT Pipeline
[Figure: the same stack of HLT components]
Unified, joint inference.
6
Joint Inference
[Figure: repeat of the slide 3 pipeline, with uncertainty information and emerging patterns exchanged between IE and data mining]
7
Solution
[Figure: the pipeline with IE and data mining merged into a single unified probabilistic model between the document collection and actionable knowledge]
8
(Linear Chain) Conditional Random Fields
Lafferty, McCallum, Pereira 2001
An undirected graphical model, trained to maximize the conditional probability of the output sequence given the input sequence.
Finite state model / graphical model:
[Figure: linear-chain CRF. Output sequence y_{t-2} ... y_{t+1} (FSM states; labels such as OTHER, PERSON, ORG, TITLE) over input sequence x_{t-2} ... x_{t+1}: "... said Jones a Microsoft VP ..."]
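For reference, the conditional distribution a linear-chain CRF defines (standard form, as in Lafferty et al. 2001), with weights λ_k and feature functions f_k over adjacent labels and the input:

$$ p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\!\Big(\sum_k \lambda_k\, f_k(y_{t-1}, y_t, x, t)\Big), \qquad Z(x) = \sum_{y'} \prod_{t=1}^{T} \exp\!\Big(\sum_k \lambda_k\, f_k(y'_{t-1}, y'_t, x, t)\Big). $$

Training maximizes this conditional likelihood of the labeled sequences; decoding uses Viterbi over the chain.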
9
Outline
  • Motivating joint inference for NLP
  • Brief introduction to Conditional Random Fields
  • Joint inference: motivation and examples
    • Joint labeling of cascaded sequences (belief propagation)
    • Joint labeling of distant entities (BP by tree reparameterization)
    • Joint co-reference resolution (graph partitioning)
    • Joint segmentation and co-ref (sparse BP)
    • Joint extraction and data mining (iterative)
  • Topical N-gram models
10
1. Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Khashayar, McCallum, ICML 2004
[Figure: a cascade of chains over the English words: part-of-speech, noun-phrase boundaries, named-entity tags]
12
1. Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Khashayar, McCallum, ICML 2004
[Figure: the same cascade of chains, predicted stage by stage]
But errors cascade: one must be perfect at every stage to do well.
13
1. Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Khashayar, McCallum, ICML 2004
[Figure: the chains coupled into one factorial CRF over the English words]
Joint prediction of part-of-speech and noun-phrase boundaries in newswire matches the cascade's accuracy with only 50% of the training data.
Inference: loopy belief propagation.
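A sketch of the factorial-CRF factorization over L stacked chains (after Sutton et al. 2004; the factor symbols here are notational choices, not the paper's): within-chain factors connect neighboring labels in each chain, and between-chain factors connect co-temporal labels in adjacent chains,

$$ p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t}\prod_{\ell=1}^{L} \Phi_\ell\big(y_{\ell,t},\, y_{\ell,t+1},\, \mathbf{x}, t\big)\; \Psi_\ell\big(y_{\ell,t},\, y_{\ell+1,t},\, \mathbf{x}, t\big). $$

The coupled grid contains cycles, hence the loopy belief propagation mentioned above.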
14
2. Jointly labeling distant mentions: Skip-chain CRFs
Sutton, McCallum, SRL 2004
[Figure: linear chain over "... Senator Joe Green said today ..." and "... Green ran for ..."; the two "Green" mentions are unconnected]
The dependency among similar, distant mentions is ignored.
15
2. Jointly labeling distant mentions: Skip-chain CRFs
Sutton, McCallum, SRL 2004
[Figure: the same text, now with a skip edge connecting the two "Green" mentions]
14% reduction in error on the most-repeated field in email seminar announcements.
Inference: tree-reparameterization BP [Wainwright et al., 2002]. See also Finkel et al., 2005.
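The skip-chain model adds one factor per skip edge on top of the linear chain. Writing I for the set of skip edges (e.g. pairs of positions holding identical capitalized words), a sketch of the factorization:

$$ p(y \mid x) = \frac{1}{Z(x)} \prod_{t} \Phi\big(y_{t-1}, y_t, x, t\big) \prod_{(u,v)\in \mathcal{I}} \Psi\big(y_u, y_v, x\big), $$

so evidence for one mention's label ("Senator Joe Green") can flow along the skip edge to its distant twin ("Green ran for ...").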
16
3. Joint co-reference among all pairs: Affinity-matrix CRF
Entity resolution / object correspondence
[Figure: mentions "... Mr Powell ...", "... Powell ...", "... she ..." joined by pairwise Y/N coreference variables with affinity weights, e.g. +45 (Mr Powell, Powell), -99 (Mr Powell, she), +11 (Powell, she)]
25% reduction in error on co-reference of proper nouns in newswire.
Inference: correlational clustering / graph partitioning.
McCallum, Wellner, IJCAI WS 2003, NIPS 2004
Bansal, Blum, Chawla, 2002
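As a minimal sketch of the correlational-clustering inference (not the authors' implementation; the affinity scores below are made up to mirror the figure), greedy agglomerative merging on pairwise affinities:

```python
# Greedy correlational clustering: repeatedly merge the pair of
# clusters whose cross-cluster affinity sum is largest and positive.
from itertools import combinations

def correlation_cluster(mentions, score):
    clusters = [{m} for m in mentions]
    while len(clusters) > 1:
        best, best_gain = None, 0.0
        for a, b in combinations(range(len(clusters)), 2):
            gain = sum(score(m, n) for m in clusters[a] for n in clusters[b])
            if gain > best_gain:
                best, best_gain = (a, b), gain
        if best is None:       # no merge has positive affinity; stop
            break
        a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters

# Toy affinities mirroring the slide: +45 (Mr Powell, Powell),
# -99 (Mr Powell, she), +11 (Powell, she).
affinity = {frozenset({"Mr Powell", "Powell"}): 45.0,
            frozenset({"Mr Powell", "she"}): -99.0,
            frozenset({"Powell", "she"}): 11.0}
print(correlation_cluster(["Mr Powell", "Powell", "she"],
                          lambda m, n: affinity[frozenset({m, n})]))
# -> clusters {Mr Powell, Powell} and {she}: the strong negative
#    (Mr Powell, she) edge vetoes the weak positive (Powell, she) edge.
```

Deciding all pairs jointly lets the strong negative edge overrule the weak positive one, which independent pairwise classification would get wrong.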
17
Transfer Learning with Factorial CRFs
Sutton, McCallum, 2005
[Figure: chain of seminar-announcement entity labels over the email's English words]
60k words of training data.

From: Terri Stankus <stankus@cs.cmu.edu>
To: seminars@cs.cmu.edu
Date: 26 Feb 1992

GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell, School of Computer Science, Carnegie Mellon University
3:30 pm, 7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

Too little labeled training data.
18
Transfer Learning with Factorial CRFs
Sutton, McCallum, 2005
Train on a related task with more data.
[Figure: chain of newswire named-entity labels over newswire English words]
200k words of training data.
CRICKET - MILLNS SIGNS FOR BOLAND CAPE TOWN
(1996-08-22) South African provincial side Boland
said on Thursday they had signed Leicestershire
fast bowler David Millns on a one year contract.
Millns, who toured Australia with England A in
1992, replaces former England all-rounder Phillip
DeFreitas as Boland's overseas professional.
19
Transfer Learning with Factorial CRFs
Sutton, McCallum, 2005
At test time, label email with newswire NEs...
[Figure: the newswire named-entity chain applied over the email words]
20
Transfer Learning with Factorial CRFs
Sutton, McCallum, 2005
then use these labels as features for the final task.
[Figure: seminar-announcement entity chain stacked on the newswire named-entity labels, over the email's words]
21
Transfer Learning with Factorial CRFs
Sutton, McCallum, 2005
Use joint inference at test time.
[Figure: joint model over seminar-announcement entities and newswire named entities, both over the English words]
An alternative to hierarchical Bayes; one needn't know anything about the parameterization of the subtask.
Accuracy: no transfer < cascaded transfer < joint-inference transfer.
11% reduction in error.
22
4. Joint segmentation and co-reference
Extraction from, and matching of, research paper citations.
[Figure: graphical model relating observed citation strings (o), segmentations (s), citation attributes (c), pairwise co-reference decisions (y), and database field values, informed by world knowledge. Example mentions:
"Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990."
"Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990."]
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Inference: sparse generalized belief propagation.
Wellner, McCallum, Peng, Hay, UAI 2004
see also Marthi, Milch, Russell, 2003
Pal, Sutton, McCallum, 2005
23
Joint IE and Coreference from Research Paper
Citations
4. Joint segmentation and co-reference
Textual citation mentions (noisy, with duplicates) are mapped to a paper database with fields, clean, duplicates collapsed:

AUTHORS              TITLE         VENUE
Cowell, Dawid...     Probab...     Springer...
Montemerlo, Thrun    FastSLAM...   AAAI...
Kjaerulff            Approxi...    Technic...
24
Citation Segmentation and Coreference
Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, T. Smith (ed), Addison-Wesley, 1990.

Brenda Laurel. Interface Agents: Metaphors with Character, in Smith, The Art of Human-Computr Interface Design, 355-366, 1990.
25
Citation Segmentation and Coreference
(the same two citations as on slide 24)
  1. Segment citation fields
26
Citation Segmentation and Coreference
(the same two citations, with the coreference question Y/N? between them)
  1. Segment citation fields
  2. Resolve coreferent citations
27
Citation Segmentation and Coreference
(the same two citations, Y/N?)

Segmentation quality    Citation co-reference (F1)
No segmentation         78
CRF segmentation        91
True segmentation       93

  1. Segment citation fields
  2. Resolve coreferent citations
28
Citation Segmentation and Coreference
(the same two citations, Y/N?)

AUTHOR: Brenda Laurel
TITLE: Interface Agents: Metaphors with Character
PAGES: 355-366
BOOKTITLE: The Art of Human-Computer Interface Design
EDITOR: T. Smith
PUBLISHER: Addison-Wesley
YEAR: 1990

  1. Segment citation fields
  2. Resolve coreferent citations
  3. Form canonical database record

Resolving conflicts across mentions, as sketched below.
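A toy illustration of step 3 (the paper resolves such conflicts within the joint model; this standalone vote is only a sketch, with hypothetical field names):

```python
# Form a canonical record by field-wise majority vote over the
# segmentations in one coreference cluster. Ties break arbitrarily.
from collections import Counter

def canonical_record(mentions):
    """mentions: list of {field: value} dicts for one cluster."""
    fields = {f for m in mentions for f in m}
    return {f: Counter(m[f] for m in mentions if f in m).most_common(1)[0][0]
            for f in fields}

record = canonical_record([
    {"AUTHOR": "Laurel, B.", "YEAR": "1990"},
    {"AUTHOR": "Brenda Laurel", "PAGES": "355-366", "YEAR": "1990"},
])
# record keeps YEAR=1990 and PAGES=355-366; the conflicting AUTHOR
# values tie, so a real system needs a smarter conflict-resolution rule.
```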
29
Citation Segmentation and Coreference
(the same two citations, Y/N?, and the same canonical record)

  1. Segment citation fields
  2. Resolve coreferent citations
  3. Form canonical database record

Perform all three jointly.
30
IE + Coreference Model
[Figure: CRF segmentation s (field labels AUT AUT YR TITL TITL ...) over an observed citation x: "J Besag 1986 On the ..."]
31
IE + Coreference Model
[Figure: citation mention attributes c (AUTHOR: J Besag, YEAR: 1986, TITLE: On the ...) added above the segmentation s of the observed citation x]
32
IE + Coreference Model
[Figure: the same structure repeated for each citation mention: "J Besag 1986 On the ...", "Smyth , P Data mining ...", "Smyth . 2001 Data Mining ..."]
33
IE + Coreference Model
[Figure: binary coreference variables added for each pair of mentions]
34
IE + Coreference Model
[Figure: the pairwise coreference variables assigned values: the two Smyth citations corefer (y); neither matches Besag 1986 (n, n)]
35
IE + Coreference Model
[Figure: research-paper entity attribute nodes added (e.g. AUTHOR: P Smyth, YEAR: 2001, TITLE: Data Mining ...)]
36
IE + Coreference Model
[Figure: a research-paper entity attribute node connected to all of its coreferent mentions (y, y, y)]
37
IE + Coreference Model
[Figure: the complete model: segmentations s, mention attributes c, pairwise coreference variables (y, n, n), and entity attribute nodes over the observed citations x]
38
Inference by Sparse Generalized BP
Pal, Sutton, McCallum 2005
Exact inference on these linear-chain regions; from each chain, pass an N-best list into coreference.
[Figure: the three citation chains ("J Besag 1986 On the ...", "Smyth , P Data mining ...", "Smyth . 2001 Data Mining ...") feeding N-best lists upward]
39
Inference by Sparse Generalized BP
Pal, Sutton, McCallum 2005
Approximate inference by graph partitioning, integrating out the uncertainty in samples of the extraction. Made to scale to 1M citations with canopies [McCallum, Nigam, Ungar 2000].
[Figure: the coreference layer partitioned over the citation mentions]
40
Inference: Sample an N-best list from the CRF segmentation
When calculating similarity with another citation, there is more opportunity to find correct, matching fields.

Name        Title                                        Book Title                                    Year
Laurel, B.  Interface Agents: Metaphors with Character   The Art of Human Computer Interface Design    1990
Laurel, B.  Interface Agents: Metaphors with Character   The Art of Human Computer Interface Design    1990
Laurel, B.  Interface Agents: Metaphors with Character   The Art of Human Computer Interface Design    1990

Name        Title
Laurel, B   Interface Agents: Metaphors with Character The
Laurel, B.  Interface Agents: Metaphors with Character
Laurel, B.  Interface Agents: Metaphors with Character

y/n?
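A sketch of how the N-best list enters the pairwise coreference score: take the expectation of a simple field-match similarity under the two citations' segmentation distributions (function and field handling here are illustrative, not from the paper):

```python
# Expected field-overlap similarity between two citations, averaging
# over their N-best CRF segmentations weighted by probability.
def field_overlap(seg_a, seg_b):
    """Fraction of shared fields whose values match exactly."""
    shared = set(seg_a) & set(seg_b)
    if not shared:
        return 0.0
    return sum(seg_a[f] == seg_b[f] for f in shared) / len(shared)

def expected_similarity(nbest_a, nbest_b):
    """nbest_*: list of (probability, {field: value}) pairs."""
    total = sum(pa * pb * field_overlap(sa, sb)
                for pa, sa in nbest_a for pb, sb in nbest_b)
    norm = sum(p for p, _ in nbest_a) * sum(p for p, _ in nbest_b)
    return total / norm
```

Even if the top segmentation misplaces a boundary, a lower-ranked one may align the fields correctly and still contribute to the match.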
41
Inference by Sparse Generalized BP
Pal, Sutton, McCallum 2005
Exact (exhaustive) inference over the entity attributes.
[Figure: entity attribute nodes with coreference assignments y, n, n above the citation chains]
42
Inference by Sparse Generalized BP
Pal, Sutton, McCallum 2005
Revisit exact inference on the IE linear chain, now conditioned on the entity attributes.
[Figure: messages passed back from the entity attributes to the citation chains]
43
Parameter Estimation: Piecewise Training
Sutton, McCallum 2005
Divide-and-conquer parameter estimation:
  • IE linear chain: exact MAP
  • Coref graph edge weights: MAP on individual edges
  • Entity attribute potentials: MAP, pseudo-likelihood
In all cases: climb the MAP gradient with a quasi-Newton method.
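A sketch of the piecewise objective (after Sutton & McCallum 2005): each factor, or piece, a is trained with its own local normalization instead of the global Z(x):

$$ \ell_{\mathrm{PW}}(\theta) = \sum_{a} \Big[\, \theta_a^{\top} f_a(y_a, x) \;-\; \log \sum_{y'_a} \exp\big(\theta_a^{\top} f_a(y'_a, x)\big) \Big], $$

optionally plus the usual Gaussian regularizer. Each piece's gradient is cheap because the log-sum runs only over that piece's own assignments, which is what makes the quasi-Newton climb above tractable.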
44
4. Joint segmentation and co-reference
Wellner, McCallum, Peng, Hay, UAI 2004
Extraction from, and matching of, research paper citations.
[Figure: repeat of the slide 22 model: observed citations (o), segmentations (s), citation attributes (c), co-reference decisions (y), database field values, and world knowledge]
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Inference: variant of iterated conditional modes [Besag, 1986].
45
Outline
  • Motivating joint inference for NLP
  • Brief introduction to Conditional Random Fields
  • Joint inference: motivation and examples
    • Joint labeling of cascaded sequences (belief propagation)
    • Joint labeling of distant entities (BP by tree reparameterization)
    • Joint co-reference resolution (graph partitioning)
    • Joint segmentation and co-ref (sparse BP)
    • Joint extraction and data mining (iterative)
  • Topical N-gram models
46
George W. Bush's father is George H. W. Bush (son of Prescott Bush).
50
Relation Extraction as Sequence Labeling
  • George W. Bush
  • George H. W. Bush (son of Prescott Bush)

51
Learning Relational Database Features
  • George W. Bush
  • George H. W. Bush (son of Prescott Bush)

Name                 Son
Prescott Bush        George H. W. Bush
George H. W. Bush    George W. Bush
52
Highly weighted relational paths
  • Many family equivalences (relation compositions; see the sketch after this list):
    • Sibling = Parent_Offspring
    • Cousin = Parent_Sibling_Offspring
    • College = Parent_College
    • Religion = Parent_Religion
    • Ally = Opponent_Opponent
    • Friend = Person_Same_School
  • Preliminary results: a nice performance boost from relational features (8% absolute F1)
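A sketch of how such path features can be read off by composing relations in the extracted database (the tuples and relation names below are toy illustrations; e.g. Sibling = Parent_Offspring means "one's parent's offspring"):

```python
# Compose binary relations stored as sets of (head, tail) tuples to
# evaluate relational-path features such as Parent_Offspring.
from collections import defaultdict

db = {
    "Parent": {("George W. Bush", "George H. W. Bush"),   # (child, parent)
               ("George H. W. Bush", "Prescott Bush")},
}
db["Offspring"] = {(b, a) for a, b in db["Parent"]}       # inverse relation

def compose(rel1, rel2):
    """All (a, c) with rel1(a, b) and rel2(b, c) for some b."""
    index = defaultdict(set)
    for b, c in db[rel2]:
        index[b].add(c)
    return {(a, c) for a, b in db[rel1] for c in index[b]}

# Parent_Offspring reaches one's parent's children; a real Sibling
# feature would exclude the identity pairs this tiny DB produces.
print(compose("Parent", "Offspring"))
```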

53
Testing on Unknown Entities
  • John F. Kennedy
  • son of Joseph P. Kennedy, Sr. and Rose
    Fitzgerald

Name                 Son
Joseph P. Kennedy    John F. Kennedy
Rose Fitzgerald      John F. Kennedy
Use the relational features with a second-pass CRF.
54
Next Steps
  • Feature induction to discover complex rules
  • Measure relational features' sensitivity to noise in the DB
  • Collective inference among related relations

55
Outline
  • Motivating joint inference for NLP
  • Brief introduction to Conditional Random Fields
  • Joint inference: motivation and examples
    • Joint labeling of cascaded sequences (belief propagation)
    • Joint labeling of distant entities (BP by tree reparameterization)
    • Joint co-reference resolution (graph partitioning)
    • Joint segmentation and co-ref (sparse BP)
    • Joint extraction and data mining (iterative)
  • Topical N-gram models
56
Topical N-gram Model (our first attempt)
Wang, McCallum
[Figure: plate diagram: topics z1..z4, bigram indicators y1..y4, and words w1..w4 within a document plate D, with parameter plates of sizes T and T×W; the Greek parameter symbols and their subscripts were lost in transcription]
57
Beyond bag-of-words
Wallach
[Figure: plate diagram of a bigram topic model: topics z1..z4 over words w1..w4 in a document plate D, with word distributions conditioned on topic and previous word (parameter plate of size T×W)]
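In this bigram topic model, each word is drawn conditioned on both its topic and the previous word, which is why the word-distribution plate has size T×W. A sketch of the word likelihood (notation mine, not the slide's lost symbols):

$$ w_i \sim \mathrm{Discrete}\big(\phi_{z_i,\, w_{i-1}}\big), \qquad \phi_{z,w} \sim \mathrm{Dirichlet}(\beta) \;\; \text{for each topic } z \text{ and previous word } w. $$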
58
LDA-COL (Collocation) Model
Griffiths, Steyvers
[Figure: plate diagram of the LDA-COL model: topics z1..z4, collocation indicators y1..y4, and words w1..w4 in a document plate D; the indicator decides whether a word is drawn from its topic or from a distribution over words following the previous word (parameter plates of sizes T and W)]
59
Topical N-gram Model
Wang, McCallum
[Figure: plate diagram of the topical n-gram model: topics z1..z4, bigram indicators y1..y4, and words w1..w4 in a document plate D, with topic-specific unigram word distributions (plate T) and topic- and previous-word-specific bigram distributions (plate T×W)]
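A sketch of the topical n-gram generative story for each position i of document d (notation adapted from Wang & McCallum's papers; the slide's own symbols were lost in transcription):

$$ \begin{aligned} y_i &\sim \mathrm{Bernoulli}\big(\sigma_{z_{i-1},\, w_{i-1}}\big) && \text{(form a bigram with the previous word?)} \\ z_i &\sim \mathrm{Discrete}\big(\theta^{(d)}\big) && \text{(topic)} \\ w_i &\sim \begin{cases} \mathrm{Discrete}\big(\psi_{z_i,\, w_{i-1}}\big) & \text{if } y_i = 1 \\ \mathrm{Discrete}\big(\phi_{z_i}\big) & \text{if } y_i = 0 \end{cases} && \text{(word)} \end{aligned} $$

so consecutive words with y = 1 chain into topic-specific phrases.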
61
Topic Comparison
LDA
learning optimal reinforcement state problems policy dynamic action programming actions function markov methods decision rl continuous spaces step policies planning
62
Topic Comparison
LDA:
motion response direction cells stimulus figure contrast velocity model responses stimuli moving cell intensity population image center tuning complex directions

Topical N-grams (1):
motion visual field position figure direction fields eye location retina receptive velocity vision moving system flow edge center light local

Topical N-grams (2):
receptive field, spatial frequency, temporal frequency, visual motion, motion energy, tuning curves, horizontal cells, motion detection, preferred direction, visual processing, area mt, visual cortex, light intensity, directional selectivity, high contrast, motion detectors, spatial phase, moving stimuli, decision strategy, visual stimuli
63
Topic Comparison
LDA:
speech word training system recognition hmm speaker performance phoneme acoustic words context systems frame trained sequence phonetic speakers mlp hybrid

Topical N-grams (1):
word system recognition hmm speech training performance phoneme words context systems frame trained speaker sequence speakers mlp frames segmentation models

Topical N-grams (2):
speech recognition, training data, neural network, error rates, neural net, hidden markov model, feature vectors, continuous speech, training procedure, continuous speech recognition, gamma filter, hidden control, speech production, neural nets, input representation, output layers, training algorithm, test set, speech frames, speaker dependent
64
Summary
  • Joint inference can avoid accumulating errors in a pipeline from extraction to data mining.
  • Examples:
    • Factorial finite-state models
    • Jointly labeling distant entities
    • Coreference analysis
    • Segmentation uncertainty aiding coreference, and vice versa
    • Joint extraction and data mining
  • Many examples of sequential topic models.