Title: Inducing Structure for Perception
1. Inducing Structure for Perception
a.k.a. Slav's Split-Merge Hammer
- Slav Petrov
- Advisors: Dan Klein, Jitendra Malik
- Collaborators: L. Barrett, R. Thibaux, A. Faria, A. Pauls, P. Liang, A. Berg
2. The Main Idea
[Diagram: a complex underlying process produces the observation "He was right."; candidate explanations: the true structure, a manually specified structure, and the MLE structure]
3. The Main Idea
[Diagram: EM refines the manually specified structure into an automatically refined structure that better explains the observation "He was right."]
4. Why Structure?
- Scrambled words: the the the food cat dog ate and
- Scrambled letters: t e c a e h t g f a o d o o d n h e t d a
5. Structure is important
6. Syntactic Ambiguity
- Last night I shot an elephant in my pajamas.
7. Visual Ambiguity
Old or young?
8. Three Peaks?
9. No, One Mountain!
10. Three Domains
11. Timeline
12. Syntax
- Split-Merge Learning
- Coarse-to-Fine Inference
- Syntactic Machine Translation
- Nonparametric Bayesian Learning
- Language Modeling
- Generative vs. Conditional Learning
13. Learning Accurate, Compact and Interpretable Tree Annotation
- Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein
14. Motivation (Syntax)
He was right.
- Why?
- Information Extraction
- Syntactic Machine Translation
15. Treebank Parsing
16. Non-Independence
- Independence assumptions are often too strong.
[Chart: rule expansion distribution over all NPs]
17. The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve the statistical fit of the grammar
- Parent annotation [Johnson 98]
18. The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve the statistical fit of the grammar
- Parent annotation [Johnson 98]
- Head lexicalization [Collins 99, Charniak 00]
19. The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve the statistical fit of the grammar
- Parent annotation [Johnson 98]
- Head lexicalization [Collins 99, Charniak 00]
- Automatic clustering?
20. Learning Latent Annotations
- Brackets are known
- Base categories are known
- Only induce subcategories
Just like Forward-Backward for HMMs.
21. Inside/Outside Scores
- Inside score: $P_{\mathrm{IN}}(A_x, i, j) = P(w_i \dots w_j \mid A_x)$, the probability that the annotated symbol $A_x$ generates the span
- Outside score: $P_{\mathrm{OUT}}(A_x, i, j) = P(w_1 \dots w_{i-1} \; A_x \; w_{j+1} \dots w_n)$, the probability of everything outside the span
22. Learning Latent Annotations (Details)
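The equations on this slide did not survive extraction. As a hedged reconstruction from the standard inside/outside recursions for binary rules with latent subcategories (the rule-weight symbol $\beta$ is this sketch's notation, not necessarily the slide's):

\[
P_{\mathrm{IN}}(A_x, i, j) = \sum_{A_x \to B_y C_z} \sum_{i < k < j} \beta(A_x \to B_y C_z)\, P_{\mathrm{IN}}(B_y, i, k)\, P_{\mathrm{IN}}(C_z, k, j)
\]

\[
P_{\mathrm{OUT}}(B_y, i, k) = \sum_{A_x \to B_y C_z,\; j > k} \beta(\cdot)\, P_{\mathrm{OUT}}(A_x, i, j)\, P_{\mathrm{IN}}(C_z, k, j) \;+\; \sum_{A_x \to C_z B_y,\; j < i} \beta(\cdot)\, P_{\mathrm{OUT}}(A_x, j, k)\, P_{\mathrm{IN}}(C_z, j, i)
\]

The E-step then scores each anchored rule by its posterior,
\[
P(A_x \to B_y C_z \text{ at } (i,k,j) \mid w) \propto P_{\mathrm{OUT}}(A_x, i, j)\, \beta(A_x \to B_y C_z)\, P_{\mathrm{IN}}(B_y, i, k)\, P_{\mathrm{IN}}(C_z, k, j),
\]
and the M-step renormalizes the accumulated counts. Because the brackets and base categories are fixed by the treebank, these sums range only over the observed tree nodes, exactly as forward-backward ranges over time steps.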
23. Overview
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
24. Refinement of the DT tag
[Tree diagram: the DT tag split into subcategories]
25. Refinement of the DT tag
[Tree diagram: further splits of the DT subcategories]
26. Hierarchical refinement of the DT tag
[Tree diagram: the binary split hierarchy over DT subcategories]
27. Hierarchical Estimation Results

Model                  F1
Baseline               87.3
Hierarchical Training  88.4
28. Refinement of the , tag
- Splitting all categories the same amount is wasteful
29. The DT tag revisited
30. Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, then roll back the splits that were least useful
32. Adaptive Splitting
- Evaluate the loss in likelihood from removing each split:
- Data likelihood with the split reversed
- Data likelihood with the split
- No loss in accuracy when 50% of the splits are reversed.
33. Adaptive Splitting (Details)
- True data likelihood
- Approximate likelihood with the split at node n reversed
- Approximate loss in likelihood
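The formulas for these three quantities were lost in extraction. As a hedged sketch of the split-merge approximation (the notation below is this sketch's, not necessarily the slide's): with subcategories $A_1, A_2$ of $A$ occurring at node $n$ with relative frequencies $p_1, p_2$, merge the scores at $n$,

\[
P'_{\mathrm{IN}}(n) = p_1 P_{\mathrm{IN}}(n, A_1) + p_2 P_{\mathrm{IN}}(n, A_2), \qquad
P'_{\mathrm{OUT}}(n) = P_{\mathrm{OUT}}(n, A_1) + P_{\mathrm{OUT}}(n, A_2),
\]

so the data likelihood with the split reversed at $n$ alone is approximately
\[
P_n(w) \approx P(w) - \sum_{x \in \{1,2\}} P_{\mathrm{IN}}(n, A_x)\, P_{\mathrm{OUT}}(n, A_x) + P'_{\mathrm{IN}}(n)\, P'_{\mathrm{OUT}}(n),
\]
and the approximate loss for a split is the product of $P_n(w)/P(w)$ over all occurrences $n$ of the split symbol.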
34. Adaptive Splitting Results

Model             F1
Previous          88.4
With 50% Merging  89.5
35. Number of Phrasal Subcategories
[Bar chart: learned subcategory counts for all phrasal categories]
36. Number of Phrasal Subcategories
[Chart highlight: NP, VP and PP receive the most subcategories]
37. Number of Phrasal Subcategories
[Chart highlight: rare categories such as NAC and X receive the fewest]
38. Number of Lexical Subcategories
[Chart highlight: POS, TO and , receive few subcategories]
39. Number of Lexical Subcategories
[Chart highlight: RB, VBx, IN and DT]
40. Number of Lexical Subcategories
[Chart highlight: NNP, JJ, NNS and NN receive the most subcategories]
41. Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics
42. Linear Smoothing
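The smoothing formula on this slide was lost in extraction. As a hedged reconstruction of linear interpolation toward the mean of the subcategories (the smoothing weight $\alpha$ and the symbols are this sketch's assumptions):

\[
\beta'(A_x \to B_y C_z) = (1 - \alpha)\, \beta(A_x \to B_y C_z) + \alpha\, \bar{\beta}, \qquad
\bar{\beta} = \frac{1}{n} \sum_{x'} \beta(A_{x'} \to B_y C_z),
\]

so all subcategories $A_{x'}$ of a category $A$ share statistics instead of overfitting independently.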
43. Result Overview

Model           F1
Previous        89.5
With Smoothing  90.7
44. Linguistic Candy
- Proper Nouns (NNP):

NNP-14  Oct.  Nov.       Sept.
NNP-12  John  Robert     James
NNP-2   J.    E.         L.
NNP-1   Bush  Noriega    Peters
NNP-15  New   San        Wall
NNP-3   York  Francisco  Street

- Personal pronouns (PRP):

PRP-0  It  He    I
PRP-1  it  he    they
PRP-2  it  them  him
45. Linguistic Candy
- Relative adverbs (RBR):

RBR-0  further  lower    higher
RBR-1  more     less     More
RBR-2  earlier  Earlier  later

- Cardinal Numbers (CD):

CD-7   one      two      Three
CD-4   1989     1990     1988
CD-11  million  billion  trillion
CD-0   1        50       100
CD-3   1        30       31
CD-9   78       58       34
46. Nonparametric PCFGs using Dirichlet Processes
- Percy Liang, Slav Petrov, Dan Klein and Michael Jordan
47. Improved Inference for Unlexicalized Parsing
- Slav Petrov and Dan Klein
48. (No Transcript)
49. Coarse-to-Fine Parsing
[Goodman 97, Charniak & Johnson 05]
50. Prune?
- For each chart item X[i,j], compute the posterior probability; prune the item if it falls below a threshold
- E.g., consider the span 5 to 12:
[Diagram: coarse chart items QP, NP, VP over the span; only the surviving items are refined]
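As a minimal Python sketch of this pruning test, assuming dense inside/outside charts from the coarse pass (the array layout, function name, and default threshold are this sketch's assumptions, not the talk's implementation):

```python
import numpy as np

def pruning_mask(inside, outside, sentence_prob, threshold=1e-4):
    """inside[X, i, j] / outside[X, i, j]: coarse-grammar scores for
    symbol X over span (i, j).  Returns True where the item survives.

    Posterior of a chart item given the sentence w:
        P(X, i, j | w) = P_OUT(X, i, j) * P_IN(X, i, j) / P(w)
    """
    posterior = inside * outside / sentence_prob
    # Items below the threshold are pruned; only the survivors are
    # split into refined subcategories (e.g. QP1, QP2) next pass.
    return posterior >= threshold
```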
51.
- 1621 min
- 111 min
- (no search error)
52. Hierarchical Pruning
- Consider again the span 5 to 12:
coarse:          QP NP VP
split in two:    QP1 QP2 NP1 NP2 VP1 VP2
split in four:   QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4
split in eight:  ...
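A skeletal Python sketch of the hierarchical loop, under assumed interfaces (the per-grammar pass functions and the dict-of-posteriors protocol are hypothetical, not the actual parser API):

```python
def coarse_to_fine(sentence, passes, threshold=1e-4):
    """passes: one chart-parsing function per grammar G0, G1, ..., G.
    Each takes (sentence, allowed) and returns a dict mapping chart
    items (symbol, start, end) to posterior probabilities."""
    allowed = None            # the coarsest pass runs unconstrained
    posteriors = {}
    for run_pass in passes:
        posteriors = run_pass(sentence, allowed)
        # Keep only items whose posterior clears the threshold; the
        # next grammar considers just the splits of the survivors.
        allowed = {item for item, p in posteriors.items()
                   if p >= threshold}
    return posteriors         # final chart from the most refined pass
```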
53. Intermediate Grammars
[Diagram: the grammar hierarchy from the X-Bar grammar G0 through intermediate grammars to the final grammar G]
54.
- 1621 min
- 111 min
- 35 min
- (no search error)
55. State Drift (DT tag)
56. Projected Grammars
[Diagram: the refined grammar G projected back onto coarser grammars, down to the X-Bar grammar G0]
57. Estimating Projected Grammars
[Diagram: nonterminals S0, S1, NP0, NP1, VP0, VP1 in G map to the corresponding nonterminals in the projection π(G)]
58. Estimating Projected Grammars

S → NP VP

S1 → NP1 VP1  0.20
S1 → NP1 VP2  0.12
S1 → NP2 VP1  0.02
S1 → NP2 VP2  0.03
S2 → NP1 VP1  0.11
S2 → NP1 VP2  0.05
S2 → NP2 VP1  0.08
S2 → NP2 VP2  0.12
59. Estimating Projected Grammars
[Corazza & Satta 06]
[Diagram: the split rule probabilities combine into the projected rule S → NP VP with probability 0.56]
60. Calculating Expectations
- Nonterminals: c_k(X), expected counts up to depth k
- Converges within 25 iterations (a few seconds)
- Rules
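The slide's formulas were lost in extraction. As a hedged Python sketch of the fixed-point computation (the variable names and binary-rule representation are this sketch's assumptions; expected rule counts would then follow as `totals[parent] * prob` per rule):

```python
import numpy as np

def expected_counts(rules, n_symbols, root=0, iters=25):
    """rules: list of (parent, left, right, prob) for a binary PCFG.
    Returns the expected number of occurrences of each nonterminal,
    accumulated depth by depth (c_k converges as k grows)."""
    depth_counts = np.zeros(n_symbols)
    depth_counts[root] = 1.0      # the root occurs once, at depth 0
    totals = depth_counts.copy()
    for _ in range(iters):        # advance one depth level per pass
        next_counts = np.zeros(n_symbols)
        for parent, left, right, prob in rules:
            # each expected parent occurrence spawns both children
            # with the rule's probability
            next_counts[left] += depth_counts[parent] * prob
            next_counts[right] += depth_counts[parent] * prob
        depth_counts = next_counts
        totals += depth_counts
    return totals
```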
61.
- 1621 min
- 111 min
- 35 min
- 15 min
- (no search error)
62. Parsing times
[Chart: parsing time per grammar in the hierarchy, from X-Bar (G0) to G]
63. Bracket Posteriors (after G0)
64. Bracket Posteriors (after G1)
65. Bracket Posteriors (Movie) (Final Chart)
66. Bracket Posteriors (Best Tree)
67. Parse Selection
- Computing the most likely unsplit tree is NP-hard. Options:
- Settle for the best derivation.
- Rerank an n-best list.
- Use an alternative objective function.
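As a hedged note on the last option (the exact objective on the slide is not recoverable from this transcript): one such alternative is to select the unsplit tree that maximizes the product of posterior rule probabilities, marginalizing the subcategories per rule rather than per tree,

\[
T^* = \arg\max_T \prod_{r \in T} P(r \mid w), \qquad
P(r \mid w) = \sum_{\text{splits of } r} P(r_{\text{split}} \mid w),
\]

which is tractable with a single Viterbi-style pass over the marginalized rule scores.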
68. Final Results (Efficiency)
- Berkeley Parser: 15 min, 91.2 F-score, implemented in Java
- Charniak & Johnson 05 Parser: 19 min, 90.7 F-score, implemented in C
69. Final Results (Accuracy)

                                         ≤40 words F1   all F1
ENG  Charniak & Johnson 05 (generative)  90.1           89.6
ENG  This Work                           90.6           90.1
GER  Dubey 05                            76.3           -
GER  This Work                           80.8           80.1
CHN  Chiang et al. 02                    80.0           76.6
CHN  This Work                           86.3           83.4
70. Conclusions (Syntax)
- Split-Merge Learning
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
- Hierarchical Coarse-to-Fine Inference
- Projections
- Marginalization
- Multi-lingual Unlexicalized Parsing
71. Generative vs. Discriminative
- Conditional Estimation
- L-BFGS
- Iterative Scaling
- Conditional Structure
- Alternative Merging Criterion
72. How much supervision?
73. Syntactic Machine Translation
- Collaboration with ISI/USC
- Use parse trees
- Use annotated parse trees
- Learn split synchronous grammars
74. Speech
- Split-Merge Learning
- Coarse-to-Fine Decoding
- Speech Synthesis
- Combined Generative & Conditional Learning
75. Learning Structured Models for Phone Recognition
- Slav Petrov, Adam Pauls, Dan Klein
76. Motivation (Speech)
77. Traditional Models
[Diagram: an HMM for the phone sequence d-a-d from Start to End, with a Begin-Middle-End structure for each phone]
78. Model Overview
[Diagram: the traditional model vs. our model]
79. Differences from Grammars
80. (No Transcript)
81. Refinement of the ih-phone
82. Inference
- Coarse-to-Fine
- Variational Approximation
83. Phone Classification Results

Method                                    Error Rate (%)
GMM Baseline (Sha and Saul, 2006)         26.0
HMM Baseline (Gunawardana et al., 2005)   25.1
SVM (Clarkson and Moreno, 1999)           22.4
Hidden CRF (Gunawardana et al., 2005)     21.7
This Paper                                21.4
Large Margin GMM (Sha and Saul, 2006)     21.1
84. Phone Recognition Results

Method                                                     Error Rate (%)
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)   27.1
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)    27.1
This Paper                                                 26.1
Bayesian Triphone HMM (Ming and Smith, 1998)               25.6
Heterogeneous classifiers (Halberstadt and Glass, 1998)    24.4
85. Confusion Matrix
86. How much supervision?
- Hand-aligned: exact phone boundaries are known
- Automatically aligned: only the sequence of phones is known
87. Generative & Conditional Learning
- Learn structure generatively
- Estimate Gaussians conditionally
- Collaboration with Fei Sha
88. Speech Synthesis
- Acoustic phone model:
- Generative
- Accurate
- Models phone-internal structure well
- Use it for speech synthesis!
89. Large Vocabulary ASR
- ASR System = Acoustic Model + Decoder
- Coarse-to-Fine Decoder:
- Subphone → Phone
- Phone → Syllable → Word → Bigram → ?
90. Scenes
- Split-Merge Learning
- Decoding
91. Motivation (Scenes)
Seascape
92. Motivation (Scenes)
93. Learning
- Oversegment the image
- Extract vertical stripes
- Extract features
- Train HMMs
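A minimal Python sketch of this pipeline, assuming per-segment feature vectors are already extracted; the hmmlearn library and all names here are this sketch's assumptions, not the talk's implementation:

```python
import numpy as np
from hmmlearn import hmm  # assumed third-party dependency

def train_scene_hmm(stripe_features, n_regions=4, seed=0):
    """Fit an HMM over vertical image stripes: each stripe is a
    top-to-bottom sequence of segment feature vectors, of shape
    (n_segments, n_features), so hidden states can correspond to
    scene regions such as sky / mountain / sea."""
    X = np.vstack(stripe_features)
    lengths = [f.shape[0] for f in stripe_features]
    model = hmm.GaussianHMM(n_components=n_regions, random_state=seed)
    model.fit(X, lengths)  # Baum-Welch over all training stripes
    return model

# Decoding a new stripe: the Viterbi state sequence labels each
# segment with a region; horizontal consistency across neighboring
# stripes would be enforced in a separate pass.
# labels = model.predict(new_stripe_features)
```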
94. Inference
- Decode stripes
- Enforce horizontal consistency
95. Alternative Approach
- Conditional Random Fields
- Pro:
- Vertical and horizontal dependencies learnt
- Inference more natural
- Contra:
- Computationally more expensive
96. Timeline
97. Results so far
- State-of-the-art parser for different languages
- Automatically learnt
- Simple & Compact
- Fast & Accurate
- Available for download
- Phone recognizer
- Automatically learnt
- Competitive performance
- Good foundation for a speech recognizer
98. Proposed Deliverables
- Syntax Parser
- Speech Recognizer
- Speech Synthesizer
- Syntactic Translation Machine
- Scene Recognizer