Inducing Structure for Perception

Transcript and Presenter's Notes

Title: Inducing Structure for Perception


1
Inducing Structure for Perception
a.k.a. Slav's Split & Merge Hammer
  • Slav Petrov
  • Advisors: Dan Klein, Jitendra Malik
  • Collaborators: L. Barrett, R. Thibaux, A. Faria,
    A. Pauls, P. Liang, A. Berg

2
The Main Idea
True structure
Manually specified structure
MLE structure
He was right.
Observation
Complex underlying process
3
The Main Idea
Automatically refined structure
EM
He was right.
Manually specified structure
Observation
Complex underlying process
4
Why Structure?
the the the food cat dog ate and
t e c a e h t g f a o d o o d n h e t d a
(the same input scrambled at the word level and at the character level)
5
Structure is important
6
Syntactic Ambiguity
  • Last night I shot an elephant in my pajamas.

7
Visual Ambiguity
Old or young?
8
Three Peaks?
9
No, One Mountain!
10
Three Domains
11
Timeline
12
Syntax
Split & Merge Learning
Coarse-to-Fine Inference
Syntax
Syntactic Machine Translation
Non-parametric Bayesian Learning
Language Modeling
Generative vs. Conditional Learning
13
Learning Accurate, Compact and Interpretable Tree Annotation
  • Slav Petrov, Leon Barrett, Romain Thibaux, Dan
    Klein

14
Motivation (Syntax)
  • Task

He was right.
  • Why?
  • Information Extraction
  • Syntactic Machine Translation

15
Treebank Parsing
16
Non-Independence
  • Independence assumptions are often too strong.

(chart: expansion distribution of All NPs)
17
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson '98]

18
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson '98]
  • Head lexicalization [Collins '99, Charniak '00]

19
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson '98]
  • Head lexicalization [Collins '99, Charniak '00]
  • Automatic clustering?

20
Learning Latent Annotations
  • EM algorithm
  • Brackets are known
  • Base categories are known
  • Only induce subcategories

Just like Forward-Backward for HMMs.
21
Inside/Outside Scores
(figure: inside score below and outside score above a tree node labeled A_x)
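The scores themselves were lost in extraction; a hedged reconstruction from the accompanying paper (Petrov et al. '06): since the brackets are known, the recursions run over the observed tree rather than over all spans. For a node n with children n_1, n_2 and rule probabilities β,

$$P_{IN}(A_x, n) = \sum_{y,z} \beta(A_x \to B_y\, C_z)\; P_{IN}(B_y, n_1)\; P_{IN}(C_z, n_2)$$

$$P_{OUT}(B_y, n_1) = \sum_{x,z} P_{OUT}(A_x, n)\; \beta(A_x \to B_y\, C_z)\; P_{IN}(C_z, n_2)$$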
22
Learning Latent Annotations (Details)
  • E-Step
  • M-Step
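The E/M-step formulas did not survive extraction; a hedged reconstruction from the paper, for an observed tree node n with children n_1, n_2:

E-Step, posterior counts of each annotated rule:
$$P(A_x \to B_y\, C_z \mid n, T, w) \;\propto\; P_{OUT}(A_x, n)\,\beta(A_x \to B_y\, C_z)\,P_{IN}(B_y, n_1)\,P_{IN}(C_z, n_2)$$

M-Step, relative-frequency re-estimation:
$$\beta(A_x \to B_y\, C_z) := \frac{\mathbb{E}[\#(A_x \to B_y\, C_z)]}{\mathbb{E}[\#(A_x)]}$$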

23
Overview
  • Hierarchical Training
  • Adaptive Splitting
  • Parameter Smoothing
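Taken together, these three ingredients form the split-merge training loop that the following slides walk through. A hedged Python sketch; every helper here (xbar_grammar, split, run_em, merge_least_useful, smooth) is a hypothetical placeholder, not the actual Berkeley Parser API:

```python
def split_merge_train(treebank, rounds=6, merge_fraction=0.5):
    """Hypothetical sketch of split-merge grammar training."""
    grammar = xbar_grammar(treebank)          # start from a bare X-bar grammar
    for _ in range(rounds):
        grammar = split(grammar)              # split each subcategory in two
                                              # (a little noise breaks symmetry)
        grammar = run_em(grammar, treebank)   # hierarchical training with EM
        grammar = merge_least_useful(         # adaptive splitting: roll back the
            grammar, treebank, merge_fraction)  # splits that help likelihood least
        grammar = run_em(grammar, treebank)
        grammar = smooth(grammar)             # parameter smoothing (see slide 42)
    return grammar
```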
24
Refinement of the DT tag
(figure: refinement tree rooted at DT)
25
Refinement of the DT tag
(figure: refinement tree rooted at DT)
26
Hierarchical refinement of the DT tag
(figure: hierarchical split tree rooted at DT)
27
Hierarchical Estimation Results
Model F1
Baseline 87.3
Hierarchical Training 88.4
28
Refinement of the "," tag
  • Splitting all categories the same amount is
    wasteful

29
The DT tag revisited
30
Adaptive Splitting
  • Want to split complex categories more
  • Idea: split everything, roll back the splits
    which were least useful

31
Adaptive Splitting
  • Want to split complex categories more
  • Idea: split everything, roll back the splits
    which were least useful

32
Adaptive Splitting
  • Evaluate loss in likelihood from removing each
    split
  • Data likelihood with split reversed
  • Data likelihood with split
  • No loss in accuracy when 50% of the splits are
    reversed.

33
Adaptive Splitting (Details)
  • True data likelihood
  • Approximate likelihood with split at n reversed
  • Approximate loss in likelihood
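The equations were lost in extraction; a hedged reconstruction following Petrov et al. '06. At any node n of a tree T over sentence w, the true data likelihood factors through the annotations of n:

$$P(w, T) = \sum_x P_{IN}(A_x, n)\, P_{OUT}(A_x, n)$$

If the split of A into A_1, A_2 is reversed at n only, with p_1, p_2 their relative frequencies, the approximate likelihood becomes

$$P_{\text{merge}}(w, T, n) = \big(p_1 P_{IN}(A_1, n) + p_2 P_{IN}(A_2, n)\big)\big(P_{OUT}(A_1, n) + P_{OUT}(A_2, n)\big) + \sum_{x \notin \{1,2\}} P_{IN}(A_x, n)\, P_{OUT}(A_x, n)$$

and the approximate loss in likelihood for the split is the product of $P_{\text{merge}}(w, T, n) / P(w, T)$ over all occurrences n in the treebank.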

34
Adaptive Splitting Results
Model F1
Previous 88.4
With 50% Merging 89.5
35
Number of Phrasal Subcategories
36
Number of Phrasal Subcategories
(bar chart; highlighted categories: NP, VP, PP)
37
Number of Phrasal Subcategories
(bar chart; highlighted categories: NAC, X)
38
Number of Lexical Subcategories
(bar chart; highlighted tags: POS, TO, ,)
39
Number of Lexical Subcategories
(bar chart; highlighted tags: RB, VBx, IN, DT)
40
Number of Lexical Subcategories
(bar chart; highlighted tags: NNP, JJ, NNS, NN)
41
Smoothing
  • Heavy splitting can lead to overfitting
  • Idea: Smoothing allows us to pool statistics

42
Linear Smoothing
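The formula itself was lost in extraction; a hedged reconstruction from the paper: each annotated rule probability is interpolated linearly with the mean over its sibling subcategories,

$$\hat\beta(A_x \to B_y\, C_z) = (1 - \alpha)\,\beta(A_x \to B_y\, C_z) + \alpha\,\bar\beta, \qquad \bar\beta = \frac{1}{n}\sum_{x'=1}^{n} \beta(A_{x'} \to B_y\, C_z),$$

for a small interpolation weight α, so that rare subcategories pool statistics with their siblings.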
43
Result Overview
Model F1
Previous 89.5
With Smoothing 90.7
44
Linguistic Candy
  • Proper Nouns (NNP)
  • Personal pronouns (PRP)

NNP-14 Oct. Nov. Sept.
NNP-12 John Robert James
NNP-2 J. E. L.
NNP-1 Bush Noriega Peters
NNP-15 New San Wall
NNP-3 York Francisco Street
PRP-0 It He I
PRP-1 it he they
PRP-2 it them him
45
Linguistic Candy
  • Relative adverbs (RBR)
  • Cardinal Numbers (CD)

RBR-0 further lower higher
RBR-1 more less More
RBR-2 earlier Earlier later
CD-7 one two Three
CD-4 1989 1990 1988
CD-11 million billion trillion
CD-0 1 50 100
CD-3 1 30 31
CD-9 78 58 34
46
Nonparametric PCFGs using Dirichlet Processes
  • Percy Liang, Slav Petrov, Dan Klein and Michael Jordan

47
Improved Inference for Unlexicalized Parsing
  • Slav Petrov and Dan Klein

48
  • 1621 min

49
Coarse-to-Fine Parsing
[Goodman '97, Charniak & Johnson '05]
50
Prune?
  • For each chart item X[i,j], compute the posterior
    probability
  • Prune if it is < threshold
  • E.g. consider the span (5, 12):

coarse:  QP NP VP
refined: (figure: only refinements of the surviving coarse categories remain)
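As a hedged reconstruction of the pruning criterion (the slide's formula was lost): an item X spanning (i, j) survives only if its posterior under the coarse grammar clears the threshold,

$$\frac{P_{OUT}(X, i, j)\; P_{IN}(X, i, j)}{P_{IN}(\mathrm{ROOT}, 0, |w|)} \;\ge\; \text{threshold},$$

where the denominator is the total probability of the sentence w.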


51
  • 1621 min
  • 111 min
  • (no search error)

52
Hierarchical Pruning
  • Consider again the span (5, 12):

coarse:         QP NP VP
split in two:   QP1 QP2 NP1 NP2 VP1 VP2
split in four:  QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4
split in eight: …
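A minimal Python sketch of this hierarchical pruning loop; the helpers (inside_outside, refinements, the chart's fields) are hypothetical stand-ins for the real parser internals:

```python
def coarse_to_fine_parse(sentence, grammars, threshold=1e-4):
    """Parse with a sequence of grammars, coarsest (X-bar) first."""
    allowed = None  # None means: allow every chart item in the first pass
    chart = None
    for grammar in grammars:
        # Compute inside/outside scores, skipping items not in `allowed`.
        chart = inside_outside(sentence, grammar, allowed)
        allowed = set()
        for (cat, i, j), (inside, outside) in chart.items():
            posterior = inside * outside / chart.sentence_prob
            if posterior >= threshold:
                # Keep every refinement of this category for the next pass.
                allowed.update((sub, i, j) for sub in refinements(grammar, cat))
    return chart  # finest-level chart; the best tree is read off from here
```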

53
Intermediate Grammars
(figure: grammar hierarchy X-Bar = G0 → G1 → … → G)
54
  • 1621 min
  • 111 min
  • 35 min
  • (no search error)

55
State Drift (DT tag)
56
Projected Grammars
(figure: the refined grammar G projected back down the hierarchy to X-Bar = G0)
57
Estimating Projected Grammars
  • Nonterminals?

Nonterminals in G: NP0, NP1, VP0, VP1, S0, S1
Nonterminals in π(G): NP, VP, S
58
Estimating Projected Grammars
  • Rules?

S → NP VP

S1 → NP1 VP1  0.20
S1 → NP1 VP2  0.12
S1 → NP2 VP1  0.02
S1 → NP2 VP2  0.03
S2 → NP1 VP1  0.11
S2 → NP1 VP2  0.05
S2 → NP2 VP1  0.08
S2 → NP2 VP2  0.12
59
Estimating Projected Grammars
[Corazza & Satta '06]
Estimating Grammars
0.56
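The estimator itself was lost in extraction; a hedged reconstruction following Petrov & Klein '07, building on Corazza & Satta '06: each projected rule probability is a subcategory-count-weighted sum of the refined rule probabilities,

$$p\big(X \to Y\, Z\big) = \sum_{x} \frac{c(X_x)}{\sum_{x'} c(X_{x'})} \sum_{y,z} \beta\big(X_x \to Y_y\, Z_z\big),$$

where c(X_x) is the expected count of subcategory X_x in trees generated by G; the 0.56 above is the value such a sum takes in the slide's example.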
60
Calculating Expectations
  • Nonterminals
  • c_k(X): expected counts up to depth k
  • Converges within 25 iterations (few seconds)
  • Rules
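A hedged reconstruction of the recursion behind c_k(X): let M be the matrix of expected child counts per expansion, $M_{XY} = \sum_{X \to \alpha} \beta(X \to \alpha)\, \#_Y(\alpha)$. Then

$$c_{k+1} = e_{\mathrm{ROOT}} + M^{\top} c_k,$$

which converges to $c = (I - M^{\top})^{-1} e_{\mathrm{ROOT}}$ whenever the grammar is proper (spectral radius of M below one); rule expectations then follow by multiplying each rule's probability by the expected count of its left-hand side.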

61
  • 1621 min
  • 111 min
  • 35 min
  • 15 min
  • (no search error)

62
Parsing times
(chart: parsing time under each grammar in the hierarchy, from X-Bar = G0 to G)
63
Bracket Posteriors
(after G0)
64
Bracket Posteriors (after G1)
65
Bracket Posteriors
(movie: bracket posteriors; final chart)
66
Bracket Posteriors (Best Tree)
67
Parse Selection
  • Computing the most likely unsplit tree is NP-hard
  • Settle for best derivation.
  • Rerank n-best list.
  • Use alternative objective function.
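One concrete instance of such an alternative objective (a hedged sketch; Petrov & Klein '07 explore objectives in this "max-rule" family): choose the tree whose unsplit rules have the highest posterior product,

$$T^{*} = \arg\max_{T} \prod_{r \in T} P(r \mid w),$$

where each rule posterior marginalizes out the latent subcategories via inside/outside scores. Unlike the most likely unsplit tree, this is computable with standard dynamic programming.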

68
Final Results (Efficiency)
  • Berkeley Parser
  • 15 min
  • 91.2 F-score
  • Implemented in Java
  • Charniak & Johnson '05 Parser
  • 19 min
  • 90.7 F-score
  • Implemented in C++

69
Final Results (Accuracy)
                                           ≤ 40 words F1   all F1
ENG  Charniak & Johnson '05 (generative)        90.1        89.6
ENG  This Work                                  90.6        90.1

GER  Dubey '05                                  76.3         -
GER  This Work                                  80.8        80.1

CHN  Chiang et al. '02                          80.0        76.6
CHN  This Work                                  86.3        83.4
70
Conclusions (Syntax)
  • Split & Merge Learning
  • Hierarchical Training
  • Adaptive Splitting
  • Parameter Smoothing
  • Hierarchical Coarse-to-Fine Inference
  • Projections
  • Marginalization
  • Multi-lingual Unlexicalized Parsing

71
Generative vs. Discriminative
  • Conditional Estimation
  • L-BFGS
  • Iterative Scaling
  • Conditional Structure
  • Alternative Merging Criterion

72
How much supervision?
73
Syntactic Machine Translation
  • Collaboration with ISI/USC
  • Use parse trees
  • Use annotated parse trees
  • Learn split synchronous grammars

74
Speech
Split & Merge Learning
Coarse-to-Fine Decoding
Speech
Speech Synthesis
Combined Generative + Conditional Learning
75
Learning Structured Models for Phone Recognition
  • Slav Petrov, Adam Pauls, Dan Klein

76
Motivation (Speech)
77
Traditional Models
(figure: phone HMM for the word "dad", Start → d → a → d → End)
Begin - Middle - End Structure
78
Model Overview
Traditional
Our Model
79
Differences to Grammars
80
(No Transcript)
81
Refinement of the ih-phone
82
Inference
  • Coarse-To-Fine
  • Variational Approximation

83
Phone Classification Results
Method Error Rate (%)
GMM Baseline (Sha and Saul, 2006) 26.0
HMM Baseline (Gunawardana et al., 2005) 25.1
SVM (Clarkson and Moreno, 1999) 22.4
Hidden CRF (Gunawardana et al., 2005) 21.7
This Paper 21.4
Large Margin GMM (Sha and Saul, 2006) 21.1
84
Phone Recognition Results
Method Error Rate (%)
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994) 27.1
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993) 27.1
This Paper 26.1
Bayesian Triphone HMM (Ming and Smith, 1998) 25.6
Heterogeneous classifiers (Halberstadt and Glass, 1998) 24.4
85
Confusion Matrix
86
How much supervision?
  • Hand-aligned: exact phone boundaries are known
  • Automatically-aligned: only the sequence of phones
    is known

87
Generative + Conditional Learning
  • Learn structure generatively
  • Estimate Gaussians conditionally
  • Collaboration with Fei Sha

88
Speech Synthesis
  • Acoustic phone model
  • Generative
  • Accurate
  • Models phone internal structure well
  • Use it for speech synthesis!

89
Large Vocabulary ASR
  • ASR System = Acoustic Model + Decoder
  • Coarse-to-Fine Decoder
  • Subphone → Phone
  • Phone → Syllable → Word → Bigram → ?

90
Scenes
Split & Merge Learning
Decoding
Scenes
91
Motivation (Scenes)
Seascape
92
Motivation (Scenes)
93
Learning
  • Oversegment the image
  • Extract vertical stripes
  • Extract features
  • Train HMMs
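A hedged Python sketch of this training pipeline; oversegment, vertical_stripes, extract_features and fit_hmm are hypothetical placeholders for the steps listed above (the actual feature set is not recoverable from the slide):

```python
def train_scene_models(images, scene_labels):
    """Fit one HMM per scene category over vertical stripes of segments."""
    sequences = {}  # scene label -> list of per-stripe feature sequences
    for image, label in zip(images, scene_labels):
        segments = oversegment(image)              # e.g. superpixels
        for stripe in vertical_stripes(segments):  # top-to-bottom segment sequence
            feats = [extract_features(seg) for seg in stripe]
            sequences.setdefault(label, []).append(feats)
    # One HMM per scene category, trained on that category's stripe sequences
    return {label: fit_hmm(seqs) for label, seqs in sequences.items()}
```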

94
Inference
  • Decode stripes
  • Enforce horizontal consistency

95
Alternative Approach
  • Conditional Random Fields
  • Pro:
    • Vertical and horizontal dependencies learnt
    • Inference more natural
  • Contra:
    • Computationally more expensive

96
Timeline
97
Results so far
  • State-of-the-art parser for different languages
  • Automatically learnt
  • Simple & Compact
  • Fast & Accurate
  • Available for download
  • Phone recognizer
  • Automatically learnt
  • Competitive performance
  • Good foundation for speech recognizer

98
Proposed Deliverables
  • Syntax Parser
  • Speech Recognizer
  • Speech Synthesizer
  • Syntactic Translation Machine
  • Scene Recognizer

99
  • Thank You!