Transcript: Unsupervised Learning for Natural Language Processing


1
Unsupervised Learning forNatural Language
Processing
  • Dan Klein
  • Computer Science Division
  • University of California, Berkeley

2
Learning Language
Unsupervised NLP
Supervised NLP
3
Unsupervised NLP
  • Goal: induce linguistic structure not annotated in the data
  • Problem characteristics:
  • Complex linguistic phenomena
  • Rich, interacting, combinatorial structures
  • Lots of data
  • Solution characteristics:
  • Incremental / hierarchical learning
  • Careful choice of what to model
  • Careful choice of what not to model

4
Outline
  • Unsupervised Grammar Refinement
  • Unsupervised Coreference Resolution
  • Unsupervised Translation Mining

5
Syntactic Analysis
Hurricane Emily howled toward Mexico's Caribbean
coast on Sunday packing 135 mph winds and
torrential rain and causing panic in Cancun,
where frightened tourists squeezed into musty
shelters.
6
Treebank PCFGs
[Charniak 96]
  • Use PCFGs for broad-coverage parsing
  • Can take a grammar right off the trees (doesn't
    work well)

ROOT → S 1.0, S → NP VP . 1.0, NP → PRP 1.0, VP → VBD ADJP 1.0, …

Model     F1
Baseline  72.0
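As a concrete sketch of "taking a grammar right off the trees": the maximum-likelihood PCFG is just relative-frequency counts of the rules observed in the treebank. The nested-tuple tree encoding below is a hypothetical representation for illustration, not the Penn Treebank format.

```python
from collections import Counter

def pcfg_from_trees(trees):
    """Estimate rule probabilities by relative frequency over treebank trees.

    Trees are nested tuples like ('S', ('NP', ('PRP', 'I')), ...);
    string leaves are words.
    """
    rule_counts = Counter()
    lhs_counts = Counter()

    def visit(node):
        if isinstance(node, str):          # word leaf, nothing to count
            return
        lhs, children = node[0], node[1:]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(lhs, rhs)] += 1
        lhs_counts[lhs] += 1
        for c in children:
            visit(c)

    for tree in trees:
        visit(tree)
    # P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

tree = ('S', ('NP', ('PRP', 'I')), ('VP', ('VBD', 'slept')))
grammar = pcfg_from_trees([tree])
# grammar[('S', ('NP', 'VP'))] == 1.0
```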
7
Conditional Independence?
  • Not every NP expansion can fill every NP slot
  • A grammar with symbols like NP won't be
    context-free
  • Statistically, conditional independence is too
    strong

8
Grammar Refinement
  • Refining symbols improves statistical fit
  • Parent annotation [Johnson 98]

9
Grammar Refinement
  • Refining symbols improves statistical fit
  • Parent annotation [Johnson 98]
  • Head lexicalization [Collins 99, Charniak 00]

10
Grammar Refinement
  • Refining symbols improves statistical fit
  • Parent annotation [Johnson 98]
  • Head lexicalization [Collins 99, Charniak 00]
  • Automatic clustering [Petrov and Klein 06]

11
Parses and Derivations
Derivations
(figure: the same parse tree annotated with alternative latent subsymbols, e.g. NP-1 vs. NP-2; each annotation is a distinct derivation)
  • Parses (T) now have multiple derivations (t)

12
Training Objectives
[Matsuzaki et al. 05, Prescher 05]
  • One option: maximum likelihood using EM
  • Want derivation parameters which maximize parse
    likelihood
  • Other options possible:
  • Variational inference [Liang et al. 07]
  • Conditional likelihood [Petrov and Klein 08]

13
Learning Latent Grammars
  • EM algorithm
  • Brackets are known
  • Base categories are known
  • Only induce subsymbols

Just like Forward-Backward for HMMs.
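Since the E-step is "just like Forward-Backward for HMMs", here is a minimal forward-backward sketch computing posterior state marginals; the latent-grammar analogue replaces chain states with subsymbols and the chain with inside/outside scores over trees. All variable names and the toy parameters are illustrative.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Posterior state marginals gamma[t, k] = P(state_t = k | obs).

    pi: initial distribution (K,); A: transitions (K, K);
    B: emissions (K, V); obs: list of observation indices.
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))   # forward scores
    beta = np.zeros((T, K))    # backward scores
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

In the grammar setting the same product of "outside" and "inside" scores, normalized by the sentence likelihood, gives expected counts for each subsymbol rule.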
14
Refinement of the DT tag
15
Refinement of the DT tag
16
Hierarchical Refinement
17
Grammar Ontogeny
(figure: hierarchy of grammars, from the X-Bar grammar G0 to the fully refined grammar G)
18
Hierarchical Estimation Results
Model                  F1
Flat Training          87.3
Hierarchical Training  88.4
19
Refinement of the , tag
  • Splitting all categories equally is wasteful

20
Adaptive Splitting
  • Want to split complex categories more
  • Idea: split everything, roll back bad splits
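The split-then-roll-back idea can be sketched as ranking splits by their estimated contribution to the likelihood and merging back the least useful ones. The `split_gains` mapping and its values are hypothetical; a real implementation estimates each gain from the loss in data likelihood incurred by merging that split.

```python
def merge_rollback(split_gains, keep_fraction=0.5):
    """Keep the splits with the largest estimated likelihood gain;
    roll back (merge) the rest.

    split_gains: dict mapping symbol -> estimated gain from splitting it.
    Returns the set of symbols whose splits are kept.
    """
    ranked = sorted(split_gains, key=split_gains.get, reverse=True)
    n_keep = int(len(ranked) * keep_fraction)
    return set(ranked[:n_keep])

# hypothetical gains: NP and VP benefit from splitting, ',' and X do not
gains = {'NP': 9.1, 'VP': 7.4, ',': 0.01, 'X': 0.02}
kept = merge_rollback(gains)
# kept == {'NP', 'VP'}
```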

21
Adaptive Splitting Results
Model             F1
Previous          88.4
With 50% Merging  89.5
22
Number of Phrasal Subcategories
23
Number of Phrasal Subcategories
(chart, highlighting NP, VP, and PP)
24
Number of Phrasal Subcategories
(chart, highlighting NAC and X)
25
Number of Lexical Subcategories
(chart, highlighting NNP, JJ, NNS, and NN)
26
Number of Lexical Subcategories
(chart, highlighting POS, TO, and the , tag)
27
Learned Lexical Clusters
  • Proper Nouns (NNP)
  • Personal pronouns (PRP)

NNP-14 Oct. Nov. Sept.
NNP-12 John Robert James
NNP-2 J. E. L.
NNP-1 Bush Noriega Peters
NNP-15 New San Wall
NNP-3 York Francisco Street
PRP-0 It He I
PRP-1 it he they
PRP-2 it them him
28
Learned Lexical Clusters
  • Relative adverbs (RBR)
  • Cardinal Numbers (CD)

RBR-0 further lower higher
RBR-1 more less More
RBR-2 earlier Earlier later
CD-7 one two Three
CD-4 1989 1990 1988
CD-11 million billion trillion
CD-0 1 50 100
CD-3 1 30 31
CD-9 78 58 34
29
Incremental Learning
(figure: grammars learned incrementally, from the X-Bar grammar G0 to the refined grammar G)
30
Coarse-to-Fine Pruning
[Charniak 98, Charniak and Johnson 05, Petrov and Klein 07]
  • Consider the span 5 to 12

coarse:          QP  NP  VP

split in two:    QP1 QP2   NP1 NP2   VP1 VP2

split in four:   QP1 QP2 QP3 QP4   NP1 NP2 NP3 NP4   VP1 VP2 VP3 VP4

split in eight:  …
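A minimal sketch of the pruning step: items whose posterior under the coarse grammar falls below a threshold are discarded, and only the survivors are refined in the next pass. The chart encoding, threshold, and posterior values here are all illustrative, not the actual parser's data structures.

```python
def prune_chart(coarse_posteriors, threshold=1e-4):
    """Keep only (span, symbol) items whose coarse-grammar posterior
    exceeds the threshold; only these are split further in the fine pass.

    coarse_posteriors: dict mapping ((start, end), symbol) -> posterior.
    """
    return {item for item, p in coarse_posteriors.items() if p > threshold}

# hypothetical posteriors for the span (5, 12) from the coarse pass
posteriors = {((5, 12), 'QP'): 0.3,
              ((5, 12), 'NP'): 2e-6,
              ((5, 12), 'VP'): 0.01}
surviving = prune_chart(posteriors)
# only QP and VP at span (5, 12) get refined into subsymbols
```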
31
Bracket Posteriors
32
State Drift (DT tag)
33
Projected Grammars
[Petrov and Klein 07]
(figure: the refined grammar G projected back onto coarser grammars, down to X-Bar G0)
34
Coarse-to-Fine Parsing
[Petrov and Klein 07]
(figure: parsing proceeds coarse-to-fine, from X-Bar G0 up to G)
35
Final Results (Accuracy)
     Model                                 ≤40 words F1   all F1
ENG  Charniak and Johnson 05 (generative)  90.1           89.6
ENG  Split / Merge                         90.6           90.1

GER  Dubey 05                              76.3           -
GER  Split / Merge                         80.8           80.1

CHN  Chiang et al. 02                      80.0           76.6
CHN  Split / Merge                         86.3           83.4
36
Nonparametric PCFGs
Liang, Petrov, Jordan, Klein 07
37
Unstructured Phone Models
[Petrov, Pauls, Klein 07]
Standard Model vs. Automatic Splits

HMM Baseline    25.1
5 Split rounds  21.4
38
Summary
  • Latent-variable grammar refinement
  • Automatically learns good grammar splits
  • Gives state-of-the-art parsing accuracy
  • Admits very efficient parsing algorithms
  • More applications beyond parsing!

39
Outline
  • Unsupervised Grammar Refinement
  • Unsupervised Coreference Resolution
  • Unsupervised Translation Mining

40
Unsupervised Coreference
[Haghighi and Klein 07]
The Weir Group, whose headquarters
is in the U.S., is a large
specialized corporation. This power plant, which
will be situated in Jiangsu, has a large
generation capacity.
41
Generative Mention Models
[Li et al 04, Haghighi and Klein 07]
(figure: entities generating mentions across documents)
42
Generative Mention Models
Inference Time
(figure: recovering entity assignments for mentions at inference time)
43
Finite Mixture Model
44
Finite Mixture Model
Entity Distribution, Mention Parameters (K entities)

P(W | Weir Group) = { "Weir Group": 0.4, "whose": 0.2, … }

Z1 = Weir Group    Z2 = Weir Group    Z3 = Weir HQ
W1 = "Weir Group"  W2 = "whose"       W3 = "headqrts"
45
Finite Mixture Model
Entity Distribution, Mention Parameters (K entities)

Z1 = Weir Group    Z2 = Weir Group    Z3 = Weir HQ
W1 = "Weir Group"  W2 = "whose"       W3 = "headqrts"
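The assignment step for this mixture scores each candidate entity by its prior times the emission probability of the observed word. The numbers below echo the slides' toy example; the smoothing constant `unk` for unseen words is an assumption for illustration.

```python
def entity_posterior(prior, emission, word, unk=1e-9):
    """P(Z = entity | W = word), proportional to P(Z) * P(word | Z),
    for a finite mixture of entities.

    prior: dict entity -> P(Z); emission: dict entity -> {word: P(word | Z)}.
    """
    scores = {z: prior[z] * emission[z].get(word, unk) for z in prior}
    total = sum(scores.values())
    return {z: s / total for z, s in scores.items()}

# toy parameters in the spirit of the running Weir Group example
prior = {'Weir Group': 0.6, 'Weir HQ': 0.4}
emission = {'Weir Group': {'Weir Group': 0.4, 'whose': 0.2},
            'Weir HQ': {'headqrts': 0.5}}
post = entity_posterior(prior, emission, 'whose')
# 'whose' is overwhelmingly attributed to the Weir Group entity
```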
46
Infinite Mixture Model
Entity Distribution, Mention Parameters (unbounded entity set)

Z1 = Weir Group    Z2 = Weir Group    Z3 = Weir HQ
W1 = "Weir Group"  W2 = "whose"       W3 = "headqrts"
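The move from a finite to an infinite mixture can be illustrated with the Chinese-restaurant-process predictive distribution that underlies Dirichlet-process mixtures: a new mention joins an existing entity in proportion to how many mentions it already has, or starts a new entity in proportion to a concentration parameter. The counts and `alpha` below are hypothetical.

```python
def crp_distribution(counts, alpha=1.0):
    """Chinese restaurant process predictive distribution.

    P(next mention joins entity e) is proportional to count(e);
    P(next mention starts a new entity) is proportional to alpha.
    """
    total = sum(counts.values()) + alpha
    dist = {e: c / total for e, c in counts.items()}
    dist['<new>'] = alpha / total
    return dist

crp_distribution({'Weir Group': 2, 'Weir HQ': 1}, alpha=1.0)
# {'Weir Group': 0.5, 'Weir HQ': 0.25, '<new>': 0.25}
```

Because `<new>` always has nonzero probability, the number of entities is not fixed in advance, which is what "infinite mixture" buys over the finite K-entity model.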
47
Infinite Mixture Model
(chart: MUC F1 on the running Weir Group example)
48
Enriching the Mention Model
Mention Model

P(W | Weir Group) = { "Weir Group": 0.4, "whose": 0.2, … }
(figure: Z → W)
49
Enriching the Mention Model
Non-Pronoun vs. Pronoun mentions
Type ∈ {PERS, LOC, ORG, MISC}; Number ∈ {Sing, Plural}; Gender ∈ {M, F, N}
(figure: graphical models with nodes Z, T, G, N, W)
50
Enriching the Mention Model
Entity Parameters, Pronoun Parameters
(figure: pronoun model with nodes Z, T, G, N, W)
51
Enriching the Mention Model
Non-Pronoun vs. Pronoun mentions
(figure: graphical models with nodes Z, T, G, N, W)
52
Enriching the Mention Model
Mention Type ∈ {Proper, Pronoun, Nominal}
(figure: non-pronoun and pronoun models with nodes Z, N, T, G, M, W)
53
Enriching the Mention Model
(figure: the mention model replicated across mentions; nodes Z, W)
54
Enriching the Mention Model
(figure)
55
Pronoun Model
(chart: MUC F1 on the running Weir Group example)
56
Salience Model
Entity Activation: entity 1 → 1.0, entity 2 → 0.0
Salience Values ∈ {TOP, HIGH, MED, LOW, NONE}
Mention Type ∈ {Proper, Pronoun, Nominal}
(figure: nodes L, Z, S, M)
57
Salience Model
(figure: entity activations update after each mention; the mention sequence 1, 2, 2 has salience NONE, NONE, TOP, yielding mention types PROPER, PROPER, PRONOUN)
58
Salience Model
(figure: full salience model with nodes L, Z, S, T, G, N, M, W per mention)
59
Salience Model
60
Salience Model
(chart: MUC F1 on the running Weir Group example)
61
Global Coreference Resolution
62
Global Entity Model
63
Global Entity Model
(figure: model replicated over N documents)
64
Global Entity Model
(figure: model replicated over N documents)
65
HDP Model
(chart: MUC F1 on the running Weir Group example)
66
Global Entity Resolution
67
Experiments
  • MUC-6 English NWIRE (all mentions)
  • 53.6 F1  [Cardie and Wagstaff 99]  (unsupervised)
  • 70.3 F1  Unsup. Entity-Mention model  (unsupervised)
  • 73.4 F1  [McCallum and Wellner 04]  (supervised)
  • 81.3 F1  [Luo et al 04]  (supervised)
  • MUC scoring metric

68
Summary
  • Fully generative unsupervised coref model
  • Basic model of pronoun structure
  • Sequential model of local attentional state
  • HDP global coreference model ties documents
  • Competitive with supervised results
  • Many features not exploited
  • Still lots of room to improve!

69
Outline
  • Unsupervised Grammar Refinement
  • Unsupervised Coreference Resolution
  • Unsupervised Translation Mining

70
Standard MT Approach
Source Text
Target Text
  • Trained using parallel sentences
  • May not always be available
  • Need (lots of) sentences

71
MT from Monotext
Source Text
Target Text
  • Translation without parallel text?
  • Need (lots of) sentences

[Fung 95, Koehn and Knight 02, Haghighi and Klein 08]
72
Task: Lexicon Induction
Source Text
Target Text
nombre
73
Data Representation
state
Source Text
What are we generating?
74
Data Representation
estado
state
Source Text
Target Text
What are we generating?
75
Canonical Correlation Analysis
Target Space
Source Space
76
Canonical Correlation Analysis
(figure: matched points 1, 2, 3 in the source and target spaces; PCA finds directions of maximal variance within each space separately)
77
Canonical Correlation Analysis
(figure: CCA finds directions in the source and target spaces that maximize correlation between matched points 1, 2, 3)
78
Canonical Correlation Analysis
[Bach and Jordan 06]
Canonical Space
(figure: both spaces are mapped into a shared canonical space where matched points 1, 2, 3 align)
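A compact CCA sketch in the whitening-plus-SVD formulation: whiten each view's covariance, take the SVD of the whitened cross-covariance, and map the singular vectors back to the original spaces. The regularizer `reg` and all variable names are assumptions for illustration; this is a generic estimator, not the paper's exact probabilistic CCA.

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-3):
    """First k canonical directions for two views; rows of X and Y are
    matched samples (here: source- and target-word feature vectors).

    Returns (A, B, s): projection matrices for each view and the
    top-k canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # whiten each view via its Cholesky factor, then SVD the
    # whitened cross-covariance
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    K = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(K)
    A = np.linalg.solve(Lx.T, U[:, :k])     # source-space projection
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])  # target-space projection
    return A, B, s[:k]
```

With matched, linearly related views the top canonical correlation approaches 1, which is exactly the "shared canonical space" picture on the slide.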
79
Canonical Correlation Analysis
[Bach and Jordan 06]
Canonical Space
(figure: a new word pair, point 2 in each space, is matched via the canonical space)
80
Generative Model
81
Generative Model
82
Generative Model
estado
state
nombre
world
name
politica
mundo
nation
83
Learning: EM?
  • E-Step: obtain posterior over matchings
  • M-Step: maximize CCA parameters

84
Learning: EM?
(figure: soft posterior over word matchings, with entries such as 0.30, 0.15, 0.10)
85
Inference: Hard EM
Hard E-Step: find the best matching. M-Step: solve CCA.
86
Experimental Setup
  • Data: 2K most frequent nouns, texts from
    Wikipedia
  • Seed: 100 translation pairs
  • Evaluation: precision and recall against a lexicon
    obtained from Wiktionary
  • Report p0.33: precision at recall 0.33
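The p0.33 metric can be sketched as follows, assuming proposed translation pairs are ranked by model confidence and each is marked correct or incorrect against the reference lexicon; the example inputs are invented.

```python
def precision_at_recall(ranked_correct, n_gold, target_recall=0.33):
    """Walk down ranked predictions (True = correct pair) until recall
    reaches the target; report precision at that cutoff."""
    correct = 0
    for i, ok in enumerate(ranked_correct, 1):
        correct += ok
        if correct / n_gold >= target_recall:
            return correct / i
    return 0.0  # target recall never reached

# 6 gold pairs; recall 0.33 is reached at the 2nd correct prediction
precision_at_recall([True, True, False, True], n_gold=6)
# 1.0
```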

87
Feature Experiments
  • Baseline: edit distance

4k EN-ES Wikipedia Articles
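The edit-distance baseline presumably ranks candidate pairs by orthographic distance alone; a standard Levenshtein sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program;
    the baseline scores candidate translations by this alone."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

edit_distance('estado', 'state')
# 3
```

This works well for cognate pairs like estado/state but says nothing about non-cognates, which is the gap the context features address.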
88
Feature Experiments
  • MCCA: only orthographic features

4k EN-ES Wikipedia Articles
89
Feature Experiments
  • MCCA: only context features

4k EN-ES Wikipedia Articles
90
Feature Experiments
  • MCCA: orthographic and context features

4k EN-ES Wikipedia Articles
91
Feature Experiments
(chart: precision vs. recall curves)
92
Feature Experiments
(chart: precision vs. recall curves)
93
Corpus Variation
  • Identical Corpora

(chart: 93.8)
100k EN-ES Europarl Sentences
94
Corpus Variation
  • Comparable Corpora

4k EN-ES Wikipedia Articles
95
Corpus Variation
  • Unrelated Corpora

(chart values: 92, 89, 68)
100k English and Spanish Gigaword
96
Seed Lexicon Source
  • Automatic Seed
  • Edit distance seed [Koehn and Knight 02]

(chart: 92)
4k EN-ES Wikipedia Articles
97
Analysis
98
Analysis
Top Non-Cognates
99
Analysis
Interesting Mistakes
100
Language Variation
101
Language Variation
102
Analysis
103
Summary
  • Learned bilingual lexicon from monotext
  • Matching CCA model
  • Possible even from unaligned corpora
  • Possible for non-related languages
  • High-precision, but much left to do!

104
Conclusion
  • Three cases of unsupervised learning of
    non-trivial linguistic structure for NLP problems
  • Incremental structure learning
  • Careful control of structured training
  • Targeted modeling choices
  • In some cases, unsupervised systems are
    competitive with supervised systems (or better!)
  • Much more left to do!

105
Thank you!
  • nlp.cs.berkeley.edu

107
Outline
  • Latent-Variable Grammar Learning
  • Unsupervised Coreference Resolution
  • Unsupervised Translation Mining
  • Other Unsupervised Work

108
Agreement-Based Learning
109
Weakly Supervised Learning
Newly remodeled 2 Bdrms/1 Bath, spacious upper
unit, located in Hilltop Mall area. Walking
distance to shopping, public transportation,
schools and park. Paid water and garbage. No dogs
allowed.
Prototype Lists

English POS:
NN: president    IN: of
VBD: said        NNS: shares
CC: and          TO: to
NNP: Mr.         PUNC: .
JJ: new          CD: million
DET: the         VBP: are

Information Extraction:
FEATURE: kitchen, laundry
LOCATION: near, close
TERMS: paid, utilities
SIZE: large, feet
RESTRICT: cat, smoking
110
Language Evolution