Unambiguous + Unlimited = Unsupervised, or Using the Web for Natural Language Processing Problems

Presented at PARC, Aug 3, 2006.

1
Unambiguous + Unlimited = Unsupervised, or Using the Web for Natural Language Processing Problems

Marti Hearst
School of Information, UC Berkeley

This research was supported in part by NSF grant DBI-0317510
2
Natural Language Processing

The ultimate goal: write programs that read and understand stories and conversations.
This is too hard! Instead, we tackle sub-problems.
There have been notable successes lately:
Machine translation is vastly improved
Decent speech recognition in limited circumstances
Text categorization works with some accuracy

3
Automatic Help Desk Translation at MS
4
Why is text analysis difficult?

One reason: enormous vocabulary size.
The average English speaker's vocabulary is around 50,000 words.
Many of these can be combined with many others,
And they mean different things when they do!

5
How can a machine understand these?

Decorate the cake with the frosting.
Decorate the cake with the kids.
Throw out the cake with the frosting.
Get the sock from the cat with the gloves.
Get the glove from the cat with the socks.
It's in the plastic water bottle.
It's in the plastic bag dispenser.

6
How to tackle this problem?

The field was stuck for quite some time.
CYC: hand-enter all semantic concepts and relations
A new approach started around 1990
How to do it:
Get large text collections
Compute statistics over the words in those collections
Many different algorithms for doing this.

7
Size Matters

Recent realization: bigger is better than smarter!
Banko and Brill '01: Scaling to Very, Very Large Corpora for Natural Language Disambiguation, ACL

8
Example Problem

Grammar checker example:
Which word to use?
<principal> <principle>
Solution: look at which words surround each use
I am in my third year as the principal of Anamosa High School.
School-principal transfers caused some upset.
This is a simple formulation of the quantum mechanical uncertainty principle.
Power without principle is barren, but principle without power is futile. (Tony Blair)

9
Using Very, Very Large Corpora

Keep track of which words are the neighbors of each spelling in well-edited text, e.g.:
Principal: high school
Principle: rule
At grammar-check time, choose the spelling best predicted by the surrounding words.
Surprising results:
Log-linear improvement even to a billion words!
Getting more data is better than fine-tuning algorithms!
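The neighbor-counting idea above can be sketched in a few lines of Python. This is a minimal illustration, not Banko and Brill's actual learners. The toy CORPUS, the two-word window, and the helper names `context`, `train` and `choose` are all hypothetical. The sketch scores each candidate spelling by how often its recorded neighbors appear around the word being checked.

```python
from collections import Counter, defaultdict

CONFUSION = {"principal", "principle"}

# Toy stand-in for "well-edited text"; a real system would use a huge corpus.
CORPUS = [
    "the principal of the high school spoke",
    "a new high school principal was hired",
    "the uncertainty principle is a rule of physics",
    "power without principle is barren",
]


def context(tokens, i, window=2):
    """Words within `window` positions of tokens[i], excluding tokens[i] itself."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def train(sentences, confusion_set, window=2):
    """Count the neighbors of each confusion-set spelling."""
    neighbors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok in confusion_set:
                neighbors[tok].update(context(tokens, i, window))
    return neighbors


def choose(tokens, i, neighbors, confusion_set, window=2):
    """Pick the spelling whose recorded neighbors best match the context of tokens[i]."""
    ctx = context([t.lower() for t in tokens], i, window)
    return max(sorted(confusion_set), key=lambda s: sum(neighbors[s][w] for w in ctx))


neighbors = train(CORPUS, CONFUSION)
print(choose("our high school principle retired".split(), 3, neighbors, CONFUSION))  # principal
```

With more text, the neighbor counts become more reliable. That is the intuition behind the log-linear gains reported above.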

10
The Effects of LARGE Datasets

From Banko & Brill '01

11
How to Extend this Idea?

This is an exciting result,
BUT it relies on having huge amounts of text that has been appropriately annotated!

12
How to Avoid Labeling?

Web as a baseline (Lapata & Keller '04, '05)
Main idea: apply web-determined counts to every problem imaginable.
Example: for t in {<principal>, <principle>}
Compute f(w1, t, w2)
The largest count wins
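Here is a minimal sketch of the web-as-baseline rule. It assumes a hypothetical `HYPOTHETICAL_HITS` table in place of real search-engine page-hit counts, and the numbers in it are invented. The sketch computes f(w1, t, w2) for each candidate t and keeps the largest.

```python
# Hypothetical page-hit counts standing in for real search-engine queries;
# these numbers are invented for illustration only.
HYPOTHETICAL_HITS = {
    "school principal retired": 90,
    "school principle retired": 2,
    "uncertainty principle states": 700,
    "uncertainty principal states": 3,
}


def f(w1, t, w2):
    """f(w1, t, w2): page-hit count for the exact phrase "w1 t w2" (0 if unseen)."""
    return HYPOTHETICAL_HITS.get(f"{w1} {t} {w2}", 0)


def web_baseline(w1, w2, candidates=("principal", "principle")):
    """Return the candidate t with the largest count f(w1, t, w2)."""
    # max() keeps the first candidate on ties, so unseen contexts default to candidates[0].
    return max(candidates, key=lambda t: f(w1, t, w2))


print(web_baseline("school", "retired"))      # principal
print(web_baseline("uncertainty", "states"))  # principle
```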

13
Web as a Baseline

Works very well in some cases:
machine translation candidate selection
article generation
noun compound interpretation
noun compound bracketing
adjective ordering
But lacking in others:
spelling correction
countability detection
prepositional phrase attachment
How to push this idea further?

Significantly better than the best supervised algorithm.
Not significantly different from the best supervised algorithm.
14
Using Unambiguous Cases

The trick: look for unambiguous cases to start
Use these to improve the results beyond what co-occurrence statistics indicate.
An Early Example:
Hindle and Rooth, Structural Ambiguity and Lexical Relations, ACL '90, Computational Linguistics '93
Problem: prepositional phrase attachment
I eat/v spaghetti/n1 with/p a fork/n2.
I eat/v spaghetti/n1 with/p sauce/n2.
quadruple: (v, n1, p, n2)
Question: does the PP (p n2) attach to v or to n1?

15
Using Unambiguous Cases

How to do this with unlabeled data?
First try:
Parse some text into phrase structure
Then compute certain co-occurrences: f(v, n1, p), f(n1, p), f(v, n1)
Problem: results not accurate enough
The trick: look for unambiguous cases
Spaghetti with sauce is delicious. (pre-verbal)
I eat it with a fork. (the PP can't attach to the pronoun object)
Use these to improve the results beyond what co-occurrence statistics indicate.

16
Using Unambiguous Cases

Hindle & Rooth, final algorithm:
Parse text into phrase structure.
Create bigram counts (v, p) and (n1, p) as follows:
First, use unambiguous cases to populate the bigram table
Then, for the ambiguous cases:
Compute a Lexical Association score comparing (v, n1, p) to (n1, p, n2).
If this is greater than a threshold, update the bigram table with the assumed attachment
Else split the count between both attachments
The bigram table is used for further computations of the Lexical Association score.
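The bootstrapping loop above can be sketched as follows. This is a simplified illustration under stated assumptions, not Hindle and Rooth's exact procedure:
- The Lexical Association score is taken as log2(P(p|v) / P(p|n1)), computed over the (v, p) and (n1, p) bigram tables.
- The add-0.5 smoothing and the threshold of 2.0 are assumptions.
- The class and method names are hypothetical.

```python
import math
from collections import Counter


class LexicalAssociation:
    """Sketch of Hindle & Rooth-style PP attachment learned from unlabeled text."""

    def __init__(self, threshold=2.0):
        self.threshold = threshold  # assumed value, not taken from the slides
        self.pair = {"verb": Counter(), "noun": Counter()}  # f(v, p) and f(n1, p)
        self.head = {"verb": Counter(), "noun": Counter()}  # f(v) and f(n1)

    def add(self, kind, head, p, weight=1.0):
        self.pair[kind][(head, p)] += weight
        self.head[kind][head] += weight

    def prob(self, kind, head, p):
        # P(p | head) with add-0.5 smoothing (a simplification of the paper's scheme).
        return (self.pair[kind][(head, p)] + 0.5) / (self.head[kind][head] + 1.0)

    def score(self, v, n1, p):
        # LA(v, n1, p) = log2(P(p|v) / P(p|n1)); positive values favour verb attachment.
        return math.log2(self.prob("verb", v, p) / self.prob("noun", n1, p))

    def train(self, unambiguous, ambiguous):
        # 1. Unambiguous cases, given as (kind, head, p), populate the bigram tables.
        for kind, head, p in unambiguous:
            self.add(kind, head, p)
        # 2. Ambiguous (v, n1, p) cases update the tables as they are scored.
        for v, n1, p in ambiguous:
            la = self.score(v, n1, p)
            if la > self.threshold:
                self.add("verb", v, p)
            elif la < -self.threshold:
                self.add("noun", n1, p)
            else:  # not confident: split the count between both attachments
                self.add("verb", v, p, 0.5)
                self.add("noun", n1, p, 0.5)

    def attach(self, v, n1, p):
        return "verb" if self.score(v, n1, p) > 0 else "noun"


model = LexicalAssociation()
model.train(
    unambiguous=[("verb", "eat", "with")] * 4 + [("noun", "share", "of")] * 3,
    ambiguous=[("eat", "pasta", "with")],
)
print(model.attach("eat", "pizza", "with"))  # verb
print(model.attach("buy", "share", "of"))    # noun
```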

17
Unambiguous + Unlimited = Unsupervised

Apply the Unambiguous Case Idea to the Very, Very Large Corpora idea
The potential of these approaches is not fully realized
Our work:
Structural Ambiguity Decisions (work with Preslav Nakov)
PP-attachment
Noun compound bracketing
Coordination grouping
Semantic Relation Acquisition
Hypernym (ISA) relations
Verbal relations between nouns

18
Structural Ambiguity Problems

Apply the U U U idea to structural ambiguity
Noun compound bracketing
Prepositional Phrase attachment
Noun Phrase coordination
Motivation: BioText project
In eukaryotes, the key to transcriptional regulation of the Heat Shock Response is the Heat Shock Transcription Factor (HSF).
Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.
BimL protein interact with Bcl-2 or Bcl-XL, or Bcl-w proteins (Immuno-precipitation (anti-Bcl-2 OR Bcl-XL or Bcl-w)) followed by Western blot (anti-EEtag) using extracts human 293T cells co-transfected with EE-tagged BimL and (bcl-2 or bcl-XL or bcl-w) plasmids)

19
Applying U U U to Structural Ambiguity

We introduce the use of (nearly) unambiguous features:
surface features
paraphrases
Combined with very, very large corpora
Achieve state-of-the-art results without labeled examples.

20
Noun Compound Bracketing

(a) liver cell antibody (left bracketing)
(b) liver cell line (right bracketing)
In (a), the antibody targets the liver cell.
In (b), the cell line is derived from the liver.

21
Dependency Model

right bracketing: [w1 [w2 w3]]
w2 w3 is a compound (modified by w1): home health care
w1 and w2 independently modify w3: adult male rat
left bracketing: [[w1 w2] w3]
only one modificational choice possible: law enforcement officer

[Figure: dependency arcs over w1 w2 w3 for each case]
22
Related Work

Marcus (1980), Pustejovsky et al. (1993), Resnik (1993)
adjacency model: Pr(w1|w2) vs. Pr(w2|w3)
Lauer (1995)
dependency model: Pr(w1|w2) vs. Pr(w1|w3)
Keller & Lapata (2004)
use the Web
unigrams and bigrams
Girju et al. (2005)
supervised model
bracketing in context
requires WordNet senses to be given

Our approach:
Web as data
χ², n-grams
paraphrases
surface features
surface features

23
Computing Bigram Statistics

Dependency model, frequencies:
Compare #(w1,w2) to #(w1,w3)
Dependency model, probabilities:
Pr(left) = Pr(w1→w2|w2) × Pr(w2→w3|w3)
Pr(right) = Pr(w1→w3|w3) × Pr(w2→w3|w3)
So we compare Pr(w1→w2|w2) to Pr(w1→w3|w3)

[Figure: left and right dependency structures over w1 w2 w3]
24
Probability Estimation

Using page hits as a proxy for n-gram counts
Pr(w1→w2|w2) = #(w1,w2) / #(w2)
#(w2): word frequency; query for "w2"
#(w1,w2): bigram frequency; query for "w1 w2"
smoothed by 0.5
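Below is a sketch of the dependency model with page-hit estimates. It rests on three assumptions:
- A hypothetical `hits()` lookup over invented counts stands in for real page hits.
- "Smoothed by 0.5" is read as adding 0.5 to each count.
- Ties go to right bracketing, which is an arbitrary choice made for this sketch.

```python
# Invented page-hit counts, for illustration only.
HYPOTHETICAL_HITS = {
    "stem": 1000, "cell": 5000, "brain stem": 300, "brain cell": 100,
    "health": 2000, "care": 4000, "home health": 50, "home care": 800,
}


def hits(query):
    return HYPOTHETICAL_HITS.get(query, 0)


def pr_modifies(modifier, head, smoothing=0.5):
    """Pr(modifier -> head | head) ~ #(modifier, head) / #(head), smoothed."""
    # Assumption: 0.5 is added to each page-hit count before dividing.
    return (hits(f"{modifier} {head}") + smoothing) / (hits(head) + smoothing)


def bracket_dependency(w1, w2, w3):
    """Dependency model: compare Pr(w1 -> w2 | w2) with Pr(w1 -> w3 | w3)."""
    left, right = pr_modifies(w1, w2), pr_modifies(w1, w3)
    return "left" if left > right else "right"  # ties go right (arbitrary here)


print(bracket_dependency("brain", "stem", "cell"))   # left
print(bracket_dependency("home", "health", "care"))  # right
```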

25
Association Models: χ² (Chi Squared)

A = #(wi,wj)
B = #(wi) - #(wi,wj)
C = #(wj) - #(wi,wj)
D = N - (A+B+C)
N = 8 trillion (= A+B+C+D)

8 billion Web pages × 1,000 words
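The slide lists the four cells of a 2×2 contingency table but not the statistic itself. The sketch below uses the standard 2×2 chi-squared formula, χ² = N(AD − BC)² / ((A+C)(B+D)(A+B)(C+D)). The function name and argument order are illustrative, and N defaults to the slide's 8 trillion estimate.

```python
def chi_squared(n_ij, n_i, n_j, n_total=8e12):
    """Chi-squared association of wi and wj from page-hit counts.

    n_ij = #(wi, wj), n_i = #(wi), n_j = #(wj), n_total = N.
    """
    a = n_ij
    b = n_i - n_ij  # wi without wj
    c = n_j - n_ij  # wj without wi
    d = n_total - (a + b + c)
    numerator = n_total * (a * d - b * c) ** 2
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator if denominator else 0.0


# Small worked example: A=10, B=20, C=30, D=40, N=100.
print(round(chi_squared(10, 30, 40, 100), 4))  # 0.7937
```

For dependency-model bracketing, one would compare the χ² of (w1, w2) against that of (w1, w3) and predict left when the first is larger.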
26
Web-derived Surface Features

Authors often disambiguate noun compounds using surface markers, e.g.:
amino-acid sequence → left
brain stem's cell → left
brain's stem cell → right
The enormous size of the Web makes these frequent enough to be useful.

27
Web-derived Surface Features: Dash (hyphen)

Left dash:
cell-cycle analysis → left
Right dash:
donor T-cell → right
fiber optics-system → should be left...
Double dash:
T-cell-depletion → unusable

28
Web-derived Surface Features: Possessive Marker

Attached to the first word:
brain's stem cell → right
Attached to the second word:
brain stem's cell → left
Combined features:
brain's stem-cell → right

29
Web-derived Surface Features: Capitalization

don't-care, lowercase, uppercase:
Plasmodium vivax Malaria → left
plasmodium vivax Malaria → left
lowercase, uppercase, don't-care:
brain Stem cell → right
brain Stem Cell → right
Disable this on:
Roman digits
Single-letter words, e.g. vitamin D deficiency

30
Web-derived Surface Features: Embedded Slash

Left embedded slash:
leukemia/lymphoma cell → right

31
Web-derived Surface Features: Parentheses

Single-word:
growth factor (beta) → left
(brain) stem cell → right
Two-word:
(growth factor) beta → left
brain (stem cell) → right

32
Web-derived Surface Features: Comma, dot, semi-colon

Following the first word:
home. health care → right
adult, male rat → right
Following the second word:
health care, provider → left
lung cancer: patients → left

33
Web-derived Surface Features: Dash to External Word

External word to the left:
mouse-brain stem cell → right
External word to the right:
tumor necrosis factor-alpha → left

34
Web-derived Surface Features: Problems & Solutions

Problem: search engines ignore punctuation in queries
"brain-stem cell" does not work
Solution:
query for "brain stem cell"
obtain 1,000 document summaries
scan for the features in these summaries
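Here is a sketch of the snippet-scanning step. It covers only a few of the surface features listed above: dashes, possessive markers, and comma/dot/semicolon. The regular expressions and the simple left-versus-right vote are assumptions made for illustration. They are not the exact feature set or combination rule used in this work.

```python
import re
from collections import Counter


def surface_votes(snippets, w1, w2, w3):
    """Scan result snippets for a few surface features of "w1 w2 w3"."""
    w1, w2, w3 = (re.escape(w) for w in (w1, w2, w3))
    features = [
        (rf"\b{w1}-{w2}\s+{w3}\b", "left"),          # left dash: brain-stem cell
        (rf"\b{w1}\s+{w2}-{w3}\b", "right"),         # right dash: brain stem-cell
        (rf"\b{w1}'s\s+{w2}\s+{w3}\b", "right"),     # possessive on w1: brain's stem cell
        (rf"\b{w1}\s+{w2}'s\s+{w3}\b", "left"),      # possessive on w2: brain stem's cell
        (rf"\b{w1}[,.;]\s+{w2}\s+{w3}\b", "right"),  # comma/dot/semicolon after w1
        (rf"\b{w1}\s+{w2}[,.;]\s+{w3}\b", "left"),   # comma/dot/semicolon after w2
    ]
    votes = Counter()
    for snippet in snippets:
        for pattern, label in features:
            votes[label] += len(re.findall(pattern, snippet, flags=re.IGNORECASE))
    if votes["left"] == votes["right"]:
        return None, votes  # no evidence either way
    return ("left" if votes["left"] > votes["right"] else "right"), votes


pred, votes = surface_votes(
    ["Lesions of the brain-stem cell groups", "in the brain stem's cell bodies",
     "brain's stem cell research"],
    "brain", "stem", "cell",
)
print(pred)  # left
```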

35
Other Web-derived Features: Possessive Marker

We can also query directly for possessives.
Yes, "brain stem's cell" sort of works.
Search engines:
drop the apostrophe
but the s is kept
Still, we cannot query for "brain stems' cell"

36
Other Web-derived Features: Abbreviation

After the second word:
tumor necrosis (TN) factor → left
After the third word:
tumor necrosis factor (NF) → right
We query for, e.g., "tumor necrosis tn factor"
Problems:
Roman digits: IV, VI
States: CA
Short words: me

37
Other Web-derived Features: Concatenation

Consider "health care reform":
healthcare: 79,500,000
carereform: 269
healthreform: 812
Adjacency model:
healthcare vs. carereform
Dependency model:
healthcare vs. healthreform
Triples:
"healthcare reform" vs. "health carereform"
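The concatenation comparisons can be written directly against the counts given on the slide. The function names are hypothetical. Ties go to right bracketing only as an arbitrary choice of this sketch.

```python
# Page-hit counts for the "health care reform" example, as listed on the slide.
HITS = {"healthcare": 79_500_000, "carereform": 269, "healthreform": 812}


def concat_adjacency(w1, w2, w3, hits):
    """Adjacency: is w1w2 or w2w3 more often written as a single word?"""
    return "left" if hits.get(w1 + w2, 0) > hits.get(w2 + w3, 0) else "right"


def concat_dependency(w1, w2, w3, hits):
    """Dependency: does w1 concatenate more often with w2 or with w3?"""
    return "left" if hits.get(w1 + w2, 0) > hits.get(w1 + w3, 0) else "right"


print(concat_adjacency("health", "care", "reform", HITS))   # left
print(concat_dependency("health", "care", "reform", HITS))  # left
```

With these counts, both the adjacency and the dependency comparison favor left bracketing, [health care] reform.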

38
Other Web-derived Features: Using Google's *

Each * allows a one-word wildcard
Single star:
"health care * reform" → left
"health * care reform" → right
More stars and/or reverse order:
"care reform * * health" → right

39
Other Web-derived Features: Reorder

Reorders for "health care reform":
"care reform health" → right
"reform health care" → left

40
Other Web-derived Features: Internal Inflection Variability

Vary inflection of second word:
tyrosine kinase activation
tyrosine kinases activation

41
Other Web-derived Features: Switch the First Two Words

Predict right if we can reorder
adult male rat as
male adult rat

42
Paraphrases

The semantics of a noun compound is often made overt by a paraphrase (Warren, 1978):
Prepositional:
stem cells in the brain → right
cells from the brain stem → left
Verbal:
virus causing human immunodeficiency → left
Copula:
office building that is a skyscraper → right

43
Paraphrases

Lauer (1995), Keller & Lapata (2003), and Girju et al. (2005) predict NC semantics by choosing the most likely preposition:
of, for, in, at, on, from, with, about, (like)
This could be problematic when more than one preposition is possible.
In contrast:
we try to predict syntax, not semantics
we do not disambiguate, just add up all counts
cells in (the) bone marrow → left
cells from (the) bone marrow → left

44
Paraphrases

prepositional paraphrases:
We use 150 prepositions
verbal paraphrases:
We use associated with, caused by, contained in, derived from, focusing on, found in, involved in, located at/in, made of, performed by, preventing, related to and used by/in/for.
copula paraphrases:
We use is/was and that/which/who
optional elements:
articles: a, an, the
quantifiers: some, every, etc.
pronouns: this, these, etc.
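Below is a sketch of generating noun-compound paraphrase queries and adding up their counts per bracketing, as the previous slide describes. It uses only small subsets of the preposition, verb, copula and optional-article lists. The label assigned to each template follows the examples on these slides. The counts in the usage lines are invented, and all names are hypothetical.

```python
# Small illustrative subsets of the lists on the slide (the slide uses 150 prepositions).
PREPOSITIONS = ["in", "from", "of", "for", "on"]
VERBS = ["caused by", "derived from", "found in"]
COPULAS = ["that is", "which is"]
ARTICLES = ["", "the "]  # optional article before the moved noun(s)


def paraphrase_queries(w1, w2, w3):
    """Return (query, bracketing) pairs for the noun compound "w1 w2 w3"."""
    queries = []
    for art in ARTICLES:
        for link in PREPOSITIONS + VERBS:
            # "cell from the brain stem": w1 w2 stays together -> left
            queries.append((f"{w3} {link} {art}{w1} {w2}", "left"))
            # "stem cell in the brain": w2 w3 stays together -> right
            queries.append((f"{w2} {w3} {link} {art}{w1}", "right"))
        for copula in COPULAS:
            # "office building that is (a) skyscraper" -> right
            queries.append((f"{w2} {w3} {copula} {art}{w1}", "right"))
    return queries


def paraphrase_bracketing(w1, w2, w3, hits):
    """Add up all counts per bracketing (no disambiguation of the preposition)."""
    totals = {"left": 0, "right": 0}
    for query, label in paraphrase_queries(w1, w2, w3):
        totals[label] += hits(query)
    if totals["left"] == totals["right"]:
        return None
    return "left" if totals["left"] > totals["right"] else "right"


COUNTS = {"cell from the brain stem": 40, "stem cell in the brain": 25}  # invented
print(paraphrase_bracketing("brain", "stem", "cell", lambda q: COUNTS.get(q, 0)))  # left
```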

45
Evaluation Datasets

Lauer Set:
244 noun compounds (NCs)
from Grolier's encyclopedia
inter-annotator agreement: 81.5%
Biomedical Set:
430 NCs
from MEDLINE
inter-annotator agreement: 88% (κ = .606)

46
Evaluation Experiments

Exact phrase queries
Limited to English
Inflections:
Lauer Set: Carroll's morphological tools
Biomedical Set: UMLS Specialist Lexicon

47
Co-occurrence Statistics

Lauer set

Bio set

48
Paraphrase and Surface Features Performance

Lauer Set
Biomedical Set

49
Individual Surface Features Performance: Bio
50
Individual Surface Features Performance: Bio
51
Results: Lauer
52
Results: Comparing with Others
53
Results: Bio
Summary: Results for Noun Compound Bracketing

Introduced search engine statistics that go beyond the n-gram (applicable to other tasks):
surface features
paraphrases
Obtained new state-of-the-art results on NC bracketing:
more robust than Lauer (1995)
more accurate than Keller & Lapata (2004)

55
Prepositional Phrase Attachment

(a) Peter spent millions of dollars. (noun attach)
(b) Peter spent time with his family. (verb attach)
quadruple: (v, n1, p, n2)
(a) (spent, millions, of, dollars)
(b) (spent, time, with, family)

56
Related Work

Supervised:
(Brill & Resnik, 94): transformation-based learning, WordNet classes, P=82%
(Ratnaparkhi et al., 94): ME, word classes (MI), P=81.6%
(Collins & Brooks, 95): back-off, P=84.5%
(Stetina & Nagao, 97): decision trees, WordNet, P=88.1%
(Toutanova et al., 04): morphology, syntax, WordNet, P=87.5%

Unsupervised:
(Hindle & Rooth, 93): partially parsed corpus, lexical associations over subsets of (v,n1,p), P=80%, R=80%
(Ratnaparkhi, 98): POS tagged corpus, unambiguous cases for (v,n1,p), (n1,p,n2), classifier: P=81.9%
(Pantel & Lin, 00): collocation database, dependency parser, large corpus (125M words), P=84.3%

Unsupervised state of the art: (Pantel & Lin, 00)
57
PP-attachment: Our Approach

Unsupervised
(v,n1,p,n2) quadruples, Ratnaparkhi test set
Google and MSN Search
Exact phrase queries
Inflections: WordNet 2.0
Adding determiners where appropriate
Models:
n-gram association models
Web-derived surface features
paraphrases

58
N-gram models

(i) Pr(p|n1) vs. Pr(p|v)
(ii) Pr(p,n2|n1) vs. Pr(p,n2|v)
I eat/v spaghetti/n1 with/p a fork/n2.
I eat/v spaghetti/n1 with/p sauce/n2.
Pr or # (frequency)
smoothing as in (Hindle & Rooth, 93)
back-off from (ii) to (i)
N-grams are unreliable if n1 or n2 is a pronoun.
MSN Search: no rounding of n-gram estimates
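Here is a sketch of the two n-gram models with back-off from (ii) to (i). It assumes a hypothetical table of invented page-hit counts. It uses plain relative frequencies instead of the Hindle and Rooth smoothing mentioned above. For simplicity, determiners are omitted from the queries.

```python
# Invented page-hit counts, for illustration only.
HITS = {
    "eat": 1000, "spaghetti": 200,
    "eat with": 100, "spaghetti with": 60,
    "eat with fork": 30, "spaghetti with fork": 1,
    "eat with sauce": 5, "spaghetti with sauce": 40,
}


def cond(head, *rest):
    """Pr(rest | head) estimated as #("head rest...") / #(head)."""
    denom = HITS.get(head, 0)
    return HITS.get(" ".join((head,) + rest), 0) / denom if denom else 0.0


def pp_attach(v, n1, p, n2):
    # Model (ii): Pr(p, n2 | n1) vs. Pr(p, n2 | v)
    noun, verb = cond(n1, p, n2), cond(v, p, n2)
    if noun == verb:
        # Back off to model (i): Pr(p | n1) vs. Pr(p | v)
        noun, verb = cond(n1, p), cond(v, p)
    if noun == verb:
        return None  # still undecided
    return "noun" if noun > verb else "verb"


print(pp_attach("eat", "spaghetti", "with", "fork"))        # verb
print(pp_attach("eat", "spaghetti", "with", "sauce"))       # noun
print(pp_attach("eat", "spaghetti", "with", "chopsticks"))  # noun (backed off to model (i))
```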

59
Web-derived Surface Features
Example features, with (P, R) in percent:

open the door / with a key → verb (100.00, 0.13)
open the door (with a key) → verb (73.58, 2.44)
open the door with a key? verb (68.18, 2.03)
open the door , with a key → verb (58.44, 7.09)
eat Spaghetti with sauce → noun (100.00, 0.14)
eat ? spaghetti with sauce? noun (83.33, 0.55)
eat , spaghetti with sauce → noun (65.77, 5.11)
eat spaghetti with sauce ? noun (64.71, 1.57)
Summing achieves high precision, low recall.

[Diagram: sum the verb-feature counts, sum the noun-feature counts, compare]
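Here is a sketch of the sum-and-compare step for a handful of the surface features above. The regular expressions, the optional-determiner handling, and the function name are assumptions. Most snippets match no feature, so the function often abstains. This illustrates why precision is high but recall is low.

```python
import re


def pp_surface_vote(snippets, v, n1, p, n2):
    """Sum matches of verb-indicating and noun-indicating features, then compare."""
    v, n1, p, n2 = (re.escape(w) for w in (v, n1, p, n2))
    det = r"(?:(?:the|a|an)\s+)?"  # optional determiner
    features = [
        (rf"\b{v}\s+{det}{n1}\s*/\s*{p}\b", "verb"),   # open the door / with a key
        (rf"\b{v}\s+{det}{n1}\s*\(\s*{p}\b", "verb"),  # open the door (with a key)
        (rf"\b{v}\s+{det}{n1}\s*,\s*{p}\b", "verb"),   # open the door , with a key
        (rf"\b{v}\s*,\s*{det}{n1}\s+{p}\s+{det}{n2}\b", "noun"),  # eat , spaghetti with sauce
    ]
    totals = {"verb": 0, "noun": 0}
    for snippet in snippets:
        for pattern, label in features:
            totals[label] += len(re.findall(pattern, snippet, flags=re.IGNORECASE))
    if totals["verb"] == totals["noun"]:
        return None  # usually no feature fires at all: hence the low recall
    return "verb" if totals["verb"] > totals["noun"] else "noun"


print(pp_surface_vote(["Use it to open the door / with a key."], "open", "door", "with", "key"))  # verb
```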
60
Paraphrases

v n1 p n2 →
v n2 n1 (noun)
v p n2 n1 (verb)
p n2 * v n1 (verb)
n1 p n2 v (noun)
v PRONOUN p n2 (verb)
BE n1 p n2 (noun)
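Here is a sketch that instantiates the six paraphrase patterns for one quadruple and sums their counts per attachment. It simplifies in several ways:
- It omits the determiner constraints described later for patterns (1) and (2).
- It uses only "him" for PRONOUN and "is" for BE.
- It scores invented counts.
All names are hypothetical.

```python
def pp_paraphrases(v, n1, p, n2):
    """The six paraphrase patterns for (v, n1, p, n2), each with the attachment it supports."""
    return [
        (f"{v} {n2} {n1}", "noun"),        # v n2 n1
        (f"{v} {p} {n2} {n1}", "verb"),    # v p n2 n1
        (f"{p} {n2} * {v} {n1}", "verb"),  # p n2 * v n1 (* is a search-engine wildcard)
        (f"{n1} {p} {n2} {v}", "noun"),    # n1 p n2 v
        (f"{v} him {p} {n2}", "verb"),     # v PRONOUN p n2 (could also try "her")
        (f"is {n1} {p} {n2}", "noun"),     # BE n1 p n2 (could also try "are")
    ]


def paraphrase_vote(v, n1, p, n2, hits):
    """Sum the counts of all paraphrases per attachment and compare."""
    totals = {"verb": 0, "noun": 0}
    for query, label in pp_paraphrases(v, n1, p, n2):
        totals[label] += hits(query)
    if totals["verb"] == totals["noun"]:
        return None
    return "verb" if totals["verb"] > totals["noun"] else "noun"


COUNTS = {"is spaghetti with sauce": 12, "eat with sauce spaghetti": 1}  # invented
print(paraphrase_vote("eat", "spaghetti", "with", "sauce", lambda q: COUNTS.get(q, 0)))  # noun
```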

61
Evaluation

Ratnaparkhi dataset:
3097 test examples, e.g.
prepare dinner for family: V
shipped crabs from province: V
n1 or n2 is a bare determiner: 149 examples
problem for unsupervised methods
left chairmanship of the: N
is the of kind: N
acquire securities for an: N
special symbols (%, /, etc.): 230 examples
problem for Web queries
buy % for 10: V
beat S&P-down from %: V
is 43%-owned by firm: N

62
Results
For prepositions other than OF (of → noun attachment).
Models in bold are combined in a majority vote.
Simpler but not significantly different from 84.3% (Pantel & Lin, 00).
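Here is a sketch of combining model outputs by majority vote, with "of" always mapped to noun attachment. The slide does not specify which models are combined, how abstentions are treated, or how ties are broken. The choices below are assumptions of this sketch.

```python
from collections import Counter


def majority_vote(p, predictions, default="noun"):
    """Combine per-model attachment predictions ("verb", "noun" or None for abstain)."""
    if p.lower() == "of":
        return "noun"  # "of" is treated as always attaching to the noun
    votes = Counter(x for x in predictions if x is not None)  # ignore abstentions
    if not votes or votes["verb"] == votes["noun"]:
        return default  # assumed tie-breaking rule
    return votes.most_common(1)[0][0]


print(majority_vote("with", ["verb", "noun", "verb", None]))  # verb
```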
63
Noun Phrase Coordination

(Modified) real sentence:
The Department of Chronic Diseases and Health Promotion leads and strengthens global efforts to prevent and control chronic diseases or disabilities and to promote health and quality of life.

64
NC coordination: ellipsis

Ellipsis:
car and truck production
means car production and truck production
No ellipsis:
president and chief executive
All-way coordination:
Securities and Exchange Commission

65
NC Coordination: ellipsis

Quadruple (n1,c,n2,h)
Penn Treebank annotations:
ellipsis:
(NP car/NN and/CC truck/NN production/NN).
no ellipsis:
(NP (NP president/NN) and/CC (NP chief/NN executive/NN))
all-way: can be annotated either way
This is a problem a parser must deal with.

Collins' parser always predicts ellipsis, but other parsers (e.g. Charniak's) try to solve it.
66
Results: 428 examples from Penn TB
67
Semantic Relation Detection

Goal: automatically augment a lexical database
Many potential relation types:
ISA (hypernymy/hyponymy)
Part-Of (meronymy)
Idea: find unambiguous contexts which (nearly) always indicate the relation of interest

68
Lexico-Syntactic Patterns
69
Lexico-Syntactic Patterns
70
Adding a New Relation
71
Semantic Relation Detection

Lexico-syntactic patterns:
Should occur frequently in text
Should (nearly) always suggest the relation of interest
Should be recognizable with little pre-encoded knowledge.
These patterns have been used extensively by other researchers.
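The slides with the actual pattern list did not survive extraction. As an illustration only, here is a sketch of one widely known lexico-syntactic pattern for ISA relations, "NP0 such as NP1, NP2 ... and/or NPn" (Hearst, 1992). The sketch handles only single-word hypernyms and assumes the list runs to the end of the clause. The function and regex names are hypothetical.

```python
import re

# "NP0 such as NP1, NP2, ... and/or NPn", with NP0 a single word.
# Simplification: the list is assumed to run to the end of the clause.
SUCH_AS = re.compile(r"\b(\w+),?\s+such\s+as\s+([\w\s,]+)", re.IGNORECASE)
SEPARATOR = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+", re.IGNORECASE)


def such_as_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs suggested by the "such as" pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1).lower()
        for item in SEPARATOR.split(match.group(2).strip()):
            if item:
                pairs.append((item.lower(), hypernym))
    return pairs


print(such_as_hyponyms("He plays instruments such as violins, cellos, and flutes."))
# [('violins', 'instruments'), ('cellos', 'instruments'), ('flutes', 'instruments')]
```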

72
Semantic Relation Detection

What relationship holds between two nouns?
olive oil: oil comes from olives
machine oil: oil used on machines
Assigning the meaning relations between these terms has been seen as a very difficult problem
Our solution:
Use clever queries against the web to figure out the relations.

73
Queries for Semantic Relations

Convert the noun-noun compound into a query of the form:
noun2 that * noun1
oil that * olive(s)
This returns search result snippets containing interesting verbs.
In this case:
Come from
Be obtained from
Be extracted from
Made from
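Here is a sketch of pulling relation phrases out of snippets returned for a query like "oil that * olive(s)". The regular expression, the three-word limit, and the snippets are assumptions for illustration. Unlike the slide's list, the phrases are not lemmatized (e.g. "comes from" rather than "come from").

```python
import re
from collections import Counter


def relation_phrases(snippets, noun1, noun2, max_words=3):
    """Count the 1..max_words words between "noun2 that" and noun1 in snippets."""
    pattern = re.compile(
        rf"\b{re.escape(noun2)}\s+that\s+((?:\w+\s+){{1,{max_words}}}?){re.escape(noun1)}s?\b",
        re.IGNORECASE,
    )
    found = Counter()
    for snippet in snippets:
        for match in pattern.finditer(snippet):
            found[match.group(1).strip().lower()] += 1
    return found


snippets = [  # invented snippets for the query "oil that * olive(s)"
    "This is an oil that comes from olives.",
    "an oil that is extracted from olives by pressing",
    "oil that is made from olive pits",
]
print(relation_phrases(snippets, "olive", "oil"))
```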

74
Queries for Semantic Relations

More examples:
Migraine drug → treat, be used for, reduce, prevent
Wrinkle drug → treat, be used for, reduce, smooth
Printer tray → hold, come with, be folded, fit under, be inserted into
Student protest → be led by, be sponsored by, pit, be, be organized by

75
Conclusions

The enormous size of the web opens new opportunities for text analysis
There are many words, but they are more likely to appear together in a huge dataset
This allows us to do word-specific analysis
Unambiguous + Unlimited = Unsupervised
We've applied it to structural and semantic language problems.
These are stepping stones towards sophisticated language understanding.

76
Conclusions

Tapping the potential of very large corpora for unsupervised algorithms
Go beyond n-grams
Surface features
Paraphrases
Results competitive with best unsupervised
Results can rival supervised algorithms
Future Work:
Unambiguous + Unlimited = Unsupervised
How to extend to other problems?

77
Thank you!

http://biotext.berkeley.edu
Supported in part by NSF DBI-0317510

78
What about Search?

Web search currently does not use very much language analysis.
Queries are very short (2.1 words on average), so most queries match many pages
Improvements in ranking make use of the massive size of the web:
Anchor text (words on links pointing to pages)
Which hits users clicked on (starting to use this)
As well as the structure of language:
Where query terms occur (title, etc.)
How close together query words occur

79
Using n-grams to make predictions

Say we are trying to distinguish:
[home health] care
home [health care]
Main idea: compare these co-occurrence probabilities:
"home health" vs.
"health care"

80
Using n-grams to make predictions

Use search engines' page hits as a proxy for n-gram counts
compare Pr(w1→w2|w2) to Pr(w1→w3|w3)
Pr(w1→w2|w2) = #(w1,w2) / #(w2)
#(w2): word frequency; query for "w2"
#(w1,w2): bigram frequency; query for "w1 w2"

81
Probabilities: Why? (1)

Why should we use
(a) Pr(w1→w2|w2), rather than
(b) Pr(w2→w1|w1)?
Keller & Lapata (2004) calculate:
AltaVista queries:
(a) 70.49%
(b) 68.85%
British National Corpus:
(a) 63.11%
(b) 65.57%

82
Probabilities: Why? (2)

Why should we use
(a) Pr(w1→w2|w2), rather than
(b) Pr(w2→w1|w1)?
Maybe to introduce a bracketing prior,
just like Lauer (1995) did.
But otherwise, no reason to prefer either one.
Do we need probabilities? (association is OK)
Do we need a directed model? (symmetry is OK)

83
Adjacency & Dependency (2)

right bracketing: [w1 [w2 w3]]
either w2 w3 is a compound (modified by w1)
or w1 and w2 independently modify w3
adjacency model:
Is w2 w3 a compound?
(vs. w1 w2 being a compound)
dependency model:
Does w1 modify w3?
(vs. w1 modifying w2)

[Figure: dependency arcs over w1 w2 w3]
84
Paraphrase pattern (1)

v n1 p n2 → v n2 n1 (noun)
Can we turn "n1 p n2" into a noun compound "n2 n1"?
meet/v demands/n1 from/p customers/n2 →
meet/v the customer/n2 demands/n1
Problem: ditransitive verbs like give
gave/v an apple/n1 to/p him/n2 →
gave/v him/n2 an apple/n1
Solution:
no determiner before n1
determiner before n2 is required
the preposition cannot be "to"

85
Paraphrase pattern (2)

v n1 p n2 → v p n2 n1 (verb)
If "p n2" is an indirect object of v, then it could be switched with the direct object n1.
had/v a program/n1 in/p place/n2 →
had/v in/p place/n2 a program/n1

Determiner before n1 is required to prevent "n2 n1" from forming a noun compound.
86
Paraphrase pattern (3)

v n1 p n2 → p n2 * v n1 (verb)
* indicates a wildcard position (up to three intervening words are allowed)
Looks for appositions, where the PP has moved in front of the verb, e.g.
I gave/v an apple/n1 to/p him/n2 →
to/p him/n2 I gave/v an apple/n1

87
Paraphrase pattern (4)

v n1 p n2 → n1 p n2 v (noun)
Looks for appositions, where "n1 p n2" has moved in front of v
shaken/v confidence/n1 in/p markets/n2 →
confidence/n1 in/p markets/n2 shaken/v

88
Paraphrase pattern (5)

v n1 p n2 → v PRONOUN p n2 (verb)
n1 is a pronoun → verb (Hindle & Rooth, 93)
Pattern (5) substitutes n1 with a dative pronoun (him or her), e.g.
put/v a client/n1 at/p odds/n2 →
put/v him at/p odds/n2
89
Paraphrase pattern (6)

v n1 p n2 → BE n1 p n2 (noun)
BE is typically used with a noun attachment
Pattern (6) substitutes v with a form of "to be" (is or are), e.g.
eat/v spaghetti/n1 with/p sauce/n2 →
is spaghetti/n1 with/p sauce/n2
90
Related Work

(Resnik, 99): similarity of form and meaning, conceptual association, decision tree, P=80%, R=100%
(Rus et al., 02): deterministic, rule-based bracketing in context, P=87.42%, R=71.05%
(Chantree et al., 05): distributional similarities from BNC, Sketch Engine (freqs., object/modifier, etc.), P=80.3%, R=53.8%

91
N-gram models