SIMS 290-2: Applied Natural Language Processing - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
SIMS 290-2: Applied Natural Language Processing
Marti Hearst, October 13, 2004
2
Today
  • Finish hand-built rule systems
  • Machine Learning approaches to information
    extraction
  • Sliding Windows
  • Rule-learners (older)
  • Feature-based ML (more recent)
  • IE tools

3
Two kinds of NE approaches
  • Knowledge Engineering
  • rule-based
  • developed by experienced language engineers
  • makes use of human intuition
  • requires only a small amount of training data
  • development can be very time-consuming
  • some changes may be hard to accommodate
  • Learning Systems
  • use statistics or other machine learning
  • developers do not need LE expertise
  • require large amounts of annotated training data
  • some changes may require re-annotation of the
    entire training corpus
  • annotators are cheap (but you get what you pay
    for!)

4
Baseline list lookup approach
  • System that recognises only entities stored in
    its lists (gazetteers); a minimal sketch follows
    this list.
  • Advantages: simple, fast, language independent,
    easy to retarget (just create lists)
  • Disadvantages: impossible to enumerate all
    names; lists must be collected and maintained;
    cannot deal with name variants; cannot resolve
    ambiguity
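
The lookup idea is simple enough to sketch. Below is a minimal,
illustrative longest-match scanner in Python; the gazetteer data
structure and all names are assumptions for illustration, not taken
from any particular system.

    # Minimal sketch of the list-lookup baseline: longest-match scan of
    # a token sequence against a gazetteer of known entity names.
    def lookup_entities(tokens, gazetteer):
        """gazetteer maps name tuples to a type, e.g.
        ("New", "York") -> "LOC". Returns (start, end, type) spans,
        preferring the longest match at each position."""
        max_len = max((len(name) for name in gazetteer), default=0)
        spans, i = [], 0
        while i < len(tokens):
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                name = tuple(tokens[i:i + n])
                if name in gazetteer:
                    spans.append((i, i + n, gazetteer[name]))
                    i += n
                    break
            else:
                i += 1  # no listed entity starts here
        return spans

The disadvantages above fall out directly: any name, variant, or
sense not in the lists is simply missed or mislabeled.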

5
Creating Gazetteer Lists
  • Online phone directories and yellow pages for
    person and organisation names (e.g.,
    [Paskaleva 02])
  • Location lists:
  • US GEOnet Names Server (GNS) data: 3.9 million
    locations with 5.37 million names (e.g.,
    [Manov 03])
  • UN site: http://unstats.un.org/unsd/citydata
  • Global Discovery database from Europa
    Technologies Ltd, UK (e.g., [Ignat 03])
  • Automatic collection from annotated training data

6
Rule-based Example: FACILE
  • FACILE: used in MUC-7 [Black et al. 98]
  • Uses Inxight's LinguistiX tools for tagging and
    morphological analysis
  • Database for external information; role similar
    to a gazetteer
  • Linguistic info per token, encoded as a feature
    vector:
  • Text offsets
  • Orthographic pattern (first/all capitals, mixed,
    lowercase)
  • Token and its normalised form
  • Syntax category and features
  • Semantics from database or morphological
    analysis
  • Morphological analyses
  • Example: (1192 1196 10 T C "Mrs." "mrs." (PROP
    TITLE) (^PER_CIV_F) (("Mrs." "Title" "Abbr"))
    NIL), where PER_CIV_F = female civilian
    (from the database)

7
FACILE
  • Context-sensitive rules written in a special rule
    notation, executed by an interpreter
  • Writing rules in Perl is too error-prone and hard
  • Rules of the form A => B \ C / D, where:
  • A is a set of attribute-value expressions with an
    optional score; the attributes refer to elements
    of the input token feature vector
  • B and D are the left and right context
    respectively, and can be empty
  • B, C, D are sequences of attribute-value pairs
    and Kleene regular-expression operators;
    variables are also supported
  • Example: [syn=NP, sem=ORG] (0.9) =>
    \ [norm="university"], [token="of"],
    [sem=REGION|COUNTRY|CITY] /

8
FACILE
  • Rule for the mark-up of person names when the
    first name is not present or known from the
    gazetteers, e.g. 'Mr J. Cass'
    (_F, _I, _M, _S are variables that transfer
    info from the RHS):

    [SYN=PROP, SEM=PER, FIRST=_F, INITIALS=_I,
     MIDDLE=_M, LAST=_S]
    =>
    [SEM=TITLE_MIL|TITLE_FEMALE|TITLE_MALE]
    \ [SYN=NAME, ORTH=I|O, TOKEN=_I]?,
      [ORTH=C|A, SYN=PROP, TOKEN=_F]?,
      [SYN=NAME, ORTH=I|O, TOKEN=_I]?,
      [SYN=NAME, TOKEN=_M]?,
      [ORTH=C|A|O, SYN=PROP, TOKEN=_S, SOURCE!=RULE]
      (proper name, not recognised by a rule)
    /

9
FACILE
  • Preference mechanism:
  • The rule with the highest score is preferred
  • Longer matches are preferred to shorter matches
  • Results are always one semantic categorisation of
    the named entity in the text
  • Evaluation (MUC-7 scores):
  • Organization: 86% precision, 66% recall
  • Person: 90% precision, 88% recall
  • Location: 81% precision, 80% recall
  • Dates: 93% precision, 86% recall

10
Extraction by Sliding Window
11
Extraction by Sliding Window
GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell, School of Computer
Science, Carnegie Mellon University
3:30 pm, 7500 Wean Hall

Machine learning has evolved from obscurity
in the 1970s into a vibrant and popular
discipline in artificial intelligence during the
1980s and 1990s. As a result of its success and
growth, machine learning is evolving into a
collection of related disciplines: inductive
concept acquisition, analytic learning in problem
solving (e.g. analogy, explanation-based
learning), learning theory (e.g. PAC learning),
genetic algorithms, connectionist learning,
hybrid systems, and so on.
CMU UseNet Seminar Announcement
12-14
Extraction by Sliding Window
(Slides 12-14 repeat the seminar announcement from
slide 11 while the extraction window slides across
the text.)
CMU UseNet Seminar Announcement
15
A Naïve Bayes Sliding Window Model
[Freitag 1997]

...00 pm  Place: Wean Hall Rm 5409
Speaker: Sebastian Thrun

    w_{t-m} ... w_{t-1}  [ w_t ... w_{t+n} ]  w_{t+n+1} ... w_{t+n+m}
    ------ prefix ------  ----- contents ----  ------- suffix -------

Estimate Pr(LOCATION | window) using Bayes'
rule. Try all reasonable windows (vary length,
position). Assume independence for length, prefix
words, suffix words, content words. Estimate from
data quantities like Pr("Place" in
prefix | LOCATION).
If Pr("Wean Hall Rm 5409" = LOCATION) is above
some threshold, extract it.
Other examples of sliding windows: [Baluja et al.
2000] (decision tree over individual words and
their context)
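
To make the model concrete, here is a minimal sketch of the scoring
step in Python. The model dictionary of per-field probability tables
is a hypothetical stand-in; this illustrates the independence
assumptions above, not Freitag's actual implementation.

    import math

    def score_window(tokens, start, end, model, m=2):
        """Log Pr(LOCATION | window), up to a constant, assuming
        independence of length, prefix, contents, and suffix."""
        prefix = tokens[max(0, start - m):start]
        contents = tokens[start:end]
        suffix = tokens[end:end + m]
        score = math.log(model["p_len"].get(end - start, 1e-6))
        for w in prefix:
            score += math.log(model["p_prefix"].get(w, 1e-6))
        for w in contents:
            score += math.log(model["p_content"].get(w, 1e-6))
        for w in suffix:
            score += math.log(model["p_suffix"].get(w, 1e-6))
        return score

    def extract(tokens, model, max_len=5, threshold=-50.0):
        """Try all reasonable windows; keep those above a threshold."""
        return [(i, j)
                for i in range(len(tokens))
                for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1))
                if score_window(tokens, i, j, model) > threshold]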
16
Naïve Bayes Sliding Window Results
Domain: CMU UseNet Seminar Announcements
(Same seminar announcement text as on slide 11.)
Field        F1
Person Name  30%
Location     61%
Start Time   98%
17
SRV: a realistic sliding-window-classifier IE
system
[Freitag, AAAI '98]
  • What windows to consider?
  • all windows containing as many tokens as the
    shortest example, but no more tokens than the
    longest example
  • How to represent a classifier? It might:
  • Restrict the length of window
  • Restrict the vocabulary or formatting used
    before/after/inside window
  • Restrict the relative order of tokens
  • Use inductive logic programming techniques to
    express all these

<title>Course Information for CS213</title>
<h1>CS 213 C Programming</h1>
18
SRV: a rule-learner for sliding-window
classification
  • Primitive predicates used by SRV:
  • token(X,W), allLowerCase(W), numerical(W), ...
  • nextToken(W,U), previousToken(W,V)
  • HTML-specific predicates:
  • inTitleTag(W), inH1Tag(W), inEmTag(W), ...
  • emphasized(W): inEmTag(W) or inBTag(W) or ...
  • tableNextCol(W,U): U is some token in the
    column after the column W is in
  • tablePreviousCol(W,V), tableRowHeader(W,T), ...
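
For intuition, here are a few of these predicates rendered as plain
Python over a token list; SRV itself expresses them relationally and
searches for rules that combine them, so this is only an illustration.

    # Illustrative Python stand-ins for SRV's token-level predicates.
    def all_lower_case(tokens, i):   # allLowerCase(W)
        return tokens[i].islower()

    def numerical(tokens, i):        # numerical(W)
        return tokens[i].isdigit()

    def next_token(tokens, i):       # nextToken(W, U)
        return tokens[i + 1] if i + 1 < len(tokens) else None

    def previous_token(tokens, i):   # previousToken(W, V)
        return tokens[i - 1] if i > 0 else None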

19
Automatic Pattern-Learning Systems
20
Automatic Pattern-Learning Systems
  • Pros
  • Portable across domains
  • Tend to have broad coverage
  • Robust in the face of degraded input.
  • Automatically finds appropriate statistical
    patterns
  • System knowledge not needed by those who supply
    the domain knowledge.
  • Cons
  • Annotated training data, and lots of it, is
    needed.
  • Isn't necessarily better or cheaper than a
    hand-built solution
  • Examples: Riloff et al.'s AutoSlog (UMass);
    Soderland's WHISK (UMass); Mooney et al.'s
    Rapier (UTexas)
  • learn lexico-syntactic patterns from templates

21
Rapier [Califf & Mooney, AAAI-99]
  • Rapier learns three regex-style patterns for each
    slot:
  • Pre-filler pattern, Filler pattern,
    Post-filler pattern

22
Features for IE Learning Systems
  • Part of speech: syntactic role of a specific word
  • Semantic classes: synonyms or other related words
  • Price class: price, cost, amount, ...
  • Month class: January, February, March, ...,
    December
  • US State class: Alaska, Alabama, ...,
    Washington, Wyoming
  • WordNet: large on-line thesaurus containing
    (among other things) semantic classes

23
Rapier rule matching example
  • "... sold to the bank for an undisclosed
    amount ..."
    POS:     vb  pr  det  nn  pr  det  jj  nn
    SClass:                               price

    Pre-filler:        Filler:               Post-filler:
    1) tag: {nn, nnp}  1) word: undisclosed  1) sem: price
    2) list: length 2     tag: jj

  • "... paid Honeywell an undisclosed price ..."
    POS:     vb  nnp  det  jj  nn
    SClass:                    price
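
A minimal sketch of how a rule like this could be matched against
tagged tokens. The constraint vocabulary follows the example above;
Rapier's "list: length N" constraint and its actual machinery are
omitted, so treat this as an illustration only.

    def constraint_ok(tok, c):
        """tok is (word, tag, semclass); c maps constraint names to
        allowed value sets, e.g. {"tag": {"nn", "nnp"}}."""
        word, tag, sem = tok
        return (("word" not in c or word in c["word"]) and
                ("tag" not in c or tag in c["tag"]) and
                ("sem" not in c or sem in c["sem"]))

    def match_rule(tokens, i, rule):
        """Match (pre, filler, post) constraint lists with the filler
        starting at position i; return the filler span or None."""
        pre, fill, post = rule
        if i < len(pre) or not all(
                constraint_ok(tokens[i - len(pre) + k], c)
                for k, c in enumerate(pre)):
            return None
        j = i + len(fill)
        if j + len(post) > len(tokens):
            return None
        if (all(constraint_ok(tokens[i + k], c)
                for k, c in enumerate(fill))
                and all(constraint_ok(tokens[j + k], c)
                        for k, c in enumerate(post))):
            return (i, j)
        return None

For instance, the rule above could be encoded as
([{"tag": {"nn", "nnp"}}], [{"word": {"undisclosed"}, "tag": {"jj"}}],
[{"sem": {"price"}}]).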
24
Rapier Rules: Details
  • Rapier rule =
  • pre-filler pattern
  • filler pattern
  • post-filler pattern
  • pattern = sequence of subpatterns
  • subpattern = set of constraints
  • constraint:
  • Word - exact word that must be present
  • Tag - matched word must have given POS tag
  • Class - semantic class of matched word
  • Can specify disjunction (e.g. the {nn, nnp}
    tag set above)
  • List: length N - between 0 and N words satisfying
    other constraints

25
Rapier's Learning Algorithm
  • Input: a set of training examples (list of
    documents annotated with "extract this
    substring")
  • Output: a set of rules
  • Init: Rules = a rule that exactly matches each
    training example
  • Repeat several times:
  • Seed: select M examples randomly and generate
    the K most-accurate maximally-general filler-only
    rules (pre-filler and post-filler match anything)
  • Grow: for N = 1, 2, 3, ..., try to improve the
    K best rules by adding N words of pre-filler or
    post-filler context
  • Keep: Rules = Rules + the best of the K rules
    - subsumed rules

26
Learning example (one iteration)
  • 2 examples: "... located in Atlanta, Georgia ..."
    and "... offices in Kansas City, Missouri ..."

appropriately general rule (high precision, high
recall)
27
Rapier results: Precision vs. Training Examples
28
Rapier results: Recall vs. Training Examples
29
Summary: Rule-learning approaches to
sliding-window classification
  • SRV, Rapier, and WHISK [Soderland, KDD '97]
  • Representations for classifiers allow restriction
    of the relationships between tokens, etc
  • Representations are carefully chosen subsets of
    even more powerful representations
  • Use of these heavyweight representations is
    complicated, but seems to pay off in results

30
Successors to MUC
  • CoNLL: Conference on Computational Natural
    Language Learning
  • Different topics each year:
  • 2002, 2003: language-independent NER
  • 2004: semantic role recognition
  • 2001: identify clauses in text
  • 2000: chunking boundaries
  • http://cnts.uia.ac.be/conll2003/ (also conll2004,
    conll2002)
  • Sponsored by SIGNLL, the Special Interest Group
    on Natural Language Learning of the Association
    for Computational Linguistics
  • ACE: Automated Content Extraction
  • Entity Detection and Tracking
  • Sponsored by NIST
  • http://wave.ldc.upenn.edu/Projects/ACE/
  • Several others recently
  • See http://cnts.uia.ac.be/conll2003/ner/

31
CoNLL-2003
  • Goal: identify boundaries and types of named
    entities
  • People, Organizations, Locations, Misc.
  • Experiment with incorporating external resources
    (gazetteers) and unlabeled data
  • Data:
  • Uses IOB notation
  • 4 pieces of info for each term:
  • Word, POS, Chunk, EntityType
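
For illustration, a fragment in the shared-task file format (one
token per line: word, POS tag, chunk tag, entity tag); this follows
the example given in the task description:

    U.N.      NNP  I-NP  I-ORG
    official  NN   I-NP  O
    Ekeus     NNP  I-NP  I-PER
    heads     VBZ  I-VP  O
    for       IN   I-PP  O
    Baghdad   NNP  I-NP  I-LOC
    .         .    O     O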

32
Details on Training/Test Sets
Reuters Newswire; European Corpus Initiative
Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
33
Summary of Results
  • 16 systems participated
  • Machine Learning Techniques:
  • Combinations of Maximum Entropy Models (5),
    Hidden Markov Models (4), Winnow/Perceptron (4)
  • Others used once were Support Vector Machines,
    Conditional Random Fields, Transformation-Based
    learning, AdaBoost, and memory-based learning
  • Combining techniques often worked well
  • Features
  • Choice of features is at least as important as ML
    method
  • Top-scoring systems used many types
  • No one feature stands out as essential (other
    than words)

Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
34
Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
35
Use of External Information
  • Improvement from using gazetteers vs. unlabeled
    data nearly equal
  • Gazetteers less useful for German than English
    (higher quality)

Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
36
Precision, Recall, and F-Scores
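
For reference, the F-score reported throughout these results is the
harmonic mean of precision P and recall R (the shared task used
beta = 1):

    F_{\beta=1} = \frac{2PR}{P + R}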
(The top systems' scores are not significantly different.)
Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
37
Combining Results
  • What happens if we combine the results of all of
    the systems?
  • Used a majority vote of 5 systems for each set
  • English:
  • F = 90.30 (a 14% error reduction over the best
    system)
  • German:
  • F = 74.17 (a 6% error reduction over the best
    system)
  • Top four systems in more detail:

38
Zhang and Johnson
  • Experimented with the effects of different
    features
  • Used a learning method they developed called
    Robust Risk Minimization (RRM)
  • Related to the Winnow method
  • Used it to predict the class label t_i associated
    with each token w_i
  • Estimate P(t_i = c | x_i) for every possible
    class label c, where x_i is a feature vector
    associated with token i
  • x_i can include information about previous tags
  • Found that relatively simple, language-independent
    features get you much of the way

39
Zhang and Johnson
  • Simple features include (see the sketch after
    this list):
  • The tokens themselves, in a window of +/- 2
  • The previous 2 predicted tags
  • The conjunction of the previous tag and the
    current token
  • Initial capitalization of tokens, in a window of
    +/- 2
  • More elaborate features include:
  • Word shape information: initial caps, all caps,
    all digits, digits containing punctuation
  • Token prefix (length 3-4) and suffix (length 1-4)
  • POS
  • Chunking info (chunk bag-of-words at current
    token)
  • Marked-up entities from training data
  • Other dictionaries
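
A minimal sketch of what computing the simple feature set might look
like for one token; the dict-of-indicator-features representation and
the feature names are assumptions for illustration, not the authors'
code.

    def simple_features(tokens, tags, i):
        """Simple, language-independent features for token i: tokens
        and capitalization in a +/-2 window, the previous two predicted
        tags, and the previous-tag/current-token conjunction."""
        feats = {}
        for d in range(-2, 3):
            j = i + d
            if 0 <= j < len(tokens):
                feats["tok[%+d]=%s" % (d, tokens[j])] = 1.0
                feats["cap[%+d]=%s" % (d, tokens[j][:1].isupper())] = 1.0
        for d in (1, 2):
            if i - d >= 0:
                feats["tag[-%d]=%s" % (d, tags[i - d])] = 1.0
        if i > 0:
            feats["conj=%s|%s" % (tags[i - 1], tokens[i])] = 1.0
        return feats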

40
Language independent
41
Florian, Ittycheriah, Jing, and Zhang
  • Combined four machine learning algorithms
  • The best-performing was the Zhang & Johnson RRM
  • Voting algorithm:
  • Giving them all equal-weight votes worked well
  • So did using the RRM algorithm to choose among
    them
  • English F-measure went from 89.94 to 91.63
  • Did well with the supplied features; did even
    better with some complex additional features:
  • The output of 2 other NER systems
  • Trained on 1.7M annotated words in 32 categories
  • A list of gazetteers
  • Improved English F-measure to 93.9
  • (a 21% error reduction)

42
Effects of Unknown Words
  • Florian et al. note that German is harder
  • Has more unknown words
  • All nouns are capitalized

43
Klein, Smarr, Nguyen, Manning
  • The standard approach for unknown words is to
    extract features like suffixes, prefixes, and
    capitalization
  • Idea: use character n-grams, rather than
    words, as the primary representation
  • Integrates unknown words seamlessly into the
    model
  • Improved results of their classifier by 25%

44
Balancing n-grams with Other Evidence
  • Example: "morning at Grace Road"
  • Need the classifier to determine that "Grace" is
    part of a Location rather than a Person
  • Used a Conditional Markov Model (aka Maximum
    Entropy Model)
  • Also added other shape information:
  • 20-month -> d-x
  • Italy -> Xx
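
A small function reproducing the two shape examples above; collapsing
runs of character classes like this is a common way to implement word
shape, though systems differ in the details.

    import re

    def word_shape(word):
        """'20-month' -> 'd-x', 'Italy' -> 'Xx'."""
        shape = re.sub(r"[A-Z]+", "X", word)
        shape = re.sub(r"[a-z]+", "x", shape)
        shape = re.sub(r"[0-9]+", "d", shape)
        return shape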

45
(No Transcript)
46
(No Transcript)
47
Chieu and Ng
  • Used a Maximum Entropy approach
  • Estimates probabilities based on the principle of
    making as few assumptions as possible
  • But allows specification of constraints between
    features and outcome (derived from training data)
  • Used a rich feature set, like those already
    discussed
  • Interesting additional features:
  • Lists derived from the training set
  • Global features: look at how the words appear
    elsewhere within the document
  • The paper doesn't say which of these features do
    well

48
Lists Derived from Training Data
  • UNI (useful unigrams):
  • Top 20 words that precede instances of that class
  • Computed using a correlation metric (a
    frequency-based sketch follows this list)
  • UBI (useful bigrams): pairs of preceding words
  • CITY OF, ARRIVES IN
  • The bigram has a higher probability of preceding
    the class than the unigram alone:
  • CITY OF is better evidence than just OF
  • NCS (useful name-class suffixes):
  • Tokens that frequently terminate a class:
  • INC, COMMITTEE
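
As noted above, a sketch of building the UNI list, using raw
frequency in place of the correlation metric (which is not spelled
out here); the data layout and names are assumptions for
illustration.

    from collections import Counter

    def useful_unigrams(annotated_docs, k=20):
        """annotated_docs: iterable of (tokens, spans), where spans are
        (start, end, cls) annotations. Returns, per class, the k words
        that most often immediately precede an instance."""
        counts = {}
        for tokens, spans in annotated_docs:
            for start, _end, cls in spans:
                if start > 0:
                    counts.setdefault(cls, Counter())[tokens[start - 1]] += 1
        return {cls: [w for w, _ in ctr.most_common(k)]
                for cls, ctr in counts.items()}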

49
Using Other Occurrences within the Document
  • Zone:
  • Where is the token from? (headline, author,
    body)
  • Unigrams:
  • If UNI holds for an occurrence of w elsewhere
  • Bigrams:
  • If UBI holds for an occurrence of w elsewhere
  • Suffix:
  • If NCS holds for an occurrence of w elsewhere
  • InitCaps:
  • A way to check whether a word is capitalized due
    to its position in the sentence or not. Also,
    check the first word in a sequence of capitalized
    words:
  • "Even News Broadcasting Corp., noted for its
    accurate reporting, made the erroneous
    announcement."

50
MUC Redux
  • Task: fill slots of templates
  • MUC-4 (1992):
  • All systems hand-engineered
  • One MUC-6 entry used learning; it failed
    miserably

51
(No Transcript)
52
MUC Redux
  • Fast forward 12 years: now use ML!
  • Chieu et al. show a machine learning approach
    that can do as well as most of the
    hand-engineered MUC-4 systems
  • Uses state-of-the-art:
  • Sentence segmenter
  • POS tagger
  • NER
  • Statistical parser
  • Co-reference resolution
  • Features look at syntactic context:
  • Use subject-verb-object information
  • Use head-words of NPs
  • Train classifiers for each slot type

Chieu, Hai Leong, Ng, Hwee Tou, and Lee, Yoong
Keok (2003). "Closing the Gap: Learning-Based
Information Extraction Rivaling
Knowledge-Engineering Methods", in Proceedings of
ACL-03.
53
Best systems took 10.5 person-months of
hand-coding!
54
IE Techniques Summary
  • Machine learning approaches are doing well, even
    without comprehensive word lists
  • Can develop a pretty good starting list with a
    bit of web page scraping
  • Features mainly have to do with the preceding and
    following tags, as well as syntax and word
    shape
    The latter is somewhat language-dependent
  • With enough training data, results are getting
    pretty decent on well-defined entities
  • ML is the way of the future!

55
IE Tools
  • Research tools:
  • GATE
  • http://gate.ac.uk/
  • MinorThird
  • http://minorthird.sourceforge.net/
  • Alembic (only NE tagging)
  • http://www.mitre.org/tech/alembic-workbench/
  • Commercial:
  • ?? I don't know which ones work well

56
NE Annotation Tools - GATE
57
NE Annotation Tools - Alembic
58
NE Annotation Tools - Alembic (2)
59
GATE
  • GATE: University of Sheffield's open-source
    infrastructure for language processing
  • Automatically deals with document formats, saving
    of results, evaluation, and visualisation of
    results for debugging
  • Has a finite-state pattern-action rule language
  • Has an example rule-based system called ANNIE
  • ANNIE modified for MUC guidelines: 89.5%
    F-measure on MUC-7 corpus

60
NE Components: The ANNIE system, a reusable and
easily extendable set of components
61
GATE's Named Entity Grammars
  • Phases run sequentially and constitute a cascade
    of FSTs over the pre-processing results
  • Hand-coded rules applied to annotations to
    identify NEs
  • Annotations from format analysis, tokeniser,
    sentence splitter, POS tagger, and gazetteer
    modules
  • Use of contextual information
  • Finds person names, locations, organisations,
    dates, addresses.

62
Named Entities in GATE
63
Named Entity Coreference