SIMS 290-2: Applied Natural Language Processing - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
SIMS 290-2: Applied Natural Language Processing
Marti Hearst, October 13, 2004
2
Today
  • Finish hand-built rule systems
  • Machine Learning approaches to information
    extraction
  • Sliding Windows
  • Rule-learners (older)
  • Feature-based ML (more recent)
  • IE tools

3
Two kinds of NE approaches
  • Knowledge Engineering
  • rule-based
  • developed by experienced language engineers
  • makes use of human intuition
  • requires only a small amount of training data
  • development can be very time-consuming
  • some changes may be hard to accommodate
  • Learning Systems
  • use statistics or other machine learning
  • developers do not need LE expertise
  • require large amounts of annotated training data
  • some changes may require re-annotation of the
    entire training corpus
  • annotators are cheap (but you get what you pay
    for!)

4
Baseline list lookup approach
  • System that recognises only entities stored in
    its lists (gazetteers); a minimal sketch follows
    this list.
  • Advantages: simple, fast, language independent,
    easy to retarget (just create lists)
  • Disadvantages: impossible to enumerate all
    names; lists must be collected and maintained;
    cannot deal with name variants; cannot resolve
    ambiguity
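
The lookup idea is simple enough to sketch. Below is a minimal,
illustrative longest-match scanner in Python; the gazetteer data
structure and all names are assumptions for illustration, not taken
from any particular system.

    # Minimal sketch of the list-lookup baseline: longest-match scan of
    # a token sequence against a gazetteer of known entity names.
    def lookup_entities(tokens, gazetteer):
        """gazetteer maps name tuples to a type, e.g.
        ("New", "York") -> "LOC". Returns (start, end, type) spans,
        preferring the longest match at each position."""
        max_len = max((len(name) for name in gazetteer), default=0)
        spans, i = [], 0
        while i < len(tokens):
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                name = tuple(tokens[i:i + n])
                if name in gazetteer:
                    spans.append((i, i + n, gazetteer[name]))
                    i += n
                    break
            else:
                i += 1  # no listed entity starts here
        return spans

The disadvantages above fall out directly: any name, variant, or
sense not in the lists is simply missed or mislabeled.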

5
Creating Gazetteer Lists
  • Online phone directories and yellow pages for
    person and organisation names (e.g.,
    [Paskaleva 02])
  • Location lists:
  • US GEOnet Names Server (GNS) data: 3.9 million
    locations with 5.37 million names (e.g.,
    [Manov 03])
  • UN site: http://unstats.un.org/unsd/citydata
  • Global Discovery database from Europa
    Technologies Ltd, UK (e.g., [Ignat 03])
  • Automatic collection from annotated training data

6
Rule-based Example: FACILE
  • FACILE: used in MUC-7 [Black et al. 98]
  • Uses Inxight's LinguistiX tools for tagging and
    morphological analysis
  • Database for external information; role similar
    to a gazetteer
  • Linguistic info per token, encoded as a feature
    vector:
  • Text offsets
  • Orthographic pattern (first/all capitals, mixed,
    lowercase)
  • Token and its normalised form
  • Syntax category and features
  • Semantics from database or morphological
    analysis
  • Morphological analyses
  • Example: (1192 1196 10 T C "Mrs." "mrs." (PROP
    TITLE) (^PER_CIV_F) (("Mrs." "Title" "Abbr"))
    NIL), where PER_CIV_F = female civilian
    (from the database)

7
FACILE
  • Context-sensitive rules written in a special rule
    notation, executed by an interpreter
  • Writing rules in Perl is too error-prone and hard
  • Rules of the form A => B \ C / D, where:
  • A is a set of attribute-value expressions with an
    optional score; the attributes refer to elements
    of the input token feature vector
  • B and D are the left and right context
    respectively, and can be empty
  • B, C, D are sequences of attribute-value pairs
    and Kleene regular-expression operators;
    variables are also supported
  • Example: [syn=NP, sem=ORG] (0.9) =>
    \ [norm="university"], [token="of"],
    [sem=REGION|COUNTRY|CITY] /

8
FACILE
  • Rule for the mark-up of person names when the
    first name is not present or known from the
    gazetteers, e.g. 'Mr J. Cass'
    (_F, _I, _M, _S are variables that transfer
    info from the RHS):

    [SYN=PROP, SEM=PER, FIRST=_F, INITIALS=_I,
     MIDDLE=_M, LAST=_S]
    =>
    [SEM=TITLE_MIL|TITLE_FEMALE|TITLE_MALE]
    \ [SYN=NAME, ORTH=I|O, TOKEN=_I]?,
      [ORTH=C|A, SYN=PROP, TOKEN=_F]?,
      [SYN=NAME, ORTH=I|O, TOKEN=_I]?,
      [SYN=NAME, TOKEN=_M]?,
      [ORTH=C|A|O, SYN=PROP, TOKEN=_S, SOURCE!=RULE]
      (proper name, not recognised by a rule)
    /

9
FACILE
  • Preference mechanism:
  • The rule with the highest score is preferred
  • Longer matches are preferred to shorter matches
  • Results are always one semantic categorisation of
    the named entity in the text
  • Evaluation (MUC-7 scores):
  • Organization: 86% precision, 66% recall
  • Person: 90% precision, 88% recall
  • Location: 81% precision, 80% recall
  • Dates: 93% precision, 86% recall

10
Extraction by Sliding Window
11
Extraction by Sliding Window
GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell, School of Computer
Science, Carnegie Mellon University
3:30 pm, 7500 Wean Hall

Machine learning has evolved from obscurity
in the 1970s into a vibrant and popular
discipline in artificial intelligence during the
1980s and 1990s. As a result of its success and
growth, machine learning is evolving into a
collection of related disciplines: inductive
concept acquisition, analytic learning in problem
solving (e.g. analogy, explanation-based
learning), learning theory (e.g. PAC learning),
genetic algorithms, connectionist learning,
hybrid systems, and so on.
CMU UseNet Seminar Announcement
12-14
Extraction by Sliding Window
(Slides 12-14 repeat the seminar announcement from
slide 11 while the extraction window slides across
the text.)
CMU UseNet Seminar Announcement
15
A Naïve Bayes Sliding Window Model
[Freitag 1997]

...00 pm  Place: Wean Hall Rm 5409
Speaker: Sebastian Thrun

    w_{t-m} ... w_{t-1}  [ w_t ... w_{t+n} ]  w_{t+n+1} ... w_{t+n+m}
    ------ prefix ------  ----- contents ----  ------- suffix -------

Estimate Pr(LOCATION | window) using Bayes'
rule. Try all reasonable windows (vary length,
position). Assume independence for length, prefix
words, suffix words, content words. Estimate from
data quantities like Pr("Place" in
prefix | LOCATION).
If Pr("Wean Hall Rm 5409" = LOCATION) is above
some threshold, extract it.
Other examples of sliding windows: [Baluja et al.
2000] (decision tree over individual words and
their context)
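
To make the model concrete, here is a minimal sketch of the scoring
step in Python. The model dictionary of per-field probability tables
is a hypothetical stand-in; this illustrates the independence
assumptions above, not Freitag's actual implementation.

    import math

    def score_window(tokens, start, end, model, m=2):
        """Log Pr(LOCATION | window), up to a constant, assuming
        independence of length, prefix, contents, and suffix."""
        prefix = tokens[max(0, start - m):start]
        contents = tokens[start:end]
        suffix = tokens[end:end + m]
        score = math.log(model["p_len"].get(end - start, 1e-6))
        for w in prefix:
            score += math.log(model["p_prefix"].get(w, 1e-6))
        for w in contents:
            score += math.log(model["p_content"].get(w, 1e-6))
        for w in suffix:
            score += math.log(model["p_suffix"].get(w, 1e-6))
        return score

    def extract(tokens, model, max_len=5, threshold=-50.0):
        """Try all reasonable windows; keep those above a threshold."""
        return [(i, j)
                for i in range(len(tokens))
                for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1))
                if score_window(tokens, i, j, model) > threshold]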
16
Naïve Bayes Sliding Window Results
Domain: CMU UseNet Seminar Announcements
(Same seminar announcement text as on slide 11.)
Field        F1
Person Name  30%
Location     61%
Start Time   98%
17
SRV: a realistic sliding-window-classifier IE
system
[Freitag, AAAI '98]
  • What windows to consider?
  • all windows containing as many tokens as the
    shortest example, but no more tokens than the
    longest example
  • How to represent a classifier? It might:
  • Restrict the length of window
  • Restrict the vocabulary or formatting used
    before/after/inside window
  • Restrict the relative order of tokens
  • Use inductive logic programming techniques to
    express all these

<title>Course Information for CS213</title>
<h1>CS 213 C Programming</h1>
18
SRV: a rule-learner for sliding-window
classification
  • Primitive predicates used by SRV:
  • token(X,W), allLowerCase(W), numerical(W), ...
  • nextToken(W,U), previousToken(W,V)
  • HTML-specific predicates:
  • inTitleTag(W), inH1Tag(W), inEmTag(W), ...
  • emphasized(W): inEmTag(W) or inBTag(W) or ...
  • tableNextCol(W,U): U is some token in the
    column after the column W is in
  • tablePreviousCol(W,V), tableRowHeader(W,T), ...
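
For intuition, here are a few of these predicates rendered as plain
Python over a token list; SRV itself expresses them relationally and
searches for rules that combine them, so this is only an illustration.

    # Illustrative Python stand-ins for SRV's token-level predicates.
    def all_lower_case(tokens, i):   # allLowerCase(W)
        return tokens[i].islower()

    def numerical(tokens, i):        # numerical(W)
        return tokens[i].isdigit()

    def next_token(tokens, i):       # nextToken(W, U)
        return tokens[i + 1] if i + 1 < len(tokens) else None

    def previous_token(tokens, i):   # previousToken(W, V)
        return tokens[i - 1] if i > 0 else None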

19
Automatic Pattern-Learning Systems
20
Automatic Pattern-Learning Systems
  • Pros
  • Portable across domains
  • Tend to have broad coverage
  • Robust in the face of degraded input.
  • Automatically finds appropriate statistical
    patterns
  • System knowledge not needed by those who supply
    the domain knowledge.
  • Cons
  • Annotated training data, and lots of it, is
    needed.
  • Isn't necessarily better or cheaper than a
    hand-built solution
  • Examples: Riloff et al.'s AutoSlog (UMass);
    Soderland's WHISK (UMass); Mooney et al.'s
    Rapier (UTexas)
  • learn lexico-syntactic patterns from templates

21
Rapier [Califf & Mooney, AAAI-99]
  • Rapier learns three regex-style patterns for each
    slot:
  • Pre-filler pattern, Filler pattern,
    Post-filler pattern

22
Features for IE Learning Systems
  • Part of speech: syntactic role of a specific word
  • Semantic classes: synonyms or other related words
  • Price class: price, cost, amount, ...
  • Month class: January, February, March, ...,
    December
  • US State class: Alaska, Alabama, ...,
    Washington, Wyoming
  • WordNet: large on-line thesaurus containing
    (among other things) semantic classes

23
Rapier rule matching example
  • "... sold to the bank for an undisclosed
    amount ..."
    POS:     vb  pr  det  nn  pr  det  jj  nn
    SClass:                               price

    Pre-filler:        Filler:               Post-filler:
    1) tag: {nn, nnp}  1) word: undisclosed  1) sem: price
    2) list: length 2     tag: jj

  • "... paid Honeywell an undisclosed price ..."
    POS:     vb  nnp  det  jj  nn
    SClass:                    price
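
A minimal sketch of how a rule like this could be matched against
tagged tokens. The constraint vocabulary follows the example above;
Rapier's "list: length N" constraint and its actual machinery are
omitted, so treat this as an illustration only.

    def constraint_ok(tok, c):
        """tok is (word, tag, semclass); c maps constraint names to
        allowed value sets, e.g. {"tag": {"nn", "nnp"}}."""
        word, tag, sem = tok
        return (("word" not in c or word in c["word"]) and
                ("tag" not in c or tag in c["tag"]) and
                ("sem" not in c or sem in c["sem"]))

    def match_rule(tokens, i, rule):
        """Match (pre, filler, post) constraint lists with the filler
        starting at position i; return the filler span or None."""
        pre, fill, post = rule
        if i < len(pre) or not all(
                constraint_ok(tokens[i - len(pre) + k], c)
                for k, c in enumerate(pre)):
            return None
        j = i + len(fill)
        if j + len(post) > len(tokens):
            return None
        if (all(constraint_ok(tokens[i + k], c)
                for k, c in enumerate(fill))
                and all(constraint_ok(tokens[j + k], c)
                        for k, c in enumerate(post))):
            return (i, j)
        return None

For instance, the rule above could be encoded as
([{"tag": {"nn", "nnp"}}], [{"word": {"undisclosed"}, "tag": {"jj"}}],
[{"sem": {"price"}}]).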
24
Rapier Rules: Details
  • Rapier rule =
  • pre-filler pattern
  • filler pattern
  • post-filler pattern
  • pattern = sequence of subpatterns
  • subpattern = set of constraints
  • constraint:
  • Word - exact word that must be present
  • Tag - matched word must have given POS tag
  • Class - semantic class of matched word
  • Can specify disjunction (e.g. the {nn, nnp}
    tag set above)
  • List: length N - between 0 and N words satisfying
    other constraints

25
Rapier's Learning Algorithm
  • Input: a set of training examples (list of
    documents annotated with "extract this
    substring")
  • Output: a set of rules
  • Init: Rules = a rule that exactly matches each
    training example
  • Repeat several times:
  • Seed: select M examples randomly and generate
    the K most-accurate maximally-general filler-only
    rules (pre-filler and post-filler match anything)
  • Grow: for N = 1, 2, 3, ..., try to improve the
    K best rules by adding N words of pre-filler or
    post-filler context
  • Keep: Rules = Rules + the best of the K rules
    - subsumed rules

26
Learning example (one iteration)
  • 2 examples: "... located in Atlanta, Georgia ..."
    and "... offices in Kansas City, Missouri ..."

appropriately general rule (high precision, high
recall)
27
Rapier results: Precision vs. Training Examples
28
Rapier results: Recall vs. Training Examples
29
Summary: Rule-learning approaches to
sliding-window classification
  • SRV, Rapier, and WHISK [Soderland, KDD '97]
  • Representations for classifiers allow restriction
    of the relationships between tokens, etc
  • Representations are carefully chosen subsets of
    even more powerful representations
  • Use of these heavyweight representations is
    complicated, but seems to pay off in results

30
Successors to MUC
  • CoNLL: Conference on Computational Natural
    Language Learning
  • Different topics each year:
  • 2002, 2003: language-independent NER
  • 2004: semantic role recognition
  • 2001: identify clauses in text
  • 2000: chunking boundaries
  • http://cnts.uia.ac.be/conll2003/ (also conll2004,
    conll2002)
  • Sponsored by SIGNLL, the Special Interest Group
    on Natural Language Learning of the Association
    for Computational Linguistics
  • ACE: Automated Content Extraction
  • Entity Detection and Tracking
  • Sponsored by NIST
  • http://wave.ldc.upenn.edu/Projects/ACE/
  • Several others recently
  • See http://cnts.uia.ac.be/conll2003/ner/

31
CoNLL-2003
  • Goal: identify boundaries and types of named
    entities
  • People, Organizations, Locations, Misc.
  • Experiment with incorporating external resources
    (gazetteers) and unlabeled data
  • Data:
  • Uses IOB notation
  • 4 pieces of info for each term:
  • Word, POS, Chunk, EntityType
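
For illustration, a fragment in the shared-task file format (one
token per line: word, POS tag, chunk tag, entity tag); this follows
the example given in the task description:

    U.N.      NNP  I-NP  I-ORG
    official  NN   I-NP  O
    Ekeus     NNP  I-NP  I-PER
    heads     VBZ  I-VP  O
    for       IN   I-PP  O
    Baghdad   NNP  I-NP  I-LOC
    .         .    O     O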

32
Details on Training/Test Sets
Reuters Newswire; European Corpus Initiative
Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
33
Summary of Results
  • 16 systems participated
  • Machine Learning Techniques:
  • Combinations of Maximum Entropy Models (5),
    Hidden Markov Models (4), Winnow/Perceptron (4)
  • Others used once were Support Vector Machines,
    Conditional Random Fields, Transformation-Based
    learning, AdaBoost, and memory-based learning
  • Combining techniques often worked well
  • Features
  • Choice of features is at least as important as ML
    method
  • Top-scoring systems used many types
  • No one feature stands out as essential (other
    than words)

Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
34
Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
35
Use of External Information
  • Improvement from using gazetteers vs. unlabeled
    data nearly equal
  • Gazetteers less useful for German than English
    (higher quality)

Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
36
Precision, Recall, and F-Scores
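
For reference, the F-score reported throughout these results is the
harmonic mean of precision P and recall R (the shared task used
beta = 1):

    F_{\beta=1} = \frac{2PR}{P + R}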
(The top systems' scores are not significantly different.)
Sang and De Meulder, "Introduction to the
CoNLL-2003 Shared Task: Language-Independent
Named Entity Recognition", Proceedings of
CoNLL-2003
37
Combining Results
  • What happens if we combine the results of all of
    the systems?
  • Used a majority vote of 5 systems for each set
  • English:
  • F = 90.30 (a 14% error reduction over the best
    system)
  • German:
  • F = 74.17 (a 6% error reduction over the best
    system)
  • Top four systems in more detail:

38
Zhang and Johnson
  • Experimented with the effects of different
    features
  • Used a learning method they developed called
    Robust Risk Minimization (RRM)
  • Related to the Winnow method
  • Used it to predict the class label t_i associated
    with each token w_i
  • Estimate P(t_i = c | x_i) for every possible
    class label c, where x_i is a feature vector
    associated with token i
  • x_i can include information about previous tags
  • Found that relatively simple, language-independent
    features get you much of the way

39
Zhang and Johnson
  • Simple features include (see the sketch after
    this list):
  • The tokens themselves, in a window of +/- 2
  • The previous 2 predicted tags
  • The conjunction of the previous tag and the
    current token
  • Initial capitalization of tokens, in a window of
    +/- 2
  • More elaborate features include:
  • Word shape information: initial caps, all caps,
    all digits, digits containing punctuation
  • Token prefix (length 3-4) and suffix (length 1-4)
  • POS
  • Chunking info (chunk bag-of-words at current
    token)
  • Marked-up entities from training data
  • Other dictionaries
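
A minimal sketch of what computing the simple feature set might look
like for one token; the dict-of-indicator-features representation and
the feature names are assumptions for illustration, not the authors'
code.

    def simple_features(tokens, tags, i):
        """Simple, language-independent features for token i: tokens
        and capitalization in a +/-2 window, the previous two predicted
        tags, and the previous-tag/current-token conjunction."""
        feats = {}
        for d in range(-2, 3):
            j = i + d
            if 0 <= j < len(tokens):
                feats["tok[%+d]=%s" % (d, tokens[j])] = 1.0
                feats["cap[%+d]=%s" % (d, tokens[j][:1].isupper())] = 1.0
        for d in (1, 2):
            if i - d >= 0:
                feats["tag[-%d]=%s" % (d, tags[i - d])] = 1.0
        if i > 0:
            feats["conj=%s|%s" % (tags[i - 1], tokens[i])] = 1.0
        return feats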

40
Language independent
41
Florian, Ittycheriah, Jing, and Zhang
  • Combined four machine learning algorithms
  • The best-performing was the Zhang & Johnson RRM
  • Voting algorithm:
  • Giving them all equal-weight votes worked well
  • So did using the RRM algorithm to choose among
    them
  • English F-measure went from 89.94 to 91.63
  • Did well with the supplied features; did even
    better with some complex additional features:
  • The output of 2 other NER systems
  • Trained on 1.7M annotated words in 32 categories
  • A list of gazetteers
  • Improved English F-measure to 93.9
  • (a 21% error reduction)

42
Effects of Unknown Words
  • Florian et al. note that German is harder
  • Has more unknown words
  • All nouns are capitalized

43
Klein, Smarr, Nguyen, Manning
  • The standard approach for unknown words is to
    extract features like suffixes, prefixes, and
    capitalization
  • Idea: use character n-grams, rather than
    words, as the primary representation
  • Integrates unknown words seamlessly into the
    model
  • Improved results of their classifier by 25%

44
Balancing n-grams with Other Evidence
  • Example: "morning at Grace Road"
  • Need the classifier to determine that "Grace" is
    part of a Location rather than a Person
  • Used a Conditional Markov Model (aka Maximum
    Entropy Model)
  • Also added other shape information:
  • 20-month -> d-x
  • Italy -> Xx
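
A small function reproducing the two shape examples above; collapsing
runs of character classes like this is a common way to implement word
shape, though systems differ in the details.

    import re

    def word_shape(word):
        """'20-month' -> 'd-x', 'Italy' -> 'Xx'."""
        shape = re.sub(r"[A-Z]+", "X", word)
        shape = re.sub(r"[a-z]+", "x", shape)
        shape = re.sub(r"[0-9]+", "d", shape)
        return shape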

45
(No Transcript)
46
(No Transcript)
47
Chieu and Ng
  • Used a Maximum Entropy approach
  • Estimates probabilities based on the principle of
    making as few assumptions as possible
  • But allows specification of constraints between
    features and outcome (derived from training data)
  • Used a rich feature set, like those already
    discussed
  • Interesting additional features:
  • Lists derived from the training set
  • Global features: look at how the words appear
    elsewhere within the document
  • The paper doesn't say which of these features do
    well

48
Lists Derived from Training Data
  • UNI (useful unigrams):
  • Top 20 words that precede instances of that class
  • Computed using a correlation metric (a
    frequency-based sketch follows this list)
  • UBI (useful bigrams): pairs of preceding words
  • CITY OF, ARRIVES IN
  • The bigram has a higher probability of preceding
    the class than the unigram alone:
  • CITY OF is better evidence than just OF
  • NCS (useful name-class suffixes):
  • Tokens that frequently terminate a class:
  • INC, COMMITTEE
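
As noted above, a sketch of building the UNI list, using raw
frequency in place of the correlation metric (which is not spelled
out here); the data layout and names are assumptions for
illustration.

    from collections import Counter

    def useful_unigrams(annotated_docs, k=20):
        """annotated_docs: iterable of (tokens, spans), where spans are
        (start, end, cls) annotations. Returns, per class, the k words
        that most often immediately precede an instance."""
        counts = {}
        for tokens, spans in annotated_docs:
            for start, _end, cls in spans:
                if start > 0:
                    counts.setdefault(cls, Counter())[tokens[start - 1]] += 1
        return {cls: [w for w, _ in ctr.most_common(k)]
                for cls, ctr in counts.items()}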

49
Using Other Occurrences within the Document
  • Zone:
  • Where is the token from? (headline, author,
    body)
  • Unigrams:
  • If UNI holds for an occurrence of w elsewhere
  • Bigrams:
  • If UBI holds for an occurrence of w elsewhere
  • Suffix:
  • If NCS holds for an occurrence of w elsewhere
  • InitCaps:
  • A way to check whether a word is capitalized due
    to its position in the sentence or not. Also,
    check the first word in a sequence of capitalized
    words:
  • "Even News Broadcasting Corp., noted for its
    accurate reporting, made the erroneous
    announcement."

50
MUC Redux
  • Task: fill slots of templates
  • MUC-4 (1992):
  • All systems hand-engineered
  • One MUC-6 entry used learning; it failed
    miserably

51
(No Transcript)
52
MUC Redux
  • Fast forward 12 years: now use ML!
  • Chieu et al. show a machine learning approach
    that can do as well as most of the
    hand-engineered MUC-4 systems
  • Uses state-of-the-art:
  • Sentence segmenter
  • POS tagger
  • NER
  • Statistical parser
  • Co-reference resolution
  • Features look at syntactic context:
  • Use subject-verb-object information
  • Use head-words of NPs
  • Train classifiers for each slot type

Chieu, Hai Leong, Ng, Hwee Tou, and Lee, Yoong
Keok (2003). "Closing the Gap: Learning-Based
Information Extraction Rivaling
Knowledge-Engineering Methods", in Proceedings of
ACL-03.
53
Best systems took 10.5 person-months of
hand-coding!
54
IE Techniques Summary
  • Machine learning approaches are doing well, even
    without comprehensive word lists
  • Can develop a pretty good starting list with a
    bit of web page scraping
  • Features mainly have to do with the preceding and
    following tags, as well as syntax and word
    shape
    The latter is somewhat language-dependent
  • With enough training data, results are getting
    pretty decent on well-defined entities
  • ML is the way of the future!

55
IE Tools
  • Research tools:
  • GATE
  • http://gate.ac.uk/
  • MinorThird
  • http://minorthird.sourceforge.net/
  • Alembic (only NE tagging)
  • http://www.mitre.org/tech/alembic-workbench/
  • Commercial:
  • ?? I don't know which ones work well

56
NE Annotation Tools - GATE
57
NE Annotation Tools - Alembic
58
NE Annotation Tools - Alembic (2)
59
GATE
  • GATE: University of Sheffield's open-source
    infrastructure for language processing
  • Automatically deals with document formats, saving
    of results, evaluation, and visualisation of
    results for debugging
  • Has a finite-state pattern-action rule language
  • Has an example rule-based system called ANNIE
  • ANNIE modified for MUC guidelines: 89.5%
    F-measure on MUC-7 corpus

60
NE Components: The ANNIE system, a reusable and
easily extendable set of components
61
GATE's Named Entity Grammars
  • Phases run sequentially and constitute a cascade
    of FSTs over the pre-processing results
  • Hand-coded rules applied to annotations to
    identify NEs
  • Annotations from format analysis, tokeniser,
    sentence splitter, POS tagger, and gazetteer
    modules
  • Use of contextual information
  • Finds person names, locations, organisations,
    dates, addresses.

62
Named Entities in GATE
63
Named Entity Coreference