1
NYU: Description of the Proteus/PET System as
Used for MUC-7 ST
  • Roman Yangarber and Ralph Grishman
  • Presented by Jinying Chen
  • 10/04/2002

2
Outline
  • Introduction
  • Proteus IE System
  • PET User Interface
  • Performance on the Launch Scenario

3
Introduction
  • Problem: portability and customization of IE
    engines at the scenario level
  • To address this problem, NYU built a set of tools
    that allow the user to adapt the system to new
    scenarios rapidly through example-based learning
  • The present system operates on two tiers: Proteus
    and PET

4
Introduction (Cont.)
  • Proteus
  • Core extraction engine, an enhanced version of
    the one employed at MUC-6
  • PET
  • GUI front end, through which the user interacts
    with Proteus
  • The user provides the system with examples of
    events in the text and examples of the associated
    database entries to be created

5
Proteus IE System
  • Modular design
  • Control is encapsulated in immutable,
    domain-independent core components
  • Domain-specific information resides in the
    knowledge bases

6

7
Proteus IE System (Cont.)
  • Lexical analysis module
  • Assign each token a reading or a list of
    alternative readings by consulting a set of
    on-line dictionaries
  • Name Recognition
  • Identify proper names in the text by using local
    contextual cues

8
Proteus IE System (Cont.)
  • Partial Syntax
  • Find small syntactic units, such as basic NPs and
    VPs
  • Mark the phrases with semantic information, e.g.
    the semantic class of the head of the phrase
  • Scenario Patterns
  • Find higher-level syntactic constructions using
    local semantic information: apposition,
    prepositional phrase attachment, limited
    conjunctions, and clausal constructions (see the
    sketch below)
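The pattern formalism itself is not shown in the
transcript. As a rough, hypothetical sketch (not
Proteus's actual rule syntax), a scenario pattern can
be thought of as matching a sequence of semantically
tagged phrases from the partial-syntax stage and
emitting an event logical form:

    # Hypothetical sketch of a scenario-level pattern: match the clause shape
    # NP(company) VG("launch") NP(payload) over the phrase sequence produced
    # by the partial-syntax stage and emit a launch-event logical form.
    def match_launch_pattern(phrases):
        events = []
        for i in range(len(phrases) - 2):
            a, b, c = phrases[i:i + 3]
            if (a['type'] == 'NP' and a['class'] == 'company'
                    and b['type'] == 'VG' and b['head'] == 'launch'
                    and c['type'] == 'NP' and c['class'] == 'payload'):
                events.append({'class': 'launch-event', 'agent': a, 'payload': c})
        return events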

9
Proteus IE System (Cont.)
  • Note
  • The above three modules are pattern-matching
    phases; they operate by deterministic, bottom-up
    partial parsing, i.e. pattern matching.
  • The output is a sequence of LFs (logical forms)
    corresponding to the entities, relationships, and
    events encountered in the analysis.

10
Figure 2: LF for the NP "a satellite built by
Loral Corp. of New York for Intelsat"
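The figure itself is not reproduced in the transcript.
As an illustration only (the attribute names below are
hypothetical, not the exact ones from the paper), the
LF for this NP is a nested entity structure along
these lines:

    # Hypothetical rendering of the logical form (LF) for the NP
    # "a satellite built by Loral Corp. of New York for Intelsat"
    lf = {
        'class': 'satellite',
        'manufacturer': {
            'class': 'company',
            'name': 'Loral Corp.',
            'location': {'class': 'location', 'name': 'New York'},
        },
        'customer': {'class': 'company', 'name': 'Intelsat'},
    }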
11
Proteus IE System (Cont.)
  • Reference Resolution (RefRes)
  • Links anaphoric pronouns to their antecedents and
    merges other co-referring expressions
  • Discourse Analysis
  • Uses higher-level inference rules to build more
    complex event structures
  • E.g. a rule that merges a Mission entity with a
    corresponding Launch event.
  • Output Generation
  • Produces the output templates from the final
    event structures

12
PET User Interface
  • A disciplined method for customizing the
    knowledge bases, and the pattern base in
    particular
  • Organization of Patterns
  • The pattern base is organized in layers
  • Proteus treats the patterns at the different
    levels differently
  • Acquires the most specific patterns directly from
    the user, on a per-scenario basis

13
(Diagram: layers of the pattern base, from
user-supplied patterns through the pattern library to
the core part of the system)
14
PET User Interface (Cont.)
  • Pattern Acquisition
  • Enter an example
  • Choose an event template
  • Apply existing patterns (step 3)
  • Tune pattern elements (step 4)
  • Fill event slots (step 5)
  • Build pattern
  • Syntactic generalization

15
(Screenshots of steps 3-5: applying existing patterns,
tuning pattern elements, and filling event slots)
16
Performance on the Launch Scenario
  • Scenario Patterns
  • Basically two types: launch events and mission
    events
  • In cases where there is no direct connection
    between these two events, post-processing
    inference rules attempt to tie the mission to a
    launch event
  • Inference Rules
  • Involve many-to-many relations (e.g. multiple
    payloads correspond to a single event)
  • Extending the inference rule set with heuristics,
    e.g. to find the date and site (see the sketch
    below)
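As a rough, hypothetical sketch of what such a
post-processing inference rule might look like (this
is not the paper's actual rule language):

    # Hypothetical post-processing rule: if a Mission entity is not yet tied
    # to a Launch event, attach it to a launch that shares its payload.
    def attach_missions_to_launches(events):
        launches = [e for e in events if e['class'] == 'launch-event']
        for mission in (e for e in events if e['class'] == 'mission'):
            if mission.get('launch') is None:
                for launch in launches:
                    if launch.get('payload') == mission.get('payload'):
                        mission['launch'] = launch
                        break
        return events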

17
  • Conclusion
  • Example-based pattern acquisition is appropriate
    for the ST-level task, especially when training
    data is quite limited
  • Pattern editing tools are useful and effective

18
NYU: Description of the MENE Named Entity System
as Used in MUC-7
  • Andrew Borthwick, John Sterling, et al.
  • Presented by Jinying Chen
  • 10/04/2002

19
Outline
  • Maximum Entropy
  • MENE's Feature Classes
  • Feature Selection
  • Decoding
  • Results
  • Conclusion

20
Maximum Entropy
  • Problem Definition
  • The problem of named entity recognition can be
    reduced to the problem of assigning one of 4n+1
    tags to each token
  • n: the number of name categories, such as
    company, product, etc. For MUC-7, n = 7
  • 4 states: x_start, x_continue, x_end, x_unique
  • other: the token is not part of a named entity
    (see the worked example below)
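For MUC-7 this gives 4 × 7 + 1 = 29 possible tags per
token. A minimal sketch of the tag set (the category
names below are the standard MUC-7 NE categories, used
here only for illustration):

    # The 4n+1 tag scheme: four positional states per category, plus "other".
    categories = ['person', 'organization', 'location', 'date',
                  'time', 'money', 'percent']           # n = 7 for MUC-7
    states = ['start', 'continue', 'end', 'unique']
    tags = [f'{c}_{s}' for c in categories for s in states] + ['other']
    assert len(tags) == 4 * len(categories) + 1         # 29 tags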

21
Maximum Entropy (cont.)
  • Maximum Entropy Solution
  • Compute p(f | h), where f is the prediction among
    the 4n+1 tags and h is the history
  • The computation of p(f | h) depends on a set of
    binary-valued features, e.g.
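The example feature on the original slide is not in
the transcript. A feature of the kind MENE uses pairs
a binary predicate on the history with a particular
prediction; an illustrative (hypothetical) one is:

    g(h, f) = \begin{cases}
      1 & \text{if the current token in } h \text{ begins with a capital letter and } f = \text{person\_start} \\
      0 & \text{otherwise}
    \end{cases}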

22
Maximum Entropy (cont.)
  • Given a set of features and some training data,
    the maximum entropy estimation process produces a
    model
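The formula on the original slide is not in the
transcript; the estimated model has the standard
maximum entropy form, with one weight alpha_i per
feature g_i:

    p(f \mid h) = \frac{\prod_i \alpha_i^{\,g_i(h,f)}}{Z_\alpha(h)},
    \qquad
    Z_\alpha(h) = \sum_{f'} \prod_i \alpha_i^{\,g_i(h,f')}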

23
MENE's Feature Classes
  • Binary Features
  • Lexical Features
  • Section Features
  • Dictionary Features
  • External Systems Features

24
Binary Features
  • Features whose history can be considered to be
    either on or off for a given token.
  • Example
  • The token begins with a capitalized letter
  • The token is a four-digit number
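A minimal sketch of such history predicates (the
helper names are hypothetical); each predicate, paired
with one of the 4n+1 futures, yields a binary feature
g(h, f):

    # Two binary history predicates of the kind listed above (sketch).
    def begins_with_capital(token: str) -> bool:
        return token[:1].isupper()

    def is_four_digit_number(token: str) -> bool:
        return len(token) == 4 and token.isdigit()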

25
Lexical Features
  • Example
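The example on the original slide is not in the
transcript. MENE's lexical features condition on the
tokens in a small window around the current word, so
an illustrative (hypothetical) feature is:
g(h, f) = 1 if the token preceding the current one in
h is "Mr." and f = person_start, and 0 otherwise.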

26
Section Features
  • Features that make predictions based on the
    current section of the article, such as Date,
    Preamble, and Text.
  • Play a key role by establishing the background
    probability of the occurrence of the different
    futures (predictions).

27
Dictionary Features
  • Make use of a broad array of dictionaries of
    useful single or multi-word terms such as first
    names, company names, and corporate suffixes.
  • Require no manual editing

28
(No Transcript)
29
External Systems Features
  • MENE incorporates the outputs of three NE taggers:
  • a significantly enhanced version of the
    traditional, hand-coded Proteus named-entity
    tagger
  • Manitoba
  • IsoQuest

30
  • Example
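The example on the original slide is not in the
transcript. An external-system feature conditions on
another tagger's output for the current token; an
illustrative (hypothetical) one is:
g(h, f) = 1 if the Proteus tagger marks the current
token as person_start and f = person_start, and 0
otherwise.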

31
Feature Selection
  • Simple
  • Select all features that fire at least 3 times
    in the training corpus (see the sketch below)
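A minimal sketch of this cutoff, assuming candidate
features are represented as (predicate, future) pairs
recorded each time they fire on the training corpus:

    from collections import Counter

    # Keep every (history-predicate, future) pair seen at least min_count times.
    def select_features(feature_firings, min_count=3):
        counts = Counter(feature_firings)
        return {feat for feat, n in counts.items() if n >= min_count}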

32
Decoding
  • Simple
  • For each token, check all the active features for
    this token and compute p(f | h)
  • Run a Viterbi search to find the highest
    probability coherent path through the lattice of
    conditional probabilities (see the sketch below)
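A minimal sketch of the decoding step, assuming a
helper conditional_probs(i) that returns p(f | h) for
every tag f at token i, and a helper
is_coherent(prev_tag, tag) that rejects invalid
transitions (e.g. a _start tag not followed by a
matching _continue or _end); both helpers are
hypothetical:

    import math

    def viterbi(n_tokens, tags, conditional_probs, is_coherent):
        # best[tag] = (log probability, tag path) of the best coherent
        # sequence ending in `tag` at the current token.
        best = {t: (math.log(conditional_probs(0).get(t, 1e-12)), [t]) for t in tags}
        for i in range(1, n_tokens):
            probs = conditional_probs(i)
            new_best = {}
            for t in tags:
                candidates = [(score + math.log(probs.get(t, 1e-12)), path + [t])
                              for prev, (score, path) in best.items()
                              if is_coherent(prev, t)]
                if candidates:
                    new_best[t] = max(candidates)
            best = new_best
        return max(best.values())[1]   # highest-probability coherent tag path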

33
Results
  • Training set: 350 aviation disaster articles
    (about 270,000 words)
  • Test set
  • Dry run: within-domain corpus
  • Formal run: out-of-domain corpus

34
Results (cont.)
35
Results (cont.)
36
Conclusion
  • A new, still-immature system; performance can be
    improved by
  • Adding long-range reference-resolution features
  • Exploring compound features
  • Using more sophisticated methods of feature
    selection
  • Highly portable
  • An efficient method to combine NE systems