Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction

Kiyoshi Sudo, Thesis Proposal Presentation, August 9, 2002

1
Automatic Acquisition of Lexical Classes and
Extraction Patterns for Information Extraction
  • Kiyoshi Sudo
  • Ph.D. Research Proposal
  • New York University

Committee: Ralph Grishman, Satoshi Sekine, I. Dan Melamed
2
Outline
  • Introduction
  • Research Proposal
  • Problem Setting
  • Approach
  • Application to Information Extraction
  • Discussion

3
MUC Scenario Template Task
  • MURREE, Pakistan (AP) -- Masked gunmen firing
    Kalashnikov rifles burst through the front gates
    of a Christian school Monday, killing six people
    and wounding three in the latest attack against
    Western interests since Pakistan joined the war
    against terrorism.

5
High Cost for Acquiring Knowledge-Base
  • Find extraction patterns
  • Find relevant documents
  • Find relevant events
  • Analyze sentences
  • Find domain-specific lexicon
  • Find existing KB (e.g. thesaurus, gazetteers)

6
Prior Work
Automatic Knowledge Acquisition
Lexical Acquisition
Pattern Acquisition
Mutual Bootstrapping (Riloff and Jones 1999)
Pattern Discovery with Document Re-ranking (Yangarber et al. 2000)
Simultaneous Multi-Semantic Class (Thelen and Riloff 2002)
(Yangarber et al. 2002)
Pattern Acquisition for QA (Ravichandran and Hovy 2002)
7
Challenge
[Diagram labels: User; Seed Lexicon; Seed Pattern; Expanded Lexicon; Expanded Pattern Set; Knowledge Base]
8
Meeting the Challenge
[Diagram labels: User; Seed Lexicon; Seed Pattern; Expanded Lexicon; Expanded Pattern Set; Knowledge Base]
9
Semantic Clustering
  • Input
    • Description specific enough to define the scenario
      (terrorism, bombing, kidnapping)
    • e.g. "Tell me about the terrorism action, such as bombing and kidnapping."
  • Goal
    • Find scenario-specific semantic clusters, each of which consists of
      • Semantic Lexicon
      • Extraction Patterns

10
Benefit for User
  • Simplify Domain Analysis
  • Low-cost Knowledge-base Acquisition for IE systems

11
Extraction Patterns
  • Definition

where c unifies with the context that is defined by semantic class L
Vsubj
Vobj
(cf. Sudo et al. 2001)
12
Outline
  • Introduction
  • Research Proposal
  • Problem Setting
  • Approach
  • Information Extraction
  • Evaluation

13
Overview
Semantic Clustering
15
Information Retrieval
  • Get Relevant Document set
  • Get list of lexical items and extraction patterns
    ordered by relevance to the scenario
  • TF/IDF scoring
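
The slide does not give the exact TF/IDF formula, so the following is only a minimal sketch under one common assumption: an item's score is its frequency in the relevant document set multiplied by the log of its inverse document frequency over the whole collection. The function name `tfidf_rank` and the toy documents are hypothetical. Items may be plain terms or extraction patterns (any hashable value), and the relevant set is assumed to be a subset of the collection.

```python
import math
from collections import Counter

def tfidf_rank(relevant_docs, all_docs):
    """Rank items by tf (in the relevant set) times idf (over the collection).

    Each document is a list of items. relevant_docs is assumed to be a
    subset of all_docs, so every scored item has a document frequency >= 1.
    """
    n_docs = len(all_docs)
    # Document frequency: number of collection documents containing each item.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))
    # Term frequency: total occurrences of each item in the relevant set.
    tf = Counter()
    for doc in relevant_docs:
        tf.update(doc)
    scores = {item: count * math.log(n_docs / df[item])
              for item, count in tf.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Toy collection (illustrative only); the first two documents form the relevant set.
all_docs = [
    ["president", "resign", "president", "company"],
    ["company", "name", "president"],
    ["company", "report", "profit"],
    ["market", "rise"],
]
ranking = tfidf_rank(all_docs[:2], all_docs)
for item, score in ranking:
    print(f"{item}\t{score:.3f}")
```

In this toy run "president" ranks first because it is frequent in the relevant set but absent from half of the collection, while "company" ranks last because it occurs almost everywhere.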

R
16
Example of TF/IDF scoring (Management Succession Business)
300 documents retrieved from WSJ (7/94 - 8/94)
Extracted by MINIPAR (Lin 1998)
17
Overview
Semantic Clustering
18
Bootstrapping
  • Find one cluster that consists of Lexicon and
    Extraction Patterns
  • Assumption
  • Patterns provide Lexical Classes.
  • Lexicon provides contextual information.

(Riloff and Jones 1999; Agichtein and Gravano 2000)
19
Bootstrapping (Cont.)
  • Algorithm (cf. Riloff and Jones 1999)
  • Given
    • the ordered list of terms
    • the ordered list of extraction patterns
  • Initialize Lexicon = {}, Pattern = {}
  • w ← the most relevant term in the list; add it to Lexicon
  • p ← the most relevant pattern among those that extract w
  • Add p to Pattern
  • w ← the most relevant term among those that are extracted by p
  • Add w to Lexicon
  • Go to 1
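
The loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proposal's implementation: the step numbering is lost in the transcript, so the sketch assumes the loop returns to the pattern-selection step; it also assumes that items already selected are skipped and that the loop stops when no new pattern or term is available (or after `max_rounds`). The function name `bootstrap`, the `extractions` mapping, and the toy data are all hypothetical.

```python
def bootstrap(ranked_terms, ranked_patterns, extractions, max_rounds=10):
    """Grow one cluster (lexicon, patterns) by alternating terms and patterns.

    ranked_terms / ranked_patterns: lists ordered most relevant first.
    extractions: dict mapping each pattern to the set of terms it extracts.
    """
    if not ranked_terms:
        return [], []
    term_rank = {t: i for i, t in enumerate(ranked_terms)}
    lexicon, patterns = [], []
    w = ranked_terms[0]  # seed: the most relevant term in the list
    lexicon.append(w)
    for _ in range(max_rounds):
        # Most relevant unused pattern among those that extract w.
        p = next((q for q in ranked_patterns
                  if q not in patterns and w in extractions.get(q, set())),
                 None)
        if p is None:
            break  # assumed stopping rule: no new pattern extracts w
        patterns.append(p)
        # Most relevant new term among those extracted by p.
        candidates = [t for t in extractions[p]
                      if t not in lexicon and t in term_rank]
        if not candidates:
            break  # assumed stopping rule: p yields no new term
        w = min(candidates, key=term_rank.get)
        lexicon.append(w)
    return lexicon, patterns

# Toy data (illustrative only).
ranked_terms = ["president", "chairman", "company", "executive"]
ranked_patterns = [("x", "resign"), ("name", "x"), ("x", "report")]
extractions = {
    ("x", "resign"): {"president", "chairman"},
    ("name", "x"): {"chairman", "executive"},
    ("x", "report"): {"company"},
}
lexicon, patterns = bootstrap(ranked_terms, ranked_patterns, extractions)
print(lexicon)
print(patterns)
```

On the toy data the cluster grows from "president" through the pattern `("x", "resign")` to "chairman", then through `("name", "x")` to "executive", and never reaches "company", which is only extracted by an unrelated pattern.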

20
Example of Bootstrapping (Management Succession Business)
From WSJ (7/94 - 8/94)
Extracted by MINIPAR (Lin 1998)
22
Problem: Polysemous Lexicon, Pattern
  • Lexicon can be ambiguous
    • e.g. Clinton (Person, Organization, Location)
  • Extraction patterns can be ambiguous
    • e.g. be killed in <x> (x: Location, Date)
  • Needs more study
    • more restriction
    • Probabilistic model?

23
Overview
Semantic Clustering
[Diagram labels: Source; Information Retrieval; Bootstrapping; Query Expansion]
24
Query Expansion
  • Generalize terms in a query with a newly
    discovered cluster
  • cf. Rocchio 1971 (vector model);
    Zhai and Lafferty 2001 (language modeling)
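
The slide only cites Rocchio-style feedback as a point of comparison, so the following is a hedged sketch of one way the idea could look in a vector model: the query vector is scaled by `alpha`, and the newly discovered cluster is treated as the positive feedback vector, with weight `beta` spread uniformly over its terms. The function name `expand_query`, the default weights, and the toy cluster are assumptions made for illustration.

```python
def expand_query(query, cluster_terms, alpha=1.0, beta=0.5):
    """Return alpha * query + beta * (uniform vector over the cluster's terms).

    query: dict mapping term -> weight.
    cluster_terms: list of terms from a discovered cluster.
    """
    expanded = {term: alpha * weight for term, weight in query.items()}
    if cluster_terms:
        share = beta / len(cluster_terms)  # equal share for each cluster term
        for term in cluster_terms:
            expanded[term] = expanded.get(term, 0.0) + share
    return expanded

# Toy query and cluster (illustrative only).
query = {"terrorism": 1.0, "bombing": 1.0}
cluster = ["bombing", "kidnapping", "attack", "explosion"]
expanded = expand_query(query, cluster)
for term, weight in sorted(expanded.items()):
    print(f"{term}\t{weight}")
```

Terms already in the query keep their original weight plus the cluster share, and new cluster terms enter with a small weight, so the expanded query retrieves documents that mention other members of the cluster without drowning out the user's own terms.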

25
Overview
Semantic Clustering
[Diagram labels: Source; Information Retrieval; Bootstrapping; Query Expansion]
26
Outline
  • Introduction
  • Research Proposal
  • Problem Setting
  • Approach
  • Application to Information Extraction
  • Discussion

27
Application to Information Extraction
28
Human Intervention
  • Extraction patterns
    • Event pattern
      • Context contains a verb or a nominalization of a verb
      • Used for event extraction and role assignment
      • e.g. (terrorist, fire, x)
    • Local pattern
      • Context contains only enough information to
        recognize the semantic class
      • Used for entity recognition only
      • e.g. (x, Inc.)
  • Association of Event Pattern to Role
    • e.g. (company, hire, x) → PersonIn and
      (company, fire, x) → PersonOut
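
To make the association of event patterns to roles concrete, here is a minimal sketch. It assumes that the human-supplied association can be stored as a lookup table keyed by (semantic class of the subject, verb), and that a local pattern such as (x, Inc.) has already assigned semantic classes to entity strings. The names `ROLE_OF_PATTERN`, `assign_roles`, and the example entities are hypothetical; only the two pattern-to-role pairs come from the slide.

```python
# Role table built from the slide's two examples; the table form is an assumption.
ROLE_OF_PATTERN = {
    ("company", "hire"): "PersonIn",
    ("company", "fire"): "PersonOut",
}

def assign_roles(triples, entity_class):
    """Map (subject, verb, object) triples to (role, filler) pairs.

    entity_class maps a surface string to its semantic class, as a local
    pattern might have determined it. Triples whose (class, verb) pair has
    no associated role are ignored.
    """
    filled = []
    for subj, verb, obj in triples:
        role = ROLE_OF_PATTERN.get((entity_class.get(subj), verb))
        if role is not None:
            filled.append((role, obj))
    return filled

# Toy input (illustrative only).
triples = [
    ("Acme Inc.", "hire", "Jane Doe"),
    ("Acme Inc.", "fire", "John Roe"),
    ("Acme Inc.", "report", "profit"),
]
entity_class = {"Acme Inc.": "company"}
roles = assign_roles(triples, entity_class)
print(roles)
```

The third triple is dropped because no role has been associated with the pattern (company, report, x), which is the part a human would decide when reviewing the acquired patterns.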

29
Outline
  • Introduction
  • Research Proposal
  • Problem Setting
  • Approach
  • Application to Information Extraction
  • Discussion

30
Discussion
  • Domain Portability
    • User only needs to specify the scenario
  • Language Portability
    • Language-dependent Tools
      • Segmentation (Lemmatization)
      • Dependency Parsing

31
Evaluation
  • MUC-style (Scenario-Template task)
    • Slot-based
    • Precision, Recall, F-measure
  • Domain Portability
    • Several pre-defined tasks that differ in
      difficulty
  • Language Portability
    • Japanese
    • English
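
As a reminder of how the slot-based precision, recall, and F-measure mentioned above are computed, here is a simplified sketch. It assumes exact matching of (slot, filler) pairs, which is coarser than official MUC scoring; the function name `slot_scores`, the slot names, and the toy fillers are hypothetical.

```python
def slot_scores(predicted, gold, beta=1.0):
    """Precision, recall and F-measure over sets of (slot, filler) pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    # F-measure: weighted harmonic mean of precision and recall.
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure

# Toy templates (illustrative only).
gold = {
    ("PersonIn", "Jane Doe"),
    ("PersonOut", "John Roe"),
    ("Post", "president"),
    ("Organization", "Acme Inc."),
}
predicted = {
    ("PersonIn", "Jane Doe"),
    ("PersonOut", "John Roe"),
    ("Post", "chairman"),
}
p, r, f = slot_scores(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Here two of three predicted slots are correct and two of four gold slots are found, so precision is 2/3, recall is 1/2, and the balanced F-measure is 4/7.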

32
Contribution
  • Tool for Domain Analysis
  • Low-cost Knowledge-base Acquisition
  • Towards Open-domain Information Extraction

33
Conclusion
  • Proposed New Approach for Knowledge-base
    Acquisition (Semantic Clustering)
  • Discussed Application of Acquired KB to
    Information Extraction (Human Intervention and
    Local vs. Event patterns)
  • Discussed Evaluation with several predefined
    MUC-style tasks different in difficulty and
    across languages (Domain portability and Language
    portability)

34
ToDo
  • Implementation
  • Preparation for Evaluation
  • Evaluation

35
Time for Questions (Conclusion)
  • Proposed New Approach for Knowledge-base
    Acquisition (Semantic Clustering)
  • Discussed Application of Acquired KB to
    Information Extraction (Human Intervention and
    Local vs. Event patterns)
  • Discussed Evaluation with several predefined
    MUC-style tasks different in difficulty and
    across languages (Domain portability and Language
    portability)