Relational Learning of Pattern-Match Rules for Information Extraction

Provided by: max85

1
Relational Learning of Pattern-Match Rules for
Information Extraction
  • Mary Elaine Califf
  • Raymond J. Mooney

2
Motivation
  • A growing number of electronic documents contain
    a large amount of information
  • Building information extraction (IE) systems is
    time-consuming
  • IE systems rely on highly domain-specific
    components

3
RAPIER
  • Uses relational learning to construct unbounded
    pattern-match rules, given a database of texts
    and filled templates
  • Primarily consists of a bottom-up search
  • Employs limited syntactic and semantic
    information
  • Learns rules for the complete IE task

4
Filled template of RAPIER

5
Relational learning and Inductive Logic
Programming (ILP)
  • Allow induction over structured examples that can
    include first-order logical representations and
    unbounded data structures
  • Have worked well for text categorization and for
    generating the past tense of English verbs

6
Other ILP Systems
  • GOLEM
  • CHILLIN
  • PROGOL

7
RAPIER's rule representation
  • Indexed by template name and slot name
  • Consists of three parts
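  • 1. A pre-filler pattern (matches the text
    immediately before the filler)
  • 2. A filler pattern (matches the actual slot
    filler)
  • 3. A post-filler pattern (matches the text
    immediately after the filler)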
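
The three-part layout above could be represented roughly as follows. This is only a sketch: the class name `Rule`, the helper `index_rules`, the template and slot names, and the use of plain strings as pattern placeholders are assumptions made for illustration, not RAPIER's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Rule:
    """Hypothetical container for one extraction rule."""
    template: str                                          # template name (part of the index key)
    slot: str                                              # slot name (part of the index key)
    pre_filler: List[str] = field(default_factory=list)    # pattern before the filler
    filler: List[str] = field(default_factory=list)        # pattern for the slot filler itself
    post_filler: List[str] = field(default_factory=list)   # pattern after the filler


def index_rules(rules: List[Rule]) -> Dict[Tuple[str, str], List[Rule]]:
    """Group rules by (template name, slot name)."""
    index: Dict[Tuple[str, str], List[Rule]] = {}
    for rule in rules:
        index.setdefault((rule.template, rule.slot), []).append(rule)
    return index


# Illustrative values only; the pattern strings are placeholders.
rules = [
    Rule("job_posting", "city", ["in"], ["<proper noun>"], [","]),
    Rule("job_posting", "city", ["located", "in"], ["<proper noun>"], []),
    Rule("job_posting", "state", [","], ["<proper noun>"], []),
]
by_slot = index_rules(rules)
```

Indexing by the pair (template, slot) keeps all rules for a given slot together, which is convenient when each slot's rules are learned and applied independently.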

8
Pattern
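  • A pattern is a sequence of pattern items and
    pattern lists
  • A pattern item matches exactly one word
  • A pattern list has a maximum length N and matches
    0..N words
  • Each matched word must satisfy a set of
    constraints
  • 1. Constraints may be on the specific word, its
    part-of-speech (POS) tag, or its semantic class
  • 2. Each constraint may be a disjunctive list of
    alternatives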
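
The matching behaviour described above can be sketched as a small matcher. Everything here is an assumption for illustration: the names `Element`, `satisfies` and `match_ends` are hypothetical, tokens are simplified to (word, POS tag) pairs, the POS tags are hand-assigned, and semantic-class constraints are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag); tags below are hand-assigned


@dataclass(frozen=True)
class Element:
    """Hypothetical pattern element.

    max_len=None -> pattern item (exactly one word)
    max_len=N    -> pattern list (0..N words)
    A constraint of None means "unconstrained"; a set is a disjunction.
    """
    words: Optional[FrozenSet[str]] = None
    tags: Optional[FrozenSet[str]] = None
    max_len: Optional[int] = None


def satisfies(el: Element, tok: Token) -> bool:
    word, tag = tok
    if el.words is not None and word.lower() not in el.words:
        return False
    if el.tags is not None and tag not in el.tags:
        return False
    return True


def match_ends(pattern: List[Element], tokens: List[Token], start: int) -> Set[int]:
    """Return every position at which the pattern can end when begun at `start`."""
    positions = {start}
    for el in pattern:
        nxt: Set[int] = set()
        for pos in positions:
            if el.max_len is None:
                # Pattern item: consume exactly one satisfying word.
                if pos < len(tokens) and satisfies(el, tokens[pos]):
                    nxt.add(pos + 1)
            else:
                # Pattern list: consume 0..max_len satisfying words.
                nxt.add(pos)
                end = pos
                while (end < len(tokens) and end - pos < el.max_len
                       and satisfies(el, tokens[end])):
                    end += 1
                    nxt.add(end)
        positions = nxt
        if not positions:
            break
    return positions


pattern = [
    Element(tags=frozenset({"NN", "NNP"})),                          # one noun
    Element(max_len=2),                                              # up to two arbitrary words
    Element(words=frozenset({"undisclosed"}), tags=frozenset({"JJ"})),
]
tokens = [("paid", "VBD"), ("Honeywell", "NNP"), ("an", "DT"),
          ("undisclosed", "JJ"), ("price", "NN")]
ends = match_ends(pattern, tokens, 1)  # start at "Honeywell"
```

Tracking a set of candidate end positions, rather than a single one, is what lets a pattern list absorb anywhere from zero to N words without backtracking.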

9
An example rule
Sold to the bank for an undisclosed amount
Paid Honeywell an undisclosed price

10
RAPIER's Learning Algorithm
  • Begins with the most specific definition and
    compresses it by replacing rules with more
    general ones
  • Attempts to compress the rules for each slot
  • Prefers more specific rules

11
Implementation
  • Based on least general generalization (LGG) of
    pairs of rules
  • Starts with rules containing only generalizations
    of the filler patterns
  • Employs top-down beam search for the pre-filler
    and post-filler patterns
  • Rules are ordered using an information gain
    metric and weighted by the size of the rule
    (preferring smaller rules)
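
To make the generalization step more concrete, here is a minimal sketch of how the constraints on two pattern elements might be generalized. The slides only name LGG; the specific choice shown here (keep identical constraints, otherwise offer either the disjunction of the two or no constraint at all) and the function names are assumptions for illustration.

```python
from itertools import product
from typing import FrozenSet, List, Optional, Tuple

Constraint = Optional[FrozenSet[str]]  # None means "unconstrained"


def generalize_constraint(a: Constraint, b: Constraint) -> List[Constraint]:
    """Candidate generalizations covering both constraints (assumed scheme)."""
    if a is None or b is None:
        return [None]          # one side is already unconstrained
    if a == b:
        return [a]             # identical constraints are kept as they are
    return [a | b, None]       # disjunction of the two, or drop the constraint


def generalize_element(
    words_a: Constraint, tags_a: Constraint,
    words_b: Constraint, tags_b: Constraint,
) -> List[Tuple[Constraint, Constraint]]:
    """All (word constraint, tag constraint) combinations for one element pair."""
    return list(product(generalize_constraint(words_a, words_b),
                        generalize_constraint(tags_a, tags_b)))


# Two filler words with the same hand-assigned tag (illustrative values).
candidates = generalize_element(
    frozenset({"atlanta"}), frozenset({"NNP"}),
    frozenset({"kansas"}), frozenset({"NNP"}),
)
```

Because each differing constraint yields more than one candidate, the number of generalized rules grows quickly with pattern length, which is one reason to begin with the filler alone and search for context patterns with a bounded beam.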

12
Example
Located in Atlanta, Georgia.
Offices in Kansas City, Missouri.

13
Example (cont.)

14
Example (cont.)
Final best rule

15
Experimental Evaluation
  • A set of 300 computer-related job postings from
    austin.jobs
  • A set of 485 seminar announcements from CMU
  • Three different versions of RAPIER were tested
  • 1. Words, POS tags, semantic classes
  • 2. Words, POS tags
  • 3. Words

16
Other learning IE systems
  • Naïve Bayes system: uses words in a fixed-length
    window to locate slot fillers
  • SRV: uses a top-down, set-covering rule learner
    and four pre-determined predicates
  • WHISK: uses pattern matching and a restricted
    form of regular expressions

17
Performance on job postings

18
Results for seminar announcement task

19
Conclusion
  • Pros
  • 1. Has the potential to help automate the
    development of IE systems
  • 2. Works well at locating specific data in
    newsgroup messages
  • 3. Identifies potential slot fillers and their
    surrounding context using limited syntactic and
    semantic information
  • 4. Learns rules from relatively small sets of
    examples in some specific domains
  • Cons
  • 1. Single slot
  • 2. Regular expression
  • 3. Unknown performance in more complicated
    situations

20
Questions?