Information Extraction, Data Mining and Joint Inference - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Information Extraction, Data Mining and Joint Inference


1
Information Extraction, Data Mining and Joint Inference
  • Andrew McCallum
  • Computer Science Department
  • University of Massachusetts Amherst

Joint work with Charles Sutton, Aron Culotta,
Xuerui Wang, Ben Wellner, David Mimno, Gideon
Mann.
2
Goal
Mine actionable knowledgefrom unstructured text.
3
Extracting Job Openings from the Web
4
A Portal for Job Openings
5
Job Openings: Category = High Tech, Keyword = Java, Location = U.S.
6
Data Mining the Extracted Job Information
7
IE from Research Papers
McCallum et al., 1999
8
IE from Research Papers
9
Mining Research Papers
Rosen-Zvi, Griffiths, Steyvers, Smyth, 2004
Giles et al.
10
IE from Chinese Documents regarding Weather
Department of Terrestrial System, Chinese Academy of Sciences
200k documents, several centuries old: Qing Dynasty archives, memos, newspaper articles, diaries
11
What is Information Extraction?
As a family of techniques:
Information Extraction = segmentation + classification + clustering + association

October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered, saying ...

Extracted entities: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation
12
What is Information Extraction?
As a family of techniques:
Information Extraction = segmentation + classification + association + clustering
(Same example text as above.)
13
What is Information Extraction?
As a family of techniques:
Information Extraction = segmentation + classification + association + clustering
(Same example text as above.)
14
What is Information Extraction?
As a family of techniques:
Information Extraction = segmentation + classification + association + clustering
(Same example text as above.)

Extracted associations:

NAME              TITLE     ORGANIZATION
Bill Gates        CEO       Microsoft
Bill Veghte       VP        Microsoft
Richard Stallman  founder   Free Software Foundation
15
From Text to Actionable Knowledge

Pipeline (figure): Document collection → Spider → Filter → IE (segment, classify, associate, cluster) → Database → Data Mining (discover patterns: entity types, links/relations, events) → Actionable knowledge (prediction, outlier detection, decision support)
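To make the serial pipeline concrete, here is a minimal Python sketch; every function body is a hypothetical placeholder, not the actual system:

    # A toy rendering of the pipeline above; all implementations are stubs.
    def spider(urls):      return ["document text from " + u for u in urls]
    def filter_docs(docs): return [d for d in docs if d.strip()]
    def ie_extract(doc):   return {"mention": doc.split()[0]}   # segment/classify stub
    def mine(db):          return {"num_records": len(db)}      # pattern-discovery stub

    def text_to_knowledge(seed_urls):
        docs = filter_docs(spider(seed_urls))
        database = [ie_extract(d) for d in docs]   # IE populates the database
        return mine(database)                      # DM turns it into actionable knowledge

    print(text_to_knowledge(["http://example.com/jobs"]))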
16
Problem
  • Combined in serial juxtaposition, IE and DM are unaware of each other's weaknesses and opportunities.
  • DM begins from a populated DB, unaware of where the data came from, or its inherent errors and uncertainties.
  • IE is unaware of emerging patterns and regularities in the DB.
  • The accuracy of both suffers, and significant mining of complex text sources is beyond reach.

17
Solution: Uncertainty Info

Same pipeline (figure), but IE passes uncertainty info forward into Data Mining, and Data Mining passes emerging patterns back into IE: Document collection → Spider → Filter → IE (segment, classify, associate, cluster) → Database → Data Mining (discover patterns: entity types, links/relations, events) → Actionable knowledge (prediction, outlier detection, decision support)
18
Solution: Unified Model

Replace the intermediate Database with a single Probabilistic Model shared by IE and Data Mining (figure): Document collection → Spider → Filter → IE → Probabilistic Model → Data Mining → Actionable knowledge (prediction, outlier detection, decision support)
19
Scientific Questions
  • What model structures will capture salient
    dependencies?
  • Will joint inference actually improve accuracy?
  • How to do inference in these large graphical
    models?
  • How to do parameter estimation efficiently in these models, which are built from multiple large components?
  • How to do structure discovery in these models?

21
Outline
  • Examples of IE and Data Mining.
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
22
(Linear Chain) Conditional Random Fields
Lafferty, McCallum, Pereira 2001
Undirected graphical model, trained to
maximize conditional probability of output
(sequence) given input (sequence)
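For reference, the conditional probability defined by the linear-chain CRF of Lafferty, McCallum and Pereira (2001), in LaTeX form:

    p_\Lambda(\mathbf{y} \mid \mathbf{x})
      = \frac{1}{Z(\mathbf{x})}
        \prod_{t=1}^{T} \exp\Big( \sum_k \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
    \qquad
    Z(\mathbf{x}) = \sum_{\mathbf{y}} \prod_{t=1}^{T} \exp\Big( \sum_k \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big)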
Finite state model / graphical model (figure): a linear chain of FSM states y_{t-1}, y_t, y_{t+1}, ... (the output sequence) over observations x_{t-1}, x_t, x_{t+1}, ... (the input sequence). Example: input "said Jones a Microsoft VP" labeled with output sequence OTHER PERSON OTHER ORG TITLE.
23
Table Extraction from Government Reports
Cash receipts from marketings of milk during 1995 at 19.9 billion dollars, was slightly below 1994. Producer returns averaged $12.93 per hundredweight, $0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds, 1 percent above 1994. Marketings include whole milk sold to plants and dealers as well as milk sold directly to consumers.

An estimated 1.56 billion pounds of milk were used on farms where produced, 8 percent less than 1994. Calves were fed 78 percent of this milk with the remainder consumed in producer households.
Milk Cows and Production of Milk and Milkfat: United States, 1993-95

        Number of        Per Milk Cow 2/         Percentage of    Total 2/
Year    Milk Cows 1/     Milk       Milkfat      Fat in All Milk  Milk       Milkfat
        (1,000 Head)     (Pounds)   (Pounds)     (Percent)        (Million Pounds)
1993    9,589            15,704     575          3.66             150,582    5,514.4
1994    9,500            16,175     592          3.66             153,664    5,623.7
1995    9,461            16,451     602          3.66             155,644    5,694.3

1/ Average number during year, excluding heifers not yet fresh.
2/ Excludes milk sucked by calves.

24
Table Extraction from Government Reports
Pinto, McCallum, Wei, Croft, 2003 SIGIR
100 documents from www.fedstats.gov
Labels
CRF
  • Non-Table
  • Table Title
  • Table Header
  • Table Data Row
  • Table Section Data Row
  • Table Footnote
  • ... (12 in all)

(Same government report text as on the previous slide.)
Features
  • Percentage of digit chars
  • Percentage of alpha chars
  • Indented
  • Contains 5 consecutive spaces
  • Whitespace in this line aligns with prev.
  • ...
  • Conjunctions of all previous features, time
    offset 0,0, -1,0, 0,1, 1,2.
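A minimal Python sketch of a few of these line-level features (illustrative names, not the paper's code):

    def line_features(line, prev):
        """Per-line features for table extraction, as listed above."""
        n = max(len(line), 1)
        return {
            "pct_digit": sum(c.isdigit() for c in line) / n,
            "pct_alpha": sum(c.isalpha() for c in line) / n,
            "indented": line.startswith((" ", "\t")),
            "five_spaces": " " * 5 in line,
            # crude check: does whitespace in this line align with the previous line?
            "ws_aligns_prev": any(c == " " and i < len(prev) and prev[i] == " "
                                  for i, c in enumerate(line)),
        }

    print(line_features("1993    9,589   15,704", "Year    Cows    Milk"))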

25
Table Extraction Experimental Results
Pinto, McCallum, Wei, Croft, 2003 SIGIR
Method            Line labels (% correct)   Table segments (F1)
HMM               65                        64
Stateless MaxEnt  85                        -
CRF               95                        92
26
IE from Research Papers
McCallum et al., 1999
27
IE from Research Papers
Field-level F1:
Hidden Markov Models (HMMs)        75.6   (Seymore, McCallum, Rosenfeld, 1999)
Support Vector Machines (SVMs)     89.7   (Han, Giles, et al., 2003)
Conditional Random Fields (CRFs)   93.9   (Peng, McCallum, 2004)
(error reduction: 40%)
28
Named Entity Recognition
CRICKET - MILLNS SIGNS FOR BOLAND CAPE TOWN
1996-08-22 South African provincial side Boland
said on Thursday they had signed Leicestershire
fast bowler David Millns on a one year contract.
Millns, who toured Australia with England A in
1992, replaces former England all-rounder Phillip
DeFreitas as Boland's overseas professional.
Labels  Examples
PER     Yayuk Basuki, Innocent Butare
ORG     3M, KDP, Cleveland
LOC     Cleveland, Nirmal Hriday, The Oval
MISC    Java, Basque, 1,000 Lakes Rally
29
Automatically Induced Features
McCallum & Li, 2003, CoNLL
Index  Feature
0      inside-noun-phrase (o_{t-1})
5      stopword (o_t)
20     capitalized (o_{t+1})
75     word=the (o_t)
100    in-person-lexicon (o_{t-1})
200    word=in (o_{t+2})
500    word=Republic (o_{t+1})
711    word=RBI (o_t) and header=BASEBALL
1027   header=CRICKET (o_t) and in-English-county-lexicon (o_t)
1298   company-suffix-word (firstmention_{t+2})
4040   location (o_t) and POS=NNP (o_t) and capitalized (o_t) and stopword (o_{t-1})
4945   moderately-rare-first-name (o_{t-1}) and very-common-last-name (o_t)
4474   word=the (o_{t-2}) and word=of (o_t)
30
Named Entity Extraction Results
McCallum & Li, 2003, CoNLL

Method                          F1
HMMs: BBN's Identifinder        73
CRFs w/out Feature Induction    83
CRFs with Feature Induction     90
(feature induction based on Likelihood Gain)
31
Outline
  • Examples of IE and Data Mining.
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
32
Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Khashayar, McCallum, ICML 2004
Named-entity tag
Noun-phrase boundaries
Part-of-speech
English words
34
Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Khashayar, McCallum, ICML 2004
Named-entity tag
Noun-phrase boundaries
Part-of-speech
English words
But errors cascade--must be perfect at every
stage to do well.
35
Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Khashayar, McCallum, ICML 2004
Named-entity tag
Noun-phrase boundaries
Part-of-speech
English words
Joint prediction of part-of-speech and noun-phrase boundaries in newswire, matching accuracy with only 50% of the training data.
Inference: Loopy Belief Propagation
36
Outline
  • Examples of IE and Data Mining.
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
37
Joint co-reference among all pairs: Affinity Matrix CRF
Entity resolution / object correspondence

Figure: three mentions (". . . Mr Powell . . .", ". . . Powell . . .", ". . . she . . .") connected by pairwise Y/N co-reference variables with affinity scores (45, 11, -99).

25% reduction in error on co-reference of proper nouns in newswire.

Inference: correlational clustering / graph partitioning
McCallum, Wellner, IJCAI WS 2003, NIPS 2004
Bansal, Blum, Chawla, 2002
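A sketch of the model in math form (consistent with the slide, not a verbatim reproduction of the papers' equations): with a binary variable y_{ij} per mention pair and learned affinity scores w_{ij}, MAP inference is the partition maximizing within-cluster affinity subject to transitivity:

    p(\mathbf{y} \mid \mathbf{x}) \propto \exp\Big( \sum_{i,j} w_{ij}\, y_{ij} \Big),
    \qquad
    \hat{\mathbf{y}} = \arg\max_{\mathbf{y}\ \text{transitive}} \sum_{i,j} w_{ij}\, y_{ij}

where a large positive affinity (e.g. 45) pulls two mentions into the same entity and a large negative one (e.g. -99) pushes them apart.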
38
Joint Co-reference for Multiple Entity Types
Culotta & McCallum 2005

People (figure): mentions "Stuart Russell", "Stuart Russell", "S. Russel" connected by pairwise Y/N co-reference variables.
39
Joint Co-reference for Multiple Entity Types
Culotta & McCallum 2005

People and Organizations (figure): person mentions "Stuart Russell", "Stuart Russell", "S. Russel" and organization mentions "University of California at Berkeley", "Berkeley", "Berkeley", with pairwise Y/N co-reference variables within each type and dependencies across types.
40
Joint Co-reference for Multiple Entity Types
Culotta & McCallum 2005

(Same figure as above.)

Reduces error by 22%.
41
Joint Co-reference Experimental Results
Culotta & McCallum 2005

CiteSeer Dataset: 1500 citations, 900 unique papers, 350 unique venues

                 Paper            Venue
                 indep   joint    indep   joint
constraint       88.9    91.0     79.4    94.1
reinforce        92.2    92.2     56.5    60.1
face             88.2    93.7     80.9    82.8
reason           97.4    97.0     75.6    79.5
Micro Average    91.7    93.4     73.1    79.1

(error reduction: 20% on papers, 22% on venues)
42
4. Joint segmentation and co-reference
Extraction from and matching of research paper citations.

Figure: graphical model connecting observations (o), segmentations (s), citation attributes (c), co-reference decisions (y), database field values, and world knowledge, over citation pairs such as:

Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.

Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.

35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.

Inference: Sparse Generalized Belief Propagation
Wellner, McCallum, Peng, Hay, UAI 2004
see also Marthi, Milch, Russell, 2003; Pal, Sutton, McCallum, 2005
43
Outline
  • Examples of IE and Data Mining.
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
44
Sometimes pairwise comparisons are not enough.
  • Entities have multiple attributes (name, email, institution, location); we need to measure compatibility among them.
  • Having 2 given names is common, but not 4.
  • Need to measure size of the clusters of mentions.
  • ∃ a pair of last-name strings that differ by > 5?
  • We need measures on hypothesized entities.
  • We need first-order logic.
45
Toward High-Order Representations: Identity Uncertainty
..Howard Dean..
..H Dean..
..Dean Martin..
..Howard Martin..
..Dino..
..Howard..
47
Pairwise Co-reference Features
Howard Dean
Dean Martin
Howard Martin
48
Cluster-wise (higher-order) Representations
Howard Dean
SamePerson(Howard Dean, Howard Martin,
Dean Martin)?
Dean Martin
Howard Martin
49
Cluster-wise (higher-order) Representations

Dino
Martin
Dean Martin
Howard Dean
Howard Martin
Howie
50
This space complexity is common in first-order
probabilistic models
51
Markov Logic (Weighted 1st-order Logic): Using 1st-order Logic as a Template to Construct a CRF
Richardson & Domingos 2005

Figure: ground Markov network.

Grounding the Markov network requires space O(n^r), where n = number of constants and r = highest clause arity. For example, a clause of arity 3 over 1,000 constants yields 10^9 ground clauses.
52
How can we perform inference and learning in
models that cannot be grounded?
53
Inference in First-Order Models: SAT Solvers
  • Weighted SAT solvers (Kautz et al., 1997)
  • Requires complete grounding of network
  • LazySAT (Singla & Domingos, 2006)
  • Saves memory by only storing clauses that may become unsatisfied
  • Still requires exponential time to visit all ground clauses at initialization.

54
Inference in First-Order Models: Sampling
  • Gibbs Sampling
  • Difficult to move between high-probability configurations by changing single variables
  • Although, consider MC-SAT (Poon & Domingos, 2006)
  • An alternative: Metropolis-Hastings sampling
  • Can be extended to partial configurations
  • Only instantiate relevant variables
  • Successfully used in BLOG models (Milch et al., 2005)
  • Two parts: a proposal distribution and an acceptance distribution.

Culotta & McCallum 2006
55
Learning in First-Order Models
  • Sampling
  • Pseudo-likelihood
  • Voted Perceptron
  • We propose:
  • Conditional model to rank configurations
  • Intuitive objective function for
    Metropolis-Hastings

56
Contributions
  • Metropolis-Hastings sampling in an undirected
    model with first-order features
  • Discriminative training for Metropolis-Hastings

57
An Undirected Model of Identity Uncertainty
58
Toward High-Order Representations: Identity Uncertainty

Dino
Martin
Dean Martin
Howard Dean
Howard Martin
Howie
59
Model
First-order features over hypothesized clusters, e.g. {Dean Martin, Dino}, {Howard Martin, Howie Martin}, {Howard Dean, Governor Howie}:
f_w: SamePerson(x)    f_b: DifferentPerson(x, x')
60
Model
(Figure: factors over the clusters {Howard Martin, Howie Martin}, {Howard Dean, Governor Howie}, {Dean Martin, Dino}.)
61
Model
Z_X: sum over all possible configurations!
62
Proposal Distribution
Current configuration y: {Dean Martin, Howie Martin}, {Howard Martin, Dino}
Proposed move y → y'
63
Proposal Distribution
Current configuration y: {Dean Martin, Howie Martin}, {Howard Martin, Dino}
Proposed move y → y': re-cluster the mentions Dean Martin, Howie Martin, Howard Martin, Howie Martin
64
Proposal Distribution
Proposed move y → y': from the current configuration y = {Dean Martin, Howie Martin}, {Howard Martin, Dino} to a re-clustering of the mentions Dean Martin, Howie Martin, Howard Martin, Howie Martin
65
Inference with Metropolis-Hastings
  • y: configuration
  • p(y')/p(y): likelihood ratio
  • Ratio of P(Y|X)
  • Z_X cancels
  • q(y'|y): proposal distribution
  • probability of proposing move y → y'
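In standard Metropolis-Hastings form, a proposed move y → y' is accepted with probability

    a(y \to y') = \min\!\left( 1,\;
        \frac{p(y' \mid x)\, q(y \mid y')}{p(y \mid x)\, q(y' \mid y)} \right)

and because both configurations share the same normalizer Z_X, the likelihood ratio p(y'|x)/p(y|x) can be computed without ever summing over all configurations.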

66
Learning the Likelihood Ratio
Given a pair of configurations, learn to rank the
better configuration higher.
67
Learning the Likelihood Ratio
S(y): true evaluation of a configuration (e.g., F1)
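One plausible way to write this objective (our paraphrase of the slide, not necessarily the paper's exact loss): choose parameters so that the model's ranking of each sampled pair agrees with the true evaluation,

    \frac{p(y' \mid x)}{p(y \mid x)} > 1 \iff S(y') > S(y)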
68
Sampling Training Examples
  • Run sampler on training data
  • Generate training example for each proposed move
  • Iteratively retrain during sampling

69
Tying Parameters with Proposal Distribution
  • Proposal distribution q(y'|y): a cheap approximation to p(y')
  • Reuse a subset of the parameters of p(y)
  • E.g., in the identity uncertainty model:
  • Sample two clusters
  • Stochastic agglomerative clustering to propose a new configuration (see the sketch below)
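A minimal Python sketch of such a proposal (hypothetical, not the paper's code): sample two clusters and propose merging them.

    import random

    def propose_merge(clusters):
        """Propose y' from y by merging two randomly chosen clusters."""
        if len(clusters) < 2:
            return [set(c) for c in clusters]
        i, j = random.sample(range(len(clusters)), 2)
        merged = clusters[i] | clusters[j]
        proposal = [set(c) for k, c in enumerate(clusters) if k not in (i, j)]
        return proposal + [merged]

    print(propose_merge([{"Dean Martin", "Dino"}, {"Howard Martin"}, {"Howie Martin"}]))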

70
Experiments
71
Simplified Model
  • Use only within-cluster factors.
  • Inference with agglomerative clustering

(Figure: clusters {Dean Martin, Dino} and {Howard Martin, Howie Martin}.)
72
Experiments
  • Paper citation coreference
  • Author coreference
  • First-order features
  • All Titles Match
  • Exists Year MisMatch
  • Average String Edit Distance > X
  • Number of mentions

73
Results on Citation Data
Citeseer paper coreference results (pair F1):

             First-Order   Pairwise
constraint   82.3          76.7
reinforce    93.4          78.7
face         88.9          83.2
reason       81.0          84.9

Author coreference results (pair F1):

             First-Order   Pairwise
miller_d     41.9          61.7
li_w         43.2          36.2
smith_b      65.4          25.4
74
Outline
  • Examples of IE and Data Mining.
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
75
Motivation: Robust Relation Extraction

"George W. Bush graduated from Yale ..."  →  Name: George W. Bush, Education: Yale
"George W. Bush attended Yale ..."
"Bill Clinton attended Yale. Fellow alumnus George W. Bush ..."
"Yale is located in New Haven. When George W. Bush visited ..."

  • Pattern matching
  • Classifier with contextual and external features (thesaurus)
  • Relations with sparse, noisy, or complex contextual evidence?
  • How to learn predictive relational patterns?
  • Knowledge discovery from text
  • Mine web to discover unknown facts
76
Data
  • 270 Wikipedia articles
  • 1000 paragraphs
  • 4700 relations
  • 52 relation types
  • JobTitle, BirthDay, Friend, Sister, Husband, Employer, Cousin, Competition, Education, ...
  • Targeted for density of relations
  • Bush/Kennedy/Manning/Coppola families and friends

77
(No Transcript)
78
Relation Extraction as Named-Entity Recognition + Classification
  • "George W. Bush and his father, George H. W. Bush, ..."

79
Relation Extraction as Named-Entity Recognition + Classification
  • Difficulties with this approach:
  • enumerate all pairs of entities in a document
  • low signal/noise
  • errors in NER
  • if "Ford" is mislabeled as a company, it won't be part of a brother relation.

80
Relation Extraction as Sequence Labeling
  • "George W. Bush ... the son of George H. W. Bush ..."
  • Most entities are related to the subject
  • Folds together NER and relation extraction
  • Models dependency of adjacent relations
  • e.g. "Austrian physicist ..." → nationality, jobTitle
  • Lots of work on sequence labeling
  • HMMs, CRFs, ...

81
CRF Features
  • Context words
  • Lexicons
  • cities, states, names, companies
  • Regexp
  • Capitalization, ContainsDigits,
    ContainsPunctuation
  • Part-of-speech
  • Prefixes/suffixes
  • Conjunctions of these within window of size 6

82
Example Features
  • Father: "son of NAME", "father, NAME"
  • Brother: "his/her brother X"
  • Executive: "JOBTITLE of X"
  • Birthday: "born MONTH [0-9]"
  • Boss: "under JOBTITLE X"
  • Competition: "defeating NAME"
  • Award: "awarded DET X", "won DET X"

83
(No Transcript)
84
Mining Relational Features
  • Want to discover database regularities across documents that provide strong evidence of a relation
  • High precision:
  • parent(x,z) ∧ sibling(z,w) ∧ child(w,y) ⇒ cousin(x,y)
  • Low precision:
  • friends tend to attend the same schools

85
Mining Relational Features
  • Generate relational path features from the extracted (or true) database.
  • Paths between entities up to length k (see the sketch below).
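A minimal Python sketch of the path enumeration (hypothetical schema, not the actual miner):

    def path_features(db, x, y, k=3):
        """Return relation chains of length <= k leading from entity x to entity y.
        db maps an entity to a list of (relation, entity) edges."""
        found, frontier = [], [(x, [])]
        for _ in range(k):
            frontier = [(e2, path + [rel])
                        for e, path in frontier
                        for rel, e2 in db.get(e, [])]
            found += ["->".join(path) for e, path in frontier if e == y]
        return found

    db = {"George W. Bush": [("father", "George H. W. Bush")],
          "George H. W. Bush": [("sister", "Nancy Ellis Bush")],
          "Nancy Ellis Bush": [("son", "John Prescott Ellis")]}
    print(path_features(db, "George W. Bush", "John Prescott Ellis"))
    # ['father->sister->son']  (cf. the next slide: Cousin = Father's Sister's Son)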

86
George W. Bush → his father → George H. W. Bush
George W. Bush → his cousin → John Prescott Ellis
George H. W. Bush → his sister → Nancy Ellis Bush
Nancy Ellis Bush → her son → John Prescott Ellis

Cousin = Father's Sister's Son
87
"John Kerry celebrated with Stuart Forbes ..." → likely a cousin
88
Iterative DB Construction
  • "Joseph P. Kennedy, Sr ... son John F. Kennedy with Rose Fitzgerald ..."

Name                 Son
Joseph P. Kennedy    John F. Kennedy
Rose Fitzgerald      John F. Kennedy  (0.3)
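The iterative loop itself, as a minimal Python sketch; extract, mine_paths, and retrain are hypothetical stubs standing in for the CRF extractor, the path-feature miner, and re-training:

    def extract(model, doc):    return [(doc[:20], "Son", 0.9)]     # stub IE
    def mine_paths(db):         return ["Father->Sister->Son"]      # stub DM
    def retrain(model, feats):  return model + feats                # stub trainer

    def iterative_db_construction(docs, model, rounds=3):
        db = []
        for _ in range(rounds):
            db = [rel for d in docs for rel in extract(model, d)]   # IE fills the DB
            model = retrain(model, mine_paths(db))                  # mined features feed back
        return db, model

    print(iterative_db_construction(["Joseph P. Kennedy, Sr ..."], []))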
89
Results
          ME      CRF     RCRF    RCRF .9   RCRF .5   RCRF Truth   RCRF Truth .5
F1      .5489   .5995   .6100     .6008     .6136        .6791           .6363
Prec    .6475   .7019   .6799     .7177     .7095        .7553           .7343
Recall  .4763   .5232   .5531     .5166     .5406        .6169           .5614

ME = maximum entropy; CRF = conditional random field; RCRF = CRF + mined features
90
Examples of Discovered Relational Features
  • Mother = Father→Wife
  • Cousin = Mother→Husband→Nephew
  • Friend = Education→Student
  • Education = Father→Education
  • Boss = Boss→Son
  • MemberOf = Grandfather→MemberOf
  • Competition = PoliticalParty→Member→Competition

91
Outline
  • Examples of IE and Data Mining.
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
92
Mining our Research Literature
  • Better understand structure of our own research
    area.
  • Structure helps us learn a new field.
  • Aid collaboration
  • Map how ideas travel through social networks of
    researchers.
  • Aids for hiring and finding reviewers!

93
Previous Systems
94
(No Transcript)
95
Previous Systems
(Figure: Research Paper entities linked by Cites relations.)
96
More Entities and Relations
(Figure: entities Research Paper, Person, Grant, University, Venue, Groups; relations including Cites and Expertise.)
97-114
(No transcript: image-only slides.)
115
Topical Transfer
Mann, Mimno, McCallum, JCDL 2006
Citation counts from one topic to another: map producers and consumers.
116
Impact Diversity
Topic Diversity = entropy of the distribution of citing topics
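In symbols: if p(t) is the fraction of citations a paper or venue receives from topic t, its topic diversity is the entropy

    H = -\sum_t p(t) \log p(t)

which is highest when citations come evenly from many topics.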
117
Summary
  • Joint inference is needed to avoid cascading errors in information extraction and data mining.
  • Challenge: making inference and learning scale to massive graphical models.
  • Markov-chain Monte Carlo
  • Rexa: a new research paper search engine, mining the interactions in our community.