UMass and Learning for CALO - PowerPoint PPT Presentation

About This Presentation
Title:

UMass and Learning for CALO

Description:

UMass and Learning for CALO. Andrew McCallum, Information Extraction & Synthesis Laboratory, Department of Computer Science, University of Massachusetts.

Slides: 57

Transcript and Presenter's Notes

Title: UMass and Learning for CALO


1
UMass and Learning for CALO
  • Andrew McCallum
  • Information Extraction & Synthesis Laboratory
  • Department of Computer Science
  • University of Massachusetts

2
Outline
  • CC-Prediction
      • Learning in the wild from user email usage
  • DEX
      • Learning in the wild from user corrections, as
        well as KB records filled by other CALO
        components
  • Rexa
      • Learning in the wild from user corrections to
        coreference, propagating constraints in a
        Markov-Logic-like system that scales to 20
        million objects
  • Several new topic models
      • Discover interesting, useful structure without the
        need for supervision, learning from newly
        arrived data on the fly

3
CC Prediction Using Various Exponential Family
Factor Graphs
  • Learning to keep an organization connected and avoid
    stove-piping.
  • First steps toward ad-hoc team creation.
  • Learning in the wild from users' CC behavior, and
    from other parts of the CALO ontology.

4
Graphical Models for Email
  • Compute P(y|x) for CC prediction

[Figure: factor graph for the email model. Legend: function, random variable, N replications. Nodes: y, recipient of email; xb, body words (Nb); xs, subject words (Ns); xr, other recipients (Nr-1).]
The graph describes the joint distribution of random variables in terms of the product of local functions.
Email model: Nb words in the body, Ns words in the subject, Nr recipients.
  • Local functions facilitate system engineering
    through modularity
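The idea of scoring a candidate recipient as a product of local functions can be sketched as follows. This is a minimal illustration, not the system described on the slide: the log-linear form of each local function, the weight tables `w_body`, `w_subject`, `w_recip`, and the example names are all assumptions made for the example.

```python
import math

def local_score(weights, candidate, items):
    """Log of one local function: summed weights of (candidate, item) pairs."""
    return sum(weights.get((candidate, item), 0.0) for item in items)

def cc_distribution(candidates, body, subject, others, w_body, w_subject, w_recip):
    """P(y | x), proportional to the product of three local functions."""
    log_scores = {
        y: local_score(w_body, y, body)
        + local_score(w_subject, y, subject)
        + local_score(w_recip, y, others)
        for y in candidates
    }
    # Normalize with the log-sum-exp trick for numerical stability.
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {y: math.exp(s - top) / z for y, s in log_scores.items()}

# Hypothetical weights: "alice" is associated with the body word "budget".
probs = cc_distribution(
    candidates=["alice", "bob"],
    body=["budget", "meeting"],
    subject=["meeting"],
    others=["carol"],
    w_body={("alice", "budget"): 1.0},
    w_subject={},
    w_recip={},
)
```

Because each local function is a separate table, one can be retrained or replaced without touching the others, which is the modularity point made above.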

5
Document Models
  • Models may include relational attributes

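[Figure: factor graph for the document model. Target node: y, author of document (Na). Attribute nodes: title, abstract, body, co-authors, references, with replication counts Nb, Ns, Na-1, Nr, Nt.]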
  • We can optimize P(y|x) for classification
    performance and P(x|y) for model interpretability
    and parameter transfer (to other models)
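One way to write down a combined objective of this kind is a weighted sum of the two conditional log-likelihoods. The weights \alpha and \beta and the exact form below are assumptions for illustration; the transcript does not preserve the slide's own equations.

```latex
\mathcal{O}(\theta) \;=\; \alpha \sum_{i} \log P_{\theta}(y_i \mid x_i) \;+\; \beta \sum_{i} \log P_{\theta}(x_i \mid y_i)
```

Under this reading, setting \beta = 0 gives ordinary conditional training for classification, while \beta > 0 also rewards parameters that explain the inputs given the label.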

6
CC Prediction and Relational Attributes
[Figure: factor graph for CC prediction with relational attributes. Target node: y, target recipient (Nr). Input nodes: body words xb (Nb), subject words xs (Ns), other recipients xr (Nr-1), recipient relations, thread relations xtr (Ntr).]
Thread relations: e.g., was a given recipient ever included on this email thread?
Recipient relationships: e.g., does one of the other recipients report to the target recipient?
7
CC-Prediction Learning in the Wild
  • As documents are added to Rexa, models of
    expertise for authors grow
  • As DEX obtains more contact information and
    keywords, organizational relations emerge
  • Model parameters can be adapted on-line
  • Priors on parameters can be used to transfer
    learned information between models
  • New relations can be added on-line
  • Modular model construction and intelligent model
    optimization enable these goals
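The combination of on-line adaptation, priors for transfer, and new relations added on-line might look like the sketch below. It assumes a binary logistic model and a Gaussian prior centered on parameters transferred from another model; the function name, learning rate, and prior strength are illustrative choices, not details from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(weights, prior_mean, features, label, lr=0.1, prior_strength=0.01):
    """One stochastic gradient step on log P(y|x) plus a Gaussian log-prior.

    weights, prior_mean: dicts from feature name to float.
    features: dict from feature name to value for one email.
    label: 1 if the candidate was CC'ed, else 0.
    """
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    error = label - sigmoid(score)
    updated = dict(weights)
    for f, v in features.items():
        w = weights.get(f, 0.0)   # unseen features (new relations) start at 0
        mu = prior_mean.get(f, 0.0)
        # Likelihood gradient plus a pull toward the transferred prior mean.
        updated[f] = w + lr * (error * v - prior_strength * (w - mu))
    return updated

# A feature that did not exist before simply appears in the feature dict.
w = online_update({}, {}, {"on_thread_before": 1.0}, label=1)
```

Features absent from `weights` are treated as zero, so a newly added relation needs no change to the model structure.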

8
CC Prediction: Upcoming work on
Multi-Conditional Learning
  • A discriminatively-trained topic model,
  • discovering low-dimensional representations for
  • transfer learning and improved regularization and
    generalization.

9
Objective Functions for Parameter Estimation
Traditional
New, multi-conditional
10
Multi-Conditional Learning (Regularization)
McCallum, Pal, Wang, 2006
11
Multi-Conditional Mixtures
12
Predictive Random Fields: mixture of Gaussians on
synthetic data
McCallum, Wang, Pal, 2005
[Figure panels: data, classify by color; generatively trained; conditionally trained (Jebara 1998); multi-conditional.]
13
Multi-Conditional Mixtures vs. Harmonium on
document retrieval task
McCallum, Wang, Pal, 2005
[Figure legend:]
  • Multi-conditional, multi-way conditionally trained
  • Conditionally trained, to predict class labels
  • Harmonium, joint, with class labels and words
  • Harmonium, joint with words, no labels
14
DEX
  • Beginning with a review of previous work,
  • then new work on record extraction,
  • with the ability to leverage new KBs in the wild,
    and for transfer

15
System Overview
[Figure: system overview. Labeled components: CRF, WWW, Email, names.]
16
An Example
To: Andrew McCallum mccallum_at_cs.umass.edu
Subject: ...
First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governors Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,
Key Words: Information extraction, social network,
Search for new people
17
Summary of Results
Example keywords extracted
Person: Keywords
William Cohen: Logic programming, Text categorization, Data integration, Rule learning
Daphne Koller: Bayesian networks, Relational models, Probabilistic models, Hidden variables
Deborah McGuinness: Semantic web, Description logics, Knowledge representation, Ontologies
Tom Mitchell: Machine learning, Cognitive states, Learning apprentice, Artificial intelligence
Contact info and name extraction performance (25 fields)
       Token Acc   Field Prec   Field Recall   Field F1
CRF    94.50       85.73        76.33          80.76
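As a quick check on the table, field-level F1 is the harmonic mean of field precision and recall. The helper name below is illustrative; the two input numbers are taken from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Field precision and recall reported for the CRF, in percent.
field_f1 = f1_score(85.73, 76.33)
print(round(field_f1, 2))  # 80.76
```

The result agrees with the Field F1 column.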
  1. Expert Finding: When solving some task, find
    friends-of-friends with relevant expertise.
    Avoid stove-piping in large orgs by
    automatically suggesting collaborators. Given a
    task, automatically suggest the right team for
    the job. (Hiring aid!)
  2. Social Network Analysis: Understand the social
    structure of your organization. Suggest
    structural changes for improved efficiency.

18
Importance of accurate DEX fields in IRIS
  • Information about
  • people
  • contact information
  • email
  • affiliation
  • job title
  • expertise
  • ...
  • are key to answering many CALO questions...
  • both directly, and as supporting inputs to
    higher-level questions.

19
Learning Field Compatibilities in DEX
Professor Jane Smith University of
California 209-555-5555 Professor Smith chairs
the Computer Science Department. She hails from
Boston, her administrative assistant John
Doe Administrative Assistant University of
California 209-444-4444
20
Learning Field Compatibilities in DEX
Extracted Record
Professor Jane Smith University of
California 209-555-5555 Professor Smith chairs
the Computer Science Department. She hails from
Boston, her administrative assistant John
Doe Administrative Assistant University of
California 209-444-4444
Extracted fields:
Name: Jane Smith, John Doe
JobTitle: Professor, Administrative Assistant
Company: U of California
Department: Computer Science
Phone: 209-555-5555, 209-444-4444
City: Boston
Fields grouped into records:
Record 1: Jane Smith, University of California, 209-555-5555, Computer Science, Professor, Boston
Record 2: John Doe, Administrative Assistant, University of California, 209-444-4444
21
Learning Field Compatibilities in DEX
  • 35% error reduction over transitive closure
  • Qualitatively better than heuristic approach
  • Mine Knowledge Bases from other parts of IRIS
    for learning compatibility rules among fields
  • Professor job title co-occurs with University
    company
  • Area code / city compatibility
  • Senator job title co-occurs with Washington,
    D.C. location
  • In the wild
  • As the user adds new fields and makes corrections,
    DEX learns from this KB data
  • Transfer learning
  • between departments/industries

22
Rexa: A knowledge base of publications, grants,
people, their expertise, topics, and
inter-connections
  • Learning for information extraction and
    coreference.
  • Incrementally leveraging multiple sources of
    information for improved coreference
  • Gathering information about people's expertise
    and co-author, citation relations
  • First a tour of Rexa, then slides about learning

23
Previous Systems
24
Previous Systems
[Figure: entity-relation diagram. Entity: Research Paper. Relation: Cites.]
25
More Entities and Relations
[Figure: entity-relation diagram. Entities: Research Paper, Person, Grant, University, Venue, Groups. Relations: Cites, Expertise.]
38
Learning in Rexa
  • Extraction, coreference. In the wild: re-adjusting
    KB after corrections from a user
  • Also, learning research topics/expertise, and
    their interconnections

39
(Linear Chain) Conditional Random Fields
Lafferty, McCallum, Pereira 2001
Undirected graphical model, trained to
maximize conditional probability of output
sequence given input sequence.
[Figure: finite state model and graphical model views. Output sequence (FSM states) y(t-1), y(t), y(t+1), y(t+2), y(t+3), labeled OTHER PERSON OTHER ORG TITLE; input sequence (observations) x(t-1), x(t), x(t+1), x(t+2), x(t+3), with the words "said Jones a Microsoft VP".]
(500 citations)
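For reference, the usual textbook statement of a linear-chain CRF is given below. The slide's own equation is not preserved in this transcript, so the notation (feature functions f_k, weights \lambda_k, normalizer Z(x)) is the standard form and an assumption about what the slide showed.

```latex
P(y \mid x) \;=\; \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\Big( \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \prod_{t=1}^{T} \exp\Big( \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Training chooses the weights \lambda_k to maximize the conditional log-likelihood of the labeled output sequences given their input sequences.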
40
IE from Research Papers
McCallum et al 99
41
IE from Research Papers
Field-level F1
Hidden Markov Models (HMMs): 75.6 [Seymore, McCallum, Rosenfeld, 1999]
Support Vector Machines (SVMs): 89.7 [Han, Giles, et al, 2003]
Conditional Random Fields (CRFs): 93.9 [Peng, McCallum, 2004]
Δ error: 40%
(Word-level accuracy is >99%)
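The error figure on this slide appears to be a relative error reduction between the SVM and CRF rows; that reading is an assumption, but the arithmetic is consistent with it. A small sketch, with an illustrative helper name:

```python
def relative_error_reduction(baseline_f1, new_f1):
    """Percent reduction in error (100 - F1) when moving from baseline to new."""
    baseline_error = 100.0 - baseline_f1
    new_error = 100.0 - new_f1
    return 100.0 * (baseline_error - new_error) / baseline_error

# SVM (89.7) to CRF (93.9): error falls from 10.3 to 6.1 points.
reduction = relative_error_reduction(89.7, 93.9)
```

This works out to a little under 41 percent, in line with the roughly 40 percent stated on the slide.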
42
Joint segmentation and co-reference
Extraction from and matching of research paper
citations.
[Figure: graphical model over two example citations, with node symbols o, s, c, y, p and labels: segmentation, citation attributes, co-reference decisions, database field values, world knowledge.]
Example citations:
  • Laurel, B. Interface Agents: Metaphors with
    Character, in The Art of Human-Computer
    Interface Design, B. Laurel (ed), Addison-Wesley,
    1990.
  • Brenda Laurel. Interface Agents: Metaphors with
    Character, in Laurel, The Art of Human-Computer
    Interface Design, 355-366, 1990.
35% reduction in co-reference error by using
segmentation uncertainty.
6-14% reduction in segmentation error by using
co-reference.
Inference: variant of Iterated Conditional Modes [Besag, 1986]
Wellner, McCallum, Peng, Hay, UAI 2004
see also Marthi, Milch, Russell, 2003
43
Rexa: Learning in the Wild from User Feedback
  • Coreference will never be perfect.
  • Rexa allows users to enter corrections to
    coreference decisions
  • Rexa then uses this feedback to
  • re-consider other inter-related parts of the KB
  • automatically make further error corrections by
    propagating constraints
  • (Our coreference system uses underlying ideas
    very much like Markov Logic, and scales to 20
    million mention objects.)
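To make the idea of propagating a user correction concrete, here is a deliberately simple stand-in: union-find clusters with must-link and cannot-link constraints. This is not Rexa's coreference system, which the slide describes as Markov-Logic-like; the class, method names, and mention strings are hypothetical.

```python
class CorefKB:
    """Mention clusters with user-supplied cannot-link constraints."""

    def __init__(self, mentions):
        self.parent = {m: m for m in mentions}
        self.cannot = set()  # pairs of mentions a user said are different

    def find(self, m):
        # Follow parent pointers to the cluster representative (path halving).
        while self.parent[m] != m:
            self.parent[m] = self.parent[self.parent[m]]
            m = self.parent[m]
        return m

    def allowed(self, a, b):
        ra, rb = self.find(a), self.find(b)
        return not any({self.find(x), self.find(y)} == {ra, rb}
                       for x, y in self.cannot)

    def merge(self, a, b):
        """Merge two clusters unless a cannot-link constraint forbids it."""
        if self.find(a) == self.find(b):
            return True
        if not self.allowed(a, b):
            return False
        self.parent[self.find(a)] = self.find(b)
        return True

    def forbid(self, a, b):
        self.cannot.add((a, b))

kb = CorefKB(["A. McCallum", "Andrew McCallum", "A. K. McCallum", "A. MacCallum"])
kb.merge("A. McCallum", "Andrew McCallum")      # automatic decision
kb.forbid("A. MacCallum", "Andrew McCallum")    # user correction: different people
kb.merge("A. K. McCallum", "A. McCallum")       # user correction: same person
blocked = not kb.merge("A. MacCallum", "A. K. McCallum")
```

The last merge is rejected even though the user never mentioned that pair directly: the cannot-link constraint follows the cluster, which is the sense in which one correction fixes other parts of the KB.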

44
Finding Topics in 1 million CS papers
200 topics and keywords automatically discovered.
45
Topical Transfer
Citation counts from one topic to another.
Map producers and consumers
46
Topical Diversity
Find the topics that are cited by many other
topics, measuring diversity of impact. Entropy
of the topic distribution among papers that
cite this paper (this topic).
[Figure panels: low diversity; high diversity.]
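The diversity measure described here, the entropy of the topic distribution among citing papers, can be sketched as follows. It assumes each citing paper has been assigned a single topic label; the function name and example topics are illustrative.

```python
import math
from collections import Counter

def topic_entropy(citing_topics):
    """Shannon entropy (bits) of the topic distribution among citing papers."""
    counts = Counter(citing_topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# All citations come from one topic: low diversity.
low = topic_entropy(["speech", "speech", "speech", "speech"])
# Citations spread evenly over four topics: high diversity.
high = topic_entropy(["speech", "vision", "theory", "databases"])
```

A paper or topic cited from a single area scores zero, while citations spread evenly over four areas give two bits.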
47
Some New Work on Topic Models
  • Robustly capturing topic correlations
  • Pachinko Allocation Model
  • Capturing phrases in topic-specific ways
  • Topical N-Gram Model

48
Pachinko Machine
49
Pachinko Allocation Model
Li, McCallum, 2005
[Figure: model structure, not the graphical model. A hierarchy of nodes: 11 at the root; 21 and 22, distributions over distributions over topics; 31, 32, 33, distributions over topics (mixtures, representing topic correlations); 41 through 45, distributions over words (like LDA topics); leaves word1 through word8.]
Some interior nodes could contain one
multinomial, used for all documents. (i.e. a very
peaked Dirichlet)
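A generative sketch may help fix the structure. The code below covers only a four-level special case (root, super-topics, sub-topics, words), which is shallower than the hierarchy pictured on the slide, and it treats the sub-topic word distributions as fixed. All parameter values and names are made up for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(n_words, root_alpha, super_alphas, topic_word_probs, vocab, rng):
    # Per-document multinomial over super-topics, drawn at the root.
    theta_root = sample_dirichlet(root_alpha, rng)
    # Per-document multinomial over sub-topics, one for each super-topic.
    theta_super = [sample_dirichlet(alpha, rng) for alpha in super_alphas]
    words = []
    for _ in range(n_words):
        s = sample_index(theta_root, rng)           # pick a super-topic
        z = sample_index(theta_super[s], rng)       # pick a sub-topic
        w = sample_index(topic_word_probs[z], rng)  # pick a word
        words.append(vocab[w])
    return words

rng = random.Random(0)
vocab = ["word%d" % i for i in range(1, 9)]
topic_word_probs = [           # three sub-topics over eight words
    [4, 4, 1, 1, 1, 1, 1, 1],
    [1, 1, 4, 4, 4, 1, 1, 1],
    [1, 1, 1, 1, 1, 4, 4, 4],
]
doc = generate_document(
    n_words=20,
    root_alpha=[1.0, 1.0],                          # two super-topics
    super_alphas=[[5.0, 5.0, 0.5], [0.5, 5.0, 5.0]],
    topic_word_probs=topic_word_probs,
    vocab=vocab,
    rng=rng,
)
```

Each super-topic's Dirichlet parameters say which sub-topics tend to appear together, which is how this kind of model represents topic correlations.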
50
Topic Coherence Comparison
LDA 100 (estimation, some junk): estimation likelihood maximum noisy estimates mixture scene surface normalization generated measurements surfaces estimating estimated iterative combined figure divisive sequence ideal
LDA 20 (models, estimation, stopwords): models model parameters distribution bayesian probability estimation data gaussian methods likelihood em mixture show approach paper density framework approximation markov
Example super-topic:
33: input hidden units function number
27: estimation bayesian parameters data methods
24: distribution gaussian markov likelihood mixture
11: exact kalman full conditional deterministic
1: smoothing predictive regularizers intermediate slope
51
Topic Correlations in PAM
5000 research paper abstracts, from across all CS
Numbers on edges are supertopics' Dirichlet
parameters
52
Likelihood Comparison
  • Varying number of topics

53
Want to Model Trends over Time
  • Is prevalence of topic growing or waning?
  • Pattern appears only briefly
  • Capture its statistics in focused way
  • Don't confuse it with patterns elsewhere in time
  • How do roles, groups, influence shift over time?

54
Topics over Time (TOT)
Wang, McCallum 2006
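[Figure: graphical model for Topics over Time. Document plate (D): multinomial over topics, with a Dirichlet prior. Token plate (Nd): topic index z, word w, timestamp t. Topic plate (T): multinomial over words, with a Dirichlet prior; Beta over time, with a uniform prior.]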
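Read as a generative process, the diagram suggests roughly the following: each token draws a topic, then a word from that topic's multinomial and a timestamp from that topic's Beta distribution. The sketch assumes timestamps normalized to the interval from 0 to 1 and uses made-up parameters and topic names; it illustrates generation only, not the inference procedure.

```python
import random

def generate_tot_document(n_tokens, alpha, topic_word_probs, topic_beta_params, vocab, rng):
    """Generate (word, timestamp, topic) triples for one document."""
    # Per-document multinomial over topics, drawn from a Dirichlet prior.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    theta = [d / total for d in draws]
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(range(len(theta)), weights=theta, k=1)[0]
        word = rng.choices(vocab, weights=topic_word_probs[z], k=1)[0]
        a, b = topic_beta_params[z]
        timestamp = rng.betavariate(a, b)  # topic-specific Beta over time
        tokens.append((word, timestamp, z))
    return tokens

rng = random.Random(0)
vocab = ["tariff", "treasury", "internet", "terrorism"]
tokens = generate_tot_document(
    n_tokens=10,
    alpha=[1.0, 1.0],
    topic_word_probs=[[5, 5, 1, 1], [1, 1, 5, 5]],
    # Topic 0 is concentrated early in the time range, topic 1 late.
    topic_beta_params=[(2.0, 8.0), (8.0, 2.0)],
    vocab=vocab,
    rng=rng,
)
```

Because each topic carries its own distribution over time, a topic that appears only briefly can be localized in time instead of being blended with similar word patterns from other periods.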
55
State of the Union Address
208 Addresses delivered between January 8, 1790
and January 29, 2002.
  • To increase the number of documents, we split the
    addresses into paragraphs and treated them as
    documents. One-line paragraphs were excluded.
    Stop-word removal was applied.
  • 17,156 documents
  • 21,534 words
  • 669,425 tokens
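The preprocessing described above could be approximated as below. Several details are assumptions: paragraphs are taken to be separated by blank lines, "one-line" is judged by line breaks in the raw text, and the stop-word list is a tiny placeholder, not the list actually used.

```python
import re

# Placeholder stop-word list for illustration only.
STOPWORDS = {"a", "an", "and", "of", "our", "the", "to", "in", "is", "by"}

def address_to_documents(address_text, stopwords=STOPWORDS):
    """Split one address into paragraph documents of lowercase tokens."""
    documents = []
    for paragraph in re.split(r"\n\s*\n", address_text):
        lines = [line for line in paragraph.splitlines() if line.strip()]
        if len(lines) <= 1:
            continue  # exclude empty and one-line paragraphs
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", paragraph.lower())
        kept = [t for t in tokens if t not in stopwords]
        if kept:
            documents.append(kept)
    return documents

sample = (
    "Fellow citizens:\n\n"
    "Our scheme of taxation consists of a tariff\n"
    "and internal-revenue taxes.\n\n"
    "Thank you."
)
docs = address_to_documents(sample)
```

In this example only the middle paragraph survives, since the greeting and the closing are each a single line.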

Our scheme of taxation, by means of which this
needless surplus is taken from the people and put
into the public Treasury, consists of a tariff
or duty levied upon importations from abroad and
internal-revenue taxes levied upon the
consumption of tobacco and spirituous and malt
liquors. It must be conceded that none of the
things subjected to internal-revenue
taxation are, strictly speaking, necessaries.
There appears to be no just complaint of this
taxation by the consumers of these articles, and
there seems to be nothing so well able to bear
the burden without hardship to any portion of the
people.
1910
56
Comparing TOT against LDA
57
Topic Distributions Conditioned on Time
[Figure: NIPS vol. 1-14. Horizontal axis: time. Vertical axis: topic mass (in vertical height).]