UMass and Learning for CALO

Slides: 57

Learn more at: https://people.cs.umass.edu

1
UMass and Learning for CALO

Andrew McCallum
Information Extraction & Synthesis Laboratory
Department of Computer Science
University of Massachusetts

2
Outline

CC-Prediction
Learning in the wild from user email usage
DEX
Learning in the wild from user corrections... as well as KB records filled by other CALO components
Rexa
Learning in the wild from user corrections to
coreference... propagating constraints in a
Markov-Logic-like system that scales to 20
million objects
Several new topic models
Discover interesting & useful structure without the need for supervision... learning from newly arrived data on the fly

3
CC Prediction Using Various Exponential Family
Factor Graphs

Learning to keep an org. connected & avoid stove-piping.
First steps toward ad-hoc team creation.
Learning in the wild from users' CC behavior, and from other parts of the CALO ontology.

4
Graphical Models for Email

Compute P(y|x) for CC prediction

[Figure: factor graph for the email model. Legend: function, random variable, N replications. Target variable y (Recipient of Email), outer plate Nr. Observed variables: xb (Body Words, plate Nb), xs (Subject Words, plate Ns), xr (Other Recipients, plate Nr-1).]
The graph describes the joint distribution of random variables in terms of the product of local functions.
Email Model: Nb words in the body, Ns words in the subject, Nr recipients

Local functions facilitate system engineering through modularity
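
As a rough illustration of how a product of local functions yields P(y|x), the sketch below scores each candidate recipient with one log-linear factor per feature group (body words, subject words, other recipients) and normalizes over the candidates. The weights, names, and the `cc_posterior` helper are hypothetical and are not taken from the CALO system.

```python
import math

def cc_posterior(candidates, factors, email):
    """Return P(y | x) over candidate recipients y.

    Each factor is a local function f(y, email) -> log-potential.
    The unnormalized score of y is the product of the local
    potentials, i.e. the sum of their logs.
    """
    log_scores = {y: sum(f(y, email) for f in factors) for y in candidates}
    # Normalize with log-sum-exp for numerical stability.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {y: math.exp(s - m) / z for y, s in log_scores.items()}

# Hypothetical weights: how strongly a word or co-recipient suggests y.
BODY_W = {("alice", "budget"): 1.2, ("bob", "parser"): 0.9}
SUBJ_W = {("alice", "review"): 0.7}
RECIP_W = {("alice", "carol"): 0.5, ("bob", "carol"): 0.1}

def body_factor(y, email):
    return sum(BODY_W.get((y, w), 0.0) for w in email["body"])

def subject_factor(y, email):
    return sum(SUBJ_W.get((y, w), 0.0) for w in email["subject"])

def recipient_factor(y, email):
    return sum(RECIP_W.get((y, r), 0.0) for r in email["recipients"])

email = {"body": ["budget", "draft"], "subject": ["review"], "recipients": ["carol"]}
factors = [body_factor, subject_factor, recipient_factor]
posterior = cc_posterior(["alice", "bob"], factors, email)
```

Because each factor is a separate function, a new relation (for example a thread-based factor) can be appended to `factors` without touching the others, which is the modularity the slide refers to.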

5
Document Models

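Models may include relational attributes

[Figure: factor graph for the document model. Target variable y (Author of Document), plate Na. Observed variables xb, xs, xb, xr, xt with plates Nb, Ns, Na-1, Nr, Nt; labels: Title, Abstract, Body, Co-authors, References.]

We can optimize P(y|x) for classification performance and P(x|y) for model interpretability and parameter transfer (to other models)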

6
CC Prediction and Relational Attributes
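[Figure: factor graph for CC prediction with relational attributes. Target variable y (Target Recipient), outer plate Nr. Observed variables xb, xs, xr, xr, xtr with plates Nb, Ns, Nr-1, Ntr; labels: Thread Relation, Recipient Relation, Body Words, Subject Words, Other Recipients.]
Thread Relations: e.g., was a given recipient ever included on this email thread?
Recipient Relationships: e.g., does one of the other recipients report to the target recipient?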
7
CC-Prediction Learning in the Wild

As documents are added to Rexa, models of expertise for authors grow
As DEX obtains more contact information and keywords, organizational relations emerge
Model parameters can be adapted on-line
Priors on parameters can be used to transfer learned information between models
New relations can be added on-line
Modular model construction and intelligent model optimization enable these goals

8
CC Prediction: Upcoming work on Multi-Conditional Learning

A discriminatively-trained topic model, discovering low-dimensional representations for transfer learning and improved regularization & generalization.

9
Objective Functions for Parameter Estimation
Traditional
New, multi-conditional
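
The equations on this slide did not survive transcription. As a hedged reconstruction, assuming the formulation in the multi-conditional learning work by the same authors, the traditional objectives and the multi-conditional one could be written as below. The weights \alpha and \beta and the parameter symbol \theta are notation introduced here, not recovered from the slide.

```latex
\begin{align*}
\text{Traditional (joint):} \quad
  & \mathcal{L}_{\text{joint}}(\theta) = \sum_i \log P(x_i, y_i; \theta) \\
\text{Traditional (conditional):} \quad
  & \mathcal{L}_{\text{cond}}(\theta) = \sum_i \log P(y_i \mid x_i; \theta) \\
\text{New, multi-conditional:} \quad
  & \mathcal{L}_{\text{MC}}(\theta) = \sum_i \bigl[ \alpha \log P(y_i \mid x_i; \theta)
      + \beta \log P(x_i \mid y_i; \theta) \bigr]
\end{align*}
```

Setting \beta = 0 recovers ordinary conditional training, so the P(x|y) term can be read as a data-dependent regularizer on the classifier.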
10
Multi-Conditional Learning (Regularization)
McCallum, Pal, Wang, 2006
11
Multi-Conditional Mixtures
12
Predictive Random Fields: mixture of Gaussians on synthetic data
McCallum, Wang, Pal, 2005
[Figure panels: Data, classify by color; Generatively trained; Multi-Conditional; Conditionally-trained (Jebara 1998)]
13
Multi-Conditional Mixtures vs. Harmonium on document retrieval task
McCallum, Wang, Pal, 2005
Multi-Conditional, multi-way conditionally trained
Conditionally-trained, to predict class labels
Harmonium, joint, with class labels and words
Harmonium, joint with words, no labels
14
DEX

Beginning with a review of previous work,
then new work on record extraction,
with the ability to leverage new KBs in the wild,
and for transfer

15
System Overview
[Figure: system overview diagram. Surviving labels: CRF, WWW, Email, names.]
16
An Example
To: Andrew McCallum mccallum_at_cs.umass.edu
Subject: ...
First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governors Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis, ...
Key Words: Information extraction, social network, ...
Search for new people
17
Summary of Results
Example keywords extracted
Person              Keywords
William Cohen       Logic programming, Text categorization, Data integration, Rule learning
Daphne Koller       Bayesian networks, Relational models, Probabilistic models, Hidden variables
Deborah McGuinness  Semantic web, Description logics, Knowledge representation, Ontologies
Tom Mitchell        Machine learning, Cognitive states, Learning apprentice, Artificial intelligence
Contact info and name extraction performance (25 fields)
       Token Acc   Field Prec   Field Recall   Field F1
CRF    94.50       85.73        76.33          80.76
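
As a quick check on the table above, field-level F1 is the harmonic mean of field precision and field recall. The short sketch below recomputes it from the reported precision and recall; the function name is ours.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Field precision and recall reported for the CRF on the slide.
field_f1 = f1_score(85.73, 76.33)
print(round(field_f1, 2))  # 80.76
```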

Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid stove-piping in large orgs by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)
Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.

18
Importance of accurate DEX fields in IRIS

Information about
people
contact information
email
affiliation
job title
expertise
...
are key to answering many CALO questions...
both directly, and as supporting inputs to
higher-level questions.

19
Learning Field Compatibilities in DEX
Professor Jane Smith
University of California
209-555-5555
Professor Smith chairs the Computer Science Department. She hails from Boston, her administrative assistant
John Doe
Administrative Assistant
University of California
209-444-4444
20
Learning Field Compatibilities in DEX
Professor Jane Smith
University of California
209-555-5555
Professor Smith chairs the Computer Science Department. She hails from Boston, her administrative assistant
John Doe
Administrative Assistant
University of California
209-444-4444
Extracted Record
Name: Jane Smith, John Doe
JobTitle: Professor, Administrative Assistant
Company: U of California
Department: Computer Science
Phone: 209-555-5555, 209-444-4444
City: Boston
[Figure: the extracted fields grouped into two records: (Jane Smith, University of California, 209-555-5555, Computer Science, Professor, Boston) and (John Doe, Administrative Assistant, University of California, 209-444-4444).]
21
Learning Field Compatibilities in DEX

35% error reduction over transitive closure
Qualitatively better than heuristic approach
Mine Knowledge Bases from other parts of IRIS for learning compatibility rules among fields:
Professor job title co-occurs with University company
Area code / city compatibility
Senator job title co-occurs with Washington, D.C. location
In the wild: As the user adds new fields & makes corrections, DEX learns from this KB data
Transfer learning between departments/industries

22
Rexa: A knowledge base of publications, grants, people, their expertise, topics, and inter-connections

Learning for information extraction and coreference.
Incrementally leveraging multiple sources of information for improved coreference
Gathering information about people's expertise and co-author, citation relations
First a tour of Rexa, then slides about learning

23
Previous Systems
24
Previous Systems
Cites
Research Paper
25
More Entities and Relations
Expertise
Cites
Research Paper
Person
Grant
University
Venue
Groups
26-37
(No Transcript)
38
Learning in Rexa

Extraction, coreference
In the wild: Re-adjusting KB after corrections from a user
Also, learning research topics/expertise, and their interconnections

39
(Linear Chain) Conditional Random Fields
Lafferty, McCallum, Pereira 2001
Undirected graphical model, trained to maximize conditional probability of output sequence given input sequence
[Equations not transcribed]
[Figure: finite state model and graphical model. Output sequence (FSM states) y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3}, e.g. OTHER PERSON OTHER ORG TITLE ...; input sequence (observations) x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3}, e.g. "said Jones a Microsoft VP ..."]
(500 citations)
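
The slide's formula was lost in transcription. For reference, the standard linear-chain CRF form from the cited paper is usually written as below; the symbols \lambda_k (weights) and f_k (feature functions) follow the common textbook notation and may differ from what the slide showed.

```latex
\[
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T}
    \exp\Bigl( \sum_k \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t) \Bigr),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \prod_{t=1}^{T}
    \exp\Bigl( \sum_k \lambda_k f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Bigr)
\]
```

Here Z(x) sums over all possible output sequences, so the model is normalized per input sequence rather than per position.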
40
IE from Research Papers
McCallum et al 99
41
IE from Research Papers
Method                              Field-level F1   Reference
Hidden Markov Models (HMMs)         75.6             Seymore, McCallum, Rosenfeld, 1999
Support Vector Machines (SVMs)      89.7             Han, Giles, et al, 2003
Conditional Random Fields (CRFs)    93.9             Peng, McCallum, 2004
Δ error: 40%
(Word-level accuracy is >99%)
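
The 40% figure appears to be the relative reduction in field-level error when moving from the SVM result to the CRF result; that reading is our assumption, since the slide does not say which pair is compared. A minimal sketch of the arithmetic:

```python
def relative_error_reduction(f1_old, f1_new):
    """Percentage reduction in error (100 - F1) when moving from f1_old to f1_new."""
    err_old = 100.0 - f1_old
    err_new = 100.0 - f1_new
    return 100.0 * (err_old - err_new) / err_old

# SVM (89.7) to CRF (93.9): error drops from about 10.3 to about 6.1 points.
svm_to_crf = relative_error_reduction(89.7, 93.9)
print(f"{svm_to_crf:.1f}%")  # 40.8%
```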
42
Joint segmentation and co-reference
Extraction from and matching of research paper citations.
[Figure: graphical model over two example citations, with variables o, s (Segmentation), c (Citation attributes), y (Co-reference decisions) and p, plus nodes labeled Database field values and World Knowledge.]
Example citations:
Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.
Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Inference: Variant of Iterated Conditional Modes (Besag, 1986)
Wellner, McCallum, Peng, Hay, UAI 2004
see also Marthi, Milch, Russell, 2003
43
Rexa: Learning in the Wild from User Feedback

Coreference will never be perfect.
Rexa allows users to enter corrections to coreference decisions
Rexa then uses this feedback to
re-consider other inter-related parts of the KB
automatically make further error corrections by propagating constraints
(Our coreference system uses underlying ideas very much like Markov Logic, and scales to 20 million mention objects.)
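
To make "propagating constraints" concrete, here is a deliberately simple sketch: a union-find structure over mentions in which a user's "these two are different" correction becomes a cannot-link constraint that blocks later merges implied through other mentions. This is an illustration only, not Rexa's inference procedure, which the slide describes as Markov-Logic-like; the class, method names, and mention strings are hypothetical.

```python
class Coref:
    """Union-find over mentions with user-supplied cannot-link constraints."""

    def __init__(self, mentions):
        self.parent = {m: m for m in mentions}
        self.cannot = set()  # frozensets of mention pairs the user separated

    def find(self, m):
        while self.parent[m] != m:
            self.parent[m] = self.parent[self.parent[m]]  # path halving
            m = self.parent[m]
        return m

    def violates(self, a, b):
        """True if merging the clusters of a and b would join a separated pair."""
        roots = {self.find(a), self.find(b)}
        for pair in self.cannot:
            u, v = tuple(pair)
            if {self.find(u), self.find(v)} == roots:
                return True
        return False

    def merge(self, a, b):
        """Merge two clusters unless a user correction forbids it."""
        if self.find(a) == self.find(b) or self.violates(a, b):
            return False
        self.parent[self.find(a)] = self.find(b)
        return True

    def user_split(self, a, b):
        """Record a user correction: a and b are distinct entities (a != b)."""
        self.cannot.add(frozenset((a, b)))

kb = Coref(["A. McCallum", "Andrew McCallum", "A. K. McCallum", "Alan McCallum"])
kb.user_split("Andrew McCallum", "Alan McCallum")   # user correction
kb.merge("A. McCallum", "Andrew McCallum")          # accepted
kb.merge("A. K. McCallum", "A. McCallum")           # accepted
blocked = kb.merge("A. McCallum", "Alan McCallum")  # rejected via the correction
```

The last merge is refused even though the user never mentioned "A. McCallum": the single correction propagates through the cluster that mention already belongs to.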

44
Finding Topics in 1 million CS papers
200 topics & keywords automatically discovered.
45
Topical Transfer
Citation counts from one topic to another.
Map producers and consumers
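
A minimal sketch of the counting behind such a map, assuming for simplicity that every paper is assigned a single dominant topic (a topic model would normally give a mixture). The paper IDs, topic labels, and the `topic_transfer` helper are made up for illustration.

```python
from collections import Counter

def topic_transfer(paper_topic, citations):
    """Count citations from the citing paper's topic to the cited paper's topic."""
    counts = Counter()
    for citing, cited in citations:
        counts[(paper_topic[citing], paper_topic[cited])] += 1
    return counts

paper_topic = {"p1": "speech", "p2": "hmm", "p3": "hmm", "p4": "vision"}
citations = [("p1", "p2"), ("p1", "p3"), ("p4", "p2"), ("p2", "p3")]
transfer = topic_transfer(paper_topic, citations)
```

In this toy data "hmm" is a producer (cited by speech and vision papers) while "speech" and "vision" are consumers.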
46
Topical Diversity
Find the topics that are cited by many other topics, measuring diversity of impact. Entropy of the topic distribution among papers that cite this paper (this topic).
Low Diversity
High Diversity
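
The diversity measure described above can be sketched as a plain Shannon entropy over the topics of the citing papers. The single-topic-per-paper simplification, the base-2 logarithm, and the example topic labels are our assumptions.

```python
import math
from collections import Counter

def topical_diversity(citing_topics):
    """Entropy (in bits) of the topic distribution among citing papers."""
    counts = Counter(citing_topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Cited only from within one topic: low diversity.
low = topical_diversity(["hmm", "hmm", "hmm", "hmm"])
# Cited evenly from four different topics: high diversity.
high = topical_diversity(["hmm", "vision", "speech", "theory"])
```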
47
Some New Work on Topic Models

Robustly capturing topic correlations
Pachinko Allocation Model
Capturing phrases in topic-specific ways
Topical N-Gram Model

48
Pachinko Machine
49
Pachinko Allocation Model
Li, McCallum, 2005
[Figure: four-level DAG (model structure, not the graphical model). Interior nodes, whose symbol was lost in extraction, are indexed 11; 21, 22; 31, 32, 33; 41, 42, 43, 44, 45, above leaf nodes word1 through word8.]
Upper levels: Distributions over distributions over topics...
Middle level: Distributions over topics; mixtures, representing topic correlations
Lowest interior level: Distributions over words (like LDA topics)
Some interior nodes could contain one multinomial, used for all documents (i.e. a very peaked Dirichlet).
50
Topic Coherence Comparison
models, estimation, stopwords
estimation, some junk
LDA 100: estimation likelihood maximum noisy estimates mixture scene surface normalization generated measurements surfaces estimating estimated iterative combined figure divisive sequence ideal
LDA 20: models model parameters distribution bayesian probability estimation data gaussian methods likelihood em mixture show approach paper density framework approximation markov
Example super-topic:
33 input hidden units function number
27 estimation bayesian parameters data methods
24 distribution gaussian markov likelihood mixture
11 exact kalman full conditional deterministic
1 smoothing predictive regularizers intermediate slope
51
Topic Correlations in PAM
5000 research paper abstracts, from across all CS
Numbers on edges are supertopics' Dirichlet parameters
52
Likelihood Comparison

Varying number of topics

53
Want to Model Trends over Time

Is prevalence of topic growing or waning?
Pattern appears only briefly
Capture its statistics in focused way
Don't confuse it with patterns elsewhere in time
How do roles, groups, influence shift over time?

54
Topics over Time (TOT)
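Wang, McCallum 2006
[Figure: plate diagram of the TOT graphical model; Greek symbols were lost in extraction. Recoverable labels: Dirichlet prior; multinomial over topics; uniform prior; Dirichlet prior; topic index z; word w; timestamp t; Beta over time; multinomial over words; plates T, T, Nd, D.]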
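
Reading the surviving labels as a generative story (a per-document multinomial over topics drawn from a Dirichlet, then for each token a topic index, a word from that topic's multinomial, and a timestamp from that topic's Beta), a hedged sketch might look like the following. The vocabulary, parameter values, and function names are invented for illustration, and only the Python standard library is used.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_tokens, alpha, topic_word, topic_beta, vocab, rng):
    """Generate (word, timestamp, topic) triples for one document.

    topic_word[z] is a multinomial over vocab; topic_beta[z] = (a, b) are
    the Beta parameters of topic z's distribution over normalized time.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document multinomial over topics
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(range(len(theta)), weights=theta)[0]  # topic index
        w = rng.choices(vocab, weights=topic_word[z])[0]      # word
        t = rng.betavariate(*topic_beta[z])                   # timestamp in [0, 1]
        tokens.append((w, t, z))
    return tokens

rng = random.Random(0)
vocab = ["tariff", "treasury", "internet", "terror"]
topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
topic_beta = [(2.0, 8.0), (8.0, 2.0)]  # topic 0 peaks early, topic 1 peaks late
doc = generate_document(20, [0.5, 0.5], topic_word, topic_beta, vocab, rng)
```

Because each topic carries its own Beta over time, a topic that appears only briefly gets a narrow distribution and is not confused with similar word patterns elsewhere in time.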
55
State of the Union Address
208 Addresses delivered between January 8, 1790
and January 29, 2002.

To increase the number of documents, we split the addresses into paragraphs and treated them as documents. One-line paragraphs were excluded. Stopword removal was applied.
17,156 documents
21,534 words
669,425 tokens
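
The preprocessing described above could be sketched as follows. The blank-line paragraph delimiter, the physical-line definition of "one-line", the tiny stopword list, and the tokenization are all assumptions made for illustration; the slide does not specify them.

```python
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "by", "this"}  # illustrative only

def address_to_documents(address_text, stopwords=STOPWORDS):
    """Split one address into paragraph documents of lowercase non-stopword tokens.

    Paragraphs are separated by blank lines; one-line paragraphs are dropped.
    """
    documents = []
    for paragraph in address_text.split("\n\n"):
        lines = [ln for ln in paragraph.splitlines() if ln.strip()]
        if len(lines) <= 1:
            continue  # exclude one-line paragraphs
        tokens = [w.strip(".,;:!?\"'()").lower() for ln in lines for w in ln.split()]
        documents.append([w for w in tokens if w and w not in stopwords])
    return documents

sample = (
    "Fellow-Citizens:\n\n"
    "Our scheme of taxation consists of a tariff\n"
    "and internal-revenue taxes.\n\n"
    "Thank you."
)
docs = address_to_documents(sample)
```

Only the middle paragraph survives here: the greeting and the closing are single lines and are dropped.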

Our scheme of taxation, by means of which this
needless surplus is taken from the people and put
into the public Treasury, consists of a tariff
or duty levied upon importations from abroad and
internal-revenue taxes levied upon the
consumption of tobacco and spirituous and malt
liquors. It must be conceded that none of the
things subjected to internal-revenue
taxation are, strictly speaking, necessaries.
There appears to be no just complaint of this
taxation by the consumers of these articles, and
there seems to be nothing so well able to bear
the burden without hardship to any portion of the
people.
1910
56
Comparing TOT against LDA
57
Topic Distributions Conditioned on Time
NIPS vol 1-14
topic mass (in vertical height)
time
