1
Probabilistic Models of Text and Link Structure
for Hypertext Classification
  • Lise Getoor, Stanford → Univ. MD, College Park
  • Eran Segal, Stanford
  • Benjamin Taskar, Stanford
  • Daphne Koller, Stanford
2
Introduction
  • Many domains are inherently relational
  • Related objects are not IID
  • Exploit dependency
  • Link structure can be predictive
  • Unlabelled data also useful
  • Active Research Area
  • Chakrabarti et al.; Cohn and Hofmann; Ghani et
    al.; Jensen and Neville; Slattery and Mitchell;
    and many others...

3
Introduction
[Figure: a scientific paper whose Topic is one of Theory, AI, Agent]
  • Attributes of object
  • Attributes of linked objects
  • Attributes of heterogeneous linked objects
  • Unlabelled data

4
Our Approach
  • Motivation: relational structure provides useful
    information for density estimation and prediction
  • Provide a unified probabilistic framework
  • Construct probabilistic models of relational
    structure that capture link uncertainty
  • Use unlabelled data for improved accuracy

5
Probabilistic Relational Models
  • Extend Bayes net representation to relational
    setting
  • Specify dependence of each attribute on other
    attributes
  • A template for a Bayes net over a relational
    database

Koller and Pfeffer, 98
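As a rough, hypothetical illustration of what "a template for a Bayes net over a relational database" might look like in code (this dictionary representation and the particular parent choices are my own simplification for this transcript, not the paper's formalism), each attribute is mapped to a parent set, possibly reached through a relation, plus a CPD to be learned:

# Hypothetical PRM template: one entry per attribute, listing its parents
# (possibly reached through a relation) and a CPD placeholder to be learned.
prm_template = {
    "Paper.Topic":  {"parents": ["Author.Research_Area via Wrote"], "cpd": None},
    "Paper.Word_i": {"parents": ["Paper.Topic"], "cpd": None},
    "Cites.Exists": {"parents": ["Citer.Topic", "Cited.Topic"], "cpd": None},
}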
6
Relational Schema
  • A relational schema describes the classes,
    attributes and relations in a domain

[Schema diagram: Author (Institution, Research Area); Wrote relation linking
Author to Paper; Paper (Topic, Word1 ... WordN); Cites relation linking a
Citing Paper to a Cited Paper]
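A minimal sketch of this schema in Python (dataclasses are my choice here, not anything from the paper; the class and attribute names follow the diagram above):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Author:
    institution: str
    research_area: str

@dataclass
class Paper:
    topic: str                                        # e.g. "Theory", "AI", "Agent"
    words: List[bool] = field(default_factory=list)   # Word1 ... WordN indicators

@dataclass
class Wrote:          # relation: an Author wrote a Paper
    author: Author
    paper: Paper

@dataclass
class Cites:          # relation: a citing Paper cites a cited Paper
    citing: Paper
    cited: Paper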
7
Attribute Uncertainty
[Diagram: dependency model over the schema, with Author (Institution,
Research Area), the Wrote relation, and Paper (Topic, Word1 ... WordN)]
8
Link Uncertainty
[Diagram: a document collection with uncertain link structure]
9
PRM w/ Exists Uncertainty
[Diagram: two Paper objects (Topic, Words) connected by a Cites relation
that carries an Exists attribute]
Dependency model for the existence of the relationship.
10
Exists Uncertainty Example
[Diagram: the Exists attribute of a Cites relation depends on Citer.Topic and
Cited.Topic; its CPD gives the probability of Exists = True / False for each
pair of topics]
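To make such a CPD concrete, here is a small illustrative table in Python; the probabilities are invented for this sketch and are not values from the paper:

# Hypothetical CPD P(Exists = True | Citer.Topic, Cited.Topic); the numbers
# are illustrative only. Probabilities are small because most pairs of papers
# do not cite each other.
p_exists_true = {
    ("AI", "AI"): 0.010,
    ("AI", "Theory"): 0.001,
    ("Theory", "AI"): 0.001,
    ("Theory", "Theory"): 0.008,
}

def p_exists(citer_topic, cited_topic, exists):
    """Return P(Exists = exists | citer_topic, cited_topic)."""
    p_true = p_exists_true.get((citer_topic, cited_topic), 0.001)
    return p_true if exists else 1.0 - p_true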
11
Ground Bayes Net
[Diagram: ground Bayes net for three papers (Paper1, Paper2, Paper3) and two
authors (Author1, Author2); each Paper has Topic and Word1 ... WordN nodes,
each Author has Inst and Area nodes, and there is an Exists node for each
ordered pair of papers (1-2, 2-1, 1-3, 3-1, 2-3, 3-2)]
  • Captures correlations between topics of related
    papers
  • Information flows along active paths in the
    Bayes net
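A minimal sketch of how the template might be unrolled into this ground network for three papers; the word-depends-on-topic and Exists-depends-on-both-topics structure follows the earlier slides, but the code itself is only an illustration:

from itertools import permutations

# Hypothetical unrolling of the PRM template into ground-network nodes and
# edges for a tiny instance with three papers.
papers = ["Paper1", "Paper2", "Paper3"]
n_words = 3  # stand-in for N

nodes, edges = [], []
for p in papers:
    nodes.append(f"{p}.Topic")
    for i in range(1, n_words + 1):
        nodes.append(f"{p}.Word{i}")
        edges.append((f"{p}.Topic", f"{p}.Word{i}"))   # words depend on topic

# One Exists node per ordered pair of papers, depending on both topics.
for citer, cited in permutations(papers, 2):
    e = f"Exists_{citer}_{cited}"
    nodes.append(e)
    edges.append((f"{citer}.Topic", e))
    edges.append((f"{cited}.Topic", e))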

12
PRM Learning Algorithm
[Diagram: a relational database (Author, Paper, Cites tables) is the input to
the learning algorithm, which outputs a PRM]
  • Learn parameters and qualitative dependency
    structure
  • Extend known techniques for learning Bayesian
    networks from data

Friedman et al., IJCAI-99; Getoor et al., ICML-01
13
Parameter Estimation in PRMs
  • Assume known dependency structure S
  • Goal: estimate PRM parameters θ
  • the entries in the local probability models
  • θ is good if it is likely to generate the
    observed data, instance I
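Written out in standard maximum-likelihood notation (a reconstruction using the symbols θ, S and I defined in the bullets above, not a formula quoted from the slide), the objective is the log-likelihood of the instance:

\ell(\theta : I, S) = \log P(I \mid S, \theta)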

14
Parameter Estimation II
  • MLE Principle: choose θ so as to maximize ℓ
  • Key computational step:
  • computation of sufficient statistics - the
    frequency of different instantiations of a node
    and its parents in the DB

As in Bayesian network learning, the crucial
property is decomposition: separate terms for
different X.A
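A minimal sketch of the sufficient-statistics computation for a single attribute, assuming the relevant join of the database is available as a list of Python dicts (the function and field names are illustrative, not from the paper):

from collections import Counter

def mle_cpd(rows, attr, parents):
    """Estimate P(attr | parents) by counting instantiations in the DB."""
    joint = Counter()          # counts of (parent values, attribute value)
    parent_counts = Counter()  # counts of parent values alone
    for row in rows:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[attr])] += 1
        parent_counts[pa] += 1
    return {key: n / parent_counts[key[0]] for key, n in joint.items()}

# Example: P(Paper.Topic | Author.Research_Area) from a tiny joined table.
rows = [
    {"research_area": "AI", "topic": "AI"},
    {"research_area": "AI", "topic": "Agent"},
    {"research_area": "Theory", "topic": "Theory"},
]
print(mle_cpd(rows, attr="topic", parents=["research_area"]))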
15
Experiment I: Prediction
[Figure: a paper (P506) whose Topic is unknown and whose words w1 ... wN are
observed]
16
Domains
[Diagram: Cora model (Cora Dataset, McCallum et al.): a citing paper and a
cited paper, each with Topic and words w1 ... wN, linked by a Cites relation
with an Exists attribute]
[Diagram: WebKB model (WebKB, Craven et al.): a From Page and a To Page, each
with Category and words w1 ... wN, linked by a Link relation with an Exists
attribute]
17
Prediction Accuracy
18
Using Unlabelled Data
[Diagram: the ground Bayes net from slide 11, with the unobserved Topic and
Area attributes included as latent variables]
19
EM
  • Use EM to learn with latent variables
  • E-step involves inference in unrolled network
  • Infeasible for large networks
  • Use approximate inference for E-step
  • Loopy belief propagation (Pearl, 88; McEliece,
    98)
  • Scales linearly with the size of the network
  • Guaranteed to converge only for polytrees
  • Empirically, often converges in general nets
    (Murphy, 99)
  • Local message passing
  • Belief messages transferred between related
    instances
  • Induces a natural influence propagation
    behavior
  • Instances give information about related instances

Taskar et al., Probabilistic Classification and
Clustering in Relational Data, IJCAI-01
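A minimal sketch of this EM loop, with the approximate E-step and the M-step left as stubs since they depend on the model details; the function names and structure are my own, not the paper's:

# Hypothetical EM skeleton for PRM parameter learning with latent labels.
# Loopy belief propagation supplies approximate posterior marginals over the
# latent Topic/Category variables in the unrolled (ground) network.

def loopy_bp_marginals(ground_net, params):
    """Approximate posterior marginals of latent nodes via local message passing."""
    raise NotImplementedError

def m_step(ground_net, marginals):
    """Re-estimate CPD parameters from expected sufficient statistics (soft counts)."""
    raise NotImplementedError

def em(ground_net, params, n_iters=20):
    for _ in range(n_iters):
        marginals = loopy_bp_marginals(ground_net, params)  # approximate E-step
        params = m_step(ground_net, marginals)              # M-step
    return params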
20
Webpage Classification
  • Webpages from four CS departments (Craven et al.)
  • Each webpage has
  • Content words
  • Links
  • Type Student, Faculty, Course, Project, Other

Trained on 3 schools, tested on the 4th
[Bar chart: classification accuracy, y-axis from 0.52 to 0.68]
21
Webpage Classification
[Bar chart: classification accuracy of the NB, Links, From-Anchor, LinkHub,
LinkAnchor, and All models; y-axis from 0.52 to 0.68]
22
Conclusions
  • PRMs provide a unified probabilistic framework
    for prediction and density estimation in
    relational domains
  • We can model dependencies between
  • Attributes of an object
  • Attributes of linked objects
  • Attributes of heterogeneous linked objects
  • Allows us to make use of unlabelled data in a
    principled manner
  • Future work: more expressive link uncertainty
    models