EntityAuthority: Semantically Enriched Graph-Based Authority Propagation

1
EntityAuthority: Semantically Enriched Graph-Based
Authority Propagation
  • Julia Stoyanovich (Columbia University)
  • Srikanta Bedathur
  • Klaus Berberich
  • Gerhard Weikum

Max-Planck Institute for Informatics
2
Query: NBA Team
Date: June 5, 2007
5
What is the Problem?
  • No NBA team homepages appear at high ranks (with
    one exception)
  • User cannot tell what the NBA teams are by
    looking at titles and snippets at high ranks
  • Portals like ESPN.com, NBA.com dominate high
    ranks

6
The Knowledge Soup
7
The Knowledge Soup
YAGO (Suchanek et al., WWW'07); GATE (Cunningham et al., ACL'02)
9
Central Idea
  • Authority is based on hyperlinks, but reflects
    quality of information. Pages are collections of
    semantic entities!
  • To improve ranking, exploit mutual reinforcement
    between pages and entities.
  • Can construct a graph that includes pages,
    entities, and ontological concepts: a richer
    substrate for authority propagation.
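
The mutual reinforcement idea can be illustrated with a HITS-style iteration over a bipartite page-entity graph. This is only a minimal sketch and not the EVA model from the talk: the function name `mutual_reinforcement`, the toy `contains` mapping, and the sum-to-one normalization are all assumptions made for illustration.

```python
def mutual_reinforcement(contains, iterations=50):
    """contains maps each page to the set of entities it mentions (toy input)."""
    pages = list(contains)
    entities = sorted({e for ents in contains.values() for e in ents})
    page_score = {p: 1.0 for p in pages}
    entity_score = {e: 1.0 for e in entities}
    for _ in range(iterations):
        # An entity gains authority from the pages that mention it.
        entity_score = {
            e: sum(page_score[p] for p in pages if e in contains[p])
            for e in entities
        }
        total = sum(entity_score.values()) or 1.0
        entity_score = {e: s / total for e, s in entity_score.items()}
        # A page gains authority from the entities it mentions.
        page_score = {p: sum(entity_score[e] for e in contains[p]) for p in pages}
        total = sum(page_score.values()) or 1.0
        page_score = {p: s / total for p, s in page_score.items()}
    return page_score, entity_score
```

Calling `mutual_reinforcement({"p1": {"a", "b"}, "p2": {"b"}, "p3": {"c"}})` ranks entity `b` above `a`, because `b` is mentioned by more (and better-scored) pages.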

10
Our Contributions
  • Generalized Data Graph (GDG) data model
  • Several models of authority propagation on GDG,
    most notably EVA (Entity deriVed Authority)
    inspired by HITS but with richer semantics
  • Prototype implementation that combines
    information extraction with a rich ontology
  • Experimental results on Wikipedia

11
Related Work
  • PageRank, HITS: the classics; simple directed
    graph models of the Web
  • ObjectRank: also HITS-inspired, but developed
    with DB data in mind
  • PopRank: combines the OR graph with PageRank
    values (similar to PIA, one of the options we
    consider)
  • EntityRank: focuses on frequency-based content
    strength, not on the graph structure of the
    Web-embedded entities

12
PAGES: ACM DL page, Google Scholar page, Project page, DBLP page x,
Stanford.edu, Person 1 homepage, Person 2 homepage
13
PAGES: ACM DL page, Google Scholar page, Project page, DBLP page x,
Stanford.edu, Person 1 homepage, Person 2 homepage
InfoUnits: UW, Alon Levy, Stanford, Stanford University, Alon Halevy
14
EnrichedWebGraph
PAGES: ACM DL page, Google Scholar page, Project page, DBLP page x,
Stanford.edu, Person 1 homepage, Person 2 homepage
InfoUnits: UW, Alon Levy, Stanford, Stanford University, Alon Halevy
15
EnrichedWebGraph
PAGES: ACM DL page, Google Scholar page, Project page, DBLP page x,
Stanford.edu, Person 1 homepage, Person 2 homepage
InfoUnits: UW, Alon Levy, Stanford, Stanford University, Alon Halevy
OntoGraph (CONCEPTS & ENTITIES)
Entities: Alon Halevy, Leland Stanford, Stanford University,
University of Wisconsin, University of Washington, Google
Relations: founded, spin-off
Concepts: computer scientist, scientist, company, university, person,
organization, entity
19
PAGES: ACM DL page, Google Scholar page, Project page, DBLP page x,
Stanford.edu, Person 1 homepage, Person 2 homepage
CONCEPTS & ENTITIES
Entities: Alon Halevy, Leland Stanford, Stanford University,
University of Wisconsin, University of Washington, Google
Relations: founded, spin-off
Concepts: computer scientist, scientist, company, university, person,
organization, entity
20
Generalized Data Graph
PAGES: ACM DL page, Google Scholar page, Project page, DBLP page x,
Stanford.edu, Person 1 homepage, Person 2 homepage
CONCEPTS & ENTITIES
Entities: Alon Halevy, Leland Stanford, Stanford University,
University of Wisconsin, University of Washington, Google
Relations: founded, spin-off
Concepts: computer scientist, scientist, company, university, person,
organization, entity
21
Generalized Data Graph
PAGES: ACM DL page, Google Scholar page, Project page, DBLP page x,
Stanford.edu, Person 1 homepage, Person 2 homepage
Edge weights (as shown): 1.0, 0.5, 0.5, 0.33, 0.5, 0.5, 0.5, 0.33,
1.0, 1.0, 0.33
CONCEPTS & ENTITIES
Entities: Alon Halevy, Leland Stanford, Stanford University,
University of Wisconsin, University of Washington, Google
Relations: founded, spin-off
Concepts: computer scientist, scientist, company, university, person,
organization, entity
22
Authority Propagation: How?
  • PIA (Page-Inherited Authority)
  • Compute authority on pages, propagate to
    entities
  • $A_P(x) = \sum_{p \in P(x)} A_P(p) \cdot w(p \to x)$
  • UTA (UnTyped Authority)
  • Compute authority on the entire data graph
  • Can we do any better?
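
The two baseline options above can be sketched as follows. This is a minimal illustration under stated assumptions: the slides do not say which authority measure is computed, so a plain PageRank with damping 0.85 is assumed here, and the function names `pagerank`, `pia`, and `uta` as well as the dictionary-based graph encoding are hypothetical.

```python
def pagerank(out_edges, damping=0.85, iterations=100):
    """Power iteration on a weighted graph given as out_edges[u] = {v: weight}."""
    nodes = set(out_edges) | {v for t in out_edges.values() for v in t}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_edges.get(u, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling node: spread its rank uniformly over all nodes.
                for v in nodes:
                    nxt[v] += damping * rank[u] / n
            else:
                for v, w in targets.items():
                    nxt[v] += damping * rank[u] * w / total
        rank = nxt
    return rank


def pia(page_links, page_to_entity):
    """Page-inherited authority: sum over pages p of A_P(p) * w(p -> x)."""
    page_rank = pagerank(page_links)
    entity_auth = {}
    for p, targets in page_to_entity.items():
        for x, w in targets.items():
            entity_auth[x] = entity_auth.get(x, 0.0) + page_rank.get(p, 0.0) * w
    return entity_auth


def uta(page_links, page_to_entity):
    """Untyped authority: one PageRank run over pages and entities together."""
    merged = {u: dict(t) for u, t in page_links.items()}
    for p, targets in page_to_entity.items():
        merged.setdefault(p, {}).update(targets)
    return pagerank(merged)
```

In this sketch PIA never lets entities influence page scores, while UTA treats every node alike and ignores node types; the question on the slide is whether a typed model can do better than both.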
23
Entity Derived Authority (EVA)
28
Authority Propagation: Where?
  • On the entire Generalized Data Graph
  • vs.
  • On the query-relevant sub-graph of the GDG, the
    Query Result Graph:
  • select a subset of relevant nodes (with scores)
  • expand by successors and predecessors, possibly
    several levels
  • re-scale edge weights
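
The three construction steps can be sketched as below. The slides do not say how edge weights are re-scaled, so this sketch assumes that each node's surviving outgoing weights are renormalized to sum to 1; the function name `query_result_graph` and the `edges`/`seeds` encoding are likewise hypothetical.

```python
def query_result_graph(edges, seeds, levels=1):
    """edges: {u: {v: weight}}; seeds: {node: relevance score}.

    Returns the node set of the sub-graph and its re-scaled edges.
    """
    predecessors = {}
    for u, targets in edges.items():
        for v in targets:
            predecessors.setdefault(v, set()).add(u)

    nodes = set(seeds)
    frontier = set(seeds)
    for _ in range(levels):
        expanded = set()
        for u in frontier:
            expanded |= set(edges.get(u, {}))       # successors
            expanded |= predecessors.get(u, set())  # predecessors
        frontier = expanded - nodes
        nodes |= frontier

    # Keep only edges inside the sub-graph and renormalize each node's
    # outgoing weights (assumed re-scaling rule).
    sub = {}
    for u in nodes:
        kept = {v: w for v, w in edges.get(u, {}).items() if v in nodes}
        total = sum(kept.values())
        if total > 0:
            sub[u] = {v: w / total for v, w in kept.items()}
    return nodes, sub
```

Any of the propagation models can then be run on the returned sub-graph instead of the full GDG.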

29
Query: Stanford
Pages: ACM DL page, Google Scholar page, Project page, DBLP page x,
Stanford.edu, Person 1 homepage, Person 2 homepage
Entities: Alon Halevy, Leland Stanford, Stanford University,
University of Wisconsin, University of Washington, Google
Relations: founded, spin-off
Concepts: computer scientist, scientist, company, university, person,
organization, entity
32
Query Processing
  • Keyword queries, evaluated as conjunctions
  • Pages: TF/IDF score of page content
  • Entities: TF/IDF of YAGO thematic neighborhood

34
Query Processing
Example entity: Vlade Divac
Thematic neighborhood: Serbian basketball players, Olympic competitors
for Serbia, LA Lakers players
Queries: Olympic basketball competitors Serbia; LA Lakers players
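
Conjunctive keyword matching with TF/IDF scoring can be sketched as follows. It is a toy illustration only: the whitespace tokenizer, the `log(1 + N/df)` IDF variant, and the function name `tfidf_conjunctive` are assumptions, and the text of each entity simply stands in for its thematic neighborhood.

```python
import math
from collections import Counter


def tfidf_conjunctive(query, documents):
    """documents: {name: text}. Scores only items that contain every query term."""
    tokens = {name: Counter(text.lower().split()) for name, text in documents.items()}
    n = len(tokens)
    terms = query.lower().split()
    scores = {}
    for name, counts in tokens.items():
        if not all(t in counts for t in terms):
            continue  # conjunction: every keyword must match
        score = 0.0
        for t in terms:
            df = sum(1 for c in tokens.values() if t in c)
            score += counts[t] * math.log(1 + n / df)
        scores[name] = score
    return scores


# Toy collection: the first text is the neighborhood listed on the slide,
# the second is a made-up placeholder.
docs = {
    "Vlade Divac": "Serbian basketball players Olympic competitors for Serbia LA Lakers players",
    "placeholder entity": "NBA venue",
}
print(tfidf_conjunctive("Olympic basketball competitors Serbia", docs))
print(tfidf_conjunctive("LA Lakers players", docs))
```

Both example queries match the Vlade Divac entry because every query keyword occurs in its neighborhood text; a query with any unmatched keyword returns no score for that entity.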
35
Experimental Evaluation
  • Test bed
  • two thematic slices of Wikipedia, Serbia and
    basketball, comparable in size
  • 7800 articles, 1.2M InfoUnits, 240K entities
  • Queries
  • 20 total, 10 per slice, 1-6 words
  • e.g. "lake", "living writer prize winner",
    "NBA venue", "African American basketball player
    Olympic competitor"

36
Experimental Evaluation: Metrics
  • Simple goodness metric (between 0 and 2)
  • Results from different methods pooled, each
    evaluated by 2 people, scores averaged
  • Evaluation based on
  • Precision (avg(goodness) > 0.5)
  • Recall (w.r.t. pooled ideal)
  • Discounted Cumulative Gain (DCG)
  • Normalized Discounted Cumulative Gain (NDCG)
  • (Järvelin and Kekäläinen, TOIS'02)
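
The metrics above can be sketched as follows, taking a ranked list of averaged goodness scores as input. The slides do not state the exact discount function, so a commonly used log2 discount is assumed here; the function names are hypothetical.

```python
import math


def precision(goodness, threshold=0.5):
    """Fraction of results whose averaged goodness exceeds the threshold."""
    if not goodness:
        return 0.0
    return sum(1 for g in goodness if g > threshold) / len(goodness)


def dcg(goodness):
    """DCG with an assumed log2 discount: sum of g_i / log2(i + 1), i = 1, 2, ..."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(goodness, start=1))


def ndcg(goodness, pooled):
    """DCG divided by the DCG of the best possible ordering of the pooled scores."""
    ideal = sorted(pooled, reverse=True)[:len(goodness)]
    best = dcg(ideal)
    return dcg(goodness) / best if best > 0 else 0.0
```

For example, `precision([2.0, 0.5, 1.0, 0.0])` counts two of four results as good, and `ndcg` equals 1.0 only when the ranking already lists the pooled results in decreasing order of goodness.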

37
Results: Ranking on Pages
38
Results: Ranking on Entities
39
Observations
  • Highly ranked entities consistently and
    significantly outperform highly ranked pages
    w.r.t. all metrics.
  • EVA significantly outperforms other methods
    w.r.t. highly ranked pages, for all metrics.
  • No conclusion can be drawn about the relative
    performance of ranking methods w.r.t. entities.
  • Likely causes: the relatively small slices and the
    lack of inter-entity edges in the GDG. Both are
    being addressed in ongoing work.

40
Relative Performance of Ranking: Example
  • Slice: Serbia, query: basketball
  • Top-20 pages
  • PageRank: 1977, Greece, Belgrade
  • UTA: August 2004 in Sports
  • EVA: Basketball in Yugoslavia, Vlade Divac
  • Top-20 entities (various methods)
  • Michael Jordan, LA Lakers, Vlade Divac,
    Predrag Danilovic, etc.

41
Conclusions and Future Work
  • Conclusions
  • Mutual reinforcement between pages and entities
    improves ranking!
  • A rich ontology (e.g. YAGO) can be used for
    query processing in this setting
  • Future work
  • Incorporate inter-entity edges into the GDG
  • Extensive experimental evaluation
  • Scaling up the framework
  • Evaluating on a corpus other than Wikipedia

43
EnrichedWebGraph
PAGES: ACM DL page, Google Scholar page, Lab page, Project page,
University page, DBLP page x, DBLP page y, DBLP page z,
Person 1 homepage, Person 2 homepage
InfoUnits: UW, VLDB, Stanford, Berkeley, DeWitt, Alon Levy,
Stanford University, Google, Alon Halevy, UC Berkeley
OntoGraph (CONCEPTS & ENTITIES)
Entities: George Berkeley, Leland Stanford, Alon Halevy,
Joseph M. Hellerstein, David J. DeWitt, Stanford University,
University of Wisconsin, University of Washington, UC Berkeley,
Berkeley, Stanford, Google, Bay Area, Wisconsin, Seattle
Relations: founded, advisor, student, spin-off, located in
Concepts: computer scientist, philosopher, scientist, company,
university, organization, person, location, entity
44
Data Model: Generalized Data Graph
  • Typed nodes: pages, InfoUnits, ontology entities
    and concepts
  • Typed and weighted edges
  • Hyperlinks (normalized by out-degree)
  • Extraction edges (pages -> InfoUnits), weighted
    by confidence of extraction
  • Mapping edges (InfoUnits -> ontology entities/
    concepts), weighted by confidence in mapping
  • Direct page <-> ontology entity/concept edges
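
A data structure along these lines could look like the sketch below. It is an assumption-laden illustration rather than the prototype's actual implementation: the class name `GeneralizedDataGraph`, the type labels, and the method names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

NODE_TYPES = {"page", "infounit", "entity", "concept"}
EDGE_TYPES = {"hyperlink", "extraction", "mapping", "direct"}


@dataclass
class GeneralizedDataGraph:
    node_type: dict = field(default_factory=dict)
    # edges[source] is a list of (target, edge_type, weight) triples.
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, name, kind):
        if kind not in NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.node_type[name] = kind

    def add_edge(self, source, target, kind, weight=1.0):
        if kind not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {kind}")
        self.edges[source].append((target, kind, weight))

    def add_hyperlinks(self, page, targets):
        """Hyperlink weights are normalized by the out-degree of the page."""
        for t in targets:
            self.add_edge(page, t, "hyperlink", 1.0 / len(targets))
```

Extraction and mapping edges would be added with `add_edge`, passing the extractor's or mapper's confidence as the weight, e.g. `g.add_edge("Person 1 homepage", "Alon Levy", "extraction", 0.9)` with a made-up confidence value.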