Title: Graphical Models and Probabilistic Reasoning for Generating Linked Data from Tables
1. Graphical Models and Probabilistic Reasoning for Generating Linked Data from Tables
- Varish Mulwad (@varish), University of Maryland, Baltimore County
- Doctoral Consortium at ISWC 2011, October 24, 2011
- Guru: Dr. Tim Finin
2. What?
3. Contribution
http://dbpedia.org/class/yago/NationalBasketballAssociationTeams
dbprop:team
Name            Team          Position        Height
Michael Jordan  Chicago       Shooting guard  1.98
Allen Iverson   Philadelphia  Point guard     1.83
Yao Ming        Houston       Center          2.29
Tim Duncan      San Antonio   Power forward   2.11
http://dbpedia.org/resource/Allen_Iverson
Map literals as values of properties
4. Contribution
Name            Team          Position        Height
Michael Jordan  Chicago       Shooting guard  1.98
Allen Iverson   Philadelphia  Point guard     1.83
Yao Ming        Houston       Center          2.29
Tim Duncan      San Antonio   Power forward   2.11
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix yago: <http://dbpedia.org/class/yago/> .
"Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer .
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .
"Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan .
dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer .
"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls .
dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams .
All this in a completely automated way!
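As a rough illustration of the output step, the sketch below turns a set of column and cell annotations into Turtle statements like the ones on this slide. It is only a sketch under stated assumptions: the `column_classes` and `cell_entities` dictionaries and the `to_turtle` helper are hypothetical stand-ins for whatever the interpretation step would actually infer, and the output uses the standard `subject rdfs:label "..."@en` form rather than the `is ... of` form shown above.

```python
# Hypothetical sketch: serialize table annotations as Turtle.
# The annotation dictionaries are assumptions for illustration only.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "yago": "http://dbpedia.org/class/yago/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def to_turtle(column_classes, cell_entities):
    """Build Turtle text from column-to-class and cell-to-entity links."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    # Each column header becomes a label of the class chosen for that column.
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    # Each linked cell becomes a labelled instance of its column's class.
    for header, cells in cell_entities.items():
        cls = column_classes[header]
        for label, entity in cells.items():
            lines.append(f'{entity} rdfs:label "{label}"@en ;')
            lines.append(f"    a {cls} .")
    return "\n".join(lines)


column_classes = {
    "Name": "dbpedia-owl:BasketballPlayer",
    "Team": "yago:NationalBasketballAssociationTeams",
}
cell_entities = {
    "Name": {"Michael Jordan": "dbpedia:Michael_Jordan"},
    "Team": {"Chicago Bulls": "dbpedia:Chicago_Bulls"},
}

turtle = to_turtle(column_classes, cell_entities)
print(turtle)
```

Plain string formatting keeps the sketch dependency-free; a real pipeline would more likely use an RDF library to handle escaping and serialization.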
5. Why?
6. Tables are everywhere!! Yet ...
The web: 154 million high-quality relational tables [1]
389,697 raw and geospatial datasets; 0.071 in RDF
7. Current Systems
- Problems with systems on the Semantic Web
  - Require users to have knowledge of the Semantic Web
  - Do not automatically link to existing classes and entities on the Semantic Web / Linked Data cloud
  - RDF data in some cases is as useless as raw data
- Majority of the work focused on relational data where schema is available
8. How?
9. A Table Interpretation Framework
Linked Data
Probabilistic Graphical Model / Joint Inference
10. Joint Inference over evidence in a table
- Probabilistic Graphical Models
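11. A graphical model for tables
[Figure: graphical model for a table. Column header variables C1, C2, C3 (labelled "Class") and row value variables R11 ... R33 (labelled "Instance"), illustrated with the "Team" column and its values Chicago, Philadelphia, Houston, San Antonio.]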
12. Parameterized graphical model
[Figure: factor graph over column headers C1, C2, C3 and row values R11 ... R33]
- Variable nodes: column headers (C1, C2, C3) and row values (R11 ... R33)
- Factor node: function that captures the affinity between the column headers and row values
- Further factors capture interaction between row values and interaction between column headers
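To make the idea of joint inference concrete, here is a minimal sketch for a single "Team" column with two cells. The slides do not say which inference algorithm is used, so exhaustive enumeration stands in here purely for illustration; every score, candidate entity, and function name below is a hypothetical assumption, not a value from the actual system.

```python
from itertools import product

# Hypothetical toy problem: one column headed "Team" with two cells.
cells = ["Chicago", "Houston"]

# Assumed per-cell candidate entities with made-up string-match scores.
candidates = {
    "Chicago": {"Chicago": 0.6, "Chicago_Bulls": 0.4},
    "Houston": {"Houston": 0.6, "Houston_Rockets": 0.4},
}

# Assumed class of each candidate entity.
entity_class = {
    "Chicago": "City",
    "Houston": "City",
    "Chicago_Bulls": "BasketballTeam",
    "Houston_Rockets": "BasketballTeam",
}

# Assumed affinity between the header string "Team" and each candidate class.
header_affinity = {"City": 0.2, "BasketballTeam": 0.8}


def class_entity_factor(cls, entity):
    """Factor linking the column class to a row value (made-up weights)."""
    return 1.0 if entity_class[entity] == cls else 0.1


def joint_score(cls, entities):
    """Unnormalized product of all factors for one full assignment."""
    score = header_affinity[cls]
    for cell, entity in zip(cells, entities):
        score *= candidates[cell][entity] * class_entity_factor(cls, entity)
    return score


def map_assignment():
    """Enumerate every joint assignment and keep the highest-scoring one."""
    best = None
    for cls in header_affinity:
        for entities in product(*(candidates[cell] for cell in cells)):
            score = joint_score(cls, entities)
            if best is None or score > best[0]:
                best = (score, cls, entities)
    return best


score, best_class, best_entities = map_assignment()
print(best_class, best_entities, score)
```

Scored cell by cell, each value would be linked to the city because 0.6 beats 0.4. Scored jointly, the header and the shared column class pull both cells toward the basketball teams, which is the kind of interaction the factor nodes are meant to capture. Enumeration is only feasible for toy inputs like this one.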
13. Nice, but will it work?
14. Evaluation
- Dataset of > 6000 tables [2]
- Compare our accuracy against our baseline system and the results in [2]
- Use Mean Average Precision [3] to compare a ranked list of possible classes/entities against a ranked list obtained from human evaluators
- Experiment with datasets from www.data.gov
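For reference, Mean Average Precision can be computed along the lines of the sketch below. It assumes binary relevance, with the classes or entities chosen by human evaluators treated as the relevant set for each query; the function names and example rankings are hypothetical and not taken from the evaluation itself.

```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k that holds a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(queries):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical system rankings paired with hypothetical human judgments.
queries = [
    (["BasketballTeam", "City", "SportsTeam"], {"BasketballTeam", "SportsTeam"}),
    (["City", "BasketballPlayer"], {"BasketballPlayer"}),
]

map_score = mean_average_precision(queries)
print(round(map_score, 4))
```

In the first query the relevant items sit at ranks 1 and 3, giving (1/1 + 2/3) / 2; in the second the only relevant item is at rank 2, giving 1/2. The mean of the two is 2/3.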
15. References
- [1] Cafarella, M. J., Halevy, A., Wang, D. Z., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. Proc. VLDB Endow. 1(1), 538-549 (2008)
- [2] Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. 36th Int. Conf. on Very Large Databases (2010)
- [3] Salton, G., McGill, M. J.: Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York (1986)
16. Thank you!
- Questions?
- varish1@cs.umbc.edu, @varish
- Web: http://ebiq.org/h/Varish/Mulwad