Search Engines for Semantic Web Knowledge

1
Search Engines for Semantic Web Knowledge
  • Tim Finin
  • University of Maryland, Baltimore County
  • Joint work with Li Ding, Anupam Joshi, Yun Peng,
    Pranam Kolari, Pavan Reddivari, Sandor Dornbush,
    Rong Pan, Akshay Java, Joel Sachs, Scott Cost and
    Vishal Doshi

http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA
contract F30602-97-1-0215, NSF grants CCR007080
and IIS9875433, and grants from IBM, Fujitsu and
HP.
2
This talk
  • Motivation
  • Semantic web 101
  • Swoogle Semantic Web search engine
  • Use cases and applications
  • State of the Semantic Web
  • Conclusions

3
Once there were only a few large computers
4
Then there were many,
5
All connected 24x7,
  • Cellular telephony
  • RFID
  • 802.11
  • TCP/IP
  • Ultra-Wideband
  • Bluetooth
  • Software Radio
  • IrDA
6
Interoperating
  • tcp/ip ftp smtp
  • rpc corba ssh
  • http html
  • xml
  • gif jpg mpg mp3
  • pdf

7
Access to the world's knowledge
del.icio.us
8
Google has made us smarter
9
But what about our agents?
  • Agents still have a very minimal understanding of
    text and images.

10
This talk
  • Motivation
  • Semantic web 101
  • Swoogle Semantic Web search engine
  • Use cases and applications
  • State of the Semantic Web
  • Conclusions

11
XML helps
  • XML is Lisp's bastard nephew, with uglier syntax
    and no semantics. Yet XML is poised to enable the
    creation of a Web of data that dwarfs anything
    since the Library at Alexandria.
  • -- Philip Wadler, Et tu XML? The fall of
    the relational empire, VLDB, Rome, September
    2001.

12
Semantic Web adds semantics
  • The Semantic Web will globalize KR, just as the
    WWW globalized hypertext
  • -- Tim Berners-Lee

13
Semantic Web 101
  • RDF/XML
  • rdf:RDF tag
  • namespaces ? ontologies
  • Semantic graph, URIs as nodes and links
  • triples

14
Where's the semantics?
  • URIs as common rigid designators
  • Conventions let URIs denote things in the real
    world
  • Namespace URIs give an unambiguous shared
    vocabulary
  • RDF, RDFS and OWL have semantics defined using
    model theory and also axioms
  • Ontologies allow agents to draw inferences
  • uni:Student is a subclass of foaf:Person
  • Every uni:Student uni:attends at least one
    uni:School
  • A foaf:Person with a uni:school is necessarily a
    uni:Student
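
To make the inference bullets concrete, here is a minimal sketch of forward chaining over triples in plain Python. It is not how Swoogle or any OWL reasoner is implemented. It assumes the third statement can be approximated by an rdfs:domain declaration on uni:school (a simplification of the OWL restriction the slide describes), and the individuals ex:alice and ex:umbc are hypothetical.

```python
# Sketch only: two RDFS-style rules applied until no new facts appear.
# The uni: terms come from the slide; ex:alice and ex:umbc are made up.
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
TYPE = "rdf:type"


def infer(triples):
    """Return the closure of `triples` under two rules:
    (1) x rdf:type C  and  C rdfs:subClassOf D  =>  x rdf:type D
    (2) x p y         and  p rdfs:domain C      =>  x rdf:type C
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))
        if new <= facts:  # nothing new was derived, so we are done
            return facts
        facts |= new


ontology = {
    ("uni:Student", SUBCLASS, "foaf:Person"),
    # Assumption: "anything with a uni:school is a uni:Student".
    ("uni:school", DOMAIN, "uni:Student"),
}
data = {("ex:alice", "uni:school", "ex:umbc")}

closure = infer(ontology | data)
for fact in sorted(closure - ontology - data):
    print(fact)
```

An agent that only sees the single data triple can conclude that ex:alice is both a uni:Student and a foaf:Person, which is the kind of inference the shared ontology makes possible.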

15
Much of the RDF data will come from databases,
just like HTML content.
16
(No Transcript)
17
RDF/a
  • RDF/a is a W3C proposal for embedding RDF in
    XHTML documents

    <html xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <head><title>Jo Lambda's Home Page</title></head>
      <body>
        Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
        <h2>Work</h2>
        If you want to contact me at work, you can either
        <a rel="foaf:mbox" href="mailto:jo.lambda@example.org">email me</a>,
        or call <span property="foaf:phone">1 777 888 9999</span>.
      </body>
    </html>

An HTML document with RDF embedded

The triples in N-Triples-like notation:

    <> foaf:name "Jo Lambda"^^rdf:XMLLiteral ;
       foaf:mbox <mailto:jo.lambda@example.org> ;
       foaf:phone "1 777 888 9999"^^rdf:XMLLiteral .
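
As a rough illustration of how the triples fall out of the markup, the sketch below walks the page with Python's standard html.parser and reads the property and rel attributes. It is not a conforming RDF/a processor: it assumes a single subject (the document itself, written <>), ignores datatypes such as rdf:XMLLiteral, and assumes elements carrying property contain only text. The class name RDFaSketch is made up for this example.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collect (subject, predicate, object) from `property` and `rel` attributes."""

    def __init__(self, subject="<>"):
        super().__init__()
        self.subject = subject
        self.triples = []
        self._pending = None  # predicate whose literal text we are collecting
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "property" in a:
            # The literal value is the element's text content.
            self._pending = a["property"]
            self._text = []
        elif "rel" in a and "href" in a:
            # The object is the resource the link points at.
            self.triples.append((self.subject, a["rel"], "<" + a["href"] + ">"))

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested elements inside a `property` element.
        if self._pending is not None:
            literal = '"' + "".join(self._text).strip() + '"'
            self.triples.append((self.subject, self._pending, literal))
            self._pending = None


PAGE = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head><title>Jo Lambda's Home Page</title></head>
<body>
Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
<h2>Work</h2>
If you want to contact me at work, you can either
<a rel="foaf:mbox" href="mailto:jo.lambda@example.org">email me</a>,
or call <span property="foaf:phone">1 777 888 9999</span>.
</body>
</html>"""

parser = RDFaSketch()
parser.feed(PAGE)
parser.close()
triples = parser.triples
for t in triples:
    print(*t, ".")
```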
18
But what about our agents?
  • A Google for knowledge on the Semantic Web is
    needed by software agents and programs

19
This talk
  • Motivation
  • Semantic web 101
  • Swoogle Semantic Web search engine
  • Use cases and applications
  • State of the Semantic Web
  • Conclusions

20
  • http://swoogle.umbc.edu/
  • Running since summer 2004
  • 1.4M RDF documents, 250M RDF triples, 10K
    ontologies

21
Swoogle Architecture
22
A Hybrid Harvesting Framework
(Diagram labels: Swoogle Sample Dataset; Manual submission; Inductive learner; Seeds R, Seeds M, Seeds H; RDF crawling; Bounded HTML crawling; Meta crawling via Google API calls; the Web)
23
Performance: Site Coverage
  • SW06MAR - Basic statistics (Mar 31, 2006)
  • 1.3M SWDs (Semantic Web documents) from 157K websites
  • 268M triples
  • 61K SWOs (Semantic Web ontologies), including >10K of high quality
  • 1.4M SWTs (Semantic Web terms) using 12K namespaces
  • Significance
  • Compare with existing works (DAML crawler,
    scutter)
  • Compare SW06MAR with Google's estimated SWDs

(Chart: SWDs per website)
24
Performance: crawlers' contribution
  • High SWD ratio: 42% of URLs are confirmed as SWDs
  • Consistent growth rate: 3000 SWDs per day
  • RDF crawler: best harvesting method
  • HTML crawler: best accuracy
  • Meta crawler: best at detecting websites

(Chart: number of documents)
25
This talk
  • Motivation
  • Semantic web 101
  • Swoogle Semantic Web search engine
  • Use cases and applications
  • State of the Semantic Web
  • Conclusions

26
Applications and use cases
  • Supporting Semantic Web developers
  • Ontology designers, vocabulary discovery, "who's
    using my ontologies or data?", use analysis,
    errors, statistics, etc.
  • Searching specialized collections
  • Spire: aggregating observations and data from
    biologists
  • InferenceWeb: searching over and enhancing proofs
  • SemNews: text meaning of news stories
  • Supporting SW tools
  • Triple Shop: finding data for SPARQL queries

27
(No Transcript)
28
80 ontologies were found that had these three
terms
By default, ontologies are ordered by their
popularity, but they can also be ordered by
recency or size.
Let's look at this one
29
Basic Metadata
  • hasDateDiscovered: 2005-01-17
  • hasDatePing: 2006-03-21
  • hasPingState: PingModified
  • type: SemanticWebDocument
  • isEmbedded: false
  • hasGrammar: RDFXML
  • hasParseState: ParseSuccess
  • hasDateLastmodified: 2005-04-29
  • hasDateCache: 2006-03-21
  • hasEncoding: ISO-8859-1
  • hasLength: 18K
  • hasCntTriple: 311.00
  • hasOntoRatio: 0.98
  • hasCntSwt: 94.00
  • hasCntSwtDef: 72.00
  • hasCntInstance: 8.00
30
(No Transcript)
31
(No Transcript)
32
These are the namespaces this ontology uses.
Clicking on one shows all of the documents using
the namespace.
All of this is available in RDF form for the
agents among us.
33
Here's what the agent sees. Note the swoogle and
wob (web of belief) ontologies.
34
We can also search for terms (classes,
properties) like terms for person.
35
10K terms associated with person! Ordered by
use.
Let's look at foaf:Person's metadata
45
UMBC Triple Shop
  • http://sparql.cs.umbc.edu/
  • Online SPARQL RDF query processing based on HP's
    Jena and Joseki, with several interesting features
  • Selectable level of inference over the model
  • Automatically finds SWDs for given queries using
    the Swoogle backend database
  • Provides a dataset creation wizard
  • Datasets can be stored on our server or downloaded
  • Tag, share and search over saved datasets

46
Web-scale semantic web data access
(Diagram: an agent, a data access service that indexes RDF data, and the Web)
  • Search vocabulary: the agent asks (person); the service
    searches URIrefs in the SW vocabulary and informs (foaf:Person)
  • Compose query: the agent asks (?x rdf:type foaf:Person); the
    service searches URLs in the SWD index and informs (doc URLs)
  • Populate RDF database: the agent fetches the docs
  • Query local RDF database
Who knows Anupam Joshi? Show me their names,
email addresses and pictures
48
The UMBC ebiquity site publishes lots of RDF
data, including FOAF profiles
49
(No Transcript)
50
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?p2name ?p2mbox ?p2pix
    WHERE {
      ?p1 foaf:name "Anupam Joshi" .
      ?p1 foaf:mbox ?p1mbox .
      ?p2 foaf:knows ?p3 .
      ?p3 foaf:mbox ?p1mbox .
      ?p2 foaf:name ?p2name .
      ?p2 foaf:mbox ?p2mbox .
      OPTIONAL { ?p2 foaf:depiction ?p2pix . }
    }
    ORDER BY ?p2name
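
To show what this query asks for, here is a toy evaluation of the same graph pattern in plain Python. It is a sketch, not a SPARQL engine and not how Jena or Joseki work. Apart from the name literal taken from the query, the data is invented (example.org addresses, "Person A", "Person B"). Note how ?p3 is matched to ?p1 through a shared foaf:mbox, so a document that only mentions the person by email address still counts.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings (dicts) that satisfy every triple pattern.
    Terms starting with '?' are variables; other terms must match exactly."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        b = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if b.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, b)


# Hypothetical data, as if merged from several FOAF documents.
DATA = [
    ("ex:aj", "foaf:name", "Anupam Joshi"),
    ("ex:aj", "foaf:mbox", "mailto:aj@example.org"),
    ("_:b1", "foaf:mbox", "mailto:aj@example.org"),  # same person, other file
    ("ex:a", "foaf:knows", "_:b1"),
    ("ex:a", "foaf:name", "Person A"),
    ("ex:a", "foaf:mbox", "mailto:a@example.org"),
    ("ex:a", "foaf:depiction", "http://example.org/a.jpg"),
    ("ex:b", "foaf:knows", "ex:aj"),
    ("ex:b", "foaf:name", "Person B"),
    ("ex:b", "foaf:mbox", "mailto:b@example.org"),
]

REQUIRED = [
    ("?p1", "foaf:name", "Anupam Joshi"),
    ("?p1", "foaf:mbox", "?p1mbox"),
    ("?p2", "foaf:knows", "?p3"),
    ("?p3", "foaf:mbox", "?p1mbox"),
    ("?p2", "foaf:name", "?p2name"),
    ("?p2", "foaf:mbox", "?p2mbox"),
]

rows = set()  # a set gives us DISTINCT
for b in match(REQUIRED, DATA):
    pix = [o for s, p, o in DATA if s == b["?p2"] and p == "foaf:depiction"]
    for value in pix or [None]:  # OPTIONAL: keep the row without a picture
        rows.add((b["?p2name"], b["?p2mbox"], value))

results = sorted(rows, key=lambda r: r[0])  # ORDER BY ?p2name
for row in results:
    print(row)
```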
51
(No Transcript)
52
Swoogle found 292 RDF data files that appear
relevant to answering our query
53
Let's save the dataset before we use it
54
(No Transcript)
55
And tag it so we and others can find it more
easily.
56
Here we are using it to get an answer to "Who
knows Anupam Joshi?"
57
He has many friends!
58
(No Transcript)
59
This talk
  • Motivation
  • Semantic web 101
  • Swoogle Semantic Web search engine
  • Use cases and applications
  • State of the Semantic Web
  • Conclusions

60
Will it Scale? How?
  • Here's a rough estimate of the data in RDF
    documents on the semantic web based on Swoogle's
    crawling

System/date   Terms      Documents  Individuals  Triples    Bytes
Swoogle2      1.5x10^5   3.5x10^5   7x10^6       5x10^7     7x10^9
Swoogle3      2x10^5     7x10^5     1.5x10^7     7.5x10^7   1x10^10
2006          1x10^6     5x10^7     5x10^7       5x10^9     5x10^11
2008          5x10^6     5x10^9     5x10^9       5x10^11    5x10^13

We think Swoogle's centralized approach can be
made to work for the next few years, if not longer.
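
A quick way to read the table is to derive a few ratios from it. The sketch below copies the estimates from the table and computes triples per document, bytes per triple, and the growth in triples from Swoogle3 to the 2008 projection. The derived quantities are illustrative choices for this example, not figures from the talk.

```python
# Estimates copied from the table:
# (terms, documents, individuals, triples, bytes)
ESTIMATES = {
    "Swoogle2": (1.5e5, 3.5e5, 7e6, 5e7, 7e9),
    "Swoogle3": (2e5, 7e5, 1.5e7, 7.5e7, 1e10),
    "2006": (1e6, 5e7, 5e7, 5e9, 5e11),
    "2008": (5e6, 5e9, 5e9, 5e11, 5e13),
}


def ratios(row):
    """Derive per-document and per-triple averages from one table row."""
    terms, docs, individuals, triples, size = row
    return {
        "triples_per_doc": triples / docs,
        "bytes_per_triple": size / triples,
    }


summary = {name: ratios(row) for name, row in ESTIMATES.items()}

# How many times more triples the 2008 projection holds than Swoogle3.
growth_in_triples = ESTIMATES["2008"][3] / ESTIMATES["Swoogle3"][3]

for name, r in summary.items():
    print(f"{name:9s} {r['triples_per_doc']:8.1f} triples/doc "
          f"{r['bytes_per_triple']:6.1f} bytes/triple")
print(f"2008 vs Swoogle3 triples: x{growth_in_triples:.0f}")
```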
61
How much reasoning?
  • SwoogleN (N<=3) does limited reasoning
  • It's expensive
  • It's not clear how much should be done
  • More reasoning would benefit many use cases
  • e.g., type hierarchy
  • Recognizing specialized metadata
  • e.g., that ontology A maps some terms from B to C

62
This talk
  • Motivation
  • Semantic web 101
  • Swoogle Semantic Web search engine
  • Use cases and applications
  • State of the Semantic Web
  • Conclusions

63
Conclusion
  • The web will contain the world's knowledge in
    forms accessible to people and computers
  • We need better ways to discover, index, search
    and reason over SW knowledge
  • SW search engines address different tasks than
    HTML search engines
  • So they require different techniques and APIs
  • Swoogle-like systems can help create consensus
    ontologies and foster best practices
  • Swoogle is for Semantic Web 1.0
  • Semantic Web 2.0 will make different demands

64
For more information
http://ebiquity.umbc.edu/
Annotated in OWL
65
backup