Title: Semantic Search


1
Semantic Search
  • R. Guha IBM Almaden
  • Rob McCool Stanford Univ.
  • Eric Miller W3C/MIT

2
Outline
  • Introduction
  • Assumptions
  • Aspects of the Semantic Web
  • TAP
  • GetData
  • Semantic Search
  • Denotation
  • What to Show
  • Formatting
  • Text Search
  • Evaluation
  • Conclusion

3
Introduction
  • Semantic Search uses the well-defined meaning
    given to information on the Semantic Web to
    augment traditional Information Retrieval (IR)
    searches
  • The focus of this paper is to demonstrate the
    implementation of Semantic Search
  • The scope of the paper is the subset of the web
    that pertains to IR
  • Two kinds of searches: navigational and research

4
Assumptions
  • The Semantic Web has well-defined meaning for the
    information it contains
  • The resources on the Semantic Web have
    relationships between each other
  • The Semantic Web is modeled as a directed labeled
    graph
  • RDF is used to describe resources and
    relationships
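
The directed labeled graph model can be sketched as a list of (source, arc label, target) triples. This is a minimal illustration, not the authors' data model; every ex: name below is hypothetical.

```python
from collections import defaultdict

# Each triple is (source node, arc label, target node).
# All identifiers are hypothetical placeholders, not real TAP data.
TRIPLES = [
    ("ex:PersonA", "ex:worksFor", "ex:OrgB"),
    ("ex:PersonA", "ex:livesIn", "ex:CityC"),
    ("ex:OrgB", "ex:locatedIn", "ex:CityC"),
]


def build_graph(triples):
    """Index triples as source -> arc label -> list of targets."""
    graph = defaultdict(lambda: defaultdict(list))
    for source, arc, target in triples:
        graph[source][arc].append(target)
    return graph


graph = build_graph(TRIPLES)
```

Indexing by source and then by arc label makes "which values does this resource have for this property" a direct lookup.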

5
Assumptions
[Figure: a segment of the Semantic Web pertaining to Eric Miller]

6
Aspects of the Semantic Web
  • Documents vs. Real World Objects
  • Human information vs. Machine information
  • The relationship between HTML and the Semantic
    Web
  • Distributed Extensibility
  • Trustworthiness of the data

7
TAP
  • TAP enables sites to publish data onto the
    Semantic Web and applications to use that data
  • TAP scraping retrieves information from HTML
    sites and converts it into machine-understandable
    data
  • TAPache was used to publish data onto their
    Semantic Web, ABS
  • TAP has a query interface called GetData

8
GetData
  • A lightweight query language
  • Unpredictable users
  • Good for aggregating data from multiple sites
  • It is not an expressive query language
  • Built on SOAP
  • Designed to query directed labeled graphs

9
GetData
  • Queries take two arguments: the resource whose
    properties are to be accessed (a node) and the
    property to be accessed (an arc)
  • GetData(<resource>, <property>)
  • Returned from the query is a graph with resources
    as URIs

10
GetData
  • Example
  • GetData (, livesIn),
  • http://tap.stanford.edu/data/CityDublin,_Ohio
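
A rough local stand-in for a GetData-style lookup might look like the following. The real interface is a SOAP service; here `get_data` is a hypothetical in-memory function, and the person URI is an invented placeholder rather than a real TAP resource.

```python
# Hypothetical in-memory triples; only the city URI comes from the slide.
GRAPH = [
    ("http://example.org/data/PersonA", "livesIn",
     "http://tap.stanford.edu/data/CityDublin,_Ohio"),
    ("http://example.org/data/PersonA", "name", "Person A"),
]


def get_data(graph, resource, prop):
    """Return every value reached from `resource` along an arc labeled `prop`."""
    return [target for source, arc, target in graph
            if source == resource and arc == prop]


values = get_data(GRAPH, "http://example.org/data/PersonA", "livesIn")
```

A query names one node and one arc label, and the answer is the list of nodes (as URIs) or literals at the other end of the matching arcs.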

11
Semantic Search
  • ABS contains data from many sites, such as
    AllMusic, Ebay, Amazon, AOL Shopping,
    TicketMaster, Weather.com and Mapquest
  • There are millions of triples in the ABS Semantic
    Web
  • TAP knowledge base has a broad range of domains
    including people, places, organizations, and
    products
  • Resources have an rdf:type and an rdfs:label

12
Semantic Search
  • W3C Semantic Search has five different data
    sources: People, Activities, Working Groups,
    Documents, and News
  • Both the ABS and W3C Semantic Search have a basic
    ontology about people, places, events,
    organizations, vocabulary terms, etc.
  • The plan is to augment a traditional search with
    data from the Semantic Web

13
Semantic Search
14
Semantic Search
  • A secondary objective is to use an understanding
    of the denotation to improve traditional search;
    this is covered under Text Search
  • Semantic Search is built as a client of TAP
  • Search terms are mapped to nodes on the Semantic
    Web

15
Semantic Search
  • Three problems facing Semantic Search
  • Denotation
  • What to Show
  • Presentation

16
Denotation
  • Need to determine the concept denoted by the
    search query
  • Ways to overcome ambiguity:
  • Popularity of the search term
  • A user profile
  • Search context
  • Offer other denotations to the user

17
Denotation
  • Complex Search Terms
  • Example: eric miller rdf (the words can be
    grouped into candidate terms in several ways)
  • Restricted to two denotations
  • Nothing will be contributed to the traditional
    search if terms do not denote anything known to
    the Semantic Web

18
What to Show
  • What data should be included in the results and
    in what order?
  • The selected node or nodes, which correspond to
    the denotation of the term, are the anchor nodes
    (a combination of terms has two anchor nodes)
  • A subgraph must then be selected to be returned

19
What to Show
Approach 1
  • The graph is selected by walking through the
    graph in breadth first order starting at the
    anchor node
  • The first N triples are collected (N is some
    predefined limit)
  • Included are at most R triples with the same
    source and arc label (R is computed based on the
    average branching factor)

20
What to Show
Approach 1 cont.
  • Included in the sub graph are at most M triples
    with the same source (M is computed based on the
    bushiness or can be a function of the distance
    from the anchor node)
  • Advantages: no hand coding by the user;
    incorporates information about the anchor and
    its neighbors
  • Disadvantages: sensitive to representation on
    the Semantic Web; ignores the search context
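
Approach 1 can be sketched as a bounded breadth-first walk. This is a simplified illustration: N, R, and M are passed in as fixed numbers here rather than computed from the branching factor or bushiness, and the graph and function name are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical graph: source -> list of (arc label, target).
GRAPH = {
    "ex:Anchor": [("ex:knows", "ex:A"), ("ex:knows", "ex:B"),
                  ("ex:knows", "ex:C"), ("ex:livesIn", "ex:City")],
    "ex:A": [("ex:livesIn", "ex:City")],
    "ex:City": [("ex:locatedIn", "ex:State")],
}


def select_subgraph(graph, anchor, n, r, m):
    """Breadth-first walk from `anchor` keeping at most `n` triples overall,
    `r` per (source, arc label) pair, and `m` per source."""
    selected = []
    per_source = defaultdict(int)
    per_source_arc = defaultdict(int)
    visited = {anchor}
    queue = deque([anchor])
    while queue and len(selected) < n:
        source = queue.popleft()
        for arc, target in graph.get(source, []):
            if len(selected) >= n:
                break
            if per_source[source] >= m or per_source_arc[(source, arc)] >= r:
                continue  # this source or this arc label is already full
            selected.append((source, arc, target))
            per_source[source] += 1
            per_source_arc[(source, arc)] += 1
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return selected


subgraph = select_subgraph(GRAPH, "ex:Anchor", n=4, r=2, m=3)
```

With R set to 2, the third ex:knows arc is skipped, which leaves room for a different property of the anchor and for one triple from a neighbor.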

21
What to Show
Approach 2
  • Manually specify the set of properties that
    should be gathered
  • Walk through the graph breadth first starting at
    the anchor node, collecting values for the
    properties until N triples have been collected (N
    is some predefined number)
  • Advantages: able to specify certain properties;
    able to factor in search context
  • Disadvantages: takes time to set up; does not
    incorporate information about the anchor node
    and its neighbors
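
Approach 2 can be sketched the same way, except that only hand-picked properties are collected. As an assumption of this sketch, the walk still follows every arc to reach further nodes and only filters what it collects; the property names and graph are hypothetical.

```python
from collections import deque

# Hypothetical graph: source -> list of (arc label, target).
GRAPH = {
    "ex:Anchor": [("ex:name", "Anchor Name"), ("ex:worksFor", "ex:Org"),
                  ("ex:shoeSize", "42")],
    "ex:Org": [("ex:name", "Org Name"), ("ex:foundedIn", "1990")],
}


def collect_properties(graph, anchor, wanted, n):
    """Breadth-first walk from `anchor`, keeping only triples whose arc label
    is in `wanted`, until `n` triples have been collected."""
    selected = []
    visited = {anchor}
    queue = deque([anchor])
    while queue and len(selected) < n:
        source = queue.popleft()
        for arc, target in graph.get(source, []):
            if arc in wanted and len(selected) < n:
                selected.append((source, arc, target))
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return selected


result = collect_properties(GRAPH, "ex:Anchor", {"ex:name", "ex:worksFor"}, n=5)
```

The set of wanted properties is where search context could be factored in, for example by choosing a different set per kind of query.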

22
What to Show
  • A hybrid approach would have the best results
  • The second approach is applied first; then, if N
    triples were not collected, the first approach
    is applied to properties not previously
    examined
  • The triples are ordered first by their distance
    from the anchor and second by the arc label
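
The ordering rule can be sketched as a two-part sort key. This assumes that a triple's distance is the breadth-first distance of its source node from the anchor, which the slides do not spell out; the graph and names are hypothetical.

```python
from collections import deque

# Hypothetical graph: source -> list of (arc label, target).
GRAPH = {
    "ex:Anchor": [("ex:worksFor", "ex:Org"), ("ex:livesIn", "ex:City")],
    "ex:Org": [("ex:locatedIn", "ex:City")],
}


def distances_from(graph, anchor):
    """Breadth-first distance (in arcs) of every reachable node from `anchor`."""
    dist = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for _arc, target in graph.get(node, []):
            if target not in dist:
                dist[target] = dist[node] + 1
                queue.append(target)
    return dist


def order_triples(triples, graph, anchor):
    """Sort triples by distance of their source from the anchor, then by arc label."""
    dist = distances_from(graph, anchor)
    return sorted(triples, key=lambda t: (dist.get(t[0], float("inf")), t[1]))


collected = [
    ("ex:Org", "ex:locatedIn", "ex:City"),
    ("ex:Anchor", "ex:worksFor", "ex:Org"),
    ("ex:Anchor", "ex:livesIn", "ex:City"),
]
ordered = order_triples(collected, GRAPH, "ex:Anchor")
```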

23
Formatting
  • Results are displayed using a set of templates
  • Each class of object has an associated set of
    templates
  • Each template specifies the class, the
    properties, and an HTML template
  • A template is identified for each node in the
    ordered list and the HTML is generated
  • The HTML is included in the results page
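
Template-based formatting can be sketched with Python's standard `string.Template`. As a simplification, this sketch keeps one template per class instead of a set, and the class names, properties, and markup are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical registry: class -> (properties to fill in, HTML template).
TEMPLATES = {
    "ex:Person": (
        ["name", "worksFor"],
        Template('<div class="person"><b>$name</b>, works for $worksFor</div>'),
    ),
    "ex:Document": (
        ["title"],
        Template('<div class="document"><i>$title</i></div>'),
    ),
}


def render(node_class, values):
    """Fill the template registered for `node_class` with HTML-escaped values."""
    props, template = TEMPLATES[node_class]
    safe = {prop: escape(str(values.get(prop, ""))) for prop in props}
    return template.substitute(safe)


html_fragment = render("ex:Person", {"name": "A & B", "worksFor": "Org"})
```

Rendering each node in the ordered list this way yields HTML fragments that can be concatenated into the results page.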

24
Formatting
25
Text Search
  • Most searches are about a relatively small number
    of categories, such as person, place,
    organization, document, etc.
  • Knowing the category greatly increases accuracy
  • Applying type to the category narrows the search
    further
  • This needs further research; the currently
    proposed solution is to offer an example from
    each category and have the user select the
    context they want

26
Evaluation
  • Testing is being conducted on the W3C website
  • Individuals searching on the website are
    anonymously chosen at random
  • Their results are augmented with the W3C Semantic
    Web Search
  • User activity is recorded, in particular whether
    a user clicks on a link supplied by Semantic
    Search or a link from the traditional search

27
Conclusion
  • Research searches are well suited to exploiting
    the information on the Semantic Web
  • More work is needed on Text Search
  • More work is needed on denotation

28
My Thoughts
  • It may be an unachievable goal to have a computer
    know what we are looking for when what we are
    looking for is not easily expressed and
    categorized.
  • An achievable goal may be to have searches filter
    out the junk, leaving us with a minimal number
    of resources to sift through manually
  • The authors were thorough and moderately easy to
    follow

29
References
  • Semantic Search, R. Guha, Rob McCool, Eric
    Miller, 2003
  • http://www2003.org/cdrom/papers/refereed/p779/ess.html
  • TAP
  • http://tap.stanford.edu/