Inconsistent Data on the Semantic Web A Theoretical Approach Brian Goodrich - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Inconsistent Data on the Semantic Web: A
Theoretical Approach
Brian Goodrich
2
The Problem
  • A computer application has a set of inputs and a
    set of outputs based upon those inputs and its
    internal logic.
  • If an application is given input data that
    causes a conflicted state in deciding its output,
    it will crash unless it has some kind of logic by
    which to resolve that conflict.
  • The Semantic Web is based on being able to parse
    human intent from structured, semi-structured,
    and unstructured data on the Web.
  • Human intent is frequently conflicting.

3
  • Conflicting Data Sources
  • Malicious (deceptive or rerouting attempts) or
    just ignorantly incorrect information
  • Incomplete Information: having insufficient
    context or simply unfinished data
  • Humor: especially sarcasm, satire and
    exaggeration (e.g. political cartoons)
  • Time: what once was one thing is now another
    (e.g. quality of service, price, etc.)
  • Ontological Deficiency: when the extraction
    ontology lacks sufficient vividness to separate
    data appropriately

4
Solution
Fast: Maintains the current speed of the Web.
Accurate: Makes correct decisions about which data to rely on.
Dynamic: Keeps pace with change on the Web.
5
Thesis
  • To propose a method for simplifying the task of
    dealing with conflicting data on the Semantic Web
    in a fast, accurate and dynamic way by supplying
    each web source with a derived indicator of its
    communal usage, called a Consensual Reliability
    Score (CRS).

6
Methods
(a·z) + (b·y) + (c·x) + (d·w) = CRS(f)
  • Formula for deriving the CRS from inputs a, b, c
    and d.
  • z, y, x and w are the weighting constants applied
    to a, b, c and d respectively.
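
The formula can be sketched in a few lines of Python. This is a minimal illustration, assuming the CRS is a plain weighted sum of the four inputs; the function name crs, the default weights and the sample numbers are all hypothetical, since the slides leave the weights to further experimentation.

```python
def crs(a, b, c, d, z=1.0, y=1.0, x=1.0, w=1.0):
    """Combine the four inputs into one score (assumed weighted sum).

    a: site type mining input     (weight z)
    b: incoming index input       (weight y)
    c: usage mining input         (weight x)
    d: direct survey input        (weight w)
    """
    return a * z + b * y + c * x + d * w


# Hypothetical inputs and weights, chosen only to show the call.
score = crs(0.5, 0.8, 0.6, 0.9, z=0.1, y=0.4, x=0.2, w=0.3)
print(score)
```

Keeping the weights as parameters makes it easy to re-run the same inputs while experimenting with different values of z, y, x and w.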

7
Site Type Mining (a z)
  • Five types of Web Pages
  • Head Pages
  • Navigation Pages
  • Content Pages
  • Look-up Pages
  • Personal Pages
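
One way to carry the five page types into the formula is an enumeration plus a lookup table. The sketch below is illustrative only: the slides do not say how a page type becomes the input a, so the numeric scores and the names PageType, SITE_TYPE_SCORE and site_type_input are assumptions.

```python
from enum import Enum


class PageType(Enum):
    """The five page types listed on the slide."""
    HEAD = "head"
    NAVIGATION = "navigation"
    CONTENT = "content"
    LOOK_UP = "look-up"
    PERSONAL = "personal"


# Hypothetical scores: placeholders, not values from the presentation.
SITE_TYPE_SCORE = {
    PageType.HEAD: 0.6,
    PageType.NAVIGATION: 0.4,
    PageType.CONTENT: 1.0,
    PageType.LOOK_UP: 0.8,
    PageType.PERSONAL: 0.2,
}


def site_type_input(page_type):
    """Return the (assumed) value of input a for a classified page."""
    return SITE_TYPE_SCORE[page_type]


print(site_type_input(PageType.CONTENT))
```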

8
Incoming Index (b y)
  • Distributed web crawler that counts hyperlinks
    then traverses the unique hyperlink paths,
    looking for additional links.
  • Link counts are stored in a hash indexed by the
    destination of the hyperlinks.
  • Provides a dynamic count of how often the
    internet as a whole points to a given web
    source, and therefore an indication of how often
    people use that web source.
  • Excludes orphan sites (mostly personal sites and
    spam pop-ups)
  • Based on the success of the Google search engine
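
The counting step can be sketched on a toy link graph. This is a single-process illustration under stated assumptions, not the distributed crawler described above: the web is modelled as an in-memory dict from each page to its outgoing links, and the names incoming_index, toy_web and the example hosts are hypothetical.

```python
from collections import Counter, deque


def incoming_index(link_graph, seeds):
    """Crawl outward from the seed pages and count incoming links.

    Returns a Counter keyed by link destination, mirroring the
    hash indexed by hyperlink destination.
    """
    counts = Counter()
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        # Count each distinct outgoing link of a page once.
        for dest in set(link_graph.get(page, ())):
            counts[dest] += 1
            if dest not in visited:
                visited.add(dest)
                queue.append(dest)
    return counts


toy_web = {
    "hub.example": ["news.example", "shop.example"],
    "news.example": ["shop.example"],
    "shop.example": ["news.example"],
    "orphan.example": ["news.example"],  # nothing links here
}
index = incoming_index(toy_web, seeds=["hub.example"])
print(index.most_common())
```

Because the crawl only follows links, a page that nothing points to is never reached from the seeds, which is how orphan sites drop out of the index in this sketch.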

9
Usage Mining (c x)
  • Most straightforward way of testing how often
    people use a web source: query sites for their
    hit counts, i.e. how many people have seen this
    site?
  • Problem: unlike the Incoming Index method, it
    does not exclude orphan sites.
  • Further experimentation is needed to determine
    x's weight.

10
Direct Survey (d w)
  • Most reliable method of determining reliability:
    manually query users directly.
  • Too slow and costly to be considered a complete
    solution, but can assist in CRS derivation.
    Should help offset frequently visited sites that
    carry no true information (onion.com, humor, etc.)
  • More experimentation is needed to determine w's
    weight.

11
Review
(a·z) + (b·y) + (c·x) + (d·w) = CRS(f)
12
Classical content data mining is not applicable
in this case (CRS derivation) because it is the
content of the web sources that is in
question. -Brian Goodrich
13
Storage
  • Global Index
      • Fast access
      • Centralized storage for CRSBot
      • Centralized vulnerability
      • Vital non-distributed resource in a
        distributed system
  • Local Storage
      • Non-centralized vulnerability
      • Non-unified derivation formula (disrupts
        trust algorithm)
  • Local Derivation
      • Too slow to be useful (problem size too large)

14
Related Work
  • Tim Berners-Lee
  • There is a choice here, and I am not sure right
    now which appeals to me most. One is to say
    precisely,
  • "whatever any document says of the form xxxx is a
    member of W3C so long as it is signed with key
    32457934759432".
  • The other is to say,
  • "whatever is of form xxxx and can be inferred
    from information signed with key 32457934759432".
  • There are problems with both choices: both use
    static references in a dynamic environment (the
    Web).

15
Contributions
  • CRS provides a fast and accurate measure of
    community consensus on the web.
  • Allows reliable decisions between conflicting
    data on the Web, fine-tuning the results from
    the Semantic Web.

16
Limitations
  • Totally reliant on usage patterns of the
    internet, which may not always reflect which data
    is more correct.
  • Reflects only consensus about a data source, not
    about the actual data contained in it.
  • Cannot express complex or compound relationships
    or extract partial truths.

17
Questions?