Inconsistent Data on the Semantic Web A Theoretical Approach Brian Goodrich - PowerPoint PPT Presentation

1
Inconsistent Data on the Semantic Web: A Theoretical Approach
Brian Goodrich
2
The Problem

A computer application has a set of inputs and a
set of outputs determined by those inputs and its
internal logic.
If an application is given input data that puts it
in a conflicted state when deciding its output, it
will crash unless it has some logic for resolving
that conflict.
The Semantic Web is based on being able to parse
human intent from structured, semi-structured,
and unstructured data on the Web.
Human intent is frequently conflicting.

3
Conflicting Data Sources
Malicious - deceptive or rerouting attempts, or
just ignorantly incorrect information
Incomplete Information - insufficient context or
simply unfinished data
Humor - especially sarcasm, satire and
exaggeration (e.g. political cartoons)
Time - what was once one thing is now another
(e.g. quality of service, price, etc.)
Ontological Deficiency - the extraction ontology
lacks sufficient vividness to separate the data
appropriately.

4
Solution
Fast - maintains the current speed of the Web.
Accurate - makes correct decisions about which data to rely on.
Dynamic - keeps pace with change on the Web.
5
Thesis

To propose a method for simplifying the task of
dealing with conflicting data on the Semantic Web
in a fast, accurate and dynamic way by supplying
each web source with a derived indicator of its
communal usage, called a Consensual Reliability
Score (CRS).

6
Methods
$$ \mathrm{CRS}(f) = az + by + cx + dw $$

Formula for deriving the CRS from inputs a, b, c and d,
with weighting constants z, y, x and w.
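A minimal sketch of the formula, assuming the operators lost from the slide were a weighted sum (the slide itself only names inputs a, b, c, d and weights z, y, x, w). The default weight values are placeholders, not values from the presentation, which says the weights still need experimentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrsWeights:
    """Weighting constants from the slide; the default values are placeholders."""
    z: float = 0.25  # weight for site type mining input a
    y: float = 0.25  # weight for incoming index input b
    x: float = 0.25  # weight for usage mining input c
    w: float = 0.25  # weight for direct survey input d


def crs(a: float, b: float, c: float, d: float,
        wts: CrsWeights = CrsWeights()) -> float:
    """Consensual Reliability Score as a weighted sum (assumed operators)."""
    return a * wts.z + b * wts.y + c * wts.x + d * wts.w


# Hypothetical normalized inputs in [0, 1] for a single web source.
print(crs(a=0.8, b=0.6, c=0.4, d=0.2))
```

Keeping the weights in one frozen object makes it easy to swap in the values that later experimentation produces without touching the formula.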

7
Site Type Mining (input a, weight z)

Five types of Web pages:
Head Pages
Navigation Pages
Content Pages
Lookup Pages
Personal Pages
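The slide lists the five page types but does not say how a mined type becomes the numeric input a. One hypothetical approach is a lookup table from page type to score; the scores below are invented placeholders for illustration only.

```python
from enum import Enum


class PageType(Enum):
    """The five page types listed on the slide."""
    HEAD = "head"
    NAVIGATION = "navigation"
    CONTENT = "content"
    LOOKUP = "lookup"
    PERSONAL = "personal"


# Hypothetical per-type scores used as input `a`; not from the presentation.
TYPE_SCORE = {
    PageType.HEAD: 0.7,
    PageType.NAVIGATION: 0.5,
    PageType.CONTENT: 0.9,
    PageType.LOOKUP: 0.6,
    PageType.PERSONAL: 0.2,
}


def site_type_input(page_type: PageType) -> float:
    """Return input `a` for a page whose type has already been mined."""
    return TYPE_SCORE[page_type]


print(site_type_input(PageType.CONTENT))  # 0.9
```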

8
Incoming Index (input b, weight y)

A distributed web crawler counts hyperlinks, then
traverses the unique hyperlink paths, looking for
additional links.
Link counts are stored in a hash table keyed by
the destination of each hyperlink.
Provides a dynamic count of how often the
internet as a whole points to a given web
source, and therefore an indication of how often
people use that source.
Excludes orphan sites (mostly personal sites and
spam pop-ups).
Modeled on the success of the Google search engine.
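A single-process sketch of the Incoming Index, standing in for the distributed crawler. It assumes the web is available as an in-memory link graph (a dict from page to its outgoing hyperlinks) instead of live fetching; the function and example URLs are hypothetical. Counts live in a hash keyed by hyperlink destination, each page is traversed once, and a site that no crawled page links to never becomes a destination, so it gets no count.

```python
from collections import Counter, deque


def incoming_index(link_graph: dict[str, list[str]], seeds: list[str]) -> Counter:
    """Count hyperlinks per destination while traversing each page once.

    link_graph maps a page URL to the URLs it links to (hypothetical input).
    """
    counts: Counter = Counter()  # hash keyed by hyperlink destination
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for dest in link_graph.get(page, []):
            counts[dest] += 1  # every hyperlink pointing at dest is counted
            if dest not in seen:  # follow only unique, unvisited paths
                seen.add(dest)
                queue.append(dest)
    return counts


graph = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "orphan.example": ["c.example"],  # nothing links here and it is not a seed
}
print(incoming_index(graph, ["a.example"]))
```

Because orphan.example is never reached, its outgoing link to c.example is not counted either, which matches the slide's point that orphan sites are excluded.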

9
Usage Mining (input c, weight x)

The most straightforward approach to testing how
often people use a web source: query sites for
their number of hits (how many people have seen
this site?).
Problem: unlike the Incoming Index method, it does
not exclude orphan sites.
Further experimentation is needed to determine
x's weight.
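A minimal sketch of usage mining, assuming hit data arrives as access-log records of (site, visitor) pairs. The record format, counting distinct visitors rather than raw hits, and scaling by the busiest site are all assumptions made for illustration.

```python
from collections import defaultdict


def usage_input(log: list[tuple[str, str]]) -> dict[str, float]:
    """Input `c`: distinct visitors per site, scaled so the busiest site is 1.0.

    Each log record is a hypothetical (site, visitor_id) pair.
    """
    visitors: dict[str, set[str]] = defaultdict(set)
    for site, visitor in log:
        visitors[site].add(visitor)  # repeat visits by one person count once
    if not visitors:
        return {}
    peak = max(len(v) for v in visitors.values())
    return {site: len(v) / peak for site, v in visitors.items()}


log = [
    ("news.example", "u1"), ("news.example", "u2"), ("news.example", "u1"),
    ("blog.example", "u3"),
]
print(usage_input(log))  # {'news.example': 1.0, 'blog.example': 0.5}
```

Nothing here checks whether other sites link to a page, so an orphan site with visitors still gets a score, which is the limitation the slide notes.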

10
Direct Survey (input d, weight w)

The most reliable method of determining reliability.
Users are queried manually and directly.
Too slow and costly to be considered a complete
solution, but can assist in CRS derivation.
Hopefully offsets frequently visited sites that carry
no true information (onion.com, humor sites, etc.).
More experimentation is needed to determine w's
weight.
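One hypothetical way to turn direct survey answers into input d is to average user ratings per site, so a heavily visited satire site can still score low. The 1-to-5 rating scale and the rescaling to [0, 1] are assumptions, not details from the presentation.

```python
from statistics import mean


def survey_input(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Input `d`: mean reliability rating per site, rescaled from 1..5 to 0..1.

    ratings maps a site to survey answers on a hypothetical 1-5 scale.
    Sites with no answers are left out rather than guessed.
    """
    return {
        site: (mean(answers) - 1) / 4
        for site, answers in ratings.items()
        if answers
    }


answers = {"news.example": [5, 4, 5], "satire.example": [1, 2, 1], "new.example": []}
print(survey_input(answers))
```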

11
Review
$$ \mathrm{CRS}(f) = az + by + cx + dw $$
12
"Classical content data mining is not applicable
in this case (CRS derivation) because it is the
content of the web sources that is in
question." - Brian Goodrich
13
Storage

Global Index
  Fast access
  Centralized storage for CRSBot
  Centralized vulnerability
  Vital non-distributed resource in a distributed
  system
Local Storage
  Non-centralized vulnerability
  Non-unified derivation formula (disrupts trust
  algorithm)
Local Derivation
  Too slow to be useful (problem size too large)

14
Related Work

Tim Berners-Lee:
There is a choice here, and I am not sure right
now which appeals to me most. One is to say
precisely,
"whatever any document says of the form xxxx is a
member of W3C so long as it is signed with key
32457934759432".
The other is to say,
"whatever is of form xxxx and can be inferred
from information signed with key 32457934759432".
Both choices have problems, and both use static
references in a dynamic environment (the Web).

15
Contributions

CRS provides a fast and accurate measure of
community consensus on the Web.
Allows reliable decisions between conflicting
data on the Web, fine-tuning the results from
the Semantic Web.
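A sketch of how a CRS could be used to decide between conflicting data: given several claimed values for the same fact, each tagged with its source, keep the value whose supporting sources have the highest combined CRS. Summing scores per value, the function name and the example scores are assumptions; the presentation does not specify the decision rule.

```python
from collections import defaultdict


def resolve(claims: list[tuple[str, str]], crs: dict[str, float]) -> str:
    """Pick the value whose sources have the highest total CRS.

    claims is a non-empty list of (value, source) pairs asserting the same
    fact; sources missing from the crs table contribute 0.
    """
    support: dict[str, float] = defaultdict(float)
    for value, source in claims:
        support[value] += crs.get(source, 0.0)
    return max(support, key=support.get)


scores = {"store.example": 0.9, "forum.example": 0.3, "old-cache.example": 0.4}
claims = [
    ("$19.99", "store.example"),
    ("$24.99", "forum.example"),
    ("$24.99", "old-cache.example"),
]
print(resolve(claims, scores))  # $19.99
```

Summing lets several moderately trusted sources outvote one weak source; taking the maximum single score instead would be an equally plausible design choice.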

16
Limitations

Totally reliant on the usage patterns of the
internet, which may not always reflect which data
is more correct.
Reflects only consensus about a data source, not
the actual data contained in it.
Cannot express complex or compound relationships
or extract partial truths.

17
Questions?
