Title: Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search
1 Research in Semantic Web and Information
Retrieval: Trust, Sensors, and Search
- T. K. Prasad (Krishnaprasad Thirunarayan)
- Professor
- Kno.e.sis Center
- Department of Computer Science and Engineering
- Wright State University, Dayton, OH-45435, USA
2 3 http://knoesis.wright.edu/
4 Information Retrieval
- Information retrieval (IR) is finding material
(usually documents) of an unstructured nature
(usually text) that satisfies an information need
from within large collections (usually stored on
computers).
5 Evolution of the Web
6 Semantic Web
- The Semantic Web is a standards-based extension of
the WWW in which the semantics of information and
services on the web is defined, so as to satisfy
the information needs of people and enable machines to
use the web content.
- Machine comprehensible structured data
7 Tim Berners-Lee's Semantic Web Layer Cake
8 Updated Semantic Web Cake
9 Trust Issues in Social Media and Sensor Networks
- T. K. Prasad, Cory Henson,
- Amit Sheth and Pramod Anantharam
- Kno.e.sis Center
- Department of Computer Science and Engineering
- Wright State University, Dayton, OH-45435, USA
10 Goal
- Study semantic issues relevant to trust in
- Social Media - Data and Networks
- Sensor - Data and Networks
- Generic Examples involving Trust
- Analyzing online ratings/reviews of TV models
before making a purchasing decision on amazon.com
- Seeking recommendations on a handyman, car
mechanic, etc. from neighbors
11 Generic Approach
- Propose models of trust/trust metrics to
formalize trust aggregation and trust propagation
to deal with indirect trust
- Develop techniques and tools to glean trust
information from
- social media data (streams) and networks
- sensor data (streams) and networks
12 Trust in Social Media Networks
13 Previous Work: Structure of Trust
- Trust between a pair of users is modelled as a
real number in the closed interval [0,1] or
[-1,1]
- Pros: Facilitates propagation and computation of
aggregated trust
- Cons
- Too fine-grained, total order
- Inherent difficulties in initializing,
understanding, and justifying
computed trust values
14 Quote
- Guha et al.
- "While continuous-valued trusts are
mathematically clean, from the standpoint of
usability, most real-world systems will
in fact use discrete values
at which one user
can rate another."
- E.g., Epinions, eBay, Amazon, Facebook, etc. all
use small sets for (dis)trust/rating values.
15 Trust-aware Recommender Systems
- Collaborative Filtering systems exploit
user-similarity to get recommendations.
- But they suffer from the data sparsity problem.
- Adding trust links
- improves quality of recommendations
- benefits cold-start users who most need it
- is robust w.r.t. spamming via engineered profiles
(Shilling Attacks)
16 Our Research
- Propose a model of trust based on
- Partially ordered discrete values (with emphasis
on relative magnitude)
- Local but realistic semantics
- Distinguishes functional and referral trust
- Distinguishes direct and inferred trust
- Prefers direct information over conflicting
inferred information
- Represents ambiguity explicitly
- HOLY GRAIL: Direct Semantics in favor of Indirect
Translations
17 Essential concepts
- Trust Scope: Context, Action, ...
- Functional Trust: Agent a1 trusts agent a2's
ability in some context or for doing something
- Referral Trust: Agent a1 trusts agent a2's
ability to recommend another agent in some
context or for doing something
- Trust is a relationship among agents/users,
while belief is a relationship between
agents/users and statements
18 Semantics Interpretation
- Four-valued logic
- inconsistent information,
- true, false,
- no information
- Trust / Distrust
- 4-valued binary function among users
- Belief / Disbelief
- 4-valued binary function on users and
statements
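The four-valued interpretation above can be sketched in Python. This is a minimal illustration, not the formalization used in this work: the names `Trust4`, `combine`, and `trust_of`, the sample data, and the rule that conflicting evidence pools into the inconsistent value (in the style of Belnap's four-valued logic) are all assumptions.

```python
from enum import Enum


class Trust4(Enum):
    """The four values, encoded as sets of evidence (an assumed encoding)."""
    NONE = frozenset()              # no information
    TRUE = frozenset({"t"})         # evidence for trust / belief
    FALSE = frozenset({"f"})        # evidence for distrust / disbelief
    BOTH = frozenset({"t", "f"})    # inconsistent information


def combine(x: Trust4, y: Trust4) -> Trust4:
    """Pool two pieces of evidence by taking the union of the evidence sets."""
    return Trust4(x.value | y.value)


# Trust as a 4-valued function on pairs of users (hypothetical data).
trust = {("alice", "bob"): Trust4.TRUE, ("charlie", "dick"): Trust4.FALSE}


def trust_of(source: str, target: str) -> Trust4:
    """Pairs with no recorded link map to NONE, not to FALSE."""
    return trust.get((source, target), Trust4.NONE)
```

Keeping "no information" distinct from "false" is what lets a network say nothing about a pair of users without implying distrust.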
19 Example Trust Network - Different Trust Links
and Local Ordering on Trust Links
- Alice trusts Bob for recommending good car
mechanic.
- Bob trusts Dick to be a good car mechanic.
- Charlie does not trust Dick to be a good car
mechanic.
- Alice trusts Bob more than Charlie, w.r.t. car
mechanic context.
- Alice trusts Charlie more than Bob, w.r.t. baby
sitter context.
20 Formalization of Semantics: Basis for Trust
Computation Algorithm
21 Formalization Approach
- Given a trust network (Nodes, Edges with Trust
Scopes, Local Orderings), specify when a source
agent can trust, distrust, or be ambiguous about
another target agent, reflecting
- Functional and referral trust links
- Direct and inferred trust
- Locality
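As a rough illustration of these ingredients, the sketch below encodes a scoped trust network and evaluates a source's stance toward a target. It is not the trust computation algorithm of this work (those slides have no transcript). The class `TrustNetwork`, the one-hop referral rule, and the encoding of the local ordering as ranked tiers of referrers are simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TrustNetwork:
    # (source, target, scope) -> True (trusts) / False (distrusts)
    functional: dict = field(default_factory=dict)
    # (source, scope) -> list of tiers (sets of referrers), most trusted first
    referral: dict = field(default_factory=dict)

    def evaluate(self, source: str, target: str, scope: str) -> str:
        """Return 'trust', 'distrust', 'ambiguous', or 'unknown'."""
        # Direct functional information overrides anything inferred.
        direct = self.functional.get((source, target, scope))
        if direct is not None:
            return "trust" if direct else "distrust"
        # Otherwise consult referrers tier by tier, following the
        # source's own (local) ordering for this scope.
        for tier in self.referral.get((source, scope), []):
            opinions = {
                self.functional[(r, target, scope)]
                for r in tier
                if (r, target, scope) in self.functional
            }
            if len(opinions) == 2:
                return "ambiguous"   # equally ranked referrers disagree
            if opinions:
                return "trust" if opinions.pop() else "distrust"
        return "unknown"


net = TrustNetwork()
net.functional[("bob", "dick", "car mechanic")] = True
net.functional[("charlie", "dick", "car mechanic")] = False
# Alice ranks Bob above Charlie for the car mechanic scope.
net.referral[("alice", "car mechanic")] = [{"bob"}, {"charlie"}]
verdict = net.evaluate("alice", "dick", "car mechanic")
```

With Bob ranked above Charlie, Alice's inferred stance follows Bob's opinion. Placing both in the same tier would make the conflict explicit as "ambiguous" instead of silently picking a side.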
22 (No Transcript)
23 Similarly for Evidence in support of Negative
Functional Trust.
24 (No Transcript)
25 Quote summarizing potential bug
- "The whole problem with the world is that fools
and fanatics are always so certain of themselves,
but wiser people so full of doubts."
- --- Bertrand Russell
26 Possible Future Extensions
- Trust links with trust-scoped exceptions
- Straddles two extremes involving
- just trust links and just trust-scoped links
- Trust values annotated with trust path length,
target neighborhood summary, etc.
- Other forms of trust links formalized using upper
ontology
27 Trust in Sensor Networks
28 Sensor Networks
- Approaches to Trust
- Reputation-based Trust
- Based on past behavior
- Policy-based Trust
- Based on explicitly stated constraints
- Evidence-based Trust
- Based on seeking/verifying evidence
29 Probabilistic basis for reputation-based trust in
a Sensor Node
- Sensor Reputation and Sensor Observation
Credibility determined using an outlier detection
algorithm, aggregating results over time
- Homogeneous sensor networks can exploit
spatio-temporal locality and redundancy for this
purpose
- Heterogeneous sensor networks require complex
domain models for this purpose
30 (contd)
- Trust/Reputation in a sensor node can be modeled
as a beta probability distribution function with
parameters (a,b) gleaned from the total number of
correct (a-1) and erroneous (b-1) observations
so far.
31 Motivation for using Beta PDF
- Computational Ease
- Retain/manipulate just two values (a,b)
- Incremental update after checking whether new
data is normal or outlier
- Intuitively Satisfactory
- Initialization not necessary (flat PDF)
- PDF variation sufficiently expressive
- That is, it assimilates updates and a large number
of observations satisfactorily
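A minimal sketch of the two-parameter bookkeeping described above, assuming the outlier detector is available elsewhere and simply reports a boolean per observation. The class name `BetaReputation` and its methods are illustrative, not taken from the original work.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    a: float = 1.0   # 1 + number of correct (normal) observations
    b: float = 1.0   # 1 + number of erroneous (outlier) observations

    def update(self, is_outlier: bool) -> None:
        """Incremental update: bump exactly one of the two counters."""
        if is_outlier:
            self.b += 1
        else:
            self.a += 1

    @property
    def mean(self) -> float:
        """Expected probability that the next observation is correct."""
        return self.a / (self.a + self.b)

    @property
    def variance(self) -> float:
        """Spread of the beta PDF; shrinks as evidence accumulates."""
        n = self.a + self.b
        return (self.a * self.b) / (n * n * (n + 1))


rep = BetaReputation()          # a = b = 1 is the flat (uniform) PDF
for is_outlier in [False, False, True, False]:
    rep.update(is_outlier)
```

Starting from a = b = 1 gives the flat PDF, so no tuned initial value is needed. After three normal readings and one outlier the state is (a, b) = (4, 2).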
32 The next few slides shed light on the beta probability
distribution function
- Mathematical formulation
- Graphs for intuitive understanding of its role
33 Role of Beta probability distribution function
x is a probability, so it ranges from 0 to 1.
If the prior distribution of p is uniform, then
the beta distribution gives the posterior
distribution of p after observing a-1
occurrences of an event with probability p and b-1
occurrences of the complementary event with
probability (1-p).
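For reference, the standard beta density in the (a,b) notation used here, together with its mean and variance. These are textbook formulas and are not transcribed from the slide.

```latex
f(x; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\; x^{a-1} (1-x)^{b-1},
\qquad 0 \le x \le 1,\quad a, b > 0

\mathbb{E}[x] = \frac{a}{a+b},
\qquad
\operatorname{Var}[x] = \frac{ab}{(a+b)^{2}\,(a+b+1)}
```

With a = b = 1 the density is flat (the uniform prior). For a fixed ratio a/b, the variance shrinks as a+b grows, so the curve narrows as observations accumulate.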
34
- a = b, so the pdfs are symmetric w.r.t. 0.5.
- Note that the graphs get narrower as (a+b)
increases.
35
- a ≠ b, so the pdfs are asymmetric w.r.t. 0.5.
- Note that the graphs get narrower as (a+b)
increases.
36 Advantages: Robust w.r.t. attacks
- Bad-mouthing attack
- E-commerce analogy: Sellers collude with buyers
to give bad ratings to others
- Ballot stuffing attack
- E-commerce analogy: Sellers collude with buyers
to give themselves unfairly good ratings
- Sleeper attacks
- Apparently trusted agent defects
37 Trust in Tweets
38 Twitter
- Large network of people
- Large number of tweets
- Tweet: 140-character description of an event
- Problem: How to organize tweets?
39 Exploiting trust information
- Rank tweets according to trust information
- Trust in the user who tweets
- Belief (trust) in the tweet
40 Trust in the person who tweets
- Popularity of the user
- Based on count of followers
- Reputation of the user
- Based on history of making informed observations
- Enrich using PageRank analogy?
- Highly trusted followers count more than lowly
trusted followers
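One way to read the PageRank analogy is sketched below: a user's score is fed by the scores of their followers, so a follower with a high score contributes more than one with a low score. The function name, the damping factor of 0.85, the fixed iteration count, and the toy follow graph are all assumptions for illustration. The slide poses the analogy only as a question.

```python
def follower_weighted_trust(followers, damping=0.85, iterations=50):
    """followers maps each user to the set of users who follow them."""
    users = set(followers) | {f for fs in followers.values() for f in fs}
    # How many accounts each user follows (out-degree of the follow graph).
    following_count = {u: 0 for u in users}
    for fs in followers.values():
        for f in fs:
            following_count[f] += 1
    score = {u: 1.0 / len(users) for u in users}
    for _ in range(iterations):
        new_score = {}
        for u in users:
            # Each follower splits its own score among everyone it follows.
            inflow = sum(score[f] / following_count[f]
                         for f in followers.get(u, ()))
            new_score[u] = (1 - damping) / len(users) + damping * inflow
        score = new_score
    # Users who follow nobody pass no score on, so the totals need not
    # sum to 1; only the relative ordering is meant to be used.
    return score


# Toy graph: a is followed by b and c, b is followed by c, c by nobody.
scores = follower_weighted_trust({"a": {"b", "c"}, "b": {"c"}, "c": set()})
```

A plain follower count would also rank a above b above c here. The difference shows up when two users have the same number of followers but those followers carry different scores.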
41 Belief (Trust) in a tweet
- Belief in a tweet depends on the trust in the
user who generates it.
- Belief in a tweet depends on the content of
similar tweets (originating from approximately
the same location around the same time)
42 Trust in Linked Open Data
43 Linked Data
- The Linking Open Data (LOD) project is a
community-led effort to create openly accessible
and interlinked RDF data on the Web.
- RDF: Resource Description Framework, a graph-based
representation language
44 Linked Data
45 Exploiting trust for access and standardization
- Trust in the creator of the data, and belief
(trust) in the data
- How well connected is the data?
- Rank LOD according to trust information
46 Sensor Data on LOD
- MesoWest weather data in US
- 20,000 Sensor Systems
- 1 billion Observational Assertions
- Sensors linked with Geonames on LOD
- http://wiki.knoesis.org/index.php/SSW
47 Trust in Active Perception
48 Active Perception
- Perception is the process of observation,
hypothesis generation, and verification
49 Evidence-based Trust
- Observations (and hypotheses) are more trusted if
they can be verified through empirical evidence
- Sensors are more trusted if their observations
are trusted
50 Evidence-based Trust
(Figure labels: Trust, Strengthened Trust)
51 Additional uses of active perception in sensors
context
- Determining actionable intelligence by narrowing
the set of explanations to one
- Enabling use of a small set of always-on sensors
to bootstrap and selectively turn on additional
sensors in a resource (e.g., power) constrained
environment
52 References
- Krishnaprasad Thirunarayan, Dharan Althuru, Cory
Henson, and Amit Sheth, "A Local Qualitative
Approach to Referral and Functional Trust," The
4th Indian International Conference on Artificial
Intelligence (IICAI-09), December 2009.
- Cory Henson, Joshua Pschorr, Amit Sheth, and
Krishnaprasad Thirunarayan, "SemSOS: Semantic
Sensor Observation Service," International
Symposium on Collaborative Technologies and
Systems (CTS2009), Workshop on Sensor Web
Enablement (SWE2009), Baltimore, Maryland, 2009.
- Krishnaprasad Thirunarayan, Cory Henson, and Amit
Sheth, "Situation Awareness via Abductive
Reasoning from Semantic Sensor Data: A
Preliminary Report," International Symposium on
Collaborative Technologies and Systems (CTS2009),
Workshop on Collaborative Trusted Sensing,
Baltimore, Maryland, 2009.
53 References
- A. Sheth and M. Nagarajan, "Semantics-Empowered
Social Computing," IEEE Internet Computing,
Jan/Feb 2009, pp. 76-80.
- Amit Sheth, Cory Henson, and Satya Sahoo,
"Semantic Sensor Web," IEEE Internet Computing,
vol. 12, no. 4, July/August 2008, pp. 78-83.
54 Machine and Citizen Sensor Data Demos
- Illustrate semantic web and information retrieval
techniques -- spatio-temporal-thematic
ontologies, mash-ups, machine and citizen sensor
data analytics
55 Motivating Scenario: Spatio-temporal-thematic
analytics
(Figure labels: High-level Sensor, Low-level Sensor)
- How do we check if the three images depict
- the same time and same place?
- same entity?
- a serious threat?
56 Semantic Observation Service: Overall
Architecture and Details
57
- SemSOS Demo
- http://knoesis.wright.edu/research/semsci/application_domain/sem_sensor/cory/demos/ssos_demo/ssos_demo.htm
- Twitris Demo: http://twitris.knoesis.org/
58 Situation Awareness Analysis
- Situation Awareness Components
- Physical World: Sensor Data
- Perception: Entity Metadata
- Comprehension: Relationship Metadata
- Semantic Analysis
- How is the data represented?
Sensor Web Enablement
- What are the sources of the data?
Provenance Analysis
- What objects/events account for the data?
Abductive Reasoning
- Where did the event occur?
Spatial Analysis
- When did the event occur?
Temporal Analysis
- What is the significance of the event?
Thematic Analysis
- What are the reasons for inconsistency?
Abductive Reasoning
59 A Unified Approach to Retrieving Web Documents
and Semantic Web Data
- Trivikram Immaneni and Krishnaprasad
Thirunarayan
- Department of Computer Science and Engineering
- Wright State University
- Dayton, OH-45435, USA
- Currently at Technorati, San Francisco
60 Outline
- Goal (What?)
- Background and Motivation (Why?)
- Unified Web Model (Why?)
- Query Language and Examples (What?)
- Implementation Details (How?)
- Evaluation and Applications (Why?)
- Conclusions
61 Goal
62
- Integrate HTML Web and Semantic Web by
establishing and exploiting connections between
them => Unified Web Model
- Design and implement a language to retrieve data
and documents from the Unified Web => Hybrid
Query Language
- Implement the system using mature software
components for indexing and search => SITAR
63 Background and Motivation
64 HTML Web
- Hyperlinked Web of documents
- Content: human comprehensible
- Search engines and web browsers search, retrieve,
navigate, and display information
- Keyword-based searches have low precision and
high recall
65 Semantic Web
- Standards-based labeled graph of resources and
binary properties (data)
- Content: machine accessible
- Database techniques adapted to store and
retrieve Semantic Web data
- Query formulation by lay users is difficult but
results are precise
- XML, RDF, SPARQL, Web Services, etc.
66 Shoehorning HTML Web into Semantic Web
- Document as a data node, with its content
as a string in the RDF graph
- Regular expressions in SPARQL used to retrieve
documents.
- Drawbacks that IR tries to overcome
- Ease of query formulation: Keyword-based
- Dealing with large datasets: Ranking
67 Formalizing HTML Web as Semantic Web
- Manual (re-)authoring of (legacy) documents
using Semantic Web technologies is
neither feasible nor advisable.
- State-of-the-art NLP and information extraction
techniques are inadequate
- Informal description is indispensable for human
comprehension
- Escape route: Traceability via superposition
(e.g., RDFa)
68 Shoehorning Semantic Web into HTML Web
- Currently, Semantic Web documents live on the
HTML Web but their components are neither
accessible nor reasoned with via keyword-based
searches
- Swoogle attempts to rank Semantic Web documents
69 Unified Web Model (What?)
70 Aim
- Unified Web integrates the two Webs to enable
improved hybrid retrieval of data and documents.
- Unified Web Model
- Hybrid Query Language
71 Unified Web Model Graph
- Node
- Abstract entity identified by its URI
- Blank/Literal node names automatically generated
- Home URI Section
- URI index words
- Document Section (optional)
- Outgoing Links Section
- Triples Section
72 (contd)
- Relationships (Edges)
- hasDocument
- Relates Node to content string
- hyperlinksTo
- Relates Node with another node to which the
former node's document contains a hyperlink
- Asserts
- Relates Node with each RDF statement in the
document
- linksTo
- Relates Node with another node
- to which the former node's document contains a
hyperlink, or
- such that the former node's document contains a
triple with the latter node
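The node sections and edge types above can be pictured as a small data structure. The sketch below is illustrative only: the class `Node`, its field names, and the example.org URIs are hypothetical, and treating every term of an asserted triple as a linked node is one possible reading of "a triple with the latter node".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    uri: str                                          # Home URI section
    index_words: set = field(default_factory=set)     # URI index words
    document: Optional[str] = None                    # hasDocument (optional)
    hyperlinks_to: set = field(default_factory=set)   # hyperlinksTo
    asserts: list = field(default_factory=list)       # Asserts: (s, p, o) triples

    def links_to(self) -> set:
        """linksTo: hyperlinked nodes plus nodes appearing in asserted triples."""
        mentioned = {term for triple in self.asserts for term in triple}
        return self.hyperlinks_to | mentioned


page = Node(
    uri="http://example.org/xyz.htm",
    index_words={"xyz"},
    document="<html>...</html>",
    hyperlinks_to={"mailto:joe@example.org"},
    asserts=[("http://example.org/sw#Jaguar", "rdf:type", "owl:Class")],
)
neighbours = page.links_to()
```

Deriving linksTo from the other two relations, rather than storing it, keeps the hyperlink view and the triple view of a document from drifting apart.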
73 Example of Unified Web Model
- Document http://www.abc.com/xyz.htm contains the
RDF fragment
- <mailTo:joe_at_abc.com/>
- <rdf:RDF>
- <owl:Class rdf:ID="http://www.abc.com/swJaguar" />
- </rdf:RDF>
74 (No Transcript)
75 Data Retrieval from Unified Web
- Unified Web Model can be specified using RDF
- In terms of rdfs:Resource, rdf:Property,
rdf:Statement, rdfs:Literal, rdf:subject,
rdf:predicate, rdf:object, etc.
- Unified Web is a reified Semantic Web (user
triples)
- SPARQL usable as query language
76 Information Retrieval from Unified Web
- Node can be indexed using URI index words
- Based on name, content, label, triples, etc.
- Node can be ranked using its phrasal / URI-based
annotations and its node neighborhood
77 Advantages
- Semantic Web nodes can be retrieved using
(associated) keywords
- Legacy document recall improved by interpreting
hyperlinks as Semantic Markup for reasoning.
- Hyperlink: mailto:abc_at_wright.edu
- Triple:
- <mailto:abc_at_wright.edu rdf:type univ:prof>
78
- Semantics-rich URIs (such as those from
dictionary.com) in legacy documents can be
incrementally equated with ontologies
- Document: <a href="http://dictionary.com/search?q=jaguar"> Jaguar </a> God of the Underworld
- Ontology: <http://dictionary.com/search?q=jaguar
owl:sameAs http://www.animalOnto.com/Jaguar> ...
79 Query Language and Examples (What?)
80 Aim
- Store and retrieve Semantic Web data, and use
information in documents to enhance data
retrieval
- Enable use of keywords to deal with lack of
complete URI information
- Peter affiliated-with ?X
- Enable use of partial information about data
being searched
- Student Peter affiliated-with ?X
81
- Store and retrieve documents, and use information
in the Semantic Web to enhance document retrieval
- Docsearch(<animal><jaguar> Maya God)
82 Sample Queries
- Wordset queries
- <peter haase> -> retrieves all URIs indexed by
BOTH peter AND haase
- Includes documents and URIs
- URIs are indexed by words.
- The words are obtained from the URIs themselves,
label literals, and anchor text of the URIs.
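The AND semantics of a wordset query can be sketched with a plain in-memory inverted index. This is not how SITAR is built (it uses Apache Lucene). The class `WordIndex`, the tokenizer, and the example.org URIs are assumptions made for illustration.

```python
import re
from collections import defaultdict


def words_of(text: str) -> set:
    """Lower-cased alphanumeric tokens; URIs are split on punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class WordIndex:
    def __init__(self):
        self._postings = defaultdict(set)   # word -> set of URIs

    def add(self, uri: str, *texts: str) -> None:
        """Index a URI by words from the URI, its labels, and anchor text."""
        for text in (uri, *texts):
            for word in words_of(text):
                self._postings[word].add(uri)

    def wordset(self, query: str) -> set:
        """Return the URIs indexed by ALL words of the query."""
        words = words_of(query)
        if not words:
            return set()
        return set.intersection(
            *(self._postings.get(w, set()) for w in words)
        )


index = WordIndex()
index.add("http://example.org/person/1", "Peter Haase")
index.add("http://example.org/person/2", "Peter Pan")
hits = index.wordset("peter haase")
```

Here `hits` contains only the first URI, since the second is indexed by "peter" but not by "haase".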
83
- Wordset Pair queries
- <phdstudent><peter> -> specifies that user is
looking for peter, the phd student
- Transitive closure
84 More Queries
- Get the home page of Peter, the PhD student
- getBindings ( <phdstudent><peter>
<homepage> ?x )
- Get Peter Haase's publications that have
"Semantic" in their title
- getBindings(<peter haase> <publication> ?x
?x <title> <semantic>)
- Get group 1 element which is white in color
- getBindings( ?x <group> <group 1> ?x
<color> <white> )
85
- Homepages of PhD students named Peter that talk
about Semantic Grid
- getDocsByBindingsAndContent
- ( <phdstudent><peter> <homepage> ?x
semantic grid )
- getLinkingNodes
- ( http://www.aifb.uni-karlsruhe.de/Personen/viewPerson?id_db=2023 )
- getAssertingNodes
- (<peter haase> <publication> ?x).
- getDocsByIndexOrContent (peter haase)
86 Implementation Details (How?)
SITAR: Semantic InformaTion Analysis and
Retrieval system
87 Tools Used
- Apache Lucene 2.0 APIs in Java
- A high-performance text search engine library
with smart indexing strategies.
- CyberNeko HTML Parser
- Jena ARP RDF parser
88 Evaluation and Application (Why?)
89 Experiments
- Datasets
- AIFB SEAL data
- The crawler collected 1665 files (English XHTML
pages and RDF/OWL pages).
- 1455 (610 RDF files and 845 XHTML files) were
successfully parsed and indexed
- A total of 193520 triples were parsed and indexed
90
- Datasets (contd)
- TAP dataset
- Periodic table
- Lehigh University Benchmarks
91 Conclusions
92
- Developed a Hybrid Query language for data and
document retrieval
- that is convenient because it is keyword-based
- that can be accurate and flexible because
disambiguation information can be provided
- that is expressive because it can support
inheritance reasoning
- that is pragmatic because it can work with legacy
documents
- FUTURE WORK: Robust Ranking Strategy
93 References
- T. Immaneni and K. Thirunarayan, "A Unified
Approach to Retrieving Web Documents and Semantic
Web Data," In Proceedings of the 4th European
Semantic Web Conference (ESWC 2007), LNCS 4519,
pp. 579-593, June 2007.
- T. Immaneni and K. Thirunarayan, "Hybrid
Retrieval from the Unified Web," In Proceedings
of the 22nd Annual ACM Symposium on Applied
Computing (ACM SAC 2007), pp. 1376-1380, March
2007.
94 THANK YOU!
- http://knoesis.wright.edu/tkprasad/