Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search

1
Research in Semantic Web and Information
Retrieval: Trust, Sensors, and Search
  • T. K. Prasad (Krishnaprasad Thirunarayan)
  • Professor
  • Kno.e.sis Center
  • Department of Computer Science and Engineering
  • Wright State University, Dayton, OH-45435, USA

2

3
http://knoesis.wright.edu/
4
Information Retrieval
  • Information retrieval (IR) is finding material
    (usually documents) of an unstructured nature
    (usually text) that satisfies an information need
    from within large collections (usually stored on
    computers).

5
Evolution of the Web
6
Semantic Web
  • Semantic Web is a standards-based extension of
    the WWW in which the semantics of information and
    services on the web is defined, so as to satisfy
    information need of people and enable machines to
    use the web content.
  • Machine comprehensible structured data

7
Tim Berners-Lee's Semantic Web Layer Cake
8
Updated Semantic Web Cake
9
Trust Issues in Social Media and Sensor Networks
  • T. K. Prasad, Cory Henson,
  • Amit Sheth and Pramod Anantharam
  • Kno.e.sis Center
  • Department of Computer Science and Engineering
  • Wright State University, Dayton, OH-45435, USA

10
Goal
  • Study semantic issues relevant to trust in:
  • Social Media - Data and Networks
  • Sensor - Data and Networks
  • Generic Examples involving Trust:
  • Analyzing online ratings/reviews of TV models
    before making a purchasing decision on amazon.com
  • Seeking recommendations for a handyman, car
    mechanic, etc. from neighbors

11
Generic Approach
  • Propose models of trust/trust metrics to
    formalize trust aggregation and trust propagation
    to deal with indirect trust
  • Develop techniques and tools to glean trust
    information from:
  • social media data (streams) and networks
  • sensor data (streams) and networks

12
Trust in Social Media Networks
13
Previous Work: Structure of Trust
  • Trust between a pair of users is modelled as a
    real number in the closed interval [0,1] or
    [-1,1]
  • Pros: Facilitates propagation and computation of
    aggregated trust
  • Cons:
  • Too fine-grained, total order
  • Inherent difficulties in initializing,
    understanding, and justifying
    computed trust values

14
Quote
  • Guha et al.:
  • "While continuous-valued trusts are
    mathematically clean, from the standpoint of
    usability, most real-world systems will in fact
    use discrete values at which one user can rate
    another."
  • E.g., Epinions, eBay, Amazon, Facebook, etc. all
    use small sets for (dis)trust/rating values.

15
Trust-aware Recommender Systems
  • Collaborative Filtering systems exploit
    user-similarity to get recommendations.
  • But suffer from data sparsity problem.
  • Adding trust links
  • improves quality of recommendations
  • benefits cold-start users who most need it
  • is robust w.r.t. spamming via engineered profiles
    (Shilling Attacks)

16
Our Research
  • Propose a model of trust based on
  • Partially ordered discrete values (with emphasis
    on relative magnitude)
  • Local but realistic semantics
  • Distinguishes functional and referral trust
  • Distinguishes direct and inferred trust
  • Prefers direct information over conflicting
    inferred information
  • Represents ambiguity explicitly
  • HOLY GRAIL: Direct Semantics in favor of Indirect
    Translations

17
Essential concepts
  • Trust Scope: Context, Action, ...
  • Functional Trust: Agent a1 trusts agent a2's
    ability in some context or for doing something
  • Referral Trust: Agent a1 trusts agent a2's
    ability to recommend another agent in some
    context or for doing something
  • Trust is a relationship among agents/users,
    while belief is a relationship between
    agents/users and statements

18
Semantics Interpretation
  • Four valued logic
  • inconsistent information,
  • true, false,
  • no information
  • Trust / Distrust
  • 4-valued binary function among users
  • Belief / Disbelief
  • 4-valued binary function on users and
    statements
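
The four values above can be read as the four combinations of "some positive evidence" and "some negative evidence". The sketch below encodes that reading; the names `Trust4` and `combine` are illustrative assumptions, not taken from the slides.

```python
from enum import Enum


class Trust4(Enum):
    """Four truth values encoded as (positive evidence?, negative evidence?)."""
    NO_INFORMATION = (False, False)
    TRUE = (True, False)
    FALSE = (False, True)
    INCONSISTENT = (True, True)


def combine(x: Trust4, y: Trust4) -> Trust4:
    """Pool two pieces of evidence: keep any positive and any negative evidence."""
    positive = x.value[0] or y.value[0]
    negative = x.value[1] or y.value[1]
    return Trust4((positive, negative))


print(combine(Trust4.TRUE, Trust4.NO_INFORMATION))  # Trust4.TRUE
print(combine(Trust4.TRUE, Trust4.FALSE))           # Trust4.INCONSISTENT
```

Under this encoding, conflicting reports are kept as an explicit "inconsistent" value rather than being averaged away.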

19
Example Trust Network - Different Trust Links
and Local Ordering on Trust Links
  • Alice trusts Bob for recommending good car
    mechanic.
  • Bob trusts Dick to be a good car mechanic.
  • Charlie does not trust Dick to be a good car
    mechanic.
  • Alice trusts Bob more than Charlie, w.r.t. car
    mechanic context.
  • Alice trusts Charlie more than Bob, w.r.t. baby
    sitter context.

20
Formalization of Semantics: Basis for Trust
Computation Algorithm
21
Formalization Approach
  • Given a trust network (Nodes, Edges with Trust
    Scopes, Local Orderings), specify when a source
    agent can trust, distrust, or be ambiguous about
    another target agent, reflecting
  • Functional and referral trust links
  • Direct and inferred trust
  • Locality
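
A minimal sketch of these ideas on the Alice/Bob/Charlie/Dick example follows. It is not the algorithm from the talk: it assumes one-step referral only, assumes Alice has referral links to both Bob and Charlie, and the names `infer`, `trusts`, `distrusts`, and `local_order` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, str]  # (source, target, trust scope)

# Illustrative network; the referral link to Charlie is an assumption.
referral: Set[Link] = {
    ("Alice", "Bob", "car_mechanic"),
    ("Alice", "Charlie", "car_mechanic"),
}
trusts: Set[Link] = {("Bob", "Dick", "car_mechanic")}         # positive functional trust
distrusts: Set[Link] = {("Charlie", "Dick", "car_mechanic")}  # negative functional trust
# Local ordering: per (source, scope), referrers from most to least trusted.
local_order: Dict[Tuple[str, str], List[str]] = {
    ("Alice", "car_mechanic"): ["Bob", "Charlie"],
}


def infer(source: str, target: str, scope: str) -> str:
    # Direct functional links take precedence over inferred ones.
    if (source, target, scope) in trusts:
        return "trust"
    if (source, target, scope) in distrusts:
        return "distrust"
    referrers = [r for (s, r, sc) in referral if s == source and sc == scope]
    pos = [r for r in referrers if (r, target, scope) in trusts]
    neg = [r for r in referrers if (r, target, scope) in distrusts]
    if not pos and not neg:
        return "no information"
    if not neg:
        return "trust"
    if not pos:
        return "distrust"
    # Conflicting evidence: the locally preferred referrer wins, else ambiguity.
    order = local_order.get((source, scope), [])

    def rank(name: str) -> int:
        return order.index(name) if name in order else len(order)

    best_pos = min(rank(r) for r in pos)
    best_neg = min(rank(r) for r in neg)
    if best_pos < best_neg:
        return "trust"
    if best_neg < best_pos:
        return "distrust"
    return "ambiguous"


print(infer("Alice", "Dick", "car_mechanic"))  # trust
```

Without the local ordering, the same conflict would be reported as "ambiguous" instead of being forced into a numeric average.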

22
(No Transcript)
23
Similarly for Evidence in support of Negative
Functional Trust.
24
(No Transcript)
25
Quote summarizing potential bug
  • The whole problem with the world is that fools
    and fanatics are always so certain of themselves,
    but wiser people so full of doubts.
  • --- Bertrand Russell

26
Possible Future Extensions
  • Trust links with trust-scoped exceptions
  • Straddles two extremes involving
  • just trust links and just trust-scoped links
  • Trust values annotated with trust path length,
    target neighborhood summary, etc.
  • Other forms of trust links formalized using upper
    ontology

27
Trust in Sensor Networks
28
Sensor Networks
  • Approaches to Trust
  • Reputation-based Trust
  • Based on past behavior
  • Policy-based Trust
  • Based on explicitly stated constraints
  • Evidence-based Trust
  • Based on seeking/verifying evidence

29
Probabilistic basis for reputation-based trust in
a Sensor Node
  • Sensor Reputation and Sensor Observation
    Credibility determined using an outlier detection
    algorithm, aggregating results over time
  • Homogeneous sensor networks can exploit
    spatio-temporal locality and redundancy for this
    purpose
  • Heterogeneous sensor networks require complex
    domain models for this purpose

30
(contd)
  • Trust/Reputation in a sensor node can be modeled
    as a beta probability distribution function with
    parameters (a,b) gleaned from the total number of
    correct (a-1) and erroneous (b-1) observations
    so far.
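
As a rough sketch of this bookkeeping, the class below keeps only the pair (a, b) and updates it per observation. The class and method names are assumptions for illustration, and the outlier decision is taken as given.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    a: float = 1.0  # 1 + number of correct observations so far
    b: float = 1.0  # 1 + number of erroneous observations so far

    def update(self, is_outlier: bool) -> None:
        """Incremental update after classifying a new observation."""
        if is_outlier:
            self.b += 1
        else:
            self.a += 1

    def expected_trust(self) -> float:
        """Mean of the Beta(a, b) distribution."""
        return self.a / (self.a + self.b)


sensor = BetaReputation()
for is_outlier in [False] * 8 + [True] * 2:
    sensor.update(is_outlier)
print(sensor.a, sensor.b, sensor.expected_trust())  # 9.0 3.0 0.75
```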

31
Motivation for using Beta PDF
  • Computational Ease
  • Retain/manipulate just two values (a,b)
  • Incremental update after checking whether new
    data is normal or outlier
  • Intuitively Satisfactory
  • Initialization not necessary (flat PDF)
  • PDF variation sufficiently expressive
  • That is, it assimilates updates and large number
    of observations satisfactorily

32
Next few slides shed light on beta probability
distribution function
  1. Mathematical formulation
  2. Graphs for intuitive understanding of its role

33
Role of Beta probability distribution function
x is a probability, so it ranges from 0 to 1.
If the prior distribution of p is uniform, then
the beta distribution gives posterior
distribution of p after observing a-1
occurrences of event with probability p and b-1
occurrences of the complementary event with
probability (1-p).
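
For reference, the standard form of the beta density with parameters (a, b), and its mean, can be written as follows. This is the textbook definition, added here because the slide's own formula did not survive in the transcript.

```latex
\[
  f(x;\, a, b) \;=\; \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\; x^{a-1}\,(1-x)^{b-1},
  \qquad 0 \le x \le 1,
\]
\[
  \mathbb{E}[x] \;=\; \frac{a}{a+b}.
\]
```

With a = b = 1 the density is flat (the uniform prior), which matches the statement that no initialization is needed.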
34
  • a = b, so the PDFs are symmetric w.r.t. 0.5.
  • Note that the graphs get narrower as (a+b)
    increases.
  • Plotted curves: (a = 2, b = 2), (a = 1, b = 1),
    (a = 5, b = 5), (a = 10, b = 10)

35
  • a ≠ b, so the PDFs are asymmetric w.r.t. 0.5.
  • Note that the graphs get narrower as (a+b)
    increases.
  • Plotted curves: (a = 5, b = 10), (a = 5, b = 25),
    (a = 25, b = 5), (a = 10, b = 5)
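
The narrowing can be checked numerically with the standard mean and variance formulas for a beta distribution. The helper name `beta_mean_variance` is an assumption, and the parameter pairs are just sample values.

```python
from typing import Tuple


def beta_mean_variance(a: float, b: float) -> Tuple[float, float]:
    """Mean and variance of Beta(a, b) from the closed-form expressions."""
    total = a + b
    mean = a / total
    variance = (a * b) / (total ** 2 * (total + 1))
    return mean, variance


for a, b in [(1, 1), (2, 2), (5, 5), (10, 10), (5, 25), (25, 5)]:
    mean, variance = beta_mean_variance(a, b)
    print(f"a={a:>2} b={b:>2} mean={mean:.3f} variance={variance:.4f}")
```

For the symmetric pairs the mean stays at 0.5 while the variance shrinks as a+b grows, which is the narrowing described above.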

36
Advantages: Robust w.r.t. attacks
  • Bad-mouthing attack
  • E-commerce analogy: Sellers collude with buyers
    to give bad ratings to other sellers
  • Ballot-stuffing attack
  • E-commerce analogy: A seller colludes with buyers
    to give itself unfairly good ratings
  • Sleeper attacks
  • Apparently trusted agent defects

37
Trust in Tweets
38
Twitter
  • Large network of people
  • Large number of tweets
  • Tweet: 140-character description of an event
  • Problem: How to organize tweets?

39
Exploiting trust information
  • Rank tweets according to trust information
  • Trust in the user who tweets
  • Belief (trust) in the tweet

40
Trust in the person who tweets
  • Popularity of the user
  • Based on count of followers
  • Reputation of the user
  • Based on history of making informed observations
  • Enrich using PageRank analogy?
  • Highly trusted followers count more than lowly
    trusted followers
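
One way to read the PageRank analogy is to let each user pass a share of their own score to the users they follow. The sketch below is plain PageRank by power iteration on a made-up follower graph; the function name `trust_rank`, the damping value, and the user names are assumptions, not part of the talk.

```python
from typing import Dict, List


def trust_rank(follows: Dict[str, List[str]],
               damping: float = 0.85,
               iterations: int = 50) -> Dict[str, float]:
    """follows[u] lists the users that u follows; score flows from follower to followee."""
    users = sorted(set(follows) | {v for targets in follows.values() for v in targets})
    n = len(users)
    score = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * score[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # A user who follows nobody spreads their score evenly.
                for v in users:
                    new[v] += damping * score[u] / n
        score = new
    return score


# "star" and "other" each have exactly one follower,
# but star's follower ("hub") is itself widely followed.
graph = {"f1": ["hub"], "f2": ["hub"], "f3": ["hub"], "hub": ["star"], "lone": ["other"]}
scores = trust_rank(graph)
print(scores["star"] > scores["other"])  # True
```

A plain follower count would rank "star" and "other" equally; the propagated score separates them.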

41
Belief (Trust) in a tweet
  • Belief in a tweet depends on the trust in the
    user who generates it.
  • Belief in a tweet depends on the content of
    similar tweets (originating from approximately
    the same location around the same time)

42
Trust in Linked Open Data
43
Linked Data
  • The Linking Open Data (LOD) project is a
    community-led effort to create openly accessible,
    and interlinked, RDF Data on the Web.
  • RDF: Resource Description Framework, a
    graph-based representation language

44
Linked Data
45
Exploiting trust for access and standardization
  • Trust in the creator of the data, and belief
    (trust) in the data
  • How well connected is the data?
  • Rank LOD according to trust information

46
Sensor Data on LOD
  • MesoWest weather data in US
  • 20,000 Sensor Systems
  • 1 billion Observational Assertions
  • Sensors linked with Geonames on LOD
  • http://wiki.knoesis.org/index.php/SSW

47
Trust in Active Perception
48
Active Perception
  • Perception is the process of observing,
    hypothesis generation, and verification

49
Evidence-based Trust
  • Observations (and hypotheses) are more trusted if
    they can be verified through empirical evidence
  • Sensors are more trusted if their observations
    are trusted

50
Evidence-based Trust
Trust
Strengthened Trust
51
Additional uses of active perception in sensors
context
  • Determining actionable intelligence by narrowing
    set of explanations to one
  • Enable use of a small set of always-on sensors
    to bootstrap and selectively turn on additional
    sensors in a resource-constrained (e.g., power)
    environment

52
References
  • Krishnaprasad Thirunarayan, Dharan Althuru, Cory
    Henson, and Amit Sheth, "A Local Qualitative
    Approach to Referral and Functional Trust," The
    4th Indian International Conference on Artificial
    Intelligence (IICAI-09), December 2009.
  • Cory Henson, Joshua Pschorr, Amit Sheth, and
    Krishnaprasad Thirunarayan, "SemSOS: Semantic
    Sensor Observation Service," International
    Symposium on Collaborative Technologies and
    Systems (CTS2009), Workshop on Sensor Web
    Enablement (SWE2009), Baltimore, Maryland, 2009.
  • Krishnaprasad Thirunarayan, Cory Henson, and Amit
    Sheth, "Situation Awareness via Abductive
    Reasoning from Semantic Sensor Data: A
    Preliminary Report," International Symposium on
    Collaborative Technologies and Systems (CTS2009),
    Workshop on Collaborative Trusted Sensing,
    Baltimore, Maryland, 2009.

53
References
  • A. Sheth and M. Nagarajan, "Semantics-Empowered
    Social Computing," IEEE Internet Computing,
    Jan/Feb 2009, pp. 76-80.
  • Amit Sheth, Cory Henson, and Satya Sahoo,
    "Semantic Sensor Web," IEEE Internet Computing,
    vol. 12, no. 4, July/August 2008, p. 78-83.

54
Machine and Citizen Sensor Data Demos
  • Illustrate semantic web and information retrieval
    techniques -- spatio-temporal-thematic
    ontologies, mash-ups, machine and citizen sensor
    data analytics

55
Motivating Scenario: Spatio-temporal-thematic
analytics
High-level Sensor
Low-level Sensor
  • How do we check if the three images depict
  • the same time and same place?
  • same entity?
  • a serious threat?

56
Semantic Observation Service Overall
Architecture and Details
57
  • SemSOS Demo:
  • http://knoesis.wright.edu/research/semsci/application_domain/sem_sensor/cory/demos/ssos_demo/ssos_demo.htm
  • Twitris Demo: http://twitris.knoesis.org/

58
Situation Awareness Analysis
  • Situation Awareness Components
  • Physical World: Sensor Data
  • Perception: Entity Metadata
  • Comprehension: Relationship Metadata
  • Semantic Analysis
  • How is the data represented? Sensor Web
    Enablement
  • What are the sources of the data?
    Provenance Analysis
  • What objects/events account for the data?
    Abductive Reasoning
  • Where did the event occur?
    Spatial Analysis
  • When did the event occur?
    Temporal Analysis
  • What is the significance of the event?
    Thematic Analysis
  • What are the reasons for inconsistency?
    Abductive Reasoning

59
A Unified Approach to Retrieving Web Documents
and Semantic Web Data
  • Trivikram Immaneni and Krishnaprasad
    Thirunarayan
  • Department of Computer Science and Engineering
  • Wright State University
  • Dayton, OH-45435, USA
  • Currently at Technorati, San Francisco

60
Outline
  • Goal (What?)
  • Background and Motivation (Why?)
  • Unified Web Model (What?)
  • Query Language and Examples (What?)
  • Implementation Details (How?)
  • Evaluation and Applications (Why?)
  • Conclusions

61
Goal
62
  • Integrate HTML Web and Semantic Web by
    establishing and exploiting connections between
    them => Unified Web Model
  • Design and implement a language to retrieve data
    and documents from the Unified Web => Hybrid
    Query Language
  • Implement the system using mature software
    components for indexing and search => SITAR

63
Background and Motivation
64
HTML Web
  • Hyperlinked Web of documents
  • Content human comprehensible
  • Search engines and web browsers search, retrieve,
    navigate, and display information
  • Keyword-based searches have low precision and
    high recall
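
For readers less familiar with the two measures, here is a small sketch of their standard definitions over sets of document ids. The function name and the sample ids are made up for illustration.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# A broad keyword match returns ten documents, only two of which are relevant.
retrieved = {f"d{i}" for i in range(1, 11)}
relevant = {"d1", "d2"}
print(precision_recall(retrieved, relevant))  # (0.2, 1.0)
```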

65
Semantic Web
  • Standards-based labeled graph of resources and
    binary properties (data)
  • Content machine accessible
  • Database techniques adapted to store and
    retrieve Semantic Web data
  • Query formulation by lay users difficult but
    results are precise
  • XML, RDF, SPARQL, Web Services, etc.

66
Shoehorning HTML Web into Semantic Web
  • Document as data node, with its content stored
    as a string in the RDF graph
  • Regular expressions in SPARQL used to retrieve
    documents.
  • Drawbacks that IR tries to overcome:
  • Ease of query formulation: keyword-based
  • Dealing with large datasets: ranking

67
Formalizing HTML Web as Semantic Web
  • Manual (re-)authoring of (legacy) documents
    using Semantic Web technologies is neither
    feasible nor advisable.
  • State-of-the-art NLP and information extraction
    techniques are inadequate
  • Informal description is indispensable for human
    comprehension
  • Escape route: Traceability via superposition
    (e.g., RDFa)

68
Shoehorning Semantic Web into HTML Web
  • Currently, Semantic Web documents live on the
    HTML Web but their components are neither
    accessible nor reasoned with via keyword-based
    searches
  • Swoogle attempts to rank Semantic Web documents

69
Unified Web Model (What?)
70
Aim
  • Unified Web integrates the two Webs to enable
    improved hybrid retrieval of data and documents.
  • Unified Web Model
  • Hybrid Query Language

71
Unified Web Model Graph
  • Node
  • Abstract entity identified by its URI
  • Blank/Literal node names automatically generated
  • Home URI Section
  • URI index words
  • Document Section (optional)
  • Outgoing Links Section
  • Triples Section

72
(contd)
  • Relationships (Edges)
  • hasDocument
  • Relates Node to content string
  • hyperlinksTo
  • Relates Node with another node to which the
    former node's document contains a hyperlink
  • Asserts
  • Relates Node with each RDF statement in the
    document
  • linksTo
  • Relates Node with another node
  • to which the former node's document contains a
    hyperlink, or
  • such that the former node's document contains a
    triple with the latter node
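
The node sections and edge types above might be represented roughly as follows. This is a sketch under assumptions: the field names are invented, and linksTo is computed as the union of hyperlink targets and every term appearing in an asserted triple.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Node:
    uri: str                                               # Home URI section
    index_words: Set[str] = field(default_factory=set)     # URI index words
    document: Optional[str] = None                         # Document section (hasDocument)
    hyperlinks_to: Set[str] = field(default_factory=set)   # Outgoing links (hyperlinksTo)
    asserts: List[Triple] = field(default_factory=list)    # Triples section (asserts)

    def links_to(self) -> Set[str]:
        """linksTo: hyperlink targets plus nodes mentioned in asserted triples."""
        mentioned = {term for triple in self.asserts for term in triple}
        return self.hyperlinks_to | mentioned


page = Node(
    uri="http://www.abc.com/xyz.htm",
    document="<html>...</html>",
    hyperlinks_to={"mailto:joe@abc.com"},
    asserts=[("http://www.abc.com/swJaguar", "rdf:type", "owl:Class")],
)
print(sorted(page.links_to()))
```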

73
Example of Unified Web Model
  • Document http://www.abc.com/xyz.htm contains the
    RDF fragment:
  • <mailTo:joe@abc.com/>
  • <rdf:RDF>
  • <owl:Class rdf:ID="http://www.abc.com/swJaguar"
    />
  • </rdf:RDF>

74
(No Transcript)
75
Data Retrieval from Unified Web
  • Unified Web Model can be specified using RDF
  • In terms of rdfs:Resource, rdf:Property,
    rdf:Statement, rdfs:Literal, rdf:subject,
    rdf:predicate, rdf:object, etc.
  • Unified Web is a reified Semantic Web (user
    triples)
  • SPARQL usable as query language
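
To make "reified" concrete, the sketch below turns one user triple into the standard RDF reification quad plus an asserts edge from the containing node. The function name `reify` and the statement id are hypothetical; the vocabulary terms rdf:Statement, rdf:subject, rdf:predicate, and rdf:object are the standard ones.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify(asserting_node: str, triple: Triple, statement_id: str) -> List[Triple]:
    """Describe a user triple as an rdf:Statement asserted by a node."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
        (asserting_node, "asserts", statement_id),
    ]


for t in reify("http://www.abc.com/xyz.htm",
               ("http://www.abc.com/swJaguar", "rdf:type", "owl:Class"),
               "_:stmt1"):
    print(t)
```

Because the result is itself a set of triples, a query can ask which node asserts a given statement as well as what the statement says.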

76
Information Retrieval from Unified Web
  • Node can be indexed using URI index words
  • Based on name, content, label, triples, etc
  • Node can be ranked using its phrasal / URI-based
    annotations and its node neighborhood

77
Advantages
  • Semantic Web nodes can be retrieved using
    (associated) keywords
  • Legacy document recall improved by interpreting
    hyperlink as Semantic Markup for reasoning.
  • Hyperlink: mailto:abc@wright.edu
  • Triple:
  • <mailto:abc@wright.edu rdf:type univ:prof>

78
  • Semantics rich URIs (such as those from
    dictionary.com) in legacy documents can be
    incrementally equated with ontologies
  • Document: <a href="http://dictionary.com/search?q=jaguar">
    Jaguar </a> God of the Underworld
  • Ontology: <http://dictionary.com/search?q=jaguar
    owl:sameAs http://www.animalOnto.com/Jaguar> ...

79
Query Language and Examples (What?)
80
Aim
  • Store and retrieve Semantic Web data, and use
    information in documents to enhance data
    retrieval
  • Enable use of keywords to deal with lack of
    complete URI information
  • Peter affiliated-with ?X
  • Enable use of partial information about data
    being searched
  • Student Peter affiliated-with ?X

81
  • Store and retrieve documents, and use information
    in the Semantic Web to enhance document retrieval
  • Docsearch(<animal><jaguar> Maya God)

82
Sample Queries
  • Wordset queries
  • <peter haase> -> retrieves all URIs indexed by
    BOTH peter AND haase
  • Includes document and URIs
  • URIs are indexed by words.
  • The words are obtained by analyzing URIs, from
    label literals, and anchor text of the URIs.
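
A wordset query of this kind can be sketched with a small inverted index from words to URIs. This is not SITAR's Lucene-based implementation; the class `UriIndex`, the tokenizer, and the example URIs are assumptions for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def words_of(text: str) -> Set[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return {w for w in re.split(r"[^A-Za-z0-9]+", text.lower()) if w}


class UriIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, uri: str, *labels: str) -> None:
        """Index a URI by words from the URI itself and from its labels or anchor text."""
        for text in (uri, *labels):
            for word in words_of(text):
                self.postings[word].add(uri)

    def wordset(self, query: str) -> Set[str]:
        """Return URIs indexed by ALL words of the query."""
        sets = [self.postings.get(word, set()) for word in words_of(query)]
        return set.intersection(*sets) if sets else set()


index = UriIndex()
index.add("http://example.org/people#haase", "Peter Haase")
index.add("http://example.org/people#pan", "Peter Pan")
print(index.wordset("peter haase"))  # {'http://example.org/people#haase'}
```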

83
  • Wordset Pair queries
  • <phdstudent><peter> -> specifies that user is
    looking for peter, the phd student
  • Transitive closure

84
More Queries
  • Get the home page of Peter, the PhD student
  • getBindings ( <phdstudent><peter>
    <homepage> ?x )
  • Get Peter Haase's publications that have
    "Semantic" in their title
  • getBindings(<peter haase> <publication> ?x
    ?x <title> <semantic>)
  • Get the group 1 element that is white in color
  • getBindings( ?x <group> <group 1> ?x
    <color> <white> )
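
The simplest form of such a query, a keyword-specified subject and property with one variable as the object, might work roughly as sketched below. The function names, the `ex:` URIs, and the word index are all hypothetical, and this covers only a single triple pattern rather than the full query language.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def resolve(term: str, index_words: Dict[str, Set[str]]) -> Set[str]:
    """URIs whose index words contain every word of the keyword term."""
    wanted = set(term.lower().split())
    return {uri for uri, words in index_words.items() if wanted <= words}


def get_bindings(subject_term: str, property_term: str,
                 triples: List[Triple],
                 index_words: Dict[str, Set[str]]) -> Set[str]:
    """Bindings for ?x in the pattern <subject_term> <property_term> ?x."""
    subjects = resolve(subject_term, index_words)
    properties = resolve(property_term, index_words)
    return {o for (s, p, o) in triples if s in subjects and p in properties}


index_words = {
    "ex:id2023": {"peter", "haase", "phdstudent"},
    "ex:id7": {"peter", "pan"},
    "ex:publication": {"publication"},
}
triples = [
    ("ex:id2023", "ex:publication", "ex:paper1"),
    ("ex:id7", "ex:publication", "ex:paper2"),
]
print(get_bindings("peter haase", "publication", triples, index_words))  # {'ex:paper1'}
```

The keywords stand in for URIs the user does not know, which is the convenience the hybrid language aims for.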

85
  • Homepages of PhD students named Peter that talk
    about Semantic Grid
  • getDocsByBindingsAndContent
    ( <phdstudent><peter> <homepage> ?x
    semantic grid )
  • getLinkingNodes
    ( http://www.aifb.uni-karlsruhe.de/Personen/viewPerson?id_db=2023 )
  • getAssertingNodes
    (<peter haase> <publication> ?x).
  • getDocsByIndexOrContent (peter haase)

86
Implementation Details (How?)
SITAR: Semantic InformaTion Analysis and
Retrieval system
87
Tools Used
  • Apache Lucene 2.0 APIs in Java
  • A high-performance, text search engine library
    with smart indexing strategies.
  • CyberNeko HTML Parser
  • Jena ARP RDF parser

88
Evaluation and Application (Why?)
89
Experiments
  • Datasets
  • AIFB SEAL data
  • The crawler collected 1665 files (English XHTML
    pages and RDF/OWL pages).
  • 1455 (610 RDF files and 845 XHTML files) were
    successfully parsed and indexed
  • A total of 193520 triples were parsed and indexed

90
  • Datasets (contd)
  • TAP dataset
  • Periodic table
  • Lehigh University Benchmark

91
Conclusions
92
  • Developed a Hybrid Query language for data and
    document retrieval
  • that is convenient because it is keyword-based
  • that can be accurate and flexible because
    disambiguation information can be provided
  • that is expressive because it can support
    inheritance reasoning
  • that is pragmatic because it can work with legacy
    documents
  • FUTURE WORK: Robust Ranking Strategy

93
References
  • T. Immaneni and K. Thirunarayan, "A Unified
    Approach to Retrieving Web Documents and Semantic
    Web Data," In Proceedings of the 4th European
    Semantic Web Conference (ESWC 2007), LNCS 4519,
    pp. 579-593, June 2007.
  • T. Immaneni and K. Thirunarayan, "Hybrid
    Retrieval from the Unified Web," In Proceedings
    of the 22nd Annual ACM Symposium on Applied
    Computing (ACM SAC 2007), pp. 1376-1380, March
    2007.

94
THANK YOU!
  • http://knoesis.wright.edu/tkprasad/