Social Networks, the Semantic Web, and the Future of Online Scientific Collaboration

1
Social Networks, the Semantic Web, and the Future
of Online Scientific Collaboration
  • Jennifer Golbeck
  • University of Maryland, College Park

2
Overview
  • What is the Semantic Web?
  • How can it help us do science?
  • About Web-based Social Networks
  • Combining the Semantic Web, Social Nets, Science,
    and Provenance

3
What is the Semantic Web
  • Extension of the current web
  • Make information machine processable
  • Supported at the W3C

4
Current Web to Semantic Web
  • HTML is designed to make documents on the web
    easy to read for humans
  • Computers have difficulty understanding what is
    on the web
  • We do ok with keywords for text
  • What about videos, pictures, songs, data?

5
Stuff We Want
  • Find me the mp3 of a song that was on the
    Billboard top 10 that uses a cowbell
  • Show me the URLs of the blogs written by people
    my friends know
  • Get a video where it's snowing
  • All of this is hard to do on the web as it stands

6
Making it Easier
  • On the Semantic Web, data is represented in a
    machine readable standard format
  • Some created automatically, some by humans
  • Ontologies add semantics
  • Each datum is uniquely identified by a URI
  • Distributed data can be aggregated and integrated
    into one model

7
Semantic Web Technologies
  • URIs
  • Ontologies
  • Standard Languages
  • RDF
  • RDFS
  • OWL
  • SPARQL

8
Example: A Video of It Snowing
  • On the Semantic Web, people will annotate their
    data, but they won't annotate everything
  • If my video is of two government officials
    meeting, the weather may be irrelevant to me
  • How can the semantic web solve this? Do people
    have to annotate everything?

9
Linking Distributed Data
[Diagram: a Video linked to distributed data about its Location, Camera Info, Date, President, Prime Minister, and More data]

10
Data Aggregation
  • URIs are unique.
  • If the same URI is used in two files, it refers
    to the same object
  • Semantic Web tools (e.g. things like databases
    that understand the semantics of the languages)
    build models that merge information about the
    same URI
  • Model can be queried, filtered, used
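
The merging step described above can be sketched in a few lines of Python. This is a minimal illustration, not a real Semantic Web toolkit: the triples, the `ex:` property names, and the `example.org` URIs are all hypothetical, and a production system would use an RDF store instead of nested dictionaries.

```python
from collections import defaultdict


def merge_by_uri(*sources):
    """Merge (subject, predicate, object) triples from several files into one model.

    Statements about the same subject URI end up in the same entry, no matter
    which file they came from.
    """
    model = defaultdict(lambda: defaultdict(set))
    for triples in sources:
        for subject, predicate, obj in triples:
            model[subject][predicate].add(obj)
    return model


# Hypothetical URIs and property names, for illustration only.
VIDEO = "http://example.org/video/42"
PLACE = "http://example.org/place/washington-dc"

my_annotations = [
    (VIDEO, "ex:depicts", "http://example.org/person/president"),
    (VIDEO, "ex:recordedOn", "2006-02-11"),
]
camera_metadata = [
    (VIDEO, "ex:recordedAt", PLACE),
    (VIDEO, "ex:camera", "http://example.org/camera/7"),
]

model = merge_by_uri(my_annotations, camera_metadata)
for predicate, values in sorted(model[VIDEO].items()):
    print(predicate, sorted(values))
```

Because both files use the same URI for the video, the merged model holds its date and location side by side. A third source describing the weather at that place and date could then, in principle, answer the snow question without anyone annotating the video with weather.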

11
Semantic Web for Science
12
Provenance
  • The history of a file or resource
  • Files that were used in its creation
  • Processes executed to create it
  • When, where it was created
  • Who created it

13
Why is it important?
  • People in the scientific and intelligence
    communities are very interested in provenance
  • Science: provenance of data can be used to
    recreate them
  • Intelligence: provenance of information is
    important to determine its reliability

14
Example in Science
  • We want to track the workflow that led to a
    given scientific image
  • What were the files used to create it?
  • What is the provenance of those files?
  • What process was performed to create the file?
  • When was that file created?
  • Who executed the processes?

15
Case Study: A Semantic Web Approach to the
Provenance Challenge
16
The Provenance Challenge
  • Tracking provenance is a growing topic of
    interest to computer scientists
  • Applications to grid computing, file systems,
    databases, etc
  • The challenge is to build a system that will
    track the provenance of files produced from a
    workflow
  • Series of procedures performed to produce output
  • functional Magnetic Resonance Imaging (fMRI) is
    the example in the challenge

17
(No Transcript)
18
Challenge
  • Represent all data that we consider relevant
    about the history of each file
  • Answer as many queries as possible

19
Queries
  • Find everything that caused a given Graphic to be
    as it is.
  • Find all invocations of procedure align_warp
    using a twelfth order nonlinear 1365 parameter model
    that ran on a Monday.
  • Find all images where at least one of the input
    files had an entry global maximum=4095.
  • A user has annotated some images with a
    key-value pair center=UChicago. Find the outputs
    of align_warp where the inputs are annotated with
    center=UChicago.
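
The first query (everything that caused a given graphic) amounts to walking the dependency graph backwards. A rough Python sketch follows, assuming each output file is recorded with the procedure that produced it and that procedure's inputs. The file names and the `produced_by` table are hypothetical; the approach in the talk expresses this with an OWL ontology and SPARQL, not Python dictionaries.

```python
def upstream(target, produced_by):
    """Return (files, procedures) that contributed, directly or not, to target."""
    files, procedures = set(), set()
    stack = [target]
    while stack:
        item = stack.pop()
        record = produced_by.get(item)
        if record is None:
            continue  # a raw input with no recorded history
        procedure, inputs = record
        procedures.add(procedure)
        for input_file in inputs:
            if input_file not in files:
                files.add(input_file)
                stack.append(input_file)
    return files, procedures


# Hypothetical file names; each entry is output -> (procedure, inputs).
produced_by = {
    "warp1.warp": ("align_warp", ["anatomy1.img", "reference.img"]),
    "resliced1.img": ("reslice", ["warp1.warp"]),
    "atlas.img": ("softmean", ["resliced1.img"]),
    "atlas-x.pgm": ("slicer", ["atlas.img"]),
    "atlas-x.gif": ("convert", ["atlas-x.pgm"]),
}

files, procedures = upstream("atlas-x.gif", produced_by)
print(sorted(files))
print(sorted(procedures))
```

The other queries add filters on top of the same traversal, for example keeping only `align_warp` invocations whose inputs carry a given annotation.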

20
Semantic Web Approach
  • Each procedure in the workflow is encoded as a
    web service
  • Workflow is an execution of a series of web
    services
  • Web Services take files as input and output files
    to the web

21
Semantic Web Approach
  • Ontology represents information about the
    execution of services and the dependencies of
    files

22
Provenance.owl
23
Answering the Queries
  • SPARQL, a W3C standard, is used to formulate
    queries
  • Reasoning with the semantics of OWL and some rules

24
Results
  • We were easily able to answer all nine queries
    for the challenge
  • Semantic Web is an easy and natural format for
    representing the provenance of scientific
    information
  • So, with a format for representing data and
    metadata, what next?

25
Social Networks: The Phenomenon
26
What are Web-based Social Networks
  • Websites where users set up accounts and list
    friends
  • Users can browse through friend links to explore
    the network
  • Some are just for entertainment, others have
    business/religious/political purposes
  • E.g. MySpace, Friendster, Orkut, LinkedIn

27
Growth of Social Nets
  • The big web phenomenon
  • About 150 different social networking websites
    (that meet the definition that they can be
    browsed)
  • 275,000,000 user accounts among the networks
  • Number of users has doubled in the last 18 months
  • Full list at http://trust.mindswap.org

28
Biggest Networks
  • MySpace 120,000,000
  • Adult Friend Finder 23,000,000
  • Friendster 21,000,000
  • Tickle 20,000,000
  • BlackPlanet 17,000,000
  • Hi5 14,000,000
  • LiveJournal 10,000,000
  • Orkut 8,500,000
  • Facebook 8,000,000
  • Asia Friend Finder 6,000,000

29
Social Networks on the Semantic Web
  • FOAF (Friend Of A Friend)
  • A simple ontology for representing information
    about people and who they know
  • About 20,000,000 social network profiles are
    available in FOAF format
  • Approximately 60% of all semantic web data is
    FOAF data

30
Structure of Social Nets
  • Small World Networks
  • AKA Six degrees of separation (or six degrees of
    Kevin Bacon)
  • Term coined by Stanley Milgram, 1967
  • Math of Small Worlds
  • Average shortest path length grows
    logarithmically with the size of the network
  • Short average path length
  • High clustering coefficient (friends of mine who
    are friends with other friends of mine)
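
Both small-world measures are easy to compute on a toy network. The sketch below is illustrative only: the four-person graph is made up, friendships are treated as undirected, and the graph is stored as a dictionary of neighbor sets.

```python
from collections import deque


def average_shortest_path(graph):
    """Mean shortest-path length over all connected ordered pairs of nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's friends who are friends with each other."""
    friends = list(graph[node])
    k = len(friends)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if friends[j] in graph[friends[i]]
    )
    return links / (k * (k - 1) / 2)


# Hypothetical undirected friendship network.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

print(average_shortest_path(graph))
print(clustering_coefficient(graph, "a"))
```

Here only one of the three pairs of a's friends (b and c) know each other, so a's clustering coefficient is one third.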

31
Trust in Social Networks
  • People annotate their relationships with
    information about how much they trust their
    friends
  • Trust can be binary (trust or don't trust) or on
    some scale
  • This work uses a 1-10 scale where 1 is low trust
    and 10 is high trust
  • At least 8 social networks have some mechanism
    for expressing trust explicitly, several dozen
    have implicit trust information

32
Using Trust from Social Networks
  • If we have trust available from a social network,
    how can we use that?
  • Trust in people can influence how likely we are
    to:
  • Give them access to information
  • Accept information from them at all
  • Consider the quality of information from them

33
Examples
  • Only people I trust can see my phone number
  • I will only accept emails from people I trust

34
Challenges to Using Trust
  • Each person only knows a very very small part of
    the network
  • For people we know, some automatic use of trust
    may be helpful, but it does not provide any new
    information
  • If we have access to the network, we need a way
    to compute how much we should trust others

35
Inferring Trust
The Goal: Select two individuals - the source
(node A) and sink (node C) - and recommend to the
source how much to trust the sink.
[Diagram: A rates B with trust tAB and B rates C with trust tBC; the inferred trust from A to C is tAC]

36
Caveats and Insights
  • Trust is contextual
  • Trust is asymmetric
  • Trust is not exactly transitive

37
(No Transcript)
38
Trust Algorithm
  • If the source does not know the sink, the source
    asks all of its friends how much to trust the
    sink, and computes a trust value by a weighted
    average
  • Neighbors repeat the process if they do not have
    a direct rating for the sink
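
A simplified version of this recursive weighted average can be written as follows. It is a sketch under assumptions, not necessarily the exact algorithm used in this work: the weights are the source's own ratings of its friends, cycles are avoided by tracking visited nodes, and the names in `ratings` are invented.

```python
def infer_trust(ratings, source, sink, visited=frozenset()):
    """Recommend how much source should trust sink, or None if no path exists.

    ratings[a][b] is a's direct trust in b on the 1-10 scale.
    """
    direct = ratings.get(source, {})
    if sink in direct:
        return float(direct[sink])  # a direct rating always wins
    visited = visited | {source}
    weighted_sum = weight_total = 0.0
    for friend, trust_in_friend in direct.items():
        if friend in visited:
            continue  # do not walk in circles
        recommendation = infer_trust(ratings, friend, sink, visited)
        if recommendation is None:
            continue  # this friend has no opinion, direct or inferred
        weighted_sum += trust_in_friend * recommendation
        weight_total += trust_in_friend
    return weighted_sum / weight_total if weight_total else None


# Hypothetical ratings: A does not know C, but A's friends B and D do.
ratings = {
    "A": {"B": 9, "D": 3},
    "B": {"C": 8},
    "D": {"C": 2},
    "C": {},
}

print(infer_trust(ratings, "A", "C"))
```

In this example the highly trusted friend B dominates the average: (9 × 8 + 3 × 2) / (9 + 3) = 6.5.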

39
How Well Does It Work?
  • Pretty well
  • On networks where we have tested it, trust is
    computed accurately within about 10%
  • Test this by taking a known trust value, deleting
    the edge between those people, comparing the
    known value with the value we compute
  • 10% is very good for social systems with lots of
    noise
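
The hold-one-edge-out test can be sketched like this. The `one_hop_infer` function is only a stand-in so the example runs on its own (it averages over friends who rate the sink directly); any inference function with the same signature could be passed instead. The ratings are invented.

```python
import copy


def one_hop_infer(ratings, source, sink):
    """Stand-in inference: weighted average over friends who rate sink directly."""
    weighted_sum = weight_total = 0.0
    for friend, trust_in_friend in ratings.get(source, {}).items():
        friend_ratings = ratings.get(friend, {})
        if sink in friend_ratings:
            weighted_sum += trust_in_friend * friend_ratings[sink]
            weight_total += trust_in_friend
    return weighted_sum / weight_total if weight_total else None


def leave_one_out_error(ratings, infer):
    """Mean absolute difference between known ratings and re-inferred ones."""
    errors = []
    for source, outgoing in ratings.items():
        for sink, known in outgoing.items():
            held_out = copy.deepcopy(ratings)
            del held_out[source][sink]  # delete the edge we want to predict
            predicted = infer(held_out, source, sink)
            if predicted is not None:
                errors.append(abs(predicted - known))
    return sum(errors) / len(errors) if errors else None


# Hypothetical ratings on the 1-10 scale.
ratings = {
    "A": {"B": 9, "C": 7},
    "B": {"C": 8},
    "C": {},
}

print(leave_one_out_error(ratings, one_hop_infer))
```

Only the A-to-C edge can be re-inferred in this tiny network (through B, giving 8 against a known 7), so the mean absolute error is 1 point on the scale.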

40
Applications of Trust
  • With direct knowledge or a recommendation about
    how much to trust people, this value can be used
    as a filter in many applications
  • Since social networks are so prominent on the
    web, it is a public, accessible data source for
    determining the quality of annotations and
    information

41
Ordering
  • Use trust to determine the order in which
    information is presented
  • Aggregating
  • If data is aggregated, we can use trust to
    determine how much weight is given to different
    sources
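
Both uses fit in a few lines. The sketch below assumes each piece of information is tagged with its source and that a trust value (direct or inferred) is available for every source; the names, values, and field names are hypothetical.

```python
def order_by_trust(items, trust):
    """Present items from the most trusted source first."""
    return sorted(items, key=lambda item: trust.get(item["source"], 0), reverse=True)


def trust_weighted_value(items, trust):
    """Aggregate numeric values, weighting each by trust in its source."""
    total_weight = sum(trust.get(item["source"], 0) for item in items)
    if total_weight == 0:
        return None  # nobody trusted enough to contribute
    weighted = sum(trust.get(item["source"], 0) * item["value"] for item in items)
    return weighted / total_weight


# Hypothetical ratings of the same thing from three sources.
items = [
    {"source": "bob", "value": 2.0},
    {"source": "alice", "value": 4.0},
    {"source": "carol", "value": 5.0},
]
trust = {"alice": 9, "bob": 3, "carol": 6}

print([item["source"] for item in order_by_trust(items, trust)])
print(trust_weighted_value(items, trust))
```

A plain average of the three values would be about 3.67; weighting by trust pulls the result to 4.0 because the low-trust source counts for less.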

42
Social Networks for Science
  • Data Provenance Social Networks Social
    Policies

43
Policies on the Web
  • Policies on the web are used to filter and
    restrict access to information for:
  • Security
  • Privacy
  • Trust
  • Information filtering
  • Accountability
  • Important because of the open nature of the web

44
Applications of the policy aware web
  • Website access
  • Network routing
  • Storage management
  • Grid computing
  • Pervasive computing
  • Information filtering
  • Digital rights management
  • Collaboration

45
Applications and Industrial Interest
  • Internet Content Rating Agency
  • Using policies and rules to develop content
    ratings for websites
  • Efforts underway at:
  • Microsoft, IBM, Sun, BEA, Oracle
  • Heavily discussed at W3C Workshop on Constraints
    and Capabilities for Web Services
  • http://www.w3.org/2004/09/ws-cc-program.html

46
Example Policies
  • Only allow members of my research group to access
    this data set
  • Reject messages from anyone whose address is not
    on my list of verified senders

47
Policies and Trust
  • Only users whose inferred trust rating is a 9 or
    10 may run processes on this shared computing
    resource
  • Preprints of this paper are accessible
    only to trusted Fermilab personnel, members of
    the research team at other institutions, or the
    NSF advisory board
  • Include information in my knowledge base only if
    it, and all the files and processes in its
    provenance, were created or executed by people I
    trust at a level 7 or above
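
The last policy combines provenance with trust, and can be sketched as a check over the whole history of an item. This is an illustration under simplifying assumptions: each provenance record names a single responsible person and its input files, anything with unknown history is rejected, and all file names and trust values are made up.

```python
def admissible(item, provenance, trust, threshold=7):
    """True if item and everything in its provenance come from trusted people."""
    stack, seen = [item], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        record = provenance.get(current)
        if record is None:
            return False  # unknown history: reject rather than guess
        if trust.get(record["creator"], 0) < threshold:
            return False
        stack.extend(record["inputs"])
    return True


# Hypothetical provenance records and trust values (1-10 scale).
provenance = {
    "atlas.gif": {"creator": "alice", "inputs": ["atlas.img"]},
    "atlas.img": {"creator": "bob", "inputs": ["scan.img"]},
    "scan.img": {"creator": "carol", "inputs": []},
}
trust = {"alice": 9, "bob": 8, "carol": 5}

print(admissible("atlas.gif", provenance, trust))
print(admissible("atlas.gif", provenance, trust, threshold=5))
```

With the threshold of 7 from the policy, the image is rejected because the original scan came from someone trusted only at level 5, even though the people who processed it later are highly trusted.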

48
Extending Trust to Science
  • In collaborative scientific environments, some
    data and resources require strict access control
    (username / password)
  • For others, this level of control is unnecessary
    and cumbersome

49
Trust for Access Control
  • With a scientific social network, trust can be
    used to restrict access to:
  • Data
  • Computing resources
  • It can also be used to:
  • Limit what data is integrated into a knowledge
    base
  • Weight conflicting information from different
    sources according to the trustworthiness of the
    source

50
Leading to Collaboration
  • The semantic web with social networks provides a
    platform for:
  • Publishing data
  • Publishing metadata (so experiments can be
    verified)
  • Limiting/granting access to sensitive data
  • Gathering data from other sources
  • Filtering data from the web

51
What do we need to do?
  • Easy Steps
  • Building ontologies for representing scientific
    data / metadata
  • Publishing data on the web

52
What do we need to do?
  • Hard Steps (because people don't want to do it)
  • Developing web policies for limiting access to
    non-critical data
  • Webmasters can do this, with training and
    collaboration with data owners
  • Motivating scientists into social networks

53
Forcing the Anti-Social Into Social Nets
  • Can't expect scientists to use a Facebook/MySpace
    style social network
  • (and we probably don't want to see that anyway)
  • Integrate social networking into other activities
  • E.g. email

54
The Payoff
  • A whole new way of working over the web
  • Multiple levels of collaboration
  • New ways of sharing data and working together

55
Conclusions
  • The intersection of the Semantic Web, social
    networks, and science holds great promise for
    revolutionizing collaboration over the web
  • Steps to achieving it are mostly social, not
    technological
  • Motivating the use of these technologies among
    everyone involved with data
  • Introducing new ways to collaborate and
    encouraging adoption of new techniques

56
Questions
  • Jennifer Golbeck
  • Golbeck_at_cs.umd.edu
  • http://trust.mindswap.org