Social Networks, the Semantic Web, and the Future of Online Scientific Collaboration - PowerPoint PPT Presentation

Slides: 57

Provided by: jennifer886

Transcript and Presenter's Notes

1
Social Networks, the Semantic Web, and the Future
of Online Scientific Collaboration

Jennifer Golbeck
University of Maryland, College Park

2
Overview

What is the Semantic Web?
How can it help us do science?
About Web-based Social Networks
Combining the Semantic Web, Social Nets, Science,
and Provenance

3
What is the Semantic Web?

Extension of the current web
Make information machine processable
Supported at the W3C

4
Current Web to Semantic Web

HTML is designed to make documents on the web
easy to read for humans
Computers have difficulty understanding what is
on the web
We do ok with keywords for text
What about videos, pictures, songs, data?

5
Stuff We Want

Find me the mp3 of a song that was on the
Billboard top 10 that uses a cowbell
Show me the URLs of the blogs written by people
my friends know
Get a video where it's snowing
All of this is hard to do on the web as it stands

6
Making it Easier

On the Semantic Web, data is represented in a
machine readable standard format
Some created automatically, some by humans
Ontologies add semantics
Each datum is uniquely identified by a URI
Distributed data can be aggregated and integrated
into one model

7
Semantic Web Technologies

URIs
Ontologies
Standard Languages
RDF
RDFS
OWL
SPARQL

8
Example: A Video of It Snowing

On the Semantic Web, people will annotate their
data, but they won't annotate everything
If my video is of two government officials
meeting, the weather may be irrelevant to me
How can the semantic web solve this? Do people
have to annotate everything?

9
Linking Distributed Data

[Diagram: a Video node linked to Location, Camera
Info, Date, President, Prime Minister, and more data]

10
Data Aggregation

URIs are unique.
If the same URI is used in two files, it refers
to the same object
Semantic Web tools (e.g. things like databases
that understand the semantics of the languages)
build models that merge information about the
same URI
Model can be queried, filtered, used
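
A minimal sketch of this kind of aggregation, using plain Python sets of (subject, predicate, object) triples rather than a real RDF store. All URIs and predicate names here are hypothetical, chosen to mirror the snowing-video example. Because the same URI appears in separate files, a simple union is enough to link them, and a query over the merged model can find a snowy video that nobody annotated with weather.

```python
# Hypothetical URIs and predicates, for illustration only.
VIDEO = "http://example.org/video/42"
PLACE = "http://example.org/place/washington"

# Three independently published "files", each a set of triples.
video_file = {
    (VIDEO, "ex:depicts", "http://example.org/person/president"),
    (VIDEO, "ex:location", PLACE),
}
camera_file = {(VIDEO, "ex:recordedOn", "2006-02-12")}
weather_file = {(PLACE, "ex:snowedOn", "2006-02-12")}


def merge(*sources):
    """Union the triple sets; a shared URI becomes a single node."""
    model = set()
    for triples in sources:
        model |= triples
    return model


def objects(model, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the model."""
    return {o for s, p, o in model if s == subject and p == predicate}


def snowy_videos(model):
    """Things recorded at a place on a date when it snowed there."""
    found = set()
    for subject, predicate, place in model:
        if predicate != "ex:location":
            continue
        for date in objects(model, subject, "ex:recordedOn"):
            if date in objects(model, place, "ex:snowedOn"):
                found.add(subject)
    return found


merged = merge(video_file, camera_file, weather_file)
```

No single file answers the question on its own; only the merged model does.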

11
Semantic Web for Science
12
Provenance

The history of a file or resource
Files that were used in its creation
Processes executed to create it
When, where it was created
Who created it

13
Why is it important?

People in the scientific and intelligence
communities are very interested in provenance
Science: provenance of data can be used to
recreate them
Intelligence: provenance of information is
important to determine its reliability

14
Example in Science

We want to track the workflow that led to a
given scientific image
What were the files used to create it?
What is the provenance of those files?
What process was performed to create the file?
When was that file created?
Who executed the processes?

15
Case Study: A Semantic Web Approach to the
Provenance Challenge
16
The Provenance Challenge

Tracking provenance is a growing topic of
interest to computer scientists
Applications to grid computing, file systems,
databases, etc.
The challenge is to build a system that will
track the provenance of files produced from a
workflow
A workflow is a series of procedures performed to
produce output
Functional Magnetic Resonance Imaging (fMRI) is
the example in the challenge

18
Challenge

Represent all data that we consider relevant
about the history of each file
Answer as many queries as possible

19
Queries

Find everything that caused a given Graphic to be
as it is.
Find all invocations of procedure align_warp
using a twelfth-order nonlinear 1365-parameter
model that ran on a Monday.
Find all images where at least one of the input
files had an entry global maximum=4095.
A user has annotated some images with a
key-value pair center=UChicago. Find the outputs
of align_warp where the inputs are annotated with
center=UChicago.

20
Semantic Web Approach

Each procedure in the workflow is encoded as a
web service
Workflow is an execution of a series of web
services
Web Services take files as input and output files
to the web

21
Semantic Web Approach

Ontology represents information about the
execution of services and the dependencies of
files

22
Provenance.owl
23
Answering the Queries

SPARQL, a W3C standard, is used to formulate
queries
Reasoning with the semantics of OWL and some rules
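
The first query ("find everything that caused a given Graphic to be as it is") amounts to a transitive closure over file dependencies. The sketch below shows that idea in plain Python; it is not the SPARQL and OWL implementation used in the case study. The file names and the PROVENANCE table are hypothetical; only the procedure name align_warp comes from the slides.

```python
from collections import deque

# Hypothetical provenance records: output file -> (process, input files).
PROVENANCE = {
    "atlas-x.gif": ("convert", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("slicer", ["atlas.img"]),
    "atlas.img": ("softmean", ["resliced1.img", "resliced2.img"]),
    "resliced1.img": ("reslice", ["warp1.warp"]),
    "resliced2.img": ("reslice", ["warp2.warp"]),
    "warp1.warp": ("align_warp", ["anatomy1.img", "reference.img"]),
    "warp2.warp": ("align_warp", ["anatomy2.img", "reference.img"]),
}


def everything_that_caused(target, provenance=PROVENANCE):
    """Return (files, processes) that the target transitively depends on."""
    files, processes = set(), set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        record = provenance.get(current)
        if record is None:
            continue  # a raw input with no recorded history
        process, inputs = record
        processes.add(process)
        for name in inputs:
            if name not in files:  # visit each file once
                files.add(name)
                queue.append(name)
    return files, processes


files, processes = everything_that_caused("atlas-x.gif")
```

In an OWL setting, a similar result could come from declaring the dependency property transitive and letting the reasoner do the traversal.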

24
Results

We were easily able to answer all nine queries
for the challenge
Semantic Web is an easy and natural format for
representing the provenance of scientific
information
So, with a format for representing data and
metadata, what next?

25
Social Networks: The Phenomenon
26
What are Web-based Social Networks?

Websites where users set up accounts and list
friends
Users can browse through friend links to explore
the network
Some are just for entertainment, others have
business/religious/political purposes
E.g. MySpace, Friendster, Orkut, LinkedIn

27
Growth of Social Nets

The big web phenomenon
About 150 different social networking websites
(that meet the definition that they can be
browsed)
275,000,000 user accounts among the networks
Number of users has doubled in the last 18 months
Full list at http://trust.mindswap.org

28
Biggest Networks

MySpace 120,000,000
Adult Friend Finder 23,000,000
Friendster 21,000,000
Tickle 20,000,000
BlackPlanet 17,000,000
Hi5 14,000,000
LiveJournal 10,000,000
Orkut 8,500,000
Facebook 8,000,000
Asia Friend Finder 6,000,000

29
Social Networks on the Semantic Web

FOAF (Friend Of A Friend)
A simple ontology for representing information
about people and who they know
About 20,000,000 social network profiles are
available in FOAF format
Approximately 60% of all semantic web data is
FOAF data

30
Structure of Social Nets

Small World Networks
AKA Six degrees of separation (or six degrees of
Kevin Bacon)
Term coined by Stanley Milgram, 1967
Math of Small Worlds
Average shortest path length grows
logarithmically with the size of the network
Short average path length
High clustering coefficient (friends of mine who
are friends with other friends of mine)
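
The two small-world measures above can be computed directly on a friendship graph. This is a minimal sketch assuming an undirected graph stored as a dict of neighbor sets; the four-person example graph is invented for illustration.

```python
from collections import deque
from itertools import combinations


def average_shortest_path(graph):
    """Mean hop count over all reachable ordered pairs, via BFS per node."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(graph):
    """Average fraction of a node's friend pairs who are friends themselves."""
    scores = []
    for node, neighbors in graph.items():
        k = len(neighbors)
        if k < 2:
            scores.append(0.0)  # convention: no friend pairs to compare
            continue
        links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
        scores.append(links / (k * (k - 1) / 2))
    return sum(scores) / len(scores)


# Invented example: a triangle a-b-c with d attached to c.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

A small-world network shows a short average path together with a high clustering coefficient.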

31
Trust in Social Networks

People annotate their relationships with
information about how much they trust their
friends
Trust can be binary (trust or don't trust) or on
some scale
This work uses a 1-10 scale where 1 is low trust
and 10 is high trust
At least 8 social networks have some mechanism
for expressing trust explicitly, several dozen
have implicit trust information

32
Using Trust from Social Networks

If we have trust available from a social network,
how can we use that?
Trust in people can influence how likely we are
to
Give them access to information
Accept information from them at all
Consider the quality of information from them

33
Examples

Only people I trust can see my phone number
I will only accept emails from people I trust

34
Challenges to Using Trust

Each person only knows a very very small part of
the network
For people we know, some automatic use of trust
may be helpful, but it does not provide any new
information
If we have access to the network, we need a way
to compute how much we should trust others

35
Inferring Trust

The Goal: Select two individuals, the source
(node A) and the sink (node C), and recommend to
the source how much to trust the sink.
[Diagram: A rates B (tAB), B rates C (tBC); the
inferred value is tAC]

36
Caveats and Insights

Trust is contextual
Trust is asymmetric
Trust is not exactly transitive

38
Trust Algorithm

If the source does not know the sink, the source
asks all of its friends how much to trust the
sink, and computes a trust value by a weighted
average
Neighbors repeat the process if they do not have
a direct rating for the sink
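
The two steps above can be sketched as a short recursive function. This is a simplified illustration, not the author's published algorithm, which may differ in details such as how path length is limited or which neighbors are consulted. The ratings layout, the max_depth cutoff, and the cycle guard are assumptions made for this sketch.

```python
def infer_trust(ratings, source, sink, max_depth=6, _path=frozenset()):
    """Recommend how much source should trust sink (1-10), or None.

    ratings[a][b] is a's direct trust rating of b.
    """
    path = _path | {source}
    direct = ratings.get(source, {})
    if sink in direct:
        return float(direct[sink])  # a direct rating always wins
    if max_depth == 0:
        return None  # assumed cutoff to keep the search bounded
    weighted_sum = 0.0
    weight_total = 0.0
    for neighbor, trust_in_neighbor in direct.items():
        if neighbor in path:
            continue  # do not ask someone already on this path
        opinion = infer_trust(ratings, neighbor, sink, max_depth - 1, path)
        if opinion is not None:
            # Each neighbor's answer is weighted by our trust in that neighbor.
            weighted_sum += trust_in_neighbor * opinion
            weight_total += trust_in_neighbor
    return weighted_sum / weight_total if weight_total else None


# Invented example: A trusts B highly and D very little.
ratings = {
    "A": {"B": 8, "D": 2},
    "B": {"C": 9},
    "D": {"C": 3},
}
recommendation = infer_trust(ratings, "A", "C")
```

Here B's opinion of C dominates because A trusts B four times as much as D.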

39
How Well Does It Work?

Pretty well
On networks where we have tested it, trust is
computed accurately within about 10%
Test this by taking a known trust value, deleting
the edge between those people, and comparing the
known value with the value we compute
10% is very good for social systems with lots of
noise
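
The test procedure described above is a leave-one-out evaluation, sketched below. The slides do not say how the error figure was normalized, so dividing the mean absolute error by the width of the 1-10 scale is an assumption. The one_hop_estimate function is a deliberately simple stand-in estimator (it asks direct friends only), not the algorithm under test; any function with the same signature can be passed in.

```python
import copy


def one_hop_estimate(ratings, source, sink):
    """Stand-in estimator: weighted average over direct friends only."""
    numerator = 0.0
    denominator = 0.0
    for friend, weight in ratings.get(source, {}).items():
        opinion = ratings.get(friend, {}).get(sink)
        if opinion is not None:
            numerator += weight * opinion
            denominator += weight
    return numerator / denominator if denominator else None


def leave_one_out_error(ratings, estimate, scale=(1, 10)):
    """Mean absolute error as a fraction of the scale width, or None."""
    errors = []
    for source, friends in ratings.items():
        for sink, known in friends.items():
            held_out = copy.deepcopy(ratings)
            del held_out[source][sink]  # delete the known edge
            guess = estimate(held_out, source, sink)
            if guess is not None:  # skip pairs that cannot be inferred
                errors.append(abs(guess - known))
    if not errors:
        return None
    return sum(errors) / len(errors) / (scale[1] - scale[0])


# Invented example network.
ratings = {"A": {"B": 8, "C": 7}, "B": {"C": 9}, "C": {}}
error = leave_one_out_error(ratings, one_hop_estimate)
```

Only the A-to-C edge can be re-inferred in this tiny network, so the error reflects a single comparison (a guess of 9 against a known 7).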

40
Applications of Trust

With direct knowledge or a recommendation about
how much to trust people, this value can be used
as a filter in many applications
Since social networks are so prominent on the
web, it is a public, accessible data source for
determining the quality of annotations and
information

41
Ordering

Use trust to determine the order in which
information is presented
Aggregating
If data is aggregated, we can use trust to
determine how much weight is given to different
sources
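
Both uses can be sketched in a few lines. The function names, the source names, and the choice to treat unknown sources as having zero trust are assumptions made for this illustration.

```python
def order_by_trust(items, trust):
    """Sort (source, content) pairs so the most trusted sources come first."""
    return sorted(items, key=lambda item: trust.get(item[0], 0), reverse=True)


def trust_weighted_average(claims, trust):
    """Combine numeric (source, value) claims, weighting each by trust."""
    total = sum(trust.get(source, 0) for source, _ in claims)
    if total == 0:
        return None  # nobody trusted enough to say anything
    weighted = sum(trust.get(source, 0) * value for source, value in claims)
    return weighted / total


# Invented example: two sources report conflicting measurements.
trust = {"alice": 9, "bob": 3}
claims = [("bob", 10.0), ("alice", 20.0)]
ordered = order_by_trust(claims, trust)
combined = trust_weighted_average(claims, trust)
```

The combined value lands closer to the more trusted source's claim.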

42
Social Networks for Science

Data Provenance Social Networks Social
Policies

43
Policies on the Web

Policies on the web are used to filter and
restrict access to information for
Security
Privacy
Trust
Information filtering
Accountability
Important because of the open nature of the web

44
Applications of the policy aware web

Website access
Network routing
Storage management
Grid computing
Pervasive computing
Information filtering
Digital rights management
Collaboration

45
Applications and Industrial Interest

Internet Content Rating Agency
Using policies and rules to develop content
ratings for websites
Efforts underway at
Microsoft, IBM, Sun, BEA, Oracle
Heavily discussed at W3C Workshop on Constraints
and Capabilities for Web Services
http://www.w3.org/2004/09/ws-cc-program.html

46
Example Policies

Only allow members of my research group to access
this data set
Reject messages from anyone whose address is not
on my list of verified senders

47
Policies and Trust

Only users whose inferred trust rating is a 9 or
10 may run processes on this shared computing
resource
Preprints of this paper are accessible
only to trusted Fermilab personnel, members of
the research team at other institutions, or the
NSF advisory board
Include information in my knowledge base only if
it, and all the files and processes in its
provenance, were created or executed by people I
trust at a level 7 or above
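
The first and third policies can be sketched as simple checks over inferred trust values. This is an illustration in plain Python, not a web policy language. The data layout (a provenance map and a creators map) and all names are hypothetical, and rejecting items with an unknown creator is an assumed design choice.

```python
def may_run_process(user, inferred_trust, minimum=9):
    """Policy: only users rated 9 or 10 may use the shared resource."""
    return inferred_trust.get(user, 0) >= minimum


def admissible(item, provenance, creators, inferred_trust, minimum=7):
    """Policy: the item and everything in its provenance must come from
    people trusted at `minimum` or above. Unknown creators are rejected."""
    pending, seen = [item], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        creator = creators.get(current)
        if inferred_trust.get(creator, 0) < minimum:
            return False
        # Follow the files and processes this one was derived from.
        pending.extend(provenance.get(current, []))
    return True


# Invented example data.
provenance = {"result.img": ["raw.img", "align_warp#1"]}
creators = {"result.img": "alice", "raw.img": "bob", "align_warp#1": "alice"}
inferred_trust = {"alice": 9, "bob": 6}
accepted = admissible("result.img", provenance, creators, inferred_trust)
```

Here the result is rejected because one input file was created by someone trusted at only 6.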

48
Extending Trust to Science

In collaborative scientific environments, some
data and resources require strict access control
(username / password)
For others, this level of control is unnecessary
and cumbersome

49
Trust for Access Control

With a scientific social network, trust can be
used to restrict access to
Data
Computing resources
and
Limit what data is integrated into a knowledge
base
Weight conflicting information from different
sources according to the trustworthiness of the
source

50
Leading to Collaboration

The semantic web with social networks provides a
platform for
Publishing data
Publishing metadata (so experiments can be
verified)
Limiting/granting access to sensitive data
Gathering data from other sources
Filtering data from the web

51
What do we need to do?

Easy Steps
Building ontologies for representing scientific
data / metadata
Publishing data on the web

52
What do we need to do?

Hard Steps (because people don't want to do it)
Developing web policies for limiting access to
non-critical data
Webmasters can do this, with training and
collaboration with data owners
Motivating scientists into social networks

53
Forcing the Anti-Social Into Social Nets

Can't expect scientists to use a Facebook/MySpace
style social network
(and we probably don't want to see that anyway)
Integrate social networking into other activities
E.g. email

54
The Payoff

A whole new way of working over the web
Multiple levels of collaboration
New ways of sharing data and working together

55
Conclusions

The intersection of the Semantic Web, social
networks, and science holds great promise for
revolutionizing collaboration over the web
Steps to achieving it are mostly social, not
technological
Motivating the use of these technologies among
everyone involved with data
Introducing new ways to collaborate and
encouraging adoption of new techniques

56
Questions