Title: Social Networks, the Semantic Web, and the Future of Online Scientific Collaboration
1. Social Networks, the Semantic Web, and the Future
of Online Scientific Collaboration
- Jennifer Golbeck
- University of Maryland, College Park
2. Overview
- What is the Semantic Web?
- How can it help us do science?
- About Web-based Social Networks
- Combining the Semantic Web, Social Nets, Science,
and Provenance
3. What is the Semantic Web?
- Extension of the current web
- Make information machine processable
- Supported at the W3C
4. Current Web to Semantic Web
- HTML is designed to make documents on the web
easy for humans to read
- Computers have difficulty understanding what is
on the web
- We do OK with keywords for text
- What about videos, pictures, songs, data?
5. Stuff We Want
- Find me the mp3 of a song that was on the
Billboard top 10 that uses a cowbell
- Show me the URLs of the blogs written by people
my friends know
- Get a video where it's snowing
- All of this is hard to do on the web as it stands
6. Making it Easier
- On the Semantic Web, data is represented in a
machine-readable standard format
- Some created automatically, some by humans
- Ontologies add semantics
- Each datum is uniquely identified by a URI
- Distributed data can be aggregated and integrated
into one model
7. Semantic Web Technologies
- URIs
- Ontologies
- Standard Languages
- RDF
- RDFS
- OWL
- SPARQL
8. Example: A Video of It Snowing
- On the Semantic Web, people will annotate their
data, but they won't annotate everything
- If my video is of two government officials
meeting, the weather may be irrelevant to me
- How can the Semantic Web solve this? Do people
have to annotate everything?
9. Linking Distributed Data
[Diagram labels: Video, Location, Date, Camera Info, President, Prime Minister, More data]
10. Data Aggregation
- URIs are unique
- If the same URI is used in two files, it refers
to the same object
- Semantic Web tools (e.g. databases that
understand the semantics of the languages)
build models that merge information about the
same URI
- The model can be queried, filtered, and used
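As a rough illustration of the aggregation step, the sketch below merges two separately published sets of triples that share URIs and then answers the "video where it's snowing" question, even though the video's owner never annotated the weather. It uses plain Python data structures rather than a real RDF store. Every URI, property name (`ex:location`, `ex:condition`, and so on), and date is a hypothetical placeholder, and the one-value-per-property model is a simplification.

```python
from collections import defaultdict

# All URIs and property names below are hypothetical, invented for this sketch.
VIDEO = "http://example.org/video/42"
PLACE = "http://example.org/place/washington-dc"
OBSERVATION = "http://example.org/weather/obs-17"

# File 1: the video owner's annotations (no mention of weather).
video_annotations = [
    (VIDEO, "ex:type", "ex:Video"),
    (VIDEO, "ex:location", PLACE),
    (VIDEO, "ex:date", "2006-02-12"),
]

# File 2: published separately by a weather service.
weather_reports = [
    (OBSERVATION, "ex:type", "ex:WeatherObservation"),
    (OBSERVATION, "ex:location", PLACE),
    (OBSERVATION, "ex:date", "2006-02-12"),
    (OBSERVATION, "ex:condition", "snow"),
]


def merge(*sources):
    """Merge triples from several files into one model keyed by subject URI."""
    model = defaultdict(dict)
    for triples in sources:
        for subject, predicate, obj in triples:
            # Simplification: each property holds a single value.
            model[subject][predicate] = obj
    return model


def videos_where_it_snowed(model):
    """Find videos whose place and date match a snow observation."""
    snowy = {
        (facts.get("ex:location"), facts.get("ex:date"))
        for facts in model.values()
        if facts.get("ex:condition") == "snow"
    }
    return [
        uri
        for uri, facts in model.items()
        if facts.get("ex:type") == "ex:Video"
        and (facts.get("ex:location"), facts.get("ex:date")) in snowy
    ]


model = merge(video_annotations, weather_reports)
print(videos_where_it_snowed(model))
```

The join works only because both files use the same URI for the place. That shared identifier is what lets independently authored data be integrated into one model.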
11. Semantic Web for Science
12. Provenance
- The history of a file or resource
- Files that were used in its creation
- Processes executed to create it
- When, where it was created
- Who created it
13. Why is it important?
- People in the scientific and intelligence
communities are very interested in provenance
- Science: provenance of data can be used to
recreate the data
- Intelligence: provenance of information is
important to determine its reliability
14. Example in Science
- We want to track the workflow that led to a
given scientific image
- What were the files used to create it?
- What is the provenance of those files?
- What process was performed to create the file?
- When was that file created?
- Who executed the processes?
15. Case Study: A Semantic Web Approach to the
Provenance Challenge
16. The Provenance Challenge
- Tracking provenance is a growing topic of
interest to computer scientists
- Applications to grid computing, file systems,
databases, etc.
- The challenge is to build a system that will
track the provenance of files produced from a
workflow
- Workflow: a series of procedures performed to
produce output
- Functional Magnetic Resonance Imaging (fMRI) is
the example in the challenge
18. Challenge
- Represent all data that we consider relevant
about the history of each file
- Answer as many queries as possible
19. Queries
- Find everything that caused a given Graphic to be
as it is.
- Find all invocations of procedure align_warp
using a twelfth order nonlinear 1365 parameter
model that ran on a Monday.
- Find all images where at least one of the input
files had an entry global maximum=4095.
- A user has annotated some images with a
key-value pair center=UChicago. Find the outputs
of align_warp where the inputs are annotated with
center=UChicago.
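To make two of these queries concrete, here is a minimal sketch that stores workflow invocations as plain Python records and answers "everything that caused a given file" by walking dependencies backwards. It also answers the center=UChicago query. This is not the OWL/SPARQL system described in the talk. The file names, the stage names other than align_warp, and the reading of "inputs are annotated" as "at least one input" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    procedure: str
    inputs: tuple
    outputs: tuple


# Hypothetical workflow run; only align_warp is named in the slides.
invocations = [
    Invocation("align_warp", ("anatomy1.img", "reference.img"), ("warp1.warp",)),
    Invocation("reslice", ("warp1.warp",), ("resliced1.img",)),
    Invocation("softmean", ("resliced1.img",), ("atlas.img",)),
    Invocation("convert", ("atlas.img",), ("atlas.gif",)),
]

# Hypothetical user annotations, as key-value pairs per file.
annotations = {"anatomy1.img": {"center": "UChicago"}}


def causes(target, invocations):
    """Return (files, procedures) that the target transitively depends on."""
    produced_by = {out: inv for inv in invocations for out in inv.outputs}
    files, procedures = set(), set()
    stack = [target]
    while stack:
        invocation = produced_by.get(stack.pop())
        if invocation is None:
            continue  # an original input with no recorded history
        procedures.add(invocation.procedure)
        for name in invocation.inputs:
            if name not in files:
                files.add(name)
                stack.append(name)
    return files, procedures


def align_warp_outputs(invocations, annotations, key, value):
    """Outputs of align_warp runs where at least one input has key=value."""
    return [
        out
        for inv in invocations
        if inv.procedure == "align_warp"
        and any(annotations.get(name, {}).get(key) == value for name in inv.inputs)
        for out in inv.outputs
    ]


files, procedures = causes("atlas.gif", invocations)
print(sorted(files), sorted(procedures))
print(align_warp_outputs(invocations, annotations, "center", "UChicago"))
```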
20. Semantic Web Approach
- Each procedure in the workflow is encoded as a
web service
- A workflow is an execution of a series of web
services
- Web services take files as input and output files
to the web
21. Semantic Web Approach
- Ontology represents information about the
execution of services and the dependencies of
files
22. Provenance.owl
23. Answering the Queries
- SPARQL, a W3C standard, is used to formulate
queries
- Reasoning with the semantics of OWL and some rules
24. Results
- We were easily able to answer all nine queries
for the challenge
- The Semantic Web is an easy and natural format for
representing the provenance of scientific
information
- So, with a format for representing data and
metadata, what next?
25. Social Networks: The Phenomenon
26. What are Web-based Social Networks?
- Websites where users set up accounts and list
friends
- Users can browse through friend links to explore
the network
- Some are just for entertainment, others have
business/religious/political purposes
- E.g. MySpace, Friendster, Orkut, LinkedIn
27. Growth of Social Nets
- The big web phenomenon
- About 150 different social networking websites
(that meet the definition that they can be
browsed)
- 275,000,000 user accounts among the networks
- Number of users has doubled in the last 18 months
- Full list at http://trust.mindswap.org
28. Biggest Networks
- MySpace 120,000,000
- Adult Friend Finder 23,000,000
- Friendster 21,000,000
- Tickle 20,000,000
- BlackPlanet 17,000,000
- Hi5 14,000,000
- LiveJournal 10,000,000
- Orkut 8,500,000
- Facebook 8,000,000
- Asia Friend Finder 6,000,000
29. Social Networks on the Semantic Web
- FOAF (Friend Of A Friend)
- A simple ontology for representing information
about people and who they know
- About 20,000,000 social network profiles are
available in FOAF format
- Approximately 60% of all semantic web data is
FOAF data
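With FOAF data available, the earlier request "show me the URLs of the blogs written by people my friends know" becomes a two-hop traversal. The sketch below represents FOAF statements as plain triples. `foaf:knows` and `foaf:weblog` are real FOAF properties, but the people, the `ex:` identifiers, and the blog URLs are made up for illustration, and a real system would query an RDF store rather than a Python list.

```python
# Hypothetical FOAF statements as (subject, predicate, object) triples.
triples = [
    ("ex:me", "foaf:knows", "ex:alice"),
    ("ex:me", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:dave"),
    ("ex:alice", "foaf:weblog", "http://example.org/alice/blog"),
    ("ex:carol", "foaf:weblog", "http://example.org/carol/blog"),
    ("ex:dave", "foaf:weblog", "http://example.org/dave/blog"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def blogs_of_people_my_friends_know(triples, me):
    """Follow foaf:knows twice, then collect foaf:weblog values."""
    blogs = set()
    for friend in objects(triples, me, "foaf:knows"):
        for acquaintance in objects(triples, friend, "foaf:knows"):
            blogs |= objects(triples, acquaintance, "foaf:weblog")
    return blogs


print(sorted(blogs_of_people_my_friends_know(triples, "ex:me")))
```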
30. Structure of Social Nets
- Small World Networks
- AKA six degrees of separation (or six degrees of
Kevin Bacon)
- Term coined by Stanley Milgram, 1967
- Math of Small Worlds
- Average shortest path length grows
logarithmically with the size of the network
- Short average path length
- High clustering coefficient (friends of mine who
are friends with other friends of mine)
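The two small-world measures can be computed directly on a friendship graph. The sketch below is a minimal illustration on a tiny, made-up undirected network. It uses breadth-first search for shortest paths and the usual local clustering coefficient, the fraction of a person's friend pairs who are themselves friends. The graph and function names are assumptions for the example.

```python
from collections import deque

# Hypothetical undirected friendship graph as adjacency sets.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def average_shortest_path(graph):
    """Mean shortest-path length over all ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs


def clustering_coefficient(graph, node):
    """Fraction of pairs of this node's friends who are friends themselves."""
    friends = list(graph[node])
    k = len(friends)
    if k < 2:
        return 0.0
    linked = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if friends[j] in graph[friends[i]]
    )
    return linked / (k * (k - 1) / 2)


print(average_shortest_path(graph))
print({node: clustering_coefficient(graph, node) for node in graph})
```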
31. Trust in Social Networks
- People annotate their relationships with
information about how much they trust their
friends
- Trust can be binary (trust or don't trust) or on
some scale
- This work uses a 1-10 scale where 1 is low trust
and 10 is high trust
- At least 8 social networks have some mechanism
for expressing trust explicitly; several dozen
have implicit trust information
32. Using Trust from Social Networks
- If we have trust available from a social network,
how can we use that?
- Trust in people can influence how likely we are
to
- Give them access to information
- Accept information from them at all
- Consider the quality of information from them
33. Examples
- Only people I trust can see my phone number
- I will only accept emails from people I trust
34. Challenges to Using Trust
- Each person only knows a very small part of
the network
- For people we know, some automatic use of trust
may be helpful, but it does not provide any new
information
- If we have access to the network, we need a way
to compute how much we should trust others
35. Inferring Trust
The Goal: Select two individuals, the source
(node A) and the sink (node C), and recommend to the
source how much to trust the sink.
[Diagram: nodes A, B, and C, with trust values labeled tAB, tBC, and tAC]
36. Caveats and Insights
- Trust is contextual
- Trust is asymmetric
- Trust is not exactly transitive
38. Trust Algorithm
- If the source does not know the sink, the source
asks all of its friends how much to trust the
sink, and computes a trust value by a weighted
average
- Neighbors repeat the process if they do not have
a direct rating for the sink
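The recursive procedure just described can be sketched in a few lines. The version below is one plausible reading of the slide, not necessarily the exact algorithm used in this work. It weights each neighbor's answer by the source's trust in that neighbor, and it adds a visited set and a depth limit (both assumptions) so the recursion terminates on cyclic networks. The example ratings are made up.

```python
def infer_trust(ratings, source, sink, visited=None, max_depth=6):
    """Recommend how much source should trust sink, or None if no path exists.

    ratings[a][b] is a's direct trust in b on the 1-10 scale.
    """
    direct = ratings.get(source, {})
    if sink in direct:
        return direct[sink]  # a direct rating always wins
    if max_depth == 0:
        return None
    visited = (visited or frozenset()) | {source}

    weighted_sum = 0.0
    weight_total = 0.0
    for neighbor, trust_in_neighbor in direct.items():
        if neighbor in visited:
            continue  # avoid walking in circles
        estimate = infer_trust(ratings, neighbor, sink, visited, max_depth - 1)
        if estimate is not None:
            weighted_sum += trust_in_neighbor * estimate
            weight_total += trust_in_neighbor
    return weighted_sum / weight_total if weight_total else None


# Hypothetical network: A trusts B highly and D less; both B and D rate C.
ratings = {
    "A": {"B": 8, "D": 4},
    "B": {"C": 9},
    "D": {"C": 3},
}
print(infer_trust(ratings, "A", "C"))
```

In this example the recommendation for A about C is (8 × 9 + 4 × 3) / (8 + 4) = 7. B's opinion counts twice as much as D's because A trusts B twice as much.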
39. How Well Does It Work?
- Pretty well
- On networks where we have tested it, trust is
computed accurately within about 10%
- Test this by taking a known trust value, deleting
the edge between those people, and comparing the
known value with the value we compute
- 10% is very good for social systems with lots of
noise
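The delete-and-compare test can be written as a small leave-one-out loop. The sketch below is illustrative only. `one_step_estimate` is a deliberately simplified stand-in for the real inference algorithm (it looks only at neighbors' direct ratings), and the ratings are invented, so the resulting number says nothing about the accuracy reported in the talk.

```python
import copy


def one_step_estimate(ratings, source, sink):
    """Simplified stand-in: weighted average of neighbors' direct ratings."""
    numerator = denominator = 0.0
    for neighbor, weight in ratings.get(source, {}).items():
        if sink in ratings.get(neighbor, {}):
            numerator += weight * ratings[neighbor][sink]
            denominator += weight
    return numerator / denominator if denominator else None


def leave_one_out_error(ratings, estimate=one_step_estimate):
    """Mean absolute difference between known and re-computed trust values."""
    errors = []
    for source, outgoing in ratings.items():
        for sink, known in outgoing.items():
            held_out = copy.deepcopy(ratings)
            del held_out[source][sink]  # hide the known edge
            guess = estimate(held_out, source, sink)
            if guess is not None:  # skip pairs with no remaining path
                errors.append(abs(guess - known))
    return sum(errors) / len(errors) if errors else None


# Hypothetical ratings on the 1-10 scale.
ratings = {
    "A": {"B": 8, "C": 7},
    "B": {"C": 9},
    "C": {},
}
print(leave_one_out_error(ratings))
```

Here only the A-to-C edge can be re-estimated once hidden (via B, giving 9 against a known 7), so the mean absolute error is 2 points on the 10-point scale.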
40. Applications of Trust
- With direct knowledge or a recommendation about
how much to trust people, this value can be used
as a filter in many applications
- Since social networks are so prominent on the
web, they are a public, accessible data source for
determining the quality of annotations and
information
41. Ordering
- Use trust to determine the order in which
information is presented
- Aggregating
- If data is aggregated, we can use trust to
determine how much weight is given to different
sources
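Both uses are short to express once each source has a trust value. The sketch below is a minimal illustration with made-up sources, values, and function names. It sorts items so the most trusted source comes first, and it combines conflicting numeric reports with a trust-weighted mean. Treating unknown sources as trust 0 is an assumption.

```python
def order_by_trust(items, trust):
    """Sort (source, content) pairs so the most trusted sources come first."""
    return sorted(items, key=lambda item: trust.get(item[0], 0), reverse=True)


def trust_weighted_mean(reports, trust):
    """Combine numeric reports, weighting each by trust in its source."""
    weight_total = sum(trust.get(source, 0) for source in reports)
    if weight_total == 0:
        return None  # nobody reporting is trusted at all
    weighted_sum = sum(trust.get(source, 0) * value for source, value in reports.items())
    return weighted_sum / weight_total


# Hypothetical trust values (1-10 scale) and conflicting measurements.
trust = {"alice": 9, "bob": 3}
items = [("bob", "bob's annotation"), ("alice", "alice's annotation")]
reports = {"alice": 20.0, "bob": 40.0}

print(order_by_trust(items, trust))
print(trust_weighted_mean(reports, trust))
```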
42. Social Networks for Science
- Data Provenance + Social Networks = Social
Policies
43. Policies on the Web
- Policies on the web are used to filter and
restrict access to information for
- Security
- Privacy
- Trust
- Information filtering
- Accountability
- Important because of the open nature of the web
44. Applications of the Policy-Aware Web
- Website access
- Network routing
- Storage management
- Grid computing
- Pervasive computing
- Information filtering
- Digital rights management
- Collaboration
45. Applications and Industrial Interest
- Internet Content Rating Agency
- Using policies and rules to develop content
ratings for websites
- Efforts underway at
- Microsoft, IBM, Sun, BEA, Oracle
- Heavily discussed at W3C Workshop on Constraints
and Capabilities for Web Services
- http://www.w3.org/2004/09/ws-cc-program.html
46. Example Policies
- Only allow members of my research group to access
this data set
- Reject messages from anyone whose address is not
on my list of verified senders
47. Policies and Trust
- Only users whose inferred trust rating is a 9 or
10 may run processes on this shared computing
resource
- Preprints of this paper are accessible
only to trusted Fermilab personnel, members of
the research team at other institutions, or the
NSF advisory board
- Include information in my knowledge base only if
it, and all the files and processes in its
provenance, were created or executed by people I
trust at a level 7 or above
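The first and third policies combine trust values with provenance and can be sketched as simple checks. The code below is an illustration under stated assumptions, not a policy language. Provenance is a hypothetical mapping from each item to the files and processes it was derived from, `creator` maps each item to the person who created or executed it, and items with an unknown creator are rejected. All names and ratings are made up.

```python
def may_run_processes(trust_rating, minimum=9):
    """Policy 1: only users rated 9 or 10 may use the shared resource."""
    return trust_rating is not None and trust_rating >= minimum


def provenance_closure(item, derived_from):
    """The item plus every file and process it transitively depends on."""
    seen, stack = set(), [item]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(derived_from.get(current, ()))
    return seen


def admissible(item, derived_from, creator, trust, threshold=7):
    """Policy 3: the item and its whole provenance come from trusted people."""
    return all(
        trust.get(creator.get(part), 0) >= threshold
        for part in provenance_closure(item, derived_from)
    )


# Hypothetical provenance, creators, and my trust in each person.
derived_from = {
    "atlas.gif": ["convert#1", "atlas.img"],
    "atlas.img": ["scan.img"],
}
creator = {
    "atlas.gif": "alice",
    "convert#1": "alice",
    "atlas.img": "bob",
    "scan.img": "carol",
}
trust = {"alice": 9, "bob": 8, "carol": 5}

print(may_run_processes(trust["alice"]))
print(admissible("atlas.gif", derived_from, creator, trust))
```

In this example the image is rejected even though its immediate author is highly trusted, because one upstream scan was created by someone rated below 7.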
48. Extending Trust to Science
- In collaborative scientific environments, some
data and resources require strict access control
(username / password)
- For others, this level of control is unnecessary
and cumbersome
49. Trust for Access Control
- With a scientific social network, trust can be
used to restrict access to
- Data
- Computing resources
- and
- Limit what data is integrated into a knowledge
base
- Weight conflicting information from different
sources according to the trustworthiness of the
source
50. Leading to Collaboration
- The Semantic Web with social networks provides a
platform for
- Publishing data
- Publishing metadata (so experiments can be
verified)
- Limiting/granting access to sensitive data
- Gathering data from other sources
- Filtering data from the web
51. What do we need to do?
- Easy Steps
- Building ontologies for representing scientific
data / metadata
- Publishing data on the web
52. What do we need to do?
- Hard Steps (because people don't want to do them)
- Developing web policies for limiting access to
non-critical data
- Webmasters can do this, with training and
collaboration with data owners
- Motivating scientists into social networks
53. Forcing the Anti-Social Into Social Nets
- Can't expect scientists to use a Facebook/MySpace
style social network
- (and we probably don't want to see that anyway)
- Integrate social networking into other activities
- E.g. email
54. The Payoff
- A whole new way of working over the web
- Multiple levels of collaboration
- New ways of sharing data and working together
55. Conclusions
- The intersection of the Semantic Web, social
networks, and science holds great promise for
revolutionizing collaboration over the web
- Steps to achieving it are mostly social, not
technological
- Motivating the use of these technologies among
everyone involved with data
- Introducing new ways to collaborate and
encouraging adoption of new techniques
56. Questions
- Jennifer Golbeck
- Golbeck_at_cs.umd.edu
- http://trust.mindswap.org