Title: Part 4: Semantics and Metadata. Semantic publication and discovery, Provenance metadata, Semantic Web and the Grid
1 Part 4: Semantics and Metadata
- Semantic publication and discovery
- Provenance metadata
- Semantic Web and the Grid
- Professor Carole Goble
- University of Manchester
- http://www.mygrid.org.uk
2 Virtual organisations and (Re)use
Service Platform Administrators
Bioinformaticians
Service Providers
Annotation providers
Biologists
Tool middleware developers
3 Finding and selecting services
- Activation energy gradient
- Unregistered services
- Scavenging
- URLs and Soaplab endpoints
- Introspection
- Registered services
- Word-based searching
- Semantic annotation for later discovery and (re)use by friends and strangers in your VO (Part 3)
- Drag and drop services onto Taverna workbench
4 Registry View Service
- Registry
- Third party registries
- Third party services
- Third party annotation (RDF)
- Views over federated registries
- UDDI interfaces extended with RDF
- Federated views
- Updated via Notification Service
- Personalized based on Annotation
- Authorisation and IPR
5 Semantic discovery
- User chooses services
- A common ontology is used to annotate and query any myGrid object, including services.
- Discover workflows and services described in the registry via Taverna.
- Look for all workflows that accept an input of semantic type nucleotide sequence.
- Aim to have semantic discovery over a public view on the Web.
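A minimal sketch of the kind of query on this slide, assuming a toy ontology and a toy registry. The concept names, workflow names and the helpers `is_a` and `discover` are hypothetical and are not taken from myGrid. It also assumes that a query for a concept should return entries annotated with that concept or with any of its subconcepts.

```python
# Toy ontology: child concept -> parent concept (illustrative names only).
SUBCLASS_OF = {
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "dna_sequence": "nucleotide_sequence",
    "rna_sequence": "nucleotide_sequence",
}

# Hypothetical registry: workflow name -> semantic type annotated on its input.
REGISTRY = {
    "translate_workflow": "dna_sequence",
    "blast_nucleotide_workflow": "nucleotide_sequence",
    "protein_domain_workflow": "protein_sequence",
}


def is_a(concept, ancestor):
    """Return True if `concept` is `ancestor` or one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)  # walk one step up the hierarchy
    return False


def discover(query_concept):
    """Return workflows whose input annotation is subsumed by `query_concept`."""
    return sorted(
        name for name, annotated in REGISTRY.items()
        if is_a(annotated, query_concept)
    )


print(discover("nucleotide_sequence"))
# prints ['blast_nucleotide_workflow', 'translate_workflow']
```

A plain word-based search for "nucleotide sequence" would miss the workflow annotated with the more specific DNA concept. The walk up the hierarchy is what makes the discovery semantic.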
6 Workflow and service annotation
- Adding structured metadata to a workflow registration enables others to discover and reuse it more effectively, e.g. what semantic type of input it accepts.
7 Can you guess what it is yet?
8 Service Registration
http://pedro.man.ac.uk
9 Semantic Discovery
- Drag a workflow entry into the explorer pane and the workflow loads.
- Drag a service/workflow to the scavenger window for inclusion in the workflow.
10 myGrid and Semantics
- Workflow and service discovery
- Prior to and during enactment
- Semantic registration
- Workflow assembly
- Semantic service typing of inputs and outputs
- Provenance of workflows and other entities
- Experimental metadata glue
- Use of RDF, RDFS, DAML+OIL/OWL
- Instance store, ontology server, reasoner
- Materialised vs at point of delivery reasoning.
- myGrid Information Model
11 Annotation
Service Providers
Ontologists
Others
Ontology Store
Description extraction
WSDL
Interface Description
Vocabulary
Soaplab
Pedro Annotation tool
Annotation providers
Annotation/ description
Taverna Workbench
Registry (Personalised View)
Registry
Registry plug-in
Registry
12 Annotation
Ontologists
Ontology Store
Vocabulary
Haystack Provenance Browser
Pedro Annotation tool
Annotation providers
Annotation/ description
Scientists
Taverna Workbench
mIR
Store plug-in
13 Service Providers
Ontology Store
Ontologists
Others
Vocabulary
WSDL
Feta Semantic Discovery
Soaplab
Bioinformaticians
Registry
Taverna Workbench
Registry (Personalised View)
Registry
Registry
Workflow Execution
FreeFluo WfEE
invoking
mIR
Store data metadata
14 Layered Semantics
- Domain Semantics layered on top of a domain-neutral but scientific data model
- Reducing the activation energy, lowering barriers to entry.
Domain Semantics
Ontologies
Data Metadata
Workflow metadata
IMv2
Experiment Semantics
Format: XSD types, MIME types
Service Metadata
Provenance metadata
Syntax
Workflow OGSA-DQP
15 Model of services
Operation: name, description, task, method, resource, application
Service: name, description, author, organisation
Parameter: name, description, semantic type, format, transport type, collection type, collection format
hasInput
hasOutput
subclass
subclass
WSDL based Web service
WSDL based operation
Soaplab service
bioMoby service
workflow
Local Java code
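The service model on this slide can be sketched as plain data classes. This is an illustration under stated assumptions rather than the myGrid schema. The attribute names follow the slide, but the Python class layout, the choice of `SoaplabService` as an example subclass, the helper `operations_accepting` and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    description: str = ""
    semantic_type: str = ""
    format: str = ""
    transport_type: str = ""
    collection_type: str = ""
    collection_format: str = ""


@dataclass
class Operation:
    name: str
    description: str = ""
    task: str = ""
    method: str = ""
    resource: str = ""
    application: str = ""
    has_input: List[Parameter] = field(default_factory=list)   # hasInput
    has_output: List[Parameter] = field(default_factory=list)  # hasOutput


@dataclass
class Service:
    name: str
    description: str = ""
    author: str = ""
    organisation: str = ""
    operations: List[Operation] = field(default_factory=list)


@dataclass
class SoaplabService(Service):
    """Example of one service kind the slide lists under 'subclass'."""


def operations_accepting(service, semantic_type):
    """Names of operations with an input of the given semantic type."""
    return [
        op.name for op in service.operations
        if any(p.semantic_type == semantic_type for p in op.has_input)
    ]


# Hypothetical example entry.
blast = SoaplabService(
    name="blast_demo",
    operations=[
        Operation(
            name="run",
            task="similarity_search",
            has_input=[Parameter("query", semantic_type="nucleotide_sequence")],
            has_output=[Parameter("report", semantic_type="blast_report")],
        )
    ],
)

print(operations_accepting(blast, "nucleotide_sequence"))  # prints ['run']
```

Keeping the semantic type on the parameter rather than on the service means the same description works for WSDL operations, Soaplab services, workflows and local code alike.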
16 Tiered specifications
Classes of services: Domain semantic, Unexecutable, Potentials
Instances of services: Business operational, Executable, Actuals
17 Matrix of metadata in workflow lifecycle
18 Stratified metadata
- Service Type and Class (OWL)
19 Service and Workflow registration
- Description scheme
- RDFS, DAML+OIL / OWL ontologies of services and biology
- Based on DAML-S
- Reasoning over OWL descriptions
- Query over RDF
- Aim to have semantic discovery over a public view on the Web.
Workflow registration allows peer review and publication of e-Science methods.
20 Service Ontology Suite
parameters: input, output, precondition, effect; performs_task; uses-resource; is_function_of
Upper level ontology
Inspired by DAML-S
Publishing ontology
Informatics ontology
Molecular biology ontology
Organisation ontology
Task ontology
Bioinformatics ontology
Web service ontology
Current work: Joint development on an Open Biological Ontologies BioService Ontology.
http://obo.sourceforge.net/
21 Reflections
- Adverts for services and workflows turn out to be tricky
- Describing different executable objects
- Workflows and Services
- Stratification of metadata
- Classes and Instances of services and workflows
- Service execution
- Complex state based invocation models
- Parametric polymorphism of services
- Executable process models vs discovery process models
- Multi-dimensions of service composition.
22 Human vs machine views
Human
Machine
Service User
Service provider
Weak semantic descriptions; Rewriting views
UDDI style advertisements
Human
Syntactic descriptions; Interface descriptions; Invocation descriptions; Semantic mining
Elaborate Semantic descriptions; Simplification views
Machine
23 Reflections
- Multiple descriptions, multiple interfaces
- Users' needs vs machine needs
- The dimensions of Service Class substitution
- Biologists choose experimentally meaningful services and do not want semantically similar substitutions, only substituting one instance for another
- Experimentally neutral glue services that can be substituted are comparatively few
- If users are choosing services you don't need many kinds of metadata to eliminate 90% of options.
24 Reuse and Repurposing
- Describing for reuse is challenging
- Reuse depends on semantic descriptions and these are costly to produce
- Describing for someone else's benefit
- Reuse by multiple stakeholders
- Licensing workflows for reuse.
- Authorisation models
- But reuse does happen!
- Metadata pays off but it needs a network effect and there is a cost.
25 So far, Using Concepts
- Controlled vocabulary for advertisements for workflows and services
- Indexes into registries and mIR
- Semantic discovery of services and workflows
- Semantic discovery of repository entries
- Type management for composition
- Semantic workflow construction guidance and validation
- Navigation paths between data and knowledge holdings
- Semantic glue between repository entries
- Semantic annotation and linking of workflow provenance logs
26 Part 4: Semantics and Metadata
- Semantic publication and discovery
- Provenance metadata
- Semantic Web and the Grid
27 Provenance
- Experiments being performed repeatedly, at different sites, at different times, by different users or groups
A large repository of records about experiments!!
- verification of data
- recipes for experiment designs
- explanation for the impact of changes
- ownership
- performance of services
- data quality
Scientists
In silico experiments
28 Provenance Web
29 Representing links
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
- Identify each resource
- Life science identifier: URI with associated data and metadata retrieval protocols.
- Understanding that underlying data will not change
30 Representing links II
http://www.mygrid.org.uk/ontology#derived_from
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
- Identify link type
- Again use URI
- Allows us to use RDF infrastructure
- Repositories
- Ontologies
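A minimal sketch of how one such link could be held as an RDF-style triple, using only the Python standard library. Several things here are assumptions: the colon-separated form of the two LSIDs and the `#` in the link-type URI are reconstructed from the flattened slide text, the direction of the link (which item is derived from which) is a guess, and the helpers `parse_lsid` and `to_ntriples` are hypothetical.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Link type identified by a URI (reconstructed spelling, see the note above).
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"

# Assumed direction: the first data item is derived from the second.
link = Triple(
    "urn:lsid:taverna.sf.net:datathing:45fg6",
    DERIVED_FROM,
    "urn:lsid:taverna.sf.net:datathing:23ty3",
)


def parse_lsid(lsid):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: %r" % lsid)
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) > 5 else None,
    }


def to_ntriples(triple):
    """Serialise a triple of URIs as one N-Triples line."""
    return "<%s> <%s> <%s> ." % triple


print(parse_lsid(link.subject)["authority"])  # prints taverna.sf.net
print(to_ntriples(link))
```

Because both the resources and the link type are URIs, the line printed by `to_ntriples` can be loaded into any RDF repository without a bespoke schema.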
31 Link v Data Representation
- Data management questions refer to relationships rather than internal content
- What are the origins of this data?
- Which service produced this data?
- Which data is this derived from?
- Who was this data produced for?
- What is this data telling me?
- Data analysis questions delegated to external services.
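The data management questions above can all be answered by following links, without opening the data items themselves. The sketch below illustrates this over a small in-memory list of triples. The identifiers, predicate names and the helpers `objects` and `origins` are hypothetical and are not the myGrid vocabulary.

```python
# Hypothetical provenance links: (subject, predicate, object).
TRIPLES = [
    ("data:blast_report", "derived_from", "data:query_sequence"),
    ("data:alignment", "derived_from", "data:blast_report"),
    ("data:blast_report", "produced_by", "service:blast"),
    ("data:alignment", "produced_for", "person:alice"),
]


def objects(subject, predicate):
    """All objects linked from `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def origins(item):
    """All data items that `item` is directly or indirectly derived from."""
    found, stack = [], [item]
    while stack:
        current = stack.pop()
        for parent in objects(current, "derived_from"):
            if parent not in found:  # guards against cycles and repeats
                found.append(parent)
                stack.append(parent)
    return found


# Which data is this derived from? (direct link)
print(objects("data:alignment", "derived_from"))   # prints ['data:blast_report']
# What are the origins of this data? (transitive)
print(origins("data:alignment"))
# prints ['data:blast_report', 'data:query_sequence']
# Which service produced this data?
print(objects("data:blast_report", "produced_by"))  # prints ['service:blast']
# Who was this data produced for?
print(objects("data:alignment", "produced_for"))    # prints ['person:alice']
```

The question "What is this data telling me?" has no counterpart here, which matches the slide: it needs the content of the data and is left to external analysis services.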
32 Provenance Pyramid
Process Level
33 Organisation level provenance
Process level provenance
Service
Project
runBy, e.g. BLAST @ NCBI
Experiment design
Process
Workflow design
componentProcess, e.g. web service invocation of BLAST @ NCBI
Event
partOf
instanceOf
componentEvent, e.g. completion of a web service invocation at 12.04pm
Workflow run
Data/ knowledge level provenance
knowledge statements, e.g. similar protein sequence to
run for
User can add templates to each workflow process
to determine links between data items.
Data item
Person
Organisation
Data item
Data item
data derivation, e.g. output data derived from input data
34 Provenance tracking
- Automated generation of this web of links
- Workflow enactor generates
- LSIDs
- Data derivation links
- Knowledge links
- Process links
- Organisation links
Relationship BLAST report has with other items in
the repository
Other classes of information related to BLAST
report
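A rough sketch of the idea that the enactor itself generates identifiers and links as a workflow runs, so that nobody has to record provenance by hand. This is not the FreeFluo or Taverna implementation. The `Enactor` class, its methods, the predicate names, the `example.org` authority and the stand-in "service" are all hypothetical.

```python
import itertools


class Enactor:
    """Toy workflow enactor that records provenance links as it invokes steps."""

    def __init__(self, authority="example.org"):
        self.authority = authority
        self._ids = itertools.count(1)
        self.values = {}   # identifier -> data value
        self.triples = []  # (subject, predicate, object) links

    def _mint(self, namespace):
        """Mint a fresh LSID-style identifier (illustrative form only)."""
        return "urn:lsid:%s:%s:%d" % (self.authority, namespace, next(self._ids))

    def store(self, value):
        """Store a data item and return its new identifier."""
        data_id = self._mint("datathing")
        self.values[data_id] = value
        return data_id

    def invoke(self, service_name, func, input_ids):
        """Run one step, then record process and data derivation links."""
        event_id = self._mint("invocation")
        result = func(*[self.values[i] for i in input_ids])
        output_id = self.store(result)
        self.triples.append((event_id, "runBy", service_name))       # process link
        self.triples.append((output_id, "producedBy", event_id))     # process link
        for input_id in input_ids:
            self.triples.append((output_id, "derived_from", input_id))  # data link
        return output_id


enactor = Enactor()
seq_id = enactor.store("ACGT")
# A stand-in "service": reverse the sequence string.
rev_id = enactor.invoke("reverse_service", lambda s: s[::-1], [seq_id])

print(enactor.values[rev_id])  # prints TGCA
for triple in enactor.triples:
    print(triple)
```

Knowledge links and organisation links from the slide are left out. In this sketch they would be further triples appended at the same point, driven by per-process templates and by who launched the run.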
35 Haystack (IBM/MIT)
GenBank record
Portion of the Web of provenance
Managing collection of sequences for review
37 Reflections
- Visualisation of results usually domain specific
- Provenance browsing and querying needs to fit with that visualisation
- Generic graphical presentation limited to small, low complexity result sets
- Layered provenance for different purposes and different stakeholders
- Detailed process for debugging and usage statistics for QoS
- Data and Knowledge for the Scientist
- Migration with data objects
- Versioning
- Using provenance to its maximum potential
38 Map of Context
Literature relevant to provenance study or data
in this workflow
Provenance record of a workflow run
Interlinking graph of the workflow that generates
the provenance logs
Web page of people who have interests related to those of the owner of the workflow
Experiment Notes
39 Provenance metadata
- Outside objects
- RDF store
- Within objects
- LSID metadata.
40 Linked Provenance Resources
The subsumed concepts
Link to the log annotated with more general
concept
The subsuming concepts
Link to the log annotated with more specific
concept
41 Generating Links
The concept
The generated Link to related provenance
document
The name of the data
42 Semantics
Ontology-aided workflow construction
- RDF-based service and data registries
- RDF-based metadata for ALL experimental components
- RDF-based provenance graphs
- OWL based controlled vocabularies for database content
- OWL based integration of experiment entities
RDF-based semantic mark up of results, logs, notes, data entries
43 Standards
- By tapping into (de facto) standards (LSID, RDF, WS-I) and communities we can leverage others' results and tools
- Haystack, Pedro, Jena, CHEF/Sakai.
- The Grid standards are confusing and volatile
- The choice of vanilla Web Services was good.
- We didn't jump to OGSI. We won't jump to WSRF until it's necessary.
- And workflow standards have been untimely.
44 Role of Ontologies
Service matching and provisioning
Composing and validating workflows and service compositions and negotiations
Service and resource registration and discovery
Help
Knowledge-based guidance and recommendation
Schema mediation
45 Part 4: Semantics and Metadata
- Semantic publication and discovery
- Provenance metadata
- Semantic Web and the Grid
46 A pioneer of the
The Semantic Grid is an extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, better enabling computers and people to work in cooperation
Semantics in and on the Grid
47 The semantics of knowledge
- Semantic Grids
- Grids and Grid middleware that make use of semantics for their installation, deployment, running etc.
- I.e. Semantics IN the Grid FOR the Grid.
- Knowledge Grids
- A virtual knowledge base derived by using the Grid resources, in the same spirit as a data grid is a virtual data resource and a compute grid a virtual computer. Knowledge Grids include services for knowledge mining.
- I.e. Semantics ON the Grid arising from the USE of the Grid.
48 Knowledge Stakeholders
Knowledge for the Grid Application
Semantics for the Grid
Sources of Knowledge
49
- The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. It is based on the idea of having data on the Web defined and linked such that it can be used for more effective discovery, automation, integration, and reuse across various applications.
- Hendler, J., Berners-Lee, T., and Miller, E. Integrating Applications on the Semantic Web, 2002, http://www.w3.org/2002/07/swint.
50 Big Vision
- The Web today is
- A hypermedia digital library
- Collection of linked web pages
- Ubiquitous interface to applications
- Amazon.com
- A platform for multimedia
- BBC Radio 4 in my room!
- A naming scheme
- Unique identity for resources
A place where people do the work, filtering,
linking and interpreting. Computers do the
presentation.
Why not make the computers do the work?
From machine readable resources for humans to
computable resources for machines
51
- Expose the meaning of resources by assertions in a common data model
- Publish and share consensually agreed ontologies so we can share the metadata and add in background knowledge
- Then we can query, filter, integrate and aggregate the metadata
- and reason over it to infer more metadata using rules
- and attribute trust to the metadata.
hasvenue
http://www.marriott.com/epp/...
http://www.amia.org/meetings/...
haslocation
organisedby
event
conference
hotel
period
haslocation
Washington
http://www.amia.org/about/
dates
city
locatedin
locatedin
USA
country
52 Infrastructure enablers for e-Research
Grid Computing
- On demand transparently constructed multi-organisational federations of distributed services
- Distributed computing middleware
- Computational Integration
- Sharing Resources
Semantic Web
- An automatically processable, machine understandable web
- Distributed knowledge and information management
- Information integration
- Sharing information
53 Semantic Web layers
Trust
Rules
Agents
Ontologies
Metadata Annotation
Search engines and filters
Web
Applications
Deep web
54 Semantic Grid layers
Trust
Rules
Agents
Ontologies
Metadata Annotation
Search engines and filters
Grid Services
Applications
Grid State
55 Languages
56 RDF in a nutshell
- Resource Description Framework
- W3C candidate recommendation (http://www.w3.org/RDF)
- Graphical formalism (+ XML syntax + semantics)
- for representing metadata
- for describing the semantics of information in a machine-accessible way
- RDFS extends RDF with schema vocabulary, e.g.
- Class, Property
- type, subClassOf, subPropertyOf
- range, domain
- Statements are <subject, predicate, object> triples
- <Ian, hasColleague, Uli>
- Statements describe properties of resources
- A resource is any object that can be pointed to by a URI
- Properties themselves are also resources (URIs)
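To make the triple model and the RDFS schema vocabulary concrete, here is a small sketch that stores statements as tuples and applies two standard RDFS entailment rules: subClassOf is transitive, and an instance of a subclass is an instance of its superclass. The triple <Ian, hasColleague, Uli> is from the slide. The classes `Lecturer`, `Academic` and `Person` and the function `rdfs_closure` are hypothetical, and short prefixed names stand in for full URIs.

```python
def rdfs_closure(triples):
    """Apply two RDFS rules repeatedly until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p != "rdfs:subClassOf":
                continue
            for s2, p2, o2 in inferred:
                # Rule rdfs11: subClassOf is transitive.
                if p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdfs:subClassOf", o2))
                # Rule rdfs9: an instance of the subclass is an instance
                # of the superclass.
                if p2 == "rdf:type" and o2 == s:
                    new.add((s2, "rdf:type", o))
        changed = not new <= inferred
        inferred |= new
    return inferred


statements = {
    ("Ian", "hasColleague", "Uli"),              # from the slide
    ("Ian", "rdf:type", "Lecturer"),             # hypothetical
    ("Lecturer", "rdfs:subClassOf", "Academic"),  # hypothetical
    ("Academic", "rdfs:subClassOf", "Person"),    # hypothetical
}

closure = rdfs_closure(statements)
print(("Ian", "rdf:type", "Person") in closure)  # prints True
print(sorted(closure - statements))
```

Only the asserted triples need to be published. The extra `rdf:type` and `rdfs:subClassOf` statements are derived by whoever consumes the metadata.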
57 RDF in a nutshell
- Common model for metadata
- A graph of triples
- Query over and link together
- RDQL, repositories, integration tools, presentation tools
- Jena, Haystack
http://www.w3.org/RDF/
58 Connected by concepts
Tim Berners-Lee, 2003
http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html
59 W3C Web Ontology Language OWL
- The Ontology Language du jour
- Continuum of expressivity
- Concepts, roles, individuals, axioms
- From simple frames to description logics
- Sound and complete formal semantics
- Supports reasoning to infer classification
- Based on the SHIQ description logic
- Eas(ier) to extend and evolve and merge ontologies
- Known in the Bioinformatics world e.g. OBO
- Layered on top of RDF
- Tools, tools, tools.
http://www.w3.org/TR/2004/REC-owl-features-20040210/
60 Coupling Semantic Web and e-Science/Grid
- Expose the meaning of Grid services, resources and entities by assertions in a common data model: RDF
- Publish and share consensually agreed ontologies so we can share the metadata and add in background knowledge: RDF(S), OWL
- Then we can query, filter, integrate and aggregate the metadata: RDQL
- and reason over it to infer more metadata using rules: DL Reasoning, SWRL
- and attribute trust to the metadata.
62 Publications
- P Lord, C Wroe, R Stevens, CA Goble, S Miles, L Moreau, K Decker, T Payne, J Papay. Semantic and Personalised Service Discovery. In Proceedings IEEE/WIC International Conference on Web Intelligence / Intelligent Agent Technology Workshop on "Knowledge Grid and Grid Intelligence", October 13, 2003, Halifax, Canada.
- J Zhao, CA Goble, M Greenwood, C Wroe, R Stevens. Annotating, linking and browsing provenance logs for e-Science. In 1st Semantic Web Conference (ISWC2003) Workshop on Retrieval of Scientific Data, Florida, USA, October 2003.
- C Wroe, RD Stevens, CA Goble, A Roberts, M Greenwood. A suite of DAML+OIL ontologies to describe bioinformatics web services and data. International Journal of Cooperative Information Systems, Special issue on Bioinformatics and Biological Data Management, 12(2):197-224, 2003.
- C Wroe, CA Goble, M Greenwood, P Lord, S Miles, L Moreau, J Papay, T Payne. Experiment automation using semantic data on a bioinformatics Grid. IEEE Intelligent Systems, Jan/Feb 2004.
- J Zhao, C Wroe, CA Goble, R Stevens, D Quan, M Greenwood. Using Semantic Web Technologies for Representing e-Science Provenance. In Proc 3rd International Semantic Web Conference ISWC2004, Hiroshima, Japan, 9-11 Nov 2004.
- C Wroe, P Lord, S Miles, J Papay, L Moreau, C Goble. Recycling Services and Workflows through Discovery and Reuse. To appear in Proceedings UK e-Science All Hands Meeting, Nottingham, UK, 1-3 September, 2004.
- P Lord, S Bechhofer, M Wilkinson, G Schiltz, D Gessler, C Goble, L Stein, D Hull. Applying semantic web services to bioinformatics: Experiences gained, lessons learnt. In Proc 3rd International Semantic Web Conference ISWC2004, Hiroshima, Japan, 9-11 Nov 2004.
- M Szomszor, L Moreau. Recording and Reasoning Over Data Provenance in Web and Grid Services. In International Conference on Ontologies, Databases and Applications of Semantics (ODBASE'03), volume 2888 of Lecture Notes in Computer Science, pages 603-620, Catania, Sicily, Italy, 3-7 November 2003.