Part 4: Semantics and Metadata: Semantic publication and discovery; Provenance metadata; Semantic Web and the Grid

1
Part 4: Semantics and Metadata
  • Semantic publication and discovery
  • Provenance metadata
  • Semantic Web and the Grid
  • Professor Carole Goble
  • University of Manchester
  • http://www.mygrid.org.uk

2
Virtual organisations and (Re)use
Service Platform Administrators
Bioinformaticians
Service Providers
Annotation providers
Biologists
Tool middleware developers
3
Finding and selecting services
  • Activation energy gradient
  • Unregistered services
  • Scavenging
  • URLs and Soaplab endpoints
  • Introspection
  • Registered services
  • Word-based searching
  • Semantic annotation for later discovery and
    (re)use by friends and strangers in your VO (Part
    3)
  • Drag and drop services onto Taverna workbench

4
Registry View Service
  • Registry
  • Third party registries
  • Third party services
  • Third party annotation (RDF)
  • Views over federated registries
  • UDDI interfaces extended with RDF
  • Federated views
  • Updated via Notification Service
  • Personalized based on Annotation
  • Authorisation and IPR
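As a rough illustration of "views over federated registries" personalised by annotation, the merge can be sketched in Python. This is a minimal sketch under stated assumptions, not the myGrid registry: each registry is assumed to be a plain list of entries keyed by service URI, third-party annotations are assumed to arrive as (property, value) pairs, and `Entry`, `federated_view` and the example URIs are all hypothetical. UDDI interfaces, notification and authorisation are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Annotation = Tuple[str, str]  # hypothetical (property, value) pair, e.g. ("task", "retrieval")

@dataclass
class Entry:
    uri: str
    name: str
    annotations: Set[Annotation] = field(default_factory=set)

def federated_view(registries: List[List[Entry]],
                   third_party: Dict[str, Set[Annotation]],
                   interests: Optional[Set[Annotation]] = None) -> List[Entry]:
    """Merge several registries into one view, layer on third-party
    annotations, and optionally keep only entries matching a user's interests."""
    merged: Dict[str, Entry] = {}
    for registry in registries:
        for entry in registry:
            # One view entry per service URI; descriptions from every
            # registry that lists the service are unioned.
            view = merged.setdefault(entry.uri, Entry(entry.uri, entry.name))
            view.annotations |= entry.annotations
    for uri, extra in third_party.items():
        if uri in merged:  # annotations about services not in any registry are ignored
            merged[uri].annotations |= extra
    entries = sorted(merged.values(), key=lambda e: e.uri)
    if interests:
        # Personalisation: keep entries sharing at least one annotation with the user.
        entries = [e for e in entries if e.annotations & interests]
    return entries

r1 = [Entry("urn:example:blast", "BLAST", {("task", "similarity search")})]
r2 = [Entry("urn:example:blast", "BLAST", {("input", "protein sequence")}),
      Entry("urn:example:fetch", "Fetch entry", {("task", "retrieval")})]
extra = {"urn:example:fetch": {("input", "accession number")}}
print([e.uri for e in federated_view([r1, r2], extra, {("task", "retrieval")})])
# ['urn:example:fetch']
```

Keying on the service URI is what lets a service published in two registries appear once in the view, carrying the union of its descriptions.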

5
Semantic discovery
  • User chooses services
  • A common ontology is used to annotate and query
    any myGrid object including services.
  • Discover workflows and services described in the
    registry via Taverna.
  • Look for all workflows that accept an input of
    semantic type nucleotide sequence
  • Aim to have semantic discovery over public view
    on the Web.
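The nucleotide-sequence query above relies on subsumption: a workflow whose input is annotated with a more specific concept, such as DNA sequence, should also be found. A minimal Python sketch, assuming a hand-written single-parent is-a table in place of the OWL ontology and reasoner that myGrid used; the concept names and workflow names are hypothetical.

```python
# Hypothetical ontology fragment: each concept maps to its parent (is-a).
SUBCLASS_OF = {
    "DNA sequence": "nucleotide sequence",
    "RNA sequence": "nucleotide sequence",
    "nucleotide sequence": "biological sequence",
    "protein sequence": "biological sequence",
}

def subsumed_by(concept, ancestor):
    """True if `concept` is `ancestor` or a (transitive) subclass of it.
    Assumes the hierarchy is acyclic."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False

def find_by_input(workflows, semantic_type):
    """Names of workflows with an input annotated with `semantic_type`
    or with any more specific concept."""
    return sorted(
        name
        for name, input_types in workflows.items()
        if any(subsumed_by(t, semantic_type) for t in input_types)
    )

registry = {
    "blastn_wf": ["DNA sequence"],
    "translate_wf": ["RNA sequence"],
    "blastp_wf": ["protein sequence"],
    "gc_content_wf": ["nucleotide sequence"],
}
print(find_by_input(registry, "nucleotide sequence"))
# ['blastn_wf', 'gc_content_wf', 'translate_wf']
```

This takes the narrower reading of the query (inputs typed with the concept or one of its subclasses); also returning workflows whose inputs accept a more general type would be a different design choice.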

6
Workflow and service annotation
  • Adding structured metadata to a workflow
    registration so that others can discover and
    reuse it more effectively, e.g. what semantic
    type of input it accepts.

7
Can you guess what it is yet?
8
Service Registration
http://pedro.man.ac.uk
9
Semantic Discovery
  • Drag a workflow entry into the explorer pane and
    the workflow loads.
  • Drag a service/workflow to the scavenger window
    for inclusion into the workflow

10
myGrid and Semantics
  • Workflow and service discovery
  • Prior to and during enactment
  • Semantic registration
  • Workflow assembly
  • Semantic service typing of inputs and outputs
  • Provenance of workflows and other entities
  • Experimental metadata glue
  • Use of RDF, RDFS, DAML+OIL/OWL
  • Instance store, ontology server, reasoner
  • Materialised vs. point-of-delivery reasoning.
  • myGrid Information Model

11
Annotation
[Diagram components: service providers, ontologists, others and annotation providers; ontology store (vocabulary); description extraction from WSDL and Soaplab interface descriptions; Pedro annotation tool; annotation/description; registry; registry (personalised view); registry plug-in; Taverna Workbench]
12
Annotation
[Diagram components: ontologists; ontology store (vocabulary); Haystack provenance browser; Pedro annotation tool; annotation providers; annotation/description; scientists; Taverna Workbench; mIR; store plug-in]
13
[Diagram components: service providers, ontologists, others and bioinformaticians; ontology store (vocabulary); WSDL; Soaplab; Feta semantic discovery; registry and registry (personalised view); Taverna Workbench; workflow execution (FreeFluo WfEE) invoking services; mIR storing data and metadata]
14
Layered Semantics
  • Domain semantics layered on top of a domain-neutral
    but scientific data model
  • Reducing the activation energy, lowering barriers
    of entry.

[Diagram layers: domain semantics; ontologies; data metadata; workflow metadata; IMv2; experiment semantics; format (XSD types, MIME types); service metadata; provenance metadata; syntax; workflow; OGSA-DQP]
15
Model of services
Service: name, description, author, organisation
Operation: name, description, task, method, resource, application
Parameter: name, description, semantic type, format, transport type, collection type, collection format
Relations: hasInput, hasOutput, subclass
Subclasses: WSDL-based Web service, WSDL-based operation, Soaplab service, bioMoby service, workflow, local Java code
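The service model above maps naturally onto a few record types. A sketch using Python dataclasses, assuming that hasInput and hasOutput link operations to parameters and that the slide's subclasses specialise `Service` (with the WSDL-based operation specialising `Operation`); the diagram does not make these placements explicit, and all class names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Parameter:
    name: str
    description: str = ""
    semantic_type: Optional[str] = None      # ontology concept, e.g. "protein sequence"
    format: Optional[str] = None             # syntactic format, e.g. a MIME type
    transport_type: Optional[str] = None     # e.g. "string"
    collection_type: Optional[str] = None    # e.g. "list"; None for a single item
    collection_format: Optional[str] = None

@dataclass
class Operation:
    name: str
    description: str = ""
    task: Optional[str] = None
    method: Optional[str] = None
    resource: Optional[str] = None
    application: Optional[str] = None
    inputs: List[Parameter] = field(default_factory=list)    # hasInput
    outputs: List[Parameter] = field(default_factory=list)   # hasOutput

@dataclass
class Service:
    name: str
    description: str = ""
    author: str = ""
    organisation: str = ""
    operations: List[Operation] = field(default_factory=list)

    def input_types(self) -> Set[str]:
        """Semantic types accepted by any operation of this service."""
        return {p.semantic_type for op in self.operations
                for p in op.inputs if p.semantic_type}

# Assumed placement of the slide's subclasses (not shown explicitly in the diagram).
class WSDLService(Service): pass
class SoaplabService(Service): pass
class BioMobyService(Service): pass
class Workflow(Service): pass
class LocalJavaCode(Service): pass
class WSDLOperation(Operation): pass

query = Parameter("query", semantic_type="protein sequence", format="text/plain")
blast = WSDLService("BLAST", operations=[WSDLOperation("runBlast", inputs=[query])])
print(blast.input_types())  # {'protein sequence'}
```

Keeping the semantic type separate from the format and transport type is what lets discovery match on meaning while invocation still relies on syntax.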
16
Tiered specifications
Classes of services: domain semantic, unexecutable, "potentials"
Instances of services: business operational, executable, "actuals"
17
Matrix of metadata in workflow lifecycle
18
Stratified metadata
  • Service Type and Class (OWL)
  • Service Instance (RDF)
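The split between class-level "potentials" and instance-level "actuals" can be sketched as resolving a domain-semantic description to concrete endpoints. In this hypothetical Python sketch the `available` flag stands in for whatever business-operational metadata a real registry would hold, and `ServiceClass`, `ServiceInstance`, `actuals` and the example.org endpoints are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ServiceClass:
    """Class tier (OWL in the slide): domain semantics only, not executable."""
    name: str
    task: str

@dataclass(frozen=True)
class ServiceInstance:
    """Instance tier (RDF in the slide): a concrete, invocable endpoint."""
    service_class: ServiceClass
    endpoint: str
    provider: str
    available: bool = True  # stand-in for operational metadata

def actuals(potential: ServiceClass, instances: List[ServiceInstance]) -> List[ServiceInstance]:
    """Resolve a class-level description to the instances that can run now."""
    return [i for i in instances if i.service_class == potential and i.available]

blast = ServiceClass("BLAST", task="similarity search")
instances = [
    ServiceInstance(blast, "http://example.org/blast", "Provider A"),
    ServiceInstance(blast, "http://example.org/blast-mirror", "Provider B", available=False),
]
print([i.endpoint for i in actuals(blast, instances)])  # ['http://example.org/blast']
```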

19
Service and Workflow registration
  • Description scheme
  • RDFS, DAML+OIL/OWL ontologies of services and
    biology
  • Based on DAML-S
  • Reasoning over OWL descriptions
  • Query over RDF
  • Aim to have semantic discovery over public view
    on the web.

Workflow registration allows peer review and
publication of e-Science methods.
20
Service Ontology Suite
Parameters: input, output, precondition, effect
Properties: performs_task, uses-resource, is_function_of
Upper level ontology
Inspired by DAML-S
Publishing ontology
Informatics ontology
Molecular biology ontology
Organisation ontology
Task ontology
Bioinformatics ontology
Web service ontology
Current work: joint development of an Open Biological Ontologies BioService Ontology.
http://obo.sourceforge.net/
21
Reflections
  • Adverts for services and workflows turn out to
    be tricky
  • Describing different executable objects
  • Workflows and Services
  • Stratification of metadata
  • Classes and Instances of services and workflows
  • Service execution
  • Complex state based invocation models
  • Parametric polymorphism of services
  • Executable process models vs discovery process
    models
  • Multi-dimensions of service composition.

22
Human vs machine views
[Diagram: descriptions placed between human and machine, and between service user and service provider: weak semantic descriptions and rewriting views; UDDI-style advertisements; syntactic, interface and invocation descriptions and semantic mining; elaborate semantic descriptions and simplification views]
23
Reflections
  • Multiple descriptions, multiple interfaces
  • Users' needs vs. machine needs
  • The dimensions of Service Class substitution
  • Biologists choose experimentally meaningful
    services and do not want semantically similar
    substitutions, only the substitution of one
    instance for another
  • Experimentally neutral glue services that can
    be substituted are comparatively few
  • If users are choosing services, you don't need
    many kinds of metadata to eliminate 90% of
    options.

24
Reuse and Repurposing
  • Describing for reuse is challenging
  • Reuse depends on semantic descriptions and these
    are costly to produce
  • Describing for someone else's benefit
  • Reuse by multiple stakeholders
  • Licensing workflows for reuse.
  • Authorisation models
  • But reuse does happen!
  • Metadata pays off but it needs a network effect
    and there is a cost.

25
So far: using concepts
  • Controlled vocabulary for advertisements for
    workflows and services
  • Indexes into registries and mIR
  • Semantic discovery of services and workflows
  • Semantic discovery of repository entries
  • Type management for composition
  • Semantic workflow construction guidance and
    validation
  • Navigation paths between data and knowledge
    holdings
  • Semantic glue between repository entries
  • Semantic annotation and linking of workflow
    provenance logs

26
Part 4: Semantics and Metadata
  • Semantic publication and discovery
  • Provenance metadata
  • Semantic Web and the Grid
27
Provenance
  • Experiments being performed repeatedly, at
    different sites, at different times, by different
    users or groups

A large repository of records about experiments!
  • verification of data
  • recipes for experiment designs
  • explanation for the impact of changes
  • ownership
  • performance of services
  • data quality

Scientists
In silico experiments
28
Provenance Web
29
Representing links
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
  • Identify each resource
  • Life Science Identifier (LSID): a URI with associated
    data and metadata retrieval protocols.
  • On the understanding that the underlying data will not change

30
Representing links II
http://www.mygrid.org.uk/ontology#derived_from
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
  • Identify link type
  • Again use URI
  • Allows us to use RDF infrastructure
  • Repositories
  • Ontologies
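Because both the data items and the link type are URIs, a derivation link is just a triple. A minimal Python sketch, assuming the standard LSID form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`; the `#` separator in the link-type URI is an assumption (the extracted slide text dropped punctuation), the direction of the example link is illustrative, and no LSID resolution protocol is implemented.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(urn: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = urn.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {urn!r}")
    return LSID(*parts[2:])

# Link type identified by its own URI; the '#' separator is assumed.
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"

def derivation_link(derived: str, source: str):
    """Return a (subject, predicate, object) triple: `derived` derived_from `source`."""
    parse_lsid(derived)  # raises ValueError if either identifier is malformed
    parse_lsid(source)
    return (derived, DERIVED_FROM, source)

link = derivation_link("urn:lsid:taverna.sf.net:datathing:45fg6",
                       "urn:lsid:taverna.sf.net:datathing:23ty3")
print(parse_lsid(link[0]).object_id)  # 45fg6
```

Since the triple contains nothing but URIs, it can be loaded into any RDF repository and typed against an ontology, which is the point the slide makes.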

31
Link v Data Representation
  • Data management questions refer to relationships
    rather than internal content
  • What are the origins of this data?
  • Which service produced this data?
  • Which data is this derived from?
  • Who was this data produced for?
  • What is this data telling me?
  • Data analysis questions delegated to external
    services.
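Questions such as "Which data is this derived from?" can be answered by walking links alone, without opening the data itself. A small Python sketch over an in-memory list of (subject, predicate, object) triples; the predicate names `derived_from` and `produced_by` and the item names are assumptions, and a real deployment would query an RDF store instead.

```python
from collections import deque

def origins(triples, item, predicate="derived_from"):
    """Every item that `item` was transitively derived from (breadth-first)."""
    parents = {}
    for s, p, o in triples:
        if p == predicate:
            parents.setdefault(s, []).append(o)
    seen, queue = set(), deque([item])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:  # also stops the walk from looping on cycles
                seen.add(parent)
                queue.append(parent)
    return seen

def producer(triples, item):
    """Which service produced this data? The first produced_by link, if any."""
    return next((o for s, p, o in triples if s == item and p == "produced_by"), None)

# Short names stand in for LSIDs.
triples = [
    ("blast_report", "derived_from", "protein_seq"),
    ("protein_seq", "derived_from", "accession"),
    ("blast_report", "produced_by", "BLAST @ NCBI"),
]
print(sorted(origins(triples, "blast_report")))  # ['accession', 'protein_seq']
```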

32
Provenance Pyramid
Process Level
33
Organisation-level provenance
Process-level provenance
Data/knowledge-level provenance
[Diagram nodes: Service; Project; Experiment design; Process; Workflow design; Event; Workflow run; Person; Organisation; Data item]
Links shown in the diagram:
  • runBy, e.g. BLAST @ NCBI
  • componentProcess, e.g. web service invocation of BLAST @ NCBI
  • componentEvent, e.g. completion of a web service invocation at 12.04pm
  • partOf, instanceOf, run for
  • knowledge statements, e.g. "similar protein sequence to"
  • data derivation, e.g. output data derived from input data
User can add templates to each workflow process to determine links between data items.
34
Provenance tracking
  • Automated generation of this web of links
  • Workflow enactor generates
  • LSIDs
  • Data derivation links
  • Knowledge links
  • Process links
  • Organisation links

[Screenshot callouts: the relationships a BLAST report has with other items in the repository; other classes of information related to the BLAST report]
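The automated generation of links during enactment can be pictured as a wrapper around each service call. This is a hypothetical Python sketch, not the Taverna/FreeFluo implementation: `ProvenanceRecorder`, the `example.org` LSID authority and the predicate names (loosely modelled on the partOf, instanceOf and run for labels of the provenance pyramid) are assumptions, and knowledge links from user templates are not modelled.

```python
import uuid

AUTHORITY = "example.org"  # hypothetical LSID authority

def mint_lsid(namespace: str) -> str:
    """Mint a fresh LSID-style URN for a new item."""
    return f"urn:lsid:{AUTHORITY}:{namespace}:{uuid.uuid4().hex}"

class ProvenanceRecorder:
    """Hypothetical enactor hook that records links as (subject, predicate, object)."""

    def __init__(self, user: str):
        self.run = mint_lsid("workflowrun")
        self.triples = [(self.run, "run_for", user)]  # organisation link
        self.values = {}                               # LSID -> data value

    def register(self, value) -> str:
        lsid = mint_lsid("datathing")
        self.values[lsid] = value
        return lsid

    def invoke(self, service_name: str, func, *input_lsids: str) -> str:
        process = mint_lsid("process")
        self.triples.append((process, "part_of", self.run))         # process links
        self.triples.append((process, "instance_of", service_name))
        result = func(*(self.values[i] for i in input_lsids))
        output = self.register(result)
        self.triples.append((output, "output_of", process))
        for i in input_lsids:
            self.triples.append((output, "derived_from", i))        # data derivation
        return output

rec = ProvenanceRecorder(user="alice")
seq = rec.register("ACGT")
rev = rec.invoke("reverse", lambda s: s[::-1], seq)
print(rec.values[rev])  # TGCA
```

Because the wrapper, not the scientist, mints identifiers and writes links, the web of provenance grows as a side effect of running the workflow.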
35
Haystack (IBM/MIT)
GenBank record
Portion of the Web of provenance
Managing collection of sequences for review
36
(No Transcript)
37
Reflections
  • Visualisation of results usually domain specific
  • Provenance browsing and querying needs to fit
    with that visualisation
  • Generic graphical presentation limited to small,
    low complexity result sets
  • Layered provenance for different purposes and
    different stakeholders
  • Detailed process for debugging and usage
    statistics for QoS
  • Data and Knowledge for the Scientist
  • Migration with data objects
  • Versioning
  • Using provenance to its maximum potential

38
Map of Context
Literature relevant to provenance study or data
in this workflow
Provenance record of a workflow run
Interlinking graph of the workflow that generates
the provenance logs
Web pages of people who have interests related to
those of the owner of the workflow
Experiment Notes
39
Provenance metadata
  • Outside objects
  • RDF store
  • Within objects
  • LSID metadata.

40
Linked Provenance Resources
The subsumed concepts
Link to the log annotated with more general
concept
The subsuming concepts
Link to the log annotated with more specific
concept
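Linking a provenance log to logs annotated with more general or more specific concepts needs only the concept hierarchy and each log's annotation. A Python sketch, assuming one concept per log and a single-parent hierarchy; the concept names and log identifiers are hypothetical, and an OWL reasoner would normally supply the subsumption relationships.

```python
# Hypothetical concept hierarchy (child -> parent) and one concept per log.
PARENT = {
    "BLASTP report": "similarity search report",
    "BLASTN report": "similarity search report",
    "similarity search report": "analysis report",
}

def ancestors(concept):
    """All strictly more general concepts, nearest first."""
    result = []
    while concept in PARENT:
        concept = PARENT[concept]
        result.append(concept)
    return result

def related_logs(log, annotations):
    """Return (more_general, more_specific): logs whose concept subsumes,
    or is subsumed by, the concept annotating `log`."""
    concept = annotations[log]
    broader = set(ancestors(concept))
    more_general = sorted(name for name, c in annotations.items() if c in broader)
    more_specific = sorted(name for name, c in annotations.items() if concept in ancestors(c))
    return more_general, more_specific

logs = {"log1": "BLASTP report", "log2": "similarity search report",
        "log3": "analysis report", "log4": "BLASTN report"}
print(related_logs("log2", logs))  # (['log3'], ['log1', 'log4'])
```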
41
Generating Links
The concept
The generated Link to related provenance
document
The name of the data
42
Semantics
Ontology-aided workflow construction
  • RDF-based service and data registries
  • RDF-based metadata for ALL experimental
    components
  • RDF-based provenance graphs
  • OWL-based controlled vocabularies for database
    content
  • OWL-based integration of experiment entities

RDF-based semantic mark up of results, logs,
notes, data entries
43
Standards
  • By tapping into (de facto) standards (LSID, RDF,
    WS-I) and communities, we can leverage others'
    results and tools
  • Haystack, Pedro, Jena, CHEF/Sakai.
  • The Grid standards are confusing and volatile
  • The choice of vanilla Web Services was good.
  • We didn't jump to OGSI. We won't jump to WSRF
    until it's necessary.
  • And workflow standards have been untimely.

44
Role of Ontologies
Service matching and provisioning
Composing and validating workflows, service compositions and negotiations
Service and resource registration and discovery
Help
Knowledge-based guidance and recommendation
Schema mediation
45
Part 4: Semantics and Metadata
  • Semantic publication and discovery
  • Provenance metadata
  • Semantic Web and the Grid
46
A pioneer of the
The Semantic Grid is an extension of the current
Grid in which information and services are given
well-defined and explicitly represented meaning,
better enabling computers and people to work in
cooperation
Semantics in and on the Grid
47
The semantics of knowledge
  • Semantic Grids
  • Grids and Grid middleware that make use of
    semantics for its installation, deployment,
    running etc.
  • I.e. Semantics IN the Grid FOR the Grid.
  • Knowledge Grids
  • A virtual knowledge base derived by using the
    Grid resources, in the same spirit as a data grid
    is a virtual data resource and a compute grid a
    virtual computer. Knowledge Grids include
    services for knowledge mining.
  • I.e. Semantics ON the Grid arising from the USE of
    the Grid.

48
Knowledge Stakeholders
Knowledge for the Grid Application
Semantics for the Grid
Sources of Knowledge
49
  • The Semantic Web is an extension of the current
    Web in which information is given well-defined
    meaning, better enabling computers and people to
    work in cooperation. It is based on the idea of
    having data on the Web defined and linked such
    that it can be used for more effective discovery,
    automation, integration, and reuse across various
    applications.
  • Hendler, J., Berners-Lee, T., and Miller, E.
  • Integrating Applications on the Semantic Web,
    2002, http://www.w3.org/2002/07/swint.

50
Big Vision
  • The Web today is
  • A hypermedia digital library
  • Collection of linked web pages
  • Ubiquitous interface to applications
  • Amazon.com
  • A platform for multimedia
  • BBC Radio 4 in my room!
  • A naming scheme
  • Unique identity for resources

A place where people do the work, filtering,
linking and interpreting. Computers do the
presentation.
Why not make the computers do the work?
From machine readable resources for humans to
computable resources for machines
51
  • Expose the meaning of resources by assertions in
    a common data model
  • Publish and share consensually agreed ontologies
    so we can share the metadata and add in
    background knowledge
  • Then we can query, filter, integrate and
    aggregate the metadata
  • and reason over it to infer more metadata using
    rules
  • and attribute trust to the metadata.

[Diagram: an RDF graph for a conference. Nodes: the conference event (http://www.amia.org/meetings/...), its hotel venue (http://www.marriott.com/epp/...), http://www.amia.org/about/, its dates, the city Washington and the country USA. Properties: hasvenue, haslocation, organisedby, period, locatedin.]
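The step from asserted metadata to inferred metadata "using rules" can be shown on the conference example. A minimal forward-chaining sketch in Python, assuming two hand-written rules (a venue's location is the event's location; location propagates through locatedin); the rules and the `ex:` identifiers are illustrative and not taken from any rule language.

```python
# Assertions from the conference example; "ex:" identifiers are hypothetical.
FACTS = {
    ("ex:conference", "hasvenue", "ex:hotel"),
    ("ex:hotel", "haslocation", "ex:Washington"),
    ("ex:Washington", "locatedin", "ex:USA"),
}

def infer(facts):
    """Forward-chain two illustrative rules until nothing new is derived:
    R1: (?e hasvenue ?v) & (?v haslocation ?l)  -> (?e haslocation ?l)
    R2: (?x haslocation ?l) & (?l locatedin ?c) -> (?x haslocation ?c)
    """
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if o != s2:
                    continue  # both rules join on the object of the first triple
                if (p, p2) in {("hasvenue", "haslocation"), ("haslocation", "locatedin")}:
                    new.add((s, "haslocation", o2))
        if new <= facts:
            return facts  # fixed point: no rule produced anything new
        facts |= new

print(("ex:conference", "haslocation", "ex:USA") in infer(FACTS))  # True
```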
52
Infrastructure enablers for e-Research
Grid Computing
  • On-demand, transparently constructed
    multi-organisational federations of distributed
    services
  • Distributed computing middleware
  • Computational Integration
  • Sharing Resources
Semantic Web
  • An automatically processable, machine-
    understandable web
  • Distributed knowledge and information management
  • Information integration
  • Sharing information

53
Semantic Web layers
[Layer diagram: Trust; Rules; Agents; Ontologies; Metadata annotation; Search engines and filters; Web; Applications; Deep web]
54
Semantic Grid layers
[Layer diagram: Trust; Rules; Agents; Ontologies; Metadata annotation; Search engines and filters; Grid Services; Applications; Grid State]
55
Languages
56
RDF in a nutshell
  • Resource Description Framework
  • W3C Recommendation (http://www.w3.org/RDF)
  • Graphical formalism (plus XML syntax and semantics)
  • for representing metadata
  • for describing the semantics of information in a
    machine-accessible way
  • RDFS extends RDF with schema vocabulary, e.g.
  • Class, Property
  • type, subClassOf, subPropertyOf
  • range, domain
  • Statements are <subject, predicate, object>
    triples
  • <Ian, hasColleague, Uli>
  • Statements describe properties of resources
  • A resource is any object that can be pointed to
    by a URI
  • Properties themselves are also resources (URIs)
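RDFS's schema vocabulary licenses new triples. A Python sketch that applies three of the RDFS entailment rules (transitivity of subClassOf, type propagation through subClassOf, and property propagation through subPropertyOf) until nothing changes; the `ex:` terms, including the assumed `ex:hasColleague rdfs:subPropertyOf ex:knows` axiom, are hypothetical, and prefixed names stand in for full URIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
SUBPROPERTY_OF = "rdfs:subPropertyOf"

def rdfs_closure(triples):
    """Apply three RDFS entailment rules until no new triple appears:
    rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       -> (x type B)
    rdfs7:  (x p y), (p subPropertyOf q)       -> (x q y)
    """
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == SUBCLASS_OF and o == s2 and p in (SUBCLASS_OF, RDF_TYPE):
                    new.add((s, p, o2))  # rdfs11 and rdfs9
                if p2 == SUBPROPERTY_OF and p == s2:
                    new.add((s, o2, o))  # rdfs7
        if new <= graph:
            return graph
        graph |= new

graph = {
    ("ex:Ian", "ex:hasColleague", "ex:Uli"),
    ("ex:hasColleague", SUBPROPERTY_OF, "ex:knows"),
    ("ex:Ian", RDF_TYPE, "ex:Researcher"),
    ("ex:Researcher", SUBCLASS_OF, "ex:Person"),
}
closure = rdfs_closure(graph)
print(("ex:Ian", "ex:knows", "ex:Uli") in closure)   # True
print(("ex:Ian", RDF_TYPE, "ex:Person") in closure)  # True
```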

57
RDF in a nutshell
  • Common model for metadata
  • A graph of triples
  • Query over and link together
  • RDQL, repositories, integration tools,
    presentation tools
  • Jena, Haystack

http://www.w3.org/RDF/
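Querying a graph of triples comes down to matching patterns that contain variables. A small Python sketch of basic triple-pattern matching in the spirit of RDQL (it is neither RDQL nor its syntax); terms starting with `?` are variables, and the example triples are hypothetical.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns.

    Terms beginning with '?' are variables; other terms must match exactly.
    """
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)  # extend a copy so alternatives stay independent
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)

triples = [
    ("ex:report1", "ex:producedBy", "ex:BLAST"),
    ("ex:report2", "ex:producedBy", "ex:FASTA"),
    ("ex:BLAST", "ex:hostedAt", "ex:NCBI"),
]
query = [("?report", "ex:producedBy", "?service"),
         ("?service", "ex:hostedAt", "ex:NCBI")]
print(list(match(query, triples)))
# [{'?report': 'ex:report1', '?service': 'ex:BLAST'}]
```

Shared variables across patterns (here `?service`) are what join separate statements together, which is how metadata from different sources can be linked in one query.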
58
Connected by concepts
Tim Berners-Lee, 2003
http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html
59
W3C Web Ontology language OWL
  • The ontology language du jour
  • Continuum of expressivity
  • Concepts, roles, individuals, axioms
  • From simple frames to description logics
  • Sound and complete formal semantics
  • Supports reasoning to infer classification
  • Based on the SHIQ family of description logics (OWL DL corresponds to SHOIN(D))
  • Easier to extend, evolve and merge
    ontologies
  • Known in the Bioinformatics world e.g. OBO
  • Layered on top of RDF
  • Tools, tools, tools.

http://www.w3.org/TR/2004/REC-owl-features-20040210/
60
Coupling Semantic Web and e-Science/Grid
  • Expose the meaning of Grid services, resources
    and entities by assertions in a common data model
    (RDF)
  • Publish and share consensually agreed ontologies
    so we can share the metadata and add in
    background knowledge (RDF(S), OWL)
  • Then we can query, filter, integrate and
    aggregate the metadata (RDQL)
  • and reason over it to infer more metadata using
    rules (DL reasoning, SWRL)
  • and attribute trust to the metadata.

61
(No Transcript)
62
Publications
  • P Lord, C Wroe, R Stevens, CA Goble, S Miles, L
    Moreau, K Decker, T Payne, J Papay. "Semantic and
    Personalised Service Discovery". In Proceedings of
    the IEEE/WIC International Conference on Web
    Intelligence / Intelligent Agent Technology
    Workshop on "Knowledge Grid and Grid
    Intelligence", Halifax, Canada, 13 October 2003.
  • J Zhao, CA Goble, M Greenwood, C Wroe, R Stevens.
    "Annotating, linking and browsing provenance logs
    for e-Science". In 2nd International Semantic Web
    Conference (ISWC2003) Workshop on Retrieval of
    Scientific Data, Florida, USA, October 2003.
  • C Wroe, RD Stevens, CA Goble, A Roberts, M
    Greenwood. "A suite of DAML+OIL ontologies to
    describe bioinformatics web services and data".
    International Journal of Cooperative Information
    Systems, Special issue on Bioinformatics and
    Biological Data Management, 12(2):197-224, 2003.
  • C Wroe, CA Goble, M Greenwood, P Lord, S Miles, L
    Moreau, J Papay, T Payne. "Experiment automation
    using semantic data on a bioinformatics Grid".
    IEEE Intelligent Systems, Jan/Feb 2004.
  • J Zhao, C Wroe, CA Goble, R Stevens, D Quan, M
    Greenwood. "Using Semantic Web Technologies for
    Representing e-Science Provenance". In Proc. 3rd
    International Semantic Web Conference (ISWC2004),
    Hiroshima, Japan, 9-11 Nov 2004.
  • C Wroe, P Lord, S Miles, J Papay, L Moreau, C
    Goble. "Recycling Services and Workflows through
    Discovery and Reuse". To appear in Proceedings of
    the UK e-Science All Hands Meeting, Nottingham,
    UK, 1-3 September 2004.
  • P Lord, S Bechhofer, M Wilkinson, G Schiltz, D
    Gessler, C Goble, L Stein, D Hull. "Applying
    semantic web services to bioinformatics:
    Experiences gained, lessons learnt". In Proc. 3rd
    International Semantic Web Conference (ISWC2004),
    Hiroshima, Japan, 9-11 Nov 2004.
  • M Szomszor, L Moreau. "Recording and Reasoning
    Over Data Provenance in Web and Grid Services".
    In International Conference on Ontologies,
    Databases and Applications of Semantics
    (ODBASE'03), volume 2888 of Lecture Notes in
    Computer Science, pages 603-620, Catania, Sicily,
    Italy, 3-7 November 2003.