Nobody said it was easy: Semantically Discovering BioGrid Services is tricky

Provided by: caro256

1
Nobody said it was easy: Semantically
Discovering BioGrid Services is tricky
  • Professor Carole Goble
  • University of Manchester, UK
  • myGrid project
    http://www.mygrid.org.uk

2
  • Environmental requirements of bioinformatics in
    silico experimentation
  • The services
  • Workflow execution
  • And the impact on describing services: how you
    describe things, what to describe, and how and
    when to use the descriptions
  • different levels of descriptions
  • different views on services
  • depending on whether you are middleware or a user
  • implications for registration

3
Road map
  • Why are we describing bio-services
  • myGrid project requirements and architecture
  • A teeny-weeny contextualising demo
  • The user perspective and the implementation
    perspective.
  • Thoughts, lessons and design decisions
  • Describing different executable objects
  • Workflows and Services
  • Stratification of metadata
  • Classes and Instances
  • Service execution
  • State based invocation models
  • Parametric polymorphism of services
  • Multiple descriptions, multiple interfaces

4
The Grid Problem
(Foster, Kesselman, Tuecke)
  • "flexible, secure, coordinated resource sharing
    among dynamic collections of individuals,
    institutions, and resources - what we refer to as
    virtual organizations."

A low-level framework to allow inter-operation of
resources, mainly for the benefit of application
developers, who can deploy standard tasks on the
Grid in a straightforward manner.
5
Open Grid Services Architecture
  • Present Grid Architecture is a services
    architecture
  • Implemented using Web Services Technology
  • OGSA will provide
  • Naming / Authorization / Security / Privacy
  • Higher level services: Workflow, Transactions,
    Data Mining, Knowledge Discovery, ...
  • Exploiting synergy: Commercial Internet with Grid
    Services
  • OGSI extends Web Services
  • Transient Service Instances
  • Service State
  • Lifetime management
  • Defines fundamental (WSDL) interfaces and
    behaviors that define a Grid Service
  • Required + optional interfaces = WS "profile"
  • Defines WSDL extensibility elements
  • E.g., serviceType (a group of portTypes)

6
myGrid
  • EPSRC UK e-Science pilot project
  • Open Source Upper Middleware for Bioinformatics
  • Data intensive not compute intensive
  • Sharing knowledge and sharing components

7
myGrid in a nutshell
  • An example of a second generation open
    service-based Grid project, specifically a test
    bed for the OGSI, OGSA and OGSA-DAI base
    services
  • myGrid Information Repository that is OGSA-DAI
    compliant
  • Developing high level services for data intensive
    integration, rather than computationally
    intensive problems
  • Workflow & distributed query processing
  • Developing high level services for e-Science
    experimental management
  • Provenance, change notification and
    personalisation
  • Developing Semantic Grid capabilities and
    knowledge-based technologies, such as
    semantic-based resource discovery and matching.
  • Metadata descriptions and ontologies for service
    discovery, component discovery and linking
    components.

9
Experiment life cycle
  • Service discovery
  • Workflow discovery & refinement
  • Workflow creation
  • Personalised service registries
  • Personalised workflows

Forming experiments
Personalisation
Discovering and reusing experiments and resources
Executing experiments
  • Service discovery
  • Workflow discovery & refinement
  • Provenance logs
  • Workflow enactment
  • Service invocation
  • Provenance logs

Providing services & experiments
Managing experiments
  • Service registration
  • Workflow deposition
  • Metadata annotation
  • Third party registration
  • Provenance records
  • Workflow evolution
  • Service monitoring

10
Provenance
  • Experiment is repeatable, if not reproducible,
    and explained by provenance records
  • Who, what, where, why, when, (w)how?
  • The traceability of knowledge as it evolves and
    as it is derived.
  • Implications for recording which services were
    invoked on what data, when, and with what
    parameters.
  • Immutable and persistent
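
The who/what/where/why/when/how questions can be pictured as the fields of a record that cannot be changed once it is written. The following is a minimal Python sketch under stated assumptions, not the myGrid provenance schema: the class name, the field names and every sample value are invented for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of one service invocation (illustrative only)."""
    who: str                  # user or agent that ran the service
    what: str                 # service that was invoked
    where: str                # site hosting the service
    why: str                  # experiment the invocation belongs to
    when: str                 # timestamp of the invocation
    how: Mapping[str, str]    # parameters the service was invoked with


record = ProvenanceRecord(
    who="example_user",
    what="BLASTp",
    where="example-site",
    why="characterise an unknown DNA sequence",
    when="2003-01-01T12:00:00Z",
    how=MappingProxyType({"database": "SWISS-PROT"}),  # read-only mapping
)

try:
    record.what = "something else"   # frozen dataclass rejects assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Persistence is out of scope here; the sketch only shows the "immutable" half of the requirement.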

11
Architectural Overview
[Architecture diagram. Labels recovered from the
slide, duplicates removed: Knowledge Services;
Knowledge Service; Semantic registration; Registry;
Ontology Server; Reasoner; Structural registration;
UDDI; Matcher; Service; KB Store; Registry View;
Notification Service; RDF-based UDDI; Service
Discovery; JMS; Provenance service; Workflow
enactment engine; Discover Workflow or Service; mIR;
Test Data; Scufl; WSFL; mG Object Discovery;
Information Extraction; Distributed Query Processor;
Job Execution; m Info Repository; Workflow
templates; Workflow instances; PESTO; Metadata;
Concepts; Data; Provenance; SoapLab; DB2]
12
Workflows
  • Workflow discovery
  • Finding workflows that others have done, and that
    I have done myself
  • Workflow specification
  • Finding classes of services
  • Guiding service composition
  • We don't do automated composition
  • Dynamic workflow enactment: service discovery and
    invocation
  • Choose service instances when running workflow
  • User involvement

13
myGrid Find Service
[Find Service diagram. Labels recovered from the
slide, duplicates removed: Discovery Client; Find
Service; Word-based discovery; Semantic discovery;
Syntactic discovery; Views; Ontology Server; UDDI-M;
Reasoner; Third party description; RDF; Service;
FaCT; publishes; Matcher; Description Store; Gather
service descriptions; Org. registry; KAON; Public
registry; WSDL; UDDI; Third Party]
14
myGrid Components Demo
  • portal operation.
  • semantics to define type system.
  • mIR, to store, and retrieve data.
  • registry to describe and record services

Uncharacterised DNA sequence
Select an open reading frame
Translate to protein
BLAST search
Characterised DNA sequence
15
myGrid Components Demo
  • Pre-existing third party application
  • Service invocation
  • Workflow enactment

DNA sequence
getOrf
transeq
prophet
plotorf
Proteins from a family
emma
prophecy
Classical bioinformatics: detecting whether an
uncharacterised protein domain is conserved
across a group of proteins
16
Bio Services Landscape
  • Wrap CORBA, Perl etc to look like web services,
    to become Grid services (eventually)
  • Multiple services
  • Many hundreds of different services in the public
    domain and privately owned
  • Multiple registries
  • 3rd party public registries, private registries,
    personal registries
  • 3rd parties
  • JEMBOSS, PathPort, bioMoby
  • Wrap our own
  • Soaplab
  • A soap-based programmatic interface to
    command-line applications
  • 300 different classes of services
  • Swiss-Prot, EMBOSS, Medline, blah, blah
  • http://industry.ebi.ac.uk/soap/soaplab

17
Bio Services Problem Space
  • Multiple service providers of same service (not
    just similar service)
  • Many implementations of Swiss-Prot version 40
  • What and which: discovery based on
  • What the service does from a domain perspective.
  • Which service instance has the appropriate
    capabilities from an operational perspective.
  • Users don't care if the service is a service or a
    workflow.
  • Same "what" description from their perspective
  • Different "how" description from middleware
    perspective.

SWISS-PROT
SWISS-PROT@local
SWISS-PROT@ncbi
SWISS-PROT@ebi
18
Consequences
  • We support (at least) two types of semantic
    service discovery
  • Domain
  • requiring access to common application domain
    ontologies
  • Biology and bioinformatics
  • Service
  • using cross-domain knowledge independent of
    application
  • Quality of service, ownership, location,
    organisations
  • We describe the profile of workflows as if they
    were services (of course a workflow could be
    deployed as a service)
  • Should workflow descriptions be in the same
    registry as service descriptions, or elsewhere?
  • A find service must transcend the location.

19
Tiers of service description
Select an open reading frame
Characterised DNA sequence
Uncharacterised DNA sequence
Sequence alignment
Translate to protein
Characterised DNA sequence
EMBOSS TransSeq
EMBOSS GetORF
BLAST-p
CATTACCC
Characterised DNA sequence
EMBOSS TransSeq@httped.ac.uk
EMBOSS GetORF@httpimg.cs.man.ac.uk
BLASTp@ncbi.nih.gov
CATTACCC
20
Summary: Tiered levels of descriptions
Classes of services: domain semantic, unexecutable,
potentials
  • Abstract Service: Sequence alignment (Ontology)
  • Specific Service: Blastn (Ontology)
Instances of services: business operational,
executable, actuals
  • Service Instance: Blastn@EBI (Ontology, Data
    model)
  • Invoked Service: Blastn@EBI invoked proxy
    (Service Data Element)
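
The four tiers can be read as a chain of descriptions, each pointing at the more general one above it. The sketch below is one possible way to model that chain in Python; the class and field names are assumptions for illustration and are not taken from the myGrid data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractService:
    task: str                 # domain-level task; not executable


@dataclass(frozen=True)
class SpecificService:
    name: str
    kind: AbstractService     # the class of service it belongs to


@dataclass(frozen=True)
class ServiceInstance:
    service: SpecificService
    provider: str             # who hosts this executable copy


@dataclass(frozen=True)
class InvokedService:
    instance: ServiceInstance
    parameters: tuple         # (name, value) pairs used for this run


alignment = AbstractService(task="Sequence alignment")
blastn = SpecificService(name="Blastn", kind=alignment)
blastn_at_ebi = ServiceInstance(service=blastn, provider="EBI")
run = InvokedService(instance=blastn_at_ebi,
                     parameters=(("query", "CATTACCC"),))

# Walking back up the tiers recovers the domain-level description.
task_of_run = run.instance.service.kind.task
```

The first two tiers describe potentials (what could be run); the last two describe actuals (what can be, or was, run).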
21
What are you discovering? Classes Users
Workflow specifications
Discovery
Classes of Service
  • Finding a service that will fulfil some task e.g.
    aligning of biological sequences.
  • What services perform a specific kind of task,
    for example, what services can I use to perform
    a biological sequence similarity search?
  • Finding a service that will accept or produce
    some kind of data.
  • What services produce this kind of data, for
    example, from where can I find sequence data for
    a protein?
  • What services consume this kind of data, for
    example, if I have protein sequence data, what
    can I do with it?
  • Class of service
  • a protein sequence alignment, a protein sequence
    database.
  • Specific example of an abstract service
  • BLAST, BLASTn, SWISS-PROT,
  • Applies to class of services and workflow
    specifications
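
The three kinds of question above (by task, by data produced, by data consumed) can be sketched as queries over a list of class-level descriptions. This is a toy in-memory registry, not the myGrid registry API; the structure and the simplified inputs and outputs assigned to each service are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceClass:
    name: str
    task: str
    consumes: frozenset
    produces: frozenset


# Hypothetical, simplified descriptions of three classes of service.
REGISTRY = [
    ServiceClass("BLASTp", "sequence similarity search",
                 frozenset({"protein sequence"}),
                 frozenset({"alignment report"})),
    ServiceClass("SWISS-PROT", "database retrieval",
                 frozenset({"accession number"}),
                 frozenset({"protein sequence"})),
    ServiceClass("transeq", "translation",
                 frozenset({"nucleotide sequence"}),
                 frozenset({"protein sequence"})),
]


def performing(task):
    """What services perform this kind of task?"""
    return [s.name for s in REGISTRY if s.task == task]


def producing(data):
    """What services produce this kind of data?"""
    return [s.name for s in REGISTRY if data in s.produces]


def consuming(data):
    """If I have this kind of data, what can I do with it?"""
    return [s.name for s in REGISTRY if data in s.consumes]
```

Exact string matching is the weakest possible form of discovery; it stands in here for the semantic matching the slides go on to describe.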

22
Originally Based on DAML-S
  • US DARPA Agent Markup Language Services
    http://www.daml.org
  • An Upper Ontology for Services

23
Suite
Specialises. All concepts are subclassed from
those in the more general ontology.
Contributes concepts to form definitions.
Upper level ontology
Publishing ontology
Informatics ontology
Molecular biology ontology
Organisation ontology
Task ontology
Parameters: input, output, precondition, effect;
performs_task; uses-resource; is_function_of
Bioinformatics ontology
Web service ontology
25
Pedro interface to Service Discovery
26
Classification and matchmaking of services
  • Classification of services/workflows
  • Imprecise (best effort) substitutions of
    services/workflows
  • Service/workflow organisation and indexing
  • Service/workflow matchmaking and substitution
  • BLAST finds tblastx, tblastn, psi-blast,
    marks_super_blast.
  • Alignment finds ClustalW, Blast,
    Smith-Waterman, Needleman-Wunsch
  • Expanded selection of services based on expansion
    of in-hand object
  • A vocabulary for expressing service descriptions
    without pre-determining every description
  • A reasoning process to manage
  • coherency of the classifications and the
    descriptions when they are created,
  • the service discovery, matching and composition
    when they are deployed.
  • Ontologies in DAML+OIL/OWL based on the DAML-S
    ontology
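
The "BLAST finds tblastx ..." and "Alignment finds ClustalW ..." examples rely on one concept subsuming another. The sketch below swaps the description logic reasoner for a plain walk up a hand-written child-to-parent table, so it illustrates only the idea of classification-based matching. The table is an assumed toy hierarchy built from the names on the slide, not the myGrid ontology.

```python
# Child -> parent links for a toy service classification (assumed).
PARENT = {
    "BLAST": "Alignment",
    "ClustalW": "Alignment",
    "Smith-Waterman": "Alignment",
    "Needleman-Wunsch": "Alignment",
    "tblastx": "BLAST",
    "tblastn": "BLAST",
    "psi-blast": "BLAST",
}


def is_a(service, concept):
    """True if `service` equals `concept` or sits below it in the hierarchy."""
    while service is not None:
        if service == concept:
            return True
        service = PARENT.get(service)   # None once the root is passed
    return False


def find(concept):
    """Every classified name subsumed by `concept`, excluding the concept."""
    names = set(PARENT) | set(PARENT.values())
    return sorted(n for n in names if n != concept and is_a(n, concept))


blast_matches = find("BLAST")
alignment_matches = find("Alignment")
```

A request for "Alignment" therefore also returns the BLAST variants, which is the best-effort substitution behaviour described above.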

27
What are you discovering? Instances Machines
Workflow specifications
Discovery
Classes of Service
registry
Instantiate
Select instances
28
Discovering services based on their operational
properties
  • What resources does a specific organisation
    provide?
  • Who authored this resource?
  • What services offering x currently give the best
    quality of service?
  • Which service would the local bioinformatics
    expert suggest we use?
  • Data quality, quality of service, cost,
    geographical location, authorisation, provenance
    of data and so on.
  • Third party metadata
  • Instance service description of a specific
    service
  • BLAST, SWISS-PROT as offered by the EBI is 80%
    reliable.
  • Invoked instance service description
  • BLAST as offered by the EBI on a particular date,
    with particular parameters, when the service was
    invoked.

Applies to instances of services and workflows
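
Choosing among instances of the same service by an operational property might look like the sketch below. It is an illustration only: the record layout is an assumption, the 0.80 figures echo the "80% reliable" example, and the value for the local copy is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceDescription:
    service: str
    provider: str
    reliability: float   # fraction of successful calls, asserted by a third party


INSTANCES = [
    InstanceDescription("SWISS-PROT", "ebi", 0.80),
    InstanceDescription("SWISS-PROT", "local", 0.95),   # made-up value
    InstanceDescription("BLAST", "ebi", 0.80),
]


def best_instance(service):
    """Most reliable instance offering `service`, or None if there is none."""
    candidates = [i for i in INSTANCES if i.service == service]
    if not candidates:
        return None
    return max(candidates, key=lambda i: i.reliability)


chosen = best_instance("SWISS-PROT")
```

Cost, location or authorisation could be ranked the same way by changing the key function.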
29
RDF based UDDI metadata for service instances
30
User engagement
Workflow specifications
Discovery
Classes of Service
registry
Instantiate
Select instances
Support for the user to find a service that
fulfils their task. The ontology should be fairly
simple, couched in concepts the user is familiar
with, e.g. protein sequence. Analogous to the DAML-S
profile.
31
EMBOSS seqret
  • Function that reads and writes (returns)
    sequences
  • But it's so much more than that!
  • EMBOSS programs can take a wide range of
    qualifiers that slightly change the behaviour of
    the program when reading or writing a sequence
  • seqret can read a sequence or many sequences from
    databases, files, files of sequence names, the
    command-line or the output of other programs and
    then can write them to files, the screen or pass
    them to other programs.
  • Because it can read in a sequence from a database
    and write it to a file, it's a program for
    extracting sequences from databases
  • Because it can write the sequence to the screen,
    seqret is a program for displaying sequences.

32
And more.
  • seqret can read sequences in any of a wide range
    of standard sequence formats. You can specify the
    input and output formats being used. If you don't
    specify the input format, it will try a set of
    possible formats until it reads it in
    successfully.
  • Because you can specify the output sequence
    format, it's a program to reformat a sequence.
  • seqret can read in the reverse complement of a
    nucleic acid sequence. So it's a program for
    producing the reverse complement of a sequence.
  • seqret can read in a sequence whose begin and end
    positions you have specified and write out that
    fragment. So it's a utility for doing simple
    extraction of a region of a sequence.
  • seqret can change the case of the sequence being
    read in to upper or to lower case. So it's a
    simple sequence beautification utility.
  • seqret can do any combination of the above
    functions. ......

33
EMBOSS
  • EMBOSS sequence alignment service matcher: a
    simple way to describe the task it fulfils is
    "matcher has_input sequence, performs_task
    aligning"
  • Some verb acting on some object to produce a
    result: this pattern fits most descriptions.
  • It quickly gets more complicated.
  • EMBOSS degap removes gap characters from a
    sequence.
  • Where should the gap character concept be
    included? It is neither an input nor an output.
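
The verb-acting-on-object pattern can be written down as subject/property/value triples. In the sketch below the two matcher statements come from the slide; the degap statements, including the has_output property and the task name "removing", are illustrative guesses. It shows why degap is awkward: nothing in an input/output/task description names the gap character.

```python
# Service descriptions as (subject, property, value) triples.
TRIPLES = {
    ("matcher", "has_input", "sequence"),        # from the slide
    ("matcher", "performs_task", "aligning"),    # from the slide
    ("degap", "has_input", "sequence"),          # assumed
    ("degap", "has_output", "sequence"),         # assumed
    ("degap", "performs_task", "removing"),      # assumed
}


def values(subject, prop):
    """All values recorded for one property of one service."""
    return sorted(v for s, p, v in TRIPLES if s == subject and p == prop)


def mentions(subject, concept):
    """True if any statement about `subject` names `concept`."""
    return any(s == subject and v == concept for s, _, v in TRIPLES)


matcher_task = values("matcher", "performs_task")
degap_names_gap_character = mentions("degap", "gap character")
```

Capturing "gap character" needs either an extra property or a richer task concept, which is the modelling choice the question above raises.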

34
  • Several properties added over the DAML-S profile
    for bioinformatics
  • e.g. uses_resource and uses_application.
  • These could be simplified away either just as one
    additional property or as a precondition as used
    in DAML-S.
  • More obtuse to the user.
  • Makes the model more complex or redundant for the
    benefit of the user.
  • Reduces interoperability with service
    descriptions in other domains.
  • Perhaps this redundancy should be encoded within
    the applications delivering the ontology and a
    more complex precondition description used under
    the hood?

35
EMBOSS matcher
  • "Protein sequence" is an ambiguous term and relies
    on implicit information held in the head of the
    bioinformatician.
  • To reason over or organise concepts we need a
    more precise definition
  • A data structure conforming to some schema that
    encodes the sequence of amino acids in a protein
    molecule.
  • We can now start to infer the relationship
    between protein sequences and nucleotide
    sequences.
  • But a user cannot be expected to interact with
    such a complex model.

36
Outcome: Views
  • Multiple descriptions over the same services and
    workflows held in registries
  • Third party descriptions; subsets of services
  • Publication of descriptions must be supported
    both for the author of the service and for third
    parties
  • Third party annotations are a view of a service,
    and discovery should offer a variety of views
    based upon third party annotations
  • There is a need for control over who may add and
    alter third party annotations
  • Generic services supporting a wide variety of
    tasks
  • Middleware must have some way of going beyond a
    generic description and stating, given these
    inputs, what the outputs are going to be.
  • Rather than author a very complex description
    that caters for all possibilities, it is better
    to author many simpler descriptions, one for each
    case.
  • It may in fact be necessary to ask the service
    itself for specific answers, such as "given these
    inputs, what would you perform?"

37
myGrid Find Service
[Find Service diagram. Labels recovered from the
slide, duplicates removed: Discovery Client; Find
Service; Word-based discovery; Semantic discovery;
Syntactic discovery; Views; Ontology Server; UDDI-M;
Reasoner; Third party description; RDF; Service;
FaCT; publishes; Matcher; Description Store; Gather
service descriptions; Org. registry; KAON; Public
registry; WSDL; UDDI; Third Party]
38
Bio Services Problem Space
  • Wrap CORBA, Perl etc to look like web services,
    to become Grid services (eventually)
  • Dialogue oriented (e.g. Soaplab) and function
    oriented (e.g. bioMOBY)
  • Often highly parameterised
  • Mixture of synchronous and asynchronous
  • Simulations and feedback loops
  • Streaming large scale data
  • Mixture of binary and text

39
EMBOSS
  • Suite of 200 command line programs, which uses
    the AJAX Command Definition (ACD) language
  • How do we present these services?
  • As 200 different services, one for each EMBOSS
    program, with a single method, with as many
    parameters as the EMBOSS program requires.
  • As 200 different services, one for each EMBOSS
    program, with a number of overloaded methods
    where the program takes optional parameters.
  • As a single service with 200 different methods,
    one for each EMBOSS program.
  • As a single, highly parametric service, with a
    single method, called invoke, the first
    parameter of which names the EMBOSS program to
    run.
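
To make the last option concrete, here is a minimal Python sketch of a single generic invoke method next to one-function-per-program. The two functions are toy stand-ins, not the EMBOSS programs: degap here only strips "-" characters and seqret only changes case, and the dispatch table is an assumption for illustration.

```python
def degap(sequence):
    """Toy stand-in: drop '-' gap characters from a sequence."""
    return sequence.replace("-", "")


def seqret(sequence, upper=False):
    """Toy stand-in: return the sequence, optionally upper-cased."""
    return sequence.upper() if upper else sequence


# One entry per program; the generic interface hides this table.
PROGRAMS = {"degap": degap, "seqret": seqret}


def invoke(program, *args, **kwargs):
    """Single generic method: the first parameter names the program to run."""
    if program not in PROGRAMS:
        raise ValueError(f"unknown program: {program}")
    return PROGRAMS[program](*args, **kwargs)


degapped = invoke("degap", "AC--GT")
shouted = invoke("seqret", "acgt", upper=True)
```

The generic form keeps the interface tiny, but its signature says nothing about what any one program accepts or returns, so all of that knowledge has to live in the service description instead.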

40
Workflow specifications
Discovery
Classes of Service
Instantiate
Select instances
Execution
Invoked instance
Workflow enactment
41
Invocation
Workflow specifications
Discovery
Classes of Service
Registry
Discovery Instantiate
Select instances
Registry?
Execution
Invoked instance
Workflow enactment
Monitor
Terminate
42
Phases
Support for middleware to perform tasks such as
substitution, data transformation between
services, and automatic invocation of services
where the invocation model is not simple. A complex
model to explicitly describe every implementation
detail of the service or a binding to it.
Analogous to the DAML-S process model and grounding.
Workflow specifications
Discovery
Classes of Service
Discovery Instantiate
Select instances
Execution
Invoked instance
Workflow enactment
Monitor
Terminate
43
Invocation models
  • bioMoby forces services to have a single
    operation that completely encompasses the single
    task the service supports.
  • Each task may be in turn supported by a single
    operation
  • Soaplab: there is no one-to-one mapping between a
    single task and a single operation.
  • Can repurpose a service to be presented multiple
    times: a different wrapper for every view
  • Proliferation of views
  • Makes discovery easier
  • Reasoning that it's the same service as one
    running

44
Soaplab version of matcher: alignment_localmatcherderived
(wsdl)
  • createEmptyJob
  • get_detailed_status
  • get_report
  • get_outfile
  • set_gappenalty
  • set_sbegin1
  • set_sbegin2
  • set_send1
  • set_send2
  • set_sformat1
  • set_sformat2
  • set_slower1
  • set_slower2
  • set_snucleotide1
  • set_snucleotide2
  • set_sprotein1
  • set_sprotein2
  • set_sreverse1
  • set_sreverse2
  • set_sequenceb_usa
  • set_gaplength
  • set_alternatives
  • run
  • destroy
  • getStatus
  • describe
  • getInputSpec
  • getResultSpec
  • getAnalysisType
  • createJob
  • runNotifiable
  • createAndRun
  • createAndRunNotifiable
  • waitFor
  • runAndWaitFor
  • getResults
  • terminate
  • getLastEvent

45
Coordinating EMBOSS through Soaplab - WSFL
  • for each task
  • createJob(inputsMap)
  • run(...)
  • waitFor(...)
  • getResults(...)
  • destroy(...)

Workflow Engine
WSFL
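
The five calls per task can be sketched against an in-memory mock. This is not the Soaplab client: the slide elides the arguments with "...", so the job-id parameter, the inputs dictionary and the synchronous behaviour below are all assumptions made so the sequence of calls can be run.

```python
class FakeJobService:
    """In-memory mock of a job-style interface (illustrative only)."""

    def __init__(self, program):
        self._program = program      # plain function standing in for a tool
        self._jobs = {}
        self._next_id = 0

    def createJob(self, inputs):
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"inputs": dict(inputs), "results": None}
        return job_id

    def run(self, job_id):
        job = self._jobs[job_id]
        job["results"] = self._program(**job["inputs"])

    def waitFor(self, job_id):
        # The mock runs synchronously, so the job is already finished.
        return self._jobs[job_id]["results"] is not None

    def getResults(self, job_id):
        return self._jobs[job_id]["results"]

    def destroy(self, job_id):
        del self._jobs[job_id]


def enact_task(service, inputs):
    """The calls a workflow engine makes for each task, in order."""
    job_id = service.createJob(inputs)
    try:
        service.run(job_id)
        service.waitFor(job_id)
        return service.getResults(job_id)
    finally:
        service.destroy(job_id)      # always release the job


service = FakeJobService(lambda sequence: sequence.replace("-", ""))
result = enact_task(service, {"sequence": "AC--GT"})
```

Every task in the workflow repeats this five-step pattern, which is the bookkeeping a workflow author has to spell out when the invocation model is exposed directly.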
46
Coordinating EMBOSS through Soaplab - Scufl
  • for each task
  • run(operation, inputs)

Soaplab plugin
Workflow Engine
Scufl
47
Does the user ever see this?
  • If the user never has to deal with the invocation
    model
  • The DAML-S approach of splitting the information
    between two descriptions seems plausible.
  • Once the user has used the simpler profile, the
    middleware gets to work on the more complex
    process model and binding, or a myGrid workflow
    to actually translate the task into concrete
    service operation calls.
  • If the user does want to know what is going to
    happen
  • A more unified model with views for user and
    middleware seems more appropriate.
  • The downside is the cost of implementing the
    infrastructure to deliver the views.

48
Summary: Views
  • Two parallel but slightly redundant descriptions
    of the service
  • one for human discovery and one for middleware.
  • what DAML-S does.
  • OR
  • One common model which is complex and supports
    multiple tasks but have an extra layer that
    provides a view to support each specific task
  • intermediate representations, reasonables,
    perspectives, language generation.
  • The user sees the term protein sequence even
    though the underlying concept is far more
    explicit.
  • When this is transformed into the more complex
    pattern, the user may be prompted for attributes
    associated with the parent concept "data", even
    though the user never explicitly stated this was
    a kind of data.
  • The view approach used in GALEN and GONG.
  • The DAML-S profile is probably too complex to
    present to bioinformatics users.
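
The one-model-plus-views option can be sketched as a single explicit concept with two projections, one per audience. The class and function names below are assumptions for illustration; the definition text paraphrases the earlier "protein sequence" slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    label: str        # the familiar term shown to the user
    parent: str       # parent concept in the underlying model
    definition: str   # the explicit definition middleware reasons over


PROTEIN_SEQUENCE = Concept(
    label="protein sequence",
    parent="data",
    definition=("data structure conforming to some schema that encodes "
                "the sequence of amino acids in a protein molecule"),
)


def user_view(concept):
    """What a bioinformatician sees: just the familiar label."""
    return concept.label


def middleware_view(concept):
    """What the middleware sees: the full, explicit description."""
    return {
        "label": concept.label,
        "is_a": concept.parent,
        "definition": concept.definition,
    }
```

Both views read from the same object, so there is no second description to keep in step; the cost moves into writing and maintaining the view functions.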

49
Summary 2: human vs machine views
Human
Machine
Service User
Weak semantic descriptions Rewriting views
UDDI style advertisements
Human
Syntactic descriptions Semantic mining
Elaborate Semantic descriptions Simplification
views
Machine
Service provider
50
Discovery space
Classes and instances
Abstractions over a single description of a
service
Third party multiple viewpoints
People and machines
Multiple descriptions over a single service
Multiple tasks
51
Acknowledgements
Luc Moreau, Simon Miles, Keith Decker, Terry Payne,
Phil Lord, Chris Wroe, Robert Stevens, Kevin Garwood
http://www.mygrid.org.uk/