myGrid: Upper level Grid Services for the Bioinformatician


1
myGrid: Upper level Grid Services for the
Bioinformatician
  • Prof. Carole Goble
  • http://www.mygrid.org.uk
  • Sun Microsystems BioGrid Symposium,
    Baltimore, USA, 4th-5th December 2002

2
UK eScience Programme
  • Grid-enabled eScience
  • Emphasis on information integration and knowledge
    management
  • The Virtual Organisation view
  • 180 million industrial contributions
  • Complete infrastructure of regional eScience
    centres, support and a UK computational Grid
  • Started on Globus, though Unicore used in EuroGrid
    with great success
  • Centres donated equipment: highly heterogeneous
  • Core component of the EU Grid FP6 programme

3
myGrid
  • EPSRC UK eScience pilot project
  • 01/01/02 - end 30/03/05
  • Uses the UK Grid infrastructure

IBM
Lion BioSciences, Millennium Pharmaceuticals
Oracle
4

myGrid
  • Not a computational grid project
  • Building Grid middleware
  • Higher level services: workflow, databases,
    knowledge management, provenance
  • Service-based: Open Grid Service Architecture
    early adopter
  • Bioinformatics services are published as Web
    services and Grid Services
  • Working with publicly available biological
    resources e.g. EMBL-EBI

5
What is the Grid?
  • Resource sharing and coordinated problem solving in
    dynamic, multi-institutional virtual
    organizations
  • On-demand, ubiquitous access to computing, data,
    and all kinds of services
  • New capabilities constructed dynamically and
    transparently from distributed services
  • No central location, No central control, No
    existing trust relationships, Little
    predetermination
  • Uniformity, Pooling and Virtualisation

6
What is the Grid?
E-Scientist's Environment
  • In silico experiments
  • Information harvesting PSE
  • Dynamically forming virtual organisations to
    solve problems.
  • Describing, searching for and weaving resources:
    people, applications, db, content, instruments
  • Orchestrating resources
  • Support for scientific method: provenance,
    argumentation, opinion, contextualisation etc.
  • BioUtility communities of practice

Knowledge Grid
Information Grid
Data/Computation Grid
7
Information Weaving
  • Large amounts of different kinds of data and many
    applications.
  • Highly heterogeneous.
  • Different types, algorithms, forms,
    implementations, communities, service providers
  • High autonomy.
  • Highly complex and inter-related, volatile.
  • Much of it textual narrative

8
Circadian Rhythms
  • Has anyone else studied the effect of
    neurotransmitters on the circadian rhythms in
    Drosophila?
  • I've got a cluster of proteins from my
    experiment. How do their functions interrelate?
    And what are the proteins with a particular
    function?
  • Is a structure known for my protein? What other
    proteins have a similar structure?
  • Can I build a homology 3D model?
  • What is known about a homologous protein?

9
e-Science Q & A
  • Who else has asked this question? Can I
    use/adapt their approach?
  • Workflow.
  • What were the results at each stage?
  • Dynamic Data Repositories.
  • When was P12345 last updated?
  • Which BLAST did I use?
  • Provenance.
  • Has PDB changed since I last ran this?
  • Notification.

Personalisation.
10
Courtesy of Mark Wilkinson (BioMOBY)
11
myGrid
  • Service based architecture
  • Publication, discovery, interoperation,
    composition, decommissioning of myGrid services
  • Resource Interoperation
  • Workflow coordination and database integration.
  • Experimental workflows rather than production
    workflows.
  • Experimentation
  • Provenance and change propagation
  • Personalisation and collaborative working.
  • Security and ownership
  • Knowledge based using metadata and ontologies

12
Web Portal
Workbench
Carp Gene expression analysis
TALISMAN annotation workbench
BioMedical Services Library: DAS, workflow sets,
integrated databases
Upper level knowledge-based Grid Common Services:
Semantic integration, knowledge based querying,
workflow composition, visualisation, provenance mgt,
semantic service discovery
Middle level Grid Common Services: Database access,
distributed query processing, service discovery,
workflow enactment, event notification
Low level Grid Common Services (OGSI): Co-scheduling,
data shipping, authentication, job execution,
resource monitoring, database access
13
Bio Services
from Rick Stevens, Argonne Labs
  • Drug Discovery
  • Microbial Engineering
  • Molecular Ecology
  • Oncology Research
  • Sequence Annotation

Domain Oriented Services
  • Integrated Databases
  • Sequence Analysis
  • Protein Interactions
  • Cell Simulation

Basic BioGrid Services
Grid Resource Services
  • Compute Services
  • Pipeline Services
  • Data Archive Service
  • Database Hosting
  • Workflow Enactment
  • Event notification

Common Services
Base Services
Fabric Services
14
Who is myGrid for?
myGrid users
IS specialists
biologists
systems administrators
tool builders
infrequent
problem specific
service provider
bioinformaticians
bioinformatics tool builders
15
myGrid Outcomes
  • e-Scientists
  • Environment built on toolkits for service access,
    personalisation and community.
  • Talisman: Interpro family of pattern databases
    annotation
  • UTOPIA: visual multiple sequence alignment
  • Workbench for gene expression in Carp & Graves
    disease
  • Developers
  • Protocols and service descriptions.
  • myGrid-in-a-Box: developer's kit of core services.
  • Reference implementation services and applications.
  • Bio services.

16
Service based architecture
  • Each bio resource is a service
  • Database, archive, analysis, tool, person,
    instrument, a workflow
  • Each myGrid architectural component is a service
  • Workflow enactment engine, event notification,
    registry, scheduler
  • OGSA early adopter.

Open Grid Service Architecture
Web services
Grid protocols
17
Service Discovery
  • Find appropriate type of services
  • sequence alignment
  • Find appropriate instances of that service
  • BLAST (an algorithm for sequence alignment), as
    delivered by NCBI
  • Assist in forming an appropriate assembly of
    discovered services.
  • Find, select and execute instances of services
    while the workflow is being enacted.
  • Knowledge in the head of expert bioinformatician

18
Metadata & ontology
W3C RDF, DAML+OIL, OWL
  • Service registration, discovery, publication,
    composition, management.
  • Data types and ontologies
  • Service matchmaking
  • Ontology editor, deployment server and reasoner
  • Typing inputs and outputs of workflows
  • Semantic Database integration
  • Portal driving.

Semantic Web
OGSA
Web services
Grid protocols
19
1. User selects values from a drop down list to
create a property based description of their
required service. Values are constrained to
provide only sensible alternatives.
2. Once the user has entered a partial
description they submit it for matching. The
results are displayed below.
3. The user adds the operation to the growing
workflow.
4. The workflow specification is complete and
ready to match against those in the workflow
repository.
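
The matching in step 2 could be sketched roughly as follows. This is only an illustration, assuming services are described by flat property/value pairs; the service names, property names and the helpers `match_partial` and `allowed_values` are hypothetical and not taken from myGrid.

```python
# Hypothetical service descriptions: property -> value.
SERVICES = {
    "blastn_at_ncbi": {"task": "alignment", "input": "nucleotide_sequence", "provider": "NCBI"},
    "clustalw_at_ebi": {"task": "alignment", "input": "protein_sequence", "provider": "EBI"},
    "interpro_scan": {"task": "annotation", "input": "protein_sequence", "provider": "EBI"},
}


def match_partial(description, services=SERVICES):
    """Return names of services that agree with every property
    given in the (possibly partial) description."""
    return sorted(
        name
        for name, props in services.items()
        if all(props.get(key) == value for key, value in description.items())
    )


def allowed_values(prop, description, services=SERVICES):
    """Values of `prop` that remain sensible given the choices made so far,
    i.e. what a constrained drop-down list could offer next."""
    return sorted({services[name][prop] for name in match_partial(description, services)})
```

An empty description matches every service; each added property narrows the result, which is how the drop-down lists can be limited to sensible alternatives.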
20
Why have ontologies for services?
  • A shared vocabulary for describing a service
  • that can evolve and say as little or as much as
    necessary.
  • Service classifications
  • Service discovery, organisation and indexing
  • Service matching and substitution
  • BLAST: finds tblastx, tblastn, psi-blast, and
    marks_super_blast.
  • Alignment: finds ClustalW, Blast,
    Smith-Waterman, Needleman-Wunsch
  • Expanded selection of services presented based on
    expansion of in-hand object

21
Why have ontologies for services?
  • Controlling service composition
  • Outputs of service A semantically compatible with
    inputs of service B.
  • A service description is plausible.
  • Blastn compares a nucleotide query sequence
    against a nucleotide sequence database
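
The compatibility check between the outputs of service A and the inputs of service B might look like the sketch below. The type hierarchy, the service signatures and the function names are all hypothetical, chosen only to mirror the Blastn example; this is not the myGrid matcher.

```python
# Hypothetical data-type hierarchy: child -> parent.
TYPE_PARENT = {
    "accession": "data",
    "sequence": "data",
    "alignment_report": "data",
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
}

# Hypothetical signatures: service -> (input type, output type).
SIGNATURES = {
    "fetch_nucleotide": ("accession", "nucleotide_sequence"),
    "translate": ("nucleotide_sequence", "protein_sequence"),
    "blastn": ("nucleotide_sequence", "alignment_report"),
}


def is_subtype(specific, general):
    """True if `specific` is `general` or one of its descendants."""
    while specific is not None:
        if specific == general:
            return True
        specific = TYPE_PARENT.get(specific)
    return False


def can_follow(producer, consumer):
    """Service B may follow service A if A's output type is acceptable as B's input."""
    _, produced = SIGNATURES[producer]
    required, _ = SIGNATURES[consumer]
    return is_subtype(produced, required)


def incompatible_links(workflow):
    """Adjacent pairs in a linear workflow whose types do not line up."""
    return [(a, b) for a, b in zip(workflow, workflow[1:]) if not can_follow(a, b)]
```

Under these assumptions a protein sequence coming out of `translate` is rejected as input to `blastn`, which expects a nucleotide query.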

22
Integration & Coordination
  • View-based Information Repository for XML data
  • Database integration
  • Access XML and RDBMS with OGSA-DAI
  • Semantic database integration.
  • Distributed query processing.
  • Workflow
  • Dynamic workflow enactment engine.
  • Workflow repository
  • User interactivity.
  • Workflows linked with results

23
E-Science Support
  • Data provenance and resource change management
  • Workflow logs.
  • Event notification service.
  • Incremental view management.
  • Workflow and query evolution.
  • Personalisation
  • Management of views over repositories.
  • Personalisation of process flows.
  • Annotation of data sets and workflows
  • Dynamic creation of personal data sets.
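
As an illustration of the event notification idea, here is a small publish/subscribe sketch. The class, the topic name and the event fields are assumptions made for this example; the slides do not describe the interface of the myGrid notification service.

```python
class NotificationService:
    """Toy in-memory publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        """Deliver `event` to every subscriber of `topic`; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


# A user registers interest in a resource their workflow depends on.
stale_workflows = []
notifier = NotificationService()
notifier.subscribe(
    "PDB",
    lambda event: stale_workflows.append(("my_structure_workflow", event["release"])),
)

# The resource provider announces a change (hypothetical release label).
notified = notifier.publish("PDB", {"release": "example-release-2"})
```

After the publish call the workflow is flagged as stale, so the user does not have to poll to learn that the database changed since the last run.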

24
Bio-Science services
  • Grid-enabled BioServices by the EMBL-European
    Bioinformatics Institute
  • EMBOSS, SRS, Open BQS, BLAST, XEmbl and
    EmblFetch, Flybase, Gadfly
  • Applications using Gateway API
  • TALISMAN (annotation tool used by Interpro)
  • UTOPIA (sequence fingerprint analysis)
  • Portal
  • Workbench application

25
How do the functions of a cluster of proteins
interrelate?
  • Some proteins in my personal repository

26
Find services that take a protein and give its
functions, and pick the best match.
27
Find another that displays the proteins based on
their function. Ontology restricts inputs and
outputs.
28
Build a workflow of composed services linked
together
29
See if a workflow that is appropriate already
exists. It could have been made by anyone who will
share with you.
30
Pick one and enact it.
31
While it's running it picks the best service
instance that can run the service at that time.
32
While it's running it picks the best service
instance that can run the service at that
time. Or you choose.
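
Choosing a service instance during enactment could be sketched as below. The `Instance` fields and the "shortest queue wins" rule are assumptions for illustration; the slides do not say what criteria myGrid uses to rank instances.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    available: bool
    queue_length: int  # hypothetical load measure


def pick_instance(instances, user_choice=None):
    """Return the instance to run on: the user's choice if it is available,
    otherwise the available instance with the shortest queue."""
    candidates = [inst for inst in instances if inst.available]
    if not candidates:
        raise LookupError("no instance of this service is available right now")
    if user_choice is not None:
        for inst in candidates:
            if inst.name == user_choice:
                return inst
        raise LookupError(f"{user_choice!r} is not available")
    return min(candidates, key=lambda inst: inst.queue_length)


# Hypothetical instances of one service type.
blast_instances = [
    Instance("blast_site_a", available=True, queue_length=12),
    Instance("blast_site_b", available=True, queue_length=3),
    Instance("blast_site_c", available=False, queue_length=0),
]
```

The decision is made at call time rather than when the workflow is written, so the same workflow can bind to different instances on different runs.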
33
The workflow finishes with the final display
service
34
Results are put into your personal repository,
with a concept from the ontology to tell you and
myGrid what they mean.
35
And a full provenance record is kept, and linked
with the results. We could redo or reuse the
workflow.
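
A minimal sketch of keeping a provenance record alongside the results. The record layout, the function name and the toy string-processing steps are placeholders invented for this example; they stand in for real bioinformatics services and are not the myGrid provenance format.

```python
from datetime import datetime, timezone


def run_with_provenance(workflow_name, steps, data):
    """Run `steps` in order and log each invocation.

    `steps` is a list of (service name, version, callable) tuples.
    """
    log = []
    for service, version, func in steps:
        result = func(data)
        log.append({
            "service": service,
            "version": version,  # answers "which BLAST did I use?"
            "input": data,
            "output": result,    # intermediate result at this stage
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        data = result
    return {"workflow": workflow_name, "result": data, "provenance": log}


# Toy steps standing in for real services.
toy_steps = [
    ("to_upper", "1.0", str.upper),
    ("reverse", "0.2", lambda s: s[::-1]),
]
record = run_with_provenance("toy_workflow", toy_steps, "acgt")

# Which services and versions produced this result?
versions_used = [(entry["service"], entry["version"]) for entry in record["provenance"]]
```

Because the record names every step and its input, the same steps can be replayed later on the logged input to redo the run, or reused on new data.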
36
(No Transcript)
37
  • Programmable interface essential!

38
HPC vs Bioinformatics
  • Computational Biology vs Bioinformatics => HPC vs
    Info Grid
  • Relationship between them? Shared components?
    Architectures?
  • Information management matters! Accelerating
    scientific process is not just accelerating
    compute intensive processes.
  • HPC style BioGrid
  • Provenance? Personalisation? Metadata?
    Interactivity? Knowledge? Intermediate results to
    db annotated logs

39
We are not alone
  • Other efforts: we are not alone
  • W3C semantic web, BioMOBY, I3C, OMG LSR, active
    ontology development in the community, DARPA,
  • Open Grid Service Architecture
  • We believe!! Links with Web Services give many
    benefits.
  • But it's a moving target
  • GGF is a zoo: over 40 RGs and WGs, often
    overlapping.

40
Service Providers
  • It's hard to get Service Providers' buy-in
  • lower the barriers of entry
  • make it reliable.
  • security and intellectual property management
  • programmatic interfaces
  • How do we migrate legacy applications?
  • Whole bunch of apps and databases on the web
  • Accounting matters
  • Who is going to pay for all this?

41
Hotch potch
  • Heterogeneity sucks
  • Multi-policy of everything: security, access,
    accounting. Really matters in EU
  • Getting a UK Grid to work is non-trivial
  • Huge investment in system admin.
  • Doing more than you could do before.
  • Not just another predictable BLAST service over a
    bunch of machines
  • Non-predictable analysis.

42
Not a silver bullet!
  • It's just middleware, not magic
  • Data quality
  • Content management of databases (controlled
    vocabularies)
  • Provenance and versioning policies
  • Appropriate use of tools
  • Computational inaccessibility of free text
    annotation
  • Database accessibility through means other than
    point and click web interfaces.
  • Independent of the Grid!

43
Life Sciences Grid (LSG)
http://people.cs.uchicago.edu/dangulo/LSG/
44
The sum up
  • If you ignore the multi-organisational aspect of
    Grid
  • If you ignore the heterogeneous aspect of Grid
  • If you assume it's safe and free and fair
  • Then it's not so hard.

45
The myGrid Team
  • Carole Goble
  • Norman Paton
  • Alvaro Fernandes
  • Stephen Pettifer
  • Luc Moreau
  • Dave De Roure
  • Chris Greenhalgh
  • Tom Rodden
  • John Brooke
  • Paul Watson
  • Alan Robinson
  • Rob Gaizauskas
  • Robert Stevens
  • Neil Wipat
  • Matthew Addis
  • Nick Sharman
  • Rich Cawley
  • Simon Harper
  • Karon Mee
  • Simon Miles
  • Vijay Dailani
  • Xiaojian Liu
  • Tom Oinn
  • Martin Senger
  • Milena Radenkovic
  • Kevin Glover
  • Angus Roberts
  • Chris Wroe
  • Mark Greenwood
  • Phil Lord
  • Neil Davis
  • Darren Marvin
  • Justin Ferris
  • Peter Li
  • Nedim Alpdemir
  • Luca Toldo
  • Robin McEntire
  • Anne Westcott
  • Tony Storey
  • Bernard Horan
  • Paul Smart
  • Robert Haynes

46
Spares
47
Applications
e-Scientist environment
Knowledge applications networks
Text mining
Annotation
Collaboratory
Prediction
Knowledge services
Knowledge Services
Knowledge-based information services
Semantic services
Knowledge-based data/computation services
Base services
Data/computation services
Information services
Resources
48
Sequence annotation
MSD
Cold Carp Gene Expression
Custom Application Demonstrator
Workbench Demonstrator
Application UTOPIA
Workbench
Apps Builder (Talisman)
User Agent
Presentation Services
Collaboration Support
Management Tools
Web Portal
Gateway API
Provenance
Personalisation
Security
BioMedical Services Library e.g. Distributed
Annotation Service
Semantic Discovery
Information Extraction
Knowledge
Provenance Validation Assessment
Semantic aware services
Semantic Workflow Design
Ontology Service
Semantic Data Integration
Reasoner
Service matcher

Provenance metadata
Annotation
Base Services

Distributed Query
Workflow Enactment
QoS
Syntactic Discovery
Availability
Versioning
Preferences
Fabric
Event Notification

JobExecution
Third Party
MIR
White Pages Yellow Pages Discovery
Device Access
Database Access
Metadata
Database
myGrid Stack
49
myGrid Stack 0.1 (diagram; component labels identical to the myGrid Stack slide above)
50
myGrid Stack 0.2 (diagram; component labels identical to the myGrid Stack slide above)
51
Service based architecture
  • Find them: publication, registration, discovery,
    matchmaking, deregistration.
  • Run them: execution, monitoring, exception
    handling.
  • Organise them: interoperation, composition,
    substitution.