myGrid and Taverna: Now and in the Future - PowerPoint PPT Presentation

1
myGrid and Taverna: Now and in the Future
  • Dr. K. Wolstencroft
  • University of Manchester
  • Helsinki, June 2006

2
Background
  • myGrid: middleware components to support in silico
    experiments in biology
  • Originally designed to support bioinformatics
  • chemoinformatics
  • health informatics
  • medical imaging
  • integrative biology

3
History
EPSRC funded UK eScience Program Pilot Project
4
myGrid in OMII-UK
10 developers: a dedicated design, implementation,
testing and support team moving towards
production-quality software
myGrid
OMII Stack
OGSA-DAI
March 2006
5
Lots of Resources
NAR 2006: over 850 databases
6
The User Community
  • Bioinformatics is an open Community
  • Open access to data
  • Open access to resources
  • Open access to tools
  • Open access to applications
  • Global in silico biological research

7
The User Community: Problems
  • Everything is Distributed
  • Data, Resources and Scientists
  • Heterogeneous data
  • Very few standards
  • I/O formats, data representation, annotation
  • Everything is a string!
  • Integration of data and interoperability of
    resources is difficult

8
ID   MURA_BACSU STANDARD PRT 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA FIRMICUTES BACILLUS/CLOSTRIDIUM GROUP BACILLACEAE
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS CELL WALL TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA 46016 MW 02018C5C CRC32
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
9
myGrid Approach - Workflows
  • General technique for describing and enacting a
    process
  • Describes what you want to do, not how you want
    to do it
  • A simple language specifies how bioinformatics
    processes fit together; the processes are web
    services
  • The high-level workflow diagram is separated from
    any lower-level coding, so you don't have to
    be a coder to build workflows

Predicted Genes out
Sequence
RepeatMasker Web service
GenScan Web Service
Blast Web Service
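
As a rough illustration of "what, not how", the diagram above can be read as a declared chain of services that an enactor runs in the order listed. The sketch below is plain Python, not Scufl; the three functions are hypothetical stand-ins for the RepeatMasker, GenScan and Blast web services and perform no real analysis.

```python
from typing import Any, Callable, List

# Hypothetical stand-ins for the remote services named on the slide.
def repeat_masker(sequence: str) -> str:
    """Pretend to mask repeats: lower-case bases are replaced by N."""
    return "".join("N" if base.islower() else base for base in sequence)

def genscan(masked: str) -> List[str]:
    """Pretend gene prediction: keep the unmasked stretches."""
    return [part for part in masked.split("N") if part]

def blast(genes: List[str]) -> List[str]:
    """Pretend similarity search: label each predicted gene."""
    return [f"hit:{gene}" for gene in genes]

# The workflow is only a declaration of which steps are connected.
WORKFLOW: List[Callable[[Any], Any]] = [repeat_masker, genscan, blast]

def enact(workflow: List[Callable[[Any], Any]], data: Any) -> Any:
    """Minimal enactor: feed each step's output into the next step."""
    for step in workflow:
        data = step(data)
    return data

predicted = enact(WORKFLOW, "ACGTacgtGGCC")
```

Swapping a service means editing the `WORKFLOW` list, not the enactor, which is the separation the slide describes.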
10
SCUFL
Taverna Workbench
Application data flow layer: Scufl graph, service
introspection
Scufl Workflow Object Model
Execution flow layer: list management, implicit
iteration mechanism, MIME semantic type
decoration, fault management, service alternates
Workflow Execution
Freefluo Workflow enactor
Processor invocation layer
Processors: BioMOBY, Plain Web Service, Soaplab,
SeqHound, BioMART, Local App, Enactor
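
The "implicit iteration mechanism" in the execution flow layer can be pictured as follows: a processor written for a single item is applied element by element when it is handed a list. This is a simplified sketch, not Taverna's implementation; the `invoke` helper and `sequence_length` processor are hypothetical, and cases with several list inputs are ignored.

```python
from typing import Any, Callable

def invoke(processor: Callable[[Any], Any], value: Any) -> Any:
    """Apply a single-item processor, iterating implicitly over lists."""
    if isinstance(value, list):
        # Recurse so that nested lists keep their structure.
        return [invoke(processor, item) for item in value]
    return processor(value)

def sequence_length(sequence: str) -> int:
    """A processor that only knows how to handle one sequence."""
    return len(sequence)

single = invoke(sequence_length, "MEKLNIAGGD")
many = invoke(sequence_length, ["MEKL", "NIAGGD"])
nested = invoke(sequence_length, [["ME"], ["KLN", "I"]])
```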
11
Taverna Workflow Components
Freefluo: workflow engine to run workflows
Scufl: Simple Conceptual Unified Flow Language
Taverna: writing and running workflows, examining results
Soaplab: makes applications available
12
What Services we Support
13
User Interaction Handling
  • Interaction Service and corresponding Taverna
    processor allows a workflow to call out to an
    expert human user
  • Used to embed the Artemis annotation editor
    within an otherwise automated genome annotation
    pipeline
  • Collaboration with the University of Bergen
  • Ref: poster, NETTAB 2005
  • R for numerical analysis (microarray informatics
    amongst others)

14
What shall I do when a service fails?
  • Most services are owned by other people
  • No control over service failure
  • Some are research level
  • Workflows are only as good as the services they
    connect!
  • To help - Taverna can
  • Notify failures
  • Instigate retries
  • Set criticality
  • Substitute services
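
The four behaviours listed above (notify, retry, criticality, substitution) can be sketched in a few lines. This is an illustrative Python outline under assumed names, not Taverna code: `ServiceError`, `call_with_fallback`, `primary` and `mirror` are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence

class ServiceError(Exception):
    """Raised by a (simulated) remote service when a call fails."""

def call_with_fallback(
    services: Sequence[Callable[[str], str]],
    payload: str,
    attempts: int = 3,
    critical: bool = True,
    log: Optional[List[str]] = None,
) -> Optional[str]:
    """Try each service in turn, retrying each one up to `attempts` times."""
    log = log if log is not None else []
    for service in services:              # substitute services, in order
        for attempt in range(1, attempts + 1):
            try:
                return service(payload)
            except ServiceError as err:   # notify the failure
                log.append(f"{service.__name__} attempt {attempt}: {err}")
    if critical:
        # A critical step aborts the workflow once every option is exhausted.
        raise ServiceError("all services failed")
    return None                           # a non-critical step is skipped

def primary(sequence: str) -> str:
    raise ServiceError("timeout")         # simulated outage

def mirror(sequence: str) -> str:
    return sequence.upper()

failures: List[str] = []
result = call_with_fallback([primary, mirror], "acgt", attempts=2, log=failures)
skipped = call_with_fallback([primary], "acgt", attempts=1, critical=False)
```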

15
myGrid Users
  • 20000 downloads
  • Users in US, Singapore, UK, Europe, Australia
  • Systems biology
  • Proteomics
  • Gene/protein annotation
  • Microarray data analysis
  • Medical image analysis

16
Trypanosomiasis Study
  • Resistance to trypanosomiasis in cattle in Kenya
  • Andy Brass, Paul Fisher, University of Manchester
  • Form of sleeping sickness in cattle
  • Known as ngana
  • Caused by Trypanosoma brucei

17
Study involves
  • Microarray data
  • QTL
  • SNPs
  • Metabolic pathway analysis
  • Need to access microarray data, genomic sequence
    information, pathway databases AND integrate the
    results

19
Workflow Reuse
myGrid Workflow Repository: http://workflows.mygrid.org.uk/repository
21
Data Management
  • Workflows can generate vast amounts of data - how
    can we manage and track it?
  • Data AND metadata AND experiment provenance
  • LSIDs - to identify objects
  • Semantic Web technologies (RDF, Ontologies)
  • To store knowledge provenance
  • Taverna workflow workbench & plugins
  • Ensure automated recording
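
An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional trailing revision. The sketch below splits one into its parts; the `parse_lsid` helper and the example.org identifier are hypothetical and are not taken from myGrid.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into fields."""
    parts = text.split(":")
    prefix = [part.lower() for part in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

# Made-up identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:sequence:12345:2")
```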

22
KAVE: Data and metadata management
  • Life Science Identifiers (LSIDs)
  • Information Model
  • File management
  • Support for custom database building
  • Provenance metadata capture using RDF
  • SRB integration
  • OGSA-DAI integration
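
To give a feel for provenance capture as RDF-style statements, the sketch below records one pair of subject-predicate-object triples per processor invocation and then walks them backwards to recover a result's lineage. It uses plain tuples instead of an RDF library; the `ProvenanceLog` class, the `ex:` predicates and the identifiers are hypothetical and do not reflect KAVE's actual schema.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

class ProvenanceLog:
    """Collects triples automatically as each processor runs."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def record(self, processor: str, input_id: str, output_id: str) -> None:
        self.triples.append((output_id, "ex:derivedFrom", input_id))
        self.triples.append((output_id, "ex:generatedBy", processor))

    def lineage(self, data_id: str) -> List[str]:
        """Follow ex:derivedFrom links back to the original input."""
        parents = {s: o for s, p, o in self.triples if p == "ex:derivedFrom"}
        chain = [data_id]
        while chain[-1] in parents:
            chain.append(parents[chain[-1]])
        return chain

log = ProvenanceLog()
log.record("RepeatMasker", "urn:lsid:example.org:data:1", "urn:lsid:example.org:data:2")
log.record("GenScan", "urn:lsid:example.org:data:2", "urn:lsid:example.org:data:3")
history = log.lineage("urn:lsid:example.org:data:3")
```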

23
Provenance Browsing in Taverna
New in Taverna 1.4
24
Feta: Semantic Discovery
  • Over 3000 services!
  • Find services by their function
  • Questions we can ask
  • Find me all the services that perform a multiple
    sequence alignment and accept protein sequences
    in FASTA format as input
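
The example question above amounts to filtering service descriptions by task and by input type. The sketch below uses exact matching on hypothetical descriptions; an ontology-based search could additionally match more specific or more general terms, which this sketch does not attempt. All names (`ServiceDescription`, `REGISTRY`, `find_services`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ServiceDescription:
    name: str
    task: str
    input_type: str
    input_format: str

# A made-up registry of annotated services.
REGISTRY = [
    ServiceDescription("align_proteins", "multiple_sequence_alignment",
                       "protein_sequence", "FASTA"),
    ServiceDescription("align_dna", "multiple_sequence_alignment",
                       "dna_sequence", "FASTA"),
    ServiceDescription("search_proteins", "similarity_search",
                       "protein_sequence", "FASTA"),
]

def find_services(registry: List[ServiceDescription],
                  **criteria: str) -> List[ServiceDescription]:
    """Return the services whose description matches every criterion."""
    return [
        service for service in registry
        if all(getattr(service, key) == value for key, value in criteria.items())
    ]

matches = find_services(
    REGISTRY,
    task="multiple_sequence_alignment",
    input_type="protein_sequence",
    input_format="FASTA",
)
```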

25
myGrid Ontology
Specialises
Upper level ontology
Contributes to
Task ontology
Informatics ontology
Molecular Biology ontology
Bioinformatics ontology
Web Service ontology
26
Feta Architecture
Feta Descriptions
Obtain descriptions
Taverna Workbench
Obtain Classification
Feta GUI Client
Feta Engine Service
Ontology Editor
Semantic Discovery
Classification - In RDF(S) -
Build myGrid Domain Ontology
27
Annotations
  • Feta has been available for 1 year
  • Not yet in the release
  • Need critical mass of services before release
  • Annotation experiments with users and domain
    experts
  • Domain expert annotations much better
  • Hiring a full-time annotator; see the myGrid
    website for details

28
Results Integration
Smarter workflow design incorporating
visualisation; VBI collaboration
29
Visualisation
SeqVista
Utopia
30
New Plans for Taverna 2.0
31
Evolving challenges
  • Long running data intensive workflows
  • Manipulation of confidential or otherwise
    protected information
  • Use with classical grid systems
  • Interaction with users during workflows

32
Development
  • Development of Taverna 2.0
  • reworking of the processor model to include dual
    execution semantics incorporating data and
    control flow
  • enhanced support for long-running workflows
  • fully distributed workflow enactment and
    authoring
  • User steering
  • large scale data transfer

33
Enhanced Processor Model
  • Modular dispatcher mechanism
  • Dynamic service binding
  • Recursive invocation
  • Data filter implementation
  • Retry, failover, back-off behaviours
  • Transparent third party data transfers
  • High throughput stream handling with implicit
    iteration semantics

34
3rd Party Data Transfers
  • Allows in place referencing of data
  • Large data sets no longer round-trip between
    workflow engine and data provider
  • Allows restricted access to sensitive data
  • Automatic de-reference when a reference type is
    linked to a value type within a workflow.
  • Connecting a grid service to a web service
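
The automatic de-reference described above can be sketched as a check at link time: if a reference arrives at a consumer that only accepts values, it is resolved first; otherwise it is passed on untouched and the data never moves. Everything here (`STORE`, `Reference`, `link` and the two services) is a hypothetical illustration, not the Taverna 2.0 design.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Stand-in for a remote data provider holding a large data set.
STORE = {"ref:1": "ACGT" * 3}

@dataclass(frozen=True)
class Reference:
    identifier: str

    def resolve(self) -> str:
        """Fetch the actual value from the data provider."""
        return STORE[self.identifier]

def link(data: Any, consumer: Callable[[Any], Any], accepts_reference: bool) -> Any:
    """Pass data to a consumer, de-referencing only when required."""
    if isinstance(data, Reference) and not accepts_reference:
        data = data.resolve()
    return consumer(data)

def grid_service(ref: Reference) -> str:
    """Works on the data in place, so it only needs the reference."""
    return f"processed {ref.identifier} in place"

def web_service(value: str) -> int:
    """Needs the actual value."""
    return len(value)

in_place = link(Reference("ref:1"), grid_service, accepts_reference=True)
by_value = link(Reference("ref:1"), web_service, accepts_reference=False)
```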

35
Streaming Data
  • Allow execution of downstream workflow stages on
    partially complete results from upstream.

Service 1
Service 2
Service 3
Non-streaming (Taverna 1): the entire iteration must
complete at each stage
Streamed data: Service 2 starts operating on
partial results from Service 1
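
The difference between the two modes shows up in the order in which services touch items. The sketch below uses Python generators as a stand-in for streaming and only two of the three services for brevity; the service bodies are hypothetical arithmetic placeholders.

```python
from typing import Iterable, Iterator, List

events: List[str] = []  # records the order in which services handle items

def service1(items: Iterable[int]) -> Iterator[int]:
    for item in items:
        events.append(f"service1 handles {item}")
        yield item * 2

def service2(items: Iterable[int]) -> Iterator[int]:
    for item in items:
        events.append(f"service2 handles {item}")
        yield item + 1

def run_batch(inputs: List[int]) -> List[int]:
    """Non-streaming: stage 1 finishes the whole list before stage 2 starts."""
    events.clear()
    stage1 = list(service1(inputs))
    return list(service2(stage1))

def run_streamed(inputs: List[int]) -> List[int]:
    """Streaming: stage 2 consumes each result as soon as stage 1 yields it."""
    events.clear()
    return list(service2(service1(inputs)))

batch_result = run_batch([1, 2])
batch_order = list(events)
streamed_result = run_streamed([1, 2])
streamed_order = list(events)
```

Both runs produce the same results; only the interleaving recorded in `batch_order` and `streamed_order` differs.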
36
Recursive Invocation
Receive Input
  • Dispatcher allowing recursive invocation to be
    plugged into per operation semantics.

Return Result
37
Future Direction
  • Enhancements to the Workflow Core
  • Enhancements to user interface and experience
  • Expanded use of semantic web technologies
  • Code remains open source and always will

38
Latest News
  • See plans for Taverna 2.0 on myGrid wiki
  • Taverna development is user-driven
  • Please keep in touch and tell us what you would
    like to see via the myGrid mailing lists: Taverna
    Users and Taverna Hackers
  • Bioinformatics curator for service annotation
  • Details on the myGrid website

39
Acknowledgements
  • The myGrid group Past and Present
  • OMII-UK
  • Carole Goble
  • Pinar Alper
  • Tom Oinn
  • Antoon Goderis
  • Matthew Gamble
  • Daniele Turi