The myGrid Project

1
The myGrid Project
  • Professor Chris Greenhalgh
  • University of Nottingham

2
  • Open Source Upper Middleware for Bioinformatics
  • (Web) Service-based architecture
  • Targeted at Tool Developers, Bioinformaticians
    and Service Providers

Sites: Newcastle, Sheffield, Manchester, Nottingham, Hinxton, Southampton
3
Philosophy
  • Openness
  • open source
  • open world of services
  • open to wider eScience context
  • open to user feedback
  • open to third party metadata
  • Collection of components for assembly
  • Pick and mix

4
Data-intensive bioinformatics
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
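In a flat-file entry of this kind, each field line begins with a two-letter line code (ID, DE, OS, SQ and so on). The sketch below groups lines by that code. It assumes one field per line, as in the Swiss-Prot flat-file layout; `parse_entry` and the shortened sample record are illustrative and not part of myGrid.

```python
from collections import defaultdict

def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        code, _, value = line.partition(" ")
        if len(code) == 2 and code.isupper():
            fields[code].append(value.strip())
        else:
            # Indented sequence lines carry no line code; file them under SQ.
            fields["SQ"].append(line.strip())
    return dict(fields)

# Shortened sample record (illustrative; most fields and residues omitted).
SAMPLE = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP
"""

entry = parse_entry(SAMPLE)
entry_name = entry["ID"][0].split()[0]
organism = entry["OS"][0].rstrip(".")
```

Keeping every field as a list of strings keeps the sketch small; a real parser would also split the feature table (FT) columns and join the sequence blocks.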
5
Use Scenarios
  • Graves' Disease
  • Autoimmune disease of the thyroid
  • Simon Pearce and Claire Jennings, Institute of
    Human Genetics, School of Clinical Medical
    Sciences, University of Newcastle
  • Discover all you can about a gene
  • Annotation pipelines and Gene expression analysis
  • Services from Japan, Hong Kong, various sites in
    UK
  • Williams-Beuren Syndrome
  • Microdeletion of 1.5 Mbases on Chromosome 7
  • Hannah Tipney, May Tassabehji, Andy Brass, St
    Mary's Hospital, Manchester, UK
  • Characterise an unknown gene
  • Annotation pipelines and Gene expression analysis
  • Services from USA, Japan, various sites in UK

6
Williams-Beuren Syndrome Microdeletion
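[Figure: map of the Williams-Beuren Syndrome region, a 1.5 Mb interval at 7q11.23 on chromosome 7 (Chr 7, 155 Mb), with patient deletions marked for WBS and SVAS.]
Block labels: C-cen, A-cen, B-cen, C-mid, B-mid, A-mid, B-tel, A-tel, C-tel
Genes: WBSCR1/EIF4H, WBSCR5/LAB, GTF2IRD1, WBSCR21, WBSCR18, WBSCR22, WBSCR14, POM121, GTF2IRD2, BCL7B, BAZ1B, NOLR1, GTF2I, FKBP6, CYLN2, CLDN4, CLDN3, STX1A, LIMK1, NCF1, RFC2, TBL2, FZD9, ELN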
7
Manually filling a genomic gap
  • Numerous web-based services (e.g. BLAST,
    RepeatMasker)
  • Cutting and pasting
  • Large number of steps
  • Frequently repeated: info is now rapidly added to
    public databases
  • Don't always get results
  • Time consuming
  • Huge amount of interrelated data is produced,
    handled in lab book and files saved to local hard
    drive
  • Mundane
  • Much knowledge remains undocumented: the
    bioinformatician does the analysis

8
WBS Workflows
[Figure: WBS workflow diagram. Input: query nucleotide sequence. Colour key: pink = outputs/inputs of a service; purple = tailor-made services; green = EMBOSS Soaplab services; yellow = Manchester Soaplab services; grey = unknowns.]
Services and annotations shown in the diagram:
  • ncbiBlastWrapper: blastn vs nr and est databases; tblastn vs nr,
    est, est_mouse and est_human databases; blastp vs nr
  • RepeatMasker: repetitive elements
  • GenScan: coding sequence
  • sixpack: 6 ORFs
  • transeq: amino acid translation
  • prettyseq: translation/sequence file, good for records and
    publications
  • epestfind: identifies PEST sequences
  • pscan: identifies FingerPRINTS
  • pepstats: MW, length, charge, pI, etc.
  • pepcoil: predicts coiled-coil regions
  • restrict: restriction enzyme map
  • cpgreport: CpG island locations and [truncated]
  • SignalP, TargetP, PSORTII: predict cellular location
  • InterPro, PFAM, Prosite, Smart: identify functional and structural
    domains/motifs
  • Pepwindow? Octanol?: hydrophobic regions
Also shown: seqret; data items GenBank accession number, URL including GB identifier, GenBank entry, nucleotide sequence (FASTA), ORFs; a step to sort for appropriate sequences only.
9
Workflow approach: in-silico experiments
  • Williams-Beuren Syndrome
  • Manually: takes two days including analysis
  • Now: takes 30 mins to produce results and half a
    day for analysis
  • Manually: do analysis as you perform the experiment
  • Workflow: do analysis at end of experiment
  • Therefore need good result co-ordination for
    back-tracking
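Because a workflow defers analysis to the end of the run, each result has to stay traceable to the step and input that produced it. The sketch below shows one way to record that. It assumes plain Python functions standing in for remote services; the step names and toy functions are hypothetical and do not call BLAST or RepeatMasker.

```python
def run_workflow(steps, initial):
    """Run steps in order, recording which input produced which output."""
    log = []
    data = initial
    for name, func in steps:
        result = func(data)
        log.append({"step": name, "input": data, "output": result})
        data = result
    return data, log

def backtrack(log, output):
    """Return the chain of step names that led to a given output."""
    chain = []
    for record in reversed(log):
        if record["output"] == output:
            chain.append(record["step"])
            output = record["input"]
    return list(reversed(chain))

# Toy stand-ins for services (hypothetical, purely local).
steps = [
    ("mask_repeats", lambda seq: seq.replace("AAAA", "NNNN")),
    ("to_upper", lambda seq: seq.upper()),
    ("length", lambda seq: len(seq)),
]

final, log = run_workflow(steps, "acgtAAAAacgt")
path = backtrack(log, final)
```

Here `path` lists the steps behind the final value, which is the kind of back-tracking the analysis stage needs once all results arrive together.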

10
(e-)Scientists
  • Experiment
  • Can workflow be used as an experimental method?
  • How many times has this experiment been run?
  • Analyze
  • How do we manage the results to draw conclusions
    from them?
  • How reliable are these results?
  • Collaborate
  • Can we share workflows, results, metadata etc?
  • Publish
  • Can we link to these workflows and results from
    our papers?
  • Review
  • Can I find, comprehend and review your work?
  • How was that result derived?

11
myGrid Service Stack
[Figure: layered diagram of the myGrid service stack, top to bottom]
Applications: Work bench, Taverna, Talisman, Web Portal
Gateway
Core services: Personalisation, Service and Workflow Discovery, Registries, Provenance, Event Notification, Ontology Mgt, Ontologies, Metadata Mgt, Views, myGrid Information Repository, FreeFluo Workflow Enactment Engine, OGSA-DQP Distributed Query Processor
Web Service (Grid Service) communication fabric
External services: AMBIT Text Extraction Service, Native Web Services, SoapLab (legacy apps), GowLab (legacy apps)
14
FreeFluo Features
  • Control flow, iteration and data flow
  • Data sets and nested flows
  • Configurable failure handling
  • Incorporated Life Science Id resolution
  • Provenance and status reporting
  • Type and data management
  • Plug-ins
  • User notification
  • Data entry wizard
  • Libraries of SHIM services
  • Libraries of workflows

15
Domain Services
  • Native WSDL Web services
  • DDBJ, NCBI BLAST, PathPort, BioMOBY
  • Wrapped legacy services
  • SoapLab
  • GowLab
  • Web pages as web services
  • One button wrapping
  • Leveraged the EMBOSS Suite
  • 159 services
  • Lots of them and lots of redundant services
  • The joys of firewalls and licensing

For each application: CreateJob, Run, WaitFor, GetResults, Destroy
EBI support: agreed to support Soaplab services as
core business
http://industry.ebi.ac.uk/soaplab/
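The per-application lifecycle named on this slide (CreateJob, Run, WaitFor, GetResults, Destroy) can be pictured with a local stand-in. This is a sketch only: `FakeJob` is a hypothetical in-memory class, not the Soaplab client API, and its method names simply mirror the five steps.

```python
class FakeJob:
    """In-memory stand-in for a remote analysis job (hypothetical)."""

    def __init__(self, app, inputs):
        self.app = app
        self.inputs = inputs
        self.state = "CREATED"
        self.results = None

    def run(self):
        self.state = "RUNNING"

    def wait_for(self):
        # A real client would block on, or poll, the remote service here.
        if self.state != "RUNNING":
            raise RuntimeError("job was not started")
        count = len(self.inputs["sequence"])
        self.results = {"report": f"{self.app} processed {count} residues"}
        self.state = "COMPLETED"

    def get_results(self):
        if self.state != "COMPLETED":
            raise RuntimeError("results not ready")
        return self.results

    def destroy(self):
        self.results = None
        self.state = "DESTROYED"


def run_application(app, inputs):
    """Drive one job through the five lifecycle steps."""
    job = FakeJob(app, inputs)        # CreateJob
    try:
        job.run()                     # Run
        job.wait_for()                # WaitFor
        return job.get_results()      # GetResults
    finally:
        job.destroy()                 # Destroy runs even if a step fails


results = run_application("pepstats", {"sequence": "MEKLNIAGGD"})
```

Putting Destroy in a `finally` block is a design choice of this sketch: it releases the job even when an earlier step raises.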
16
Two Paths
  • Innovative work
  • Service and workflow registration
  • Semantic discovery
  • Provenance management
  • Text mining
  • Core functionality
  • Services: Soaplab and Gowlab
  • Workflow enactment engine: Freefluo
  • Workflow workbench: Taverna
  • Data integration: OGSA-DQP
  • Information model management
  • In between
  • Event notification
  • Gateway

17
Drilling Down: myGrid and Semantics
  • Workflow and service discovery
  • Prior to and during enactment
  • Semantic registration
  • Workflow assembly
  • Semantic service typing of inputs and outputs
  • Provenance of workflows and other entities
  • Experimental metadata glue
  • Use of RDF, RDFS, DAML+OIL/OWL
  • Instance store, ontology server, reasoner
  • Materialised vs. at-point-of-delivery reasoning
  • myGrid Information Model

18
Provenance (1)
[Figure: provenance model in three levels, linked by named relationships]
Organisation level provenance: Person, Organisation, Project, Experiment design
Process level provenance: Workflow design, Workflow run, Process, Service, Event
  • runBy: e.g. BLAST @ NCBI
  • componentProcess: e.g. web service invocation of BLAST @ NCBI
  • componentEvent: e.g. completion of a web service invocation at
    12.04pm
  • partOf, instanceOf, run for
Data/knowledge level provenance: Data items
  • knowledge statements: e.g. "similar protein sequence to"
  • data derivation: e.g. output data derived from input data
User can add templates to each workflow process
to determine links between data items.
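The relationships on this slide can be read as subject, predicate, object statements of the kind RDF records. The sketch below uses plain tuples rather than an RDF library. All identifiers are hypothetical; `runBy`, `partOf` and `instanceOf` are taken from the slide, while `derivedFrom` is an assumed name for the data derivation link.

```python
# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    ("workflow_run_1", "instanceOf", "workflow_design_A"),
    ("process_1", "partOf", "workflow_run_1"),
    ("process_1", "runBy", "BLAST@NCBI"),
    ("blast_report_1", "derivedFrom", "query_sequence_1"),
    ("hit_list_1", "derivedFrom", "blast_report_1"),
]

def objects(subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def lineage(item):
    """Follow derivedFrom links back towards the original inputs."""
    chain = []
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for source in objects(current, "derivedFrom"):
            chain.append(source)
            frontier.append(source)
    return chain

history = lineage("hit_list_1")
```

Answering "how was that result derived?" then becomes a walk over the derivation links, as `lineage` does here.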
19
RDF Rules
Relationship BLAST report has with other items in
the repository
Other classes of information related to BLAST
report
20
Information Model v2
Bioinformatics middleware domain neutral
  • Scientific data and the life-science identifier
  • Types
  • Identifier Types
  • Values and Documents
  • Provenance information
  • Annotation and Argumentation
  • Resources and Identifiers
  • People, teams and organizations
  • Representing the e-science process
  • Experimental methods for e-science

In the middle of deployment
21
LSIDs
http://www.i3c.org/wgr/ta/resources/lsid/docs/
  • LSID provides a uniform naming scheme.
  • LSID Resolver guarantees to resolve to same data
    object.
  • LSID Authority dishes them out.
  • Also returns metadata of object.
  • Used throughout myGrid as an object naming
    device.
  • myGrid Repository acts as an LSID Authority
  • LSID allows universal access to results for
    collaboration, as well as for review.
  • RDF+LSID explains the context of results, and
    provides guidance for further investigations.

I3C / IBM / EBI proposal for a Life Science
Identifier
Pioneered by myGrid
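An LSID is a URN of the form `urn:lsid:authority:namespace:object`, with an optional trailing revision. The sketch below splits one into its parts. The example identifier and the `parse_lsid` helper are hypothetical; resolving an LSID to data or metadata needs a resolver service and is not shown.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(text):
    """Split an LSID string into authority, namespace, object and revision."""
    parts = text.split(":")
    prefix = [p.lower() for p in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts[2:5]):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

# Hypothetical identifier, for illustration only.
lsid = parse_lsid("urn:lsid:example.org:blast_reports:12345:1")
```

The authority part names who issued the identifier, which is the role the slide assigns to the myGrid repository for its own results.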
22
Using Haystack
23
In a nutshell
Pre-Prototype
  • Experimental
  • Web-based
  • Requirements gathering
Prototype 1
  • Demo at ISMB 2003
  • Architectural workout
  • All services represented
  • NetBeans workbench
  • API-based integration
  • Info Repository oriented
  • XML-based process provenance
  • Workflow enactment engine
  • Full paper and demo at ISMB 2004
  • GSK deployment
  • Real biology
24
To Dos
  • Improve results management
  • Deployment of mIR
  • Portal for finding workflows, launching and
    monitoring workflows, launching Taverna, browsing
    results
  • Deploying publicly accessible semantic registry
  • Reinstate service discovery during enactment
  • Large scale data throughput workflow engine
  • Event notification on services
  • Using provenance graphs for impact analysis
  • Hiding LSIDs
  • Lexicons for concept names
  • Hardening semantic discovery
  • Ambient Text
  • Er..Security
  • Etc
  • myGrid in a box

25
Ongoing/Future Activities
  • myGrid-in-a-box
  • Technical follow-ons
  • Best practice (6) and OMII (Freefluo,Taverna,
    Event notification) bids
  • Research follow-ons
  • Semantic Grids, Data Grids, Workflow, Provenance
    services
  • PhD students
  • Science follow-ons
  • Life Sciences: ISPIDER, e-Fungi
  • Clinical: PsyGrid, CLEF-II
  • PhD students
  • Networking
  • Link-up with BIRN/SEEK/GEON (SDSC), SCEC/GriPhyN
    (ISI, USC)

26
Wrap Up
  • Managed the transition from generic middleware
    development to practical day to day useful
    services
  • Real users (plural) fundamental to that
  • End to end support for an entire scenario
  • A broad view of the e-Science process
  • Show stoppers for practical adoption are not sexy
    technical showstoppers
  • Can I incorporate my favourite service?
  • Can I manage the results?
  • Tapping into (de facto) standards and communities
    to leverage others' results and tools: LSID,
    Haystack, Pedro
  • http://www.mygrid.org.uk

27
Acknowledgements
myGrid is an EPSRC funded UK eScience Program
Pilot Project
Particular thanks to the other members of the
Taverna project, http://taverna.sf.net
28
myGrid People
  • Core
  • Matthew Addis, Nedim Alpdemir, Tim Carver, Rich
    Cawley, Neil Davis, Alvaro Fernandes, Justin
    Ferris, Robert Gaizaukaus, Kevin Glover, Carole
    Goble, Chris Greenhalgh, Mark Greenwood, Yikun
    Guo, Ananth Krishna, Peter Li, Phillip Lord,
    Darren Marvin, Simon Miles, Luc Moreau, Arijit
    Mukherjee, Tom Oinn, Juri Papay, Savas
    Parastatidis, Norman Paton, Terry Payne, Matthew
    Pockock, Milena Radenkovic, Stefan
    Rennick-Egglestone, Peter Rice, Martin Senger,
    Nick Sharman, Robert Stevens, Victor Tan, Anil
    Wipat, Paul Watson and Chris Wroe.
  • Users
  • Simon Pearce and Claire Jennings, Institute of
    Human Genetics, School of Clinical Medical
    Sciences, University of Newcastle, UK
  • Hannah Tipney, May Tassabehji, Andy Brass, St
    Mary's Hospital, Manchester, UK
  • Postgraduates
  • Martin Szomszor, Duncan Hull, Jun Zhao, Pinar
    Alper, John Dickman, Keith Flanagan, Antoon
    Goderis, Tracy Craddock, Alastair Hampshire
  • Industrial
  • Dennis Quan, Sean Martin, Michael Niemi, Syd
    Chapman (IBM)
  • Robin McEntire (GSK)
  • Collaborators
  • Keith Decker