Title: Workflow Systems in Bioinformatics and the Bioinformatics Educational Grid
1Workflow Systems in Bioinformatics and the
Bioinformatics Educational Grid
- Tan Tin Wee
- Associate Professor
- National University of Singapore
- tinwee_at_bic.nus.edu.sg
- Shoba Ranganathan, Victor Tong, Justin Choo, Richard Tan, G.S. Ong, Simon See, T.S. Lim, Mark de Silva and K.S. Lim
- International Symposium on Grid Computing, ISGC2004: Making the World Wide Grid a Reality, 27 July 2004
2In a Nutshell
- Weaving several threads of development in bioinformatics, such as workflow integration and DataGrid (over the past 5 years or so)
- to build an integrated educational grid info-structure
- that will support HR development, education, training, self-learning, etc. in the emerging discipline of bioinformatics
- for conventional as well as e-education
3Making the World Wide Grid a reality
Contribution of Bioinformatics
- Bioinformatics is the science of using information and ICT to understand biology
- Despite being driven by rapid progress in allied disciplines in the New Biology (genomics, proteomics, metabolomics, transcriptomics, other omics, computational biology, systems biology) generating unprecedented volumes of data
- Grid computing is not yet ubiquitous in life sciences
4In Vitro, In Vivo, In Situ, In Silico Biology and Personalised Medicine: Imaging, Modeling, Simulation, Theoretical Biology
D. Hanahan and R. A. Weinberg. The hallmarks of cancer. Cell, 100(1):57-70. Review, 2000
5- Tools have changed, but the job hasn't (cartoons from a talk by Rozhan Mohammed Idrus and Hanafi Atan, Universiti Sains Malaysia, APAN 2003)
- Bioinformatics: emergent and almost pervasive in all biological and life science disciplines
6Computational Demands and Data Processing in Life
Sciences are expanding!
omics
Genomics
Proteomics
Bioinformatics
Computational Biology
Medical Informatics
BioStatistics
LIFE SCIENCE INFORMATICS
LIFE SCIENCES and HEALTH SCIENCES
7Where does the Grid fit in?
Life Science Informatics
BIOTECHNOLOGY and NEW BIOLOGY
INFOCOMMUNICATIONS TECHNOLOGY
8Why no Grid here yet?
- Lack of widespread awareness and training in computational skills in the life sciences community
- Few computational, networking and grid computing experts with first-hand domain knowledge in life sciences
- Data-intensive nature of life science grid computing applications
- Labour-intensive nature of building life science grids
- Lack of killer applications
- Bioinformatics is a rapidly changing target
9Biotech and InfoComm Technology - Parallel Growth
[Figure: two parallel timelines, 1990 to 2004]
Biotechnology: Genome Project, Microbial Genomes, Worm Genome, Dolly, DNA chips, Human Genome, BioX, Systems Biology
InfoCommunication Technology: WAIS, Gopher, ISP, WWW boom, Java, Internet 2, Grid Computing, Dotcom boom and crash, Lambda Networking
10Grids applied to Life Sciences
- Internet2 demos of late 1990s: quasi-realtime data collection from synchrotrons for 3D structure determination
- iGrid98, SC98, SC99, SC2003 (most geographically dispersed grid computing award, arthropod phylogenetics)
- Anthrax research (United Devices)
- Encyclopedia of Life (EOL)
- OBIGrid
- Kansai BioGrid
- → Large-scale mega projects
- → Not a WorldWide Grid
11iGrid98 SC98
http://www.startap.net/startap/igrid98/maxLikeAnApbionet98.html
12INET99 demo
http://www.bic.nus.edu.sg/admin/News/Jun99/inet/inet99.html
http://www.startap.net/startap/APPLICATIONS/collabForStruct.html
13When will World Wide Grid be a reality for Life
sciences?
- Like the World Wide Web: everyone uses it, from publishing to accessing the content
- Plug and play: tap computational cycles anywhere, from everywhere, anytime
- Secure to use
- Killer application like Mosaic in 1993
- Generate meaningful results
- Control key tools and automate mundane processes
- Connect people, computation, data, instruments
14Focus on two key areas
- Grid-enabled bioinformatics workflow systems as the killer application
- Building a bioinformatics educational grid
15Workflow integration
- 1996/7: Java-based FlowBot project
- 1998: Inet98, Internet Flowbot Protocol
- http://www.isoc.org/inet98/proceedings/8x/8x_1.htm
- 1998: Application to Life Sciences Workflow Integration, BIC-CNPR joint project (Lim et al.)
- 1998: PSB98, "From Sequence to Structure to Literature: The Protocol Approach to Bioinformation" (Wu et al.) → spinoff company GeneticXchange.com
- 2001: Spinoff company KOOPrime Pte Ltd
- 2002: BioWorldWideWorkFlow initiative in APBioNet
- → Workflow integration is the Killer Application for a World Wide Life Science Grid!
16Bioinformatics Educational Grid
- 2001: S* Life Sciences Informatics Alliance. 3 years of experience in online bioinformatics education; 5 courses and >1000 persons worldwide trained in basic bioinformatics; team of Online Teaching Assistants
- Workshop on Education in Bioinformatics: WEB01, WEB02, WEB03, WEB04
- 2004: Problem-Based Learning (PBL) in bioinformatics online using emeet.nus.edu.sg
- 2004: Building the Bioinformatics Educational Grid
- → Education is the answer to making the World Wide Grid a reality
17Background
- Biologists and biotechnologists need to be equipped and trained to carry out tomorrow's biological research today!
- Integration of
- Network Infrastructure
- Databases
- Software
- Computational Grid
- Online educational, teaching and learning materials
- Education Killer Application
181993
191. Network Infrastructure
APAN Advanced Research Network 1996-2004
20Internet2 and beyond
- 1st country outside North America to connect: SINGAREN (Singapore Advanced Research and Education Network)
- TANET2 from Taiwan and APAN-Transpac were next.
- Then Abilene.
- Today's Starlight and Lambda networking
212. Databases
- Key major databases: 1.5 terabytes today!
- Publicly accessible data over the Internet doubling every 12 to 18 months (http://www.bio-mirror.net/)
- Mirroring Moore's Law for chip technology
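The doubling claim can be turned into rough arithmetic. The following is a minimal sketch in Python, assuming pure exponential growth from the 1.5 terabytes quoted above. The function name `projected_size_tb` and the 5-year horizon are illustrative assumptions, not figures from the talk.

```python
def projected_size_tb(start_tb: float, years: float, doubling_months: float) -> float:
    """Size after `years`, assuming the data doubles every `doubling_months`."""
    doublings = (years * 12.0) / doubling_months
    return start_tb * 2.0 ** doublings


START_TB = 1.5  # figure quoted on the slide

# Compare the two ends of the "12 to 18 months" range over an assumed 5 years.
for months in (12, 18):
    size = projected_size_tb(START_TB, years=5, doubling_months=months)
    print(f"doubling every {months} months -> {size:.1f} TB after 5 years")
```

Under these assumptions the two ends of the range diverge quickly: about 48 TB with a 12-month doubling time against about 15 TB with an 18-month one.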
22BIODATABASES
Genbank, Genbank Genomes, InterPro, PDB, BlastDB, BLOCKS, DDBJ, EMBL, ENZYME, PROSITE, PIR, PFAM, REBASE, NCBI REFSEQ, SRC, SWISSPROT, Taxonomy, TrEMBL, UniGene, euGenes
23BioDataGrid Registry of Databases
- NUS BioDataGrid initiative: everest.bic.nus.edu.sg/lsdb
- Singapore National Grid Office has a new initiative to be announced soon.
- Facilitate varying levels of granularity of access to structured and unstructured biological data
243. Software
- APBioBox project
- Funded by IDRC Pan Asia Networking R&D Grant
- Rapid and easy replication of grid-enabled software is crucial to grid growth
253. Software - APBioBox
- Funded by the International Development Research Centre of Canada, under their PAN (Pan Asia Networking) ICT grant
- To build an easily installable, widely and freely accessible, integrated suite of bioinformatics applications to facilitate training and research amongst biologists in developing countries
- A/P Tan Tin Wee, National University of Singapore
- Adjunct Professor Shoba Ranganathan, NUS, and Chair Professor, Macquarie University, Sydney
- Ong Guan Sin, Consultant programmer, Singapore Computer Systems Pte Ltd
263. Software - APBioBox
- Shrink-wrapped bundle of some 300 software applications used in bioinformatics
- Preconfigured and integrated
- 15 mins to install on a Linux RedHat9 platform, versus the several weeks such a setup typically takes.
- Partnered with Sun Microsystems to come up with Bio-Cluster Grid, the equivalent on the Sun Solaris platform.
- CD-ROMs and downloadable: http://www.apbionet.org/apbiogrid/apbiobox
273. Software APBioBox appls
- Logical abstraction through Java wrappers built for
- EMBOSS (160 applications)
- PHYLIP (30 applications)
- HMMER
- CLUSTALW
- BLAST
- FASTA, SSEARCH (in progress)
- MySQL
- SRS (Lion Bioscience)
- Globus Grid Toolkit 2.4
- Unix Utilities
- KOOPlite
- Key Bioinformatics Databases (in progress)
28BioBox for Solaris
29Grid Engine Portal
304. Computational Grid
- APBioGrid 2002
- To facilitate the building of shared computational grid resources for the Asia Pacific region.
31APBioGRID Project
CRAY
- APBioGrid
- Aims to provide computational resources to bioinformaticians and biological researchers to facilitate education and research through sharing each other's computers over the Grid
32Why is APBioNet Grid needed?
- Large-scale life science .. are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed.
- The overall motivation for Grids is to facilitate the routine interactions of these resources in order to support large-scale life science .
Altered from Bill Johnston, 27 July 01
33Why the Grid?
- 1998: advent of Grid Computing, distributed computing
- E.g. tapping idle CPU cycles globally in the SETI project or the Anthrax online projects.
- Like tapping electrons from the power grid: just plug the appliance into the socket
- Currently, one of the hottest areas in ICT.
- So the basis for BioGrids has been laid
345. Online Learning Material
Eight institutions from 5 continents since 2001
The S* Life Science Informatics Alliance
- Sweden: Karolinska Institutet, University of Uppsala
- USA: Stanford University; University of California, San Diego
- Singapore: National University of Singapore
- Australia: University of Sydney, Macquarie University
- South Africa: University of the Western Cape
35Sample Lecture- Slide View
36Wide Range of S* learning materials
- Tutorial ppt presentation materials on introductory bioinformatics
- Frequently Asked Questions in Forum discussion archives
- Overview lectures on
- Introductory Molecular Biology
- An Overview of the Computational Analysis of Biological Sequences
- Transcript Analysis and Reconstruction
- Comparative Genomics
- Representations and Algorithms for Computational Molecular Biology
- Protein Structure Primer, Structure Prediction and Protein Physics
- Genomics and Computational Molecular Biology Genomics
- Protein and Nucleic Acid Structure, Dynamics, and Engineering
- Proteomics and Proteomes
- Structure Prediction for Macromolecular Interactions
- Protein-Ligand Modeling
- Microarray informatics
37Goals of S*
- Provide a GLObal Bioinformatics Unified Learning Environment (GLOBULE) made up of modular courses in the disciplines of bioinformatics, medical informatics and genomics
- Provide accessibility to the highest possible quality of online courseware approved by the educators from the host institutions.
- Develop an integrated modular learning environment that allows a student to select from both pre-requisite modules and advanced modules in order to build a comprehensive program.
38S* course-3 by country
39S* course: growing list of participants' countries
40S* geographical comparison
[Chart: surviving labels are North America, Africa, South America]
41Feedback
- Pretty good. A few rough edges but I'm sure you'll work them out over time. I really enjoyed it. Most of the lectures were very well presented and the participants in the forums helpful. I'm very impressed at the amount of work that has obviously gone into setting up the course. (Alan Wardroper, Thailand)
- The international participation of the lecturers and students. The relevance of the field of bioinformatics in meeting the biomedical needs of today. The level of communication provided by the IVLE system enhanced learning considerably. The range of professional and academic background of students. The technical support provided by SStar was rapid and efficient to queries. (C.A.O. Idowu, England)
42Feedback
- To think that a world-class, web based education with such valued lectures is brought to your desk free of cost is impossible elsewhere. The course was wonderfully well managed. Our requests and problems were quickly and well attended to. I had a great time doing this course and thank the SSTAR team whole heartedly for making me a fortunate participant with this fantastic experience. (Naidu Ratnala Thulaja, Singapore)
- I think it is a very useful course, it is exactly what it says it is: an introduction to bioinformatics. It covers nicely major topics and provides enough information in order for us to understand what bioinformatics is all about. I enjoyed it very much and I am even a bit sad it is over. Thank you very much! (Patricia Severino, Romania)
43Emergence of Grid Technologies
- The Grid, Grid Computing
- Next Generation Internet technologies (Internet2) and their applications
- Computational Grids
- Informational Grids
- Access Grid
- Educational Grids → do the same for the educational process: the learner or the teacher can tap into learning materials, tools, information and computational hands-on work, in the so-called classroom without walls!
44Educational Grid for Bioinformatics
- Increase repository of regularly used bioinformatics software
- Registry of tools, software and databases
- Higher level abstraction of resources
- Virtual classrooms and discussions
- Distributed repository of learning objects and materials
- Self assessment tests
- Project Based modules
- Problem Based Learning
- Integrated learning environment for the practice of bioinformatics in the life sciences
- Support both conventional and e-learning/e-education
45Problem-Based Learning (PBL)
- Started at McMaster University Medical School over 25 years ago
- Encourages hands-on work and critical thinking. Its hands-on approach is particularly suited to bioinformatics, where many of the skills require practical execution and the problems encountered are generally open-ended.
- PBL encourages
- acquisition of critical knowledge.
- problem-solving proficiency (problems tackled are generally open-ended).
- self-motivated learning.
- team participation.
46Role Change
- In PBL, there's a fundamental change in the role played by the participants.
- a facilitator guides the entire session.
- a scribe records the entire session.
- some participants field questions; others try to brainstorm and provide answers. There is no student-teacher relationship; everybody is treated equally. Focus is on peer learning.
47PBL Asynchronous Sessions
- S* is currently experimenting with PBL sessions using the IVLE discussion forum and, eventually, the web-based collaboration platform TWiki (http://twiki.org)
- Considerations/issues to resolve
- How to accommodate so many participants
- How to host so many TWiki pages
- Will participants with slow connections be able to access them?
48PBL synchronous sessions
- Emeet.nus.edu.sg
- CENTRA technology
- Low bandwidth requirement
- VOIP for voice, Video if necessary
- Agenda, Whiteboard, Shared applications, File
transfer, Web Safari
49Projects
- 8 different projects
- 8 teams of volunteer facilitators
- 300 students into 8 groups
- Two phases
- Set them up to solve various topical
bioinformatics problems from bottom up in PBL
style.
50Online Delivery Mechanism
- Considering and exploring various advanced networking technologies, particularly video conferencing software.
- e.g. AccessGrid™
- http://www.accessgrid.org/
51AccessGrid™
- It is a suite of resources including multimedia large-format displays, presentation and interactive environments, and interfaces to Grid middleware and to visualization environments.
- Developed by the Futures Laboratory at Argonne National Laboratory and deployed by the NCSA PACI Alliance, it is now used at over 150 institutions worldwide, with each institution hosting one or more Access Grid (AG) nodes.
- Each node employs the high-end audio and visual technology needed to provide a high-quality, compelling user experience.
52Immersive Learning
- Enable group-to-group interactions across the Grid.
- Activities such as large-scale distributed meetings, collaborative work sessions, seminars, lectures, tutorials, and training are made possible.
Fig 1 Controlling Audio/Visual Quality
Fig 2 Group-to-Group Live Interaction
53Issues for Consideration
- Infrastructure (high speed network, connection/bandwidth)
- Cost of setting up
- Location of set-up
- Manpower required
- Technical competency
54Workflow as the killer app
- KOOPrime's LivePortal/LifeBase and KOOPlatform
- Carole Goble's myGrid, Taverna, etc.
- Anabench
- Vibe
- Bingo
- All with killer GUI
- Others such as the ASP model: Bioinformatics.com/Entigen's BioNavigator
60Operator View
Administrator View
61Browse Drag Drop Connect
62Scheduling Functions
Search and Resource Discovery functions
Description Annotation Authoring Function
Sharing/Publishing/ Resource Browsing functions
63Future BioGRID Components
Bio End User
Web Interface
KOOP Interface
Bio Applications (EMBOSS, PHYLIP, FASTA, SSEARCH)
Bio Applications (EMBOSS, PHYLIP)
Globus-aware Scheduler (Nimrod-G)
Globus-aware Scheduler (Nimrod-G)
LSF
Sun SGE
Globus
Globus
OS
OS
CPU
CPU
64Weaving the threads of development
- In Networking
- In Bioinformatics Software application packages
- In BioDataGrid
- In Online Educational Learning Objects
- Bioinformatics Educational Grid and Bio World Wide WorkFlow (Bio W3F)
Workflow Object
Output is Input to the next object
Output From previous object
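The chaining idea on this slide, where each workflow object consumes the previous object's output, can be sketched in a few lines of Python. This is an illustration only, not the W3F or KOOP implementation: the `WorkflowObject` class, `run_workflow`, and the three toy sequence steps are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkflowObject:
    """One step in a workflow: a name plus a function from input to output."""
    name: str
    run: Callable[[str], str]


def run_workflow(steps: List[WorkflowObject], initial_input: str) -> str:
    """Feed each object's output to the next object as its input."""
    data = initial_input
    for step in steps:
        data = step.run(data)
    return data


# Hypothetical toy steps standing in for real bioinformatics tools.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

steps = [
    WorkflowObject("clean", lambda s: s.strip().upper()),
    WorkflowObject("reverse_complement", lambda s: s.translate(COMPLEMENT)[::-1]),
    WorkflowObject("transcribe", lambda s: s.replace("T", "U")),
]

print(run_workflow(steps, " atgc "))  # prints GCAU
```

Because every step shares the same input and output type, steps can be reordered, reused or swapped without changing the runner.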
65Ingredients of W3F
W3F orchestrator
KOOPserver
W3F service providers Apps developers
KOOPsdk
KOOPdaemon
W3F enactors
KOOPbrowser
W3F browser
KOOPeditor
W3F editor
- Users can browse workflows and cobble together app objects to reuse or repurpose objects or workflows
- Apps developers can wrap their applications and advertise to potential users and service providers
- Service providers can mount apps from apps developers
- W3F orchestrator coordinates scheduling, load balancing, security, etc.
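The division of roles above (developers advertise, providers mount, users browse) can be pictured as a small registry. The sketch below is a hypothetical model in Python, not the W3F or KOOP design: the `Registry` class, its method names, and the app and provider names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Registry:
    """Hypothetical registry tracking which apps are advertised and mounted."""
    advertised: Dict[str, str] = field(default_factory=dict)    # app -> developer
    mounted: Dict[str, Set[str]] = field(default_factory=dict)  # app -> providers

    def advertise(self, app: str, developer: str) -> None:
        """An apps developer announces a wrapped application."""
        self.advertised[app] = developer
        self.mounted.setdefault(app, set())

    def mount(self, app: str, provider: str) -> None:
        """A service provider hosts an app that has been advertised."""
        if app not in self.advertised:
            raise KeyError(f"{app} has not been advertised")
        self.mounted[app].add(provider)

    def browse(self) -> List[str]:
        """Apps a user can run now: those mounted by at least one provider."""
        return sorted(app for app, providers in self.mounted.items() if providers)


registry = Registry()
registry.advertise("blast-wrapper", "developer-a")
registry.advertise("clustalw-wrapper", "developer-b")
registry.mount("blast-wrapper", "provider-1")
print(registry.browse())  # prints ['blast-wrapper']
```

An orchestrator in this picture would sit on top of such a registry, choosing among the providers that have mounted a given app.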
66Why suitable for grid?
- KOOPdaemons can call grid commands through grid portals
- KOOPsdk easily wraps your existing applications, including grid ones
- KOOPsdk can also call grid commands of, say, Globus grid toolkits
- Layered approach for rapid uptake.
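The talk does not show the KOOPsdk API, so the following is only a generic Python sketch of what wrapping an existing command-line application can look like, using the standard `subprocess` module. The `WrappedApp` class and its methods are hypothetical, and the Python interpreter itself stands in for a real tool such as BLAST so that the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WrappedApp:
    """Hypothetical wrapper: a fixed base command plus named parameters."""
    name: str
    base_command: List[str]

    def build_command(self, params: Dict[str, str]) -> List[str]:
        """Turn {'flag': 'value'} into [..., '-flag', 'value'] after the base command."""
        args = list(self.base_command)
        for flag, value in params.items():
            args += [f"-{flag}", value]
        return args

    def run(self, params: Dict[str, str], stdin_text: str = "") -> str:
        """Run the wrapped program and return its standard output as text."""
        result = subprocess.run(
            self.build_command(params),
            input=stdin_text,
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return result.stdout


# Stand-in "application": upper-cases whatever arrives on standard input.
upper = WrappedApp(
    "upper",
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"],
)
print(upper.run({}, "acgt"))  # prints ACGT
```

The same wrapper shape would apply to a grid command: only `base_command` changes, which is one way to read the layered approach described above.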
67Framework of Bioinformatics Development in Asia
Pacific from 1991-2004
POLICY
RESEARCH
EDUCATION Manpower Training
Coordination
Planning
Compute INFRASTRUCTURE
DATA INFRASTRUCTURE
NETWORK INFRASTRUCTURE
Collaboration Cooperation
68The Future
- Defining an evolving Educational Grid for bioinformatics
- Continuing major impact of ICT in the life sciences
- Synergistic and sustained growth of two major late 20th century technologies
- Building the framework for World Wide Workflow
- Share resources, access resources seamlessly
- Build sophisticated automated workflows
comprising interconnection of people,
computation, data and bioinstrumentation
69- Thank you for this opportunity to share this with you.
- Tan Tin Wee
- Tinwee_at_bic.nus.edu.sg