Workflow Systems in Bioinformatics and the Bioinformatics Educational Grid

1
Workflow Systems in Bioinformatics and the
Bioinformatics Educational Grid
  • Tan Tin Wee
  • Associate Professor
  • National University of Singapore
  • tinwee_at_bic.nus.edu.sg
  • Shoba Ranganathan, Victor Tong, Justin Choo,
    Richard Tan, G.S.Ong, Simon See, TS Lim, Mark de
    Silva and KSLim.
  • International Symposium on Grid Computing
    ISGC2004
  • Making the World Wide Grid a Reality, 27 July
    2004

2
In a Nutshell
  • Weaving several threads of development in
    Bioinformatics such as Workflow Integration and
    DataGrid (over the past 5 yrs or so)
  • to build an integrated educational grid
    info-structure
  • that will support HR development, education,
    training, self-learning etc in the emerging
    discipline of bioinformatics
  • for conventional as well as e-education

3
Making the World Wide Grid a reality
Contribution of Bioinformatics
  • Bioinformatics is the science of using
    information and ICT to understand biology
  • Despite being driven by rapid progress in allied
    disciplines in the New Biology (genomics,
    proteomics, metabolomics, transcriptomics, other
    omics, computational biology, systems biology)
    that are generating unprecedented volumes of data,
    Grid computing is not yet ubiquitous in the life
    sciences

4
In Vitro, In Vivo, In Situ, In Silico Biology and
Personalised Medicine, Imaging, Modeling,
Simulation, Theoretical Biology
D. Hanahan and R. A. Weinberg. The hallmarks of
cancer. Cell, 100(1):57-70, 2000. Review.
5
  • Tools have changed, but the job hasn't. Cartoons
    from talk by Rozhan Mohammed Idrus & Hanafi Atan,
    Universiti Sains Malaysia, APAN 2003
  • Bioinformatics - Emergent and almost pervasive in
    all biological and life science disciplines

6
Computational Demands and Data Processing in Life
Sciences are expanding!
omics
Genomics
Proteomics
Bioinformatics
Computational Biology
Medical Informatics
BioStatistics
LIFE SCIENCE INFORMATICS
LIFE SCIENCES and HEALTH SCIENCES
7
Where does the Grid fit in?
Life Science Informatics
BIOTECHNOLOGY and NEW BIOLOGY
INFOCOMMUNICATIONS TECHNOLOGY
8
Why no Grid here yet?
  • Lack of widespread awareness and training in
    computational skills in the life sciences
    community
  • Few computational, networking and grid computing
    experts with first hand domain knowledge in life
    sciences
  • Data-intensive nature of life science grid
    computing applications
  • Labour-intensive nature of building life science
    grids
  • Lack of Killer Applications
  • Bioinformatics is a Rapidly changing target

9
Biotech and InfoComm Technology - Parallel Growth
[Timeline figure: two parallel tracks, 1990 to 2004]
Biotechnology: Systems Biology; Worm Genome; Dolly & DNA chips;
Human Genome; Genome Project; Microbial Genomes; BioX
InfoCommunication Technology: Dotcom boom and crash; Wais Gopher;
Lambda Networking; Internet 2; WWW boom; Grid Computing; Java; ISP
10
Grids applied to Life Sciences
  • Internet2 demos of late 1990s: quasi-realtime data
    collection from synchrotrons for 3D structure
    determination
  • iGrid98, SC98, SC99, SC2003 (most
    geographically dispersed grid computing award:
    arthropod phylogenetics)
  • Anthrax research (United Devices)
  • Encyclopedia of Life (EOL)
  • OBIGrid
  • Kansai BioGrid
  • → Large scale mega projects
  • → Not a WorldWide Grid

11
iGrid98 / SC98
http://www.startap.net/startap/igrid98/maxLikeAnApbionet98.html
12
INET99demo
http://www.bic.nus.edu.sg/admin/News/Jun99/inet/inet99.html
http://www.startap.net/startap/APPLICATIONS/collabForStruct.html
13
When will World Wide Grid be a reality for Life
sciences?
  • Like the World Wide Web: everyone uses it, from
    publishing to accessing content
  • Plug and play: tap computational cycles anywhere,
    from everywhere, anytime
  • Secure to use
  • Killer application like Mosaic in 1993
  • Generate meaningful results
  • Control key tools and automate mundane processes
  • Connect people, computation, data, instruments

14
Focus on two key areas
  • Grid-enabled bioinformatics workflow systems as
    the killer application
  • Building a bioinformatics educational grid

15
Workflow integration
  • 1996/7: Java-based FlowBot project
  • 1998: Inet98, Internet Flowbot Protocol
  • http://www.isoc.org/inet98/proceedings/8x/8x_1.htm
  • 1998: Application to Life Sciences Workflow
    Integration, BIC-CNPR joint project (Lim et al.)
  • 1998: PSB98, "From Sequence to Structure to
    Literature: The Protocol Approach to
    Bioinformation" (Wu et al.) → spinoff company
    GeneticXchange.com
  • 2001: Spinoff company KOOPrime Pte Ltd
  • 2002: BioWorldWideWorkFlow initiative in APBioNet
  • → Workflow integration is the Killer Application
    for a World Wide Life Science Grid!

16
Bioinformatics Educational Grid
  • 2001: S* Life Sciences Informatics Alliance.
    3 years of experience in online
    bioinformatics education; 5
    courses and >1000 persons worldwide trained in
    basic bioinformatics;
    team of Online Teaching Assistants
  • Workshop on Education in Bioinformatics (WEB01,
    WEB02, WEB03, WEB04)
  • 2004: Problem-Based Learning (PBL) in
    Bioinformatics online using
    emeet.nus.edu.sg
  • 2004: Building the Bioinformatics Educational
    Grid
  • → Education is the answer to making the World
    Wide Grid a reality

17
Background
  • Biologists and Biotechnologists need to be
    equipped and trained to carry out tomorrow's
    biological research today!
  • Integration of
  • Network Infrastructure
  • Databases
  • Software
  • Computational Grid
  • Online educational and teaching and learning
    materials
  • Education Killer Application

18
1993
19
1. Network Infrastructure
APAN Advanced Research Network 1996-2004
20
Internet2 and beyond
  • 1st country outside North America to connect:
    SINGAREN (Singapore Advanced Research and
    Education Network)
  • TANET2 from Taiwan and APAN-Transpac were next.
  • Then Abilene.
  • Today's Starlight and Lambda networking

21
2. Databases
  • Key major databases: 1.5 Terabytes today!
  • Publicly accessible data over the Internet
    doubling every 12 to 18 months
    (http://www.bio-mirror.net/)
  • Mirroring Moore's Law for chip technology
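
A rough worked example of what "doubling every 12 to 18 months" implies, starting from the 1.5 Terabytes quoted on this slide. This is only an illustrative sketch: the function name and the 3-year horizon are assumptions, not part of the talk.

```python
def projected_size_tb(start_tb: float, months: float, doubling_months: float) -> float:
    """Data volume after `months`, assuming it doubles every `doubling_months`."""
    return start_tb * 2 ** (months / doubling_months)


START_TB = 1.5       # figure quoted on the slide
HORIZON_MONTHS = 36  # assumed horizon, chosen only for illustration

for doubling in (12, 18):
    size = projected_size_tb(START_TB, HORIZON_MONTHS, doubling)
    print(f"doubling every {doubling} months: {size:.1f} TB after 3 years")
```

Under these assumptions the same 1.5 TB grows to 12 TB (12-month doubling) or 6 TB (18-month doubling) in three years, which is why mirroring and data-grid infrastructure come up next.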

22
BIODATABASES
Genbank Genbank Genomes InterPro PDB BlastDB BLOCKS
DDBJ EMBL ENZYME PROSITE
PIR PFAM REBASE NCBI REFSEQ SRC SWISSPROT Taxonomy
TrEMBL UniGene euGenes
23
BioDataGrid Registry of Databases
  • NUS BioDataGrid initiative:
    everest.bic.nus.edu.sg/lsdb
  • Singapore National Grid Office has a new
    initiative to be announced soon.
  • Facilitate varying levels of granularity of
    access to structured and unstructured biological
    data

24
3. Software
  • APBioBox project
  • Funded by IDRC Pan Asia Networking R&D Grant
  • Rapid and Easy Replication of Grid enabled
    software crucial to grid growth

25
3. Software - APBioBox
  • Funded by International Development Research
    Centre of Canada, under their PAN (Pan Asia
    Networking) ICT grant
  • To build an easily installable, widely and
    freely accessible, integrated suite of
    bioinformatics applications to facilitate training
    and research amongst biologists in developing
    countries
  • A/P Tan Tin Wee, National University of
    Singapore
  • Adjunct Professor Shoba Ranganathan, NUS and
    Chair Professor, Macquarie University, Sydney
  • Ong Guan Sin, Consultant programmer, Singapore
    Computer Systems Pte Ltd

26
3. Software - APBioBox
  • Shrink-wrapped bundle of some 300 software
    applications used in bioinformatics
  • Preconfigured and integrated
  • 15 mins to install on a Linux RedHat9 platform,
    a setup which typically takes several weeks.
  • Partnered with Sun Microsystems to come up with
    Bio-Cluster Grid, the equivalent on the Sun
    Solaris platform.
  • CD-ROMs and downloadable:
    http://www.apbionet.org/apbiogrid/apbiobox

27
3. Software - APBioBox applications
  • Logical Abstraction through Java Wrappers built
    for
  • EMBOSS 160 applications
  • PHYLIP 30 applications
  • HMMER
  • CLUSTALW
  • BLAST
  • FASTA, SSEARCH (in progress)
  • MySQL
  • SRS (Lion Bioscience)
  • Globus Grid Toolkit 2.4
  • Unix Utilities
  • KOOPlite
  • Key Bioinformatics Databases (in progress)
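
The idea of "logical abstraction through wrappers" can be sketched as follows. The slide says the APBioBox wrappers were written in Java; this sketch uses Python for brevity and is not the project's code. The class, the tool name "aligntool" and its flags are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolWrapper:
    """Uniform description of one command-line program (hypothetical)."""
    executable: str
    # Maps a logical parameter name to the tool's own command-line flag.
    flag_for: Dict[str, str] = field(default_factory=dict)

    def build_command(self, **params: str) -> List[str]:
        """Translate logical parameter names into an argv list."""
        argv = [self.executable]
        for name, value in params.items():
            if name not in self.flag_for:
                raise KeyError(f"unknown parameter: {name}")
            argv += [self.flag_for[name], str(value)]
        return argv


# "aligntool" and its flags are made up for illustration only.
aligner = ToolWrapper("aligntool", {"sequences": "--in", "result": "--out"})
print(aligner.build_command(sequences="input.fasta", result="aligned.txt"))
```

The point of such a layer is that callers use the same logical names ("sequences", "result") for every wrapped tool, while each wrapper hides that tool's particular command-line syntax.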

28
BioBox for Solaris
29
Grid Engine Portal
30
4. Computational Grid
  • APBioGrid 2002
  • To facilitate the building of shared
    computational grid resources for the Asia Pacific
    region.

31
APBioGRID Project
CRAY
  • APBioGrid
  • Aims to provide computational resources to
    bioinformaticians and biological researchers to
    facilitate education and research through sharing
    each other's computers over the Grid

32
Why is the APBioNet Grid needed?
  • Large-scale life science .. are done through
    the interaction of people, heterogeneous
    computing resources, information systems, and
    instruments, all of which are geographically and
    organizationally dispersed.
  • The overall motivation for Grids is to
    facilitate the routine interactions of these
    resources in order to support large-scale life
    science .

Altered from Bill Johnston 27 July 01
33
Why the Grid?
  • 1998: advent of Grid Computing (distributed
    computing)
  • E.g. tapping idle CPU cycles globally in the SETI
    project or the Anthrax online projects.
  • Like tapping electrons from the power grid: just
    plug the appliance into the socket
  • Currently one of the hottest areas in ICT.
  • So the basis for BioGrids has been laid

34
5. Online Learning Material
Eight institutions from 5 continents since 2001
The S* Life Science Informatics Alliance
  • Sweden: Karolinska Institutet, University of
    Uppsala
  • USA: Stanford University; University of
    California, San Diego
  • Singapore: National University of Singapore
  • Australia: University of Sydney, Macquarie
    University
  • South Africa: University of the Western Cape
35
Sample Lecture- Slide View
36
Wide Range of S* learning materials
  • - Tutorial ppt presentation materials on
    introductory bioinformatics
  • - Frequently Asked Questions in Forum discussion
    archives
  • - Overview lectures on
  • Introductory Molecular Biology
  • An Overview of the Computational Analysis of
    Biological Sequences
  • Transcript Analysis and Reconstruction
  • Comparative Genomics
  • Representations and Algorithms for Computational
    Molecular Biology
  • Protein Structure Primer, Structure Prediction
    and Protein Physics
  • Genomics and Computational Molecular Biology
    Genomics
  • Protein and Nucleic Acid Structure, Dynamics, and
    Engineering
  • Proteomics and Proteomes
  • Structure Prediction for Macromolecular
    Interactions
  • Protein - Ligand Modeling
  • Microarray informatics

37
Goals of S*
  • Provide a GLObal Bioinformatics Unified Learning
    Environment (GLOBULE) made up of modular courses
    in the disciplines of bioinformatics, medical
    informatics and genomics
  • Provide accessibility to the highest possible
    quality of online courseware approved by the
    educators from the host institutions.
  • Develop an integrated modular learning
    environment that allows a student to select from
    both pre-requisite modules and advanced modules
    in order to build a comprehensive program.

38
S* course-3 by country
39
S* course: Growing List of Participants' Countries
40
S* Geographical Comparison
north america
africa
south america
41
Feedback
  • Pretty good. A few rough edges but I'm sure
    you'll work them out over time. I really enjoyed
    it. Most of the lectures were very well presented
    and the participants in the forums helpful. I'm
    very impressed at the amount of work that has
    obviously gone into setting up the course.
  • Alan Wardroper, Thailand
  • The international participation of the lecturers
    and students. The relevance of the field of
    bioinformatics in meeting the biomedical needs of
    today. The level of communication provided by the
    IVLE system enhanced learning considerably. The
    range of professional and academic background of
    students. The technical support provided by SStar
    was rapid and efficient to queries.
  • C.A.O. IDOWU, England

42
Feedback
  • To think that a world-class, web based education
    with such valued lectures is brought to your desk
    free of cost is impossible elsewhere. The course
    was wonderfully well managed. Our requests and
    problems were quickly and well attended to. I had
    a great time doing this course and thank the
    SSTAR team whole heartedly for making me a
    fortunate participant with this fantastic
    experience.
  • Naidu Ratnala Thulaja, Singapore
  • I think it is a very useful course, it is exactly
    what it says it is: an introduction to
    bioinformatics. It covers nicely major topics and
    provides enough information in order for us to
    understand what bioinformatics is all about. I
    enjoyed it very much and I am even a bit sad it
    is over. Thank you very much!
  • Patricia Severino, Romania

43
Emergence of Grid Technologies
  • The Grid - Grid Computing
  • Next Generation Internet technologies (Internet2)
    and their applications
  • Computational Grids
  • Informational Grids
  • Access Grid
  • Educational Grids → do the same for the
    educational process: the learner or the teacher
    can tap into learning materials, tools,
    information, computational hands-on, in the
    so-called classroom without walls!

44
Educational Grid for Bioinformatics
  • Increase repository of regularly used
    bioinformatics software
  • Registry of tools, software and databases
  • Higher level abstraction of resources
  • Virtual classrooms and discussions
  • Distributed repository of learning objects and
    materials
  • Self assessment tests
  • Project Based modules
  • Problem Based Learning
  • Integrated learning environment for the practice
    of bioinformatics in the life sciences
  • Support both conventional and
    e-learning/e-education

45
Problem-Based Learning (PBL)
  • Started at McMaster University Medical School
    over 25 years ago
  • Encourages hands-on and critical thinking. Its
    hands-on approach is particularly suited for
    bioinformatics where many of the skills require
    practical execution and the problems encountered
    are generally open-ended.
  • PBL encourages
  • acquisition of critical knowledge.
  • problem-solving proficiency: problems tackled are
    generally open-ended.
  • self-motivated learning.
  • team participation.

46
Role Change
  • In PBL, there's a fundamental change in the role
    played by the participants.
  • A facilitator guides the entire session.
  • A scribe records the entire session.
  • Some participants field questions; others try to
    brainstorm and provide answers. There is no
    student-teacher relationship; everybody is treated
    equally. The focus is on peer learning.

47
PBL Asynchronous Sessions
  • S* is currently experimenting with PBL sessions
    using the IVLE discussion forum and, eventually, the
    web-based collaboration platform TWiki
    (http://twiki.org)
  • Considerations/issues to resolve
  • How to accommodate so many participants
  • How to host so many TWiki pages
  • Will participants with slow connections be able to
    access them?

48
PBL synchronous sessions
  • Emeet.nus.edu.sg
  • CENTRA technology
  • Low bandwidth requirement
  • VOIP for voice, Video if necessary
  • Agenda, Whiteboard, Shared applications, File
    transfer, Web Safari

49
Projects
  • 8 different projects
  • 8 teams of volunteer facilitators
  • 300 students into 8 groups
  • Two phases
  • Set them up to solve various topical
    bioinformatics problems from bottom up in PBL
    style.

50
Online Delivery Mechanism
  • Considering and exploring various advanced
    networking technologies, particularly video
    conferencing software.
  • e.g. AccessGridTM
  • http://www.accessgrid.org/

51
AccessGridTM
  • It is a suite of resources including multimedia
    large-format displays, presentation and
    interactive environments, and interfaces to Grid
    middleware and to visualization environments.
  • Developed by the Futures Laboratory at Argonne
    National Laboratory and deployed by the NCSA PACI
    Alliance, it is now used by over 150 institutions
    worldwide, with each institution hosting one or
    more Access Grid (AG) nodes.
  • Each node employs high-end audio and visual
    technology needed to provide a high-quality
    compelling user experience.

52
Immersive Learning
  • Enable group-to-group interactions across the
    Grid.
  • Activities such as large-scale distributed
    meetings, collaborative work sessions, seminars,
    lectures, tutorials, and training are made
    possible.

Fig 1 Controlling Audio/Visual Quality
Fig 2 Group-to-Group Live Interaction
53
Issues & Considerations
  • Infrastructure (high speed network,
    connection/bandwidth)
  • Cost of setting up
  • Location of set-up
  • Manpower required
  • Technical competency

54
Workflow as the killer app
  • KOOPrime's LivePortal/LifeBase and KOOPlatform
  • Carole Goble's myGrid, Taverna, etc.
  • Anabench
  • Vibe
  • Bingo
  • All with killer GUI
  • Others such as the ASP model:
    Bioinformatics.com/Entigen's BioNavigator

55-59
(No Transcript)
60
Operator View
Administrator View
61
Browse Drag Drop Connect
62
Scheduling Functions
Search and Resource Discovery functions
Description Annotation Authoring Function
Sharing/Publishing/ Resource Browsing functions
63
Future BioGRID Components
[Component diagram; labels as listed: Bio End User; Web Interface;
KOOP Interface; Bio Applications (EMBOSS, PHYLIP, FASTA, SSEARCH);
Bio Applications (EMBOSS, PHYLIP); Globus-aware Scheduler (Nimrod-G);
LSF; Sun SGE; Globus; OS; CPU]
64
Weaving the threads of development
  • In Networking
  • In Bioinformatics Software application packages
  • In BioDataGrid
  • In Online Educational Learning Objects
  • Bioinformatics Educational Grid and Bio World
    Wide WorkFlow (Bio W3F)

Workflow Object
Output is Input to the next object
Output From previous object
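
The chaining idea on this slide, where each workflow object's output becomes the next object's input, can be sketched minimally as below. This is not KOOP or W3F code; the function names and the three toy steps are assumptions chosen only to make the chaining visible.

```python
from typing import Callable, List

# A workflow object is modelled here as a function from one string to another.
Step = Callable[[str], str]


def run_workflow(steps: List[Step], first_input: str) -> str:
    """Feed each step's output into the next step, in order."""
    data = first_input
    for step in steps:
        data = step(data)
    return data


def strip_whitespace(seq: str) -> str:
    return "".join(seq.split())


def to_upper(seq: str) -> str:
    return seq.upper()


def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


result = run_workflow([strip_whitespace, to_upper, reverse_complement], "atg c\ncat")
print(result)  # ATGGCAT
```

Because every step has the same input and output shape, steps can be reordered, reused or swapped without touching the runner.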
65
Ingredients of W3F
W3F orchestrator
KOOPserver
W3F service providers Apps developers
KOOPsdk
KOOPdaemon
W3F enactors
KOOPbrowser
W3F browser
KOOPeditor
W3F editor
  • Users can browse workflows and cobble together
    app objects to reuse, repurpose objects or
    workflows
  • Apps developers can wrap their applications and
    advertise to potential users and service
    providers
  • Service providers can mount apps from apps
    developers
  • W3F orchestrator coordinates scheduling, load
    balancing, security etc
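
The roles listed above (developers advertise wrapped apps, users browse them and cobble them into workflows) can be sketched with a toy in-memory registry. This is an illustrative assumption, not the KOOP or W3F API; all class, method and app names are hypothetical, and scheduling, load balancing and security are left out.

```python
from typing import Callable, Dict, List, NamedTuple


class AppObject(NamedTuple):
    """A wrapped application as an apps developer might advertise it."""
    name: str
    description: str
    run: Callable[[str], str]


class Registry:
    """Toy stand-in for the place where apps are advertised and browsed."""

    def __init__(self) -> None:
        self._apps: Dict[str, AppObject] = {}

    def advertise(self, app: AppObject) -> None:
        self._apps[app.name] = app

    def browse(self, keyword: str) -> List[str]:
        """Names of apps whose description mentions the keyword."""
        key = keyword.lower()
        return sorted(n for n, a in self._apps.items() if key in a.description.lower())

    def compose(self, names: List[str]) -> Callable[[str], str]:
        """Chain the named apps so each output feeds the next input."""
        apps = [self._apps[n] for n in names]

        def workflow(data: str) -> str:
            for app in apps:
                data = app.run(data)
            return data

        return workflow


registry = Registry()
registry.advertise(AppObject("upper", "Convert a sequence to upper case", str.upper))
registry.advertise(AppObject("reverse", "Reverse a sequence", lambda s: s[::-1]))

print(registry.browse("sequence"))  # ['reverse', 'upper']
pipeline = registry.compose(["upper", "reverse"])
print(pipeline("acgt"))             # TGCA
```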

66
Why suitable for grid?
  • KOOPdaemons can call grid commands through grid
    portals
  • KOOPsdk easily wraps your existing applications,
    including grid ones
  • KOOPsdk can also call grid commands of, say, the
    Globus grid toolkit
  • Layered approach for rapid uptake.

67
Framework of Bioinformatics Development in Asia
Pacific from 1991-2004
POLICY
RESEARCH
EDUCATION Manpower Training
Coordination
Planning
Compute INFRASTRUCTURE
DATA INFRASTRUCTURE
NETWORK INFRASTRUCTURE
Collaboration Cooperation
68
The Future
  • Defining an Evolving Educational Grid for
    bioinformatics
  • Continuing Major Impact of ICT in the Life
    Sciences
  • Synergistic and sustained growth of two major
    late 20th Century technologies
  • Building the framework for World Wide Workflow
  • Share resources, access resources seamlessly
  • Build sophisticated automated workflows
    comprising interconnection of people,
    computation, data and bioinstrumentation

69
  • Thank you for this opportunity to share this with
    you.
  • Tan Tin Wee
  • Tinwee_at_bic.nus.edu.sg