Title: EPSRC e-Science Pilot Project in Integrative Biology, David Gavaghan, Damian Mac Randal, and Sharon Lloyd
2 EPSRC e-Science Pilot Project in Integrative Biology
David Gavaghan, Damian Mac Randal, and Sharon Lloyd
3 Project Overview
- Focus of the first round of UK e-Science projects:
  - Data storage, aggregation, and synthesis
  - Life Sciences projects focused on supporting the data-generation work of laboratory-based scientists
- The key goal now is to turn this wealth of data into information that can be used to determine biological function
- This requires an iterative interplay between experiment, mathematical modelling, and HPC-enabled simulation
- The primary goal of this project is to build the Grid infrastructure needed to support this goal
4 The Science and e-Science Challenge
- To build an Integrative Biology Grid to support applications scientists addressing the key post-genomic aim of determining biological function
- To use this Grid to begin tackling the two chosen Grand Challenge problems: the in-silico modelling of heart failure and of cancer
5 Two Grand Challenge Research Questions
- What causes heart disease?
- How does a cancer form and grow?
- These two diseases together cause 61% of all deaths in the UK
6 Normal beating vs. fibrillation
Courtesy of Peter Kohl (Physiology, Oxford)
7 Multiscale modelling of the heart
- MRI image of a beating heart
- Fibre orientation ensures correct spread of excitation
- Contraction of individual cells
- Current flow through ion channels
8 Simulation of sudden cardiac death due to a mechanically induced impact applied during repolarisation
Courtesy of W. Li, P. Kohl, and N. Trayanova. J. Mol. Hist. 2004 (in press)
Required 27 hours of CPU time on an SGI IRIX 64 machine
9 Mathematical model of a beating heart by the Auckland Group
10 Multiscale modelling of cancer
11 An integrative approach to disease modelling?
- The potential impact of this approach has been demonstrated by the work on modelling the heart
- The time is ripe to extend it to cancer: the UK has extensive expertise, but little has yet been done
- Together, the two application areas provide an e-Science problem hard enough to require a generic solution
- The methodology and infrastructure will be used across biology and in other scientific domains
12 The scientific challenge
- Modelling and coupling phenomena that occur on many different length and time scales
- 1 m: person
- 1 mm: tissue morphology
- 1 µm: cell function
- 1 nm: pore diameter of a membrane protein
- Range: 10⁹
- 10⁹ s (years): human lifetime
- 10⁷ s (months): cancer development
- 10⁶ s (days): protein turnover
- 10³ s (hours): digest food
- 1 s: heart beat
- 1 ms: ion channel gating
- 1 µs: Brownian motion
- Range: 10¹⁵
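Each "Range" figure on this slide is the ratio of the largest to the smallest scale listed, reading the cell-function scale as 1 µm and the Brownian-motion scale as 1 µs (the values the stated ranges require). A minimal Python sketch, assuming each scale is stored as a base-10 exponent so the arithmetic stays exact, recovers both spans; the dictionary and function names are illustrative only.

```python
# Scales from the slide, stored as base-10 exponents (value = 10**k)
# so that the range arithmetic is exact. Names are illustrative only.
LENGTH_EXPONENTS_M = {
    "person": 0,               # 1 m
    "tissue morphology": -3,   # 1 mm
    "cell function": -6,       # 1 µm
    "membrane pore": -9,       # 1 nm
}

TIME_EXPONENTS_S = {
    "human lifetime": 9,       # years
    "cancer development": 7,   # months
    "protein turnover": 6,     # days
    "digest food": 3,          # hours
    "heart beat": 0,           # 1 s
    "ion channel gating": -3,  # 1 ms
    "Brownian motion": -6,     # 1 µs
}


def orders_of_magnitude(exponents):
    """Span between the largest and smallest scale, in powers of ten."""
    return max(exponents.values()) - min(exponents.values())


length_span = orders_of_magnitude(LENGTH_EXPONENTS_M)
time_span = orders_of_magnitude(TIME_EXPONENTS_S)
print(f"length range: 10^{length_span}")  # length range: 10^9
print(f"time range: 10^{time_span}")      # time range: 10^15
```

Fifteen orders of magnitude in time means that a single model stepping at the 1 µs scale would need on the order of 10¹⁵ steps to cover a human lifetime.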
13 Details of a test run of the heart simulation code on HPCx
- Modelled 2 ms of electrophysiological excitation of a 5700 mm³ volume of tissue from the left ventricular free wall
- Noble 98 cell model used
- Mesh contained 20,886 bilinear elements (spatial resolution 0.6 mm)
- 0.05 ms timestep (40 timesteps in total)
- Required 978 s CPU on 8 processors and 2.5 GB of memory
- A complete simulation of the ventricular myocardium would require up to 30 times the volume and at least 100 times the duration
- Estimated maximum compute time to investigate arrhythmia: 10⁷ s (about 100 days), requiring 100 GB of memory (compute time scales to the power 5/3)
- At high efficiency this scales to approximately 1 day on HPCx
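The figures on this slide can be cross-checked with a few lines of arithmetic. The Python sketch below is an order-of-magnitude check only. It assumes that the 5/3 power quoted on the slide applies to the 30-fold increase in tissue volume, that run time grows linearly with the 100-fold increase in simulated duration, and that memory grows linearly with volume. None of these assumptions is stated on the slide, and all variable names are illustrative.

```python
# Figures quoted for the HPCx test run.
SIMULATED_MS = 2.0    # electrophysiological excitation modelled
TIMESTEP_MS = 0.05    # timestep used
CPU_SECONDS = 978.0   # CPU time on 8 processors
MEMORY_GB = 2.5       # memory used by the test run

# Scale-up for a complete ventricular simulation (from the slide).
VOLUME_FACTOR = 30     # up to 30x the tissue volume
DURATION_FACTOR = 100  # at least 100x the simulated time

# ASSUMPTION: the 5/3 power is applied to the volume factor only,
# and run time is linear in simulated duration.
VOLUME_EXPONENT = 5 / 3

SECONDS_PER_DAY = 86_400

# Number of timesteps in the test run (2 ms / 0.05 ms).
steps = round(SIMULATED_MS / TIMESTEP_MS)

# Naive estimate: everything scales linearly.
linear_estimate_s = CPU_SECONDS * VOLUME_FACTOR * DURATION_FACTOR

# Estimate with the assumed 5/3 power on the volume factor.
scaled_estimate_s = (
    CPU_SECONDS * VOLUME_FACTOR ** VOLUME_EXPONENT * DURATION_FACTOR
)

# ASSUMPTION: memory depends on mesh size (volume), not on duration.
memory_estimate_gb = MEMORY_GB * VOLUME_FACTOR

print(f"timesteps in test run: {steps}")
print(f"linear scaling: {linear_estimate_s:.2e} s "
      f"({linear_estimate_s / SECONDS_PER_DAY:.0f} days)")
print(f"5/3-power scaling: {scaled_estimate_s:.2e} s "
      f"({scaled_estimate_s / SECONDS_PER_DAY:.0f} days)")
print(f"memory: {memory_estimate_gb:.0f} GB")
```

Under these assumptions the 5/3-power estimate falls in the 10⁷ s range quoted on the slide, whereas purely linear scaling would give roughly 3 × 10⁶ s; the memory estimate of about 75 GB is consistent in magnitude with the 100 GB figure.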
14 Key Deliverables
- A robust and fault-tolerant infrastructure to support post-genomic research in integrative biology that is user- and application-driven
- A second-generation Grid bringing together components from across the range of current EPSRC pilot projects
15 The e-Science Challenge
- To leverage the global Grid infrastructure to build an international collaboratory that places the applications scientist within the Grid, allowing fully integrated and collaborative use of:
  - HPC resources (capacity and capability)
  - Computational steering, performance control, and visualisation
  - Storage and data mining of very large data sets
  - Easy incorporation of experimental data
  - User- and science-friendly access
→ Predictive in-silico models to guide experiment and, ultimately, the design of novel drugs and treatment regimes
16 e-Science/Grid Research Issues
- Ability to carry out large-scale distributed coupled HPC simulations reliably and resiliently
- Ability to co-schedule Grid resources based on a GGF-agreed standard
- Use of Grid Services based on OGSA-DAI for data virtualisation
- Secure data management and access control in a Grid environment
- Grid services for computational steering conforming to an agreed GGF standard
17 e-Science/Grid Research (contd.)
- Grid Services for supporting distributed collaborative working, including steering and visualisation
- An interface to Grid resources that understands, and effectively supports, the scientific context of the project
- The project also stretches the cross-disciplinary aspects of the Grid by linking medical, biological, engineering, and computing activities
- The project intends to produce a long-term (10-year) production environment based on the Grid to support what we expect to become a major scientific growth area
18 Architecture and Software Engineering
- Initially use Web Services to provide a platform- and language-independent interface to the main functional components
- Adopt Grid Services as stable, open-source, OGSA-compliant implementations become available
- Deploy an object-oriented, component-based toolkit allowing a plug-and-play programming paradigm
- Use portal technologies to provide collaborative access to services
19 Architecture
20 Architecture
21 Technology Gaps that will be addressed
- Much of this work will be in conjunction with other EPSRC pilot projects
- Resilient, robust, reliable Grid framework for large-scale distributed coupled simulations
- Standardised Grid framework for computational steering and visualisation
- Metadata schemas for describing the information and data resources involved
- Standardised means to schedule multiple resources on the Grid concurrently
- Tools for collaborative working in a Grid Services environment
- Transparent Grid
22 Project management
- Building on extensive experience in other e-Science projects (particularly e-DiaMoND)
- Focus on team building and common goals (key for large, inter-institutional development projects)
- Establishing good communication mechanisms
- Iterative prototype development
23 The Team
- World-leading expertise in the two application areas
- IBM
- CCLRC
- Seven UK and NZ universities (Oxford, Nottingham, Leeds, UCL, Birmingham, Sheffield, and Auckland)
- Expertise from across the UK e-Science Programme
- Extensive existing connectivity between all members of the consortium and with the wider research communities in e-Science and within the application areas
- Research training in an area crucial to the UK
24 The Resources
- £2.44M from EPSRC e-Science to fund 10 PDRAs and 6 PhD students
- A further 4 PhD students, plus systems administration and secretarial support, funded internally
- The equivalent of 3 FTEs from IBM, plus substantial hardware discounts to provide a POWER4 server and high-performance workstations to all project staff
- Use of the Atlas Data Store at RAL and a substantial commitment of staff time by CCLRC
- A large pool of expertise through the co-investigators in the seven partner universities, IBM, and CCLRC
- Extensive access to national HPC resources (HPCx and CSAR)
25 Current Status
- Award letter issued 26/9/03, agreed by the University in late October; grant announced 26/10/03
- Project manager, project architect, six PDRAs, and one D.Phil. student already appointed
- Project structure defined and agreed; requirements-gathering and security-policy exercises commenced
- Recruitment of other staff in progress
- Kick-off meeting of project participants held in Oxford on January 19th