Title: Use of Data Provenance and the Grid in Medical Image Analysis and Drug Discovery: an IXI exemplar
- Kelvin K. Leung1, Mark Holden1, Rolf A. Heckemann2, Nadeem Saeed3, Keith J. Brooks3, Jacky B. Buckton4, Kumar Changani3, David G. Reid3, Daniel Rueckert5, Joseph V. Hajnal2, Derek L.G. Hill1
- 1Division of Imaging Sciences, King's College London, UK
- 2Imaging Sciences Department, Imperial College (Hammersmith Hospital Campus), UK
- 3Imaging Centre, 4RA Disease Biology, ri-CEDD, GlaxoSmithKline, UK
- 5Department of Computing, Imperial College, UK
2Overview
- Background
- Motivations
- Virtual data system
- Automatic delineation of multiple bones in serial MR images of joints in a disease model of Rheumatoid Arthritis (RA)
- Image registration and segmentation propagation
- Methods
- Prototype
- Results
- Conclusions
3Motivations
- Medical imaging is going to play an important part in drug discovery
  - Recent £76m investment by GlaxoSmithKline (GSK) and Imperial College in a new clinical imaging centre
- Automatic analysis of medical image data requires
  - Lots of storage space (each image is about 32 MB in this work)
  - Computational power (running time is about 20-24 hours to process one image on a single desktop computer in this work)
- Motivated by the need for computational resources
4Motivations
- The Grid has the potential to allow better collaboration between industry and universities through the idea of a virtual organisation
  - A university can provide image analysis algorithms as services to industry partners, such as GSK, over the Grid
- Motivated by the need for better and more effective collaboration with industry
5Motivations
- Detailed and reliable documentation of the data provenance of every analysis is very important for obtaining regulatory approval for a new drug
  - Part 11 of the Guidance for Industry issued by the US Food and Drug Administration (FDA)
  - Good Laboratory Practice (GLP) and Good Clinical Practice (GCP)
- Motivated by the need for data provenance
7Virtual data system (VDS or Chimera)
- A system to enable documentation of data provenance, discovery of available methods and on-demand data generation (so-called virtual data)
- Developed by I. Foster, J. Vöckler, M. Wilde and Y. Zhao of the University of Chicago
- It consists of
  - A virtual data catalogue: a virtual data schema that provides a representation of computational procedures and their invocations
  - A virtual data language interpreter, which handles all requests for constructing and querying the database entries
- Data objects, such as input and output files, are described by logical file names (LFN), which are mapped to physical files via the Globus replica catalog (RC) or the Globus replica location service (RLS)
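The mapping from logical to physical file names can be pictured with a small sketch. This is a toy in-memory stand-in written for illustration only; it does not use the Globus RC or RLS interfaces, and the class name, method names, file names and URLs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Toy replica catalogue (hypothetical; not the Globus RLS API)."""

    replicas: dict[str, list[str]] = field(default_factory=dict)

    def register(self, lfn: str, pfn: str) -> None:
        # One logical file name (LFN) may have several physical copies (PFNs).
        self.replicas.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn: str) -> list[str]:
        # An empty list means no physical copy is known yet, so the
        # file would have to be generated on demand ("virtual data").
        return list(self.replicas.get(lfn, []))


catalog = ReplicaCatalog()
# Hypothetical file name and locations, for illustration only.
catalog.register("subject01.img", "gsiftp://grid.example.org/data/subject01.img")
catalog.register("subject01.img", "file:///scratch/subject01.img")

print(catalog.lookup("subject01.img"))
print(catalog.lookup("subject01_calcaneus.img"))
```

Workflows refer only to the logical name; which physical copy is used can then be decided at planning time.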
8Virtual data system
- Virtual data language (VDL) is used to describe computational procedures and their invocations
- Computational procedures are defined by transformation (TR) statements
  - Example: TR foo(input file1, output file2)
- Invocations are defined by derivation (DV) statements
  - Example: to invoke foo with logical filenames file_a (input) and file_b (output)
  - DV call_foo->foo(file1=@{input:"file_a"}, file2=@{output:"file_b"})
- The virtual data schema allows the storage of TRs and DVs
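The relationship between transformations and derivations can be modelled as two record types. The sketch below is a hypothetical Python model, not VDL and not the VDS schema; the class and function names are assumptions. It mirrors the foo example: a transformation declares formal arguments, a derivation binds them to logical file names, and a provenance query asks which derivation produced a given file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Transformation:
    """A computational procedure with named formal arguments."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class Derivation:
    """One invocation: formal arguments bound to logical file names."""

    name: str
    transformation: Transformation
    bindings: dict[str, str]

    def __post_init__(self) -> None:
        # Every formal argument must be bound exactly once.
        formals = set(self.transformation.inputs) | set(self.transformation.outputs)
        if set(self.bindings) != formals:
            raise ValueError(f"bindings must cover exactly {sorted(formals)}")

    def input_files(self) -> list[str]:
        return [self.bindings[a] for a in self.transformation.inputs]

    def output_files(self) -> list[str]:
        return [self.bindings[a] for a in self.transformation.outputs]


def producer_of(lfn: str, derivations: list[Derivation]) -> Derivation | None:
    """Provenance query: which derivation, if any, produced this file?"""
    for dv in derivations:
        if lfn in dv.output_files():
            return dv
    return None


foo = Transformation("foo", inputs=("file1",), outputs=("file2",))
call_foo = Derivation("call_foo", foo, {"file1": "file_a", "file2": "file_b"})

dv = producer_of("file_b", [call_foo])
print(dv.name, dv.input_files())
```

Because each derivation records its transformation and inputs, the history of any output file can be traced back step by step.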
9Virtual data system
- Compound TRs can be built so that workflows can be defined
  - Example: to call foo twice and pass the output of the first call to the input of the second call
  - TR compound_foo(input file_in, output file_out, io file_io)
    - call foo(file1=@{input:file_in}, file2=@{output:file_io})
    - call foo(file1=@{input:file_io}, file2=@{output:file_out})
- When an output file is requested from the system, an abstract DAG (containing only LFNs) is generated
- A planner called Planning for Execution in Grids (Pegasus) converts the abstract DAG into a Condor DAGMan script and submits it to the Globus universe of Condor
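How an abstract DAG follows from a request for one output file can be sketched as below. This is an illustrative reconstruction, not the VDS or Pegasus implementation: the job names and the `abstract_dag` helper are hypothetical, and the graph contains only logical file names, as on the slide. Starting from the requested file, the sketch walks backwards through the jobs that produce each required input.

```python
from graphlib import TopologicalSorter

# job name -> (input LFNs, output LFNs); hypothetical names that mirror
# the two calls to foo in the compound transformation example.
DERIVATIONS = {
    "foo_step1": (["file_in"], ["file_io"]),
    "foo_step2": (["file_io"], ["file_out"]),
    "unrelated": (["other_in"], ["other_out"]),
}


def abstract_dag(target_lfn, derivations):
    """Return {job: set of jobs it depends on} needed to produce target_lfn."""
    producer = {out: job for job, (_, outs) in derivations.items() for out in outs}
    dag = {}
    pending = [target_lfn]
    while pending:
        lfn = pending.pop()
        job = producer.get(lfn)
        if job is None or job in dag:
            continue  # a raw input file, or a job already in the graph
        inputs, _ = derivations[job]
        dag[job] = {producer[i] for i in inputs if i in producer}
        pending.extend(inputs)
    return dag


dag = abstract_dag("file_out", DERIVATIONS)
# static_order() lists every job after the jobs it depends on.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Jobs that do not contribute to the requested file are left out of the graph, which is what makes on-demand generation cheaper than rerunning everything.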
11Automatic delineation of multiple bones
- Rheumatoid Arthritis (RA)
  - Is a chronic, systemic, autoimmune inflammatory disease
  - Targets synovial joints, in which there is a massive accumulation of blood-borne cells such as T cells and macrophages
  - Blood vessels are formed to support this new tissue and the whole mass is called pannus
  - Progressive erosion of cartilage and bone leads to disability in patients
- MR images were acquired in a disease model of RA
- Interested in the talus bone and the calcaneus bone in the ankle
- Delineate them from the MR images and study them, e.g. calculate volume to measure any erosion
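Once a bone has been delineated, its volume is the number of labelled voxels multiplied by the volume of one voxel. The sketch below shows that calculation on a tiny made-up label image; the label values, voxel size and function name are assumptions chosen for illustration, not values from this work.

```python
CALCANEUS = 1  # hypothetical label values
TALUS = 2


def structure_volume_mm3(labels, label, voxel_size_mm):
    """Volume of one labelled structure.

    labels is a nested list indexed [z][y][x];
    voxel_size_mm is (dx, dy, dz) in millimetres.
    """
    dx, dy, dz = voxel_size_mm
    n_voxels = sum(v == label for plane in labels for row in plane for v in row)
    return n_voxels * dx * dy * dz


# A made-up 2 x 2 x 2 label image: three calcaneus voxels, two talus voxels.
labels = [
    [[1, 1], [0, 2]],
    [[1, 0], [2, 0]],
]

print(structure_volume_mm3(labels, CALCANEUS, (0.5, 0.5, 1.0)))
print(structure_volume_mm3(labels, TALUS, (0.5, 0.5, 1.0)))
```

Comparing this number across the serial images of one subject gives a simple measure of change in bone volume over time.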
12Image registration
- Refers to the spatial alignment of two images so that corresponding features in the two images are matched
- The result is a spatial mapping or transformation that transforms positions in one image to positions in the other image
- Example: movie showing the rigid registration of two 3D MR images of a knee
13Image registration
- Rigid registration = translation + rotation = 6 degrees of freedom (dof)
- Affine registration = rigid + skewing + scaling = 12 dof
- Nonrigid registration = warping one image into another
  - Very computationally demanding because of the large number of dof
  - Example: free-form deformation (FFD) models local deformation as translation of a regularly spaced grid of points (control points)
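The six degrees of freedom of a rigid transformation can be made concrete as a 4 x 4 homogeneous matrix built from three translations and three rotation angles. The sketch below is a generic illustration, not the registration software used in this work; the rotation order (x, then y, then z) and the function names are assumptions. An affine transformation would instead allow all 12 entries of the top three rows to vary freely.

```python
import math


def matmul(a, b):
    """Product of two 4 x 4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_matrix(tx, ty, tz, rx, ry, rz):
    """Rigid transform from 3 translations and 3 rotation angles (radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1, 0, 0, 0], [0, cx, -sx, 0], [0, sx, cx, 0], [0, 0, 0, 1]]
    rot_y = [[cy, 0, sy, 0], [0, 1, 0, 0], [-sy, 0, cy, 0], [0, 0, 0, 1]]
    rot_z = [[cz, -sz, 0, 0], [sz, cz, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    trans = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
    # Assumed convention: rotate about x, then y, then z, then translate.
    return matmul(trans, matmul(rot_z, matmul(rot_y, rot_x)))


def apply(m, point):
    """Map a 3D point through a 4 x 4 homogeneous matrix."""
    v = (*point, 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


m = rigid_matrix(10.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2)
p, q = (1.0, 0.0, 0.0), (0.0, 2.0, 3.0)
print(apply(m, p))
# A rigid transform preserves distances between points.
print(math.dist(p, q), math.dist(apply(m, p), apply(m, q)))
```

Registration then amounts to searching these six parameters for the values that best align the two images; a nonrigid method such as FFD searches over the displacements of every control point instead, which is why it is so much more expensive.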
14Segmentation propagation
- Makes use of the spatial mapping calculated from the registration of two images to perform segmentation
- Requires an atlas
  - An atlas is a reference image with labelled structures
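The idea can be sketched in two dimensions: for each pixel of the target image, the spatial mapping gives the corresponding position in the atlas, and the atlas label found there is copied across. This is a minimal illustration with a made-up atlas and a simple shift as the mapping; it assumes nearest-neighbour lookup and is not the implementation used in this work.

```python
def propagate_labels(atlas_labels, target_shape, target_to_atlas):
    """Label a target image by looking up atlas labels through a mapping.

    target_to_atlas(row, col) returns the matching atlas position,
    i.e. the spatial mapping found by registration.
    """
    rows, cols = target_shape
    a_rows, a_cols = len(atlas_labels), len(atlas_labels[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ar, ac = target_to_atlas(r, c)
            ar, ac = round(ar), round(ac)  # nearest-neighbour lookup
            if 0 <= ar < a_rows and 0 <= ac < a_cols:
                out[r][c] = atlas_labels[ar][ac]
    return out


# Made-up atlas: label 1 marks a structure in the second row.
atlas = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Assume registration found that the target is the atlas shifted down one row.
segmentation = propagate_labels(atlas, (4, 4), lambda r, c: (r - 1, c))
for row in segmentation:
    print(row)
```

With a nonrigid mapping the lookup function is more complex, but the labelling step itself stays the same, so one manual segmentation of the atlas can be reused for every target image.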
15Segmentation propagation
- [Figure: segmentation propagation. Labels: atlas (reference image with manual segmentation of the calcaneus), target image, calcaneus]
- All image analysis workflows were entered into VDS
17Prototype
- Simple web interface to replace some command line tools of VDS, Globus Toolkit 2.4 and Condor
  - Researchers or clinicians working on medical image analysis may not be comfortable with command line tools and the virtual data language
- Developed using Java servlets on Apache Tomcat
- Web pages for
  - Querying VDS for transformations and derivations
  - Invoking transformations in VDS
  - Querying, uploading and downloading files to and from Globus RLS
  - Displaying job status in Condor
18Prototype
- Web portal machine running Apache Tomcat, Globus client and personal Condor (job submission site)
- Grid machine running Globus Gatekeeper, GridFTP server, Globus RLS and Condor
- Experimental Condor pool of 4 machines (storage and execution site)
20Results
- Slide 20: services
- Slide 21: service to delineate the calcaneus and talus from the target image
- Slide 23: jobs generated
- Slide 24: job status in Condor
- Slide 25: click to download files and view in vtkview
- Slide 26: service to render the surfaces of the bones
- Slide 27: job submitted, job status
- Slide 29: browse all the executed services
32Conclusions
- We integrated Grid middleware and a data provenance tool with medical image processing software in a prototype system, in collaboration with GSK
- Data provenance of the results was kept in VDS, where it can be queried and retrieved easily
  - Aims to satisfy the guidelines issued by the US FDA, GLP and GCP on the maintenance of audit trails for electronic records
- The total processing time for delineating 12 bones from 6 subjects was cut from about 132 hours to about 33 hours (a factor of 4) by running the computing tasks on a Condor pool instead of on a single desktop computer
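The reported figures are consistent with a simple back-of-envelope calculation. The sketch below is one possible reading, not a breakdown given in the slides: it assumes one image per subject, a processing time of 22 hours per image (the midpoint of the 20-24 hour range quoted earlier), and perfect scaling across the 4 machines of the experimental Condor pool.

```python
def processing_hours(subjects, hours_per_image, machines):
    """Total wall-clock hours, assuming jobs spread evenly over the machines."""
    serial = subjects * hours_per_image
    return serial / machines


subjects = 6
hours_per_image = 22  # assumed midpoint of the 20-24 hour range
pool_machines = 4     # size of the experimental Condor pool

serial_hours = processing_hours(subjects, hours_per_image, machines=1)
pool_hours = processing_hours(subjects, hours_per_image, machines=pool_machines)
speedup = serial_hours / pool_hours

print(serial_hours, pool_hours, speedup)
```

Under these assumptions the speed-up equals the pool size, which suggests the workload is dominated by independent jobs with little scheduling or data transfer overhead.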
33Further work
- More user feedback is required to evaluate and improve the system
- Further validation and application to a larger number of subjects are required to determine the sensitivity of the delineation technique to disease progression
34Acknowledgements
- EPSRC
- GlaxoSmithKline (GSK)
- Links
  - IXI: www.ixi.org.uk
  - VDS: www.griphyn.org/chimera