e-HTPX - PowerPoint PPT Presentation

Provided by: davem169
Transcript and Presenter's Notes

Title: e-HTPX


1
e-HTPX: HPC, Grid and Web-Portal Technologies in
High Throughput Protein Crystallography
Rob Allan (r.j.allan_at_dl.ac.uk), Ronan Keegan
(r.m.keegan_at_dl.ac.uk), David Meredith
(d.j.meredith_at_dl.ac.uk), Martyn Winn
(m.d.winn_at_dl.ac.uk), Graeme Winter
(g.winter_at_dl.ac.uk), CCLRC Daresbury
Laboratory
Jonathan Diprose (jon_at_strubi.ox.ac.uk), Chris Mayo
(chris.mayo_at_strubi.ox.ac.uk), University of Oxford,
The Wellcome Trust Centre for Human Genetics, Oxford
Ludovic Launer (launer_at_embl-grenoble.fr), MRC France,
ESRF, Grenoble
Joel Fillon (fillon_at_ebi.ac.uk), European
Bioinformatics Institute, Cambridge
Paul Young (pyoung_at_ysbl.york.ac.uk), York Structural
Biology Laboratory
2
e-HTPX Overview
  • The vast amounts of data coming from the genome
    projects have generated a demand for new methods
    of obtaining structural information about proteins
    and macromolecules, and in particular for high
    throughput structural biology to determine the
    structures of important proteins.
  • e-HTPX: a distributed computing infrastructure
    for remotely planning, initiating and monitoring
    protein crystallography experiments for
    structure determination (workflow).
  • Relies heavily on Grid-portal, web-service and HPC
    technologies.
  • The project integrates a number of key services
    provided by UK e-Science, protein manufacture and
    synchrotron laboratories.

3
e-HTPX Workflow
  • Stage 1: Select protein target
  • Stage 2: Crystallisation of protein
  • Stage 3: Data collection (X-ray diffraction
    images, scaling and integration)
  • Stage 4: Structure solution (HPC data processing
    to derive a digital protein model)
  • Stage 5: Submit model into public database
[Diagram: workflow from Start (Target Selection) through Structure Solution to Finish]
  • A single, all-encompassing web interface from
    which users can plan, initiate, direct and
    document the experimental workflow, either locally
    or remotely from a desktop computer.
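The workflow is a strictly ordered pipeline, although slide 11 also shows a separate entry point part-way through Stage 4. A minimal Python sketch of tracking an experiment's progress through the five stages follows; the `Stage` and `Experiment` names are hypothetical illustrations, not classes from e-HTPX itself.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five e-HTPX workflow stages, in pipeline order."""
    TARGET_SELECTION = 1
    CRYSTALLISATION = 2
    DATA_COLLECTION = 3
    STRUCTURE_SOLUTION = 4
    DEPOSITION = 5


class Experiment:
    """Hypothetical record of one experiment's progress (not an e-HTPX class)."""

    def __init__(self, name: str, start: Stage = Stage.TARGET_SELECTION) -> None:
        # A later start stage models entering the pipeline part-way through.
        self.name = name
        self.stage = start
        self.history = [start]
        self.finished = False

    def advance(self) -> Stage:
        """Move to the next stage; advancing past Stage 5 marks the experiment finished."""
        if self.finished:
            raise RuntimeError(f"{self.name} has already finished")
        if self.stage is Stage.DEPOSITION:
            self.finished = True
        else:
            self.stage = Stage(self.stage + 1)
            self.history.append(self.stage)
        return self.stage
```

Calling `advance()` in a loop until `finished` is true walks an experiment from target selection to deposition; passing `start=Stage.STRUCTURE_SOLUTION` mimics a job that enters at Stage 4.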

4
Key Technologies
[Architecture diagram; components shown:]
  • Web server: JSP, scoped beans, Java CoG Kit,
    Apache Axis, Apache Tomcat
  • MyProxy server; credentials uploaded with a Java
    Web Start application (username and password)
  • Grid nodes: SRS beamline machine and HPC cluster
    (Sun Grid Engine), connected by Grid FTP
    (including 3rd-party transfers) and RSL job
    submission
  • Web services (PPDM) and beamline database
  • IP recognition through firewall
5
Stages 1 and 2 (Target Selection and Protein
Production)
6
Web-Service Call Stack and PPDM
  • Web-service call stack
  • A complex sequence of communications is required
    between the user and the different laboratories
    involved (protein production and delivery to
    SRS).
  • A hub centralizes all the requests and responses
    between the user and the various labs involved.
  • PPDM: Protein Production Data Model
  • Provides a model for exchanging information
    between the different partners in the high
    throughput process
  • Communicates with hardware, databases, LIMS and
    the different stages of the protein
    crystallography pipeline
  • Each facility can implement the service
    independently, according to an agreed standard
  • Describes many components (experiments,
    molecules, sample constituents)
  • Expressed in various languages: XML Schema, SQL,
    Java classes, Python classes
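Because PPDM is expressed both as XML Schema and as Python classes, one way to picture it is a set of classes that serialise to XML for exchange between facilities. The sketch below assumes a tiny, hypothetical subset (`Molecule`, `SampleConstituent`, `Sample` with invented fields); it is not the real PPDM schema, only an illustration of the agreed-model idea.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Molecule:
    # Hypothetical fields; the real PPDM schema is far richer.
    name: str
    sequence: str


@dataclass
class SampleConstituent:
    molecule: Molecule
    concentration_mg_ml: float


@dataclass
class Sample:
    sample_id: str
    constituents: list = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialise the sample so another facility can parse it."""
        root = ET.Element("Sample", id=self.sample_id)
        for c in self.constituents:
            el = ET.SubElement(root, "Constituent",
                               concentration=str(c.concentration_mg_ml))
            mol = ET.SubElement(el, "Molecule", name=c.molecule.name)
            mol.text = c.molecule.sequence
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Sample":
        """Rebuild a Sample from the XML produced by to_xml()."""
        root = ET.fromstring(text)
        sample = cls(root.get("id"))
        for el in root.findall("Constituent"):
            mol = el.find("Molecule")
            sample.constituents.append(SampleConstituent(
                Molecule(mol.get("name"), mol.text or ""),
                float(el.get("concentration"))))
        return sample
```

The round trip (`Sample.from_xml(s.to_xml()) == s`) is what lets each facility implement its side independently: as long as both ends agree on the element and attribute names, neither needs the other's code.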

7
e-HTPX Hub Interface
  • Interface used to plan an experiment and input
    the required data
  • (e.g. online completion of safety forms,
    specification of crystal growth conditions,
    remote authorization of necessary permissions and
    allocation of sufficient beam-time)
  • The interface simplifies the complex web-service
    call stack
  • (The status of each call is automatically updated
    and presented to the user)
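Slides 6 and 7 describe a hub that centralizes requests to the various labs and records the status of each call for display. A minimal in-process Python sketch of that pattern follows; in e-HTPX the calls are web services (Apache Axis), whereas here each lab service is a hypothetical plain function, and the `Hub` and `CallStatus` names are invented for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class CallStatus(Enum):
    PENDING = "pending"
    OK = "ok"
    FAILED = "failed"


class Hub:
    """Hypothetical hub routing each user request to the lab service that handles it."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[dict], dict]] = {}
        self.log: List[Tuple[str, CallStatus]] = []  # what the interface displays

    def register(self, lab: str, service: Callable[[dict], dict]) -> None:
        self._services[lab] = service

    def call(self, lab: str, request: dict) -> dict:
        """Forward a request and record its status so the interface can show it."""
        index = len(self.log)
        self.log.append((lab, CallStatus.PENDING))
        try:
            response = self._services[lab](request)
        except Exception as exc:  # unknown lab or an error inside the service
            self.log[index] = (lab, CallStatus.FAILED)
            return {"error": str(exc)}
        self.log[index] = (lab, CallStatus.OK)
        return response
```

Keeping the status log in the hub, rather than in each lab, is what lets a single interface show the state of every call in the stack.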

8
Protein Visualisation Web-Service
  • OPPF (Oxford Protein Production Facility):
    automatic pipetting into 96-well trays,
    facilities for imaging the wells, and a database.
    Images of the wells can be provided to the user
    over the internet.
  • Colour codes indicate the likelihood that a
    crystal has developed in each droplet.
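Standard 96-well trays are laid out as 8 rows (A to H) by 12 columns (1 to 12), so images and colour codes can be keyed by well name. The sketch below converts between a 0-based index and a well name, assuming row-major numbering; that ordering is an illustrative choice, not something the slides specify.

```python
import string

ROWS = string.ascii_uppercase[:8]   # "ABCDEFGH"
COLS = range(1, 13)                 # columns 1..12


def well_name(index: int) -> str:
    """Map a 0-based, row-major well index to a name such as 'A1' or 'H12'."""
    if not 0 <= index < 96:
        raise ValueError("a 96-well tray has indices 0..95")
    row, col = divmod(index, 12)
    return f"{ROWS[row]}{col + 1}"


def well_index(name: str) -> int:
    """Inverse of well_name, e.g. 'B3' -> 14."""
    row = ROWS.index(name[0].upper())  # raises ValueError for rows outside A..H
    col = int(name[1:])
    if col not in COLS:
        raise ValueError(f"bad column in {name!r}")
    return row * 12 + (col - 1)


def all_wells() -> list:
    """Every well name in the tray, in index order."""
    return [well_name(i) for i in range(96)]
```

A per-tray dictionary such as `{well: image_url}` or `{well: colour_code}` built over `all_wells()` would then give the web page one entry per droplet.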

9
Stage 3: Data Collection (X-ray Diffraction and
Collection of Images)
  • A typical experiment on a high-brilliance
    beamline may generate a few gigabytes of data.
  • Data collection involves automated X-ray
    diffraction facilities, including sample changers
    to exchange crystals on the beamline.
  • Automation of this type is essential for remote
    operation.
  • The system is being linked into a database
    that stores the user's requests and handles the
    data for individual samples.
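SQL is one of the languages PPDM is expressed in, and the beamline database stores user requests against individual samples. A minimal `sqlite3` sketch of such a store follows; the two-table schema, column names and `queued` status are hypothetical, since the slides do not describe the real database.

```python
import sqlite3

# Hypothetical schema; the real beamline database is not described in the slides.
SCHEMA = """
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    protein   TEXT NOT NULL
);
CREATE TABLE request (
    request_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id    TEXT NOT NULL REFERENCES sample(sample_id),
    resolution_a REAL,                     -- requested resolution, angstrom
    status       TEXT NOT NULL DEFAULT 'queued'
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # reject requests for unknown samples
    conn.executescript(SCHEMA)
    return conn


def add_sample(conn: sqlite3.Connection, sample_id: str, protein: str) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO sample VALUES (?, ?)", (sample_id, protein))


def queue_request(conn: sqlite3.Connection, sample_id: str, resolution_a: float) -> int:
    with conn:
        cur = conn.execute(
            "INSERT INTO request (sample_id, resolution_a) VALUES (?, ?)",
            (sample_id, resolution_a))
    return cur.lastrowid


def queued_for_sample(conn: sqlite3.Connection, sample_id: str) -> list:
    rows = conn.execute(
        "SELECT request_id, resolution_a FROM request "
        "WHERE sample_id = ? AND status = 'queued' ORDER BY request_id",
        (sample_id,))
    return rows.fetchall()
```

The foreign key ties every request to a registered sample, so a request for a crystal that was never shipped fails at insert time rather than at the beamline.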

10
Stage 3: Data Collection
[Flow diagram; steps as labelled:]
1) Start: specify experimental requirements (portal)
2) Expert system providing automated and
   synchronous analysis / verification of data
   quality
3) Feedback: modify data collection parameters
4) Data collection (Beamline Control Module; sample
   changer, diffractometer and detector at the SRS
   beamline), producing X-ray diffraction images
5) Finish: data collection and processing complete
6) Start Stage 4: further HPC data processing;
   data moved by Grid FTP to a Grid node storage
   facility
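Steps 2 to 4 form a feedback loop: collect, let the expert system assess data quality, adjust the parameters, and collect again until the quality is acceptable. The Python sketch below shows that control loop only. The `collect` callable stands in for the beamline control module plus expert system, and the parameter names, quality score and adjustment rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class CollectionParams:
    # Hypothetical parameters; a real beamline exposes many more.
    exposure_s: float
    oscillation_deg: float


def collect_with_feedback(collect: Callable[[CollectionParams], float],
                          params: CollectionParams,
                          target_quality: float,
                          max_rounds: int = 5
                          ) -> Tuple[CollectionParams, List[Tuple[CollectionParams, float]]]:
    """Collect, assess quality, adjust parameters and repeat.

    `collect` runs one collection and returns a quality score in [0, 1].
    """
    history = []
    for _ in range(max_rounds):
        quality = collect(params)              # data collection + expert-system check
        history.append((params, quality))
        if quality >= target_quality:
            return params, history             # finish: collection complete
        # Hypothetical feedback rule: longer exposure, finer oscillation.
        params = CollectionParams(params.exposure_s * 1.5,
                                  params.oscillation_deg / 2)
    raise RuntimeError("quality target not reached; refer to a scientist")
```

Bounding the loop with `max_rounds` matters for unattended remote operation: a crystal that never diffracts well enough is handed back to a person instead of consuming beam-time indefinitely.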
11
Stage 4: Solve Structure of Protein
1) Continue pipeline (end of Stage 3)
1) Alternative: new entry point (3rd-party Grid FTP
   data transfer and job submission)
2) Job submission:
   a) Globus 2.4 GRAM job manager: automated job
      submission and data transfer (continuation of
      the e-HTPX pipeline)
   b) Sun Grid Engine batch queuing
3) CCP4 code suite: key codes parallelized (Beast,
   Molecular Replacement, Scala, Mosflm)
4) Digital protein model
5) Submit model into database
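GT2 GRAM job managers accept job descriptions written in RSL (Resource Specification Language). The portal builds these through the Java CoG Kit; as a language-neutral illustration, the Python sketch below assembles an RSL string. The attribute names (`executable`, `arguments`, `count`, `jobType`, `stdout`, `stderr`, `queue`) are standard GT2 RSL attributes, but the function itself is hypothetical, and it rejects values containing double quotes rather than guessing at escaping rules. The string would still need to be submitted to a job manager.

```python
from typing import List, Optional


def _quote(value: str) -> str:
    # Reject embedded quotes instead of guessing at RSL escaping rules.
    if '"' in value:
        raise ValueError(f"unsupported quote character in {value!r}")
    return f'"{value}"'


def build_rsl(executable: str,
              arguments: Optional[List[str]] = None,
              count: int = 1,
              job_type: str = "single",
              stdout: Optional[str] = None,
              stderr: Optional[str] = None,
              queue: Optional[str] = None) -> str:
    """Build a GT2-style RSL conjunction (&(attr=value)...) for a GRAM job manager."""
    parts = [f"(executable={_quote(executable)})"]
    if arguments:
        parts.append("(arguments=" + " ".join(_quote(a) for a in arguments) + ")")
    parts.append(f"(count={count})")        # number of processes
    parts.append(f"(jobType={job_type})")   # e.g. single or mpi
    for name, value in (("stdout", stdout), ("stderr", stderr), ("queue", queue)):
        if value is not None:
            parts.append(f"({name}={_quote(value)})")
    return "&" + "".join(parts)
```

For a parallelized CCP4 run one might call `build_rsl("/opt/ccp4/bin/mr_program", ["-in", "data.mtz"], count=8, job_type="mpi", stdout="mr.out", stderr="mr.err")`, where the executable path and file names are hypothetical placeholders.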
12
Remotely Accessing the Facilities
  • Monitor status of Grid FTP hosts and GRAM job
    managers
  • Interface to Grid FTP (JSP, servlets, Java CoG
    Kit)
  • e-HTPX requires secure transfer of diffraction
    images to HPC for structure solution
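A first level of host monitoring is simply checking whether the relevant TCP ports accept connections: 2811 is the default GridFTP control port (the port slide 13 requires to be open) and 2119 is the default GT2 gatekeeper port. The Python sketch below does only that. A successful connection says nothing about whether the server is healthy or whether the user's credentials are valid, and the helper names are hypothetical.

```python
import socket

GRIDFTP_PORT = 2811   # default GridFTP control port
GRAM_PORT = 2119      # default GT2 gatekeeper port


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def check_hosts(hosts) -> dict:
    """Map each (host, port) pair to its reachability."""
    return {(h, p): port_open(h, p) for h, p in hosts}
```

A status page could call `check_hosts([(name, GRIDFTP_PORT), (name, GRAM_PORT)])` for each Grid node, using a short timeout so one unreachable host does not stall the page.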

13
Remotely Accessing the Facilities (Upload Data
from Remote Machine)
1) Run the proxy delegation tool from the e-HTPX
   portal with Java Web Start (downloads digitally
   signed jars; delegation via web services)
2) Run the Grid FTP file transfer tool (also a Web
   Start download of digitally signed jars) to
   upload the data
Remote machine requirements: Java Web Start,
Internet access, port 2811 open
14
Remotely Accessing the Facilities
  • Job submission interface (session-scope Java
    beans, Java CoG Kit, GT2.4)
  • Batch and interactive jobs, staging of
    executables, stdout and stderr redirection
  • Monitoring status of jobs (application-scope
    job-monitor bean)
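An application-scope bean is a single object shared by every user session, so its updates must be thread-safe, while each session only shows its own user's jobs. The Python sketch below mirrors that arrangement with a lock-protected registry; `JobMonitor` is a hypothetical name, and the states are a subset of the GRAM job states (PENDING, ACTIVE, DONE, FAILED).

```python
import threading
from enum import Enum
from typing import Dict, Optional, Tuple


class JobState(Enum):
    # Subset of the GRAM job states.
    PENDING = "PENDING"
    ACTIVE = "ACTIVE"
    DONE = "DONE"
    FAILED = "FAILED"


class JobMonitor:
    """One shared instance for all users, analogous to an application-scope bean."""

    _TERMINAL = {JobState.DONE, JobState.FAILED}

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[str, Tuple[str, JobState]] = {}  # job id -> (owner, state)

    def submitted(self, job_id: str, owner: str) -> None:
        with self._lock:
            self._jobs[job_id] = (owner, JobState.PENDING)

    def update(self, job_id: str, state: JobState) -> None:
        with self._lock:
            owner, current = self._jobs[job_id]
            if current in self._TERMINAL:
                raise ValueError(f"{job_id} already finished as {current.value}")
            self._jobs[job_id] = (owner, state)

    def state(self, job_id: str) -> Optional[JobState]:
        with self._lock:
            entry = self._jobs.get(job_id)
            return entry[1] if entry else None

    def jobs_for(self, owner: str) -> Dict[str, JobState]:
        """The view one user's session would display."""
        with self._lock:
            return {j: s for j, (o, s) in self._jobs.items() if o == owner}
```

Refusing updates after DONE or FAILED guards against a late, out-of-order status callback overwriting a job's final state.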

15
Remotely Accessing the Facilities
  • Custom interfaces for e-HTPX-specific jobs
  • Molecular Replacement (new entry point for part
    of the Stage 4 structure solution process)

16
Conclusions
  • Key Problems Solved
  • Allows biologists to concentrate on the
    scientific questions rather than technical
    details.
  • A comprehensive data model (PPDM) allows each
    facility to implement the service independently,
    according to an agreed standard.
  • Gives biologists access to high-performance
    facilities (HPC, CCP4 codes).
  • Key Technologies: Java Beans, JSP, Servlets,
    Web Start, Java CoG Kit, GT2.4, Web Services,
    Expert Systems, Databases.
  • Future Plans
  • Remotely interfacing with robotic hardware
    (sample changers)
  • Outreach to industry to integrate e-HTPX into
    drug discovery pipelines.