caGrid Version 0.5 Reference Implementation RProteomics caBIG Architecture Workspace Face to Face Georgetown University August 16th -18th, 2005 - PowerPoint PPT Presentation



1
caGrid Version 0.5 Reference Implementation
RProteomics
caBIG Architecture Workspace Face to Face
Georgetown University
August 16th-18th, 2005
  • Patrick McConnell
      • Duke Comprehensive Cancer Center
      • patrick.mcconnell_at_duke.edu
  • Shannon Hastings
      • Ohio State University
      • hastings_at_bmi.osu.edu

2
Outline
  • High Level Overview of Proteomics
  • Data Model
  • Project Architecture
  • Process of getting to Silver level compliance
  • Functionality Exposed to Grid
  • Process of Grid Enablement
  • Demo/Screenshots
  • Lessons Learned / Technical Difficulties / Wish
    List
  • Acknowledgements

3
Proteomics Overview
  • Goal
      • Find biomarkers
      • Build a predictive model
  • Proteins are split into peptide fragments
  • Mass is measured by time-of-flight (TOF)
  • Mass of peptides can be used to identify
    proteins
  • Peptides can undergo a second MS to help
    identification

http://www.appliedbiosystems.com/catalog/myab/StoreCatalog/products/CategoryDetails.jsp?hierarchyID=101&category3rd=112051&trail=no
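
The point that peptide masses can identify proteins can be made concrete with a small calculation. The sketch below is not taken from RProteomics. It assumes a simplified tryptic digest (cleave after K or R unless the next residue is P, no missed cleavages) and uses standard monoisotopic residue masses; the function names are hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(protein):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


if __name__ == "__main__":
    for peptide in tryptic_digest("AKRPGK"):
        print(peptide, round(peptide_mass(peptide), 4))
```

Matching a list of observed masses against the masses computed for each candidate protein is the basic idea behind mass-based identification; a second MS stage adds fragment masses to narrow the candidates.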
4
Proteomics Data
  • A modest study can be on the order of 10 GB of
    data

5
Project Overview
  • RProteomics is a development project in the
    Proteomics SIG of the ICR Workspace
  • Developing analytical routines for proteomics
    data
      • Denoising, background removal, peak
        identification, spectral alignment,
        normalization, peptide quantitation
  • Focus is on analytics
      • NOT databases, LIMS, protein identification
  • RProteomics is a critical step in the proteomics
    pipeline
      • LIMS -> repository -> RProteomics ->
        classification -> protein identification
  • RProteomics provides integration
      • Q5 classification has been integrated
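
To give a feel for the kind of analytical routines listed above, here is a minimal sketch of background removal, denoising, and peak identification on a list of intensities. These are generic textbook techniques (rolling-minimum baseline, moving average, thresholded local maxima), not the RProteomics algorithms, which the slides do not describe. The parameter names `window_size` and `threshold_multiplier` are assumptions for illustration.

```python
import statistics


def remove_background(intensities, window_size):
    """Subtract a rolling-minimum baseline taken over +/- window_size points."""
    n = len(intensities)
    corrected = []
    for i in range(n):
        lo = max(0, i - window_size)
        hi = min(n, i + window_size + 1)
        corrected.append(intensities[i] - min(intensities[lo:hi]))
    return corrected


def denoise(intensities, window_size):
    """Smooth with a moving average over +/- window_size points."""
    n = len(intensities)
    smoothed = []
    for i in range(n):
        lo = max(0, i - window_size)
        hi = min(n, i + window_size + 1)
        window = intensities[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def find_peaks(intensities, threshold_multiplier):
    """Return indices of local maxima above threshold_multiplier * median."""
    cutoff = threshold_multiplier * statistics.median(intensities)
    return [
        i
        for i in range(1, len(intensities) - 1)
        if intensities[i] > cutoff
        and intensities[i] > intensities[i - 1]
        and intensities[i] >= intensities[i + 1]
    ]


if __name__ == "__main__":
    spectrum = [5, 5, 6, 5, 20, 5, 5, 6, 5]
    print(remove_background(spectrum, 2))
    print(find_peaks(spectrum, 3))
```

Each step takes and returns a plain list, so the steps can be chained in whatever order a given analysis calls for.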

6
Statistics: Background Removal

7
Statistics: Denoising

8
Statistics: Spectral Alignment

9
Statistics: Protein Quantitation

10
Data Model
  • mzXML
      • Encodes raw spectral data (m/z-intensity pairs)
      • Some metadata about instrumentation
      • Utilizes base64 encoding for binary data
  • scanFeatures
      • Encodes analysis results as a set of features
      • Some metadata about the experiment
      • Utilizes base64 encoding for binary data
  • Service parameters
      • JpegImage
      • Lsid
      • WindowSize
      • ThreshholdMultiplier
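
The base64 encoding of binary data mentioned above can be sketched as follows. This assumes the layout commonly used in mzXML peak lists: m/z and intensity interleaved as 32-bit floats in network (big-endian) byte order. The slides do not state the precision or the scanFeatures layout, so treat those details, and the function names, as assumptions.

```python
import base64
import struct


def encode_peaks(pairs):
    """Pack (m/z, intensity) pairs as big-endian 32-bit floats, then base64."""
    flat = [value for pair in pairs for value in pair]
    raw = struct.pack(">%df" % len(flat), *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text):
    """Reverse encode_peaks: base64 text back to (m/z, intensity) tuples."""
    raw = base64.b64decode(text)
    count = len(raw) // 4  # 4 bytes per 32-bit float
    values = struct.unpack(">%df" % count, raw)
    return list(zip(values[0::2], values[1::2]))


if __name__ == "__main__":
    encoded = encode_peaks([(100.5, 2.0), (200.25, 4.0)])
    print(encoded)
    print(decode_peaks(encoded))
```

Base64 expands the binary payload by about a third, which is worth keeping in mind given study sizes on the order of 10 GB.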

11
Project Architecture

12
Project Architecture

13
Process of getting to Silver level compliance
  • Programming and messaging interfaces
      • Apache Axis for web services
      • Wrapped functionality with Java interfaces that
        made sense
  • Vocabularies, terminologies, and ontologies
  • Data elements
      • Wrote a tool for XML Schema to XMI conversion
      • Manually curated the UML
      • Went through the semantic connecting process
  • Information models
      • Started from XML Schema, so the information
        models were easy

14
Functionality Exposed to the Grid
  • Analytical service: no security requirements
      • Discuss its input and output and what it does
        scientifically
  • Functionality to be exposed
      • 20 more statistical methods
      • Data access methods, translation methods

15
Process of Grid Enablement
  • Process
      • Create or extract the data types using XML
        Schema
      • Upload the data types into the caGrid GME
      • Use the Analytical Toolkit Portal to create and
        modify the grid service interface
      • Implement the generated server stub by making
        the appropriate calls into the original
        non-grid-enabled RProteomics application
      • Compile and deploy
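
The stub-implementation step above amounts to a thin delegation layer. The sketch below illustrates the pattern in a language-neutral way using Python; the real caGrid 0.5 toolkit generates Java stubs, and every class, method, and field name here is hypothetical.

```python
def legacy_denoise(intensities, window_size):
    """Stands in for an existing, already tested, non-grid routine."""
    n = len(intensities)
    smoothed = []
    for i in range(n):
        lo = max(0, i - window_size)
        hi = min(n, i + window_size + 1)
        window = intensities[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


class GeneratedDenoiseStub:
    """Stands in for the skeleton a toolkit would generate from the interface."""

    def denoise(self, request):
        raise NotImplementedError("service method not implemented yet")


class DenoiseServiceImpl(GeneratedDenoiseStub):
    """Fills in the stub by delegating to the original application code."""

    def denoise(self, request):
        # Unpack the message, call the legacy routine, pack the response.
        result = legacy_denoise(request["intensities"], request["windowSize"])
        return {"intensities": result}


if __name__ == "__main__":
    service = DenoiseServiceImpl()
    print(service.denoise({"intensities": [1.0, 4.0, 1.0], "windowSize": 1}))
```

Keeping the service class free of analysis logic means the original application can still be tested on its own, without the grid layer.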

16
Demo and/or Screenshots
  • Demonstration of RProteomics GUI with grid
    functionality

17
Lessons Learned / Technical Difficulties / Wish List
  • Think grid from the beginning
      • Have an idea what the service interface will be
        ahead of time
      • Wrap parameters with objects
  • Technology is complex
      • XML, Schema, CDEs, Globus, Web Services, etc.
  • Installation is complex
      • Have to have working knowledge of Tomcat, Axis,
        Ant, environment variables, etc.
      • Need to have compatible versions of each
        component, esp. Java 1.4.2_04
  • Wish list
      • Wizard for grid-enabling existing code
      • Documentation of every aspect of installation and
        functionality
      • Clone Shannon for each development project
18
Lessons Learned / Technical Difficulties / Wish List
  • Starting with a non-grid-enabled application
    that had been tested and was stable made wrapping
    it as a grid service easier to debug.
  • Need a standard mechanism for dealing with large
    data objects.
      • Some sort of lazy-loaded object/pointer would be
        sufficient.
  • Integrating the toolkit portal into some standard
    IDEs might make development even easier.
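
The lazy-loaded object/pointer idea from the wish list could look roughly like the sketch below: a small reference travels in the message, and the large payload is fetched only when first accessed. This is an illustration of the general pattern, not a caGrid mechanism; the class name, the loader callback, and the example identifier are all hypothetical.

```python
class LazySpectrum:
    """Holds an identifier and fetches the large payload on first access."""

    def __init__(self, identifier, loader):
        self.identifier = identifier
        self._loader = loader  # callable that takes the identifier
        self._data = None
        self._loaded = False

    def data(self):
        """Return the payload, loading it once and caching the result."""
        if not self._loaded:
            self._data = self._loader(self.identifier)
            self._loaded = True
        return self._data


if __name__ == "__main__":
    def fake_loader(identifier):
        # A real loader might stream from a repository; this one is a stand-in.
        print("loading", identifier)
        return [1.0, 2.0, 3.0]

    spectrum = LazySpectrum("urn:lsid:example.org:spectrum:1", fake_loader)
    print(spectrum.data())
    print(spectrum.data())  # second call uses the cached payload
```

Only the identifier needs to be serialized between services, so a client that never looks at the raw spectrum never pays for transferring it.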

19
Acknowledgements
  • Duke, ICR Developer
      • Patrick McConnell, Project lead
      • Richard Haney, Architect and developer of
        statistical systems
      • Salvatore Mungal, Middle-tier Java developer
      • Mark Peedin, Database developer
  • Northwestern University, Collaborator
      • Simon Lin, Proteomics domain expert
  • Oregon Health Sciences University, ICR Adopter
      • Shannon McWeeney
      • Veena Rajaraman
  • University of Pennsylvania, ICR Adopter
      • David Fenstermacher
      • Craig Street
  • University of North Carolina, Collaborator
      • Christoph Borchers, Proteomics scientist
  • OSU, caGRID Team
      • Shannon Hastings
      • Scott Oster
      • Stephen Langella
      • Tahsin Kurc
      • Joel Saltz
  • Architecture
      • Arumani Manisundaram
      • Avinash Krishnakant
  • VCDE
      • Brian Davis, Workspace Lead
      • George Komatsoulis, VCDE lead
      • Claire Wolfe, VCDE curator
      • Salvatore Mungal, VCDE mentor