caGrid Version 0.5 Reference Implementation: caArray. caBIG Architecture Workspace Face to Face, Georgetown University, August 16th-18th, 2005

Title: caGrid Version 0.5 Reference Implementation caArray caBIG Architecture Workspace Face to Face Georgetown University August 16th -18th, 2005


1
caGrid Version 0.5 Reference Implementation: caArray
caBIG Architecture Workspace Face to Face
Georgetown University
August 16th-18th, 2005
  • Colin Freas
  • Lombardi Comprehensive Cancer Center
  • cef6_at_georgetown.edu

2
Outline
  • High Level Overview of caArray
  • Data Model
  • Project Architecture
  • Process of getting to Silver level compliance
  • Functionality Exposed to Grid
  • Process of Grid Enablement
  • Demo/Screenshots
  • Lessons Learned / Technical Difficulties / Wish
    List
  • Acknowledgements

3
Project Overview
  • In our context, NCICB's caArray consists of a
    microarray database and the associated web portal
    and API for accessing that data.
  • caArray is a standards-based data repository of
    microarray experiment data.
  • The MIAME standard describes the Minimum
    Information About a Microarray Experiment that is
    needed to enable the interpretation of the
    experiment results unambiguously and,
    potentially, to reproduce the experiment.
    caArray is compliant with the MIAME 1.1 standard.
  • Concepts within the MIAME standard can be mapped
    to MAGE-OM and MAGE-ML, which are an object
    model and an XML format, respectively. This is
    the basis for caArray's object model.
  • caArray consists of two parts: a web application
    where users input experiment data and perform
    searches, and the MAGE-OM server, which allows
    programmatic access to caArray objects. API
    access is read-only and is achieved via RMI.

4
Data Model
  • Experiment - The Experiment is the collection of
    all the BioAssays that are related by the
    ExperimentDesign.
  • ArrayDesign - Describes the design of a gene
    expression layout. In some cases this might be
    virtual and, for instance, represent the output
    from analysis software at the composite level
    without reporters or features.
  • BioDataCube - The experiment data itself.
  • The rest of the model is ostensibly required to
    duplicate experiment results. This has been an
    ongoing challenge for microarray experimenters,
    which the MAGE-OM is an effort to address.
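
As a rough illustration of how the three classes named on this slide relate, the sketch below models them as plain Python dataclasses. It is not the MAGE-OM API: the real object model has many more classes and different attributes, and every field name here (`virtual`, `values`, `experiment_design`, `array_designs`) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArrayDesign:
    """Layout of an array; may be 'virtual' (composite level only)."""
    name: str
    virtual: bool = False  # hypothetical flag, not a MAGE-OM attribute


@dataclass
class BioDataCube:
    """The experiment data itself; a nested list stands in for the cube."""
    values: List[List[float]] = field(default_factory=list)


@dataclass
class BioAssay:
    """One measurement, tied to an array design and its data."""
    name: str
    array_design: ArrayDesign
    data: BioDataCube


@dataclass
class Experiment:
    """Collection of BioAssays related by one experiment design."""
    name: str
    experiment_design: str
    bio_assays: List[BioAssay] = field(default_factory=list)

    def array_designs(self) -> List[str]:
        """Distinct ArrayDesign names used by this experiment, in order."""
        seen: List[str] = []
        for assay in self.bio_assays:
            if assay.array_design.name not in seen:
                seen.append(assay.array_design.name)
        return seen


# Illustrative data only; names and values are made up.
design = ArrayDesign(name="example-design")
experiment = Experiment(
    name="example-experiment",
    experiment_design="time course",
    bio_assays=[
        BioAssay("assay-1", design, BioDataCube([[1.0, 2.0]])),
        BioAssay("assay-2", design, BioDataCube([[3.0, 4.0]])),
    ],
)
print(experiment.array_designs())
```

The point of the sketch is only the containment: an Experiment groups BioAssays, and each BioAssay points at an ArrayDesign and carries its data.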

5
Project Architecture

6
Process of getting to Silver level compliance
  • Programming and Messaging Interface
      • RMI-based MAGE-OM API and EJB APIs provide
        programmatic access to data.
  • Vocabularies/Terminologies and Ontologies
      • Utilizes MGED Ontologies and corresponding
        vocabularies.
  • Data Elements
      • CDEs defined and entered in caDSR. caCoreToolkit
        used to define and semantically map CDEs in
        caDSR.
  • Information Models
      • UML class diagram defined and available for
        interface.

7
Functionality Exposed to the Grid
  • caArray is a data service.
  • All objects within the MAGE-OM that are exposed
    via the API are accessible. Important examples:
    BioDataCube, Experiment, ArrayDesign.
  • Login and password are required; however, public
    data from caArray nodes can always be accessed.
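
One way to read the access rule above is sketched here: public records are visible to any caller, while private records are returned only to the authenticated account that owns them. This is an assumption about the rule's intent, not caArray's actual security code; `Record`, `owner`, and `visible_records` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    name: str
    public: bool
    owner: Optional[str] = None  # hypothetical: account owning private data


def visible_records(records: List[Record], user: Optional[str] = None) -> List[Record]:
    """Return public records always; private ones only to their owner."""
    return [
        r for r in records
        if r.public or (user is not None and r.owner == user)
    ]


# Illustrative data only.
records = [
    Record("public-experiment", public=True),
    Record("private-experiment", public=False, owner="georgetown-user"),
]
print([r.name for r in visible_records(records)])
print([r.name for r in visible_records(records, user="georgetown-user")])
```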

8
Process of Grid Enablement
  • Standing up caArray locally was no easy task.
    Tremendous help was provided by NCICB to work out
    bugs in the 1.2 release packages, and by Scott Li
    and John Osborne in standing up the MAGE-OM
    server.
  • Standing up a local caGrid node was easier.
    Scripts provided by NCICB installed Globus,
    OGSA-DAI, Tomcat, the caGrid infrastructure, and
    the caArray node.
  • The MAGE-OM server is the interface between the
    grid and caArray.
  • Much work in enabling caArray to function on the
    grid had already been accomplished.

9
Demo and/or Screenshots
  • Let's go to the videotape!
  • Query to caArray - public data from the Georgetown
    and NCICB instances. Private data from Georgetown.
    Use Experiment objects.
  • Query to caArray - get gene accession numbers for
    a set of genes in the ArrayDesign for the
    Affymetrix 133 2.0 microarray. Run a query with
    those accession numbers against caBIO.
  • Query to caArray - GenePix ArrayDesign for IMAGE
    Clone annotations. Use the returned results to
    query caBIO with the appropriate Clone
    characteristics.
  • Query to caArray - data at Georgetown and NCICB,
    returning an aggregate result set.
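
The aggregate query in the last scenario can be pictured as a fan-out over nodes followed by a merge. The sketch below is a minimal stand-in that uses in-memory functions in place of grid nodes. It does not use the caGrid query API or its XML query format, and `query_all`, the node functions, and the row fields are all hypothetical.

```python
from typing import Callable, Dict, List

# A "node" here is any callable that takes an object type and returns rows.
Node = Callable[[str], List[dict]]


def query_all(nodes: Dict[str, Node], object_type: str) -> List[dict]:
    """Run the same query on every node and merge the rows,
    tagging each row with the node it came from."""
    results: List[dict] = []
    for node_name, run_query in nodes.items():
        for row in run_query(object_type):
            results.append({**row, "node": node_name})
    return results


# In-memory stand-ins for two caArray instances (illustrative data only).
def georgetown_node(object_type: str) -> List[dict]:
    data = {"Experiment": [{"name": "gu-experiment-1"}]}
    return data.get(object_type, [])


def ncicb_node(object_type: str) -> List[dict]:
    data = {"Experiment": [{"name": "nci-experiment-1"}, {"name": "nci-experiment-2"}]}
    return data.get(object_type, [])


nodes = {"Georgetown": georgetown_node, "NCICB": ncicb_node}
aggregate = query_all(nodes, "Experiment")
for row in aggregate:
    print(row["node"], row["name"])
```

Tagging each row with its source node keeps the merged result traceable, which matters once private data from one site is mixed with public data from another.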

10
Lessons Learned / Technical Difficulties / Wish List
  • caArray QC. Version 1.2 had a lot of problems.
    It was not possible to install without additional
    resources, provided in our case by NCICB and
    fellow adopters. It took a month to correct the
    files on the public download site. Development
    cycle may be too short.
  • caGrid software made the installation process
    very easy, but not necessarily transparent. More
    documentation and copious feedback during the
    install process would be useful.
  • Firewall issues. Several ports need to be
    correctly configured on both the MAGE-OM server
    and on the local grid node. A standard way of
    documenting such requirements for caBIG projects
    would be nice.
  • Some way to stress test the system and to collect
    performance metrics would be useful.
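
As a sketch of what the stress-test wish might look like in practice, the harness below fires a number of concurrent calls at a query function and reports simple latency figures. It is not an existing caGrid or caArray tool: `stress_test` and `fake_query` are hypothetical names, and `fake_query` merely sleeps in place of a real grid query.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def stress_test(run_query: Callable[[], object],
                requests: int = 20,
                workers: int = 4) -> Dict[str, float]:
    """Call run_query `requests` times across `workers` threads
    and summarize the per-call latency in seconds."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed_call, range(requests)))

    return {
        "requests": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


def fake_query() -> None:
    """Placeholder for a real query against a node."""
    time.sleep(0.001)


metrics = stress_test(fake_query, requests=8, workers=2)
print(metrics)
```

Replacing `fake_query` with a function that issues a real query would give a first, rough view of how a node behaves under concurrent load.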

11
Acknowledgements
  • Arnie Miles - Georgetown caBIG lead
  • Nick Marcou - coordinating with caGrid team to
    install node.
  • Jack Yuelin Zhu - entering test data and testing
    caArray from a user's perspective
  • Colin Freas - overall coordination from
    Georgetown, setting up local caArray instance
  • Sumeet Muju, Juergen Lorenz - indispensable
    support in standing up our local instance of
    caArray
  • Andrew Shinohara - provided great help in
    elucidating test scenarios and writing the XML
    for the grid queries
  • Ruowei Wu, Manav Kher - providing some excellent
    scripts for installation of caGrid node
  • William Sanchez - caGrid team liaison,
    coordinating efforts from the NCICB and keeping
    the efforts moving forward with a sense of
    urgency
  • Scott Li and John Osborne - Scott wrote a
    critical script that correctly ran the MAGE-OM
    server, which caGrid communicates with, and
    helped in running unit tests against it. John
    provided several example configuration files and
    some other critical unit tests.