Analytical Services Best Practices

Provided by: bmi9
Learn more at: https://medicine.osu.edu
Transcript and Presenter's Notes

1
Analytical Services Best Practices
Baris Suzek, Georgetown University
Shannon Hastings, Ohio State University
January 28-30, 2008
2
Agenda
  • Analytical Services Best Practices (ASBP) WG
    Charter and Objectives
  • Issues and Solutions
      • Model Reuse
      • XSD Reuse and/or Generation
  • Process used for Service Development
  • Recommended Process for Future Development by
    caGrid team
  • Outstanding Issues
      • Generic Parameters
  • Next Steps

3
Team Members
  • Ted Liefeld (Lead)
  • Kiran Keshav
  • Patrick McConnell
  • Martin Morgan
  • Salvatore Mungal
  • Rakesh Nagarajan
  • Baris Suzek (VCDE Liaison)
  • Juli Klemm
  • Elaine Freund
  • Denise Warzel
  • Claire Wolfe
  • Brian Davis
  • Shannon Hastings

4
Analytical Services: Essential components of
caBIG engine
[Figure: caBioconductor normalizing MAGE-OM formatted microarray data]
5
ASBP Charter and Objectives
  • The Analytical Services Best Practices (ASBP)
    working group is chartered to develop, establish
    and communicate best practices for the creation
    of caGrid analytical services.
  • Objectives
      • Work with the Architecture and VCDE workspaces
        to streamline the mentoring and compatibility
        review processes and associated tools, to
        ensure semantic and syntactic compatibility
        between services
      • Document and codify best practices learned
        over the previous three years of caBIG
        development
      • Create reference services, tools and processes
      • Work closely with the ICR Workflow WG to ensure
        that analytical services can be easily shared
        and reused

6
Analytical Services: Reuse
[Figure: caBioconductor normalizing MAGE-OM formatted microarray data]
7
Issue 1: Model Reuse
  • Background
      • Importing annotated XMIs to EA not possible
        (SIW 3.1)
      • Annotated XMIs for reused UML models not
        available
      • SIW Roundtrip not available or no support for
        CDEs of inherited attributes
  • Significant time/effort spent to remodel,
    reannotate and reregister reused UML models
      • Import unannotated reused UML model to EA
      • Model generic parameter classes
      • Annotate full model (including reused classes)
      • Register UML model to caDSR

8
Solution to Issue 1: Model Reuse
  • A new tool: Service Loader
      • Uses Service Metadata generated by Introduce
        Toolkit to register use of CDEs to caDSR
  • Revised process
      • Model/annotate/register only what is unique to
        the service
      • Use Introduce Toolkit to create the service
        metadata
      • Use Service Loader to load the model to caDSR

9
Issue 2: XSD Reuse and/or Generation
  • Background
      • Only EA and caCORE SDK provide XSD generators
      • Analytical Services typically do not use
        caCORE SDK
      • Reused XSDs not available in GME
      • Developer-provided XSD and UML model in caDSR
        not consistent
      • Analytical services cannot use MAGE-OM from
        caArray
  • Process to create XSD is error-prone
      • Import portion/full model into EA
      • Generate schema from EA
      • Fix/handcraft schema using XML visualization
        tool
      • Import schema to Introduce Toolkit
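
The inconsistency noted above, between a developer-provided XSD and the model registered in caDSR, is the kind of drift a small automated check can catch before a schema is imported. The following is a minimal sketch in Python, not an existing caGrid or caDSR tool: the schema fragment, the `REGISTERED_MODEL` dictionary and the function names are all hypothetical, and a real check would read the registered model from caDSR rather than from a hard-coded dictionary.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment standing in for a developer-provided XSD.
XSD_TEXT = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Gene">
    <xs:sequence>
      <xs:element name="symbol" type="xs:string"/>
      <xs:element name="fullName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

# Hypothetical registered model: class name -> attribute names.
REGISTERED_MODEL = {"Gene": {"symbol", "name"}}


def xsd_classes(xsd_text):
    """Map each named complexType to the names of its elements and attributes."""
    root = ET.fromstring(xsd_text)
    classes = {}
    for ctype in root.iter(XS + "complexType"):
        name = ctype.get("name")
        if name is None:
            continue  # anonymous types are folded into their parent here
        members = {e.get("name") for e in ctype.iter(XS + "element") if e.get("name")}
        members |= {a.get("name") for a in ctype.iter(XS + "attribute") if a.get("name")}
        classes[name] = members
    return classes


def compare(xsd_text, model):
    """Return {class: (only_in_xsd, only_in_model)} for every mismatching class."""
    schema = xsd_classes(xsd_text)
    report = {}
    for cls in sorted(set(schema) | set(model)):
        in_xsd = schema.get(cls, set())
        in_model = model.get(cls, set())
        if in_xsd != in_model:
            report[cls] = (sorted(in_xsd - in_model), sorted(in_model - in_xsd))
    return report


if __name__ == "__main__":
    for cls, (extra, missing) in compare(XSD_TEXT, REGISTERED_MODEL).items():
        print(f"{cls}: only in XSD {extra}, only in model {missing}")
```

An empty report means the two descriptions agree on class and attribute names; the sketch does not compare types or cardinalities.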

10
Solution to Issue 2: XSD Reuse and/or Generation
  • Several options in Introduce Toolkit to generate
    a valid XSD
      • From caDSR
      • From GME
      • Using XMI with caCORE SDK XSD generator
  • caDSR/GME Mapping (F2F Day 3 presentation by
    Scott Oster and Denise Warzel)

11
Process used for Service Development
  • Reference Analytical Service Developers used a
    process to develop their services within the
    limitations at the time
  • Know-how: almost no previous developers
  • Tooling: not many tools around or aligned
  • Documentation: not much
  • Not necessarily ideal process, yet GenePattern,
    Bioconductor and GeWorkbench teams delivered
    successful products

12
Process used for Service Development
  • Re-annotating reused classes
  • Annotation of parameter classes
  • Model reused classes and parameters
  • Reloading re-used classes
  • Loading parameter classes

13
Process used for Service Development
  • Error-prone XSD generation using EA
  • Reused XSDs not available in GME
  • Developer-provided XSD inconsistent with caDSR

[Diagram: XSD File; caDSR; GME; Create Skeleton / Implement Methods; Add Service Metadata and Domain Model; Import datatypes; Create operations]

14
Analytical Services Best Practices
  • caGrid Team Suggested Development Process

Shannon Hastings, Ohio State University
January 28-30, 2008
15
Getting Connected: The development process.
  • Describe the caGrid best practice for service
    generation
  • Demonstrate this process with an animation

16
Two core approaches
  • Top Down
      • Upfront modeling and semantic harmonization
      • Grid service development last
      • Pros
          • It is clear that the modeling and
            harmonization process is complete, and that
            any modeling needed is done and approved,
            before service interfaces are generated
            from it.
      • Cons
          • Takes upfront time before you get to
            implement the service.
  • Bottom Up
      • Schema modeling first
      • Service generation
      • Model creation/mapping and harmonization last
      • Pros
          • You can prototype the service before
            determining exactly what models you want to
            use or will need.
      • Cons
          • You could get far down the development
            process and then realize that suitable
            schemas already exist, or that your model
            differs only slightly and insignificantly
            from an existing model.

17
Getting Connected: The top-down development
process.
[Diagram: Create Semantically Harmonized Data Model; Generate Analytical Grid Service; GME]
18
Getting Connected: The top-down development
process.
[Diagram: Create Semantically Harmonized Data Model; Generate Analytical Grid Service; GME]
If the models we need already exist in the caDSR
or are provided by a community standards group,
then we can skip this step.
19
Getting Connected: The top-down development
process.
[Diagram: Create Semantically Harmonized Data Model; Generate Analytical Grid Service; GME]
If the models have already been semantically
annotated with EVS concepts, then we can skip
this step too.
20
Getting Connected: The top-down development
process.
[Diagram: Create Semantically Harmonized Data Model; Generate Analytical Grid Service; GME]
If the models are already in caDSR, then we can
skip this step.
21
Getting Connected: The top-down development
process.
[Diagram: Create Semantically Harmonized Data Model; Generate Analytical Grid Service; GME]
We can soon use the caCORE tools to automatically
generate and publish our schemas to the GME.
22
Getting Connected: The top-down development
process.
[Diagram: Create Semantically Harmonized Data Model; Generate Analytical Grid Service; GME]
Now we have the prerequisites for generating our
grid services.
23
Demonstration
[Diagram: Grid; algorithm A]
24
Exposing the Algorithm
  • We will use Introduce to describe the API of our
    algorithm and expose it to the grid.

[Diagram: Grid; algorithm A]
25
Exposing the Algorithm
  • Introduce will enable the user to browse data
    models in the caDSR and choose the ones they
    are going to use in the API.

[Diagram: GME; Grid; algorithm A]
26
Exposing the Algorithm
  • Then they will locate the schemas that describe
    the data models; these schemas define the wire
    format for transferring data instances.

[Diagram: GME; Grid; algorithm A]
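
To make the role of the schema concrete: a data instance is serialized to XML that conforms to the schema on one side of the grid and parsed back on the other. The following is a minimal sketch using only the Python standard library; the `Gene` class, its fields and the namespace URI are hypothetical, and in an Introduce-generated service this (de)serialization is handled by generated code rather than written by hand.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical namespace; a real one would come from the registered schema.
NS = "http://example.org/hypothetical/gene"


@dataclass
class Gene:
    symbol: str
    full_name: str


def to_xml(gene: Gene) -> str:
    """Serialize a Gene into the XML form the (hypothetical) schema describes."""
    root = ET.Element(f"{{{NS}}}Gene")
    ET.SubElement(root, f"{{{NS}}}symbol").text = gene.symbol
    ET.SubElement(root, f"{{{NS}}}fullName").text = gene.full_name
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Gene:
    """Parse the XML form back into a Gene instance."""
    root = ET.fromstring(text)
    return Gene(
        symbol=root.findtext(f"{{{NS}}}symbol"),
        full_name=root.findtext(f"{{{NS}}}fullName"),
    )
```

Because both sides agree on the schema, `from_xml(to_xml(g))` yields an object equal to `g`, which is the property the wire format has to guarantee.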
27
Exposing the Algorithm
  • Introduce will create a grid service which can
    expose the data resource we described to the
    grid.

[Diagram: Grid; caGrid AnalyticalService; algorithm A]
28
Exposing the Algorithm
  • The developer will provide glue code which maps
    the calls from the grid client to calls on
    their algorithm.

[Diagram: Grid; caGrid AnalyticalService; Algorithm invocation code; algorithm A]
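
The glue code described on this slide is usually a thin adapter: unwrap the grid data type, call the existing algorithm, and wrap the result. The following is a minimal sketch of that shape, written in Python for brevity even though Introduce generates Java services. Every name in it (`center_values`, `ExpressionData`, `AnalyticalServiceImpl`, `normalize`) is hypothetical and stands in for the developer's algorithm and the toolkit-generated types.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


# Existing algorithm, unaware of the grid (a stand-in for "algorithm A").
def center_values(values: List[float]) -> List[float]:
    """Subtract the mean from every value."""
    m = mean(values)
    return [v - m for v in values]


# Data type a toolkit would generate from the registered schema
# (hypothetical; a real generated type would mirror the XSD).
@dataclass
class ExpressionData:
    probe_ids: List[str]
    intensities: List[float]


# Glue code written by the developer.
class AnalyticalServiceImpl:
    """Maps the grid-facing operation onto the local algorithm call."""

    def normalize(self, data: ExpressionData) -> ExpressionData:
        if len(data.probe_ids) != len(data.intensities):
            raise ValueError("probe_ids and intensities differ in length")
        # Unwrap the grid data type, call the algorithm, wrap the result.
        centered = center_values(data.intensities)
        return ExpressionData(probe_ids=list(data.probe_ids), intensities=centered)
```

Keeping the algorithm free of grid types, as here, means it can still be run and tested outside the service.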
29
Algorithm now available as Grid Service.
  • Now that our service is generated, we can deploy
    it so that the resource can be used.

[Diagram: Grid; GridService; caGrid AnalyticalService; Algorithm invocation code; algorithm A]
30
How will users find me?
  • We need to expose metadata to a registry so that
    a user/service can locate and use our service.

[Diagram: Grid; GridService; caGrid AnalyticalService; Algorithm invocation code; algorithm A]
31
How will users find me?
  • We will send our metadata to the caGrid Index
    Service so that the service can be discovered
    and used by grid users.

[Diagram: Grid; GridService; caGrid AnalyticalService; Algorithm invocation code; algorithm A; Index Service]
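
The idea of registering metadata for discovery can be illustrated with a toy in-memory registry. This is a sketch only, assuming that service metadata carries a URL, a name, a list of operations and a list of semantic concepts; it is not the caGrid Index Service API, and the class names, fields and `example.org` URLs are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceMetadata:
    url: str
    name: str
    operations: List[str]
    concepts: List[str] = field(default_factory=list)


class IndexSketch:
    """Toy in-memory registry; not the caGrid Index Service API."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceMetadata] = {}

    def register(self, metadata: ServiceMetadata) -> None:
        # Re-registering the same URL replaces the earlier entry.
        self._entries[metadata.url] = metadata

    def find_by_operation(self, operation: str) -> List[str]:
        """Return URLs of services that advertise the given operation."""
        return sorted(u for u, m in self._entries.items() if operation in m.operations)

    def find_by_concept(self, concept: str) -> List[str]:
        """Return URLs of services annotated with the concept (case-insensitive)."""
        wanted = concept.lower()
        return sorted(
            u for u, m in self._entries.items()
            if wanted in (c.lower() for c in m.concepts)
        )


if __name__ == "__main__":
    index = IndexSketch()
    index.register(ServiceMetadata(
        url="https://example.org/services/Normalizer",  # hypothetical URL
        name="Normalizer",
        operations=["normalize"],
        concepts=["Microarray", "Normalization"],
    ))
    print(index.find_by_concept("normalization"))
```

A client that knows only the operation or concept it needs can then look up candidate service URLs without knowing them in advance.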
32
Tips / Suggestions
  • Work iteratively: use the Top Down or Bottom Up
    method in iterations until you have refined
    what you really want.
  • Be prepared to refactor: don't go too far down
    one path until you have vetted all the issues,
    and design for change.
  • Browse the caDSR and EVS to determine just how
    well the domain you are working with is
    represented. This will give you an idea of the
    likelihood that what you are doing has existing
    models or CDEs.

33
Analytical Services Best Practices
  • Generic Parameters

Baris Suzek, Georgetown University
Shannon Hastings, Ohio State University
January 28-30, 2008
34
Analytical Services: Generic Parameters
[Figure: caBioconductor normalizing MAGE-OM formatted microarray data]
35
Motivation
  • Analytical services, unlike Data Services, have
    short lifespans
  • Overhead of parameter class registration
    constitutes a significant portion of development
    effort
  • Preexisting analytical services need to remodel
    the service parameters in the caBIG way
  • Parameters change often, each software version
    may have different parameters

36
Proposal - Generic Parameter Model
  • Use a generic parameter model to pass parameters
    to the services
  • A simple, reusable metadata model facilitates
    auto-generation of the parameter metadata
    service implementation
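
One way to picture a generic parameter model: every parameter travels as a name/value pair, and a separate metadata record describes its type, description and whether it is required. The following is a minimal sketch under that assumption. The draft model from the presentation is not reproduced in this transcript, so the class names, fields and the `bind` helper are hypothetical illustrations rather than the proposed model itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParameterMetadata:
    """Describes one parameter a service accepts (hypothetical fields)."""
    name: str
    type_name: str  # one of: "int", "float", "string", "boolean"
    description: str = ""
    required: bool = True


@dataclass
class GenericParameter:
    """A parameter as passed to the service: values travel as strings."""
    name: str
    value: str


_CONVERTERS = {
    "int": int,
    "float": float,
    "string": str,
    "boolean": lambda s: s.strip().lower() in ("true", "1"),
}


def bind(metadata: List[ParameterMetadata],
         params: List[GenericParameter]) -> Dict[str, object]:
    """Validate generic parameters against the metadata and convert their values."""
    supplied = {p.name: p.value for p in params}
    unknown = sorted(set(supplied) - {m.name for m in metadata})
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    bound: Dict[str, object] = {}
    for m in metadata:
        if m.name not in supplied:
            if m.required:
                raise ValueError(f"missing required parameter: {m.name}")
            continue  # optional and not supplied: leave it out
        bound[m.name] = _CONVERTERS[m.type_name](supplied[m.name])
    return bound
```

With this shape, a new software version that adds or renames a parameter changes only the metadata list, not the service interface, which is the motivation given on the previous slide.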
37
Proposal - Generic Parameters Metadata Model
  • Extend caGrid Service Metadata with a parameter
    metadata model (as discussed with caGrid Team)
  • All metadata is handled at caGrid level
  • Draft model

38
Pros and Cons
  • Pros
      • SAVE TIME (1 developer FTE week per service)
      • More analytic services on caGrid/available to
        caBIG
      • Actual parameters and descriptions of
        parameters are still available at Grid level
      • No caDSR/GME registration dependency (if all
        classes are reused)
  • Cons
      • No parameter reuse
      • No concept-based discovery of services unless
        Semantic Metadata is properly filled in at
        service level
      • No semantic interoperability based on
        parameters (not likely to happen anyway)
      • No CDEs for parameters
      • A different place to look for parameter
        metadata (not caDSR)
      • Proposed model is not appropriate for
        non-caGrid services

39
Next Steps
  • Develop a demo service using
      • Process recommended by caGrid team
      • Introduce Toolkit
      • Service Loader
      • Generic parameter model
  • After development, assess
      • Impact of new process on overall service
        development and registration time
      • Usability of the Service Loader process
      • Impact of generic parameter model on semantics
  • Write a best practices white paper