Title: Analytical Services Best Practices
Baris Suzek, Georgetown University
Shannon Hastings, Ohio State University
January 28-30, 2008
Agenda
- Analytical Services Best Practices (ASBP) WG Charter Objectives
- Issues and Solutions
  - Model Reuse
  - XSD Reuse and/or Generation
- Process used for Service Development
- Recommended Process for Future Development by the caGrid team
- Outstanding Issues
  - Generic Parameters
- Next Steps
Team Members
- Ted Liefeld (Lead)
- Kiran Keshav
- Patrick McConnell
- Martin Morgan
- Salvatore Mungal
- Rakesh Nagarajan
- Baris Suzek (VCDE Liaison)
- Juli Klemm
- Elaine Freund
- Denise Warzel
- Claire Wolfe
- Brian Davis
- Shannon Hastings
Analytical Services: Essential Components of the caBIG Engine
[Figure: caBioconductor normalizing MAGE-OM formatted microarray data]
ASBP Charter Objectives
- The Analytical Services Best Practices (ASBP) working group is chartered to develop, establish and communicate best practices for the creation of caGrid analytical services.
- Objectives
  - Work with the Architecture and VCDE workspaces to streamline the mentoring and compatibility review processes, and their associated tools, to ensure semantic and syntactic compatibility between services
  - Document and codify the best practices learned over the previous three years of caBIG development
  - Create reference services, tools and processes
  - Work closely with the ICR Workflow WG to ensure that analytical services can be easily shared and reused
Analytical Services Reuse
Issue 1: Model Reuse
- Background
  - Importing annotated XMIs into EA (Enterprise Architect) is not possible (SIW, the Semantic Integration Workbench, 3.1)
  - Annotated XMIs for reused UML models are not available
  - SIW roundtrip is not available, or there is no support for CDEs of inherited attributes
- Significant time and effort are spent to remodel, re-annotate and re-register reused UML models:
  - Import the unannotated reused UML model into EA
  - Model the generic parameter classes
  - Annotate the full model (including reused classes)
  - Register the UML model in the caDSR
Solution to Issue 1: Model Reuse
- A new tool: the Service Loader
  - Uses the service metadata generated by the Introduce Toolkit to register the use of CDEs in the caDSR
- Revised process
  - Model, annotate and register only what is unique to the service
  - Use the Introduce Toolkit to create the service metadata
  - Use the Service Loader to load the model into the caDSR
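The core decision in the revised process is separating reused classes, whose existing caDSR registrations only need their use recorded, from classes unique to the service, which still need to be modeled, annotated and registered. The following is a minimal sketch of that split, assuming a hypothetical, simplified view of service metadata in which each class carries an optional caDSR identifier; the real Introduce service metadata format and the Service Loader's interfaces are not shown, and every name here is illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassRef:
    """A class used by a service operation (hypothetical, simplified metadata)."""
    name: str
    cadsr_id: Optional[str] = None  # None means: not yet registered in the caDSR


@dataclass
class Operation:
    name: str
    inputs: List[ClassRef] = field(default_factory=list)
    output: Optional[ClassRef] = None


def plan_registration(operations):
    """Split the classes a service uses into reused ones (already in the caDSR,
    so only their use needs recording) and ones unique to the service (to be
    modeled, annotated and registered)."""
    reused, unique = {}, set()
    for op in operations:
        refs = op.inputs + ([op.output] if op.output else [])
        for ref in refs:
            if ref.cadsr_id is None:
                unique.add(ref.name)
            else:
                reused[ref.name] = ref.cadsr_id
    return reused, sorted(unique)


if __name__ == "__main__":
    # "placeholder-1" stands in for a real caDSR identifier.
    ops = [Operation("normalize",
                     inputs=[ClassRef("BioAssayData", "placeholder-1"),
                             ClassRef("NormalizationParameters")],
                     output=ClassRef("BioAssayData", "placeholder-1"))]
    print(plan_registration(ops))
```

Here only `NormalizationParameters` would go through modeling and annotation; `BioAssayData` is treated as reused.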
Issue 2: XSD Reuse and/or Generation
- Background
  - Only EA and the caCORE SDK provide XSD generators
  - Analytical services typically do not use the caCORE SDK
  - Reused XSDs are not available in the GME
  - Developer-provided XSDs and the UML models in the caDSR are not consistent
  - Analytical services cannot use MAGE-OM from caArray
- The process to create an XSD is error-prone:
  - Import a portion of the model, or the full model, into EA
  - Generate the schema from EA
  - Fix or handcraft the schema using an XML visualization tool
  - Import the schema into the Introduce Toolkit
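Inconsistencies between a handcrafted XSD and the UML model registered in the caDSR can at least be detected mechanically by comparing names. The following is a minimal sketch, assuming each UML class is available as a plain list of attribute names (how it is exported from the caDSR is not shown) and that the schema declares each class as a named `complexType` with `xs:attribute` or `xs:element` children. Schemas that use imports, type extension or groups would need more work.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def schema_fields(xsd_text, type_name):
    """Return the attribute and child-element names declared by a complexType."""
    root = ET.fromstring(xsd_text)
    for ctype in root.iter(XS + "complexType"):
        if ctype.get("name") == type_name:
            names = set()
            for tag in ("attribute", "element"):
                for node in ctype.iter(XS + tag):
                    if node.get("name"):
                        names.add(node.get("name"))
            return names
    raise KeyError(f"complexType {type_name!r} not found")


def compare(xsd_text, uml_classes):
    """Per class, report names missing from the XSD and names absent from the UML."""
    report = {}
    for cls, uml_attrs in uml_classes.items():
        xsd_attrs = schema_fields(xsd_text, cls)
        missing = sorted(set(uml_attrs) - xsd_attrs)
        extra = sorted(xsd_attrs - set(uml_attrs))
        if missing or extra:
            report[cls] = {"missing_in_xsd": missing, "not_in_uml": extra}
    return report


if __name__ == "__main__":
    xsd = ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
           '<xs:complexType name="Gene">'
           '<xs:sequence><xs:element name="fullName" type="xs:string"/></xs:sequence>'
           '<xs:attribute name="symbol" type="xs:string"/>'
           '</xs:complexType></xs:schema>')
    uml = {"Gene": ["symbol", "fullName", "id"]}  # hypothetical UML attributes
    print(compare(xsd, uml))
```

An empty report means the names line up. It says nothing about types or semantics, which still need review.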
Solution to Issue 2: XSD Reuse and/or Generation
- Several options in the Introduce Toolkit to generate a valid XSD
  - From the caDSR
  - From the GME
  - Using XMI with the caCORE SDK XSD generator
- caDSR/GME mapping (F2F Day 3 presentation by Scott Oster and Denise Warzel)
Process used for Service Development
- The reference analytical service developers used a process to develop their services within the limitations of the time:
  - Know-how: almost no previous developers
  - Tooling: few tools available, and those were not aligned
  - Documentation: very little
- Not necessarily an ideal process, yet the GenePattern, Bioconductor and geWorkbench teams delivered successful products
Process used for Service Development (continued)
- Re-annotating reused classes
- Annotating parameter classes
- Modeling reused classes and parameters
- Reloading reused classes
- Loading parameter classes
Process used for Service Development (continued)
[Figure: service development workflow with the steps XSD file, Import datatypes, Create operations, Create skeleton / implement methods, Add service metadata and domain model, and the caDSR and GME as sources]
- Error-prone XSD generation using EA
- Reused XSDs not available in the GME
- Developer-provided XSDs inconsistent with the caDSR
Analytical Services Best Practices: caGrid Team Suggested Development Process
Shannon Hastings, Ohio State University
January 28-30, 2008
Getting Connected: The development process
- Describe the caGrid best practice for service generation
- Demonstrate this process with an animation
Two core approaches
- Top Down
  - Upfront modeling and semantic harmonization
  - Grid service development last
  - Pros
    - It is clear that the modeling and harmonization process is in place, and that any needed modeling is done and approved before service interfaces are generated from it.
  - Cons
    - It takes upfront time before you get to implement the service.
- Bottom Up
  - Schema modeling first
  - Service generation
  - Model creation/mapping and harmonization last
  - Pros
    - You can prototype the service before determining exactly which models you want to use or will need.
  - Cons
    - You could get far down the development process before realizing that the schemas already exist, or that the model you are using is just like an existing model except for a slight, insignificant difference.
Getting Connected: The top-down development process
[Figure: Create Semantically Harmonized Data Model, Generate Analytical Grid Service, GME]
- If the models we need already exist in the caDSR or come from a community standards group, we can skip the modeling step.
- If the models have already been semantically annotated with EVS concepts, we can skip the annotation step too.
- If the models are already registered in the caDSR, we can skip the registration step.
- Soon we will be able to use the caCORE tools to automatically generate our schemas and publish them to the GME.
- Now we have the prerequisites for generating our grid services.
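The skip conditions of the top-down process amount to a simple decision over the state of the models. The following is a minimal sketch of that reading; the `ModelState` flags and the step names are hypothetical labels for the steps in the diagram, not part of any caCORE or caGrid API.

```python
from dataclasses import dataclass


@dataclass
class ModelState:
    exists: bool = False          # model already in caDSR or from a community standards group
    evs_annotated: bool = False   # model already annotated with EVS concepts
    in_cadsr: bool = False        # model already registered in the caDSR
    schema_in_gme: bool = False   # schema already published to the GME


def top_down_steps(state):
    """Return the remaining top-down steps, skipping those already satisfied."""
    steps = []
    if not state.exists:
        steps.append("create data model")
    if not state.evs_annotated:
        steps.append("annotate with EVS concepts")
    if not state.in_cadsr:
        steps.append("register in caDSR")
    if not state.schema_in_gme:
        steps.append("generate and publish schema to GME")
    steps.append("generate analytical grid service")  # always the last step
    return steps


if __name__ == "__main__":
    print(top_down_steps(ModelState(exists=True, evs_annotated=True)))
```

With an existing, annotated model, only registration, schema publication and service generation remain.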
Demonstration
[Figure: the Grid and algorithm A]
Exposing the Algorithm
- We will use Introduce to describe the API of our algorithm and expose it to the grid.
- Introduce lets the user browse data models in the caDSR and choose the ones they will use in the API.
[Figure: the GME, the Grid and algorithm A]
- Next, the user locates the schemas that describe those data models; the schemas provide the wire protocol for transferring data instances.
- Introduce will create a grid service that exposes the resource we described to the grid.
[Figure: the Grid, a caGrid AnalyticalService and algorithm A]
- The developer provides glue code that maps calls from the grid client to calls on their algorithm.
[Figure: the Grid, the caGrid AnalyticalService, the algorithm invocation code and algorithm A]
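Glue code of this kind typically converts the grid-level request objects into the algorithm's native inputs, calls the algorithm, and wraps the result for the client. Introduce generates Java service skeletons in practice; the Python sketch below only illustrates the mapping idea, and every name in it (`ExpressionMatrixRequest`, `median_center`, `AnalyticalServiceImpl`) is hypothetical. Median-centering stands in for "algorithm A".

```python
import statistics
from dataclasses import dataclass


# Hypothetical grid-level types, standing in for classes generated from the service's XSDs.
@dataclass
class ExpressionMatrixRequest:
    sample_names: list
    values: list  # one list of expression values per sample


@dataclass
class ExpressionMatrixResponse:
    sample_names: list
    values: list


def median_center(columns):
    """Stand-in for 'algorithm A': subtract each sample's median from its values."""
    return [[v - statistics.median(col) for v in col] for col in columns]


class AnalyticalServiceImpl:
    """Glue code: maps a grid client's call onto the native algorithm."""

    def normalize(self, request):
        if len(request.sample_names) != len(request.values):
            raise ValueError("one value list is required per sample")
        result = median_center(request.values)  # call into the native algorithm
        return ExpressionMatrixResponse(request.sample_names, result)


if __name__ == "__main__":
    req = ExpressionMatrixRequest(["s1"], [[1.0, 2.0, 6.0]])
    print(AnalyticalServiceImpl().normalize(req))
```

Keeping validation and type conversion in the glue layer leaves the algorithm itself unchanged, which is the point of wrapping an existing tool.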
Algorithm now available as a Grid Service
- Now that our service is generated, we can deploy it so that the resource can be used.
[Figure: the Grid, the GridService, the caGrid AnalyticalService, the algorithm invocation code and algorithm A]
How will users find me?
- We need to expose metadata to a registry so that a user or another service can locate and use our service.
- We will send our metadata to the caGrid Index Service so that the service can be discovered and used by grid users.
[Figure: the Grid, the GridService, the caGrid AnalyticalService, the algorithm invocation code, algorithm A and the Index Service]
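Conceptually, the Index Service is a registry to which services advertise their metadata and which clients query. The following is a minimal in-memory stand-in, assuming service metadata consists only of a name, an endpoint URL and a set of concept codes; the real caGrid Index Service and its metadata schema are not modeled, and the URLs and concept codes in the example are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceMetadata:
    name: str
    url: str
    concepts: frozenset = field(default_factory=frozenset)  # e.g. EVS concept codes


class InMemoryIndex:
    """Toy registry: services advertise metadata, clients discover by concept."""

    def __init__(self):
        self._services = {}

    def register(self, metadata):
        # Re-registering the same URL replaces the earlier entry (a refresh).
        self._services[metadata.url] = metadata

    def find_by_concept(self, concept):
        # Return matching services in a stable, name-sorted order.
        return sorted(
            (m for m in self._services.values() if concept in m.concepts),
            key=lambda m: m.name,
        )


if __name__ == "__main__":
    index = InMemoryIndex()
    index.register(ServiceMetadata("NormalizationService",
                                   "https://example.org/services/Normalization",
                                   frozenset({"C0001"})))
    print(index.find_by_concept("C0001"))
```

Concept-based lookup like this only works if the semantic metadata is filled in when the service registers.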
Tips / Suggestions
- Work iteratively: use the top-down or bottom-up method in iterations until you have refined what you really want.
- Be prepared to refactor: don't go too far down one path until you have vetted all the issues, and design for change.
- Browse the caDSR and EVS to determine how well the domain you are working with is represented. This indicates how likely it is that existing models or CDEs already cover what you are doing.
Analytical Services Generic Parameters
Motivation
- Analytical services, unlike data services, have short lifespans
- The overhead of registering parameter classes constitutes a significant portion of the development effort
- Preexisting analytical services need to remodel their service parameters the caBIG way
- Parameters change often; each software version may have different parameters
Proposal: Generic Parameter Model
- Use a generic parameter model to pass parameters to the services
- A simple, reusable metadata model facilitates auto-generation of the parameter metadata service implementation
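A generic parameter model replaces one registered class per parameter set with a single reusable structure: each parameter travels as a name/value pair, and a small metadata record (name, type, description, default) describes it. The following is a minimal sketch assuming that shape; the field names and the type vocabulary are illustrative and are not the working group's draft model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical type vocabulary: string values are converted with these functions.
CONVERTERS = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": lambda s: s.strip().lower() in ("true", "1", "yes"),
}


@dataclass(frozen=True)
class ParameterMetadata:
    name: str
    type: str
    description: str
    default: Optional[str] = None  # None means the parameter is required


@dataclass(frozen=True)
class Parameter:
    name: str
    value: str  # every value travels as a string


def bind(params, metadata):
    """Validate generic name/value parameters against metadata and convert them."""
    spec = {m.name: m for m in metadata}
    unknown = [p.name for p in params if p.name not in spec]
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    supplied = {p.name: p.value for p in params}
    bound = {}
    for m in metadata:
        raw = supplied.get(m.name, m.default)
        if raw is None:
            raise ValueError(f"missing required parameter: {m.name}")
        bound[m.name] = CONVERTERS[m.type](raw)
    return bound


if __name__ == "__main__":
    META = [ParameterMetadata("method", "string", "Normalization method", "quantile"),
            ParameterMetadata("iterations", "integer", "Number of iterations")]
    print(bind([Parameter("iterations", "5")], META))
```

Because the metadata list is plain data, a parameter metadata operation could simply return it, which is what makes that part of the service easy to generate. Adding a parameter in a new software version changes only the metadata, not a registered class.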
Proposal: Generic Parameters Metadata Model
- Extend the caGrid service metadata with a parameter metadata model (as discussed with the caGrid team)
- All metadata is handled at the caGrid level
- Draft model
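Extending the service metadata could amount to attaching a parameter list to each operation in the metadata document, so that clients read parameter names and descriptions at the grid level instead of in the caDSR. The following sketch serializes such an operation with Python's `xml.etree.ElementTree` and reads it back. The element and attribute names (`Operation`, `ParameterMetadata`, `Parameter`, `Description`) are invented for illustration and are not the caGrid ServiceMetadata schema.

```python
import xml.etree.ElementTree as ET


def operation_element(op_name, parameters):
    """Build an <Operation> element carrying parameter metadata.
    Element and attribute names are hypothetical, not the caGrid schema."""
    op = ET.Element("Operation", name=op_name)
    container = ET.SubElement(op, "ParameterMetadata")
    for p in parameters:
        el = ET.SubElement(container, "Parameter", name=p["name"], type=p["type"])
        if "default" in p:
            el.set("default", p["default"])
        ET.SubElement(el, "Description").text = p["description"]
    return op


def parameters_from(xml_text):
    """What a grid client could do: read parameter names and descriptions back."""
    op = ET.fromstring(xml_text)
    return [
        {"name": el.get("name"), "type": el.get("type"),
         "description": el.findtext("Description")}
        for el in op.iterfind("ParameterMetadata/Parameter")
    ]


if __name__ == "__main__":
    op = operation_element("normalize", [
        {"name": "method", "type": "string", "default": "quantile",
         "description": "Normalization method"},
    ])
    text = ET.tostring(op, encoding="unicode")
    print(text)
    print(parameters_from(text))
```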
Pros and Cons
- Pros
  - Saves time (1 FTE developer-week per service)
  - More analytical services on caGrid, available to caBIG
  - Actual parameters and their descriptions are still available at the grid level
  - No caDSR/GME registration dependency (if all classes are reused)
- Cons
  - No parameter reuse
  - No concept-based discovery of services unless the semantic metadata is properly filled in at the service level
  - No semantic interoperability based on parameters (not likely to happen anyway)
  - No CDEs for parameters
  - A different place to look for parameter metadata (not the caDSR)
  - The proposed model is not appropriate for non-caGrid services
Next Steps
- Develop a demo service using
  - The process recommended by the caGrid team
  - The Introduce Toolkit
  - The Service Loader
  - The generic parameter model
- After development, assess
  - The impact of the new process on overall service development and registration time
  - The usability of the Service Loader process
  - The impact of the generic parameter model on semantics
- Write a best practices white paper