caBIG Overview


1
Data Grid Services Design: caGrid 0.5
Manav Kher, Ruowei Wu, Jijin Yan, Ram Chilukuri
August, 2005
2
Outline
  • Data service architecture
  • Silver compliant data services
  • Silver data services and the grid
  • caBIG data resources
  • caBIG XML query activity
  • Query language and Perform document
  • Data service configuration files
  • Deployment diagram
  • Data services and Grid security
  • Future actions

3
Grid Data Service Architecture
[Figure: architecture diagram from OGSA-DAI]
4
Grid Data Service Architecture (Cont.)
Data layer: the data layer consists of caBIG silver data resources (server side).
Business logic layer: this layer encapsulates the core functionality, including:
  • Execution of Perform documents, which specify sequences of data resource queries and updates, data transformation, and delivery operations.
  • Preparation of responses to client requests for data resource query, update, transformation and delivery activities. Responses include execution status information and can also include data; they take the form of Response documents.
  • Data transformation and delivery management.
  • caBIG data resource and SDK activity.
Presentation layer / business logic layer interface: this interface carries information between the presentation and business logic layers. It supports invocation of OGSA-DAI functionality within the business logic layer independently of any Web or Grid environment, so that non-Web-enabled clients can also access OGSA-DAI functionality directly.
Presentation layer: this layer encapsulates the functionality for exposing OGSA-DAI to a Grid via Web- or Grid-enabled interfaces. Each realization has an associated WSDL and XML Schema describing its Web- or Grid-enabled interface. The following presentation layer interfaces are supported:
  • OGSA-DAI OGSI-compliant services based on Globus Toolkit 3.2
  • OGSA-DAI WS-RF-compliant services based on Globus Toolkit 4.0
  • OGSA-DAI WS-I-compliant services based on Apache Axis 1.2
Client: OGSA-DAI provides a Client Toolkit that offers a higher level of interaction with OGSA-DAI services than exchanging Perform and Response documents directly.
5
caGrid Layers
  • Data layer: caBIG object resources
  • Metadata and semantic connector layer: caDSR and EVS
  • Grid layer: GT3 and OGSA-DAI
6
Silver compliant data services
[Diagram labels: Standard API; Client generated; caDSR; CDE; Object model; Generated server; EVS; Standard vocabulary; Data]
7
Silver data resource in the caGrid infrastructure
[Diagram labels: Standard data grid interface; EVS; caBIG Grid infrastructure; Query; Silver compliant data services]
8
caBIG Data Resource - caGrid OGSA-DAI extensions.
DataResourceMediator (from OGSA-DAI)
  • Reference implementations extend the OGSA-DAI DataResourceMediator abstract class:
    SDKDataResourceMediator
    CaArrayDataResourceMediator
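The extension pattern on this slide (an abstract mediator with data-source-specific reference implementations) can be sketched in Python as follows. This is a minimal analogue under stated assumptions, not the caGrid code: the real classes are Java, the method name `execute_query` is hypothetical rather than part of the OGSA-DAI `DataResourceMediator` API, and the in-memory records stand in for an SDK-generated data source.

```python
from abc import ABC, abstractmethod


class DataResourceMediator(ABC):
    """Stand-in for the OGSA-DAI abstract mediator (hypothetical method names)."""

    @abstractmethod
    def execute_query(self, target: str, criteria: dict) -> list:
        """Return the records of class `target` that satisfy `criteria`."""


class SDKDataResourceMediator(DataResourceMediator):
    """Hypothetical SDK-backed mediator; an in-memory list replaces the real API."""

    def __init__(self, records: list):
        self._records = records

    def execute_query(self, target, criteria):
        # Keep records of the requested class whose fields equal every criterion value.
        return [
            rec for rec in self._records
            if rec["class"] == target
            and all(rec.get(key) == value for key, value in criteria.items())
        ]


mediator = SDKDataResourceMediator([
    {"class": "Gene", "id": 2},
    {"class": "Gene", "id": 3},
    {"class": "Taxon", "id": 2},
])
print(mediator.execute_query("Gene", {"id": 2}))
```

A caArray mediator would be another subclass overriding the same method; callers depend only on the abstract type, which is what lets one query activity sit in front of several data sources.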
9
Interacting with data resources
OGSA-DAI and the caGrid extensions support interaction with data resources, and other data manipulation operations, through a document-oriented interface:
  • Activities: the data resource manipulation, data transformation and delivery actions that a client wants an OGSA-DAI service to perform. Activities are the basic building blocks of Perform documents.
  • Perform documents: used by clients to tell OGSA-DAI services which data resource query and update, data transformation and data delivery activities they want executed.
  • Response documents: used by OGSA-DAI services to inform clients of the execution status of their Perform documents and, often, to return data to the client.
10
Activities
Activities include the data resource manipulation, data transformation and delivery actions that a client wants an OGSA-DAI service to perform. Some activities are data resource-specific (e.g. relational or XML query activities); others (e.g. delivery and data transformation) are generic. Activities are the basic building blocks of Perform documents.
Activities are designed to interoperate. For example, the output of a caGridQuery can be directed to a deliverToURL activity, allowing data to be delivered to third parties. To support this, an activity can have zero or more inputs and zero or more outputs. These outputs can be given specific names and are termed streams.
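The stream-based wiring described above can be illustrated with a small Python model. It is a sketch only: the `Activity` class and the stream name `results` are hypothetical, and it shows just the naming rule, i.e. an activity consumes a stream when one of its input names matches another activity's output name.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """An activity with zero or more named input and output streams."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def connections(activities):
    """Return (producer, consumer, stream) for every input name that matches an output name."""
    producer_of = {}
    for act in activities:
        for stream in act.outputs:
            producer_of[stream] = act.name
    links = []
    for act in activities:
        for stream in act.inputs:
            if stream in producer_of:
                links.append((producer_of[stream], act.name, stream))
    return links


query = Activity("caGridQuery", outputs=["results"])
deliver = Activity("deliverToURL", inputs=["results"])
print(connections([query, deliver]))
```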
11
caBIG Activity - caGrid OGSA-DAI extensions.
Activity (from OGSA-DAI)
  • Extends the OGSA-DAI Activity class.
  • Query language implementation that represents the data source API.
  • When code is generated with the SDK, no additional code is required to expose the data service on the grid.
  • One query language regardless of the data source.

[Class diagram: CaBIGXMLQueryActivity, SDKXMLQueryActivity, CaArrayXMLQueryActivity]
12
Perform Document
Perform documents are used by clients to specify
to OGSA-DAI services the data resource query and
update, data transformation and data delivery
activities they want executed. A Perform document
specifies an inter-connected collection of one or
more activities. Activities are connected by
ensuring that the output stream of one activity
is named as the input stream of another activity.
Any activity whose output stream(s) are not referenced by another activity's input stream(s) will have its output inserted into the Response document.
<gridDataServicePerform xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2003/07/gds/types">
  <documentation>This example demonstrates how to parameterise a caBIO query</documentation>
  <caBIGXMLQuery name="MyQueryTest1">
    <Target name="gov.nih.nci.cabio.domain.Taxon" path="gov.nih.nci.cabio.domain.Taxon">
      <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
        <Property name="id" predicate="equal" value="2"/>
      </Objects>
    </Target>
    <webRowSetStream name="myQueryOutput"/>
  </caBIGXMLQuery>
  <deliverToGDT name="deliverQueryResults">
    <fromLocal from="myQueryOutput"/>
    <toGDT streamId="otherServiceInput" mode="full">http://localhost:8080/www.Georgetown.edu</toGDT>
  </deliverToGDT>
</gridDataServicePerform>
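The rule that unreferenced output streams end up in the Response document can be checked mechanically. The Python sketch below parses the Perform document from this slide (with the `xsi:schemaLocation` attribute and `documentation` element dropped, since the slide does not declare the `xsi` prefix) and assumes, as a simplification, that output streams are declared by elements whose tag ends in `Stream` and that inputs are referenced through `fromLocal`.

```python
import xml.etree.ElementTree as ET

PERFORM = """<gridDataServicePerform>
  <caBIGXMLQuery name="MyQueryTest1">
    <Target name="gov.nih.nci.cabio.domain.Taxon" path="gov.nih.nci.cabio.domain.Taxon">
      <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
        <Property name="id" predicate="equal" value="2"/>
      </Objects>
    </Target>
    <webRowSetStream name="myQueryOutput"/>
  </caBIGXMLQuery>
  <deliverToGDT name="deliverQueryResults">
    <fromLocal from="myQueryOutput"/>
    <toGDT streamId="otherServiceInput" mode="full">http://localhost:8080/www.Georgetown.edu</toGDT>
  </deliverToGDT>
</gridDataServicePerform>"""


def response_streams(perform_xml):
    """Names of output streams that no activity consumes (these go to the Response)."""
    root = ET.fromstring(perform_xml)
    # Simplifying assumption: output streams are declared by tags ending in "Stream".
    outputs = {el.get("name") for el in root.iter() if el.tag.endswith("Stream")}
    # Input streams are referenced through <fromLocal from="..."/>.
    consumed = {el.get("from") for el in root.iter("fromLocal")}
    return outputs - consumed


print(response_streams(PERFORM))
```

For this document the result is empty, because the delivery activity consumes `myQueryOutput`; without `deliverToGDT`, that stream would be routed into the Response document instead.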
13
Sample XML Query Language
Description: run a caGrid query on a caBIG data resource API.
<caBIGXMLQuery name="MyQueryTest7">
  <Target name="gov.nih.nci.cabio.domain.Agent">
    <Objects name="gov.nih.nci.cabio.domain.impl.Target">
      <Group LogicRelation="OR">
        <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
          <Property name="id" predicate="equal" value="2"/>
        </Objects>
        <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
          <Property name="symbol" predicate="like" value="Nat"/>
        </Objects>
      </Group>
    </Objects>
  </Target>
</caBIGXMLQuery>
14
Query language Specification
  • Element caBIGXMLQuery: represents the activity.
    • Attribute name: a name for the query. The attribute is currently not processed, so the name can be arbitrary.
  • Element Target: the target object of a search query.
    • Attribute name: the name of the target object, given as a fully qualified (package) name.
    • Attribute path: the association between the target object and the criteria object. Different paths may produce different search results.
  • Element Objects: a search criteria object. In the NCICB object-oriented data model, search criteria are expressed as objects.
    • Attribute name: the name of the search criteria object, also given as a fully qualified name.
  • Element Group: composes a list or collection of search criteria objects.
    • Attribute LogicRelation: currently only "OR" and "AND" are supported. "OR" represents a list and "AND" represents a collection.
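To make the element structure concrete, the following Python sketch parses a caBIGXMLQuery (the MyQueryTest7 sample from the previous slide) into nested dictionaries that mirror the specification: a target with an optional path, criteria objects with their properties, and groups whose LogicRelation must be "OR" or "AND". It only reads the structure and does not execute the query; the dictionary keys and function names are illustrative choices, not part of caGrid.

```python
import xml.etree.ElementTree as ET

QUERY = """<caBIGXMLQuery name="MyQueryTest7">
  <Target name="gov.nih.nci.cabio.domain.Agent">
    <Objects name="gov.nih.nci.cabio.domain.impl.Target">
      <Group LogicRelation="OR">
        <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
          <Property name="id" predicate="equal" value="2"/>
        </Objects>
        <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
          <Property name="symbol" predicate="like" value="Nat"/>
        </Objects>
      </Group>
    </Objects>
  </Target>
</caBIGXMLQuery>"""


def parse_criteria(el):
    """Turn an <Objects> or <Group> element into a nested dict."""
    if el.tag == "Group":
        relation = el.get("LogicRelation")
        if relation not in ("OR", "AND"):
            raise ValueError(f"unsupported LogicRelation: {relation!r}")
        return {"group": relation, "members": [parse_criteria(c) for c in el]}
    if el.tag == "Objects":
        return {
            "object": el.get("name"),
            # (name, predicate, value) for each direct <Property> child
            "properties": [
                (p.get("name"), p.get("predicate"), p.get("value"))
                for p in el.findall("Property")
            ],
            "nested": [parse_criteria(c) for c in el if c.tag in ("Objects", "Group")],
        }
    raise ValueError(f"unexpected element: {el.tag}")


def parse_query(xml_text):
    root = ET.fromstring(xml_text)
    target = root.find("Target")
    return {
        "query": root.get("name"),
        "target": target.get("name"),
        "path": target.get("path"),  # None when the attribute is absent
        "criteria": [parse_criteria(c) for c in target],
    }


print(parse_query(QUERY))
```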
15
Response document Specification
  • Element request (one or more): the status of the request, i.e. the execution status of the Perform document.
    • Attribute status (required): the status of the request. It takes one of the following values:
      • PROCESSING: the request is still running.
      • COMPLETED: the request has successfully completed.
      • TERMINATED: the request has been terminated.
      • ERROR: the request encountered a problem.
    • Attribute cause (zero or one): if the status is ERROR, this attribute holds the name of the activity that caused the error.
  • Element result (one or more): the status of an activity plus, depending on the activity, any results or other information.
    • Attribute name (required): the name of the activity. This corresponds to the value of the name attribute of an activity in the Perform document.
    • Attribute status (required): the status of the activity. It takes one of the following values:
      • UNSTARTED: the activity has not yet been started.
      • PROCESSING: the activity is still running.
      • COMPLETED: the activity has successfully completed.
      • ERROR: the activity encountered a problem.
    • Zero or more XML elements containing the results of the activity, if applicable.
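A client-side check of a Response document might look like the Python sketch below. The root element name `gridDataServiceResponse`, the absence of an XML namespace, and the sample content are assumptions made for illustration; only the `request`/`result` elements, their attributes and the status values come from the specification above. Where several `request` elements appear, the sketch reads only the first.

```python
import xml.etree.ElementTree as ET

# Status values listed in the Response document specification.
REQUEST_STATUSES = {"PROCESSING", "COMPLETED", "TERMINATED", "ERROR"}
ACTIVITY_STATUSES = {"UNSTARTED", "PROCESSING", "COMPLETED", "ERROR"}

# Hypothetical sample: root element name and content are illustrative only.
SAMPLE = """<gridDataServiceResponse>
  <request status="ERROR" cause="deliverQueryResults"/>
  <result name="MyQueryTest1" status="COMPLETED"/>
  <result name="deliverQueryResults" status="ERROR"/>
</gridDataServiceResponse>"""


def summarize_response(xml_text):
    """Return (request status, error cause or None, {activity name: status})."""
    root = ET.fromstring(xml_text)
    request = root.find("request")  # first <request> element only
    status = request.get("status")
    if status not in REQUEST_STATUSES:
        raise ValueError(f"unknown request status: {status!r}")
    # "cause" is only meaningful when the request status is ERROR.
    cause = request.get("cause") if status == "ERROR" else None
    results = {}
    for result in root.findall("result"):
        activity_status = result.get("status")
        if activity_status not in ACTIVITY_STATUSES:
            raise ValueError(f"unknown activity status: {activity_status!r}")
        results[result.get("name")] = activity_status
    return status, cause, results


print(summarize_response(SAMPLE))
```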

16
Data resource configuration
  • Tomcat / Axis
    • Server-config.wsdd
  • OGSA-DAI
    • DataResourceConfigCABIG.xml
    • ActivityConfigCABIG.xml
  • Index service
    • caGrid-SDE-registration.xml
    • caGrid-SDE-config.xml
  • Metadata
    • caGrid-common-metadata.xml
    • cadsr-metadata-extract.xml

17
Server-config.wsdd
<service name="cagrid/caBIO" provider="Handler" style="wrapped" use="literal">
  <parameter name="instance-dai.version" value="OGSI 5.0"/>
  <parameter name="instance-schemaPath" value="schema/ogsadai/gds/gds_service.wsdl"/>
  <parameter name="ogsadai.gdsf.config.xml.file" value="/caGrid05/jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/etc/_cagrid_caBIO/dataResourceConfigCABIG.xml"/>
  <parameter name="className" value="gov.nih.nci.cagrid.data.stubs.CaGridDataServiceFactoryPortType"/>
  <parameter name="ogsadai.gdsf.activity.xml.file" value="/caGrid05/jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/etc/_cagrid_caBIO/activityConfigCABIG.xml"/>
  <parameter name="operationProviders" value="org.globus.ogsa.impl.base.providers.servicedata.ServiceDataProviderManager org.globus.ogsa.impl.core.registry.RegistryPublishProvider org.globus.ogsa.impl.ogsi.NotificationSourceProvider org.globus.ogsa.impl.ogsi.FactoryProvider"/>
  <parameter name="dai.version" value="OGSI 5.0"/>
  <parameter name="baseClassName" value="uk.org.ogsadai.service.gdsf.impl.GridDataServiceFactory"/>
  <parameter name="instance-baseClassName" value="uk.org.ogsadai.service.gds.impl.GridDataService"/>
  <parameter name="serviceConfig" value="etc/_cagrid_caBIO/caGrid-SDE-config.xml"/>
  <parameter name="allowedMethods" value=""/>
  <parameter name="instance-operationProviders" value="org.globus.ogsa.impl.ogsi.NotificationSourceProvider"/>
  <parameter name="registrationConfig" value="etc/_cagrid_caBIO/caGrid-SDE-registration.xml"/>
  <parameter name="schemaPath" value="schema/cagrid/cagdsf/caGridDataServiceFactoryPortType_service.wsdl"/>
  <parameter name="instance-name" value="Data Service"/>
  <parameter name="persistent" value="true"/>
  <parameter name="instance-className" value="uk.org.ogsadai.service.gds.GDSPortType"/>
  <parameter name="activateOnStartup" value="true"/>
  <parameter name="handlerClass" value="org.globus.ogsa.handlers.RPCURIProvider"/>
  <parameter name="factoryCallback" value="uk.org.ogsadai.service.gdsf.impl.GridDataServiceFactoryCallback"/>
  <parameter name="name" value="caGrid Data Service Factory"/>
</service>
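The deployment descriptor is essentially a list of name/value parameters. The Python sketch below reads a trimmed copy of the `service` element above into a dictionary; the function name is hypothetical. In a complete Axis `server-config.wsdd` these elements typically live in the Axis WSDD namespace (`http://xml.apache.org/axis/wsdd/`), in which case the lookups would need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <service> element shown above (namespace omitted).
WSDD_FRAGMENT = """<service name="cagrid/caBIO" provider="Handler" style="wrapped" use="literal">
  <parameter name="dai.version" value="OGSI 5.0"/>
  <parameter name="serviceConfig" value="etc/_cagrid_caBIO/caGrid-SDE-config.xml"/>
  <parameter name="operationProviders" value="org.globus.ogsa.impl.base.providers.servicedata.ServiceDataProviderManager org.globus.ogsa.impl.core.registry.RegistryPublishProvider org.globus.ogsa.impl.ogsi.NotificationSourceProvider org.globus.ogsa.impl.ogsi.FactoryProvider"/>
  <parameter name="activateOnStartup" value="true"/>
</service>"""


def service_parameters(xml_text):
    """Return the service name and a {parameter name: value} dictionary."""
    service = ET.fromstring(xml_text)
    params = {p.get("name"): p.get("value") for p in service.findall("parameter")}
    return service.get("name"), params


name, params = service_parameters(WSDD_FRAGMENT)
# operationProviders holds several class names separated by whitespace.
for provider in params["operationProviders"].split():
    print(provider)
```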
18
ActivityConfigCABIG.xml
<activityConfiguration xmlns="http://ogsadai.org.uk/namespaces/2004/05/gdsf/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2004/05/gdsf/config http://localhost:8080/ogsa/schema/cagrid/xsd/cabig_activity_config.xsd">
  <!-- Location of the base perform document schema -->
  <basePerformDocumentSchema location="http://localhost:8080/ogsa/schema/cagrid/types/grid_data_service_types.xsd"/>
  <activityMap schemaBase="http://localhost:8080/ogsa/schema/cagrid/xsd/activities/">
    <!-- caGrid specific activities -->
    <activity name="caBIGXMLQuery" implementation="gov.nih.nci.cagrid.activity.caBIOXMLQueryActivity" schema="caBIG_xml_query.xsd">
      <description>caGrid XML query implementation.</description>
    </activity>
    <!-- Delivery activities -->
  </activityMap>
</activityConfiguration>
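The activity map ties each activity name used in Perform documents to an implementation class and a schema. The Python sketch below extracts that mapping from a trimmed copy of the configuration; resolving `schema` relative to `schemaBase` is an assumption about how the two attributes combine, not something the slide states.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {"cfg": "http://ogsadai.org.uk/namespaces/2004/05/gdsf/config"}

ACTIVITY_CONFIG = """<activityConfiguration xmlns="http://ogsadai.org.uk/namespaces/2004/05/gdsf/config">
  <activityMap schemaBase="http://localhost:8080/ogsa/schema/cagrid/xsd/activities/">
    <activity name="caBIGXMLQuery"
              implementation="gov.nih.nci.cagrid.activity.caBIOXMLQueryActivity"
              schema="caBIG_xml_query.xsd">
      <description>caGrid XML query implementation.</description>
    </activity>
  </activityMap>
</activityConfiguration>"""


def activity_map(xml_text):
    """Return {activity name: (implementation class, schema URL)}."""
    root = ET.fromstring(xml_text)
    amap = root.find("cfg:activityMap", NS)
    base = amap.get("schemaBase")
    return {
        # Assumption: the schema attribute is relative to schemaBase.
        a.get("name"): (a.get("implementation"), urljoin(base, a.get("schema")))
        for a in amap.findall("cfg:activity", NS)
    }


print(activity_map(ACTIVITY_CONFIG))
```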
19
caGrid-SDE-registration.xml
<?xml version="1.0" encoding="UTF-8"?>
<serviceConfiguration xmlns:ogsi="http://www.gridforum.org/namespaces/2003/03/OGSI"
    xmlns:aggr="http://www.globus.org/namespaces/2003/09/data_aggregator"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <registrations>
    <registration registry="http://cagrid-registry.nci.nih.gov:8080/ogsa/services/base/index/IndexService" keepalive="true" lifetime="1200" remove="true">
      <aggr:DataAggregation>
        <ogsi:params>
          <aggr:AggregationSubscription>
            <ogsi:serviceDataNames>
              <ogsi:name xmlns:data="http://cagrid.nci.nih.gov/1/caDSRMetadata">data:caDSRMetadata</ogsi:name>
              <ogsi:name xmlns:com="http://cagrid.nci.nih.gov/1/CommonServiceMetadata">com:CommonServiceMetadata</ogsi:name>
            </ogsi:serviceDataNames>
            <aggr:lifetime>60000</aggr:lifetime>
          </aggr:AggregationSubscription>
        </ogsi:params>
      </aggr:DataAggregation>
    </registration>
  </registrations>
</serviceConfiguration>
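The registration entry can be summarized with a short Python sketch such as the one below, which reads the registry URL, the lifetime settings and the service data names the registration subscribes to. The sample is the configuration above without the XML declaration and the unused `xsd` prefix, and the function name is hypothetical. The slide does not give units for the two lifetime values, so the sketch reports them as plain integers.

```python
import xml.etree.ElementTree as ET

OGSI = "{http://www.gridforum.org/namespaces/2003/03/OGSI}"
AGGR = "{http://www.globus.org/namespaces/2003/09/data_aggregator}"

REGISTRATION = """<serviceConfiguration xmlns:ogsi="http://www.gridforum.org/namespaces/2003/03/OGSI"
    xmlns:aggr="http://www.globus.org/namespaces/2003/09/data_aggregator">
  <registrations>
    <registration registry="http://cagrid-registry.nci.nih.gov:8080/ogsa/services/base/index/IndexService"
        keepalive="true" lifetime="1200" remove="true">
      <aggr:DataAggregation>
        <ogsi:params>
          <aggr:AggregationSubscription>
            <ogsi:serviceDataNames>
              <ogsi:name xmlns:data="http://cagrid.nci.nih.gov/1/caDSRMetadata">data:caDSRMetadata</ogsi:name>
              <ogsi:name xmlns:com="http://cagrid.nci.nih.gov/1/CommonServiceMetadata">com:CommonServiceMetadata</ogsi:name>
            </ogsi:serviceDataNames>
            <aggr:lifetime>60000</aggr:lifetime>
          </aggr:AggregationSubscription>
        </ogsi:params>
      </aggr:DataAggregation>
    </registration>
  </registrations>
</serviceConfiguration>"""


def registrations(xml_text):
    """Summarize each <registration> entry; lifetime units are not interpreted."""
    root = ET.fromstring(xml_text)
    summaries = []
    for reg in root.iter("registration"):  # unqualified element name
        lifetime = reg.find(f".//{AGGR}lifetime")
        summaries.append({
            "registry": reg.get("registry"),
            "lifetime": int(reg.get("lifetime")),
            "keepalive": reg.get("keepalive") == "true",
            # Service data names are kept as written (prefix:localName).
            "serviceData": [n.text.strip() for n in reg.iter(OGSI + "name")],
            "subscriptionLifetime": int(lifetime.text) if lifetime is not None else None,
        })
    return summaries


print(registrations(REGISTRATION))
```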
20
caGrid-SDE-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<serviceConfiguration xmlns:ogsi="http://www.gridforum.org/namespaces/2003/03/OGSI"
    xmlns:aggregator="http://www.globus.org/namespaces/2003/09/data_aggregator"
    xmlns:provider-exec="http://www.globus.org/namespaces/2003/04/service_data_provider_execution"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <installedProviders>
    <providerEntry class="org.globus.ogsa.impl.base.providers.servicedata.impl.AsyncDocumentProvider"/>
  </installedProviders>
  <executedProviders>
    <provider-exec:ServiceDataProviderExecution>
      <provider-exec:serviceDataProviderName>AsyncDocument</provider-exec:serviceDataProviderName>
      <provider-exec:serviceDataProviderImpl>org.globus.ogsa.impl.base.providers.servicedata.impl.AsyncDocumentProvider</provider-exec:serviceDataProviderImpl>
      <provider-exec:serviceDataProviderArgs>-i 60000 -f etc/_cagrid_caBIO/caGrid-common-metadata.xml</provider-exec:serviceDataProviderArgs>
      <provider-exec:refreshFrequency>360</provider-exec:refreshFrequency>
      <provider-exec:async>true</provider-exec:async>
    </provider-exec:ServiceDataProviderExecution>
    <provider-exec:ServiceDataProviderExecution>
      <provider-exec:serviceDataProviderName>AsyncDocument</provider-exec:serviceDataProviderName>
      <provider-exec:serviceDataProviderImpl>org.globus.ogsa.impl.base.providers.servicedata.impl.AsyncDocumentProvider</provider-exec:serviceDataProviderImpl>
      <provider-exec:serviceDataProviderArgs>-i 60000 -f etc/_cagrid_caBIO/cadsr-metadata-extract.xml</provider-exec:serviceDataProviderArgs>
      <provider-exec:refreshFrequency>360</provider-exec:refreshFrequency>
      <provider-exec:async>true</provider-exec:async>
    </provider-exec:ServiceDataProviderExecution>
  </executedProviders>
</serviceConfiguration>
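The provider executions above can be listed programmatically. In this Python sketch the meaning of the `-i` and `-f` arguments is not interpreted; the code simply pairs each flag with the token after it, which assumes the arguments always come in flag/value pairs as they do in this file. The function name is hypothetical, and the sample omits the unused namespace declarations and the `installedProviders` section.

```python
import shlex
import xml.etree.ElementTree as ET

PE = "{http://www.globus.org/namespaces/2003/04/service_data_provider_execution}"

SDE_CONFIG = """<serviceConfiguration
    xmlns:provider-exec="http://www.globus.org/namespaces/2003/04/service_data_provider_execution">
  <executedProviders>
    <provider-exec:ServiceDataProviderExecution>
      <provider-exec:serviceDataProviderName>AsyncDocument</provider-exec:serviceDataProviderName>
      <provider-exec:serviceDataProviderImpl>org.globus.ogsa.impl.base.providers.servicedata.impl.AsyncDocumentProvider</provider-exec:serviceDataProviderImpl>
      <provider-exec:serviceDataProviderArgs>-i 60000 -f etc/_cagrid_caBIO/caGrid-common-metadata.xml</provider-exec:serviceDataProviderArgs>
      <provider-exec:refreshFrequency>360</provider-exec:refreshFrequency>
      <provider-exec:async>true</provider-exec:async>
    </provider-exec:ServiceDataProviderExecution>
    <provider-exec:ServiceDataProviderExecution>
      <provider-exec:serviceDataProviderName>AsyncDocument</provider-exec:serviceDataProviderName>
      <provider-exec:serviceDataProviderImpl>org.globus.ogsa.impl.base.providers.servicedata.impl.AsyncDocumentProvider</provider-exec:serviceDataProviderImpl>
      <provider-exec:serviceDataProviderArgs>-i 60000 -f etc/_cagrid_caBIO/cadsr-metadata-extract.xml</provider-exec:serviceDataProviderArgs>
      <provider-exec:refreshFrequency>360</provider-exec:refreshFrequency>
      <provider-exec:async>true</provider-exec:async>
    </provider-exec:ServiceDataProviderExecution>
  </executedProviders>
</serviceConfiguration>"""


def executed_providers(xml_text):
    """List each provider execution with its arguments split into flag/value pairs."""
    root = ET.fromstring(xml_text)
    providers = []
    for ex in root.iter(PE + "ServiceDataProviderExecution"):
        args = shlex.split(ex.findtext(PE + "serviceDataProviderArgs", default=""))
        # Assumes the arguments come in "-flag value" pairs, as in this config.
        flags = dict(zip(args[0::2], args[1::2]))
        providers.append({
            "name": ex.findtext(PE + "serviceDataProviderName", default="").strip(),
            "impl": ex.findtext(PE + "serviceDataProviderImpl", default="").strip(),
            "flags": flags,
            "refreshFrequency": int(ex.findtext(PE + "refreshFrequency", default="0")),
            "async": ex.findtext(PE + "async", default="false").strip() == "true",
        })
    return providers


for p in executed_providers(SDE_CONFIG):
    print(p["name"], p["flags"].get("-f"))
```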
21
caGrid-common-metadata.xml
<CommonServiceMetadata xmlns="http://cagrid.nci.nih.gov/1/CommonServiceMetadata">
  <researchCenterInfo>
    <researchCenterBioDataType>Biology</researchCenterBioDataType>
    <researchCenterName>Research Center</researchCenterName>
    <researchCenterType>XYZ</researchCenterType>
    <researchCenterAddress>6116 Executive Dr.</researchCenterAddress>
    <researchCenterPhone>301-451-1234</researchCenterPhone>
    <researchCenterFax>301-451-1234</researchCenterFax>
    <researchCenterPOCName>John Brown</researchCenterPOCName>
    <researchCenterDescription>Good</researchCenterDescription>
    <researchCenterComments>Testing caGrid</researchCenterComments>
  </researchCenterInfo>
</CommonServiceMetadata>
22
Semantic Metadata UML to caDSR Mapping
  • UML Class is mapped to an Object Class
  • Attribute of a UML Class is mapped to a Property
  • Combination of UML Class and Attribute is mapped
    to Data Element Concept
  • Combination of UML Class, Attribute and its
    Datatype is mapped to Data Element (CDE)
  • UML classes and their attributes are based on EVS
    concepts
  • UML Model/Project is mapped to Classification
    Scheme
  • Packages are mapped to classification scheme
    items
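The mapping rules above can be expressed as a small Python function. This is a sketch of the naming correspondence only: real caDSR registration also assigns public IDs, versions and EVS concept codes, which are left out here, and the `UMLAttribute` type, the example class and the datatypes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UMLAttribute:
    """A UML attribute and its datatype (hypothetical representation)."""
    name: str
    datatype: str


def map_to_cadsr(project, package, class_name, attributes):
    """Apply the slide's UML-to-caDSR naming rules; IDs and EVS concepts are omitted."""
    return {
        "classification_scheme": project,            # UML model/project
        "classification_scheme_item": package,       # UML package
        "object_class": class_name,                  # UML class
        "properties": [a.name for a in attributes],  # UML attributes
        # Class + attribute -> data element concept
        "data_element_concepts": [(class_name, a.name) for a in attributes],
        # Class + attribute + datatype -> data element (CDE)
        "data_elements": [(class_name, a.name, a.datatype) for a in attributes],
    }


mapping = map_to_cadsr(
    "caCORE", "gov.nih.nci.cabio.domain", "Gene",
    [UMLAttribute("id", "java.lang.Long"), UMLAttribute("symbol", "java.lang.String")],
)
print(mapping["data_elements"])
```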

23
Domain Model Metadata - caDSR
  • Unique identifier: public ID and version
  • Short name: acronym of the project to which the domain model belongs
  • Long name: full name of the project
  • Description: detailed description of the project

24
Domain Object Metadata - caDSR
  • Unique identifier consisting of public ID and version
  • Package-qualified domain object name, plus long name and description based on EVS concepts
  • EVS concepts it is based on:
    • Concept code, concept preferred name, concept definition

25
Domain Object Attribute Metadata - caDSR
  • Unique identifier consisting of public ID (CDE ID) and version
  • Attribute name, long name and description based on EVS concepts
  • EVS concept codes it is based on
  • Value domain information:
    • Datatype
    • Permissible values
    • Concept codes
  • Attribute metadata is a list contained within each domain object

26
Domain Object Association Metadata - caDSR
  • Describes a named relationship (source -> target) between two domain objects
  • Refers to each domain object by its unique identifier (public ID and version) instead of copying its values
  • Association metadata is a list contained within each domain object

27
caDSR-metadata-extract.xml
<?xml version="1.0" encoding="UTF-8"?>
<caDSRMetadata xmlns="http://cagrid.nci.nih.gov/1/caDSRMetadata"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://cagrid.nci.nih.gov/1/caDSRMetadata http://localhost/ogsa/schema/cagrid/types/Common/caDSRMetadata.xsd">
  <domain-model id="2262164" version="3.0">
    <short-name>caCORE</short-name>
    <long-name>caCORE</long-name>
    <description>caCORE Description</description>
    <domain-object id="2223329" version="1.0">
      <full-name>
        <package-name>gov.nih.nci.cabio.domain</package-name>
        <class-name>DiseaseOntologyRelationship</class-name>
      </full-name>
      <long-name>DiseaseOntologyRelationship</long-name>
      <short-name>C45371</short-name>
      <concept-codes-list>
        <concept-element order="0">
          <concept-code>C45371</concept-code>
          <concept-preferred-name>DiseaseOntologyRelationship</concept-preferred-name>
          <concept-definition>The disease relationship specifies the relationship among diseases.</concept-definition>
        </concept-element>
      </concept-codes-list>
      <attributes-list>
        <attribute id="2223846" version="3.0">
          <name>id</name>
          <long-name>Identifier</long-name>
          <short-name>C25364</short-name>
          <concept-codes-list>
            <concept-element order="0">
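The slide's excerpt is cut off mid-document. The Python sketch below reads a trimmed and completed sample built from the values shown (assumed to follow the same element names) and collects, for each domain object, its identifier, package-qualified class name, concept codes and attribute identifiers. The function name and output layout are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://cagrid.nci.nih.gov/1/caDSRMetadata"}

# Trimmed, completed sample assembled from the values on this slide.
SAMPLE = """<caDSRMetadata xmlns="http://cagrid.nci.nih.gov/1/caDSRMetadata">
  <domain-model id="2262164" version="3.0">
    <short-name>caCORE</short-name>
    <domain-object id="2223329" version="1.0">
      <full-name>
        <package-name>gov.nih.nci.cabio.domain</package-name>
        <class-name>DiseaseOntologyRelationship</class-name>
      </full-name>
      <concept-codes-list>
        <concept-element order="0">
          <concept-code>C45371</concept-code>
        </concept-element>
      </concept-codes-list>
      <attributes-list>
        <attribute id="2223846" version="3.0">
          <name>id</name>
          <short-name>C25364</short-name>
        </attribute>
      </attributes-list>
    </domain-object>
  </domain-model>
</caDSRMetadata>"""


def domain_objects(xml_text):
    """Summarize every <domain-object> in a caDSR metadata document."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iterfind(".//c:domain-object", NS):
        qualified = "{}.{}".format(
            obj.findtext("c:full-name/c:package-name", namespaces=NS),
            obj.findtext("c:full-name/c:class-name", namespaces=NS),
        )
        objects.append({
            "id": (obj.get("id"), obj.get("version")),  # public ID and version
            "class": qualified,
            # Object-level concept codes only (attribute-level lists are skipped).
            "concepts": [e.findtext("c:concept-code", namespaces=NS)
                         for e in obj.iterfind("c:concept-codes-list/c:concept-element", NS)],
            "attributes": {a.findtext("c:name", namespaces=NS): (a.get("id"), a.get("version"))
                           for a in obj.iterfind("c:attributes-list/c:attribute", NS)},
        })
    return objects


print(domain_objects(SAMPLE))
```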
28
Data service deployment Diagram
[Deployment diagram: a research center grid node hosts data grid services 1 through N. Each data grid service comprises a caGrid service with its service metadata and service configuration files, running in an application server backed by a database.]
  • Tomcat.
  • Axis.
  • Globus.
  • OGSA-DAI
  • caGrid
  • Data service API.

29
Data service and security
[Diagram, built up step by step over slides 29 to 36. Components: GUMS (User Management), CAMS (Attribute Management), Index Service (Service Registry), caArray Data Service (Secure), Data Service Client.]
Steps added, in order, across the slides:
  • User Login
  • Retrieve Proxy Certificate
  • Discovery
  • Query (Secure)
  • caArray username/password
  • Response
37
Future actions
  • Federated query engine.
  • Implement process for schema management.
  • Test performance with large datasets.
  • Extend query language to support writeable APIs.