Title: Gold Compatibility Criteria and Review Process
1 Gold Compatibility Criteria and Review Process
- Robert Freimuth, Salvatore Mungal, Scott Oster, Lynne Wilkens
- Daniela Smith, Michael Keller
- caBIG Annual Meeting, Washington, D.C., June 25, 2008
2 Agenda
- Introduction
- Overview of Gold Criteria
  - Information Models
  - Common Data Elements
  - Vocabularies
  - Program Interfaces
- Summary
- Q&A
3 Working Groups
- Programming & Messaging Interfaces
  - Tahsin Kurc, OSU
  - Patrick McConnell, Duke University
  - Scott Oster, OSU
  - Andy Pople, University of Pittsburgh
- Common Data Elements
  - Dianne Reeves, NCI CBIIT
  - Baris Suzek, Georgetown
  - Lynne Wilkens, University of Hawaii
- Information Models
  - Bob Freimuth, Mayo Clinic
  - Lewis Frey, University of Utah
  - Rakesh Nagarajan, Washington University
- Vocabularies
  - Jim Buntrock, Mayo Clinic
  - Sal Mungal, Duke University
  - Craig Stancl, Mayo Clinic
  - Stuart Turner, UC-Davis
  - Larry Wright, NCI/OC
- Working Group Facilitators
  - Brian Davis, 3rd Millennium
  - Michael Keller, BAH
  - Daniela Smith, BAH
- NCICB Facilitators
  - George Komatsoulis
  - Avinash Shanbhag
4 Introduction: Levels of Maturity
- Legacy
  - System does not meet any of the requirements for interoperability
  - Implies no interoperability with an external system or resource
- Bronze
  - Minimum requirements for a basic degree of interoperability
- Silver
  - Rigorous set of requirements that significantly reduce the barrier to use of a resource by a remote party who was not involved in the development of that resource
- Gold
  - Full semantic interoperability of disparate systems
  - Formalized grid architecture and data standards
  - Advertising, discovery, and use of all federated caBIG resources
5 Introduction
- The need to incorporate additional criteria into the Gold maturity level of the caBIG Compatibility Guidelines is being driven by:
  - The release of the caGrid 1.0 Infrastructure
  - The experience of the caGrid 1.0 reference implementation projects
  - The experience of early adopters of caGrid 1.0
  - The work done on UML model harmonization
  - The work done on vocabulary standards
  - The work done on CDE re-use
6 Introduction: Compatibility Guidelines v3.0
- Released May 1, 2008
- Gold Compatibility requirements
  - Information Models
  - Data Elements
  - Vocabularies/Terminologies & Ontologies
  - Programming and Messaging Interfaces
- Clarification/revision of Silver Compatibility requirements
- Available through the caBIG web site
  - https://cabig.nci.nih.gov
7 Compatibility Guidelines: https://cabig.nci.nih.gov
8 Introduction: Development of Gold Criteria
- Goal
  - Development and release of the Gold compatibility criteria and review process
- Approach
  - Kick off Working Group composed of XCWS participants (Jan 2008)
  - Sub-groups focused on Interfaces, CDEs, Vocabularies and Information Models
  - Each sub-group generated review criteria based on Compatibility Guidelines v3.0
  - Sub-Group Leads are identifying overlap and harmonizing the checklists (June)
  - The Sub-Group Leads will also develop a review process (June)
  - The review process and review criteria will be piloted (July)
  - Lessons learned will be captured
  - Review criteria and process will be updated as needed
10 Information Models: From Silver to Gold
- Silver level
  - Criteria for UML modeling
  - Criteria for semantic annotation
- Gold level
  - Criteria to meet infrastructure requirements
  - Criteria to promote interoperability
11 Information Models: Criteria for Semantic Annotation
- Criteria for semantic annotation are the same as for Silver level
  - UML names should accurately convey their meaning and be consistent with the definition.
    - This might be upgraded to "absolute" (TBD)
  - UML definitions should accurately describe what they represent and be sufficiently clear to enable accurate semantic annotation.
  - The concepts assigned to each class and attribute must be synonymous (consistent) with the developer-derived UML definition.
12 Information Models: Overview of Criteria
- Criteria to meet infrastructure requirements (caGrid)
  - The entire IM must be fully represented in the XML schema
  - Names for classes and attributes (TBD)
- Criteria to promote interoperability
  - Model harmonization and reuse
  - Analytical services: emphasis on reuse of whole classes for input objects, output objects, and parameters
  - Data services: emphasis on reuse of components from the backbone model
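The requirement that the entire information model be represented in the XML schema lends itself to an automated check. The following is a minimal sketch only: it assumes each UML class maps to an `xs:complexType` of the same name and each UML attribute maps to an `xs:attribute` or `xs:element` inside it. That mapping convention, the `UML_MODEL` dictionary, and the sample schema are illustrative assumptions, not the caGrid tooling.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical summary of a UML model: class name -> attribute names.
UML_MODEL = {
    "Person": {"gender", "birthDate"},
    "Specimen": {"collectionDate"},
}

# Made-up schema for illustration; Specimen is missing its attribute.
XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Person">
    <xs:attribute name="gender" type="xs:string"/>
    <xs:attribute name="birthDate" type="xs:date"/>
  </xs:complexType>
  <xs:complexType name="Specimen"/>
</xs:schema>
"""


def missing_from_schema(uml_model, xsd_text):
    """List UML classes and Class.attribute names absent from the schema."""
    root = ET.fromstring(xsd_text)
    schema = {}
    for ctype in root.iter(XSD_NS + "complexType"):
        names = {a.get("name") for a in ctype.iter(XSD_NS + "attribute")}
        names |= {e.get("name") for e in ctype.iter(XSD_NS + "element")}
        schema[ctype.get("name")] = names

    missing = []
    for cls, attrs in sorted(uml_model.items()):
        if cls not in schema:
            missing.append(cls)
            continue
        missing.extend(f"{cls}.{a}" for a in sorted(attrs - schema[cls]))
    return missing


missing = missing_from_schema(UML_MODEL, XSD_TEXT)
print(missing)  # ['Specimen.collectionDate']
```

An empty list would indicate that, under the assumed mapping, every class and attribute in the model has a counterpart in the schema.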
13 Information Models: Overview of Criteria
- Criteria for reuse
  - Classes and attributes
    - Focus on the backbone model
    - Including associations, if applicable
  - Datatypes
    - Reused, or caBIG-approved
  - Enumerated value domains
    - Included in the model
    - Modeled same way, if reused
- Reused components must be appropriate in the context of the information model and accurately capture the semantic meaning of the underlying data.
14 Information Models: Overview of Criteria
- Preferred order of reuse
  - CDEs from the Backbone Model
  - Standard CDEs that are not in the Backbone Model
  - CDEs from existing Gold level applications
  - CDEs from existing Silver level applications
  - CDEs registered in the caDSR
- For the Information Models criteria, "CDE" refers to the corresponding class/attribute pair in the UML model
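The preferred order of reuse can be read as a simple ranking over candidate sources. The sketch below is illustrative only: the source labels, the `CandidateCDE` structure, and the sample candidates are hypothetical names chosen for this example, not caDSR fields.

```python
from dataclasses import dataclass

# Preference order from the slide, most preferred first.
# The labels are illustrative, not caDSR values.
REUSE_PREFERENCE = [
    "backbone_model",
    "standard_cde",
    "gold_application",
    "silver_application",
    "cadsr_registered",
]


@dataclass(frozen=True)
class CandidateCDE:
    """A class/attribute pair that could be reused (hypothetical structure)."""
    class_name: str
    attribute_name: str
    source: str  # one of REUSE_PREFERENCE


def pick_preferred(candidates):
    """Return the candidate whose source ranks highest, or None if empty."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: REUSE_PREFERENCE.index(c.source))


candidates = [
    CandidateCDE("Person", "birthDate", "cadsr_registered"),
    CandidateCDE("Participant", "birthDate", "gold_application"),
    CandidateCDE("Person", "birthDate", "backbone_model"),
]
best = pick_preferred(candidates)
print(best.source)  # backbone_model
```

A real review would also weigh whether the candidate is semantically appropriate in context; the ranking only breaks ties among acceptable candidates.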
15 Information Models: Overview of Criteria
- Full reuse of classes and attributes from the backbone model is required
  - Justification and examples of interoperability are required otherwise
- If a new CDE is required, partial reuse should be maximized
  - See order of preference
  - Justification and examples of interoperability are required in some cases
- Developers will contribute to the evolution of the backbone model
  - Expansion with new attributes
  - Extension with new classes/attributes
    - Requires data to be a specialization of an existing class in the backbone model
  - Justification is required if similar, existing classes in the backbone model cannot be expanded or extended
  - Contributions will occur indirectly and through a controlled process
16 Information Models: Changes to the Submission Package
- CDE reuse report
  - Breakdown by the source of each component
- UML model reuse report
  - Identifies reuse of and deviations from the backbone model
  - Includes a list of "whole class" reuse for analytical services
- Examples of interoperability (grid joins)
  - Illustrates use cases for how the application will interoperate with existing applications
- Link to the XML schema
17 Information Models: Issues for Future Discussion
- Review the need for additional criteria
  - Content of enumerated VDs (PVs)
  - Non-enumerated VDs (number ranges, string character limits, etc.)
    - Currently there is no way to represent this in the UML model
  - Semantics of associations
- List of approved datatypes
  - Use of approved datatypes is required at Gold level
- Evolution of the backbone model
  - When the backbone model is extended, do the new attributes belong in the base class or in a child class?
  - Process for versioning the backbone model
  - Clarify the requirements for existing Gold applications when the backbone model is revised
18 Information Models: Resource Requirements
- Tooling needs
  - Enumerated VDs
    - Should be included in the model under review, and should also map to an existing VD in the caDSR
    - This will require two models, one for review and one for loading
  - Non-enumerated VDs (number ranges, string character limits, etc.)
    - Include this information as tagged values?
  - Creation of reports for the submission package
    - CDE reuse report, UML model reuse report, vocabulary report, etc.
  - Correspondence checks
    - UML model, XMI, caDSR, API, API docs, XML schema
- Process requirements
  - May require more engagement by mentors to ensure that the backbone model is considered for reuse early in development
20 Common Data Elements
- Basis of CDE criteria
  - Elements must be well-formed and defined in order for data to be joined and shared
- Silver level compatibility criteria
  - Ensured that ISO 11179 criteria are met
    - Pairing of data element concept (Object + property) and Value Domain
    - Registration in caDSR, an ISO 11179 CDE repository
  - Required that all administered elements have good semantics
    - Names are appropriate
    - Definitions exist and are clear
    - NCI Thesaurus codes in caDSR
    - Consistency between UML model and caDSR definitions
- Gold level compatibility criteria
  - Focus on re-use of CDE standards and existing CDEs to facilitate interoperability
21 Common Data Elements
- CDE Criteria in Gold Compatibility Matrix
  - CDEs designated as caBIG Standards by the VCDE workspace must be used as appropriate.
  - CDEs generated from the Backbone Model must be re-used as appropriate.
  - Existing validated CDEs in the caDSR must be re-used or otherwise justified before any new data elements are created.
  - Data elements must be expressed in caGrid standard metadata format
    - The data elements used by the service as part of its operations must be fully described in the caGrid metadata to facilitate effective discovery, advertisement and interoperability.
22 Common Data Elements
- caBIG Standards by the VCDE workspace
  - About 20 CDE packages including 140 CDEs have been approved by VCDE (https://gforge.nci.nih.gov/frs/?group_id=109)
  - Registration status of "Standard" in caDSR
  - Examples: sex/gender, age, address, performance status
  - Re-use is an absolute requirement
    - Full re-use requires use of all administered elements: object/class, property/attribute, value domain, permissible values
    - Violation requires strong justification (regulatory requirements)
- caBIG Backbone Model
  - CDEs in Backbone Model will be registered in caDSR
  - Will be promoted as standards
  - Re-use is an absolute requirement
- Validated CDEs
  - Work flow status of "Released" in caDSR
  - Provide list of those reviewed
  - Provide explanation of lack of re-use
23 Common Data Elements
- Other Re-use Issues
  - For CDEs that would logically have a list of permissible values (PVs), enumeration or enumeration by reference is required
    - List of PVs with definitions must be available to facilitate re-use
  - For newly created CDEs, partial re-use of administered elements of standards is encouraged to facilitate interoperability
    - Data Element Concept re-use (Object/class + Attribute/Property)
    - Value Domain re-use
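The distinction between full and partial re-use can be sketched as a comparison of administered elements. This is a hedged illustration under stated assumptions: the `DataElement` structure, the `reuse_level` helper, and the sample values are hypothetical, and a real comparison would use caDSR identifiers rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Simplified CDE: a data element concept plus a value domain."""
    object_class: str
    property_name: str
    value_domain: str
    permissible_values: frozenset


def reuse_level(candidate, standard):
    """Classify re-use of `standard` by `candidate` as full, partial, or none."""
    # Data Element Concept re-use: same object class and same property.
    same_dec = (
        candidate.object_class == standard.object_class
        and candidate.property_name == standard.property_name
    )
    # Value Domain re-use: same value domain and same permissible values.
    same_vd = (
        candidate.value_domain == standard.value_domain
        and candidate.permissible_values == standard.permissible_values
    )
    if same_dec and same_vd:
        return "full"
    if same_dec or same_vd:
        return "partial"
    return "none"


# Made-up example values, not actual caBIG standard CDEs.
standard = DataElement(
    "Person", "gender", "Gender Code", frozenset({"Male", "Female", "Unknown"})
)
new_cde = DataElement(
    "Patient", "gender", "Gender Code", frozenset({"Male", "Female", "Unknown"})
)
print(reuse_level(new_cde, standard))  # partial
```

Here the new CDE re-uses the value domain but not the data element concept, so it counts as partial re-use and would need the justification described above.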
24 Common Data Elements
- caGrid service metadata
  - XML document must be provided
  - Concept codes for object, properties, value domain and permissible values in XML document must agree with caDSR mapping
  - Requirement is absolute
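Agreement between the concept codes in the service metadata XML and the caDSR mapping is the kind of check that can be automated. The sketch below assumes a deliberately simplified, hypothetical XML layout (`umlClass` and `umlAttribute` elements with a `conceptCode` attribute) and a hand-written dictionary standing in for the caDSR mapping; neither reflects the actual caGrid metadata schema or the caDSR API, and the codes are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata fragment; not the real caGrid schema.
METADATA_XML = """
<serviceMetadata>
  <umlClass name="Person" conceptCode="C0001">
    <umlAttribute name="gender" conceptCode="C0002"/>
    <umlAttribute name="birthDate" conceptCode="C0003"/>
  </umlClass>
</serviceMetadata>
"""

# Stand-in for the caDSR mapping; placeholder codes, not NCI Thesaurus codes.
CADSR_CODES = {
    "Person": "C0001",
    "Person.gender": "C0002",
    "Person.birthDate": "C9999",
}


def find_mismatches(xml_text, cadsr_codes):
    """Return (name, code_in_xml, code_in_cadsr) for every disagreement."""
    root = ET.fromstring(xml_text)
    mismatches = []
    for cls in root.iter("umlClass"):
        cls_name = cls.get("name")
        entries = [(cls_name, cls.get("conceptCode"))]
        for attr in cls.iter("umlAttribute"):
            entries.append(
                (f"{cls_name}.{attr.get('name')}", attr.get("conceptCode"))
            )
        for key, code in entries:
            expected = cadsr_codes.get(key)
            if expected != code:
                mismatches.append((key, code, expected))
    return mismatches


mismatches = find_mismatches(METADATA_XML, CADSR_CODES)
print(mismatches)  # [('Person.birthDate', 'C0003', 'C9999')]
```

Because the requirement is absolute, any non-empty result would fail the review under this sketch.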
25 Common Data Elements
- Tooling needs
  - Tooling to compare concept codes between service metadata, UML model and caDSR
  - Report of use of concept codes in application under review within caDSR
  - Report of use of similar concepts based on name
  - Helpful if caDSR could identify like CDEs
    - In particular, CDEs with the same attributes except that object class of person is substituted with patient or participant
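One way to picture the "like CDEs" report is to group CDEs that share a property and value domain but differ only in a related object class. The following is a rough sketch under stated assumptions: the tuple representation, the `RELATED_OBJECT_CLASSES` set, and the sample records are illustrative and do not describe an existing caDSR feature.

```python
from collections import defaultdict

# Object classes treated as interchangeable for this report (assumption).
RELATED_OBJECT_CLASSES = {"Person", "Patient", "Participant"}


def group_like_cdes(cdes):
    """Group CDEs that differ only by a related object class.

    `cdes` is an iterable of (object_class, property, value_domain) tuples.
    Returns {(property, value_domain): sorted list of object classes} for
    every group containing more than one related object class.
    """
    groups = defaultdict(set)
    for object_class, prop, value_domain in cdes:
        if object_class in RELATED_OBJECT_CLASSES:
            groups[(prop, value_domain)].add(object_class)
    return {
        key: sorted(classes)
        for key, classes in groups.items()
        if len(classes) > 1
    }


# Made-up records for illustration.
cdes = [
    ("Person", "birthDate", "Date"),
    ("Patient", "birthDate", "Date"),
    ("Participant", "gender", "Gender Code"),
    ("Specimen", "birthDate", "Date"),
]
like = group_like_cdes(cdes)
print(like)  # {('birthDate', 'Date'): ['Patient', 'Person']}
```

Each reported group is a candidate for harmonization or for a documented justification of why re-use was not possible.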
27 Vocabularies
- Vocabulary Criteria will map to
  - Full adoption of caBIG vocabulary standards as approved by the VCDE workspace.
  - Concept identification in systems must use the caBIG Identifier and Resolution Scheme
  - Metadata of vocabularies must be accessed through a standard caGrid Vocabulary API
  - Vocabularies must be discovered through a standard caGrid Vocabulary API
28 Vocabularies
- Questions/Challenges to Address
  - Will many vocabularies really satisfy our vocabulary review criteria for caBIG Vocabulary standards?
  - How can we develop review criteria around Concept IDs and their resolution on caGrid as this is still under development?
  - What is the relationship of vocabulary standardization process to the Silver/Gold Compatibility Guidelines?
29 Changes in the Compatibility Guidelines: Silver Vocabulary and Ontologies Matrix
- Differentiated from bronze and silver where all data collection fields and attributes of data objects are approved by caBIG VCDE Workspace
  - Vocabularies used in data elements should be compatible with caBIG Identifier and Resolution Scheme
  - Approved vocabularies will provide a minimum set of core metadata
  - Approved vocabularies will be classified based on scope, intent, and purpose
30 Changes in the Compatibility Guidelines: Silver Vocabulary and Ontologies Text
- Updated to reflect review checklist
  - Vocabularies/Ontologies will be assessed via LexEVS on the grid (LexBIG and EVS will merge under the name of LexEVS)
  - Added description of caBIG Identifier Scheme for semantic classes
  - Added description of the caBIG Identifier Resolution Scheme for resolving identifiers
31 Changes in the Compatibility Guidelines: Gold Vocabulary and Ontologies Matrix
- Differentiated from Silver based on usage of common identifier scheme and common vocabulary API
  - Full adoption of approved caBIG vocabulary standards
  - Vocabularies will utilize the caBIG Identifier and Resolution Scheme
  - Vocabularies will be accessible through a standard vocabulary API
  - Compatible systems will reference standard vocabularies approved for use by gold systems
32 Changes in the Compatibility Guidelines: Gold Vocabulary and Ontologies Text
- Added detail on gold compatibility
  - Approved caBIG vocabulary standards are enabled
    - Registered terminologies approved as caBIG standards for caBIG usage are accessed via terminology metadata and discovered through a caGrid vocabulary service (caGrid Vocabulary API)
  - Vocabulary is accessible through a standard caGrid vocabulary API
    - The current caGrid Vocabulary API is EVS. LexBIG and EVS will merge under the name of LexEVS
33 Gold Vocabularies
- Tooling needs
  - Tooling to confirm correspondence between concept IDs, names, and/or definitions in the Information Model with the source terminology.
  - Tooling to confirm correspondence between concept IDs, names, and/or definitions in the Service Metadata with the source terminology.
  - These tools will not be implemented until the Resolution Scheme is fully implemented
34 Changes in the Compatibility Guidelines: Gold Vocabulary and Ontologies Text
- Recap: Gold Vocabulary and Ontologies are
  - Accessed and discovered via caGrid services
  - Provided with a standard set of metadata
  - Mapped and implemented with caBIG Identifier and Resolution Scheme
  - Classified based on scope, purpose and intent
- Creation of tools needed to help the vocabulary review process
36 Interfaces
- Interface Criteria to map to
  - APIs are exposed as operations of a Grid service; Object-Oriented client APIs are available for invoking those operations
  - Service operations use XML as data exchange format, and are invoked using standardized protocols and communication channels
  - Services provide public access to caGrid standardized service metadata and have capability to register it with a caGrid Index Service
  - Data-oriented services provide query access using the caGrid standardized query interface and language
  - Secure services must use the caGrid standardized mechanisms for authentication, trust management, and communication channel protection
- Questions/Challenges to Address
  - What is the distinction between reviewing a Silver API vs. a Gold API?
  - Tooling for tedious consistency checking of various artifacts
  - How will schemas in the GME be mapped to UML models?
37 Changes in the Compatibility Guidelines: Gold API (Grid Services)
- The primary change for Gold APIs is the move to grid services and APIs, where data is transported over the grid as well-defined XML
  - Tooling exists to make the development experience very similar to any existing Silver API which is client/server based
- However, Gold compliance additionally requires
  - Adherence to several standards and specifications
  - Standardized approaches to metadata and security
  - Specific (additional) constraints for data query capabilities
- Review process will focus on checking for standards compliance and consistency between existing artifacts (UML models, APIs, etc.) and new grid-specific artifacts (WSDLs, XSDs, service metadata, etc.)
38 Changes in the Compatibility Guidelines: Gold API (Metadata)
- Gold compliance introduces the concept of service metadata to all systems which are exposed as grid services
  - Provides programmatic runtime access to metadata about the API, information model, CDEs, and vocabulary
  - Tooling exists to automate the development experience such that most information is extracted from existing systems (e.g. caDSR) and metadata is created automatically
- Review process will focus on checking for existence, syntax compliance, accuracy, registration, and consistency of metadata
39 Changes in the Compatibility Guidelines: Gold API (Security)
- Gold compliance unifies standards, technologies, and methodologies for authentication, authorization, message transport, and trust
  - Built upon X.509, HTTPS, and web/grid service standards
  - Tooling exists to simplify accrual and use of credentials, management of trust, and service security configuration
- Review process will focus on appropriate use of authentication process, integration with the caBIG trust fabric, and use of standards and technologies for transport
40 Changes in the Compatibility Guidelines: Gold API (Applications)
- Gold compliant applications are expected to correctly leverage (secure) grid services, make use of the discoverable nature of the grid, present data using registered semantics, and build on existing tooling/languages/APIs when possible
  - Many high-level APIs and frameworks exist for application developers to leverage
- Review process will focus on use of the grid APIs and tools, presentation of data, and integration with security infrastructure
41 Changes in the Compatibility Guidelines: Gold API Review Challenges
- Gold compliance introduces many new artifacts (grid service, metadata, WSDL, XSDs, etc.)
  - Must be checked for consistency with each other and existing Silver artifacts (UML models, APIs, etc.)
  - Magnitude of items to check practically requires automation
- Review process is complicated if existing Silver review has been performed, or if both Silver and Gold compliance are eventually sought
- Large systems with many components seeking reviews may have fuzzy boundaries (e.g. an application consisting of a UI and 1 or more grid services)
- Some criteria are most ideally realized by emerging technology or software in development (e.g. caDSR/GME binding, identifiers, distributed vocabulary services, etc.)
- Gold criteria groups have more overlap/dependencies than Silver criteria
43 Summary: Gold Criteria
- Information Models
  - Model harmonization and reuse
  - UML model represented in the XML schema
- CDEs
  - Reuse of standards
  - Concept codes in the XML document
- Vocabularies
  - Use of vocabulary standards
  - caBIG Identifier and Resolution Scheme
- Program Interfaces
  - Grid services and security
  - Service metadata
44 Next Steps
- Kick-off Working Groups
- Finalize individual checklists
- Sub-Group Leads harmonize checklists
- Sub-Group Leads develop review process (based on Silver)
- Working Group signs off on harmonized checklists and process
- Pilot review process with gridPIR (July 2008)
- Develop lessons learned
- Modify review criteria and process as needed
- Present to Architecture and VCDE WS for review and approval