Object Oriented Data Technology for Space Science Data Archiving and Retrieval: Potential Applications in Biomedical Research

1
Object Oriented Data Technology for Space Science
Data Archiving and Retrieval: Potential
Applications in Biomedical Research
March 21, 2001
Dan Crichton, Steve Hughes
Jet Propulsion Laboratory, California Institute of Technology
National Aeronautics and Space Administration
2
Who are we?
  • JPL is a federally funded research and
    development center (FFRDC) run by Caltech for
    NASA
  • JPL is NASA's lead center for robotic exploration
    of the universe
  • JPL has an enormous amount of data to manage,
    ranging from scientific to engineering to
    institutional data
  • We represent several efforts on both the research
    and enterprise sides of JPL that are addressing
    enterprise architectures for integrating data at
    both JPL and NASA. Such efforts include:
  • Planetary Data System - NASA Planetary Archive
  • Enterprise Data Architecture and Applications
  • JPL IT Data Infrastructure
  • Knowledge Management - JPL/NASA KM Initiative
  • Object Oriented Data Technology - I.T. Data
    System Research

3
Challenges to Data Management: Archiving, Search,
Retrieval and Integration
  • Space scientists cannot easily locate or use data
    across the hundreds if not thousands of
    autonomous, heterogeneous, and distributed data
    systems currently in the Space Science community.
  • Heterogeneous Systems
  • Data Management - RDBMS, ODBMS, HomeGrownDBMS,
    BinaryFiles
  • Platforms - UNIX, LINUX, WIN3.x/9x/NT, Mac, VMS, ...
  • Interfaces - Web, Windows, Command Line
  • Data Formats - HDF, CDF, NetCDF, PDS, FITS, VICAR,
    ASCII, ...
  • Data Volume - KiloBytes to TeraBytes
  • Heterogeneous Disciplines
  • Moving targets and stationary targets
  • Multiple coordinate systems
  • Multiple data object types (images, cubes, time
    series, spectrum, tables, binary, document)
  • Multiple interpretations of single object types
  • Multiple software solutions to same problem
  • Incompatible and/or missing metadata

4
Evolution of Data Systems (Trying to make order
out of entropy)
Diagram axes: Data System Maturity; Centralized Data to Distributed Data
  • Basic Data Infrastructure - Data Acquisition,
    Databases, Data Analysis Tools, Homogeneous
    Computing
  • Data Archiving - Catalog Systems, Data sets,
    Data Products, Metadata
  • Data Location - Metadata, Distributed Data sets,
    Distributed Services
  • Data Product Exchange/Interoperability -
    Heterogeneous Servers, Data Interchange, Data
    Sharing, Distributed Architectures
5
Independent Access to PDS Archives
Odyssey Interfaces
Viking Interface
Mars Odyssey
Mars Global Surveyor
Ancillary Files
Viking
6
PDS Nodes and Institutions (Silos)
Geosciences/Washington University
Rings/Ames
Radio Science/Stanford
Small Bodies/UMD
Planetary Plasma/UCLA
Imaging/JPL
Central Node/JPL
Imaging/USGS
Atmospheres/New Mexico State
NAIF/JPL
7
Managing Software Interfaces (Number of Systems vs.
Number of Interfaces)
Plot axes: number of interfaces, number of systems
Number of interfaces = (n^2 - n) / 2, where n is the
number of systems
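The growth described by the formula above can be tabulated with a short Python sketch. The function name `pairwise_interfaces` is hypothetical and used only for illustration; it assumes every system needs its own interface to every other system.

```python
def pairwise_interfaces(n: int) -> int:
    """Number of point-to-point interfaces needed to connect n systems pairwise."""
    if n < 0:
        raise ValueError("number of systems must be non-negative")
    # (n^2 - n) / 2 is always a whole number, so integer division is exact.
    return (n * n - n) // 2


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(f"{n:>3} systems -> {pairwise_interfaces(n):>5} interfaces")
```

The count grows quadratically, which is the motivation for avoiding one custom interface per pair of systems.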
8
DIS Approach (circa 1998)
  • PDS Distributed Inventory System (DIS)
  • Identified all resources available across the PDS
  • Used Object Description Language (keyword/value)
    labels to describe data and non-data resources
  • Included URL/FTP links to all online resources
  • Used cgi-script engine to search label database
  • Displayed results in an HTML page

9
Product Label
OBJECT = DATA_SET
  DATA_SET_NAME = VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL
  DATA_SET_ID = VO1/VO2-M-VIS-5-DIM-V1.0
  DATA_SET_DESC = http://pds.jpl.nasa.gov/cgi-bin/remote.pl?dsVO1/VO22BMARS
  DATA_SET_TERSE_DESC = http://pds.jpl.nasa.gov/cgi-bin/remote.pl?dsVO1/VO22BM
  DATA_SET_RELEASE_DATE = 1991
  LINK = http://www-pdsimage.wr.usgs.gov/PDS/public/mapmaker/mapmkr.htm
  ARCHIVE_STATUS = ARCHIVED
  ARCHIVE_STATUS_NOTE = Passed peer review with all liens resolved. Available through PDS and NSSDC
  ARCHIVE_SCHEDULE_NOTE = 1999-08-18ARCHIVED
  DATA_OBJECT_TYPE = IMAGE
  START_TIME = N/A
  STOP_TIME = N/A
  NODE_NAME = IMAGING
  CURATING_NODE_ID = IMAGING
  DATA_ENGINEER_ID = N/A
  TARGET_NAME = MARS
  TARGET_TYPE = PLANET
  MISSION_NAME = VIKING
  INSTRUMENT_HOST_NAME = VIKING ORBITER 1, VIKING ORBITER 2
  INSTRUMENT_HOST_TYPE = SPACECRAFT
  INSTRUMENT_NAME = VISUAL IMAGING SUBSYSTEM - CAMERA A, VISUAL IMAGING SUBSYSTEM - CAMERA B
  INSTRUMENT_TYPE = VIDICON CAMERA
  NSSDC_DATA_SET_ID = 75-075A-01F, 75-083A-01C
  VOLUME_ID = VO_2001, VO_2002
  VOLUME_LINK = ftp://pdsimage.wr.usgs.gov/cdroms/vo_2001/, ftp://pdsimage.wr.usgs.gov/cdroms/vo_2002/
  KEYVALUES = CAMERA, DIGITAL, DIM, IMAGING, MARS, MODEL, ORBITER, PLANET, SPACECRAFT, SUBSYSTEM, TABLE, VIDICON, VIKING, VIS, VISUAL, VO1, VO2, VO_2001, VO_2002
  LABEL_REVISION_NOTE = 2000-09-19 CCN SCN LNAYDN
END_OBJECT
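A keyword/value label like the one above lends itself to a simple attribute search. The following Python sketch assumes a simplified one-pair-per-line `KEYWORD = value` form; it is not a full ODL parser, and the names `parse_label`, `matches` and `SAMPLE` are hypothetical.

```python
def parse_label(text: str) -> dict[str, str]:
    """Parse simplified 'KEYWORD = value' lines into a dict (not full ODL)."""
    label = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("END_OBJECT"):
            continue
        keyword, sep, value = line.partition("=")
        if sep:  # skip lines that carry no keyword/value pair
            label[keyword.strip()] = value.strip()
    return label


def matches(label: dict[str, str], **criteria: str) -> bool:
    """True when every criterion occurs (case-insensitively) in the label value."""
    return all(
        want.upper() in label.get(keyword, "").upper()
        for keyword, want in criteria.items()
    )


# Abbreviated sample label; values taken from the slide, layout assumed.
SAMPLE = """\
OBJECT = DATA_SET
DATA_SET_ID = VO1/VO2-M-VIS-5-DIM-V1.0
TARGET_NAME = MARS
MISSION_NAME = VIKING
DATA_OBJECT_TYPE = IMAGE
END_OBJECT
"""

label = parse_label(SAMPLE)
print(matches(label, TARGET_NAME="mars", MISSION_NAME="viking"))
```

A search engine over a label database would apply `matches` to each stored label and list the links of those that pass.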
10
DIS Links to Mars Archives
Mars Odyssey
Mars Global Surveyor
Viking
Ancillary Files
Distributed Archives
11
Lessons Learned
  • Identified metadata as key to the solution
  • Identifies and describes data and resources
  • Provides search attributes
  • Helps determine whether query can be resolved by
    resource
  • However
  • ODL is not a universal interchange language
  • Resource heterogeneity is still visible
  • Navigation and access are limited to HTTP links
  • System interoperability is not supported

12
OODT, The Next Generation: XML/Java/CORBA
  • Started in 1998 as a research and development
    task funded at JPL by the Office of Space Science
    to address:
  • Application of Information Technology to Space
    Science
  • Research methods for interoperability, knowledge
    management and knowledge discovery
  • Develop software frameworks for data management
    to reuse software, manage risk, reduce cost and
    leverage IT experience
  • OODT initial focus:
  • Data archiving: Manage heterogeneous data
    products and resources in a distributed,
    metadata-driven environment
  • Data location: Locate data products across
    multiple archives, catalogs and data systems
  • Data retrieval: Retrieve diverse data products
    from distributed data sources and integrate

13
OODT System Design Goals
  • Encapsulate individual data systems to hide
    uniqueness
  • Provide data system location independence
  • Require that communication between distributed
    systems use metadata
  • Define a standard data dictionary structure and
    approach for describing systems and resources
  • Provide a scalable and extensible solution
  • Provide a mechanism for data product exchange
  • Allow systems using different data dictionaries
    and metadata implementations to be integrated
  • Define an architecture that can leverage off of
    open standard approaches

14
OODT Technology Focus
  • Focus on building middleware components
  • Focus on creating metadata profiles about data
    system resources
  • Provide sufficient layers of abstraction in the
    architecture to isolate technology choices from
    architecture choices
  • XML (Extensible Markup Language) for the data
    content
  • CORBA (Common Object Request Broker Architecture)
    for the data transport
  • Research technologies for implementing a
    distributed data architecture
  • Distributed Object Computing (CORBA, DCOM, etc)
  • Database Technology (RDBMS, ODBMS)
  • Data Access Technologies (O/JDBC, XML, etc)
  • Directory Implementations (LDAP)
  • Data Interchange (XML)
  • Communication Technologies (Web/HTTP, MOM, RPC,
    etc)

15
Motivation for Middleware
  • Middleware allows for the encapsulation of
    individual data systems
  • Hide uniqueness by introducing the data
    architecture layer
  • Allows for data abstraction
  • Provide common client interfaces to heterogeneous
    systems
  • Manage risk associated with technical decisions.
    Systems evolve independent of the clients.
  • Enable reuse and promote standards
  • Allow for incompatible systems to be tied
    together by introducing a middleware layer
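The encapsulation idea above can be illustrated with a small adapter sketch in Python. Everything here is hypothetical: the `DataSystem` interface, the two wrapper classes and the sample records are invented for illustration and are not OODT classes.

```python
from abc import ABC, abstractmethod


class DataSystem(ABC):
    """Common interface the middleware layer shows to every client (hypothetical)."""

    @abstractmethod
    def find(self, keyword: str, value: str) -> list[dict]:
        ...


class TableSystem(DataSystem):
    """Wraps a system that stores records as rows (stand-in for an RDBMS)."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def find(self, keyword, value):
        return [r for r in self._rows if r.get(keyword) == value]


class FlatFileSystem(DataSystem):
    """Wraps a system that stores 'name|target' text lines."""

    def __init__(self, lines: list[str]):
        self._lines = lines

    def find(self, keyword, value):
        records = [dict(zip(("name", "target"), ln.split("|"))) for ln in self._lines]
        return [r for r in records if r.get(keyword) == value]


def search_all(systems: list[DataSystem], keyword: str, value: str) -> list[dict]:
    """Client code sees one interface, however each system stores its data."""
    results = []
    for system in systems:
        results.extend(system.find(keyword, value))
    return results


systems = [
    TableSystem([{"name": "img-001", "target": "MARS"}]),
    FlatFileSystem(["spec-007|MARS", "img-002|VENUS"]),
]
print(search_all(systems, "target", "MARS"))
```

Replacing one backend only touches its wrapper class, so clients are insulated from that change.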

16
Middleware Data Encapsulation
Applications
User Interface
Middleware
Data
Middleware can tie application, data, and user
interfaces together and hide the unique interfaces
17
OODT Component Framework
  • Java-based software middleware component
    architecture that provides a software framework
    for archiving, search and retrieval, and data
    product exchange
  • Archive Component
  • Provides centralized data archiving and
    cataloging of data products
  • Distributed
  • A Search and Retrieval Component
  • Manage metadata associated with resources
  • Locate resources across geographically
    distributed data systems
  • Distributed
  • Data Product Exchange Component
  • Support interchange (data sharing) of data
    products
  • Support heterogeneous implementations and systems
  • Distributed
  • Query Service Component
  • Ties search and product exchange services
    together
  • Distributed

18
Component Framework for OODT
Archive Client
OBJECT ORIENTED DATA TECHNOLOGY FRAMEWORK
19
Data Archiving Goals
  • Provide basic functions
  • Transport and management of data sets and
    products
  • Identification of products using metadata
  • Event driven processing associated with data sets
  • Ability to add, get and delete products from the
    archive
  • Management of large, complex datasets and
    products
  • Provide extensible data management approach
  • Database is dynamically generated and extended
    based on metadata content
  • Separate the metadata and the archive. Metadata
    can be exported to other components of the OODT
    architecture.
  • Provide an n-tier, client/server architecture for
    archiving at remote locations
  • Build a service that is accessible via clients
    using common programming languages (Java, C,
    etc)
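A minimal Python sketch of the basic archive functions listed above (add, get, delete, and identification by metadata), assuming an in-memory store. The `Archive` class and its method names are hypothetical; the sketch only illustrates keeping the metadata catalog separate from the product bytes and letting the catalog grow with the metadata it receives.

```python
class Archive:
    """In-memory stand-in for an archive; the catalog is kept apart from the payloads."""

    def __init__(self):
        self._store: dict[str, bytes] = {}    # product id -> product bytes
        self._catalog: dict[str, dict] = {}   # product id -> metadata
        self.columns: set[str] = set()        # metadata keys seen so far

    def add(self, product_id: str, data: bytes, metadata: dict) -> None:
        self._store[product_id] = data
        self._catalog[product_id] = dict(metadata)
        self.columns.update(metadata)         # catalog "schema" grows with the metadata

    def get(self, product_id: str) -> bytes:
        return self._store[product_id]

    def delete(self, product_id: str) -> None:
        del self._store[product_id]
        del self._catalog[product_id]

    def identify(self, **criteria) -> list[str]:
        """Return ids of products whose metadata matches every criterion."""
        return sorted(
            pid for pid, md in self._catalog.items()
            if all(md.get(k) == v for k, v in criteria.items())
        )

    def export_metadata(self) -> list[dict]:
        """Metadata can leave the archive without the product bytes."""
        return [{"id": pid, **md} for pid, md in sorted(self._catalog.items())]


archive = Archive()
archive.add("vo_2001", b"...", {"TARGET_NAME": "MARS", "MISSION_NAME": "VIKING"})
archive.add("mola_01", b"...", {"TARGET_NAME": "MARS", "INSTRUMENT_NAME": "MOLA"})
print(archive.identify(TARGET_NAME="MARS"))
```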

20
Archive Management
21
OODT Metadata Development
  • Develop methods for managing the semantics of
    data that are shared within and between domains
  • Terminology Base: Domain-specific name space
  • Data Dictionary: Inventory of domain terms with
    definitions and other distinguishing attributes.
  • Ontology: A set of concepts, their relationships
    and constraints, all within the scope of a
    domain.
  • XML for metadata registry and communication
  • Several I.T. efforts have shown the criticality
    of metadata in enabling data sharing and system
    interoperability
  • Use standards where appropriate
  • ISO/IEC 11179: A framework for the Specification
    and Standardization of Data Elements
  • Dublin Core: A metadata element set intended to
    facilitate discovery of electronic resources.

22
Why XML?
  • XML doesn't provide a silver bullet, but it
    does allow us to refocus the problem on metadata
  • Metadata is a key to interoperability
    (http://www.cio.gov/docs/metadata.htm)
  • XML is language neutral
  • Allows the designer to separate the data and the
    transport (re: CORBA vs. XML-over-CORBA)
  • Transport mechanism and data are not tied
    together
  • Could be XML/HTTP
  • Simpler deployments
  • Simpler interfaces
  • Allows technologies to grow and change
    independently
  • Real value of XML is the process of describing
    the data

23
What is a profile?
  • A profile is a set of metadata resource
    definitions implemented in XML for data products
    residing in one or more distributed systems
  • Profile servers are CORBA servers that manage XML
    profile definitions
  • Profile servers communicate via XML-over-CORBA
  • Developed Java classes that map XML profiles to a
    Java object

Profile Distributed Node Architecture (diagram; connections labeled XML/CORBA)
24
Solutions to Data Search
  • Build metadata profiles that describe data
    system resources
  • Encapsulate individual data systems resources
    (Hide uniqueness)
  • Communicate using metadata (Provide metadata with
    data)
  • Enable interoperability based on metadata
    compatibility
  • Refocus problem on metadata development
  • Provide a core framework of software components
    to interconnect distributed data systems
  • Define profiles using standard industry
    approaches
  • Use XML to describe profiles
  • ISO/IEC 11179: A framework for the Specification
    and Standardization of Data Elements
  • Dublin Core: A metadata element set intended to
    facilitate discovery of electronic resources.

25
Resource Profile Classifications
Resource Classes
Metadata
Data
Application
System
Resource Context (Discipline)
Space Science
Medicine
26
Profile DTD
<!ELEMENT profiles (profile)>
<!ELEMENT profile (profAttributes, resAttributes, profElement)>
<!ELEMENT profAttributes (profId, profVersion, profTitle, profDesc,
    profType, profStatusId, profSecurityType, profParentId, profChildId,
    profRegAuthority, profRevisionNote, profDataDictId)>
<!ELEMENT resAttributes (Identifier, Title, Format, Description,
    Creator, Subject, Publisher, Contributor, Date, Type, Source,
    Language, Relation, Coverage, Rights, resContext, resAggregation,
    resClass, resLocation)>
<!ELEMENT profElement (elemId, elemName, elemDesc, elemType,
    elemUnit, elemEnumFlag, (elemValue (elemMinValue, elemMaxValue)),
    elemSynonym, elemObligation, elemMaxOccurrence, elemComment)>
27
XML Profile Example (1 of 2)
<profile>
  <profAttributes>
    <profId>OODT_PDS_DATA_SET_INV_82</profId>
    <profDataDictId>OODT_PDS_DATA_SET_DD_V1.0</profDataDictId>
  </profAttributes>
  <resAttributes>
    <Identifier>VO1/VO2-M-VIS-5-DIM-V1.0</Identifier>
    <Title>VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL </Title>
    <Format>text/html</Format>
    <Language>en</Language>
    <resContext>PDS</resContext>
    <resAggregation>dataSet</resAggregation>
    <resClass>data.dataSet</resClass>
    <resLocation>http://pds.jpl.nasa.gov/cgi-bin/pdsserv.pl?</resLocation>
  </resAttributes>
28
XML Profile Example (2 of 2)
  <profElement>
    <elemId>ARCHIVE_STATUS</elemId>
    <elemName>ARCHIVE_STATUS</elemName>
    <elemType>ENUMERATION</elemType>
    <elemEnumFlag>T</elemEnumFlag>
    <elemValue>ARCHIVED</elemValue>
  </profElement>
  <profElement>
    <elemId>TARGET_NAME</elemId>
    <elemName>TARGET_NAME</elemName>
    <elemType>ENUMERATION</elemType>
    <elemEnumFlag>T</elemEnumFlag>
    <elemValue>MARS</elemValue>
  </profElement>
</profile>
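The deck mentions Java classes that map XML profiles to a Java object. As a rough illustration of that mapping, here is a Python sketch using the standard library's `xml.etree.ElementTree`. The `Profile` dataclass, the `load_profile` function and the abbreviated `PROFILE_XML` sample are hypothetical and cover only a few of the profile's elements.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Abbreviated profile; element names follow the example, content is trimmed.
PROFILE_XML = """
<profile>
  <profAttributes>
    <profId>OODT_PDS_DATA_SET_INV_82</profId>
    <profDataDictId>OODT_PDS_DATA_SET_DD_V1.0</profDataDictId>
  </profAttributes>
  <resAttributes>
    <Identifier>VO1/VO2-M-VIS-5-DIM-V1.0</Identifier>
    <resClass>data.dataSet</resClass>
  </resAttributes>
  <profElement>
    <elemName>ARCHIVE_STATUS</elemName>
    <elemValue>ARCHIVED</elemValue>
  </profElement>
  <profElement>
    <elemName>TARGET_NAME</elemName>
    <elemValue>MARS</elemValue>
  </profElement>
</profile>
"""


@dataclass
class Profile:
    prof_id: str
    identifier: str
    res_class: str
    elements: dict[str, list[str]] = field(default_factory=dict)


def load_profile(xml_text: str) -> Profile:
    """Map one <profile> document to a Profile object."""
    root = ET.fromstring(xml_text)
    elements = {}
    for elem in root.findall("profElement"):
        name = elem.findtext("elemName")
        elements[name] = [v.text for v in elem.findall("elemValue")]
    return Profile(
        prof_id=root.findtext("profAttributes/profId"),
        identifier=root.findtext("resAttributes/Identifier"),
        res_class=root.findtext("resAttributes/resClass"),
        elements=elements,
    )


profile = load_profile(PROFILE_XML)
print(profile.identifier, profile.elements["TARGET_NAME"])
```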
29
OODT Profile Service Component
  • Profiles are managed by profile servers
  • Profile servers are written in Java
  • OODT currently has three different registry
    methods for managing profiles which are
    configurable at run time
  • Flat File
  • RDBMS via JDBC (Oracle)
  • LDAP (OpenLDAP)
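One way to make the registry method configurable at run time is a common interface with interchangeable backends. The Python sketch below is an assumption about how such a design could look, not the OODT implementation: the class names are hypothetical, SQLite stands in for the Oracle database, a JSON file stands in for the flat file, and the LDAP backend is omitted because the Python standard library has no LDAP client.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class ProfileRegistry(ABC):
    """Storage interface shared by all registry backends (hypothetical)."""

    @abstractmethod
    def put(self, prof_id: str, xml_text: str) -> None: ...

    @abstractmethod
    def get(self, prof_id: str) -> str: ...


class FlatFileRegistry(ProfileRegistry):
    """Keeps all profiles in a single JSON file."""

    def __init__(self, path: str):
        self._path = path

    def _load(self) -> dict:
        try:
            with open(self._path, encoding="utf-8") as fh:
                return json.load(fh)
        except FileNotFoundError:
            return {}

    def put(self, prof_id, xml_text):
        data = self._load()
        data[prof_id] = xml_text
        with open(self._path, "w", encoding="utf-8") as fh:
            json.dump(data, fh)

    def get(self, prof_id):
        return self._load()[prof_id]


class SqlRegistry(ProfileRegistry):
    """Relational backend; SQLite is a stand-in for a server database."""

    def __init__(self, database: str = ":memory:"):
        self._db = sqlite3.connect(database)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS profiles (id TEXT PRIMARY KEY, xml TEXT)"
        )

    def put(self, prof_id, xml_text):
        self._db.execute(
            "INSERT OR REPLACE INTO profiles VALUES (?, ?)", (prof_id, xml_text)
        )
        self._db.commit()

    def get(self, prof_id):
        row = self._db.execute(
            "SELECT xml FROM profiles WHERE id = ?", (prof_id,)
        ).fetchone()
        if row is None:
            raise KeyError(prof_id)
        return row[0]


BACKENDS = {"flatfile": FlatFileRegistry, "rdbms": SqlRegistry}


def open_registry(kind: str, **options) -> ProfileRegistry:
    """Choose the backend from a configuration value at run time."""
    return BACKENDS[kind](**options)


registry = open_registry("rdbms")
registry.put("OODT_PDS_DATA_SET_INV_82", "<profile/>")
print(registry.get("OODT_PDS_DATA_SET_INV_82"))
```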

30
Data Product Exchange
  • Exchanging products requires access to each data
    system (RDBMS/OODBMS, Flat file, etc) which is
    difficult
  • Different vendor products
  • Non-standard interfaces
  • Different implementations (data model, home
    grown, COTS,etc)
  • Representations of data are different
  • Heterogeneous Platforms
  • Heterogeneous O/S
  • etc

31
Solutions to Data Product Exchange
  • Extend framework to support common access to
    distributed data systems by creating a Product
    Service Component
  • Product Servers - Middleware that negotiates the
    interfaces between the data system
    implementations
  • Design the component to leverage off of
  • Consistent metadata and data dictionary
  • Consistent data interchange methods and protocols
  • Provide data abstraction
  • Data and information hiding
  • Location hiding and independence
  • Provide a standard language for communication
  • Use the OODT XML Query language for data
    interchange
  • Support rich query description including data
    elements and constraints
  • Support rich query results that include results
    in many different formats

32
OODT Product Server Component
  • The Product Server plugs into the OODT framework
    and manages the handshake between the data
    system and the OODT system.
  • Extensible by dynamically loading objects at
    runtime which are specific to the data system
    model
  • Queries and results are passed using an OODT XML
    Query structure
  • Encapsulates one or more data sources for
    standardized access
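The extensibility point described above (data-system-specific objects loaded at runtime) can be sketched in Python as a registry of handler classes selected by name from configuration. This is an illustrative assumption, not the OODT design: `HANDLERS`, `handler`, `FileSystemHandler`, `DatabaseHandler` and `ProductServer` are hypothetical names, and a name lookup stands in for true dynamic class loading (which in Python could use `importlib.import_module`).

```python
HANDLERS = {}


def handler(name):
    """Register a data-system-specific handler class under a configurable name."""
    def register(cls):
        HANDLERS[name] = cls
        return cls
    return register


@handler("filesystem")
class FileSystemHandler:
    """Answers queries from a mapping of file name -> metadata."""

    def __init__(self, listing: dict):
        self._listing = listing

    def query(self, keyword, value):
        return [name for name, md in self._listing.items() if md.get(keyword) == value]


@handler("database")
class DatabaseHandler:
    """Answers queries from a list of row dicts."""

    def __init__(self, rows: list):
        self._rows = rows

    def query(self, keyword, value):
        return [row["id"] for row in self._rows if row.get(keyword) == value]


class ProductServer:
    """Generic server: all data-system knowledge lives in the handlers."""

    def __init__(self, config):
        # config is a list of (handler name, constructor argument) pairs
        self._sources = [HANDLERS[name](arg) for name, arg in config]

    def handle(self, keyword, value):
        results = []
        for source in self._sources:
            results.extend(source.query(keyword, value))
        return results


server = ProductServer([
    ("filesystem", {"vo_2001.img": {"TARGET_NAME": "MARS"}}),
    ("database", [{"id": "mola_01", "TARGET_NAME": "MARS"},
                  {"id": "mag_07", "TARGET_NAME": "VENUS"}]),
])
print(server.handle("TARGET_NAME", "MARS"))
```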

Generic Server
Implementation Class
File Sys
Query XML
Result XML
Database
Product Server
33
OODT Query Service Component
  • Manages all queries for the identification and
    retrieval of data products
  • All components are identified by a unique name
    and managed in a CORBA name server
  • Queries to multiple profile or product servers
    occur concurrently
  • Queries are described using the OODT XML Query
    structure
  • Ties together the profile and product server
    components for the OODT framework
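Concurrent querying of several named servers can be sketched in Python with a thread pool. The dictionary of named callables below is a hypothetical stand-in for components registered in a name server, and the `KEYWORD = value` query string is a simplification of the XML query structure.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(servers: dict, query: str) -> dict:
    """Send one query to every named server at once; collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in servers.items()}
        return {name: future.result() for name, future in futures.items()}


def make_server(catalog: dict):
    """Build a toy server that answers 'KEYWORD = value' queries from a catalog."""
    def serve(query: str):
        keyword, _, value = query.partition("=")
        return [
            pid for pid, md in catalog.items()
            if md.get(keyword.strip()) == value.strip()
        ]
    return serve


# Server names follow the naming style in the deck; the catalogs are invented.
servers = {
    "jpl.pds": make_server({"vo_2001": {"TARGET_NAME": "MARS"}}),
    "jpl.pds.mola": make_server({
        "mola_01": {"TARGET_NAME": "MARS"},
        "mag_07": {"TARGET_NAME": "VENUS"},
    }),
}
print(query_all(servers, "TARGET_NAME = MARS"))
```

Because each call runs in its own worker thread, a slow server delays only its own entry rather than the submission of the other queries.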

34
XML Query Example (1 of 2)
<query>
  <queryAttributes>
    <queryId>OODT_XML_QUERY_V0.1</queryId>
    <queryTitle>OODT_XML_QUERY - PDS DIS Query Example</queryTitle>
    <queryDesc>PDS DIS Query for TARGET_NAME = MARS</queryDesc>
    <queryType>QUERY</queryType>
    <queryStatusId>ACTIVE</queryStatusId>
    <querySecurityType>UNKNOWN</querySecurityType>
    <queryRevisionNote>2000-05-12 JSH V1.2 Updated for new prof.dtd</queryRevisionNote>
    <queryDataDictId>OODT_PDS_DATA_SET_DD_V1.0</queryDataDictId>
  </queryAttributes>
  <queryResultModeId>ATTRIBUTE</queryResultModeId>
  <queryPropogationType>BROADCAST</queryPropogationType>
  <queryPropogationLevels>N/A</queryPropogationLevels>
  <queryMaxResults>100</queryMaxResults>
  <queryResults>0</queryResults>
  <queryKWQString>TARGET_NAME = MARS</queryKWQString>
35
XML Query Example (2 of 2)
  <querySelectSet></querySelectSet>
  <queryFromSet></queryFromSet>
  <queryWhereSet>
    <queryElement>
      <tokenRole>elemName</tokenRole>
      <tokenValue>TARGET_NAME</tokenValue>
    </queryElement>
    <queryElement>
      <tokenRole>LITERAL</tokenRole>
      <tokenValue>MARS</tokenValue>
    </queryElement>
    <queryElement>
      <tokenRole>RELOP</tokenRole>
      <tokenValue>EQ</tokenValue>
    </queryElement>
  </queryWhereSet>
  <queryResultSet></queryResultSet>
</query>
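In the example, the where-set lists the element name, then the literal, then the relational operator, which reads like postfix (operand, operand, operator) order. Assuming that reading is correct, a where-set could be evaluated against a record with a small stack machine. The Python sketch below is hypothetical: only the `EQ` operator appears in the source, and `NE` is added purely for illustration.

```python
import xml.etree.ElementTree as ET

WHERE_XML = """
<queryWhereSet>
  <queryElement><tokenRole>elemName</tokenRole><tokenValue>TARGET_NAME</tokenValue></queryElement>
  <queryElement><tokenRole>LITERAL</tokenRole><tokenValue>MARS</tokenValue></queryElement>
  <queryElement><tokenRole>RELOP</tokenRole><tokenValue>EQ</tokenValue></queryElement>
</queryWhereSet>
"""

# EQ comes from the example; NE is an assumed extra operator.
RELOPS = {
    "EQ": lambda left, right: left == right,
    "NE": lambda left, right: left != right,
}


def evaluate(where_xml: str, record: dict) -> bool:
    """Evaluate a postfix where-set against one record's metadata."""
    stack = []
    for element in ET.fromstring(where_xml).findall("queryElement"):
        role = element.findtext("tokenRole")
        value = element.findtext("tokenValue")
        if role == "elemName":
            stack.append(record.get(value))   # push the record's value for that element
        elif role == "LITERAL":
            stack.append(value)               # push the constant
        elif role == "RELOP":
            right = stack.pop()
            left = stack.pop()
            stack.append(RELOPS[value](left, right))
        else:
            raise ValueError(f"unsupported token role: {role}")
    if len(stack) != 1:
        raise ValueError("where-set did not reduce to a single result")
    return bool(stack[0])


print(evaluate(WHERE_XML, {"TARGET_NAME": "MARS"}))
```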
36
Insertion of OODT into PDS
OODT Middleware
Distributed Archives
37
OODT Query Flow Example
Search Web Page
XMLQuery/IIOP (no results)
XMLQuery/IIOP (no results)
User query
Query Server
Profile Server: jpl
Query Client
Web server
search.jsp
Profile DB
XMLQuery/IIOP (profiles of resources to handle query)
XMLQuery/IIOP (profiles or data results as requested)
XSL (profiles or data products formatted)
Product Server: jpl.pti
PTI Repository
XMLQuery/IIOP (product search)
Product Server: jpl.pds
XMLQuery/IIOP (data results)
PDS DVD Jukebox
Product Server: jpl.pds.mola
PDS MOLA Oracle DB
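The flow in this diagram appears to have two steps: profiles first identify which resources can handle a query, and the query is then forwarded to the matching product servers. The Python sketch below illustrates that routing under this assumption. The server names come from the diagram; the profile contents, product ids and function names are invented for illustration.

```python
def profile_lookup(profiles, keyword, value):
    """Step 1: names of product servers whose profile lists the requested value."""
    return [p["server"] for p in profiles if value in p["elements"].get(keyword, [])]


def run_query(profiles, product_servers, keyword, value):
    """Step 2: forward the query only to the servers the profiles point to."""
    return {
        name: product_servers[name](keyword, value)
        for name in profile_lookup(profiles, keyword, value)
    }


def make_product_server(products):
    """Toy product server answering equality queries over its own metadata."""
    def serve(keyword, value):
        return sorted(pid for pid, md in products.items() if md.get(keyword) == value)
    return serve


PROFILES = [
    {"server": "jpl.pds", "elements": {"TARGET_NAME": ["MARS"]}},
    {"server": "jpl.pds.mola", "elements": {"TARGET_NAME": ["MARS"]}},
    {"server": "jpl.pti", "elements": {"TARGET_NAME": ["N/A"]}},
]

PRODUCT_SERVERS = {
    "jpl.pds": make_product_server({"vo_2001": {"TARGET_NAME": "MARS"}}),
    "jpl.pds.mola": make_product_server({"mola_01": {"TARGET_NAME": "MARS"}}),
    "jpl.pti": make_product_server({"doc_17": {"TARGET_NAME": "N/A"}}),
}

print(run_query(PROFILES, PRODUCT_SERVERS, "TARGET_NAME", "MARS"))
```

Servers whose profiles cannot satisfy the query are never contacted, which keeps irrelevant data systems out of the request path.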
38
MGS Coverage Plot
39
Query Results
40
Efforts in Bioinformatics
  • In September 2000, the Office of Science Policy at
    NIH and JPL entered into a joint inter-agency
    agreement to:
  • Explore knowledge environments to support the
    management and exchange of biomarkers and
    biomarker databases
  • Insert technologies developed to support
    interoperability between biomedical databases
  • Develop methodologies that are mutually beneficial
    to both biomedical and space science data systems
  • A series of meetings led to an initial effort
    that focused on interoperability within the Early
    Detection Research Network, a network within the
    National Cancer Institute (NCI) consisting of 18
    laboratories and hospitals
  • Two initial epidemiological centers were selected
    for locating cancer specimens
  • Moffitt Cancer Research Center
  • University of San Antonio, Texas
  • A mock-up data architecture for EDRN was
    presented on January 21st at the 3rd EDRN
    Steering Committee meeting in San Antonio, Texas

41
EDRN Mock Query Interface
42
EDRN Knowledge Architecture: Mockup Implementation
at JPL
OODT Middleware Hosted at JPL
In: Query, Out: Identified Resources
In: Query, Out: Data Products
In: Query, Out: Data Products
EDRN Mock Databases Hosted at JPL
43
Component Based Mars Distributed Archive
Architecture: 2001 Mars Odyssey Orbiter Example
DVD Volume
Physical Volume
OODT Middleware
Virtual Volume
Virtual Volume
In: Query, Out: Identified Resources
In: Query, Out: Data Products
Data Products
Data Product and Resource Descriptions
MARIE EDR, RDR Archive PDS PPI
GRS RDR Archive PDS GEO
Documents and Ancillary Files PDS CN
GRS EDR Archive UofA
THEMIS EDR, RDR Archive ASU
44
More Information and References
  • OODT Papers (http://oodt.jpl.nasa.gov/doc/papers)
  • "Science Search and Retrieval using XML" by OODT
    Team. Presented at the Second National Conference on
    Scientific and Technical Data, National Academy
    of Sciences, Washington D.C., March 2000.
  • "A Distributed Component Framework for Science
    Data Product Interoperability" by OODT Team.
    Presented at the 17th Annual International CODATA
    Conference, Baveno, Italy, October 2000.
  • OODT Presentations (http://oodt.jpl.nasa.gov/doc/presentations)
  • JPL Enterprise Data Architecture White Paper
    (e-mail Dan.Crichton_at_jpl.nasa.gov)
  • Planetary Data System
  • http://pds.jpl.nasa.gov
  • Dublin Core
  • http://purl.oclc.org/dc
  • Extensible Markup Language
  • http://www.w3c.org/XML
  • ISO/IEC 11179 Specification and Standardization
    of Data Elements
  • Federal CIO Statement on Metadata
  • http://www.cio.gov/docs/metadata.htm

45
Backup Slides
46
OODT Prototype Environment
  • XML parser: Apache/IBM Xerces 1.0.3 (http://xml.apache.org)
  • XSLT: Apache Xalan 1.0.0 (http://xml.apache.org)
  • CORBA: Orbacus 4.0.3 (http://www.ooc.com)
  • Database: Oracle 8.1.5 (http://www.oracle.com)
  • LDAP server: OpenLDAP 1.2.11 (http://www.openldap.org)
  • Development language: Java 1.2 (http://java.sun.com)
  • Web server: iPlanet Fasttrack 4.1 (http://www.iplanet.com)
  • Server operating system: RedHat Linux 6.2 (http://www.redhat.com)
  • Version control system: CVS 1.10.5 (http://www.cvshome.org)

47
Profile Server Node
48
The Use of Metadata in the PDS
  • Locate and Use Data: use context to find data;
    use context to understand data
  • System Interoperability: use context to share
    data
  • Correlative Science: use context to find new
    relationships between data
Diagram labels: Planetary Science Model, Mission, Target, Data Set Collection, Spacecraft, Data Set, Instrument, Spectrum, Time Series, Image, Document, Model Attributes, Label, Data