Semantic Interoperability Between Distributed Science Data Registries

1
Semantic Interoperability Between Distributed
Science Data Registries
  • Open Forum 2003 on Metadata Registries
  • 4:00 - 4:45 PM
  • January 2003

J. Steve Hughes, Jet Propulsion Laboratory
2
Topics
  • Introduction and Background
  • Challenges
  • Solution
  • Implementation
  • Benefits
  • Next Steps

3
PDS Overview
Key PDS Products and Services
  • High quality peer-reviewed data archives
  • Data distribution to planetary community
  • Archiving expertise to planetary missions
  • Scientific expertise and support for users
  • Value-added aggregated data products
  • Education and outreach data products and
    services
Node structure provides focus on key disciplines
4
PDS Role for Planetary Science Data
Missions: 2003 Mars Exploration Rovers, Muses-C, Mars Scouts,
Mars Pathfinder, NEAR, Galileo, Mars Express, 2001 Mars Odyssey,
Messenger, Voyager, Deep Impact, Rosetta, Ulysses, Lunar Prospector,
Deep Space 1, MGS, Cassini, Stardust, DAWN, MRO
Planetary Data System
Data products
Scientists, the public, mission planners, educators
5
Recently Archived PDS Products
  • The Environs of NEAR Shoemaker's Landing Site on
    Eros (Catalog PIA03141, 2/07/01)
  • Galileo SSI Global image of Io (true color)
    (Catalog PIA02308, 8/27/99)
  • MGS Pre-Mapping Phase Pilot DVD Set
  • MGS Martian North Polar Cap on September 12,
    1998 (Catalog PIA01471)
  • Dark Dunes Over-riding Bright Dunes (MGS MOC
    Release No. MOC2-201, 1/31/2000)
  • Clementine Observes the Moon, Solar Corona, and
    Venus (Catalog PIA00434, 11/04/97)
6
Data Archiving Life Cycle
  • Planning Phase
  • Data archiving requirements written into mission
    Announcement of Opportunity
  • Pre-proposal briefing on PDS data archiving
    requirements given to potential proposers
  • Proposal data archiving section reviewed by PDS
  • PDS orientation to flight project staff
  • Data archiving working groups formed
  • Definition and Design Phase
  • Project Data Management and Archive Plans define
    data to be archived
  • Data Product and Volume Organization Software
    Interface Specifications detail the data and
    volume structure
  • Preliminary metadata labels loaded into PDS
    catalog
  • Production Phase
  • Raw and processed data products, labels
    (metadata) and documentation produced
  • Preliminary and quick-look data made accessible
    via Project and PDS web pages
  • Data archive products validated and
    peer-reviewed; liens corrected
  • Distribution and Maintenance Phase
  • Final data products made available on-line
  • PDS adds the data to the archive
  • Physical copies sent to NSSDC
  • PDS provides data, documentation and science
    expertise to users
  • Data archive maintained via periodic media
    refreshes, addition of new / updated data products

7
Previous PDS Archive Production and Distribution
Process
  • PDS receives data from flight projects for
    archive and distribution
  • PDS helps planetary missions to create high
    quality data archives and to release them in a
    timely manner
  • PDS validates data for compliance to PDS
    standards
  • PDS assembles, publishes, distributes, and
    maintains peer-reviewed, documented planetary
    data sets
  • PDS archive data also available on-line at PDS
    discipline nodes
  • Problem: Planetary missions are producing larger
    data volumes
  • CD-ROM distribution is too expensive
  • Even if DVDs replace CDs, there will still be
    hundreds, even thousands, of volumes
  • Difficult for users to store; difficult to locate
    data of interest
  • A new paradigm for archive and distribution is
    needed

8
Current Challenges
  • More missions
  • Smaller, more frequent missions; more orbiters
  • New programs (Mars Exploration, Discovery, New
    Frontiers)
  • Inadequate mission archiving budgets
  • New PIs with little experience in data archiving
  • Larger data volumes
  • Bigger payloads (Cassini-18 instruments,
    Galileo-16, Rosetta-20)
  • More complex instruments/better resolution/higher
    data rates
  • THEMIS will return 5TB of data (2 Magellans)
  • Mars 05: 300TB (100 Magellans!)
  • Increased user expectations
  • Demand for instant internet access and modern
    interfaces
  • Need for sophisticated methods to access larger
    data volumes and to locate data of interest

9
Archive Growth with Mars Exploration Program
10
PDS Data Distribution New Paradigm
  • Online access is the primary method for data
    distribution, with improved tools to support
    users
  • Find out what data exist
  • Select data of interest
  • Retrieve data
  • Correlate data across instruments, missions, and
    nodes
  • Data are publicly available as soon as possible
  • Copies on physical media are available on demand
    using limited resources
  • Special collections containing data of high
    interest can be published on physical media from
    time to time
  • Copies of complete data sets on cost-effective
    physical media for long term archives at PDS and
    NSSDC (minimum 3 copies) are still required from
    the flight projects

11
The Data
  • Variety and Volume
  • 5TB of data from 30 years of exploration
  • 700 Data Sets (hundreds of product types)
  • 1700 Archive Volumes CD/DVDs
  • Camera, Spectrometer, LECP, SAR, RS,
  • Images, Time_Series, Spectra, Qubes, Tables,
  • Binary and ASCII
  • Spacecraft and Earth Based
  • Many data representations
  • Geographically distributed
  • Multi-disciplinary
  • Maintain original bits and convert as needed

12
The Data Model
Level  Group/Element Structure
-----  -------------------------------------------
1      spacecraft instrument identification group
2        instrument identification
2        instrument name
2        spacecraft identification
2        instrument type
1      instrument description
...
1      filter group
2        filter name
2        filter number
2        filter type
...
13
An Image Label
DATA_SET_ID      = "VO1/VO2-M-VIS-5-DIM-V1.0"
SPACECRAFT_NAME  = VIKING_ORBITER_1, ...
TARGET_NAME      = MARS
IMAGE_ID         = MG88S045
^IMAGE           = 2
SOURCE_IMAGE_ID  = "383B23", "421B23", ...
INSTRUMENT_NAME  = VISUAL_IMAGING_SUBSYSTEM
...
NOTE             = "MARS DIGITAL IMAGE ...
OBJECT           = IMAGE
  LINES            = 160
  LINE_SAMPLES     = 252
  SAMPLE_TYPE      = UNSIGNED_INTEGER
  SAMPLE_BITS      = 8
  SAMPLE_BIT_MASK  = 2#11111111#
  CHECKSUM         = 2636242
END_OBJECT
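
PDS labels are conventionally written as KEYWORD = VALUE lines, with OBJECT ... END_OBJECT blocks grouping related keywords. The following is a minimal sketch of reading such a label, not the PDS toolset's own parser; the function name `parse_label` is hypothetical, and multi-line or set-valued keywords are deliberately left out.

```python
def parse_label(text):
    """Parse 'KEYWORD = VALUE' lines into a nested dict.

    'OBJECT = NAME' ... 'END_OBJECT' blocks become nested dicts keyed by NAME.
    Simplified on purpose: one value per line, no continuation lines.
    """
    root = {}
    stack = [root]  # stack[-1] is the dict currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("END_OBJECT"):
            stack.pop()
            continue
        if "=" not in line:
            continue  # skip anything that is not a keyword assignment
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "OBJECT":
            child = {}
            stack[-1][value] = child
            stack.append(child)
        else:
            stack[-1][key] = value.strip('"')
    return root


# Keywords and values taken from the image label slide.
SAMPLE = """\
DATA_SET_ID = "VO1/VO2-M-VIS-5-DIM-V1.0"
TARGET_NAME = MARS
IMAGE_ID = MG88S045
OBJECT = IMAGE
  LINES = 160
  LINE_SAMPLES = 252
  SAMPLE_TYPE = UNSIGNED_INTEGER
  SAMPLE_BITS = 8
END_OBJECT
"""

label = parse_label(SAMPLE)
image = label["IMAGE"]
# Image size in bytes follows from the data-organization keywords.
image_bytes = (
    int(image["LINES"]) * int(image["LINE_SAMPLES"]) * int(image["SAMPLE_BITS"]) // 8
)
print(label["DATA_SET_ID"], image_bytes)
```

The point of the sketch is that the same label carries both catalog metadata (DATA_SET_ID, TARGET_NAME) and the structural metadata needed to read the bits.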
14
Categories of Meta-Data
  • Data Representation
  • ITEM_TYPE = VAX_INTEGER
  • File Attributes
  • RECORD_TYPE = FIXED_LENGTH
  • RECORD_BYTES = 252
  • Data Organization
  • LINES = 160
  • LINE_SAMPLES = 252
  • SAMPLE_TYPE = UNSIGNED_INTEGER
  • SAMPLE_BITS = 8
  • Catalog
  • Identification
  • DATA_SET_ID = "VO1/VO2-M-VIS-5-DIM-V1.0"
  • IMAGE_ID = MG88S045
  • INSTRUMENT_NAME = VISUAL_IMAGING_...
  • Observation Context
  • FILTER_NAME = CLEAR

15
PDS-D Implementation
  • Multi-tiered information architecture
  • Application Clients (Browsers/Interfaces)
  • Middleware (OODT)
  • Data and Metadata Servers (product server,
    profile server)
  • Data Repositories and Catalogs
  • Existing PDS subsystems
  • Data and resources remain physically distributed
    and locally managed
  • Underlying heterogeneity is encapsulated and
    hidden from the users
  • User Interfaces (Image Atlas, DITDOS, etc.)
  • Data repositories (disk farms, databases, CD
    Jukeboxes)
  • Catalogs
  • Separate data and technology architectures
  • PDS archive metadata used to its full potential
  • Evolved technology architecture deployed
  • Internet used for data distribution

16
PDS-D Architecture for Mars Odyssey
Users: Educational, Science, General Public
Distributed Clients: Data Set View, IDL, WIPE
Standard Interfaces (OODT Middleware)
Data Products and Metadata
  • MARIE: PDS PPI
  • Documents and Ancillary Files: PDS CN
  • THEMIS: ASU
  • Radio Science: PDS GEO
  • SPICE: PDS NAIF
  • GRS: PDS GEO
  • ACCEL: PDS ATMOS
17
Data Product Retrieval
18
Conceptual Architecture
[Diagram: components exchanging XMLQuery messages]
Environments: Client Environment, Central Node, Discipline Node
Components: Web I/F, Desktop I/F, QueryClient, Web Server,
Web server Plugins, Name Server, Query Server, DSCAT Profile Server,
DS Catalog, Node 1 Profile Server, Node 1 Catalog,
Node 1 Product Server, Node 1 Products
19
Information Architecture in a Nutshell
  • Profiles describe and provide location for
    anything of interest
  • Things of interest
  • Data Sets, Data Set Browsers, Data Products,
    Volumes, Websites, Web Applications, etc
  • Written as XML documents
  • Provide sufficient information to describe and
    locate resources
  • Help determine whether the resource can resolve a
    user query
  • Profile Servers serve profiles
  • Allow search and retrieval of profiles
  • Distributed for local management and scalability
  • Access profile databases
  • Static profiles stored as XML documents
  • Dynamic profiles generated from information
    stored in databases

20
System Architecture in a Nutshell
  • Product Servers serve Data Products
  • Data Products are served from data repositories
  • Input unique product identifiers
  • Output products in the requested formats
  • Middleware
  • Uses XML documents for communication
  • Common language and protocols
  • XML profiles for resource descriptions
  • XMLQUERY for queries
  • Implements message passing protocol for
    distributed processing
  • Web service encapsulation of existing resources
  • Product servers for data repositories
  • Profile servers for catalogs

21
PROFILE DTD
<!ELEMENT profiles (profile)>
<!ELEMENT profile (profAttributes, resAttributes,
    profElement)>
<!ELEMENT profAttributes (profId, profVersion?, profType,
    profStatusId, profSecurityType?, profParentId?,
    profChildId, profRegAuthority?, profRevisionNote,
    profDataDictId?)>
<!ELEMENT resAttributes (Identifier, Title?, Format,
    Description?, Creator, Subject, Publisher, Contributor,
    Date, Type, Source, Language, Relation, Coverage, Rights,
    resContext, resAggregation?, resClass, resLocation)>
<!ELEMENT profElement (elemId?, elemName, elemDesc?,
    elemType?, elemUnit?, elemEnumFlag,
    (elemValue | (elemMinValue, elemMaxValue)), elemSynonym,
    elemObligation?, elemMaxOccurrence?, elemComment?)>
22
Data Product Profile (ODYSSEY HEND)
<profile>
  <profAttributes>
    <profId>1.3.6.1.4.1.1306.2.104.10018791</profId>
    <profVersion>null</profVersion>
    <profType>profile</profType>
  </profAttributes>
  <resAttributes>
    <Identifier>ODY-M-HEND-EDR-2-V1.0H0133</Identifier>
    <Title>Data_Set_Name ODYSSEY-MARS-HEND-EDR-2-V1.0 Product_IdH0133</Title>
    <Description>null</Description>
    <resContext>NASA.PDS</resContext>
    <resAggregation>null</resAggregation>
    <resClass>data.granule</resClass>
    <resLocation>iiop://PDS.ProfServer.GEO.ODY.GRS</resLocation>
  </resAttributes>
  <profElement>
    <elemName>FILE_SPECIFICATION_NAME</elemName>
    <elemValue>/ody_2001/xxx/H0133.DAT</elemValue>
  </profElement>
</profile>
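
Because a profile is an XML document, a client can pull out the identification, location, and element values with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the HEND profile; the function name `summarize_profile` is hypothetical, and the `iiop://` form of the location string is an assumption about how the slide's value was originally written.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the ODYSSEY HEND profile from the slide.
# Assumption: the resLocation scheme separator is "://".
PROFILE_XML = """\
<profile>
  <profAttributes>
    <profId>1.3.6.1.4.1.1306.2.104.10018791</profId>
    <profType>profile</profType>
  </profAttributes>
  <resAttributes>
    <Identifier>ODY-M-HEND-EDR-2-V1.0H0133</Identifier>
    <resContext>NASA.PDS</resContext>
    <resClass>data.granule</resClass>
    <resLocation>iiop://PDS.ProfServer.GEO.ODY.GRS</resLocation>
  </resAttributes>
  <profElement>
    <elemName>FILE_SPECIFICATION_NAME</elemName>
    <elemValue>/ody_2001/xxx/H0133.DAT</elemValue>
  </profElement>
</profile>
"""


def summarize_profile(xml_text):
    """Return the resource identity, location, and element values of one profile."""
    profile = ET.fromstring(xml_text)
    res = profile.find("resAttributes")
    elements = {}
    for elem in profile.findall("profElement"):
        name = elem.findtext("elemName")
        values = [v.text for v in elem.findall("elemValue")]
        elements.setdefault(name, []).extend(values)
    return {
        "identifier": res.findtext("Identifier"),
        "class": res.findtext("resClass"),
        "location": res.findtext("resLocation"),
        "elements": elements,
    }


summary = summarize_profile(PROFILE_XML)
print(summary["identifier"], summary["elements"]["FILE_SPECIFICATION_NAME"])
```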

23
Profile Server
  • Profile Servers serve profiles
  • Allow search and retrieval of profiles
  • Retrieves from profile databases
  • Static profiles stored as XML documents
  • Dynamic profiles generated from information
    stored in databases
  • Distributed for local management and scalability

24
Profile Server Requirements
  • A profile server shall search and retrieve
    profiles from a profile database
  • For search, a profile server shall allow any
    profile attribute as a query constraint. Profile
    attributes include those from the profile
    element, resource attribute, and profile
    attribute sections of the profile document.
  • For retrieval, a profile server shall return
    matching profiles. The user can request the
    complete profile or any subset of the profile.
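
The two requirements (any attribute may constrain a search; the caller may ask for the whole profile or a subset) can be illustrated with a small in-memory sketch. This is not the OODT profile server: the `PROFILES` records, their identifiers, and the `search` function are hypothetical, and only exact-match constraints are shown.

```python
# Hypothetical in-memory "profile database"; identifiers and values are made up.
PROFILES = [
    {
        "profAttributes": {"profId": "demo-1", "profType": "profile"},
        "resAttributes": {"Identifier": "DEMO-DS-A", "resClass": "data.granule"},
        "profElements": {"TARGET_NAME": ["MARS"], "FILTER_NAME": ["CLEAR"]},
    },
    {
        "profAttributes": {"profId": "demo-2", "profType": "profile"},
        "resAttributes": {"Identifier": "DEMO-DS-B", "resClass": "data.granule"},
        "profElements": {"TARGET_NAME": ["IO"]},
    },
    {
        "profAttributes": {"profId": "demo-3", "profType": "profile"},
        "resAttributes": {"Identifier": "DEMO-DS-C", "resClass": "data.set"},
        "profElements": {"TARGET_NAME": ["MARS"]},
    },
]


def attribute_values(profile, name):
    """Collect values for `name` from all three sections of a profile."""
    values = []
    for section in ("profAttributes", "resAttributes"):
        if name in profile[section]:
            values.append(profile[section][name])
    values.extend(profile["profElements"].get(name, []))
    return values


def search(profiles, constraints, fields=None):
    """Return profiles matching every constraint.

    With fields=None the complete profile is returned; otherwise only the
    requested attributes are returned for each match.
    """
    results = []
    for profile in profiles:
        matches = all(
            wanted in attribute_values(profile, name)
            for name, wanted in constraints.items()
        )
        if not matches:
            continue
        if fields is None:
            results.append(profile)
        else:
            results.append({f: attribute_values(profile, f) for f in fields})
    return results


subset = search(
    PROFILES,
    {"TARGET_NAME": "MARS", "resClass": "data.granule"},
    fields=["Identifier", "FILTER_NAME"],
)
complete = search(PROFILES, {"TARGET_NAME": "MARS"})
print(subset)
print(len(complete))
```

Note that the query mixes a constraint from the profile element section (TARGET_NAME) with one from the resource attribute section (resClass), which is the behavior the search requirement calls for.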

25
Demo
http://starbrite.jpl.nasa.gov/pds
26
Data Set View Results
27
Custom Data Set Browser THEMIS Search
28
Custom Data Set Browser THEMIS Results
29
Default Data Set Browser
30
Benefits
  • New system architecture provides seamless search
    and retrieval of all PDS data products in the
    system
  • Users can access all PDS resources without
    knowing their location
  • Users are presented with an integrated set of PDS
    Nodes (one PDS, not seven)
  • Primary method of data distribution is now
    electronic and saves media costs
  • Heterogeneous data repositories can be located
    anywhere for optimum performance and cost savings
    (e.g., THEMIS data node at ASU)
  • PDS-D provides a standard interface for software
    developers thereby increasing the availability of
    user clients
  • Supports plug-ins for analysis tools and
    graphical user interfaces
  • PDS-D supports evolution and scaling to
    incorporate new information technology and
    requirements changes
  • Missions are now involved with the PDS earlier,
    and data are released through the PDS as soon as
    they become available
  • Mars Odyssey data were released to the public
    through the PDS on October 1st -- the same day
    they were delivered!

31
Scalability
  • Number of system component interconnections
    increases linearly
  • Nodes added as needed
  • One-to-one connections from each component to
    middleware
  • Much larger (quadratic) number of inter-operational
    connections made dynamically via message passing
  • Since distribution system is built as a light
    layer on top of the archive system, it will scale
    as long as the archive system scales
  • Continue to distribute archive as needed to
    support larger data repositories (e.g. MRO)
  • Parallel load balancing
  • Smaller frequently used data repositories can be
    mirrored
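
A quick count illustrates the scaling argument. With n components each holding one connection to the middleware, the number of fixed links is n, while the number of distinct component pairs that can interoperate through it is n(n-1)/2. The function names below are hypothetical, and the value 7 simply echoes the "one PDS, not seven" node count mentioned under Benefits.

```python
def middleware_links(n):
    """Fixed connections when every component talks only to the middleware."""
    return n


def pairwise_links(n):
    """Distinct component pairs that would otherwise need direct connections."""
    return n * (n - 1) // 2


for n in (7, 20, 100):
    print(n, middleware_links(n), pairwise_links(n))
```

Adding a node therefore costs one new connection, regardless of how many other components it ends up exchanging messages with.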

32
Correlative Search the Simple Way
  • All data resources in the system are profiled
  • Submit a query that describes what you want
  • Not how to get what you want
  • System returns all matching data profiles
  • Provides identification and description
    information
  • Provides location information
  • Provides all PDS metadata to support correlative
    science
  • Information is machine and human readable
  • Submit query to retrieve data

33
Next Steps
  • Collect and analyze requirements from upcoming
    planetary missions (e.g., Cassini, MER, Mars
    Express, MRO)
  • Gather user community feedback from PDS D-01
  • Incorporate both of these to determine future
    releases of PDS-D
  • Automate the data archiving processes to
    streamline getting data into the PDS
  • Automated archive product creation work flow
  • Product generation, labeling, validation, and
    ingestion
  • Derived product processing and versioning
  • Upgrade the PDS data model to support new
    requirements
  • XML modeling and interfaces
  • Ground-based data sets
  • Wavelength regimes
  • Targets with multiple identifiers and types

34
PDS Development Timeline
  • PDS-D D02
  • Data Set View for entire archive
  • Mars Database product search
  • NEAR Data Set
  • 3 sigma Data Set
  • On-request CD/DVD creation
  • Cassini
35
For More Information
J. Steven Hughes
Jet Propulsion Laboratory
Steve.Hughes@jpl.nasa.gov
36
Backup
37
Product Server Architecture
Distributed Product Servers (each with the same layers)
  • HTTP, IIOP, Java, C APIs
  • Java Server Framework
  • Data Source Interface for Dynamically Loaded
    Query Handlers
  • File System Access/Zip/ReadLabel
Distributed Data Repositories
38
The Product Server in a Nutshell
  • If a product server has read privileges to a
    file, it can return that file.
  • If a product server has read privileges to a
    directory, it can return all files in the
    directory, packaged as a zip file.
  • If a product server has read privileges to a PDS
    labeled product, it can return all files
    referenced within the label of the product,
    packaged as a zip file.
  • The PDS/OODT product server is capable of serving
    the vast majority of products in the PDS
    archive (i.e., the product server is not
    constrained by target body, mission, data
    set, or data repository layout).
  • The currently released product server is
    installed at six nodes (i.e., all product server
    capabilities are available at all nodes).
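
The first two rules (a readable file is returned as-is; a readable directory is returned as a zip of its files) can be sketched in a few lines. This is an illustration under stated assumptions, not the OODT Java implementation: the `serve` function and the file names in the demo are hypothetical, and the third rule (following references inside a PDS label) is omitted.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def serve(path):
    """Return (name, bytes): the file itself, or a zip of a directory's files."""
    path = Path(path)
    if path.is_file():
        return path.name, path.read_bytes()
    if path.is_dir():
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for child in sorted(path.iterdir()):
                if child.is_file():  # top-level files only in this sketch
                    archive.write(child, arcname=child.name)
        return path.name + ".zip", buffer.getvalue()
    raise FileNotFoundError(path)


# Demo on a throwaway directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "volume"
    root.mkdir()
    (root / "H0133.DAT").write_bytes(b"\x00\x01")
    (root / "H0133.LBL").write_text("RECORD_TYPE = FIXED_LENGTH\n")

    zip_name, zip_payload = serve(root)
    members = zipfile.ZipFile(io.BytesIO(zip_payload)).namelist()
    file_name, file_payload = serve(root / "H0133.DAT")

print(zip_name, members)
```

Because the rules depend only on read access and path type, nothing in the sketch is specific to a mission, target, or repository layout.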

39
Product Server Query Handlers
  • Return_Types
  • PDS_LABEL: return PDS label
  • PDS_ZIP: return PDS labeled file and all
    associated files in a ZIP package
  • PDS_ZIPN: same as PDS_ZIP except for 1-n PDS
    labeled files
  • RAW (mime_type): return specified file
  • DIRLIST: return list of all files in a directory
  • PDS_ZIPN_TES: return TES product in a ZIP
    package
  • PDS_JPEG: convert PDS image to JPEG
  • Under consideration
  • PDS_CSV: convert PDS binary TABLE to
    comma-separated value ASCII file
  • PDS_PDS: normalize data representation of a PDS
    product
  • PDS_FITS: convert PDS product to FITS
  • http://buttons.jpl.nasa.gov:9002/index.html
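
One common way to support a growing list of return types is a registry that maps each RETURN_TYPE name to a handler function. The sketch below shows that pattern only; it is not the product server's actual plug-in mechanism, and the `handler` decorator, the `handle` function, and the dictionary used to stand in for a product are all hypothetical.

```python
HANDLERS = {}  # RETURN_TYPE name -> handler function


def handler(return_type):
    """Decorator that registers a function under a RETURN_TYPE name."""
    def register(func):
        HANDLERS[return_type] = func
        return func
    return register


@handler("PDS_LABEL")
def return_label(product):
    # Hypothetical product record: the label text is stored under "label".
    return product["label"]


@handler("DIRLIST")
def list_directory(product):
    # Hypothetical product record: file names are stored under "files".
    return "\n".join(sorted(product["files"]))


def handle(return_type, product):
    """Dispatch a request to the handler registered for its RETURN_TYPE."""
    try:
        func = HANDLERS[return_type]
    except KeyError:
        raise ValueError(f"unsupported RETURN_TYPE: {return_type}") from None
    return func(product)


demo_product = {"label": "RECORD_BYTES = 252", "files": ["A.LBL", "A.DAT"]}
listing = handle("DIRLIST", demo_product)
print(listing)
```

With this layout, a return type that is still under consideration (for example PDS_CSV) would be added by registering one more function, without touching the dispatch code.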

40
Standard Product Server Interface
  • HTTP protocol link to product query servlet
  • http://starbrite.jpl.nasa.gov/servlet/jpl.oodt.servlets.ProductServlet
  • Target product server specification
  • object=PDS.ASU.Product
  • Keyword query
  • ONLINE_FILE_SPECIFICATION_NAME =
    data/odtie0_xxxx/i009xxedr/I00900003EDR.QUB
    AND RETURN_TYPE = PDS_ZIP
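
A request to the servlet combines the target product server with a keyword query. The sketch below assembles such a URL with Python's standard `urllib.parse`. The query-string parameter name `keywordQuery` is an assumption that does not appear on the slide, `object` is read from the "Target product server specification" bullet, and the function `product_url` is hypothetical. The code only builds the URL; it does not contact the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SERVLET = "http://starbrite.jpl.nasa.gov/servlet/jpl.oodt.servlets.ProductServlet"


def product_url(server_object, constraints, return_type):
    """Build a product request URL from keyword constraints and a return type.

    Assumption: the servlet reads the query from a parameter named
    'keywordQuery'; that name is not confirmed by the slide.
    """
    terms = [f"{key} = {value}" for key, value in constraints.items()]
    terms.append(f"RETURN_TYPE = {return_type}")
    query = urlencode({"object": server_object, "keywordQuery": " AND ".join(terms)})
    return f"{SERVLET}?{query}"


url = product_url(
    "PDS.ASU.Product",
    {"ONLINE_FILE_SPECIFICATION_NAME": "data/odtie0_xxxx/i009xxedr/I00900003EDR.QUB"},
    "PDS_ZIP",
)
params = parse_qs(urlsplit(url).query)
print(params["keywordQuery"][0])
```

Encoding the query with `urlencode` keeps characters such as `=`, `/`, and spaces from being misread as URL structure.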
