1
Working with Data Providers in a Distributed Data
Environment
  • Raymond J. Walker
  • Todd A. King
  • Steven P. Joy
  • Lee F. Bargatze
  • Peter Chi
  • James Weygand
  • Robert L. McPherron

Presented at Virtual Observatories in Geoscience
Denver, Colorado
June 12, 2007
2
New Challenges
  • Background
    • Most Heliophysics data are available through
      independent repositories, which:
      • Are found around the world
      • Use different metadata standards
      • Are organized differently
  • The Heliophysics Virtual Observatories have been
    tasked with connecting these disparate
    repositories into a logical whole that enables
    scientists to locate and access the data and
    services they need.

3
The VMO Approach Part I
  • Selected the Space Physics Archive Search and
    Extract (SPASE) metadata standard and its XML
    representation to describe resources.
  • Enables interoperability in a federated
    environment.
  • Acts as an interlingua or intermediate language
    through which the VMO communicates with data
    repositories.
  • Common metadata allows the repositories to be
    interconnected.
  • Current state of SPASE
    • Version 1.2.0 has been released and VMO has
      baselined to that version.
    • Defined a standard data model for all of
      Heliophysics.

4
The Elements
  • Resource descriptions are stored in registries.
  • The VMO provides services that:
    • Query registries
    • Aggregate and organize the responses
    • Direct users to the resource
    • Provide data services (reformat, manipulate,
      display, and analyze)
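
The query, aggregate, and direct steps listed on this slide can be illustrated with a small sketch. This is not the VMO implementation: the Registry class, the federated_query function, the resource IDs, and the example.org URLs are all hypothetical, and each registry is reduced to an in-memory list so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceDescription:
    """A pared-down stand-in for a SPASE resource description."""
    resource_id: str
    name: str
    measurement_type: str
    access_url: str


class Registry:
    """Hypothetical in-memory stand-in for one repository's registry."""

    def __init__(self, name, descriptions):
        self.name = name
        self._descriptions = list(descriptions)

    def query(self, measurement_type):
        """Return descriptions whose measurement type matches exactly."""
        return [d for d in self._descriptions
                if d.measurement_type == measurement_type]


def federated_query(registries, measurement_type):
    """Query every registry, drop duplicate resource IDs, sort by name."""
    seen = {}
    for registry in registries:
        for desc in registry.query(measurement_type):
            # Keep the first description seen for each resource ID.
            seen.setdefault(desc.resource_id, desc)
    return sorted(seen.values(), key=lambda d: d.name)


# Hypothetical registry contents, for illustration only.
mag_a = ResourceDescription("spase://Example/NumericalData/A/Mag",
                            "Observatory A magnetometer",
                            "MagneticField", "http://example.org/a/mag")
plasma_a = ResourceDescription("spase://Example/NumericalData/A/Plasma",
                               "Observatory A plasma",
                               "ThermalPlasma", "http://example.org/a/plasma")
mag_b = ResourceDescription("spase://Example/NumericalData/B/Mag",
                            "Observatory B magnetometer",
                            "MagneticField", "http://example.org/b/mag")

registry_a = Registry("A", [mag_a, plasma_a])
registry_b = Registry("B", [mag_b, mag_a])  # mag_a is registered twice

results = federated_query([registry_a, registry_b], "MagneticField")
access_points = [d.access_url for d in results]  # where to send the user
```

Because every registry describes its holdings in the same terms, the aggregation step needs no per-repository translation, which is the point of using common metadata.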

[Diagram labels: Resource, Repository, Registry, Access point, Model and Methods]
5
Organized in a Self-declared Network
[Diagram labels: Resident Archive, VxO, Individual Researcher, VMO]
6
The Approach Part II
  • Generate resource descriptions in SPASE XML.
  • The SPASE data dictionary is scientifically very
    rich.
  • SPASE is so rich that the learning curve is
    steep.
  • At best it is a formidable task to populate the
    registries.
  • Most data providers do not have the resources to
    create the SPASE metadata and populate the
    registries.
  • Develop a system for creating and populating the
    metadata with minimum effort.

7
Creating SPASE Metadata
  • Built tools to edit and verify the SPASE
    metadata.
  • Built tools to populate the registries.
  • Enlisted a group of domain experts (X-men) to
    work with data providers.

8
Qualifications of the Magnetospheric X-men
  • Research scientists who are actively engaged in
    the analysis of magnetospheric data.
  • Must understand space plasma physics.
  • Must understand space particles and fields
    instruments or have sufficient background that
    they can quickly learn about them.
  • Must be expert in time series data analysis
    techniques.
  • X-men must augment their scientific background
    with training in the principles of data
    management.
  • Must understand the details of the SPASE data
    model.
  • Must be expert in tools used for creating the
    metadata and populating the registries.

9
What X-men do
  • Develop a plan to make all of the data useful for
    magnetospheric research available to the
    community.
  • We are working to make the list exhaustive.
  • The list includes correlative data which we plan
    to access through the other VxOs.
  • Prioritize the ingestion tasks and work out an
    ingestion schedule.
  • Contact data providers and jointly work out a
    plan to include their data in the VMO.

10
The SPASE data model is complex.
The X-men have identified structure in the model
that can be used to build tools to aid in
writing the high level metadata.
11
SPASE Editors Developed by VMO (Web Based)
12
SPASE Editors Developed by VMO (Excel and MATLAB)
[Diagram labels: Input by VMO members or data providers; Programmed by VMO]
13
SPASE Editors Developed by VMO (IDL)
[Diagram labels]
SPASE Model: 1) Ontology Tree 2) Enumeration Lists 3) Custom Settings
spase_model, Version 1.2
create_spase_structure, populate_structure, write_structure
WDC Geomagnetic Master Catalog: 1) Acknowledgement File 2) Data Granule Existence Map 3) Granule Path, Name, Specifics
wdc_1_min
XML Files
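
The slide names three routines (create_spase_structure, populate_structure, write_structure) but does not show how they work. The sketch below is one guess at the shape of such a pipeline, written in Python rather than IDL. The small SPASE_MODEL tree, the slash-separated value paths, and the resource ID are assumptions made for illustration; element names should be checked against the SPASE data dictionary before use.

```python
import xml.etree.ElementTree as ET

# Assumed subset of the model tree: element name -> ordered child elements.
SPASE_MODEL = {
    "Spase": ["Version", "NumericalData"],
    "NumericalData": ["ResourceID", "ResourceHeader", "MeasurementType"],
    "ResourceHeader": ["ResourceName", "ReleaseDate", "Description"],
}


def create_spase_structure(name="Spase"):
    """Build an empty nested dict for `name` by walking the model tree."""
    children = SPASE_MODEL.get(name)
    if children is None:
        return None  # leaf element; its value is filled in later
    return {child: create_spase_structure(child) for child in children}


def populate_structure(structure, values, path=()):
    """Fill leaf values from `values`, keyed by slash-separated paths."""
    for key, child in structure.items():
        here = path + (key,)
        if isinstance(child, dict):
            populate_structure(child, values, here)
        else:
            # Leaves with no supplied value become empty strings.
            structure[key] = values.get("/".join(here), "")
    return structure


def write_structure(structure, root_name="Spase"):
    """Serialize the populated structure to an XML string."""
    def build(parent, node):
        for key, child in node.items():
            element = ET.SubElement(parent, key)
            if isinstance(child, dict):
                build(element, child)
            else:
                element.text = child

    root = ET.Element(root_name)
    build(root, structure)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for a single resource description.
values = {
    "Version": "1.2.0",
    "NumericalData/ResourceID": "spase://Example/NumericalData/wdc_1_min",
    "NumericalData/ResourceHeader/ResourceName": "WDC 1-minute geomagnetic data",
    "NumericalData/MeasurementType": "MagneticField",
}

xml_text = write_structure(populate_structure(create_spase_structure(), values))
```

Driving the structure from a model table means the same three steps can follow a new model version by swapping the table, not the code.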
14
Why Not Just One Editor?
  • Each of the three X-men uses a different SPASE
    editing scheme.
  • The SPASE learning curve is sufficiently steep
    that they did not want to learn a new software
    system.
  • The three tools use approaches with which they
    are comfortable.
  • The existence of these three approaches plus
    others developed by the SPASE consortium
    hopefully will allow data providers to select
    software with which they are comfortable.
  • For a first-hand discussion, see Bargatze et al.
    (this meeting).

15
Automating the Generation of Detailed SPASE XML
16
Validation Tool
17
SPASE Data Dictionary Tool
18
Registry Tools
19
Working with Data Providers to make Data
Available Through VMO
  • The X-men assist the data providers to:
    • Use a SPASE editor to write high level SPASE
      XML.
    • Verify the XML.
    • Create Rule Sets (or other software) to populate
      the detailed level SPASE XML.
    • Establish the registry at the remote site, if
      desired.
    • Load the high level SPASE XML into the registry.
    • Run the Rule Sets (or other software) to populate
      the registry.
  • Most importantly, an expert is available to data
    providers at each step of the process.
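
The slides do not define what a Rule Set contains. As a rough illustration only, the sketch below assumes a rule is a file-naming pattern plus the parent resource it belongs to, and that running the rule turns a list of file URLs into granule-level descriptions. The RULE dictionary, the apply_rule function, the file names, and all IDs and URLs are hypothetical; the field names are loosely modeled on SPASE granule descriptions and are not checked against the schema.

```python
import re
from datetime import datetime, timedelta

# Hypothetical rule: daily files named like "wdc_1_min_20070612.dat".
RULE = {
    "pattern": re.compile(r"wdc_1_min_(\d{8})\.dat$"),
    "parent_id": "spase://Example/NumericalData/wdc_1_min",
    "granule_prefix": "spase://Example/Granule/wdc_1_min",
    "duration": timedelta(days=1),
}


def apply_rule(rule, file_urls):
    """Turn matching file URLs into granule-level description dicts."""
    granules = []
    for url in file_urls:
        match = rule["pattern"].search(url)
        if match is None:
            continue  # this file is not covered by the rule
        day = match.group(1)
        start = datetime.strptime(day, "%Y%m%d")
        stop = start + rule["duration"]
        granules.append({
            "ResourceID": f'{rule["granule_prefix"]}/{day}',
            "ParentID": rule["parent_id"],
            "StartDate": start.isoformat(),
            "StopDate": stop.isoformat(),
            "URL": url,
        })
    return granules


# Hypothetical repository listing.
file_urls = [
    "http://example.org/data/wdc_1_min_20070612.dat",
    "http://example.org/data/readme.txt",
]

granules = apply_rule(RULE, file_urls)
```

The high level description is written once by hand, while the granule entries are generated, which is where most of the saving in provider effort would come from.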