Title: GFDL Data Portal Update: Curator DB Approach


1
GFDL Data Portal Update: Curator DB Approach
The 5th GO-ESSP Workshop June 19-21 2006, LLNL
  • S.Nikonov, V.Balaji, K.Dixon
  • GFDL

2
Outline
  • GFDL Data Portal Hardware Upgrade
  • Data Portal Statistics
  • Metadata Database design for Data Portal usage
    and for whole modeling process

3
Data Portal Hardware Upgrade
  • Dell PowerEdge 2850
  • Two Intel 3.2GHz Xeon processors
  • 2GB RAM
  • 300GB system disk
  • Two QLogic QLA2340 fiber channel controllers
    (2Gb/s)
  • Red Hat Enterprise Linux 4.0 ES operating system
  • Ten StorageTek FlexLine FLC200 fiber channel disk
    arrays
  • Fourteen 250GB SATA drives per array
  • 140 drives, total 35TB raw (27TB usable)
  • Data transfer and processing speed increased by
    40%
  • Future plan: double storage capacity every 2
    years
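
The storage figures on this slide follow from simple arithmetic, sketched below in Python. Decimal units (1 TB = 1000 GB) are an assumption; the slide does not say which convention was used.

```python
# Figures taken from the slide; decimal units (1 TB = 1000 GB) are assumed.
ARRAYS = 10
DRIVES_PER_ARRAY = 14
DRIVE_SIZE_GB = 250
USABLE_TB = 27

total_drives = ARRAYS * DRIVES_PER_ARRAY
raw_tb = total_drives * DRIVE_SIZE_GB / 1000
usable_fraction = USABLE_TB / raw_tb

print(total_drives)               # 140
print(raw_tb)                     # 35.0
print(round(usable_fraction, 2))  # 0.77
```

Roughly three quarters of the raw capacity is usable; the slide does not state how the remainder is spent.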

4
Data Statistics
  • Period: 01-Oct-2004 to 01-Jun-2006
  • Total amount of data: 8 TB (increased by 50% in
    1 yr)
  • 12,500 NetCDF files, average file size 650 MB
  • Distinct files requested: 6,000
  • Distinct hosts served: 1,200
  • Data transferred: 20 TB (a twofold increase)
  • Average data transferred per day: 25 GB

5
Metadata Database Design
  • Progress has already been made in this area at
    CGAM (NMM Suite), and the Curator project is
    partially devoted to developing model and model
    output metadata standards. Those ideas and
    discussions were extremely useful for our design.
  • For comprehensive data analysis, the Data Portal
    should describe not only the data but also how
    the data was generated.
  • It should use the same metadata database as the
    modeling system (Flexible Runtime Environment).
    This database is the connecting element of the
    whole system.
  • Analysis of existing data through the Data Portal
    will help modelers improve models and plan new
    experiments.
  • Thus the Data Portal can be considered not a
    separate, independent system but a subsystem of
    the modeling system.

6
Common functionality schema of modeling system
7
Metadata Database usage at different stages of the
modeling process
  • Diagram labels: Data Portal Service,
    Postprocessing Plan, Experiment Preparation,
    Model Composition, Component Building, Metadata
    Database
8
Main Database Compartments and their
relationships
9
Scheme Rationale
  • Process Domains: arenas where physical processes
    play out.
  • Physical Process: descriptions of the accepted
    theoretical approaches for the processes
    considered in modeling.
  • Algorithmization: describes program modules of
    elementary physical processes.
  • Composition: components, couplers, drivers,
    technical environment.
  • Simulation: describes model output data and its
    location, including all accompanying
    administrative information.

10
Process Domains
  • They define the phase spaces of the equations
    that express physical phenomena in mathematical
    form. They also serve as containers where
    elements are put (gases, aerosols, clouds).
  • Each contains common descriptions and the sets
    of elements constituting the domain.
  • Examples: atmosphere or ocean 3D space for
    dynamics.

11
Physical Processes
  • Each contains theoretical assumptions, a full
    description, references, and other
    process-specific information.
  • Identified by name and the domain where they act.
  • Described individually in different tables.
  • All process tables share a subset of the same
    fields:
      • process id
      • process name
      • domain
      • full description
  • Other fields are process-specific.
  • Process name and domain are one of the criteria
    for preventing the same process from being
    included twice in a component or coupled model.
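
The table layout described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names (`radiation`, `convection`), the process-specific columns, and the column types are hypothetical, and the slides do not say which database engine the Curator DB uses. The shared fields and the name-plus-domain identity come from the slide.

```python
import sqlite3

# Fields shared by every process table, as listed on the slide.
COMMON_COLUMNS = [
    ("process_id", "INTEGER PRIMARY KEY"),
    ("process_name", "TEXT NOT NULL"),
    ("domain", "TEXT NOT NULL"),
    ("full_description", "TEXT"),
]


def create_process_table(conn, table, specific_columns):
    """Create one process table: shared fields plus process-specific ones."""
    columns = COMMON_COLUMNS + list(specific_columns)
    column_sql = ", ".join(f"{name} {sqltype}" for name, sqltype in columns)
    # A process is identified by its name and the domain where it acts.
    conn.execute(
        f"CREATE TABLE {table} ({column_sql}, UNIQUE (process_name, domain))"
    )


def column_names(conn, table):
    """Return the column names of a table in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
# Hypothetical process tables with hypothetical specific columns.
create_process_table(conn, "radiation", [("spectral_bands", "INTEGER")])
create_process_table(conn, "convection", [("closure", "TEXT")])

insert_sql = "INSERT INTO radiation (process_name, domain) VALUES (?, ?)"
conn.execute(insert_sql, ("shortwave radiation", "atmosphere"))

# A second process with the same name and domain is rejected.
try:
    conn.execute(insert_sql, ("shortwave radiation", "atmosphere"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(column_names(conn, "radiation"))
print(rejected)
```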

12
Algorithmization
  • Process codebase: set of modules implementing a
    process, including the input data description
    (namelists and datasets), accompanied by a CVS
    tag.
  • Numeric artifices: set of modules implementing
    numeric smoothing (filters, artificial viscosity,
    general algorithms, etc.).
  • Tracer models: descriptions pointing to the
    fieldtable files associated with tracers.
  • Grid specs
  • Boundary conditions
  • Namelists and datasets (model parameters),
    fieldtables (tracers): their locations,
    versions, descriptions, checksums.
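
A record for one input file (namelist, dataset, or fieldtable) with its location, version, description, and checksum might look like the sketch below. The `InputFileRecord` class, the `describe_input` helper, and the choice of SHA-256 are assumptions for illustration; the slides do not name the checksum algorithm or the record layout.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class InputFileRecord:
    """Hypothetical record for one namelist, dataset, or fieldtable file."""
    kind: str
    location: str
    version: str
    description: str
    checksum: str


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()  # SHA-256 is an assumption, not from the slides
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_input(path, kind, version, description):
    """Build the record that would be stored in the database."""
    return InputFileRecord(kind, str(path), version, description,
                           file_checksum(path))


CONTENT = b"&example_nml\n  dt = 1800\n/\n"  # made-up namelist content

with tempfile.TemporaryDirectory() as tmp:
    namelist = Path(tmp) / "input.nml"
    namelist.write_bytes(CONTENT)
    record = describe_input(namelist, "namelist", "v1",
                            "Hypothetical model parameters")

print(record.kind, record.version, record.checksum[:12])
```

Storing the checksum next to the location lets a later run confirm that the file it reads is the one that was registered.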

13
Composition
  • The main actors here are components.
  • A component can be of 2 types: physical component
    or coupler.
  • A component consists of modules.
  • The modules constituting a component are defined
    by the physical processes participating in the
    final model. These sets of modules are described
    in the Algorithmization part of the database.
  • Another entity of the Composition compartment is
    the driver, a program unit responsible for
    running components (alone or as a whole coupled
    model).
  • A component is the minimal unit that can be run
    by a driver.
  • Components have a PMIOD description, which the
    system should use to decide on component
    compatibility. Another criterion at the component
    building stage is that the same process of the
    same domain must not appear twice in a component
    or in a coupled model.
  • The Coupled Model table describes the set of
    components that are members of the final coupled
    model.
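
The rule that the same process of the same domain must not appear twice in a component or coupled model can be sketched as follows. The class names and the example components are hypothetical, and the PMIOD compatibility check is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Process:
    name: str
    domain: str


@dataclass
class Component:
    name: str
    kind: str  # "physical" or "coupler"
    processes: list = field(default_factory=list)


def duplicate_processes(components):
    """Return (name, domain) pairs that occur more than once."""
    counts = Counter(
        (process.name, process.domain)
        for component in components
        for process in component.processes
    )
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical components for illustration only.
atmos = Component("atmos", "physical", [
    Process("radiation", "atmosphere"),
    Process("convection", "atmosphere"),
])
ocean = Component("ocean", "physical", [Process("vertical mixing", "ocean")])
extra = Component("extra_atmos", "physical",
                  [Process("radiation", "atmosphere")])

print(duplicate_processes([atmos, ocean]))         # []
print(duplicate_processes([atmos, ocean, extra]))  # [('radiation', 'atmosphere')]
```

Passing a single component in a one-element list applies the same check at the component level.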

14
Simulation
Contains tables with a full description of the
conducted experiment, including:
  • Institution
  • Author
  • Project
  • Scenario
  • Experiment
  • Realization
  • Postprocessing plan
  • Variables
  • Variable bundles
  • Metadata standards
  • Data fields
  • Files

15
Compartment Structure of Curator Database
16
Modes of working with database
  • Research mode: the modeler introduces new physical
    processes or new algorithmizations and builds new
    components from newly developed modules for
    future use in coupled models. New components are
    to be described in the database. Model runs
    conducted for development purposes are not
    recorded in the DB, except the final ones proving
    the physical correctness of the new approach.
  • Production mode: the experimenter composes a
    coupled model from available components described
    in the database, builds a scenario and a
    postprocessing plan, and runs the experiment. All
    this activity is recorded in the database.
  • A thoroughly elaborated, very friendly GUI is a
    critical need for these modes; otherwise users
    will avoid the database-based way of working, the
    DB will be empty, and the project will fail.
  • Automatic mode: applications fill metadata into
    the database, grabbing it from data files, or
    read metadata for their needs during execution.
  • Most progress has been made here, using the
    Simulation compartment of the Curator DB.
17
Current usage of Curator DB
  • Currently the Simulation part of the DB is
    designed for operational usage; it is kept
    updated and used in Data Portal activity.
  • The DB serves the GFDL Data Portal web site for
    IPCC CM2.1 data discovery and navigation. A
    daemon scans Data Portal storage for newly added
    data files and records metadata extracted from
    the files, plus system information about them, in
    the DB.
  • It is used to make the metadata of data files on
    the Data Portal consistent with the standards
    defined in the DB. The application queries the DB
    for the metadata standard assumed for a given
    file, then compares and fixes the metadata in the
    file.
  • It is used by an automatic tool for configuring
    the DODS Aggregation Server. The tool checks the
    experiment status (public/not public) in the DB,
    requests all metadata needed for the DODS XML
    configuration file, and creates this file.
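
The compare-and-fix step for metadata consistency can be sketched on plain dictionaries. The `reconcile` function, the attribute names, and the values are hypothetical; a real tool would read the attributes from the data file and the standard from the DB.

```python
def reconcile(attributes, standard):
    """Compare file attributes with a standard and return the fixed copy.

    Returns (fixed_attributes, changes), where changes maps each corrected
    attribute name to an (old_value, new_value) pair.
    """
    fixed = dict(attributes)
    changes = {}
    for name, expected in standard.items():
        actual = attributes.get(name)
        if actual != expected:
            changes[name] = (actual, expected)
            fixed[name] = expected
    return fixed, changes


# Hypothetical standard and file attributes for illustration only.
standard = {"Conventions": "CF-1.0", "institution": "NOAA GFDL"}
file_attrs = {"Conventions": "CF-1.0", "institution": "GFDL",
              "title": "example"}

fixed, changes = reconcile(file_attrs, standard)
print(changes)  # {'institution': ('GFDL', 'NOAA GFDL')}
```

Attributes that the standard does not mention, such as `title` here, are left untouched.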

18
Tables examples - 1
19
Tables examples - 2
CoupledModels
20
Data examples - 1
Experiments
21
Data examples - 2
OutDataFields
OutDataFiles
22
  • Thanks!
  • Questions?