Unidata's Common Data Model

1
Unidata's Common Data Model
  • John Caron
  • Unidata/UCAR
  • Nov 2006

2
Goals / Overview
  • Look at the landscape of scientific datasets from
    a few thousand feet up.
  • What semantics are needed to make these useful?
    • georeferencing
    • specialized subsetting

3
What's a Data Model?
  • An Abstract Data Model describes data objects and
    what methods you can use on them.
  • An API is the interface to the Data Model for a
    specific programming language.
  • A file format is a way to persist the objects in
    the Data Model.
  • An Abstract Data Model removes the details of any
    particular API and the persistence format.

4
Common Data Model Layers
  [Figure: layer diagram; recoverable labels: Coordinate Systems, Data Access]
5
NetCDF-Java version 2.2 architecture
  [Figure: architecture diagram. Components: Application; Scientific
  Datatypes; Datatype Adapter; NetcdfDataset; CoordSystem Builder;
  NetcdfFile; I/O service provider. Formats and services shown: ADDE,
  OPeNDAP, NetCDF-3, NIDS, GRIB, NetCDF-4, NcML, HDF5, GINI, Nexrad, DMSP]

6
NetCDF-4 and Common Data Model (Data Access Layer)
7
I/O Service Provider Implementations
  • General: NetCDF, HDF5, OPeNDAP
  • Gridded: GRIB-1, GRIB-2
  • Radar: NEXRAD level 2 and 3, DORADE
  • Point: BUFR, ASCII
  • Satellite: DMSP, GINI
  • In development:
    • NOAA GOES (Knapp/Nelson), many others

8
Coordinate Systems needed
  • NetCDF, OPeNDAP, and HDF data models do not have
    integrated coordinate systems
  • so georeferencing is not part of the API
  • Conventions are needed to specify them (e.g. CF-1,
    COARDS)
  • Contrast with GRIB, HDF-EOS, and other specialized
    formats

9
NetCDF Coordinate Variables
    dimensions:
      lat = 64;
      lon = 128;
    variables:
      float lat(lat);
      float lon(lon);
      double temperature(lat,lon);

10
Coordinate Variables
  • One-dimensional variable with the same name as its
    dimension
  • Strictly monotonic values
  • No missing values
  • The coordinates of a point (i,j,k) are
    • CV1(i), CV2(j), CV3(k)
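The rules above can be checked mechanically. The Java sketch below is illustrative only: the class and method names (`CoordVarCheck`, `isValidCoordinateVariable`, `coordinatesOf`) are hypothetical and not part of NetCDF-Java, and it assumes missing values are represented as `NaN`.

```java
import java.util.Arrays;

// Hypothetical sketch: checks the coordinate-variable rules; not the NetCDF-Java API.
final class CoordVarCheck {

    /** True if the values are strictly increasing or strictly decreasing and none is NaN (missing). */
    static boolean isValidCoordinateVariable(double[] values) {
        if (values.length == 0) return false;
        for (double v : values) {
            if (Double.isNaN(v)) return false; // missing values are not allowed
        }
        if (values.length == 1) return true;
        boolean increasing = values[1] > values[0];
        for (int n = 1; n < values.length; n++) {
            double d = values[n] - values[n - 1];
            if (increasing ? d <= 0 : d >= 0) return false; // a repeated or reversed step breaks strict monotonicity
        }
        return true;
    }

    /** The coordinates of point (i, j, k) are CV1(i), CV2(j), CV3(k). */
    static double[] coordinatesOf(double[] cv1, double[] cv2, double[] cv3, int i, int j, int k) {
        return new double[] {cv1[i], cv2[j], cv3[k]};
    }

    public static void main(String[] args) {
        double[] lat = {-10.0, 0.0, 10.0};
        double[] lon = {0.0, 90.0, 180.0, 270.0};
        double[] lev = {1000.0, 850.0, 500.0}; // decreasing pressure is still strictly monotonic
        System.out.println(isValidCoordinateVariable(lev));                      // true
        System.out.println(Arrays.toString(coordinatesOf(lat, lon, lev, 2, 1, 0))); // [10.0, 90.0, 1000.0]
    }
}
```

Because each coordinate variable is one-dimensional, finding a point's coordinates is just one array lookup per dimension.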

11
Limitations of 1D Coordinate Variables
  • Non-lat/lon horizontal grids
      float temperature(y,x)
      float lat(y,x)
      float lon(y,x)
  • Trajectory data
      float NKoreaRadioactivity(pt)
      float lat(pt)
      float lon(pt)
      float altitude(pt)
      float time(pt)

12
General Coordinates in CF-1.0
    float P(y,x);
      P:coordinates = "lat lon";
    float lat(y,x);
    float lon(y,x);

    float Sr90(pt);
      Sr90:coordinates = "lat lon altitude time";

13
Coordinate Systems (abstract)
  • A Coordinate System for a data variable is a set
    of Coordinate Variables² such that the
    coordinates of the (i,j,k) data point are
    • CV1(i,j,k), CV2(i,j,k), CV3(i,j,k), CV4(i,j,k)
    • (previously: CV1(i), CV2(j), CV3(k))
  • The dimensions of each Coordinate Variable must
    be a subset of the dimensions of the data
    variable.
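Here is a minimal Java sketch of the generalized rule, assuming dimensions are identified by name. It checks that a coordinate variable's dimensions are a subset of the data variable's dimensions, and it picks out the index components a coordinate variable needs for a given data point. `CoordSystemRule` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch of the general coordinate-system rule; not the NetCDF-Java API.
final class CoordSystemRule {

    /** True if every dimension of the coordinate variable is also a dimension of the data variable. */
    static boolean isValidFor(List<String> dataDims, List<String> coordDims) {
        return new HashSet<>(dataDims).containsAll(coordDims);
    }

    /** From a data-point index, selects the components for the coordinate variable's dimensions. */
    static int[] projectIndex(List<String> dataDims, int[] dataIndex, List<String> coordDims) {
        int[] result = new int[coordDims.size()];
        for (int n = 0; n < coordDims.size(); n++) {
            int pos = dataDims.indexOf(coordDims.get(n));
            if (pos < 0) throw new IllegalArgumentException("not a data dimension: " + coordDims.get(n));
            result[n] = dataIndex[pos];
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> data = List.of("t", "z", "y", "x"); // gridData(t,z,y,x)
        List<String> lat = List.of("y", "x");            // lat(y,x)
        System.out.println(isValidFor(data, lat));                                          // true
        System.out.println(Arrays.toString(projectIndex(data, new int[] {0, 3, 5, 7}, lat))); // [5, 7]
    }
}
```

So the latitude of data point (t=0, z=3, y=5, x=7) is `lat(5,7)`. The same projection covers the 1-D case, where each coordinate variable keeps exactly one index.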

14
Need Coordinate Axis Types
    float gridData(t,z,y,x)
      float time(t)
      float y(y)
      float x(x)
      float lat(y,x)
      float lon(y,x)
      float height(t,z,y,x)

    float radialData(radial,gate)
      float distance(gate)
      float azimuth(radial)
      float elevation(radial)
      float time(radial)
15
The same??
    float stationObs(pt)
      float lat(pt)
      float lon(pt)
      float z(pt)
      float time(pt)

    float trajectory(pt)
      float lat(pt)
      float lon(pt)
      float z(pt)
      float time(pt)
16
Revised Coordinate Systems
  • Specify Coordinate Variables
  • Specify Coordinate Types
    • (time, lat, lon, projection x, y, height,
      pressure, z, radial, azimuth, elevation)
  • Specify connectivity (implicit or explicit)
    between data points
    • Implicit: neighbors in index space are
      (connected) neighbors in coordinate space, which
      allows efficient searching.
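Implicit connectivity is what allows efficient searching. On a strictly increasing 1-D axis, the index nearest a coordinate value can be found by binary search in O(log n) instead of a linear scan. The sketch below uses a hypothetical `AxisSearch` helper and assumes an increasing axis. A decreasing axis, such as pressure, would need the comparisons reversed.

```java
import java.util.Arrays;

// Hypothetical sketch: binary search on a connected (strictly increasing) 1-D axis.
final class AxisSearch {

    /** Index of the axis value closest to coord; values outside the axis clamp to the ends. */
    static int nearestIndex(double[] axis, double coord) {
        int pos = Arrays.binarySearch(axis, coord);
        if (pos >= 0) return pos;                      // exact match
        int insertion = -pos - 1;                      // index of the first value greater than coord
        if (insertion == 0) return 0;
        if (insertion == axis.length) return axis.length - 1;
        // coord lies between axis[insertion - 1] and axis[insertion]; pick the closer one
        return (coord - axis[insertion - 1] <= axis[insertion] - coord) ? insertion - 1 : insertion;
    }

    public static void main(String[] args) {
        double[] x = {0.0, 10.0, 20.0, 30.0};
        System.out.println(nearestIndex(x, 14.0)); // 1
        System.out.println(nearestIndex(x, 16.0)); // 2
    }
}
```

Without implicit connectivity, as with unconnected point data, no ordering can be assumed and every value must be examined.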

17
Gridded Data
    float gridData(t,z,y,x)
      float time(t)        // Time
      float y(y)           // GeoY
      float x(x)           // GeoX
      float z(t,z,y,x)     // Height or Pressure
  • Cartesian coordinates
  • All dimensions are connected
  • Connected means that neighbors in index space are
    neighbors in coordinate space

18
Coordinate Systems UML
19
Scientific Data Types
  • Based on datasets Unidata is familiar with
  • APIs are evolving
  • How are data points connected?
  • Intended to scale to large, multifile collections
  • Intended to support specialized queries
    • Space, Time
  • Corresponding standard NetCDF file conventions

20
Gridded Data
  • Cartesian coordinates
  • All dimensions are connected
  • x, y, z, time
  • recently added runtime and ensemble
  • refactored into GridDatatype interface

    float gridData(t,z,y,x)
      float time(t)
      float y(y)
      float x(x)
      float lat(y,x)
      float lon(y,x)
      float height(t,z,y,x)
21
GridDatatype methods
  • CoordinateAxis getTaxis()
  • CoordinateAxis getXaxis()
  • CoordinateAxis getYaxis()
  • CoordinateAxis getZaxis()
  • Projection getProjection()
  • int findXYindexFromCoord(double x_coord,
    double y_coord)
  • LatLonRect getLatLonBoundingBox()
  • Array getDataSlice (Range )
  • GridDatatype makeSubset (Range )

22
Radial Data
  • Polar coordinates
  • All dimensions are connected
  • No separate time dimension

    radialData(radial, gate)
      distance(gate)
      azimuth(radial)
      elevation(radial)
      time(radial)
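Radial data are in polar coordinates, so georeferencing a gate means converting (azimuth, elevation, distance) into Cartesian offsets from the radar. This minimal sketch assumes a flat earth and straight-line beam propagation, with no curvature or refraction correction. Azimuth is in degrees clockwise from north, and elevation is in degrees above the horizon. `RadialToCartesian` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch: flat-earth, straight-beam georeferencing of one radar gate.
final class RadialToCartesian {

    /**
     * azimuthDeg: degrees clockwise from north; elevationDeg: degrees above the horizon;
     * distance: range along the beam. Returns {east, north, up} in the units of distance.
     */
    static double[] toEastNorthUp(double azimuthDeg, double elevationDeg, double distance) {
        double az = Math.toRadians(azimuthDeg);
        double el = Math.toRadians(elevationDeg);
        double ground = distance * Math.cos(el); // horizontal projection of the beam
        return new double[] {ground * Math.sin(az), ground * Math.cos(az), distance * Math.sin(el)};
    }

    public static void main(String[] args) {
        // A gate 1000 m out, due east, at zero elevation: approximately {1000, 0, 0}.
        System.out.println(Arrays.toString(toEastNorthUp(90.0, 0.0, 1000.0)));
    }
}
```

A production library would also apply an earth-curvature and refraction model, and then convert the result to lat/lon around the radar site.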
23
Swath
  • lat/lon coordinates
  • No separate time dimension
  • All dimensions are connected

    swathData(line,cell)
      lat(line,cell)
      lon(line,cell)
      time(line)
      z(line,cell) ??
24
Point Observation Data
  • Set of measurements at the same point in space
    and time
  • Point dimension is not connected

    float obs1(pt)
    float obs2(pt)
    float lat(pt)
    float lon(pt)
    float z(pt)
    float time(pt)

    Structure {
      lat, lon, z, time
      v1, v2, ...
    } obs(pt)
25
PointObsDataset Methods
  • // Iterator<StructureData>
    Iterator getData(LatLonRect boundingBox,
                     Date start, Date end)
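Because the point dimension is not connected, a space/time query like `getData(boundingBox, start, end)` cannot exploit index order and must test every record. This Java sketch shows that full scan over an in-memory list. Several pieces are assumptions: the `Obs` record type, plain lat/lon bounds standing in for `LatLonRect`, and `Instant` standing in for `Date`. The sketch also ignores date-line wrapping.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: space/time query over unconnected point observations.
final class PointObsQuery {

    /** One observation: location, time, and a single measured value (assumed layout). */
    static final class Obs {
        final double lat, lon, z;
        final Instant time;
        final double value;
        Obs(double lat, double lon, double z, Instant time, double value) {
            this.lat = lat; this.lon = lon; this.z = z; this.time = time; this.value = value;
        }
    }

    /** Observations inside the lat/lon box and the inclusive [start, end] interval. */
    static Iterator<Obs> getData(List<Obs> all, double latMin, double latMax,
                                 double lonMin, double lonMax, Instant start, Instant end) {
        List<Obs> hits = new ArrayList<>();
        for (Obs o : all) { // full scan: index order says nothing about location or time
            boolean inBox = o.lat >= latMin && o.lat <= latMax && o.lon >= lonMin && o.lon <= lonMax;
            boolean inTime = !o.time.isBefore(start) && !o.time.isAfter(end);
            if (inBox && inTime) hits.add(o);
        }
        return hits.iterator();
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2006-11-01T00:00:00Z");
        List<Obs> all = List.of(
            new Obs(40.0, -105.0, 1600.0, t0, 1.0),
            new Obs(10.0, -105.0, 0.0, t0, 2.0),                        // outside the box
            new Obs(40.0, -105.0, 1600.0, t0.plusSeconds(86400), 3.0)); // outside the time window
        Iterator<Obs> it = getData(all, 30.0, 50.0, -110.0, -100.0, t0, t0.plusSeconds(3600));
        while (it.hasNext()) {
            System.out.println(it.next().value); // prints 1.0 only
        }
    }
}
```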

26
Time series Station Data
    Structure {
      name, lat, lon, z
      Structure {
        time
        v1, v2, ...
      } obs()      // connected
    } stn(stn)     // not connected
27
StationObs Methods
  • // List<Station>
    List getStations(LatLonRect boundingBox)
  • // Iterator<StructureData>
    Iterator getData(Station s,
                     Date start, Date end)
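The two methods reflect the two kinds of dimension. Stations are not connected, so `getStations` tests them one by one. Each station's time dimension is connected, so `getData` can locate a time window by binary search. The Java sketch below is hypothetical: it assumes epoch-second times that are strictly increasing within each station, and none of its names are the NetCDF-Java classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of station time-series access; not the NetCDF-Java API.
final class StationSeries {

    /** A station with its own time series; times are epoch seconds, strictly increasing. */
    static final class Station {
        final String name;
        final double lat, lon;
        final long[] times;
        final double[] values; // one value per time
        Station(String name, double lat, double lon, long[] times, double[] values) {
            this.name = name; this.lat = lat; this.lon = lon;
            this.times = times; this.values = values;
        }
    }

    /** Stations inside the lat/lon box; stations are not connected, so every one is tested. */
    static List<Station> getStations(List<Station> all, double latMin, double latMax,
                                     double lonMin, double lonMax) {
        List<Station> out = new ArrayList<>();
        for (Station s : all) {
            if (s.lat >= latMin && s.lat <= latMax && s.lon >= lonMin && s.lon <= lonMax) {
                out.add(s);
            }
        }
        return out;
    }

    /** Values with start <= time <= end; the connected time axis allows binary search. */
    static double[] getData(Station s, long start, long end) {
        int from = Arrays.binarySearch(s.times, start);
        if (from < 0) from = -from - 1;    // first time >= start
        int to = Arrays.binarySearch(s.times, end);
        to = (to >= 0) ? to + 1 : -to - 1; // one past the last time <= end
        return from < to ? Arrays.copyOfRange(s.values, from, to) : new double[0];
    }

    public static void main(String[] args) {
        Station a = new Station("STN1", 39.8, -104.7, new long[] {0, 100, 200, 300},
                                new double[] {1.0, 2.0, 3.0, 4.0});
        Station b = new Station("STN2", 25.8, -80.3, new long[] {0}, new double[] {9.0});
        for (Station s : getStations(List.of(a, b), 30.0, 50.0, -110.0, -100.0)) {
            System.out.println(s.name + " " + Arrays.toString(getData(s, 50, 250))); // STN1 [2.0, 3.0]
        }
    }
}
```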

28
Trajectory Data
  • pt dimension is connected
  • Collection dimension is not connected

    Structure {
      lat, lon, z, time
      v1, v2, ...
    } obs(pt)      // connected

    Structure {
      name
      Structure {
        lat, lon, z, time
        v1, v2, ...
      } obs()      // connected
    } traj(traj)   // not connected
29
Profiler/Sounding Station Data
    Structure {
      name, lat, lon, time
      Structure {
        z
        v1, v2, ...
      } obs()      // connected
    } loc(nloc)    // not connected

    Structure {
      name, lat, lon
      Structure {
        time
        Structure {
          z
          v1, v2, ...
        } obs()    // connected
      } time()     // connected
    } stn(stn)     // not connected
30
Unstructured Grid
  • pt dimension is not connected
  • Looks the same as point data
  • Connectivity must be specified explicitly

    float unstructGrid(t,z,pt)
      float lat(pt)
      float lon(pt)
      float time(t)
      float height(z)
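One way to specify connectivity explicitly is to list the cells that join the points, for example triangles given as triples of `pt` indices, and to derive each point's neighbors from them. Both the triangle representation and the `UnstructuredMesh` class below are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: explicit connectivity for an unstructured grid.
final class UnstructuredMesh {

    /** For each point, the sorted set of points it shares a triangle edge with. */
    static List<TreeSet<Integer>> neighbors(int nPoints, int[][] triangles) {
        List<TreeSet<Integer>> adj = new ArrayList<>();
        for (int p = 0; p < nPoints; p++) adj.add(new TreeSet<>());
        for (int[] tri : triangles) {
            for (int a = 0; a < 3; a++) {
                int b = (a + 1) % 3; // edges (0,1), (1,2), (2,0) of each triangle
                adj.get(tri[a]).add(tri[b]);
                adj.get(tri[b]).add(tri[a]);
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        // Four points forming two triangles that share the edge 1-2.
        int[][] triangles = {{0, 1, 2}, {1, 3, 2}};
        List<TreeSet<Integer>> adj = neighbors(4, triangles);
        for (int p = 0; p < adj.size(); p++) {
            System.out.println(p + " -> " + adj.get(p)); // e.g. "1 -> [0, 2, 3]"
        }
    }
}
```

Without such a table, the `pt` layout alone cannot distinguish an unstructured grid from scattered point observations.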
31
Data Types Summary
  • Data access through a standard API
  • Convenient georeferencing
  • Specialized subsetting methods
  • Efficiency for large datasets

32
Payoff: N + M instead of N × M things on your TODO
list!
  [Figure: File Format 1, File Format 2, ..., File Format N on one side;
  Visualization/Analysis, NetCDF file, OPeNDAP Server, WCS Service, and
  Web Service on the other]
33
THREDDS Data Server
  [Figure labels: Application; HTTP Tomcat Server; THREDDS Server
  (OPeNDAP, HTTPServer, WCS); Catalog.xml; NetCDF-Java library;
  hostname.edu; Datasets; IDD Data]
34
Next: DataType Aggregation
  • Work at the CDM DataType level, where (some) data
    semantics are known
  • Forecast Model Collection
    • Combine multiple model forecasts into a single
      dataset with two time dimensions
  • With NOAA/IOOS (Steve Hankin)
  • Point/Station/Trajectory/Profile Data
    • Allow space/time queries, return nested sequences
    • Start from / standardize Dapper conventions

35
Forecast Model Collections
36
Conclusion
  • Standardized Data Access in good shape
    • HDF5, NetCDF, OPeNDAP
    • Write an IOSP for proprietary formats (Java)
  • But that's not good enough!
  • To do
    • Standard representations of coordinate systems
    • Classifications of data types, standard services
      for them