Mike Folk, Elena Pourmal - PowerPoint PPT Presentation


1
Update on HDF: Sustaining and Growing Data Technology
  • Mike Folk, Elena Pourmal
  • National Center for Supercomputing Applications
  • University of Illinois at Urbana-Champaign
  • canSAS-IV RAL
  • May 13, 2004

2
Talk overview
  • Data management challenges, and evolution of data
    requirements and file formats
  • HDF in a nutshell
  • Current HDF development effort
  • Sustainability and growth of HDF

3
We are drowning in data and still starving for
information
  • Web Anonymous

4
Current data management challenge
  • Data diversity
  • Comes from mixed sources (experiment, simulation,
    testing, remote sensing, etc.)
  • Comes in different sizes (KB, MB, GB, TB, ...)
  • Comes in different forms and formats (ASCII,
    binary, community formats, proprietary formats)
  • Hard to share, archive, and mine
  • Leads to duplicated effort in visualization and
    mining tools
  • Results in high costs, lost opportunities to
    share, to access and to use already existing data

5
Current software management challenge
Stovepipe applications
  • Climate Model Application: produce data with unique
    model and format; preprocess; visualize, analyze;
    archive
  • Agricultural Monitoring: gather data with unique
    model and format; preprocess; visualize, analyze;
    archive
  • Weather satellite: gather data with unique model
    and format; preprocess; visualize, analyze; archive
6
Solution: Common data models and formats
Standards-based applications
  • Applications: Climate Model Application,
    Agricultural Monitoring, Weather satellite
  • Shared by all three: common data model and storage
    format
  • Steps that follow for each application: visualize,
    analyze; archive
7
Evolution of format and I/O requirements in the
last 15 years
  • 1990: I just need some numbers. Text is good.
    vi will do the trick!
  • 1991: I've got lots o' numbers and metadata.
    Fortran print!
  • 1992: I want to mix images, metadata, and other
    data. Better to use a multi-object binary format.
    HDF2 is cool!
  • 1994: My objects are complex. I need groups and
    tables. I like HDF4!
  • 1998: Whoa, this data is big! I need really big
    objects, parallel I/O. It's HDF5 for me!
  • 2004: You human peebles betta use NeXus, or I'll
    be baaaack!
8
HDF was created to address data management needs
in science and engineering, and to
provide building blocks for scientific communities'
standards
9
HDF in a nutshell
  • Comes from the HDF group at NCSA, University of
    Illinois
  • File format and I/O library for storing,
    managing, and archiving large, complex scientific
    and other data
  • Accommodates data of diverse origins, sizes, and
    types
  • Portable: available for almost all operating
    systems
  • Scalable: works in high-performance computational
    environments
  • Became the underlying file format for
    community-built standards such as NeXus, HDF-EOS,
    NPOESS

10
Example of HDF file mixing and grouping objects
[Figure: one HDF file drawn as a tree of named groups
and objects. Labels in the figure: foo, a, b, c, x, z,
_foo_y, 1GB. Objects shown:]
  • Text: "This file was created as a part of ...
    see http://hdf.ncsa.uiuc.edu"
  • Table:
      lat  lon  temp
      12   23   3.1
      15   24   4.2
      17   21   3.6
  • Raster image (two)
  • 2-D array
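
The grouping idea on this slide can be illustrated with a toy in-memory model. This is only a sketch in plain Python: groups are dictionaries, objects are leaves, and paths are "/"-separated. It does not use the HDF library or its API, and the names "note", "table", "image", and "array2d" are made up for illustration; only the table values come from the slide.

```python
# Toy model of a grouped file: groups are dicts, everything else is an object.
# NOT the HDF library API; object names below are illustrative assumptions.

def resolve(root, path):
    """Follow a '/'-separated path from the root group to an object."""
    node = root
    for part in path.strip("/").split("/"):
        if part:  # skip empty parts so "/" resolves to the root itself
            node = node[part]
    return node


def walk(group, prefix=""):
    """Yield (path, object) for every non-group object under a group."""
    for name, child in group.items():
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            yield from walk(child, path)
        else:
            yield path, child


root = {
    "note": "free-form text annotation",
    "foo": {
        # rows of (lat, lon, temp), values as printed on the slide
        "table": [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
        # tiny stand-in for a raster image
        "image": [[0, 255], [255, 0]],
    },
    "array2d": [[1.0, 2.0], [3.0, 4.0]],
}

table = resolve(root, "/foo/table")
paths = sorted(path for path, _ in walk(root))
```

Mixed object types (text, table, image, array) live side by side in one container and are reached by path, which is the property the slide's figure is pointing at.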
11
HDF Community
  • Broad range of disciplines and applications
  • We try to support all users
  • Provide open source libraries and tools
  • Provide user support, documentation
  • Encourage widespread use, vendor participation
  • Help to develop community standards
  • Example: NASA HDF-EOS and NPOESS
  • Data reached 4,000,000,000,000,000 bytes (4 PB)
  • NASA Aqua, Terra, and (soon) Aura satellites use
    HDF4 and HDF5
  • NPOESS will use HDF5 for data distribution

12
Current HDF development efforts: Major directions
  • Performance, library enhancements and tools to
    facilitate access to the HDF data
  • Help different communities with common data
    models and formats based on HDF
  • netCDF on top of HDF5 (Atmospheric Sciences)
  • Storage and visualization of the FEM and CFD data
    (ESA, NASA, US DOE Labs)
  • Bioinformatics (NIH, DNA sequencing)
  • Real-time data processing (Boeing)
  • Public records (NARA, geospatial data)
  • HDF sustainability (not-for-profit organization
    to support HDF effort)

13
Current releases
  • For details check http://hdf.ncsa.uiuc.edu
  • HDF5 1.6.2 and HDF4 r2.0
  • Bug fixes
  • Performance enhancements
  • New compression method: NASA SZIP (fast, better
    compression ratios)
  • Better configuration
  • New platforms: Linux 64, Altix, Mac OS X
  • Tools
  • repack, diff

14
Java Tools development
  • HDFView
  • Browse and edit HDF4 and HDF5 files
  • Modularized Java packages
  • Address the needs of the standards built on top
    of HDF
  • HDF-EOS browser: first successful prototype to
    browse HDF files using the HDF-EOS standard
  • Web browser plug-in to read HDF (current
    research)
  • http://hdf.ncsa.uiuc.edu/hdf-java-html/

15
HDF-EOS Browser
  • HDF-EOS objects
  • Swath
  • Grid
  • Point
  • Represented by HDF objects

16
HDF view of HDF-EOS data
17
Natural view of HDF-EOS data
18
netCDF4/HDF5
  • netCDF4
  • Funded by NASA
  • Extension to current netCDF
  • Built on top of HDF5
  • Used by atmospheric science community
  • http://my.unidata.ucar.edu/content/software/netcdf/netcdf-4/index.html

19
Storage and Visualization of FEM and CFD Data
  • A few examples
  • NASA CGNS standard for CFD applications
  • STEP/NRF standard for non-destructive tests,
    thermal data analysis
  • Abaqus internal file format for FEM calculations
    and visualization
  • EnSight internal file format for visualization of
    FEM and CFD data
  • Want to use a common model based on HDF5
  • First prototype: ftp://ftp.ensight.com/pub/HDF_RW/

20
HDF5 and Bioinformatics
  • Goals
  • Make DNA sequencing available to any educational,
    research, or clinical laboratory, and individual
    researchers
  • Solutions
  • Create common file format and data model based on
    HDF5 for solving alignment problems in DNA and
    protein sequence analysis
  • Develop visualization and analysis tools to work
    with raw data and to produce final publishable
    results
  • Develop general approach to address numerous
    sequencing activities

21
HDF5 and Bioinformatics
  • HDF5 challenges
  • Efficient handling of element deletions
  • Efficient handling of variable-length records
  • Request to support new data structures in HDF5
  • Linked lists
  • Hash tables
  • Sorting mechanisms
  • Multithreaded support

22
Real-time data processing (Boeing)
  • Challenge
  • Multiple data sources
  • Aircraft real-time test data (500Mb/sec)
  • Voice communications
  • Video data
  • Ground tracking data
  • Satellite/GPS
  • Simulations
  • Difficulty sharing data between the company's
    different divisions
  • Solution: use a common file format based on HDF5
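
To get a feel for the 500Mb/sec figure, here is a back-of-the-envelope calculation of how much data one hour of sustained streaming produces. The slide does not say whether "Mb" means megabits or megabytes, so the sketch computes both readings; it assumes decimal units (1 MB = 10**6 bytes), and the function name is made up for illustration.

```python
# Volume produced by a sustained stream, under two readings of "500Mb/sec".
# Assumptions: decimal units (mega = 10**6); helper name is illustrative.

def volume_bytes(rate_mega_per_sec, seconds, unit_is_bits):
    """Return total bytes written at the given rate over the given time."""
    bytes_per_sec = rate_mega_per_sec * 10**6
    if unit_is_bits:
        bytes_per_sec //= 8  # integer division; exact for a rate of 500
    return bytes_per_sec * seconds


ONE_HOUR = 3600  # seconds

hour_if_megabits = volume_bytes(500, ONE_HOUR, unit_is_bits=True)
hour_if_megabytes = volume_bytes(500, ONE_HOUR, unit_is_bits=False)

print(hour_if_megabits / 10**9, "GB per hour if Mb = megabits")
print(hour_if_megabytes / 10**12, "TB per hour if Mb = megabytes")
```

Under either reading, a single test flight of a few hours lands in the hundreds of gigabytes to several terabytes, before voice, video, and tracking data are added.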

23
Boeing: Variable-length array storage
24
Boeing: Variable-length array storage
  • Variable Length Array Storage in HDF5
  • Needed for flight test data systems
  • Must handle up to 500Mb/sec
  • Must handle raw, real-time and/or embedded data
  • NCSA implementing API to read/write data
  • Based on HDF5 table API
  • Potential applications to many domains
  • Part of effort to adopt HDF5 as Boeing-wide
    standard for engineering data
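
One common way to store records of differing lengths is a single flat value buffer plus an index of offsets. The sketch below shows that generic layout in plain Python. It is an illustration under stated assumptions only: it is not the HDF5 variable-length datatype, not the HDF5 table API, and not the API NCSA was implementing for Boeing. The class name and its methods are hypothetical.

```python
from array import array


class VarLenStore:
    """Append-only store: one flat value buffer plus an offsets index.

    Hypothetical illustration of a generic layout, not an HDF5 API.
    """

    def __init__(self, typecode="d"):
        self.values = array(typecode)   # all records, concatenated
        self.offsets = array("Q", [0])  # offsets[i] = start of record i

    def append(self, record):
        """Add one record of any length (including zero)."""
        self.values.extend(record)
        self.offsets.append(len(self.values))

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        """Return record i as a list, using two offset lookups."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.values[start:end].tolist()


store = VarLenStore()
for rec in ([1.0, 2.0, 3.0], [], [4.5]):
    store.append(rec)
```

Appends touch only the ends of the two buffers, which suits streaming writes; the trade-off is that deleting or resizing a record in the middle is expensive.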

25
National Archives and Records Administration
(NARA)
  • Huge challenges managing digital data
  • Investigate HDF5 as format for large and/or
    complex data records
  • Initial focus on geospatial data
  • Images (e.g. elevation models, aerial
    photography)
  • Features (e.g. boundaries, roads, rivers)
  • Results so far
  • HDF5 data model handles all data types
  • Feature data present access and size problems for
    HDF5
  • Research leading to good performance lessons

26
Sustaining HDF: not-for-profit organization
  • Investigating the idea of a not-for-profit
    organization dedicated to long-term
    sustainability of HDF-based technologies
  • HDF remains free and open
  • Funding similar to current mechanisms, plus
    consulting, donors, TBD.

27
HDF Information
  • HDF website
  • http://hdf.ncsa.uiuc.edu/
  • HDF Help email address
  • hdfhelp@ncsa.uiuc.edu
  • HDF users mailing list
  • hdfnews@ncsa.uiuc.edu

28
Acknowledgements
  • This report is based upon work supported in part
    by
  • Cooperative Agreement with NASA under NASA grant
    NAG 5-2040 and NAG NCCS-599. Any opinions,
    findings, and conclusions or recommendations
    expressed in this material are those of the
    author(s) and do not necessarily reflect the
    views of the National Aeronautics and Space
    Administration.
  • Lawrence Livermore National Laboratory contract
    DOE LLNL B507374 and B527300.
  • Electronic Records Archive of the US National
    Archives and Records Administration under grant
    number NARA NSF 02-02GPG
  • Boeing, NCSA and others
    (http://hdf.ncsa.uiuc.edu/acknowledge.html)