A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability

1
A Domain-Specific Modeling Language
for Scientific Data Composition and
Interoperability
Hyun Cho University of Alabama at Birmingham
Jeff Gray University of Alabama
2
File Formats: Image Files
  • Organize and store digital images that are
    composed of either pixel or vector (geometric)
    data
  • Bitmap-based
    • Created by scanners and digital cameras
    • TIF, JPG, BMP
  • Vector-based
    • Geometric description rather than a bitmap
    • Resolution independent, infinitely scalable
    • Font, DRW, CGM

3
File Formats: Music and Audio Files
  • Store audio data produced by analog-to-digital
    converters
  • Key parameters
    • Sample rate, resolution, number of channels
  • Uncompressed formats
    • WAV, AIFF, and AU
  • Lossless compression formats
    • FLAC, Lossless Windows Media Audio (WMA)
  • Lossy compression formats
    • MP3, Lossy Windows Media Audio (WMA)
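
The three key parameters fix the size of uncompressed audio: bytes per second = sample rate × (resolution / 8) × number of channels. The sketch below is only an illustration using Python's standard `wave` module; the helper names `pcm_byte_rate` and `write_silence` and the one-second silent clip are assumptions, not part of the presentation.

```python
import io
import wave


def pcm_byte_rate(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Bytes per second of uncompressed PCM audio."""
    return sample_rate * (bits_per_sample // 8) * channels


def write_silence(sample_rate: int, bits_per_sample: int, channels: int, seconds: int) -> bytes:
    """Return an in-memory WAV file holding `seconds` of silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bits_per_sample // 8)
        wav.setframerate(sample_rate)
        frame_size = channels * (bits_per_sample // 8)
        wav.writeframes(b"\x00" * (frame_size * sample_rate * seconds))
    return buffer.getvalue()


# CD-quality settings: 44.1 kHz, 16-bit, stereo.
wav_bytes = write_silence(44100, 16, 2, seconds=1)

# Read the key parameters back from the file header.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    params = (wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels())
    audio_bytes = wav.getnframes() * wav.getsampwidth() * wav.getnchannels()
```

Here `audio_bytes` for the one-second clip equals `pcm_byte_rate(44100, 16, 2)`, which is why uncompressed formats grow quickly with duration and channel count.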

4
File Formats: Text Files
  • File formats that are structured as plain text,
    representing a sequence of lines
  • ASCII, TXT

5
File Formats: Compound File Formats
  • Used to structure the contents of a document in
    the file
  • Contain a number of independent data streams that
    are organized in a hierarchy
    • Stream: like a file in a file system
    • Storage: like a sub-directory in a file system
  • MS Office, OpenOffice

6
Characteristics of Generic File Formats
  • Can handle only one or two data types
    • Numeric data or alphanumeric data
  • May limit the file size
    • Mostly limited to a maximum file size of 2 GB
  • File I/O time may increase linearly as the file
    size grows

These generic file formats are not appropriate
for storing and retrieving scientific data
because they were not designed to hold high
volumes of complex scientific data, such as
high-resolution images, massive numerical data,
and graphs.
An In-Depth Examination of Java I/O Performance and Possible Tuning Strategies
http://pages.cs.wisc.edu/remzi/Classes/736/Fall2000/Project-Writeups/KaiHongfei.html
8
Scientific Data Format: NetCDF3
  • Network Common Data Form
  • Machine-independent file format
    • Supports a wide variety of platforms, including
      Linux, MacOS, Windows
  • Represents multi-dimensional arrays with
    ancillary data


[Figure: multi-dimensional arrays indexed from Time 1 to Time n]
9
Scientific Data Format: HDF5
  • Hierarchical Data Format
  • File format for managing any kind of data
  • Supports high-volume and/or complex data
  • Platform-independent
  • Flexible, efficient storage and I/O

10
Characteristics of the Scientific Data File
Formats
  • Self-descriptive
    • Contain metadata that describes the types of
      the contained data and their organization
  • Directly accessible
    • Arbitrary data can be accessed through APIs
  • Concurrently accessible
    • Multiple threads or processes can access data
      simultaneously
    • Enables high-performance computing and faster
      access
  • Archivable
    • Have their own archiving mechanism to back up
      and restore a high volume of data
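
To make "self-descriptive" and "directly accessible" concrete, the following toy container may help. It is not the NetCDF or HDF5 on-disk layout: the JSON header, the function names (`write_container`, `read_header`, `read_element`), and the sample variables are all assumptions for illustration. The header records each variable's type, shape, and byte offset, so a reader can seek straight to one element without scanning the data.

```python
import io
import json
import struct


def write_container(variables: dict) -> bytes:
    """Pack {name: (shape, values)} of float64 arrays behind a JSON header."""
    header, payload = {}, b""
    for name, (shape, values) in variables.items():
        header[name] = {"dtype": "float64", "shape": list(shape), "offset": len(payload)}
        payload += struct.pack(f"<{len(values)}d", *values)
    header_bytes = json.dumps(header).encode("utf-8")
    # Layout: 4-byte little-endian header length, the header, then raw data.
    return struct.pack("<I", len(header_bytes)) + header_bytes + payload


def read_header(stream) -> tuple:
    """Return (metadata, byte position where the data section starts)."""
    stream.seek(0)
    (header_len,) = struct.unpack("<I", stream.read(4))
    metadata = json.loads(stream.read(header_len).decode("utf-8"))
    return metadata, 4 + header_len


def read_element(stream, name: str, index: tuple) -> float:
    """Read one element by seeking to it (row-major order)."""
    metadata, data_start = read_header(stream)
    info = metadata[name]
    flat = 0
    for dim, i in zip(info["shape"], index):
        flat = flat * dim + i
    stream.seek(data_start + info["offset"] + flat * 8)  # 8 bytes per float64
    (value,) = struct.unpack("<d", stream.read(8))
    return value


blob = write_container({
    "temperature": ((2, 3), [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]),
    "time": ((2,), [0.0, 6.0]),
})
stream = io.BytesIO(blob)
metadata, _ = read_header(stream)                      # self-descriptive
value = read_element(stream, "temperature", (1, 2))    # directly accessible
```

Real scientific formats add much more (chunking, compression, parallel I/O), but the same idea applies: the metadata travels with the data, and the API uses it to locate arbitrary elements.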

11
Challenges in Using the Scientific Data File
Formats
  • Formats use different representations to organize
    the file structure
    • Each file format needs its own data
      visualization and composition tools
    • It is difficult to exchange data between two or
      more scientific data formats
  • Managing the evolution of APIs
    • It is challenging to verify that APIs evolve in
      accordance with the evolution of the file
      specification
  • Keeping existing applications stable as APIs
    evolve
    • User applications are exposed to API changes
  • Limited support for data integration among
    heterogeneous scientific data formats

12
Framework for Scientific Data File Management
14
Model-Driven Engineering (MDE) and
Domain-Specific Modeling (DSM)
  • MDE specifies and generates software systems
    based on high-level models
  • Domain-Specific Modeling (DSM): a paradigm of MDE
    that uses notations and rules from an application
    domain
  • Metamodel: defines a domain-specific modeling
    language (DSML) by specifying the entities and
    their relationships in an application domain
  • Model: an instance of the metamodel
  • Model transformation: a process that converts one
    or more models to various levels of software
    artifacts (e.g., other models, source code)
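
The relationship between these terms can be sketched in a few lines. This is only an illustration: the metamodel contents (`File`, `Group`, `Dataset`), the `conforms` check, and the outline generator are assumed here and are not the presentation's actual DSML.

```python
# Metamodel: the entity kinds of the language and what each may contain.
METAMODEL = {
    "File": {"children": {"Group", "Dataset"}},
    "Group": {"children": {"Group", "Dataset"}},
    "Dataset": {"children": set()},
}


def conforms(node: dict, metamodel: dict = METAMODEL) -> bool:
    """A model is valid only if every node obeys the metamodel."""
    kind = node["kind"]
    if kind not in metamodel:
        return False
    for child in node.get("children", []):
        if child["kind"] not in metamodel[kind]["children"]:
            return False
        if not conforms(child, metamodel):
            return False
    return True


def transform(node: dict, depth: int = 0) -> list:
    """Model transformation: turn a model into a text artifact (an outline)."""
    lines = ["  " * depth + f'{node["kind"]} {node["name"]}']
    for child in node.get("children", []):
        lines.extend(transform(child, depth + 1))
    return lines


# Model: one instance of the metamodel (names are made up).
model = {
    "kind": "File", "name": "ocean.dat", "children": [
        {"kind": "Group", "name": "surface", "children": [
            {"kind": "Dataset", "name": "temperature"},
        ]},
    ],
}

is_valid = conforms(model)
outline = transform(model)
```

In a full MDE toolchain the transformation would emit source code or another model rather than an outline, but the division of roles is the same: the metamodel constrains, the model instantiates, and the transformation generates.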

15
Unifying the representation of file structure
organization
  • Use a DSML to build a tool for visualizing and
    composing the scientific file formats in a unified
    way

[Figure: workflow from analysis to tool. Analyze the data model of each scientific file format (Common Data Model, Variable Data Model, Feature Model); define the DSML from the Feature Model (Grammar, Syntax); implement the DSML (DSML Tool).]
16
Unifying the representation of file structure
organization
  • Feature Model for Scientific File Format

17
Unifying the representation of file structure
organization
  • Content Composer
    • DSML modeling tool for scientific data files
    • Implemented using GEMS

18
API Abstraction Layer
  • Helps protect user applications from the
    evolution of APIs

Abstraction:
    createFile(const char *path, FileCreationProperty fileCreationProperty)
NetCDF:
    int nc_create(const char *path, int cmode, int *ncidp)
HDF5:
    H5File(const char *name, unsigned int flags)
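
One way to picture the abstraction layer is as an adapter per format behind a single interface. The sketch below is a hedged illustration, not the framework's implementation: it does not call the NetCDF or HDF5 libraries, each adapter only returns a string naming the native call it would make, and the `overwrite` field and its mapping to the `NC_CLOBBER`/`NC_NOCLOBBER` and `H5F_ACC_TRUNC`/`H5F_ACC_EXCL` flags are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class FileCreationProperty:
    """Format-neutral creation options (the field is hypothetical)."""
    overwrite: bool = False


class FileBackend(ABC):
    """The abstraction: user code depends only on this interface."""

    @abstractmethod
    def create_file(self, path: str, prop: FileCreationProperty) -> str:
        ...


class NetCDFBackend(FileBackend):
    def create_file(self, path: str, prop: FileCreationProperty) -> str:
        # A real adapter would call nc_create(path, cmode, &ncid) here.
        cmode = "NC_CLOBBER" if prop.overwrite else "NC_NOCLOBBER"
        return f"nc_create({path!r}, {cmode}, &ncid)"


class HDF5Backend(FileBackend):
    def create_file(self, path: str, prop: FileCreationProperty) -> str:
        # A real adapter would construct H5File(name, flags) here.
        flags = "H5F_ACC_TRUNC" if prop.overwrite else "H5F_ACC_EXCL"
        return f"H5File({path!r}, {flags})"


def create_file(backend: FileBackend, path: str, prop: FileCreationProperty) -> str:
    """User-facing call; it stays the same when a native API changes."""
    return backend.create_file(path, prop)


calls = [
    create_file(NetCDFBackend(), "run.nc", FileCreationProperty(overwrite=True)),
    create_file(HDF5Backend(), "run.h5", FileCreationProperty(overwrite=True)),
]
```

If a native signature changes, only the matching adapter needs to be updated; code written against `create_file` is untouched, which is the protection the slide describes.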
19
Integrating data among heterogeneous data formats
  • Content Mapper
    • Defines rules for mapping data from one
      scientific data format to another
  • Content Verifier
    • Verifies the correctness of the file composition
    • Verifies the correctness of the mapping rules
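
A minimal sketch of how a mapper and a verifier might cooperate is shown below. The rule table (for example, mapping a source "variable" to a target "dataset"), the kind names, and the function names are hypothetical; the slides do not specify the actual rule language.

```python
# Hypothetical mapping rules: source-format concept -> target-format concept.
RULES = {
    "dimension": "dataspace",
    "variable": "dataset",
    "attribute": "attribute",
}


def verify_rules(rules: dict, source_kinds: set, target_kinds: set) -> list:
    """Content Verifier: report rules that are missing or point nowhere."""
    problems = [f"no rule for {kind}" for kind in sorted(source_kinds - set(rules))]
    problems += [
        f"{src} maps to unknown {dst}"
        for src, dst in sorted(rules.items())
        if dst not in target_kinds
    ]
    return problems


def map_content(items: list, rules: dict) -> list:
    """Content Mapper: rewrite each item's kind according to the rules."""
    return [{**item, "kind": rules[item["kind"]]} for item in items]


source_kinds = {"dimension", "variable", "attribute"}
target_kinds = {"dataspace", "dataset", "attribute", "group"}

source_items = [
    {"kind": "dimension", "name": "time"},
    {"kind": "variable", "name": "temperature"},
]

# Map only after the rules have been verified.
problems = verify_rules(RULES, source_kinds, target_kinds)
mapped = map_content(source_items, RULES) if not problems else []
```

Running the verifier before the mapper means an incomplete or inconsistent rule set is reported up front instead of failing partway through a conversion.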

20
Summary
  • Lessons from the prototype of the framework
    • A DSML can help to build a graphical tool to
      compose scientific file structures and support
      interoperability across them
    • Adopting a layered architecture in the
      framework helps keep each layer independent
    • Both the API abstraction layer and the layered
      architecture are essential for developing and
      maintaining user applications
  • Future work
    • Create metamodels that include the full
      specification of each scientific file format
    • Categorize APIs according to their intended
      use for the API abstraction layer
    • Develop metamodels for managing API evolution

21
Thank you!
22
Example of Scientific Data Format: OPeNDAP
  • Client-server protocol for scientific data access
  • Targets oceanographic data management