Introduction to the BinX Library

1
Introduction to the BinX Library
  • eDIKT project team
  • Ted Wen tedwen_at_nesc.ac.uk
  • Robert Carroll robertc_at_nesc.ac.uk

2
Agenda
  • About the BinX project
  • A brief introduction to the BinX language
  • Introduction to the BinX library
  • Advanced API to the BinX library
  • Use cases and requirements
      • Dr Bob Mann
      • Dr Chris Maynard
  • Discussion

3
About the BinX project
4
The problem
  • XML is useful to represent metadata
  • Scientific datasets can be too large in XML
  • Most scientific data are in binary files
  • Binary data files are not all standardized
  • Binary data files are platform-dependent

5
BinX: a solution
  • Initially designed for the Grid environment
  • Annotate data schema for any binary file
  • Data elements are marked up in XML
  • Describe three levels of features in a binary file
      • Underlying physical representation (byte order)
      • Primitive data types (integer, float)
      • Structure of the dataset (array, table)
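
The lowest of the three levels, byte order, can be illustrated with Python's standard `struct` module. This is only a sketch of the idea, not the BinX library itself; the helper `read_short` and the label `littleEndian` are assumptions made for illustration.

```python
import struct

value = 100  # a 16-bit integer, comparable to a <short-16> element

big = struct.pack(">h", value)     # big-endian: most significant byte first
little = struct.pack("<h", value)  # little-endian: least significant byte first


def read_short(raw: bytes, byte_order: str) -> int:
    """Decode a 16-bit signed integer given a byte-order label (hypothetical helper)."""
    prefix = {"bigEndian": ">", "littleEndian": "<"}[byte_order]
    return struct.unpack(prefix + "h", raw)[0]
```

Reading the same two bytes with the wrong byte order yields a different number, which is why the byte order has to be recorded alongside the data.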

6
The BinX project at eDIKT
  • Implement a software library for BinX
  • Develop a series of tools based on the library
  • Use C++ for performance
  • Write portable code for different platforms
  • Make it robust and easy to use

7
Development status
  • Requirements gathering from July 2002
  • Development started in October 2002
  • Prototype finished in December 2002
  • Alpha version complete in April 2003
  • Beta version to be released in June 2003

8
The deliverables
  • The BinX library
      • Compiled code on different platforms
      • Source code with Open Source license
  • Documentation
      • User's guide
      • Developer's guide
  • Utilities and examples

9
The BinX Language
10
What is BinX?
  • The Binary XML Description Language
  • A language for annotating binary data files
  • It describes data types, data structures and
    attributes such as byte order
  • A BinX document is an XML file with metadata of a
    binary data file

11
A BinX document
  <binx byteOrder="bigEndian">
    <definitions>
      <defineType typeName="myTyp">
        <arrayFixed>
          <character-8/>
          <dim indexTo="9"/>
        </arrayFixed>
      </defineType>
    </definitions>
    <dataset src="myfile.bin">
      <useType typeName="myTyp"/>
      <integer-32 varName="X"/>
    </dataset>
  </binx>

Callouts on the slide:
  • Root element: <binx>
  • Data class section: <definitions>
  • Abstract data type: <defineType>
  • Data instance section: <dataset>
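
As a rough illustration of how such a document could drive reading of the binary file, the sketch below turns this slide's document (with attribute quoting restored) into a `struct` format string. It uses only Python's standard library, not the BinX library. It assumes that `indexTo` is an inclusive, zero-based upper bound, so `indexTo="9"` means ten characters; if the count is instead equal to `indexTo`, drop the `+ 1`. The function name `layout` is hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

BINX_DOC = """
<binx byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X"/>
  </dataset>
</binx>
"""


def layout(doc: str) -> str:
    """Build a struct format string for the dataset section (handles only this example)."""
    root = ET.fromstring(doc)
    fmt = ">" if root.get("byteOrder") == "bigEndian" else "<"
    types = {d.get("typeName"): d for d in root.iter("defineType")}
    for elem in root.find("dataset"):
        if elem.tag == "useType":
            array = types[elem.get("typeName")].find("arrayFixed")
            # Assumption: indexTo is inclusive and indexing starts at 0.
            count = int(array.find("dim").get("indexTo")) + 1
            fmt += f"{count}s"
        elif elem.tag == "integer-32":
            fmt += "i"
    return fmt


# Sample bytes standing in for myfile.bin: ten characters, then a 32-bit integer.
sample = b"HelloBinX!" + struct.pack(">i", 42)
name, x = struct.unpack(layout(BINX_DOC), sample)
```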
12
Data elements
  • Primitive data elements
      • Byte, character, integer, real
  • Complex data elements
      • Arrays, struct, union
  • User-defined data elements

13
Primitive data types
  • Bit
      • <bit-1>
  • Character
      • <character-8>
      • <unicode-16>
      • <unicode-32>
  • Integer
      • <byte-8>
      • <short-16>, <unsignedShort-16>
      • <integer-32>, <unsignedInteger-32>
      • <long-64>, <unsignedLong-64>
  • Real
      • <float-32>
      • <double-64>
      • <quadruple-128>
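
The numeric suffix of each element name gives its width in bits. A hypothetical mapping from some of these names to Python `struct` format codes makes that concrete; the table below is an illustration only, the signedness chosen for `byte-8` is an assumption, and `bit-1`, the unicode types and `quadruple-128` are left out because `struct` has no direct equivalent.

```python
import struct

# Hypothetical mapping from BinX primitive element names to struct format codes.
STRUCT_CODES = {
    "character-8": "c",
    "byte-8": "B",  # assumption: treated as unsigned here
    "short-16": "h",
    "unsignedShort-16": "H",
    "integer-32": "i",
    "unsignedInteger-32": "I",
    "long-64": "q",
    "unsignedLong-64": "Q",
    "float-32": "f",
    "double-64": "d",
}


def bit_width(type_name: str) -> int:
    """Return the width in bits encoded in the element name, e.g. 16 for short-16."""
    return int(type_name.rsplit("-", 1)[1])


def byte_size(type_name: str) -> int:
    """Return the size in bytes of the mapped struct code (standard sizes, no padding)."""
    return struct.calcsize(">" + STRUCT_CODES[type_name])
```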

14
Complex data types
  • Arrays
      • Repetitive collection of any data element
      • Multidimensional
      • Three types of arrays
          • Fixed-length array
          • Variable-length array
          • Streamed array
  • Struct
      • A sequence of data elements
  • Union
      • One of a group of possible data elements,
        conditional on the discriminant

15
Arrays
  • Streamed array
      <arrayStreamed>
        <byte-8/>
        <dim/>
      </arrayStreamed>
  • Fixed-length array (3-dimensional, 5 × 4 × 3)
      <arrayFixed>
        <double-64/>
        <dim indexTo="5" name="Z">
          <dim indexTo="4" name="Y">
            <dim indexTo="3" name="X"/>
          </dim>
        </dim>
      </arrayFixed>
  • Variable-length array (2-dimensional, ? × 7)
      <arrayVariable sizeRef="byte-8">
        <float-32/>
        <dim>
          <dim indexTo="7"/>
        </dim>
      </arrayVariable>
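
A variable-length array needs its unknown dimension to be stored with the data. The sketch below shows one plausible reading, in plain Python rather than the BinX library: it assumes that a single count byte (the `byte-8` named by `sizeRef`) sits immediately before the array data and gives the length of the outer dimension. The function name and the `row_len` parameter are hypothetical; the row length is passed in so the sketch takes no position on how `indexTo` maps to an element count.

```python
import struct


def read_variable_array(raw: bytes, row_len: int, byte_order: str = ">"):
    """Read a count byte, then that many rows of `row_len` 32-bit floats."""
    (count,) = struct.unpack_from("B", raw, 0)  # assumed size prefix
    row_fmt = f"{byte_order}{row_len}f"
    row_size = struct.calcsize(row_fmt)
    rows = []
    offset = 1  # skip the count byte
    for _ in range(count):
        rows.append(list(struct.unpack_from(row_fmt, raw, offset)))
        offset += row_size
    return rows


# Two rows of two floats each, preceded by the count byte.
sample = bytes([2]) + struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)
rows = read_variable_array(sample, row_len=2)
```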

16
Struct
  <struct>
    <short-16 varName="ID"/>
    <integer-32 varName="Count"/>
    <double-64 varName="Var"/>
  </struct>
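
For comparison, the same three-field record can be described with Python's `struct` module. This is a sketch assuming big-endian data with no padding between fields; the `Record` tuple is introduced here only for illustration.

```python
import struct
from collections import namedtuple

# Field names follow the varName attributes of the struct on this slide.
Record = namedtuple("Record", ["ID", "Count", "Var"])

# ">" selects big-endian with standard sizes and no padding:
# short-16 (h) + integer-32 (i) + double-64 (d) = 2 + 4 + 8 bytes.
RECORD = struct.Struct(">hid")

raw = RECORD.pack(7, 1000, 2.5)
record = Record(*RECORD.unpack(raw))
```

With native alignment the same fields may occupy more bytes because of padding, which is one of the conversions a description language has to account for.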

17
Union
  <union>
    <discriminant>
      <byte-8/>
    </discriminant>
    <case discriminantValue="32">
      <float-32/>
    </case>
    <case discriminantValue="64">
      <double-64/>
    </case>
    <case discriminantValue="0">
      <void-0/>
    </case>
  </union>
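
Reading such a union amounts to reading the discriminant first and then choosing the payload type. A minimal sketch in plain Python follows, assuming the discriminant byte immediately precedes the payload and the data is big-endian; `read_union` is a hypothetical name.

```python
import struct

# Discriminant value -> struct format of the payload (None for void-0).
CASES = {32: ">f", 64: ">d", 0: None}


def read_union(raw: bytes):
    """Return (discriminant, value) for one union instance at the start of `raw`."""
    tag = raw[0]
    fmt = CASES[tag]  # raises KeyError for an undeclared case
    if fmt is None:
        return tag, None
    (value,) = struct.unpack_from(fmt, raw, 1)
    return tag, value
```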

18
User-defined data type
  <defineType typeName="HeaderStruct">
    <struct>
      <character-8 varName="A"/>
      <character-8 varName="B"/>
      <integer-32 varName="Length"/>
    </struct>
  </defineType>

19
Data elements as instances
  <dataset src="myfile.bin">
    <short-16 varName="id"/>
    <arrayFixed varName="name">
      <character-8/>
      <dim indexTo="7"/>
    </arrayFixed>
    <struct varName="record">
      <short-16/>
      <float-32/>
    </struct>
  </dataset>

20
Reference defined elements
  <definitions>
    <defineType typeName="A">
      <struct>
        <short-16/>
        <integer-32/>
      </struct>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">
    <useType typeName="A" varName="FirstUse"/>
    <useType typeName="A" varName="SecondUse"/>
  </dataset>
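
Resolving a `useType` reference is essentially a lookup by `typeName`. The following sketch uses Python's standard XML parser, not the BinX library; the enclosing `<binx>` root and the function name `resolve` are assumptions added so that the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

DOC = """
<binx>
  <definitions>
    <defineType typeName="A">
      <struct>
        <short-16/>
        <integer-32/>
      </struct>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">
    <useType typeName="A" varName="FirstUse"/>
    <useType typeName="A" varName="SecondUse"/>
  </dataset>
</binx>
"""


def resolve(doc: str):
    """List each useType instance with the member types of the struct it refers to."""
    root = ET.fromstring(doc)
    types = {d.get("typeName"): d for d in root.iter("defineType")}
    resolved = []
    for use in root.find("dataset").iter("useType"):
        members = [m.tag for m in types[use.get("typeName")].find("struct")]
        resolved.append((use.get("varName"), members))
    return resolved
```

Both instances share one definition, so a change to type `A` applies to every place it is used.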

21
The BinX Library
  • Alpha version

22
Fundamental requirements
  • Access to data elements in binary files via BinX
      • Parse the BinX document
      • Build in-memory data structures
      • Read data values from the binary file
  • Automatic conversion
      • Byte ordering
      • Padding
  • Producing BinX documents and binary data
      • Generate a BinX document for data structures
      • Save assigned data values into binary files

23
General use cases
  • Data conversion (byte order)
  • Data extraction (sub-dataset)
  • Data combination (two arrays to one)
  • Data presentation (browse, pure XML)

24
BinX Components
  • The library has core functionality to support
    generic utilities and applications

[Diagram: three layers. BinX Library Core: BinX core functionality (parse BinX document, read binary data). Utilities: generic tools (data conversion, extraction, packing/unpacking). Applications: domain-specific.]
25
The BinX library core
  • Input: SchemaBinX, binary data file
  • Output: DataBinX, in-memory dataset

[Diagram: the BinX library reads a SchemaBinX document (<dataset> </dataset>) and a binary file (0101010101), and produces an in-memory data structure (values loaded on demand) and DataBinX (<short-16>100</short-16>).]
26
The BinX Utilities
  • DataBinX generator
  • DataBinX splitter
  • SchemaBinX creator
  • Binary file indexer

27
DataBinX generator
  • Put binary data inside XML
  • For browsing, web service return, query result
    set

[Diagram: a SchemaBinX document (<dataset> </dataset>) and binary data (0101010101) pass through the BinX library to produce DataBinX (<short-16>100</short-16>).]
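
The idea of putting binary values inside XML can be sketched with Python's standard library. The function below is a hypothetical stand-in for the generator, not its actual interface: it takes a list of (element name, `struct` code) pairs in place of a parsed SchemaBinX document and emits the decoded values as element text.

```python
import struct
import xml.etree.ElementTree as ET


def to_databinx(fields, raw: bytes, byte_order: str = ">") -> str:
    """Decode `raw` field by field and wrap each value in an XML element."""
    dataset = ET.Element("dataset")
    offset = 0
    for tag, code in fields:
        fmt = byte_order + code
        (value,) = struct.unpack_from(fmt, raw, offset)
        offset += struct.calcsize(fmt)
        ET.SubElement(dataset, tag).text = str(value)
    return ET.tostring(dataset, encoding="unicode")


xml_text = to_databinx([("short-16", "h")], b"\x00\x64")
```

Two bytes of input become several dozen characters of markup, which suggests why this form suits browsing and small result sets rather than bulk storage.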
28
DataBinX splitter
  • The reverse of DataBinX generator
  • Generate binary file for testing, transportation
  • Cross-platform (byte order)

[Diagram: DataBinX (<short-16>100</short-16>) passes through the BinX library to produce a SchemaBinX document (<dataset> </dataset>) and binary data (0101010101).]
29
SchemaBinX creator
  • GUI and Web-based utilities
  • Build BinX document interactively
  • Create a BinX document based on another

30
Binary file indexer
  • Generating indices for binary data files
  • Such indices can be used for fast data access

[Diagram: a SchemaBinX document (<dataset> </dataset>) and a binary file (0101010101) are fed to the BinX library.]
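
The slides do not say what form the indices take. One simple possibility, shown here purely as an assumption, is a list of byte offsets for fixed-size records, which allows seeking directly to record `i` without reading the records before it. The name `build_index` and its parameters are hypothetical.

```python
import struct


def build_index(record_fmt: str, record_count: int, header_size: int = 0):
    """Return the byte offset of each fixed-size record in a binary file."""
    size = struct.calcsize(record_fmt)
    return [header_size + i * size for i in range(record_count)]


# Three records of short-16 + integer-32 + double-64 after an 8-byte header.
offsets = build_index(">hid", 3, header_size=8)
```

Variable-length elements would need the file to be scanned once to record where each element starts.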
31
Applications for astronomy
  • FITS and VOTable conversion

[Diagram: a FITS file (SIMPLE = T, END, 01010101) is converted by the DataBinX utility, built on the BinX library core, into a VOTable document (<?xml version ...> <VOTABLE> </VOTABLE>).]
32
FITS → DataBinX → VOTable
  • FITS to VOTable conversion

[Diagram: FITS to VOTable pipeline. Components shown: FITS, preprocessor, SchemaBinX, DataBinX utility, DataBinX, XSLT, XSLT transformer, VOTable.]
33
VOTable → DataBinX → FITS
  • VOTable to FITS conversion

[Diagram: VOTable to FITS pipeline. Components shown: VOTable, XSLT, XSLT transformer, DataBinX, preprocessor, SchemaBinX, DataBinX utility, binary data, FITS header, post processor, FITS.]
34
FITS-VOTable experiment
  • Sample FITS file
      • A data table of 82 rows × 20 fields
      • File size: 37 KB
  • Generated DataBinX by the DataBinX utility
      • Time spent: 268 ms
      • DataBinX document size: 1.2 MB
  • VOTable transformed by MSXML
      • Time spent: about 1 second
      • VOTable document size: 51 KB

35
Possible future releases
  • DataBinX parsing
  • Utilities (GUI BinX editor)
  • XPath-based data query
  • DFDL support
  • Preserving special tags
      • For comments, application-specific tags
  • Text file support

36
Features or issues to consider
  • Converting floating point numbers
      • 80-bit, 96-bit, 128-bit floating point
  • Array manipulation (slice, section)
  • SAX-based XML document parsing
      • Use cases in place of DOM parsing
      • Built into the library or as an add-on component?
  • Database support
      • Annotating database tables?
      • Querying database tables through BinX?
  • Java version of the library
      • Keeping exactly the same features as the C++ version?
  • Supporting XQuery
      • Query binary data files with XQuery on BinX

37
Support
  • For problems of usage
      • http://www.edikt.org/binx (coming soon)
      • support_at_edikt.org
  • For requirements and suggestions
      • tedwen_at_edikt.org
      • robertc_at_edikt.org