Diverse data to diverse visualization systems end to end

UK e-Science AHM2004 - 1 September 2004


1
  • Diverse data to diverse visualization systems end
    to end
  • Julian Gallop
  • CCLRC Rutherford Appleton Laboratory

2
Outline
  • The present situation
  • Introduction to solution
  • More details
  • Assessment and further work
  • Acknowledgements

3
Diversity
  • Diversity of data sources
  • Text, de facto, legacy
  • Comma Separated Values (CSV)
  • Text values allowed/disallowed
  • Missing values
  • netCDF
  • HDF5
  • FEA data
  • Growth of XML-based data
  • Diversity of visualization systems
  • VisAD
  • Matlab
  • AVS
  • Iris Explorer
  • vtk
  • gnuplot
  • IDL
  • PV3
  • ArcInfo
  • XMDV
  • Excel
  • R
  • Although we refer mainly to visualization
    systems, most of what follows applies to more
    general data analysis tools too

4
Effect of the Grid on diversity
  • Grid developments aim to bring about more
    effective use of data
  • Find and access any data that you are entitled to
    use
  • Trend towards using XML for descriptive purposes
  • However, the diversity of data structures
    remains
  • Valuable (even irreplaceable) legacy data
    holdings will continue
  • XML developments initiated within application
    domains (e.g. marine data, earth science)
  • Grid Virtual Organisations (VOs) form → change
    → disperse
  • Suppose we have a collaborating group
  • Multidisciplinary knowledge of multiple data
    sources
  • Multiple preferred visualization systems

5
  • So, there is still a gap to be bridged between
    multiple data formats and models and multiple
    preferred visualization systems
  • Conventional approaches to this:
  • "No problem, I only use one combination"
  • or "That's easy, I'll write a converter"
  • or the collaborating team agrees to use just one
    viz system
  • But a Grid-enabled VO encourages teams that
  • form, change and disperse
  • and are multidisciplinary

[Figure labels. Data sources: precious legacy data; satellite data (HDF5); new data; Joe Bloggs' data; application-oriented XML. Visualization systems: programming/script oriented, e.g. Matlab; MVE, e.g. Iris Explorer; toolkit, e.g. VisAD, PV3. The connection between the two groups is marked "??".]
6
Many characterisations of data sources
  • legacy / current / being planned
  • de facto / self-describing non-XML / XML
  • metadata: none / some / good
  • application dependent / independent
  • text / binary
  • access: private (by intent / by default) /
    restricted / public
  • spatial / non-spatial
  • regular / irregular
  • references: none / rich (e.g. FEA cells, GIS,
    networks)
  • dimensions: single / three / many
  • DBMS / or not
  • defined by API / format

7
Many characterisations of visualization and data
analysis systems
  • fixed function
  • adaptable by API / scripts / visual networks
  • API: C / Java / Python / etc.
  • dimensions: single only / volume / multivariate
  • regular only / irregular possible
  • formats readable:
  • native / other popular / netCDF / HDF(5) /
    limited XML
  • purchase cost: none / cheap / expensive

8
  • Investigate whether we can do this instead

[Figure labels. Data sources: precious legacy data; satellite data (HDF5); new data; Joe Bloggs' data; application-oriented XML. Visualization systems: programming/script oriented, e.g. Matlab; MVE, e.g. Iris Explorer; toolkit, e.g. VisAD, PV3.]
Investigate moving from m × n to m + n. In more
detail: from A × m × n to B × m + C × n + D (and avoid
making B, C, D too big)
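
The scaling argument on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes m is the number of data sources and n the number of visualization systems (the slide does not define them explicitly), and the function names and example cost values are hypothetical.

```python
def pairwise_cost(a: float, m: int, n: int) -> float:
    """Cost of one dedicated converter per (data source, vis system) pair: A x m x n."""
    return a * m * n


def bridged_cost(b: float, c: float, d: float, m: int, n: int) -> float:
    """Cost of one description per source and per vis system, plus a fixed part: B x m + C x n + D."""
    return b * m + c * n + d


if __name__ == "__main__":
    # Hypothetical unit costs; the slide only says B, C and D should not be too big.
    a, b, c, d = 1.0, 1.5, 1.5, 5.0
    for m, n in [(2, 2), (5, 3), (12, 12)]:
        print(f"m={m:2d} n={n:2d}  pairwise={pairwise_cost(a, m, n):6.1f}  "
              f"bridged={bridged_cost(b, c, d, m, n):6.1f}")
```

With small m and n the bridged approach can cost more than writing converters directly, which is why the slide warns against letting B, C and D grow.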
9
A possible framework
  • This work is investigating a possible framework.
  • Some general principles
  • Make use of XML for description: it is widely
    accepted and supported by a wide range of
    conversion tools
  • Convert description to XML as soon as possible in
    the chain
  • No requirement for tagging each datum with XML
    (unless the data source does this already)
  • Adopt a descriptive approach, not prescriptive
    (approach followed by BinX, DFDL, ESML)
  • Decompose transformations into single purpose
    components, which could potentially be located in
    different places
  • Use existing tools such as XSLT or, when more
    complexity is required, XQuery
  • Avoid gross loss of speed, e.g. avoid
    repeated conversions of very large datasets
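
The principle of describing data in XML without tagging each datum can be sketched as follows. This is a minimal illustration, not the framework's actual markup: the element and attribute names (`dataset`, `array`, `rows`, `cols`, `type`, `byteOrder`) are invented for the example, and only 32-bit floats are handled.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description document: the XML describes the layout only,
# while the values themselves stay in an untagged binary payload.
DESCRIPTION = """
<dataset name="example">
  <array rows="2" cols="3" type="float32" byteOrder="little"/>
</dataset>
"""


def read_array(description_xml: str, payload: bytes) -> list:
    """Decode a binary payload using the layout given in the XML description."""
    root = ET.fromstring(description_xml.strip())
    arr = root.find("array")
    rows = int(arr.get("rows"))
    cols = int(arr.get("cols"))
    if arr.get("type") != "float32":
        raise ValueError("this sketch only handles float32")
    prefix = "<" if arr.get("byteOrder") == "little" else ">"
    values = struct.unpack(f"{prefix}{rows * cols}f", payload)
    # Values are assumed to be stored row by row.
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


# Build a small payload to stand in for a real data file.
payload = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
grid = read_array(DESCRIPTION, payload)
print(grid)
```

The description is a few lines long however large the payload is, so converting the description early in the chain is cheap.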

10
  • 2 approaches
  • (1) Use an intermediate form
  • Data source → bridge → ready for vis system
  • Presence of intermediate form may make this
    easier to understand
  • Suitable for small amounts of data
  • (2) Convert in one transformation
  • Requires analysis of data source and vis system
  • Requires creating a converter instance
  • Using one transformation may be suitable for
    a large data object
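
The difference between the two approaches can be sketched on a toy case. The example below assumes a line-oriented text source where only the field separator differs between source and target; the function names are hypothetical and the naive separator handling ignores quoted fields.

```python
from typing import Callable, Iterable, Iterator, List


def to_intermediate(lines: Iterable[str], sep: str) -> List[List[str]]:
    """Approach 1, step 1: materialise the whole source in a neutral form."""
    return [line.rstrip("\n").split(sep) for line in lines]


def from_intermediate(rows: List[List[str]], sep: str) -> List[str]:
    """Approach 1, step 2: write the neutral form in the target's layout."""
    return [sep.join(row) for row in rows]


def make_converter(src_sep: str, dst_sep: str) -> Callable[[Iterable[str]], Iterator[str]]:
    """Approach 2: analyse both ends once, then build one single-pass converter instance."""
    if src_sep == dst_sep:
        # Easy case: nothing to convert, so lines pass through unchanged.
        return lambda lines: (line.rstrip("\n") for line in lines)
    return lambda lines: (line.rstrip("\n").replace(src_sep, dst_sep) for line in lines)


source = ["1,2,3\n", "4,5,6\n"]

# Approach 1 holds the full intermediate form in memory.
two_step = from_intermediate(to_intermediate(source, ","), "\t")

# Approach 2 streams line by line and never builds an intermediate copy.
one_pass = list(make_converter(",", "\t")(source))

print(two_step == one_pass)
```

Both routes give the same result here; the single-pass converter trades the readability of an explicit intermediate form for lower memory use on large inputs.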

11
Converting metadata
  • Metadata has several aspects which include
  • Essential information about the data
  • e.g. circumstances in which the data was obtained
  • Other work focusses on these aspects
  • Here, we provide a mechanism for delivering them
    to the visualization/analysis system
  • Structure of the data
  • We focus on this here

12

[Figure labels. Three stages: data source; DataBridgeML; input to visualization / analysis system. Each stage holds metadata, divided into "all except structure" and "structure info". A convert step sits between successive stages, drawing on data source expertise and on visualization/analysis expertise. The data object is shown separately.]
Metadata is converted using a bridge ML, referred to
at present as DataBridgeML. Conversion of the data
object is deferred.
13
Converting the large data object
  • Next, we deal with converting the large data
    object.
  • For performance reasons, we wish to avoid
    converting this twice

14

[Figure labels. Three stages: data source; DataBridgeML; input to visualization / analysis system. Each stage holds structure info. A single Convert step takes the data object to the data object for the visualization system, using the vis system capabilities. Data source expertise and visualization/analysis expertise are marked as in the previous figure.]
Conversion of the large data object. The converter
depends on structure information and the
visualization system capabilities.
15
Converting the data structure
  • Actions of the data converter include
  • Resequence; extract; convert ASCII/binary;
    convert representation of member elements;
    manage separators; split/combine files
  • In each specific instance, actions required
    depend on
  • Description of the data structure of the data
    source
  • Description of data capabilities of the
    visualization system
  • Description of the required subset
  • Recognise easy cases e.g.
  • no conversion required
  • no resequencing required
  • Note: the conversion itself does not require XML
    processing; XML is only used to describe it.
  • Current status: still under investigation
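
How the required actions could be derived from the three descriptions is sketched below. This is an assumption-laden illustration, not the project's converter: the class names, field names and action labels are hypothetical, and only three of the listed actions are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceStructure:
    encoding: str  # "ascii" or "binary"
    order: str     # "row-major" or "column-major"


@dataclass(frozen=True)
class VisCapabilities:
    encodings: frozenset  # encodings the vis system can read
    orders: frozenset     # element orderings the vis system can read


def plan_actions(src: SourceStructure, vis: VisCapabilities,
                 subset: Optional[str] = None) -> List[str]:
    """List the converter actions needed; an empty list is the easy case."""
    actions = []
    if subset is not None:
        actions.append("extract")
    if src.order not in vis.orders:
        actions.append("resequence")
    if src.encoding not in vis.encodings:
        actions.append("convert ASCII/binary")
    return actions


def resequence_to_row_major(flat: list, rows: int, cols: int) -> list:
    """Reorder a flat column-major sequence into row-major order."""
    return [flat[c * rows + r] for r in range(rows) for c in range(cols)]


src = SourceStructure(encoding="ascii", order="column-major")
vis = VisCapabilities(encodings=frozenset({"binary"}), orders=frozenset({"row-major"}))
print(plan_actions(src, vis))
print(resequence_to_row_major([1, 4, 2, 5, 3, 6], rows=2, cols=3))
```

The planning step works purely on descriptions, which matches the note that XML describes the conversion but is not needed while the data values themselves are being moved.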

16
Use of XML
  • So, we need XML languages for
  • the data bridge
  • specifying the subset to be delivered
  • specifying the visualization system capabilities
    for reading data

17
Candidates for DataBridgeML
  • Requires
  • Ability to describe wide range of data sources
  • Need high level description of data structure
    (e.g. arrays and tables, not just the low level
    detail). It needs to support the data object
    converter instance, so experimentation with the
    markup is needed here.
  • Need to be able to specify how the dataset is
    accessed e.g. ftp, DODS, user/password
    required, HTML table (or could separate this
    out). Current Grid middleware developments could
    simplify this (e.g. OGSA-DAI)
  • Some relevant existing markup languages
  • BinX
  • DFDL: being discussed within the Global Grid Forum
  • XDF: developed at NASA Goddard Centre to convert
    their archives
  • Currently using slightly modified XDF in the
    interim, but also tracking DFDL developments.

18
Data source example
  • Gridded Population of the World at CIESIN at
    Columbia University.
  • Contains the population for each
    latitude/longitude cell
  • Available as a directory of files via FTP
  • Description file for whole dataset
  • Dataset divided into files by continent
  • For each continent file
  • 2 metadata files: contents include extent of
    data in longitude and latitude; number of
    entries on each axis; projection
  • 1 file containing header and data
  • Header includes value for missing data
  • Although comparatively simple, it raises issues
    regarding metadata: interpreting it requires
    knowledge of the circumstances
  • Some metadata is duplicated, but not identical
  • No units easily extractable (000s, 000000s ?)
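
Reading a file of this kind, with a header that declares the missing-data value, might look like the sketch below. The actual file layout is not given on the slide, so the header keys (`ncols`, `nrows`, `NODATA_value`), the fixed three-line header and the sample values are all assumptions made for illustration.

```python
import io

# Invented sample in an assumed "key value" header layout followed by grid rows.
SAMPLE = """ncols 3
nrows 2
NODATA_value -9999
12 -9999 7
0 5 -9999
"""


def read_grid(stream, header_lines: int = 3):
    """Parse the header, then the grid, mapping the missing-data value to None."""
    header = {}
    for _ in range(header_lines):
        key, value = stream.readline().split()
        header[key] = float(value)
    missing = header["NODATA_value"]
    grid = []
    for line in stream:
        if not line.strip():
            continue  # skip blank lines
        grid.append([None if float(tok) == missing else float(tok)
                     for tok in line.split()])
    # Check the grid against the axis sizes declared in the header.
    if len(grid) != int(header["nrows"]) or any(len(r) != int(header["ncols"]) for r in grid):
        raise ValueError("grid shape does not match header")
    return header, grid


header, grid = read_grid(io.StringIO(SAMPLE))
print(grid)
```

Nothing in such a header says whether a cell value counts people, thousands or millions, which is the units problem noted above; that information has to come from the data source expert.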

19
Assessment (1/2)
  • Diversity of data sources and visualization
    systems likely to continue.
  • Framework for transforming metadata and data
    structure from data sources to visualization
    systems is presented here. It splits the
    knowledge required between data source expertise
    and visualization system expertise
  • Concept appears to be feasible, but further work
    needed (next slide)
  • Framework could be populated with a (distributed)
    catalogue of data source descriptions and
    visualization system descriptions.

20
Assessment (2/2)
  • Further work
  • Further prototyping needed, particularly on
    specifying the conversion of data objects
  • Requires further tests of XML-based descriptions
    and investigation of the relation to other
    initiatives such as BinX and DFDL (daffodil)
  • Need to validate feasibility with more complex
    cases (more complex data sources, adaptable
    visualization systems)

21
Acknowledgements
  • This investigation has been part of the gViz
    project (Visualization Middleware for e-Science)
  • Project in the e-Science core programme, ended 31
    July 2004
  • Partners
  • Universities of Leeds, Oxford and Oxford Brookes;
    CCLRC RAL; IBM; NAG; and Streamline Computing
  • Major work in the project included
  • Visualization for computational steering on the
    Grid
  • Talk by Ken Brodlie in the Mini Workshop
    "Computational Steering and Visualisation on the
    Grid: Practice and Experience"
  • Generalization of data flow networks for
    visualization, using an XML-based language (sKML)

22
Invitation
  • I am interested in evaluating the approach on
    different classes of data source
  • If you use or are responsible for datasets which
    may be of interest, please contact me
  • Email: Julian.Gallop_at_rl.ac.uk
  • or see me at this meeting
  • Thank you for your attention