Rapid Access to Large Multi-Dimensional Datasets

University of Surrey (UniS): Peter Voke. RasDaMan: Peter Baumann.

1
Rapid Access to Large Multi-Dimensional Datasets
Peter Voke, Fluids Research Centre, University of Surrey
2
Motivation
  • Sensor, image, and statistics data share the array
    concept as a common data abstraction
  • Multidimensional Discrete Data
  • Key characteristics
    • Dimensional
    • Discretised
    • Large!

3
The Problem
  • "Currently the speed at which data can be moved
    in and out of secondary cache and tertiary
    storage systems is an order of magnitude less
    than the rate at which data can be processed by
    the CPU. High Performance Computers can operate
    at speeds exceeding a trillion operations per
    second, but I/O operations run closer to 10
    million bytes per second on state-of-the-art
    disk."
    (National Computational Science Alliance, USA, 1998)

"Even with multi-terabyte local disk sub-systems
and multi-petabyte archives, I/O can become a
bottleneck in high performance computing."
(Jeanette Jenness, LLNL, ASCI-Project, 1998)
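
To put the quoted figures side by side, here is a rough back-of-envelope calculation. The two rates come from the first quote; the 1 TB dataset size is an assumption picked only for illustration.

```python
# Rates taken from the quote above; DATASET_BYTES is an assumed example size.
CPU_OPS_PER_SEC = 1e12       # "a trillion operations per second"
DISK_BYTES_PER_SEC = 10e6    # "10 million bytes per second"
DATASET_BYTES = 1e12         # assumption: a 1 TB simulation result


def transfer_seconds(n_bytes, bytes_per_sec):
    """Time needed to move n_bytes at a sustained rate of bytes_per_sec."""
    return n_bytes / bytes_per_sec


read_time_s = transfer_seconds(DATASET_BYTES, DISK_BYTES_PER_SEC)
ops_per_byte = CPU_OPS_PER_SEC / DISK_BYTES_PER_SEC

print(f"time to read the dataset: {read_time_s / 3600:.1f} h")
print(f"CPU operations available per byte of I/O: {ops_per_byte:.0f}")
```

Under these assumptions the CPU could execute on the order of 100,000 operations in the time it takes one byte to arrive from disk, which is why reading less data matters more than computing faster.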
4
The Goal
  • Overcome the HPC data delivery bottleneck
  • user-centric system
  • data-oriented storage
  • thin client-side
  • use server-side power

5
Bottlenecks
  • Different bottlenecks can occur with different
    models
  • Large continuous input stream
  • Large input sets, but only selected parts
    required (101)
  • Time dependent input
  • Communication between coupled models via files
  • High output rates, sustained or in bursts
  • Support can be provided on hardware, software or
    application level

6
Storage Model
  • multidimensional object -> set of n-D tiles
  • tile = subarray
  • arbitrary tiling
  • under admin control
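
The storage model can be sketched in a few lines of Python. This is not RasDaMan code: the regular tiling scheme and the function names `tile_grid` and `tiles_for_query` are assumptions made for illustration (the slides allow arbitrary tilings). The sketch shows how an n-D array is covered by tiles and how a query box touches only some of them.

```python
from itertools import product


def tile_grid(shape, tile_shape):
    """Yield (lo, hi) corners of the tiles covering an n-D array.

    hi is exclusive; tiles at the upper edges are clipped to the array bounds.
    """
    ranges = [range(0, extent, step) for extent, step in zip(shape, tile_shape)]
    for lo in product(*ranges):
        hi = tuple(min(l + t, s) for l, t, s in zip(lo, tile_shape, shape))
        yield lo, hi


def tiles_for_query(shape, tile_shape, q_lo, q_hi):
    """Return the tiles that intersect the half-open query box [q_lo, q_hi)."""
    return [
        (lo, hi)
        for lo, hi in tile_grid(shape, tile_shape)
        if all(l < qh and ql < h for l, h, ql, qh in zip(lo, hi, q_lo, q_hi))
    ]


# Hypothetical 3-D object (x, y, time) stored as 100 x 100 x 50 tiles.
shape = (1000, 1000, 50)
tile_shape = (100, 100, 50)

all_tiles = list(tile_grid(shape, tile_shape))
hit = tiles_for_query(shape, tile_shape, (150, 0, 10), (250, 100, 11))

print(len(all_tiles))  # 100 tiles in total
print(len(hit))        # 2 tiles need to be read for this query
```

How well this works depends on matching the tile shape to typical access patterns, which is presumably why the tiling is left under administrator control.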

7
Tiling
8
Data Traffic Optimisation
  • Server-based evaluation
  • server does selection, aggregation, predicate
    evaluation
  • short path from source to processing
  • internal optimization on server
  • Send only needed results to client
  • tiling, so that little data is read
  • minimal data transmitted over long path
  • What You Need Is What You Get
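
A toy comparison of the two evaluation strategies, as a sketch only. `ArrayServer` and its methods are hypothetical stand-ins rather than any real DBMS interface, and "cells shipped" is a crude proxy for network traffic.

```python
import random


class ArrayServer:
    """Toy stand-in for a server holding a 2-D array (hypothetical)."""

    def __init__(self, rows, cols, seed=0):
        rng = random.Random(seed)
        self.data = [[rng.randint(0, 255) for _ in range(cols)] for _ in range(rows)]

    def fetch_all(self):
        """Client-side evaluation: the whole array crosses the network."""
        return self.data

    def count_above(self, row_lo, row_hi, col_lo, col_hi, threshold):
        """Server-side evaluation: selection, predicate and aggregation
        run next to the data; only one number is returned."""
        return sum(
            1
            for row in self.data[row_lo:row_hi]
            for value in row[col_lo:col_hi]
            if value > threshold
        )


server = ArrayServer(200, 300)

# Strategy 1: ship everything, then select and aggregate on the client.
shipped = server.fetch_all()
client_result = sum(1 for row in shipped[50:60] for v in row[100:120] if v > 200)
cells_shipped_client = sum(len(row) for row in shipped)

# Strategy 2: let the server evaluate the request and ship only the answer.
server_result = server.count_above(50, 60, 100, 120, 200)
cells_shipped_server = 1

print(client_result == server_result)               # True
print(cells_shipped_client, cells_shipped_server)   # 60000 1
```

Both strategies give the same answer; the difference lies only in how much data travels the long path to the client.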

9
System Architecture
SQL, ODMG, Web
conventional base DBMS
10
User Interface
Web Interface
11
MD Database System
  • multidimensional DBMS
  • arrays, collections (sets) thereof
  • dynamic type system
  • multidimensional SQL, object-oriented APIs
  • Array algebra
  • intelligent query and storage optimisation

12
RasQL
  • selection & section
    • select c[ *:*, 100:200, *:*, 42 ] from
      ClimateSimulations as c
  • result processing
    • select img * (img.green > 130) from
      LandsatArchive as img
  • search & aggregation
    • select mri from MRI as mri, masks as m where
      some_cells( mri > 250 and m )
  • data format conversion
    • select png( c[ *:*, *:*, 100, 42 ] ) from
      ClimateSimulations as c
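
What the first three kinds of query compute can be emulated on a tiny in-memory image. This is plain Python, not RasQL or a RasDaMan API; the helper names are hypothetical. It assumes that the result-processing query multiplies each cell by a 0/1 mask (the operator is missing from the transcript) and that interval bounds are inclusive.

```python
# Tiny 2 x 2 RGB image: rows of (red, green, blue) pixels.
img = [
    [(10, 200, 30), (40, 100, 60)],
    [(70, 131, 90), (15, 130, 25)],
]


def section(image, row_lo, row_hi, col):
    """Selection and section: trim rows to [row_lo, row_hi] (inclusive, an
    assumption) and fix one column, which reduces the dimension by one."""
    return [image[r][col] for r in range(row_lo, row_hi + 1)]


def mask_multiply(image, predicate):
    """Result processing: multiply every cell by a 0/1 mask, so cells that
    fail the predicate become zero and the others are kept."""
    return [
        [tuple(channel * int(predicate(px)) for channel in px) for px in row]
        for row in image
    ]


def some_cells(image, predicate):
    """Search and aggregation: true if at least one cell satisfies the predicate."""
    return any(predicate(px) for row in image for px in row)


column = section(img, 0, 1, 1)
bright = mask_multiply(img, lambda px: px[1] > 130)
has_bright_green = some_cells(img, lambda px: px[1] > 130)
has_saturated_red = some_cells(img, lambda px: px[0] > 250)

print(column)             # [(40, 100, 60), (15, 130, 25)]
print(bright)             # [[(10, 200, 30), (0, 0, 0)], [(70, 131, 90), (0, 0, 0)]]
print(has_bright_green)   # True
print(has_saturated_red)  # False
```

In the database setting these operations would run on the server against tiled storage, so only the trimmed or aggregated result reaches the client.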

13
ESTEDI
  • Addresses a main technical obstacle
  • the delivery bottleneck of large HPC results to
    the users
  • Augmenting high-volume data generators with
    flexible spatio-temporal DBMS support
  • basis: multidimensional DBMS RasDaMan
  • developed in FP4
  • plus mass storage handling, parallelisation
  • European standard for the storage and retrieval
    of multidimensional HPC data
  • FP5 IST

www.estedi.org
14
ESTEDI Partners
  • Partners
  • ERCOFTAC: requirements, awareness, UIG
    • University of Surrey (UK)
  • HPC specialists: apps, use new tech, bring to
    community
    • CINECA (I), CLRC (UK), CSCS (CH), DKRZ/MPIM (D),
      DLR (D), IHPCDB (Russia), NUMECA (B)
  • database specialists: database technology
    • RasDaMan (D), FORWISS (D)
  • User Interest Group
    • University of Reading, Max-Planck Institute for
      Meteorology, SUNY - Stony Brook, University of
      Gent, Université Libre de Bruxelles, Grundfos A/S
      - Denmark, ETH-Zurich ...

15
Pilots
  • Each HPC site has
  • individual domain, platform, application tools
  • identical RasDaMan DBMS installation

16
Summary
  • Overcome HPC data access bottleneck
  • approach: coupling data generator with DBMS
  • user-centric data handling: data cubes instead of
    file sets
  • Implement common platform and application pilots
  • user-driven, in-practice evaluation under
    real-life conditions
  • Define common HPC array data mgmt standard
  • core HPC app fields covered
  • Mining Large Arrays
  • essential building block for The Grid

17
Links
  • RasDaMan
  • http://www.rasdaman.com
  • http://www.rasdaman.com/Product
  • User Interest Group
  • http://vortex.mech.surrey.ac.uk/estedi/
  • gives access to HPC pilot sites
  • ESTEDI
  • http://www.estedi.org/
  • information on the ESTEDI project