Indexing Scientific Data With FastBit - PowerPoint PPT Presentation

Slides: 6
Provided by: joh5150
Learn more at: https://sdm.lbl.gov
Transcript and Presenter's Notes

1
Indexing Scientific Data With FastBit
  • Motivating examples
      • Find the collision events with the most distinct
        signature of Quark Gluon Plasma
      • Find the ignition kernels in a combustion
        simulation
      • Track a layer of an exploding supernova
  • These are not typical database searches
      • Large high-dimensional data sets (1000 time steps
        × 1000 × 1000 × 1000 cells × 100 variables)
      • Most data records are never modified, i.e.,
        append-only data
      • Multi-dimensional queries, e.g., 500 < Temp < 1000
        AND CH3 > 10^-4
      • Large answers (hits on thousands or millions of
        records)
      • Seek collective features, e.g., regions of
        interest, not average and sum operations
  • New searching technology needed

2
A Good Candidate: Bitmap Index
  • First commercial version
      • Model 204, P. O'Neil, 1987
  • Takes less time to build than B-trees
  • Efficient for querying: only bitwise logical
    operations
      • A < 2 → b0 OR b1
      • A > 2 → b3 OR b4 OR b5
  • Efficient for multi-dimensional queries
      • Use bitwise operations to combine the partial
        results
  • Size may be large: one bit per distinct value per
    row
      • Definition: cardinality = number of distinct
        values
      • Compact for low-cardinality attributes, say,
        cardinality < 100
      • Worst case: cardinality = N, the number of rows;
        index size N × N bits

Bitmap index for attribute A (one column per data row):

Data values A   0 1 5 3 1 2 0 4 1
b0 (A = 0)      1 0 0 0 0 0 1 0 0
b1 (A = 1)      0 1 0 0 1 0 0 0 1
b2 (A = 2)      0 0 0 0 0 1 0 0 0
b3 (A = 3)      0 0 0 1 0 0 0 0 0
b4 (A = 4)      0 0 0 0 0 0 0 1 0
b5 (A = 5)      0 0 1 0 0 0 0 0 0

A < 2 selects b0 and b1; 2 < A selects b3, b4, and b5.

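
The bitmap index on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not FastBit code: each bitmap is held in a plain Python integer, the helper names (build_bitmap_index, union, rows) are made up for this sketch, and the second attribute B is hypothetical data added to show how partial results from two attributes are combined.

```python
from typing import Dict, Iterable, List


def build_bitmap_index(values: List[int]) -> Dict[int, int]:
    """Return one bitmap per distinct value; bit i is set when values[i] == value."""
    index: Dict[int, int] = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def union(index: Dict[int, int], keys: Iterable[int]) -> int:
    """OR together the bitmaps of the given values (a range condition)."""
    result = 0
    for key in keys:
        result |= index.get(key, 0)
    return result


def rows(bitmap: int) -> List[int]:
    """List the row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Attribute A from the slide.
A = [0, 1, 5, 3, 1, 2, 0, 4, 1]
index_a = build_bitmap_index(A)

less_than_2 = union(index_a, [0, 1])        # A < 2  ->  b0 OR b1
greater_than_2 = union(index_a, [3, 4, 5])  # A > 2  ->  b3 OR b4 OR b5

# Hypothetical second attribute B (not from the slides), to show a
# two-dimensional query: A < 2 AND B == 1.
B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
index_b = build_bitmap_index(B)
both = less_than_2 & index_b[1]

print(rows(less_than_2))     # [0, 1, 4, 6, 8]
print(rows(greater_than_2))  # [2, 3, 7]
print(rows(both))            # [0, 6]
```

No data row is read to answer either query; only the bitmaps are touched, which is why the cost is dominated by bitwise operations.
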
3
Compression Makes It Better
Example: 2015 bits
10000000000000000000011100000000000000000000000000
000...0000000000000000000000000000000111111111
1111111111111111
Main idea: use run-length encoding, but partition
the bits into 31-bit groups (not 32-bit groups) on
32-bit machines
  • Name: Word-Aligned Hybrid (WAH) code
  • Key features
      • Compressed indices are typically 30% of the raw
        data size
      • 10X faster in answering queries than the most
        competitive bitmap index
      • Worst-case index size: 4N words, not N × N bits
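
The 31-bit grouping can be illustrated with a simplified encoder. This is a sketch under stated assumptions, not FastBit's implementation: each 32-bit word spends its top bit on a flag (0 = literal word carrying 31 bits, 1 = fill word), a fill word uses the next bit for the fill value and the low 30 bits for the number of 31-bit groups it covers, and the input length must be a multiple of 31 (a full implementation also has to handle leftover bits). The test input rebuilds the 2015-bit example assuming the elided middle is all zeros, so that it holds 2015 - 1 - 20 - 3 - 25 = 1966 zeros.

```python
from typing import List

GROUP = 31                    # payload bits per 32-bit word
ALL_ONES = (1 << GROUP) - 1   # a 31-bit group of all 1s
MAX_RUN = (1 << 30) - 1       # largest group count one fill word can hold


def wah_encode(bits: str) -> List[int]:
    """Encode a string of '0'/'1' characters into WAH-style 32-bit words."""
    if len(bits) % GROUP != 0:
        raise ValueError("this sketch expects a multiple of 31 bits")
    words: List[int] = []
    for start in range(0, len(bits), GROUP):
        group = int(bits[start:start + GROUP], 2)
        if group in (0, ALL_ONES):
            fill_bit = 1 if group == ALL_ONES else 0
            header = (1 << 31) | (fill_bit << 30)
            last = words[-1] if words else 0
            # Extend the previous word if it is a fill of the same value.
            if (last >> 30) == (header >> 30) and (last & MAX_RUN) < MAX_RUN:
                words[-1] = last + 1
            else:
                words.append(header | 1)
        else:
            words.append(group)  # literal word: top bit stays 0
    return words


def wah_decode(words: List[int]) -> str:
    """Expand WAH-style words back into the original bit string."""
    out: List[str] = []
    for word in words:
        if word >> 31:  # fill word
            fill = "1" if (word >> 30) & 1 else "0"
            out.append(fill * (GROUP * (word & MAX_RUN)))
        else:           # literal word
            out.append(format(word, "031b"))
    return "".join(out)


# The 2015-bit example: 1, 20 zeros, 3 ones, 1966 zeros, 25 ones.
example = "1" + "0" * 20 + "1" * 3 + "0" * 1966 + "1" * 25
encoded = wah_encode(example)
print([format(w, "08X") for w in encoded])
# ['40000380', '8000003F', '01FFFFFF']
```

Under these assumptions the 2015 bits shrink to three words: one literal, one fill covering 63 all-zero groups, and one more literal. Because every word boundary lines up with a group boundary, logical operations can work on whole words without unpacking individual bits.
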

4
Handling Collective Features: Regions of Interest
[Diagram labels: Data, Index, Query, FastBit, Region Growing, Region Tracking]
  • FastBit has been used in
      • GridCollector for the high-energy physics
        experiment STAR
      • Dexterous Data Explorer (DEX) for query-driven
        visualization
      • Dynamic histogramming for network traffic analysis
  • On the right is an illustration of our
    region-growing approach

[Figure: 2-D connected regions identified with line
segments (in green). The line segments come out of
FastBit compressed bitmaps.]
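
One way to picture region growing from line segments is sketched below. This is an assumption-laden illustration, not the algorithm used with FastBit: the query result is taken to be a small 0/1 mask on a regular 2-D mesh, each run of consecutive 1s in a mesh row becomes a line segment, and segments in adjacent rows that overlap in column range are merged with a union-find structure. All names here are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int]  # (row, first_col, last_col), inclusive


def segments_from_mask(mask: List[List[int]]) -> List[Segment]:
    """Turn each run of consecutive 1s in a mesh row into a line segment."""
    segs: List[Segment] = []
    for r, row in enumerate(mask):
        c = 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                segs.append((r, start, c - 1))
            else:
                c += 1
    return segs


def label_regions(segs: List[Segment]) -> List[int]:
    """Give segments the same label when they belong to one connected region."""
    parent = list(range(len(segs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (r1, s1, e1) in enumerate(segs):
        for j in range(i + 1, len(segs)):
            r2, s2, e2 = segs[j]
            if r2 > r1 + 1:
                break  # segments are sorted by row; nothing further can touch
            if r2 == r1 + 1 and s2 <= e1 and s1 <= e2:
                parent[find(j)] = find(i)  # overlapping columns: same region
    return [find(i) for i in range(len(segs))]


# Hypothetical 4 x 5 query result on a regular mesh.
mask = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]
segs = segments_from_mask(mask)
labels = label_regions(segs)
print(segs)              # [(0, 0, 1), (1, 1, 1), (1, 3, 4), (2, 4, 4), (3, 0, 0)]
print(len(set(labels)))  # 3
```

Working on segments rather than individual cells keeps the cost proportional to the number of runs, which fits a run-length-compressed bitmap where runs of 1s are already stored compactly.
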
5
Future Plans
  • Software development
      • Release FastBit under LGPL (John, March 07)
      • FastBit integration with ROOT (John, Sept 07)
      • FastBit integration with HDF5 for particle
        physics (Kurt)
  • Finding regions of interest
      • Existing work only dealt with data on regular
        meshes
      • Working on extensions to AMR mesh (Kurt), GTC
        mesh (John), and tetrahedral mesh (Rishi)
  • New features (research)
      • Parallel version
      • Table groups / partitions
      • Range join