Compressing Query Results for Mobile Clients

1
Compressing Query Results for Mobile Clients
  • Zhiyuan Chen and Praveen Seshadri
  • Cornell University

2
Motivation (1)
[Diagram: the database server sends results to the client over a slow network.]
Slow network: 9.6 - 170 kbps, 1 - 10 minutes for 1 MB.
Compress results on the server to save network
bandwidth.
Concern: compress as much as possible.
3
Motivation (2)
[Diagram: the database server sends results to a PDA client.]
  • PDA: "I am a PDA, I can't store that much for later
    use!"
  • Often work offline. Need to store results for
    later use.
  • Severe storage constraints.
  • Usually only a small portion of results will be
    accessed, random access possible.

Store the result compressed, decompress on demand.
Both space and decompression cost matter!
4
Why not just GZip or WinZip?
  • Small decompression unit for PDA
  • Utilize information about the query result
  • Choose a combination of compression methods based
    on semantic and statistical information of the
    result.
  • Because different attributes have different
    characteristics, there is no unique winner.
  • The choice is made by cost-based optimization.

Small decompression unit: reduces decompression cost.
Using information about the query result: more compression.
5
An Example
  • Query:
    SELECT Year, Month, Day, Ticker, Low, High
    FROM Quotes WHERE Year BETWEEN 1998 AND 1999
    ORDER BY Year, Month, Day

[Figure labels: semantic compression; universal compression (Ziv-Lempel on each field).]
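
The two stages named on this slide can be sketched for a single field. This is a minimal illustration, not the authors' implementation: it assumes differential coding as the semantic step (reasonable because the result is ordered by Year) and uses Python's zlib (DEFLATE, a Ziv-Lempel-family compressor) as a stand-in for the universal step. The helper names and the sample Year column are hypothetical.

```python
import zlib
from itertools import accumulate

def delta_encode(values):
    """Differential coding: keep the first value, then neighbour differences."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))

def zl_compress(values):
    """Ziv-Lempel-family compression of one field, via zlib (DEFLATE)."""
    return zlib.compress(",".join(map(str, values)).encode("ascii"))

def zl_decompress(blob):
    return [int(v) for v in zlib.decompress(blob).decode("ascii").split(",")]

# Hypothetical Year column of a result ordered by Year, Month, Day.
years = [1998] * 250 + [1999] * 250

semantic = delta_encode(years)   # first value, then mostly zeros
blob = zl_compress(semantic)     # universal step, applied to this field only
restored = delta_decode(zl_decompress(blob))
print(len(blob), "bytes after both steps; lossless:", restored == years)
```

Because the column is sorted, the differences are almost all zero, which is the kind of semantic knowledge a general-purpose tool applied to the whole table cannot assume.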
6
Outline
  • Related work
  • An algebraic framework
    • To represent the valid combinations.
  • Compression optimization
    • To choose the best combination.
  • Experiments
    • Chosen combinations beat universal compression
      tools like WinZip.

7
Related Work
  • Compression community
    • Different compression methods: run-length
      encoding, Ziv-Lempel, differential, etc.
  • Database community: compress tables or indices
    • Specific compression methods
      • Iyer & Wilhite: tuple-level Ziv-Lempel.
      • Goldstein et al.: per-page offset encoding.
      • Ng & Ravishankar: tuple differential coding.
      • Theodore Johnson: compressing bitmap indices.
    • Impact on query processing
      (Roth & Van Horn, Graefe & Shapiro, Johnson)

8
Compression Framework
  • Compressed results as a compressed table.
  • A compression method as a compression operator:
    input table -> output table.
  • A combination of compression methods as a
    compression plan.
  • Decompression operators and plans.

9
Compressed Table
  • Field and tuple boundaries may blur: the table holds
    compressed data blocks.
  • Extra information enables decompression: the
    compression schema.

A compressed block: a value compressed from some
cells in the uncompressed table. It is also the unit of
decompression.
A compression schema includes: what is
compressed into each compressed block, the compression
method, and the relational schema.
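
One way to picture a compressed table is as a list of blocks plus the schema information needed to find and decompress them. The sketch below is an assumption-laden illustration, not the structure used by the authors: the class and attribute names are hypothetical, each block is restricted to cells of a single field, and zlib stands in for whatever method the schema would record.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class CompressedBlock:
    field_name: str   # which field of the result the cells come from
    rows: range       # which rows of the result the cells come from
    method: str       # compression method used for this block
    payload: bytes    # the compressed value itself

@dataclass
class CompressedTable:
    relational_schema: tuple                    # field names of the result
    blocks: list = field(default_factory=list)  # block descriptors and payloads

    def add_block(self, field_name, first_row, values):
        payload = zlib.compress("\n".join(values).encode("utf-8"))
        rows = range(first_row, first_row + len(values))
        self.blocks.append(CompressedBlock(field_name, rows, "zlib", payload))

    def cell(self, row, field_name):
        """Decompress only the block that holds the requested cell."""
        for block in self.blocks:
            if block.field_name == field_name and row in block.rows:
                text = zlib.decompress(block.payload).decode("utf-8")
                return text.split("\n")[row - block.rows.start]
        raise KeyError((row, field_name))

# Hypothetical Ticker column split into blocks of two rows each.
table = CompressedTable(relational_schema=("Ticker",))
tickers = ["IBM", "MSFT", "ORCL", "SUNW"]
for start in range(0, len(tickers), 2):
    table.add_block("Ticker", start, tickers[start:start + 2])

print(table.cell(2, "Ticker"))
```

Reading one cell touches one block only, which is why the block size determines the cost of random access on the client.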
10
Compression Operator
  • Defines how a compression method is applied on
    what part of the input table:
    • The fields the operator is applied on.
    • What input blocks will be compressed together.
    • The compression method and information used in
      compression.
  • Example:
    • Applied on Customer ID.
    • Same Customer IDs compressed together.
    • Method: grouping.
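
The Customer ID example can be sketched as follows. This is one plausible reading of "grouping" (store each Customer ID once, together with the remaining fields of its rows), not a definition taken from the slides; the function names and the sample rows are hypothetical.

```python
def group_compress(rows, key_index):
    """Store each key once, followed by the remaining fields of its rows."""
    groups = {}
    for row in rows:
        rest = row[:key_index] + row[key_index + 1:]
        groups.setdefault(row[key_index], []).append(rest)
    return groups

def group_decompress(groups, key_index):
    """Rebuild full rows by putting the key back in its original position."""
    return [rest[:key_index] + (key,) + rest[key_index:]
            for key, rests in groups.items() for rest in rests]

# Hypothetical (Customer ID, Item, Quantity) rows, clustered by Customer ID.
orders = [(7, "pen", 2), (7, "ink", 1), (9, "pad", 5)]
grouped = group_compress(orders, key_index=0)
print(grouped)
print(group_decompress(grouped, key_index=0) == orders)
```

The round trip restores the original row order only when the input is already clustered by the key; otherwise the rows come back grouped.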

11
Compression Plan
A sequence of compression operators applied on
the original result table, each taking the output
of the previous operator as its input.
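
A plan of this kind can be sketched as a list of operators applied in order, with the matching decompression plan running the inverse operators in reverse. The representation below (a table as a dict of columns, an operator as a compress/decompress pair of functions) is an assumption made for illustration, as are the operator names and the sample data.

```python
import zlib
from itertools import accumulate

def differential(col):
    """Operator: differential coding on one numeric column."""
    def compress(table):
        v = table[col]
        return {**table, col: v[:1] + [b - a for a, b in zip(v, v[1:])]}
    def decompress(table):
        return {**table, col: list(accumulate(table[col]))}
    return compress, decompress

def ziv_lempel(col):
    """Operator: zlib (a Ziv-Lempel-family method) on one non-empty column."""
    def compress(table):
        text = ",".join(map(str, table[col]))
        return {**table, col: zlib.compress(text.encode("ascii"))}
    def decompress(table):
        text = zlib.decompress(table[col]).decode("ascii")
        return {**table, col: [int(x) for x in text.split(",")]}
    return compress, decompress

def run_plan(plan, table):
    """Each operator takes the output of the previous one as its input."""
    for compress, _ in plan:
        table = compress(table)
    return table

def undo_plan(plan, table):
    """Decompression plan: inverse operators in reverse order."""
    for _, decompress in reversed(plan):
        table = decompress(table)
    return table

plan = [differential("Day"), ziv_lempel("Day")]
original = {"Day": [1, 2, 3, 6, 7], "Ticker": ["A", "B", "A", "B", "A"]}
compressed = run_plan(plan, original)
print(undo_plan(plan, compressed) == original)
```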
12
Optimization - Cost Model
  • Formula:
    Cost = w1 × compression cost
         + w2 × decompression cost
         - w3 × saving on network transfer
         - w4 × saving on client-side storage
  • Adjust the weights based on the goal of compression.
  • Compression cost: CPU speed, compression plans,
    result size.
  • Decompression cost: client processor speed, access
    pattern, etc. Provided by clients!
  • Network transfer saving: depends on the compressed
    result size.
  • Client-side storage saving: applies if results are
    decompressed on demand.
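
The weighted formula translates directly into a small function. The sketch below assumes all four terms have already been estimated in comparable units; the function name, the default weights, and the sample numbers are hypothetical and only show how changing the weights changes the goal.

```python
def plan_cost(compression_cost, decompression_cost,
              network_saving, storage_saving,
              weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted cost of a compression plan; lower is better."""
    w1, w2, w3, w4 = weights
    return (w1 * compression_cost
            + w2 * decompression_cost
            - w3 * network_saving
            - w4 * storage_saving)

# Hypothetical estimates for one plan.
estimates = dict(compression_cost=2.0, decompression_cost=1.0,
                 network_saving=10.0, storage_saving=4.0)

# Goal: save network bandwidth only (client storage saving ignored).
network_goal = plan_cost(**estimates, weights=(1.0, 1.0, 1.0, 0.0))

# Goal: save client storage only (server compression cost ignored).
storage_goal = plan_cost(**estimates, weights=(0.0, 1.0, 0.0, 1.0))

print(network_goal, storage_goal)
```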

13
Searching
  • Naïve algorithm has exponential search space.
  • Heuristics
    • Consider semantic compression first.
      • For each field, use the naïve algorithm to find
        the best plan compressing this field only.
      • Combine these plans.
    • Then consider universal compression methods.
      • Add Ziv-Lempel applied on each field only if this
        will reduce the overall cost.
  • Not complete, but polynomial search space:
    O(# of fields × # of valid plans on each field)
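
The two-step heuristic can be sketched as below. This is an illustration under stated assumptions rather than the authors' optimizer: a plan is a dict mapping each field to a tuple of method names, the per-field candidate lists are given up front, and `toy_cost` with its size table is entirely hypothetical.

```python
def choose_plan(fields, candidates, cost):
    """Heuristic search: per-field semantic plans first, then Ziv-Lempel."""
    # Step 1: for each field, pick the best plan compressing that field only.
    plan = {}
    for f in fields:
        options = [()] + list(candidates.get(f, []))
        plan[f] = min(options, key=lambda p: cost({f: p}))
    # Step 2: add Ziv-Lempel on a field only if the overall cost goes down.
    for f in fields:
        trial = {**plan, f: plan[f] + ("ziv_lempel",)}
        if cost(trial) < cost(plan):
            plan = trial
    return plan

FIELDS = ["Year", "Ticker"]
CANDIDATES = {"Year": [("differential",)], "Ticker": [("run_length",)]}

# Hypothetical size estimates in KB for each (field, per-field plan) pair.
SIZE_KB = {
    ("Year", ()): 40,
    ("Year", ("differential",)): 4,
    ("Year", ("differential", "ziv_lempel")): 5,
    ("Ticker", ()): 60,
    ("Ticker", ("run_length",)): 20,
    ("Ticker", ("run_length", "ziv_lempel")): 12,
}

def toy_cost(plan):
    """Toy cost: total estimated size; fields missing from the plan are raw."""
    return sum(SIZE_KB[(f, plan.get(f, ()))] for f in FIELDS)

best = choose_plan(FIELDS, CANDIDATES, toy_cost)
print(best, toy_cost(best))
```

The number of cost evaluations grows with the number of fields times the number of candidate plans per field, in line with the bound on the slide, at the price of possibly missing the globally best combination.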

14
Experiments
  • Data: TPC-D
  • Queries: adapted from TPC-D queries by deleting
    aggregates.
  • Experiment 1: to save network bandwidth.
    • Measure the overall end-to-end time:
      compression time + transfer time + decompression
      time.
  • Experiment 2: to save the PDA client's storage.
    • Measure the space and random access time.
  • Compare: common tools vs. chosen combinations.

15
  • Semantic compression: allows decompressing
    individual attribute values. (S)
  • WinZip on the whole table. (W)
  • WinZip applied on each field. (PW)
  • Semantic compression + WinZip applied on each
    field. (SPW)

Uncompressed/compressed size ratio: S 3.1, W 3.4, PW 4.4, SPW 6.5
[Chart panels: Modem, Wireless, Internet]
16
  • Compression plans
  • Semantic compression (S): allows decompressing
    an individual tuple.
  • Windows CE's default compressor. (D)
  • Ziv-Lempel applied on each page. (Z)
  • Result size: 3.7 MB. 2 MB data storage, 2 MB
    program storage. Program size: 50 KB (1 KB for
    semantic compression).

Storage usage and time to randomly access 1000
tuples

17
Summary
  • A combination of compression methods based on
    semantic and statistical information of the
    result.
  • Choice made by cost-based optimization.
  • A framework to model combinations of compression
    methods.
  • Future work
    • Apply the methodology to compress data tables.
    • Joint optimization of result compression and query
      processing.