Medians and Beyond: New Aggregation Techniques for Sensor Networks

1
Medians and Beyond: New Aggregation Techniques
for Sensor Networks
  • Shrivastava, Buragohain, Agrawal, Suri
  • Presented by Bibudh Lahiri

2
Motivation
  • Power constraints make communication expensive
  • Individual sensor readings are inherently
    unreliable
  • Sending all data to a central node is inefficient

3
Motivation (contd.)
  • Energy-efficient query processing
  • Individual sensor readings do not hold much value
  • Gather aggregate measures rather than extracting
    all the data
  • Single-value aggregates like AVG or SUM can be
    strongly affected by even a few outliers

4
Goals and Challenges
  • Goal: estimate the data distribution at the base
    station in an energy-efficient manner while
    providing strict error guarantees
  • Challenge: computing an exact MEDIAN requires
    keeping track of all distinct values
  • Message size, and the memory needed to store it,
    grow linearly with the size of the network

5
Solution approach
  • Approximation schemes, adaptable to meet any
    user-specified error tolerance
  • Costs: higher memory and higher bandwidth
    consumption

6
Model
  • Integer readings in the range [1, s]
  • Sensors arranged in a spanning tree, rooted at
    the base station
  • No loops or duplicate packets
  • No packet loss
  • Only an approximation of quantiles is possible
    (exact answers need messages that grow with the
    network)

7
Related work and their limitations
  • TinyDB performs no in-network aggregation for
    MEDIAN
  • Complex queries such as contours are provided
    without strict error bounds
  • In the data-stream model, data can be examined
    only once
  • In sensor networks, the data is stored but
    distributed across nodes

8
The Quantile Digest
  • Captures the distribution of sensor data
    approximately
  • Properties
  • Error-memory trade-off: users can choose the
    message-size/error trade-off that suits them
  • Confidence factor: a strict error bound for every
    answer
  • Multiple queries: data aggregated for one query
    can be reused for others

9
Properties of q-digest
  • Consists of a set of buckets with associated
    counts
  • The possible value space [1, s] is organized as a
    complete binary tree
  • Each tree node can be considered a bucket with
    range [v.min, v.max]
  • A q-digest is a subset of these possible buckets
    and their counts
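To make the bucket tree concrete, here is a minimal Python sketch. It rests on one numbering scheme that the slides do not state and that is assumed here: heap-style ids with the root as 1, the children of node v as 2v and 2v + 1, s a power of two, and the leaf for value x as s + x - 1. Under this assumption, the node ids in the slide 20 median example (10, 11, 6, 7, 1) appear consistent with s = 8. The helper names `node_range` and `leaf_for` are hypothetical.

```python
from __future__ import annotations


def node_range(v: int, s: int) -> tuple[int, int]:
    """Return (v.min, v.max) for node v of the bucket tree over [1, s].

    Assumes heap numbering: root = 1, children of v are 2v and 2v + 1,
    s is a power of two, and leaves are ids s .. 2s - 1.
    """
    if s & (s - 1) or not 1 <= v < 2 * s:
        raise ValueError("need s a power of two and 1 <= v < 2s")
    depth = v.bit_length() - 1          # root is at depth 0
    span = s >> depth                   # number of values this bucket covers
    lo = (v - (1 << depth)) * span + 1  # position within the level, times span
    return lo, lo + span - 1


def leaf_for(x: int, s: int) -> int:
    """Id of the leaf bucket that holds the single value x (1 <= x <= s)."""
    return s + x - 1


print(node_range(1, 8))                 # (1, 8)
print(node_range(6, 8))                 # (5, 6)
print(node_range(leaf_for(4, 8), 8))    # (4, 4)
```

With this numbering, a bucket's range follows from its id alone, so a digest only needs to store (id, count) pairs.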

10
Rules to build the digest
  • Compression parameter k
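  • A particular sensor has n data values at its
    disposal
  • Rule 1: no node should have a high count unless
    it is a leaf node
  • $\mathrm{count}(v) \le \lfloor n/k \rfloor$ for every
    non-leaf node v
  • Rule 2: if two adjacent sibling buckets have low
    counts, do not keep two separate counters for
    them; merge the children into their parent
  • $\mathrm{count}(v) + \mathrm{count}(v_p) + \mathrm{count}(v_s) > \lfloor n/k \rfloor$
    for every non-root node v, where $v_p$ is the
    parent and $v_s$ the sibling of v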
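The two rules can be checked mechanically. The sketch below assumes a hypothetical heap numbering of buckets (root 1, parent of v is v // 2, sibling is v ^ 1, leaves are ids s .. 2s - 1 for s a power of two) and a digest stored as a dict from node id to count. The function name `satisfies_rules` is illustrative only. Leaves are exempt from the first rule and the root from the second, since the root has no parent or sibling.

```python
from __future__ import annotations


def satisfies_rules(q: dict[int, int], n: int, k: int, s: int) -> bool:
    """Check both q-digest rules for a heap-numbered digest over [1, s].

    Assumes root = 1, parent of v is v // 2, sibling is v ^ 1,
    and leaves are ids s .. 2s - 1.
    """
    threshold = n // k                       # floor(n/k)
    for v, c in q.items():
        # Rule 1: only leaves may hold more than floor(n/k)
        if v < s and c > threshold:
            return False
        # Rule 2: v, its parent and its sibling together must exceed floor(n/k);
        # the root has neither, so it is exempt
        if v != 1:
            parent, sibling = v // 2, v ^ 1
            if c + q.get(parent, 0) + q.get(sibling, 0) <= threshold:
                return False
    return True


digest = {10: 4, 11: 6, 6: 2, 7: 2, 1: 1}        # bucket counts from slide 20, s = 8 assumed
print(satisfies_rules(digest, n=15, k=5, s=8))   # True
print(satisfies_rules(digest, n=15, k=3, s=8))   # False
```

With k = 3 the threshold is 5, and the sibling pair 6 and 7 totals only 4, so the second rule fails. The two k values are arbitrary choices for illustration.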

11
Q-Digest
12
The compression algorithm
  • COMPRESS(Q, n, k)
      l ← log s − 1
      while l > 0 do
        for all v in level l do
          if count(v) + count(v_s) + count(v_p) < floor(n/k) then
            count(v_p) ← count(v_p) + count(v) + count(v_s)
            delete v and v_s from Q
          end if
        end for
        l ← l − 1
      end while
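Below is a runnable Python rendering of COMPRESS. It assumes heap-numbered buckets (root 1, children 2v and 2v + 1, leaves s .. 2s - 1, s a power of two) stored as a dict from node id to count. One further assumption needs flagging: the exact level convention behind "l ← log s − 1" cannot be recovered from the transcript. This sketch therefore walks every level bottom-up, from the leaves (depth log s) to the children of the root (depth 1). At each level it merges a node and its sibling into their parent whenever the three counts together are below $\lfloor n/k \rfloor$, as in the pseudocode.

```python
from __future__ import annotations


def compress(q: dict[int, int], n: int, k: int, s: int) -> dict[int, int]:
    """Bottom-up COMPRESS over a heap-numbered bucket tree on [1, s].

    q maps node id -> count (root = 1, leaves = s .. 2s - 1, s a power of two).
    Returns a new dict; the input is not modified.
    """
    threshold = n // k                  # floor(n/k)
    q = dict(q)
    leaf_depth = s.bit_length() - 1     # log2(s) for a power of two
    for depth in range(leaf_depth, 0, -1):          # stop before the root (depth 0)
        level = sorted(u for u in q if u.bit_length() - 1 == depth)
        for v in level:
            if v not in q:              # already folded in as an earlier node's sibling
                continue
            parent, sibling = v // 2, v ^ 1
            total = q[v] + q.get(sibling, 0) + q.get(parent, 0)
            if total < threshold:
                q[parent] = total       # count(v_p) += count(v) + count(v_s)
                del q[v]
                q.pop(sibling, None)
    return q


# Hypothetical leaf counts for one sensor with s = 8: value 1 seen 4 times,
# value 2 once, value 5 once, value 8 twice.
raw = {8: 4, 9: 1, 12: 1, 15: 2}
print(compress(raw, n=8, k=2, s=8))   # {8: 4, 9: 1, 1: 3}
```

In this run, the rarely seen values 5 and 8 end up in the root bucket [1, 8], while value 1 keeps its own leaf.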

13
Building a q-digest
14
How does k influence the digest?
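  • Lower k implies a higher threshold $\lfloor n/k \rfloor$
  • A higher threshold gives a node more chance of
    being merged with its sibling into their parent
  • Hence a higher degree of compression (fewer
    buckets, at the cost of a larger error bound)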

15
Observations about the digest
  • The world remembers you if you show up more
    often
  • Detailed information about frequently occurring
    data values is preserved in the digest
  • Less frequently occurring values are lumped into
    larger buckets, resulting in information loss

16
Merging q-digests
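Here is a sketch of merging. It assumes, as a hedged reading of the MERGE procedure named on slide 18, that merging takes the union of the digests, adds the counts of identical buckets, and then re-runs COMPRESS with $n = \sum_i n_i$. Buckets are assumed to be heap-numbered (root 1, children 2v and 2v + 1, leaves s .. 2s - 1, s a power of two) and stored as a dict from node id to count. `merge` and `compress` are illustrative names, and `compress` is included so the block stands alone.

```python
from __future__ import annotations
from collections import Counter


def compress(q: dict[int, int], n: int, k: int, s: int) -> dict[int, int]:
    """Bottom-up COMPRESS: fold low-count sibling pairs into their parent."""
    threshold = n // k                              # floor(n/k)
    q = dict(q)
    for depth in range(s.bit_length() - 1, 0, -1):  # leaves up to the root's children
        for v in sorted(u for u in q if u.bit_length() - 1 == depth):
            if v not in q:                          # removed earlier as a sibling
                continue
            parent, sibling = v // 2, v ^ 1
            total = q[v] + q.get(sibling, 0) + q.get(parent, 0)
            if total < threshold:
                q[parent] = total
                del q[v]
                q.pop(sibling, None)
    return q


def merge(digests: list[tuple[dict[int, int], int]], k: int, s: int) -> tuple[dict[int, int], int]:
    """Merge (digest, n_i) pairs: add bucket counts, then compress with n = sum(n_i)."""
    combined = Counter()
    total_n = 0
    for q, n_i in digests:
        combined.update(q)              # Counter.update adds counts of identical buckets
        total_n += n_i
    return compress(dict(combined), total_n, k, s), total_n


a = {8: 4, 9: 1, 12: 1, 15: 2}   # hypothetical child digest built on 8 readings
b = {8: 1, 10: 3}                # hypothetical child digest built on 4 readings
merged, n = merge([(a, 8), (b, 4)], k=3, s=8)
print(n, merged)                 # 12 {8: 5, 9: 1, 2: 3, 3: 3}
```

Because every parent sees only the merged counts and the combined n, the same procedure can be applied at each hop up the spanning tree.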
17
Space Complexity Error Bound
  • A q-digest constructed with compression parameter
    k has at most 3k buckets
  • In a q-digest created using compression
    parameter k, the maximum error in the count of
    any node is $\frac{n \log s}{k}$

18
Space Complexity Error Bound (contd.)
  • Given p q-digests $Q_1, Q_2, \ldots, Q_p$, built on
    $n_1, n_2, \ldots, n_p$ values, each with maximum
    relative error $(\log s)/k$, the algorithm MERGE
    combines them into a q-digest for $\sum_i n_i$
    values with the same relative error
  • Given memory m to build a q-digest, it is
    possible to answer any quantile query with error
    $\varepsilon$ such that $\varepsilon \le 3(\log s)/m$
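The arithmetic linking these bounds is short. The sketch below assumes that memory m is counted in buckets, so the 3k size bound is what m constrains, and it ignores rounding of m/3.

```latex
\varepsilon = \frac{n \log s / k}{n} = \frac{\log s}{k},
\qquad
m = 3k \;\Longrightarrow\; k = \frac{m}{3}
\;\Longrightarrow\; \varepsilon = \frac{3 \log s}{m}.
```

Dividing the per-node count error $\frac{n \log s}{k}$ by n gives the relative error $(\log s)/k$. Choosing k so that a digest of at most 3k buckets fills the memory budget then gives the stated dependence on m.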

19
Quantile query
  • Given a fraction $q \in (0, 1)$, find the value
    whose rank in the sorted sequence is $qn$
  • Strategy
  • Do a post-order traversal and list the
    <node id, count> tuples
  • This arranges nodes in increasing order of right
    end-points (v.max)
  • Scan the list, adding up the counts. When the sum
    exceeds $qn$, report v.max of the current node as
    the quantile estimate

20
Quantile query (contd.)
  • Take $q = 0.5$ (median), with $n = 15$
  • $qn = 0.5 \times 15 = 7.5$
  • The <node id, count> tuples in post-order are
    <10,4>, <11,6>, <6,2>, <7,2>, <1,1>
  • The cumulative count at <11,6> is $4 + 6 = 10 > 7.5$,
    so the estimated median is 4 (v.max of node 11)
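The quantile procedure and the median example translate directly into code. This is a minimal Python sketch. It assumes heap-numbered buckets (root 1, children 2v and 2v + 1, leaves s .. 2s - 1, s a power of two) stored as a dict from node id to count, and the function names are illustrative. Post-order is obtained by sorting on (v.max, bucket width). That key places each node after every bucket in its subtree and orders everything else by right end-point.

```python
from __future__ import annotations


def node_range(v: int, s: int) -> tuple[int, int]:
    """(v.min, v.max) of heap-numbered node v over values [1, s]."""
    depth = v.bit_length() - 1
    span = s >> depth
    lo = (v - (1 << depth)) * span + 1
    return lo, lo + span - 1


def post_order(q: dict[int, int], s: int) -> list[tuple[int, int]]:
    """<node id, count> tuples by increasing v.max; on ties, narrower (child) buckets first."""
    def key(v: int) -> tuple[int, int]:
        lo, hi = node_range(v, s)
        return hi, hi - lo
    return [(v, q[v]) for v in sorted(q, key=key)]


def quantile(q: dict[int, int], s: int, frac: float) -> int:
    """Return v.max of the first bucket at which the running count exceeds frac * n."""
    n = sum(q.values())
    running = 0
    for v, c in post_order(q, s):
        running += c
        if running > frac * n:
            return node_range(v, s)[1]
    return s  # not reached for 0 < frac < 1


digest = {10: 4, 11: 6, 6: 2, 7: 2, 1: 1}   # slide 20 example, s = 8 assumed
print(post_order(digest, 8))   # [(10, 4), (11, 6), (6, 2), (7, 2), (1, 1)]
print(quantile(digest, 8, 0.5))   # 4
```

Sorting by a key avoids building explicit child pointers. Because the digest is sparse, the traversal only touches buckets that actually exist.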

21
Other queries
  • Inverse quantile: given a value x, determine its
    rank in the sorted sequence of input values
  • Range query: find the number of values in the
    given range [low, high]
  • Frequent items: given a fraction $f \in (0, 1)$,
    find all the values reported by more than $fn$
    sensors
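The slides do not spell out how the first two queries are answered. One simple estimator is assumed here rather than taken from the slides. The rank of x is the total count of buckets whose right end-point v.max is below x, and a range count is rank(high + 1) − rank(low). A bucket that straddles a boundary is counted entirely on one side, so both answers are approximate. The heap numbering of buckets (root 1, children 2v and 2v + 1, leaves s .. 2s - 1, s a power of two) and the helper names are hypothetical.

```python
from __future__ import annotations


def right_end(v: int, s: int) -> int:
    """v.max of heap-numbered node v over values [1, s] (s a power of two)."""
    depth = v.bit_length() - 1
    span = s >> depth
    return (v - (1 << depth) + 1) * span


def rank(q: dict[int, int], s: int, x: int) -> int:
    """Estimated number of input values smaller than x."""
    return sum(c for v, c in q.items() if right_end(v, s) < x)


def range_count(q: dict[int, int], s: int, low: int, high: int) -> int:
    """Estimated number of input values in [low, high]."""
    return rank(q, s, high + 1) - rank(q, s, low)


digest = {10: 4, 11: 6, 6: 2, 7: 2, 1: 1}   # slide 20 example, s = 8 assumed
print(rank(digest, 8, 5))              # 10
print(range_count(digest, 8, 4, 6))    # 8
```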

22
Simulation setup
  • Experiment with random and correlated values
  • Compared results with list, a naïve scheme that
    sends un-aggregated data
  • At each node, list contains all the distinct
    sensor values that occur in the subtree rooted at
    that node

23
Range Queries and Histograms
24
Accuracy and Message Size
Error declines very rapidly with growing message size.
25
Accuracy and Message Size (contd.)
  • Regardless of n, q-digest needs messages no
    bigger than 400 bytes to achieve 2% accuracy
  • For random data, the message size for list
    increases steadily with n
  • For correlated data there are only 1500 distinct
    values, so the maximum message size for list
    plateaus as the number of sensors increases

26
Power consumption
  • Total power consumption is roughly proportional
    to the total amount of data transmitted in the
    network

27
  • Thank You