Medians and Beyond: New Aggregation Techniques for Sensor Networks - PowerPoint PPT Presentation

Provided by: bibudh

Transcript and Presenter's Notes

1
Medians and Beyond: New Aggregation Techniques for Sensor Networks

Shrivastava, Buragohain, Agrawal, Suri
Presented by Bibudh Lahiri

2
Motivation

Power constraints make communication expensive
Individual sensor readings are inherently unreliable
Sending all data to a central node is inefficient

3
Motivation (contd.)

Energy efficient query processing
Individual sensor readings do not hold much value
Gather aggregate measures rather than extracting all the data
Single-value aggregates such as AVG or SUM can be heavily skewed by even a few outliers

4
Goals and Challenges

Goal
Estimate the data distribution at the base station in an energy-efficient manner while providing strict error guarantees
Challenges
MEDIAN needs to keep track of all distinct values
The message size, and the memory needed to store it, grow linearly with the size of the network

5
Solution approach

Approximation schemes
Adaptable to meet any user-specified error tolerance
Expenses
Higher memory usage
Higher bandwidth consumption

6
Model

Integer readings in the range [1, s]
Sensors arranged in a spanning tree rooted at the base station
No loops or duplicate packets
No packet loss
Only an approximation of quantiles is possible

7
Related work and their limitations

TinyDB does not perform in-network aggregation for MEDIAN
Complex queries such as contours are provided without any strict error bounds
In data streams, data can be examined only once
In sensor networks, the data is stored, but distributed

8
The Quantile Digest

Captures the distribution of sensor data approximately
Properties
Error-memory trade-off: users can choose the message size and accept the corresponding error
Confidence factor: a strict error bound for any answer
Multiple queries: data aggregated for one query can be reused for others

9
Properties of q-digest

Consists of a set of buckets with associated counts
Possible value space: a complete binary tree over the values 1 to s
Each tree node can be considered a bucket with range [v.min, v.max]
A q-digest is a subset of these possible buckets together with their counts
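To make the bucket ranges concrete, here is a minimal Python sketch. It assumes, since the slides do not specify it, that s is a power of two and that tree nodes are numbered heap-style: the root is 1, the children of v are 2v and 2v + 1, and the leaves s, ..., 2s - 1 hold the single values 1, ..., s. The function name node_range is hypothetical.

```python
def node_range(v: int, s: int) -> tuple[int, int]:
    """Return the bucket range (v.min, v.max) of tree node v over values 1..s.

    Hypothetical numbering: root = 1, children of v are 2v and 2v + 1,
    s is a power of two, and leaf s + x - 1 holds the single value x.
    """
    log_s = s.bit_length() - 1      # depth of the leaf level
    depth = v.bit_length() - 1      # distance of v from the root
    height = log_s - depth          # levels between v and the leaves
    lo = (v << height) - s + 1      # value of the leftmost leaf under v
    hi = ((v + 1) << height) - s    # value of the rightmost leaf under v
    return lo, hi
```

With s = 8, node 1 covers [1, 8], node 6 covers [5, 6], and node 11 is the leaf for value 4.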

10
Rules to build the digest

Compression parameter k
A given sensor has n data values at its disposal
No node should have a high count unless it is a leaf node:
$\mathrm{count}(v) \le \lfloor n/k \rfloor$
If two adjacent sibling buckets have low counts, do not keep two separate counters for them; merge the children into their parent. For every non-root node v, with parent v_p and sibling v_s:
$\mathrm{count}(v) + \mathrm{count}(v_p) + \mathrm{count}(v_s) > \lfloor n/k \rfloor$
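The two rules can be checked mechanically. The sketch below is a hypothetical helper, not code from the paper. It stores a digest as a Python dict from node id to count and assumes heap numbering: the root is 1, the parent of v is v // 2, the sibling is v ^ 1, ids s and above are leaves, and s is a power of two.

```python
def satisfies_digest_rules(q: dict[int, int], n: int, k: int, s: int) -> bool:
    """Check both q-digest rules for a digest q = {node id: count}.

    Assumes heap numbering: root 1, parent of v is v // 2, sibling is v ^ 1,
    and ids >= s are leaves (s is a power of two).
    """
    threshold = n // k                      # floor(n/k)
    for v, count in q.items():
        # Rule 1: only a leaf may hold more than floor(n/k)
        if v < s and count > threshold:
            return False
        # Rule 2: for every stored non-root node, node + parent + sibling > floor(n/k)
        if v != 1:
            family = count + q.get(v // 2, 0) + q.get(v ^ 1, 0)
            if family <= threshold:
                return False
    return True
```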

11
Q-Digest
12
The compression algorithm

COMPRESS(Q, n, k)
  l ← log s - 1
  while l > 0 do
    for all v in level l do
      if count(v) + count(v_s) + count(v_p) < ⌊n/k⌋ then
        count(v_p) ← count(v_p) + count(v) + count(v_s)
        delete v and v_s from Q
      end if
    end for
    l ← l - 1
  end while
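A runnable Python rendering of COMPRESS follows. It is a sketch, not the authors' code. The digest is assumed to be a dict of heap-numbered nodes: the root is 1, the parent of v is v // 2, the sibling is v ^ 1, value x lives in leaf s + x - 1, and s is a power of two. The loop visits every level that has a parent, from the leaves upward, so its level indexing differs from the slide's l counter. build_digest is a hypothetical helper that places raw readings in leaves and then compresses.

```python
from collections import Counter


def compress(q: dict[int, int], n: int, k: int, s: int) -> dict[int, int]:
    """Bottom-up COMPRESS of a q-digest stored as {node id: count}.

    Assumes heap numbering (root 1, parent v // 2, sibling v ^ 1) and that
    s is a power of two, so the leaves sit at depth log2(s).
    """
    q = dict(q)                          # do not mutate the caller's digest
    threshold = n // k                   # floor(n/k)
    log_s = s.bit_length() - 1
    for depth in range(log_s, 0, -1):    # leaf level first, stop below the root
        level = sorted(v for v in q if v.bit_length() - 1 == depth)
        for v in level:
            if v not in q:               # already merged away as a sibling
                continue
            sibling, parent = v ^ 1, v // 2
            total = q[v] + q.get(sibling, 0) + q.get(parent, 0)
            if total < threshold:
                q[parent] = total        # parent absorbs v and its sibling
                del q[v]
                q.pop(sibling, None)
    return q


def build_digest(values: list[int], k: int, s: int) -> dict[int, int]:
    """Hypothetical helper: count each reading in its leaf, then compress."""
    leaves = Counter(s + x - 1 for x in values)   # value x lives in leaf s + x - 1
    return compress(dict(leaves), len(values), k, s)
```

For example, with s = 8 and k = 5, the fifteen made-up readings 1, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8 compress to {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}.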

13
Building a q-digest
14
How does k influence the digest?

Lower k implies higher n/k
Higher n/k implies more chance that a given node gets merged with its children
Higher degree of compression
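The effect of k can be seen by compressing the same readings with several values of k. This sketch repeats a COMPRESS function under a hypothetical dict-based, heap-numbered representation: the root is 1, the parent of v is v // 2, the sibling is v ^ 1, value x lives in leaf s + x - 1, and s is a power of two. The readings are made-up illustration data.

```python
from collections import Counter


def compress(q: dict[int, int], n: int, k: int, s: int) -> dict[int, int]:
    """Bottom-up COMPRESS: merge a node and its sibling into the parent
    whenever the three counts together fall below floor(n/k)."""
    q = dict(q)
    threshold = n // k
    for depth in range(s.bit_length() - 1, 0, -1):   # leaves up to depth 1
        for v in sorted(v for v in q if v.bit_length() - 1 == depth):
            if v not in q:
                continue
            sibling, parent = v ^ 1, v // 2
            total = q[v] + q.get(sibling, 0) + q.get(parent, 0)
            if total < threshold:
                q[parent] = total
                del q[v]
                q.pop(sibling, None)
    return q


def digest_size(values: list[int], k: int, s: int) -> int:
    """Number of buckets left after compressing the readings with parameter k."""
    leaves = Counter(s + x - 1 for x in values)
    return len(compress(dict(leaves), len(values), k, s))


READINGS = [1, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8]   # made-up example data
sizes = {k: digest_size(READINGS, k, 8) for k in (1, 2, 5, 15)}
```

For these readings, the digest shrinks from 7 buckets at k = 15 to 5, 3, and 2 buckets at k = 5, 2, and 1. Each size stays within the 3k bound.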

15
Observations about the digest

The world remembers you if you show up more often
Detailed information about frequently occurring data values is preserved in the digest
Less frequently occurring values are lumped into larger buckets, resulting in information loss

16
Merging q-digests
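The merge figure did not survive extraction. The sketch below shows one plausible reading of MERGE: add the counts of identical buckets, then run COMPRESS again with n = Σ n_i. This is an assumption, not the authors' code. The digest representation (a dict of heap-numbered nodes: root 1, parent v // 2, sibling v ^ 1, s a power of two) and the function names are hypothetical.

```python
def compress(q: dict[int, int], n: int, k: int, s: int) -> dict[int, int]:
    """Bottom-up COMPRESS over a heap-numbered digest {node id: count}."""
    q = dict(q)
    threshold = n // k                                # floor(n/k)
    for depth in range(s.bit_length() - 1, 0, -1):    # leaves up to depth 1
        for v in sorted(v for v in q if v.bit_length() - 1 == depth):
            if v not in q:
                continue
            sibling, parent = v ^ 1, v // 2
            total = q[v] + q.get(sibling, 0) + q.get(parent, 0)
            if total < threshold:
                q[parent] = total
                del q[v]
                q.pop(sibling, None)
    return q


def merge(parts: list[tuple[dict[int, int], int]], k: int,
          s: int) -> tuple[dict[int, int], int]:
    """Hypothetical MERGE: add counts of identical buckets, then re-compress.

    parts holds (digest, n_i) pairs; returns the merged digest and sum of n_i.
    """
    combined: dict[int, int] = {}
    n = 0
    for digest, n_i in parts:
        n += n_i
        for v, count in digest.items():
            combined[v] = combined.get(v, 0) + count
    return compress(combined, n, k, s), n
```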
17
Space Complexity Error Bound

A q-digest constructed with compression parameter k has at most 3k buckets
In a q-digest created using compression parameter k, the maximum error in the count of any node is at most $n \log s / k$

18
Space Complexity Error Bound (contd.)

Given p q-digests Q_1, Q_2, ..., Q_p, built on n_1, n_2, ..., n_p values, each with maximum relative error $(\log s)/k$, the algorithm MERGE combines them into a q-digest for $\sum_i n_i$ values with the same relative error
Given memory m to build a q-digest, it is possible to answer any quantile query with error $\varepsilon \le 3 (\log s)/m$
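The following short derivation connects the two bounds. It assumes memory m is counted in buckets (the slides do not give the unit), that 3 divides m, and that logarithms are base 2:

```latex
\begin{aligned}
|Q| \le 3k \le m \;&\Longrightarrow\; \text{choose } k = m/3,\\
\varepsilon \le \frac{\log s}{k} &= \frac{3\log s}{m}.
\end{aligned}
```

For example, with s = 1024 (log s = 10) and m = 600 buckets, k = 200 and ε ≤ 10/200 = 0.05.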

19
Quantile query

Given a fraction $q \in (0, 1)$, find the value whose rank in the sorted sequence is $qn$
Strategy
Do a post-order traversal and list ⟨node id, count⟩ tuples
This arranges the nodes in increasing order of their right end-points
Scan the list, adding up the counts. When the running sum exceeds $qn$, report v.max of the current node as the quantile estimate
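The strategy can be sketched in Python as an illustration under assumptions. The digest is a dict from heap-numbered node ids to counts (root 1, children 2v and 2v + 1, s a power of two). Instead of an explicit post-order traversal, the nodes are sorted by right end-point, with ties broken by putting smaller ranges first, which gives the same order. node_range and quantile are hypothetical names.

```python
def node_range(v: int, s: int) -> tuple[int, int]:
    """Bucket range (v.min, v.max) of heap-numbered node v over values 1..s."""
    height = (s.bit_length() - 1) - (v.bit_length() - 1)
    return (v << height) - s + 1, ((v + 1) << height) - s


def quantile(q: dict[int, int], frac: float, s: int) -> int:
    """Estimate the value whose rank is frac * n, with n = total count in q."""
    n = sum(q.values())

    def post_order_key(v: int) -> tuple[int, int]:
        lo, hi = node_range(v, s)
        return hi, hi - lo          # by right end-point, smaller range first

    running = 0
    for v in sorted(q, key=post_order_key):
        running += q[v]
        if running > frac * n:      # first time the running sum exceeds qn
            return node_range(v, s)[1]
    raise ValueError("frac must lie in (0, 1) and q must be non-empty")
```

Feeding it the tuples from the median example, {10: 4, 11: 6, 6: 2, 7: 2, 1: 1} with s = 8 (an assumed value that fits those node ids), returns 4 for frac = 0.5, matching the slide.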

20
Quantile query (contd.)

Take q = 0.5 (median)
$qn = 0.5 \times 15 = 7.5$
The ⟨node id, count⟩ tuples from the post-order traversal are ⟨10, 4⟩, ⟨11, 6⟩, ⟨6, 2⟩, ⟨7, 2⟩, ⟨1, 1⟩
The cumulative count at ⟨11, 6⟩ is 4 + 6 = 10 > 7.5
So the estimated median is 4.

21
Other queries

Inverse quantile: given a value x, determine its rank in the sorted sequence of input values
Range query: find the number of values in a given range [low, high]
Frequent items: given a fraction $s \in (0, 1)$, find all values reported by more than $sn$ sensors
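Inverse quantile and range queries can reuse the same ordered view of the buckets. The minimal sketch below makes several assumptions. The digest is a dict of heap-numbered nodes (root 1, children 2v and 2v + 1, s a power of two). rank(x) is estimated as the total count of buckets whose right end-point is below x, which is one natural estimator; the paper's exact formula may differ. The function names are hypothetical.

```python
def node_range(v: int, s: int) -> tuple[int, int]:
    """Bucket range (v.min, v.max) of heap-numbered node v over values 1..s."""
    height = (s.bit_length() - 1) - (v.bit_length() - 1)
    return (v << height) - s + 1, ((v + 1) << height) - s


def rank(q: dict[int, int], x: int, s: int) -> int:
    """Inverse quantile: estimated number of input values smaller than x."""
    return sum(c for v, c in q.items() if node_range(v, s)[1] < x)


def range_count(q: dict[int, int], low: int, high: int, s: int) -> int:
    """Range query: estimated number of values in [low, high]."""
    return rank(q, high + 1, s) - rank(q, low, s)
```

On the five-bucket digest from the median example (with s = 8 assumed), rank(5) is 4 + 6 = 10, and the range [5, 8] receives the remaining 5 values. The root bucket [1, 8] is counted in that range even though its values could lie anywhere, which is where the approximation error comes from.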

22
Simulation setup

Experiments with random and correlated values
Results compared against a naïve scheme without aggregation, called list
At each node, list contains all the distinct sensor values that occur in the subtree rooted at that node
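For comparison, the list baseline can be sketched as a post-order union of value sets over the routing tree. The tree encoding (a dict from node to children), the readings dict, and the assumption that the base station itself has no reading are illustrative choices, not details from the slides. The message a node sends is the set it returns, so its size grows with the number of distinct values below it.

```python
def list_messages(children: dict[int, list[int]], readings: dict[int, int],
                  root: int) -> dict[int, set[int]]:
    """Return, for every node, the set of distinct values in its subtree."""
    messages: dict[int, set[int]] = {}

    def visit(v: int) -> set[int]:
        # Nodes without a reading (e.g. the base station) start empty.
        values = {readings[v]} if v in readings else set()
        for child in children.get(v, []):
            values |= visit(child)   # union of the children's messages
        messages[v] = values
        return values

    visit(root)
    return messages
```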

23
Range Queries and Histograms
24
Accuracy and Message Size
Error declines very rapidly with growing message
size.
25
Accuracy and Message Size

Regardless of n, q-digest needs messages no bigger than 400 bytes to achieve 2% error
For random data, the message size for list increases steadily with n
For correlated data, there are only 1500 distinct values, so the maximum message size for list plateaus as the number of sensors increases

26
Power consumption