Title: TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks
1TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks
- Samuel Madden
- UC Berkeley
- with
- Michael Franklin, Joseph Hellerstein, and Wei Hong
- December 9th, 2002 @ OSDI
2TAG Introduction
- What is a sensor network?
- Programming Sensor Networks Is Hard
- Declarative Queries Are Easy
- Tiny Aggregation (TAG): in-network processing via declarative queries!
- Example
- Vehicle tracking application: 2 weeks for 2 students
- Vehicle tracking query: took 2 minutes to write, worked just as well!
SELECT MAX(mag) FROM sensors WHERE mag > thresh EPOCH DURATION 64ms
3Overview
- Sensor Networks
- Queries in Sensor Nets
- Tiny Aggregation
- Overview
- Simulation Results
5Device Capabilities
- Mica Motes
- 8-bit, 4 MHz processor
- Roughly a PC AT
- 40 kbit/s CSMA radio
- 4 KB RAM, 128 KB flash, 512 KB EEPROM
- TinyOS based
- Variety of other, similar platforms exist
- UCLA WINS, Medusa, Princeton ZebraNet, MIT Cricket
6Sensor Net Sample Apps
- Habitat monitoring: storm petrels on Great Duck Island, microclimates on James Reserve.
- Earthquake monitoring in shake-test sites.
- Vehicle detection: sensors along a road collect data about passing vehicles.
- Traditional monitoring apparatus.
7Metric: Communication
- Lifetime from one pair of AA batteries
- 2-3 days at full power
- 6 months at 2% duty cycle
- Communication dominates cost
- 100s of µs to compute
- 30 ms to send a message
- Our metric: communication!
8Communication In Sensor Nets
- Radio communication has high link-level losses
- typically about 20% @ 5 m
- Ad-hoc neighbor discovery
- Tree-based routing
9Overview
- Sensor Networks
- Queries in Sensor Nets
- Tiny Aggregation
- Overview
- Optimizations Results
10Declarative Queries for Sensor Networks
- Examples
- SELECT nodeid, light
- FROM sensors
- WHERE light > 400
- EPOCH DURATION 1s
11Overview
- Sensor Networks
- Queries in Sensor Nets
- Tiny Aggregation
- Overview
- Optimizations Results
12TAG
- In-network processing of aggregates
- Common data analysis operation
- Aka gather operation or reduction in parallel programming
- Communication reducing
- Benefit is operation dependent
- Across nodes during same epoch
- Exploit semantics to improve efficiency!
13Query Propagation
SELECT COUNT(*)
14Pipelined Aggregates
- In each epoch
- Each node samples local sensors once
- Generates partial state record (PSR) from
- local readings
- readings from children
- Outputs PSR from previous epoch.
- After (depth-1) epochs, PSR for the whole tree is output at root
- To avoid combining PSRs from different epochs, sensors must cache values from children
Value from 2 produced at time t arrives at 1 at time (t+1)
Value from 5 produced at time t arrives at 1 at time (t+3)
15Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: routing tree of depth d]
16-20Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: partial count output by each sensor in each epoch; sensor 1 is the root]
Epoch   Sensor 1   Sensor 2   Sensor 3   Sensor 4   Sensor 5
1       1          1          1          1          1
2       3          1          2          2          1
3       4          1          3          2          1
4       5          1          3          2          1
5       5          1          3          2          1
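The pipelining rule can be sketched in a few lines of Python. This is a minimal illustration, not TAG's implementation: in each epoch a node outputs 1 (its own reading) plus the partial counts its children output in the previous epoch. The five-node tree (2 and 3 report to root 1, 4 reports to 3, 5 reports to 4) is an assumption chosen to match the arrival times quoted on the Pipelined Aggregates slide; `PARENT` and `simulate_count` are hypothetical names.

```python
# Assumed routing tree: child -> parent. Node 1 is the root.
PARENT = {2: 1, 3: 1, 4: 3, 5: 4}


def simulate_count(parent, root, epochs):
    """Return, per epoch, the partial COUNT each node outputs."""
    nodes = sorted(set(parent) | {root})
    children = {n: [] for n in nodes}
    for child, par in parent.items():
        children[par].append(child)

    previous = {n: 0 for n in nodes}  # nothing heard before the first epoch
    history = []
    for _ in range(epochs):
        # Own reading (1) plus the children's PSRs from the previous epoch.
        current = {n: 1 + sum(previous[c] for c in children[n]) for n in nodes}
        history.append(current)
        previous = current
    return history


if __name__ == "__main__":
    for epoch, counts in enumerate(simulate_count(PARENT, 1, 5), start=1):
        print(epoch, [counts[n] for n in sorted(counts)])
```

Under these assumptions the root's count only covers the whole tree once readings from the deepest node have had time to travel up, which is the (depth-1) epoch delay described above.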
21Aggregation Framework
- As in extensible databases, we support any aggregation function conforming to:
Agg_n = {f_init, f_merge, f_evaluate}
f_init{a0} → <a0>
f_merge{<a1>, <a2>} → <a12>
f_evaluate{<a1>} → aggregate value
(Merge is associative and commutative!)
<a0>, <a1>, <a2>, <a12>: Partial State Record (PSR)
Example: Average
AVG_init{v} → <v, 1>
AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
AVG_evaluate{<S, C>} → S/C
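As a rough illustration of the init / merge / evaluate decomposition, here is AVERAGE written as three Python functions over a (sum, count) partial state record. The function names and the sample readings are hypothetical; the structure follows the framework on this slide.

```python
from functools import reduce


def avg_init(value):
    """Turn one sensor reading into a partial state record (sum, count)."""
    return (value, 1)


def avg_merge(a, b):
    """Combine two partial state records; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])


def avg_evaluate(psr):
    """Compute the final aggregate from a partial state record."""
    total, count = psr
    return total / count


if __name__ == "__main__":
    readings = [10, 20, 60]  # made-up readings
    root_psr = reduce(avg_merge, (avg_init(v) for v in readings))
    print(root_psr, avg_evaluate(root_psr))
```

Because the merge step is associative and commutative, the order in which records arrive from different children does not change the result.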
22Types of Aggregates
- SQL supports MIN, MAX, SUM, COUNT, AVERAGE
- Any function can be computed via TAG
- In network benefit for many operations
- E.g. standard deviation, top/bottom N, spatial union/intersection, histograms, etc.
- Compactness of PSR
23Taxonomy of Aggregates
- TAG insight: classify aggregates according to various functional properties
- Yields a general set of optimizations that can automatically be applied
24TAG Advantages
- Communication Reduction
- Important for power and contention
- Continuous stream of results
- In the absence of faults, will converge to right answer
- Lots of optimizations
- Based on shared radio channel
- Semantics of operators
25Simulation Environment
- Evaluated via simulation
- Coarse grained event based simulator
- Sensors arranged on a grid
- Two communication models
- Lossless: all neighbors hear all messages
- Lossy: messages lost with probability that increases with distance
26Simulation Result
- Simulation Results
- 2500 Nodes
- 50x50 Grid
- Depth 10
- Neighbors 20
Some aggregates require dramatically more state!
27Optimization: Channel Sharing (Snooping)
- Insight: shared channel enables optimizations
- Suppress messages that won't affect aggregate
- E.g., MAX
- Applies to all exemplary, monotonic aggregates
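A small Python sketch of snooping for MAX, under simplifying assumptions that are not from the talk: all nodes share one broadcast domain, hear every message, and transmit in a fixed order. A node stays silent when a value it has already overheard is at least as large as its own. The name `snooping_max` and the readings are hypothetical.

```python
def snooping_max(readings):
    """Return the messages actually sent when nodes snoop on each other.

    readings: list of (node_id, value) in transmission order.
    """
    best_heard = None
    sent = []
    for node_id, value in readings:
        # Transmit only if this value would raise the MAX heard so far.
        if best_heard is None or value > best_heard:
            sent.append((node_id, value))
            best_heard = value
    return sent


if __name__ == "__main__":
    readings = [("a", 5), ("b", 3), ("c", 9), ("d", 9)]
    print(snooping_max(readings))
```

In this example two of the four nodes suppress their messages, and the largest value sent is still the true maximum.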
28Optimization: Hypothesis Testing
- Insight: guess from root can be used for suppression
- E.g. MIN
- Works for monotonic exemplary aggregates
- Also summary, if imprecision allowed
- How is hypothesis computed?
- Blind or statistically informed guess
- Observation over network subset
29Experiment: Hypothesis Testing
- Uniform Value Distribution, Dense Packing, Ideal
Communication
30Optimization: Use Multiple Parents
- For duplicate insensitive aggregates
- Or aggregates that can be expressed as a linear combination of parts
- Send (part of) aggregate to all parents
- In just one message, via broadcast
- Decreases variance
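To see why splitting decreases variance, the sketch below enumerates every delivery outcome for a partial COUNT that is divided evenly among several parents. It assumes each link drops its message independently with the same probability, which is a simplification; `delivered_stats` and the numbers are hypothetical.

```python
from itertools import product


def delivered_stats(count, n_parents, p_loss):
    """Exact mean and variance of the count that reaches the next level."""
    share = count / n_parents  # each parent receives an equal part
    mean = 0.0
    second_moment = 0.0
    for outcome in product([0, 1], repeat=n_parents):  # 1 = delivered
        prob = 1.0
        for delivered in outcome:
            prob *= (1 - p_loss) if delivered else p_loss
        total = share * sum(outcome)
        mean += prob * total
        second_moment += prob * total ** 2
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    for parents in (1, 2):
        print(parents, delivered_stats(10, parents, 0.2))
```

With independent losses the expected value is the same for one or two parents, while the variance is halved with two.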
31Multiple Parents Results
- Better than previous analysis expected!
- Losses aren't independent!
- Insight: spreads data over many links
32Summary
- TAG enables in-network declarative query processing
- State dependent communication benefit
- Transparent optimization via taxonomy
- Hypothesis Testing
- Parent Sharing
- Declarative queries are the right interface for data collection in sensor nets!
- Easier to program and more efficient for vast majority of users
TinyDB Release Available - http://telegraph.cs.berkeley.edu/tinydb
33Questions?
- TinyDB Demo After The Session
34TinyOS
- Operating system from David Culler's group at Berkeley
- C-like programming environment
- Provides messaging layer, abstractions for major hardware components
- Split phase, highly asynchronous, interrupt-driven programming model
Hill, Szewczyk, Woo, Culler, Pister. System Architecture Directions for Networked Sensors. ASPLOS 2000. See http://webs.cs.berkeley.edu/tos
35In-Network Processing in TinyDB
- SELECT AVG(light)
- EPOCH DURATION 4s
- Cost metric: msgs
- 16 nodes
- 150 Epochs
- In-net loss rates: 5%
- External loss: 15%
- Network depth: 4
36Grouping
- Recall: GROUP BY expression partitions sensors into distinct logical groups
- E.g. partition sensors by room number
- If query is grouped, sensors apply expression on each epoch
- PSRs tagged with group
- When a PSR (with group) is received
- If it belongs to a stored group, merge with existing PSR
- If not, just store it
- At the end of each epoch, transmit one PSR per group
- Need to evict if storage overflows.
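A minimal Python sketch of the per-group bookkeeping described on this slide, using AVERAGE with a (sum, count) partial state record. The function names and the room labels are hypothetical, and storage limits are ignored here.

```python
def avg_merge(a, b):
    """Merge two (sum, count) partial state records."""
    return (a[0] + b[0], a[1] + b[1])


def receive_psr(store, group, psr):
    """Merge an incoming PSR into its stored group, or store it if new."""
    if group in store:
        store[group] = avg_merge(store[group], psr)
    else:
        store[group] = psr


def end_of_epoch(store):
    """Emit one (group, PSR) pair per group, then reset for the next epoch."""
    out = sorted(store.items())
    store.clear()
    return out


if __name__ == "__main__":
    store = {}
    receive_psr(store, "room1", (20, 1))
    receive_psr(store, "room2", (30, 1))
    receive_psr(store, "room1", (24, 1))
    print(end_of_epoch(store))
```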
37Group Eviction
- Problem: number of groups in any one iteration may exceed available storage on sensor
- Solution: evict! (Partial Preaggregation)
- Choose one or more groups to forward up tree
- Rely on nodes further up tree, or root, to recombine groups properly
- What policy to choose?
- Intuitively: least popular group, since we don't want to evict a group that will receive more values this epoch.
- Experiments suggest
- Policy matters very little
- Evicting as many groups as will fit into a single message is good
Per-Åke Larson. Data Reduction by Partial Preaggregation. ICDE 2002.
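One way the eviction step could look, sketched in Python under stated assumptions: PSRs are (sum, count) pairs, "least popular" is taken to mean the stored group with the smallest count so far, and only one group is evicted at a time. The name `insert_with_eviction` and the `capacity` parameter are hypothetical; the talk notes that the choice of policy matters very little.

```python
def avg_merge(a, b):
    """Merge two (sum, count) partial state records."""
    return (a[0] + b[0], a[1] + b[1])


def insert_with_eviction(store, group, psr, capacity):
    """Store or merge a PSR; return any (group, PSR) evicted to make room.

    Evicted records would be forwarded up the tree for recombination.
    """
    if group in store:
        store[group] = avg_merge(store[group], psr)
        return []
    evicted = []
    if len(store) >= capacity:
        # Assumed policy: evict the group with the fewest merged readings.
        victim = min(store, key=lambda g: store[g][1])
        evicted.append((victim, store.pop(victim)))
    store[group] = psr
    return evicted


if __name__ == "__main__":
    store = {}
    for group, psr in [("g1", (10, 1)), ("g2", (20, 1)), ("g2", (5, 1)), ("g3", (7, 1))]:
        print(group, insert_with_eviction(store, group, psr, capacity=2))
    print(store)
```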
38Declarative Benefits In Sensor Networks
- Vastly simplifies execution for large networks
- Since locations are described by predicates
- Operations are over groups
- Enables tolerance to faults
- Since system is free to choose where and when operations happen
- Data independence
- System is free to choose where data lives, how it is represented
39Simulation Screenshot
40Hypothesis Testing For Average
- AVERAGE: each node suppresses readings within some Δ of an approximate average µ.
- Parents assume children who don't report have value µ
- Computed average cannot be off by more than Δ.
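A small numeric sketch of this suppression rule in Python. It assumes the root knows how many nodes there are, so that silent nodes can be counted as reporting the guess; the tolerance is written `delta` and the guess `mu`. The function name and readings are hypothetical.

```python
def approximate_average(readings, mu, delta):
    """Return (estimated average, number of messages actually sent)."""
    # Nodes whose reading is within delta of the guess stay silent.
    reported = [v for v in readings if abs(v - mu) > delta]
    silent = len(readings) - len(reported)
    # Silent nodes are assumed to hold exactly mu.
    estimate = (sum(reported) + silent * mu) / len(readings)
    return estimate, len(reported)


if __name__ == "__main__":
    readings = [10, 11, 9, 20, 10.5]  # made-up readings
    estimate, messages = approximate_average(readings, mu=10, delta=1.5)
    true_average = sum(readings) / len(readings)
    print(estimate, true_average, messages)
```

Each silent node contributes an error of at most `delta` to the sum, so the estimated average differs from the true one by at most `delta`.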
41TinyAlloc
- Handle Based Compacting Memory Allocator
- For Catalog, Queries
Handle h
call MemAlloc.alloc(h,10)
(h)0 Sam
call MemAlloc.lock(h)
tweakString(h)
call MemAlloc.unlock(h)
call MemAlloc.free(h)
User Program
Compaction
42Schema
- Attribute & Command IF
- At INIT(), components register attributes and commands they support
- Commands implemented via wiring
- Attributes fetched via accessor command
- Catalog API allows local and remote queries over known attributes / commands.
- Demo of adding an attribute, executing a command.
43Q1: Expressiveness
- Simple data collection satisfies most users
- How much of what people want to do is just simple aggregates?
- Anecdotally, most of it
- EE people want filters + simple statistics (unless they can have signal processing)
- However, we'd like to satisfy everyone!
44Query Language
- New Features
- Joins
- Event-based triggers
- Via extensible catalog
- In network nested queries
- Split-phase (offline) delivery
- Via buffers
45Sample Query 1
- Bird counter
- CREATE BUFFER birds(uint16 cnt)
- SIZE 1
- ON EVENT bird-enter()
- SELECT b.cnt+1
- FROM birds AS b
- OUTPUT INTO b
- ONCE
46Sample Query 2
- Birds that entered and left within time t of each other
- ON EVENT bird-leave AND bird-enter WITHIN t
- SELECT bird-leave.time, bird-leave.nest
- WHERE bird-leave.nest = bird-enter.nest
- ONCE
47Sample Query 3
- Delta compression
- SELECT light
- FROM buf, sensors
- WHERE |s.light - buf.light| > t
- OUTPUT INTO buf
- SAMPLE PERIOD 1s
48Sample Query 4
- Offline Delivery Event Chaining
- CREATE BUFFER equake_data(uint16 loc, uint16 xAccel, uint16 yAccel)
- SIZE 1000
- PARTITION BY NODE
- SELECT xAccel, yAccel
- FROM SENSORS
- WHERE xAccel > t OR yAccel > t
- SIGNAL shake_start()
- SAMPLE PERIOD 1s
- ON EVENT shake_start()
- SELECT loc, xAccel, yAccel
- FROM sensors
- OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel)
- SAMPLE PERIOD 10ms
49Event Based Processing
- Enables internal and chained actions
- Language Semantics
- Events are inter-node
- Buffers can be global
- Implementation plan
- Events and buffers must be local
- Since n-to-n communication not (well) supported
- Next: operator expressiveness
50Attribute Driven Topology Selection
- Observation: internal queries often over local area
- Or some other subset of the network
- E.g. regions with light value in [10,20]
- Idea: build topology for those queries based on values of range-selected attributes
- Requires range attributes, connectivity to be relatively static
Heidemann et al. Building Efficient Wireless Sensor Networks with Low-Level Naming. SOSP 2001.
51Attribute Driven Query Propagation
SELECT ... WHERE a > 5 AND a < ...
Precomputed intervals = Query Dissemination Index
[Figure: node 4 and nodes 1, 2, 3 with precomputed intervals [1,10], [20,40], [7,15]]
52Attribute Driven Parent Selection
Even without intervals, expect that sending to parent with closest value will help
[Figure: node 4 with interval [3,6] and candidate parents 1, 2, 3 with intervals [1,10], [20,40], [7,15]]
[3,6] ∩ [1,10] = [3,6]
[3,7] ∩ [7,15] = ø
[3,7] ∩ [20,40] = ø
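A hedged Python sketch of interval-based parent selection: a node picks the candidate parent whose attribute interval overlaps its own the most, and falls back to the nearest interval when none overlap. The scoring rule and the name `choose_parent` are assumptions made for illustration; the interval values are the ones shown on this slide.

```python
def interval_score(own, other):
    """Positive: length of the overlap. Negative: size of the gap."""
    return min(own[1], other[1]) - max(own[0], other[0])


def choose_parent(own, candidates):
    """Pick the candidate whose interval is the best match for `own`.

    candidates: dict mapping parent id -> (low, high) interval.
    """
    return max(candidates, key=lambda p: interval_score(own, candidates[p]))


if __name__ == "__main__":
    parents = {1: (1, 10), 2: (20, 40), 3: (7, 15)}
    print(choose_parent((3, 6), parents))
    print(choose_parent((16, 18), parents))
```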
53Hot off the press
54Grouping
- GROUP BY expr
- expr is an expression over one or more attributes
- Evaluation of expr yields a group number
- Each reading is a member of exactly one group
- Example: SELECT max(light) FROM sensors
- GROUP BY TRUNC(temp/10)
Result
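The grouped query above can be mimicked in Python as follows. The readings are made up for illustration, and `group_max_light` is a hypothetical helper, not part of TinyDB.

```python
import math


def group_max_light(readings):
    """Compute max(light) per group, where group = TRUNC(temp / 10).

    readings: iterable of (temp, light) pairs.
    """
    result = {}
    for temp, light in readings:
        group = math.trunc(temp / 10)
        result[group] = max(result.get(group, light), light)
    return result


if __name__ == "__main__":
    sample = [(12, 45), (18, 27), (25, 66), (29, 68), (31, 55)]
    print(group_max_light(sample))
```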
55Having
- HAVING preds
- preds filters out groups that do not satisfy predicate
- versus WHERE, which filters out tuples that do not satisfy predicate
- Example
- SELECT max(temp) FROM sensors
- GROUP BY light
- HAVING max(temp) < 100
- Yields all groups with temperature under 100
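The difference between HAVING and WHERE can be shown with a short Python sketch. The readings, the limit of 100, and both function names are hypothetical.

```python
def group_then_having(readings, limit):
    """GROUP BY light, then keep only groups whose max(temp) < limit."""
    groups = {}
    for light, temp in readings:
        groups[light] = max(groups.get(light, temp), temp)
    return {g: m for g, m in groups.items() if m < limit}


def where_then_group(readings, limit):
    """Drop tuples with temp >= limit first, then GROUP BY light."""
    groups = {}
    for light, temp in readings:
        if temp < limit:
            groups[light] = max(groups.get(light, temp), temp)
    return groups


if __name__ == "__main__":
    sample = [(1, 90), (1, 120), (2, 80), (3, 99)]
    print(group_then_having(sample, 100))
    print(where_then_group(sample, 100))
```

With HAVING the whole group for light value 1 disappears because its maximum exceeds the limit; with WHERE only the offending tuple is dropped and the group survives.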
57Experiment: Basic TAG
- Dense Packing, Ideal Communication
59Experiment: Effects of Loss
60Experiment: Benefit of Cache
61Pipelined Aggregates
- After query propagates, during each epoch
- Each sensor samples local sensors once
- Combines them with PSRs from children
- Outputs PSR representing aggregate state in the previous epoch.
- After (d-1) epochs, PSR for the whole tree output at root
- d = depth of the routing tree
- If desired, partial state from top k levels could be output in kth epoch
- To avoid combining PSRs from different epochs, sensors must cache values from children
Value from 2 produced at time t arrives at 1 at time (t+1)
Value from 5 produced at time t arrives at 1 at time (t+3)
62Pipelining Example
63Pipelining Example
Epoch 0
64Pipelining Example
Epoch 1
65Pipelining Example
Epoch 2
66Pipelining Example
Epoch 3
67Pipelining Example
Epoch 4
68Our Stream Semantics
- One stream, sensors
- We control data rates
- Joins between that stream and buffers are allowed
- Joins are always landmark, forward in time, one tuple at a time
- Result of queries over sensors either a single tuple (at time of query) or a stream
- Easy to interface to more sophisticated systems
- Temporal aggregates enable fancy window operations
69Formal Spec.
- ON EVENT ... WITHIN
SELECT agg() temporalagg()
FROM sensors events
WHERE
GROUP BY
HAVING
ACTION WHERE BUFFER
SIGNAL () (SELECT ... ) INTO BUFFER
SAMPLE PERIOD
FOR
INTERPOLATE
COMBINE temporal_agg()
ONCE
70Buffer Commands
- AT
- CREATE BUFFER ()
- PARTITION BY
- SIZE ,
- AS SELECT ...
- SAMPLE PERIOD
- DROP BUFFER