In-Network Query Processing - PowerPoint PPT Presentation
Slides: 57

Transcript and Presenter's Notes
1
In-Network Query Processing
  • Sam Madden
  • CS294-1
  • 9/30/03

2
Outline
  • TinyDB
  • And demo!
  • Aggregate Queries
  • ACQP
  • Break
  • Adaptive Operator Placement

3
Outline
  • TinyDB
  • And demo!
  • Aggregate Queries
  • ACQP
  • Break
  • Adaptive Operator Placement

4
Programming Sensor Nets Is Hard
  • Months of lifetime required from small batteries
  • 3-5 days naively; can't recharge often
  • Interleave sleep with processing
  • Lossy, low-bandwidth, short range communication
  • Nodes coming and going
  • Multi-hop
  • Remote, zero administration deployments
  • Highly distributed environment
  • Limited Development Tools
  • Embedded, LEDs for Debugging!

High-Level Abstraction Is Needed!
5
A Solution Declarative Queries
  • Users specify the data they want
  • Simple, SQL-like queries
  • Using predicates, not specific addresses
  • Our system TinyDB
  • Challenge is to provide
  • Expressive easy-to-use interface
  • High-level operators
  • Transparent Optimizations that many programmers
    would miss
  • Sensor-net specific techniques
  • Power efficient execution framework

6
TinyDB Demo
7
TinyDB Architecture
  • Schema
  • Catalog of commands & attributes
  • Query Processor
  • ~10,000 lines embedded C code; ~5,000 lines (PC-side) Java
  • 3200 bytes RAM (w/ 768-byte heap)
  • 58 kB compiled code (3x larger than the 2nd-largest TinyOS program)

(Diagram: TinyDB query processor and schema running atop TinyOS, connected over a multihop network; example filter: light > 400)
8
Declarative Queries for Sensor Networks
Find the sensors in bright nests.
Sensors
  • Examples
  • SELECT nodeid, nestNo, light
  • FROM sensors
  • WHERE light > 400
  • EPOCH DURATION 1s

9
Aggregation Queries
Count the number of occupied nests in each loud
region of the island.
10
Benefits of Declarative Queries
  • Specification of whole-network behavior
  • Simple, safe
  • Complex behavior via multiple queries, app logic
  • Optimizable
  • Exploit (non-obvious) interactions
  • E.g.
  • ACQP operator ordering, Adaptive join operator
    placement, Lifetime selection, Topology selection
  • Versus other approaches, e.g., Diffusion
  • Black box filter operators
  • Intanagonwiwat et al., Directed Diffusion, MobiCom
    2000

11
Outline
  • TinyDB
  • And demo!
  • Aggregate Queries
  • ACQP
  • Break
  • Adaptive Operator Placement

12
Tiny Aggregation (TAG)
  • Not in today's reading
  • In-network processing of aggregates
  • Common data analysis operation
  • Aka the "gather" operation, or "reduction" in parallel
    programming
  • Communication reducing
  • Operator dependent benefit
  • Exploit query semantics to improve efficiency!

Madden, Franklin, Hellerstein, Hong. Tiny
AGgregation (TAG), OSDI 2002.
13
Query Propagation Via Tree-Based Routing
  • Tree-based routing
  • Used in
  • Query delivery
  • Data collection
  • Topology selection is important e.g.
  • Krishnamachari, DEBS 2002, Intanagonwiwat, ICDCS
    2002, Heidemann, SOSP 2001
  • LEACH/SPIN, Heinzelman et al. MOBICOM 99
  • SIGMOD 2003
  • Continuous process
  • Mitigates failures

14
Basic Aggregation
  • In each epoch:
  • Each node samples local sensors once
  • Generates a partial state record (PSR) combining
  • local readings
  • readings from children
  • Outputs PSR during its assigned comm. interval
  • Communication scheduling for power reduction
  • At end of epoch, the PSR for the whole network is output at
    the root
  • New result on each successive epoch
  • Extras
  • Predicate-based partitioning via GROUP BY
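The per-epoch flow above can be sketched in Python for COUNT: each node contributes a PSR of 1, children's PSRs merge into their parent before it transmits, and the root ends the epoch with the network-wide count. (The topology and helper names are illustrative assumptions; the real implementation is embedded C.)

```python
# Routing tree for the sketch: parent[node] -> its parent; 0 is the root.
parent = {1: 0, 2: 0, 3: 1, 4: 1}

def depth(parent, n):
    """Hop distance from node n to the root."""
    d = 0
    while n in parent:
        n = parent[n]
        d += 1
    return d

def epoch_count(parent):
    """One epoch of in-network COUNT: every node starts with a local
    partial count of 1; nodes transmit deepest-first, merging each
    child's PSR into its parent, so the root outputs the total."""
    nodes = set(parent) | set(parent.values())
    psr = {n: 1 for n in nodes}  # local reading: count myself
    for node in sorted(parent, key=lambda n: -depth(parent, n)):
        psr[parent[node]] += psr[node]  # merge child PSR into parent
    root = next(iter(set(parent.values()) - set(parent)))
    return psr[root]

print(epoch_count(parent))  # 5
```

With the four-node tree rooted at node 0, the root reports 5, matching the COUNT animation on the following slides.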

15
Illustration: Aggregation
SELECT COUNT(*) FROM sensors
(Animation, plotted as sensor ID vs. time: a leaf reports a partial count of 1)

16
Illustration: Aggregation
SELECT COUNT(*) FROM sensors
(Animation: a merged partial count of 2 moves up the tree)

17
Illustration: Aggregation
SELECT COUNT(*) FROM sensors
(Animation: partial counts of 1 and 3 are merged)

18
Illustration: Aggregation
SELECT COUNT(*) FROM sensors
(Animation: the root outputs the total count, 5)

19
Illustration: Aggregation
SELECT COUNT(*) FROM sensors
(Animation: a new epoch begins with a fresh partial count of 1)
20
Aggregation Framework
  • As in extensible databases, TAG supports any
    aggregation function conforming to

Agg_n = ⟨f_init, f_merge, f_evaluate⟩
  f_init(a0) → ⟨a0⟩
  f_merge(⟨a1⟩, ⟨a2⟩) → ⟨a12⟩
  f_evaluate(⟨a1⟩) → aggregate value

⟨a⟩ : Partial State Record (PSR)

Example, AVERAGE:
  AVG_init(v) → ⟨v, 1⟩
  AVG_merge(⟨S1, C1⟩, ⟨S2, C2⟩) → ⟨S1 + S2, C1 + C2⟩
  AVG_evaluate(⟨S, C⟩) → S / C

Restriction: f_merge must be associative and commutative.
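The AVERAGE example above translates directly into Python (a sketch mirroring the slide's f_init / f_merge / f_evaluate triple; the names are ours, not TinyDB's actual C API):

```python
def avg_init(v):
    """Turn one sensor reading into a PSR: (sum, count)."""
    return (v, 1)

def avg_merge(psr1, psr2):
    """Merge two PSRs; associative and commutative, as TAG requires."""
    s1, c1 = psr1
    s2, c2 = psr2
    return (s1 + s2, c1 + c2)

def avg_evaluate(psr):
    """Turn the final PSR at the root into the aggregate value."""
    s, c = psr
    return s / c

# Three readings merged pairwise, in any grouping, give the same result.
readings = [10, 20, 60]
psr = avg_init(readings[0])
for r in readings[1:]:
    psr = avg_merge(psr, avg_init(r))
print(avg_evaluate(psr))  # 30.0
```

Because merge is associative and commutative, the routing tree can combine PSRs in whatever order messages arrive.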
21
Types of Aggregates
  • SQL supports MIN, MAX, SUM, COUNT, AVERAGE
  • Any function over a set can be computed via TAG
  • In-network benefit for many operations
  • E.g. Standard deviation, top/bottom N, spatial
    union/intersection, histograms, etc.
  • Compactness of PSR

22
Taxonomy of Aggregates
  • TAG insight: classify aggregates according to
    various functional properties
  • Yields a general set of optimizations that can
    be applied automatically

Drives an API!
23
Partial State
  • Growth of PSR vs. number of aggregated values (n)
  • Distributive: PSR = 1 (e.g. MIN)
  • Algebraic: PSR = c (e.g. AVG)
  • Holistic: PSR = n (e.g. MEDIAN)
  • Unique: PSR = d (e.g. COUNT DISTINCT)
  • d = # of distinct values
  • Content-Sensitive: PSR < n (e.g. HISTOGRAM)

Data Cube, Gray et al.
24
Benefit of In-Network Processing
  • Simulation Results
  • 2500 Nodes
  • 50x50 Grid
  • Depth 10
  • Neighbors 20
  • Uniform value distribution over [0,100]
  • Benefit is aggregate- and depth-dependent!

25
Outline
  • TinyDB
  • And demo!
  • Aggregate Queries
  • ACQP
  • Break
  • Adaptive Operator Placement

26
Acquisitional Query Processing (ACQP)
  • Traditional DBMS processes data already in the
    system
  • Acquisitional DBMS generates the data in the
    system!
  • An acquisitional query processor controls
  • when,
  • where,
  • and with what frequency data is collected
  • Versus traditional systems where data is provided
    a priori

27
ACQP: What's Different?
  • Basic Acquisitional Processing
  • Continuous queries, with rates or lifetimes
  • Events for asynchronous triggering
  • Avoiding Acquisition Through Optimization
  • Sampling as a query operator
  • Choosing Where to Sample via Co-acquisition
  • Index-like data structures
  • Acquiring data from the network
  • Prioritization, summary, and rate control

28
Lifetime Queries
  • Lifetime vs. sample rate
  • SELECT
  • EPOCH DURATION 10 s
  • SELECT
  • LIFETIME 30 days
  • Extra: Allow a MAX SAMPLE PERIOD
  • Discard some samples
  • Sampling cheaper than transmitting

29
(Single Node) Lifetime Prediction
SELECT nodeid, light LIFETIME 24 Weeks
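The LIFETIME clause above asks the system to solve for a sample rate rather than have the user guess one. A back-of-the-envelope sketch of that prediction (every constant here is an illustrative assumption, not a measured TinyDB value):

```python
def sample_period_for_lifetime(lifetime_s, battery_j, e_epoch_j, p_sleep_w):
    """Energy balance: battery = lifetime * sleep power + n_epochs * epoch
    energy, with n_epochs = lifetime / period. Solve for the epoch period."""
    e_active = battery_j - lifetime_s * p_sleep_w  # energy left for epochs
    if e_active <= 0:
        raise ValueError("sleep power alone exceeds the energy budget")
    return lifetime_s * e_epoch_j / e_active

lifetime = 24 * 7 * 24 * 3600  # 24 weeks, in seconds
period = sample_period_for_lifetime(
    lifetime_s=lifetime,
    battery_j=2 * 2.2 * 3600 * 1.5,  # two ~2200 mAh AA cells at ~1.5 V, in J
    e_epoch_j=0.05,                  # sample + process + transmit, per epoch
    p_sleep_w=100e-6,                # ~100 uW sleep draw
)
print(round(period, 1))  # epoch period in seconds for a 24-week lifetime
```

Under these assumptions the predictor lands at an epoch period of roughly half a minute; a longer requested lifetime or a smaller battery pushes the period up.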
30
Operator Ordering: Interleave Sampling & Selection
At 1 sample / sec, total power savings could be
as much as 3.5 mW, comparable to the processor!
  • SELECT light, mag
  • FROM sensors
  • WHERE pred1(mag)
  • AND pred2(light)
  • EPOCH DURATION 1s
  • E(sampling mag) >> E(sampling light)
  • 1500 uJ vs. 90 uJ
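The expected-cost argument behind this reordering can be sketched with the slide's energy numbers (the selectivity of pred2 is an assumed value for illustration):

```python
# Per-sample acquisition energy, in uJ, from the slide.
E_MAG, E_LIGHT = 1500, 90
sel_light = 0.1  # assumed fraction of tuples that pass pred2(light)

# Plan 1: sample both attributes up front, then apply both predicates.
cost_naive = E_MAG + E_LIGHT

# Plan 2: sample light first; only pay for the expensive mag sample
# on the tuples that survive the cheap predicate.
cost_reordered = E_LIGHT + sel_light * E_MAG

print(cost_naive, cost_reordered)  # 1590 240.0
```

With a selective cheap predicate, interleaving the sample with selection cuts the expected per-epoch acquisition energy by more than 6x; the optimizer orders predicates by (cost / selectivity) rather than treating sampling as free.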

31
Exemplary Aggregate Pushdown
  • SELECT WINMAX(light, 8s, 8s)
  • FROM sensors
  • WHERE mag > x
  • EPOCH DURATION 1s
  • Novel, general pushdown technique
  • Mag sampling is the most expensive operation!

32
Event Based Processing
  • Epochs are synchronous
  • Might want to issue queries in response to
    asynchronous events
  • Avoid unnecessary polling

CREATE TABLE birds (uint16 cnt) SIZE 1 CIRCULAR

ON EVENT bird-enter()
  SELECT b.cnt+1
  FROM birds AS b
  OUTPUT INTO b
  ONCE
33
Attribute Driven Network Construction
  • Goal: co-acquisition -- sensors that sample
    together route together
  • Observation: queries are often over a limited area
  • Or some other subset of the network
  • E.g. regions with light value in [10,20]
  • Idea: build the network topology such that
    like-valued nodes route through each other
  • For range queries
  • Relatively static attributes (e.g. location)
  • Maintenance issues

34
Tracking Co-Acquisition Via Semantic Routing Trees
  • Idea send range queries only to participating
    nodes
  • Parents maintain ranges of descendants

35
Parent Selection for SRTs
Idea: a node picks the parent whose ancestors'
interval most overlaps its descendants' interval:

[3,6] ∩ [1,10] = [3,6]
[3,6] ∩ [7,15] = ∅
[3,6] ∩ [20,40] = ∅
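The parent-selection rule can be sketched directly from the slide's example (helper names are assumptions, not TinyDB's API):

```python
def overlap(a, b):
    """Length of the overlap of two closed intervals; 0 if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo)

def pick_parent(my_interval, candidates):
    """candidates maps a candidate parent id to its ancestors' value
    interval; pick the one overlapping my descendants' interval most."""
    return max(candidates, key=lambda p: overlap(my_interval, candidates[p]))

# The slide's example: descendants' interval [3,6] vs. three candidates.
candidates = {"A": (1, 10), "B": (7, 15), "C": (20, 40)}
print(pick_parent((3, 6), candidates))  # A
```

Candidate A's interval [1,10] fully contains [3,6], while B and C are disjoint from it, so the node attaches under A and range queries for [3,6] never need to visit the other subtrees.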
36
Simulation Result
37
Outline
  • TinyDB
  • And demo!
  • Aggregate Queries
  • ACQP
  • Break
  • Adaptive Operator Placement

38
Outline
  • TinyDB
  • And demo!
  • Aggregate Queries
  • ACQP
  • Break
  • Adaptive Operator Placement

39
Adaptive Decentralized Operator Placement
  • IPSN 2003 Paper
  • Main Idea
  • Place operators near data sources
  • Greater operator rate → closer placement
  • For each operator
  • Explore candidate neighbors
  • Migrate to lower cost placements
  • Via extra messages

Proper placement depends on path lengths and
relative rates!
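The "path lengths and relative rates" trade-off reduces to a simple cost model: hosting an operator at node p costs each input's tuple rate times its hop distance to p, plus the cost of shipping results onward. A toy sketch (rates, distances, and names are illustrative assumptions, not the IPSN 2003 protocol):

```python
def placement_cost(p, sources, root, rate_out, dist):
    """Messages/sec if the operator runs at node p: pull each input
    stream to p, then push results from p toward the root."""
    in_cost = sum(rate * dist[src][p] for src, rate in sources.items())
    return in_cost + rate_out * dist[p][root]

# Hop distances on a toy 3-node line topology: a -- b -- c (root at c).
dist = {"a": {"a": 0, "b": 1, "c": 2},
        "b": {"a": 1, "b": 0, "c": 1},
        "c": {"a": 2, "b": 1, "c": 0}}
sources = {"a": 10, "c": 1}  # tuples/sec produced by each input

best = min(dist, key=lambda p: placement_cost(p, sources, "c", 0.5, dist))
print(best)  # a
```

The high-rate input at "a" pulls the operator all the way to its source, even though results must then travel two hops to the root; if the output rate dominated instead, the placement would slide toward "c". In the adaptive scheme, each operator periodically probes neighbors with this kind of cost comparison and migrates when a neighbor is cheaper.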
40
Adaptivity in Databases
  • Adaptivity: changing query plans on the fly
  • Typically at the physical level
  • Where the plan runs
  • Ordering of operators
  • Instantiations of operators, e.g. hash join vs
    merge join
  • Non-traditional
  • Conventionally, complete plans are built prior to
    execution
  • Using cost estimates (collected from history)
  • Important in volatile or long running
    environments
  • Where a priori estimates are unlikely to be good
  • E.g., sensor networks

41
Adaptivity for Operator Placement
  • Adaptivity comes at a cost
  • Extra work on each operator, each tuple
  • In a DBMS, processing per tuple is small
  • 100s of instructions per operator
  • Unless you have to hit the disk!
  • Costs in this case?
  • Extra communication hurts
  • Finding candidate placements (exploration)
  • Cost advertisements from local node
  • New costs from candidates
  • Moving state (migration)
  • Joins, windowed aggregates

42
Do Benefits Justify Costs?
  • Not evaluated!
  • 3x reduction in messages vs. external
  • Excluding exploration and migration costs
  • Seems somewhat implausible, especially given the
    added complexity
  • Hard to make the migration protocol work
  • Depends on the ability to reliably quiesce child ops.
  • What else could you do?
  • Static placement

43
Summary
  • Declarative QP
  • Simplify data collection in sensornets
  • In-network processing, query optimization for
    performance
  • Acquisitional QP
  • Focus on costs associated with sampling data
  • New challenge of sensornets, other streaming
    systems?
  • Adaptive Join Placement
  • In-network optimization
  • Some benefit, but practicality unclear
  • Operator pushdown still a good idea

44
Open Problems
  • Many! A few:
  • In-network storage and operator placement
  • Dealing with heterogeneity
  • Dealing with loss
  • Need real implementations of many of these ideas
  • See me! (madden@cs.berkeley.edu)

45
Questions / Discussion
46
Making TinyDB REALLY Work
  • Berkeley Botanical Garden
  • First real deployment
  • Requirements
  • At least 1 month unattended operation
  • Support for calibrated environmental sensors
  • Multi-hop routing
  • What we started with
  • Limited power management, no time-synchronization
  • Motes crashed hard occasionally
  • Limited, relatively untested multihop routing

47
Power Consumption in Sensornets
  • Waking current: ~12 mA
  • Fairly evenly spread between sensors, processor,
    radio
  • Sleeping current: 20-100 uA
  • Power consumption dominated by sensing,
    reception
  • ~1 s power-up on Mica2Dot sensor board
  • Most mote apps use an always-on radio
  • Completely unstructured communication
  • Bad for battery life

48
Why Not Use TDMA?
  • CSMA is very flexible; easy for new nodes to
    join
  • Reasonably scalable (relative to Bluetooth)
  • CSMA is implemented and available
  • We wanted to build something that worked

49
Power Management Approach
  • Coarse-grained communication scheduling

(Diagram, mote ID vs. time: epochs of 10s-100s of seconds; motes 1-5
sleep ("zzz") except during a shared 2-4 s waking period each epoch)
50
Benefits / Drawbacks
  • Benefits
  • Can still use CSMA within waking period
  • No reservation required new nodes can join
    easily!
  • Waking period duration is easily tunable
  • Depending on network size
  • Drawbacks
  • Longer waking time vs. TDMA?
  • Could stagger slots based on tree-depth
  • No guaranteed slot reservation
  • Nothing is guaranteed anyway

51
Challenges
  • Time Synchronization
  • Fault recovery
  • Joining, starting, and stopping
  • Parameter Tuning
  • Waking period: hardcode at 4 s?

52
Time Synchronization
  • All messages include a 5-byte timestamp
    indicating system time in ms
  • Synchronize (i.e., set local system time to the
    timestamp) with:
  • Any message from parent
  • Any new query message (even if not from parent)
  • Punt on multiple queries
  • Timestamps written just after the preamble is transmitted
  • All nodes agree that the waking period begins
    when (system time mod epoch duration == 0)
  • And lasts for WAKING_PERIOD ms
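With synchronized clocks, the rule above lets every node decide locally whether it should be awake. A minimal sketch, using the 15 s epoch and 3 s waking period reported for the garden deployment (constant names are assumptions):

```python
EPOCH_DUR_MS = 15_000     # 15 s epoch duration
WAKING_PERIOD_MS = 3_000  # 3 s waking period

def awake(system_time_ms):
    """The waking period begins when system_time mod epoch duration == 0
    and lasts WAKING_PERIOD_MS; outside that window the node sleeps."""
    return system_time_ms % EPOCH_DUR_MS < WAKING_PERIOD_MS

print(awake(1_000), awake(14_000), awake(30_500))  # True False True
```

Because every node evaluates the same modular-arithmetic test against (approximately) the same clock, no per-epoch coordination messages are needed to agree on the schedule; clock skew only shrinks the usable part of the waking window.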

53
Wrinkles
  • If a node hasn't heard from its parent for k
    epochs:
  • Switch to always-on mode for 1 epoch
  • If the waking period ends:
  • Punt on this epoch
  • Query dissemination / deletion via an always-on
    basestation and viral propagation
  • Data messages act as advertisements for running
    queries
  • Explicit requests for missing queries
  • Explicit "dead query" messages for stopped
    queries
  • Don't re-request the last dead query

54
Results
  • 30 nodes in the Garden
  • Lasted 21 days
  • 10% duty cycle
  • Sample period 30 s, waking period 3 s
  • Next deployment
  • 1-2% duty cycle
  • Fixes to clock to reduce baseline power
  • Should last at least 60 days
  • Time sync test: 2,500 readings
  • 5 readings out of sync
  • 15 s epoch duration, 3 s waking period

55
Garden Data
(Figure: readings by node height in the tree; node placement:
33 m: 111; 32 m: 110; 30 m: 107-109; 20 m: 104-106; 10 m: 101-103;
treetop at ~36 m)
56
Data Quality