Title: Implementation and Research Issues in Query Processing for Wireless Sensor Networks


1
Implementation and Research Issues in Query
Processing for Wireless Sensor Networks
  • Wei Hong
  • Intel Research, Berkeley
  • whong@intel-research.net

Sam Madden, MIT, madden@csail.mit.edu
ICDE 2004
2
Motivation
  • Sensor networks (aka sensor webs, emnets) are
    here
  • Several widely deployed HW/SW platforms
  • Low power radio, small processor, RAM/Flash
  • Variety of (novel) applications: scientific,
    industrial, commercial
  • Great platform for mobile ubicomp
    experimentation
  • Real, hard research problems to be solved
  • Networking, systems, languages, databases
  • We will summarize:
  • The state of the art
  • Our experiences building TinyDB
  • Current and future research directions

Berkeley Mote
3
Sensor Network Apps
Habitat Monitoring: storm petrels on Great Duck
Island, microclimates on James Reserve.
4
Declarative Queries
  • Programming Apps is Hard
  • Limited power budget
  • Lossy, low bandwidth communication
  • Require long-lived, zero admin deployments
  • Distributed Algorithms
  • Limited tools, debugging interfaces
  • Queries abstract away much of the complexity
  • Burden is on the database developers instead
  • Users get:
  • Safe, optimizable programs
  • Freedom to think about apps instead of details

5
TinyDB: A Prototype Declarative Query Processor
  • Platform: Berkeley Motes + TinyOS
  • Continuous variant of SQL: TinySQL
  • Power- and data-acquisition-based in-network
    optimization framework
  • Extensible interface for aggregates, new types of
    sensors

6
Agenda
  • Part 1: Sensor Networks (50 Minutes)
  • TinyOS
  • NesC
  • Short Break
  • Part 2: TinyDB (1 Hour)
  • Data Model and Query Language
  • Software Architecture
  • Long Break / Hands On
  • Part 3: Sensor Network Database Research
    Directions (1 Hour, 10 Minutes)

7
Part 1
  • Sensornet Background
  • Motes: Mote Hardware
  • TinyOS
  • Programming Model: NesC
  • TinyOS Architecture
  • Major Software Subsystems
  • Networking Services

8
A Brief History of Sensornets
  • People have used sensors for a long time
  • Recent CS history:
  • (1998) Pottie & Kaiser: radio-based networks of
    sensors
  • (1998) Pister et al.: Smart Dust
  • Initial focus on optical communication
  • By 1999, radio-based networks, COTS Dust, Motes
  • (1999) Estrin & Govindan:
  • Ad-hoc networks of sensors
  • (2000) Culler/Hill et al.: TinyOS + Motes
  • (2002) Hill / Dust: SPEC, mm³-scale computing
  • UCLA / USC / Berkeley continue to lead research
  • Many other players now
  • TinyOS/Motes as most common platform
  • Emerging commercial space:
  • Crossbow, Ember, Dust, Sensicast, Moteiv, Intel

9
Why Now?
  • Commoditization of radio hardware
  • Cellular and cordless phones, wireless
    communication
  • Low cost -> many/tiny -> new applications!
  • Real application for ad-hoc network research from
    the late 90s
  • Coming together of EE & CS communities

10
Motes
4 MHz, 8-bit Atmel RISC µProc; 40 kbit/s radio; 4 KB
RAM; 128 KB program flash; 512 KB data flash; AA
battery pack; based on TinyOS
Mica Mote
Mica2Dot
11
History of Motes
  • Initial research goal wasn't hardware
  • Has since become more of a priority with emerging
    hardware needs, e.g.:
  • Power consumption
  • (Ultrasonic) ranging & localization
  • MIT Cricket, NEST Project
  • Connectivity with diverse sensors
  • UCLA sensor board
  • Even so, now on the 5th generation of devices
  • Costs down to $50/node (Moteiv, Dust)
  • Greatly improved radio quality
  • Multitude of interfaces USB, Ethernet, CF, etc.
  • Variety of form factors, packages

12
Motes vs. Traditional Computing
  • Lossy, Ad-hoc Radio Communication
  • Sensing Hardware
  • Severe Power Constraints

13
Radio Communication
  • Low bandwidth, shared radio channel
  • ~40 kbit/s on motes
  • Much less in practice
  • Encoding, contention for media access (MAC)
  • Very lossy: ~30% base loss rate
  • Argues against TCP-like end-to-end retransmission
  • And for link-layer retries
  • Generally, not well behaved

14
Types of Sensors
  • Sensors attach via daughtercard
  • Weather
  • Temperature
  • Light x 2 (high intensity PAR, low intensity,
    full spectrum)
  • Air Pressure
  • Humidity
  • Vibration
  • 2 or 3 axis accelerometers
  • Tracking
  • Microphone (for ranging and acoustic signatures)
  • Magnetometer
  • GPS

15
Power Consumption and Lifetime
  • Power typically supplied by a small battery
  • 1000-2000 mAh
  • 1 mAh = 1 milliamp of current for 1 hour
  • Typically quoted at optimum voltage and current drain rates
  • Power in Watts (W) = Amps (A) x Volts (V)
  • Energy in Joules (J) = W x time
  • Lifetime and power consumption vary by application
  • Processor: 5 mA active, 1 mA idle, 5 µA sleeping
  • Radio: 5 mA listening, 10 mA transmit/receive, ~20 ms /
    packet
  • Sensors: 1 µA to 100s of mA, 1 µs to 1 s / sample

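Putting these units together gives a rough lifetime estimate (battery capacity and draw are illustrative numbers, not from the slides):

\[ \text{lifetime [h]} = \frac{\text{capacity [mAh]}}{\text{avg. current [mA]}}, \qquad \text{e.g.}\ \frac{2000\ \text{mAh}}{1\ \text{mA}} = 2000\ \text{h} \approx 83\ \text{days} \]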
16
Energy Usage in A Typical Data Collection Scenario
  • Each mote collects 1 sample of (light, humidity)
    data every 10 seconds, forwards it
  • Each mote can hear 10 other motes
  • Process:
  • Wake up, collect samples (~1 second)
  • Listen to radio for messages to forward (~1
    second)
  • Forward data

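Back-of-the-envelope, using the current draws from the previous slide (assuming ~10 mA while awake for the ~2 s of sampling, listening, and forwarding, and 5 µA asleep, on a 2000 mAh battery):

\[ \bar{I} \approx \frac{2\,\text{s} \times 10\,\text{mA} + 8\,\text{s} \times 0.005\,\text{mA}}{10\,\text{s}} \approx 2\,\text{mA} \;\Rightarrow\; \frac{2000\,\text{mAh}}{2\,\text{mA}} = 1000\,\text{h} \approx 42\,\text{days} \]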
17
Sensors: Slow, Power-Hungry, Noisy
18
Programming Sensornets: TinyOS
  • Component Based Programming Model
  • Suite of software components
  • Timers, clocks, clock synchronization
  • Single and multi-hop networking
  • Power management
  • Non-volatile storage management

19
Programming Philosophy
  • Component Based
  • Wiring components together via interfaces and
    configurations
  • Split-Phased
  • Nothing blocks, ever.
  • Instead, completion events are signaled.
  • Highly Concurrent
  • Single thread of tasks, posted and scheduled
    FIFO
  • Events fired asynchronously in response to
    interrupts.

20
NesC
  • C-like programming language with component model
    support
  • Compiles into GCC-compatible C
  • 3 types of files
  • Interfaces
  • Set of function prototypes; no implementations
    or variables
  • Modules
  • Provide (implement) zero or more interfaces
  • Require zero or more interfaces
  • May define module variables, scoped to functions
    in module
  • Configurations
  • Wire (connect) modules according to
    requires/provides relationship

21
Component Example: Leds

module LedsC {
  provides interface Leds;
}
implementation {
  uint8_t ledsOn;
  enum {
    RED_BIT = 1,
    GREEN_BIT = 2,
    YELLOW_BIT = 4
  };
  ...
  async command result_t Leds.redOn() {
    dbg(DBG_LED, "LEDS: Red on.\n");
    atomic {
      TOSH_CLR_RED_LED_PIN();
      ledsOn |= RED_BIT;
    }
    return SUCCESS;
  }
  ...
}
22
Configuration Example

configuration CntToLedsAndRfm { }
implementation {
  components Main, Counter, IntToLeds, IntToRfm, TimerC;

  Main.StdControl -> Counter.StdControl;
  Main.StdControl -> IntToLeds.StdControl;
  Main.StdControl -> IntToRfm.StdControl;
  Main.StdControl -> TimerC.StdControl;

  Counter.Timer -> TimerC.Timer[unique("Timer")];
  IntToLeds <- Counter.IntOutput;
  Counter.IntOutput -> IntToRfm;
}
23
Split Phase Example

module IntToRfmM { ... }
implementation {
  bool pending;
  TOS_Msg data;

  command result_t IntOutput.output(uint16_t value) {
    IntMsg *message = (IntMsg *)data.data;
    if (!pending) {
      pending = TRUE;
      message->val = value;
      atomic {
        message->src = TOS_LOCAL_ADDRESS;
      }
      if (call Send.send(TOS_BCAST_ADDR,
                         sizeof(IntMsg), &data))
        return SUCCESS;
      pending = FALSE;
    }
    return FAIL;
  }

  event result_t Send.sendDone(TOS_MsgPtr msg,
                               result_t success) {
    if (pending && msg == &data) {
      pending = FALSE;
      signal IntOutput.outputComplete(success);
    }
    return SUCCESS;
  }
}
24
Major Components
  • Timers: Clock, TimerC, LogicalTime
  • Networking: Send, GenericComm, AMStandard,
    lib/Route
  • Power Management: HPLPowerManagement
  • Storage Management: EEPROM, MatchBox

25
Timers
  • Clock: basic abstraction over hardware timers;
    periodic events, single frequency.
  • LogicalTime: fire an event some number of
    hours/minutes/seconds/ms in the future.
  • TimerC: multiplex multiple periodic timers on
    top of LogicalTime.

26
Radio Stack
  • Interfaces
  • Send
  • Broadcast, or to a specific ID
  • Split-phase
  • Receive
  • Asynchronous signal
  • Implementations
  • AMStandard
  • Application-specific messages
  • ID-based dispatch
  • GenericComm
  • AMStandard + Serial IO
  • Lib/Route
  • Multihop

IntMsg *message = (IntMsg *)data.data;
message->val = value;
atomic {
  message->src = TOS_LOCAL_ADDRESS;
}
call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data);

event TOS_MsgPtr ReceiveIntMsg.receive(TOS_MsgPtr m) {
  IntMsg *message = (IntMsg *)m->data;
  call IntOutput.output(message->val);
  return m;
}
(Wiring equates IntMsg with ReceiveIntMsg.)
27
Multihop Networking
  • Standard implementation: tree-based routing

Problems: parent selection, asymmetric
links, adaptation vs. stability
28
Geographic Routing
  • Any-to-any routing via geographic coordinates
  • See GPSR, MOBICOM 2000, Karp & Kung
  • Requires a coordinate system
  • Requires endpoint coordinates
  • Hard to route around local minima (holes)

Coordinates could be virtual, as in Rao et al.,
Geographic Routing Without Coordinate Information,
MOBICOM 2003.
29
Power Management
  • HPLPowerManagement
  • TinyOS sleeps the processor when possible
  • Observes the radio, sensor, and timer state
  • Application-managed, for the most part
  • App must turn off subsystems when not in use
  • Helper utility: ServiceScheduler
  • Periodically calls the start and stop methods
    of an app
  • More on power management in TinyDB later
  • Approach works because:
  • Single application
  • No interactivity requirements

30
Non-Volatile Storage
  • EEPROM
  • 512 KB off-chip, 32 KB on-chip
  • Writes at disk speeds, reads at RAM speeds
  • Interface: random access, read/write 256-byte
    pages
  • Maximum throughput: ~10 KB/second
  • MatchBox filing system
  • Provides a Unix-like file I/O interface
  • Single, flat directory
  • Only one file being read/written at a time

31
TinyOS Getting Started
  • The TinyOS home page
  • http://webs.cs.berkeley.edu/tinyos
  • Start with the tutorials!
  • The CVS repository
  • http://sf.net/projects/tinyos
  • The NesC project page
  • http://sf.net/projects/nescc
  • Crossbow motes (hardware)
  • http://www.xbow.com
  • Intel Imote
  • www.intel.com/research/exploratory/motes.htm

32
Part 2
  • The Design and Implementation of TinyDB

33
Part 2 Outline
  • TinyDB Overview
  • Data Model and Query Language
  • TinyDB Java API and Scripting
  • Demo with TinyDB GUI
  • TinyDB Internals
  • Extending TinyDB
  • TinyDB Status and Roadmap

34
TinyDB Revisited
SELECT MAX(mag) FROM sensors WHERE mag >
thresh SAMPLE PERIOD 64ms
  • High-level abstraction:
  • Data-centric programming
  • Interact with sensor network as a whole
  • Extensible framework
  • Under the hood:
  • Intelligent query processing: query optimization,
    power-efficient execution
  • Fault mitigation: automatically introduce
    redundancy, avoid problem areas

(Figure: the app sends queries and triggers to TinyDB,
which returns data.)
35
Feature Overview
  • Declarative SQL-like query interface
  • Metadata catalog management
  • Multiple concurrent queries
  • Network monitoring (via queries)
  • In-network, distributed query processing
  • Extensible framework for attributes, commands and
    aggregates
  • In-network, persistent storage

36
Architecture
(Figure: on the PC side, the TinyDB GUI talks JDBC to the
TinyDB Client API, alongside a DBMS; on the mote side, the
TinyDB query processor runs on each node (0-8) of the
sensor network.)
37
Data Model
  • Entire sensor network as one single,
    infinitely-long logical table: sensors
  • Columns consist of all the attributes defined in
    the network
  • Typical attributes:
  • Sensor readings
  • Meta-data: node id, location, etc.
  • Internal state: routing tree parent, timestamp,
    queue length, etc.
  • Nodes return NULL for unknown attributes
  • On the server, all attributes are defined in
    catalog.xml
  • Discussion: other alternative data models?

38
Query Language (TinySQL)
  • SELECT <aggregates>, <attributes>
  • [FROM sensors | <buffer>]
  • [WHERE <predicates>]
  • [GROUP BY <exprs>]
  • [SAMPLE PERIOD <const> | ONCE]
  • [INTO <buffer>]
  • [TRIGGER ACTION <command>]

39
Comparison with SQL
  • Single table in FROM clause
  • Only conjunctive comparison predicates in WHERE
    and HAVING
  • No subqueries
  • No column alias in SELECT clause
  • Arithmetic expressions limited to column op
    constant
  • Only fundamental difference: the SAMPLE PERIOD clause

40
TinySQL Examples
Find the sensors in bright nests.

  • SELECT nodeid, nestNo, light
  • FROM sensors
  • WHERE light > 400
  • EPOCH DURATION 1s

Result stream (one row per node per epoch):

Epoch | Nodeid | nestNo | Light
0     | 1      | 17     | 455
0     | 2      | 25     | 389
1     | 1      | 17     | 422
1     | 2      | 25     | 405
41
TinySQL Examples (cont.)
Count the number of occupied nests in each loud
region of the island.

Epoch | region | CNT(...) | AVG(...)
0     | North  | 3        | 360
0     | South  | 3        | 520
1     | North  | 3        | 370
1     | South  | 3        | 520
42
Event-based Queries
  • ON event SELECT ...
  • Run the query only when an interesting event happens
  • Event examples
  • Button pushed
  • Message arrival
  • Bird enters nest
  • Analogous to triggers but events are user-defined

43
Query over Stored Data
  • Named buffers in Flash memory
  • Store query results in buffers
  • Query over named buffers
  • Analogous to materialized views
  • Example
  • CREATE BUFFER name SIZE x (field1 type1, field2
    type2, ...)
  • SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO
    name
  • SELECT field1, field2, ... FROM name SAMPLE PERIOD d

44
Using the Java API
  • SensorQueryer
  • translateQuery() converts TinySQL string into
    TinyDBQuery object
  • Static query optimization
  • TinyDBNetwork
  • sendQuery() injects query into network
  • abortQuery() stops a running query
  • addResultListener() adds a ResultListener that is
    invoked for every QueryResult received
  • removeResultListener()
  • QueryResult
  • A complete result tuple, or
  • A partial aggregate result, call
    mergeQueryResult() to combine partial results
  • Key difference from JDBC: push vs. pull

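A minimal client-side sketch against the API above. The class and method names come from this slide; the constructor arguments, the listener callback name, and the import are assumptions, since the exact signatures are not shown here:

import net.tinyos.tinydb.*;  // package name assumed (matches the text interface below)

public class LightLogger {
  public static void main(String[] args) throws Exception {
    TinyDBNetwork network = new TinyDBNetwork();   // connection setup assumed
    // Static query optimization happens inside translateQuery()
    TinyDBQuery query = SensorQueryer.translateQuery(
        "SELECT nodeid, light FROM sensors SAMPLE PERIOD 1024");
    network.addResultListener(new ResultListener() {
      public void addResult(QueryResult qr) {      // push model: invoked per result
        System.out.println(qr);                    // partial aggregates would need
      }                                            // mergeQueryResult() first
    });
    network.sendQuery(query);      // inject the query into the network
    Thread.sleep(60 * 1000);       // let results stream in for a minute
    network.abortQuery(query);     // stop the running query
  }
}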
45
Writing Scripts with TinyDB
  • TinyDB's text interface:
  • java net.tinyos.tinydb.TinyDBMain run select
  • Query results are printed to the console
  • All motes get reset each time a new query is posed
  • Handy for writing scripts with shell, Perl, etc.

46
Using the GUI Tools
  • Demo time

47
Inside TinyDB
  • 10,000 lines embedded C code
  • 5,000 lines (PC-side) Java
  • 3,200 bytes RAM (w/ 768-byte heap)
  • 58 KB compiled code (3x larger than the 2nd
    largest TinyOS program)

(Figure: on the mote, the TinyDB query processor, with
filters such as "light > 400" and the schema, sits on top
of TinyOS and the multihop network.)
48
Tree-based Routing
  • Tree-based routing
  • Used in
  • Query delivery
  • Data collection
  • In-network aggregation
  • Relationship to indexing?

49
Power Management Approach
  • Coarse-grained app-controlled communication
    scheduling

(Figure: time is divided into epochs of 10s-100s of seconds;
each mote (IDs 1-5) wakes for a 2-4 s waking period at the
start of each epoch and sleeps the rest of the time.)
50
Time Synchronization
  • All messages include a 5-byte time stamp
    indicating system time in ms
  • Synchronize (i.e., set system time to timestamp)
    with:
  • Any message from parent
  • Any new query message (even if not from parent)
  • Punt on multiple queries
  • Timestamps written just after the preamble is
    transmitted
  • All nodes agree that the waking period begins
    when (system time mod epoch duration = 0)
  • And lasts for WAKING_PERIOD ms
  • Adjustment of the clock happens by changing the
    duration of the sleep cycle, not the wake cycle.

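The last two bullets can be sketched as follows (Java syntax for illustration only; the real logic is NesC on the mote, and every name here is made up):

// Sketch: how long to sleep so the next wake-up lands on an epoch boundary.
static long millisUntilWake(long systemTime, long epochDur, long wakingPeriod) {
  long intoEpoch = systemTime % epochDur;   // position within the current epoch
  if (intoEpoch < wakingPeriod) return 0;   // already inside the waking period
  return epochDur - intoEpoch;              // sleep until the next boundary
}
// After a time-sync message moves systemTime, only the sleep duration
// returned here changes; the waking period itself stays WAKING_PERIOD ms.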
51
Extending TinyDB
  • Why extend TinyDB?
  • New sensors -> attributes
  • New control/actuation -> commands
  • New data processing logic -> aggregates
  • New events
  • Analogous to concepts in object-relational
    databases

52
Adding Attributes
  • Types of attributes
  • Sensor attributes: raw or cooked sensor readings
  • Introspective attributes: parent, voltage, RAM
    usage, etc.
  • Constant attributes: constant values that can be
    statically or dynamically assigned to a mote,
    e.g., nodeid, location, etc.

53
Adding Attributes (cont)
  • Interfaces provided by Attr component
  • StdControl: init, start, stop
  • AttrRegister
  • command registerAttr(name, type, len)
  • event getAttr(name, resultBuf, errorPtr)
  • event setAttr(name, val)
  • command getAttrDone(name, resultBuf, error)
  • AttrUse
  • command startAttr(attr)
  • event startAttrDone(attr)
  • command getAttrValue(name, resultBuf, errorPtr)
  • event getAttrDone(name, resultBuf, error)
  • command setAttrValue(name, val)

54
Adding Attributes (cont)
  • Steps for adding attributes to TinyDB:
  • Create attribute nesC components
  • Wire new attribute components to TinyDBAttr
    configuration
  • Reprogram TinyDB motes
  • Add new attribute entries to catalog.xml
  • Constant attributes can be added on the fly
    through TinyDB GUI

55
Adding Aggregates
  • Step 1: wire new nesC components

56
Adding Aggregates (cont)
  • Step 2: add an entry to catalog.xml

    <aggregate>
      <name>AVG</name>
      <id>5</id>
      <temporal>false</temporal>
      <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
    </aggregate>

  • Step 3 (optional): implement a reader class in Java
  • A reader class interprets and finalizes aggregate
    state received from the mote network, and returns
    the final result as a string for display.

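A sketch of what such a reader class might look like. Only the class name net.tinyos.tinydb.AverageClass comes from the slide; the methods and state layout here are hypothetical:

// Hypothetical reader for AVG: finalizes (sum, count) state from the motes.
public class AverageClass {
  private long sum = 0, count = 0;

  // Fold in one partial state record (field layout assumed).
  public void merge(int partialSum, int partialCount) {
    sum += partialSum;
    count += partialCount;
  }

  // Return the finalized result as a string for display, per the slide.
  public String result() {
    return (count == 0) ? "NULL" : Double.toString((double) sum / count);
  }
}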
57
TinyDB Status
  • Latest release ships with TinyOS 1.1 (9/03)
  • Install the task-tinydb package in the TinyOS 1.1
    distribution
  • First release in TinyOS 1.0 (9/02)
  • Widely used by research groups as well as
    industry pilot projects
  • Successful deployments in the Intel Berkeley Lab and
    redwood trees at the UC Botanical Garden
  • Largest deployment: ~80 weather-station nodes
  • Network longevity: 4-5 months

58
The Redwood Tree Deployment
  • Redwood Grove in UC Botanical Garden, Berkeley
  • Collect dense sensor readings to monitor climatic
    variations across:
  • altitudes,
  • angles,
  • time,
  • forest locations, etc.
  • Versus sporadic monitoring points with 30-lb
    loggers!
  • Current focus: study how dense sensor data affects
    predictions of conventional tree-growth models

59
Data from Redwoods
(Figure: readings from nodes at different heights in the
tree: 36 m; 33 m: node 111; 32 m: node 110; 30 m: nodes 109,
108, 107; 20 m: nodes 106, 105, 104; 10 m: nodes 103, 102, 101.)
60
TinyDB Roadmap (near term)
  • Support for high frequency sampling
  • Equipment vibration monitoring, structural
    monitoring, etc.
  • Store and forward
  • Bulk reliable data transfer
  • Scheduling of communications
  • Port to Intel Mote
  • Deployment in Intel Fab equipment monitoring
    application and the Golden Gate Bridge monitoring
    application

61
For more information
  • http://berkeley.intel-research.net/tinydb or
    http://triplerock.cs.bekeley.edu/tinydb

62
Part 3
  • Database Research Issues in Sensor Networks

63
Sensor Network Research
  • Very active research area
  • Can't summarize it all
  • Focus: database-relevant research topics
  • Some outside of Berkeley
  • Other topics that are itching to be scratched
  • But, some bias towards work that we find
    compelling

64
Topics
  • In-network aggregation
  • Acquisitional Query Processing
  • Heterogeneity
  • Intermittent Connectivity
  • In-network Storage
  • Statistics-based summarization and sampling
  • In-network Joins
  • Adaptivity and Sensor Networks
  • Multiple Queries

65
Topics
  • In-network aggregation
  • Acquisitional Query Processing
  • Heterogeneity
  • Intermittent Connectivity
  • In-network Storage
  • Statistics-based summarization and sampling
  • In-network Joins
  • Adaptivity and Sensor Networks
  • Multiple Queries

66
Tiny Aggregation (TAG)
  • In-network processing of aggregates
  • Common data analysis operation
  • Aka the "gather" operation or "reduction" in
    programming
  • Communication-reducing
  • Operator-dependent benefit
  • Across nodes during the same epoch
  • Exploit query semantics to improve efficiency!

Madden, Franklin, Hellerstein, Hong. Tiny
AGgregation (TAG), OSDI 2002.
67
Basic Aggregation
  • In each epoch:
  • Each node samples local sensors once
  • Generates a partial state record (PSR) combining
  • local readings
  • readings from children
  • Outputs PSR during its assigned comm. interval
  • At end of epoch, the PSR for the whole network is
    output at the root
  • New result on each successive epoch
  • Extras:
  • Predicate-based partitioning via GROUP BY

68
Illustration: Aggregation

SELECT COUNT(*) FROM sensors

(Figure: a five-node routing tree; each epoch is divided
into communication intervals. In interval 4, node 4 outputs
its partial count of 1.)
69
Illustration: Aggregation

SELECT COUNT(*) FROM sensors

(Figure: in interval 3, node 3 outputs a partial count of 2,
combining its own reading with node 4's.)
70
Illustration: Aggregation

SELECT COUNT(*) FROM sensors

(Figure: in interval 2, the remaining partial counts flow
one level further up the tree.)
71
Illustration: Aggregation

SELECT COUNT(*) FROM sensors

(Figure: in interval 1, the root (node 1) outputs the total
count for the whole network, 5.)
72
Illustration: Aggregation

SELECT COUNT(*) FROM sensors

(Figure: interval 4 of the next epoch; node 4 again outputs
a partial count of 1, and a new result is produced each epoch.)
73
Aggregation Framework
  • As in extensible databases, TinyDB supports any
    aggregation function conforming to

Agg_n = { f_init, f_merge, f_evaluate }
f_init:     a_0 -> <a_0>
f_merge:    <a_1>, <a_2> -> <a_12>
f_evaluate: <a_1> -> aggregate value

Partial State Record (PSR): <a>

Example: AVERAGE
AVG_init:     v -> <v, 1>
AVG_merge:    <S_1, C_1>, <S_2, C_2> -> <S_1 + S_2, C_1 + C_2>
AVG_evaluate: <S, C> -> S / C

Restriction: merge must be associative and commutative
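For concreteness, the AVG triple above can be written out in code. A sketch in Java (TinyDB's real mote-side implementation is NesC and its API differs; names here are illustrative):

// Partial state record for AVG: a (sum, count) pair.
class AvgPSR {
  final int sum, count;
  AvgPSR(int sum, int count) { this.sum = sum; this.count = count; }

  static AvgPSR init(int v) { return new AvgPSR(v, 1); }                // f_init
  static AvgPSR merge(AvgPSR a, AvgPSR b) {                             // f_merge
    return new AvgPSR(a.sum + b.sum, a.count + b.count);
  }
  static double evaluate(AvgPSR a) { return (double) a.sum / a.count; } // f_evaluate
}

merge is associative and commutative, so PSRs can be combined in any order as they flow up the routing tree.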
74
Taxonomy of Aggregates
  • TAG insight: classify aggregates according to
    various functional properties
  • Yields a general set of optimizations that can
    automatically be applied

Drives an API!

Property              | Examples                                    | Affects
Partial State         | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
Monotonicity          | COUNT: monotonic, AVG: non-monotonic        | Hypothesis testing, snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary              | Applicability of sampling, effect of loss
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive  | Routing redundancy
75
Use Multiple Parents
  • Use graph structure
  • Increase delivery probability with no
    communication overhead
  • For duplicate-insensitive aggregates, or
  • Aggregates expressible as a sum of parts
  • Send (part of) the aggregate to all parents
  • In just one message, via multicast
  • Assuming independence, decreases variance

SELECT COUNT(*)

Let P(link xmit successful) = p, so P(success from A -> R) = p^2.
With one parent: E(count) = c * p^2,
                 Var(count) = c^2 * p^2 * (1 - p^2) = V
With n parents:  E(count) = n * (c/n) * p^2 = c * p^2,
                 Var(count) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n
76
Multiple Parents Results
  • Better than the previous analysis predicted!
  • Losses aren't independent!
  • Insight: it spreads data over many links

77
Acquisitional Query Processing (ACQP)
  • TinyDB acquires AND processes data
  • Could generate an infinite number of samples
  • An acquisitional query processor controls:
  • when,
  • where,
  • and with what frequency data is collected!
  • Versus traditional systems where data is provided
    a priori

Madden, Franklin, Hellerstein, and Hong. The
Design of an Acquisitional Query Processor.
SIGMOD, 2003.
78
ACQP Whats Different?
  • How should the query be processed?
  • Sampling as a first class operation
  • How does the user control acquisition?
  • Rates or lifetimes
  • Event-based triggers
  • Which nodes have relevant data?
  • Index-like data structures
  • Which samples should be transmitted?
  • Prioritization, summary, and rate control

79
Operator Ordering: Interleave Sampling and Selection

At 1 sample/sec, total power savings could be
as much as 3.5 mW: comparable to the processor!

  • SELECT light, mag
  • FROM sensors
  • WHERE pred1(mag)
  • AND pred2(light)
  • EPOCH DURATION 1s
  • E(sampling mag) >> E(sampling light)
  • 1500 µJ vs. 90 µJ

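The saving comes from short-circuiting the expensive sample: evaluate pred2 on the cheap light reading first, and sample the magnetometer only if it passes. With an assumed selectivity \(\sigma\) for pred2:

\[ E[\text{cost, light first}] = E_{\text{light}} + \sigma \cdot E_{\text{mag}} \approx 90 + \sigma \cdot 1500\ \mu\text{J} \]

versus roughly 1590 µJ for sampling both unconditionally; at \(\sigma = 0.5\) (assumed) that is about 840 µJ per epoch, nearly a factor of two.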
80
Exemplary Aggregate Pushdown
  • SELECT WINMAX(light, 8s, 8s)
  • FROM sensors
  • WHERE mag > x
  • EPOCH DURATION 1s
  • Novel, general pushdown technique
  • Mag sampling is the most expensive operation!

81
Topics
  • In-network aggregation
  • Acquisitional Query Processing
  • Heterogeneity
  • Intermittent Connectivity
  • In-network Storage
  • Statistics-based summarization and sampling
  • In-network Joins
  • Adaptivity and Sensor Networks
  • Multiple Queries

82
Heterogeneous Sensor Networks
  • Leverage small numbers of high-end nodes to
    benefit large numbers of inexpensive nodes
  • Still must be transparent and ad-hoc
  • Key to scalability of sensor networks
  • Interesting heterogeneities:
  • Energy: battery vs. outlet power
  • Link bandwidth: Chipcon vs. 802.11x
  • Computing and storage: ATMega128 vs. XScale
  • Pre-computed results
  • Sensing nodes vs. QP nodes

83
Computing Heterogeneity with TinyDB
  • Separate query processing from sensing
  • Provide query processing on a small number of
    nodes
  • Attract packets to query processors based on
    service value
  • Compare the total energy consumption of the
    network
  • No aggregation
  • All aggregation
  • Opportunistic aggregation
  • HSN proactive aggregation

Mark Yarvis and York Liu, Intel's Heterogeneous
Sensor Network Project, ftp://download.intel.com/
research/people/HSN_IR_Day_Poster_03.pdf.
84
5x7 TinyDB/HSN Mica2 Testbed
85
Data Packet Saving
  • How many aggregators are desired?
  • Does placement matter?

86
Occasionally Connected Sensornets
(Figure: a TinyDB server on the internet connects through
fixed and mobile gateways (GTWY) to several TinyDB query
processors in the field.)
87
Occasionally Connected Sensornets Challenges
  • Networking support
  • Tradeoff between reliability, power consumption,
    and delay
  • Data custody transfer: duplicates?
  • Load shedding
  • Routing of mobile gateways
  • Query processing
  • Operation placement: in-network vs. on mobile
    gateways
  • Proactive pre-computation and data movement
  • Tight interaction between networking and QP

Fall, Hong and Madden, Custody Transfer for
Reliable Delivery in Delay Tolerant Networks,
http://www.intel-research.net/Publications/
Berkeley/081220030852_157.pdf.
88
Distributed In-network Storage
  • Collectively, sensornets have large amounts of
    in-network storage
  • Good for in-network consumption or caching
  • Challenges
  • Distributed indexing for fast query dissemination
  • Resilience to node or link failures
  • Graceful adaptation to data skews
  • Minimizing index insertion/maintenance cost

89
Example DIM
  • Functionality
  • Efficient range queries for multidimensional data
  • Approaches
  • Divide the sensor field into bins
  • Locality-preserving mapping from m-d space to
    geographic locations
  • Use geographic routing such as GPSR
  • Assumptions
  • Nodes know their locations and the network boundary
  • No node mobility

Xin Li, Young Jin Kim, Ramesh Govindan and Wei
Hong, Distributed Index for Multi-dimensional
Data (DIM) in Sensor Networks, SenSys 2003.
90
Statistical Techniques
  • Approximations, summaries, and sampling based on
    statistics and statistical models
  • Applications:
  • Limited bandwidth and large number of nodes ->
    data reduction
  • Lossiness -> predictive modeling
  • Uncertainty -> tracking correlations and changes
    over time
  • Physical models -> improved query answering

91
Correlated Attributes
  • Data in sensor networks is correlated, e.g.:
  • Temperature and voltage
  • Temperature and light
  • Temperature and humidity
  • Temperature and time of day
  • etc.

92
IDSQ
  • Idea: task sensors in order of best improvement
    to the estimate of some value:
  • Choose leader(s)
  • Suppress subordinates
  • Task subordinates, one at a time
  • Until some measure of goodness (error bound) is
    met
  • E.g., Mahalanobis distance (defined below): accounts
    for correlations in the axes, and tends to favor
    minimizing the principal axis

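The Mahalanobis distance of a point \(x\) from an estimate with mean \(\mu\) and covariance \(\Sigma\) is the standard definition:

\[ d_M(x) = \sqrt{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)} \]

Directions aligned with the principal axis of \(\Sigma\) cost less distance, so choosing sensors that reduce uncertainty along that axis shrinks the largest error first.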
See Scalable Information-Driven Sensor Querying
and Routing for ad hoc Heterogeneous Sensor
Networks. Chu, Haussecker and Zhao. Xerox TR
P2001-10113. May, 2001.
93
Graphical representation: model the location estimate as a
point with 2-dimensional Gaussian uncertainty; the principal
axis of the uncertainty ellipse marks the direction of
greatest uncertainty.
94
MQSN: Model-based Probabilistic Querying over
Sensor Networks

Joint work with Amol Deshpande, Carlos Guestrin,
and Joe Hellerstein

(Figure: a query processor holding a probabilistic model
sits in front of a nine-node sensor network.)
95
MQSN: Model-based Probabilistic Querying over
Sensor Networks

(Figure: a query arrives; the query processor consults the
model.)
96
MQSN: Model-based Probabilistic Querying over
Sensor Networks

(Figure: consulting the model, the processor decides which
sensors, if any, to observe.)
97
MQSN: Model-based Probabilistic Querying over
Sensor Networks

(Figure: the observed readings update the model, and query
results are returned.)
98
Challenges
  • What kinds of models to use?
  • Optimization problem:
  • Given a model and a query, find the best set of
    attributes to observe
  • Cost is not easy to measure:
  • Non-uniform network communication costs
  • Changing network topologies
  • Large plan space
  • Might be cheaper to observe attributes not in the
    query
  • e.g., voltage instead of temperature
  • Conditional plans:
  • Change the observation plan based on observed
    values

99
MQSN Current Prototype
  • Multi-variate Gaussian models
  • Kalman filters to capture correlations across
    time
  • Handles:
  • Range predicate queries
  • sensor value within [x, y], with confidence
  • Value queries
  • sensor value = x, within epsilon, with confidence
  • Simple aggregate queries
  • AVG(sensor value) = n, within epsilon, with confidence
  • Uses a greedy algorithm to choose the observation
    plan

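Answering such queries from a multivariate Gaussian reduces to standard Gaussian conditioning (a textbook identity, not MQSN-specific): after observing attributes \(X_2 = x_2\), the remaining attributes \(X_1\) are Gaussian with

\[ \mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \]

and confidences such as P(value in [x, y]) are read off the resulting marginal.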
100
In-Net Regression
  • Linear regression: a simple way to predict future
    values and identify outliers
  • Regression can be across local or remote values,
    multiple dimensions, or with high-degree
    polynomials
  • E.g., node A's readings vs. node B's
  • Or, location (X, Y) versus temperature
  • E.g., over many nodes

Guestrin, Thibaux, Bodik, Paskin, Madden.
Distributed Regression: an Efficient Framework
for Modeling Sensor Network Data. Under
submission.
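The fit underneath is ordinary least squares. With \(X\) holding each sensor's basis-function values (e.g., location, time) as rows and \(y\) the observed readings, the standard solution is:

\[ \hat{\beta} = (X^{\top}X)^{-1}X^{\top}y \]

The clever matrix manipulation mentioned on the next slide is about computing this, and blending it across kernels, without shipping all the rows to one place.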
101
In-Net Regression (Continued)
  • Problem: may require data from all sensors to
    build the model
  • Solution: partition sensors into overlapping
    kernels that influence each other
  • Run regression in each kernel
  • Requiring just local communication
  • Blend data between kernels
  • Requires some clever matrix manipulation
  • End result: a regressed model at every node
  • Useful in failure detection, missing-value
    estimation

102
Exploiting Correlations in Query Processing
  • Simple idea:
  • Given predicate P(A) over an expensive attribute A
  • Replace it with P'(A') over a cheap attribute A'
    such that P' evaluates like P
  • Problem: unless A and A' are perfectly
    correlated, P' != P for all time
  • So we could incorrectly accept or reject some
    readings
  • Alternative: use correlations to improve
    selectivity estimates in query optimization
  • Construct conditional plans that vary predicate
    order based on prior observations

103
Exploiting Correlations (Cont.)
  • Insight: by observing a (cheap and correlated)
    variable not involved in the query, it may be
    possible to improve query performance
  • Improves estimates of selectivities
  • Use conditional plans
  • Example

104
In-Network Join Strategies
  • Types of joins
  • non-sensor -> sensor
  • sensor -> sensor
  • Optimization questions
  • Should the join be pushed down?
  • If so, where should it be placed?
  • What if a join table exceeds the memory available
    on one node?

105
Choosing Where to Place Operators
  • Idea: choose a join node to run the operator
  • Over time, explore other candidate placements
  • Nodes advertise data rates to their neighbors
  • Neighbors compute expected cost of running the
    join based on these rates
  • Neighbors advertise costs
  • Current join node selects a new, lower cost node

Bonfils & Bonnet, Adaptive and Decentralized
Operator Placement for In-Network Query Processing,
IPSN 2003.
106
Topics
  • In-network aggregation
  • Acquisitional Query Processing
  • Heterogeneity
  • Intermittent Connectivity
  • In-network Storage
  • Statistics-based summarization and sampling
  • In-network Joins
  • Adaptivity and Sensor Networks
  • Multiple Queries

107
Adaptivity In Sensor Networks
  • Queries are long-running
  • Selectivities change
  • E.g., night vs. day
  • Network load and available energy vary
  • All suggest that some adaptivity is needed:
  • Of data rates or granularity of aggregation when
    optimizing for lifetimes
  • Of operator orderings or placements when
    selectivities change (cf. conditional plans for
    correlations)
  • As far as we know, this is an open problem!

108
Multiple Queries and Work Sharing
  • As sensornets evolve, users will run many queries
    simultaneously
  • E.g., traffic monitoring
  • Likely that queries will be similar
  • But have different endpoints, parameters, etc.
  • Would like to share processing, routing as much
    as possible
  • But how? Again, an open problem.

109
Concluding Remarks
  • Sensor networks are an exciting emerging
    technology, with a wide variety of applications
  • Many research challenges in all areas of computer
    science
  • Database community included
  • Some agreement that a declarative interface is
    right
  • TinyDB and other early work are an important
    first step
  • But there's lots more to be done!