Title: Implementation and Research Issues in Query Processing for Wireless Sensor Networks
1Implementation and Research Issues in Query
Processing for Wireless Sensor Networks
- Wei Hong
- Intel Research, Berkeley
- whong_at_intel-research.net
- Sam Madden
- MIT
- madden_at_csail.mit.edu
MDM Tutorial, January 19th 2004
2Motivation
- Sensor networks (aka sensor webs, emnets) are here
- Several widely deployed HW/SW platforms
- Low power radio, small processor, RAM/Flash
- Variety of (novel) applications: scientific, industrial, commercial
- Great platform for mobile ubicomp experimentation
- Real, hard research problems to be solved
- Networking, systems, languages, databases
- We will summarize
- The state of the art
- Our experiences building TinyDB
- Current and future research directions
Berkeley Mote
3Sensor Network Apps
Habitat Monitoring: Storm petrels on Great Duck Island, microclimates on James Reserve.
Just the tip of the iceberg -- more tomorrow!
4Declarative Queries
- Programming Apps is Hard
- Limited power budget
- Lossy, low bandwidth communication
- Require long-lived, zero admin deployments
- Distributed Algorithms
- Limited tools, debugging interfaces
- Queries abstract away much of the complexity
- Burden on the database developers
- Users get
- Safe, optimizable programs
- Freedom to think about apps instead of details
5TinyDB: Prototype declarative query processor
- Platform: Berkeley Motes + TinyOS
- Continuous variant of SQL: TinySQL
- Power and data-acquisition based in-network optimization framework
- Extensible interface for aggregates, new types of sensors
6Agenda
- Part 1: Sensor Networks (50 Minutes)
- TinyOS
- NesC
- Short Break
- Part 2: TinyDB (1 Hour)
- Data Model and Query Language
- Software Architecture
- Long Break Hands On
- Part 3: Sensor Network Database Research Directions (1 Hour, 10 Minutes)
7Part 1
- Sensornet Background
- Motes Mote Hardware
- TinyOS
- Programming Model NesC
- TinyOS Architecture
- Major Software Subsystems
- Networking Services
8A Brief History of Sensornets
- People have used sensors for a long time
- Recent CS History
- (1998) Pottie & Kaiser: Radio based networks of sensors
- (1998) Pister et al: Smart Dust
- Initial focus on optical communication
- By 1999, radio based networks, COTS Dust, Motes
- (1999) Estrin & Govindan
- Ad-hoc networks of sensors
- (2000) Culler/Hill et al: TinyOS + Motes
- (2002) Hill / Dust: SPEC, mm^3 scale computing
- UCLA / USC / Berkeley Continue to Lead Research
- Many other players now
- TinyOS/Motes as most common platform
- Emerging commercial space
- Crossbow, Ember, Dust, Sensicast, Moteiv, Intel
9Why Now?
- Commoditization of radio hardware
- Cellular and cordless phones, wireless communication
- (some radio pictures, etc.)
- Low cost -> many/tiny -> new applications!
- Real application for ad-hoc network research from the late 90s
- Coming together of EE + CS communities
10Motes
- 4 MHz, 8 bit Atmel RISC uProc
- 40 kbit Radio
- 4 K RAM, 128 K Program Flash, 512 K Data Flash
- AA battery pack
- Based on TinyOS
Mica Mote
Mica2Dot
11History of Motes
- Initial research goal wasn't hardware
- Has since become more of a priority with emerging hardware needs, e.g.
- Power consumption
- (Ultrasonic) ranging localization
- MIT Cricket, NEST Project
- Connectivity with diverse sensors
- UCLA sensor board
- Even so, now on the 5th generation of devices
- Costs down to $50/node (Moteiv, Dust)
- Greatly improved radio quality
- Multitude of interfaces USB, Ethernet, CF, etc.
- Variety of form factors, packages
12Motes vs. Traditional Computing
- Lossy, Adhoc Radio Communication
- Sensing Hardware
- Severe Power Constraints
13Radio Communication
- Low Bandwidth Shared Radio Channel
- 40 kbits/s on motes
- Much less in practice
- Encoding, Contention for Media Access (MAC)
- Very lossy: 30% base loss rate
- Argues against TCP-like end-to-end retransmission
- And for link-layer retries
- Generally, not well behaved
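A rough sketch of why the loss rate argues for link-layer retries rather than end-to-end retransmission. It assumes each link drops packets independently with the 30% base loss rate quoted above; the hop count, the retry counts and the function name are illustrative assumptions, and acknowledgement losses are ignored.

```python
def end_to_end_delivery(p_link: float, hops: int, retries: int = 0) -> float:
    """Probability that one packet crosses `hops` lossy links in a row.

    Each transmission on a link succeeds independently with probability
    p_link (an assumption). With link-layer retries, every hop gets
    (retries + 1) attempts before the packet is given up.
    """
    per_hop = 1.0 - (1.0 - p_link) ** (retries + 1)
    return per_hop ** hops


if __name__ == "__main__":
    p = 0.7  # 30% base loss rate
    for retries in (0, 1, 3):
        # Delivery probability over an assumed 5-hop path.
        print(retries, round(end_to_end_delivery(p, hops=5, retries=retries), 3))
```

Without retries the delivery probability shrinks geometrically with the number of hops, so an end-to-end scheme would resend across the whole path most of the time; a retry per hop repairs each loss where it happens.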
14Types of Sensors
- Sensors attach via daughtercard
- Weather
- Temperature
- Light x 2 (high intensity PAR, low intensity, full spectrum)
- Air Pressure
- Humidity
- Vibration
- 2 or 3 axis accelerometers
- Tracking
- Microphone (for ranging and acoustic signatures)
- Magnetometer
- GPS
15Power Consumption and Lifetime
- Power typically supplied by a small battery
- 1000-2000 mAH
- 1 mAH = 1 milliamp current for 1 hour
- Typically at optimum voltage, current drain rates
- Power = Watts (W) = Amps (A) * Volts (V)
- Energy = Joules (J) = W * time
- Lifetime, power consumption varies by application
- Processor: 5 mA active, 1 mA idle, 5 uA sleeping
- Radio: 5 mA listen, 10 mA xmit/receive, 20 ms / packet
- Sensors: 1 uA -> 100s mA, 1 uS -> 1 S / sample
16Energy Usage in A Typical Data Collection Scenario
- Each mote collects 1 sample of (light, humidity) data every 10 seconds, forwards it
- Each mote can hear 10 other motes
- Process
- Wake up, collect samples (1 second)
- Listen to radio for messages to forward (1 second)
- Forward data
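The scenario above can be turned into a back-of-the-envelope lifetime estimate. This is a minimal sketch, not the authors' calculation: it reuses the processor and radio currents from the previous slide, and it assumes the processor stays active while the radio is on, that a single 20 ms packet is forwarded per period, that sensor current is negligible, and that the battery holds 2000 mAh. The function names are hypothetical.

```python
def average_current_ma(phases, period_s, sleep_ma):
    """Average current over one period.

    phases: list of (duration_s, current_ma) for the awake parts of the
    period; the rest of the period is spent asleep at sleep_ma.
    """
    active_s = sum(duration for duration, _ in phases)
    if active_s > period_s:
        raise ValueError("active phases exceed the sample period")
    charge_mas = sum(duration * current for duration, current in phases)
    charge_mas += (period_s - active_s) * sleep_ma
    return charge_mas / period_s


def lifetime_days(capacity_mah, avg_ma):
    """Days until a battery of capacity_mah is drained at avg_ma."""
    return capacity_mah / avg_ma / 24.0


if __name__ == "__main__":
    phases = [
        (1.0, 5.0),    # wake up and sample: processor active (5 mA)
        (1.0, 10.0),   # listen: processor (5 mA) + radio listen (5 mA)
        (0.02, 15.0),  # forward one packet: processor (5 mA) + radio xmit (10 mA)
    ]
    avg = average_current_ma(phases, period_s=10.0, sleep_ma=0.005)
    print(f"average current: {avg:.3f} mA")
    print(f"lifetime on 2000 mAh: {lifetime_days(2000.0, avg):.1f} days")
```

Under these assumptions the two one-second awake phases dominate the budget, which is why shortening the waking period matters far more than the sleep current.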
17Sensors Slow, Power Hungry, Noisy
18Programming Sensornets TinyOS
- Component Based Programming Model
- Suite of software components
- Timers, clocks, clock synchronization
- Single and multi-hop networking
- Power management
- Non-volatile storage management
19Programming Philosophy
- Component Based
- Wiring components together via interfaces, configurations
- Split-Phased
- Nothing blocks, ever.
- Instead, completion events are signaled.
- Highly Concurrent
- Single thread of tasks, posted and scheduled FIFO
- Events fired asynchronously in response to interrupts.
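The split-phase idea can be illustrated outside nesC. The sketch below is a Python analogy, not TinyOS code: `Scheduler`, `Sensor`, `get_data` and the constant reading 42 are all hypothetical names and values. The request returns immediately, and the completion event is delivered later from a single FIFO task queue.

```python
from collections import deque


class Scheduler:
    """Single FIFO queue of tasks, each run to completion in post order."""

    def __init__(self):
        self._tasks = deque()

    def post(self, task):
        self._tasks.append(task)

    def run(self):
        while self._tasks:
            self._tasks.popleft()()


class Sensor:
    """Split-phase read: get_data() returns at once, the event fires later."""

    def __init__(self, scheduler, on_data_ready):
        self._scheduler = scheduler
        self._on_data_ready = on_data_ready

    def get_data(self):
        # Phase 1: start the operation and return without blocking.
        # Posting a task stands in for the hardware interrupt that would
        # signal completion on a real mote.
        self._scheduler.post(lambda: self._on_data_ready(42))
        return True


def demo():
    log = []
    sched = Scheduler()
    sensor = Sensor(sched, on_data_ready=lambda v: log.append(("data_ready", v)))
    sensor.get_data()
    log.append(("get_data returned",))  # the caller continues immediately
    sched.run()                         # Phase 2: completion event is signaled
    return log
```

In `demo()` the caller's log entry is recorded before the completion event, which is the ordering a split-phase interface guarantees.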
20NesC
- C-like programming language with component model support
- Compiles into GCC-compatible C
- 3 types of files
- Interfaces
- Set of function prototypes; no implementations or variables
- Modules
- Provide (implement) zero or more interfaces
- Require zero or more interfaces
- May define module variables, scoped to functions in module
- Configurations
- Wire (connect) modules according to requires/provides relationship
21Component Example Leds
module LedsC {
  provides interface Leds;
}
implementation
{
  uint8_t ledsOn;
  enum {
    RED_BIT = 1,
    GREEN_BIT = 2,
    YELLOW_BIT = 4
  };
  ...
  async command result_t Leds.redOn() {
    dbg(DBG_LED, "LEDS: Red on.\n");
    atomic {
      TOSH_CLR_RED_LED_PIN();
      ledsOn |= RED_BIT;
    }
    return SUCCESS;
  }
  ...
}
22Configuration Example
configuration CntToLedsAndRfm {
}
implementation {
  components Main, Counter, IntToLeds, IntToRfm, TimerC;

  Main.StdControl -> Counter.StdControl;
  Main.StdControl -> IntToLeds.StdControl;
  Main.StdControl -> IntToRfm.StdControl;
  Main.StdControl -> TimerC.StdControl;
  Counter.Timer -> TimerC.Timer[unique("Timer")];
  IntToLeds <- Counter.IntOutput;
  Counter.IntOutput -> IntToRfm;
}
23Split Phase Example
module IntToRfmM { ... }
implementation {
  ...
  command result_t IntOutput.output(uint16_t value) {
    IntMsg *message = (IntMsg *)data.data;

    if (!pending) {
      pending = TRUE;
      message->val = value;
      atomic {
        message->src = TOS_LOCAL_ADDRESS;
      }
      if (call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data))
        return SUCCESS;
      pending = FALSE;
    }
    return FAIL;
  }

  event result_t Send.sendDone(TOS_MsgPtr msg, result_t success) {
    if (pending && msg == &data) {
      pending = FALSE;
      signal IntOutput.outputComplete(success);
    }
    return SUCCESS;
  }
}
24Major Components
- Timers: Clock, TimerC, LogicalTime
- Networking: Send, GenericComm, AMStandard, lib/Route
- Power Management: HPLPowerManagement
- Storage Management: EEPROM, MatchBox
25Timers
- Clock: Basic abstraction over hardware timers; periodic events, single frequency.
- LogicalTime: Fire an event some number of H:M:S:ms in the future.
- TimerC: Multiplex multiple periodic timers on top of LogicalTime.
26Radio Stack
- Interfaces
- Send
- Broadcast, or to a specific ID
- split phase
- Receive
- asynchronous signal
- Implementations
- AMStandard
- Application specific messages
- Id-based dispatch
- GenericComm
- AMStandard + Serial IO
- Lib/Route
- Multihop
Sending:
  IntMsg *message = (IntMsg *)data.data;
  message->val = value;
  atomic { message->src = TOS_LOCAL_ADDRESS; }
  call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data);
Receiving:
  event TOS_MsgPtr ReceiveIntMsg.receive(TOS_MsgPtr m) {
    IntMsg *message = (IntMsg *)m->data;
    call IntOutput.output(message->val);
    return m;
  }
Wiring to equate IntMsg to ReceiveIntMsg
27Multihop Networking
- Standard implementation: tree based routing
- Problems
- Parent Selection
- Asymmetric Links
- Adaptation vs. Stability
28Geographic Routing
- Any-to-any routing via geographic coordinates
- See GPSR, MOBICOM 2000, Karp & Kung.
- Requires coordinate system
- Could be virtual, as in Rao et al, Geographic Routing Without Coordinate Information, MOBICOM 2003
- Requires endpoint coordinates
- Hard to route around local minima (holes)
[Figure: nodes A and B]
29Power Management
- HPLPowerManagement
- TinyOS sleeps processor when possible
- Observes the radio, sensor, and timer state
- Application managed, for the most part
- App. must turn off subsystems when not in use
- Helper utility ServiceScheduler
- Periodically calls the start and stop methods of an app
- More on power management in TinyDB later
- Approach works because
- single application
- no interactivity requirements
30Non-Volatile Storage
- EEPROM
- 512K off chip, 32K on chip
- Writes at disk speeds, reads at RAM speeds
- Interface: random access, read/write 256 byte pages
- Maximum throughput: 10 Kbytes / second
- MatchBox Filing System
- Provides a Unix-like file I/O interface
- Single, flat directory
- Only one file being read/written at a time
31TinyOS Getting Started
- The TinyOS home page
- http://webs.cs.berkeley.edu/tinyos
- Start with the tutorials!
- The CVS repository
- http://sf.net/projects/tinyos
- The NesC Project Page
- http://sf.net/projects/nescc
- Crossbow motes (hardware)
- http://www.xbow.com
- Intel Imote
- www.intel.com/research/exploratory/motes.htm
32Part 2
- The Design and Implementation of TinyDB
33Part 2 Outline
- TinyDB Overview
- Data Model and Query Language
- TinyDB Java API and Scripting
- Demo with TinyDB GUI
- TinyDB Internals
- Extending TinyDB
- TinyDB Status and Roadmap
34TinyDB Revisited
SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms
- High level abstraction
- Data centric programming
- Interact with sensor network as a whole
- Extensible framework
- Under the hood
- Intelligent query processing: query optimization, power efficient execution
- Fault Mitigation: automatically introduce redundancy, avoid problem areas
App
Query, Trigger
Data
TinyDB
35Feature Overview
- Declarative SQL-like query interface
- Metadata catalog management
- Multiple concurrent queries
- Network monitoring (via queries)
- In-network, distributed query processing
- Extensible framework for attributes, commands and aggregates
- In-network, persistent storage
36Architecture
[Figure: PC side (TinyDB GUI, JDBC, TinyDB Client API, DBMS) connected to the mote side, where the TinyDB query processor runs on the motes of the sensor network (nodes 0-8)]
37Data Model
- Entire sensor network as one single, infinitely-long logical table: sensors
- Columns consist of all the attributes defined in the network
- Typical attributes
- Sensor readings
- Meta-data: node id, location, etc.
- Internal states: routing tree parent, timestamp, queue length, etc.
- Nodes return NULL for unknown attributes
- On server, all attributes are defined in catalog.xml
- Discussion: other alternative data models?
38Query Language (TinySQL)
- SELECT <aggregates>, <attributes>
- [FROM {sensors | <buffer>}]
- [WHERE <predicates>]
- [GROUP BY <exprs>]
- [SAMPLE PERIOD <const> | ONCE]
- [INTO <buffer>]
- [TRIGGER ACTION <command>]
39Comparison with SQL
- Single table in FROM clause
- Only conjunctive comparison predicates in WHERE and HAVING
- No subqueries
- No column alias in SELECT clause
- Arithmetic expressions limited to column op constant
- Only fundamental difference: SAMPLE PERIOD clause
40TinySQL Examples
Find the sensors in bright nests.
- SELECT nodeid, nestNo, light
- FROM sensors
- WHERE light > 400
- EPOCH DURATION 1s
Sensors
Epoch Nodeid nestNo Light
0 1 17 455
0 2 25 389
1 1 17 422
1 2 25 405
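To make the semantics of the query concrete, here is a small sketch that applies its WHERE clause to the four rows of the Sensors table above. It is a plain Python emulation for illustration, not how TinyDB evaluates queries on motes; `Reading` and `bright_nests` are hypothetical names.

```python
from collections import namedtuple

Reading = namedtuple("Reading", "epoch nodeid nestNo light")

# The four rows of the Sensors table shown on the slide.
SENSORS = [
    Reading(0, 1, 17, 455),
    Reading(0, 2, 25, 389),
    Reading(1, 1, 17, 422),
    Reading(1, 2, 25, 405),
]


def bright_nests(readings, threshold=400):
    """Rows satisfying WHERE light > threshold, tagged with their epoch."""
    return [
        (r.epoch, r.nodeid, r.nestNo, r.light)
        for r in readings
        if r.light > threshold
    ]


if __name__ == "__main__":
    for row in bright_nests(SENSORS):
        print(row)
```

Node 2 is filtered out in epoch 0 (light 389) but reports in epoch 1 (light 405): the query is re-evaluated on fresh samples every epoch.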
41TinySQL Examples (cont.)
Count the number of occupied nests in each loud region of the island.
Epoch region CNT() AVG()
0 North 3 360
0 South 3 520
1 North 3 370
1 South 3 520
42Event-based Queries
- ON event SELECT
- Run query only when interesting events happen
- Event examples
- Button pushed
- Message arrival
- Bird enters nest
- Analogous to triggers but events are user-defined
43Query over Stored Data
- Named buffers in Flash memory
- Store query results in buffers
- Query over named buffers
- Analogous to materialized views
- Example
- CREATE BUFFER name SIZE x (field1 type1, field2 type2, ...)
- SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
- SELECT field1, field2, ... FROM name SAMPLE PERIOD d
44Using the Java API
- SensorQueryer
- translateQuery() converts TinySQL string into TinyDBQuery object
- Static query optimization
- TinyDBNetwork
- sendQuery() injects query into network
- abortQuery() stops a running query
- addResultListener() adds a ResultListener that is invoked for every QueryResult received
- removeResultListener()
- QueryResult
- A complete result tuple, or
- A partial aggregate result, call mergeQueryResult() to combine partial results
- Key difference from JDBC: push vs. pull
45Writing Scripts with TinyDB
- TinyDB's text interface
- java net.tinyos.tinydb.TinyDBMain run select
- Query results printed out to the console
- All motes get reset each time new query is posed
- Handy for writing scripts with shell, perl, etc.
46Using the GUI Tools
47Inside TinyDB
- 10,000 Lines Embedded C Code
- 5,000 Lines (PC-Side) Java
- 3200 Bytes RAM (w/ 768 byte heap)
- 58 kB compiled code (3x larger than 2nd largest TinyOS Program)
[Figure: TinyDB layered on TinyOS, with Schema, Query Processor (e.g. Filter light > 400) and Multihop Network components]
48Tree-based Routing
- Tree-based routing
- Used in
- Query delivery
- Data collection
- In-network aggregation
- Relationship to indexing?
49Power Management Approach
- Coarse-grained app-controlled communication scheduling
[Figure: timeline of mote IDs 1-5 over time; motes sleep (zzz) for most of each epoch (10s-100s of seconds) and share a 2-4s waking period]
50Time Synchronization
- All messages include a 5 byte time stamp indicating system time in ms
- Synchronize (e.g. set system time to timestamp) with
- Any message from parent
- Any new query message (even if not from parent)
- Punt on multiple queries
- Timestamps written just after preamble is xmitted
- All nodes agree that the waking period begins when (system time % epoch dur == 0)
- And lasts for WAKING_PERIOD ms
- Adjustment of clock happens by changing duration of sleep cycle, not wake cycle.
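The waking rule can be sketched in a few lines. This is an illustrative Python model rather than TinyDB's nesC implementation: the 3000 ms value for WAKING_PERIOD is an assumption (the deck only gives a 2-4 s range), and the function names are hypothetical.

```python
WAKING_PERIOD_MS = 3000  # assumed value; the deck only says 2-4 s


def is_awake(system_time_ms, epoch_dur_ms, waking_period_ms=WAKING_PERIOD_MS):
    """True while inside the waking period that opens each epoch."""
    return system_time_ms % epoch_dur_ms < waking_period_ms


def ms_until_next_wake(system_time_ms, epoch_dur_ms):
    """How long to sleep so that waking lands on an epoch boundary."""
    remainder = system_time_ms % epoch_dur_ms
    return 0 if remainder == 0 else epoch_dur_ms - remainder


def synchronize(local_time_ms, parent_timestamp_ms):
    """Adopt the parent's timestamp; return (new time, offset applied)."""
    return parent_timestamp_ms, parent_timestamp_ms - local_time_ms
```

Because the next sleep is computed from the corrected system time, a clock adjustment shows up as a longer or shorter sleep while the waking period keeps its length.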
51Extending TinyDB
- Why extend TinyDB?
- New sensors -> attributes
- New control/actuation -> commands
- New data processing logic -> aggregates
- New events
- Analogous to concepts in object-relational
databases
52Adding Attributes
- Types of attributes
- Sensor attributes: raw or cooked sensor readings
- Introspective attributes: parent, voltage, ram usage, etc.
- Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc.
53Adding Attributes (cont)
- Interfaces provided by Attr component
- StdControl init, start, stop
- AttrRegister
- command registerAttr(name, type, len)
- event getAttr(name, resultBuf, errorPtr)
- event setAttr(name, val)
- command getAttrDone(name, resultBuf, error)
- AttrUse
- command startAttr(attr)
- event startAttrDone(attr)
- command getAttrValue(name, resultBuf, errorPtr)
- event getAttrDone(name, resultBuf, error)
- command setAttrValue(name, val)
54Adding Attributes (cont)
- Steps to adding attributes to TinyDB
- Create attribute nesC components
- Wire new attribute components to TinyDBAttr configuration
- Reprogram TinyDB motes
- Add new attribute entries to catalog.xml
- Constant attributes can be added on the fly through TinyDB GUI
55Adding Aggregates
- Step 1 wire new nesC components
56Adding Aggregates (cont)
- Step 2: add entry to catalog.xml
<aggregate>
  <name>AVG</name>
  <id>5</id>
  <temporal>false</temporal>
  <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
</aggregate>
- Step 3 (optional): implement reader class in Java
- a reader class interprets and finalizes aggregate state received from the mote network, returns final result as a string for display.
57TinyDB Status
- Latest release with TinyOS 1.1 (9/03)
- Install the task-tinydb package in TinyOS 1.1 distribution
- First release in TinyOS 1.0 (9/02)
- Widely used by research groups as well as industry pilot projects
- Successful deployments in Intel Berkeley Lab and redwood trees at UC Botanical Garden
- Largest deployment: 80 weather station nodes
- Network longevity: 4-5 months
58The Redwood Tree Deployment
- Redwood Grove in UC Botanical Garden, Berkeley
- Collect dense sensor readings to monitor climatic variations across
- altitudes,
- angles,
- time,
- forest locations, etc.
- Versus sporadic monitoring points with 30lb loggers!
- Current focus: study how dense sensor data affect predictions of conventional tree-growth models
59Data from Redwoods
36m
33m 111
32m 110
30m 109,108,107
20m 106,105,104
10m 103, 102, 101
60TinyDB Roadmap (near term)
- Support for high frequency sampling
- Equipment vibration monitoring, structural monitoring, etc.
- Store and forward
- Bulk reliable data transfer
- Scheduling of communications
- Port to Intel Mote
- Deployment in Intel Fab equipment monitoring application and the Golden Gate Bridge monitoring application
61For more information
- http://berkeley.intel-research.net/tinydb or http://triplerock.cs.berkeley.edu/tinydb
62Part 3
- Database Research Issues in Sensor Networks
63Sensor Network Research
- Very active research area
- Can't summarize it all
- Focus: database-relevant research topics
- Some outside of Berkeley
- Other topics that are itching to be scratched
- But, some bias towards work that we find
compelling
64Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
66Tiny Aggregation (TAG)
- In-network processing of aggregates
- Common data analysis operation
- Aka gather operation or reduction in parallel programming
- Communication reducing
- Operator dependent benefit
- Across nodes during same epoch
- Exploit query semantics to improve efficiency!
Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
67Basic Aggregation
- In each epoch
- Each node samples local sensors once
- Generates partial state record (PSR)
- local readings
- readings from children
- Outputs PSR during assigned comm. interval
- At end of epoch, PSR for whole network output at root
- New result on each successive epoch
- Extras
- Predicate-based partitioning via GROUP BY
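A minimal sketch of the per-epoch scheme for SELECT COUNT(*), comparing message counts with and without in-network aggregation. The routing tree used in the example is a made-up topology, and both function names are hypothetical; real TinyDB schedules transmissions by communication interval, which this model ignores.

```python
def count_in_network(children, root):
    """Return (count at root, messages sent) for one epoch of COUNT(*).

    `children` maps a node id to the ids of its children in the routing
    tree. Every non-root node sends exactly one partial state record.
    """
    messages = 0

    def partial_count(node):
        nonlocal messages
        total = 1  # the node's own reading
        for child in children.get(node, []):
            total += partial_count(child)
            messages += 1  # one PSR from child to node
        return total

    return partial_count(root), messages


def messages_without_aggregation(children, root):
    """Messages needed if every reading is relayed hop by hop to the root."""

    def depth_sum(node, depth):
        return depth + sum(depth_sum(c, depth + 1) for c in children.get(node, []))

    return depth_sum(root, 0)


if __name__ == "__main__":
    tree = {1: [2, 3], 3: [4], 4: [5]}  # hypothetical 5-node routing tree
    print(count_in_network(tree, root=1))
    print(messages_without_aggregation(tree, root=1))
```

With aggregation each node transmits once per epoch regardless of depth, whereas relaying raw readings costs one message per hop per reading, so the gap grows with tree depth.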
68Illustration Aggregation
SELECT COUNT(*) FROM sensors
Interval 4
[Figure: network of sensors 1-5 beside a table of partial counts by sensor # and interval # within an epoch. Entries so far: interval 4: 1]
69Illustration Aggregation
SELECT COUNT(*) FROM sensors
Interval 3
[Figure: network of sensors 1-5 beside a table of partial counts by sensor # and interval #. Entries so far: interval 4: 1; interval 3: 2]
70Illustration Aggregation
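SELECT COUNT(*) FROM sensors
Interval 2
[Figure: network of sensors 1-5 beside a table of partial counts by sensor # and interval #. Entries so far: interval 4: 1; interval 3: 2; interval 2: 1, 3]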
71Illustration Aggregation
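SELECT COUNT(*) FROM sensors
Interval 1
[Figure: network of sensors 1-5 beside a table of partial counts by sensor # and interval #. Entries so far: interval 4: 1; interval 3: 2; interval 2: 1, 3; interval 1: 5]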
72Illustration Aggregation
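SELECT COUNT(*) FROM sensors
Interval 4
[Figure: network of sensors 1-5 beside a table of partial counts by sensor # and interval #. Entries: interval 4: 1; interval 3: 2; interval 2: 1, 3; interval 1: 5; interval 4 (next epoch): 1]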
73Aggregation Framework
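- As in extensible databases, TinyDB supports any aggregation function conforming to:
$Agg_n = \{f_{init}, f_{merge}, f_{evaluate}\}$
$f_{init}\{a_0\} \rightarrow \langle a_0 \rangle$
$f_{merge}\{\langle a_1 \rangle, \langle a_2 \rangle\} \rightarrow \langle a_{12} \rangle$
$f_{evaluate}\{\langle a_1 \rangle\} \rightarrow$ aggregate value
where $\langle a \rangle$ is a Partial State Record (PSR)
Example: Average
$AVG_{init}\{v\} \rightarrow \langle v, 1 \rangle$
$AVG_{merge}\{\langle S_1, C_1 \rangle, \langle S_2, C_2 \rangle\} \rightarrow \langle S_1 + S_2, C_1 + C_2 \rangle$
$AVG_{evaluate}\{\langle S, C \rangle\} \rightarrow S/C$
Restriction: Merge associative, commutative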
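The AVG example above translates directly into three small functions. This is a Python sketch of the init/merge/evaluate pattern, not TinyDB's nesC aggregate interface; the function names and the sample readings are illustrative.

```python
from functools import reduce


def avg_init(v):
    """Turn one sensor reading into a partial state record <sum, count>."""
    return (v, 1)


def avg_merge(a, b):
    """Combine two partial state records; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])


def avg_evaluate(psr):
    """Compute the final aggregate value from a partial state record."""
    s, c = psr
    return s / c


if __name__ == "__main__":
    psrs = [avg_init(v) for v in (10, 20, 30, 40)]
    in_order = reduce(avg_merge, psrs)
    # A different grouping, as a different routing tree would produce.
    regrouped = avg_merge(avg_merge(psrs[3], psrs[1]), avg_merge(psrs[0], psrs[2]))
    print(avg_evaluate(in_order), avg_evaluate(regrouped))
```

Because merge is associative and commutative, the result does not depend on how the routing tree happens to group the partial state records.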
74Taxonomy of Aggregates
- TAG insight: classify aggregates according to various functional properties
- Yields a general set of optimizations that can automatically be applied
- Drives an API!
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Monotonicity | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy
75Use Multiple Parents
- Use graph structure
- Increase delivery probability with no communication overhead
- For duplicate insensitive aggregates, or
- Aggs expressible as sum of parts
- Send (part of) aggregate to all parents
- In just one message, via multicast
- Assuming independence, decreases variance
SELECT COUNT(*)
One parent:
$P(\text{link xmit successful}) = p$
$P(\text{success from A} \rightarrow \text{R}) = p^2$
$E(cnt) = c \cdot p^2$
$Var(cnt) = c^2 \cdot p^2 \cdot (1 - p^2) \equiv V$
# of parents = n:
$E(cnt) = n \cdot (c/n \cdot p^2)$
$Var(cnt) = n \cdot (c/n)^2 \cdot p^2 \cdot (1 - p^2) = V/n$
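The analysis above can be checked numerically. The sketch below evaluates the mean and variance of the count that reaches the root over a two-hop path, once with a single parent and once with the count split evenly over n parents. It assumes independent link losses, as the slide does; the function names and the sample values of c, p and n are illustrative.

```python
def single_parent_stats(c, p):
    """Mean and variance when the whole count c crosses two links in a row."""
    q = p * p  # P(success from A -> R): two independent hops
    return c * q, c * c * q * (1.0 - q)


def multi_parent_stats(c, p, n):
    """Mean and variance when c is split evenly over n independent parents."""
    q = p * p
    part = c / n
    return n * part * q, n * part * part * q * (1.0 - q)


if __name__ == "__main__":
    mean1, var1 = single_parent_stats(c=10, p=0.9)
    mean2, var2 = multi_parent_stats(c=10, p=0.9, n=2)
    print(mean1, var1)
    print(mean2, var2)
```

The expected count is unchanged while the variance drops by a factor of n, which matches the V/n result in the analysis.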
76Multiple Parents Results
- Better than previous analysis expected!
- Losses aren't independent!
- Insight: spreads data over many links
77Acquisitional Query Processing (ACQP)
- TinyDB acquires AND processes data
- Could generate an infinite number of samples
- An acquisitional query processor controls
- when,
- where,
- and with what frequency data is collected!
- Versus traditional systems where data is provided a priori
Madden, Franklin, Hellerstein, and Hong. The Design of An Acquisitional Query Processor. SIGMOD, 2003.
78ACQP: What's Different?
- How should the query be processed?
- Sampling as a first class operation
- How does the user control acquisition?
- Rates or lifetimes
- Event-based triggers
- Which nodes have relevant data?
- Index-like data structures
- Which samples should be transmitted?
- Prioritization, summary, and rate control
79Operator Ordering: Interleave Sampling + Selection
At 1 sample / sec, total power savings could be as much as 3.5mW -> Comparable to processor!
- SELECT light, mag
- FROM sensors
- WHERE pred1(mag)
- AND pred2(light)
- EPOCH DURATION 1s
- E(sampling mag) >> E(sampling light)
- 1500 uJ vs. 90 uJ
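A small sketch of why the sampling order matters for this query. It uses the two per-sample energies quoted above and a simple cost model in which the second attribute is sampled only when the first predicate passes; the selectivity values and function names are assumptions for illustration, not measurements.

```python
E_MAG_UJ = 1500.0   # energy to sample the magnetometer (from the slide)
E_LIGHT_UJ = 90.0   # energy to sample the light sensor (from the slide)


def expected_energy_uj(first_cost, second_cost, first_selectivity):
    """Always sample the first attribute; sample the second only if the
    first predicate passes (with probability first_selectivity)."""
    return first_cost + first_selectivity * second_cost


def best_order(sel_mag, sel_light):
    """Pick the cheaper of the two sampling orders for one epoch."""
    mag_first = expected_energy_uj(E_MAG_UJ, E_LIGHT_UJ, sel_mag)
    light_first = expected_energy_uj(E_LIGHT_UJ, E_MAG_UJ, sel_light)
    if light_first <= mag_first:
        return ("light, mag", light_first)
    return ("mag, light", mag_first)


if __name__ == "__main__":
    print(best_order(sel_mag=0.5, sel_light=0.5))
```

With these costs, sampling light first wins unless pred2(light) passes almost always and pred1(mag) almost never, so the optimizer should interleave the cheap sample and its predicate ahead of the expensive one.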
80Exemplary Aggregate Pushdown
- SELECT WINMAX(light,8s,8s)
- FROM sensors
- WHERE mag > x
- EPOCH DURATION 1s
- Novel, general pushdown technique
- Mag sampling is the most expensive operation!
81Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
82Heterogeneous Sensor Networks
- Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes
- Still must be transparent and ad-hoc
- Key to scalability of sensor networks
- Interesting heterogeneities
- Energy: battery vs. outlet power
- Link bandwidth: Chipcon vs. 802.11x
- Computing and storage: ATMega128 vs. Xscale
- Pre-computed results
- Sensing nodes vs. QP nodes
83Computing Heterogeneity with TinyDB
- Separate query processing from sensing
- Provide query processing on a small number of nodes
- Attract packets to query processors based on service value
- Compare the total energy consumption of the network
- No aggregation
- All aggregation
- Opportunistic aggregation
- HSN proactive aggregation
Mark Yarvis and York Liu, Intel's Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.
845x7 TinyDB/HSN Mica2 Testbed
85Data Packet Saving
- How many aggregators are desired?
- Does placement matter?
86Occasionally Connected Sensornets
[Figure: sensornets reach the internet through fixed gateways (GTWY) and mobile gateways (Mobile GTWY)]
87Occasionally Connected Sensornets Challenges
- Networking support
- Tradeoff between reliability, power consumption and delay
- Data custody transfer: duplicates?
- Load shedding
- Routing of mobile gateways
- Query processing
- Operation placement: in-network vs. on mobile gateways
- Proactive pre-computation and data movement
- Tight interaction between networking and QP
Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf.
88Distributed In-network Storage
- Collectively, sensornets have large amounts of in-network storage
- Good for in-network consumption or caching
- Challenges
- Distributed indexing for fast query dissemination
- Resilience to node or link failures
- Graceful adaptation to data skews
- Minimizing index insertion/maintenance cost
89Example DIM
- Functionality
- Efficient range query for multidimensional data.
- Approaches
- Divide sensor field into bins.
- Locality preserving mapping from m-d space to geographic locations.
- Use geographic routing such as GPSR.
- Assumptions
- Nodes know their locations and network boundary
- No node mobility
Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003.
90Statistical Techniques
- Approximations, summaries, and sampling based on statistics
- Applications
- Limited bandwidth and large number of nodes -> data reduction
- Lossiness -> predictive modeling
- Uncertainty -> tracking correlations and changes over time
91IDSQ
- Idea: task sensors in order of best improvement to estimate of some value
- Choose leader(s)
- Suppress subordinates
- Task subordinates, one at a time
- Until some measure of goodness (error bound) is met
- E.g. Mahalanobis Distance -- Accounts for correlations in axes, tends to favor minimizing principal axis
See Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks. Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.
92Model location estimate as a point with
2-dimensional Gaussian uncertainty.
Graphical Representation
Principal Axis
93In-Net Regression
- Linear regression: simple way to predict future values, identify outliers
- Regression can be across local or remote values, multiple dimensions, or with high degree polynomials
- E.g., node A readings vs. node B's
- Or, location (X,Y), versus temperature
- E.g., over many nodes
Guestrin, Thibaux, Bodik, Paskin, Madden. Distributed Regression: an Efficient Framework for Modeling Sensor Network Data. Under submission.
94In-Net Regression (Continued)
- Problem: may require data from all sensors to build model
- Solution: partition sensors into overlapping kernels that influence each other
- Run regression in each kernel
- Requiring just local communication
- Blend data between kernels
- Requires some clever matrix manipulation
- End result: regressed model at every node
- Useful in failure detection, missing value estimation
95Correlated Attributes
- Data in sensor networks is correlated e.g.,
- Temperature and voltage
- Temperature and light
- Temperature and humidity
- Temperature and time of day
- etc.
96Exploiting Correlations in Query Processing
- Simple idea
- Given predicate P(A) over expensive attribute A
- Replace it with P' over cheap attribute A' such that P' evaluates to P
- Problem: unless A and A' are perfectly correlated, P' ≠ P for all time
- So we could incorrectly accept or reject some readings
- Alternative: use correlations to improve selectivity estimates in query optimization
- Construct conditional plans that vary predicate order based on prior observations
97Exploiting Correlations (Cont.)
- Insight: by observing a (cheap and correlated) variable not involved in the query, it may be possible to improve query performance
- Improves estimates of selectivities
- Use conditional plans
- Example
98In-Network Join Strategies
- Types of joins
- non-sensor -> sensor
- sensor -> sensor
- Optimization questions
- Should the join be pushed down?
- If so, where should it be placed?
- What if a join table exceeds the memory available
on one node?
99Choosing Where to Place Operators
- Idea: choose a join node to run the operator
- Over time, explore other candidate placements
- Nodes advertise data rates to their neighbors
- Neighbors compute expected cost of running the join based on these rates
- Neighbors advertise costs
- Current join node selects a new, lower cost node
Bonfils & Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing. IPSN 2003.
100Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
101Adaptivity In Sensor Networks
- Queries are long running
- Selectivities change
- E.g. night vs day
- Network load and available energy vary
- All suggest that some adaptivity is needed
- Of data rates or granularity of aggregation when optimizing for lifetimes
- Of operator orderings or placements when selectivities change (c.f., conditional plans for correlations)
- As far as we know, this is an open problem!
102Multiple Queries and Work Sharing
- As sensornets evolve, users will run many queries simultaneously
- E.g., traffic monitoring
- Likely that queries will be similar
- But have different end points, parameters, etc
- Would like to share processing, routing as much as possible
- But how? Again, an open problem.
103Concluding Remarks
- Sensor networks are an exciting emerging technology, with a wide variety of applications
- Many research challenges in all areas of computer science
- Database community included
- Some agreement that a declarative interface is right
- TinyDB and other early work are an important first step
- But there's lots more to be done!