Title: Implementation and Research Issues in Query Processing for Wireless Sensor Networks
1Implementation and Research Issues in Query
Processing for Wireless Sensor Networks
- Wei Hong
- Intel Research, Berkeley
- whong_at_intel-research.net
Sam Madden MIT madden_at_csail.mit.edu
Adapted by L.B.
2Declarative Queries
- Programming Apps is Hard
- Limited power budget
- Lossy, low bandwidth communication
- Require long-lived, zero admin deployments
- Distributed Algorithms
- Limited tools, debugging interfaces
- Queries abstract away much of the complexity
- Burden on the database developers
- Users get
- Safe, optimizable programs
- Freedom to think about apps instead of details
3TinyDB Prototype declarativequery processor
- Platform Berkeley Motes TinyOS
- Continuous variant of SQL TinySQL
- Power and data-acquisition based in-network
optimization framework - Extensible interface for aggregates, new types of
sensors
4TinyDB Revisited
SELECT MAX(mag) FROM sensors WHERE mag gt
thresh SAMPLE PERIOD 64ms
- High level abstraction
- Data centric programming
- Interact with sensor network as a whole
- Extensible framework
- Under the hood
- Intelligent query processing query optimization,
power efficient execution - Fault Mitigation automatically introduce
redundancy, avoid problem areas
App
Query, Trigger
Data
TinyDB
5Feature Overview
- Declarative SQL-like query interface
- Metadata catalog management
- Multiple concurrent queries
- Network monitoring (via queries)
- In-network, distributed query processing
- Extensible framework for attributes, commands and
aggregates - In-network, persistent storage
6Architecture
TinyDB GUI
JDBC
TinyDB Client API
DBMS
PC side
0
Mote side
0
TinyDB query processor
2
1
3
8
4
5
6
Sensor network
7
7Data Model
- Entire sensor network as one single,
infinitely-long logical table sensors - Columns consist of all the attributes defined in
the network - Typical attributes
- Sensor readings
- Meta-data node id, location, etc.
- Internal states routing tree parent, timestamp,
queue length, etc. - Nodes return NULL for unknown attributes
- On server, all attributes are defined in
catalog.xml - Discussion other alternative data models?
8Query Language (TinySQL)
- SELECT ltaggregatesgt, ltattributesgt
- FROM sensors ltbuffergt
- WHERE ltpredicatesgt
- GROUP BY ltexprsgt
- SAMPLE PERIOD ltconstgt ONCE
- INTO ltbuffergt
- TRIGGER ACTION ltcommandgt
9Comparison with SQL
- Single table in FROM clause
- Only conjunctive comparison predicates in WHERE
and HAVING - No subqueries
- No column alias in SELECT clause
- Arithmetic expressions limited to column op
constant - Only fundamental difference SAMPLE PERIOD clause
10TinySQL Examples
Find the sensors in bright nests.
Sensors
- SELECT nodeid, nestNo, light
- FROM sensors
- WHERE light gt 400
- EPOCH DURATION 1s
1
Epoch Nodeid nestNo Light
0 1 17 455
0 2 25 389
1 1 17 422
1 2 25 405
11TinySQL Examples (cont.)
Count the number occupied nests in each loud
region of the island.
Epoch region CNT() AVG()
0 North 3 360
0 South 3 520
1 North 3 370
1 South 3 520
12Event-based Queries
- ON event SELECT
- Run query only when interesting events happens
- Event examples
- Button pushed
- Message arrival
- Bird enters nest
- Analogous to triggers but events are user-defined
13Query over Stored Data
- Named buffers in Flash memory
- Store query results in buffers
- Query over named buffers
- Analogous to materialized views
- Example
- CREATE BUFFER name SIZE x (field1 type1, field2
type2, ) - SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO
name - SELECT field1, field2, FROM name SAMPLE PERIOD d
14Inside TinyDB
Multihop Network
Query Processor
10,000 Lines Embedded C Code 5,000 Lines
(PC-Side) Java 3200 Bytes RAM (w/ 768 byte
heap) 58 kB compiled code (3x larger than 2nd
largest TinyOS Program)
Filterlight gt 400
Schema
TinyOS
TinyDB
15Tree-based Routing
- Tree-based routing
- Used in
- Query delivery
- Data collection
- In-network aggregation
- Relationship to indexing?
16Sensor Network Research
- Very active research area
- Cant summarize it all
- Focus database-relevant research topics
- Some outside of Berkeley
- Other topics that are itching to be scratched
- But, some bias towards work that we find
compelling
17Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
18Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
19Tiny Aggregation (TAG)
- In-network processing of aggregates
- Common data analysis operation
- Aka gather operation or reduction in
programming - Communication reducing
- Operator dependent benefit
- Across nodes during same epoch
- Exploit query semantics to improve efficiency!
Madden, Franklin, Hellerstein, Hong. Tiny
AGgregation (TAG), OSDI 2002.
20Basic Aggregation
- In each epoch
- Each node samples local sensors once
- Generates partial state record (PSR)
- local readings
- readings from children
- Outputs PSR during assigned comm. interval
- At end of epoch, PSR for whole network output at
root - New result on each successive epoch
- Extras
- Predicate-based partitioning via GROUP BY
21Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 4
Sensor
Epoch
1 2 3 4 5
4 1
3
2
1
4
Interval
1
22Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 3
Sensor
1 2 3 4 5
4 1
3 2
2
1
4
2
Interval
23Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 2
Sensor
1 2 3 4 5
4 1
3 2
2 1 3
1
4
1
3
Interval
24Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 1
5
Sensor
1 2 3 4 5
4 1
3 2
2 1 3
1 5
4
Interval
25Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 4
Sensor
1 2 3 4 5
4 1
3 2
2 1 3
1 5
4 1
Interval
1
26Aggregation Framework
- As in extensible databases, TinyDB supports any
aggregation function conforming to
Aggnfinit, fmerge, fevaluate Finit a0 ?
lta0gt Fmerge lta1gt,lta2gt ? lta12gt Fevaluate lta1gt
? aggregate value
Partial State Record (PSR)
Example Average AVGinit v ?
ltv,1gt AVGmerge ltS1, C1gt, ltS2, C2gt ? lt S1
S2 , C1 C2gt AVGevaluateltS, Cgt ? S/C
Restriction Merge associative, commutative
27Taxonomy of Aggregates
- TAG insight classify aggregates according to
various functional properties - Yields a general set of optimizations that can
automatically be applied
Drives an API!
Property Examples Affects
Partial State MEDIAN unbounded, MAX 1 record Effectiveness of TAG
Monotonicity COUNT monotonic AVG non-monotonic Hypothesis Testing, Snooping
Exemplary vs. Summary MAX exemplary COUNT summary Applicability of Sampling, Effect of Loss
Duplicate Sensitivity MIN dup. insensitive, AVG dup. sensitive Routing Redundancy
28Use Multiple Parents
- Use graph structure
- Increase delivery probability with no
communication overhead - For duplicate insensitive aggregates, or
- Aggs expressible as sum of parts
- Send (part of) aggregate to all parents
- In just one message, via multicast
- Assuming independence, decreases variance
SELECT COUNT()
of parents n E(cnt) n (c/n
p2) Var(cnt) n (c/n)2 p2 (1 p2) V/n
P(link xmit successful) p P(success from A-gtR)
p2 E(cnt) c p2 Var(cnt) c2 p2 (1
p2) ? V
29Multiple Parents Results
- Better than previous analysis expected!
- Losses arent independent!
- Insight spreads data over many links
30Acquisitional Query Processing (ACQP)
- TinyDB acquires AND processes data
- Could generate an infinite number of samples
- An acqusitional query processor controls
- when,
- where,
- and with what frequency data is collected!
- Versus traditional systems where data is provided
a priori
Madden, Franklin, Hellerstein, and Hong. The
Design of An Acqusitional Query Processor.
SIGMOD, 2003.
31ACQP Whats Different?
- How should the query be processed?
- Sampling as a first class operation
- How does the user control acquisition?
- Rates or lifetimes
- Event-based triggers
- Which nodes have relevant data?
- Index-like data structures
- Which samples should be transmitted?
- Prioritization, summary, and rate control
32Operator Ordering Interleave Sampling Selection
At 1 sample / sec, total power savings could be
as much as 3.5mW ? Comparable to processor!
- SELECT light, mag
- FROM sensors
- WHERE pred1(mag)
- AND pred2(light)
- EPOCH DURATION 1s
- E(sampling mag) gtgt E(sampling light)
- 1500 uJ vs. 90 uJ
33Exemplary Aggregate Pushdown
- SELECT WINMAX(light,8s,8s)
- FROM sensors
- WHERE mag gt x
- EPOCH DURATION 1s
- Novel, general pushdown technique
- Mag sampling is the most expensive operation!
34Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
35Heterogeneous Sensor Networks
- Leverage small numbers of high-end nodes to
benefit large numbers of inexpensive nodes - Still must be transparent and ad-hoc
- Key to scalability of sensor networks
- Interesting heterogeneities
- Energy battery vs. outlet power
- Link bandwidth Chipcon vs. 802.11x
- Computing and storage ATMega128 vs. Xscale
- Pre-computed results
- Sensing nodes vs. QP nodes
36Computing Heterogeneity with TinyDB
- Separate query processing from sensing
- Provide query processing on a small number of
nodes - Attract packets to query processors based on
service value - Compare the total energy consumption of the
network
- No aggregation
- All aggregation
- Opportunistic aggregation
- HSN proactive aggregation
Mark Yarvis and York Liu, Intels Heterogeneous
Sensor Network Project, ftp//download.intel.com/r
esearch/people/HSN_IR_Day_Poster_03.pdf.
375x7 TinyDB/HSN Mica2 Testbed
38Data Packet Saving
- How many aggregators are desired?
- Does placement matter?
39Occasionally Connected Sensornets
internet
TinyDB Server
GTWY
TinyDB QP
Mobile GTWY
Mobile GTWY
GTWY
GTWY
TinyDB QP
TinyDB QP
40Occasionally Connected Sensornets Challenges
- Networking support
- Tradeoff between reliability, power consumption
and delay - Data custody transfer duplicates?
- Load shedding
- Routing of mobile gateways
- Query processing
- Operation placement in-network vs. on mobile
gateways - Proactive pre-computation and data movement
- Tight interaction between networking and QP
Fall, Hong and Madden, Custody Transfer for
Reliable Delivery in Delay Tolerant Networks,
http//www.intel-research.net/Publications/Berkele
y/081220030852_157.pdf.
41Distributed In-network Storage
- Collectively, sensornets have large amounts of
in-network storage - Good for in-network consumption or caching
- Challenges
- Distributed indexing for fast query dissemination
- Resilience to node or link failures
- Graceful adaptation to data skews
- Minimizing index insertion/maintenance cost
42Example DIM
- Functionality
- Efficient range query for multidimensional data.
- Approaches
- Divide sensor field into bins.
- Locality preserving mapping from m-d space to
geographic locations. - Use geographic routing such as GPSR.
- Assumptions
- Nodes know their locations and network boundary
- No node mobility
Xin Li, Young Jin Kim, Ramesh Govindan and Wei
Hong, Distributed Index for Multi-dimentional
Data (DIM) in Sensor Networks, SenSys 2003.
43Statistical Techniques
- Approximations, summaries, and sampling based on
statistics and statistical models - Applications
- Limited bandwidth and large number of nodes -gt
data reduction - Lossiness -gt predictive modeling
- Uncertainty -gt tracking correlations and changes
over time - Physical models -gt improved query answering
44Correlated Attributes
- Data in sensor networks is correlated e.g.,
- Temperature and voltage
- Temperature and light
- Temperature and humidity
- Temperature and time of day
- etc.
45IDSQ
- Idea task sensors in order of best improvement
to estimate of some value - Choose leader(s)
- Suppress subordinates
- Task subordinates, one at a time
- Until some measure of goodness (error bound) is
met - E.g. Mahalanobis Distance -- Accounts for
correlations in axes, tends to favor minimizing
principal axis
See Scalable Information-Driven Sensor Querying
and Routing for ad hoc Heterogeneous Sensor
Networks. Chu, Haussecker and Zhao. Xerox TR
P2001-10113. May, 2001.
46Model location estimate as a point with
2-dimensional Gaussian uncertainty.
Graphical Representation
Principal Axis
47MQSN Model-based Probabilistic Querying over
Sensor Networks
Joint work with Amol Desphande, Carlos Guestrin,
and Joe Hellerstein
Query Processor
Model
1
3
4
2
5
6
7
8
9
48MQSN Model-based Probabilistic Querying over
Sensor Networks
Query Processor
Model
Consult Model
1
3
4
2
5
6
7
8
9
49MQSN Model-based Probabilistic Querying over
Sensor Networks
Query Processor
Model
Consult Model
1
3
4
2
5
6
7
8
9
50MQSN Model-based Probabilistic Querying over
Sensor Networks
Query Results
Query Processor
Model
Update Model
1
3
4
2
5
6
7
8
9
51Challenges
- What kind of models to use ?
- Optimization problem
- Given a model and a query, find the best set of
attributes to observe - Cost not easy to measure
- Non-uniform network communication costs
- Changing network topologies
- Large plan space
- Might be cheaper to observe attributes not in
query - e.g. Voltage instead of Temperature
- Conditional Plans
- Change the observation plan based on observed
values
52MQSN Current Prototype
- Multi-variate Gaussian Models
- Kalman Filters to capture correlations across
time - Handles
- Range predicate queries
- sensor value within x,y, w/ confidence
- Value queries
- sensor value x, w/in epsilon, w/ confidence
- Simple aggregate queries
- AVG(sensor value) ? n, w/in epsilon, w/confidence
- Uses a greedy algorithm to choose the observation
plan
53In-Net Regression
- Linear regression simple way to predict future
values, identify outliers
- Regression can be across local or remote values,
multiple dimensions, or with high degree
polynomials - E.g., node A readings vs. node Bs
- Or, location (X,Y), versus temperature
- E.g., over many nodes
Guestrin, Thibaux, Bodik, Paskin, Madden.
Distributed Regression an Efficient Framework
for Modeling Sensor Network Data . Under
submission.
54In-Net Regression (Continued)
- Problem may require data from all sensors to
build model - Solution partition sensors into overlapping
kernels that influence each other - Run regression in each kernel
- Requiring just local communication
- Blend data between kernels
- Requires some clever matrix manipulation
- End result regressed model at every node
- Useful in failure detection, missing value
estimation
55Exploiting Correlations in Query Processing
- Simple idea
- Given predicate P(A) over expensive attribute A
- Replace it with P over cheap attribute A such
that P evaluates to P - Problem unless A and A are perfectly
correlated, P ? P for all time - So we could incorrectly accept or reject some
readings - Alternative use correlations to improve
selectivity estimates in query optimization - Construct conditional plans that vary predicate
order based on prior observations
56Exploiting Correlations (Cont.)
- Insight by observing a (cheap and correlated)
variable not involved in the query, it may be
possible to improve query performance - Improves estimates of selectivities
- Use conditional plans
- Example
57In-Network Join Strategies
- Types of joins
- non-sensor -gt sensor
- sensor -gt sensor
- Optimization questions
- Should the join be pushed down?
- If so, where should it be placed?
- What if a join table exceeds the memory available
on one node?
58Choosing Where to Place Operators
- Idea choose a join node to run the operator
- Over time, explore other candidate placements
- Nodes advertise data rates to their neighbors
- Neighbors compute expected cost of running the
join based on these rates - Neighbors advertise costs
- Current join node selects a new, lower cost node
Bonfils Bonnet, Adaptive and Decentralized
Operator Placement for In-Network QueryProcessing
IPSN 2003.
59Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
60Adaptivity In Sensor Networks
- Queries are long running
- Selectivities change
- E.g. night vs day
- Network load and available energy vary
- All suggest that some adaptivity is needed
- Of data rates or granularity of aggregation when
optimizing for lifetimes - Of operator orderings or placements when
selectivities change (c.f., conditional plans for
correlations) - As far as we know, this is an open problem!
61Multiple Queries and Work Sharing
- As sensornets evolve, users will run many queries
simultaneously - E.g., traffic monitoring
- Likely that queries will be similar
- But have different end points, parameters, etc
- Would like to share processing, routing as much
as possible - But how? Again, an open problem.
62Concluding Remarks
- Sensor networks are an exciting emerging
technology, with a wide variety of applications - Many research challenges in all areas of computer
science - Database community included
- Some agreement that a declarative interface is
right - TinyDB and other early work are an important
first step - But theres lots more to be done!