Title: The Cougar Approach to In-Network Query Processing in Sensor Networks
1 The Cougar Approach to In-Network Query Processing in Sensor Networks
- Authors: Yong Yao and Johannes Gehrke
- Cornell University
- Presented by Nanyan Jiang
2 Talk Outline
- Motivation
  - Declarative queries
  - In-network processing
- Architecture
- Research Problems
  - Aggregation
  - Query plan and optimization
  - Catalog management
  - Multi-query optimization
- Conclusion and Discussion
3 Motivation
- Many sensor network applications are data oriented
- Queries are a natural and efficient data processing mechanism
  - Easy to use, e.g. declarative queries
  - Enable optimizations through abstraction
- Aggregates are the common case
- In-network processing is a must
  - Sensor networks are power and bandwidth constrained
  - Communication dominates power cost
4 Cougar Approach: Distributed Sensor Databases
- Access to sensors is dictated by the query workload
- Trade-off between local computation on devices and reduced communication (in-network processing)
5 The Cougar View of Sensor Networks
- Traditional
  - Procedural addressing of individual sensor nodes: the user specifies how the task executes, and data is processed centrally.
- Cougar
  - Complex declarative querying and tasking: the user is isolated from how the network works, and processing is distributed in-network.
[Figure: queries are posed against sensor nodes that each hold small (Time, Value) tables of Temperature, Pressure, or Humidity readings, e.g. Temperature: (200, 15), (400, 12).]
6 Database Approaches for Accessing Sensor Networks
- Warehousing approach
  - Device data is extracted in a predefined way
  - Device data is stored in a centralized DB server
  - Queries are evaluated on the centralized DB server
- Distributed approach
  - Queries are evaluated by contacting devices
  - Portions of queries are executed on the devices
7 Query examples
- Snapshot queries
  - How many empty bird nests are in the northeastern quadrant of the forest?
      SELECT SUM(s)
      FROM SensorData s
      WHERE s.nest = empty AND s.loc IN (50,50,100,100)
- Long-running queries
  - Notify me over the next hour whenever the number of empty nests in an area exceeds a threshold.
      SELECT s.area, SUM(s)
      FROM SensorData s
      WHERE s.nest = empty
      GROUP BY s.area
      HAVING SUM(s) > T
      DURATION (now, now+60)
      EVERY 5
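The long-running query above can be illustrated with a minimal Python sketch of one evaluation epoch. It assumes that SUM(s) is meant as a count of sensors reporting an empty nest; the function name `areas_over_threshold` and the sample readings are hypothetical and not part of Cougar.

```python
from collections import Counter

def areas_over_threshold(readings, threshold):
    """Evaluate one epoch: count empty nests per area, keep areas above threshold.

    readings: iterable of (area, nest_is_empty) pairs, one per sensor (assumed layout).
    """
    # WHERE s.nest = empty ... GROUP BY s.area
    counts = Counter(area for area, is_empty in readings if is_empty)
    # HAVING SUM(s) > T
    return {area: n for area, n in counts.items() if n > threshold}

# Hypothetical readings for a single epoch.
readings = [("NE", True), ("NE", True), ("NE", False), ("SW", True)]
print(areas_over_threshold(readings, threshold=1))  # prints {'NE': 2}
```

Under the DURATION and EVERY clauses, a runtime would repeat this evaluation once per epoch until the duration expires.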
8 Cougar System Architecture
[Figure: system architecture diagram with the components Higher-level tasking and analysis, Proxy Server, Frontend, Query Proxy, Node, Diffusion Routing, and Signal Processing.]
9 Cougar (Mica Mote)
- A platform for testing query processing techniques over ad-hoc sensor networks
- Three tier system
  - Motes: run TinyOS, an embedded operating system from Berkeley
  - Server: handles the query interface with the motes and the database mapping
  - GUI: human-usable interface
10 Three Tier Architecture
- Thin Client GUI
- Wired/Wireless Communication
- Server (Java)
- Radio Communication
- TinyOS/nesC
11 Architecture
- The gateway node
  - A query optimizer generates distributed query plans
  - Query plan is created according to catalog information and the query specification
    - Data flow (between sensors)
    - Computation plan (at each sensor)
  - Plan is disseminated to the network
- Sensor node
  - Executes the plan accordingly
12 Query Plan
- Example: What is the quietest classroom in Upson Hall?
- A flow block is the basic component of a query plan
  - Determines the data flow between sensors
  - Determines the computation plan
- Query plan
  - Computation
    - Compute the average acoustic value of each open classroom
    - Select the room with the smallest average
    - The output of the first-level aggregation is the input of the second level
  - Communication
    - Structure, e.g. tree or DAG
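The two-level computation in the quietest-classroom example can be sketched in Python as follows. This is only an illustration run on a single machine, not the distributed plan itself; `quietest_room`, the room labels, and the acoustic values are assumptions made for the example.

```python
from statistics import mean

def quietest_room(samples, open_rooms):
    """Two-level aggregation over (room, acoustic_value) samples.

    Level 1: average acoustic value per open classroom.
    Level 2: minimum over those averages.
    Raises ValueError if no open room has any sample.
    """
    per_room = {}
    for room, value in samples:
        if room in open_rooms:
            per_room.setdefault(room, []).append(value)
    # Level 1 output is the input of level 2.
    averages = {room: mean(values) for room, values in per_room.items()}
    return min(averages.items(), key=lambda item: item[1])

# Hypothetical readings; room "305" is closed and therefore ignored.
samples = [("101", 40.0), ("101", 44.0), ("203", 35.0), ("203", 37.0), ("305", 20.0)]
print(quietest_room(samples, open_rooms={"101", "203"}))  # prints ('203', 36.0)
```

In a flow-block plan, level 1 could run at a leader node per classroom and level 2 at a node closer to the gateway, so only one average per room travels upward.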
13 How To Execute The Query?
- Sensor nodes have computation and storage capabilities.
- Two choices: centralized processing or in-network processing.
- Why in-network processing?
  - Sensor networks are power constrained.
  - Local computation is much cheaper than communication.
  - There is a lot of sensor data but few queries, and only a subset of the data is involved in queries.
- In-network processing provides a trade-off between computation and communication.
- Cougar prolongs the lifetime of sensor networks through distributed in-network processing and carefully designed sensor coordination.
14 Query Execution
- Server creates an optimized plan according to the user query and catalog information.
- Query plan is forwarded to relevant sensor nodes.
- Sensor nodes coordinate to execute the query according to the specification of the query plan.
- Server gets results from gateway nodes.
15 Research Issues
- In-network processing
- Aggregation
- Catalog Management
- Multi-query Optimization
16 Aggregation
SELECT SUM(s) FROM SensorData s WHERE s.nest = empty EVERY 60 min
- Aggregation refers to delivering data from distributed source sensors to a central node for computation.
  - Data records are delivered from source sensor nodes to a designated leader node
  - Aggregation is executed at the leader node
- Centralized Processing
- In-network Processing
  - Gehrke et al. (Cougar)
  - Madden et al. (TAG)
    - Forming the routing tree
    - Computing an aggregate
  - Intanagonwiwat et al. (Directed diffusion)
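The "forming the routing tree" step can be sketched as a breadth-first flood from the gateway, where each node adopts as parent the first neighbor it hears from. This is a simplified, centralized simulation written for illustration, not the TAG protocol itself; `build_routing_tree` and the node names are hypothetical.

```python
from collections import deque

def build_routing_tree(neighbors, root):
    """Return {node: parent} for a tree rooted at `root` (root maps to None).

    neighbors: {node: [radio neighbors]} describing connectivity (assumed known here).
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in neighbors[node]:
            if neighbor not in parent:  # first message heard wins
                parent[neighbor] = node
                queue.append(neighbor)
    return parent

# Hypothetical four-node network; "gw" is the gateway.
neighbors = {"gw": ["a", "b"], "a": ["gw", "c"], "b": ["gw", "c"], "c": ["a", "b"]}
print(build_routing_tree(neighbors, "gw"))
# prints {'gw': None, 'a': 'gw', 'b': 'gw', 'c': 'a'}
```

Nodes unreachable from the root simply do not appear in the result, which mirrors sensors that never hear the flood.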
17 In-network Processing
SELECT SUM(s) FROM SensorData s WHERE s.nest = empty EVERY 60 min
- With in-network processing, the number of results at each edge remains constant.
  - Reduces communication overhead
  - Reduces energy consumption
  - Increases network lifetime
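The claim that each edge carries a constant number of results can be checked with a small Python sketch that counts messages for a SUM over a routing tree, and compares this with forwarding every raw reading to the root. The helper names, the chain topology, and the readings are assumptions for illustration only.

```python
def in_network_sum(parent, values, root):
    """Return (total, messages): every non-root node sends one partial sum upward."""
    children = {node: [] for node in parent}
    for node, p in parent.items():
        if p is not None:
            children[p].append(node)

    messages = 0

    def partial(node):
        nonlocal messages
        total = values[node]
        for child in children[node]:
            total += partial(child)
            messages += 1  # exactly one partial result crosses edge child -> node
        return total

    return partial(root), messages

def centralized_messages(parent):
    """Messages needed if each raw reading is forwarded hop by hop to the root."""
    def depth(node):
        hops = 0
        while parent[node] is not None:
            node = parent[node]
            hops += 1
        return hops
    return sum(depth(node) for node in parent)

# Hypothetical chain gw <- a <- b <- c; value 1 means "nest empty".
parent = {"gw": None, "a": "gw", "b": "a", "c": "b"}
values = {"gw": 0, "a": 1, "b": 0, "c": 1}
print(in_network_sum(parent, values, "gw"))  # prints (2, 3)
print(centralized_messages(parent))          # prints 6
```

On this chain the in-network plan uses one message per edge (3 in total), while centralized forwarding uses 6, and the gap widens as the tree gets deeper.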
18 Schemes of In-network Processing
[Figure: diagram with the labels View Node, Query, and Result.]
19 Approaches of In-network Processing
20 Catalog Management
- To generate plans for queries, the optimizer needs metadata about the status of the sensor network to evaluate the costs and benefits of different plans
- A catalog could be built and maintained on the server to hold information such as sensor position, density, connectivity, system workload, and network stability
  - Queries could be used to update the catalog periodically
  - Or catalogs may be assembled dynamically through gossip-style information dissemination
- Due to the size of the metadata and the dynamics of sensor networks, it is likely prohibitive to collect all metadata at a central node and to keep it up to date
- Need to define synopsis data structures that are cheap to create and maintain, but still contain enough detail for query optimization
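One possible shape for such a synopsis, offered purely as an illustration and not taken from the Cougar work, is a coarse grid histogram of sensor density: it stores one counter per occupied grid cell instead of one position per sensor. The function `density_synopsis`, the cell size, and the coordinates below are all assumptions.

```python
from collections import Counter

def density_synopsis(positions, cell_size):
    """Map each grid cell (cx, cy) to the number of sensors located inside it.

    positions: iterable of (x, y) sensor coordinates (assumed non-negative here).
    """
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in positions
    )

# Hypothetical sensor positions in a field, summarized with 10 x 10 cells.
positions = [(5, 5), (12, 3), (14, 8), (55, 60)]
synopsis = density_synopsis(positions, cell_size=10)
print(synopsis[(1, 0)])  # prints 2
```

The synopsis can be updated by incrementing or decrementing a single cell when a sensor joins, fails, or moves, which keeps maintenance cheap at the price of losing exact positions.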
21 Multi-Query Optimization
- Multi-query optimization can achieve further energy savings
  - Identify common sub-aggregates shared among queries
  - Consider query and sensor update probabilities
    - Return results of updated queries?
    - Return values of updated sensors?
  - Determine suitable routes for sending back the results
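The idea of a common sub-aggregate can be sketched for two SUM queries over overlapping sensor sets: the sum over the overlap is computed once and reused by both. This is a minimal single-machine sketch; `evaluate_with_sharing`, the sensor ids, and the values are hypothetical.

```python
def evaluate_with_sharing(q1, q2, values):
    """Answer two SUM queries while computing the shared sub-aggregate once.

    q1, q2: sets of sensor ids covered by each query.
    Returns (result1, result2, sensor_reads).
    """
    shared = q1 & q2
    shared_sum = sum(values[s] for s in shared)  # common sub-aggregate, computed once
    result1 = shared_sum + sum(values[s] for s in q1 - shared)
    result2 = shared_sum + sum(values[s] for s in q2 - shared)
    sensor_reads = len(q1 | q2)  # each sensor value is read exactly once
    return result1, result2, sensor_reads

# Hypothetical sensor values and query footprints.
values = {"s1": 1, "s2": 2, "s3": 3, "s4": 4}
q1 = {"s1", "s2", "s3"}
q2 = {"s2", "s3", "s4"}
print(evaluate_with_sharing(q1, q2, values))  # prints (6, 9, 4)
```

Evaluating the two queries independently would read 6 sensor values here instead of 4; in a real network the saving would show up as fewer transmitted partial results.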
22 Conclusion
- View sensor networks as databases
- Declarative queries
- In-network query processing
23 Discussion
- Decentralized query execution/optimization
  - A single site cannot maintain precise meta-information about the complete system
- Adaptive query processing
  - Conditions in a sensor network change over time
  - Sensors fail, move, or disconnect
24 Thank You