Title: Data Quality and Query Cost in Wireless Sensor Networks
- David Yates, Erich Nahum, Jim Kurose, and Prashant Shenoy
- IEEE PerCom 2008
2 Papers
- Data Quality and Query Cost in Wireless Sensor Networks, IEEE PerSeNS 2007
- with analysis of performance trends:
- Data Quality and Query Cost in Wireless Sensor Networks, IEEE PerCom 2008
3 Outline
- Introduction
- Caching and Lookup Policies
- Data Quality and Query Cost
- Discussion of Results
- Performance Trends
- when value deviation is most important
- when end-to-end delay is most important
- Conclusion
4 Introduction (1/4)
- Data-centric WSNs
- Environmental and infrastructure monitoring
- Commercial and industrial sensing
- Performance Metrics
- accuracy
- total system end-to-end delay
- the quality of the data provided to sensor network applications
5 Introduction (2/4)
Figure: Sensor Network Deployment Example (Sensor Field; Data server / Gateway (and cache); Routers and switches; Monitoring and control center)
What if the gateway is augmented with storage?
Data Acquisition and Caching
6 Introduction (3/4)
Data Server or Gateway with a Cache
cache hit vs. cache miss
7 Introduction (4/4)
- system delay
- the time between a query arriving and the corresponding reply departing
- zero for a cache hit
- value deviation
- the unsigned difference between the returned data value and the true value at location i
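The two metrics above can be sketched in a few lines of Python. This is only an illustration: the `Reply` record and its field names are hypothetical and do not come from the slides, and timestamps are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    """Hypothetical record of one answered query (field names are assumptions)."""
    arrival_time: float    # when the query arrived, in seconds
    departure_time: float  # when the reply departed, in seconds
    value: float           # data value returned to the application
    cache_hit: bool        # True if the reply was served from the cache


def system_delay(reply: Reply) -> float:
    """Time between query arrival and reply departure; zero for a cache hit."""
    if reply.cache_hit:
        return 0.0
    return reply.departure_time - reply.arrival_time


def value_deviation(reply: Reply, true_value: float) -> float:
    """Unsigned difference between the returned value and the true value."""
    return abs(reply.value - true_value)
```

A cache hit trades a possibly stale value (larger value deviation) for zero system delay, which is the tension the rest of the talk measures.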
8 Caching and Lookup Policies: Precise Policies and Approximate Policies
- age threshold parameter
- All hits: cache entries are never deleted, updated, or replaced
- All misses
- Precise Policies: Simple lookups, Piggybacked queries
- Greedy Policies: Greedy age lookups, Greedy distance lookups, Median-of-3 lookups
- Diagram labels: Full, Not Available, Spatial Locality, Cache Utilization
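The slide names the policies but does not spell out their selection rules, so the following Python sketch is only one plausible reading. The cache layout, the function names, and the rules for the greedy distance and median-of-3 lookups (nearest fresh entry, and median of the three nearest fresh entries) are all assumptions made for illustration, not the authors' definitions.

```python
import math
from statistics import median

# Assumed cache layout: location id -> {"value": float, "time": float, "pos": (x, y)}


def fresh_entries(cache, now, age_threshold):
    """Entries whose age does not exceed the age threshold parameter."""
    return [e for e in cache.values() if now - e["time"] <= age_threshold]


def simple_lookup(cache, loc, now, age_threshold):
    """Precise lookup: hit only on a fresh entry for the queried location."""
    entry = cache.get(loc)
    if entry is not None and now - entry["time"] <= age_threshold:
        return entry["value"]
    return None  # cache miss: the query would go to the sensor field


def greedy_distance_lookup(cache, pos, now, age_threshold):
    """Approximate lookup (assumed rule): value of the nearest fresh entry."""
    fresh = fresh_entries(cache, now, age_threshold)
    if not fresh:
        return None
    return min(fresh, key=lambda e: math.dist(e["pos"], pos))["value"]


def median_of_3_lookup(cache, pos, now, age_threshold):
    """Approximate lookup (assumed rule): median of the three nearest fresh entries."""
    fresh = sorted(fresh_entries(cache, now, age_threshold),
                   key=lambda e: math.dist(e["pos"], pos))
    if len(fresh) < 3:
        return None
    return median(e["value"] for e in fresh[:3])
```

Under these assumptions, the approximate lookups exploit spatial locality: they can answer from a neighbor's cached value when the queried location itself has no fresh entry.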
9 Data Quality and Query Cost: Quality Measurement
- Data Quality
- linear combination of normalized system delay and normalized value deviation
- relative importance: A
- Softmax normalization
- Z-score normalization
- Small values indicate better data quality!
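The quality measure can be sketched as follows. The formula itself did not survive on the slide, so this is a reconstruction under stated assumptions: the weight `a` multiplies the delay term (inferred from the later slides, where A = 0.9 is labeled "delay is more important" and A = 0.1 "value deviation is more important"), and softmax normalization is taken in one common form, the logistic function applied to the z-score. Function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical
    return [(v - mu) / sigma for v in values]


def softmax_normalize(values):
    """Softmax normalization (assumed form): logistic of the z-score, in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in z_scores(values)]


def data_quality(delays, deviations, a):
    """Assumed combination: a * normalized delay + (1 - a) * normalized deviation.

    Smaller results indicate better data quality.
    """
    norm_delay = softmax_normalize(delays)
    norm_dev = softmax_normalize(deviations)
    return [a * d + (1.0 - a) * v for d, v in zip(norm_delay, norm_dev)]
```

With `a` close to 1 the score is dominated by delay, so cache hits (zero delay) score well; with `a` close to 0 the score is dominated by value deviation, so fresh sensor readings score well.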
10 Data Quality and Query Cost: Simulated Changes to the Environment (1/2)
- 3-dimensional sensor field
- Rectangular planes on six faces
- sensors
- Four base stations are placed on the X-Y plane
- These base stations are connected to the gateway server that has the common cache.
- The sensors always communicate with their closest base station.
Figure: sensor field with axis labels X (8 units), Y (6 units), and Z (4 units)
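The closest-base-station rule can be sketched as a nearest-neighbor search. The base-station coordinates below are hypothetical placeholders on the X-Y plane (z = 0); the slide does not give their actual positions.

```python
import math

# Hypothetical base-station coordinates on the X-Y plane (z = 0); not from the slides.
BASE_STATIONS = [
    (2.0, 1.5, 0.0),
    (6.0, 1.5, 0.0),
    (2.0, 4.5, 0.0),
    (6.0, 4.5, 0.0),
]


def closest_base_station(sensor, stations=BASE_STATIONS):
    """Return the index of the base station nearest to a sensor (Euclidean distance)."""
    return min(range(len(stations)), key=lambda i: math.dist(sensor, stations[i]))
```

For example, a sensor at `(1.0, 1.0, 3.0)` maps to station 0 under these placeholder coordinates.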
11 Data Quality and Query Cost: Simulated Changes to the Environment (2/2)
- One-way communication to and from
- Formula annotations (shown twice): normalization constant, distance
- minimum cost to query a location: 2 units (query and reply)
- maximum delay to query a location: 2 seconds
12 Data Quality and Query Cost: Trace-driven Changes to the Environment
- Intel Lab Dataset
- 2-dimensional field
- 54 Mica2Dot sensors
- light intensity: the most dynamically changing of the sensor values
- Assume the sensors always communicate with their closest base station.
Figure: Sensor Field, Intel Berkeley Research Lab
13 Data Quality and Query Cost: Query Workload Model (1/2)
- Query Workload Model
- periodic arrival process
- random arrival process
- The superposition of two query processes
- polling component
- slowly scans the sensor field at fixed rate
- the period of the polling component of the query workload
- random component
- queries to different locations in the sensor field
- average query arrival rate of the random component
14 Data Quality and Query Cost: Query Workload Model (2/2)
- Simulated changes to the environment
- exponentially distributed inter-arrival times
- 90 queries per second
- Trace-driven changes to the environment
- 0.9 queries per second
- Figure annotations: 9 queries/second, 0.09 queries/second
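The workload model, a superposition of a periodic polling process and a random process with exponentially distributed inter-arrival times, can be sketched as below. Several details are assumptions for illustration: the polling component is taken to complete one full scan of all locations per period, random queries pick a location uniformly at random, and all function names are hypothetical.

```python
import heapq
import random


def polling_queries(num_locations, period, duration):
    """Periodic component: scan locations in fixed order, one full scan per period (assumed)."""
    step = period / num_locations
    i = 0
    while i * step < duration:
        yield (i * step, i % num_locations, "poll")
        i += 1


def random_queries(num_locations, rate, duration, rng):
    """Random component: exponential inter-arrival times with mean 1 / rate."""
    t = rng.expovariate(rate)
    while t < duration:
        # Uniform choice of location is an assumption, not stated on the slides.
        yield (t, rng.randrange(num_locations), "random")
        t += rng.expovariate(rate)


def workload(num_locations, period, rate, duration, seed=0):
    """Superpose both components into one time-ordered list of (time, location, kind)."""
    rng = random.Random(seed)
    return list(heapq.merge(
        polling_queries(num_locations, period, duration),
        random_queries(num_locations, rate, duration, rng),
    ))
```

Calling `workload(54, 90.0, 0.9, 3600.0)` would produce an hour of queries in the spirit of the trace-driven setting (54 locations, 0.9 queries per second for the random component), again under the assumptions listed above.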
15 Discussion of Results: Simulated Testing Dataset
- A. Jindal and K. Psounis
- Reference
- Modeling Spatially-correlated Sensor Network Data, SECON 2004
- Modeling Spatially Correlated Data in Sensor Networks, TOSN 2006
Download Tools
16 Discussion of Results: Query Cost vs. Data Quality Trade-off
Figure: Query Cost vs. Data Quality, A = 0.1
- Panels: Correlated changes over 1000 locations; Trace-driven changes over 54 locations
- Annotations: 0% cache hit, 100% cache hit; linear trade-off (both panels)
17 Discussion of Results: Query Cost vs. End-to-End Delay
Figure: Query Cost vs. End-to-End Delay, A = 0.1
- Panels: Correlated changes over 1000 locations; Trace-driven changes over 54 locations
- Annotated values: 1.18, 4.4
- an increase in the normalized delay term!
18 Discussion of Results: Query Cost vs. Data Quality Trade-off
Figure: Query Cost vs. Data Quality, A = 0.9
- Panels: Correlated changes over 1000 locations; Trace-driven changes over 54 locations
- Annotations: No trade-off (both panels); the best performance (both panels)
19 Discussion of Results: Hit Ratios, Query Costs, and End-to-End Delays
- Correlated changes over 1000 locations: 90 queries/second
- Trace-driven changes over 54 locations: T = 90, 0.9 queries/second
- Reported quantities: Hit ratio, Query Cost, End-to-End Delay
20 Discussion of Results: Query Cost vs. Value Deviation
Figure: Query Cost vs. Value Deviation, A = 0.1
- Panels: Correlated changes over 1000 locations; Trace-driven changes over 54 locations
- Annotation: increase the dispersion
21 Discussion of Results: Whether Delay or Value Deviation?
Figure: Query Cost vs. Data Quality, A = 0.1 (value deviation is more important than delay)
- Panels: Correlated changes over 1000 locations; Trace-driven changes over 54 locations
- Annotations: Quality is more important. Cost is at a premium.
22 Discussion of Results: Whether Delay or Value Deviation?
Figure: Query Cost vs. Data Quality, A = 0.9 (delay is more important than value deviation)
- Panels: Correlated changes over 1000 locations; Trace-driven changes over 54 locations
- Getting the fast response time of a cache hit is worthwhile!
23 Performance Trends: When Value Deviation is Most Important
Figure: Query Cost vs. Data Quality, A = 0.1 (value deviation is more important than delay)
- Panels: 9, 90, and 900 of 1000 correlated changes / sec
- Annotation: linear trade-off
- The results are robust!
24 Performance Trends: When Value Deviation is Most Important
Figure: Value Deviation vs. Data Quality, A = 0.1 (value deviation is more important than delay)
- Panels: 9, 90, and 900 of 1000 correlated changes / sec
- Annotations: strong positive correlation!; Environment Changes, Value Deviation
25 Performance Trends: When Value Deviation is Most Important
Figure: Query Cost vs. Data Quality, A = 0.1 (value deviation is more important than delay)
- Panels: trace-driven changes at 90, 9, and 0.9 queries/second
- Annotation: linear trade-off
26 Performance Trends: When Value Deviation is Most Important
Figure: Value Deviation vs. Data Quality, A = 0.1 (value deviation is more important than delay)
- Panels: trace-driven changes at 90, 9, and 0.9 queries/second
- Annotation: strong positive correlation!
27 Performance Trends: When System Delay is Most Important
Figure: Query Cost vs. Data Quality, A = 0.9 (delay is more important than value deviation)
- Panels: 9, 90, and 900 of 1000 correlated changes / sec
- Annotations: No trade-off; the best performance
- The results are robust!
28 Performance Trends: When System Delay is Most Important
Figure: End-to-End Delay vs. Data Quality, A = 0.9 (delay is more important than value deviation)
- Panels: 9, 90, and 900 of 1000 correlated changes / sec
- Annotation: strong positive correlation!
29 Performance Trends: When System Delay is Most Important
Figure: Query Cost vs. Data Quality, A = 0.9 (delay is more important than value deviation)
- Panels: trace-driven changes at 90, 9, and 0.9 queries/second
- Annotation: the best performance
30 Conclusion
- We measure the benefit and cost of seven different caching and lookup policies.
- when delay drives data quality
- when value deviation drives data quality
- Query Cost vs. Data Quality
- linear trade-off
- cost vs. accuracy and/or cost vs. delay are also linear
- The performance trends for query cost and data quality generally remain the same as the environment changes.