Quality Aware Sensor Database (QUASAR) Project
- Sharad Mehrotra
- Department of Information and Computer Science
- University of California, Irvine
Supported in part by a collaborative NSF ITR grant entitled "Real-time data capture, analysis, and querying of dynamic spatio-temporal events," in collaboration with UCLA, U. Maryland, and U. Chicago
Talk Outline
- Quasar Project
- motivation and background
- data collection and archival components
- query processing
- tracking application using QUASAR framework
- challenges and ongoing work
- Brief overview of other research projects
- MARS Project - incorporating similarity retrieval and refinement over structured and semi-structured data to aid interactive data analysis/mining
- Database as a Service (DAS) Project - supporting the application service provider model for data management
Emerging Computing Infrastructure
- Generational advances to computing infrastructure
- sensors will be everywhere
- Emerging applications with limitless possibilities
- real-time monitoring and control, analysis
- New challenges
- limited bandwidth and energy
- highly dynamic systems
- System architectures are due for an overhaul
- at all levels of the system: OS, middleware, databases, applications
(Figure: in-body, in-cell, in-vitro spaces; instrumented wide-area spaces)
Impact on Data Management
- Traditional data management
- client-server architecture
- efficient approaches to data storage and querying
- query shipping versus data shipping
- data changes with explicit updates
- Emerging challenge
- data producers must be considered as first-class entities
- sensors generate continuously changing, highly dynamic data
- sensors may store, process, and communicate data
Data Management Architecture Issues
(Figure: producer cache and server exchanging data/query requests and results)
- Where to store data?
- Do not store -- stream model
- not suitable if we wish to archive data for future analysis or if data is too important to lose
- at the producers
- limited storage, network, and compute resources
- at the servers
- server may not be able to cope with high data production rates; may lead to data staleness and/or wasted resources
- Where to compute?
- at the client, server, or data producers
Quasar Architecture
- Hierarchical architecture
- data flows from producers to server to clients periodically
- queries flow the other way
- If the client cache does not suffice, then
- the query is routed to the appropriate server
- If the server cache does not suffice, then access current data at the producer
- This is a logical architecture -- producers could also be clients.
(Figure: query flow from client cache to server cache/archive; data flow upward from producer cache)
Quasar Observations and Approach
- Applications can tolerate errors in sensor data
- applications may not require exact answers
- small errors in location during tracking, or in the answer to a query, may be OK
- data cannot be precise due to measurement errors, transmission delays, etc.
- Communication is the dominant cost
- limited wireless bandwidth; a source of major energy drain
- Quasar approach
- exploit application error tolerance to reduce communication between producer and server
- Two approaches
- minimize resource usage given quality constraints
- maximize quality given resource constraints
Quality-based Data Collection Problem
(Figure: sensor time series pn, pn−1, …, p1 flowing from producer to server)
- Let P = ⟨p1, p2, …, pn⟩ be a sequence of environmental measurements (a time series) generated by the producer, where n = now
- Let S = ⟨s1, s2, …, sn⟩ be the server-side representation of the sequence
- A within-ε quality data collection protocol guarantees that
- for all i, error(pi, si) < ε
- ε is derived from the application's quality tolerance
Simple Data Collection Protocol
- Sensor logic (at time step n)
- let p = last value sent to server
- if error(pn, p) > ε
- send pn to server
- Server logic (at time step n)
- if a new update pn was received at step n
- sn = pn
- else
- sn = last update sent by sensor
- guarantees that the maximum error at the server is less than or equal to ε
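As a concrete illustration, the protocol above can be sketched as follows (class names and the callback are illustrative, not from the QUASAR implementation; abs() stands in for the slide's error() function):

```python
class Sensor:
    """Producer side: transmit only when the last-sent value drifts beyond eps."""
    def __init__(self, eps, send):
        self.eps = eps          # error tolerance epsilon
        self.last_sent = None   # last value shipped to the server
        self.send = send        # callback delivering an update to the server

    def observe(self, p):
        # Send p only if it deviates from the last sent value by more than eps.
        if self.last_sent is None or abs(p - self.last_sent) > self.eps:
            self.last_sent = p
            self.send(p)

class Server:
    """Server side: repeat the last received update when no message arrives."""
    def __init__(self):
        self.value = None

    def receive(self, p):
        self.value = p

    def estimate(self):
        return self.value
```

Because the sensor suppresses any update within eps of the last transmission, the server's estimate is always within eps of the true reading.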
Exploiting Prediction Models
- Producer and server agree upon a prediction model (M, θ)
- Let spred_i be the predicted value at time i based on (M, θ)
- Sensor logic (at time step n)
- if error(pn, spred_n) > ε
- send pn to server
- Server logic (at time step n)
- if a new update pn was received at step n
- sn = pn
- else
- sn = spred_n based on model (M, θ)
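A minimal sketch of this model-based variant, using a shared linear predictor as an illustrative stand-in for an agreed (M, θ):

```python
# Sensor and server share the same prediction function; the sensor transmits
# only on model violation, and the server otherwise uses the prediction.
def make_model(base_value, slope, base_step):
    # Illustrative linear model: spred(n) = base_value + slope * (n - base_step)
    return lambda n: base_value + slope * (n - base_step)

eps = 1.0
model = make_model(0.0, 2.0, 0)    # shared (M, theta): predicts 2*n

sent = []                          # messages actually put on the wire
server_view = []                   # the server-side sequence S

for n, p in enumerate([0.1, 2.3, 3.9, 9.0, 8.2]):
    pred = model(n)
    if abs(p - pred) > eps:        # sensor logic: send on model violation
        sent.append((n, p))
        server_view.append(p)      # server logic: use the fresh update...
    else:
        server_view.append(pred)   # ...else fall back to the prediction
```

With a well-fitting model, most steps trigger no transmission at all, which is the communication saving the slide describes.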
Challenges in Prediction
- Simple versus complex models?
- complex, more accurate models require more parameters (that will need to be transmitted)
- the goal is to minimize communication, not necessarily to find the best prediction
- How is a model M generated?
- static -- one out of a fixed set of models
- dynamic -- dynamically learn a model from the data
- When should a model M or its parameters θ be changed?
- immediately on model violation
- too aggressive -- a violation may be a temporary phenomenon
- never changed
- too conservative -- data rarely follows a single model
Challenges in Prediction (cont.)
- Who does the model update?
- Server
- long-haul prediction models are possible, since the server maintains history
- might not predict recent behavior well, since the server does not know the exact sensor sequence -- it has only samples
- extra communication to inform the producer
- Producer
- better knowledge of recent history
- long-haul models are not feasible, since the producer does not keep the full history
- producers share the computation load
- Both
- server looks for new models; sensor performs parameter fitting given existing models.
Archiving Sensor Data
- Often sensor-based applications are built with only the real-time utility of time-series data in mind.
- Values at time instants << n are discarded.
- Archiving such data consists of maintaining the entire S sequence, or an approximation thereof.
- Importance of archiving
- discovering large-scale patterns
- once-only phenomena, e.g., earthquakes
- discovering events detected post facto by rewinding the time series
- future uses of the data that may not be known while it is being collected
Problem Formulation
- Let P = ⟨p1, p2, …, pn⟩ be the sensor time series
- Let S = ⟨s1, s2, …, sn⟩ be the server-side representation
- A within-ε_archive quality data archival protocol guarantees that
- error(pi, si) < ε_archive
- Trivial solution: modify the collection protocol to collect data at a quality guarantee of min(ε_archive, ε_collect)
- then the prediction model by itself will provide a within-ε_archive data stream that can be archived
- Better solutions are possible, since
- archived data is not needed for immediate access by real-time or forecasting applications (such as monitoring, tracking)
- compression can be used to reduce data transfer
Data Archival Protocol
(Figure: the sensor's memory buffer holds pn, pn−1, …; sensor updates flow to the server for data collection, and a compressed representation flows separately for archiving. Processing at the sensor is exploited to reduce communication cost and hence battery drain.)
- The sensor compresses the observed time series p[1..n] and sends a lossy compression to the server
- At time n
- p[1..n−nlag] is at the server in compressed form: s[1..n−nlag] is within-ε_archive
- s[n−nlag+1..n] is estimated via a predictive model (M, θ)
- the collection protocol guarantees that this remains within-ε_collect
- s[n+1..] can be predicted, but its quality is not guaranteed (these values are in the future, so the sensor has not observed them)
Piecewise Constant Approximation (PCA)
- Given a time series Sn = s[1..n], a piecewise constant approximation of it is a sequence
- PCA(Sn) = ⟨(c1, e1), …, (ck, ek)⟩
- that allows us to estimate sj as
- ŝj = ci if j is in [e(i−1)+1, ei]
- ŝj = c1 if j ≤ e1
(Figure: value-versus-time plot of constant segments c1, c2, c3, c4 ending at times e1, e2, e3, e4)
Online Compression using PCA
- Goal: given a stream of sensor values, generate a within-ε_archive PCA representation of the time series
- Approach (PMC-midrange)
- maintain m, M as the minimum/maximum values of the samples observed since the last segment
- on processing pn, update m and M if needed
- if M − m > 2·ε_archive, output the segment ((m + M)/2, n)
(Figure: example run with ε_archive = 1.5 over time steps 1-5)
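The PMC-midrange pass above can be sketched as a one-pass function (an illustrative rendering, not the project's code):

```python
def pmc_mr(samples, eps):
    """One-pass PMC-midrange: emit (midrange, end_index) segments such that
    every sample in a segment is within eps of the segment's midrange.
    Only O(1) state (the running min/max) is kept between samples."""
    segments = []
    m = M = None                      # running min / max of the open segment
    for i, p in enumerate(samples):
        lo = p if m is None else min(m, p)
        hi = p if M is None else max(M, p)
        if hi - lo > 2 * eps:         # p cannot join: close the segment at i-1
            segments.append(((m + M) / 2, i - 1))
            m, M = p, p               # start a fresh segment with p
        else:
            m, M = lo, hi
    if m is not None:                 # flush the final open segment
        segments.append(((m + M) / 2, len(samples) - 1))
    return segments
```

A segment is closed exactly when the incoming sample would stretch the spread past 2·eps, so the midrange of the closed segment is within eps of every sample it covers.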
Online Compression using PCA (cont.)
- PMC-MR
- guarantees that each segment compresses the corresponding time-series segment to within-ε_archive
- requires O(1) storage
- is instance optimal
- no other PCA representation with fewer segments can meet the within-ε_archive constraint
- Variant of PMC-MR
- PMC-MEAN, which takes the mean of the samples seen so far instead of the midrange.
Improving PMC using Prediction
- Observation: prediction models guarantee a within-ε_collect version of the time series at the server even before the compressed time series arrives from the producer.
- Can the prediction model be exploited to reduce the overhead of compression?
- If ε_archive > ε_collect, no additional effort is required for archival --> simply archive the predicted model.
- Approach
- define an error time series Ei = pi − spred_i
- compress E[1..n] to within-ε_archive instead of compressing p[1..n]
- the archive contains the prediction parameters and the compressed error time series
- the within-ε_archive version of E, together with (M, θ), can be used to reconstruct a within-ε_archive version of P
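Server-side reconstruction can be sketched as follows, assuming a hypothetical predict function and a list of (value, end_index) error segments as produced by a PMC-style compressor; adding the prediction back to a within-ε error value yields a within-ε sample value:

```python
def reconstruct(predict, err_segments, n):
    """Rebuild samples 0..n-1 from the archived prediction model and the
    compressed error series. err_segments is a list of (value, end_index)
    pairs covering the whole index range in order."""
    out = []
    seg = iter(err_segments)
    val, end = next(seg)
    for i in range(n):
        if i > end:                  # advance to the segment covering index i
            val, end = next(seg)
        out.append(predict(i) + val) # prediction plus archived error estimate
    return out
```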
Combining Compression and Prediction (Example)
Estimating Time Series Values
- Historical samples (before n−nlag) are maintained at the server within-ε_archive
- Recent samples (between n−nlag+1 and n) are maintained by the sensor and predicted at the server.
- If an application requires ε_q precision, then
- if ε_q ≥ ε_collect, it must wait in case a parameter refresh is en route
- if ε_q ≥ ε_archive but ε_q < ε_collect, it may probe the sensor or wait for a compressed segment
- otherwise, only probing meets the precision
- For future samples (after n), immediate probing is not available as an option
Experiments
- Data sets
- Synthetic random walk
- x1 = 0 and xi = xi−1 + si, where si is drawn uniformly from [−1, 1]
- Oceanographic buoy data
- environmental attributes (temperature, salinity, wind speed, etc.) sampled at 10-minute intervals from a buoy in the Pacific Ocean (Tropical Atmosphere Ocean Project, Pacific Marine Environmental Laboratory)
- GPS data collected using iPAQs
- Experiments to test
- compression performance of PMC
- benefits of model selection
- query accuracy over compressed data
- benefits of the prediction/compression combination
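The synthetic random-walk series above can be generated as follows (the seed is an illustrative choice for reproducibility):

```python
import random

def random_walk(n, seed=0):
    """Generate x1 = 0 and xi = x(i-1) + si, si uniform in [-1, 1]."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n - 1):
        xs.append(xs[-1] + rng.uniform(-1.0, 1.0))
    return xs
```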
Compression Performance
K/n ratio = number of segments / number of samples
Query Performance over Compressed Data
How many sensors have values > v? (mean selectivity 50%)
Impact of Model Selection
- Objects moved at approximately constant speed (plus measurement noise)
- Three models used
- loc_n = c
- loc_n = c + v·t
- loc_n = c + v·t + 0.5·a·t²
- Parameters v, a were estimated at the sensor over a moving window of 5 samples
K/n ratio = number of segments / number of samples.
ε_pred is the localization tolerance in meters
Combining Prediction with Compression
K/n ratio = number of segments / number of samples
GPS Mobility Data from Mobile Clients (iPAQs)
QUASAR client time series
- Latitude time series: 1800 samples
- Compressed time series (PMC-MR, ICDE 2003): accuracy of 100 m, 130 segments
Query Processing in Quasar
- Problem definition
- Given
- sensor time series with quality guarantees captured at the server
- a query with a specified quality tolerance
- Return
- query results incurring the least cost
- Techniques depend upon
- the nature of the queries
- cost measures
- resource consumption -- energy, communication, I/O
- query response time
Aggregate Queries
Example (sensor values 2, 7, 6): min(Q) = 2, max(Q) = 7, count(Q) = 3, sum(Q) = 2 + 7 + 6 = 15, avg(Q) = 15/3 = 5
Processing Aggregate Queries (minimizing producer probes)
- Let S = ⟨s1, s2, …, sn⟩ be the set of sensors that meet the query criteria, where si.high = spred_i(t) + ε_pred_i and si.low = spred_i(t) − ε_pred_i
- MIN query
- c = min_j(sj.high)
- b = c − ε_query
- probe all sensors where sj.low < b
- in the figure, only s1 and s3 will be probed
- SUM query
- select a minimal subset S' ⊆ S such that
- (sum of ε_pred_i over si in S') ≥ (sum of ε_pred_i over si in S) − ε_query
- if ε_query = 15, only s1 will be probed
(Figure: sensor intervals s1..s5 with cutoff values a, b, c and per-sensor tolerances)
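The MIN-query probe selection can be sketched as follows, representing each sensor by its server-side interval [low, high] (names are illustrative):

```python
def min_query_probes(intervals, eps_query):
    """Return the indices of sensors that must be probed to answer a MIN
    query within eps_query. intervals is a list of (low, high) bounds on
    each sensor's current value as known at the server."""
    c = min(high for low, high in intervals)  # the true MIN is certainly <= c
    b = c - eps_query
    # Any sensor whose lower bound is below b could still change the answer
    # by more than the tolerance, so it must be probed.
    return [i for i, (low, high) in enumerate(intervals) if low < b]
```

Sensors whose whole interval lies above b cannot pull the minimum down by more than ε_query, so skipping them is safe.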
Minimizing Cost at the Server
- The error tolerance of queries can be exploited to reduce processing at the server.
- Key idea
- use a multi-resolution index structure (MRA-tree) for processing aggregate queries at the server.
- An MRA-tree is a modified multi-dimensional index tree (R-tree, quadtree, hybrid tree, etc.)
- a non-leaf node contains (for each of its subtrees) four aggregates: MIN, MAX, COUNT, SUM
- a leaf node contains the actual data points (sensor models)
MRA-Tree Data Structure
(Figure: spatial view and tree-structure view of nodes A-G)
MRA-Tree Node Structure
(Figure: non-leaf node with probe pointers, each costing 2 messages; leaf node with disk-page pointers, each costing 1 I/O)
Node Classification
- Two sets of nodes
- NP (partial contribution to the query)
- NC (complete contribution)
Aggregate Queries using the MRA-Tree
- Initialize NP with the root
- At each iteration, remove one node N from NP and, for each child Nchild of N:
- discard, if Nchild is disjoint from Q
- insert into NP if Q is contained in or partially overlaps Nchild
- insert into NC if Q contains Nchild (we only need to maintain agg(NC))
- compute the best estimate based on the contents of NP and NC
(Figure: query region Q overlapping node N)
MIN (and MAX)
Example: min(NC) = min{4, 5} = 4 and min(NP) = min{3, 9} = 3, so
L = min(min(NC), min(NP)) = 3 and H = min(NC) = 4; hence the interval I = [3, 4]
Estimate (lower bound): E(min(Q)) = L = 3
(Figure: tree nodes with subtree minima 3, 4, 5, 9)
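The interval computation in this example can be sketched directly (nc_mins and np_mins are the subtree minima of the NC and NP nodes):

```python
def min_interval(nc_mins, np_mins):
    """Bound the true MIN of the query region from the MRA-tree aggregates.
    NC nodes lie entirely inside the query; NP nodes only partially overlap it."""
    L = min(nc_mins + np_mins)  # optimistic: every NP minimum might be inside Q
    H = min(nc_mins)            # pessimistic: no NP point falls inside Q
    return L, H                 # the true min(Q) lies in [L, H]
```

Traversal then refines this interval by expanding NP nodes until [L, H] is narrow enough for the query's tolerance.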
MRA-Tree Traversal
- Progressive answer refinement until NP is exhausted
- Greedy, priority-based local decision for the next node to explore, based on
- cost (1 I/O or 2 messages)
- benefit (expected reduction in answer uncertainty)
(Figure: traversal over tree nodes A-G)
Adaptive Tracking of Mobile Objects in Sensor Networks
- Tracking architecture
- a network of wireless acoustic sensors arranged as a grid, transmitting via a base station to a server
- a track of the mobile object is generated at the base station or server
- Objective
- track a mobile object at the server such that the track deviates from the real trajectory within a user-defined error threshold ε_track, with minimum communication overhead.
Sensor Model
- Wireless sensors: battery operated, energy constrained
- Operate on the received acoustic waveforms
- Signal attenuation of the target object is given by Is(t) = P / (4π r²)
- P = source object power
- r = distance of the object from the sensor
- Is(t) = intensity reading at time t at the i-th sensor
- Ith = intensity threshold at the i-th sensor
Sensor States
- S0 Monitor (processor on, sensor on, radio off)
- shift to S1 if intensity is above threshold
- S1 Active (processor on, sensor on, radio on)
- send intensity readings to the base station
- on receiving a message from the BS containing an error tolerance, shift to S2
- S2 Quasi-active (processor on, sensor on, radio intermittent)
- send an intensity reading to the BS if the error from the previous reading exceeds the error threshold
- the Quasar collection approach is used in the quasi-active state
Server-Side Protocol
- Server maintains
- a list of sensors in the active/quasi-active state
- a history of their intensity readings over a period of time
- Server-side protocol
- convert track quality to a relative intensity error at the sensors
- send the relative intensity error to a sensor in the active state (S1), shifting it to quasi-active (S2)
- triangulate using n sensor readings at discrete time intervals.
Basic Triangulation Algorithm (using 3 sensor readings)
P = source object power, Ii = intensity reading at the i-th sensor:
(x − x1)² + (y − y1)² = P / (4π I1)
(x − x2)² + (y − y2)² = P / (4π I2)
(x − x3)² + (y − y3)² = P / (4π I3)
Solving, we get (x, y) = f(x1, x2, x3, y1, y2, y3, P, I1, I2, I3)
- More complex approaches that amalgamate more than three sensor readings are possible
- based on numerical methods -- they do not provide a closed-form equation between sensor readings and the tracked location!
- The server can use simple triangulation to convert track quality into sensor intensity quality tolerances, and a more complex approach to track.
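One way to solve this system is to subtract the equations pairwise, which cancels the quadratic terms and leaves a 2x2 linear system in (x, y); a sketch, not the project's implementation:

```python
import math

def triangulate(sensors, P):
    """Locate the source from three (xi, yi, Ii) readings, where each reading
    satisfies (x - xi)^2 + (y - yi)^2 = P / (4*pi*Ii). Subtracting equation 1
    from equations 2 and 3 gives a linear system, solved by Cramer's rule."""
    (x1, y1, I1), (x2, y2, I2), (x3, y3, I3) = sensors
    r2 = [P / (4 * math.pi * I) for I in (I1, I2, I3)]  # squared distances
    # Eq1 - Eq2:  2(x2-x1) x + 2(y2-y1) y = r1^2 - r2^2 + x2^2-x1^2 + y2^2-y1^2
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r2[0] - r2[1] + x2**2 - x1**2 + y2**2 - y1**2
    # Eq1 - Eq3, analogously
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r2[0] - r2[2] + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1          # nonzero when sensors are not collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
```

This only works when the three sensors are not collinear; with noisy intensities, a least-squares fit over more readings (the "more complex approaches" above) is preferable.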
Adaptive Tracking: Mapping Track Quality to Sensor Readings
- Claim 1 (power constant)
- let Ii be the intensity value of sensor i
- if the intensity errors satisfy the slide's bound (formula lost in extraction), then track quality is guaranteed to be within ε_track
- where C is a constant derived from the known locations of the sensors and the power of the object
- Claim 2 (power varies between Pmin and Pmax)
- if the corresponding bound holds, then track quality is guaranteed to be within ε_track, where C' = C/P² is a constant
- The above constraint is a conservative estimate; better bounds are possible
(Figure: intensity traces I2, I3 over [ti, t(i+1)] with tolerances ΔI2, ΔI3, and the resulting ε_track region in the X-Y plane, in meters)
Adaptive Tracking: Prediction to Improve Performance
- Communication overhead is further reduced by exploiting the predictability of the object being tracked
- Static prediction: sensor and server agree on a set of prediction models
- only two models used: stationary and constant velocity
- Who predicts: a sensor-based mobility prediction protocol
- every sensor by default follows the stationary model
- based on the history of its readings, it may change to the constant-velocity model (the number of readings is limited by the sensor memory size)
- it informs the server of the model switch
Actual Track versus Track under Adaptive Tracking (error tolerance 20 m)
- A restricted random motion: the object starts at (0, d) and moves from one node to another randomly chosen node until it walks out of the grid.
Energy Savings due to Adaptive Tracking
- total energy consumption over all sensor nodes for the random mobility model with varying ε_track (track error)
- significant energy savings using the adaptive precision protocol over non-adaptive tracking (the constant line in the graph)
- for a random model, prediction does not work well!
Energy Consumption with Distance from the BS
- total energy consumption over all sensor nodes for the random mobility model with varying base-station distance from the sensor grid
- as the base station moves away, one can expect energy consumption to increase, since transmission cost varies as d^n (n ≥ 2)
- the adaptive precision algorithm gives better results with increasing base-station distance
Challenges and Ongoing Work
- Ongoing work
- supporting a larger class of SQL queries
- supporting continuous monitoring queries
- a larger class of sensors (e.g., video sensors)
- better approaches to model fitting/switching in prediction
- In the future
- distributed Quasar architecture
- optimizing quality given resource constraints
- supporting applications with real-time constraints
- dealing with failures
The DAS Project
- Goals
- support Database as a Service on the Internet
- Collaboration
- IBM (Dr. Bala Iyer)
- UCI (Gene Tsudik)
Supported in part by an NSF ITR grant entitled "Privacy in Database as a Service" and by the IBM Corporation
Software as a Service
- Get
- what you need
- when you need it
- Pay for
- what you use
- Don't worry about
- how to deploy, implement, maintain, or upgrade
Software as a Service: Why?
- Driving forces
- faster, cheaper, more accessible networks
- virtualization in server and storage technologies
- established e-business infrastructures
- Already in the market
- ERP and CRM (many examples)
- more horizontal storage services, disaster-recovery services, e-mail services, rent-a-spreadsheet services, etc.
- Sun ONE, Oracle Online Services, Microsoft .NET My Services, etc.
- Advantages
- reduced cost to the client
- pay for what you use, not for the hardware, software infrastructure, or personnel to deploy, maintain, and upgrade
- reduced overall cost
- cost amortization across users
- better service
- leveraging experts across organizations
Better service, for cheaper
Database as a Service
- Why?
- most organizations need DBMSs
- DBMSs are extremely complex to deploy, set up, and maintain
- they require skilled DBAs, at high cost
What Do We Want to Do?
(Figure: client connecting to a server at an application service provider)
- Database as a Service (DAS) model
- DB management is transferred to the service provider for
- backup, administration, restoration, space management, upgrades, etc.
- use the database as a service provided by an ASP
- use the SW, HW, and human resources of the ASP instead of your own
BUT...
Challenges
- Economic/business model?
- how to charge for the service, what kinds of service guarantees can be offered, costing of guarantees, liability of the service provider
- Powerful interfaces to support a complete application development environment
- user interface for SQL, support for embedded SQL programming, support for user-defined interfaces, etc.
- Scalability in the web environment
- overheads due to network latency (data proxies?)
- Privacy and security
- protecting data at the service provider from intruders and attacks
- protecting clients from misuse of data by the service provider
- ensuring result integrity
Data Privacy from the Service Provider
(Figure: user data stored as an encrypted database at an untrusted application service provider's server site)
- The problem: we do not trust the service provider with sensitive information!
- Fact 1: theft of intellectual property due to database vulnerabilities costs American businesses $103 billion annually
- Fact 2: 45% of those attacks are conducted by insiders! (CSI/FBI Computer Crime and Security Survey, 2001)
- encrypt the data and store it
- but still be able to run queries over the encrypted data
- do most of the work at the server
System Architecture
(Figure: the client site translates the original query into a server-side query over encrypted data; the service provider returns encrypted results, which the client decrypts and filters into the actual results)
NetDB2 Service
- Developed in collaboration with IBM
- Deployed on the Internet about 2 years ago
- Used by 15 universities and more than 2500 students to help teach database classes
- Currently offered through the IBM Scholars Program
MARS Project
- Goals: integration of similarity retrieval and query refinement over structured and semi-structured databases to help interactive data analysis/mining
Supported in part by an NSF CAREER award, an NSF grant entitled "Learning digital behavior," and a KDD grant entitled "Mining events and entities over large spatio-temporal data sets"
Similarity Search in Databases (SR)
Example: Alice searches for "Honda sedan, inexpensive, after 1994, around LA"
Query Refinement (QR)
(Figure: refined results)
Why are SR and QR Important?
- Most queries are similarity searches
- especially in exploratory data analysis tasks (e.g., catalog search)
- users have only a partial idea of their information need
- Existing search technologies (text retrieval, SQL) do not provide appropriate support for SR and (almost) no support for QR
- users must artificially convert similarity queries into keyword searches or exact-match queries
- good mappings are difficult or not feasible
- lack of good knowledge of the underlying data or its structure
- exact match may be meaningless for certain data types (e.g., images, text, multimedia)
Similarity Access and Interactive Mining Architecture
(Figure: a search client issues queries/feedback and receives ranked results via a query session manager; a similarity query processor (similarity operators, ranking rules, scores table) runs over an ORDBMS; a refinement manager applies history-based and feedback-based refinement methods (feedback table, answer table); a query log manager/miner maintains the query log. Dashed arrows denote logging, solid arrows denote processing.)
MARS Challenges
- Learning queries from
- user interactions
- user profiles
- the past history of other users
- Efficient implementation of
- similarity queries
- refined queries
- Role of similarity queries in
- OLAP
- interactive data mining
Similarity Query Processor
- Query-Session Manager: parses the query; checks query validity; generates the schema for support tables; maintains the session registry
- Similarity Query Processor: executes the query on the ORDBMS; ranks results (e.g., can exclude already-seen tuples); logs the query (query or top-k)
- Refinement Manager: maintains a registry of query refinement policies (content/collaborative); generates the scores table; identifies and invokes intra-predicate refiners
- Query Log Manager/Miner: maintains the query log (initial-final pair, top-k results, complete trajectory); computes query-query similarity (can have multiple policies); performs query clustering
Text Search Technologies (AltaVista, Verity, Vality, Infoseek)
- Approach: convert enterprise structured data into a searchable text index.
- Strengths: supports ranked retrieval; can handle missing data, synonyms, and data-entry errors
- Limitations: cannot capture the semantics of relationships in the data; cannot capture the semantics of non-text fields (e.g., multimedia); limited support for refinement or preferences in current systems; cannot express similarity queries over structured or semi-structured data (e.g., price, location)
(Figure: Movies, Actors, and Directors tables; example query "Al Pacino acted in a movie directed by Francis Ford Coppola")
SQL-based Search Technologies (Oracle, Informix, DB2, Mercado)
- Approach: translate the similarity query into an exact SQL query.
- Strengths: supports structured as well as semi-structured data; support for arbitrary data types; scalable attribute-based lookup
- Limitations: translation is difficult or not possible; it is difficult to guess the right ranges, causing near misses; not feasible for non-numeric fields; cannot rank answers based on relevance; does not account for user preference or query refinement
Example:
select * from user_car_catalog
where model = 'Honda Accord' and year between 1993 and 1995
and dist(90210) < 50 and price < 5000