Title: Approximate querying about the Past, the Present, and the Future in Spatio-Temporal Databases
1Approximate querying about the Past, the Present,
and the Future in Spatio-Temporal Databases
- Jimeng Sun, Dimitris Papadias,
- Yufei Tao, Bin Liu
2Motivation
- Spatio-temporal databases vs. Data streams
- The monitoring applications
- Traffic supervision
- Mobile users monitoring
- Weather forecasting
- Example
- find the number of vehicles
- in the city center now
- The challenge is to provide fast query response
in a highly update-intensive environment
3Problems and methods
- Problems
- How to efficiently store/summarize the spatio-temporal information?
- How to approximately answer queries about the past, the present, and the future?
- Methods
- Adaptive multi-dimensional histogram (AMH)
- Historical synopsis
- Stochastic prediction method
4Related work
- Histograms
- Static multi-dimensional histograms
- Equi-depth, Mhist, Minskew, Genhist, SQ
- Query-adaptive multi-dimensional histograms
- STGrid, STHoles, SASH
- Other approximation methods
- DCT, Wavelet, Sketch
- Spatio-temporal databases
- Historical retrieval
- Future prediction
5Outline
- Introduction
- Problem and proposed methods
- Adaptive multi-dimensional histogram
- Historical synopsis
- Prediction model
- Experiment
- Conclusion
6Query types
[Figure: queries plotted over the location and time axes]
- Present Time (PT): current
- Historical Time (HT): past
- Future Time (FT): future
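The three query types can be told apart by comparing the query timestamp with the current time. The following is a minimal sketch of that classification; the names `QueryType` and `classify` are illustrative assumptions, not part of the original system.

```python
from enum import Enum


class QueryType(Enum):
    PT = "present"
    HT = "historical"
    FT = "future"


def classify(query_time: float, now: float) -> QueryType:
    """Classify a query timestamp relative to the current time.

    Assumption: a query refers to a single timestamp; interval queries
    would need to be split at `now` first.
    """
    if query_time < now:
        return QueryType.HT
    if query_time > now:
        return QueryType.FT
    return QueryType.PT
```

For example, with `now = 10`, a query at time 5 is an HT query and a query at time 12 is an FT query.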
7System Overview
[Figure: system architecture. Components: Historical Synopsis (AMH, Past Index) and Prediction Model. Inputs: spatio-temporal updates and queries (PT, HT, FT)]
8Histogram
- Partition the space into buckets
- Data within a bucket are summarized by the mean
- The properties of a good histogram
- Uniformity within each bucket
- Incrementally updatable
[Figure: a bad and a good bucket partitioning]
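To illustrate why uniformity within a bucket matters, here is a minimal sketch of how a range-count query could be answered from a histogram. It assumes objects are spread uniformly inside each bucket, so a bucket contributes its count scaled by the fraction of its area that the query covers. The `Bucket` class and `estimate_count` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    x_lo: float
    y_lo: float
    x_hi: float
    y_hi: float
    count: float  # number of objects currently inside the bucket

    def area(self) -> float:
        return (self.x_hi - self.x_lo) * (self.y_hi - self.y_lo)


def overlap_area(b: Bucket, qx_lo, qy_lo, qx_hi, qy_hi) -> float:
    """Area of the intersection between a bucket and the query rectangle."""
    w = min(b.x_hi, qx_hi) - max(b.x_lo, qx_lo)
    h = min(b.y_hi, qy_hi) - max(b.y_lo, qy_lo)
    return max(w, 0.0) * max(h, 0.0)


def estimate_count(buckets, qx_lo, qy_lo, qx_hi, qy_hi) -> float:
    """Estimate the number of objects in the query rectangle.

    Assumption: uniform density inside each bucket.
    """
    total = 0.0
    for b in buckets:
        total += b.count * overlap_area(b, qx_lo, qy_lo, qx_hi, qy_hi) / b.area()
    return total
```

The less uniform the data inside a bucket, the larger the error of this estimate, which is why a good histogram keeps each bucket as uniform as possible.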
9Adaptive Multi-dimensional Histogram (AMH)
- Objective: minimize $\mathrm{WVS} = \sum_i (\mathrm{area}_i \cdot \mathrm{var}_i)$ (Minskew;
Acharya, Poosala, Ramaswamy 99)
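As a worked illustration of the objective, the sketch below computes WVS for a set of buckets. It assumes the space is divided into grid cells, that each bucket is given as the list of its cell frequencies, and that a bucket's area is its number of cells; these representation choices are assumptions made for the example.

```python
from statistics import pvariance


def wvs(buckets):
    """Weighted variance sum: sum over buckets of area_i * var_i.

    Assumption: each bucket is a non-empty list of grid-cell frequencies,
    its area is the number of cells, and var_i is the population variance
    of those frequencies.
    """
    return sum(len(cells) * pvariance(cells) for cells in buckets)
```

A single bucket with cell frequencies `[1, 1, 9, 9]` has variance 16 and area 4, so its WVS is 64. Splitting it into `[1, 1]` and `[9, 9]` makes both buckets perfectly uniform and brings WVS down to 0.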
10Dynamic Maintenance of AMH
- Our scheme records the information during
construction and modifies the structure as needed.
- 1. information update
- Update the bucket count
- 2. bucket reorganization
- Merge to reclaim buckets
- Split to reduce WVS
11Information update of AMH
[Figure: buckets b1-b6 in the data space and their mapping to the BPT with nodes n1-n5]
12Bucket reorganization -Merge
- Merge the subtree that leads to minimal WVS
increase
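The merge rule can be sketched as a search over the tree for the pair of sibling leaves whose union raises WVS the least. The sketch below assumes a binary tree whose leaves are buckets, that only two sibling leaves are merged at a time, and that each node stores the frequencies of the grid cells it covers. The `Node` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Optional


@dataclass
class Node:
    cells: List[float]              # frequencies of the grid cells covered
    left: Optional["Node"] = None   # assumption: internal nodes have both children
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def weighted_var(cells) -> float:
    return len(cells) * pvariance(cells)


def merge_candidates(node):
    """Yield (WVS increase, node) for every node whose children are both leaves."""
    if node.is_leaf():
        return
    if node.left.is_leaf() and node.right.is_leaf():
        increase = (weighted_var(node.cells)
                    - weighted_var(node.left.cells)
                    - weighted_var(node.right.cells))
        yield increase, node
    else:
        yield from merge_candidates(node.left)
        yield from merge_candidates(node.right)


def merge_best(root):
    """Merge the sibling leaves with minimal WVS increase; return that increase."""
    candidates = list(merge_candidates(root))
    if not candidates:
        return None
    increase, node = min(candidates, key=lambda c: c[0])
    node.left = node.right = None   # the parent becomes a single bucket
    return increase
```

The merge frees one bucket, which a later split can spend where it reduces WVS more.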
[Figure: the BPT and the buckets before and after a merge]
- Bucket info
- 1. region [x-, x+] x [y-, y+]
- 2. frequency (count/area)
- 3. 2nd moment (for variance calculation)
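Keeping the second moment alongside the count is what lets a bucket's variance be refreshed in constant time per update. The sketch below shows one way this could work, assuming the bucket covers a fixed number of grid cells and that the caller knows the old frequency of the cell being changed. The `BucketStats` class is a hypothetical illustration, not the original data structure.

```python
class BucketStats:
    """Running statistics of one bucket over its grid cells (illustrative)."""

    def __init__(self, num_cells: int):
        self.num_cells = num_cells  # bucket area, measured in grid cells
        self.total = 0              # sum of cell frequencies (object count)
        self.sum_sq = 0             # sum of squared cell frequencies

    def apply(self, old_freq: int, delta: int) -> int:
        """Change one cell's frequency by `delta`; return the new frequency."""
        new_freq = old_freq + delta
        self.total += delta
        self.sum_sq += new_freq ** 2 - old_freq ** 2
        return new_freq

    def frequency(self) -> float:
        return self.total / self.num_cells

    def variance(self) -> float:
        # var = E[f^2] - (E[f])^2 over the cells of the bucket
        return self.sum_sq / self.num_cells - self.frequency() ** 2
```

An object moving from one cell to another inside the same bucket is two `apply` calls, one with `delta = -1` and one with `delta = +1`; no cell other than those two needs to be visited.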
13Bucket reorganization -Split
- Split the bucket that leads to maximal WVS
decrease
[Figure: the BPT before and after a split]
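The split rule can be sketched as an exhaustive search over axis-parallel cuts. The example below assumes a 2D grid of cell frequencies and represents a bucket as a half-open cell range `(r0, r1, c0, c1)`; both the representation and the function names are assumptions made for illustration.

```python
from statistics import pvariance


def weighted_var(grid, r0, r1, c0, c1):
    """area * variance of the cell frequencies inside one bucket."""
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return len(cells) * pvariance(cells)


def best_split(grid, bucket):
    """Return (WVS decrease, part_a, part_b) for the best cut of one bucket,
    or None if the bucket is a single cell and cannot be split."""
    r0, r1, c0, c1 = bucket
    before = weighted_var(grid, *bucket)
    # Candidate cuts: every horizontal and every vertical grid line inside the bucket.
    cuts = [((r0, r, c0, c1), (r, r1, c0, c1)) for r in range(r0 + 1, r1)]
    cuts += [((r0, r1, c0, c), (r0, r1, c, c1)) for c in range(c0 + 1, c1)]
    best = None
    for a, b in cuts:
        decrease = before - weighted_var(grid, *a) - weighted_var(grid, *b)
        if best is None or decrease > best[0]:
            best = (decrease, a, b)
    return best


def choose_split(grid, buckets):
    """Pick, across all buckets, the split with the maximal WVS decrease."""
    options = [s for s in (best_split(grid, b) for b in buckets) if s is not None]
    return max(options, key=lambda s: s[0]) if options else None
```

On the grid `[[1, 1, 9, 9], [1, 1, 9, 9]]` with a single bucket covering everything, the best cut is the vertical line between the second and third columns, which removes all variance.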
14Features of AMH
- Bucket information is updated as new data arrive
- Bucket extents continuously adapt to changes in the data distribution
- The maintenance does not affect normal query processing
- It is interruptible at any moment
- It is performed during CPU idle time
15Outline
- Introduction
- Problem and proposed methods
- Adaptive multi-dimensional histogram
- Historical synopsis
- Prediction model
- Experiment
- Conclusion
16Historical Synopsis
- AMH maintains the current buckets.
- Past index stores the obsolete buckets.
- Past index
- Packed B-tree
- 3D R-tree
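As a rough illustration of the past index, the sketch below archives obsolete buckets with their lifespans and retrieves those that were valid at a past timestamp. A sorted list with binary search stands in for the packed B-tree here; the `PastIndex` class, its record layout, and the half-open lifespan `[t_start, t_end)` are all assumptions made for this example.

```python
import bisect


class PastIndex:
    """Append-only store of obsolete buckets, ordered by end time (illustrative)."""

    def __init__(self):
        self._ends = []     # end timestamps, non-decreasing
        self._records = []  # (t_start, t_end, region, frequency)

    def archive(self, t_start, t_end, region, frequency):
        """Store a bucket that was valid during [t_start, t_end)."""
        if self._ends and t_end < self._ends[-1]:
            raise ValueError("buckets must be archived in order of end time")
        self._ends.append(t_end)
        self._records.append((t_start, t_end, region, frequency))

    def alive_at(self, t):
        """Return the archived buckets whose lifespan contains timestamp t."""
        first = bisect.bisect_right(self._ends, t)  # skip buckets with t_end <= t
        return [rec for rec in self._records[first:] if rec[0] <= t]
```

Because buckets become obsolete in time order, insertions only ever append, which is the property that makes a packed, append-only index attractive under heavy updates. An HT query would combine the buckets returned by `alive_at` with the usual histogram estimate.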
17Prediction Model
- Prediction based on velocity doesn't work!
- It is not realistic to assume velocity remains
constant between current time and query time
- Velocity is highly dynamic
- We suggest using only past and present
location information for prediction.
18Prediction Model (cont.)
[Figure: data flow among FT, PT, HT, Parse, results, and the Prediction Model]
- Forecast the future using any time-series
prediction method; we use AR (autoregressive)
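To make the forecasting step concrete, here is a minimal sketch of a first-order autoregressive model fitted by least squares to a series of past and present counts for a query region. The model order, the intercept term, and the function names are assumptions chosen for brevity; the slides do not state which AR order is used.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Constant predictor values: fall back to forecasting the mean.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast(series, steps):
    """Iterate the fitted model `steps` timestamps beyond the last observation."""
    c, phi = fit_ar1(series)
    x = series[-1]
    out = []
    for _ in range(steps):
        x = c + phi * x
        out.append(x)
    return out
```

In this setting the input series would come from the answers to PT and HT queries over the same region, so no velocity information is needed.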
19Outline
- Introduction
- Related work
- Problem and proposed methods
- Adaptive multi-dimensional histogram
- Historical synopsis
- Prediction model
- Experiment
- Conclusion
20Experiment settings
- Datasets
- 2.5M updates for each dataset
- spatial: 50K mobile objects from 2 spatial datasets
- road: from a spatio-temporal generator
(described in Brinkhoff 2002)
[Figure: road network; data distribution at the initial, median, and final timestamps]
21Robustness with time
- Query: qlength = 6% of the data space
- 25K queries uniformly distributed along space and time
[Figure: results for the spatial and road datasets]
22Comparison with conventional histogram
- Minskew (a static spatial histogram) is rebuilt
every 50K location updates
- tp is the ratio of the cost of AMH to
that of Minskew
- The re-organization operations of AMH are
uniformly distributed among the 50K location
updates.
[Figure: Minskew vs. AMH on the spatial and road datasets]
23The effect of update intensity
[Figure: 3D R-tree vs. B-tree by query type, on the road and spatial datasets]
- B-tree performs better at high update rates.
- R-tree provides much faster query response.
- In general, when the query/update ratio is large
(> 30), R-tree performs better.
24Conclusion
- We present a comprehensive approach for
processing queries that refer to any time in
history.
- The proposed architecture maintains
- an incremental multi-dimensional histogram
- a past index structure for storing the outdated
buckets.
- Future queries are answered by a stochastic
method that uses the recent history to predict
the future.
25Q&A
26Summary
- Historical Synopsis
- AMH: 0. goal: min(WVS); 1. info update; 2. reorganization happens when the CPU is idle
- Past Index (old buckets): 1. recent buckets in memory; 2. old buckets dumped to
the disk
- Prediction Model
- Forecast based on the present and past.
27Related work
- Static multi-dimensional histograms
- Query-adaptive multi-dimensional histograms
- Other multi-dimensional approximation methods
- Spatio-temporal prediction methods
- Spatio-temporal aggregation methods
28Evaluation over different query types
[Figure: results for the spatial and road datasets]
29Motivation (cont.)
- Spatio-temporal database (STDB) research
- historical retrieval
- future prediction
30Bucket reorganization -Split
[Figure: the BPT and the buckets before and after a split]