1
On the Effect of Trajectory Compression in
Spatio-temporal Querying

Elias Frentzos and Yannis Theodoridis
Data Management Group, University of Piraeus
http://isl.cs.unipi.gr/db

ADBIS, October 2, 2007
2
Talk Outline

Problem Statement
Background
Compressing Trajectories
Related work on Error Estimation
Estimating the Effect of Compression in ST Querying
Evaluating the Effect of Compression in ST Querying
Experimental Results
On the performance
On the quality
Conclusions and Future Work

4
Problem Statement (1)

A trajectory is the data obtained from a moving
point object and can be seen as a string in 3D
space (x, y, t)
Trajectory compression is a promising field,
since moving objects that record their position
over time produce large amounts of frequently
redundant data
Existing work on trajectory compression is mainly
driven by research advances in the fields of line
generalization and time series compression
Our interest is in lossy compression techniques,
which eliminate repeated or unnecessary
information under well-defined error bounds

5
Problem Statement (2)

The objectives of trajectory compression are:
To obtain a lasting reduction in data size
To obtain a data series that still allows various
computations at acceptable (low) complexity
To obtain a data series with known, small margins
of error, which are preferably parametrically
adjustable
Our goal is to calculate the mean error
introduced in query results over compressed
trajectory data, which is by no means a trivial
task
We argue that this mean error can be used to
decide whether the compressed data are suitable
for the user's needs
We restrict our discussion to a special type of
spatiotemporal query, the timeslice query

7
Compressing Trajectories: SED

Methods that exploit line simplification
algorithms for compressing a trajectory are based
on the so-called Synchronous Euclidean Distance
(SED)
SED is the distance between the sampled point
Pi = (xi, yi, ti) under examination and the point
on the line (Ps, Pe) where the moving object
would lie at time instance ti, supposing it were
moving along this line
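
The SED definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names Point and sed are hypothetical, and the handling of a zero-duration segment is an assumption added to avoid a division by zero.

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    """A sampled trajectory point (hypothetical helper type)."""
    x: float
    y: float
    t: float


def sed(p: Point, start: Point, end: Point) -> float:
    """Synchronous Euclidean Distance of p from the segment (start, end).

    The reference position is where the object would be at time p.t
    if it moved at constant speed from start to end.
    """
    if end.t == start.t:
        # Assumption: for a zero-duration segment, compare against start.
        return math.hypot(p.x - start.x, p.y - start.y)
    ratio = (p.t - start.t) / (end.t - start.t)
    sync_x = start.x + ratio * (end.x - start.x)
    sync_y = start.y + ratio * (end.y - start.y)
    return math.hypot(p.x - sync_x, p.y - sync_y)
```

Note that a point lying exactly on the line can still have a non-zero SED: Point(2, 0, 5) is on the segment from Point(0, 0, 0) to Point(10, 0, 10), but at time 5 the object should be at x = 5, so the SED is 3 while the perpendicular distance is 0.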

8
Compressing Trajectories: TD-TR algorithm

The TD-TR algorithm (Meratnia and de By, EDBT
2004) is a spatiotemporal extension of the
well-known top-down Douglas-Peucker algorithm,
which was originally used in cartography
The algorithm preserves directional trends in the
approximated line using a distance threshold
The TD-TR algorithm uses SED instead of the
perpendicular distance
It is a batch algorithm, since it requires the
full line at its start
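
A minimal sketch of the top-down idea described above, assuming points are (x, y, t) tuples with strictly increasing timestamps. The function names sed and td_tr are hypothetical, and this is a plain recursive Douglas-Peucker variant with SED swapped in for the perpendicular distance, not the published implementation.

```python
import math


def sed(p, start, end):
    """Synchronous Euclidean Distance of p from segment (start, end)."""
    ratio = (p[2] - start[2]) / (end[2] - start[2])
    sync_x = start[0] + ratio * (end[0] - start[0])
    sync_y = start[1] + ratio * (end[1] - start[1])
    return math.hypot(p[0] - sync_x, p[1] - sync_y)


def td_tr(points, threshold):
    """Return the points kept by top-down, SED-based simplification."""
    if len(points) <= 2:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point that deviates most from the segment.
    worst_idx, worst_sed = 0, -1.0
    for i in range(1, len(points) - 1):
        d = sed(points[i], start, end)
        if d > worst_sed:
            worst_idx, worst_sed = i, d
    if worst_sed <= threshold:
        # Every interior point is within the threshold: keep the end points.
        return [start, end]
    # Otherwise split at the worst point and simplify both halves.
    left = td_tr(points[:worst_idx + 1], threshold)
    right = td_tr(points[worst_idx:], threshold)
    return left[:-1] + right  # the split point appears once
```

The function needs the whole point list up front, which is what makes the approach a batch one. For very long trajectories an explicit stack may be preferable to recursion.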

9
Compressing Trajectories: OPW-TR algorithm

Opening window (OW) algorithms anchor the start
point of a potential segment and then attempt to
approximate the subsequent data series with
increasingly longer segments
The algorithm also preserves directional trends
in the approximated line using a distance
threshold
The OPW-TR algorithm (Meratnia and de By, EDBT
2004) also uses SED instead of the perpendicular
distance
It can be used as an online algorithm
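
The opening-window idea can be sketched as follows, again assuming (x, y, t) tuples with strictly increasing timestamps. The names sed and opw_tr are hypothetical. One design choice here is an assumption rather than something stated on the slide: when the window is violated, the segment is closed at the point just before the current window end.

```python
import math


def sed(p, start, end):
    """Synchronous Euclidean Distance of p from segment (start, end)."""
    ratio = (p[2] - start[2]) / (end[2] - start[2])
    sync_x = start[0] + ratio * (end[0] - start[0])
    sync_y = start[1] + ratio * (end[1] - start[1])
    return math.hypot(p[0] - sync_x, p[1] - sync_y)


def opw_tr(points, threshold):
    """Return the points kept by an opening-window, SED-based pass."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    anchor = 0
    end = anchor + 2
    while end < len(points):
        # Check every point inside the window against the candidate segment.
        violated = any(
            sed(points[i], points[anchor], points[end]) > threshold
            for i in range(anchor + 1, end)
        )
        if violated:
            # Assumption: close the segment just before the window end.
            anchor = end - 1
            kept.append(points[anchor])
            end = anchor + 2
        else:
            end += 1  # grow the window by one point
    kept.append(points[-1])
    return kept
```

The sketch takes a complete list for brevity, but it only ever looks at points between the anchor and the current window end, which is why this family of algorithms suits online use.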

11
Related work on Error Estimation

The only related work estimates the average
value of the Synchronous Euclidean Distance
(SED), also termed the Synchronous Error, between
an original trajectory and its approximation
There is no obvious way to use it to determine
the error introduced in query results

13
Estimating the Effect of Compression in ST
Querying: Preliminaries

Our goal is to provide closed-form formulas that
estimate the number of false hits introduced in
query results over compressed trajectory datasets
Among the query types executed against trajectory
datasets, we focus on a special type of range
query, the so-called timeslice query
Two types of errors are introduced in query
results when executing a timeslice query over a
compressed trajectory dataset

False negatives are the trajectories that
originally qualified for the query but whose
compressed counterparts were not retrieved
False positives are the compressed trajectories
retrieved by the query whose original
counterparts do not qualify for it
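
The two error types can be illustrated with a small sketch. It assumes trajectories are lists of (x, y, t) tuples with strictly increasing timestamps, positions between samples are linearly interpolated, and a timeslice query is a rectangle (xmin, ymin, xmax, ymax) evaluated at a single time t. All function names are hypothetical.

```python
from bisect import bisect_right


def position_at(traj, t):
    """Interpolate (x, y) at time t; None if t is outside the lifespan."""
    if not traj or t < traj[0][2] or t > traj[-1][2]:
        return None
    times = [p[2] for p in traj]
    i = bisect_right(times, t) - 1
    if i == len(traj) - 1:
        return (traj[i][0], traj[i][1])
    (x0, y0, t0), (x1, y1, t1) = traj[i], traj[i + 1]
    ratio = (t - t0) / (t1 - t0)
    return (x0 + ratio * (x1 - x0), y0 + ratio * (y1 - y0))


def in_window(pos, window):
    """True if pos lies inside the rectangle (xmin, ymin, xmax, ymax)."""
    if pos is None:
        return False
    xmin, ymin, xmax, ymax = window
    return xmin <= pos[0] <= xmax and ymin <= pos[1] <= ymax


def classify(original, compressed, window, t):
    """Compare the query answer on the original and compressed trajectory."""
    hit_original = in_window(position_at(original, t), window)
    hit_compressed = in_window(position_at(compressed, t), window)
    if hit_original and not hit_compressed:
        return "false negative"
    if hit_compressed and not hit_original:
        return "false positive"
    return "agree"
```

For example, with original = [(0, 0, 0), (2, 4, 2), (4, 0, 4)] and compressed = [(0, 0, 0), (4, 0, 4)], the object is at (2, 4) at time 2 but its compressed counterpart is at (2, 0), so a window around (2, 4) yields a false negative and a window around (2, 0) yields a false positive.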

14
Estimating the Effect of Compression in ST
Querying: Analysis (1)

We first calculate AvgPi,P / AvgPi,N, which is
the average probability of a single compressed
trajectory being retrieved as a false positive /
negative, over all possible timeslice query
windows with sides a × b
We then sum up these average probabilities over
all dataset trajectories to produce the global
average probability
The error introduced in the position of a
trajectory can be calculated as a function of time
15
Estimating the Effect of Compression in ST
Querying: Analysis (2)

We calculate the average probability of a
compressed trajectory Ti being retrieved as a
false positive / negative by a timeslice query
window at timestamp tj
The set of timeslice query windows that may
retrieve a compressed trajectory as a false
positive / negative at timestamp tj can be
derived geometrically
We distinguish among 4 cases, according to the
signs of the dx and dy values
Finally, by integrating the area Ai,j over all
the timestamps inside the unit space, we obtain
AvgPi,P / AvgPi,N
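
The geometric step can be illustrated with a simplified sketch. It is not the formula from the talk, which the transcript does not preserve. It assumes each a × b window is identified by one corner, ignores the borders of the unit space, and takes dx and dy as the displacement between the original and compressed positions at one timestamp. The windows that contain one position form an a × b rectangle of corner locations, and the windows that contain the other position form the same rectangle shifted by (dx, dy). The windows that contain exactly one of the two lie outside the overlap. Taking absolute values collapses the four sign cases into one expression. The name false_hit_area is hypothetical.

```python
def false_hit_area(dx, dy, a, b):
    """Area of window placements that hit one position but miss the other.

    dx, dy: displacement between original and compressed position.
    a, b:   side lengths of the timeslice query window.
    Assumption: boundary effects of the unit space are ignored.
    """
    overlap = max(0.0, a - abs(dx)) * max(0.0, b - abs(dy))
    return a * b - overlap
```

With no displacement the area is zero, and once the displacement exceeds the window size in either dimension it saturates at a * b: every window that contains one position misses the other.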

16
Estimating the Effect of Compression in ST
Querying: Analysis (3)

Summing up the average probabilities of all
trajectories and performing the necessary
calculations, we obtain
where

18
Evaluating the Effect of Compression in ST
Querying

Evaluating this formula is a costly operation,
O(n·m): it requires processing the entire
original dataset along with its compressed
counterpart
However, any compression algorithm that evaluates
SED also needs to calculate dxi,k and dyi,k at
every timestamp
As a consequence, the evaluation of the average
error in the query results can be integrated into
the compression algorithm, introducing only a
small overhead on its execution
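
The point about sharing work can be sketched as follows. The loop computes dx and dy once per sampled point and uses them both for the SED test that drives compression and for an accumulated error term. The per-timestamp term used here (window area minus the overlap of two shifted a × b rectangles) is a simplified stand-in chosen for illustration, not the formula from the talk, and the names displacement and segment_error are hypothetical.

```python
import math


def displacement(p, start, end):
    """(dx, dy) between p and its time-synchronized point on (start, end)."""
    ratio = (p[2] - start[2]) / (end[2] - start[2])
    dx = p[0] - (start[0] + ratio * (end[0] - start[0]))
    dy = p[1] - (start[1] + ratio * (end[1] - start[1]))
    return dx, dy


def segment_error(points, a, b):
    """Return (max SED, accumulated error term) for one candidate segment.

    points: (x, y, t) samples approximated by the segment joining the
            first and last point.
    a, b:   side lengths of the timeslice query window (assumed inputs).
    """
    start, end = points[0], points[-1]
    max_sed, error_sum = 0.0, 0.0
    for p in points[1:-1]:
        dx, dy = displacement(p, start, end)
        # The compression algorithm needs this value anyway.
        max_sed = max(max_sed, math.hypot(dx, dy))
        # Reusing dx and dy here costs only a few extra arithmetic operations.
        overlap = max(0.0, a - abs(dx)) * max(0.0, b - abs(dy))
        error_sum += a * b - overlap
    return max_sed, error_sum
```

A compression routine could compare the first return value against its threshold and pass the second one along, so the error estimate is produced during compression rather than in a separate pass over both datasets.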

20
Experimental Study: Settings

Datasets
One real trajectory dataset of a fleet of trucks
(273 trajectories, 112K entries)
A synthetic dataset of 2000 trajectories
generated using a network-based data generator
and the San Joaquin road network
Implementation
We implemented the TD-TR algorithm and compressed
the real and synthetic datasets, varying its
threshold
Experiments
Average overhead introduced in the TD-TR
algorithm
Average number of false positives and false
negatives in 10000 randomly distributed timeslice
queries

21
Experimental Study: On the performance

Scaling the value of the TD-TR threshold
The algorithm's execution time decreases as the
value of the TD-TR threshold increases
The overhead introduced in the algorithm's
execution is typically small (below 7%)
In absolute terms, the overhead introduced never
exceeds 0.2 milliseconds per trajectory

(Charts: Trucks dataset; Synthetic dataset)
22
Experimental Study: On the quality (1)

Scaling the value of the TD-TR threshold
The average number of false hits (negatives and
positives) grows linearly with the value of the
TD-TR compression threshold
The average error in the estimation for the
synthetic dataset is around 6%, varying between
0.2% and 14%
In the trucks dataset the average error rises to
around 10.6%, mainly due to the error introduced
at small values of the TD-TR threshold

(Charts: Trucks dataset; Synthetic dataset)
23
Experimental Study: On the quality (2)

Scaling the query size
The average number of false hits (negatives and
positives) grows sub-linearly with the size of
the query
The average error in the estimation for the
synthetic dataset is around 2.9%, varying between
0.2% and 8.7%
In the trucks dataset the average error rises to
around 7.5%

(Charts: Trucks dataset; Synthetic dataset)
24
Summary and Future Work

We provided a closed-form formula for the average
number of false negatives and false positives,
covering the case of uniformly distributed query
windows and arbitrarily distributed trajectory
data
Through an experimental study we demonstrated the
efficiency of the proposed model
We illustrated the applicability of our model
under real-life requirements: it turns out that
the estimation of the model parameters introduces
only a small overhead in the trajectory
compression algorithm
We presented the accuracy of our estimations,
with an average error of around 6%
Future work
Extension of our model to nearest neighbor and
general range queries
Applicability of our model in the case of
spatiotemporal warehouses

25
Acknowledgements

Research partially supported by
GEOPKDD (Geographic Privacy-aware Knowledge
Discovery and Delivery) project funded by the
European Community under FP6-014915 contract

26
On the Effect of Trajectory Compression in
Spatiotemporal Querying
Thank you!
