Title: On the Effect of Trajectory Compression in Spatiotemporal Querying
1 On the Effect of Trajectory Compression in Spatiotemporal Querying
- Elias Frentzos and Yannis Theodoridis
- Data Management Group, University of Piraeus
- http://isl.cs.unipi.gr/db
ADBIS, October 2, 2007
2 Talk Outline
- Problem Statement
- Background
  - Compressing Trajectories
  - Related work on Error Estimation
- Estimating the Effect of Compression in ST Querying
- Evaluating the Effect of Compression in ST Querying
- Experimental Results
  - On the performance
  - On the quality
- Conclusions and Future Work
4 Problem Statement (1)
- A trajectory is the data obtained from a moving point object and can be seen as a string in 3D space.
- Trajectory compression is a very promising field, since moving objects recording their position over time produce large amounts of frequently redundant data.
- Existing work on trajectory compression is mainly driven by research advances in the fields of line generalization and time series compression.
- Our interest is in lossy compression techniques, which eliminate some repeated or unnecessary information under well-defined error bounds.
5 Problem Statement (2)
- The objectives for trajectory compression are
  - To obtain a lasting reduction in data size
  - To obtain a data series that still allows various computations at acceptable (low) complexity
  - To obtain a data series with known, small margins of error, which are preferably parametrically adjustable
- Our goal is to calculate the mean error introduced in query results over compressed trajectory data, which is by no means a trivial task.
- We argue that this mean error can be used for deciding whether the compressed data are suitable for the user's needs.
- We restrict our discussion to a special type of spatiotemporal query, the timeslice query.
7 Compressing Trajectories: SED
- Methods that exploit line simplification algorithms for compressing a trajectory are based on the so-called Synchronous Euclidean Distance (SED).
- SED is the distance between the sampled point Pi (xi, yi, ti) under examination and the point on the line (Ps, Pe) where the moving object would lie at time instance ti, supposing it was moving along this line.
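The SED definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming points are plain `(x, y, t)` tuples, constant-speed motion along the segment, and `ts <= t <= te`; the function name `sed` is our own choice, not something taken from the slides.

```python
from math import hypot

def sed(p, ps, pe):
    """Synchronous Euclidean Distance of sample p from segment (ps, pe).

    All points are (x, y, t) tuples; p's timestamp is assumed to lie
    between the timestamps of ps and pe.
    """
    x, y, t = p
    xs, ys, ts = ps
    xe, ye, te = pe
    if te == ts:                      # degenerate segment: no time elapsed
        return hypot(x - xs, y - ys)
    r = (t - ts) / (te - ts)          # fraction of the time interval elapsed at t
    x_sync = xs + r * (xe - xs)       # where the object would be at time t
    y_sync = ys + r * (ye - ys)       # if it moved along (ps, pe) at constant speed
    return hypot(x - x_sync, y - y_sync)
```

Note that a sample lying exactly on the line can still have a non-zero SED: for the segment from `(0, 0, 0)` to `(10, 0, 10)`, the sample `(2, 0, 5)` has perpendicular distance 0 but SED 3, because at time 5 the object would be expected at `x = 5`.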
8 Compressing Trajectories: TD-TR algorithm
- The TD-TR algorithm (Meratnia and de By, EDBT 2004) is a spatiotemporal extension of the well-known top-down Douglas-Peucker algorithm, which was originally used in cartography.
- The algorithm preserves directional trends in the approximated line using a distance threshold.
- The TD-TR algorithm uses SED instead of the perpendicular distance.
- It is a batch algorithm, since it requires the full line at its start.
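A top-down scheme of this kind might look as follows. This is a hedged sketch of a Douglas-Peucker style recursion driven by SED, not the reference implementation of TD-TR; the names `sed` and `td_tr` and the `(x, y, t)` tuple representation are assumptions made for illustration, and timestamps are assumed to be strictly increasing.

```python
from math import hypot

def sed(p, ps, pe):
    """SED of sample p from segment (ps, pe); points are (x, y, t) tuples."""
    r = (p[2] - ps[2]) / (pe[2] - ps[2])
    return hypot(p[0] - (ps[0] + r * (pe[0] - ps[0])),
                 p[1] - (ps[1] + r * (pe[1] - ps[1])))

def td_tr(points, threshold):
    """Top-down compression: keep splitting at the sample with the largest SED
    until every dropped sample is within `threshold` of its segment."""
    if len(points) <= 2:
        return list(points)
    ps, pe = points[0], points[-1]
    # Find the interior sample that deviates most from the segment (ps, pe).
    worst_idx, worst_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = sed(points[i], ps, pe)
        if d > worst_dist:
            worst_idx, worst_dist = i, d
    if worst_dist <= threshold:
        return [ps, pe]               # the whole run is approximated by one segment
    # Otherwise keep the worst sample and compress both halves recursively.
    left = td_tr(points[:worst_idx + 1], threshold)
    right = td_tr(points[worst_idx:], threshold)
    return left[:-1] + right          # drop the duplicated split point
```

The function needs the complete list of samples before it can choose the first split, which is why a scheme like this is a batch algorithm. For very long trajectories an explicit stack would avoid deep recursion.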
9 Compressing Trajectories: OPW-TR algorithm
- Opening window (OW) algorithms anchor the start point of a potential segment, and then attempt to approximate the subsequent data series with increasingly longer segments.
- The algorithm also preserves directional trends in the approximated line using a distance threshold.
- The OPW-TR algorithm (Meratnia and de By, EDBT 2004) also uses SED instead of the perpendicular distance.
- It can be used as an online algorithm.
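The opening-window idea can be sketched as below. Several variants exist that differ in where a violated window is closed; this sketch closes it at the sample just before the one that caused the violation, which is an assumption on our part and not necessarily the exact rule used by OPW-TR. The names `sed` and `opw_tr` are illustrative, and timestamps are assumed to be strictly increasing.

```python
from math import hypot

def sed(p, ps, pe):
    """SED of sample p from segment (ps, pe); points are (x, y, t) tuples."""
    r = (p[2] - ps[2]) / (pe[2] - ps[2])
    return hypot(p[0] - (ps[0] + r * (pe[0] - ps[0])),
                 p[1] - (ps[1] + r * (pe[1] - ps[1])))

def opw_tr(points, threshold):
    """Opening-window compression driven by SED."""
    if len(points) <= 2:
        return list(points)
    result = [points[0]]
    anchor, end = 0, 2                # window spans points[anchor .. end]
    while end < len(points):
        violated = any(
            sed(points[i], points[anchor], points[end]) > threshold
            for i in range(anchor + 1, end)
        )
        if violated:
            anchor = end - 1          # close the segment just before the violation
            result.append(points[anchor])
            end = anchor + 2          # open a new window from the new anchor
        else:
            end += 1                  # grow the window by one sample
    result.append(points[-1])         # the last sample always closes the final segment
    return result
```

Only the samples between the current anchor and the newest sample are inspected, so the same loop can run over a stream of incoming positions, which is what makes the approach usable online.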
11 Related work on Error Estimation
- The only related work estimates the average value of the Synchronous Euclidean Distance (SED), also termed Synchronous Error, between an original trajectory and its approximation.
- There is no obvious way to use it in order to determine the error introduced in query results.
13 Estimating the Effect of Compression in ST Querying: Preliminaries
- Our goal is to provide closed-form formulas that estimate the number of false hits introduced in query results over compressed trajectory datasets.
- Among the query types executed against trajectory datasets, we focus on a special type of range query, the so-called timeslice query.
- Two types of errors are introduced in query results when executing a timeslice query over a compressed trajectory dataset:
  - False negatives are the trajectories that originally qualified for the query but whose compressed counterparts were not retrieved.
  - False positives are the compressed trajectories retrieved by the query whose original counterparts do not qualify.
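The two error types can be made concrete with a small brute-force counter. This is an illustrative sketch, assuming trajectories are lists of `(x, y, t)` tuples with strictly increasing timestamps, positions are linearly interpolated between samples, and a timeslice query is a rectangle `(xmin, ymin, xmax, ymax)` evaluated at a single timestamp `t`. All function names are hypothetical.

```python
from bisect import bisect_right

def position_at(traj, t):
    """Linearly interpolated (x, y) of a trajectory at time t, or None if
    t falls outside the trajectory's lifespan."""
    times = [p[2] for p in traj]
    if not traj or t < times[0] or t > times[-1]:
        return None
    k = bisect_right(times, t) - 1    # last sample with timestamp <= t
    if k == len(traj) - 1:
        return traj[k][0], traj[k][1]
    x0, y0, t0 = traj[k]
    x1, y1, t1 = traj[k + 1]
    r = (t - t0) / (t1 - t0)
    return x0 + r * (x1 - x0), y0 + r * (y1 - y0)

def in_window(pos, window):
    """True if pos is a position lying inside the rectangle `window`."""
    xmin, ymin, xmax, ymax = window
    return pos is not None and xmin <= pos[0] <= xmax and ymin <= pos[1] <= ymax

def timeslice_errors(originals, compressed, window, t):
    """Count (false_positives, false_negatives) of one timeslice query,
    pairing each original trajectory with its compressed counterpart."""
    false_pos = false_neg = 0
    for orig, comp in zip(originals, compressed):
        hit_orig = in_window(position_at(orig, t), window)
        hit_comp = in_window(position_at(comp, t), window)
        if hit_orig and not hit_comp:
            false_neg += 1            # original qualifies, compressed is missed
        elif hit_comp and not hit_orig:
            false_pos += 1            # compressed is retrieved, original does not qualify
    return false_pos, false_neg
```

Averaging these counts over many randomly placed windows gives an empirical value against which a closed-form estimate can be compared.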
14 Estimating the Effect of Compression in ST Querying: Analysis (1)
- We first calculate AvgP_{i,P} / AvgP_{i,N}, the average probability that a single compressed trajectory is retrieved as a false positive / negative, over all possible timeslice query windows with sides a × b.
- We then sum up these average probabilities over all dataset trajectories in order to produce the global average probability.
- The error introduced in the position of a trajectory can be calculated as a function of time.
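To see why the position error is a simple function of time, assume positions are linearly interpolated between samples and the compressed trajectory keeps a subset of the original samples (as TD-TR and OPW-TR do). Between two consecutive original timestamps t_k and t_{k+1}, both the original and the compressed position are then linear in t, so their difference is linear as well. The following is a sketch in assumed notation, with dx_{i,k} and dy_{i,k} denoting the displacement of trajectory T_i at timestamp t_k; it is not necessarily the exact expression shown on the original slide.

```latex
\begin{aligned}
dx_i(t) &= dx_{i,k} + \frac{t - t_k}{t_{k+1} - t_k}\,\bigl(dx_{i,k+1} - dx_{i,k}\bigr),\\
dy_i(t) &= dy_{i,k} + \frac{t - t_k}{t_{k+1} - t_k}\,\bigl(dy_{i,k+1} - dy_{i,k}\bigr),
\qquad t_k \le t \le t_{k+1}.
\end{aligned}
```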
15 Estimating the Effect of Compression in ST Querying: Analysis (2)
- We calculate the average probability that a compressed trajectory Ti is retrieved as a false positive / negative by a timeslice query window at timestamp tj.
- The set of timeslice query windows that may retrieve a compressed trajectory as a false positive / negative at timestamp tj can be determined geometrically.
- We distinguish among 4 cases, depending on the signs of the dx and dy values.
- Finally, by integrating the area A_{i,j} over all the timestamps inside the unit space, we obtain AvgP_{i,P} / AvgP_{i,N}.
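One way to picture the geometric step is the following sketch, which rests on our own simplifying assumptions and is not necessarily the derivation on the slides. Identify each a × b window by its center. The windows that contain a given point form an a × b rectangle centered on that point, so the windows that contain the compressed position but not the original one cover the area a·b minus the overlap of the two rectangles. Taking absolute values of dx and dy folds the four sign cases into one. With window centers uniformly distributed in the unit space and boundary effects ignored, this area is also a probability. The function names below are hypothetical.

```python
import random

def false_hit_area(a, b, dx, dy):
    """Area of a x b window positions covering exactly one of two points
    that are displaced from each other by (dx, dy)."""
    overlap = max(0.0, a - abs(dx)) * max(0.0, b - abs(dy))
    return a * b - overlap

def sampled_false_positive_rate(a, b, dx, dy, n=200_000, seed=0):
    """Monte Carlo check: fraction of random windows in the unit square that
    cover the displaced (compressed) point but not the original one."""
    rng = random.Random(seed)
    px, py = 0.5, 0.5                 # original position, placed mid-space
    qx, qy = px + dx, py + dy         # compressed position
    hits = 0
    for _ in range(n):
        cx, cy = rng.random(), rng.random()   # window center
        covers_p = abs(cx - px) <= a / 2 and abs(cy - py) <= b / 2
        covers_q = abs(cx - qx) <= a / 2 and abs(cy - qy) <= b / 2
        if covers_q and not covers_p:
            hits += 1
    return hits / n
```

By symmetry the same area applies to false negatives, with the roles of the two points swapped. When the displacement exceeds the window side in either dimension, the overlap vanishes and every window covering one position misses the other.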
16 Estimating the Effect of Compression in ST Querying: Analysis (3)
- Summing up the average probabilities of all trajectories and performing the necessary calculations, we obtain
  [equation not preserved in the source]
- where
  [symbol definitions not preserved in the source]
18 Evaluating the Effect of Compression in ST Querying
- Evaluating this formula is a costly operation, O(n·m), since it requires processing the entire original dataset along with its compressed counterpart.
- However, any compression algorithm that evaluates SED must also calculate dx_{i,k} and dy_{i,k} at every timestamp.
- As a consequence, the evaluation of the average error in the query results can be integrated into the compression algorithm, introducing only a small overhead on its execution.
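The integration idea can be illustrated with a top-down compressor that keeps the per-sample displacements it already computes while evaluating SED. This is a sketch under our own assumptions, not the authors' implementation: points are `(x, y, t)` tuples with strictly increasing timestamps, `stats` is a plain list supplied by the caller, and displacements are recorded only at sample timestamps and only once a segment is accepted, since earlier evaluations refer to candidate segments that are later split.

```python
from math import hypot

def segment_displacements(points):
    """(dx, dy) of every interior sample relative to its time-synchronized
    position on the segment joining the first and last sample."""
    xs, ys, ts = points[0]
    xe, ye, te = points[-1]
    out = []
    for x, y, t in points[1:-1]:
        r = (t - ts) / (te - ts)
        out.append((xs + r * (xe - xs) - x, ys + r * (ye - ys) - y))
    return out

def td_tr_with_stats(points, threshold, stats):
    """Top-down SED compression that appends the final (dx, dy) of every
    dropped sample to `stats` as a by-product."""
    if len(points) <= 2:
        return list(points)
    disp = segment_displacements(points)
    seds = [hypot(dx, dy) for dx, dy in disp]     # SED is the norm of (dx, dy)
    worst = max(range(len(seds)), key=seds.__getitem__)
    if seds[worst] <= threshold:
        stats.extend(disp)            # segment accepted: its displacements are final
        return [points[0], points[-1]]
    split = worst + 1                 # index of the worst sample within `points`
    left = td_tr_with_stats(points[:split + 1], threshold, stats)
    right = td_tr_with_stats(points[split:], threshold, stats)
    return left[:-1] + right
```

The extra work per accepted segment is a single list extension, consistent with the claim that the bookkeeping adds only a small overhead. The collected displacements could then feed whatever error formula is being evaluated.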
20 Experimental Study: Settings
- Datasets
  - One real trajectory dataset of a fleet of trucks (273 trajectories, 112K entries)
  - A synthetic dataset of 2000 trajectories generated using a network-based data generator and the San Joaquin road network
- Implementation
  - We implemented the TD-TR algorithm and compressed the real and synthetic datasets, varying its threshold
- Experiments
  - Average overhead introduced in the TD-TR algorithm
  - Average number of false positives and false negatives in 10000 randomly distributed timeslice queries
21 Experimental Study: On the performance
- Scaling the value of the TD-TR threshold
  - The algorithm's execution time decreases as the value of the TD-TR threshold increases.
  - The overhead introduced in the algorithm's execution is typically small (below 7%).
  - In absolute terms, the overhead introduced never exceeds 0.2 milliseconds per trajectory.
[Figure panels: Trucks dataset; Synthetic dataset]
22 Experimental Study: On the quality (1)
- Scaling the value of the TD-TR threshold
  - The average number of false hits (negatives and positives) is linear in the value of the TD-TR compression threshold.
  - The average error in the estimation for the synthetic dataset is around 6%, varying between 0.2% and 14%.
  - In the trucks dataset the average error increases to around 10.6%, mainly due to the error introduced at small values of the TD-TR threshold.
[Figure panels: Trucks dataset; Synthetic dataset]
23 Experimental Study: On the quality (2)
- Scaling the query size
  - The average number of false hits (negatives and positives) is sub-linear in the size of the query.
  - The average error in the estimation for the synthetic dataset is around 2.9%, varying between 0.2% and 8.7%.
  - In the trucks dataset the average error increases to around 7.5%.
[Figure panels: Trucks dataset; Synthetic dataset]
24 Summary and Future Work
- We provided a closed-form formula for the average number of false negatives and false positives, covering the case of uniformly distributed query windows and arbitrarily distributed trajectory data.
- Through an experimental study we demonstrated the efficiency of the proposed model.
  - We illustrated the applicability of our model under real-life requirements: it turns out that the estimation of the model parameters introduces only a small overhead in the trajectory compression algorithm.
  - We presented the accuracy of our estimations, with an average error of around 6%.
- Future work
  - Extension of our model to nearest neighbor and general range queries
  - Applicability of our model in the case of spatiotemporal warehouses
25 Acknowledgements
- Research partially supported by
  - GEOPKDD (Geographic Privacy-aware Knowledge Discovery and Delivery) project, funded by the European Community under contract FP6-014915
26 On the Effect of Trajectory Compression in Spatiotemporal Querying
Thank you!