1
On the Effect of Trajectory Compression in
Spatio-temporal Querying
  • Elias Frentzos and Yannis Theodoridis
  • Data Management Group, University of Piraeus
  • http://isl.cs.unipi.gr/db

ADBIS, October 2, 2007
2
Talk Outline
  • Problem Statement
  • Background
      • Compressing Trajectories
      • Related work on Error Estimation
  • Estimating the Effect of Compression in ST
    Querying
  • Evaluating the Effect of Compression in ST
    Querying
  • Experimental Results
      • On the performance
      • On the quality
  • Conclusions and Future Work

4
Problem Statement (1)
  • A trajectory is the data obtained from a moving
    point object and can be seen as a string in 3D
    space (two spatial dimensions plus time)
  • Trajectory compression is a very promising field,
    since moving objects that record their position
    over time produce large amounts of frequently
    redundant data
  • Existing work on trajectory compression is mainly
    driven by research advances in the fields of line
    generalization and time series compression
  • Our interest is in lossy compression techniques,
    which eliminate repeated or unnecessary
    information under well-defined error bounds

5
Problem Statement (2)
  • The objectives of trajectory compression are:
      • To obtain a lasting reduction in data size
      • To obtain a data series that still allows
        various computations at acceptable (low)
        complexity
      • To obtain a data series with known, small
        margins of error, which are preferably
        parametrically adjustable
  • Our goal is to calculate the mean error
    introduced in query results over compressed
    trajectory data, which is by no means a trivial
    task
  • We argue that this mean error can be used to
    decide whether the compressed data are suitable
    for the user's needs
  • We restrict our discussion to a special type of
    spatiotemporal query, the timeslice query

7
Compressing Trajectories: SED
  • Methods that use line simplification algorithms
    to compress a trajectory are based on the
    so-called Synchronous Euclidean Distance (SED)
  • SED is the distance between the sampled point
    Pi (xi, yi, ti) under examination and the point
    on the line segment (Ps, Pe) where the moving
    object would be at time ti if it were moving
    along that segment
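
A minimal Python sketch of the SED definition above. It assumes the object moves along the segment (Ps, Pe) at constant speed, so its position at time ti is obtained by linear interpolation in time; the function names and the (x, y, t) tuple layout are illustrative choices, not taken from the slides.

```python
from math import hypot

def synchronized_position(ps, pe, t):
    """Position on segment ps->pe at time t, assuming constant speed."""
    (xs, ys, ts), (xe, ye, te) = ps, pe
    if te == ts:                      # degenerate segment: no time elapses
        return xs, ys
    r = (t - ts) / (te - ts)          # fraction of the segment's time span
    return xs + r * (xe - xs), ys + r * (ye - ys)

def sed(p, ps, pe):
    """Synchronous Euclidean Distance of sample p = (x, y, t) from ps->pe."""
    x, y, t = p
    sx, sy = synchronized_position(ps, pe, t)
    return hypot(x - sx, y - sy)

# The sample (2, 0, 5) lies on the line itself, so its perpendicular
# distance is 0, but at t = 5 the object "should" be at (5, 0).
print(sed((2, 0, 5), (0, 0, 0), (10, 0, 10)))  # 3.0
```

The usage line shows why SED differs from the perpendicular distance: a point can sit exactly on the segment and still be far from where the object would be at that time.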

8
Compressing Trajectories: TD-TR algorithm
  • The TD-TR algorithm (Meratnia and de By, EDBT
    2004) is a spatiotemporal extension of the
    well-known top-down Douglas-Peucker algorithm,
    which was originally used in cartography
  • The algorithm preserves directional trends in
    the approximated line using a distance threshold
  • TD-TR uses SED instead of the perpendicular
    distance
  • It is a batch algorithm, since it requires the
    full line at its start
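
The bullets above can be sketched as a Douglas-Peucker style recursion in which the distance measure is SED. This is an illustrative reading of the slide, not the authors' implementation: the names `sed` and `td_tr`, the (x, y, t) tuple layout, and the constant-speed interpolation are assumptions.

```python
from math import hypot

def sed(p, ps, pe):
    """SED of p = (x, y, t) from segment ps->pe (constant speed assumed)."""
    (x, y, t), (xs, ys, ts), (xe, ye, te) = p, ps, pe
    r = 0.0 if te == ts else (t - ts) / (te - ts)
    return hypot(x - (xs + r * (xe - xs)), y - (ys + r * (ye - ys)))

def td_tr(points, threshold):
    """Top-down compression of a time-sorted list of (x, y, t) samples.

    Returns the retained samples. Needs the whole trajectory up front,
    which is why the slide calls it a batch algorithm.
    """
    if len(points) <= 2:
        return list(points)
    ps, pe = points[0], points[-1]
    # Interior sample that deviates most from the candidate segment.
    idx, dmax = max(
        ((i, sed(points[i], ps, pe)) for i in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if dmax <= threshold:
        return [ps, pe]               # one segment approximates everything
    left = td_tr(points[: idx + 1], threshold)
    right = td_tr(points[idx:], threshold)
    return left[:-1] + right          # split point appears once

traj = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 2, 3), (4, 0, 4)]
print(td_tr(traj, 0.5))  # [(0, 0, 0), (2, 0, 2), (3, 2, 3), (4, 0, 4)]
```

The recursion is kept for readability; for very long trajectories an explicit stack would avoid Python's recursion limit.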

9
Compressing Trajectories: OPW-TR algorithm
  • Opening window (OW) algorithms anchor the start
    point of a potential segment and then attempt to
    approximate the subsequent data series with
    increasingly longer segments
  • Like TD-TR, they preserve directional trends in
    the approximated line using a distance threshold
  • The OPW-TR algorithm (Meratnia and de By, EDBT
    2004) also uses SED instead of the perpendicular
    distance
  • It can be used as an online algorithm
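
A hedged sketch of an opening-window pass with SED as the distance. The slide does not say where a segment is closed once the threshold is exceeded; this version closes it at the sample just before the current window end, which is an assumption. The names `sed` and `opw_tr` are illustrative.

```python
from math import hypot

def sed(p, ps, pe):
    """SED of p = (x, y, t) from segment ps->pe (constant speed assumed)."""
    (x, y, t), (xs, ys, ts), (xe, ye, te) = p, ps, pe
    r = 0.0 if te == ts else (t - ts) / (te - ts)
    return hypot(x - (xs + r * (xe - xs)), y - (ys + r * (ye - ys)))

def opw_tr(points, threshold):
    """Opening-window compression of a time-sorted list of (x, y, t)."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    anchor, end = 0, 2
    while end < len(points):
        # Grow the window; test every sample between anchor and end.
        violated = any(
            sed(points[i], points[anchor], points[end]) > threshold
            for i in range(anchor + 1, end)
        )
        if violated:
            anchor = end - 1          # last end point that was still valid
            kept.append(points[anchor])
            end = anchor + 2
        else:
            end += 1
    kept.append(points[-1])
    return kept

traj = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 2, 3), (4, 0, 4)]
print(opw_tr(traj, 0.5))  # [(0, 0, 0), (2, 0, 2), (3, 2, 3), (4, 0, 4)]
```

Each decision only looks at samples up to the current window end, which is what makes an online variant possible; the list-based form here is just the simplest way to show the logic.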

11
Related work on Error Estimation
  • The only related work estimates the average
    value of the Synchronous Euclidean Distance
    (SED), also termed the synchronous error, between
    an original trajectory and its approximation
  • There is no obvious way to use it to determine
    the error introduced in query results

13
Estimating the Effect of Compression in ST
Querying: Preliminaries
  • Our goal is to provide closed-form formulas that
    estimate the number of false hits introduced in
    query results over compressed trajectory datasets
  • Among the query types executed against trajectory
    datasets, we focus on a special type of range
    query, the so-called timeslice query
  • Two types of errors are introduced in query
    results when executing a timeslice query over a
    compressed trajectory dataset:
      • False negatives are the trajectories that
        originally qualified for the query but whose
        compressed counterparts were not retrieved
      • False positives are the compressed
        trajectories retrieved by the query although
        their original counterparts do not qualify
        for it
14
Estimating the Effect of Compression in ST
Querying: Analysis (1)
  • We first calculate AvgP_{i,P} / AvgP_{i,N}, the
    average probability that a single compressed
    trajectory Ti is retrieved as a false positive /
    negative, over all possible timeslice query
    windows with sides a × b
  • We then sum up these average probabilities over
    all dataset trajectories to produce the global
    average, i.e. the average number of false
    positives / negatives per query
  • The error introduced in the position of a
    trajectory can be calculated as a function of time
15
Estimating the Effect of Compression in ST
Querying: Analysis (2)
  • We calculate the average probability that a
    compressed trajectory Ti is retrieved as a false
    positive / negative by a timeslice query window
    at timestamp tj
  • The measure (area A_{i,j}) of the timeslice query
    windows that may retrieve a compressed trajectory
    as a false positive / negative at timestamp tj
    can be derived geometrically
  • We distinguish among 4 cases, depending on the
    signs of the dx and dy values
  • Finally, by integrating the area A_{i,j} over all
    timestamps inside the unit space, we obtain
    AvgP_{i,P} / AvgP_{i,N}
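
The geometric step can be illustrated with a simplified model. It is a sketch built on stated assumptions, not the formula from the talk. Assume an a × b window is identified by its lower-left corner, that corners are uniform over the unit space, and that boundary effects are ignored. The corners whose window covers a given point form an a × b rectangle, and the rectangles for the original and the compressed position are shifted by (dx, dy) relative to each other. The windows that cover one position but not the other therefore have area a·b minus the overlap of the two rectangles. Under these assumptions the four sign cases collapse into absolute values, and the result is the same for false positives and false negatives. The function names are hypothetical.

```python
def false_hit_area(a, b, dx, dy):
    """Area of window positions covering one position but not the other.

    dx, dy: offset between original and compressed position at one
    timestamp. Boundary effects of the unit space are ignored (assumption).
    """
    overlap = max(0.0, a - abs(dx)) * max(0.0, b - abs(dy))
    return a * b - overlap

def avg_false_hit_probability(a, b, displacements):
    """Mean false-hit area over sampled (dx, dy) pairs.

    A discrete stand-in for integrating the area over all timestamps.
    """
    if not displacements:
        return 0.0
    total = sum(false_hit_area(a, b, dx, dy) for dx, dy in displacements)
    return total / len(displacements)

# No displacement means no false hits; a displacement larger than the
# window means every window covering one position misses the other.
print(false_hit_area(0.1, 0.2, 0.0, 0.0))  # 0.0
```

Summing `avg_false_hit_probability` over all trajectories would then give an estimate of the average number of false hits per query under this simplified model.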

16
Estimating the Effect of Compression in ST
Querying: Analysis (3)
  • Summing up the average probabilities of all
    trajectories and performing the necessary
    calculations, we obtain
  • where

18
Evaluating the Effect of Compression in ST
Querying
  • Evaluating this formula is a costly operation,
    O(n·m), since it requires processing the entire
    original dataset along with its compressed
    counterpart
  • However, any compression algorithm that evaluates
    SED also needs to calculate dx_{i,k} and dy_{i,k}
    at every timestamp
  • As a consequence, the evaluation of the average
    error in the query results can be integrated into
    the compression algorithm, introducing only a
    small overhead on its execution
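
One way to picture this integration, sketched under assumptions: a TD-TR style routine already computes the offsets (dx, dy) of every sample when it tests a candidate segment, so it can record them at the moment the segment is accepted. Only offsets from accepted segments are final, because rejected candidates are split further. The names `offsets` and `td_tr_with_offsets` and the `out` list are hypothetical; what is done with the recorded offsets afterwards is left open.

```python
from math import hypot

def offsets(p, ps, pe):
    """(dx, dy) between sample p and its synchronized point on ps->pe."""
    (x, y, t), (xs, ys, ts), (xe, ye, te) = p, ps, pe
    r = 0.0 if te == ts else (t - ts) / (te - ts)
    return x - (xs + r * (xe - xs)), y - (ys + r * (ye - ys))

def td_tr_with_offsets(points, threshold, out):
    """TD-TR that appends the final (dx, dy) of each dropped sample to out."""
    if len(points) <= 2:
        return list(points)
    ps, pe = points[0], points[-1]
    deltas = [offsets(p, ps, pe) for p in points[1:-1]]
    dists = [hypot(dx, dy) for dx, dy in deltas]   # these are the SEDs
    dmax = max(dists)
    if dmax <= threshold:
        out.extend(deltas)            # segment accepted: offsets are final
        return [ps, pe]
    idx = dists.index(dmax) + 1       # index of the worst sample in points
    left = td_tr_with_offsets(points[: idx + 1], threshold, out)
    right = td_tr_with_offsets(points[idx:], threshold, out)
    return left[:-1] + right

traj = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 2, 3), (4, 0, 4)]
recorded = []
print(td_tr_with_offsets(traj, 5, recorded))  # [(0, 0, 0), (4, 0, 4)]
print(recorded)  # [(0.0, 0.0), (0.0, 0.0), (0.0, 2.0)]
```

The extra work per accepted segment is a single list extension, which is consistent with the slide's claim that the overhead is small.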

20
Experimental Study: Settings
  • Datasets
      • One real trajectory dataset of a fleet of
        trucks (273 trajectories, 112K entries)
      • A synthetic dataset of 2000 trajectories
        generated using a network-based data generator
        and the San Joaquin road network
  • Implementation
      • We implemented the TD-TR algorithm and
        compressed the real and synthetic datasets,
        varying its threshold
  • Experiments
      • Average overhead introduced in the TD-TR
        algorithm
      • Average number of false positives and false
        negatives in 10000 randomly distributed
        timeslice queries
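
The second experiment can be sketched as a simple Monte Carlo loop. The details here are assumptions, not the authors' setup: coordinates are taken to be normalized to the unit square, windows of size a × b and query timestamps are drawn uniformly, and positions between samples are linearly interpolated. All names are illustrative.

```python
import random

def position_at(traj, t):
    """Interpolated (x, y) at time t; None outside the lifespan."""
    if t < traj[0][2] or t > traj[-1][2]:
        return None
    for (x0, y0, t0), (x1, y1, t1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return x0, y0
            r = (t - t0) / (t1 - t0)
            return x0 + r * (x1 - x0), y0 + r * (y1 - y0)
    return traj[-1][0], traj[-1][1]   # single-sample trajectory

def hit(pos, x0, y0, a, b):
    return pos is not None and x0 <= pos[0] <= x0 + a and y0 <= pos[1] <= y0 + b

def average_false_hits(originals, compressed, a, b, n_queries=10000, seed=0):
    """Average (false positives, false negatives) per random timeslice query."""
    rng = random.Random(seed)
    t_lo = min(tr[0][2] for tr in originals)
    t_hi = max(tr[-1][2] for tr in originals)
    fp = fn = 0
    for _ in range(n_queries):
        x0, y0 = rng.uniform(0, 1 - a), rng.uniform(0, 1 - b)
        t = rng.uniform(t_lo, t_hi)
        for orig, comp in zip(originals, compressed):
            in_orig = hit(position_at(orig, t), x0, y0, a, b)
            in_comp = hit(position_at(comp, t), x0, y0, a, b)
            if in_comp and not in_orig:
                fp += 1
            elif in_orig and not in_comp:
                fn += 1
    return fp / n_queries, fn / n_queries
```

Passing the same list as both `originals` and `compressed` returns `(0.0, 0.0)`, a quick sanity check before running it on real compressed data.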

21
Experimental Study: On the performance
  • Scaling the value of the TD-TR threshold
      • The algorithm's execution time decreases as
        the value of the TD-TR threshold increases
      • The overhead introduced in the algorithm's
        execution is typically small (below 7%)
      • In absolute terms, the overhead introduced
        never exceeds 0.2 milliseconds per trajectory

(Charts: Trucks dataset, Synthetic dataset)
22
Experimental Study: On the quality (1)
  • Scaling the value of the TD-TR threshold
      • The average number of false hits (negatives
        and positives) grows linearly with the value
        of the TD-TR compression threshold
      • The average error in the estimation for the
        synthetic dataset is around 6%, varying
        between 0.2% and 14%
      • In the trucks dataset the average error rises
        to around 10.6%, mainly due to the error
        introduced at small values of the TD-TR
        threshold

(Charts: Trucks dataset, Synthetic dataset)
23
Experimental Study: On the quality (2)
  • Scaling the query size
      • The average number of false hits (negatives
        and positives) grows sub-linearly with the
        size of the query
      • The average error in the estimation for the
        synthetic dataset is around 2.9%, varying
        between 0.2% and 8.7%
      • In the trucks dataset the average error rises
        to around 7.5%

(Charts: Trucks dataset, Synthetic dataset)
24
Summary and Future Work
  • We provided a closed-form formula for the average
    number of false negatives and false positives,
    covering the case of uniformly distributed query
    windows and arbitrarily distributed trajectory
    data
  • Through an experimental study we demonstrated the
    efficiency of the proposed model
  • We illustrated the applicability of our model
    under real-life requirements: it turns out that
    estimating the model parameters introduces only
    a small overhead in the trajectory compression
    algorithm
  • We presented the accuracy of our estimations,
    with an average error of around 6%
  • Future work
      • Extension of our model to nearest neighbor
        and general range queries
      • Applicability of our model in the case of
        spatiotemporal warehouses

25
Acknowledgements
  • Research partially supported by the GEOPKDD
    (Geographic Privacy-aware Knowledge Discovery
    and Delivery) project, funded by the European
    Community under contract FP6-014915

26
On the Effect of Trajectory Compression in
Spatiotemporal Querying
Thank you!