Nearest Neighbor Search in Spatial and Spatiotemporal Databases

1
Nearest Neighbor Search in Spatial and
Spatiotemporal Databases
  • Dimitris Papadias
  • Hong Kong University of Science and Technology

2
Spatial and spatiotemporal databases
  • Spatial databases manage large collections of
    multi-dimensional objects.
  • Important query types:
  • Window query: Retrieve all rivers in CA
  • Nearest neighbor: Find my nearest gas station
  • Spatial join: Report pairs of (city C, river R)
    such that R crosses C
  • Spatiotemporal databases deal with the same
    queries, but assume moving objects:
  • Mobile computing
  • Traffic supervision
  • Flight control
  • Weather forecasting

3
R-trees (Guttman, SIGMOD 84; Sellis et al., VLDB 87;
Beckmann et al., SIGMOD 90)
4
TPR-trees (Saltenis et al., SIGMOD 00; our group,
VLDB 03)
  • Extends the R-tree by introducing the velocity
    bounding rectangle (VBR) in non-leaf entries.
  • Objects are grouped together based on both their
    location and velocities.
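The effect of a velocity bounding rectangle can be illustrated with a minimal sketch. It assumes 2D objects given as hypothetical (x, y, vx, vy) tuples that move linearly; the names Rect, bound and mbr_at are illustrative and not taken from the TPR-tree papers.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def bound(objects):
    """Build (MBR, VBR) for objects given as (x, y, vx, vy) tuples."""
    xs, ys, vxs, vys = zip(*objects)
    mbr = Rect(min(xs), min(ys), max(xs), max(ys))
    vbr = Rect(min(vxs), min(vys), max(vxs), max(vys))
    return mbr, vbr

def mbr_at(mbr, vbr, t):
    """Bounding rectangle t time units after the reference time (t >= 0).

    Each side moves with the matching VBR component, so the result
    conservatively encloses every object whose velocity lies in the VBR.
    """
    return Rect(mbr.xmin + vbr.xmin * t, mbr.ymin + vbr.ymin * t,
                mbr.xmax + vbr.xmax * t, mbr.ymax + vbr.ymax * t)
```

The rectangle generally grows with t, which is why grouping objects with similar velocities keeps the bounds tight.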

5
Conventional NN search with R-(TPR-) trees
  • Depth-first traversal (Roussopoulos et al.,
    SIGMOD 95)
  • Best-first traversal (Hjaltason and Samet, TODS
    99): incremental and optimal
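A minimal sketch of best-first traversal follows. It assumes a toy in-memory tree in which each node is a dict with an "mbr" and either "children" or "points"; that layout is hypothetical and stands in for a disk-based R-tree.

```python
import heapq
import itertools
import math

def mindist(q, rect):
    """Minimum Euclidean distance from point q to (xmin, ymin, xmax, ymax)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)

def best_first_nn(root, q):
    """Yield (distance, point) pairs in ascending distance from q."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(0.0, next(tie), root, None)]
    while heap:
        d, _, node, point = heapq.heappop(heap)
        if point is not None:
            yield d, point  # every remaining entry is at least this far away
        elif "points" in node:
            for p in node["points"]:
                heapq.heappush(heap, (math.dist(q, p), next(tie), None, p))
        else:
            for child in node["children"]:
                heapq.heappush(
                    heap, (mindist(q, child["mbr"]), next(tie), child, None))
```

Because the function is a generator, the caller can stop after k results, which is the incremental property mentioned above.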

6
NN search - other approaches
  • Several algorithms and theoretical performance
    bounds have been devised for exact and
    approximate processing in main memory. Here we
    care about I/O efficiency (minimizing node and
    page accesses) and about cost models that
    predict practical performance (suitable for
    query optimization).
  • There are several approaches for NN search in
    high-dimensional spaces, but the problem is
    different there because of the dimensionality
    curse. Here we consider low-dimensional spaces
    (spatial and spatiotemporal databases).
  • Ferhatosmanoglu et al. (SSTD 01) discover the NN
    in a constrained area of the data space (e.g.,
    find the NN to the south of the query point).
  • Korn and Muthukrishnan (SIGMOD 00) discuss
    reverse nearest neighbor queries, where the goal
    is to retrieve the data points whose nearest
    neighbor is a specified query point.
  • Korn et al. (VLDB 02) study the same problem in
    the context of data streams, where the data are
    not known in advance.

7
NN search for mobile queries
  • Zheng and Lee (SSTD 01) return the current NN
    and the validity time of the result.
  • Restrictions: (i) assumes a maximum speed, (ii)
    is applicable only to a single NN, (iii) requires
    Voronoi diagrams.
  • Song and Roussopoulos (SSTD 01) minimize the
    number of queries for moving clients by returning
    m > k NNs.
  • Problem: how to determine m.

IF 2·dist(q,q') ≤ dist(q,b) − dist(q,a), THEN the 2
NNs at q' are among the 4 NNs of the first query.
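The condition can be checked with a few lines of code. This sketch assumes that a is the k-th and b the m-th (farthest cached) neighbor of the original query point q; the function name is hypothetical.

```python
import math

def cached_result_still_valid(q, q_new, cached_dists, k):
    """Check whether the k NNs at q_new are among the m cached NNs of q.

    cached_dists holds the ascending distances from q to its m cached NNs
    (m > k). By the triangle inequality, moving the query by delta changes
    any distance by at most delta, so 2 * delta <= d_m - d_k suffices.
    """
    d_k = cached_dists[k - 1]   # distance to the k-th NN (point a)
    d_m = cached_dists[-1]      # distance to the m-th NN (point b)
    return 2 * math.dist(q, q_new) <= d_m - d_k
```

With cached distances [1, 2, 3, 10] and k = 2, the client can move up to 4 units before it has to contact the server again.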
8
Time parameterized NN (our group, SIGMOD 02)
  • Assuming a constant and known velocity, a TPNN
    query returns:
  • The current query result R
  • The validity period T of R
  • The change C of the result at the end of T

Result: R = i, T = 2, C = j
9
TP NN queries: Influence Time
  • Some objects have infinite influence time.
  • The object that will become the next nearest
    neighbor is the one with the minimum influence
    time.
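For point objects and a query that moves with constant velocity, the influence time has a closed form, because the quadratic terms cancel when the two squared distances are equated. The sketch below is a brute-force illustration under that assumption (2D points, static objects, moving query); the function names are hypothetical.

```python
import math

def influence_time(q, v, nn, o):
    """Time at which o becomes closer than the current NN nn to a query
    that starts at q and moves with velocity v; math.inf if never."""
    # |q + v*t - o|^2 = |q + v*t - nn|^2 is linear in t.
    denom = 2 * (v[0] * (o[0] - nn[0]) + v[1] * (o[1] - nn[1]))
    if denom <= 0:
        return math.inf  # the query never gets relatively closer to o
    num = ((q[0] - o[0]) ** 2 + (q[1] - o[1]) ** 2
           - (q[0] - nn[0]) ** 2 - (q[1] - nn[1]) ** 2)
    return num / denom

def tp_nn(points, q, v):
    """Return (R, T, C): current NN, validity period, objects causing the change."""
    nn = min(points, key=lambda p: math.dist(q, p))
    times = {p: influence_time(q, v, nn, p) for p in points if p != nn}
    T = min(times.values(), default=math.inf)
    C = [p for p, t in times.items() if t == T] if T < math.inf else []
    return nn, T, C
```

An object with `denom <= 0` is one of those with infinite influence time.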

10
Processing TP NN with R- (TPR-) trees
  • Influence time of an MBR: the earliest possible
    time at which any object in the MBR can become
    the new NN.
  • Algorithm: traverse the R-tree in depth-first or
    best-first order, using the influence time
    instead of mindist.
  • The cost of a TPNN query is about the same as
    that of a conventional query, because we have to
    visit the influencing nodes anyway (to find the
    NN).

11
Continuous Nearest Neighbors (CNN) (our group,
VLDB 02)
  • Given a line segment q = [s, e], find the NN of
    every point on q.
  • Result representation: s (.NN = a), s1 (.NN = c),
    s2 (.NN = f), s3 (.NN = h), e.
  • The points (s, s1, s2, s3, e) are the split
    points.
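The split points can be illustrated with a brute-force sweep along the segment. This is a sketch that scans all points instead of using an R-tree, so it is not the algorithm of the slides; positions on q are expressed as a parameter t in [0, 1], and the function name is hypothetical.

```python
def continuous_nn(points, s, e):
    """Return [(t, nn), ...]: nn is the NN of s + t*(e - s) from parameter t
    until the next entry (or until t = 1 for the last entry)."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    # Squared distance to p along q is c0 + c1*t + c2*t^2 with a shared c2,
    # so comparing two points only needs (c0, c1).
    coeffs = []
    for p in points:
        c0 = (s[0] - p[0]) ** 2 + (s[1] - p[1]) ** 2
        c1 = 2 * (dx * (s[0] - p[0]) + dy * (s[1] - p[1]))
        coeffs.append((c0, c1))
    # NN at s; ties are broken in favor of the point that stays closer.
    cur = min(range(len(points)), key=lambda i: coeffs[i])
    result = [(0.0, points[cur])]
    while True:
        a0, a1 = coeffs[cur]
        # Point i overtakes cur at t = (b0 - a0) / (a1 - b1), only if b1 < a1.
        candidates = [((b0 - a0) / (a1 - b1), b1, i)
                      for i, (b0, b1) in enumerate(coeffs) if b1 < a1]
        candidates = [c for c in candidates if c[0] < 1.0]
        if not candidates:
            return result
        t, _, cur = min(candidates)
        result.append((t, points[cur]))
```

Each switch strictly decreases the linear coefficient of the current NN, so the loop performs at most one iteration per data point.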

12
Main idea
  • Maintain the set of split points incrementally.

[Figure: the split list after processing a, and after processing c]
13
Processing CNN with an R- (TPR-) tree
  • Avoid examination of all points.
  • Given an MBR E and query segment q, E must be
    searched if and only if there exists a split
    point si ∈ SL such that dist(si, si.NN) >
    mindist(si, E).
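The pruning test translates directly into code. The sketch assumes the split list is a list of (split point, current NN of that split point) pairs and that E is an (xmin, ymin, xmax, ymax) tuple; both representations are illustrative.

```python
import math

def mindist(p, rect):
    """Minimum Euclidean distance from point p to (xmin, ymin, xmax, ymax)."""
    dx = max(rect[0] - p[0], 0.0, p[0] - rect[2])
    dy = max(rect[1] - p[1], 0.0, p[1] - rect[3])
    return math.hypot(dx, dy)

def must_visit(split_list, rect):
    """True if some split point could find a closer neighbor inside rect."""
    return any(math.dist(s, nn) > mindist(s, rect) for s, nn in split_list)
```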

14
Location Based NN queries (LBNN) (our group,
SIGMOD 03)
  • A location-based kNN query q returns:
  • The current k NNs
  • A validity region such that the result remains
    the same as long as q remains in the region.
  • The validity region of q is the Voronoi Cell (VC)
    of the NN o.

15
Computing the Voronoi Cell on-the-fly
  • Step 1: Find the current NN
  • Step 2: Use TP NN queries to tighten the
    validity region

16
NN queries in road networks (our group, VLDB 03)
  • Find my nearest gas station in terms of driving
    distance.
  • Answer: Hotel b (the Euclidean NN is d)
  • Assumptions:
  • We can incrementally compute Euclidean NN using
    conventional NN algorithms.
  • We can compute the network distance between the
    query and any point (i.e., the length of the
    shortest path connecting them) using Dijkstra's
    algorithm.
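Under these two assumptions, a Euclidean-restriction style search can be sketched as follows. The sketch assumes that the query and the candidate objects are network nodes, that every edge is at least as long as the straight line between its endpoints (so the Euclidean distance is a lower bound of the network distance), and that a sorted list stands in for incremental NN retrieval from an R-tree. All names are hypothetical.

```python
import heapq
import math

def network_distances(graph, source):
    """Dijkstra; graph maps node -> {neighbor: edge_length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for w, length in graph[u].items():
            nd = d + length
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

def euclidean_restriction_nn(graph, coords, q, candidates):
    """Return (object, network distance) of the network NN of node q."""
    net = network_distances(graph, q)
    best, best_d = None, math.inf
    # Stand-in for incremental Euclidean NN search.
    for c in sorted(candidates, key=lambda c: math.dist(coords[q], coords[c])):
        if math.dist(coords[q], coords[c]) >= best_d:
            break  # no remaining object can beat the best network distance
        d = net.get(c, math.inf)
        if d < best_d:
            best, best_d = c, d
    return best, best_d
```

A real implementation would run Dijkstra lazily rather than over the whole network; computing all distances up front keeps the sketch short.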

17
Euclidean Restriction Algorithm
[Figure: search region after the 1st Euclidean NN and after the 2nd Euclidean NN]
18
Network Expansion Algorithm
19
NN in the presence of obstacles (not published)
  • The NN of q in terms of obstructed distance is b,
    although the Euclidean NN is a.

20
Visibility graphs
  • Have been used widely in Computational Geometry
    for shortest path problems (e.g., find the
    shortest path from pstart to pend that does not
    cross any obstacle).
  • Problem: We cannot maintain the entire visibility
    graph in memory for real spatial datasets.
  • Solution: We only need the obstacles and objects
    that affect the result of the query.

21
Obstacle nearest neighbor algorithm
  • Idea: Similar to the Euclidean Restriction
    algorithm for road networks.
  • BUT: how do we perform the obstructed distance
    computations?

22
Obstructed distance computation
  • Goal: compute the obstructed distance between p
    and q.
  • First, retrieve obstacles o1, o2 in the Euclidean
    range.
  • Compute a provisional distance d1(p,q) using only
    o1, o2.
  • d1(p,q) is not enough because the shortest path
    is obstructed by o3.
  • Perform a second Euclidean range query on the
    obstacle R-tree using d1(p,q) and retrieve o3,
    o4.
  • Compute a new obstructed distance d2(p,q) taking
    o3, o4 into account.
  • Repeat the process until the obstructed distance
    remains the same for two consecutive iterations.
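The control flow of this iteration can be sketched independently of the geometry. The sketch assumes two caller-supplied, hypothetical callables: range_query(center, radius), which returns the obstacles intersecting the given disk, and path_length(p, q, obstacles), which returns the shortest path length that avoids the given obstacles (for example via a local visibility graph). Centering the range at q is also an assumption.

```python
import math

def obstructed_distance(p, q, range_query, path_length):
    """Grow the obstacle set until the provisional distance stops changing."""
    # Initial range: the Euclidean distance between p and q.
    obstacles = set(range_query(q, math.dist(p, q)))
    d = path_length(p, q, obstacles)
    while True:
        # Any obstacle that can affect a path of length d lies within d of q.
        obstacles |= set(range_query(q, d))
        d_new = path_length(p, q, obstacles)
        if d_new == d:
            return d  # same distance for two consecutive iterations
        d = d_new
```

The provisional distance can only grow as obstacles are added, so the loop stops once a range query brings in nothing that lengthens the path.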

23
Other related work
  • By our group: concepts similar to the ones
    presented here apply to several other spatial
    queries, e.g., TP spatial joins and continuous
    window queries.
  • Cost Models for TP and continuous queries (TODS
    03).
  • Analysis of predictive NN (and other) queries
    (TODS, to appear).
  • An Efficient Cost Model for Optimization of
    Nearest Neighbor Search in Low and Medium
    Dimensional Spaces (TKDE, to appear).
  • By other groups: increasing interest in novel
    types of NN search in the context of mobile
    computing and data stream applications.
  • Iwerks et al. (VLDB 03) discuss continuous NN in
    the presence of object updates.
  • Shekhar et al. (ACM GIS 03) discuss the in-route
    nearest neighbor query, which, given a
    trajectory, retrieves the single NN (e.g., gas
    station) that results in the minimum diversion
    from the trajectory.
  • Jensen et al. (ACM GIS 03) discuss NN for objects
    moving on road networks.

24
Group NN queries (our group, ICDE 04)
  • Input: a set P = {p1, …, pN} of static data
    points in multidimensional space and a group of
    query points Q = {q1, …, qn}.
  • Output: the k (≥ 1) data point(s) with the
    smallest sum of distances to all points in Q. The
    distance between a data point p and Q is defined
    as dist(p,Q) = Σ(i=1..n) |pqi|, where |pqi| is
    the Euclidean distance between p and query point
    qi.
  • Example: three users at locations q1, q2 and q3
    want to find a meeting point (e.g., a
    restaurant); the corresponding query returns the
    data point p that minimizes the sum of Euclidean
    distances |pqi| for 1 ≤ i ≤ 3.
  • Assumption: the data points are indexed by an
    R-tree. Q may or may not fit in main memory.

25
Multiple Query Method (MQM)
  • Idea: Perform incremental NN queries for each
    point in Q and combine their results.
  • <p10, 7>, <p11, 6>, T = 5 (2 + 3)
  • <p11, 7>
  • T = 6 (3 + 3)
  • MQM terminates
  • Problem: MQM may visit the same node and discover
    the same data point many times (for different
    query points).
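A minimal sketch of MQM for k = 1 is given below. It assumes the usual threshold scheme: each query point keeps the distance of its last retrieved NN as a threshold, and the search stops once the sum of the thresholds T reaches the best group distance found so far. Distance-sorted lists stand in for incremental NN search on an R-tree, and the function names are hypothetical.

```python
import math

def group_dist(p, Q):
    """Sum of Euclidean distances from p to every query point in Q."""
    return sum(math.dist(p, q) for q in Q)

def mqm(P, Q):
    """Return (best point, its group distance) for the group NN query."""
    # One ascending-distance stream per query point.
    streams = [iter(sorted(P, key=lambda p, q=q: math.dist(p, q))) for q in Q]
    thresholds = [0.0] * len(Q)
    best, best_dist = None, math.inf
    while sum(thresholds) < best_dist:
        progressed = False
        for i, q in enumerate(Q):
            p = next(streams[i], None)
            if p is None:
                continue  # this stream is exhausted
            progressed = True
            thresholds[i] = math.dist(p, q)
            d = group_dist(p, Q)
            if d < best_dist:
                best, best_dist = p, d
            # Any unseen point has group distance >= sum(thresholds).
            if sum(thresholds) >= best_dist:
                break
        if not progressed:
            break
    return best, best_dist
```

The repeated calls to group_dist for the same point, reached through different streams, mirror the redundancy noted in the last bullet.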

26
Minimum Bounding Method (MBM)
  • Applies the MBR of Q to prune the search space.
  • Heuristic 1: Let M be the MBR of Q, n the
    cardinality of Q, and best_dist the distance of
    the best GNN found so far. A node N cannot
    contain qualifying points if
    n · mindist(N, M) ≥ best_dist.
  • Heuristic 2: A node N cannot contain qualifying
    points if Σ(i=1..n) mindist(N, qi) ≥ best_dist.
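The two pruning tests can be sketched as follows, assuming the conditions are the lower bounds n · mindist(N, M) ≥ best_dist (Heuristic 1, with n the number of query points) and Σ mindist(N, qi) ≥ best_dist (Heuristic 2). Both bounds follow from the fact that no point inside N can be closer to a query point than the corresponding mindist; rectangles are (xmin, ymin, xmax, ymax) tuples and the function names are hypothetical.

```python
import math

def mindist_point(q, rect):
    """Minimum distance from point q to rectangle rect."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)

def mindist_rect(a, b):
    """Minimum distance between rectangles a and b (0 if they intersect)."""
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)

def prune_heuristic1(N, M, n, best_dist):
    """Cheap test: one rectangle-to-rectangle distance."""
    return n * mindist_rect(N, M) >= best_dist

def prune_heuristic2(N, Q, best_dist):
    """Tighter but costlier test: one distance per query point."""
    return sum(mindist_point(q, N) for q in Q) >= best_dist
```

Heuristic 2 prunes every node that Heuristic 1 prunes, so a reasonable design is to apply the cheap test first and the expensive one only to nodes that survive it.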

27
File Multiple Query Method (F-MQM)
  • What happens if Q does not fit in memory?
  • F-MQM sorts the query points according to their
    Hilbert value and splits Q into blocks Q1, ...,
    Qm that fit in memory.
  • For each block, it computes the GNN using one of
    the main memory algorithms.
  • It finally combines their results using MQM.
  • Complication: once the NN of a group has been
    retrieved, we cannot compute its global distance
    (i.e., with respect to all query points)
    immediately.

28
F-MQM (cont)
  • Solution: lazy evaluation
  • First we find the GNN p1 of the first group Q1
  • Then, we load in memory the second group Q2 and
    retrieve its NN p2. At the same time, we also
    compute the distance between p1 and Q2.
  • Similarly, when we load Q3, we update the current
    distances of p1 and p2 taking into account the
    objects of the third group.
  • After the end of the first round, we only have
    one data point (p1), whose global distance with
    respect to all query points has been computed.

29
File Minimum Bounding Method (F-MBM)
  • First, the points of Q are sorted by their
    Hilbert value and are assigned to groups (that
    fit in memory) according to this order.
  • For each group Qi, F-MBM keeps in memory its MBR
    Mi and cardinality ni (but not its contents).
  • F-MBM descends the R-tree of P (in depth-first or
    best-first traversal), only following nodes that
    may contain qualifying points.

Heuristic: Let best_dist be the distance of the
best GNN found so far. A node N can be safely
pruned if Σ(i=1..m) ni · mindist(N, Mi) ≥ best_dist.
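A sketch of this pruning test follows, assuming the condition is the lower bound Σ ni · mindist(N, Mi) ≥ best_dist, which holds because every query point of group Qi lies inside Mi. Groups are given as hypothetical (MBR, cardinality) pairs and rectangles as (xmin, ymin, xmax, ymax) tuples.

```python
import math

def mindist_rect(a, b):
    """Minimum distance between rectangles a and b (0 if they intersect)."""
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)

def fmbm_prune(N, groups, best_dist):
    """groups: [(Mi, ni), ...] kept in memory for the blocks of Q."""
    # Each of the ni points in group i is at least mindist(N, Mi) from N.
    bound = sum(n_i * mindist_rect(N, M_i) for M_i, n_i in groups)
    return bound >= best_dist
```

Only the MBRs and cardinalities are needed for the test, which matches the fact that the group contents stay on disk during the traversal.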