Nearest Neighbor Search in Spatial and Spatiotemporal Databases

Slides: 30

Provided by: Sun163

1
Nearest Neighbor Search in Spatial and
Spatiotemporal Databases

Dimitris Papadias
Hong Kong University of Science and Technology

2
Spatial and spatiotemporal databases

Spatial databases manage large collections of
multi-dimensional objects.
Important query types:
Window query: Retrieve all rivers in CA
Nearest neighbor: Find my nearest gas station
Spatial join: Report pairs (city C, river R)
such that R crosses C
Spatiotemporal databases handle the same
queries, but over moving objects:
Mobile computing
Traffic supervision
Flight control
Weather forecasting

3
R-trees (Guttman, SIGMOD 84; Sellis et al., VLDB 87;
Beckmann et al., SIGMOD 90)
4
TPR-trees (Saltenis et al., SIGMOD 00; our group,
VLDB 03)

Extends the R-tree by introducing the velocity
bounding rectangle (VBR) in non-leaf entries.
Objects are grouped together based on both their
location and velocities.

5
Conventional NN search with R- (TPR-) trees

Depth-first traversal (Roussopoulos et al., SIGMOD 95)
Best-first traversal (Hjaltason and Samet, TODS
99): incremental and optimal
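
Best-first traversal can be sketched with a priority queue keyed by mindist. This is a minimal illustration, not the TODS 99 implementation: the toy Node class, bounding_rect and the hand-built two-leaf tree are assumptions standing in for a real R-tree.

```python
import heapq
import math
from itertools import count

def mindist(q, rect):
    """Minimum Euclidean distance from point q to rect = (xlo, ylo, xhi, yhi)."""
    xlo, ylo, xhi, yhi = rect
    dx = max(xlo - q[0], 0.0, q[0] - xhi)
    dy = max(ylo - q[1], 0.0, q[1] - yhi)
    return math.hypot(dx, dy)

def bounding_rect(points):
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

class Node:
    """Toy tree node (hypothetical): a leaf holds points, an inner node holds children."""
    def __init__(self, points=None, children=None):
        self.points = points or []
        self.children = children or []
        rects = [c.rect for c in self.children]
        corners = [(r[0], r[1]) for r in rects] + [(r[2], r[3]) for r in rects]
        self.rect = bounding_rect(self.points + corners)

def best_first_nn(root, q):
    """Yield (distance, point) pairs in increasing distance from q."""
    tie = count()  # tie-breaker so the heap never compares Node objects
    heap = [(mindist(q, root.rect), next(tie), root)]
    while heap:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Expand the node: enqueue children by mindist, points by distance.
            for child in item.children:
                heapq.heappush(heap, (mindist(q, child.rect), next(tie), child))
            for p in item.points:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:
            yield d, item  # a point at the head of the queue is the next NN

root = Node(children=[Node(points=[(1, 1), (2, 3)]),
                      Node(points=[(8, 8), (9, 6)])])
print(next(best_first_nn(root, (2, 2))))  # first result: (1.0, (2, 3))
```

Because results are produced lazily, the caller can stop after any number of neighbors, which is what "incremental" refers to.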

6
NN search - other approaches

Several algorithms and theoretical performance
bounds have been devised for exact and
approximate processing in main memory. Here we
care about I/O efficiency (minimizing node and
page accesses) and about cost models that predict
practical performance (suitable for query
optimization).
Several approaches address NN search in
high-dimensional spaces, but that problem is
different because of the dimensionality curse.
Here we consider low-dimensional spaces (spatial
and spatiotemporal databases).
Ferhatosmanoglu et al. (SSTD 01) find the NN
in a constrained area of the data space (e.g.,
find the NN to the south of the query point).
Korn and Muthukrishnan (SIGMOD 00) discuss
reverse nearest neighbor queries, where the goal
is to retrieve the data points whose nearest
neighbor is a specified query point.
Korn et al. (VLDB 02) study the same problem in
the context of data streams, where the data are
not known in advance.

7
NN search for mobile queries

Zheng and Lee (SSTD 01) return the current NN
and the validity time of the result.
Restrictions: (i) assumes a maximum speed; (ii)
applicable only to a single NN; (iii) requires
Voronoi diagrams.

Song and Roussopoulos (SSTD 01) minimize the
number of queries for moving clients by returning
m > k NNs.
Problem: how to determine m.

If $2 \cdot dist(q,q') \le dist(q,b) - dist(q,a)$, then the 2
NNs at q' are among the 4 NNs of the first query.
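
The condition can be checked with a few lines of code. This is a sketch under an assumption: since the slide's figure is not in the transcript, a is taken to be the k-th NN and b the m-th NN of the first query at q. The function names are hypothetical.

```python
import math

def knn(points, q, k):
    """Brute-force k nearest neighbors of q, closest first."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

def cached_result_still_valid(cached, q, q_new, k):
    """cached: the m > k NNs retrieved at q, closest first.

    Returns True if the k NNs at q_new are guaranteed to be among them.
    """
    d_k = math.dist(q, cached[k - 1])  # dist(q, a): assumed to be the k-th NN at q
    d_m = math.dist(q, cached[-1])     # dist(q, b): assumed to be the m-th NN at q
    return 2 * math.dist(q, q_new) <= d_m - d_k

points = [(1, 0), (0, 2), (5, 0), (0, 9), (20, 20)]
q = (0, 0)
cached = knn(points, q, 4)                              # m = 4
print(cached_result_still_valid(cached, q, (1, 1), 2))  # k = 2
```

When the check fails, the client has to issue a new query; a larger m makes that less frequent at the price of a larger first result.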
8
Time parameterized NN (our group, SIGMOD 02)

Assuming a constant and known velocity, a TPNN
query returns:
The current query result R
The validity period T of R
The change C of the result at the end of T

Result: R = {i}, T = 2, C = {j}
9
TP NN queries: influence time

Some objects have infinite influence time.
The object that will become the next nearest
neighbor is the one with the minimum influence
time.
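
A minimal sketch of the influence time, assuming static data points and a query point that moves as q + t*v. The influence time of an object o is taken here to be the time at which o becomes as close to the query as the current NN. The closed form follows from equating the two squared distances, and the function names are hypothetical.

```python
import math

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def influence_time(q, v, nn, o):
    """Earliest t >= 0 at which o is as close as nn to q + t*v (math.inf if never)."""
    num = sqdist(q, o) - sqdist(q, nn)  # >= 0 because nn is the current NN
    den = 2 * sum(vi * (oi - ni) for vi, oi, ni in zip(v, o, nn))
    return num / den if den > 0 else math.inf

def tp_nn(points, q, v):
    """Brute-force TP NN over a list of points: returns (R, T, C)."""
    nn = min(points, key=lambda p: sqdist(q, p))
    times = {p: influence_time(q, v, nn, p) for p in points if p != nn}
    T = min(times.values(), default=math.inf)
    C = [p for p, t in times.items() if t == T and T != math.inf]
    return nn, T, C

# The query starts at the origin and moves along the x-axis with unit speed.
print(tp_nn([(0, 1), (4, 1), (-3, 0)], (0, 0), (1, 0)))
# ((0, 1), 2.0, [(4, 1)]): (-3, 0) lies behind the query and has infinite influence time
```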

10
Processing TP NN with R- (TPR-) trees

Influence time of an MBR: the earliest possible
time at which any object in the MBR can become
the new NN.

Algorithm: traverse the R-tree depth-first or
best-first, using the influence time instead of
mindist.
The cost of a TPNN query is about the same as that
of a conventional query, because we have to visit
the influencing nodes anyway (to find the NN).

11
Continuous Nearest Neighbors (CNN) (our group,
VLDB 02)

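Given a line segment q = [s, e], find the NN of
every point on q.
Result representation: {s (.NN = a), s1 (.NN = c),
s2 (.NN = f), s3 (.NN = h), e}.
The points (s, s1, s2, s3, e) are the split
points: a is the NN of every point in [s, s1], c
of every point in [s1, s2], and so on.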
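
The split points can be illustrated with a brute-force sweep along the segment. This sketch is not the R-tree algorithm of the VLDB 02 paper. It scans all points, parameterizes the segment as s + t*(e - s) with t in [0, 1], and finds each split as the nearest crossing of a perpendicular bisector. All names are hypothetical.

```python
import math

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cnn_split_list(points, s, e):
    """Return [(t, nn), ...]: nn is the NN of s + t*(e - s) from parameter t
    until the next entry's t (or until t = 1 for the last entry)."""
    v = tuple(ei - si for si, ei in zip(s, e))
    nn = min(points, key=lambda p: sqdist(s, p))
    t_cur, result = 0.0, [(0.0, nn)]
    while True:
        best_t, best_o = math.inf, None
        for o in points:
            den = 2 * sum(vi * (oi - ni) for vi, oi, ni in zip(v, o, nn))
            if den <= 0:
                continue  # o never overtakes nn in the direction of travel
            # Parameter at which the bisector of o and nn crosses the segment line.
            t = (sqdist(s, o) - sqdist(s, nn)) / den
            if t_cur <= t < best_t:
                best_t, best_o = t, o
        if best_o is None or best_t >= 1.0:
            return result
        t_cur, nn = best_t, best_o
        result.append((t_cur, nn))

print(cnn_split_list([(1, 1), (3, 1), (2, 5)], (0, 0), (4, 0)))
# [(0.0, (1, 1)), (0.5, (3, 1))]: one split point at the middle of the segment
```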

12
Main idea

Maintain the set of split points incrementally.

[Figure: the split list after processing a, and after processing c]
13
Processing CNN with an R- (TPR-) tree

Avoid examining all points.
Given an MBR E and query segment q, E must be
searched if and only if there exists a split
point $s_i \in SL$ such that
$dist(s_i, s_i.NN) > mindist(s_i, E)$.
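
The pruning test translates directly into code. A minimal sketch, assuming the split list is kept as (split point, current NN) pairs and MBRs are (xlo, ylo, xhi, yhi) tuples. Both representations are assumptions for illustration.

```python
import math

def mindist(p, rect):
    """Minimum Euclidean distance from point p to rect = (xlo, ylo, xhi, yhi)."""
    xlo, ylo, xhi, yhi = rect
    dx = max(xlo - p[0], 0.0, p[0] - xhi)
    dy = max(ylo - p[1], 0.0, p[1] - yhi)
    return math.hypot(dx, dy)

def must_search(split_list, rect):
    """True if some split point could find a closer NN inside rect."""
    return any(math.dist(s, nn) > mindist(s, rect) for s, nn in split_list)

SL = [((0, 0), (1, 1)), ((2, 0), (3, 1))]
print(must_search(SL, (10, 10, 11, 11)))    # False: too far from every split point
print(must_search(SL, (0.5, -0.5, 1, 0.5))) # True: closer to s than its current NN
```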

14
Location Based NN queries (LBNN) (our group,
SIGMOD 03)

A location-based kNN query q returns:
The current k NNs
A validity region such that the result remains
the same as long as q remains in the region.
For a single NN (k = 1), the validity region of q
is the Voronoi cell (VC) of the NN o.

15
Computing the Voronoi Cell on-the-fly

Step 1: Find the current NN
Step 2: Use time-parameterized (TP) NN queries to
tighten the validity region

16
NN queries in road networks (our group, VLDB 03)

Find my nearest gas station in terms of driving
distance.
Answer: Hotel b (the Euclidean NN is d)

Assumptions:
We can incrementally compute Euclidean NNs using
conventional NN algorithms.
We can compute the network distance between the
query and any point (i.e., the length of the
shortest path connecting them) using Dijkstra's
algorithm.
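
The two assumptions combine into a simple filter-and-refine loop, sketched below. It relies on the Euclidean distance being a lower bound of the network distance. The query and the data points are assumed to sit on network nodes, and the graph, coordinates and function names are made up for illustration.

```python
import heapq
import math

def network_distances(graph, source):
    """Dijkstra: graph maps node -> {neighbor: edge_length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for w, length in graph[u].items():
            nd = d + length
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

def euclidean_restriction_nn(graph, coords, q, candidates):
    """Network NN of node q among the candidate nodes."""
    net = network_distances(graph, q)
    # Sorting stands in for incremental Euclidean NN retrieval from an R-tree.
    by_euclid = sorted(candidates, key=lambda c: math.dist(coords[q], coords[c]))
    best, best_d = None, math.inf
    for c in by_euclid:
        if math.dist(coords[q], coords[c]) >= best_d:
            break  # no remaining candidate can have a shorter network distance
        d = net.get(c, math.inf)
        if d < best_d:
            best, best_d = c, d
    return best, best_d

coords = {"q": (0, 0), "b": (0, 2), "d": (1, 0), "x": (1, 3)}
graph = {"q": {"b": 2.0, "x": 3.2}, "b": {"q": 2.0},
         "x": {"q": 3.2, "d": 3.0}, "d": {"x": 3.0}}
print(euclidean_restriction_nn(graph, coords, "q", ["b", "d"]))
# ('b', 2.0): d is the Euclidean NN, but reaching it requires a detour through x
```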

17
Euclidean Restriction Algorithm
[Figure: 1st Euclidean NN, 2nd Euclidean NN]
18
Network Expansion Algorithm
19
NN in the presence of obstacles (not published)

The NN of q in terms of obstructed distance is b,
although the Euclidean NN is a.

20
Visibility graphs

Visibility graphs have been used widely in
computational geometry for shortest path problems
(e.g., find the shortest path from p_start to
p_end that does not cross any obstacle).

Problem: We cannot maintain the entire visibility
graph in memory for real spatial datasets.
Solution: We only need the obstacles and objects
that affect the result of the query.

21
Obstacle nearest neighbor algorithm

Idea: similar to the Euclidean Restriction
algorithm for road networks.

But how do we perform the obstructed distance
computations?

22
Obstructed distance computation

Goal: compute the obstructed distance between p
and q.
First, retrieve the obstacles o1, o2 within the
Euclidean range.
Compute a provisional distance d1(p,q) using only
o1, o2.
d1(p,q) is not enough, because the shortest path
is obstructed by o3.
Perform a second Euclidean range query on the
obstacle R-tree using d1(p,q) as the radius and
retrieve o3, o4.
Compute a new obstructed distance d2(p,q) taking
o3, o4 into account.
Repeat the process until the obstructed distance
remains the same for two consecutive iterations.

23
Other related work

By our group: concepts similar to the ones
presented here apply to several other spatial
queries, e.g., TP spatial joins and continuous
window queries.
Cost models for TP and continuous queries (TODS
03).
Analysis of predictive NN (and other) queries
(TODS, to appear).
An Efficient Cost Model for Optimization of
Nearest Neighbor Search in Low and Medium
Dimensional Spaces (TKDE, to appear).
By other groups: increasing interest in novel
types of NN search in the context of mobile
computing and data stream applications.
Iwerks et al. (VLDB 03) discuss continuous NN in
the presence of object updates.
Shekhar et al. (ACM GIS 03) discuss the in-route
nearest neighbor query, which, given a
trajectory, retrieves the single NN (e.g., gas
station) that results in the minimum diversion
from the trajectory.
Jensen et al. (ACM GIS 03) discuss NN for objects
moving on road networks.

24
Group NN queries (our group, ICDE 04)

Input: a set $P = \{p_1, \ldots, p_N\}$ of static data points in
multidimensional space and a group of query
points $Q = \{q_1, \ldots, q_n\}$.
Output: the $k$ ($\ge 1$) data point(s) with the
smallest sum of distances to all points in Q. The
distance between a data point p and Q is defined
as $dist(p,Q) = \sum_{i=1}^{n} |pq_i|$, where $|pq_i|$ is the
Euclidean distance between p and query point $q_i$.
Example: three users at locations q1, q2 and q3
want to find a meeting point (e.g., a
restaurant); the corresponding query returns the
data point p that minimizes the sum of Euclidean
distances $|pq_i|$ for $1 \le i \le 3$.
Assumption: the data points are indexed by an
R-tree. Q may or may not fit in main memory.
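
The definition can be made concrete with a brute-force baseline that ignores the R-tree and scans P. The function names and the sample coordinates are made up for illustration.

```python
import heapq
import math

def group_dist(p, Q):
    """dist(p, Q): sum of Euclidean distances from p to every query point."""
    return sum(math.dist(p, q) for q in Q)

def gnn_brute_force(P, Q, k=1):
    """The k data points with the smallest group distance, best first."""
    return heapq.nsmallest(k, P, key=lambda p: group_dist(p, Q))

Q = [(0, 0), (4, 0), (2, 3)]      # three users
P = [(2, 1), (10, 10), (0, 0)]    # candidate meeting points
print(gnn_brute_force(P, Q))      # [(2, 1)]
```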

25
Multiple Query Method (MQM)

Idea: perform incremental NN queries for each
point in Q and combine their results.

[Figure: MQM example. Candidates <p10, 7> and <p11, 6> with T = 5 (2 + 3); then <p11, 7> with T = 6 (3 + 3), at which point MQM terminates.]

Problem: MQM may visit the same node and discover
the same data point many times (for different
query points).
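
A minimal sketch of the idea for k = 1. Each query point q_i gets its own incremental NN stream and a threshold t_i, the distance of the last neighbor it returned. Any data point not yet seen by any stream has group distance at least T = t_1 + ... + t_n, so the search can stop once T reaches the best distance found. The sorted lists stand in for incremental R-tree NN queries, and the termination rule is an assumption about the algorithm, since the slide text does not spell it out.

```python
import math

def mqm(P, Q):
    """Group NN of Q in P using one incremental NN stream per query point."""
    streams = [iter(sorted(P, key=lambda p, q=q: math.dist(p, q))) for q in Q]
    thresholds = [0.0] * len(Q)  # t_i: distance of the last NN returned for q_i
    best, best_dist = None, math.inf
    while sum(thresholds) < best_dist:
        progressed = False
        for i, q in enumerate(Q):
            p = next(streams[i], None)
            if p is None:
                continue  # this stream is exhausted
            progressed = True
            thresholds[i] = math.dist(p, q)
            d = sum(math.dist(p, qj) for qj in Q)
            if d < best_dist:
                best, best_dist = p, d
            if sum(thresholds) >= best_dist:
                break  # no unseen point can beat the current best
        if not progressed:
            break  # every stream is exhausted
    return best, best_dist

print(mqm([(2, 1), (10, 10), (0, 0)], [(0, 0), (4, 0), (2, 3)])[0])  # (2, 1)
```

The same data point is typically returned by several streams, which is the redundancy noted above.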

26
Minimum Bounding Method (MBM)

Applies the MBR of Q to prune the search space.
Heuristic 1: Let M be the MBR of Q, and best_dist
be the distance of the best GNN found so far. A
node N cannot contain qualifying points if
[inequality not preserved in the transcript]

Heuristic 2: A node N cannot contain qualifying
points if
[inequality not preserved in the transcript]
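
The inequalities of the two heuristics are not preserved in this transcript. The sketch below assumes the bounds that follow from the definition of dist(p,Q): every point p inside N satisfies |pq_i| >= mindist(N, q_i) >= mindist(N, M), so N can be skipped once n * mindist(N, M) >= best_dist (a cheap test) or once the sum of mindist(N, q_i) over Q reaches best_dist (a tighter test). Function names and the rectangle layout (xlo, ylo, xhi, yhi) are hypothetical.

```python
import math

def mindist_point_rect(p, rect):
    xlo, ylo, xhi, yhi = rect
    dx = max(xlo - p[0], 0.0, p[0] - xhi)
    dy = max(ylo - p[1], 0.0, p[1] - yhi)
    return math.hypot(dx, dy)

def mindist_rect_rect(a, b):
    dx = max(b[0] - a[2], 0.0, a[0] - b[2])
    dy = max(b[1] - a[3], 0.0, a[1] - b[3])
    return math.hypot(dx, dy)

def prune_by_mbr(node_rect, M, n, best_dist):
    """Assumed form of heuristic 1: one rectangle-to-rectangle computation."""
    return n * mindist_rect_rect(node_rect, M) >= best_dist

def prune_by_points(node_rect, Q, best_dist):
    """Assumed form of heuristic 2: tighter, but needs n distance computations."""
    return sum(mindist_point_rect(q, node_rect) for q in Q) >= best_dist

Q = [(0, 0), (2, 0)]
M = (0, 0, 2, 0)      # MBR of Q
N = (5, 0, 6, 1)      # MBR of an R-tree node
print(prune_by_mbr(N, M, len(Q), 7.0))   # False: 2 * 3 < 7
print(prune_by_points(N, Q, 7.0))        # True: 5 + 3 >= 7
```

Applying the cheap test first and the tighter one only when it fails keeps the number of distance computations low.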

27
File Multiple Query Method (F-MQM)

What happens if Q does not fit in memory?
F-MQM sorts the query points by their Hilbert
value and splits Q into blocks Q1, ..., Qm that
fit in memory.
For each block, it computes the GNN using one of
the main-memory algorithms.
Finally, it combines their results using MQM.
Complication: once the NN of a group has been
retrieved, we cannot compute its global distance
(i.e., with respect to all query points)
immediately.

28
F-MQM (cont)

Solution: lazy evaluation
First, we find the GNN p1 of the first group Q1.
Then, we load the second group Q2 into memory and
retrieve its GNN p2. At the same time, we also
compute the distance between p1 and Q2.
Similarly, when we load Q3, we update the current
distances of p1 and p2, taking into account the
query points of the third group.
After the end of the first round, we have only
one data point (p1) whose global distance with
respect to all query points has been computed.

29
File Minimum Bounding Method (F-MBM)

First, the points of Q are sorted by their
Hilbert value and are assigned to groups (that
fit in memory) according to this order.
For each group Qi, F-MBM keeps in memory its MBR
Mi and cardinality ni (but not its contents).
F-MBM descends the R-tree of P (in depth-first or
best-first traversal), only following nodes that
may contain qualifying points.

Heuristic: Let best_dist be the distance of the
best GNN found so far. A node N can be safely
pruned if
[inequality not preserved in the transcript]
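
The pruning inequality is not preserved in this transcript. A sketch under the assumption that it is the weighted bound implied by the definitions: each of the ni query points in group Qi lies inside Mi, so every point of N is at least mindist(N, Mi) away from each of them, and N can be skipped once the sum of ni * mindist(N, Mi) over all groups reaches best_dist. The function names and the rectangle layout (xlo, ylo, xhi, yhi) are hypothetical.

```python
import math

def mindist_rect_rect(a, b):
    dx = max(b[0] - a[2], 0.0, a[0] - b[2])
    dy = max(b[1] - a[3], 0.0, a[1] - b[3])
    return math.hypot(dx, dy)

def prune_fmbm(node_rect, groups, best_dist):
    """groups: [(M_i, n_i), ...], the MBR and cardinality of each block of Q."""
    bound = sum(n_i * mindist_rect_rect(node_rect, M_i) for M_i, n_i in groups)
    return bound >= best_dist

groups = [((0, 0, 1, 1), 3), ((10, 0, 11, 1), 2)]
N = (4, 0, 5, 1)
print(prune_fmbm(N, groups, 19.0))  # True: 3 * 3 + 2 * 5 = 19
print(prune_fmbm(N, groups, 20.0))  # False
```

Only the MBRs and cardinalities are needed for this test, which is why the contents of the groups can stay on disk while the tree is traversed.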
