Efficient k Nearest Neighbor Queries on Remote Spatial Databases Using Range Estimation

1
Efficient k Nearest Neighbor Queries on Remote
Spatial Databases Using Range Estimation
  • Danzhou Liu, Ee-Peng Lim, Wee-Keong Ng
  • Center for Advanced Information Systems, School
    of Computer Engineering
  • Nanyang Technological University, Nanyang Ave,
    Singapore 639798, Singapore

2
Outline
  • Introduction
  • Related work
  • k-NN query algorithm based on range estimation
  • Range estimation methods
  • Experiments
  • Conclusions

3
Introduction
  • A spatial database provides persistent storage for
    spatial objects (e.g., points, polylines,
    polygons)
  • A spatial database supports:
    • Representation of spatial attributes
    • Storage/indexing of spatial data values using
      spatial indices (e.g., R-tree and Quadtree)
    • Queries involving spatial attributes

4
k-Nearest Neighbor Queries
  • Definition
    • A k-Nearest Neighbor (k-NN) query locates the k
      spatial objects nearest to a given query point
  • Wide range of applications
    • Geographic Information Systems (GIS), e.g.,
      finding the two nearest hospitals
    • Computer Aided Design (CAD), e.g., finding the
      three nearest resistors on a circuit board
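As a baseline for the definition above, the k nearest neighbors can be found by scanning every object. This is a minimal sketch, assuming point objects, Euclidean distance, and a dataset that is fully available locally (which is exactly what a remote Web database does not offer); `knn_brute_force` is a hypothetical helper name.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def knn_brute_force(points: Sequence[Point], q: Point, k: int) -> List[Point]:
    """Return the k points nearest to q (Euclidean distance), nearest first.

    Linear scan over all objects: no index is used and the full dataset is needed.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

# Hypothetical example: the two hospitals nearest to the origin.
hospitals = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 10.0)]
print(knn_brute_force(hospitals, (0.0, 0.0), 2))  # [(0.0, 0.0), (1.0, 1.0)]
```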

5
Motivation
  • Large volume of spatial data on the WWW
    • Geospatial Data Clearinghouse (a collection of
      over 250 spatial database servers)
    • Yahoo, TIGER, and other map services
  • Limited Web-based query interfaces
    • Support only simple spatial queries (e.g., window
      queries)
    • No support for remote index access

6
The Geospatial Data Clearinghouse
  • Large amount of useful geospatial information on
    the WWW

7
The Geospatial Data Clearinghouse
  • Limited Web-based query interface supports only
    window queries

8
Objective
  • Develop efficient algorithms to evaluate k-NN
    queries on remote spatial databases using window
    queries
  • Propose a generic k-NN query processing algorithm
    that accommodates different range estimation
    methods
  • Develop efficient range estimation methods
  • Conduct experiments to evaluate the performance of
    the proposed range estimation methods
  • Develop sampling methods to obtain statistical
    knowledge of remote databases needed for range
    estimation methods

9
Related Work
  • Algorithms for simple k-NN queries may be divided
    into three major groups:
    • Partition-based algorithms
    • Graph-based algorithms
    • Range-based algorithms

10
Partition-based Algorithms
  • Retrieve k nearest neighbors from spatial indices
    by pruning away nodes that cannot lead to the k
    nearest neighbors
  • Examples
    • Branch-and-bound R-tree traversal algorithm
    • Algorithm that returns neighbors in a pipelined
      fashion
  • Not applicable to the Web environment
    • Spatial indices are usually not available to
      non-local applications
    • Creating local indices is infeasible due to the
      large amount of data
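The pruning idea behind partition-based methods can be illustrated with a best-first search over bounding boxes: a bucket is opened only if its MBB could still contain an object closer than those already reported. This is a simplified, one-level variant (buckets of points with their MBBs rather than a full R-tree), assuming point objects and Euclidean distance; `mindist` and `knn_best_first` are hypothetical names. It needs direct access to the index, which remote Web databases do not expose.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mindist(q: Point, box: Box) -> float:
    """Lower bound on the distance from q to any object inside box (0 if q is inside)."""
    dx = max(box[0] - q[0], 0.0, q[0] - box[2])
    dy = max(box[1] - q[1], 0.0, q[1] - box[3])
    return math.hypot(dx, dy)

def knn_best_first(buckets: List[Tuple[Box, List[Point]]], q: Point, k: int) -> List[Point]:
    """Best-first k-NN over a one-level index of (MBB, points) buckets.

    The heap mixes buckets (keyed by MINDIST) and points (keyed by exact distance).
    A point popped from the heap is no farther than anything still queued, so it is
    the next nearest neighbour; buckets whose MINDIST exceeds the k-th distance are
    never opened, i.e. they are pruned.
    """
    heap = []
    tie = 0  # unique tie-breaker so payloads are never compared
    for box, pts in buckets:
        heapq.heappush(heap, (mindist(q, box), tie, False, pts))
        tie += 1
    result: List[Point] = []
    while heap and len(result) < k:
        _, _, is_point, payload = heapq.heappop(heap)
        if is_point:
            result.append(payload)
        else:  # open the bucket: enqueue each of its objects
            for p in payload:
                heapq.heappush(heap, (math.dist(q, p), tie, True, p))
                tie += 1
    return result

buckets = [((0.0, 0.0, 1.0, 1.0), [(0.2, 0.2), (0.9, 0.9)]),
           ((5.0, 5.0, 6.0, 6.0), [(5.5, 5.5)]),
           ((0.0, 2.0, 1.0, 3.0), [(0.5, 2.5)])]
print(knn_best_first(buckets, (0.0, 0.0), 2))  # [(0.2, 0.2), (0.9, 0.9)]
```

Here the two far buckets are never opened, because the first bucket already yields two objects closer than their MINDIST.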

11
Graph-based Algorithms
  • Pre-compute the nearest neighbors of spatial objects
    and create new index structures for the pre-computed
    nearest neighbor information to support search
  • Example
    • Voronoi-based algorithm
  • Not applicable to the Web environment
    • Retrieving all spatial objects from remote database
      servers is sometimes impractical
    • Creating local indices is infeasible due to the
      large amount of data

12
Range-based Algorithms
  • Use range queries to retrieve k nearest neighbors
  • Examples
    • Use sampling for range estimation
    • Use distance distributions for range estimation
    • Use reference points for range estimation
  • Not applicable to the Web environment
    • Determining the sample size and selecting samples
      of spatial objects properly remain a challenge
    • Creating local indices is infeasible due to the
      large amount of data
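To make the sampling idea concrete: a uniform random sample of s out of N objects should contain about k·s/N of the query point's k nearest neighbors, so the distance to the ceil(k·s/N)-th nearest sample point can serve as an estimate of the k-NN range. This is a minimal sketch, assuming point objects and a sampler that can draw uniformly from the whole dataset; `sample_range` and its parameters are hypothetical, and choosing `sample_size` well is the open issue the slide points out.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, float]

def sample_range(points: Sequence[Point], q: Point, k: int,
                 sample_size: int, seed: int = 0) -> float:
    """Estimate the distance from q to its k-th nearest neighbour from a sample.

    A uniform sample of s out of N objects is expected to hold about k*s/N of
    q's k nearest neighbours, so the distance to the ceil(k*s/N)-th nearest
    sample point (at least the 1st) is returned as the range estimate.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(points), sample_size)
    j = max(1, math.ceil(k * sample_size / len(points)))
    dists = sorted(math.dist(p, q) for p in sample)
    return dists[j - 1]
```

With `sample_size` equal to N the estimate is exact; smaller samples read fewer objects at the cost of a less reliable range.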

13
Proposed k-NN Algorithm
  • Based on range estimation
  • New strategies for k-NN query evaluation in the Web
    environment are required
  • Uses window queries to probe the spatial database
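A minimal sketch of such a generic algorithm, assuming point objects, Euclidean distance, and square windows centered on the query point. The remote server is simulated by a local `window_query` function, and the two estimator callbacks stand in for EstiRange1 and EstiRange2; all names and signatures are hypothetical, not the authors' interface. The termination check rests on a geometric fact: once a window of half-side r holds at least k objects, the answer is exact if the k-th distance d_k is at most r; otherwise one more window of half-side d_k covers the whole circle of radius d_k.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def window_query(data: List[Point], q: Point, r: float) -> List[Point]:
    """Simulated remote window query: all objects in the axis-aligned
    square of half-side r centred on q."""
    return [p for p in data if abs(p[0] - q[0]) <= r and abs(p[1] - q[1]) <= r]

def knn_by_windows(data: List[Point], q: Point, k: int,
                   esti_range1: Callable[[Point, int], float],
                   esti_range2: Callable[[Point, int, float, int], float],
                   max_iter: int = 64) -> Tuple[List[Point], int]:
    """Generic k-NN evaluation using only window queries.

    Returns the k nearest objects and the number of window queries issued.
    """
    def dist(p: Point) -> float:
        return math.dist(p, q)

    r = esti_range1(q, k)                       # initial range
    for h in range(1, max_iter + 1):
        found = window_query(data, q, r)
        if len(found) >= k:
            found.sort(key=dist)
            d_k = dist(found[k - 1])
            if d_k <= r:                        # circle of radius d_k lies inside the window
                return found[:k], h
            # Closer objects may lie outside the square: one more query with
            # half-side d_k covers the whole circle of radius d_k.
            found = sorted(window_query(data, q, d_k), key=dist)
            return found[:k], h + 1
        r = esti_range2(q, k, r, len(found))    # range too small: enlarge it
    raise RuntimeError("no window with k objects found within max_iter queries")

# Usage with deliberately naive estimators: start tiny, double on failure.
grid = [(x / 10, y / 10) for x in range(11) for y in range(11)]
nearest, queries = knn_by_windows(grid, (0.53, 0.47), 4,
                                  lambda q, k: 0.01,
                                  lambda q, k, r, m: 2 * r)
```

Better estimators reduce the number of window queries; correctness does not depend on them, only the cost does.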

14
Density-based Range Estimation Method
  • Based on the assumption that spatial objects are
    uniformly distributed
  • Range estimated by the EstiRange1 function (initial
    window):
  • Ranges estimated by the EstiRange2 function (later
    windows, when the current range is not large enough):
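The exact EstiRange1 and EstiRange2 formulas are not given here. The sketch below shows one plausible derivation under the same uniform-distribution assumption and should not be read as the authors' formulas. With N objects spread over a region of area A, a square window of half-side r is expected to hold 4r²·N/A objects, so EstiRange1 solves 4r²·N/A = k for r. For EstiRange2 it assumes, hypothetically, that the density observed in the previous window is reused: r is scaled by sqrt(k/m) when only m < k objects came back, and doubled when none did.

```python
import math

def esti_range1(n_objects: int, area: float, k: int) -> float:
    """Initial half-side r of a square window expected to contain k objects,
    assuming a uniform distribution: 4 * r**2 * (n_objects / area) = k."""
    density = n_objects / area
    return math.sqrt(k / (4.0 * density))

def esti_range2(r_prev: float, m_found: int, k: int) -> float:
    """Enlarged half-side after a window of half-side r_prev returned only
    m_found < k objects (hypothetical rule, not the paper's formula).

    Re-applies the uniform assumption with the locally observed density
    m_found / (2 * r_prev)**2, i.e. scales r by sqrt(k / m_found);
    doubles the window when it came back empty.
    """
    if m_found == 0:
        return 2.0 * r_prev
    return r_prev * math.sqrt(k / m_found)

# e.g. 10,000 objects in a unit square, k = 4
r1 = esti_range1(10_000, 1.0, 4)   # about 0.01
r2 = esti_range2(r1, 1, 4)         # about 0.02: only 1 of 4 objects was found
```

Because the estimate is tuned to the expected count, it can fall short in sparse regions, which is why such a tight method may need EstiRange2 more than once.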

15
Bucket-based Range Estimation Method
  • Uses summary information about partitions, or
    buckets, of spatial objects for range estimation
  • Summary information
    • Bucket MBB and number of spatial objects in the bucket
  • Buckets are created using different strategies [1]
  • Sort the maximum distances between the buckets and
    the query point
  • The estimated range is the smallest bucket-to-query-point
    maximum distance such that the buckets within it
    contain at least k objects
  • Uses one window query
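A minimal sketch of the bucket-based estimate, assuming each bucket is summarized by its MBB and object count. If r is the maximum (farthest-corner) distance from the query point to some bucket, every bucket whose maximum distance is at most r lies entirely inside the circle of radius r. Once those buckets hold at least k objects, the k-th nearest distance is at most r, so a single square window of half-side r suffices. `maxdist` and `bucket_range` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def maxdist(q: Point, box: Box) -> float:
    """Distance from q to the farthest corner of box."""
    dx = max(abs(q[0] - box[0]), abs(q[0] - box[2]))
    dy = max(abs(q[1] - box[1]), abs(q[1] - box[3]))
    return math.hypot(dx, dy)

def bucket_range(buckets: List[Tuple[Box, int]], q: Point, k: int) -> float:
    """Smallest bucket max-distance r such that the buckets lying entirely
    within distance r of q hold at least k objects in total."""
    total = 0
    for r, count in sorted((maxdist(q, box), count) for box, count in buckets):
        total += count                 # this bucket is now fully inside radius r
        if total >= k:
            return r
    raise ValueError("fewer than k objects in the summarized dataset")

buckets = [((0.0, 0.0, 1.0, 1.0), 3),
           ((1.0, 1.0, 2.0, 2.0), 4),
           ((-3.0, -3.0, -2.0, -2.0), 10)]
r = bucket_range(buckets, (0.0, 0.0), 5)   # about 2.83, the second bucket's max distance
```

The range is deliberately generous (a loose estimate), which is what lets the method finish with one window query.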

16
  • Example (k = 5)

17
Experiments
  • New Jersey road dataset from TIGER [30]

18
  • Performance measures
    • Number of iterations, h
    • Accuracy, A

19
Experimental Results
  • Minimum, maximum and upper bounds on the number
    of iterations of the density-based range
    estimation method

20
  • Iteration and accuracy of the density-based range
    estimation method

21
Experimental Results
  • Efficiency of density-based and bucket-based
    range estimation methods

22
Conclusions
  • A window query approach to evaluating k-NN queries
    on remote spatial databases, motivated by
    • The large amount of spatial information on the Web
    • Limited query interfaces
  • Proposed range estimation methods
    • Performance increases with k
    • No clear winner

24
Types of Range Estimation Methods
  • Tight estimation methods
    • The estimated range may not be large enough, i.e.,
      both the EstiRange1 and EstiRange2 functions may be
      invoked
    • e.g., density-based method
  • Loose estimation methods
    • The estimated range is always large enough, i.e.,
      only the EstiRange1 function is invoked
    • e.g., bucket-based method

25
Future Work
  • Extending range estimation methods with sampling
    techniques to determine the data distribution
    • Current range estimation methods depend on
      statistical knowledge provided by database owners
    • Investigate how this statistical knowledge can be
      approximated through sampling
  • Developing strategies to select the appropriate
    range estimation method for evaluating k-NN
    queries
  • Developing Web applications of k-NN queries

26
Four Strategies to Create Buckets
  • Equi-Count, Equi-Area, Min-Skew, and Min-Overlap
    partitioning strategies [1]
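Simplified sketches of two of these strategies, assuming point objects: Equi-Area cuts the data extent into a regular g × g grid of equal-area cells, and Equi-Count recursively splits at the median (alternating x and y) so that buckets hold roughly equal numbers of objects. Both return (MBB, count) summaries of the kind the bucket-based method consumes. The function names are hypothetical, and Min-Skew and Min-Overlap, which are more involved, are not sketched.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Summary = Tuple[Box, int]                # (bucket MBB, number of objects)

def _summary(pts: Sequence[Point]) -> Summary:
    """Tight MBB and object count of a non-empty group of points."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys)), len(pts)

def equi_area_buckets(points: Sequence[Point], extent: Box, g: int) -> List[Summary]:
    """Equi-Area: g x g equal-area grid cells over extent; empty cells are dropped."""
    xmin, ymin, xmax, ymax = extent
    cw, ch = (xmax - xmin) / g, (ymax - ymin) / g
    cells = defaultdict(list)
    for x, y in points:
        i = min(int((x - xmin) / cw), g - 1)   # clamp points on the right/top edge
        j = min(int((y - ymin) / ch), g - 1)
        cells[(i, j)].append((x, y))
    return [_summary(pts) for pts in cells.values()]

def equi_count_buckets(points: Sequence[Point], depth: int) -> List[Summary]:
    """Equi-Count: recursive median splits, alternating x and y, into up to 2**depth buckets."""
    def split(pts: List[Point], level: int) -> List[Summary]:
        if level == depth or len(pts) <= 1:
            return [_summary(pts)]
        axis = level % 2
        pts = sorted(pts, key=lambda p: p[axis])
        mid = len(pts) // 2
        return split(pts[:mid], level + 1) + split(pts[mid:], level + 1)
    return split(list(points), 0) if points else []

pts = [(float(x), float(y)) for x in range(4) for y in range(4)]
area_summaries = equi_area_buckets(pts, (0.0, 0.0, 4.0, 4.0), 2)
count_summaries = equi_count_buckets(pts, 2)
```

On uniform data the two strategies coincide; on skewed data such as road networks, Equi-Area leaves some cells crowded and others empty, while Equi-Count balances counts at the cost of uneven bucket sizes.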

Figures: Charminar dataset; spatial densities in Charminar; Equi-Area, Min-Overlap, Equi-Count, and Min-Skew partitioning