Guest Lecture Tonight - PowerPoint PPT Presentation

About This Presentation
Title:

Guest Lecture Tonight

Description:

What is the distance between the points Blue (1,1) and Red (5,7) ... Brian's eats 99. Can o Beans. Cocos. Champagnes. Chris's Fish House ... – PowerPoint PPT presentation

Number of Views:57
Avg rating:3.0/5.0
Slides: 36
Provided by: Informatio349
Learn more at: http://www.cs.ucr.edu
Category:

less

Transcript and Presenter's Notes

Title: Guest Lecture Tonight


1
Guest Lecture Tonight
  • Room 205 in EBII (under Dr. Keoghs office)
  • Starts at 610pm
  • Attendance is mandatory
  • Best behavior is mandatory
  • Having an intelligent question is mandatory

2
Spatial Access Methods Dr Eamonn
Keogh Computer Science Engineering
DepartmentUniversity of California -
RiversideRiverside,CA 92521eamonn_at_cs.ucr.edu
3
Mathematical Preliminaries I
?y
What is the distance between the points Blue
(1,1) and Red (5,7) ? The formula for the
Euclidian distance between two points is
(5,7)
(1,1)
?x
4
Consider a classic database
Assume that record(2).Name returns Joes Bugers
and record(1).Location returns (244, 365) etc.
5
  • Given such a database we can easily answer
    queries such as
  • List all Mexican restaurants.
  • List all Grade A restaurants.
  • However, classic databases do not allow queries
    such as
  • List all Mexican restaurants within five miles
    of UCR
  • List the 5 pizza restaurants nearest to 91 and
    60.
  • These kinds of queries are called spatial
    queries.

6
One possible way to do spatial queries
Algorithm One_Nearest_Neighbor_Query(location)
best_so_far infinity for i 1 to
number_items_in_database retrieve record(i)
from database if distance(location,record(i).
location ) best_so_far distance(location,re
cord(i).location ) NNpointer i end
end disp(The nearest neighbor
is,record(NNpointer ).name ) end
But this does not work well because
7
so we need to index the data
My informal definition of indexing. Organizing
the data such that queries can be answered in sub
linear time.
Indexing some kinds of data is trivial We can
index the names field simply by creating 26 files
that have the names plus pointers back to the
original database.
Adams Fish House 23 Als Burgers 34 Als
Pizza 1 Amom Tacos 45
Bills Sushi 56 Bobs Burgers 78 Bombs
Away! 12 Brians eats 99
Can o Beans Cocos Champagnes Chris's Fish House
8
So, we call always index 1-dimensional data (if
you can sort it, you can index it), such that we
can answer 1-nearest neighbor queries by
accessing just O(log(n) ) of the database. (n is
the number of items in the database). But we
cant sort 2 dimensional data This is an
important problem which has attracted a great
deal of research interest. We can in fact
index 2 (and higher) dimensional reasonably
efficiently with special data structures known as
Spatial Access Methods (SAM) or Multidimensional
Access Methods. Although there are many
variations we will focus on the R-Tree,
introduced by Guttman in the 1984 SIGMOD
conference.
9
We have seen the formula for the distance between
two points. We will find it useful to have a
formula for the distance between a point and the
closest possible point within an Minimum Bounding
Rectangle MBR
1) Suppose we have a query point Q and one known
point R Could any of the points in the MBR be
closer to Q than R is?
R (1,7)
Q (3,5)
MBR (6,1),(8,4)
10
We have seen the formula for the distance between
two points. We will find it useful to have a
formula for the distance between a point and the
closest possible point within an Minimum Bounding
Rectangle MBR
Suppose we have a query point Q and one known
point B Could any of the points in the MBR be
closer to Q than B is?
Q (3,5)
MBR (6,1),(8,4)
B (1,1)
11
The formula for the distance between a point and
the closest possible point within an MBR
MBR (L.x,L.y)(U.x,U.y)
Q (x,y)
MINDIST(Q,MBR) if L.x lt x lt U.x and L.y lt y lt
U.y then 0 elseif L.x lt x lt U.x then min( (L.y
-y)2 , (U.y -y)2 ) elseif .
12
We can visualize the locations of interest simply
as points in space...
13
Q1
then both nearest neighbor queries and range
queries have a simple geometric
interpretation Imagine that each point has a
ID pointer back to the original database
Q2
2
23
1
14
We can group clusters of datapoints into boxes,
called Minimum Bounding Rectangles (MBRs). Each
MBR can be represented with just two points. The
lower left corner, and the upper right
corner. We can further recursively group MBRs
into larger MBRs.
R1
R4
R2
R5
R3
R6
R9
R7
R8
15
these nested MBRs are organized as a tree
(called a spatial access tree or a
multidimensional tree).
R10 R11 R12
R1 R2 R3
R4 R5 R6
R7 R8 R9
Data nodes containing points
16
R10
(1,0),(7,9)
R1 R2 R3
R12
(1,3),(5,4) (2,0),(7,9) (1,1),(7,8)
At the leave nodes we have the location, and a
pointer to the record in question. At the
internal nodes, we just have MBR information.
(5,1) 32 (1,4) 45 (5,6) 27 (7,8) 73
(3,4) 77 (1,3) 88 (2,3) 22 (5,4) 13
(2,2) 47 (3,0) 86 (7,9) 52
Data nodes
17
We can define a function, MINDIST(point, MBR),
which tells us the minimum possible distance
between any point and any MBR, at any level of
the tree.
MINDIST(point, MBR) 5
MINDIST(point, MBR) 0
18
We can use the MINDIST(point, MBR), to do fast
search..
MINDIST( , R10) 0
MINDIST( , R11) 10
MINDIST( , R12) 17
R10 R11 R12
R1 R2 R3
R4 R5 R6
R7 R8 R9
Data nodes containing points
19
We can use the MINDIST(point, MBR), to do fast
search..
MINDIST( , R1) 0
MINDIST( , R2) 2
MINDIST( , R3) 10
0 10 17
R10 R11 R12
R1 R2 R3
R4 R5 R6
R7 R8 R9
Data nodes containing points
20
  • We now go to disk, and retrieve all the data
    objects whose pointers are in the green node. We
    measure the true distance between our query and
    those objects. Let us imagine two scenarios, the
    closest object, the best-so-far has a value
    of..
  • 1.5 units (we are done searching!)
  • 4.0 units (we have to look in R2, but then we
    are done)

0 10 17
R10 R11 R12
0 2 10
R1 R2 R3
R4 R5 R6
R7 R8 R9
Data nodes containing points
21
(No Transcript)
22
A simplifying assumption to make coding the
R-Tree easier
We have seen that searching R_trees is easy! But
building the MBRs looks like a non-trivial
problem. I will allow a simplifying assumption
to make it much easer to understand and code
23
12,12
We have a collection of datapoints in 2
dimensions. Before we begin, we must decide
on the max number of points per leaf MBR, lets
say 3
0,0
24
12,12
The definition of an MBR is the smallest (axis
parallel) rectangle that can contain a set of
points. We can represent the MBR with just 2
points, the lower left corner (L) and the upper
right corner (U). MBR1 (L.x,L.y)(U.x,U.y)
MBR1 (1,1)(9.6,8.9)
0,0
25
This is the root of our R-Tree
MBR (1,1)(9.6,8.9)
26
MBR1 (1,1)(9.6,8.9)
MBR4
MBR2 (1,1)(4.3,3.95)
MBR3 (1,3.95)(4.3,8.9)
MBR4
MBR3
MBR2
MBR5
27
MBR1 (1,1)(9.6,8.9)
MBR4
MBR2 (1,1)(4.3,3.95)
MBR3 (1,3.95)(4.3,8.9)
MBR10 (1,3.95)(2.65,6.42)
MBR11 (1,6.42)(2.65,8.9)
MBR12
MBR11
MBR4
(1.1, 4.4) 44 (1.3, 4.1) 34 (1.5, 4.3) 12
MBR13
MBR10
MBR2
MBR5
28
  • Under this definition of an R-tree (which is not
    the classic definition)
  • Every internal node has exactly 4 children.
  • Every MBR at a given level of the tree is
    mutually exclusive (a point cannot be in two MBRs
    at a given level).
  • Every leaf node is in one of these two classes
  • It does not cover any points, and therefore
    contains a null pointer.
  • It covers at least one point, and has a pointer
    to a file which contains those points, together
    with the primary key.

29
Project Report Format
  • A new file on the webpage gives details on the
    format of your final report (the white binders)

30
Title Page Requirements Analysis. Requirements
Elicitation. System Specification. System
Design Program Design Coding Unit and
Integration Testing System Testing Acceptance
Testing A Maintenance Plan Assume that like
the vast majority of software that is
commissioned, your contract requires you to
maintain the software. Conclusions Assume that
you write this just before handing in your
document. Summarize briefly what you did for you
project. Address the following what most
surprised you about the process of creating the
project. What would you do differently if you had
to do it again? References Every, book, paper,
webpage you used must be cited in a standard
format. You could use the American Psychological
format 1, 2, or the IEEE standard 3, or any
other format so long as you are consistent and
complete. You should also reference any standard
template libraries used in your code.
Appendices Including source code, printed in 2
pages per page format. You should also list a
professional quality, one-page resume for each
member of the group (use the same template for
each). Acknowledgements Your chance to thank
anyone who helped you complete the project. Two
(identical) Cd roms in a high quality plastic
wallet Containing source code, test data, all
the weekly archives, and a readme.txt file that
contains a brief (one line) description of all
the files.
31
How I grade the project
32
Grading Rubric for CS179. Database project.
Group name (all last names) The project
binder is in the correct basic format, and is
neat. The format is consistent.
The project is well documented (the students
carefully referenced, the books, papers, URLs
that they used)
33
The students made some effort to do elicitation.
The came up with interview questions in advance,
they noted the answers to the questions.
(optional, they also use some other elicitation
technique )
The students made some effort to do
requirements specification. The specification
considers things like the minimum hardware
requirements, usability factors, and user
scenarios, privacy issues. The specification
actually says what the system will do under
normal scenarios, and how it will handle errors.

34
The students actually designed the project (as
opposed to going straight to coding) Design may
include flowcharts or pseudocode or English
descriptions.
The code is documented, indented etc..

35
The project was tested. The students demonstrated
how they tested the project. There was unit
testing and integration testing.
The overall quality of the finished product
General remarks.
Tentative grade (may be modified
after second pass). Final grade
Special Circumstances Students in same group
get different grades because
Write a Comment
User Comments (0)
About PowerShow.com