Title: Principles and Concepts of Geospatial Data Structure, Algorithms, Mining,
1 Principles and Concepts of Geospatial Data
Structure, Algorithms, Mining, Fusion
- Presented by
- GDF Learning Group
2 Geospatial Data Structure
- Entity-Based Models
- Tessellation
- Vector Mode
- Half-Plane Representation
- (Refer to Chapter Two, course textbook for
additional information)
3 Entity-Based Models
- Zero-dimensional objects or points
- One-dimensional objects or linear objects
- Two-dimensional objects or surface objects
4 0-D Object or Point
- A point is used to represent the location of an
object whose shape is not considered useful.
5 1-D Object or Linear Object
- Polyline
- Simple Polyline
- Monotone Polyline
6 2-D Objects
- Polygon
- Convex Polygon
- Monotone Polygon
7 Tessellation
- Tessellation is the partitioning of space into cells, such as a grid.
- Instead of addressing locations by x,y coordinates, a tessellation
identifies each cell by a number (which can be confusing at first).
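The cell-numbering idea can be sketched in a few lines. This is a minimal illustration, assuming a regular square grid numbered row by row; the function name `cell_id` and the parameters `origin_x`, `origin_y`, `cell_size`, and `n_cols` are hypothetical and not taken from the course textbook.

```python
import math


def cell_id(x, y, origin_x=0.0, origin_y=0.0, cell_size=10.0, n_cols=100):
    """Return the row-major number of the grid cell that contains (x, y).

    Assumption: the grid starts at (origin_x, origin_y), every cell is a
    square of side cell_size, and each row holds n_cols cells.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    if col < 0 or row < 0 or col >= n_cols:
        raise ValueError("point lies outside the grid")
    return row * n_cols + col
```

With these defaults, every point whose x and y both fall in [0, 10) maps to cell 0, so a single number stands in for a whole range of coordinates.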
8 Vector Mode
- Vector mode uses x and y axes to plot the coordinates of points.
9 Half-Plane Representation
- A half-plane (in general, a half-space) can be defined as the set of
points that satisfy an inequation of the form
$a_1 x_1 + a_2 x_2 + \cdots + a_d x_d + a_{d+1} \le 0$
- In practice it is a continuing group of linked polygons spread out over
the x, y, and z axes.
- This gives the impression that it is 3-dimensional; however, we know
that this is not truly so.
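A membership test makes the half-plane idea concrete. The sketch below assumes the inequality has the form a_1*x_1 + ... + a_d*x_d + a_(d+1) <= 0 and that a convex region is described as the intersection of several half-planes; the names `in_half_space`, `in_convex_region`, and `UNIT_SQUARE` are hypothetical.

```python
def in_half_space(point, coeffs):
    """True if a_1*x_1 + ... + a_d*x_d + a_(d+1) <= 0 for the given point.

    coeffs holds d + 1 numbers: one per coordinate plus the constant term.
    """
    if len(coeffs) != len(point) + 1:
        raise ValueError("need one coefficient per coordinate plus a constant")
    # zip stops after the d coordinates, so the constant is added separately.
    value = sum(a * x for a, x in zip(coeffs, point)) + coeffs[-1]
    return value <= 0


def in_convex_region(point, half_spaces):
    """True if the point satisfies every half-space inequality."""
    return all(in_half_space(point, h) for h in half_spaces)


# Illustrative data: the unit square written as four half-planes.
UNIT_SQUARE = [
    (-1, 0, 0),   # -x <= 0
    (1, 0, -1),   # x - 1 <= 0
    (0, -1, 0),   # -y <= 0
    (0, 1, -1),   # y - 1 <= 0
]
```

For example, `in_convex_region((0.5, 0.5), UNIT_SQUARE)` checks the point against all four inequalities in turn.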
10 Algorithms
11 Algorithms
- Point in polygon
- Line intersection
- Polygon intersection
12 Point in a polygon
- If the point lies on an edge of the polygon, the point is contained in
the polygon.
- If a horizontal line drawn from the point to the right intersects an
even number of edges of the polygon, the point is outside the polygon;
an odd number means it is inside.
- Edges that are collinear with the horizontal line are not counted.
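The rules above can be sketched as a ray-casting test. This is one possible implementation under stated assumptions, not necessarily the textbook's: the polygon is a list of (x, y) vertices in order, and the names `on_segment` and `point_in_polygon` and the tolerance `eps` are hypothetical.

```python
def on_segment(p, a, b, eps=1e-12):
    """True if point p lies on the segment from a to b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > eps:
        return False  # p is not on the line through a and b
    return (min(a[0], b[0]) - eps <= p[0] <= max(a[0], b[0]) + eps and
            min(a[1], b[1]) - eps <= p[1] <= max(a[1], b[1]) + eps)


def point_in_polygon(p, polygon):
    """Ray casting: count edge crossings of a horizontal ray going right."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        if on_segment(p, a, b):
            return True  # points on the boundary count as contained
        # Half-open comparison: horizontal edges (collinear with the ray)
        # never pass, and a vertex shared by two edges is counted once.
        if (a[1] > y) != (b[1] > y):
            x_cross = a[0] + (y - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if x_cross > x:
                inside = not inside  # odd number of crossings means inside
    return inside
```

Toggling a boolean on each crossing is equivalent to counting crossings and checking whether the total is odd.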
13 Point in a polygon
14 Line intersection
- A vertical line is drawn at the leftmost endpoint of the lines in
question.
- The y coordinates at which the lines meet the vertical line are noted.
- The vertical line is moved to the right until it reaches the rightmost
endpoint of the lines.
- If two lines have the same y coordinate at the same position of the
vertical line, the lines intersect there.
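The check at the heart of the sweep, whether two line segments share a point, can be sketched for a single pair. Note that this is a plain pairwise test based on parametric equations, not the sweep itself; the function name `segment_intersection` and the tolerance `eps` are hypothetical, and parallel segments are simply reported as non-intersecting.

```python
def segment_intersection(p1, p2, p3, p4, eps=1e-12):
    """Return the intersection point of segments p1-p2 and p3-p4, or None.

    Simplification: parallel (including collinear) segments return None.
    """
    dx1, dy1 = p2[0] - p1[0], p2[1] - p1[1]
    dx2, dy2 = p4[0] - p3[0], p4[1] - p3[1]
    denom = dx1 * dy2 - dy1 * dx2
    if abs(denom) < eps:
        return None
    # Solve p1 + t*(p2 - p1) == p3 + u*(p4 - p3) for t and u.
    wx, wy = p3[0] - p1[0], p3[1] - p1[1]
    t = (wx * dy2 - wy * dx2) / denom
    u = (wx * dy1 - wy * dx1) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * dx1, p1[1] + t * dy1)
    return None
```

A sweep-based method applies a test like this only to segments that are neighbors along the vertical line, which avoids comparing every pair.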
15 Line intersection
16 Polygon intersection
- Starts with a synchronized scan of the boundaries of both polygons.
- Reports the intersection points of the polygons and which vertices of
each polygon lie inside the other.
- Continues until all boundaries have been scanned.
- If no intersections are detected, it tests whether one polygon lies
completely inside the other.
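The two-stage logic (look for boundary intersections first, then fall back to a containment test) can be sketched as below. This is a brute-force version that compares every pair of edges rather than performing a synchronized scan, and it ignores touching and collinear cases; all function names are hypothetical.

```python
def _orient(a, b, c):
    """Sign of the turn a -> b -> c: 1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _edges_cross(a, b, c, d):
    """True if segments a-b and c-d properly cross each other."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0 and
            _orient(c, d, a) * _orient(c, d, b) < 0)


def _contains(polygon, p):
    """Ray-casting containment test (boundary points not handled specially)."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        if (a[1] > y) != (b[1] > y):
            if a[0] + (y - a[1]) * (b[0] - a[0]) / (b[1] - a[1]) > x:
                inside = not inside
    return inside


def _edges(polygon):
    """List the edges of a polygon given as ordered (x, y) vertices."""
    n = len(polygon)
    return [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]


def polygons_intersect(poly_a, poly_b):
    """True if the boundaries cross or one polygon lies inside the other."""
    for a, b in _edges(poly_a):
        for c, d in _edges(poly_b):
            if _edges_cross(a, b, c, d):
                return True
    # No boundary crossings: one polygon may lie completely in the other,
    # so testing a single vertex of each is enough.
    return _contains(poly_b, poly_a[0]) or _contains(poly_a, poly_b[0])
```

The brute-force comparison costs time proportional to the product of the two edge counts, which is the cost a scan-based method is designed to reduce.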
17 Polygon intersection
18 Data Mining
19 Data Mining
- The process of discovering interesting and potentially useful patterns
of information embedded in large databases.
- Examples of large databases include Earth observation satellite
archives, the U.S. Census, and weather and climate databases.
20 Pattern Discovery
- A pattern can be a summary statistic, such as the mean, median, or
standard deviation of a dataset, or a simple rule such as "Beach
property is, on average, 40 percent more expensive than inland
property."
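Both kinds of pattern, a summary statistic and a simple rule, can be computed with Python's standard `statistics` module. The price lists below are made-up illustrative values, not real data, and the function names `summarize` and `percent_premium` are hypothetical.

```python
import statistics

# Hypothetical sale prices in thousands of dollars, for illustration only.
beach_prices = [420, 390, 510, 450, 470]
inland_prices = [300, 310, 290, 330, 370]


def summarize(values):
    """Summary statistics of the kind a pattern might consist of."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def percent_premium(group, baseline):
    """Percent by which the mean of group exceeds the mean of baseline."""
    return 100.0 * (statistics.mean(group) / statistics.mean(baseline) - 1.0)
```

Calling `percent_premium(beach_prices, inland_prices)` turns the two means into a rule of the "X percent more expensive" form.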
21 Data Mining Process
- The domain expert (DE) provides a database to the data mining analyst
(DMA).
- The DE and DMA must agree on a problem statement.
- The DMA decides which technique and algorithm should be used, resulting
in a hypothesis of a potential pattern.
22 Data Mining Process
- The next step is verification, refinement, and visualization of the
pattern, usually done with GIS software.
- The final step is interpretation of the pattern and deciding what
action to take.
23 Statistics and Data Mining
- Statistics are used to verify whether the hypotheses hold, although
some valid patterns may be falsely dismissed.
24 Unique Features of Data Mining
- Spatial data tends to be highly self-correlated.
- For example, people with similar characteristics, occupations, and
backgrounds tend to cluster together in the same areas.
- The first law of geography states that "Everything is related to
everything else, but nearby things are more related than distant
things" (Tobler, 1979).
- In spatial statistics this is called spatial autocorrelation.
25 Example of Spatial Data Mining Before the Invention of Computers
- In 1855, Asiatic cholera was sweeping through London.
- An epidemiologist marked all locations where the disease struck.
- A cluster formed around a water pump.
- The water pump was turned off and the disease began to subside.
- The goal of spatial data mining is to automate the discovery of such
correlations, which can then be examined by specialists for further
validation and verification.
26 Motivating Spatial Data Mining
27 Measures of Spatial Form and Autocorrelation
- The propensity of a variable to exhibit similar values as a function of
the distance between the spatial locations at which it is measured.
- Spatial autocorrelation is used to measure this.
28 Spatial Autocorrelation
- A property that is often exhibited by variables which are sampled over
space.
- For example, soil fertility, rainfall, and air pressure all vary
gradually over space.
- Moran's I is a measure used to quantify this interdependence.
- There are both global and local versions of Moran's I.
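Global Moran's I can be computed directly from its usual definition, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The sketch below assumes the spatial weights are supplied as an n-by-n matrix; the names `morans_i` and `chain_weights` are hypothetical, and the chain of sites is only a toy neighborhood structure.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and an n-by-n weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if total_weight == 0 or denom == 0:
        raise ValueError("undefined for zero weights or constant values")
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / total_weight) * (num / denom)


def chain_weights(n):
    """Binary adjacency for n sites in a row: only direct neighbors get 1."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]
```

A smoothly increasing series such as `[1, 2, 3, 4, 5]` gives a positive value under `chain_weights(5)`, while a strictly alternating series gives a negative one, matching the intuition that gradual variation over space means positive autocorrelation.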
29 Spatial Statistical Models
- Often used to represent the observations in terms of random variables.
- Can be used for estimation, description, and prediction based on
probability theory.
30 Point Process
- A model for the spatial distribution of the points in a point pattern.
- Examples are the positions of trees in a forest or the locations of gas
stations in a city.
31 Lattices
- A countable collection of regular or irregular spatial sites.
- An example is census data defined on census blocks.
- Several spatial analysis functions can be applied on lattice models.
32 Geostatistics
- Deals with the analysis of spatial continuity, which is an inherent
characteristic of spatial data sets.
- Provides a set of statistical tools for modeling spatial variability
and interpolation (prediction) of attributes at unsampled locations.
- Kriging is a well-known estimation procedure used in geostatistics.
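Prediction at an unsampled location can be illustrated with inverse distance weighting (IDW). To be clear, IDW is a much simpler stand-in and is not kriging: kriging derives its weights from a fitted model of spatial variability, whereas IDW just weights each sample by a power of its distance. The function name `idw_interpolate` and the parameter `power` are hypothetical.

```python
import math


def idw_interpolate(samples, target, power=2.0):
    """Estimate the value at target from samples given as ((x, y), value).

    Each sample is weighted by 1 / distance**power; a sample located
    exactly at the target is returned unchanged.
    """
    num = 0.0
    den = 0.0
    for location, value in samples:
        d = math.dist(location, target)
        if d == 0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den
```

Halfway between two samples the weights are equal, so the estimate is their average; closer to one sample, that sample dominates.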
33 The Data Mining Trinity
- Classification
- Clustering
- Association rules
34 Location Prediction and Thematic Classification
- The goal of classification is to estimate the value of an attribute of
a relation based on the values of the relation's other attributes.
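One simple way to estimate an attribute from the other attributes is a k-nearest-neighbor vote. This is only one of many classification techniques and is chosen here for brevity; the function name `knn_predict`, the parameter `k`, and the record layout are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a label for query from (feature_tuple, label) records.

    The k records whose features are closest to the query (Euclidean
    distance) vote, and the most common label among them is returned.
    """
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Here the feature tuple plays the role of the relation's other attributes and the label is the attribute being estimated.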
35 Determining the Interaction among Attributes
- Association rules capture statements of the form "when x happens, y is
likely to occur also."
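How likely y is given x is commonly measured by the support and confidence of the rule x -> y. The sketch below assumes each observation is a set of items (for instance, the features recorded in one map cell); the function names `support` and `confidence` and the sample data `cells` are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimate of P(y | x): support of x and y together over support of x."""
    sx = support(transactions, x)
    if sx == 0:
        raise ValueError("the antecedent never occurs")
    return support(transactions, set(x) | set(y)) / sx


# Made-up observations, one set of features per map cell.
cells = [
    {"wetland", "nest"},
    {"wetland", "nest"},
    {"wetland"},
    {"forest"},
    {"forest", "nest"},
]
```

`confidence(cells, {"wetland"}, {"nest"})` measures how often a nest is recorded in cells that contain wetland.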
36 Identification of Hot Spots: Clusters and Outliers
- Hot spots are regions in the study space that stand out compared with
the overall behavior prevalent in the space.
- Outliers are observations that appear to be inconsistent with the
remainder of the data set.
- Law enforcement agencies use hot spot analysis to determine areas
within their jurisdiction that have unusually high levels of crime.
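A very simple way to flag regions that stand out is to compare each region's count with the overall mean in units of standard deviation (a z-score). This sketch ignores spatial adjacency entirely, so it is a plain outlier screen rather than a full hot spot analysis; the function names, the `threshold` value, and the incident counts are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize values using the population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def flag_hot_spots(counts_by_region, threshold=2.0):
    """Return the regions whose count lies more than threshold
    standard deviations above the mean."""
    names = list(counts_by_region)
    scores = z_scores([counts_by_region[name] for name in names])
    return [name for name, z in zip(names, scores) if z > threshold]


# Made-up incident counts per region, for illustration only.
incidents = {"A": 12, "B": 15, "C": 11, "D": 14,
             "E": 13, "F": 60, "G": 12, "H": 14}
```

In this made-up data only region F lies far above the others, so it is the one region returned by `flag_hot_spots(incidents)`.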
37 Data Fusion
- How will we apply our area to the overall homeland security project?
38 Fusion's 2 Main Categories
- Fusion of a collection of measurements, done by data mining.
- Field data collected by actual measurements, done by using algorithms.
- This could also be analysis of remote sensing data.
- Fusion of remote sensing images.
- Vector data
- Raster data
- TINs of areas
- (map data of routes, locations,)
39 What's Next?
- We can apply our data (mapping and measurements) to reach a conclusion
on how to solve a problem.
- We can take the acquired knowledge and continue crunching numbers.
40 Conclusion
- Today, we took a deeper look at the principles and concepts of
geospatial data structure, algorithms, mining, and data fusion.
- In our next presentation, we will explore the implications of these
concepts for the Homeland Security GIS application.