A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results

Slides: 22

Provided by: Pradee45

Learn more at: https://www.spatial.cs.umn.edu

1
A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results

Pradeep Mohan, Shashi Shekhar, Zhe Jiang
University of Minnesota, Twin-Cities, MN

James A. Shine, James P. Rogers, Nicole Wayant
US Army ERDC, Topographic Engineering Center, Alexandria, VA

Contact: mohan_at_cs.umn.edu
2
Outline

Motivation
Problem Formulation
Computational Approach
Conclusions and Future work

3
Motivation: Spatial Heterogeneity, the Second Law of Geography
Spatial Heterogeneity (Goodchild, 2003; Goodchild, 2004)

Expectations vary across space.
Global models may not explain locally observed
phenomena.
Need for place based analysis.

Spatial Heterogeneity in Retail

Traditional Data Mining: Which pairs of items sell together frequently?
Answer: Diaper in Transaction ⇒ Beer in Transaction.

Is this association true everywhere?

Answer: Blue-collar neighborhoods.

Global Spatial Data Mining: Global Co-location Patterns

Which pairs of spatial features are located
together frequently ?

Example: Gas stations and convenience stores.

Our Focus

Where do certain pairs of spatial features co-locate frequently?

Example: Assaults happen frequently around downtown bars.
4
Applications

Crime analysis
Localizing frequent crime patterns.
Opportunities for crime vary across space!
Predicting localities of the next crime.

Question: Do downtown bars lead to assaults more frequently?

Public Health
Localizing elevated disease risks around putative sources (e.g. mining areas).

Image courtesy: www.amazon.com
Question: Where does high asbestos concentration often lead to lung cancer?

Ecology
Localizing symbiotic relationships between different species of plants / animals.

Question: Where are plover birds frequently found in the vicinity of a crocodile?

Image courtesy: www.startribune.com
5
Regional co-location patterns (RCP)

Input: Spatial features, crime reports.
Output: RCPs (e.g. <(Bar, Assaults), Downtown>)
Subsets of spatial features that are frequently located in certain regions of a study area.

6
Outline

Motivation
Problem Formulation
Basic Concepts
Problem Statement
Challenges
Related Work
Computational Approach
Conclusions and Future work

7
Basic Concepts: Neighborhoods
Prevalence Locality

Subsets of the spatial framework containing instances of a pattern.
Simple representation to visualize: convex hull.
Other representations possible.

Neighborhood Graph

Given: A spatial neighbor relation (spatial neighborhood size)
Nodes: Individual event instances
Edges: Present if the neighbor relation is satisfied
Based on the event-centric model (Huang, 2004)
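
As a rough illustration of the neighborhood graph described above, the sketch below links two event instances whenever they lie within a given distance of each other. The tuple layout, the function name `build_neighborhood_graph`, the Euclidean distance test, and the sample coordinates are all assumptions made for illustration; the slides do not specify how the neighbor relation is evaluated.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Set, Tuple

# Hypothetical event instance layout: (instance id, feature type, x, y).
Instance = Tuple[str, str, float, float]


def build_neighborhood_graph(instances: List[Instance], radius: float) -> Dict[str, Set[str]]:
    """Return adjacency sets; two instances are linked when they lie within `radius`."""
    graph: Dict[str, Set[str]] = {inst[0]: set() for inst in instances}
    for (id_a, _, xa, ya), (id_b, _, xb, yb) in combinations(instances, 2):
        # Assumed neighbor relation: Euclidean distance no larger than the neighborhood size.
        if dist((xa, ya), (xb, yb)) <= radius:
            graph[id_a].add(id_b)
            graph[id_b].add(id_a)
    return graph


# Made-up sample data: one bar with a nearby assault and a distant assault.
instances = [
    ("bar1", "Bar", 0.0, 0.0),
    ("assault1", "Assault", 0.5, 0.0),
    ("assault2", "Assault", 5.0, 5.0),
]
graph = build_neighborhood_graph(instances, radius=1.0)
print(graph)
```

The pairwise loop is quadratic in the number of instances; a spatial index would be the usual replacement for larger datasets.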

8
Basic Concepts: Quantifying Regional Interestingness

Regional Participation Ratio
Conditional probability of observing a pattern instance within a locality given an instance of a feature within that locality.
[Figure: worked example]

Regional Participation Index
Quantifies the local fraction participating in a relationship.
[Figure: worked example]
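
The two measures can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' definition: the denominator `feature_count` is supplied by the caller (the concluding slide describes the index as the local fraction of the global count), and the index is assumed to be the minimum ratio over the pattern's features, by analogy with the standard participation index for global co-location patterns. All names and numbers are hypothetical.

```python
from typing import Dict, Set


def regional_participation_ratio(participating: Set[str], feature_count: int) -> float:
    """Fraction of a feature's instances that take part in the pattern inside one locality."""
    if feature_count <= 0:
        raise ValueError("feature_count must be positive")
    return len(participating) / feature_count


def regional_participation_index(
    participating_by_feature: Dict[str, Set[str]],
    count_by_feature: Dict[str, int],
) -> float:
    """Assumed aggregation: the minimum ratio over the features of the pattern."""
    return min(
        regional_participation_ratio(ids, count_by_feature[feature])
        for feature, ids in participating_by_feature.items()
    )


# Hypothetical locality for the pattern {Bar, Assault}.
participating = {"Bar": {"b1", "b2"}, "Assault": {"a1", "a2", "a3"}}
counts = {"Bar": 8, "Assault": 6}
rpi = regional_participation_index(participating, counts)
print(rpi)
```

Here the Bar ratio is 2/8 and the Assault ratio is 3/6, so the index for this hypothetical locality is the smaller of the two.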
9
Detailed Statement
Prevalence threshold: 0.25; spatial neighborhood size: 1 mile

Given
A spatial framework.
A collection of Boolean spatial event types and their instances.
A minimum interestingness threshold, Pθ.
A symmetric and transitive neighbor relation R (e.g. based on spatial neighborhood size).

Find
All RCPs with prevalence > Pθ.

Objective
Minimize computational cost.

Constraints
Spatial framework is heterogeneous.
Interest measure captures spatial heterogeneity.
Completeness: All prevalent RCPs are reported.
Correctness: Only prevalent RCPs are reported.

10
Challenges

Conflicting Requirements
Interest measure captures spatial heterogeneity
while supporting scalable algorithms.
Exponential search space.
Candidate pattern set cardinality is exponential
in the number of event types.

Illustration
n    Patterns    O(2^M)
3    4           k1 * 2^3 (k1 > 0)
4    11          k2 * 2^4 (k2 > 0)
5    26          k3 * 2^5 (k3 > 0)
6    57          k4 * 2^6 (k4 > 0)
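
The pattern counts in the illustration (4, 11, 26, 57) equal the number of subsets of n event types that contain at least two types, which is 2^n - n - 1. The short check below enumerates those subsets directly; the function name `candidate_patterns` and the placeholder type names are illustrative only.

```python
from itertools import combinations
from typing import List, Tuple


def candidate_patterns(event_types: List[str]) -> List[Tuple[str, ...]]:
    """All subsets of the event types that contain at least two types."""
    return [
        subset
        for size in range(2, len(event_types) + 1)
        for subset in combinations(event_types, size)
    ]


for n in range(3, 7):
    types = [f"f{i}" for i in range(n)]  # placeholder event type names
    count = len(candidate_patterns(types))
    # Closed form: all subsets, minus the empty set and the n singletons.
    assert count == 2 ** n - n - 1
    print(n, count)
```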
12
Contributions

Regional Co-location Patterns
Neighborhood-based formulation.

Interest Measure
Captures the local fraction of events participating in patterns.
Shows attractive computational properties.
Honors spatial heterogeneity.

Computational Approach
Computational structure: pattern space enumeration.
Performance enhancement: maximal locality based pruning strategies.

Experimental Evaluation
Performance evaluation using real datasets (Lincoln, NE).
Real-world case study.

13
Related Work

Approaches for regional co-location pattern discovery
Spatial neighborhood based: our work
Fitness function clustering (Eick et al., 2008)
Zoning based (Celik et al., 2007)

Zoning based and fitness function clustering
Reports one pattern per interesting region based on a criterion (e.g. max).
Computational structure and pruning strategies not explored.
Clustering is based on real-valued attributes.

14
Outline

Motivation
Problem Formulation
Computational Approach
Pattern Space Enumeration
Performance Tuning
Experimental Evaluation
Conclusions and Future work

15
Computational Approach
Prevalence threshold: 0.25

Key Idea

Enumerate the entire pattern space.
Examine each pattern and prune.
Expensive!

Steps shown: compute neighborhoods; identify candidate RCP instances.

[Figure: pattern lattice over features A, B and C (Null; A, B, C; AB, AC, BC; ABC). Each pattern is paired with its prevalence localities: <AB, PL1(AB)> to <AB, PL3(AB)>, <AC, PL1(AC)> to <AC, PL3(AC)>, <BC, PL1(BC)> to <BC, PL4(BC)>, and <ABC, PL1(ABC)> to <ABC, PL3(ABC)>. Each pair carries an interest value of 0.16, 0.25 or 0.33. Legend: pruned RCP, accepted RCP.]
16
Performance Tuning: Key Ideas
Key Idea

The interest measure shows special pruning properties in certain subsets of the spatial framework.

Maximal Locality
Key Properties

Collection of connected instances.
Maximal localities are mutually disjoint.
Contains several RCPs.

Key Observations

RPI shows an anti-monotonicity property within maximal localities.
Pruning a co-location, AB, prunes all its supersets (e.g. ABC, ABCD, etc.).

RPI within a maximal locality is an upper bound on the RPI of its constituent prevalence localities.
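
Since maximal localities are described as mutually disjoint collections of connected instances, one plausible reading is that they correspond to connected components of the neighborhood graph. The sketch below computes components by breadth-first search under that assumption; the adjacency-set format, the function name `maximal_localities`, and the sample graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def maximal_localities(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components of a neighborhood graph, found by breadth-first search."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Made-up neighborhood graph with three groups of connected instances.
graph = {
    "a1": {"b1"}, "b1": {"a1", "c1"}, "c1": {"b1"},
    "a2": {"c2"}, "c2": {"a2"},
    "b2": set(),
}
localities = maximal_localities(graph)
print(localities)
```

Because every instance is visited exactly once, the resulting components are disjoint by construction, which matches the disjointness property listed above.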

17
Performance Tuning
Prevalence threshold: 0.25

Step shown: compute maximal localities.

[Figure: pattern lattice over features A, B and C, evaluated within maximal localities ML1, ML2 and ML3. Locality-level values shown: AB 0.167, BC 0.167, AC 0.25; AB 0.25, BC 0.33, AC 0.25. Two branches end in "No RCP". Candidates shown after refinement: <BC, PL3(BC)> 0.167, <AC, PL1(AC)> 0.25, <BC, PL4(BC)> 0.167.]

Completeness
Pruning a pattern within a maximal locality does not prune any valid RCPs (due to the upper bound property of RPI).
ABC is pruned automatically (due to the anti-monotonicity of RPI).

Correctness
Accepting a pattern involves additional checks so that only prevalent RCPs are reported.
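
The pruning logic within a single maximal locality can be sketched as a filter followed by Apriori-style candidate generation. This is only an illustration of the two properties named on the slide, not the authors' algorithm: the RPI values are invented, the strict comparison follows the problem statement (prevalence greater than the threshold), and the function name `next_level_candidates` is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set

Pattern = FrozenSet[str]


def next_level_candidates(prevalent: Set[Pattern], size: int) -> Set[Pattern]:
    """Patterns of the given size whose every (size - 1)-subset survived the previous level."""
    features = sorted(set().union(*prevalent)) if prevalent else []
    return {
        frozenset(combo)
        for combo in combinations(features, size)
        if all(frozenset(sub) in prevalent for sub in combinations(combo, size - 1))
    }


# Hypothetical RPI values of size-2 patterns inside one maximal locality.
threshold = 0.25
rpi_in_locality = {frozenset("AB"): 0.167, frozenset("BC"): 0.167, frozenset("AC"): 0.30}

# Filter step: the locality-level RPI bounds the RPI of every prevalence locality inside it,
# so a pattern that fails here cannot yield a prevalent RCP in this maximal locality.
survivors = {p for p, rpi in rpi_in_locality.items() if rpi > threshold}

# Anti-monotonicity: ABC needs AB, AC and BC to survive, so it is never generated here.
candidates = next_level_candidates(survivors, 3)
print(survivors, candidates)
```

Patterns that survive the filter would still need to be checked against each of their prevalence localities before being reported, which corresponds to the additional checks mentioned under correctness.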
18
Experimental Evaluation: Setup

Real Crime Dataset
Lincoln, NE, years 2002 to 2007, 40 different crime types and other spatial features.

Experiment Design
Effect on run time and number of patterns for different parameters.

Experimental Parameters
Spatial neighborhood size
Feature types
Prevalence threshold
Data size

Experimental Platform
Macintosh running OS X 10.6, 16 GB main memory, 2.26 GHz dual quad-core Intel Xeon processors.
Implementation: C

19
Experimental Evaluation: Spatial Neighborhood Size

What is the effect of spatial neighborhood size on the performance of different algorithms?
Fixed parameters: dataset size = 7498 instances; features = 5; prevalence threshold = 0.07

[Plots: number of RCPs; run time]

Trends

Run time: ML Pruning outperforms PS Enumeration by a factor of 1.5 to 5.
Number of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 4.13 to 19.

20
Experimental Evaluation: Feature Types

What is the effect of the number of feature types on the performance of different algorithms?
Fixed parameters: dataset size = 7498 instances; spatial neighborhood size = 800 feet; prevalence threshold = 0.07

[Plots: number of RCPs; run time]

Trends

Run time: ML Pruning outperforms PS Enumeration by a factor of 1.2.
Number of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 1.6 to 3.5.

21
Experimental Evaluation: Prevalence Threshold

What is the effect of the prevalence threshold on the performance of different algorithms?
Fixed parameters: dataset size = 7498 instances; features = 5; spatial neighborhood size = 700 feet

[Plots: run time; number of RCPs]

Trends

Run time: ML Pruning outperforms PS Enumeration by a factor of 1.4 to 105.
Number of RCPs examined: ML Pruning outperforms PS Enumeration by a factor ranging from 0 evaluations to 18.

22
Real Dataset: Case Study
Q: Where do assaults frequently occur around bars? Are there other factors?
Dataset: Lincoln, NE, crime data (Winter 07); neighborhood size = 0.25 miles; prevalence threshold = 0.07

[Figure panels: RCP of Larceny, Bars and Assaults; RCP of Bar and Assaults; RCP of Larceny and Assaults]

Observations

Assaults are more likely to be found in areas reporting larceny crimes (e.g. 47.6 vs 21.1).

Bars in downtown are more likely to be crime prone than bars in other areas (e.g. 21.1, 20.1).

23
Conclusion and Future work

Conclusions
Neighborhood-based formulation of regional spatial patterns.
Regional Participation Index: measures the local fraction of the global count.
Vector representation for prevalence localities (other representations possible; convex hull for simplicity).

Future Work
Other representations for prevalence localities.
Statistical interpretation: LISA statistics / variants of local Ripley's K, multiple hypothesis testing.
Interpretation using predictive methods (e.g. Geographically Weighted Regression).

Acknowledgement
Reviewers of ACM GIS
Members of the Spatial database and spatial data
mining group, UMN.
U.S. Department of Defense.
Mr. Tom Casady and Kim Koffolt.
