Title: A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results
1A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results
- Pradeep Mohan, Shashi Shekhar, Zhe Jiang
- University of Minnesota, Twin-Cities, MN
- James A. Shine, James P. Rogers, Nicole Wayant
- US Army ERDC, Topographic Engineering Center, Alexandria, VA
Contact: mohan_at_cs.umn.edu
2Outline
- Motivation
- Problem Formulation
- Computational Approach
- Conclusions and Future work
3Motivation: Spatial Heterogeneity, the second law of Geography
Spatial Heterogeneity (Goodchild, 2003; Goodchild, 2004)
- Expectations vary across space.
- Global models may not explain locally observed phenomena.
- Need for place based analysis.
Spatial Heterogeneity in Retail
- Traditional Data Mining: Which pairs of items sell together frequently?
- Answer: Diaper in Transaction → Beer in Transaction.
- Is this association true everywhere?
Answer: Blue collar neighborhoods.
Global Spatial Data Mining: Global Co-location patterns
- Which pairs of spatial features are located together frequently?
Example: Gas stations and convenience stores.
Our Focus
- Where do certain pairs of spatial features co-locate frequently?
Example: Assaults happen frequently around downtown bars.
4Applications
- Crime analysis
- Localizing frequent crime patterns. Opportunities for crime vary across space!
Question: Do downtown bars lead to assaults more frequently?
- Public Health
- Localizing elevated disease risks around
putative sources (e.g. mining areas)
Question: Where does high asbestos concentration often lead to lung cancer?
- Ecology
- Localizing symbiotic relationships between
different species of plants / animals.
Question: Where are Plover birds frequently found in the vicinity of a crocodile?
Predicting localities of the next crime.
5Regional co-location patterns (RCP)
- Input: Spatial features, crime reports.
- Output: RCPs (e.g. <(Bar, Assaults), Downtown>)
- Subsets of spatial features.
- Frequently located in certain regions of a study area.
6Outline
- Motivation
- Problem Formulation
- Basic Concepts
- Problem Statement
- Challenges
- Related Work
- Computational Approach
- Conclusions and Future work
7Basic Concepts: Neighborhoods
Prevalence locality
- Subsets of the spatial framework containing instances of a pattern.
- Simple representation to visualize: convex hull.
- Other representations possible.
Neighborhood Graph
- Given: A spatial neighbor relation (spatial neighborhood size).
- Nodes: Individual event instances.
- Edges: Presence (if the neighbor relation is satisfied).
- Based on the Event Centric Model (Huang, 2004).
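The neighborhood graph described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the neighbor relation is "Euclidean distance at most the spatial neighborhood size", and the function name `build_neighborhood_graph`, the instance ids, and the coordinates are all hypothetical.

```python
from itertools import combinations
from math import hypot


def build_neighborhood_graph(instances, radius):
    """Return an adjacency map {instance_id: set of neighbor ids}.

    instances maps an id to (feature_type, x, y). Two instances are
    neighbors when their Euclidean distance is at most `radius`
    (an assumed neighbor relation).
    """
    graph = {iid: set() for iid in instances}
    for a, b in combinations(instances, 2):
        _, xa, ya = instances[a]
        _, xb, yb = instances[b]
        if hypot(xa - xb, ya - yb) <= radius:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical toy data, coordinates in miles.
instances = {
    "bar1": ("Bar", 0.0, 0.0),
    "assault1": ("Assault", 0.5, 0.0),
    "assault2": ("Assault", 5.0, 5.0),
}
graph = build_neighborhood_graph(instances, radius=1.0)
print(graph["bar1"])  # neighbors of bar1 within 1 mile
```

The pairwise loop is quadratic in the number of instances; a spatial index would be the usual choice for larger datasets.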
8Basic Concepts: Quantifying regional interestingness
Regional Participation Ratio
- Conditional probability of observing a pattern instance within a locality given an instance of a feature within that locality.
Regional Participation Index
- Quantifies the local fraction participating in a relationship.
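The formulas for these two measures did not survive in this transcript, so the following is only a sketch under stated assumptions. It assumes that the Regional Participation Ratio (RPR) of a feature is the number of its instances participating in the pattern inside a locality divided by the feature's global instance count (the conclusion slide describes the measure as "the local fraction of the global count"), and that the Regional Participation Index (RPI) is the minimum RPR over the pattern's features, by analogy with the participation index of the event centric model. All names and counts are hypothetical.

```python
def regional_participation_ratio(participating, global_count):
    """Assumed RPR: instances of one feature that take part in the
    pattern inside a locality, divided by the feature's global count."""
    return participating / global_count


def regional_participation_index(participating, global_counts):
    """Assumed RPI: minimum RPR over the features of the pattern."""
    return min(
        regional_participation_ratio(participating[f], global_counts[f])
        for f in participating
    )


# Hypothetical counts for the pattern (Bar, Assault) in one locality.
global_counts = {"Bar": 8, "Assault": 6}
participating = {"Bar": 2, "Assault": 3}
rpi = regional_participation_index(participating, global_counts)
print(rpi)  # min(2/8, 3/6)
```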
9Detailed Statement
Prevalence Threshold: 0.25; Spatial Neighborhood Size: 1 mile
- Given
- A spatial framework.
- A collection of boolean spatial event types and their instances.
- A minimum interestingness threshold, P.
- A symmetric and transitive neighbor relation R (e.g. based on spatial neighborhood size).
- Find
- All RCPs with prevalence > P.
- Objective
- Minimize computational cost.
- Constraints
- Spatial framework is heterogeneous.
- Interest measure captures spatial heterogeneity.
- Completeness: All prevalent RCPs are reported.
- Correctness: Only prevalent RCPs are reported.
10Challenges
- Conflicting Requirements
- Interest measure captures spatial heterogeneity while supporting scalable algorithms.
- Exponential search space.
- Candidate pattern set cardinality is exponential in the number of event types.
Illustration
11Challenges
Illustration
n    Patterns    O(2^M)
3    4           k1 * 2^3 (k1 > 0)
4    11          k2 * 2^4 (k2 > 0)
5    26          k3 * 2^5 (k3 > 0)
6    57          k4 * 2^6 (k4 > 0)
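The pattern counts in the illustration agree with counting every subset of at least two event types, which gives 2^n - n - 1 candidates for n event types. A minimal sketch that enumerates them (the helper name `candidate_patterns` and the single-letter event types are hypothetical):

```python
from itertools import combinations


def candidate_patterns(event_types):
    """All subsets of the event types with at least two members."""
    types = sorted(event_types)
    return [
        combo
        for size in range(2, len(types) + 1)
        for combo in combinations(types, size)
    ]


for n in range(3, 7):
    count = len(candidate_patterns("ABCDEF"[:n]))
    assert count == 2**n - n - 1
    print(n, count)
```

For n = 3 to 6 this prints the counts 4, 11, 26 and 57, matching the Patterns column.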
12Contributions
- Regional Co-location Patterns
- Neighborhood based Formulation
- Interest Measure
- Captures the local fraction of events participating in patterns.
- Shows attractive computational properties; honors spatial heterogeneity.
- Computational Approach
- Computational structure: Pattern Space Enumeration.
- Performance enhancement: Maximal locality based pruning strategies.
- Experimental Evaluation
- Performance evaluation using real datasets (Lincoln, NE).
- Real world case study.
13Related Work
Approaches for Regional Co-location Pattern discovery
- Zoning Based (Celik et al., 2007)
- Fitness Function Clustering (Eick et al., 2008)
- Spatial Neighborhood Based (Our Work)
Zoning Based
Fitness Function Clustering
- Reports one pattern per interesting region based on a criterion (e.g. Max).
- Computational structure and pruning strategies not explored.
- Clustering is based on real valued attributes.
14Outline
- Motivation
- Problem Formulation
- Computational Approach
- Pattern Space Enumeration
- Performance Tuning
- Experimental Evaluation
- Conclusions and Future work
15Computational Approach
Prevalence Threshold: 0.25
Key Idea
- Enumerate the entire pattern space. Expensive!
- Examine each pattern and prune.
[Figure: pattern lattice (Null; A, B, C; AB, AC, BC; ABC) with the steps "Compute Neighborhoods" and "Identify candidate RCP instance". Candidate RCPs shown: <AB, PL1(AB)> to <AB, PL3(AB)>, <AC, PL1(AC)> to <AC, PL3(AC)>, <BC, PL1(BC)> to <BC, PL4(BC)>, and <ABC, PL1(ABC)> to <ABC, PL3(ABC)>. Each candidate carries an interest value (0.16, 0.25 or 0.33) and is marked as a pruned RCP or an accepted RCP.]
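The "identify candidate RCP instance" step can be sketched as a brute-force search over the neighborhood graph. This sketch assumes, following the event centric model, that an instance of a pattern is a set of event instances, one per feature in the pattern, that are pairwise neighbors. The function name `pattern_instances` and the toy data are hypothetical, and the exhaustive product is for illustration only.

```python
from itertools import product


def pattern_instances(pattern, features, graph):
    """Return tuples of instance ids, one per feature in `pattern`,
    whose members are pairwise neighbors in `graph` (assumed definition).

    features maps instance id -> feature type;
    graph maps instance id -> set of neighbor ids.
    """
    by_feature = {
        f: [iid for iid, feat in features.items() if feat == f]
        for f in pattern
    }
    found = []
    for combo in product(*(by_feature[f] for f in pattern)):
        if all(
            b in graph[a]
            for idx, a in enumerate(combo)
            for b in combo[idx + 1:]
        ):
            found.append(combo)
    return found


# Hypothetical toy data: a1, b1, c1 form a triangle; a2 and b2 are linked.
features = {"a1": "A", "b1": "B", "c1": "C", "a2": "A", "b2": "B"}
graph = {
    "a1": {"b1", "c1"},
    "b1": {"a1", "c1"},
    "c1": {"a1", "b1"},
    "a2": {"b2"},
    "b2": {"a2"},
}
print(pattern_instances(("A", "B"), features, graph))
print(pattern_instances(("A", "B", "C"), features, graph))
```

Running this check for every candidate pattern is what makes full pattern space enumeration expensive.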
16Performance Tuning: Key Ideas
Key Idea
- Interest Measure shows special pruning
properties in certain subsets of the spatial
framework.
Maximal Locality
Key Properties
- Collection of connected instances.
- Maximal localities are mutually disjoint.
- Contains several RCPs.
Key Observations
- RPI shows an anti-monotonicity property within Maximal Localities.
- Pruning a co-location, AB, prunes all its supersets (e.g. ABC, ABCD, etc.).
- RPI within a Maximal locality is an upper bound
to RPI of constituent Prevalence localities.
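One way to read the key properties above is that maximal localities correspond to the connected components of the neighborhood graph, which are mutually disjoint by construction. That reading is an assumption, not something the slide states explicitly. A minimal breadth-first sketch, with a hypothetical function name and toy graph:

```python
from collections import deque


def maximal_localities(graph):
    """Split a neighborhood graph {id: set of neighbor ids} into its
    connected components (the assumed meaning of maximal locality)."""
    seen = set()
    localities = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        localities.append(component)
    return localities


# Hypothetical toy graph with three components.
graph = {
    "a1": {"b1"},
    "b1": {"a1", "c1"},
    "c1": {"b1"},
    "a2": {"b2"},
    "b2": {"a2"},
    "c2": set(),
}
print(maximal_localities(graph))
```

Because the components partition the instances, each one can be processed independently when applying the pruning properties.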
17Performance Tuning
Prevalence Threshold: 0.25
[Figure: pattern lattice (Null; A, B, C) evaluated within maximal localities ML1, ML2 and ML3, after the step "Compute Maximal Locality". Pair values shown: AB 0.167, BC 0.167, AC 0.25, AB 0.25, BC 0.33, AC 0.25. Two entries are marked "No RCP". Candidate RCPs shown: <BC, PL3(BC)> 0.167, <AC, PL1(AC)> 0.25, <BC, PL4(BC)> 0.167. ABC is pruned automatically, due to anti-monotonicity of RPI.]
Completeness
- Pruning a pattern within a maximal locality does not prune any valid RCPs (due to the upper bound property of RPI).
Correctness
- Accepting a pattern involves additional checks so that only prevalent RCPs are reported.
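The pruning idea can be sketched as a level-wise search inside a single maximal locality. This is an illustrative sketch rather than the authors' algorithm: `rpi_in_locality` is a hypothetical callback returning the RPI of a pattern within the maximal locality, the `>=` comparison against the threshold is an assumption, and the three values are taken from the figure with their grouping into one locality assumed. Patterns that survive are only candidates, since the maximal locality RPI is an upper bound and each prevalence locality still needs its own check.

```python
from itertools import combinations


def prune_within_locality(features, rpi_in_locality, threshold):
    """Level-wise search using anti-monotonicity of RPI inside one
    maximal locality. Returns (surviving patterns, patterns examined)."""
    survivors, examined = [], 0
    level = [frozenset(p) for p in combinations(sorted(features), 2)]
    while level:
        kept = set()
        for pattern in level:
            examined += 1
            if rpi_in_locality(pattern) >= threshold:
                kept.add(pattern)
        survivors.extend(sorted(kept, key=sorted))
        # Next level: supersets whose every one-smaller subset survived.
        size = len(level[0]) + 1
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        level = [c for c in joined if all((c - {f}) in kept for f in c)]
    return survivors, examined


# Illustrative values; ABC has no entry because it is never examined.
rpi = {
    frozenset("AB"): 0.167,
    frozenset("BC"): 0.167,
    frozenset("AC"): 0.25,
}
survivors, examined = prune_within_locality("ABC", rpi.__getitem__, 0.25)
print(survivors, examined)
```

Only AC survives here, and ABC is never evaluated because AB and BC were pruned, which mirrors the "ABC pruned automatically" note.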
18Experimental Evaluation: Setup
Real Crime Dataset
- Lincoln, NE, years 2002-2007, 40 different crime types and other spatial features.
Experiment Design
- Effect on run time / patterns for different parameters.
Experimental Parameters
- Spatial neighborhood size.
- Feature types.
- Prevalence threshold, data size.
Experimental Platform
- Macintosh running OS X 10.6, 16 GB main memory, 2.26 GHz dual quad core Intel Xeon processors.
- Implementation: C
19Experimental Evaluation: Spatial Neighborhood Size
- What is the effect of spatial neighborhood size on the performance of different algorithms?
- Fixed parameters: dataset size 7498 instances; features 5; prevalence threshold 0.07.
[Plots: Number of RCPs; Run Time]
Trends
- Run time: ML Pruning outperforms PS Enumeration by a factor of 1.5 to 5.
- Number of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 4.13 to 19.
20Experimental Evaluation: Feature Types
- What is the effect of the number of feature types on the performance of different algorithms?
- Fixed parameters: dataset size 7498 instances; spatial neighborhood size 800 feet; prevalence threshold 0.07.
[Plots: Number of RCPs; Run Time]
Trends
- Run time: ML Pruning outperforms PS Enumeration by a factor of 1.2.
- Number of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 1.6 to 3.5.
21Experimental Evaluation: Prevalence Threshold
- What is the effect of the prevalence threshold on the performance of different algorithms?
- Fixed parameters: dataset size 7498 instances; features 5; spatial neighborhood size 700 feet.
[Plots: Run Time; Number of RCPs]
Trends
- Run time: ML Pruning outperforms PS Enumeration by a factor of 1.4 to 105.
- Number of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 0 evaluations to 18.
22Real Dataset Case study
Q: Where do assaults frequently occur around bars? Are there other factors?
Dataset: Lincoln, NE, crime data (Winter 07); neighborhood size 0.25 miles; prevalence threshold 0.07.
[Maps: RCP of Larceny, Bars and Assaults; RCP of Bar and Assaults; RCP of Larceny and Assaults]
Observations
- Assaults are more likely to be found in areas reporting larceny crimes (e.g. 47.6 vs 21.1).
- Bars in downtown are more likely to be crime prone than bars in other areas (e.g. 21.1, 20.1).
23Conclusion and Future work
- Conclusions
- Neighborhood based formulation of Regional Spatial Patterns.
- Regional Participation Index: Measures the local fraction of the global count.
- Vector representation for prevalence localities (other representations possible; convex for simplicity).
- Future Work
- Other representations for prevalence localities.
- Statistical interpretation: LISA statistics / variants of local Ripley's K, multiple hypothesis testing.
- Interpretation using predictive methods (e.g. Geographically Weighted Regression).
- Acknowledgement
- Reviewers of ACM GIS.
- Members of the Spatial Database and Spatial Data Mining group, UMN.
- U.S. Department of Defense.
- Mr. Tom Casady and Kim Koffolt.