A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results

1
A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results
  • Pradeep Mohan, Shashi Shekhar, Zhe Jiang
  • University of Minnesota, Twin Cities, MN
  • James A. Shine, James P. Rogers, Nicole Wayant
  • US Army ERDC, Topographic Engineering Center, Alexandria, VA
Contact: mohan@cs.umn.edu
2
Outline
  • Motivation
  • Problem Formulation
  • Computational Approach
  • Conclusions and Future work

3
Motivation: Spatial Heterogeneity, the Second Law of Geography
Spatial heterogeneity (Goodchild, 2004; Goodchild, 2003)
  • Expectations vary across space.
  • Global models may not explain locally observed phenomena.
  • Need for place-based analysis.

Spatial heterogeneity in retail
  • Traditional data mining: which pairs of items sell together frequently?
  • Answer: diaper in transaction → beer in transaction.
  • Is this association true everywhere?

Answer: Blue-collar neighborhoods.
Global spatial data mining: global co-location patterns
  • Which pairs of spatial features are located together frequently?

Example: gas stations and convenience stores.
Our focus
  • Where do certain pairs of spatial features co-locate frequently?

Example: assaults happen frequently around downtown bars.
4
Applications
  • Crime analysis
    • Localizing frequent crime patterns: opportunities for crime vary across space!
    • Question: Do downtown bars lead to assaults more frequently?
  • Public health
    • Localizing elevated disease risks around putative sources (e.g. mining areas).
    • Question: Where does high asbestos concentration often lead to lung cancer?
  • Ecology
    • Localizing symbiotic relationships between different species of plants/animals.
    • Question: Where are plover birds frequently found in the vicinity of a crocodile?

(Images courtesy of www.amazon.com and www.startribune.com)
Predicting localities of the next crime.
5
Regional Co-location Patterns (RCPs)
  • Input: spatial features, crime reports.
  • Output: RCPs (e.g. <(Bar, Assaults), Downtown>), i.e. subsets of spatial features that are frequently located in certain regions of a study area.

6
Outline
  • Motivation
  • Problem Formulation
  • Basic Concepts
  • Problem Statement
  • Challenges
  • Related Work
  • Computational Approach
  • Conclusions and Future work

7
Basic Concepts: Neighborhoods
Prevalence locality
  • A subset of the spatial framework containing instances of a pattern.
  • Simple representation for visualization: convex hull.
  • Other representations are possible.
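The slide names the convex hull as the simple visualization of a prevalence locality. Below is a minimal sketch that computes that hull from the coordinates of the instances inside one locality, using Andrew's monotone chain algorithm. The choice of algorithm is ours (the slides do not say how hulls were built), and the coordinates are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order,
    starting from the lexicographically smallest point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point; it repeats the other chain's first point.
    return lower[:-1] + upper[:-1]


# Hypothetical coordinates of the instances inside one prevalence locality.
locality = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.25, 0.1)]
print(convex_hull(locality))  # [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]
```

The interior point (0.25, 0.1) is dropped, so the locality is drawn as a triangle.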

Neighborhood graph
  • Given: a spatial neighbor relation (spatial neighborhood size).
  • Nodes: individual event instances.
  • Edges: present if the neighbor relation is satisfied.
  • Based on the event-centric model (Huang, 2004).
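A minimal sketch of the neighborhood graph described above. It assumes the neighbor relation is "Euclidean distance at most a fixed spatial neighborhood size". The instance tuples and the `neighborhood_graph` helper are hypothetical and do not come from the authors' implementation.

```python
import math
from itertools import combinations

# Hypothetical toy instances: (instance_id, event_type, x, y).
INSTANCES = [
    ("A1", "A", 0.0, 0.0),
    ("B1", "B", 0.5, 0.0),
    ("C1", "C", 0.5, 0.5),
    ("A2", "A", 10.0, 10.0),
    ("B2", "B", 10.5, 10.0),
]


def neighborhood_graph(instances, radius):
    """Adjacency sets: one node per event instance, and an undirected edge
    whenever two instances lie within `radius` (assumed Euclidean relation)."""
    graph = {inst[0]: set() for inst in instances}
    for (i, _, xi, yi), (j, _, xj, yj) in combinations(instances, 2):
        if math.hypot(xi - xj, yi - yj) <= radius:
            graph[i].add(j)
            graph[j].add(i)
    return graph


graph = neighborhood_graph(INSTANCES, radius=1.0)
for node in sorted(graph):
    print(node, sorted(graph[node]))
# A1 ['B1', 'C1']
# A2 ['B2']
# B1 ['A1', 'C1']
# B2 ['A2']
# C1 ['A1', 'B1']
```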

8
Basic Concepts: Quantifying Regional Interestingness
Regional Participation Ratio
  • Conditional probability of observing a pattern instance within a locality, given an instance of a feature within that locality.
  • (Example figure)
Regional Participation Index
  • Quantifies the local fraction participating in a relationship.
  • (Example figure)
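A minimal sketch of the two measures for one prevalence locality. Two details are assumptions rather than definitions taken from the slides. First, the ratio's denominator is the global instance count of each event type, following the Conclusions slide's phrase "local fraction of the global count". Second, the index is the minimum ratio over the pattern's event types, by analogy with the participation index of the event-centric model. The function name and data are hypothetical.

```python
def regional_participation(pattern, rows, global_counts):
    """Regional participation ratios and index of `pattern` in one locality.

    rows: pattern instances found in the locality, each a dict mapping
          event type -> instance id (one instance per type).
    global_counts: total instances of each event type in the study area
          (assumed denominator).
    """
    rpr = {}
    for f in pattern:
        participating = {row[f] for row in rows}  # distinct f-instances used
        rpr[f] = len(participating) / global_counts[f]
    # Assumption: the index is the minimum ratio over the pattern's types.
    return rpr, min(rpr.values())


# Hypothetical locality containing two instances of the pattern {Bar, Assault}.
rows = [{"Bar": "bar1", "Assault": "as1"}, {"Bar": "bar1", "Assault": "as2"}]
rpr, rpi = regional_participation(("Bar", "Assault"), rows, {"Bar": 4, "Assault": 6})
print(rpr, rpi)  # index = min(1/4, 2/6) = 0.25
```

Counting distinct instances, rather than pattern instances, keeps one busy bar from inflating the Bar ratio.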
9
Detailed Statement
(Example figure: prevalence threshold 0.25, spatial neighborhood size 1 mile)
  • Given
    • A spatial framework.
    • A collection of Boolean spatial event types and their instances.
    • A minimum interestingness threshold, P_θ.
    • A symmetric and transitive neighbor relation R (e.g. based on spatial neighborhood size).
  • Find
    • All RCPs with prevalence > P_θ.
  • Objective
    • Minimize computational cost.
  • Constraints
    • The spatial framework is heterogeneous.
    • The interest measure captures spatial heterogeneity.
    • Completeness: all prevalent RCPs are reported.
    • Correctness: only prevalent RCPs are reported.

10
Challenges
  • Conflicting requirements: the interest measure must capture spatial heterogeneity while supporting scalable algorithms.
  • Exponential search space: the cardinality of the candidate pattern set is exponential in the number of event types.

Illustration
n (event types)   Candidate patterns   O(2^n)
3                 4                    k1·2^3 (k1 > 0)
4                 11                   k2·2^4 (k2 > 0)
5                 26                   k3·2^5 (k3 > 0)
6                 57                   k4·2^6 (k4 > 0)
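The pattern counts in the table match 2^n - n - 1. That is the number of event-type subsets with at least two members: all 2^n subsets, minus the empty set and the n singletons. A short check follows. `candidate_patterns` is a hypothetical helper name.

```python
from math import comb


def candidate_patterns(n):
    """Count event-type subsets of size >= 2 over n event types."""
    return sum(comb(n, k) for k in range(2, n + 1))


for n in range(3, 7):
    count = candidate_patterns(n)
    # count equals 2**n - n - 1, and count / 2**n plays the role of k_n above.
    print(n, count, count == 2**n - n - 1, count / 2**n)
# 3 4 True 0.5
# 4 11 True 0.6875
# 5 26 True 0.8125
# 6 57 True 0.890625
```

Because the ratio k_n approaches 1, the candidate set grows as 2^n.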
12
Contributions
  • Regional co-location patterns
    • Neighborhood-based formulation.
    • Interest measure: captures the local fraction of events participating in patterns; has attractive computational properties and honors spatial heterogeneity.
  • Computational approach
    • Computational structure: pattern space enumeration.
    • Performance enhancement: maximal-locality-based pruning strategies.
  • Experimental evaluation
    • Performance evaluation using real datasets (Lincoln, NE).
    • Real-world case study.

13
Related Work
Approaches for regional co-location pattern discovery:
  • Spatial neighborhood based (our work)
  • Fitness function clustering (Eick et al., 2008)
  • Zoning based (Celik et al., 2007)
Zoning based / fitness function clustering:
  • Reports one pattern per interesting region based on a criterion (e.g. max).
  • Computational structure and pruning strategies not explored.
  • Clustering is based on real-valued attributes.

14
Outline
  • Motivation
  • Problem Formulation
  • Computational Approach
  • Pattern Space Enumeration
  • Performance Tuning
  • Experimental Evaluation
  • Conclusions and Future work

15
Computational Approach
Prevalence threshold 0.25
(Figure: pattern lattice over event types A, B and C, from Null through A, B, C and AB, AC, BC to ABC. Each pattern is paired with its prevalence localities: <AB, PL1(AB)> to <AB, PL3(AB)>, <AC, PL1(AC)> to <AC, PL3(AC)>, <BC, PL1(BC)> to <BC, PL4(BC)> and <ABC, PL1(ABC)> to <ABC, PL3(ABC)>. Each locality is annotated with an RPI value such as 0.16, 0.25 or 0.33. Legend: compute neighborhoods; identify candidate RCP instance; pruned RCP; accepted RCP.)
Key Idea
  • Enumerate the entire pattern space (expensive!).
  • Examine each pattern and prune.
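A brute-force sketch of the pattern space enumeration idea. Every candidate pattern with two or more event types is examined, its instances are found, and each prevalence locality is accepted or pruned against the threshold. Several details are assumptions, not the authors' definitions:

- Pattern instances are cliques in the neighborhood graph, as in the event-centric model.
- A prevalence locality groups pattern instances that share an event instance.
- RPI is the minimum, over event types, of participating instances divided by the global count.

All names and data are hypothetical.

```python
import math
from itertools import combinations, product

# Hypothetical toy data: (instance_id, event_type, x, y).
INSTANCES = [
    ("A1", "A", 0.0, 0.0), ("B1", "B", 0.5, 0.0), ("C1", "C", 0.5, 0.5),
    ("A2", "A", 10.0, 10.0), ("B2", "B", 10.5, 10.0),
    ("B3", "B", 30.0, 30.0), ("C2", "C", 40.0, 40.0),
]


def near(a, b, radius):
    return math.hypot(a[2] - b[2], a[3] - b[3]) <= radius


def pattern_instances(pattern, instances, radius):
    """One instance per event type, all pairwise neighbors (a clique)."""
    pools = [[i for i in instances if i[1] == f] for f in pattern]
    return [combo for combo in product(*pools)
            if all(near(a, b, radius) for a, b in combinations(combo, 2))]


def prevalence_localities(rows):
    """Assumption: pattern instances sharing an event instance belong to the
    same locality (connected components built incrementally)."""
    groups = []  # list of (instance_ids, rows)
    for row in rows:
        ids, members, rest = {i[0] for i in row}, [row], []
        for g_ids, g_rows in groups:
            if g_ids & ids:  # overlapping group: merge it into the new one
                ids |= g_ids
                members += g_rows
            else:
                rest.append((g_ids, g_rows))
        groups = rest + [(ids, members)]
    return [members for _, members in groups]


def rpi(pattern, rows, counts):
    """Assumed RPI: min over types of distinct participants / global count."""
    return min(len({row[k][0] for row in rows}) / counts[f]
               for k, f in enumerate(pattern))


def enumerate_rcps(instances, radius, threshold):
    types = sorted({i[1] for i in instances})
    counts = {f: sum(i[1] == f for i in instances) for f in types}
    accepted = []
    for size in range(2, len(types) + 1):
        for pattern in combinations(types, size):  # every candidate, no pruning
            rows = pattern_instances(pattern, instances, radius)
            for locality in prevalence_localities(rows):
                value = rpi(pattern, locality, counts)
                if value > threshold:  # strict ">" as in the problem statement
                    accepted.append((pattern, round(value, 3)))
    return accepted


print(enumerate_rcps(INSTANCES, radius=1.0, threshold=0.4))  # [(('A', 'C'), 0.5)]
```

With a threshold of 0.4 only AC survives. Every other candidate, including ABC, is still generated and evaluated, which is what makes full enumeration expensive.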
16
Performance Tuning: Key Ideas
Key Idea
  • The interest measure has special pruning properties in certain subsets of the spatial framework.

Maximal Locality
Key Properties
  • A collection of connected instances.
  • Maximal localities are mutually disjoint.
  • A maximal locality can contain several RCPs.

Key Observations
  • RPI is anti-monotone within maximal localities: pruning a co-location such as AB prunes all its supersets (e.g. ABC, ABCD, etc.).
  • The RPI within a maximal locality is an upper bound on the RPI of its constituent prevalence localities.

17
Performance Tuning
Prevalence threshold 0.25
(Figure: pattern lattice (Null; A, B, C; pairs; ABC) evaluated within maximal localities ML1, ML2 and ML3. The pattern RPIs computed within maximal localities are AB 0.167, BC 0.167 and AC 0.25, then AB 0.25, BC 0.33 and AC 0.25. Two entries are marked "No RCP". The checked candidate prevalence localities are <BC, PL3(BC)> 0.167, <AC, PL1(AC)> 0.25 and <BC, PL4(BC)> 0.167. ABC is pruned automatically.)
  • Compute maximal localities.
  • Completeness (due to the anti-monotonicity of RPI): pruning a pattern within a maximal locality does not prune any valid RCPs.
  • Correctness (due to the upper-bound property of RPI): accepting a pattern involves additional checks so that only prevalent RCPs are reported.
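A sketch of maximal-locality pruning. It uses the same assumptions as a brute-force version would: a Euclidean neighbor relation, and RPI as the minimum participation fraction with global denominators. Maximal localities are computed as connected components of the neighborhood graph. Inside each one, a level-wise (Apriori-style) search generates a pattern only if all of its sub-patterns passed, which relies on RPI being anti-monotone there. Because the maximal-locality RPI is only an upper bound, the surviving candidates would still need the per-prevalence-locality check for correctness; that step is omitted. All names and data are hypothetical.

```python
import math
from itertools import combinations, product

# Hypothetical toy data: (instance_id, event_type, x, y).
INSTANCES = [
    ("A1", "A", 0.0, 0.0), ("B1", "B", 0.5, 0.0), ("C1", "C", 0.5, 0.5),
    ("A2", "A", 10.0, 10.0), ("B2", "B", 10.5, 10.0),
    ("B3", "B", 30.0, 30.0), ("C2", "C", 40.0, 40.0),
]


def near(a, b, radius):
    return math.hypot(a[2] - b[2], a[3] - b[3]) <= radius


def maximal_localities(instances, radius):
    """Connected components of the neighborhood graph, via union-find."""
    parent = list(range(len(instances)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(instances)), 2):
        if near(instances[i], instances[j], radius):
            parent[find(i)] = find(j)
    components = {}
    for i, inst in enumerate(instances):
        components.setdefault(find(i), []).append(inst)
    return list(components.values())


def ml_rpi(pattern, ml, radius, counts):
    """Assumed RPI over a whole maximal locality (global denominators)."""
    pools = [[i for i in ml if i[1] == f] for f in pattern]
    used = [set() for _ in pattern]
    for combo in product(*pools):
        if all(near(a, b, radius) for a, b in combinations(combo, 2)):
            for k, inst in enumerate(combo):
                used[k].add(inst[0])
    return min(len(u) / counts[f] for u, f in zip(used, pattern))


def ml_candidates(instances, radius, threshold):
    """Level-wise search per maximal locality: a size-k candidate is generated
    only if every (k-1)-subset survived (anti-monotone pruning)."""
    counts = {}
    for inst in instances:
        counts[inst[1]] = counts.get(inst[1], 0) + 1
    result = []
    for ml in maximal_localities(instances, radius):
        types = sorted({i[1] for i in ml})
        level, survivors, size = [(f,) for f in types], [], 2
        while level:
            prev = set(level)
            candidates = [c for c in combinations(types, size)
                          if all(s in prev for s in combinations(c, size - 1))]
            level = [c for c in candidates
                     if ml_rpi(c, ml, radius, counts) > threshold]
            survivors += level
            size += 1
        result.append(survivors)
    return result


print(ml_candidates(INSTANCES, radius=1.0, threshold=0.4))
# [[('A', 'C')], [], [], []]
```

In the first maximal locality, AB and BC fall below the 0.4 threshold, so ABC is never generated. This mirrors "ABC pruned automatically" in the figure.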
18
Experimental Evaluation: Setup
Real crime dataset
  • Lincoln, NE, years 2002-2007, 40 different crime types and other spatial features.
Experiment design
  • Effect on run time and number of patterns for different parameters.
Experimental parameters
  • Spatial neighborhood size.
  • Feature types.
  • Prevalence threshold, data size.

Experimental platform
  • Macintosh running OS X 10.6, 16 GB main memory, 2.26 GHz dual quad-core Intel Xeon processors.
  • Implementation: C.

19
Experimental Evaluation: Spatial Neighborhood Size
  • What is the effect of spatial neighborhood size on the performance of the different algorithms?
  • Fixed parameters: dataset size 7498 instances, 5 features, prevalence threshold 0.07.

(Figures: number of RCPs examined; run time.)
Trends
  • Run time: ML (maximal locality) pruning outperforms PS (pattern space) enumeration by a factor of 1.5 to 5.
  • Number of RCPs examined: ML pruning outperforms PS enumeration by a factor of 4.13 to 19.

20
Experimental Evaluation: Feature Types
  • What is the effect of the number of feature types on the performance of the different algorithms?
  • Fixed parameters: dataset size 7498 instances, spatial neighborhood size 800 feet, prevalence threshold 0.07.

(Figures: number of RCPs examined; run time.)
Trends
  • Run time: ML pruning outperforms PS enumeration by a factor of 1.2.
  • Number of RCPs examined: ML pruning outperforms PS enumeration by a factor of 1.6 to 3.5.

21
Experimental Evaluation: Prevalence Threshold
  • What is the effect of the prevalence threshold on the performance of the different algorithms?
  • Fixed parameters: dataset size 7498 instances, 5 features, spatial neighborhood size 700 feet.

(Figures: run time; number of RCPs examined.)
Trends
  • Run time: ML pruning outperforms PS enumeration by a factor of 1.4 to 105.
  • Number of RCPs examined: ML pruning outperforms PS enumeration by a factor of 0 Evaluations - 18.

22
Real Dataset Case Study
Q: Where do assaults frequently occur around bars? Are there other factors?
Dataset: Lincoln, NE, crime data (Winter 07); neighborhood size 0.25 miles; prevalence threshold 0.07.
(Figures: RCP of larceny, bars and assaults; RCP of bars and assaults; RCP of larceny and assaults.)
Observations
  • Assaults are more likely to be found in areas reporting larceny crimes (e.g. 47.6 vs. 21.1).
  • Downtown bars are more likely to be crime prone than bars in other areas (e.g. 21.1, 20.1).

23
Conclusion and Future Work
  • Conclusions
    • Neighborhood-based formulation of regional spatial patterns.
    • Regional Participation Index: measures the local fraction of the global count.
    • Vector representation for prevalence localities (other representations are possible; convex hulls are used for simplicity).
  • Future work
    • Other representations for prevalence localities.
    • Statistical interpretation: LISA statistics / variants of local Ripley's K, multiple hypothesis testing.
    • Interpretation using predictive methods (e.g. Geographically Weighted Regression).
  • Acknowledgement
    • Reviewers of ACM GIS.
    • Members of the Spatial Database and Spatial Data Mining group, UMN.
    • U.S. Department of Defense.
    • Mr. Tom Casady and Kim Koffolt.