Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Description:

Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques. Pradeep Mohan, Department of Computer Science, University of Minnesota, Twin-Cities.


Learn more at: https://www.spatial.cs.umn.edu


Transcript and Presenter's Notes

Title: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

1
Spatio-temporal frequent pattern mining for
public safety: Concepts and Techniques

Pradeep Mohan
Department of Computer Science
University of Minnesota, Twin-Cities
Advisor: Prof. Shashi Shekhar
Thesis Committee: Prof. F. Harvey, Prof. G.
Karypis, Prof. J. Srivastava

Contact: mohan_at_cs.umn.edu
2
Biography

Education
Ph.D. student, Department of Computer Science
and Engineering, University of Minnesota, MN,
2007-Present.
B.E., Department of Computer Science and
Engineering, Birla Institute of Technology,
Mesra, Ranchi, India, 2003-2007.

Major Projects during PhD
US DoJ/NIJ: Mapping and Analysis for Public
Safety
CrimeStat .NET Libraries 1.0: Modularization of
CrimeStat, a tool for the analysis of crime
incidents.
Performance tuning of spatial analysis routines
in CrimeStat.
CrimeStat 3.2a - 3.3: Addition of new modules for
spatial analysis.

US DoD/ERDC/TEC: Cascade models for multi-scale
pattern discovery
Designed new interest measures and formulated
pattern mining algorithms for identifying
patterns from large crime report datasets.

3
Thesis Related Publications

Cascading spatio-temporal pattern discovery
(Chapter 2)
P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers.
Cascading spatio-temporal pattern discovery: A
summary of results. In Proc. of the 10th SIAM
International Conference on Data Mining, 2010 (SDM
2010, full paper acceptance rate 20%).
P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers.
Cascading spatio-temporal pattern discovery. IEEE
Transactions on Knowledge and Data Engineering
(TKDE). (Accepted regular paper, in press; 20%
acceptance rate.)

Regional co-location pattern discovery (Chapter
3)
P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers,
Z. Jiang, N. Wayant. A spatial neighborhood graph
based approach to regional co-location pattern
discovery: A summary of results. In Proc. of the 19th
ACM SIGSPATIAL International Conference on
Advances in GIS, 2011 (ACM SIGSPATIAL 2011, full
paper acceptance rate 23%).

Crime Pattern Analysis Application (Chapter 4)
S. Shekhar, P. Mohan, D. Oliver, Z. Jiang, X. Zhou.
Crime pattern analysis: A spatial frequent
pattern mining approach. In M. Leitner (Ed.), Crime
Modeling and Mapping Using Geospatial
Technologies, Springer (accepted with revisions).

4
Other Publications

Spatio-temporal data analysis
X. Zhou, S. Shekhar, P. Mohan, S. Leiss, P. Snyder.
Discovering interesting sub-paths in
spatiotemporal datasets. In Proc. of the 19th ACM
SIGSPATIAL International Conference on Advances
in GIS, 2011 (ACM SIGSPATIAL 2011, full paper
acceptance rate 23%).

Spatial data analysis
P. Mohan, R. E. Wilson, S. Shekhar, B. George,
N. Levine, M. Celik. Should SDBMS support a join
index?: A case study from CrimeStat. In Proc. of the
16th ACM SIGSPATIAL International Conference on
Advances in GIS, 2008 (ACM SIGSPATIAL 2008, full
paper acceptance rate 19%).
P. Mohan, X. Zhou, S. Shekhar. Quantifying
resolution sensitivity of spatial
autocorrelation: A resolution correlogram
approach. In Proc. of the 7th International
Conference on Geographic Information Science,
2012 (GIScience 2012, full paper).
S. Shekhar, M.R. Evans, J.M. Kang, P. Mohan.
Identifying patterns in spatial information: A
survey of methods. (Accepted) WIREs Data Mining
and Knowledge Discovery, Wiley Interdisciplinary
Reviews, John Wiley and Sons, Inc., 2011 (in
press).

5
Outline

Introduction
Motivation

Problem Statement

Our Approach

Future Work

6
Motivation: Public Safety

Crime generators and attractors

Identifying events (e.g., bar closing, football
games) that lead to increased crime.

Question: What / where are the frequent crime
generators?

Identifying frequent crime hotspots
Law enforcement planning

Courtesy: www.startribune.com

Predicting the next location of burglary.
Question: Where are the crime hotspots?

Predicting crime events
Predictive policing (e.g., predict next location
of offense, forecast crime levels around
conventions, etc.)

Question: What are the crime levels 1 hour after
a football game within a radius of 1 mile?

Courtesy: https://www.llnl.gov/str/September02/Hall.html

Other Applications: Epidemiology
7
Scientific Domain: Environmental Criminology
Routine activity theory and Crime Triangle
Crime pattern theory
Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepnum=8
Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16

Crime Event: Motivated offender, vulnerable
victim (available at an appropriate location and
time), absence of a capable guardian.

Crime Generators: Offenders and targets come
together in time and place at large gatherings (e.g.,
bars, football games).

Crime Attractors: Places offering many
criminal opportunities; offenders may relocate
to these areas (e.g., drug areas).

8
Outline

Introduction

Problem Statement
Spatio-temporal frequent pattern mining problem
Challenges

Our Approach

Future Work

9
Spatio-temporal frequent pattern mining problem

Given:
Spatial / spatio-temporal framework.
Crime reports with type, location and / or time.
Spatial features of interest (e.g., bars).
Interest measure threshold (P?).
Spatial / spatio-temporal neighbor relation.
Find:
Frequent patterns with interestingness > P?.
Objective:
Minimize computation costs.
Constraints:
Correctness and completeness.
Statistical interpretation (i.e., account for
autocorrelation or heterogeneity).

10
Illustration: Output
Regional co-location patterns (Inputs: spatial
neighborhood = 1 mile, threshold = 0.25)
11
Challenges

Spatio-temporal Semantics
Continuity of space / time
Partial order
Conflicting Requirements
Statistical interpretation
Computational scalability
Computational Cost
Exponential set of candidate patterns
# Patterns = Exponential(# event types)

[Figure: Time partitioning misses relationships]
[Figure: Space partitioning misses relationships]
12
Our Contributions

New spatio-temporal frequent pattern families.
Ex.: Cascading ST patterns and regional
co-location patterns.

Novel interest measures that guarantee statistical
interpretation and are computable in polynomial time.

Scalable algorithms based on properties of
spatio-temporal data and interest measures.

Experimental evaluation using synthetic and real
crime datasets.

13
Outline

Introduction

Problem Statement

Our Approach
Big Picture
Cascading Spatio-temporal pattern discovery
Other Frequent Pattern Families

Future Work

14
Spatio-temporal frequent pattern mining (SFPM):
Process of discovering interesting, useful and
non-trivial patterns from spatio-temporal data.

Taxonomy of spatio-temporal frequent patterns
(X = unexplored)

(Input data) | | Spatial | Spatio-temporal (ST)
Pattern Semantics | Unordered | Co-location patterns | ST co-occurrences
Pattern Semantics | Totally ordered | X | ST sequences
Pattern Semantics | Partially ordered | X | Cascading ST patterns
Statistical Foundation | Autocorrelation | Co-location patterns | Cascading ST patterns
Statistical Foundation | Heterogeneity | Regional co-location patterns | X

Today's Focus
15
Cascading ST pattern (CSTP)

Input: Crime reports with location and time.

Output: CSTP
Partially ordered subsets of ST event types.
Located together in space.
Occur in stages over time.

16
Related Pattern Semantics: ST Data Mining

ST Co-occurrence (Celik et al. 2008; Cao et al.
2006)
Designed for moving-object datasets by treating
trajectories as location time series.
Performs partitioning over space and time.

ST Sequence (Huang et al. 2008; Cao et al. 2005)
Totally ordered patterns modeled as a chain.
Does not account for multiply connected
patterns (e.g., nonlinear).
Misses non-linear semantics.
No ST statistical interpretation.

17
Limitations of Related ST Pattern Semantics

ST Sequence
Total order
Ex. (B→A, A→C)
No ST statistical interpretation.

Limitations
Absence of partial order
Ex. (B→A, B→C, A→C)

18
Interpretation Model: Directed Neighbor Graph
(DNG)

Nodes: Individual events
Directed edge (N1 → N2) iff
Neighbor(N1, N2)
and
After(N2, N1)
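
The DNG definition can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the slide does not say how Neighbor and After are evaluated, so the sketch assumes Neighbor means Euclidean distance within `max_dist` together with a time gap of at most `max_gap`, and After means a strictly later timestamp. The `Event` tuple, the `build_dng` function, and the sample coordinates are all hypothetical.

```python
from math import hypot
from typing import NamedTuple

class Event(NamedTuple):
    name: str    # instance label, e.g. "B.1"
    etype: str   # event type, e.g. "B"
    x: float
    y: float
    t: float

def build_dng(events, max_dist, max_gap):
    """Return directed edges (n1.name, n2.name): n2 is near n1 and happens after it."""
    edges = []
    for n1 in events:
        for n2 in events:
            near = hypot(n1.x - n2.x, n1.y - n2.y) <= max_dist   # assumed Neighbor test
            after = 0 < n2.t - n1.t <= max_gap                   # assumed After test
            if near and after:
                edges.append((n1.name, n2.name))
    return edges

# Hypothetical events: one bar closing (B), two assaults (A), one act of vandalism (C).
events = [
    Event("B.1", "B", 0.0, 0.0, 0.0),
    Event("A.1", "A", 0.3, 0.4, 1.0),
    Event("C.1", "C", 0.3, 0.0, 2.0),
    Event("A.2", "A", 5.0, 5.0, 1.5),   # too far away to join any edge
]
edges = build_dng(events, max_dist=1.0, max_gap=2.0)
print(edges)
```

The nested loop tests every ordered pair, so it is quadratic in the number of events; it is meant only to make the edge rule concrete.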

19
Statistical Foundation: Interest Measures

Instances of CSTP P1 (B→A, B→C, A→C) are
(B1→A1, B1→C1, A1→C1)
(B1→A3, B1→C2, A3→C2)
? ?(B1→A1, A1→C2, B1→C2)
Cascade Participation Ratio: CPR(CSTP, M)
Conditional probability of an instance of CSTP
in the neighborhood, given an instance of event type
M.
Examples
Cascade Participation Index: CPI(CSTP)
min( CPR(CSTP, M) ) over all M in CSTP
Example
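
As a rough check of the two definitions, the sketch below computes CPR and CPI for size-2 patterns, where each instance is a single directed edge. The edge lists and the event counts (5 A events, 2 B events, 4 C events) are read off the filtering-strategy illustration later in the deck. The function names, and the convention that an event's type is the prefix before the dot, are assumptions made for illustration. CPR is estimated here as the fraction of events of type M that take part in at least one instance, which is one plausible reading of the conditional probability on the slide.

```python
def event_type(name):
    """'B.1' -> 'B' (assumed naming convention: type, dot, instance number)."""
    return name.split(".")[0]

def cpr(instances, etype, type_counts):
    """Share of events of type `etype` that appear in at least one pattern instance."""
    participants = {e for inst in instances for e in inst if event_type(e) == etype}
    return len(participants) / type_counts[etype]

def cpi(instances, pattern_types, type_counts):
    """Minimum CPR over all event types in the pattern."""
    return min(cpr(instances, t, type_counts) for t in pattern_types)

# Event counts and edge lists taken from the deck's illustration.
type_counts = {"A": 5, "B": 2, "C": 4}
instances = {
    ("B", "A"): [("B.1", "A.1"), ("B.1", "A.3"), ("B.2", "A.2"), ("B.2", "A.4")],
    ("B", "C"): [("B.1", "C.2"), ("B.1", "C.3"), ("B.2", "C.1")],
    ("A", "C"): [("A.1", "C.2"), ("A.3", "C.3"), ("A.1", "C.3"), ("A.3", "C.4")],
    ("C", "A"): [("C.1", "A.5")],
}
cpi_values = {p: cpi(insts, p, type_counts) for p, insts in instances.items()}
for (src, dst), value in cpi_values.items():
    print(f"CPI({src}->{dst}) = {value:.2f}")
```

Under this reading the four patterns come out at 0.80, 0.75, 0.40 and 0.20, the same values the deck lists for B→A, B→C, A→C and C→A.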

20
Analytical Evaluation: Statistical Interpretation
Spatial statistics: ST K-function (Diggle et al.
1995)

Cascade Participation Index (CPI) is an upper
bound to the ST K-function per unit volume.

Example
ST K (B→A) | 2/6 = 0.33 | 3/6 = 0.5 | 6/6 = 1
CPI (B→A) | 2/3 = 0.67 | 1 | 1
21
Comparison with Related Interest Measures

Measure | Key Property
Frequency | Double counting of pattern instances
Maximum Independent Set (MIS) size (Kuramochi and Karypis, 2004) | NP-complete
Scoring criterion for Bayesian networks (Neapolitan, 2003; Chickering, 1996) | NP-complete; learning requires prior specification
Lower bound on vertex label frequency | Frequency-based interpretation

Measure | Value
Frequency | 3 / (what is the # of transactions?)
MIS | 2
Lower bound on frequency | min{1, 2, 2} = 1
22
Computational Structure: CSTP Miner Algorithm

Basic Idea

Initialization

for k in (1, 2, 3, ..., K-1) and while prevalent CSTPs are found
do

Generate size k candidates.

Compute CSTP instances / materialize part of DNG.

Calculate interest measure and select prevalent
CSTPs.

Item sets in association rule mining.
Chemical compounds / subgraphs in graph mining.
Directed acyclic graphs in CSTP mining:
not part of a conventional apriori setting.
23
CSTP Miner Algorithm: Illustration
CPI threshold = 0.33
[Figure: candidate lattice rooted at Null, annotated with CPI values (0, 0.4, 0.8, 0.75, 0.2, 0, 0.4, 0.4, 0.8, 0.4), built by a spatio-temporal join over events A.1-A.5, B.1-B.2 and C.1-C.4]
24
Computational Structure: CSTP Miner Algorithm
Fixed parameters: spatial neighborhood = 0.62
miles, temporal neighborhood = 1 hr, CPI
threshold = 0.0055

Key Bottlenecks

Interest measure evaluation

Exponential pattern space

Computational Strategies

Reduce irrelevant interest measure evaluation:
filtering strategies

Compute interest measure efficiently:
Time Ordered Nested Loop strategy,
Space-Time Partition Join strategy

25
CSTP Miner Algorithm: Interest Measure Evaluation

ST join strategies: perform each interest
measure computation efficiently.
Time Ordered Nested Loop (TONL) strategy
Space-Time Partitioning (STP) strategy
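
The slides name the TONL strategy without spelling it out. One plausible reading, sketched below under that assumption, is a nested loop over events sorted by time in which the inner loop stops as soon as the time gap exceeds the temporal neighborhood. The function names, the `(name, x, y, t)` tuple layout, and the sample data are hypothetical; the brute-force join is included only to check that the early exit does not drop edges.

```python
from math import hypot

def tonl_join(events, max_dist, max_gap):
    """events: (name, x, y, t) tuples. Returns directed edges (earlier, later)."""
    ordered = sorted(events, key=lambda e: e[3])      # sort once by time
    edges = []
    for i, (n1, x1, y1, t1) in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            n2, x2, y2, t2 = ordered[j]
            if t2 - t1 > max_gap:
                break                                 # all later events are further away in time
            if t2 > t1 and hypot(x1 - x2, y1 - y2) <= max_dist:
                edges.append((n1, n2))
    return edges

def brute_force_join(events, max_dist, max_gap):
    """Reference join that tests every ordered pair."""
    return [(a[0], b[0]) for a in events for b in events
            if 0 < b[3] - a[3] <= max_gap
            and hypot(a[1] - b[1], a[2] - b[2]) <= max_dist]

# Hypothetical events, deliberately not in time order.
events = [
    ("C.1", 0.2, 0.1, 2.0),
    ("B.1", 0.0, 0.0, 0.0),
    ("A.1", 0.3, 0.4, 1.0),
    ("A.2", 5.0, 5.0, 1.5),
    ("C.2", 0.1, 0.1, 9.0),
]
edges = tonl_join(events, max_dist=1.0, max_gap=2.0)
reference = brute_force_join(events, max_dist=1.0, max_gap=2.0)
print(edges)
print(sorted(edges) == sorted(reference))
```

The early `break` is what the time ordering buys: once one event falls outside the temporal window, no later event needs a distance test.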

[Figure: ST join by plane sweep; events A.1-A.5, B.1-B.2 and C.1-C.4 plotted in space versus time, with the volume of the ST neighborhood marked; # edges = 13]
26
CSTP Miner Algorithm: Alternative Ideas

Can the neighborhood graph be pre-computed?

Trade-off: storage versus online computation
Cost of storage
Pre-computed graph: O(Edges + Nodes)
Example: 13 + 11 = 24
On-the-fly: O(Nodes)
Example: 11
Cost of computation
Pre-computed graph: O(Edges + Nodes)
Example: 24
On-the-fly: O(Nodes log(Nodes))
Example: 11 x log2(11) ≈ 38

Other factors
Dense vs. sparse data
Positive ST autocorrelation

27
CSTP Miner Algorithm: Filtering Strategies

Key rationale: enhance savings by filtering
non-prevalent candidates early.

Upper bound (UB) filter

Key Idea
CPI has an anti-monotone upper bound.

Multi-resolution ST (MST) filter

Key Idea
There exists a low-dimensional embedding in
space and time.
Overestimate CPI by coarsening the ST dataset.
If overestimate(CPI) < threshold, the candidate is pruned.
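
The coarsening step of the MST filter can be illustrated for a size-2 pattern. The slides do not give the grid construction, so the sketch assumes a grid whose cells are `max_dist` wide in space and `max_gap` long in time, and treats two events as coarse neighbors when their cells are identical or adjacent. Every true edge then also links its two cells, so the coarse value can never fall below the exact CPI. All names and data below are hypothetical.

```python
from collections import Counter
from math import floor, hypot

def exact_cpi(src, dst, max_dist, max_gap):
    """CPI of the pattern src-type -> dst-type; events are (x, y, t) tuples."""
    src_hit, dst_hit = set(), set()
    for i, (x1, y1, t1) in enumerate(src):
        for j, (x2, y2, t2) in enumerate(dst):
            if 0 < t2 - t1 <= max_gap and hypot(x1 - x2, y1 - y2) <= max_dist:
                src_hit.add(i)
                dst_hit.add(j)
    return min(len(src_hit) / len(src), len(dst_hit) / len(dst))

def coarse_cpi(src, dst, max_dist, max_gap):
    """Overestimate of CPI computed on grid cells instead of individual events."""
    def cell(e):
        return (floor(e[0] / max_dist), floor(e[1] / max_dist), floor(e[2] / max_gap))
    def linked(a, b):   # b's cell is spatially adjacent to a's and not earlier in time
        return abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1 and 0 <= b[2] - a[2] <= 1
    src_cells = Counter(cell(e) for e in src)
    dst_cells = Counter(cell(e) for e in dst)
    src_hit = sum(n for c, n in src_cells.items() if any(linked(c, d) for d in dst_cells))
    dst_hit = sum(n for d, n in dst_cells.items() if any(linked(c, d) for c in src_cells))
    return min(src_hit / len(src), dst_hit / len(dst))

# Hypothetical A and C events as (x, y, t).
a_events = [(0.1, 0.1, 0.0), (0.9, 0.9, 0.1), (3.1, 3.1, 0.0), (7.0, 7.0, 0.0), (9.0, 1.0, 0.0)]
c_events = [(0.2, 0.2, 0.5), (1.1, 1.0, 0.6), (3.9, 3.9, 0.5), (7.1, 7.1, 0.3)]

exact = exact_cpi(a_events, c_events, max_dist=1.0, max_gap=1.0)
bound = coarse_cpi(a_events, c_events, max_dist=1.0, max_gap=1.0)
threshold = 0.9
print(f"exact CPI = {exact:.2f}, coarse overestimate = {bound:.2f}")
print("pruned without exact evaluation" if bound < threshold else "needs exact evaluation")
```

With a threshold of 0.9 the candidate is discarded on the coarse value alone. With a lower threshold such as 0.7 the filter lets it through and the exact CPI still has to be computed, which is the expected behavior of an overestimate.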

28
CSTP Miner Algorithm: Filtering Strategies

Multi-resolution ST filter

Summarizing on a coarser neighborhood yields
compression in most cases.
CPI threshold = 0.33

Original resolution (instance pairs per pattern):
B→A | B→C | A→C | C→A
B.1 A.1 | B.1 C.2 | A.1 C.2 | C.1 A.5
B.1 A.3 | B.1 C.3 | A.3 C.3 |
B.2 A.2 | B.2 C.1 | A.1 C.3 |
B.2 A.4 | | A.3 C.4 |
0.8 | 0.75 | 0.4 | 0.2

Coarse resolution (cell pairs per pattern):
B→A | B→C | A→C | C→A
(0,0)(1,0) | (0,2)(1,2) | (1,2)(1,2) | (1,1)(2,0)
(0,2)(1,2) | (0,0)(1,1) | (1,0)(1,1) | (2,1)(2,0)
| | (1,2)(2,1) |
| | (1,0)(2,1) |
0.8 | 0.75 | 0.8 | 0.2
29
Experimental Evaluation: Experiment Setup
Goals
1. Compare different design decisions of
the CSTPM algorithm (performance: run-time).
2. Test effect of parameters on
performance: number of event types,
dataset size, clumpiness degree.
Experiment platform: CPU 3.2 GHz, RAM 32 GB, OS Linux,
Matlab 7.9
30
Experimental Evaluation: Datasets
Lincoln, NE Dataset
Real Data

Data size: 5 datasets
Drawn by increments of 2 months
5,000-33,000 instances
Event types
Drawn by increments of 5 event types
5-25 event types

Synthetic Data

Data size: 5 datasets
5,000-26,000 instances
Event types
5-25 event types
Clumpiness degree
5-25 instances per event type per cell

31
Experimental Evaluation: Join Strategy Performance
Question: What is the effect of dataset size on
performance of join strategies?
Fixed parameters: real data (CPI 0.15, 0.31
miles, 10 days); synthetic data (0.5, 25, 25)
Trends: ST Partitioning improves performance by a
factor of 5-10 on synthetic data and by a factor
of 3 on real data.
32
Experimental Evaluation: Join Strategy Performance
Experiment 2: What is the effect of the number of event
types on performance of join strategies?
Fixed parameters: real data (CPI 0.15, 0.31
miles, 10 days); synthetic data (0.5, 25, 25)
Trends: ST Partitioning improves performance by a
factor of 10 on synthetic data and by a factor of
2.5 on real data.
33
Experimental Evaluation: Filtering Strategy
Performance
Experiment 3: What is the effect of dataset size
on performance of filtering strategies?
Fixed parameters: real data (CPI 0.15, 0.435
miles, 10 days); synthetic data (0.65, 70, 70)
Trends: Filtering improves performance by a
factor of 5 on synthetic data and by a factor of 1.5
on real data.
34
Experimental Evaluation: Filtering Strategy
Performance
Question: What is the effect of the number of event types
on performance of filtering strategies?
Fixed parameters: real data (CPI 0.15, 0.435
miles, 10 days); synthetic data (0.65, 70, 70)
Trends: Filtering improves performance by a
factor of 2.5 on synthetic data and by a factor of
1.3 on real data.
35
Experimental Evaluation: Filtering Strategy
Performance
Question: What is the effect of clumpiness
degree on different design decisions?
Fixed parameters: CPI 0.5, 15.53 miles, 1.04
days

Trends
Filtering improves performance by a factor of 40.
ST Partitioning improves performance by a factor
of 10.

36
Lincoln, NE Crime Dataset: Case Study

Is bar closing a generator for crime-related
CSTPs?

Bar locations in Lincoln, NE
Questions

Observation: Crime peaks around bar closing!

Is bar closing a crime generator?
Are there other generators (e.g., Saturday
nights)?

K-S test: Saturday night is significantly different
from normal-day bar closing (p-value = 1.249 x 10^-7,
K = 0.41)
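
For readers unfamiliar with the K-S test, the statistic is the largest gap between the empirical distribution functions of the two samples. The pure-Python sketch below computes only that statistic, not the p-value. The two samples are invented hourly crime counts, not the Lincoln data, and `ks_statistic` is a hypothetical helper name. In practice a library routine such as `scipy.stats.ks_2samp` would return both the statistic and the p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a + b:                              # the gap can only change at observed values
        cdf_a = bisect_right(a, v) / len(a)      # fraction of sample A that is <= v
        cdf_b = bisect_right(b, v) / len(b)      # fraction of sample B that is <= v
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Invented hourly crime counts around bar closing (illustration only).
saturday_night = [5, 7, 8, 9, 11]
normal_day = [2, 3, 4, 5, 6]

d = ks_statistic(saturday_night, normal_day)
print(f"K-S statistic = {d:.2f}")
```

A statistic near 1 means the two samples barely overlap, and a statistic near 0 means their distributions are nearly identical. Whether a given value is significant depends on the sample sizes, which is what the p-value on the slide captures.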
37
Lincoln, NE crime dataset Case study
38
Lincoln, NE Crime Dataset: Case Study

Pop I | Pop II | KS | P-value | α = 0.05 | α = 0.2
Sat Night | All Year | 0.4187 | 1.249 x 10^-7 | Yes | Yes
Football Night | All Year | 0.3400 | 0.1067 | No | Yes
Sat Night | Football Night | 0.1987 | 0.7899 | No | No
39
Outline

Introduction

Problem Statement

Our Approach
Big Picture
Cascading Spatio-temporal pattern discovery
Other Frequent Pattern Families

Future Work

40
Regional co-location patterns (RCP)

Input: Spatial features, crime reports.
Output: RCP (e.g., <(Bar, Assaults), Downtown>)
Subsets of spatial features.
Frequently located in certain regions of a study
area.

41
Statistical Foundation: Accounting for
Heterogeneity

Conditional probability of observing a pattern
instance within a locality given an instance of a
feature within that locality.

Regional Participation Ratio
Example
Regional Participation Index
Example
Quantifies the local fraction participating in a
relationship.
42
Performance Tuning: Key Ideas
Key Idea

Interest measure shows special pruning
properties in certain subsets of the spatial
framework.

Maximal Locality
Key Properties

Collection of connected instances.
Maximal localities are mutually disjoint.
Contains several RCPs.

Key Observations

RPI shows the anti-monotonicity property within
maximal localities.
Pruning a co-location, AB, prunes all its
supersets (e.g., ABC, ABCD, etc.).

RPI within a maximal locality is an upper bound
to RPI of constituent prevalence localities.
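
The two key properties, "collection of connected instances" and "mutually disjoint", match the connected components of a spatial neighbor graph. The sketch below computes such components with a small union-find. Treating maximal localities as connected components is an interpretation of the slide, not a definition taken from it, and the function name, the distance-based neighbor test, and the sample points are hypothetical.

```python
from math import hypot

def maximal_localities(points, max_dist):
    """points: dict name -> (x, y). Returns connected components of the neighbor graph."""
    parent = {name: name for name in points}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]   # path halving keeps trees shallow
            name = parent[name]
        return name

    names = list(points)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (x1, y1), (x2, y2) = points[a], points[b]
            if hypot(x1 - x2, y1 - y2) <= max_dist:   # assumed neighbor relation
                parent[find(a)] = find(b)             # merge the two components

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())

# Hypothetical feature instances: two clusters and one isolated point.
points = {
    "A.1": (0.0, 0.0), "B.1": (0.5, 0.0), "C.1": (1.0, 0.0),
    "A.2": (10.0, 10.0), "B.2": (10.4, 10.0),
    "C.2": (30.0, 30.0),
}
localities = maximal_localities(points, max_dist=0.6)
print(sorted(sorted(group) for group in localities))
```

Because components never share an instance, a pattern can be evaluated and pruned inside each one independently, which is consistent with the pruning observations on the slide.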

43
Performance Tuning
Prevalence threshold = 0.25
[Figure: candidate lattice (Null; A, B, C) evaluated within maximal localities ML1, ML2 and ML3. Pair values shown: AB 0.167, BC 0.167, AC 0.25, AB 0.25, BC 0.33, AC 0.25; two branches are marked "No RCP". Prevalence-locality entries shown: <BC, PL3(BC)> 0.167, <AC, PL1(AC)> 0.25, <BC, PL4(BC)> 0.167. Callouts: "Compute Maximal Locality", "ABC pruned automatically", "Due to upper bound property of RPI", "Due to anti-monotonicity of RPI".]

Completeness
Pruning a pattern within a maximal locality does
not prune any valid RCPs.

Correctness
Accepting a pattern involves additional checks
so that only prevalent RCPs are reported.
44
Experimental Evaluation: Spatial Neighborhood Size

What is the effect of spatial neighborhood size
on performance of different algorithms?
Fixed parameters: dataset size = 7498 instances,
# features = 5, prevalence threshold = 0.07

[Plots: # of RCPs; run time]
Trends

Run time: ML Pruning outperforms PS Enumeration
by a factor of 1.5 - 5.
# of RCPs examined: ML Pruning outperforms PS
Enumeration by a factor of 4.13 - 19.

45
Experimental Evaluation: Feature Types

What is the effect of the number of feature types
on performance of different algorithms?
Fixed parameters: dataset size = 7498 instances,
spatial neighborhood size = 800 feet, prevalence
threshold = 0.07

[Plots: # of RCPs; run time]
Trends

Run time: ML Pruning outperforms PS Enumeration
by a factor of 1.2.
# of RCPs examined: ML Pruning outperforms PS
Enumeration by a factor of 1.6 - 3.5.

46
RCPs from Lincoln Crime Data
This result shows the interaction between Alcohol
and Vandalism apart from highlighting outbreaks
47
Conclusions

Proposed SFPM techniques (e.g., Cascading ST
Patterns and Regional Co-location patterns) honor
ST Semantics (e.g., Partial order, Continuity).
Interest measures achieve a balance between
statistical interpretation and computational
scalability.
Algorithmic strategies exploiting properties of
ST data (e.g., multiresolution filter) and
properties of interest measures enhance
computational savings.

48
Future Work: Short and Medium Term
(X = unexplored)

(Input data) | | Spatial | Spatio-temporal (ST)
Pattern Semantics | Unordered | ? | ?
Pattern Semantics | Totally ordered | X | ?
Pattern Semantics | Partially ordered | X | CSTP discovery
Statistical Foundation | Autocorrelation | ? | CSTP discovery
Statistical Foundation | Heterogeneity | RCP discovery | X
Underlying Framework | Euclidean | RCP discovery | CSTP discovery
Underlying Framework | Non-Euclidean (networks) | X | X
Neighbor Relation | User specified | RCP discovery | CSTP discovery
Neighbor Relation | Algorithm determined | X | X
Interestingness Criterion | Interest measure threshold | RCP discovery | CSTP discovery
Interestingness Criterion | Threshold free | X | X
Type of Data | Boolean / categorical | RCP discovery | CSTP discovery
Type of Data | Quantitative data (e.g., climate) | X | X
49
Future Work: Long Term

Exploring interpretation of discovered patterns
by law enforcement.
ST predictive analytics, predictive models based
on SFPM, and predictive policing.
Towards geo-social analytics for policing (e.g.,
criminal flash mobs, gangs, groups of offenders
committing crimes).
New ST frequent pattern mining algorithms based
on depth-first graph enumeration.
ST frequent pattern mining techniques that
account for patron demographic levels.
Explore evaluation of choropleth maps via ST
frequent pattern mining.

50
Acknowledgment

Members of the Spatial Database and Data Mining
Research Group, University of Minnesota,
Twin-Cities.
This work was supported by grants from the U.S. Army,
NGA and U.S. DOJ.
Advisor: Prof. Shashi Shekhar, Computer Science,
University of Minnesota.
Thesis committee.
U.S. DOJ National Institute of Justice: Mr.
Ronald E. Wilson (Program Manager, Mapping and
Analysis for Public Safety), Dr. Ned Levine (Ned
Levine and Associates, CrimeStat Program).
U.S. Army Topographic Engineering Center: Dr.
J.A. Shine (Mathematician and Statistician,
Geospatial Research and Engineering Division)
and Dr. J.P. Rogers (Additional Director,
Topographic Engineering Center).
Mr. Tom Casady, Public Safety Director (formerly
Lincoln Police Chief), Lincoln, NE, USA.