Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques. Pradeep Mohan, Department of Computer Science, University of Minnesota, Twin-Cities.

1
Spatio-temporal frequent pattern mining for
public safety: Concepts and Techniques
  • Pradeep Mohan
  • Department of Computer Science
  • University of Minnesota, Twin-Cities
  • Advisor: Prof. Shashi Shekhar
  • Thesis Committee: Prof. F. Harvey, Prof. G.
    Karypis, Prof. J. Srivastava

Contact: mohan_at_cs.umn.edu
2
Biography
  • Education
  • Ph.D. student, Department of Computer Science
    and Engineering, University of Minnesota, MN,
    2007 - Present.
  • B.E., Department of Computer Science and
    Engineering, Birla Institute of Technology,
    Mesra, Ranchi, India, 2003-2007.
  • Major Projects during PhD
  • US DoJ/NIJ: Mapping and Analysis for Public
    Safety
  • CrimeStat .NET Libraries 1.0: Modularization of
    CrimeStat, a tool for the analysis of crime
    incidents.
  • Performance tuning of spatial analysis routines
    in CrimeStat.
  • CrimeStat 3.2a - 3.3: Addition of new modules for
    spatial analysis.
  • US DoD/ERDC/TEC: Cascade models for multi-scale
    pattern discovery
  • Designed new interest measures and formulated
    pattern mining algorithms for identifying
    patterns from large crime report datasets.

3
Thesis Related Publications
  • Cascading spatio-temporal pattern discovery
    (Chapter 2)
  • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers.
    Cascading spatio-temporal pattern discovery: A
    summary of results. In Proc. of 10th SIAM
    International Conference on Data Mining 2010 (SDM
    2010, full paper acceptance rate 20%).
  • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers.
    Cascading spatio-temporal pattern discovery. IEEE
    Transactions on Knowledge and Data Engineering
    (TKDE). (Accepted regular paper, in press; 20%
    acceptance rate.)
  • Regional co-location pattern discovery (Chapter
    3)
  • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers,
    Z. Jiang, N. Wayant. A spatial neighborhood graph
    based approach to regional co-location pattern
    discovery: Summary of results. In Proc. of 19th
    ACM SIGSPATIAL International Conference on
    Advances in GIS 2011 (ACM SIGSPATIAL 2011, full
    paper acceptance rate 23%).
  • Crime Pattern Analysis Application (Chapter 4)
  • S. Shekhar, P. Mohan, D. Oliver, Z. Jiang, X. Zhou.
    Crime pattern analysis: A spatial frequent
    pattern mining approach. In M. Leitner (Ed.), Crime
    Modeling and Mapping Using Geospatial
    Technologies, Springer (accepted with revisions).

4
Other Publications
  • Spatio-temporal data analysis
  • X. Zhou, S. Shekhar, P. Mohan, S. Leiss, P. Snyder.
    Discovering interesting sub-paths in
    spatiotemporal datasets. In Proc. of 19th ACM
    SIGSPATIAL International Conference on Advances
    in GIS 2011 (ACM SIGSPATIAL 2011, full paper
    acceptance rate 23%).
  • Spatial data analysis
  • P. Mohan, R.E. Wilson, S. Shekhar, B. George,
    N. Levine, M. Celik. Should SDBMS support a join
    index? A case study from CrimeStat. In Proc. of
    16th ACM SIGSPATIAL International Conference on
    Advances in GIS 2008 (ACM SIGSPATIAL 2008, full
    paper acceptance rate 19%).
  • P. Mohan, X. Zhou, S. Shekhar. Quantifying
    resolution sensitivity of spatial
    autocorrelation: A resolution correlogram
    approach. In Proc. of 7th International
    Conference on Geographic Information Science,
    2012 (GIScience 2012, full paper).
  • S. Shekhar, M.R. Evans, J.M. Kang, P. Mohan.
    Identifying patterns in spatial information: A
    survey of methods. (Accepted) WIREs Data Mining
    and Knowledge Discovery, Wiley Interdisciplinary
    Reviews, John Wiley and Sons, Inc., 2011 (in
    press).

5
Outline
  • Introduction
  • Motivation
  • Problem Statement
  • Our Approach
  • Future Work

6
Motivation: Public Safety
  • Crime generators and attractors
  • Identifying events (e.g., bar closing, football
    games) that lead to increased crime.

Question: What / where are the frequent crime
generators?
  • Identifying frequent crime hotspots
  • Law enforcement planning
  • Courtesy: www.startribune.com

Predicting the next location of burglary.
Question: Where are the crime hotspots?
  • Predicting crime events
  • Predictive policing (e.g., predict next location
    of offense, forecast crime levels around
    conventions, etc.)

Question: What are the crime levels 1 hour after
a football game within a radius of 1 mile?
  • Courtesy: https://www.llnl.gov/str/September02/Hall.html

Other Applications: Epidemiology
7
Scientific Domain: Environmental Criminology
Routine activity theory and Crime Triangle
Crime pattern theory
Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepnum=8
Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16
  • Crime Event: Motivated offender, vulnerable
    victim (available at an appropriate location and
    time), absence of a capable guardian.
  • Crime Generators: Offenders and targets come
    together in time and place, large gatherings (e.g.,
    bars, football games).
  • Crime Attractors: Places offering many
    criminal opportunities; offenders may relocate
    to these areas (e.g., drug areas).

8
Outline
  • Introduction
  • Problem Statement
  • Spatio-temporal frequent pattern mining problem
  • Challenges
  • Our Approach
  • Future Work

9
Spatio-temporal frequent pattern mining problem
  • Given
  • Spatial / Spatio-temporal framework.
  • Crime Reports with type, location and / or time.
  • Spatial Features of interest (e.g. Bars).
  • Interest measure threshold (P?)
  • Spatial / Spatio-temporal neighbor relation.
  • Find
  • Frequent patterns with interestingness > P?
  • Objective
  • Minimize computation costs.
  • Constraints
  • Correctness and Completeness.
  • Statistical Interpretation (i.e. account for
    autocorrelation or heterogeneity)

10
Illustration: Output
Regional co-location patterns (Inputs: spatial
neighborhood 1 mile, threshold 0.25)
11
Challenges
Time partitioning misses relationships
  • Spatio-temporal Semantics
  • Continuity of space / time
  • Partial order
  • Conflicting Requirements
  • Statistical Interpretation
  • Computational Scalability
  • Computational Cost
  • Exponential set of Candidate patterns

Space partitioning misses relationships
Number of patterns: exponential in the number of event types
12
Our Contributions
  • New Spatio-temporal frequent pattern families.
  • Ex.: Cascading ST patterns and regional
    co-location patterns.
  • Novel interest measures that guarantee statistical
    interpretation and are computable in polynomial time.
  • Scalable algorithms based on properties of
    spatio-temporal data and interest measures.
  • Experimental evaluation using synthetic and real
    crime datasets.

13
Outline
  • Introduction
  • Problem Statement
  • Our Approach
  • Big Picture
  • Cascading Spatio-temporal pattern discovery
  • Other Frequent Pattern Families
  • Future Work

14
Spatio-temporal frequent pattern mining (SFPM):
Process of discovering interesting, useful and
non-trivial patterns from spatiotemporal data.
Taxonomy of spatio-temporal frequent patterns

                                         Input data: Spatial            Input data: Spatio-temporal (ST)
Pattern semantics: Unordered             Co-location patterns           ST co-occurrences
Pattern semantics: Totally ordered       X                              ST sequences
Pattern semantics: Partially ordered     X                              Cascading ST patterns
Statistical foundation: Autocorrelation  Co-location patterns           Cascading ST patterns
Statistical foundation: Heterogeneity    Regional co-location patterns  X

X = Unexplored
Today's focus
15
Cascading ST pattern (CSTP)
  • Input: Crime reports with location and time.
  • Output: CSTP
  • Partially ordered subsets of ST event types.
  • Located together in space.
  • Occur in stages over time.

16
Related Pattern Semantics: ST Data Mining
  • ST co-occurrence (Celik et al. 2008, Cao et al.
    2006)
  • Designed for moving object datasets by treating
    trajectories as location time series.
  • Performs partitioning over space and time.
  • ST sequence (Huang et al. 2008, Cao et al. 2005)
  • Totally ordered patterns modeled as a chain.
  • Does not account for multiply connected
    patterns (e.g., nonlinear).
  • Misses non-linear semantics.
  • No ST statistical interpretation.

17
Limitations of Related ST Pattern Semantics
  • ST Sequence
  • Total order
  • Ex. (B→A, A→C)
  • No ST statistical interpretation.
  • Limitations
  • Absence of partial order
  • Ex. (B→A, B→C, A→C)

18
Interpretation Model: Directed Neighbor Graph
(DNG)
  • Nodes: Individual events
  • Directed edge (N1 → N2) iff
  • Neighbor(N1, N2)
  • and
  • After(N2, N1)
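
The edge rule above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `Event` record, the Euclidean distance test, and the `r_space` / `r_time` thresholds are assumptions standing in for the Neighbor and After relations, which the slide does not spell out. The four sample events are hypothetical.

```python
from math import hypot
from typing import NamedTuple


class Event(NamedTuple):
    eid: str    # instance label, e.g. "B.1"
    etype: str  # event type, e.g. "B"
    x: float
    y: float
    t: float


def build_dng(events, r_space, r_time):
    """Return directed edges (u, v) where v is a ST neighbor of u and occurs after u."""
    edges = []
    for u in events:
        for v in events:
            # Assumed Neighbor relation: within r_space in space and r_time in time.
            close_in_space = hypot(u.x - v.x, u.y - v.y) <= r_space
            # After(v, u): v happens strictly later than u, inside the time window.
            after = 0 < v.t - u.t <= r_time
            if close_in_space and after:
                edges.append((u.eid, v.eid))
    return edges


# Hypothetical events: three close together, one far away.
events = [
    Event("B.1", "B", 0.0, 0.0, 0.0),
    Event("A.1", "A", 0.5, 0.0, 1.0),
    Event("C.1", "C", 0.5, 0.5, 2.0),
    Event("A.2", "A", 5.0, 5.0, 1.0),
]
edges = build_dng(events, r_space=1.0, r_time=2.0)
print(edges)
```

The nested loop is quadratic in the number of events; it is meant only to make the definition concrete.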

19
Statistical Foundation: Interest Measures
  • Instances of CSTP P1 (B→A, B→C, A→C) are
  • (B1→A1, B1→C1, A1→C1)
  • (B1→A3, B1→C2, A3→C2)
  • (B1→A1, A1→C2, B1→C2)
  • Cascade Participation Ratio: CPR(CSTP, M)
  • Conditional probability of an instance of CSTP
    in the neighborhood, given an instance of event type
    M
  • Examples
  • Cascade Participation Index: CPI(CSTP)
  • min( CPR(CSTP, M) ) over all M in CSTP
  • Example
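
A small Python sketch of the two measures follows. It assumes CPR can be computed as the fraction of instances of event type M that take part in at least one pattern instance; that reading is an assumption, though it reproduces the CPI values shown in the filtering illustration later in this deck (0.8 for B→A, 0.4 for A→C). The instance lists and the per-type counts (5 A, 2 B, 4 C events) are taken from that illustration; the function names are hypothetical.

```python
def cpr(instances, position, total):
    """Fraction of the `total` events of one type that take part in the pattern.

    `instances` is a list of tuples of event labels, one label per event type;
    `position` selects the event type of interest inside each tuple.
    """
    participating = {inst[position] for inst in instances}
    return len(participating) / total


def cpi(instances, totals):
    """Minimum CPR over all event types of the pattern."""
    return min(cpr(instances, i, totals[i]) for i in range(len(totals)))


# Single-edge pattern instances, written as (source event, target event).
b_to_a = [("B.1", "A.1"), ("B.1", "A.3"), ("B.2", "A.2"), ("B.2", "A.4")]
a_to_c = [("A.1", "C.2"), ("A.3", "C.3"), ("A.1", "C.3"), ("A.3", "C.4")]
counts = {"A": 5, "B": 2, "C": 4}

cpi_b_to_a = cpi(b_to_a, (counts["B"], counts["A"]))
cpi_a_to_c = cpi(a_to_c, (counts["A"], counts["C"]))
print(cpi_b_to_a, cpi_a_to_c)
```

Counting distinct participants, rather than pattern instances, is what avoids the double counting attributed to plain frequency.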

20
Analytical Evaluation: Statistical Interpretation
Spatial Statistics: ST K-Function (Diggle et al.
1995)
  • Cascade Participation Index (CPI) is an upper
    bound to the ST K-Function per unit volume.

Example
ST-K (B → A)   2/6 = 0.33   3/6 = 0.5   6/6 = 1
CPI (B → A)    2/3 = 0.66   1           1
21
Comparison with Related Interest Measures

Measure                                    Key property
Frequency                                  Double counting of pattern instances
Maximum Independent Set (MIS) size         NP-complete
  (Kuramochi and Karypis, 2004)
Scoring criterion for Bayesian networks    NP-complete; learning requires prior specification
  (Neapolitan, 2003; Chickering, 1996)
Lower bound on vertex label frequency      Frequency-based interpretation

Measure                     Value
Frequency                   3 / (what is the # of transactions?)
MIS                         2
Lower bound on frequency    min{1, 2, 2} = 1
22
Computational Structure: CSTP Miner Algorithm
  • Basic Idea
  • Initialization
  • for k in (1, 2, 3, ..., K-1) and prevalent CSTP found
    do
  • Generate size k candidates.
  • Compute CSTP instances / materialize part of DNG.
  • Calculate interest measure and select prevalent
    CSTPs.
  • end
  • Item sets in association rule mining
  • Chemical compounds / subgraphs in graph mining.
  • Directed acyclic graphs in CSTP mining
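
The level-wise loop above can be sketched as follows. This is a generic generate-and-test skeleton under stated assumptions, not the CSTP Miner itself: patterns are represented as sets of directed edges between event types, pattern size is counted in edges, connectivity of candidates is not enforced, and `interest` is a stand-in callable. The toy interest function reuses the single-edge CPI values from the illustration in this deck and, purely for demonstration, scores a larger pattern by the minimum over its edges; a real miner would compute CPI from pattern instances in the DNG.

```python
from itertools import permutations


def is_acyclic(edges):
    """Check that a set of directed type-level edges contains no cycle."""
    nodes = {n for e in edges for n in e}
    remaining = set(edges)
    while nodes:
        # Nodes with no incoming edge can be peeled off safely.
        sources = {n for n in nodes if all(v != n for _, v in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(u, v) for u, v in remaining if u not in sources}
    return True


def mine(event_types, interest, threshold, max_edges):
    """Level-wise search: keep prevalent patterns, extend them by one edge."""
    all_edges = list(permutations(sorted(event_types), 2))
    level = {frozenset([e]) for e in all_edges}
    prevalent = []
    k = 1
    while level and k <= max_edges:
        kept = [p for p in level if interest(p) >= threshold]
        prevalent.extend(kept)
        # Candidates must stay directed acyclic graphs.
        level = {p | {e} for p in kept for e in all_edges
                 if e not in p and is_acyclic(p | {e})}
        k += 1
    return prevalent


# Toy scores (hypothetical stand-in for CPI evaluation).
edge_score = {("B", "A"): 0.8, ("B", "C"): 0.75, ("A", "C"): 0.4, ("C", "A"): 0.2}


def toy_interest(pattern):
    return min(edge_score.get(e, 0.0) for e in pattern)


prevalent = mine({"A", "B", "C"}, toy_interest, threshold=0.33, max_edges=3)
print(len(prevalent))
```

Extending only prevalent patterns is sound when the interest measure, or an upper bound on it, never increases as edges are added.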

Not part of a conventional apriori setting
23
CSTP Miner Algorithm: Illustration
CPI Threshold 0.33
[Figure: candidate lattice rooted at Null, annotated with CPI values (0, 0.2, 0.4, 0.75, 0.8), next to an example dataset of events A.1-A.5, B.1-B.2 and C.1-C.4 evaluated by a spatio-temporal join.]
24
Computational Structure: CSTP Miner Algorithm
Fixed parameters: spatial neighborhood 0.62
miles, temporal neighborhood 1 hr, CPI
threshold 0.0055
  • Key Bottlenecks
  • Interest measure evaluation
  • Exponential pattern space
  • Computational Strategies
  • Reduce irrelevant interest measure evaluation
  • Filtering strategies
  • Compute interest measure efficiently
  • Time Ordered Nested Loop Strategy
  • Space-Time Partition Join Strategy

25
CSTP Miner Algorithm: Interest Measure Evaluation
  • ST join strategies: perform each interest
    measure computation efficiently
  • Time Ordered Nested Loop (TONL) Strategy
  • Space-Time Partitioning (STP) Strategy
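
One plausible reading of a time-ordered nested loop join is sketched below: sort events by time, then let the inner loop stop as soon as the temporal window is exceeded. The slide does not give the details of TONL, so the tuple layout `(id, x, y, t)`, the Euclidean distance test, and the `r_space` / `r_time` parameters are assumptions, and the sample events are hypothetical.

```python
from math import hypot


def tonl_join(events, r_space, r_time):
    """Return directed pairs (earlier, later) of ST-neighboring events.

    `events` is a list of (id, x, y, t) tuples.
    """
    ordered = sorted(events, key=lambda e: e[3])  # order by time
    pairs = []
    for i, (u, ux, uy, ut) in enumerate(ordered):
        for v, vx, vy, vt in ordered[i + 1:]:
            if vt - ut > r_time:
                break  # every later event is also outside the time window
            if vt > ut and hypot(ux - vx, uy - vy) <= r_space:
                pairs.append((u, v))
    return pairs


events = [
    ("C.1", 0.5, 0.5, 2.0),
    ("A.2", 5.0, 5.0, 1.0),
    ("B.1", 0.0, 0.0, 0.0),
    ("A.1", 0.5, 0.0, 1.0),
]
pairs = tonl_join(events, r_space=1.0, r_time=2.0)
print(pairs)
```

The early `break` is what the time ordering buys: the inner loop only visits events inside the temporal window. A space-time partitioning strategy would additionally restrict the spatial candidates, which this sketch does not attempt.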

[Figure: ST join by plane sweep over events A.1-A.5, B.1-B.2 and C.1-C.4, plotted with Space and Time axes; the volume of the ST neighborhood is marked. # Edges: 13]
26
CSTP Miner Algorithm: Alternative Ideas
  • Can the neighborhood graph be pre-computed?
  • Trade-off: storage versus online computation
  • Cost of storage
  • Pre-computed graph: O(Edges + Nodes)
  • Example: 24
  • On-the-fly: O(Nodes)
  • Example: 11
  • Cost of computation
  • Pre-computed graph: O(Edges + Nodes)
  • Example: 24
  • On-the-fly: O(Nodes log(Nodes))
  • Example: 38
  • Other factors
  • Dense vs Sparse data
  • Positive ST autocorrelation

27
CSTP Miner Algorithm: Filtering Strategies
  • Key rationale: enhance savings by filtering
    non-prevalent candidates early
  • Upper bound (UB) filter
  • Key idea
  • CPI has an anti-monotone upper bound.
  • Multi-resolution ST (MST) filter
  • Key idea
  • There exists a low-dimensional embedding in
    space and time.
  • Overestimate CPI by coarsening the ST dataset.
  • If overestimate(CPI) < threshold, the candidate is pruned.
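
The coarsening idea can be sketched for a single-edge pattern (source type → target type). This is an illustration under assumptions, not the authors' filter: events are `(id, x, y, t)` tuples, the grid cell sizes are chosen at least as large as the spatial and temporal neighborhood sizes, and two events count as coarse neighbors when their cells are identical or adjacent. Under those assumptions any true neighbor pair is also a coarse neighbor pair, so the coarse participation can only overestimate the true one. All names and sample data are hypothetical.

```python
from itertools import product


def to_cell(event, cell_space, cell_time):
    """Map an (id, x, y, t) event to its coarse space-time grid cell."""
    _, x, y, t = event
    return (int(x // cell_space), int(y // cell_space), int(t // cell_time))


def has_neighbor_cell(cell, others, time_offsets):
    """True if an occupied cell lies in the same or an adjacent grid cell."""
    cx, cy, ct = cell
    return any((cx + dx, cy + dy, ct + dt) in others
               for dx, dy in product((-1, 0, 1), repeat=2)
               for dt in time_offsets)


def coarse_cpi_bound(src, dst, cell_space, cell_time):
    """Overestimate of CPI(src -> dst) computed on the coarsened dataset."""
    src_cells = [to_cell(e, cell_space, cell_time) for e in src]
    dst_cells = [to_cell(e, cell_space, cell_time) for e in dst]
    src_set, dst_set = set(src_cells), set(dst_cells)
    # Targets occur later, so look in the same or the next time cell.
    src_ratio = sum(has_neighbor_cell(c, dst_set, (0, 1)) for c in src_cells) / len(src_cells)
    # Sources occur earlier, so look in the same or the previous time cell.
    dst_ratio = sum(has_neighbor_cell(c, src_set, (0, -1)) for c in dst_cells) / len(dst_cells)
    return min(src_ratio, dst_ratio)


b_events = [("B.1", 0.0, 0.0, 0.0), ("B.2", 9.0, 9.0, 0.0)]
a_events = [("A.1", 0.5, 0.0, 1.0), ("A.2", 5.0, 5.0, 1.0)]
bound = coarse_cpi_bound(b_events, a_events, cell_space=1.0, cell_time=2.0)
pruned = bound < 0.6  # below the threshold: no exact CPI evaluation needed
print(bound, pruned)
```

A candidate whose overestimate already falls below the threshold can be discarded without the more expensive exact join; candidates that pass still need exact evaluation.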

28
CSTP Miner Algorithm: Filtering Strategies
  • Multi-resolution ST filter

Summarizing on a coarser neighborhood yields
compression in most cases.
CPI Threshold 0.33

Original resolution (event pairs per candidate):
        B→A         B→C         A→C         C→A
        B.1 A.1     B.1 C.2     A.1 C.2     C.1 A.5
        B.1 A.3     B.1 C.3     A.3 C.3
        B.2 A.2     B.2 C.1     A.1 C.3
        B.2 A.4                 A.3 C.4
CPI     0.8         0.75        0.4         0.2

Coarser resolution (cell pairs per candidate):
        B→A           B→C           A→C           C→A
        (0,0) (1,0)   (0,2) (1,2)   (1,2) (1,2)   (1,1) (2,0)
        (0,2) (1,2)   (0,0) (1,1)   (1,0) (1,1)   (2,1) (2,0)
                                    (1,2) (2,1)
                                    (1,0) (2,1)
CPI     0.8           0.75          0.8           0.2
29
Experimental Evaluation: Experiment Setup
Goals:
1. Compare different design decisions of the CSTPM
   algorithm (performance: run-time).
2. Test the effect of parameters on performance
   (number of event types, dataset size, clumpiness
   degree).
Experiment platform: CPU 3.2 GHz, RAM 32 GB, OS Linux,
Matlab 7.9
30
Experimental Evaluation: Datasets
Lincoln, NE Dataset
Real Data
  • Data size: 5 datasets
  • Drawn by increments of 2 months
  • 5000 - 33000 instances
  • Event types
  • Drawn by increments of 5 event types
  • 5 - 25 event types.

Synthetic Data
  • Data size: 5 datasets
  • 5000 - 26000 instances
  • Event types
  • 5 - 25 event types.
  • Clumpiness degree
  • 5 - 25 instances per event type per cell.

31
Experimental Evaluation: Join strategy performance
Question: What is the effect of dataset size on
performance of join strategies?
Fixed parameters: real data (CPI 0.15, 0.31
miles, 10 days); synthetic data (0.5, 25, 25)
Trends: ST partitioning improves performance by a
factor of 5-10 on synthetic data and by a factor
of 3 on real data.
32
Experimental Evaluation: Join strategy performance
Experiment 2: What is the effect of the number of event
types on performance of join strategies?
Fixed parameters: real data (CPI 0.15, 0.31
miles, 10 days); synthetic data (0.5, 25, 25)
Trends: ST partitioning improves performance by a
factor of 10 on synthetic data and by a factor of
2.5 on real data.
33
Experimental Evaluation: Filtering strategy
performance
Experiment 3: What is the effect of dataset size
on performance of filtering strategies?
Fixed parameters: real data (CPI 0.15, 0.435
miles, 10 days); synthetic data (0.65, 70, 70)
Trends: Filtering improves performance by a
factor of 5 on synthetic data and by a factor of 1.5
on real data.
34
Experimental Evaluation: Filtering strategy
performance
Question: What is the effect of the number of event types
on performance of filtering strategies?
Fixed parameters: real data (CPI 0.15, 0.435
miles, 10 days); synthetic data (0.65, 70, 70)
Trends: Filtering improves performance by a
factor of 2.5 on synthetic data and by a factor of
1.3 on real data.
35
Experimental Evaluation: Filtering strategy
performance
Question: What is the effect of clumpiness
degree on different design decisions?
Fixed parameters: CPI 0.5, 15.53 miles, 1.04
days
  • Trends
  • Filtering improves performance by a factor of 40.
  • ST partitioning improves performance by a factor
    of 10.

36
Lincoln, NE crime dataset: Case study
  • Is bar closing a generator for crime-related
    CSTPs?

Bar locations in Lincoln, NE
Questions
  • Observation: Crime peaks around bar closing!
  • Is bar closing a crime generator?
  • Are there other generators (e.g., Saturday
    nights)?

K-S test: Saturday night is significantly different
from a normal-day bar closing (p-value 1.249x10^-7,
K = 0.41)
37
Lincoln, NE crime dataset: Case study
38
Lincoln, NE crime dataset: Case study

Pop. I           Pop. II          KS       P-value       α = 0.05   α = 0.2
Sat Night        All Year         0.4187   1.249x10^-7   Yes        Yes
Football Night   All Year         0.3400   0.1067        No         Yes
Sat Night        Football Night   0.1987   0.7899        No         No
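
For readers unfamiliar with the KS column, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution functions of the two populations. The sketch below computes only that statistic in plain Python; the sample values are hypothetical and are not the Lincoln data, and no p-value is computed.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of observations less than or equal to x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


# Hypothetical measurements for two populations.
pop_one = [1, 2, 3, 4]
pop_two = [3, 4, 5, 6]
d = ks_statistic(pop_one, pop_two)
print(d)
```

A pair of populations is declared different at level α when the p-value associated with the statistic falls below α, which is what the two rightmost columns of the table record.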
39
Outline
  • Introduction
  • Problem Statement
  • Our Approach
  • Big Picture
  • Cascading Spatio-temporal pattern discovery
  • Other Frequent Pattern Families
  • Future Work

40
Regional co-location patterns (RCP)
  • Input: Spatial features, crime reports.
  • Output: RCP (e.g., <(Bar, Assaults), Downtown>)
  • Subsets of spatial features.
  • Frequently located in certain regions of a study
    area.

41
Statistical Foundation: Accounting for
Heterogeneity
  • Conditional probability of observing a pattern
    instance within a locality given an instance of a
    feature within that locality.

Regional Participation Ratio
Example
Regional Participation index
Example
Quantifies the local fraction participating in a
relationship.
42
Performance Tuning: Key Ideas
Key Idea
  • Interest Measure shows special pruning
    properties in certain subsets of the spatial
    framework.

Maximal Locality
Key Properties
  • Collection of connected instances.
  • Maximal localities are mutually disjoint.
  • Contains several RCPs.

Key Observations
  • RPI shows the anti-monotonicity property within
    maximal localities.
  • Pruning a co-location, AB, prunes all its
    supersets (e.g., ABC, ABCD, etc.).
  • RPI within a maximal locality is an upper bound
    to RPI of constituent prevalence localities.
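
Reading "collection of connected instances" as connected components of the spatial neighbor graph, maximal localities can be sketched as below. That reading, the Euclidean neighbor test, the `radius` parameter, and the sample points are assumptions made for illustration; the slide does not give the construction.

```python
from math import hypot


def maximal_localities(points, radius):
    """Group instance ids into connected components of the neighbor graph.

    `points` maps an instance id to its (x, y) location; two instances are
    neighbors when they lie within `radius` of each other.
    """
    unvisited = set(points)
    components = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            px, py = points[p]
            near = {q for q in unvisited
                    if hypot(px - points[q][0], py - points[q][1]) <= radius}
            unvisited -= near
            component |= near
            frontier.extend(near)
        components.append(component)
    return components


points = {
    "A1": (0.0, 0.0), "B1": (0.5, 0.0), "C1": (1.0, 0.0),
    "A2": (10.0, 10.0), "B2": (10.5, 10.0),
}
localities = maximal_localities(points, radius=0.6)
print(sorted(sorted(c) for c in localities))
```

Because components never share an instance, the resulting localities are mutually disjoint, matching the key property listed above, and each can be mined independently.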

43
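Performance Tuning
Prevalence Threshold 0.25
[Figure: candidate lattice (Null; A, B, C) evaluated within maximal localities ML1, ML2 and ML3. Values shown for candidates include AB 0.167, BC 0.167, AC 0.25, AB 0.25, BC 0.33 and AC 0.25; two branches are marked "No RCP". Labels shown for prevalence localities include <BC, PL3(BC)> 0.167, <AC, PL1(AC)> 0.25 and <BC, PL4(BC)> 0.167.]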
Completeness
ABC Pruned Automatically
  • Pruning a pattern within a maximal locality does
    not prune any valid RCPs.

Compute Maximal Locality
Correctness
Due to upper bound property of RPI
  • Accepting a pattern involves additional checks
    so that only prevalent RCPs are reported.

Due to anti-monotonicity of RPI
44
Experimental Evaluation: Spatial Neighborhood Size
  • What is the effect of spatial neighborhood size
    on performance of different algorithms?
  • Fixed parameters: dataset size 7498 instances,
    # features 5, prevalence threshold 0.07

[Figure panels: # of RCPs; Run Time]
Trends
  • Run time: ML Pruning outperforms PS Enumeration
    by a factor of 1.5 - 5
  • # of RCPs examined: ML Pruning outperforms PS
    Enumeration by a factor of 4.13 - 19

45
Experimental Evaluation: Feature Types
  • What is the effect of the number of feature types
    on performance of different algorithms?
  • Fixed parameters: dataset size 7498 instances,
    spatial neighborhood size 800 feet, prevalence
    threshold 0.07

[Figure panels: # of RCPs; Run Time]
Trends
  • Run time: ML Pruning outperforms PS Enumeration
    by a factor of 1.2
  • # of RCPs examined: ML Pruning outperforms PS
    Enumeration by a factor of 1.6 - 3.5

46
RCPs from Lincoln Crime Data
This result shows the interaction between Alcohol
and Vandalism apart from highlighting outbreaks
47
Conclusions
  • Proposed SFPM techniques (e.g., Cascading ST
    Patterns and Regional Co-location patterns) honor
    ST Semantics (e.g., Partial order, Continuity).
  • Interest measures achieve a balance between
    statistical interpretation and computational
    scalability.
  • Algorithmic strategies exploiting properties of
    ST data (e.g., multiresolution filter) and
    properties of interest measures enhance
    computational savings.

48
Future Work: Short and Medium Term
X = Unexplored

                                                       Input data: Spatial   Input data: Spatio-temporal (ST)
Pattern semantics: Unordered                           ?                     ?
Pattern semantics: Totally ordered                     X                     ?
Pattern semantics: Partially ordered                   X                     CSTP discovery
Statistical foundation: Autocorrelation                ?                     CSTP discovery
Statistical foundation: Heterogeneity                  RCP discovery         X
Underlying framework: Euclidean                        RCP discovery         CSTP discovery
Underlying framework: Non-Euclidean (networks)         X                     X
Neighbor relation: User specified                      RCP discovery         CSTP discovery
Neighbor relation: Algorithm determined                X                     X
Interestingness criterion: Interest measure threshold  RCP discovery         CSTP discovery
Interestingness criterion: Threshold free              X                     X
Type of data: Boolean / Categorical                    RCP discovery         CSTP discovery
Type of data: Quantitative data (e.g., climate)        X                     X
49
Future Work: Long Term
  • Exploring interpretation of discovered patterns
    by law enforcement.
  • ST predictive analytics, predictive models based
    on SFPM, and predictive policing.
  • Towards geo-social analytics for policing (e.g.,
    criminal flash mobs, gangs, groups of offenders
    committing crimes).
  • New ST frequent pattern mining algorithms based
    on depth-first graph enumeration.
  • ST frequent pattern mining techniques that
    account for patron demographic levels.
  • Explore evaluation of choropleth maps via ST
    frequent pattern mining.

50
Acknowledgment
  • Members of the Spatial Database and Data Mining
    Research Group, University of Minnesota,
    Twin-Cities.
  • This work was supported by grants from the U.S. Army,
    NGA and U.S. DOJ.
  • Advisor: Prof. Shashi Shekhar, Computer Science,
    University of Minnesota.
  • Thesis committee.
  • U.S. DOJ National Institute of Justice: Mr.
    Ronald E. Wilson (Program Manager, Mapping and
    Analysis for Public Safety), Dr. Ned Levine (Ned
    Levine and Associates, CrimeStat Program)
  • U.S. Army Topographic Engineering Center: Dr.
    J.A. Shine (Mathematician and Statistician,
    Geospatial Research and Engineering Division)
    and Dr. J.P. Rogers (Additional Director,
    Topographic Engineering Center)
  • Mr. Tom Casady, Public Safety Director (formerly
    Lincoln Police Chief), Lincoln, NE, USA

Thank You for your Questions, Comments and
Attention!