Computational Geometry and Spatial Data Mining - PowerPoint PPT Presentation

About This Presentation
Title:

Computational Geometry and Spatial Data Mining

Slides: 91
Provided by: Marcvan7
Learn more at: http://cs.fiu.edu
Transcript and Presenter's Notes

Title: Computational Geometry and Spatial Data Mining


1
Computational Geometry and Spatial Data Mining
  • Marc van Kreveld (and Giri Narasimhan)
  • Department of Information and Computing Sciences
  • Utrecht University

2
(No Transcript)
3
(No Transcript)
4
Clustering?
  • Are the people clustered in this room?
  • How do we define a cluster?
  • In spatial data mining we have objects/ entities
    with a location given by coordinates
  • Cluster definitions involve distance between
    locations
  • How do we define distance?

5
Clustering - options
  • Determine whether clustering occurs
  • Determine the degree of clustering
  • Determine the clusters
  • Determine the largest cluster
  • Determine the largest empty region
  • Determine the outliers

6
(No Transcript)
7
(No Transcript)
8
Co-location
  • Are the men clustered?
  • Are the women clustered?
  • Is there a co-location of men and women?
  • Determine regions favored exclusively by women.
    Men? Loners? Couples? Families?
  • Determine empty regions.

9
(No Transcript)
10
Co-location
  • Like before, we may be interested in:
    • is there co-location?
    • the degree of co-location
    • the largest co-location
    • the co-locations themselves
    • the objects not involved in co-location
    • regions with no (or little) co-location

11
(No Transcript)
12
Spatio-temporal data
  • Locations have a time stamp
  • Interesting patterns involve space and time
  • Anomalies?

13
(No Transcript)
14
Trajectory data
  • Entities with a trajectory (time-stamped motion
    path)
  • Interesting patterns involve subgroups with
    similar heading, expected arrival, joint motion,
    ...
  • n entities = n trajectories; n = 10 – 100,000
  • t time steps; t = 10 – 100,000 ⇒ input size is nt
  • m = size of subgroup (unknown); m = 10 – 100,000

15
Examples of trajectory data
  • Tracked animals (buffalo, birds, ...)
  • Tracked people (potential terrorists)
  • Tracked GSMs (e.g. for traffic purposes)
  • Trajectories of tornadoes
  • Sports scene analysis (players on a soccer field)

16
Example pattern in trajectories
  • What is the location visited by most entities?

location = circular region of specified radius
17
Example pattern in trajectories
  • What is the location visited by most entities?

location = circular region of specified radius
(Figure: a location visited by 4 entities)
18
Example pattern in trajectories
  • What is the location visited by most entities?

location = circular region of specified radius
(Figure: a location visited by 3 entities)
19
Example pattern in trajectories
  • Compute buffer of each trajectory

20
Example pattern in trajectories
  • Compute buffer of each trajectory
  • Compute the arrangement of the buffers and
    the cover count of each cell

(Figure: arrangement of the buffers; cells are labeled with cover counts 0, 1, and 2)
21
Example pattern in trajectories
  • One trajectory has t time stamps; its buffer can
    be computed in O(t log t) time
  • All buffers can be computed in O(nt log t) time
  • The arrangement can be computed in
    O(nt log(nt) + k) time, where k = O((nt)^2) is the
    complexity of the arrangement
  • Cell cover counts are determined in O(k) time

22
Example pattern in trajectories
  • Total: O(nt log(nt) + k) time
  • If the most visited location is visited by m
    entities, this is O(nt log(nt) + ntm)
  • Note: input size is nt (n entities, each with a
    location at t moments)
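
To make the question "which location is visited by most entities?" concrete, here is a minimal sketch. It does not build the arrangement of buffers described above. Instead it assumes a hypothetical list of candidate centers and counts, by brute force, how many trajectories come within distance r of each one (O(nt) time per candidate). The function names are illustrative only.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the supporting line and clamp to the segment.
    u = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def visits(trajectory, center, r):
    """True if the polygonal trajectory comes within distance r of center."""
    segments = list(zip(trajectory, trajectory[1:]))
    if not segments:  # a trajectory with a single time stamp
        segments = [(trajectory[0], trajectory[0])]
    return any(point_segment_distance(center, a, b) <= r for a, b in segments)

def most_visited(trajectories, candidates, r):
    """Return (best candidate center, number of trajectories visiting it)."""
    def count(center):
        return sum(1 for tr in trajectories if visits(tr, center, r))
    best = max(candidates, key=count)
    return best, count(best)

trajectories = [
    [(0, 0), (10, 0)],
    [(5, -5), (5, 5)],
    [(20, 20), (30, 30)],
]
print(most_visited(trajectories, [(5, 0), (25, 25), (100, 100)], r=1.0))
```

The arrangement-based method finds the true optimum over all locations; this sketch only ranks the candidates it is given.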

23
Patterns in entity data
  • Spatio-temporal data
  • n trajectories, each has t time steps
  • Distance is time-dependent
  • flock pattern
  • meet pattern
  • Heading and speed are important and are also
    time-dependent
  • Spatial data
  • n points (locations)
  • Distance is important
  • clustering pattern
  • Presence of attributes (e.g. man/woman)
  • co-location patterns

24
Entities in subdivisions
  • Also co-location pattern
  • Discovered simply by overlay. E.g., occurrences
    of oaks on different soil types

25
Clustering entities in subdivisions
  • What if it is known that the entities only occur
    in regions of a certain type?

Situation without subdivision
radius of cluster
bird nests
26
Clustering entities in subdivisions
  • What if it is known that the entities only occur
    in regions of a certain type?

Situation with subdivision land-water
radius of cluster
bird nests
27
Clustering entities in subdivisions
burglary
28
Region-restricted clustering
Joint research with Joachim Gudmundsson (NICTA,
Sydney) and Giri Narasimhan (U of F, Miami), 2006
  • Determine clusters in point sets that are
    sensitive to the geographic context (at least,
    for the relevant aspects) ⇒ Assume that a set of
    regions is given and points can only lie inside
    them. How should we define clusters?

29
Region-restricted clustering
  • Given a set P of points, a set F of regions, a
    radius r and a subset size m, a region-restricted
    cluster is a subset P' ⊆ P inside a circle C
    where
    • P' has size at least m
    • C has radius at most 2r
    • C contains at most πr^2 area of regions of F

(Figure labels: r, 2r, sum area = πr^2)
30
Region-restricted clustering
  • Given a set P of n points, a set F of polygons
    with n_f edges in total, and values for r and m,
    report all region-restricted clusters of exactly
    m points
  • Exactly m points?
  • Real clustering (partition)?
  • Outliers?

31
Region-restricted clustering
  • Exactly m points? Every cluster with > m points
    consists of clusters with m points with smaller
    circles
  • Real clustering (partition)?
  • Outliers?

(Figure: example with m = 5)
33
Region-restricted clustering
  1. Determine all smallest circles with m points of P
    inside
  2. Test if the radius is ≤ r (report) or > 2r
    (discard)
  3. If the radius is in between, determine the area
    of regions of F inside
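
The radius test in step 2 can be sketched as follows. The function name and the three string labels are illustrative assumptions. The reasoning in the comments follows from the definition: a circle of radius at most r has area at most πr^2, so it cannot contain more than πr^2 area of regions of F.

```python
def classify_circle(radius, r):
    """Decide what to do with a smallest circle around m points (step 2)."""
    if radius <= r:
        # Whole circle has area <= pi * r^2, so the area condition holds.
        return "report"
    if radius > 2 * r:
        # Violates the "radius at most 2r" condition.
        return "discard"
    # r < radius <= 2r: the area of regions of F inside must be computed (step 3).
    return "check_area"

for radius in (0.8, 1.5, 2.5):
    print(radius, classify_circle(radius, r=1.0))
```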

34
Region-restricted clustering Step 1
  1. Determine all minimal circles with m points of P
    inside
  2. Determine all minimal circles with 3 points of P
    inside

35
ordinary order-1 VD
36
Region-restricted clustering
  • Determine all smallest circles with m points of P
    inside
  • Use the (m-2)-th order Voronoi diagram: cells
    where the same (m-2) points are closest
  • Its vertices are centers of smallest circles
    around exactly m points

37
ordinary order-1 VD
38
order-2 VD
39
order-3 VD
40
Region-restricted clustering
  • The m-th order Voronoi diagram (or (m-2)) has
    O(nm) cells, edges, and vertices
  • It can be constructed in O(nm log n) time ⇒ we
    get O(nm) smallest circles with m points inside;
    for each we also know the radius

41
Region-restricted clustering
  • 2. Test if the radius is ≤ r (report) or > 2r
    (discard): trivial in O(1) time per circle, so
    in O(nm) time overall

42
Region-restricted clustering
  • 3. Determine the area of regions of F inside

Brute force: O(n_f) time per circle, so in
O(nm n_f) time overall
43
Region-restricted clustering
  • Complication: this need not give all
    region-restricted clusters!
  • Need to compute area of F inside a circle with
    moving center
  • Requires solving high-degree polynomials

44
Region-restricted clusters
  • The anti-climax: we cannot give an exact
    algorithm!
  • If we take squares instead of circles, we can
    deal with the problem ...

45
Region-restricted clustering
  • 3. Determine the area of regions of F inside

Brute force: O(n_f) time per square, so in
O(nm n_f) time overall
The total time for steps 1, 2, and 3 is
O(nm log n) + O(nm) + O(nm n_f) = O(nm log n + nm n_f) time
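
A minimal sketch of the brute-force step for squares, assuming the regions of F are disjoint simple polygons given as vertex lists and the square is axis-aligned. Each polygon is clipped against the four sides of the square (Sutherland-Hodgman clipping, a technique chosen here for illustration) and the clipped areas are summed with the shoelace formula. The time per square is linear in the number of polygon edges. All function names are hypothetical.

```python
def clip_axis(poly, axis, bound, keep_greater):
    """Clip a polygon to the half-plane coord[axis] >= bound (or <= bound)."""
    def inside(p):
        return p[axis] >= bound if keep_greater else p[axis] <= bound

    def cross(p, q):
        # Point where segment pq crosses the line coord[axis] == bound.
        u = (bound - p[axis]) / (q[axis] - p[axis])
        pt = [p[0] + u * (q[0] - p[0]), p[1] + u * (q[1] - p[1])]
        pt[axis] = bound
        return tuple(pt)

    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(cross(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(cross(prev, cur))
    return out

def polygon_area(poly):
    """Unsigned area by the shoelace formula (0 for an empty polygon)."""
    s = 0.0
    for i, (x1, y1) in enumerate(poly):
        x0, y0 = poly[i - 1]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def area_inside_square(polygons, cx, cy, half):
    """Total area of the polygons inside the square centered at (cx, cy)."""
    total = 0.0
    for poly in polygons:
        clipped = clip_axis(poly, 0, cx - half, True)
        clipped = clip_axis(clipped, 0, cx + half, False)
        clipped = clip_axis(clipped, 1, cy - half, True)
        clipped = clip_axis(clipped, 1, cy + half, False)
        total += polygon_area(clipped)
    return total

regions = [[(0, 0), (4, 0), (4, 4), (0, 4)]]
print(area_inside_square(regions, cx=4, cy=4, half=2))
```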
46
Region-restricted clustering
  • 3. Determine the area of regions of F inside

Using a suitable data structure (only possible
for squares): O(log^2 n_f) time per square, so in
O(nm log^2 n_f) time overall
The total time becomes
O(nm log n + n_f log^2 n_f + nm log^2 n_f), where
  • nm log n: order-(m-2) VD construction
  • n_f log^2 n_f: preprocessing of data structure
  • nm log^2 n_f: total query time in data structure
47
Region-restricted clustering
  • The squares solution generalizes to regular
    polygons (e.g. 20-gons)
  • An approximation of the radius within (1+ε)r
    gives an
    O(n/ε^2 + n_f log^2 n_f + n log n_f / (m ε^2))
    time algorithm

(Figure: 16-gon)
48
Region-restricted clustering
  • Open problems
    • Develop a region-restricted version of k-means
      clustering, single link clustering, ...
    • Region-restricted co-location?
    • Replace region-restricted by gradual model

(Figure: typical clusters in regions with densities 0 /unit, 2 /unit, 5 /unit, 8 /unit)
49
Patterns in trajectories
  • n trajectories, each with t time steps ⇒ n
    polygonal lines with t vertices
  • Already looked at most visited location

50
Patterns in trajectories
  • Flock: near positions of (sub)trajectories for
    some subset of the entities during some time
  • Convergence: same destination region for some
    subset of the entities
  • Encounter: same destination region with same
    arrival time for some subset of the entities
  • Similarity of trajectories
  • Same direction of movement, leadership, ...

(Figure: examples of a flock and a convergence)
51
Patterns in trajectories
  • Flocking, convergence, encounter patterns
    • Laube, van Kreveld, Imfeld (SDH 2004)
    • Gudmundsson, van Kreveld, Speckmann (ACM GIS
      2004)
    • Benkert, Gudmundsson, Huebner, Wolle (ESA 2006)
    • ...
  • Similarity of trajectories
    • Vlachos, Kollios, Gunopulos (ICDE 2002)
    • Shim, Chang (WAIM 2003)
    • ...
  • Lifelines, motion mining, modeling motion
    • Mountain, Raper (GeoComputation 2001)
    • Kollios, Sclaroff, Betke (DMKD 2001)
    • Frank (GISDATA 8, 2001)
    • ...

52
Patterns in trajectories
  • Flock: near positions of (sub)trajectories for
    some subset of the entities during some time
    • clustering-type pattern
    • different definitions are used
  • Given radius r, subset size m, and duration T, a
    flock is a subset of size ≥ m that is inside a
    (moving) circle of radius r for a duration ≥ T

53
(No Transcript)
54
Patterns in trajectories
  • Longest flock: given a radius r and subset size
    m, determine the longest time interval for which
    m entities were within each other's proximity
    (circle radius r)

(Figure: trajectories over time 0 to 8; m = 3; longest flock in [1.8, 6.4])
55
Patterns in trajectories
  • Meet: near some position of (sub)trajectories for
    some subset of the entities
    • clustering-type pattern
  • Given radius r, subset size m, and duration T, a
    meet is a subset of size ≥ m that is inside a
    (stationary) circle of radius r for a duration ≥ T

(Note: "stationary" here was "moving" for the flock)
56
(No Transcript)
57
Patterns in trajectories
  • The same subset required for a flock or meet?

Example: meet with m = 4; duration is 3 time
steps or 4 time steps?
58
Patterns in trajectories
(Figure: examples for m = 3 of flock and meet, each with fixed subset and variable subset)
59
Patterns in trajectories
Exact results (input size is nτ)

         fixed subset                  variable subset
  flock  NP-hard                       O(n^3 τ log n)
  meet   O(n^4 τ^2 log n + n^2 τ^3)    O(n^4 τ^2 log n + n^2 τ^3)
60
Patterns in trajectories
  • A radius-2 approximation of the longest flock can
    be computed in time O(n^2 τ log n) ... meaning:
    if the longest flock of size m for radius r has
    duration T, then we surely find a flock of size m
    and duration ≥ T for radius 2r

61
Patterns in trajectories
Approximate radius results (input size is nτ)

                fixed subset                  variable subset
  flock  apx    O(n^2 τ log n)                O((n^2 τ log n) / ε^2)
                factor 2                      factor 2 + ε
         exact  NP-hard                       O(n^3 τ log n)
  meet   apx    O((n^2 τ log n) / (m ε^2))    O((n^2 τ log n) / (m ε^2))
                factor 1 + ε                  factor 1 + ε
         exact  O(n^4 τ^2 log n + n^2 τ^3)    O(n^4 τ^2 log n + n^2 τ^3)
62
Fixed subset flock
  • It is NP-complete to decide if a graph has a
    subgraph with m nodes that is a clique

For every node of the graph, make an entity with
a trajectory

(Figure: a graph on nodes v1, ..., v7 and the corresponding trajectories. v1 is not adjacent to v4, v5, and v7; all nodes not adjacent to v1 go to a separate location.)
63
Fixed subset flock
(Figure: the graph on nodes v1, ..., v7 and the corresponding trajectories)
64
Fixed subset flock
(Figure: the graph on nodes v1, ..., v7 and the corresponding trajectories)
Flock {v4, v5, v7} of (full) duration 23 (= 3·7 + 2)
and size 3
The trajectories have a fixed flock of size m and
full duration if and only if the graph has a
clique of size m
65
Fixed subset flock
  • Longest fixed flock is NP-hard
  • Max clique has no approximation ⇒ cannot
    approximate duration, nor flock size
  • The reduction applies for all radii < 2r

(Figure: trajectories of v1, ..., v7, showing the cases "v4 in flock" and "v4 not in flock")
66
Flock and meet algorithms
  • Go into 3D (space-time) for algorithms

(Figure: space-time view with a time axis from 0 to 4, showing the duration of a flock and of a meet)
67
Fixed subset flock, approximation
  • An efficient radius-2 approximation algorithm of
    longest fixed flock exists
  • Idea: if some v_i is in the longest flock, then
    all other entities are within distance 2r from v_i

(Figure: flock with v_i, inside the column of radius 2r centered at v_i)
68
Fixed subset flock, approximation
  • For each v_j, we can determine the O(τ) time
    intervals where v_j is in the column of v_i
  • Maintain the intersections for all entities in an
    augmented tree in O(nτ log n) time
  • Do this for all columns (role of v_i) and report
    the longest overall pattern. Total: O(n^2 τ log n)
    time
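
The column idea can be illustrated with a simplified, discrete-time sketch. It is not the O(n^2 τ log n) algorithm with an augmented tree. It only considers intervals that start and end at time steps, and it takes roughly O(n^2 τ^2) time. The input layout (`positions[i][s]` is the location of entity i at time step s) and the function name are assumptions made for illustration.

```python
import math

def longest_fixed_flock_2r(positions, r, m):
    """Longest run of time steps in which a fixed subset of >= m entities
    stays within distance 2r of some anchor entity.

    Returns (duration in time steps, anchor, members, start step).
    """
    n, tau = len(positions), len(positions[0])
    best = (0, None, frozenset(), None)
    for i in range(n):
        # near[s]: entities inside the radius-2r column of entity i at step s.
        near = [
            {j for j in range(n)
             if math.dist(positions[i][s], positions[j][s]) <= 2 * r}
            for s in range(tau)
        ]
        for start in range(tau):
            members = set(range(n))
            for end in range(start, tau):
                # Fixed subset: keep only entities present at every step so far.
                # With linear motion between steps, being within 2r at both
                # ends of a step implies being within 2r throughout it.
                members &= near[end]
                if len(members) < m:
                    break
                if end - start > best[0]:
                    best = (end - start, i, frozenset(members), start)
    return best

positions = [
    [(0, 0), (1, 0), (2, 0), (3, 0)],
    [(0, 1), (1, 1), (2, 1), (3, 1)],
    [(0, 50), (1, 0.5), (2, 0.5), (3, 50)],
]
print(longest_fixed_flock_2r(positions, r=1.0, m=3))
```

A pattern found this way lies in a disk of radius 2r around the anchor, which is why it approximates the radius by a factor of 2.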

69
Variable subset flock, exact
  • The subset that forms the flock may change
    entities, but must stay of size ≥ m
  • Any flock subset at any instant has a disk D of
    radius r with at least 2 entities on the
    boundary ⇒ defining entities
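
The defining-entities observation gives a direct brute-force test at a single time instant: try the (at most two) radius-r disks through every pair of entities and count the entities inside. The sketch below assumes distinct entity positions and m ≥ 2; the function names and the tolerance `eps` are illustrative choices.

```python
import math

def disk_centers(p, q, r):
    """Centers of the radius-r disks that have both p and q on the boundary."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    d = math.hypot(dx, dy)
    if d == 0.0 or d > 2 * r:
        return []  # coincident points, or too far apart for any such disk
    mx, my = (px + qx) / 2, (py + qy) / 2
    # Distance from the midpoint of pq to each center.
    h = math.sqrt(max(0.0, r * r - (d / 2) ** 2))
    ox, oy = -dy / d * h, dx / d * h
    return [(mx + ox, my + oy), (mx - ox, my - oy)]

def has_flock_at_instant(points, r, m, eps=1e-9):
    """True if some radius-r disk contains at least m of the points (m >= 2)."""
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            for cx, cy in disk_centers(points[i], points[j], r):
                inside = sum(
                    math.hypot(x - cx, y - cy) <= r + eps for x, y in points
                )
                if inside >= m:
                    return True
    return False

points = [(0, 0), (1, 0), (0.5, 0.5), (10, 10)]
print(has_flock_at_instant(points, r=1.0, m=3))
print(has_flock_at_instant(points, r=1.0, m=4))
```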

70
Variable subset flock, exact
  • Two entities define two cylinders through time by
    tracing the two possible radius r disks

81
Variable subset flock, exact
  • A critical moment is where another entity is on
    the boundary of the disk: it may go outside or
    inside

82
Variable subset flock, exact
  • At a critical moment
    • a variable subset flock may start (m entities)
    • a variable subset flock may stop (< m entities)
  • Three pairs of defining entities have disks that
    coincide
  • There are also critical moments when two entities
    are at distance exactly 2r
  • Between two time steps t_i and t_(i+1) there are
    O(n^3) critical moments ⇒ in total there are
    O(n^3 τ) critical moments

(Figure label: 2r)
83
Variable subset flock, exact
  • Let the O(n^3 τ) critical moments be the nodes in
    a directed acyclic graph G
  • Edges of G are between two consecutive critical
    moments of the same two defining entities
    • directed from earlier to later
    • weight is time between critical moments
    • only if at least m entities are inside the disk

A longest variable subset flock is a maximum
weight path in G
(Figure: graph G drawn along the time axis)
84
Variable subset flock, exact
  • The graph G can be built in O(n^3 τ log n) time
  • A maximum weight path can be found in
    O(n^3 τ log n) time

A longest variable subset flock is a maximum
weight path in G
(Figure: graph G drawn along the time axis)
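
Finding a maximum weight path in a directed acyclic graph is a standard dynamic program. The sketch below assumes the nodes are numbered 0, 1, ... in time order, so every edge goes from a lower to a higher number, and that edge weights are positive. Building G from the critical moments is not shown; the function name and input format are illustrative.

```python
def longest_path(num_nodes, edges):
    """Maximum weight path in a DAG.

    edges: list of (u, v, weight) with u < v, where the node numbering
    follows time order and is therefore a topological order.
    Returns (total weight, list of nodes on the path).
    """
    best = [0.0] * num_nodes   # best[v]: max weight of a path ending at v
    pred = [None] * num_nodes  # predecessor of v on that path
    # Sorting by source node guarantees best[u] is final before it is used,
    # because all edges into u have a smaller source number.
    for u, v, w in sorted(edges):
        if best[u] + w > best[v]:
            best[v] = best[u] + w
            pred[v] = u
    end = max(range(num_nodes), key=best.__getitem__)
    path = [end]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return best[end], path[::-1]

edges = [(0, 1, 2.0), (1, 3, 1.0), (0, 2, 0.5), (2, 3, 4.0), (3, 4, 1.0)]
print(longest_path(5, edges))
```

Apart from the sorting step, the running time is linear in the size of the graph.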
85
Patterns in trajectories, summary
  • Flock and meet patterns require algorithms in
    3-dimensional space (space-time)
  • Exact algorithms are inefficient ⇒ only suitable
    for smaller data sets
  • Approximation can reduce running time by one or
    two orders of magnitude

86
Patterns in trajectories, summary
                fixed subset                  variable subset
  flock  apx    O(n^2 τ log n)                O((n^2 τ log n) / ε^2)
                factor 2                      factor 2 + ε
         exact  NP-hard                       O(n^3 τ log n)
  meet   apx    O((n^2 τ log n) / (m ε^2))    O((n^2 τ log n) / (m ε^2))
                factor 1 + ε                  factor 1 + ε
         exact  O(n^4 τ^2 log n + n^2 τ^3)    O(n^4 τ^2 log n + n^2 τ^3)
87
Future research on longest trajectories
  • Faster exact and approximation algorithms
  • Better approximation factors
  • Remove restriction of fixed shape of flocking
    region (compact or elongated both possible during
    same flock)
  • Longest duration convergence

longest convergence
88
Patterns in trajectories
  • Flock and meet patterns require algorithms in
    3-dimensional space (space-time)
  • Exact algorithms are inefficient ⇒ only suitable
    for smaller data sets
  • Approximation can reduce running time by an
    order of magnitude

89
To conclude
  • With an exact definition of a spatial or
    spatio-temporal pattern, geometric algorithms can
    be used to compute all patterns
  • Many known structures from computational geometry
    are useful (Voronoi diagrams, arrangements, ...)
  • Since the (exact) algorithms may be inefficient,
    approximation may be a solution

90
To discuss
  • What patterns must be detected in practice (both
    spatial and spatio-temporal)?
  • What is the most appropriate definition
    (formalization) of these?
  • Spatial association rules, auto-correlation,
    irregularities, classification, ... and other
    computable things in spatial/spatio-temporal data
    mining