# Clustering Analysis (of Spatial Data and using Peano Count Trees) (Ptree technology is patented by NDSU)

Transcript and Presenter's Notes
1
Clustering Analysis (of Spatial Data and using
Peano Count Trees) (Ptree technology is patented
by NDSU). Notes: 1. over 100 slides - not
going to go through each in detail.
2
Clustering Methods
• A Categorization of Major Clustering Methods
• Partitioning methods
• Hierarchical methods
• Density-based methods
• Grid-based methods
• Model-based methods

3
Clustering Methods based on Partitioning
• Partitioning method: Construct a partition of a
database D of n objects into a set of k clusters
• Given k, find a partition into k clusters that
optimizes the chosen partitioning criterion
• k-means (MacQueen, 1967): Each cluster is
represented by the center of the cluster
• k-medoids or PAM method (Partitioning Around
Medoids) (Kaufman & Rousseeuw, 1987): Each cluster
is represented by one object in the cluster (the
middle object or a median-like object)

4
The K-Means Clustering Method
• Given k, the k-means algorithm is implemented in
4 steps (it assumes the partitioning criterion is to
maximize intra-cluster similarity and minimize
inter-cluster similarity; of course, a heuristic
is used, so the method isn't really an optimization).
A minimal sketch follows this list.
• Partition objects into k nonempty subsets (or
pick k initial means).
• Compute the mean (center) or centroid of each
cluster of the current partition (if one started
with k means, then this step is already done).
• centroid: the point that minimizes the sum of
dissimilarities to the cluster's points, or the sum of the
squared errors from them.
• Assign each object to the cluster with the most
similar (closest) center.
• Go back to Step 2.
• Stop when the new set of means doesn't change
(or some other stopping condition holds).
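Below is a minimal k-means sketch in Python/NumPy illustrating the four steps above; the names (`kmeans`, `data`, `k`) are ours, not from the slides.

```python
# Minimal k-means sketch (illustrative only).
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial means (here: k random objects).
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each object to the closest center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute the centroid of each cluster.
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop when the means no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Example: two obvious groups in the plane.
pts = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
centers, labels = kmeans(pts, k=2)
```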

5
k-Means
[Figure: the k-means iterations, Steps 1 through 4, shown as scatter plots on a 10 × 10 grid.]
6
The K-Means Clustering Method
• Strength
• Relatively efficient: O(tkn), where
• n is the number of objects,
• k is the number of clusters,
• t is the number of iterations. Normally, k, t << n.
• Weakness
• Applicable only when a mean is defined (e.g., a
vector space)
• Need to specify k, the number of clusters, in advance
• It is sensitive to noisy data and outliers, since
a small number of such data can substantially
influence the mean value.

7
The K-Medoids Clustering Method
• Find representative objects, called medoids
(a medoid must be an actual object in the cluster,
whereas the mean seldom is).
• PAM (Partitioning Around Medoids, 1987)
• starts from an initial set of medoids
• iteratively replaces one of the medoids by a
non-medoid
• if the swap improves the aggregate similarity measure,
retain it. Do this over all
medoid/non-medoid pairs.
• PAM works for small data sets. Does not scale
to large data sets.
• CLARA (Clustering LARge Applications)
(Kaufmann & Rousseeuw, 1990): sub-samples, then applies
PAM
• CLARANS (Clustering Large Applications based on
RANdomized Search) (Ng & Han, 1994): randomizes the
sampling

8
PAM (Partitioning Around Medoids) (1987)
• Use real objects to represent the clusters (a sketch follows this list)
• Select k representative objects arbitrarily
• For each pair of a non-selected object h and a
selected object i, calculate the total swapping
cost TC(i,h)
• For each pair of i and h,
• If TC(i,h) < 0, i is replaced by h
• Then assign each non-selected object to the most
similar representative object
• Repeat steps 2-3 until there is no change
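A minimal PAM sketch, assuming Euclidean distances and exhaustive medoid/non-medoid swaps; the names are ours and the swapping-cost bookkeeping is simplified relative to the published PAM.

```python
# Minimal PAM (k-medoids) sketch: swap a medoid i for a non-medoid h whenever
# TC(i,h) -- the change in summed distances to the nearest medoid -- is negative.
import numpy as np

def pam(data, k, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    medoids = list(rng.choice(n, size=k, replace=False))

    def total_cost(meds):
        # Sum of each object's distance to its closest medoid.
        return dist[:, meds].min(axis=1).sum()

    improved = True
    while improved:
        improved = False
        for i in list(medoids):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = [h if m == i else m for m in medoids]
                # TC(i,h) < 0 means the swap lowers the aggregate cost.
                if total_cost(candidate) - total_cost(medoids) < 0:
                    medoids = candidate
                    improved = True
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels
```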

9
CLARA (Clustering Large Applications) (1990)
• CLARA (Kaufmann and Rousseeuw, 1990)
• It draws multiple samples of the data set,
applies PAM to each sample, and gives the best
clustering as the output (see the sketch below)
• Strength: deals with larger data sets than PAM
• Weakness:
• Efficiency depends on the sample size
• A good clustering based on samples will not
necessarily represent a good clustering of the
whole data set if the sample is biased
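A hedged CLARA wrapper sketch: it draws several samples, runs a PAM routine on each (passed in as `pam_fn`, e.g., the `pam` sketch above), and keeps the medoid set with the lowest total distance over the whole data set. The sample count and size here are illustrative, not the published defaults.

```python
# Minimal CLARA sketch built around an external PAM routine.
import numpy as np

def clara(data, k, pam_fn, n_samples=5, sample_size=40, seed=0):
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_samples):
        idx = rng.choice(len(data), size=min(sample_size, len(data)), replace=False)
        sample_medoids, _ = pam_fn(data[idx], k)          # cluster the sample only
        medoids = idx[np.asarray(sample_medoids)]         # map back to the full data set
        cost = dist[:, medoids].min(axis=1).sum()         # evaluate on ALL objects
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, dist[:, best_medoids].argmin(axis=1)
```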

10
CLARANS (Randomized CLARA) (1994)
• CLARANS (A Clustering Algorithm based on
Randomized Search) (Ng and Han, 1994)
• CLARANS draws a sample of neighbors dynamically
• The clustering process can be presented as
searching a graph where every node is a potential
solution, that is, a set of k medoids
• If a local optimum is found, CLARANS starts
from a new randomly selected node in search of a
new local optimum (Genetic-Algorithm-like)
• Finally the best local optimum is chosen after
some stopping condition.
• It is more efficient and scalable than both PAM
and CLARA

11
Distance-based partitioning has drawbacks
• Simple and fast: O(N)
• The number of clusters, K, has to be arbitrarily
chosen before it is known how many clusters is
correct.
• Produces round-shaped clusters, not arbitrary
shapes (Chameleon data set below)
• Sensitive to the selection of the initial
partition and may converge to a local minimum of
the criterion function if the initial partition
is not well chosen.

[Figures: correct result vs. k-means result on the Chameleon data set]
12
Distance-based partitioning (Cont.)
• If we start with A, B, and C as the initial
centroids around which the three clusters are
built, then we end up with the partition {A},
{B, C}, {D, E, F, G} shown by ellipses.
• Whereas the correct three-cluster solution is
obtained by choosing, for example, A, D, and F as
the initial cluster means (rectangular clusters).

13
A Vertical Data Approach
• Partition the data set using rectangle P-trees (a
gridding)
• These P-trees can be viewed as a grouping
(partition) of the data
• Prune out outliers by disregarding sparse
values (a hedged sketch follows this list)
• Input: total number of objects (N), outlier
percentage threshold (t)
• Output: grid P-trees after pruning
• (1) Choose the grid P-tree with the smallest root
count (Pgc)
• (2) outliers = outliers OR Pgc
• (3) if (RootCount(outliers)/N < t) then remove Pgc and
repeat (1)-(2)
• Find clusters using the PAM method, treating each grid
P-tree as an object
• Note: when we have a P-tree mask for each
cluster, the mean is just
the vector sum of the basic P-trees ANDed with
the cluster P-tree,
divided by the root count of the cluster P-tree
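A hedged sketch of the pruning loop above. Real P-trees are a compressed vertical structure (patented by NDSU); here each "grid P-tree" is simulated by a plain boolean mask over the N objects, and the root count is simply the number of set bits. All names are ours.

```python
# Outlier pruning over simulated grid P-trees (boolean masks).
import numpy as np

def prune_sparse_cells(cell_masks, N, t):
    """cell_masks: list of boolean arrays (one per grid cell); t: outlier fraction."""
    cells = list(cell_masks)
    outliers = np.zeros(N, dtype=bool)
    while cells:
        # (1) choose the grid cell with the smallest root count
        smallest = min(range(len(cells)), key=lambda i: cells[i].sum())
        candidate = outliers | cells[smallest]          # (2) outliers OR Pgc
        if candidate.sum() / N < t:                     # (3) still under the threshold?
            outliers = candidate
            cells.pop(smallest)                         # remove Pgc and repeat
        else:
            break
    return cells, outliers
```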

14
Distance Function
• Data Matrix:
• n objects × p variables
• Dissimilarity Matrix:
• n objects × n objects

15
AGNES (Agglomerative Nesting)
• Introduced in Kaufmann and Rousseeuw (1990)
• Uses the Single-Link method (the distance between two sets is
the minimum pairwise distance)
• Merges the nodes that are most similar
• Eventually all nodes belong to the same cluster

16
DIANA (Divisive Analysis)
• Introduced in Kaufmann and Rousseeuw (1990)
• Inverse order of AGNES (initially all objects
are in one cluster, which is then split according to
some criterion, e.g., maximizing some aggregate
measure of pairwise dissimilarity)
• Eventually each node forms a cluster on its own

17
Contrasting Clustering Techniques
• Partitioning algorithms: partition a dataset into k
clusters, e.g., k = 3
• Hierarchical algorithms: create a hierarchical
decomposition of ever-finer partitions,
e.g., top down (divisive) or bottom up (agglomerative)
18
Hierarchical Clustering
19
Hierarchical Clustering (top down)
• In either case, one gets a nice dendrogram in
which any maximal anti-chain (no 2 nodes linked)
is a clustering (partition).

20
Hierarchical Clustering (Cont.)
Recall that any maximal anti-chain (a maximal set
of nodes in which no 2 are chained) is a
clustering (a dendrogram offers many).
21
Hierarchical Clustering (Cont.)
But the horizontal anti-chains are the
clusterings resulting from the top down (or
bottom up) method(s).
22
Hierarchical Clustering (Cont.)
• Most hierarchical clustering algorithms are
variants of a few linkage schemes; the most
popular are single-link, complete-link, and average-link.
• In the single-link method, the distance between
two clusters is the minimum of the distances
between all pairs of patterns drawn one from each
cluster.
• In the complete-link algorithm, the distance
between two clusters is the maximum of all
pairwise distances between pairs of patterns
drawn one from each cluster.
• In the average-link algorithm, the distance
between two clusters is the average of all
pairwise distances between pairs of patterns
drawn one from each cluster (which is the same as
the distance between the means in the vector
space case, and easier to calculate).

23
Distance Between Clusters
• Single Link: smallest distance between any pair
of points from the two clusters
• Complete Link: largest distance between any pair
of points from the two clusters

24
Distance between Clusters (Cont.)
• Average Link: average distance between points
from the two clusters
• Centroid: distance between the centroids of the two
clusters (a sketch of all four follows)
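A minimal sketch of the four inter-cluster distances just listed; the function names are ours.

```python
# Single, complete, average, and centroid linkage between two point sets.
import numpy as np

def pairwise(X, Y):
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

def single_link(X, Y):    # smallest pairwise distance
    return pairwise(X, Y).min()

def complete_link(X, Y):  # largest pairwise distance
    return pairwise(X, Y).max()

def average_link(X, Y):   # average of all pairwise distances
    return pairwise(X, Y).mean()

def centroid_dist(X, Y):  # distance between the two cluster centroids
    return np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0))
```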

25
26
27
[Figure: examples labeled 1-cluster, noise, and 2-cluster]
28
Hierarchical vs. Partitional
• Hierarchical algorithms are more versatile than
partitional algorithms.
• For example, the single-link clustering algorithm
works well on data sets containing non-isotropic
(non-roundish) clusters including well-separated,
chain-like, and concentric clusters, whereas a
typical partitional algorithm such as the k-means
algorithm works well only on data sets having
isotropic clusters.
• On the other hand, the time and space
complexities of the partitional algorithms are
typically lower than those of the hierarchical
algorithms.

29
More on Hierarchical Clustering Methods
• Major weaknesses of agglomerative clustering
methods
• do not scale well: time complexity of at least
O(n²), where n is the number of total objects
• can never undo what was done previously (greedy
algorithm)
• Integration of hierarchical with distance-based
clustering
• BIRCH (1996): uses a Clustering Feature tree
(CF-tree) and incrementally adjusts the quality
of sub-clusters
• CURE (1998): selects well-scattered points from
the cluster and then shrinks them towards the
center of the cluster by a specified fraction
• CHAMELEON (1999): hierarchical clustering using
dynamic modeling

30
Density-Based Clustering Methods
• Clustering based on density (a local cluster
criterion), such as density-connected points
• Major features
• Discover clusters of arbitrary shape
• Handle noise
• One scan
• Need density parameters as termination condition
• Several interesting studies
• DBSCAN: Ester, et al. (KDD'96)
• OPTICS: Ankerst, et al. (SIGMOD'99)
• DENCLUE: Hinneburg & Keim (KDD'98)
• CLIQUE: Agrawal, et al. (SIGMOD'98)

31
Density-Based Clustering: Background
• Two parameters:
• ε: maximum radius of the neighbourhood
• MinPts: minimum number of points in an ε-neighbourhood of that point
• N_ε(p) = { q ∈ D | dist(p,q) ≤ ε }
• Directly (density) reachable: A point p is
directly density-reachable from a point q wrt. ε,
MinPts if
• 1) p belongs to N_ε(q)
• 2) q is a core point:
• |N_ε(q)| ≥ MinPts

32
Density-Based Clustering: Background (II)
• Density-reachable:
• A point p is density-reachable from a point q
(q → p) wrt. ε, MinPts if there is a chain of points
p1, ..., pn with p1 = q, pn = p such that p_(i+1) is
directly density-reachable from p_i
• ∀q, q is density-reachable from q.
• Density reachability is reflexive and transitive,
but not symmetric (only core objects can
be density-reachable from each other).
• Density-connected:
• A point p is density-connected to a point q wrt. ε,
MinPts if there is a point o such that both p
and q are density-reachable from o wrt. ε, MinPts.
• Density reachability is not symmetric; density
connectivity inherits the reflexivity and
transitivity and provides the symmetry. Thus,
density connectivity is an equivalence relation
and therefore gives a partition (clustering).

[Figure: a chain p1, ..., p from q to p illustrating density-reachability, and a point o from which both p and q are density-reachable (density-connectivity)]
33
DBSCAN: Density Based Spatial Clustering of
Applications with Noise
• Relies on a density-based notion of cluster: a
cluster is defined as an equivalence class of
density-connected points.
• This gives the transitive property for the
density-connectivity binary relation, and
therefore it is an equivalence relation whose
classes form a partition (clustering)
according to the duality.
• Discovers clusters of arbitrary shape in spatial
databases with noise

[Figure: core, border, and outlier points for ε = 1 cm, MinPts = 3]
34
DBSCAN: The Algorithm
• Arbitrarily select a point p.
• Retrieve all points density-reachable from p wrt.
ε, MinPts.
• If p is a core point, a cluster is formed (note:
it doesn't matter which of the core points within
a cluster you start at, since density reachability
is symmetric on core points).
• If p is a border point or an outlier, no points
are density-reachable from p and DBSCAN visits
the next point of the database. Keep track of
such points. If they don't get scooped up by a
later core point, then they are outliers.
• Continue the process until all of the points have
been processed.
• What about a simpler version of DBSCAN? (a sketch follows this list)
• Define core points and core neighborhoods the
same way.
• Define an (undirected graph) edge between two points
if they cohabit a core neighborhood.
• The connectivity-component partition is the
clustering.
• Other related methods? How does vertical
technology help here? Gridding?
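A sketch of the "simpler version of DBSCAN" described above: join two points whenever they cohabit some core point's ε-neighborhood, then take connected components as clusters. This is the simplified variant from the slide, not the published DBSCAN; the names are ours.

```python
# Simplified DBSCAN via connected components of a "cohabit a core neighborhood" graph.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def simple_dbscan(data, eps, min_pts):
    n = len(data)
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    nbr = dist <= eps                                # ε-neighborhood membership
    core = nbr.sum(axis=1) >= min_pts                # |N_eps(q)| >= MinPts
    # Edge (i, j) iff i and j both lie in the neighborhood of some core point.
    adj = np.zeros((n, n), dtype=bool)
    for q in np.where(core)[0]:
        members = np.where(nbr[q])[0]
        adj[np.ix_(members, members)] = True
    _, labels = connected_components(csr_matrix(adj), directed=False)
    # Components containing no core point are left as outliers (-1).
    labels = np.where(np.isin(labels, np.unique(labels[core])), labels, -1)
    return labels
```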

35
OPTICS
• Ordering Points To Identify the Clustering Structure
• Ankerst, Breunig, Kriegel, and Sander (SIGMOD'99)
• http://portal.acm.org/citation.cfm?id=304187
• Addresses a shortcoming of DBSCAN, namely
choosing the parameters.
• Develops a special ordering of the database wrt. its
density-based clustering structure
• This cluster ordering contains information equivalent to
the density-based clusterings corresponding to a
broad range of parameter settings
• Good for both automatic and interactive cluster
analysis, including finding the intrinsic clustering
structure

36
OPTICS
Does this order resemble the Total Variation
order?
37
OPTICS: Some Extensions from DBSCAN
• Index-based:
• k = number of dimensions
• N = number of points (20)
• p = 75%
• M = N(1-p) = 5
• Complexity: O(kN²)
• Core Distance
• Reachability Distance

[Figure: points p1, p2 and core object o, with
reachability-distance(p, o) = max(core-distance(o), d(o, p));
r(p1, o) = 2.8 cm, r(p2, o) = 4 cm, for MinPts = 5, ε = 3 cm]
38
[Figure: reachability-distance plot (undefined values marked) against the cluster-order of the objects]
39
DENCLUE: Using Density Functions
• DENsity-based CLUstEring by Hinneburg & Keim
(KDD'98)
• Major features
• Solid mathematical foundation
• Good for data sets with large amounts of noise
• Allows a compact mathematical description of
arbitrarily shaped clusters in high-dimensional
data sets
• Significantly faster than existing algorithms
(faster than DBSCAN by a factor of up to 45,
claimed by the authors ???)
• But needs a large number of parameters

40
DENCLUE: Technical Essence
• Uses grid cells, but only keeps information about
grid cells that actually contain data points,
and manages these cells in a tree-based access
structure.
• Influence function: describes the impact of a
data point within its neighborhood.
• F(x,y) measures the influence that y has on x.
• A very good influence function is the Gaussian,
F(x,y) = e^(−d²(x,y) / 2σ²)
• Others include functions similar to the squashing
functions used in neural networks.
• One can think of the influence function as a
measure of the contribution to the density at x.
• The overall density of the data space can be
calculated as the sum of the influence functions
of all data points.
• Clusters can then be determined mathematically by
identifying density attractors.
• Density attractors are local maxima of the
overall density function.

41
DENCLUE(D, σ, ξc, ξ)
1. Grid the data set (use r = σ, the std. dev.)
2. Find (highly) populated cells (use a
threshold ξc) (shown in blue)
3. Identify populated cells (nonempty cells)
4. Find density attractor points, C, using hill
climbing (a hedged sketch follows the citation below):
5. Randomly pick a point, pi.
6. Compute its local density (use r = 4σ)
7. Pick another point, pi+1, close to pi, and compute the
local density at pi+1
8. If LocDen(pi) < LocDen(pi+1), climb
9. Put all points within distance σ/2 of the path pi,
pi+1, ..., C into a density-attractor cluster
called C
10. Connect the density-attractor clusters, using a
threshold, ξ, on the local densities of the
attractors.

A. Hinneburg and D. A. Keim. An Efficient
Approach to Clustering in Multimedia Databases
with Noise. In Proc. 4th Int. Conf. on Knowledge
Discovery and Data Mining. AAAI Press, 1998.
KDD 99 Workshop.
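A hedged sketch of DENCLUE-style density estimation and hill climbing (step 4 above), using the Gaussian influence function from the previous slide; the step size, stopping rule, and names are our assumptions, not the published algorithm.

```python
# Gaussian-kernel density and a simple gradient ascent to a density attractor.
import numpy as np

def density(x, data, sigma):
    # Overall density at x = sum of Gaussian influences of all data points.
    d2 = np.sum((data - x) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2)).sum()

def hill_climb(start, data, sigma, step=0.1, max_steps=200):
    x = np.asarray(start, dtype=float)
    for _ in range(max_steps):
        # Gradient direction of the Gaussian-kernel density (up to a constant factor).
        diff = data - x
        w = np.exp(-np.sum(diff ** 2, axis=1) / (2 * sigma ** 2))
        grad = (w[:, None] * diff).sum(axis=0)
        if np.linalg.norm(grad) < 1e-9:
            break
        x_next = x + step * grad / np.linalg.norm(grad)
        if density(x_next, data, sigma) <= density(x, data, sigma):
            break                     # stop when the local density no longer increases
        x = x_next
    return x                          # approximate density attractor
```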
42
Comparison: DENCLUE vs. DBSCAN
43
(No Transcript)
44
BIRCH (1996)
• BIRCH: Balanced Iterative Reducing and Clustering
using Hierarchies, by Zhang, Ramakrishnan, and Livny
(SIGMOD'96)
• http://portal.acm.org/citation.cfm?id=235968.233324
• Incrementally constructs a CF (Clustering Feature)
tree, a hierarchical data structure for
multiphase clustering
• Phase 1: scan the DB to build an initial in-memory CF
tree (a multi-level compression of the data that
tries to preserve the inherent clustering
structure of the data)
• Phase 2: use an arbitrary clustering algorithm to
cluster the leaf nodes of the CF-tree
• Scales linearly: finds a good clustering with a
single scan and improves the quality with a few
additional scans
• Weakness: handles only numeric data, and is
sensitive to the order of the data records.

45
BIRCH
• ABSTRACT
• Finding useful patterns in large datasets has
attracted considerable interest recently, and one
of the most widely studied problems in this area
is the identification of clusters, or densely
populated regions, in a multi-dimensional
dataset. Prior work does not adequately address the
problem of large datasets and minimization of I/O
costs.
• This paper presents a data clustering method
named BIRCH (Balanced Iterative Reducing and
Clustering using Hierarchies), and demonstrates
that it is especially suitable for very large
databases.
• BIRCH incrementally and dynamically clusters
incoming multi-dimensional metric data points to
try to produce the best quality clustering with
the available resources (i.e., available memory
and time constraints).
• BIRCH can typically find a good clustering with a
single scan of the data, and improve the quality
further with a few additional scans.
• BIRCH is also the first clustering algorithm
proposed in the database area to handle "noise"
(data points that are not part of the underlying
pattern) effectively.
• We evaluate BIRCH's time/space efficiency, data
input order sensitivity, and clustering quality
through several experiments.

46
Clustering Feature Vector
CF = (N, LS, SS) = (5, (16,30), (54,190))
for the points (3,4), (2,6), (4,5), (4,7), (3,8) (a sketch of the computation follows)
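A minimal sketch of the Clustering Feature CF = (N, LS, SS), where N is the number of points, LS the per-dimension linear sum, and SS the per-dimension sum of squares; it reproduces the numbers above, and the names are ours.

```python
# Clustering Feature of a set of points.
import numpy as np

def clustering_feature(points):
    pts = np.asarray(points, dtype=float)
    N = len(pts)
    LS = pts.sum(axis=0)            # linear sum
    SS = (pts ** 2).sum(axis=0)     # square sum
    return N, LS, SS

N, LS, SS = clustering_feature([(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)])
# -> N = 5, LS = [16, 30], SS = [54, 190]
# CFs are additive: merging two sub-clusters just adds their entries,
# and the centroid of a sub-cluster is LS / N.
```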
47
BIRCH
Iteratively put points into the closest leaf until the
threshold is exceeded, then split the leaf. Internal nodes
(I-nodes) summarize their subtrees, and they get split
when the threshold is exceeded. Once the in-memory CF
tree is built, use another method to cluster the
leaves together.
Branching factor B = 6; threshold L = 7
48
CURE (Clustering Using REpresentatives)
• CURE: proposed by Guha, Rastogi & Shim, 1998
• http://portal.acm.org/citation.cfm?id=276312
• Stops the creation of a cluster hierarchy if a
level consists of k clusters
• Uses multiple representative points to evaluate
the distance between clusters
• Adjusts well to arbitrarily shaped clusters (not
necessarily distance-based)

49
Drawbacks of Distance-Based Methods
• Drawbacks of square-error-based clustering methods:
• Consider only one point as representative of a
cluster
• Good only for clusters that are convex-shaped and of similar size and
density, and only if k can be reasonably estimated

50
CURE: The Algorithm
• Very much a hybrid method (involves pieces from
many others).
• Draw a random sample of size s.
• Partition the sample into p partitions, each of size s/p.
• Partially cluster each partition into s/(p·q) clusters.
• Eliminate outliers:
• by random sampling;
• if a cluster grows too slowly, eliminate it.
• Cluster the partial clusters.
• Label the data on disk.

51
Cure
• ABSTRACT
• Clustering, in data mining, is useful for
discovering groups and identifying interesting
distributions in the underlying data. Traditional
clustering algorithms either favor clusters with
spherical shapes and similar sizes, or are very
fragile in the presence of outliers.
• We propose a new clustering algorithm called CURE
that is more robust to outliers, and identifies
clusters having non-spherical shapes and wide
variances in size.
• CURE achieves this by representing each cluster
by a certain fixed number of points that are
generated by selecting well scattered points from
the cluster and then shrinking them toward the
center of the cluster by a specified fraction.
• Having more than one representative point per
cluster allows CURE to adjust well to the
geometry of non-spherical shapes and the
shrinking helps to dampen the effects of
outliers.
• To handle large databases, CURE employs a
combination of random sampling and partitioning.
A random sample drawn from the data set is first
partitioned and each partition is partially
clustered. The partial clusters are then
clustered in a second pass to yield the desired
clusters.
• Our experimental results confirm that the quality
of clusters produced by CURE is much better than
those found by existing algorithms.
• Furthermore, they demonstrate that random
sampling and partitioning enable CURE to not only
outperform existing algorithms but also to scale
well for large databases without sacrificing
clustering quality.

52
Data Partitioning and Clustering
• s = 50
• p = 2
• s/p = 25
• s/(p·q) = 5

[Figure: the sample partitioned into two halves, each partially clustered]
53
CURE: Shrinking Representative Points
• Shrink the multiple representative points towards
the gravity center by a fraction α.
• Multiple representatives capture the shape of the
cluster

54
Clustering Categorical Data: ROCK
http://portal.acm.org/citation.cfm?id=351745
• ROCK: RObust Clustering using linKs, by S. Guha,
R. Rastogi, K. Shim (ICDE'99).
• Agglomerative hierarchical
• Uses links to measure similarity/proximity
• Not distance based
• Computational complexity
• Basic ideas
• Similarity function and neighbors
• Let T1 = {1,2,3}, T2 = {3,4,5}

55
ROCK
• Abstract
• Clustering, in data mining, is useful to discover
distribution patterns in the underlying data.
• Clustering algorithms usually employ a distance
metric based (e.g., euclidean) similarity measure
in order to partition the database such that data
points in the same partition are more similar
than points in different partitions.
• In this paper, we study clustering algorithms for
data with boolean and categorical attributes.
• We show that traditional clustering algorithms
that use distances between points for clustering
are not appropriate for boolean and categorical
attributes. Instead, we propose a novel concept
of links to measure the similarity/proximity
between a pair of data points.
• We develop a robust hierarchical clustering
algorithm ROCK that employs links and not
distances when merging clusters.
• Our methods naturally extend to non-metric
similarity measures that are relevant in
situations where a domain expert/similarity table
is the only source of knowledge.
• In addition to presenting detailed complexity
results for ROCK, we also conduct an experimental
study with real-life as well as synthetic data
sets to demonstrate the effectiveness of our
techniques.
• For data with categorical attributes, our
findings indicate that ROCK not only generates
better quality clusters than traditional
algorithms, but it also exhibits good scalability
properties.

56
ROCK: Algorithm
• Links: the number of common neighbors between
two points
• Algorithm:
• Draw a random sample
• Cluster with links
• Label the data on disk
• Example neighbor sets:
{1,2,3}, {1,2,4}, {1,2,5}, {1,3,4},
{1,3,5}, {1,4,5}, {2,3,4}, {2,3,5}, {2,4,5},
{3,4,5}
link({1,2,3}, {1,2,4}) = 3 (a sketch follows)
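A minimal sketch of ROCK-style links for the transaction sets above: two sets are neighbors when their Jaccard similarity meets a threshold θ, and link(A, B) counts their common neighbors. The choice θ = 0.5, and not counting a set as its own neighbor, are our assumptions; with them, link({1,2,3}, {1,2,4}) = 3 as on the slide.

```python
# Jaccard neighbors and ROCK-style links between transactions.
from itertools import combinations

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def links(points, theta):
    nbrs = {i: {j for j in range(len(points))
                if i != j and jaccard(points[i], points[j]) >= theta}
            for i in range(len(points))}
    return {(i, j): len(nbrs[i] & nbrs[j])
            for i, j in combinations(range(len(points)), 2)}

pts = [{1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5},
       {1,4,5}, {2,3,4}, {2,3,5}, {2,4,5}, {3,4,5}]
lk = links(pts, theta=0.5)
print(lk[(0, 1)])   # link({1,2,3}, {1,2,4}) -> 3
```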
57
CHAMELEON
• CHAMELEON: hierarchical clustering using dynamic
modeling, by G. Karypis, E. H. Han, and V. Kumar (1999)
• http://portal.acm.org/citation.cfm?id=621303
• Measures the similarity based on a dynamic model
• Two clusters are merged only if the
interconnectivity and closeness (proximity)
between the two clusters are high relative to the
internal interconnectivity of the clusters and
the closeness of items within the clusters
• A two-phase algorithm
• 1. Use a graph-partitioning algorithm to cluster
objects into a large number of relatively small
sub-clusters
• 2. Use an agglomerative hierarchical clustering
algorithm to find the genuine clusters by
repeatedly combining these sub-clusters

58
CHAMELEON
• ABSTRACT
• Many advanced algorithms have difficulty dealing
with highly variable clusters that do not follow
a preconceived model.
• By basing its selections on both
interconnectivity and closeness, the Chameleon
algorithm yields accurate results for these
highly variable clusters.
• Existing algorithms use a static model of the
clusters and do not use information about the
nature of individual clusters as they are merged.
• Furthermore, one set of schemes (the CURE
algorithm and related schemes) ignores information
about the aggregate interconnectivity
of items in two clusters.
• Another set of schemes (the Rock algorithm, group
averaging method, and related schemes) ignores
information about the closeness of two clusters
as defined by the similarity of the closest items
across two clusters.
• By considering either interconnectivity or
closeness only, these algorithms can select and
merge the wrong pair of clusters.
• Chameleon's key feature is that it accounts for
both interconnectivity and closeness in
identifying the most similar pair of clusters.
• Chameleon finds the clusters in the data set by
using a two-phase algorithm.
• During the first phase, Chameleon uses a
graph-partitioning algorithm to cluster the data
items into several relatively small subclusters.
• During the second phase, it uses an algorithm to
find the genuine clusters by repeatedly combining
these sub-clusters.

59
Overall Framework of CHAMELEON
Data Set → Construct Sparse Graph → Partition the Graph → Merge Partitions → Final Clusters
60
Grid-Based Clustering Methods
• Use a multi-resolution grid data structure
• Several interesting methods
• STING (a STatistical INformation Grid approach)
by Wang, Yang and Muntz (1997)
• WaveCluster by Sheikholeslami, Chatterjee, and
Zhang (VLDB'98)
• A multi-resolution clustering approach using
wavelets
• CLIQUE: Agrawal, et al. (SIGMOD'98)

61
Vertical gridding
We can observe that almost all methods discussed
so far suffer from the curse of cardinality (for
very large cardinality data sets, the algorithms
are too slow to finish in the average lifetime!)
and/or the curse of dimensionality (points are
all at nearly the same distance). The work-arounds
employed to address the curses:
• Sampling: throw out most of the points in such a way
that what remains is of low enough cardinality for
the algorithm to finish, and in such a way that the
remaining sample contains all the information of the
original data set. (Therein is the problem: that is
impossible to do in general.)
• Gridding: agglomerate all points in a grid cell and
treat them as one point (smooth the data set to this
gridding level). The problem with gridding, often,
is that information is lost and the data structure
that holds the grid cell information is very complex.
With vertical methods (e.g., P-trees), all the
information can be retained and griddings can be
constructed very efficiently on demand. Horizontal
data structures can't do this.
• Subspace restrictions (e.g., Principal Components);
gradient methods (e.g., the gradient tangent vector
field of a response surface reduces the calculations
to the number of dimensions, not the number of
combinations of dimensions).
• j-hi gridding: the j high-order bits identify a grid
cell and the rest identify points in a particular
cell. Thus, j-hi cells are not necessarily cubical
(unless all attribute bit-widths are the same).
• j-lo gridding: the j low-order bits identify points
in a particular cell and the rest identify a grid
cell. Thus, j-lo cells always have
a nice uniform shape (cubical). (A hedged sketch of the bit operations follows.)
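A hedged sketch of j-hi and j-lo gridding via bit operations; the NumPy representation and the names are ours (real P-tree griddings are vertical, compressed structures).

```python
# Split an attribute's bits into a grid-cell id and an in-cell coordinate.
import numpy as np

def j_hi_grid(values, bit_width, j):
    values = np.asarray(values, dtype=np.uint32)
    cell_id = values >> (bit_width - j)                  # j high-order bits: which cell
    in_cell = values & ((1 << (bit_width - j)) - 1)      # remaining bits: point within cell
    return cell_id, in_cell

def j_lo_grid(values, bit_width, j):
    values = np.asarray(values, dtype=np.uint32)
    cell_id = values >> j                                # high-order bits: which cell
    in_cell = values & ((1 << j) - 1)                    # j low-order bits: point within cell
    return cell_id, in_cell

# Example: a 3-bit attribute with 1-hi gridding (as on the next slide).
vals = np.array([0b101, 0b010, 0b111, 0b001])
print(j_hi_grid(vals, bit_width=3, j=1))   # cells (1, 0, 1, 0), in-cell coords (1, 2, 3, 1)
```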
62
1-hi gridding of the Vector Space R(A1, A2, A3), in
which all bit-widths are the same (3), so each
grid cell contains 2² × 2² × 2² = 64 potential
points. Grid cells are identified by their
Peano id (Pid); internally, the points' cell
coordinates are shown - the grid cell id (gci) and
the grid cell point (gcp) coordinates
within their cell.
[Figure: grid cell Pid = 001 (gci = 001) and its 64 potential points, each labeled gcp = (x1, x2, x3) by its two low-order bits per dimension; axes A1, A2, A3 are split on their hi-bits (0/1).]
63
2-hi gridding of the Vector Space R(A1, A2, A3), in
which all bit-widths are the same (3), so each
grid cell contains 2¹ × 2¹ × 2¹ = 8 points.
[Figure: grid cell Pid = 001.001 (gci = 00,00,11) and its 8 points, each labeled gcp = (x1, x2, x3) by its low-order bit per dimension; axes A1, A2, A3 are split on their two hi-bits.]
64
1-hi gridding of R(A1, A2, A3), bit-widths of
3, 2, 3
[Figure: grid cell Pid = 001 of the 1-hi gridding and its 32 potential points, labeled gcp = (x1, x2, x3); axes A1, A2, A3 are split on their hi-bits.]
65
2-hi gridding of R(A1, A2, A3), bit-widths of
3, 2, 3 (each grid cell contains 2¹ × 2⁰ × 2¹ = 4
potential points).
[Figure: grid cell Pid = 3.1.3 of the 2-hi gridding and its 4 potential points gcp = (x1, , x3); axes A1, A2, A3 labeled by their 2-hi bits (A2 contributes both of its bits to the cell id).]
66
HOBBit disks and rings (HOBBit = Hi Order Bifurcation Bit).
Take a 4-lo grid where A1, A2, A3 have bit-widths b1, b2, b3.
HOBBit grid centers are points of the form (exactly one per
grid cell) x = (x1,b1..x1,5 1010, x2,b2..x2,5 1010,
x3,b3..x3,5 1010), where the xi,j's range over all binary
patterns; the disk H(x, 2^0) about such a center is shown
below. Note: we have switched the direction of A3.

[Figure: H(x, 2^0) - the 8 points that agree with the HOBBit center x except possibly in the lowest bit of each dimension (low-order patterns 1010/1011), shown over axes A1, A2, A3.]
67
H(x, 2^1): the HOBBit disk about a HOBBit grid center point
x = (x1,b1..x1,5 1010, x2,b2..x2,5 1010, x3,b3..x3,5 1010)

[Figure: H(x, 2^1) over axes A1, A2, A3 - the points agreeing with x except possibly in the two lowest bits of each dimension (low-order patterns 1000 through 1011).]
68
The regions of H(x, 2^1) are as follows:
69
These REGIONS are labeled with the dimensions in
which the length is increased (e.g., all three
dimensions are increased below).
[Figure: 123-REGION over axes A1, A2, A3]
70
[Figure: 13-REGION]
71
[Figure: 23-REGION]
72
[Figure: 12-REGION]
73
[Figure: 3-REGION]
74
[Figure: 2-REGION]
75
[Figure: 1-REGION]
76
[Figure: H(x, 2^0) and the 123-REGION of H(x, 2^0)]
77
(No Transcript)
78
• Select an outlier threshold, ot (points without
neighbors in their ot L∞-disk are outliers). That
is, there is no gradient at these outlier points
(the instantaneous rate of response change is zero).
• Create a j-lo grid with j = ot (see the
previous slides, where HOBBit disks are built
out from HOBBit centers
x = ( x1,b1..x1,ot+1 10..10,
..., xn,bn..xn,ot+1 10..10 ), the xi,j's ranging over
all binary patterns).
• Pick a point x in R. Build out alternating
one-sided rings centered at x until a neighbor is
found or radius ot is exceeded (in which case x
is declared an outlier). If a neighbor is found
at a radius ri < ot = 2^j, ∂f/∂xk(x) is
estimated as below (a hedged sketch also follows this list).
• Note: one can use L∞-HOBBit or ordinary L∞
distance.
• Note: one-sided means that each successive build-out
increases alternately only in the positive
direction in all dimensions, then only in the
negative direction in all dimensions.
• Note: building out HOBBit disks from a HOBBit
center automatically gives one-sided rings (a
built-out ring is defined to be the built-out
disk minus the previous built-out disk), as shown
in the next few slides.
• ∂f/∂xk(x) ≈ ( RootCount(D(x,ri)) − RootCount(D(x,ri)_k) ) / Δxk,
where D(x,ri)_k is D(x,ri−1) expanded
in all dimensions except k.
• Alternatively in 3., actually calculate the mean
(or median?) of the new points encountered in
D(x,ri) (we have a P-tree mask for the set, so
this is trivial) and measure the xk-distance.
• NOTE: might want to go 1 more ring out, to see
if one gets the same or a similar gradient.
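A hedged sketch of the gradient estimate above. Root counts of P-tree masks are simulated by directly counting points inside L∞ boxes, and the Δxk handling is simplified (the slides build out one-sided rings, where Δxk can be negative); all names are ours.

```python
# Componentwise gradient estimate from root counts of nested disks.
import numpy as np

def root_count_disk(data, x, radii):
    """Number of points within the box |data_k - x_k| <= radii[k] for all k."""
    return int(np.all(np.abs(data - x) <= radii, axis=1).sum())

def gradient_estimate(data, x, r_prev, r_cur):
    """Estimate df/dxk: compare the full disk of radius r_cur with the disk
    expanded to r_cur in every dimension EXCEPT k (i.e., D(x, r_i)_k)."""
    dim = data.shape[1]
    full = root_count_disk(data, x, np.full(dim, r_cur))
    grad = np.zeros(dim)
    for k in range(dim):
        radii_k = np.full(dim, r_cur)
        radii_k[k] = r_prev                      # not expanded along dimension k
        partial = root_count_disk(data, x, radii_k)
        grad[k] = (full - partial) / (r_cur - r_prev)
    return grad
```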

79
[Figure: H(x, 2^1) and H(x, 2^1)_1 about the HOBBit center x = (x1,b1..x1,5 1010, x2,b2..x2,5 1010, x3,b3..x3,5 1010), with the first new point marked.]
80
( RootCount(D(x,ri)) − RootCount(D(x,ri)_2) ) / Δx2 = (2−1)/(−1) = −1
[Figure: H(x, 2^1) and H(x, 2^1)_2]
81
( RootCount(D(x,ri)) − RootCount(D(x,ri)_3) ) / Δx3 = (2−1)/(−1) = −1
[Figure: H(x, 2^1) and H(x, 2^1)_3]
82
[Figure: H(x, 2^1) about the HOBBit center x = (x1,b1..x1,5 1010, x2,b2..x2,5 1010, x3,b3..x3,5 1010), with the first new point marked.]
83
Est. ∂f/∂xk(x) = ( RootCount(D(x,ri)) − RootCount(D(x,ri)_1) ) / Δx1 = (2−2)/(−1) = 0
[Figure: H(x, 2^1) and H(x, 2^1)_1 about the HOBBit center x]
84
[Figure: H(x, 2^1) and H(x, 2^1)_2]
85
[Figure: H(x, 2^1) and H(x, 2^1)_3]
86
[Figure: H(x, 2^1)]
87
[Figure: H(x, 2^1) and H(x, 2^1)_1 about the HOBBit center x, with the first new point marked.]
88
( RootCount(D(x,ri)) − RootCount(D(x,ri)_2) ) / Δx2 = (2−2)/(−1) = 0
[Figure: H(x, 2^1) and H(x, 2^1)_2]
89
( RootCount(D(x,ri)) − RootCount(D(x,ri)_3) ) / Δx3 = (2−2)/(−1) = 0
[Figure: H(x, 2^1) and H(x, 2^1)_3]
90
This seems to work. Next we consider a potential
accuracy improvement in which we take the medoid
of all new points as the gradient (or, more
accurately, as the point to which we climb in any
response-surface hill-climbing technique).
[Figure: H(x, 2^1)]
91
Take the medoid of the new point set (or, more correctly,
use it to estimate the next hill-climb step). Note: if the
original points are truly part of a strong
cluster, the hill climb will be excellent.
[Figure: H(x, 2^1) with the new points and the new points' centroid marked]
92
Take the medoid of the new point set (or, more correctly,
use it to estimate the next hill-climb step). Note: if the
original points are not truly part of a strong
cluster, the weak hill climb will indicate that.
[Figure: H(x, 2^1) with the new points and the new points' centroid marked]
93
[Figure: H(x, 2^2) and H(x, 2^2)_1, with the first new point marked]
94
To evaluate how well the formula estimates
the gradient, it is important to consider all
cases of the new point appearing in one of these
regions (if more than 1 point appears, the gradient
components are additive, so it suffices to
consider 1 point at a time).
[Figure: H(x, 2) and its regions]
95
To evaluate how well the formula estimates the
gradient, it is important to consider all cases
of the new point appearing in one of these regions.
96
(No Transcript)
97
H(x, 2^3)
Notice that the HOBBit center moves more and more
toward the true center as the grid size increases.
98
Grid-based Gradients and Hill Climbing
• If we are using gridding to produce the gradient
vector field of a response surface, might we
always vary Δxi in the positive direction only?
How can that be done most efficiently?
• 1. j-lo gridding, building out HOBBit rings from
HOBBit grid centers (see the
previous slides, where this approach was used);
• 2. or j-lo gridding, building out HOBBit rings from lo-value
grid points (ending in j 0-bits):
x = ( x1,b1..x1,j+1 0..0, ..., xn,bn..xn,j+1 0..0 );
• 3. Ordinary j-lo gridding, building out rings from
lo-value ids (ending in j zero bits);
• 4. Ordinary j-lo gridding, building out rings from
true centers.
• Other? (There are many other possibilities, but
we will first explore 2.)
• Using j-lo gridding with j = 3 and lo-value cell
identifiers is shown on the next slide.
• Of course, we need not use HOBBit build-out.
• With ordinary unit-radius build-out, the results
are more exact, but the calculations may be
more complex???

99
HOBBit j-lo rings using lo-value cell ids:
x = (x1,b1..x1,j+1 0..0, ..., xn,bn..xn,j+1 0..0)
100
Ordinary j-lo rings using lo-value cell ids:
x = (x1,b1..x1,j+1 0..0, ..., xn,bn..xn,j+1 0..0)
PRing(x,3) = PDisk(x,3) AND PDisk(x,2)'
PRing(x,2) = PDisk(x,2) AND PDisk(x,1)', where
PDisk(x,i) = P_x,b AND .. AND P_x,j+1 AND P'_j AND .. AND P'_i+1
(a hedged sketch follows)
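A hedged sketch of the disk and ring masks above for ordinary j-lo gridding with lo-value cell ids; the P-tree ANDs are simulated with NumPy boolean masks, and the exact radius convention is our assumption.

```python
# Disk/ring membership masks for a single attribute under j-lo gridding.
import numpy as np

def disk_mask(values, x, i):
    """PDisk(x, i): values that agree with x on every bit above position i,
    i.e., lie in the same block of width 2^i as the lo-value cell id x."""
    values = np.asarray(values, dtype=np.uint32)
    return (values >> i) == (np.uint32(x) >> i)

def ring_mask(values, x, i):
    """PRing(x, i) = PDisk(x, i) AND NOT PDisk(x, i-1)."""
    return disk_mask(values, x, i) & ~disk_mask(values, x, i - 1)

vals = np.array([0b0000, 0b0001, 0b0011, 0b0101, 0b1000])
print(disk_mask(vals, 0b0000, 2))   # within the disk of radius 2^2 about cell id 0
print(ring_mask(vals, 0b0000, 2))   # in the disk of radius 2^2 but not of radius 2^1
```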