
Clustering Analysis (of Spatial Data and using Peano Count Trees) (P-tree technology is patented by NDSU)

Notes: 1. Over 100 slides; we are not going to go through each in detail.

Clustering Methods

- A Categorization of Major Clustering Methods
- Partitioning methods
- Hierarchical methods
- Density-based methods
- Grid-based methods
- Model-based methods

Clustering Methods based on Partitioning

- Partitioning method: construct a partition of a database D of n objects into a set of k clusters.
- Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion.
- k-means (MacQueen'67): each cluster is represented by the center of the cluster.
- k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw'87): each cluster is represented by one object in the cluster (the middle object or median-like object).

The K-Means Clustering Method

- Given k, the k-means algorithm is implemented in 4 steps (it assumes the partitioning criterion is: maximize intra-cluster similarity and minimize inter-cluster similarity. Of course, a heuristic is used; the method isn't really an optimization):
- 1. Partition the objects into k nonempty subsets (or pick k initial means).
- 2. Compute the mean (center), or centroid, of each cluster of the current partition (if one started with k means, this step is done). Centroid: the point that minimizes the sum of dissimilarities (or the sum of square errors) from the mean.
- 3. Assign each object to the cluster with the most similar (closest) center.
- 4. Go back to Step 2.
- Stop when the new set of means doesn't change (or some other stopping condition holds). A minimal sketch of this loop follows below.
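The loop above is compact enough to state in a few lines. Below is a minimal NumPy sketch of it; the names (kmeans, max_iter, seed) are ours, not from the slides, and real implementations add better seeding (e.g., k-means++) and empty-cluster handling:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial means (here: k distinct random data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each object to the closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute the centroid of each cluster of the new partition.
        new_centers = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)])
        # Stop when the set of means no longer changes.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```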

k-Means

[Figure: four panels (Steps 1-4) showing the assignment and centroid-update iterations of k-means on a 10×10 scatter plot.]

The K-Means Clustering Method

- Strength
- Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n.
- Weakness
- Applicable only when the mean is defined (e.g., a vector space).
- Need to specify k, the number of clusters, in advance.
- Sensitive to noisy data and outliers, since a small number of such data can substantially influence the mean value.

The K-Medoids Clustering Method

- Find representative objects, called medoids (a medoid must be an actual object in the cluster, whereas the mean seldom is).
- PAM (Partitioning Around Medoids, 1987):
- starts from an initial set of medoids
- iteratively replaces one of the medoids by a non-medoid
- if the swap improves the aggregate similarity measure, retains it; this is done over all medoid/non-medoid pairs
- PAM works for small data sets but does not scale to large data sets.
- CLARA (Clustering LARge Applications) (Kaufmann & Rousseeuw, 1990): sub-samples, then applies PAM.
- CLARANS (Clustering Large Applications based on RANdomized Search) (Ng & Han, 1994): randomizes the sampling.

PAM (Partitioning Around Medoids) (1987)

- Use a real object to represent each cluster:
- 1. Select k representative objects arbitrarily.
- 2. For each pair of a non-selected object h and a selected object i, calculate the total swapping cost TC_{i,h}; if TC_{i,h} < 0, i is replaced by h.
- 3. Assign each non-selected object to the most similar representative object.
- Repeat steps 2-3 until there is no change. (A sketch of one such pass appears below.)
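A hedged sketch of one PAM pass over a precomputed distance matrix follows; total_cost and pam_pass are hypothetical names, and production PAM computes TC_{i,h} incrementally rather than re-summing as done here:

```python
import numpy as np

def total_cost(D, medoids):
    # D: n x n pairwise-distance matrix; cost = each object's distance
    # to its nearest medoid, summed over all objects.
    return D[:, medoids].min(axis=1).sum()

def pam_pass(D, medoids):
    """Try every (medoid i, non-medoid h) swap; keep improving swaps."""
    medoids = list(medoids)
    best = total_cost(D, medoids)
    for i in range(len(medoids)):
        for h in range(D.shape[0]):
            if h in medoids:
                continue
            trial = medoids[:i] + [h] + medoids[i + 1:]
            tc = total_cost(D, trial) - best   # total swapping cost TC_{i,h}
            if tc < 0:                         # swap only if it improves
                medoids, best = trial, best + tc
    return medoids, best
```

Repeating pam_pass until the cost stops falling corresponds to "repeat steps 2-3 until there is no change."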

CLARA (Clustering Large Applications) (1990)

- CLARA (Kaufmann and Rousseeuw, 1990)
- Draws multiple samples of the data set, applies PAM on each sample, and gives the best clustering as the output.
- Strength: deals with larger data sets than PAM.
- Weakness:
- Efficiency depends on the sample size.
- A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased.

CLARANS (Randomized CLARA) (1994)

- CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han'94)
- Draws a sample of neighbors dynamically.
- The clustering process can be presented as searching a graph where every node is a potential solution, that is, a set of k medoids.
- If a local optimum is found, CLARANS starts from a new randomly selected node in search of a new local optimum (genetic-algorithm-like).
- Finally, the best local optimum is chosen after some stopping condition.
- It is more efficient and scalable than both PAM and CLARA.

Distance-based partitioning has drawbacks

- Simple and fast: O(N).
- The number of clusters, K, has to be arbitrarily chosen before it is known how many clusters are correct.
- Produces round-shaped clusters, not arbitrary shapes (Chameleon data set below).
- Sensitive to the selection of the initial partition; may converge to a local minimum of the criterion function if the initial partition is not well chosen.

[Figure: the Chameleon data set; the correct result vs. the k-means result.]

Distance-based partitioning (Cont.)

- If we start with A, B, and C as the initial centroids around which the three clusters are built, then we end up with the partition of A, B, C, D, E, F, G shown by ellipses.
- Whereas the correct three-cluster solution is obtained by choosing, for example, A, D, and F as the initial cluster means (rectangular clusters).

A Vertical Data Approach

- Partition the data set using rectangle P-trees (a gridding).
- These P-trees can be viewed as a grouping (partition) of the data.
- Prune out outliers by disregarding sparse values:
- Input: total number of objects (N), allowed percentage of outliers (t)
- Output: grid P-trees after pruning
- (1) Choose the grid P-tree with the smallest root count (Pgc)
- (2) outliers = outliers OR Pgc
- (3) if (|outliers| / N < t), remove Pgc and repeat (1)-(2). (A bit-mask sketch of this loop follows below.)
- Find clusters using the PAM method, with each grid P-tree as an object.
- Note: once we have a P-tree mask for each cluster, the mean is just the vector sum of the basic P-trees ANDed with the cluster P-tree, divided by the root count of the cluster P-tree.

Distance Function

- Data Matrix: n objects × p variables
- Dissimilarity Matrix: n objects × n objects (both reconstructed below)
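The matrices themselves did not survive extraction; in the standard form these slides appear to follow (Han & Kamber), they are:

```latex
\[
X =
\begin{bmatrix}
x_{11} & \cdots & x_{1p} \\
\vdots & \ddots & \vdots \\
x_{n1} & \cdots & x_{np}
\end{bmatrix}
\qquad
D =
\begin{bmatrix}
0      &        &        &   \\
d(2,1) & 0      &        &   \\
\vdots & \vdots & \ddots &   \\
d(n,1) & d(n,2) & \cdots & 0
\end{bmatrix}
\]
```

X holds the n objects as rows; D is symmetric, so only the lower triangle is stored.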

AGNES (Agglomerative Nesting)

- Introduced in Kaufmann and Rousseeuw (1990)
- Uses the single-link method (the distance between two sets is the minimum pairwise distance).
- Merges the nodes that are most similar.
- Eventually all nodes belong to the same cluster.

DIANA (Divisive Analysis)

- Introduced in Kaufmann and Rousseeuw (1990)
- Inverse order of AGNES: initially all objects are in one cluster, which is then split according to some criterion (e.g., again, maximize some aggregate measure of pairwise dissimilarity).
- Eventually each node forms a cluster on its own.

Contrasting Clustering Techniques

- Partitioning algorithms: partition a dataset into k clusters, e.g., k = 3.

- Hierarchical algorithms: create a hierarchical decomposition of ever-finer partitions, e.g., top down (divisively) or bottom up (agglomeratively).

Hierarchical Clustering

Hierarchical Clustering (top down)

- In either case, one gets a nice dendrogram in which any maximal anti-chain (no 2 nodes linked) is a clustering (partition).

Hierarchical Clustering (Cont.)

Recall that any maximal anti-chain (maximal set of nodes in which no 2 are chained) is a clustering (a dendrogram offers many).

Hierarchical Clustering (Cont.)

But the horizontal anti-chains are the clusterings resulting from the top-down (or bottom-up) method(s).

Hierarchical Clustering (Cont.)

- Most hierarchical clustering algorithms are variants of single-link, complete-link, or average-link.
- Of these, single-link and complete-link are the most popular.
- In the single-link method, the distance between two clusters is the minimum of the distances between all pairs of patterns drawn one from each cluster.
- In the complete-link algorithm, the distance between two clusters is the maximum of all pairwise distances between pairs of patterns drawn one from each cluster.
- In the average-link algorithm, the distance between two clusters is the average of all pairwise distances between pairs of patterns drawn one from each cluster (which, in the vector-space case, is the same as the distance between the means, and is easier to calculate).

Distance Between Clusters

- Single link: smallest distance between any pair of points from the two clusters.

- Complete link: largest distance between any pair of points from the two clusters.

Distance between Clusters (Cont.)

- Average link: average distance between points from the two clusters.

- Centroid: distance between the centroids of the two clusters. (All four measures are sketched below.)
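All four measures reduce to a few lines over point arrays; a minimal NumPy sketch (the function names are ours):

```python
import numpy as np

def pairwise(C1, C2):
    # matrix of distances between every point of C1 and every point of C2
    return np.linalg.norm(C1[:, None, :] - C2[None, :, :], axis=2)

def single_link(C1, C2):    # minimum pairwise distance
    return pairwise(C1, C2).min()

def complete_link(C1, C2):  # maximum pairwise distance
    return pairwise(C1, C2).max()

def average_link(C1, C2):   # average pairwise distance
    return pairwise(C1, C2).mean()

def centroid_link(C1, C2):  # distance between the two centroids
    return np.linalg.norm(C1.mean(axis=0) - C2.mean(axis=0))
```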

Single Link vs. Complete Link (Cont.)

Single link works but not complete link

Complete link works but not single link

Complete link works but not single link

Single Link vs. Complete Link (Cont.)

Single link works

Complete link doesn't

Single Link vs. Complete Link (Cont.)

Single link doesn't work

Complete link does

[Figure: 1-cluster, noise, and 2-cluster cases.]

Hierarchical vs. Partitional

- Hierarchical algorithms are more versatile than partitional algorithms.
- For example, the single-link clustering algorithm works well on data sets containing non-isotropic (non-roundish) clusters, including well-separated, chain-like, and concentric clusters, whereas a typical partitional algorithm such as k-means works well only on data sets having isotropic clusters.
- On the other hand, the time and space complexities of partitional algorithms are typically lower than those of hierarchical algorithms.

More on Hierarchical Clustering Methods

- Major weaknesses of agglomerative clustering methods:
- they do not scale well: time complexity of at least O(n²), where n is the total number of objects
- they can never undo what was done previously (greedy algorithm)
- Integration of hierarchical with distance-based clustering:
- BIRCH (1996): uses a Clustering Feature tree (CF-tree) and incrementally adjusts the quality of sub-clusters
- CURE (1998): selects well-scattered points from the cluster and then shrinks them towards the center of the cluster by a specified fraction
- CHAMELEON (1999): hierarchical clustering using dynamic modeling

Density-Based Clustering Methods

- Clustering based on density (a local cluster criterion), such as density-connected points.
- Major features:
- Discovers clusters of arbitrary shape
- Handles noise
- One scan
- Needs density parameters as a termination condition
- Several interesting studies:
- DBSCAN: Ester, et al. (KDD'96)
- OPTICS: Ankerst, et al. (SIGMOD'99)
- DENCLUE: Hinneburg & Keim (KDD'98)
- CLIQUE: Agrawal, et al. (SIGMOD'98)

Density-Based Clustering Background

- Two parameters:
- Eps: maximum radius of the neighbourhood
- MinPts: minimum number of points in an Eps-neighbourhood of that point
- N_Eps(p) = {q in D | dist(p, q) <= Eps}
- Directly density-reachable: a point p is directly density-reachable from a point q wrt Eps, MinPts if
- 1) p belongs to N_Eps(q), and
- 2) q is a core point: |N_Eps(q)| >= MinPts

Density-Based Clustering Background (II)

- Density-reachable:
- A point p is density-reachable from a point q wrt Eps, MinPts if there is a chain of points p1, ..., pn, with p1 = q and pn = p, such that p_{i+1} is directly density-reachable from p_i.
- For every q, q is density-reachable from q.
- Density-reachability is reflexive and transitive, but not symmetric, since only core objects can be density-reachable from each other.
- Density-connected:
- A point p is density-connected to a point q wrt Eps, MinPts if there is a point o such that both p and q are density-reachable from o wrt Eps, MinPts.
- Density-reachability is not symmetric; density-connectivity inherits the reflexivity and transitivity and provides the symmetry. Thus, density-connectivity is an equivalence relation and therefore gives a partition (clustering).

[Figure: p density-reachable from q through the intermediate point p1; p and q density-connected through o.]

DBSCAN: Density-Based Spatial Clustering of Applications with Noise

- Relies on a density-based notion of cluster: a cluster is defined as an equivalence class of density-connected points.
- This gives the transitive property for the density-connectivity binary relation, so it is an equivalence relation whose classes form a partition (clustering), by the usual equivalence-relation/partition duality.
- Discovers clusters of arbitrary shape in spatial databases with noise.

[Figure: core, border, and outlier points; Eps = 1 cm, MinPts = 3.]

DBSCAN The Algorithm

- Arbitrarily select a point p.
- Retrieve all points density-reachable from p wrt Eps, MinPts.
- If p is a core point, a cluster is formed. (Note: it doesn't matter which of the core points within a cluster you start at, since density-reachability is symmetric on core points.)
- If p is a border point or an outlier, no points are density-reachable from p, and DBSCAN visits the next point of the database. Keep track of such points; if they don't get scooped up by a later core point, then they are outliers.
- Continue the process until all of the points have been processed.
- What about a simpler version of DBSCAN?
- Define core points and core neighborhoods the same way.
- Define an (undirected-graph) edge between two points if they cohabit a core neighborhood.
- The connected-component partition is the clustering. (A sketch of this variant follows below.)
- Other related methods? How does vertical technology help here? Gridding?
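A hedged sketch of that simpler variant (our construction, suggested by the slide, not the published DBSCAN): compute Eps-neighborhoods, link every point to each core point whose neighborhood it shares, and read the clusters off as connected components:

```python
import numpy as np

def simple_dbscan(X, eps, min_pts):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    nbrs = [np.flatnonzero(D[i] <= eps) for i in range(n)]     # N_Eps(p)
    core = [i for i in range(n) if len(nbrs[i]) >= min_pts]    # core points

    # union-find: two points end up together iff they are connected
    # through a chain of shared core neighborhoods
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    for c in core:
        for q in nbrs[c]:
            parent[find(q)] = find(c)

    labels = [find(i) for i in range(n)]
    # non-core points left in singleton components are the outliers
    return labels, core
```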

OPTICS

- Ordering Points To Identify the Clustering Structure
- Ankerst, Breunig, Kriegel, and Sander (SIGMOD'99)
- http://portal.acm.org/citation.cfm?id=304187
- Addresses a shortcoming of DBSCAN, namely choosing parameters.
- Develops a special order of the database wrt its density-based clustering structure.
- This cluster ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings.
- Good for both automatic and interactive cluster analysis, including finding the intrinsic clustering structure.

OPTICS

Does this order resemble the Total Variation order?

OPTICS: Some Extensions from DBSCAN

- Index-based:
- k = number of dimensions
- N = number of points (20)
- p = 75%
- M = N(1 - p) = 5
- Complexity: O(kN²)
- Core distance
- Reachability distance

[Figure: reachability-distance r(p, o) = max(core-distance(o), d(o, p)); with MinPts = 5 and Eps = 3 cm, r(p1, o) = 2.8 cm and r(p2, o) = 4 cm.]

[Figure: reachability-distance plot (undefined for the first object) against the cluster order of the objects.]

DENCLUE using density functions

- DENsity-based CLUstEring, by Hinneburg & Keim (KDD'98)
- Major features:
- Solid mathematical foundation
- Good for data sets with large amounts of noise
- Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
- Significantly faster than existing algorithms (faster than DBSCAN by a factor of up to 45, as claimed by the authors)
- But needs a large number of parameters

Denclue Technical Essence

- Uses grid cells, but only keeps information about grid cells that actually contain data points, and manages these cells in a tree-based access structure.
- Influence function: describes the impact of a data point within its neighborhood.
- f(x, y) measures the influence that y has on x.
- A very good influence function is the Gaussian, f(x, y) = e^(-d²(x, y) / (2σ²)).
- Others include functions similar to the squashing functions used in neural networks.
- One can think of the influence function as a measure of the contribution to the density at x made by y.
- The overall density of the data space can be calculated as the sum of the influence functions of all data points.
- Clusters can be determined mathematically by identifying density attractors.
- Density attractors are local maxima of the overall density function. (A sketch of these two functions follows below.)
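A minimal sketch of the Gaussian influence function and the overall density it induces (names are ours; DENCLUE evaluates these over its grid structure rather than naively, as here):

```python
import numpy as np

def gaussian_influence(x, y, sigma):
    # f_Gauss(x, y) = exp(-d(x, y)^2 / (2 * sigma^2))
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

def overall_density(x, data, sigma):
    # density at x = sum of the influences of all data points
    return sum(gaussian_influence(x, y, sigma) for y in data)
```

Density attractors are then the local maxima of overall_density, found by hill climbing as in the DENCLUE procedure on the next slide.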

DENCLUE(D, σ, ξc, ξ)

- Grid the data set (use r = σ, the std. dev.).
- Find (highly) populated cells (use the threshold ξc) (shown in blue); i.e., identify the populated (nonempty) cells.
- Find the density-attractor points, C, using hill climbing:
- Randomly pick a point, p_i.
- Compute the local density (use r = 4σ).
- Pick another point, p_{i+1}, close to p_i, and compute the local density at p_{i+1}.
- If LocDen(p_i) < LocDen(p_{i+1}), climb.
- Put all points within distance σ/2 of the path p_i, p_{i+1}, ..., C into a density-attractor cluster called C.
- Connect the density-attractor clusters, using a threshold, ξ, on the local densities of the attractors.

A. Hinneburg and D. A. Keim. An Efficient Approach to Clustering in Multimedia Databases with Noise. In Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, 1998. KDD'99 Workshop.

Comparison: DENCLUE vs. DBSCAN

[Figure only; no transcript available.]

BIRCH (1996)

- BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies, by Zhang, Ramakrishnan and Livny (SIGMOD'96)
- http://portal.acm.org/citation.cfm?id=235968.233324
- Incrementally constructs a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering:
- Phase 1: scan the DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve the inherent clustering structure of the data)
- Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree
- Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans
- Weaknesses: handles only numeric data, and is sensitive to the order of the data records

BIRCH

- ABSTRACT
- Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset.
- Prior work does not adequately address the problem of large datasets and minimization of I/O costs.
- This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases.
- BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints).
- BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans.
- BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively.
- We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments.

Clustering Feature Vector

CF = (N, LS, SS) = (5, (16,30), (54,190))

for the five points (3,4), (2,6), (4,5), (4,7), (3,8): N is the number of points, LS their linear sum, and SS their per-dimension sum of squares (verified in the sketch below).
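The example checks out directly:

```python
import numpy as np

pts = np.array([(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)])
N = len(pts)                    # 5
LS = pts.sum(axis=0)            # linear sum: (16, 30)
SS = (pts ** 2).sum(axis=0)     # sum of squares: (54, 190)
print(N, tuple(LS), tuple(SS))  # -> 5 (16, 30) (54, 190)
```

CF vectors are additive (merging two clusters just adds their N, LS, and SS component-wise), which is what lets BIRCH maintain them incrementally.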

Birch

Iteratively insert each point into the closest leaf until a leaf's threshold is exceeded, then split the leaf. Internal nodes summarize their subtrees and are themselves split when the threshold is exceeded. Once the in-memory CF tree is built, use another method to cluster the leaves together.

Branching factor B = 6; threshold L = 7.

CURE (Clustering Using REpresentatives)

- CURE: proposed by Guha, Rastogi & Shim, 1998
- http://portal.acm.org/citation.cfm?id=276312
- Stops the creation of a cluster hierarchy if a level consists of k clusters.
- Uses multiple representative points to evaluate the distance between clusters.
- Adjusts well to arbitrarily shaped clusters (not necessarily distance-based); avoids the single-link effect.

Drawbacks of Distance-Based Method

- Drawbacks of square-error-based clustering methods:
- They consider only one point as representative of a cluster.
- They are good only for clusters that are convex-shaped and of similar size and density, and only if k can be reasonably estimated.

CURE: The Algorithm

- Very much a hybrid method (involves pieces from many others):
- Draw a random sample s.
- Partition the sample into p partitions, each of size s/p.
- Partially cluster each partition into s/(pq) clusters.
- Eliminate outliers:
- by random sampling;
- if a cluster grows too slowly, eliminate it.
- Cluster the partial clusters.
- Label the data on disk.

Cure

- ABSTRACT
- Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers.
- We propose a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size.
- CURE achieves this by representing each cluster by a certain fixed number of points that are generated by selecting well-scattered points from the cluster and then shrinking them toward the center of the cluster by a specified fraction.
- Having more than one representative point per cluster allows CURE to adjust well to the geometry of non-spherical shapes, and the shrinking helps to dampen the effects of outliers.
- To handle large databases, CURE employs a combination of random sampling and partitioning. A random sample drawn from the data set is first partitioned and each partition is partially clustered. The partial clusters are then clustered in a second pass to yield the desired clusters.
- Our experimental results confirm that the quality of clusters produced by CURE is much better than those found by existing algorithms.
- Furthermore, they demonstrate that random sampling and partitioning enable CURE to not only outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.

Data Partitioning and Clustering

- s = 50
- p = 2
- s/p = 25
- s/(pq) = 5

[Figure: the sample of s = 50 points split into p = 2 partitions of 25 points each, each partially clustered into 5 clusters.]

CURE: Shrinking Representative Points

- Shrink the multiple representative points towards the gravity center by a fraction α.
- Multiple representatives capture the shape of the cluster.

Clustering Categorical Data: ROCK (http://portal.acm.org/citation.cfm?id=351745)

- ROCK: Robust Clustering using linKs, by S. Guha, R. Rastogi, K. Shim (ICDE'99)
- Agglomerative hierarchical
- Uses links to measure similarity/proximity
- Not distance-based
- Computational complexity
- Basic ideas:
- Similarity function and neighbors
- Let T1 = {1,2,3}, T2 = {3,4,5}; with the Jaccard measure, sim(T1, T2) = |T1 ∩ T2| / |T1 ∪ T2| = 1/5 = 0.2

ROCK

- Abstract
- Clustering, in data mining, is useful to discover distribution patterns in the underlying data.
- Clustering algorithms usually employ a distance-metric-based (e.g., Euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions.
- In this paper, we study clustering algorithms for data with boolean and categorical attributes.
- We show that traditional clustering algorithms that use distances between points for clustering are not appropriate for boolean and categorical attributes. Instead, we propose a novel concept of links to measure the similarity/proximity between a pair of data points.
- We develop a robust hierarchical clustering algorithm, ROCK, that employs links and not distances when merging clusters.
- Our methods naturally extend to non-metric similarity measures that are relevant in situations where a domain expert/similarity table is the only source of knowledge.
- In addition to presenting detailed complexity results for ROCK, we also conduct an experimental study with real-life as well as synthetic data sets to demonstrate the effectiveness of our techniques.
- For data with categorical attributes, our findings indicate that ROCK not only generates better quality clusters than traditional algorithms, but it also exhibits good scalability properties.

ROCK: Algorithm

- Links: the number of common neighbors of the two points.
- Algorithm:
- Draw a random sample
- Cluster with links
- Label the data on disk

Example: take the points {1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}, {1,4,5}, {2,3,4}, {2,3,5}, {2,4,5}, {3,4,5}. Then link({1,2,3}, {1,2,4}) = 3: the two points have three common neighbors (reproduced in the sketch below).
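The count of 3 can be reproduced under the usual ROCK setup; the Jaccard measure and the neighbor threshold theta = 0.5 are our assumptions (standard for this example), not stated on the slide:

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

pts = [frozenset(s) for s in
       [{1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5},
        {1,4,5}, {2,3,4}, {2,3,5}, {2,4,5}, {3,4,5}]]
theta = 0.5   # assumed neighbor threshold
nbrs = {p: {q for q in pts if q != p and jaccard(p, q) >= theta} for p in pts}

def link(p, q):
    # link = number of common neighbors of p and q
    return len(nbrs[p] & nbrs[q])

print(link(frozenset({1, 2, 3}), frozenset({1, 2, 4})))  # -> 3
```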

CHAMELEON

- CHAMELEON: hierarchical clustering using dynamic modeling, by G. Karypis, E. H. Han and V. Kumar ('99)
- http://portal.acm.org/citation.cfm?id=621303
- Measures the similarity based on a dynamic model:
- Two clusters are merged only if the interconnectivity and closeness (proximity) between the two clusters are high relative to the internal interconnectivity of the clusters and the closeness of items within the clusters.
- A two-phase algorithm:
- 1. Use a graph-partitioning algorithm to cluster the objects into a large number of relatively small sub-clusters.
- 2. Use an agglomerative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining these sub-clusters.

CHAMELEON

- ABSTRACT
- Many advanced algorithms have difficulty dealing with highly variable clusters that do not follow a preconceived model.
- By basing its selections on both interconnectivity and closeness, the Chameleon algorithm yields accurate results for these highly variable clusters.
- Existing algorithms use a static model of the clusters and do not use information about the nature of individual clusters as they are merged.
- Furthermore, one set of schemes (the CURE algorithm and related schemes) ignores the information about the aggregate interconnectivity of items in two clusters.
- Another set of schemes (the Rock algorithm, group averaging method, and related schemes) ignores information about the closeness of two clusters as defined by the similarity of the closest items across two clusters.
- By considering either interconnectivity or closeness only, these algorithms can select and merge the wrong pair of clusters.
- Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters.
- Chameleon finds the clusters in the data set by using a two-phase algorithm.
- During the first phase, Chameleon uses a graph-partitioning algorithm to cluster the data items into several relatively small subclusters.
- During the second phase, it uses an algorithm to find the genuine clusters by repeatedly combining these sub-clusters.

Overall Framework of CHAMELEON

[Figure: Data Set → Construct Sparse Graph → Partition the Graph → Merge Partitions → Final Clusters.]

Grid-Based Clustering Method

- Uses a multi-resolution grid data structure.
- Several interesting methods:
- STING (a STatistical INformation Grid approach), by Wang, Yang and Muntz (1997)
- WaveCluster, by Sheikholeslami, Chatterjee, and Zhang (VLDB'98): a multi-resolution clustering approach using the wavelet method
- CLIQUE: Agrawal, et al. (SIGMOD'98)

Vertical gridding

We can observe that almost all methods discussed so far suffer from the curse of cardinality (for very large cardinality data sets, the algorithms are too slow to finish in an average lifetime!) and/or the curse of dimensionality (points are all at the same distance). The work-arounds employed to address the curses:

- Sampling: throw out most of the points in such a way that what remains is of low enough cardinality for the algorithm to finish, and in such a way that the remaining sample contains all the information of the original data set (therein lies the problem: that is impossible to do in general).
- Gridding: agglomerate all points in a grid cell and treat them as one point (smooth the data set to this gridding level). The problem with gridding, often, is that information is lost and the data structure that holds the grid-cell information is very complex. With vertical methods (e.g., P-trees), all the information can be retained and griddings can be constructed very efficiently on demand. Horizontal data structures can't do this.
- Subspace restrictions (e.g., Principal Components, Subspace Clustering).
- Gradient-based methods (e.g., the gradient tangent vector field of a response surface reduces the calculations to the number of dimensions, not the number of combinations of dimensions).

j-hi gridding: the j hi-order bits identify a grid cell and the rest identify points in a particular cell. Thus, j-hi cells are not necessarily cubical (unless all attribute bit-widths are the same). j-lo gridding: the j lo-order bits identify points in a particular cell and the rest identify a grid cell. Thus, j-lo cells always have a nice uniform shape (cubical). (A bit-slicing sketch of both follows below.)
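Both griddings are just bit slices of each attribute value; a minimal sketch on b-bit values (the function names are ours):

```python
def j_hi(value, b, j):
    """j-hi: the j high-order bits give the cell id, the rest the point."""
    cell = value >> (b - j)                 # grid cell id
    point = value & ((1 << (b - j)) - 1)    # coordinates within the cell
    return cell, point

def j_lo(value, b, j):
    """j-lo: the j low-order bits locate the point, the rest the cell id."""
    cell = value >> j                       # grid cell id
    point = value & ((1 << j) - 1)          # coordinates within the cell
    return cell, point

# e.g. the 3-bit attribute value 0b110 under 1-hi gridding:
print(j_hi(0b110, b=3, j=1))   # (1, 2): hi-bit cell 1, offset 0b10 in it
```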

1-hi gridding of the vector space R(A1, A2, A3), in which all bit-widths are the same, 3 (so each grid cell contains 2^2 × 2^2 × 2^2 = 64 potential points). Grid cells are identified by their Peano id (Pid); internally, a point carries its grid-cell id (gci) and is id'ed by its coordinates within the cell (gcp).

[Figure: the cells indexed by the hi-bits (0 or 1) of A1, A2, A3; the cell with Pid = 001 is shown with all 64 potential points labeled gci = 001, gcp = (2 bits, 2 bits, 2 bits).]

2-hi gridding of the vector space R(A1, A2, A3), in which all bit-widths are the same, 3 (so each grid cell contains 2^1 × 2^1 × 2^1 = 8 potential points).

[Figure: cell coordinates 00-11 on each axis; the cell with Pid = 001.001 (gci = (00, 00, 11)) is shown with its 8 potential points labeled gcp = (1 bit, 1 bit, 1 bit).]

1-hi gridding of R(A1, A2, A3), with bit-widths of 3, 2, 3.

[Figure: the cell with Pid = 001 is shown with its 2^2 × 2^1 × 2^2 = 32 potential points labeled gci = 001, gcp = (2 bits, 1 bit, 2 bits); with unequal bit-widths the cells are not cubical.]

2-hi gridding of R(A1, A2, A3), with bit-widths of 3, 2, 3 (each grid cell contains 2^1 × 2^0 × 2^1 = 4 potential points).

[Figure: the cell with Pid = 3.1.3 is shown with its 4 potential points labeled gcp = (1 bit, -, 1 bit); the A2 coordinate contributes no within-cell bits.]

HOBBit disks and rings (HOBBit = Hi Order Bifurcation Bit). Consider a 4-lo grid where A1, A2, A3 have bit-widths b1+1, b2+1, b3+1. HOBBit grid centers are points of the form (exactly one per grid cell) x = (x_{1,b1}..x_{1,4}1010, x_{2,b2}..x_{2,4}1010, x_{3,b3}..x_{3,4}1010), where the x_{i,j}'s range over all binary patterns. H(x, 2^0) is the HOBBit disk about x of radius 2^0. Note: we have switched the direction of A3.

[Figure: the eight corner points of H(x, 2^0) on the A1, A2, A3 axes, obtained from x by leaving each coordinate's trailing 1010 as is or raising it to 1011, i.e., cell points gcp = (101a, 101b, 101c) for a, b, c in {0, 1}.]

H(x, 2^1): the HOBBit disk of radius 2^1 about the HOBBit grid-center point x = (x_{1,b1}..x_{1,4}1010, x_{2,b2}..x_{2,4}1010, x_{3,b3}..x_{3,4}1010).

[Figure: the corner points of H(x, 2^1) on the A1, A2, A3 axes, obtained from x by replacing each coordinate's trailing 1010 with any of 1000-1011.]

The regions of H(x, 2^1) are as follows. These regions are labeled with the dimensions in which length is increased (e.g., in the 123-region all three dimensions are increased).

[Figures: the 123-, 13-, 23-, 12-, 3-, 2-, and 1-regions of H(x, 2^1) on the A1, A2, A3 axes, together with H(x, 2^0), the 123-region of H(x, 2^0).]


- Algorithm (for computing gradients):
- Select an outlier threshold, ot (points without neighbors in their ot L∞-disk are outliers; that is, there is no gradient at these outlier points (the instantaneous rate of response change is zero)).
- Create a j-lo grid with j = ot (see the previous slides, where HOBBit disks are built out from HOBBit centers x = (x_{1,b1}..x_{1,ot+1}10..10, ..., x_{n,bn}..x_{n,ot+1}10..10), the x_{i,j}'s ranging over all binary patterns).
- Pick a point, x, in R. Build out alternating one-sided rings centered at x until a neighbor is found or the radius ot is exceeded (in which case x is declared an outlier). If a neighbor is found at a radius r_i < ot (= 2^j), ∂f/∂x_k(x) is estimated as below.
- Note: one can use L∞-HOBBit or ordinary L∞ distance.
- Note: one-sided means that each successive build-out increases alternately only in the positive direction in all dimensions, then only in the negative direction in all dimensions.
- Note: building out HOBBit disks from a HOBBit center automatically gives one-sided rings (a built-out ring is defined to be the built-out disk minus the previous built-out disk), as shown in the next few slides.
- Estimate: ∂f/∂x_k(x) ≈ (Rootcount[D(x, r_i)] − Rootcount[D(x, r_i)_k]) / Δx_k, where D(x, r_i)_k is D(x, r_{i−1}) expanded in all dimensions except k.
- Alternatively, in step 3, actually calculate the mean (or median?) of the new points encountered in D(x, r_i) (we have a P-tree mask for the set, so this is trivial) and measure the x_k-distance.
- NOTE: one might want to go one more ring out to see whether one gets the same or a similar gradient.

[Figures: gradient estimation at the HOBBit center x = (x_{1,b1}..x_{1,4}1010, x_{2,b2}..x_{2,4}1010, x_{3,b3}..x_{3,4}1010), using H(x, 2^1) and its contractions H(x, 2^1)_k; the first new point determines the estimate.]

First example point:
- Est. ∂f/∂x_1(x) = (Rootcount[D(x, r_i)] − Rootcount[D(x, r_i)_1]) / Δx_1 = (2−2)/(−1) = 0
- (Rootcount[D(x, r_i)] − Rootcount[D(x, r_i)_2]) / Δx_2 = (2−1)/(−1) = −1
- (Rootcount[D(x, r_i)] − Rootcount[D(x, r_i)_3]) / Δx_3 = (2−1)/(−1) = −1

Second example point:
- (Rootcount[D(x, r_i)] − Rootcount[D(x, r_i)_2]) / Δx_2 = (2−2)/(−1) = 0
- (Rootcount[D(x, r_i)] − Rootcount[D(x, r_i)_3]) / Δx_3 = (2−2)/(−1) = 0

Intuitively, this gradient-estimation method seems to work. Next we consider a potential accuracy improvement in which we take the medoid of all new points as the gradient (or, more accurately, as the point to which we climb in any response-surface hill-climbing technique).

Estimate the gradient arrowhead as being at the medoid of the new-point set (or, more correctly, estimate the next hill-climb step). Note: if the original points are truly part of a strong cluster, the hill climb will be excellent.

Estimate the gradient arrowhead as being at the medoid of the new-point set (or, more correctly, estimate the next hill-climb step). Note: if the original points are not truly part of a strong cluster, the weak hill climb will indicate that.

[Figures: H(x, 2^1) with the new points and the new points' centroid marked, for both cases.]

[Figure: H(x, 2^2), its contraction H(x, 2^2)_1, and the first new point.]

To evaluate how well the formula estimates the gradient, it is important to consider all cases of the new point appearing in one of these regions (if more than one point appears, the gradient components are additive, so it suffices to consider one point).

[Figure: the regions of H(x, 2).]


H(x, 2^3): notice that the HOBBit center moves more and more toward the true center as the grid size increases.

Grid based Gradients and Hill Climbing

- If we are using gridding to produce the gradient vector field of a response surface, might we always vary Δx_i in the positive direction only? How can that be done most efficiently?
- 1. j-lo gridding, building out HOBBit rings from HOBBit grid centers (see the previous slides, where this approach was used), or j-lo gridding, building out HOBBit rings from lo-value grid points (ending in j 0-bits): x = (x_{1,b1}..x_{1,j+1}0..0, ..., x_{n,bn}..x_{n,j+1}0..0).
- 2. Ordinary j-lo gridding, building out rings from lo-value ids (ending in j zero bits).
- 3. Ordinary j-lo gridding, building out rings from true centers.
- 4. Other? (There are many other possibilities, but we will first explore 2.)
- Using j-lo gridding with j = 3 and lo-value cell identifiers is shown on the next slide.
- Of course, we need not use HOBBit build-out.
- With ordinary unit-radius build-out, the results are more exact, but the calculations may be more complex.

HOBBit j-lo rings using lo-value cell ids: x = (x_{1,b1}..x_{1,j+1}0..0, ..., x_{n,bn}..x_{n,j+1}0..0)

Ordinary j-lo rings using lo-value cell ids: x = (x_{1,b1}..x_{1,j+1}0..0, ..., x_{n,bn}..x_{n,j+1}0..0)

PRing(x,3) = PDisk(x,3) AND NOT PDisk(x,2); PRing(x,2) = PDisk(x,2) AND NOT PDisk(x,1), where PD(x,i) = P_{x,b} AND .. AND P_{x,j+1} AND P_j AND .. AND P_{i+1}. (A bit-mask sketch follows below.)
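In bit-mask terms, a built-out ring is the current disk mask with the previous disk mask switched off; a minimal sketch (masks shown as hypothetical Python ints; real P-tree code would AND the trees themselves):

```python
def ring_mask(disk_outer, disk_inner):
    # PRing = PDisk(x, i) AND NOT PDisk(x, i-1)
    return disk_outer & ~disk_inner

disk1 = 0b000110      # hypothetical PDisk(x, 1) point mask
disk2 = 0b011111      # hypothetical PDisk(x, 2) point mask
print(bin(ring_mask(disk2, disk1)))   # 0b11001: points new at radius 2
```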