1
Artificial Intelligence 15-381: Unsupervised Machine Learning Methods
  • Jaime Carbonell
  • 1-November-2001
  • OUTLINE
  • What is unsupervised learning?
  • Similarity computations
  • Clustering Algorithms
  • Other kinds of unsupervised learning

2
Unsupervised Learning
  • Definition of Unsupervised Learning
  • Learning useful structure without labeled
    classes, optimization criterion, feedback signal,
    or any other information beyond the raw data

3
Unsupervised Learning
  • Examples
  • Find natural groupings of Xs (X = human languages,
    stocks, gene sequences, animal species, ...)?
  • Prelude to discovery of underlying properties
  • Summarize the news for the past month?
  • Cluster first, then report centroids.
  • Sequence extrapolation, e.g., predict cancer
    incidence next decade; predict the rise in
    antibiotic-resistant bacteria
  • Methods
  • Clustering (n-link, k-means, GAC, ...)
  • Taxonomy creation (hierarchical clustering)
  • Novelty detection ("meaningful" outliers)
  • Trend detection (extrapolation from multivariate
    partial derivatives)

4
Similarity Measures in Data Analysis
  • General Assumptions
  • Each data item is a tuple (vector)
  • Values of tuple are nominal, ordinal or numerical
  • Similarity (Distance)-1
  • Pure Numerical Tuples
  • Sim(di,dj) ?di,kdj,k
  • sim (di,dj) cos(didj)
  • and many more (slide after next)
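
A minimal sketch of the two numerical measures above, in plain Python; the
example vectors are hypothetical:

    import math

    def dot_sim(di, dj):
        # Sim(d_i, d_j) = sum_k d_i,k * d_j,k  (inner product)
        return sum(a * b for a, b in zip(di, dj))

    def cos_sim(di, dj):
        # Sim(d_i, d_j) = cos(d_i, d_j): inner product scaled by vector norms
        norm = math.sqrt(dot_sim(di, di)) * math.sqrt(dot_sim(dj, dj))
        return dot_sim(di, dj) / norm if norm else 0.0

    print(cos_sim([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))  # 1.0: same direction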

5
Similarity Measures in Data Analysis
  • For Ordinal Values
  • E.g. "small," "medium," "large," "X-large"
  • Convert to numerical values assuming a constant
    step on a normalized [0, 1] scale, where max(v) = 1,
    min(v) = 0, and the rest are interpolated
  • E.g. "small" = 0, "medium" = 0.33, etc. (see the
    sketch below)
  • Then, use numerical similarity measures
  • Or, use similarity matrix (see next slide)
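
A small sketch of the constant-step conversion, assuming the four sizes
above as the scale:

    # Map ordinal values onto [0, 1] with a constant step between neighbors.
    scale = ["small", "medium", "large", "X-large"]
    step = 1.0 / (len(scale) - 1)
    ordinal_value = {v: i * step for i, v in enumerate(scale)}
    # {'small': 0.0, 'medium': 0.333..., 'large': 0.666..., 'X-large': 1.0}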

6
Similarity Measures (cont.)
  • For Nominal Values
  • E.g. "Boston", "LA", "Pittsburgh", or "male",
    "female", or "diffuse", "globular", "spiral",
    "pinwheel"
  • Binary rule: if d_i,k = d_j,k, then sim = 1, else 0
  • Use an underlying semantic property, e.g.
    Sim(Boston, LA) ∝ dist(Boston, LA)^-1, or
    Sim(Boston, LA) ∝ (|size(Boston) - size(LA)|)^-1
  • Use a similarity matrix (next slide; a lookup
    sketch follows it)

7
Similarity Matrix
           tiny   little  small  medium  large  huge
  tiny     1.0    0.8     0.7    0.5     0.2    0.0
  little          1.0     0.9    0.7     0.3    0.1
  small                   1.0    0.7     0.3    0.2
  medium                         1.0     0.5    0.3
  large                                  1.0    0.8
  huge                                          1.0
  • Diagonal must be 1.0
  • Monotonicity property must hold
  • Triangle inequality must hold
  • Transitive property need not hold
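
A lookup sketch for the matrix above, storing only the upper triangle and
exploiting symmetry; encoding it as a Python dict is an implementation
choice, not something the slide specifies:

    # Upper triangle of the similarity matrix; lookups fall back to the
    # mirrored pair, so sim(a, b) == sim(b, a).
    UPPER = {
        ("tiny", "tiny"): 1.0, ("tiny", "little"): 0.8, ("tiny", "small"): 0.7,
        ("tiny", "medium"): 0.5, ("tiny", "large"): 0.2, ("tiny", "huge"): 0.0,
        ("little", "little"): 1.0, ("little", "small"): 0.9,
        ("little", "medium"): 0.7, ("little", "large"): 0.3, ("little", "huge"): 0.1,
        ("small", "small"): 1.0, ("small", "medium"): 0.7,
        ("small", "large"): 0.3, ("small", "huge"): 0.2,
        ("medium", "medium"): 1.0, ("medium", "large"): 0.5, ("medium", "huge"): 0.3,
        ("large", "large"): 1.0, ("large", "huge"): 0.8,
        ("huge", "huge"): 1.0,
    }

    def sim(a, b):
        return UPPER[(a, b)] if (a, b) in UPPER else UPPER[(b, a)]

    print(sim("large", "tiny"))  # 0.2, via symmetry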

8
Document Clustering Techniques
  • Similarity or Distance Measure: Alternative
    Choices
  • Cosine similarity
  • Euclidean distance
  • Kernel functions
  • Language modeling: P(y | model_x), where x and y
    are documents

9
Document Clustering Techniques
  • Kullback-Leibler distance ("relative entropy"):
    KL(p || q) = sum_w p(w) log( p(w) / q(w) ), where
    p and q are the two documents' word distributions
    (see the sketch below)
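
A runnable sketch of KL distance between two documents' word distributions;
the add-one smoothing is an assumption (KL is undefined when q(w) = 0), not
something the slide prescribes:

    import math
    from collections import Counter

    def word_dist(tokens, vocab, alpha=1.0):
        # Smoothed unigram distribution over the shared vocabulary.
        counts = Counter(tokens)
        total = len(tokens) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def kl(p, q):
        # KL(p || q) = sum_w p(w) log(p(w) / q(w)); note it is asymmetric.
        return sum(p[w] * math.log(p[w] / q[w]) for w in p)

    x = "the cat sat on the mat".split()
    y = "the dog sat on the log".split()
    vocab = set(x) | set(y)
    print(kl(word_dist(x, vocab), word_dist(y, vocab)))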

10
Incremental Clustering Methods
  • Given n data items D = {D1, D2, ..., Di, ..., Dn}
  • And given a minimal similarity threshold Smin
  • Cluster the data incrementally as follows
  • Procedure SingleLink(D)
  • Let CLUSTERS = {{D1}}
  • For i = 2 to n
  • Let Dc = Argmax_{j < i} Sim(Di, Dj)
  • If Sim(Di, Dc) > Smin, add Di to Dc's cluster
  • Else Append(CLUSTERS, {Di}) as a new cluster
    (see the sketch below)
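
A runnable sketch of SingleLink; the toy one-dimensional data and the
1/(1 + distance) similarity are hypothetical stand-ins:

    def single_link(data, sim, s_min):
        clusters = [[data[0]]]                   # CLUSTERS = {{D1}}
        for d in data[1:]:                       # for i = 2 to n
            # most similar already-clustered item, and the cluster holding it
            best_sim, best_cluster = max(
                ((sim(d, x), c) for c in clusters for x in c),
                key=lambda t: t[0])
            if best_sim > s_min:
                best_cluster.append(d)           # join Dc's cluster
            else:
                clusters.append([d])             # start a new cluster
        return clusters

    data = [1.0, 1.2, 5.0, 5.1, 9.0]
    sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
    print(single_link(data, sim, 0.5))   # [[1.0, 1.2], [5.0, 5.1], [9.0]]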

11
Incremental Clustering (cont.)
  • Procedure AverageLink(D)
  • Let CLUSTERS = {{D1}}
  • For i = 2 to n
  • Let C* = Argmax_{C in CLUSTERS} Sim(Di, centroid(C))
  • If Sim(Di, centroid(C*)) > Smin, add Di to cluster C*
  • Else Append(CLUSTERS, {Di}) as a new cluster
    (see the sketch below)
  • Observations
  • Single pass over the data, so it is easy to
    cluster new data incrementally
  • Requires an arbitrary Smin threshold
  • O(N^2) time, O(N) space
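
The same loop with the centroid test of AverageLink; the two-dimensional
vectors and dot-product similarity are again hypothetical:

    def centroid(cluster):
        # Component-wise mean of the cluster's vectors.
        n = len(cluster)
        return [sum(v[k] for v in cluster) / n for k in range(len(cluster[0]))]

    def average_link(data, sim, s_min):
        clusters = [[data[0]]]
        for d in data[1:]:
            best = max(clusters, key=lambda c: sim(d, centroid(c)))
            if sim(d, centroid(best)) > s_min:
                best.append(d)                   # join the best centroid's cluster
            else:
                clusters.append([d])
        return clusters

    data = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    print(average_link(data, dot, 0.5))  # [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0]]]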

12
Document Clustering Techniques
  • Example: group documents based on similarity
  • Similarity matrix
  • Thresholding at a similarity value of 0.9 yields
  • the complete graph C1 = {1, 4, 5}, namely Complete
    Linkage
  • the connected graph C2 = {1, 4, 5, 6}, namely
    Single Linkage
  • For clustering we need three things
  • A similarity measure for pairwise comparison
    between documents
  • A clustering criterion (Complete Link, Single
    Link, ...)
  • A clustering algorithm

13
Document Clustering Techniques
  • Clustering Criterion Alternative Linkages
  • Single-link ('nearest neighbor")
  • Complete-link
  • Average-link ("group average clustering") or
    GAC)

14
Non-hierarchical Clustering Methods
  • A Single-Pass Algorithm
  • Treat the first document as the first cluster
    (singleton cluster).
  • Compare each subsequent document to all the
    clusters processed so far.
  • Add the new document to the closest cluster if
    the intercluster similarity is above a
    predetermined similarity threshold; otherwise,
    leave the new document as a cluster of its own.
  • Repeat Steps 2 and 3 until all the documents are
    processed.
  • - O(n^2) time and O(n) space (worst-case
    complexity)

15
Non-hierarchical Methods (cont.)
  • Multi-pass K-means ("reallocation method")
  • Select K initial centroids (the "seeds")
  • Assign each document to the closest centroid,
    resulting in K clusters.
  • Recompute the centroid for each of the K
    clusters.
  • Repeat Steps 2 and 3 until the centroids have
    stabilized (see the sketch below).
  • - O(nK) time and O(K) space per pass
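
A compact sketch of the reallocation loop; seeding with the first K points
and using Euclidean distance are assumptions, not the slide's prescription:

    import math

    def kmeans(data, k, max_passes=100):
        centroids = [list(p) for p in data[:k]]   # Step 1: pick K seeds
        for _ in range(max_passes):
            clusters = [[] for _ in range(k)]
            for p in data:                        # Step 2: closest centroid
                j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
                clusters[j].append(p)
            new = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
                   for i, c in enumerate(clusters)]   # Step 3: recompute
            if new == centroids:                  # stop once stabilized
                break
            centroids = new
        return clusters, centroids

    pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
    print(kmeans(pts, 2)[1])   # centroids near (0.1, 0.05) and (5.05, 4.95)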

16
Hierarchical Agglomerative Clustering Methods
  • Generic Agglomerative Procedure (Salton '89)
  • results in nested clusters via iteration
  • Compute all pairwise document-document similarity
    coefficients
  • Place each of n documents into a class of its own
  • Merge the two most similar clusters into one
  • - replace the two clusters by the new cluster
  • - compute intercluster similarity scores w.r.t.
    the new cluster
  • Repeat the above step until only one cluster is
    left (see the sketch below)
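
A naive O(n^3) sketch of the agglomerative procedure, using single-link
(max pairwise) intercluster similarity as one concrete choice; the data and
similarity function are toys:

    def hac(data, sim):
        clusters = [[d] for d in data]            # each document in its own class
        merges = []                               # the trace encodes the hierarchy
        while len(clusters) > 1:
            # find the two most similar clusters (single-link score)
            i, j = max(((i, j) for i in range(len(clusters))
                        for j in range(i + 1, len(clusters))),
                       key=lambda ij: max(sim(a, b)
                                          for a in clusters[ij[0]]
                                          for b in clusters[ij[1]]))
            merges.append((clusters[i], clusters[j]))
            clusters[i] = clusters[i] + clusters[j]   # replace the two by one
            del clusters[j]
        return merges

    data = [1.0, 1.1, 5.0, 5.2]
    print(hac(data, lambda a, b: -abs(a - b)))    # nearest pairs merge first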

17
Hierarchical Agglomerative Clustering Methods
(cont.)
  • Heuristic Approaches to Speedy Clustering
  • Reallocation methods with k selected seeds (O(kn)
    time)
  • - k is the desired number of clusters; n is the
    number of documents
  • Buckshot: random sampling (of sqrt(kn) documents)
    plus global HAC
  • Fractionation: divide and conquer

18
Creating Taxonomies
  • Hierarchical Clustering
  • GAC trace creates a binary hierarchy
  • Incremental-link → hierarchical version
  • Cluster data with high Smin → 1st hierarchical
    level
  • Decrease Smin (stop at Smin = 0)
  • Treat cluster centroids as data tuples and
    recluster, creating the next level of the
    hierarchy; then repeat steps 2 and 3.
  • K-means → hierarchical k-means (sketched below)
  • Cluster data with large k
  • Decrease k (stop at k = 1)
  • Treat cluster centroids as data tuples and
    recluster, creating the next level of the
    hierarchy; then repeat steps 2 and 3.
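
A sketch of the hierarchical-k-means recipe; halving k at each level is an
assumed schedule, and it reuses the kmeans() sketch from slide 15:

    def hierarchical_kmeans(data, k):
        # assumes the kmeans() sketch from slide 15 is defined in scope
        levels = []
        points = list(data)
        while k > 1 and len(points) > k:
            clusters, cents = kmeans(points, k)   # cluster with the current k
            levels.append(clusters)
            points = [tuple(c) for c in cents]    # centroids become next level's data
            k //= 2                               # decrease k (stop at k = 1)
        return levels                             # bottom level first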

19
Taxonomies (cont.)
  • Postprocess Taxonomies
  • Eliminate "no-op" levels
  • Agglomerate "skinny" levels
  • Label meaningful levels manually or with centroid
    summary