Transcript and Presenter's Notes

Title: Generalized Model Selection For Unsupervised Learning in High Dimension


1
Generalized Model Selection For Unsupervised
Learning in High Dimension
  • Vaithyanathan and Dom
  • IBM Almaden Research Center
  • NIPS '99

2
Abstract
  • a Bayesian approach to model selection in
    unsupervised learning
  • proposes a unified objective function whose
    arguments include both the feature space and the
    number of clusters
  • determining the feature set (dividing the feature
    set into noise features and useful features)
  • determining the number of clusters
  • marginal likelihood with a Bayesian scheme vs.
    cross-validation (cross-validated likelihood)
  • DC (distributional clustering of terms) for
    initial feature selection

3
Model Selection in Clustering
  • Bayesian approaches [1], cross-validation
    techniques [2], MDL approaches [3]
  • Need for a unified objective function:
  • the optimal number of clusters depends on the
    feature space in which the clustering is
    performed
  • c.f. feature selection in clustering

4
Model Selection in Clustering (Contd)
  • Generalized model for clustering:
  • data D = {d1, …, dn}, feature space T with
    dimension M
  • likelihood P(D | T, Ω) maximization, where Ω
    (with parameters Θ) is the structure of the
    model (the number of clusters, the partitioning
    of the feature set into U (useful set) and N
    (noise set), and the assignment of patterns to
    clusters)
  • Bayesian approach to model selection:
  • regularization using the marginal likelihood

5
Bayesian Approach to Model Selection for
Clustering
  • Data:
  • data D = {d1, …, dn}, feature space T with
    dimension M
  • Clustering D:
  • finding Ω̂ and Θ̂ such that
    (Ω̂, Θ̂) = argmax_{Ω, Θ} P(D | Θ, Ω)
  • where Ω is the structure of the model and Θ is
    the set of all parameter vectors
  • the model structure Ω consists of the number of
    clusters, the partitioning of the feature set,
    and the assignment of patterns to clusters

6
Assumptions
  • 1. The feature sets represented by U and N are
    conditionally independent given the model, so the
    likelihood of a document factors as
    p(d | Θ, Ω) = p(dU | ΘU) · p(dN | ΘN)
  • 2. The data d1, …, dn are i.i.d.

7
  • 3. All parameter vectors are independent.
  • marginal likelihood (factorized under the
    assumptions in the sketch below):
    P(D | Ω) = ∫ P(D | Θ, Ω) π(Θ | Ω) dΘ
  • Approximations to Marginal Likelihood /
    Stochastic Complexity
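Under assumptions 1-3 the marginal likelihood factors over the noise
features and over the clusters. A sketch of the factorized form in
LaTeX, reconstructed from the independence assumptions above; here D_N
and D_{U,k} denote the noise-feature data and the useful-feature data
in cluster k, K is the number of clusters, and the notation is this
summary's, not necessarily the paper's:

P(D \mid \Omega)
  = \int \pi(\Theta_N)\, P(D_N \mid \Theta_N)\, d\Theta_N
    \times \prod_{k=1}^{K} \int \pi(\Theta_{U,k})\, P(D_{U,k} \mid \Theta_{U,k})\, d\Theta_{U,k}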

8
Document Clustering
  • Marginal likelihood (11)
  • adapting multinomial models using term counts as
    the features
  • assuming Dirichlet priors π(·), which are
    conjugate to the multinomial (closed form
    sketched below)
  • NLML (Negative Log Marginal Likelihood)
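With term counts and Dirichlet priors, each integral in the factorized
marginal likelihood has the standard Dirichlet-multinomial closed form.
A sketch in LaTeX, assuming n_t is the count of term t in the relevant
block of data and \alpha_t the Dirichlet hyperparameters; equation (11)
in the paper presumably combines such terms over the clusters and the
noise set, but its exact form is not reproduced in this transcript:

\int \prod_{t} \theta_t^{\,n_t}\; \mathrm{Dir}(\theta \mid \alpha)\, d\theta
  = \frac{\Gamma\!\left(\sum_t \alpha_t\right)}{\prod_t \Gamma(\alpha_t)}
    \cdot \frac{\prod_t \Gamma(n_t + \alpha_t)}{\Gamma\!\left(\sum_t (n_t + \alpha_t)\right)}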
9
Document Clustering (cont)
  • Cross-Validated likelihood
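The formula did not survive the transcript; in its standard form the
cross-validated likelihood is the held-out log-likelihood averaged over
V folds, with parameters \hat{\Theta}_v estimated on everything except
the held-out fold D_v. A sketch in LaTeX, not necessarily the paper's
exact estimator:

\mathrm{CV}(\Omega) = \frac{1}{V} \sum_{v=1}^{V} \log P\!\left(D_v \mid \hat{\Theta}_v, \Omega\right)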

10
Distributional clustering for feature subset
selection
  • heuristic method to obtain a subset of tokens
    that are topical and can be used as features in
    the bag-of-words model to cluster documents
  • reduces the feature-set size from M to C
  • by clustering words based on their distributions
    over the documents
  • a histogram for each token (see the sketch after
    this list):
  • the first bin: documents with zero occurrences
    of the token
  • the second bin: documents with exactly one
    occurrence of the token
  • the third bin: documents that contain two or
    more occurrences of the token
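A minimal Python sketch of the three-bin histogram construction,
assuming a dense document-term count matrix; the function and variable
names are illustrative, not from the paper:

import numpy as np

def token_histograms(counts):
    """Three-bin occurrence histogram for each token.

    counts: (n_docs, n_terms) array of term counts per document.
    Returns an (n_terms, 3) array of probabilities: the fraction of
    documents with zero, one, and two-or-more occurrences of each token.
    """
    counts = np.asarray(counts)
    n_docs = counts.shape[0]
    zero = (counts == 0).sum(axis=0)
    one = (counts == 1).sum(axis=0)
    two_plus = (counts >= 2).sum(axis=0)
    hist = np.stack([zero, one, two_plus], axis=1).astype(float)
    return hist / n_docs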

11
DC for feature subset selection (Contd)
  • measure of similarity of the histograms:
  • relative entropy, or the K-L distance D(p1 || p2)
  • e.g. for two terms with probabilities p1(·), p2(·):
    D(p1 || p2) = Σ_x p1(x) log( p1(x) / p2(x) )
  • k-means DC: k-means over the token histograms
    with the K-L distance (see the sketch below)
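A minimal sketch of k-means-style distributional clustering with the
K-L distance, under the assumption that centroids are the mean of
their members' histograms; the transcript does not spell out the
update rule, so this is illustrative rather than the paper's exact
procedure:

import numpy as np

def kl(p, q, eps=1e-12):
    """K-L distance D(p || q) between two histograms."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def kmeans_dc(hists, k, n_iter=20, seed=0):
    """Cluster token histograms (n_terms, n_bins) into k clusters."""
    hists = np.asarray(hists, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = hists[rng.choice(len(hists), k, replace=False)]
    for _ in range(n_iter):
        # assign each token to the centroid with the smallest K-L distance
        labels = np.array([
            np.argmin([kl(h, c) for c in centroids]) for h in hists
        ])
        # recompute each centroid as the mean histogram of its cluster
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = hists[labels == j].mean(axis=0)
    return labels, centroids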

12
Experimental Setup
  • AP Reuters Newswire articles from TREC-6
  • 8235 documents from the routing track, 25
    classes; documents with multiple classes were
    disregarded
  • 32450 unique terms (after discarding terms that
    appeared in fewer than 3 documents)
  • Evaluation measure of clustering:
  • MI (mutual information between cluster
    assignments and the true class labels; see the
    sketch below)
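A sketch of the MI computation from a cluster-by-class contingency
table, assuming MI here means the mutual information between cluster
assignments and true class labels:

import numpy as np

def mutual_information(labels, classes):
    """I(cluster; class) in nats from two integer label arrays."""
    labels = np.asarray(labels)
    classes = np.asarray(classes)
    # joint distribution over (cluster, class) pairs
    joint = np.zeros((labels.max() + 1, classes.max() + 1))
    for k, c in zip(labels, classes):
        joint[k, c] += 1
    joint /= joint.sum()
    pk = joint.sum(axis=1, keepdims=True)  # cluster marginal
    pc = joint.sum(axis=0, keepdims=True)  # class marginal
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pk @ pc)[nz])))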

13
Results of Distributional Clustering
  • cluster the 32450 tokens into 3, 4, and 5
    clusters
  • eliminating function words

[Figure 1: centroid of a typical high-frequency function-word cluster]
14
Finding the Optimum Features and Document
Clusters for a Fixed Number of Clusters
  • Now apply the objective function (11) to the
    feature subsets selected by DC
  • EM / CEM (Classification EM, a hard-assignment
    version of EM) [1]; see the sketch below
  • initialization: the k-means algorithm
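A minimal sketch of CEM for a multinomial mixture over term counts,
with uniform Dirichlet-style smoothing; the classification step
replaces EM's soft posteriors with hard assignments. This is
illustrative under stated assumptions, not the paper's exact
procedure (the paper initializes with k-means; random initialization
is used here for brevity):

import numpy as np

def cem_multinomial(counts, k, n_iter=20, alpha=1.0, seed=0):
    """Classification EM for a k-component multinomial mixture.

    counts: (n_docs, n_terms) matrix of term counts per document.
    Returns hard cluster assignments (n_docs,) for the documents.
    """
    counts = np.asarray(counts)
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=counts.shape[0])
    for _ in range(n_iter):
        # M-step: per-cluster term probabilities with smoothing alpha
        theta = np.stack([
            counts[labels == j].sum(axis=0) + alpha for j in range(k)
        ])
        theta = theta / theta.sum(axis=1, keepdims=True)
        pi = np.array([(labels == j).mean() for j in range(k)])
        # C-step (hard E-step): assign each document to its
        # maximum-posterior cluster
        log_post = counts @ np.log(theta).T + np.log(pi + 1e-12)
        labels = log_post.argmax(axis=1)
    return labels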

15
(No Transcript)
16
  • Comparison of feature-selection heuristics:
  • FBTop20: removal of the top 20 most frequent
    terms
  • FBTop40: removal of the top 40 most frequent
    terms
  • FBTop40Bot10: removal of the top 40 most
    frequent terms and removal of all tokens that do
    not appear in at least 10 documents
  • NF: no feature selection
  • CSW: common stop words removed