Hierarchical Topic Models and the Nested Chinese Restaurant Process

1
Hierarchical Topic Models and the Nested Chinese
Restaurant Process
  • Blei, Griffiths, Jordan, Tenenbaum
  • presented by Rodrigo de Salvo Braz

2
Document classification
  • One-class approach: one topic per document, with
    words generated according to the topic.
  • For example, a Naive Bayes model.

3
Document classification
  • It is more realistic to assume more than one
    topic per document.
  • Generative model: pick a mixture distribution
    over K topics and generate words from it.

4
Document classification
  • Even more realistic: topics may be organized in a
    hierarchy (not independent).
  • Pick a path from root to leaf in a tree; each
    node is a topic. Sample words from the mixture
    of topics along the path.

5
Dirichlet distribution (DD)
  • Distribution over distribution vectors of
    dimension K: $P(p \mid u, ?) = \frac{1}{Z(u)} \prod_i p_i^{u_i}$
  • The parameters u act as a prior distribution
    (previous observations).
  • The symmetric Dirichlet distribution assumes a
    uniform prior distribution ($u_i = u_j$ for any i, j).
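
To make the Dirichlet bullets concrete, here is a minimal sketch that draws probability vectors from a Dirichlet distribution by normalizing independent Gamma draws. It assumes the standard parameterization, in which the density is proportional to $\prod_i p_i^{u_i - 1}$; that may differ from the notation on the slide. The helper name `sample_dirichlet` and the parameter values are illustrative choices, not taken from the presentation.

```python
import random


def sample_dirichlet(u, rng):
    """Draw one probability vector from Dirichlet(u).

    Uses the standard construction: draw g_i ~ Gamma(u_i, 1) independently
    and normalize so the components sum to 1.
    """
    draws = [rng.gammavariate(u_i, 1.0) for u_i in u]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
K = 4

# Symmetric case: u_i = u_j for all i, j (no component is favored a priori).
symmetric = sample_dirichlet([1.0] * K, rng)

# Asymmetric case (illustrative values): most prior mass on component 0.
skewed = sample_dirichlet([10.0, 1.0, 1.0, 1.0], rng)

print("symmetric:", symmetric)
print("skewed:   ", skewed)
```

Each draw is itself a distribution over K outcomes, which is what lets the Dirichlet serve as a prior over topic mixtures.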

6
Latent Dirichlet Allocation (LDA)
  • Generative model of multiple-topic documents
  • Generate a mixture distribution on topics using a
    Dirichlet distribution
  • For each word, pick a topic according to this
    distribution and generate the word according to
    the word distribution for the topic.
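
The generative steps above can be sketched as follows. This is a toy illustration, not the authors' code: the two topics, the four-word vocabulary, and the names `generate_lda_document`, `topic_word`, and `alpha` are all assumptions made for the example.

```python
import random


def sample_dirichlet(params, rng):
    """Draw a probability vector from Dirichlet(params) via normalized Gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def generate_lda_document(topic_word, alpha, n_words, rng):
    """Generate one document under the LDA generative story.

    topic_word[k][v] is the probability of word v under topic k.
    Returns the document's topic mixture and a list of (topic, word) pairs.
    """
    K = len(topic_word)
    # Step 1: draw the document's mixture over topics from a symmetric Dirichlet.
    theta = sample_dirichlet([alpha] * K, rng)
    doc = []
    for _ in range(n_words):
        # Step 2: pick a topic for this word according to the mixture.
        z = rng.choices(range(K), weights=theta)[0]
        # Step 3: pick the word according to that topic's word distribution.
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        doc.append((z, w))
    return theta, doc


# Hypothetical toy topics over a 4-word vocabulary.
topic_word = [
    [0.7, 0.2, 0.05, 0.05],  # topic 0 favors words 0 and 1
    [0.05, 0.05, 0.2, 0.7],  # topic 1 favors words 2 and 3
]
rng = random.Random(1)
theta, doc = generate_lda_document(topic_word, alpha=1.0, n_words=20, rng=rng)
print("topic mixture:", theta)
print("words:", [w for _, w in doc])
```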

7
Latent Dirichlet Allocation (LDA)
[Figure: LDA graphical model. Labels: DD hyperparameter, topics, K, topic distribution, words w, W.]
8
Chinese Restaurant Process (CRP)
1 out of 9 customers
9
Chinese Restaurant Process (CRP)
2 out of 9 customers
10
Chinese Restaurant Process (CRP)
3 out of 9 customers
11
Chinese Restaurant Process (CRP)
4 out of 9 customers
12
Chinese Restaurant Process (CRP)
5 out of 9 customers
13
Chinese Restaurant Process (CRP)
6 out of 9 customers
14
Chinese Restaurant Process (CRP)
7 out of 9 customers
15
Chinese Restaurant Process (CRP)
8 out of 9 customers
16
Chinese Restaurant Process (CRP)
9 out of 9 customers
Data point (a distribution itself) sampled
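
The seating sequence shown on the preceding slides can be simulated with a short sketch. It assumes the usual CRP rule: a new customer joins an existing table with probability proportional to the number of customers already seated there, or opens a new table with probability proportional to a concentration parameter (called `gamma` here; the name and its value are assumptions for the example).

```python
import random


def chinese_restaurant_process(n_customers, gamma, rng):
    """Seat customers one at a time under the CRP.

    Returns (assignments, table_counts), where assignments[i] is the table
    index of customer i and table_counts[k] is the occupancy of table k.
    Tables are numbered in order of creation.
    """
    table_counts = []
    assignments = []
    for _ in range(n_customers):
        # Existing tables are weighted by occupancy; the last weight is for
        # opening a new table.
        weights = table_counts + [gamma]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_counts):
            table_counts.append(1)  # open a new table
        else:
            table_counts[table] += 1
        assignments.append(table)
    return assignments, table_counts


rng = random.Random(0)
assignments, table_counts = chinese_restaurant_process(9, gamma=1.0, rng=rng)
print("table of each customer:", assignments)
print("customers per table:   ", table_counts)
```

The number of tables is not fixed in advance, which is why a CRP prior can accommodate a growing number of mixture components.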
17
Species Sampling Mixture
  • Generative model of multiple-topic documents
  • Generate a mixture distribution on topics using a
    CRP prior
  • For each word, pick a topic according to this
    distribution and generate the word according to
    the word distribution for the topic.

18
Species Sampling Mixture
[Figure: species sampling mixture graphical model. Labels: CRP hyperparameter, topics, K, topic distribution, words w, W.]
19
Nested CRP
[Figure: nested CRP over six customers, numbered 1 to 6.]
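
A minimal sketch of the nested CRP, assuming the usual construction: every node of a tree runs its own CRP over its children, and each customer walks from the root down to a fixed depth, choosing a child at every level. The `Node` class, the `sample_path` helper, the depth of 3, and `gamma` are illustrative assumptions.

```python
import random


class Node:
    """A tree node; `count` is how many customers have passed through it."""

    def __init__(self, label=()):
        self.label = label  # tuple of child indices from the root
        self.count = 0
        self.children = []


def sample_path(root, depth, gamma, rng):
    """Walk one customer from the root down `depth` levels under the nested CRP."""
    root.count += 1
    path = [root]
    node = root
    for _ in range(depth - 1):
        # CRP over this node's children: existing children weighted by their
        # counts, plus weight gamma for creating a new child.
        weights = [child.count for child in node.children] + [gamma]
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        if idx == len(node.children):
            node.children.append(Node(node.label + (idx,)))
        node = node.children[idx]
        node.count += 1
        path.append(node)
    return path


rng = random.Random(0)
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(6)]
for customer, path in enumerate(paths, start=1):
    print(customer, [node.label for node in path])
```

Customers who share a prefix of their paths share the corresponding nodes, which is how the tree structure emerges.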
20
Hierarchical LDA (hLDA)
  • Generative model of multiple-topic documents
  • Generate a mixture distribution on topics using a
    Nested CRP prior
  • For each word, pick a topic according to this
    distribution and generate the word according to
    the word distribution for the topic.
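
Putting the pieces together, the hLDA generative story can be sketched as below. This is a toy sketch under stated assumptions, not the authors' implementation: each document draws a path through the tree with a nested CRP, draws a mixture over the levels of that path from a Dirichlet, and generates each word from the topic at the chosen level. The names `TopicNode`, `generate_hlda_document`, `alpha`, `gamma`, and `eta`, the depth of 3, and all parameter values are assumptions. The corpus size follows the artificial data experiment (100 documents of 1000 words on a 25-term vocabulary).

```python
import random


def sample_dirichlet(params, rng):
    """Draw a probability vector from Dirichlet(params) via normalized Gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


class TopicNode:
    """A tree node holding a topic (a distribution over the vocabulary)."""

    def __init__(self, vocab_size, eta, rng):
        self.count = 0  # documents whose path passes through this node
        self.children = []
        self.word_dist = sample_dirichlet([eta] * vocab_size, rng)


def generate_hlda_document(root, depth, n_words, vocab_size, alpha, gamma, eta, rng):
    """Generate one document: nested-CRP path, level mixture, then words."""
    # 1. Choose a root-to-leaf path with a CRP at every node.
    root.count += 1
    path = [root]
    for _ in range(depth - 1):
        node = path[-1]
        weights = [c.count for c in node.children] + [gamma]
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        if idx == len(node.children):
            node.children.append(TopicNode(vocab_size, eta, rng))
        child = node.children[idx]
        child.count += 1
        path.append(child)
    # 2. Draw this document's mixture over the levels of the path.
    theta = sample_dirichlet([alpha] * depth, rng)
    # 3. For each word, pick a level, then a word from that level's topic.
    words = []
    for _ in range(n_words):
        level = rng.choices(range(depth), weights=theta)[0]
        words.append(rng.choices(range(vocab_size), weights=path[level].word_dist)[0])
    return path, words


rng = random.Random(0)
VOCAB, DEPTH = 25, 3
root = TopicNode(VOCAB, eta=0.5, rng=rng)
corpus = [
    generate_hlda_document(root, DEPTH, 1000, VOCAB, alpha=1.0, gamma=1.0, eta=0.5, rng=rng)[1]
    for _ in range(100)
]
print("documents:", len(corpus), "children of root:", len(root.children))
```

Every document passes through the root, so the root topic is shared by the whole collection, while deeper nodes are shared by progressively smaller groups of documents.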

21
hLDA graphical model
22
Artificial data experiment
100 documents of 1000 words each, on a 25-term
vocabulary. Each vertical bar is a topic.
23
CRP prior vs. Bayes Factors
24
Predicting the structure
25
NIPS abstracts
26
Comments
  • Accommodates growing collections of data
  • Hierarchical organization makes sense, but it is
    not clear to me why the CRP prior is the best
    prior for that.
  • No mention of running time; maybe it takes a very
    long time.