1
Latent Dirichlet Allocation. D. Blei, A. Ng, and
M. Jordan. Journal of Machine Learning Research,
3:993-1022, January 2003.
  • Jonathan Huang (jch1@cs.cmu.edu)
  • Advisor: Carlos Guestrin
  • 11/15/2005

2
Bag of Words Models
  • Let's assume that all the words within a document
    are exchangeable, i.e., only their counts matter,
    not their order (see the sketch below).
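A minimal sketch (not from the slides) of what the bag-of-words assumption buys us: a document reduces to word counts, so any reordering of the words gives the same representation. The toy whitespace tokenization below is an assumption for illustration.

    from collections import Counter

    def bag_of_words(text):
        """Reduce a document to unordered word counts (toy whitespace tokenization)."""
        return Counter(text.lower().split())

    # Two different word orders, one identical bag of words:
    print(bag_of_words("the cat sat on the mat"))
    print(bag_of_words("on the mat the cat sat"))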

3
Mixture of Unigrams
(Figure: graphical model in which a single topic node Z_i
generates the word nodes w_i1 ... w_i4.)
Mixture of Unigrams Model (this is just Naïve Bayes)
  • For each of M documents,
  • Choose a topic z.
  • Choose N words by drawing each one independently
    from a multinomial conditioned on z.
  • In the Mixture of Unigrams model, we can only
    have one topic per document! (A generative sketch
    follows below.)
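A minimal generative sketch of this process, assuming made-up topic proportions and per-topic word distributions (the vocabulary, probabilities, and seed below are illustrative, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["game", "team", "space", "nasa", "windows", "disk"]
    topic_probs = np.array([0.5, 0.3, 0.2])              # p(z): assumed mixing weights
    word_probs = rng.dirichlet(np.ones(len(vocab)), 3)   # p(w|z): one row per topic, assumed

    def sample_document(n_words=6):
        z = rng.choice(3, p=topic_probs)                 # a single topic for the whole document
        words = rng.choice(vocab, size=n_words, p=word_probs[z])
        return z, list(words)

    print(sample_document())

Note that z is drawn once per document, which is exactly the one-topic-per-document restriction mentioned above.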

4
The pLSI Model
  • For each word of document d in the training set,
  • Choose a topic z according to a multinomial
    conditioned on the index d.
  • Generate the word by drawing from a multinomial
    conditioned on z.
  • In pLSI, documents can have multiple topics (see
    the sketch below).
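A hedged sketch of the pLSI generative step for one training document d; the per-document mixing weights p(z|d) below are made-up numbers standing in for learned parameters:

    import numpy as np

    rng = np.random.default_rng(1)
    vocab = ["game", "team", "space", "nasa", "windows", "disk"]
    word_probs = rng.dirichlet(np.ones(len(vocab)), 3)   # p(w|z): assumed

    # In pLSI the topic weights are indexed by the training-document id d,
    # so there is one learned row per training document.
    topic_given_d = {0: np.array([0.7, 0.2, 0.1]),
                     1: np.array([0.1, 0.1, 0.8])}

    def sample_word(d):
        z = rng.choice(3, p=topic_given_d[d])            # topic chosen per word
        return rng.choice(vocab, p=word_probs[z])

    print([sample_word(0) for _ in range(6)])

Because topic_given_d is keyed by training-document index, there is no entry for a previously unseen document, which is the issue raised on the next slide.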

(Figure: pLSI graphical model with the observed document index d
generating topic nodes z_d1 ... z_d4, each of which generates a
word w_d1 ... w_d4.)
Probabilistic Latent Semantic Indexing (pLSI) Model
5
Motivations for LDA
  • In pLSI, the observed variable d is an index into
    some training set. There is no natural way for
    the model to handle previously unseen documents.
  • The number of parameters for pLSI grows linearly
    with M (the number of documents in the training
    set).
  • We would like to be Bayesian about our topic
    mixture proportions.

6
Dirichlet Distributions
  • In the LDA model, we would like to say that the
    topic mixture proportions for each document are
    drawn from some distribution.
  • So, we want to put a distribution on
    multinomials, that is, on k-tuples of non-negative
    numbers that sum to one.
  • The space of all of these multinomials has a
    nice geometric interpretation as a (k-1)-simplex,
    which is just a generalization of a triangle to
    (k-1) dimensions.
  • Criteria for selecting our prior:
  • It needs to be defined over a (k-1)-simplex.
  • Algebraically speaking, we would like it to play
    nicely with the multinomial distribution (see the
    sampling sketch below).
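A small sampling sketch (with an arbitrary symmetric parameter) showing that Dirichlet draws live on the (k-1)-simplex, i.e., each sample is itself a multinomial:

    import numpy as np

    rng = np.random.default_rng(2)
    alpha = np.array([2.0, 2.0, 2.0])        # assumed parameter, k = 3
    samples = rng.dirichlet(alpha, size=5)   # each row is non-negative and sums to 1

    print(samples)
    print(samples.sum(axis=1))               # all ones, up to floating point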

7
Dirichlet Examples
8
Dirichlet Distributions
  • Useful Facts
  • This distribution is defined over a
    (k-1)-simplex. That is, it takes k non-negative
    arguments which sum to one. Consequently it is a
    natural distribution to use over multinomial
    distributions.
  • In fact, the Dirichlet distribution is the
    conjugate prior to the multinomial distribution.
    (This means that if our likelihood is multinomial
    with a Dirichlet prior, then the posterior is
    also Dirichlet!)
  • The Dirichlet parameter α_i can be thought of as a
    prior count of the ith class (see the update sketch
    below).
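A minimal sketch of the conjugacy fact: with a Dirichlet(α) prior and multinomially distributed counts, the posterior is again Dirichlet, with the observed counts simply added to α (the numbers below are illustrative):

    import numpy as np

    alpha_prior = np.array([1.0, 1.0, 1.0])   # assumed prior pseudo-counts for 3 classes
    observed_counts = np.array([5, 0, 2])      # observed multinomial counts

    # Dirichlet prior + multinomial likelihood -> Dirichlet posterior
    alpha_posterior = alpha_prior + observed_counts
    print(alpha_posterior)                     # [6. 1. 3.]

This is also why each α_i reads naturally as a prior count for class i.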

9
The LDA Model
(Figure: unrolled LDA graphical model, with a topic-proportion node θ
per document, per-document topic nodes z_1 ... z_4, corresponding word
nodes w_1 ... w_4, and shared parameters α and β.)
  • For each document,
  • Choose θ ~ Dirichlet(α)
  • For each of the N words w_n:
  • Choose a topic z_n ~ Multinomial(θ)
  • Choose a word w_n from p(w_n | z_n, β), a multinomial
    probability conditioned on the topic z_n.

10
The LDA Model
  • For each document,
  • Choose θ ~ Dirichlet(α)
  • For each of the N words w_n:
  • Choose a topic z_n ~ Multinomial(θ)
  • Choose a word w_n from p(w_n | z_n, β), a multinomial
    probability conditioned on the topic z_n
    (see the generative sketch below).
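A minimal generative sketch of this process with made-up dimensions (3 topics, a 6-word vocabulary); the α and β values are illustrative and not taken from the paper:

    import numpy as np

    rng = np.random.default_rng(3)
    vocab = ["game", "team", "space", "nasa", "windows", "disk"]
    k = 3                                          # number of topics (assumed)
    alpha = np.full(k, 0.5)                        # Dirichlet parameter (assumed)
    beta = rng.dirichlet(np.ones(len(vocab)), k)   # p(w | z): one row per topic (assumed)

    def generate_document(n_words=8):
        theta = rng.dirichlet(alpha)               # per-document topic proportions
        words = []
        for _ in range(n_words):
            z = rng.choice(k, p=theta)             # topic chosen per word
            words.append(rng.choice(vocab, p=beta[z]))
        return words

    print(generate_document())

Unlike the mixture of unigrams, each word gets its own topic draw, so a single document can mix topics.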

11
Inference
  • The inference problem in LDA is to compute the
    posterior of the hidden variables given a
    document and corpus parameters α and β. That is,
    compute p(θ, z | w, α, β).
  • Unfortunately, exact inference is intractable, so
    we turn to alternatives.

12
Variational Inference
  • In variational inference, we consider a
    simplified graphical model with variational
    parameters γ and φ, and minimize the KL divergence
    between the variational and posterior
    distributions (a per-document update sketch follows below).
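A hedged sketch of the paper's per-document coordinate-ascent updates, φ_ni ∝ β_{i,w_n} exp(Ψ(γ_i)) and γ_i = α_i + Σ_n φ_ni, written for one document with the inputs left to the caller:

    import numpy as np
    from scipy.special import digamma

    def variational_inference(doc_words, alpha, beta, n_iters=50):
        """doc_words: word indices of one document; alpha: length-k; beta: k x V."""
        k, n = len(alpha), len(doc_words)
        phi = np.full((n, k), 1.0 / k)           # q(z_n): per-word topic responsibilities
        gamma = alpha + n / k                    # q(theta): variational Dirichlet parameter
        for _ in range(n_iters):
            # phi_{ni} proportional to beta_{i, w_n} * exp(digamma(gamma_i))
            phi = beta[:, doc_words].T * np.exp(digamma(gamma))
            phi /= phi.sum(axis=1, keepdims=True)
            # gamma_i = alpha_i + sum_n phi_{ni}
            gamma = alpha + phi.sum(axis=0)
        return gamma, phi

A fixed iteration count stands in here for a proper convergence check on the variational bound.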

13
Parameter Estimation
  • Given a corpus of documents, we would like to
    find the parameters α and β which maximize the
    likelihood of the observed data.
  • Strategy (Variational EM):
  • Lower bound log p(w | α, β) by a function L(γ, φ; α, β)
  • Repeat until convergence:
  • (E-step) Maximize L(γ, φ; α, β) with respect to the
    variational parameters γ, φ.
  • (M-step) Maximize the bound with respect to the model
    parameters α and β (a loop sketch follows below).
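A hedged outline of that loop, reusing the variational_inference sketch above. The β update follows the paper's M-step (β_ij ∝ Σ_d Σ_n φ_dni [w_dn = j]); α is held fixed here for brevity, whereas the paper updates it by Newton-Raphson:

    import numpy as np

    def variational_em(docs, k, vocab_size, n_em_iters=40, alpha0=0.1):
        """docs: list of word-index arrays; returns (alpha, beta)."""
        rng = np.random.default_rng(4)
        alpha = np.full(k, alpha0)                       # held fixed in this sketch
        beta = rng.dirichlet(np.ones(vocab_size), k)     # random initial topics
        for _ in range(n_em_iters):
            expected_counts = np.zeros((k, vocab_size))
            for doc in docs:
                # E-step: fit per-document variational parameters
                gamma, phi = variational_inference(doc, alpha, beta)
                # accumulate expected topic-word counts
                np.add.at(expected_counts, (slice(None), doc), phi.T)
            # M-step: renormalize each topic's word distribution
            beta = expected_counts + 1e-12
            beta /= beta.sum(axis=1, keepdims=True)
        return alpha, beta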

14
Some Results
  • Given a topic, LDA can return the most probable
    words.
  • For the following results, LDA was trained on
    10,000 text articles posted to 20 online
    newsgroups with 40 iterations of EM. The number
    of topics was set to 50 (a rough library-based
    reproduction sketch follows below).
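A hedged sketch of a comparable setup using scikit-learn's LatentDirichletAllocation (a different variational implementation than the paper's; the preprocessing choices below are assumptions rather than the presenter's):

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    newsgroups = fetch_20newsgroups(remove=("headers", "footers", "quotes"))
    vectorizer = CountVectorizer(max_features=10000, stop_words="english")
    counts = vectorizer.fit_transform(newsgroups.data)

    lda = LatentDirichletAllocation(n_components=50, max_iter=40,
                                    learning_method="batch", random_state=0)
    lda.fit(counts)

    # Most probable words per topic, analogous to the table on the next slide
    vocab = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_[:5]):
        top = topic.argsort()[::-1][:8]
        print(f"topic {i}:", " ".join(vocab[j] for j in top))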

15
Some Results
politics    sports  space     computers  christianity
----------  ------  --------  ---------  ------------
Political   Team    Space     Drive      God
Party       Game    NASA      Windows    Jesus
Business    Play    Research  Card       His
Convention  Year    Center    DOS        Bible
Institute   Games   Earth     SCSI       Christian
Committee   Win     Health    Disk       Christ
States      Hockey  Medical   System     Him
Rights      Season  Gov       Memory     Christians
16
Extensions/Applications
  • Multimodal Dirichlet Priors
  • Correlated Topic Models
  • Hierarchical Dirichlet Processes
  • Abstract Tagging in Scientific Journals
  • Object Detection/Recognition

17
Visual Words
  • Idea: Given a collection of images,
  • Think of each image as a document.
  • Think of feature patches of each image as words.
  • Apply the LDA model to extract topics (see the
    sketch below).
  • (J. Sivic, B. C. Russell, A. A. Efros, A.
    Zisserman, W. T. Freeman. Discovering object
    categories in image collections. MIT AI Lab Memo
    AIM-2005-005, February 2005.)
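A hedged sketch of the "visual words" preprocessing: local feature descriptors are quantized with k-means so that each image becomes a bag of cluster ids, i.e., a document over a visual vocabulary. The random descriptors below are placeholders for real patch descriptors (e.g., SIFT):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(5)
    # Placeholder: in practice these would be descriptors extracted from image patches.
    descriptors_per_image = [rng.normal(size=(200, 128)) for _ in range(10)]

    # Build the visual vocabulary by clustering all descriptors.
    kmeans = KMeans(n_clusters=100, n_init=5, random_state=0)
    kmeans.fit(np.vstack(descriptors_per_image))

    # Each image becomes a histogram over visual words -- a "document" for LDA.
    def visual_word_histogram(descriptors):
        words = kmeans.predict(descriptors)
        return np.bincount(words, minlength=kmeans.n_clusters)

    doc_term = np.vstack([visual_word_histogram(d) for d in descriptors_per_image])
    print(doc_term.shape)   # (10 images, 100 visual words)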

18
Visual Words
  • Examples of visual words

19
Visual Words
20
Thanks!
  • Questions?
  • References
  • Latent Dirichlet Allocation. D. Blei, A. Ng, and
    M. Jordan. Journal of Machine Learning Research,
    3:993-1022, January 2003.
  • Finding Scientific Topics. T. Griffiths and
    M. Steyvers. Proceedings of the National Academy
    of Sciences, 101 (suppl. 1):5228-5235, 2004.
  • Hierarchical Topic Models and the Nested Chinese
    Restaurant Process. D. Blei, T. Griffiths, M.
    Jordan, and J. Tenenbaum. In S. Thrun, L. Saul,
    and B. Scholkopf, editors, Advances in Neural
    Information Processing Systems (NIPS) 16,
    Cambridge, MA, 2004. MIT Press.
  • Discovering Object Categories in Image
    Collections. J. Sivic, B. C. Russell, A. A.
    Efros, A. Zisserman, and W. T. Freeman. MIT AI Lab
    Memo AIM-2005-005, February 2005.