Latent Dirichlet allocation

1
Latent Dirichlet allocation
  • Conglei Yao
  • 2008-12-19

2
Outline
  • Brief introduction
  • LDA parameter estimation

4
The first LDA paper
  • Latent Dirichlet allocation
  • Blei, D. M., Ng, A. Y., and Jordan, M. I.
  • Journal of Machine Learning Research, 2003
  • [Figure]

5
Why LDA was proposed
  • Other latent variable models
    • Models
      • Unigram
      • Mixture of unigrams
      • PLSI
    • Drawbacks
      • Cannot model documents that mix multiple topics
      • Cannot predict on new data
      • Too many parameters to estimate, making estimation intractable

7
Usage of LDA
  • Topic-word-document distribution
  • A result on Wikipedia

Topic 0: medical health medicine care practice patient training treatment patients
Topic 1: memory intel processor instruction processors cpu performance instructions
...
Topic 199: distribution probability test random sample variables statistical variable data error
8
Usage of LDA (cont.)
  • The author-topic model for authors and documents
    (UAI, 2004)

9
Usage of LDA (cont.)
  • Learning to Classify Short and Sparse Text & Web
    with Hidden Topics from Large-scale Data
    Collections (WWW'08)

10
Usage of LDA (cont.)
  • A Latent Dirichlet Model for Unsupervised Entity
    Resolution (SIAM'06)

11
Usage of LDA (cont.)
  • LDA-Based Document Models for Ad-hoc Retrieval
    (SIGIR'06)
  • Topic Based Language Models for Ad Hoc
    Information Retrieval (Neural Networks, 2004)
12
Usage of LDA (cont.)
  • Latent Dirichlet Allocation in Web Spam Filtering
    (AIRWeb'08)
  • Probabilistic Models for Discovering
    E-Communities (WWW'06)
  • A Mixture Model for Contextual Text Mining
    (SIGKDD'06)
  • Latent Friend Mining from Blog Data (SIGKDD'06)

13
Usage of LDA (cont.)
  • Finding Scientific Topics (PNAS, 2004)
    • Gibbs sampling method to estimate parameters
    • Automatically determines the number of topics
    • Application to PNAS data

14
Outline
  • Brief introduction
  • LDA parameter estimation

15
Beta distribution
16
Beta distribution (cont.)
17
Dirichlet distribution
  • Generalizes the Beta distribution from 2 to K
    dimensions
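
As a quick check of this generalization, the sketch below evaluates the Dirichlet log-density for K = 2 and compares it with the Beta log-density at the same point. The function names dirichlet_logpdf and beta_logpdf and the example values are illustrative assumptions, not taken from the slides.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a point x on the probability simplex."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    # Log of the normalizing constant: Gamma(sum(alpha)) / prod(Gamma(alpha_k)).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xk) for xk, a in zip(x, alpha))


def beta_logpdf(p, a, b):
    """Log-density of Beta(a, b) at p in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p)


# Hypothetical example values: Beta(2, 5) evaluated at p = 0.3.
p, a, b = 0.3, 2.0, 5.0
beta_val = beta_logpdf(p, a, b)
# The same point written as a 2-dimensional probability vector.
dirichlet_val = dirichlet_logpdf([p, 1.0 - p], [a, b])
```

The two values agree up to floating-point rounding, because a Dirichlet over the vector (p, 1 - p) with parameters (a, b) is the Beta(a, b) distribution over p.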

18
Conjugate prior distributions
If the likelihood P(X | theta) is a multinomial
distribution with parameter vector theta, then
the conjugate prior for theta is the Dirichlet
distribution.
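
A minimal sketch of what conjugacy means in practice: with a Dirichlet prior and observed multinomial counts, the posterior is again a Dirichlet whose parameters are the prior parameters plus the counts. The helper names and the toy counts below are assumptions for illustration.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    # Conjugate update: add each observed count to its prior parameter.
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical example: uniform prior over 3 outcomes, then 7 observations.
prior = [1.0, 1.0, 1.0]
counts = [5, 2, 0]
posterior = dirichlet_posterior(prior, counts)  # [6.0, 3.0, 1.0]
posterior_mean = dirichlet_mean(posterior)      # approximately [0.6, 0.3, 0.1]
```

Note that the outcome with zero observations still receives non-zero posterior mass, which is the smoothing effect of the prior.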
19
Latent Dirichlet allocation
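
The generative process that LDA assumes can be sketched as follows, under the usual formulation: draw a topic mixture theta for the document from a Dirichlet prior, then for each word position draw a topic z from theta and a word w from that topic's word distribution. The function names, the toy topic-word table phi, the fixed document length, and the hyperparameter values are all assumptions for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, alpha, phi, rng):
    """Generate one document: returns (theta, topic assignments, word ids)."""
    theta = sample_dirichlet(alpha, rng)      # per-document topic mixture
    topics, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)    # topic for this word position
        w = sample_categorical(phi[z], rng)   # word drawn from topic z
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Hypothetical setup: 2 topics over a 4-word vocabulary.
# Topic 0 only emits words 0 and 1; topic 1 only emits words 2 and 3.
rng = random.Random(0)
phi = [[0.5, 0.5, 0.0, 0.0],
       [0.0, 0.0, 0.5, 0.5]]
theta, topics, words = generate_document(20, [0.5, 0.5], phi, rng)
```

Because a new theta is drawn for every document, a single document can mix several topics, which is the property the mixture-of-unigrams model lacks.
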
21
Likelihoods
22
Inference via Gibbs Sampling
23
Collapsed LDA Gibbs Sampler
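
A compact sketch of a collapsed Gibbs sampler for LDA, assuming symmetric priors alpha and beta and documents given as lists of integer word ids. It is an illustration of the general technique rather than the presenter's implementation; the function name, argument names, default values, and toy corpus are assumptions.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi) point estimates."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # tokens per topic
    z = []                                              # topic of every token

    # Initialize every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    * (n_dk[d][t] + alpha)
                    for t in range(n_topics)
                ]
                # Resample the topic and add the token back.
                k = rng.choices(range(n_topics), weights=weights, k=1)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(n_topics)]
    return theta, phi


# Hypothetical toy corpus: two documents use words 0-2, two use words 3-5.
docs = [[0, 1, 2, 0, 1, 2, 0, 1], [3, 4, 5, 3, 4, 5, 3, 4],
        [0, 1, 2, 1, 0, 2], [3, 5, 4, 4, 3, 5]]
theta, phi = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=6)
```

The sampler is "collapsed" because theta and phi are integrated out: only the topic assignments are sampled, and the distributions are recovered from the counts at the end. Estimating from a single final sample keeps the sketch short; averaging over several samples is a common refinement.
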
24
Joint distribution
25
Joint distribution (cont.)
26
Update equation
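
For reference, the collapsed Gibbs update for LDA with symmetric priors is usually written in the form below, as in Finding Scientific Topics (PNAS, 2004). The notation is an assumption, since the slide's own symbols are not preserved: the first count is the number of times word w_i is assigned to topic k, the denominator count is the total number of tokens assigned to topic k, and the last count is the number of tokens in document d_i assigned to topic k, all excluding token i. V is the vocabulary size.

```latex
P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w})
  \;\propto\;
  \frac{n_{k,-i}^{(w_i)} + \beta}{n_{k,-i}^{(\cdot)} + V\beta}
  \,\bigl(n_{d_i,-i}^{(k)} + \alpha\bigr)
```

The document-side denominator is omitted because it does not depend on k and cancels on normalization.
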
27
Multinomial parameters
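
Given the topic assignments from a Gibbs sample, the multinomial parameters are commonly estimated from the counts as sketched below, assuming symmetric priors alpha and beta, K topics, and vocabulary size V. The count notation is an assumption: n_k^{(w)} is the number of times word w is assigned to topic k, n_d^{(k)} is the number of tokens in document d assigned to topic k, and a dot denotes a sum over that index.

```latex
\hat{\phi}_{k,w} = \frac{n_k^{(w)} + \beta}{n_k^{(\cdot)} + V\beta},
\qquad
\hat{\theta}_{d,k} = \frac{n_d^{(k)} + \alpha}{n_d^{(\cdot)} + K\alpha}
```

Each estimate is the mean of the corresponding Dirichlet posterior, so every row of phi and theta sums to one.
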