1
Probabilistic Latent Semantic Analysis
  • Thomas Hofmann
  • Presented by
  • Quang Lam Nguyen
  • Based on Mummoorthy Murugesan, CS 6901

2
Outline
  • Background
  • LSA
  • PLSA
  • Model Fitting
  • Basic I: Maximum Likelihood Estimation
  • Basic II: EM Algorithm
  • Basic III: Overfitting
  • Experimental Results
  • Conclusion

3
Background (1/2)
Probabilistic Latent Semantic Analysis and
Latent Semantic Analysis
  • Latent: present but not evident, hidden
  • Semantic: meaning
  • Hidden meaning of terms and their
    occurrences in documents

4
Background (2/2)
[Figure: example terms in an N-dimensional lexical space (Sport, Muskelkater
"sore muscles", Kater "tomcat / hangover", Auto "car", Bank "bank / bench",
Wagen "car", Park, Einzahlung "deposit") are mapped into a semantic (latent)
space of K << N dimensions; the diagram illustrates polysemy (Kater, Bank) and
synonymy (Auto / Wagen; "Du hast nicht alle Tassen im Schrank" / "Du bist
verrückt", both meaning "you are crazy").]
5
The Setting
  • Set of N documents
  • D = {d_1, ..., d_N}
  • Set of M words
  • W = {w_1, ..., w_M}
  • Set of K latent classes
  • Z = {z_1, ..., z_K}

6
Latent Semantic Indexing (1/2)
  • Term-document matrix A of size N x M to represent
    the frequency counts
  • Singular Value Decomposition (SVD)
  • A (N x M) = U (N x N) E (N x M) V^T (M x M)
  • Keep only the k largest singular values in E
  • A' (N x M) = U (N x k) E (k x k) V^T (k x M)
  • A' ≈ A
  • Each term is represented by k factors, i.e. a vector in
    k-dimensional space (a small NumPy sketch follows below)
  • Terms with common meaning are mapped to the same
    direction
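
As a concrete illustration of the truncated SVD, here is a minimal NumPy sketch (the toy matrix and the variable names are this sketch's own, not from the slides):

  import numpy as np

  # Toy term-document count matrix A: rows = documents (N), columns = terms (M).
  A = np.array([[2., 0., 1., 0.],
                [1., 1., 0., 0.],
                [0., 3., 1., 1.],
                [0., 1., 2., 2.]])

  k = 2                                             # number of latent dimensions to keep
  U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U E V^T

  # Keep only the k largest singular values and the corresponding singular vectors.
  A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # rank-k approximation A' ≈ A

  # Each term is now a k-dimensional vector; terms with similar meaning end up
  # pointing in similar directions of this reduced space.
  term_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T     # shape (M, k)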


7
Latent Semantic Indexing (2/2)
  • LSI puts documents together even if they don't
    have common words
  • Disadvantages
  • Statistical foundation is missing
  • PLSA addresses this concern!

8
Probabilistic Latent Semantic Analysis
  • Overview
  • Aspect Model
  • Model fitting with EM and TEM
  • Basic I: Maximum Likelihood Estimation
  • Basic II: EM Algorithm
  • Basic III: Overfitting

9
PLSA Overview
  • Automated Document Indexing and Information
    Retrieval
  • Identification of Latent Classes using an
    Expectation Maximization (EM) Algorithm
  • Shown to solve
  • Polysemy and Synonymy
  • Has a better statistical foundation than LSA

10
PLSA Aspect Model (1/3)
  • Aspect Model
  • A document is a mixture of K underlying (latent)
    aspects
  • Each aspect is represented by a distribution over
    words, P(w|z)

11
Aspect Model (2/3)
  • Latent Variable model for general co-occurrence
    data
  • Associate each observation (w,d) with a class
    variable z ∈ Z = {z_1, ..., z_K}
  • Generative model for predicting words
  • Select a document d with probability P(d)
  • Pick a latent class z with probability P(z|d)
  • Generate a word w with probability P(w|z)

Graphical model: d → z → w, with probabilities P(d), P(z|d), P(w|z)
12
Aspect Model (3/3)
  • To get the joint probability model (see the
    formula below)
  • d and w are assumed to be conditionally independent
    given z
  • Now we have to compute P(z), P(z|d), P(w|z), but we
    are given just the documents (d) and words (w).
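
For reference, the joint probability model behind this slide (shown as an image in the original deck; restated here from the asymmetric parameterization of the aspect model) is

    P(d, w) = P(d) \, P(w | d), \qquad P(w | d) = \sum_{z \in Z} P(w | z) \, P(z | d)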

13
Basic I: Maximum Likelihood Estimation
  • The probability model is based on real data
  • → it has to be fit to the data → model fitting
  • Tuning the free parameters of the model to provide an
    optimal fit to the real-world data
  • Choose the parameters so that they make the observed
    data more likely than any other parameter values would
  • Prerequisite: the correct parameters are known!

14
Basic II: EM Algorithm (1/2)
  • Maximum Likelihood Estimation
  • BUT the correct parameters are not known
  • FOR they depend on unknown (hidden) properties!
  • Iterative procedure
  • 1. Expectation step
  • 2. Maximization step

15
Basic II: EM Algorithm (2/2)
  • E-Step (Expectation)
  • Estimate the hidden variables: the expectation of the
    likelihood function is calculated with the
    current parameter values
  • M-Step (Maximization)
  • Determine the actual parameters:
  • Find the parameters that maximize the likelihood
    function (Maximum Likelihood Estimation)

16
Model Fitting (1/3)
  • We have the equation for the log-likelihood function
    of the aspect model (restated below), and we need to
    maximize it.
  • Expectation Maximization (EM) is used for this
    purpose
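
The log-likelihood function referred to here can be restated as follows, where n(d, w) denotes the number of occurrences of word w in document d:

    L = \sum_{d \in D} \sum_{w \in W} n(d, w) \, \log P(d, w)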

17
E-Step: Model Fitting (2/3)
  • P(z|d,w) is the probability that an occurrence of
    word w in document d is explained by aspect z
  • (obtained via Bayes' rule; the formula follows below)
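
The E-step formula behind this slide is the usual Bayes-rule posterior of the aspect model:

    P(z | d, w) = \frac{P(z | d) \, P(w | z)}{\sum_{z'} P(z' | d) \, P(w | z')}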

18
M-Step: Model Fitting (3/3)
  • All of these equations (restated below) use P(z|d,w)
    calculated in the E-step
  • Converges to a local maximum of the likelihood
    function
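
The re-estimation equations referred to on this slide can be restated as follows (n(d, w) again denotes word counts, and n(d) = \sum_w n(d, w)):

    P(w | z) = \frac{\sum_{d} n(d, w) \, P(z | d, w)}{\sum_{d, w'} n(d, w') \, P(z | d, w')},
    \qquad
    P(z | d) = \frac{\sum_{w} n(d, w) \, P(z | d, w)}{n(d)}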

19
Basic III: Overfitting
  • Trade-off between predictive performance on the
    training data and on unseen new data
  • Actual aim: predict the correct output for UNSEEN
    data, too → generalization
  • Problem: the model may adjust too much to very specific
    random features of the training data → overfitting
  • → Tempered EM

20
TEM (Tempered EM)
  • Introduce a control parameter β (see the tempered
    E-step below)
  • β starts from the value 1 and decreases
  • Similar to simulated annealing, with
  • β as the temperature variable
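
In tempered EM, the E-step is modified by the control parameter β, which damps the posterior before normalization:

    P_\beta(z | d, w) = \frac{\left[ P(z | d) \, P(w | z) \right]^{\beta}}{\sum_{z'} \left[ P(z' | d) \, P(w | z') \right]^{\beta}}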

21
Choosing β
  • It controls the trade-off between
  • underfitting and overfitting
  • Simple solution: use held-out data (a part of the
    training data)
  • Train on the training data with β starting from 1
  • Test the model on the held-out data
  • If there is an improvement, continue with the same β
  • If there is no improvement, set β ← ηβ where η < 1
    (a sketch of this schedule follows below)
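
A compact NumPy sketch of one tempered-EM sweep, assuming the counts and distributions are stored as dense arrays (the names n_dw, p_z_d, p_w_z and the function itself are this sketch's own, not from the slides; the held-out check that drives the β schedule above is left to the caller):

  import numpy as np

  def tempered_em_step(n_dw, p_z_d, p_w_z, beta=1.0):
      """One tempered-EM sweep.
      n_dw: (N, M) word counts, p_z_d: (N, K) P(z|d), p_w_z: (K, M) P(w|z)."""
      # Tempered E-step: P(z|d,w) is proportional to [P(z|d) P(w|z)]^beta
      post = (p_z_d[:, :, None] * p_w_z[None, :, :]) ** beta    # shape (N, K, M)
      post /= post.sum(axis=1, keepdims=True) + 1e-12

      # M-step: re-estimate P(w|z) and P(z|d) from the expected counts
      expected = n_dw[:, None, :] * post                        # shape (N, K, M)
      p_w_z = expected.sum(axis=0)                              # sum over documents
      p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
      p_z_d = expected.sum(axis=2)                              # sum over words
      p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
      return p_z_d, p_w_z

Following the slide, one would repeat this sweep at β = 1 until the held-out performance stops improving, then set β ← ηβ and continue.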

22
Experimental Results
  • Perplexity Comparison
  • Polysemy
  • Information Retrieval

23
Perplexity Comparison (1/2)
  • What is perplexity?
  • An indicator of the quality of probability models
    (see the definition below)
  • Lower perplexity means the model is less surprised
    by the test examples
  • Assigning high probability to the test data gives
    lower perplexity and thus good predictions
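
One common way to define perplexity on held-out data (with n'(d, w) the held-out counts) is

    Perplexity = \exp\left( - \frac{\sum_{d, w} n'(d, w) \, \log P(w | d)}{\sum_{d, w} n'(d, w)} \right)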

24
Perplexity Comparison (2/2)
25
Polysemy
  • Occurrences of the word "segment" in two different
    contexts (image vs. sound) are assigned to different
    aspects

26
Information Retrieval
  • For natural language queries, simple term
    matching does not work effectively
  • Ambiguous terms
  • Queries for the same information vary due to
    personal style
  • Latent semantic indexing
  • creates a latent semantic space (hidden
    meaning)

27
Comparing PLSA and LSA
  • LSA and PLSA both perform dimensionality reduction
  • In LSA, by keeping only the K largest singular values
  • In PLSA, by having K aspects
  • Comparison to SVD (see the correspondence written
    out below)
  • The U matrix is related to P(d|z) (document to aspect)
  • The V matrix is related to P(w|z) (aspect to term)
  • The E matrix is related to P(z) (aspect strength)
  • The main difference is the way the approximation
    is done
  • PLSA generates a model (the aspect model) and
    maximizes its predictive power
  • Selecting the proper value of K is heuristic in
    LSA
  • Model selection in statistics can determine the
    optimal K in PLSA
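
The correspondence to SVD can be written out explicitly using the symmetric parameterization of the aspect model: collecting P(d_i | z_k) into U, P(w_j | z_k) into V, and P(z_k) on the diagonal of E gives a non-negative, probabilistic analogue of the SVD of the joint distribution:

    P(d_i, w_j) = \sum_{k} P(z_k) \, P(d_i | z_k) \, P(w_j | z_k)
    \quad \Longleftrightarrow \quad
    P = U \, E \, V^{\top}, \quad U_{ik} = P(d_i | z_k), \; V_{jk} = P(w_j | z_k), \; E = \mathrm{diag}(P(z_k))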

28
Conclusion
  • PLSI consistently outperforms LSI in the
    experiments
  • The precision gain is 100% compared to the baseline
    method in some cases
  • PLSA has a statistical theory to support it, and is
    thus better founded than LSA.