Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

1
Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models
  • Ramesh Nallapati
  • Joint work with John Lafferty, Amr Ahmed, William Cohen and Eric Xing
  • Machine Learning Department
  • Carnegie Mellon University

2
Introduction
  • Statistical topic modeling is an attractive framework for topic discovery
  • Completely unsupervised
  • Models text very well
  • Lower perplexity compared to unigram models
  • Reveals meaningful semantic patterns
  • Can help summarize and visualize document
    collections
  • e.g. PLSA, LDA, DPM, DTM, CTM, PA

3
Introduction
  • A common assumption in all the variants
  • Exchangeability: the bag-of-words assumption
  • Topics represented as a ranked list of words
  • Consequences
  • Word correlation information is lost
  • e.g., "white house" vs. "white" and "house"
  • Long-distance correlations are also lost

4
Introduction
  • Objective
  • To capture correlations between words within
    topics
  • Motivation
  • More interpretable representation of topics as a
    network of words rather than a list
  • Helps better visualize and summarize document
    collections
  • May reveal unexpected relationships and patterns
    within topics

5
Past Work: Topic Models
  • Bigram topic models (Wallach, ICML 2006)
  • Requires KV(V-1) parameters
  • Only captures local dependencies
  • Does not model sparsity of correlations
  • Does not capture within-topic correlations

6
Past Work: Other Approaches
  • Hyperspace Analogue to Language (HAL)
  • (Lund and Burgess, Cognitive Science, 1996)
  • Word-pair correlation is measured as a weighted count of the number of times the pair occurs within a fixed-length window
  • Weight of an occurrence ∝ 1/(mutual distance), as in the sketch below
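
A minimal sketch of this weighting scheme (illustrative, not from the slides; the function name and default window size are assumptions):

    from collections import defaultdict

    # HAL-style weighted co-occurrence counting: within a fixed-length
    # window, each pair occurrence contributes weight 1/(mutual distance).
    def hal_weights(tokens, window=10):
        weights = defaultdict(float)
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                weights[(w, tokens[j])] += 1.0 / (j - i)
        return weights

    print(hal_weights("the river bank flooded the bank".split(), window=3))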

7
Past Work: Other Approaches
  • Hyperspace Analogue to Language (HAL)
  • (Lund and Burgess, Cognitive Science, 1996)
  • Plusses
  • Sparse solutions, scalability
  • Minuses
  • Only unearths global correlations, not semantic
    correlations
  • e.g., "river bank" vs. "bank check"
  • Only local dependencies

8
Past work Other approaches
  • Query expansion in IR
  • Similar in spirit: finds words that co-occur strongly with the query words
  • However, it is not a corpus-visualization tool: it requires a query context to operate on
  • WordNet
  • Semantic networks
  • Human-labeled; not directly related to our goal

9
Our approach
  • L1 norm regularization
  • Known to enforce sparse solutions
  • Sparsity permits scalability
  • Convex optimization problem
  • Globally optimal solutions
  • Recent advances in learning structure of
    graphical models
  • The L1-regularization framework asymptotically recovers the true structure

10
Background: LASSO
  • Example: linear regression
  • Regularization is used to improve generalizability
  • E.g. 1: Ridge regression (L2-norm regularization)
  • E.g. 2: Lasso (L1-norm regularization)
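
For reference, the two penalized least-squares objectives (standard formulations, reconstructed here because the slide's equations were images):

    \hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \big( y_i - x_i^{\top}\beta \big)^2 + \lambda \lVert \beta \rVert_2^2

    \hat{\beta}_{\mathrm{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \big( y_i - x_i^{\top}\beta \big)^2 + \lambda \lVert \beta \rVert_1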

11
Background: LASSO
  • Lasso encourages sparse solutions

12
Background: Gaussian Random Fields
  • Multivariate Gaussian distribution
  • Random field structure: G = (V, E)
  • V: the set of all variables X_1, …, X_p
  • (s, t) ∈ E ⇔ (Σ⁻¹)_st ≠ 0
  • X_s ⊥ X_u | X_N(s) for all u ∉ N(s)

13
Background: Gaussian Random Fields
  • Estimating the graph structure of a GRF from data (Meinshausen and Bühlmann, Annals of Statistics, 2006)
  • Regress each variable onto the others, imposing an L1 penalty to encourage sparsity
  • Estimated neighborhood: the variables with nonzero lasso coefficients, N̂(s) = {t : β̂_t ≠ 0} (see the sketch below)
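
A minimal sketch of the neighborhood-selection step, using scikit-learn's Lasso as the L1-penalized regression (the library choice and the value of alpha are assumptions, not from the slides):

    import numpy as np
    from sklearn.linear_model import Lasso

    # Meinshausen-Buhlmann neighborhood selection: lasso-regress each
    # variable on all the others; the nonzero coefficients define that
    # variable's estimated neighborhood.
    def estimate_neighborhoods(X, alpha=0.1):
        p = X.shape[1]  # X: (n_samples, p) data matrix
        neighborhoods = {}
        for s in range(p):
            others = np.delete(np.arange(p), s)
            fit = Lasso(alpha=alpha).fit(X[:, others], X[:, s])
            neighborhoods[s] = set(others[np.nonzero(fit.coef_)[0]])
        return neighborhoods

The graph is then read off by combining the per-variable neighborhoods, e.g. with the AND or OR rule over pairs, as in the cited paper.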

14
Background: Gaussian Random Fields
[Figure: estimated graph vs. true graph; courtesy of Meinshausen and Bühlmann, Annals of Statistics, 2006]
15
Background: Gaussian Random Fields
  • Application to topic models: CTM (Blei and Lafferty, NIPS, 2006)

16
Background: Gaussian Random Fields
  • Application to CTM (Blei and Lafferty, Annals of Applied Statistics, 2007)

17
Structure learning of an MRF
  • Ising model
  • L1-regularized conditional likelihood learns the true structure asymptotically (Wainwright, Ravikumar and Lafferty, NIPS 2006); see the objective below
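
In the Ising setting of the cited paper, with x ∈ {−1, +1}^p, the per-node problem is an L1-penalized logistic regression of each variable on the rest; a standard way to write the objective (reconstructed, not copied from the slide):

    \hat{\theta}^{(s)} = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \log\Big( 1 + \exp\big( -x_s^{(i)} \langle \theta, x_{\setminus s}^{(i)} \rangle \big) \Big) + \lambda \lVert \theta \rVert_1

The nonzero entries of \hat{\theta}^{(s)} give the estimated edges incident on node s.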

18
Structure learning of an MRF
[Figure: graph structure recovery for the Ising model; courtesy of Wainwright, Ravikumar and Lafferty, NIPS 2006]
19
Sparse Word Graphs
  • Algorithm
  • Run LDA on the document collection and obtain
    topic assignments
  • Convert topic assignments for each document into
    K binary vectors X
  • Assume an MRF for each topic with X as underlying
    data
  • Apply structure learning for the MRF using L1-regularized conditional likelihood (see the sketch below)
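
A minimal end-to-end sketch of the four steps above, assuming LDA topic assignments are already in hand; the input layout (doc_topic_words) and the use of scikit-learn's L1 logistic regression in place of the interior-point solver cited on the scalability slide are assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def sparse_word_graphs(doc_topic_words, K, V, C=1.0):
        # doc_topic_words: per document, a dict {topic k: set of word ids
        # that LDA assigned to k} -- the output of step 1
        graphs = []
        for k in range(K):
            # Step 2: one binary word-occurrence vector per document
            X = np.array([[1 if w in d.get(k, set()) else 0
                           for w in range(V)] for d in doc_topic_words])
            graph = {}
            # Steps 3-4: learn the topic's MRF structure by L1-penalized
            # logistic regression of each word on all the other words
            for s in range(V):
                y = X[:, s]
                if y.min() == y.max():
                    continue  # word never (or always) occurs in topic k
                others = np.delete(np.arange(V), s)
                clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
                clf.fit(X[:, others], y)
                graph[s] = set(others[np.nonzero(clf.coef_[0])[0]])
            graphs.append(graph)
        return graphs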

20
Sparse Word Graphs
21
Sparse Word Graphs: Scalability
  • We still run V logistic-regression problems, each of size V, for each topic: O(KV²)!
  • However, each example is very sparse
  • The L1 penalty results in sparse solutions
  • Each topic can be run in parallel (see the sketch below)
  • Efficient interior-point-based L1-regularized logistic regression (Koh, Kim and Boyd, JMLR, 2007)
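
Since the K per-topic problems are independent, they parallelize trivially; a minimal sketch using the standard library, where solve_topic is a hypothetical single-topic version of the routine sketched on the previous slide:

    from multiprocessing import Pool

    def solve_topic(X_k):
        # hypothetical stand-in: the per-word L1 logistic regressions
        # for one topic's binary occurrence matrix X_k
        return {}

    def run_all_topics(topic_matrices, n_workers=4):
        with Pool(n_workers) as pool:  # one task per topic
            return pool.map(solve_topic, topic_matrices)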

22
Experiments
  • Small AP corpus
  • 2.2K Docs, 10.5K unique words
  • Ran a 10-topic LDA model
  • Used λ = 0.1 in the L1 logistic regression
  • Took just 45 min. per topic
  • Very sparse solutions
  • Computes only under 0.1% of the total number of possible edges

23
Topic "Business": neighborhood of top LDA terms
24
Topic "Business": neighborhood of top edges
25
Topic "War": neighborhood of top LDA terms
26
Topic "War": neighborhood of top edges
27
Concluding remarks
  • Pros
  • A highly scalable algorithm for capturing within-topic word correlations
  • Captures both short-distance and long-distance correlations
  • Makes topics more interpretable
  • Cons
  • Not a complete probabilistic model
  • Significant modeling challenge since the
    correlations are latent

28
Concluding remarks
  • Applications of Sparse Word Graphs
  • Better document summarization and visualization
    tool
  • Word sense disambiguation
  • Semantic query expansion
  • Future Work
  • Evaluation on a real task
  • Build a unified statistical model