Transcript and Presenter's Notes

Title: British Museum Library, London


1
(No Transcript)
2
British Museum Library, London (picture courtesy: Flickr)
3
Courtesy Wikipedia
4
Topic Models and the Role of Sampling
  • Barnan Das

5
British Museum Library, London (picture courtesy: Flickr)
6
Topic Modeling
  • Methods for automatically organizing,
    understanding, searching and summarizing large
    electronic archives.
  • Uncover hidden topical patterns in collections.
  • Annotate documents according to topics.
  • Use the annotations to organize, summarize and search.

7
Topic Modeling
NIH Grants Topic Map 2011, NIH Map Viewer
(https://app.nihmaps.org)
8
Topic Modeling Applications
  • Information retrieval.
  • Content-based image retrieval.
  • Bioinformatics

9
Overview of this Presentation
  • Latent Dirichlet allocation (LDA)
  • Approximate posterior inference
  • Gibbs sampling
  • Paper
  • Fast collapsed Gibbs sampling for LDA

10
Latent Dirichlet Allocation
David Blei's talk, Machine Learning Summer School,
Cambridge, 2009.
D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet
Allocation," Journal of Machine Learning Research, vol. 3,
pp. 993-1022, 2003.
11
Probabilistic Model
  • Generative probabilistic modeling:
  • Treats the data as observations.
  • Contains hidden variables that reflect the thematic
    structure of the collection.
  • Infer the hidden structure using posterior inference:
  • Discover the topics in the collection.
  • Place new data into the estimated model:
  • Situate new documents within the estimated topic
    structure.

12
Intuition
13
Generative Model
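The generative model appears on this slide only as a figure. As a companion, here is a minimal sketch of LDA's generative process, assuming K topics, a vocabulary of size V, fixed-length documents, and symmetric Dirichlet hyperparameters α and β (all names and values below are illustrative, not taken from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    K, V, D, N = 5, 1000, 10, 50        # topics, vocabulary size, documents, words per document
    alpha, beta = 0.1, 0.01             # symmetric Dirichlet hyperparameters

    # Per-corpus topics: one distribution over the vocabulary per topic.
    phi = rng.dirichlet(beta * np.ones(V), size=K)          # shape (K, V)

    docs = []
    for d in range(D):
        theta = rng.dirichlet(alpha * np.ones(K))           # per-document topic proportions
        z = rng.choice(K, size=N, p=theta)                  # per-word topic assignments
        w = np.array([rng.choice(V, p=phi[k]) for k in z])  # observed words
        docs.append(w)

Only the words in docs would be observed; theta, phi and z are the hidden variables the rest of the presentation is about inferring.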
14
Posterior Distribution
  • Only the documents are observable.
  • Infer the underlying topic structure:
  • The topics that generated the documents.
  • For each document, its distribution over topics.
  • For each word, the topic that generated it.
  • Algorithmic challenge: finding the conditional
    distribution of all the latent variables given the
    observed documents.

15
LDA as Graphical Model
(Figure: plate diagram of LDA, with Dirichlet priors over
the multinomial topic proportions and the multinomial
topics.)
16
Posterior Distribution
  • From a collection of documents W, infer:
  • Per-word topic assignments z_d,n.
  • Per-document topic proportions θ_d.
  • Per-corpus topic distributions φ_k.
  • Use posterior expectations to perform the different
    tasks.

17
Posterior Distribution
  • Evaluate P(z | W), the posterior distribution over the
    assignment of words to topics.
  • θ and φ can then be estimated from z.

18
Computing P(z | W)
  • Involves evaluating a probability distribution over a
    large discrete space.
  • The contribution of each z_d,n depends on:
  • All z_-(d,n) values (the topic assignments of all the
    other words).
  • N_k^w: the number of times word w_d,n has been assigned
    to topic k.
  • N_k^d: the number of times a word from document d has
    been assigned to topic k.
  • Sample from the target distribution using MCMC (the
    counts are tallied as in the sketch below).

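The counts N_k^w and N_k^d are the statistics the sampler works with. A minimal sketch, assuming documents are arrays of integer word ids and z holds the current topic assignments (the function and variable names are illustrative, not from the slides):

    import numpy as np

    def count_statistics(docs, z, K, V):
        """Tally the counts used by the full conditional.

        docs: list of arrays of word ids, one array per document.
        z:    list of arrays of topic assignments, aligned with docs.
        """
        n_kw = np.zeros((K, V), dtype=int)          # N_k^w: times word w is assigned to topic k
        n_kd = np.zeros((K, len(docs)), dtype=int)  # N_k^d: times a word of doc d is assigned to k
        for d, (words, topics) in enumerate(zip(docs, z)):
            for w, k in zip(words, topics):
                n_kw[k, w] += 1
                n_kd[k, d] += 1
        return n_kw, n_kd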
19
Approximate Posterior Inference: Gibbs Sampling
C. M. Bishop, Pattern Recognition and Machine Learning.
Springer, New York, 2006.
Iain Murray's talk, Machine Learning Summer School,
Cambridge, 2009.
20
Overview
  • Used when exact inference is intractable.
  • Standard sampling techniques have limitations:
  • They cannot handle all kinds of distributions.
  • They cannot handle high-dimensional data.
  • MCMC techniques do not have these limitations.
  • Markov chain: for random variables x^(1), ..., x^(M),
    p(x^(m+1) | x^(1), ..., x^(m)) = p(x^(m+1) | x^(m)),
    for m ∈ {1, ..., M-1}.

21
Gibbs Sampling
  • Target distribution: p(x) = p(x_1, ..., x_M).
  • Choose an initial state of the Markov chain
    {x_i : i = 1, ..., M}.
  • Replace x_i by a value drawn from the distribution
    p(x_i | x_-i), where
  • x_i is the i-th component of x, and
  • x_-i is x_1, ..., x_M with x_i omitted.
  • This process is repeated for each of the variables.
  • Repeat the whole cycle for however many samples are
    needed (a minimal example follows this list).

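As a concrete illustration of this cycle, a minimal Gibbs sampler for a standard bivariate Gaussian target with correlation ρ, where each conditional p(x_i | x_-i) is itself a one-dimensional Gaussian (the target and all values are invented for the example):

    import numpy as np

    rng = np.random.default_rng(0)
    rho = 0.8                      # correlation of the bivariate Gaussian target
    x = np.zeros(2)                # initial state of the Markov chain
    samples = []

    for it in range(10_000):
        # One cycle: replace x[0] with a draw from p(x0 | x1),
        # then x[1] with a draw from p(x1 | x0).
        x[0] = rng.normal(rho * x[1], np.sqrt(1 - rho**2))
        x[1] = rng.normal(rho * x[0], np.sqrt(1 - rho**2))
        samples.append(x.copy())

    samples = np.array(samples)    # approximate draws from the target (after burn-in)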
22
Why Gibbs Sampling?
  • Compared to other MCMC techniques, Gibbs sampling:
  • is easy to implement,
  • requires little memory, and
  • is competitive in speed and performance.

23
Gibbs Sampling for LDA
  • The full conditional distribution is
    p(z_d,n = k | z_-(d,n), W) ∝
    (N_k^w + β) / (N_k + Vβ) × (N_k^d + α),
    where w = w_d,n, V is the vocabulary size, and the
    counts exclude the current token.
  • (N_k^w + β) / (N_k + Vβ): the probability of w_d,n
    under topic k.
  • (N_k^d + α): the probability of topic k in document d.
  • The normalization constant is
    Z = Σ_k (N_k^w + β)(N_k^d + α) / (N_k + Vβ)
    (a code view follows).
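Read as code, the conditional is just a vector over the K topics built from the count arrays. A small sketch, assuming the counts already exclude the current token (names are illustrative):

    import numpy as np

    def full_conditional(w, d, n_kw, n_kd, n_k, alpha, beta):
        """p(z = k | everything else) for one token of word id w in document d."""
        V = n_kw.shape[1]
        p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_kd[:, d] + alpha)
        return p / p.sum()       # dividing by Z = p.sum() normalizes the distribution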
24
Gibbs Sampling for LDA
  • Target distribution: P(z | W).
  • Initial state of the Markov chain: each z_n is given a
    value in {1, 2, ..., K}.
  • The chain is run for a number of iterations.
  • In each iteration a new state is found by sampling each
    z_n from the full conditional above.

25
Gibbs Sampling for LDA
  • Subsequent samples are taken after an appropriate lag
    to ensure that their autocorrelation is low.
  • This is collapsed Gibbs sampling: θ and φ are integrated
    out and only z is sampled.
  • For a single sample, θ and φ are calculated from z (as
    in the sketch below).

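Putting the pieces together, a minimal sketch of a collapsed Gibbs sampler for LDA with symmetric priors α and β. It follows the full conditional from slide 23 and recovers θ and φ from z at the end; it is an illustration, not the paper's optimized implementation, and all names are assumptions:

    import numpy as np

    def collapsed_gibbs(docs, K, V, alpha, beta, iters, rng):
        D = len(docs)
        z = [rng.integers(K, size=len(w)) for w in docs]    # initial state: random topics
        n_kw = np.zeros((K, V)); n_kd = np.zeros((K, D)); n_k = np.zeros(K)
        for d, (words, topics) in enumerate(zip(docs, z)):
            for w, k in zip(words, topics):
                n_kw[k, w] += 1; n_kd[k, d] += 1; n_k[k] += 1

        for _ in range(iters):
            for d, words in enumerate(docs):
                for n, w in enumerate(words):
                    k = z[d][n]                             # remove the current assignment
                    n_kw[k, w] -= 1; n_kd[k, d] -= 1; n_k[k] -= 1
                    # Full conditional: (N_k^w + β)/(N_k + Vβ) x (N_k^d + α)
                    p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_kd[:, d] + alpha)
                    k = rng.choice(K, p=p / p.sum())
                    z[d][n] = k                             # add the new assignment back
                    n_kw[k, w] += 1; n_kd[k, d] += 1; n_k[k] += 1

        theta = (n_kd + alpha) / (n_kd.sum(axis=0) + K * alpha)  # per-document proportions
        phi = (n_kw + beta) / (n_k[:, None] + V * beta)          # per-topic word distributions
        return z, theta.T, phi

For example, collapsed_gibbs(docs, K=5, V=1000, alpha=0.1, beta=0.01, iters=200, rng=np.random.default_rng(0)) returns the sampled assignments together with θ of shape (D, K) and φ of shape (K, V).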
26
Fast Collapsed Gibbs Sampling For Latent
Dirichlet Allocation
  • Ian Porteous, David Newman, Alexander Ihler,
    Arthur Asuncion, Padhraic Smyth, Max Welling
  • University of California, Irvine

27
FastLDA Graphical Representation
28
FastLDA Segments
  • A sequence of bounds on the normalization constant Z:
    Z_1, ..., Z_K, with Z_1 ≥ Z_2 ≥ ... ≥ Z_K = Z.
  • Several segments s_l^k (k ≤ l ≤ K) for each topic k.
  • 1st segment: a conservative estimate of the probability
    of the topic, given the upper bound Z_k on the true
    normalization factor Z.
  • Subsequent segments: corrections for the missing
    probability mass of the topic, given the improved
    bounds.

29
FastLDA Segments
30
Upper Bounds for Z
  • Find a sequence of improving bounds on the
    normalization constant Z.
  • Z is defined in terms of component vectors.
  • Hölder's inequality is used to construct the initial
    upper bound (see the note below).
  • The bound is intelligently improved for each topic
    visited.

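The slide does not spell the bound out. One natural reading, consistent with Z being a sum of per-topic products (the split of Z into the vectors a and b below is an assumption for illustration):

    Z = \sum_{k=1}^{K} a_k b_k
      \le \Big(\sum_{k} a_k^{\,p}\Big)^{1/p}
          \Big(\sum_{k} b_k^{\,q}\Big)^{1/q},
      \qquad \frac{1}{p} + \frac{1}{q} = 1,
    \quad\text{with } a_k = N_k^d + \alpha,\;
    b_k = \frac{N_k^w + \beta}{N_k + V\beta}.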
31
FastLDA Algorithm
  • Algorithm:
  • Sort the topics in decreasing order of N_k^d.
  • Draw u ~ Uniform(0, 1).
  • For the topics in order:
  • Calculate the lengths of the segments.
  • For each next topic, the bound Z_k is improved.
  • When the sum of the segments exceeds u, return the
    corresponding topic and stop (a simplified sketch
    follows this list).
  • Complexity:
  • Not more than O(K log K) for any operation.

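The sketch below illustrates only the early-termination idea: with the running partial sum as a lower bound on Z and any valid upper bound on the mass of the unvisited topics, a topic can sometimes be returned before every weight has been accumulated. It is not the paper's segment and Hölder-bound construction, it does not reproduce the O(K log K) behaviour, and the function name and the remaining_bound callback are illustrative assumptions:

    import numpy as np

    def sample_with_early_stop(weights, remaining_bound, u):
        """Return the same topic as exact inverse-CDF sampling on weights / weights.sum().

        weights: unnormalized topic weights, already sorted in decreasing N_k^d order.
        remaining_bound(j): any valid upper bound on sum(weights[j:]).
        u: a Uniform(0, 1) draw.
        """
        K = len(weights)
        cum, total = [], 0.0
        for j in range(1, K + 1):
            total += weights[j - 1]          # in FastLDA this is the expensive part
            cum.append(total)
            z_low, z_up = total, total + remaining_bound(j)   # true Z lies in [z_low, z_up]
            for k in range(j):
                below = cum[k - 1] if k > 0 else 0.0
                # Topic k is the answer for every Z in [z_low, z_up]: safe to return early.
                if u * z_low >= below and u * z_up < cum[k]:
                    return k
        # Every topic visited, so Z = total exactly; standard inverse-CDF draw.
        return next(k for k in range(K) if u * total < cum[k])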
32
Experiments
  • Four large datasets:
  • NIPS full papers
  • Enron emails
  • NY Times news articles
  • PubMed abstracts
  • Hyperparameters β = 0.01 and α = 2/K.
  • Computations run on workstations with:
  • Dual Xeon 3.0 GHz processors.
  • Code compiled with gcc version 3.4.

33
Results
Speedup: 5-8 times.
34
Results
  • Speedup relatively insensitive to number of
    documents in the corpus.

35
Results
  • A large Dirichlet parameter smooths the distribution of
    the topics within a document, so FastLDA needs to visit
    and compute more topics before drawing a sample.

36
Discussions
37
Discussions
  • Other domains.
  • Other sampling techniques.
  • Distributions other than the Dirichlet.
  • Parallel computation.
  • Newman et al., "Scalable Parallel Topic Models."
  • Deciding on the value of K.
  • Choices of bounds.
  • Reason behind choosing these datasets.
  • Are the values mentioned in the paper magic
    numbers?
  • Why were words with count < 10 discarded?
  • Assigning weights to words.

38
(No Transcript)
39
Backup Slides
40
Dirichlet Distribution
  • The Dirichlet distribution is an exponential family
    distribution over the simplex, i.e., positive vectors
    that sum to one.
  • The Dirichlet is conjugate to the multinomial: given a
    multinomial observation, the posterior distribution of
    θ is a Dirichlet (see the example below).
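A small numeric illustration of the conjugacy (the prior and the counts are invented for the example): with a Dirichlet(α) prior and multinomial counts n, the posterior over θ is Dirichlet(α + n).

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = np.array([1.0, 1.0, 1.0])   # Dirichlet prior over the 3-simplex
    counts = np.array([5, 0, 2])        # multinomial observation (counts per category)

    # Conjugacy: the posterior over theta is again a Dirichlet with updated parameters.
    posterior = alpha + counts           # Dirichlet(6, 1, 3)

    post_mean = posterior / posterior.sum()     # [0.6, 0.1, 0.3]
    draws = rng.dirichlet(posterior, size=3)    # posterior samples; each row sums to one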