Title: Describing Visual Scenes using Transformed Dirichlet Processes

1. Describing Visual Scenes using Transformed Dirichlet Processes
Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky.
In Adv. in Neural Information Processing Systems, 2005.
- Misc-read presentation by Jonathan Huang (jch1_at_cs.cmu.edu), 4/19/2006
2. Paper Contributions
- An extension of the idea of using LDA on a visual bag-of-words, incorporating spatial structure into a generative model
- An approach to handling uncertainty about the number of instances of an object class within a scene
3. Outline
- Review Latent Dirichlet Allocation and application to visual scenes
- Dirichlet Processes
- Hierarchical Dirichlet Processes
- Transformed Dirichlet Processes
- Application to Visual Scenes
- Results
4. Latent Dirichlet Allocation (LDA)
- In LDA, every document/image is a mixture of topics, where the mixture proportions are drawn from a Dirichlet prior.
- j ranges over the documents; i ranges over the words in each document.
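The generative process the slide describes can be sketched as follows. This is a toy illustration, not the paper's code; the sizes (K topics, V visual words, D documents) and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K topics, V visual words, D documents, N words per document.
K, V, D, N_WORDS = 3, 10, 2, 5

alpha = np.full(K, 0.5)                   # Dirichlet prior on topic proportions
beta = rng.dirichlet(np.ones(V), size=K)  # per-topic distributions over words

docs = []
for j in range(D):                        # j ranges over documents
    theta_j = rng.dirichlet(alpha)        # mixture proportions for document j
    words = []
    for i in range(N_WORDS):              # i ranges over words in document j
        z_ji = rng.choice(K, p=theta_j)   # latent topic assignment
        w_ji = rng.choice(V, p=beta[z_ji])  # observed word drawn from that topic
        words.append(int(w_ji))
    docs.append(words)
```

Each document gets its own topic proportions `theta_j`, but all documents share the same topics `beta` — exactly the sharing that the hierarchical models later in the talk generalize.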
5. Latent Dirichlet Allocation (LDA)
(Figure: example scene with regions labeled by topic - Sky, Cow, Grass, Water)
6. Some Questions
- How do we choose the number of topics for LDA?
- How can we put spatial structure into this model?
7. Outline
- Review Latent Dirichlet Allocation and application to visual scenes
- Dirichlet Processes
- Hierarchical Dirichlet Processes
- Transformed Dirichlet Processes
- Application to Visual Scenes
- Results
8. Dirichlet Distributions
- The Dirichlet distribution is defined on the K-dimensional simplex.
- It can be thought of as a distribution on the space of distributions over random variables that can take K possible values.
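To make the "distribution over distributions" point concrete, here is a minimal sketch: a single draw from a Dirichlet is itself a valid probability distribution over K values. The parameter values are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A draw from Dirichlet(alpha) lies on the K-dimensional simplex:
# nonnegative entries that sum to 1, i.e. a distribution over K outcomes.
alpha = np.array([2.0, 2.0, 2.0])  # K = 3 here
p = rng.dirichlet(alpha)
```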
9. Dirichlet Processes (DP)
- The Dirichlet Process can be thought of as the infinite-dimensional version of the Dirichlet distribution. It is a distribution on the space of all distributions (a measure over measures, if you prefer).
- Definition of a Dirichlet Process: the parameters to a DP are a positive number α and a base distribution G0 on some measurable space Θ.
- If a distribution G ~ DP(α, G0), then for any partition (A1, …, AK) of Θ, (G(A1), …, G(AK)) ~ Dirichlet(αG0(A1), …, αG0(AK)).
- Intuitively, this means that a draw G from a DP wants to look like the base distribution G0. In fact, the expectation of DP(α, G0) is exactly G0, and as α increases, it becomes more likely that G looks like G0.
- Important fact: samples from a DP are discrete distributions with probability 1.
10. Dirichlet Processes (DP)
- It is easier to think of the distribution we get by sampling from some G which is itself first sampled from a DP.
- The Polya urn sampling scheme (Blackwell/MacQueen 1973) gives a way to draw from G (where G is never directly specified). Given a sequence θ1, θ2, …, θi−1 of i.i.d. previous draws from G: θi | θ1, …, θi−1 ~ (αG0 + Σ_{ℓ<i} δ_{θℓ}) / (α + i − 1).
- The Polya urn scheme is important if we want to use MCMC in models with a Dirichlet Process, and shows the clustering property of DPs.
11. Chinese Restaurant Processes
- The Polya urn scheme is closely related to the Chinese Restaurant Process.
- Consider a restaurant with infinitely many tables.
- Customers θi enter one at a time, choosing either to sit at a table with other customers or to start a new table.
- A customer starts a new table with probability proportional to α, and sits at an old table with probability proportional to the number of people at that table.
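The seating rule above translates directly into code. A minimal simulation (illustrative only; the clustering property shows up as a few large tables plus a long tail of small ones):

```python
import numpy as np

rng = np.random.default_rng(0)

def chinese_restaurant_process(n_customers, alpha):
    """Return CRP table assignments and final table sizes."""
    tables = []       # tables[k] = number of customers seated at table k
    assignments = []
    for i in range(n_customers):
        # Existing tables weighted by occupancy; a new table weighted by alpha.
        probs = np.array(tables + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(tables):
            tables.append(1)   # customer starts a new table
        else:
            tables[k] += 1     # customer joins an existing (rich-get-richer) table
        assignments.append(k)
    return assignments, tables

assignments, tables = chinese_restaurant_process(100, alpha=1.0)
```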
12. DP Mixture Models
- The infinite limit of finite mixture models as the number of mixture components tends to infinity.
- Gaussian mixture model example
13. DP Mixture Models (Inference)
- There are various ways to do inference in these models, generally using MCMC or variational methods.
- Inference is much easier when the base distribution G0 and the data model are conjugate to each other.
(Plot: DP fits as a function of iterations within a variational inference procedure; figure from Michael Jordan's tutorial)
(Plot: DP fits as the number of data points increases; figure from Michael Jordan's tutorial)
14. Outline
- Review Latent Dirichlet Allocation and application to visual scenes
- Dirichlet Processes
- Hierarchical Dirichlet Processes
- Transformed Dirichlet Processes
- Application to Visual Scenes
- Results
15. Hierarchical Dirichlet Processes (HDP)
- What happens if we put a DP prior on the base distribution of a Dirichlet Process?
- Why would we want to? We might have a collection of related documents or images, each of which is a mixture of Gaussians.
16. Hierarchical Dirichlet Processes (HDP)
- Chinese Restaurant Franchise: now consider a franchise with infinitely many restaurants.
- People come into each restaurant as in the Chinese Restaurant Process, but now the first person to sit at a table gets to choose a dish for all further people at that table to share.
- All restaurants share the same (possibly infinite) set of dishes.
- Popular dishes get more popular under this distribution.
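A rough simulation of the franchise (a sketch for intuition, not the paper's sampler): each restaurant seats customers by its own CRP with parameter alpha, and each new table orders a dish from a shared, franchise-wide menu that itself grows CRP-style with parameter gamma.

```python
import numpy as np

rng = np.random.default_rng(0)

def franchise(n_restaurants, n_customers, alpha, gamma):
    """Chinese Restaurant Franchise: per-restaurant CRP(alpha) seating,
    with tables ordering dishes from a shared CRP(gamma) menu."""
    dish_counts = []   # how many tables franchise-wide serve each dish
    restaurants = []
    for j in range(n_restaurants):
        table_sizes, table_dish = [], []
        for i in range(n_customers):
            probs = np.array(table_sizes + [alpha], dtype=float)
            probs /= probs.sum()
            t = int(rng.choice(len(probs), p=probs))
            if t == len(table_sizes):
                # New table: its first customer orders a dish for the table.
                table_sizes.append(1)
                dprobs = np.array(dish_counts + [gamma], dtype=float)
                dprobs /= dprobs.sum()
                d = int(rng.choice(len(dprobs), p=dprobs))
                if d == len(dish_counts):
                    dish_counts.append(1)   # brand-new dish on the shared menu
                else:
                    dish_counts[d] += 1     # popular dishes get more popular
                table_dish.append(d)
            else:
                table_sizes[t] += 1         # join an existing table, share its dish
        restaurants.append((table_sizes, table_dish))
    return restaurants, dish_counts

restaurants, dish_counts = franchise(3, 50, alpha=1.0, gamma=1.0)
```

The shared menu is what couples the groups: the same dishes (mixture components) recur across restaurants (documents/images).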
17. Hierarchical Dirichlet Processes (HDP)
(Figures: HDP graphical model and LDA graphical model)
- t_ji represents the ith table of the jth document; k_jt represents which dish is at table t for the jth document.
18. Outline
- Review Latent Dirichlet Allocation and application to visual scenes
- Dirichlet Processes
- Hierarchical Dirichlet Processes
- Transformed Dirichlet Processes
- Application to Visual Scenes
- Results
19. Transformed Dirichlet Processes (TDP)
- In the TDP, the global mixture components (the θk's) undergo a set of random transformations for each group (document/image).
(Figures: LDA, HDP, and TDP graphical models)
- This is a twist on the Chinese Restaurant Franchise: now the first customer at a table not only gets to order a dish, but gets to season it in some way.
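The "seasoning" amounts to giving each group its own perturbed copy of the shared components. A toy sketch, assuming Gaussian translations of 2-D cluster means (the component values and shift scale are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical global mixture components (the theta_k's): 2-D cluster means.
global_means = np.array([[0.0, 0.0],
                         [5.0, 5.0]])

def transformed_components(global_means, n_groups, shift_scale=1.0):
    """Give each group (image) a randomly translated copy of the shared
    components -- the per-group 'seasoning' of the franchise's dishes."""
    out = []
    for j in range(n_groups):
        rho_j = rng.normal(0.0, shift_scale, size=global_means.shape)
        out.append(global_means + rho_j)  # same dish, group-specific seasoning
    return out

per_group = transformed_components(global_means, n_groups=4)
```

All groups still share the same underlying components; only the transformations differ, which is what lets the model place the same object class at different locations in different images.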
20. Outline
- Review Latent Dirichlet Allocation and application to visual scenes
- Dirichlet Processes
- Hierarchical Dirichlet Processes
- Transformed Dirichlet Processes
- Application to Visual Scenes
- Results
21. TDP on Visual Scenes
(Figures: LDA, HDP, TDP, and visual scene TDP graphical models)
- Groups (restaurants) correspond to training or test images.
- O is a fixed number of object categories.
- Every cluster (object class instantiation) has a canonical mean and variance given by θk, and is allowed to translate by ρ_jt.
22. Transformed Dirichlet Processes (TDP)
23. Local Image Features
- SIFT descriptors are computed over local elliptical regions and vector-quantized to form 1800 visual words.
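Vector quantization here means clustering the descriptors and replacing each one with the index of its nearest cluster center. A toy numpy sketch with a hand-rolled k-means (the slide doesn't say which clustering method the paper used; random data stands in for real 128-D SIFT descriptors, and 8 words stands in for 1800):

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_quantize(descriptors, n_words, n_iters=10):
    """Toy vector quantization: cluster descriptors with k-means and map
    each descriptor to the index of its nearest centroid (its visual word)."""
    centroids = descriptors[rng.choice(len(descriptors), n_words, replace=False)]
    for _ in range(n_iters):
        # Assign each descriptor to the nearest centroid.
        dists = np.linalg.norm(descriptors[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned descriptors.
        for k in range(n_words):
            members = descriptors[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return labels, centroids

# Stand-in for 128-D SIFT descriptors (the paper quantizes to 1800 words).
descriptors = rng.normal(size=(200, 128))
labels, centroids = kmeans_quantize(descriptors, n_words=8)
```

After quantization, each image becomes a bag of discrete word indices, which is exactly the input format the LDA/HDP/TDP models above expect.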
24. Outline
- Review Latent Dirichlet Allocation and application to visual scenes
- Dirichlet Processes
- Hierarchical Dirichlet Processes
- Transformed Dirichlet Processes
- Application to Visual Scenes
- Results
25. Results
- Dataset: 250 training images and 75 test images from the MIT-CSAIL database; images contain buildings, side views of cars, and roads.
- Training is semi-supervised, in the sense that some parts of each training image are labeled.
- For training: 100 rounds of blocked Gibbs sampling.
- For testing: 50 rounds of blocked Gibbs sampling with 10 random restarts.
26. Results
- Remarks:
- TDP can estimate the number of object instantiations in each scene.
- TDP discovered that buildings are large, and cars are small horizontal things.
27. Results
28. Conclusion
- As claimed, this method goes beyond bag-of-words models to use spatial information, and models multiple instantiations of an object class within an image.
- The results might be more convincing if more than three object classes were considered.
29. Thanks!
References:
- Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. Describing Visual Scenes using Transformed Dirichlet Processes. In Adv. in Neural Information Processing Systems, 2005.
- Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. Depth from Familiar Objects. To appear in CVPR 2006.
- Michael Jordan. Dirichlet Processes, Chinese Restaurant Processes and All That. NIPS 2005 tutorial slides.