Title: A Bayesian Hierarchical Model for Learning Natural Scene Categories (L. Fei-Fei and P. Perona, CVPR 2005) and Discovering Objects and Their Location in Images (J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, ICCV 2005)

Slide 1: Title
- A Bayesian Hierarchical Model for Learning Natural Scene Categories. L. Fei-Fei and P. Perona. CVPR 2005.
- Discovering Objects and Their Location in Images. J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman. ICCV 2005.
- Tomasz Malisiewicz (tomasz_at_cmu.edu), Advanced Machine Perception, February 2006
Slide 2: Graphical Models, a Recent Trend in Machine Learning
- Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.
Slide 3: Outline
- Goals of both vision papers
- Techniques from statistical text modeling
  - pLSA vs. LDA
- Scene classification via LDA
- Object discovery via pLSA
Slide 4: Goal: Learn and Recognize Natural Scene Categories
- Classify a scene without first extracting objects
- Other techniques we know of:
  - Global frequency (Oliva and Torralba)
  - Texton histogram (Renninger, Malik et al.)
Slide 5: Goal: Discover Object Categories
- Discover what objects are present in a collection of images in an unsupervised way
- Find those same objects in novel images
- Determine which local image features correspond to which objects, i.e. segment the image
Slide 6: Enter the World of Statistical Text Modeling
- D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
- Bag-of-words approaches: the order of words in a document can be neglected
- Graphical model fun
Slide 7: Bag-of-words
- A document is a collection of M words
- A corpus (a collection of documents) is summarized in a term-document matrix (a small sketch follows below)
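To make the bag-of-words idea concrete, here is a minimal sketch (not from either paper) of building a term-document count matrix with numpy; the toy corpus and vocabulary are made up for illustration.

```python
import numpy as np

# Toy corpus: word order is ignored, only counts matter.
corpus = [
    "the beach has sand and water",
    "the mountain has snow and rock",
    "sand and rock and water",
]

# Build the vocabulary and the term-document matrix
# (here rows are documents, columns are words).
vocab = sorted({w for doc in corpus for w in doc.split()})
word_index = {w: j for j, w in enumerate(vocab)}

counts = np.zeros((len(corpus), len(vocab)), dtype=int)
for i, doc in enumerate(corpus):
    for w in doc.split():
        counts[i, word_index[w]] += 1

print(vocab)
print(counts)
```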
Slide 10: 1990: Latent Semantic Analysis (LSA)
- Goal: map high-dimensional count vectors to a lower-dimensional representation that reveals semantic relations between words
- The lower-dimensional space is called the latent semantic space
- Dim(latent space) = K
Slide 11: 1990: Latent Semantic Analysis (LSA)
- D = {d_1, ..., d_N}: N documents
- W = {w_1, ..., w_M}: M words
- N_ij = n(d_i, w_j): the N x M co-occurrence (term-document) matrix
Slide 12: What did we just do?
- Singular Value Decomposition
Slide 13: LSA Summary
- SVD on the term-document matrix
- Approximate N by setting all but the largest K singular values to zero
- This produces the rank-K approximation to N that is optimal in the L2-matrix (Frobenius) norm sense (a numpy sketch follows below)
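A minimal LSA sketch along the lines just described: take the SVD of a count matrix and keep only the K largest singular values to get the rank-K approximation. The matrix here is a random stand-in for a real term-document matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N = rng.integers(0, 5, size=(8, 12)).astype(float)  # stand-in term-document matrix
K = 2                                                # dimension of the latent space

# SVD: N = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(N, full_matrices=False)

# Keep only the K largest singular values -> rank-K approximation,
# optimal in the Frobenius (L2 matrix) norm sense.
N_k = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]

# Documents (rows) projected into the K-dimensional latent semantic space.
doc_coords = U[:, :K] * s[:K]

print("approximation error:", np.linalg.norm(N - N_k))
print("latent-space document coordinates shape:", doc_coords.shape)
```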
Slide 14: LSA and Polysemy
- Polysemy: the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings
- Under the LSA model, the coordinates of a word in latent space can be written as a linear superposition of the coordinates of the documents that contain the word
Slide 15: Problems with LSA
- LSA does not define a properly normalized probability distribution
- There is no obvious interpretation of the directions in the latent space
- Statistically, the use of the L2 norm in LSA corresponds to a Gaussian error assumption, which is hard to justify in the context of count variables
- The polysemy problem remains
Slide 16: pLSA to the Rescue
- Probabilistic Latent Semantic Analysis
- pLSA relies on the likelihood function of multinomial sampling and aims at an explicit maximization of the predictive power of the model
Slide 17: pLSA to the Rescue
(Slide credit: Josef Sivic)
Slide 18: Learning the pLSA Parameters
- Observed: the counts n(w_i, d_j) of word i in document j
- Unlike LSA, pLSA does not minimize any type of squared deviation; the parameters are estimated in a probabilistically sound way
- Maximize the likelihood of the data using EM, which is equivalent to minimizing the KL divergence between the empirical distribution and the model
(Slide credit: Josef Sivic)
Slide 19: EM for pLSA (training on a corpus)
- E-step: compute the posterior probabilities P(z_k|d_j, w_i) of the latent variables
- M-step: maximize the expected complete-data log-likelihood, which re-estimates P(w_i|z_k) and P(z_k|d_j)
- (A code sketch of these updates follows below)
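A minimal EM sketch for pLSA (not the authors' code): the E-step computes P(z|d,w) from the current parameters, and the M-step re-estimates P(w|z) and P(z|d) from the expected counts.

```python
import numpy as np

def plsa_em(counts, K, n_iters=100, seed=0):
    """counts: (D, W) matrix of word counts n(d, w); K: number of topics."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    # Random normalized initialization of P(w|z) and P(z|d).
    p_w_given_z = rng.random((K, W)); p_w_given_z /= p_w_given_z.sum(1, keepdims=True)
    p_z_given_d = rng.random((D, K)); p_z_given_d /= p_z_given_d.sum(1, keepdims=True)

    for _ in range(n_iters):
        # E-step: posterior P(z|d,w), shape (D, W, K).
        joint = p_z_given_d[:, None, :] * p_w_given_z.T[None, :, :]
        post = joint / np.maximum(joint.sum(2, keepdims=True), 1e-12)
        # M-step: re-estimate parameters from the expected counts n(d,w) P(z|d,w).
        expected = counts[:, :, None] * post                        # (D, W, K)
        p_w_given_z = expected.sum(0).T                              # (K, W)
        p_w_given_z /= np.maximum(p_w_given_z.sum(1, keepdims=True), 1e-12)
        p_z_given_d = expected.sum(1)                                # (D, K)
        p_z_given_d /= np.maximum(p_z_given_d.sum(1, keepdims=True), 1e-12)
    return p_w_given_z, p_z_given_d

# Example usage on a random stand-in corpus:
counts = np.random.default_rng(1).integers(0, 5, size=(10, 30))
p_w_given_z, p_z_given_d = plsa_em(counts, K=3)
```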
Slide 20: Graphical View of pLSA
- pLSA is a generative model:
  - Select a document d_i with probability P(d_i)
  - Pick a latent class z_k with probability P(z_k|d_i)
  - Generate a word w_j with probability P(w_j|z_k)
- (Figure: d and w are observed variables, z is the latent variable, and plates denote replication)
Slide 21: How does pLSA deal with previously unseen documents?
- The "folding-in" heuristic:
  - First train on the corpus to obtain P(w|z)
  - Now re-run the same training EM algorithm, but do not re-estimate P(w|z), and let D = {d_unseen}, so only P(z|d_unseen) is updated
- (A sketch follows below)
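A hedged sketch of the folding-in heuristic as described here: rerun the same E/M updates for a single unseen document, keeping the learned P(w|z) fixed and updating only P(z|d_unseen). It reuses the array shapes from the `plsa_em` sketch above.

```python
import numpy as np

def plsa_fold_in(counts_unseen, p_w_given_z, n_iters=50, seed=0):
    """counts_unseen: (W,) word counts of one unseen document.
    p_w_given_z: (K, W) topic-word distributions learned on the corpus (held fixed)."""
    rng = np.random.default_rng(seed)
    K = p_w_given_z.shape[0]
    p_z_given_d = rng.random(K); p_z_given_d /= p_z_given_d.sum()
    for _ in range(n_iters):
        # E-step for the single document: P(z|d_unseen, w), shape (W, K).
        joint = p_w_given_z.T * p_z_given_d
        post = joint / np.maximum(joint.sum(1, keepdims=True), 1e-12)
        # M-step: update only the document-specific mixing weights.
        p_z_given_d = (counts_unseen[:, None] * post).sum(0)
        p_z_given_d /= max(p_z_given_d.sum(), 1e-12)
    return p_z_given_d
```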
Slide 22: Problems with pLSA
- It is not a well-defined generative model of documents: d is a dummy index into the list of training documents (it takes as many values as there are training documents)
- There is no natural way to assign probability to a previously unseen document
- The number of parameters to be estimated grows with the size of the training set
Slide 23: LDA to the Rescue
- Latent Dirichlet Allocation treats the topic mixture weights as a k-parameter hidden random variable and places a Dirichlet prior on the multinomial mixing weights
- The Dirichlet distribution is conjugate to the multinomial distribution (the most natural prior to choose: the posterior distribution is also a Dirichlet!)
- (A sampling sketch of the LDA generative process follows below)
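To see what the Dirichlet prior buys us, here is a minimal sketch of LDA's generative process: per-document topic weights are drawn from Dir(alpha), each word's topic from those weights, and each word from that topic's multinomial. All sizes are toy values, and the topic-word distributions are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 20, 50                   # topics, vocabulary size, words per document
alpha = np.full(K, 0.5)                     # Dirichlet prior on topic mixing weights
beta = rng.dirichlet(np.ones(V), size=K)    # K topic-word multinomials (stand-ins)

def generate_document():
    theta = rng.dirichlet(alpha)                          # topic mixture for this document
    z = rng.choice(K, size=doc_len, p=theta)              # a topic for each word position
    w = np.array([rng.choice(V, p=beta[k]) for k in z])   # a word from each chosen topic
    return theta, z, w

theta, z, w = generate_document()
print("topic mixture:", np.round(theta, 2))
print("first 10 words:", w[:10])
```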
Slide 24: Corpus-Level Parameters in LDA
- Alpha and beta are corpus-level parameters, sampled once in the process of generating a corpus (they sit outside the plates!)
- Alpha and beta must be estimated before we can find the topic mixing proportions of a previously unseen document
- (Figure: the LDA graphical model)
Slide 25: Getting Rid of Plates
- Thanks to Jonathan Huang for the un-plated LDA graphic
Slide 26: Inference in LDA
- Inference: estimation of the document-level parameters
- This is intractable to compute exactly, so we must employ approximate inference
Slide 27: Approximate Inference in LDA
- Variational methods: use Jensen's inequality to obtain a lower bound on the log-likelihood, indexed by a set of variational parameters
- The optimal (document-specific) variational parameters are obtained by minimizing the KL divergence between the variational distribution and the true posterior
- Variational methods are one way of doing this; Gibbs sampling (MCMC) is another (a Gibbs sampling sketch follows below)
- (Figure: the variational distribution)
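Since the slide mentions Gibbs sampling as an alternative, here is a minimal collapsed Gibbs sampling sketch for LDA (not the inference used in either paper); `docs` is a list of word-index arrays and the hyperparameters are illustrative.

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, eta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of 1-D arrays of word indices."""
    rng = np.random.default_rng(seed)
    z = [rng.integers(0, K, size=len(d)) for d in docs]    # random initial topic assignments
    ndk = np.zeros((len(docs), K))                          # document-topic counts
    nkw = np.zeros((K, V))                                  # topic-word counts
    nk = np.zeros(K)                                        # per-topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                                 # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional: p(z=k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + eta) / (n_k + V*eta).
                p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())            # resample the topic
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # Posterior mean estimates of topic-word and document-topic distributions.
    phi = (nkw + eta) / (nkw.sum(1, keepdims=True) + V * eta)
    theta = (ndk + alpha) / (ndk.sum(1, keepdims=True) + K * alpha)
    return phi, theta
```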
Slide 28: Look at some P(w|z) produced by LDA
- Show some pLSA and LDA results applied to text
- From an LDA project by Tomasz Malisiewicz and Jonathan Huang
- Search for the word "drive"
Slide 29: pLSA and LDA Applied to Images
- How can one apply these techniques to images?
Slide 30: Hierarchical Bayesian Text Models
- Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001
- Latent Dirichlet Allocation (LDA): Blei et al., 2001
Slide 31: Hierarchical Bayesian Text Models
- Probabilistic Latent Semantic Analysis (pLSA), as used by Sivic et al., ICCV 2005
Slide 32: Hierarchical Bayesian Text Models
- Latent Dirichlet Allocation (LDA), as used by Fei-Fei et al., CVPR 2005
Slide 33: A Bayesian Hierarchical Model for Learning Natural Scene Categories
Slide 34: Flow Chart: Quick Overview
Slide 35: How to Generate an Image?
- Choose a scene (mountain, beach, ...)
- Given the scene, generate an intermediate probability vector over themes
- For each word:
  - Determine the current theme from the mixture of themes
  - Draw a codeword from that theme
Slide 36:
- Choose a category label c ~ p(c|eta)
  - eta is the multinomial prior over scene categories
- Choose pi ~ p(pi|c, theta)
  - pi is a multinomial distribution over themes
  - theta is a C x K (categories x themes) matrix; theta_c is the K-dimensional Dirichlet parameter conditioned on the category c
- For each of the N patches:
  - Choose theme z_n ~ Mult(pi)
  - Choose patch x_n ~ p(x_n|z_n, beta)
  - beta is a matrix of size K x T (themes x words)
- (A sampling sketch of this process follows below)
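A minimal sampling sketch of the generative process just described (an approximation of the paper's model, with made-up parameter values): pick a category, draw the theme proportions pi from that category's Dirichlet, then draw a theme and a codeword for each patch.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, T, N = 4, 5, 174, 100    # categories, themes, codewords, patches per image
eta = np.full(C, 1.0 / C)      # multinomial prior over scene categories
theta = rng.random((C, K)) + 0.5           # per-category Dirichlet parameters (C x K)
beta = rng.dirichlet(np.ones(T), size=K)   # theme-codeword distributions (K x T)

def generate_image():
    c = rng.choice(C, p=eta)                # scene category label
    pi = rng.dirichlet(theta[c])            # theme mixing proportions for this image
    z = rng.choice(K, size=N, p=pi)         # a theme for each patch
    x = np.array([rng.choice(T, p=beta[k]) for k in z])  # a codeword for each patch
    return c, pi, z, x

c, pi, z, x = generate_image()
print("category:", c, "first 5 codewords:", x[:5])
```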
Slide 37: How to Generate an Image?
Slide 38: Inference
- How do we make a decision on a novel image?
- Integrate over the latent variables to get the likelihood p(x|c) of the image under each category
- Requires approximate variational inference (not easy, though Gibbs sampling is supposed to be easier)
Slide 39: Codebook
- A codebook of 174 local image patches (codewords)
- Detection:
  - Evenly sampled grid
  - Random sampling
  - Saliency detector
  - Lowe's DoG detector
- Representation:
  - Normalized 11x11 gray values
  - 128-dim SIFT
Slide 40: Results: Average Performance of 64%
- 100 training examples and 50 test examples
- Rank statistic test: the probability that a test scene correctly belongs to one of the top N most probable categories (a small sketch of this test follows below)
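A small sketch of the rank statistic test described above: for each test image, check whether the true category falls among the top-N most probable categories under the model. The posterior matrix here is a random stand-in.

```python
import numpy as np

def rank_statistic(posteriors, true_labels, top_n):
    """posteriors: (num_images, num_categories) class probabilities per test image.
    Returns the fraction of images whose true category is among the top-n most probable."""
    # Indices of the top-n categories for each image, highest probability first.
    top = np.argsort(posteriors, axis=1)[:, ::-1][:, :top_n]
    hits = [true_labels[i] in top[i] for i in range(len(true_labels))]
    return np.mean(hits)

rng = np.random.default_rng(0)
post = rng.random((50, 13)); post /= post.sum(1, keepdims=True)  # stand-in posteriors
labels = rng.integers(0, 13, size=50)                            # stand-in ground truth
print(rank_statistic(post, labels, top_n=3))
```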
Slide 41: Results: The Distributions
- Theme distribution
- Codeword distribution
Slide 42: The Peak at 174 (codewords)
Slide 43: Summary of Detection and Representation Choices
- SIFT outperforms raw pixel gray values
- The sliding grid, which creates the largest number of patches, does best
Slide 44: Discovering Objects and Their Location in Images
Slide 45: Visual Words
- Vector-quantized SIFT descriptors computed on regions
- Regions come from elliptical shape adaptation around interest points and from the maximally stable regions of Matas et al.
- Both are elliptical regions represented at twice their detected scale
Slide 46: Building a Vocabulary
Slide 47: Building a Vocabulary
- K-means clustering of ~300K regions to get about 1K clusters for each of the Shape Adapted and Maximally Stable region types (a k-means sketch follows below)
- Vector quantization
(Slide credit: Josef Sivic)
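A minimal vector-quantization sketch in the spirit of this slide: cluster SIFT-like descriptors with k-means, then assign each new descriptor to its nearest cluster center to get a visual word index. Random data stands in for the real descriptors, and the cluster count is tiny for illustration.

```python
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    """Plain k-means: X is (n_samples, dim); returns (k, dim) cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign every descriptor to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned descriptors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def quantize(descriptors, centers):
    """Map each descriptor to the index of its nearest center (its visual word)."""
    d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
train_descs = rng.random((1000, 128))   # stand-in for ~300K SIFT descriptors
centers = kmeans(train_descs, k=20)     # stand-in for ~1K visual words
words = quantize(rng.random((10, 128)), centers)
print(words)
```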
Slide 48: pLSA Training
- Sanity check: remember which quantities must be estimated? (The topic-word distributions P(w|z) and the per-document mixing weights P(z|d).)
Slide 49: Results 1: Topic Discovery
- This is just the training stage
- Obtain P(z_k|d_j) for each image, then classify the image as containing object k according to the maximum of P(z_k|d_j) over k (snippet below)
- 4 object categories, plus background
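The classification rule on this slide is just an argmax over the document-topic posterior; a minimal snippet, with a random stand-in for P(z|d):

```python
import numpy as np

rng = np.random.default_rng(0)
p_z_given_d = rng.random((6, 5))                       # stand-in P(z_k|d_j): 6 images, 5 topics
p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)

# Classify each image as containing the object of its most probable topic.
predicted_topic = p_z_given_d.argmax(axis=1)
print(predicted_topic)
```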
Slide 50: Results 1: Topic Discovery
Slide 51: Results 2: Classifying New Images
- Object categories are learned on a corpus, then those categories are found in new images
- Anybody remember how this is done? (Recall the index d in the graphical model.)
Slide 52: How does pLSA deal with previously unseen documents?
- The "folding-in" heuristic:
  - First train on the corpus to obtain P(w|z)
  - Now re-run the same training EM algorithm, but do not re-estimate P(w|z), and let D = {d_unseen}
Slide 53: Results 2: Classifying New Images
- Train on one set and test on another
Slide 54: Results 3: Segmentation
- Localization and segmentation of objects
- For a word occurrence in a particular document we can examine the probability of different topics
- Keep words with P(z_k|d_j, w_i) > 0.8 (snippet below)
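A minimal sketch of that thresholding step: compute P(z|d,w) for each visual word in an image from P(w|z) and P(z|d), and keep the word occurrences whose posterior for the topic of interest exceeds 0.8. The distributions here are random stand-ins.

```python
import numpy as np

def segment_words(word_indices, p_w_given_z, p_z_given_d, topic, thresh=0.8):
    """word_indices: visual-word index of each region in one image.
    Returns a boolean mask of regions assigned to `topic` with posterior > thresh."""
    # P(z|d,w) is proportional to P(w|z) P(z|d) for each word occurrence.
    joint = p_w_given_z[:, word_indices].T * p_z_given_d        # (num_regions, K)
    post = joint / joint.sum(axis=1, keepdims=True)
    return post[:, topic] > thresh

rng = np.random.default_rng(0)
K, V = 4, 100
p_w_given_z = rng.dirichlet(np.ones(V), size=K)   # stand-in topic-word distributions
p_z_given_d = rng.dirichlet(np.ones(K))           # stand-in topic weights for this image
regions = rng.integers(0, V, size=30)             # visual word of each region
print(segment_words(regions, p_w_given_z, p_z_given_d, topic=0))
```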
Slide 55: Results 3: Segmentation
- Note: the words shown are not the most probable words for a topic; they are words that have both a high probability of occurring in the topic AND a high probability of occurring in the image
Slide 56: Results 3: Segmentation and Doublets
- Two-class image dataset consisting of half of the faces (218 images) and backgrounds (217 images)
- A 4-topic pLSA model is learned for all training faces and training backgrounds, with 3 fixed background topics, i.e. one (face) topic is learned in addition to the three fixed background topics
- A doublet vocabulary is then formed from the top 100 visual words of the face topic; a second 4-topic pLSA model is then learned for the combined vocabulary of singlets and doublets, with the background topics fixed (an illustrative sketch follows below)
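The doublet construction is only summarized above; as a loose illustration (not the paper's exact procedure), one can pair each occurrence of a top face-topic word with its spatially nearest other top-word occurrence and treat the pair of word identities as a new vocabulary entry:

```python
import numpy as np

def form_doublets(words, positions, top_words):
    """words: visual-word index per region; positions: (n, 2) region centers;
    top_words: set of visual words from the topic of interest.
    Returns a list of doublet labels (sorted pairs of word indices)."""
    idx = [i for i, w in enumerate(words) if w in top_words]
    doublets = []
    for i in idx:
        others = [j for j in idx if j != i]
        if not others:
            break
        # Pair this occurrence with its nearest other top-word occurrence.
        dists = [np.linalg.norm(positions[i] - positions[j]) for j in others]
        j = others[int(np.argmin(dists))]
        doublets.append(tuple(sorted((words[i], words[j]))))
    return doublets

rng = np.random.default_rng(0)
words = rng.integers(0, 1000, size=40)                             # stand-in visual words
positions = rng.random((40, 2))                                    # stand-in region centers
top_words = set(rng.choice(1000, size=100, replace=False).tolist())
print(form_doublets(words, positions, top_words)[:5])
```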
Slide 57: Doublets
- Face segmentation scores: singletons 0.49, doublets 0.61
- (Efros: didn't work as much as you'd think)
Slide 58: Conclusions
- Showed how both papers use bag-of-words approaches
- We're now ready to become experts on generative models like pLSA and LDA
- Graphical model fun! (Carlos Guestrin teaches Graphical Models)
Slide 59: Are you really into Graphical Models?
- Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.
Slide 60: References
- A Bayesian Hierarchical Model for Learning Natural Scene Categories. L. Fei-Fei and P. Perona. CVPR 2005.
- Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS 2005.
- Discovering Objects and Their Location in Images. J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman. ICCV 2005.
- Latent Dirichlet Allocation. D. Blei, A. Ng, and M. Jordan. JMLR, 3:993-1022, 2003.
- Unsupervised Learning by Probabilistic Latent Semantic Analysis. T. Hofmann. Machine Learning, 2001.