A Bayesian Hierarchical Model for Learning Natural Scene Categories (L. Fei-Fei and P. Perona, CVPR 2005) and Discovering objects and their location in images (J. Sivic, B. Russell, A. Efros, A. Zisserman and B. Freeman, ICCV 2005)

1
A Bayesian Hierarchical Model for Learning Natural Scene Categories
L. Fei-Fei and P. Perona. CVPR 2005
Discovering objects and their location in images
J. Sivic, B. Russell, A. Efros, A. Zisserman and B. Freeman. ICCV 2005
Tomasz Malisiewicz (tomasz_at_cmu.edu)
Advanced Machine Perception, February 2006
2
Graphical Models: A Recent Trend in Machine Learning
Describing Visual Scenes using Transformed
Dirichlet Processes. E. Sudderth, A. Torralba,
W. Freeman, and A. Willsky. NIPS, Dec. 2005.
3
Outline
  • Goals of both vision papers
  • Techniques from statistical text modeling
    - pLSA vs. LDA
  • Scene Classification via LDA
  • Object Discovery via pLSA

4
Goal: Learn and Recognize Natural Scene Categories
Classify a scene without first extracting objects
Other techniques we know of:
  - Global frequency (Oliva and Torralba)
  - Texton histogram (Renninger, Malik et al.)
5
Goal: Discover Object Categories
  • Discover what objects are present in a collection
    of images in an unsupervised way
  • Find those same objects in novel images
  • Determine which local image features correspond to
    which objects (i.e., segment the image)

6
Enter the world of Statistical Text Modeling
  • D. Blei, A. Ng, and M. Jordan. Latent Dirichlet
    allocation. Journal of Machine Learning Research,
    3:993-1022, January 2003.
  • Bag-of-words approaches: the order of words in a
    document can be neglected
  • Graphical Model Fun

7
Bag-of-words
  • A document is a collection of M words
  • A corpus (collection of documents) is summarized
    in a term-document matrix

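A minimal sketch of building such a term-document count matrix from a toy corpus (plain NumPy; the two example documents are hypothetical):

    import numpy as np

    docs = ["the cat sat on the mat",          # toy corpus (hypothetical documents)
            "the dog chased the cat"]
    vocab = sorted({w for d in docs for w in d.split()})
    word_index = {w: j for j, w in enumerate(vocab)}

    # N[i, j] = number of times word j occurs in document i
    N = np.zeros((len(docs), len(vocab)), dtype=int)
    for i, d in enumerate(docs):
        for w in d.split():
            N[i, word_index[w]] += 1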
10
1990: Latent Semantic Analysis (LSA)
  • Goal: map high-dimensional count vectors to a
    lower-dimensional representation to reveal
    semantic relations between words
  • The lower-dimensional space is called the latent
    semantic space
  • Dim(latent space) = K

11
1990: Latent Semantic Analysis (LSA)
  • D = {d_1, ..., d_N}: N documents
  • W = {w_1, ..., w_M}: M words
  • N_ij = n(d_i, w_j): the N x M co-occurrence
    (term-document) matrix

12
What did we just do?
Singular Value Decomposition
13
LSA summary
  • SVD on the term-document matrix
  • Approximate N by setting all but the largest K
    singular values to zero (see the sketch below)
  • This produces the optimal rank-K approximation to N
    in the L2 (spectral) or Frobenius norm sense

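A minimal sketch of this truncated-SVD step, applied to the count matrix N from the earlier sketch (K is an assumed number of latent dimensions):

    K = 2                                          # assumed number of latent dimensions
    U, S, Vt = np.linalg.svd(N.astype(float), full_matrices=False)
    S_trunc = np.diag(np.where(np.arange(len(S)) < K, S, 0.0))  # zero all but the K largest singular values
    N_k = U @ S_trunc @ Vt                         # optimal rank-K approximation of N
    doc_coords = U[:, :K] * S[:K]                  # documents in the K-dim latent semantic space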
14
LSA and Polysemy
  • Polysemy: the ambiguity of an individual word or
    phrase that can be used (in different contexts)
    to express two or more different meanings
  • Under the LSA model, the coordinates of a word in
    latent space can be written as a linear
    superposition of the coordinates of the documents
    that contain the word

15
Problems with LSA
  • LSA does not define a properly normalized
    probability distribution
  • No obvious interpretation of the directions in
    the latent space
  • From a statistical viewpoint, the L2 norm used in
    LSA corresponds to a Gaussian noise assumption,
    which is hard to justify for count variables
  • Polysemy problem

16
pLSA to the rescue
  • Probabilistic Latent Semantic Analysis
  • pLSA relies on the likelihood function of
    multinomial sampling and aims at an explicit
    maximization of the predictive power of the model

17
pLSA to the rescue
Slide credit Josef Sivic
18
Learning the pLSA parameters
Observed counts of word i in document j
Unlike LSA, pLSA does not minimize any kind of
squared deviation; the parameters are estimated in
a probabilistically sound way: maximize the
likelihood of the data using EM, which is
equivalent to minimizing the KL divergence between
the empirical distribution and the model (the
mixture decomposition is restated below).
Slide credit: Josef Sivic
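For reference, since the slide's equation image is not transcribed, the standard pLSA mixture decomposition and the likelihood being maximized are (in LaTeX):

    P(w_i, d_j) = P(d_j) \sum_{k=1}^{K} P(w_i \mid z_k) \, P(z_k \mid d_j),
    \qquad
    L = \sum_{i,j} n(d_j, w_i) \log P(w_i, d_j)

where n(d_j, w_i) are the observed counts of word i in document j.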
19
EM for pLSA (training on a corpus)
  • E-step: compute posterior probabilities for the
    latent variables
  • M-step: maximize the expected complete-data
    log-likelihood (see the sketch below)

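A minimal, unoptimized sketch of these EM updates (plain NumPy; n_dw is an assumed document-by-word count matrix and K an assumed number of topics):

    import numpy as np

    def plsa_em(n_dw, K, n_iters=100, seed=0):
        """Fit pLSA by EM. n_dw: (D, W) document-word counts; K: number of topics."""
        rng = np.random.default_rng(seed)
        D, W = n_dw.shape
        p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)   # P(z | d)
        p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)   # P(w | z)
        for _ in range(n_iters):
            # E-step: P(z | d, w) is proportional to P(w | z) P(z | d)
            post = p_z_d[:, :, None] * p_w_z[None, :, :]      # shape (D, K, W)
            post /= post.sum(1, keepdims=True) + 1e-12
            # M-step: re-estimate parameters from expected counts n(d, w) P(z | d, w)
            exp_counts = n_dw[:, None, :] * post              # shape (D, K, W)
            p_w_z = exp_counts.sum(0) + 1e-12
            p_w_z /= p_w_z.sum(1, keepdims=True)
            p_z_d = exp_counts.sum(2) + 1e-12
            p_z_d /= p_z_d.sum(1, keepdims=True)
        return p_z_d, p_w_z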
20
Graphical View of pLSA
  • pLSA is a generative model
  • Select a document d_i with probability P(d_i)
  • Pick a latent class z_k with probability P(z_k | d_i)
  • Generate a word w_j with probability P(w_j | z_k)

Observed variables
Latent variables
Plates
21
How does pLSA deal with previously unseen
documents?
  • Folding-in Heuristic
  • First train on the corpus to obtain P(w|z)
  • Now re-run the same EM training algorithm, but don't
    re-estimate P(w|z), and let d = d_unseen (see the
    sketch below)

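A sketch of that folding-in heuristic, reusing p_w_z from the pLSA sketch above (n_new, the count vector of the unseen document, is a hypothetical input):

    def fold_in(n_new, p_w_z, n_iters=50, seed=0):
        """Estimate P(z | d_unseen) with P(w | z) held fixed. n_new: (W,) counts."""
        rng = np.random.default_rng(seed)
        K, W = p_w_z.shape
        p_z_d = rng.random(K); p_z_d /= p_z_d.sum()
        for _ in range(n_iters):
            post = p_z_d[:, None] * p_w_z                 # (K, W), proportional to P(z | d_new, w)
            post /= post.sum(0, keepdims=True) + 1e-12
            p_z_d = (post * n_new[None, :]).sum(1) + 1e-12    # expected counts per topic
            p_z_d /= p_z_d.sum()
        return p_z_d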
22
Problems with pLSA
  • Not a well-defined generative model of documents:
    d is a dummy index into the list of documents in
    the training set (it takes as many values as there
    are training documents)
  • No natural way to assign probability to a
    previously unseen document
  • Number of parameters to be estimated grows with
    size of training set

23
LDA to the rescue
  • Latent Dirichlet Allocation treats the topic
    mixture weights as a k-parameter hidden random
    variable and places a Dirichlet prior on the
    multinomial mixing weights
  • The Dirichlet distribution is conjugate to the
    multinomial distribution (the most natural prior
    to choose: the posterior distribution is also a
    Dirichlet! see below)

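The conjugacy claim written out (a standard fact, not transcribed from the slide): with a Dirichlet prior over the mixing weights theta and multinomial counts n_1, ..., n_K, the posterior is again Dirichlet. In LaTeX:

    \theta \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K), \qquad
    (n_1, \dots, n_K) \mid \theta \sim \mathrm{Mult}(\theta)
    \;\;\Longrightarrow\;\;
    \theta \mid n \sim \mathrm{Dir}(\alpha_1 + n_1, \dots, \alpha_K + n_K)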
24
Corpus-Level parameters in LDA
  • Alpha and beta are corpus-level parameters that
    are sampled once in the process of generating the
    corpus (they sit outside of the plates!)
  • Alpha and beta must be estimated before we can
    find the topic mixing proportions belonging to a
    previously unseen document

LDA
25
Getting rid of plates
Thanks to Jonathan Huang for the un-plated LDA
graphic
26
Inference in LDA
  • Inference: estimation of document-level parameters
  • Intractable to compute exactly, so we must employ
    approximate inference

27
Approximate Inference in LDA
  • Variational methods: use Jensen's inequality to
    obtain a lower bound on the log likelihood that
    is indexed by a set of variational parameters
  • Optimal Variational Parameters (document-specific)
    are obtained by minimizing the KL divergence
    between the variational distribution and the true
    posterior

Variational Methods are one way of doing
this. Gibbs sampling (MCMC) is another way.
Variational distribution
28
Look at some P(w|z) produced by LDA
  • Show some pLSI and LDA results applied to text
  • An LDA project by Tomasz Malisiewicz and Jonathan
    Huang
  • Search for the word drive

29
pLSA and LDA applied to Images
  • How can one apply these techniques to images?

30
Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Hofmann, 2001
Latent Dirichlet Allocation (LDA)
Blei et al., 2001
31
Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Sivic et al. ICCV 2005
32
Hierarchical Bayesian text models
Latent Dirichlet Allocation (LDA)
Fei-Fei et al. CVPR 2005
33
A Bayesian Hierarchical Model for Learning
Natural Scene Categories
34
Flow Chart: Quick Overview
35
How to Generate an Image?
Choose a scene (mountain, beach, ...)
Given the scene, generate an intermediate probability
vector over themes
For each word:
Determine the current theme from the mixture of themes
Draw a codeword from that theme
36
  • Choose a category label c ~ p(c | eta)
  • eta is the prior over scene categories (multinomial)
  • Choose pi ~ p(pi | c, theta)
  • pi is a multinomial distribution over themes
  • theta is a C x K matrix (categories x themes);
    theta_c is a K-dimensional Dirichlet parameter
    conditioned on the category c
  • For each of the N patches:
  • Choose a theme z_n ~ mult(pi)
  • Choose a patch x_n ~ p(x_n | z_n, beta)
  • beta is a matrix of size K x T (themes x words)

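A minimal sketch of this generative process (NumPy; the category prior eta, Dirichlet parameters theta, and theme-codeword matrix beta are assumed to be given, e.g. already learned):

    import numpy as np

    def generate_image(eta, theta, beta, n_patches, seed=0):
        """eta: (C,) category prior; theta: (C, K) Dirichlet params; beta: (K, T) theme-codeword probs."""
        rng = np.random.default_rng(seed)
        c = rng.choice(len(eta), p=eta)                  # choose a scene category c ~ p(c | eta)
        pi = rng.dirichlet(theta[c])                     # theme mixing proportions pi ~ Dir(theta_c)
        patches = []
        for _ in range(n_patches):
            z = rng.choice(len(pi), p=pi)                # theme z_n ~ mult(pi)
            x = rng.choice(beta.shape[1], p=beta[z])     # codeword x_n ~ mult(beta_z)
            patches.append(x)
        return c, patches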
37
How to Generate an Image?
38
Inference
  • How to make a decision on a novel image
  • Integrate over latent variables to get
  • Approximate Variational Inference (not easy, but
    Gibbs sampling is supposed to be easier)

39
Codebook
  • 174 local image patches (codewords)
  • Detection:
    - Evenly sampled grid
    - Random sampling
    - Saliency detector
    - Lowe's DoG detector
  • Representation:
    - Normalized 11x11 gray values
    - 128-dim SIFT

40
Results: Average performance 64%
  • Confusion Matrix

100 training examples and 50 test examples
Rank statistic test: the probability of a test
scene correctly belonging to one of the top N most
probable categories
41
Results: The Distributions
Theme distribution
Codeword distribution
42
The peak at 174
43
Summary of detection and representation choices
  • SIFT outperforms pixel gray values
  • Sliding grid, which creates the largest number of
    patches, does best

44
Discovering objects and their location in images
45
Visual Words
  • Vector Quantized SIFT descriptors computed in
    regions
  • Regions come from elliptical shape adaptation
    around interest point, and from the maximally
    stable regions of Matas et al.
  • Both are elliptical regions at twice their
    detected scale

46
Building a Vocabulary
47
Building a Vocabulary
K-means clustering of 300K regions to get about
1K clusters for each of Shape Adapted and
Maximally Stable regions
Vector quantization
Slide credit: Josef Sivic
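A minimal sketch of this vector-quantization step (scikit-learn KMeans as an assumed stand-in for the clustering actually used; the descriptor file is hypothetical):

    import numpy as np
    from sklearn.cluster import KMeans

    descriptors = np.load("sift_descriptors.npy")        # hypothetical (n_regions, 128) SIFT array

    kmeans = KMeans(n_clusters=1000, random_state=0).fit(descriptors)   # roughly 1K visual words
    visual_words = kmeans.predict(descriptors)            # each region mapped to its nearest center

    # an image (document) is then represented by its histogram of visual words
    def bow_histogram(word_ids, vocab_size=1000):
        return np.bincount(word_ids, minlength=vocab_size)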
48
pLSA Training
  • Sanity check: remember what quantities must be
    estimated?

49
Results 1: Topic Discovery
  • This is just the training stage
  • Obtain P(z_k | d_j) for each image, then classify
    the image as containing object k according to the
    max of P(z_k | d_j) over k

4 object categories plus background
50
Results 1: Topic Discovery
51
Results 2: Classifying New Images
  • Object categories are learned on a corpus, then
    those categories are found in new images

Anybody remember how this is done?
Remember the index d in the graphical model
52
How does pLSA deal with previously unseen
documents?
  • Folding-in Heuristic
  • First train on the corpus to obtain P(w|z)
  • Now re-run the same EM training algorithm, but don't
    re-estimate P(w|z), and let d = d_unseen

53
Results 2: Classifying New Images
  • Train on one set and test on another

54
Results 3: Segmentation
  • Localization and segmentation of objects
  • For a word occurrence in a particular document we
    can examine the probability of different topics
  • Find words with P(z_k | d_j, w_i) > 0.8 (see the
    sketch below)

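A minimal sketch of that thresholding step, reusing quantities from the pLSA sketches above (p_z_d and p_w_z are the fitted parameters; word_ids, the list of visual-word ids for one image, is a hypothetical input):

    def topic_word_assignments(doc_idx, word_ids, p_z_d, p_w_z, threshold=0.8):
        """Return (word_id, topic) pairs whose posterior P(z | d, w) exceeds the threshold."""
        assignments = []
        for w in word_ids:
            post = p_z_d[doc_idx] * p_w_z[:, w]        # proportional to P(z | d) P(w | z)
            post /= post.sum() + 1e-12                 # P(z | d, w)
            k = post.argmax()
            if post[k] > threshold:
                assignments.append((w, int(k)))
        return assignments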
55
Results 3: Segmentation
Note: the words shown are not the most probable
words for a topic; instead, they are words that
have a high probability of occurring in a topic
AND a high probability of occurring in the image
56
Results 3: Segmentation and Doublets
  • A two-class image dataset consisting of half the
    faces (218 images) and backgrounds (217 images)
  • A 4-topic pLSA model is learned for all training
    faces and training backgrounds with 3 fixed
    background topics, i.e. one (face) topic is
    learned in addition to the three fixed background
    topics
  • A doublet vocabulary is then formed from the top
    100 visual words of the face topic. A second 4
    topic pLSA model is then learned for the combined
    vocabulary of singlets and doublets with the
    background topics fixed.

57
Doublets
Face segmentation scores: singletons 0.49,
doublets 0.61
Efros didn't work as much as you'd think
58
Conclusions
  • Showed how both papers use bag-of-words
    approaches
  • We're now ready to become experts on generative
    models like pLSA and LDA
  • Graphical Model Fun! (Carlos Guestrin teaches
    Graphical Models)

59
Are you really into Graphical Models?
  • Describing Visual Scenes using Transformed
    Dirichlet Processes. E. Sudderth, A. Torralba, W.
    Freeman, and A. Willsky. NIPS, Dec. 2005.

60
References
  • A Bayesian Hierarchical Model for Learning
    Natural Scene Categories, L. Fei-Fei and P. Perona
  • Describing Visual Scenes using Transformed
    Dirichlet Processes, E. Sudderth et al.
  • Discovering objects and their location in images,
    J. Sivic et al.
  • Latent Dirichlet Allocation, D. Blei et al.
  • Unsupervised Learning by Probabilistic Latent
    Semantic Analysis, T. Hofmann