Bilinear Deep Learning for Image Classification - PowerPoint PPT Presentation

Loading...

PPT – Bilinear Deep Learning for Image Classification PowerPoint presentation | free to download - id: 58a53d-NmJiN



Loading


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation
Title:

Bilinear Deep Learning for Image Classification

Description:

ACM Multimedia Dec. 1, 2011 Bilinear Deep Learning for Image Classification Shenghua ZHONG, Yan LIU, Yang LIU – PowerPoint PPT presentation

Number of Views:239
Avg rating:3.0/5.0
Slides: 31
Provided by: hkp4
Category:

less

Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Bilinear Deep Learning for Image Classification


1
Bilinear Deep Learning for Image Classification
ACM Multimedia
Dec. 1, 2011
  • Shenghua ZHONG, Yan LIU, Yang LIU
  • Department of Computing
  • The Hong Kong Polytechnic University
  • www.comp.polyu.edu.hk/csshzhong

2
Outline
  • Problem
  • Image classification , a well-known challenge
  • Idea
  • Provide human-like judgment by referencing human
    visual system
  • Methodology
  • Deep models, consistent with human visual cortex
  • Proposed technique
  • Bilinear deep learning
  • Experiments
  • Caltech101, Urban and Natural Scene, CMU PIE
  • Conclusion
  • Explore deep techniques for more multimedia
    analysis tasks

3
Outline
  • Problem
  • Image classification , a well-known challenge
  • Idea
  • Provide human-like judgment by referencing human
    visual system
  • Methodology
  • Deep models, consistent with human visual cortex
  • Proposed technique
  • Bilinear deep learning
  • Experiments
  • Caltech101, Urban and Natural Scene, CMU PIE
  • Conclusion
  • Explore deep techniques for more multimedia
    analysis tasks

4
Image Classification
  • Image classification is a classical problem
  • Aim to understand the semantic meaning of visual
    information
  • And determine the category of the images
    according to some predefined criteria
  • Image classification remains a well-known
    challenge
  • After more than fifteen years extensive research
  • Humans do not have difficulty with classifying
    images
  • Before the age of 25 months, children can
    recognize novel three-dimensional objects
  • Aim of this paper
  • Provide human-like judgment by referencing the
    architecture of the human visual system and the
    procedure of intelligent perception
  • Researchers from cognitive science and
    neuroscience have conducted pioneering work on
    modeling the human brain using computational
    architectures
  • Deep architecture is a representative paradigm
    that has achieved notable success in modeling the
    human visual system

5
Outline
  • Problem
  • Image classification , a well-known challenge
  • Idea
  • Provide human-like judgment by referencing human
    visual system
  • Methodology
  • Deep models, consistent with human visual cortex
  • Proposed technique
  • Bilinear deep learning
  • Experiments
  • Caltech101, Urban and Natural Scene, CMU PIE
  • Conclusion
  • Explore deep techniques for more multimedia
    analysis tasks

6
Deep Learning
  • Physical structure
  • Human dozens of cortical layers are involved in
    generating even the simplest vision
  • Deep model the problem using multiple layers of
    parameterized nonlinear modules
  • Evolution of intelligence
  • Human the multi-layers structure began to appear
    in the neocortex starting from old world monkeys
    about 40 million years ago
  • Deep the development of intelligence follows
    with the multi-layer structure
  • Propagation of information
  • Human several reasons for believing that our
    visual systems contain multi-layer generative
    models
  • Deep layer-wise reconstruction to learn multiple
    levels of representation and abstraction that
    helps to make sense of data

7
Human Visual Cortex and Deep Architectures
  • (a) Human visual cortex (b) An
    example of deep architectures

8
Outline
  • Problem
  • Image classification , a well-known challenge
  • Idea
  • Provide human-like judgment by referencing human
    visual system
  • Methodology
  • Deep models, consistent with human visual cortex
  • Proposed technique
  • Bilinear deep learning
  • Experiments
  • Caltech101, Urban and Natural Scene, CMU PIE
  • Conclusion
  • Explore deep techniques for more multimedia
    analysis tasks

9
Bilinear Deep Learning
  • Deep architecture
  • Human visual information through the optic
    tract to a nerve position is transmitted as the
    second-order data
  • Proposed technique a novel Bilinear Deep Belief
    Network
  • Three-stage learning
  • Human two peaks of activation with initial
    guess and the post-recognition in the visual
    cortex areas
  • Proposed techniques a new Bilinear Discriminant
    Initialization
  • Semi-supervised framework
  • Human more practical for long-term daily
    learning of visual world
  • Proposed techniques a more flexible learning
    framework

10
Bilinear Deep Belief Network
  • Fully interconnected directed belief network are
    constructed by a set of second-order planes
  • The number of units in the input layer is equal
    to the resolution of the images
  • The number of units in the label layer is equal
    to the number of classes of images.
  • The number of units in each hidden layer is
    determined by bilinear discriminant
    initialization
  • The search of the mapping is transformed to the
    problem of finding the optimum parameter space
    for the deep architecture.

11
Three-stage Learning
  • Bilinear discriminant initialization
  • Human early peak related to the activation of
    initial guess
  • Deep determine the initial parameters and sizes
    of the upper layer
  • Greedy layer-wise reconstruction
  • Human information propagation between adjacent
    layers
  • Deep determine the parameters of each pair of
    layers
  • Global fine-tuning
  • Human late peak related to the activation of
    post-recognition
  • Deep refine the parameter space for better
    classification performance

12
Bilinear Discriminant Initialization
  • Latent representation with projection matrices U
    and V
  • Preserve discriminant information in the
    projected feature space by optimizing the
    objective function
  • Obtain the discriminant initial connections in
    layer pair and utilize the optimal dimension to
    define the structure of the next layer

between-class weights
within class weights
13
Greedy Layer-Wise Reconstruction
  • A joint configuration ( , ) of the input
    layer and the first hidden layer has
    energy
  • Utilize Contrastive Divergence algorithm to
    update the parameter space

14
Global Fine-Tuning
  • Backpropagation adjusts the entire deep network
    to find good local optimum parameters
  • Before backpropagation, a good region in the
    whole parameter space has been found
  • The convergence obtained from backpropagation
    learning is not slow
  • The result generally converge to a good local
    minimum on the error surface

15
Proposed Algorithm
Algorithm 1 Bilinear Deep Belief Network
Input Training data set X, Labeled samples in X Corresponding labels set Y, Number of layers N, Number of epochs E Number of labeled data L , Parameter Between-class weights , Within class weights Initial bias parameters b and c, Momentum and learning rate , , Output Optimal parameter space 1. Preserve discriminant information by optimizing the objective function while not convergent do Fix V, compute U by solving Fix U, compute V by solving end while 2. Compute initial weights of the connections 3. Determine the structure of the next layer 4. Calculate the state of the next layer 5 Update the weights and biases 6. Calculate optimal parameter space
Bilinear Discriminant Initialization
Greedy Layer-Wise Reconstruction
Global Fine-Tuning
16
Outline
  • Problem
  • Image classification , a well-known challenge
  • Idea
  • Provide human-like judgment by referencing human
    visual system
  • Methodology
  • Deep models, consistent with human visual cortex
  • Proposed technique
  • Bilinear deep learning
  • Experiments
  • Caltech101, Urban and Natural Scene, CMU PIE
  • Conclusion
  • Explore deep techniques for more multimedia
    analysis tasks

17
Experiment Setting
  • Database
  • Subset of Caltech101
  • Standard dataset for image classification with
    images of 100 different objects
  • Frequently used subset including 2,935 images
    from the first 5 categories
  • Urban and Natural Scene
  • 2,688 natural color images with 8 categories
  • CMU PIE dataset
  • 11560 face images varying pose, illumination and
    expression of 68 subjects
  • Compared algorithms
  • K-nearest neighbor (KNN)
  • Support vector machines (SVM)
  • Transductive SVM (TSVM) Collobert et al, JMLR,
    2006
  • Neural network (NN)
  • EmbedNN Weston et al, ICML, 2008
  • Semi-DBN Bengio et al, NIPS, 2006
  • DBN-rNCA Salakhutdinov et al, AISTATS, 2007
  • DDBN Liu et al, PR, 2011
  • DCNN Jarrett et al, ICCV, 2009

18
Experiments on Caltech101
  • Sample images from datasets
  • Two experiments
  • Classification accuracy comparison
  • Converging time comparison
  • Experiment setting
  • 50 images for each category to form the test set
  • The rest to form the training set

Faces_easy Faces Motorbikes Airplanes Back_google
19
Classification Accuracy Comparison on Caltech101
  • Deep techniques achieve much better performance
    than shallow techniques
  • Bilinear deep belief networks is the best with
    different numbers of labeled data

20
Converging Time Comparison on Caltech 101
  • Iterations in fine-tuning stage
  • BDBN converges much more quickly because of
    better initial guess

21
Experiments on Urban and Natural Scene
  • Sample images from datasets
  • Two experiments
  • Classification accuracy and real running time
    comparison
  • Limitation discussion
  • Experiment setting
  • 50 images for each category to form the test set
  • The rest to form the training set

Coast beach Highway Open country Tall building
Forest Street Mountain City Center
22
Performance Comparison on Urban and Natural Scene
  • Classical setting of neurons numbers in hidden
    layers
  • 500, 500, and 2000
  • BDBN setting by bilinear discriminant
    initalization
  • 2424, 2121, 1920
  • For compared models
  • _d means the same size of BDBN setting
  • _c means the same size of classical setting

23
Limitation of Image Classification Only Based on
Visual Similarity
  • Street Highway Misclassified image

Ground truth label Street
Misclassified label Highway
  • Only calculating visual similarity is limited
  • Human can give the correct judgment by
    referencing the buildings and cars along the
    street
  • Contextual cueing
  • The knowledge about spatial invariants learned
    from past experiences

24
Experiments on CMU PIE
  • Sample images from datasets
  • Two experiments
  • Classification accuracy comparison for noise data
  • Parameter space visualization
  • Experiment setting
  • 50 images for each category to form the test set
  • The rest to form the training set

25
Robustness to the Noise on CMU PIE
The reconstruction of BDBN in every layer
26
Visualization of Parameter Space on CMU PIE
Emphasize regions are identical to facial feature
regions
Facial feature points
27
Outline
  • Problem
  • Image classification , a well-known challenge
  • Idea
  • Provide human-like judgment by referencing human
    visual system
  • Methodology
  • Deep models, consistent with human visual cortex
  • Proposed technique
  • Bilinear deep learning
  • Experiments
  • Caltech101, Urban and Natural Scene, CMU PIE
  • Conclusion
  • Explore deep techniques for more multimedia
    analysis tasks

28
Attractive Characters of Proposed BDBN
  • Novel deep architecture
  • Simulate the multi-layer physical structure of
    the visual cortex
  • Enable the preservation of the natural tensor
    structure of the input image in the information
    propagation
  • Three-stage learning
  • Realize the procedure of object recognition by
    human beings
  • Robust to noise
  • Bilinear discriminant initalization
  • Prevent information propagation falling into a
    bad local optimum
  • Provide a more meaningful setting for deep
    architecture
  • Much lower training time
  • Semi-supervised learning ability
  • Reduce the training requirements

29
Conclusion and Future Work
  • Reference the human visual system and perception
    procedure is promising
  • Deep learning is a good model for visual data
    analysis
  • Bilinear deep belief network shows impressive
    performance in image classification
  • Future work
  • Provide more semantic understanding of the images
    by integrating contextual cueing in deep modeling
  • Utilize deep learning for multimedia content
    analysis in a large scale dataset with noisy tags
  • More information
  • www.comp.polyu.edu.hk/csshzhong
  • Paper, presentation slides, demo, source code,
    and datasets

30
Q A
Thanks!
  • Shenghua ZHONG, Yan LIU, Yang LIU
  • Department of Computing
  • The Hong Kong Polytechnic University
  • www.comp.polyu.edu.hk/csshzhong
About PowerShow.com