Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions

Taipeng Tian, Rui Li and Stan Sclaroff
Computer Science Dept.
Boston University

2
Introduction

Motivating application: gesture recognition with a fixed gesture lexicon. For example:

Aircraft signaler hand gestures
Basketball referee hand signals
Traffic controller hand signals
3
Problem Definition
Pose estimation
Input (observation): silhouette (Alt moments)
Output: 2D projected marker positions
4
Related Work: Pose Estimation from a Single Image

Geometry based
Taylor, CVIU 01
Barron and Kakadiaris, IVC 01
Parameswaran and Chellappa, CVPR 04
Learning based
Rosales and Sclaroff, HUMO 00
Agarwal and Triggs, CVPR 04
Others
Lee and Cohen, CVPR 04
Shakhnarovich, Viola and Darrell, ICCV 03
Mori, Ren, Efros and Malik, CVPR 04
Many more

5
Idea 1: Learning Mappings
Learn a direct mapping from image features to pose.

Specialized Mapping Architecture (SMA): Rosales and Sclaroff, NIPS 01
Relevance Vector Regression: Agarwal and Triggs, CVPR 04
7
Idea 2: Exploring the Solution Space

Simulated annealing: Deutscher et al., CVPR 00
Markov chain Monte Carlo (MCMC): Lee and Cohen, CVPR 04
etc.


Uses an accurate body model, typically with many degrees of freedom (DOF).
Explores the pose space for a solution consistent with the observations.
Difficult for models with many DOF.
Computationally intensive.

9
Key Observations

We have a constrained set of poses (the gestures in the lexicon).
It is not necessary to explore the full parameter space.
Combine the two ideas:
Learn mappings
Explore a constrained space (i.e., a learned model of body poses)

10
Overview of Framework (Learning Phase)
Training data Y; latent space X
Learn a model of human body poses
Learn the rendering function F(.)
11
Learning a Model of Human Poses

We use the Gaussian Process Latent Variable Model (GPLVM) of Neil Lawrence (NIPS 04).
GPLVM was originally used for visualizing high-dimensional data.
Grochow et al. (SIGGRAPH 04) use it to solve the inverse kinematics problem for human motion animation.
We use it for automated articulated body pose inference.

12
Gaussian Process Latent Variable Model (GPLVM)
Overview
Probabilistic mapping between a higher-dimensional data space and a lower-dimensional latent space
13
GPLVM Training: Learning a Model of Body Poses

Given: a training set of 2D projected marker positions y_i (each y_i is D-dimensional).
Goal: learn the parameters:

the latent variable value x_i corresponding to each training data point y_i
the parameters of the kernel
14
Kernel Function

Also known as the covariance function.
Measures the similarity of the latent variables x and x'.
For a data set of size N, we form an N by N kernel matrix K, in which $K_{i,j} = k(x_i, x_j)$.
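The slides do not spell out the kernel. As a minimal sketch, assume the RBF-plus-noise kernel from Lawrence's GPLVM, $k(x, x') = \alpha \exp(-\tfrac{\gamma}{2}\lVert x - x' \rVert^2) + \delta_{x,x'}\beta^{-1}$. The N by N kernel matrix can then be built as below. The names `rbf_kernel` and `kernel_matrix` and the default values of alpha, gamma and beta are illustrative assumptions.

```python
import math

# Assumed kernel form (RBF plus white noise), following Lawrence's GPLVM:
#   k(x, x') = alpha * exp(-gamma / 2 * ||x - x'||^2) + delta(x, x') / beta
def rbf_kernel(x, x2, alpha, gamma, beta, same_point=False):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x2))
    value = alpha * math.exp(-0.5 * gamma * sq_dist)
    if same_point:
        value += 1.0 / beta  # the noise term contributes only on the diagonal
    return value


def kernel_matrix(X, alpha=1.0, gamma=1.0, beta=100.0):
    """N by N kernel matrix with K[i][j] = k(X[i], X[j]); X is a list of latent points."""
    n = len(X)
    return [[rbf_kernel(X[i], X[j], alpha, gamma, beta, same_point=(i == j))
             for j in range(n)] for i in range(n)]


# Example: three hypothetical 2D latent points.
K = kernel_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(K[0][0], K[0][1])
```

Under this assumed form, alpha sets how correlated latent points are overall, gamma sets the spread (inverse width) of the function, and 1/beta is the noise variance in the prediction.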

15
GPLVM Training: Learning a Model of Body Poses

For a single dimension, the likelihood of y given the Gaussian Process (GP) model parameters is
The joint likelihood for D dimensions is

To learn the GPLVM from the training set y_i, we maximize the following posterior

Taking the negative log
and placing priors on the parameters
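The equations on this slide were not preserved. For reference, the standard GPLVM objective (Lawrence, NIPS 04) is sketched below under these assumptions:

- Y (N by D, one row per training pose) is mean-centered.
- The output dimensions are independent.
- The latent points have a unit Gaussian prior.
- The kernel parameters have a $1/(\alpha\beta\gamma)$ prior.

The exact priors used in this work may differ.

```latex
% Per output dimension d (column y_{:,d} of the mean-centered N x D matrix Y):
p(\mathbf{y}_{:,d} \mid X, \theta) = \mathcal{N}(\mathbf{y}_{:,d} \mid \mathbf{0}, K)

% Joint likelihood over the D (assumed independent) dimensions:
p(Y \mid X, \theta) = \prod_{d=1}^{D} \mathcal{N}(\mathbf{y}_{:,d} \mid \mathbf{0}, K)

% Posterior over latent points X and kernel parameters \theta = (\alpha, \beta, \gamma):
p(X, \theta \mid Y) \propto p(Y \mid X, \theta)\, p(X)\, p(\theta)

% Negative log posterior with the assumed priors
% p(X) = \prod_i \mathcal{N}(\mathbf{x}_i \mid \mathbf{0}, I) and
% p(\theta) \propto \alpha^{-1}\beta^{-1}\gamma^{-1}:
-\ln p(X, \theta \mid Y) = \frac{D}{2}\ln|K| + \frac{1}{2}\operatorname{tr}\!\left(K^{-1} Y Y^{\top}\right)
  + \frac{1}{2}\sum_{i=1}^{N}\lVert \mathbf{x}_i \rVert^2 + \ln\alpha + \ln\beta + \ln\gamma + \text{const}
```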
17

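To learn the GPLVM from the training set y_i, we maximize the following posterior

Taking the negative log
Training is computationally intensive because each evaluation of the objective involves inverting the N by N kernel matrix, which costs O(N^3). Therefore only a subset of the training poses is used to compute the kernel matrix. This subset is called the active set.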
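The expensive part is the $\frac{D}{2}\ln|K| + \frac{1}{2}\operatorname{tr}(K^{-1}YY^\top)$ term of the negative log-likelihood. The sketch below evaluates that term (priors and constants dropped) using only a given list of active indices, with a Cholesky factorization. Its cost therefore grows as O(M^3) in the active-set size M instead of O(N^3).

This is a sketch under several assumptions:

- The RBF-plus-noise kernel is assumed.
- Y is mean-centered.
- The function names are hypothetical.
- How the active set is chosen is not shown here. Lawrence's GPLVM uses the informative vector machine for that step.

```python
import math

def kernel(x, x2, alpha, gamma):
    # RBF part of the assumed GPLVM kernel: alpha * exp(-gamma/2 * ||x - x2||^2)
    return alpha * math.exp(-0.5 * gamma * sum((a - b) ** 2 for a, b in zip(x, x2)))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def neg_log_likelihood(X, Y, active, alpha=1.0, gamma=1.0, beta=100.0):
    """D/2 ln|K| + 1/2 tr(K^-1 Y Y^T), with K built only from the active points."""
    Xa = [X[i] for i in active]
    Ya = [Y[i] for i in active]
    m, D = len(Xa), len(Ya[0])
    K = [[kernel(Xa[i], Xa[j], alpha, gamma) + (1.0 / beta if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    L = cholesky(K)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(m))
    # tr(K^-1 Y Y^T) = sum_d y_d^T K^-1 y_d = sum_d ||L^-1 y_d||^2
    quad = 0.0
    for d in range(D):
        z = forward_sub(L, [row[d] for row in Ya])
        quad += sum(v * v for v in z)
    return 0.5 * D * log_det + 0.5 * quad
```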
18

For a new pair (x, y), we can predict using

This equation can be used to solve for x given y, or for y given x, via gradient descent.
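One common way to write this is a simplified form of the objective in Grochow et al. It combines two GP predictive quantities:

- the predictive mean $\mu(x) = Y^\top K^{-1} k(x)$;
- the predictive variance $\sigma^2(x) = k(x, x) - k(x)^\top K^{-1} k(x)$.

These are combined into $L(x, y) = \frac{\lVert y - \mu(x) \rVert^2}{2\sigma^2(x)} + \frac{D}{2}\ln\sigma^2(x) + \frac{1}{2}\lVert x \rVert^2$.

The sketch below assumes this form, mean-centered Y, and an RBF-plus-noise kernel; the function names are hypothetical. Solving for y given x amounts to taking μ(x). Solving for x given y means minimizing `L_xy` over x.

```python
import math

def kern(x, x2, alpha, gamma):
    # RBF part of the assumed kernel: alpha * exp(-gamma/2 * ||x - x2||^2)
    return alpha * math.exp(-0.5 * gamma * sum((a - b) ** 2 for a, b in zip(x, x2)))


def solve_spd(A, b):
    """Solve A v = b for symmetric positive-definite A via Cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T v = z
        v[i] = (z[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v


def predict(x, X, Y, alpha=1.0, gamma=1.0, beta=100.0):
    """GP predictive mean mu(x) and variance sigma^2(x) of y at latent point x."""
    n, D = len(X), len(Y[0])
    K = [[kern(X[i], X[j], alpha, gamma) + (1.0 / beta if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    kx = [kern(x, xi, alpha, gamma) for xi in X]
    w = solve_spd(K, kx)  # K^-1 k(x)
    mean = [sum(w[i] * Y[i][d] for i in range(n)) for d in range(D)]
    var = alpha + 1.0 / beta - sum(a * b for a, b in zip(kx, w))
    return mean, var


def L_xy(x, y, X, Y, **params):
    """Assumed pair cost: how well latent x and pose y match (lower is better)."""
    mean, var = predict(x, X, Y, **params)
    sq_err = sum((a - b) ** 2 for a, b in zip(y, mean))
    return sq_err / (2.0 * var) + 0.5 * len(y) * math.log(var) + 0.5 * sum(v * v for v in x)
```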

19
GPLVM
20
GPLVM
21
GPLVM
Silhouettes with the left hand raised tend to cluster together
22
GPLVM
Does not always do a good job
23
About GPLVM

Allows mapping to and from the lower-dimensional space.
Allows a smooth parameterization (i.e., one that allows derivatives) in the lower-dimensional space.
Two dimensions work well for our data set (Grochow et al. use 2-5).

24
Learning the Forward/Rendering Function
Input: 2D pose
Output: silhouettes (represented using Alt moments)
Similar to Rosales and Sclaroff
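The slides do not say which regressor represents the forward function F. As a hypothetical stand-in, the sketch below fits F with kernel ridge regression on Gaussian basis functions centered at the training poses. It then evaluates F and its Jacobian analytically. Having such derivatives is one reason to learn a smooth F, as the talk notes point out.

The function names, the bandwidth `gamma` and the ridge weight `lam` are illustrative assumptions, not the authors' settings.

```python
import math

def gauss(y, c, gamma):
    # Gaussian basis value exp(-gamma/2 * ||y - c||^2)
    return math.exp(-0.5 * gamma * sum((a - b) ** 2 for a, b in zip(y, c)))


def solve_spd(A, b):
    """Solve A v = b for symmetric positive-definite A via Cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (z[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v


def fit_forward(poses, features, gamma=1.0, lam=1e-6):
    """Weights W (N x F) so that F(y) = sum_i W[i] * gauss(y, poses[i])."""
    n, n_out = len(poses), len(features[0])
    A = [[gauss(poses[i], poses[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    cols = [solve_spd(A, [features[i][o] for i in range(n)]) for o in range(n_out)]
    return [[cols[o][i] for o in range(n_out)] for i in range(n)]


def forward(y, poses, W, gamma=1.0):
    """Predicted silhouette feature vector F(y) for pose vector y."""
    phi = [gauss(y, c, gamma) for c in poses]
    return [sum(phi[i] * W[i][o] for i in range(len(poses))) for o in range(len(W[0]))]


def forward_jacobian(y, poses, W, gamma=1.0):
    """J[o][k] = dF_o/dy_k, using d/dy exp(-gamma/2 ||y-c||^2) = -gamma (y - c) exp(...)."""
    n_out, dim = len(W[0]), len(y)
    J = [[0.0] * dim for _ in range(n_out)]
    for c, w in zip(poses, W):
        phi = gauss(y, c, gamma)
        for o in range(n_out):
            for k in range(dim):
                J[o][k] += w[o] * phi * (-gamma) * (y[k] - c[k])
    return J
```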
25
Overview of Framework (Learning Phase)
Training data Y; latent space X
Learn a model of human body poses
Learn the rendering function F(.)
26
Pose Inference
Typical regularization approach (also used by Agarwal and Triggs)
27
Data Term
Compares the forward (rendering) function applied to the 2D projected marker positions with the observed silhouette (Alt moments)
28
Regularization Term
Independent of the feature s
Replace it with a prior-knowledge term (i.e., the learned model of poses)
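The equations for these slides were not preserved. One way to write the two objectives, consistent with the talk notes (Tikhonov regularization of the form "phi(y) - s" plus a regularizer, with the usual regularizer replaced by L(x, y)), is sketched below. The weight λ and the matrix Γ are hypothetical symbols.

```latex
% F: learned forward (rendering) function, y: 2D projected marker positions,
% s: observed silhouette features (Alt moments), \lambda > 0: hypothetical weight.

% Typical Tikhonov-style regularization (regularizer independent of s;
% \Gamma is a fixed regularization matrix, written "Dx" in the talk notes):
\hat{y} = \arg\min_{y} \; \lVert F(y) - s \rVert^2 + \lambda \lVert \Gamma y \rVert^2

% Regularizer replaced by the learned pose model L(x, y) from the GPLVM:
(\hat{x}, \hat{y}) = \arg\min_{x,\,y} \; \lVert F(y) - s \rVert^2 + \lambda\, L(x, y)
```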
29
Pose Inference
(Note: also need to talk about initialization.)
The solution is obtained using conjugate gradient, initialized using the active set.
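Because such objectives generally have multiple local minima, the starting point matters. The sketch below illustrates the strategy on this slide under stated assumptions:

- The candidate list stands in for the active-set latent points.
- The energy is a hypothetical toy.
- Plain gradient descent with central-difference gradients replaces conjugate gradient, for brevity.

```python
def numerical_grad(f, z, h=1e-6):
    """Central-difference gradient of scalar function f at point z."""
    g = []
    for k in range(len(z)):
        zp, zm = list(z), list(z)
        zp[k] += h
        zm[k] -= h
        g.append((f(zp) - f(zm)) / (2.0 * h))
    return g


def initialize(energy, candidates):
    """Pick the candidate (e.g. an active-set latent point) with the lowest energy."""
    return list(min(candidates, key=energy))


def infer(energy, candidates, step=0.1, iters=500):
    z = initialize(energy, candidates)
    # Local refinement; plain gradient descent is a stand-in for conjugate gradient.
    for _ in range(iters):
        g = numerical_grad(energy, z)
        z = [zi - step * gi for zi, gi in zip(z, g)]
    return z


# Usage with a hypothetical quadratic energy whose minimum is at (1, -2).
energy = lambda z: (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2
print(infer(energy, [[0.0, 0.0], [5.0, 5.0], [1.0, -1.0]]))
```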
30
Data Collection
3D Pose

12 gestures in the flight director lexicon
Synthesized 6000 (silhouette, pose) pairs using Poser
3000 training (male model)
3000 testing (female model)

Silhouettes sampled uniformly over the frontal view-sphere
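Sampling view directions uniformly over a sphere is not the same as sampling two angles uniformly. The minimal sketch below assumes that "frontal" means the hemisphere of directions in front of the subject, taken here as +z. It relies on Archimedes' result: for points uniform on a sphere, the coordinate along any axis is uniformly distributed. The function name and seed are illustrative, and Poser's camera setup is not modeled.

```python
import math
import random


def sample_frontal_views(n, seed=0):
    """n unit viewing directions, uniform over the assumed frontal (+z) hemisphere."""
    rng = random.Random(seed)
    views = []
    for _ in range(n):
        z = rng.uniform(0.0, 1.0)            # uniform height gives uniform area
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        views.append((r * math.cos(phi), r * math.sin(phi), z))
    return views


print(sample_frontal_views(3))
```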
31
Experiments (Synthetic Data)
(a) Silhouette images generated by Poser 5 (test set)
(b) Estimation from SMA (Specialized Mapping Architecture)
(c) Our Approach
(d) Ground Truth
32
Comparison with SMA
33
Additional Constraints
Additional constraints, e.g., temporal consistency, can be added to achieve more accurate estimates
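The slides do not give the form of the temporal constraint. A minimal sketch is to add, at frame t, a penalty on how far the latent estimate moves from the previous frame's estimate. The weight μ, and the choice to penalize motion in latent space rather than pose space, are assumptions.

```latex
% Frame-t objective with a hypothetical temporal-consistency term;
% \hat{x}_{t-1} is the previous frame's latent estimate, \lambda, \mu > 0 are weights.
(\hat{x}_t, \hat{y}_t) = \arg\min_{x_t,\,y_t} \; \lVert F(y_t) - s_t \rVert^2
  + \lambda\, L(x_t, y_t) + \mu\, \lVert x_t - \hat{x}_{t-1} \rVert^2
```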
34
Experiments (Real Data)
(a) Silhouette images of real person
(b) SMA (Specialized Mapping Architecture)
(c) Our Approach (Without Temporal Consistency)
(d) Our Approach (With Temporal Consistency)
36
Conclusion

We proposed a novel method for pose estimation for a predefined gesture lexicon.
Interestingly, two latent dimensions are enough in our case.
The technique is fast (about 0.1 s per frame in Matlab).
Tracking as an extension (video).

37
Thank You
38
Comments after the talk

Related works
Bullets / summary of strengths vs. weaknesses
Why do we need this work?
Include the year of publication for the related work
(e.g., Rosales and Sclaroff work not mentioned, Sminchisescu work not mentioned)
Order the related work temporally?
Include an introduction slide and a motivating slide
How to motivate this work?
The state of the art is so-and-so. We found this common weakness, so we proposed this work.
Human pose not mentioned in the intro
At the end of the talk, say why to use this work over the others
Why GPLVM and not other dimensionality reduction techniques, like LLE/PCA/ISOMAP?
Give a top-level overview of the algorithm. A flow chart view?
Explain the L(x,y) mapping using an illustration, like the mapping between two planes. Clearly say what the high-dimensional y and the low-dimensional x are.
Give a reference for GPLVM or a website link.
Add a slide on the math of GPLVM.
The Tikhonov regularization approach of minimizing ||phi(y) - s|| + regularization term.
Usually the regularization term is Dx, but now we chose L(x,y). Explain why.
Slide to talk about the temporal constraint.
Why learn the rendering function? (Because we want to take the derivative.)
Give the numbers for the training set; this gives an idea of how good the quantitative results are.

39
Related Work

Model based
Simulated annealing: Deutscher et al., CVPR 00
Kinematic jump processes: Sminchisescu and Triggs, CVPR 03
Markov chain Monte Carlo (MCMC): Lee and Cohen, CVPR 04
etc.

Learning based
Specialized Mapping Architecture (SMA): Rosales and Sclaroff, NIPS 01
Relevance Vector Regression: Agarwal and Triggs, CVPR 04
Parameter-Sensitive Hashing: Shakhnarovich et al., ICCV 03
etc.

To learn the GPLVM from the training set y_i, we maximize the following posterior

Taking the negative log
41
Overview of Framework (Learning Phase)
Learning a model of human body poses (Using GPLVM)
Learn the Rendering Function F(.)
42
Overview of Framework (Estimation Phase)
Search over the learned model of human body poses for a solution consistent with the observation
43
Kernel Function

Measures the similarity of the latent variables x and x'.
For a data set of size N, we can form an N by N kernel matrix K, in which $K_{i,j} = k(x_i, x_j)$.

The kernel parameters control:
how correlated x and x' are in general
the spread of the function
the noise in the prediction
44
GPLVM Training: Learning a Model of Body Poses

To learn the parameters of the GPLVM from the training set y_i, we maximize the following posterior

and placing priors on the parameters
45
Gaussian Process Latent Variable Model(GPLVM)
Original space representation
Expresses how well the two values match
Space of feasible poses
Low-dimensional parameterization
46