Title: Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions
1Articulated Pose Estimation in a Learned Smooth
Space of Feasible Solutions
- Taipeng Tian, Rui Li and Stan Sclaroff
- Computer Science Dept.
- Boston University
2Introduction
- Motivating application
- Gesture Recognition
- Fixed Gesture Lexicon.
- For example
Aircraft Signaler hand gestures
Basketball Referee hand signals
Traffic Controller hand signals
3Problem Definition
Input (Observation)
Output
Pose Estimation
2D Projected Marker Positions
Silhouette (Alt Moments)
4Related Work: Pose Estimation from a Single Image
- Geometry Based
- Taylor CVIU 01
- Barron Kakadiaris IVC 01
- Parameswaran Chellappa CVPR 04
- Learning Based
- Rosales Sclaroff HUMO 00
- Agarwal Triggs CVPR 04
- Others
- Lee Cohen CVPR 04
- Shakhnarovich, Viola, Darrell ICCV 03
- Mori, Ren, Efros and Malik CVPR 04
- Many more
5Idea 1: Learning Mappings
Image Features
- Specialized Mapping Architecture (SMA): Rosales and Sclaroff NIPS 01
- Relevance Vector Regression: Agarwal and Triggs CVPR 04
Pose
8Idea 2: Exploring the Solution Space
- Simulated Annealing: Deutscher et al. CVPR 00
- Markov Chain Monte Carlo: Lee and Cohen CVPR 04
- etc.
- Uses an accurate body model, typically with high DOF.
- Explores the pose space for a solution consistent with the observations.
- Difficult for high DOF.
- Computationally intensive.
9Key Observations
- We have a constrained set of poses.
- Not necessary to explore the full parameter space.
- Combine the two ideas
- Learn mappings
- Explore a constrained space (i.e. a learned model of body poses)
Aircraft Signaler hand gestures
Basketball Referee hand signals
Traffic Controller hand signals
10Overview of Framework
X Latent Space
Learning Phase
Y Training Data
Learn a model of human body poses
Learn the rendering function F(.)
11Learning a Model of Human Poses
- The Gaussian Process Latent Variable Model (GPLVM), Neil Lawrence NIPS 04, is used.
- GPLVM was originally used for visualizing high dimensional data.
- Grochow et al. (SIGGRAPH 03) use it to solve the inverse kinematics problem for human motion animation.
- Currently we use it for automated articulated body pose inference.
12Gaussian Process Latent Variable Model (GPLVM) Overview
Higher Dimensional
Probabilistic Mapping
Lower Dimensional / Latent Space
13GPLVM Training: Learning a Model of Body Poses
- Given a training set of 2D projected marker positions {y_i} (each y_i is D-dimensional)
- Goal: learn the parameters
Corresponding latent variable values for each training data point
Variables related to the kernel
14Kernel Function
- Also known as the covariance function.
- Measures the similarity of the latent variables x and x'.
- For a data set of size N, we form an N by N kernel matrix K, in which K_ij = k(x_i, x_j).
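The slide does not say which kernel is used. As a minimal sketch, assuming the RBF kernel with a white-noise term that is common in the GPLVM literature, the kernel matrix could be built as follows. The function name `rbf_kernel` and the hyperparameter names `alpha`, `gamma` and `beta` are illustrative and do not come from the talk.

```python
import numpy as np

def rbf_kernel(X, alpha=1.0, gamma=1.0, beta=100.0):
    """Return the N x N kernel matrix with K[i, j] = k(x_i, x_j).

    Assumed kernel (not stated on the slide):
        k(x, x') = alpha * exp(-gamma / 2 * ||x - x'||^2) + delta(x, x') / beta
    """
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X**2, axis=1)
    # Pairwise squared distances; clip tiny negatives caused by round-off.
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    return alpha * np.exp(-0.5 * gamma * sq_dists) + np.eye(len(X)) / beta

# Three latent points: the first two are close together, the third is far away.
X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
K = rbf_kernel(X)
```

Nearby latent points get a large kernel value and distant ones a value near zero, which is the sense in which the kernel measures similarity.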
15GPLVM Training: Learning a Model of Body Poses
- For a single dimension, the likelihood of y given the Gaussian Process (GP) model parameters is
- Joint likelihood for D dimensions is
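The equations on this slide were images and did not survive extraction. For reference, and as an assumption about what the slide showed, the standard GPLVM formulation writes the two likelihoods as follows. Here $\mathbf{y}_{:,d}$ is the $N$-vector holding dimension $d$ of all training points, $Y$ is the $N \times D$ data matrix, $K$ is the $N \times N$ kernel matrix built from the latent points $X$, and $\theta$ denotes the kernel parameters.

```latex
p(\mathbf{y}_{:,d} \mid X, \theta)
  = \frac{1}{(2\pi)^{N/2} |K|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}\, \mathbf{y}_{:,d}^{\top} K^{-1} \mathbf{y}_{:,d}\right)

p(Y \mid X, \theta)
  = \prod_{d=1}^{D} p(\mathbf{y}_{:,d} \mid X, \theta)
  = \frac{1}{(2\pi)^{DN/2} |K|^{D/2}}
    \exp\!\left(-\tfrac{1}{2}\, \operatorname{tr}\!\left(K^{-1} Y Y^{\top}\right)\right)
```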
16- To learn the GPLVM from the training set {y_i}, we maximize the following posterior
Negative Log
And placing the priors
17- To learn the GPLVM from the training set {y_i}, we maximize the following posterior
Negative Log
Computationally intensive. A subset of the training poses is chosen to compute the kernel matrix. This subset is called the Active Set.
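A minimal sketch of the negative log objective evaluated on an active set, assuming the standard GPLVM likelihood and an RBF kernel with a noise term. The priors are omitted because their form is not given on the slide, and the active set is picked at random here purely as a placeholder for whatever selection rule the authors used. All names are illustrative.

```python
import numpy as np

def rbf_kernel(X, alpha, gamma, beta):
    # Assumed kernel: alpha * exp(-gamma/2 * ||x - x'||^2) + delta / beta
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return alpha * np.exp(-0.5 * gamma * d2) + np.eye(len(X)) / beta

def gplvm_neg_log_likelihood(X, Y, alpha=1.0, gamma=1.0, beta=100.0):
    """D/2 * ln|K| + 1/2 * tr(K^-1 Y Y^T); additive constants and priors omitted."""
    D = Y.shape[1]
    K = rbf_kernel(X, alpha, gamma, beta)
    chol = np.linalg.cholesky(K)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    K_inv_Y = np.linalg.solve(K, Y)
    # tr(K^-1 Y Y^T) equals the elementwise sum of Y * (K^-1 Y).
    return 0.5 * D * log_det + 0.5 * np.sum(Y * K_inv_Y)

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 6))   # stand-in for 2D projected marker positions
Y -= Y.mean(axis=0)
X = Y[:, :2].copy()                 # crude latent initialization (PCA is more typical)

# Placeholder active set: 30 random training poses instead of all 200.
active = rng.choice(len(Y), size=30, replace=False)
nll_active = gplvm_neg_log_likelihood(X[active], Y[active])
```

The cost is dominated by factorizing the kernel matrix, which is cubic in the number of points used, so working with 30 active points instead of 200 is what makes the optimization tractable.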
18- For a new pair (x,y) we can predict using
- This equation can be used to solve for x given y, or for y given x, via gradient descent.
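The prediction equation itself was an image and is missing from the extracted text. A minimal sketch, assuming the usual GP predictive mean and variance and the scoring function L(x, y) in the form used by Grochow et al., is given here. The RBF kernel, the hyperparameter values and all function names are assumptions, and the training poses Y are assumed to be mean-centered.

```python
import numpy as np

def predict(x, X, Y, alpha=1.0, gamma=1.0, beta=100.0):
    """GP predictive mean f(x) and variance sigma^2(x) at latent point x."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = alpha * np.exp(-0.5 * gamma * d2) + np.eye(len(X)) / beta
    k = alpha * np.exp(-0.5 * gamma * np.sum((X - x) ** 2, axis=1))
    K_inv_k = np.linalg.solve(K, k)
    mean = Y.T @ K_inv_k
    var = alpha + 1.0 / beta - k @ K_inv_k
    return mean, var

def L(x, y, X, Y, **hyper):
    """Assumed score: ||y - f(x)||^2 / (2 sigma^2(x)) + D/2 ln sigma^2(x) + ||x||^2 / 2."""
    mean, var = predict(x, X, Y, **hyper)
    D = Y.shape[1]
    return np.sum((y - mean) ** 2) / (2.0 * var) + 0.5 * D * np.log(var) + 0.5 * x @ x

# Toy model: 25 latent points on a grid and smooth 2-D "poses".
g = np.linspace(-2.0, 2.0, 5)
X = np.array([[a, b] for a in g for b in g])
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 1])])
Y -= Y.mean(axis=0)

mean0, var0 = predict(X[12], X, Y)                  # at a training point
_, var_far = predict(np.array([50.0, 50.0]), X, Y)  # far from all training data
```

For a fixed x, the y that minimizes L is simply the predictive mean f(x). Solving for x given y has no closed form, which is where gradient descent comes in. The variance grows away from the training data, so L penalizes latent points that lie far from known poses.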
19GPLVM
20GPLVM
21GPLVM
Left hand raised silhouettes tend to be clustered
together
22GPLVM
Does not always do a good job
23About GPLVM
- Allows mapping to and from the lower dimensional space.
- Allows smooth parameterization (i.e. allows derivatives) in the lower dimensional space.
- Two dimensions work well for our data set. (Grochow et al. use 2-5.)
24Learning the Forward/Rendering Function
Input 2D Pose
Similar to Rosales and Sclaroff
Silhouettes (Represented using Alt Moments)
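The talk does not define the silhouette features. A minimal sketch, assuming the common definition of Alt moments as moments of the foreground pixel coordinates after subtracting the mean and dividing by the standard deviation along each axis, is given here. The function name, the choice of moment orders and the toy silhouette are illustrative.

```python
import numpy as np

def alt_moments(mask, max_order=4):
    """Translation- and scale-normalized moments of a binary silhouette."""
    ys, xs = np.nonzero(mask)
    # Normalize coordinates: zero mean and unit standard deviation per axis.
    x = (xs - xs.mean()) / xs.std()
    y = (ys - ys.mean()) / ys.std()
    # Orders 0 and 1 and the pure second-order moments are constant after
    # normalization, so keep only m11 and the moments of order 3 and above.
    orders = [(1, 1)] + [(p, n - p) for n in range(3, max_order + 1) for p in range(n + 1)]
    return np.array([np.mean(x**p * y**q) for p, q in orders])

# Toy silhouette: a "torso" rectangle with one raised "arm".
mask = np.zeros((40, 40), dtype=bool)
mask[10:30, 15:25] = True
mask[12:15, 25:35] = True

features = alt_moments(mask)
shifted = alt_moments(np.roll(mask, shift=(5, -3), axis=(0, 1)))
```

Moving the silhouette within the image leaves the feature vector unchanged, which is the property that makes such moments a convenient target for a learned rendering function.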
25Overview of Framework
X Latent Space
Learning Phase
Y Training Data
Learn a model of human body poses
Learn the rendering function F(.)
26Pose Inference
Typical Regularization (Also used by Agarwal and
Triggs)
27Data Term
Forward function (Rendering function)
Silhouette (Alt Moments)
2D Projected Marker Positions
28Regularization Term
Independent of feature s
Replace with a prior knowledge term (i.e. the learned model of poses)
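The equations on these slides were images and are missing from the extracted text. Under assumed notation, the change can be written as follows: $s$ is the observed silhouette feature vector, $\phi$ is the learned rendering function (written F(.) on the overview slide), $R$ is a generic regularizer that does not depend on $s$, $L(x, y)$ is the GPLVM term linking a latent point $x$ to a pose $y$, and $\lambda$ is an assumed trade-off weight.

```latex
\begin{align}
\text{typical:}\quad  & \hat{y} = \arg\min_{y}\; \|\phi(y) - s\|^{2} + \lambda\, R(y) \\
\text{proposed:}\quad & (\hat{x}, \hat{y}) = \arg\min_{x,\,y}\; \|\phi(y) - s\|^{2} + \lambda\, L(x, y)
\end{align}
```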
29Pose Inference
Also need to talk about initialization
- Solution obtained using Conjugate Gradient
- Initialization using the Active Set
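How the active set is used for initialization is not spelled out on the slide. One plausible reading, sketched here as an assumption, is to render every active-set pose through the forward function and start the optimizer from the pose whose features best match the observation. The linear `forward` function and the random data are hypothetical stand-ins, not the learned rendering function.

```python
import numpy as np

def init_from_active_set(s, X_active, Y_active, forward):
    """Return the active-set pair (x_i, y_i) whose rendered features are closest to s."""
    errors = np.array([np.sum((forward(y) - s) ** 2) for y in Y_active])
    best = int(np.argmin(errors))
    return X_active[best], Y_active[best], best

# Hypothetical stand-ins: 5 active poses and a linear "rendering" function.
rng = np.random.default_rng(0)
X_active = rng.standard_normal((5, 2))   # latent points
Y_active = rng.standard_normal((5, 8))   # poses
A = rng.standard_normal((10, 8))

def forward(y):
    return A @ y

# Observation generated from active pose 3 plus a little noise.
s = forward(Y_active[3]) + 0.01 * rng.standard_normal(10)
x0, y0, idx = init_from_active_set(s, X_active, Y_active, forward)
```

The returned `x0` and `y0` would then be handed to a conjugate gradient optimizer, for example `scipy.optimize.minimize` with `method="CG"`, to refine the estimate.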
30Data Collection
3D Pose
- 12 gestures in the flight director lexicon
- Synthesize 6000 (Silhouette, Pose) pairs using Poser
- 3000 training (Male model)
- 3000 testing (Female model)
Synthesized silhouettes sampled uniformly over the frontal view-sphere
31Experiments (Synthetic Data)
(a) Silhouette images generated by Poser 5 (Test
Set)
(b) Estimation from SMA (Specialized Mapping
Architecture)
(c) Our Approach
(d) Ground Truth
32Comparison with SMA
33Additional Constraints
Additional constraints can be added to achieve a more accurate estimate, e.g. temporal consistency
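The slide does not give the form of the temporal constraint. Purely as an illustrative assumption, one simple choice is to penalize jumps in latent space between consecutive frames. Here $s_t$ is the silhouette feature at frame $t$, $\phi$ is the rendering function, $L$ is the GPLVM term, and $\lambda$ and $\mu$ are assumed weights.

```latex
E_t(x_t, y_t) = \|\phi(y_t) - s_t\|^{2} + \lambda\, L(x_t, y_t) + \mu\, \|x_t - x_{t-1}\|^{2}
```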
34Experiments (Real Data)
(a) Silhouette images of real person
(b) SMA (Specialized Mapping Architecture)
(c) Our Approach (Without Temporal Consistency)
(d) Our Approach (With Temporal Consistency)
35Experiments (Real Data)
(a) Silhouette images of real person
(b) SMA (Specialized Mapping Architecture)
(c) Our Approach (Without Temporal Consistency)
(d) Our Approach (With Temporal Consistency)
36Conclusion
- Proposed a novel method for pose estimation for a pre-defined gesture lexicon.
- Interesting to note that two dimensions are enough in our case.
- Technique is fast (about 0.1 sec per frame in Matlab).
- Tracking as an extension.
video
37Thank You
38Comments after the talk
- Related Works
- Bullets / Summary of Strength vs Weakness
- Why we need this work?
- Include year of publication for the related work (e.g. Rosales Sclaroff work not mentioned, Sminchisescu work not mentioned)
- Order the related work temporally?
- Include an introduction slide and motivating slide
- How to motivate this work?
- State of the art is so and so. We found this common weakness. So we proposed this work.
- Human pose not mentioned in Intro
- At the end of the talk say why use this work over the others
- Why GPLVM and not other reduction techniques, like LLE/PCA/ISOMAP etc.?
- Give a top overview of the algorithm. A flow chart view?
- Explain the L(x,y) mapping using an illustration like the mapping between two planes. Clearly say what is high dimension y and what is low dimension x
- Give reference for GPLVM or website link.
- Add a slide on Math of GPLVM
- The Tikhonov regularization approach of minimizing ||phi(y) - s|| + regularization term. Usually the regularization term is ||Dx|| but now we chose L(x,y). Explain why
- Slide to talk about temporal constraint.
- Why learn the rendering function? i.e. because we want to take the derivative
- Give the numbers for the training set and this gives an idea how good are the quantitative results
39Related Work
- Model Based
- Simulated Annealing: Deutscher et al. CVPR 00
- Kinematic Jump Processes: Sminchisescu and Triggs CVPR 03
- Markov Chain Monte Carlo: Lee and Cohen CVPR 04
- etc.
- Learning Based
- Specialized Mapping Architecture (SMA): Rosales and Sclaroff NIPS 01
- Relevance Vector Regression: Agarwal and Triggs CVPR 04
- Parameter Sensitive Hashing: Shakhnarovich et al. ICCV 03
- etc.
41Overview of Framework (Learning Phase)
Learning a model of human body poses (Using GPLVM)
Learn the Rendering Function F(.)
42Overview of Framework (Estimation Phase)
Search over learned model of human body pose for
solution consistent with observation
43Kernel Function
- Measures the similarity of the latent variables x and x'.
- For a data set of size N, we can form an N by N kernel matrix K, in which K_ij = k(x_i, x_j).
How correlated x and x' are in general
Spread of the function
Noise in the prediction
44GPLVM Training: Learning a Model of Body Poses
- To learn the parameters of the GPLVM from the training set {y_i}, we maximize the following posterior
And placing the priors
45Gaussian Process Latent Variable Model (GPLVM)
Original space representation
Expresses how well the two values match
Space of Feasible Poses
Low dimensional parameterization