Title: Lectureship A proposal for advancing computer graphics, imaging and multimedia design at RGU
1 Lectureship: A proposal for advancing computer graphics, imaging and multimedia design at RGU
Fabio Cuzzolin INRIA Rhone-Alpes
- Robert Gordon University
- Aberdeen, 20/6/2008
2 Career path
- Masters thesis on gesture recognition at the University of Padova
- Visiting student, ESSRL, Washington University in St. Louis, and at the University of California at Los Angeles (2000)
- Ph.D. thesis on belief functions and uncertainty theory (2001)
- Researcher at Politecnico di Milano with the Image and Sound Processing group (2003-2004)
- Post-doc at the University of California at Los Angeles, UCLA Vision Lab (2004-2006)
- Marie Curie fellow at INRIA Rhone-Alpes
3 Scientific production and collaborations
- collaborations with journals: IEEE PAMI, IEEE SMC-B, CVIU, Information Fusion, Int. J. Approximate Reasoning
- PC member for VISAPP, FLAIRS, IMMERSCOM, ISAIM
- currently 410 journal papers and 318 conference
papers
4 My background
research
5 A multi-layer framework for human motion analysis
- different tasks, integrated in a series of layers
- feedback acts between different layers
multiple views
6 A multi-layer framework for human motion analysis
- Action and gesture recognition
- Laplacian unsupervised segmentation
- Matching of 3D shapes by embedded orthogonal alignment
- Bilinear models for invariant gait-ID
- Manifold learning for dynamical models
- The role of uncertainty measures
- Information fusion for model-free pose estimation
7 HMMs for gesture recognition
- transition matrix A -> gesture dynamics
- state-output matrix C -> collection of hand poses
- Hand poses were represented by size functions (BMVC'97)
8 Gesture classification
- EM to learn HMM parameters from an input sequence
- the new sequence is fed to the learnt gesture models
- they produce a likelihood
- the most likely model is chosen (if above a threshold)
- OR the new model is attributed the label of the closest one (using K-L divergence or other distances)
[Figure: bank of learnt gesture models HMM 1, HMM 2, ..., HMM n]
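The likelihood-based choice described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the slides: observations are assumed to be discrete symbols (so an emission matrix `B` stands in for the state-output matrix C), the model parameters are assumed to be already learnt, and the names `log_likelihood`, `classify`, "wave" and "hold" are hypothetical.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log p(obs | model) for a discrete-output HMM.

    pi: initial state probabilities, A: row-stochastic transition matrix,
    B[i, k]: probability of emitting symbol k in state i (assumed discrete).
    """
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf          # sequence impossible under this model
        log_p += np.log(scale)
        alpha = alpha / scale       # rescale to avoid numerical underflow
    return log_p

def classify(obs, models, threshold):
    """Return (label of the most likely model or None if below threshold, all scores)."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Toy models (hypothetical): "wave" alternates between two poses, "hold" stays put.
pi = np.array([0.5, 0.5])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
models = {
    "wave": (pi, np.array([[0.1, 0.9], [0.9, 0.1]]), B),
    "hold": (pi, np.array([[0.9, 0.1], [0.1, 0.9]]), B),
}
sequence = [0, 1, 0, 1, 0, 1]
label, scores = classify(sequence, models, threshold=-10.0)
print(label, scores)
```

The threshold is expressed on the log-likelihood of the whole sequence; in practice it would probably need to be normalized by sequence length.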
9 Volumetric action recognition
- 2D approaches: features are extracted from single views -> viewpoint dependence
- volumetric approach: features are extracted from a volumetric reconstruction of the moving body (ICIP'04)
11 Unsupervised coherent 3D segmentation
- to recognize actions we need to extract features
- segmenting moving articulated 3D bodies into parts
- along sequences, in a consistent way
- in an unsupervised fashion
- robustly, with respect to changes of the topology of the moving body
- as a building block of a wider motion analysis and capture framework
- ICCV-HM'07, CVPR'08, to submit to IJCV
12 Clustering after Laplacian embedding
- local neighborhoods -> stable under articulated motion
- generates a lower-dim, widely separated embedded cloud
- less sensitive to topology changes than other methods
- less computationally expensive than ISOMAP
13 Algorithm
- K-wise clustering in the embedding space
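A minimal sketch of clustering in the embedding space is given below. It is an illustration under assumptions rather than the method of the cited papers: the embedding uses the unnormalized graph Laplacian of a symmetrized k-nearest-neighbour graph, the clustering is plain k-means started from given seeds, and `laplacian_embedding`, `kmeans` and the straight-chain test cloud are hypothetical.

```python
import numpy as np

def laplacian_embedding(points, n_neighbors=6, dim=2):
    """Embed a point cloud using eigenvectors of its k-NN graph Laplacian."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]   # skip the point itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                                    # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W                            # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)                               # ascending eigenvalues
    return vecs[:, 1:dim + 1]                                 # drop the constant eigenvector

def kmeans(X, seeds, n_iter=50):
    """Plain Lloyd iterations started from the given seeds."""
    centers = np.array(seeds, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

# Toy cloud (hypothetical): 40 points along a straight "limb".
points = np.stack([np.arange(40.0), np.zeros(40), np.zeros(40)], axis=1)
embedded = laplacian_embedding(points, n_neighbors=2, dim=1)
labels, centers = kmeans(embedded, seeds=embedded[[0, -1]])
print(labels)
```

Dropping only the first eigenvector assumes the neighbourhood graph is connected; a disconnected cloud would need its components handled separately.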
14 Seed propagation along time
- To ensure time consistency, cluster seeds have to be propagated along time
- Old positions of clusters in 3D are added to the new cloud and embedded
- Result: new seeds
15 Results
- Coherent clustering along a sequence
- Handling of topology changes
17 Laplacian matching of dense meshes or voxelsets
- as embeddings are pose-invariant (for articulated bodies)
- they can then be used to match dense shapes by simply aligning their images after embedding
- ICCV '07 NTRL, ICCV '07 3dRR, CVPR '08, submitted to ECCV'08, to submit to PAMI
18 Eigenfunction Histogram assignment
- Algorithm:
- compute the Laplacian embedding of the two shapes
- find an assignment between the eigenfunctions of the two shapes
- this selects a section of the embedding space
- embeddings are orthogonally aligned there by EM
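The orthogonal alignment step can be illustrated with a simpler, swapped-in technique: the closed-form orthogonal Procrustes solution, which assumes point correspondences between the two embeddings are already known. The slides instead estimate the alignment by EM, presumably because correspondences are not known in advance, so the sketch below shows only the orthogonal-fit part. The name `orthogonal_alignment` and the synthetic data are hypothetical.

```python
import numpy as np

def orthogonal_alignment(X, Y):
    """Orthogonal matrix R minimizing ||X @ R - Y||_F (rows of X and Y correspond)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # embedded points of the first shape
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # hidden orthogonal transform
Y = X @ Q                                        # embedded points of the second shape
R = orthogonal_alignment(X, Y)
print(np.abs(R - Q).max())
```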
19 Results
- Applications: graph matching, protein analysis, motion capture
- To propagate body-part segmentation in time
- Motion field estimation, action segmentation
20 Application: spatio-temporal action segmentation
- problem: segmenting parts of the video(s) containing interesting motions
- global approach: working on the entire sequence (multidimensional volume)
- previous works: object segmentation on the spatio-temporal volume for single frames
- idea: in a multi-camera setup, working on 3D clouds (hulls) + motion fields + time = 7D volume
- outline of an approach: smoothing using message passing, shape detection on the obtained manifold
22 Bilinear models for gait-ID
- To recognize the identity of humans from their gait (CVPR '06, book chapter in progress)
- nuisance factors: emotional state, illumination, appearance, view invariance ... (literature: randomized trees)
- each motion possesses several labels: action, identity, viewpoint, emotional state, etc.
- bilinear models (Tenenbaum) can be used to separate the influence of style and content (the label to classify)
23 Content classification of unknown style
- given a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint)
- an asymmetric bilinear model can be learned from it through SVD
- when new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style)
- an iterative EM procedure can be set up to classify the content
- E step -> estimation of p(c|s), the prob. of the content given the current estimate s of the style
- M step -> estimation of the linear map for the unknown style s
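The procedure on this slide can be sketched as follows, assuming the standard asymmetric bilinear form y = A^s b^c, in which each style s has a linear map A^s and each content c a vector b^c. This is an illustration, not the implementation evaluated in the talk: the function names, the Gaussian noise model in the E step and the synthetic data are assumptions, and only a single E step and M step are shown rather than the full iteration from an initial guess.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, dim):
    """Factor stacked observations Y via SVD.

    Y has shape (n_styles * K, n_contents): column c stacks the K-dim
    observations of content c under every style, style by style.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U[:, :dim] * s[:dim]).reshape(n_styles, -1, dim)   # one K x dim map per style
    B = Vt[:dim]                                            # column c is the content vector b^c
    return A, B

def e_step(obs, A_s, B, sigma=1.0):
    """Posterior p(c | y, s) of each content for each observation, given style map A_s."""
    pred = (A_s @ B).T                                      # n_contents x K predictions
    sq = ((obs[:, None, :] - pred[None, :, :]) ** 2).sum(-1)
    logp = -sq / (2.0 * sigma ** 2)
    logp -= logp.max(axis=1, keepdims=True)                 # stabilize the softmax
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def m_step(obs, P, B):
    """Weighted least-squares estimate of the style map given posteriors P."""
    num = obs.T @ P @ B.T                                   # K x dim
    den = (B * P.sum(axis=0)) @ B.T                         # dim x dim
    return np.linalg.solve(den.T, num.T).T

# Synthetic check (hypothetical data): 3 training viewpoints, 5 persons.
rng = np.random.default_rng(0)
n_styles, K, dim, n_contents = 3, 6, 2, 5
A_true = rng.normal(size=(n_styles, K, dim))
B_true = rng.normal(size=(dim, n_contents))
Y = np.concatenate([A_true[s] @ B_true for s in range(n_styles)], axis=0)
A, B = fit_asymmetric_bilinear(Y, n_styles, dim)

A_new = rng.normal(size=(K, dim))        # unseen viewpoint (unknown style)
obs = (A_new @ B).T                      # one observation per known person
P = e_step(obs, A_new, B)                # E step, here run with the true map
labels = P.argmax(axis=1)
A_est = m_step(obs, np.eye(n_contents)[labels], B)   # M step with hard assignments
print(labels)
```

The actual procedure would alternate the two steps with soft posteriors, starting from a guess for the style map (for instance an average of the training styles); convergence from such a guess is not demonstrated here.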
24 Three-layer model
- Features: projections of the silhouette's contours onto a line through the center
- Three-layer model:
- each sequence is encoded as an HMM
- its C matrix is stacked in a single observation vector
- a bilinear model is learnt from those vectors
25 Results on CMU database
- Mobo database: 25 people performing 4 different walking actions, from 6 cameras. Three labels: action, id, view
- Compared performances with baseline algorithm and straight k-NN on sequence HMMs
27 Learning manifolds of dynamical models
- Classify movements represented as dynamical models
- for instance, each image sequence can be mapped to an ARMA, or AR linear model
- Motion classification then reduces to finding a suitable distance function in the space of dynamical models
- when some a-priori info is available (training set) ..
- .. we can learn in a supervised fashion the best metric for the classification problem!
- To submit to ECCV'08 MLVMA Workshop
28 Learning pullback metrics
- many unsupervised algorithms take an input dataset and map it to an embedded space, but fail to learn a full metric
- consider then a family of diffeomorphisms F between the original space M and a metric space N
- the diffeomorphism F induces on M a pullback metric
- maximizing inverse volume finds the manifold which better interpolates the data (geodesics pass through crowded regions)
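For reference, the standard definition of the pullback metric mentioned on this slide is written out below. The symbols g, G, J_F and d are notation introduced here for illustration and do not appear in the slides; N is taken to be a Riemannian manifold with metric g.

```latex
% Pullback of a metric g on N through a diffeomorphism F : M -> N.
% J_F(m) is the Jacobian (differential) of F at m; u, v are tangent vectors at m.
\[
  (F^{*}g)_m(u, v) \;=\; g_{F(m)}\bigl(J_F(m)\,u,\; J_F(m)\,v\bigr),
  \qquad u, v \in T_m M .
\]
% In local coordinates, with G(F(m)) the matrix of g at F(m):
\[
  G^{*}(m) \;=\; J_F(m)^{\top}\, G\bigl(F(m)\bigr)\, J_F(m),
  \qquad
  d_{F^{*}g}(m_1, m_2) \;=\; d_g\bigl(F(m_1), F(m_2)\bigr).
\]
```

The second identity says that F becomes an isometry, so distances on M under the pullback metric can be computed as distances between images in N.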
29 Space of AR(2) models
- given an input sequence, we can identify the parameters of the linear model which better describes it
- autoregressive models of order 2: AR(2)
- Fisher metric on AR(2)
- Compute the geodesics of the pullback metric on M
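The identification step in the first bullet can be sketched with ordinary least squares. This covers only the parameter estimation, not the Fisher metric or the geodesic computation, and the estimator actually used in the talk is not stated; `fit_ar2`, `simulate_ar2` and the coefficients are hypothetical.

```python
import numpy as np

def fit_ar2(y):
    """Least-squares estimate of (a1, a2) in y[t] = a1*y[t-1] + a2*y[t-2] + noise."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[1:-1], y[:-2]])       # regressors y[t-1], y[t-2]
    coeffs, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return coeffs

def simulate_ar2(a1, a2, n, noise_std=0.0, seed=0):
    """Generate an AR(2) sequence from fixed initial values (toy data)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    y[0], y[1] = 1.0, 0.5
    for t in range(2, n):
        y[t] = a1 * y[t - 1] + a2 * y[t - 2] + noise_std * rng.normal()
    return y

a1_hat, a2_hat = fit_ar2(simulate_ar2(0.5, -0.3, n=30))
print(a1_hat, a2_hat)
```

Each sequence is then summarized by the pair (a1, a2), a point in the space of AR(2) models on which a metric can be defined.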
30 Results on action and ID recognition
- scalar feature, AR(2) and ARMA models
32 Uncertainty measures: intervals, credal sets
- a number of formalisms have been proposed to extend or replace classical probability
- assumption: not enough evidence to determine the actual probability describing the problem
- second-order distributions (Dirichlet), interval probabilities
- credal sets
33 Belief functions as random sets
- Probability on a finite set: function $p: 2^\Theta \to [0,1]$ with
- $p(A) = \sum_{x \in A} m(x)$, where $m: \Theta \to [0,1]$ is a mass function
- Probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$
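In the random-set view, the mass function is defined on subsets of the frame rather than on single elements, and the belief of A sums the masses of all subsets contained in A. A minimal sketch of this standard definition follows; the frame {x, y, z} and its masses are hypothetical and not taken from the slides.

```python
def belief(mass, A):
    """Bel(A): total mass of the focal sets contained in A."""
    A = frozenset(A)
    return sum(m for focal, m in mass.items() if focal <= A)

def plausibility(mass, A):
    """Pl(A): total mass of the focal sets that intersect A."""
    A = frozenset(A)
    return sum(m for focal, m in mass.items() if focal & A)

# Hypothetical mass assignment on the frame {x, y, z}.
frame = frozenset({"x", "y", "z"})
mass = {frozenset({"x"}): 0.5, frozenset({"x", "y"}): 0.3, frame: 0.2}

print(belief(mass, {"x", "y"}), plausibility(mass, {"z"}))
```

Unlike a probability, a belief function is not additive in general: here Bel({z}) is 0 while Bel({x, y}) is below 1, and the gap is the mass left on the whole frame, which represents ignorance.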
34 Information fusion by Dempster's rule
- several aggregation or elicitation operators proposed
- original proposal: Dempster's rule
- b2
- $m(\Theta) = 0.1$, $m(\{a_2, a_3, a_4\}) = 0.9$
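Dempster's rule multiplies the masses of every pair of focal sets, assigns each product to the intersection of the pair, and renormalizes by the mass that did not fall on the empty set. A minimal sketch follows. The second belief function uses the masses shown on this slide, assuming a frame {a1, a2, a3, a4}; the first one (`b1`) is not recoverable from the slide and is purely hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts keyed by frozenset) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (B, p), (C, q) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + p * q
        else:
            conflict += p * q                 # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the belief functions cannot be combined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

frame = frozenset({"a1", "a2", "a3", "a4"})
b1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}      # hypothetical
b2 = {frame: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}            # masses from the slide

fused, conflict = dempster_combine(b1, b2)
for focal, value in fused.items():
    print(sorted(focal), round(value, 4))
```

With these inputs the conflicting mass is 0.63, so the surviving products are divided by 0.37.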
35 Imprecise classifiers and credal networks
- imprecise classifiers
- class estimate is a belief function
- exploit only available evidence, represent ignorance
- Belief networks or credal networks
- at each node a belief function or a convex set of probs
- robust version of Bayesian networks
37 Model-free pose estimation
- estimating the pose (internal configuration) of
a moving body from the available images
38 Learning feature-pose maps
- given pose and feature sequences acquired by motion capture ..
- .. learn a map between features and poses directly from the data
- a Gaussian density for each state is set up on the feature space -> approximate feature space
- maps each cluster to the set of training poses qk with feature yk inside it
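The last bullet can be sketched as a lookup table from feature-space clusters to training poses. This is a simplified illustration under assumptions: each Gaussian is taken to be spherical with equal variance, so that "inside a cluster" reduces to "nearest mean", and `build_feature_pose_map` and the toy data are hypothetical.

```python
import numpy as np
from collections import defaultdict

def build_feature_pose_map(features, poses, means):
    """Map each feature-space cluster to the training poses whose features fall in it.

    features: (n, d) training features y_k; poses: (n, p) training poses q_k;
    means: (n_states, d) centers of the per-state Gaussians (assumed spherical
    with equal variance, so the most likely state is the nearest mean).
    """
    d2 = ((features[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    pose_map = defaultdict(list)
    for k, state in enumerate(nearest):
        pose_map[int(state)].append(poses[k])
    return dict(pose_map)

# Toy training data (hypothetical): 1-D features, 1-D poses, two states.
features = np.array([[0.0], [0.1], [1.0], [0.9]])
poses = np.array([[10.0], [11.0], [20.0], [21.0]])
means = np.array([[0.0], [1.0]])

pose_map = build_feature_pose_map(features, poses, means)
print({state: [p.tolist() for p in ps] for state, ps in pose_map.items()})
```

A new feature vector is then associated with a set of candidate poses rather than a single one, which is where a set-valued (evidential) representation becomes natural.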
39 Evidential model
- MTNS'00, ISIPTA'05, to submit to Information Fusion
40 Results on human body tracking
- comparison of three models: left view only, right view only, both views
- estimate associated with the right model
- pose estimation yielded by the overall model
41 Conclusions - Research
- Hot topic in computer vision and machine learning: human motion analysis
- Applications: motion capture, surveillance, human-machine interaction, biometric identification
- Different tools from machine learning, robust statistics, differential geometry can be useful
- Several tasks are involved in a hierarchical fashion
- Tasks are not isolated, but interact and generate feedback to help the solution of the others
42 Conclusions - Teaching plans
- machine vision involves notions coming from different branches of pure and applied mathematics: robust statistics, differential geometry, discrete math
- all of them are considered as useful tools to solve real-world problems
- students then have the chance to improve their mathematical background ...
- ... and learn at the same time how to develop real products on the ground
- integrated courses can be designed along this line
43 Conclusions - Commercial partnerships
- several opportunities to develop technology transfer activities involving companies
- biometrics: in particular, behavioral (non-controlled) identification
- surveillance: multi-camera human motion detection and classification
- image and video browsing: internet-based content retrieval
- personal links with companies like Honeywell Labs (surveillance), Riya (image googling), MS Research