Modeling Facial Shape and Appearance

- Modeling Shape and Changes in the Texture
- Parametric Face Modeling and Tracking
- Illumination Modeling

Outline

- Modeling Shape and Changes in the Texture
- Parametric Face Modeling and Tracking
- Illumination Modeling

Modeling Facial Shape and Appearance

- To interpret images of faces, it is important to have a model of how the face can appear.
- Changes can be broken down into two parts: changes in shape and changes in texture (patterns of pixel values) across the face.
- The lecture describes a powerful method of generating compact models of shape and texture variation and describes how such models can be used to interpret images of faces.

Statistical Shape Analysis

- Statistical shape analysis is the geometrical analysis of a set of shapes, in which statistics are computed to describe the geometrical properties of similar shapes or of different groups, for instance the difference between face shapes and hand shapes.

Example - Hands

- Training set
- By varying the first three parameters of the shape vector, one at a time, one can demonstrate some of the modes of variation allowed by the model (http://www.isbe.man.ac.uk/research/Flexible_Models/pdms.html).
- Each row is obtained by varying one parameter while fixing the others at zero.

PART I

- Modeling Shape and Changes in the Texture
- Statistical Models (Appearance, Shape)
- Procrustes analysis for aligning set of shapes
- Statistical Models of Variation and Texture
- Fitting model to new points
- Active Shape Models
- Parametric Face Modeling and Tracking
- Illumination Modeling

Statistical Models of Appearance

- To build models of facial appearance and its variation, one can adopt a statistical approach, learning the ways in which the shape and texture of the face vary across a range of images.
- The method relies on obtaining a suitably large, representative training set of facial images, each of which is annotated with a set of feature points defining correspondences across the set.
- The positions of the feature points are used to define the shape of the face and are analyzed to learn the ways in which the shape can vary.
- The patterns of intensities are then analyzed to learn the ways in which the texture can vary.

Statistical Shape Models

- Building a statistical model requires a set of training images. The set should be chosen so it covers the types of variation one wishes the model to represent.
- For instance, if we are interested only in faces with neutral expressions, we should include only neutral expressions in the model.
- If, however, we wish to be able to synthesize and recognize a range of expressions, the training set should include images of people smiling, frowning, winking and so on.

Statistical Shape Models

- In addition, each face must be annotated with a set of points defining the key facial features. These points are used to define the correspondences across the training set and represent the shape of the face in the image. Thus the same number of points should be placed on each image, with the same set of labels.
- The number of such points can vary from a few to a few thousand, and they can be 2D or 3D points.

Example of 68 points defining facial features.

Aligning Sets of Shapes

- There is considerable literature on methods of aligning shapes into a common coordinate frame, the most popular approach being Procrustes analysis. This transforms each shape in a set, xi, so that the sum of squared distances of the shape to the mean is minimized.
- The alignment is poorly defined unless constraints are placed on the mean (for instance, ensuring that it is centered on the origin, has unit scale, and has some fixed but arbitrary orientation).
Procrustes Analysis

- Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. Procrustes refers to a character from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off.
- Here we consider objects made up of a finite number k of points in n dimensions. The shape of an object can be considered as a member of an equivalence class formed by removing the translational, rotational and scaling components.
- For example, the translational component can be removed from an object by translating it so that the mean of all its points lies at the origin.
- Likewise, the scale component can be removed by scaling the object so that the sum of the squared distances from the points to the origin is 1 (this quantity is the size s of the object). The process computes the size of the object and divides each point by it.

Procrustes Analysis

- Removing the rotational component is more complex. Consider two objects with scale and translation removed. Fix one of these and rotate the other around the origin so that the sum of the squared distances between the points is minimised. A rotation by angle θ maps each point (x, y) to (x cos θ − y sin θ, x sin θ + y cos θ).
- The Procrustes distance between the two shapes is d_P = √( Σ_j [ (x_j1 − x_j2)² + (y_j1 − y_j2)² ] ).
- The distance can be minimised by using a least-squares technique to find the angle θ that gives the minimum distance (a small alignment sketch follows).
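To make the steps concrete, here is a minimal numpy sketch (not from the lecture; function names are illustrative only) that removes translation and scale from two 2D point sets and then finds the least-squares rotation angle:

```python
import numpy as np

def normalize_shape(points):
    """Remove translation and scale: center on the origin, then scale to unit size."""
    centered = points - points.mean(axis=0)          # translation removed
    size = np.sqrt((centered ** 2).sum())            # size s: sqrt of sum of squared distances
    return centered / size

def align_rotation(reference, shape):
    """Rotate 'shape' about the origin to best match 'reference' (both normalized, k x 2)."""
    # The optimal angle maximizes sum_i reference_i . R(theta) shape_i
    c = np.sum(reference[:, 0] * shape[:, 0] + reference[:, 1] * shape[:, 1])
    s = np.sum(reference[:, 1] * shape[:, 0] - reference[:, 0] * shape[:, 1])
    theta = np.arctan2(s, c)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return shape @ R.T, theta

def procrustes_distance(a, b):
    """Square root of the sum of squared point-to-point distances."""
    return np.sqrt(((a - b) ** 2).sum())

# Toy usage: a square and a rotated, scaled, shifted copy of it.
square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
ang = np.deg2rad(30.0)
R = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
ref = normalize_shape(square)
other = normalize_shape((square @ R.T) * 3.0 + 5.0)
aligned, theta = align_rotation(ref, other)
print(procrustes_distance(ref, aligned))  # close to zero after alignment
```

Iteratively re-estimating the mean shape and re-aligning every example to it is the usual way to bring a whole training set into a common frame.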

Iterative Aligning Sets of Shapes

Statistical Models of Variation

- Suppose we have s sets of n points xi in d dimensions (usually two or three) that are aligned into a common coordinate frame.
- These vectors form a distribution in an nd-dimensional space. If we can model this distribution, we can generate new examples similar to those in the original training set, and we can examine new shapes to determine whether they are plausible examples.

Statistical Models of Variation

- The approach is as follows:
- Compute the mean of the data, x̄ = (1/s) Σ xi.
- Compute the covariance of the data, S = 1/(s−1) Σ (xi − x̄)(xi − x̄)^T.
- Compute the eigenvectors Φi and corresponding eigenvalues λi of S (sorted so that λi ≥ λi+1). Efficient methods of computing the eigenvectors and eigenvalues exist for the case in which there are fewer samples than dimensions in the vectors (see the sketch below).
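As a concrete illustration (not part of the original slides; names are illustrative), the following numpy sketch builds such a model, using the small s x s Gram matrix when there are fewer samples than dimensions:

```python
import numpy as np

def build_shape_model(X):
    """PCA shape model from s aligned shape vectors, given as rows of X (s x nd)."""
    s = X.shape[0]
    mean = X.mean(axis=0)
    D = X - mean                                    # centered data, s x nd

    # When s < nd it is cheaper to eigen-decompose the small s x s matrix
    # D D^T / (s - 1); its eigenvectors map back to those of the covariance S.
    T = D @ D.T / (s - 1)
    eigvals, V = np.linalg.eigh(T)                  # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]

    modes = D.T @ V                                 # nd x s, columns are eigenvectors of S
    modes /= np.linalg.norm(modes, axis=0) + 1e-12  # normalize to unit length
    return mean, modes, eigvals

# Toy usage: 10 random "shapes" of 68 2D points (nd = 136).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 136))
mean, modes, eigvals = build_shape_model(X)
print(modes.shape, eigvals[:3])
```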

Face Shape Variation

- The figure shows the two most significant modes of face shape variation of a model built from examples of a single individual with different viewpoints and expressions. The model has learned that the 2D shape change caused by 3D head rotation produces the largest shape change.

Two modes of a face shape model (parameters varied by ±2 standard deviations from the mean).
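A shape instance is generated as x = x̄ + Φ b. The sketch below (illustrative only; it assumes the mean, modes and eigvals arrays produced by the PCA sketch above) sweeps one parameter between ±2 standard deviations, as in the figure:

```python
import numpy as np

def synthesize_shape(mean, modes, b):
    """Generate a shape x = mean + Phi @ b from shape parameters b."""
    b = np.asarray(b, dtype=float)
    return mean + modes[:, :len(b)] @ b

def mode_sweep(mean, modes, eigvals, mode_index=0, steps=5):
    """Vary one shape parameter between -2 and +2 standard deviations
    (sqrt of its eigenvalue), keeping all other parameters at zero."""
    sd = np.sqrt(max(eigvals[mode_index], 0.0))
    shapes = []
    for c in np.linspace(-2.0, 2.0, steps):
        b = np.zeros(mode_index + 1)
        b[mode_index] = c * sd
        shapes.append(synthesize_shape(mean, modes, b))
    return shapes
```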

Statistical Models of Texture

- To build a statistical model of the texture (intensity or color over an image patch), one can warp each example image so its feature points match a reference shape (typically the mean shape); a warping sketch is given after the figure below.
- The warping can be achieved using any continuous deformation, such as a piece-wise affine warp based on a triangulation of the region, or an interpolating spline. Warping to a reference shape removes spurious texture variation due to shape differences that would occur if we simply performed eigenvector decomposition on the un-normalized face patches (as in the eigenface approach).
- The intensity information is sampled from the shape-normalized image over the region covered by the mean shape to form a texture vector g_im.
- Although the main shape changes due to smiling have been removed, there is considerable texture difference from a purely neutral face. By varying the elements of the texture parameter vector b_g within limits learned from the training set, one can generate a variety of plausible shape-normalized face textures.

Example of a labeled face image and the face

patch warped into the mean shape.
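As a rough illustration of the shape-normalization step, the sketch below uses scikit-image's PiecewiseAffineTransform to warp an annotated image so its landmarks move onto the mean shape. This is one possible implementation, not necessarily the one used in the lecture, and only the region inside the landmark mesh is meaningful.

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def warp_to_mean_shape(image, landmarks, mean_shape, output_shape):
    """
    Warp 'image' so that its 'landmarks' (k x 2 array of (x, y) points) move onto
    'mean_shape' (k x 2, in the output frame). warp() expects the inverse mapping,
    i.e. from output (mean-shape) coordinates back to input (image) coordinates,
    so the transform is estimated from mean_shape to landmarks.
    """
    tform = PiecewiseAffineTransform()
    tform.estimate(np.asarray(mean_shape), np.asarray(landmarks))
    return warp(image, tform, output_shape=output_shape)

# The shape-normalized patch can then be flattened into a texture vector g_im
# and a texture PCA model built exactly as for the shape model.
```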

Fitting the Model to New Points

- Goal: find the best pose and shape parameters to match a model instance x to a new set of image points Y.
- Minimizing the sum of squared distances between corresponding model and image points is equivalent to minimizing the expression |Y − St(x̄ + Φ b)|², where St(.) is the global (pose) transformation with parameters t, b are the shape parameters, and Φ is the matrix of shape modes.
- More generally, one can allow different weights for different points.
- If the allowed global transformation St(.) is more complex than a simple translation, this is a nonlinear equation with no analytic solution. However, a good approximation can be found rapidly using a two-stage iterative approach (sketched below):
- Solve for the pose parameters t, assuming a fixed shape b.
- Solve for the shape parameters b, assuming a fixed pose t.
- Repeat until convergence.
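A minimal sketch of this two-stage scheme is given below, assuming a 2D similarity transform for St and a PCA shape model (mean, modes, eigvals) as built earlier. All helper names (fit_similarity, invert_similarity, fit_model_to_points) are introduced here purely for illustration.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (a, b, tx, ty) mapping src onto dst (both k x 2):
    x' = a*x - b*y + tx,  y' = b*x + a*y + ty."""
    k = src.shape[0]
    A = np.zeros((2 * k, 4))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = src[:, 0], -src[:, 1], 1.0
    A[1::2, 0], A[1::2, 1], A[1::2, 3] = src[:, 1],  src[:, 0], 1.0
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params

def apply_similarity(params, pts):
    a, b, tx, ty = params
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([a * x - b * y + tx, b * x + a * y + ty], axis=1)

def invert_similarity(params, pts):
    a, b, tx, ty = params
    x, y = pts[:, 0] - tx, pts[:, 1] - ty
    s2 = a * a + b * b
    return np.stack([(a * x + b * y) / s2, (-b * x + a * y) / s2], axis=1)

def fit_model_to_points(Y, mean, modes, eigvals, n_modes=5, n_iter=20):
    """Alternate pose (similarity) and shape (b) estimation to match the model to points Y (k x 2)."""
    k = Y.shape[0]
    b = np.zeros(n_modes)
    lim = 3.0 * np.sqrt(np.maximum(eigvals[:n_modes], 0.0))   # plausibility limits on b
    for _ in range(n_iter):
        x = (mean + modes[:, :n_modes] @ b).reshape(k, 2)     # model instance in model frame
        pose = fit_similarity(x, Y)                           # 1) pose, shape fixed
        y_model = invert_similarity(pose, Y).reshape(-1)      # project Y back into model frame
        b = modes[:, :n_modes].T @ (y_model - mean)           # 2) shape, pose fixed
        b = np.clip(b, -lim, lim)
    return pose, b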

Active Shape Models (ASM)

- We assume we have an initial estimate for the pose and shape parameters (e.g. the mean shape). This is iteratively updated as follows:
- Look along the normal through each model point to find the best local match for the model of the image appearance at that point (e.g. the strongest nearby edge; see the profile-search sketch below).
- Update the pose and shape parameters to best fit the model instance to the found points.
- Repeat until convergence.
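The profile search step can be sketched as follows. This is an illustrative simplification: it uses the gradient magnitude along each normal rather than a learned grey-level profile model, and the found points would then be fed to a parameter-update routine such as fit_model_to_points above.

```python
import numpy as np
from scipy.ndimage import sobel, map_coordinates

def search_along_normals(image, points, half_len=8):
    """
    For each model point on a closed contour (k x 2 array of (x, y) positions),
    sample the gradient magnitude along the local normal and move the point to
    the strongest edge found on that profile.
    """
    img = np.asarray(image, dtype=float)   # grayscale image assumed
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    grad_mag = np.hypot(gx, gy)

    k = len(points)
    new_points = points.astype(float).copy()
    offsets = np.arange(-half_len, half_len + 1)
    for i in range(k):
        tangent = points[(i + 1) % k] - points[(i - 1) % k]
        normal = np.array([-tangent[1], tangent[0]], dtype=float)
        normal /= np.linalg.norm(normal) + 1e-12

        profile_xy = points[i] + offsets[:, None] * normal      # sample positions (x, y)
        # map_coordinates expects (row, col) = (y, x)
        samples = map_coordinates(grad_mag, [profile_xy[:, 1], profile_xy[:, 0]],
                                  order=1, mode='nearest')
        best = offsets[np.argmax(samples)]                       # strongest nearby edge
        new_points[i] = points[i] + best * normal
    return new_points
```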

Example of ASM failing

- The figure demonstrates the Active Shape Model

(ASM) failing. The main facial features have been

found, but the local models searching for the

edges of the face have failed to locate their

correct positions, perhaps because they are too

far away. The ASM is a local method and prone to

local minima.

Example of ASM search failure. The search

profiles are not long enough to locate the edges

of the face.

Multiresolution Models

- The performance can be significantly improved using a multi-resolution implementation, in which we start searching on a coarse level of a Gaussian image pyramid and progressively refine the result at finer levels.
- If a facial appearance model is trained on a sufficiently general set of data, it is able to synthesize faces similar to those in target images. If we can find the model parameters that generate a face similar to the target, those parameters imply the position of the facial features and can be used directly for face interpretation.
- Both models and update matrices can be estimated at a range of image resolutions (training on a Gaussian image pyramid). We can then use a multiresolution search algorithm in which we start at a coarse resolution and iterate to convergence at each level before projecting the current solution to the next level of the model. This is more efficient and can converge to the correct solution from further away than search at a single resolution.

Multiresolution Active Shape Models

- To improve the efficiency and robustness of the algorithm, it can be implemented in a multiresolution framework.
- This involves first searching for the object in a coarse image and then refining the location in a series of finer-resolution images.
- This leads to a faster algorithm, and one that is less likely to get stuck on the wrong image structure.
- Local models for each point are trained on each level of a Gaussian image pyramid.
- The Gaussian pyramid is a hierarchy of low-pass filtered versions of the original image, such that successive levels correspond to lower frequencies (see the sketch below).
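A minimal sketch of building such a pyramid with scipy (smoothing, then subsampling by two at each level); the filter width and downscale factor are implementation choices, not prescribed by the lecture.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, levels=4, sigma=1.0):
    """Simple Gaussian pyramid for a grayscale image: low-pass filter, then subsample by 2."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyramid[-1], sigma)   # remove high frequencies first
        pyramid.append(smoothed[::2, ::2])               # then subsample by a factor of 2
    return pyramid

# A multiresolution ASM search would start on pyramid[-1] (the coarsest level),
# iterate to convergence, scale the shape by 2, and continue on the next finer level.
```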

Search along sampled profile to find best fit of

gray-level model.

Example of face modeling using the active multi-resolution method

Example of the multi-resolution approach at the highest resolution. Left to right: initial, after 5 iterations, final model.

http://www.cs.virginia.edu/gfx/Courses/2003/Intro.fall.03/slides/morph_web/morph_images/pages/Slide46.html


Discussion

- Open questions regarding the models include:
- How does one obtain accurate correspondences across the training set?
- What is the optimal choice of model size and number of model modes?
- What representation of image structure should be modeled?
- What is the best method for matching the model to the image?

PART II

- Modeling Shape and Changes in the Texture
- Parametric Face Modeling and Tracking
- Definitions and samples of modern work
- Previous work on face tracking
- Methods for parametric face modeling
- Tracking Strategies
- Illumination Modeling

Parametric Face Modeling and Tracking

- In the previous section, models for describing the (2D) appearance and geometry of faces were discussed.
- Let us now look at three-dimensional models and how they are used for face tracking.
- Whether we want to analyze a facial image (face detection, tracking, recognition) or synthesize one (computer graphics, face animation), we need a model of the appearance and/or structure of the human face.
- Depending on the application, the model can be simple (e.g. just an oval shape) or complex (e.g. thousands of polygons in layers simulating bone and layers of skin and muscles).
- We usually wish to control the appearance, structure and motion of the model with a small number of parameters, chosen so as to best represent the variability likely to occur in the application.

Parametric Face Modeling and Tracking

- When analyzing a sequence of images (or frames) showing a moving face, the model might describe not only the static appearance of the face but also its dynamic behavior (i.e. the motion).
- To be able to execute any further analysis of a facial image (e.g. reconstruction), the position of the face in the image is helpful, as is the pose (i.e. the 3D position and orientation) of the face.
- The process of estimating position and pose parameters from each frame in a sequence is called tracking.
- In contrast to face detection, we can utilize knowledge of the position, pose and so on of the face in the previous image in the sequence.
- This section explains the basics of parametric face models used for face tracking, as well as fundamental strategies and methodologies for tracking.

Face tracking in digital cameras

- FotoNation Face Tracker
- http://www.fotonation.com/index.php?module=product&item=23

Stereo Face tracking

- Stereo tracking with two web cameras

Images captured by the two cameras are used in self-calibration.

Stereo Face tracking

- Affordable 3D Face Tracking Using Projective Vision
- D.O. Gorodnichy, S. Malik, G. Roth, Computational Video Group, Ottawa

The StereoTracker at work. The orientation and

scale of the virtual man (at the bottom right) is

controlled by the position of the observed face.

Realistic Face Reconstruction and 3D Face

Tracking

- INRIA MIRAGES Lab research (France)
- To begin, the user creates, for each image, a camera that is then manually positioned in front of the image plane so that the projection of the generic model approximately matches the person's face in that image.

Realistic Face Reconstruction and 3D Face

Tracking

- INRIA MIRAGES Lab research (France)
- User manually positions key points on the image
- Model is adapted to changes

Realistic Face Reconstruction and 3D Face

Tracking

- INRIA MIRAGES Lab research (France)
- Bézier curves (green) drawn by the user and computer-generated model silhouettes (red)
- Reconstruction system interface (right)

Tracking through background

Cha Zhang (Microsoft Research) uses background

segmentation for face identification and tracking

Previous Work in Face Tracking

- A plethora of face trackers is available in the literature. They differ in how they model the face, how they track changes from one frame to the next, whether and how changes in illumination and structure are handled, whether they are susceptible to drift, and whether real-time performance is possible. The presentation here is limited to monocular systems (in contrast to stereo vision) and 3D tracking.
- Li et al. estimated face motion in a simple 3D model by a combination of prediction and a model-based least-squares solution to the optical flow constraint equation.
- LaCascia et al. used a cylindrical face model with a parameterized texture that is a linear combination of texture warping templates and orthogonal illumination templates. The 3D head pose was derived by registering the texture map captured from the new frame with the model texture. Stable tracking was achieved via regularized, weighted least-squares minimization of the registration error.

Previous Work in Face Tracking

- Malciu et al. used an ellipsoidal textured wireframe model and minimized the registration error and/or used the optical flow to estimate the 3D pose.
- DeCarlo et al. used a sophisticated face model parameterized in a set of deformations. Rigid and nonrigid motion was tracked by integrating optical flow constraints and edge-based forces, thereby preventing drift.
- Wiles et al. tracked a set of hyperpatches (i.e. representations of surface patches invariant to motion and changing lighting).
- Gokturk et al. developed a two-stage approach for 3D tracking of pose and deformations. The first stage learns the possible deformations of 3D faces by tracking stereo data. The second stage simultaneously tracks the pose and deformation of the face in the monocular image sequence using an optical flow formulation associated with the tracked features. A simple face model using 19 feature points was utilized.
- Ahlberg et al. represented the face using a deformable wireframe model with a statistical texture. Active appearance models were used to minimize the registration error. Because the model allows deformation, both rigid and nonrigid motion are tracked.
- Dornaika et al. extended the tracker with a step based on random sampling and consensus to improve the rigid 3D pose estimate.

Parametric Face Modeling

- There are many ways to parameterize and model the appearance and behavior of the human face. The choice depends on, among other things, the application, the available resources, and the display device.
- The many kinds of variability being modeled/parameterized include the following:
- Three-dimensional motion and pose: the dynamic 3D position and rotation of the head. Tracking involves estimating these parameters for each frame in the video sequence.
- Facial action: facial feature motion such as lip and eyebrow motion.
- Shape and feature configuration: the shape of the head, face and facial features (e.g. mouth, eyes). This could be estimated or assumed to be known by the tracker.
- Illumination: the variability in appearance due to different lighting conditions.
- Texture and color: the image pattern describing the skin.
- Expression: muscular synthesis of emotions, making the face look happy or sad, for example.

Parametric Face Modeling

- Parametric Face Modeling and Tracking
- Definitions and samples of current works
- Previous work on face tracking
- Methods for parametric face modeling
- Eigenfaces
- Facial Action Coding System
- MPEG-4 Facial Animation
- Computer Graphics Models
- Wireframe models
- Projection models

PFM Eigenfaces

- The space spanned by the eigenfaces is called the face space.
- Unfortunately, the manifold (distribution) of facial images has a highly nonlinear structure.
- For face tracking, it has been more popular to linearize the face manifold by warping the facial images to a standard pose and/or shape, thereby creating shape-free, geometrically normalized, or shape-normalized images and eigenfaces (texture templates, texture modes) that can be warped to any face shape or texture-mapped onto a wireframe face model.

PFM Facial Action Coding System

- During the 1960s and 1970s, a system for parameterizing minimal facial actions was developed by psychologists trying to analyze facial expressions. The system was called the Facial Action Coding System (FACS) and describes each facial expression as a combination of around 50 action units (AUs). Each AU represents the activation of one facial muscle.
- FACS has been a popular tool not only for psychology studies but also for computerized facial modeling. There are also other models available in the literature.

FACS Level of Description

FACS itself is purely descriptive and includes no

inferential labels. By converting FACS codes to

EMFACS or similar systems, face images may be

coded for emotion-specified expressions as well

as for more molar categories of positive or

negative emotion.

PFM MPEG-4 Facial Animation

- MPEG-4, since 1999 an international standard for coding and representation of audiovisual objects, contains definitions of face model parameters. There are two sets of parameters: facial definition parameters (FDPs), which describe the static appearance of the head, and facial animation parameters (FAPs), which describe the dynamics.
- The FAPs describe the motion of certain feature points, such as lip corners. Points on the face model not directly affected by the FAPs are then interpolated according to the face model's own motion model, which is not defined by MPEG-4 (complete face models can also be specified and transmitted).
- Typically, the FAP coefficients are used as morph target weights, provided the face model has a morph target for each FAP. The FDPs describe the static shape of the face by the 3D coordinates of each feature point (MPEG-4 defines 84 feature points) and the texture as an image with the corresponding texture coordinates.

PFM Computer Graphics Models

- When synthesizing faces using computer graphics, the most common model is a wireframe model or a polygonal mesh. The face is then described as a set of vertices connected with lines forming polygons (usually triangles). The polygons are shaded or texture-mapped, and illumination is added. The texture can be parameterized or fixed; in the latter case, facial appearance is changed by moving the vertices only.
- To achieve life-like animation of the face, a large number (thousands) of vertices and polygons are commonly used. Each vertex can move in three dimensions, so the model requires a large number of degrees of freedom. To reduce this number, some kind of parameterization is needed.
- A commonly adopted solution is to create a set of morph targets and blend between them. A morph target is a predefined set of vertex positions, where each morph target represents, for example, a facial expression or a viseme.

PFM Wireframe Face Model

- Candide is a simple face model that has been a popular research tool for many years. It was originally created by Rydfalk and later extended by Welsh to cover the entire head (Candide-2) and by Ahlberg to correspond better to MPEG-4 facial animation (Candide-3). The simplicity of the model makes it a good pedagogic example.
- Candide is a wireframe model with 113 vertices connected by lines forming 184 triangular surfaces. The geometry (shape, structure) is determined by the 3D coordinates of the vertices in a model-centered coordinate system (x, y, z). To modify the geometry, Candide-1 and Candide-2 implement a set of action units from FACS. Each action unit is implemented as a list of vertex displacements, an action unit vector, describing the change in face geometry when the action unit is fully activated.

PFM Projection Models

There are several general projection models representing the camera. The camera parameters may be known (calibrated camera) or unknown (uncalibrated); skewness and rotation can sometimes play a role as well. Perspective projection and weak perspective projection (an approximation of perspective projection valid when the depth variation of the object is small) are commonly used.
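The two projection models can be sketched as follows (a minimal illustration assuming points given in camera coordinates and a focal length f):

```python
import numpy as np

def perspective_project(points_3d, f):
    """Pinhole/perspective projection: (x, y, z) -> (f*x/z, f*y/z)."""
    p = np.asarray(points_3d, dtype=float)
    return f * p[:, :2] / p[:, 2:3]

def weak_perspective_project(points_3d, f):
    """Weak perspective: every point is divided by the same average depth,
    a good approximation when depth variation is small compared to the distance."""
    p = np.asarray(points_3d, dtype=float)
    z_avg = p[:, 2].mean()
    return f * p[:, :2] / z_avg

pts = np.array([[0.1, 0.2, 10.0], [0.15, -0.1, 10.5], [-0.2, 0.05, 9.8]])
print(perspective_project(pts, f=800.0))
print(weak_perspective_project(pts, f=800.0))
```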

Example of CMU head tracking

Example of the CMU S2 3D head tracking, including

re-registration after losing the head.

Tracking

- Parametric Face Modeling and Tracking
- Definitions and samples of current works
- Previous work on face tracking
- Methods for parametric face modeling
- Tracking Strategies
- Motion-based and Model-based
- Classification
- First-frame
- Statistical
- Appearance based
- Feature based
- Example of a first-frame model-based and feature-based tracker
- Conclusions on face tracking

Tracking Strategies

- A face tracking system estimates the rigid or

nonrigid motion of a face through a sequence of

image frames. - Tracking systems can be said to be either

motion-based or model-based, sometimes referred

to as feed-forward or feed-back motion

estimation.

Motion-based tracker

- A motion-based tracker estimates the displacements of pixels (or blocks of pixels) from one frame to another. The displacements might be estimated using optical flow methods (giving a dense optical flow field), block-based motion estimation methods (giving a sparse field but using less computation power), or motion estimation in a few image patches only (giving a few motion vectors but at a very low computational cost).
- The estimated motion field is then used to compute the motion of the object model. The motion estimation in such a method consequently depends on the pixels in two frames; the object model is used only for transforming the 2D motion vectors to 3D object model motion. The problem with such methods is drifting, also called the long-sequence motion problem: a tracker of this kind accumulates motion errors and eventually loses track of the face.

Model-based trackers

- A model-based tracker, on the other hand, uses a model of the object's appearance and tries to change the object model's pose (and possibly shape) parameters to fit the new frame. The motion estimation is thus dependent on the object model and the new frame; the old frame is not regarded except for constraining the search space.
- Such a tracker does not suffer from drifting; instead, problems arise when the model is not strong or flexible enough to cope with the situation in the new frame.

First-frame Model-based Trackers

- In general, the word "model" refers to any prior knowledge about the 3D structure, the 3D motion/dynamics, and the 2D facial appearance.
- First-frame models: one of the main issues when designing a model-based tracker is the appearance model. An obvious approach is to capture a reference image of the object at the beginning of the sequence.
- The image can then be geometrically transformed according to the estimated motion parameters, so one can compensate for changes in scale and rotation (and possibly nonrigid motion).
- Because the image is captured, the appearance model is deterministic, object-specific and accurate.
Statistical-based Model-based Trackers

- A drawback of such a first-frame model is the lack of flexibility: it is difficult to generalize from one sample only. Another property is that the tracker does not know what it is tracking.
- A different approach is a statistical model-based tracker. Here, the appearance model relies on previously captured images combined with knowledge of which parts or positions of the images correspond to the various facial features. When the model is transformed to fit the new frame, we thus obtain information about the estimated positions of those specific facial features.

Appearance-based and Feature-based Tracking

- The problem of finding the optimal parameters is a high-dimensional search problem and thus of high computational complexity. By using clever heuristics (e.g. active appearance models), we can reduce the search time.
- An appearance-based or featureless tracker matches a model of the entire facial appearance with the input image, trying to exploit all available information in the model as well as the image.
- A feature-based tracker, on the other hand, chooses a few facial features that are, supposedly, easily and robustly tracked. Features such as color, specific points or patches, and edges can be used.
- Typically, a tracker based on feature points tries, in the rigid-motion case, to estimate the 2D positions of a set of points and, from these points, to compute the 3D pose of the face.

EXAMPLE: Feature-based Tracking

- The tracker described next tracks a set of feature points in an image sequence and uses the 2D measurements to calculate the 3D structure and motion of the head.
- The tracker is based on the structure-from-motion (SfM) algorithm by Azarbayejani and Pentland. The face tracker was then developed by Jebara and Pentland, and further by Strom et al.
- The tracker estimates the 3D pose and structure of a rigid object as well as the camera's focal length. In the terminology above, it is a first-frame model-based and feature-based tracker.

EXAMPLE: Face Model Parameterization

- The tracker designed by Jebara and Pentland estimated a model as a set of points with no surface. Strom et al. extended the system to include a wireframe face model. A set of feature points is placed on the surface of the model, not necessarily coinciding with the model vertices. The face model gives the system several advantages, such as being able to predict the surface angle relative to the camera as well as self-occlusion. Thus the tracker can predict when some measurements should not be trusted. The face model used by Strom was a modified version of Candide.
- The pose in the k-th frame is parameterized with three rotation angles (rx, ry, rz), three translation parameters (tx, ty, tz), and the inverse focal length F = 1/f of the camera. In practice, the z-translation should be parameterized by the product tz·F instead of tz, for stability reasons.
- The structure of the face is represented by the image coordinates (u0, v0) and the depth values z0 of the feature points in the first frame.

Example: Extended Kalman Filtering and Structure from Motion

- A Kalman filter is used to estimate the dynamic changes of a state vector of which only a function can be observed. When the function is nonlinear, we must use an extended Kalman filter (EKF).
- The tracker must be initialized, for example by letting the user place his head in a certain position facing the camera, or by using a face detection algorithm. The model texture is captured from the image and stored as a reference, and feature points are automatically extracted. To select feature points that can be reliably tracked, points where the determinant of the Hessian is large are used (see the sketch below). The determinant is weighted by the cosine of the angle between the model surface normal and the camera direction. The number of feature points to select is limited only by the available computational power and the real-time requirements. At least seven points are needed for the tracker to work, and more are preferable. Strom used 24 feature points and was able to achieve real-time performance.
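The feature-point selection criterion described above can be sketched as follows. This is an illustration only: it assumes a per-pixel normal map rendered from the face model, and select_feature_points and its parameters are invented names, not Strom's code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def select_feature_points(image, surface_normals, view_dir, n_points=24, sigma=2.0):
    """
    Score pixels by the determinant of the image Hessian (Ixx*Iyy - Ixy^2), weight each
    score by the cosine of the angle between the model surface normal at that pixel and
    the camera direction, and keep the n_points best candidates.
    'image' is grayscale (H, W); 'surface_normals' is (H, W, 3) unit normals from the
    rendered face model; 'view_dir' is a unit vector pointing toward the camera.
    """
    img = np.asarray(image, dtype=float)
    ixx = gaussian_filter(img, sigma, order=(0, 2))   # d2/dx2 (x = column direction)
    iyy = gaussian_filter(img, sigma, order=(2, 0))   # d2/dy2
    ixy = gaussian_filter(img, sigma, order=(1, 1))
    det_hessian = ixx * iyy - ixy ** 2

    cos_angle = np.clip(surface_normals @ np.asarray(view_dir, dtype=float), 0.0, 1.0)
    score = det_hessian * cos_angle

    flat = np.argsort(score.ravel())[::-1][:n_points]
    rows, cols = np.unravel_index(flat, score.shape)
    return np.stack([cols, rows], axis=1)   # (x, y) positions of the selected points
```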

Example: Tracking Process

- Using the face model and the values from the normalized template matching, the measurement noise covariance matrix can be estimated, making the Kalman filter rely on some measurements more than others.
- Note that this also tells the Kalman filter in which directions in the image the measurements are reliable. For example, a feature point on an edge (e.g. the mouth outline) can be reliably placed in the direction perpendicular to the edge, but less reliably along the edge.

Example: Tracking Process

Patches from the rendered image (lower left) are matched with the incoming video. The two-dimensional feature point trajectories are fed through the structure-from-motion (SfM) extended Kalman filter, which estimates the pose information needed to render the next model view. For clarity, only 4 of the 24 patches are shown.

Tracking Results

Tracking results on two test sequences. Every

tenth frame is shown.

Example: Tracking Results

- The initial test shows that the system is able to track a previously unseen person in a subjectively accurate way. Some important issues to be addressed are:
- Speed: can the system run in real time?
- Robustness: can the system cope with varying illumination, facial expressions, and large head motion? Apparently, track is sometimes lost. One way to increase robustness is to combine the tracker with a feature-based step. To improve robustness to varying illumination conditions, an illumination basis could be added to the texture parameterization.
- Accuracy: how accurate is the tracking? Ahlberg and Forchheimer describe a system that tracks a synthetic sequence where the true parameters are known.

Example: Tracking Optimization

- To optimize the algorithm, three potentially time-consuming parts within each iteration need to be taken care of:
- Shape normalization: use dedicated graphics hardware for texture mapping, or perform certain parts of the computation offline.
- Analysis-synthesis: the projection of the shape-normalized input image onto the texture modes and the generation of the model texture have a complexity that grows linearly with the number of texture modes used.
- Residual image and update vector computation: the complexity grows linearly with the number of parameters to extract. However, it can be performed very quickly by exploiting the vector instructions available in many modern CPUs.

Conclusions for Face Tracking

- Whereas motion-based trackers may suffer from

drifting, model-based trackers do not have that

problem. - Appearance and feature-based trackers follow

different basic principles and have different

characteristics.

PART III

- Modeling Shape and Changes in the Texture
- Parametric Face Modeling and Tracking
- Illumination Modeling

Illumination Modeling

- Changes in lighting can produce large variability in the appearance of faces. One way to measure the difficulty presented by lighting, or any variability, is the number of degrees of freedom needed to describe it.
- For example, the pose of a face relative to the camera has six degrees of freedom: three rotations and three translations. Facial expression has tens of degrees of freedom if one considers the number of muscles that may contract to change expression.
Illumination Modeling

- To describe the light that strikes a face, we must describe the intensity of light hitting each point on the face from each direction.
- Light is a function of position and direction, meaning that light has an infinite number of degrees of freedom. However, effective systems can account for the effects of lighting using fewer than 10 degrees of freedom. This can have considerable impact on the speed and accuracy of recognition systems.
- Support for low-dimensional models is both empirical and theoretical. Principal component analysis (PCA) on images of a face obtained under various lighting conditions shows that this image set is well approximated by a low-dimensional, linear subspace of the space of all images. Experimentation shows that algorithms that take advantage of this observation can achieve high performance.

Illumination Modeling

- An alternate stream of work attempts to compensate for lighting effects without the use of 3D face models. This work directly matches 2D images using representations of images that are found to be insensitive to lighting variations.
- These include image gradients, Gabor jets, the direction of image gradients, and projections onto subspaces derived from linear discriminants.
- These methods are certainly of interest, especially for applications in which 3D face models are not available. However, methods based on 3D models may be more powerful, as they have the potential to compensate completely for lighting changes, whereas 2D methods cannot achieve such invariance.

Illumination Modeling

- Building truly accurate models of the way the face reflects light is a complex task. This is in part because skin is not homogeneous: light striking the face may be reflected by oils or water on the skin, by melanin in the epidermis, or by hemoglobin in the dermis.
- Based on empirical measurements of skin, Marschner et al. state that the BRDF (bidirectional reflectance distribution function) itself is quite unusual: at small incidence angles it is almost Lambertian, but at higher angles strong forward scattering emerges.
- Furthermore, light entering the skin at one point may scatter below the surface of the skin and exit from another point. This phenomenon, known as subsurface scattering, cannot be modeled by a BRDF, which assumes that light leaves a surface from the point at which it strikes it. Jensen et al. presented one model of subsurface scattering.

Illumination Modeling

- For purposes of realistic computer graphics, this complexity must be confronted in some way. For example, Borshukov and Lewis reported that in The Matrix Reloaded, they began by modeling face reflectance using a Lambertian diffuse component and a modified Phong model to account for a Fresnel-like effect. As production progressed, it became increasingly clear that realistic skin rendering couldn't be achieved without subsurface scattering simulations.

Illumination Modeling

- However, simpler models may be adequate for face recognition. This suggests that even if one wishes to model face reflectance more accurately, simple models may provide useful, approximate algorithms that can initialize more complex ones.
- Thus, one can discuss an analytically derived representation of the images produced by a convex, Lambertian object illuminated by distant light sources. Restricting consideration to convex objects lets us ignore the effect of shadows cast by one part of the object on another part of it.
- One can also assume that the surface of the object reflects light according to Lambert's law, which states that the material absorbs and reflects light uniformly in all directions (a sketch of this reflectance model is given below).
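A minimal sketch of Lambert's law for a convex object under a single distant light source (function and parameter names are illustrative):

```python
import numpy as np

def lambertian_intensity(normals, albedo, light_dir, light_intensity=1.0):
    """
    Lambertian reflectance: the intensity reflected at a surface point is
        I = albedo * light_intensity * max(0, n . l),
    independent of the viewing direction. 'normals' is an (N, 3) array of unit
    surface normals, 'albedo' a scalar or length-N array, and 'light_dir' a unit
    vector toward the light. The max(0, .) term zeroes out attached shadows
    (points facing away from the light).
    """
    n_dot_l = np.asarray(normals, dtype=float) @ np.asarray(light_dir, dtype=float)
    return np.asarray(albedo, dtype=float) * light_intensity * np.maximum(n_dot_l, 0.0)

# Toy usage: three surface normals lit from the z direction.
normals = np.array([[0.0, 0.0, 1.0],
                    [0.0, 0.70710678, 0.70710678],
                    [0.0, 0.0, -1.0]])
print(lambertian_intensity(normals, albedo=0.8, light_dir=[0.0, 0.0, 1.0]))
```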

Illumination Modeling

- Other researchers (Z. Zhang, Microsoft Research) address face re-lighting from a single image under harsh lighting conditions, and model synthetic illumination/reflection conditions.
- Left: real image; right: synthetic image.

Conclusions

- This lecture presented topics in:
- Modeling Shape and Changes in the Texture (2D modeling)
- Parametric Face Modeling and Tracking (3D modeling)
- Illumination Modeling