Modeling Facial Shape and Appearance

- Modeling Shape and Changes in the Texture
- Parametric Face Modeling and Tracking
- Illumination Modeling

Outline

- Modeling Shape and Changes in the Texture
- Parametric Face Modeling and Tracking
- Illumination Modeling

Modeling Facial Shape and Appearance

- To interpret images of faces, it is important to have a model of how the face can appear.
- Changes can be broken down into two parts: changes in shape and changes in texture (patterns of pixel values) across the face.
- The lecture describes a powerful method of generating compact models of shape and texture variation and describes how such models can be used to interpret images of faces.

Statistical Shape Analysis

- Statistical shape analysis is the geometrical analysis of a set of shapes, in which statistics are computed to describe the geometrical properties of similar shapes or of different groups, for instance the difference between face shapes and hand shapes.

Example - Hands

- Training set
- By varying the first three parameters of the shape vector, one at a time, one can demonstrate some of the modes of variation allowed by the model (http://www.isbe.man.ac.uk/research/Flexible_Models/pdms.html).
- Each row is obtained by varying one parameter while fixing the others at zero.

PART I

- Modeling Shape and Changes in the Texture
- Statistical Models (Appearance, Shape)
- Procrustes analysis for aligning set of shapes
- Statistical Models of Variation and Texture
- Fitting model to new points
- Active Shape Models
- Parametric Face Modeling and Tracking
- Illumination Modeling

Statistical Models of Appearance

- To build models of facial appearance and its variation, one can adopt a statistical approach, learning the ways in which the shape and texture of the face vary across a range of images.
- The method relies on obtaining a suitably large, representative training set of facial images, each of which is annotated with a set of feature points defining correspondences across the set.
- The positions of the feature points are used to define the shape of the face and are analyzed to learn the ways in which the shape can vary.
- The patterns of intensities are then analyzed to learn the ways in which the texture can vary.

Statistical Shape Models

- Building a statistical model requires a set of training images. The set should be chosen so it covers the types of variation one wishes the model to represent.
- For instance, if we are interested only in faces with neutral expressions, we should include only neutral expressions in the model.
- If, however, we wish to be able to synthesize and recognize a range of expressions, the training set should include images of people smiling, frowning, winking and so on.

Statistical Shape Models

- In addition, each face must be annotated with a set of points defining the key facial features. These points are used to define the correspondences across the training set and represent the shape of the face in the image. Thus the same number of points should be placed on each image, with the same set of labels.
- The number of such points can vary from a few to a few thousand, and they can be 2D or 3D points.

Example of 68 points defining facial features.

Aligning Sets of Shapes

- There is considerable literature on methods of aligning shapes into a common coordinate frame, the most popular approach being Procrustes analysis. This transforms each shape in a set, xi, so that the sum of squared distances of the shape to the mean is minimized.
- The alignment is poorly defined unless constraints are placed on the mean (for instance, ensuring that it is centered on the origin, has unit scale, and has some fixed but arbitrary orientation).
Procrustes Analysis

- Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. Procrustes refers to a character from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off.
- Here we consider objects made up of a finite number k of points in n dimensions. The shape of an object can be considered as a member of an equivalence class formed by removing the translational, rotational and scaling components.
- For example, the translational component can be removed from an object by translating it so that the mean of all its points lies at the origin.
- Likewise, the scale component can be removed by scaling the object so that the sum of the squared distances from the points to the origin is 1 (this quantity is the size s of the object). The process computes the size of the object and divides each point by it.

Procrustes Analysis

- Removing the rotational component is more complex. Consider two objects with scale and translation removed. Fix one of these and rotate the other around the origin so that the sum of the squared distances between the points is minimised. A rotation by angle θ maps each point (x, y) to (x cos θ − y sin θ, x sin θ + y cos θ).
- The Procrustes distance between the two shapes is d_P = √( Σ_j [ (x_j1 − x_j2)² + (y_j1 − y_j2)² ] ).
- The distance can be minimised by using a least-squares technique to find the angle θ that gives the minimum distance (a small alignment sketch follows).
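To make the steps concrete, here is a minimal numpy sketch (not from the lecture; function names are illustrative only) that removes translation and scale from two 2D point sets and then finds the least-squares rotation angle:

```python
import numpy as np

def normalize_shape(points):
    """Remove translation and scale: center on the origin, then scale to unit size."""
    centered = points - points.mean(axis=0)          # translation removed
    size = np.sqrt((centered ** 2).sum())            # size s: sqrt of sum of squared distances
    return centered / size

def align_rotation(reference, shape):
    """Rotate 'shape' about the origin to best match 'reference' (both normalized, k x 2)."""
    # The optimal angle maximizes sum_i reference_i . R(theta) shape_i
    c = np.sum(reference[:, 0] * shape[:, 0] + reference[:, 1] * shape[:, 1])
    s = np.sum(reference[:, 1] * shape[:, 0] - reference[:, 0] * shape[:, 1])
    theta = np.arctan2(s, c)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return shape @ R.T, theta

def procrustes_distance(a, b):
    """Square root of the sum of squared point-to-point distances."""
    return np.sqrt(((a - b) ** 2).sum())

# Toy usage: a square and a rotated, scaled, shifted copy of it.
square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
ang = np.deg2rad(30.0)
R = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
ref = normalize_shape(square)
other = normalize_shape((square @ R.T) * 3.0 + 5.0)
aligned, theta = align_rotation(ref, other)
print(procrustes_distance(ref, aligned))  # close to zero after alignment
```

Iteratively re-estimating the mean shape and re-aligning every example to it is the usual way to bring a whole training set into a common frame.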

Iterative Aligning Sets of Shapes

Statistical Models of Variation

- Suppose we have s sets of n points xi in d dimensions (usually two or three) that are aligned into a common coordinate frame.
- These vectors form a distribution in an nd-dimensional space. If we can model this distribution, we can generate new examples similar to those in the original training set, and we can examine new shapes to determine whether they are plausible examples.

Statistical Models of Variation

- The approach is as follows:
- Compute the mean of the data, x̄ = (1/s) Σ xi.
- Compute the covariance of the data, S = 1/(s−1) Σ (xi − x̄)(xi − x̄)^T.
- Compute the eigenvectors Φi and corresponding eigenvalues λi of S (sorted so that λi ≥ λi+1). Efficient methods of computing the eigenvectors and eigenvalues exist for the case in which there are fewer samples than dimensions in the vectors (see the sketch below).
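As a concrete illustration (not part of the original slides; names are illustrative), the following numpy sketch builds such a model, using the small s x s Gram matrix when there are fewer samples than dimensions:

```python
import numpy as np

def build_shape_model(X):
    """PCA shape model from s aligned shape vectors, given as rows of X (s x nd)."""
    s = X.shape[0]
    mean = X.mean(axis=0)
    D = X - mean                                    # centered data, s x nd

    # When s < nd it is cheaper to eigen-decompose the small s x s matrix
    # D D^T / (s - 1); its eigenvectors map back to those of the covariance S.
    T = D @ D.T / (s - 1)
    eigvals, V = np.linalg.eigh(T)                  # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]

    modes = D.T @ V                                 # nd x s, columns are eigenvectors of S
    modes /= np.linalg.norm(modes, axis=0) + 1e-12  # normalize to unit length
    return mean, modes, eigvals

# Toy usage: 10 random "shapes" of 68 2D points (nd = 136).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 136))
mean, modes, eigvals = build_shape_model(X)
print(modes.shape, eigvals[:3])
```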

Face Shape Variation

- The figure shows the two most significant modes of face shape variation of a model built from examples of a single individual with different viewpoints and expressions. The model has learned that the 2D shape change caused by 3D head rotation produces the largest shape change.

Two modes of a face shape model (parameters varied by ±2 standard deviations from the mean).
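A shape instance is generated as x = x̄ + Φ b. The sketch below (illustrative only; it assumes the mean, modes and eigvals arrays produced by the PCA sketch above) sweeps one parameter between ±2 standard deviations, as in the figure:

```python
import numpy as np

def synthesize_shape(mean, modes, b):
    """Generate a shape x = mean + Phi @ b from shape parameters b."""
    b = np.asarray(b, dtype=float)
    return mean + modes[:, :len(b)] @ b

def mode_sweep(mean, modes, eigvals, mode_index=0, steps=5):
    """Vary one shape parameter between -2 and +2 standard deviations
    (sqrt of its eigenvalue), keeping all other parameters at zero."""
    sd = np.sqrt(max(eigvals[mode_index], 0.0))
    shapes = []
    for c in np.linspace(-2.0, 2.0, steps):
        b = np.zeros(mode_index + 1)
        b[mode_index] = c * sd
        shapes.append(synthesize_shape(mean, modes, b))
    return shapes
```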

Statistical Models of Texture

- To build a statistical model of the texture (intensity or color over an image patch), one can warp each example image so its feature points match a reference shape (typically the mean shape); a warping sketch is given after the figure below.
- The warping can be achieved using any continuous deformation, such as a piece-wise affine warp based on a triangulation of the region, or an interpolating spline. Warping to a reference shape removes spurious texture variation due to shape differences that would occur if we simply performed eigenvector decomposition on the un-normalized face patches (as in the eigenface approach).
- The intensity information is sampled from the shape-normalized image over the region covered by the mean shape to form a texture vector g_im.
- Although the main shape changes due to smiling have been removed, there is considerable texture difference from a purely neutral face. By varying the elements of the texture parameter vector b_g within limits learned from the training set, one can generate a variety of plausible shape-normalized face textures.

Example of a labeled face image and the face

patch warped into the mean shape.
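As a rough illustration of the shape-normalization step, the sketch below uses scikit-image's PiecewiseAffineTransform to warp an annotated image so its landmarks move onto the mean shape. This is one possible implementation, not necessarily the one used in the lecture, and only the region inside the landmark mesh is meaningful.

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def warp_to_mean_shape(image, landmarks, mean_shape, output_shape):
    """
    Warp 'image' so that its 'landmarks' (k x 2 array of (x, y) points) move onto
    'mean_shape' (k x 2, in the output frame). warp() expects the inverse mapping,
    i.e. from output (mean-shape) coordinates back to input (image) coordinates,
    so the transform is estimated from mean_shape to landmarks.
    """
    tform = PiecewiseAffineTransform()
    tform.estimate(np.asarray(mean_shape), np.asarray(landmarks))
    return warp(image, tform, output_shape=output_shape)

# The shape-normalized patch can then be flattened into a texture vector g_im
# and a texture PCA model built exactly as for the shape model.
```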

Fitting the Model to New Points

- Goal: find the best pose and shape parameters to match a model instance x to a new set of image points Y.
- Minimizing the sum of squared distances between corresponding model and image points is equivalent to minimizing the expression |Y − St(x̄ + Φ b)|², where St(.) is the global (pose) transformation with parameters t, b are the shape parameters, and Φ is the matrix of shape modes.
- More generally, one can allow different weights for different points.
- If the allowed global transformation St(.) is more complex than a simple translation, this is a nonlinear equation with no analytic solution. However, a good approximation can be found rapidly using a two-stage iterative approach (sketched below):
- Solve for the pose parameters t, assuming a fixed shape b.
- Solve for the shape parameters b, assuming a fixed pose t.
- Repeat until convergence.
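A minimal sketch of this two-stage scheme is given below, assuming a 2D similarity transform for St and a PCA shape model (mean, modes, eigvals) as built earlier. All helper names (fit_similarity, invert_similarity, fit_model_to_points) are introduced here purely for illustration.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (a, b, tx, ty) mapping src onto dst (both k x 2):
    x' = a*x - b*y + tx,  y' = b*x + a*y + ty."""
    k = src.shape[0]
    A = np.zeros((2 * k, 4))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = src[:, 0], -src[:, 1], 1.0
    A[1::2, 0], A[1::2, 1], A[1::2, 3] = src[:, 1],  src[:, 0], 1.0
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params

def apply_similarity(params, pts):
    a, b, tx, ty = params
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([a * x - b * y + tx, b * x + a * y + ty], axis=1)

def invert_similarity(params, pts):
    a, b, tx, ty = params
    x, y = pts[:, 0] - tx, pts[:, 1] - ty
    s2 = a * a + b * b
    return np.stack([(a * x + b * y) / s2, (-b * x + a * y) / s2], axis=1)

def fit_model_to_points(Y, mean, modes, eigvals, n_modes=5, n_iter=20):
    """Alternate pose (similarity) and shape (b) estimation to match the model to points Y (k x 2)."""
    k = Y.shape[0]
    b = np.zeros(n_modes)
    lim = 3.0 * np.sqrt(np.maximum(eigvals[:n_modes], 0.0))   # plausibility limits on b
    for _ in range(n_iter):
        x = (mean + modes[:, :n_modes] @ b).reshape(k, 2)     # model instance in model frame
        pose = fit_similarity(x, Y)                           # 1) pose, shape fixed
        y_model = invert_similarity(pose, Y).reshape(-1)      # project Y back into model frame
        b = modes[:, :n_modes].T @ (y_model - mean)           # 2) shape, pose fixed
        b = np.clip(b, -lim, lim)
    return pose, b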

Active Shape Models (ASM)

- We assume we have an initial estimate for the pose and shape parameters (e.g. the mean shape). This is iteratively updated as follows:
- Look along the normal through each model point to find the best local match for the model of the image appearance at that point (e.g. the strongest nearby edge; see the profile-search sketch below).
- Update the pose and shape parameters to best fit the model instance to the found points.
- Repeat until convergence.
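The profile search step can be sketched as follows. This is an illustrative simplification: it uses the gradient magnitude along each normal rather than a learned grey-level profile model, and the found points would then be fed to a parameter-update routine such as fit_model_to_points above.

```python
import numpy as np
from scipy.ndimage import sobel, map_coordinates

def search_along_normals(image, points, half_len=8):
    """
    For each model point on a closed contour (k x 2 array of (x, y) positions),
    sample the gradient magnitude along the local normal and move the point to
    the strongest edge found on that profile.
    """
    img = np.asarray(image, dtype=float)   # grayscale image assumed
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    grad_mag = np.hypot(gx, gy)

    k = len(points)
    new_points = points.astype(float).copy()
    offsets = np.arange(-half_len, half_len + 1)
    for i in range(k):
        tangent = points[(i + 1) % k] - points[(i - 1) % k]
        normal = np.array([-tangent[1], tangent[0]], dtype=float)
        normal /= np.linalg.norm(normal) + 1e-12

        profile_xy = points[i] + offsets[:, None] * normal      # sample positions (x, y)
        # map_coordinates expects (row, col) = (y, x)
        samples = map_coordinates(grad_mag, [profile_xy[:, 1], profile_xy[:, 0]],
                                  order=1, mode='nearest')
        best = offsets[np.argmax(samples)]                       # strongest nearby edge
        new_points[i] = points[i] + best * normal
    return new_points
```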

Example of ASM failing

- The figure demonstrates the Active Shape Model

(ASM) failing. The main facial features have been

found, but the local models searching for the

edges of the face have failed to locate their

correct positions, perhaps because they are too

far away. The ASM is a local method and prone to

local minima.

Example of ASM search failure. The search

profiles are not long enough to locate the edges

of the face.

Multiresolution Models

- The performance can be significantly improved using a multi-resolution implementation, in which we start searching on a coarse level of a Gaussian image pyramid and progressively refine the result at finer levels.
- If a facial appearance model is trained on a sufficiently general set of data, it is able to synthesize faces similar to those in target images. If we can find the model parameters that generate a face similar to the target, those parameters imply the position of the facial features and can be used directly for face interpretation.
- Both models and update matrices can be estimated at a range of image resolutions (training on a Gaussian image pyramid). We can then use a multiresolution search algorithm in which we start at a coarse resolution and iterate to convergence at each level before projecting the current solution to the next level of the model. This is more efficient and can converge to the correct solution from further away than search at a single resolution.

Multiresolution Active Shape Models

- To improve the efficiency and robustness of the algorithm, it can be implemented in a multiresolution framework.
- This involves first searching for the object in a coarse image and then refining the location in a series of finer-resolution images.
- This leads to a faster algorithm, and one that is less likely to get stuck on the wrong image structure.
- Local models for each point are trained on each level of a Gaussian image pyramid.
- The Gaussian pyramid is a hierarchy of low-pass filtered versions of the original image, such that successive levels correspond to lower frequencies (see the sketch below).
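A minimal sketch of building such a pyramid with scipy (smoothing, then subsampling by two at each level); the filter width and downscale factor are implementation choices, not prescribed by the lecture.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, levels=4, sigma=1.0):
    """Simple Gaussian pyramid for a grayscale image: low-pass filter, then subsample by 2."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyramid[-1], sigma)   # remove high frequencies first
        pyramid.append(smoothed[::2, ::2])               # then subsample by a factor of 2
    return pyramid

# A multiresolution ASM search would start on pyramid[-1] (the coarsest level),
# iterate to convergence, scale the shape by 2, and continue on the next finer level.
```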

Search along sampled profile to find best fit of

gray-level model.

Example of face modeling using the active multi-resolution method

Example of the multi-resolution approach at the highest resolution. Left to right: initial, after 5 iterations, final model.

http://www.cs.virginia.edu/gfx/Courses/2003/Intro.fall.03/slides/morph_web/morph_images/pages/Slide46.html


Discussion

- Open questions regarding the models include:
- How does one obtain accurate correspondences across the training set?
- What is the optimal choice of model size and number of model modes?
- What representation of image structure should be modeled?
- What is the best method for matching the model to the image?

PART II

- Modeling Shape and Changes in the Texture
- Parametric Face Modeling and Tracking
- Definitions and samples of modern work
- Previous work on face tracking
- Methods for parametric face modeling
- Tracking Strategies
- Illumination Modeling

Parametric Face Modeling and Tracking

- In the previous section, models for describing the (2D) appearance and geometry of faces were discussed.
- Let us now look at three-dimensional models and how they are used for face tracking.
- Whether we want to analyze a facial image (face detection, tracking, recognition) or synthesize one (computer graphics, face animation), we need a model of the appearance and/or structure of the human face.
- Depending on the application, the model can be simple (e.g. just an oval shape) or complex (e.g. thousands of polygons in layers simulating bone and layers of skin and muscles).
- We usually wish to control the appearance, structure and motion of the model with a small number of parameters, chosen so as to best represent the variability likely to occur in the application.

Parametric Face Modeling and Tracking

- When analyzing a sequence of images (or frames) showing a moving face, the model might describe not only the static appearance of the face but also its dynamic behavior (i.e. the motion).
- To be able to execute any further analysis of a facial image (e.g. reconstruction), the position of the face in the image is helpful, as is the pose (i.e. the 3D position and orientation) of the face.
- The process of estimating position and pose parameters from each frame in a sequence is called tracking.
- In contrast to face detection, we can utilize knowledge of the position, pose and so on of the face in the previous image in the sequence.
- This section explains the basics of parametric face models used for face tracking, as well as fundamental strategies and methodologies for tracking.

Face tracking in digital cameras

- FotoNation Face Tracker
- http://www.fotonation.com/index.php?module=product&item=23

Stereo Face tracking

- Stereo tracking with two web cameras

Images captured by the two cameras are used in self-calibration.

Stereo Face tracking

- Affordable 3D Face Tracking Using Projective Vision
- D.O. Gorodnichy, S. Malik, G. Roth, Computational Video Group, Ottawa

The StereoTracker at work. The orientation and

scale of the virtual man (at the bottom right) is

controlled by the position of the observed face.

Realistic Face Reconstruction and 3D Face

Tracking

- INRIA MIRAGES Lab research (France)
- To begin, the user creates, for each image, a camera that is then manually positioned in front of the image plane so that the projection of the generic model approximately matches the person's face in that image.

Realistic Face Reconstruction and 3D Face

Tracking

- INRIA MIRAGES Lab research (France)
- User manually positions key points on the image
- Model is adapted to changes

Realistic Face Reconstruction and 3D Face

Tracking

- INRIA MIRAGES Lab research (France)
- Bézier curves (green) drawn by the user and computer-generated model silhouettes (red)
- Reconstruction system interface (right)

Tracking through background

Cha Zhang (Microsoft Research) uses background

segmentation for face identification and tracking

Previous Work in Face Tracking

- A plethora of face trackers is available in the literature. They differ in how they model the face, how they track changes from one frame to the next, whether and how changes in illumination and structure are handled, whether they are susceptible to drift, and whether real-time performance is possible. The presentation here is limited to monocular systems (in contrast to stereo vision) and 3D tracking.
- Li et al. estimated face motion in a simple 3D model by a combination of prediction and a model-based least-squares solution to the optical flow constraint equation.
- LaCascia et al. used a cylindrical face model with a parameterized texture that is a linear combination of texture warping templates and orthogonal illumination templates. The 3D head pose was derived by registering the texture map captured from the new frame with the model texture. Stable tracking was achieved via regularized, weighted least-squares minimization of the registration error.

Previous Work in Face Tracking

- Malciu et al. used an ellipsoidal textured wireframe model and minimized the registration error and/or used the optical flow to estimate the 3D pose.
- DeCarlo et al. used a sophisticated face model parameterized in a set of deformations. Rigid and nonrigid motion was tracked by integrating optical flow constraints and edge-based forces, thereby preventing drift.
- Wiles et al. tracked a set of hyperpatches (i.e. representations of surface patches invariant to motion and changing lighting).
- Gokturk et al. developed a two-stage approach for 3D tracking of pose and deformations. The first stage learns the possible deformations of 3D faces by tracking stereo data. The second stage simultaneously tracks the pose and deformation of the face in the monocular image sequence using an optical flow formulation associated with the tracked features. A simple face model using 19 feature points was utilized.
- Ahlberg et al. represented the face using a deformable wireframe model with a statistical texture. Active appearance models were used to minimize the registration error. Because the model allows deformation, both rigid and nonrigid motion are tracked.
- Dornaika et al. extended the tracker with a step based on random sampling and consensus to improve the rigid 3D pose estimate.

Parametric Face Modeling

- There are many ways to parameterize and model the appearance and behavior of the human face. The choice depends on, among other things, the application, the available resources, and the display device.
- The many kinds of variability being modeled/parameterized include the following:
- Three-dimensional motion and pose: the dynamic 3D position and rotation of the head. Tracking involves estimating these parameters for each frame in the video sequence.
- Facial action: facial feature motion such as lip and eyebrow motion.
- Shape and feature configuration: the shape of the head, face and facial features (e.g. mouth, eyes). This could be estimated or assumed to be known by the tracker.
- Illumination: the variability in appearance due to different lighting conditions.
- Texture and color: the image pattern describing the skin.
- Expression: muscular synthesis of emotions, making the face look happy or sad, for example.

Parametric Face Modeling

- Parametric Face Modeling and Tracking
- Definitions and samples of current works
- Previous work on face tracking
- Methods for parametric face modeling
- Eigenfaces
- Facial Action Coding System
- MPEG-4 Facial Animation
- Computer Graphics Models
- Wireframe models
- Projection models

PFM Eigenfaces

- The space spanned by the eigenfaces is called the face space.
- Unfortunately, the manifold (distribution) of facial images has a highly nonlinear structure.
- For face tracking, it has been more popular to linearize the face manifold by warping the facial images to a standard pose and/or shape, thereby creating shape-free, geometrically normalized, or shape-normalized images and eigenfaces (texture templates, texture modes) that can be warped to any face shape or texture-mapped onto a wireframe face model.

PFM Facial Action Coding System

- During the 1960s and 1970s, a system for parameterizing minimal facial actions was developed by psychologists trying to analyze facial expressions. The system was called the Facial Action Coding System (FACS) and describes each facial expression as a combination of around 50 action units (AUs). Each AU represents the activation of one facial muscle.
- FACS has been a popular tool not only for psychology studies but also for computerized facial modeling. There are also other models available in the literature.

FACS Level of Description

FACS itself is purely descriptive and includes no

inferential labels. By converting FACS codes to

EMFACS or similar systems, face images may be

coded for emotion-specified expressions as well

as for more molar categories of positive or

negative emotion.

PFM MPEG-4 Facial Animation

- MPEG-4, since 1999 an international standard for coding and representation of audiovisual objects, contains definitions of face model parameters. There are two sets of parameters: facial definition parameters (FDPs), which describe the static appearance of the head, and facial animation parameters (FAPs), which describe the dynamics.
- The FAPs describe the motion of certain feature points, such as lip corners. Points on the face model not directly affected by the FAPs are then interpolated according to the face model's own motion model, which is not defined by MPEG-4 (complete face models can also be specified and transmitted).
- Typically, the FAP coefficients are used as morph target weights, provided the face model has a morph target for each FAP. The FDPs describe the static shape of the face by the 3D coordinates of each feature point (MPEG-4 defines 84 feature points) and the texture as an image with the corresponding texture coordinates.

PFM Computer Graphics Models

- When synthesizing faces using computer graphics, the most common model is a wireframe model or a polygonal mesh. The face is then described as a set of vertices connected with lines forming polygons (usually triangles). The polygons are shaded or texture-mapped, and illumination is added. The texture can be parameterized or fixed; in the latter case, facial appearance is changed by moving the vertices only.
- To achieve life-like animation of the face, a large number (thousands) of vertices and polygons are commonly used. Each vertex can move in three dimensions, so the model requires a large number of degrees of freedom. To reduce this number, some kind of parameterization is needed.
- A commonly adopted solution is to create a set of morph targets and blend between them. A morph target is a predefined set of vertex positions, where each morph target represents, for example, a facial expression or a viseme.

PFM Wireframe Face Model

- Candide is a simple face model that has been a popular research tool for many years. It was originally created by Rydfalk and later extended by Welsh to cover the entire head (Candide-2) and by Ahlberg to correspond better to MPEG-4 facial animation (Candide-3). The simplicity of the model makes it a good pedagogic example.
- Candide is a wireframe model with 113 vertices connected by lines forming 184 triangular surfaces. The geometry (shape, structure) is determined by the 3D coordinates of the vertices in a model-centered coordinate system (x, y, z). To modify the geometry, Candide-1 and Candide-2 implement a set of action units from FACS. Each action unit is implemented as a list of vertex displacements, an action unit vector, describing the change in face geometry when the action unit is fully activated.

PFM Projection Models

There are several general projection models representing the camera. The camera parameters may be known (calibrated camera) or unknown (uncalibrated); skewness and rotation can sometimes play a role as well. Perspective projection and weak perspective projection (an approximation of perspective projection valid when the depth variation of the object is small) are commonly used.
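The two projection models can be sketched as follows (a minimal illustration assuming points given in camera coordinates and a focal length f):

```python
import numpy as np

def perspective_project(points_3d, f):
    """Pinhole/perspective projection: (x, y, z) -> (f*x/z, f*y/z)."""
    p = np.asarray(points_3d, dtype=float)
    return f * p[:, :2] / p[:, 2:3]

def weak_perspective_project(points_3d, f):
    """Weak perspective: every point is divided by the same average depth,
    a good approximation when depth variation is small compared to the distance."""
    p = np.asarray(points_3d, dtype=float)
    z_avg = p[:, 2].mean()
    return f * p[:, :2] / z_avg

pts = np.array([[0.1, 0.2, 10.0], [0.15, -0.1, 10.5], [-0.2, 0.05, 9.8]])
print(perspective_project(pts, f=800.0))
print(weak_perspective_project(pts, f=800.0))
```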

Example of CMU head tracking

Example of the CMU S2 3D head tracking, including

re-registration after losing the head.

Tracking

- Parametric Face Modeling and Tracking
- Definitions and samples of current works
- Previous work on face tracking
- Methods for parametric face modeling
- Tracking Strategies
- Motion-based and Model-based
- Classification
- First-frame
- Statistical
- Appearance based
- Feature based
- Example of a first-frame model-based and feature-based tracker
- Conclusions on face tracking

Tracking Strategies

- A face tracking system estimates the rigid or

nonrigid motion of a face through a sequence of

image frames. - Tracking systems can be said to be either

motion-based or model-based, sometimes referred

to as feed-forward or feed-back motion

estimation.

Motion-based tracker

- A motion-based tracker estimates the displacements of pixels (or blocks of pixels) from one frame to another. The displacements might be estimated using optical flow methods (giving a dense optical flow field), block-based motion estimation methods (giving a sparse field but using less computation power), or motion estimation in a few image patches only (giving a few motion vectors but at a very low computational cost).
- The estimated motion field is then used to compute the motion of the object model. The motion estimation in such a method consequently depends on the pixels in two frames; the object model is used only for transforming the 2D motion vectors to 3D object model motion. The problem with such methods is drifting, also called the long-sequence motion problem: a tracker of this kind accumulates motion errors and eventually loses track of the face.

Model-based trackers

- A model-based tracker, on the other hand, uses a model of the object's appearance and tries to change the object model's pose (and possibly shape) parameters to fit the new frame. The motion estimation is thus dependent on the object model and the new frame; the old frame is not regarded except for constraining the search space.
- Such a tracker does not suffer from drifting; instead, problems arise when the model is not strong or flexible enough to cope with the situation in the new frame.

First-frame Model-based Trackers

- In general, the word "model" refers to any prior knowledge about the 3D structure, the 3D motion/dynamics, and the 2D facial appearance.
- First-frame models: one of the main issues when designing a model-based tracker is the appearance model. An obvious approach is to capture a reference image of the object at the beginning of the sequence.
- The image can then be geometrically transformed according to the estimated motion parameters, so one can compensate for changes in scale and rotation (and possibly nonrigid motion).
- Because the image is captured, the appearance model is deterministic, object-specific and accurate.
Statistical-based Model-based Trackers

- A drawback of such a first-frame model is the lack of flexibility: it is difficult to generalize from one sample only. Another property is that the tracker does not know what it is tracking.
- A different approach is a statistical model-based tracker. Here, the appearance model relies on previously captured images combined with knowledge of which parts or positions of the images correspond to the various facial features. When the model is transformed to fit the new frame, we thus obtain information about the estimated positions of those specific facial features.

Appearance-based and Feature-based Tracking

- The problem of finding the optimal parameters is a high-dimensional search problem and thus of high computational complexity. By using clever heuristics (e.g. active appearance models), we can reduce the search time.
- An appearance-based or featureless tracker matches a model of the entire facial appearance with the input image, trying to exploit all available information in the model as well as the image.
- A feature-based tracker, on the other hand, chooses a few facial features that are, supposedly, easily and robustly tracked. Features such as color, specific points or patches, and edges can be used.
- Typically, a tracker based on feature points tries, in the rigid-motion case, to estimate the 2D positions of a set of points and, from these points, to compute the 3D pose of the face.

EXAMPLE: Feature-based Tracking

- The tracker described next tracks a set of feature points in an image sequence and uses the 2D measurements to calculate the 3D structure and motion of the head.
- The tracker is based on the structure-from-motion (SfM) algorithm by Azarbayejani and Pentland. The face tracker was then developed by Jebara and Pentland, and further by Strom et al.
- The tracker estimates the 3D pose and structure of a rigid object as well as the camera's focal length. In the terminology above, it is a first-frame model-based and feature-based tracker.

EXAMPLE: Face Model Parameterization

- The tracker designed by Jebara and Pentland estimated a model as a set of points with no surface. Strom et al. extended the system to include a wireframe face model. A set of feature points is placed on the surface of the model, not necessarily coinciding with the model vertices. The face model gives the system several advantages, such as being able to predict the surface angle relative to the camera as well as self-occlusion. Thus the tracker can predict when some measurements should not be trusted. The face model used by Strom was a modified version of Candide.
- The pose in the k-th frame is parameterized with three rotation angles (rx, ry, rz), three translation parameters (tx, ty, tz), and the inverse focal length F = 1/f of the camera. In practice, the z-translation should be parameterized by the product tz·F instead of tz, for stability reasons.
- The structure of the face is represented by the image coordinates (u0, v0) and the depth values z0 of the feature points in the first frame.

Example: Extended Kalman Filtering and Structure from Motion

- A Kalman filter is used to estimate the dynamic changes of a state vector of which only a function can be observed. When the function is nonlinear, we must use an extended Kalman filter (EKF).
- The tracker must be initialized, for example by letting the user place his head in a certain position facing the camera, or by using a face detection algorithm. The model texture is captured from the image and stored as a reference, and feature points are automatically extracted. To select feature points that can be reliably tracked, points where the determinant of the Hessian is large are used (see the sketch below). The determinant is weighted by the cosine of the angle between the model surface normal and the camera direction. The number of feature points to select is limited only by the available computational power and the real-time requirements. At least seven points are needed for the tracker to work, and more are preferable. Strom used 24 feature points and was able to achieve real-time performance.
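The feature-point selection criterion described above can be sketched as follows. This is an illustration only: it assumes a per-pixel normal map rendered from the face model, and select_feature_points and its parameters are invented names, not Strom's code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def select_feature_points(image, surface_normals, view_dir, n_points=24, sigma=2.0):
    """
    Score pixels by the determinant of the image Hessian (Ixx*Iyy - Ixy^2), weight each
    score by the cosine of the angle between the model surface normal at that pixel and
    the camera direction, and keep the n_points best candidates.
    'image' is grayscale (H, W); 'surface_normals' is (H, W, 3) unit normals from the
    rendered face model; 'view_dir' is a unit vector pointing toward the camera.
    """
    img = np.asarray(image, dtype=float)
    ixx = gaussian_filter(img, sigma, order=(0, 2))   # d2/dx2 (x = column direction)
    iyy = gaussian_filter(img, sigma, order=(2, 0))   # d2/dy2
    ixy = gaussian_filter(img, sigma, order=(1, 1))
    det_hessian = ixx * iyy - ixy ** 2

    cos_angle = np.clip(surface_normals @ np.asarray(view_dir, dtype=float), 0.0, 1.0)
    score = det_hessian * cos_angle

    flat = np.argsort(score.ravel())[::-1][:n_points]
    rows, cols = np.unravel_index(flat, score.shape)
    return np.stack([cols, rows], axis=1)   # (x, y) positions of the selected points
```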

Example: Tracking Process

- Using the face model and the values from the normalized template matching, the measurement noise covariance matrix can be estimated, making the Kalman filter rely on some measurements more than others.
- Note that this also tells the Kalman filter in which directions in the image the measurements are reliable. For example, a feature point on an edge (e.g. the mouth outline) can be reliably placed in the direction perpendicular to the edge, but less reliably along the edge.

Example: Tracking Process

Patches from the rendered image (lower left) are matched with the incoming video. The two-dimensional feature point trajectories are fed through the structure-from-motion (SfM) extended Kalman filter, which estimates the pose information needed to render the next model view. For clarity, only 4 of the 24 patches are shown.

Tracking Results

Tracking results on two test sequences. Every

tenth frame is shown.

Example: Tracking Results

- The initial test shows that the system is able to track a previously unseen person in a subjectively accurate way. Some important issues to be addressed are:
- Speed: can the system run in real time?
- Robustness: can the system cope with varying illumination, facial expressions, and large head motion? Apparently, track is sometimes lost. One way to increase robustness is to combine the tracker with a feature-based step. To improve robustness to varying illumination conditions, an illumination basis could be added to the texture parameterization.
- Accuracy: how accurate is the tracking? Ahlberg and Forchheimer describe a system that tracks a synthetic sequence where the true parameters are known.

Example: Tracking Optimization

- To optimize the algorithm, three potentially time-consuming parts within each iteration need to be taken care of:
- Shape normalization: use dedicated graphics hardware for texture mapping, or perform certain parts of the computation offline.
- Analysis-synthesis: the projection of the shape-normalized input image onto the texture modes and the generation of the model texture have a complexity that grows linearly with the number of texture modes used.
- Residual image and update vector computation: the complexity grows linearly with the number of parameters to extract. However, it can be performed very quickly by exploiting the vector instructions available in many modern CPUs.

Conclusions for Face Tracking

- Whereas motion-based trackers may suffer from

drifting, model-based trackers do not have that

problem. - Appearance and feature-based trackers follow

different basic principles and have different

characteristics.

PART III

- Modeling Shape and Changes in the Texture
- Parametric Face Modeling and Tracking
- Illumination Modeling

Illumination Modeling

- Changes in lighting can produce large variability in the appearance of faces. One way to measure the difficulty presented by lighting, or any variability, is the number of degrees of freedom needed to describe it.
- For example, the pose of a face relative to the camera has six degrees of freedom: three rotations and three translations. Facial expression has tens of degrees of freedom if one considers the number of muscles that may contract to change expression.
Illumination Modeling

- To describe the light that strikes a face, we must describe the intensity of light hitting each point on the face from each direction.
- Light is a function of position and direction, meaning that light has an infinite number of degrees of freedom. However, effective systems can account for the effects of lighting using fewer than 10 degrees of freedom. This can have considerable impact on the speed and accuracy of recognition systems.
- Support for low-dimensional models is both empirical and theoretical. Principal component analysis (PCA) on images of a face obtained under various lighting conditions shows that this image set is well approximated by a low-dimensional, linear subspace of the space of all images. Experimentation shows that algorithms that take advantage of this observation can achieve high performance.

Illumination Modeling

- An alternate stream of work attempts to compensate for lighting effects without the use of 3D face models. This work directly matches 2D images using representations of images that are found to be insensitive to lighting variations.
- These include image gradients, Gabor jets, the direction of image gradients, and projections onto subspaces derived from linear discriminants.
- These methods are certainly of interest, especially for applications in which 3D face models are not available. However, methods based on 3D models may be more powerful, as they have the potential to compensate completely for lighting changes, whereas 2D methods cannot achieve such invariance.

Illumination Modeling

- Building truly accurate models of the way the face reflects light is a complex task. This is in part because skin is not homogeneous: light striking the face may be reflected by oils or water on the skin, by melanin in the epidermis, or by hemoglobin in the dermis.
- Based on empirical measurements of skin, Marschner et al. state that the BRDF (bidirectional reflectance distribution function) itself is quite unusual: at small incidence angles it is almost Lambertian, but at higher angles strong forward scattering emerges.
- Furthermore, light entering the skin at one point may scatter below the surface of the skin and exit from another point. This phenomenon, known as subsurface scattering, cannot be modeled by a BRDF, which assumes that light leaves a surface from the point at which it strikes it. Jensen et al. presented one model of subsurface scattering.

Illumination Modeling

- For purposes of realistic computer graphics, this complexity must be confronted in some way. For example, Borshukov and Lewis reported that in The Matrix Reloaded, they began by modeling face reflectance using a Lambertian diffuse component and a modified Phong model to account for a Fresnel-like effect. As production progressed, it became increasingly clear that realistic skin rendering couldn't be achieved without subsurface scattering simulations.

Illumination Modeling

- However, simpler models may be adequate for face recognition. This suggests that even if one wishes to model face reflectance more accurately, simple models may provide useful, approximate algorithms that can initialize more complex ones.
- Thus, one can discuss an analytically derived representation of the images produced by a convex, Lambertian object illuminated by distant light sources. Restricting consideration to convex objects lets us ignore the effect of shadows cast by one part of the object on another part of it.
- One can also assume that the surface of the object reflects light according to Lambert's law, which states that the material absorbs and reflects light uniformly in all directions (a sketch of this reflectance model is given below).
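A minimal sketch of Lambert's law for a convex object under a single distant light source (function and parameter names are illustrative):

```python
import numpy as np

def lambertian_intensity(normals, albedo, light_dir, light_intensity=1.0):
    """
    Lambertian reflectance: the intensity reflected at a surface point is
        I = albedo * light_intensity * max(0, n . l),
    independent of the viewing direction. 'normals' is an (N, 3) array of unit
    surface normals, 'albedo' a scalar or length-N array, and 'light_dir' a unit
    vector toward the light. The max(0, .) term zeroes out attached shadows
    (points facing away from the light).
    """
    n_dot_l = np.asarray(normals, dtype=float) @ np.asarray(light_dir, dtype=float)
    return np.asarray(albedo, dtype=float) * light_intensity * np.maximum(n_dot_l, 0.0)

# Toy usage: three surface normals lit from the z direction.
normals = np.array([[0.0, 0.0, 1.0],
                    [0.0, 0.70710678, 0.70710678],
                    [0.0, 0.0, -1.0]])
print(lambertian_intensity(normals, albedo=0.8, light_dir=[0.0, 0.0, 1.0]))
```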

Illumination Modeling

- Other researchers (Z. Zhang, Microsoft Research) address face re-lighting from a single image under harsh lighting conditions, and model synthetic illumination/reflection conditions.
- Left: real image; right: synthetic image.

Conclusions

- This lecture presented topics in:
- Modeling Shape and Changes in the Texture (2D modeling)
- Parametric Face Modeling and Tracking (3D modeling)
- Illumination Modeling