Modeling Facial Shape and Appearance - PowerPoint PPT Presentation

Slides: 76
Provided by: hom4267


Transcript and Presenter's Notes

Title: Modeling Facial Shape and Appearance

Modeling Facial Shape and Appearance
  • Shape and Changes in the Texture
  • Parametric Face Modeling and Tracking
  • Illumination Modeling

Modeling Facial Shape and Appearance
  • To interpret images of faces, it is important to
    have a model of how the face can appear.
  • Changes can be broken down into two parts:
    changes in shape and changes in texture
    (patterns of pixel values) across the face.
  • The lecture describes a powerful method of
    generating compact models of shape and texture
    variation and describes how such models can be
    used to interpret images of faces.

Statistical Shape Analysis
  • Statistical shape analysis is a geometrical
    analysis of a set of shapes in which statistics
    are measured to describe the geometrical
    properties of similar shapes or of different
    groups, for instance, the difference between
    face or hand


Example - Hands
  • Training set
  • By varying the first three parameters of the
    shape vector, one at a time, one can demonstrate
    some of the modes of variation allowed by the
    model (http//
  • Each row is obtained by varying one parameter
    and fixing the others at zero

  • Modeling Shape and Changes in the Texture
  • Statistical Models (Appearance, Shape)
  • Procrustes analysis for aligning set of shapes
  • Statistical Models of Variation and Texture
  • Fitting model to new points
  • Active Shape Models
  • Parametric Face Modeling and Tracking
  • Illumination Modeling

Statistical Models of Appearance
  • To build models of facial appearance and its
    variation one can adopt a statistical approach,
    learning the ways in which the shape and texture
    of the face vary across a range of images.
  • The method relies on obtaining a suitably large,
    representative training set of facial images,
    each of which is annotated with a set of feature
    points defining correspondences across the set.
  • The positions of the feature points are used to
    define the shape of the face and are analyzed to
    learn the ways in which the shape can vary.
  • The patterns of intensities are then analyzed to
    learn the ways in which the texture can vary.

Statistical Shape Models
  • Building a statistical model requires a set of
    training images. The set should be chosen so it
    covers the types of variation one wishes the
    model to represent.
  • For instance, if we are interested only in faces
    with neutral expressions, we should include only
    neutral expressions in the model.
  • If, however, we wish to be able to synthesize and
    recognize a range of expressions, the training
    set should include images of people smiling,
    frowning, winking and so on.

Statistical Shape Models
  • Another requirement is that each face must be
    annotated with a set of points defining the key
    facial features. These points are used to define
    the correspondences across the training set and
    represent the shape of the face in the image.
    Thus the same number of points should be placed
    on each image, with the same set of labels.
  • The number of such points can vary from a few to
    a few thousand, and they can be 2D or 3D.
Example of 68 points defining facial features.
Aligning Sets of Shapes
  • There is considerable literature on methods of
    aligning shapes into a common coordinate frame,
    the most popular approach being Procrustes
    analysis. This transforms each shape in a set,
    $x_i$, so that the sum of squared distances of
    the shapes to the mean,
    $D = \sum_i |x_i - \bar{x}|^2$, is minimized.
  • It is poorly defined unless constraints are
    placed on the alignment of the mean (for
    example, ensuring that it is centered on the
    origin and has unit scale and some fixed but
    arbitrary orientation).

Procrustes Analysis
  • Procrustes analysis is a form of statistical
    shape analysis used to analyse the distribution
    of a set of shapes. Procrustes refers to a
    character from Greek mythology who made his
    victims fit his bed either by stretching their
    limbs or cutting them off.
  • Here we consider objects made up from a finite
    number $k$ of points in $n$ dimensions. The shape
    of an object can be considered as a member of an
    equivalence class formed by removing the
    translational, rotational and scaling components.
  • For example, translational components can be
    removed from an object by translating the object
    so that the mean of all the points lies at the
    origin.
  • Likewise, the scale component can be removed by
    scaling the object so that the sum of the squared
    distances from the points to the origin is 1.
    The process finds the size of the object,
    $s = \sqrt{\sum_{i=1}^{k} (x_i^2 + y_i^2)}$ for
    centered 2D points $(x_i, y_i)$, and divides the
    points by the scale, giving $(x_i/s, y_i/s)$.
Procrustes Analysis
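  • Removing the rotational component is more
    complex. Consider two objects with scale and
    translation removed, with points $(x_i, y_i)$
    and $(w_i, z_i)$. Fix one of these and rotate
    the other around the origin so that the sum of
    the squared distances between the points is
    minimised. A rotation by angle $\theta$ gives
    $(u_i, v_i) = (w_i \cos\theta - z_i \sin\theta,\; w_i \sin\theta + z_i \cos\theta)$.
  • The Procrustes distance is
    $d = \sqrt{\sum_{i=1}^{k} \left[ (u_i - x_i)^2 + (v_i - y_i)^2 \right]}$.
  • The distance can be minimised by using a least
    squares technique to find the angle $\theta$ which
    gives the minimum distance.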
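
The three Procrustes steps (remove translation, remove scale, find the best rotation) can be sketched for 2D shapes as follows. This is a minimal illustration, not the lecture's implementation: the function names are made up for the example, the fixed shape is written as points (x, y) and the rotated one as (w, z), and the angle is taken from the closed-form least-squares solution theta = atan2(sum(w*y - z*x), sum(w*x + z*y)).

```python
import math

def center_and_scale(points):
    """Remove translation and scale: centroid at origin, sum of squared distances = 1."""
    k = len(points)
    mx = sum(x for x, _ in points) / k
    my = sum(y for _, y in points) / k
    centered = [(x - mx, y - my) for x, y in points]
    s = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / s, y / s) for x, y in centered]

def optimal_rotation(fixed, moving):
    """Angle that minimises the sum of squared distances after rotating `moving`."""
    num = sum(w * y - z * x for (x, y), (w, z) in zip(fixed, moving))
    den = sum(w * x + z * y for (x, y), (w, z) in zip(fixed, moving))
    return math.atan2(num, den)

def procrustes_distance(a, b):
    """Return (distance, angle) after aligning shape b to shape a."""
    fixed, moving = center_and_scale(a), center_and_scale(b)
    theta = optimal_rotation(fixed, moving)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * w - s * z, s * w + c * z) for w, z in moving]
    d = math.sqrt(sum((u - x) ** 2 + (v - y) ** 2
                      for (u, v), (x, y) in zip(rotated, fixed)))
    return d, theta
```

Two copies of the same shape that differ only by translation, scale and rotation should give a distance close to zero, and the returned angle undoes the rotation between them.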

Iterative Aligning Sets of Shapes
Statistical Models of Variation
  • Suppose we have $s$ sets of $n$ points $x_i$ in
    $d$ dimensions (usually two or three) that are
    aligned into a common coordinate frame.
  • These vectors form a distribution in
    $nd$-dimensional space. If we can model this
    distribution, we can generate new examples
    similar to those in the original training set,
    and we can examine new shapes to determine if
    they are plausible examples.

Statistical Models of Variation
  • The approach is as follows:
  • Compute the mean of the data,
    $\bar{x} = \frac{1}{s} \sum_{i=1}^{s} x_i$.
  • Compute the covariance of the data,
    $S = \frac{1}{s-1} \sum_{i=1}^{s} (x_i - \bar{x})(x_i - \bar{x})^T$.
  • Compute the eigenvectors $\Phi_i$ and
    corresponding eigenvalues $\lambda_i$ of $S$
    (sorted so $\lambda_i \geq \lambda_{i+1}$).
    Efficient methods of computing the eigenvectors
    and values exist for the case in which there are
    fewer samples than dimensions in the vectors.
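
A minimal sketch of these three steps is given below. It assumes each aligned shape has already been flattened into one vector, and it substitutes power iteration for a full eigen-decomposition, so it only recovers the single most significant mode; a real model would normally use a linear algebra library to get all modes. The function names are illustrative.

```python
import math

def mean_vector(samples):
    """Component-wise mean of the sample vectors."""
    s = len(samples)
    return [sum(col) / s for col in zip(*samples)]

def covariance(samples):
    """Sample covariance matrix S with the 1/(s-1) normalisation."""
    s, m = len(samples), mean_vector(samples)
    dim = len(m)
    cov = [[0.0] * dim for _ in range(dim)]
    for x in samples:
        d = [xi - mi for xi, mi in zip(x, m)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / (s - 1)
    return cov

def leading_mode(cov, iterations=200):
    """Power iteration: approximate the eigenvector with the largest eigenvalue."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient v^T S v gives the matching eigenvalue.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim))
                     for i in range(dim))
    return eigenvalue, v
```

Power iteration assumes the starting vector is not orthogonal to the leading eigenvector and that the covariance is not all zeros; both hold for typical training data but are not checked here.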

Face Shape Variation
  • The figure shows the two most significant modes
    of face shape variation of a model built from
    examples of a single individual with different
    viewpoints and expressions. The model has learned
    that the 2D shape change caused by 3D head
    rotation causes the largest shape change.

Two modes of a face shape model (parameters
varied by ±2 standard deviations from the mean).
Statistical Models of Texture
  • To build a statistical model of the texture
    (intensity or color over an image patch) one can
    warp (modify) each example image so its feature
    points match a reference shape (typically the
    mean shape).
  • The warping can be achieved by using any
    continuous deformation, such as a piece-wise
    affine warp using a triangulation of the region
    or an interpolating spline. Warping to a
    reference shape removes spurious texture
    variation due to shape differences that would
    occur if we simply performed eigenvector
    decomposition on the un-normalized face patches
    (as in the eigenface approach).
  • The intensity information is sampled from the
    shape-normalized image over the region covered by
    the mean shape to form a texture vector $g_{im}$.
  • Although the main shape changes due to smiling
    have been removed, there is considerable texture
    difference from a purely neutral face. By varying
    the elements of the texture parameter vector
    $b_g$ within limits learned from the training
    set, one can generate a variety of plausible
    shape-normalized face textures.
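
The piece-wise affine warp mentioned above can be illustrated for a single triangle. In this sketch a point is expressed in barycentric coordinates of a source triangle, and the same coordinates are then applied to the destination triangle. The function names and the single-triangle setup are assumptions made for illustration; a full warp would repeat this for every triangle of the mesh.

```python
def barycentric(p, tri):
    """Barycentric coordinates (a, b, c) of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    b = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return a, b, 1.0 - a - b

def warp_point(p, src_tri, dst_tri):
    """Map p from src_tri to the corresponding position in dst_tri (affine)."""
    a, b, c = barycentric(p, src_tri)
    x = a * dst_tri[0][0] + b * dst_tri[1][0] + c * dst_tri[2][0]
    y = a * dst_tri[0][1] + b * dst_tri[1][1] + c * dst_tri[2][1]
    return x, y
```

When building a texture vector, one would typically loop over the pixels inside the mean shape and use this mapping to look up the matching position in the example image, so that every output pixel receives a value.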

Example of a labeled face image and the face
patch warped into the mean shape.
Fitting the Model to New Points
  • Goal: find the best pose and shape parameters
    to match a model instance $x$ to a new set of
    image points $Y$.
  • Minimizing the sum of squared distances between
    corresponding model and image points is
    equivalent to minimizing the expression
    $E = |Y - S_t(\bar{x} + \Phi_s b_s)|^2$.
  • More generally, one can allow different weights
    for different points. Here $S_t$ is the global
    (pose) transformation with parameters $t$, $b_s$
    is the vector of shape parameters, and $\Phi_s$
    is the matrix of shape modes.
  • If the allowed global transformation $S_t(\cdot)$
    is more complex than a simple translation, this
    is a nonlinear equation with no analytic
    solution. However, a good approximation can be
    found rapidly using a two-stage iterative
    approach:
  • Solve for the pose parameters $t$ assuming a
    fixed shape $b_s$.
  • Solve for the shape parameters $b_s$, assuming a
    fixed pose.
  • Repeat until convergence.
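
The two-stage iteration can be sketched for a toy 2D model. Several simplifications here are assumptions of the example rather than part of the lecture: each point is stored as a complex number, the pose is a similarity transform y = a*x + t (complex a carries scale and rotation), the model has a single unit-length shape mode, and a fixed number of iterations replaces a convergence test.

```python
import cmath

def fit_pose(model_pts, image_pts):
    """Stage 1: least-squares similarity transform y = a*x + t for a fixed shape."""
    n = len(model_pts)
    xm = sum(model_pts) / n
    ym = sum(image_pts) / n
    num = sum((x - xm).conjugate() * (y - ym) for x, y in zip(model_pts, image_pts))
    den = sum(abs(x - xm) ** 2 for x in model_pts)
    a = num / den
    return a, ym - a * xm

def fit_shape(mean, mode, image_pts, a, t):
    """Stage 2: undo the pose, then project the residual onto the shape mode."""
    residual = [(y - t) / a - m for y, m in zip(image_pts, mean)]
    return sum((p.conjugate() * r).real for p, r in zip(mode, residual))

def fit_model(mean, mode, image_pts, iterations=20):
    b = 0.0  # start from the mean shape
    for _ in range(iterations):
        instance = [m + b * p for m, p in zip(mean, mode)]
        a, t = fit_pose(instance, image_pts)
        b = fit_shape(mean, mode, image_pts, a, t)
    return a, t, b

# Toy data: a square mean shape and one hypothetical mode that widens and flattens it.
mean = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
mode = [m.conjugate() / 8 ** 0.5 for m in mean]  # unit length
true_a, true_t, true_b = cmath.rect(1.5, 0.3), 2 + 1j, 0.5
image_pts = [true_a * (m + true_b * p) + true_t for m, p in zip(mean, mode)]
a, t, b = fit_model(mean, mode, image_pts)
```

On this noise-free toy data the recovered a, t and b should match the values used to generate the image points. With several modes the shape step becomes a projection onto each column of the mode matrix.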

Active Shape Models (ASM)
  • We assume we have an initial estimate for the
    pose and shape parameters (e.g. the mean shape).
    This is iteratively updated as follows:
  •   Look along normals through each model point to
    find the best local match for the model of the
    image appearance at that point (e.g. strongest
    nearby edge)
  •   Update the pose and shape parameters to best
    fit the model instance to the found points
  •   Repeat until convergence
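
The first step of the loop, searching along the normal for the strongest nearby edge, can be sketched as below. The sketch assumes the image has already been sampled into a 1D list of intensities centred on the model point, and it uses the central-difference gradient magnitude as the edge measure; the function names are illustrative.

```python
def strongest_edge_offset(profile):
    """Offset (in samples, relative to the profile centre) of the strongest edge."""
    centre = len(profile) // 2
    best_i = max(range(1, len(profile) - 1),
                 key=lambda i: abs(profile[i + 1] - profile[i - 1]))
    return best_i - centre

def move_along_normal(point, normal, offset, step=1.0):
    """Suggested new position: shift the model point along its unit normal."""
    return (point[0] + offset * step * normal[0],
            point[1] + offset * step * normal[1])
```

The suggested positions for all model points form the target point set to which the pose and shape parameters are then fitted. Because the profile has a fixed length, an edge lying beyond its ends cannot be found, which is the failure mode of a purely local search.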

Example of ASM failing
  • The figure demonstrates the Active Shape Model
    (ASM) failing. The main facial features have been
    found, but the local models searching for the
    edges of the face have failed to locate their
    correct positions, perhaps because they are too
    far away. The ASM is a local method and prone to
    local minima.

Example of ASM search failure. The search
profiles are not long enough to locate the edges
of the face.
Multiresolution Models
  • The performance can be significantly improved
    using a multi-resolution implementation, in which
    we start searching on a coarse level of a
    Gaussian image pyramid, and progressively refine.
  • If a facial appearance model is trained on a
    sufficiently general set of data, it is able to
    synthesize faces similar to those in target
    images. If we can find the model parameters that
    generate a face similar to the target, those
    parameters imply the position of the facial
    features and can be used directly for face
  • Both models and update matrices can be estimated
    at a range of image resolutions (training on a
    Gaussian image pyramid). We can then use a
    multiresolution search algorithm in which we
    start at a coarse resolution and iterate to
    convergence at each level before projecting the
    current solution to the next level of the model.
    This is more efficient and can converge to the
    correct solution from further away than search at
    a single resolution.

Multiresolution Active Shape Models
  • To improve the efficiency and robustness of the
    algorithm, it can be implemented in a
    multiresolution framework.
  • This involves first searching for the object in a
    coarse image and then refining the location in a
    series of finer resolution images.
  • This leads to a faster algorithm and one that is
    less likely to get stuck on the wrong image
  • Local models for each point are trained on each
    level of a Gaussian image pyramid.
  • The Gaussian Pyramid is a hierarchy of low-pass
    filtered versions of the original image, such
    that successive levels correspond to lower

Search along sampled profile to find best fit of
gray-level model.
Example of face modeling using the active
multi-resolution method
Example of multi-resolution approach at highest
resolution. Left to right: initial, after 5
iterations, final model
  • Open questions regarding the models include
  • How does one obtain accurate correspondences
    across the training set?
  • What is the optimal choice of model size and
    number of model modes?
  • What representation of image structure should be
  • What is the best method for matching the model to
    the image?

  • Modeling Shape and Changes in the Texture
  • Parametric Face Modeling and Tracking
  • Definitions and samples of modern work
  • Previous work on face tracking
  • Methods for parametric face modeling
  • Tracking Strategies
  • Illumination Modeling

Parametric Face Modeling and Tracking
  • In the previous section, models for describing
    the (2D) appearance and geometry of faces were
  • Let us now look at three-dimensional models and
    how they are used for face tracking.
  • Whether we want to analyze a facial image (face
    detection, tracking, recognition) or synthesize
    one (computer graphics, face animation), we need
    a model for the appearance and/or structure of
    the human face.
  • Depending on the application, the model can be
    simple (e.g. just an oval shape) or complex (e.g.
    thousands of polygons in layers simulating bone
    and layers of skin and muscles).
  • We usually wish to control appearance, structure
    and motion of the model with a small number of
    parameters, chosen so as to best represent the
    variability likely to occur in the application.

Parametric Face Modeling and Tracking
  • When analyzing a sequence of images (or frames),
    showing a moving face, the model might describe
    not only the static appearance of the face but
    also its dynamic behavior (i.e. the motion).
  • To be able to execute any further analysis of a
    facial image (e.g. reconstruction), the position
    of the face in the image is helpful, as is the
    pose (i.e. the 3D position and orientation) of
    the face.
  • The process of estimating position and pose
    parameters from each frame in a sequence is
    called tracking.
  • In contrast to face detection, we can utilize the
    knowledge of position, pose and so on, of the
    face in the previous image in the sequence.
  • This section explains the basics of parametric
    face models used for face tracking as well as
    fundamental strategies and methodologies for

Face tracking in digital cameras
  • FotoNation Face Tracker
  • http//

Stereo Face tracking
  • Stereo tracking with two web cameras

Images captured by two cameras are used in self
Stereo Face tracking
  • Affordable 3D Face Tracking Using Projective
  • D.O. Gorodnichy, S. Malik, G. Roth Computational
    Video Group, Ottawa

The StereoTracker at work. The orientation and
scale of the virtual man (at the bottom right) is
controlled by the position of the observed face.
Realistic Face Reconstruction and 3D Face
  • INRIA MIRAGES Lab research (France)
  • In the very beginning the user creates, for each
    image, a camera which is then manually positioned
    in front of the image plane so that the
    projection of the generic model matches
    approximately the person's face on this image

Realistic Face Reconstruction and 3D Face
  • INRIA MIRAGES Lab research (France)
  • User manually positions key points on the image
  • Model is adapted to changes

Realistic Face Reconstruction and 3D Face
  • INRIA MIRAGES Lab research (France)
  • Bezier curves (green) drawn by the user and
    computer generated model silhouettes (red)
  • Reconstruction system interface (right)

Tracking through background
Cha Zhang (Microsoft Research) uses background
segmentation for face identification and tracking
Previous Work in Face Tracking
  • A plethora of face trackers are available in the
    literature. They differ in how they model the
    face, how they track changes from one frame to
    the next, if and how changes in illumination and
    structure are handled, if they are susceptible to
    drift, and if real-time performance is possible.
    The presentation here is limited to monocular
    systems (in contrast to stereo-vision) and 3D
  • Li et al. estimated face motion in a simple 3D
    model by a combination of prediction and a model
    based least-squares solution to the optical flow
    constraint equation.
  • LaCascia et al. used a cylindrical face model
    with a parameterized texture being a linear
    combination of texture warping templates and
    orthogonal illumination templates. The 3D head
    pose was derived by registering the texture map
    captured from the new frame with the model
    texture. Stable tracking was achieved via
    regularized, weighted least-squares minimization
    of the registration error.

Previous Work in Face Tracking
  • Malciu et al. used an ellipsoidal textured
    wireframe model and minimized the registration
    error and/or used the optical flow to estimate
    the 3D pose.
  • DeCarlo et al. used a sophisticated face model
    parameterized in a set of deformations. Rigid and
    nonrigid motion was tracked by integrating
    optical flow constraints and edge-based forces,
    thereby preventing drift.
  • Wiles et al. tracked a set of hyperpatches (i.e.
    representations of surface patches invariant to
    motion and changing lighting).
  • Gokturk et al. developed a two-stage approach for
    3D tracking of pose and deformations. The first
    stage learns the possible deformations of 3D
    faces by tracking stereo data. The second stage
    simultaneously tracks the pose and deformation of
    the face in the monocular image sequence using an
    optical flow formulation associated with the
    tracked features. A simple face model using 19
    feature points was utilized.
  • Ahlberg et al. represented the face using a
    deformable wireframe model with a statistical
    texture. The active appearance models were used
    to minimize the registration error. Because the
    model allows deformation, rigid and nonrigid
    motions are tracked.
  • Dornaika et al. extend the tracker with a step
    based on random sampling and consensus to improve
    the rigid 3D pose estimate.

Parametric Face Modeling
  • There are many ways to parameterize and model the
    appearance and behavior of the human face. The
    choice depends on, among other things, the
    application, the available resources, and the
    display device.
  • The many kinds of variability being
    modeled/parameterized include the following:
  • Three-dimensional motion and pose: The dynamic
    3D position and rotation of the head. Tracking
    involves estimating these parameters for each
    frame in the video sequence.
  • Facial action: Facial feature motion such as lip
    and eyebrow motion.
  • Shape and feature configuration: The shape of
    the head, face and the facial features (e.g.
    mouth, eyes). This could be estimated or assumed
    to be known by the tracker.
  • Illumination: The variability in appearance due
    to different lighting conditions.
  • Texture and color: The image pattern describing
    the skin.
  • Expression: Muscular synthesis of emotions
    making the face look happy or sad, for example.

Parametric Face Modeling
  • Parametric Face Modeling and Tracking
  • Definitions and samples of current works
  • Previous work on face tracking
  • Methods for parametric face modeling
  • Eigenfaces
  • Facial Action Coding System
  • MPEG-4 Facial Animation
  • Computer Graphics Models
  • Wireframe models
  • Projection models

PFM Eigenfaces
  • The space spanned by the eigenfaces is called the
    face space.
  • Unfortunately, the manifold (distribution) of
    facial images has a highly nonlinear structure.
  • For face tracking, it has been more popular to
    linearize the face manifold by warping the facial
    images to a standard pose and/or shape, thereby
    creating shape-free, geometrically normalized, or
    shape normalized images and eigenfaces (texture
    templates, texture modes) that can be warped to
    any face shape or texture-mapped onto a wireframe
    face model.

PFM Facial Action Coding System
  • During the 1960s and 1970s, a system for
    parameterizing minimal facial actions was
    developed by psychologists trying to analyze
    facial expressions. The system was called the
    Facial Action Coding System (FACS) and describes
    each facial expression as a combination of
    around 50 action units (AUs). Each AU represents
    the activation of one facial muscle.
  • The FACS has been a popular tool not only for
    psychology studies but also for computerized
    facial modeling. There are also other models
    available in the
FACS Level of Description
FACS itself is purely descriptive and includes no
inferential labels. By converting FACS codes to
EMFACS or similar systems, face images may be
coded for emotion-specified expressions as well
as for more molar categories of positive or
negative emotion.
PFM MPEG-4 Facial Animation
  • MPEG-4, since 1999 an international standard for
    coding and representation of audiovisual objects,
    contains definitions of face model parameters.
    There are two sets of parameters: facial
    definition parameters (FDPs), which describe the
    static appearance of the head, and facial
    animation parameters (FAPs), which describe the
  • The FAPs describe the motion of certain feature
    points, such as lip corners. Points on the face
    model not directly affected by the FAPs are then
    interpolated according to the face model's own
    motion model, which is not defined by MPEG-4
    (complete face models can also be specified and
  • Typically, the FAP coefficients are used as morph
    target weights, provided the face model has a
    morph target for each FAP. The FDPs describe the
    static shape of the face by the 3D coordinates
    of each feature point (MPEG-4 defines 84 feature
    points) and the texture as an image with the
    corresponding texture coordinates.

PFM Computer Graphics Models
  • When synthesizing faces using computer graphics,
    the most common model is a wireframe model or a
    polygonal mesh. The face is then described as a
    set of vertices connected with lines forming
    polygons (usually triangles). The polygons are
    shaded or texture-mapped, and illumination is
    added. The texture could be parameterized or
    fixed; in the latter case, facial appearance is
    changed by moving the vertices only.
  • To achieve life-like animation of the face, a
    large number (thousands) of vertices and polygons
    are commonly used. Each vertex can move in three
    dimensions, so the model requires a large number
    of degrees of freedom. To reduce this number,
    some kind of parameterization is needed.
  • A commonly adopted solution is to create a set of
    morph targets and blend between them. A morph
    target is a predefined set of vertex positions,
    where each morph target represents, for example,
    a facial expression or a viseme.
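
Blending between morph targets can be sketched as a weighted sum of vertex displacements from a neutral face. The delta formulation used here (neutral plus weighted differences) is one common convention and is an assumption of this example, as are the function and variable names.

```python
def blend_morph_targets(neutral, targets, weights):
    """Blend vertex positions: neutral + sum_k w_k * (target_k - neutral)."""
    blended = []
    for i, base in enumerate(neutral):
        vertex = list(base)
        for target, w in zip(targets, weights):
            for axis in range(3):
                vertex[axis] += w * (target[i][axis] - base[axis])
        blended.append(tuple(vertex))
    return blended
```

With this scheme the number of degrees of freedom drops from three per vertex to one weight per morph target, which is the reduction the parameterization is meant to achieve.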

PFM Wireframe Face Model
  • Candide is a simple face model that has been a
    popular research tool for many years. It was
    originally created by Rydfalk and later extended
    by Welsh to cover the entire head (Candide-2) and
    by Ahlberg to correspond better to MPEG-4 facial
    animation (Candide-3). The simplicity of the
    model makes it a good example.
  • Candide is a wireframe model with 113 vertices
    connected by lines forming 184 triangular
    surfaces. The geometry (shape, structure) is
    determined by the 3D coordinates of the vertices
    in a model-centered coordinate system (x, y, z).
    To modify the geometry, Candide-1 and Candide-2
    implement a set of action units from FACS. Each
    action is implemented as a list of vertex
    displacements, an action unit vector, describing
    the change in face geometry when the action unit
    is fully activated.

PFM Projection Models
There are general projection models representing
the camera. The camera parameters may be known
(calibrated camera) or unknown (uncalibrated).
Skew and rotation can sometimes play a role as
well. Perspective projection and weak
perspective projection (an approximation of
perspective projection that applies when depth
variation is small) are used.
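
The two projections can be compared in a short sketch. It assumes a pinhole camera with focal length f looking along the z axis and ignores skew and the principal point offset; the function names are illustrative.

```python
def perspective(point, f):
    """Perspective projection: each point is divided by its own depth."""
    x, y, z = point
    return f * x / z, f * y / z

def weak_perspective(points, f):
    """Weak perspective: all points are divided by the same average depth."""
    z_avg = sum(p[2] for p in points) / len(points)
    return [(f * x / z_avg, f * y / z_avg) for x, y, _ in points]
```

When all points lie at the same depth the two projections agree; the difference grows with the depth variation across the object relative to its distance from the camera.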
Example of CMU head tracking
Example of the CMU S2 3D head tracking, including
re-registration after losing the head.
  • Parametric Face Modeling and Tracking
  • Definitions and samples of current works
  • Previous work on face tracking
  • Methods for parametric face modeling
  • Tracking Strategies
  • Motion-based and Model-based
  • Classification
  • First-frame
  • Statistical
  • Appearance based
  • Feature based
  • Example of first-frame model-based and
    feature-based tracker
  • Conclusions on face tracking

Tracking Strategies
  • A face tracking system estimates the rigid or
    nonrigid motion of a face through a sequence of
    image frames.
  • Tracking systems can be said to be either
    motion-based or model-based, sometimes referred
    to as feed-forward or feed-back motion

Motion-based tracker
  • A motion-based tracker estimates the
    displacements of pixels (or blocks of pixels)
    from one frame to another. The displacements
    might be estimated using optical flow methods
    (giving a dense optical flow field), block-based
    motion estimation methods (giving a sparse field
    but using less computation power), or motion
    estimation in a few image patches only (giving a
    few motion vectors but at a very low
    computational cost).
  • The estimated motion field is then used to
    compute the motion of the object model. The
    motion estimation in such a method is
    consequently dependent on the pixels in two
    frames; the object model is used only for
    transforming the 2D motion vectors to 3D object
    model motion. The problem with such methods is
    drifting, also called the long sequence motion
    problem: a tracker of this kind accumulates
    motion errors and eventually loses track of the
    face.
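
Block-based motion estimation, one of the options listed above, can be sketched as an exhaustive search that minimises the sum of absolute differences (SAD) between a block in the previous frame and candidate blocks in the current frame. The cost function, the search strategy and the function name are assumptions of this example.

```python
def block_match(prev, curr, top, left, size, radius):
    """Find the displacement (dy, dx) of one block by exhaustive SAD search."""
    h, w = len(curr), len(curr[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue  # candidate block falls outside the frame
            cost = sum(abs(prev[top + r][left + c] - curr[y0 + r][x0 + c])
                       for r in range(size) for c in range(size))
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

Because each estimate only relates two consecutive frames, small errors in successive displacements add up over a sequence, which is the drift problem described above.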

Model-based trackers
  • A model-based tracker, on the other hand, uses a
    model of the object's appearance and tries to
    change the object model's pose (and possibly
    shape) parameters to fit the new frame. The
    motion estimation is thus dependent on the object
    model and the new frame; the old frame is not
    regarded except for constraining the search
  • Such a tracker does not suffer from drifting;
    instead, problems arise when the model is not
    strong or flexible enough to cope with the
    situation in the new frame.

First-frame Model-based Trackers
  • In general, the word model refers to any prior
    knowledge about the 3D structure, the 3D
    motion/dynamics and the 2D facial appearance.
  • First-frame models: One of the main issues when
    designing a model-based tracker is the appearance
    model. An obvious approach is to capture a
    reference image of the object at the beginning of
    the sequence.
  • The image could then be geometrically transformed
    according to the estimated motion parameters, so
    one can compensate for changes in scale and
    rotation (and possibly nonrigid motion).
  • Because the image is captured, the appearance
    model is deterministic, object-specific and
    accurate.

Statistical-based Model-based Trackers
  • A drawback with such a first-frame model is the
    lack of flexibility: it is difficult to
    generalize from one sample only. Another property
    is that the tracker does not know what it is
    tracking.
  • A different approach is a statistical model-based
    tracker. Here, the appearance model relies on
    previously captured images combined with
    knowledge of which parts or positions of the
    images correspond to the various facial features.
    When the model is transformed to fit the new
    frame, we thus obtain information about the
    estimated positions of those specific facial
    features.
Appearance-based and Feature-based Tracking
  • The problem of finding the optimal parameters is
    a high-dimensional search problem and thus of
    high computational complexity. By using clever
    heuristics (e.g., the active appearance models),
    we can reduce the search time.
  • An appearance-based or featureless tracker
    matches a model of the entire facial appearance
    with the input image, trying to exploit all
    available information in the model as well as the
  • A feature-based tracker, on the other hand,
    chooses a few facial features that are,
    supposedly, easily and robustly tracked. Features
    such as color, specific points or patches, and
    edges can be used.
  • Typically, a tracker based on feature points
    tries, in the rigid motion case, to estimate the
    2D position of a set of points and from these
    points compute the 3D pose of the face.
EXAMPLE Feature-based Tracking
  • The tracker described next tracks a set of
    feature points in an image sequence and uses the
    2D measurements to calculate the 3D structure and
    motion of the head.
  • The tracker is based on the structure from motion
    (SfM) algorithm by Azarbayejani and Pentland. The
    face tracker was then developed by Jebara and
    Pentland and further by Strom et al.
  • The tracker estimates the 3D pose and structure
    of a rigid object as well as the camera's focal
    length. With the terminology above, it is a
    first-frame model-based and feature-based tracker.

EXAMPLE Face Model Parameterization
  • The tracker designed by Jebara and Pentland
    estimated a model as a set of points with no
    surface. Strom et al. extended the system to
    include a wireframe face model. A set of feature
    points are placed on the surface of the model,
    not necessarily coinciding with the model
    vertices. The face model gives the system several
    advantages, such as being able to predict the
    surface angle relative to the camera as well as
    self-occlusion. Thus the tracker can predict when
    some measurements should not be trusted. The face
    model used by Strom was a modified version of
  • The pose in the kth frame is parameterized with
    three rotation angles $(r_x, r_y, r_z)$, three
    translation parameters $(t_x, t_y, t_z)$, and the
    inverse focal length $F = 1/f$ of the camera. In
    practice, the z-translation should be
    parameterized by the product $t_z F$ instead of
    $t_z$ for stability reasons.
  • The structure of the face is represented by the
    image coordinates $(u_0, v_0)$ and the depth
    values $z_0$ of the feature points in the first
    frame.

Example Extended Kalman Filtering and Structure
from Motion
  • A Kalman filter is used to estimate the dynamic
    changes of a state vector of which only a
    function can be observed. When the function is
    nonlinear, we must use an extended Kalman filter.
  • The tracker must be initialized, for example, by
    letting the user place his head in a certain
    position and with the face toward the camera or
    by using a face detection algorithm. The model
    texture is captured from the image and stored as
    a reference, and feature points are automatically
    extracted. To select feature points that can be
    reliably tracked, points where the determinant of
    the Hessian,
    $\det(H) = I_{xx} I_{yy} - I_{xy}^2$ (with
    $I_{xx}$, $I_{yy}$, $I_{xy}$ the second
    derivatives of the image intensity),
    is large are used. The determinant is weighted
    with the cosine of the angle between the model
    surface normal and the camera direction. The
    number of feature points to select is limited
    only by the available computational power and the
    real-time requirements. At least seven points are
    needed for the tracker to work, and more are
    preferable. Strom used 24 feature points and was
    able to achieve real-time performance.
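
The feature selection criterion can be sketched as follows. The finite-difference approximation of the second derivatives and the function names are assumptions of this example; the cosine weight is passed in directly because computing it requires the face model and camera geometry.

```python
def hessian_determinant(img, y, x):
    """det(H) = Ixx*Iyy - Ixy^2 at an interior pixel, using finite differences."""
    ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return ixx * iyy - ixy * ixy

def feature_score(img, y, x, cos_angle):
    """Weight the determinant by the cosine of the surface-to-camera angle."""
    return hessian_determinant(img, y, x) * cos_angle
```

An isolated blob gives a large determinant, whereas a straight intensity ramp gives zero, so ramp-like regions are not selected. Surface patches seen at a grazing angle are down-weighted by the cosine factor.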

Example Tracking Process
  • Using the face model and the values from the
    normalized template matching, the measurement
    noise covariance matrix can be estimated making
    the Kalman filter rely on some measurements more
    than others.
  • Note that this also tells the Kalman filter in
    which directions in the image the measurements
    are reliable. For example, a feature point on an
    edge (e.g. the mouth outline) can reliably be
    placed in the direction perpendicular to the edge
    but less reliably along the edge.
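
A small Python sketch of the edge example above. It assumes a single 2D feature point that is observed directly, so it uses a linear Kalman measurement update rather than the extended filter of the actual tracker. The standard deviations are made-up values, and how they would be derived from the template matching is not shown.

```python
def edge_measurement_cov(edge_dir, sigma_along, sigma_perp):
    """2x2 measurement noise covariance for a point on an edge.

    edge_dir is the unit tangent of the edge. The variance is large
    along the edge and small perpendicular to it.
    """
    tx, ty = edge_dir
    nx, ny = -ty, tx  # unit normal of the edge
    a, p = sigma_along ** 2, sigma_perp ** 2
    off = a * tx * ty + p * nx * ny
    return [[a * tx * tx + p * nx * nx, off],
            [off, a * ty * ty + p * ny * ny]]


def kalman_update(x, P, z, R):
    """Linear Kalman measurement update with observation matrix H = I."""
    S = [[P[i][j] + R[i][j] for j in range(2)] for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    # Kalman gain K = P * S^-1 (H is the identity).
    K = [[sum(P[i][k] * S_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    innovation = [z[0] - x[0], z[1] - x[1]]
    return [x[i] + sum(K[i][j] * innovation[j] for j in range(2))
            for i in range(2)]


# Horizontal edge: unreliable along x, reliable along y.
R = edge_measurement_cov((1.0, 0.0), sigma_along=10.0, sigma_perp=0.1)
P = [[1.0, 0.0], [0.0, 1.0]]  # prior uncertainty of the point position
updated = kalman_update([0.0, 0.0], P, [1.0, 1.0], R)
```

The measurement is offset by one pixel in both directions, but the updated estimate moves almost the full pixel perpendicular to the edge and hardly at all along it.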

Example Tracking Process
Patches from the rendered image (lower left) are
matched with the incoming video. The
two-dimensional feature point trajectories are
fed through the structure from motion (SfM)
extended Kalman filter, which estimates the pose
information needed to render the next model view.
For clarity, only 4 of 24 patches are shown.
Tracking Results
Tracking results on two test sequences. Every
tenth frame is shown.
Example Tracking Results
  • The initial test shows that the system is able to
    track a previously unseen person in an
    accurate way. Some important issues to be
    addressed are:
  • Speed: Can the system run in real time?
  • Robustness: Can the system cope with varying
    illumination, facial expressions, and large head
    motion? Apparently, track is sometimes lost. One
    way to increase robustness is to combine the
    tracker with a feature-based step. To improve
    robustness to varying illumination conditions, an
    illumination basis could be added to the texture
  • Accuracy: How accurate is the tracking? Ahlberg
    and Forchheimer describe a system that tracks a
    synthetic sequence where the true parameters are

Example Tracking Optimization
  • To optimize the algorithm, three potentially
    time-consuming parts within each iteration need
    to be taken care of:
  • Shape normalization: This can be sped up by using
    dedicated graphics hardware for texture mapping or
    by performing certain parts of the computation
    offline.
  • Analysis-synthesis: The projection of the
    shape-normalized input image onto the texture
    modes and the generation of the model texture have
    a complexity that grows linearly with the number
    of texture modes used.
  • Residual image and update vector computation: The
    complexity grows linearly with the number of
    parameters to extract. However, it can be
    performed very quickly by exploiting the vector
    instructions available in many modern CPUs.
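
The analysis-synthesis and residual steps listed above can be sketched in Python as follows. The sketch assumes that the texture modes are orthonormal vectors (as they would be if they came from PCA) and that textures are flat lists of pixel values. The function names are hypothetical, and the update vector computation is not shown.

```python
def analyze(x, mean, modes):
    """Project a shape-normalized texture x onto the texture modes.

    Returns one parameter per mode: p_i = mode_i . (x - mean).
    """
    d = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(m_j * d_j for m_j, d_j in zip(mode, d)) for mode in modes]


def synthesize(p, mean, modes):
    """Generate the model texture: mean + sum_i p_i * mode_i."""
    out = list(mean)
    for p_i, mode in zip(p, modes):
        for j, m_j in enumerate(mode):
            out[j] += p_i * m_j
    return out


def residual(x, mean, modes):
    """Difference between the input texture and its model approximation."""
    x_hat = synthesize(analyze(x, mean, modes), mean, modes)
    return [xi - hi for xi, hi in zip(x, x_hat)]


# Toy example: 4-pixel textures and two orthonormal modes.
mean = [1.0, 1.0, 1.0, 1.0]
modes = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
x = [3.0, 1.0, 2.0, 2.0]

params = analyze(x, mean, modes)
res = residual(x, mean, modes)
```

Both analyze and synthesize loop once over every mode and every pixel, which is where the linear growth with the number of texture modes comes from.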

Conclusions for Face Tracking
  • Whereas motion-based trackers may suffer from
    drifting, model-based trackers do not have that
  • Appearance and feature-based trackers follow
    different basic principles and have different

Part III
  • Modeling Shape and Changes in the Texture
  • Parametric Face Modeling and Tracking
  • Illumination Modeling

Illumination Modeling
  • Changes in lighting can produce large variability
    in the appearance of faces. One way to measure the
    difficulties presented by lighting, or any
    variability, is the number of degrees of freedom
    needed to describe it.
  • For example, the pose of a face relative to the
    camera has six degrees of freedom: three rotations
    and three translations. Facial expression has tens
    of degrees of freedom if one considers the number of
    muscles that may contract to change expression.

Illumination Modeling
  • To describe the light that strikes a face, we
    must describe the intensity of light hitting each
    point on the face from each direction.
  • Light is a function of position and direction,
    meaning that light has an infinite number of
    degrees of freedom. However, effective systems
    can account for the effects of lighting using
    fewer than 10 degrees of freedom. This can have
    considerable impact on the speed and accuracy of
    recognition systems.
  • Support for low-dimensional models is both
    empirical and theoretical. Principal Component
    Analysis (PCA) on images of a face obtained under
    various lighting conditions shows that this image
    set is well approximated by a low-dimensional,
    linear subspace of the space of all images.
    Experimentation shows that algorithms that take
    advantage of this observation can achieve high

Illumination Modeling
  • An alternate stream of work attempts to
    compensate for lighting effects without the use
    of 3D face models. This work directly matches 2D
    images using representations of images that are
    found to be insensitive to lighting variations.
  • These include image gradients, Gabor jets, the
    direction of image gradients and projections to
    subspaces derived from linear discriminants.
  • These methods are certainly of interest,
    especially for applications in which 3D face
    models are not available. However, methods based
    on 3D models may be more powerful, as they have
    the potential to compensate completely for
    lighting changes, whereas 2D methods cannot
    achieve such invariance.

Illumination Modeling
  • Building truly accurate models of the way the
    face reflects light is a complex task. This is in
    part because skin is not homogeneous: light
    striking the face may be reflected by oils or
    water on the skin, by melanin in the epidermis,
    or by hemoglobin in the dermis.
  • Based on empirical measurements of skin,
    Marschner et al. state: "The BRDF (Bidirectional
    Reflectance Distribution Function) itself is
    quite unusual; at small incidence angles it is
    almost Lambertian, but at higher angles strong
    forward scattering emerges."
  • Furthermore, light entering the skin at one
    point may scatter below the surface of the skin
    and exit from another point. This phenomenon,
    known as subsurface scattering, cannot be modeled
    by a bidirectional reflectance distribution
    function (BRDF), which assumes that light leaves
    a surface from the same point at which it strikes
    it. Jensen et al. presented one model of
    subsurface scattering.

Illumination Modeling
  • For purposes of realistic computer graphics, this
    complexity must be confronted in some way. For
    example, Borshukov and Lewis reported that in The
    Matrix Reloaded, they began by modeling face
    reflectance using a Lambertian diffuse component
    and a modified Phong model to account for a
    Fresnel-like effect. As production progressed it
    became increasingly clear that realistic skin
    rendering couldn't be achieved without subsurface
    scattering simulations.

Illumination Modeling
  • However, simpler models may be adequate for face
    recognition. This suggests that even if one
    wishes to model face reflectance more accurately,
    simple models may provide useful, approximate
    algorithms that can initialize more complex ones.
  • Thus, one can discuss an analytically derived
    representation of the images produced by a
    convex, Lambertian object illuminated by distant
    light sources. Restricting consideration to
    convex objects means that one can ignore the
    effect of shadows cast by one part of the object
    on another part of it.
  • One can also assume that the surface of the
    object reflects light according to Lambert's law,
    which states that materials absorb light and
    reflect it uniformly in all directions.
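
Lambert's law for one surface point and one distant light can be written as a short Python function. This is a minimal sketch: the function and parameter names are chosen for illustration, and the clamp at zero models the attached shadow on points that face away from the light.

```python
import math


def lambert_intensity(albedo, normal, light_dir, light_strength=1.0):
    """Reflected intensity under Lambert's law.

    intensity = albedo * light_strength * max(0, cos(theta)),
    where theta is the angle between the surface normal and the
    direction to the light. The viewing direction does not appear,
    because the surface looks equally bright from every viewpoint.
    """
    dot = sum(n * l for n, l in zip(normal, light_dir))
    n_len = math.sqrt(sum(n * n for n in normal))
    l_len = math.sqrt(sum(l * l for l in light_dir))
    cos_theta = dot / (n_len * l_len)
    # Points facing away from the light receive no light.
    return albedo * light_strength * max(0.0, cos_theta)


angle = math.radians(60)
head_on = lambert_intensity(0.8, (0, 0, 1), (0, 0, 1))
oblique = lambert_intensity(0.8, (0, 0, 1),
                            (math.sin(angle), 0, math.cos(angle)))
behind = lambert_intensity(0.8, (0, 0, 1), (0, 0, -1))
```

Light arriving head-on returns the full albedo, light at 60 degrees returns half of it, and light from behind the surface returns zero.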

Illumination Modeling
  • Other researchers (Z. Zhang, Microsoft Research)
    deal with Face Re-Lighting from a Single Image
    under Harsh Lighting Conditions and modeling
    synthetic illumination/reflection conditions.
  • Left: real image; right: synthetic image

  • This lecture presented topics in:
  • Modeling Shape and Changes in the Texture (2D
    Modeling)
  • Parametric Face Modeling and Tracking (3D
    Modeling)
  • Illumination Modeling