1
A Non-obtrusive Head Mounted Face Capture System
Chandan K. Reddy
Master's Thesis Defense
  • Thesis Committee

Dr. George C. Stockman (Main Advisor)
Dr. Frank Biocca (Co-Advisor)
Dr. Charles Owen
Dr. Jannick Rolland (External Faculty)
2
Modes of Communication
  • Text only - e.g. Mail, Electronic Mail
  • Voice only - e.g. Telephone
  • PC camera based conferencing - e.g. Web cam
  • Multi-user Teleconferencing
  • Teleconferencing through Virtual Environments
  • Augmented Reality Based Teleconferencing

3
Face-to-Face Communication
"There is no landscape that we know as well as the human face. The twenty-five-odd square inches containing the features is the most intimately scrutinized piece of territory in existence, examined constantly, and carefully, with far more than an intellectual interest." - Gary Faigin
A well-developed face-to-face communication system will advance the state of the art in teleconferencing systems.
The face-to-face communication system is part of the Teleportal project being developed at the MIND Lab at Michigan State University and the ODA Lab at the University of Central Florida.
4
Problem Definition
  • Face Capture System ( FCS )
  • Virtual View Synthesis
  • Depth Extraction and 3D Face Modeling
  • Head Mounted Projection Displays
  • 3D Tele-immersive Environments
  • High Bandwidth Network Connections

5
Thesis Contributions
  • Complete hardware setup for the FCS.
  • Camera-mirror parameter estimation for the
    optimal configuration of the FCS.
  • Generation of quality frontal videos from two
    side videos
  • Reconstruction of texture mapped 3D face model
    from two side views
  • Evaluation mechanisms for the generated frontal
    views.

6
Existing Face Capture Systems

FaceCap3D - a product from Standard Deviation
Optical Face Tracker - a product from Adaptive Optics
Advantages: freedom for head movements
Drawbacks: obstruction of the user's field of view
Main applications: character animation and mobile environments
7
Existing Face Capture Systems

Sea of Cameras (UNC Chapel Hill) - National Tele-immersion Initiative
Advantages: no burden on the user
Drawbacks: highly equipped environments and restricted head motion
Main applications: teleconferencing and collaborative work
8
Virtual View Synthesis
  • View Interpolation for Image Synthesis by Chen
    and Williams 93
  • View Morphing by Seitz and Dyer 96
  • The Lumigraph by Gortler et al 96
  • Light Field Rendering by Levoy and Hanrahan
    96
  • Stereo based View Synthesis by Kanade et al
    99
  • Dynamic View Morphing by Manning and Dyer 99
  • Spatio-Temporal View Interpolation by Vedula
    and Kanade 02

9
Depth Extraction and Face Modeling
  • Depth Extraction
  • Structured Light
  • Shape from Shading
  • Structure from Stereo
  • Structure from Motion
  • Face Modeling
  • A parametric model of human faces Parke 74
  • 3D individualized head model from orthogonal
    views - Ip and Yin 96
  • Realistic facial expressions synthesized from
    photographs - Pighin et al 98
  • Face model from a video sequence of face images -
    Lai and Cheng 01

10
Head Mounted Displays and Tele-immersive
Environments
  • Head Mounted Displays Ivan Sutherland 68
  • VIDEOPLACE Kruger 85
  • CAVES Cruz Neira 93
  • Teleconferencing using a Sea of Cameras Fuchs
    et al 94
  • Head Mounted Projective Displays - Fischer 96
  • Degenerate CAVES (Immersa Desk, Immersive Work
    Bench) Czernuszenko et al 97
  • Office of the Future Raskar et al 98
  • MAGIC BOOK Billinghurst et al 01
  • Mobile Displays Feiner 02

11
Proposed Face Capture System
(F. Biocca and J. P. Rolland, Teleportal
face-to-face system, Patent Filed, 2000.)
A novel face capture system that is being developed; two cameras capture the corresponding side views through the mirrors.
12
Advantages
  • User's field of view is unobstructed
  • Portable and easy to use
  • Gives very accurate, high-quality face images
  • Can process in real-time
  • Simple and user-friendly system
  • Static with respect to the human head
  • By flipping the mirrors, the cameras can view from the user's viewpoint

13
Applications
  • Mobile Environments
  • Collaborative Work
  • Multi-user Teleconferencing
  • Medical Areas
  • Distance Learning
  • Gaming and Entertainment industry
  • Others

14
System Design
15
Equipment Required
16
Transmission using Internet2
  • Over 190 universities working in partnership with industry to develop Internet2
  • Internet2 connections are capable of transmitting full broadcast-quality video streams between remote collaborative sites using MPEG-2 video encoding and decoding technology
  • Suitable for high-bandwidth applications such as medical visualization, teleconferencing, and other applications that use enormous amounts of data
  • An Internet2 test bed has been established between the MIND Lab at Michigan State University and the ODA Lab at the University of Central Florida, implemented using MPEG-2 video streams

17
Optical Layout
  • Three Components to be considered
  • Camera
  • Mirror
  • Human Face

18
Specification Parameters
  • Camera
  • Sensing area: 3.2 mm x 2.4 mm (1/4-inch format).
  • Pixel dimensions: the sensed image is 768 x 494 pixels; the digitized image is 320 x 240 due to RAM size restrictions.
  • Focal length (Fc): 12 mm (VCL-12UVM lens).
  • Field of view (FOV): 15.2° x 11.4°.
  • Diameter (Dc): 12 mm.
  • F-number (Nc): 1, to admit maximum light.
  • Minimum working distance (MWD): 200 mm.
  • Depth of field (DOF): to be estimated (see the note below).

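Since the DOF is left to be estimated, a rough value can be obtained from the standard thin-lens depth-of-field approximation below; the circle-of-confusion value c and the example numbers are assumptions for illustration, not figures from the thesis.

\[ \mathrm{DOF} \approx \frac{2\, N_c\, c\, s^2}{F_c^2} \]

Here s is the object distance along the optical path (roughly Dcm + Dmf) and c is the acceptable circle of confusion on the sensor. For example, assuming Nc = 1, Fc = 12 mm, s ≈ 350 mm, and c ≈ 8 µm, this gives roughly 14 mm, which is why the trade-off between field of view and depth of field mentioned later matters.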
19
Specification Parameters (Contd.)
  • Mirror
  • Diameter (Dm) / F-number (Nm)
  • Focal length (fm)
  • Magnification factor (Mm)
  • Radius of curvature (Rm)
  • Human Face
  • Height of the face to be captured (H = 250 mm)
  • Width of the face to be captured (W = 175 mm)
  • Distances
  • Distance between the camera and the mirror (Dcm = 150 mm)
  • Distance between the mirror and the face (Dmf = 200 mm)

20
Estimation of the variable parameters
The Imaging Equation
The diameter of the mirror: Dm = 26.3² / (10.16 · N)
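The imaging equation referenced above is presumably the standard spherical-mirror relation, restated here in terms of the mirror parameters defined on the previous slide:

\[ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f_m} = \frac{2}{R_m}, \qquad M_m = -\frac{d_i}{d_o} \]

where d_o is the object (face) distance and d_i the image distance measured from the mirror, from which Mm and Rm can be chosen for the given Dcm and Dmf.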
21
Optimal Design Calculations
22
Customization of Cameras and Mirrors
  • Off-the-shelf cameras
  • Customizing camera lens is a tedious task
  • Trade-off has to be made between the field of
    view and the depth of field
  • Sony DXC-LS1 with a 12 mm lens is suitable for our application
  • Custom designed mirrors
  • A plano-convex lens with 40mm diameter is coated
    with black on the planar side.
  • The radius of curvature of the convex surface is
    155.04 mm.
  • The thickness at the center of the lens is 5 mm.
  • The thickness at the edge is 3.7 mm.

23
Block diagram of the system
24
Experimental setup
25
Virtual Video Synthesis
26
Problem Statement
Generating virtual frontal view from two side
views
27
Data processing
  • Two synchronized videos are captured simultaneously in real time (30 frames/sec).
  • For effective capturing and processing, the data is stored in uncompressed format.
  • Machine specifications (Lorelei @ metlab.cse.msu.edu)
  • Pentium III processor
  • Processor speed: 746 MHz
  • RAM size: 384 MB
  • Hard disk write speed (practical): 9 MB/s
  • MIL-Lite is configured to use 150 MB of RAM

28
Data processing (Contd.)
  • Size of 1 second of video: 30 x 320 x 240 x 3 bytes ≈ 6.59 MB (see the sketch below)
  • Using 150 MB of RAM, only about 10 seconds of video from the two cameras can be captured
  • Why does the processing have to be offline?
  • The calibration procedure is not automatic
  • The disk write speed must be at least 14 MB/s
  • To capture two videos at 640 x 480 resolution, the disk write speed must be at least 54 MB/s

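A minimal sketch of the arithmetic behind these figures; the resolutions, frame rate, and 3 bytes/pixel are the values given above, and 1 MB is taken as 2^20 bytes.

```python
def stream_rate_mb_per_s(width, height, fps=30, bytes_per_pixel=3):
    """Data rate of one uncompressed video stream in MB/s (1 MB = 2**20 bytes)."""
    return width * height * bytes_per_pixel * fps / 2**20

per_camera = stream_rate_mb_per_s(320, 240)       # ~6.59 MB per second of video
two_cameras = 2 * per_camera                      # ~13.2 MB/s, so a >= 14 MB/s disk is needed
vga_pair = 2 * stream_rate_mb_per_s(640, 480)     # ~52.7 MB/s, so a >= 54 MB/s disk is needed
buffer_sec = 150 / two_cameras                    # ~11 s of two-camera video fits in 150 MB of RAM

print(f"{per_camera:.2f} MB/s  {two_cameras:.2f} MB/s  {vga_pair:.2f} MB/s  {buffer_sec:.1f} s")
```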
29
Structured Light technique
A grid is projected onto the frontal view of the face.
A square of the grid in the frontal view appears as a quadrilateral (with curved edges) in the real side view.
30
Color Balancing
  • Hardware-based approach
  • White balancing of the cameras
  • Why is this more robust? Why not a software-based approach?
  • There is no change in the input from the camera
  • Better handling of varying lighting conditions
  • No prior knowledge of the skin color is required
  • No additional overhead
  • It is enough if the two cameras are color balanced relative to each other

31
Off-line Calibration Stage
[Diagram: the projector's grid and the left and right calibration face images are used to construct the transformation tables]
32
Calibration Procedure
  • Capture the two side views, with a grid projected on the face, from the two cameras placed near the two ears, and store them as the images IL(s,t) and IR(u,v).
  • Take some grid intersection points and define transform functions for determining the (s,t) coordinates in the left image (IL) and the (u,v) coordinates in the right image (IR).
  • Apply bilinear interpolation to obtain any point inside the grid coordinates.
  • Based on the transformation functions, construct two transformation tables (one for the left image and one for the right) that are indexed by (x,y) and give the corresponding (s,t) of IL and (u,v) of IR (see the sketch below).

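A minimal sketch of how such a transformation table could be built for one side image, assuming the virtual-view grid is regular with square cells; the function names, the regular-cell assumption, and the data layout are illustrative rather than the thesis implementation.

```python
import numpy as np

def bilinear_map(corners, fu, fv):
    """Map fractions (fu, fv) in [0,1]^2 into the quadrilateral whose corners are
    (top-left, top-right, bottom-left, bottom-right), each an (x, y) pair."""
    p00, p10, p01, p11 = (np.asarray(c, dtype=float) for c in corners)
    top = (1 - fu) * p00 + fu * p10        # interpolate along the top edge
    bottom = (1 - fu) * p01 + fu * p11     # interpolate along the bottom edge
    return (1 - fv) * top + fv * bottom    # interpolate between the two edge points

def build_table(grid_points, cell, out_shape):
    """For every pixel (x, y) of the virtual frontal view, store the side-image
    coordinates obtained by bilinear mapping within the grid cell that (x, y)
    falls into.  grid_points[i][j] is the observed side-image location of the
    virtual grid corner in row i, column j (the grid must cover out_shape);
    'cell' is the grid spacing in pixels."""
    h, w = out_shape
    table = np.zeros((h, w, 2), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            i, j = y // cell, x // cell
            fu, fv = (x % cell) / cell, (y % cell) / cell
            corners = (grid_points[i][j],     grid_points[i][j + 1],
                       grid_points[i + 1][j], grid_points[i + 1][j + 1])
            table[y, x] = bilinear_map(corners, fu, fv)
    return table
```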
33
Operational Stage
[Diagram: the left and right face images are warped through the transformation tables into left and right warped face images, which are mosaiced into the final face image]
34
Generation of Virtual Frontal Views
  • Get the two side views, without a grid projected on the face, from the two cameras placed near the two ears (IL and IR).
  • For each (x,y) coordinate in the virtual view, move to the corresponding location in the transformation table and store the mapping Mp(x,y) at that pixel.
  • Reconstruct the (x,y) coordinates of the frontal view (image V) with the help of Mp(x,y) and the values of IL(s,t) and IR(u,v).
  • Smooth the geometric and lighting variations across the vertical midline in V by applying a linear (one-dimensional) filter.
  • Repeat this reconstruction of V(x,y) for every frame of the videos to produce the final virtual frontal video (a sketch of these steps follows).

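A minimal sketch of the per-frame warping and midline smoothing described above, reusing the table layout of the earlier sketch; the nearest-neighbour sampling, the blend width, and the simple linear feathering across the midline are assumptions, not the thesis code.

```python
import numpy as np

def warp(side_image, table):
    """Sample the side image at the (rounded) source coordinates stored in the table."""
    xs = np.clip(table[..., 0].round().astype(int), 0, side_image.shape[1] - 1)
    ys = np.clip(table[..., 1].round().astype(int), 0, side_image.shape[0] - 1)
    return side_image[ys, xs]

def frontal_view(left_img, right_img, left_table, right_table, blend=8):
    """Warp both side views (RGB, H x W x 3) and feather them across the vertical midline."""
    lw = warp(left_img, left_table).astype(float)
    rw = warp(right_img, right_table).astype(float)
    mid = lw.shape[1] // 2
    out = np.empty_like(lw)
    out[:, :mid - blend] = lw[:, :mid - blend]          # pure left half
    out[:, mid + blend:] = rw[:, mid + blend:]          # pure right half
    alpha = np.linspace(0.0, 1.0, 2 * blend)[None, :, None]   # 1-D linear ramp
    out[:, mid - blend:mid + blend] = ((1 - alpha) * lw[:, mid - blend:mid + blend]
                                       + alpha * rw[:, mid - blend:mid + blend])
    return out.astype(np.uint8)
```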
35
Bilinear Mapping
  • To get the corresponding (u,v) point inside the quadrilateral:

The point is computed by linearly interpolating by the fraction u along the top and bottom edges of the quadrilateral, and then linearly interpolating by the fraction v between the two interpolated points to yield the destination point.
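Written out, with P00, P10, P01, P11 denoting the top-left, top-right, bottom-left, and bottom-right corners of the quadrilateral (the corner naming is illustrative), the bilinear mapping is:

\[ P(u,v) = (1-v)\bigl[(1-u)P_{00} + uP_{10}\bigr] + v\bigl[(1-u)P_{01} + uP_{11}\bigr] \]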
36
Virtual video synthesis (Calibration phase)
37
Virtual video synthesis (contd.)
38
Virtual Frontal Video
39
Comparison of the Frontal Views
First row: virtual frontal views. Second row: original frontal views.
40
Video Synchronization (Eye blinking)
First row: virtual frontal views. Second row: original frontal views.
41
Face Data through Head Mounted System
42
3D Face Model
43
Coordinate Systems
  • There are five coordinate systems in our
    application
  • World Coordinate System (WCS)
  • Face Coordinate System (FCS)
  • Left Camera Coordinate system (LCCS)
  • Right Camera Coordinate system (RCCS)
  • Projector Coordinate System (PCS)

44
Camera Calibration
  • Conversion from 3D world coordinates to 2D camera
    coordinates - Perspective Transformation Model

Eliminating the scale factor:
uj = (c11 - c31·uj)·xj + (c12 - c32·uj)·yj + (c13 - c33·uj)·zj + c14
vj = (c21 - c31·vj)·xj + (c22 - c32·vj)·yj + (c23 - c33·vj)·zj + c24
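A minimal sketch of how the eleven unknown coefficients could be estimated from calibration correspondences by stacking these two linear equations per point and solving a least-squares system (fixing c34 = 1); the function name and data layout are illustrative.

```python
import numpy as np

def calibrate(world_pts, image_pts):
    """Estimate the 3x4 perspective transformation matrix C (with c34 = 1) from
    n >= 6 correspondences between 3D points (x, y, z) and 2D points (u, v)."""
    A, b = [], []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        A.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z]); b.append(u)
        A.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z]); b.append(v)
    c, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(c, 1.0).reshape(3, 4)   # 11 estimated coefficients plus c34 = 1
```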
45
Calibration sphere
  • A sphere can be used for calibration
  • Calibration points on the sphere are chosen in such a way that
  • the azimuthal angle is varied in steps of 45°
  • the polar angle is varied in steps of 30°
  • The locations of these calibration points are known in the 3D coordinate system with respect to the origin of the sphere
  • The origin of the sphere defines the origin of the World Coordinate System

46
Spherical to Cartesian coordinates
  • The 3D coordinates are known in the spherical coordinate system
  • The 3D location (Px, Py, Pz) in the Cartesian coordinate system is defined from (R, θ, φ) in the spherical coordinate system
  • R: radius of the sphere
  • θ: azimuthal angle in the xy-plane from the x-axis
  • φ: polar angle from the z-axis (also known as the "colatitude" of P)
  • Range: 0 ≤ θ ≤ 2π and 0 ≤ φ ≤ π
  • Px = R · sin(φ) · cos(θ)
  • Py = R · sin(φ) · sin(θ)
  • Pz = R · cos(φ)
  • Given (R, θ, φ) we can compute (Px, Py, Pz) (a small numeric check follows)

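A small numeric check of the conversion under the same convention (θ azimuthal, φ polar); the sample radius and angles are arbitrary illustrative values.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """theta: azimuth from the x-axis in the xy-plane; phi: polar angle (colatitude) from the z-axis."""
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))

# A calibration point at azimuth 45 deg and colatitude 30 deg on a 100 mm sphere:
print(spherical_to_cartesian(100.0, math.radians(45), math.radians(30)))
# -> approximately (35.36, 35.36, 86.60)
```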
47
Projector Calibration
  • Similar to camera calibration
  • The projector's 2D image coordinates cannot be obtained directly from a captured 2D image
  • A blank image is projected onto the sphere
  • The 2D coordinates of the calibration points in the projected image are noted
  • More points can be seen from the projector's point of view; some points are common to both camera views
  • Results appear to have slightly more error than the camera calibration

48
3D Face Model Construction
  • Why?
  • To obtain different views of the face
  • To generate the stereo pair to view it in the
    HMPD
  • Steps required
  • Computation of 3D Locations
  • Customization of 3D Model
  • Texture Mapping

49
Computation of 3D points
  • 3d point estimation using stereo
  • Stereo between two cameras is not possible
    because of the occlusion by the facial features
  • Hence two stereo pair computations
  • Left camera and projector
  • Right camera and projector
  • Using stereo, compute 3D points of prominent
    facial feature points in FCS

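A minimal sketch of how one 3D feature point could be triangulated from a camera-projector stereo pair, given the two 3x4 matrices obtained in the calibration step; the linear least-squares triangulation shown is a standard technique, and the function name is illustrative.

```python
import numpy as np

def triangulate(C1, C2, uv1, uv2):
    """Recover (x, y, z) from its projections (u, v) in two calibrated views.
    Each view contributes two linear equations of the same form used in calibration;
    the four equations are solved in the least-squares sense."""
    rows, rhs = [], []
    for C, (u, v) in ((C1, uv1), (C2, uv2)):
        rows.append([C[0, 0] - C[2, 0] * u, C[0, 1] - C[2, 1] * u, C[0, 2] - C[2, 2] * u])
        rhs.append(C[2, 3] * u - C[0, 3])
        rows.append([C[1, 0] - C[2, 0] * v, C[1, 1] - C[2, 1] * v, C[1, 2] - C[2, 2] * v])
        rhs.append(C[2, 3] * v - C[1, 3])
    p, *_ = np.linalg.lstsq(np.asarray(rows, float), np.asarray(rhs, float), rcond=None)
    return p
```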
50
3D Generic Face Model
A generic face model with 395 vertices and 818 triangles. Left: front view; right: side view.
51
Texture Mapped 3D Face
52
Evaluation
53
Evaluation Schemes
  • Evaluation of facial expressions is not studied extensively in the literature
  • Evaluation can be done for facial alignment and face recognition on static images
  • Lip and eye movements in a dynamic event
  • Perceptual quality: how well are moods conveyed?
  • Two types of evaluation:
  • Objective evaluation
  • Subjective evaluation

54
Objective Evaluation
  • Theoretical Evaluation
  • No human feedback required
  • This evaluation can give us a measure of
  • Face recognition
  • Face alignment
  • Facial movements
  • Methods applied
  • Normalized cross correlation
  • Euclidean distance measures

55
Evaluation Images
Five frames were considered for objective evaluation. First row: virtual frontal views. Second row: original frontal views.
56
Normalized Cross-Correlation
  • Regions considered for normalized cross-correlation
  • (Left: real image; Right: virtual image)

57
Normalized Cross-Correlation
  • Let V be the virtual image and R be the real image
  • Let w be the width and h be the height of the images
  • The normalized cross-correlation between the two images V and R is given below

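The standard definition, which is presumably what was shown on the slide, is:

\[ NCC(V,R) = \frac{\sum_{x=1}^{w}\sum_{y=1}^{h}\bigl(V(x,y)-\bar V\bigr)\bigl(R(x,y)-\bar R\bigr)}{\sqrt{\sum_{x=1}^{w}\sum_{y=1}^{h}\bigl(V(x,y)-\bar V\bigr)^{2}}\;\sqrt{\sum_{x=1}^{w}\sum_{y=1}^{h}\bigl(R(x,y)-\bar R\bigr)^{2}}} \]

where \(\bar V\) and \(\bar R\) are the mean intensities of V and R over the w x h region; values close to 1 indicate that the virtual region closely matches the real one.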
58
Normalized Cross-Correlation
59
Euclidean Distance measures
  • The Euclidean distance between two points i and j is given below
  • Let Rij be the Euclidean distance between two points i and j in the real image
  • Let Vij be the Euclidean distance between the same two points i and j in the virtual image
  • Dij = Rij - Vij

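Written out (standard definition; the point coordinates are 2D image coordinates):

\[ d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \]

so Rij and Vij are this distance measured between the same pair of feature points in the real and virtual images, and Dij = Rij - Vij measures the geometric discrepancy between the two views.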
60
Euclidean Distance measures
61
Subjective Evaluation
  • Evaluates human perception
  • Measurement of the quality of a talking face
  • Factors that might affect it
  • Quality of the video
  • Facial movements and expressions
  • Synchronization of the two halves of the face
  • Color and Texture of the face
  • Quality of audio
  • Synchronization of audio
  • A preliminary study has been made to assess the
    quality of the generated videos

62
Conclusion and Future Work
The work is summarized along two axes, dimension (2D / 3D) versus time domain (Static / Dynamic); the first three cells are completed work (conclusion), the last is future work.

            Static                          Dynamic
  2D        Virtual Frontal Image           Virtual Frontal Video
  3D        Texture Mapped 3D Face Model    3D Facial Animation (future work)
63
Summary
  • Design and implementation of a novel Face Capture
    System
  • Generation of virtual frontal view from two side
    views in a video sequence
  • Extraction of depth information using stereo
    method
  • Texture mapped 3D face model generation
  • Evaluation of virtual frontal videos

64
Future Work
  • Online processing in real-time
  • Automatic calibration
  • 3D facial animation
  • Subjective Evaluation of the virtual frontal
    videos
  • Data compression while processing and
    transmission
  • Customization of camera lenses
  • Integration with a Head Mounted Projection Display

65
Thank You
  • Doubts
  • Queries
  • Suggestions