Analyzing and editing style
MERL--A Mitsubishi Electric Research Lab
Bill Freeman, joint work with Josh Tenenbaum, MIT
Sept. 12, 1996

Transcript and Presenter's Notes
1
(No Transcript)
2
(No Transcript)
3
Need volunteers
4
From Monday's paper: A simple story about
representations
  • Input signal: a moving edge.
  • Model it with an auto-regressive (AR) model,
  • using two different representations for the
    observations y:
  • Representation 1: image-based.
  • Representation 2: position-based.

5
Input signal
Representation 1
6
Bases, n = 8
Representation 1
7
Dynamics, n = 8
Representation 1
8
Bases, n = 20
Representation 1
9
Dynamics, n = 20
Representation 1
10
Bases, n = 50
Representation 1
11
  • N = 50 dynamics

Representation 1
12
What happens next?
Representation 1
13
Representing the edge position
  • Input signal: y = 1...100.
  • What dimension of an auto-regressive model do we
    need to describe that signal?

Representation 2
14
N = 1
  • Can only show exponentially decaying position.

Representation 2
15
N = 2
  • A 2-d model can handle uniform translation
    exactly.

Representation 2
16
The simple story
  • For a simple, canonical signal like a moving
    edge, modelled with an AR model:
  • the pixel-based representation requires a
    high-dimensional state vector, and even then
    doesn't work very well;
  • the position-based representation works perfectly
    with a 2-dimensional state vector (see the sketch
    below).
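A minimal numerical sketch of this story (illustrative dimensions and data, not the talk's code): a uniformly translating step edge is predicted exactly by a 2-d AR model on its position, while a linear predictor on an n = 8 pixel (PCA) state is only approximate.

import numpy as np

T, P = 60, 100
positions = np.arange(10, 10 + T, dtype=float)          # edge position in each frame
frames = (np.arange(P)[None, :] >= positions[:, None]).astype(float)  # step-edge images

# Representation 2: position-based. The 2-d AR model x_t = 2 x_{t-1} - x_{t-2}
# represents uniform translation exactly.
X = np.stack([positions[1:-1], positions[:-2]], axis=1)
coeffs, *_ = np.linalg.lstsq(X, positions[2:], rcond=None)
print("position AR coefficients:", coeffs.round(3))     # close to [2, -1]
print("max position error:", np.abs(X @ coeffs - positions[2:]).max())

# Representation 1: image-based. Project frames onto an n = 8 PCA basis and fit
# a linear predictor from each frame's coefficients to the next frame's;
# the prediction is only approximate.
n = 8
centered = frames - frames.mean(0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
Y = centered @ Vt[:n].T
A, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
err = np.linalg.norm(Y[1:] - Y[:-1] @ A) / np.linalg.norm(Y[1:])
print("relative one-step prediction error, 8-d pixel state:", round(err, 3))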

17
Separating style and content with bilinear models
Bill Freeman, MIT AI Lab
Josh Tenenbaum, MIT Dept. of Brain and Cognitive Sciences
18
Style and content example
  • Content: character
  • Style: font
(Figure: characters rendered in several fonts, including Matura MT; observed
combinations support analysis, and unobserved style/content combinations are
filled in by synthesis.)
19
Many perception problems have this two-factor
structure
  Domain               Content        Style
  typography           letter         font
  face recognition     identity       head orientation
  shape from shading   shape          lighting
  color perception     object color   illuminant color
  speech recognition   words          speaker

20
Color constancy demo
21
  • How much of what we may consider to be
    (high-level) visual style can we account for by a
    simple, low-level statistical model?
  • Given observations that are the result of two
    strongly interacting factors,
  • can we separately analyze or manipulate those two
    factors?

22
Perceptual tasks
23
Common form of observations
(Figure: observations laid out in a grid, one axis indexed by factor 1 and
the other by factor 2.)
24
General case
Account for the observations with a rendering function f(a, b)
(see the data-layout sketch below):

                      content class (b values)
  style (a values)    f(a1, b1)  f(a1, b2)  f(a1, b3)  ...
                      f(a2, b1)  f(a2, b2)  f(a2, b3)  ...
                        ...        ...        ...
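As a concrete picture of this layout, here is a small sketch (dimensions and random stand-in data are assumptions, not from the slides) of the observation grid and of the stacked matrix used later for fitting:

import numpy as np

S, C, K = 4, 6, 50                    # styles, content classes, observation dimension
rng = np.random.default_rng(0)
Y = rng.normal(size=(S, C, K))        # Y[s, c] stands in for the observation f(a_s, b_c)

# Stacked form used for fitting: style-blocks of rows, one column per content class.
Y_stacked = Y.transpose(0, 2, 1).reshape(S * K, C)
print(Y_stacked.shape)                # (200, 6)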
25
Asymmetric bilinear model
  y^{sc} = f(A^s, b^c) = A^s b^c
where
  y^{sc} is the observation vector in style s and content c,
  A^s    is the matrix for style s,
  b^c    is the vector for content element c.
(A fitting sketch follows.)
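A hedged sketch of fitting this model: stacking the observation vectors with style-blocks of rows and one column per content class, a single SVD recovers style matrices A^s and content vectors b^c up to an invertible linear transform. The dimensions, the rank J, and the synthetic data are illustrative assumptions.

import numpy as np

S, C, K, J = 4, 6, 50, 3              # styles, content classes, obs dim, model rank
rng = np.random.default_rng(1)
A_true = rng.normal(size=(S, K, J))   # style matrices used to synthesize training data
B_true = rng.normal(size=(J, C))      # content vectors (columns)
Y = np.stack([A_true[s] @ B_true for s in range(S)])     # (S, K, C) observations

Y_stacked = Y.reshape(S * K, C)       # style-blocks of rows, content classes as columns
U, sv, Vt = np.linalg.svd(Y_stacked, full_matrices=False)
A_hat = (U[:, :J] * sv[:J]).reshape(S, K, J)             # recovered style matrices A^s
B_hat = Vt[:J]                                           # recovered content vectors b^c

# The factors are only determined up to an invertible J x J transform, but the
# model's reconstruction A^s b^c matches the observations.
recon = np.stack([A_hat[s] @ B_hat for s in range(S)])
print("max reconstruction error:", np.abs(recon - Y).max())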
26
Asymmetric bilinear model, with identity as the
style factor.
27
Symmetric bilinear model
  y^{sc}_k = f(a^s, b^c) = a^{sT} W_k b^c
where
  y^{sc}_k is the k-th element of the observation vector in style s and content c,
  a^s  is the vector for style s,
  b^c  is the vector for content element c,
  W_k  is the matrix for element k of the observation vector.
(An evaluation sketch follows.)
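A minimal sketch of evaluating the symmetric model (shapes are illustrative assumptions): W is a stack of K interaction matrices, and each observation element is a bilinear form in the style and content vectors.

import numpy as np

I, J, K = 3, 4, 50                    # style dim, content dim, observation dim
rng = np.random.default_rng(2)
W = rng.normal(size=(K, I, J))        # one I x J interaction matrix W_k per element k
a_s = rng.normal(size=I)              # style vector
b_c = rng.normal(size=J)              # content vector

y = np.einsum('i,kij,j->k', a_s, W, b_c)   # y[k] = a_s^T W_k b_c
print(y.shape)                             # (50,)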
28
Symmetric bilinear model
29
Fitting the model to training observations
  Asymmetric model:  y^{sc} = A^s b^c
  Symmetric model:   y^{sc}_k = a^{sT} W_k b^c
  • Iterate SVDs (Magnus and Neudecker, 1988); a sketch follows.
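The slide fits the symmetric model by iterated SVDs (Magnus and Neudecker, 1988). The sketch below uses a plainer alternating least-squares loop over the same three sets of unknowns; it illustrates the idea but is not the talk's exact procedure, and all dimensions and data are assumed.

import numpy as np

S, C, K, I, J = 4, 6, 50, 3, 3
rng = np.random.default_rng(3)
W = rng.normal(size=(K, I, J))
A = rng.normal(size=(S, I))                     # true style vectors
B = rng.normal(size=(C, J))                     # true content vectors
Y = np.einsum('si,kij,cj->sck', A, W, B)        # training observations y[s, c, k]

W_hat = rng.normal(size=(K, I, J))
A_hat = rng.normal(size=(S, I))
B_hat = rng.normal(size=(C, J))
for _ in range(100):
    # a-step: with W_hat and B_hat fixed, each y[s] is linear in a_s.
    M = np.einsum('kij,cj->cki', W_hat, B_hat).reshape(C * K, I)
    A_hat = np.linalg.lstsq(M, Y.reshape(S, C * K).T, rcond=None)[0].T
    # b-step: with A_hat and W_hat fixed, each y[:, c] is linear in b_c.
    N = np.einsum('si,kij->skj', A_hat, W_hat).reshape(S * K, J)
    B_hat = np.linalg.lstsq(N, Y.transpose(1, 0, 2).reshape(C, S * K).T, rcond=None)[0].T
    # W-step: with A_hat and B_hat fixed, each W_k is linear in the outer products a_s b_c^T.
    F = np.einsum('si,cj->scij', A_hat, B_hat).reshape(S * C, I * J)
    W_hat = np.linalg.lstsq(F, Y.reshape(S * C, K), rcond=None)[0].T.reshape(K, I, J)

recon = np.einsum('si,kij,cj->sck', A_hat, W_hat, B_hat)
print("relative fit error:", np.linalg.norm(recon - Y) / np.linalg.norm(Y))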

30
(Figure: training observations y arranged in a grid indexed by identity and
head pose.)
31
Vector transpose
32
Related work, bilinear models
Koenderink and Van Doorn, 1991, 1996; Tomasi and Kanade, 1992; Faugeras, 1993;
Magnus and Neudecker, 1988; Marimont and Wandell, 1992; Turk and Pentland,
1991; Ullman and Basri, 1991; Murase and Nayar, 1995.
33
Related work, analyzing style
Hofstadter, 1995 and earlier papers; Grebert et al., 1992; SIGGRAPH papers on
controls for animation or line style (typically hand-crafted, not learned);
Brand and Hertzmann, 2000; Hertzmann et al., 2001; Efros and Freeman, 2001.
34
Procedure
  • (1) Fit a bilinear model to the training data of
    content elements observed across different
    styles, using linear algebra techniques.
  • (2) Use new data to find the parameters for a
    new, unknown style, or to classify new
    observations, or to generalize both style and
    content.

35
Task: Classification. Domain: vowel phonemes
(Figure: a grid of utterances, with phonemes (ah, eh, ou, ...) along one axis
and speakers along the other; a training set of several speakers, plus
utterances from a new speaker to be classified.)
36
Benchmark dataset
  • CMU machine learning repository
  • Training: 8 speakers saying 11 different vowel
    phonemes.
  • Testing: 7 new speakers.
  • Data representation: LPC coefficients.

37
Classification using bilinear models
  y_observed = A_{new speaker} b_{phoneme}
where
  y_observed      is vowel data from a speaker in a new style,
  A_{new speaker} is the matrix describing the unknown style of the new speaker,
  b_{phoneme}     are the previously learned vowel (content) descriptors.
  • Use the EM (expectation-maximization) algorithm.
  • Build up the model of the new speaker's style simultaneously with
    classification of the content (see the sketch below).
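A hedged sketch of that adaptation-plus-classification loop: with the learned content vectors b_c held fixed, alternate soft phoneme assignments (E-step, here with an isotropic-Gaussian likelihood) and a weighted least-squares update of the new speaker's style matrix (M-step). The dimensions, noise model, and synthetic data are assumptions, not the benchmark setup.

import numpy as np

C, J, K, N = 11, 4, 20, 60            # phonemes, content dim, obs dim, test utterances
rng = np.random.default_rng(4)
B = rng.normal(size=(J, C))           # previously learned content vectors b_c (columns)
A_true = rng.normal(size=(K, J))      # the new speaker's (unknown) style matrix
labels = rng.integers(0, C, size=N)
Y = (A_true @ B[:, labels]).T + 0.05 * rng.normal(size=(N, K))   # test data

A_new = rng.normal(size=(K, J))       # initial guess for the new style
sigma2 = 1.0
for _ in range(50):
    # E-step: responsibility of each phoneme for each observation.
    D = ((Y[:, None, :] - (A_new @ B).T[None, :, :]) ** 2).sum(-1)   # (N, C)
    R = np.exp(-0.5 * (D - D.min(1, keepdims=True)) / sigma2)
    R /= R.sum(1, keepdims=True)
    # M-step: weighted least squares for the style matrix, given soft assignments.
    G = B @ np.diag(R.sum(0)) @ B.T                                  # (J, J)
    A_new = (Y.T @ R @ B.T) @ np.linalg.inv(G)                       # (K, J)

print("fraction of utterances decoded to the true phoneme:", (R.argmax(1) == labels).mean())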

38
Example problem for Expectation Maximization (EM)
algorithm
  • Find the probability that each point came from
    one of two random spatial processes.

39
  • EM algorithm: alternate the E-step and the M-step (see the sketch below).
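A generic sketch of the kind of example the slide describes: two 2-D Gaussian "spatial processes", with EM estimating the probability that each point came from each one. The means, the unit-variance assumption, and the data are illustrative choices.

import numpy as np

rng = np.random.default_rng(5)
pts = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
                 rng.normal([4.0, 3.0], 1.0, size=(100, 2))])

mu = np.array([[-1.0, 0.0], [1.0, 0.0]])      # initial means of the two processes
pi = np.array([0.5, 0.5])                     # initial mixing weights
for _ in range(50):
    # E-step: responsibility of each process for each point (unit variance assumed).
    d2 = ((pts[:, None, :] - mu[None, :, :]) ** 2).sum(-1)     # (200, 2)
    r = pi * np.exp(-0.5 * d2)
    r /= r.sum(1, keepdims=True)
    # M-step: re-estimate mixing weights and means from the responsibilities.
    pi = r.mean(0)
    mu = (r.T @ pts) / r.sum(0)[:, None]

print("estimated means:\n", mu.round(2))
print("P(process 1) for the first five points:", r[:5, 0].round(2))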
40
Classification results: performance comparison
  Multi-layer perceptron                    51%
  1-nearest neighbor (NN)                   56%
  Discriminant-adaptive nearest neighbor    62%
  Bilinear model, data not grouped          69%
  Bilinear model, data grouped by speaker   76%
41
Task: Classification. Domain: faces and pose.
42
Face pose classification results
  • Given observations of a new face, what % of the
    poses can we identify correctly?
  • Nearest-neighbor matching: 53%
  • Bilinear model, estimating A^s while classifying
    b^c with EM: 74%

43
Task: Extrapolation. Domain: typography
Fonts: Chicago, Zapf Chancery, Times, Mistral, Times Bold, Monaco
(Rest of alphabet, used in training, not shown.)
44
Coulomb warp representation
Describe each shape by the warp that a square of
ink particles would have to undergo to form the
shape.
45
Coulomb warping
(Figure: the warp taking a reference shape to a target shape.)
46
Coulomb warp representation
(Figure: averages of shapes computed in the pixel representation vs. in the
Coulomb warp representation.)
47
(Figure: shapes S1 and S2 and their blend, shown in the pixel representation
and in the Coulomb warp representation.)

48
Basis functions for the asymmetric bilinear model
  • b^{letter C}
49
Controlling complexity in calculating the style
matrix for the new font
  • symmetric model (5 parameters to fit)
  • asymmetric model (173,280 parameters to fit)
  • asymmetric model, using the symmetric model as a prior (see the sketch below)
  • Monaco (true)
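One way to read "using the symmetric model as a prior" (an assumption about the mechanics, not necessarily the talk's exact method) is to shrink the asymmetric least-squares estimate toward the style matrix implied by the symmetric fit, via a ridge penalty. All dimensions and data below are stand-ins.

import numpy as np

J, K, C = 5, 120, 26                  # style dim, observation dim, content classes (assumed)
rng = np.random.default_rng(6)
W = rng.normal(size=(K, J, J))        # stand-in for the trained symmetric interaction tensor
B = rng.normal(size=(J, C))           # stand-in for the trained content vectors
a_new = rng.normal(size=J)            # symmetric-model style fit for the new font
Y_new = rng.normal(size=(K, C))       # stand-in observations of the new font

A_prior = np.einsum('i,kij->kj', a_new, W)    # style matrix implied by the symmetric fit
lam = 10.0                                    # prior strength (assumed)
# Minimizes ||Y_new - A B||^2 + lam ||A - A_prior||^2 in closed form.
A_new = (Y_new @ B.T + lam * A_prior) @ np.linalg.inv(B @ B.T + lam * np.eye(J))
print(A_new.shape)                            # (120, 5), the regularized style matrix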
50
Results of extrapolation to a new style
Fonts: Chicago, Zapf Chancery, Times, Mistral, Times Bold, Monaco
(Figure rows: synthetic vs. actual characters.)
51
Leave-one-out results
Zapf Chancery
Times Bold
Chicago
Monaco
Times
Mistral
52
(No Transcript)
53
Task: Translation. Domain: shape and lighting
  • Factor 1: lighting
  • Factor 2: identity (face shape)
Generalization:
(1) Fit a symmetric bilinear model to the training data
(pixel representation). (2) Solve for the parameters
describing the face and the lighting of a new image
(see the sketch below).
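A hedged sketch of step (2): with the trained interaction tensor W held fixed, the lighting vector a and face vector b of a new image can be found by alternating two linear subproblems, and the face can then be re-rendered under a different lighting. Dimensions and data are illustrative assumptions.

import numpy as np

I, J, K = 3, 8, 400                     # lighting dim, identity dim, pixels (assumed)
rng = np.random.default_rng(7)
W = rng.normal(size=(K, I, J))          # stand-in for the trained interaction tensor
a_true, b_true = rng.normal(size=I), rng.normal(size=J)
y_new = np.einsum('kij,i,j->k', W, a_true, b_true)   # the new image

a, b = rng.normal(size=I), rng.normal(size=J)
for _ in range(100):
    M = np.einsum('kij,j->ki', W, b)                 # with b fixed, y is linear in a
    a = np.linalg.lstsq(M, y_new, rcond=None)[0]
    N = np.einsum('kij,i->kj', W, a)                 # with a fixed, y is linear in b
    b = np.linalg.lstsq(N, y_new, rcond=None)[0]

print("fit error:", np.abs(np.einsum('kij,i,j->k', W, a, b) - y_new).max())
a_other = rng.normal(size=I)                         # a different (stand-in) lighting vector
relit = np.einsum('kij,i,j->k', W, a_other, b)       # same face re-rendered under new lighting
print(relit.shape)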
54
Translation results
  Factor 1: lighting
  Factor 2: identity (face shape)
55
Conclusion: bilinear models are useful for
translation, classification, and extrapolation
perceptual tasks.
  factor 1      factor 2         observation
  letter        Matura MT        'A'
  phoneme       speaker          'ahh'
  pose 3        Hiro             (face image)
  illuminant    surface color    eye cone responses
56
(No Transcript)
57
(No Transcript)
58
End. Extra pages follow.
  • The following slides are extras.

59
Style and content
  • Mention: an unsupervised version would be a good
    class project. Josh or I would be interested in
    working with someone on it.

60
Increase dimensionality to represent
non-linearities
  • Say f(x) = p x^2 + q x + r.
  • This parabola varies non-linearly with x,
  • but is a linear function of (x^2, x, 1).
  • (Like homogeneous coordinates in graphics;
    see the sketch below.)
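A small sketch of the point (arbitrary example coefficients): ordinary least squares on the lifted features (x^2, x, 1) recovers the parabola's parameters exactly.

import numpy as np

p, q, r = 0.5, -2.0, 3.0
x = np.linspace(-3, 3, 50)
y = p * x**2 + q * x + r

X = np.stack([x**2, x, np.ones_like(x)], axis=1)   # lifted 3-d representation (x^2, x, 1)
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs.round(3))                             # recovers [0.5, -2.0, 3.0]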

61
Fitting parabolas
1-d model
2-d model
3-d model
62
  • Reconstruction from low-dimensional model

63
  • Eigenfaces for each pose

64
Task: Classification. Domain: faces and pose.
  • Factor 1: head pose
  • Factor 2: identity

We build a bilinear model of how head pose and
identity modify face appearance.
65
Basis images
Pose-dependent basis functions for face
appearance.
One set of coefficients will reconstruct the same
person in different poses.