Title: Analyzing and editing style
MERL -- A Mitsubishi Electric Research Lab
Bill Freeman, joint work with Josh Tenenbaum, MIT
Sept. 12, 1996
3. Need volunteers
4. From Monday's paper: a simple story about representations
- Input signal: a moving edge.
- Model it using an auto-regressive (AR) model,
- using two different representations for the observations y:
  - Representation 1: image-based.
  - Representation 2: position-based.
5. Input signal (Representation 1)
6. Bases, n = 8 (Representation 1)
7. Dynamics, n = 8 (Representation 1)
8. Bases, n = 20 (Representation 1)
9. Dynamics, n = 20 (Representation 1)
10. Bases, n = 50 (Representation 1)
11. (Representation 1)
12. What happens next? (Representation 1)
13. Representing the edge position (Representation 2)
- Input signal: the edge position, y = 1 ... 100.
- What dimension of auto-regressive model do we need to describe that signal?
14. N = 1 (Representation 2)
- Can only show exponentially decaying position.
15. N = 2 (Representation 2)
- A 2-d model can handle uniform translation exactly.
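The claim that a 2-d AR model handles uniform translation exactly can be checked numerically. A minimal sketch (all values illustrative): with state (p_t, p_{t-1}) and dynamics p_t = 2 p_{t-1} - p_{t-2}, constant-velocity motion is reproduced with no error.

```python
import numpy as np

# State (p_t, p_{t-1}) with dynamics p_t = 2 p_{t-1} - p_{t-2}:
# a 2-d autoregressive model that reproduces uniform translation exactly.
A = np.array([[2.0, -1.0],
              [1.0,  0.0]])

positions = [1.0, 2.0]            # edge moving one pixel per frame
state = np.array([2.0, 1.0])      # (current, previous) position
for _ in range(8):
    state = A @ state
    positions.append(float(state[0]))

print(positions)                  # 1.0, 2.0, 3.0, ..., 10.0
```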
16. The simple story
- For a simple, canonical signal like a moving edge, modelled with an AR model:
- the pixel-based representation requires a high-dimensional state vector, and even then doesn't work very well;
- the position-based representation works perfectly with a 2-dimensional state vector.
17. Separating style and content with bilinear models
Bill Freeman, MIT AI Lab; Josh Tenenbaum, MIT Dept. of Brain and Cognitive Sciences
18. Style and content example
- Content: the character. Style: the font (e.g., Matura MT).
[Figure: a rendered character is the observed quantity; the style and content factors themselves are not observed. Arrows indicate the analysis and synthesis directions.]
19. Many perception problems have this two-factor structure

  Domain               Content (Factor 1)   Style (Factor 2)
  typography           letter               font
  face recognition     identity             head orientation
  shape from shading   shape                lighting
  color perception     object color         illuminant color
  speech recognition   words                speaker
20. Color constancy demo
21.
- How much of what we may consider to be (high-level) visual style can we account for with a simple, low-level statistical model?
- Given observations that are the result of two strongly interacting factors, can we separately analyze or manipulate those two factors?
22. Perceptual tasks
23. Common form of observations
[Figure: a grid of observations indexed by factor 1 and factor 2.]
24. General case
Account for the observations by a rendering function f(a, b):

                     content class (b values)
  style (a values)   f(a1,b1)  f(a1,b2)  f(a1,b3)  ...
                     f(a2,b1)  f(a2,b2)  f(a2,b3)  ...
                     ...       ...       ...
25. Asymmetric bilinear model

  y^sc = f(A^s, b^c) = A^s b^c

where y^sc is the observation vector in style s and content c, A^s is the matrix for style s, and b^c is the vector for content element c.
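In code, the asymmetric model is just a matrix-vector product per (style, content) pair. A sketch with hypothetical dimensions (all sizes and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
K, I = 6, 3                      # observation dim, content dim (illustrative)
A_s = rng.normal(size=(K, I))    # style matrix A^s for one style s
b_c = rng.normal(size=I)         # content vector b^c for one content class c

y_sc = A_s @ b_c                 # observation vector in style s, content c
print(y_sc.shape)                # (6,)
```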
26. Asymmetric bilinear model, with identity as the style factor.
27. Symmetric bilinear model

  y_k^sc = f(a^s, b^c)_k = a^sT W_k b^c

where y_k^sc is the k-th element of the observation vector in style s and content c, a^s is the vector for style s, b^c is the vector for content element c, and W_k is the interaction matrix for element k of the observation vector.
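The symmetric model keeps one interaction matrix W_k per observation element; a sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 3, 4, 5                 # style dim, content dim, obs dim (illustrative)
a_s = rng.normal(size=I)          # style vector a^s
b_c = rng.normal(size=J)          # content vector b^c
W = rng.normal(size=(K, I, J))    # one I x J interaction matrix per element k

# k-th observation element: y_k = a^sT W_k b^c
y = np.array([a_s @ W[k] @ b_c for k in range(K)])
y_fast = np.einsum('i,kij,j->k', a_s, W, b_c)   # same computation, vectorized
print(np.allclose(y, y_fast))                    # True
```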
28. Symmetric bilinear model
29. Fitting the model to training observations
- Asymmetric model: y^sc = A^s b^c
- Symmetric model: y_k^sc = a^sT W_k b^c
- Fit by iterating SVDs (Magnus and Neudecker, 1988).
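For the asymmetric model the fit is essentially closed-form: stack the observations for every (style, content) pair into one matrix and take a truncated SVD. A sketch on noiseless synthetic data (dimensions and data are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
S, C, K, I = 4, 5, 10, 2     # styles, content classes, obs dim, model dim

# synthetic training data generated by a true asymmetric model
A_true = rng.normal(size=(S, K, I))
B_true = rng.normal(size=(I, C))
Y = np.concatenate([A_true[s] @ B_true for s in range(S)])   # (S*K, C)

# closed-form fit: truncated SVD of the stacked observation matrix
U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
A_fit = (U[:, :I] * sv[:I]).reshape(S, K, I)  # style matrices (up to a linear ambiguity)
B_fit = Vt[:I]                                 # content vectors

Y_hat = np.concatenate([A_fit[s] @ B_fit for s in range(S)])
print(np.allclose(Y_hat, Y))                   # True: rank-I data is fit exactly
```

The fitted factors agree with the true ones only up to an invertible linear transform, which is the usual ambiguity in bilinear factorizations.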
31. Vector transpose
32. Related work: bilinear models
Koenderink and Van Doorn, 1991, 1996; Tomasi and Kanade, 1992; Faugeras, 1993; Magnus and Neudecker, 1988; Marimont and Wandell, 1992; Turk and Pentland, 1991; Ullman and Basri, 1991; Murase and Nayar, 1995.
33. Related work: analyzing style
Hofstadter, 1995, and earlier papers; Grebert et al., 1992; SIGGRAPH papers on controls for animation or line style (typically hand-crafted, not learned); Brand and Hertzmann, 2000; Hertzmann et al., 2001; Efros and Freeman, 2001.
34. Procedure
- (1) Fit a bilinear model to the training data of content elements observed across different styles, using linear-algebra techniques.
- (2) Use new data to find the parameters of a new, unknown style, to classify new observations, or to generalize both style and content.
35. Task: Classification. Domain: vowel phonemes.
[Figure: a training set of vowel phonemes (ah, eh, ou, ...) spoken by several speakers, plus utterances (ah, ou, ee, ...) from a new speaker to be classified.]
36. Benchmark dataset
- CMU machine learning repository.
- Training: 8 speakers saying 11 different vowel phonemes.
- Testing: 7 new speakers.
- Data representation: LPC coefficients.
37. Classification using bilinear models

  y_observed = A_new-speaker b_phoneme

where A_new-speaker is the matrix describing the unknown style of the new speaker, and the b_phoneme are the previously learned vowel (content) descriptors; y_observed is vowel data from a speaker in a new style.
- Use the EM (expectation-maximization) algorithm.
- Build up a model of the new speaker's style simultaneously with classification of the content.
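The alternation can be sketched as follows. This is a hard-assignment variant of EM on synthetic data with illustrative sizes (the real system uses soft responsibilities): the E-step assigns each observation its best-fitting content class under the current style estimate, and the M-step re-fits the style matrix by least squares; the fit error never increases.

```python
import numpy as np

rng = np.random.default_rng(4)
K, I, C, N = 8, 3, 4, 40     # obs dim, model dim, content classes, test samples

B = rng.normal(size=(I, C))            # previously learned content vectors
A_true = rng.normal(size=(K, I))       # unknown style of the new speaker
labels = rng.integers(0, C, size=N)
Y = A_true @ B[:, labels] + 0.01 * rng.normal(size=(K, N))   # new-style data

def fit_loss(A, c):
    return float(((Y - A @ B[:, c]) ** 2).sum())

A_est = rng.normal(size=(K, I))        # random initial style estimate
losses = []
for _ in range(20):
    # E-step (hard): assign each observation its best-fitting content class
    d = ((Y[:, None, :] - (A_est @ B)[:, :, None]) ** 2).sum(axis=0)  # (C, N)
    c_hat = d.argmin(axis=0)
    # M-step: re-fit the speaker's style matrix by least squares
    Bc = B[:, c_hat]
    A_est = Y @ Bc.T @ np.linalg.pinv(Bc @ Bc.T)
    losses.append(fit_loss(A_est, c_hat))

print(losses[0] >= losses[-1])   # each step is a minimization, so True
```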
38. Example problem for the Expectation-Maximization (EM) algorithm
- Find the probability that each point came from one of two random spatial processes.
39. [E-step and M-step update equations.]
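The two-process example can be sketched directly (synthetic points, fixed known variance; all values illustrative): the E-step computes each point's probability of belonging to each process, the M-step re-estimates the process means from the weighted points.

```python
import numpy as np

rng = np.random.default_rng(3)
# points drawn from two random spatial processes (isotropic 2-d Gaussians)
pts = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
                 rng.normal([4.0, 4.0], 0.5, size=(50, 2))])

mu = np.array([[-1.0, -1.0], [5.0, 5.0]])   # initial guesses for the means
var = 0.25                                   # known, fixed variance
for _ in range(20):
    # E-step: probability that each point came from each process
    d = np.array([((pts - m) ** 2).sum(axis=1) for m in mu])   # (2, N)
    r = np.exp(-(d - d.min(axis=0)) / (2 * var))               # stabilized
    r /= r.sum(axis=0, keepdims=True)
    # M-step: re-estimate each process mean from its weighted points
    mu = (r @ pts) / r.sum(axis=1, keepdims=True)

print(np.round(mu, 1))   # means converge near the true process centers
```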
40. Classification results: performance comparison

  Multi-layer perceptron               51%
  1-nearest neighbor (NN)              56%
  Discriminant-adaptive NN             62%
  Bilinear model, data not grouped     69%
  Bilinear model, grouped by speaker   76%
41. Task: Classification. Domain: faces and pose.
42. Face pose classification results
- Given observations of a new face, what fraction of the poses can we identify correctly?
- Nearest-neighbor matching: 53%
- Bilinear model (estimate A^s while classifying b^c with EM): 74%
43. Task: Extrapolation. Domain: typography.
Fonts: Chicago, Zapf Chancery, Times, Mistral, Times Bold, Monaco.
(Rest of the alphabet, used in training, not shown.)
44. Coulomb warp representation
Describe each shape by the warp that a square of ink particles would have to undergo to form the shape.
45. Coulomb warping
[Figure: a reference shape warped onto a target shape.]
46. Coulomb warp representation
[Figure: averages of letters under the pixel representation vs. the Coulomb-warp representation.]
47. [Figure: shapes S1 and S2 and their averages, under the pixel and Coulomb representations.]
48. Basis functions for the asymmetric bilinear model
49. Controlling complexity in calculating the style matrix for the new font
- Symmetric model (5 parameters to fit)
- Asymmetric model (173,280 parameters to fit)
- Asymmetric model, using the symmetric model as a prior
- Monaco (true)
50. Results of extrapolation to a new style
Fonts: Chicago, Zapf Chancery, Times, Mistral, Times Bold, Monaco (synthetic vs. actual).
51. Leave-one-out results
Fonts: Zapf Chancery, Times Bold, Chicago, Monaco, Times, Mistral.
53. Task: Translation. Domain: shape and lighting.
- Factor 1: lighting. Factor 2: identity (face shape).
Generalization:
(1) Fit a symmetric bilinear model to the training data (pixel representation).
(2) Solve for the parameters describing the face and the lighting of a new image.
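Step (2) can be sketched as alternating least squares: with the trained W fixed, the symmetric model is linear in a when b is held fixed and linear in b when a is held fixed, so each half-step is an ordinary least-squares solve. A toy version on synthetic data (sizes and data illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
K, I, J = 12, 2, 3            # obs dim, lighting dim, shape dim (illustrative)
W = rng.normal(size=(K, I, J))          # trained interaction matrices
a_true = rng.normal(size=I)             # lighting of the new image
b_true = rng.normal(size=J)             # face shape of the new image
y = np.einsum('i,kij,j->k', a_true, W, b_true)   # the new image

def residual(a, b):
    return float(np.linalg.norm(y - np.einsum('i,kij,j->k', a, W, b)))

a = rng.normal(size=I)
b = rng.normal(size=J)
r0 = residual(a, b)
for _ in range(50):
    # with b fixed the model is linear in a; solve by least squares
    a, *_ = np.linalg.lstsq(np.einsum('kij,j->ki', W, b), y, rcond=None)
    # with a fixed the model is linear in b
    b, *_ = np.linalg.lstsq(np.einsum('i,kij->kj', a, W), y, rcond=None)

print(residual(a, b) <= r0)   # True: each half-step cannot increase the residual
```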
54. Translation results
- Factor 1: lighting. Factor 2: identity (face shape).
55. Conclusion: bilinear models are useful for translation, classification, and extrapolation perceptual tasks.

  factor 1     factor 2        observation
  letter       Matura MT       the character 'A'
  phoneme      speaker         'ahh'
  pose 3       Hiro            face image
  illuminant   surface color   eye cone responses
58. End. The following slides are extras.
59. Style and content
- Mention: an unsupervised version would make a good class project; Josh or I would be interested in working with someone on it.
60. Increase dimensionality to represent non-linearities
- Say f(x) = p x^2 + q x + r.
- This parabola varies non-linearly with x,
- but is a linear function of (x^2, x, 1).
- (Like homogeneous coordinates in graphics.)
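The same trick in code: lifting each x to the feature vector (x^2, x, 1) turns the parabola into a linear model, so ordinary least squares recovers the coefficients (values illustrative):

```python
import numpy as np

p, q, r = 2.0, -3.0, 1.0           # illustrative coefficients
x = np.linspace(-1.0, 1.0, 50)
y = p * x**2 + q * x + r           # nonlinear in x

# lift each x to the feature vector (x^2, x, 1): y is linear in these coordinates
X = np.stack([x**2, x, np.ones_like(x)], axis=1)
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coeffs, 6))         # recovers (p, q, r)
```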
61. Fitting parabolas
[Figure: parabola fits using 1-d, 2-d, and 3-d models.]
62. Reconstruction from a low-dimensional model
64. Task: Classification. Domain: faces and pose.
- Factor 2: identity.
- We build a bilinear model of how head pose and identity modify face appearance.
65. Basis images
- Pose-dependent basis functions for face appearance.
- One set of coefficients will reconstruct the same person in different poses.