Title: Analyzing and editing style
MERL -- A Mitsubishi Electric Research Lab
Bill Freeman, joint work with Josh Tenenbaum, MIT
Sept. 12, 1996
3. Need volunteers
4. From Monday's paper: a simple story about representations
- Input signal: a moving edge.
- Model it using an auto-regressive (AR) model,
- using two different representations for the observations y:
  - Representation 1: image-based.
  - Representation 2: position-based.
5. Input signal (Representation 1)
6. Bases, n = 8 (Representation 1)
7. Dynamics, n = 8 (Representation 1)
8. Bases, n = 20 (Representation 1)
9. Dynamics, n = 20 (Representation 1)
10. Bases, n = 50 (Representation 1)
11. (Representation 1)
12. What happens next? (Representation 1)
13. Representing the edge position (Representation 2)
- Input signal: the edge position, y = 1 ... 100.
- What dimension of auto-regressive model do we need to describe that signal?
14. N = 1 (Representation 2)
- Can only show exponentially decaying position.
15. N = 2 (Representation 2)
- A 2-d model can handle uniform translation exactly.
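The claim that a 2-d AR model handles uniform translation exactly can be checked numerically. A minimal sketch (all values illustrative): with state (p_t, p_{t-1}) and dynamics p_t = 2 p_{t-1} - p_{t-2}, constant-velocity motion is reproduced with no error.

```python
import numpy as np

# State (p_t, p_{t-1}) with dynamics p_t = 2 p_{t-1} - p_{t-2}:
# a 2-d autoregressive model that reproduces uniform translation exactly.
A = np.array([[2.0, -1.0],
              [1.0,  0.0]])

positions = [1.0, 2.0]            # edge moving one pixel per frame
state = np.array([2.0, 1.0])      # (current, previous) position
for _ in range(8):
    state = A @ state
    positions.append(float(state[0]))

print(positions)                  # 1.0, 2.0, 3.0, ..., 10.0
```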
16. The simple story
- For a simple, canonical signal like a moving edge, modelled with an AR model:
- the pixel-based representation requires a high-dimensional state vector, and even then doesn't work very well;
- the position-based representation works perfectly with a 2-dimensional state vector.
17. Separating style and content with bilinear models
Bill Freeman, MIT AI Lab; Josh Tenenbaum, MIT Dept. of Brain and Cognitive Sciences
18. Style and content example
- Content: the character. Style: the font (e.g., Matura MT).
[Figure: a rendered character is the observed quantity; the style and content factors themselves are not observed. Arrows indicate the analysis and synthesis directions.]
19. Many perception problems have this two-factor structure

  Domain               Content (Factor 1)   Style (Factor 2)
  typography           letter               font
  face recognition     identity             head orientation
  shape from shading   shape                lighting
  color perception     object color         illuminant color
  speech recognition   words                speaker
20. Color constancy demo
21.
- How much of what we may consider to be (high-level) visual style can we account for with a simple, low-level statistical model?
- Given observations that are the result of two strongly interacting factors, can we separately analyze or manipulate those two factors?
22. Perceptual tasks
23. Common form of observations
[Figure: a grid of observations indexed by factor 1 and factor 2.]
24. General case
Account for the observations by a rendering function f(a, b):

                     content class (b values)
  style (a values)   f(a1,b1)  f(a1,b2)  f(a1,b3)  ...
                     f(a2,b1)  f(a2,b2)  f(a2,b3)  ...
                     ...       ...       ...
25. Asymmetric bilinear model

  y^sc = f(A^s, b^c) = A^s b^c

where y^sc is the observation vector in style s and content c, A^s is the matrix for style s, and b^c is the vector for content element c.
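In code, the asymmetric model is just a matrix-vector product per (style, content) pair. A sketch with hypothetical dimensions (all sizes and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
K, I = 6, 3                      # observation dim, content dim (illustrative)
A_s = rng.normal(size=(K, I))    # style matrix A^s for one style s
b_c = rng.normal(size=I)         # content vector b^c for one content class c

y_sc = A_s @ b_c                 # observation vector in style s, content c
print(y_sc.shape)                # (6,)
```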
26. Asymmetric bilinear model, with identity as the style factor.
27. Symmetric bilinear model

  y_k^sc = f(a^s, b^c)_k = a^sT W_k b^c

where y_k^sc is the k-th element of the observation vector in style s and content c, a^s is the vector for style s, b^c is the vector for content element c, and W_k is the interaction matrix for element k of the observation vector.
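The symmetric model keeps one interaction matrix W_k per observation element; a sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 3, 4, 5                 # style dim, content dim, obs dim (illustrative)
a_s = rng.normal(size=I)          # style vector a^s
b_c = rng.normal(size=J)          # content vector b^c
W = rng.normal(size=(K, I, J))    # one I x J interaction matrix per element k

# k-th observation element: y_k = a^sT W_k b^c
y = np.array([a_s @ W[k] @ b_c for k in range(K)])
y_fast = np.einsum('i,kij,j->k', a_s, W, b_c)   # same computation, vectorized
print(np.allclose(y, y_fast))                    # True
```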
28. Symmetric bilinear model
29. Fitting the model to training observations
- Asymmetric model: y^sc = A^s b^c
- Symmetric model: y_k^sc = a^sT W_k b^c
- Fit by iterating SVDs (Magnus and Neudecker, 1988).
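For the asymmetric model the fit is essentially closed-form: stack the observations for every (style, content) pair into one matrix and take a truncated SVD. A sketch on noiseless synthetic data (dimensions and data are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
S, C, K, I = 4, 5, 10, 2     # styles, content classes, obs dim, model dim

# synthetic training data generated by a true asymmetric model
A_true = rng.normal(size=(S, K, I))
B_true = rng.normal(size=(I, C))
Y = np.concatenate([A_true[s] @ B_true for s in range(S)])   # (S*K, C)

# closed-form fit: truncated SVD of the stacked observation matrix
U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
A_fit = (U[:, :I] * sv[:I]).reshape(S, K, I)  # style matrices (up to a linear ambiguity)
B_fit = Vt[:I]                                 # content vectors

Y_hat = np.concatenate([A_fit[s] @ B_fit for s in range(S)])
print(np.allclose(Y_hat, Y))                   # True: rank-I data is fit exactly
```

The fitted factors agree with the true ones only up to an invertible linear transform, which is the usual ambiguity in bilinear factorizations.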
31. Vector transpose
32. Related work: bilinear models
Koenderink and Van Doorn, 1991, 1996; Tomasi and Kanade, 1992; Faugeras, 1993; Magnus and Neudecker, 1988; Marimont and Wandell, 1992; Turk and Pentland, 1991; Ullman and Basri, 1991; Murase and Nayar, 1995.
33. Related work: analyzing style
Hofstadter, 1995, and earlier papers; Grebert et al., 1992; SIGGRAPH papers on controls for animation or line style (typically hand-crafted, not learned); Brand and Hertzmann, 2000; Hertzmann et al., 2001; Efros and Freeman, 2001.
34. Procedure
- (1) Fit a bilinear model to the training data of content elements observed across different styles, using linear-algebra techniques.
- (2) Use new data to find the parameters of a new, unknown style, to classify new observations, or to generalize both style and content.
35. Task: Classification. Domain: vowel phonemes.
[Figure: a training set of vowel phonemes (ah, eh, ou, ...) spoken by several speakers, plus utterances (ah, ou, ee, ...) from a new speaker to be classified.]
36. Benchmark dataset
- CMU machine learning repository.
- Training: 8 speakers saying 11 different vowel phonemes.
- Testing: 7 new speakers.
- Data representation: LPC coefficients.
37. Classification using bilinear models

  y_observed = A_new-speaker b_phoneme

where A_new-speaker is the matrix describing the unknown style of the new speaker, and the b_phoneme are the previously learned vowel (content) descriptors; y_observed is vowel data from a speaker in a new style.
- Use the EM (expectation-maximization) algorithm.
- Build up a model of the new speaker's style simultaneously with classification of the content.
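The alternation can be sketched as follows. This is a hard-assignment variant of EM on synthetic data with illustrative sizes (the real system uses soft responsibilities): the E-step assigns each observation its best-fitting content class under the current style estimate, and the M-step re-fits the style matrix by least squares; the fit error never increases.

```python
import numpy as np

rng = np.random.default_rng(4)
K, I, C, N = 8, 3, 4, 40     # obs dim, model dim, content classes, test samples

B = rng.normal(size=(I, C))            # previously learned content vectors
A_true = rng.normal(size=(K, I))       # unknown style of the new speaker
labels = rng.integers(0, C, size=N)
Y = A_true @ B[:, labels] + 0.01 * rng.normal(size=(K, N))   # new-style data

def fit_loss(A, c):
    return float(((Y - A @ B[:, c]) ** 2).sum())

A_est = rng.normal(size=(K, I))        # random initial style estimate
losses = []
for _ in range(20):
    # E-step (hard): assign each observation its best-fitting content class
    d = ((Y[:, None, :] - (A_est @ B)[:, :, None]) ** 2).sum(axis=0)  # (C, N)
    c_hat = d.argmin(axis=0)
    # M-step: re-fit the speaker's style matrix by least squares
    Bc = B[:, c_hat]
    A_est = Y @ Bc.T @ np.linalg.pinv(Bc @ Bc.T)
    losses.append(fit_loss(A_est, c_hat))

print(losses[0] >= losses[-1])   # each step is a minimization, so True
```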
38. Example problem for the Expectation-Maximization (EM) algorithm
- Find the probability that each point came from one of two random spatial processes.
39. [E-step and M-step update equations.]
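The two-process example can be sketched directly (synthetic points, fixed known variance; all values illustrative): the E-step computes each point's probability of belonging to each process, the M-step re-estimates the process means from the weighted points.

```python
import numpy as np

rng = np.random.default_rng(3)
# points drawn from two random spatial processes (isotropic 2-d Gaussians)
pts = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
                 rng.normal([4.0, 4.0], 0.5, size=(50, 2))])

mu = np.array([[-1.0, -1.0], [5.0, 5.0]])   # initial guesses for the means
var = 0.25                                   # known, fixed variance
for _ in range(20):
    # E-step: probability that each point came from each process
    d = np.array([((pts - m) ** 2).sum(axis=1) for m in mu])   # (2, N)
    r = np.exp(-(d - d.min(axis=0)) / (2 * var))               # stabilized
    r /= r.sum(axis=0, keepdims=True)
    # M-step: re-estimate each process mean from its weighted points
    mu = (r @ pts) / r.sum(axis=1, keepdims=True)

print(np.round(mu, 1))   # means converge near the true process centers
```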
40. Classification results: performance comparison

  Multi-layer perceptron               51%
  1-nearest neighbor (NN)              56%
  Discriminant-adaptive NN             62%
  Bilinear model, data not grouped     69%
  Bilinear model, grouped by speaker   76%
41. Task: Classification. Domain: faces and pose.
42. Face pose classification results
- Given observations of a new face, what fraction of the poses can we identify correctly?
- Nearest-neighbor matching: 53%
- Bilinear model (estimate A^s while classifying b^c with EM): 74%
43. Task: Extrapolation. Domain: typography.
Fonts: Chicago, Zapf Chancery, Times, Mistral, Times Bold, Monaco.
(Rest of the alphabet, used in training, not shown.)
44. Coulomb warp representation
Describe each shape by the warp that a square of ink particles would have to undergo to form the shape.
45. Coulomb warping
[Figure: a reference shape warped onto a target shape.]
46. Coulomb warp representation
[Figure: averages of letters under the pixel representation vs. the Coulomb-warp representation.]
47. [Figure: shapes S1 and S2 and their averages, under the pixel and Coulomb representations.]
48. Basis functions for the asymmetric bilinear model
49. Controlling complexity in calculating the style matrix for the new font
- Symmetric model (5 parameters to fit)
- Asymmetric model (173,280 parameters to fit)
- Asymmetric model, using the symmetric model as a prior
- Monaco (true)
50. Results of extrapolation to a new style
Fonts: Chicago, Zapf Chancery, Times, Mistral, Times Bold, Monaco (synthetic vs. actual).
51. Leave-one-out results
Fonts: Zapf Chancery, Times Bold, Chicago, Monaco, Times, Mistral.
53. Task: Translation. Domain: shape and lighting.
- Factor 1: lighting. Factor 2: identity (face shape).
Generalization:
(1) Fit a symmetric bilinear model to the training data (pixel representation).
(2) Solve for the parameters describing the face and the lighting of a new image.
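Step (2) can be sketched as alternating least squares: with the trained W fixed, the symmetric model is linear in a when b is held fixed and linear in b when a is held fixed, so each half-step is an ordinary least-squares solve. A toy version on synthetic data (sizes and data illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
K, I, J = 12, 2, 3            # obs dim, lighting dim, shape dim (illustrative)
W = rng.normal(size=(K, I, J))          # trained interaction matrices
a_true = rng.normal(size=I)             # lighting of the new image
b_true = rng.normal(size=J)             # face shape of the new image
y = np.einsum('i,kij,j->k', a_true, W, b_true)   # the new image

def residual(a, b):
    return float(np.linalg.norm(y - np.einsum('i,kij,j->k', a, W, b)))

a = rng.normal(size=I)
b = rng.normal(size=J)
r0 = residual(a, b)
for _ in range(50):
    # with b fixed the model is linear in a; solve by least squares
    a, *_ = np.linalg.lstsq(np.einsum('kij,j->ki', W, b), y, rcond=None)
    # with a fixed the model is linear in b
    b, *_ = np.linalg.lstsq(np.einsum('i,kij->kj', a, W), y, rcond=None)

print(residual(a, b) <= r0)   # True: each half-step cannot increase the residual
```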
54. Translation results
- Factor 1: lighting. Factor 2: identity (face shape).
55. Conclusion: bilinear models are useful for translation, classification, and extrapolation perceptual tasks.

  factor 1     factor 2        observation
  letter       Matura MT       the character 'A'
  phoneme      speaker         'ahh'
  pose 3       Hiro            face image
  illuminant   surface color   eye cone responses
58. End. The following slides are extras.
59. Style and content
- Mention: an unsupervised version would make a good class project; Josh or I would be interested in working with someone on it.
60. Increase dimensionality to represent non-linearities
- Say f(x) = p x^2 + q x + r.
- This parabola varies non-linearly with x,
- but is a linear function of (x^2, x, 1).
- (Like homogeneous coordinates in graphics.)
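The same trick in code: lifting each x to the feature vector (x^2, x, 1) turns the parabola into a linear model, so ordinary least squares recovers the coefficients (values illustrative):

```python
import numpy as np

p, q, r = 2.0, -3.0, 1.0           # illustrative coefficients
x = np.linspace(-1.0, 1.0, 50)
y = p * x**2 + q * x + r           # nonlinear in x

# lift each x to the feature vector (x^2, x, 1): y is linear in these coordinates
X = np.stack([x**2, x, np.ones_like(x)], axis=1)
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coeffs, 6))         # recovers (p, q, r)
```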
61. Fitting parabolas
[Figure: parabola fits using 1-d, 2-d, and 3-d models.]
62. Reconstruction from a low-dimensional model
64. Task: Classification. Domain: faces and pose.
- Factor 2: identity.
- We build a bilinear model of how head pose and identity modify face appearance.
65. Basis images
- Pose-dependent basis functions for face appearance.
- One set of coefficients will reconstruct the same person in different poses.