Title: Generalization Error of Linear Neural Networks in an Empirical Bayes Approach
Generalization Error of Linear Neural Networks in an Empirical Bayes Approach
- Shinichi Nakajima, Sumio Watanabe
- Tokyo Institute of Technology
- Nikon Corporation
Contents
- Backgrounds
- Regular models
- Unidentifiable models
- Superiority of Bayes to ML
- What's the purpose?
- Setting
- Model
- Subspace Bayes (SB) Approach
- Analysis
- (James-Stein estimator)
- Solution
- Generalization error
- Discussion & Conclusions
Regular Models
Conventional learning theory:
K: dimensionality of parameter space
n: number of samples
x: input
y: output
1. Asymptotic normality holds for the distribution of the ML estimator and for the Bayes posterior, giving
generalization error: $G(n) \simeq K/(2n)$
free energy: $F(n) \simeq (K/2)\log n$
These expansions underlie the model selection methods (AIC, BIC, MDL).
2. Asymptotic generalization error: $\lambda(\mathrm{ML}) = \lambda(\mathrm{Bayes}) = K/2$, where $G(n) = \lambda/n + o(1/n)$.
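To make the regular-model rate concrete, here is a minimal numeric sketch (a toy setup of ours, not from the paper): ML estimation of a K-dimensional Gaussian mean, whose expected KL generalization error should match K/(2n).

```python
import numpy as np

# Toy regular model: estimate the mean of N(0, I_K) from n samples by ML
# (the sample mean); G = KL(true || estimated) = ||mu_hat||^2 / 2.
rng = np.random.default_rng(0)
K, n, trials = 8, 400, 2000
G = 0.0
for _ in range(trials):
    mu_hat = rng.standard_normal((n, K)).mean(axis=0)  # ML estimate; true mean is 0
    G += 0.5 * mu_hat @ mu_hat
print(G / trials, K / (2 * n))  # both are approximately 0.01
```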
Unidentifiable Models
H: number of components
1. Asymptotic normality does NOT hold.
No (penalized-likelihood-type) information criterion is available.
Superiority of Bayes to ML
How do singularities work in learning?
When the true distribution lies on the singularities:
- The increased neighborhood of the true parameter accelerates overfitting.
- The increased population of parameters denoting the true distribution suppresses overfitting (only in Bayes).
1. Asymptotic normality does NOT hold.
No (penalized-likelihood-type) information criterion is available.
2. Bayes has an advantage: $G(\mathrm{Bayes}) < G(\mathrm{ML})$.
What's the Purpose?
- Bayes provides good generalization.
- But it is expensive (needs Markov chain Monte Carlo).
Is there any approximation with both good generalization and tractability?
- Variational Bayes (VB) [Hinton & van Camp 93; MacKay 95; Attias 99; Ghahramani & Beal 00]
  Analyzed in another paper [Nakajima & Watanabe 05].
- Subspace Bayes (SB): analyzed in this work.
Linear Neural Networks (LNNs)
An LNN with M input units, N output units, and H hidden units:
$f(x; A, B) = BAx$
A: input parameter matrix (H x M)
B: output parameter matrix (N x H)
Essential parameter dimensionality: $K = H(M + N - H)$
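A minimal sketch of the LNN map and the parameter count above (the sizes M, N, H below are arbitrary examples of ours):

```python
import numpy as np

M, N, H = 50, 30, 5                # input dim, output dim, hidden units
rng = np.random.default_rng(0)
A = rng.standard_normal((H, M))    # input parameter matrix (H x M)
B = rng.standard_normal((N, H))    # output parameter matrix (N x H)

def lnn(x):
    """Forward map of the LNN: y = B A x."""
    return B @ (A @ x)

# Essential dimensionality: the set of N x M matrices of rank H has
# dimension K = H (M + N - H), fewer than the raw count H (M + N).
K = H * (M + N - H)
print(lnn(np.ones(M)).shape, K)    # (30,) 375
```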
Maximum Likelihood Estimator [Baldi & Hornik 95]
The ML estimator is given by
$\hat{B}\hat{A} = \sum_{h=1}^{H} \gamma_h\, \omega_{b h}\, \omega_{a h}^{\top}\, Q^{-1/2},$
where $Q = n^{-1}\sum_{i=1}^{n} x_i x_i^{\top}$ and $R = n^{-1}\sum_{i=1}^{n} y_i x_i^{\top}$.
Here
$\gamma_h$: h-th largest singular value of $RQ^{-1/2}$,
$\omega_{a h}$: corresponding right singular vector,
$\omega_{b h}$: corresponding left singular vector.
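A sketch of this reduced-rank solution under the definitions above; the function name and the use of NumPy are ours:

```python
import numpy as np

def ml_estimator(X, Y, H):
    """X: (n, M) inputs, Y: (n, N) outputs. Returns the rank-H ML map B A."""
    n = X.shape[0]
    Q = X.T @ X / n                    # empirical input moment matrix (M, M)
    R = Y.T @ X / n                    # empirical cross moment matrix (N, M)
    evals, evecs = np.linalg.eigh(Q)
    Q_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric Q^{-1/2}
    U, gamma, Vt = np.linalg.svd(R @ Q_isqrt)            # gamma: singular values
    # Keep the H largest singular components of R Q^{-1/2}.
    return sum(gamma[h] * np.outer(U[:, h], Vt[h]) for h in range(H)) @ Q_isqrt

# Example usage: recover a low-rank map from noisy data.
rng = np.random.default_rng(1)
n, M, N = 1000, 8, 6
true_BA = rng.standard_normal((N, 2)) @ rng.standard_normal((2, M))
X = rng.standard_normal((n, M))
Y = X @ true_BA.T + 0.1 * rng.standard_normal((n, N))
print(np.linalg.norm(ml_estimator(X, Y, 2) - true_BA))  # small estimation error
```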
Bayes Estimation
x: input, y: output, w: parameter
True distribution: $q(y|x)$
Learner: $p(y|x; w)$
Prior: $\varphi(w)$
In ML (or MAP): predict with one model, the point estimate of $w$.
In Bayes: predict with an ensemble of models, the posterior average
$p(y|x; \text{data}) = \int p(y|x; w)\, p(w|\text{data})\, dw$.
Empirical Bayes (EB) Approach [Efron & Morris 73]
True distribution, learner, and prior are as in Bayes estimation, but the prior now depends on a hyperparameter.
The hyperparameter is estimated by maximizing the marginal likelihood.
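A concrete toy illustration of EB (our assumed example in the spirit of Efron & Morris, not the paper's model): observations $y_k \sim \mathcal{N}(\theta_k, 1)$ with prior $\theta_k \sim \mathcal{N}(0, \tau)$, where $\tau$ is chosen to maximize the marginal likelihood.

```python
import numpy as np

def eb_posterior_mean(y):
    # Marginally, y_k ~ N(0, 1 + tau); the marginal-likelihood maximizer of
    # (1 + tau) is mean(y^2), clipped at 1 because tau >= 0.
    tau = max(0.0, float(np.mean(y ** 2)) - 1.0)
    # The posterior mean shrinks each observation toward the prior mean 0.
    return tau / (1.0 + tau) * y

y = np.array([0.3, -0.5, 4.0, 0.1])
print(eb_posterior_mean(y))  # all components shrunk by the same EB factor
```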
Subspace Bayes (SB) Approach
SB is an EB approach in which part of the parameters are regarded as hyperparameters.
a) MIP (Marginalizing in Input Parameter space) version:
A: parameter
B: hyperparameter
b) MOP (Marginalizing in Output Parameter space) version:
A: hyperparameter
B: parameter
Marginalization can be done analytically in LNNs.
Intuitive Explanation
[Figure: Bayes posterior vs. SB posterior. For a redundant component, Bayes marginalizes over the whole parameter space, while SB optimizes the hyperparameter part and marginalizes only the remaining subspace.]
Free Energy (a.k.a. Evidence, Stochastic Complexity)
Free energy: $F = -\log \int \prod_{i=1}^{n} p(y_i|x_i; w)\, \varphi(w)\, dw$
An important quantity used for model selection [Akaike 80; MacKay 92].
We minimize the free energy when optimizing the hyperparameter.
Generalization Error
Generalization error: the Kullback-Leibler divergence between the true distribution q and the predictive distribution p,
$G(n) = \left\langle \int q(x)\, q(y|x) \log \frac{q(y|x)}{p(y|x; \text{data})}\, dx\, dy \right\rangle,$
where $\langle \cdot \rangle$ denotes the expectation over training data drawn from q.
Asymptotic expansion: $G(n) = \lambda/n + o(1/n)$, where $\lambda$ is the generalization coefficient.
In regular models, $\lambda = K/2$.
In unidentifiable models, $\lambda$ depends on the learning method and can differ from $K/2$.
James-Stein (JS) Estimator
Domination of estimator a over estimator b:
$G(a) \le G(b)$ for any true parameter, and
$G(a) < G(b)$ for a certain true parameter.
K-dimensional mean estimation (a regular model) from n samples:
James-Stein estimator [James & Stein 61]:
$\hat{\theta}_{\mathrm{JS}} = \left(1 - \frac{K-2}{n\|\bar{x}\|^2}\right)\bar{x}$, where $\bar{x}$ is the sample mean.
For $K \ge 3$, the JS estimator dominates the ML estimator $\bar{x}$.
A certain relation between EB and JS was discussed in [Efron & Morris 73].
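A minimal sketch of the JS estimator and a Monte Carlo risk comparison against ML (our toy check, with true mean zero, where shrinkage helps most):

```python
import numpy as np

def js_estimator(X):
    """X: (n, K) i.i.d. samples from N(theta, I_K); returns the JS estimate."""
    n, K = X.shape
    xbar = X.mean(axis=0)
    return (1.0 - (K - 2) / (n * float(xbar @ xbar))) * xbar

# Monte Carlo risk comparison against ML (the sample mean), true theta = 0:
rng = np.random.default_rng(0)
K, n, trials = 10, 20, 2000
risk_ml = risk_js = 0.0
for _ in range(trials):
    X = rng.standard_normal((n, K))
    risk_ml += np.sum(X.mean(axis=0) ** 2)
    risk_js += np.sum(js_estimator(X) ** 2)
print(risk_js / trials, risk_ml / trials)  # JS risk is far smaller at theta = 0
```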
Positive-Part JS Estimator
Positive-part JS type (PJS) estimator:
$\hat{\theta}_{\mathrm{PJS}} = \max\!\left(0,\ 1 - \frac{L}{n\|\bar{x}\|^2}\right)\bar{x}$, where L is the degree of shrinkage.
Thresholding: $\hat{\theta}_{\mathrm{PJS}} = 0$ whenever $n\|\bar{x}\|^2 \le L$, which amounts to model selection.
PJS is a model-selecting, shrinkage estimator.
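A sketch of the PJS estimator, assuming the thresholding form stated above (degree L); note how weak components are set exactly to zero:

```python
import numpy as np

def pjs_estimator(xbar, n, L):
    """Positive-part JS with degree L: returns 0 when n * ||xbar||^2 <= L."""
    factor = max(0.0, 1.0 - L / (n * float(xbar @ xbar)))
    return factor * xbar  # the zero vector means the component is selected out

print(pjs_estimator(np.array([0.05, 0.05]), n=100, L=10))  # [0. 0.] - cut off
print(pjs_estimator(np.array([1.0, 1.0]), n=100, L=10))    # shrunk but kept
```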
Hyperparameter Optimization
Assume orthonormality of the hyperparameter matrix ($I_d$: the d x d identity matrix).
The optimization is then analytically solved in LNNs!
The optimum hyperparameter value is obtained in closed form.
SB Solution (Theorem 1, Lemma 1)
L: dimensionality of the marginalized subspace (per component), i.e., L = M in MIP, or L = N in MOP.
Theorem 1: The SB estimator is given by
$\hat{B}\hat{A}_{\mathrm{SB}} = \sum_{h=1}^{H} \hat{\gamma}_h\, \omega_{b h}\, \omega_{a h}^{\top}\, Q^{-1/2},$
where, asymptotically, $\hat{\gamma}_h = \max\!\left(0,\ 1 - \frac{L}{n\gamma_h^2}\right)\gamma_h$.
Lemma 1: The posterior is localized, so that we can substitute the model at the SB estimator for the predictive distribution.
Hence SB is asymptotically equivalent to PJS estimation.
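A sketch combining the previous pieces, assuming the asymptotic PJS form of Theorem 1 (the shrinkage expression for $\hat{\gamma}_h$ is our reading of the slide; names are ours, mirroring `ml_estimator` above):

```python
import numpy as np

def sb_estimator(X, Y, H, L):
    """SB estimate of the map B A; L = M for MIP, L = N for MOP."""
    n = X.shape[0]
    Q = X.T @ X / n
    R = Y.T @ X / n
    evals, evecs = np.linalg.eigh(Q)
    Q_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric Q^{-1/2}
    U, gamma, Vt = np.linalg.svd(R @ Q_isqrt)
    # PJS-style shrinkage per component: redundant components are cut to zero.
    g = np.maximum(0.0, 1.0 - L / (n * gamma[:H] ** 2)) * gamma[:H]
    return sum(g[h] * np.outer(U[:, h], Vt[h]) for h in range(H)) @ Q_isqrt
```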
Generalization Error (Theorem 2)
Theorem 2: The SB generalization coefficient is given in closed form as an expectation over a Wishart distribution, in terms of $\alpha_h$, the h-th largest eigenvalue of a random matrix subject to $W_{N-H^*}(M-H^*, I_{N-H^*})$, where $H^*$ denotes the true rank and $\langle \cdot \rangle$ denotes the expectation over that Wishart distribution.
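The Wishart expectation in Theorem 2 can be evaluated by Monte Carlo; the sketch below shows only the sampling part, with `phi` as a hypothetical placeholder for the paper's integrand, which is not reproduced here:

```python
import numpy as np

def wishart_eigvals(N, M, H_true, rng):
    """Eigenvalues (largest first) of one draw from W_{N-H*}(M-H*, I)."""
    d, dof = N - H_true, M - H_true
    Z = rng.standard_normal((dof, d))
    return np.linalg.eigvalsh(Z.T @ Z)[::-1]

def wishart_expectation(phi, N, M, H_true, trials=10000, seed=0):
    """Monte Carlo estimate of < phi(alpha_1, ..., alpha_d) >."""
    rng = np.random.default_rng(seed)
    return np.mean([phi(wishart_eigvals(N, M, H_true, rng))
                    for _ in range(trials)])
```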
Large-Scale Approximation (Theorem 3)
Theorem 3: In the large-scale limit, where M, N, H, and $H^*$ go to infinity with their ratios fixed, the generalization coefficient converges to a deterministic value, obtained by replacing the Wishart expectation of Theorem 2 with an integral over the limiting eigenvalue distribution of the Wishart matrix.
Results 1 (True Rank Dependence)
[Plot: generalization coefficient vs. true rank $H^*$ for ML, Bayes, SB(MIP), and SB(MOP); N = 30, M = 50.]
SB provides good generalization.
Note: this does NOT mean that SB dominates Bayes. A discussion of domination requires consideration of a delicate situation (see the paper).
Results 2 (Redundant Rank Dependence)
[Plot: generalization coefficient vs. model rank H for ML, Bayes, SB(MOP), and SB(MIP); N = 30, M = 50.]
The SB generalization error depends on the redundant rank H similarly to that of ML, i.e., SB also has an ML-like property.
Features of SB
- Provides good generalization.
- In LNNs, asymptotically equivalent to PJS estimation.
- Requires smaller computational costs:
  - the marginalized space is reduced;
  - in some models, marginalization can be done analytically.
- Related to the variational Bayes (VB) approach.
Variational Bayes (VB) Solution [Nakajima & Watanabe 05]
- VB results in the same solution as MIP.
- VB automatically selects the larger dimension to marginalize.
[Figure: Bayes posterior vs. VB posterior; the VB posterior is similar to the SB posterior.]
Conclusions
- We have introduced a subspace Bayes (SB) approach.
- We have proved that, in LNNs, SB is asymptotically equivalent to a shrinkage (PJS) estimation.
- Even asymptotically, SB for redundant components converges not to the ML solution but to a smaller value, which means suppression of overfitting.
- Interestingly, the MIP version of SB is asymptotically equivalent to VB.
- We have clarified the SB generalization error.
- SB has both Bayes-like and ML-like properties, i.e., shrinkage, and acceleration of overfitting by basis selection.
Future Work
- Analysis of other models (neural networks, Bayesian networks, mixture models, etc.).
- Analysis of variational Bayes (VB) in other models.
Thank you!