Best Practices for Convolutional NNs Applied to Visual Document Analysis
(according to P.Y. Simard, D. Steinkraus, and J.C. Platt)
outline
- the task
- training set expansion
- network architecture
- learning
the task
- handwriting recognition
- segmented handwritten digits
- data
- benchmark set of English digit images (MNIST)
- size-normalized to 28 x 28 pixels
- 60,000 training patterns, 10,000 test patterns
- goal: map the image vector to a class in {0, 1, ..., 9}
training set expansion
- E_test - E_train ∝ 1/P (P = size of the training set)
- idea: apply transformations to generate additional data; the learning algorithm will learn transformation invariance (w.r.t. the original, non-transformed input)
training set expansion
- examples of transformations
  - translation
  - rotation
  - skewing
- method: for every pixel in the original image, compute a new location from a displacement field, e.g.
  - Δx(x,y) = 1, Δy(x,y) = 0 (translation by one pixel to the right)
  - Δx(x,y) = αx, Δy(x,y) = αy (scaling; interpolation needed if α is not an integer)
- elastic deformations
training set expansion
- worked example (bilinear interpolation)
  - displacement at pixel (0,0): x_new(0,0) = 1.75, y_new(0,0) = -0.5
  - gray levels (gl) at the surrounding integer locations: gl(1,0) = 3, gl(2,0) = 7, gl(1,-1) = 5, gl(2,-1) = 9
  - evaluate gl at (x_new, y_new) with bilinear interpolation
  - over x: 3 + 0.75 * (7 - 3) = 6 (on the row y = 0) and 5 + 0.75 * (9 - 5) = 8 (on the row y = -1)
  - over y: 8 + 0.5 * (6 - 8) = 7
  - so the new gray level at (0,0) is 7
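The bilinear interpolation worked through above can be sketched directly in NumPy (the helper name `bilinear` and the array layout, with row 0 holding the y = 0 scan line, are my own choices):

```python
import numpy as np

def bilinear(img, x, y):
    """Gray level at fractional location (x, y); img[row, col] with row = y index.
    Assumes (x, y) lies strictly inside the image (no border handling)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    top = img[y0, x0] + fx * (img[y0, x0 + 1] - img[y0, x0])            # over x, first row
    bot = img[y0 + 1, x0] + fx * (img[y0 + 1, x0 + 1] - img[y0 + 1, x0])  # over x, second row
    return top + fy * (bot - top)                                        # over y

# the slide's numbers: 3 and 7 on one scan line, 5 and 9 on the next
img = np.array([[0.0, 3.0, 7.0],
                [0.0, 5.0, 9.0]])
gl = bilinear(img, 1.75, 0.5)  # 3+0.75*4 = 6, 5+0.75*4 = 8, then 8+0.5*(6-8) = 7
```

With these values `gl` reproduces the slide's result of 7.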
training set expansion
- elastic deformations
  - Δx(x,y) = rand(-1, 1), Δy(x,y) = rand(-1, 1)
  - smooth both fields with a Gaussian of a given standard deviation SD (in pixels)
  - if the chosen SD is large, the resulting values are small (the field averages out)
  - if the SD is small, the field stays essentially random
  - an intermediate SD yields an elastic deformation
  - multiply by a factor to control the intensity of the deformation
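A minimal NumPy sketch of these displacement fields (separable Gaussian smoothing via `np.convolve`; the function names, the 3-SD kernel truncation, and the zero padding at the borders are my own simplifications, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(sigma):
    r = int(3 * sigma)                       # truncate at 3 standard deviations
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def smooth(field, sigma):
    """Separable Gaussian smoothing; assumes the image side >= kernel length."""
    k = gaussian_kernel(sigma)
    field = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, field)
    field = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, field)
    return field

def elastic_fields(shape, sigma, alpha):
    """Random displacements in [-1, 1], Gaussian-smoothed, scaled by intensity alpha."""
    dx = smooth(rng.uniform(-1, 1, shape), sigma) * alpha
    dy = smooth(rng.uniform(-1, 1, shape), sigma) * alpha
    return dx, dy

dx, dy = elastic_fields((28, 28), sigma=4.0, alpha=8.0)
```

Each pixel (x, y) of the deformed image is then read from (x + dx, y + dy) with the bilinear interpolation of the previous slide; a very large sigma washes the field out, a very small one leaves pure noise.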
network architecture
- account for topological properties of the input (shape of curves, edges, etc.)
- gradually extract more complex features
- simple features extracted at higher resolutions; more complex features at coarser resolutions over smaller regions
- conversion from one resolution to the next with the operation of convolution
- coarser resolutions generated by sub-sampling
network architecture
- a set of layers, each with one or more planes (feature maps)
- each unit on a plane receives input from a small area on the planes of the previous layer → local receptive fields
- shared weights at all points on a plane → reduced number of parameters
- multiple planes in each layer → detect multiple features
- once a feature is detected, spatial subsampling → local averaging of the plane's outputs
- result: (partial) invariance to translation, rotation, scale, and deformation
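The two core operations above, a plane of shared-weight convolutions followed by subsampling, can be sketched as (NumPy; the sizes and function names are illustrative choices, not the paper's):

```python
import numpy as np

def convolve2d_valid(img, kernel):
    """Apply the same (shared) kernel weights at every position; 'valid' borders."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def subsample2x2(plane):
    """Spatial subsampling: average over non-overlapping 2x2 blocks."""
    h, w = plane.shape
    plane = plane[:h - h % 2, :w - w % 2]   # drop odd edge rows/columns
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

feature_map = convolve2d_valid(np.ones((6, 6)), np.ones((3, 3)))  # shape (4, 4)
pooled = subsample2x2(feature_map)                                # shape (2, 2)
```

Because the same kernel is reused everywhere, this plane has only 9 weights regardless of the image size, which is exactly the parameter reduction the slide refers to.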
network architecture
- example layer stack (input → output)
  - C1: convolution, 5 features (e.g. edge, ink, intersection), kernel size 5x5
  - S1: subsampling (factor of 2)
  - C2: convolution, 50 features
  - S2: subsampling
  - fully connected layer with 100 hidden units
gradient-based learning
- backpropagation
- output: Yp = F(Xp, W)
- loss function: Ep = D(Dp, F(Xp, W))
- Etrain(W): average of Ep over the training set (X1, D1), ..., (XP, DP)
- e.g. Ep = (Dp - F(Xp, W))^2 / 2
- Etrain(W) = (1/P) * sum_p Ep
- simplest setting: find W that minimizes Etrain(W)
gradient-based learning
- if E is differentiable w.r.t. W, gradient-based optimization can be used to compute the minimum
- module output: Xn = Fn(Xn-1, Wn)
  - Wn: the module's trainable parameters, Wn ⊂ W
  - Xn-1: the module's input (the previous module's output)
  - X0: the input pattern Xp
gradient-based learning
- if ∂E/∂Xn is known, then ∂E/∂Wn and ∂E/∂Xn-1 can be computed:
  - ∂E/∂Wn = ∂F/∂W (Wn, Xn-1) · ∂E/∂Xn (compute the gradient)
  - ∂E/∂Xn-1 = ∂F/∂X (Wn, Xn-1) · ∂E/∂Xn (propagate backward)
- ∂F/∂W: Jacobian of F w.r.t. W, evaluated at (Wn, Xn-1)
- the Jacobian JF is the matrix containing the partial derivatives of all outputs w.r.t. all inputs
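As a concrete check of the two Jacobian rules, take a single linear module Xn = W Xn-1 with a squared loss; the Jacobian products reduce to an outer product and a transposed matrix multiply. A NumPy sketch with made-up sizes, verified against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))      # module parameters W_n
x_prev = rng.normal(size=4)      # module input X_{n-1}
d = rng.normal(size=3)           # desired output

def loss(W, x_prev):
    x = W @ x_prev               # module output X_n
    return 0.5 * np.sum((d - x) ** 2)

# backward pass: from dE/dX_n, the Jacobians give dE/dW_n and dE/dX_{n-1}
dE_dx = W @ x_prev - d           # dE/dX_n for the squared loss
dE_dW = np.outer(dE_dx, x_prev)  # Jacobian w.r.t. W applied to dE/dX_n
dE_dxprev = W.T @ dE_dx          # propagate backward through the module

# finite-difference check of one entry of each gradient
eps = 1e-6
W2 = W.copy(); W2[0, 0] += eps
num_W = (loss(W2, x_prev) - loss(W, x_prev)) / eps
x2 = x_prev.copy(); x2[0] += eps
num_x = (loss(W, x2) - loss(W, x_prev)) / eps
```

The numeric quotients `num_W` and `num_x` agree with `dE_dW[0, 0]` and `dE_dxprev[0]` up to the finite-difference error.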
gradient-based learning
- simplest minimization: gradient descent
- W is iteratively adjusted as W ← W - η ∂Etrain/∂W
- traditional backprop is a special case of gradient-based learning with
  - Yn = Wn Xn-1
  - Xn = F(Yn)
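Putting the pieces together, gradient descent on the averaged squared loss for one linear module looks like the following (a toy NumPy sketch; the data, learning rate, and iteration count are arbitrary choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W_true = rng.normal(size=(2, 3))          # the map the weights should recover
X = rng.normal(size=(50, 3))              # 50 input patterns X_p
D = X @ W_true.T                          # desired outputs D_p (noise-free)

W = np.zeros((2, 3))
eta = 0.1                                 # learning rate
for _ in range(500):
    Y = X @ W.T                           # forward pass: Y_p = F(X_p, W)
    grad = (Y - D).T @ X / len(X)         # dE_train/dW for E_p = ||D_p - Y_p||^2 / 2
    W -= eta * grad                       # update: W <- W - eta * dE_train/dW

E_train = np.mean(np.sum((D - X @ W.T) ** 2, axis=1)) / 2
```

On this noise-free toy problem the loss is convex, so the iterates converge to `W_true` and `E_train` approaches zero.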
application
- zip-code scanning (a version generalized over the time domain)
- fax reading
- similar techniques used in other digital image recognition tasks (e.g. face recognition, X-ray, MRI, etc.)
- a later version (2003) dynamically changes layer parameters