Minicourse on Artificial Neural Networks and Bayesian Networks

Transcript and Presenter's Notes

1
Mini-course on Artificial Neural Networks and
Bayesian Networks
  • Michal Rosen-Zvi

Mini-course on ANN and BN, The Multidisciplinary
Brain Research center, Bar-Ilan University, May
2004
2
Section 1 Introduction
3
Networks (1)
  • Networks are a visual way of displaying relationships
  • Social networks are examples of flat networks, where the only information is the relation between entities

4
Example collaboration network

1. Itay Gat, Naftali Tishby, and Moshe Abeles. Analyzing Cortical Activity using Hidden Markov Models. Network: Computation in Neural Systems, August 1997.
2. Moshe Abeles, Hagai Bergman, Itay Gat, Isaac Meilijson, Eyal Seidemann, Naftali Tishby, and Eilon Vaadia. Cortical Activity Flips Among Quasi Stationary States. Prepared Feb 1, 1995; appeared in the Proceedings of the National Academy of Sciences (PNAS).
3. David Haussler, Michael Kearns, H. Sebastian Seung, and Naftali Tishby. Rigorous Learning Curve Bounds from Statistical Mechanics. Prepared July 1994; full version in Machine Learning (1997).
4. H. S. Seung, Haim Sompolinsky, and Naftali Tishby. Learning Curves in Large Neural Networks. COLT 1991, 112-127.
5. Yann LeCun, Ido Kanter, and Sara A. Solla. Second Order Properties of Error Surfaces. NIPS 1990, 918-924.
6. Esther Levin, Naftali Tishby, and Sara A. Solla. A Statistical Approach to Learning and Generalization in Layered Neural Networks. COLT 1989, 245-260.
7. V. Litvak, H. Sompolinsky, I. Segev, and M. Abeles (2003). On the Transmission of Rate Code in Long Feedforward Networks with Excitatory-Inhibitory Balance. Journal of Neuroscience 23(7), 3006-3015.
8. W. Senn, I. Segev, and M. Tsodyks (1998). Reading neural synchrony with depressing synapses. Neural Computation 10, 815-819.
9. M. Tsodyks, I. Mit'kov, and H. Sompolinsky (1993). Pattern of synchrony in inhomogeneous networks of oscillators with pulse interactions. Phys. Rev. Lett.
10. Yuval Aviel, David Horn, and Moshe Abeles. Memory Capacity of Balanced Networks.
11. Ofer Hendin, David Horn, and Misha Tsodyks. The Role of Inhibition in an Associative Memory Model of the Olfactory Bulb.
12. Gal Chechik, Amir Globerson, Naftali Tishby, and Yair Weiss. Information Bottleneck for Gaussian Variables. Prepared June 2003; submitted to NIPS-2003.
matlab
5
Networks (2)
  • Artificial Neural Networks represent rules (deterministic relations) between input and output

6
Networks (3)
  • Bayesian Networks represent probabilistic
    relations - conditional independencies and
    dependencies between variables

7
Outline
  • Introduction/Motivation
  • Artificial Neural Networks
  • The Perceptron, multilayered feed-forward (FF) NN and recurrent NN
  • On-line (supervised) learning
  • Unsupervised learning and PCA
  • Classification
  • Capacity of networks
  • Bayesian networks (BN)
  • Bayes rules and the BN semantics
  • Classification using generative models
  • Applications: vision, text

8
Motivation
  • Research on ANNs is inspired by neurons in the brain and (partially) driven by the need for models of reasoning in the brain.
  • Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (for example, driving a car, assigning scientific referees to papers, and many others)

9
Questions
  • How can a network learn?
  • What will be the learning rate?
  • What are the limitations on the network capacity?
  • How can networks be used to classify results with no labels (unsupervised learning)?
  • What are the relations and differences between learning in ANN and learning in BN?
  • How can network models explain high-level reasoning?

10
History of (modern) ANNs and BNs
11
Section 2 On-line Learning
  • Based on slides from Michael Biehl's summer course

12
Section 2.1 The Perceptron
13
The Perceptron
  • Input $\xi$
  • Adaptive weights $J$
  • Output $S$

14
Perceptron: binary output
  • Implements a linearly separable classification of inputs
  • Milestones:
  • Perceptron convergence theorem, Rosenblatt (1958)
  • Capacity, Winder (1963), Cover (1965)
  • Statistical physics of perceptron weights, Gardner (1988)
  • How does this device learn?

15
Learning a linearly separable rule from reliable
examples
  • Unknown rule: $S_T(\xi) = \mathrm{sign}(B \cdot \xi) = \pm 1$
  • Defines the correct classification.
  • Parameterized through a teacher perceptron with weights $B \in \mathbb{R}^N$ ($B \cdot B = 1$)
  • Only available information: example data
  • $D = \{\xi^\mu, S_T^\mu = \mathrm{sign}(B \cdot \xi^\mu)\}$ for $\mu = 1, \dots, P$

16
Learning a linearly separable rule (cont.)
  • Training: finding the student weights $J$
  • $J$ parameterizes a hypothesis $S_S(\xi) = \mathrm{sign}(J \cdot \xi)$
  • Supervised learning is based on the student performance with respect to the training data $D$
  • Binary error measure
  • $\epsilon_T^\mu(J)$ depends only on $S_S(\xi^\mu)$ and $S_T(\xi^\mu)$
  • $\epsilon_T^\mu(J) = 1$ if $S_S(\xi^\mu) \neq S_T(\xi^\mu)$; $\epsilon_T^\mu(J) = 0$ if $S_S(\xi^\mu) = S_T(\xi^\mu)$

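The teacher-student setup and the binary error measure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Gaussian input components, a field of exactly zero classified as +1, and helper names (`make_teacher`, `make_examples`, `training_error`) that are hypothetical and not taken from the course material.

```python
import math
import random

def sign(value):
    # Assumed convention: a field of exactly zero is classified as +1.
    return 1 if value >= 0 else -1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_teacher(n, rng):
    # Random teacher direction, normalized so that B.B = 1.
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(b, b))
    return [c / norm for c in b]

def make_examples(teacher, p, rng):
    # P independent inputs (zero mean, unit variance components), labelled by the teacher.
    data = []
    for _ in range(p):
        xi = [rng.gauss(0.0, 1.0) for _ in teacher]
        data.append((xi, sign(dot(teacher, xi))))
    return data

def training_error(student, data):
    # Fraction of training examples on which student and teacher labels differ.
    mistakes = sum(1 for xi, label in data if sign(dot(student, xi)) != label)
    return mistakes / len(data)

rng = random.Random(0)
B = make_teacher(20, rng)
D = make_examples(B, 200, rng)
print(training_error(B, D))                # a student equal to the teacher
print(training_error([-c for c in B], D))  # a student pointing the opposite way
```

A student identical to the teacher reproduces every label, while the reversed vector flips every label; any other student lies between these two extremes.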
17
Off-line learning
  • Guided by the minimization of a cost function $H(J)$, e.g., the training error
  • $H(J) = \sum_\mu \epsilon_T^\mu(J)$
  • Equilibrium statistical mechanics treatment
  • Energy $H$ of $N$ degrees of freedom
  • Ensemble of systems is in thermal equilibrium at a formal temperature
  • Disorder average over random examples (replicas) assumes a distribution over the inputs
  • Macroscopic description, order parameters
  • Typical properties of large systems, $P = \alpha N$

18
On-line training
  • Single presentation of an uncorrelated (new) example $\{\xi^\mu, S_T^\mu\}$
  • Update of student weights
  • Learning dynamics in discrete time

19
On-line training - Statistical Physics approach
  • Consider a sequence of independent, random examples
  • Thermodynamic limit
  • Disorder average over the latest example; self-averaging properties
  • Continuous time limit

20
Generalization
  • Performance of the student (after training) with respect to an arbitrary, new input
  • In practice: empirical mean of the error measure over a set of test inputs
  • In the theoretical analysis: average over the (assumed) probability density of inputs
  • Generalization error

21
Generalization (cont.)
  • The simplest model distribution
  • Isotropic density $P(\xi)$, $\xi$ uncorrelated with $B$ and $J$
  • Consider vectors of independent identically distributed (iid) components $\xi_j$ with $\langle \xi_j \rangle = 0$ and $\langle \xi_j \xi_k \rangle = \delta_{jk}$

22
Geometric argument
  • Projection of the data onto the $(B, J)$-plane yields an isotropic density of inputs

$\epsilon_g = \theta / \pi$, where $\theta$ is the angle between $B$ and $J$
Inputs with $S_T(\xi) \neq S_S(\xi)$ lie between the two decision boundaries
For $|B| = 1$
23
Overlap Parameters
  • Sufficient to quantify the success of learning
  • $R = B \cdot J$, $Q = J \cdot J$
  • Random guessing: $R = 0$, $\epsilon_g = 1/2$
  • Perfect generalization: $R / \sqrt{Q} = 1$, $\epsilon_g = 0$

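As a sanity check on the geometric argument, the relation $\epsilon_g = \theta/\pi$ with $\cos\theta = R/\sqrt{Q}$ (for $|B| = 1$) can be compared with a direct Monte Carlo estimate. The sketch below assumes Gaussian isotropic inputs and an arbitrary, hypothetical student vector; it is an illustration, not part of the original course code.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predicted_error(B, J):
    # epsilon_g = theta / pi, with cos(theta) = R / sqrt(Q) for |B| = 1.
    R, Q = dot(B, J), dot(J, J)
    cos_theta = max(-1.0, min(1.0, R / math.sqrt(Q)))
    return math.acos(cos_theta) / math.pi

def measured_error(B, J, samples, rng):
    # Fraction of random isotropic inputs on which teacher and student disagree.
    disagreements = 0
    for _ in range(samples):
        xi = [rng.gauss(0.0, 1.0) for _ in B]
        if dot(B, xi) * dot(J, xi) < 0:
            disagreements += 1
    return disagreements / samples

rng = random.Random(1)
B = [rng.gauss(0.0, 1.0) for _ in range(10)]
norm = math.sqrt(dot(B, B))
B = [b / norm for b in B]                        # teacher with B.B = 1
J = [b + 0.5 * rng.gauss(0.0, 1.0) for b in B]   # hypothetical imperfect student

eps_theory = predicted_error(B, J)
eps_measured = measured_error(B, J, 20000, rng)
print(eps_theory, eps_measured)
```

The two numbers should agree up to sampling noise, which shrinks as the number of test inputs grows.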
24
Derivation for large N
  • Given $B$, $J$, and uncorrelated random input with $\langle \xi_i \rangle = 0$, $\langle \xi_i \xi_j \rangle = \delta_{ij}$, consider student/teacher fields that are sums of (many) independent random quantities
  • $x = J \cdot \xi = \sum_i J_i \xi_i$
  • $y = B \cdot \xi = \sum_i B_i \xi_i$

25
Central Limit Theorem
  • For $N \to \infty$, the joint density of $(x, y)$ is a two-dimensional Gaussian, fully specified by the first and second moments
  • $\langle x \rangle = \sum_i J_i \langle \xi_i \rangle = 0$, $\langle y \rangle = \sum_i B_i \langle \xi_i \rangle = 0$
  • $\langle x^2 \rangle = \sum_{ij} J_i J_j \langle \xi_i \xi_j \rangle = \sum_i J_i^2 = Q$
  • $\langle y^2 \rangle = \sum_{ij} B_i B_j \langle \xi_i \xi_j \rangle = \sum_i B_i^2 = 1$
  • $\langle x y \rangle = \sum_{ij} J_i B_j \langle \xi_i \xi_j \rangle = \sum_i J_i B_i = R$

26
Central Limit Theorem (Cont.)
  • Details of the input are irrelevant.
  • Some possible examples: binary, $\xi_i = \pm 1$ with equal probability; uniform; Gaussian.

27
Generalization Error
  • The isotropic distribution is also assumed to describe the statistics of the example data inputs

Exercise: Derive the generalization error as a function of $R$ and $Q$ (use the Mathematical notes).
28
Assumptions about the data
  • No spatial correlations
  • No distinguished directions in the input space
  • No temporal correlations
  • No correlations with the rule
  • Single presentation without repetitions
  • Consequences:
  • Average over data can be performed step by step
  • Actual choice of $B$ is irrelevant; it is not necessary to average over the teacher

29
Hebbian learning (revisited), Hebb 1949
  • Off-line interpretation (Vallet 1989)
  • Choice of student weights given $D = \{\xi^\mu, S_T^\mu\}_{\mu=1}^{P}$
  • $J(P) = \sum_\mu \xi^\mu S_T^\mu / N$
  • Equivalent on-line interpretation
  • Dynamics upon single presentation of examples
  • $J(\mu) = J(\mu - 1) + \xi^\mu S_T^\mu / N$

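A minimal simulation of the on-line Hebb rule $J(\mu) = J(\mu-1) + \xi^\mu S_T^\mu / N$ may help make the dynamics concrete. The sketch assumes Gaussian inputs, a tabula rasa start $J = 0$, and hypothetical values $N = 100$ and $\alpha = 5$; the function name `hebb_online` is illustrative only.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hebb_online(n, alpha, rng):
    # Teacher with B.B = 1; student starts from J = 0 (tabula rasa).
    B = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(B, B))
    B = [b / norm for b in B]
    J = [0.0] * n
    for _ in range(int(alpha * n)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]   # new, uncorrelated example
        label = 1 if dot(B, xi) >= 0 else -1           # teacher output S_T
        J = [j + c * label / n for j, c in zip(J, xi)]  # Hebb update
    return dot(B, J), dot(J, J)                        # order parameters R, Q

R, Q = hebb_online(100, 5.0, random.Random(2))
cos_theta = max(-1.0, min(1.0, R / math.sqrt(Q)))
eps_g = math.acos(cos_theta) / math.pi
print(R, Q, eps_g)
```

Every example is used exactly once, so the number of updates is $\alpha N$; repeating the run with larger $N$ should reduce the run-to-run scatter of $R$ and $Q$.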
30
Hebb on-line
  • From microscopic to macroscopic: recursions for overlaps

Exercise: Derive the update equations of $R$ and $Q$.
31
Hebb on-line (Cont.)
  • Average over the latest example $\xi^\mu$
  • The random input $\xi^\mu$ enters only through the fields
  • The random input $\xi^\mu$ and $J(\mu - 1)$, $B$ are statistically independent
  • The Central Limit Theorem applies and yields the joint density

32
Hebb on-line (Cont.)
Exercise: Derive the update equations of $R$ and $Q$ as a function of $\alpha$ (use the Mathematical notes, off-line).
33
Hebb on-line (Cont.)
  • Continuous time limit: $N \to \infty$, $\alpha = \mu / N$, $d\alpha = 1/N$

Initial conditions (tabula rasa): $R(0) = Q(0) = 0$
What are the mean values after training with $\alpha N$ examples? (See matlab code.)
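For reference, one way to carry out the continuous-time limit for the Hebb rule $J(\mu) = J(\mu-1) + \xi^\mu S_T^\mu / N$ is sketched below. It assumes the joint Gaussian density of the fields $(x, y)$ with $\langle x^2 \rangle = Q$, $\langle y^2 \rangle = 1$, $\langle xy \rangle = R$, the relation $\epsilon_g = \theta/\pi$ with $\cos\theta = R/\sqrt{Q}$, and $\xi \cdot \xi / N \to 1$; treat it as a worked outline rather than the course's official solution.

```latex
\begin{align}
\frac{dR}{d\alpha} &= \langle y\,\mathrm{sign}(y) \rangle = \sqrt{2/\pi}, \\
\frac{dQ}{d\alpha} &= 2\,\langle x\,\mathrm{sign}(y) \rangle + 1 = 2\sqrt{2/\pi}\,R + 1, \\
R(\alpha) &= \sqrt{2/\pi}\;\alpha, \qquad
Q(\alpha) = \alpha + \frac{2}{\pi}\,\alpha^{2}
\quad \text{for } R(0) = Q(0) = 0, \\
\epsilon_g(\alpha) &= \frac{1}{\pi}\arccos\frac{R}{\sqrt{Q}}
= \frac{1}{\pi}\arccos\frac{1}{\sqrt{1 + \pi/(2\alpha)}}
\simeq \frac{1}{\sqrt{2\pi\alpha}} \quad (\alpha \to \infty).
\end{align}
```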
34
Hebb on-line mean values
  • The order parameters $Q$ and $R$ are self-averaging for infinite $N$
  • Self-averaging property of a quantity $A(J)$:
  • The observation of a value of $A$ different from its mean occurs with vanishing probability

35
Learning curve: $\epsilon_g$ as a function of the order parameters
Exercise: Solve the differential equations for $R$ and $Q$.
Exercise: Find the function $\epsilon_g(\alpha)$.
36
Learning curve: $\epsilon_g$ as a function of the order parameters
The normalized overlap between the two vectors $B$ and $J$ gives the angle between them.
37
Learning curve: $\epsilon_g$ as a function of the order parameters
Exercise: Find the asymptotic behavior of $\epsilon_g(\alpha)$.
38
Asymptotic expansion (draw with matlab)
39
Questions
  • What are other learning algorithms that can be
    used for efficient learning?
  • What training algorithm will provide the best
    learning/ the fastest asymptotic decrease?

40
Modified Hebbian learning
  • The training algorithm is defined by a modulation function $f$
  • $J(\mu) = J(\mu - 1) + f(\cdots)\, \xi^\mu S_T^\mu / N$
  • Restriction: $f$ may depend only on available quantities, $f(J(\mu - 1), \xi^\mu, S_T^\mu)$

41
Perceptron (Rosenblatt 1959)
  • If the classification is correct, don't change the weights.
  • If the classification is incorrect:
  • if the right class for the $\mu$-th example is $+1$, $J(\mu) \cdot \xi^\mu$ increases.
  • if the right class for the $\mu$-th example is $-1$, $J(\mu) \cdot \xi^\mu$ decreases.

42
Perceptron
  • Only informative points are used (mistake driven)
  • The solution is a linear combination of the
    training points
  • Converges only for linearly separable data

Exercise Derive the update equations of ?,Q as a
function of ?, J,B and ?
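The Hebb and perceptron rules can both be written as the update $J(\mu) = J(\mu-1) + f\, \xi^\mu S_T^\mu / N$ with different modulation functions $f$. The sketch below illustrates this under assumptions of its own: Gaussian inputs, $J = 0$ at the start, a zero student field counted as a mistake so that learning can begin, and hypothetical function names and parameter values.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hebb_modulation(x, label):
    # Hebb: learn from every example.
    return 1.0

def perceptron_modulation(x, label):
    # Perceptron: learn only from mistakes (student field x has the wrong sign).
    return 1.0 if x * label <= 0 else 0.0

def train_online(modulation, n, alpha, rng):
    B = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(B, B))
    B = [b / norm for b in B]                  # teacher with B.B = 1
    J = [0.0] * n
    for _ in range(int(alpha * n)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        label = 1 if dot(B, xi) >= 0 else -1   # teacher output S_T
        f = modulation(dot(J, xi), label)
        if f != 0.0:
            J = [j + f * c * label / n for j, c in zip(J, xi)]
    R, Q = dot(B, J), dot(J, J)
    cos_theta = max(-1.0, min(1.0, R / math.sqrt(Q)))
    return math.acos(cos_theta) / math.pi      # generalization error theta / pi

eps_hebb = train_online(hebb_modulation, 100, 5.0, random.Random(3))
eps_perceptron = train_online(perceptron_modulation, 100, 5.0, random.Random(3))
print(eps_hebb, eps_perceptron)
```

Using the same random seed gives both rules the same teacher and the same example sequence, so differences in the final error come from the modulation function alone.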
43
On-line dynamics (Biehl and Riegler 1994)
44
Questions
  • Find the asymptotic behavior (by simulations
    and/or analytically) of the generalization error
    for the perceptron algorithm and Hebb algorithm,
    which one is better?
  • What training algorithm will provide the best
    learning/ the fastest asymptotic decrease?

45
Learning Curve - Hebb and Perceptron
46
Section 2.2 On-line by gradient descent
47
Introduction
  • Commonly used in practical applications
  • Multilayered neural network with continuous activation functions, where the output is a differentiable function of the adaptive parameters
  • Can be used for fitting a function to data

48
Linear perceptron and linear regression (1D)
  • $x = J \xi$
  • Using a quadratic loss function and gradient descent to find the best curve fitting a data set (off-line)

49
Simple case: linear perceptron
  • Teacher: $S_T(\xi) = y = B \cdot \xi$
  • Student: $S_S(\xi) = x = J \cdot \xi$
  • Training and performance evaluation are based on the quadratic error

Consider the training dynamics
Exercise: Derive the update equations of $R$ and $Q$ as a function of $\alpha$.
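On-line gradient descent for the linear perceptron can be sketched as follows. The example assumes the quadratic error $(x - y)^2 / 2$ on each new example, a step of size $\eta / N$, Gaussian inputs, and $B \cdot B = 1$, so that the generalization error is $(Q - 2R + 1)/2$. The learning rate, system size, and function name are hypothetical choices made for illustration.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_perceptron(n, alpha, eta, rng):
    B = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(B, B))
    B = [b / norm for b in B]                      # teacher with B.B = 1
    J = [0.0] * n
    for _ in range(int(alpha * n)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = dot(B, xi)                             # teacher output
        x = dot(J, xi)                             # student output
        # Gradient of (x - y)^2 / 2 with respect to J is (x - y) * xi.
        J = [j - eta * (x - y) * c / n for j, c in zip(J, xi)]
    R, Q = dot(B, J), dot(J, J)
    return 0.5 * (Q - 2.0 * R + 1.0)               # average of (x - y)^2 / 2 over inputs

eps_start = train_linear_perceptron(50, 0.0, 0.5, random.Random(4))
eps_end = train_linear_perceptron(50, 30.0, 0.5, random.Random(4))
print(eps_start, eps_end)
```

Rerunning with different values of `eta` is a quick way to see how the learning rate controls the speed of the decay, and that too large a rate prevents convergence.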
50
Linear perceptron (cont.)
  • Some exercises:
  • Write a matlab code for the linear perceptron, teacher-student scenario.
  • Show that
  • Investigate the role of the learning rate $\eta$
  • Find the asymptotic decrease to zero errors

51
Adatron: binary output, $J(\mu) = J(\mu - 1) + f(\cdots)\, \xi^\mu S_T^\mu / N$
  • Some exercises
  • Write a matlab code for the linear perceptron,
    teacher-student scenario.
  • Find the asymptotic decrease to zero errors
  • Compare with the performance of the Perceptron
    and Hebb rule

52
Multilayered feed-forward NN
  • Example architecture: the soft-committee machine

53
Multilayered ff NN (cont.)
  • Transfer function: sigmoidal $g(x)$
  • e.g., $g(x) = \tanh(x)$ or the error function (erf)
  • Error function is defined
  • The total output

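A forward pass of a soft-committee machine can be sketched as below. The example assumes hidden-to-output weights fixed to 1, $g = \tanh$, and a quadratic error between student and teacher outputs; the weight values and function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_committee_output(hidden_weights, xi, g=math.tanh):
    # Total output: sum over hidden units k of g(J_k . xi),
    # with all hidden-to-output weights assumed equal to 1.
    return sum(g(dot(J_k, xi)) for J_k in hidden_weights)

def quadratic_error(student, teacher, xi):
    diff = soft_committee_output(student, xi) - soft_committee_output(teacher, xi)
    return 0.5 * diff * diff

teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # two hidden units (hypothetical values)
student = [[0.8, 0.1, 0.0], [0.1, 0.7, 0.2]]
xi = [0.5, -1.0, 2.0]

out = soft_committee_output(student, xi)
out_permuted = soft_committee_output(list(reversed(student)), xi)
print(out, out_permuted, quadratic_error(student, teacher, xi))
```

Swapping the order of the hidden units leaves the output unchanged, which is the permutation symmetry of this architecture.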
54
Teacher-Student scenario
  • If teacher and student have the same architecture, but the student has $K$ hidden units and the teacher has $M$ hidden units,
  • Can the student learn the rule?

$K < M$: unlearnable rule
$K = M$: learnable rule
$K > M$: overlearnable rule
In the following we will discuss matching architectures
55
The error measure
  • One (obvious) choice for continuous outputs

56
On-line gradient descent
  • Assuming the same learning rate $\eta$ throughout the network, the update equations for fixed, known $v$ are

57
Assumptions and definitions
  • Isotropic uncorrelated input data
  • The number of input components N is huge

The rule is specified by the norms of the
teacher, say all 1s
58
Order parameters role
  • The set of order parameters and weights is sufficient for describing the learning; this is the macroscopic set of parameters

Microscopic: $KN + K$ degrees of freedom
Macroscopic: $K(K-1)/2 + KM + K$ different order parameters
59
Generalization error, erf function (Saad and Solla 1995)
  • Reflects symmetries of the soft committee machine

60
Permutation Symmetry
  • The generalization error is characterized by
    invariance under permutations of branches
  • How do you think this feature affects learning
    performance?

61
A simple case
  • Hidden-to-output weights are fixed and known: $w_i = v_i = 1$
  • The update rule is

62
Update of the order parameters
63
Differential Equations
64
Learning curves
65
Section 3 Unsupervised learning
  • Based on slides from Michael Biehl's summer course

66
Introduction
  • Learning without a teacher!?
  • Real-world data is, in general, not isotropic and structureless in input space.
  • Unsupervised learning: extraction of information from unlabelled inputs

67
Potential aims
  • Correlation analysis
  • Clustering of data: grouping according to some similarity criterion
  • Identification of prototypes: represent a large amount of data by a few examples
  • Dimension reduction: represent high-dimensional data by a few relevant features

68
A simple example
  • Prototypes for high-dimensional data: directions in the space
  • Assume data points are distributed as

69
A simple example (cont)
  • The student task is to find the directions $B_1$ and $B_2$
  • The data looks different in different planes!

70
Student scenario
  • Search for the two vectors, using two student vectors
  • Define a set of possible learning rules
  • Analyze learning abilities
  • Compare and choose the best learning rule
  • It would provide the two principal components of the data

71
PCA: general setting (matlab)
Given a set of data points X:
1. Compute the covariance matrix.
2. Compute the eigenvalues and eigenvectors of the covariance matrix.
3. Arrange the eigenvalues from the biggest to the smallest. Take the first d eigenvectors as principal components if the input dimensionality is to be reduced to d.
4. Project the input data onto the principal components, which forms the representation of the input data.
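The steps above can be sketched in plain Python for the case d = 1. Note the substitution: instead of a full eigendecomposition (which the slides do in matlab), this sketch uses power iteration to find only the leading eigenvector of the covariance matrix. The synthetic data, stretched along the direction (2, 1), and all function names are assumptions made for illustration.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def covariance_matrix(X):
    # Step 1: centre the data and compute the sample covariance matrix.
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    C = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
         for i in range(d)]
    return C, centered

def leading_eigenpair(C, iterations=100):
    # Steps 2-3 for d = 1: power iteration converges to the eigenvector
    # with the largest eigenvalue (assuming the start vector is not orthogonal to it).
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [dot(C[i], v) for i in range(d)]
        norm = math.sqrt(dot(w, w))
        v = [c / norm for c in w]
    eigenvalue = dot(v, [dot(C[i], v) for i in range(d)])
    return eigenvalue, v

rng = random.Random(5)
X = []
for _ in range(500):
    t = rng.gauss(0.0, 3.0)   # large spread along the direction (2, 1)
    X.append([2.0 * t + rng.gauss(0.0, 0.3), t + rng.gauss(0.0, 0.3)])

C, centered = covariance_matrix(X)
eigenvalue, component = leading_eigenpair(C)
scores = [dot(row, component) for row in centered]   # Step 4: projection
print(eigenvalue, component)
```

For d > 1 the same idea can be repeated after removing the found component from the covariance matrix (deflation), or replaced by a library eigensolver.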
72
Principal Component Analysis
  • Algebraic viewpoint: given data, find a linear transformation such that the sum of squared distances is minimized over all linear transformations
  • Statistical viewpoint: given data, assume that each point is a random variable sampled from a Gaussian with unit covariance and mean. Find the ML estimator of the means under the constraint that there are K different means that are linearly related to the data.

73
Example vision
74
Example vision (cont)
Average results for each of the 6400 pixels
75
First nine eigenfaces
76
Dimensionality Reduction
  • The goal is to compress information with minimal loss
  • Methods:
  • Unsupervised learning
  • Principal Component Analysis
  • Nonnegative Matrix Factorization
  • Bayesian Models (matrices are probabilities)

77
Section 4 Bayesian Networks
  • Some slides are from Baldi's course on Neural Networks

78
Bayesian Statistics
  • Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox-Jaynes axioms).
  • Axiom 0: Transitivity of preferences.
  • Theorem 1: Preferences can be represented by a real number $p(A)$.
  • Axiom 1: There exists a function $f$ such that
  • $p(\text{non } A) = f(p(A))$
  • Axiom 2: There exists a function $F$ such that
  • $p(A, B) = F(p(A), p(B|A))$
  • Theorem 2: There is always a rescaling $w$ such that $P(A) = w(p(A))$ is in $[0, 1]$ and satisfies the sum and product rules.

79
Probability as Degree of Belief
  • Sum Rule
  • $P(\text{non } A) = 1 - P(A)$
  • Product Rule
  • $P(A \text{ and } B) = P(A)\, P(B|A)$
  • Bayes' Theorem
  • $P(B|A) = P(A|B)\, P(B) / P(A)$
  • Induction Form
  • $P(M|D) = P(D|M)\, P(M) / P(D)$
  • Equivalently
  • $\log P(M|D) = \log P(D|M) + \log P(M) - \log P(D)$

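The induction form $P(M|D) = P(D|M) P(M) / P(D)$ can be illustrated with a small numerical example. The two coin models, the equal priors, and the observed data (8 heads in 10 tosses) are hypothetical values chosen only to make the rule concrete.

```python
import math

def binomial_likelihood(heads, tosses, p_heads):
    # P(D | M): probability of the observed number of heads under a coin model.
    return math.comb(tosses, heads) * p_heads ** heads * (1 - p_heads) ** (tosses - heads)

def posterior(priors, likelihoods):
    # P(M | D) = P(D | M) P(M) / P(D), with P(D) = sum over models of P(D | M) P(M).
    evidence = sum(priors[m] * likelihoods[m] for m in priors)
    return {m: priors[m] * likelihoods[m] / evidence for m in priors}, evidence

priors = {"fair": 0.5, "biased": 0.5}
likelihoods = {
    "fair": binomial_likelihood(8, 10, 0.5),
    "biased": binomial_likelihood(8, 10, 0.8),
}
post, evidence = posterior(priors, likelihoods)

# The same statement in logarithmic form for one model.
log_post_biased = (math.log(likelihoods["biased"]) + math.log(priors["biased"])
                   - math.log(evidence))
print(post, math.exp(log_post_biased))
```

The logarithmic form is the one usually used in practice, since products of many small probabilities underflow while sums of logarithms do not.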
80
The Asia problem
  • Shortness-of-breath (dyspnoea) may be due to
    Tuberculosis, Lung cancer or bronchitis, or none
    of them. A recent visit to Asia increases the
    chances of tuberculosis, while Smoking is known
    to be a risk factor for both lung cancer and
    Bronchitis. The results of a single chest X-ray
    do not discriminate between lung cancer and
    tuberculosis, as neither does the presence or
    absence of Dyspnoea.

Lauritzen and Spiegelhalter 1988
81
Graphical models
  • Successful marriage between Probability Theory and Graph Theory
  • M. I. Jordan

$P(x_1, x_2, x_3) \neq P(x_1, x_3)\, P(x_2, x_3)$
$P(x_1, x_2, x_3) \propto \Psi(x_1, x_3)\, \Psi(x_2, x_3)$
Applications: vision, speech recognition, error-correcting codes, bioinformatics
82
Directed acyclic Graphs
????? ??? ???? ?????????? ???? ???? ?????? ?????
??? ????? ????? ????
  • Involves conditional dependencies

P(x1,x2,x3) = P(x1) P(x2) P(x3|x1,x2)
83
Directed Graphical Models (2)
  • Each node is associated with a random variable
  • Each arrow is associated with a conditional
    dependency (parents → child)
  • Shaded nodes denote observed variables
  • Plates stand for repetitions of i.i.d. draws
    of the random variables

84
Classification problem
  • The problem is unsupervised when one searches
    for the labels that best fit the data and
    has no labeled examples
  • Perceptrons and support vector machines are widely
    used for classification. These are
    discriminative methods

85
Classification: assigning labels to data
86
Density estimator
  • The simplest model for density estimation is the
    Naïve Bayes classifier
  • Assumes that the features of each data point are
    distributed independently given the class
  • Results in a trivial learning algorithm
  • Usually does not suffer from overfitting
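
The "trivial learning algorithm" amounts to counting. The following sketch assumes categorical features and add-one smoothing; the toy data set and the names `train_naive_bayes` and `predict` are hypothetical and only illustrate the idea.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Learning is counting: class frequencies and per-class feature values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[(y, i)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen

def predict(model, x, smoothing=1.0):
    """Return the class maximizing log P(class) + sum_i log P(x_i | class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, v in enumerate(x):
            # Smoothed estimate of P(feature i = v | class c)
            numerator = value_counts[(c, i)][v] + smoothing
            denominator = n_c + smoothing * len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: (smoker, cough) -> diagnosis
samples = [("yes", "yes"), ("yes", "yes"), ("yes", "no"),
           ("no", "no"), ("no", "no"), ("no", "yes")]
labels = ["ill", "ill", "ill", "healthy", "healthy", "healthy"]
model = train_naive_bayes(samples, labels)
print(predict(model, ("yes", "yes")), predict(model, ("no", "no")))
```

Because each feature contributes its own independent factor, the number of parameters grows only linearly with the number of features, which is one reason the model rarely overfits.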
87
Directed graph: real-world example
Statistical modeling for data mining: in a huge
corpus, authors and words are observed; topics and
relations are learned.
The author-topic model
88
Goal
  • Automatically extract topical content of
    documents and learn association of topics to
    authors of documents
  • Expand existing probabilistic topic models to
    include author information
  • Some queries that the model should be able to answer:
  • What topics does author X work on?
  • Which authors work on topic X?
  • What are interesting temporal patterns in topics?

89
Previous topic-based models
  • Hofmann (1999): Probabilistic Latent Semantic
    Indexing (pLSI)
  • EM implementation
  • Problem of overfitting
  • Latent Dirichlet Allocation (LDA): Blei, Ng &
    Jordan (2003)
  • Clarified the pLSI model
  • Variational EM; scalability?
  • Griffiths & Steyvers (PNAS 2004): Gibbs sampling
    technique for inference
  • Computationally simple, efficient (linear in the
    size of the data), can easily be applied to 100K
    documents

90
91
Classification
92
Topics Model for Semantic Representation
  • Based on Professor Mark Steyvers's slides;
    joint work of Mark Steyvers (UCI) and Tom
    Griffiths (Stanford)

93
The DRM Paradigm
  • The Deese (1959) and Roediger & McDermott (1995)
    paradigm
  • Subjects hear a series of word lists during the
    study phase, each comprising semantically related
    items strongly related to another, non-presented
    word (the false target).
  • Subjects (later) receive recognition tests for
    all words plus other distractor words, including
    the false target.
  • DRM experiments routinely demonstrate that
    subjects claim to recognize false targets.

94
Example test of false memory effects in the DRM
Paradigm
  • STUDY: Bed, Rest, Awake, Tired, Dream, Wake,
    Snooze, Blanket, Doze, Slumber, Snore, Nap,
    Peace, Yawn, Drowsy
  • FALSE RECALL: Sleep (61%)

95
A Rational Analysis of Semantic Memory
  • Our associative/semantic memory system might
    arise from the need to efficiently predict word
    usage with just a few basis functions (i.e.,
    concepts or topics)
  • The topics model provides such a rational analysis

96
A Spatial Representation: Latent Semantic
Analysis (Landauer & Dumais, 1997)
EACH WORD IS A SINGLE POINT IN A SEMANTIC SPACE
97
Triangle Inequality constraint on words with
multiple meanings
  • Euclidean distance: AC ≤ AB + BC
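
A small numeric check makes the constraint concrete. The 2-D coordinates below are hypothetical stand-ins for points in a semantic space: if FIELD must sit close to both MAGNETIC and FOOTBALL, those two words cannot be farther apart than the sum of the two distances.

```python
import math

# Hypothetical coordinates in a 2-D "semantic space" (for illustration only).
points = {
    "MAGNETIC": (0.0, 0.0),
    "FIELD": (1.0, 0.0),     # polysemous word, close to both neighbours
    "FOOTBALL": (1.0, 1.0),
}

ab = math.dist(points["MAGNETIC"], points["FIELD"])
bc = math.dist(points["FIELD"], points["FOOTBALL"])
ac = math.dist(points["MAGNETIC"], points["FOOTBALL"])

# AC can never exceed AB + BC, wherever the three points are placed.
print(ac, "<=", ab + bc)
```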

98
A generative model for topics
  • Each document (i.e. context) is a mixture of
    topics.
  • Each topic is a distribution over words.
  • Each word is chosen from a single topic.

99
A toy example
TOPIC MIXTURE → w_i
Topic 1, P(w | z = 1): HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
Topic 2, P(w | z = 2): SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1
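
The generative process can be simulated directly from the two toy topics. In this sketch the word probabilities are the ones listed on the slide, while the function name `generate_document`, the document length and the random seed are assumptions made for illustration.

```python
import random

# P(w | z) for the two toy topics, as listed on the slide.
TOPICS = [
    {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1, "RESEARCH": 0.1,
     "MATHEMATICS": 0.1, "MYSTERY": 0.1},
]

def generate_document(topic_mixture, length, rng):
    """For each word: pick a topic z from the mixture, then a word from P(w | z)."""
    words = []
    for _ in range(length):
        z = rng.choices(range(len(TOPICS)), weights=topic_mixture)[0]
        vocabulary, probabilities = zip(*TOPICS[z].items())
        words.append(rng.choices(vocabulary, weights=probabilities)[0])
    return words

rng = random.Random(0)
doc_topic1 = generate_document([1.0, 0.0], 8, rng)  # all probability to topic 1
doc_mixed = generate_document([0.5, 0.5], 8, rng)   # an even mixture of both topics
print(doc_topic1)
print(doc_mixed)
```

A mixture of [1.0, 0.0] can only produce words from topic 1, while an even mixture draws words from both topics; MYSTERY can come from either.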
100
All probability to topic 1
Document: HEART, LOVE, JOY, SOUL, HEART, ...
One TOPIC → w_i
Topic 1, P(w | z = 1): HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
Topic 2, P(w | z = 2): SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1
101
All probability to topic 2
Document: SCIENTIFIC, KNOWLEDGE, SCIENTIFIC,
RESEARCH, ...
Topic 1, P(w | z = 1): HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
Topic 2, P(w | z = 2): SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1
102
Application to corpus data
  • TASA corpus: text from first grade to college
  • representative sample of text
  • 26,000 word types (stop words removed)
  • 37,000 documents
  • 6,000,000 word tokens

103
Fitting the model
  • Learning is unsupervised
  • Learning means inverting the generative model
  • We estimate P(z | w): assign each word in the
    corpus to one of T topics
  • With T = 500 topics and 6 × 10^6 words, the size of
    the discrete state space is 500^6,000,000.
    HELP!
  • Efficient sampling approach → Markov Chain Monte
    Carlo (MCMC)
  • Time and memory requirements are linear in T and N
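
To get a feel for the size of that state space, one can count its decimal digits without ever building the number. This is a quick back-of-the-envelope sketch; the variable names are arbitrary.

```python
import math

T = 500        # number of topics
N = 6_000_000  # number of word tokens in the corpus

# log10(T ** N) = N * log10(T): the number of decimal digits of the state-space size
digits = N * math.log10(T)
print(f"500^6,000,000 is roughly 10^{digits:,.0f}")
```

The count has more than sixteen million digits, so exhaustive enumeration is out of the question, which motivates sampling.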

104
Gibbs Sampling (MCMC); see Griffiths & Steyvers,
2003 for details
  • Assign every word in corpus to one of T topics
  • Sampling distribution for z depends on two counts:
  • the number of times word w is assigned to topic j
  • the number of times topic j is used in document d
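
The sampling formula itself did not survive in this transcript. A commonly used collapsed Gibbs update for this kind of topic model, assumed here to correspond to the one on the slide, is P(z_i = j | all other assignments) ∝ (n_wj + β) / (n_j + W β) × (n_dj + α), where n_wj and n_dj are the two counts above (excluding the current word), n_j is the total number of words in topic j, and W is the vocabulary size. The sketch below implements that update; the hyperparameters `alpha` and `beta`, the function name `gibbs_topics` and the toy corpus are all assumptions.

```python
import random

def gibbs_topics(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Collapsed Gibbs sampling; docs are lists of word ids in range(vocab_size)."""
    rng = random.Random(seed)
    n_wt = [[0] * num_topics for _ in range(vocab_size)]  # word w assigned to topic j
    n_dt = [[0] * num_topics for _ in docs]               # topic j used in document d
    n_t = [0] * num_topics                                # all words assigned to topic j

    def update(w, d, j, delta):
        n_wt[w][j] += delta
        n_dt[d][j] += delta
        n_t[j] += delta

    # Start from a random assignment of every word token to a topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, j in zip(doc, z[d]):
            update(w, d, j, +1)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(w, d, z[d][i], -1)  # remove the current token from the counts
                # The document-length factor is the same for every topic, so it is omitted.
                weights = [
                    (n_wt[w][k] + beta) / (n_t[k] + vocab_size * beta) * (n_dt[d][k] + alpha)
                    for k in range(num_topics)
                ]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(w, d, z[d][i], +1)
    return z, n_wt, n_dt

# Hypothetical toy corpus: word ids 0-2 and 3-5 tend to occur in different documents.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [5, 3, 4, 4]]
z, n_wt, n_dt = gibbs_topics(docs, num_topics=2, vocab_size=6)
print(n_dt)
```

Each sweep touches every token once and evaluates T weights per token, which matches the stated cost being linear in T and N.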
105
A selection from 500 topics, P(w | z = j)
  • THEORY
  • SCIENTISTS
  • EXPERIMENT
  • OBSERVATIONS
  • SCIENTIFIC
  • EXPERIMENTS
  • HYPOTHESIS
  • EXPLAIN
  • SCIENTIST
  • OBSERVED
  • EXPLANATION
  • BASED
  • OBSERVATION
  • IDEA
  • EVIDENCE
  • THEORIES
  • BELIEVED
  • DISCOVERED

SPACE EARTH MOON PLANET ROCKET MARS ORBIT ASTRONAUTS FIRST SPACECRAFT JUPITER SATELLITE SATELLITES ATMOSPHERE SPACESHIP SURFACE SCIENTISTS ASTRONAUT
BRAIN NERVE SENSE SENSES ARE NERVOUS NERVES BODY SMELL TASTE TOUCH MESSAGES IMPULSES CORD ORGANS SPINAL FIBERS SENSORY
ART PAINT ARTIST PAINTING PAINTED ARTISTS MUSEUM WORK PAINTINGS STYLE PICTURES WORKS OWN SCULPTURE PAINTER ARTS BEAUTIFUL DESIGNS
106
Polysemy: words with multiple meanings
represented in different topics
FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM
SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST
BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS
JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY
107
Predicting word association
  • LSA finds the closest word
  • The topics model does inference: given that one
    word was observed, which next word has the
    highest probability?
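
The inference step can be written as P(w2 | w1) = Σ_z P(w2 | z) P(z | w1), with P(z | w1) ∝ P(w1 | z) P(z). The sketch below applies this to the two toy topics from the earlier slide; the uniform prior over topics, the exclusion of the cue from its own associates and the function names are assumptions made for illustration.

```python
# P(w | z) for the two toy topics used earlier in the slides.
TOPICS = [
    {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1, "RESEARCH": 0.1,
     "MATHEMATICS": 0.1, "MYSTERY": 0.1},
]

def topic_posterior(cue):
    """P(z | cue) by Bayes' rule, assuming a uniform prior over topics."""
    joint = [topic.get(cue, 0.0) / len(TOPICS) for topic in TOPICS]
    total = sum(joint)
    if total == 0.0:
        raise ValueError(f"cue {cue!r} is not in the vocabulary")
    return [p / total for p in joint]

def associates(cue, k=3):
    """Rank other words w by P(w | cue) = sum_z P(w | z) P(z | cue)."""
    post = topic_posterior(cue)
    vocabulary = set().union(*TOPICS)
    scores = {
        w: sum(pz * topic.get(w, 0.0) for pz, topic in zip(post, TOPICS))
        for w in vocabulary if w != cue
    }
    # Sort by decreasing probability, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

print(associates("LOVE", k=2))
print(associates("MYSTERY", k=2))
```

A cue that belongs to one topic only pulls its associates from that topic, whereas an ambiguous cue such as MYSTERY spreads its predictions over both.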

108
Word Association (norms from Nelson et al. 1998)
CUE: PLANET
  • Associate N. People
  • 1 EARTH
  • 2 STARS
  • 3 SPACE
  • 4 SUN
  • 5 MARS

Model: STARS, SUN, EARTH, SPACE, SKY
The first associate, EARTH, is in the set of 5
associates (from the model)
109
P( set contains first associate )
110
Explaining variability in false recall
  • One factor: mean associative strength of list
    items to the critical item (Deese 1959; Roediger et
    al. 2001).

List items: BED, REST, AWAKE, TIRED, DROWSY
Critical item: SLEEP
Mean = .431
For 55 DRM lists, R = .69 (with the given
lexicon)
111
One recall component: inference
  • Encoding: study words lead to a topics distribution
    (gist)
  • Retrieval: infer words from the stored topics
    distribution

112
Predictions for the Sleep list
113
Correlation between intrusion rates and
predictions
114
Other recall components? One possibility: two
routes add strength