Title: Minicourse on Artificial Neural Networks and Bayesian Networks
Mini-course on ANN and BN, The Multidisciplinary
Brain Research center, Bar-Ilan University, May
2004
Section 1: Introduction
Networks (1)
- Networks serve as a visual way of displaying relationships
- Social networks are examples of flat networks, where the only information is the relation between entities
Example: collaboration network
1. Itay Gat, Naftali Tishby, and Moshe Abeles. Analyzing Cortical Activity using Hidden Markov Models. Network: Computation in Neural Systems, August 1997.
2. Moshe Abeles, Hagai Bergman, Itay Gat, Isaac Meilijson, Eyal Seidemann, Naftali Tishby, and Eilon Vaadia. Cortical Activity Flips Among Quasi Stationary States. Prepared Feb 1, 1995; appeared in the Proceedings of the National Academy of Science (PNAS).
3. David Haussler, Michael Kearns, H. Sebastian Seung, and Naftali Tishby. Rigorous Learning Curve Bounds from Statistical Mechanics. Prepared July 1994; full version in Machine Learning (1997).
4. H. S. Seung, Haim Sompolinsky, and Naftali Tishby. Learning Curves in Large Neural Networks. COLT 1991, pp. 112-127.
5. Yann LeCun, Ido Kanter, and Sara A. Solla. Second Order Properties of Error Surfaces. NIPS 1990, pp. 918-924.
6. Esther Levin, Naftali Tishby, and Sara A. Solla. A Statistical Approach to Learning and Generalization in Layered Neural Networks. COLT 1989, pp. 245-260.
7. Litvak V, Sompolinsky H, Segev I, and Abeles M (2003). On the Transmission of Rate Code in Long Feedforward Networks with Excitatory-Inhibitory Balance. Journal of Neuroscience, 23(7):3006-3015.
8. Senn W, Segev I, and Tsodyks M (1998). Reading neural synchrony with depressing synapses. Neural Computation, 10:815-819.
9. Tsodyks M, Mit'kov I, and Sompolinsky H (1993). Pattern of synchrony in inhomogeneous networks of oscillators with pulse interactions. Phys. Rev. Lett.
10. Yuval Aviel, David Horn, and Moshe Abeles. Memory Capacity of Balanced Networks.
11. Ofer Hendin, David Horn, and Misha Tsodyks. The Role of Inhibition in an Associative Memory Model of the Olfactory Bulb.
12. Gal Chechik, Amir Globerson, Naftali Tishby, and Yair Weiss. Information Bottleneck for Gaussian Variables. Prepared June 2003; submitted to NIPS-2003.
(See matlab code)
Networks (2)
- Artificial Neural Networks represent rules (deterministic relations) between input and output
Networks (3)
- Bayesian Networks represent probabilistic relations (conditional independencies and dependencies) between variables
Outline
- Introduction/Motivation
- Artificial Neural Networks
- The Perceptron, multilayered FF NN and recurrent NN
- On-line (supervised) learning
- Unsupervised learning and PCA
- Classification
- Capacity of networks
- Bayesian networks (BN)
- Bayes rules and the BN semantics
- Classification using generative models
- Applications: Vision, Text
Motivation
- Research on ANNs is inspired by neurons in the brain and (partially) driven by the need for models of reasoning in the brain.
- Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (for example, driving a car, inferring suitable scientific referees for papers, and many others)
Questions
- How can a network learn?
- What will be the learning rate?
- What are the limitations on the network capacity?
- How can networks be used to classify results with no labels (unsupervised learning)?
- What are the relations and differences between learning in ANN and learning in BN?
- How can network models explain high-level reasoning?
History of (modern) ANNs and BNs
Section 2: On-line Learning
- Based on slides from Michael Biehl's summer course
Section 2.1: The Perceptron
The Perceptron
- Input $\xi$
- Adaptive weights J
- Output S
Perceptron: binary output
- Implements a linearly separable classification of inputs
- Milestones:
- Perceptron convergence theorem, Rosenblatt (1958)
- Capacity, Winder (1963), Cover (1965)
- Statistical physics of perceptron weights, Gardner (1988)
- How does this device learn?
Learning a linearly separable rule from reliable examples
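- Unknown rule: $S_T(\xi) = \mathrm{sign}(B \cdot \xi) = \pm 1$
- Defines the correct classification.
- Parameterized through a teacher perceptron with weights $B \in \mathbb{R}^N$, ($B \cdot B = 1$)
- Only available information: example data
- $D = \{\xi^\mu, S_T(\xi^\mu) = \mathrm{sign}(B \cdot \xi^\mu)\}$ for $\mu = 1, \ldots, P$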
Learning a linearly separable rule (Cont.)
- Training: finding the student weights J
- J parameterizes a hypothesis $S_S(\xi) = \mathrm{sign}(J \cdot \xi)$
- Supervised learning is based on the student performance with respect to the training data D
- Binary error measure:
- $\epsilon_T^\mu(J) = 1$ if $S_S(\xi^\mu) \neq S_T(\xi^\mu)$, and $\epsilon_T^\mu(J) = 0$ if $S_S(\xi^\mu) = S_T(\xi^\mu)$
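The teacher-student setup and the binary error measure can be sketched in a few lines. This is a minimal illustration in Python (standard library only), not the course's matlab code. The helper names (`make_teacher`, `make_examples`, `training_error`), the Gaussian inputs, and the convention sign(0) = +1 are assumptions made for the sketch.

```python
import random

def sign(z):
    # Convention assumed for this sketch: sign(0) is treated as +1.
    return 1 if z >= 0 else -1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_teacher(n, rng):
    # Random teacher vector B, normalized so that B.B = 1.
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = dot(b, b) ** 0.5
    return [x / norm for x in b]

def make_examples(teacher, p, rng):
    # P examples (xi, S_T) with iid Gaussian inputs, labeled by the teacher.
    data = []
    for _ in range(p):
        xi = [rng.gauss(0.0, 1.0) for _ in teacher]
        data.append((xi, sign(dot(teacher, xi))))
    return data

def training_error(student, data):
    # Fraction of examples in D on which the student disagrees with the teacher.
    wrong = sum(1 for xi, s_t in data if sign(dot(student, xi)) != s_t)
    return wrong / len(data)

rng = random.Random(0)
B = make_teacher(20, rng)
D = make_examples(B, 200, rng)
print(training_error(B, D))                 # a student equal to the teacher makes no errors
print(training_error([-x for x in B], D))   # the anti-aligned student
```

A student equal to B reproduces every label, while the anti-aligned student -B gets every label wrong; any other student falls in between.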
Off-line learning
- Guided by the minimization of a cost function H(J), e.g., the training error
- $H(J) = \sum_\mu \epsilon_T^\mu(J)$
- Equilibrium statistical mechanics treatment:
- Energy H of N degrees of freedom
- Ensemble of systems is in thermal equilibrium at a formal temperature
- Disorder average over random examples (replicas) assumes a distribution over the inputs
- Macroscopic description, order parameters
- Typical properties of large systems, $P = \alpha N$
On-line training
- Single presentation of an uncorrelated (new) example $\{\xi^\mu, S_T(\xi^\mu)\}$
- Update of student weights
- Learning dynamics in discrete time
On-line training - Statistical Physics approach
- Consider a sequence of independent, random examples
- Thermodynamic limit
- Disorder average over the latest example; self-averaging properties
- Continuous time limit
Generalization
- Performance of the student (after training) with respect to an arbitrary, new input
- In practice: empirical mean of the error measure over a set of test inputs
- In the theoretical analysis: average over the (assumed) probability density of inputs
- Generalization error $\epsilon_g$
Generalization (cont.)
- The simplest model distribution:
- Isotropic density $P(\xi)$, with $\xi$ uncorrelated with B and J
- Consider vectors of independent identically distributed (iid) components $\xi_j$ with $\langle \xi_j \rangle = 0$ and $\langle \xi_j^2 \rangle = 1$
Geometric argument
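- Projection of data into the (B, J)-plane yields an isotropic density of inputs
- $\epsilon_g = \theta/\pi$, with $\theta$ the angle between B and J
- Region of disagreement: $S_T(\xi) \neq S_S(\xi)$
- For $|B| = 1$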
Overlap Parameters
- Sufficient to quantify the success of learning:
- $R = B \cdot J$, $Q = J \cdot J$
- Random guessing: $R = 0$, $\epsilon_g = 1/2$
- Perfect generalization: $R = \sqrt{Q}$, $\epsilon_g = 0$
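For a unit-norm teacher, the overlaps R and Q fix the angle between B and J through cos(angle) = R / sqrt(Q), and for isotropic inputs the probability of disagreement is that angle divided by pi. The sketch below (Python, standard library only) compares this expression with a Monte Carlo estimate. The function names, the Gaussian inputs, and the particular noisy student are assumptions chosen for illustration.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def eps_g_from_overlaps(r, q):
    # Angle between B and J (with |B| = 1), divided by pi.
    c = max(-1.0, min(1.0, r / math.sqrt(q)))
    return math.acos(c) / math.pi

def eps_g_monte_carlo(teacher, student, trials, rng):
    # Fraction of random isotropic inputs on which the two signs differ.
    n = len(teacher)
    wrong = 0
    for _ in range(trials):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if (dot(teacher, xi) >= 0) != (dot(student, xi) >= 0):
            wrong += 1
    return wrong / trials

rng = random.Random(1)
n = 50
B = [1.0 / math.sqrt(n)] * n                     # unit-norm teacher
J = [b + 0.1 * rng.gauss(0.0, 1.0) for b in B]   # hypothetical noisy student
R, Q = dot(B, J), dot(J, J)
analytic = eps_g_from_overlaps(R, Q)
sampled = eps_g_monte_carlo(B, J, 5000, rng)
print(analytic, sampled)
```

The two printed numbers should agree up to sampling noise, which shrinks as the number of trials grows.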
Derivation for large N
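- Given B, J, and an uncorrelated random input with $\langle \xi_i \rangle = 0$, $\langle \xi_i \xi_j \rangle = \delta_{ij}$, consider the student/teacher fields, which are sums of (many) independent random quantities:
- $x = J \cdot \xi = \sum_i J_i \xi_i$
- $y = B \cdot \xi = \sum_i B_i \xi_i$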
Central Limit Theorem
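- The joint density of (x, y) is, for $N \to \infty$, a two-dimensional Gaussian, fully specified by the first and second moments:
- $\langle x \rangle = \sum_i J_i \langle \xi_i \rangle = 0$, $\langle y \rangle = \sum_i B_i \langle \xi_i \rangle = 0$
- $\langle x^2 \rangle = \sum_{ij} J_i J_j \langle \xi_i \xi_j \rangle = \sum_i J_i^2 = Q$
- $\langle y^2 \rangle = \sum_{ij} B_i B_j \langle \xi_i \xi_j \rangle = \sum_i B_i^2 = 1$
- $\langle xy \rangle = \sum_{ij} J_i B_j \langle \xi_i \xi_j \rangle = \sum_i J_i B_i = R$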
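With zero means and second moments Q, 1 and R, the two-dimensional Gaussian can be written out explicitly. The form below is a sketch that follows from inverting the covariance matrix of (x, y); it assumes $Q - R^2 > 0$, i.e. that J is not exactly parallel to B.

```latex
C = \begin{pmatrix} Q & R \\ R & 1 \end{pmatrix},
\qquad
\det C = Q - R^2,
\qquad
P(x, y) = \frac{1}{2\pi\sqrt{Q - R^2}}
\exp\!\left[ -\frac{x^2 - 2Rxy + Q y^2}{2\,(Q - R^2)} \right]
```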
Central Limit Theorem (Cont.)
- Details of the input are irrelevant.
- Some possible examples: binary $\xi_i = \pm 1$ with equal probability, uniform, Gaussian.
Generalization Error
- The isotropic distribution is also assumed to describe the statistics of the example data inputs
Exercise: Derive the generalization error as a function of R, Q (use the mathematical notes)
Assumptions about the data
- No spatial correlations
- No distinguished directions in the input space
- No temporal correlations
- No correlations with the rule
- Single presentation without repetitions
- Consequences:
- Average over data can be performed step by step
- Actual choice of B is irrelevant; it is not necessary to average over the teacher
Hebbian learning (revisited), Hebb 1949
- Off-line interpretation, Vallet 1989
- Choice of student weights given $D = \{\xi^\mu, S_T^\mu\}_{\mu=1}^{P}$
- $J(P) = \sum_\mu \xi^\mu S_T^\mu / N$
- Equivalent on-line interpretation
- Dynamics upon single presentation of examples
- $J(\mu) = J(\mu-1) + \xi^\mu S_T^\mu / N$
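The on-line Hebb rule is easy to simulate directly. The following is a minimal sketch in Python (standard library only), not the course's matlab code. The function name `hebb_online`, the Gaussian inputs, and the parameter values are assumptions. It tracks the overlaps R = B.J and Q = J.J after alpha_max * N examples and reports the resulting angle between B and J divided by pi.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hebb_online(n, alpha_max, rng):
    # Teacher B with B.B = 1; student J starts from zero (tabula rasa).
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(b, b))
    b = [x / norm for x in b]
    j = [0.0] * n
    for _ in range(int(alpha_max * n)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s_t = 1 if dot(b, xi) >= 0 else -1
        # Hebb step: J(mu) = J(mu-1) + xi^mu * S_T^mu / N
        j = [w + x * s_t / n for w, x in zip(j, xi)]
    return dot(b, j), dot(j, j)

R, Q = hebb_online(n=200, alpha_max=5.0, rng=random.Random(2))
eps_g = math.acos(R / math.sqrt(Q)) / math.pi
print(R, Q, eps_g)
```

Repeating the run with larger N and different seeds gives a feel for how little R and Q fluctuate from run to run.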
Hebb on-line
- From microscopic to macroscopic: recursions for overlaps
Exercise: Derive the update equations of R, Q
Hebb on-line (Cont.)
- Average over the latest example $\xi^\mu$
- The random input $\xi^\mu$ enters only through the fields
- The random input $\xi^\mu$ and $J(\mu-1)$, B are statistically independent
- The Central Limit Theorem applies and yields the joint density
Hebb on-line (Cont.)
Exercise: Derive the update equations of R, Q as a function of $\alpha$ (use the mathematical notes; off-line)
Hebb on-line (Cont.)
- Continuous time limit: $N \to \infty$, $\alpha = \mu/N$, $d\alpha = 1/N$
- Initial conditions (tabula rasa): $R(0) = Q(0) = 0$
- What are the mean values after training with $\alpha N$ examples? (See matlab code)
Hebb on-line: mean values
- The order parameters, Q and R, are self-averaging for infinite N
- Self-averaging property of a quantity A(J):
- The observation of a value of A different from its mean occurs with vanishing probability
Learning Curve: dependence on the order parameters
Exercise: Solve the differential equations for R and Q
Exercise: Find the function $\epsilon_g(\alpha)$
Learning Curve: dependence on the order parameters
The normalized overlap between the two vectors B and J gives the angle between them
Learning Curve: dependence on the order parameters
Exercise: Find the asymptotic behavior of $\epsilon_g(\alpha)$
Asymptotic expansion (draw with matlab)
Questions
- What other learning algorithms can be used for efficient learning?
- What training algorithm will provide the best learning / the fastest asymptotic decrease?
Modified Hebbian learning
- The training algorithm is defined by a modulation function f
- $J(\mu) = J(\mu-1) + f(\cdot)\, \xi^\mu S_T^\mu / N$
- Restriction: f may depend only on available quantities: $f(J(\mu-1), \xi^\mu, S_T^\mu)$
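The modulation function turns one update rule into a family of algorithms. The sketch below (Python, standard library only) implements the generic step and two choices of f: a constant f for the Hebb rule and a mistake-driven f for the perceptron rule. The function names and the toy teacher are assumptions made for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f_hebb(j, xi, s_t):
    # Hebb: every example is used with the same weight.
    return 1.0

def f_perceptron(j, xi, s_t):
    # Perceptron: update only when the student misclassifies the example.
    return 1.0 if dot(j, xi) * s_t <= 0 else 0.0

def online_step(j, xi, s_t, f):
    # J(mu) = J(mu-1) + f(J(mu-1), xi^mu, S_T^mu) * xi^mu * S_T^mu / N
    n = len(j)
    scale = f(j, xi, s_t) * s_t / n
    return [w + scale * x for w, x in zip(j, xi)]

def train(f, teacher, steps, rng):
    j = [0.0] * len(teacher)
    for _ in range(steps):
        xi = [rng.gauss(0.0, 1.0) for _ in teacher]
        s_t = 1 if dot(teacher, xi) >= 0 else -1
        j = online_step(j, xi, s_t, f)
    return j

teacher = [1.0] + [0.0] * 19     # unit-norm teacher along the first axis
J_hebb = train(f_hebb, teacher, 400, random.Random(3))
J_perc = train(f_perceptron, teacher, 400, random.Random(3))
print(dot(teacher, J_hebb), dot(teacher, J_perc))
```

Note that f only sees the previous weights, the current input and the teacher label, in line with the restriction stated above; other rules can be tried by passing a different f to `train`.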
Perceptron, Rosenblatt 1959
- If the classification is correct, don't change the weights.
- If the classification is incorrect:
- if the right class for the $\mu$-th example is +1, $J(\mu) \cdot \xi^\mu$ increases.
- if the right class for the $\mu$-th example is -1, $J(\mu) \cdot \xi^\mu$ decreases.
Perceptron
- Only informative points are used (mistake driven)
- The solution is a linear combination of the training points
- Converges only for linearly separable data
Exercise: Derive the update equations of the order parameters as a function of $\alpha$, J, B and $\xi$
On-line dynamics, Biehl and Riegler 1994
Questions
- Find the asymptotic behavior (by simulations and/or analytically) of the generalization error for the perceptron algorithm and the Hebb algorithm. Which one is better?
- What training algorithm will provide the best learning / the fastest asymptotic decrease?
Learning Curve - Hebb and Perceptron
Section 2.2: On-line by gradient descent
Introduction
- Commonly used in practical applications
- Multilayered neural network with continuous activation functions, where the output is a differentiable function of the adaptive parameters
- Can be used for fitting a function to data
Linear perceptron and linear regression (1D)
- $x = J\xi$
- Using a quadratic loss function and gradient descent to find the best curve to fit a data set (off-line)
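For the one-dimensional case, off-line gradient descent on a quadratic loss takes only a few lines. This is a minimal sketch in Python. The function name `fit_slope`, the step size, the number of steps, and the noiseless toy data are assumptions chosen for illustration.

```python
def fit_slope(xs, ys, eta=0.1, steps=200):
    # Off-line (batch) gradient descent on the quadratic loss
    # E(J) = (1 / 2P) * sum_mu (J * xi_mu - y_mu)^2 for the 1D model x = J * xi.
    j = 0.0
    p = len(xs)
    for _ in range(steps):
        grad = sum((j * x - y) * x for x, y in zip(xs, ys)) / p
        j -= eta * grad
    return j

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 * x for x in xs]     # noiseless data generated with slope 3
print(fit_slope(xs, ys))       # converges towards the generating slope
```

With these inputs the mean of xi squared is 2, so each step contracts the distance to the optimum by a factor 1 - 2 * eta; a step size above 1 would make the iteration diverge.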
Simple case: linear perceptron
- Teacher: $S_T(\xi) = y = B \cdot \xi$
- Student: $S_S(\xi) = x = J \cdot \xi$
- Training and performance evaluation are based on the quadratic error
Consider the training dynamics
Exercise: Derive the update equations of R, Q as a function of $\alpha$
Linear perceptron (cont.)
- Some exercises:
- Write matlab code for the linear perceptron, teacher-student scenario.
- Show that
- Investigate the role of the learning rate $\eta$
- Find the asymptotic decrease to zero error
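As a starting point for these exercises, here is a minimal sketch of the linear teacher-student scenario in Python (standard library only) rather than matlab. It assumes the on-line gradient step J <- J + (eta / N) (y - x) xi on the quadratic error (x - y)^2 / 2, Gaussian inputs, and a tabula rasa start; the function name and parameter values are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear(n, eta, alpha_max, rng):
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(b, b))
    b = [v / norm for v in b]                 # teacher with B.B = 1
    j = [0.0] * n                             # tabula rasa student
    for _ in range(int(alpha_max * n)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x, y = dot(j, xi), dot(b, xi)         # student and teacher outputs
        # Gradient step on the quadratic error (x - y)^2 / 2 of this example.
        j = [w + eta * (y - x) * v / n for w, v in zip(j, xi)]
    r, q = dot(b, j), dot(j, j)
    # Average of (x - y)^2 / 2 over isotropic inputs, expressed through R and Q.
    return 0.5 * (q - 2.0 * r + 1.0)

e_slow = train_linear(n=100, eta=0.2, alpha_max=10.0, rng=random.Random(4))
e_fast = train_linear(n=100, eta=1.0, alpha_max=10.0, rng=random.Random(4))
print(e_slow, e_fast)
```

Running it for several values of eta is one way to investigate the role of the learning rate before attempting the analytic treatment.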
Adatron: binary output, $J(\mu) = J(\mu-1) + f(\cdot)\, \xi^\mu S_T^\mu / N$
- Some exercises:
- Write matlab code for the linear perceptron, teacher-student scenario.
- Find the asymptotic decrease to zero error
- Compare with the performance of the Perceptron and the Hebb rule
Multilayered feed-forward NN
- Example architecture: the soft-committee machine
Multilayered ff NN (cont.)
- Transfer function: sigmoidal g(x)
- e.g., $g(x) = \tanh(x)$ or an error function (erf)
- Error function is defined
- The total output
Teacher-Student scenario
- Suppose teacher and student have the same architecture, but the student has K hidden units and the teacher has M hidden units.
- Can the student learn the rule?
- K < M: unlearnable rule
- K = M: learnable rule
- K > M: overlearnable rule
In the following we will discuss matching architectures
The error measure
- One (obvious) choice for continuous outputs
On-line gradient descent
- Assuming the same learning rate, $\eta$, over the whole network, the update equations are given for fixed, known v
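A single on-line gradient step for a soft committee machine can be sketched as follows (Python, standard library only). The sketch assumes hidden-to-output weights fixed to 1, g = tanh, the quadratic error (sigma - tau)^2 / 2, and a 1/N scaling of the step; the function names and sizes are hypothetical and the code is not taken from the course material.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scm_output(weights, xi):
    # Soft committee machine with hidden-to-output weights fixed to 1:
    # sum over hidden units of g(J_k . xi), with g = tanh in this sketch.
    return sum(math.tanh(dot(j_k, xi)) for j_k in weights)

def gradient_step(student, xi, target, eta):
    # One on-line step on the quadratic error (sigma - target)^2 / 2.
    n = len(xi)
    fields = [dot(j_k, xi) for j_k in student]
    delta = target - sum(math.tanh(x) for x in fields)
    new_student = []
    for j_k, x_k in zip(student, fields):
        g_prime = 1.0 - math.tanh(x_k) ** 2      # derivative of tanh
        new_student.append([w + eta * delta * g_prime * v / n
                            for w, v in zip(j_k, xi)])
    return new_student

rng = random.Random(5)
n, k = 10, 2
teacher = [[rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)] for _ in range(k)]
student = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(k)]
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
tau = scm_output(teacher, xi)
before = (scm_output(student, xi) - tau) ** 2 / 2
student = gradient_step(student, xi, tau, eta=0.1)
after = (scm_output(student, xi) - tau) ** 2 / 2
print(before, after)   # error on this example before and after the step
```

With a small learning rate the step moves the student output towards the teacher output on the presented example; what happens to the generalization error is the subject of the order-parameter analysis.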
Assumptions and definitions
- Isotropic uncorrelated input data
- The number of input components N is huge
The rule is specified by the norms of the
teacher, say all 1s
Order parameters role
- The set of order parameters and weights is sufficient for describing the learning; this is the macroscopic set of parameters
- Microscopic: $KN + K$ degrees of freedom
- Macroscopic: $K(K-1)/2 + KM + K$ different order parameters
Generalization error, erf function, Saad and Solla 1995
- Reflect symmetries of the soft committee machine
Permutation Symmetry
- The generalization error is characterized by invariance under permutations of branches
- How do you think this feature affects learning performance?
A simple case
- Hidden-to-output weights are fixed and known: $w_i = v_i = 1$
- The update rule is
Update of the order parameters
Differential Equations
Learning curves
Section 3: Unsupervised learning
- Based on slides from Michael Biehl's summer course
Introduction
- Learning without a teacher!?
- Real-world data is, in general, not isotropic and structureless in input space.
- Unsupervised learning: extraction of information from unlabelled inputs
Potential aims
- Correlation analysis
- Clustering of data: grouping according to some similarity criterion
- Identification of prototypes: represent a large amount of data by a few examples
- Dimension reduction: represent high-dimensional data by a few relevant features
A simple example
- Prototypes for high-dimensional data: directions in the space
- Assume data points are distributed as
A simple example (cont)
- The student task is to find the directions $B_1$ and $B_2$
- The data looks different in different planes!
Student scenario
- Search for the two vectors using two student vectors
- Define a set of possible learning rules
- Analyze learning abilities
- Compare and choose the best learning rule
- It would provide the two principal components of the data
PCA: general setting (matlab)
Given a set of data points X:
1. Compute the covariance matrix.
2. Compute the eigenvalues and eigenvectors of the covariance matrix.
3. Arrange the eigenvalues from the biggest to the smallest. Take the first d eigenvectors as principal components if the input dimensionality is to be reduced to d.
4. Project the input data onto the principal components, which forms the representation of the input data.
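The four steps can be sketched without any numerical library. The version below is Python (standard library only), not the course's matlab code, and it swaps in power iteration with deflation for a full eigendecomposition, which is enough to obtain the d largest eigenpairs. The function names and the toy data are assumptions, and the sketch assumes the leading eigenvalues are positive and well separated.

```python
import math
import random

def covariance(data):
    # Step 1: sample covariance matrix of the rows of `data`.
    p, n = len(data), len(data[0])
    mean = [sum(row[i] for row in data) / p for i in range(n)]
    centered = [[row[i] - mean[i] for i in range(n)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (p - 1) for j in range(n)]
           for i in range(n)]
    return mean, cov

def top_eigenpairs(cov, d, iters=500):
    # Steps 2-3: the d largest eigenvalues (in decreasing order) and their
    # unit eigenvectors, found by power iteration with deflation.
    n = len(cov)
    a = [row[:] for row in cov]
    rng = random.Random(0)
    pairs = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def project(data, mean, components):
    # Step 4: coordinates of each centered data point along the components.
    return [[sum((row[i] - mean[i]) * c[i] for i in range(len(mean)))
             for c in components] for row in data]

data = [[2.0, 1.9], [1.0, 1.1], [0.0, 0.1], [-1.0, -0.9], [-2.0, -2.2]]
mean, cov = covariance(data)
pairs = top_eigenpairs(cov, d=1)
reduced = project(data, mean, [v for _, v in pairs])
print(pairs[0], reduced)
```

For the toy data, which lies close to the diagonal, the first principal component comes out close to the (1, 1) direction up to sign, and almost all of the variance is captured by it.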
Principal Component Analysis
- Algebraic viewpoint: given data, find a linear transformation such that the sum of squared distances is minimized over all linear transformations
- Statistical viewpoint: given data, assume that each point is a random variable sampled from a Gaussian with unit covariance and mean. Find the ML estimator of the means under the constraint that there are K different means that are linearly related to the data.
Example: vision
Example: vision (cont)
Average results for each of the 6400 pixels
First nine eigenfaces
Dimensionality Reduction
- The goal is to compress information with minimal loss
- Methods:
- Unsupervised learning
- Principal Component Analysis
- Nonnegative Matrix Factorization
- Bayesian models (matrices are probabilities)
Section 4: Bayesian Networks
- Some slides are from Baldi's course on Neural Networks
Bayesian Statistics
- Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox-Jaynes axioms).
- Axiom 0: Transitivity of preferences.
- Theorem 1: Preferences can be represented by a real number p(A).
- Axiom 1: There exists a function f such that
- $p(\text{non } A) = f(p(A))$
- Axiom 2: There exists a function F such that
- $p(A, B) = F(p(A), p(B|A))$
- Theorem 2: There is always a rescaling w such that $w(p(A))$ is in [0, 1] and satisfies the sum and product rules.
Probability as Degree of Belief
- Sum rule:
- $P(\text{non } A) = 1 - P(A)$
- Product rule:
- $P(A \text{ and } B) = P(A)\, P(B|A)$
- Bayes' theorem:
- $P(B|A) = P(A|B)\, P(B) / P(A)$
- Induction form:
- $P(M|D) = P(D|M)\, P(M) / P(D)$
- Equivalently:
- $\log P(M|D) = \log P(D|M) + \log P(M) - \log P(D)$
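A small numerical check makes the induction form concrete. In the sketch below (Python), the two models M1 and M2, their priors and their likelihoods are made-up numbers chosen only for illustration, and the function name `posterior` is hypothetical.

```python
import math

def posterior(prior, likelihoods):
    # P(M|D) = P(D|M) P(M) / P(D), with P(D) = sum over models of P(D|M) P(M).
    evidence = sum(prior[m] * likelihoods[m] for m in prior)
    return {m: prior[m] * likelihoods[m] / evidence for m in prior}

# Hypothetical numbers, chosen only for illustration.
prior = {"M1": 0.5, "M2": 0.5}
likelihoods = {"M1": 0.8, "M2": 0.2}    # P(D|M) for the observed data D
post = posterior(prior, likelihoods)
print(post)

# Log form: log P(M|D) = log P(D|M) + log P(M) - log P(D)
log_evidence = math.log(sum(prior[m] * likelihoods[m] for m in prior))
log_post_m1 = math.log(likelihoods["M1"]) + math.log(prior["M1"]) - log_evidence
print(math.exp(log_post_m1))
```

Both routes give the same posterior for M1; the log form is usually preferred in practice because products of many small probabilities underflow.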
The Asia problem
- Shortness-of-breath (dyspnoea) may be due to tuberculosis, lung cancer or bronchitis, or none of them. A recent visit to Asia increases the chances of tuberculosis, while smoking is known to be a risk factor for both lung cancer and bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, and neither does the presence or absence of dyspnoea.
Lauritzen and Spiegelhalter 1988
Graphical models
- Successful marriage between probability theory and graph theory (M. I. Jordan)
$P(x_1, x_2, x_3) \neq P(x_1, x_3)\, P(x_2, x_3)$
$P(x_1, x_2, x_3) \propto \Psi(x_1, x_3)\, \Psi(x_2, x_3)$
Applications: vision, speech recognition, error correcting codes, bioinformatics
Directed acyclic Graphs
- Involves conditional dependencies
$P(x_1, x_2, x_3) = P(x_1)\, P(x_2)\, P(x_3|x_1, x_2)$
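The factorization can be made concrete with three binary variables, where x1 and x2 are the parents of x3. In the sketch below (Python, standard library only), all conditional probability tables are hypothetical numbers and the helper names are assumptions. It builds the joint from the factors and answers two queries by summing over the joint.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
p_x1 = {0: 0.9, 1: 0.1}
p_x2 = {0: 0.8, 1: 0.2}
# P(x3 = 1 | x1, x2)
p_x3_given = {(0, 0): 0.01, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.95}

def joint(x1, x2, x3):
    # P(x1, x2, x3) = P(x1) P(x2) P(x3 | x1, x2)
    p3 = p_x3_given[(x1, x2)]
    return p_x1[x1] * p_x2[x2] * (p3 if x3 == 1 else 1.0 - p3)

total = sum(joint(*state) for state in product((0, 1), repeat=3))

def prob_x1_given(x3, x2=None):
    # P(x1 = 1 | x3) or P(x1 = 1 | x3, x2), obtained by summing the joint.
    x2_values = (0, 1) if x2 is None else (x2,)
    num = sum(joint(1, b, x3) for b in x2_values)
    den = sum(joint(a, b, x3) for a in (0, 1) for b in x2_values)
    return num / den

print(total)
print(prob_x1_given(x3=1), prob_x1_given(x3=1, x2=1))
```

With these numbers, observing x3 = 1 raises the probability of x1 = 1 above its prior, and additionally observing the other parent x2 = 1 lowers it again: the two parents are independent a priori but become dependent once their common child is observed.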
Directed Graphical Models (2)
- Each node is associated with a random variable
- Each arrow is associated with conditional dependencies (parents → child)
- A shaded node illustrates an observed variable
- Plates stand for repetitions of i.i.d. drawings of the random variables
Classification problem
- This problem is unsupervised: one searches for the labels that best fit the data, and does not have any labelled examples
- Perceptrons and support vector machines are widely used for classification. These are discriminative methods
Classification: assigning labels to data
Density estimator
- The simplest model for density estimation is the Naïve Bayes classifier
- Assumes that the features of each data point are independent given the class
- Results in a trivial learning algorithm
- Usually does not suffer from overfitting
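A minimal sketch of why learning is trivial here: training reduces to counting. The data set, the function names, and the add-one smoothing constant are assumptions chosen for illustration, not part of the course material.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Learning is just counting: class frequencies and per-class feature values."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    for x, y in zip(samples, labels):
        for i, value in enumerate(x):
            feature_counts[(y, i)][value] += 1
    return class_counts, feature_counts


def predict(model, x, smoothing=1.0, num_values=2):
    """Return the label maximizing log P(y) + sum_i log P(x_i | y)."""
    class_counts, feature_counts = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, count in class_counts.items():
        score = math.log(count / n)
        for i, value in enumerate(x):
            # Add-one smoothing (an assumption) avoids log(0) for unseen values.
            num = feature_counts[(y, i)][value] + smoothing
            score += math.log(num / (count + smoothing * num_values))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Tiny made-up data set with two binary features.
samples = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
labels = ["a", "a", "b", "b", "a", "b"]
model = train_naive_bayes(samples, labels)
print(predict(model, (1, 0)), predict(model, (0, 1)))
```

Because every feature is scored separately, the number of parameters grows linearly with the number of features, which is one reason the model rarely overfits.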
87Directed graph: real-world example
Statistical modeling for data mining: in a huge corpus, authors and words are observed; topics and relations are learned.
The author-topic model
88Goal
- Automatically extract the topical content of documents and learn the association of topics to authors of documents
- Expand existing probabilistic topic models to include author information
- Some queries that the model should be able to answer:
- What topics does author X work on?
- Which authors work on topic X?
- What are interesting temporal patterns in topics?
89Previous topic-based models
- Hofmann (1999): Probabilistic Latent Semantic Indexing (pLSI)
- EM implementation
- Problem of overfitting
- Latent Dirichlet Allocation (LDA): Blei, Ng & Jordan (2003); Griffiths & Steyvers (PNAS 2004)
- Clarified the pLSI model
- Variational EM; scalability?
- Gibbs sampling technique for inference
- Computationally simple, efficient (linear in the size of the data), can easily be applied to 100K documents
91Classification
92Topics Model for Semantic Representation
- Based on slides by Professor Mark Steyvers; joint work of Mark Steyvers (UCI) and Tom Griffiths (Stanford)
93The DRM Paradigm
- The Deese (1959) and Roediger & McDermott (1995) paradigm
- Subjects hear a series of word lists during the study phase, each comprising semantically related items that are strongly related to another, non-presented word (the false target).
- Subjects later receive recognition tests for all words plus other distractor words, including the false target.
- DRM experiments routinely demonstrate that subjects claim to recognize false targets.
94Example test of false memory effects in the DRM Paradigm
- STUDY: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy
- FALSE RECALL: Sleep (61%)
95A Rational Analysis of Semantic Memory
- Our associative/semantic memory system might arise from the need to efficiently predict word usage with just a few basis functions (i.e., concepts or topics)
- The topics model provides such a rational analysis
96A Spatial Representation: Latent Semantic Analysis (Landauer & Dumais, 1997)
EACH WORD IS A SINGLE POINT IN A SEMANTIC SPACE
97Triangle Inequality constraint on words with
multiple meanings
- Euclidean distance: $AC \le AB + BC$
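A small sketch of what the constraint means for a word with multiple meanings. The 2-D coordinates below are hypothetical and chosen only to illustrate the point: if FIELD (B) is close both to MAGNETIC (A) and to FOOTBALL (C), a spatial representation forces A and C to be close to each other as well.

```python
import math

# Hypothetical 2-D coordinates, for illustration only.
points = {"MAGNETIC": (0.0, 0.0), "FIELD": (1.0, 0.0), "FOOTBALL": (1.0, 1.0)}


def dist(a, b):
    """Euclidean distance between two words in the toy space."""
    return math.dist(points[a], points[b])


ac = dist("MAGNETIC", "FOOTBALL")
ab = dist("MAGNETIC", "FIELD")
bc = dist("FIELD", "FOOTBALL")
# AC can never exceed AB + BC, however unrelated A and C are semantically.
print(f"AC = {ac:.3f} <= AB + BC = {ab + bc:.3f}")
```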
98A generative model for topics
- Each document (i.e. context) is a mixture of topics.
- Each topic is a distribution over words.
- Each word is chosen from a single topic.
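The three statements above can be sketched as a sampling procedure. The two topic-word distributions are copied from the toy example in these slides; the function name, the document lengths, and the mixture weights are assumptions for illustration.

```python
import random

# Topic-word distributions copied from the toy example in these slides.
topics = {
    1: {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    2: {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1, "RESEARCH": 0.1,
        "MATHEMATICS": 0.1, "MYSTERY": 0.1},
}


def generate_document(mixture, length, rng):
    """Draw each word by first picking a topic z, then a word from P(w | z)."""
    words = []
    for _ in range(length):
        z = rng.choices(list(mixture), weights=list(mixture.values()))[0]
        word_dist = topics[z]
        words.append(rng.choices(list(word_dist), weights=list(word_dist.values()))[0])
    return words


rng = random.Random(0)
doc_topic1 = generate_document({1: 1.0, 2: 0.0}, 5, rng)  # all probability to topic 1
doc_mixed = generate_document({1: 0.5, 2: 0.5}, 8, rng)   # an even mixture
print(doc_topic1)
print(doc_mixed)
```

Putting all of the mixture weight on one topic yields documents that use only that topic's vocabulary; an even mixture interleaves words from both.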
99A toy example
Diagram labels: TOPIC MIXTURE, $w_i$
$P(w \mid z = 1)$: HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
$P(w \mid z = 2)$: SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1
100All probability to topic 1
Document: HEART, LOVE, JOY, SOUL, HEART, ...
Diagram labels: one TOPIC, $w_i$
$P(w \mid z = 1)$: HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
$P(w \mid z = 2)$: SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1
101All probability to topic 2
Document: SCIENTIFIC, KNOWLEDGE, SCIENTIFIC, RESEARCH, ...
Diagram label: $w_i$
$P(w \mid z = 1)$: HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
$P(w \mid z = 2)$: SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1
102Application to corpus data
- TASA corpus: text from first grade to college
- representative sample of text
- 26,000 word types (stop words removed)
- 37,000 documents
- 6,000,000 word tokens
103Fitting the model
- Learning is unsupervised
- Learning means inverting the generative model
- We estimate $P(z \mid w)$: assign each word in the corpus to one of T topics
- With $T = 500$ topics and $6 \times 10^{6}$ words, the size of the discrete state space is $500^{6{,}000{,}000}$. HELP!
- Efficient sampling approach: Markov Chain Monte Carlo (MCMC)
- Time and memory requirements are linear in T and N
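A quick back-of-the-envelope check of the state-space size quoted above. The variable names are illustrative; the two numbers are the ones given on the slide.

```python
import math

T = 500        # number of topics
N = 6_000_000  # number of word tokens in the corpus

# T**N is far too large to print, so count its decimal digits instead.
log10_states = N * math.log10(T)
print(f"T**N has about {log10_states:,.0f} decimal digits")
```

The result is on the order of sixteen million digits, which is why exhaustive enumeration is out of the question and sampling is used instead.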
104Gibbs Sampling MCMC (see Griffiths & Steyvers, 2003 for details)
- Assign every word in the corpus to one of T topics
- Sampling distribution for z, built from two counts:
- number of times word w is assigned to topic j
- number of times topic j is used in document d
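A minimal sketch of a collapsed Gibbs sampler that uses the two counts named above. It follows the commonly used form of the update, in which the probability of assigning a word to topic j is proportional to (word-topic count + beta) / (topic total + W * beta) times (document-topic count + alpha). The hyperparameters `alpha` and `beta`, the function name, and the tiny corpus are assumptions for illustration; consult the cited paper for the exact expression.

```python
import random


def gibbs_lda(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Resample the topic of every word token, one token at a time."""
    rng = random.Random(seed)
    n_wt = [[0] * num_topics for _ in range(vocab_size)]  # word w assigned to topic j
    n_dt = [[0] * num_topics for _ in docs]               # topic j used in document d
    n_t = [0] * num_topics                                # tokens assigned to topic j
    z = []
    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            j = rng.randrange(num_topics)
            z_d.append(j)
            n_wt[w][j] += 1
            n_dt[d][j] += 1
            n_t[j] += 1
        z.append(z_d)
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                j = z[d][i]
                n_wt[w][j] -= 1
                n_dt[d][j] -= 1
                n_t[j] -= 1
                # Unnormalized sampling weights for each candidate topic k.
                weights = [
                    (n_wt[w][k] + beta) / (n_t[k] + vocab_size * beta) * (n_dt[d][k] + alpha)
                    for k in range(num_topics)
                ]
                j = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = j
                n_wt[w][j] += 1
                n_dt[d][j] += 1
                n_t[j] += 1
    return z, n_wt, n_dt


# Tiny made-up corpus: words are integer ids in range(vocab_size).
docs = [[0, 1, 0, 2], [3, 4, 5, 3]]
z, n_wt, n_dt = gibbs_lda(docs, vocab_size=6, num_topics=2)
print(z)
```

Each sweep touches every token once and every topic once per token, which matches the claim that time and memory grow linearly with T and N.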
105A selection from 500 topics: $P(w \mid z = j)$
- THEORY, SCIENTISTS, EXPERIMENT, OBSERVATIONS, SCIENTIFIC, EXPERIMENTS, HYPOTHESIS, EXPLAIN, SCIENTIST, OBSERVED, EXPLANATION, BASED, OBSERVATION, IDEA, EVIDENCE, THEORIES, BELIEVED, DISCOVERED
- SPACE, EARTH, MOON, PLANET, ROCKET, MARS, ORBIT, ASTRONAUTS, FIRST, SPACECRAFT, JUPITER, SATELLITE, SATELLITES, ATMOSPHERE, SPACESHIP, SURFACE, SCIENTISTS, ASTRONAUT
- BRAIN, NERVE, SENSE, SENSES, ARE, NERVOUS, NERVES, BODY, SMELL, TASTE, TOUCH, MESSAGES, IMPULSES, CORD, ORGANS, SPINAL, FIBERS, SENSORY
- ART, PAINT, ARTIST, PAINTING, PAINTED, ARTISTS, MUSEUM, WORK, PAINTINGS, STYLE, PICTURES, WORKS, OWN, SCULPTURE, PAINTER, ARTS, BEAUTIFUL, DESIGNS
106Polysemy: words with multiple meanings represented in different topics
- FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM
- SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST
- BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS
- JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY
107Predicting word association
- LSA finds the closest word
- The topics model does inference: given that one word was observed, which next word has the highest probability?
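A minimal sketch of this inference on the two toy topics from the earlier slides: P(w2 | w1) is the sum over topics of P(w2 | z) P(z | w1), with P(z | w1) obtained from Bayes' rule. The uniform prior over topics and the function name are assumptions for illustration, and the cue is assumed to appear in at least one topic.

```python
# Topic-word distributions copied from the toy example in these slides.
topics = {
    1: {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    2: {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1, "RESEARCH": 0.1,
        "MATHEMATICS": 0.1, "MYSTERY": 0.1},
}


def next_word_distribution(cue, prior=None):
    """P(w2 | w1) = sum_z P(w2 | z) P(z | w1)."""
    prior = prior or {z: 1.0 / len(topics) for z in topics}
    # Bayes' rule: P(z | w1) is proportional to P(w1 | z) P(z).
    posterior = {z: prior[z] * topics[z].get(cue, 0.0) for z in topics}
    norm = sum(posterior.values())
    posterior = {z: p / norm for z, p in posterior.items()}
    vocab = {w for word_dist in topics.values() for w in word_dist}
    return {w: sum(posterior[z] * topics[z].get(w, 0.0) for z in topics) for w in vocab}


# MYSTERY appears in both topics, so both contribute to the prediction.
pred = next_word_distribution("MYSTERY")
ranked = sorted(pred, key=pred.get, reverse=True)
print(ranked[:2])
```

Unlike a nearest-neighbor lookup in a semantic space, the prediction for an ambiguous cue blends the vocabularies of every topic the cue belongs to.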
108Word Association (norms from Nelson et al. 1998)
CUE: PLANET

Associate N.   People   Model
1              EARTH    STARS
2              STARS    SUN
3              SPACE    EARTH
4              SUN      SPACE
5              MARS     SKY

The first associate, EARTH, is in the set of 5 associates from the model.
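A small sketch of the check described above, using the PLANET rankings from this slide. The function name and the idea of sweeping the set size are assumptions for illustration; averaging the result over many cues would give an estimate of the probability that the set contains the first associate.

```python
# Rankings copied from the PLANET example on this slide.
human_associates = ["EARTH", "STARS", "SPACE", "SUN", "MARS"]
model_associates = ["STARS", "SUN", "EARTH", "SPACE", "SKY"]


def first_associate_in_set(human, model, set_size):
    """True if the top human associate is among the model's first `set_size` words."""
    return human[0] in model[:set_size]


for k in range(1, 6):
    print(k, first_associate_in_set(human_associates, model_associates, k))
```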
109P( set contains first associate )
110Explaining variability in false recall
- One factor: mean associative strength of list items to the critical item (Deese 1959; Roediger et al. 2001).
List items: BED, REST, AWAKE, TIRED, DROWSY
Critical item: SLEEP
Mean = .431
For 55 DRM lists, R = .69 (with the given lexicon)
111One recall component: inference
- Encoding: study words lead to a topic distribution (gist)
- Retrieval: infer words from the stored topic distribution
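A minimal sketch of the two steps on the toy topics from the earlier slides. As a simplifying assumption, the whole study list is treated as coming from a single topic with a uniform prior, which is cruder than the full topics model; the function names are hypothetical.

```python
import math

# Topic-word distributions copied from the toy example in these slides.
topics = {
    1: {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    2: {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1, "RESEARCH": 0.1,
        "MATHEMATICS": 0.1, "MYSTERY": 0.1},
}


def encode(study_words):
    """Encoding: posterior over topics given the study list (the gist)."""
    prior = 1.0 / len(topics)
    scores = {
        z: prior * math.prod(topics[z].get(w, 0.0) for w in study_words)
        for z in topics
    }
    norm = sum(scores.values())
    return {z: s / norm for z, s in scores.items()}


def retrieve(gist):
    """Retrieval: probability of each word under the stored topic distribution."""
    vocab = {w for word_dist in topics.values() for w in word_dist}
    return {w: sum(gist[z] * topics[z].get(w, 0.0) for z in topics) for w in vocab}


gist = encode(["HEART", "LOVE", "SOUL"])
recall = retrieve(gist)
# TEARS was never studied, yet it receives probability through the gist.
print(gist, recall["TEARS"])
```

The non-studied word gets non-zero retrieval probability purely because it shares a topic with the studied words, which is the mechanism proposed for false recall.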
112Predictions for the Sleep list
113Correlation between intrusion rates and
predictions
114Other recall components? One possibility: two routes add strength