Mini-course on Artificial Neural Networks and

Bayesian Networks

- Michal Rosen-Zvi

Mini-course on ANN and BN, The Multidisciplinary

Brain Research center, Bar-Ilan University, May

2004

Section 1 Introduction

Networks (1)

- Networks serve as a visual way of displaying relationships.
- Social networks are examples of flat networks, where the only information is the relation between entities.

Example collaboration network

1. Analyzing Cortical Activity using Hidden Markov Models. Itay Gat, Naftali Tishby, and Moshe Abeles. "Network: Computation in Neural Systems", August 1997.
2. Cortical Activity Flips Among Quasi Stationary States. Moshe Abeles, Hagai Bergman, Itay Gat, Isaac Meilijson, Eyal Seidemann, Naftali Tishby, Eilon Vaadia. Prepared Feb 1, 1995; appeared in the Proceedings of the National Academy of Sciences (PNAS).
3. Rigorous Learning Curve Bounds from Statistical Mechanics. David Haussler, Michael Kearns, H. Sebastian Seung, and Naftali Tishby. Prepared July 1994. Full version, Machine Learning (1997).
4. H. S. Seung, Haim Sompolinsky, Naftali Tishby. Learning Curves in Large Neural Networks. COLT 1991, 112-127.
5. Yann LeCun, Ido Kanter, Sara A. Solla. Second Order Properties of Error Surfaces. NIPS 1990, 918-924.
6. Esther Levin, Naftali Tishby, Sara A. Solla. A Statistical Approach to Learning and Generalization in Layered Neural Networks. COLT 1989, 245-260.
7. Litvak V, Sompolinsky H, Segev I, and Abeles M (2003). On the Transmission of Rate Code in Long Feedforward Networks with Excitatory-Inhibitory Balance. Journal of Neuroscience, 23(7):3006-3015.
8. Senn, W., Segev, I., and Tsodyks, M. (1998). Reading neural synchrony with depressing synapses. Neural Computation 10, 815-819.
9. Tsodyks, M., I. Mit'kov, H. Sompolinsky (1993). Pattern of synchrony in inhomogeneous networks of oscillators with pulse interactions. Phys. Rev. Lett.
10. Memory Capacity of Balanced Networks (Yuval Aviel, David Horn and Moshe Abeles).
11. The Role of Inhibition in an Associative Memory Model of the Olfactory Bulb (Ofer Hendin, David Horn and Misha Tsodyks).
12. Information Bottleneck for Gaussian Variables. Gal Chechik, Amir Globerson, Naftali Tishby and Yair Weiss. Prepared June 2003. Submitted to NIPS-2003.

matlab

Networks (2)

- Artificial Neural Networks represent rules (deterministic relations) between input and output.

Networks (3)

- Bayesian Networks represent probabilistic relations (conditional independencies and dependencies) between variables.

Outline

- Introduction/Motivation
- Artificial Neural Networks
- The Perceptron, multilayered FF NN and recurrent NN
- On-line (supervised) learning
- Unsupervised learning and PCA
- Classification
- Capacity of networks
- Bayesian networks (BN)
- Bayes rules and the BN semantics
- Classification using Generative models
- Applications: Vision, Text

Motivation

- The research of ANNs is inspired by neurons in the brain and (partially) driven by the need for models of reasoning in the brain.
- Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (for example, driving a car, assigning scientific referees to papers, and many others).

Questions

- How can a network learn?
- What will be the learning rate?
- What are the limitations on the network capacity?
- How can networks be used to classify data with no labels (unsupervised learning)?
- What are the relations and differences between learning in ANN and learning in BN?
- How can network models explain high-level reasoning?

History of (modern) ANNs and BNs

Section 2 On-line Learning

- Based on slides from Michael Biehl's summer course

Section 2.1 The Perceptron

The Perceptron

- Input $\xi$
- Adaptive weights $J$
- Output $S$

Perceptron binary output

- Implements a linearly separable classification of inputs
- Milestones:
- Perceptron convergence theorem, Rosenblatt (1958)
- Capacity, Winder (1963), Cover (1965)
- Statistical physics of perceptron weights, Gardner (1988)
- How does this device learn?

Learning a linearly separable rule from reliable examples

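- Unknown rule: $S_T(\xi) = \mathrm{sign}(B \cdot \xi) = \pm 1$
- Defines the correct classification.
- Parameterized through a teacher perceptron with weights $B \in \mathbb{R}^N$ ($B \cdot B = 1$)
- Only available information: example data
- $D = \{\xi^\mu, S_T^\mu = \mathrm{sign}(B \cdot \xi^\mu)\}$ for $\mu = 1, \dots, P$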

Learning a linearly (Cont.)

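- Training: finding the student weights $J$
- $J$ parameterizes a hypothesis $S_S(\xi) = \mathrm{sign}(J \cdot \xi)$
- Supervised learning is based on the student performance with respect to the training data $D$
- Binary error measure $\epsilon_T^\mu(J)$, comparing $S_S^\mu \equiv S_S(\xi^\mu)$ with $S_T^\mu \equiv S_T(\xi^\mu)$:
- $\epsilon_T^\mu(J) = 1$ if $S_S^\mu \neq S_T^\mu$, and $\epsilon_T^\mu(J) = 0$ if $S_S^\mu = S_T^\mu$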

Off-line learning

- Guided by the minimization of a cost function $H(J)$, e.g., the training error
- $H(J) = \sum_\mu \epsilon_T^\mu(J)$
- Equilibrium statistical mechanics treatment:
- Energy $H$ of $N$ degrees of freedom
- Ensemble of systems is in thermal equilibrium at a formal temperature
- Disorder average over random examples (replicas) assumes a distribution over the inputs
- Macroscopic description, order parameters
- Typical properties of large systems, $P = \alpha N$

On-line training

- Single presentation of uncorrelated (new) $\{\xi^\mu, S_T^\mu\}$
- Update of student weights
- Learning dynamics in discrete time

On-line training - Statistical Physics approach

- Consider a sequence of independent, random examples
- Thermodynamic limit
- Disorder average over latest example: self-averaging properties
- Continuous time limit

Generalization

- Performance of the student (after training) with respect to an arbitrary, new input
- In practice: empirical mean of the error measure over a set of test inputs
- In the theoretical analysis: average over the (assumed) probability density of inputs
- Generalization error

Generalization (cont.)

- The simplest model distribution:
- Isotropic density $P(\xi)$, $\xi$ uncorrelated with $B$ and $J$
- Consider vectors of independent identically distributed (iid) components $\xi_j$ with $\langle \xi_j \rangle = 0$ and $\langle \xi_j^2 \rangle = 1$

Geometric argument

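- Projection of data into the $(B, J)$-plane yields an isotropic density of inputs

$\epsilon_g = \theta/\pi$, where $\theta$ is the angle between $B$ and $J$.

Figure: the $(B, J)$-plane, marking the region where $S_T(\xi) = S_S(\xi)$, drawn for $|B| = 1$.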

Overlap Parameters

- Sufficient to quantify the success of learning:
- $R = B \cdot J$, $Q = J \cdot J$
- Random guessing: $R = 0$, $\epsilon_g = 1/2$
- Perfect generalization: $R/\sqrt{Q} = 1$, $\epsilon_g = 0$
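
The relation between the overlaps and the generalization error can be checked numerically. The following is a minimal Python sketch, not part of the original course material (which refers to MATLAB code). It assumes Gaussian isotropic inputs and a teacher with $|B| = 1$; the helper names `predicted_error` and `empirical_error` are hypothetical. It compares the geometric prediction $\arccos(R/\sqrt{Q})/\pi$ with the fraction of random inputs on which teacher and student disagree.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def predicted_error(B, J):
    """Geometric prediction: angle between B and J, divided by pi."""
    R = dot(B, J)                                # teacher-student overlap
    Q = dot(J, J)                                # squared norm of the student
    cos_theta = R / math.sqrt(Q * dot(B, B))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    return math.acos(cos_theta) / math.pi


def empirical_error(B, J, n_samples, rng):
    """Fraction of random inputs with sign(B.xi) != sign(J.xi)."""
    disagreements = 0
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, 1.0) for _ in B]
        if (dot(B, xi) >= 0.0) != (dot(J, xi) >= 0.0):
            disagreements += 1
    return disagreements / n_samples


rng = random.Random(0)
N = 20
B = [rng.gauss(0.0, 1.0) for _ in range(N)]
norm_B = math.sqrt(dot(B, B))
B = [b / norm_B for b in B]                               # teacher, |B| = 1
J = [b + rng.gauss(0.0, 1.0) / math.sqrt(N) for b in B]   # noisy copy of B

eps_theory = predicted_error(B, J)
eps_sampled = empirical_error(B, J, 20000, rng)
print(f"theory {eps_theory:.3f}  sampled {eps_sampled:.3f}")
```

The two numbers should agree up to sampling noise; the student here is an arbitrary noisy copy of the teacher, chosen only for illustration.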

Derivation for large N

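- Given $B$, $J$, and uncorrelated random input with $\langle \xi_i \rangle = 0$, $\langle \xi_i \xi_j \rangle = \delta_{ij}$, consider student/teacher fields that are sums of (many) independent random quantities:
- $x = J \cdot \xi = \sum_i J_i \xi_i$
- $y = B \cdot \xi = \sum_i B_i \xi_i$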

Central Limit Theorem

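- Joint density of $(x, y)$ is, for $N \to \infty$, a two-dimensional Gaussian, fully specified by the first and second moments:
- $\langle x \rangle = \sum_i J_i \langle \xi_i \rangle = 0$, $\langle y \rangle = \sum_i B_i \langle \xi_i \rangle = 0$
- $\langle x^2 \rangle = \sum_{ij} J_i J_j \langle \xi_i \xi_j \rangle = \sum_i J_i^2 = Q$
- $\langle y^2 \rangle = \sum_{ij} B_i B_j \langle \xi_i \xi_j \rangle = \sum_i B_i^2 = 1$
- $\langle xy \rangle = \sum_{ij} J_i B_j \langle \xi_i \xi_j \rangle = \sum_i J_i B_i = R$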
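
These moments can be estimated by sampling. The sketch below is an illustration under stated assumptions, not the course's own code: it draws binary inputs ($\pm 1$ with equal probability), uses an arbitrary student vector, and the function name `field_moments` is hypothetical. It compares the sampled second moments of the fields with $Q$, $1$ and $R$.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def field_moments(J, B, n_samples, rng):
    """Estimate <x>, <y>, <x^2>, <y^2>, <xy> for x = J.xi and y = B.xi."""
    sums = {"x": 0.0, "y": 0.0, "xx": 0.0, "yy": 0.0, "xy": 0.0}
    for _ in range(n_samples):
        xi = [rng.choice((-1.0, 1.0)) for _ in J]   # binary inputs
        x, y = dot(J, xi), dot(B, xi)
        sums["x"] += x
        sums["y"] += y
        sums["xx"] += x * x
        sums["yy"] += y * y
        sums["xy"] += x * y
    return {key: total / n_samples for key, total in sums.items()}


rng = random.Random(1)
N = 50
B = [rng.gauss(0.0, 1.0) for _ in range(N)]
norm_B = math.sqrt(dot(B, B))
B = [b / norm_B for b in B]                                   # |B| = 1
J = [0.6 * b + 0.8 * rng.gauss(0.0, 1.0) / math.sqrt(N) for b in B]

Q, R = dot(J, J), dot(B, J)
m = field_moments(J, B, 4000, rng)
print(f"<x^2> {m['xx']:.3f} vs Q {Q:.3f}")
print(f"<y^2> {m['yy']:.3f} vs 1")
print(f"<xy>  {m['xy']:.3f} vs R {R:.3f}")
```

Replacing `rng.choice((-1.0, 1.0))` with `rng.gauss(0.0, 1.0)` should give the same moments up to sampling noise, since only the first two moments of the inputs enter.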

Central Limit Theorem (Cont.)

- Details of the input are irrelevant.
- Some possible examples: binary, $\xi_i = \pm 1$ with equal probability; uniform; Gaussian.

Generalization Error

- The isotropic distribution is also assumed to describe the statistics of the example data inputs

Exercise: Derive the generalization error as a function of $R$, $Q$ (use the mathematical notes).

Assumptions about the data

- No spatial correlations
- No distinguished directions in the input space
- No temporal correlations
- No correlations with the rule
- Single presentation without repetitions
- Consequences:
- Average over data can be performed step by step
- Actual choice of $B$ is irrelevant; it is not necessary to average over the teacher

Hebbian learning (revisited) Hebb 1949

- Off-line interpretation (Vallet 1989):
- Choice of student weights given $D = \{\xi^\mu, S_T^\mu\}_{\mu=1}^{P}$
- $J(P) = \sum_\mu \xi^\mu S_T^\mu / N$
- Equivalent on-line interpretation:
- Dynamics upon single presentation of examples
- $J(\mu) = J(\mu-1) + \xi^\mu S_T^\mu / N$
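
The on-line Hebb rule is easy to simulate. The following Python sketch is an illustration, not the MATLAB code the slides refer to. It assumes Gaussian inputs, a teacher with $|B| = 1$ and a student starting from zero weights; `hebb_online` is a hypothetical name. For comparison it prints the large-$N$ mean values $R = \alpha\sqrt{2/\pi}$ and $Q = \alpha + 2\alpha^2/\pi$ (with $\alpha$ the number of examples divided by $N$), which follow from averaging this update over Gaussian fields; a finite-$N$ run only approximates them.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def hebb_online(B, alpha, rng):
    """Present alpha*N fresh examples once each; return the student J."""
    N = len(B)
    J = [0.0] * N                                   # tabula rasa
    for _ in range(int(alpha * N)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        s_t = 1.0 if dot(B, xi) >= 0.0 else -1.0    # teacher label
        J = [j + s_t * x / N for j, x in zip(J, xi)]
    return J


rng = random.Random(2)
N, alpha = 200, 4.0
B = [rng.gauss(0.0, 1.0) for _ in range(N)]
norm_B = math.sqrt(dot(B, B))
B = [b / norm_B for b in B]

J = hebb_online(B, alpha, rng)
R, Q = dot(B, J), dot(J, J)
R_mean = alpha * math.sqrt(2.0 / math.pi)
Q_mean = alpha + 2.0 * alpha ** 2 / math.pi
eps_g = math.acos(R / math.sqrt(Q)) / math.pi

print(f"R = {R:.2f} (large-N mean {R_mean:.2f})")
print(f"Q = {Q:.2f} (large-N mean {Q_mean:.2f})")
print(f"generalization error from overlaps: {eps_g:.3f}")
```

Increasing `N` at fixed `alpha` should shrink the run-to-run scatter of `R` and `Q` around the mean values.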

Hebb on-line

- From microscopic to macroscopic: recursions for overlaps

Exercise: Derive the update equations of $R$, $Q$.

Hebb on-line (Cont.)

- Average over the latest example $\xi^\mu$
- The random input $\xi^\mu$ enters only through the fields
- The random input $\xi^\mu$ and $J(\mu-1)$, $B$ are statistically independent
- The Central Limit Theorem applies and yields the joint density

Hebb on-line (Cont.)

Exercise: Derive the update equations of $R$, $Q$ as a function of $\alpha$ (use the mathematical notes; off-line).

Hebb on-line (Cont.)

- Continuous time limit: $N \to \infty$, $\alpha = \mu/N$, $d\alpha = 1/N$

Initial conditions (tabula rasa): $R(0) = Q(0) = 0$. What are the mean values after training with $\alpha N$ examples? (See matlab code.)

Hebb on-line mean values

- The order parameters, $Q$ and $R$, are self-averaging for infinite $N$
- Self-averaging property of a quantity $A(J)$:
- The observation of a value of $A$ different from its mean occurs with vanishing probability

Learning Curve: $\epsilon$ dependent on the order parameters

Exercise: Solve the differential equations for $R$ and $Q$.

Exercise: Find the function $\epsilon(\alpha)$.

Learning Curve: $\epsilon$ dependent on the order parameters

The normalized overlap between the two vectors $B$ and $J$ gives the angle between them.

Learning Curve: $\epsilon$ dependent on the order parameters

Exercise: Find the asymptotic behavior of $\epsilon(\alpha)$.

Asymptotic expansion draw w. matlab

Questions

- What are other learning algorithms that can be used for efficient learning?
- What training algorithm will provide the best learning / the fastest asymptotic decrease?

Modified Hebbian learning

- The training algorithm is defined by a modulation function $f$:
- $J(\mu) = J(\mu-1) + f(\cdots)\, \xi^\mu S_T^\mu / N$
- Restriction: $f$ may depend only on available quantities, $f(J(\mu-1), \xi^\mu, S_T^\mu)$
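
A modulated update of this kind can be written once and specialized by the choice of $f$. The sketch below is a hedged illustration in Python; the function names are hypothetical. It shows two choices: the constant $f = 1$, which recovers the plain Hebb rule, and a mistake-driven $f$ that is 1 only when the current student misclassifies the example (a perceptron-type rule).

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def hebb_f(J, xi, s_t):
    """Hebb: always update."""
    return 1.0


def perceptron_f(J, xi, s_t):
    """Mistake driven: update only if sign(J.xi) disagrees with the label."""
    return 1.0 if dot(J, xi) * s_t <= 0.0 else 0.0


def online_step(J, xi, s_t, f):
    """One step J <- J + f(J, xi, s_t) * xi * s_t / N."""
    N = len(J)
    scale = f(J, xi, s_t) * s_t / N
    return [j + scale * x for j, x in zip(J, xi)]


# The same example presented twice to an initially empty student.
xi, s_t = [1.0, -1.0], 1.0
J_hebb = online_step([0.0, 0.0], xi, s_t, hebb_f)
J_hebb = online_step(J_hebb, xi, s_t, hebb_f)
J_perc = online_step([0.0, 0.0], xi, s_t, perceptron_f)
J_perc = online_step(J_perc, xi, s_t, perceptron_f)
print(J_hebb)   # [1.0, -1.0]: Hebb keeps adding the example
print(J_perc)   # [0.5, -0.5]: second presentation is already correct
```

Here `f` only sees the current weights, the input and the teacher label, which is the restriction stated above. Treating a zero field as a mistake (the `<=`) is a convention chosen for this sketch so that learning can start from zero weights.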

Perceptron Rosenblatt 1959

- If the classification is correct, don't change the weights.
- If the classification is incorrect:
- if the right class for the $\mu$-th example is $+1$, $J(\mu) \cdot \xi^\mu$ increases.
- if the right class for the $\mu$-th example is $-1$, $J(\mu) \cdot \xi^\mu$ decreases.

Perceptron

- Only informative points are used (mistake driven)
- The solution is a linear combination of the training points
- Converges only for linearly separable data

Exercise Derive the update equations of ?,Q as a

function of ?, J,B and ?

On-line dynamics Biehl and Riegler 1994

Questions

- Find the asymptotic behavior (by simulations and/or analytically) of the generalization error for the perceptron algorithm and the Hebb algorithm. Which one is better?
- What training algorithm will provide the best learning / the fastest asymptotic decrease?

Learning Curve - Hebb and Perceptron

Section 2.2 On-line by gradient descent

Introduction

- Commonly used in practical applications
- Multilayered neural network with continuous activation functions, where the output is a differentiable function of the adaptive parameters
- Can be used for fitting a function to data

Linear perceptron and linear regression (1D)

- $x = J \xi$
- Using a quadratic loss function and gradient descent to find the best curve to fit a data set (see off-line)

Simple case Linear perceptron

- Teacher: $S_T(\xi) = y = B \cdot \xi$
- Student: $S_S(\xi) = x = J \cdot \xi$
- Training and performance evaluation are based on the quadratic error

Consider the training dynamics

Exercise Derive the update equations of R,Q as a

function of ?

Linear perceptron (cont.)

- Some exercises:
- Write a matlab code for the linear perceptron, teacher-student scenario.
- Show that
- Investigate the role of the learning rate $\eta$
- Find the asymptotic decrease to zero errors
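
As a starting point for the simulation exercise, here is a minimal sketch of the linear teacher-student scenario. It is written in Python rather than MATLAB, and it assumes the update $J \leftarrow J + (\eta/N)(y - x)\,\xi$, i.e. on-line gradient descent on the quadratic error $\frac{1}{2}(y - x)^2$ with Gaussian inputs; the names `train_linear` and `gen_error` are hypothetical.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gen_error(B, J):
    """(1/2) <(x - y)^2> for isotropic unit-variance inputs."""
    return 0.5 * (dot(J, J) - 2.0 * dot(B, J) + dot(B, B))


def train_linear(B, eta, alpha, rng):
    """On-line gradient descent on (y - x)^2 / 2, one pass over alpha*N examples."""
    N = len(B)
    J = [0.0] * N
    for _ in range(int(alpha * N)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        y = dot(B, xi)                       # teacher output
        x = dot(J, xi)                       # student output
        J = [j + eta * (y - x) * s / N for j, s in zip(J, xi)]
    return J


rng = random.Random(3)
N, alpha = 100, 5.0
B = [rng.gauss(0.0, 1.0) for _ in range(N)]
norm_B = math.sqrt(dot(B, B))
B = [b / norm_B for b in B]

results = {}
for eta in (0.5, 1.0):
    J = train_linear(B, eta, alpha, rng)
    results[eta] = gen_error(B, J)
    decay = 0.5 * math.exp(-eta * (2.0 - eta) * alpha)
    print(f"eta={eta}: simulated {results[eta]:.4f}, large-N curve {decay:.4f}")
```

Under these assumptions the large-$N$ analysis gives $\epsilon_g(\alpha) = \frac{1}{2} e^{-\eta(2-\eta)\alpha}$, which the script prints next to the simulated value; trying learning rates close to or above 2 shows where the rule stops converging.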

Adatron binary output: $J(\mu) = J(\mu-1) + f(\cdots)\, \xi^\mu S_T^\mu / N$

- Some exercises:
- Write a matlab code for the linear perceptron, teacher-student scenario.
- Find the asymptotic decrease to zero errors
- Compare with the performance of the Perceptron and Hebb rule

Multilayered feed-forward NN

- Example architecture: the soft-committee machine

Multilayered ff NN (cont.)

- Transfer function: sigmoidal $g(x)$
- e.g., $g(x) = \tanh(x)$ or an error-function (erf) sigmoid
- Error function is defined
- The total output

Teacher-Student scenario

- If teacher and student have the same architecture but the student has $K$ hidden units and the teacher has $M$ hidden units,
- Can the student learn the rules?

$K < M$: unlearnable rule

$K = M$: learnable rule

$K > M$: overlearnable rule

In the following we will discuss matching architectures

The error measure

- One (obvious) choice for continuous outputs

On-line gradient descent

- Assuming the same learning rate, $\eta$, over all the network, the update equations are, for fixed known $v$
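
The update equations on this slide did not survive extraction, so the following Python sketch only illustrates what one on-line gradient step for a soft-committee machine can look like. Everything specific in it is an assumption: the transfer function $g(x) = \mathrm{erf}(x/\sqrt{2})$, hidden-to-output weights fixed to 1, the quadratic error $\frac{1}{2}(\sigma - \tau)^2$ between student output $\sigma$ and teacher output $\tau$, and the hypothetical names `output` and `sgd_step`.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def g(x):
    """Assumed sigmoidal transfer function."""
    return math.erf(x / math.sqrt(2.0))


def g_prime(x):
    """Derivative of g: sqrt(2/pi) * exp(-x^2 / 2)."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)


def output(weights, xi):
    """Soft-committee output: sum of hidden-unit activations."""
    return sum(g(dot(w, xi)) for w in weights)


def sgd_step(J, B, xi, eta):
    """One gradient step on (sigma - tau)^2 / 2 for a single example."""
    N = len(xi)
    fields = [dot(w, xi) for w in J]
    delta = output(B, xi) - sum(g(x) for x in fields)   # tau - sigma
    return [[w_i + eta / N * delta * g_prime(x) * s for w_i, s in zip(w, xi)]
            for w, x in zip(J, fields)]


rng = random.Random(4)
N, K = 10, 2                                   # matching architecture, K = M
B = []
for _ in range(K):
    b = [rng.gauss(0.0, 1.0) for _ in range(N)]
    norm = math.sqrt(dot(b, b))
    B.append([c / norm for c in b])            # teacher vectors with norm 1
J = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]

xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
err_before = 0.5 * (output(J, xi) - output(B, xi)) ** 2
J = sgd_step(J, B, xi, eta=0.1)
err_after = 0.5 * (output(J, xi) - output(B, xi)) ** 2
print(f"error on this example: {err_before:.4f} -> {err_after:.4f}")
```

Repeating `sgd_step` with a fresh random `xi` at every step gives the on-line training loop; the error on the single example used for the step should drop for small enough `eta`.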

Assumptions and definitions

- Isotropic uncorrelated input data
- The number of input components N is huge

The rule is specified by the norms of the

teacher, say all 1s

Order parameters role

- The set of order parameters and weights is sufficient for describing the learning; this is the macroscopic set of parameters

Microscopic: $KN + K$ degrees of freedom. Macroscopic: $K(K-1)/2 + KM + K$ different order parameters.

Generalization error erf function Saad Solla 1995

- Reflect symmetries of the soft committee machine

Permutation Symmetry

- The generalization error is characterized by invariance under permutations of branches
- How do you think this feature affects learning performance?

A simple case

- Hidden-to-output weights are fixed and known: $w_i = v_i = 1$
- The update rule is

Update of the order parameters

Differential Equations

Learning curves

Section 3 Unsupervised learning

- Based on slides from Michael Biehl's summer course

Introduction

- Learning without a teacher!?
- Real world data is, in general, not isotropic and structureless in input space.
- Unsupervised learning: extraction of information from unlabelled inputs

Potential aims

- Correlation analysis
- Clustering of data: grouping according to some similarity criterion
- Identification of prototypes: represent large amounts of data by few examples
- Dimension reduction: represent high-dimensional data by few relevant features

A simple example

- Prototypes for high-dimensional data: directions in the space
- Assume data points are distributed as

A simple example (cont)

- The student task is to find the directions $B_1$ and $B_2$
- The data looks different in different planes!

Student scenario

- Search for the two vectors, using two student vectors
- Define a set of possible learning rules
- Analyze learning abilities
- Compare and choose the best learning rule
- It would provide the two principal components of the data

PCA General setting matlab

Given a set of data points $X$:

1. Compute the covariance matrix.
2. Compute the eigenvalues and eigenvectors of the covariance matrix.
3. Arrange the eigenvalues from the biggest to the smallest. Take the first $d$ eigenvectors as principal components if the input dimensionality is to be reduced to $d$.
4. Project the input data onto the principal components, which forms the representation of the input data.
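
The four steps can be sketched in a few lines of Python. This is an illustration, not the MATLAB demo the slide title refers to. To stay within the standard library it finds the leading eigenvectors by power iteration with deflation instead of a full eigendecomposition; that substitution, the tiny data set, and the names `covariance_matrix`, `top_eigenpair` and `pca` are all assumptions of this sketch.

```python
import math


def covariance_matrix(X):
    """Step 1: center the data and return (covariance, centered data)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered


def top_eigenpair(C, n_iter=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix."""
    d = len(C)
    v = [1.0 / (i + 1) for i in range(d)]            # fixed generic start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
    return lam, v


def pca(X, n_components):
    """Steps 2-4: leading eigenpairs (largest first) and projected data."""
    cov, centered = covariance_matrix(X)
    d = len(cov)
    eigvals, components = [], []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        eigvals.append(lam)
        components.append(v)
        # Deflation: remove the direction just found from the matrix.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    projected = [[sum(r[j] * v[j] for j in range(d)) for v in components]
                 for r in centered]
    return eigvals, components, projected


X = [[2.1, 1.9], [0.9, 1.1], [0.0, 0.0], [-1.1, -0.9], [-1.9, -2.1]]
eigvals, components, Z = pca(X, 2)
print("eigenvalues:", [round(l, 4) for l in eigvals])
print("first component:", [round(c, 4) for c in components[0]])
```

For this toy data almost all of the variance lies along the diagonal direction, so keeping only the first column of `Z` reduces the dimensionality from 2 to 1 with little loss. The sign of each component is arbitrary.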

Principal Component Analysis

- Algebraic viewpoint: Given data, find a linear transformation such that the sum of squared distances is minimized over all linear transformations.
- Statistical viewpoint: Given data, assume that each point is a random variable sampled from a Gaussian with unit covariance and mean. Find the ML estimator of the means under the constraint that there are $K$ different means that are linearly related to the data.

Example vision

Example vision (cont)

Average results for each of the 6400 pixels

First nine eigen faces

Dimensionality Reduction

- The goal is to compress information with minimal loss
- Methods:
- Unsupervised learning
- Principal Component Analysis
- Nonnegative Matrix Factorization
- Bayesian Models (matrices are probabilities)

Section 4 Bayesian Networks

- Some slides are from Baldi's course on Neural Networks

Bayesian Statistics

- Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox-Jaynes axioms).
- Axiom 0: Transitivity of preferences.
- Theorem 1: Preferences can be represented by a real number $p(A)$.
- Axiom 1: There exists a function $f$ such that
- $p(\text{non } A) = f(p(A))$
- Axiom 2: There exists a function $F$ such that
- $p(A, B) = F(p(A), p(B|A))$
- Theorem 2: There is always a rescaling $w$ such that $P(A) = w(p(A))$ is in $[0, 1]$ and satisfies the sum and product rules.

Probability as Degree of Belief

- Sum Rule:
- $P(\text{non } A) = 1 - P(A)$
- Product Rule:
- $P(A \text{ and } B) = P(A)\, P(B|A)$
- Bayes' Theorem:
- $P(B|A) = P(A|B)\, P(B) / P(A)$
- Induction Form:
- $P(M|D) = P(D|M)\, P(M) / P(D)$
- Equivalently:
- $\log P(M|D) = \log P(D|M) + \log P(M) - \log P(D)$
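
A small worked example may help make the induction form concrete. The Python sketch below uses made-up numbers: two hypothetical models of a coin, `fair` (heads probability 0.5) and `biased` (heads probability 0.8), equal priors, and data $D$ of 8 heads in 10 tosses. It computes $P(M|D)$ from $P(D|M)$, $P(M)$ and $P(D)$, and checks the log form.

```python
import math


def binomial_likelihood(p_heads, heads, tosses):
    """P(D | M): probability of `heads` heads in `tosses` tosses."""
    return (math.comb(tosses, heads)
            * p_heads ** heads * (1.0 - p_heads) ** (tosses - heads))


def posterior(prior, likelihood):
    """Bayes' theorem: P(M | D) = P(D | M) P(M) / P(D)."""
    evidence = sum(prior[m] * likelihood[m] for m in prior)   # P(D)
    return {m: likelihood[m] * prior[m] / evidence for m in prior}, evidence


prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": binomial_likelihood(0.5, 8, 10),
              "biased": binomial_likelihood(0.8, 8, 10)}
post, evidence = posterior(prior, likelihood)

for model in prior:
    log_form = (math.log(likelihood[model]) + math.log(prior[model])
                - math.log(evidence))
    print(f"{model}: P(M|D) = {post[model]:.3f}, log P(M|D) = {log_form:.3f}")
```

With these numbers the data favour the biased model, with a posterior of about 0.87, even though both models started with the same prior.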

The Asia problem

- Shortness-of-breath (dyspnoea) may be due to

Tuberculosis, Lung cancer or bronchitis, or none

of them. A recent visit to Asia increases the

chances of tuberculosis, while Smoking is known

to be a risk factor for both lung cancer and

Bronchitis. The results of a single chest X-ray

do not discriminate between lung cancer and

tuberculosis, as neither does the presence or

absence of Dyspnoea.

Lauritzen & Spiegelhalter 1988

Graphical models

- Successful marriage between Probability Theory and Graph Theory (M. I. Jordan)

$P(x_1, x_2, x_3) \neq P(x_1, x_3)\, P(x_2, x_3)$

$P(x_1, x_2, x_3) \propto \Psi(x_1, x_3)\, \Psi(x_2, x_3)$

Applications: Vision, Speech Recognition, Error-correcting codes, Bioinformatics

Directed acyclic Graphs

- Involves conditional dependencies

$P(x_1, x_2, x_3) = P(x_1)\, P(x_2)\, P(x_3 | x_1, x_2)$
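
The factorization for this three-node graph can be explored with a tiny numerical example. In the Python sketch below all probabilities are invented for illustration and the names `joint` and `prob` are hypothetical. It builds the joint distribution from the three factors, then shows that $x_1$ and $x_2$ are independent on their own but become dependent once $x_3$ is observed.

```python
from itertools import product

# Invented numbers for binary variables x1, x2 (parents) and x3 (child).
p_x1 = {0: 0.9, 1: 0.1}
p_x2 = {0: 0.8, 1: 0.2}
p_x3_is_1 = {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}


def joint(x1, x2, x3):
    """P(x1, x2, x3) = P(x1) P(x2) P(x3 | x1, x2)."""
    p3 = p_x3_is_1[(x1, x2)]
    return p_x1[x1] * p_x2[x2] * (p3 if x3 == 1 else 1.0 - p3)


def prob(**fixed):
    """Marginal probability of the given assignments, summing out the rest."""
    total = 0.0
    for x1, x2, x3 in product((0, 1), repeat=3):
        values = {"x1": x1, "x2": x2, "x3": x3}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(x1, x2, x3)
    return total


total_mass = prob()
p_both = prob(x1=1, x2=1)                                  # equals P(x1) P(x2)
p_x1_given_x3 = prob(x1=1, x3=1) / prob(x3=1)
p_x1_given_x3_x2 = prob(x1=1, x2=1, x3=1) / prob(x2=1, x3=1)

print(f"sum over all states: {total_mass:.3f}")
print(f"P(x1=1, x2=1) = {p_both:.3f}")
print(f"P(x1=1 | x3=1) = {p_x1_given_x3:.3f}")
print(f"P(x1=1 | x3=1, x2=1) = {p_x1_given_x3_x2:.3f}")
```

Observing $x_2 = 1$ lowers the probability of $x_1 = 1$ once $x_3 = 1$ is known: the second parent "explains away" the observation, which is the conditional dependence this graph encodes.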

Directed Graphical Models (2)

- Each node is associated with a random variable
- Each arrow is associated with conditional dependencies (parents $\to$ child)
- Shaded nodes illustrate observed variables
- Plates stand for repetitions of i.i.d. drawings of the random variables

Classification problem

- This problem is unsupervised: one searches for the labels that best fit the data, without any labelled examples
- Perceptrons and Support Vector Machines are widely used for classification. These are discriminative methods

Classification: assigning labels to data

Density estimator

- The simplest model for density estimation is the Naïve Bayes classifier
  - Assumes that each component of a data point is distributed independently given the class label
  - Results in a trivial learning algorithm
  - Usually does not suffer from overfitting
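
To illustrate why the learning algorithm is trivial, here is a minimal sketch of a Naïve Bayes classifier for binary features, in which training reduces to counting. The function names `train_naive_bayes` and `predict`, the Laplace `smoothing` parameter, and the toy data are assumptions made for this example and do not come from the slides.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, labels, smoothing=1.0):
    """Estimate class priors and per-feature Bernoulli parameters by counting."""
    n_features = len(samples[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for k, value in enumerate(x):
            feature_counts[y][k] += value
    model = {}
    for y, n_y in class_counts.items():
        prior = n_y / len(labels)
        # Laplace smoothing keeps the estimates away from 0 and 1.
        theta = [(feature_counts[y][k] + smoothing) / (n_y + 2 * smoothing)
                 for k in range(n_features)]
        model[y] = (prior, theta)
    return model


def predict(model, x):
    """Return the class with the largest log joint probability."""
    def log_joint(y):
        prior, theta = model[y]
        return math.log(prior) + sum(
            math.log(t if v else 1.0 - t) for v, t in zip(x, theta))
    return max(model, key=log_joint)


# Hypothetical toy data: three binary features, two classes.
samples = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0)]
labels = ["a", "a", "a", "b", "b", "b"]
model = train_naive_bayes(samples, labels)
```

Because each feature is modeled separately, the number of parameters grows linearly with the number of features, which is one reason the model is less prone to overfitting.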

Directed graph: a real-world example

Statistical modeling for data mining: given a huge corpus, authors and words are observed, while topics and relations are learned.

The author-topic model

Goal

- Automatically extract the topical content of documents and learn the association of topics to the authors of documents
- Expand existing probabilistic topic models to include author information
- Some queries that the model should be able to answer:
  - What topics does author X work on?
  - Which authors work on topic X?
  - What are interesting temporal patterns in topics?

Previous topic-based models

- Hofmann (1999): Probabilistic Latent Semantic Indexing (pLSI)
  - EM implementation
  - Problem of overfitting
- Latent Dirichlet Allocation (LDA)
  - Blei, Ng, and Jordan (2003)
    - Clarified the pLSI model
    - Variational EM; scalability?
  - Griffiths and Steyvers (PNAS, 2004)
    - Gibbs sampling technique for inference
    - Computationally simple, efficient (linear in the size of the data), and easily applied to 100K documents

Classification

Topics Model for Semantic Representation

- Based on slides by Professor Mark Steyvers; joint work of Mark Steyvers (UCI) and Tom Griffiths (Stanford)

The DRM Paradigm

- The Deese (1959), Roediger, and McDermott (1995) paradigm
- Subjects hear a series of word lists during the study phase, each comprising semantically related items that are strongly related to another, non-presented word (the false target).
- Subjects later receive recognition tests for all words plus other distractor words, including the false target.
- DRM experiments routinely demonstrate that subjects claim to recognize false targets.

Example test of false memory effects in the DRM Paradigm

- STUDY: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy
- FALSE RECALL: Sleep, 61%

A Rational Analysis of Semantic Memory

- Our associative/semantic memory system might arise from the need to efficiently predict word usage with just a few basis functions (i.e., concepts or topics)
- The topics model provides such a rational analysis

A Spatial Representation: Latent Semantic Analysis (Landauer & Dumais, 1997)

EACH WORD IS A SINGLE POINT IN A SEMANTIC SPACE

Triangle inequality constraint on words with multiple meanings

- Euclidean distance: $AC \leq AB + BC$
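
The constraint can be illustrated with a small sketch. The 2-D coordinates below are hypothetical and stand in for positions in a semantic space; the point is that if a word with several meanings (B) lies close to two unrelated words (A and C), the spatial representation forces A and C to be close to each other as well.

```python
import math

# Hypothetical 2-D positions in a semantic space (illustration only).
points = {
    "MAGNETIC": (1.0, 0.0),  # A
    "FIELD": (0.0, 0.0),     # B: a word with several meanings
    "FOOTBALL": (0.0, 1.0),  # C
}


def distance(a, b):
    """Euclidean distance between two words in the space."""
    return math.dist(points[a], points[b])


ac = distance("MAGNETIC", "FOOTBALL")
ab = distance("MAGNETIC", "FIELD")
bc = distance("FIELD", "FOOTBALL")

# The distance AC can never exceed AB + BC, however unrelated A and C are.
upper_bound = ab + bc
```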

A generative model for topics

- Each document (i.e., context) is a mixture of topics.
- Each topic is a distribution over words.
- Each word is chosen from a single topic.
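
The three statements above can be summarized in one expression. As a sketch, assuming $T$ topics and writing $z_i$ for the topic that generates the $i$-th word $w_i$ (notation introduced here for illustration), the probability of a word within a document is a mixture over topics:

```latex
P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)
```

Here $P(z_i = j)$ is the document's mixture weight for topic $j$, and $P(w_i \mid z_i = j)$ is topic $j$'s distribution over words.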

A toy example

Topic mixture generating the words $w_i$:

- $P(w \mid z = 1)$: HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
- $P(w \mid z = 2)$: SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1

All probability to topic 1

Document: HEART, LOVE, JOY, SOUL, HEART, ...

One topic generating the words $w_i$:

- $P(w \mid z = 1)$: HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
- $P(w \mid z = 2)$: SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1

All probability to topic 2

Document: SCIENTIFIC, KNOWLEDGE, SCIENTIFIC, RESEARCH, ...

Topic distributions for the words $w_i$:

- $P(w \mid z = 1)$: HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
- $P(w \mid z = 2)$: SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1
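
The toy example can be reproduced with a short sampling sketch. It uses the two word distributions from the slides; the function name `generate_document`, the document length, and the random seed are assumptions made for illustration.

```python
import random

# Word distributions of the two toy topics, as listed on the slides.
TOPICS = [
    {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1, "RESEARCH": 0.1,
     "MATHEMATICS": 0.1, "MYSTERY": 0.1},
]


def generate_document(topic_weights, n_words, rng):
    """For each word: pick a topic from the mixture, then a word from that topic."""
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(TOPICS)), weights=topic_weights)[0]
        vocabulary = list(TOPICS[z])
        probabilities = list(TOPICS[z].values())
        words.append(rng.choices(vocabulary, weights=probabilities)[0])
    return words


rng = random.Random(0)  # fixed seed, chosen arbitrarily
topic1_doc = generate_document([1.0, 0.0], 8, rng)  # all probability to topic 1
topic2_doc = generate_document([0.0, 1.0], 8, rng)  # all probability to topic 2
mixed_doc = generate_document([0.5, 0.5], 8, rng)   # an even mixture
```

With all the weight on one topic, every word comes from that topic's vocabulary; with a mixed weight vector the document can interleave words from both.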

Application to corpus data

- TASA corpus: text from first grade to college
- representative sample of text
- 26,000 word types (stop words removed)
- 37,000 documents
- 6,000,000 word tokens

Fitting the model

- Learning is unsupervised
- Learning means inverting the generative model
- We estimate $P(z \mid w)$: assign each word in the corpus to one of T topics
- With $T = 500$ topics and $6 \times 10^6$ words, the size of the discrete state space is $500^{6{,}000{,}000}$. HELP!
- Efficient sampling approach → Markov Chain Monte Carlo (MCMC)
- Time and memory requirements are linear in T and N
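
To get a feeling for the size of that state space, the following sketch computes its order of magnitude with logarithms rather than evaluating the power directly. The variable names are chosen for illustration.

```python
import math

T = 500          # number of topics
N = 6_000_000    # number of word tokens in the corpus

# Each token can take any of T topics, so there are T**N joint assignments.
# Work in log10 to avoid building an integer with millions of digits.
log10_states = N * math.log10(T)
digits = math.floor(log10_states) + 1
```

The count has roughly 16 million decimal digits, which is why exhaustive enumeration is out of the question and sampling is used instead.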

Gibbs Sampling (MCMC): see Griffiths & Steyvers, 2003, for details

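- Assign every word in the corpus to one of T topics
- Sampling distribution for z (the equation is not recoverable from the slide text); it is built from two counts:
  - the number of times word w is assigned to topic j
  - the number of times topic j is used in document d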
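
A minimal sketch of how such a sampler might look, assuming the collapsed Gibbs update with symmetric Dirichlet hyperparameters `alpha` and `beta`. The hyperparameters, the function name `gibbs_lda`, and the toy corpus are assumptions and do not appear in the slides. Under those assumptions the update is commonly written as $P(z_i = j \mid z_{-i}, w) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}$, where $W$ is the vocabulary size and both counts exclude the token currently being resampled.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha, beta, n_sweeps, rng):
    """Collapsed Gibbs sampling; docs are lists of integer word ids."""
    n_wt = [[0] * n_topics for _ in range(vocab_size)]  # word w assigned to topic j
    n_dt = [[0] * n_topics for _ in docs]               # topic j used in document d
    n_t = [0] * n_topics                                # all words assigned to topic j
    # Start from a random assignment of every token to a topic.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            n_wt[w][t] += 1
            n_dt[d][t] += 1
            n_t[t] += 1
    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                t = z[d][i]
                n_wt[w][t] -= 1
                n_dt[d][t] -= 1
                n_t[t] -= 1
                # Unnormalized sampling weights; the document-length
                # denominator is the same for every topic and is dropped.
                weights = [
                    (n_wt[w][j] + beta) / (n_t[j] + vocab_size * beta)
                    * (n_dt[d][j] + alpha)
                    for j in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                # Record the new assignment and restore the counts.
                z[d][i] = t
                n_wt[w][t] += 1
                n_dt[d][t] += 1
                n_t[t] += 1
    return z, n_wt, n_dt


# Hypothetical toy corpus over a vocabulary of four word ids.
docs = [[0, 1, 0, 1, 0, 1], [2, 3, 2, 3, 2, 3], [0, 1, 1, 0], [3, 2, 2, 3]]
z, n_wt, n_dt = gibbs_lda(docs, n_topics=2, vocab_size=4, alpha=0.5,
                          beta=0.1, n_sweeps=200, rng=random.Random(0))
```

Each sweep touches every token once and evaluates one weight per topic, which is consistent with time requirements that are linear in T and N.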

A selection from 500 topics, $P(w \mid z = j)$

- THEORY, SCIENTISTS, EXPERIMENT, OBSERVATIONS, SCIENTIFIC, EXPERIMENTS, HYPOTHESIS, EXPLAIN, SCIENTIST, OBSERVED, EXPLANATION, BASED, OBSERVATION, IDEA, EVIDENCE, THEORIES, BELIEVED, DISCOVERED
- SPACE, EARTH, MOON, PLANET, ROCKET, MARS, ORBIT, ASTRONAUTS, FIRST, SPACECRAFT, JUPITER, SATELLITE, SATELLITES, ATMOSPHERE, SPACESHIP, SURFACE, SCIENTISTS, ASTRONAUT
- BRAIN, NERVE, SENSE, SENSES, ARE, NERVOUS, NERVES, BODY, SMELL, TASTE, TOUCH, MESSAGES, IMPULSES, CORD, ORGANS, SPINAL, FIBERS, SENSORY
- ART, PAINT, ARTIST, PAINTING, PAINTED, ARTISTS, MUSEUM, WORK, PAINTINGS, STYLE, PICTURES, WORKS, OWN, SCULPTURE, PAINTER, ARTS, BEAUTIFUL, DESIGNS

Polysemy: words with multiple meanings represented in different topics

- FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM
- SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST
- BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS
- JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY

Predicting word association

- LSA finds the closest word
- The topics model performs inference: given that one word was observed, which word has the highest probability of coming next?
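
The inference step can be sketched on the two toy topics shown earlier in the deck. The uniform prior over topics and the function name `next_word_distribution` are assumptions made for illustration; the sketch first infers which topic the cue came from, then averages the topics' word distributions with those posterior weights. It assumes the cue occurs in at least one topic.

```python
# Word distributions of the two toy topics from the slides.
TOPICS = [
    {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1, "RESEARCH": 0.1,
     "MATHEMATICS": 0.1, "MYSTERY": 0.1},
]


def next_word_distribution(cue, topic_prior=(0.5, 0.5)):
    """P(w2 | w1) = sum over z of P(w2 | z) P(z | w1)."""
    # Posterior over topics given the cue: P(z | w1) is proportional to P(w1 | z) P(z).
    joint = [prior * topic.get(cue, 0.0) for prior, topic in zip(topic_prior, TOPICS)]
    evidence = sum(joint)
    posterior = [p / evidence for p in joint]
    vocabulary = set().union(*TOPICS)
    return {
        word: sum(post * topic.get(word, 0.0) for post, topic in zip(posterior, TOPICS))
        for word in vocabulary
    }


# MYSTERY occurs in both topics, so both contribute to the prediction.
predictions = next_word_distribution("MYSTERY")
best = max(predictions, key=predictions.get)
```

Unlike a nearest-neighbor lookup in a space, the prediction is a full probability distribution over the vocabulary, so an ambiguous cue spreads its predictions across several topics.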

Word Association (norms from Nelson et al. 1998)

CUE: PLANET

Associate no.   People   Model
1               EARTH    STARS
2               STARS    SUN
3               SPACE    EARTH
4               SUN      SPACE
5               MARS     SKY

The first associate, EARTH, is in the set of 5 associates from the model.

P( set contains first associate )

Explaining variability in false recall

- One factor: the mean associative strength of list items to the critical item (Deese, 1959; Roediger et al., 2001).

List items: BED, REST, AWAKE, TIRED, DROWSY; critical item: SLEEP

Mean = .431

For 55 DRM lists, R = .69 (with the given lexicon)

One recall component: inference

- Encoding: study words lead to a topic distribution (the gist)
- Retrieval: infer words from the stored topic distribution
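
The two stages can be sketched on the toy topics from earlier in the deck. This is a simplified illustration, not the model fitted in the slides: it assumes a uniform prior over topics and that the whole study list was generated by a single topic, and the names `encode_gist` and `retrieve` are hypothetical.

```python
import math

# Word distributions of the two toy topics from the slides.
TOPICS = [
    {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1, "RESEARCH": 0.1,
     "MATHEMATICS": 0.1, "MYSTERY": 0.1},
]


def encode_gist(study_words, topic_prior=(0.5, 0.5)):
    """Encoding: posterior over topics given the studied words."""
    joint = [
        prior * math.prod(topic.get(word, 0.0) for word in study_words)
        for prior, topic in zip(topic_prior, TOPICS)
    ]
    total = sum(joint)
    return [p / total for p in joint]


def retrieve(gist, word):
    """Retrieval: probability of a word under the stored topic distribution."""
    return sum(weight * topic.get(word, 0.0) for weight, topic in zip(gist, TOPICS))


gist = encode_gist(["HEART", "LOVE", "SOUL"])
unstudied_related = retrieve(gist, "TEARS")         # never studied, same topic
unstudied_unrelated = retrieve(gist, "SCIENTIFIC")  # never studied, other topic
```

A word that was never studied but belongs to the inferred topic still receives non-zero retrieval probability, which is the kind of intrusion the false-target effect describes.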

Predictions for the Sleep list

Correlation between intrusion rates and predictions

Other recall components? One possibility: two routes add strength
