Loading...

PPT – Markov chain Monte Carlo with people PowerPoint presentation | free to download - id: 532657-MGFiZ

The Adobe Flash plugin is needed to view this content

Markov chain Monte Carlo with people

- Tom Griffiths
- Department of Psychology
- Cognitive Science Program
- UC Berkeley

with Mike Kalish, Stephan Lewandowsky, and Adam

Sanborn

Inductive problems

Computational cognitive science

- Identify the underlying computational problem
- Find the optimal solution to that problem
- Compare human cognition to that solution
- For inductive problems, solutions come from

statistics

Statistics and inductive problems

Cognitive science Categorization Causal

learning Function learning Language

Statistics Density estimation Graphical

models Regression Probabilistic grammars

Statistics and human cognition

- How can we use statistics to understand

cognition? - How can cognition inspire new statistical models?
- applications of Dirichlet process and Pitman-Yor

process models to natural language - exchangeable distributions on infinite binary

matrices via the Indian buffet process (priors on

causal structure) - nonparametric Bayesian models for relational data

Statistics and human cognition

- How can we use statistics to understand

cognition? - How can cognition inspire new statistical models?
- applications of Dirichlet process and Pitman-Yor

process models to natural language - exchangeable distributions on infinite binary

matrices via the Indian buffet process (priors on

causal structure) - nonparametric Bayesian models for relational data

Statistics and human cognition

- How can we use statistics to understand

cognition? - How can cognition inspire new statistical models?
- applications of Dirichlet process and Pitman-Yor

process models to natural language - exchangeable distributions on infinite binary

matrices via the Indian buffet process - nonparametric Bayesian models for relational data

Are people Bayesian?

Reverend Thomas Bayes

Bayes theorem

h hypothesis d data

People are stupid

Predicting the future

- How often is Google News updated?
- t time since last update
- ttotal time between updates
- What should we guess for ttotal given t?

The effects of priors

Evaluating human predictions

- Different domains with different priors
- a movie has made 60 million power-law
- your friend quotes from line 17 of a poem

power-law - you meet a 78 year old man Gaussian
- a movie has been running for 55 minutes

Gaussian - a U.S. congressman has served for 11 years

Erlang - Prior distributions derived from actual data
- Use 5 values of t for each
- People predict ttotal

people

empirical prior

parametric prior

Gotts rule

A different approach

- Instead of asking whether people are rational,

use assumption of rationality to investigate

cognition - If we can predict peoples responses, we can

design experiments that measure psychological

variables

Two deep questions

- What are the biases that guide human learning?
- prior probability distribution P(h)
- What do mental representations look like?
- category distribution P(xc)

Two deep questions

- What are the biases that guide human learning?
- prior probability distribution on hypotheses,

P(h) - What do mental representations look like?
- distribution over objects x in category c, P(xc)

Develop ways to sample from these distributions

Outline

- Markov chain Monte Carlo
- Sampling from the prior
- Sampling from category distributions

Outline

- Markov chain Monte Carlo
- Sampling from the prior
- Sampling from category distributions

Markov chains

x

x

x

x

x

x

x

x

Transition matrix T P(x(t1)x(t))

- Variables x(t1) independent of history given

x(t) - Converges to a stationary distribution under

easily checked conditions (i.e., if it is ergodic)

Markov chain Monte Carlo

- Sample from a target distribution P(x) by

constructing Markov chain for which P(x) is the

stationary distribution - Two main schemes
- Gibbs sampling
- Metropolis-Hastings algorithm

Gibbs sampling

- For variables x x1, x2, , xn and target P(x)
- Draw xi(t1) from P(xix-i)
- x-i x1(t1), x2(t1),, xi-1(t1), xi1(t), ,

xn(t)

Gibbs sampling

(MacKay, 2002)

Metropolis-Hastings algorithm (Metropolis et al.,

1953 Hastings, 1970)

- Step 1 propose a state (we assume

symmetrically) - Q(x(t1)x(t)) Q(x(t))x(t1))
- Step 2 decide whether to accept, with

probability

Metropolis acceptance function

Barker acceptance function

Metropolis-Hastings algorithm

p(x)

Metropolis-Hastings algorithm

p(x)

Metropolis-Hastings algorithm

p(x)

Metropolis-Hastings algorithm

p(x)

A(x(t), x(t1)) 0.5

Metropolis-Hastings algorithm

p(x)

Metropolis-Hastings algorithm

p(x)

A(x(t), x(t1)) 1

Outline

- Markov chain Monte Carlo
- Sampling from the prior
- Sampling from category distributions

Iterated learning (Kirby, 2001)

What are the consequences of learners learning

from other learners?

Analyzing iterated learning

PL(hd)

PL(hd)

PP(dh)

PP(dh)

PL(hd) probability of inferring hypothesis h

from data d PP(dh) probability of generating

data d from hypothesis h

Iterated Bayesian learning

PL(hd)

PL(hd)

PP(dh)

PP(dh)

Analyzing iterated learning

Stationary distributions

- Markov chain on h converges to the prior, P(h)
- Markov chain on d converges to the prior

predictive distribution

(Griffiths Kalish, 2005)

Explaining convergence to the prior

PL(hd)

PL(hd)

PP(dh)

PP(dh)

- Intuitively data acts once, prior many times
- Formally iterated learning with Bayesian agents

is a Gibbs sampler on P(d,h)

(Griffiths Kalish, in press)

Revealing inductive biases

- Many problems in cognitive science can be

formulated as problems of induction - learning languages, concepts, and causal

relations - Such problems are not solvable without bias
- (e.g., Goodman, 1955 Kearns Vazirani, 1994

Vapnik, 1995) - What biases guide human inductive inferences?
- If iterated learning converges to the prior,

then it may provide a method for investigating

biases

Serial reproduction (Bartlett, 1932)

- Participants see stimuli, then reproduce them

from memory - Reproductions of one participant are stimuli for

the next - Stimuli were interesting, rather than controlled
- e.g., War of the Ghosts

General strategy

- Use well-studied and simple stimuli for which

peoples inductive biases are known - function learning
- concept learning
- color words
- Examine dynamics of iterated learning
- convergence to state reflecting biases
- predictable path to convergence

Iterated function learning

- Each learner sees a set of (x,y) pairs
- Makes predictions of y for new x values
- Predictions are data for the next learner

(Kalish, Griffiths, Lewandowsky, in press)

Function learning experiments

Examine iterated learning with different initial

data

Initial data

Iteration

1 2 3 4

5 6 7 8 9

Identifying inductive biases

- Formal analysis suggests that iterated learning

provides a way to determine inductive biases - Experiments with human learners support this idea
- when stimuli for which biases are well understood

are used, those biases are revealed by iterated

learning - What do inductive biases look like in other

cases? - continuous categories
- causal structure
- word learning
- language learning

Statistics and cultural evolution

- Iterated learning for MAP learners reduces to a

form of the stochastic EM algorithm - Monte Carlo EM with a single sample
- Provides connections between cultural evolution

and classic models used in population genetics - MAP learning of multinomials Wright-Fisher
- More generally, an account of how products of

cultural evolution relate to the biases of

learners

Outline

- Markov chain Monte Carlo
- Sampling from the prior
- Sampling from category distributions

Categories are central to cognition

Sampling from categories

Frog distribution P(xc)

A task

- Ask subjects which of two alternatives comes

from a target category

Which animal is a frog?

A Bayesian analysis of the task

Assume

Response probabilities

- If people probability match to the posterior,

response probability is equivalent to the Barker

acceptance function for target distribution p(xc)

Collecting the samples

Which is the frog?

Trial 1

Trial 2

Trial 3

Verifying the method

Training

- Subjects were shown schematic fish of

different sizes and trained on whether they came

from the ocean (uniform) or a fish farm (Gaussian)

Between-subject conditions

Choice task

- Subjects judged which of the two fish came

from the fish farm (Gaussian) distribution

Examples of subject MCMC chains

Estimates from all subjects

- Estimated means and standard deviations are

significantly different across groups - Estimated means are accurate, but standard

deviation estimates are high - result could be due to perceptual noise or

response gain

Sampling from natural categories

- Examined distributions for four natural

categories giraffes, horses, cats, and dogs

Presented stimuli with nine-parameter stick

figures (Olman Kersten, 2004)

Choice task

Samples from Subject 3 (projected onto plane from

LDA)

Mean animals by subject

S1

S2

S3

S4

S5

S6

S7

S8

giraffe

horse

cat

dog

Marginal densities (aggregated across subjects)

- Giraffes are distinguished by neck length,

body height and body tilt - Horses are like giraffes, but with shorter

bodies and nearly uniform necks - Cats have longer tails than dogs

Relative volume of categories

Convex Hull

Minimum Enclosing Hypercube

Convex hull content divided by enclosing

hypercube content

Giraffe Horse Cat Dog

0.00004 0.00006 0.00003 0.00002

Discrimination method (Olman Kersten, 2004)

Parameter space for discrimination

- Restricted so that most random draws were

animal-like

MCMC and discrimination means

Conclusion

- Markov chain Monte Carlo provides a way to sample

from subjective probability distributions - Many interesting questions can be framed in terms

of subjective probability distributions - inductive biases (priors)
- mental representations (category distributions)
- Other MCMC methods may provide further empirical

methods - Gibbs for categories, adaptive MCMC,

A different approach

- Instead of asking whether people are rational,

use assumption of rationality to investigate

cognition - If we can predict peoples responses, we can

design experiments that measure psychological

variables

Randomized algorithms ? Psychological experiments

(No Transcript)

From sampling to maximizing

From sampling to maximizing

- General analytic results are hard to obtain
- (r ? is Monte Carlo EM with a single sample)
- For certain classes of languages, it is possible

to show that the stationary distribution gives

each hypothesis h probability proportional to

P(h)r - the ordering identified by the prior is

preserved, but not the corresponding probabilities

(Kirby, Dowman, Griffiths, in press)

Implications for linguistic universals

- When learners sample from P(hd), the

distribution over languages converges to the

prior - identifies a one-to-one correspondence between

inductive biases and linguistic universals - As learners move towards maximizing, the

influence of the prior is exaggerated - weak biases can produce strong universals
- cultural evolution is a viable alternative to

traditional explanations for linguistic

universals

(No Transcript)

Iterated concept learning

- Each learner sees examples from a species
- Identifies species of four amoebae
- Iterated learning is run within-subjects

hypotheses

data

(Griffiths, Christian, Kalish, in press)

Two positive examples

data (d)

hypotheses (h)

Bayesian model (Tenenbaum, 1999 Tenenbaum

Griffiths, 2001)

d 2 amoebae h set of 4 amoebae

Classes of concepts (Shepard, Hovland, Jenkins,

1958)

color

size

shape

Class 1

Class 2

Class 3

Class 4

Class 5

Class 6

Experiment design (for each subject)

6 iterated learning chains

6 independent learning chains

Estimating the prior

data (d)

hypotheses (h)

Estimating the prior

Prior

Bayesian model

Human subjects

0.861

Class 1

Class 2

0.087

0.009

Class 3

0.002

Class 4

0.013

Class 5

Class 6

0.028

r 0.952

Two positive examples (n 20)

Human learners

Bayesian model

Probability

Probability

Iteration

Iteration

Two positive examples (n 20)

Human learners

Probability

Bayesian model

Three positive examples

data (d)

hypotheses (h)

Three positive examples (n 20)

Human learners

Bayesian model

Probability

Probability

Iteration

Iteration

Three positive examples (n 20)

Human learners

Bayesian model

(No Transcript)

Classification objects

Parameter space for discrimination

- Restricted so that most random draws were

animal-like

MCMC and discrimination means

Problems with classification objects

Problems with classification objects

Minimum Enclosing Hypercube

Convex Hull

Convex hull content divided by enclosing

hypercube content

Giraffe Horse Cat Dog

0.00004 0.00006 0.00003 0.00002

(No Transcript)

Allowing a Wider Range of Behavior

- An exponentiated choice rule results in a

Markov chain with stationary distribution

corresponding to an exponentiated version of the

category distribution, proportional to p(xc)?

Category drift

- For fragile categories, the MCMC procedure could

influence the category representation - Interleaved training and test blocks in the

training experiments

(No Transcript)