Markov chain Monte Carlo with people - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Markov chain Monte Carlo with people
  • Tom Griffiths
  • Department of Psychology
  • Cognitive Science Program
  • UC Berkeley

with Mike Kalish, Stephan Lewandowsky, and Adam
Sanborn
2
Inductive problems
3
Computational cognitive science
  • Identify the underlying computational problem
  • Find the optimal solution to that problem
  • Compare human cognition to that solution
  • For inductive problems, solutions come from
    statistics

4
Statistics and inductive problems
Cognitive science ↔ Statistics
  • Categorization ↔ density estimation
  • Causal learning ↔ graphical models
  • Function learning ↔ regression
  • Language ↔ probabilistic grammars
5
Statistics and human cognition
  • How can we use statistics to understand
    cognition?
  • How can cognition inspire new statistical models?
  • applications of Dirichlet process and Pitman-Yor
    process models to natural language
  • exchangeable distributions on infinite binary
    matrices via the Indian buffet process (priors on
    causal structure)
  • nonparametric Bayesian models for relational data


8
Are people Bayesian?
Reverend Thomas Bayes
9
Bayes' theorem

P(h|d) = P(d|h) P(h) / Σ_h' P(d|h') P(h')

h = hypothesis, d = data
10
People are stupid
11
Predicting the future
  • How often is Google News updated?
  • t = time since last update
  • t_total = total time between updates
  • What should we guess for t_total given t?
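The optimal prediction can be sketched numerically as the posterior median of t_total given t. The priors below (a γ = 1 power-law, and a Gaussian with mean 70 and sd 15, roughly a life-span prior) are illustrative stand-ins, not the empirically derived priors used in the study.

```python
import numpy as np

# Posterior-median prediction of t_total given t, on a grid. Likelihood:
# t is uniform on [0, t_total], so P(t | t_total) = 1/t_total for t_total >= t.
def predict(t, prior, grid):
    like = np.where(grid >= t, 1.0 / grid, 0.0)
    post = like * prior(grid)
    post = post / post.sum()
    cdf = np.cumsum(post)
    return grid[np.searchsorted(cdf, 0.5)]  # posterior median

grid = np.linspace(0.1, 1000.0, 200_000)
power_law = lambda g: g**-1.0                           # scale-free prior
gaussian = lambda g: np.exp(-0.5 * ((g - 70) / 15)**2)  # e.g., life spans

print(predict(10, power_law, grid))  # ~20: Gott's rule, guess twice t
print(predict(50, gaussian, grid))   # pulled toward the prior's mean of 70
```

With the scale-free prior the prediction is simply 2t, while the informative Gaussian prior dominates the single observation.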

12
The effects of priors
13
Evaluating human predictions
  • Different domains with different priors:
  • a movie has made $60 million → power-law
  • your friend quotes from line 17 of a poem → power-law
  • you meet a 78-year-old man → Gaussian
  • a movie has been running for 55 minutes → Gaussian
  • a U.S. congressman has served for 11 years → Erlang
  • Prior distributions derived from actual data
  • Use 5 values of t for each
  • People predict t_total

14
(figure: people's predictions compared with the empirical prior, a parametric prior, and Gott's rule)
15
A different approach
  • Instead of asking whether people are rational,
    use assumption of rationality to investigate
    cognition
  • If we can predict people's responses, we can
    design experiments that measure psychological
    variables

17
Two deep questions
  • What are the biases that guide human learning?
  • prior probability distribution on hypotheses,
    P(h)
  • What do mental representations look like?
  • distribution over objects x in category c, P(x|c)

Develop ways to sample from these distributions
18
Outline
  • Markov chain Monte Carlo
  • Sampling from the prior
  • Sampling from category distributions

20
Markov chains
(diagram: a chain x(1) → x(2) → … → x(t))
Transition matrix T = P(x(t+1) | x(t))
  • Variable x(t+1) is independent of history given x(t)
  • Converges to a stationary distribution under easily
    checked conditions (i.e., if the chain is ergodic)
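This convergence is easy to check in simulation. A minimal sketch (the 3-state transition matrix is invented for illustration): long-run state frequencies match the stationary distribution computed from T.

```python
import numpy as np

# A 3-state Markov chain with an invented transition matrix:
# T[i, j] = P(x(t+1) = j | x(t) = i); each row sums to 1.
T = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# The stationary distribution pi satisfies pi T = pi: the left eigenvector
# of T with eigenvalue 1, normalized to sum to 1.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

# Run the chain; empirical state frequencies converge to pi.
rng = np.random.default_rng(0)
x, counts = 0, np.zeros(3)
for _ in range(100_000):
    x = rng.choice(3, p=T[x])
    counts[x] += 1
print(pi, counts / counts.sum())  # the two vectors agree closely
```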

21
Markov chain Monte Carlo
  • Sample from a target distribution P(x) by
    constructing Markov chain for which P(x) is the
    stationary distribution
  • Two main schemes
  • Gibbs sampling
  • Metropolis-Hastings algorithm

22
Gibbs sampling
  • For variables x = (x1, x2, …, xn) and target P(x)
  • Draw xi(t+1) from P(xi | x-i)
  • x-i = (x1(t+1), x2(t+1), …, xi-1(t+1), xi+1(t), …, xn(t))
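The scheme above can be sketched for a bivariate Gaussian target (an illustrative choice, since its conditionals are available in closed form):

```python
import numpy as np

# Gibbs sampler for a bivariate Gaussian (zero means, unit variances,
# correlation rho), whose conditionals are known:
# x_i | x_-i ~ Normal(rho * x_-i, 1 - rho^2).
rho = 0.8
sd = np.sqrt(1 - rho**2)
rng = np.random.default_rng(1)
x1, x2 = 0.0, 0.0
samples = []
for t in range(50_000):
    x1 = rng.normal(rho * x2, sd)   # draw x1 from P(x1 | x2)
    x2 = rng.normal(rho * x1, sd)   # draw x2 from P(x2 | x1)
    samples.append((x1, x2))
samples = np.array(samples[1000:])  # discard burn-in
print(np.corrcoef(samples.T)[0, 1])  # close to rho = 0.8
```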

23
Gibbs sampling
(MacKay, 2002)
24
Metropolis-Hastings algorithm (Metropolis et al.,
1953; Hastings, 1970)
  • Step 1: propose a state (we assume a symmetric proposal)
  • Q(x(t+1) | x(t)) = Q(x(t) | x(t+1))
  • Step 2: decide whether to accept, with probability A(x(t), x(t+1))

Metropolis acceptance function: A(x(t), x(t+1)) = min(1, p(x(t+1)) / p(x(t)))
Barker acceptance function: A(x(t), x(t+1)) = p(x(t+1)) / (p(x(t+1)) + p(x(t)))
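The two steps can be sketched for a 1-D target (an unnormalized standard normal here; the target and proposal scale are illustrative, not from the slides):

```python
import numpy as np

# Metropolis-Hastings with a symmetric Gaussian proposal, so
# Q(x*|x) = Q(x|x*) and Step 2 reduces to the Metropolis rule
# A = min(1, p(x*)/p(x)). The Barker rule A = p(x*)/(p(x*) + p(x))
# would also leave p(x) stationary.
def p(x):
    return np.exp(-0.5 * x**2)  # unnormalized N(0, 1) density

rng = np.random.default_rng(2)
x = 0.0
chain = []
for t in range(100_000):
    proposal = x + rng.normal(0.0, 1.0)              # Step 1: propose a state
    if rng.random() < min(1.0, p(proposal) / p(x)):  # Step 2: accept or reject
        x = proposal
    chain.append(x)
chain = np.array(chain[5000:])  # discard burn-in
print(chain.mean(), chain.std())  # approximately 0 and 1
```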
25-30
Metropolis-Hastings algorithm
(animation: a random-walk chain over p(x); a proposal to a lower-probability state is accepted with probability A(x(t), x(t+1)) = 0.5, a proposal to a higher-probability state with A(x(t), x(t+1)) = 1)
31
Outline
  • Markov chain Monte Carlo
  • Sampling from the prior
  • Sampling from category distributions

32
Iterated learning (Kirby, 2001)
What are the consequences of learners learning
from other learners?
33
Analyzing iterated learning
(diagram: data passing along a chain of learners, each applying P_L(h|d) then P_P(d|h))
P_L(h|d): probability of inferring hypothesis h from data d
P_P(d|h): probability of generating data d from hypothesis h
34
Iterated Bayesian learning
(diagram: a chain of Bayesian learners alternating P_L(h|d) and P_P(d|h))
35
Analyzing iterated learning
36
Stationary distributions
  • Markov chain on h converges to the prior, P(h)
  • Markov chain on d converges to the prior
    predictive distribution

(Griffiths & Kalish, 2005)
37
Explaining convergence to the prior
(diagram: a chain of learners alternating P_L(h|d) and P_P(d|h))
  • Intuitively: the data act once, the prior acts many times
  • Formally: iterated learning with Bayesian agents
    is a Gibbs sampler on P(d,h)

(Griffiths & Kalish, in press)
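The Gibbs-sampler view can be checked in simulation. A minimal sketch with two hypotheses about a coin's bias (all numbers assumed for illustration): each learner samples h from the posterior, generates data for the next learner, and the proportion of generations spent on each hypothesis converges to the prior.

```python
import numpy as np

# Toy iterated Bayesian learning: two hypotheses about a coin's bias,
# with a prior P(h) playing the role of the inductive bias. Each
# generation, a learner samples h from the posterior P_L(h|d), then
# generates data for the next learner from P_P(d|h).
prior = np.array([0.8, 0.2])   # P(h): the learners' shared inductive bias
theta = np.array([0.3, 0.7])   # P(heads | h) under each hypothesis
n = 10                         # flips shown to each learner
rng = np.random.default_rng(3)

h, h_counts = 0, np.zeros(2)
for generation in range(100_000):
    d = rng.binomial(n, theta[h])                 # P_P(d|h): generate data
    like = theta**d * (1 - theta)**(n - d)        # likelihood of d under each h
    post = like * prior
    post /= post.sum()
    h = rng.choice(2, p=post)                     # P_L(h|d): sample hypothesis
    h_counts[h] += 1
print(h_counts / h_counts.sum())  # approaches the prior (0.8, 0.2)
```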
38
Revealing inductive biases
  • Many problems in cognitive science can be
    formulated as problems of induction
  • learning languages, concepts, and causal
    relations
  • Such problems are not solvable without bias
  • (e.g., Goodman, 1955; Kearns & Vazirani, 1994;
    Vapnik, 1995)
  • What biases guide human inductive inferences?
  • If iterated learning converges to the prior,
    then it may provide a method for investigating
    biases

39
Serial reproduction (Bartlett, 1932)
  • Participants see stimuli, then reproduce them
    from memory
  • Reproductions of one participant are stimuli for
    the next
  • Stimuli were interesting, rather than controlled
  • e.g., "The War of the Ghosts"

40
General strategy
  • Use well-studied and simple stimuli for which
    people's inductive biases are known
  • function learning
  • concept learning
  • color words
  • Examine dynamics of iterated learning
  • convergence to state reflecting biases
  • predictable path to convergence

41
Iterated function learning
  • Each learner sees a set of (x,y) pairs
  • Makes predictions of y for new x values
  • Predictions are data for the next learner

(Kalish, Griffiths, & Lewandowsky, in press)
42
Function learning experiments
Examine iterated learning with different initial
data
43
Initial data
(figure: function reproductions across iterations 1-9, for each initial data set)
44
Identifying inductive biases
  • Formal analysis suggests that iterated learning
    provides a way to determine inductive biases
  • Experiments with human learners support this idea
  • when stimuli for which biases are well understood
    are used, those biases are revealed by iterated
    learning
  • What do inductive biases look like in other
    cases?
  • continuous categories
  • causal structure
  • word learning
  • language learning

45
Statistics and cultural evolution
  • Iterated learning for MAP learners reduces to a
    form of the stochastic EM algorithm
  • Monte Carlo EM with a single sample
  • Provides connections between cultural evolution
    and classic models used in population genetics
  • MAP learning of multinomials corresponds to the Wright-Fisher model
  • More generally, an account of how products of
    cultural evolution relate to the biases of
    learners

46
Outline
  • Markov chain Monte Carlo
  • Sampling from the prior
  • Sampling from category distributions

47
Categories are central to cognition
48
Sampling from categories
Frog distribution P(x|c)
49
A task
  • Ask subjects which of two alternatives comes
    from a target category

Which animal is a frog?
50
A Bayesian analysis of the task
Assume equal prior probability that each alternative is the one from category c; then the posterior probability that x1 is the category member is p(x1|c) / (p(x1|c) + p(x2|c))
51
Response probabilities
  • If people probability-match to the posterior, the
    response probability is equivalent to the Barker
    acceptance function for the target distribution p(x|c)
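The method can be simulated end to end. Below, a hypothetical subject who probability-matches to the posterior chooses between the current stimulus and a proposed alternative; because the choice probability equals the Barker acceptance function, the chosen stimuli form a Markov chain whose stationary distribution is the category distribution (here a Gaussian with assumed mean 5 and sd 1).

```python
import numpy as np

# "MCMC with people" sketch: the chooser probability-matches to the
# posterior, so P(choose proposal) is the Barker acceptance function, and
# the sequence of chosen stimuli is a Markov chain with stationary
# distribution p(x|c). The category parameters and proposal scale are assumed.
def p(x, mu=5.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma)**2)  # unnormalized p(x|c)

rng = np.random.default_rng(4)
x = 0.0                      # current stimulus (e.g., fish size)
chain = []
for trial in range(50_000):
    proposal = x + rng.normal(0.0, 1.0)          # the paired alternative
    # Simulated subject answers "which one is in the category?", choosing
    # the proposal with probability p(proposal) / (p(proposal) + p(x)).
    if rng.random() < p(proposal) / (p(proposal) + p(x)):
        x = proposal
    chain.append(x)
chain = np.array(chain[2000:])  # discard burn-in
print(chain.mean(), chain.std())  # recovers the category's mean and sd
```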

52
Collecting the samples
Which is the frog?
(figure: successive choice trials, Trials 1-3)
53
Verifying the method
54
Training
  • Subjects were shown schematic fish of
    different sizes and trained on whether they came
    from the ocean (uniform) or a fish farm (Gaussian)

55
Between-subject conditions
56
Choice task
  • Subjects judged which of the two fish came
    from the fish farm (Gaussian) distribution

57
Examples of subject MCMC chains
58
Estimates from all subjects
  • Estimated means and standard deviations are
    significantly different across groups
  • Estimated means are accurate, but standard
    deviation estimates are high
  • result could be due to perceptual noise or
    response gain

59
Sampling from natural categories
  • Examined distributions for four natural
    categories: giraffes, horses, cats, and dogs

Stimuli were nine-parameter stick figures (Olman & Kersten, 2004)
60
Choice task
61
Samples from Subject 3 (projected onto plane from
LDA)
62
Mean animals by subject
(figure: mean giraffe, horse, cat, and dog for each subject S1-S8)
63
Marginal densities (aggregated across subjects)
  • Giraffes are distinguished by neck length,
    body height and body tilt
  • Horses are like giraffes, but with shorter
    bodies and nearly uniform necks
  • Cats have longer tails than dogs

64
Relative volume of categories
(figure: convex hull vs. minimum enclosing hypercube)
Convex hull content divided by enclosing hypercube content:
Giraffe: 0.00004 | Horse: 0.00006 | Cat: 0.00003 | Dog: 0.00002

65
Discrimination method (Olman & Kersten, 2004)
66
Parameter space for discrimination
  • Restricted so that most random draws were
    animal-like

67
MCMC and discrimination means
68
Conclusion
  • Markov chain Monte Carlo provides a way to sample
    from subjective probability distributions
  • Many interesting questions can be framed in terms
    of subjective probability distributions
  • inductive biases (priors)
  • mental representations (category distributions)
  • Other MCMC methods may provide further empirical
    methods
  • Gibbs for categories, adaptive MCMC, …

69
A different approach
  • Instead of asking whether people are rational,
    use assumption of rationality to investigate
    cognition
  • If we can predict people's responses, we can
    design experiments that measure psychological
    variables

Randomized algorithms → Psychological experiments
70
(No Transcript)
71
From sampling to maximizing
72
From sampling to maximizing
  • General analytic results are hard to obtain
  • (r → ∞ is Monte Carlo EM with a single sample)
  • For certain classes of languages, it is possible
    to show that the stationary distribution gives
    each hypothesis h probability proportional to P(h)^r
  • the ordering identified by the prior is
    preserved, but not the corresponding probabilities

(Kirby, Dowman, & Griffiths, in press)
73
Implications for linguistic universals
  • When learners sample from P(h|d), the
    distribution over languages converges to the
    prior
  • identifies a one-to-one correspondence between
    inductive biases and linguistic universals
  • As learners move towards maximizing, the
    influence of the prior is exaggerated
  • weak biases can produce strong universals
  • cultural evolution is a viable alternative to
    traditional explanations for linguistic
    universals

74
(No Transcript)
75
Iterated concept learning
  • Each learner sees examples from a species
  • Identifies species of four amoebae
  • Iterated learning is run within-subjects

hypotheses
data
(Griffiths, Christian, & Kalish, in press)
76
Two positive examples
data (d)
hypotheses (h)
77
Bayesian model (Tenenbaum, 1999; Tenenbaum &
Griffiths, 2001)
d: 2 amoebae; h: set of 4 amoebae
78
Classes of concepts (Shepard, Hovland, & Jenkins, 1961)
(figure: stimuli vary on color, size, and shape; the six classes of concepts, Class 1-6)
79
Experiment design (for each subject)
6 iterated learning chains
6 independent learning chains
80
Estimating the prior
data (d)
hypotheses (h)
81
Estimating the prior
Estimated prior probability of each concept class:
  Class 1: 0.861
  Class 2: 0.087
  Class 3: 0.009
  Class 4: 0.002
  Class 5: 0.013
  Class 6: 0.028
Correlation between Bayesian model and human subjects: r = 0.952
82
Two positive examples (n = 20)
(figure: probability of each class over iterations, human learners vs. Bayesian model)
83
Two positive examples (n = 20)
(figure: class probabilities, human learners vs. Bayesian model)
84
Three positive examples
data (d)
hypotheses (h)
85
Three positive examples (n = 20)
(figure: probability of each class over iterations, human learners vs. Bayesian model)
86
Three positive examples (n = 20)
(figure: class probabilities, human learners vs. Bayesian model)
87
(No Transcript)
88
Classification objects
89
Parameter space for discrimination
  • Restricted so that most random draws were
    animal-like

90
MCMC and discrimination means
91
Problems with classification objects
92
Problems with classification objects
(figure: minimum enclosing hypercube vs. convex hull)
Convex hull content divided by enclosing hypercube content:
Giraffe: 0.00004 | Horse: 0.00006 | Cat: 0.00003 | Dog: 0.00002

93
(No Transcript)
94
Allowing a Wider Range of Behavior
  • An exponentiated choice rule results in a
    Markov chain with stationary distribution
    corresponding to an exponentiated version of the
    category distribution, proportional to p(x|c)^γ
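A quick simulation of this claim (Gaussian category and exponent values assumed for illustration): with acceptance p(x*)^g / (p(x*)^g + p(x)^g), the stationary distribution is p(x|c)^g, so for a Gaussian category the sampled spread scales as 1/√g.

```python
import numpy as np

# Chain driven by an exponentiated (Luce) choice rule. For a Gaussian
# category p(x|c), p^g is again Gaussian with sd sigma / sqrt(g), so the
# sampled spread shrinks as the exponent g grows. All numbers are assumed.
def p(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma)**2)

rng = np.random.default_rng(5)
for g in (0.5, 1.0, 2.0):
    x, chain = 0.0, []
    for trial in range(50_000):
        proposal = x + rng.normal(0.0, 1.0)
        accept = p(proposal)**g / (p(proposal)**g + p(x)**g)
        if rng.random() < accept:
            x = proposal
        chain.append(x)
    print(g, np.std(chain[2000:]))  # sd approximately 1 / sqrt(g)
```

g > 1 sharpens the sampled category; g < 1 flattens it, matching the exponentiated stationary distribution.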

95
Category drift
  • For fragile categories, the MCMC procedure could
    influence the category representation
  • Interleaved training and test blocks in the
    training experiments

96
(No Transcript)