Bayesian models of inductive generalization in language acquisition - PowerPoint PPT Presentation

About This Presentation
Title:

Bayesian models of inductive generalization in language acquisition

Description:

Bayesian models of inductive generalization in language acquisition. Josh Tenenbaum, MIT. Joint work with Fei Xu, Amy Perfors, Terry Regier, and Charles Kemp – PowerPoint PPT presentation

Slides: 61
Provided by: Joshu167
Learn more at: http://web.mit.edu
Transcript and Presenter's Notes

Title: Bayesian models of inductive generalization in language acquisition


1
  • Bayesian models of inductive generalization in
    language acquisition
  • Josh Tenenbaum
  • MIT
  • Joint work with Fei Xu, Amy Perfors, Terry
    Regier, Charles Kemp

2
The problem of generalization
  • How can people learn so much from such limited
    evidence?
  • Kinds of objects and their properties
  • Meanings and forms of words, phrases, and
    sentences
  • Causal relations
  • Intuitive theories of physics, psychology, ...
  • Social structures, conventions, and rules
  • The goal: a general-purpose computational
    framework for understanding how people make
    these inductive leaps, and how they can be
    successful.

3
The problem of generalization
  • How can people learn so much from such limited
    evidence?
  • Learning word meanings from examples

(Figure: three example images, each labeled "horse")
4
The problem of generalization
  • How can people learn so much from such limited
    evidence?
  • Poverty of the stimulus in syntactic
    acquisition

Aux-fronting in interrogatives (Chomsky; Crain &
Nakayama)
Data:
Simple declaratives: The girl is happy. They are eating.
Simple interrogatives: Is the girl happy? Are they eating?
Hypotheses:
H1 (Linear): move the first auxiliary in the
sentence to the beginning.
H2 (Hierarchical): move the auxiliary in the main
clause to the beginning.
Generalization:
Complex declarative: The girl who is sleeping is happy.
Complex interrogative: Is the girl who is sleeping happy? (via H2)
Is the girl who sleeping is happy? (via H1)
5
The problem of generalization
  • How can people learn so much from such limited
    evidence?
  • The answer: human learners have abstract
    knowledge that provides inductive constraints,
    i.e., restrictions or biases on the hypotheses
    to be considered.
  • Word learning: whole-object principle, taxonomic
    principle, basic-level bias, shape bias, mutual
    exclusivity, ...
  • Syntax: syntactic rules are defined over
    hierarchical phrase structures rather than the
    linear order of words.

Poverty of the stimulus as a scientific tool
6
The big questions
  • 1. How does abstract knowledge guide
    generalization from sparsely observed data?
  • 2. What form does abstract knowledge take,
    across different domains and tasks?
  • 3. What are the origins of abstract knowledge?

7
The approach
  • 1. How does abstract knowledge guide
    generalization from sparsely observed data?
  • Priors for Bayesian inference
  • 2. What form does abstract knowledge take,
    across different domains and tasks?
  • Probabilities defined over structured
    representations: graphs, grammars, predicate
    logic, schemas.
  • 3. What are the origins of abstract knowledge?
  • Hierarchical probabilistic models, with
    inference at multiple levels of abstraction and
    multiple timescales.

8
Three case studies of generalization
  • Learning words for object categories
  • Learning abstract word-learning principles
    (learning to learn words)
  • Taxonomic principle
  • Shape bias
  • Learning in syntax: unobserved syntactic forms,
    abstract syntactic knowledge

9
Word learning as Bayesian inference (Xu &
Tenenbaum, Psych. Review 2007)
  • A Bayesian model can explain several core aspects
    of generalization in word learning
  • learning from very few examples
  • learning from only positive examples
  • simultaneous learning of overlapping extensions
  • graded degrees of confidence
  • dependence on pragmatic and social context
  • arguably, better than previous computational
    accounts based on hypothesis elimination (e.g.,
    Siskind) or associative learning (e.g., Regier).

10
Basics of Bayesian inference
  • Bayes rule
  • An example
  • Data John is coughing
  • Some hypotheses
  • John has a cold
  • John has lung cancer
  • John has a stomach flu
  • Likelihood P(d | h) favors 1 and 2 over 3
  • Prior probability P(h) favors 1 and 3 over 2
  • Posterior probability P(h | d) favors 1 over 2
    and 3
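A minimal Python sketch of this computation; the probability values are invented for illustration and are not from the presentation.

# Bayes' rule: P(h | d) = P(d | h) P(h) / sum_h' P(d | h') P(h'),
# applied to the coughing example. All numbers are illustrative only.

hypotheses = ["cold", "lung cancer", "stomach flu"]
prior      = {"cold": 0.50, "lung cancer": 0.01, "stomach flu": 0.49}   # P(h)
likelihood = {"cold": 0.80, "lung cancer": 0.70, "stomach flu": 0.05}   # P(d = coughing | h)

unnormalized = {h: likelihood[h] * prior[h] for h in hypotheses}
evidence = sum(unnormalized.values())                                   # P(d)
posterior = {h: unnormalized[h] / evidence for h in hypotheses}         # P(h | d)

for h in hypotheses:
    print(f"P({h} | coughing) = {posterior[h]:.3f}")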

11
Bayesian generalization
(Figure: observed examples X labeled "horse" and unlabeled query items marked "?")
12
Bayesian generalization
Hypothesis space H of possible word meanings
(extensions), e.g., rectangular regions
13
Bayesian generalization
Hypothesis space H of possible word meanings
(extensions), e.g., rectangular regions
Assume examples are sampled randomly from the
word's extension.
14-18
Bayesian generalization
(Figure sequence: candidate rectangular hypotheses consistent with the examples X; c.f. Subset principle)
19
Generalization gradients
(Figure: generalization gradients: Bayes with hypothesis averaging vs. maximum likelihood / the subset principle)
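The graded gradient comes from averaging each hypothesis's prediction, weighted by its posterior. In standard notation (not spelled out on the slide), with X the observed examples, y a new object, and C the word's extension:

p(y \in C \mid X) = \sum_{h \in H} p(y \in C \mid h)\, p(h \mid X),
\qquad
p(h \mid X) = \frac{p(X \mid h)\, p(h)}{\sum_{h' \in H} p(X \mid h')\, p(h')}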
20
Word learning as Bayesian inference (Xu &
Tenenbaum, Psych. Review 2007)
21
Word learning as Bayesian inference (Xu &
Tenenbaum, Psych. Review 2007)
  • Prior p(h): choice of hypothesis space embodies
    traditional constraints: whole-object principle,
    shape bias, taxonomic principle.
  • A more fine-grained prior favors more
    distinctive clusters.
  • Likelihood p(X | h): random sampling assumption.
  • Size principle: smaller hypotheses receive
    greater likelihood, and exponentially more so as
    n increases.
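A minimal sketch of the size principle at work, assuming toy 1-D intervals in place of the rectangular regions pictured on the slides; the hypothesis names and numbers are invented.

# Size principle: under random sampling from a word's extension,
# p(X | h) = 1 / |h|^n if h contains all n examples, and 0 otherwise,
# so smaller consistent hypotheses get exponentially greater likelihood
# as n grows. Hypotheses here are toy 1-D intervals; values are invented.

hypotheses = {"subordinate": (4.0, 6.0), "basic": (0.0, 10.0), "superordinate": (0.0, 100.0)}
prior = {name: 1.0 / len(hypotheses) for name in hypotheses}

def likelihood(examples, interval):
    lo, hi = interval
    if all(lo <= x <= hi for x in examples):
        return (1.0 / (hi - lo)) ** len(examples)   # smaller |h| => higher likelihood
    return 0.0

examples = [4.5, 5.0, 5.5]                          # n = 3 examples of the word

scores = {name: likelihood(examples, h) * prior[name] for name, h in hypotheses.items()}
total = sum(scores.values())
for name, s in scores.items():
    print(f"p({name} | X) = {s / total:.4f}")       # posterior concentrates on the smallest consistent hypothesis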

22
(No Transcript)
23
(No Transcript)
24
Generalization experiments
Children's generalizations
Bayesian model
25
Further questions
  • Bayesian learning for other kinds of words?
  • Verbs (Niyogi; Alishahi & Stevenson; Perfors,
    Wonnacott, & Tenenbaum)
  • Adjectives (Dowman; Schmidt, Goodman, Barner, &
    Tenenbaum)
  • How fundamental and general is learning by
    "suspicious coincidence" (the size principle)?
  • Other domains of inductive generalization in
    adults and children (Tenenbaum et al.; Xu et al.)
  • Generalization in < 1-year-old infants (Gerken;
    Xu et al.)
  • Bayesian word learning in more natural
    communicative contexts?
  • Cross-situational mapping with real-world scenes
    and utterances (Frank, Goodman, & Tenenbaum;
    c.f. Yu)

26
Further questions
  • Where do the hypothesis space and priors come
    from?
  • How does word learning interact with conceptual
    development?

27-28
A hierarchical Bayesian view
Principles T: Whole-object principle, Shape bias, Taxonomic principle
Structure S: a taxonomy of categories (thing; animal, tree; dog, cat, daisy; Basset hound, ...)
Data D: labeled examples ("fep", "ziv", "gip") and unlabeled objects
29
Different forms of structure
(Figure: candidate structural forms: dominance order, line, ring, flat, hierarchy, taxonomy, grid, cylinder)
30
Discovery of structural form (Kemp and Tenenbaum)
F: structural form, with prior P(F); candidate forms include disjoint clusters, a linear order, a tree-structured taxonomy
S: structure, with P(S | F) favoring simplicity
D: data (objects described by features X1 ... X7), with P(D | S) measuring fit to the data
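Combining the three levels, form and structure are scored jointly; in standard hierarchical-Bayes notation consistent with the slide's labels:

P(F, S \mid D) \;\propto\; P(F)\, P(S \mid F)\, P(D \mid S)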

31
Bayesian model selection: trading fit vs.
simplicity
(Figure: data D and candidate structures S)
32
Bayesian model selection: trading fit vs.
simplicity
The balance between fit and simplicity should be
sensitive to the amount of data observed.
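Why should the balance shift with data? A schematic Python sketch (not the talk's actual model): the complexity penalty log P(S) is fixed, while the fit term log P(D | S) sums over data points, so with enough data a better-fitting but more complex structure overtakes a simpler one. All values below are invented.

import math

# Schematic illustration: log posterior = log P(S) + n * (per-item log likelihood).
# The prior term is constant while the fit term grows with the amount of data n.

def log_posterior(log_prior, per_item_loglik, n):
    return log_prior + n * per_item_loglik

simple_S  = dict(log_prior=math.log(0.7), per_item_loglik=math.log(0.40))  # simpler, fits worse
complex_S = dict(log_prior=math.log(0.3), per_item_loglik=math.log(0.55))  # more complex, fits better

for n in [1, 5, 20, 100]:
    s = log_posterior(simple_S["log_prior"], simple_S["per_item_loglik"], n)
    c = log_posterior(complex_S["log_prior"], complex_S["per_item_loglik"], n)
    winner = "simple" if s > c else "complex"
    print(f"n={n:3d}: simple={s:8.2f}  complex={c:8.2f}  -> {winner}")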
33
(No Transcript)
34
Development of structural forms as more object
features are observed
35
A hierarchical Bayesian view
(Figure repeated from slides 27-28: Principles T, Structure S, Data D)
36
The shape bias in word learning (Landau, Smith, &
Jones, 1988)
"This is a dax."
"Show me the dax."
  • A useful inductive constraint: many early words
    are labels for object categories, and shape may
    be the best cue to object-category membership.
  • English-speaking children typically show the
    shape bias at 24 months, but not at 20 months.

37
Is the shape bias learned?
  • Smith et al. (2002) trained 17-month-olds on
    labels for 4 artificial categories.
  • After 8 weeks of training (20 min/week),
    19-month-olds show the shape bias.

Show me the dax
This is a dax.
38
Transfer to real-world vocabulary
The puzzle: the shape bias is a powerful
inductive constraint, yet it can be learned from
very little data.
39
Learning abstract knowledge about feature
variability
The intuition:
- Shape varies across categories but is relatively
constant within nameable categories.
- Other features (size, color, texture) vary both
within and across nameable object categories.
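This intuition can be made concrete with a toy calculation; the categories, feature values, and spreads below are invented solely to show the within-category variance comparison.

import numpy as np

# Toy illustration: within each (hypothetical) named category, shape values
# cluster tightly while color values are spread out. Data are invented.

rng = np.random.default_rng(0)
category_shapes = {"cat": 3.0, "cup": 7.0, "ball": 5.0, "chair": 9.0}

within_shape, within_color = [], []
for mean_shape in category_shapes.values():
    shape = rng.normal(mean_shape, 0.2, size=20)   # shape: tight within a category
    color = rng.uniform(0.0, 10.0, size=20)        # color: varies freely within a category
    within_shape.append(shape.std())
    within_color.append(color.std())

print("mean within-category std, shape:", round(float(np.mean(within_shape)), 2))
print("mean within-category std, color:", round(float(np.mean(within_color)), 2))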
40
Learning a Bayesian prior
Hypothesis space H of possible word meanings
(extensions), e.g., rectangular regions
(Figure: candidate hypotheses for "horse" plotted on shape and color dimensions; p(h) uniform)
41
Learning a Bayesian prior
Hypothesis space H of possible word meanings
(extensions), e.g., rectangular regions
(Figure: as above, with observed categories cat, cup, ball, and chair added; p(h) uniform)
42-43
Learning a Bayesian prior
Hypothesis space H of possible word meanings
(extensions), e.g., rectangular regions
(Figure: categories cat, cup, ball, and chair on shape and color dimensions; p(h) is now high for long, narrow hypotheses and low for others)
44-45
Hierarchical Bayesian model
Level 2: nameable object categories in general.
Nameable object categories tend to be homogeneous
in shape, but heterogeneous in color, material, ...
Level 1: specific categories (cat, cup, ball, chair), each with its own shape and color distributions
Data: observed shape and color features
46
α_i: within-category variability for feature i
p(α_i) ~ Exponential(1)
p(θ_i | α_i) ~ Dirichlet(α_i)
p(y_i | θ_i) ~ Multinomial(θ_i)
Level 2 (nameable object categories in general): α_shape, α_color (each on a scale from low to high)
Level 1 (specific categories cat, cup, ball, chair): θ_shape, θ_color for each category
Data: y_shape, y_color
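A minimal Python sketch of the generative process specified above; the number of discrete feature values, the examples per category, and the hand-set α values are illustrative assumptions (in the full model the α's are inferred from the data).

import numpy as np

rng = np.random.default_rng(0)

# Generative model from this slide:
#   alpha_i ~ Exponential(1)                 within-category variability for feature i
#   theta_i | alpha_i ~ Dirichlet(alpha_i)   per-category distribution over feature values
#   y_i | theta_i ~ Multinomial(theta_i)     observed feature values for category members
# Sizes (10 feature values, 5 examples per category) are arbitrary choices.

n_values, n_examples = 10, 5

# In the full model the alphas are inferred; here they are hand-set low for shape
# and high for color to illustrate the learned Level-2 knowledge.
alpha = {"shape": 0.1, "color": 10.0}

def sample_category():
    y = {}
    for feature, a in alpha.items():
        theta = rng.dirichlet(np.full(n_values, a))       # Level 1: category-specific distribution
        y[feature] = rng.multinomial(n_examples, theta)   # Data: observed feature counts
    return y

for name in ["cat", "cup", "ball", "chair"]:
    y = sample_category()
    print(name, "shape counts:", y["shape"], "color counts:", y["color"])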
47
Learning the shape bias
48
Second-order generalization test
This is a dax.
Show me the dax
49
Three case studies of generalization
  • Learning words for object categories
  • Learning abstract word-learning principles
    (learning to learn words)
  • Taxonomic principle
  • Shape bias
  • Learning in syntax: unobserved syntactic forms,
    abstract syntactic knowledge

50
Poverty of the Stimulus argument
E.g., aux-fronting in complex interrogatives
Data:
Simple declaratives: The girl is happy. They are eating.
Simple interrogatives: Is the girl happy? Are they eating?
Hypotheses:
H1 (Linear): move the first auxiliary in the
sentence to the beginning.
H2 (Hierarchical): move the auxiliary in the main
clause to the beginning.
Generalization:
Complex declarative: The girl who is sleeping is happy.
Complex interrogative: Is the girl who is sleeping happy? (via H2)
Is the girl who sleeping is happy? (via H1)
⇒ Inductive constraint
Induction of specific grammatical rules must be
guided by some abstract constraints to prefer
certain hypotheses over others, e.g., syntactic
rules are defined over hierarchical phrase
structures rather than the linear order of words.
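As a toy illustration of why the two hypotheses diverge only on complex sentences, the sketch below applies each rule to the complex declarative; the word list and the hand-marked main-clause auxiliary index are invented for this single example.

# Toy illustration of H1 vs. H2. The "parse" (which auxiliary belongs to the
# main clause) is hand-coded for this one example.

sentence = ["the", "girl", "who", "is", "sleeping", "is", "happy"]
auxiliaries = {"is", "are", "do", "does"}
main_clause_aux_index = 5   # the second "is" belongs to the main clause

def h1_linear(words):
    """H1: move the first auxiliary in the sentence to the beginning."""
    i = next(k for k, w in enumerate(words) if w in auxiliaries)
    return [words[i]] + words[:i] + words[i + 1:]

def h2_hierarchical(words, main_aux):
    """H2: move the auxiliary of the main clause to the beginning."""
    return [words[main_aux]] + words[:main_aux] + words[main_aux + 1:]

print("H1:", " ".join(h1_linear(sentence)))                               # is the girl who sleeping is happy
print("H2:", " ".join(h2_hierarchical(sentence, main_clause_aux_index)))  # is the girl who is sleeping happy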
51
Hierarchical phrase structure
(Figure: flat/linear vs. hierarchical phrase structure for an example sentence)
Must this inductive constraint be innately
specified as part of the initial state of the
language faculty? Could it be inferred using
more domain-general capacities?
52
Hierarchical Bayesian model
Grammar type T: linear (regular) vs. hierarchical (context-free)
Specific grammar G
Data D (CHILDES: 21,500 sentences, 2,300 sentence types)
53-55
Hierarchical Bayesian model
Grammar type T: linear (regular) vs. hierarchical (context-free), with an unbiased (uniform) prior over types
Specific grammar G: scored for simplicity given T
Data D (CHILDES: 21,500 sentences, 2,300 sentence types): scored for fit to the data given G
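To make the scoring concrete, here is a schematic Python sketch in the spirit of this comparison (not the actual implementation from the talk): each candidate grammar gets a simplicity prior that penalizes grammar size plus the log likelihood of the corpus, and scores are reported as -log probabilities so lower is better. The grammar names match the results slides, but every number below is invented.

import math

# Schematic grammar comparison: -log posterior = -(log prior + log likelihood),
# where the prior rewards simplicity (fewer productions) and the likelihood
# rewards fit to the corpus. Lower is better. Numbers are invented.

candidate_grammars = {
    # name: (number of productions, log likelihood of the corpus under the grammar)
    "CFG-S": (70,  -3000.0),
    "CFG-L": (120, -2900.0),
    "REG-B": (90,  -3400.0),
}

def neg_log_posterior(n_productions, log_likelihood, bits_per_production=10.0):
    log_prior = -n_productions * bits_per_production * math.log(2)  # simpler grammar => higher prior
    return -(log_prior + log_likelihood)

for name, (size, loglik) in sorted(candidate_grammars.items(),
                                   key=lambda kv: neg_log_posterior(*kv[1])):
    print(f"{name}: -log posterior = {neg_log_posterior(size, loglik):.1f}")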
56
Results: Full corpus
(Note: these are -log probabilities, so lower =
better!)
(Bar charts: Prior (simpler = better), Likelihood (tighter fit = better), and Posterior scores for context-free grammars CFG-S and CFG-L and regular grammars REG-B, REG-M, and REG-N)
57
Generalization results
How well does each grammar predict unseen
sentence forms, e.g., complex aux-fronted
interrogatives?
Grammars compared: FLAT, 1-ST, regular grammars (RG-N, RG-M, RG-B), context-free grammars (CFG-S, CFG-L)
Sentence types and examples:
- Simple declarative: Eagles do fly. (n aux vi)
- Simple interrogative: Do eagles fly? (aux n vi)
- Complex declarative: Eagles that are alive do fly. (n comp aux adj aux vi)
- Complex interrogative: Do eagles that are alive fly? (aux n comp aux adj vi)
- Complex interrogative (the ungrammatical H1 form): Are eagles that alive do fly? (aux n comp adj aux vi)
(The original table also marked whether each type appears in the corpus and which grammars predict it.)
58
Results: First file (90 mins)
(Note: these are -log probabilities, so lower =
better!)
(Bar charts: Prior, Likelihood, and Posterior scores for CFG-S and CFG-L and REG-B, REG-M, and REG-N)
59
Conclusions
  • Bayesian inference over hierarchies of structured
    representations provides a way to study core
    questions of human cognition, in language and
    other domains.
  • What is the content and form of abstract
    knowledge?
  • How can abstract knowledge guide generalization
    from sparse data?
  • How can abstract knowledge itself be acquired?
    What is built in?
  • Going beyond traditional dichotomies.
  • How can structured knowledge be acquired by
    statistical learning?
  • How can domain-general learning mechanisms
    acquire domain-specific inductive constraints?
  • A different way to think about cognitive
    development.
  • Powerful abstractions (taxonomic structure, shape
    bias, hierarchical organization of syntax) can be
    inferred top down, from surprisingly little
    data, together with learning more concrete
    knowledge.
  • Very different from the traditional empiricist
    or nativist views of abstraction. Worth pursuing
    more generally.

60
Abstract knowledge in cognitive development
  • Word learning: whole-object bias, taxonomic
    principle (Markman)
  • Shape bias (Smith)
  • Causal reasoning: causal schemata (Kelley)
  • Folk physics: objects are unified, persistent
    (Spelke)
  • Number: counting principles (Gelman)
  • Folk biology: principles of taxonomic rank
    (Atran)
  • Folk psychology: principle of rationality
    (Gergely)
  • Ontology: M-constraint on predicability (Keil)
  • Syntax: UG (Chomsky)
  • Phonology: faithfulness and markedness
    constraints (Prince, Smolensky)