Part III Hierarchical Bayesian Models - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Part III: Hierarchical Bayesian Models
2
Universal Grammar
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)
[Hierarchy: Grammar → Phrase structure → Utterance → Speech signal]
3
Vision
(Han and Zhu, 2006)
4
Word learning
Principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias
[Hierarchy: Principles → Structure → Data]
6
Hierarchical Bayesian models
  • Can represent and reason about knowledge at
    multiple levels of abstraction.
  • Have been used by statisticians for many years.
  • Have been applied to many cognitive problems
  • causal reasoning (Mansinghka et al., 2006)
  • language (Chater and Manning, 2006)
  • vision (Fei-Fei, Fergus, and Perona, 2003)
  • word learning (Kemp, Perfors, and Tenenbaum, 2006)
  • decision making (Lee, 2006)

7
Outline
  • A high-level view of HBMs
  • A case study
  • Semantic knowledge

8
Universal Grammar
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)
P(grammar | UG)
Grammar
P(phrase structure | grammar)
Phrase structure
P(utterance | phrase structure)
Utterance
P(speech | utterance)
Speech signal
10
Hierarchical Bayesian model
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances), with arrows labeled P(G | U), P(s | G), P(u | s)]
A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy:
P({u_i}, {s_i}, G | U) = [ Π_i P(u_i | s_i) P(s_i | G) ] P(G | U)
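This factorization can be made concrete with a toy discrete model; all of the tables below (grammars G1/G2, structures sA/sB, utterances u1/u2) are hypothetical stand-ins, not a real grammar model:

```python
# Toy instance of the hierarchical factorization
# P({u_i}, {s_i}, G | U) = [prod_i P(u_i | s_i) P(s_i | G)] P(G | U).
# Every table below is made up for illustration.
from itertools import product

p_G = {"G1": 0.7, "G2": 0.3}                      # P(G | U)
p_s = {"G1": {"sA": 0.6, "sB": 0.4},              # P(s | G)
       "G2": {"sA": 0.2, "sB": 0.8}}
p_u = {"sA": {"u1": 0.9, "u2": 0.1},              # P(u | s)
       "sB": {"u1": 0.3, "u2": 0.7}}

def joint(G, ss, us):
    """P(u_1..n, s_1..n, G | U): the product of the three levels."""
    p = p_G[G]
    for s, u in zip(ss, us):
        p *= p_s[G][s] * p_u[s][u]
    return p

# Summing the joint over every assignment of (G, s1, s2, u1, u2)
# recovers 1, confirming it is a proper distribution.
total = sum(joint(G, ss, us)
            for G in p_G
            for ss in product(["sA", "sB"], repeat=2)
            for us in product(["u1", "u2"], repeat=2))
```

Any of the inferences on the following slides is a conditional of this one joint distribution.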
11
Knowledge at multiple levels
  • Top-down inferences
  • How does abstract knowledge guide inferences at
    lower levels?
  • Bottom-up inferences
  • How can abstract knowledge be acquired?
  • Simultaneous learning at multiple levels of
    abstraction

12
Top-down inferences
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances)]
Given grammar G and a collection of utterances,
construct a phrase structure for each utterance.
13
Top-down inferences
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances)]
Infer s_i given u_i and G:  P(s_i | u_i, G) ∝ P(u_i | s_i) P(s_i | G)
14
Bottom-up inferences
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances)]
Given a collection of phrase structures, learn a
grammar G.
15
Bottom-up inferences
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances)]
Infer G given s_i and U:  P(G | {s_i}, U) ∝ [ Π_i P(s_i | G) ] P(G | U)
16
Simultaneous learning at multiple levels
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances)]
Given a set of utterances ui and innate
knowledge U, construct a grammar G and a phrase
structure for each utterance.
17
Simultaneous learning at multiple levels
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances)]
  • A chicken-or-egg problem
  • Given a grammar, phrase structures can be
    constructed
  • Given a set of phrase structures, a grammar can
    be learned

18
Simultaneous learning at multiple levels
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances)]
Infer G and s_i given u_i and U:  P(G, {s_i} | {u_i}, U) ∝ [ Π_i P(u_i | s_i) P(s_i | G) ] P(G | U)
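The chicken-or-egg problem dissolves under joint inference: score every (grammar, phrase-structure assignment) pair at once and normalize. A brute-force sketch over toy tables (all hypothetical, as before):

```python
# Joint inference of grammar G and phrase structures s_i from
# utterances u_i:  P(G, {s_i} | {u_i}, U) ∝ prod_i P(u_i|s_i)P(s_i|G) P(G|U).
# The probability tables are made up for illustration.
from itertools import product

p_G = {"G1": 0.7, "G2": 0.3}
p_s = {"G1": {"sA": 0.6, "sB": 0.4}, "G2": {"sA": 0.2, "sB": 0.8}}
p_u = {"sA": {"u1": 0.9, "u2": 0.1}, "sB": {"u1": 0.3, "u2": 0.7}}

def posterior(us):
    """Enumerate every (grammar, structure assignment), score, normalize."""
    scores = {}
    for G in p_G:
        for ss in product(p_s[G], repeat=len(us)):
            score = p_G[G]
            for s, u in zip(ss, us):
                score *= p_s[G][s] * p_u[s][u]
            scores[(G, ss)] = score
    Z = sum(scores.values())
    return {k: v / Z for k, v in scores.items()}

post = posterior(["u1", "u1", "u1"])
best_G, best_ss = max(post, key=post.get)
```

With three "u1" utterances the mass concentrates on G1 with all structures sA, since sA explains u1 well and G1 favors sA: grammar and structures are learned simultaneously.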
19
Hierarchical Bayesian model
[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (phrase structures) → u1 … u6 (utterances), with arrows labeled P(G | U), P(s | G), P(u | s)]
20
Knowledge at multiple levels
  • Top-down inferences
  • How does abstract knowledge guide inferences at
    lower levels?
  • Bottom-up inferences
  • How can abstract knowledge be acquired?
  • Simultaneous learning at multiple levels of
    abstraction

21
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge

22
Folk Biology
The relationships between living kinds are well
described by tree-structured representations
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data: "Gorillas have hands")]
23
Folk Biology
Structural form: tree
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
24
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
  • Property induction
  • Learning structured representations
  • Learning the abstract organizing principles of a
    domain

25
Property induction
Structural form: tree
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
26
Property Induction
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
Approach: work with the distribution P(D | S, R)
27
Property Induction
Previous approaches: Rips (1975), Osherson et al. (1990), Sloman (1993), Heit (1998)
28
Bayesian Property Induction
Hypotheses
29
Bayesian Property Induction
Hypotheses
30

[Diagram: premise categories D and conclusion category C]
31
Choosing a prior
32
Bayesian Property Induction
  • A challenge
  • We have to specify the prior, which typically
    includes many numbers
  • An opportunity
  • The prior can capture knowledge about the
    problem.
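A minimal sketch of Bayesian property induction, assuming the simplest possible hypothesis space (every subset of species could be the property's extension) and a uniform prior; the slides' point is precisely that a structured prior over these hypotheses is where domain knowledge enters:

```python
# Bayesian property induction over an explicit hypothesis space.
# The species list is from the slides; the uniform prior is an
# illustrative placeholder for a structured (e.g. tree-based) prior.
from itertools import combinations

species = ["mouse", "squirrel", "chimp", "gorilla"]

# Hypotheses: every non-empty subset of species.
hyps = [frozenset(c) for r in range(1, len(species) + 1)
        for c in combinations(species, r)]
prior = {h: 1.0 / len(hyps) for h in hyps}

def p_has_property(premises, conclusion):
    """P(conclusion has the property | all premises have it):
    sum prior mass over hypotheses consistent with the premises
    that also contain the conclusion, then normalize."""
    consistent = [h for h in hyps if set(premises) <= h]
    num = sum(prior[h] for h in consistent if conclusion in h)
    den = sum(prior[h] for h in consistent)
    return num / den

p = p_has_property(["chimp"], "gorilla")
```

Under the uniform prior this gives 0.5 for any premise/conclusion pair, which is exactly why the prior matters: a tree-structured prior would push this probability up for nearby species like chimp and gorilla.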

33
Property Induction
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
34
Biological properties
  • Structure
  • Living kinds are organized into a tree
  • Stochastic process
  • Nearby species in the tree tend to share
    properties

35
Structure
36
Structure
37
Stochastic Process
  • Nearby species in the tree tend to share
    properties.
  • In other words, properties tend to be smooth over
    the tree.

Smooth
Not smooth
38
Stochastic process
Hypotheses
39
Generating a property
[Diagram: a continuous value y at each node is passed through a threshold to give the property h; y tends to be smooth over the tree]
40
S
41
The diffusion process
  • h_i = θ(y_i), where θ(y_i) is 1 if y_i > 0 and 0 otherwise
  • y is drawn with covariance K, which encourages y to be
    smooth over the graph S

42
p(y | S, R): Generating a property
Let y_i be the feature value at node i
[Equation: y is Gaussian with a covariance under which adjacent nodes i and j take similar values]
(Zhu, Lafferty, Ghahramani 03)
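A sketch of sampling one property from the diffusion process, assuming (following Zhu, Lafferty and Ghahramani's graph-kernel construction) that the covariance is the inverse of the graph Laplacian plus a regularizer; the 5-node chain graph and sigma are illustrative choices:

```python
import numpy as np

# Diffusion-process sketch: y ~ N(0, K) with K = (L + I/sigma^2)^{-1},
# where L is the graph Laplacian, then threshold y to get the property.
# The chain graph and sigma are illustrative assumptions.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]          # a 5-node chain
n, sigma = 5, 1.0

L = np.zeros((n, n))
for i, j in edges:                                # build the Laplacian
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

K = np.linalg.inv(L + np.eye(n) / sigma**2)       # covariance matrix

rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(n), K)       # smooth real values
f = (y > 0).astype(int)                           # threshold -> property
```

Because K's off-diagonal entries decay with graph distance, neighboring species covary more strongly than distant ones, which is what makes sampled properties "smooth over the tree".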
43
Biological properties
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
Approach: work with the distribution P(D | S, R)
44

[Diagram: premise categories D and conclusion category C]
45
Results
Human
Model
(Osherson et al.)
46
Results
Human
Model
Cows have property P. Elephants have property
P. Horses have property P. All mammals have
property P.
47
Spatial model
Structural form: 2D space; stochastic process: diffusion
[Diagram: R (principles) → S (structure: 2D layout of squirrel, mouse, gorilla, chimp) → D (data)]
48
Structure
49
Structure
50
Tree vs 2D
[Figure: tree + diffusion vs. 2D + diffusion predictions for the "horse" and "all mammals" arguments]
51
Biological Properties
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
52
Three inductive contexts
Properties: can bite through wire; carries E. Spirus bacteria; has T4 cells
R: tree + diffusion process; chain + drift process; network + causal transmission
[Diagram: three structures S, one per context, each defined over classes A-G]
53
Threshold properties
  • can bite through wire
  • has skin that is more resistant to penetration
    than most synthetic fibers

Doberman
Poodle
Collie
Hippo
Elephant
Cat
Lion
Camel
(Osherson et al.; Blok et al.)
54
Threshold properties
  • Structure
  • The categories can be organized along a single
    dimension
  • Stochastic process
  • Categories towards one end of the dimension are
    more likely to have the novel property
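The drift intuition can be sketched by placing each category at a position along the single dimension and letting the property's probability rise toward one end; the positions and the logistic link below are illustrative assumptions, not the fitted model from the slides:

```python
import math

# Drift-process sketch for threshold properties: categories sit on one
# dimension (roughly "skin toughness"); probability of the novel
# property rises toward one end.  Positions are made-up illustrations.
position = {"Poodle": 0.1, "Collie": 0.2, "Cat": 0.25,
            "Doberman": 0.3, "Lion": 0.6, "Camel": 0.7,
            "Hippo": 0.9, "Elephant": 1.0}

def p_property(category, threshold=0.5, slope=8.0):
    """P(category has the threshold property): a logistic function of
    the category's position relative to an assumed threshold."""
    x = position[category]
    return 1.0 / (1.0 + math.exp(-slope * (x - threshold)))
```

Unlike diffusion, which only cares about proximity in the structure, drift is directional: Elephant and Hippo come out far more likely to "have skin more resistant than synthetic fibers" than Poodle.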

55
Results
has skin that is more resistant to penetration
than most synthetic fibers
1D drift
1D diffusion
(Blok et al.; Smith et al.)
56
Three inductive contexts
Properties: can bite through wire; carries E. Spirus bacteria; has T4 cells
R: tree + diffusion process; chain + drift process; network + causal transmission
[Diagram: three structures S, one per context, each defined over classes A-G]
57
Causally transmitted properties
Grizzly bear
Salmon
(Medin et al.; Shafto and Coley)
58
Causally transmitted properties
  • Structure
  • The categories can be organized into a directed
    network
  • Stochastic process
  • Properties are generated by a noisy transmission
    process
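A sketch of a noisy-transmission process over a hypothetical food web (prey → predator links); the web and transmission rate below are made up for illustration, loosely in the spirit of Shafto et al.'s island scenario:

```python
import random

# Causal-transmission sketch: a disease starts at one species and each
# predation link passes it along independently with some probability.
# The food web and rate are illustrative assumptions.
web = {                      # prey -> predators that eat it
    "Kelp": ["Herring"],
    "Herring": ["Salmon"],
    "Salmon": ["Grizzly bear"],
    "Grizzly bear": [],
}

def transmit(start, rate, rng):
    """Return the set of species that end up with the property."""
    infected, frontier = {start}, [start]
    while frontier:
        sp = frontier.pop()
        for pred in web.get(sp, []):
            if pred not in infected and rng.random() < rate:
                infected.add(pred)
                frontier.append(pred)
    return infected

# With rate 1.0 the property reaches everything downstream of Kelp.
everyone = transmit("Kelp", 1.0, random.Random(0))
```

Note the asymmetry this process predicts: infecting a prey species threatens its predators, but infecting a top predator spreads nowhere, matching the directional inductive judgments in the disease experiments.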

59
Experiment: disease properties
(Shafto et al.)
Island
Mammals
60
Results: disease properties
Web transmission
Island
Mammals
61
Three inductive contexts
Properties: can bite through wire; carries E. Spirus bacteria; has T4 cells
R: tree + diffusion process; chain + drift process; network + causal transmission
[Diagram: three structures S, one per context, each defined over classes A-G]
62
Property Induction
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
Approach: work with the distribution P(D | S, R)
63
Conclusions: property induction
  • Hierarchical Bayesian models help to explain how
    abstract knowledge can be used for induction

64
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
  • Property induction
  • Learning structured representations
  • Learning the abstract organizing principles of a
    domain

65
Structure learning
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
66
Structure learning
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S = ? (structure to be learned) → D (data)]
Goal: find S that maximizes P(S | D, R)
67
Structure learning
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S = ? → D (data)]
Goal: find S that maximizes P(S | D, R) ∝ P(D | S, R) P(S | R)
68
Structure learning
Structural form: tree; stochastic process: diffusion
P(D | S, R): the distribution previously used for property induction
[Diagram: R (principles) → S = ? → D (data)]
Goal: find S that maximizes P(S | D, R) ∝ P(D | S, R) P(S | R)
69
Generating features over the tree
[Figure: a feature generated over the tree (mouse, squirrel, chimp, gorilla)]
70
Generating features over the tree
[Figure: another feature generated over the same tree]
71
Structure learning
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S = ? → D (data)]
Goal: find S that maximizes P(S | D, R) ∝ P(D | S, R) P(S | R)

72
P(S | R): Generating structures
Inconsistent with R
Consistent with R
73
P(S | R): Generating structures
Simple
Complex
74
P(S | R): Generating structures
  • Each structure is weighted by the number of nodes it contains:
P(S | R) = 0 if S is inconsistent with R; otherwise P(S | R) ∝ θ^|S|,
where |S| is the number of nodes in S (θ < 1, so structures with fewer nodes are preferred)
75
Structure Learning
  • P(S | D, R) will be high when:
  • The features in D vary smoothly over S
  • S is a simple graph (a graph with few nodes)
[Diagram: R (principles) → S (structure) → D (data)]
Aim: find S that maximizes P(S | D, R) ∝ P(D | S) P(S | R)
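A sketch of this score, assuming the Gaussian smoothness likelihood from the property-induction slides (precision = graph Laplacian plus identity) and the θ^|S| node-count prior; θ, the chain graph, and the data are illustrative choices:

```python
import numpy as np

# Structure-learning score: log P(S | D, R) = log P(D | S) + log P(S | R)
# up to a constant, assuming a Gaussian likelihood with precision
# L + I and the prior P(S | R) ∝ theta^|S|.  Values are illustrative.

def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

def log_score(features, edges, theta=0.5):
    """features: (n nodes x m features) matrix of real values."""
    n = features.shape[0]
    P = laplacian(n, edges) + np.eye(n)           # precision matrix
    _, logdet = np.linalg.slogdet(P)
    ll = 0.0
    for y in features.T:                          # each feature column
        ll += (0.5 * logdet - 0.5 * n * np.log(2 * np.pi)
               - 0.5 * y @ P @ y)
    return ll + n * np.log(theta)                 # node-count penalty

chain = [(0, 1), (1, 2), (2, 3)]
smooth = np.array([[1.0], [1.0], [2.0], [2.0]])   # varies smoothly
rough  = np.array([[1.0], [2.0], [1.0], [2.0]])   # same values, shuffled
```

On the same chain, the smooth arrangement of the same values scores strictly higher than the shuffled one, which is exactly the pressure that drives the search toward structures the data vary smoothly over.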
77
Structure learning example
  • Participants rated the goodness of 85 features
    for 48 animals
  • E.g., elephant

gray, hairless, toughskin, big, bulbous, longleg, tail, chewteeth,
tusks, smelly, walks, slow, strong, muscle, quadrapedal, inactive,
vegetation, grazer, oldworld, bush, jungle, ground, timid, smart, group
(Osherson et al)
78
Biological Data
[Figure: animals × features data matrix]
79
Tree
80
Spatial model
Structural form: 2D space; stochastic process: diffusion
[Diagram: R (principles) → S (structure: 2D layout of squirrel, mouse, gorilla, chimp) → D (data)]
81
2D space
82
Conclusions structure learning
  • Hierarchical Bayesian models provide a unified
    framework for the acquisition and use of
    structured representations

83
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
  • Property induction
  • Learning structured representations
  • Learning the abstract organizing principles of a
    domain

84
Learning structural form
Structural form: tree; stochastic process: diffusion
[Diagram: R (principles) → S (structure: tree over mouse, squirrel, chimp, gorilla) → D (data)]
85
Which form is best?
[Figure: the same animals (ostrich, robin, crocodile, snake, bat, orangutan, turtle) arranged under two candidate structures]
86
Structural forms
Order
Chain
Ring
Partition
Hierarchy
Tree
Grid
Cylinder
87
Learning structural form
Structural form: F (could be a tree, 2D space, ring, …); stochastic process: diffusion
[Diagram: R (principles) → S = ? (structure) → D (data)]
Goal: find S, F that maximize P(S, F | D)
88
Learning structural form
Structural form: F; stochastic process: diffusion
P(F): uniform distribution on the set of forms
[Diagram: R (principles) → S = ? → D (data)]
Aim: find S, F that maximize P(S, F | D) ∝ P(D | S) P(S | F) P(F)
89
Learning structural form
Structural form: F; stochastic process: diffusion
P(D | S): the distribution used for property induction
[Diagram: R (principles) → S = ? → D (data)]
Aim: find S, F that maximize P(S, F | D) ∝ P(D | S) P(S | F) P(F)

90
Learning structural form
Structural form: F; stochastic process: diffusion
P(S | F): the distribution used for structure learning
[Diagram: R (principles) → S = ? → D (data)]
Aim: find S, F that maximize P(S, F | D) ∝ P(D | S) P(S | F) P(F)

91
P(S | F): Generating structures from forms
  • Each structure is weighted by the number of nodes it contains:
P(S | F) = 0 if S is inconsistent with F; otherwise P(S | F) ∝ θ^|S|,
where |S| is the number of nodes in S
92
P(S | F): Generating structures from forms
  • Simpler forms are preferred
[Figure: P(S | F) over all possible graph structures S; a chain generates fewer structures than a grid, so each one receives more probability]
93
Learning structural form
[Diagram: F = ? (form) → S = ? (structure) → D (data)]
Goal: find S, F that maximize P(S, F | D)
94
Learning structural form
  • P(S, F | D) will be high when:
  • The features in D vary smoothly over S
  • S is a simple graph (a graph with few nodes)
  • F is a simple form (a form that can generate only a few structures)
[Diagram: F (form) → S (structure) → D (data)]
Aim: find S, F that maximize P(S, F | D) ∝ P(D | S) P(S | F) P(F)
96
Form learning: biological data
  • 33 animals, 110 features
[Figure: animals × features data matrix]
97
Form learning: biological data
98
Supreme Court (Spaeth)
  • Votes on 1600 cases (1987-2005)

99
Color (Ekman)
100
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
  • Property induction
  • Learning structured representations
  • Learning the abstract organizing principles of a
    domain

101
Where do priors come from?
102
Stochastic process: diffusion
[Figure: diffusion over the tree of mouse, squirrel, chimp, gorilla]
103
Structural form: tree; stochastic process: diffusion
[Figure: the tree over mouse, squirrel, chimp, gorilla]
105
Where do structural forms come from?
Order
Chain
Ring
Partition
Hierarchy
Tree
Grid
Cylinder
106
Where do structural forms come from?
[Diagram: each structural form is paired with the process that generates it]
107
Node-replacement graph grammars
Production (Chain)
Derivation
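A sketch of the chain production, assuming the production replaces one node with two connected nodes and splits the old node's neighbors between them; on a chain this reduces to inserting a node next to the chosen one, which is how it is represented here:

```python
import random

# Node-replacement sketch for the chain form.  A general graph grammar
# replaces a node in an arbitrary graph; restricted to chains, each
# production step amounts to inserting one new node.  The list-of-nodes
# representation is our simplification, not the slides' formalism.

def grow_chain(steps, rng):
    """Start from a single node and apply the production `steps` times:
    pick a node and replace it with two connected nodes."""
    chain = ["n0"]
    for t in range(steps):
        i = rng.randrange(len(chain))       # node chosen for replacement
        chain.insert(i + 1, "n%d" % (t + 1))
    return chain

def edges(chain):
    """Edges of the chain: consecutive pairs of nodes."""
    return list(zip(chain, chain[1:]))

rng = random.Random(0)
c = grow_chain(5, rng)
```

Every derivation step adds exactly one node and one edge, so after n steps the structure is always a path with n + 1 nodes: the grammar can generate chains of any length, and nothing else.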
110
Where do structural forms come from?
[Diagram: each structural form is paired with the process that generates it]
111
The complete space of grammars
[Figure: the complete space of 4096 possible grammars, numbered 1 … 4096]
112
When can we stop adding levels?
  • When the knowledge at the top level is simple or
    general enough that it can be plausibly assumed
    to be innate.

113
Conclusions
  • Hierarchical Bayesian models provide a unified
    framework which can
  • Explain how abstract knowledge is used for
    induction
  • Explain how abstract knowledge can be acquired

114
Learning abstract knowledge
  • Applications of hierarchical Bayesian models at
    this conference
  • Semantic knowledge (Schmidt et al.): learning the M-constraint
  • Syntax (Perfors et al.): learning that language is hierarchically
    organized
  • Word learning (Kemp et al.): learning the shape bias