Title: Statistical learning, cross-constraints, and the acquisition of speech categories: a computational approach.
Slide 1: Statistical learning, cross-constraints, and the acquisition of speech categories: a computational approach
- Joseph Toscano & Bob McMurray
- Psychology Department
- University of Iowa
Slide 2: Acknowledgements
- Dick Aslin
- The MACLab
Slide 3: Learning phonetic categories
- Infants are initially able to discriminate many different phonetic contrasts.
- They must learn which contrasts are relevant to their native language.
- This is accomplished within the first year of life, and infants quickly adopt the categories present in their language (Werker & Tees, 1984).
Slide 4: Learning phonetic categories
- What is needed for statistical learning? A signal and a mechanism:
  - Availability of statistics (signal)
  - Sensitivity to statistics (mechanism)
    - Continuous sensitivity to VOT
    - Ability to track frequencies and build clusters
Slide 5: Statistics in the signal
- What statistical information is available?
- Lisker & Abramson (1964) did a cross-language analysis of speech.
- They measured voice-onset time (VOT) from several speakers in different languages.
Slide 6: Statistics in the signal
- The statistics are available in the signal
Slide 7: Sensitivity to statistics
- Are infants sensitive to statistics in speech?
- Maye et al. (2002) asked this question.
- Two groups of infants heard either a unimodal or a bimodal distribution of tokens along a continuum; only the bimodal group discriminated the endpoint contrast.
- Infants are also sensitive to within-category detail (McMurray & Aslin, 2005).
Slide 8: Learning phonetic categories
- Infants can obtain phoneme categories from exposure to tokens in the speech signal.
[Figure: frequency distributions of tokens along the VOT continuum (0 ms to 50 ms), forming +voice and -voice clusters]
Slide 9: Statistical Learning Model
- Statistical learning in a computational model
- What do we need the model to do?
  - Show learnability: are statistics sufficient?
  - Capture the developmental timecourse.
  - Explore implications for speech in general: can the model explain more than category learning?
Slide 10: Statistical Learning Model
- Clusters of VOTs form Gaussian distributions.
[Figure: VOT distributions for Tamil, Cantonese, and English]
Slide 11: Statistical Learning Model
- Gaussians are defined by three parameters.
- Each phoneme category can be represented by these three parameters:
  - µ: the center of the distribution
  - σ: the spread of the distribution
  - Φ: the height of the distribution, reflecting the probability of a particular value
Slide 12: Statistical Learning Model
- Modeling approach: mixture of Gaussians
Slide 13: Statistical Learning Model
- Gaussian distributions represent the probability of occurrence of a particular feature value (e.g., VOT).
- Start with a large number of Gaussians to reflect many different values for the feature (see the sketch below).
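A minimal sketch of this representation in Python, assuming NumPy. The class name, the initial spread of 10 ms, and the initialization range are illustrative assumptions, not the talk's actual implementation:

```python
import numpy as np

class GaussianMixture1D:
    """A bank of 1-D Gaussians over a single cue (e.g., VOT in ms)."""

    def __init__(self, n_gaussians, lo, hi, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        # Start with many Gaussians spread across the feature space.
        self.mu = rng.uniform(lo, hi, n_gaussians)          # centers (mu)
        self.sigma = np.full(n_gaussians, 10.0)             # spreads (sigma), in ms
        self.phi = np.full(n_gaussians, 1.0 / n_gaussians)  # heights (phi)

    def densities(self, x):
        """Per-Gaussian contribution: phi_i * N(x | mu_i, sigma_i^2)."""
        norm = self.phi / (self.sigma * np.sqrt(2.0 * np.pi))
        return norm * np.exp(-((x - self.mu) ** 2) / (2.0 * self.sigma ** 2))

    def likelihood(self, x):
        """Mixture likelihood: the sum of the per-Gaussian densities at x."""
        return self.densities(x).sum()

# Example: a large bank of Gaussians over a constrained VOT range.
model = GaussianMixture1D(n_gaussians=50, lo=-100.0, hi=100.0)
print(model.likelihood(15.0))
```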
Slide 14: Statistical Learning Model
- Learning occurs via gradient descent (sketched below):
  - Take a single data point as input.
  - Adjust the location and width of the distribution by a certain amount, defined by a learning rule:
    - Move the center of the distribution closer to the data point.
    - Make the distribution wider to accommodate the data point.
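A hedged sketch of such a learning rule, continuing the `GaussianMixture1D` class above. Treating this as a maximum-likelihood gradient step, with each Gaussian's update weighted by its responsibility for the data point, is an assumption; the talk's exact rule may differ:

```python
def update(model, x, eta=0.01):
    """One gradient step that raises the mixture's likelihood of point x."""
    dens = model.densities(x)
    resp = dens / dens.sum()  # each Gaussian's responsibility for x

    # Move each center toward the data point, weighted by responsibility:
    # d/d(mu) of log N(x | mu, sigma^2) = (x - mu) / sigma^2.
    model.mu += eta * resp * (x - model.mu) / model.sigma ** 2

    # Adjust each width to accommodate the point:
    # d/d(sigma) of log N = ((x - mu)^2 - sigma^2) / sigma^3.
    model.sigma += eta * resp * ((x - model.mu) ** 2 - model.sigma ** 2) / model.sigma ** 3
```

Note the width update: a point falling outside a Gaussian's current spread widens it, matching the slide's description.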
Slide 15: Statistical Learning Model
- Equation of a Gaussian, where Φ_i is the proportion of the space under that Gaussian and G_i(x) is the probability of a particular point x:

$$G_i(x) = \Phi_i \, \frac{1}{\sigma_i \sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right)$$
Slide 16: Can the model learn?
- Can the model learn speech categories?
Slide 17: Can the model learn?
- The model in action: it fails to learn the correct number of categories; too many distributions sit under each curve.
- Is this a problem? Maybe.
- Solution: introduce competition through a winner-take-all strategy, in which only the closest-matching Gaussian is adjusted (see the sketch below).
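A minimal winner-take-all variant of the update above. Treating "closest matching" as the Gaussian with the highest density at the data point is an assumption:

```python
def update_winner_take_all(model, x, eta=0.01):
    """Adjust only the closest-matching (winning) Gaussian."""
    dens = model.densities(x)
    winner = int(np.argmax(dens))  # the Gaussian that best accounts for x

    mu, sigma = model.mu[winner], model.sigma[winner]
    model.mu[winner] += eta * (x - mu) / sigma ** 2
    model.sigma[winner] += eta * ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
```

With competition, losing Gaussians stop tracking points already captured by the winner, so redundant distributions under each curve are no longer reinforced.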
Slide 18: Does learning need to be constrained?
- Can the model learn speech categories? Yes.
- Does learning need to be constrained?
Slide 19: Does learning need to be constrained?
- Unconstrained feature space: starting VOTs distributed from -1000 ms to 1000 ms.
- The model fails to learn.
- This is similar to a situation in which the model has too few starting distributions.
Slide 20: Does learning need to be constrained?
- Constrained feature space: starting VOTs distributed from -100 ms to 100 ms.
- This is within the range of actual voice onset times used in language (see the sketch below).
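The two initialization regimes as a sketch using the class above; the ranges come from the slides, the number of Gaussians is illustrative:

```python
# Unconstrained: starting VOTs spread from -1000 ms to 1000 ms. The model
# fails to learn, much as if it had too few distributions in the useful range.
unconstrained = GaussianMixture1D(n_gaussians=50, lo=-1000.0, hi=1000.0)

# Constrained: starting VOTs from -100 ms to 100 ms, within the range of
# voice onset times actually used in language.
constrained = GaussianMixture1D(n_gaussians=50, lo=-100.0, hi=100.0)
```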
Slide 21: Are constraints linguistic?
- Can the model learn speech categories? Yes.
- Does learning need to be constrained? Yes.
- Do constraints need to be linguistic?
Slide 22: Are constraints linguistic?
- Cross-linguistic constraints
- Combined data from the languages used in Lisker & Abramson (1964) and several other languages.
Slide 23: Are constraints linguistic?
- VOTs from:
- English
- Thai
- Spanish
- Cantonese
- Korean
- Navajo
- Dutch
- Hungarian
- Tamil
- Eastern Armenian
- Hindi
- Marathi
- French
Slide 24
- Test the model with two different sets of starting states (see the sketch below):
  - Cross-linguistic: based on the distribution of VOTs across languages
  - Random: normally distributed, centered around 0 ms, range -100 ms to 100 ms
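A sketch of the two starting-state conditions. The cross-linguistic VOT values and the standard deviation of the random condition are placeholders: the actual cross-language distribution comes from the pooled Lisker & Abramson data and is not reproduced in the slides:

```python
rng = np.random.default_rng(0)
n = 50

# Random condition: normally distributed starting means centered at 0 ms,
# clipped to the -100 ms to 100 ms range from the slides (sd is assumed).
random_model = GaussianMixture1D(n_gaussians=n, lo=-100.0, hi=100.0)
random_model.mu = np.clip(rng.normal(loc=0.0, scale=40.0, size=n), -100.0, 100.0)

# Cross-linguistic condition: starting means drawn from pooled VOT
# measurements across languages (placeholder values, for illustration only).
cross_linguistic_vots = np.array([-90.0, -75.0, 0.0, 10.0, 30.0, 75.0])
cross_model = GaussianMixture1D(n_gaussians=n, lo=-100.0, hi=100.0)
cross_model.mu = rng.choice(cross_linguistic_vots, size=n)
```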
Slide 26: Are linguistic constraints helpful?
- Can the model learn speech categories? Yes.
- Does learning need to be constrained? Yes.
- Do constraints need to be linguistic? No.
- Do cross-language constraints help?
Slide 27: Are linguistic constraints helpful?
- This is the part of the talk that I don't have any slides for yet.
Slide 28: What do infants do?
- Can the model learn speech categories? Yes.
- Does learning need to be constrained? Yes.
- Do constraints need to be linguistic? No.
- Do cross-language constraints help? Sometimes.
- What do infants do?
Slide 29: What do infants do?
- As infants get older, their ability to discriminate different VOT contrasts decreases:
  - Initially, they can discriminate many contrasts.
  - Eventually, they discriminate only those of their native language.
Slide 30: What do infants do?
- Each model's discrimination over time (one possible measure is sketched below):
  - Random normal: discrimination decreases.
  - Cross-linguistic: slight increase.
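One plausible way to read a discrimination score off the model, offered as an assumption since the slides do not specify the measure used:

```python
def discrimination(model, x1, x2):
    """Crude discrimination score: 1 if the two tokens are captured by
    different Gaussians, 0 if the same Gaussian wins for both (an assumed
    metric; the talk's actual measure is not given in the slides)."""
    w1 = int(np.argmax(model.densities(x1)))
    w2 = int(np.argmax(model.densities(x2)))
    return float(w1 != w2)
```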
Slide 31: What do infants do?
- Cross-linguistic starting states lead to faster category acquisition.
- Why wouldn't infants take advantage of this?
  - Too great a risk of over-generalization.
  - It is better to take more time to do the job right than to do it too quickly.