Title: IRCS/CCN Summer Workshop June 2003 Speech Recognition
1 IRCS/CCN Summer Workshop, June 2003: Speech Recognition
2 Why is perception hard?
- Task: available signals → model of the world around
  - signals are mostly accidental, inadequate
  - sometimes disguised or falsified
  - always mixed-up and ambiguous
- Reasoning about the source of signals
- Integration of context: what do you expect?
- Sensor fusion: integration of vision, sound, smell etc.
- Source (and noise) separation: there's more than one thing out there
- Variable perspective, source variation etc.
  - depends on the type of signal
  - depends on the type of object
- Much harder than chess or calculus!
3 Bayesian probability estimation
- Thomas Bayes (1702-1761)
- Minister of the Presbyterian Chapel at Tunbridge Wells
- Amateur mathematician
- "Essay towards solving a problem in the doctrine of chances", published (posthumously) in 1764
- Crucial idea
  - background (prior) knowledge about the plausibility of different theories can be combined with knowledge about the relation of theories to evidence
  - in a mathematically well-defined way
  - even if all knowledge is uncertain
  - to reason about the most likely explanation of the available evidence
- Bayes' theorem
  - the most important equation in the history of mathematics (?)
  - a simple consequence of basic definitions, or
  - a still-controversial recipe for the probability of alternative causes for a given event, or
  - the implicit foundation of human reasoning
  - a general framework for solving the problems of perception
Tutorial on Bayes Theorem
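The "crucial idea" above can be made concrete with a small numerical sketch. It assumes the usual statement of Bayes' theorem, P(theory | evidence) = P(evidence | theory) P(theory) / P(evidence). The function name `posterior`, the two theories, and all the numbers are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Return P(theory | evidence) for each theory via Bayes' theorem.

    prior:      dict mapping theory -> P(theory)
    likelihood: dict mapping theory -> P(evidence | theory)
    """
    # Unnormalized posterior: P(evidence | theory) * P(theory)
    joint = {t: likelihood[t] * prior[t] for t in prior}
    # P(evidence), obtained by summing over all competing theories
    evidence = sum(joint.values())
    return {t: p / evidence for t, p in joint.items()}


# Hypothetical numbers: prior plausibility of two explanations of a signal,
# and how well each explanation predicts what was actually observed.
prior = {"speech": 0.2, "noise": 0.8}
likelihood = {"speech": 0.9, "noise": 0.1}
post = posterior(prior, likelihood)
```

With these made-up numbers the evidence favours "speech" strongly enough to overcome its low prior: the posterior for "speech" is 0.18 / 0.26, roughly 0.69.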
8 Fundamental theorem of speech recognition
- $P(W \mid S) \propto P(S \mid W)\,P(W)$
- where W is Word(s) (i.e. message text)
- S is Sound(s) (i.e. speech signal)
- "Noisy channel model" of communications engineering, due to Shannon 1949
- New algorithms, especially relevant to speech recognition
  - due to L.E. Baum et al. 1965-1970
- Applied to speech recognition by Jim Baker (CMU PhD 1975), Fred Jelinek (IBM speech group >>1975)
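As a minimal sketch of what the proportionality buys in practice: to pick the best word sequence W for a sound S, it is enough to maximize P(S|W) P(W), since P(S) is the same for every candidate. The function `decode`, the candidate phrases, and the probability tables below are hypothetical and stand in for real acoustic and language models.

```python
import math


def decode(sound, candidates, acoustic, language):
    """Return the candidate W maximizing log P(S|W) + log P(W)."""
    def score(w):
        return math.log(acoustic[(sound, w)]) + math.log(language[w])
    return max(candidates, key=score)


# Hypothetical toy tables: P(W) and P(S|W) for one observed sound "s1".
language = {"recognize speech": 0.7, "wreck a nice beach": 0.3}
acoustic = {
    ("s1", "recognize speech"): 0.4,
    ("s1", "wreck a nice beach"): 0.5,
}
best = decode("s1", list(language), acoustic, language)
```

Here the acoustically better match loses because its prior is lower (0.5 × 0.3 = 0.15 against 0.4 × 0.7 = 0.28), which is the point of combining the two terms.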
9 Motivations for a Bayesian approach
- A consistent framework for integrating previous experience and current evidence
- A quantitative model for "abduction": reasoning about the best explanation
- A general method for turning a generative model into an analytic one: "analysis by synthesis", helpful where categories << signals
These motivations apply both in engineering practice and in the evolution of biological systems
10 Basic architecture of standard speech recognition technology
- 1. Bayes' Rule: $P(W \mid S) \propto P(S \mid W)\,P(W)$
- 2. Approximate $P(S \mid W)\,P(W)$ as a Hidden Markov Model
  - a probabilistic function [to get $P(S \mid W)$]
  - of a Markov chain [to get $P(W)$]
- 3. Use Baum/Welch (EM) algorithm to learn HMM parameters
- 4. Use Viterbi decoding
  - to find the most probable W given S
  - in terms of the estimated HMM
11 HMM parameter estimation given labelled/aligned training data...
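When every observation in the training data is already aligned with its state, maximum-likelihood estimation reduces to counting and normalizing. The sketch below assumes discrete observations and non-empty sequences of (state, observation) pairs. The name `estimate_hmm` and the toy data are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def estimate_hmm(sequences):
    """Relative-frequency HMM parameters from aligned (state, obs) pairs."""
    init = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for seq in sequences:
        init[seq[0][0]] += 1                  # state that starts the sequence
        for state, obs in seq:
            emit[state][obs] += 1             # state -> observation counts
        for (prev, _), (nxt, _) in zip(seq, seq[1:]):
            trans[prev][nxt] += 1             # state -> next-state counts

    def normalize(counts):
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}

    return (
        normalize(init),
        {s: normalize(c) for s, c in trans.items()},
        {s: normalize(c) for s, c in emit.items()},
    )


# Hypothetical aligned data: two sequences over states "a", "b".
data = [
    [("a", "x"), ("a", "x"), ("b", "y")],
    [("a", "x"), ("b", "y"), ("b", "x")],
]
pi, A, B = estimate_hmm(data)
```

In this toy data state "a" is followed by "b" in two of its three transitions, so `A["a"]["b"]` comes out as 2/3.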
12 Viterbi decoding given HMM and observed signal...
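A minimal sketch of Viterbi decoding for a discrete-observation HMM, working in the log domain. The state names, observation symbols, and probabilities are hypothetical; a real recognizer would use continuous acoustic features and far larger state spaces.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for obs, and its log-probability."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor r for state s at time t
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], best[-1][last]


# Hypothetical two-state model: silence vs. speech, observed via energy level.
states = ["sil", "speech"]
start_p = {"sil": 0.6, "speech": 0.4}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.4, "speech": 0.6}}
emit_p = {"sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
          "speech": {"low": 0.1, "mid": 0.3, "high": 0.6}}
path, logp = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
```

For this toy model the best path is sil, sil, speech, with probability 0.01512.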
13 Sketch of Baum-Welch (EM) algorithm for estimating HMM parameters given unaligned (or even unlabelled) training data
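One way to sketch a Baum-Welch iteration in code, assuming a discrete-observation HMM, a single short training sequence, and unscaled forward/backward probabilities (longer sequences would need scaling or log arithmetic). The function names and the toy data are hypothetical. States and symbols are integer indices.

```python
def forward_backward(obs, pi, A, B):
    """Unscaled forward (alpha) and backward (beta) tables for one sequence."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(
                alpha[t - 1][i] * A[i][j] for i in range(n))
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta


def baum_welch_step(obs, pi, A, B):
    """One EM iteration; returns new (pi, A, B) and P(obs) under the old model."""
    n, T, m = len(pi), len(obs), len(B[0])
    alpha, beta = forward_backward(obs, pi, A, B)
    like = sum(alpha[T - 1])
    # E-step: gamma[t][i] = P(state at t is i | obs)
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(n)]
             for t in range(T)]
    # xi[t][i][j] = P(state at t is i, state at t+1 is j | obs)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / like
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: re-estimate parameters from the expected counts
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, like


# Hypothetical unlabelled data and an arbitrary starting guess.
obs = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]
likes = []
for _ in range(10):
    pi, A, B, like = baum_welch_step(obs, pi, A, B)
    likes.append(like)
```

Each iteration uses the current model to compute expected state occupancies and transitions, then treats those expectations as if they were the counts from aligned data. The likelihood recorded in `likes` should not decrease from one iteration to the next, which is the standard EM guarantee.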
14 Other typical details: complex elaborations of the basic ideas
- HMM states → triphones → words
  - each triphone → 3-5 states, connection pattern
  - phone sequence from pronouncing dictionary
  - clustering for estimation
- Acoustic features
  - RASTA-PLP etc.
- Vocal tract length normalization, speaker clustering
- Output pdf for each state as mixture of Gaussians
- Language model as N-gram model over words
  - recency/topic effects
- Empirical weighting of language vs. acoustic models
- etc. etc.
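Two of the details above, the N-gram language model and the empirical weighting of language vs. acoustic scores, can be sketched together. The example assumes a bigram model (N = 2) with add-one smoothing, which is a simplification chosen for brevity. The function names, the training sentences, and the default `lm_weight` value are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect bigram counts, history counts, and the predicted vocabulary."""
    bigrams, history, vocab = Counter(), Counter(), set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens[1:])              # "<s>" is never predicted
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            history[prev] += 1
    return bigrams, history, vocab


def log_prob(words, model):
    """log P(W) under the bigram model with add-one smoothing."""
    bigrams, history, vocab = model
    tokens = ["<s>"] + words + ["</s>"]
    return sum(
        math.log((bigrams[(p, w)] + 1) / (history[p] + len(vocab)))
        for p, w in zip(tokens, tokens[1:])
    )


def combined_score(acoustic_logp, words, model, lm_weight=10.0):
    """Acoustic log-likelihood plus a weighted language-model log-probability.

    lm_weight is an empirically tuned constant; 10.0 is only a placeholder.
    """
    return acoustic_logp + lm_weight * log_prob(words, model)


# Hypothetical training text.
model = train_bigram([
    ["recognize", "speech"],
    ["recognize", "speech", "now"],
    ["wreck", "a", "nice", "beach"],
])
good = log_prob(["recognize", "speech"], model)
bad = log_prob(["speech", "recognize"], model)
```

The weight is needed because acoustic and language scores come from differently calibrated models. In practice it is tuned on held-out data rather than derived from theory.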
15 Some limitations of the standard architecture
- Problems with Markovian assumptions
- Modeling trajectory effects
- Variable coordination of articulatory dimensions
- ....