IRCS/CCN Summer Workshop June 2003 Speech Recognition

Slides: 16
Provided by: MarkLi97

1
IRCS/CCN Summer Workshop, June 2003: Speech
Recognition
2
Why is perception hard?
  • Task: available signals → model of the world
    around
  • signals are mostly accidental, inadequate
  • sometimes disguised or falsified
  • always mixed-up and ambiguous
  • Reasoning about the source of signals
  • Integration of context: what do you expect?
  • Sensor fusion: integration of vision, sound,
    smell, etc.
  • Source (and noise) separation: there's more than
    one thing out there
  • Variable perspective, source variation, etc.
  • depends on the type of signal
  • depends on the type of object
  • Much harder than chess or calculus!

3
Bayesian probability estimation
  • Thomas Bayes (1702-1761)
  • Minister of the Presbyterian Chapel at Tunbridge
    Wells
  • Amateur mathematician
  • Essay towards solving a problem in the doctrine
    of chances, published (posthumously) in 1764
  • Crucial idea
  • background (prior) knowledge about the
    plausibility of different theories can be
    combined with knowledge about the relation of
    theories to evidence
  • in a mathematically well-defined way
  • even if all knowledge is uncertain
  • to reason about the most likely explanation of
    the available evidence
  • Bayes' theorem
  • the most important equation in the history of
    mathematics (?)
  • a simple consequence of basic definitions, or
  • a still-controversial recipe for the probability
    of alternative causes for a given event, or
  • the implicit foundation of human reasoning
  • a general framework for solving the problems of
    perception
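
The crucial idea above can be made concrete with a minimal sketch. The two explanations and all the numbers below are hypothetical, chosen only to show how a prior and a likelihood combine into a posterior; the function name `posterior` is likewise an assumption, not something from the slides.

```python
def posterior(prior, likelihood):
    """Return P(theory | evidence) for each theory via Bayes' theorem.

    prior:      dict mapping theory -> P(theory)
    likelihood: dict mapping theory -> P(evidence | theory)
    """
    joint = {t: prior[t] * likelihood[t] for t in prior}
    evidence = sum(joint.values())  # P(evidence), the normalizing constant
    return {t: joint[t] / evidence for t in joint}

# Hypothetical example: two explanations for a noise heard at night.
prior = {"cat": 0.9, "burglar": 0.1}        # background plausibility
likelihood = {"cat": 0.2, "burglar": 0.8}   # P(noise | explanation)
post = posterior(prior, likelihood)
```

Even though the noise is four times more likely under the "burglar" theory, the strong prior keeps "cat" as the more probable explanation in this toy setting.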

Tutorial on Bayes' Theorem
8
Fundamental theorem of speech recognition
  • P(W|S) ∝ P(S|W) P(W)
  • where W is Word(s) (i.e. message text)
  • S is Sound(s) (i.e. speech signal)
  • Noisy channel model of communications
    engineering, due to Shannon 1949
  • New algorithms, especially relevant to speech
    recognition
  • due to L.E. Baum et al. 1965-1970
  • Applied to speech recognition by Jim Baker (CMU
    PhD 1975),
  • Fred Jelinek (IBM speech group, >>1975)
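
A minimal sketch of how the relation between P(W|S), P(S|W) and P(W) is used for recognition: choose the word string W that maximizes log P(S|W) + log P(W). The candidate strings and the probability tables are hypothetical toy values, and `decode` is an illustrative name rather than any real recognizer's API.

```python
import math

# Hypothetical toy tables for one fixed sound S (not from the slides).
ACOUSTIC = {"recognize speech": 0.4, "wreck a nice beach": 0.5}    # P(S | W)
PRIOR = {"recognize speech": 1e-3, "wreck a nice beach": 1e-5}     # P(W)

def decode(acoustic, prior):
    """Return the candidate W maximizing log P(S|W) + log P(W)."""
    def score(w):
        return math.log(acoustic[w]) + math.log(prior[w])
    return max(acoustic, key=score)

best = decode(ACOUSTIC, PRIOR)
```

In this toy case the acoustic model alone slightly prefers "wreck a nice beach", but the prior over word strings tips the decision toward "recognize speech".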

9
Motivations for a Bayesian approach
  • A consistent framework for integrating
    previous experience and current evidence
  • A quantitative model for abduction:
    reasoning about the best explanation
  • A general method for turning a generative model
    into an analytic one: analysis by
    synthesis, helpful where categories <<
    signals

These motivations apply both in engineering
practice and in the evolution of biological
systems
10
Basic architecture of standard speech
recognition technology
  • 1. Bayes' Rule: P(W|S) ∝ P(S|W) P(W)
  • 2. Approximate P(S|W) P(W) as a Hidden Markov
    Model
  • a probabilistic function (to get P(S|W))
  • of a Markov chain (to get P(W))
  • 3. Use Baum/Welch (EM) algorithm to
    learn HMM parameters
  • 4. Use Viterbi decoding
  • to find the most probable W given S
  • in terms of the estimated HMM

11
HMM parameter estimation given
labelled/aligned training data...
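
When every observation is already labelled with its state, HMM parameters can be estimated by simple counting. The sketch below assumes discrete observation symbols and uses unsmoothed relative frequencies; the state names, symbols, and the function name `estimate_hmm` are hypothetical. Real recognizers with continuous acoustic features would fit densities instead of counting symbols.

```python
from collections import Counter, defaultdict

def estimate_hmm(aligned_sequences):
    """Relative-frequency estimates from sequences of (state, observation) pairs."""
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for seq in aligned_sequences:
        if not seq:
            continue  # nothing to count in an empty sequence
        states = [s for s, _ in seq]
        start[states[0]] += 1
        for s, o in seq:
            emit[s][o] += 1            # count state -> observation
        for a, b in zip(states, states[1:]):
            trans[a][b] += 1           # count state -> next state

    def normalize(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()}

    return (normalize(start),
            {s: normalize(c) for s, c in trans.items()},
            {s: normalize(c) for s, c in emit.items()})

# Hypothetical aligned data: phone-like states with symbolic observations.
data = [
    [("k", "x1"), ("k", "x1"), ("ae", "x2"), ("t", "x3")],
    [("k", "x1"), ("ae", "x2"), ("ae", "x2"), ("t", "x3")],
]
start_p, trans_p, emit_p = estimate_hmm(data)
```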
12
Viterbi decoding given HMM and observed
signal...
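
A minimal sketch of Viterbi decoding for a discrete HMM, working in the log domain. The two-state model ("sil" and "speech"), the observation symbols, and all probabilities are hypothetical toy values, not parameters from the workshop.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log-probability) for obs."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: lg(start_p[s]) + lg(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + lg(trans_p[r][s]))
            best[t][s] = best[t - 1][prev] + lg(trans_p[prev][s]) + lg(emit_p[s][obs[t]])
            back[t][s] = prev  # remember the best predecessor
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model.
states = ["sil", "speech"]
start_p = {"sil": 0.6, "speech": 0.4}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.4, "speech": 0.6}}
emit_p = {"sil": {"quiet": 0.5, "mid": 0.4, "loud": 0.1},
          "speech": {"quiet": 0.1, "mid": 0.3, "loud": 0.6}}
path, log_p = viterbi(["quiet", "mid", "loud"], states, start_p, trans_p, emit_p)
```

The dynamic program keeps only the best-scoring path into each state at each time step, so the cost grows linearly with the length of the signal rather than exponentially.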
13
Sketch of Baum-Welch (EM) algorithm for
estimating HMM parameters given unaligned
(or even unlabelled) training data
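
A compact sketch of one Baum-Welch (EM) re-estimation step for a discrete HMM with a single observation sequence, using scaled forward and backward passes. The toy sequence and starting parameters are hypothetical, and the code assumes strictly positive initial probabilities so that no denominator is zero.

```python
import math

def forward_backward(obs, A, B, pi):
    """Scaled forward/backward passes; returns alpha, beta, scale, log-likelihood."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[0.0] * n for _ in range(T)]
    scale = [0.0] * T
    alpha[0] = [pi[i] * B[i][obs[0]] for i in range(n)]
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    beta[T - 1] = [1.0] * n
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    return alpha, beta, scale, sum(math.log(c) for c in scale)

def baum_welch_step(obs, A, B, pi):
    """One EM update; also returns the log-likelihood under the input parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scale, log_lik = forward_backward(obs, A, B, pi)
    # gamma[t][i]: posterior probability of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T - 1))
        for j in range(n):
            # expected number of i -> j transitions
            num = sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / scale[t + 1]
                      for t in range(T - 1))
            new_A[i][j] = num / denom
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_A, new_B, gamma[0][:], log_lik

# Hypothetical toy problem: 2 hidden states, observation symbols 0 and 1.
obs = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.7, 0.3], [0.2, 0.8]]
pi = [0.5, 0.5]
history = []
for _ in range(5):
    A, B, pi, log_lik = baum_welch_step(obs, A, B, pi)
    history.append(log_lik)
```

Each step replaces the hard counts of the aligned case with expected counts under the current model, and the log-likelihood values collected in `history` should never decrease from one iteration to the next.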
14
Other typical details: complex elaborations of
the basic ideas
  • HMM states → triphones → words
  • each triphone → 3-5 states + connection pattern
  • phone sequence from pronouncing dictionary
  • clustering for estimation
  • Acoustic features
  • RASTA-PLP etc.
  • Vocal tract length normalization, speaker
    clustering
  • Output pdf for each state as mixture of Gaussians
  • Language model as N-gram model over words
  • recency/topic effects
  • Empirical weighting of language vs. acoustic
    models
  • etc. etc.
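
Two of the details above, the N-gram language model and the empirical weighting of language vs. acoustic models, can be sketched as follows. This is an illustrative bigram model with add-one smoothing over a tiny hypothetical corpus; the names `train_bigram` and `combined_score` and the default `lm_weight` value are assumptions, and in practice such a weight would be tuned on held-out data.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Return a function giving the add-one-smoothed bigram log-probability of a word list."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        contexts.update(padded[:-1])              # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    vocab_size = len({w for s in sentences for w in s} | {"</s>"})

    def log_prob(words):
        padded = ["<s>"] + words + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
                   for a, b in zip(padded, padded[1:]))
    return log_prob

def combined_score(acoustic_log_lik, lm_log_prob, lm_weight=10.0):
    """Weighted log-linear combination of acoustic and language model scores."""
    return acoustic_log_lik + lm_weight * lm_log_prob

# Hypothetical training corpus.
corpus = [["recognize", "speech"],
          ["recognize", "speech", "now"],
          ["a", "nice", "beach"]]
lm = train_bigram(corpus)
lp_good = lm(["recognize", "speech"])
lp_odd = lm(["wreck", "a", "nice", "beach"])
```

Here `lp_good` comes out higher than `lp_odd`, so with a large enough `lm_weight` the language model can overrule a small acoustic preference for the less plausible word string.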

15
Some limitations of the standard architecture
  • Problems with Markovian assumptions
  • Modeling trajectory effects
  • Variable coordination of articulatory dimensions
  • ....