LING 439/539: Statistical Methods in Speech and Language Processing

1
LING 439/539: Statistical Methods in Speech and
Language Processing
  • Ying Lin
  • Department of Linguistics
  • University of Arizona

2
Welcome!
  • Get the syllabus
  • Fill out and return the information sheet
  • Email: yinglin_at_email.arizona.edu
  • Office: Douglass 224
  • Office hours: MW 2:00-3:00 by appointment (also teaching
    another undergrad class)
  • Course webpage: see syllabus
  • Listserv: coming soon.

3
438/538 and 439/539
  • LING 438/538 (Computational Linguistics)
  • Symbolic representations (mostly syntax), e.g.
    FSA, CFG.
  • Focus on logic
  • Simple probabilistic models, e.g. N-grams.

4
438/538 and 439/539
  • This class complements 438/538
  • Numerical representations (speech signals): need
    digital signal processing
  • Focus on statistics/learning
  • More sophisticated probabilistic models, e.g.
    HMM, PCFG

5
Main reference texts (!)
  • Huang, Acero and Hon (2001). Spoken Language
    Processing: A Guide to Theory, Algorithm, and
    System Development. Prentice-Hall.
  • Manning and Schütze (1999). Foundations of
    Statistical Natural Language Processing. MIT
    Press.
  • Rabiner and Juang (1993). Fundamentals of Speech
    Recognition. Prentice-Hall.
  • Duda, Hart and Stork (2001). Pattern
    Classification (2nd ed). John Wiley & Sons.
  • Rabiner and Schafer (1978). Digital Processing of
    Speech Signals. Prentice-Hall.
  • Hastie, Tibshirani and Friedman (2001). The
    Elements of Statistical Learning. Springer.

6
Guideline for course reading
  • There is no single book that covers all of our
    materials.
  • Most books are written either for EE or CS
    audience only.
  • A few chapters are selected from each book (see
    the reading list). Lecture notes will summarize
    the reading.
  • Expect a rough ride for the first time --
    feedback is greatly appreciated!

7
Three skills for this class
  • 1. Linguistics: understanding the source of
    particular patterns.
  • 2. Math/Statistics: underlying principles of the
    model.
  • 3. Programming: implementation
  • This class emphasizes 2. Reasons:
  • Models are based on simple structures
  • Programming skills require much practice

8
What is a statistical approach?
  • Narrow: uses statistical principles, i.e. based on
    the probability calculus or other theories of
    inductive inference
  • Compared to logic: deductive inference
  • Broad: any work that uses a quantitative measure of
    success
  • Relevant to both language engineering and
    linguistic science

9
What is a statistical approach?
  • Narrow: uses statistical principles, i.e. based on
    the probability calculus or other theories of
    inductive inference
  • Compared to logic: deductive inference
  • Broad: any work that uses a quantitative measure of
    success
  • Relevant to both language engineering and
    linguistic science

This course
10
Language engineering: speech recognition
  • Tasks: increasing level of difficulty

Word Error Rate
11
A brief history of speech recognition
  • 1950s: U.S. government started funding research
    on automatic recognition of speech
  • 1960-70s: isolated words, digit strings
  • Debate: rules vs. statistics
  • Dynamic time warping
  • 1980-now: continuous speech, speech
    understanding, spoken dialog
  • Hidden Markov model dominates

12
Why didn't the rules work?
  • Completely bottom-up approach
  • Rules are hand-coded by experts
  • Problem: variability in speech
  • Sophisticated, symbolic rules are not flexible
    enough to handle continuous speech

[Figure labels: Phonetic rules; Phonological rules; "How are you?"; h A U A? j o U]
13
The rise of statistical methods in speech
  • Initial solution: hire many linguists to
    continually improve the rule system
  • This turned out to be costly and slow, failing the
    high expectations
  • Advantages of statistical models:
  • Allow training on different data: flexible,
    scalable
  • Computing power is much cheaper than experts
  • Drives the move to less and less constrained
    tasks
  • Bitterness: "Every time I fire a linguist, the
    performance of the speech recognizer goes up" -- F. Jelinek (IBM)

14
The rise of statistics in NLP
  • Very similar scenarios also happened in NLP
  • E.g. tagging, parsing, machine translation
  • Old NLP: deductive systems, hand-coded
  • New NLP: broad-coverage, corpus-based,
    emphasizes training and evaluation
  • Speech is now merging with NLP
  • Many tools originated in speech, then got copied
    to NLP
  • New tasks keep emerging: the web as an (unstructured)
    data source

15
Basic architecture of today's ASR system
  • Audio speech -> feature extraction -> features $X$
  • Acoustic modeling: likelihoods $p(X \mid M_1)$, $p(X \mid M_2)$
  • Language model: $p(M_1)$, $p(M_2)$
  • Scoring: rank the hypotheses and output the answer
  • Model parameters are trained offline
  • $M_1$: "I recognize speech"; $M_2$: "I wreck a nice beach"
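
The scoring and ranking step of this architecture can be sketched in a few lines. This is only an illustration: the log-scores below are made-up numbers, and the names `hypotheses` and `rank_hypotheses` are hypothetical. A real recognizer would obtain the acoustic log-likelihoods and language model log-probabilities from trained models.

```python
# Hypothetical log-scores for the two competing word sequences.
# acoustic_loglik stands for log p(X|M), lm_logprob for log p(M).
hypotheses = {
    "I recognize speech": {"acoustic_loglik": -120.0, "lm_logprob": -9.0},
    "I wreck a nice beach": {"acoustic_loglik": -119.5, "lm_logprob": -14.0},
}


def rank_hypotheses(hyps):
    """Return (score, text) pairs sorted by log p(X|M) + log p(M), best first."""
    scored = [
        (s["acoustic_loglik"] + s["lm_logprob"], text) for text, s in hyps.items()
    ]
    return sorted(scored, reverse=True)


ranking = rank_hypotheses(hypotheses)
best_score, best_text = ranking[0]
print(best_text, best_score)
```

With these invented numbers the second hypothesis fits the audio slightly better, but the language model prior tips the ranking toward "I recognize speech".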
16
Component 1: signal processing / feature
extraction
  • First 1/3 of the course (also useful for
    understanding synthesis)

17
Examples of some common features
18
Component 2: Acoustic models
  • Mixture of Gaussians: $p(o_t \mid q_i)$
  • Dimension reduction: principal component
    analysis, linear discriminant analysis, parameter
    tying
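
As a rough illustration of a mixture-of-Gaussians observation density $p(o_t \mid q_i)$, the sketch below evaluates a mixture with diagonal covariances for one feature vector. The diagonal-covariance choice, the function names, and all parameter values are assumptions made for this example and are not taken from the slides.

```python
import math


def diag_gaussian_pdf(o, mean, var):
    """Density of a Gaussian with diagonal covariance at feature vector o."""
    density = 1.0
    for x, m, v in zip(o, mean, var):
        density *= math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return density


def gmm_likelihood(o, weights, means, variances):
    """p(o | state) as a weighted sum of diagonal Gaussians."""
    return sum(
        w * diag_gaussian_pdf(o, m, v) for w, m, v in zip(weights, means, variances)
    )


# Hypothetical two-component mixture over 2-dimensional features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
print(gmm_likelihood([0.2, -0.1], weights, means, variances))
```

In practice such densities are usually evaluated in the log domain to avoid underflow when many frames are multiplied together.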

19
Component 3: Pronunciation modeling
  • Model for different pronunciations of "you" in
    continuous speech
  • Other types of units: triphones, syllables

Each unit is an HMM
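
To make the idea of an HMM unit concrete, here is a minimal sketch of Viterbi decoding, which finds the most probable state sequence for a sequence of observations. The two states "j" and "u", the discrete observation symbols "A" and "B", and every probability are hypothetical values chosen for illustration. Real acoustic models use continuous densities over feature vectors.

```python
import math


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state model with discrete observation symbols.
states = ["j", "u"]
log_start = logs({"j": 0.9, "u": 0.1})
log_trans = {"j": logs({"j": 0.6, "u": 0.4}), "u": logs({"j": 0.1, "u": 0.9})}
log_emit = {"j": logs({"A": 0.8, "B": 0.2}), "u": logs({"A": 0.3, "B": 0.7})}

path, log_prob = viterbi(["A", "A", "B", "B"], states, log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working with log-probabilities turns products into sums, which keeps the scores numerically stable on long utterances.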
20
Component 4: Language model
  • Provides the probability of word sequences,
    $p(M)$, to combine with the acoustic model $p(X \mid M)$
  • Common: N-gram with smoothing and backoff; a very hard
    and specialized business
  • Just starting to integrate parsing
  • Fundamental equation: $\hat{M} = \arg\max_M p(M \mid X)
    = \arg\max_M p(X \mid M)\, p(M)$
  • Search: Viterbi, beam, A*, N-best
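
A minimal sketch of an N-gram language model may help here: a bigram model with add-one (Laplace) smoothing that assigns a log-probability $\log p(M)$ to a word sequence. Add-one smoothing is used only because it is the simplest scheme to show; the toy corpus and the function names are assumptions for this example, and production systems rely on more refined smoothing and backoff.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect history counts, bigram counts, and the vocabulary."""
    histories, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sent in sentences:
        words = sent.lower().split()
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(words)
        histories.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams, vocab


def log_prob(sentence, histories, bigrams, vocab):
    """log p(M) under the bigram model with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (histories[prev] + v))
    return total


# Hypothetical toy corpus.
corpus = ["I recognize speech", "I recognize words", "speech is nice"]
model = train_bigram(corpus)
lp_m1 = log_prob("I recognize speech", *model)
lp_m2 = log_prob("I wreck a nice beach", *model)
print(lp_m1, lp_m2)
```

Note that this sketch gives unseen words a smoothed score without reserving probability mass for them, so it is not a properly normalized model over an open vocabulary.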

21
ASR: example of a generative model
  • Components 2, 3 and 4 provide an instance of generative
    models
  • Language model generates word sequences
  • Word sequence generates pronunciation
  • Pronunciation generates acoustic features
  • Unsupervised learning/training
  • Maximum likelihood estimation
  • Expectation-Maximization algorithm (different
    incarnations)
  • Main focus of this class
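
As one possible incarnation of the Expectation-Maximization algorithm, the sketch below fits a two-component, one-dimensional Gaussian mixture by maximum likelihood without using any labels. The data, the starting values, the variance floor, and the function names are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, means, variances=(1.0, 1.0), weights=(0.5, 0.5),
                     iterations=50):
    """Run EM for a mixture of two 1-D Gaussians; return (weights, means, variances)."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        # E-step: posterior probability of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                     for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate the parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-3)  # floor to avoid a collapsing component
    return weights, means, variances


# Hypothetical unlabeled data drawn around -2 and +2.
data = [-2.2, -1.9, -2.0, -1.8, -2.1, 1.9, 2.1, 2.0, 2.2, 1.8]
weights, means, variances = em_two_gaussians(data, means=(-1.0, 1.0))
print(weights, means, variances)
```

Each iteration alternates between computing soft assignments (E-step) and re-estimating the parameters from them (M-step). The same pattern underlies HMM training.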

22
Other models to look at
  • Descriptive/maximum entropy models
  • Started in vision, then copied to speech, then
    NLP
  • Discriminative models: directly using data to
    construct classifiers, with weak assumptions
    about the probability distribution
  • Supervised learning, focus on the perspective of
    classification

[Figure: Machine learning approach to NLP. Input string -> count -> feature vector -> classifier -> output labels]
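
The pipeline from input string to counts to feature vector to classifier to output label can be sketched with word counts and a perceptron, one simple discriminative classifier. The perceptron is chosen here only for brevity and is not named in the slides. The training examples, the labels (+1 for a question, -1 otherwise), and the function names are hypothetical.

```python
from collections import Counter


def featurize(text):
    """Map an input string to a sparse feature vector of word counts."""
    return Counter(text.lower().split())


def predict(weights, features):
    """Return +1 or -1 from the sign of the weighted feature sum."""
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score >= 0 else -1


def train_perceptron(examples, epochs=10):
    """Supervised training: adjust weights only on misclassified examples."""
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            if predict(weights, feats) != label:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights


# Hypothetical labeled data: +1 = question, -1 = not a question.
examples = [
    ("how are you", 1),
    ("open the pod bay doors", -1),
    ("how is the weather", 1),
    ("recognize speech now", -1),
]
weights = train_perceptron(examples)
print([predict(weights, featurize(text)) for text, _ in examples])
```

Unlike the generative models above, nothing here models how the input string was produced. The weights are fit directly to separate the labels.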
23
Problem solved?
  • No, improvements are mostly due to larger
    training sets and speed-ups

Driven by Moore's law?
24
Challenges
  • Environment distortion (microphone, noise,
    cocktail party) breaks feature extraction
  • Acoustic condition mismatch
  • Between- and within-speaker variability breaks the
    pronunciation modeling and acoustic modeling
  • Conversational speech breaks the language model
  • Understanding these problems is crucial for
    improving the performance of ASR

25
Dreaming
  • 2001: A Space Odyssey (1968)

Dave: Open the pod bay doors, HAL.
HAL 9000: I'm sorry, Dave. I'm afraid I can't do
that.
26
The reality, before the problem is solved
  • Speech is used as a user interface only when
    people can't use their hands
  • Driving a car (use speech to drive?)
  • Device too small (cellphone)
  • Customer service (who will tolerate touch tone?)
  • Dictation (how many people actually use it?)

27
For next time
  • We will start with signal processing
  • Uses engineering math: power series
    (and their convergence), trigonometric functions,
    integration, and the representation of complex
    numbers.
  • If you have forgotten or do not know this material,
    please look for references and study it before
    class.