LING 439/539: Statistical Methods in Speech and Language Processing

Slides: 28

Provided by: Ying57

1
LING 439/539: Statistical Methods in Speech and Language Processing

Ying Lin
Department of Linguistics
University of Arizona

2
Welcome!

Get the syllabus
Fill out and return the information sheet
Email: yinglin_at_email.arizona.edu
Office: Douglass 224
OH: MW 2:00-3:00, by appointment (also teaching another undergrad class)
Course webpage: see syllabus
Listserv: coming soon.

3
438/538 and 439/539

LING 438/538 (Computational Linguistics)
Symbolic representations (mostly syntax), e.g.
FSA, CFG.
Focus on logic
Simple probabilistic models, e.g. N-grams.

4
438/538 and 439/539

This class complements 438/538
Numerical representations (speech signals): need digital signal processing
Focus on statistics/learning
More sophisticated probabilistic models, e.g. HMM, PCFG

5
Main reference texts (!)

Huang, Acero and Hon (2001). Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice-Hall.
Manning and Schütze (1999). Foundations of Statistical Natural Language Processing. MIT Press.
Rabiner and Juang (1993). Fundamentals of Speech Recognition. Prentice-Hall.
Duda, Hart and Stork (2001). Pattern Classification (2nd ed). John Wiley & Sons.
Rabiner and Schafer (1978). Digital Processing of Speech Signals. Prentice-Hall.
Hastie, Tibshirani and Friedman (2001). The Elements of Statistical Learning. Springer.

6
Guidelines for course reading

There is no single book that covers all of our material.
Most books are written for either an EE or a CS audience only.
A few chapters are selected from each book (see the reading list). Lecture notes will summarize the reading.
Expect a rough ride for the first time; feedback is greatly appreciated!

7
Three skills for this class

1. Linguistics: understanding the source of particular patterns.
2. Math/Statistics: underlying principles of the model.
3. Programming: implementation.
This class emphasizes 2. Reasons:
Models are based on simple structures
Programming skills require much practice

8
What is a statistical approach?

Narrow: uses statistical principles, i.e. based on the probability calculus or other theories of inductive inference
Compared to logic: deductive inference
Broad: any work that uses a quantitative measure of success
Relevant to both language engineering and linguistic science

9
What is a statistical approach?

Narrow: uses statistical principles, i.e. based on the probability calculus or other theories of inductive inference
Compared to logic: deductive inference
Broad: any work that uses a quantitative measure of success
Relevant to both language engineering and linguistic science

This course
10
Language engineering: speech recognition

Tasks: increasing level of difficulty

[Figure: tasks of increasing difficulty; axis: Word Error Rate]
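
Word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The slide does not spell out the formula, so the following is only a minimal sketch of that standard definition; the function name and the example strings are chosen here for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One match, two substitutions and two insertions against three reference words
wer = word_error_rate("i recognize speech", "i wreck a nice beach")
```

Because insertions count as errors, the rate can exceed 1, as it does in this example.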
11
A brief history of speech recognition

1950s: U.S. government started funding research on automatic recognition of speech
1960-70s: Isolated words, digit strings
Debate: rules vs. statistics
Dynamic time warping
1980-now: continuous speech, speech understanding, spoken dialog
Hidden Markov model dominates

12
Why didn't the rules work?

Completely bottom-up approach
Rules are hand-coded by experts
Problem: variability in speech
Sophisticated, symbolic rules are not flexible enough to handle continuous speech

[Figure: phonetic rules and phonological rules applied to "How are you?"; the phone string survives only as "h A U A? j o U"]
13
The rise of statistical methods in speech

Initial solution: hire many linguists to continually improve the rule system
This turned out to be costly and slow, failing the high expectations
Advantages of statistical models:
Allow training on different data: flexible, scalable
Computing power is much cheaper than experts
Drives the move to less and less constrained tasks
Bitterness: "Every time I fire a linguist, the performance of the speech recognizer goes up" (F. Jelinek, IBM)

14
The rise of statistics in NLP

Very similar scenarios also happened in NLP
E.g. tagging, parsing, machine translation
Old NLP: deductive systems, hand-coded
New NLP: broad-coverage, corpus-based, emphasizes training and evaluation
Speech is now merging with NLP
Many tools originated in speech, then got copied to NLP
New tasks keep emerging: web as an (unstructured) data source

15
Basic architecture of today's ASR system

Audio speech -> Feature extraction -> $X$
Acoustic modeling: likelihood $p(X \mid M_1)$, $p(X \mid M_2)$
Language model: $p(M_1)$, $p(M_2)$
Scoring: rank -> ANSWER
Model parameters are trained offline
$M_1$: I recognize speech
$M_2$: I wreck a nice beach
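
The scoring step in this architecture can be sketched as follows: each candidate word sequence gets an acoustic likelihood and a language model probability, and the candidates are ranked by their product (summed in the log domain). All numbers below are hypothetical and were invented for this illustration; a real system would obtain them from trained models.

```python
import math

# Hypothetical scores, invented for illustration only
hypotheses = {
    "I recognize speech": {"acoustic_loglik": -120.0, "lm_prob": 1e-6},
    "I wreck a nice beach": {"acoustic_loglik": -118.5, "lm_prob": 1e-9},
}


def rank(hyps):
    """Sort hypotheses by log p(X|M) + log p(M), best first."""
    scored = []
    for words, s in hyps.items():
        total = s["acoustic_loglik"] + math.log(s["lm_prob"])
        scored.append((total, words))
    scored.sort(reverse=True)
    return scored


ranking = rank(hypotheses)
answer = ranking[0][1]
```

In this toy setting the second hypothesis fits the audio slightly better, but the language model prior outweighs that difference, so the first one is returned as the answer.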
16
Component 1: signal processing / feature extraction

First 1/3 of the course (also useful for understanding synthesis)
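
As a rough illustration of what feature extraction does, the sketch below cuts a signal into short overlapping frames and computes two simple per-frame features, windowed log energy and zero-crossing rate. The slides do not say which features the course uses, so the choice of features, the frame sizes, and the synthetic test signal are all assumptions made for this example.

```python
import math


def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def log_energy(frame, window):
    """Log of the windowed frame energy (small floor avoids log(0))."""
    energy = sum((w * x) ** 2 for w, x in zip(window, frame))
    return math.log(energy + 1e-12)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Synthetic signal (assumption): 0.1 s of silence, then 0.1 s of a 200 Hz tone at 8 kHz
rate = 8000
signal = [0.0] * 800 + [math.sin(2 * math.pi * 200 * t / rate) for t in range(800)]
frame_len, hop = 200, 80  # 25 ms frames with a 10 ms hop
window = hamming(frame_len)
features = [(log_energy(f, window), zero_crossing_rate(f))
            for f in frames(signal, frame_len, hop)]
```

Each entry of `features` is one feature vector, so the sequence plays the role of the observations passed on to the acoustic model.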

17
Examples of some common features
18
Component 2: Acoustic models

Mixture of Gaussians: $p(o_t \mid q_i)$ = ?
Dimension reduction: principal component analysis, linear discriminant analysis, parameter tying
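
A mixture of Gaussians scores an observation vector by a weighted sum of Gaussian densities. The sketch below evaluates the log of such a density for one state, assuming diagonal covariances (a common simplification, not something the slide states). The parameter values are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_gmm(x, weights, means, variances):
    """Log of the mixture density, using log-sum-exp for numerical stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical 2-component, 2-dimensional mixture for a single state
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
near = log_gmm([0.1, -0.2], weights, means, variances)
far = log_gmm([10.0, 10.0], weights, means, variances)
```

An observation close to one of the component means (`near`) receives a higher log density than one far from both (`far`).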

19
Component 3: Pronunciation modeling

Model for different pronunciations of "you" in continuous speech
Other types of units: triphones, syllables

Each unit is an HMM
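
To make "each unit is an HMM" concrete, the sketch below computes the probability of an observation sequence under a small left-to-right HMM with the forward algorithm. It uses discrete observation symbols for brevity, and the two-state model and its probabilities are hypothetical; a real acoustic unit would emit continuous feature vectors.

```python
def forward(obs, init, trans, emit):
    """Probability of the observation sequence under a discrete-emission HMM."""
    n = len(init)
    # alpha[s] = p(observations so far, current state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


# Hypothetical 2-state left-to-right model for one unit
init = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [{"a": 0.9, "b": 0.1},
        {"a": 0.2, "b": 0.8}]
likelihood = forward(["a", "b"], init, trans, emit)
```

Summing over states at each step keeps the cost linear in the sequence length, instead of enumerating every possible state path.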
20
Component 4: Language model

Provides the probability of word sequence models, $p(M)$, to combine with the acoustic model $p(X \mid M)$
Common: N-gram with smoothing, backoff; a very hard and specialized business
Just starting to integrate parsing
Fundamental equation: $\hat{M} = \arg\max_M p(M \mid X) = \arg\max_M p(X \mid M)\, p(M)$
Search: Viterbi, beam, A*, N-best
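
A minimal sketch of an N-gram language model with smoothing follows. It is a bigram model with add-one (Laplace) smoothing, chosen here only because it is the simplest scheme to show; the slide does not name a particular smoothing method, and the two training sentences are made up.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect history counts, bigram counts, and the vocabulary."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])            # counts of each word as a history
        bigrams.update(zip(words, words[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab


def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """Add-one smoothed p(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))


def sentence_prob(sentence, model):
    """Probability of a whole sentence as a product of bigram probabilities."""
    unigrams, bigrams, vocab = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word, unigrams, bigrams, vocab)
    return p


model = train_bigram(["i recognize speech", "i recognize words"])
p_seen = sentence_prob("i recognize speech", model)
p_unseen = sentence_prob("words recognize speech", model)
```

Smoothing gives the unseen word order a small but nonzero probability, so an acoustically plausible hypothesis is not ruled out just because its bigrams never occurred in training.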

21
ASR: example of a generative model

Components 2, 3 and 4 provide an instance of generative models
Language model M generates word sequences
Word sequence generates pronunciation
Pronunciation generates acoustic features
Unsupervised learning/training
Maximum likelihood estimation
Expectation-Maximization algorithm (different incarnations)
Main focus of this class
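
One of the simplest incarnations of the Expectation-Maximization algorithm is fitting a one-dimensional Gaussian mixture to unlabeled data by maximum likelihood. The sketch below is an illustration under assumptions: the data, the starting values, and the small variance floor are all choices made for this example.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """EM for a 1-D Gaussian mixture; returns parameters and the log-likelihood trace."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    trace = []
    for _ in range(iterations):
        # E-step: posterior responsibility of each component for each point
        resp = []
        loglik = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        trace.append(loglik)
        # M-step: re-estimate the parameters from the soft counts
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor (assumption) to avoid collapse
    return means, variances, weights, trace


# Made-up unlabeled data with two clusters, around -2 and 3
data = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.1, 2.9, 3.2, 2.8]
means, variances, weights, trace = em_gmm_1d(
    data, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

No labels say which point belongs to which component; the E-step fills in that missing information softly, and the log-likelihood recorded in `trace` should not decrease from one iteration to the next.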

22
Other models to look at

Descriptive/maximum entropy models
Started in vision, then copied to speech, then NLP
Discriminative models: directly use data to construct classifiers, with weak assumptions about the probability distribution
Supervised learning, focus on the perspective of classification

Machine learning approach to NLP:
Input string -> (count) -> Feature vector -> (classifier) -> Output labels
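
The pipeline from input string to output label can be sketched with word counts as the feature vector and a perceptron as the classifier. The perceptron is only one possible discriminative classifier, picked here for brevity; the slides do not name one, and the toy data and labels are hypothetical.

```python
from collections import Counter


def featurize(text):
    """Input string -> feature vector of word counts."""
    return Counter(text.lower().split())


def predict(weights, features):
    """Linear classifier: the sign of the weighted feature sum."""
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score > 0 else -1


def train_perceptron(examples, epochs=10):
    """Update the weights only when the current prediction is wrong."""
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            if predict(weights, feats) != label:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights


# Hypothetical toy data: +1 = question, -1 = statement
examples = [
    ("how are you", 1),
    ("how is the weather", 1),
    ("i recognize speech", -1),
    ("the weather is nice", -1),
]
weights = train_perceptron(examples)
labels = [predict(weights, featurize(t)) for t, _ in examples]
```

Unlike the generative models above, nothing here models how the strings were produced; the labeled examples are used directly to find a decision boundary.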
23
Problem solved?

No, improvements are mostly due to larger training sets and speed-ups

Driven by Moore's law?
24
Challenges

Environment distortion (microphone, noise,
cocktail party) breaks feature extraction
Acoustic condition mismatch
Between- and within-speaker variability breaks the pronunciation modeling and acoustic modeling
Conversational speech breaks the language model
Understanding these problems is crucial for
improving the performance of ASR

25
Dreaming

2001: A Space Odyssey (1968)

Dave: Open the pod bay doors, HAL.
HAL 9000: I'm sorry, Dave. I'm afraid I can't do that.
26
The reality, before the problem is solved

Speech is used as a user interface only when people can't use their hands
Driving a car (use speech to drive?)
Device too small (cellphone)
Customer service (who will tolerate touch tone?)
Dictation (how many people actually use it?)

27
For next time

We will start with signal processing
Uses engineering math, including power series (and their convergence), trigonometric functions, integration, and the representation of complex numbers.
If you have forgotten or do not know this material, please look up references and study it before class.
