Title: LING 439/539: Statistical Methods in Speech and Language Processing
1. LING 439/539: Statistical Methods in Speech and Language Processing
- Ying Lin
- Department of Linguistics
- University of Arizona
2. Welcome!
- Get the syllabus
- Fill out and return the information sheet
- Email: yinglin_at_email.arizona.edu
- Office: Douglass 224
- Office hours: MW 2:00-3:00 or by appointment (also teaching another undergrad class)
- Course webpage: see syllabus
- Listserv: coming soon
3. 438/538 and 439/539
- LING 438/538 (Computational Linguistics)
- Symbolic representations (mostly syntax), e.g. FSA, CFG
- Focus on logic
- Simple probabilistic models, e.g. N-grams
4. 438/538 and 439/539
- This class complements 438/538
- Numerical representations (speech signals): need digital signal processing
- Focus on statistics/learning
- More sophisticated probabilistic models, e.g. HMM, PCFG
5. Main reference texts (!)
- Huang, Acero and Hon (2001). Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice-Hall.
- Manning and Schütze (1999). Foundations of Statistical Natural Language Processing. MIT Press.
- Rabiner and Juang (1993). Fundamentals of Speech Recognition. Prentice-Hall.
- Duda, Hart and Stork (2001). Pattern Classification (2nd ed.). John Wiley & Sons.
- Rabiner and Schafer (1978). Digital Processing of Speech Signals. Prentice-Hall.
- Hastie, Tibshirani and Friedman (2001). The Elements of Statistical Learning. Springer.
6. Guidelines for course reading
- No single book covers all of our material.
- Most books are written for either an EE or a CS audience only.
- A few chapters are selected from each book (see the reading list). Lecture notes will summarize the reading.
- Expect a rough ride the first time around; feedback is greatly appreciated!
7. Three skills for this class
- 1. Linguistics: understanding the source of particular patterns
- 2. Math/Statistics: underlying principles of the model
- 3. Programming: implementation
- This class emphasizes 2. Reasons:
- Models are based on simple structures
- Programming skills require much practice
8. What is a statistical approach?
- Narrow: uses statistical principles, i.e. based on the probability calculus or other theories of inductive inference
- Compared to logic: deductive inference
- Broad: any work that uses a quantitative measure of success
- Relevant to both language engineering and linguistic science
9. What is a statistical approach?
- Narrow: uses statistical principles, i.e. based on the probability calculus or other theories of inductive inference
- Compared to logic: deductive inference
- Broad: any work that uses a quantitative measure of success
- Relevant to both language engineering and linguistic science
This course
10. Language engineering: speech recognition
- Tasks: increasing level of difficulty
[Figure; axis label: Word Error Rate]
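Word error rate is the measure named on this slide. A minimal sketch of how it is commonly computed, assuming the usual definition (word-level edit distance divided by the number of reference words); the function name `word_error_rate` is made up for this illustration:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words (starts with the empty prefix).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # cost 0 if the words match
            deletion = prev[j] + 1                 # reference word dropped
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("i recognize speech", "i wreck a nice beach")
```

Because insertions count as errors, the rate can exceed 1 when the hypothesis is much longer than the reference.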
11. A brief history of speech recognition
- 1950s: U.S. government started funding research on automatic recognition of speech
- 1960-70s: Isolated words, digit strings
- Debate: rules vs. statistics
- Dynamic time warping
- 1980-now: continuous speech, speech understanding, spoken dialog
- Hidden Markov model dominates
12. Why didn't the rules work?
- Completely bottom-up approach
- Rules are hand-coded by experts
- Problem: variability in speech
- Sophisticated, symbolic rules are not flexible enough to handle continuous speech
[Figure labels: Phonetic rules; Phonological rules; "How are you?"; h A U A? j o U]
13. The rise of statistical methods in speech
- Initial solution: hire many linguists to continually improve the rule system
- This turned out to be costly and slow, failing to meet the high expectations
- Advantages of statistical models
- Allow training on different data: flexible, scalable
- Computing power is much cheaper than experts
- Drive the move to less and less constrained tasks
- Bitterness: "Every time I fire a linguist, the performance of the speech recognizer goes up" -- F. Jelinek (IBM)
14. The rise of statistics in NLP
- Very similar scenarios also happened in NLP
- E.g. tagging, parsing, machine translation
- Old NLP: deductive systems, hand-coded
- New NLP: broad-coverage, corpus-based, emphasizes training and evaluation
- Speech is now merging with NLP
- Many tools originated in speech, then got copied to NLP
- New tasks keep emerging: web as an (unstructured) data source
15. Basic architecture of today's ASR system
[Figure: block diagram. Audio speech -> Feature extraction -> X -> Scoring -> rank -> ANSWER]
- Acoustic modeling: likelihood p(X|M1), p(X|M2)
- Language model: p(M1), p(M2)
- Model parameters trained offline
- M1: "I recognize speech"; M2: "I wreck a nice beach"
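The scoring and ranking step in this diagram can be sketched as follows. This is only an illustration: the probability values are invented, and `best_hypothesis` is a hypothetical helper name. It combines the acoustic likelihood p(X|M) with the language model prior p(M) in the log domain and ranks the two hypotheses.

```python
import math


def best_hypothesis(acoustic_likelihood, language_prior):
    """Return the hypothesis M maximizing log p(X|M) + log p(M), plus all scores."""
    scores = {
        m: math.log(acoustic_likelihood[m]) + math.log(language_prior[m])
        for m in acoustic_likelihood
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[0], scores


# Invented numbers, for illustration only: the acoustics slightly prefer M2,
# but the language model strongly prefers M1.
acoustic = {"I recognize speech": 1e-5, "I wreck a nice beach": 2e-5}
prior = {"I recognize speech": 1e-3, "I wreck a nice beach": 1e-5}

best, scores = best_hypothesis(acoustic, prior)
```

Working with log probabilities avoids numerical underflow when many small probabilities are multiplied, which matters once X is a long sequence of feature vectors.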
16. Component 1: signal processing / feature extraction
- First 1/3 of the course (also useful for understanding synthesis)
17. Examples of some common features
18. Component 2: Acoustic models
- Mixture of Gaussians: p(o_t | q_i) = ?
- Dimension reduction: principal component analysis, linear discriminant analysis, parameter tying
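As a rough illustration of the mixture-of-Gaussians idea, the sketch below evaluates log p(o_t | q_i) for one state. It assumes the standard form p(o | q) = sum_k c_k N(o; mu_k, var_k) with diagonal covariances, which the slide itself does not spell out; the function names and parameter layout are made up for this example.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at the vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, weights, means, variances):
    """log of sum_k c_k N(x; mu_k, var_k), computed with log-sum-exp."""
    terms = [
        math.log(c) + log_gaussian_diag(x, m, v)
        for c, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    # Subtracting the largest term keeps exp() from underflowing.
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical two-component mixture over 2-dimensional feature vectors.
value = log_gmm(
    [0.5, -0.2],
    weights=[0.6, 0.4],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 0.5]],
)
```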
19. Component 3: Pronunciation modeling
- Model for different pronunciations of "you" in continuous speech
- Other types of units: triphones, syllables
Each unit is an HMM
20. Component 4: Language model
- Provides the probability of word sequences: models p(M), to combine with the acoustic model p(X|M)
- Common: N-gram with smoothing and backoff; a very hard and specialized business
- Just starting to integrate parsing
- Fundamental equation: M* = argmax_M p(M|X) = argmax_M p(X|M) p(M)
- Viterbi, beam, A*, N-best search
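To make "N-gram with smoothing" concrete, here is a minimal bigram sketch using add-one (Laplace) smoothing. Add-one is chosen here only because it is the simplest scheme to show, not because the slide prescribes it; the function names and the two training sentences are assumptions for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one estimate of p(word | prev); unseen bigrams get non-zero mass."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


unigrams, bigrams = train_bigram(["i recognize speech", "i wreck a nice beach"])
seen = bigram_prob("i", "recognize", unigrams, bigrams)   # observed bigram
unseen = bigram_prob("i", "beach", unigrams, bigrams)     # never observed
```

A sentence probability p(M) is then the product of these conditional probabilities over consecutive word pairs. Backoff schemes replace the uniform add-one mass with lower-order estimates.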
21. ASR: example of a generative model
- Components 2, 3 and 4 provide an instance of generative models
- Language M generates word sequences
- Word sequence generates pronunciation
- Pronunciation generates acoustic features
- Unsupervised learning/training
- Maximum likelihood estimation
- Expectation-Maximization algorithm (different incarnations)
- Main focus of this class
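One simple incarnation of Expectation-Maximization is fitting a two-component, one-dimensional Gaussian mixture by maximum likelihood. The sketch below is an illustration under that assumption, with invented data, invented starting values and an arbitrary variance floor; it is not the specific formulation used later in the course.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, means, variances=(1.0, 1.0), weights=(0.5, 0.5),
                     iterations=50):
    """Fit a 2-component 1-D Gaussian mixture with EM from the given start."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor (arbitrary) avoids collapse
    return weights, means, variances


# Invented data: two well-separated clusters around -2 and +2.
data = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
weights, means, variances = em_two_gaussians(data, means=(-1.0, 1.0))
```

The training is unsupervised in the sense of the slide: the data carry no component labels, and the E-step fills them in as posterior probabilities.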
22. Other models to look at
- Descriptive/maximum entropy models
- Started in vision, then copied to speech, then NLP
- Discriminative models: directly using data to construct classifiers, with weak assumptions about the probability distribution
- Supervised learning, focus on the perspective of classification
[Figure: Machine learning approach to NLP. Input string -> count -> Feature vector -> classifier -> Output labels]
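The pipeline in this figure (input string, counts, feature vector, classifier, output label) can be sketched with word counts as features and a perceptron as the classifier. The perceptron is chosen here only as a simple discriminative classifier; the slide does not name one. The toy examples and labels are invented.

```python
from collections import Counter


def count_features(text):
    """Map an input string to a sparse feature vector of word counts."""
    return Counter(text.lower().split())


def score(weights, features):
    return sum(weights[f] * c for f, c in features.items())


def train_perceptron(examples, epochs=10):
    """examples: list of (text, label) pairs with label in {+1, -1}."""
    weights = Counter()
    for _ in range(epochs):
        for text, label in examples:
            features = count_features(text)
            if label * score(weights, features) <= 0:  # wrong or on the boundary
                for f, c in features.items():
                    weights[f] += label * c
    return weights


def predict(weights, text):
    return 1 if score(weights, count_features(text)) > 0 else -1


# Invented toy data: +1 for speech-related strings, -1 for beach-related ones.
examples = [
    ("recognize speech", 1),
    ("wreck a nice beach", -1),
    ("speech recognition", 1),
    ("nice sandy beach", -1),
]
weights = train_perceptron(examples)
```

Unlike the generative models on the previous slides, nothing here models how the input string was produced; the labeled data are used directly to place a decision boundary.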
23. Problem solved?
- No, improvements are mostly due to larger training sets and speed-ups
Driven by Moore's law?
24. Challenges
- Environment distortion (microphone, noise, cocktail party) breaks feature extraction
- Acoustic condition mismatch
- Between- and within-speaker variability breaks the pronunciation modeling and acoustic modeling
- Conversational speech breaks the language model
- Understanding these problems is crucial for improving the performance of ASR
25. Dreaming
- 2001: A Space Odyssey (1968)
Dave: Open the pod bay doors, HAL.
HAL 9000: I'm sorry, Dave. I'm afraid I can't do that.
26. The reality, before the problem is solved
- Speech is used as a user interface only when people can't use their hands
- Driving a car (use speech to drive?)
- Device too small (cellphone)
- Customer service (who will tolerate touch tone?)
- Dictation (how many people actually use it?)
27. For next time
- We will start with signal processing
- Uses engineering math, including power series (and their convergence), trigonometric functions, integration, and the representation of complex numbers.
- If you have forgotten or do not know this material, please look for references and study it before class.