Speech Communication - PowerPoint PPT Presentation

About This Presentation
Title:

Speech Communication

Description:

Phonetics vs. orthography. Letter-phoneme mapping is not 1-to-1: ... Phonetic vs. orthographic transcriptions. Intelligible by native speakers ... – PowerPoint PPT presentation

Slides: 24
Provided by: philipj5
Transcript and Presenter's Notes


1
Speech Communication
EEM.ssr: Speaker & Speech Recognition
  • by Dr Philip Jackson

Lecturer in speech & audio, Centre for Vision,
Speech & Signal Processing, Department of
Electronic Engineering.
http://www.ee.surrey.ac.uk/Teaching/Courses/eem.ssr
2
World of speech technologies
Spoken language generation
Spoken language understanding
Spoken dialogue processing
Automatic speech recognition
Emotion in speech
Speaker recognition
Speech technology
Speech perception
Phonetics
Speech enhancement
Speech production
Speech modification
Speech analysis
Speech synthesis
Speech coding
3
Speech-related disciplines
Linguistics
Psychology
Maths & stats
Phonetics
Speech science
Acoustics
Computer science
Signal processing
Electronics
4
The speech chain
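[Figure: the speech chain. SPEAKER: linguistic level, then
physiological level (motor nerves, vocal muscles). SOUND WAVES:
acoustic level. LISTENER: physiological level (ear, sensory
nerves), then linguistic level. A FEEDBACK LINK runs through the
speaker's own ear and sensory nerves.]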
5
It comes as naturally as breathing
  • Speech is man's preferred modality
  • Can use natural language for interacting with
    complex systems
  • Hands-free
  • Eyes-free
  • Small footprint
  • Requires no training

6
Ideas and language
  • Ideas are concepts or abstract notions
  • Language has a grammar and syntax, and is made up
    of words
  • Develop our understanding of the world at the
    same time as we are learning to talk
  • Many of our thoughts are framed in terms of words
  • Language (and culture) affect the way we think

7
Written vs. spoken language
  • Written language
      • discrete words separated by spaces
      • usually complete, correct spelling
      • opportunity to skip, skim or re-read
  • Spoken language
      • continuous sequence of sounds, usually without
        spaces
      • often damaged, interrupted, parts mumbled
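
The lack of spaces in spoken language can be illustrated with a minimal sketch. It uses letters as a stand-in for sounds and a toy lexicon (both are assumptions for illustration, not part of the lecture) and lists every way an unbroken stream can be split into known words.

```python
def segmentations(stream, lexicon):
    """Return every way to split `stream` into words found in `lexicon`."""
    if not stream:
        return [[]]  # one way to segment nothing: the empty segmentation
    results = []
    for end in range(1, len(stream) + 1):
        word = stream[:end]
        if word in lexicon:
            # Keep this word and segment whatever follows it.
            for rest in segmentations(stream[end:], lexicon):
                results.append([word] + rest)
    return results


# Toy, hypothetical lexicon; a real recogniser works on sounds, not letters.
LEXICON = {"a", "an", "ice", "nice", "man"}

for words in segmentations("aniceman", LEXICON):
    print(" ".join(words))
```

The same stream yields both "a nice man" and "an ice man", so a listener (or a recogniser) needs context to pick one.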

8
Speech is not acoustic text
9
Sounds and words
  • Phonetics
      • How speech sounds are produced
      • Acoustic result of speech articulation
  • Phonology
      • How sounds are used to make words
      • The functions of the sounds within a particular
        language

10
Acoustic signal
  • Sound produced by vibration of vocal cords
  • Sound modified by resonances of the vocal tract
  • International Phonetic Alphabet (IPA)
      • phoneme: the smallest unit in speech where
        substitution could change meaning
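
The first two points (a source at the vocal cords, shaped by vocal tract resonances) can be sketched as a pulse train passed through a cascade of two-pole resonators. This is a minimal illustration, not the lecturer's method: the sample rate, pitch, and the two resonance frequencies and bandwidths are assumed values chosen only to give a vowel-like result.

```python
import math


def impulse_train(f0, fs, n):
    """Crude voiced source: one unit pulse per pitch period."""
    period = round(fs / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(x, freq, bandwidth, fs):
    """Two-pole filter modelling one vocal tract resonance."""
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius (< 1, so stable)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


FS = 8000                                # sample rate in Hz (assumed)
source = impulse_train(100, FS, 800)     # 100 Hz pitch, 0.1 s of signal
vowel = source
# Illustrative resonance (frequency, bandwidth) pairs in Hz; assumed values.
for freq, bandwidth in [(700, 80), (1200, 90)]:
    vowel = resonator(vowel, freq, bandwidth, FS)
```

Changing the pulse spacing changes the pitch, while changing the resonance frequencies changes the vowel quality; the two are controlled independently.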

11
Speech sounds
  • Speech production
      • Articulators: how do they affect the speech
        waveform?
  • Phonemes
      • What are they, why are they useful?
      • Phonemes are speech sounds in an ideal world.
  • Phonetics
      • How are phonemes actually realised?
      • Phones are speech sounds in the real world.
      • Allophones are different types of realisation.
  • The wider context
      • Language, accent
      • Speaker differences
      • Effect of external factors

12
Vowels, consonants and syllables
  • Vowels
      • Vibrating vocal cords in larynx with clear vocal
        tract
      • Produced using slower extrinsic muscles
  • Consonants
      • Usually some occlusion of the vocal tract
      • Sound source can be from larynx, click or hiss
      • Produced using faster intrinsic muscles
  • Syllables
      • All languages have CV syllables
      • Basic unit of articulation
      • Consonant clusters

13
Phonetics vs. orthography
  • Letter-phoneme mapping is not 1-to-1
  • Some sounds require several letters, e.g., sh, ph
  • Some letters have several pronunciations, e.g., g, c
  • Some sounds have several transcriptions,
    e.g., /f/: f and ph
  • Some letters produce several sounds, e.g., x: /ks/
  • Some combinations have complex relations,
    e.g., -ough-
  • Different accents use different phonemes, e.g., bath
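
The many-to-many mapping can be made concrete with a small sketch. The word list and the hand alignments below are illustrative assumptions (broad transcriptions in one possible accent), not data from the lecture.

```python
from collections import defaultdict

# Hand-aligned (letters, phoneme) pairs for a few words; illustrative only.
ALIGNED = {
    "ship":  [("sh", "ʃ"), ("i", "ɪ"), ("p", "p")],
    "fish":  [("f", "f"), ("i", "ɪ"), ("sh", "ʃ")],
    "graph": [("g", "g"), ("r", "r"), ("a", "æ"), ("ph", "f")],
    "gem":   [("g", "dʒ"), ("e", "ɛ"), ("m", "m")],
    "box":   [("b", "b"), ("o", "ɒ"), ("x", "ks")],
}


def mapping_tables(aligned):
    """Collect letters -> phonemes and phoneme -> letters from the alignments."""
    letters_to_phonemes = defaultdict(set)
    phoneme_to_letters = defaultdict(set)
    for pairs in aligned.values():
        for letters, phoneme in pairs:
            letters_to_phonemes[letters].add(phoneme)
            phoneme_to_letters[phoneme].add(letters)
    return letters_to_phonemes, phoneme_to_letters


g2p, p2g = mapping_tables(ALIGNED)
print(sorted(g2p["g"]))   # one letter, several pronunciations
print(sorted(p2g["f"]))   # one sound, several spellings
print(sorted(g2p["x"]))   # one letter, a sequence of two sounds
```

Because neither table is one-to-one, a recogniser generally relies on a pronunciation dictionary rather than on spelling rules alone.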

14
Prosody
  • Pitch
      • Corresponds to the frequency of vibration of the
        vocal cords
      • (Has phonetic significance in tonal languages)
  • Intensity
      • How loud a particular word or syllable is
  • Timing
      • Durations depend on the phrasing (punctuation),
        context (cf. league, leek), etc.
      • Stress-timed vs. syllable-timed languages
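
Since pitch corresponds to the vocal cord vibration frequency, it can be estimated from the waveform by finding the delay at which the signal best matches itself. The sketch below uses plain autocorrelation on a synthetic sine tone; the function name, search range and test signal are assumptions for illustration, and real speech needs more care (for example, against octave errors).

```python
import math


def estimate_pitch(signal, fs, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) as the best autocorrelation lag."""
    min_lag = int(fs / fmax)   # shortest period considered
    max_lag = int(fs / fmin)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(signal[i] * signal[i + lag]
                    for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag


FS = 8000  # sample rate in Hz (assumed)
tone = [math.sin(2.0 * math.pi * 125.0 * n / FS) for n in range(800)]
print(estimate_pitch(tone, FS))
```

For this 125 Hz tone the period is exactly 64 samples at 8 kHz, so the estimate comes out at 125 Hz.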

15
Language, accent and dialect
  • Language
      • A system of communication with a vocabulary of
        words, grammar and syntax
      • Different languages have different phonetic
        contrasts (right, light)
  • Accent
      • Pronunciation variations that do not affect
        meaning of spoken utterance (good, food)
      • Intelligible by native speakers
  • Dialect
      • Variations in vocabulary, and possibly other
        aspects, for distinct population

16
Non-acoustic signals
  • Many other sources of information from other
    senses: face, body, gesture, touch
      • can make you hear things differently
  • Lip reading
      • Information about articulation can be derived
        from (peripherally) observing lips
      • Major cue for hearing impaired
      • Significant effect for normal hearers (McGurk)
  • Para-linguistic information
      • Facial mood and emotion
      • Culturally-grounded gestures
      • Modifying gestures
      • Body language
      • Stress and emphasis

17
Complexity demands intelligence
  • Speech is very complex
      • requires fusion of many sources of knowledge
  • Humans have developed large brains and supreme
    intelligence in the animal kingdom to deal with
    it
      • very large number of neurons, in parallel

18
Summary of speech comm.
  • Speech is man's natural modality for interacting
    with machines
  • Ideas and language
  • Written vs. spoken language (phonology)
  • Continuous acoustic signal (phonetics)
  • Phonemes, phones and allophones
  • Vowels and consonants
  • Phonetic vs. orthographic transcriptions
  • Intelligible by native speakers
  • Para-linguistic information
  • Prosody: intensity, pitch and timing
  • Language, accent and dialect
  • Visual, haptic and contextual information

19
Speech recognition
20
What is speech recognition?
  • Types of spoken language processing
      • Automatic speech recognition (ASR)
      • Spoken language understanding
      • Dialogue systems
      • Paralinguistic speech processing
      • Speech verification
      • Speech coding, enhancement & modification
      • Speech synthesis
      • Spoken language generation
      • Speaker recognition: identification and
        authentication/verification

21
Speech recognition problem
  • The dream and reality
      • Intelligent machines?
      • Size of vocabulary: 50, 1000, 20000 words
      • Speaker-dependent/-independent
  • Discovering our ignorance
      • How does the ear work?
      • How does the brain process sounds to perceive
        concepts?
  • Circumventing our ignorance
      • Ad-hoc rules vs. pattern matching techniques
      • Most successful based on stochastic modelling
      • Recent advances in neural network approaches
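
One classic pattern matching technique for isolated words is dynamic time warping (DTW), which compares an utterance with stored templates while allowing for differences in speaking rate. The slides do not name DTW, so this is an assumed example; the one-dimensional "features" and the two templates are hypothetical stand-ins for the spectral vectors a real system would use.

```python
def dtw_distance(a, b):
    """Cost of the best time alignment between feature sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or step both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def recognise(utterance, templates):
    """Return the word whose template is closest to the utterance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))


# Hypothetical one-dimensional feature tracks, one template per word.
TEMPLATES = {"yes": [1, 3, 4, 3, 1], "no": [4, 4, 2, 1, 1]}
slow_yes = [1, 1, 3, 3, 4, 4, 3, 1]   # same shape as "yes", spoken slowly

print(recognise(slow_yes, TEMPLATES))
```

The slowed-down utterance aligns with the "yes" template at zero cost, even though the two sequences differ in length.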

22
Dimensions of difficulty
  • Speaker dependency
  • Vocabulary size
  • Isolated words vs. continuous speech
  • Language constraints and knowledge sources
  • Acoustic ambiguity
  • Noise robustness

23
Speech recognition summary
  • Dream and reality
      • Speech-to-text machines
      • Vocabulary size and speaker-dependency trade off
        against recognition accuracy
  • Incomplete specification
      • Of language, of the human ear, the auditory
        nerves and of how the cortex processes speech to
        derive meaning
  • An engineering solution
      • Use pattern matching techniques
      • Most successful based on Hidden Markov Models
      • Recent advances in HMM/ANN hybrids
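
As a rough illustration of recognition with Hidden Markov Models, the sketch below scores an observation sequence against two tiny discrete word models with the forward algorithm and picks the more likely one. The models, their probabilities and the two-symbol alphabet are invented for illustration; practical systems use many more states and continuous acoustic features.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for symbol in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][symbol]
                 for s in states]
    return sum(alpha)


# Two hypothetical left-to-right word models: (start, transitions, emissions).
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "word_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}


def recognise_hmm(obs, models):
    """Return the model name with the highest likelihood for obs."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


observed = [0, 0, 1, 1]   # symbol 0 for a while, then symbol 1
print(recognise_hmm(observed, MODELS))
```

Here "word_a" expects symbol 0 followed by symbol 1, so it scores far higher on the observed sequence than "word_b".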