Time-Domain Methods for Speech Processing

Title:

Time-Domain Methods for Speech Processing

Description:

Short-Time Average Zero Crossing Rate. Speech vs. Silence Discrimination ... Weak plosive bursts (/p/, /t/, /k/) at the beginning or end. Nasals at the end. ...

Slides: 78
Provided by: aimm02Cs

Transcript and Presenter's Notes

1
Time-Domain Methods for Speech Processing

2
Contents
  • Introduction
  • Time-Dependent Processing of Speech
  • Short-Time Energy and Average Magnitude
  • Short-Time Average Zero Crossing Rate
  • Speech vs. Silence Discrimination Using Energy
    and Zero-Crossing
  • The Short-Time Autocorrelation Function
  • The Short-Time Average Magnitude Difference
    Function

3
Time-Domain Methods for Speech Processing
  • Introduction

4
Speech Processing Methods
  • Time-Domain Method
  • Involves the waveform of the speech signal directly.
  • Frequency-Domain Method
  • Involves some form of spectrum representation.

5
Time-Domain Measurements
  • Average zero-crossing rate, energy, and the
    autocorrelation function.
  • Very simple to implement.
  • Provide a useful basis for estimating important
    features of the speech signal, e.g.,
  • Voiced/unvoiced classification
  • Pitch estimation

6
Time-Domain Methods for Speech Processing
  • Time-Dependent Processing of Speech

7
Time-Dependent Nature of Speech
This is a test.
8
Time-Dependent Nature of Speech
9
Short-Time Behavior of Speech
  • Assumption
  • The properties of the speech signal change slowly
    with time.
  • Analysis Frames
  • Short segments of the speech signal.
  • Usually overlap one another.

10
Time-Dependent Analyses
  • Analyzing each frame may produce either a single
    number, or a set of numbers, e.g.,
  • Energy (a single number)
  • Vocal tract parameters (a set of numbers)
  • This will produce a new time-dependent sequence.

11
General Form
n Frame index
x(m) Speech signal
T A linear or nonlinear transformation.
w(m) A window function (finite or infinite).
12
General Form
Qn is a sequence of local weighted average values
of the sequence T[x(m)].
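The formula on these two slides did not survive in the transcript. Given the symbols defined above (frame index n, signal x(m), transformation T, window w), the standard short-time analysis expression is assumed to be the intended one:

```latex
Q_n = \sum_{m=-\infty}^{\infty} T[x(m)] \, w(n - m)
```

Under this assumption, choosing T[x(m)] = x^2(m) gives a short-time energy, and choosing T[x(m)] = |x(m)| gives a short-time average magnitude.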
13
Example
Energy
Short-Time Energy
14
Example
Short-Time Energy
15
Example
Short-Time Energy
16
General Short-Time-Analysis Scheme
Depending on the choice of window
17
Time-Domain Methods for Speech Processing
  • Short-Time Energy and Average Magnitude

18
Applications
  • Silence Detection
  • Segmentation
  • Lip Sync

19
Short-Time Energy
20
Short-Time Average Magnitude
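The two measurements named on these slides can be sketched as follows. This is a minimal illustration, not the presenter's code: it assumes the common definitions E_n = sum of [x(m) w(n-m)]^2 and M_n = sum of |x(m)| w(n-m), a symmetric window (so time reversal of the window can be ignored), and a hypothetical `hop` parameter for the spacing between frames.

```python
def short_time_energy(x, window, hop):
    """Sum of squared, windowed samples for each frame (assumed definition)."""
    n_win = len(window)
    out = []
    for start in range(0, len(x) - n_win + 1, hop):
        frame = x[start:start + n_win]
        out.append(sum((s * w) ** 2 for s, w in zip(frame, window)))
    return out


def short_time_magnitude(x, window, hop):
    """Sum of windowed absolute values for each frame (assumed definition)."""
    n_win = len(window)
    out = []
    for start in range(0, len(x) - n_win + 1, hop):
        frame = x[start:start + n_win]
        out.append(sum(abs(s) * w for s, w in zip(frame, window)))
    return out


if __name__ == "__main__":
    signal = [1, -1, 2, -2]
    rect = [1.0, 1.0]  # two-point rectangular window
    print(short_time_energy(signal, rect, hop=2))
    print(short_time_magnitude(signal, rect, hop=2))
```

Because the energy squares each sample, the louder second frame stands out more in the energy than in the magnitude.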
21
Block Diagram Representation
22
Block Diagram Representation
What is the effect of windows?
23
The Effects of Windows
  • Window length
  • Window function

24
Rectangular Window
25
Rectangular Window
26
Rectangular Window
What is this?
Discuss the effect of window duration.
Discuss the effect of mainlobe width and sidelobe
peak.
27
Commonly Used Windows
28
Commonly Used Windows
Rectangular
Bartlett (Triangular)
Hanning
Hamming
Blackman
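The window shapes listed above can be generated with the textbook formulas below. The function name `make_window` and the choice to normalize the index by N - 1 are assumptions for illustration; the slide's own formulas are not in the transcript.

```python
import math


def make_window(name, n):
    """Return an n-point window (n > 1) as a list of floats."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    out = []
    for i in range(n):
        t = i / (n - 1)  # position in [0, 1]
        if name == "rectangular":
            v = 1.0
        elif name == "bartlett":
            v = 1.0 - abs(2.0 * t - 1.0)
        elif name == "hanning":
            v = 0.5 - 0.5 * math.cos(2.0 * math.pi * t)
        elif name == "hamming":
            v = 0.54 - 0.46 * math.cos(2.0 * math.pi * t)
        elif name == "blackman":
            v = (0.42 - 0.5 * math.cos(2.0 * math.pi * t)
                 + 0.08 * math.cos(4.0 * math.pi * t))
        else:
            raise ValueError("unknown window: " + name)
        out.append(v)
    return out


if __name__ == "__main__":
    for name in ("rectangular", "bartlett", "hanning", "hamming", "blackman"):
        print(name, [round(v, 3) for v in make_window(name, 5)])
```

All of the tapered windows peak at the center; they differ in how quickly they fall toward the edges, which is what trades mainlobe width against sidelobe level.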
29
Commonly Used Windows
Least mainlobe width
30
Examples Short-Time Energy
Rectangular Window
Hamming Window
31
Examples Average Magnitude
Rectangular Window
Hamming Window
32
The Effects of Window Length
  • Increasing the window length N decreases the
    bandwidth.
  • If N is too small, e.g., less than one pitch
    period, En and Mn will fluctuate very rapidly.
  • If N is too large, e.g., on the order of several
    pitch periods, En and Mn will change very slowly.

33
The Choice of Window Length
  • No single value of N is entirely satisfactory.
  • This is because the duration of a pitch period
    varies from about 2 ms for a high-pitched female or
    child voice, up to 25 ms for a very low-pitched male voice.

34
Sampling Rate
  • The bandwidth of both En and Mn is just that of
    the lowpass filter.
  • So, they need not be sampled as frequently as
    speech signals.
  • For example
  • Frame size 20 ms
  • Sample period 10 ms
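As a worked example of the numbers above, the sketch below converts a 20 ms frame and a 10 ms sample period into sample counts. The 8000 Hz sampling rate used in the demo is an assumption, as are the helper names.

```python
def frame_parameters(sample_rate_hz, frame_ms=20.0, hop_ms=10.0):
    """Convert frame size and sample period from milliseconds to samples."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    hop_len = int(round(sample_rate_hz * hop_ms / 1000.0))
    return frame_len, hop_len


def frame_count(num_samples, frame_len, hop_len):
    """Number of complete frames that fit in num_samples."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop_len


if __name__ == "__main__":
    frame_len, hop_len = frame_parameters(8000)  # assumed 8 kHz speech
    print(frame_len, hop_len)
    print(frame_count(8000, frame_len, hop_len))  # frames in one second
```

With these settings consecutive frames overlap by half, and En or Mn is produced roughly 100 times per second instead of 8000.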

35
Main Applications of En and Mn
  • To provide the basis for distinguishing voiced
    speech segments from unvoiced segments.
  • Silence detection.

36
Differences of En and Mn
En: emphasizes large sample-to-sample variations in
x(n), because each sample is squared.
Mn: the dynamic range (max/min) is approximately the
square root of that of En, so the differences in level
between voiced and unvoiced regions are not as
pronounced as with En.
37
FIR and IIR
  • All the windows discussed so far are FIR filters.
  • Each of them is a lowpass filter.
  • The window can also be an IIR filter.

38
IIR Example
Recursive formulas
Short-Time Energy
Short-Time Average magnitude
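The recursive formulas themselves are missing from the transcript. A minimal sketch, assuming the exponential window h(n) = a^n for n >= 0 with 0 < a < 1, gives the recursions E_n = a E_(n-1) + x^2(n) and M_n = a M_(n-1) + |x(n)|. The slide's version may differ by a one-sample delay or a gain factor.

```python
def recursive_energy(x, a):
    """E_n = a * E_(n-1) + x(n)^2, starting from E = 0 (assumed form)."""
    e = 0.0
    out = []
    for s in x:
        e = a * e + s * s
        out.append(e)
    return out


def recursive_magnitude(x, a):
    """M_n = a * M_(n-1) + |x(n)|, starting from M = 0 (assumed form)."""
    m = 0.0
    out = []
    for s in x:
        m = a * m + abs(s)
        out.append(m)
    return out


if __name__ == "__main__":
    print(recursive_energy([1, 1, 1], 0.5))
    print(recursive_magnitude([-2, 2], 0.5))
```

Each output needs only one multiply and one add per sample, with no buffer of past samples, which is the practical appeal of the IIR form.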
39
Time-Domain Methods for Speech Processing
  • Short-Time Average Zero-Crossing Rate

40
Voiced and Unvoiced Signals
41
The Short-Time Average Zero-Crossing Rate
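The defining formula is not in the transcript. The sketch below assumes the usual definition: count sign changes between adjacent samples inside a rectangular window of N samples, weighting each |sgn x(m) - sgn x(m-1)| by 1/(2N), with sgn(0) taken as +1. The function names and the `hop` parameter are illustrative.

```python
def sgn(v):
    """Sign function with sgn(0) = +1 (a convention assumed here)."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(x, n_win, hop):
    """Average zero crossings per sample in each n_win-sample frame."""
    out = []
    # Start at 1 so that the previous sample x[m - 1] always exists.
    for start in range(1, len(x) - n_win + 1, hop):
        changes = sum(abs(sgn(x[m]) - sgn(x[m - 1]))
                      for m in range(start, start + n_win))
        out.append(changes / (2.0 * n_win))
    return out


if __name__ == "__main__":
    alternating = [1, -1, 1, -1, 1]   # crosses zero at every sample
    constant = [1, 1, 1, 1, 1]        # never crosses zero
    print(zero_crossing_rate(alternating, 4, 4))
    print(zero_crossing_rate(constant, 4, 4))
```

Unvoiced speech, being noise-like with more high-frequency content, tends to give higher values than voiced speech.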
42
Distribution of Zero-Crossings
43
Example
44
Time-Domain Methods for Speech Processing
  • Speech vs. Silence Discrimination Using Energy
    and Zero-Crossing

45
Speech vs. Silence Discrimination
  • Locating the beginning and end of a speech
    utterance in an environment with background
    noise.
  • Applications
  • Segmentation of isolated words
  • Automatic speech recognition
  • Saving bandwidth in speech transmission

46
Examples
  • In some cases, we can locate the beginning and
    end of a speech utterance using energy alone.

47
Examples
  • In other cases, we can locate the beginning and
    end of a speech utterance using zero-crossing
    rate alone.

48
Examples
  • Sometimes, we cannot do it using one criterion
    alone.

Actual beginning
49
Difficulties
  • In general, it is difficult to locate the
    boundaries in the following cases:
  • Weak fricatives (/f/, /th/, /h/) at the beginning
    or end.
  • Weak plosive bursts (/p/, /t/, /k/) at the
    beginning or end.
  • Nasals at the end.
  • Voiced fricatives which become devoiced at the
    end of words.
  • Trailing off of vowel sounds at the end of an
    utterance.

50
Rabiner and Sambur
  • A 10 msec frame is used, and the measurements are
    computed 100 times per second.
  • The algorithm assumes that the first 100 msec of
    the interval contains no speech.
  • The means and standard deviations of the average
    magnitude and zero-crossing rate of this interval
    are computed to characterize the background noise.
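The background-noise characterization described above can be sketched as follows. It assumes per-frame average magnitude and zero-crossing values are already available at 100 frames per second, so the first 100 msec corresponds to the first 10 frames. The function names are hypothetical, and the population standard deviation is an arbitrary choice. How the algorithm turns these statistics into thresholds is not stated in the transcript and is not reproduced here.

```python
import math


def mean_and_std(values):
    """Mean and population standard deviation of a non-empty list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def noise_statistics(magnitudes, zcrs, frames_per_second=100, silence_ms=100):
    """Statistics of the leading frames that are assumed to hold no speech."""
    n = frames_per_second * silence_ms // 1000  # 10 frames by default
    return mean_and_std(magnitudes[:n]), mean_and_std(zcrs[:n])


if __name__ == "__main__":
    mags = [1, 3] * 5 + [100] * 5   # quiet start, then louder frames
    zcrs = [2] * 15
    (mag_mean, mag_std), (zc_mean, zc_std) = noise_statistics(mags, zcrs)
    print(mag_mean, mag_std, zc_mean, zc_std)
```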

51
The Algorithm
52
The Algorithm
[Flowchart with steps labeled 1, 2, and 3; annotation: no more than 25 frames]
53
Examples
54
Examples
55
Time-Domain Methods for Speech Processing
  • The Short-Time Autocorrelation Function

56
Autocorrelation Functions
57
Properties
1. Even: φ(k) = φ(-k).
2. φ(k) ≤ φ(0) for all k.
3. φ(0) is equal to the energy of x(m).
58
Properties
4. If x(m) has period P, i.e. x(m) = x(m + P), then φ(k) = φ(k + P).
59
Properties
4. If x(m) has period P, i.e. x(m) = x(m + P), then φ(k) = φ(k + P).
This motivates us to use autocorrelation for
pitch detection.
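A minimal sketch of that idea: compute the autocorrelation of a finite segment and take the lag of the largest value inside a plausible lag range as the period estimate. The function names and the `min_lag` and `max_lag` search bounds are assumptions for illustration; practical pitch detectors add preprocessing and a voicing decision.

```python
import math


def autocorrelation(x, max_lag):
    """phi(k) = sum over m of x(m) * x(m + k), for k = 0 .. max_lag."""
    n = len(x)
    return [sum(x[m] * x[m + k] for m in range(n - k))
            for k in range(max_lag + 1)]


def estimate_period(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest autocorrelation value."""
    r = autocorrelation(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


if __name__ == "__main__":
    period = 20
    tone = [math.sin(2.0 * math.pi * m / period) for m in range(400)]
    print(estimate_period(tone, 5, 30))
```

The search must exclude lag 0, where the autocorrelation is always largest, which is why a minimum lag is required.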
60
Short-Time Version
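The slide's formula is missing from the transcript. The standard short-time definition, assumed here, windows the signal around frame index n before correlating:

```latex
R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n - m)\, x(m + k)\, w(n - k - m)
```

With a window of finite length N, only about N - k products are nonzero at lag k.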
61
Property
Rn(-k) = Rn(k)
62
Property
hk(n-m)
yk(m)
63
Property
hk(n-m)
yk(m)
64
Property
65
Another Formulation
66
Another Formulation
A noncausal formulation
67
Examples
N = 401
voiced
Unvoiced
Rectangular Window
Hamming Window
68
Examples
Less data will be involved for larger lag k.
N = 401
N = 251
N = 125
69
Modified Short-Time Autocorrelation Function
Original Version
Modified Version
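The difference between the two versions can be sketched for a rectangular window as below. This is an illustration under assumptions, not the slide's formulas: the original version keeps both samples of each product inside the N-sample frame, so only N - k products exist at lag k, while the modified version always uses N products and reads k samples beyond the frame. Function names are hypothetical.

```python
def short_time_autocorr(x, start, n_win, k):
    """Original version: N - k products, all inside the frame."""
    return sum(x[start + m] * x[start + m + k] for m in range(n_win - k))


def modified_autocorr(x, start, n_win, k):
    """Modified version: N products; needs len(x) >= start + n_win + k."""
    return sum(x[start + m] * x[start + m + k] for m in range(n_win))


if __name__ == "__main__":
    ones = [1] * 10
    for lag in range(4):
        print(lag,
              short_time_autocorr(ones, 0, 4, lag),
              modified_autocorr(ones, 0, 4, lag))
```

For the constant signal in the demo, the original version decays linearly with lag while the modified version stays flat, which is why the modified version keeps peaks at large lags from shrinking.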
70
Modified Short-Time Autocorrelation Function
Max. lag
71
Modified Short-Time Autocorrelation Function
Max. lag
72
Examples
N = 401
Similar
voiced
Unvoiced
Rectangular Window
Modified Version
73
Examples
N = 401
N = 251
N = 125
Rectangular Window
Modified Version
74
Time-Domain Methods for Speech Processing
  • The Short-Time Average Magnitude Difference
    Function

75
The AMDF
If x(n) is periodic with period P, then
d(n) = x(n) - x(n-k) = 0 for k = 0, ±P, ±2P, ...
Computationally more efficient than
autocorrelation.
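A minimal sketch of an average magnitude difference function with a rectangular window follows. It assumes the form (1/N) times the sum of |x(m) - x(m + k)| over one frame, using a forward lag to avoid negative indices. The slide's exact definition is not in the transcript, and the function name is illustrative.

```python
def amdf(x, start, n_win, k):
    """Mean of |x(m) - x(m + k)| over an n_win-sample frame.

    Requires len(x) >= start + n_win + k.
    """
    total = sum(abs(x[start + m] - x[start + m + k]) for m in range(n_win))
    return total / n_win


if __name__ == "__main__":
    periodic = [0, 1, 0, -1] * 5   # period of 4 samples
    for lag in range(1, 6):
        print(lag, amdf(periodic, 0, 8, lag))
```

The function dips to zero at lags equal to a multiple of the period, so pitch is located at a minimum rather than a maximum. Only subtractions and absolute values are needed, with no multiplications inside the sum.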
76
Example
voiced
Unvoiced
77
Exercise
  • Record a sample of your own speech and perform
    voiced/unvoiced segmentation on it.
  • Design an efficient algorithm to compute the
    autocorrelation.