Understand what is Noise

- Presenter: Shih-Hsiang

Spoken Language Processing, Chapter

Advanced Digital Signal Processing and Noise Reduction, Chapter 2

Robustness Techniques for Speech Recognition, Berlin Chen, 2004

Introduction - Noise

- What is noise?
  - An unwanted signal that interferes with the communication, measurement, or processing of an information-bearing signal
  - Noise is present in various degrees in almost all environments
  - Noise can cause transmission errors and may even disrupt a communication process
- What kinds of noise occur in the real world?

Different kinds of noise

Depending on its source

- Acoustic noise
  - Emanates from moving, vibrating, or colliding sources
  - moving cars, air conditioners, computer fans, traffic, people talking in the background, wind, rain
- Electromagnetic noise
  - Present at all frequencies (from electric devices)
  - radio and television transmitters and receivers
- Electrostatic noise
  - Generated by the presence of a voltage, with or without current flow
  - fluorescent lighting
- Channel distortions, echo, and fading
  - Non-ideal characteristics of the communication channel
  - radio channel
- Processing noise
  - Results from the digital/analog processing of signals

Different kinds of noise (cont.)

Depending on its frequency or time characteristics

- Narrow-band noise
  - a noise process with a narrow bandwidth
- Band-limited white noise
  - a noise with a flat spectrum and a limited bandwidth that usually covers the limited spectrum of the device
- White noise (theoretical concept)
  - has the same power at all frequencies
- Coloured noise
  - non-white noise, or any wideband noise whose spectrum has a non-flat shape
  - pink noise, brown noise
- Impulsive noise
  - consists of short-duration pulses of random amplitude and random duration
- Transient noise
  - consists of relatively long-duration noise pulses

Color of Noise

- White noise
  - a signal with a flat frequency spectrum in linear space
- Pink noise
  - the frequency spectrum of pink noise is flat in logarithmic space
  - power density decreases 3 dB per octave with increasing frequency
- Brown noise
  - power density decreases 6 dB per octave with increasing frequency
- Blue noise (azure noise)
  - power density increases 3 dB per octave with increasing frequency
- Purple noise (violet noise)
  - power density increases 6 dB per octave with increasing frequency
- Gray noise
  - noise subjected to a psychoacoustic equal-loudness curve over a given range of frequencies
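
The dB-per-octave figures for the noise colours can be checked with a little arithmetic. The sketch below assumes that each colour has a power density proportional to f^alpha (the exponent values and the names `NOISE_EXPONENTS` and `db_per_octave` are illustrative assumptions, not taken from the slides) and computes the change in power density when the frequency doubles.

```python
import math

# Assumed power-law exponents: power density proportional to f ** alpha.
NOISE_EXPONENTS = {"white": 0, "pink": -1, "brown": -2, "blue": 1, "purple": 2}


def db_per_octave(alpha):
    """Change in power density, in dB, when the frequency doubles."""
    return 10.0 * math.log10(2.0 ** alpha)


for name, alpha in NOISE_EXPONENTS.items():
    print(f"{name:>6}: {db_per_octave(alpha):+.2f} dB per octave")
```

Under this assumption, the "3 dB" and "6 dB" values quoted on the slide are roundings of about 3.01 dB and 6.02 dB.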

Spectrogram

- The spectrogram shows the energy in a signal at each frequency and at each time
- Dark areas of the spectrogram show high intensity

Figure panels: white noise, pink noise, brown noise, blue noise, purple noise, gray noise

Noises in Aurora 2

Figure panels: babble, airport, exhibition, car

Noises in Aurora 2 (cont.)

Figure panels: street, restaurant, train, subway

Noise in Speech Recognition

Figure panels: time domain, frequency domain

Additive Noises / Convolutional Noises

- Additive noises can be stationary or non-stationary
  - Stationary noises
    - the power spectral density does not change over time
    - the noises are also narrow-band noises
    - such as computer fans, air conditioning, car noise
  - Non-stationary noises
    - the statistical properties change over time
    - wide-band noise
    - machine guns, door slams, keyboard clicks, radio/TV, and other speakers' voices (babble noise)
- Convolutional noises (channel noises) result mainly from channel distortion and are stationary in most cases
  - Reverberation, the frequency response of the microphone, transmission lines, etc.
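
A common textbook way to write the two kinds of noise together is sketched below. The symbols are assumptions introduced for illustration and do not appear on the slide: x[n] is the clean speech, h[n] the channel impulse response (convolutional noise), d[n] the additive noise, and y[n] the observed signal.

```latex
\begin{align}
  y[n] &= x[n] * h[n] + d[n] \\
  Y(\omega) &= X(\omega)\, H(\omega) + D(\omega)
\end{align}
```

In this form the channel acts multiplicatively in the frequency domain, while the additive noise stays additive.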

Reconstruction of incomplete spectrograms for robust speech recognition

Bhiksha Raj Ramakrishnan, Ph.D. dissertation, ECE Dept., CMU, Apr. 2000. Advisor: Richard Stern

- Presenter: Shih-Hsiang

Introduction

- The performance of ASR systems degrades greatly when the speech has been corrupted by noise
  - Train with the same level of noise?
- Two approaches to reduce the mismatch
  - Data-compensation methods
  - Classifier-compensation methods (model adaptation)

Figure: the training data distribution and the testing data distribution are no longer similar

Introduction (cont.)

- Drawbacks of the above approaches
  - Most of them assume the noise is stationary
  - They assume the effect of the noise can be represented by a linear transform of the parameters
  - They are effective in the context of their intended purposes
- The human auditory system preferentially processes the high-energy components of the speech signal while suppressing the weaker components
- Humans are able to comprehend speech that has undergone considerable spectral excision

Introduction (cont.)

- Two new approaches have been developed
  - Multi-band based approaches (Hermansky)
    - Different frequency bands of speech signals may be corrupted at different SNRs
    - Using divide-and-conquer
    - De-weighting noisy bands
  - Missing-feature approaches (Cooke)
    - Low-SNR regions are selectively erased or labeled as unreliable
    - Recognition is performed on the basis of incomplete data

Introduction (cont.)

- Advantages of missing-feature approaches
  - Make no assumptions about the corrupting noise
  - Do not need any knowledge about the noise
  - Remarkably robust to high levels of noise corruption
- Missing-feature methods
  - Classifier modification methods
    - Model the effect of the incompleteness of the data
  - Spectrogram reconstruction methods
    - Estimate the missing components of incomplete spectrograms and reconstruct them

Introduction (cont.)

- Classifier modification methods
- Spectrogram reconstruction methods (today's topic)

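Background Information: Multivariate Gaussian Distribution

- When X = (X1, ..., XL) is an L-dimensional random vector with mean µ and covariance Σ, the multivariate Gaussian pdf has the form

$$ f(x) = \frac{1}{(2\pi)^{L/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right) $$

- Conditional distributions
  - If X is partitioned into two sub-vectors X1 and X2, with means µ1, µ2 and covariance blocks Σ11, Σ12, Σ21, Σ22, then X1 conditional on X2 = a is multivariate normal:

$$ X_1 \mid X_2 = a \;\sim\; N\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (a - \mu_2),\; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right) $$

  - The term Σ12 Σ22^{-1} (a − µ2) is the mean shift, and Σ12 Σ22^{-1} is the matrix of regression coefficients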

Background Information: Multivariate Gaussian Distribution (cont.)

The random vector is partitioned into observed data and missing data

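Background Information: Maximum A-Posteriori (MAP) Estimation

In MAP estimation the missing data X_m are estimated to maximize their likelihood, conditioned on the value of the observed data X_o:

$$ \hat{X}_m = \arg\max_{X_m} P(X_m \mid X_o) $$

When P(X_o, X_m) is Gaussian, we get

$$ \hat{X}_m = \mu_m + \Sigma_{mo} \Sigma_{oo}^{-1} (X_o - \mu_o) $$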

Background Information: Maximum A-Posteriori Estimation (cont.)

Figure. The same Gaussian sliced at X = 2. The flat surface in the figure represents the distribution of all vectors whose X component is 2. This distribution peaks at Y = Y1. Thus Y1 is the MAP estimate of Y when X is 2.

Figure. The solid horizontal line shows the observed value of X. The circle on the intersection of the solid diagonal line and the dotted line shows where the distribution of vectors with X = 2 peaks. This is the MAP estimate of Y when X = 2. The solid diagonal line shows how the position of this peak varies at each value of X.

Figure. Gaussian distribution of a 2-dimensional random vector. The mean of the Gaussian is at (1, 1). The X and Y components each have variance 1.0, and the covariance between X and Y is 0.5.
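
The two-dimensional example in the figure captions can be worked through numerically. The sketch below is a minimal illustration that assumes the standard conditional-mean formula for a jointly Gaussian pair; the function name `map_estimate_y` is hypothetical. It uses the numbers from the caption: mean (1, 1), variance 1.0 for each component, covariance 0.5, and an observed value X = 2.

```python
def map_estimate_y(x_obs, mean_x, mean_y, var_x, cov_xy):
    """MAP estimate of Y given X = x_obs for a jointly Gaussian (X, Y).

    For a Gaussian the conditional distribution peaks at its mean, so the
    MAP estimate is mean_y plus a regression coefficient times the
    deviation of the observation from mean_x.
    """
    regression_coefficient = cov_xy / var_x
    return mean_y + regression_coefficient * (x_obs - mean_x)


# Values from the figure caption: mean (1, 1), variances 1.0, covariance 0.5.
y1 = map_estimate_y(x_obs=2.0, mean_x=1.0, mean_y=1.0, var_x=1.0, cov_xy=0.5)
print(y1)  # 1.5
```

The value 1.5 is the point Y1 where the slice at X = 2 peaks, and varying `x_obs` traces the diagonal line described in the second caption.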

Background Information: Spectrogram

- It is a pictorial representation of the short-time periodogram, which is computed from the short-time Fourier transform, where
  - Px(l, ω) represents the power in frequency ω at time instant l in the signal
  - S(l, k) represents the kth component of the lth log-spectral vector
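
A sequence of log-spectral vectors can be computed along the following lines. This is a minimal standard-library sketch, not the dissertation's implementation: the Hamming window, the naive DFT, the natural logarithm, the small floor `1e-12`, and the names `log_power_spectrum` and `spectrogram` are all assumptions made for illustration.

```python
import cmath
import math


def log_power_spectrum(frame):
    """Log power at each non-negative DFT frequency of one windowed frame."""
    n = len(frame)
    # Hamming window (an assumed choice) to reduce spectral leakage.
    windowed = [
        x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    vector = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        # Small floor avoids log(0) for empty bins.
        vector.append(math.log(abs(coeff) ** 2 + 1e-12))
    return vector


def spectrogram(signal, frame_len, hop):
    """List of log-spectral vectors, one per analysis window."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [log_power_spectrum(signal[s:s + frame_len]) for s in starts]
```

In this sketch `spectrogram(signal, frame_len, hop)[l][k]` plays the role of S(l, k). A production system would use an FFT rather than the quadratic-time DFT shown here.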

Background Information: Spectrogram (cont.)

- Wide-band spectrograms: shorter windows (< 10 ms)
  - have good time resolution
- Narrow-band spectrograms: longer windows (> 20 ms)
  - the harmonics can be clearly seen

Background Information: Mel Spectrogram

- The mel spectrogram consists of a sequence of log-mel-spectral vectors
  - Px(l, k) is the kth component of the mel spectrum in the lth analysis window
  - mk(j) is the jth DFT coefficient of the impulse response of the kth mel filter
  - K is the total number of mel filters

Background Information MEL Spectrogram (cont.)

Background Information: Effect of noise on the spectrogram

- When the speech signal is corrupted by additive noise
  - If we assume that the noise is uncorrelated with the speech signal, the relation between noisy speech, clean speech, and noise can be written in the time domain, the frequency domain, the spectrogram, and the mel-spectrogram
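
The four relations can be sketched as follows. This is a reconstruction under stated assumptions rather than the slide's own equations: y, x, and n denote noisy speech, clean speech, and noise, P(l, ω) denotes power in frequency ω in the lth analysis window, and P(l, k) denotes the kth mel-spectral component. The approximations rely on the cross term between speech and noise averaging to zero when the two are uncorrelated.

```latex
\begin{align}
  \text{time domain:}      \quad & y(t) = x(t) + n(t) \\
  \text{frequency domain:} \quad & Y(\omega) = X(\omega) + N(\omega) \\
  \text{spectrogram:}      \quad & P_y(l,\omega) \approx P_x(l,\omega) + P_n(l,\omega) \\
  \text{mel-spectrogram:}  \quad & P_y(l,k) \approx P_x(l,k) + P_n(l,k)
\end{align}
```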

Background Information: Effect of noise on the spectrogram (cont.)

Figure panels:

- Speech corrupted to 15 dB by additive white noise
- The same spectrogram with regions deleted where the local SNR is less than 0 dB
- Speech corrupted to 10 dB by additive white noise
- The same spectrogram with regions deleted where the local SNR is less than 0 dB
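
Deleting regions by local SNR can be sketched as below for the case where the clean speech power and the noise power are both known, which is only possible in a controlled experiment. The function names are hypothetical, all powers are assumed to be strictly positive, and treating exactly 0 dB as reliable is an assumption.

```python
import math


def local_snr_db(speech_power, noise_power):
    """Local SNR in dB for one spectrogram element (powers must be > 0)."""
    return 10.0 * math.log10(speech_power / noise_power)


def oracle_mask(speech, noise, threshold_db=0.0):
    """True where the local SNR is at or above the threshold (reliable)."""
    return [
        [local_snr_db(s, n) >= threshold_db for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


def delete_unreliable(noisy, mask):
    """Replace unreliable elements with None to mark them as missing."""
    return [
        [value if keep else None for value, keep in zip(row, m_row)]
        for row, m_row in zip(noisy, mask)
    ]


speech = [[4.0, 1.0], [0.5, 2.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
mask = oracle_mask(speech, noise)
print(mask)  # [[True, True], [False, True]]
```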

Recognizing speech with incomplete spectrograms

Modify the manner in which the classifier, or recognizer, operates

- A speech recognition system is a statistical pattern classifier
- There are two possible approaches to handling the missing components
  - Data imputation approach
  - Marginalization approach

The classification rule combines a language model term and an acoustic model term.

Decompose S into its observed and missing components as S = [So, Sm].

Sm is not known, and therefore its likelihood cannot be computed.

Spectrogram reconstruction methods for missing data

Compensate the data rather than modify the classifier

- Estimate the missing regions of incomplete spectrograms to reconstruct complete spectrograms
- Geometrical reconstruction methods
  - Linear interpolation
  - Nonlinear interpolation with polynomial functions
- Cluster-based reconstruction methods
  - Single cluster-based reconstruction
  - Multiple cluster-based reconstruction
- Covariance-based reconstruction methods

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Interpolate between adjacent observed elements in the spectrogram to reconstruct a missing element
  - adjacent along the frequency axis
  - adjacent along the time axis
- The interpolation used could be
  - simple linear interpolation
  - other higher-order functional forms such as polynomials, rational functions, or splines

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Linear interpolation
  - Given any sequence of numbers s1, s2, ..., sM, where the samples between positions l1 and l2 are unknown or missing

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Linear interpolation (cont.)
  - Linear along frequency
  - Linear along time

s(l,k): the kth component of the lth spectral vector in the spectrogram
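
Linear interpolation over a run of missing samples can be sketched as follows. This is a minimal illustration, not the dissertation's code: missing samples are marked with `None`, the function name `fill_linear` is hypothetical, and holding the nearest observed value at the ends of the sequence is an assumption, since the slides do not say how edges are treated.

```python
def fill_linear(seq):
    """Fill runs of None by linear interpolation between observed neighbours."""
    filled = list(seq)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        left, right = start - 1, i  # indices of the observed neighbours
        if left < 0 and right >= n:
            raise ValueError("sequence has no observed samples")
        for j in range(start, right):
            if left < 0:
                filled[j] = filled[right]  # leading gap: hold first observed value
            elif right >= n:
                filled[j] = filled[left]   # trailing gap: hold last observed value
            else:
                weight = (j - left) / (right - left)
                filled[j] = filled[left] + weight * (filled[right] - filled[left])
    return filled


print(fill_linear([1.0, None, None, 4.0]))
```

Applying the function to s(l,k) with k fixed and l varying gives interpolation along time; fixing l and varying k gives interpolation along frequency.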

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Nonlinear interpolation (1) with polynomial functions
  - Lagrange's formula: given a set of points on a plane, (x1, y1), (x2, y2), ..., (xm, ym)

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Nonlinear interpolation (1) with polynomial functions (cont.)
  - Nonlinear along frequency
  - Nonlinear along time

s(l,k): the kth component of the lth spectral vector in the spectrogram
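
Lagrange's formula fits the unique polynomial of degree at most m − 1 through m points with distinct x values and evaluates it at a new position. A minimal sketch, with the hypothetical name `lagrange_interpolate`:

```python
def lagrange_interpolate(points, x):
    """Evaluate the Lagrange interpolating polynomial through `points` at x.

    `points` is a list of (x_i, y_i) pairs with distinct x_i values.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Three observed samples of y = x**2; the missing sample at x = 2 is recovered.
observed = [(0.0, 0.0), (1.0, 1.0), (3.0, 9.0)]
print(lagrange_interpolate(observed, 2.0))
```

For spectrogram reconstruction the x values would be the time (or frequency) indices of the observed elements, and the y values the observed s(l,k).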

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Experimental results
  - Using mean squared error (MSE) to measure the accuracy of the reconstructed spectrogram
  - The greater the MSE, the greater the divergence between the reconstructed and uncorrupted spectrograms

$$ \mathrm{MSE} = \frac{1}{N} \sum_{(l,k)\ \text{missing}} \left( S(l,k) - \hat{S}(l,k) \right)^2 $$

where S is the true uncorrupted spectrogram, Ŝ is the reconstructed spectrogram, and N is the number of missing elements in the spectrogram
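
The MSE measure can be sketched in a few lines. The version below assumes the error is accumulated over the missing elements only and normalized by their count; the mask convention (True means observed) and the name `mse_missing` are assumptions made for illustration.

```python
def mse_missing(true_spec, recon_spec, mask):
    """Mean squared error over the elements that were missing.

    `mask[l][k]` is True where the element was observed and False where it
    was missing and had to be reconstructed.
    """
    squared_error = 0.0
    count = 0
    for t_row, r_row, m_row in zip(true_spec, recon_spec, mask):
        for t, r, observed in zip(t_row, r_row, m_row):
            if not observed:
                squared_error += (t - r) ** 2
                count += 1
    if count == 0:
        raise ValueError("no missing elements to evaluate")
    return squared_error / count


true_spec = [[1.0, 2.0], [3.0, 4.0]]
recon_spec = [[1.0, 3.0], [3.0, 2.0]]
mask = [[True, False], [True, False]]
print(mse_missing(true_spec, recon_spec, mask))  # 2.5
```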

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Experimental results (cont.)

Figure panels: original; 50% of elements randomly deleted; linear interpolation along time; linear interpolation along frequency; nonlinear interpolation (1) along time; nonlinear interpolation (1) along frequency; nonlinear interpolation (2) along time; nonlinear interpolation (2) along frequency

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Experimental results (cont.)

Figure panels: MSE along time; MSE along frequency; accuracy along time; accuracy along frequency

Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods

- Summary
  - Linear interpolation estimation can be quite effective
    - More detailed models are more likely to be erroneous
  - Interpolation along time is generally more effective than interpolation along frequency
    - Not enough frequency components
  - Several drawbacks
    - When the fraction of missing elements is very high, there might not be sufficient information remaining in the picture to reconstruct the missing elements properly
    - If the observed elements in the spectrogram were to be distorted, all missing elements reconstructed on their basis would also be distorted similarly

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Use the vector statistics of the spectral vectors for reconstruction of the complete spectrogram
- Spectral vectors are assumed to be segregated into a set of clusters
- The MAP estimate for the missing component is combined with the observed component to form the complete component

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Single cluster-based reconstruction methods
  - The tth spectral vector in the spectrogram is denoted S(t)
  - The missing component of the tth spectral vector is denoted Sm(t)
  - The observed component of the tth spectral vector is denoted So(t)
  - S(t) = At [So(t), Sm(t)], where At is the permutation matrix

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Single cluster-based reconstruction methods (cont.)
  - Experimental results

Figure panels: original (complete spectrogram); 50% of elements randomly deleted; reconstructed

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Experimental results (cont.)

Figure panels: accuracy; MSE

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Multiple cluster-based reconstruction methods
  - Two steps to estimate the missing portions of an incomplete vector
    - Cluster membership of the vector: decide which cluster the vector belongs to
    - Once the cluster membership of the vector is established, the distribution of that cluster is used to obtain MAP estimates for the missing components

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Multiple cluster-based reconstruction methods (cont.)
  - Step 1: Decide which cluster the vector belongs to
  - The cluster membership is chosen using the a priori probability P(k) of the kth cluster and the negative of the log-likelihood

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Multiple cluster-based reconstruction methods (cont.)
  - Step 2: MAP estimates
  - Experimental results (oracle experimental upper bound)

Figure panels: original; 70% of elements randomly deleted; codebook size 1; codebook size 8; codebook size 64; codebook size 512

The codebook size is the number of clusters used in the representation

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Experimental results (cont.)

Figure panels: accuracy; MSE

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Cluster marginal reconstruction: identifying cluster membership based on observed components alone
  - Because we have no knowledge of the entire S(t) when some components in S(t) are missing
  - Step 1: Decide which cluster the vector belongs to
    - Identify the cluster membership of the vector based on the observed components of the vector alone, by marginalization
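
Identifying the cluster from the observed components alone can be sketched as below. To keep the example short it assumes each cluster is a Gaussian with a diagonal covariance, so marginalizing out the missing components amounts to skipping them in the log-likelihood; the dictionary fields `prior`, `mean`, and `var` and the function names are hypothetical. With a diagonal covariance the MAP estimate of a missing component reduces to the cluster mean, whereas a full covariance would add the regression term on the observed components.

```python
import math


def marginal_log_likelihood(vector, mean, var):
    """Log-likelihood of the observed components only (None marks missing)."""
    total = 0.0
    for x, m, v in zip(vector, mean, var):
        if x is None:
            continue  # missing component is marginalized out
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def identify_cluster(vector, clusters):
    """Index of the cluster maximizing log prior + marginal log-likelihood."""
    scores = [
        math.log(c["prior"]) + marginal_log_likelihood(vector, c["mean"], c["var"])
        for c in clusters
    ]
    return max(range(len(clusters)), key=scores.__getitem__)


def reconstruct(vector, clusters):
    """Fill missing components with the selected cluster's mean."""
    mean = clusters[identify_cluster(vector, clusters)]["mean"]
    return [m if x is None else x for x, m in zip(vector, mean)]


clusters = [
    {"prior": 0.5, "mean": [0.0, 0.0, 0.0], "var": [1.0, 1.0, 1.0]},
    {"prior": 0.5, "mean": [5.0, 5.0, 5.0], "var": [1.0, 1.0, 1.0]},
]
print(reconstruct([4.8, None, 5.1], clusters))  # [4.8, 5.0, 5.1]
```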

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Cluster marginal reconstruction: identifying cluster membership based on observed components alone (cont.)
  - Experimental results

Figure panels: original; 70% of elements randomly deleted; codebook size 1; codebook size 8; codebook size 64; codebook size 512

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Experimental results (cont.)

Figure panels: wrongly identified clusters; MSE; accuracy

Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods

- Summary
  - Cluster-based reconstruction methods can be very effective in reconstructing missing regions of the spectrogram
  - When cluster memberships are identified based only on the observed components, the result is similar to single-cluster based reconstruction
  - A single Gaussian model for the distribution is a good method

Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction

- Consider the sequence of spectral vectors that constitute a spectrogram to be the output of a Gaussian wide-sense stationary (WSS) random process
  - The mean of the spectral vectors and the covariances between elements in the spectrogram are independent of their position in the spectrogram
- WSS gives us the following properties
  - The mean does not depend on where it occurs
  - The covariance between the components of two vectors depends only on the distance between them

Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction

Apply the WSS properties

Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction

- Example
  - Apply E[S(t,k)] = µ(k)
  - The MAP estimate for Sm can then be computed
  - A 4-second utterance has 400 frames
  - Each spectral vector has 20 frequency components
  - There are 8000 components in all in the spectrogram
  - So the computational cost is very high
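
The size of the problem in this example follows from simple arithmetic. The sketch below counts the components and the entries of a full covariance matrix over all of them; the variable names are illustrative, and the point that the full covariance is the costly object is an interpretation of the slide rather than a statement from it.

```python
frames = 400          # 4-second utterance, 400 frames
bins_per_frame = 20   # frequency components per spectral vector

components = frames * bins_per_frame
covariance_entries = components ** 2

print(components)          # 8000
print(covariance_entries)  # 64000000
```

A covariance matrix over all 8000 components has 64 million entries, which motivates working with smaller neighbourhoods instead of the whole spectrogram at once.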

Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction

- Reconstructing missing elements individually
  - Let S(t,k) be an element of the vector of missing components
  - Com(t,k) is the cross-covariance between So and Sm
  - µ(k) is the expected value of S(t,k)
  - Not all components of So contribute equally to the estimate of S(t,k)

Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction

- Jointly reconstructing all missing elements in a vector
  - The missing-element vector for the second spectral vector is constructed as Sm(2)
  - The neighborhood of observed vectors for Sm(2) gives So(2)
  - The mean vectors for So(2) and Sm(2)
  - The cross-covariance between Sm(2) and So(2)
  - The autocovariance matrix of So(2)
  - The second spectral vector would be obtained from these quantities
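
Under the Gaussian assumption, the joint estimate takes the usual conditional-mean form. The block below is a sketch in assumed notation, not a copy of the slide's equation: µ_m and µ_o are the mean vectors of Sm(2) and So(2), C_mo is the cross-covariance between Sm(2) and So(2), and C_oo is the autocovariance matrix of So(2).

```latex
\begin{equation}
  \hat{S}_m(2) = \mu_m + C_{mo}\, C_{oo}^{-1} \left( S_o(2) - \mu_o \right)
\end{equation}
```

The reconstructed second spectral vector then combines its observed elements with the estimated missing elements.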

Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction

- Result

Figure panels: original; 90% of elements randomly deleted; reconstructed individually; reconstructed jointly by vector; MSE; accuracy

Estimating the location of corrupt regions in spectrograms

- It is a difficult task to estimate the reliable and unreliable regions
- Use a spectrographic mask to distinguish the regions
  - Binary information about every element in the spectrogram
- The ability of missing-feature methods depends critically on the accuracy of the spectrographic masks used
  - False alarm: a reliable element declared as unreliable
  - Miss: an unreliable element tagged as reliable

Estimating the location of corrupt regions in spectrograms (cont.)

- The recognition performance degrades very quickly with an increasing fraction of false alarms
- Missing-feature methods are much less sensitive to misses

Estimating the location of corrupt regions in spectrograms (cont.)

- Using spectral subtraction
  - Typical values of λ and β are 0.95 and 2
  - The initial portion of any utterance is assumed to contain only noise
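
One way a spectral-subtraction mask could be estimated is sketched below. The update rule is an assumption, since the slide's equation did not survive: the noise estimate starts from the average of the first few frames, is smoothed with a factor `lam` (0.95) whenever the current power is below `beta` (2) times the running estimate, and is held otherwise. The function names, the 0 dB threshold, and the `init_frames` parameter are hypothetical.

```python
import math


def track_noise(power_frames, lam=0.95, beta=2.0, init_frames=2):
    """Running noise power estimate per frequency bin for every frame."""
    n_bins = len(power_frames[0])
    # The initial portion of the utterance is assumed to contain only noise.
    noise = [
        sum(frame[k] for frame in power_frames[:init_frames]) / init_frames
        for k in range(n_bins)
    ]
    estimates = []
    for frame in power_frames:
        noise = [
            lam * n + (1.0 - lam) * y if y < beta * n else n
            for n, y in zip(noise, frame)
        ]
        estimates.append(noise)
    return estimates


def estimate_mask(power_frames, estimates, threshold_db=0.0):
    """True where the estimated local SNR is at or above the threshold."""
    mask = []
    for frame, noise in zip(power_frames, estimates):
        row = []
        for y, n in zip(frame, noise):
            speech = max(y - n, 0.0)  # spectral subtraction
            reliable = speech > 0.0 and 10.0 * math.log10(speech / n) >= threshold_db
            row.append(reliable)
        mask.append(row)
    return mask


frames = [[1.0, 1.0], [1.0, 1.0], [10.0, 1.0]]
mask = estimate_mask(frames, track_noise(frames))
print(mask)  # [[False, False], [False, False], [True, False]]
```

In the toy input only the first bin of the last frame rises well above the noise level, so it is the only element marked reliable.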

Estimating the location of corrupt regions in spectrograms (cont.)

Figure panels:

- Spectrographic mask estimated using spectral subtraction for speech corrupted to 10 dB, next to the oracle spectrographic mask
- Spectrographic masks estimated by spectral subtraction, next to the oracle spectrographic masks

Estimating the location of corrupt regions in spectrograms (cont.)

- Using a Bayesian classifier
  - Classification vector

Estimating the location of corrupt regions in spectrograms (cont.)

Figure panels:

- Spectrographic mask estimated using a classifier for speech corrupted to 10 dB, next to the oracle spectrographic mask
- Spectrographic masks estimated by a classifier, next to the oracle spectrographic masks