Course Projects - PowerPoint PPT Presentation

1
Course Projects
  • Luc Lamarche

2
Outline
  • Part I
  • Course: Intelligent Systems (Dr. Wail Gueaieb)
  • Project: Sound Classification for Hearing Aid
    Application
  • Part II
  • Course: Adaptive Signal Processing (Dr. Claude
    D'Amours)
  • Project: Adaptive Feedback Cancellation for
    Hearing Aids

3
Sound Classification for Hearing Aid Application
  • Part 1
  • By Luc Lamarche

4
Outline
  • Introduction
  • Background Information
  • Implementation
  • Results
  • Conclusion

5
The Compromise
  • When a patient is fitted with a new hearing aid,
    it is customized for their specific type of
    hearing loss.
  • The hearing aid is programmed to optimize the
    user's speech intelligibility and sound quality.
  • It is, however, not possible to optimize both
    measures for all environments, so a compromised
    frequency response is used.
  • It is widely agreed that a hearing aid that
    changes its algorithm for different environments
    would significantly increase user satisfaction.

6
Introduction
Figure: example sound environments (nature, cocktail party)
7
Feature Extraction
Figure: time-domain and frequency-domain views of
speech and music signals
8
Feature Extraction
  • Overall input level
  • Fluctuation strength of the overall level
  • Spectral Center
  • Fluctuation strength of spectral center
  • Zero Crossing Ratio (ZCR)
  • Percentage of Low-Energy Frames
  • RMS of a Low-Pass Response
  • Spectral Flux (SF)
  • Mean and Variance of the Discrete Wavelet
    Transform (DWT)
  • Difference of Maximum and Minimum Zero Crossings
  • Linear Predictor Coefficients (LPC)
  • High zero-crossing rate ratio (HZCRR)
  • Low Short-time energy ratio (LSTER)
  • LSP distance
  • Band periodicity (BP)
  • Noise frame ratio (NFR)
  • RMS
  • VDR
  • Silence Ratio

9
Feature Extraction
  • Overall input level
  • Fluctuation strength of the overall level
  • Spectral Center
  • Fluctuation strength of spectral center
  • Zero Crossing Ratio (ZCR)
  • Percentage of Low-Energy Frames
  • RMS of a Low-Pass Response
  • Spectral Flux (SF)
  • Mean and Variance of the Discrete Wavelet
    Transform (DWT)
  • Difference of Maximum and Minimum Zero Crossings
  • Linear Predictor Coefficients (LPC)
  • High zero-crossing rate ratio (HZCRR)
  • Low Short-time energy ratio (LSTER)
  • LSP distance
  • Band periodicity (BP)
  • Noise frame ratio (NFR)
  • RMS
  • VDR
  • Silence Ratio
  • Spectral Flux (SF)
  • High zero-crossing rate ratio (HZCRR)
  • Low Short-time energy ratio (LSTER)
  • Sub-Band Energy Ratio (SBER)
  • Pitch
  • Salience of Pitch
  • Spectrogram

10
Classes
  • Speech in foreground
  • Speech in foreground mixed with speech in
    background
  • Speech in foreground mixed with traffic noise
  • Speech in background
  • Traffic noise
  • Music
  • Nature
  • Alarm signals

11
Classes
  • Speech in foreground
  • Speech in foreground mixed with speech in
    background
  • Speech in foreground mixed with traffic noise
  • Speech in background
  • Traffic noise
  • Music
  • Nature
  • Alarm signals

12
Feature Extraction
  • Small segments of the audio signal are taken
    for the calculations.
  • These segments are further divided into frames,
    usually overlapping by 50%.
  • Features are computed for each frame and,
    depending on the feature, are combined into one
    feature (e.g. the average over frames).

13
Zero Crossing Ratio (ZCR)
  • ZCR is defined as the number of times that a
    signal changes sign in a frame.
  • Speech generally has a higher zero crossing
    ratio since it is composed of alternating voiced
    and unvoiced sounds at the syllable rate.
  • A more useful measure based on the zero crossing
    ratio is the High Zero-Crossing Rate Ratio.

14
High Zero-Crossing Rate Ratio
  • HZCRR is defined as the ratio of frames with a
    ZCR above 1.5 times the average ZCR.
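The two measures above can be sketched in a few lines (a minimal sketch assuming NumPy; the function names and the 256-sample frame length are illustrative, not from the slides):

```python
import numpy as np

def zcr(frame):
    """Number of sign changes within one frame."""
    return int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))

def hzcrr(signal, frame_len=256):
    """Ratio of frames whose ZCR exceeds 1.5x the average ZCR."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    zcrs = np.array([zcr(f) for f in frames])
    return float(np.mean(zcrs > 1.5 * zcrs.mean()))
```

Because speech alternates voiced and unvoiced stretches, its per-frame ZCR varies widely, so more frames exceed the 1.5x threshold than for music.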

Figure: distribution of HZCRR for 42 speech samples
and 44 music samples
15
Low Short-time Energy Ratio
  • LSTER is defined as the ratio of frames with a
    short-time energy (STE) below 0.5 times the
    average STE.
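Per the definition in Lu et al. [2], LSTER counts the frames whose short-time energy falls below half of the average STE. A minimal sketch (assuming NumPy; the function name and frame length are illustrative):

```python
import numpy as np

def lster(signal, frame_len=256):
    """Ratio of frames whose STE is below 0.5x the average STE."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    ste = np.mean(frames ** 2, axis=1)          # short-time energy per frame
    return float(np.mean(ste < 0.5 * ste.mean()))
```

Speech contains many low-energy frames (pauses, unvoiced sounds), so its LSTER is high; music is more continuously energetic.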

Figure: distribution of LSTER for speech and music
16
Spectral Flux (SF)
  • Measures the fluctuations in the spectrum between
    two adjacent frames.
  • Spectral flux is generally slightly higher for
    speech than for music.
  • Speech frames will contain different phonemes
    (which differ in spectrum).
  • Music will maintain its spectrum for a longer
    period of time.
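One common way to compute SF is the mean squared difference between magnitude spectra of adjacent frames (a sketch assuming NumPy; the function name and frame length are assumptions):

```python
import numpy as np

def spectral_flux(signal, frame_len=256):
    """Mean squared change between magnitude spectra of adjacent frames."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    mags = np.abs(np.fft.rfft(frames, axis=1))  # one magnitude spectrum per frame
    return float(np.mean(np.diff(mags, axis=0) ** 2))
```

A stationary tone yields near-zero flux, while a signal whose spectrum changes frame to frame (like phoneme-to-phoneme speech) yields a larger value.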

17
Spectral Flux (SF)
Figure: distribution of SF for speech and music
18
PITCH
  • Pitch is defined as the fundamental frequency of
    a human speech waveform.
  • It is calculated by computing the
    autocorrelation lag with the largest energy.

Figure: autocorrelation of a speech sample of
length 512
19
PITCH
Figure: distribution of pitch for speech and music
20
Salience of Pitch (SOP)
  • A second measure based on pitch.
  • It is defined as the ratio of the first peak
    (pitch) value and the zero lag value of the
    autocorrelation function.

Figure: autocorrelation of a speech sample of
length 512
21
Salience of Pitch (SOP)
Figure: distribution of SOP for speech and music
22
Sub-Band Energy Ratio (SBER)
  • To measure the SBER the spectrum is first divided
    into four non-uniform sub-bands.
  • The four sub-bands are [0, w0/8], [w0/8, w0/4],
    [w0/4, w0/2] and [w0/2, w0], where w0 is half of
    the sampling frequency.
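Since the rFFT bins span [0, w0], the four sub-band energies can be read off by splitting the bins at 1/8, 1/4, and 1/2 of the band (a sketch assuming NumPy; the function name is an assumption):

```python
import numpy as np

def sub_band_energy_ratios(frame):
    """Fraction of spectral energy in each of the four non-uniform
    sub-bands [0, w0/8], [w0/8, w0/4], [w0/4, w0/2], [w0/2, w0]."""
    mag2 = np.abs(np.fft.rfft(frame)) ** 2       # power spectrum, bins 0..w0
    n = len(mag2)
    edges = [0, n // 8, n // 4, n // 2, n]       # sub-band boundaries in bins
    total = mag2.sum()
    return [float(mag2[a:b].sum() / total) for a, b in zip(edges, edges[1:])]
```

The ratios sum to 1; a low-frequency-dominated signal concentrates its energy in the first band.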

Figure: SBER for speech and music
23
SBER2
Figure: distribution of SBER2 for speech and music
24
Spectrogram
  • The short-time Fourier transform is a method to
    transform a signal from the time domain to a
    time-frequency domain. It basically performs a
    Fourier transform on a finite window of the
    signal x(t) centered at time t.
  • Two features can be extracted:
  • Mean of Spectrogram
  • Variance of Spectrogram

Figure: mean and variance of the spectrogram for
speech and music
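The two spectrogram features above might be computed as follows (a sketch assuming NumPy, a 256-sample Hanning window, and 50% overlap; all parameters are illustrative):

```python
import numpy as np

def spectrogram_stats(signal, frame_len=256, hop=128):
    """Mean and variance of the magnitude spectrogram (50% overlap STFT)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    window = np.hanning(frame_len)               # taper each frame
    spec = np.array([np.abs(np.fft.rfft(window * signal[s:s + frame_len]))
                     for s in starts])
    return float(spec.mean()), float(spec.var())
```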
25
Implementation
What type of network? MLP
Why?
Table 1: Results for the classification of speech
in foreground
Table 2: Results for the classification of
background noises
26
Implementation
Figure: block diagram of feature extraction from
the input signal
27
Most Influential Feature
Using each feature as a single input to the ANN,
the following results were observed:
28
Most Influential Feature
Using each feature as a single input to the ANN,
the following results were observed:
29
Using 3 best features
  • The training error was reduced to 10.56%.

Accuracy: Speech 97.7%, Music 89.3%,
Total 93.33%
30
Comparison to other work
  • F. Feldbusch reported Speech 96.3% and had 6
    other classes (89.8%), using 59 features.
  • M. Kashif Saeed Khan reported a total accuracy
    of 96.6% (speech/music).
  • L. Lu reported Speech 97.45%, Music 93.04%,
    Environment sound 84.43%; total for speech/music
    98.03%.

My results
Accuracy: Speech 97.7%, Music 89.3%,
Total 93.33%
31
Conclusion
  • The implemented ANN performs relatively well
    compared to others.
  • A classification rate of 100% is unreachable
    because humans use context or background
    knowledge to identify sounds, which our system
    cannot use.
  • Therefore, a remote or override switch is still
    needed for the hearing aid.
  • Further work would be to increase the number of
    classes.

32
References
  • [1] M. K. S. Khan, W. G. Al-Khatib, and M.
    Moinuddin, "Automatic classification of speech
    and music using neural networks," in Proc. 2nd
    ACM Int. Workshop on Multimedia Databases, 2004,
    pp. 94-99.
  • [2] L. Lu, H. Jiang, and H. J. Zhang, "A robust
    audio classification and segmentation method," in
    Proc. 9th ACM Int. Conf. Multimedia, 2001, pp.
    203-211.
  • [3] F. Feldbusch, "Identification of noises by
    neural nets for application in hearing aids," in
    Proc. Second International ICSC Symposium on
    Neural Computation (NC 2000), Berlin, May 23-26,
    2000, pp. 505-510.
  • [4] M. Liu, C. Wan, and L. Wang, "Content-based
    audio classification and retrieval using a fuzzy
    logic system: towards multimedia search engines,"
    Soft Computing, vol. 6, no. 5, pp. 357-364, 2002.

33
Adaptive Feedback Cancellation for Hearing Aids
  • Part 2

34
Outline
  • Introduction
  • Background Information
  • Adaptive filter used
  • Decorrelation stage
  • Implementation
  • Results
  • Conclusion

35
Introduction
  • One of the biggest problems that most hearing aid
    users complain about is the howling noise of a
    hearing aid when feedback is present.
  • Without a method to reduce the acoustic feedback,
    the hearing aid user is forced to reduce the gain
    of the hearing aid in order to eliminate such
    problems.

36
Solution to problem
  • A solution to the acoustic feedback problem is
    the use of an adaptive filter.
  • The adaptive filter models the feedback transfer
    function.

37
Adaptive Filter

Figure: transversal (FIR) adaptive filter
structure: a delay line of z^-1 elements, taps
weighted by w0, w1, ..., wM-1, combined by summers
38
Implementation
  • Adaptive filter used: Normalized Least Mean
    Square (NLMS)

1. Initialization: w(0) = 0
2. Filter output: y(n) = w^T(n) x(n)
3. Compute error: e(n) = d(n) - y(n)
4. Update weights: w(n+1) = w(n) + (µ / ||x(n)||²) x(n) e(n)
where µ is the step size
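The NLMS recursion can be sketched as a simple offline loop (assuming NumPy; the function name, the small regularization constant eps, and the test signals are illustrative, not from the slides):

```python
import numpy as np

def nlms(x, d, M=32, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt M weights so that w^T x(n) tracks d(n)."""
    w = np.zeros(M)                              # 1. initialization: w(0) = 0
    e = np.zeros(len(x))
    for n in range(M - 1, len(x)):
        xn = x[n - M + 1:n + 1][::-1]            # M most recent input samples
        y = w @ xn                               # 2. filter output y(n)
        e[n] = d[n] - y                          # 3. error e(n) = d(n) - y(n)
        w += (mu / (eps + xn @ xn)) * xn * e[n]  # 4. normalized weight update
    return w, e
```

With d generated by a known FIR path, the weights converge to that path's impulse response; the step size µ trades convergence speed against steady-state quality, as the results slides discuss.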
39
Adaptive Filter
  • The adaptive filter will adapt its weights to
    minimize the mean square of the signal e(n).

Minimize E[e²(n)]
  • In this system the signal e(n) contains not only
    the feedback signal u(n), but also s(n), which we
    do not want to be cancelled; thus a decorrelation
    stage is needed.

40
Decorrelation
  • Based on the paper [1] by Joson et al., the
    decorrelation method used is a frequency
    compressor.
  • This decorrelation stage is added to the hearing
    aid path as a preprocessor to the hearing aid.
  • In theory, two pure tones of different
    frequencies are uncorrelated.

41
Decorrelation Stage
Frequency Compressor: f_out = r · f_in,
where r is the compression ratio (r < 1)
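A crude way to see the effect is to resample the signal by the ratio r, which maps a tone at frequency f to r·f (a sketch only: the scheme in [1] runs block-wise in real time, whereas this offline version produces a slightly longer output; the function name is an assumption):

```python
import numpy as np

def compress_frequency(x, r):
    """Read the input at rate r (< 1) via linear interpolation,
    so a tone at frequency f comes out at r * f."""
    idx = np.arange(0, len(x) - 1, r)            # fractional read positions
    return np.interp(idx, np.arange(len(x)), x)
```

With r = 4700/5000 = 0.94, a 5000 Hz sine comes out at 4700 Hz, matching the example on the next slide.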
42
Frequency Compressor
Figure: a sine wave at 5000 Hz compressed to
4700 Hz (r = 0.94)
43
Simulation and Results
  • Simplifications:
  • The feedback path was modeled as a fixed
    transfer function F(z)
  • The hearing aid was modeled as a fixed
    transfer function
  • Adaptive filter used:
  • NLMS
  • 32 taps
  • Conditions: F(z) in real life changes with the
    environment, therefore we want a fast
    convergence time for the adaptive filter

44
Results
  • Without the adaptive filter
  • Feedback creates an unstable system, causing a
    howling noise.

45
Results
  • With the adaptive filter

46
Results
  • Convergence takes approximately 5000 samples,
    which is 0.23 seconds at a sampling frequency of
    22050 Hz.

Figure 7: 3 of the 32 adaptive filter weights
47
Results
  • Adaptive filter output compared to feedback signal

Figure 8: feedback signal u(n) compared with the
adaptive filter output (µ = 0.01)
48
Results
  • Varying Step size

Figure 9 a) e(n) with µ = 0.01
Figure 9 b) MSE with µ = 0.05
Figure 9 c) MSE with µ = 0.1
Figure 9 d) MSE with µ = 1
49
Conclusion
  • Must compromise between convergence speed and
    quality by choosing an appropriate step size µ.
  • Results showed that the adaptive filter
    successfully reduced the feedback.
  • Further work would be to use a varying feedback
    transfer function to test the tracking ability
    and to find an optimum step size µ.

50
References
[1] H. A. L. Joson, F. Asano, Y. Suzuki, and T.
Sone, "Adaptive feedback cancellation with
frequency compression for hearing aids," The
Journal of the Acoustical Society of America,
vol. 94, no. 6, pp. 3248-3254, December 1993.
[2] J. G. Proakis and D. G. Manolakis, Digital
Signal Processing: Principles, Algorithms, and
Applications, 3rd ed. New Jersey: Prentice Hall,
1996, pp. 782-792.