Emotional Speech Recognition w/ Gender Determination

Provided by: kisan5
Transcript and Presenter's Notes

1
Emotional Speech Recognition w/ Gender Determination
  • Kisang Pak
  • E6820 Speech & Audio Processing & Recognition
  • Professor Dan Ellis
  • Columbia University

2
System Overview
Project Overview
  • Emotions: Neutral, Sadness, Hot Anger, Happy,
    (Contempt)
  • Language: English
  • Features to be used: see below
  • Classification: see below
  • Ranking the probabilities of each emotion

Feature Extraction
  • Fundamental Frequency
  • Jitters in Speech Energy
  • Rise Duration in Speech Energy
  • Rise/Falling Ratio in Speech Energy
  • Formant Frequency
  • Pitch Contour under 500 Hz

Input Signal
  • Speech samples

Classification
  • Bayes/Neural Networks for emotion recognition
  • Pitch Track for Gender Determination

Result
  • Neutral
  • Sadness
  • Anger
  • Happy
  • (Contempt)

3
Emotional Speech Samples
  • Neutral
  • Sadness
  • Anger
  • Happy
  • (Contempt)

neutral
neutral, sounds like sadness
anger
anger, sounds like contempt
sadness
sadness, sounds like neutral
happy
happy, sounds like anger
4
Feature Extraction 1: Fundamental Frequency
  • Frequency domain analysis (adequate for highly
    repetitive signals)
  • Time domain analysis (short-term
    autocorrelation), 50 to 500 Hz

Correlate the signal with delayed copies of itself, using
delays that correspond to frequencies from 50 Hz to 500 Hz.
The delay that produces the highest amplitude can
be converted to the fundamental frequency (sampling rate
divided by the delay in samples).
Fundamental Frequency Median
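
The autocorrelation search described on this slide can be sketched as follows. This is a minimal illustration in plain Python, not the author's implementation: the function names, the frame length, the hop size, and the synthetic 200 Hz test tone are all assumptions made for the example. It scans delays between fs/500 and fs/50 samples, keeps the delay with the largest autocorrelation, and takes the median over frames.

```python
import math
from statistics import median

def estimate_f0(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate F0 of one frame from the peak of its short-term autocorrelation."""
    min_lag = int(fs / fmax)                       # shortest delay = highest pitch
    max_lag = min(int(fs / fmin), len(frame) - 1)  # longest delay = lowest pitch
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag                           # convert delay to frequency

def median_f0(signal, fs, frame_len, hop):
    """Median of the per-frame F0 estimates (frame_len and hop are in samples)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return median(estimate_f0(signal[s:s + frame_len], fs) for s in starts)

# Synthetic test tone (hypothetical input, stands in for a speech sample).
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
f0 = median_f0(tone, fs, frame_len=800, hop=400)
print(round(f0, 1))
```

A real speech front end would also need a voicing decision, since a silent or unvoiced frame still returns some delay in the search range.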
5
Feature Extractions 2, 3, and 4: Speech Energy
Windowed frame: f_s(n; m) = s(n) w(m - n), where s(n) is the speech signal and
w(m - n) is a window (e.g. Hamming) of length Nw
Matlab: winLen = 301; winType = hamming(winLen);
  • Feature Extraction 2: Local Peaks in Speech Energy
  • Feature Extraction 3: Rise Duration in Speech Energy
  • Feature Extraction 4: Rise and Falling Ratio of Peaks (Male)
[Plot: Rise and Falling Ratio of Peaks (Male), neutral vs. anger]
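
A short-time energy contour and its local peaks can be sketched as below. The slide gives only the windowed frame and the window length of 301, so the rest is assumed: energy is taken as the sum of squared windowed samples (the usual definition), the hop size of 150 is arbitrary, and the triangular test envelope is synthetic. The slides do not define the rise-duration and rise/fall measures, so they are left out.

```python
import math

def hamming(length):
    """Symmetric Hamming window, same formula as Matlab's hamming(length)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def short_time_energy(signal, win_len=301, hop=150):
    """Energy of each windowed frame: sum over n of (s(n) * w(m - n))^2."""
    window = hamming(win_len)
    energies = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len]
        energies.append(sum((s * w) ** 2 for s, w in zip(frame, window)))
    return energies

def local_peaks(values):
    """Indices of frames whose value exceeds both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]

# Synthetic amplitude envelope that rises to a single maximum, then falls.
signal = [1.0 - abs(n - 900) / 900.0 for n in range(1801)]
energy = short_time_energy(signal)
peaks = local_peaks(energy)
print(len(energy), peaks)
```

The strict comparison in `local_peaks` ignores flat plateaus. That is acceptable for a sketch, but real energy contours would need handling for them.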
6
Feature Extraction 5: 1st Formant (via LPC filter)
1. Fit a linear prediction (LPC) model to the signal, which
   minimizes the prediction-error energy.
2. Plot the frequency response.
3. Find the roots of the LPC polynomial.
[Plot: 1st Formant (Male), happy vs. sadness]
Sample file: gg_001_happy_1648.62_April-thirteenth.wav
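
The LPC route to the first formant can be illustrated with a small standard-library sketch. It is not the author's code. It fits the LPC coefficients with the autocorrelation method and Levinson-Durbin recursion, then reads the first formant from the lowest peak of the LPC frequency response. Peak picking is swapped in for the root-finding step because it needs no polynomial solver. The function names, the model order, and the synthetic 700 Hz resonator are assumptions.

```python
import cmath
import math

def lpc(signal, order):
    """LPC coefficients [1, a1, ..., ap] via autocorrelation + Levinson-Durbin."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]                          # prediction-error energy, shrinks per step
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def first_formant(a, fs, step_hz=10):
    """Lowest-frequency peak of the LPC envelope 1/|A(e^jw)|, in Hz."""
    freqs = list(range(step_hz, fs // 2, step_hz))
    gain = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        gain.append(1.0 / abs(denom))
    for i in range(1, len(gain) - 1):
        if gain[i] > gain[i - 1] and gain[i] >= gain[i + 1]:
            return freqs[i]
    return None

# Synthetic test signal: impulse response of a two-pole resonator at 700 Hz.
fs, f_res, radius = 8000, 700.0, 0.95
theta = 2 * math.pi * f_res / fs
y = []
for n in range(400):
    x = 1.0 if n == 0 else 0.0
    y1 = y[-1] if n >= 1 else 0.0
    y2 = y[-2] if n >= 2 else 0.0
    y.append(x + 2 * radius * math.cos(theta) * y1 - radius ** 2 * y2)

coeffs = lpc(y, order=2)
f1 = first_formant(coeffs, fs)
print(coeffs, f1)
```

For real speech the order would be higher (a common rule of thumb is roughly the sampling rate in kHz plus two), and the frame would be windowed before the fit.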
7
Classifier (Bayes/Neural Networks): Emotion Recognition
[Diagram: Features 1 to 6 pass through the Normalization and Gender
Separation stages. For each emotion, feature i contributes a weighted
probability (w1p1n to w6p6n for Neutral, w1p1s to w6p6s for Sad, w1p1a to
w6p6a for Anger, w1p1c to w6p6c for Contempt, w1p1h to w6p6h for Happy),
and the six terms feed the outputs S1 (Neutral), S2 (Sad), S3 (Anger),
S4 (Contempt), and S5 (Happy).]
N: Neutral, S: Sad, A: Anger, C: Contempt, H: Happy
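
One plausible reading of the diagram is that each output S is the weighted sum of the six per-feature probabilities for that emotion, and the emotions are then ranked by score. The sketch below follows that reading, which is an assumption, as are the function names. The weights are one of the sets quoted in the results slides, and the probability values are invented purely for illustration.

```python
def emotion_scores(weights, probs):
    """Score per emotion: S_e = sum over features i of w_i * p_i,e."""
    return {emotion: sum(w * p for w, p in zip(weights, feature_probs))
            for emotion, feature_probs in probs.items()}

def rank_emotions(scores):
    """Emotions ordered from most to least likely."""
    return sorted(scores, key=scores.get, reverse=True)

weights = [5, 4, 1, 3, 7, 2]  # weight set quoted in the slides

# Hypothetical per-feature probabilities for one utterance (not measured data).
probs = {
    "neutral":  [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    "sad":      [0.2, 0.1, 0.1, 0.1, 0.1, 0.1],
    "anger":    [0.5, 0.4, 0.3, 0.4, 0.6, 0.2],
    "contempt": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
    "happy":    [0.4, 0.3, 0.2, 0.3, 0.2, 0.5],
}

scores = emotion_scores(weights, probs)
ranking = rank_emotions(scores)
print(ranking)
```

Returning the full ranking rather than only the top emotion matches the "ranking the probabilities of each emotion" goal stated in the project overview.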
8
Normal Distribution Fitting
The samples did not closely follow Gaussian curves.
  • Example: Fundamental Frequency Distribution

[Plot: neutral and anger distributions, with f0 = 185 Hz and p1 marked]
w1p1a = s1
w1p1n = 0
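
The per-feature probability in this example can be sketched by fitting a normal distribution to each emotion's training values and evaluating its density at the observed f0. The training values below are invented for illustration and the weight w1 is arbitrary. Only the 185 Hz query point comes from the slide. Since the slide notes that the real samples did not fit Gaussian curves well, this shows only the mechanics.

```python
import math
from statistics import mean, stdev

def fit_gaussian(samples):
    """Sample mean and standard deviation of the training values."""
    return mean(samples), stdev(samples)

def gaussian_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical training F0 values in Hz (not taken from the presentation).
neutral_f0 = [105, 110, 112, 118, 120, 125]
anger_f0 = [160, 170, 180, 185, 195, 210]

w1 = 5        # hypothetical weight for feature 1
f0 = 185.0    # query point used in the slide's example

p1n = gaussian_pdf(f0, *fit_gaussian(neutral_f0))
p1a = gaussian_pdf(f0, *fit_gaussian(anger_f0))
print(w1 * p1n, w1 * p1a)
```

With these toy numbers the neutral term is effectively zero while the anger term is not, which mirrors the slide's example of feature 1 contributing only to the anger score.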
9
Weight Factors
Example: Happy Speech
10
Results: Gender Separation
Results (male): Emotional Speech Recognition
(65 trained, 100 tested)
weights = 5 4 1 3 7 2
weights = 3 1 4 6 2 9
[Two result tables, each with an "Actual" axis]
11
Probability Ranking (Example)
Example: Actual -> Contempt
Contempt ranked 2nd 38.5% of the time
12
Use of Different Weights
  • List of emotions to be recognized
  • Select the best weight constants
  • Best results

If Happy is included: weights = 5 4 1 3 7 2
If Contempt is included: weights = 3 1 4 6 2 9
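
The slides do not say how the best weight constants were chosen. One simple possibility, shown here only as an assumption, is an exhaustive search over small integer weights that keeps the combination with the highest accuracy on labelled samples. The function names, the candidate values, and the two-feature toy samples are all hypothetical.

```python
from itertools import product

def classify(weights, probs):
    """Emotion with the highest weighted sum of per-feature probabilities."""
    return max(probs,
               key=lambda e: sum(w * p for w, p in zip(weights, probs[e])))

def accuracy(weights, samples):
    """Fraction of labelled samples classified correctly with these weights."""
    hits = sum(1 for label, probs in samples if classify(weights, probs) == label)
    return hits / len(samples)

def select_weights(samples, n_features, candidates=(1, 2, 3)):
    """Try every weight combination and return the first most accurate one."""
    return max(product(candidates, repeat=n_features),
               key=lambda w: accuracy(w, samples))

# Hypothetical labelled samples: (true emotion, per-feature probabilities).
samples = [
    ("happy", {"happy": [0.2, 0.9], "anger": [0.6, 0.3]}),
    ("anger", {"happy": [0.5, 0.2], "anger": [0.4, 0.8]}),
    ("happy", {"happy": [0.3, 0.7], "anger": [0.6, 0.5]}),
]

best = select_weights(samples, n_features=2)
print(best, accuracy(best, samples))
```

The search grows exponentially with the number of features. With six features and a wider candidate range, a held-out test set would also be needed to avoid tuning the weights to the training samples.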
13
Classifier: Gender Separation
General rule: Fundamental Frequency < 140 Hz -> Male
In emotional speech analysis, such general rules do
not apply because fundamental frequency varies widely.
One method: Pitch Tracking Between 250 Hz and 500
Hz
14
Future Work
Immediate Improvement (by the report due date)
  • Gender Determination, Target: 80%
Long-Term Improvement
  • Remodeling Gaussian density distribution
  • More efficient and faster processing
  • Emotional speech resynthesis
  • Determination of Weight Factors