Lingfen Sun - PowerPoint PPT Presentation

Title: Lingfen Sun

1
Perceived Speech Quality Prediction for VoIP
Networks

Lingfen Sun
Emmanuel Ifeachor

2
Outline

Introduction
Simulation system
Perceived speech quality analysis
Impact of loss on speech quality
Impact of talkers on speech quality
Perceived speech quality prediction using Neural
Network (NN) method
Conclusions and future work

3
Introduction

Speech quality Measurement
Subjective method (Mean Opinion Score -- MOS)
Objective methods
Intrusive methods (e.g. ITU P.862 PESQ)
Nonintrusive methods (e.g. E-model, NN model)
Why do we need to predict speech quality?
For online monitoring of VoIP calls
For Quality of Service (QoS) control for VoIP
applications

4
How to predict speech quality?

E-model
All impairments are mapped to the R-scale (R → MOS)
Principle: "Psychological factors on the
psychological scale are additive"
Static and computational model.
NN-model
To learn the non-linear relationships between
network impairments and perceived speech quality
To adapt to dynamic IP network conditions.
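
The R → MOS step of the E-model can be sketched as follows. The conversion formula is the one given in ITU-T G.107; the function names and the simplified additive form R = R0 - Is - Id - Ie + A are written here for illustration and are not taken from the slides.

```python
def r_factor(r0: float, i_s: float, i_d: float, i_e: float, a: float = 0.0) -> float:
    """Additive impairment model: each impairment is subtracted on the R-scale."""
    return r0 - i_s - i_d - i_e + a


def r_to_mos(r: float) -> float:
    """Convert an E-model rating R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Illustrative impairment values only (not measured data).
r = r_factor(r0=93.2, i_s=0.0, i_d=5.0, i_e=20.0)
print(round(r_to_mos(r), 2))
```

Because the mapping is a fixed formula, the model is static: it cannot learn from new network conditions, which is the motivation the slide gives for the NN approach.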

5
Previous work

NN databases are based on subjective tests only
As subjective testing is time-consuming, costly and
stringent, available databases are limited and
cannot cover all the possible scenarios
Only a limited number of subjects took part in the
MOS tests
Limited number of codecs
Talker dependency has not been considered.

6
Main objectives of work

To undertake a fundamental investigation of the
impact of packet loss on perceived speech quality
using an objective measurement algorithm (e.g.
PESQ)
To investigate the impact of different talkers on
perceived speech quality
To develop a robust NN model for speech quality
prediction based on PESQ.

7
Simulation system structure
[Diagram: reference speech → simulated VoIP system
(encoder → loss simulator → decoder) → degraded
speech; reference speech and degraded speech →
quality measure (PESQ) → measured MOS]

Reference speech is from a speech database

8
Loss Simulator
2 state Gilbert Model to simulate packet loss
characteristics

Network packet loss + late arrival loss due to
jitter
Unconditional loss probability (ulp, or average
loss rate): ulp = p / (p + 1 - q)
Conditional loss probability (clp): clp = q, to
reflect burst loss features
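
A minimal sketch of such a loss simulator, assuming p is the probability of losing a packet when the previous one was received and q the probability of losing a packet when the previous one was lost (so that clp = q). Solving ulp = p / (p + 1 - q) for p gives p = ulp (1 - q) / (1 - ulp). The function names and the seeding scheme are illustrative.

```python
import random


def gilbert_params(ulp: float, clp: float) -> tuple[float, float]:
    """Return (p, q) for a 2-state Gilbert model from ulp and clp."""
    if not 0.0 <= ulp < 1.0 or not 0.0 <= clp < 1.0:
        raise ValueError("ulp and clp must lie in [0, 1)")
    q = clp
    p = ulp * (1.0 - q) / (1.0 - ulp)
    if p > 1.0:
        raise ValueError("this (ulp, clp) pair is not reachable by the model")
    return p, q


def simulate_losses(n_packets: int, ulp: float, clp: float, seed: int) -> list[bool]:
    """Return a loss pattern for n_packets packets; True means lost."""
    p, q = gilbert_params(ulp, clp)
    rng = random.Random(seed)
    lost = rng.random() < ulp  # draw the first state from the stationary distribution
    pattern = []
    for _ in range(n_packets):
        pattern.append(lost)
        # The next state depends only on whether the current packet was lost.
        lost = rng.random() < (q if lost else p)
    return pattern


pattern = simulate_losses(1000, ulp=0.2, clp=0.5, seed=1)
print(sum(pattern) / len(pattern))  # empirical loss rate, close to 0.2 for long traces
```

Changing the seed moves the loss locations while keeping the same ulp and clp, which is what makes averaging over many seeds meaningful.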

9
Impact of loss on speech quality

How do packet loss and loss burstiness affect
speech quality?
How does packet size affect speech quality?
How does codec affect speech quality?
→ Using PESQ to calculate perceived MOS score
→ Average over 300 different random "seeds" to
reduce the impact from different loss locations

10
Bursty loss analysis (G.729)
11
Bursty loss analysis (G.723.1)
12
Bursty loss effect

clp has an obvious impact on the perceived speech
quality even for the same average loss rate (ulp)
When burst loss increases (clp increasing), the
MOS score decreases and the variation of the MOS
score also increases.
→ Identify ulp and clp as input parameters
related to loss for NN analysis
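
The link between clp and burstiness can be made concrete. In a 2-state Gilbert model with clp = q, a loss burst continues with probability clp, so burst lengths are geometrically distributed with mean 1 / (1 - clp). The sketch below uses illustrative function names and two hand-made traces (not data from the slides) that share the same average loss rate but differ in burstiness.

```python
from itertools import groupby


def burst_lengths(pattern):
    """Lengths of consecutive-loss runs in a loss pattern (True = lost)."""
    return [sum(1 for _ in run) for lost, run in groupby(pattern) if lost]


def expected_burst_length(clp):
    """Mean loss-burst length under a 2-state Gilbert model with clp = q."""
    if not 0.0 <= clp < 1.0:
        raise ValueError("clp must lie in [0, 1)")
    return 1.0 / (1.0 - clp)


# Both traces lose 4 packets out of 10 (ulp = 0.4).
scattered = [True, False, False, True, False, True, False, False, True, False]
bursty = [False, False, False, True, True, True, True, False, False, False]
print(burst_lengths(scattered))  # [1, 1, 1, 1]
print(burst_lengths(bursty))     # [4]
```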

13
Impact of packet size (G.729)
14
Impact of packet size (G.723.1)
15
Impact of packet size on quality

Packet size has, in general, no obvious influence
on speech quality for a given loss rate.
Variation in speech quality for the same network
loss rate depends on packet size and codec.
Variation in quality due to loss location is the
main obstacle in the prediction of speech quality
→ To consider loss only during active talkspurt
frames (not for silence frames or SID frames).
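
One way to restrict the loss statistics to talkspurts is sketched below. It assumes a per-packet voice-activity flag is available (the `active` list is hypothetical) and simply drops inactive packets before estimating ulp and clp, so packets on either side of a silence gap are treated as adjacent. That simplification is a choice made here, not something stated in the slides.

```python
def loss_stats_active(lost, active):
    """Estimate (ulp, clp) over active-speech packets only.

    lost[i]   : True if packet i was lost
    active[i] : True if packet i carried talkspurt frames (hypothetical VAD flag)
    """
    if len(lost) != len(active):
        raise ValueError("lost and active must have the same length")
    kept = [is_lost for is_lost, is_active in zip(lost, active) if is_active]
    if not kept:
        return 0.0, 0.0
    ulp = sum(kept) / len(kept)
    # clp: among packets that follow a lost packet, the fraction that are also lost.
    after_loss = [cur for prev, cur in zip(kept, kept[1:]) if prev]
    clp = sum(after_loss) / len(after_loss) if after_loss else 0.0
    return ulp, clp


lost = [True, True, False, True, False, False]
active = [True, True, True, False, False, True]
print(loss_stats_active(lost, active))  # (0.5, 0.5)
```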

16
Impact of talker on speech quality

To investigate whether the talker (male or female)
has an effect on perceived speech
quality
TIMIT data set and ITU data set are used for
investigation

17
Talker Dependency

For 3 male and 3 female samples

18
Talker Dependency (cont.)

For 6 mixed male and female samples

19
Impact of talker on MOS

Impact of different talkers on perceived speech
quality appears to depend mainly on the gender of
the talker (male or female).
The quality for the female talker tends to be
worse than that of the male talker for the same
network impairments.
→ Identify gender (male and female) as one of the
input parameters for NN analysis.

20
Quality prediction based on NN

Developed a neural network model (using Stuttgart
Neural Network Simulator).
Identified four variables as inputs to NN
Codec type (G.729, G.723.1 and AMR)
Gender (male and female)
Unconditional loss probability: ulp (VAD)
Conditional loss probability: clp (VAD)
One output (MOS)
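
The four inputs have to be turned into numbers before they reach the network. The slides do not say how codec and gender were coded, so the numeric codes below are hypothetical, as is the choice to scale the loss probabilities from percentages to the range 0 to 1.

```python
# Hypothetical numeric codes; the slides do not specify the encoding used.
CODEC_CODE = {"G.729": 0.0, "G.723.1": 0.5, "AMR": 1.0}
GENDER_CODE = {"male": 0.0, "female": 1.0}


def encode_inputs(codec, gender, ulp_percent, clp_percent):
    """Build the 4-element input vector [codec, gender, ulp, clp]."""
    if not (0.0 <= ulp_percent <= 100.0 and 0.0 <= clp_percent <= 100.0):
        raise ValueError("loss probabilities must be given as percentages")
    return [
        CODEC_CODE[codec],
        GENDER_CODE[gender],
        ulp_percent / 100.0,
        clp_percent / 100.0,
    ]


print(encode_inputs("AMR", "female", 20, 50))  # [1.0, 1.0, 0.2, 0.5]
```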

21
NN structure (for a 4-5-1 net)

a three-layer, feed-forward, neural network
architecture
standard Backpropagation learning algorithm
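
A minimal pure-Python sketch of a 4-5-1 feed-forward network trained with plain backpropagation. The original work used the Stuttgart Neural Network Simulator; the sigmoid activations, the weight initialisation range, the learning rate and the assumption that the MOS target is scaled into the range 0 to 1 are all choices made here for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Net451:
    """Three-layer feed-forward net: 4 inputs, 5 sigmoid hidden units, 1 sigmoid output."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One row per hidden unit: 4 input weights followed by a bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(5)]
        # 5 hidden-to-output weights followed by a bias weight.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(6)]

    def forward(self, x):
        """Return (hidden activations, output) for a 4-element input list."""
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr=0.1):
        """One backpropagation update on squared error; returns the error before the update."""
        hidden, out = self.forward(x)
        err = out - target
        delta_out = err * out * (1.0 - out)
        # Hidden deltas use the output weights as they were before this update.
        delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j]) for j in range(5)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        for j in range(5):
            for i, v in enumerate(x + [1.0]):
                self.w_hidden[j][i] -= lr * delta_hidden[j] * v
        return 0.5 * err * err


net = Net451(seed=0)
x = [0.5, 1.0, 0.2, 0.5]  # hypothetical encoded inputs: codec, gender, ulp, clp
for _ in range(200):
    net.train_step(x, target=0.6)  # hypothetical scaled MOS target
print(net.forward(x)[1])
```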

22
NN database generation

Codec: G.729, G.723.1 (6.3 kb/s), AMR (12.2 kb/s)
Gender: male and female
ulp: 0%, 10%, 20%, 30% and 40%
clp: 10%, 50% and 90%
Packet size: 1 to 5
→ A total of 362 samples (patterns) were
generated based on PESQ. 70% were chosen as the
training set and 30% as the test dataset.
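
A sketch of the train/test split, assuming the 70 and 30 on the slide are percentages of the 362 patterns. The slides do not say how the patterns were assigned or how the counts were rounded, so the random shuffle and the rounding down of the training count are assumptions.

```python
import random


def split_patterns(n_patterns, train_fraction=0.7, seed=0):
    """Shuffle pattern indices and split them into training and test sets."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)
    n_train = int(n_patterns * train_fraction)  # rounds down
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_patterns(362)
print(len(train_idx), len(test_idx))  # 253 109
```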

23
NN training process
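[Diagram: reference speech → simulated VoIP system
→ degraded speech; reference and degraded speech →
quality measure (PESQ) → measured MOS; network,
codec and speech parameters → NN → predicted MOS;
the difference between measured and predicted MOS
feeds backprop]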
24
Predicted MOS vs Measured MOS
Train: ? = 0.967, r = 0.12
Test: ? = 0.952, r = 0.15
25
Validation of the NN model

Generated a validation dataset from other talkers
and different network loss conditions (total 210
samples)
Obtained ? = 0.946, r = 0.19 for the validation
dataset using a trained 4-5-1 neural network.
→ This suggested that the neural network model
works well for speech quality prediction in
general.
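
The slides report two figures per dataset whose symbols did not survive extraction. The first reads like a correlation between predicted and measured MOS and the second like an error measure; that reading is an assumption, and the exact definitions used in the original work are not recoverable from the transcript. Under that assumption, the two quantities could be computed along these lines (Pearson correlation and root-mean-square error; the MOS values in the example are made up):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square difference between two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Made-up MOS values for illustration only; not results from the slides.
measured = [3.9, 3.2, 2.6, 2.1, 1.7]
predicted = [3.8, 3.3, 2.4, 2.2, 1.6]
print(pearson(measured, predicted), rmse(measured, predicted))
```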

26
Conclusions

Investigated the impact of packet loss, codec and
talker on perceived speech quality based on PESQ
The loss pattern, loss burstiness and the gender
of the talker have an impact on speech quality.
Packet size has, in general, no obvious influence
on speech quality, but the deviation in speech
quality depends on packet size and codec.
Based on codec, bursty loss rate and gender of
the talker, an NN model was developed successfully
for speech quality prediction.

27
Future work

Extend to conversational speech quality
prediction to cater for the impact from delay.
Use real VoIP trace data instead of generated
data from Gilbert loss model.
Use more robust neural networks.
Application to QoS Control in VoIP systems.
