Research activities at AUTH related to dialogue detection Ioannis Pitas Constantine Kotropoulos Nikos Nikolaidis

Slides: 13

Provided by: NIKO81

1
Research activities at AUTH related to dialogue detection
Ioannis Pitas, Constantine Kotropoulos, Nikos Nikolaidis

WP6 e-team Audiovisual Understanding

2
Outline

Introduction
Dialogue detection concept: cross-correlation of
indicator functions
Speaker turn detection based on speech and
visual cues (mouth activity)
Frontal face detection and facial feature detection
(e.g. mouth)
One/two-speaker detection
Speaker clustering based on speech and visual
cues
Fingerprinting

3
Indicator functions and their cross-correlation
(1)
A dialogue between two persons from the movie
Secret Window (Dialogue 1).
4
Indicator functions and their cross-correlation
(2)
A scene without a dialogue between two persons
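
The slides do not define the indicator functions or the correlation estimate. As a minimal sketch, assume each actor has a binary indicator function (1 while that actor speaks or is visible, 0 otherwise) and that the cross-correlation is the average product at a given lag. The function name, the toy data and the 3-frame turn length are all hypothetical.

```python
def cross_correlation(a, b, max_lag):
    """Return {lag: c(lag)} with c(lag) = (1/N) * sum_t a[t] * b[t + lag].

    Terms whose shifted index falls outside the sequence are skipped.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("indicator functions must have the same length")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for t in range(n):
            s = t + lag
            if 0 <= s < n:
                total += a[t] * b[s]
        result[lag] = total / n
    return result


# Hypothetical indicator functions: the two actors alternate
# in turns of 3 frames each, as they might in a dialogue.
actor_a = [1, 1, 1, 0, 0, 0] * 4
actor_b = [0, 0, 0, 1, 1, 1] * 4

corr = cross_correlation(actor_a, actor_b, max_lag=6)
best_lag = max(corr, key=lambda lag: corr[lag])
print(corr[0], best_lag, corr[best_lag])
```

Under these assumptions the correlation is zero at lag 0, because the actors never overlap, and peaks at a lag equal to the turn length. A scene without a dialogue would not be expected to show such a periodic alternation.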
5
Speaker Turn Detection

Audio Segmentation aims at finding acoustic
events within an audio stream. Speaker turn
detection is a special case of speaker
segmentation.
It is an important pre-processing step for audio
indexing or speaker tracking.
Usually, no prior knowledge about speakers is
assumed.

6
DISTBIC
MODEL BASED SEGMENTATION
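
The slide names DISTBIC without describing it. The sketch below is not the DISTBIC procedure itself. It shows only a generic Bayesian Information Criterion (BIC) change test of the kind commonly used in speaker turn detection, simplified to a one-dimensional feature modelled by a single Gaussian per segment. The function names, the penalty weight `lam` and the toy signals are assumptions made for illustration.

```python
import math


def variance(xs):
    """Maximum-likelihood variance, floored to keep the logarithm finite."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12)


def delta_bic(xs, i, lam=1.0):
    """BIC gain of modelling xs[:i] and xs[i:] separately instead of jointly.

    For a univariate Gaussian the split adds 2 parameters (mean, variance),
    so the penalty is 0.5 * lam * 2 * log(N).
    """
    n, n1, n2 = len(xs), i, len(xs) - i
    gain = 0.5 * (n * math.log(variance(xs))
                  - n1 * math.log(variance(xs[:i]))
                  - n2 * math.log(variance(xs[i:])))
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain - penalty


def detect_turn(xs, margin=5, lam=1.0):
    """Return (index, score) of the best change point, or (None, score)."""
    candidates = range(margin, len(xs) - margin)
    best_i = max(candidates, key=lambda i: delta_bic(xs, i, lam))
    score = delta_bic(xs, best_i, lam)
    return (best_i, score) if score > 0 else (None, score)


# Toy feature tracks (hypothetical): the level jumps at frame 20
# in the first signal and stays constant in the second.
two_speakers = [0.9, 1.1] * 10 + [4.9, 5.1] * 10
one_speaker = [0.9, 1.1] * 20

turn_index, turn_score = detect_turn(two_speakers)
no_turn_index, _ = detect_turn(one_speaker)
print(turn_index, no_turn_index)
```

A positive score means the two-model hypothesis wins despite the complexity penalty, so a speaker turn is hypothesised at that frame. Real systems would use multivariate features such as cepstral coefficients and full covariance matrices.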
7
Frontal face images at quartet and octet
resolution

Panels: original image, quartet image, octet image.

8
Face detection based on corners

The figures show the 3 possible feature point set
configurations, each having 100 feature points.
They differ in the minimum distance allowed
between the feature points. In general, small
inter-feature-point distances cause the feature
points to cluster and lead to poor face detection.
The minimum allowed distance is a parameter of the
training procedure.
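
The slide does not say how the feature points are chosen under the minimum-distance constraint. One simple possibility, shown here purely as an assumption, is a greedy selection that visits candidate corners from strongest to weakest and keeps a corner only if it lies far enough from every corner already kept. The function name and the candidate data are hypothetical.

```python
import math


def select_feature_points(candidates, num_points, min_distance):
    """Greedily pick up to num_points corners at least min_distance apart.

    candidates: list of (x, y, strength) tuples.
    Returns the kept (x, y) positions, strongest first.
    """
    kept = []
    by_strength = sorted(candidates, key=lambda c: c[2], reverse=True)
    for x, y, _strength in by_strength:
        far_enough = all(
            math.hypot(x - kx, y - ky) >= min_distance for kx, ky in kept
        )
        if far_enough:
            kept.append((x, y))
            if len(kept) == num_points:
                break
    return kept


# Hypothetical corner candidates: (x, y, corner strength).
corners = [(0, 0, 0.9), (1, 0, 0.8), (5, 0, 0.7), (5, 1, 0.6), (10, 0, 0.5)]

spread_out = select_feature_points(corners, num_points=3, min_distance=3)
clustered = select_feature_points(corners, num_points=3, min_distance=1)
print(spread_out)
print(clustered)
```

With the larger minimum distance the three points spread across the image, while the smaller one lets two of them sit next to each other, which illustrates the concentration effect the slide describes.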

9
Face detection Receiver Operating Characteristic
(ROC) curves

For the SVM-based face detection, the best
results were obtained with the sigmoidal kernel.
Best equal error rate (EER): 4.5%.
The maximum likelihood detection commits a few
false alarms. For FAR in the range 5.2% to 5.67%,
the FRR drops quickly from 6.1% to 0.7%.
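
To make the FAR, FRR and equal error rate figures concrete, the following sketch computes them from detector scores. It assumes that a higher score means "more face-like" and that a window is accepted when its score reaches the threshold. The scores are made up for illustration and are unrelated to the results reported on the slide.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at a threshold; scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Scan every observed score as a threshold and return (EER, threshold).

    The EER is approximated at the threshold where |FAR - FRR| is smallest.
    """
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, thr)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Hypothetical detector scores for face and non-face windows.
face_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
non_face_scores = [0.5, 0.3, 0.2, 0.1, 0.05]

eer, eer_threshold = equal_error_rate(face_scores, non_face_scores)
print(eer, eer_threshold)
```

Sweeping the threshold over all values and plotting FRR against FAR gives the ROC curve; the EER is the point on that curve where the two error rates coincide.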

10
One/Two Speaker Detection
Two-speaker detection (NIST 2002): best EER 16.2%

One-speaker detection (NIST 2002): best EER 7.1%

Kajarekar, Adami, Hermansky, 2003
11
Frontal face authentication
12
Fingerprinting
