Title: Research activities at AUTH related to dialogue detection Ioannis Pitas Constantine Kotropoulos Nikos Nikolaidis
1 Research activities at AUTH related to dialogue detection
Ioannis Pitas, Constantine Kotropoulos, Nikos Nikolaidis
- WP6 e-team Audiovisual Understanding
2 Outline
- Introduction
- Dialogue detection concept: cross-correlation of indicator functions
- Speaker turn detection based on speech and visual cues (mouth activity)
- Frontal face detection and facial feature detection (e.g. mouth)
- One/two speaker detection
- Speaker clustering based on speech and visual cues
- Fingerprinting
3 Indicator functions and their cross-correlation (1)
A dialogue between two persons from the movie Secret Window (Dialogue 1).
4 Indicator functions and their cross-correlation (2)
A scene without a dialogue between two persons
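The idea behind the two plots above can be sketched in a few lines of code. The slides do not define the indicator functions, so this sketch assumes that each actor has a binary sequence that is 1 in the frames where that actor is active (speaking or on screen) and 0 otherwise. The function name `cross_correlation` and the toy sequences are hypothetical and are not taken from the slides.

```python
from math import sqrt

def cross_correlation(a, b, lag):
    """Normalized cross-correlation of two equal-length binary indicator
    sequences, pairing a[t] with b[t + lag]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = 0.0
    for t in range(n):
        u = t + lag
        if 0 <= u < n:  # only the overlapping frames contribute
            num += (a[t] - mean_a) * (b[u] - mean_b)
    den = sqrt(sum((x - mean_a) ** 2 for x in a) *
               sum((y - mean_b) ** 2 for y in b))
    # A constant sequence (an actor who is never active) has zero variance.
    return num / den if den > 0 else 0.0

# Toy dialogue (assumed data): the actors alternate every 5 frames.
actor_a = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5
actor_b = [1 - x for x in actor_a]

print(cross_correlation(actor_a, actor_b, 0))  # -1.0: never active together
print(cross_correlation(actor_a, actor_b, 5))  # 0.75: B follows A by one turn
```

In a scene without a dialogue, for example when the second actor is never active, this sketch returns 0 at every lag, so a strong peak at a non-zero lag is what would hint at turn-taking.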
5 Speaker Turn Detection
- Audio segmentation aims at finding acoustic events within an audio stream. Speaker turn detection is a special case of speaker segmentation.
- It is an important pre-processing step for audio indexing or speaker tracking.
- Usually, no prior knowledge about the speakers is assumed.
6 DISTBIC
MODEL BASED SEGMENTATION
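The slide gives no details, so the following is only a minimal sketch of the Bayesian Information Criterion (BIC) test that model-based speaker segmentation commonly relies on. It is not the full DISTBIC procedure, and it makes strong simplifying assumptions: one-dimensional features, a single Gaussian per segment, and the sign convention that a positive score favors a change point. The names `delta_bic`, `best_change_point`, `lam` and `margin` are hypothetical.

```python
from math import log

def variance(xs):
    """Maximum-likelihood variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def delta_bic(xs, i, lam=1.0, eps=1e-12):
    """BIC gain of modelling xs[:i] and xs[i:] with two Gaussians
    instead of one. Positive values favor a change at index i."""
    n, n1, n2 = len(xs), i, len(xs) - i
    gain = 0.5 * (n * log(variance(xs) + eps)
                  - n1 * log(variance(xs[:i]) + eps)
                  - n2 * log(variance(xs[i:]) + eps))
    # Extra parameters of the second model: d + d(d+1)/2 = 2 for d = 1.
    penalty = lam * 0.5 * 2 * log(n)
    return gain - penalty

def best_change_point(xs, margin=5, lam=1.0):
    """Return (index, score) of the best change point, or (None, score)
    if no split beats the single-Gaussian model."""
    candidates = range(margin, len(xs) - margin + 1)
    best = max(candidates, key=lambda i: delta_bic(xs, i, lam))
    score = delta_bic(xs, best, lam)
    return (best, score) if score > 0 else (None, score)

# Toy feature track (assumed data): the mean jumps at frame 20.
track = [0.0, 0.2] * 10 + [5.0, 5.2] * 10
print(best_change_point(track)[0])
```

A real system would use multivariate features such as cepstral coefficients with full covariance matrices, and would first propose candidate turns with a distance measure before validating them with a test of this kind.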
7 Frontal face images at quartet and octet resolution
- Original image, quartet image, octet image
8 Face detection based on corners
- The figures show the 3 possible feature point set configurations, each with 100 feature points. They differ in the minimum distance allowed between feature points. In general, small distances between feature points make the points concentrate in a few areas and lead to poor face detection. The minimum allowed distance is a parameter of the training procedure.
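The effect of the minimum distance can be illustrated with a small sketch. The slides do not say how the feature points are chosen, so this assumes a common greedy scheme: candidates are visited in order of decreasing corner strength, and a candidate is kept only if it is at least `min_dist` away from every point already kept. The function name, the `(x, y, strength)` tuple layout and the sample data are all hypothetical.

```python
from math import hypot

def select_feature_points(candidates, min_dist, max_points=100):
    """Greedily keep the strongest candidates that are at least
    min_dist apart. candidates is a list of (x, y, strength) tuples."""
    selected = []
    for x, y, s in sorted(candidates, key=lambda c: c[2], reverse=True):
        if all(hypot(x - px, y - py) >= min_dist for px, py, _ in selected):
            selected.append((x, y, s))
            if len(selected) == max_points:
                break
    return selected

# Assumed toy candidates: two tight clusters and one isolated corner.
corners = [(0, 0, 0.9), (1, 0, 0.8), (5, 0, 0.7), (5, 1, 0.6), (10, 0, 0.5)]

print(select_feature_points(corners, min_dist=1))  # keeps all five points
print(select_feature_points(corners, min_dist=3))  # keeps one per cluster
```

With a small `min_dist` the strongest corners crowd into the same region, which matches the concentration effect described above. A larger value spreads the same budget of points over the face.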
9 Face detection Receiver Operating Characteristic (ROC) curves
- For the SVM-based face detection, the best results were obtained with the sigmoidal kernel. Best equal error rate: 4.5.
- The maximum likelihood detection commits a few false alarms. For FAR between 5.2 and 5.67, the FRR drops quickly from 6.1 to 0.7.
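For readers unfamiliar with the equal error rate (EER), it is the operating point on the ROC curve where the false acceptance rate (FAR) equals the false rejection rate (FRR). The sketch below estimates it by linear interpolation between measured operating points. The function name and the sample values are hypothetical. They are not the measurements behind the figures quoted above.

```python
def equal_error_rate(far, frr):
    """Estimate the EER from FAR and FRR values measured at the same
    sequence of thresholds. Returns None if the curves never cross."""
    for k in range(len(far) - 1):
        d0 = far[k] - frr[k]
        d1 = far[k + 1] - frr[k + 1]
        if d0 == 0:
            return far[k]
        if d0 * d1 < 0:
            # Linear interpolation to the point where FAR - FRR = 0.
            t = d0 / (d0 - d1)
            return far[k] + t * (far[k + 1] - far[k])
    if far[-1] == frr[-1]:
        return far[-1]
    return None

# Made-up operating points: FAR falls and FRR rises with the threshold.
far = [10.0, 6.0, 2.0]
frr = [1.0, 3.0, 8.0]
print(equal_error_rate(far, frr))  # about 4.67
```

A single EER value summarizes a whole curve, which is why the slides can compare kernels and detectors with one number each.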
10 One/Two Speaker Detection
Two-speaker detection (NIST 2002): best EER 16.2
One-speaker detection (NIST 2002): best EER 7.1
Kajarekar, Adami, Hermansky, 2003
11 Frontal face authentication
12 Fingerprinting