Title: Research activities at AUTH related to dialogue detection Ioannis Pitas Constantine Kotropoulos Stel
1 Research activities at AUTH related to dialogue detection
Ioannis Pitas, Constantine Kotropoulos, Stelios Asteriadis
- WP5 Focused meeting
- Rocquencourt, Paris
- December 1-2, 2005
2 Outline
- Introduction
- Dialogue detection: cross-correlation of the indicator functions
- Speaker turn detection based on speech and visual cues
- Frontal face detection
- Facial feature detection
- Speaker clustering based on speech and visual cues
- Frontal face authentication
- Other related problems
3 Dialogue detection (1)
- Dialogue: a scene type
- Particular visual appearance
  - typical camera shots
  - shot length
  - lighting, background
- Particular audio appearance
- Aim: to recognize actors
- Needs
  - Face detection, face verification/recognition
  - Speaker turn detection, speaker verification/recognition
4 Dialogue detection (2)
- Dialogue grammar
  - Actors do not speak or appear together on the screen
  - Two cameras shooting the faces of actors A and B
  - Shots from a third camera shooting something different
  - Speaker utterance lasts between 2 and 8 sec.
  - Silence during speaker turns (person thinking, surprised, does not know how to react)
  - Actors could both be present when they are arguing
5 Indicator functions and their cross-correlation (1)
A dialogue between two persons from the movie Secret Window (Dialogue 1).
6 Indicator functions and their cross-correlation (2)
A scene without a dialogue between two persons
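
The slides do not define the indicator functions, so the following is only a minimal sketch under stated assumptions: each actor gets a binary function that equals 1 in the frames where that actor speaks (or is on screen) and 0 elsewhere, and the two functions are compared with a mean-removed, normalized cross-correlation. The helper names `indicator` and `normalized_cross_correlation` are hypothetical. Under the dialogue grammar described earlier (actors alternate and rarely overlap), one would expect a strongly negative value at lag 0 and a positive peak near the typical turn length.

```python
import numpy as np

def indicator(segments, n_frames):
    """Binary indicator: 1 inside each (start, end) frame interval, 0 elsewhere."""
    x = np.zeros(n_frames)
    for start, end in segments:
        x[start:end] = 1.0
    return x

def normalized_cross_correlation(a, b, max_lag):
    """Mean-removed cross-correlation of a[n] with b[n + lag], scaled by the signal energies."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2))
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.zeros(len(lags))
    if denom == 0:
        return lags, values  # one signal is constant: correlation undefined, report zeros
    for k, lag in enumerate(lags):
        if lag >= 0:
            num = np.sum(a0[:len(a0) - lag] * b0[lag:])
        else:
            num = np.sum(a0[-lag:] * b0[:len(b0) + lag])
        values[k] = num / denom
    return lags, values

# Hypothetical scene: actors A and B alternate every 50 frames.
a = indicator([(0, 50), (100, 150), (200, 250)], 300)
b = indicator([(50, 100), (150, 200), (250, 300)], 300)
lags, values = normalized_cross_correlation(a, b, max_lag=60)
print("zero-lag correlation:", values[lags == 0][0])
print("lag of strongest positive correlation:", abs(lags[np.argmax(values)]))
```

For a scene without a dialogue, the same computation would be expected to show no such alternating structure.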
7 Speaker Turn Detection
- Audio segmentation aims at finding acoustic events within an audio stream. Speaker turn detection is a special case of speaker segmentation.
- It is an important pre-processing step for speech when implementing audio indexing or speaker tracking.
- Usually, no prior knowledge about speakers is assumed.
8 DISTBIC (1)
Pre-processing stage
9 DISTBIC (2)
1st segmentation stage
Distance
10 DISTBIC (3)
2nd segmentation stage
Model-based segmentation
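
The DISTBIC slides only name the stages. As a rough illustration of the model-based second stage, the sketch below assumes that it validates a candidate speaker change with the Bayesian Information Criterion, comparing one full-covariance Gaussian for the whole window against two Gaussians split at the candidate point. The function name `delta_bic`, the sign convention (positive means "keep the change point") and the `penalty_weight` parameter are assumptions made for this example, not details taken from the slides.

```python
import numpy as np

def delta_bic(features, t, penalty_weight=1.0):
    """BIC gain of splitting `features` (frames x dims) at frame index t.

    Positive values favour two separate Gaussian models (a speaker change),
    negative values favour a single model (no change).
    """
    n, d = features.shape

    def log_det_cov(segment):
        # Maximum-likelihood covariance; reshape keeps the 1-D feature case valid.
        cov = np.cov(segment, rowvar=False, bias=True).reshape(d, d)
        _, value = np.linalg.slogdet(cov)
        return value

    likelihood_gain = 0.5 * (n * log_det_cov(features)
                             - t * log_det_cov(features[:t])
                             - (n - t) * log_det_cov(features[t:]))
    # Extra parameters of the second Gaussian: d means + d(d+1)/2 covariance terms.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return likelihood_gain - penalty_weight * penalty

# Synthetic 2-D features standing in for real acoustic vectors.
rng = np.random.default_rng(0)
two_speakers = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                          rng.normal(5.0, 1.0, (200, 2))])
one_speaker = rng.normal(0.0, 1.0, (400, 2))
print(delta_bic(two_speakers, 200) > 0)  # change point supported
print(delta_bic(one_speaker, 200) > 0)   # change point rejected
```

In a two-pass scheme of this kind, the first (distance-based) stage would propose the candidate indices `t`, and only those candidates would be tested with the criterion above.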
11 Frontal face detection algorithms
- Face detection is a considerably difficult task because it involves locating faces without prior knowledge of their scales, locations, or orientations. It also has to cope with the occlusion problem.
- Two appearance-based techniques are presented:
  - The first is built on support vector machines.
  - The second is based on eigenvector decomposition.
12 Frontal face images at quartet and octet resolution
[Figure panels: original image, quartet image, octet image]
13 Face detection based on corners
- The figures show the three possible feature point set configurations, each having 100 feature points. They differ in the minimum distance allowed between feature points. In general, small distances between feature points cause the points to cluster together and lead to poor face detection. The minimum allowed distance is a parameter of the training procedure.
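
The effect of the minimum-distance parameter can be illustrated with a small sketch. It assumes that corner strength is the smallest eigenvalue of a 2x2 gradient matrix accumulated over a local window (the "Z matrix" mentioned on the statistical training slide is taken here to be that matrix, which is an assumption), and that points are picked greedily from strongest to weakest. The function names, the window size and the greedy strategy are illustrative choices, not the authors' implementation.

```python
import numpy as np

def min_eigenvalue_map(image, half_window=2):
    """Smallest eigenvalue of the windowed 2x2 gradient matrix at every pixel."""
    gy, gx = np.gradient(image.astype(float))
    k = 2 * half_window + 1

    def box_sum(a):
        # Sum over a k x k neighbourhood using an integral image.
        padded = np.pad(a, half_window, mode="edge")
        c = np.pad(np.cumsum(np.cumsum(padded, axis=0), axis=1), ((1, 0), (1, 0)))
        return c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]

    sxx, sxy, syy = box_sum(gx * gx), box_sum(gx * gy), box_sum(gy * gy)
    # Closed-form smaller eigenvalue of [[sxx, sxy], [sxy, syy]].
    return 0.5 * (sxx + syy) - np.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)

def select_feature_points(image, n_points=100, min_distance=5.0):
    """Greedily keep the strongest corners that are at least min_distance apart."""
    strength = min_eigenvalue_map(image)
    order = np.argsort(strength, axis=None)[::-1]  # strongest first
    rows, cols = np.unravel_index(order, strength.shape)
    chosen = []
    for r, c in zip(rows, cols):
        if all((r - pr) ** 2 + (c - pc) ** 2 >= min_distance ** 2 for pr, pc in chosen):
            chosen.append((int(r), int(c)))
            if len(chosen) == n_points:
                break
    return chosen

# Random texture standing in for a facial region.
rng = np.random.default_rng(0)
points = select_feature_points(rng.random((40, 40)), n_points=20, min_distance=4.0)
print(len(points), points[:3])
```

Because the list is built from strongest to weakest, the returned points are already sorted in decreasing order of the smallest eigenvalue. A small `min_distance` lets the points pile up around a few strong corners, which is the concentration effect described above.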
14 Statistical training
- The training procedure involves generating feature point sets on a number of training images.
- The feature point sets are generated in user-defined bounding boxes containing facial regions.
- Each facial region is represented by the vector of image intensities at the feature points, sorted in decreasing order of the smallest eigenvalue of the associated Z matrix.
- PCA is performed on the covariance matrix of these facial region patterns. The first M principal components are used.
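
The PCA step can be sketched as follows, assuming each training pattern is a fixed-length vector of intensities (one per feature point) stacked as a row of a matrix. The names `train_pca` and `project` and the use of `numpy.linalg.eigh` are choices made for this example; the slides do not say how the decomposition was computed.

```python
import numpy as np

def train_pca(patterns, m):
    """Return the mean, the first m principal axes (as columns) and their variances."""
    mean = patterns.mean(axis=0)
    centered = patterns - mean
    covariance = centered.T @ centered / (len(patterns) - 1)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:m]               # keep the m largest
    return mean, eigenvectors[:, order], eigenvalues[order]

def project(pattern, mean, basis):
    """Coordinates of one pattern in the retained principal subspace."""
    return basis.T @ (pattern - mean)

# Hypothetical training set: 50 facial regions, 100 feature points each.
rng = np.random.default_rng(0)
patterns = rng.random((50, 100))
mean, basis, variances = train_pca(patterns, m=5)
print(basis.shape, project(patterns[0], mean, basis).shape)
```

Sorting the feature points by corner strength before reading the intensities gives every pattern the same component ordering, which is what makes the covariance matrix meaningful across training images.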
15 ROC curves
- For the SVM-based face detection, the best results were obtained with the sigmoidal kernel. Best equal error rate: 4.5.
- The maximum likelihood detection commits few false alarms. For FAR between 5.2 and 5.67, the FRR drops quickly from 6.1 to 0.7.
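
For readers less familiar with the quantities on this slide, the sketch below shows one common way to obtain FAR, FRR and the equal error rate from detector scores. It assumes higher scores mean "more face-like" and approximates the EER by the threshold where FAR and FRR are closest; the scores and function names are invented for illustration and are unrelated to the results reported above.

```python
import numpy as np

def far_frr(face_scores, nonface_scores, threshold):
    """False acceptance and false rejection rates at a given decision threshold."""
    far = float(np.mean(np.asarray(nonface_scores) >= threshold))  # non-faces accepted
    frr = float(np.mean(np.asarray(face_scores) < threshold))      # faces rejected
    return far, frr

def equal_error_rate(face_scores, nonface_scores):
    """Approximate EER: scan observed scores as thresholds, pick the smallest |FAR - FRR|."""
    thresholds = np.unique(np.concatenate([face_scores, nonface_scores]))
    best_threshold = min(
        thresholds,
        key=lambda t: abs(np.subtract(*far_frr(face_scores, nonface_scores, t))),
    )
    far, frr = far_frr(face_scores, nonface_scores, best_threshold)
    return 0.5 * (far + frr), float(best_threshold)

# Toy scores, not experimental data.
faces = np.array([0.9, 0.8, 0.7, 0.4])
nonfaces = np.array([0.1, 0.2, 0.3, 0.6])
eer, threshold = equal_error_rate(faces, nonfaces)
print(eer, threshold)
```

Sweeping the threshold over its full range and plotting FRR against FAR gives the ROC curve itself.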
16 Face detection under difficult circumstances
- The eigenvector decomposition approach has proven robust to partial occlusion and illumination changes, as well as to multiple faces appearing in a scene.
17 Facial feature detection
- Standard PCA is applied to data encoding the geometry of the characteristic areas.
- Such data are the distance of each pixel to its closest edge and the slope of each respective connecting line.
- The system is trained with models (left and right eyes, mouth).
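
The two geometric descriptors can be illustrated with a brute-force sketch. It assumes a binary edge map as input and represents the "slope" as the angle (in radians) of the line from each pixel to its nearest edge pixel; both the representation and the function name `distance_and_slope_maps` are assumptions for this example. The exhaustive search is only practical for small regions such as an eye or mouth window.

```python
import numpy as np

def distance_and_slope_maps(edge_map):
    """For every pixel: distance to the closest edge pixel and angle of the connecting line."""
    edge_rows, edge_cols = np.nonzero(edge_map)
    if edge_rows.size == 0:
        raise ValueError("edge_map contains no edge pixels")
    h, w = edge_map.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Offsets from every pixel to every edge pixel, shape (h, w, number_of_edges).
    dr = edge_rows[None, None, :] - rows[:, :, None]
    dc = edge_cols[None, None, :] - cols[:, :, None]
    squared = dr ** 2 + dc ** 2
    nearest = np.argmin(squared, axis=2)[..., None]
    distance = np.sqrt(np.take_along_axis(squared, nearest, axis=2)[..., 0])
    near_dr = np.take_along_axis(dr, nearest, axis=2)[..., 0]
    near_dc = np.take_along_axis(dc, nearest, axis=2)[..., 0]
    slope = np.arctan2(near_dr, near_dc)
    return distance, slope

# Single edge pixel in the centre of a 5x5 window.
edges = np.zeros((5, 5), dtype=bool)
edges[2, 2] = True
distance, slope = distance_and_slope_maps(edges)
print(distance[2, 0], slope[0, 2])
```

Flattening the two maps into one vector per window would give the kind of geometric pattern that PCA can then be trained on.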
18 Facial feature detection: feature models
19 Facial feature detection (details)
- Search for the areas of a face that most closely resemble an artificial model of each characteristic (left or right eye, mouth).
- The similarity is found by comparing the projection vectors of the model and the searched area.
- The projection is made using the eigenvectors found during training.
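
The search described above can be sketched as a nearest-neighbour comparison in the eigenspace. The slides do not state which similarity measure is used, so Euclidean distance between projection vectors is an assumption here, as are the function names and the toy data.

```python
import numpy as np

def project(vector, mean, eigenvectors):
    """Project a pattern onto the eigenvectors (columns) found during training."""
    return eigenvectors.T @ (np.asarray(vector, dtype=float) - mean)

def best_matching_area(candidates, model, mean, eigenvectors):
    """Index and distance of the candidate whose projection is closest to the model's."""
    model_projection = project(model, mean, eigenvectors)
    distances = [
        float(np.linalg.norm(project(c, mean, eigenvectors) - model_projection))
        for c in candidates
    ]
    best = int(np.argmin(distances))
    return best, distances[best]

# Toy setup: 3-dimensional patterns, two retained eigenvectors.
mean = np.zeros(3)
eigenvectors = np.eye(3)[:, :2]
model = [1.0, 2.0, 0.0]
candidates = [[0.0, 0.0, 0.0], [1.0, 2.0, 5.0], [3.0, 3.0, 3.0]]
print(best_matching_area(candidates, model, mean, eigenvectors))
```

In practice each candidate would be the pattern extracted from one window position on the face, and the window with the smallest distance would be reported as the detected feature.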
20 Edge intensity
- For the eye area localisation, only the most prominent edges are preserved.
- For the eye center detection, many edges are needed (low threshold for the edge detector).
21 Facial feature detection: face image, edge detection, distance map, slope map
22 Some facial feature results
23 Speaker Detection (1)
- One-speaker detection
  - To determine whether a specified speaker is speaking during a given side of the conversation.
- Two-speaker detection
  - To determine whether a specified speaker is speaking during the entire conversation.
  - Segmentation
  - Select appropriate speakers for training/testing
  - Perform one-speaker detection
24 Speaker Detection (2)
Two-speaker detection (NIST 2002): best EER 16.2
One-speaker detection (NIST 2002): best EER 7.1
(Kajarekar, Adami, Hermansky, 2003)
25 Frontal face authentication
26 Other closely related problems
27 Format of Signature
28 Computation of Video Similarity
- The similarity of two videos, for a certain displacement d, is computed from F_i(n, m), the certainty that person m appears in frame n of video segment i.
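
The similarity formula itself is not reproduced in this text, so the following is only one plausible form, offered as an assumption rather than as the authors' definition: for displacement d, the certainties F_1(n, m) and F_2(n + d, m) are multiplied, summed over all persons m and overlapping frames n, and divided by the number of overlapping frames. The function name `video_similarity` and the normalization are hypothetical.

```python
import numpy as np

def video_similarity(f1, f2, d):
    """Assumed similarity of two signatures at displacement d.

    f1, f2: arrays of shape (frames, persons) holding the certainty that
    person m appears in frame n. Frame n of f1 is compared with frame n + d of f2.
    """
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    start = max(0, -d)
    stop = min(len(f1), len(f2) - d)
    if stop <= start:
        return 0.0  # the two segments do not overlap at this displacement
    overlap1 = f1[start:stop]
    overlap2 = f2[start + d:stop + d]
    return float(np.sum(overlap1 * overlap2)) / (stop - start)

# Two persons alternating on screen; the second video is the first shifted by one frame.
f1 = [[1, 0], [0, 1], [1, 0]]
f2 = [[0, 1], [1, 0], [0, 1], [1, 0]]
print(video_similarity(f1, f2, d=1), video_similarity(f1, f2, d=0))
```

Evaluating such a score over a range of displacements and keeping the maximum is one way a query segment could be located inside a longer video.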
29 Search Retrieval Algorithm