Research activities at AUTH related to dialogue detection

1
Research activities at AUTH related to dialogue detection
Ioannis Pitas, Constantine Kotropoulos, Stelios Asteriadis
  • WP5 Focused meeting
  • Rocquencourt, Paris
  • December 1-2, 2005

2
Outline
  • Introduction
  • Dialogue detection: cross-correlation of the
    indicator functions
  • Speaker turn detection based on speech and visual
    cues
  • Frontal face detection
  • Facial feature detection
  • Speaker clustering based on speech and visual
    cues
  • Frontal face authentication
  • Other related problems

3
Dialogue detection (1)
  • Dialogue: a scene type
  • Particular visual appearance:
  • typical camera shots
  • shot length
  • lighting, background
  • Particular audio appearance
  • Aim: to recognize actors
  • Needs:
  • Face detection; face verification/recognition
  • Speaker turn detection; speaker
    verification/recognition

4
Dialogue detection (2)
  • Dialogue grammar
  • Actors do not speak or appear together on the
    screen
  • Two cameras shooting the faces of actors A and B
  • Shots from a third camera shooting something
    different
  • Speaker utterance lasts between 2 and 8 sec.
  • Silence during speaker turns (person thinking,
    surprised, does not know how to react)
  • Both actors could be present when they are arguing

5
Indicator functions and their cross-correlation (1)
A dialogue between two persons from the movie
'Secret Window' [Dialogue 1].
6
Indicator functions and their cross-correlation (2)
A scene without a dialogue between two persons
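The two slides above contrast the cross-correlation of actor indicator functions for a dialogue and for a non-dialogue scene. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each indicator function is a binary sequence with one value per shot (1 when the actor is on screen or speaking, 0 otherwise); the function name `cross_correlation` and the sample sequences are hypothetical.

```python
from math import sqrt


def cross_correlation(a, b, max_lag):
    """Normalized cross-correlation of two equal-length sequences.

    Returns a dict mapping each lag in [-max_lag, max_lag] to the
    correlation between a[i] and b[i + lag], using mean-removed values.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    norm = sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += da[i] * db[j]
        result[lag] = total / norm if norm > 0 else 0.0
    return result


# Hypothetical shot-level indicators for a strict A-B-A-B alternation.
actor_a = [1, 0, 1, 0, 1, 0, 1, 0]
actor_b = [0, 1, 0, 1, 0, 1, 0, 1]
corr = cross_correlation(actor_a, actor_b, max_lag=2)
for lag in sorted(corr):
    print(lag, round(corr[lag], 3))
```

For an alternating pattern like this one, the correlation is strongly negative at lag 0 and strongly positive at lags of plus or minus one shot, which is the kind of signature a dialogue detector could threshold.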
7
Speaker Turn Detection
  • Audio segmentation aims at finding acoustic
    events within an audio stream. Speaker turn
    detection is a special case of speaker
    segmentation.
  • It is an important pre-processing step for
    audio indexing or speaker tracking.
  • Usually, no prior knowledge about speakers is
    assumed.

8
DISTBIC (1)
Pre-processing stage
9
DISTBIC (2)
1st segmentation stage: distance
10
DISTBIC (3)
2nd segmentation stage: model-based segmentation
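The slides name the DISTBIC stages without giving formulas. As a hedged illustration of the kind of test such a scheme can use to validate a candidate speaker change, the sketch below computes a Bayesian Information Criterion difference for one split point. It is a simplification under stated assumptions: features are one-dimensional, each segment is modelled by a single Gaussian, and the names `delta_bic` and `penalty_weight` are hypothetical. It is not the exact procedure shown in the presentation.

```python
from math import log


def variance(values):
    """Maximum-likelihood variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def delta_bic(features, split, penalty_weight=1.0):
    """BIC difference between 'two Gaussians' and 'one Gaussian'.

    A positive value favours a change point at index `split`.
    Assumes scalar features (a simplification for illustration).
    """
    left, right = features[:split], features[split:]
    n, n1, n2 = len(features), len(left), len(right)
    eps = 1e-12  # guards against log(0) for constant segments
    gain = 0.5 * (
        n * log(variance(features) + eps)
        - n1 * log(variance(left) + eps)
        - n2 * log(variance(right) + eps)
    )
    extra_params = 2  # one extra mean and one extra variance
    penalty = 0.5 * penalty_weight * extra_params * log(n)
    return gain - penalty


# Hypothetical feature tracks: one with a clear change, one without.
speaker_1 = [0.1, -0.1, 0.2, -0.2, 0.0] * 4
speaker_2 = [5.1, 4.9, 5.2, 4.8, 5.0] * 4
with_change = speaker_1 + speaker_2
without_change = speaker_1 + speaker_1

print(delta_bic(with_change, 20) > 0)     # change point supported
print(delta_bic(without_change, 20) > 0)  # change point rejected
```

In practice the features would be multi-dimensional (for example cepstral vectors), in which case the variances become covariance determinants and the penalty grows with the feature dimension.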
11
Frontal face detection algorithms
  • Face detection is a difficult task because it
    involves locating faces without prior knowledge
    of their scales, locations, or orientations. It
    also has to cope with occlusion.
  • Two appearance-based techniques are presented:
  • The first is built on support vector machines.
  • The second is based on eigenvector
    decomposition.

12
Frontal face images at quartet and octet resolution
  • Panels: original image, quartet image, octet
    image

13
Face detection based on corners
  • The figures show the 3 possible feature point set
    configurations, each with 100 feature points.
    They differ in the minimum distance allowed
    between the feature points. In general, small
    inter-point distances concentrate the feature
    points and yield poor face detection. The
    minimum allowed distance is a parameter of the
    training procedure.
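The role of the minimum allowed distance can be sketched as a greedy selection over ranked candidate points. This is an illustrative sketch only: it assumes each candidate is scored by the smallest eigenvalue of a symmetric 2x2 gradient matrix (the ranking criterion named on the statistical training slide), and the function names and sample candidates are hypothetical.

```python
from math import hypot, sqrt


def smallest_eigenvalue(zxx, zxy, zyy):
    """Smallest eigenvalue of the symmetric matrix [[zxx, zxy], [zxy, zyy]]."""
    return 0.5 * (zxx + zyy) - sqrt((0.5 * (zxx - zyy)) ** 2 + zxy ** 2)


def select_feature_points(candidates, max_points, min_distance):
    """Greedily keep the best-scoring points that respect min_distance.

    candidates: list of (x, y, score) tuples.
    Returns at most max_points tuples, in decreasing score order.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    selected = []
    for x, y, score in ranked:
        far_enough = all(
            hypot(x - sx, y - sy) >= min_distance for sx, sy, _ in selected
        )
        if far_enough:
            selected.append((x, y, score))
            if len(selected) == max_points:
                break
    return selected


# Hypothetical candidates: two tight clusters and one isolated point.
candidates = [
    (0, 0, 5.0), (1, 0, 4.0), (10, 0, 3.0), (10, 1, 2.0), (20, 20, 1.0),
]
spread = select_feature_points(candidates, max_points=3, min_distance=5)
clumped = select_feature_points(candidates, max_points=3, min_distance=0.5)
print(spread)
print(clumped)
```

With the larger minimum distance the three selected points cover all three regions, while the small distance lets two of them fall in the same cluster, which mirrors the concentration effect described on the slide.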

14
Statistical training
  • The training procedure generates feature point
    sets on a number of training images.
  • The feature point sets are generated in
    user-defined bounding boxes containing facial
    regions.
  • Each facial region is represented by the vector
    of image intensities at the feature points,
    sorted in decreasing order of the smallest
    eigenvalue of the associated Z matrix.
  • PCA is performed on the covariance matrix of
    these facial region patterns. The first M
    principal components are used.
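The PCA step can be illustrated with a small standard-library sketch. It assumes each facial region pattern is a plain list of intensities of equal length, and it uses power iteration with deflation to extract the leading components. That numerical method is chosen here only to keep the example self-contained and is not stated in the slides; the function names are hypothetical.

```python
from math import sqrt


def covariance_matrix(patterns):
    """Return (mean vector, covariance matrix) of equal-length patterns."""
    n, d = len(patterns), len(patterns[0])
    mean = [sum(p[k] for p in patterns) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in patterns]
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def principal_components(cov, m, iterations=200):
    """Return the m leading (eigenvalue, eigenvector) pairs of cov."""
    d = len(cov)
    work = [row[:] for row in cov]
    components = []
    for _ in range(m):
        # Deterministic, non-uniform start vector (an arbitrary choice).
        v = [1.0 + 0.1 * k for k in range(d)]
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(
            v[i] * sum(work[i][j] * v[j] for j in range(d)) for i in range(d)
        )
        components.append((eigenvalue, v))
        # Deflation: remove the component just found.
        work = [
            [work[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
            for i in range(d)
        ]
    return components


# Hypothetical two-dimensional patterns lying on a line.
patterns = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, cov = covariance_matrix(patterns)
eigenvalue, direction = principal_components(cov, m=1)[0]
print(mean, eigenvalue, direction)
```

Real facial patterns would have one dimension per feature point (100 in the configurations shown earlier), and a numerical library would normally replace the hand-written eigen-solver.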

15
ROC curves
  • For the SVM-based face detection, the best
    results were obtained with the sigmoidal kernel.
    Best equal error rate: 4.5%.
  • The maximum likelihood detection commits a few
    false alarms. For FAR in [5.2%, 5.67%], the FRR
    drops quickly from 6.1% to 0.7%.
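For readers unfamiliar with the quantities on this slide, the sketch below shows how a false acceptance rate (FAR), a false rejection rate (FRR), and an approximate equal error rate (EER) can be read off detector scores. The scores are made up for illustration and do not reproduce the reported results; the function names are hypothetical, and a higher score is assumed to mean "more face-like".

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: the operating point where FAR and FRR are closest.

    Returns (eer, threshold), scanning every observed score as a threshold.
    """
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Hypothetical scores for face windows and non-face windows.
face_scores = [0.9, 0.8, 0.7, 0.4]
non_face_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(face_scores, non_face_scores)
print(eer, threshold)
```

Plotting FRR against FAR over all thresholds gives the ROC curve; the EER is the point where that curve crosses the diagonal.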

16
Face detection under difficult circumstances
  • The eigenvector decomposition approach has
    proven robust to partial occlusion and
    illumination changes, as well as to scenes in
    which multiple faces appear.

17
Facial feature detection
  • Standard PCA is applied to data encoding the
    geometry of the characteristic areas.
  • These data are the distance of each pixel to its
    closest edge and the slope of the respective
    connecting line.
  • The system is trained with models (left and
    right eyes, mouth).
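The distance and slope data described above can be sketched with a brute-force computation on a small binary edge map. This is an illustration under assumptions: the edge map is a list of rows of 0/1 values, the "slope" is represented as the angle (in radians) of the line from each pixel to its nearest edge pixel, and the function name is hypothetical. The slides do not specify how the slope is encoded.

```python
from math import atan2, hypot, pi, sqrt


def distance_and_slope_maps(edges):
    """Per-pixel distance to the nearest edge pixel and connecting angle.

    edges: 2D list of 0/1 values with at least one edge pixel.
    Returns (distance_map, slope_map) as 2D lists of floats.
    """
    edge_points = [
        (r, c)
        for r, row in enumerate(edges)
        for c, value in enumerate(row)
        if value
    ]
    if not edge_points:
        raise ValueError("edge map contains no edge pixels")
    distance_map, slope_map = [], []
    for r, row in enumerate(edges):
        distance_row, slope_row = [], []
        for c in range(len(row)):
            er, ec = min(edge_points, key=lambda p: hypot(p[0] - r, p[1] - c))
            distance_row.append(hypot(er - r, ec - c))
            # Angle of the vector from the pixel to its nearest edge pixel.
            slope_row.append(atan2(er - r, ec - c))
        distance_map.append(distance_row)
        slope_map.append(slope_row)
    return distance_map, slope_map


# Hypothetical 3x3 edge map with a single edge pixel in the centre.
edges = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
distance_map, slope_map = distance_and_slope_maps(edges)
print(distance_map)
print(slope_map)
```

The brute-force search is quadratic in the number of pixels; a distance transform would be the usual choice for full-size images.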

18
Facial feature detection: feature models
19
Facial feature detection (details)
  • Search for the areas of a face that most closely
    resemble an artificial model of each
    characteristic (left or right eye, mouth).
  • The similarity is found by comparing the
    projection vectors of the model and the searched
    area.
  • The projection is made using the eigenvectors
    found during training.
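The search described above can be sketched as follows: project the model and each candidate area onto the trained eigenvectors, then keep the candidate whose projection vector is closest to the model's. The Euclidean distance used here is an assumption, since the slides do not name the comparison measure, and all names and sample values are hypothetical.

```python
from math import sqrt


def project(pattern, mean, eigenvectors):
    """Project a mean-centred pattern onto each eigenvector."""
    centered = [p - m for p, m in zip(pattern, mean)]
    return [sum(e * c for e, c in zip(vec, centered)) for vec in eigenvectors]


def best_match(model, candidates, mean, eigenvectors):
    """Index and distance of the candidate closest to the model.

    Closeness is the Euclidean distance between projection vectors
    (an assumed similarity measure).
    """
    model_projection = project(model, mean, eigenvectors)
    best_index, best_distance = None, float("inf")
    for index, candidate in enumerate(candidates):
        projection = project(candidate, mean, eigenvectors)
        distance = sqrt(
            sum((a - b) ** 2 for a, b in zip(model_projection, projection))
        )
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


# Hypothetical 3-dimensional patterns and two retained eigenvectors.
eigenvectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mean = [0.0, 0.0, 0.0]
eye_model = [1.0, 2.0, 3.0]
search_areas = [[5.0, 5.0, 3.0], [1.0, 2.0, 9.0], [0.0, 0.0, 0.0]]
index, distance = best_match(eye_model, search_areas, mean, eigenvectors)
print(index, distance)
```

Note that the second candidate matches exactly even though its third coordinate differs from the model, because that coordinate lies outside the retained subspace.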

20
Edge intensity
  • For the eye area localisation, only the most
    prominent edges are preserved.
  • For the eye center detection, many edges are
    needed (low threshold for edge detector)

21
Facial feature detection: face image, edge
detection, distance map, slope map
22
Some facial feature results
23
Speaker Detection (1)
  • One-speaker detection: to determine whether a
    specified speaker is speaking during one given
    side of the conversation
  • Two-speaker detection: to determine whether a
    specified speaker is speaking during the entire
    conversation
  • Segmentation
  • Select appropriate speakers for training/testing
  • Perform one-speaker detection

24
Speaker Detection (2)
Two-speaker detection (NIST 2002): best EER 16.2%

One-speaker detection (NIST 2002): best EER 7.1%

Kajarekar, Adami, Hermansky, 2003
25
Frontal face authentication
26
Other closely related problems
  • Face fingerprinting

27
Format of Signature
28
Computation of Video Similarity
  • Similarity of two videos, for a certain
    displacement d, is computed as
  • where F_i(n,m) is the certainty that person m
    appears in frame n of video segment i.

29
Search Retrieval Algorithm