Overview: 3D Person Tracking

Slides: 17
Provided by: mred8

Transcript and Presenter's Notes

Title: Overview: 3D Person Tracking


1
Overview: 3D Person Tracking
  • Keni Bernardin
  • Universität Karlsruhe
  • keni_at_ira.uka.de
  • 8.5.2007

2
Task Description
  • 40x5min Interactive Seminar Recordings from 5
    CHIL rooms. Corner cameras, top camera, min. 3-7
    microphone arrays, 1-2 MarkIII arrays.
  • Subtasks:
  • Visual: Track every participant for every time
    point in the recorded sequence.
  • Acoustic: Detect speech and track the current
    speaker on segments where one (and only one)
    person is speaking.
  • Multimodal: Detect speaker turns and
    audio-visually track the last known speaker.
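
As an illustration of how the acoustic and multimodal subtasks differ, here is a minimal sketch of target selection. It rests on assumptions that are not in the slides: the function names, the per-time-step sets of active speaker IDs, and the reading that "last known speaker" means the most recent single speaker, carried across silence and overlapping speech.

```python
from typing import List, Optional, Set


def acoustic_targets(active: List[Set[str]]) -> List[Optional[str]]:
    """Per time step: the speaker to track, or None when the step is not scored.

    Assumption: only steps with exactly one active speaker count.
    """
    return [next(iter(s)) if len(s) == 1 else None for s in active]


def multimodal_targets(active: List[Set[str]]) -> List[Optional[str]]:
    """Per time step: the last known speaker (hypothetical interpretation).

    The most recent single speaker is carried forward through silence
    and through overlapping speech.
    """
    targets: List[Optional[str]] = []
    last: Optional[str] = None
    for s in active:
        if len(s) == 1:
            last = next(iter(s))
        targets.append(last)
    return targets


# Hypothetical sequence: silence, A, A, silence, A+B overlap, B, silence.
active = [set(), {"A"}, {"A"}, set(), {"A", "B"}, {"B"}, set()]
print(acoustic_targets(active))    # [None, 'A', 'A', None, None, 'B', None]
print(multimodal_targets(active))  # [None, 'A', 'A', 'A', 'A', 'B', 'B']
```

Under this reading the acoustic subtask is scored only on single-speaker steps, while the multimodal subtask keeps a target through the gaps.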

3
Annotations
  • A, V and AV are all evaluated on video head position
    annotations (only the x,y position on the ground)
  • 2D Head centroid positions in the 4 corner camera
    views, for all participants
  • Triangulated 3D head centroid positions
  • Time segments containing silence or one speaker
    (ignore crosstalk, but not noise!).
  • Annotations are made once per second (every 15th,
    25th or 30th video frame)
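
The two annotation conventions above (one label per second, and errors measured only in x,y on the ground) can be sketched as follows. The function names, the choice of frame 0 as the first labelled frame, and the millimetre units in the example are assumptions made for illustration. They are not taken from the slides.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def annotated_frames(fps: int, n_frames: int) -> List[int]:
    """Frame indices that carry a label, one per second.

    Assumption: labelling starts at frame 0.
    """
    return list(range(0, n_frames, fps))


def ground_plane_error(hyp: Point3D, ref: Point3D) -> float:
    """Euclidean distance between hypothesis and reference in x,y only.

    The height (z) component is ignored.
    """
    return math.hypot(hyp[0] - ref[0], hyp[1] - ref[1])


print(annotated_frames(25, 100))  # [0, 25, 50, 75]
# Hypothetical positions in millimetres; the 500 mm height difference is ignored.
print(ground_plane_error((1000.0, 2000.0, 1700.0), (1300.0, 2400.0, 1200.0)))  # 500.0
```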

4
MOT Metrics: A reminder
  • Labels from the slide's figure: distance errors,
    correspondences, false positives, misses,
    mismatches, ground truths
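
The six labels above match the quantities in the commonly published MOT definitions. MOTP (precision) is the summed distance error over all matched pairs divided by the number of correspondences. MOTA (accuracy) is one minus the sum of misses, false positives and mismatches divided by the number of ground truths. The slide's own figure did not survive, so the sketch below assumes those standard definitions. The `FrameStats` container and `mot_scores` function are hypothetical names, and the step that matches hypotheses to ground truths is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FrameStats:
    """Per-frame counts, taken after hypotheses were matched to ground truths."""
    distances: List[float]  # one distance error per correspondence
    misses: int
    false_positives: int
    mismatches: int
    ground_truths: int


def mot_scores(frames: Iterable[FrameStats]) -> Tuple[float, float]:
    """Return (MOTP, MOTA) accumulated over all frames."""
    frames = list(frames)
    total_dist = sum(sum(f.distances) for f in frames)
    total_corr = sum(len(f.distances) for f in frames)
    total_err = sum(f.misses + f.false_positives + f.mismatches for f in frames)
    total_gt = sum(f.ground_truths for f in frames)
    # Each score is undefined (NaN) when its denominator is zero.
    motp = total_dist / total_corr if total_corr else float("nan")
    mota = 1.0 - total_err / total_gt if total_gt else float("nan")
    return motp, mota


# Hypothetical two-frame sequence with two persons in the room.
frames = [
    FrameStats(distances=[100.0, 200.0], misses=0, false_positives=0,
               mismatches=0, ground_truths=2),
    FrameStats(distances=[150.0], misses=1, false_positives=1,
               mismatches=0, ground_truths=2),
]
print(mot_scores(frames))  # (150.0, 0.5)
```

Both scores are ratios of totals accumulated over the whole sequence, not averages of per-frame ratios, so frames with more persons weigh more.
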
5
Participant Overview
  • 5 participating sites
  • AIT (Audio, Visual, AV)
  • FBK (Audio, Visual, AV)
  • TUT (Audio)
  • UKA (Audio, Visual, AV)
  • UPC (Audio, Visual, AV)

6
Results: Visual Tracking

7
Results: Visual Tracking

8
Results: Visual Tracking

9
Results: Acoustic Tracking

10
Results: Acoustic Tracking

11
Results: Acoustic Tracking

12
Results: Multimodal Tracking

13
Results: Multimodal Tracking

14
Results: Multimodal Tracking

15
Conclusions
  • Task definition
  • More challenging than last year (multiple
    persons, more sites, speech detection, noise).
    Good results!
  • Better-balanced multimodal subtask. Improvements?
  • Metrics
  • Same as last year. Perhaps we need a better way of
    calculating the contribution of mismatches.
  • Maybe we want a better measure for computation
    speed and complexity?
  • Technology
  • Visual, Acoustic: Still some work to be done
    (visual problem: reliable detection; audio
    problem: speech detection, noise sources).
  • Evaluation
  • Went smoothly. Some smaller problems with labels
    (person visibility in cams), calibration files,
    background images.
  • Need more varied development data (again). Need
    even better quality control.

16
Future Planning?
  • Task
  • Metrics
  • Annotations
  • Evaluations

17
Metrics
  • MOT Metrics
  • Meant to be used for single- and multi-person,
    audio / visual / AV tracking
  • To allow intuitive comparison of systems across
    modalities