Title: Welcome to the Rich Transcription 2005 Spring Meeting Recognition Evaluation Workshop
1 Welcome to the Rich Transcription 2005 Spring Meeting Recognition Evaluation Workshop
- July 13, 2005
- Royal College of Physicians
- Edinburgh, UK
2 Today's Agenda
[Agenda table; closing entry: 1800 Meeting venue cleared]
- Updated July 5, 2005
3 Administrative Points
- Participants
  - Pick up the hard copy proceedings at the front desk
- Presenters
  - The agenda will be strictly followed
  - Time slots include Q&A time
  - Presenters should either
    - Load their presentations on the computer at the front, or
    - Test their laptops during the breaks prior to making their presentation
- We'd like to thank
  - The MLMI-05 organizing committee for hosting this workshop
  - Caroline Hastings for the workshop's administration
  - All the volunteers: evaluation participants, data providers, transcribers, annotators, paper authors, presenters, and other contributors
4 The Rich Transcription 2005 Spring Meeting Recognition Evaluation
http://www.nist.gov/speech/tests/rt/rt2005/spring/
- Jonathan Fiscus, Nicolas Radde, John Garofolo, Audrey Le, Jerome Ajot, Christophe Laprun
- July 13, 2005
- Rich Transcription 2005 Spring Meeting Recognition Workshop at MLMI 2005
5 Overview
- Rich Transcription Evaluation Series
- Research opportunities in the Meeting Domain
- RT-05S Evaluation
- Audio input conditions
- Corpora
- Evaluation tasks and results
- Conclusion/Future
6 The Rich Transcription Task
[Diagram: component recognition technologies applied to human-to-human speech feed Rich Transcription (Speech-To-Text plus Metadata), which produces readable transcripts for multiple applications: smart meeting rooms, translation, extraction, retrieval, and summarization.]
7 Rich Transcription Evaluation Series
- Goal
  - Develop recognition technologies that produce transcripts which are understandable by humans and useful for downstream processes
- Domains
  - Broadcast News (BN)
  - Conversational Telephone Speech (CTS)
  - Meeting Room speech
- Parameterized "Black Box" evaluations
  - Evaluations control input conditions to investigate weaknesses/strengths
  - Sub-test scoring provides finer-grained diagnostics
8 NIST STT Benchmark Test History (May 2005)
[Chart: word error rate (log scale, 1% to 100%) versus year (1988 to 2011) for the NIST STT benchmarks: read speech (1k, 5k, and 20k vocabularies; noisy; non-English), broadcast news (BNews English 1X/10X/unlimited, BNews Mandarin 10X, BNews Arabic 10X), conversational telephone speech (Switchboard II, Switchboard Cellular, CTS Fisher (UL), CTS Mandarin (UL), CTS Arabic (UL)), spontaneous speech, varied microphones, and meeting speech (SDM, MDM, IHM).]
9 Research Opportunities in the Meeting Domain
- Provide a fertile environment to advance the state-of-the-art in technologies for understanding human interaction
- Many potential applications
  - Meeting archives, interactive meeting rooms, remote collaborative systems
- Important Human Language Technology challenges not posed by other domains
  - Varied forums and vocabularies
  - Highly interactive and overlapping spontaneous speech
  - Far-field speech effects
    - Ambient noise
    - Reverberation
    - Participant movement
  - Varied room configurations
  - Many microphone conditions
  - Many camera views
  - Multimedia information integration
  - Person, face, and head detection/tracking
10 RT-05S Evaluation Tasks
- Focus on core speech technologies
  - Speech-to-Text Transcription
  - Diarization "Who Spoke When"
  - Diarization "Speech Activity Detection"
  - Diarization "Source Localization"
11 Five System Input Conditions
- Distant microphone conditions
  - Multiple Distant Microphones (MDM)
    - Three or more centrally located table mics
  - Multiple Source Localization Arrays (MSLA)
    - Inverted-T topology, 4-channel digital microphone array
  - Multiple Mark III digital microphone Arrays (MM3A)
    - Linear topology, 64-channel digital microphone array
- Contrastive microphone conditions
  - Single Distant Microphone (SDM)
    - Center-most MDM microphone
    - Gauges the performance benefit of using multiple table mics
  - Individual Head Microphones (IHM)
    - Performance on clean speech
    - Similar to Conversational Telephone Speech
    - One speaker per channel, conversational speech
12 Training/Development Corpora
- Corpora provided at no cost to participants
  - ICSI Meeting Corpus
  - ISL Meeting Corpus
  - NIST Meeting Pilot Corpus
  - Rich Transcription 2004 Spring (RT-04S) development and evaluation data
  - Topic Detection and Tracking Phase 4 (TDT4) corpus
  - Fisher English conversational telephone speech corpus
  - CHIL development test set
  - AMI development test set and training set
- Thanks to ELDA and LDC for making this possible
13 RT-05S Evaluation Test Corpora: Conference Room Test Set
- Goal-oriented small conference room meetings
  - Group meetings and decision-making exercises
  - Meetings involved 4-10 participants
- 120 minutes: ten excerpts, each twelve minutes in duration
  - Five sites donated two meetings each: the Augmented Multiparty Interaction (AMI) Program, Carnegie Mellon University (CMU), the International Computer Science Institute (ICSI), NIST, and Virginia Tech (VT)
  - No VT data was available for system development
  - Similar test set construction was used for the RT-04S evaluation
- Microphones
  - Participants wore head microphones
  - Microphones were placed on the table among participants
  - AMI meetings included an 8-channel circular microphone array on the table
  - NIST meetings included 3 Mark III digital microphone arrays
14 RT-05S Evaluation Test Corpora: Lecture Room Test Set
- Technical lectures in small meeting rooms
  - Educational events where a single lecturer briefs an audience on a particular topic
  - Meeting excerpts involve one lecturer and up to five participating audience members
- 150 minutes: 29 excerpts from 16 lectures
  - Two types of excerpts, selected by CMU
    - Lecturer excerpts: 89 minutes, 17 excerpts
    - Question & Answer (Q&A) excerpts: 61 minutes, 12 excerpts
  - All data collected at Karlsruhe University
- Sensors
  - The lecturer and at most two other participants wore head microphones
  - Microphones were placed on the table among participants
  - A source localization array was mounted on each of the room's four walls
  - A Mark III array was mounted on the wall opposite the lecturer
15 RT-05S Evaluation Participants

| Site ID  | Site Name                                                                                                              | STT | SPKR | SAD | SLOC |
|----------|------------------------------------------------------------------------------------------------------------------------|-----|------|-----|------|
| AMI      | Augmented Multiparty Interaction Program                                                                               |  X  |      |     |      |
| ICSI/SRI | International Computer Science Institute and SRI International                                                          |  X  |  X   |     |      |
| ITC-irst | Center for Scientific and Technological Research                                                                        |     |      |     |  X   |
| KU       | Karlsruhe University                                                                                                    |     |      |     |  X   |
| ELISA    | ELISA Consortium: Laboratoire Informatique d'Avignon (LIA), Communication Langagière et Interaction Personne-Système (CLIPS), and LIUM |     |  X   |  X  |      |
| MQU      | Macquarie University                                                                                                    |     |  X   |     |      |
| Purdue   | Purdue University                                                                                                       |     |      |  X  |      |
| TNO      | The Netherlands Organisation for Applied Scientific Research                                                            |     |  X   |  X  |      |
| TUT      | Tampere University of Technology                                                                                        |     |      |     |  X   |
16 Diarization "Who Spoke When" (SPKR) Task
- Task definition
  - Identify the number of participants in each meeting and create a list of speech time intervals for each such participant
- Several input conditions
  - Primary: MDM
  - Contrast: SDM, MSLA
- Four participating sites: ICSI/SRI, ELISA, MQU, TNO
17 SPKR System Evaluation Method
- Primary metric
  - Diarization Error Rate (DER): the ratio of incorrectly detected speaker time to total speaker time (see the formula after this slide)
- System output speaker segment sets are mapped to reference speaker segment sets so as to minimize the total error
- Errors consist of
  - Speaker assignment errors (i.e., speech detected but not assigned to the right speaker)
  - False alarm detections
  - Missed detections
- Systems were scored using the mdeval tool
  - Forgiveness collar of ±250 ms around reference segment boundaries
  - DER on non-overlapping speech is the primary metric
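
For reference, the DER decomposes in the standard way shown below; the notation is ours, not the slide's:

```latex
\mathrm{DER} = \frac{T_{\mathrm{miss}} + T_{\mathrm{fa}} + T_{\mathrm{spkr}}}{T_{\mathrm{ref}}}
```

Here T_miss is reference speech time the system failed to detect, T_fa is time the system hypothesized speech where none occurred, T_spkr is speech time assigned to the wrong speaker under the error-minimizing mapping, and T_ref is the total reference speaker time.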
18 RT-05S SPKR Results: Primary Systems, Non-Overlapping Speech
- The conference room SDM DER is less than the MDM DER
  - A sign test indicates the differences are not significant
- The primary ICSI/SRI lecture room system attributed the entire duration of each test excerpt to a single speaker
  - The ICSI/SRI contrastive system had a lower DER
19 Lecture Room Results: Broken Down by Excerpt Type
- Lecturer excerpt DERs are lower than Q&A excerpt DERs
20 Historical Best System SPKR Performance on Conference Data
- 20% relative reduction for MDM
- 43% relative reduction for SDM
21 Diarization "Speech Activity Detection" (SAD) Task
- Task definition
  - Create a list of speech time intervals where at least one person is talking
- Dry run evaluation for RT-05S
  - Proposed by CHIL
- Several input conditions
  - Primary: MDM
  - Contrast: SDM, MSLA, IHM
  - Systems designed for the IHM condition must detect speech and also reject cross-talk speech and breath noises; IHM systems are therefore not directly comparable to MDM or SDM systems
- Three participating sites: ELISA, Purdue, TNO
22 SAD System Evaluation Method
- Primary metric
  - Diarization Error Rate (DER)
  - Same formula and software as used for the SPKR task
  - Reduced to a two-class problem: speech vs. non-speech
    - No speaker assignment errors, just false alarms and missed detections (see the sketch after this slide)
  - Forgiveness collar of ±250 ms around reference segment boundaries
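
Since the SAD variant involves no speaker mapping, its DER reduces to missed plus false-alarm speech time divided by total reference speech time. Below is a minimal sketch of that computation on a fixed time grid; the function name, interval-list inputs, and grid step are illustrative choices of ours, and the ±250 ms no-score collar applied by mdeval is omitted for brevity.

```python
def sad_der(ref, hyp, step=0.01):
    """Two-class DER: (missed + false alarm speech time) / reference speech
    time, approximated by sampling both segmentations every `step` seconds.
    `ref` and `hyp` are lists of (start, end) speech intervals in seconds."""
    def is_speech(intervals, t):
        return any(s <= t < e for s, e in intervals)

    end = max(e for _, e in ref + hyp)
    miss = fa = ref_speech = 0.0
    t = 0.0
    while t < end:
        r, h = is_speech(ref, t), is_speech(hyp, t)
        if r:
            ref_speech += step
            if not h:
                miss += step          # reference speech the system missed
        elif h:
            fa += step                # hypothesized speech with no reference speech
        t += step
    return (miss + fa) / ref_speech

# One 10 s reference talkspurt vs. a hypothesis that starts 1 s late and
# runs 0.5 s long: DER = (1.0 + 0.5) / 10.0, approximately 0.15
print(sad_der([(0.0, 10.0)], [(1.0, 10.5)]))
```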
23 RT-05S SAD Results: Primary Systems
- DERs for the conference and lecture room MDM data are similar
- Purdue didn't compensate for breath noise and cross-talk
24 Speech-To-Text (STT) Task
- Task definition
  - Systems output a single stream of time-tagged word tokens
- Several input conditions
  - Primary: MDM
  - Contrast: SDM, MSLA, IHM
- Two participating sites: AMI and ICSI/SRI
25 STT System Evaluation Method
- Primary metric
  - Word Error Rate (WER): the ratio of inserted, deleted, and substituted words to the total number of words in the reference (see the sketch after this slide)
- System and reference words are normalized to a common form
- System words are mapped to reference words using a word-mediated dynamic programming string alignment program
- Systems were scored using the NIST Scoring Toolkit (SCTK) version 2.1
  - A Spring 2005 update to the SCTK alignment tool can now score most of the overlapping speech in the distant microphone test material
    - Can now handle up to 5 simultaneous speakers
    - 98% of the Conference Room test set can be scored
    - 100% of the Lecture Room test set can be scored
    - Greatly improved over the Spring 2004 prototype
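
As an illustration of the metric (a minimal sketch, not the SCTK implementation, which adds word-form normalization and multi-speaker overlap alignment), the following computes WER with the usual dynamic programming string alignment:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) divided by
    the number of reference words, via Levenshtein alignment."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = minimum edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# One deleted word out of six reference words: WER = 1/6, about 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```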
26 Simultaneous Speech for STT
- 98% of the Conference Room test set has < 5 overlapping speakers
- 100% of the Lecture Room test set has < 5 overlapping speakers
- The updated alignment tool ran in less than real time
27 RT-05S STT Results: Primary Systems (Incl. Overlaps)
[Chart: WER by microphone condition for the Conference Room and Lecture Room test sets]
- First evaluation for the AMI team
- IHM error rates for conference and lecture room data are comparable
- The ICSI/SRI lecture room MSLA WER is lower than the MDM/SDM WER
28 Historical STT Performance in the Meeting Domain
- Performance for ICSI/SRI has dramatically improved for all conditions
29 STT Error Rates: Effect of Simultaneous Speech
30 Diarization "Source Localization" (SLOC) Task
- Task definition
  - Systems track the three-dimensional position of the lecturer (using audio input only)
  - Constrained to the lecturer subset of the Lecture Room test set
  - Evaluation protocol and metrics defined in the CHIL "Speaker Localization and Tracking Evaluation Criteria" document
- Dry run pilot evaluation for RT-05S
  - Proposed by CHIL
  - CHIL provided the scoring software and annotated the evaluation data
- One evaluation condition
  - Multiple source localization arrays
  - Required calibration of source localization microphone positions and video cameras
- Three participating sites: ITC-irst, KU, TUT
31 SLOC System Evaluation Method
- Primary metric
  - Root Mean Squared Error (RMSE): a measure of the average Euclidean distance between the reference speaker position and the system-determined speaker position (see the sketch after this slide)
  - Measured in millimeters at 667 ms intervals
- IRST SLOC scoring software
- Maurizio Omologo will give further details this afternoon
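
A minimal sketch of that metric, assuming the reference and system positions have already been paired at the 667 ms scoring instants (the function name and data layout are ours, not the IRST scoring software's):

```python
import math

def sloc_rmse(ref_xyz, hyp_xyz):
    """RMSE over paired 3-D positions: the square root of the mean squared
    Euclidean distance between reference and system speaker positions.
    Both arguments are equal-length lists of (x, y, z) tuples in mm."""
    sq_dists = [sum((r - h) ** 2 for r, h in zip(rp, hp))  # squared distance per instant
                for rp, hp in zip(ref_xyz, hyp_xyz)]
    return math.sqrt(sum(sq_dists) / len(sq_dists))

# Two scoring instants, each 100 mm off along one axis -> RMSE = 100.0 mm
print(sloc_rmse([(0, 0, 1500), (500, 0, 1500)],
                [(100, 0, 1500), (500, 100, 1500)]))
```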
32 RT-05S SLOC Results: Primary Systems
- Issues
  - What accuracy and resolution are needed for successful beamforming?
  - What will performance be for multiple speakers?
33 Summary
- Nine sites participated in the RT-05S evaluation
  - Up from six in RT-04S
- Four evaluation tasks were supported across two meeting sub-domains
- Two experimental tasks, SAD and SLOC, were successfully completed
- Dramatically lower STT and SPKR error rates for RT-05S
34 Issues for RT-06 Meeting Eval
- Domain
  - Sub-domains
- Tasks
  - Require at least three sites per task
  - Agreed-upon primary condition for each task
- Data contributions
  - Source data and annotations
- Participation intent
  - Participation commitment
- Decision-making process
  - Only sites with intent to participate will have input to the task definition
35 Proposal for RT-06
- Encourage multi-modality systems
  - Publish video for all '05 meetings
  - Include video for the '06 eval
- Tasks
  - STT
  - SPKR
    - Score all speech for the SPKR task, not just the non-overlapping speech
  - Multi-stream STT: a marriage of STT and SPKR
  - SLOC: lecturer and audience
- Test sets
  - 2-hour conference meetings
  - 2-hour lecture meetings
- Data requirements
  - IHM, multiple table mics, video, ...
- Timetable
  - Evaluation: March/April
  - Workshop: May, East Coast US venue
- Participation
  - Data donation