Title: INVOCA Project Speech Interfaces for Air Traffic Control Tasks
1INVOCA Project: Speech Interfaces for Air Traffic Control Tasks
Javier Macías-Guarasa
Speech Technology Group (GTH), Department of Electronic Engineering
E.T.S.I. Telecomunicación (ETSIT), Universidad Politécnica de Madrid (UPM)
2Overview
- Introduction
- Tasks (applications, prototypes)
- Data collection
- System architecture and technical details
- Evaluation
- Demo
- Conclusions
3Introduction (I)
- INVOCA
- Speech Interfaces for Air Traffic Control (INterfaces VOcales para Control de tráfico Aéreo)
- Project proposal
- AENA: Spanish Airports and Air Navigation
- Speech Technology Group, ETSIT-UPM
- Exploratory project -> technology evaluation
- Analyze the state of the art of speech recognition technology and its applications to air traffic control tasks
- Feasibility study: to be integrated in SACTA? (SACTA: Advanced System for Air Traffic Control)
4Introduction (II)
5Introduction (III)
- People at GTH
- José M. Pardo
- Javier Ferreiros
- José Colás
- Fernando Fernández
- Valentín Sama
- Ricardo de Córdoba
- Juan M. Montero
- Javier Macías
- José D. Romeral
- More people at GTH
- Sergio Díaz
- María J. Pozuelo
- Gregoire Prime
- Jordi Safont
- Eduardo Campos
- et al.!
- AENA Staff
- Germán González
- Myriam Santamaría
6Tasks (I)
- Identifying suitable target applications (tasks) within the SACTA environment
- Air traffic controllers (ATCs) in control towers in Barajas (Madrid airport)
- Feasible tasks
- Useful tasks
- Outcome
- Isolated word recognition -> IF1
- Spontaneous speech recognition and understanding -> IF2
7Tasks (II): Speech Interface IF1 (I)
- Target
- Air Traffic Controllers (ATCs) in control towers
- Must keep an eye on traffic around the airport
- Feasibility of command-and-control (C&C) speech interfaces to help them in handling complex control systems?
- Application
- Hard to identify in current SACTA status
- Instead, replace the FOCUCS system (tactile display) used to control main display visualization
8Tasks (III): Speech Interface IF1 (II)
9Tasks (IV): Speech Interface IF1 (III)
10Tasks (V): Speech Interface IF2 (I)
- Target
- Air Traffic Controllers (ATCs) in control towers
- ATCs provide aircraft pilots with instructions regarding flight level, transponder code, etc.
- Some data must/should be entered in the computer system
- Application
- Detect key concepts (slots) and associated data values in controller-pilot radio communication
11Tasks (VI): Speech Interface IF2 (II)
- IF2 subtasks: five, one for every control position in Barajas Airport
- Arrivals
- Authorizations
- North tower
- South tower
- Take offs
- IF1 and IF2 both handle Spanish and English (spoken by Spaniards!)
12Data Collection (I): Standard databases
- Spanish SpeechDat (M FDB)
- Telephone speech, but ATC radio channels are band limited!
- 4000 speakers, isolated and continuous read speech
- Digits, isolated words, digit strings, phonetically rich sentences, etc. (40 items per speaker)
- But not related to the task
- Need more data!
- For adaptation
- For full retraining?
13Data Collection (II): Speech Interface IF1
- Read isolated words in the task domain
- 16 kHz, 16 bits linear (downsampled to 8 kHz)
- 30 speakers (15 male, 15 female)
- 5 repetitions of every command in the FOCUCS task vocabulary: 228 SP / 176 EN
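The 16 kHz to 8 kHz step can be illustrated with a small sketch. This is not the project's resampler: it assumes a plain Hamming-windowed sinc low-pass filter followed by dropping every other sample, and the names `lowpass_taps` and `downsample_by_2`, the tap count and the cutoff are all hypothetical choices made for the example.

```python
import math

def lowpass_taps(num_taps=31, cutoff=0.22):
    """Hamming-windowed sinc low-pass filter.

    cutoff is in cycles per input sample (0.22 is about 3.5 kHz at 16 kHz);
    both defaults are assumptions made for this sketch.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC

def downsample_by_2(samples, taps=None):
    """Low-pass filter, then keep every second sample (16 kHz -> 8 kHz)."""
    taps = taps or lowpass_taps()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k  # filter centred on sample i
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out
```

Filtering before decimation matters because anything above 4 kHz in the 16 kHz recording would otherwise alias into the 8 kHz signal.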
14Data Collection (III): Speech Interface IF2 (I)
- Real recordings, controller-pilot
- 16 kHz, 16 bits linear, downsampled to 8 kHz
- Stereo recording: speech and PTT signal
- 33 h total, 6 s/sentence, 16 words/sentence
15Data Collection (IV): Speech Interface IF2 (II)
- Process
- Recording chunks of 15 minutes, continuously
- Segmenting in sentences: PTT -> easy
- Average real speech contents: 16.4%
- Transcribing
- Hard, especially in English
- Label pauses, respiration and aspiration, tongue, unidentified noise, click, cough
- Also concept labeling
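Segmenting with the PTT (push-to-talk) channel can be sketched as follows. The slides do not say how the PTT signal is encoded, so this example assumes it is a sampled channel that sits near zero when idle and above a threshold while the controller is transmitting; `segment_by_ptt`, `speech_ratio`, the threshold and the minimum duration are hypothetical.

```python
def segment_by_ptt(ptt, sample_rate=8000, threshold=0.5, min_dur=0.2):
    """Return (start, end) sample indices of runs where the PTT channel is active.

    Runs shorter than min_dur seconds are dropped as spurious key clicks.
    """
    min_len = min_dur * sample_rate
    segments = []
    start = None
    for i, value in enumerate(ptt):
        active = abs(value) > threshold
        if active and start is None:
            start = i                      # PTT pressed
        elif not active and start is not None:
            if i - start >= min_len:       # PTT released
                segments.append((start, i))
            start = None
    if start is not None and len(ptt) - start >= min_len:
        segments.append((start, len(ptt)))  # still pressed at end of chunk
    return segments

def speech_ratio(segments, total_samples):
    """Fraction of the recording covered by PTT-active segments."""
    return sum(end - begin for begin, end in segments) / total_samples
```

A ratio computed this way over a 15-minute chunk is the kind of figure the "average real speech contents" bullet reports.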
16Data Collection (V): Speech Interface IF2 (III)
- Samples of Authorizations sentences
- thai niner four three start up approved qnh one zero one eight !P clear eh !LP fiumicino via flight plan route !P eh nando !P one charlie departure squawk !P on one four two six
- alitalia zero six nine roger start up approved and according slot one zero one eight and clear to !P milan malpensa airport via pinar one !P bravo departure squawk one four one six
- !RUIDO ok we havent got it yet but the supervisor eh lets me give you start up clearance !ASP and we will give you the atc clearance we when we receive it so start up approved eh report your position again please
17Data Collection (VI): Speech Interface IF2 (IV)
- Concept labeling sample
- olympic two four eight on stand eighty start up approved with qnh one zero one niner clear to destination athens via flight plan route nando two golf standard departure initial flight level one three zero on the squawk one four seven three
- UNDERSTANDING RESULT
- identifier: olympic248
- startup_status: START UP APPROVED
- destination: athens
- exit_using: nando2G
- transponder: 1473
- initial_flight_level: 130
- qnh: 1019
- parking: stand80
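To make the slot/value idea concrete, here is a toy keyword spotter that pulls digit runs out of the sample sentence above. It is only an illustration and not the rule-based understanding module of the project: the `DIGITS` table, `digits_after` and the choice of trigger keywords are assumptions, and tens such as "eighty" or letters such as "golf" are deliberately not handled.

```python
# Spoken digit words as they appear in the transcriptions ("niner" = 9).
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}

def digits_after(words, keyword):
    """Concatenate the run of digit words that directly follows keyword."""
    if keyword not in words:
        return None
    i = words.index(keyword) + 1
    run = []
    while i < len(words) and words[i] in DIGITS:
        run.append(DIGITS[words[i]])
        i += 1
    return "".join(run) or None

sentence = (
    "olympic two four eight on stand eighty start up approved with qnh "
    "one zero one niner clear to destination athens via flight plan route "
    "nando two golf standard departure initial flight level one three zero "
    "on the squawk one four seven three"
).split()

# Hypothetical keyword-to-slot mapping, chosen to match the slide's slot names.
frame = {
    "identifier": "olympic" + digits_after(sentence, "olympic"),
    "qnh": digits_after(sentence, "qnh"),
    "initial_flight_level": digits_after(sentence, "level"),
    "transponder": digits_after(sentence, "squawk"),
}
```

The resulting `frame` dictionary holds four of the slots listed in the understanding result; the remaining ones need richer rules than a digit run after a keyword.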
18Data Collection (VII): Speech Interface IF2 (V)
- Samples of Arrivals sentences
- airfrance one five zero zero yes swissair six five zero vacating
- klm seven zero one good morning continue approach runway three three as number two wind calm precedent traffic seven six seven four miles ahead
19Data Collection (VIII): Speech Interface IF2 (VI)
- Samples of Take offs sentences
- nostrum eight six one five wind two eight zero one zero cleared take off runway three six left
- speedbird four six five you are number four behind iberia airbus three twenty on sierra
20Data Collection (IX): Speech Interface IF2 (VII)
- Samples of North tower sentences
- airnostrum eight seven two five continue via alfa behind iberia seven five seven via kilo tango forty i call you back hold short mike taxi way
- airnostrum eight ou triple seven roger taxi via kilo mike holding three six left and please give way traffic mike delta spanair coming out via mike ou ten is now crossing juliett gate
21Data Collection (X): Speech Interface IF2 (VIII)
- Samples of South tower sentences
- alitalia zero six niner are you able to enter mike between the airfrance traffic and the aireuropa your right side atp
- sabena now proceed via in a taxi way to the left and wait for the follow me car
22System Architecture (I): Speech Interface IF1
- Block diagram (labels only): feature extraction -> One Pass -> recognized command -> command to UDP -> change in main display
- Feature extraction: 12 LPC-cepstrum + log energy + 13 Δ + 13 ΔΔ
- One Pass (Spanish): Spanish HMMs, Spanish dict.
- One Pass (English): English HMMs, English dict.
23System Architecture (II): Speech Interface IF2
- Block diagram (labels only): feature extraction -> One Pass + rescoring (Spanish and English) -> language ID -> recognized sentence in a certain language -> understanding module -> conceptual frame data
- Feature extraction: 12 LPC-cepstrum + log energy + 13 Δ + 13 ΔΔ
- One Pass + rescoring (Spanish): Spanish HMMs, Spanish N-gram, Spanish dict.
- One Pass + rescoring (English): English HMMs, English N-gram, English dict.
- Understanding module (task dependent, drawn once per language): tagger (tagged dict.), preproc. (CD rules), tag refiner (CD rules)
24System Architecture (III)
- Preprocessing and modeling
- 12 LPC cepstrum + logE + 13 Δ + 13 ΔΔ
- CMN and CVN (utterance level)
- CD continuous HMMs trained with HTK
- Spanish: 1509 states, 8 mixtures per state
- English: 1400 states, 8 mixtures per state
- Multiple pronunciations in the dictionary!
- Training database: Spanish SpeechDat
- Further adaptation: task and speaker
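The front end above (13 static coefficients extended with first and second derivatives, plus utterance-level mean and variance normalization) can be sketched in a few lines. This is a generic illustration, not the project's code: the regression window of 2 frames, the edge handling and the function names `cmvn`, `deltas` and `add_dynamic_features` are assumptions, and the slides do not say whether normalization is applied before or after the derivatives.

```python
import math

def cmvn(frames):
    """Utterance-level cepstral mean and variance normalization (CMN + CVN)."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]

def deltas(frames, window=2):
    """Regression-based time derivative with edge frames replicated."""
    n, dim = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            num = sum(
                k * (frames[min(t + k, n - 1)][d] - frames[max(t - k, 0)][d])
                for k in range(1, window + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out

def add_dynamic_features(static):
    """13 static values per frame -> 39 values (static + delta + delta-delta)."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

With 12 cepstral coefficients plus log energy as the static vector, `add_dynamic_features` yields the 39-dimensional frames implied by the slide.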
25System Architecture (IV)
- Search: first pass
- One pass beam search (for states and for last states)
- Search space reduced to 18% w/o performance penalty
- Bigram LM guided
- Scores on demand
- Non-speech models handling (regarding LM scoring)
- Able to generate n-best output sentences
- Search: second pass
- Rescores first pass output (graph) with trigram
- Task-dependent tuned LM and IWP weights
26System Architecture (V)
- Language ID
- Spanish speakers
- Great variability in canonical pronunciation
- Some words pronounced in Spanish (e.g. bravo)
- ATCs mix languages (to greet or say goodbye)
- Initial effort using well known techniques (PPRLM, etc.)
- Final system using LM score comparison!
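One way to read "LM score comparison" is: run both recognizers, then keep the language whose hypothesis gets the better language-model score. The sketch below assumes exactly that, with a per-word normalization so that longer hypotheses are not penalized; `pick_language`, the input layout and the normalization are assumptions, not details given in the slides.

```python
def pick_language(results):
    """Choose a language from {lang: (words, lm_logprob)}.

    The winner is the language with the highest LM log-probability per word.
    """
    def per_word_score(item):
        words, lm_logprob = item[1]
        return lm_logprob / max(len(words), 1)
    return max(results.items(), key=per_word_score)[0]

# Hypothetical outputs of the Spanish and English recognizers for one utterance.
decoded = {
    "es": (["iberia", "tres", "cuatro"], -30.0),
    "en": (["iberia", "three", "four", "zero"], -20.0),
}
language = pick_language(decoded)
```

Here the English hypothesis scores -5 per word against -10 for the Spanish one, so `language` is "en".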
27System Architecture (VI)
- Understanding module
- Tagger: several categories per word
- Number preprocessing
- Tags refiner
- Understanding module
- Understanding module architecture used in other tasks in our Group
- Task dependent: time consuming
28Evaluation (I)
- Multiple environments
- Off line, using recorded database (offline)
- With people at GTH, predefined script (online)
- With users (advanced ATC trainees)
- Predefined script (online)
- Predefined scenarios (free online)
- Subjective evaluation
- English and Spanish
- Measuring
- Word accuracy rates (IF1 and IF2)
- Concept accuracy rates (IF2)
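The two measures can be sketched as follows. Word accuracy is computed in the usual way from the edit distance between reference and hypothesis. The slides do not define concept accuracy, so the version here is an assumption: it counts substituted, deleted and inserted slots against the reference frame. All function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]

def word_accuracy(ref, hyp):
    """(N - S - D - I) / N over the reference words."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)

def concept_accuracy(ref_frame, hyp_frame):
    """Same formula over slots: wrong value, missing slot, spurious slot."""
    subs = sum(1 for k in ref_frame if k in hyp_frame and hyp_frame[k] != ref_frame[k])
    dels = sum(1 for k in ref_frame if k not in hyp_frame)
    ins = sum(1 for k in hyp_frame if k not in ref_frame)
    return 1.0 - (subs + dels + ins) / len(ref_frame)
```

Because insertions count as errors, both measures can drop below zero for very poor hypotheses.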
29Evaluation (II): Speech Interface IF1 (I)
30Evaluation (III): Speech Interface IF1 (II)
- On line, predefined script (11 speakers)
- Spanish: 50 commands (98 words/speaker)
- English: 30 commands (60 words/speaker)
31Evaluation (IV): Speech Interface IF1 (III)
- On line, predefined script (11 speakers)
- Detailed error analysis
32Evaluation (V): Speech Interface IF1 (IV)
- On line, real-task test (11 speakers, ATCs)
- Form with different questions (subjective)
- The system understands what I say: AVG 4.0 (answer scale on the chart: 1, 2, 3, 4, 5, ?)
33Evaluation (VI): Speech Interface IF1 (V)
- On line, real-task test (11 speakers, ATCs)
- Form with different questions (subjective)
- I would use this system instead of the current one: AVG 3.4 (answer scale on the chart: 1, 2, 3, 4, 5, ?)
34Evaluation (VII): Speech Interface IF2 (I)
- Training, adaptation and recognition issues
- Spanish authorizations task (preliminary experiments)
- Full retraining is used (using only the Authorizations DB)
- Rescoring improves only 4% relative (20% in read speech) -> not used in final prototype
35Evaluation (VIII): Speech Interface IF2 (II)
- Database LM statistics
- Spanish
- English
36Evaluation (IX): Speech Interface IF2 (III)
- Off/on line, word/concept recognition rates
- Spanish
- English
- GTH online: 16 speakers, 10 sentences/speaker in Spanish, 6 sentences/speaker in English. Read speech!
- ATC online: 7 speakers, 10 sentences/speaker in Spanish, 6 sentences/speaker in English. Read speech!
37Evaluation (X): Speech Interface IF2 (IV)
- Off/free on line, word/concept recognition rates
- Spanish
- English
- ATC free online: 7 speakers, scenario based, 10 sentences/speaker in Spanish, 6 sentences/speaker in English, 5 additional OOVs
38Evaluation (XI): Speech Interface IF2 (V)
- Real world: real-time (RT) system working in the tower
- Word/concept recognition rates
- Language ID rates
- Real world: 205 sentences, 3433 reference words, 588 slots, 10 additional OOVs
39Evaluation (XII): Speech Interface IF2 (VI)
- Cross task comparison (off line)
- Spanish (average rate for all other tasks)
- English (average rate for all other tasks)
40Demo
- Start praying
- Wrong microphone channel
- Wrong speaker!
- IF1
- Using defined dictionary
- Only Spanish, sorry
- IF2
- Random sentences, Spanish and English
- Will (try to) point out mistakes
41Conclusions (I)
- Great fun!
- Plenty of space for improvement
- Task-dependent restrictions (existing frequencies, flight ids, airport layout data, etc.)
- Concept refining (current set is very broad)
- Rules development
- Speaker/gender adaptation
- More data!
42Conclusions (II)
- ASR Technology not ready for prime time!
- Difficult task
- We are talking about planes and people!
- Political issues
- Other applications in this field
- Non critical tasks
- Pseudo-pilots for ATC training?
- Phraseology trainers
- Indexing
43Questions?
44Evaluation: Speech Interface 1 IF2
- Database LM statistics
- Spanish
- English
45Evaluation: Speech Interface 1 IF2
- Off line, recognition rates
- Spanish
- English
46Evaluation: Speech Interface 1 IF2
- Off line, understanding rates
- Spanish
- English
47System Architecture: Language ID in IF2
- Preliminary experiments with PPRLM
- Need almost 5 seconds to achieve 96%
- Bad performance in real task
- Implemented system uses LM score comparison!
48System Architecture: Understanding example
- Lufthansa four three four seven clearance correct on stand eight one next call one two one decimal seven bye
- Tag sequence: -DATA_identifier- -single_digit- -single_digit- -single_digit- -single_digit- -ID_freq_change- -DATA_correct- -garbage- -ID_freq_change- -DATA_park- -single_digit- -single_digit- -garbage- -ID_standby- -ID_freq_change- -single_digit- -single_digit- -single_digit- -freq_decimal_point- -single_digit- -goodbye-
- Tag sequence: -DATA_identifier- -single_digit- -ID_freq_change- -DATA_correct- -garbage- -ID_freq_change- -DATA_park- -single_digit- -garbage- -ID_standby- -ID_freq_change- -single_digit- -freq_decimal_point- -single_digit- -goodbye-
49System Architecture: Understanding example
- Lufthansa four three four seven clearance correct on stand eight one next call one two one decimal seven bye
- Tag sequence: -SLOT_identifier- -ID_freq_change- -DATA_correct- -DATA_park- -single_digit- -ID_standby- -ID_freq_change- -SLOT_freq_change- -goodbye-
- Tag sequence: -DATA_identifier- -single_digit- -ID_freq_change- -DATA_correct- -garbage- -ID_freq_change- -DATA_park- -single_digit- -garbage- -ID_standby- -ID_freq_change- -single_digit- -freq_decimal_point- -single_digit- -goodbye-
- Tag sequence: -SLOT_identifier- -ID_freq_change- -DATA_correct- -SLOT_park_id- -ID_standby- -ID_freq_change- -SLOT_freq_change- -goodbye-
- UNDERSTANDING RESULTS
- identifier: lufthansa4347
- park_id: stand81
- freq_change: 121.7
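One step of the example can be reproduced mechanically. Comparing the two tag sequences on slide 48, the shorter one is what you get by collapsing every run of identical consecutive tags in the longer one. The sketch below shows only that step, under the assumption that this is how the slides relate; the later stages that build the SLOT_ tags are rule based and are not modeled here. `collapse_repeats` is a hypothetical name.

```python
from itertools import groupby

def collapse_repeats(tags):
    """Replace each run of identical consecutive tags with a single tag."""
    return [tag for tag, _run in groupby(tags)]

# First tag sequence of the Lufthansa example (dashes dropped for readability).
raw = [
    "DATA_identifier", "single_digit", "single_digit", "single_digit",
    "single_digit", "ID_freq_change", "DATA_correct", "garbage",
    "ID_freq_change", "DATA_park", "single_digit", "single_digit",
    "garbage", "ID_standby", "ID_freq_change", "single_digit",
    "single_digit", "single_digit", "freq_decimal_point", "single_digit",
    "goodbye",
]
collapsed = collapse_repeats(raw)
```

`collapsed` has 15 tags and matches the second tag sequence of slide 48 item by item.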