INVOCA Project: Speech Interfaces for Air Traffic Control Tasks

1
INVOCA Project: Speech Interfaces for Air Traffic Control Tasks
Javier Macías-Guarasa
Speech Technology Group (GTH), Department of Electronic Engineering
E.T.S.I. Telecomunicación (ETSIT), Universidad Politécnica de Madrid (UPM)
2
Overview
  • Introduction
  • Tasks (applications, prototypes)
  • Data collection
  • System architecture and technical details
  • Evaluation
  • Demo
  • Conclusions

3
Introduction (I)
  • INVOCA
  • Speech Interfaces for Air Traffic Control
    (INterfaces VOcales para Control de tráfico Aéreo)
  • Project proposal
  • AENA: Spanish Airports and Air Navigation
  • Speech Technology Group, ETSIT-UPM
  • Exploratory project → technology evaluation
  • Analyze the state of the art of speech
    recognition technology and its applications to
    air traffic control tasks
  • Feasibility study: to be integrated in SACTA?
    (SACTA: Advanced System for Air Traffic Control)

4
Introduction (II)
  • SACTA

5
Introduction (III)
  • People @ GTH
  • José M. Pardo
  • Javier Ferreiros
  • José Colás
  • Fernando Fernández
  • Valentín Sama
  • Ricardo de Córdoba
  • Juan M. Montero
  • Javier Macías
  • José D. Romeral
  • More people @ GTH
  • Sergio Díaz
  • María J. Pozuelo
  • Gregoire Prime
  • Jordi Safont
  • Eduardo Campos
  • et al.!
  • AENA Staff
  • Germán González
  • Myriam Santamaría

6
Tasks (I)
  • Identifying suitable target applications (tasks)
    within the SACTA environment
  • Air traffic controllers (ATCs) in control towers
    in Barajas (Madrid airport)
  • Feasible tasks
  • Useful tasks
  • Outcome
  • Isolated word recognition → IF1
  • Spontaneous speech recognition and understanding
    → IF2

7
Tasks (II): Speech Interface IF1 (I)
  • Target
  • Air Traffic Controllers (ATCs) in control towers
  • Must keep an eye on traffic around the airport
  • Feasibility of command-and-control (C&C) speech interfaces to help them
    in handling complex control systems?
  • Application
  • Hard to identify in current SACTA status
  • Instead: replace the FOCUCS system (tactile display)
    used to control the main display visualization

8
Tasks (III): Speech Interface IF1 (II)
9
Tasks (IV): Speech Interface IF1 (III)
  • Prototype architecture

10
Tasks (V): Speech Interface IF2 (I)
  • Target
  • Air Traffic Controllers (ATCs) in control towers
  • ATCs provide aircraft pilots with instructions
    regarding flight level, transponder code, etc.
  • Some data must/should be entered in the computer
    system
  • Application
  • Detect key concepts (slots) and associated data
    values in controller-pilot radio
    communication

11
Tasks (VI): Speech Interface IF2 (II)
  • IF2 subtasks: five, one for every control
    position at Barajas Airport
  • Arrivals
  • Authorizations
  • North tower
  • South tower
  • Take offs
  • IF1 and IF2 both handle Spanish and English
    (spoken by Spaniards!)

12
Data Collection (I): Standard databases
  • Spanish SpeechDat (M FDB)
  • Telephone speech, but ATC radio channels are
    band-limited!
  • 4000 speakers, isolated and continuous read speech
  • Digits, isolated words, digit strings,
    phonetically rich sentences, etc. (40 items per
    speaker)
  • But not related to the task
  • Need more data!
  • For adaptation
  • For full retraining?

13
Data Collection (II): Speech Interface IF1
  • Read isolated words in the task domain
  • 16 kHz, 16-bit linear (downsampled to 8 kHz)
  • 30 speakers (15 male, 15 female)
  • 5 repetitions of every command in the FOCUCS
    task vocabulary: 228 SP / 176 EN

14
Data Collection (III): Speech Interface IF2 (I)
  • Real recordings, controller-pilot
  • 16 kHz, 16-bit linear, downsampled to 8 kHz
  • Stereo recording: speech + PTT signal

33 h total, 6 s/sentence, 16 words/sentence
15
Data Collection (IV): Speech Interface IF2 (II)
  • Process
  • Recording chunks of 15 minutes, continuously
  • Segmenting into sentences: PTT → easy
  • Average real speech content: 16.4%
  • Transcribing
  • Hard, especially in English
  • Label pauses, respiration and aspiration, tongue,
    unidentified noise, click, cough
  • Also concept labeling
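
The PTT-based segmentation step can be sketched as follows. This is a minimal illustration, not the project's tooling: it assumes the PTT channel is available as a sequence of levels, that a fixed threshold marks the key as pressed, and the names `ptt_segments` and `speech_ratio` are hypothetical.

```python
def ptt_segments(ptt, threshold=0.5, min_len=1):
    """Return (start, end) index pairs where the PTT level is above threshold.

    `ptt` is a sequence of PTT levels (per sample or per frame);
    `end` is exclusive. Segments shorter than `min_len` are dropped.
    """
    segments = []
    start = None
    for i, level in enumerate(ptt):
        active = level > threshold
        if active and start is None:
            start = i                      # key pressed: segment begins
        elif not active and start is not None:
            if i - start >= min_len:       # key released: segment ends
                segments.append((start, i))
            start = None
    if start is not None and len(ptt) - start >= min_len:
        segments.append((start, len(ptt)))  # recording ended with key pressed
    return segments


def speech_ratio(segments, total):
    """Fraction of the recording covered by PTT-active segments."""
    return sum(end - start for start, end in segments) / total if total else 0.0
```

Applied to a 15-minute chunk, the returned index pairs would give the sentence boundaries, and `speech_ratio` gives the share of the recording that actually contains speech.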

16
Data Collection (V): Speech Interface IF2 (III)
  • Samples of Authorizations sentences
  • thai niner four three start up approved qnh one
    zero one eight !P clear eh !LP fiumicino via
    flight plan route !P eh nando !P one charlie
    departure squawk !P on one four two six
  • alitalia zero six nine roger start up approved
    and according slot one zero one eight and clear
    to !P milan malpensa airport via pinar one !P
    bravo departure squawk one four one six
  • !RUIDO ok we havent got it yet but the supervisor
    eh lets me give you start up clearance !ASP and
    we will give you the atc clearance we when we
    receive it so start up approved eh report your
    position again please

17
Data Collection (VI): Speech Interface IF2 (IV)
  • Concept labeling sample
  • olympic two four eight on stand eighty start up
    approved with qnh one zero one niner clear to
    destination athens via flight plan route nando
    two golf standard departure initial flight level
    one three zero on the squawk one four seven three
  • UNDERSTANDING RESULT
  • identifier: olympic248
  • startup_status: START UP APPROVED
  • destination: athens
  • exit_using: nando2G
  • transponder: 1473
  • initial_flight_level: 130
  • qnh: 1019
  • parking: stand80
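
As a rough illustration of how three of the numeric slots above could be filled from the transcribed sentence, here is a keyword-plus-digits sketch. It is an assumption for illustration only: the helper `digits_after` and the keyword rules are hypothetical and are not the rule set used in the project.

```python
import re

# Spoken digits as they appear in the transcriptions ("niner" before "nine"
# so the longer form is tried first).
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8",
          "niner": "9", "nine": "9"}
DIGIT_RE = "(?:" + "|".join(DIGITS) + ")"


def digits_after(keyword, sentence, count):
    """Return `count` spoken digits following `keyword` as a digit string, or None."""
    pattern = r"\b%s((?:\s+%s){%d})\b" % (re.escape(keyword), DIGIT_RE, count)
    match = re.search(pattern, sentence)
    if not match:
        return None
    return "".join(DIGITS[word] for word in match.group(1).split())


sentence = ("olympic two four eight on stand eighty start up approved with "
            "qnh one zero one niner clear to destination athens via flight "
            "plan route nando two golf standard departure initial flight "
            "level one three zero on the squawk one four seven three")

frame = {
    "qnh": digits_after("qnh", sentence, 4),
    "initial_flight_level": digits_after("flight level", sentence, 3),
    "transponder": digits_after("squawk", sentence, 4),
}
```

Slots such as the identifier, destination, or parking stand need vocabulary knowledge (airline names, airports, numbers like "eighty") and are left out of this sketch.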

18
Data Collection (VII): Speech Interface IF2 (V)
  • Samples of Arrivals sentences
  • airfrance one five zero zero yes swissair six
    five zero vacating
  • klm seven zero one good morning continue approach
    runway three three as number two wind calm
    precedent traffic seven six seven four miles ahead

19
Data Collection (VIII): Speech Interface IF2 (VI)
  • Samples of Take offs sentences
  • nostrum eight six one five wind two eight zero
    one zero cleared take off runway three six left
  • speedbird four six five you are number four
    behind iberia airbus three twenty on sierra

20
Data Collection (IX): Speech Interface IF2 (VII)
  • Samples of North tower sentences
  • airnostrum eight seven two five continue via alfa
    behind iberia seven five seven via kilo tango
    forty i call you back hold short mike taxi way
  • airnostrum eight ou triple seven roger taxi via
    kilo mike holding three six left and please give
    way traffic mike delta spanair coming out via
    mike ou ten is now crossing juliett gate

21
Data Collection (X): Speech Interface IF2 (VIII)
  • Samples of South tower sentences
  • alitalia zero six niner are you able to enter
    mike between the airfrance traffic and the
    aireuropa your right side atp
  • sabena now proceed via in a taxi way to the left
    and wait for the follow me car

22
System Architecture (I): Speech Interface IF1
[Block diagram. Blocks shown: feature extraction (12 LPC-cepstrum + log energy, 13 Δ, 13 ΔΔ); One Pass decoder with Spanish HMMs and Spanish dict.; One Pass decoder with English HMMs and English dict.; recognized command; command to UDP; change in main display]
23
System Architecture (II): Speech Interface IF2
[Block diagram. Blocks shown: feature extraction (12 LPC-cepstrum + log energy, 13 Δ, 13 ΔΔ); Spanish and English branches, each with HMMs, N-gram, dict., and One Pass + rescoring; language ID; recognized sentence in a certain language; task-dependent understanding modules (Spanish and English) with tagged dict., tagger, preproc., tag refiner, and CD rules; conceptual frame data]
24
System Architecture (III)
  • Preprocessing and modeling
  • 12 LPC cepstrum + logE + 13 Δ + 13 ΔΔ
  • CMN and CVN (utterance level)
  • CD continuous HMMs trained with HTK
  • Spanish: 1509 states, 8 mixtures per state
  • English: 1400 states, 8 mixtures per state
  • Multiple pronunciations in the dictionary!
  • Training database: Spanish SpeechDat
  • Further adaptation: task and speaker
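
The feature post-processing listed above (utterance-level mean and variance normalization, plus Δ and ΔΔ coefficients stacked on 13 static values) can be sketched in plain Python. This is a minimal sketch under assumptions: the regression window of 2 frames, the edge handling, and the order of operations are illustrative choices, not details given in the slides.

```python
import math


def cmvn(frames):
    """Utterance-level cepstral mean and variance normalization (CMN + CVN)."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def deltas(frames, window=2):
    """Regression-based dynamic coefficients; edge frames are replicated."""
    n, dim = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            num = 0.0
            for k in range(1, window + 1):
                nxt = frames[min(t + k, n - 1)][d]
                prv = frames[max(t - k, 0)][d]
                num += k * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Stack static, delta and delta-delta vectors (13 -> 39 dimensions)."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

With 12 cepstral coefficients plus log energy per frame, `add_dynamic_features` yields the 39-dimensional vectors implied by the slide.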

25
System Architecture (IV)
  • Search: first pass
  • One-pass beam search (for states and for last
    states)
  • Search space reduced to 18% w/o performance
    penalty
  • Bigram LM guided
  • Scores on demand
  • Non-speech models handling (regarding LM scoring)
  • Able to generate n-best output sentences
  • Search: second pass
  • Rescores first pass output (graph) with trigram
  • Task-dependent tuning of LM and IWP weights
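
The second-pass idea can be illustrated on an n-best list rather than a graph, which is a simplification of what the slide describes. In this sketch the trigram table, the floor value for unseen trigrams, and the default weights are all made-up assumptions; `lm_weight` and `iwp` stand for the task-tuned LM and IWP weights.

```python
def trigram_logprob(words, lm, floor=-10.0):
    """Sum trigram log-probabilities over a padded sentence.

    `lm` maps (w1, w2, w3) tuples to log-probabilities; unseen trigrams
    get `floor` (a crude stand-in for proper back-off).
    """
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        total += lm.get(tuple(padded[i - 2:i + 1]), floor)
    return total


def rescore(nbest, lm, lm_weight=8.0, iwp=-1.0):
    """Re-rank (words, acoustic_logprob) hypotheses by the combined score."""
    scored = []
    for words, acoustic in nbest:
        total = acoustic + lm_weight * trigram_logprob(words, lm) + iwp * len(words)
        scored.append((total, words))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [words for _, words in scored]


toy_lm = {
    ("<s>", "<s>", "start"): -0.5,
    ("<s>", "start", "up"): -0.2,
    ("start", "up", "approved"): -0.3,
    ("up", "approved", "</s>"): -0.4,
}
nbest = [(["star", "up", "approved"], -100.0),
         (["start", "up", "approved"], -101.0)]
best = rescore(nbest, toy_lm)[0]
```

Here the acoustically better but linguistically unlikely first hypothesis is overtaken once the trigram score is added.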

26
System Architecture (V)
  • Language ID
  • Spanish speakers
  • Great variability in canonical pronunciation
  • Some words pronounced in Spanish (e.g. bravo)
  • ATCs mix languages (to greet or say goodbye)
  • Initial effort using well known techniques
    (PPRLM, etc.)
  • Final system using LM score comparison!
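
Language ID by LM score comparison can be sketched as picking the language whose recognizer output gets the better language-model score. The per-word normalization below is an assumption added so that hypotheses of different lengths are comparable; the slides do not say how the scores were compared, and `identify_language` is a hypothetical name.

```python
def identify_language(hypotheses):
    """Pick the language whose hypothesis has the best per-word LM score.

    `hypotheses` maps a language code to (words, lm_logprob), i.e. the
    sentence recognized by that language's decoder and its LM log-score.
    """
    best_lang, best_score = None, float("-inf")
    for lang, (words, lm_logprob) in hypotheses.items():
        score = lm_logprob / max(len(words), 1)  # length normalization
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


hyps = {
    "es": (["iberia", "tres", "dos"], -6.0),
    "en": (["iberia", "three", "to", "go"], -20.0),
}
language = identify_language(hyps)
```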

27
System Architecture (VI)
  • Understanding module
  • Tagger: several categories per word
  • Number preprocessing
  • Tag refiner
  • Understanding module
  • Understanding module architecture used in other
    tasks in our Group
  • Task dependent → time consuming
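
The number preprocessing step can be illustrated by collapsing runs of spoken digits into single numeric tokens before tagging. This is a sketch under assumptions: the digit vocabulary, the treatment of "decimal", and the function name `collapse_digits` are illustrative and not taken from the project code.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8",
          "nine": "9", "niner": "9"}


def collapse_digits(words):
    """Merge consecutive spoken digits into one token.

    "decimal" becomes a decimal point only when it sits between digits.
    """
    out, number = [], ""
    for i, word in enumerate(words):
        if word in DIGITS:
            number += DIGITS[word]
        elif (word == "decimal" and number and "." not in number
              and i + 1 < len(words) and words[i + 1] in DIGITS):
            number += "."
        else:
            if number:              # a non-digit word closes the current number
                out.append(number)
                number = ""
            out.append(word)
    if number:
        out.append(number)
    return out


sentence = ("lufthansa four three four seven clearance correct on stand "
            "eight one next call one two one decimal seven bye")
tokens = collapse_digits(sentence.split())
```

After this step the tagger and tag refiner see one token per number instead of a string of single digits.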

28
Evaluation (I)
  • Multiple environments
  • Off line, using recorded database (offline)
  • With people at GTH, predefined script (online)
  • With users (advanced ATC trainees)
  • Predefined script (online)
  • Predefined scenarios (free online)
  • Subjective evaluation
  • English and Spanish
  • Measuring
  • Word accuracy rates (IF1 and IF2)
  • Concept accuracy rates (IF2)
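
For reference, word accuracy can be computed from a minimum-edit-distance alignment between reference and recognized words. The sketch below assumes the usual definition, 100 × (N − S − D − I) / N with N the number of reference words; the slides do not state the exact scoring tool or formula used. The same function could be applied to slot sequences for a concept accuracy figure.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def word_accuracy(ref, hyp):
    """Percent accuracy relative to the number of reference words."""
    return 100.0 * (len(ref) - edit_distance(ref, hyp)) / len(ref)


ref = "cleared take off runway three six left".split()
hyp = "cleared take off runway three left".split()
accuracy = word_accuracy(ref, hyp)  # one deletion out of seven words
```

Note that with many insertions this measure can become negative, which is expected for the accuracy definition (as opposed to percent correct).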

29
Evaluation (II): Speech Interface IF1 (I)
  • Off line, Main results

30
Evaluation (III): Speech Interface IF1 (II)
  • On line, predefined script (11 speakers)
  • Spanish: 50 commands (98 words/speaker)
  • English: 30 commands (60 words/speaker)

31
Evaluation (IV): Speech Interface IF1 (III)
  • On line, predefined script (11 speakers)
  • Detailed error analysis

32
Evaluation (V): Speech Interface IF1 (IV)
  • On line, real-task test (11 speakers, ATCs)
  • Form with different questions (subjective)
  • "The system understands what I say"

AVG: 4.0 (answers on a 1 to 5 scale)
33
Evaluation (VI): Speech Interface IF1 (V)
  • On line, real-task test (11 speakers, ATCs)
  • Form with different questions (subjective)
  • "I would use this system instead of the current
    one"

AVG: 3.4 (answers on a 1 to 5 scale)
34
Evaluation (VII): Speech Interface IF2 (I)
  • Training, adaptation and recognition issues
  • Spanish authorizations task (preliminary experiments)
  • Full retraining is used (using only the Authorizations DB)
  • Rescoring improves only 4% relative (20% in read
    speech) → not used in final prototype

35
Evaluation (VIII): Speech Interface IF2 (II)
  • Database LM statistics
  • Spanish
  • English

36
Evaluation (IX): Speech Interface IF2 (III)
  • Off/on line, word/concept recognition rates
  • Spanish
  • English

GTH online: 16 speakers, 10 sentences/speaker in Spanish, 6 sentences/speaker in English. Read speech!
ATC online: 7 speakers, 10 sentences/speaker in Spanish, 6 sentences/speaker in English. Read speech!
37
Evaluation (X): Speech Interface IF2 (IV)
  • Off/Free on line, word/concept recognition rates
  • Spanish
  • English

ATC free online: 7 speakers, scenario based, 10 sentences/speaker in Spanish, 6 sentences/speaker in English. 5 additional OOVs
38
Evaluation (XI): Speech Interface IF2 (V)
  • Real-world, RT system working in tower
  • Word/concept recognition rates
  • Language ID rates

Real world: 205 sentences, 3433 reference words, 588 slots. 10 additional OOVs
39
Evaluation (XII): Speech Interface IF2 (VI)
  • Cross task comparison (off line)
  • Spanish (average rate for all other tasks)
  • English (average rate for all other tasks)

40
Demo
  • Start praying!
  • Wrong microphone channel
  • Wrong speaker!
  • IF1
  • Using defined dictionary
  • Only Spanish, sorry
  • IF2
  • Random sentences, Spanish & English
  • Will (try to) point out mistakes

41
Conclusions (I)
  • Great fun!
  • Plenty of space for improvement
  • Task-dependent restrictions (existing
    frequencies, flight IDs, airport layout data,
    etc.)
  • Concept refining (current set is very broad)
  • Rules development
  • Speaker/gender adaptation
  • More data!

42
Conclusions (II)
  • ASR Technology not ready for prime time!
  • Difficult task
  • We are talking about planes and people!
  • Political issues
  • Other applications in this field
  • Non critical tasks
  • Pseudo-pilots for ATCs training?
  • Phraseology trainers
  • Indexing

43
Questions?
44
Evaluation: Speech Interface 1 IF2
  • Database LM statistics
  • Spanish
  • English

45
Evaluation: Speech Interface 1 IF2
  • Off line, recognition rates
  • Spanish
  • English

46
Evaluation: Speech Interface 1 IF2
  • Off line, understanding rates
  • Spanish
  • English

47
System Architecture: Language ID in IF2
  • Preliminary experiments with PPRLM
  • Need almost 5 seconds to achieve 96%
  • Bad performance in real task
  • Implemented system uses LM score comparison!

48
System Architecture: Understanding example
  • Lufthansa four three four seven clearance correct
    on stand eight one next call one two one decimal
    seven bye

-DATA_identifier- -single_digit- -single_digit- -single_digit- -single_digit-
-ID_freq_change- -DATA_correct- -garbage- -ID_freq_change- -DATA_park-
-single_digit- -single_digit- -garbage- -ID_standby- -ID_freq_change-
-single_digit- -single_digit- -single_digit- -freq_decimal_point-
-single_digit- -goodbye-

-DATA_identifier- -single_digit- -ID_freq_change- -DATA_correct- -garbage-
-ID_freq_change- -DATA_park- -single_digit- -garbage- -ID_standby-
-ID_freq_change- -single_digit- -freq_decimal_point- -single_digit- -goodbye-
49
System Architecture: Understanding example
  • Lufthansa four three four seven clearance correct
    on stand eight one next call one two one decimal
    seven bye

-SLOT_identifier- -ID_freq_change- -DATA_correct- -DATA_park- -single_digit-
-ID_standby- -ID_freq_change- -SLOT_freq_change- -goodbye-

-DATA_identifier- -single_digit- -ID_freq_change- -DATA_correct- -garbage-
-ID_freq_change- -DATA_park- -single_digit- -garbage- -ID_standby-
-ID_freq_change- -single_digit- -freq_decimal_point- -single_digit- -goodbye-

-SLOT_identifier- -ID_freq_change- -DATA_correct- -SLOT_park_id- -ID_standby-
-ID_freq_change- -SLOT_freq_change- -goodbye-

UNDERSTANDING RESULTS
identifier: lufthansa4347
park_id: stand81
freq_change: 121.7