Multilingual HLT in Europe and the development of ASR

1
Multilingual HLT in Europe and the development of
ASR

Louis C.W. Pols
Institute of Phonetic Sciences
University of Amsterdam
The Netherlands

PRASA2001 Franschhoek, South Africa 30 Nov.
2001, keynote
2
Some history

Liesbeth Botha spent half a year at our institute
during second half of 1996
ever since the possible organization of a
workshop or a major conference in South Africa
was considered
(cancelled) AST Workshop on Human Language
Technologies for E-Governance in a Multilingual
Society, Stellenbosch
PRASA2001 Franschhoek, 29-30 Nov., incl.
Speech Processing and AST project
I always wanted to visit South Africa!

3
Overview

Multilingual Europe (vs. Multilingual South
Africa)
EU Framework Programs: Human Language Technology
(HLT)
Other (European) programs and organizations
ISCA
Dutch speech database initiatives (vs. AST)
Speech science and technology ASR development
Academia (knowledge) and industry (applications)
Conclusions

4
Multilingual Europe

Europe (West, Central, East)
EU-countries
Candidate-EU-countries
Schengen countries (internally no boundary
control)
Euro countries (300 M people)
many nations and even more languages
multilingual community and (open) market
e-commerce, telebanking, infokiosk, etc.

5
(No Transcript)
6
(No Transcript)
7
EU Framework Program FP5

Human Language Technologies RTD (HLT)
http://www.hltcentral.org/
part of Information Society Technologies (IST),
Key Action III (Multimedia Contents and
Tools)
part of fifth Framework Program 98-02 (FP5)
IST: 3600 M€ (26.5% of FP5); HLT: 125 M€
HLT: Multilingual communication; Natural
Interactivity; Cross-lingual information
management; Support & Accompanying Measures

8
6th Framework program

FP6 (02-06): the way forward
proposal published Febr. 2001
one of 7 priority themes:
Information Society Technologies
also networks of excellence
IST budget: 3600 M€

9
Complaints from academia

too much application and user oriented
little room for research (reaction Commission: it
is time for HLT to show its usefulness!),
but .... pendulum swings!
speech data not freely available (only with
delay and at (high) costs via ELRA)
still several very interesting projects
we participated before (SAM, EuroCocosda,
somewhat in SpeechDat) but barely anymore, but
(KPN Research and) Nijmegen University still do

10
Some HLT speech projects

C-ORAL-ROM: Integrated Reference Corpora for
Spoken Romance Languages (1/01, 36 mo)
CORETEX: Improving Core Speech Recognition
Technology (4/00, 36 mo)
I-EYE: Interacting with Eyes, Gaze Assisted Access
to Information in Multiple Languages (1/00, 30
mo)
NESPOLE!: NEgotiating through SPOken Lang. in
E-comm. (1/00, 30 mo)
SIRIDUS: Specification, Interaction and
Reconfiguration In Dialogue Understanding Systems
(1/00, 36 mo)
SMADA: Sp. Driven Multimodal Automatic Directory
Assist. (1/00, 36 mo) (finalizing: ITRW Advanced
ASR for Telecom Appl., Nov. 2002, Avignon)
SPEECON: Sp. Driven Interfaces for Consumer
Applications (2/00, 24 mo)

11
Some past HLT projects

ARISE: Automatic Railway Systems for Europe
(10/96, 24 mo)
CAVE: Caller Verification in Bank and
Telecommunication (11/95, 24 mo)
EAGLES: Expert Advisory Group on Language
Engineering Standards (11/97, 24 mo)
ELRA: European Language Resources Association
(9/95, 50 mo)
ELSE: Evaluation in Language and Speech
Engineering (1/98, 16 mo)
SPEECHDAT: Speech Databases for Creation of Voice
Driven Teleservices (3/96, 34 mo)
SPEECHDAT-CAR (3/98, 30 mo) variants
VODIS: Advanced Speech Technologies for
Voice-operated Driver Information Systems (11/95,
43 mo)

12
some HLT support projects

CLASS: Collaboration in Language and Speech
Science and technology (Int. WS on
Information Presentation and Natural Multimodal
Dialogue, Verona, Italy, Dec 14-15, 2001)
ELSNET-HLT: The European Network of Excellence in
Human Language Technologies
HOPE: HLT Opportunity Promotion in Europe, Euromap
ISLE-HLT: Int. Standards for Language Engineering
(Eagles follow-up) incl. ISLE Meta Data Initiative
(IMDI), see also COREX

13
eContent

eContent part of eEurope initiative
European Digital Content on the Global Networks,
01-05, 100 M€, 1st call 3/2001
Action Line 2 (AL2) addresses the intersection of
the content and language industries, more
specifically the design, production and
distribution of high-quality European digital
content for the global networks in an
increasingly multilingual and multicultural
socio-economic environment
http://www.hltcentral.org/econtent/

14
MLIS

Multilingual Information Society Program
Supporting the creation of a framework of
services for European language resources
Encouraging the use of language technologies,
resources and standards
Promoting the use of advanced language tools in
the Community and Member States public sector
one call in June 99, 15 M€, some 30 proj.
e.g. NL-TRANSLEX: Machine Translation for Dutch
and English/French/German

15
INTAS

International Association for the promotion of
co-operation with scientists from the New
Independent States of the former Soviet Union
(NIS)
established June 1993
Open Thematic Call 2000 (budget 16 M€)
max budget 150 k€/project (max 30 k€/NIS partner)
INTAS 915: Spontaneous Speech of Typologically
Unrelated Languages (Russian, Finnish and Dutch):
Comparison of Phonetic Properties (90 k€, 7/01,
36 mo)

16
Euromap

HLT Opportunity Promotion in Europe (HOPE)
(2/00, 24 mo, 8 national focus points)
to raise awareness of the benefits of human
language technologies (HLT) with companies,
organizations and users; to accelerate technology
transfer from the research base to the market; to
stimulate community building in specific domains
(tourism and e-commerce).
General: http://www.hltcentral.org/euromap/
Dutch site: http://www.taalunieversum.org/tst/en/

17
European Language Resources Association

A non-profit organization to promote the
creation, verification, and distribution of
language resources.
US counterpart: LDC
173 resources sold in 2000.
organizer of LREC conferences (third one in May
2002 in Las Palmas, Spain)
speech related resources: 200
written resources: 145
terminological resources
tools and software
http://www.icp.grenet.fr/ELRA/home.html

18
ELSNET

European Network of Excellence in Human Language
Technologies
one of the 20 networks within FP5
Transfer of knowledge and expertise; Shared
goals; Evaluation; Shared language resources;
Promotion of best practice; Interoperability by
means of standardization
yearly Elsnet Summer Schools: July 15-26,
2002, Odense, Denmark, Evaluation and Assessment
of Text and Speech Systems
Newsletter: Elsnews; http://www.elsnet.org

19
COCOSDA

Internat. organization for coordinating the
globalized efforts in spoken language resources
and sp. technology evaluation
yearly, jointly, with Eurospeech and ICSLP since
Chiavari, Italy, Sept. 91 (Eurosp.91) and
before Oriental Cocosda
topic domains
Evaluation of Speech Underst. and Dialogue
Systems (W. Minker)
Multi-modal corpora (S. Nakamura)
Corpus Annotation Tools (S. Bird)
Local Languages (D. Gibbon)
regional programs (Europe; Asia; Oceania; Africa;
Latin America)
data center representatives (LDC, S. Bird; ELRA,
K. Choukri)
http://www.itl.atr.co.jp/cocosda

20
COCOSDA matrix
21
COST

European Cooperation in the field of Scientific
and Technical Research (60 k€ per
action, for additional costs only)
COST 249: Continuous Speech Recognition over the
Telephone (19 countries; start 5/94; 6 yrs; final
report)
COST 250: Speaker Recognition in Telephony
COST 258: The Naturalness of Synthetic Speech
COST 277: Nonlinear Speech Processing
COST 278: Spoken Language Interaction in
Telecommun.
http://cost.cordis.lu/src/home.cfm

22
EURESCOM

the European Institute for Research and Strategic
Studies in Telecommunications
20 shareholders from 19 European countries (major
European network operators and service providers)
e.g. MUST - MUltimodal, multilingual information
Services with small mobile Terminals (P1104)

23
ISCA

European Speech Comm. Association founded in 88
from ESCA to ISCA at Eurospeech99 in Budapest
membership organization
organizer of Eurospeech/ICSLP - Interspeech
organizer of specialized workshops (ITRWs)
Special interest groups (SIGs)
Speech Communication Journal
(http://www.elsevier.com/locate/specom)
http://www.isca-speech.org/

24
Eurospeech-ICSLP-Interspeech

     odd years (Eurospeech)   even years (ICSLP)
     (in Europe)              (elsewhere)
1    Paris 89                 Kobe 90
2    Genoa 91                 Banff 92
3    Berlin 93                Yokohama 94
4    Madrid 95                Philadelphia 96
5    Rhodes 97                Sydney 98
6    Budapest 99              Beijing 00
7    Aalborg 01               Denver 02
8    Geneva 03                Seoul 04
9    Lisbon 05                ?? 06

(legend on slide: past / future)
25
ISCA SIGs

Speech Synthesis - SynSig
Audio Visual Speech - AVISA
Speech And Language Technology for MInority
Languages - SALTMIL
Integration of Speech Technology in (Language)
Learning - InSTIL
SPeaker and Language Characterization - SPLC
Education in the Field of Speech Communication -
EduSIG
Speech Prosody - SProSIG
Dialogue Processing - SigDial (also within ACL)
Groupe Francophone de la Communication Parlée -
GFCP

26
ISCA ITRWs (forthcoming)

Prosody in Speech Recognition and Understanding -
Prosody 2001, Molly Pitcher Inn, Red Bank, NJ,
October 22-24, 2001
TIPS - Temporal Integration in the Perception of
Speech, Aix-en-Provence, France, 8-10 April 2002
Multi-Modal Dialogue in Mobile Environments,
Kloster Irsee, Germany, June 17-21, 2002
Advanced ASR for Telecom Applications, Palais des
Papes, Avignon, France, November 27-29, 2002
Supported but not organized by ISCA:
2001 International Workshop on Automatic Sp.
Recogn. and Underst., Madonna di Campiglio
(Trento), Italy, December 9-13, 2001
Speech Prosody 2002, Aix-en-Provence, France,
11-13 April, 2002

27
IEEE

IEEE Signal Processing Society
MMSP01, Workshop on Multimedia Signal
Processing, Cannes, France, October 3-5, 2001
ASRU01, Automatic Speech Recognition and
Understanding Workshop, Madonna di Campiglio
(Trento), Italy, December 9-13, 2001
2002 International Workshop on Multimedia Signal
Processing, US Virgin Islands, December 9-11,
2002
IEEE Trans. on Signal Processing / Speech and
Audio Processing / Multimedia / Neural Networks
http://www.ieee.org/

28
DARPA NIST

DARPA Projects and Yearly evaluations
CSR (Continuous Speech Recognition)
LVCSR (Large Vocabulary Conversational Speech
Recognition)
ATIS (Air Travel Information System)
Language Recognition (Identification and
Verification)
Speaker Recognition (Identification and
Verification)

29
NATO-ASI

ASI: Advanced Study Institute
many different domains
certain restrictions on NATO vs. non-NATO
participants, free registration, some funding
Dynamics of Speech Production and Perception,
Il Ciocco, Italy, June 23 - July 6, 2002
send application before Jan. 15, 2002 to
asi2001_at_ebire.org
Organizing Committee: Pierre L. Divenyi, Klára Vicsi

30
European national programs

German: Verbmobil; SmartKom (since 9/99); Bavarian
Archive for Speech Signals (BAS)
Spoken Dutch Corpus
French AUP
Swedish Centre for Speech Technology (CTT)
Swedish National Graduate School in Language
Technology (GSLT)

31
Dutch speech database initiatives

Speech Processing Expertise Center SPEX
5,000 speakers Polyphone
1,000 speakers SpeechDat variants
NWO Priority program TST-OVIS (public
transportation information system over telephone)
1,000 hrs CGN (Dutch-Flemish)
5.5 hrs open source IFA-corpus
TST Platform
ToDI (Transcription of Dutch Intonation)

32
Spoken Dutch Corpus

4.6 M€, 5 yrs, 10 M words, 1000 hrs of speech
Corpus design and compilation
Recording and digitization
Orthographic transcription (all)
Lemmatization and POS tagging (all)
Lexicon link-up (all)
Broad phonetic transcription (1 M)
Word segmentation (1 M)
Syntactic annotation (1 M)
Prosodic annotation (250 k)
Development of exploitation software: COREX
http://lands.let.kun.nl/cgn/home.htm

33
IFA corpus

5.5 hrs of high-quality-recorded speech
4 male and 4 female speakers
more than 30 min. per speaker
various speaking styles per speaker
from conversational and read speech, to isolated
sentences, words and syllables
everything phonemically segmented and labeled
free access via SQL query language
http://www.fon.hum.uva.nl/IFAcorpus
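
To give an idea of what SQL access to a phonemically segmented corpus can look like, here is a minimal sketch. The table name `segments`, its columns, and the rows are hypothetical and invented for illustration; they are not the actual IFA corpus schema or data. An in-memory SQLite database stands in for the real server.

```python
import sqlite3


def mean_phoneme_durations(rows):
    """Return {phoneme: mean duration in seconds} computed with an SQL query.

    rows: iterable of (speaker, style, phoneme, start_s, end_s) tuples.
    The table layout is a hypothetical stand-in, not the IFA corpus schema.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE segments ("
        "speaker TEXT, style TEXT, phoneme TEXT, start_s REAL, end_s REAL)"
    )
    con.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)
    query = (
        "SELECT phoneme, AVG(end_s - start_s) "
        "FROM segments GROUP BY phoneme ORDER BY phoneme"
    )
    result = {phoneme: mean for phoneme, mean in con.execute(query)}
    con.close()
    return result


# Toy rows, invented for illustration only (not taken from the IFA corpus).
toy_rows = [
    ("F1", "read", "a", 0.00, 0.10),
    ("F1", "read", "t", 0.10, 0.15),
    ("M1", "conversational", "a", 0.50, 0.58),
]
durations = mean_phoneme_durations(toy_rows)
```

Grouping by `style` or `speaker` instead of `phoneme` would give the per-style or per-speaker comparisons that the various speaking styles in such a corpus invite.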

34
Speech science and speech technology

we should try to bridge that gap
see my keynotes at ICPhS 99 and Eurospeech01
Flexible, robust and efficient human speech
processing versus present-day speech technology
Acquiring and implementing phonetic knowledge
we have to understand each other in order to be
able to communicate and to contribute
probabilistic vs. knowledge driven
adding (multiple) knowledge (sources) to improve
performance
much knowledge in speech databases

35
Phonetics <-> Speech Techn.
36
Do recognizers need intelligent ears?

intelligent ears -> front-end pre-processor
only if it improves performance
humans are generally better speech processors
than machines, perhaps system developers can
learn from human behavior
robustness at stake (noise, reverberation,
incompleteness, restoration, competing speakers,
variable speaking rate, context, dialects,
non-nativeness, style, emotion)

37
What is (phonetic) knowledge?

phonetic textbook knowledge
probabilistic knowledge from databases
fixed set of features vs. adaptable set
trading relations, selectivity
knowledge of the world, expectation
global vs. detailed

38
How good is human/machine speech recogn.?
39
Human vs. machine (ASR)

machine surprisingly good for certain tasks
machine could be better for many others
robustness, outliers
what are the limits of human performance?
in noise
for degraded speech
missing information (trading)

40
Human word intelligibility vs. noise
41
Robustness to degraded speech

speech: time-modulated signal in frequency bands
relatively insensitive to (spectral) distortions
prerequisite for digital hearing aid
modulating spectral slope: -5 to 5 dB/oct,
0.25-2 Hz
temporal smearing of envelope modulation
ca. 4 Hz max. in modulation spectrum -> syllable
LP>4 Hz and HP<8 Hz: little effect on
intelligibility
spectral envelope smearing
for BW>1/3 oct masked SRT starts to degrade
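
As a signal operation, temporal smearing amounts to low-pass filtering the amplitude envelope of a frequency band. The sketch below illustrates only that idea. The moving-average filter, the 160 Hz envelope sample rate, and the synthetic envelope (4 Hz plus 16 Hz modulation) are assumptions chosen for illustration, not the filters or stimuli of the experiments summarized on the slide.

```python
import math


def moving_average(x, n):
    """Circular moving average of length n (a crude low-pass filter)."""
    size = len(x)
    return [sum(x[(i + k) % size] for k in range(n)) / n for i in range(size)]


def modulation_depth(env, freq_hz, fs_hz):
    """Amplitude of the freq_hz component of env (single-bin DFT)."""
    size = len(env)
    re = sum(v * math.cos(2 * math.pi * freq_hz * i / fs_hz)
             for i, v in enumerate(env))
    im = sum(v * math.sin(2 * math.pi * freq_hz * i / fs_hz)
             for i, v in enumerate(env))
    return 2 * math.hypot(re, im) / size


FS = 160  # envelope sample rate in Hz (assumed)
t = [i / FS for i in range(FS)]  # one second of envelope

# Synthetic envelope: slow (4 Hz) and fast (16 Hz) modulation of equal depth.
envelope = [1 + 0.5 * math.sin(2 * math.pi * 4 * s)
            + 0.5 * math.sin(2 * math.pi * 16 * s) for s in t]

# A 10-sample average has its first spectral null at FS / 10 = 16 Hz,
# so the fast modulation is removed while the 4 Hz modulation mostly survives.
smeared = moving_average(envelope, 10)

slow_before = modulation_depth(envelope, 4, FS)
slow_after = modulation_depth(smeared, 4, FS)
fast_after = modulation_depth(smeared, 16, FS)
```

Comparing `slow_after` with `fast_after` shows which part of the modulation spectrum the smearing leaves intact.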

42
Robustness to degraded speech and missing
information

partly reversed speech (Saberi & Perrott,
Nature, 4/99)
fixed duration segments time reversed or shifted
in time: perfect sentence intelligibility up to
50 ms (demo: every 50 ms reversed vs. original)
low frequency modulation envelope (3-8 Hz) vs.
acoustic spectrum
syllable as information unit? (S. Greenberg)
gap and click restoration (Warren)
gating experiments
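
The manipulation in the partly reversed speech demo (cut the signal into fixed-duration segments and play each segment backwards) can be sketched as follows. The function name and the rounding of the segment length to whole samples are assumptions of this sketch, not details taken from the cited study.

```python
def locally_time_reverse(samples, sample_rate_hz, segment_ms):
    """Reverse the signal within consecutive fixed-duration segments.

    The order of the segments is kept; only the samples inside each
    segment are reversed. A shorter final segment is reversed as well.
    """
    seg_len = max(1, round(sample_rate_hz * segment_ms / 1000))
    out = []
    for start in range(0, len(samples), seg_len):
        out.extend(reversed(samples[start:start + seg_len]))
    return out


# Tiny example: 10 samples at 1 kHz, 4 ms segments (4 samples each).
demo = locally_time_reverse(list(range(10)), 1000, 4)
```

For real speech one would call it with, e.g., `sample_rate_hz=16000` and `segment_ms=50`; applying the function twice with the same settings restores the original signal.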

43
Desired pre-processor characteristics in ASR

basic sensitivity for stationary and dynamic
sounds
robustness to degraded speech
rather insensitive to spectral and temporal
smearing
robustness to noise and reverberation
filter characteristics
are BP, PLP, MFCC, RASTA, TRAPS good enough?
lateral inhibition (spectral sharpening)
dynamics
what can be neglected?
non-linearities, limited dynamic range, active
elements, co-modulation, secondary pitch, etc.
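
Of the front-ends listed, MFCC is built on a mel-warped filter bank. A minimal sketch of that frequency warping is given below, assuming one commonly used variant of the mel formula, 2595 * log10(1 + f/700); other variants exist, and the function names and the 0-4000 Hz range are choices of this sketch.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel (assumed variant: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_centers(low_hz, high_hz, n_filters):
    """Center frequencies (Hz) of n_filters spaced uniformly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_filters)]


# Ten filter centers for a telephone-like band (range is an assumption).
centers = mel_spaced_centers(0.0, 4000.0, 10)
```

The centers are dense at low frequencies and sparse at high frequencies, which is the coarse auditory property this kind of pre-processor encodes; whether that is good enough is exactly the question the slide raises.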

44
Caricature of present-day speech recognizers

fixed pre-processor, fixed features
trained with a variety of speech input
much global information, but ..... no
interrelations
monaural, uni-modal input
pitch extractor generally not operational
performs well on average behavior
but ..... does poorly on any type of outlier
(OOV, non-native, fast or whispered speech, other
communication channel, new topic, new speaker)
neglects lots of useful (phonetic) information
heavily relies on language model

45
Useful information: durational variability
Adapted from Wang (1998)
46
Academia (knowledge) and industry (applications)

what do industry and universities expect from
each other? (panel discussion at E01)
proper education and training -> E-masters
good exchange between academia & industry
participation in joint projects -> speech DB
adapt to requirements -> CAIP Symposium
open source approach -> Linux, praat, HTK
complaints: sometimes bad management and high
risk (puts HLT in bad spotlight, e.g. LH)

47
Information Technology for Homeland Security

Center for Advanced Information Processing,
CAIP Symposium, Rutgers Univ., Nov. 29
subsequent to events of Sept. 11, CAIP modified
its traditional Annual Research Review
Symposium identifies issues in Homeland Security
and encourages research, particularly with
university-industry cooperation
e.g., biometric and voice identification; fusing
voice and face data; multimodal interfaces for
asset deployment; face-tracking for
identification; microphone array for speaker
tracking

48
E-masters in Language and Speech

Course Content
Theoretical Linguistics
Natural Language Processing
Phonetics and Phonology
Cognitive models for speech & language processing
Speech signal processing
Pattern recognition
Language engineering applications
http://www.cstr.ed.ac.uk/euromasters/

49
Conclusions

collecting speech corpora in national languages
(like in SA) is an excellent basis, both for
research and for applications
combine industrial and academic skills
make proper use of experiences elsewhere
that's why we are all here at this workshop!
good luck and thank you for your attention
