Multilingual HLT in Europe and the development of ASR

1
Multilingual HLT in Europe and the development of
ASR
  • Louis C.W. Pols
  • Institute of Phonetic Sciences
  • University of Amsterdam
  • The Netherlands

PRASA2001, Franschhoek, South Africa, 30 Nov. 2001, keynote
2
Some history
  • Liesbeth Botha spent half a year at our institute
    during second half of 1996
  • ever since, the possible organization of a
    workshop or a major conference in South Africa
    has been considered
  • (cancelled) AST Workshop on Human Language
    Technologies for E-Governance in a Multilingual
    Society, Stellenbosch
  • PRASA2001 Franschhoek, 29-30 Nov., incl.
    Speech Processing and AST project
  • I always wanted to visit South Africa!

3
Overview
  • Multilingual Europe (vs. Multilingual South
    Africa)
  • EU Framework Programs: Human Language Technology
    (HLT)
  • Other (European) programs and organizations
  • ISCA
  • Dutch speech database initiatives (vs. AST)
  • Speech science and technology: ASR development
  • Academia (knowledge) and industry (applications)
  • Conclusions

4
Multilingual Europe
  • Europe (West, Central, East)
  • EU-countries
  • Candidate-EU-countries
  • Schengen countries (internally no boundary
    control)
  • Euro countries (300 M people)
  • many nations and even more languages
  • multilingual community and (open) market
  • e-commerce, telebanking, infokiosk, etc.

5
(No Transcript)
6
(No Transcript)
7
EU Framework Program FP5
  • Human Language Technologies RTD (HLT)
  • http://www.hltcentral.org/
  • part of Information Society Technologies (IST),
    Key Action III (Multimedia Contents and
    Tools)
  • part of fifth Framework Program 98-02 (FP5)
  • IST: 3600 M€ (26.5% of FP5); HLT: 125 M€
  • HLT: Multilingual communication; Natural
    Interactivity; Cross-lingual information
    management; Support and Accompanying Measures

8
6th Framework program
  • FP6 (02-06): the way forward
  • proposal published Febr. 2001
  • one of 7 priority themes
  • Information Society Technologies
  • also networks of excellence
  • IST budget: 3600 M€

9
Complaints from academia
  • too application- and user-oriented
  • little room for research (reaction of the
    Commission: it is time for HLT to show its
    usefulness!), but ... the pendulum swings!
  • speech data not freely available (only with
    delay and at (high) costs via ELRA)
  • still several very interesting projects
  • we participated before (SAM, EuroCocosda,
    somewhat in SpeechDat) but barely anymore;
    (KPN Research and) Nijmegen University still do

10
Some HLT speech projects
  • C-ORAL-ROM: Integrated Reference Corpora for
    Spoken Romance Languages (1/01, 36 mo)
  • CORETEX: Improving Core Speech Recognition
    Technology (4/00, 36 mo)
  • I-EYE: Interacting with Eyes, Gaze Assisted Access
    to Information in Multiple Languages (1/00, 30
    mo)
  • NESPOLE!: NEgotiating through SPOken Lang. in
    E-comm. (1/00, 30 mo)
  • SIRIDUS: Specification, Interaction and
    Reconfiguration In Dialogue Understanding Systems
    (1/00, 36 mo)
  • SMADA: Sp. Driven Multimodal Automatic Directory
    Assist. (1/00, 36 mo) (finalizing ITRW Advanced
    ASR for Telecom Appl., Nov. 2002, Avignon)
  • SPEECON: Sp. Driven Interfaces for Consumer
    Applications (2/00, 24 mo)

11
Some past HLT projects
  • ARISE: Automatic Railway Systems for Europe
    (10/96, 24 mo)
  • CAVE: Caller Verification in Bank and
    Telecommunication (11/95, 24 mo)
  • EAGLES: Expert Advisory Group on Language
    Engineering Standards (11/97, 24 mo)
  • ELRA: European Language Resources Association
    (9/95, 50 mo)
  • ELSE: Evaluation in Language and Speech
    Engineering (1/98, 16 mo)
  • SPEECHDAT: Speech Databases for Creation of Voice
    Driven Teleservices (3/96, 34 mo)
  • SPEECHDAT-CAR (3/98, 30 mo) and variants
  • VODIS: Advanced Speech Technologies for
    Voice-operated Driver Information Systems (11/95,
    43 mo)

12
Some HLT support projects
  • CLASS: Collaboration in Language and Speech
    Science and technology (Int. WS on
    Information Presentation and Natural Multimodal
    Dialogue, Verona, Italy, Dec 14-15, 2001)
  • ELSNET-HLT: The European Network of Excellence in
    Human Language Technologies
  • HOPE: HLT Opportunity Promotion in Europe, Euromap
  • ISLE-HLT: Int. Standards for Language Engineering
    (Eagles follow-up) incl. I/O Meta Data Initiative
    (IMDI), see also COREX

13
eContent
  • eContent part of eEurope initiative
  • European Digital Content on the Global Networks,
    01-05, 100 M€, 1st call 3/2001
  • Action Line 2 (AL2) addresses the intersection of
    the content and language industries, more
    specifically the design, production and
    distribution of high-quality European digital
    content for the global networks in an
    increasingly multilingual and multicultural
    socio-economic environment
  • http://www.hltcentral.org/econtent/

14
MLIS
  • Multilingual Information Society Program
  • Supporting the creation of a framework of
    services for European language resources
  • Encouraging the use of language technologies,
    resources and standards
  • Promoting the use of advanced language tools in
    the Community and Member States public sector
  • one call in June 99, 15 M€, some 30 proj.
  • e.g. NL-TRANSLEX: Machine Translation for Dutch
    and English/French/German

15
INTAS
  • International Association for the promotion of
    co-operation with scientists from the New
    Independent States of the former Soviet Union
    (NIS)
  • established June 1993
  • Open Thematic Call 2000 (budget 16 M€)
  • max budget 150 k€/project (max 30 k€/NIS partner)
  • INTAS 915: Spontaneous Speech of Typologically
    Unrelated Languages (Russian, Finnish and Dutch):
    Comparison of Phonetic Properties (90 k€, 7/01,
    36 mo)

16
Euromap
  • HLT Opportunity Promotion in Europe (HOPE)
    (2/00, 24 mo, 8 national focus points)
  • to raise awareness of the benefits of human
    language technologies (HLT) with companies,
    organizations and users; to accelerate technology
    transfer from the research base to the market; to
    stimulate community building in specific domains
    (tourism and e-commerce).
  • General: http://www.hltcentral.org/euromap/
  • Dutch site: http://www.taalunieversum.org/tst/en/

17
European Language Resources Association
  • A non-profit organization to promote the
    creation, verification, and distribution of
    language resources.
  • US counterpart: LDC
  • 173 resources sold in 2000.
  • organizer of LREC conferences (third one in May
    2002 in Las Palmas, Spain)
  • speech-related resources: 200
  • written resources: 145
  • terminological resources
  • tools and software
  • http://www.icp.grenet.fr/ELRA/home.html

18
ELSNET
  • European Network of Excellence in Human Language
    Technologies
  • one of the 20 networks within FP5
  • Transfer of knowledge and expertise; Shared
    goals; Evaluation; Shared language resources;
    Promotion of best practice; Interoperability by
    means of standardization
  • yearly Elsnet Summer Schools: July 15-26,
    2002, Odense, Denmark: Evaluation and Assessment
    of Text and Speech Systems
  • Newsletter: Elsnews; http://www.elsnet.org

19
COCOSDA
  • Internat. organization for coordinating the
    globalized efforts in spoken language resources
    and sp. technology evaluation
  • yearly meetings, jointly with Eurospeech and
    ICSLP, since Chiavari, Italy, Sept. 91
    (Eurospeech 91), and before Oriental Cocosda
  • topic domains
  • Evaluation of Speech Underst. and Dialogue
    Systems (W. Minker)
  • Multi-modal corpora (S. Nakamura)
  • Corpus Annotation Tools (S. Bird)
  • Local Languages (D. Gibbon)
  • regional programs (Europe; Asia; Oceania; Africa;
    Latin America)
  • data center representatives (LDC, S. Bird; ELRA,
    K. Choukri)
  • http://www.itl.atr.co.jp/cocosda

20
COCOSDA matrix
21
COST
  • European Cooperation in the field of Scientific
    and Technical Research (60 k€ per
    action, for additional costs only)
  • COST 249: Continuous Speech Recognition over the
    Telephone (19 countries; start 5/94; 6 yrs; final
    report)
  • COST 250: Speaker Recognition in Telephony
  • COST 258: The Naturalness of Synthetic Speech
  • COST 277: Nonlinear Speech Processing
  • COST 278: Spoken Language Interaction in
    Telecommun.
  • http://cost.cordis.lu/src/home.cfm

22
EURESCOM
  • the European Institute for Research and Strategic
    Studies in Telecommunications
  • 20 shareholders from 19 European countries (major
    European network operators and service providers)
  • e.g. MUST: MUltimodal, multilingual information
    Services with small mobile Terminals (P1104)

23
ISCA
  • European Speech Comm. Association founded in 88
  • from ESCA to ISCA at Eurospeech99 in Budapest
  • membership organization
  • organizer of Eurospeech/ICSLP - Interspeech
  • organizer of specialized workshops (ITRWs)
  • Special interest groups (SIGs)
  • Speech Communication Journal
    (http://www.elsevier.com/locate/specom)
  • http://www.isca-speech.org/

24
Eurospeech-ICSLP-Interspeech
  • odd years: Eurospeech (in Europe); even years:
    ICSLP (elsewhere)
  • 1: Paris 89; Kobe 90
  • 2: Genoa 91; Banff 92
  • 3: Berlin 93; Yokohama 94
  • 4: Madrid 95; Philadelphia 96
  • 5: Rhodes 97; Sydney 98
  • 6: Budapest 99; Beijing 00
  • 7: Aalborg 01; Denver 02
  • 8: Geneva 03; Seoul 04
  • 9: Lisbon 05; ?? 06
  • past: up to Aalborg 01; future: from Denver 02
25
ISCA SIGs
  • Speech Synthesis - SynSig
  • Audio Visual Speech - AVISA
  • Speech And Language Technology for MInority
    Languages - SALTMIL
  • Integration of Speech Technology in (Language)
    Learning - InSTIL
  • SPeaker and Language Characterization - SPLC
  • Education in the Field of Speech Communication -
    EduSIG
  • Speech Prosody - SProSIG
  • Dialogue Processing - SigDial (also within ACL)
  • Groupe Francophone de la Communication Parlée -
    GFCP

26
ISCA ITRWs (forthcoming)
  • Prosody in Speech Recognition and Understanding
    (Prosody 2001), Molly Pitcher Inn, Red Bank, NJ,
    October 22-24, 2001
  • TIPS: Temporal Integration in the Perception of
    Speech, Aix-en-Provence, France, 8-10 April 2002
  • Multi-Modal Dialogue in Mobile Environments,
    Kloster Irsee, Germany, June 17-21, 2002
  • Advanced ASR for Telecom Applications, Palais des
    Papes, Avignon, France, November 27-29, 2002
  • Supported but not organized by ISCA
  • 2001 International Workshop on Automatic Sp.
    Recogn. and Underst., Madonna di Campiglio
    (Trento), Italy, December 9-13, 2001
  • Speech Prosody 2002, Aix-en-Provence, France,
    11-13 April, 2002

27
IEEE
  • IEEE Signal Processing Society
  • MMSP01, Workshop on Multimedia Signal
    Processing, Cannes, France, October 3-5, 2001
  • ASRU01, Automatic Speech Recognition and
    Understanding Workshop, Madonna di Campiglio
    (Trento), Italy, December 9-13, 2001
  • 2002 International Workshop on Multimedia Signal
    Processing, US Virgin Islands, December 9-11,
    2002
  • IEEE Trans. on Signal Processing / Speech and
    Audio Processing / Multimedia / Neural Networks
  • http://www.ieee.org/

28
DARPA NIST
  • DARPA Projects and Yearly evaluations
  • CSR (Continuous Speech Recognition)
  • LVCSR (Large Vocabulary Conversational Speech
    Recognition)
  • ATIS (Air Travel Information System)
  • Language Recognition (Identification and
    Verification)
  • Speaker Recognition (Identification and
    Verification)

29
NATO-ASI
  • ASI Advanced Study Institute
  • many different domains
  • certain restrictions on NATO vs. non-NATO
    participants, free registration, some funding
  • Dynamics of Speech Production and Perception,
    Il Ciocco, Italy, June 23 - July 6, 2002
  • send application before Jan. 15, 2002 to
    asi2001_at_ebire.org
  • Organizing Committee: Pierre L. Divenyi, Klára Vicsi

30
European national programs
  • German: Verbmobil; SmartKom (since 9/99); Bavarian
    Archive for Speech Signals (BAS)
  • Spoken Dutch Corpus
  • French: AUP
  • Swedish: Centre for Speech Technology (CTT);
    Swedish National Graduate School in Language
    Technology (GSLT)

31
Dutch speech database initiatives
  • Speech Processing Expertise Center (SPEX)
  • 5,000 speakers: Polyphone
  • 1,000 speakers: SpeechDat variants
  • NWO Priority program TST-OVIS (public
    transportation information system over telephone)
  • 1,000 hrs: CGN (Dutch-Flemish)
  • 5.5 hrs: open source IFA-corpus
  • TST Platform
  • ToDI (Transcription of Dutch Intonation)

32
Spoken Dutch Corpus
  • 4.6 M€, 5 yrs, 10 M words, 1000 hrs of speech
  • Corpus design and compilation
  • Recording and digitization
  • Orthographic transcription (all)
  • Lemmatization and POS tagging (all)
  • Lexicon link-up (all)
  • Broad phonetic transcription (1 M)
  • Word segmentation (1 M)
  • Syntactic annotation (1 M)
  • Prosodic annotation (250 k)
  • Development of exploitation software: COREX
  • http://lands.let.kun.nl/cgn/home.htm

33
IFA corpus
  • 5.5 hrs of high-quality-recorded speech
  • 4 male and 4 female speakers
  • more than 30 min. per speaker
  • various speaking styles per speaker
  • from conversational and read speech, to isolated
    sentences, words and syllables
  • everything phonemically segmented and labeled
  • free access via SQL query language
  • http://www.fon.hum.uva.nl/IFAcorpus
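
To give an idea of what SQL access to a phonemically segmented and labeled corpus can look like, here is a minimal sketch. The table name `segments`, its columns, the demo rows, and the use of an in-memory SQLite database are all assumptions made for illustration; they are not the actual IFA corpus schema or server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'segments' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments ("
        "speaker TEXT, style TEXT, phoneme TEXT, start_s REAL, end_s REAL)"
    )
    # Invented demo rows: speaker, speaking style, phoneme label, start/end time.
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?, ?, ?)",
        [
            ("F1", "read", "a", 0.00, 0.10),
            ("F1", "read", "a", 0.50, 0.62),
            ("M1", "conversational", "a", 0.00, 0.06),
            ("M1", "conversational", "t", 0.06, 0.10),
        ],
    )
    return conn


def mean_phoneme_duration(conn, phoneme):
    """Return {speaking style: mean duration in seconds} for one phoneme label."""
    rows = conn.execute(
        "SELECT style, AVG(end_s - start_s) FROM segments "
        "WHERE phoneme = ? GROUP BY style ORDER BY style",
        (phoneme,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(mean_phoneme_duration(build_demo_db(), "a"))
```

A query of this kind compares segment durations across speaking styles, which is the sort of question a segmented multi-style corpus is suited to.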

34
Speech science and speech technology
  • we should try to bridge that gap
  • see my keynotes at ICPhS 99 and Eurospeech01
  • Flexible, robust and efficient human speech
    processing versus present-day speech technology
  • Acquiring and implementing phonetic knowledge
  • we have to understand each other in order to be
    able to communicate and to contribute
  • probabilistic vs. knowledge driven
  • adding (multiple) knowledge (sources) to improve
    performance
  • much knowledge in speech databases

35
Phonetics ↔ Speech Techn.
36
Do recognizers need intelligent ears?
  • intelligent ears → front-end pre-processor
  • only if it improves performance
  • humans are generally better speech processors
    than machines, perhaps system developers can
    learn from human behavior
  • robustness at stake (noise, reverberation,
    incompleteness, restoration, competing speakers,
    variable speaking rate, context, dialects,
    non-nativeness, style, emotion)

37
What is (phonetic) knowledge?
  • phonetic textbook knowledge
  • probabilistic knowledge from databases
  • fixed set of features vs. adaptable set
  • trading relations, selectivity
  • knowledge of the world, expectation
  • global vs. detailed

38
How good is human/machine speech recogn.?
39
Human vs. machine (ASR)
  • machine surprisingly good for certain tasks
  • machine could be better for many others
  • robustness, outliers
  • what are the limits of human performance?
  • in noise
  • for degraded speech
  • missing information (trading)

40
Human word intelligibility vs. noise
41
Robustness to degraded speech
  • speech = time-modulated signal in frequency bands
  • relatively insensitive to (spectral) distortions
  • prerequisite for digital hearing aid
  • modulating spectral slope: -5 to +5 dB/oct,
    0.25-2 Hz
  • temporal smearing of envelope modulation
  • ca. 4 Hz max. in modulation spectrum → syllable
  • LP>4 Hz and HP<8 Hz: little effect on
    intelligibility
  • spectral envelope smearing
  • for BW>1/3 oct, masked SRT starts to degrade
42
Robustness to degraded speech and missing
information
  • partly reversed speech (Saberi & Perrott,
    Nature, 4/99)
  • fixed-duration segments time-reversed or shifted
    in time: perfect sentence intelligibility up to
    50 ms (demo: every 50 ms reversed; original)
  • low frequency modulation envelope (3-8 Hz) vs.
    acoustic spectrum
  • syllable as information unit? (S. Greenberg)
  • gap and click restoration (Warren)
  • gating experiments

43
Desired pre-processor characteristics in ASR
  • basic sensitivity for stationary and dynamic
    sounds
  • robustness to degraded speech
  • rather insensitive to spectral and temporal
    smearing
  • robustness to noise and reverberation
  • filter characteristics
  • is BP, PLP, MFCC, RASTA, TRAPS good enough?
  • lateral inhibition (spectral sharpening)
    dynamics
  • what can be neglected?
  • non-linearities, limited dynamic range, active
    elements, co-modulation, secondary pitch, etc.

44
Caricature of present-day speech recognizers
  • fixed pre-processor, fixed features
  • trained with a variety of speech input
  • much global information, but ..... no
    interrelations
  • monaural, uni-modal input
  • pitch extractor generally not operational
  • performs well on average behavior
  • but ..... does poorly on any type of outlier
    (OOV, non-native, fast or whispered speech, other
    communication channel, new topic, new speaker)
  • neglects lots of useful (phonetic) information
  • heavily relies on language model

45
Useful information: durational variability
Adapted from Wang (1998)
46
Academia (knowledge) and industry (applications)
  • what do industry and universities expect from
    each other? (panel discussion at E01)
  • proper education and training → E-masters
  • good exchange between academia and industry
  • participation in joint projects → speech DB
  • adapt to requirements → CAIP Symposium
  • open source approach → Linux, praat, HTK
  • complaints: sometimes bad management and high
    risk (puts HLT in bad spotlight, e.g. L&H)

47
Information Technology for Homeland Security
  • Center for Advanced Information Processing,
    CAIP Symposium, Rutgers Univ., Nov. 29
  • subsequent to events of Sept. 11, CAIP modified
    its traditional Annual Research Review
  • Symposium identifies issues in Homeland Security
    and encourages research, particularly with
    university-industry cooperation
  • e.g., biometric and voice identification; fusing
    voice and face data; multimodal interfaces for
    asset deployment; face-tracking for
    identification; microphone array for speaker
    tracking

48
E-masters in Language and Speech
  • Course Content
  • Theoretical Linguistics
  • Natural Language Processing
  • Phonetics and Phonology
  • Cognitive models for speech and language processing
  • Speech signal processing
  • Pattern recognition
  • Language engineering applications
  • http://www.cstr.ed.ac.uk/euromasters/

49
Conclusions
  • collecting speech corpora in national languages
    (like in SA) is an excellent basis, both for
    research and for applications
  • combine industrial and academic skills
  • make proper use of experiences elsewhere
  • that's why we are all here at this workshop!
  • good luck and thank you for your attention