1
Wolfgang Hess 60 years young
Speech is beautiful
  • Louis C.W. Pols
  • Institute of Phonetic Sciences
  • University of Amsterdam

Bonn, Sept. 29, 2000
2
IKP, Bonn
IFA, Amsterdam
3
Speech is beautiful
  • most natural form of communication
  • it is efficient
  • highly complex and challenging
  • towards multi- and interdisciplinary communities
  • natural speech synthesis → full knowledge
  • ASR: a lasting challenge
  • speech is extremely robust to distortions
  • speech is eloquent (singing speeches are awful)
  • speech community is nice

4
robustness to degraded speech
  • partly reversed speech
  • (Saberi & Perrott, Nature, 4/99)
  • fixed-duration segments time-reversed or
    shifted in time
  • perfect sentence intelligibility for segments up
    to 50 ms
  • (demo: every 50 ms reversed; original)
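The manipulation on this slide (cutting speech into fixed-duration segments and playing each one backwards while keeping the segments in their original order) can be sketched in a few lines. This is a minimal illustration, assuming a mono NumPy array of samples at a known sample rate; the function name `locally_reverse` and the choice to also reverse a trailing partial segment are our own assumptions, not details taken from Saberi & Perrott.

```python
import numpy as np


def locally_reverse(signal: np.ndarray, sample_rate: int, segment_ms: float = 50.0) -> np.ndarray:
    """Time-reverse every fixed-duration segment of `signal` (hypothetical helper).

    Segments stay at their original position in time; only the samples
    inside each segment are reversed. A trailing partial segment is
    reversed as well (an assumption made for this sketch).
    """
    # Number of samples per segment, e.g. 50 ms at 16 kHz -> 800 samples.
    seg_len = int(round(sample_rate * segment_ms / 1000.0))
    if seg_len < 1:
        raise ValueError("segment_ms is too short for this sample rate")
    out = signal.copy()
    for start in range(0, len(signal), seg_len):
        # Reverse the samples of one segment and write them back in place.
        out[start:start + seg_len] = signal[start:start + seg_len][::-1]
    return out
```

At 16 kHz a 50 ms segment is 800 samples; since reversing a segment twice restores it, applying the function twice with the same settings returns the original signal, which makes it easy to check.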

5
Wolfgang
  • engineer by training
  • emphasis on signal processing (Munich)
  • pitch-synchronous spectral analysis
  • applied to phoneme and word recognition
  • and to voice detection and pitch extraction
  • speech synthesis (Bonn)

6
History, almost 30 yrs ago
  • 7th International Congress on Acoustics 1971,
    Budapest, Hungary
  • first international (speech) conference
  • Satellite Speech Symposium, Szeged
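  • Hess: Grundfrequenzsynchrone digitale
    Spektralanalyse von Sprachsignalen mit beliebig
    feiner Auflösung im Frequenzbereich
    (pitch-synchronous digital spectral analysis of
    speech signals with arbitrarily fine resolution in
    the frequency domain)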
  • also papers in German, and even in Russian
  • engineering interest in speech analysis
  • forthcoming specialization in speech recognition
    and pitch extraction

7
Budapest ICA
  • many influential people from international speech
    science community, already present there
  • topics at that time far away from our present
    interests in almost every respect
  • topics and ambitions
  • approaches taken
  • type and size of data sets
  • see some names and topics (nostalgia!)

8
speech processing
  • Velichko (Russia): dynamic programming
  • Bishnu Atal: towards predictive coding
  • Sakoe (Japan): dynamic programming for time
    normalization
  • Osamu Fujimura:
    - dynamic palatography,
    - electromyography (hooked-wire electrodes),
    - computer-controlled dynamic radiography
      (Tokyo x-ray microbeam generator)
  • Jim Flanagan: focal points in speech
    communication research

9
speech synthesis
  • Cecil Coker: articulatory synthesis
  • Paul Mermelstein & Bishnu Atal: vocal transfer
    functions for speech synthesis
  • Johan Liljencrants: formant synthesis (OVE III)
  • Helmut Mangold: synthesis with a limited set of
    dynamic transitions
  • Werner Endres: synthesis via intermediate sounds
  • Peter Denes: word concatenation
  • Fujimura, Coker & Umeda: prosody in synthesis
  • Larry Rabiner: 2-pole digital filters for
    synthesis
  • we were away a year ago

10
speech recognition
  • Hans Tillmann (abs.): Bonn DAWID-II system
  • Kasya, Kido, Krause & Tarnóczy: vowel recognition
  • Velichko: 60 words
  • Rao: 225 VCV utterances, diad matching
  • Sakoe: 2300 isolated utterances of the 10
    Japanese digits
  • Dreyfus-Graf: artificial language
  • Erman: 54 isolated words over telephone
  • Neeley: 54 words, recognition in noise
  • Pols: 50 Dutch words, stationary phoneme parts
  • Renato de Mori: zero crossings

11
speech perception, musical acoustics, psychoacoustics
  • Rao: plosive-vowel interaction
  • Kozhevnikov: AM vowel-like stimuli
  • Ludmilla Chistovich: vowel discrimination
  • Johan Sundberg: pitch extraction of folk music
  • Max Mathews: music synthesis
  • Tammo Houtgast: lateral inhibition in
    psychoacoustics
  • Evans & Wilson: neurophysiological evidence
  • Béla Julesz: critical bands in vision and
    audition
  • Egbert de Boer: reverse-correlation method

12
Wolfgang's further career
  • Dissertation in 1972:
  • Digitale grundfrequenzsynchrone Analyse von
    Sprachsignalen als Teil eines automatischen
    Spracherkennungssystems (digital pitch-synchronous
    analysis of speech signals as part of an automatic
    speech recognition system)
  • Masterpiece in 1983: the 698-page book
  • Pitch Determination of Speech Signals:
    Algorithms and Devices, published by
    Springer-Verlag
  • Chair in Phonetics in Bonn in 1986
  • publications, keynotes, conference organizer

13
ESCA/ISCA and Eurospeech
  • ESCA founded in 1988
  • Joseph Mariani first president (1988-1993)
  • Louis Pols 2nd president (1993-1997)
  • Wolfgang gave the final keynote at Eurospeech '97
    in Rhodes
  • since Sept. 1997 Roger Moore president
  • since the death of Christian Benoît (April 25,
    1998), Wolfgang has been secretary of ESCA
  • since Eurospeech '99 in Budapest, ESCA has
    become ISCA

14
ICA 1971
  • all speech analysis based on filters or formants
  • LPC was about to be introduced
  • all synthesis based on formant synthesis
  • diphone concept did not yet exist
  • virtually no attention to TTS synthesis-by-rule
  • all speech recognition based on word-template
    matching
  • probabilistic approach not yet known
  • vocabulary size on the order of 50 words only

15
present-day synthesis
  • mainly corpus-based concatenative synthesis with
    non-uniform units (e.g., CHATR, Festival,
    Next-Gen, Laureate, Bonner system)
  • large storage, optimal search
  • high naturalness and intelligibility
  • but: one speaker, one style, one application
  • room for further improvement

16
possible improvements
  • general or application-specific corpus
  • how to reduce storage requirements
  • annotation details at various levels
  • optimize search algorithms and cost functions
  • fewer prototypes, generate certain variants
  • preferable units, fall-back mechanism
  • new voice, speaking style, emotion, rate
  • can voice be personalized (cont.)

17
possible improvements (cont.)
  • how much manipulation in concatenation
  • combining stored speech and synthetic speech
  • better prosody (copy, concept, rules)
  • intonation modelling (discrete or continuous;
    detailed or sparse; signal-oriented or
    linguistically meaningful)
  • concept for duration modelling
  • sentence accent and prominence

18
presently not very popular
  • formant synthesis (but see MITalk)
  • diphone and demisyllable synthesis (but see many
    operational systems: Dutch Fluency, German
    Hadifix, multilingual Lucent TTS)
  • use of forms of parameterized speech (as soon as
    more manipulation is required again)
  • many voices, speaking styles, emotions, rates
  • importance of system evaluation (Jenolan Caves)

19
future for Wolfgang
  • being in the midst of new and challenging
    developments
  • to produce
  • (in the most efficient way)
  • the highest achievable quality of synthetic
    speech
  • (given specific dialogue applications)
  • is a large responsibility
  • but also a lot of fun to do (cont.)

20
future for Wolfgang (cont.)
  • Wolfgang and the IKP group enjoy doing this
  • for German and other languages
  • and like to report about it at international
    forums
  • it attracts many good students
  • these are excellent conditions for continuing
    this work

I wish Wolfgang and all his colleagues a lot of
success in the years to come!