Provided by: even165
Learn more at: http://www.lrec-conf.org
1
RUNDKAST: An Annotated Norwegian Broadcast News
Speech Corpus
  • LREC 2008
  • Ingunn Amdal, Ole Morten Strand,
  • Jørn Almberg, and Torbjørn Svendsen

2
Overview
  • Purpose of Rundkast
  • An overview of the database Rundkast
  • Structure of annotation
  • Orthographic transcription
  • Broad phonetic annotation

3
Purpose of Rundkast
  • Databases of broadcast news can be used for a
    number of research topics in speech technology
    such as:
  • Supplement to existing databases of read speech
    for training and testing automatic speech
    recognition and speaker adaptation.
  • Research on recognition of spontaneous speech.
  • Research on automatic indexing of audio data.
  • Research on topic and/or speaker segmentation.
  • Research on speech/non-speech detection (e.g.
    background music).
  • International research cooperation involving
    speech technology for broadcast news
    applications.
  • A corpus of this kind is necessary for language
    technology research, but has not been available
    for Norwegian.

4
Overview of Rundkast
http://www.iet.ntnu.no/projects/rundkast/
  • Database of 77 hours of radio broadcast news
    from the Norwegian Broadcasting Corporation
    (NRK)
  • Read and spontaneous speech, as well as
    spontaneous dialogs and multi-party discussions
  • There is large variation between speakers,
    speaking styles and topics
  • Speaker turns may be rapid and several speakers
    may talk simultaneously
  • The recordings include both studio and
    telephone quality (mobile, satellite, etc.)
  • Frequent occurrences of background noise,
    jingles, music and audio illustrations
  • Funded by the Norwegian University of Science and
    Technology (NTNU)

5
Structure of annotation
  • Rundkast is hierarchically organized and
    orthographically annotated
  • Name of programme, type and date
  • Name of speaker (if known) and dialect (5
    regions)
  • Type of speech: spontaneity, channel, recording
    quality
  • Segmented in speaker turns of approx. 2-5 seconds
  • Orthographic transcription (standard Norwegian)
  • Labels for noise (speaker noise, background noise
    etc.)
  • Labels for pronunciation mistakes, foreign words,
    unintelligible speech etc.
  • 70 hours of work per hour of recording
  • Transcriber, a standard tool, was used for
    annotation
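
The hierarchy described on this slide can be sketched as nested records. This is only an illustration: the class and field names below are hypothetical and do not reflect the actual Transcriber file format or the corpus's own schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    text: str      # orthographic transcription, including any labels


@dataclass
class SpeakerTurn:
    speaker: Optional[str]   # None when the speaker is unknown
    dialect: Optional[str]   # one of 5 regions; the region names are not given here
    spontaneity: str
    channel: str
    quality: str
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Section:
    turns: List[SpeakerTurn] = field(default_factory=list)


@dataclass
class Programme:
    name: str
    programme_type: str
    date: str
    sections: List[Section] = field(default_factory=list)

    def duration(self) -> float:
        """Total transcribed time in seconds, summed over all segments."""
        return sum(seg.end - seg.start
                   for sec in self.sections
                   for turn in sec.turns
                   for seg in turn.segments)
```

Keeping speaker and channel information on the turn rather than on each segment mirrors the slide: those properties are listed per speaker, while the transcription itself sits at the lowest level.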

6
Hierarchy of annotation levels
Levels: 1 section, 2 speaker turn, and 3 segment
7
Orthographic transcription
  • Segments, the lowest level in the annotation
    hierarchy, are transcribed orthographically.
  • Orthographic transcription of spoken language is
    a challenge, especially for Norwegian: using
    dialect is increasingly accepted, even in
    official settings.
  • Most of the speech in RUNDKAST does not conform
    to any standard pronunciation.
  • The aim of the conventions for the orthographic
    transcription in RUNDKAST is to minimize
    uncertainty about pronunciations and facilitate
    consistency.

8
Orthographic transcription: Main conventions
  • Words are transcribed with the written forms
    closest to actual pronunciations. A limited
    number of interjections are allowed.
  • Text codes are used to mark mispronunciations,
    truncations, and unknown words.
  • Numbers and symbols are written out as words.
  • Abbreviations are not used.
  • Punctuation marks are restricted to comma,
    period, and question mark.
  • Space is used between spelled letters, also when
    acronyms have spelled pronunciation.
  • Capital letters are used in proper names,
    spellings, and acronyms, but not at the start of
    sentences.
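
Two of these conventions (numbers written out as words, punctuation limited to comma, period, and question mark) lend themselves to an automatic check. The sketch below is a hypothetical helper, not part of the RUNDKAST tooling. It assumes that text codes have already been stripped from the segment, since their syntax is not given here, and it treats every non-alphanumeric character outside the three allowed marks as a violation.

```python
import re
from typing import List

ALLOWED_PUNCTUATION = {",", ".", "?"}


def check_segment(text: str) -> List[str]:
    """Return a list of convention violations found in one segment."""
    problems = []
    # Numbers and symbols should be written out as words.
    if re.search(r"\d", text):
        problems.append("digits found: numbers should be written out as words")
    # Anything that is not a letter, digit, whitespace or allowed mark is flagged.
    bad = sorted({ch for ch in text
                  if not ch.isalnum() and not ch.isspace()
                  and ch not in ALLOWED_PUNCTUATION})
    if bad:
        problems.append("disallowed characters: " + " ".join(bad))
    return problems
```

A segment such as "N R K sender nyheter klokka sju." passes, while "klokka 7:30!" is reported for both the digits and the characters "!" and ":".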

9
Example annotation in Transcriber
10
Broad phonetic annotation
  • Part of the data were to be phonetically
    annotated
  • Use for low-level experiments in ASR (new
    methods), smaller Norwegian counterpart to TIMIT
  • Auto-segmentation for e.g. unit selection TTS
  • Annotation to be based on existing standards
    with necessary adjustments
  • Exploit experience and specifications from
    development of Norwegian speech synthesis
    databases
  • Suitable level of detail: acoustic boundaries
    should be labeled, but the annotation is more
    phonemic than phonetic
  • Consistency of utmost importance!

11
Broad phonetic annotation: Selected data
  • 10 speakers (5 male and 5 female)
  • Amount of speech per speaker:
  • approx. 5 min planned speech and 1 min
    spontaneous speech
  • discard noisy parts (as far as possible)
  • from more than one programme
  • use turn segmentation from orthographic
    annotation
  • All in all 1 hour of speech
  • Approximately 1000 hours of work
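
The figures on this slide are consistent with each other, as a quick calculation shows. The comparison with the orthographic effort uses the 70 hours of work per hour of recording quoted earlier; the variable names are only illustrative.

```python
speakers = 10
planned_min, spontaneous_min = 5, 1   # approximate per-speaker amounts

# 10 speakers x 6 minutes = 60 minutes, i.e. the "1 hour of speech"
total_hours = speakers * (planned_min + spontaneous_min) / 60

phonetic_work_per_hour = 1000 / total_hours   # hours of work per hour of speech
orthographic_work_per_hour = 70               # from the annotation structure slide

ratio = phonetic_work_per_hour / orthographic_work_per_hour
print(f"{total_hours:.0f} h of speech, phonetic annotation is about "
      f"{ratio:.0f} times as costly per hour as orthographic annotation")
```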

12
Broad phonetic annotation: Main principles
  • The annotation is mainly phonemic using the
    phoneme symbols closest to the perceived sound
  • Acoustic boundaries should be marked; some
    acoustically motivated symbols are included
  • A transcription as close as possible to the
    citation form is preferred
  • Norwegian standard SAMPA is preferred
  • Some English phonemes included as well as dialect
    variants
  • Example: 3 variants of the /r/-sound: /r/
    (tap/trill), /R/ (uvular fricative), /r\/
    (approximant)
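
Because SAMPA-style symbols can be more than one character long (such as r\ above), a label string has to be split with longest-match-first logic. The sketch below covers only the three /r/ variants named on the slide; the function name and the idea of unspaced label strings are assumptions made for illustration.

```python
# Subset only: the three /r/ variants from the slide, with the slide's own
# descriptions. A real inventory would list every symbol in the annotation set.
R_VARIANTS = {
    "r": "tap/trill",
    "R": "uvular fricative",
    "r\\": "approximant",   # the two-character symbol r\
}


def tokenize(symbols, inventory=R_VARIANTS):
    """Split an unspaced symbol string, trying the longest symbols first."""
    ordered = sorted(inventory, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(symbols):
        for sym in ordered:
            if symbols.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            raise ValueError(f"unknown symbol at position {i}: {symbols[i]!r}")
    return tokens
```

Trying r\ before r matters: with the order reversed, the approximant would be read as a tap/trill followed by a stray backslash.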

13
Broad phonetic annotation: Annotation procedure
  1. Conversion of orthographic transcription to a
    format suitable for automatic transcription.
  2. Automatic segmentation with a phonotypical
    transcription using a speech recognizer.
  3. Manual correction of both segments and labels by
    four phonetics students using Praat.
  4. Format check.
  5. Control of all annotation by one supervisor.
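
What the format check in step 4 covers is not specified here. As a minimal sketch of what such a check might look at, the hypothetical function below takes one tier as a list of (start, end, label) tuples and reports empty or reversed intervals, labels outside the allowed symbol set, and gaps or overlaps between neighbouring intervals. Reading the tier out of a Praat file is left out.

```python
from typing import List, Set, Tuple

Interval = Tuple[float, float, str]   # (start, end, label), times in seconds


def format_check(tier: List[Interval], allowed: Set[str],
                 tol: float = 1e-6) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for i, (start, end, label) in enumerate(tier):
        if end <= start:
            errors.append(f"interval {i}: non-positive duration")
        if label not in allowed:
            errors.append(f"interval {i}: unknown label {label!r}")
        # Each interval should start where the previous one ended.
        if i > 0 and abs(start - tier[i - 1][1]) > tol:
            errors.append(f"interval {i}: gap or overlap with previous interval")
    return errors
```

Returning a list of messages rather than stopping at the first problem lets an annotator fix everything in a file in one pass before the supervisor's control in step 5.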

14
Broad phonetic annotation: Comments on deviations
  • Always cases of uncertainty, need a log for
    these.
  • Problem: will the log be read?
  • Solution: codes for deviations!
  • Additional Praat tier for deviations
  • Synchronous with the phoneme tier
  • Easy to utilize automatically
  • Examples:
  • creaky voice
  • unexpected voiced/unvoiced
  • uncertain boundary or symbol
  • ... in addition, a log file for whatever
    deviations are left
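
A deviation tier that is synchronous with the phoneme tier is easy to use automatically because the two can be walked in parallel. The sketch below assumes both tiers are already loaded as lists of (start, end, label) tuples with identical boundaries, and that an empty label on the deviation tier means "no deviation". The code strings in the usage example are placeholders, not the corpus's actual codes.

```python
def flagged_phonemes(phoneme_tier, deviation_tier):
    """Pair the two tiers and return the phonemes that carry a deviation code."""
    if len(phoneme_tier) != len(deviation_tier):
        raise ValueError("tiers are not synchronous: different interval counts")
    flagged = []
    for (p_start, p_end, phone), (d_start, d_end, code) in zip(
            phoneme_tier, deviation_tier):
        if (p_start, p_end) != (d_start, d_end):
            raise ValueError(f"boundary mismatch at {p_start}-{p_end}")
        if code:   # empty string: nothing to report for this phoneme
            flagged.append((p_start, p_end, phone, code))
    return flagged


# Usage with made-up data
phones = [(0.0, 0.1, "h"), (0.1, 0.3, "A"), (0.3, 0.4, "r")]
codes = [(0.0, 0.1, ""), (0.1, 0.3, "creaky"), (0.3, 0.4, "")]
print(flagged_phonemes(phones, codes))
```

The same pairing could be used to exclude flagged phonemes from training data or to collect them for closer study.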

15
Example annotation in Praat
16
Concluding remarks
  • Availability
  • Planned to be included for non-commercial use in
    a future Norwegian language bank
  • Will complement other corpora also intended to be
    included
  • To be validated by Spex
  • Planned use at NTNU: SIRKUS project
  • Investigation of new paradigms for ASR
  • Low-level phone recognition experiments initially
  • Multilinguality aspects
  • Spoken information retrieval