Title: First page


1
Country Report: Vietnam
Luong Chi Mai
Institute of Information Technology, Vietnamese Academy of
Science and Technology
lcmai_at_ioit.ac.vn
2
VLSP National Project (ICT Program)
  • National Project 2006-2008, 2008-2010, with the
    participation of ten research groups (all active
    groups on VLSP)
  • Objectives:
  • Basic research on methods for processing
    Vietnamese language and speech
  • Build and develop several typical products for
    VLSP for public end-users.
  • Build and develop indispensable resources and
    tools for the VLSP development

3
Objective of the Project: Basic research
  • Basic research on methods for processing
    Vietnamese language and speech.
  • Applied research to adapt methods, technologies, and
    advanced techniques developed for other languages to
    Vietnamese language and speech.

[Diagram labels: Typical products for the end-users; Resources and tools for VLSP; Computation methods for VLSP]
4
Phonetic Structure
[Figure: pitch versus time, with contours labeled (1) to (8)]
  • (7) and (8) have F0 contour similar to (5) and
    (6), but rise and fall more sharply
  • (8) is not accompanied by glottalization

5
Some Current Text Corpora
  • Monolingual corpora: VLC (Vietnam Lexicography
    Centre), UNS-VNUHCM, etc. for Vietnamese
  • Bilingual corpora: the EVC corpus (UNS-VNUHCM)
    consists of 400,000 pairs of E-V sentences
    (approx. 5,500,000 words) in the fields of
    science and technology (computers,
    electronics, ...). The EVC is being partially
    annotated semi-automatically with morphology
    (word boundaries, lemmas), POS, and sense tags.

6
Some Current Speech Corpora
  • Broadcasting speech corpus: VOV (Voice of Vietnam)
    contains approx. 23,000 utterances and approx. 4,000
    distinct syllables
  • 30 broadcasters and speakers reciting stories,
    news reports, colloquy
  • Data digitized at a 16,000 Hz sampling rate, using
    16 bits per sample
  • All data were manually transcribed at the syllable
    level
  • At the phonetic level: the corpus contains all
    Vietnamese phonemes, but is not phonetically
    balanced (up to 50% of its capacity comes from
    story reading programs)
  • The number of speakers is limited and most
    speakers are from the North, so the corpus does
    not cover most variations of Vietnamese speech
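
The digitization settings listed above (16,000 Hz, 16 bits per sample) can be checked mechanically on any recording. The following is a minimal sketch using Python's standard `wave` module. The function names `matches_vov_format` and `pcm_bytes_per_second` are hypothetical, the mono default is an assumption the slide does not state, and nothing here is claimed to be part of the corpus tooling.

```python
import wave

EXPECTED_RATE_HZ = 16000    # sampling rate stated on the slide
EXPECTED_SAMPLE_BYTES = 2   # 16 bits per sample

def matches_vov_format(path):
    """Return True if the WAV file uses 16 kHz, 16-bit samples."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE_HZ
                and wav.getsampwidth() == EXPECTED_SAMPLE_BYTES)

def pcm_bytes_per_second(rate_hz=EXPECTED_RATE_HZ,
                         sample_bytes=EXPECTED_SAMPLE_BYTES,
                         channels=1):
    """Uncompressed PCM data rate. Mono (channels=1) is an assumption."""
    return rate_hz * sample_bytes * channels
```

Under these assumptions, one second of audio takes 16,000 × 2 = 32,000 bytes, which gives a rough way to estimate storage for a given amount of recorded speech.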

7
Some Current Speech Corpora
  • Telephone speech corpus
  • Mobile phone: 170 speakers from the North (55% male,
    45% female), 1,600 digit strings
  • Cordless phone: 208 speakers from the South (130
    males, 78 females), 442 utterances with 2,340 words
  • Labeling at syllable level, labeled manually at
    phonetic level, using forced alignment with
    manual adjustment (using HTK and the CSLU Toolkit)
  • Applications: a dialog system for continuous digit
    recognition, and VnTTS for reading SMS on
    smartphones (Symbian).

8
Some Current Speech Corpora
  • Vietnamese TTS speech corpus
  • One female voice, from a short story
  • 567 utterances with an average length of 15
    syllables, about 40,000 syllables
  • 11 kHz sampling rate and 16-bit resolution
  • Corpus is labeled at the syllable level, with
    segment boundaries
  • Prosody detection (Fujisaki model); CART for
    manipulation of duration
  • Used to develop the Vietnamese TTS system VnVoice,
    based on PSOLA

9
Some Current Speech Corpora
  • VNSpeech corpus
  • 5 different kinds of units: phonemes; tones;
    digits and strings of digits; application words;
    sentences and paragraphs
  • Text collected using a web robot (about 2,500
    websites in Vietnam), with about 10,020,000
    sentences.
  • 50 speakers
  • The sentence corpus is divided into two parts, a
    common part and a private part
  • The common part: 33 conversations and 37
    paragraphs, read by all speakers.
  • The private part: about 2,000 short paragraphs;
    each speaker was asked to read 40 paragraphs.
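
The private-part figures are consistent with each other: 50 speakers reading 40 paragraphs each gives 50 × 40 = 2,000 paragraphs. A minimal sketch of one possible assignment follows. It assumes each paragraph is read by exactly one speaker, which the slide does not state, and the function name `assign_private_paragraphs` is hypothetical.

```python
def assign_private_paragraphs(num_speakers=50, per_speaker=40):
    """Give each speaker a disjoint, contiguous block of paragraph indices.

    Disjoint blocks are an assumption; the source only says that each
    speaker reads 40 of about 2,000 paragraphs.
    """
    return {
        speaker: list(range(speaker * per_speaker,
                            (speaker + 1) * per_speaker))
        for speaker in range(num_speakers)
    }

assignment = assign_private_paragraphs()
total_paragraphs = sum(len(block) for block in assignment.values())
```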

10
Some Current Speech Corpora
  • Distribution of mono-phones in speech corpus and
    Web corpora

11
Some Current Speech Corpora
  • Distribution of six tones in speech corpus and
    Web corpora
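
For the text side of such a comparison, a tone distribution can be estimated from orthography, because five of the six Vietnamese tones are written with a diacritic and the level tone is unmarked. The sketch below is one way to do this and is not claimed to be the authors' method. It assumes that whitespace-separated tokens are syllables, and the names `tone_of` and `tone_distribution` are hypothetical.

```python
import unicodedata
from collections import Counter

# Combining marks that encode tone in Vietnamese orthography.
# Marks such as the circumflex, breve, and horn change vowel quality,
# not tone, so they are deliberately absent from this table.
TONE_MARKS = {
    "\u0301": "sac",    # acute
    "\u0300": "huyen",  # grave
    "\u0309": "hoi",    # hook above
    "\u0303": "nga",    # tilde
    "\u0323": "nang",   # dot below
}

def tone_of(syllable):
    """Return the tone name of one syllable ('ngang' if unmarked)."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"

def tone_distribution(text):
    """Relative frequency of each tone over whitespace-separated tokens."""
    counts = Counter(tone_of(token) for token in text.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tone: n / total for tone, n in counts.items()}
```

Running `tone_distribution` on the transcripts of a speech corpus and on a web text corpus yields two distributions that can be compared tone by tone. Punctuation and non-Vietnamese tokens are counted as unmarked syllables in this sketch, so real text would need cleaning first.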

12
Design New Corpora
  • Main goal: design and realize corpora that are
    available
  • to provide Vietnamese researchers with a
    basic amount of speech material for general
    speech research, including speech synthesis and
    speech recognition
  • for developing commercial speech recognition
    engines for given purposes (number recognizer,
    limited command recognizer, name recognizer, ...)

13
Design of General Corpus
  • General corpora
  • For general-purpose research on continuous,
    speaker-independent recognition with a large
    vocabulary
  • Text selected from the text corpus, and text
    selected by linguists
  • Number of speakers: 200-300 (50% male and 50%
    female), ages 15-45
  • Number of sentences: 300; each sentence
    is spoken by one speaker at least 3 times.
  • Size of vocabulary: 3,000-4,000 Vietnamese
    syllables
  • Requirement: context balance is obtained among
    Vietnamese phonemes

14
Design of Specific Corpus
  • Continuous digit corpus
  • Used to build recognition applications
  • Number of speakers: 100-200 (50% male and 50%
    female), ages 15-45
  • Occurrence of digits should be approximately the
    same for all digits; the sentences consist of
    digits in random order and have varying
    lengths.
  • The number of words will be 10,000 or more; each
    sentence is recorded at least 3 times for each
    speaker.
  • Name corpus
  • Popular Vietnamese names, including family name
    and first name.
  • Number of speakers: 100-200 (50% male and 50%
    female), ages 15-45
  • Each sentence consists of one full name and is
    spoken by a speaker at least 3 times.
  • The sentences should contain as many names of
    people in Vietnam as possible. The size of the
    vocabulary is estimated at about 2,000 words.
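
The continuous digit corpus requirements (roughly equal occurrence of every digit, random order, varying sentence lengths) can be met by construction. The following is a minimal sketch under stated assumptions: the length bounds of 3 to 8 digits, the fixed seed, and the name `make_digit_prompts` are all hypothetical, and the slides do not describe how the actual prompts were generated.

```python
import random
from collections import Counter

def make_digit_prompts(total_digits, min_len=3, max_len=8, seed=0):
    """Build digit-string prompts whose digit counts differ by at most 1."""
    rng = random.Random(seed)
    # Concatenate shuffled copies of 0-9, so every digit appears equally
    # often (up to one extra occurrence if the last block is truncated).
    pool = []
    while len(pool) < total_digits:
        block = list("0123456789")
        rng.shuffle(block)
        pool.extend(block)
    pool = pool[:total_digits]
    # Cut the pool into prompts of random length. The final prompt may be
    # shorter than min_len if the pool runs out.
    prompts, start = [], 0
    while start < len(pool):
        length = rng.randint(min_len, max_len)
        prompts.append(" ".join(pool[start:start + length]))
        start += length
    return prompts

prompts = make_digit_prompts(10000)
digit_counts = Counter(d for p in prompts for d in p.split())
```

One consequence of this design is that a digit can only repeat immediately across a block boundary, so the order is random but not fully independent. If repeated digits matter for the recognizer, independent sampling with a post-hoc balance check is an alternative.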

15
Corpus Organization
  • The database consists of several sub-databases:
  • 1. /general: material for training and testing the
    general recognizer
  • 2. /Digit: material for training and testing the
    continuous digit recognizer
  • 3. /Name: material for training and testing the
    proper name recognizer
  • ....: other sub-databases to be inserted
    depending on the specific purposes
  • Each sub-database has the following directory
    hierarchy:
  • 1. /: database root directory
  • 2. /train: material to be used for system training
  • 3. /test: material to be used for system
    testing
  • 4. /doc: online documentation and tables
    unusual)
  • The train and test directories contain
    sub-directories corresponding to each speaker,
    whose names are coded as XXXSRR, where
  • XXX: speaker identifier
  • S: sex code (F for female, M for male)
  • RR: region code (HN for Hanoi, SG for Saigon,
    HU for Hue).
  • Each sentence directory contains 3
    sentence-related files.
  • wav: wave file
  • phn: phonetically-based transcription
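
The XXXSRR speaker directory code described above is easy to parse and validate. The following is a minimal sketch. The slide does not say what characters the speaker identifier XXX may contain, so three alphanumeric characters are assumed here, and the names `SpeakerDir` and `parse_speaker_dir` are hypothetical.

```python
import re
from typing import NamedTuple

SEX_CODES = {"F": "female", "M": "male"}
REGION_CODES = {"HN": "Hanoi", "SG": "Saigon", "HU": "Hue"}

class SpeakerDir(NamedTuple):
    speaker_id: str
    sex: str
    region: str

# XXX (assumed: 3 alphanumeric characters), S (F or M), RR (HN, SG, or HU).
_SPEAKER_DIR = re.compile(
    r"(?P<id>[A-Za-z0-9]{3})(?P<sex>[FM])(?P<region>HN|SG|HU)"
)

def parse_speaker_dir(name):
    """Decode a directory name such as '001FHN' into its three fields."""
    match = _SPEAKER_DIR.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid XXXSRR speaker directory: {name!r}")
    return SpeakerDir(
        speaker_id=match["id"],
        sex=SEX_CODES[match["sex"]],
        region=REGION_CODES[match["region"]],
    )
```

For example, `parse_speaker_dir("001FHN")` returns the fields `("001", "female", "Hanoi")`, while a name with an unknown region code raises `ValueError`.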

16
Thank you !