Fist page

1
Country Report: Vietnam
Luong Chi Mai
Institute of Information Technology, Vietnamese Academy of Science and Technology
lcmai_at_ioit.ac.vn
2
VLSP National Project (ICT Program)

National Project, 2006-2008 and 2008-2010, with the participation of ten research groups (all groups active in VLSP)
Objectives:
Basic research on methods for processing Vietnamese language and speech
Build and develop several typical VLSP products for public end-users
Build and develop indispensable resources and tools for VLSP development

3
Objective of the Project: Basic research

Basic research on methods for processing Vietnamese language and speech.
Applied research to adapt methods, technologies, and advanced techniques developed for other languages to Vietnamese language and speech.

Typical products for the end-users
Resources and tools for VLSP
Computation methods for VLSP
4
Phonetic Structure
[Figure: pitch contours (1) to (8) plotted against time]

(7) and (8) have F0 contours similar to (5) and (6), but rise and fall more sharply
(8) is not accompanied by glottalization

5
Some Current Text Corpora

Monolingual corpora: VLC (Vietnam Lexicography Centre), UNS-VNUHCM, etc. for Vietnamese
Bilingual corpora: the EVC corpus (UNS-VNUHCM) consists of 400,000 pairs of E-V sentences (approx. 5,500,000 words) in the fields of Science and Technology (Computer, Electronics, etc.). The EVC is being partially annotated, semi-automatically, with morphology (word boundaries, lemmas), POS and sense tags.

6
Some Current Speech Corpora

Broadcasting Speech Corpus: VOV (Voice of Vietnam)
Contains about 23,000 utterances and about 4,000 distinct syllables
30 broadcasters and speakers reciting stories, news reports, and colloquies
Data digitized at a 16,000 Hz sampling rate, using 16 bits per sample
All data were manually transcribed at the syllable level
At the phonetic level, the corpus contains all Vietnamese phonemes but is not phonetically balanced (up to 50% of its capacity comes from story-reading programs)
The number of speakers is limited and most speakers are from the North, so the corpus does not cover most variations of Vietnamese speech
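
The recording format stated above (16,000 Hz, 16 bits per sample) can be checked mechanically. The following is a minimal sketch using Python's standard `wave` module. It assumes the recordings are stored as PCM WAV files, which the slides do not state; the function names `matches_vov_format` and `make_silent_wav` are hypothetical and used only for illustration.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000   # sampling rate stated on the slide
EXPECTED_SAMPLE_BYTES = 2   # 16 bits per sample

def matches_vov_format(wav_file) -> bool:
    """Return True if a WAV file (path or file object) has the stated rate and bit depth."""
    with wave.open(wav_file, "rb") as w:
        return (w.getframerate() == EXPECTED_RATE_HZ
                and w.getsampwidth() == EXPECTED_SAMPLE_BYTES)

def make_silent_wav(rate_hz: int, sample_bytes: int, n_frames: int = 160) -> io.BytesIO:
    """Build a tiny in-memory mono WAV of zero bytes, used only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)              # mono is an assumption for this demo file
        w.setsampwidth(sample_bytes)
        w.setframerate(rate_hz)
        w.writeframes(b"\x00" * (n_frames * sample_bytes))
    buf.seek(0)
    return buf

if __name__ == "__main__":
    print(matches_vov_format(make_silent_wav(16_000, 2)))
```

In practice the same check could be run over every file in a corpus directory before training, so that files with a different rate or bit depth are caught early.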

7
Some Current Speech Corpora

Telephone Speech Corpus
Mobile phone: 170 speakers from the North (55% male, 45% female), 1600 digit strings
Cordless phone: 208 speakers from the South (130 males, 78 females), 442 utterances with 2340 words
Labeling at syllable level; phonetic-level labels produced by forced alignment with manual adjustment (using HTK and the CSLU Toolkit)
Develop a dialog system for continuous digit recognition and VnTTS for reading SMS on smartphones (Symbian).

8
Some Current Speech Corpora

Vietnamese TTS Speech corpus
One female voice reading a short story
567 utterances with an average length of 15 syllables, about 40,000 syllables
11 kHz sampling rate and 16-bit resolution
Corpus is labeled at the syllable level, with segment boundaries
Prosody detection (Fujisaki model); CART for manipulation of duration
Develop the Vietnamese TTS system VnVoice, based on PSOLA

9
Some Current Speech Corpora

VNSpeech corpus
5 different kinds of units: phonemes, tones, digits and strings of digits, application words, sentences and paragraphs
Text collected by a web robot (about 2,500 websites in Vietnam), with about 10,020,000 sentences
50 speakers
The sentence corpus is divided into two parts, a common part and a private part
The common part: 33 conversations and 37 paragraphs, read by all speakers.
The private part: about 2,000 short paragraphs; each speaker was asked to read 40 paragraphs.
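
The figures for the private part are mutually consistent: 50 speakers reading 40 paragraphs each gives 50 x 40 = 2,000 paragraphs. A minimal sketch of one way such a split could be made is shown below. It assumes each paragraph is read by exactly one speaker, which the slide implies but does not state; the function name `assign_private_paragraphs` is hypothetical.

```python
def assign_private_paragraphs(n_speakers: int = 50, per_speaker: int = 40) -> dict:
    """Give each speaker a disjoint, contiguous block of paragraph indices.

    Assumption: no paragraph of the private part is shared between speakers.
    """
    return {
        speaker: list(range(speaker * per_speaker, (speaker + 1) * per_speaker))
        for speaker in range(n_speakers)
    }

if __name__ == "__main__":
    plan = assign_private_paragraphs()
    total = sum(len(paragraphs) for paragraphs in plan.values())
    print(len(plan), "speakers,", total, "paragraphs in total")
```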

10
Some Current Speech Corpora

Distribution of mono-phones in speech corpus and
Web corpora

11
Some Current Speech Corpora

Distribution of six tones in speech corpus and
Web corpora
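
A tone distribution like the one compared above can be estimated from text alone, because Vietnamese orthography marks five of the six tones with a diacritic and leaves the level tone unmarked. The following is a minimal sketch, not the method used by the project: it assumes whitespace-separated syllables, and the function names `tone_of` and `tone_distribution` are hypothetical. Tone names are written without diacritics to keep the dictionary keys ASCII.

```python
import unicodedata
from collections import Counter

# Combining marks that encode the five marked tones in Vietnamese orthography.
TONE_MARKS = {
    "\u0300": "huyen",  # grave accent
    "\u0301": "sac",    # acute accent
    "\u0303": "nga",    # tilde
    "\u0309": "hoi",    # hook above
    "\u0323": "nang",   # dot below
}

def tone_of(syllable: str) -> str:
    """Return the tone of one written syllable; unmarked syllables are 'ngang'."""
    # NFD splits each accented letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"

def tone_distribution(text: str) -> dict:
    """Relative frequency of each tone over whitespace-separated syllables."""
    syllables = [s for s in text.split() if any(c.isalpha() for c in s)]
    counts = Counter(tone_of(s) for s in syllables)
    total = sum(counts.values())
    return {tone: n / total for tone, n in counts.items()} if total else {}

if __name__ == "__main__":
    print(tone_distribution("ma m\u00e0 m\u00e1 m\u1ea3 m\u00e3 m\u1ea1"))
```

Running the same function over speech transcripts and over web text would give two distributions that can be compared tone by tone.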

12
Design New Corpora

Main goal: design and realize corpora that are available
to provide Vietnamese researchers with a basic amount of speech material for general speech research, including speech synthesis and speech recognition
for developing commercial speech recognition engines for given purposes (number recognizer, limited command recognizer, name recognizer, ...)

13
Design of General Corpus

General Corpora
For general-purpose research on continuous, speaker-independent recognition with a large vocabulary
Text selected from the text corpus, and text selected by linguists
Number of speakers: 200-300 (50% male and 50% female), ages 15-45
Number of sentences: 300; each sentence is spoken by one speaker at least 3 times.
Size of vocabulary: 3000-4000 Vietnamese syllables
Requirement: context balance is obtained among Vietnamese phonemes
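
The slides do not say how the sentences are selected from the text corpus. One common heuristic for building a compact, well-covered prompt set is greedy selection, sketched below under strong simplifying assumptions: it maximizes coverage of distinct written syllables rather than phoneme-context balance, and the function name `greedy_select` is hypothetical.

```python
def greedy_select(sentences, n_select):
    """Pick sentences one at a time, each time taking the one that adds
    the most syllables not yet covered (ties go to the earliest sentence)."""
    covered, chosen = set(), []
    remaining = list(sentences)
    for _ in range(min(n_select, len(remaining))):
        best = max(remaining, key=lambda s: len(set(s.lower().split()) - covered))
        chosen.append(best)
        covered |= set(best.lower().split())
        remaining.remove(best)
    return chosen, covered

if __name__ == "__main__":
    pool = ["a b", "a b c d", "e f", "a"]   # toy stand-ins for real sentences
    chosen, covered = greedy_select(pool, 2)
    print(chosen, sorted(covered))
```

For a real design, the unit being covered would be replaced by whatever phonetic context the balance requirement is defined over.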

14
Design of Specific Corpus

Continuous Digit Corpus
Used to build recognition applications
Number of speakers: 100-200 (50% male and 50% female), ages 15-45
The occurrence counts of all digits should be approximately equal; the sentences consist of digits in random order and have varying lengths.
The number of words starts from 10,000; each sentence is recorded at least 3 times by each speaker.
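
The digit-string requirements above (equal digit counts, random order, varying lengths) can be met by shuffling a balanced pool of digits and cutting it into pieces of random length. This is a minimal sketch, not the project's procedure: the length range of 3 to 10 digits, the seed, and the function name `balanced_digit_strings` are assumptions made for illustration.

```python
import random
from collections import Counter

def balanced_digit_strings(repeats_per_digit=100, min_len=3, max_len=10, seed=0):
    """Return digit strings in which every digit occurs exactly repeats_per_digit times."""
    rng = random.Random(seed)
    # Balanced pool: each of the ten digits appears the same number of times.
    pool = [d for d in "0123456789" for _ in range(repeats_per_digit)]
    rng.shuffle(pool)
    strings, i = [], 0
    while i < len(pool):
        n = rng.randint(min_len, max_len)   # varying sentence length
        strings.append("".join(pool[i:i + n]))
        i += n                              # the final string may be shorter than min_len
    return strings

if __name__ == "__main__":
    prompts = balanced_digit_strings()
    print(len(prompts), "strings;", Counter("".join(prompts)))
```

With the default settings the pool holds 1,000 digits; ten times as many repeats per digit would reach the 10,000-word figure mentioned above.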
Name corpus
Popular Vietnamese names, including family name and first name.
Number of speakers: 100-200 (50% male and 50% female), ages 15-45
Each sentence consists of one full name and is spoken by a speaker at least 3 times.
The sentences should contain as many names of people in Vietnam as possible. The vocabulary size is estimated at about 2000 words.

15
Corpus Organization

The database consists of several sub-databases
1. /general: material for training and testing the general recognizer
2. /Digit: material for training and testing the continuous digit recognizer
3. /Name: material for training and testing the proper name recognizer
.... Other sub-databases to be inserted depending on the specific purposes
Each sub-database has the following directory hierarchy
1. / database root directory
2. /train to be used for system training
3. /test material to be used for system
testing
4. / doc online documentation and tables
unusual)
The train and test directories contain sub-directories corresponding to each speaker, whose names are coded as XXXSRR, where
XXX: speaker identifier
S: sex code (F for female, M for male)
RR: region code (HN for Hanoi, SG for Saigon, HU for Hue)
Each sentence directory contains 3 sentence-related files.
wav - wave file
phn - phonetically-based transcription
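
The XXXSRR naming scheme for speaker directories can be decoded with a short parser. The sketch below assumes that the speaker identifier XXX is exactly three letters or digits, which the slide does not specify; the names `SpeakerDir` and `parse_speaker_dir` are hypothetical.

```python
import re
from typing import NamedTuple

SEXES = {"F": "female", "M": "male"}
REGIONS = {"HN": "Hanoi", "SG": "Saigon", "HU": "Hue"}

# Assumption: XXX is three alphanumeric characters.
_PATTERN = re.compile(r"^([A-Za-z0-9]{3})([FM])(HN|SG|HU)$")

class SpeakerDir(NamedTuple):
    speaker_id: str
    sex: str
    region: str

def parse_speaker_dir(name: str) -> SpeakerDir:
    """Split a directory name of the form XXXSRR into its three fields."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"not a valid speaker directory name: {name!r}")
    speaker_id, sex, region = m.groups()
    return SpeakerDir(speaker_id, SEXES[sex], REGIONS[region])

if __name__ == "__main__":
    print(parse_speaker_dir("012FHN"))
```

Rejecting names that do not match the pattern makes it easier to spot misnamed directories when the train and test trees are scanned.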

16
Thank you!
