Recent Activities for Speech I/O Assessment and Speech Database in Korea, Yong-Ju Lee

1
Recent Activities for Speech I/O Assessment and Speech Database in Korea
Yong-Ju Lee (Director, Prof., SiTEC, Wonkwang University, Korea)
2
  • Creation and Distribution of Language
    Resources for Common Use
  • SiTEC (Speech Information Technology and Industry
    Promotion Center)
  • ETRI (Electronics and Telecommunications Research
    Institute)

3
SiTEC(1)
  • Dialogue corpus
  • A large speech corpus based on a large set of
    dialogue scenarios, created by SiTEC in cooperation
    with ETRI for the TV guide and telematics domains.
  • 597 dialogue scenarios were written, taking
    dialogue complexity into account, for the TV
    guide, weather, schedule, and telematics domains.
  • The dialogue corpus also contains alternative
    natural dialogues whose expressions are
    equivalent in meaning to those in the
    corresponding dialogues.
  • 500 of the 597 scenarios are tagged with
    information on pragmatics, concept, domain,
    sentence regularization, discourse, and frame.
  • A dialogue speech corpus of 32,000 utterances by
    100 speakers has been recorded according to
    these scenarios.
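
The slide does not show how the tagged scenarios are stored, so the following is only a minimal sketch of one way such a record could be modelled. The class name, field names, layer keys, and the round-robin domain assignment are all assumptions made for illustration; only the six annotation layers, the four domains, and the counts (597 scenarios, 500 tagged) come from the slide.

```python
from dataclasses import dataclass, field

# The six annotation layers named on the slide. The key spellings are
# this sketch's own choice, not the corpus's actual label set.
ANNOTATION_LAYERS = (
    "pragmatics", "concept", "domain",
    "sentence_regularization", "discourse", "frame",
)

# The four domains named on the slide (key spellings are hypothetical).
DOMAINS = ("tv_guide", "weather", "schedule", "telematics")


@dataclass
class Scenario:
    """Hypothetical record for one dialogue scenario."""
    scenario_id: int
    domain: str
    # layer name -> tag value; left empty for untagged scenarios
    tags: dict = field(default_factory=dict)

    def is_fully_tagged(self) -> bool:
        """True when every annotation layer has a tag."""
        return all(layer in self.tags for layer in ANNOTATION_LAYERS)


def tagged_fraction(scenarios):
    """Share of scenarios that carry all six annotation layers."""
    if not scenarios:
        return 0.0
    return sum(s.is_fully_tagged() for s in scenarios) / len(scenarios)


# Placeholder data: 597 scenarios, the first 500 tagged. The per-domain
# split is not given on the slide, so domains are assigned round-robin.
scenarios = [
    Scenario(
        scenario_id=i,
        domain=DOMAINS[i % len(DOMAINS)],
        tags={layer: "placeholder" for layer in ANNOTATION_LAYERS} if i < 500 else {},
    )
    for i in range(597)
]

coverage = tagged_fraction(scenarios)   # 500 / 597
utterances_per_speaker = 32_000 / 100   # average from the slide's totals
```

With the slide's figures, tagging coverage is about 84% of scenarios, and the speech corpus averages 320 utterances per speaker.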

4
SiTEC(2)
  • LILA project
  • Creation of a number of spoken databases for
    training automatic speech recognition systems in
    the Asia-Pacific area.
  • Speech data are collected through the mobile
    telephone network.
  • The LILA consortium is composed of a large number
    of industrial companies. Each company is in
    charge of producing one database, and the
    consortium shares the databases produced in the
    project.
  • ELRA is the key organization of this project.
  • Together with HMT (a company in Korea affiliated
    with SiTEC), SiTEC is creating a Korean speech
    corpus for the LILA consortium's project, with
    assistance from ELRA.
  • For this project, each of 1,000 speakers utters
    60 words and sentences over digital telephone
    lines. The project will be completed at the end
    of this year.

5
  • Speech corpus for robot application
  • To support speech recognition in the robot, a
    corpus of distant speech, recorded at varying
    distances and directions, has been created using
    a multi-channel microphone array and a HAT (Head
    and Torso simulator).
  • 400 speakers were recorded.

6
ETRI
  • ETRI has been working continuously on speech and
    text corpora as part of its work on information
    resources management.
  • This year ETRI has worked on creating speech
    corpora of foreign languages, such as English,
    Japanese, and Chinese, recorded in in-car and
    multi-channel environments. Each corpus
    contains 50,000 utterances.

7
Speech I/O assessment
  • Researchers at ETRI and SiTEC have just started
    studying procedures for objectively assessing
    the performance of speech recognition in the
    robot and of dialogue-style TTS.
  • We hope to report the details at the next
    meeting.