Development and Operational Result of Real Environment Speech-oriented Guidance Systems Kita-robo and Kita-chan

1
Development and Operational Result of Real
Environment Speech-oriented Guidance Systems
Kita-robo and Kita-chan
Hiromichi KAWANAMI, Tobias CINCAREK, Shota
TAKEUCHI, Hiroshi SARUWATARI, Kiyohiro SHIKANO
kawanami_at_is.naist.jp
Nara Institute of Science and Technology
2
Outline

Motivation and Goal
Introduction of speech-oriented information
guidance systems, Kita-chan and Kita-robo
Kita-chan and Kita-robo user speech database

3
Motivation and Goal

Investigation of data portability
How much do the acoustic model (AM), language model (LM),
and example questions of an existing dialogue system
contribute to a new dialogue system?
How much data from the new system must be transcribed and
labeled manually in addition to the above (to reach the
response accuracy of the preceding system)?
See Cincarek et al., Trans. IEICE (E) (to be published).
The result makes it possible to estimate the cost of
developing a new dialogue system.
Comparison of CG agent system and Robot body
Which interface is used? By what age groups?
The result helps in designing appropriate
interfaces.

4
Outline

Motivation and Goal
Introduction of speech-oriented information
guidance systems, Kita-chan and Kita-robo
The preceding system, Takemaru-kun
Kita-chan with CG agent and Kita-robo with
robot-body
System structures
Speech recognition module
Response generation module
Kita-chan and Kita-robo database

5
The preceding system, Takemaru-kun

Location
Entrance of a public center
Domain
Facilities of the center
Local information (the city, sightseeing,
traffic, public institution)
General (News, weather forecast, date, time)
Character profile, Greetings
Dialogue strategy
Example-based one-question-one-answer
Interface
User input: speech, mouse
System response: synthetic speech, CG animation,
web browser
Operation period: Nov. 2002 to present

6
Appearance of Takemaru-kun
The North Community Center, Ikoma city, Nara
CG agent animation
Web browser
directional microphone
mouse
speaker
Takemaru-kun is a mascot character of Ikoma city.
7
Kita-chan and Kita-robo appearances
Gakken Kita-Ikoma railway station in Ikoma
city, Nara
Kita-chan dialogue system
Kita-robo dialogue system
8
Appearance of Kita-chan
speakers
directional microphone
Web browsers
CG agent animation
Touch panel display
Kita-chan is a mascot character of
Gakken-kita-ikoma station.
9
Appearance of Kita-robo
speakers
(Video camera for speaker detection (planned))
Web browser
CG eyes animation
directional microphone
No mouse or touch panel
10
System comparison
11
Speech recognition module
Mic. input
Power and zero-crossing (ZC) threshold
Duration threshold
Speech / noise discrimination using GMMs (adult / child /
laugh / cough / noise) and length
if noise/cough/laugh, reject
if speech, continue decoding
Parallel decoding
Using adult AM/LM (N-gram)
Using child AM/LM (N-gram)
Using adult AM/LM (described grammar)
Using child AM/LM (described grammar)
AM likelihood comparison
Likelihood threshold (reject)
System input: text, used decoder info.
Response generation module (next slide)
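
The rejection and decoder-selection steps on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Hypothesis`, `classify_input`, `recognize`, and `min_am_score` are hypothetical, the scores are treated as log-likelihoods where higher is better, and the exact point at which the real system applies its likelihood threshold is not given on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# GMM classes that lead to rejection, as listed on the slide.
NON_SPEECH = {"noise", "cough", "laugh"}


@dataclass
class Hypothesis:
    """Output of one of the parallel decoders (hypothetical structure)."""
    text: str
    speaker: str     # "adult" or "child": which AM/LM pair produced it
    lm: str          # "ngram" or "grammar"
    am_score: float  # acoustic log-likelihood, higher is better (assumption)


def classify_input(gmm_scores: Dict[str, float]) -> str:
    """Return the GMM class with the highest log-likelihood."""
    return max(gmm_scores, key=gmm_scores.get)


def recognize(gmm_scores: Dict[str, float],
              hypotheses: List[Hypothesis],
              min_am_score: float) -> Optional[Hypothesis]:
    """Reject non-speech, then keep the decoder output with the best AM score."""
    if classify_input(gmm_scores) in NON_SPEECH:
        return None  # noise / cough / laugh: reject
    if not hypotheses:
        return None
    best = max(hypotheses, key=lambda h: h.am_score)
    if best.am_score < min_am_score:
        return None  # below the likelihood threshold: reject
    return best      # text plus used-decoder info go to response generation
```

The returned hypothesis carries both the recognized text and which decoder produced it, matching the "text, used decoder info." handed to the response generation module.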
12
Response generation module
Input: text, decoder
if decoded by described grammar
Generating surface sentence using rules
(separated for adult and for child)
if decoded by N-gram
Searching the most similar example question and
outputting the corresponding response
if adult: QADB (pairs of example question and system
response) for adult
if child: QADB (pairs of example question and system
response) for child
Output: response sentence, Web URL, animation
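
As a rough illustration of the N-gram branch, the example-based lookup might look like the sketch below. The slide does not say how similarity is measured, so word-overlap (Jaccard) similarity is swapped in here purely as an assumption; `overlap`, `respond`, and the sample question-answer pairs are hypothetical and are not taken from the actual QADB.

```python
from typing import Dict, List, Tuple

QAPair = Tuple[str, str]  # (example question, system response)


def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two sentences."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def respond(text: str, speaker: str, qadb: Dict[str, List[QAPair]]) -> str:
    """Return the response paired with the most similar example question.

    `speaker` ("adult" or "child") selects which QADB is searched.
    """
    pairs = qadb[speaker]
    _, best_answer = max(pairs, key=lambda qa: overlap(text, qa[0]))
    return best_answer


# Made-up QADB contents, for illustration only.
qadb = {
    "adult": [
        ("where is the restroom", "The restroom is next to the ticket gate."),
        ("what time is it", "Please look at the clock on the screen."),
    ],
    "child": [
        ("what is your name", "My name is Kita-chan!"),
    ],
}
```

Calling `respond("where is restroom", "adult", qadb)` picks the restroom answer, because that example question shares the most words with the input. Keeping one QADB per speaker group mirrors the slide's separate adult and child databases.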
13
From Takemaru-kun Video

noise, cough, laugh rejection
Adult/child discrimination

14
Outline

Motivation and Goal
Introduction of speech-oriented information
guidance systems, Kita-chan and Kita-robo
Kita-chan and Kita-robo database
System inputs of 21 months (since Mar. 2006)
Eight months database with manual transcription
and label
Preliminary analysis

15
Database

All system inputs to the two systems are recorded
Twenty-one months (at present)
Noise input is also preserved.
Manual Database
Database of the first eight months from each system,
with manual transcriptions and labels assigned by ear
by 5 labelers

16
Manual Database

Waveform with speech/noise classification
information
if speech,
Transcription
Transcription
Pronunciation using Kana
Noise tags inserted into both
noise, background conversation, missing initial
part, overflow, etc.
Valid / Invalid classification
Valid: utterance intended to get a system
response
Invalid: utterance not intended to get a
system response
Label
Age group
Pre-school / lower grade student / higher grade
student / adult / elderly
Gender
All labels are given subjectively by hearing.
(Appropriate system response for Valid utterance)
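
One way to picture a single entry of the manual database is the record sketched below. The class name `UtteranceRecord`, its field names, and the consistency checks are assumptions made for illustration; the slide lists only the kinds of information stored, not a file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Age groups as listed on the slide.
AGE_GROUPS = ("pre-school", "lower grade student", "higher grade student",
              "adult", "elderly")


@dataclass
class UtteranceRecord:
    """One recorded system input with its manual annotation (hypothetical)."""
    wav_path: str                        # waveform file
    is_speech: bool                      # speech / noise classification
    transcription: Optional[str] = None  # only for speech inputs
    kana: Optional[str] = None           # pronunciation in Kana
    noise_tags: List[str] = field(default_factory=list)
    valid: Optional[bool] = None         # intended to get a system response?
    age_group: Optional[str] = None      # subjective label, assigned by ear
    gender: Optional[str] = None         # subjective label, assigned by ear

    def __post_init__(self) -> None:
        if self.age_group is not None and self.age_group not in AGE_GROUPS:
            raise ValueError(f"unknown age group: {self.age_group}")
        if not self.is_speech and self.transcription is not None:
            raise ValueError("only speech inputs carry a transcription")
```

Leaving the annotation fields optional reflects that noise inputs are preserved without transcriptions or speaker labels.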

17
Operational results from Database
Number of valid utterances input to Kita-chan.
7 months (2006/04 to 2006/07) total
Valid input: total 14,682 utterances
Invalid input: total 12,849 utterances
18
Operational results from Database
Number of valid utterances input to Kita-robo.
7 months (2006/04 to 2006/07) total
Valid input: total 27,397 utterances
Invalid input: total 21,637 utterances
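
The totals on the two result slides can be turned into a share of valid inputs per system with a few lines of arithmetic. The helper name `valid_share` is hypothetical; the counts are the ones stated on the slides, and the percentages are derived here rather than reported in the presentation.

```python
def valid_share(valid: int, invalid: int) -> float:
    """Fraction of recorded inputs that were labeled as valid utterances."""
    return valid / (valid + invalid)


# (valid, invalid) totals as given on the slides.
counts = {
    "Kita-chan": (14682, 12849),
    "Kita-robo": (27397, 21637),
}

for name, (valid, invalid) in counts.items():
    total = valid + invalid
    print(f"{name}: {total} inputs, {valid_share(valid, invalid):.1%} valid")
# Kita-chan: 27531 inputs, 53.3% valid
# Kita-robo: 49034 inputs, 55.9% valid
```

On these figures, a little over half of the recorded inputs to each system were labeled valid.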
19
Conclusion