Title: Development and Operational Result of Real Environment Speech-oriented Guidance Systems Kita-robo and Kita-chan
1Development and Operational Result of Real
Environment Speech-oriented Guidance Systems
Kita-robo and Kita-chan
Hiromichi KAWANAMI, Tobias CINCAREK, Shota
TAKEUCHI, Hiroshi SARUWATARI, Kiyohiro SHIKANO
kawanami_at_is.naist.jp
Nara Institute of Science and Technology
2Outline
- Motivation and Goal
- Introduction of speech-oriented information guidance systems, Kita-chan and Kita-robo
- Kita-chan and Kita-robo user speech database
3Motivation and Goal
- Investigation of data portability
  - How much do the AM, LM, and example questions of an existing dialogue system contribute to a new dialogue system?
  - How much data of the new system must be transcribed and labeled manually, in addition to the above, to reach the response accuracy of the preceding system?
  - See Cincarek et al., Trans. IEICE (E) (to be published)
  - The result makes it possible to estimate the cost of developing a new dialogue system.
- Comparison of a CG agent system and a robot body
  - Which interface is used? By what age groups?
  - The result helps in designing appropriate interfaces.
4Outline
- Motivation and Goal
- Introduction of speech-oriented information guidance systems, Kita-chan and Kita-robo
  - The preceding system, Takemaru-kun
  - Kita-chan with CG agent and Kita-robo with robot body
  - System structures
    - Speech recognition module
    - Response generation module
- Kita-chan and Kita-robo database
5The preceding system, Takemaru-kun
- Location
  - Entrance of a public center
- Domain
  - Facilities of the center
  - Local information (the city, sightseeing, traffic, public institutions)
  - General (news, weather forecast, date, time)
  - Character profile, greetings
- Dialogue strategy
  - Example-based one-question-one-answer
- Interface
  - User input: speech, mouse
  - System response: synthetic speech, CG animation, web browser
- Operation period: Nov. 2002 to present
6Appearance of Takemaru-kun
The North Community Center, Ikoma city, Nara
CG agent animation
Web browser
directional microphone
mouse
speaker
Takemaru-kun is a mascot character of Ikoma city.
7Kita-chan and Kita-robo appearances
Gakken Kita-Ikoma railway station, Ikoma city, Nara
Kita-chan dialogue system
Kita-robo dialogue system
8Appearance of Kita-chan
speakers
directional microphone
Web browsers
CG agent animation
Touch panel display
Kita-chan is a mascot character of Gakken Kita-Ikoma station.
9Appearance of Kita-robo
speakers
(Video camera for speaker detection (planned))
Web browser
CG eyes animation
directional microphone
No mouse or touch panel
10System comparison
11Speech recognition module
(Flowchart; labels recovered from the slide)
- Mic. input
- Power and zero-crossing (ZC) threshold
- Speech / noise discrimination using GMMs (adult / child / laugh / cough / noise) and length (duration threshold)
  - if noise/cough/laugh, reject
  - if speech, continue decoding
- Parallel decoding of the system input
  - Using adult AM/LM (N-gram)
  - Using child AM/LM (N-gram)
  - Using adult AM/LM (described grammar)
  - Using child AM/LM (described grammar)
- AM likelihood comparison
- Likelihood threshold: reject
- Output: text, used decoder info.
- Response generation module (next slide)
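The front-end steps on this slide can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function names, the threshold values, the score dictionaries, and the rule of picking the decoder output with the highest AM likelihood among all four decoders are hypothetical and are not taken from the actual Kita-chan or Kita-robo implementation.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def passes_power_zc(frame, power_min=0.01, zc_min=2):
    """Power and ZC gate; both threshold values are assumed placeholders."""
    return frame_power(frame) >= power_min and zero_crossings(frame) >= zc_min


NON_SPEECH = {"laugh", "cough", "noise"}


def classify_input(gmm_scores, duration_s, min_duration_s=0.3):
    """Return the best-scoring GMM class, or None if the input is rejected.

    gmm_scores maps a class name (adult, child, laugh, cough, noise) to a
    log-likelihood. The duration threshold value is an assumption.
    """
    if duration_s < min_duration_s:
        return None
    best = max(gmm_scores, key=gmm_scores.get)
    return None if best in NON_SPEECH else best


def select_hypothesis(hypotheses, likelihood_min=-100.0):
    """Compare AM likelihoods of the parallel decoders and apply a threshold.

    Each hypothesis is a dict with hypothetical keys "decoder", "text",
    and "am_loglik". Returns None when the best hypothesis is rejected.
    """
    best = max(hypotheses, key=lambda h: h["am_loglik"])
    if best["am_loglik"] < likelihood_min:
        return None
    return best
```

In this sketch the returned dictionary carries both the recognized text and the decoder that produced it, which corresponds to the "text, used decoder info." handed to the response generation module.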
12Response generation module
(Flowchart; labels recovered from the slide)
- Input: text, decoder
- if decoded by described grammar
  - Generating surface sentence using rules (separated for adult and for child)
- if decoded by N-gram
  - Searching the most similar example question and outputting the corresponding response
  - if child: QADB (pairs of example question and system response) for child
  - if adult: QADB (pairs of example question and system response) for adult
- Output: response sentence, Web URL, animation
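The branching on this slide can be illustrated with a small Python sketch. The slides do not state which similarity measure is used to find the most similar example question, so the word-set Jaccard overlap below is an assumption chosen for brevity, as are all names, the whitespace tokenization (Japanese input would need a tokenizer first), and the example data.

```python
def similarity(a, b):
    """Jaccard overlap between word sets (an assumed measure, not the system's)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def respond(text, decoder, age_group, qadb, grammar_rules):
    """Choose a response for recognized text.

    decoder   -- "grammar" or "ngram" (hypothetical labels for the decoder used)
    age_group -- "adult" or "child", selecting the rule set or the QADB
    qadb      -- {age_group: {example question: response}}
    grammar_rules -- {age_group: function mapping text to a response}
    """
    if decoder == "grammar":
        # Rule-based surface sentence generation, separate for adult and child.
        return grammar_rules[age_group](text)
    # N-gram branch: return the response paired with the closest example question.
    examples = qadb[age_group]
    best_question = max(examples, key=lambda q: similarity(text, q))
    return examples[best_question]


# Hypothetical example data: each response holds a sentence, a URL, and an animation.
QADB = {
    "adult": {
        "where is the toilet": {"sentence": "The toilet is near the ticket gate.",
                                "url": None, "animation": "point"},
        "what time is it": {"sentence": "Let me check the time.",
                            "url": None, "animation": "clock"},
    },
    "child": {
        "what is your name": {"sentence": "I am Kita-chan!",
                              "url": None, "animation": "wave"},
    },
}
RULES = {
    "adult": lambda t: {"sentence": "You said: " + t, "url": None, "animation": "nod"},
    "child": lambda t: {"sentence": "You said " + t + "!", "url": None, "animation": "nod"},
}
```

For instance, `respond("where is the nearest toilet", "ngram", "adult", QADB, RULES)` returns the toilet entry because that example question shares the most words with the input.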
13From Takemaru-kun Video
- noise, cough, laugh rejection
- Adult/child discrimination
14Outline
- Motivation and Goal
- Introduction of speech-oriented information guidance systems, Kita-chan and Kita-robo
- Kita-chan and Kita-robo database
  - System inputs of 21 months (since Mar. 2006)
  - Eight-month database with manual transcription and labels
  - Preliminary analysis
15Database
- All system inputs to the two systems are recorded
  - Twenty-one months (at present)
  - Noise input is also preserved.
- Manual Database
  - Database of the first eight months from each system, with manual transcriptions and labels assigned by five labelers by listening
16Manual Database
- Waveform with speech/noise classification information
- If speech:
  - Transcription
    - Transcription
    - Pronunciation using Kana
    - Noise tag insertion to them (noise, background conversation, lack of initial part, overflow, etc.)
  - Valid / Invalid classification
    - Valid: utterance which intends to get a system response
    - Invalid: utterance which does not intend to get a system response
  - Label
    - Age group: pre-school / lower grade student / higher grade student / adult / elderly
    - Gender
    - All labels are given subjectively by listening.
  - (Appropriate system response for Valid utterance)
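One way to picture a single entry of the manual database is as a record type. The Python dataclass below is a hypothetical sketch: the field names and the validation rules are illustrative assumptions, and only the categories themselves (speech/noise, transcription, Kana pronunciation, noise tags, valid/invalid, age group, gender, appropriate response) come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

AGE_GROUPS = (
    "pre-school",
    "lower grade student",
    "higher grade student",
    "adult",
    "elderly",
)


@dataclass
class UtteranceRecord:
    """One recorded system input with its manual annotations (hypothetical layout)."""
    waveform_path: str
    is_speech: bool                                       # speech/noise classification
    transcription: Optional[str] = None                   # only for speech
    kana_pronunciation: Optional[str] = None              # only for speech
    noise_tags: List[str] = field(default_factory=list)   # e.g. "background conversation"
    is_valid: Optional[bool] = None                       # intends to get a system response?
    age_group: Optional[str] = None                       # subjective label
    gender: Optional[str] = None                          # subjective label
    appropriate_response: Optional[str] = None            # only for valid utterances

    def __post_init__(self):
        if self.age_group is not None and self.age_group not in AGE_GROUPS:
            raise ValueError("unknown age group: " + self.age_group)
        if not self.is_speech and self.transcription is not None:
            raise ValueError("noise inputs carry no transcription")
```

Keeping the speech-only fields optional lets noise inputs, which the systems also preserve, share the same record type.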
17Operational results from Database
Number of valid utterances input to Kita-chan.
7 months (2006/04 to 2006/07) total
- Valid input: total 14,682 utterances
- Invalid input: total 12,849 utterances
18Operational results from Database
Number of valid utterances input to Kita-robo.
7 months (2006/04 to 2006/07) total
- Valid input: total 27,397 utterances
- Invalid input: total 21,637 utterances
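The totals reported on the two operational-result slides can be compared with a few lines of arithmetic. The Python sketch below uses only the four counts given on the slides; the variable and function names are illustrative.

```python
# Valid and invalid input counts as reported on the slides.
counts = {
    "Kita-chan": {"valid": 14682, "invalid": 12849},
    "Kita-robo": {"valid": 27397, "invalid": 21637},
}


def summarize(c):
    """Return (total inputs, share of valid inputs) for one system."""
    total = c["valid"] + c["invalid"]
    return total, c["valid"] / total


for name, c in counts.items():
    total, valid_share = summarize(c)
    print(f"{name}: {total} inputs, {valid_share:.1%} valid")

# How many times more inputs did Kita-robo receive than Kita-chan?
valid_ratio = counts["Kita-robo"]["valid"] / counts["Kita-chan"]["valid"]
total_ratio = summarize(counts["Kita-robo"])[0] / summarize(counts["Kita-chan"])[0]
print(f"valid-input ratio: {valid_ratio:.2f}, total-input ratio: {total_ratio:.2f}")
```

With these counts, Kita-robo received roughly 1.9 times the valid inputs and roughly 1.8 times the total inputs of Kita-chan, and slightly more than half of the inputs to each system were valid.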
19Conclusion
- Introduction of Kita-chan and Kita-robo
  - Kita-chan with CG agent and Kita-robo with robot body
  - Real environment system at a railway station
- Database
  - System inputs of 21 months (since Mar. 2006)
  - Eight-month database with manual transcription and labels
- Operational result
  - The number of Kita-robo inputs is about twice that of Kita-chan.
20- Thank you for your attention.