Title: Development of conversational interfaces at Nokia Research Center
1. Development of conversational interfaces at Nokia Research Center
- Boda Péter Pál
- peter.boda_at_nokia.com
- Language Technology and Applications, Voice Interfaces Group
- Speech and Audio Systems Laboratory
- Nokia Research Center
- 14 October, 2002
2. Contents
- Background
- personal
- Language Technology and Applications group at NRC
- A commercial implementation: Nokia One Voice Service
- Overview of CATCH-2004 multilingual conversational interface
- Demos
- Summary
3. Personal background
- Born in 1965, Miskolc, Hungary
- M.Sc. in Telecommunications, 1991, Budapest, Tech. Univ. of Budapest
- Post-graduate studies: TUB 1991-1994, HUT 1992-1994, Nijmegen 1995
- Lic. Tech., Speech Technology and Neural Networks, 1995, Helsinki, HUT
- Working on
- speech analysis 1990-1995
- speech recognition 1995-1997
- spoken dialogue systems, language technology 1996-
- Interest
- Natural Language Understanding (semantic decoding)
- Dialogue Management
- Processing multimodal and contextual input
4. Language Technology and Applications
- Mission: develop language technology for Nokia's offering
- Dialogue-based application development for telecommunication (mainly network-based implementations)
- Seamless integration of Natural Language Understanding technology into user interfaces
- Covering the entire development process
- conceptual design
- data collection and analysis
- grammar building and tuning, NLU training and testing
- Wizard-of-Oz experiments
- type-in and speech-enabled tests
- objective and subjective evaluation
- human factors consideration, usability studies
Personnel: a diverse team of linguists, software engineers and telecom engineers
5. What will the new generation of speech interfaces bring?
- Enhanced usability
- - naturalness in terms of linguistic expressions
- - ease of use
- - human-human like dialogues
- - accelerated system-user interactions
- Well-defined framework to port to other languages and tasks
- - end-to-end solutions (design, data collection, Wizard-of-Oz studies, implementation, test, assessment)
- - shortened development cycle (development tools).
6. A commercial implementation: Nokia One Voice Service
- http://www.nokia.com/nokiaone
7. Nokia One Voice Service
8. Speech interface for e-mail reading
- Features
- DTMF and speech access (language of the user interface is English)
- dialogue-based implementation with a task grammar of medium complexity
- functionalities
- browsing e-mails
- selecting for reading
- send in SMS
- reply with voice clip
- accurate language identification
- text-to-speech (TTS) for several languages when reading back e-mails
- English, Finnish, Italian, French, German, Spanish
- e-mail preprocessors prior to TTS
9. Some general comments
- Before implementing any speech interface
- think about its role: replacement or addition?
- if addition, how will it help or complement the current user interface?
- is there any real added value it can bring? acceleration, security?
- think carefully about the effort you need to develop a solution
- amount and ratio of research and implementation
- never underestimate the results of user/usability tests: go for real
- TTS is important: users comment primarily on that and not on the recognition part. TTS can mean language technology, as well.
10. An EU project: CATCH-2004 (Converse in AThens-2004, Cologne, Helsinki)
11. A multi-multi-multi project ...
Jan 2000 - June 2002: 30 months
7 partners, 5 countries
603 person-months, 6.5 M (3.25 from EC)
2 demonstrators (Athens, Helsinki), 1 tester (Cologne)
16 deliverables, 11 milestones
12. Consortium
Finland
France, Germany, Greece, Czech Republic
Germany
Greece
Gerhard-Mercator Universität Duisburg
NTUA
13. Overview
- The "flag-ship" of the 5th EU-IST programme
- Objectives
- conversational interface to (city) information services: build various applications, possessing high performance accuracy and satisfying requirements set for well-functioning spoken dialogue systems
- multilingual (Finnish, English, German, Greek)
- multidevice (kiosk, phone, smart wireless)
- multimodal (GUI, speech)
- Internet infrastructure (WAP, VoiceXML, remote databases)
- Nokia's role
- WAP access
- Multimodal browsing
- NLU development for Helsinki demonstrator
- Helsinki demos
- 2000: Art-Goes-Kapakka - just to experiment with the NLU toolkit
- 2001: Program Guide Information Service - has relevance to other projects
14. Inside the NLU module
[Figure: system components: speech recognition, Natural Language Understanding (NLU) incl. Dialogue Manager, database, speech synthesis]
15. What does the NLU module do?
(1) Interprets the meaning of the user utterance and decides what to do with the utterance.
(2) Interacts with the backend database.
(3) Decides what kind of answer will be provided.
- The NLU toolkit employed in CATCH-2004
- IBM ViaVoicePhone Telephony Natural Language Tools
- Statistical approach
- The speaker is not restricted to any particular
vocabulary or commands but can freely express the
request by using natural language expressions.
16. The components of the NLU module
- The NLU module contains four main components.
- Input (output of the recogniser): sequence of words, as the LM allows.
- Statistical Classer: extracts the key concepts of the utterance.
- Statistical Parser: determines what to do with the key concepts from the classer.
- Canonicalizer: transforms certain concepts to a form which is understood by the backend database.
- Dialog Manager: directs the interaction between the user and the system.
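The four components can be pictured as a simple chain. The sketch below is only an illustration of that data flow, not the IBM toolkit: the keyword tables and rules are hypothetical stand-ins for the statistical classer and parser, and every name in it (`classer`, `parser`, `canonicalizer`, `DialogManager`, `understand`) is an assumption made for the example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword tables; the real classer and parser are statistical models.
CONCEPT_PATTERNS = {
    "PROGRAM_TYPE": r"\b(movies?|news|sports?)\b",
    "DATE": r"\b(tonight|today|tomorrow)\b",
    "CHANNEL": r"\b(bbc world|cnn|eurosport|tcm)\b",
}

# Hypothetical mapping from surface forms to values a backend database might expect.
CANONICAL_VALUES = {"movies": "movie", "sports": "sport", "bbc world": "BBC_WORLD"}


def classer(words):
    """Tag the key concepts found in the recognised word sequence."""
    text = " ".join(words).lower()
    concepts = {}
    for name, pattern in CONCEPT_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            concepts[name] = match.group(1)
    return concepts


def parser(words, concepts):
    """Decide what to do with the concepts delivered by the classer."""
    text = " ".join(words).lower()
    if "how long" in text or "duration" in text:
        return "QUERY_DURATION"
    if concepts:
        return "LIST"
    return "UNKNOWN"


def canonicalizer(concepts):
    """Rewrite concept values into the form used by the backend."""
    return {name: CANONICAL_VALUES.get(value, value) for name, value in concepts.items()}


@dataclass
class DialogManager:
    """Keeps the interaction history and chooses the next system move."""
    history: list = field(default_factory=list)

    def next_move(self, action, slots):
        self.history.append((action, slots))
        if action == "UNKNOWN":
            return "Sorry, could you rephrase that?"
        return f"{action} with {slots}"


def understand(words, dm):
    """Run one recognised utterance through classer, parser, canonicalizer and dialog manager."""
    concepts = classer(words)
    action = parser(words, concepts)
    slots = canonicalizer(concepts)
    return dm.next_move(action, slots)
```

Calling `understand("could you please tell me about movies tonight".split(), DialogManager())` sends the utterance through all four stages in order; in a statistical system each stage would be a trained model rather than a lookup table.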
17. Multilingual Architecture
- Speech recognition: multilingual AM, multilingual LM/Voc
- NLU: multilingual classer, multilingual parser (Lang ID), canonicalizer, dialog manager
- Multilingual TASK
- Answer generation (language-dependent TTS)
LM = language model, Voc = vocabulary, AM = acoustic models, TTS = text-to-speech, Lang ID = language identification
18. Historically speaking ...
- Helsinki demos
- 2000: Art-Goes-Kapakka - just to experiment with the NLU toolkit
- 2001: Program Guide Information Service - more realistic
- AGK
- developed as the first NLU application at Nokia
- good exercise to walk through (with sweat) the entire development process
- close co-operation with IBM, regular consulting
- results were comparable to others
- easiness: manageable size and complexity, (almost) available database
- PGIS
- we wanted a more real-life application
- Electronic Program Guides are coming into use as digital TV spreads
- on-going standardisation (MPEG-7 -> program types and sub-types)
19. Supported functionalities in PGIS
- A LIST based on the following parameters: DATE, PROGRAM NAME, PROGRAM TYPE, TIME, LANGUAGE, PERFORMER, CHANNEL, PRICE, NEW
- A QUERY about a particular program: DATE, YEAR, PRICE, TIME, COUNTRY OF ORIGIN, RESTRICTIONS, DURATION, EPISODE TITLE, DESCRIPTION, CHANNEL, WEB ADDRESS, PERFORMERS, RE-RUN, LANGUAGE, PEOPLE BEHIND THE PROGRAM, PROGRAM TYPE, SUBTITLES
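The two request types on this slide can be illustrated with a small in-memory guide. This is a minimal sketch under stated assumptions: the `Program` record, the sample entries, and the functions `list_programs` and `query_program` are all hypothetical, and only a few of the parameters named above are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    """Hypothetical guide record covering a subset of the slide's parameters."""
    name: str
    program_type: str
    channel: str
    date: str      # ISO date string
    time: str      # 24-hour "HH:MM"
    duration: int  # minutes


# Made-up sample data, for illustration only.
GUIDE = [
    Program("Evening Film", "movie", "TCM", "2002-10-14", "21:00", 100),
    Program("World News", "news", "BBC World", "2002-10-14", "22:00", 5),
    Program("Morning Film", "movie", "TCM", "2002-10-15", "09:00", 85),
]


def list_programs(guide, **params):
    """LIST: return the programs whose fields equal every given parameter."""
    unknown = set(params) - set(Program.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return [p for p in guide if all(getattr(p, k) == v for k, v in params.items())]


def query_program(guide, name, attribute):
    """QUERY: return one attribute of the named program, or None if it is not in the guide."""
    for p in guide:
        if p.name == name:
            return getattr(p, attribute)
    return None
```

A LIST request combines any number of parameters, for example `list_programs(GUIDE, program_type="movie", date="2002-10-14")`, while a QUERY asks for a single attribute of one program, for example `query_program(GUIDE, "World News", "duration")`.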
20. Gaining better performance via iterative tests
[Figure: error rate of the English system across test iterations; automatic speech recognition shown as word error rate]
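Word error rate, the recognition measure named on this slide, is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below shows that standard calculation with a word-level edit distance; the function name `word_error_rate` is an assumption for the example, and the slides do not say which scoring tool the project used.

```python
def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, scoring the hypothesis "what time is first one" against the reference "what time is the first one" counts one deleted word out of six reference words.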
21. A real (!!!!) dialogue with PGIS
- S: Welcome to Program Guide. What would you like to know?
- U: Could you please tell me about movies tonight?
- S: I found the following movie programs on Monday between 6 pm and 12 pm: It's Always Fair Weather, The Murder Men, Boom Town, and The Hook.
- U: What time is the first one?
- S: It's Always Fair Weather starts at 5 past 5 pm. How else can I help you?
- U: How long does it last?
- S: It's Always Fair Weather lasts 1 hours 40 minutes. What information can I offer you next?
- U: Who are the actors?
- S: It's Always Fair Weather. The actors are Gene Kelly, Dan Dailey, Michael Kidd, Cyd Charisse. How else can I help you?
- U: Can I have a description for it please?
- S: I can offer you the following type of description about the program called It's Always Fair Weather. The unbeatable musical team of Kelly and Donen is behind this zappy musical about three serviceman buddies reunited after 10 years... What else can I do for you?
- U: That's all for now. Good-bye.
- S: Thank you for using the Program Guide. Good-bye.
22. Welcome to Program Guide! How may I help you?
Movies tonight?
NEW CONTEXT!
Michael Douglas is in Coma
Channels?
I found the following programs
Movies with Michael Douglas?
BBC World, CNN, Eurosport, TCM
... starting time of the 1st?
NEW CONTEXT!
NEW CONTEXT!
What's on BBC World tonight at 10pm?
sorry, no programs for youngsters (in Finnish)
it starts at 5.15pm
World News at 10pm
... duration? (in Finnish)
Programs for youngsters? (in Finnish)
... duration? (in Finnish)
it is 1h 25min long (in Finnish)
NEW CONTEXT!
it takes 5 minutes (in Finnish)
... description?
What kind of info can I offer next? (in Finnish)
That's all for now.
I can offer the following description ... To text message?
no text message, thanks. (in Finnish)
Good bye!
23. What lessons have we learnt?
- In general
- A research project has its own difficulties: risk must be taken, but within limits
- Know your partners and their capabilities, and take the initiative in co-operation
- Strong dependency on one partner's technology might be problematic
- About technology
- Good to have linguists around, although many of the development phases require engineering skills
- Everything should be planned as precisely as possible, even tests and evaluation methods
- The best results are gained with successive test-evaluation-improvement cycles
- This kind of technology is quite new, so users often don't know the possibilities of the system; therefore the instructions must be very guiding and clear
- difficult if only a demo system is available, with a fake database and without a comparable traditional system
- test users must be rewarded: very crucial, otherwise no motivation
- The real picture of system functionality and operability can be gained only from real users in real situations.
24. Finally ...
- Gábor Dénes (1969)
- "If enough people work hard enough on the problem
of speech recognition, it will be solved by mid
next century."
25. References
- http://www.nokia.com/nokiaone
- Oria, D. and Koskinen, E., E-Mail Goes Mobile: The design and implementation of a spoken language interface to e-mail. ICSLP 2002.
- http://www.catch2004.org/
- Harrikari, H., M. Mast, T. Ross and H. Schulz 2002, Different Approaches to Build Multilingual Conversational Systems. 5th International Conference on Text, Speech and Dialogue, TSD 2002, Brno, Czech Republic.
- Kleindienst J., L. Seredi, P. Kapanen and J. Bergman 2002a, CATCH-2004 Multi-Modal Browser: Overview Description with Usability Analysis. IEEE 4th International Conference on Multi-modal Interfaces, Pittsburgh, PA, U.S.A.
- Kleindienst J., L. Seredi, P. Kapanen and J. Bergman 2002b, Loosely-coupled approach towards multi-modal browsing. Submitted to Universal Access in Information Society magazine's special issue on Multi-modal User Interfaces.
- Boda, P. et al., Subjective Evaluation of a Personalised Conversational Interface to a Program Guide Information System. Submitted to the User Modeling and User-Adapted Interaction journal (UMUAI), Special Issue on User Modeling and Personalization for Television.
26. Abbreviations
- AM: acoustic model
- ASR: automatic speech recognition
- CTI: computer-telephone integration
- DM: dialogue manager
- LM: language model
- NLU: natural language understanding
- SUI: speech user interface
- TTS: text-to-speech synthesis
- VVT: ViaVoice Telephony (IBM's speech resources)
- WOZ: wizard of Oz