Development of conversational interfaces at Nokia Research Center

NOKIA NRC language technology course slides, 14.10.2002, Boda Péter

1
Development of conversational interfaces at
Nokia Research Center
  • Boda Péter Pál
  • peter.boda_at_nokia.com
  • Language Technology Applications, Voice
    Interfaces Group
  • Speech and Audio Systems Laboratory
  • Nokia Research Center
  • 14 October, 2002

2
Contents
  • Background
  • - personal
  • - Language Technology and Applications group at NRC
  • A commercial implementation: Nokia One Voice
    Service
  • Overview of CATCH-2004 multilingual
    conversational interface
  • Demos
  • Summary

3
Personal background
  • Born in 1965, Miskolc, Hungary
  • M.Sc. in Telecommunications, 1991, Budapest,
    Tech. Univ. of Budapest
  • Post-graduate studies: TUB 1991-1994, HUT
    1992-1994, Nijmegen 1995
  • Lic. Tech. in Speech Technology and Neural
    Networks, 1995, Helsinki, HUT
  • Working on
  • - speech analysis, 1990-1995
  • - speech recognition, 1995-1997
  • - spoken dialogue systems and language
    technology, 1996-
  • Interests
  • - natural language understanding (semantic
    decoding)
  • - dialogue management
  • - processing multimodal and contextual input

4
Language Technology and Applications
  • Mission: develop language technology for Nokia's
    offering
  • Dialogue-based application development for
    telecommunications (mainly network-based
    implementations)
  • Seamless integration of Natural Language
    Understanding technology into user interfaces
  • Covering the entire development process
  • - conceptual design
  • - data collection and analysis
  • - grammar building and tuning, NLU training
  • - testing
  • - Wizard-of-Oz experiments
  • - type-in and speech-enabled tests
  • - objective and subjective evaluation
  • - human factors considerations, usability studies

Personnel: a diverse team of linguists, software
engineers and telecom engineers
5
What will the new generation of speech interfaces
bring?
  • Enhanced usability
  • - naturalness in terms of linguistic expressions
  • - ease of use
  • - human-human-like dialogues
  • - accelerated system-user interactions
  • Well-defined framework for porting to other
    languages and tasks
  • - end-to-end solutions (design, data collection,
    Wizard-of-Oz studies, implementation, test,
    assessment)
  • - shortened development cycle (development tools)

6
A commercial implementation: Nokia One Voice
Service
  • http://www.nokia.com/nokiaone

7
Nokia One Voice Service
8
Speech interface for e-mail reading
  • Features
  • - DTMF and speech access (the user interface
    language is English)
  • - dialogue-based implementation with a
    medium-complexity task grammar
  • Functionalities
  • - browsing e-mails
  • - selecting e-mails for reading
  • - sending as SMS
  • - replying with a voice clip
  • Accurate language identification
  • Text-to-speech (TTS) in several languages when
    reading back e-mails: English, Finnish, Italian,
    French, German, Spanish
  • E-mail preprocessing prior to TTS
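The slide does not say what the e-mail preprocessors actually do. The sketch below is a guess at typical clean-up before TTS and should be read that way. It assumes three steps: skip quoted reply lines, stop at the conventional "-- " signature separator, and make URLs and addresses speakable. The function names and rules are hypothetical.

```python
import re

# Hypothetical clean-up rules; the slide does not specify what the real preprocessors do.
URL_PATTERN = re.compile(r"https?://\S+")
ADDRESS_PATTERN = re.compile(r"\b([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)\b")


def speak_address(match):
    """Turn 'anna@nokia.com' into 'anna at nokia dot com'."""
    user, domain = match.group(1), match.group(2)
    return f"{user} at {domain.replace('.', ' dot ')}"


def preprocess_for_tts(body: str) -> str:
    """Prepare a plain-text e-mail body so that TTS reads only the useful part."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped == "--":          # signature separator ("-- "): stop reading here
            break
        if stripped.startswith(">"):  # quoted text from an earlier message: skip it
            continue
        if stripped:
            kept.append(stripped)
    text = " ".join(kept)
    text = URL_PATTERN.sub("a web link", text)  # avoid spelling out URLs symbol by symbol
    return ADDRESS_PATTERN.sub(speak_address, text)


body = "Hi Peter,\n> old quoted text\nSee http://example.com or mail anna@nokia.com\n-- \nAnna"
print(preprocess_for_tts(body))  # Hi Peter, See a web link or mail anna at nokia dot com
```

A real system would also need per-language rules, because the reading language is chosen by language identification.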

9
Some general comments
  • Before implementing any speech interface
  • - think about its role: replacement or addition?
  • - if addition, how will it help or complement
    the current user interface?
  • - is there any real added value it can bring,
    e.g. acceleration or security?
  • - think carefully about the effort needed to
    develop a solution
  • - consider the amount and ratio of research and
    implementation
  • Never underestimate the results of
    user/usability tests; go for real
  • TTS is important: users comment primarily on it,
    not on the recognition part. TTS can involve
    language technology as well.

10
An EU project: CATCH-2004, Converse in
AThens-2004, Cologne, Helsinki
  • http://www.catch2004.org/

11
A multi-multi-multi project...
  • Jan 2000 - June 2002: 30 months
  • 7 partners, 5 countries
  • 603 person-months, 6.5 M (3.25 M from EC)
  • 2 demonstrators (Athens, Helsinki), 1 tester
    (Cologne)
  • 16 deliverables, 11 milestones
12
Consortium
(partner logos; surviving labels: Finland; France, Germany, Greece, Czech Republic; Germany; Greece; Gerhard-Mercator Universität Duisburg; NTUA)
13
Overview
  • The "flag-ship" of the 5th EU-IST programme
  • Objectives
  • - a conversational interface to (city) information
    services: build various applications with high
    accuracy that satisfy the requirements set for
    well-functioning spoken dialogue systems
  • - multilingual (Finnish, English, German, Greek)
  • - multidevice (kiosk, phone, smart wireless)
  • - multimodal (GUI, speech)
  • - Internet infrastructure (WAP, VoiceXML, remote
    databases)
  • Nokia's role
  • - WAP access
  • - multimodal browsing
  • - NLU development for the Helsinki demonstrator
  • Helsinki demos
  • - 2000: Art-Goes-Kapakka, just to experiment with
    the NLU toolkit
  • - 2001: Program Guide Information Service, with
    relevance to other projects

14
Inside the NLU module
(diagram components: Speech recognition; Natural Language Understanding (NLU), incl. Dialogue Manager; Database; Speech synthesis)
15
What does NLU module do?
(1) Interprets the meaning of the user utterance
and decides what to do with it.
(2) Interacts with the backend database.
(3) Decides what kind of answer will be provided.
  • The NLU toolkit employed in CATCH-2004
  • IBM ViaVoicePhone Telephony Natural Language
    Tools
  • Statistical approach
  • The speaker is not restricted to any particular
    vocabulary or commands but can freely express the
    request by using natural language expressions.

16
The components of NLU module
  • The NLU module contains four main components. Its
    input is the output of the recogniser: a
    sequence of words, as the LM allows.
  • Statistical Classer: extracts the key concepts of
    the utterance.
  • Canonicalizer: transforms certain concepts to a
    form which is understood by the backend database.
  • Statistical Parser: determines what to do with
    the key concepts from the classer.
  • Dialog Manager: directs the interaction between
    the user and the system.
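The slide names the four components and their duties but not how they are implemented, and the real toolkit is statistical. The sketch below makes two assumptions. It runs the components in a classer → parser → canonicalizer → dialog manager order, and it replaces the statistical models with hand-written keyword rules. It only shows how the responsibilities divide up. All function names, concept labels and the toy database are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical toy backend; the slides do not describe the real PGIS database.
PROGRAMS = [
    {"title": "Example Movie", "type": "movie", "start": "20:30", "duration_min": 95},
    {"title": "Late News", "type": "news", "start": "22:00", "duration_min": 5},
]


@dataclass
class DialogState:
    last_results: list = field(default_factory=list)


def classer(words):
    """Stand-in for the statistical classer: tag key concepts (keyword rules here)."""
    concepts = {}
    if "movies" in words:
        concepts["PROGRAM_TYPE"] = "movies"
    if "tonight" in words:
        concepts["TIME"] = "tonight"
    if "long" in words:
        concepts["ATTRIBUTE"] = "duration"
    return concepts


def parser(concepts):
    """Stand-in for the statistical parser: decide what to do with the concepts."""
    return "QUERY" if "ATTRIBUTE" in concepts else "LIST"


def canonicalizer(concepts):
    """Rewrite surface values into the form the backend expects."""
    canon = dict(concepts)
    if canon.get("PROGRAM_TYPE") == "movies":
        canon["PROGRAM_TYPE"] = "movie"
    if canon.get("TIME") == "tonight":
        canon["TIME"] = ("18:00", "24:00")  # zero-padded HH:MM compares correctly as strings
    return canon


def dialog_manager(action, canon, state):
    """Query the backend, pick the answer, remember results for follow-up questions."""
    if action == "LIST":
        start, end = canon.get("TIME", ("00:00", "24:00"))
        hits = [p for p in PROGRAMS
                if p["type"] == canon.get("PROGRAM_TYPE", p["type"])
                and start <= p["start"] < end]
        state.last_results = hits
        if not hits:
            return "Sorry, nothing found."
        return "I found: " + ", ".join(p["title"] for p in hits)
    if not state.last_results:
        return "Which program do you mean?"
    program = state.last_results[0]  # only the duration attribute is handled in this sketch
    return f'{program["title"]} lasts {program["duration_min"]} minutes.'


def understand(utterance, state):
    words = utterance.lower().replace("?", "").split()  # stands in for the recogniser output
    concepts = classer(words)
    return dialog_manager(parser(concepts), canonicalizer(concepts), state)


state = DialogState()
print(understand("Could you tell me about movies tonight?", state))  # I found: Example Movie
print(understand("How long does it last?", state))  # Example Movie lasts 95 minutes.
```

Keeping the canonicalizer separate means the classer and parser never need to know the database's value formats.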
17
Multilingual Architecture
(architecture diagram; components:)
  • Speech recognition: multilingual AM,
    multilingual LM/Voc
  • NLU: multilingual classer, multilingual TASK,
    multilingual parser (Lang ID), canonicalizer,
    dialog manager
  • Answer generation (language-dependent TTS)
LM = language model, Voc = vocabulary, AM = acoustic
models, TTS = text-to-speech, Lang ID = language
identification
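One way to read the diagram is this: per-language lexicons map onto shared, language-independent concepts, and only answer generation is language-dependent. The sketch below follows that reading with English and Finnish, the two languages of the Helsinki demo. The lexicons, templates and function names are hypothetical. Choosing the language by counting lexicon matches is a simplification, not the project's Lang ID method.

```python
# Hypothetical lexicons: surface words of each language map onto one shared,
# language-independent concept inventory (concept name, canonical value).
LEXICON = {
    "en": {"movies": ("PROGRAM_TYPE", "movie"), "news": ("PROGRAM_TYPE", "news"),
           "today": ("DATE", "today")},
    "fi": {"elokuvia": ("PROGRAM_TYPE", "movie"), "uutisia": ("PROGRAM_TYPE", "news"),
           "tänään": ("DATE", "today")},
}

# Language-dependent answer generation: one template set per language.
ANSWER_TEMPLATES = {
    "en": {"found": "I found the following programs: {titles}",
           "none": "Sorry, no programs were found."},
    "fi": {"found": "Löysin seuraavat ohjelmat: {titles}",
           "none": "Valitettavasti ohjelmia ei löytynyt."},
}


def classify(utterance):
    """Return (language, concepts); the language whose lexicon matches most words wins."""
    words = utterance.lower().strip("?!. ").split()
    best_lang, best_hits = "en", []  # fall back to English when nothing matches
    for lang, lexicon in LEXICON.items():
        hits = [lexicon[w] for w in words if w in lexicon]
        if len(hits) > len(best_hits):
            best_lang, best_hits = lang, hits
    return best_lang, dict(best_hits)


def generate_answer(lang, titles):
    """Fill the answer template of the identified language."""
    templates = ANSWER_TEMPLATES[lang]
    if not titles:
        return templates["none"]
    return templates["found"].format(titles=", ".join(titles))


lang, concepts = classify("Onko tänään elokuvia?")  # "Are there movies today?"
print(lang, concepts)                               # fi {'DATE': 'today', 'PROGRAM_TYPE': 'movie'}
print(generate_answer(lang, ["Example Movie"]))     # Löysin seuraavat ohjelmat: Example Movie
```

Whole-word lookup is naive for an inflecting language such as Finnish. A realistic system would need morphological analysis or statistical models.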
18
Historically speaking...
  • Helsinki demos
  • - 2000: Art-Goes-Kapakka, just to experiment with
    the NLU toolkit
  • - 2001: Program Guide Information Service, more
    realistic
  • AGK
  • - developed as the first NLU application at Nokia
  • - a good exercise to walk through (with sweat) the
    entire development process
  • - close co-operation with IBM, regular consulting
  • - results were comparable to others'
  • - easy: manageable size and complexity, (almost)
    available database
  • PGIS
  • - we wanted a more real-life application
  • - Electronic Program Guides are coming into use as
    digital TV spreads
  • - ongoing standardisation (MPEG-7 -> program types
    and sub-types)

19
Supported functionalities in PGIS
  • A LIST based on the following parameters: DATE,
    PROGRAM NAME, PROGRAM TYPE, TIME, LANGUAGE,
    PERFORMER, CHANNEL, PRICE, NEW
  • A QUERY about a particular program, asking for:
    DATE, YEAR, PRICE, TIME, COUNTRY OF ORIGIN,
    RESTRICTIONS, DURATION, EPISODE TITLE,
    DESCRIPTION, CHANNEL, WEB ADDRESS, PERFORMERS,
    RE-RUN, LANGUAGE, PEOPLE BEHIND THE PROGRAM,
    PROGRAM TYPE, SUBTITLES
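The two request types take different parameter sets, and the sets overlap only partly: DURATION can be asked about but is not a LIST parameter. A minimal way to encode this is one validated request type per functionality. The class names are hypothetical. The attribute sets are copied from the slide, with EPISODE TITLE read as a single attribute.

```python
from dataclasses import dataclass, field

LIST_PARAMETERS = frozenset({
    "DATE", "PROGRAM NAME", "PROGRAM TYPE", "TIME", "LANGUAGE",
    "PERFORMER", "CHANNEL", "PRICE", "NEW",
})
QUERY_ATTRIBUTES = frozenset({
    "DATE", "YEAR", "PRICE", "TIME", "COUNTRY OF ORIGIN", "RESTRICTIONS",
    "DURATION", "EPISODE TITLE", "DESCRIPTION", "CHANNEL", "WEB ADDRESS",
    "PERFORMERS", "RE-RUN", "LANGUAGE", "PEOPLE BEHIND THE PROGRAM",
    "PROGRAM TYPE", "SUBTITLES",
})


@dataclass
class ListRequest:
    constraints: dict  # parameter name -> requested value, e.g. {"PROGRAM TYPE": "movie"}

    def __post_init__(self):
        unknown = set(self.constraints) - LIST_PARAMETERS
        if unknown:
            raise ValueError(f"not a LIST parameter: {sorted(unknown)}")


@dataclass
class QueryRequest:
    program: str  # the particular program being asked about
    attributes: list = field(default_factory=list)

    def __post_init__(self):
        unknown = set(self.attributes) - QUERY_ATTRIBUTES
        if unknown:
            raise ValueError(f"not a QUERY attribute: {sorted(unknown)}")


ListRequest({"PROGRAM TYPE": "movie", "DATE": "today"})       # accepted
QueryRequest("Example Movie", ["DURATION", "PERFORMERS"])     # accepted
# ListRequest({"DURATION": "2h"}) would raise ValueError: DURATION is query-only.
```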
20
Gaining better performance via iterative tests
(chart: error rates of the English system, including automatic speech recognition word error rate)
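The chart's figures are lost, but the ASR curve uses the standard word error rate metric. WER counts the minimum number of substitutions (S), deletions (D) and insertions (I) needed to turn the recognised word sequence into the reference, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below computes it with a word-level edit distance. The example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j]: minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("who are the actors", "who are actors"))  # 0.25 (one deletion)
```

Because insertions are counted, WER can exceed 1.0 when the recogniser outputs many spurious words.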
21
A real (!!!!) dialogue with PGIS
  • S: Welcome to Program Guide. What would you like
    to know?
  • U: Could you please tell me about movies tonight?
  • S: I found the following movie programs on Monday
    between 6 pm and 12 pm: It's Always Fair Weather,
    The Murder Men, Boom Town, and The Hook.
  • U: What time is the first one?
  • S: It's Always Fair Weather starts at 5 past 5
    pm. How else can I help you?
  • U: How long does it last?
  • S: It's Always Fair Weather lasts 1 hours 40
    minutes. What information can I offer you next?
  • U: Who are the actors?
  • S: It's Always Fair Weather. The actors are Gene
    Kelly, Dan Dailey, Michael Kidd, Cyd Charisse.
    How else can I help you?
  • U: Can I have a description for it please?
  • S: I can offer you the following type of
    description about the program called It's Always
    Fair Weather. The unbeatable musical team of
    Kelly and Donen is behind this zappy musical
    about three serviceman buddies reunited after
    10 years... What else can I do for you?
  • U: That's all for now. Good-bye.
  • S: Thank you for using the Program Guide.
    Good-bye.
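After the first answer, every user turn in this dialogue depends on context. "The first one" picks an item from the last list, and "it" or "the actors" refer to the program currently in focus. The sketch below tracks that with a small context object. The class, the ordinal table and the rule that elliptical questions refer to the focus are illustrative assumptions, not the PGIS implementation. Starting a new list drops the focus, mirroring a context switch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical ordinal vocabulary ("last" is left out on purpose: in
# "How long does it last?" it is a verb, not an ordinal).
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3}


@dataclass
class DialogueContext:
    listed: List[str] = field(default_factory=list)  # titles from the latest LIST answer
    focus: Optional[str] = None                      # the program currently talked about

    def new_list(self, titles):
        """A new LIST answer opens a new context: replace the list, drop the focus."""
        self.listed = list(titles)
        self.focus = None

    def resolve(self, words):
        """Return the program a follow-up question refers to, or None."""
        for word, index in ORDINALS.items():
            if word in words and index < len(self.listed):
                self.focus = self.listed[index]  # "the first one" moves the focus
                break
        # "it", or an elliptical question such as "Who are the actors?",
        # is taken to be about the current focus.
        return self.focus


ctx = DialogueContext()
ctx.new_list(["It's Always Fair Weather", "The Murder Men", "Boom Town", "The Hook"])
print(ctx.resolve("what time is the first one".split()))  # It's Always Fair Weather
print(ctx.resolve("how long does it last".split()))       # It's Always Fair Weather
```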

22
(figure: a PGIS dialogue mixing English and Finnish, with four context switches marked "NEW CONTEXT!"; the speech-bubble layout is lost, so the surviving fragments below are not in dialogue order)
  • Welcome to Program Guide! How may I help you?
  • Movies tonight?
  • I found the following programs
  • ... starting time of the 1st?
  • it starts at 5.15pm
  • ... duration? (in Finnish)
  • it is 1h 25min long (in Finnish)
  • Movies with Michael Douglas?
  • Michael Douglas is in Coma
  • Channels?
  • BBC World, CNN, Eurosport, TCM
  • What's on BBC World tonight at 10pm?
  • World News at 10pm
  • ... duration? (in Finnish)
  • it takes 5 minutes (in Finnish)
  • Programs for youngsters? (in Finnish)
  • sorry, no programs for youngsters (in Finnish)
  • ... description?
  • I can offer the following description ... To
    text message?
  • no text message, thanks. (in Finnish)
  • What kind of info can I offer next? (in Finnish)
  • That's all for now.
  • Good bye!
23
What lessons have we learnt?
  • In general
  • - a research project has its own difficulties:
    risks must be taken, but within limits
  • - know your partners and their capabilities, and
    take the initiative in co-operation
  • - strong dependency on one partner's technology
    might be problematic
  • About technology
  • - it is good to have linguists around, although
    many of the development phases require
    engineering skills
  • - everything should be planned as precisely as
    possible, even tests and evaluation methods
  • - the best results are gained with successive
    test-evaluation-improvement cycles
  • - this kind of technology is quite new: users
    often don't know the possibilities of the system,
    therefore the instructions must be very guiding
    and clear
  • - this is difficult if only a demo system with a
    fake database is available, without a comparable
    traditional system
  • - test users must be rewarded (very crucial),
    otherwise there is no motivation
  • - the real picture of system functionality and
    operability can be gained only from real users in
    real situations

24
Finally...
  • Gábor Dénes (1969)
  • "If enough people work hard enough on the problem
    of speech recognition, it will be solved by mid
    next century."

25
References
  • http://www.nokia.com/nokiaone
  • Oria, D. & E. Koskinen 2002, E-Mail Goes Mobile:
    The design and implementation of a spoken
    language interface to e-mail. ICSLP 2002.
  • http://www.catch2004.org/
  • Harrikari, H., M. Mast, T. Ross & H. Schulz
    2002, Different Approaches to Build Multilingual
    Conversational Systems. 5th International
    Conference on Text, Speech and Dialogue (TSD
    2002), Brno, Czech Republic.
  • Kleindienst, J., L. Seredi, P. Kapanen & J.
    Bergman 2002a, CATCH-2004 Multi-Modal Browser:
    Overview Description with Usability Analysis.
    IEEE 4th International Conference on Multimodal
    Interfaces, Pittsburgh, PA, U.S.A.
  • Kleindienst, J., L. Seredi, P. Kapanen & J.
    Bergman 2002b, Loosely-coupled approach towards
    multi-modal browsing. Submitted to the Universal
    Access in the Information Society magazine's
    special issue on Multi-modal User Interfaces.
  • Boda, P. et al., Subjective Evaluation of a
    Personalised Conversational Interface to a
    Program Guide Information System. Submitted to
    the User Modeling and User-Adapted Interaction
    journal (UMUAI), Special Issue on User Modeling
    and Personalization for Television.

26
Abbreviations
  • AM: acoustic model
  • ASR: automatic speech recognition
  • CTI: computer-telephone integration
  • DM: dialogue manager
  • LM: language model
  • NLU: natural language understanding
  • SUI: speech user interface
  • TTS: text-to-speech synthesis
  • VVT: ViaVoice Telephony (IBM's speech resources)
  • WOZ: wizard of Oz