Multi-Modal Dialogue in Personal Navigation Systems - PowerPoint PPT Presentation

Slides: 24
Provided by: Arthu61
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

Title: Multi-Modal Dialogue in Personal Navigation Systems


1
Multi-Modal Dialogue in Personal Navigation
Systems
  • Arthur Chan

2
Introduction
  • The term multi-modal
  • A general description of an application that can
    be operated in multiple input/output modes.
  • E.g.
  • Input: voice, pen, gesture, facial expression
  • Output: voice, graphical output

3
Multi-modal Dialogue (MMD) in Personal Navigation
System
  • Motivation of this presentation
  • A navigation system that provides MMD is
  • an interesting scenario
  • a case for why MMD is useful
  • Structure of this presentation
  • 3 system papers
  • AT&T MATCH
  • speech and pen input, including pen gestures
  • SpeechWorks Walking Directions System
  • speech and stylus input
  • Univ. of Saarland REAL
  • speech and pen input
  • Both GPS and a magnetic tracker are used.

4
Multi-modal Language Processing for Mobile
Information Access
5
Overall Function
  • A working city guide and navigation system
  • Easy access to restaurant and subway information
  • Runs on a Fujitsu pen computer
  • Users are free to
  • give speech commands
  • draw on the display with a stylus

6
Types of Inputs
  • Speech Input
  • "show cheap italian restaurants in chelsea"
  • Simultaneous Speech and Pen Input
  • Circle an area
  • Say "show cheap italian restaurants in this
    neighborhood" at the same time.
  • Functionalities include
  • Reviews
  • Subway routes

7
Input Overview
  • Speech Input
  • Uses the AT&T Watson speech recognition engine
  • Pen Input (electronic ink)
  • Allows the use of pen gestures.
  • Pen input can be complex
  • Special aggregation techniques are used for these
    gestures.
  • Inputs are combined using lattice combination.
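
The lattice combination above can be illustrated with a much simpler late-fusion sketch. MATCH itself composes finite-state lattices; here each modality is reduced to an n-best list of (interpretation, log-probability) pairs and joint hypotheses are ranked by summed scores. All names and numbers are illustrative, not from the paper.

```python
# Toy stand-in for lattice combination: rank joint (speech, gesture)
# interpretations by their combined log-probability scores.
from itertools import product

def combine_nbest(speech_nbest, gesture_nbest, compatible):
    """Rank joint interpretations by combined score.

    speech_nbest / gesture_nbest: lists of (label, log_prob) pairs.
    compatible: predicate rejecting semantically incompatible pairs.
    """
    joint = [
        ((s_label, g_label), s_score + g_score)
        for (s_label, s_score), (g_label, g_score)
        in product(speech_nbest, gesture_nbest)
        if compatible(s_label, g_label)
    ]
    return sorted(joint, key=lambda pair: pair[1], reverse=True)

speech = [("show_restaurants", -1.2), ("show_subway", -2.5)]
gesture = [("circle_area", -0.8), ("point_location", -1.9)]

# In this toy example every pair is considered compatible.
ranked = combine_nbest(speech, gesture, lambda s, g: True)
best = ranked[0][0]
```

A real system would apply this over full lattices rather than n-best lists, so that combination can recover from errors in either single modality.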

8
Pen Gesture and Speech Input
  • For example
  • U: How do I get to this place?
  • <user circles one of the restaurants displayed on
    the map>
  • S: Where do you want to go from?
  • U: 25th St 3rd Avenue
  • <user writes "25th St 3rd Avenue">
  • <system computes the shortest route>
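
The turn above hinges on resolving the deictic phrase "this place" against the pen gesture. A minimal sketch of that resolution step, with an invented restaurant name and data shapes not taken from the paper:

```python
# Toy deictic-reference resolution: substitute the gestured referent
# for the deictic placeholder in the spoken utterance.
def resolve_deictic(utterance, gesture_selection):
    """Replace "this place" with the object selected by the gesture."""
    if "this place" in utterance and gesture_selection is not None:
        return utterance.replace("this place", gesture_selection)
    return utterance

# <user circles a restaurant on the map while speaking>
resolved = resolve_deictic("How do I get to this place?", "Carmine's")
```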

9
Summary
  • Interesting aspects of the system
  • Illustrates a real-life scenario where
    multi-modal inputs can be used
  • Design issue
  • how should different inputs be used together?
  • Algorithmic issue
  • how should different inputs be combined?

10
Multi-modal Spoken Dialog with Wireless Devices
11
Overview
  • Work by SpeechWorks
  • Jointly conducted by speech recognition and user
    interface folks
  • Two distinct elements
  • Speech recognition
  • In an embedded domain, which speech recognition
    paradigm should be used?
  • embedded speech recognition?
  • network speech recognition?
  • distributed speech recognition?
  • User interface
  • How to "situationalize" the application?

12
Overall Function
  • Walking Directions Application
  • Assumes the user is walking in an unfamiliar city
  • Runs on a Compaq iPAQ 3765 PocketPC
  • Users can
  • Select a city and start/end addresses
  • Display a map
  • Control the display
  • Display directions
  • Display interactive directions in the form of a
    list of steps
  • Accepts speech input and stylus input
  • but not pen gestures.

13
Choice of speech recognition paradigm
  • Embedded speech recognition
  • Only simple commands can be used due to
    computation limits.
  • Network speech recognition
  • Requires bandwidth
  • The network connection can sometimes be cut off
  • Distributed speech recognition
  • Client takes care of the front-end
  • Server takes care of decoding
  • <Issue: higher code complexity.>
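
The client/server split in distributed speech recognition can be sketched as below. The client reduces raw audio to compact features and ships only those, saving bandwidth over sending raw samples; per-frame log energy here is a crude stand-in for a real MFCC front-end, and the server-side "decoder" is a placeholder, not SpeechWorks' actual system.

```python
# Sketch of the DSR split: client computes features, server decodes.
import math

FRAME = 160  # samples per frame (10 ms at a 16 kHz sampling rate)

def client_frontend(samples):
    """Client side: reduce raw samples to one log-energy per frame."""
    feats = []
    for i in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[i:i + FRAME]
        energy = sum(x * x for x in frame) / FRAME
        feats.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return feats

def server_decode(features):
    """Server side placeholder: a real decoder runs the search here."""
    return "speech" if any(f > -5.0 for f in features) else "silence"

audio = [0.1] * 1600                 # 100 ms of constant "audio"
features = client_frontend(audio)    # 1600 samples -> 10 features
result = server_decode(features)
```

The compression ratio (here 160 samples per feature) is what makes this paradigm attractive on limited wireless links, at the cost of more complex code split across two machines.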

14
User Interface
  • Situationalization
  • Potential scenarios
  • Sitting at a desk
  • Getting out of a cab, building, or subway and
    preparing to walk somewhere
  • Walking somewhere with hands free
  • Walking somewhere carrying things
  • Driving somewhere in heavy traffic
  • Driving somewhere in light traffic
  • Being the passenger in a car
  • Being in a highly noisy environment

15
Their conclusion
  • Balance of audio and visual information
  • Can be reduced to 4 complementary modes
  • Single-modal
  • 1. Visual mode
  • 2. Audio mode
  • Multi-modal
  • 3. Visual dominant
  • 4. Audio dominant
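
One way to read the two slides together is as a mapping from usage scenario to one of the four modes. The assignments below are my illustrative guesses, not the paper's conclusions:

```python
# Toy situationalization table: pick an audio/visual mode per scenario.
MODES = {"visual", "audio", "visual_dominant", "audio_dominant"}

SCENARIO_MODE = {
    "sitting at a desk": "visual",
    "walking hands-free": "audio",
    "walking carrying things": "audio_dominant",
    "driving in light traffic": "audio_dominant",
    "passenger in a car": "visual_dominant",
    "highly noisy environment": "visual",
}

def pick_mode(scenario):
    """Fall back to a balanced multi-modal default when unknown."""
    return SCENARIO_MODE.get(scenario, "visual_dominant")

mode = pick_mode("walking hands-free")
```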

16
A Glance at the UI
17
Summary
  • Interesting aspects
  • Great discussion of
  • how speech recognition can be used in an
    embedded domain
  • how users would use the dialogue application

18
Multi-modal Dialog in a Mobile Pedestrian
Navigation System
19
Overview
  • Pedestrian Navigation System
  • Two components
  • IRREAL: indoor navigation system
  • uses a magnetic tracker
  • ARREAL: outdoor navigation system
  • uses GPS

20
Speech Input/Output
  • Speech Input
  • HTK, IBM ViaVoice Embedded, and Logox were
    evaluated
  • Speech Output
  • Festival

21
Visual output
  • Both 2D and 3D spatialization are supported

22
Interesting aspects
  • Tailors the system for elderly people
  • Speaker clustering
  • to improve the recognition rate for elderly people
  • Model selection
  • chooses between two model sets based on likelihood
  • elderly models
  • normal adult models
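
The likelihood-based model selection above can be sketched as follows: score the input under both model sets and keep the higher-likelihood one. The single-Gaussian scoring and the (mean, variance) values are toy stand-ins for real acoustic model sets, not the system's actual models.

```python
# Toy likelihood-based selection between elderly and adult model sets.
import math

def gaussian_loglik(samples, mean, var):
    """Total log-likelihood of samples under a single Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in samples
    )

MODELS = {
    "elderly": (0.0, 2.0),  # illustrative (mean, variance) per set
    "adult": (1.0, 1.0),
}

def select_model(samples):
    """Return the model set with the highest likelihood on the input."""
    return max(MODELS, key=lambda m: gaussian_loglik(samples, *MODELS[m]))

chosen = select_model([0.9, 1.1, 1.0])
```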

23
Conclusion
  • Aspects of multi-modal dialogue
  • What kinds of inputs should be used?
  • How should speech and other inputs be combined
    and interact?
  • How would users use the system?
  • How should the system respond to the users?