Multi-Modal Dialogue in Personal Navigation Systems - PowerPoint PPT Presentation

Slides: 24
Provided by: Arthu61
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

Title: Multi-Modal Dialogue in Personal Navigation Systems


1
Multi-Modal Dialogue in Personal Navigation
Systems
  • Arthur Chan

2
Introduction
  • The term multi-modal
  • A general description of an application that can
    be operated in multiple input/output modes.
  • E.g.
  • Input: voice, pen, gesture, facial expression
  • Output: voice, graphical output

3
Multi-modal Dialogue (MMD) in Personal Navigation
System
  • Motivation of this presentation
  • A navigation system that provides MMD is
  • an interesting scenario
  • a case for why MMD is useful
  • Structure of this presentation
  • 3 system papers
  • AT&T MATCH
  • speech and pen input, including pen gestures
  • SpeechWorks Walking Directions System
  • speech and stylus input
  • Univ. of Saarland REAL
  • speech and pen input
  • Both GPS and a magnetic tracker are used.

4
Multi-modal Language Processing for Mobile
Information Access
5
Overall Function
  • A working city guide and navigation system
  • Easy access to restaurant and subway information
  • Runs on a Fujitsu pen computer
  • Users are free to
  • give speech commands
  • draw on the display with a stylus

6
Types of Inputs
  • Speech Input
  • "show cheap italian restaurants in chelsea"
  • Simultaneous Speech and Pen Input
  • Circle an area
  • Say "show cheap italian restaurants in this
    neighborhood" at the same time.
  • Functionalities include
  • Reviews
  • Subway routes

7
Input Overview
  • Speech Input
  • Uses the AT&T Watson speech recognition engine
  • Pen Input (electronic ink)
  • Allows the use of pen gestures.
  • Pen input can be complex
  • Special aggregation techniques are used for these
    gestures.
  • Inputs are combined using lattice combination.
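
The lattice combination above can be illustrated with a much simpler late-fusion sketch. MATCH itself composes finite-state lattices; here each modality is reduced to an n-best list of (interpretation, log-probability) pairs and joint hypotheses are ranked by summed scores. All names and numbers are illustrative, not from the paper.

```python
# Toy stand-in for lattice combination: rank joint (speech, gesture)
# interpretations by their combined log-probability scores.
from itertools import product

def combine_nbest(speech_nbest, gesture_nbest, compatible):
    """Rank joint interpretations by combined score.

    speech_nbest / gesture_nbest: lists of (label, log_prob) pairs.
    compatible: predicate rejecting semantically incompatible pairs.
    """
    joint = [
        ((s_label, g_label), s_score + g_score)
        for (s_label, s_score), (g_label, g_score)
        in product(speech_nbest, gesture_nbest)
        if compatible(s_label, g_label)
    ]
    return sorted(joint, key=lambda pair: pair[1], reverse=True)

speech = [("show_restaurants", -1.2), ("show_subway", -2.5)]
gesture = [("circle_area", -0.8), ("point_location", -1.9)]

# In this toy example every pair is considered compatible.
ranked = combine_nbest(speech, gesture, lambda s, g: True)
best = ranked[0][0]
```

A real system would apply this over full lattices rather than n-best lists, so that combination can recover from errors in either single modality.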

8
Pen Gesture and Speech Input
  • For example
  • U: How do I get to this place?
  • <user circles one of the restaurants displayed on
    the map>
  • S: Where do you want to go from?
  • U: 25th St 3rd Avenue
  • <user writes "25th St 3rd Avenue">
  • <system computes the shortest route>
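
The turn above hinges on resolving the deictic phrase "this place" against the pen gesture. A minimal sketch of that resolution step, with an invented restaurant name and data shapes not taken from the paper:

```python
# Toy deictic-reference resolution: substitute the gestured referent
# for the deictic placeholder in the spoken utterance.
def resolve_deictic(utterance, gesture_selection):
    """Replace "this place" with the object selected by the gesture."""
    if "this place" in utterance and gesture_selection is not None:
        return utterance.replace("this place", gesture_selection)
    return utterance

# <user circles a restaurant on the map while speaking>
resolved = resolve_deictic("How do I get to this place?", "Carmine's")
```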

9
Summary
  • Interesting aspects of the system
  • Illustrates a real-life scenario where
    multi-modal inputs can be used
  • Design issue
  • how should different inputs be used together?
  • Algorithmic issue
  • how should different inputs be combined?

10
Multi-modal Spoken Dialog with Wireless Devices
11
Overview
  • Work by SpeechWorks
  • Jointly conducted by speech recognition and user
    interface folks
  • Two distinct elements
  • Speech recognition
  • In an embedded domain, which speech recognition
    paradigm should be used?
  • embedded speech recognition?
  • network speech recognition?
  • distributed speech recognition?
  • User interface
  • How to "situationalize" the application?

12
Overall Function
  • Walking Directions Application
  • Assumes the user is walking in an unfamiliar city
  • Runs on a Compaq iPAQ 3765 PocketPC
  • Users can
  • Select a city and start/end addresses
  • Display a map
  • Control the display
  • Display directions
  • Display interactive directions in the form of a
    list of steps
  • Accepts speech input and stylus input
  • but not pen gestures.

13
Choice of speech recognition paradigm
  • Embedded speech recognition
  • Only simple commands can be used due to
    computation limits.
  • Network speech recognition
  • Requires bandwidth
  • The network connection can sometimes be cut off
  • Distributed speech recognition
  • Client takes care of the front-end
  • Server takes care of decoding
  • <Issue: higher code complexity.>
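
The client/server split in distributed speech recognition can be sketched as below. The client reduces raw audio to compact features and ships only those, saving bandwidth over sending raw samples; per-frame log energy here is a crude stand-in for a real MFCC front-end, and the server-side "decoder" is a placeholder, not SpeechWorks' actual system.

```python
# Sketch of the DSR split: client computes features, server decodes.
import math

FRAME = 160  # samples per frame (10 ms at a 16 kHz sampling rate)

def client_frontend(samples):
    """Client side: reduce raw samples to one log-energy per frame."""
    feats = []
    for i in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[i:i + FRAME]
        energy = sum(x * x for x in frame) / FRAME
        feats.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return feats

def server_decode(features):
    """Server side placeholder: a real decoder runs the search here."""
    return "speech" if any(f > -5.0 for f in features) else "silence"

audio = [0.1] * 1600                 # 100 ms of constant "audio"
features = client_frontend(audio)    # 1600 samples -> 10 features
result = server_decode(features)
```

The compression ratio (here 160 samples per feature) is what makes this paradigm attractive on limited wireless links, at the cost of more complex code split across two machines.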

14
User Interface
  • Situationalization
  • Potential scenarios
  • Sitting at a desk
  • Getting out of a cab, building, or subway and
    preparing to walk somewhere
  • Walking somewhere with hands free
  • Walking somewhere carrying things
  • Driving somewhere in heavy traffic
  • Driving somewhere in light traffic
  • Being the passenger in a car
  • Being in a highly noisy environment

15
Their conclusion
  • Balance of audio and visual information
  • Can be reduced to 4 complementary modes
  • Single-modal
  • 1. Visual mode
  • 2. Audio mode
  • Multi-modal
  • 3. Visual dominant
  • 4. Audio dominant
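
One way to read the two slides together is as a mapping from usage scenario to one of the four modes. The assignments below are my illustrative guesses, not the paper's conclusions:

```python
# Toy situationalization table: pick an audio/visual mode per scenario.
MODES = {"visual", "audio", "visual_dominant", "audio_dominant"}

SCENARIO_MODE = {
    "sitting at a desk": "visual",
    "walking hands-free": "audio",
    "walking carrying things": "audio_dominant",
    "driving in light traffic": "audio_dominant",
    "passenger in a car": "visual_dominant",
    "highly noisy environment": "visual",
}

def pick_mode(scenario):
    """Fall back to a balanced multi-modal default when unknown."""
    return SCENARIO_MODE.get(scenario, "visual_dominant")

mode = pick_mode("walking hands-free")
```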

16
A Glance at the UI
17
Summary
  • Interesting aspects
  • Great discussion of
  • how speech recognition can be used in an
    embedded domain
  • how users would use the dialogue application

18
Multi-modal Dialog in a Mobile Pedestrian
Navigation System
19
Overview
  • Pedestrian Navigation System
  • Two components
  • IRREAL: indoor navigation system
  • uses a magnetic tracker
  • ARREAL: outdoor navigation system
  • uses GPS

20
Speech Input/Output
  • Speech Input
  • HTK, IBM ViaVoice Embedded, and Logox were
    evaluated
  • Speech Output
  • Festival

21
Visual output
  • Both 2D and 3D spatialization are supported

22
Interesting aspects
  • Tailors the system for elderly people
  • Speaker clustering
  • to improve the recognition rate for elderly people
  • Model selection
  • chooses between two model sets based on likelihood
  • elderly models
  • normal adult models
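
The likelihood-based model selection above can be sketched as follows: score the input under both model sets and keep the higher-likelihood one. The single-Gaussian scoring and the (mean, variance) values are toy stand-ins for real acoustic model sets, not the system's actual models.

```python
# Toy likelihood-based selection between elderly and adult model sets.
import math

def gaussian_loglik(samples, mean, var):
    """Total log-likelihood of samples under a single Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in samples
    )

MODELS = {
    "elderly": (0.0, 2.0),  # illustrative (mean, variance) per set
    "adult": (1.0, 1.0),
}

def select_model(samples):
    """Return the model set with the highest likelihood on the input."""
    return max(MODELS, key=lambda m: gaussian_loglik(samples, *MODELS[m]))

chosen = select_model([0.9, 1.1, 1.0])
```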

23
Conclusion
  • Aspects of multi-modal dialogue
  • What kinds of inputs should be used?
  • How should speech and other inputs be combined
    and interact?
  • How would users use the system?
  • How should the system respond to the users?