1
Put-that-there: Voice and Gesture at the Graphics Interface
Richard A. Bolt
  • Presented by Kathleen Murray
  • Tracey Gordon
  • Grainne Sharkey

2
  • This presentation describes a user commanding simple shapes about a large graphics display using voice and simultaneous pointing
  • This allows the free use of pronouns, which makes the interaction more natural
  • Gesture aided by voice gains precision in its power to reference, as you will see later in the presentation

3
Introduction
  • This presentation is based on work by the Architecture Machine Group at the Massachusetts Institute of Technology
  • They have been experimenting with the conjoint use of voice input and gesture recognition to command events on large graphics displays
  • The central interest is the combination of voice and gesture into one modality
  • The approach involves significant use of pronouns as temporary variables to reference items on the display

4
  • The interactions described in this presentation are staged in the Architecture Machine Group's Media Room
  • A physical facility
  • The user's terminal is literally a room into which they step

5
Media Room
  • The size of a personal office
  • Cabling from the devices used in the room is hidden under the floor
  • The walls house banks of loudspeakers on either side of a large projection screen to the front of the user
  • The user's chair incorporates two small joysticks, one on each arm, sensitive to pressure and direction
  • Beside each joystick is a square-shaped touch-sensitive pad

6
  • Colour TV monitors, each with a transparent touch-sensitive pad, are situated on either side of the user's chair

7
  • The Media Room with its user chair plays a key role in research into a Spatial Data Management System (SDMS)
  • Spatially indexing data
  • Derives from our everyday experience of retrieving items
  • Retrieval is natural and automatic
  • Even with a messy desk, for example, the user knows where the items are located on the desk; they have developed a mental image of the layout of the desk

8
  • SDMS
  • Lets the user navigate to specific information
  • Used within the Media Room
  • The information appears in its entirety upon one of the colour TV monitors near the user's chair
  • The user can move a you-are-here marker (a transparent rectangular overlay) about the information using the chair's right-hand joystick, or can directly touch the TV screen
  • This information can be displayed in increased detail on the large screen
  • The left-hand joystick is used to zoom in on the information (a minimal sketch of this navigation loop follows below)
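Below is a minimal Python sketch of this navigation loop, assuming normalised coordinates and simple proportional gains; the paper does not specify the actual control law, so the class and constant names here are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Viewport:
    """You-are-here marker over the data surface, mirrored on the big screen."""
    x: float = 0.5      # marker centre, normalised to [0, 1]
    y: float = 0.5
    zoom: float = 1.0   # magnification of the large-screen view

PAN_GAIN = 0.02         # assumed gain per update tick
ZOOM_GAIN = 0.05        # assumed gain per update tick

def update(view: Viewport, right_stick, left_stick) -> Viewport:
    """Right joystick pans the marker; left joystick zooms the large screen.

    Each stick reading is (dx, dy) in [-1, 1], proportional to the
    pressure and direction the user applies.
    """
    dx, dy = right_stick
    view.x = min(1.0, max(0.0, view.x + PAN_GAIN * dx))
    view.y = min(1.0, max(0.0, view.y + PAN_GAIN * dy))
    _, zy = left_stick
    view.zoom = max(1.0, view.zoom * (1.0 + ZOOM_GAIN * zy))
    return view
```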

9
  • The two spatial orders within the room
  • Virtual graphical space
  • The user's immediate real space
  • can converge to become effectively one continuous interactive space
  • User awareness of this common space is implicit
  • The user points, gestures, and references "up", "down", and so on, freely and naturally, as the user is situated in a real space
  • This interactive situation draws upon two new technologies, those being
  • Connected-speech recognition
  • Position sensing in space

10
Speech and Space: The Technologies
  • Speech Recognisers
  • Two categories
  • Those which recognise discrete or isolated utterances - the speaker must talk to the system in a clipped, word-by-word style
  • Those which recognise connected speech
  • Connected-speech recognisers allow up to five words or utterances per spoken sentence, with no pauses required between words

11
  • Speech Recognisers (cont.)
  • Response time is about 300 ms and the output is a display of the text
  • The device's vocabulary
  • Held in the recogniser's active memory
  • A set of word reference patterns (a matching sketch follows below)
  • Maximum of 120 words
  • If there is an optional discrete-utterance mode, then the vocabulary may be larger, about 1000 words
  • The system comes with a lightweight, head-mounted microphone, which is used in the Media Room
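As a rough illustration of "word reference patterns", here is a minimal sketch that stores one fixed-length feature vector per word and recognises by nearest neighbour. Real connected-speech recognisers of this era matched templates over time-varying spectral frames (e.g. by dynamic time warping), so this is a simplification, not the device's actual algorithm.

```python
import numpy as np

MAX_VOCAB = 120  # active-memory limit cited above

class ReferencePatterns:
    """Word templates held in the recogniser's active memory."""

    def __init__(self):
        self.words = []     # vocabulary entries
        self.patterns = []  # one stored feature vector per word

    def train(self, word, feature_vec):
        """Add a new word reference pattern, up to the vocabulary limit."""
        if len(self.words) >= MAX_VOCAB:
            raise RuntimeError("active vocabulary is full")
        self.words.append(word)
        self.patterns.append(np.asarray(feature_vec, dtype=float))

    def recognise(self, feature_vec):
        """Return the stored word whose pattern best matches the input."""
        x = np.asarray(feature_vec, dtype=float)
        dists = [np.linalg.norm(x - p) for p in self.patterns]
        return self.words[int(np.argmin(dists))]
```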

12
  • Space
  • Space position and orientation sensing technology suitable for the Media Room was made by Polhemus Navigation Sciences, Inc., of Essex, Vermont
  • Called the Remote Object Position Attitude Measurement System (ROPAMS)
  • The essentials of the system are
  • Three coils are set into a plastic cube, their mountings mutually corresponding to the x, y, and z spatial axes
  • Two of these cubes are involved: one acts as a transmitter and the other functions as a sensor
  • The arrangement of the coils in each cube creates an antenna that is sensitive in all three orientations

13
  • Space (cont.)
  • Transmitter cube
  • Transmits a signal to the sensor cube
  • If the signal isn't strong enough, an error is generated and the transmitter must be re-aimed
  • In the Media Room, this cube sits on a block to the right of the user's chair
  • Sensor cube
  • Very lightweight
  • A point in space is determined by the three coils on this cube
  • In the Media Room, the user has the space-sensing cube attached to a wristband (a pointing-geometry sketch follows below)
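The paper does not spell out how the wrist cube's readings become a spot on the screen; a plausible minimal sketch is to intersect a pointing ray, taken from the sensor's reported position and orientation, with the screen plane. The coordinate frame below (screen at z = 0, user seated at positive z) is an assumption.

```python
import numpy as np

SCREEN_Z = 0.0  # assumed: the large screen lies in the plane z = 0

def pointing_spot(wrist_pos, wrist_dir):
    """Intersect the pointing ray with the screen plane.

    wrist_pos: (x, y, z) of the sensor cube, as reported by ROPAMS.
    wrist_dir: unit vector along the pointing direction, derived from
               the cube's reported orientation.
    Returns (x, y) on the screen, or None if the user points away.
    """
    p = np.asarray(wrist_pos, dtype=float)
    d = np.asarray(wrist_dir, dtype=float)
    if abs(d[2]) < 1e-9:              # ray parallel to the screen plane
        return None
    t = (SCREEN_Z - p[2]) / d[2]
    if t <= 0:                        # intersection is behind the user
        return None
    hit = p + t * d
    return float(hit[0]), float(hit[1])
```

The resulting (x, y) is what would drive the white "x" feedback cursor described in the Commands slides.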

14
  • We have now described the physical aspects of the Media Room and the technologies developed to allow voice and gesture inputs at the graphics interface
  • We will now present how this room (the user's terminal) and its related technologies (the connected-speech recogniser and the position-sensing cube) work together to generate a natural and powerful interaction

15
Commands
  • The user is seated before the Media Room's large screen.
  • They have a space-sensing cube attached to a watchband on their wrist, and the system's microphone is ready and listening.
  • Some system commands that demonstrate voice and pointing are the following.
  • Create
  • Move
  • Make that

16
Create
  • In the demonstration system, the large screen
    will initially be clear or have a simple backdrop
    such as a map.
  • Simple items are placed against this background.
  • These items will be basic shapes, such as
    circles, squares or diamonds.
  • These items can be moved about, replicated, their
    attributes altered, or ordered to vanish.
  • Variable attributes are colour and size.

17
  • The user points to some spot on the large screen.
  • A small, white "x" cursor on the screen provides running visual feedback for pointing.
  • The user then says "Create a blue square there."
  • A blue square appears on the spot where the user is pointing.
  • As the size of the square is not given explicitly, the default size, medium, is used.
  • There is no default colour or shape, so these must be specified.

18
  • The position of the feedback cursor on the screen at the time the spoken "there" occurs becomes the spot where the item being created is placed.
  • Thus the occurrence of the spoken "there" is functionally a "when".
  • In effect, the command is a call to a create routine. The routine requires certain parameters to be supplied.
  • Before the user utters "there", the parameter values are supplied, such as the shape, colour, and size of the object to be created.
  • The pointing position, read at the utterance of "there", completes the parameter input (a minimal sketch of this fusion follows below).
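Here is a minimal sketch of that conjunction: spoken parameter words are accumulated as they are recognised, and the word "there" triggers a read of the pointing cursor, at which moment the create routine fires. The word lists, defaults, and callback shape are assumptions for illustration.

```python
DEFAULT_SIZE = "medium"  # size has a default; colour and shape do not

class CreateCommand:
    """Accumulates spoken parameters until "there" binds the position."""

    COLOURS = {"red", "green", "blue", "yellow"}          # illustrative lists
    SHAPES = {"circle", "square", "diamond", "triangle"}
    SIZES = {"small", "medium", "large"}

    def __init__(self, read_cursor):
        self.read_cursor = read_cursor  # callable -> current (x, y)
        self.params = {}

    def on_word(self, word):
        if word in self.COLOURS:
            self.params["colour"] = word
        elif word in self.SHAPES:
            self.params["shape"] = word
        elif word in self.SIZES:
            self.params["size"] = word
        elif word == "there":
            # "there" is functionally a "when": sample the cursor now
            x, y = self.read_cursor()
            return self.create(x, y)
        return None

    def create(self, x, y):
        return {
            "shape": self.params["shape"],    # no default: must be specified
            "colour": self.params["colour"],  # no default: must be specified
            "size": self.params.get("size", DEFAULT_SIZE),
            "pos": (x, y),
        }
```

Feeding the words of "Create a blue square there" (ignoring the filler words) yields a medium blue square at whatever spot the cursor occupied when "there" was spoken.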

19
Move
  • There are a number of ways the user can move items about the screen.
  • Example: "Move the blue triangle to the right of the green square."
  • This example relies on voice mode only.
  • When the blue triangle is addressed, it de-saturates as immediate feedback, then disappears from its present site to re-appear centred at a spot to the right of the green square.
  • The system makes a reasonable choice of the exact position "to the right".
  • The item is now where the user ordered it to be.

20
  • The user could also say "Move that to the right of the green square."
  • This option employs the pronoun "that" while the user simultaneously points to what is intended.
  • In this mode of giving the command, the user may not only omit the words "blue" and "triangle"; they do not need to know what the thing is, or what it is called.
  • "That" is thus defined as the item that is currently being pointed to (a hit-testing sketch follows below).
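A minimal hit-testing sketch: at the moment "that" is heard, the display list is searched for the item nearest the pointing coordinates. The item record format (as produced by the create sketch above) and the hit radius are assumptions.

```python
HIT_RADIUS = 0.05  # assumed tolerance, in normalised screen units

def resolve_that(items, cursor):
    """Return the display item nearest the pointing cursor, if any.

    items:  list of item dicts, each with a "pos" (x, y) entry.
    cursor: (x, y) pointing position when "that" was uttered.
    """
    cx, cy = cursor
    best, best_d2 = None, HIT_RADIUS ** 2
    for item in items:
        x, y = item["pos"]
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        if d2 <= best_d2:
            best, best_d2 = item, d2
    return best  # None if nothing is close enough to the cursor
```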

21
  • The entire command "move the blue triangle to the right of the green square" can be shortened to "put that there"
  • A mini-thesaurus of common synonyms, such as "move", "put", etc., is built into the system vocabulary (a lookup sketch follows below)
  • The Copy command is a variant of the move action, except that the image of the item to be moved also remains in place at the original spot.
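Such a mini-thesaurus can be sketched as a simple lookup that normalises each recognised verb to a canonical command; the word lists below are illustrative guesses, not the system's actual vocabulary.

```python
# Map spoken synonyms onto canonical command names (illustrative lists).
THESAURUS = {
    "put": "move", "move": "move", "place": "move",
    "copy": "copy", "replicate": "copy", "duplicate": "copy",
    "delete": "delete", "remove": "delete", "erase": "delete",
}

def canonical(word: str) -> str:
    """Normalise a recognised verb; unknown words pass through unchanged."""
    return THESAURUS.get(word, word)
```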

22
Make that
  • The attributes of any item that the user has created by voice and gesture can be modified. Here, the attributes are colour and size.
  • Example: "Make the blue triangle smaller"
  • This command causes the referenced item to be reduced in size.
  • Example: "Make that a large blue diamond"
  • If this is uttered while the user is pointing at a small yellow circle, the circle is transformed into a large blue diamond.

23
  • In the command line "make that like that"
  • the second "that" is, functionally, a "when" at which to read the x, y coordinates of pointing.
  • The item indicated when the second "that" is uttered becomes the model for change.
  • The first referenced item is replaced, copy-fashion, by the second referenced item.
  • Delete
  • The delete command allows the user to drop selected items from the display. It can be accessed via voice or pointing.

24
  • I have described how the system generates a natural and powerful interaction, using the commands discussed above.
  • Tracey will now discuss the naming of items and give a summary of the overall paper.

25
Naming
  • Using the "Call that ..." command, the user can name objects that they point to on the screen
  • E.g. "Call that (pointing to a blue square) the calendar"
  • "Call that" is processed by the recogniser
  • This tells the system that the naming command has been issued
  • The x, y co-ordinates of the object being pointed to are recorded by the host system
  • When the naming command is issued, the system switches from recogniser mode to training mode

26
Recogniser and Training modes
  • Recogniser mode
  • the system is listening for keywords and commands that it recognises - "call that"
  • Training mode
  • the system records new words and adds them to a file - "the calendar"
  • The system records the last part of the sentence
  • It adds this as a new entry to its file of word reference patterns

27
  • It then returns to recogniser mode
  • It is ready for the next verbal input
  • Switching between recogniser mode and training mode takes a finite amount of time
  • A brief pause is required in the spoken command line to accommodate this
  • The user tends to pause at that point anyway
  • waiting for feedback that they have actually contacted the item that they are pointing to
  • This pause masks the system's need to pause between recognition and training modes
  • It does, however, suggest a breakdown in the general convenience of continuous vs. discrete speech input (a state-machine sketch of the two modes follows below)
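A minimal sketch of this two-mode behaviour as a small state machine: "call that" flips the system into training mode, the next utterance is stored as a new reference pattern and bound to the pointed-to object, and the system returns to recognising. The method names and the object-binding step are assumptions; it reuses the ReferencePatterns and resolve_that sketches above.

```python
class NamingRecogniser:
    """Recogniser/training mode switch driven by the "call that" command."""

    def __init__(self, vocab, pointed_item):
        self.vocab = vocab                # a ReferencePatterns instance
        self.pointed_item = pointed_item  # callable -> item under cursor
        self.mode = "recognise"
        self.pending = None

    def on_utterance(self, text, features):
        if self.mode == "recognise":
            if text == "call that":
                # Record what is pointed at, then learn the next words.
                # The mode switch takes finite time; the user's natural
                # pause while waiting for pointing feedback masks it.
                self.pending = self.pointed_item()
                self.mode = "train"
        else:
            # Training mode: the tail of the sentence becomes the new name
            self.vocab.train(text, features)
            self.pending["name"] = text
            self.pending = None
            self.mode = "recognise"  # ready for the next verbal input
```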

28
Intelligent systems
  • Hopefully the system will one day be intelligent enough to interpret as well as recognise
  • when it hears a certain keyword, it will automatically switch from recognition mode to training mode, without the need to pause.

29
  • For the "Call that" command, the intelligent system would
  • truncate the recognised keyword from the spoken sentence - e.g. "Call that"
  • and take the rest of the sentence to be the new name
  • associate this new name with the object that is being pointed to at the time
  • the blue square is now "the calendar" (a sketch of this variant follows below)
  • To maintain overall co-ordination with the host system
  • the recogniser transmits ASCII codes for recognised or learned words
  • it also transmits any relevant control codes
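A minimal sketch of this intelligent variant, where a single connected utterance is parsed in one pass with no mode-switch pause; the string handling is an assumption, since the paper only describes the desired behaviour.

```python
KEYWORD = "call that"

def handle_utterance(sentence, features, vocab, pointed_item):
    """Interpret "call that <name>" in one pass, with no pause needed."""
    if not sentence.startswith(KEYWORD):
        return None
    name = sentence[len(KEYWORD):].strip()  # truncate the recognised keyword
    item = pointed_item()                   # object pointed to at the time
    vocab.train(name, features)             # learn the new name as a pattern
    item["name"] = name                     # e.g. the blue square is now
    return item                             # "the calendar"
```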

30
  • Advantage
  • eliminates the need for the user to pause within the spoken dialogue
  • Disadvantage
  • the problem of coarticulation
  • when the user speaks, the sound of a word is influenced by the words that precede or follow it
  • it is therefore important that the user speaks very clearly, especially when adding a new word

31
Summary
  • The example used in this presentation shows simple objects being moved about a blank screen
  • It shows how easy it would be to use a similar system in real-world situations
  • moving ships about a harbour map in planning a harbour facility
  • moving battalion formations about as overlays on a terrain map

32
  • The main advantage of the Put-that-there system is that you can indicate what you want to do with items in a natural, spontaneous way
  • pointing to them and addressing them in spoken words - not typed symbols
  • The use of pronouns makes it even simpler and more natural to use
  • An example of a system that uses voice and gesture recognition is IBM's DreamSpace
  • That system hears the user's voice commands and sees their gestures

33
Critical Analysis
  • This paper suggests a system that provides a more efficient and natural method of communicating commands to a computer
  • The author doesn't provide an evaluation of the approach. No explicit evidence is offered to prove that the system works as described.
  • A weakness of the system is that the speech recognition mechanism requires more intelligence to provide a truly natural interaction
  • the user has to pause when giving certain commands to allow time for processing

34
  • The descriptions are convincing and the underlying argument for the success of such a system seems sound
  • but it is impossible to fully determine the success of such a system without any proof
  • A weakness of the paper is that it may spend too much time describing the specifics of the Media Room's hardware technologies
  • it doesn't seem necessary to go as in-depth as it does
  • the central part of the paper can be conveyed well enough without it
  • The paper would be most useful for people who have an interest in enhancing the quality of human-computer interactions

35
How it relates to our Final Year Projects
  • One of the projects is an intelligent spoken interface for an on-line banking system.
  • It already involves the use of voice directed at the interface, rather than the keyboard/mouse
  • To incorporate gesture recognition, the user could point at items on the interface to make the interaction more natural
  • e.g. point to the Balance icon and say "Show me this"

36
  • Another of the projects provides on-line property buying and selling.
  • It does not currently use either voice or gesture recognition, but they could be included to add a more powerful and natural means of interacting with the system
  • the user could point to a property and say "Show me the details of this"
  • the user could point to the scroll arrows and say "scroll up" or "scroll down"

37
  • Another of the projects is to generate a 2D map from a 3D virtual environment.
  • The map will show the user's position in the environment.
  • The Put-that-there system could be used in this project.
  • The user could point to an area on the map and ask "What is this?"; a description of the area being pointed to could then be given.
  • The user can flag different areas on the map that they wish to go to: the user could point to a place where they wish to insert a flag and say "Insert a flag there."
  • The map could be used for automatic navigation, where the user goes to a place by selecting it from the map. The user could point to the area that they wish to go to and say "Go there." The user would then be taken to that location in the virtual environment.