1
AudioSense: A Simulation
  • Progress Report
  • EECS 578
  • Allan Spale

2
Background of Concept
  • Taking the train home and listening to the sounds
    around me
  • How would deaf people be able to perceive the
    environment?
  • What assistance would be useful in helping people
    adapt to the environment?

3
Project Goals
  • Develop a CAVE application that will simulate
    aspects of audio perception
  • Display the text of speaking objects in space
  • Display the description text of non-speaking
    objects in space
  • Display visual cues of multiple sound sources
  • Allow the user to selectively listen to different
    sound sources

4
Topics in the Project
  • Augmented reality
    • Illustrated by objects in a virtual environment
  • 3D sound
    • Simulated by an object's interaction property
  • Speech recognition
    • Simulated by text near the object
    • Will remain static during simulation
  • Virtual reality / CAVE
    • Method for presenting the project
    • Not discussed in this presentation

5
Augmented Reality
  • Definition
  • "provides means of intuitive information
    presentation for enhancing situational awareness
    and perception by exploiting the natural and
    familiar human interaction modalities with the
    environment."
  • -- Behringer et al. 1999

6
Augmented Reality: Device Diagnostics
  • Architecture components aid in performing
    diagnostic tests
  • Computer vision used to track the object in space
  • Speech recognition (command-style) used for user
    interface
  • 3D graphics (wireframe and shaded objects) to
    illustrate an object's internal structure
  • 3D audio emitted from an item allows the user to
    find its location within the object

7
Augmented Reality: Device Diagnostics
  (figure)

8
Augmented Reality: Device Diagnostics
  (figure)

9
Augmented RealityDevice Diagnostics
  • Summary
  • Providing 3D graphics and sound helps the user
    better diagnose items
  • Might also want text information on the display
  • Tracking methodology still needs improvement
  • Speech recognition of commands could be expanded
    to include annotation
  • Utilize IP connection to distribute computing
    power from the wearable computer

10
Augmented Reality: Multimedia Presentations in the Real World
  • Mobile Augmented Reality System (MARS)
  • Tracking performed by Global Positioning System
    (GPS) and another device
  • Display is see-through and head-mounted
  • Interaction based on location and gaze
  • Additional interaction provided by hand-held
    device

11
Augmented Reality: Multimedia Presentations in the Real World
  • System overview
  • Selection occurs through proximity or gaze
    direction followed by a menu system
  • Information presentation
  • Video (on hand-held device) or images accompanied
    by narration (on head-mounted display)
  • Virtual reality (for places that are not able to
    be visited)
  • Augmented reality (illustrate where items were)

12
Augmented Reality: Multimedia Presentations in the Real World
  (figure)

13
Augmented Reality: Multimedia Presentations in the Real World
  (figure)

14
Augmented Reality: Multimedia Presentations in the Real World
  • Conclusions
  • Current system is too heavy and visually
    undesirable
  • Might want to make hand-held display a palm-top
    computer
  • Permit authoring of content
  • Create a collaboration between indoor and outdoor
    system users

15
3D Sound: Audio-only Web Browsing
  • Must overcome difficulties with utilizing 3D
    sound
  • Sounds along the X axis are identifiable; sounds
    along the Y and Z axes are not
  • Need exists to create structure in audio-rendered
    web pages
  • Document reading appears spatially from left to
    right in an adequate amount of time (see the
    panning sketch below)
  • Utilize earcons and selective listening
  • Provide meta-content for quick document overview
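
A minimal sketch of the left-to-right spatial reading idea, assuming a
simple linear mapping from reading progress to X-axis pan; the function
name and output range are illustrative assumptions, not from the paper:

```cpp
#include <cstddef>

// Pan each word across the X axis as the document is read aloud,
// since X-axis positions are the ones listeners reliably identify.
// Returns -1.0 (far left) to +1.0 (far right).
float panForProgress(std::size_t wordIndex, std::size_t totalWords) {
    if (totalWords <= 1) return 0.0f;
    float t = static_cast<float>(wordIndex) /
              static_cast<float>(totalWords - 1);  // 0 = start, 1 = end
    return -1.0f + 2.0f * t;
}
```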

16
3D Sound: Audio-only Web Browsing
  (figure)

17
3D Sound: Audio-only Web Browsing
  • Future work
  • Improve link information that extends beyond web
    page title and time duration
  • Benefits of auditory browsing aids
  • Improved comprehension
  • Better browsing experience for visually impaired
    and sighted users

18
3D Sound: Interactive 3D Sound Hyperstories
  • Hyperstories
  • Story occurring in a hypermedia context
  • Forms a nested context model
  • World objects can be passive, active, static, or
    dynamic (see the sketch below)
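
A minimal sketch of how this object classification might be
represented; the type and field names are hypothetical, not taken from
the paper:

```cpp
#include <string>

// Two independent axes of the classification described above.
enum class Reactivity { Passive, Active };   // responds to the user?
enum class Mobility   { Static,  Dynamic };  // moves or changes state?

struct WorldObject {
    std::string name;
    Reactivity  reactivity;
    Mobility    mobility;
};

// Example: a door the user can open, fixed in place.
WorldObject door{"door", Reactivity::Active, Mobility::Static};
```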

19
3D Sound: Interactive 3D Sound Hyperstories
  • AudioDoom
  • Like the computer game Doom, but with the
    differences listed below
  • All world objects represented with sound
  • Sound represented in a volume almost parallel
    to the user's eyes
  • User interacts with the world objects using an
    ultrasonic joystick with haptic functionality
  • Organized by partitioned spaces

20
3D Sound: Interactive 3D Sound Hyperstories
  (figure)

21
3D Sound: Interactive 3D Sound Hyperstories
  (figure)

22
3D Sound: Interactive 3D Sound Hyperstories
  • Despite elapsed time between sessions, users
    remembered the world structure well
  • Authors illustrate "the possibility of rendering
    a spatial navigable structure by using only
    spatialized sound"
  • Opens the possibilities for educational software
    for the blind within the hyperstory context

23
Speech Recognition: Media Retrieval and Indexing
  • Problems with media retrieval and indexing
  • Large volumes of media are being generated; it is
    too costly and time-consuming to index manually
  • Ideal system design
  • Speaker independence
  • Noisy-recording environment capability
  • Open vocabulary

24
Speech Recognition: Media Retrieval and Indexing
  • Using Hidden Markov Models, the system achieved
    the results in Table 1
  • To improve results, string matching techniques
    can help overcome recognition stream errors

25
Speech Recognition: Media Retrieval and Indexing
  • String matching strategy (sketched in code after
    this list)
  • Develop the search term
  • Divide the recognition stream into a set of
    sub-strings
  • Implement an initial filter process
  • Identify edit operations for remaining
    sub-strings in the recognition stream
  • Calculate the similarity measure for the search
    term and matched strings
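
A minimal sketch of steps 4 and 5 of this strategy, using standard
Levenshtein edit distance as the edit-operation count and a normalized
similarity score; the function names and the normalization are
assumptions for illustration, not the paper's exact formulation:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Step 4: minimum number of edit operations (insert, delete,
// substitute) needed to turn `a` into `b`, by dynamic programming.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> d(a.size() + 1,
                                    std::vector<int>(b.size() + 1));
    for (std::size_t i = 0; i <= a.size(); ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= b.size(); ++j) d[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            d[i][j] = std::min({d[i - 1][j] + 1,      // delete
                                d[i][j - 1] + 1,      // insert
                                d[i - 1][j - 1] +
                                    (a[i - 1] != b[j - 1] ? 1 : 0)});  // substitute
    return d[a.size()][b.size()];
}

// Step 5: similarity in [0, 1]; 1.0 means an exact match between the
// search term and a sub-string of the recognition stream.
double similarity(const std::string& term, const std::string& sub) {
    std::size_t longest = std::max(term.size(), sub.size());
    return longest == 0 ? 1.0
                        : 1.0 - static_cast<double>(editDistance(term, sub)) /
                                    static_cast<double>(longest);
}
```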

26
Speech Recognition: Media Retrieval and Indexing
  (figure)

27
Speech Recognition: Media Retrieval and Indexing
  • Results of implementing the string matching
    strategy
  • Permitting more edit operations improved recall
    but degraded precision (definitions below)
  • Despite low performance rates, a system
    performing these tasks will be commercially
    viable
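
For reference, the standard definitions behind this trade-off (general
background, not the paper's own formulas):

```latex
\text{precision} = \frac{|\text{relevant} \cap \text{returned}|}{|\text{returned}|},
\qquad
\text{recall} = \frac{|\text{relevant} \cap \text{returned}|}{|\text{relevant}|}
```

Allowing more edit operations returns more candidate matches, which
tends to raise recall while admitting more false positives and thus
lowering precision.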

28
Speech Recognition: Continuous Speech Recognition
  • Problems with continuous speech recognition
  • Produces unpredictable errors, unlike the more
    predictable errors of other user input methods
  • The absence of context aids makes recognition
    difficult for the computer
  • Speech user interfaces are still in a
    developmental stage and will improve over time

29
Speech Recognition: Continuous Speech Recognition
  • Two modes
  • Keyboard-mouse and speech
  • Two tasks
  • Composition and transcription
  • Results
  • Keyboard-mouse tasks were faster and more
    efficient than speech tasks

30
Speech Recognition: Continuous Speech Recognition
  • Correction methods
  • Two general correction methods
  • Inline correction, separate proofreading
  • Speech inline correction methods
  • Select text and reenter, delete text and reenter,
    use a correction box, correct problems arising
    during correction

31
Speech Recognition: Continuous Speech Recognition
  (figure)

32
Speech Recognition: Continuous Speech Recognition
  (figure)

33
Speech Recognition: Continuous Speech Recognition
  • Discussion of errors
  • Inline correction is preferred by users
    regardless of modality
  • Proofreading was used more with speech because of
    unpredictable system errors
  • Keyboard-mouse involved deleting and reentering
    the word
  • Despite the ability to correct inline with
    speech, errors typically occurred during
    correction
  • Dialog boxes used as a last resort

34
Speech Recognition: Continuous Speech Recognition
  • Discussion of results
  • Users still do not feel that they can be
    productive using a speech interface for
    continuous recognition
  • More studies must be conducted to improve the
    speech interface for users

35
Project Implementation
  • Write a CAVE application using YG
  • 3D objects simulate sound-producing objects (see
    the sketch after this list)
  • No speech recognition will occur since predefined
    text will be attached to each object
  • Objects will move in space
  • Objects will not always produce sound
  • Objects may not be in the line of sight
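
A minimal sketch of the state a simulated sound-producing object might
carry, given the behaviors listed above; this is an illustrative C++
struct, not YG's actual API:

```cpp
#include <string>

struct SoundObject {
    std::string text;    // predefined caption standing in for speech recognition
    float position[3];   // location in CAVE coordinates; objects move over time
    bool  emitting;      // objects will not always produce sound
    float volume;        // amplitude of the current sound, if emitting
    float pitch;         // frequency of the current sound, if emitting
};
```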

36
Project Implementation
  • Write a CAVE application using YG
  • Sound location
  • Show directional vectors for each object that
    emits a sound (see the sketch after this list)
  • The longer the vector, the farther away the
    object is from the user
  • X, Y will use arrowheads, Z will use dot / "X"
    symbol
  • Dot is for an object behind the user, "X" symbol
    is for an object in front of the user
  • Only visible if sound can be heard by the user
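
A minimal sketch of this cue rule, assuming +Z points in front of the
user; the names and struct layout are illustrative assumptions:

```cpp
struct DirectionCue {
    float dx, dy;    // in-plane arrow components; longer = farther away
    char  zSymbol;   // '.' = source behind the user, 'X' = in front
    bool  visible;   // drawn only while the sound is audible
};

DirectionCue makeCue(const float user[3], const float source[3],
                     bool audible) {
    DirectionCue cue;
    cue.dx = source[0] - user[0];            // X component, arrowhead
    cue.dy = source[1] - user[1];            // Y component, arrowhead
    float dz = source[2] - user[2];
    cue.zSymbol = (dz >= 0.0f) ? 'X' : '.';  // assumes +Z is "in front"
    cue.visible = audible;                   // hidden when inaudible
    return cue;
}
```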

37
Project Implementation
  • Write a CAVE application using YG
  • Sound properties
  • Represented using a square (sketched below)
  • Size represents volume/amplitude (probably will
    not consider distance that affects volume)
  • Color represents pitch/frequency
  • Only visible if sound can be heard by the user
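
A minimal sketch of the square's visual mapping, assuming a normalized
volume in [0, 1] and a simple blue-to-red pitch ramp; the ranges and
linear mappings are assumptions for illustration:

```cpp
#include <algorithm>

struct SoundSquare {
    float size;      // side length, scaled by volume/amplitude
    float r, g, b;   // color encodes pitch/frequency
};

SoundSquare makeSquare(float volume, float pitchHz,
                       float minHz = 100.0f, float maxHz = 4000.0f) {
    float t = std::clamp((pitchHz - minHz) / (maxHz - minHz), 0.0f, 1.0f);
    SoundSquare sq;
    sq.size = 0.1f + volume;   // louder sound -> larger square
    sq.r = t;                  // higher pitch ramps toward red...
    sq.g = 0.0f;
    sq.b = 1.0f - t;           // ...lower pitch toward blue
    return sq;
}
```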

38
Project Implementation
  • Write a CAVE application using YG
  • Simulate cocktail party effect
  • Allow user to enlarge text from an object that is
    far away
  • Provide configuration section to ignore certain
    sound properties (sketched below)
  • Volume/amplitude
  • Pitch/frequency
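
A minimal sketch of that configuration, plus focusing on one source to
simulate the cocktail party effect; field names are hypothetical:

```cpp
struct ListenConfig {
    bool showVolume = true;   // false: ignore volume/amplitude cues
    bool showPitch  = true;   // false: ignore pitch/frequency cues
    int  focusId    = -1;     // attend to one source id; -1 = all sources
};

// A source's cues are drawn only if it is audible and not filtered out.
bool shouldDisplay(const ListenConfig& cfg, int sourceId, bool audible) {
    if (!audible) return false;
    return cfg.focusId == -1 || cfg.focusId == sourceId;
}
```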

39
Project Tasks Completed
  • Basic project design
  • Have read some documentation about YG
  • Tested functionality of YG in my account
  • Established contacts with people who have
    programmed CAVE applications using YG
  • Will provide 3D models and code that demonstrate
    some functionality of YG features upon request
  • Will help with answering questions and
    demonstrating and explaining features of YG

40
Project Timeline
  • Week of March 25
  • Practice modifying existing YG programs
  • Collect needed 3D models for program
  • Week of April 1
  • Code objects and their accompanying text
  • Implement movement patterns for objects

41
Project Timeline
  • Week of April 8
  • Attempt to turn on and off the sound of objects
  • Work with interaction properties of objects that
    will determine visualizing sound properties
  • Week of April 15
  • Continue working on visualizing sound properties
  • Work on enlarging/reducing text of an object

42
Project Timeline
  • Week of April 22
  • Create simple sound filtering menus
  • Test program in CAVE
  • Week of April 29 (exam week)
  • Practice presentation
  • Present project

43
Bibliography
  • Behringer, R., Chen, S., Sundareswaran, V., Wang,
    K., and Vassiliou, M. (1999). A Novel Interface
    for Device Diagnostics Using Speech Recognition,
    Augmented Reality Visualization, and 3D Audio
    Auralization, in Proceedings of the IEEE
    International Conference on Multimedia Computing
    and Systems, Vol. I, Institute of Electrical and
    Electronics Engineers, Inc., 427-432.
  • Goose, S. and Moller, C. (1999). A 3D Audio Only
    Interactive Web Browser: Using Spatialization to
    Convey Hypermedia Document Structure, in
    Proceedings of the Seventh ACM International
    Conference on Multimedia (Orlando FL, October
    1999), ACM Press, 363-371.

44
Bibliography
  • Hollerer, T., Feiner, S., and Pavlik, J. (1999).
    Situated Documentaries: Embedding Multimedia
    Presentations in the Real World, in Proceedings
    of the 3rd International Symposium on Wearable
    Computers (San Francisco CA, October 1999),
    Institute of Electrical and Electronics
    Engineers, Inc., 1-8.
  • Karat, C.-M., Halverson, C., Horn, D., and Karat,
    J. (1999). Patterns of Entry and Correction in
    Large Vocabulary Continuous Speech Recognition
    Systems, in Proceedings of the CHI '99 Conference
    on Human Factors in Computing Systems: The CHI Is
    the Limit (Pittsburgh PA, May 1999), ACM Press,
    568-575.

45
Bibliography
  • Lumbreras, M. and Sanchez, J. (1999). Interactive
    3D Sound Hyperstories for Blind Children, in
    Proceedings of the CHI '99 Conference on Human
    Factors in Computing Systems: The CHI Is the
    Limit (Pittsburgh PA, May 1999), ACM Press,
    318-325.
  • Robertson, J., Wong, W. Y., Chung, C., and Kim,
    D. K. (1998). Automatic Speech Recognition for
    Generalised Time Based Media Retrieval and
    Indexing, in Proceedings of the Sixth ACM
    International Conference on Multimedia (Bristol
    UK, September 1998), ACM Press, 241-246.