MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology

1
MobileASL: Intelligibility of Sign Language as
Constrained by Mobile Phone Technology
  • Richard Ladner, Eve Riskin
  • Dane Barney, Anna Cavender, Neva Cherniavsky,
    Jessica DeWitt, Rahul Vanam
  • University of Washington
  • Sheila Hemami, Frank Ciaramello
  • Cornell University

2
ASL
  • ASL is the preferred language for over 1,000,000
    Deaf people in the U.S.
  • ASL is not a code for English
  • Signs usually occur within the sign-box
  • Composed of location, orientation, and shape of
    hands and arms, plus facial expressions
  • Usually uses 2 hands, but one-handed signing is
    not uncommon

3
Our goal
  • ASL communication using video cell phones over
    current U.S. cell phone network

Challenges
  • Limited network bandwidth
  • Limited processing power on cell phones

4
Cell Phone Network Constraints
  • MobileASL is about fair access to the current
    network
    • As soon as possible, no special accommodations
    • Not geographically limited
    • Lower bitrate is more accessible
  • 3G (3rd Generation)
    • Special service, perhaps more expensive
    • Not yet widespread
    • Will still have congestion
  • Low bit rate goal
    • GPRS (General Packet Radio Service)
    • Ranges from 30 kbps to 80 kbps (download)
    • Perhaps half that for upload

5
Codec Used: x264
  • Open source implementation of H.264 standard
  • Doubles compression ratio over MPEG-2
  • x264 offers faster encoding
  • Off-the-shelf H.264 decoder can be used

6
Outline
  • MobileASL Focus Group
  • Eyetracking Motivation
  • User Studies
  • Encoder Complexity
  • Rate, Distortion, Complexity Optimization
  • Activity Recognition
  • Spatial Compression

7
MobileASL Focus Group
  • 4 Deaf people, mid-20s to mid-40s, ...
  • 1 hour
  • Open-ended questions
  • Physical Setup
    • Camera, distance, ...
  • Features
    • Compatibility, text, ...
  • Privacy Concerns
    • ASL is a visual language
  • Scenarios
    • Lighting, driving, relay services, ...

Anna Cavender, cavender_at_cs.washington.edu
8
Implications of Focus Group
  • "I don't foresee any limitations. I would use
    the phone anywhere: the grocery store, the bus,
    the car, a restaurant, anywhere!"
  • There is a need within the Deaf Community for
    mobile ASL conversations
  • Existing video phone technology (with minor
    modifications) would be usable

Anna Cavender, cavender_at_cs.washington.edu
9
Outline
  • MobileASL Focus Group
  • Eyetracking Motivation
  • User Studies
  • Encoder Complexity
  • Rate, Distortion, Complexity Optimization
  • Activity Recognition
  • Spatial Compression

10
Eyetracking Studies
  • Participants watched ASL videos while eye
    movements were tracked
  • Important regions of the video could be encoded
    differently

Muir et al. (2005) and Agrafiotis et al. (2003)
Anna Cavender, cavender_at_cs.washington.edu
11
Eyetracking Results
  • 95% of eye movements fell within 2 degrees of
    visual angle of the signer's face (demo)
  • Implications: the face region of the video is
    most visually important
    • Detailed grammar in the face requires foveal
      vision
    • Hands and arms can be viewed in peripheral
      vision

Anna Cavender, cavender_at_cs.washington.edu
12
Outline
  • MobileASL Focus Group
  • Eyetracking Motivation
  • User Studies
  • Encoder Complexity
  • Rate, Distortion, Complexity Optimization
  • Activity Recognition
  • Spatial Compression

13
Video Phone Study: ROI
  • Varied quality in fixed-sized region around the
    face

2x quality in face
4x quality in face
Anna Cavender, cavender_at_cs.washington.edu
14
Video Phone Study: FPS
  • Varied frame rate: 10 fps and 15 fps
  • For a given bit rate, fewer frames means more
    bits per frame
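
The tradeoff above is simple arithmetic: at a fixed bit rate, the average bit budget per frame is the bit rate divided by the frame rate. A minimal sketch, assuming a hypothetical 25 kbps budget (the slides do not state the bit rates used in this study):

```python
def bits_per_frame(bitrate_bps: float, fps: float) -> float:
    """Average number of bits available per frame at a fixed bit rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return bitrate_bps / fps


BITRATE_BPS = 25_000  # hypothetical budget, not a figure from the study

budget_10 = bits_per_frame(BITRATE_BPS, 10)
budget_15 = bits_per_frame(BITRATE_BPS, 15)
ratio = budget_10 / budget_15

print(f"10 fps: {budget_10:.0f} bits/frame")
print(f"15 fps: {budget_15:.0f} bits/frame")
print(f"10 fps gets {ratio:.2f}x the bits per frame of 15 fps")
```

Whatever the bit rate, dropping from 15 fps to 10 fps gives each frame 1.5 times as many bits, which is the extra per-frame clarity that the frame rate is traded against.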

Anna Cavender, cavender_at_cs.washington.edu
15
Implications of results
  • A mid-range ROI was preferred
    • Optimal tradeoff between clarity in face and
      distortion in rest of sign-box
  • Lower frame rate preferred
    • Optimal tradeoff between clarity of frames and
      number of frames per second
  • Results independent of bit rate

Anna Cavender, cavender_at_cs.washington.edu
16
Outline
  • MobileASL Focus Group
  • Eyetracking Motivation
  • User Studies
  • Encoder Complexity
  • Rate, Distortion, Complexity Optimization
  • Activity Recognition
  • Spatial Compression

17
Encoder Complexity
  • Processors on cell phones are limited
  • Encoding videos is computationally intensive
  • Encoder needs to be real-time
  • Simpler encoding parameters are available, but
    quality suffers
  • Faster encoding means lower quality

Dane Barney, dane_at_cs.washington.edu
18
Rate, distortion and complexity optimization
[Diagram labels: Raw video, Input parameters, H.264 encoder, Distortion, Encoding time]
Goal: Best possible quality for the least encoding
time at a given bitrate
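
One way to read the goal above: among candidate encoder parameter settings measured at the same bitrate, keep only those for which no other setting is both faster and higher quality. The sketch below illustrates that selection under assumptions; it is not the authors' optimization method, and the setting names and the (time, PSNR) measurements are made up.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    name: str           # hypothetical label for a parameter setting
    encode_time: float  # seconds per frame (lower is better)
    psnr_db: float      # quality in dB (higher is better)


def pareto_front(points: list[Measurement]) -> list[Measurement]:
    """Return the settings not dominated in (encode_time, psnr_db), fastest first."""
    front = []
    best_psnr = float("-inf")
    # Sweep from fastest to slowest; on equal time, look at higher PSNR first.
    for p in sorted(points, key=lambda m: (m.encode_time, -m.psnr_db)):
        # A slower setting is only worth keeping if it improves quality.
        if p.psnr_db > best_psnr:
            front.append(p)
            best_psnr = p.psnr_db
    return front


# Made-up measurements at one fixed bitrate.
measurements = [
    Measurement("A", 0.50, 30.0),
    Measurement("B", 0.14, 29.6),
    Measurement("C", 0.20, 29.1),  # slower and worse than B, so dominated
    Measurement("D", 0.05, 27.9),
]

front = pareto_front(measurements)
print([m.name for m in front])
```

Picking a point on this front is then a matter of choosing the fastest setting whose quality loss is still acceptable.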
Rahul Vanam, rahulv_at_u.washington.edu
19
Rate, distortion and complexity comparison
  • Testing with 10 ASL videos at 30 kb/s
  • Max difference in PSNR: 0.68 dB
  • Amount of time reduction: 99%
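
For reference, PSNR for 8-bit video is 10 * log10(255^2 / MSE), where MSE is the mean squared error between original and decoded sample values. A minimal sketch on tiny made-up "frames" (flat lists of luma samples), only to show how a PSNR difference in dB is computed; the sample values are not from the study:

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# Made-up luma samples: one source and two decoded versions.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded_slow = [52, 54, 61, 67, 70, 61, 65, 73]  # e.g. slower, higher-quality settings
decoded_fast = [53, 54, 61, 67, 69, 61, 65, 73]  # e.g. faster settings

psnr_slow = psnr(original, decoded_slow)
psnr_fast = psnr(original, decoded_fast)
print(f"slow: {psnr_slow:.2f} dB, fast: {psnr_fast:.2f} dB")
print(f"difference: {psnr_slow - psnr_fast:.2f} dB")
```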
Rahul Vanam, rahulv_at_u.washington.edu
20
Encoder Complexity Study - Speed
21
Implications of Results
  • We can increase the encoding speed from 2 to 7.3
    frames per second without significantly affecting
    intelligibility!
  • We are closer to our goal of 10 fps.

Rahul Vanam, rahulv_at_u.washington.edu and Anna
Cavender, cavender_at_cs.washington.edu
22
Outline
  • MobileASL Focus Group
  • Eyetracking Motivation
  • User Studies
  • Encoder Complexity
  • Rate, Distortion, Complexity Optimization
  • Activity Recognition
  • Spatial Compression

23
Activity Recognition
  • Motivation
    • Finger spelling requires a higher bit rate and/or
      frame rate for intelligibility than signing
    • Don't waste bits when the user is not signing
      (i.e., is listening as opposed to speaking)
  • Goal
    • Recognize these three states: finger spelling,
      signing, not signing
    • Perform recognition in real time

Neva Cherniavsky, nchernia_at_cs.washington.edu
24
Solution
  • Use H.264 motion vectors as input to system
  • Use probabilistic techniques to automatically
    recognize activity
    • Hidden Markov Models
    • Kalman filters
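
To make the Hidden Markov Model idea concrete, here is a minimal Viterbi decoding sketch over the three states named earlier. Everything numeric is an assumption for illustration: the observation is a coarse per-frame motion level (imagined as derived from motion-vector magnitudes), and the start, transition, and emission probabilities are made up rather than trained. It is not the authors' model.

```python
import math

STATES = ["not signing", "signing", "finger spelling"]

# Observation symbols: coarse motion level per frame (assumed feature).
LOW, MEDIUM, HIGH = 0, 1, 2

# Made-up model parameters; a real system would learn these from data.
START = [0.6, 0.3, 0.1]
TRANS = [  # TRANS[i][j] = P(next state j | current state i)
    [0.90, 0.08, 0.02],
    [0.05, 0.85, 0.10],
    [0.05, 0.15, 0.80],
]
EMIT = [  # EMIT[i][o] = P(observation o | state i)
    [0.80, 0.15, 0.05],  # not signing: mostly low motion
    [0.05, 0.25, 0.70],  # signing: mostly high motion (assumption)
    [0.15, 0.70, 0.15],  # finger spelling: mostly medium motion (assumption)
]


def viterbi(observations):
    """Return the most likely state sequence for a list of observation symbols."""
    if not observations:
        return []
    n = len(STATES)
    # Log-probabilities avoid underflow on long sequences.
    score = [math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


frames = [LOW, LOW, LOW, HIGH, HIGH, HIGH, MEDIUM, MEDIUM, MEDIUM]
labels = viterbi(frames)
for level, label in zip(frames, labels):
    print(level, label)
```

Because the transition matrix favors staying in the same state, the decoder smooths over isolated noisy frames instead of switching state on every observation. Note that Viterbi decodes a whole sequence at once; a real-time system would need an online variant.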

Neva Cherniavsky, nchernia_at_cs.washington.edu
25
Outline
  • MobileASL Focus Group
  • Eyetracking Motivation
  • User Studies
  • Encoder Complexity
  • Rate, Distortion, Complexity Optimization
  • Activity Recognition
  • Spatial Compression

26
Spatial Compression
  • Dynamic Region-of-Interest
    • Skin detection algorithms
  • New video quality metric based on ASL
    intelligibility
    • Traditional video quality measures, such as
      PSNR, are not good measures of intelligibility
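
Skin detection can be sketched with a fixed chrominance threshold. The example below converts an RGB pixel to Cb and Cr using the JFIF (full-range BT.601) equations and tests them against a commonly cited skin range (Cb 77 to 127, Cr 133 to 173). This is a generic heuristic chosen for illustration; the slides do not say which skin detection algorithm MobileASL uses, and the thresholds and sample pixels are assumptions.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to (Cb, Cr) using the JFIF full-range BT.601 equations."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


# Assumed thresholds: a commonly cited skin chrominance range, not from the slides.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(r, g, b):
    """Classify one pixel as skin if its chrominance falls inside both ranges."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_mask(image):
    """image: rows of (r, g, b) tuples. Returns rows of booleans."""
    return [[is_skin(*px) for px in row] for row in image]


# Tiny made-up 2x2 image: two skin-like tones, one white pixel, one blue pixel.
image = [
    [(224, 172, 140), (255, 255, 255)],
    [(0, 0, 255), (198, 134, 66)],
]
mask = skin_mask(image)
for row in mask:
    print(row)
```

A mask like this could mark the face and hands as the region of interest for each frame, so the region follows the signer instead of staying fixed.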

Frank Ciaramello, fmc3_at_cornell.edu
27
Conclusion
  • Video compression for ASL communication using
    video cell phones over current U.S. cell phone
    network

Challenges
  • Limited network bandwidth
  • Limited processing power on cell phones

28
MobileASL
  • Intelligibility of Sign Language as Constrained
    by Mobile Phone Technology

http://www.cs.washington.edu/research/MobileASL
Come see our poster and talk to Anna and Neva
in room 615
Supported by NSF CCF-0514353 and an NSF graduate
fellowship