MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology

Slides: 29

Provided by: ann45

Learn more at: https://www.cs.washington.edu

Transcript and Presenter's Notes

1
MobileASL: Intelligibility of Sign Language as
Constrained by Mobile Phone Technology

Richard Ladner, Eve Riskin
Dane Barney, Anna Cavender, Neva Cherniavsky,
Jessica DeWitt, Rahul Vanam
University of Washington
Sheila Hemami, Frank Ciaramello
Cornell University

2
ASL

ASL is the preferred language for over 1,000,000
Deaf people in the U.S.
ASL is not a code for English
Signs usually occur within the sign-box
Composed of location, orientation, and shape of
hands and arms, plus facial expressions
Usually uses 2 hands, but one-handed signing is
not uncommon

3
Our goal

ASL communication using video cell phones over
current U.S. cell phone network

Challenges

Limited network bandwidth
Limited processing power on cell phones

4
Cell Phone Network Constraints

MobileASL is about fair access to the current
network
As soon as possible, no special accommodations
Not geographically limited
Lower bitrate = more accessible
3G (3rd Generation)
Special service, perhaps more expensive
Not yet widespread
Will still have congestion
Low bit rate goal
GPRS (General Packet Radio Service)
Ranges from 30kbps to 80kbps (download)
Perhaps half that for upload

5
Codec Used: x264

Open source implementation of H.264 standard
Doubles compression ratio over MPEG-2
x264 offers faster encoding
Off-the-shelf H.264 decoder can be used

6
Outline

MobileASL Focus Group
Eyetracking Motivation
User Studies
Encoder Complexity
Rate, Distortion, Complexity Optimization
Activity Recognition
Spatial Compression

7
MobileASL Focus Group

4 Deaf people, mid-20s to mid-40s,
1 hour
Open ended questions
Physical Setup
Camera, distance, ...
Features
Compatibility, text, ...
Privacy Concerns
ASL is a visual language
Scenarios
Lighting, driving, relay services, ...

Anna Cavender, cavender_at_cs.washington.edu
8
Implications of Focus Group

"I don't foresee any limitations. I would use
the phone anywhere: the grocery store, the bus,
the car, a restaurant, anywhere!"
There is a need within the Deaf Community for
mobile ASL conversations
Existing video phone technology (with minor
modifications) would be usable

Anna Cavender, cavender_at_cs.washington.edu
9
Outline

MobileASL Focus Group
Eyetracking Motivation
User Studies
Encoder Complexity
Rate, Distortion, Complexity Optimization
Activity Recognition
Spatial Compression

10
Eyetracking Studies

Participants watched ASL videos while eye
movements were tracked
Important regions of the video could be encoded
differently

Muir et al. (2005) and Agrafiotis et al. (2003)
Anna Cavender, cavender_at_cs.washington.edu
11
Eyetracking Results

95% of eye movements within 2 degrees visual
angle of the signer's face (demo)
Implications: Face region of video is most
visually important
Detailed grammar in face requires foveal vision
Hands and arms can be viewed in peripheral vision

Anna Cavender, cavender_at_cs.washington.edu
12
Outline

MobileASL Focus Group
Eyetracking Motivation
User Studies
Encoder Complexity
Rate, Distortion, Complexity Optimization
Activity Recognition
Spatial Compression

13
Video Phone Study: ROI

Varied quality in fixed-sized region around the
face

2x quality in face
4x quality in face
Anna Cavender, cavender_at_cs.washington.edu
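
One way to picture "varied quality in a fixed-sized region around the face" is a per-macroblock quantizer offset map. The sketch below is only an illustration under stated assumptions, not the encoder change used in the study: the 176x144 frame size, the face box coordinates, and the offset of -6 are all hypothetical. In H.264 the quantizer step size doubles for every increase of 6 in QP, so an offset of -6 halves the step size inside the region.

```python
MB = 16  # H.264 macroblock size in pixels


def roi_qp_offsets(width, height, face_box, face_offset=-6):
    """Return a 2D list of QP offsets, one per macroblock.

    face_box is (left, top, right, bottom) in pixels, with right and
    bottom exclusive. Macroblocks that overlap the box get face_offset
    (negative means finer quantization); all others get 0.
    """
    cols = (width + MB - 1) // MB
    rows = (height + MB - 1) // MB
    left, top, right, bottom = face_box
    offsets = []
    for r in range(rows):
        row = []
        for c in range(cols):
            x0, y0 = c * MB, r * MB
            overlaps = (x0 < right and x0 + MB > left
                        and y0 < bottom and y0 + MB > top)
            row.append(face_offset if overlaps else 0)
        offsets.append(row)
    return offsets


# Hypothetical frame size and face box, chosen only for illustration.
grid = roi_qp_offsets(176, 144, face_box=(64, 16, 112, 64))
for row in grid:
    print(" ".join(f"{v:3d}" for v in row))
```

A larger negative offset sharpens the face further but, at a fixed bit rate, leaves fewer bits for the rest of the sign-box, which is the tradeoff the study varied.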
14
Video Phone Study: FPS

Varied frame rate: 10 fps and 15 fps
For a given bit rate:
Fewer frames = more bits per frame

Anna Cavender, cavender_at_cs.washington.edu
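
The frame rate tradeoff is simple arithmetic: at a constant bit rate, the average budget per frame is the bit rate divided by the frame rate. A small sketch, assuming 30 kb/s (the low end of the GPRS range quoted earlier) and 1 kb = 1000 bits; the function name is made up for this example.

```python
def bits_per_frame(bitrate_kbps: float, fps: float) -> float:
    """Average bit budget per frame at a constant bit rate."""
    return bitrate_kbps * 1000 / fps


for fps in (10, 15):
    budget = bits_per_frame(30, fps)
    print(f"{fps} fps at 30 kb/s: {budget:.0f} bits per frame")
# 10 fps at 30 kb/s: 3000 bits per frame
# 15 fps at 30 kb/s: 2000 bits per frame
```

Dropping from 15 fps to 10 fps gives each frame 50% more bits at the same bit rate.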
15
Implications of results

A mid-range ROI was preferred
Optimal tradeoff between clarity in face and
distortion in rest of sign-box
Lower frame rate preferred
Optimal tradeoff between clarity of frames and
number of frames per second
Results independent of bit rate

Anna Cavender, cavender_at_cs.washington.edu
16
Outline

MobileASL Focus Group
Eyetracking Motivation
User Studies
Encoder Complexity
Rate, Distortion, Complexity Optimization
Activity Recognition
Spatial Compression

17
Encoder Complexity

Processors on cell phones are limited
Encoding videos is computationally intensive
Encoder needs to be real-time
Simpler encoding parameters are available, but
quality suffers
Faster encoding = less quality

Dane Barney, dane_at_cs.washington.edu
18
Rate, distortion and complexity optimization
[Diagram: raw video and input parameters go into the H.264 encoder; distortion and encoding time come out]
Goal: Best possible quality for least encoding
time at a given bitrate
Rahul Vanam, rahulv_at_u.washington.edu
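
The goal on this slide can be read as a search over encoder parameter settings for the ones that are not beaten on both quality and encoding time. The sketch below shows that idea as a plain Pareto filter. It is an assumption about how such a comparison could be organized, not the optimization method from the talk, and the candidate measurements are invented for illustration.

```python
def pareto_front(settings):
    """Keep the settings that are not dominated in (encoding time, PSNR).

    Each setting is (label, seconds_per_frame, psnr_db). A setting is
    dominated if another one is at least as fast and at least as good
    in PSNR, and strictly better in at least one of the two.
    """
    front = []
    for name, t, psnr in settings:
        dominated = any(
            (t2 <= t and p2 >= psnr) and (t2 < t or p2 > psnr)
            for _, t2, p2 in settings
        )
        if not dominated:
            front.append((name, t, psnr))
    # Fastest setting first.
    return sorted(front, key=lambda s: s[1])


# Invented measurements at one fixed bit rate, not results from the study.
candidates = [
    ("A", 0.50, 30.0),
    ("B", 0.14, 29.6),
    ("C", 0.20, 29.5),
    ("D", 0.10, 28.0),
]
front = pareto_front(candidates)
print([name for name, _, _ in front])  # ['D', 'B', 'A']
```

Setting C drops out because B is both faster and higher in PSNR; the remaining settings are the ones worth comparing for intelligibility.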
19
Rate, distortion and complexity comparison
Testing with 10 ASL videos at 30 kb/s
Max difference in PSNR: 0.68 dB
Amount of time reduction: 99%
Rahul Vanam, rahulv_at_u.washington.edu
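
For reference, PSNR for 8-bit video is 10 * log10(255^2 / MSE), where MSE is the mean squared error between the original and decoded pixels. A minimal sketch on flat lists of pixel values; the function name and the sample values are made up for this example.

```python
import math


def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# Two pixels, each off by 10 grey levels: MSE = 100.
print(f"{psnr([100, 100], [110, 90]):.2f} dB")
```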
20
Encoder Complexity Study - Speed
21
Implications of Results

We can increase the encoding speed from 2 to 7.3
frames per second without significantly affecting
intelligibility!
We are closer to our goal of 10 fps.

Rahul Vanam, rahulv_at_u.washington.edu and Anna
Cavender, cavender_at_cs.washington.edu
22
Outline

MobileASL Focus Group
Eyetracking Motivation
User Studies
Encoder Complexity
Rate, Distortion, Complexity Optimization
Activity Recognition
Spatial Compression

23
Activity Recognition

Motivation
Finger spelling requires a higher bit rate and/or
frame rate for intelligibility than signing
Don't waste bits when the user is not signing
(i.e., is listening as opposed to speaking)
Goal
Recognize these three states: finger spelling,
signing, not signing
Perform recognition in real time

Neva Cherniavsky, nchernia_at_cs.washington.edu
24
Solution

Use H.264 motion vectors as input to system
Use probabilistic techniques to automatically
recognize activity
Hidden Markov Models
Kalman filters

Neva Cherniavsky, nchernia_at_cs.washington.edu
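
To make the Hidden Markov Model idea concrete, here is a minimal Viterbi decoder over the three states named earlier. Everything numeric in it is an assumption for illustration: the "low" / "medium" / "high" observation symbols stand in for some hypothetical per-frame summary of motion vector magnitude, and the start, transition, and emission probabilities are invented, not trained values from the project.

```python
import math

STATES = ["not signing", "signing", "finger spelling"]

# Invented model parameters, for illustration only.
START = {"not signing": 0.5, "signing": 0.4, "finger spelling": 0.1}
TRANS = {
    "not signing": {"not signing": 0.8, "signing": 0.15, "finger spelling": 0.05},
    "signing": {"not signing": 0.1, "signing": 0.8, "finger spelling": 0.1},
    "finger spelling": {"not signing": 0.05, "signing": 0.25, "finger spelling": 0.7},
}
EMIT = {
    "not signing": {"low": 0.8, "medium": 0.15, "high": 0.05},
    "signing": {"low": 0.1, "medium": 0.3, "high": 0.6},
    "finger spelling": {"low": 0.2, "medium": 0.7, "high": 0.1},
}


def viterbi(observations):
    """Return the most likely state sequence for a list of observation symbols."""
    # Log-probability of the best path ending in each state.
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}
    back = []  # back[t][s] is the best predecessor of state s at step t + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][obs]))
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


path = viterbi(["low", "low", "high", "high", "medium", "medium", "medium"])
print(path)
```

Viterbi needs the whole observation sequence before it traces back, so a real-time system would more likely use forward filtering or a short fixed lag; this sketch only shows how the states and observations fit together.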
25
Outline

MobileASL Focus Group
Eyetracking Motivation
User Studies
Encoder Complexity
Rate, Distortion, Complexity Optimization
Activity Recognition
Spatial Compression

26
Spatial Compression

Dynamic Region-of-Interest
Skin detection algorithms
New video quality metric based on ASL
intelligibility
Traditional video quality measures, such as PSNR,
are not good measures of intelligibility

Frank Ciaramello, fmc3_at_cornell.edu
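
As a rough illustration of what a skin detection step might look like, the sketch below classifies a pixel by its chroma values after converting RGB to YCbCr (BT.601, full range). The Cb and Cr ranges are thresholds often quoted in the skin-detection literature and are used here as an assumption; the slides do not say which algorithm the project used, and fixed thresholds like these are sensitive to lighting.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to the Cb and Cr chroma components (BT.601, full range)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Return True if the pixel's chroma falls inside the assumed skin ranges."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return (cb_range[0] <= cb <= cb_range[1]
            and cr_range[0] <= cr <= cr_range[1])


print(is_skin(224, 172, 140))  # a light skin tone
print(is_skin(0, 0, 255))      # pure blue
```

A dynamic region of interest could then be built from the macroblocks that contain mostly skin pixels, instead of a fixed box around the face.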
27
Conclusion

Video compression for ASL communication using
video cell phones over current U.S. cell phone
network

Challenges

Limited network bandwidth
Limited processing power on cell phones

28
MobileASL

Intelligibility of Sign Language as Constrained
by Mobile Phone Technology

http://www.cs.washington.edu/research/MobileASL
Come see our poster and talk to Anna and Neva
in room 615
Supported by NSF CCF-0514353 and an NSF graduate
fellowship
