Title: MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology
1. MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology
- Richard Ladner, Eve Riskin
- Dane Barney, Anna Cavender, Neva Cherniavsky, Jessica DeWitt, Rahul Vanam
- University of Washington
- Sheila Hemami, Frank Ciaramello
- Cornell University
2. ASL
- ASL is the preferred language for over 1,000,000 Deaf people in the U.S.
- ASL is not a code for English
- Signs usually occur within the sign-box
- Composed of location, orientation, and shape of hands and arms, plus facial expressions
- Usually uses 2 hands, but one-handed signing is not uncommon
3. Our goal
- ASL communication using video cell phones over the current U.S. cell phone network
Challenges
- Limited network bandwidth
- Limited processing power on cell phones
4. Cell Phone Network Constraints
- MobileASL is about fair access to the current network
- As soon as possible, no special accommodations
- Not geographically limited
- Lower bitrate means more accessible
- 3G (3rd Generation)
- Special service, perhaps more expensive
- Not yet widespread
- Will still have congestion
- Low bit rate goal
- GPRS (General Packet Radio Service)
- Ranges from 30 kbps to 80 kbps (download)
- Perhaps half that for upload
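As a rough illustration of the upload budget these figures imply, the sketch below halves the quoted GPRS download range. The function name and the exact factor of one half are assumptions; the slide only says "perhaps half".

```python
def estimated_upload_range_kbps(download_low=30.0, download_high=80.0, upload_fraction=0.5):
    """Estimate the upload range from the quoted GPRS download range.

    upload_fraction=0.5 encodes the slide's "perhaps half that for upload";
    it is an assumption, not a measured value.
    """
    return download_low * upload_fraction, download_high * upload_fraction


low, high = estimated_upload_range_kbps()
print(f"Estimated upload budget: {low:.0f} to {high:.0f} kbps")
```

Because a signer's own video travels on the upload link, this smaller figure is the one the encoder has to fit into.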
5. Codec Used: x264
- Open-source implementation of the H.264 standard
- Doubles compression ratio over MPEG-2
- x264 offers faster encoding
- Off-the-shelf H.264 decoder can be used
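One off-the-shelf way to try x264 at these rates is through the ffmpeg command-line tool with its libx264 encoder. The deck does not say how the project invoked x264, so the sketch below is only an assumption: it builds (but does not run) an ffmpeg command, the file names are hypothetical, and the 30 kbps and 10 fps defaults are borrowed from figures quoted elsewhere in the deck.

```python
import shlex


def build_x264_command(src, dst, bitrate_kbps=30, fps=10):
    """Build an ffmpeg argument list for a low-bit-rate H.264 encode.

    Assumes an ffmpeg build with libx264 is installed; `src` and `dst`
    are hypothetical file names.
    """
    return [
        "ffmpeg",
        "-i", src,                     # input video
        "-an",                         # drop audio
        "-c:v", "libx264",             # encode with x264
        "-b:v", f"{bitrate_kbps}k",    # target video bit rate
        "-r", str(fps),                # output frame rate
        dst,
    ]


cmd = build_x264_command("signer.avi", "signer_30k.mp4")
print(shlex.join(cmd))
# To run the encode: subprocess.run(cmd, check=True)
```

The output is a standard H.264 stream, so any conforming decoder should be able to play it.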
6. Outline
- MobileASL Focus Group
- Eyetracking Motivation
- User Studies
- Encoder Complexity
- Rate, Distortion, Complexity Optimization
- Activity Recognition
- Spatial Compression
7. MobileASL Focus Group
- 4 Deaf people, mid-20s to mid-40s
- 1 hour
- Open-ended questions
- Physical Setup
- Camera, distance
- Features
- Compatibility, text
- Privacy Concerns
- ASL is a visual language
- Scenarios
- Lighting, driving, relay services
Anna Cavender, cavender_at_cs.washington.edu
8. Implications of Focus Group
- "I don't foresee any limitations. I would use the phone anywhere: the grocery store, the bus, the car, a restaurant, anywhere!"
- There is a need within the Deaf Community for mobile ASL conversations
- Existing video phone technology (with minor modifications) would be usable
Anna Cavender, cavender_at_cs.washington.edu
10. Eyetracking Studies
- Participants watched ASL videos while eye movements were tracked
- Important regions of the video could be encoded differently
Muir et al. (2005) and Agrafiotis et al. (2003)
Anna Cavender, cavender_at_cs.washington.edu
11. Eyetracking Results
- 95% of eye movements within 2 degrees visual angle of the signer's face (demo)
- Implications: Face region of video is most visually important
- Detailed grammar in face requires foveal vision
- Hands and arms can be viewed in peripheral vision
Anna Cavender, cavender_at_cs.washington.edu
13. Video Phone Study: ROI
- Varied quality in fixed-size region around the face
2x quality in face
4x quality in face
Anna Cavender, cavender_at_cs.washington.edu
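One way to picture a face region of interest is as a per-macroblock quantizer offset. In H.264 the quantizer step size doubles for every increase of 6 in the quantization parameter (QP), so an offset of -6 halves the step size in the face. Reading "2x" and "4x" quality as QP offsets of -6 and -12 is an assumption about what the slide means, and the frame size and face box below are hypothetical.

```python
import math

MB_SIZE = 16  # H.264 macroblocks are 16x16 luma samples


def qp_offset_for_quality(factor):
    """QP offset that divides the quantizer step size by `factor`.

    The H.264 step size doubles for every +6 in QP, so a factor of 2
    maps to -6 and a factor of 4 maps to -12.
    """
    return -round(6 * math.log2(factor))


def roi_qp_offsets(width, height, face_box, factor):
    """Return a per-macroblock grid of QP offsets.

    face_box is (left, top, right, bottom) in pixels; it is a hypothetical
    fixed region, as if supplied by hand rather than by a face detector.
    """
    left, top, right, bottom = face_box
    cols = math.ceil(width / MB_SIZE)
    rows = math.ceil(height / MB_SIZE)
    offset = qp_offset_for_quality(factor)
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            x0, y0 = c * MB_SIZE, r * MB_SIZE
            # A macroblock gets the offset if it overlaps the face box at all.
            overlaps = (x0 < right and x0 + MB_SIZE > left
                        and y0 < bottom and y0 + MB_SIZE > top)
            row.append(offset if overlaps else 0)
        grid.append(row)
    return grid


grid = roi_qp_offsets(176, 144, face_box=(64, 16, 112, 64), factor=2)
for row in grid:
    print(" ".join(f"{v:3d}" for v in row))
```

At a fixed bit rate the extra bits spent on the face have to come from the rest of the frame, which is the tradeoff the study measured.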
14. Video Phone Study: FPS
- Varied frame rate: 10 fps and 15 fps
- For a given bit rate
- Fewer frames means more bits per frame
Anna Cavender, cavender_at_cs.washington.edu
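The frame-rate tradeoff is simple arithmetic: at a fixed bit rate, the average bit budget per frame is the bit rate divided by the frame rate. The sketch below works this out for the two frame rates in the study; the 30 kbps figure is only an example value, and the function name is made up for illustration.

```python
def average_bits_per_frame(bitrate_kbps, fps):
    """Average bits available per frame at a fixed bit rate (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 / fps


for fps in (10, 15):
    bits = average_bits_per_frame(30, fps)
    print(f"{fps} fps at 30 kbps: {bits:.0f} bits per frame")
```

Dropping from 15 fps to 10 fps gives each frame half again as many bits on average, which is why individual frames look clearer.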
15. Implications of results
- A mid-range ROI was preferred
- Optimal tradeoff between clarity in face and distortion in rest of sign-box
- Lower frame rate preferred
- Optimal tradeoff between clarity of frames and number of frames per second
- Results independent of bit rate
Anna Cavender, cavender_at_cs.washington.edu
17. Encoder Complexity
- Processors on cell phones are limited
- Encoding videos is computationally intensive
- Encoder needs to be real-time
- Simpler encoding parameters are available, but quality suffers
- Faster encoding means less quality
Dane Barney, dane_at_cs.washington.edu
18. Rate, distortion and complexity optimization
[Diagram labels: raw video, input parameters, H.264 encoder, distortion, encoding time]
Goal: best possible quality for the least encoding time at a given bitrate
Rahul Vanam, rahulv_at_u.washington.edu
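The stated goal can be read as a search for encoder settings that are not beaten on both encoding time and quality at the same bit rate. The sketch below is a generic Pareto filter, not the optimization method used in the project, which the deck does not describe; the setting names and measurements are invented for illustration.

```python
from typing import NamedTuple


class Setting(NamedTuple):
    name: str                 # hypothetical label for a set of encoder parameters
    seconds_per_frame: float  # encoding time
    psnr_db: float            # quality at the fixed bit rate


def pareto_front(settings):
    """Keep settings that no other setting beats on both time and quality."""
    front = []
    for s in settings:
        dominated = any(
            o.seconds_per_frame <= s.seconds_per_frame
            and o.psnr_db >= s.psnr_db
            and (o.seconds_per_frame < s.seconds_per_frame or o.psnr_db > s.psnr_db)
            for o in settings
        )
        if not dominated:
            front.append(s)
    return sorted(front, key=lambda s: s.seconds_per_frame)


# Made-up measurements at one fixed bit rate, for illustration only.
measurements = [
    Setting("A", 0.50, 30.0),
    Setting("B", 0.20, 29.8),
    Setting("C", 0.25, 29.5),  # slower and worse than B, so dominated
    Setting("D", 0.10, 29.0),
]
for s in pareto_front(measurements):
    print(s.name, s.seconds_per_frame, s.psnr_db)
```

Any setting left on the front is a defensible choice; which one to use depends on how much quality one is willing to trade for speed.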
19. Rate, distortion and complexity comparison
Testing with 10 ASL videos at 30 kb/s
Max difference in PSNR: 0.68 dB
Amount of time reduction: 99%
Rahul Vanam, rahulv_at_u.washington.edu
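For readers unfamiliar with the metric, PSNR is computed from the mean squared error between the original and decoded samples: PSNR = 10 log10(peak^2 / MSE), with peak = 255 for 8-bit video. The sketch below is a minimal pure-Python version over flat lists of sample values; the sample data is made up.

```python
import math


def psnr_db(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of 8-bit sample values."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


ref = [52, 55, 61, 66]   # made-up original samples
out = [54, 55, 60, 65]   # made-up decoded samples
print(f"{psnr_db(ref, out):.2f} dB")
```

A difference of a fraction of a dB between two encodes, as quoted on the slide, corresponds to a small change in mean squared error.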
20. Encoder Complexity Study - Speed
21. Implications of Results
- We can increase the encoding speed from 2 to 7.3 frames per second without significantly affecting intelligibility!
- We are closer to our goal of 10 fps.
Rahul Vanam, rahulv_at_u.washington.edu and Anna Cavender, cavender_at_cs.washington.edu
23. Activity Recognition
- Motivation
- Finger spelling requires a higher bit rate and/or frame rate for intelligibility than signing
- Don't waste bits when the user is not signing (i.e., is listening as opposed to speaking)
- Goal
- Recognize these three states: finger spelling, signing, not signing
- Perform recognition in real time
Neva Cherniavsky, nchernia_at_cs.washington.edu
24. Solution
- Use H.264 motion vectors as input to system
- Use probabilistic techniques to automatically recognize activity
- Hidden Markov Models
- Kalman filters
Neva Cherniavsky, nchernia_at_cs.washington.edu
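To make the Hidden Markov Model idea concrete, the sketch below runs a generic Viterbi decoder over the three states named earlier. The deck gives no model details, so everything here is an assumption: the per-frame observation is a motion level quantized from the motion vectors into three made-up bins, and all probabilities are invented placeholders rather than trained values.

```python
import math

STATES = ("not signing", "signing", "finger spelling")

# All probabilities below are made-up placeholders, not trained values.
START = {"not signing": 0.6, "signing": 0.3, "finger spelling": 0.1}
TRANS = {
    "not signing":     {"not signing": 0.90, "signing": 0.08, "finger spelling": 0.02},
    "signing":         {"not signing": 0.05, "signing": 0.85, "finger spelling": 0.10},
    "finger spelling": {"not signing": 0.05, "signing": 0.25, "finger spelling": 0.70},
}
# Observation: hypothetical per-frame motion level quantized from motion vectors
# (0 = little motion, 1 = small localized motion, 2 = large motion).
EMIT = {
    "not signing":     [0.80, 0.15, 0.05],
    "signing":         [0.05, 0.25, 0.70],
    "finger spelling": [0.10, 0.70, 0.20],
}


def viterbi(observations):
    """Return the most likely state sequence for a list of motion levels."""
    if not observations:
        return []
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi([0, 0, 2, 2, 2, 1, 1, 1, 0, 0]))
```

This version decodes a whole sequence after the fact; a real-time system would need an online variant that commits to a state with bounded delay.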
26. Spatial Compression
- Dynamic Region-of-Interest
- Skin detection algorithms
- New video quality metric based on ASL intelligibility
- Traditional video quality measures, such as PSNR, are not good measures of intelligibility
Frank Ciaramello, fmc3_at_cornell.edu
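The deck does not say which skin detection algorithm is used. As a minimal sketch of the general idea, the code below classifies a single pixel with fixed chrominance thresholds after converting RGB to YCbCr. The threshold ranges are a commonly quoted heuristic and are an assumption here, not the project's method; they ignore lighting changes.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (ITU-R BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify one pixel as skin using fixed chrominance thresholds.

    The default ranges are illustrative assumptions; luma is ignored so
    that brightness alone does not change the decision.
    """
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


for pixel in [(200, 150, 120), (0, 0, 255), (255, 255, 255)]:
    print(pixel, is_skin(*pixel))
```

Applied to every pixel, a mask like this could mark the face and hands so that a dynamic region of interest follows them from frame to frame.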
27. Conclusion
- Video compression for ASL communication using video cell phones over the current U.S. cell phone network
Challenges
- Limited network bandwidth
- Limited processing power on cell phones
28. MobileASL
- Intelligibility of Sign Language as Constrained by Mobile Phone Technology
http://www.cs.washington.edu/research/MobileASL
Come see our poster and talk to Anna and Neva in room 615
Supported by NSF CCF-0514353 and an NSF graduate fellowship