UC Berkeley

1
Recognizing Action at a Distance

A.A. Efros, A.C. Berg, G. Mori, J. Malik
UC Berkeley

2
Looking at People
Far field
Near field

3-pixel man
Blob tracking
vast surveillance literature

300-pixel man
Limb tracking
e.g. Yacoob & Black, Rao & Shah, etc.

3
Medium-field Recognition
4
Appearance vs. Motion
5
Goals

Recognize human actions at a distance
Low resolution, noisy data
Moving camera, occlusions
Wide range of actions (including non-periodic)

6
Our Approach

Motion-based approach
Non-parametric: use a large amount of data
Classify a novel motion by finding the most
similar motion from the training set
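The non-parametric idea above amounts to nearest-neighbour classification. A minimal sketch, assuming each motion has already been reduced to a fixed-length descriptor vector and that Euclidean distance is an acceptable stand-in for the similarity measure (the function name and toy data are hypothetical):

```python
import math

def nearest_neighbor_label(query, training_set):
    """Return the label of the training descriptor closest to `query`.

    `training_set` is a list of (descriptor, label) pairs with equal-length
    descriptors. Euclidean distance is an assumption made for this sketch.
    """
    best_label, best_dist = None, math.inf
    for descriptor, label in training_set:
        dist = math.dist(query, descriptor)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy training set: two-dimensional descriptors with action labels.
training = [([0.0, 1.0], "walk"), ([1.0, 0.0], "run"), ([0.9, 0.2], "run")]
print(nearest_neighbor_label([0.1, 0.8], training))
```

No model is fitted here; adding more labelled clips to the training set is the only way the classifier improves.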

Related Work

Periodicity analysis
Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al.
Model-free
Temporal Templates: Bobick & Davis
Orientation histograms: Freeman et al.; Zelnik & Irani
Using MoCap data: Zhao & Nevatia; Ramanan & Forsyth

7
Gathering action data

Tracking
Simple correlation-based tracker
User-initialized
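A correlation-based tracker of this kind can be sketched as a local search for the window that best matches a template. The version below is an illustration only: it assumes grayscale frames stored as lists of rows, normalized cross-correlation as the score, and a user-supplied starting position; the names `ncc`, `track` and `radius` are hypothetical.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between two equal-size 2D patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def track(frame, template, prev_pos, radius=2):
    """Search a small window around prev_pos=(row, col) for the best match."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = prev_pos, -math.inf
    for r in range(prev_pos[0] - radius, prev_pos[0] + radius + 1):
        for c in range(prev_pos[1] - radius, prev_pos[1] + radius + 1):
            # Skip candidate windows that fall outside the frame.
            if r < 0 or c < 0 or r + th > len(frame) or c + tw > len(frame[0]):
                continue
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos
```

For the first frame, `prev_pos` would come from the user's click; afterwards each frame's result seeds the search in the next one.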

8
Figure-centric Representation

Stabilized spatio-temporal volume
No translation information
All motion caused by the person's limbs
Good news: indifferent to camera motion
Bad news: hard!
Good test to see if actions, not just
translation, are being captured
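The stabilized volume can be pictured as a stack of crops, one per frame, each centred on the tracked figure. A minimal sketch, assuming frames are lists of rows and the tracker supplies one (row, col) centre per frame; the zero padding and the function names are choices made for this example:

```python
def crop_centered(frame, center, half):
    """Crop a (2*half+1)-square window around center=(row, col), zero-padded."""
    rows, cols = len(frame), len(frame[0])
    window = []
    for r in range(center[0] - half, center[0] + half + 1):
        row = []
        for c in range(center[1] - half, center[1] + half + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(frame[r][c] if inside else 0)
        window.append(row)
    return window

def stabilize(frames, centers, half=1):
    """Stack figure-centred crops into a spatio-temporal volume."""
    return [crop_centered(f, c, half) for f, c in zip(frames, centers)]
```

A figure that only translates across the image produces identical crops, so whatever motion remains in the volume comes from the limbs.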

9
Remembrance of Things Past

Explain novel motion sequence by matching to
previously seen video clips
For each frame, match based on some temporal
extent

input sequence
Challenge: how to compare motions?
10
How to describe motion?

Appearance
Not preserved across different clothing
Gradients (spatial, temporal)
Same problem (e.g. contrast reversal)
Edges/Silhouettes
Too unreliable
Optical flow
Explicitly encodes motion
Least affected by appearance
but too noisy

11
Spatial Motion Descriptor
Image frame
Optical flow
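The slide does not spell out how the noisy flow is turned into a descriptor. One plausible recipe, stated here as an assumption, is to split the flow into four non-negative (half-wave rectified) channels and blur each one so that small misalignments matter less. The box blur below stands in for whatever smoothing kernel is actually used, and all names are hypothetical.

```python
def half_wave_channels(flow_x, flow_y):
    """Split a flow field into four non-negative channels: Fx+, Fx-, Fy+, Fy-."""
    def pos(field):
        return [[max(v, 0.0) for v in row] for row in field]
    def neg(field):
        return [[max(-v, 0.0) for v in row] for row in field]
    return pos(flow_x), neg(flow_x), pos(flow_y), neg(flow_y)

def box_blur(channel):
    """3x3 box blur with edge clamping (a stand-in for a Gaussian blur)."""
    rows, cols = len(channel), len(channel[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += channel[rr][cc]
            out[r][c] = total / 9.0
    return out

def spatial_descriptor(flow_x, flow_y):
    """Blurred half-wave rectified channels for one frame."""
    return [box_blur(ch) for ch in half_wave_channels(flow_x, flow_y)]
```

Rectifying before blurring keeps opposite motions from cancelling each other out when they are averaged.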
12
Spatio-temporal Motion Descriptor

Sequence A
S

Sequence B
t
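One way to read the diagram: compute a frame-to-frame similarity matrix S between sequences A and B, then score each frame pair by summing S along the diagonal over a temporal window of length t. The sketch below assumes flattened per-frame descriptors and a plain dot product as the frame similarity; both are assumptions made for illustration.

```python
def frame_similarity(desc_a, desc_b):
    """Dot product between two flattened per-frame descriptors."""
    return sum(x * y for x, y in zip(desc_a, desc_b))

def motion_similarity(seq_a, seq_b, half_window=1):
    """Sum frame similarities along the diagonal of a temporal window."""
    n, m = len(seq_a), len(seq_b)
    frame_sim = [[frame_similarity(a, b) for b in seq_b] for a in seq_a]
    motion_sim = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(-half_window, half_window + 1):
                # Only count neighbours that exist in both sequences.
                if 0 <= i + k < n and 0 <= j + k < m:
                    motion_sim[i][j] += frame_sim[i + k][j + k]
    return motion_sim
```

With `half_window=6` the window spans 13 frames. Two frames that look alike in isolation score lower when their temporal neighbours disagree.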
13
Football Actions matching
Input Sequence
Matched Frames
input
matched
14
Football Actions classification
10 actions; 4500 total frames; 13-frame motion descriptor
15
Classifying Ballet Actions
16 actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.
16
Classifying Tennis Actions
6 actions; 4600 frames; 7-frame motion descriptor. Woman player used as training, man as testing.
17
Classifying Tennis

Red bars show classification results

18
Querying the Database
input sequence
database
19
2D Skeleton Transfer

We annotate database with 2D joint positions
After matching, transfer data to novel sequence
Adjust the match for best fit
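The transfer step can be sketched as a lookup followed by a shift: pick the best-matching database frame, then move its annotated joints into the novel frame's coordinates. The code assumes joints are (x, y) pairs and that each frame has a known figure-window centre; the best-fit adjustment mentioned above is not modelled, and all names are hypothetical.

```python
def best_match(similarity_row):
    """Index of the database frame with the highest similarity score."""
    return max(range(len(similarity_row)), key=similarity_row.__getitem__)

def transfer_skeleton(matched_joints, matched_center, novel_center):
    """Shift annotated 2D joints from the matched frame to the novel frame."""
    dx = novel_center[0] - matched_center[0]
    dy = novel_center[1] - matched_center[1]
    return [(x + dx, y + dy) for x, y in matched_joints]
```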

Input sequence
Transferred 2D skeletons
20
3D Skeleton Transfer

We populate database with rendered stick figures
from 3D Motion Capture data
Matching as before, we get 3D joint positions
(kind of)!

Input sequence
Transferred 3D skeletons
21
"Do as I Do" Motion Synthesis
input sequence
synthetic sequence

Matching two things
Motion similarity across sequences
Appearance similarity within sequence (like
VideoTextures)
Dynamic Programming
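The two matching terms can be combined in a Viterbi-style dynamic program: choose one target frame per source frame so that the motion mismatch plus the appearance discontinuity between consecutive output frames is minimal. This is a sketch under assumed inputs (two precomputed cost tables and a hypothetical `weight` that balances them), not necessarily the exact formulation behind the slide.

```python
def synthesize_path(motion_cost, transition_cost, weight=1.0):
    """Pick one target frame per source frame, minimising total cost.

    motion_cost[t][j]: mismatch between source frame t and target frame j.
    transition_cost[i][j]: appearance discontinuity when target frame j
    follows target frame i in the output.
    """
    n_src, n_tgt = len(motion_cost), len(motion_cost[0])
    best = [list(motion_cost[0])]   # best[t][j]: cheapest path ending at j
    back = []                       # back-pointers for path recovery
    for t in range(1, n_src):
        row, ptr = [], []
        for j in range(n_tgt):
            i_best = min(range(n_tgt),
                         key=lambda i: best[-1][i] + weight * transition_cost[i][j])
            row.append(best[-1][i_best] + weight * transition_cost[i_best][j]
                       + motion_cost[t][j])
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final state.
    j = min(range(n_tgt), key=lambda k: best[-1][k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]
```

Setting `weight` to zero reduces this to independent per-frame matching; larger values favour smooth runs of consecutive target frames.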

22
Do as I Do
Source Motion
Source Appearance
3400 Frames
Result
23
"Do as I Say" Synthesis
run, walk left, swing, walk right, jog
run
jog
swing
walk right
walk left
synthetic sequence

Synthesize given action labels
e.g. video game control

24
Do as I Say

Red box shows when constraint is applied

25
Actor Replacement
SHOW VIDEO (GregWorldCup.avi, DivX)
26
Conclusions

In the medium field, action is about motion
What we propose
A way of matching motions at coarse scale
What we get out
Action recognition
Skeleton transfer
Synthesis: "Do as I Do", "Do as I Say"
What we learned?
A lot to be said for the little guy!
