# Recognizing and Tracking Human Action

Title: Recognizing and Tracking Human Action

1
Recognizing and Tracking Human Action
• Josephine Sullivan and Stefan Carlsson

2
Define Tracking
3
Traditional tracking
• Kalman Filters
• Condensation
• HMM
• Matching articulated 3D models
• Similarities?
• Problems?

4
New approach
• What is the difference between tracking and
recognition?
• Assume Pose recognition and activity recognition
are equivalent.
• Now track activity by repeating recognition of
key frames

5
Discussion: reasons for previous approach
• Why the distinction between tracking and
recognition?
• Applications?
• Projectile tracking
• Motion capture

6
Object descriptors
• Embedding global data in local descriptors
• Order Structure
• Shape context

7
Order Structure
• Problem: find correspondence between deformed
shapes
• Solution
• Sample points on contour
• Describe shape using order structure
• Order of points and intersections of tangent lines

8
Order Structure
• Many transformations preserve order structure
• Superset of Affine and Projective transformations
• Encodes perceptual similarity
• Encodes properties of point sets, lines, and
combinations of points and lines.
• Descriptor for point sets: orientation
• Set {a, b, c} has positive orientation if traversing
the points in order gives an anti-clockwise rotation
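
The orientation of a point triple can be sketched as the sign of a 2x2 determinant. This is a minimal illustration, not code from the presentation; the function names and the y-up coordinate convention are assumptions (with image coordinates, where y points down, the sign flips).

```python
def orientation(a, b, c):
    """Return +1 if a -> b -> c turns anti-clockwise, -1 if clockwise, 0 if collinear.

    Assumes a y-up coordinate system (an assumption, not stated on the slide).
    """
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (det > 0) - (det < 0)


def shear(p, k=0.5):
    """Hypothetical affine deformation used only to illustrate invariance."""
    return (p[0] + k * p[1], p[1])


triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
sheared = [shear(p) for p in triangle]

# The orientation is the same before and after the shear.
print(orientation(*triangle), orientation(*sheared))
```

Because the shear has positive determinant, the triple keeps its orientation, which is the sense in which such transformations preserve order structure.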

9
Order Structure
• Descriptor for Sets of lines
• Uses the fact that points and lines are projectively dual
• p - homogeneous coords for a point
• q - oriented homogeneous line coords for a line
through p, then $q^T p = 0$
• $q = (a, 1, b)$ where $ax + y + b = 0$.
• Order type for a set of 3 lines is then

10
Order Structure
• Descriptor for combinations of points and lines
• Oriented coordinates => every line has a
direction
• Assign a left-right position for every point
w.r.t. every line
• Unique order structure for arbitrary set of
points
• Order structure for a set characterized by an
index

$q_i$: line, $p_j$: point
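
The left-right assignment can be sketched with homogeneous coordinates: an oriented line through two points is their cross product, and the sign of $q^T p$ tells on which side a point lies. This is an illustrative sketch under assumptions (y-up axes, hypothetical function names), not the index formula from the slide, which is not preserved in the transcript.

```python
def oriented_line(p_from, p_to):
    """Homogeneous coords of the line through two points, directed p_from -> p_to.

    This is the cross product of (x1, y1, 1) and (x2, y2, 1).
    """
    (x1, y1), (x2, y2) = p_from, p_to
    return (y1 - y2, x2 - x1, x1 * y2 - y1 * x2)


def side(q, p):
    """+1 if point p is left of oriented line q, -1 if right, 0 if on the line."""
    s = q[0] * p[0] + q[1] * p[1] + q[2]
    return (s > 0) - (s < 0)


def side_matrix(lines, points):
    """Entry [i][j] is the left/right position of point j w.r.t. line i."""
    return [[side(q, p) for p in points] for q in lines]


lines = [oriented_line((0, 0), (1, 0)),   # x-axis, pointing in +x
         oriented_line((0, 0), (0, 1))]   # y-axis, pointing in +y
points = [(0.5, 1), (0.5, -1), (-1, 0.5)]
print(side_matrix(lines, points))
```

Reversing a line's direction negates its row, which is why the coordinates need to be oriented before the signs carry any meaning.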
11
Order Structure
• Algorithm
• Voting matrix

12
Order Structure
• Perceptual similarity example: human pose

13
Shape Context descriptor
• Sample points from edges in image
• Each point's descriptor is a histogram of the
relative coordinates of all other points.
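
A minimal sketch of such a histogram follows. The slide only says "histogram of the relative coordinates"; the log-polar binning, the normalisation by mean distance, and all parameter names and defaults below are assumptions taken from the usual shape context formulation.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points relative to points[index].

    Assumes at least two distinct points. Bin counts and radius limits are
    illustrative defaults, not values from the presentation.
    """
    px, py = points[index]
    others = [(x - px, y - py) for k, (x, y) in enumerate(points) if k != index]
    dists = [math.hypot(dx, dy) for dx, dy in others]
    mean_dist = sum(dists) / len(dists)  # normalise distances for scale
    log_lo, log_hi = math.log(r_min), math.log(r_max)

    hist = [[0] * n_theta for _ in range(n_r)]
    for (dx, dy), d in zip(others, dists):
        if d == 0:
            continue  # skip points coincident with the reference point
        # Radial bin on a log scale, clamped to the first/last bin.
        t = (math.log(d / mean_dist) - log_lo) / (log_hi - log_lo)
        r_bin = min(n_r - 1, max(0, int(t * n_r)))
        # Angular bin over [0, 2*pi), clamped against rounding at the top edge.
        angle = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = min(n_theta - 1, int(angle / (2 * math.pi) * n_theta))
        hist[r_bin][a_bin] += 1
    return hist


pts = [(0, 0), (1, 0), (0, 2), (3, 1)]
shifted = [(x + 10, y - 5) for x, y in pts]
print(shape_context(pts, 0) == shape_context(shifted, 0))
```

Since only relative coordinates enter the histogram, translating the whole point set leaves every descriptor unchanged.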

14
Action Recognition using Key Frames
• Deciding whether images are related
• $p_i^a$ and $p_i^b$ are coordinates of corresponding
points in images A and B.
• T is class of transformations that define
relation between A and B. (known a priori)
• Matching Distance
• General case
• Using pure translation
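
The formulas for the matching distance are not preserved in this transcript. As a hedged sketch of the pure-translation case, assume the distance is the mean squared residual after the best least-squares translation; under that assumption the optimal translation is simply the mean displacement between corresponding points. The function name and the squared-error choice are assumptions.

```python
def translation_matching_distance(points_a, points_b):
    """Mean squared residual after the best translation mapping A onto B.

    points_a[i] and points_b[i] are assumed to be corresponding points.
    Returns (distance, (tx, ty)).
    """
    n = len(points_a)
    # Least-squares translation: the mean of the point-wise displacements.
    tx = sum(b[0] - a[0] for a, b in zip(points_a, points_b)) / n
    ty = sum(b[1] - a[1] for a, b in zip(points_a, points_b)) / n
    residual = sum((a[0] + tx - b[0]) ** 2 + (a[1] + ty - b[1]) ** 2
                   for a, b in zip(points_a, points_b)) / n
    return residual, (tx, ty)


shape = [(0, 0), (1, 0), (0, 1)]
moved = [(x + 2, y + 3) for x, y in shape]
print(translation_matching_distance(shape, moved))
```

A purely translated copy gives distance zero, while any deformation that a translation cannot explain shows up as a positive residual.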

15
Action recognition using Key Frames
• 30-second tennis sequence
• Coarse automatic tracking
• Edge detection done on upper half of player
• No deletion of background edges
• Selected a key frame and computed the matching score
w.r.t. every other frame.
• 9 local minima shown, each the start of a
forehand stroke.
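
Picking out candidate action starts from the per-frame matching scores can be sketched as a local-minimum search. The optional threshold and the example scores below are hypothetical, chosen only to show the idea.

```python
def local_minima(scores, threshold=None):
    """Indices of frames whose score is lower than the previous frame and
    no higher than the next one, optionally also below a threshold."""
    minima = []
    for t in range(1, len(scores) - 1):
        if scores[t] < scores[t - 1] and scores[t] <= scores[t + 1]:
            if threshold is None or scores[t] < threshold:
                minima.append(t)
    return minima


# Hypothetical matching scores of one key frame against nine frames.
scores = [5, 3, 4, 6, 2, 7, 1.5, 1.5, 8]
print(local_minima(scores))        # all local minima
print(local_minima(scores, 2.5))   # only the sufficiently good matches
```

The asymmetric comparison (strict on the left, non-strict on the right) reports a flat-bottomed minimum once rather than once per frame.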

16
Action recognition using Key Frames
17
Tracking
• Point transferral
• Each key frame is marked manually
• For each point in key frame, a subset of points
in the image are chosen, and a translation is
estimated.

[Figure labels: point in key frame R; simple local translation; point corresponding to $P_k^R$ in image $I_t$]
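
One way to picture point transferral is sketched below. It assumes that correspondences between key-frame contour points and image points are already available, and that the "subset" consists of the k correspondences nearest to the marked point; both choices, and the function name, are assumptions rather than details from the presentation.

```python
import math


def transfer_point(marked, correspondences, k=3):
    """Move a manually marked key-frame point into the current image.

    correspondences: list of (key_frame_point, image_point) pairs.
    The local translation is the mean displacement of the k pairs whose
    key-frame point lies nearest to the marked point.
    """
    nearest = sorted(
        correspondences,
        key=lambda pair: math.hypot(pair[0][0] - marked[0], pair[0][1] - marked[1]),
    )[:k]
    tx = sum(q[0] - p[0] for p, q in nearest) / len(nearest)
    ty = sum(q[1] - p[1] for p, q in nearest) / len(nearest)
    return (marked[0] + tx, marked[1] + ty)


pairs = [((0, 1), (1, 3)), ((1, 0), (2, 2)), ((-1, 0), (0, 2)), ((50, 50), (60, 60))]
print(transfer_point((0, 0), pairs))
```

Using only nearby correspondences keeps the translation local, so a distant part of the body moving differently does not drag the transferred point with it.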
18
Updating the Voting Matrix
• Extra information to improve accuracy
• Use standard tracker for head and body
localization. (Brand, Shadow Puppetry)
• Set $V(p_i^R, p_j^t) = 0$ if the points aren't close to
the corresponding lines in the corresponding matched
head/body quadrangles.

19
Further constraints
• Want to enforce similar arrangement of interior
points in images that are matched to key frames
• Also incorporate intensity around points
• Monte-Carlo smoothing is used to correct outlying
points

20
Tracking using Shape Context
• Mori & Malik
• Very similar technique, using shape context
descriptor
• Very clear that frames are processed
independently
• Tested on standard data

21
Tracking w/Shape Context Movie
22
Discussion Questions
• Results - how effective?
• Effect of rate of motion?
• Efficiency of closed-loop system?
• No need for background subtraction?
• Flexibility to multiple actions?
• Do they give a specific order to key frames?
• Is the coarse tracking too simple?
• What about poses facing away from camera?
Slides: 23
Provided by: chenand
Learn more at: https://cseweb.ucsd.edu