Recognizing and Tracking Human Action - PowerPoint PPT Presentation


Title: Recognizing and Tracking Human Action


1
Recognizing and Tracking Human Action
  • Josephine Sullivan and Stefan Carlsson

2
Define Tracking
3
Traditional tracking
  • Kalman Filters
  • Condensation
  • HMM
  • Matching articulated 3d models
  • Similarities?
  • Problems?

4
New approach
  • What is the difference between tracking and
    recognition?
  • Assume pose recognition and activity recognition
    are equivalent.
  • Track activity by repeatedly recognizing key
    frames

5
Discussion: reasons for the previous approach
  • Why the distinction between tracking and
    recognition?
  • Applications?
  • Projectile tracking
  • Motion capture

6
Object descriptors
  • Embedding global data in local descriptors
  • Order Structure
  • Shape context

7
Order Structure
  • Problem: find correspondence between deformed
    shapes
  • Solution:
    • Sample points on the contour
    • Describe the shape using its order structure
    • Order of points and intersections of tangent lines

8
Order Structure
  • Many transformations preserve order structure
  • The order-preserving transformations form a
    superset of affine and projective transformations
  • Encodes perceptual similarity
  • Encodes properties of point sets, lines, and
    combinations of points and lines
  • Descriptor for point sets: orientation
  • The ordered set a, b, c has orientation +1 if
    traversing the points in order is an anti-clockwise
    rotation, and −1 if it is clockwise
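As a small illustration of the orientation descriptor (a sketch, not the authors' code), the order type of three points is the sign of the determinant of their homogeneous coordinates. The helper names `orientation` and `warp` below are hypothetical; `warp` is an arbitrary affine map with positive Jacobian determinant, used only to show that such a map leaves the orientation unchanged.

```python
def orientation(a, b, c):
    """Order type of the ordered point triple (a, b, c).

    Returns +1 if a -> b -> c turns anti-clockwise, -1 if clockwise and
    0 if the points are collinear. This equals the sign of
    det([[ax, bx, cx], [ay, by, cy], [1, 1, 1]]).
    """
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (det > 0) - (det < 0)


def warp(p):
    """Hypothetical affine map (scale + shear + translation), Jacobian det = 2 > 0."""
    return (2 * p[0] + 0.5 * p[1] + 3, p[1] + 1)


a, b, c = (0, 0), (1, 0), (0, 1)
print(orientation(a, b, c))                    # anti-clockwise: 1
print(orientation(a, c, b))                    # clockwise: -1
print(orientation(warp(a), warp(b), warp(c)))  # preserved by the warp: 1
```

Any map whose Jacobian determinant stays positive keeps every such sign, which is why the descriptor tolerates deformations well beyond rigid motion.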

9
Order Structure
  • Descriptor for sets of lines
  • Uses the projective duality of points and lines
  • $p$: homogeneous coordinates of a point
  • $q$: oriented homogeneous coordinates of a line;
    if the line passes through $p$, then $q^\top p = 0$
  • $q = (a, 1, b)$ for the line $ax + y + b = 0$
  • Order type for a set of 3 lines is then
    $\operatorname{sign}\det[q_1\; q_2\; q_3]$
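A minimal sketch of the line descriptor, assuming (by duality with the point case) that the order type of three lines is the sign of the determinant of their coordinate vectors $q_1, q_2, q_3$. The helper names are hypothetical. Because the slide fixes the $y$ coefficient to 1, vertical lines cannot be represented and are rejected here.

```python
def line_through(p1, p2):
    """Oriented line coordinates q = (a, 1, b) with a*x + y + b = 0.

    Assumes the line is not vertical (the y coefficient is fixed to 1).
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line cannot be written as (a, 1, b)")
    m = (y2 - y1) / (x2 - x1)  # slope of y = m*x + c
    c = y1 - m * x1            # intercept
    return (-m, 1.0, -c)       # -m*x + y - c = 0


def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))


def line_order_type(q1, q2, q3):
    """Assumed order type of three lines: sign(det[q1 q2 q3]); 0 if concurrent."""
    d = det3(q1, q2, q3)
    return (d > 0) - (d < 0)


def incidence(q, p):
    """q^T p for a Cartesian point p = (x, y) lifted to (x, y, 1)."""
    return q[0] * p[0] + q[1] * p[1] + q[2]


q = line_through((0, 1), (2, 5))  # y = 2x + 1
print(q)                          # (-2.0, 1.0, -1.0)
print(incidence(q, (1, 3)))       # point on the line: 0.0
```

A zero determinant means the three lines meet in one (possibly ideal) point, the dual of three collinear points.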

10
Order Structure
  • Descriptor for combinations of points and lines
  • Oriented coordinates → every line has a
    direction
  • Assign a left/right position to every point
    with respect to every line
  • Gives a unique order structure for an arbitrary
    set of points
  • Order structure for a set characterized by the
    index $\operatorname{sign}(q_i^\top p_j)$, where $q_i$ is a
    line and $p_j$ a point
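The point/line index can be tabulated directly: one sign per (line, point) pair. This is a sketch with a hypothetical helper name; it assumes lines in the $(a, 1, b)$ form above, so $+1$ means the point lies above the line ($y > -ax - b$), and which side counts as "left" depends on the chosen line direction.

```python
def sign(v):
    return (v > 0) - (v < 0)


def point_line_index(lines, points):
    """Matrix S with S[i][j] = sign(q_i^T p_j).

    lines  -- oriented homogeneous line coordinates q_i = (a, 1, b)
    points -- Cartesian points p_j = (x, y), lifted to (x, y, 1)
    +1 / -1 record on which side of line i point j lies; 0 means on the line.
    """
    return [[sign(a * x + s * y + b) for (x, y) in points]
            for (a, s, b) in lines]


lines = [(0, 1, 0), (-1, 1, 0)]      # y = 0 and y = x
points = [(1, 2), (2, -1), (3, 3)]
print(point_line_index(lines, points))  # [[1, -1, 1], [1, -1, 0]]
```

Two point/line configurations have the same order structure exactly when these sign matrices agree under the correspondence.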
11
Order Structure
  • Algorithm
  • Voting matrix

12
Order Structure
  • Perceptual similarity example: human pose

13
Shape Context descriptor
  • Sample points from edges in image
  • Each point's descriptor is a histogram of the
    relative coordinates of all other points.
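A minimal sketch of a shape context histogram. The binning choices here are assumptions taken from the usual shape context formulation rather than from these slides: log-polar bins (5 radial, 12 angular), radii normalised by the mean pairwise distance, and a radial range of 1/8 to 2.

```python
import math


def shape_context(points, idx, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points relative to points[idx].

    hist[rb][tb] counts points in radial bin rb and angular bin tb.
    Distances are divided by the mean pairwise distance (scale invariance);
    points outside [r_min, r_max) are ignored.
    """
    n = len(points)
    dists = [math.dist(points[i], points[j])
             for i in range(n) for j in range(n) if i != j]
    mean_d = sum(dists) / len(dists)
    log_lo, log_hi = math.log(r_min), math.log(r_max)
    hist = [[0] * n_theta for _ in range(n_r)]
    px, py = points[idx]
    for j, (x, y) in enumerate(points):
        if j == idx:
            continue
        r = math.hypot(x - px, y - py) / mean_d
        if not (r_min <= r < r_max):
            continue  # outside the radial range
        rb = min(int((math.log(r) - log_lo) / (log_hi - log_lo) * n_r), n_r - 1)
        theta = math.atan2(y - py, x - px) % (2 * math.pi)
        tb = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[rb][tb] += 1
    return hist


pts = [(0, 0), (1, 0), (0, 1), (-1, 0)]
h = shape_context(pts, 0)
print(sum(map(sum, h)))  # 3 neighbours counted
```

Descriptors of two points can then be compared with a histogram distance such as $\chi^2$ to propose correspondences.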

14
Action Recognition using Key Frames
  • Deciding whether two images are related
  • $p_i^A$ and $p_i^B$ are the coordinates of
    corresponding points in images A and B
  • $\mathcal{T}$ is the class of transformations that
    defines the relation between A and B (known a priori)
  • Matching distance
    • General case
    • Using pure translation
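The matching-distance formulas themselves are not in the transcript. As a hedged sketch, assume the distance is the sum of squared residuals after the best transformation in $\mathcal{T}$. For pure translation the least-squares optimum is then the mean displacement, $t = \frac{1}{n}\sum_i (p_i^B - p_i^A)$. The function name is hypothetical.

```python
def translation_distance(pa, pb):
    """Assumed matching distance for pure translation.

    pa[i] <-> pb[i] are corresponding points. The least-squares translation
    is the mean displacement; the distance is the remaining sum of squares.
    """
    n = len(pa)
    tx = sum(b[0] - a[0] for a, b in zip(pa, pb)) / n
    ty = sum(b[1] - a[1] for a, b in zip(pa, pb)) / n
    d = sum((a[0] + tx - b[0]) ** 2 + (a[1] + ty - b[1]) ** 2
            for a, b in zip(pa, pb))
    return d, (tx, ty)


print(translation_distance([(0, 0), (1, 0), (0, 1)],
                           [(2, 3), (3, 3), (2, 4)]))  # (0.0, (2.0, 3.0))
```

A pure translation keeps the minimisation closed-form, which matters when one key frame is compared against every frame of a sequence.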

15
Action recognition using Key Frames
  • 30-second tennis sequence
  • Coarse automatic tracking
  • Edge detection done on the upper half of the player
  • No removal of background edges
  • Selected a key frame and computed the matching
    score with respect to every other frame
  • 9 local minima shown, each the start of a
    forehand stroke
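Picking stroke starts from the per-frame matching score amounts to finding local minima in a 1-D sequence. This is a minimal sketch; the optional `threshold` is a hypothetical parameter for rejecting shallow minima and is not described in the slides.

```python
def local_minima(scores, threshold=None):
    """Frame indices t where scores[t] is strictly below both neighbours
    (and, if threshold is given, also below that threshold)."""
    return [t for t in range(1, len(scores) - 1)
            if scores[t] < scores[t - 1] and scores[t] < scores[t + 1]
            and (threshold is None or scores[t] < threshold)]


scores = [5, 3, 4, 6, 2, 7, 1, 8]         # toy matching distances per frame
print(local_minima(scores))                # [1, 4, 6]
print(local_minima(scores, threshold=2.5)) # [4, 6]
```

On real, noisy scores some smoothing or a minimum spacing between detections would likely be needed.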

16
Action recognition using Key Frames
17
Tracking
  • Point transferral
  • Each key frame is marked manually
  • For each point in the key frame, a subset of points
    in the image is chosen and a translation is
    estimated.

[Figure: a point $p_k^R$ in key frame R is mapped to its corresponding point in image $I_t$ by a simple local translation]
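A hedged sketch of point transferral by a local translation. It assumes a set of already matched (key frame point, image point) pairs, for example read off the voting matrix, and estimates the translation for $p_k^R$ as the mean displacement of its `m` nearest matched neighbours. The neighbour rule and `m` are assumptions, not the authors' choices.

```python
import math


def transfer_point(p, matches, m=3):
    """Transfer key-frame point p into image I_t by a local translation.

    matches -- list of (keyframe_point, image_point) correspondences
    m       -- number of nearest matched key-frame points used (hypothetical)
    """
    nearest = sorted(matches, key=lambda pr: math.dist(pr[0], p))[:m]
    tx = sum(q[0] - r[0] for r, q in nearest) / len(nearest)
    ty = sum(q[1] - r[1] for r, q in nearest) / len(nearest)
    return (p[0] + tx, p[1] + ty)


matches = [((0, 0), (1, 1)), ((1, 0), (2, 1)), ((0, 1), (1, 2)),
           ((10, 10), (30, 30))]  # last pair is far away and ignored
print(transfer_point((0.5, 0.5), matches))  # (1.5, 1.5)
```

Using only nearby matches keeps the estimate local, so different limbs can move by different amounts.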
18
Updating the Voting Matrix
  • Extra information to improve accuracy
  • Use a standard tracker for head and body
    localization (Brand, Shadow Puppetry)
  • Set $V(p_i^R, p_j^t) = 0$ if the points aren't close
    to the corresponding lines of the matched
    head/body quadrangles
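A sketch of the voting-matrix constraint under stated assumptions: each key-frame point $i$ is tagged with the index of the quadrangle side it lies near (`None` if it is not tied to a side), the matched quadrangle in image $t$ is given as four corners, and a hypothetical tolerance `tau` decides what "close" means. All of these names are illustrative.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    s = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + s * dx), py - (ay + s * dy))


def mask_votes(V, side_of_key_point, image_points, quad, tau):
    """Zero V[i][j] (in place) when image point j is farther than tau from
    the side of the matched quadrangle that key-frame point i belongs to."""
    for i, side in enumerate(side_of_key_point):
        if side is None:  # point not tied to any quadrangle side
            continue
        a, b = quad[side], quad[(side + 1) % 4]
        for j, p in enumerate(image_points):
            if point_segment_distance(p, a, b) > tau:
                V[i][j] = 0
    return V


quad = [(0, 0), (4, 0), (4, 4), (0, 4)]  # side 0 is the bottom edge
V = [[5, 7], [2, 2]]
print(mask_votes(V, [0, None], [(2, 0.5), (2, 3.9)], quad, tau=1))
# [[5, 0], [2, 2]]
```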

19
Further constraints
  • Enforce a similar arrangement of interior points
    in images that are matched to key frames
  • Also incorporate the intensity around points
  • Monte-Carlo smoothing is used to correct outlying
    points

20
Tracking using Shape Context
  • Mori and Malik
  • Very similar technique, using the shape context
    descriptor
  • Makes it very clear that frames are processed
    independently
  • Tested on standard data

21
Tracking with Shape Context (movie)
22
Discussion Questions
  • Results - how effective?
  • Effect of rate of motion?
  • Efficiency of the closed-loop system?
  • No need for background subtraction?
  • Flexibility for multiple actions?
  • Do they give a specific order to key frames?
  • Is the coarse tracking too simple?
  • What about poses facing away from camera?
Provided by: chenand
Learn more at: https://cseweb.ucsd.edu