Object Level Grouping for Video Shots - PowerPoint PPT Presentation

About This Presentation
Slides: 24
Provided by: csU73
Learn more at: http://www.cs.ucf.edu

Transcript and Presenter's Notes



1
Object Level Grouping for Video Shots
  • Josef Sivic
  • Frederik Schaffalitzky
  • Andrew Zisserman
  • Robotics Research Group,
  • Department of Engineering Science,
  • University of Oxford

2
Results
3
Results
4
Summary of work
  • Segment semi-rigidly moving 3D objects in a shot
  • Detect regions of interest
  • Track regions of interest
  • Partition tracks into groups belonging to the same object
  • Repair long-range tracks broken by occlusions in the scene
  • Identify the same object from different viewing
    aspects (covered in another paper, Video Google)

5
Basic Segmentation and Tracking
  • Detect affine-invariant regions of interest, each
    represented by an ellipse
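An elliptical region is commonly stored as a centre c and a 2×2 symmetric positive-definite shape matrix M, the ellipse being the points x with (x − c)ᵀ M⁻¹ (x − c) = 1. The sketch below assumes that convention (it is not stated on the slides) and shows how the ellipse's axes, orientation and a point-inside test follow from M; the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; assumes the ellipse {x : (x - c)^T M^{-1} (x - c) = 1}.

def ellipse_axes(M):
    """Return (semi_major, semi_minor, angle_rad) for a 2x2 SPD shape matrix M."""
    # eigh returns eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(np.asarray(M, dtype=float))
    semi_minor, semi_major = np.sqrt(eigvals)
    major_dir = eigvecs[:, 1]  # eigenvector of the larger eigenvalue
    angle = np.arctan2(major_dir[1], major_dir[0])
    return semi_major, semi_minor, angle

def inside_ellipse(x, c, M):
    """True if point x lies inside (or on) the ellipse centred at c."""
    d = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
    return float(d @ np.linalg.solve(M, d)) <= 1.0
```

For M = diag(9, 4) this gives semi-axes 3 and 2, i.e. the ellipse x²/9 + y²/4 = 1.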

6
Basic Segmentation and Tracking
  • Interest points (e.g. from a corner detector) can also be used
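The slides do not name a corner detector; the Harris detector is used here only as a familiar example. It scores each pixel with R = det(S) − k·trace(S)², where S is the locally summed structure tensor of image gradients. A minimal NumPy sketch, assuming a greyscale float image, central-difference gradients, a 3×3 box window in place of the usual Gaussian, and k = 0.04 (all illustrative choices):

```python
import numpy as np

def box3(a):
    """Sum over each 3x3 neighbourhood, zero-padded at the borders."""
    p = np.pad(a, 1)  # default mode pads with zeros
    h, w = a.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))

def harris_response(img, k=0.04):
    """Harris corner response R = det(S) - k * trace(S)^2 at every pixel."""
    img = np.asarray(img, dtype=float)
    iy, ix = np.gradient(img)  # derivatives along rows (y) and columns (x)
    sxx, syy, sxy = box3(ix * ix), box3(iy * iy), box3(ix * iy)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2
```

Positive R marks corners, negative R marks edges, and R is near zero in flat areas; interest points would then be local maxima of R above a threshold.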

7
Basic Segmentation and Tracking
  • Create tracks by matching regions/points in
    consecutive frames
  • Matching uses normalized cross-correlation and the
    RANSAC algorithm
  • http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/CANTZLER2/ransac.pdf
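Normalized cross-correlation compares two equal-sized patches after subtracting each patch's mean and dividing by its spread, which makes the score insensitive to brightness and contrast changes. A pure-Python sketch, assuming patches are flat lists of grey values; `ncc`, `best_match` and the 0.8 acceptance score are hypothetical, and the RANSAC geometric check applied afterwards is not shown.

```python
import math

# Hypothetical helpers for patch matching between consecutive frames.

def ncc(a, b):
    """Normalized cross-correlation of two equal-length patches, in [-1, 1]."""
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and the same size")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a constant patch has no texture to correlate
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match(patch, candidates, min_score=0.8):
    """Index of the candidate with the highest NCC, or None if below min_score."""
    scores = [ncc(patch, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__, default=None)
    if best is None or scores[best] < min_score:
        return None
    return best
```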

8
RANSAC Algorithm
  • Stands for Random Sample Consensus
  • Used for fitting a model to data that contains outliers
  • Algorithm
  • Many sets of random samples are drawn from the
    data, and a model is computed from each
  • The fitness of each model is evaluated on the
    entire data set
  • The best model is kept, the data points that fit
    it are removed, and the algorithm is run again
  • The algorithm stops when too few of the remaining
    points fit any model
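The repeated fit-and-remove loop above (often called sequential RANSAC) can be sketched for 2D line fitting. This is an illustration, not the authors' code: the model (a line through two sampled points), the inlier tolerance, the iteration count, the minimum support and all function names are assumptions.

```python
import math
import random

# Hypothetical sketch of sequential RANSAC for lines.

def fit_line(p, q):
    """Line through p and q as (a, b, c): a*x + b*y + c = 0 with a^2 + b^2 = 1."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None  # degenerate sample: identical points
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])

def ransac_line(points, tol, iters, rng):
    """Best line from random two-point samples, scored by inlier count."""
    best, best_inliers = None, []
    for _ in range(iters):
        line = fit_line(*rng.sample(points, 2))
        if line is None:
            continue
        a, b, c = line
        # Fitness: count points of the entire data set within tol of the line
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) <= tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = line, inliers
    return best, best_inliers

def sequential_ransac(points, tol=0.1, iters=200, min_support=5, seed=0):
    """Fit the best line, remove its inliers, and repeat on what is left."""
    rng = random.Random(seed)
    remaining, models = list(points), []
    while len(remaining) >= min_support:
        line, inliers = ransac_line(remaining, tol, iters, rng)
        if len(inliers) < min_support:
            break  # not enough points fit any model
        models.append((line, inliers))
        inlier_set = set(inliers)
        remaining = [pt for pt in remaining if pt not in inlier_set]
    return models, remaining
```

In the paper's setting the model would be a homography or affine motion rather than a line, but the outer loop has the same shape.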

9
Tracking Failure
  • Tracking can fail for three reasons
  • Region is not detected in a frame
  • Region is detected but not matched
  • Occlusion
  • Short-range track repair handles the first two
    cases

10
Short Range Track Repair
  • Each feature tracked for more than 5 frames is
    propagated to the next frame according to the
    motion estimated from its 5 nearest tracks
  • A 6D hypercube of affine transformations is
    searched to match the new region with previously
    detected regions
  • This cannot be done for interest points
  • One of the following actions is taken
  • No new region is instantiated
  • A new region is found and added to the track
  • If the matched region is the beginning of another
    track, the two tracks are joined
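The propagation step can be illustrated by fitting a 2D affine motion to the frame-to-frame displacements of the nearest tracks and applying it to the feature being repaired. The slides say only that motion is estimated from the 5 nearest tracks, so the affine least-squares model, the Euclidean neighbour choice and the name `predict_position` are assumptions; the subsequent 6D affine search is not shown.

```python
import numpy as np

def predict_position(x, neighbours_prev, neighbours_next, k=5):
    """Hypothetical helper: predict where point x (frame t) lies in frame t+1.

    neighbours_prev / neighbours_next: (N, 2) positions of other tracks in
    frames t and t+1, row i being the same track in both frames.
    An affine map is fitted by least squares to the k nearest tracks.
    """
    x = np.asarray(x, dtype=float)
    prev = np.asarray(neighbours_prev, dtype=float)
    nxt = np.asarray(neighbours_next, dtype=float)
    nearest = np.argsort(np.linalg.norm(prev - x, axis=1))[:k]
    src, dst = prev[nearest], nxt[nearest]
    # Solve dst ~= [src, 1] @ A for the 3x2 affine parameter matrix A
    design = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return np.append(x, 1.0) @ A
```

At least three non-collinear neighbours are needed for the affine fit to be determined.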

11
Short Range Track Repair Results
12
Short Range Track Repair Results (cont.)
13
Shape from Motion
http://vision.stanford.edu/public/publication/tomasi/tomasiTr92Text.pdf
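The core of the Tomasi-Kanade factorization referenced above is short: stack the image coordinates of P points over F frames into a 2F×P measurement matrix W, subtract each row's mean to remove translation, and take the best rank-3 approximation by SVD. A NumPy sketch assuming an affine camera and complete tracks (no missing data); the metric upgrade that resolves the 3×3 ambiguity is omitted and `factorize_rank3` is a hypothetical name.

```python
import numpy as np

def factorize_rank3(W):
    """Split a 2F x P measurement matrix into motion (2F x 3) and shape (3 x P).

    Rows 2f and 2f+1 hold the x and y coordinates of all points in frame f.
    Returns (M, S, t) with W ~= M @ S + t, where t holds the per-row means.
    """
    W = np.asarray(W, dtype=float)
    t = W.mean(axis=1, keepdims=True)       # per-frame translation
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root                     # affine motion, up to a 3x3 ambiguity
    S = root[:, None] * Vt[:3]              # 3D shape, up to the same ambiguity
    return M, S, t
```

Under an affine camera the centred W has rank at most 3, so for noise-free tracks M @ S + t reproduces W exactly.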
14
Object Extraction
  • We find rank-3 basis trajectories (as in
    structure from motion)
  • To find them, the reprojection error of the
    tracks with respect to the basis is minimized
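One common way to write such an objective, assuming an affine camera and allowing for missing track entries, is shown below. The symbols are illustrative and not taken from the slides: $\mathbf{x}_{fj}$ is the image position of track $j$ in frame $f$, $\mathbf{t}_f$ a per-frame translation, $\mathbf{b}_{fk}$ the $k$-th basis trajectory at frame $f$, $a_{jk}$ the coefficients of track $j$, and $\mathcal{O}$ the set of observed (frame, track) pairs.

```latex
E = \sum_{(f,j) \in \mathcal{O}}
    \Big\| \mathbf{x}_{fj} - \mathbf{t}_f - \sum_{k=1}^{3} a_{jk}\, \mathbf{b}_{fk} \Big\|^2
```

Restricting the sum to observed entries is what lets tracks that start or end inside the baseline still contribute.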

15
Track segmentation process
  • Basic motion grouping
  • Fit a homography to a sliding window of three
    frames using RANSAC
  • Remove the inlier set and run RANSAC again on the
    remaining tracks
  • Aggregate Segmentation
  • Over a wider baseline of 10 frames, build a
    track-to-track co-occurrence matrix
  • Reorder the matrix by finding the connected
    components of the graph given by the thresholded
    co-occurrence matrix
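The aggregation step can be sketched as follows, assuming each sliding-window RANSAC pass yields one group label per track (`None` if the track is absent from that window). Co-occurrence counts the windows in which two tracks share a group; thresholding it defines a graph whose connected components are the aggregate segments. The threshold and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical helpers for co-occurrence aggregation.

def cooccurrence(window_labels, n_tracks):
    """Count, for each pair of tracks, the windows in which they share a group."""
    C = [[0] * n_tracks for _ in range(n_tracks)]
    for labels in window_labels:  # one list of per-track labels per window
        for i, j in combinations(range(n_tracks), 2):
            if labels[i] is not None and labels[i] == labels[j]:
                C[i][j] += 1
                C[j][i] += 1
    return C

def connected_components(C, threshold):
    """Components of the graph with an edge wherever C[i][j] >= threshold."""
    n, seen, components = len(C), set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first traversal
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and C[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(comp))
    return components
```

Concatenating the components gives the row/column order that reorders the matrix into blocks.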

16
Aggregate Segmentation
17
Object Extraction
  • The previous step gives no more than 10
    dominant clusters (found empirically)
  • Pairs of clusters are grouped together over a
    wider baseline (20 frames here)
  • In each iteration,
  • Four tracks are selected
  • Rank-3 basis trajectories are computed from them
  • All other tracks are projected onto the basis
  • Tracks with low reprojection error are grouped
    together
  • If more than 90% of the tracks are grouped
    together, the two clusters are joined into one cluster
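One iteration of this test can be sketched as below. Assumptions not stated on the slides: tracks are complete over the baseline and stored as vectors of stacked image coordinates; the rank-3 basis is the differences of three selected tracks from the fourth (which absorbs translation under an affine camera); tracks are fitted to the basis by least squares; the pixel tolerance is arbitrary. `merge_test` is a hypothetical name.

```python
import numpy as np

def merge_test(tracks, basis_idx, tol=1.0, min_fraction=0.9):
    """Hypothetical check of whether tracks move as one affine-rigid object.

    tracks: (N, 2F) array, one row per track (x, y coordinates for F frames).
    basis_idx: indices of the four selected tracks.
    Returns (merge, grouped), grouped being a boolean mask of low-error tracks.
    """
    T = np.asarray(tracks, dtype=float)
    ref = T[basis_idx[0]]
    B = (T[list(basis_idx[1:])] - ref).T           # 2F x 3 rank-3 basis
    rel = (T - ref).T                              # every track relative to ref
    coeffs, *_ = np.linalg.lstsq(B, rel, rcond=None)
    residual = rel - B @ coeffs                    # reprojection error
    rms = np.sqrt(np.mean(residual ** 2, axis=0))  # one value per track
    grouped = rms < tol
    return bool(grouped.mean() > min_fraction), grouped
```

Using differences from a reference track keeps the basis at rank 3 instead of rank 4, matching the rank-3 trajectories named on the slide.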

18
Object Extraction
19
Long Range Track Repair
20
Long Range Track Repair
  • A simple technique: nearest-neighbor matching on
    each region's scale invariant feature transform
    (SIFT) descriptor and spatial location
  • If the number of matched tracks exceeds a
    threshold, they are considered to be the same
    object
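A minimal sketch of this matching, assuming each region carries a SIFT descriptor (here a plain tuple of floats) and an image position, and that the nearest-neighbor distance adds descriptor distance to a weighted spatial distance. The weighting, distance cut-off, match threshold and function names are hypothetical parameters, not values from the slides.

```python
import math

# Hypothetical helpers for long-range track repair.

def combined_distance(r1, r2, spatial_weight):
    """Descriptor distance plus weighted image distance between two regions."""
    return math.dist(r1["sift"], r2["sift"]) + spatial_weight * math.dist(r1["pos"], r2["pos"])

def same_object(group_a, group_b, spatial_weight=0.1, max_dist=1.0, match_threshold=2):
    """Nearest-neighbor match each region of group_a into group_b.

    Each region is a dict with a 'sift' descriptor and an image 'pos'.
    Returns True if the number of matches exceeds match_threshold.
    """
    matches = 0
    for r in group_a:
        nearest = min(
            (combined_distance(r, s, spatial_weight) for s in group_b),
            default=math.inf,
        )
        if nearest <= max_dist:
            matches += 1
    return matches > match_threshold
```

The additive combination is only one choice; the descriptor and position terms must be on comparable scales for a single cut-off to make sense.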

21
Search
  • Covered in another paper by Sivic and Zisserman, titled Video Google

22
Yun Zhai's Results
23
End