
Title: Visual Event Recognition in Videos by Learning from Web Data


1
Visual Event Recognition in Videos by Learning
from Web Data
  • Lixin Duan, Dong Xu, Ivor Tsang, Jiebo Luo
  • Nanyang Technological University, Singapore
  • Kodak Research Labs, Rochester, NY, USA

2
Outline
  • Overview of the Event Recognition System
  • Similarity between Videos
  • Aligned Space-Time Pyramid Matching
  • Cross-Domain Problem
  • Adaptive Multiple Kernel Learning
  • Experiments
  • Conclusion

3
Overview
  • Goal: recognize consumer videos
  • Large intra-class variability; limited labeled
    videos

4
Overview
  • Goal: recognize consumer videos by leveraging a
    large number of loosely labeled web videos (e.g.,
    from YouTube)

A Large Number of Web Videos
5
Overview
  • Flowchart of the system

(Flowchart components: video database, test video, classifier, output)
6
Similarity between Videos
  • Pyramid matching methods
  • Temporally aligned pyramid matching, D. Xu and
    S.-F. Chang [1]
  • Unaligned space-time pyramid matching, I. Laptev
    [2]

7
Similarity between Videos

8
Similarity between Videos
  • Aligned Space-Time Pyramid Matching
  • Level 1

Distance
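Slide 8 shows ASTPM at Level 1. Below is a minimal Python sketch of the space-time partition behind it. It assumes the common pyramid layout, in which level l cuts width, height and time into 2^l equal bins each (so 8^l volumes, 8 at Level 1). It also assumes that every local feature carries (x, y, t) coordinates. The functions `volume_index` and `group_by_volume` and the volume ordering are hypothetical.

```python
def volume_index(x, y, t, width, height, length, level):
    """Map a local feature at (x, y, t) to one of the 8**level volumes.

    Assumption: each axis (width, height, time) is cut into 2**level equal bins.
    """
    bins = 2 ** level
    # Clamp so a point lying exactly on the far border falls into the last bin.
    bx = min(int(x * bins / width), bins - 1)
    by = min(int(y * bins / height), bins - 1)
    bt = min(int(t * bins / length), bins - 1)
    # Hypothetical ordering: time-major, then rows, then columns.
    return (bt * bins + by) * bins + bx


def group_by_volume(features, width, height, length, level):
    """Group (x, y, t, descriptor) tuples into a dict keyed by volume index."""
    volumes = {i: [] for i in range(8 ** level)}
    for x, y, t, desc in features:
        volumes[volume_index(x, y, t, width, height, length, level)].append(desc)
    return volumes
```

At level 0 the whole clip is a single volume. At Level 1 the descriptors end up in 8 groups, and distances between these groups are what the alignment step compares.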
 
 
9
Similarity between Videos
  • Integer-flow Earth Mover's Distance (EMD), Y.
    Rubner [3]
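Suppose two videos are each split into the same number R of volumes with equal weights. An integer-flow EMD must then send each volume's whole weight to a single volume of the other video, so it reduces to a minimum-cost one-to-one matching (an assignment problem). The sketch below assumes that setting and a precomputed R x R matrix `dist` of volume-to-volume distances; how those distances are computed is not shown. For contrast, `unaligned_distance` assumes that an unaligned pyramid compares volume i only with volume i. Both function names are hypothetical.

```python
from itertools import permutations


def aligned_distance(dist):
    """Integer-flow EMD between two videos with R equally weighted volumes.

    dist[i][j] is the ground distance between volume i of video A and
    volume j of video B. Each volume is matched to exactly one volume of
    the other video; the result is the cheapest matching divided by the
    total flow R. Brute force over all R! matchings.
    """
    r = len(dist)
    best = min(
        sum(dist[i][perm[i]] for i in range(r))
        for perm in permutations(range(r))
    )
    return best / r


def unaligned_distance(dist):
    """Fixed correspondence: volume i is only compared with volume i."""
    r = len(dist)
    return sum(dist[i][i] for i in range(r)) / r
```

The identity matching is one of the candidates, so the aligned distance can never exceed the fixed-correspondence one. For R = 8, brute force checks 8! = 40,320 matchings. For larger R, `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time.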

 
10
Similarity between Videos
 
 
  • Integer-flow Earth Mover's Distance (EMD), Y.
    Rubner [3]

 
11
Cross-Domain Problem
  • Data distribution mismatch between consumer
    videos and web videos
  • Consumer videos: naturally captured
  • Web videos: edited and selected
  • Maximum Mean Discrepancy (MMD), K. M. Borgwardt
    [4]
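MMD measures the mismatch as the distance between the mean embeddings of the two samples in a kernel feature space. Below is a minimal sketch of the biased empirical estimate of MMD^2. It assumes a Gaussian kernel with an arbitrary `gamma`; the kernel choice, the parameter value and the function names are illustrative, not taken from the slides.

```python
import math


def gaussian_kernel(a, b, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased empirical estimate of MMD^2 between samples xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y), i.e. the
    squared distance between the two sample means in feature space.
    """
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

Identical samples give 0. With a plain linear kernel, the estimate reduces to the squared Euclidean distance between the two sample means.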

 
 
 
 
12
Cross-Domain Problem

Prior information
 
 
13
Cross-Domain Problem
14
Cross-Domain Problem
  • Adaptive Multiple Kernel Learning (A-MKL)

  • Objective: an MMD term (mismatch between the two
    domains) plus a structural risk functional
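The slide pairs an MMD term with a structural risk functional. In multiple kernel learning the kernel is a weighted combination k = sum_m d_m k_m of base kernels. Under that assumption, the empirical MMD^2 is linear in the weights d, so the mismatch term can be written sum_m d_m p_m, where p_m is the MMD^2 under base kernel k_m alone. The sketch below checks this numerically. The base kernels, weights and function names are illustrative assumptions, and the structural risk term is not modeled.

```python
import math


def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def gaussian_kernel(a, b, gamma=0.5):
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd_squared(xs, ys, kernel):
    """Biased empirical MMD^2 between the two samples under `kernel`."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def combined_kernel(base_kernels, weights):
    """k(a, b) = sum_m d_m * k_m(a, b) with nonnegative weights d_m."""
    return lambda a, b: sum(d * k(a, b) for d, k in zip(weights, base_kernels))


def mmd_terms(xs, ys, base_kernels):
    """p_m = MMD^2 under the m-th base kernel alone."""
    return [mmd_squared(xs, ys, k) for k in base_kernels]
```

Because every term of the estimate is a kernel average, MMD^2 under the combined kernel equals the d-weighted sum of the per-kernel values p_m. This keeps the mismatch term cheap to re-evaluate when d changes.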
 
 
15
Cross-Domain Problem
16
Cross-Domain Problem

17
Experiments
  • Data set
  • 195 consumer videos and 906 web videos, collected
    by ourselves and from the Kodak Consumer Video
    Benchmark Data Set [5]
  • 6 events: wedding, birthday, picnic,
    parade, show and sports
  • Training data: 3 videos per event from the
    consumer videos, plus all web videos
  • Test data: the remaining consumer videos
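With 6 events and 3 labeled consumer videos per event, 18 consumer videos join all 906 web videos for training (924 in total). The remaining 195 - 18 = 177 consumer videos form the test set. The sketch assumes a random per-event selection and a dict that maps each event to its video ids. The slides do not say how the 3 videos were chosen, and `split_consumer_videos` is hypothetical.

```python
import random


def split_consumer_videos(consumer, per_event=3, seed=0):
    """Pick `per_event` labeled consumer videos per event for training.

    `consumer` maps an event name to a list of unique video ids
    (hypothetical layout). Returns (train, test) lists of (event, id) pairs.
    """
    rng = random.Random(seed)
    train, test = [], []
    for event, videos in sorted(consumer.items()):
        chosen = set(rng.sample(videos, per_event))
        for v in videos:
            (train if v in chosen else test).append((event, v))
    return train, test
```

The web videos are not split: all of them go into the training set alongside the 18 selected consumer videos.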

18
Experiments

19
Experiments
(Figure panels: aligned vs. unaligned)
  • Aligned Space-Time Pyramid Matching (ASTPM) vs.
    Unaligned Space-Time Pyramid Matching (USTPM)
  • ASTPM is better than USTPM at Level 1

20
Experiments
21
Experiments
  • Comparisons of cross-domain learning methods
  • (a) SIFT features
  • (b) ST (space-time) features
  • (c) SIFT and ST features combined
  • Parade: 75.7 (A-MKL) vs. 62.2 (FR)

22
Experiments
  • Comparisons of cross-domain learning methods
  • Relative improvements of A-MKL over:
  • SVM_T: 36.9%
  • SVM_AT: 8.6%
  • Feature Replication (FR) [6]: 7.6%
  • Adaptive SVM (A-SVM) [8]: 49.6%
  • Domain Transfer SVM (DTSVM) [7]: 9.9%
  • MKL-based methods:
  • Better fuse SIFT features and ST features
  • Handle noise in the loose labels
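The slide does not state the formula. Assuming relative improvement means A-MKL's gain over a baseline divided by the baseline's score, it can be computed as below (`relative_improvement` is a hypothetical helper). Applied to the per-event parade result on slide 21, (75.7 - 62.2) / 62.2 gives about 21.7%.

```python
def relative_improvement(ours, baseline):
    """Relative improvement in percent: 100 * (ours - baseline) / baseline."""
    return 100.0 * (ours - baseline) / baseline
```

Relative gains depend on the baseline's level: the same absolute gap looks larger against a weak baseline.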

23
Conclusion
  • We propose a new event recognition framework for
    consumer videos by leveraging a large number of
    loosely labeled web videos.
  • We develop a new aligned space-time pyramid
    matching method.
  • We present a new cross-domain learning method,
    A-MKL, which handles the mismatch between the
    data distributions of the consumer video domain
    and the web video domain.

24
References
  • [1] D. Xu and S.-F. Chang. Video event
    recognition using kernel methods with multi-level
    temporal alignment. T-PAMI, 30(11):1985-1997, 2008.
  • [2] I. Laptev, M. Marszalek, C. Schmid, and B.
    Rozenfeld. Learning realistic human actions from
    movies. In CVPR, 2008.
  • [3] Y. Rubner, C. Tomasi, and L. J. Guibas. The
    Earth mover's distance as a metric for image
    retrieval. IJCV, 40(2):99-121, 2000.
  • [4] K. M. Borgwardt, A. Gretton, M. J. Rasch,
    H.-P. Kriegel, B. Schölkopf, and A. Smola.
    Integrating structured biological data by kernel
    maximum mean discrepancy. In ISMB, 2006.

25
References
  • [5] F. Bach, G. R. G. Lanckriet, and M. I.
    Jordan. Multiple kernel learning, conic duality,
    and the SMO algorithm. In ICML, 2004.
  • [6] H. Daumé III. Frustratingly easy domain
    adaptation. In ACL, 2007.
  • [7] L. Duan, I. W. Tsang, D. Xu, and S. J.
    Maybank. Domain transfer SVM for video concept
    detection. In CVPR, 2009.
  • [8] J. Yang, R. Yan, and A. G. Hauptmann.
    Cross-domain video concept detection using
    adaptive SVMs. In ACM MM, 2007.
  • [9] D. G. Lowe. Distinctive image features from
    scale-invariant keypoints. IJCV, 60(2):91-110,
    2004.

26
Thank you!