1
Content-Based Video Analysis based on Audiovisual
Features for Knowledge Discovery
  • Chia-Hung Yeh
  • Signal and Image Processing Institute
  • Department of Electrical Engineering
  • University of Southern California

2
Vision
Parsing or Segmentation
3
Guidelines
  • Motivation
  • Introduction
  • Overview of visual and audio content
  • Video abstraction
  • Multimodal information concept
  • Knowledge discovery via video mining
  • Our previous work
  • Conclusion and future work

4
Motivation
  • Amazing growth in the amount of digital video
    data in recent years.
  • Develop tools to classify, retrieve, and abstract
    video content
  • Develop tools for summarization and abstraction
  • Bridge the gap between low-level features and
    high-level semantic content
  • Enabling machines to understand video is important
    and challenging

5
Why, What and How
  • Why video content analysis?
  • Modern multimedia technologies have led to huge
    digital video collections, but efficient access
    to video content is still in its infancy because
    of its bulky data volume and unstructured data
    format.
  • What is video content analysis?
  • Video content analysis analyzes the video content
    and attempts to automatically understand the
    embedded video semantics as humans do
  • How to do video content analysis?

6
Overview of Visual Content
  • Structured analysis
  • Extract hierarchical video structure

[Figure: text document analogy. A text document is segmented into words, words are grouped into sentences, and sentences yield key sentences.]
7
Overview of Audio Content
  • Continuous in the time domain, unlike visual
    content
  • Multiple sound sources exist in a sound track,
    like many objects in a single frame
  • It is tough to separate audio content and give a
    suitable description
  • Framework in MPEG-7: silence, timbre, waveform,
    spectral, harmonic and fundamental frequency
  • Some special features for music and speech

8
Content-Based Video Indexing
  • Process of attaching content based labels to
    video shots
  • Essential for content-based classification and
    retrieval
  • Some required techniques
  • Shot detection
  • Key frame selection
  • Object segmentation and recognition
  • Visual/audio feature extraction
  • Speech recognition, video text, VOCR

9
Content-Based Video Classification
  • Segment and classify videos into meaningful
    categories
  • Classify videos based on predefined topics
  • Multimodal concept
  • Visual features
  • Audio features
  • Metadata features
  • Domain-specific knowledge

10
Query (Retrieval Methods)
  • Simple visual feature query
  • Feature combination query
  • Query by example (QBE)
  • Retrieve video that is similar to an example
  • Localized feature query
  • Example: retrieve video with a car moving toward
    the right
  • Object relationship query
  • Concept query (query by keyword)
  • Metadata
  • Time, date, etc.

11
The Ways to Browse a Video
  • Faster playback
  • Audio time-scale modification: time-saving
    factor of 1.5 to 2.5
  • 15-20% time reduction by removing and
    shortening pauses
  • Storyboard
  • Composed of representative still frames
    (Keyframes)
  • Moving storyboard
  • Display keyframes while synchronized with the
    original audio track
  • Highlight
  • Pre-defined special events (for example, sports
    and news)
  • Skimming
  • Extract short video clips to build a much shorter
    video

12
Timeline of Related Technique Development
13
Image Retrieval and Video Browsing
  • Query by Image Content (QBIC), IBM, 1995
  • Complex multi-feature and multi-object queries
  • Video browsing
  • Quickly and efficiently discover information
  • Browsing and searching usually complement each
    other
  • Browsing visual content is easier than browsing
    audio content
  • Achieved by static storyboard, dynamic video
    clips, fast forward
  • Representative work
  • Gary Marchionini, University of Maryland
  • S.-F. Chang, Columbia University

14
Video Abstraction
  • Video summarization and video skimming
  • Both belong to video abstraction and differ from
    video browsing
  • Automatically retrieve a collection of the most
    significant and most representative segments
  • Required techniques
  • Shot detection, scene generation
  • Motion analysis
  • Face recognition
  • Audio segmentation
  • Text detection
  • Music detection

15
Video Abstraction
  • A video abstract
  • A sequence of still or moving images that
    preserves the essential content of the original
    video while being much shorter
  • Applications
  • Automated authoring of web content
  • Web news
  • Web seminar
  • Consumer domain applications
  • Analyzing, filtering, and browsing

16
Video Summarization (I)
  • A collection of salient frames that represent the
    underlying content
  • Most related work focuses on ways to extract
    still frames
  • Categorized into three classes
  • Frame-based
  • Randomly or uniformly select
  • Shot-based
  • Keyframe
  • Feature-based
  • Motion, color and so on
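
The difference between frame-based and shot-based selection can be sketched as follows. This is a minimal illustration, assuming frames are addressed by index and shot boundaries are already known; the function names `uniform_keyframes` and `shot_keyframes`, and the choice of the middle frame as the keyframe, are hypothetical and not taken from the talk.

```python
from typing import List, Sequence


def uniform_keyframes(num_frames: int, count: int) -> List[int]:
    """Frame-based: pick `count` frame indices spread evenly over the video."""
    return [(2 * i + 1) * num_frames // (2 * count) for i in range(count)]


def shot_keyframes(shot_starts: Sequence[int], num_frames: int) -> List[int]:
    """Shot-based: pick the middle frame of every shot (an assumed heuristic)."""
    ends = list(shot_starts[1:]) + [num_frames]
    return [(start + end - 1) // 2 for start, end in zip(shot_starts, ends)]


# A 100-frame video with one long shot (frames 0-89) and one short shot (90-99).
print(uniform_keyframes(100, 2))
print(shot_keyframes([0, 90], 100))
```

With two uniformly spaced frames, both indices fall inside the long shot, so the short shot is not represented; the shot-based variant returns one frame per shot.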

17
Video Summarization (II)
  • Representative work
  • Y. Taniguchi, (1995)
  • Frame-based scheme
  • Simple, but may not be representative because
    shot lengths are not uniform
  • H.-J. Zhang, Microsoft Research China (1997)
  • Keyframe based on color histogram
  • Gong and Liu, NEC Laboratories America (2003)
  • SVD (Singular Value Decomposition)
  • Capture temporal and spatial characteristics
  • Tseng, Lin and J. R. Smith, IBM T. J. Watson
    Research Center (2002)
  • Video summarization scheme for pervasive mobile
    device

18
Video Skimming
  • A good skim is much like a movie trailer
  • A synopsis of the entire video
  • Representative work
  • M. Smith and T. Kanade, Carnegie Mellon
    University (1995)
  • Audio and image characterization
  • S. Pfeiffer, University of Mannheim (1996)
  • VAbstract system
  • Detection of special events such as dialogs,
    explosions and text occurrences
  • H. Sundaram and S.-F. Chang, Columbia University
    (2001)
  • A semantic skimming system
  • Visual complexity for human understanding
  • Film syntax

19
Video Skimming Application
  • Video content transcoding
  • Content-based live sport video filtering

20
Video Shot Structure
  • Shot, a cinematic term, is the smallest
    addressable video unit (the building block). A
    shot contains a set of continuously recorded
    frames
  • Two types of video shots
  • Camera break: abrupt content change between
    neighboring frames. Usually corresponds to an
    editing cut
  • Gradual transition: smooth content change over a
    set of consecutive frames. Usually caused by
    special effects
  • Shot detection is usually the first step towards
    video content analysis
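
Camera-break detection can be illustrated with a minimal sketch that compares gray-level histograms of neighboring frames. The frame representation (flat lists of 0-255 intensities), the bin count, and the threshold are assumptions made for illustration, not values from the talk; gradual transitions would need a different test.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized gray-level histogram of a frame given as flat 0-255 intensities."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return indices i where an abrupt change occurs between frame i-1 and frame i."""
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # L1 distance between normalized histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts


dark = [20] * 100
bright = [220] * 100
frames = [dark, dark, dark, bright, bright]
print(detect_cuts(frames))  # [3]
```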

21
Scene Characteristics
  • Scene is a semantic concept which refers to a
    relatively complete video paragraph with coherent
    semantic meaning. It is subjectively defined
  • Shots within a movie scene have the following
    three features
  • Visual similarity
  • Since a scene could only be developed within
    certain spatial and temporal localities, the
    directors have to repeat some essential shots to
    convey parallelism and continuity of activities
    due to the sequential nature of film making
  • Audio similarity
  • Similar background noises
  • Speeches from the same person have similar
    acoustic characteristics
  • Time locality
  • Visually similar shots should also be temporally
    close to each other if they do belong to the same
    scene

22
Basic Audio Features
  • Energy
  • Silence or pause detection
  • Zero crossing rate (ZCR)
  • The frequency of the audio signal amplitude
    passing through the zero value in a given time
  • Energy centroid
  • Speech range: 100 Hz to 7 kHz
  • Music range: 16 Hz to 16 kHz
  • Band periodicity
  • Harmonic sounds
  • Music: high-frequency components are integer
    multiples of the lowest one
  • Speech: pitch
  • MFCC - (Mel-Frequency Cepstral Coefficients)
  • 13 linearly-spaced filters
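
The two simplest features in this list, energy and zero crossing rate, can be sketched in a few lines. The frame length, the silence threshold, and the test tones below are illustrative assumptions.

```python
import math
from typing import Sequence


def short_time_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one analysis frame."""
    return sum(s * s for s in samples) / len(samples)


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def is_silence(samples: Sequence[float], energy_threshold: float = 1e-4) -> bool:
    """Silence or pause detection by thresholding the frame energy."""
    return short_time_energy(samples) < energy_threshold


sample_rate = 8000
low = [math.sin(2 * math.pi * 100 * n / sample_rate + 0.1) for n in range(800)]
high = [math.sin(2 * math.pi * 1000 * n / sample_rate + 0.1) for n in range(800)]
print(zero_crossing_rate(low) < zero_crossing_rate(high))  # True
print(is_silence([0.0] * 800))  # True
```

A higher-pitched tone crosses zero more often per unit of time, which is why ZCR serves as a cheap indicator of spectral content.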

23
Multimodal Information Concept
24
Multimodal Framework for Video Content
Interpretation
  • Application to automatic TV program abstraction
  • Allow user to request topic-level programs
  • Integrate multiple modalities: visual, audio and
    text information
  • Multi-level concepts
  • Low: low-level features
  • Mid: object detection, event modeling
  • High: classification result of semantic content
  • Probabilistic model using Bayesian network for
    classification (causal relationship,
    domain-knowledge)
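
As a rough illustration of probabilistic fusion across modalities, the sketch below uses naive Bayes, the simplest Bayesian network, in which every modality is assumed conditionally independent given the topic. The network structure used in the talk is not given in the transcript, so this is a stand-in; the topics, observation values, and all probabilities are made up for illustration.

```python
from typing import Dict


def fuse(prior: Dict[str, float],
         likelihoods: Dict[str, Dict[str, Dict[str, float]]],
         observed: Dict[str, str]) -> Dict[str, float]:
    """Posterior over topics given one observed value per modality.

    likelihoods[modality][topic][value] holds P(value | topic).
    """
    scores = {}
    for topic, p in prior.items():
        score = p
        for modality, value in observed.items():
            score *= likelihoods[modality][topic][value]
        scores[topic] = score
    total = sum(scores.values())
    return {topic: s / total for topic, s in scores.items()}


# Made-up numbers, for illustration only.
prior = {"news": 0.5, "sports": 0.5}
likelihoods = {
    "visual": {"news": {"anchor": 0.8, "field": 0.2},
               "sports": {"anchor": 0.1, "field": 0.9}},
    "audio": {"news": {"speech": 0.9, "cheering": 0.1},
              "sports": {"speech": 0.4, "cheering": 0.6}},
    "text": {"news": {"politics": 0.7, "score": 0.3},
             "sports": {"politics": 0.1, "score": 0.9}},
}
observed = {"visual": "anchor", "audio": "speech", "text": "politics"}
posterior = fuse(prior, likelihoods, observed)
print(max(posterior, key=posterior.get))  # news
```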

25
Probabilistic Model Data Fusion
26
How to Work with the Framework
  • Preprocessing
  • Video segmentation (shot detection) and key frame
    selection
  • VOCR, speech recognition
  • Feature Extraction
  • Visual features based on key-frame
  • Color, texture, shape, sketch, etc.
  • Motion features
  • Camera operation: panning, tilting, zooming,
    tracking, booming, dollying
  • Motion trajectories (moving objects)
  • Object abstraction, recognition
  • Audio features
  • average energy, bandwidth, pitch, mel-frequency
    cepstral coefficients, etc.
  • Textual features (Transcript)
  • Knowledge tree with many keyword categories:
    politics, entertainment, stock, art, war, etc.
  • Word spotting, vote histogram
  • Building and training the Bayesian network

27
Challenging Points
  • Preprocessing is significant in the framework.
  • Accuracy of key-frame selection
  • Accuracy of speech recognition and VOCR
  • Good feature extraction is important for the
    performance of classification.
  • Modeling semantic video objects and events
  • How to integrate multiple modalities still needs
    careful consideration

28
Knowledge Discovery via Video Mining
  • Objectives
  • Find the hidden links between isolated news,
    events, etc.
  • Find the general trend of an event development
  • Predict the possible future event
  • Discover abnormal events
  • Required Technologies
  • Domain-specific knowledge model
  • Mining association rules, sequential patterns and
    correlations
  • Effective and fast classification and clustering
  • Challenges
  • Model build-up in special knowledge domain
  • Integration of semantic mining and feature-based
    mining
  • Effective and scalable classification and
    clustering algorithms

29
Video Mining Issues
  • Frequent/Sequential Pattern Discovery
  • Fast and scalable algorithms for mining frequent,
    sequential and structured patterns and for
    correlation analysis
  • Similarity of rule/event search/measurement
  • Efficient and fast classification and clustering
    algorithms
  • Constraint-based classification and clustering
    algorithms
  • Spatiotemporal data mining algorithms
  • Stream data mining (classification and
    clustering) algorithms
  • Surprise/outlier discovery and measurement
  • Detection of outliers based on similarity and
    trend analysis
  • Detection of outliers and surprising events based
    on stream data mining algorithms
  • Multidimensional data mining for trend prediction

30
Framework of Video Mining
31
Our Previous Work
  • TV Commercial Detection
  • Visual/audio information processing
  • Cinema rules
  • Intensity mapping
  • Tempo analysis in digital video (Professional
    video)
  • Audio tempo
  • Motion tempo
  • Home video processing (Non-professional)
  • Quality enhancement (Bad shot detection)
  • Music and video matching

32
Commercial Detection
  • First step in any TV program content
    management
  • Monitor broadcast
  • Government
  • Advertising companies
  • Commercial features
  • Delimiting black frame (not available in some
    countries)
  • High cut frequency and short shot interval
    (important feature)
  • Still images
  • Special editing styles and effects
  • Text and logo
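
The cut-frequency feature can be sketched as a cut rate measured over a time interval. The sketch assumes cut times (in seconds) are already available from shot detection; the threshold of 20 cuts per minute and the function names are hypothetical.

```python
from typing import Sequence


def cut_rate(cut_times: Sequence[float], start: float, end: float) -> float:
    """Shot cuts per minute within the half-open interval [start, end), in seconds."""
    count = sum(1 for t in cut_times if start <= t < end)
    return count * 60.0 / (end - start)


def is_commercial_candidate(cut_times: Sequence[float], start: float, end: float,
                            min_rate: float = 20.0) -> bool:
    """Flag an interval whose cut rate reaches an assumed threshold."""
    return cut_rate(cut_times, start, end) >= min_rate


program_cuts = [10, 20, 30, 40, 50]        # one cut every 10 s
commercial_cuts = list(range(60, 90, 2))   # one cut every 2 s
cuts = program_cuts + commercial_cuts
print(cut_rate(cuts, 0, 60), cut_rate(cuts, 60, 90))  # 5.0 30.0
```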

33
Commercial Detection
  • Visual information processing
  • Black frame detection
  • Shot detection and its statistical analysis
  • Still image detection
  • Text-region detection
  • Edge change rate detection
  • Audio information processing
  • Volume control
  • Silence

34
Commercial Detection
  • Structure of TV program

[Figure: structure of a TV program, showing normal program segments (with station logo), spots, and a delimiting black frame.]
35
Shot Detection and Its Statistical Analysis
[Figure: shot statistics with the commercial start point marked.]
36
Still Image Detection
  • Still Image
  • A video clip is composed of a sequence of images
  • Find a set of consecutive images that change
    little over a period of time
  • Difficulty
  • Even when a video clip appears still, the
    difference between two consecutive images is
    seldom zero
  • It is tough to measure the moving part (human
    eyes are sensitive to motion)
  • Main idea
  • Quantify motion in each image to detect still
    images
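
The main idea can be sketched by scoring motion between consecutive frames and looking for runs of low scores. The motion measure used here (mean absolute pixel difference), the threshold, and the minimum run length are assumptions; the talk does not specify how motion is quantified.

```python
from typing import List, Sequence, Tuple


def motion_score(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def still_runs(frames: Sequence[Sequence[int]], threshold: float = 2.0,
               min_length: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges where motion stays below threshold."""
    runs = []
    start = None
    for i in range(1, len(frames)):
        if motion_score(frames[i - 1], frames[i]) < threshold:
            if start is None:
                start = i - 1
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(frames) - start >= min_length:
        runs.append((start, len(frames) - 1))
    return runs


a = [10, 10, 10, 10]
a_noisy = [11, 10, 10, 10]   # nearly still: the difference is small but not zero
b = [200, 200, 200, 200]
c = [50, 50, 50, 50]
print(still_runs([a, a_noisy, a, a_noisy, b, c]))  # [(0, 3)]
```

The threshold tolerates the small non-zero differences that appear even in clips that look still.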

37
Still Image Detection
[Figure: detection results, marking erroneous detections and truly still images.]
38
Tempo Analysis and Cinema Rules
  • The Visual Story: Seeing the Structure of Film,
    TV, and New Media, by Bruce Block
  • Relationship between story structure and visual
    structure
  • Their intensity maps are correlated
  • Principle of contrast and affinity
  • The greater the contrast in a visual component,
    the more the visual intensity or dynamic
    increases

39
Cinema Rules
  • Every feature film has a well designed story
    structure, which contains the beginning
    (exposition), the middle (conflict), and the end
    (resolution)

  • EX (exposition): gives the facts needed to begin
    the story
  • CO (conflict): contains rising actions or
    conflict
  • CX (climax)
  • R (resolution): ends the story
40
Cinema Rules
  • Scene
  • A simple theme in a scene
  • Each scene is composed of setup part, progressing
    part, and resolution part
  • Final film is just a way to present this theme
  • Dialog
  • Close-up view
  • A story unit
  • An example of a scene
  • The main actor drove the main actress home from
    the train station
  • A simple action
  • Met at train station -> On the road -> Another
    main actor joined them -> Arrived home

41
Audio Tempo
  • Music tempo
  • Definition in music
  • Note
  • Meter: a longer period that contains several
    beats. For example, we can count ONE-two-three,
    ONE-two-three
  • Tempo (pace/beat period)
  • It is often indicated at the beginning. For
    example, the rate should be 100 quarter notes per
    minute (we clap 100 times per minute)

42
Audio Tempo
  • Speech tempo
  • Emotion detection
  • Segmental durations
  • Syllable or phoneme
  • Audio tempo
  • Short time pace
  • Short-term memory
  • The number of sound events per unit of time
  • The more events, the faster it seems to go
  • Onset
  • A new note or a new syllable

43
Audio Tempo
  • Diagram of audio tempo analysis

44
Audio Tempo
  • Frequency filterbank
  • Perceptual frequency
  • Critical bands
  • Wavelet-packet
  • Multirate system
  • Envelope extractor
  • Rectify
  • Filtering: 50 ms half-Hamming window
  • Differentiator
  • First-order difference
  • Half-wave rectified

[Figure: input signal and detected onsets]
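
The envelope extractor and differentiator stages can be sketched for a single band as follows. The frequency filterbank is omitted for brevity, so this is a simplified, assumed reading of the pipeline rather than the system described in the talk; the window is taken to be the decaying half of a Hamming window applied as a causal filter.

```python
import math
from typing import List, Sequence


def half_hamming(length: int) -> List[float]:
    """Decaying half of a Hamming window: 1.0 at n=0 down to 0.08 at the last tap."""
    if length == 1:
        return [1.0]
    return [0.54 + 0.46 * math.cos(math.pi * n / (length - 1)) for n in range(length)]


def envelope(signal: Sequence[float], sample_rate: int,
             window_ms: float = 50.0) -> List[float]:
    """Full-wave rectify, then smooth by causal convolution with a half-Hamming window."""
    window = half_hamming(max(1, int(sample_rate * window_ms / 1000.0)))
    norm = sum(window)
    rectified = [abs(s) for s in signal]
    out = []
    for t in range(len(rectified)):
        acc = 0.0
        for k, w in enumerate(window):
            if t - k < 0:
                break
            acc += w * rectified[t - k]
        out.append(acc / norm)
    return out


def onset_strength(signal: Sequence[float], sample_rate: int) -> List[float]:
    """First-order difference of the envelope, half-wave rectified."""
    env = envelope(signal, sample_rate)
    return [0.0] + [max(0.0, b - a) for a, b in zip(env, env[1:])]


sample_rate = 1000
signal = [0.0] * 200 + [1.0, -1.0] * 100   # silence followed by a burst
strength = onset_strength(signal, sample_rate)
print(strength.index(max(strength)))  # 200
```

Half-wave rectification keeps only increases in the envelope, so the strongest response lands where the burst begins rather than where it ends.
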
45
Audio Tempo
  • Boundary of story units
  • Local minima of audio tempo
  • Post signal processing
  • Help to get local minima
  • Three steps
  • Lowpass filtering
  • Morphological operation
  • Minmax
  • Close operation
  • Detect local minima
  • Detected valleys

[Figure: post-processing for audio tempo analysis]
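
The three post-processing steps can be sketched on a one-dimensional tempo curve. The moving-average lowpass filter, the window widths, and the rule of reporting a flat valley at its center are assumptions chosen for illustration.

```python
from typing import List, Sequence


def _window(x: Sequence[float], i: int, width: int) -> Sequence[float]:
    """Values in a centered window of odd `width`, clipped at the ends."""
    half = width // 2
    return x[max(0, i - half): i + half + 1]


def lowpass(x: Sequence[float], width: int) -> List[float]:
    """Moving-average lowpass filter."""
    return [sum(_window(x, i, width)) / len(_window(x, i, width))
            for i in range(len(x))]


def dilate(x: Sequence[float], width: int) -> List[float]:
    return [max(_window(x, i, width)) for i in range(len(x))]


def erode(x: Sequence[float], width: int) -> List[float]:
    return [min(_window(x, i, width)) for i in range(len(x))]


def close(x: Sequence[float], width: int) -> List[float]:
    """Morphological closing: dilation (max) followed by erosion (min)."""
    return erode(dilate(x, width), width)


def valleys(x: Sequence[float]) -> List[int]:
    """Indices of local minima; a flat valley floor is reported once, at its center."""
    result = []
    i, n = 1, len(x)
    while i < n - 1:
        if x[i] < x[i - 1]:
            j = i
            while j + 1 < n and x[j + 1] == x[i]:
                j += 1
            if j + 1 < n and x[j + 1] > x[i]:
                result.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return result


def story_boundaries(tempo: Sequence[float], smooth_width: int = 1,
                     close_width: int = 3) -> List[int]:
    """Lowpass filtering, closing, then local-minimum detection."""
    return valleys(close(lowpass(tempo, smooth_width), close_width))


# A one-sample dip at index 3 and a broad valley over indices 8-12.
tempo = [5, 5, 5, 1, 5, 5, 5, 5, 2, 2, 2, 2, 2, 5, 5, 5, 5]
print(story_boundaries(tempo))  # [10]
```

The closing operation fills the one-sample dip, so only the broad valley survives as a candidate boundary.
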
46
Motion Analysis
  • The variance of motion vectors
  • Computed over a window of shots from the average
    length of the motion vectors in each shot, with
    shots addressed by shot index
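
A minimal sketch of such a measure is given below, assuming a window centered on each shot and a population variance. The window size and the function name `motion_variance` are hypothetical, and the exact formula used in the talk may differ.

```python
from typing import List, Sequence


def motion_variance(avg_lengths: Sequence[float], half_window: int = 2) -> List[float]:
    """Variance of per-shot average motion-vector length in a window around each shot."""
    result = []
    for n in range(len(avg_lengths)):
        window = avg_lengths[max(0, n - half_window): n + half_window + 1]
        mean = sum(window) / len(window)
        result.append(sum((m - mean) ** 2 for m in window) / len(window))
    return result


# One value per shot: average motion-vector length (made-up numbers).
print(motion_variance([1.0, 1.0, 1.0], half_window=1))  # [0.0, 0.0, 0.0]
print(motion_variance([0.0, 2.0], half_window=1))       # [1.0, 1.0]
```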

47
Motion Analysis
  • Boundary of story units
  • Transition Edges
  • Post processing
  • Morphological operation
  • Median
  • Maxmin
  • Minmax
  • Gradient
  • Detect edges

[Figure: post-processing for visual tempo]
48
Skimming Video
  • Test data
  • Legends of The Fall
  • Beginning 26 minutes
  • MPEG format
  • 352 × 240 pixels
  • 44.1 kHz

49
Home Video Processing
  • Home video characteristics
  • Fragmented
  • Sound may not be very important
  • Bad shots
  • Stabilization
  • Focus
  • Lighting

50
Bad Shots
  • Shaky
  • Drive
  • Walk
  • Vibration in the camera motion across successive
    frames

51
Bad Shots
  • Ill-light
  • Too dark/bright
  • Too much variance
  • Diaphragm
  • Lighting Problem
  • Average of luminance
  • Highest 1/3 pixels and lowest 1/3 pixels
  • Negative feedback
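
One possible reading of the luminance statistics is sketched below: compare the mean of the brightest third and the darkest third of the pixels against fixed limits. The decision rule, the limits of 50 and 205, and the function names are assumptions made for illustration and are not stated in the talk.

```python
from typing import Sequence, Tuple


def lighting_stats(luma: Sequence[int]) -> Tuple[float, float, float]:
    """Mean luminance, mean of the darkest third, mean of the brightest third."""
    ordered = sorted(luma)
    third = max(1, len(ordered) // 3)
    mean = sum(ordered) / len(ordered)
    dark_mean = sum(ordered[:third]) / third
    bright_mean = sum(ordered[-third:]) / third
    return mean, dark_mean, bright_mean


def classify_lighting(luma: Sequence[int], dark_limit: float = 50.0,
                      bright_limit: float = 205.0) -> str:
    """Label a frame from its 0-255 luminance values (assumed rule)."""
    _, dark_mean, bright_mean = lighting_stats(luma)
    if bright_mean < dark_limit:
        return "too dark"      # even the brightest third is dark
    if dark_mean > bright_limit:
        return "too bright"    # even the darkest third is bright
    return "ok"


print(classify_lighting([10] * 9))                                   # too dark
print(classify_lighting([250] * 9))                                  # too bright
print(classify_lighting([0, 0, 0, 128, 128, 128, 255, 255, 255]))    # ok
```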

52
Bad Shots
  • Blur
  • Motion blur
  • Out-of-focus blur
  • Foggy blur

53
Music and Video Matching
  • Shot detection
  • Remove bad shots
  • Match music tempo
  • Shot length
  • Motion activity

54
Authoring Scheme
  • Match music tempo
  • High tempo
  • Small segment length
  • Transition time
  • High motion activity
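
One way to read this rule is that faster music calls for shorter video segments. The sketch below assumes, hypothetically, that each segment spans a fixed number of beats, so the segment length follows from the beat period of 60 / tempo seconds; the value of `beats_per_segment` is an illustrative parameter.

```python
def beat_period(tempo_bpm: float) -> float:
    """Seconds per beat for a tempo given in beats per minute."""
    return 60.0 / tempo_bpm


def segment_length(tempo_bpm: float, beats_per_segment: int = 8) -> float:
    """Video segment length in seconds, assuming a fixed number of beats per segment."""
    return beats_per_segment * beat_period(tempo_bpm)


print(beat_period(100))                          # 0.6
print(segment_length(60), segment_length(120))   # 8.0 4.0
```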

55
Experimental Results
  • Test data
  • Input music: 5.5-minute piece, Canon
  • Input video clips
  • Activities of babies 0 to 3 years old
  • Man-made bad shots
  • Average clip length is about 20 seconds
  • Total length is 50 minutes

56
Well-Known Research in Video Content Analysis
Field
  • Well-known universities
  • Digital Video Multimedia laboratory (DVMM),
    Columbia University
  • MIT Media laboratory
  • Informedia Digital Video Understanding, Carnegie
    Mellon University
  • Department of Electrical and Computer
    Engineering, University of Illinois at
    Urbana-Champaign
  • Signal and Image Processing Institute, University
    of Southern California
  • Department of Electrical Engineering, Princeton
    University
  • Language and media processing laboratory,
    University of Maryland

57
Well-Known Research in Video Content Analysis
Field
  • Well-known R&D laboratories
  • IBM T. J. Watson research center
  • IBM Almaden research center
  • Intel corporation
  • Sharp Laboratory of America (SLA)
  • Microsoft research laboratory
  • Microsoft research China
  • Hewlett-Packard research laboratory
  • AT&T Bell laboratory
  • InterVideo
  • Pinnacle

58
Conclusion
  • Introduction of several basic concepts
  • Basic processing and low-level feature extraction
  • Semantic video modeling and indexing
  • Multimodal framework for topic classification of
    Video
  • Knowledge discovery via video mining
  • Our research results
  • Discussion of challenging problems

59
Questions
Thank You