A Framework for the Representation and Integration of Multimedia Content and Context Information

ECE-CMU, April 29, 2002

1
A Framework for the Representation and
Integration of Multimedia Content and Context
Information
Radu S. Jasinschi, Philips Research
2
Overview
  • Introduction
  • Related work
  • Problem statement
  • Proposed formalism
  • Representation: content and context
  • Multimodal integration
  • Bayesian networks: content information
  • Hierarchical priors: context information
  • Application: Video Scout
  • Experiments
  • Conclusion

3
Introduction
  • Market facts
  • Digital video consumption: 300 channels
  • Personalized video recorders: the next wave
  • Web search engines: exponential growth in
    multimedia information
  • Research needs
  • Content-based video analysis and retrieval of
    multimedia information
  • High-level video content information indexing
  • Proposed framework
  • Content and context information
  • Structured representation
  • Probabilistic integration

4
Related Work
  • Video databases
  • QBIC (IBM)
  • Informedia (CMU)
  • Virage
  • VideoQ (Columbia University)
  • Probabilistic Methods
  • M. Naphade (IBM)
  • N. Vasconcelos (COMPAQ)
  • Speech driven applications
  • C. Neti (IBM)
  • T. Chen (CMU)

5
Problem Statement
  • How do we segment, index, and store many hours
    of video from 300 TV channels?
  • How do we represent and integrate multimodal
    information?

6
Overview
  • Introduction
  • Related work
  • Problem statement
  • Proposed formalism
  • Representation: content and context
  • Multimodal integration
  • Bayesian networks: content information
  • Hierarchical priors: context information
  • Application: Video Scout
  • Experiments
  • Conclusion

7
Proposed Formalism
  • Structured multimedia representation
  • Content information
  • Granularity
  • Abstraction
  • Context information
  • Probabilistic method of multimodal information
    integration
  • Bayesian networks
  • Hierarchical priors

8
Multimedia Content Information
  • Multimedia content objects
  • Three modalities
  • Visual: shots, faces, trees, etc.
  • Audio: speech, music, etc.
  • Text: transcript, keywords, etc.
  • Structured content representation
  • Levels of granularity and abstraction
  • Allows for the consistent representation and
    integration of multimedia content information

9
Structured Content Representation
  • Content granularity: levels of detail
  • Content abstraction: semantic information

10
Multimedia Context Information
  • Context information
  • Underlying structure, signature or patterns
  • Supports an interpretation but is not an
    interpretation itself
  • Can be used to constrain the content
    information, reducing the number of possible
    interpretations
  • Content versus context information

11
Multimedia Context Taxonomy
12
Semantic (Textual) Context
  • Formalized in the linguistic domain
  • Example: the proposition P ("Holmes is a
    detective") has an ambiguous meaning
  • Knowledge of its semantic context, in this case
    the stories of Sherlock Holmes, disambiguates the
    statement
  • Formalization: ist(context-of("Sherlock Holmes
    stories"), "Holmes is a detective")
  • General structure: ist(C, P), where C is the
    context
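
The ist(C, P) relation reads "proposition P is true in context C". A minimal sketch of that reading follows; the `CONTEXTS` store, its entries, and the `ist` helper are hypothetical names introduced for illustration and are not part of the presented system.

```python
# Hypothetical illustration of ist(C, P): "proposition P is true in context C".
# The context store and its contents are assumptions made for this sketch.
CONTEXTS = {
    "Sherlock Holmes stories": {"Holmes is a detective"},
    "Real-world history": set(),
}


def ist(context: str, proposition: str) -> bool:
    """Return True if the proposition is asserted in the given context."""
    return proposition in CONTEXTS.get(context, set())


# The same proposition holds in one context and not in the other.
holds_in_stories = ist("Sherlock Holmes stories", "Holmes is a detective")
holds_in_history = ist("Real-world history", "Holmes is a detective")
```

The point of the sketch is only that the truth of P is evaluated relative to C, which is how context narrows the set of possible interpretations.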

13
Multimedia Context
  • Visual context taxonomy

14
Multimedia Context
  • Audio context taxonomy

15
Multimodal Integration
  • Combine evidence: robustness
  • Use all modalities: visual, audio, text
  • Integrate content information
  • Integrate content and context

16
Probabilistic Framework
  • Bayesian network
  • Integrate content information at the same
    granularity level
  • Intra-modality: same mode, different attributes
  • Inter-modality: different mode and attributes
  • Link different levels of granularity
  • Hierarchical priors
  • Integrate content and context
  • Context used as prior information for content

17
Bayesian Network Example
18
Bayesian Network Elements
  • Directed acyclic graph
  • Conditional probability
  • Joint probability distribution
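
These three elements are connected by the standard factorization of a Bayesian network: the joint distribution is the product of each node's conditional probability given its parents in the directed acyclic graph. The symbols X_i and Pa(X_i) are notation chosen here, not taken from the slides.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Here Pa(X_i) denotes the set of parent nodes of X_i; a node without parents contributes its marginal P(X_i).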

19
Hierarchical Priors Example
20
Hierarchical Priors Elements
  • Chapman-Kolmogorov equation
  • Nested priors
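
As a reminder of what the Chapman-Kolmogorov equation states, the discrete form is sketched below. The symbols are assumptions made for illustration: c stands for a content variable and k_1, k_2 for two nested context levels. The slides do not show the exact form used in the framework.

```latex
P(c \mid k_2) = \sum_{k_1} P(c \mid k_1)\, P(k_1 \mid k_2)
```

This holds when c is conditionally independent of k_2 given k_1. Read this way, the prior on content at one level is obtained by marginalizing over the level above it, which is one plausible reading of "nested priors".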

21
Content and Context Layers
  • Combine Bayesian networks and hierarchical priors

22
Overview
  • Introduction
  • Related work
  • Problem statement
  • Proposed formalism
  • Representation: content and context
  • Multimodal integration
  • Bayesian networks: content information
  • Hierarchical priors: context information
  • Application: Video Scout
  • Experiments
  • Conclusion

23
Application: Video Scout
  • End-to-end system: prototype of a personal video
    recorder
  • Input
  • Broadcast TV program video
  • Electronic program guide (EPG)
  • Personal profiles: program (PPP) and content
    (CPP)
  • Output
  • TV program segments, segmented and indexed by
    topic

24
Video Scout Overview
25
Content and Context Layers
26
TV Programs
  • Domain structure
  • Commercials versus program parts
  • Commercials: short (30 sec.), fast pace
  • Program: long (> 5 min.), specific structure
  • Multimodal (visual, audio, and transcript)
    information
  • Structural correlation
  • Stochastic nature of multimedia information

27
TV Program Structure
Commercials
28
Overview
  • Introduction
  • Related work
  • Problem statement
  • Proposed formalism
  • Representation: content and context
  • Multimodal integration
  • Bayesian networks: content information
  • Hierarchical priors: context information
  • Application: Video Scout
  • Experiments
  • Conclusion

29
Experiments
  • Input
  • 9 TV programs (6 hrs.)
  • Financial news and talk shows
  • Features
  • Visual: keyframes, visual text, faces
  • Audio: noise (No), speech (Sp), music (Mu),
    Sp+No, Sp+Sp, and Sp+Mu
  • Transcript (closed captioning): 20 categories
  • Output: TV program segments and their
    classification according to topics

30
Algorithm for Segmentation and Indexing
  • 1. Commercial segmentation
  • 2. Program sub-segment (PSS) generation
  • 3. Frame-based probability generation
  • 4. PSS probabilities computation
  • P_AUDIO_FIN, P_AUDIO_TALK, P_CC_FIN, P_CC_TALK
  • P_FACETEXT_FIN, P_FACETEXT_TALK
  • 5. Combine PSS with context probabilities
  • 6. Compute joint probabilities
  • P_FIN_TOPIC, P_TALK_TOPIC
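
Steps 3 and 4 turn frame-level probabilities into one probability per class for each PSS. The slides do not say how this aggregation is done; the sketch below assumes a plain average over the frames of a PSS. The function name and the sample values are hypothetical.

```python
from statistics import mean


def pss_probabilities(frame_probs: list[dict[str, float]]) -> dict[str, float]:
    """Average per-frame class probabilities over one PSS (assumed rule)."""
    if not frame_probs:
        raise ValueError("a PSS needs at least one frame")
    classes = frame_probs[0].keys()
    return {c: mean(f[c] for f in frame_probs) for c in classes}


# Made-up frame-level audio probabilities for a three-frame PSS.
frames = [
    {"speech": 0.8, "music": 0.2},
    {"speech": 0.6, "music": 0.4},
    {"speech": 1.0, "music": 0.0},
]
pss = pss_probabilities(frames)
```

Averaging keeps the result a valid distribution whenever each frame's probabilities sum to one, which is why it is a convenient stand-in here.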

31
Example: Letterman
  • CC Categories

32
Example: Letterman, PSS #12
  • CC Categories
  • Mid-level audio probabilities
  • Mid-level visual features probabilities

33
PSS Content Probabilities
  • PSS #12: start_time = 23614, end_time = 24727
    (frames)
  • Visual
  • P_V_FACE = 0.91, P_V_TEXT = 0.09
  • Audio
  • P_NOISE = 0.11, P_SPEECH = 0.74, P_MUSIC = 0.00,
    P_SPEECH+NOISE = 0.00,
  • P_SPEECH+SPEECH = 0.00, P_SPEECH+MUSIC = 0.15
  • Transcript (Closed Captions)
  • P_CC_WEATHER = 0.20, P_CC_INTERNATIONAL = 0.20,
    P_CC_CRIME = 0.00, P_CC_SPORT = 0.20, P_CC_MOVIE
    = 0.20, P_CC_FASHION = 0.00,
  • P_CC_TECH_STOCK = 0.00, P_CC_MUSIC = 0.00,
    P_CC_AUTOMOBILE = 0.00, P_CC_WAR = 0.00,
    P_CC_ECONOMY = 0.20, P_CC_ENERGY = 0.00,
  • P_CC_STOCK = 0.00, P_CC_VIOLENCE = 0.00,
    P_CC_FINANCIAL = 0.00, P_CC_NATIONAL = 0.00,
    P_CC_BIOTECH = 0.00, P_CC_DISASTER = 0.00,
  • P_CC_ART = 0.00, P_CC_POLITICS = 0.00
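
The values on this slide can be checked as three per-modality distributions. The sketch below uses the numbers from the slide with shortened key names. The helper functions are hypothetical, and only the five nonzero closed-caption categories are listed.

```python
import math

# Probabilities for PSS 12 as given on the slide (key names shortened).
visual = {"FACE": 0.91, "TEXT": 0.09}
audio = {"NOISE": 0.11, "SPEECH": 0.74, "MUSIC": 0.00,
         "SPEECH+NOISE": 0.00, "SPEECH+SPEECH": 0.00, "SPEECH+MUSIC": 0.15}
# Only the five nonzero closed-caption categories; the other 15 are 0.00.
cc = {"WEATHER": 0.20, "INTERNATIONAL": 0.20, "SPORT": 0.20,
      "MOVIE": 0.20, "ECONOMY": 0.20}


def is_distribution(p: dict[str, float]) -> bool:
    """Check that the probabilities sum to one within rounding error."""
    return math.isclose(sum(p.values()), 1.0, abs_tol=1e-9)


def top_categories(p: dict[str, float]) -> list[str]:
    """Return every category tied for the highest probability, sorted."""
    best = max(p.values())
    return sorted(k for k, v in p.items() if math.isclose(v, best))


audio_top = top_categories(audio)
cc_top = top_categories(cc)
```

For this segment the audio evidence points clearly to speech, while the closed-caption evidence is a five-way tie and does not single out a topic on its own.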

34
Audio Genre Context Probabilities
35
Visual Genre Context Probabilities
36
Audio Genre Context Extraction
  • 1. Select TV programs of a known genre
  • 2. Segment commercials
  • 3. Tessellate the program part into units, such
    as the PSS based on close captions
  • 4. Determine a probability for each PSS based on
    the vote/probability table
  • 5. Sum up the votes for each vote/probability
  • 6. Select the best vote/probability as the
    context (probability) pattern
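
Steps 4 to 6 can be sketched as a voting scheme. The vote/probability table itself is not reproduced in this transcript, so the rule below is an assumption: each PSS casts one vote for its most probable audio class, and the normalized vote counts form the genre's context pattern. All names and sample values are hypothetical.

```python
from collections import Counter


def genre_context_pattern(pss_audio_probs: list[dict[str, float]]) -> dict[str, float]:
    """Each PSS votes for its most probable class; normalize the vote counts."""
    if not pss_audio_probs:
        raise ValueError("need at least one PSS")
    votes = Counter(max(p, key=p.get) for p in pss_audio_probs)
    total = sum(votes.values())
    return {label: count / total for label, count in votes.items()}


# Made-up PSS audio probabilities for a program of known genre.
talk_show_pss = [
    {"speech": 0.7, "music": 0.2, "noise": 0.1},
    {"speech": 0.6, "music": 0.3, "noise": 0.1},
    {"speech": 0.2, "music": 0.7, "noise": 0.1},
    {"speech": 0.8, "music": 0.1, "noise": 0.1},
]
pattern = genre_context_pattern(talk_show_pss)
best = max(pattern, key=pattern.get)  # class with the most votes
```

With these sample values three of the four segments vote for speech, so speech dominates the resulting pattern.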

37
Vote/Probability Table and Results
38
Vote/Probability Results News
39
Combining Content and Context
  • Final multimodal joint probabilities
  • P_FACETEXT_FIN = 0.0, P_FACETEXT_TALK = 1.0
  • P_AUDIO_FIN = 0.0, P_AUDIO_TALK = 1.0
  • P_CC_CAT_FIN = 0.5, P_CC_CAT_TALK = 0.5
  • Final joint topic probabilities
  • P_FIN-TOPICS = 0.0, P_TALK-TOPICS = 0.5
  • Accumulated classification results for first 12
    segments
  • Class.: # of FIN SEGS = 2, # of TALK SEGS = 10,
    # of Comm. = 0
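
The final joint topic probabilities on this slide are consistent with multiplying the three per-modality probabilities for each genre. That product rule is an assumption inferred from the numbers, not a rule stated in the slides. The function name is hypothetical.

```python
from math import prod


def joint_topic_probability(modality_probs: list[float]) -> float:
    """Multiply per-modality probabilities (assumed combination rule)."""
    return prod(modality_probs)


# Order: face/text, audio, closed-caption category (values from the slide).
p_fin = joint_topic_probability([0.0, 0.0, 0.5])
p_talk = joint_topic_probability([1.0, 1.0, 0.5])
```

With the slide's values this gives 0.0 for financial topics and 0.5 for talk-show topics, matching the final joint topic probabilities listed above.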

40
Classification Results: Content and Context
Integration
Precision 91.4, Recall 85.7
Precision 81.1, Recall 86.9
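
For reference, precision and recall have the standard definitions below. TP, FP, and FN (true positives, false positives, false negatives) are notation introduced here; the slides report only the resulting figures, not the underlying counts.

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
```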
41
Classification Results: Financial News with and
without Integration
With context/content integration
No context/content integration
42
Classification Results: Talk Show with and
without Integration
With context/content integration
No context/content integration
43
Conclusion
  • Novel multimedia framework
  • Representation
  • Content data tessellation: granularity
  • Content semantic structure: abstraction
  • Multimedia context
  • Multilayered content/context structure
  • Multimodal integration
  • Content and context
  • Probabilistic method
  • Bayesian networks
  • Hierarchical priors
  • Video Scout: beyond the TiVo paradigm

44
Achievements
  • Exhibitions
  • Philips Corporate Research Exhibition (CRE) 2001
  • ICME 2000 Exhibition
  • ACM 2000 Exhibition
  • Customer presentation 2000
  • 7 papers
  • 5 International conferences (presented)
  • 1 International conference (accepted)
  • 1 Journal paper (submitted)
  • 14 Patents (filed)

45
Acknowledgement
  • CIM team that collaborated in this work
  • Nevenka Dimitrova
  • Lalitha Agnihotri
  • Jennifer Louie
  • Thomas McGee
  • Radu Jasinschi
  • Dongge Li
  • Mei Shi
  • John Zimmerman