Title: A Framework for the Representation and Integration of Multimedia Content and Context Information
1A Framework for the Representation and Integration of Multimedia Content and Context Information
Radu S. Jasinschi, Philips Research
2Overview
- Introduction
- Related work
- Problem statement
- Proposed formalism
- Representation: content and context
- Multimodal integration
- Bayesian networks: content information
- Hierarchical priors: context information
- Application: Video Scout
- Experiments
- Conclusion
3Introduction
- Market facts
- Digital video consumption: 300 channels
- Personalized Video Recorders: next wave
- Web search engines: exponential growth in multimedia information
- Research needs
- Content-based video analysis and retrieval of multimedia information
- High-level video content information indexing
- Proposed framework
- Content and context information
- Structured representation
- Probabilistic integration
4Related Work
- Video databases
- QBIC (IBM)
- Informedia (CMU)
- Virage
- VideoQ (Columbia University)
- Probabilistic Methods
- M. Naphade (IBM)
- N. Vasconcelos (COMPAQ)
- Speech driven applications
- C. Neti (IBM)
- T. Chen (CMU)
5Problem Statement
- How do we segment, index, and store many hours of video from 300 TV channels?
- How do we represent and integrate multimodal information?
7Proposed Formalism
- Structured multimedia representation
- Content information
- Granularity
- Abstraction
- Context information
- Probabilistic method of multimodal information integration
- Bayesian networks
- Hierarchical priors
8Multimedia Content Information
- Multimedia content objects
- Three modalities
- Visual: shots, faces, trees, etc.
- Audio: speech, music, etc.
- Text: transcript, keywords, etc.
- Structured content representation
- Levels of granularity and abstraction
- Allows for the consistent representation and integration of multimedia content information
9Structured Content Representation
- Content granularity: levels of detail
- Content abstraction: semantic information
10Multimedia Context Information
- Context information
- Underlying structure, signature, or patterns
- Supports an interpretation, but it is not an interpretation itself
- Can be used to constrain the content information, reducing the number of possible interpretations
- Content versus context information
11Multimedia Context Taxonomy
12Semantic (Textual) Context
- Formalized in the linguistic domain
- Example: the proposition P (Holmes is a detective) has an ambiguous meaning
- Knowledge of its semantic context, in this case the stories of Sherlock Holmes, disambiguates the statement
- Formalization: ist(context-of(Sherlock Holmes stories), Holmes is a detective)
- General structure: ist(C, P), where C is the context and P is the proposition
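The ist(C, P) structure can be illustrated with a minimal Python sketch. It assumes a context is simply a named set of propositions that hold within it; the `CONTEXTS` table, its second entry, and the `ist` helper are hypothetical names for illustration, not part of the formalism presented here.

```python
# Hypothetical knowledge base: each context names the propositions true in it.
CONTEXTS = {
    "Sherlock Holmes stories": {"Holmes is a detective"},
    "Real-world history": set(),  # illustrative assumption: nothing asserted here
}


def ist(context: str, proposition: str) -> bool:
    """Return True if `proposition` is asserted as true in `context`."""
    return proposition in CONTEXTS.get(context, set())


p = "Holmes is a detective"
in_stories = ist("Sherlock Holmes stories", p)
in_history = ist("Real-world history", p)
print(in_stories, in_history)
```

The same proposition evaluates differently depending on the context it is paired with, which is the disambiguating role the slide attributes to semantic context.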
13Multimedia Context
14Multimedia Context
15Multimodal Integration
- Combine evidence: robustness
- Use all modalities: visual, audio, text
- Integrate content information
- Integrate content and context
16Probabilistic Framework
- Bayesian network
- Integrate content information at the same granularity level
- Intra-modality: same mode, different attributes
- Inter-modality: different mode and attributes
- Link different levels of granularity
- Hierarchical priors
- Integrate content and context
- Context used as prior information for content
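The idea of context acting as prior information for content can be sketched with Bayes' rule over a discrete set of topics. This is an illustration under stated assumptions, not the system's implementation: the `posterior` helper, the genre names, and every probability value below are hypothetical.

```python
def posterior(likelihood: dict, prior: dict) -> dict:
    """Bayes' rule over a discrete set: posterior is proportional to likelihood * prior."""
    unnorm = {k: likelihood[k] * prior[k] for k in likelihood}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}


# Content evidence that is ambiguous on its own (hypothetical values).
likelihood = {"fin": 0.5, "talk": 0.5}

# Context priors for two hypothetical program genres.
prior_news_genre = {"fin": 0.9, "talk": 0.1}
prior_talk_genre = {"fin": 0.2, "talk": 0.8}

post_news = posterior(likelihood, prior_news_genre)
post_talk = posterior(likelihood, prior_talk_genre)
print(post_news, post_talk)
```

With identical content evidence, the preferred topic follows the context prior, which is how context reduces the number of plausible interpretations.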
17Bayesian Network Example
18Bayesian Network Elements
- Directed acyclic graph
- Conditional probability
- Joint probability distribution
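To make the three elements concrete, here is a minimal Python sketch of a three-node network in which a topic node is the parent of an audio node and a face node. The graph shape, variable names, and all probability values are assumptions chosen for illustration; they are not taken from Video Scout.

```python
from itertools import product

# Directed acyclic graph: topic -> audio, topic -> face (hypothetical structure).
P_TOPIC = {"fin": 0.5, "talk": 0.5}                      # P(topic)
P_AUDIO = {"fin": {"speech": 0.9, "music": 0.1},         # P(audio | topic)
           "talk": {"speech": 0.6, "music": 0.4}}
P_FACE = {"fin": {"yes": 0.3, "no": 0.7},                # P(face | topic)
          "talk": {"yes": 0.8, "no": 0.2}}


def joint(topic: str, audio: str, face: str) -> float:
    """Joint probability as the product of each node's conditional given its parents."""
    return P_TOPIC[topic] * P_AUDIO[topic][audio] * P_FACE[topic][face]


def posterior_topic(audio: str, face: str) -> dict:
    """P(topic | audio, face) obtained by normalizing the joint over topics."""
    scores = {t: joint(t, audio, face) for t in P_TOPIC}
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


# The joint distribution sums to 1 over all assignments.
total_mass = sum(joint(t, a, f)
                 for t, a, f in product(P_TOPIC, ["speech", "music"], ["yes", "no"]))
post = posterior_topic("music", "yes")
print(round(total_mass, 6), post)
```

The factorization in `joint` is what the directed acyclic graph buys: each node needs only a conditional table given its parents, rather than one table over all variables.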
19Hierarchical Priors Example
20Hierarchical Priors Elements
- Chapman-Kolmogorov equation
- Nested priors
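For reference, the first element in its standard textbook form for a Markov process, followed by the way nested priors marginalize out a higher-level variable. The symbols $x_1, x_2, x_3$, $\theta$, and $\alpha$ are generic notation assumed here, not symbols taken from the slides.

```latex
% Chain transition densities through an intermediate state x_2
p(x_3 \mid x_1) = \int p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, dx_2

% Nested (hierarchical) priors: the prior on \theta is governed by a hyperparameter \alpha
p(\theta) = \int p(\theta \mid \alpha)\, p(\alpha)\, d\alpha
```

Both expressions have the same shape: an intermediate quantity is integrated out, which is what allows priors at one level to be expressed through priors at the level above.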
21Content and Context Layers
- Combine Bayesian networks and hierarchical priors
23Application Video Scout
- End-to-end system prototype of a personal video recorder
- Input
- Broadcast TV program video
- Electronic program guide (EPG)
- Personal profiles: program (PPP) and content (CPP)
- Output
- TV program segments, segmented and indexed by topic
24Video Scout Overview
25Content and Context Layers
26TV Programs
- Domain structure
- Commercials versus program parts
- Commercials: short (30 sec.), fast pace
- Program: long (> 5 min.), specific structure
- Multimodal (visual, audio, and transcript) information
- Structural correlation
- Stochastic nature of multimedia information
27TV Program Structure
29Experiments
- Input
- 9 TV programs (6 hrs.)
- Financial news and talk shows
- Features
- Visual: keyframes, visual text, faces
- Audio: noise (No), speech (Sp), music (Mu), SpNo, SpSp, and SpMu
- Transcript (closed captioning): 20 categories
- Output: TV program segments and their classification according to topics
30Algorithm for Segmentation and Indexing
- 1. Commercial segmentation
- 2. Program sub-segment (PSS) generation
- 3. Frame-based probability generation
- 4. PSS probabilities computation
- P_AUDIO_FIN, P_AUDIO_TALK, P_CC_FIN, P_CC_TALK
- P_FACETEXT_FIN, P_FACETEXT_TALK
- 5. Combine PSS with context probabilities
- 6. Compute joint probabilities
- P_FIN_TOPIC, P_TALK_TOPIC
31Example Letterman
32Example Letterman PSS 12
- Mid-level audio probabilities
- Mid-level visual features probabilities
33PSS Content Probabilities
- PSS 12, start_time 23614, end_time 24727 (frames)
- Visual
- P_V_FACE 0.91, P_V_TEXT 0.09
- Audio
- P_NOISE 0.11, P_SPEECH 0.74, P_MUSIC 0.00, P_SPEECH+NOISE 0.00, P_SPEECH+SPEECH 0.00, P_SPEECH+MUSIC 0.15
- Transcript (closed captions)
- P_CC_WEATHER 0.20, P_CC_INTERNATIONAL 0.20, P_CC_CRIME 0.00, P_CC_SPORT 0.20, P_CC_MOVIE 0.20, P_CC_FASHION 0.00,
- P_CC_TECH_STOCK 0.00, P_CC_MUSIC 0.00, P_CC_AUTOMOBILE 0.00, P_CC_WAR 0.00, P_CC_ECONOMY 0.20, P_CC_ENERGY 0.00,
- P_CC_STOCK 0.00, P_CC_VIOLENCE 0.00, P_CC_FINANCIAL 0.00, P_CC_NATIONAL 0.00, P_CC_BIOTECH 0.00, P_CC_DISASTER 0.00,
- P_CC_ART 0.00, P_CC_POLITICS 0.00
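The probabilities listed for PSS 12 can be held in a small record and checked for consistency. The sketch below is one possible layout, not the system's actual data structure: the `PSS` class, its field names, and the `is_normalized` helper are hypothetical, and only the nonzero transcript categories are stored, with the remaining categories implied to be 0.00.

```python
import math
from dataclasses import dataclass


@dataclass
class PSS:
    """Hypothetical record for one program sub-segment (PSS)."""
    index: int
    start_frame: int
    end_frame: int
    visual: dict
    audio: dict
    transcript: dict


def is_normalized(dist: dict, tol: float = 1e-6) -> bool:
    """Check that a discrete distribution sums to 1 within a tolerance."""
    return math.isclose(sum(dist.values()), 1.0, abs_tol=tol)


pss12 = PSS(
    index=12,
    start_frame=23614,
    end_frame=24727,
    visual={"face": 0.91, "text": 0.09},
    audio={"noise": 0.11, "speech": 0.74, "music": 0.00,
           "speech+noise": 0.00, "speech+speech": 0.00, "speech+music": 0.15},
    # Nonzero closed-caption categories only; the other 15 are 0.00.
    transcript={"weather": 0.20, "international": 0.20, "sport": 0.20,
                "movie": 0.20, "economy": 0.20},
)

length_in_frames = pss12.end_frame - pss12.start_frame
all_ok = all(is_normalized(d) for d in (pss12.visual, pss12.audio, pss12.transcript))
print(length_in_frames, all_ok)
```

Each modality sums to 1, so every group can be read as a distribution over its own categories.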
34Audio Genre Context Probabilities
35Visual Genre Context Probabilities
36Audio Genre Context Extraction
- 1. Select TV programs of a known genre
- 2. Segment commercials
- 3. Tessellate the program part into units, such as the PSS based on closed captions
- 4. Determine a probability for each PSS based on the vote/probability table
- 5. Sum up the votes for each vote/probability
- 6. Select the best vote/probability
- Result: context (probability) pattern
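Steps 4 to 6 can be sketched as a simple voting procedure. This is a guess at the mechanics under stated assumptions: each PSS casts one vote for its most probable audio class, votes are summed over the program, and the normalized tallies serve as the context pattern. The function name and the three toy PSS distributions are hypothetical, not data from the experiments.

```python
from collections import Counter


def audio_context_pattern(pss_audio_probs: list) -> tuple:
    """Return (best_class, normalized vote pattern) over a list of per-PSS distributions."""
    votes = Counter()
    for probs in pss_audio_probs:
        winner = max(probs, key=probs.get)   # step 4: one vote per PSS
        votes[winner] += 1                   # step 5: sum up the votes
    total = sum(votes.values())
    pattern = {cls: n / total for cls, n in votes.items()}
    best = votes.most_common(1)[0][0]        # step 6: select the best
    return best, pattern


# Toy input: three PSS units from a program of one known genre (made-up values).
toy = [
    {"speech": 0.7, "music": 0.2, "noise": 0.1},
    {"speech": 0.6, "music": 0.3, "noise": 0.1},
    {"speech": 0.2, "music": 0.7, "noise": 0.1},
]
best, pattern = audio_context_pattern(toy)
print(best, pattern)
```

Hard votes are only one option; summing the per-PSS probabilities directly would give a softer pattern, and the slides do not say which variant was used.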
37Vote/Probability Table and Results
38Vote/Probability Results News
39Combining Content Context
- Final multimodal joint probabilities
- P_FACETEXT_FIN 0.0, P_FACETEXT_TALK 1.0
- P_AUDIO_FIN 0.0, P_AUDIO_TALK 1.0
- P_CC_CAT_FIN 0.5, P_CC_CAT_TALK 0.5
- Final joint topic probabilities
- P_FIN-TOPICS 0.0, P_TALK-TOPICS 0.5
- Accumulated classification results for first 12 segments
- Number of FIN segments 2, number of TALK segments 10, commercials 0
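The final topic probabilities listed above are consistent with multiplying the three per-modality probabilities for each genre. The sketch below reproduces that arithmetic; treating the modalities as independent factors is an assumption inferred from the numbers, not a statement of the method used, and `joint_topic_probability` is a hypothetical helper name.

```python
import math

# Per-modality probabilities for the example segment, as listed in the slide.
modalities = {
    "fin":  {"facetext": 0.0, "audio": 0.0, "cc_cat": 0.5},
    "talk": {"facetext": 1.0, "audio": 1.0, "cc_cat": 0.5},
}


def joint_topic_probability(per_modality: dict) -> float:
    """Multiply per-modality probabilities (assumes independent modalities)."""
    return math.prod(per_modality.values())


p_fin = joint_topic_probability(modalities["fin"])
p_talk = joint_topic_probability(modalities["talk"])
label = "talk" if p_talk > p_fin else "fin"
print(p_fin, p_talk, label)  # 0.0 0.5 talk
```

The products match P_FIN-TOPICS 0.0 and P_TALK-TOPICS 0.5, so the segment would be labeled as a talk-show segment under this reading.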
40Classification Results: Content and Context Integration
Precision 91.4, Recall 85.7
Precision 81.1, Recall 86.9
41Classification Results: Financial News with and without Integration
With context/content integration
No context/content integration
42Classification Results: Talk Show with and without Integration
With context/content integration
No context/content integration
43Conclusion
- Novel multimedia framework
- Representation
- Content data tessellation: granularity
- Content semantic structure: abstraction
- Multimedia context
- Multilayered content/context structure
- Multimodal integration
- Content and context
- Probabilistic method
- Bayesian networks
- Hierarchical priors
- Video Scout: beyond the TiVo paradigm
44Achievements
- Exhibitions
- Philips Corporate Research Exhibition (CRE) 2001
- ICME 2000 Exhibition
- ACM 2000 Exhibition
- Customer presentation 2000
- 7 papers
- 5 International conferences (presented)
- 1 International conference (accepted)
- 1 Journal paper (submitted)
- 14 Patents (filed)
45Acknowledgement
- CIM team that collaborated in this work
- Nevenka Dimitrova
- Lalitha Agnihotri
- Jennifer Louie
- Thomas McGee
- Radu Jasinschi
- Dongge Li
- Mei Shi
- John Zimmerman