Title: Prof. Marc Davis
1. Towards Computational Media Metadata for Media Automation and Reuse
- Prof. Marc Davis
- University of California at Berkeley
- School of Information Management and Systems
- www.sims.berkeley.edu/marc
2. Marc Davis Background
- New Assistant Professor at SIMS (School of Information Management and Systems)
- Background
  - 1980–1984: B.A. from Wesleyan University in the College of Letters
  - 1984–1987: M.A. from the University of Konstanz in Literary Theory and Philosophy
  - 1990–1995: Ph.D. from MIT Media Laboratory in Media Arts and Sciences
  - 1993–1998: Member of the Research Staff and Project Coordinator at Interval Research Corporation
  - 1999–2002: Chairman and CTO of Amova
3. Marc Davis Research
- Creating technology and applications that will enable daily media consumers to become daily media producers
- Research and teaching in the theory, design, and development of digital media systems for creating and using media metadata to automate media production and reuse
4. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
6. Global Media Network
- Digital video produced anywhere by anyone, accessible to anyone anywhere
- Today's video users become tomorrow's video producers
- Not 500 channels, but 500,000,000 video Web sites
7. What is the Problem?
- Today people cannot easily create, find, edit, share, and reuse media
- Computers don't understand video content
  - Video is opaque and data rich
  - We lack structured representations
- Without content representation (metadata), manipulating digital video will remain like word-processing with bitmaps
8. Technology Goals
- Goals
- Increase access to media content
- Decrease effort in media handling and reuse
- Improve usefulness of media content
- Technology
- Create metadata about media content
- Use metadata to automate media production and
reuse
9. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
10. Representing Video
- Streams vs. Clips
- Video syntax and semantics
- Ontological issues in video representation
11. Video is Temporal
12. Streams vs. Clips
13. Stream-Based Representation
- Makes annotation pay off
  - The richer the annotation, the more numerous the possible segmentations of the video stream
- Clips
  - Change from being fixed segmentations of the video stream to being the results of retrieval queries based on annotations of the video stream
- Annotations
  - Create representations which make clips, not representations of clips
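The idea that clips are query results rather than stored segmentations can be sketched in a few lines of code. This is an illustrative data-structure sketch, not the Media Streams implementation; the `Annotation` class and `find_clips` function are hypothetical names.

```python
# Sketch of a stream-based representation (hypothetical names):
# annotations describe time intervals of a single video stream, and a
# "clip" is just the span returned by a query over those annotations.

from dataclasses import dataclass

@dataclass
class Annotation:
    start: float      # seconds into the stream
    end: float
    descriptor: str   # e.g. "dog", "biting"

def find_clips(annotations, descriptor):
    """Return (start, end) spans whose annotation matches the query.

    Clips are not fixed segmentations stored with the video; they are
    computed on demand from annotations of the one stream.
    """
    return [(a.start, a.end) for a in annotations if a.descriptor == descriptor]

stream = [
    Annotation(0.0, 4.0, "dog"),
    Annotation(2.0, 3.5, "biting"),
    Annotation(5.0, 9.0, "Steve"),
]

print(find_clips(stream, "biting"))   # [(2.0, 3.5)]
```

Richer annotation of the same stream yields more possible clips, without ever re-segmenting the video itself.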
14. Video Syntax and Semantics
- The Kuleshov Effect
- Video has a dual semantics
- Sequence-independent invariant semantics of shots
- Sequence-dependent variable semantics of shots
15. Ontological Issues for Video
- Video plays with rules for identity and continuity
  - Space
  - Time
  - Character
  - Action
16. Space and Time: Actual vs. Inferable
- Actual Recorded Space and Time
- GPS
- Studio space and time
- Inferable Space and Time
- Establishing shots
- Cues and clues
17. Character and Continuity
- Identity of character is constructed through
- Continuity of actor
- Continuity of role
- Alternative continuities
- Continuity of actor only
- Continuity of role only
18. Representing Action
- Physically-based description for sequence-independent action semantics
  - Abstract vs. conventionalized descriptions
  - Temporally and spatially decomposable actions and subactions
- Issues in describing sequence-dependent action semantics
  - Mental states (emotions vs. expressions)
  - Cultural differences (e.g., bowing vs. greeting)
19. Cinematic Actions
- Cinematic actions support the basic narrative structure of cinema
  - Reactions/Proactions
    - Nodding, screaming, laughing, etc.
  - Focus of Attention
    - Gazing, head-turning, pointing, etc.
  - Locomotion
    - Walking, running, etc.
- Cinematic actions can occur
  - Within the frame/shot boundary
  - Across the frame boundary
  - Across shot boundaries
20. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
21. The Search for Solutions
- Current approaches to creating metadata don't work
  - Signal-based analysis
  - Keywords
  - Natural language
- Need a standardized metadata framework
  - Designed for video and rich media data
  - Human- and machine-readable and writable
  - Standardized and scalable
  - Integrated into media capture, archiving, editing, distribution, and reuse
22. Signal-Based Parsing
- Practical problem
  - Parsing unstructured, unknown video is very, very hard
- Theoretical problem
  - Mismatch between percepts and concepts
23. Perceptual/Conceptual Issue
Similar Percepts / Dissimilar Concepts
Clown Nose
Red Sun
24. Perceptual/Conceptual Issue
Dissimilar Percepts / Similar Concepts
John Dillinger's Car
Timothy McVeigh's Car
25. Signal-Based Parsing
- Effective and useful automatic parsing
- Video
- Scene break detection
- Camera motion analysis
- Low level visual similarity
- Feature tracking
- Audio
- Pause detection
- Audio pattern matching
- Simple speech recognition
- Approaches to automated parsing
  - At the point of capture, integrate the recording device, the environment, and agents in the environment into an interactive system
  - After capture, use human-in-the-loop algorithms to leverage human and machine intelligence
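Scene-break detection, the first item in the list of effective parsers, is commonly done by thresholding the difference between successive frame histograms. The sketch below is a minimal, illustrative version of that standard technique (not the specific algorithm used in this work); frames are plain lists of pixel intensities standing in for decoded video.

```python
# Minimal scene-break detector: a cut is declared wherever the
# histogram of one frame differs sharply from that of the previous one.

def histogram(frame, bins=4, max_val=256):
    """Bin pixel intensities into a coarse histogram."""
    h = [0] * bins
    for px in frame:
        h[px * bins // max_val] += 1
    return h

def scene_breaks(frames, threshold=0.5):
    """Return indices i where a cut occurs between frame i-1 and i."""
    breaks = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        # Normalized histogram distance in [0, 1]
        diff = sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * len(frames[i]))
        if diff > threshold:
            breaks.append(i)
    return breaks

dark = [10] * 100      # frames of a dark scene
bright = [240] * 100   # frames of a bright scene
print(scene_breaks([dark, dark, bright, bright]))  # [2]
```

This kind of low-level parser is reliable precisely because it stays on the percept side; it finds cuts without claiming to know what the scenes are about.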
26. Keywords vs. Semantic Descriptors
dog, biting, Steve
28. Why Keywords Don't Work
- Are not a semantic representation
- Do not describe relations between descriptors
- Do not describe temporal structure
- Do not converge
- Do not scale
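The contrast between a keyword bag and a semantic descriptor can be made concrete as data. In the sketch below (illustrative field names, not Media Streams syntax), the keyword set for "dog biting Steve" is indistinguishable from "Steve biting dog", while a relational, temporal descriptor preserves who does what to whom, and when.

```python
# Keywords: an unordered bag of terms with no relations and no time.
keywords = {"dog", "biting", "Steve"}

# A semantic descriptor: roles relate the descriptors to one another,
# and an interval anchors them in the video stream's temporal structure.
descriptor = {
    "agent": "dog",
    "action": "biting",
    "patient": "Steve",
    "interval": (2.0, 3.5),   # seconds into the stream
}

# The keyword set is identical for both readings of the event...
assert keywords == {"Steve", "biting", "dog"}

# ...while the relation keeps the roles distinct.
print(descriptor["agent"], descriptor["action"], descriptor["patient"])
```

Because the descriptor makes relations and temporal structure explicit, two annotators describing the same event converge on the same structure rather than on divergent keyword lists.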
29. Natural Language vs. Visual Language
Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.
31. Notation for Time-Based Media: Music
32. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
33. New Solutions for Creating Metadata
After Capture
During Capture
35. After Capture: Media Streams
36. Media Streams Features
- Key features
- Stream-based representation (better segmentation)
- Semantic indexing (what things are similar to)
- Relational indexing (who is doing what to whom)
- Temporal indexing (when things happen)
- Iconic interface (designed visual language)
- Universal annotation (standardized markup schema)
- Key benefits
- More accurate annotation and retrieval
- Global usability and standardization
- Reuse of rich media according to content and
structure
37. New Solutions for Creating Metadata
After Capture
During Capture
38. Moore's Law for Cameras
[Chart: camera prices falling from 2000 to 2002; Kodak DC40 and Kodak DX4900 at the $400 price point, SiPix StyleCam Blink and Nintendo GameBoy Camera at the $40 price point]
39. From Manual to Automated Production
- 1990–2000: manual production process (manual generic tools, point solutions, post-production focus)
  - Results
    - Software tools are difficult to use
    - Difficult to store, retrieve, and edit media
    - User is focused on production
  - Net result: lose the advantages of digital media
- Automated production
  - Results
    - Easy-to-use automated solutions
    - Reusable, personalizable media assets
    - Experience- and activity-driven
40. Creating Metadata During Capture
- Current capture paradigm: multiple captures to get 1 good capture
- New capture paradigm: 1 good capture drives multiple uses
41. Active Capture
42. Active Capture
- Active engagement and communication among the capture device, agent(s), and the environment
- Re-envision capture as a control system with feedback
- Use multiple data sources and communication to simplify the capture scenario
- Use HCI to support human-in-the-loop algorithms for computer vision and audition
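Re-envisioning capture as a control system with feedback can be sketched as a simple loop: analyze the incoming signal, issue a cue to the person being filmed, and re-check until a good capture is obtained. Everything below is illustrative; the `analyze` and `cue` stand-ins are hypothetical placeholders for real-time audio-video analysis and computer-controlled cues.

```python
# Capture as a feedback control loop (illustrative sketch).

def analyze(frame):
    """Stand-in for real-time audio-video analysis of the capture."""
    return frame.get("face_centered", False)

def cue(message):
    """Stand-in for a computer-controlled audio/visual cue to the subject."""
    print("DIRECTOR:", message)

def active_capture(frames):
    """Loop until the analysis judges the capture good, cueing as needed."""
    for frame in frames:
        if analyze(frame):
            return frame                      # one good capture, many uses
        cue("Please look at the camera.")     # feedback into the loop
    return None                               # capture failed; handle error

takes = [{"face_centered": False}, {"face_centered": True}]
good = active_capture(takes)
print("good capture:", good)
```

The point of the loop is that direction happens at capture time: the system does not passively record takes for later sorting, it steers the subject toward a usable capture.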
43. Active Capture
- Automated direction
  - No need for a director
  - Real-time audio-video analysis in an interactive control loop
  - Computer-controlled interactive audio and visual cues
- Automated cinematography
  - No need for a camera and sound crew
  - Real-time audio-video analysis in an interactive control loop
  - Automated post-production reframing and relighting of video
44. Active Capture: Good Capture
45. Active Capture: Error Handling
46. Jim Lanahan in an MCI Ad
47. Jim Lanahan in a T2 Trailer
48. Jim Lanahan in an @Home Banner
49. Evolution of Media Production
- Customized production
- Skilled creation of one media product
- Mass production
- Automatic replication of one media product
- Mass customization
- Skilled creation of adaptive media templates
- Automatic production of customized media
50. Editing Paradigm Has Not Changed
51. Central Idea: Movies as Programs
- Movies change from being static data to programs
- Shots are inputs to a program that computes new media based on content representation and functional dependency (US Patents 6,243,087 and 5,969,716)
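The "movies as programs" idea can be sketched as a function from annotated shots to an edit list: the movie is computed from content representation rather than stored as a fixed sequence. The shot dictionaries and the narrative `pattern` below are hypothetical illustrations, not the patented method.

```python
# A movie as a program: shots go in, an edit list comes out,
# computed from the shots' content descriptors.

def compute_movie(shots, pattern):
    """shots: list of {"id", "descriptor"} dicts.
    pattern: the descriptors the output should follow, in order.
    Returns the ids of the chosen shots as an edit list."""
    edit_list = []
    for wanted in pattern:
        for shot in shots:
            if shot["descriptor"] == wanted:
                edit_list.append(shot["id"])
                break
    return edit_list

shots = [
    {"id": "s1", "descriptor": "establishing"},
    {"id": "s2", "descriptor": "reaction"},
    {"id": "s3", "descriptor": "close-up"},
]
print(compute_movie(shots, ["establishing", "close-up", "reaction"]))
# ['s1', 's3', 's2']
```

Swapping in different shots with the same descriptors yields a different movie from the same program, which is exactly what makes the assets reusable.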
52. Automatic Video and Audio Editing
Automatically edit the output movie based on
content representation of dialogue and sound
Example of editing based on dialogue
Example of synchronizing video to music
53. Automatic Audio-Video Synchronization
Raw Celery Chopping Video
U2 "Numb" Audio
Unsynched "Numb" Celery Music Video
Synched "Numb" Celery Music Video
54. Adaptive Media
- Adaptive Media Templates
  - Co-adapt template media assets and input media assets
  - Based on the content of the media assets and a set of functions and parameters
  - To compute unique customized and personalized media results
- Adaptive Media Functions
  - Take in media and metadata → produce new media
55. Adaptive Media Design Space
[Diagram: design space with axes for Structure and Content, each ranging from author-generated to not author-generated; examples include traditional movie making, compilation movie making, and historical documentary movie making]
56. Adaptive Media Design Space
57. The Blank Page Approach
58. Captain Zoom IV MadLib
59. Constructing With Lego Blocks
60. Video MadLibs and Video Lego
- Video MadLibs
  - Adaptive media template with open slots
  - Structure is fixed
  - Content can be varied
- Video Lego
  - Reusable media components that know how to fit together
  - Structure is constrained
  - Content can be varied
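A Video MadLib, a template whose structure is fixed while the content of its open slots varies, can be sketched directly as data. The slot names and clip filenames below are invented for illustration; they are not assets from the actual system.

```python
# A Video MadLib as data: fixed structure, open slots marked <...>.

TEMPLATE = ["intro_logo", "<user_greeting>", "product_shot", "<user_reaction>"]

def fill_madlib(template, user_clips):
    """Replace each open slot with the user's clip for that slot."""
    out = []
    for slot in template:
        if slot.startswith("<") and slot.endswith(">"):
            out.append(user_clips[slot[1:-1]])   # content can be varied
        else:
            out.append(slot)                     # structure is fixed
    return out

clips = {"user_greeting": "jim_waves.mov", "user_reaction": "jim_screams.mov"}
print(fill_madlib(TEMPLATE, clips))
# ['intro_logo', 'jim_waves.mov', 'product_shot', 'jim_screams.mov']
```

Video Lego differs in that the components themselves carry the constraints on how they may combine, rather than a single fixed slot order; the MadLib is the simpler, fully fixed case.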
61. Automated Media Production Process
Reusable Online Asset Database
62. Technology Summary
- Active Capture automates direction and cinematography to create reusable media assets
- Adaptive Media uses adaptive media templates and automatic editing functions to eliminate the need for editing on the part of end users
- Media Streams provides a framework for creating metadata to make media assets searchable and reusable
- Together, these technologies will automate, personalize, and speed up media production, distribution, and reuse
63. Patents
- Patents Issued
  - Time-Based Media Processing System. US Patent 6,243,087. Continuation of US Patent 5,969,716. Filed September 28, 1999. Issued June 5, 2001.
- Patents Pending
  - Automatic Personalized Media Creation System. Filed January 3, 2000.
  - Automatic User Performance Capture System. Filed January 3, 2000.
  - Automatic Media Editing System. Filed January 3, 2000.
  - Method for Creating Reusable Automatic Personalized Media. Filed January 3, 2000.
  - Automatic Media and Advertising System. Filed January 3, 2000.
  - Automatic Electronic Advertising Viewership Tracking System. Filed January 3, 2000.
  - Automatic Personalized Media Identification System. Filed January 3, 2000.
  - Secure Uniform Resource Locator System. Filed January 3, 2000.
64. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
65. Methodological Considerations
- Techne-centered methodology
  - Construction of theories informed by constructing artifacts
  - Construction of artifacts informed by (de)constructing theories
  - Practitioners: Kuleshov, Eisenstein, Papert, the Narrative Intelligence Reading Group
- Inherently interdisciplinary activity
  - Information science, computer science, film theory and production, media studies, semiotics, user interface and interaction design and testing
66. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
67. Computational Media
- More intimately integrate two great 20th-century inventions
68. Technical Research Challenges
- Develop an end-to-end metadata system for automated media capture, processing, management, and reuse
- Creating metadata
  - Represent action sequences and higher-level narrative structures
  - Integrate legacy metadata (keywords, natural language)
  - Gather more and better metadata at the point of capture (develop metadata cameras)
  - Develop human-in-the-loop indexing algorithms and interfaces
- Using metadata
  - Develop media components (MediaLego)
  - Integrate linguistic and other query interfaces
69. Non-Technical Challenges
- Standardization of media metadata (MPEG-7)
- Broadband infrastructure and deployment
- Intellectual property and economic models for
sharing and reuse of media assets
70. Garage Cinema Research Projects
- Media Metadata
  - Moving Media Streams to MPEG-7 (XML)
  - Creating a Java-based Web annotation and retrieval front-end
  - Integration throughout the production cycle
- Active Capture
  - Developing more Active Capture routines
- Adaptive Media
  - Developing higher-order functions
- Hello World Application
  - E-Berkeley Photo ID
71. For More Info
- Marc Davis Email
  - marc@sims.berkeley.edu
- Marc Davis Web Site
  - www.sims.berkeley.edu/marc
- Spring 2003 course on Multimedia Information at UC Berkeley SIMS