Title: Prof. Marc Davis
1. Towards Computational Media Metadata for Media Automation and Reuse
- Prof. Marc Davis
- University of California at Berkeley
- School of Information Management and Systems
- www.sims.berkeley.edu/marc
2. Marc Davis Background
- New Assistant Professor at SIMS (School of Information Management and Systems)
- Background
  - 1980–1984: B.A. from Wesleyan University in the College of Letters
  - 1984–1987: M.A. from the University of Konstanz in Literary Theory and Philosophy
  - 1990–1995: Ph.D. from MIT Media Laboratory in Media Arts and Sciences
  - 1993–1998: Member of the Research Staff and Project Coordinator at Interval Research Corporation
  - 1999–2002: Chairman and CTO of Amova
3. Marc Davis Research
- Creating technology and applications that will enable daily media consumers to become daily media producers
- Research and teaching in the theory, design, and development of digital media systems for creating and using media metadata to automate media production and reuse
4. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
6. Global Media Network
- Digital video produced anywhere by anyone, accessible to anyone anywhere
- Today's video users become tomorrow's video producers
- Not 500 channels, but 500,000,000 video Web sites
7. What is the Problem?
- Today people cannot easily create, find, edit, share, and reuse media
- Computers don't understand video content
  - Video is opaque and data rich
  - We lack structured representations
- Without content representation (metadata), manipulating digital video will remain like word-processing with bitmaps
8. Technology Goals
- Goals
- Increase access to media content
- Decrease effort in media handling and reuse
- Improve usefulness of media content
- Technology
- Create metadata about media content
- Use metadata to automate media production and
reuse
9. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
10. Representing Video
- Streams vs. Clips
- Video syntax and semantics
- Ontological issues in video representation
11. Video is Temporal
12. Streams vs. Clips
13. Stream-Based Representation
- Makes annotation pay off
  - The richer the annotation, the more numerous the possible segmentations of the video stream
- Clips
  - Change from being fixed segmentations of the video stream to being the results of retrieval queries based on annotations of the video stream
- Annotations
  - Create representations which make clips, not representations of clips
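The idea that clips are query results rather than stored segmentations can be sketched in a few lines of code. This is an illustrative data-structure sketch, not the Media Streams implementation; the `Annotation` class and `find_clips` function are hypothetical names.

```python
# Sketch of a stream-based representation (hypothetical names):
# annotations describe time intervals of a single video stream, and a
# "clip" is just the span returned by a query over those annotations.

from dataclasses import dataclass

@dataclass
class Annotation:
    start: float      # seconds into the stream
    end: float
    descriptor: str   # e.g. "dog", "biting"

def find_clips(annotations, descriptor):
    """Return (start, end) spans whose annotation matches the query.

    Clips are not fixed segmentations stored with the video; they are
    computed on demand from annotations of the one stream.
    """
    return [(a.start, a.end) for a in annotations if a.descriptor == descriptor]

stream = [
    Annotation(0.0, 4.0, "dog"),
    Annotation(2.0, 3.5, "biting"),
    Annotation(5.0, 9.0, "Steve"),
]

print(find_clips(stream, "biting"))   # [(2.0, 3.5)]
```

Richer annotation of the same stream yields more possible clips, without ever re-segmenting the video itself.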
14. Video Syntax and Semantics
- The Kuleshov Effect
- Video has a dual semantics
- Sequence-independent invariant semantics of shots
- Sequence-dependent variable semantics of shots
15. Ontological Issues for Video
- Video plays with rules for identity and continuity
  - Space
  - Time
  - Character
  - Action
16. Space and Time: Actual vs. Inferable
- Actual Recorded Space and Time
- GPS
- Studio space and time
- Inferable Space and Time
- Establishing shots
- Cues and clues
17. Character and Continuity
- Identity of character is constructed through
- Continuity of actor
- Continuity of role
- Alternative continuities
- Continuity of actor only
- Continuity of role only
18. Representing Action
- Physically-based description for sequence-independent action semantics
  - Abstract vs. conventionalized descriptions
  - Temporally and spatially decomposable actions and subactions
- Issues in describing sequence-dependent action semantics
  - Mental states (emotions vs. expressions)
  - Cultural differences (e.g., bowing vs. greeting)
19. Cinematic Actions
- Cinematic actions support the basic narrative structure of cinema
  - Reactions/Proactions
    - Nodding, screaming, laughing, etc.
  - Focus of Attention
    - Gazing, head-turning, pointing, etc.
  - Locomotion
    - Walking, running, etc.
- Cinematic actions can occur
  - Within the frame/shot boundary
  - Across the frame boundary
  - Across shot boundaries
20. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
21. The Search for Solutions
- Current approaches to creating metadata don't work
  - Signal-based analysis
  - Keywords
  - Natural language
- Need a standardized metadata framework
  - Designed for video and rich media data
  - Human- and machine-readable and writable
  - Standardized and scalable
  - Integrated into media capture, archiving, editing, distribution, and reuse
22. Signal-Based Parsing
- Practical problem
  - Parsing unstructured, unknown video is very, very hard
- Theoretical problem
  - Mismatch between percepts and concepts
23. Perceptual/Conceptual Issue
Similar Percepts / Dissimilar Concepts
Clown Nose
Red Sun
24. Perceptual/Conceptual Issue
Dissimilar Percepts / Similar Concepts
John Dillinger's Car
Timothy McVeigh's Car
25. Signal-Based Parsing
- Effective and useful automatic parsing
- Video
- Scene break detection
- Camera motion analysis
- Low level visual similarity
- Feature tracking
- Audio
- Pause detection
- Audio pattern matching
- Simple speech recognition
- Approaches to automated parsing
  - At the point of capture, integrate the recording device, the environment, and agents in the environment into an interactive system
  - After capture, use human-in-the-loop algorithms to leverage human and machine intelligence
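Scene-break detection, the first item in the list of effective parsers, is commonly done by thresholding the difference between successive frame histograms. The sketch below is a minimal, illustrative version of that standard technique (not the specific algorithm used in this work); frames are plain lists of pixel intensities standing in for decoded video.

```python
# Minimal scene-break detector: a cut is declared wherever the
# histogram of one frame differs sharply from that of the previous one.

def histogram(frame, bins=4, max_val=256):
    """Bin pixel intensities into a coarse histogram."""
    h = [0] * bins
    for px in frame:
        h[px * bins // max_val] += 1
    return h

def scene_breaks(frames, threshold=0.5):
    """Return indices i where a cut occurs between frame i-1 and i."""
    breaks = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        # Normalized histogram distance in [0, 1]
        diff = sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * len(frames[i]))
        if diff > threshold:
            breaks.append(i)
    return breaks

dark = [10] * 100      # frames of a dark scene
bright = [240] * 100   # frames of a bright scene
print(scene_breaks([dark, dark, bright, bright]))  # [2]
```

This kind of low-level parser is reliable precisely because it stays on the percept side; it finds cuts without claiming to know what the scenes are about.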
26. Keywords vs. Semantic Descriptors
dog, biting, Steve
28. Why Keywords Don't Work
- Are not a semantic representation
- Do not describe relations between descriptors
- Do not describe temporal structure
- Do not converge
- Do not scale
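The contrast between a keyword bag and a semantic descriptor can be made concrete as data. In the sketch below (illustrative field names, not Media Streams syntax), the keyword set for "dog biting Steve" is indistinguishable from "Steve biting dog", while a relational, temporal descriptor preserves who does what to whom, and when.

```python
# Keywords: an unordered bag of terms with no relations and no time.
keywords = {"dog", "biting", "Steve"}

# A semantic descriptor: roles relate the descriptors to one another,
# and an interval anchors them in the video stream's temporal structure.
descriptor = {
    "agent": "dog",
    "action": "biting",
    "patient": "Steve",
    "interval": (2.0, 3.5),   # seconds into the stream
}

# The keyword set is identical for both readings of the event...
assert keywords == {"Steve", "biting", "dog"}

# ...while the relation keeps the roles distinct.
print(descriptor["agent"], descriptor["action"], descriptor["patient"])
```

Because the descriptor makes relations and temporal structure explicit, two annotators describing the same event converge on the same structure rather than on divergent keyword lists.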
29. Natural Language vs. Visual Language
Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.
31. Notation for Time-Based Media: Music
32. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
33. New Solutions for Creating Metadata
After Capture
During Capture
35. After Capture: Media Streams
36. Media Streams Features
- Key features
- Stream-based representation (better segmentation)
- Semantic indexing (what things are similar to)
- Relational indexing (who is doing what to whom)
- Temporal indexing (when things happen)
- Iconic interface (designed visual language)
- Universal annotation (standardized markup schema)
- Key benefits
- More accurate annotation and retrieval
- Global usability and standardization
- Reuse of rich media according to content and
structure
37. New Solutions for Creating Metadata
After Capture
During Capture
38. Moore's Law for Cameras
[Chart: camera prices falling from 2000 to 2002; Kodak DC40 and Kodak DX4900 at the $400 price point, SiPix StyleCam Blink and Nintendo GameBoy Camera at the $40 price point]
39. From Manual to Automated Production
- 1990–2000: manual production process (manual generic tools, point solutions, post-production focus)
  - Results
    - Software tools are difficult to use
    - Difficult to store, retrieve, and edit media
    - User is focused on production
  - Net result: lose the advantages of digital media
- Automated production
  - Results
    - Easy-to-use automated solutions
    - Reusable, personalizable media assets
    - Experience- and activity-driven
40. Creating Metadata During Capture
- Current capture paradigm: multiple captures to get 1 good capture
- New capture paradigm: 1 good capture drives multiple uses
41. Active Capture
42. Active Capture
- Active engagement and communication among the capture device, agent(s), and the environment
- Re-envision capture as a control system with feedback
- Use multiple data sources and communication to simplify the capture scenario
- Use HCI to support human-in-the-loop algorithms for computer vision and audition
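Re-envisioning capture as a control system with feedback can be sketched as a simple loop: analyze the incoming signal, issue a cue to the person being filmed, and re-check until a good capture is obtained. Everything below is illustrative; the `analyze` and `cue` stand-ins are hypothetical placeholders for real-time audio-video analysis and computer-controlled cues.

```python
# Capture as a feedback control loop (illustrative sketch).

def analyze(frame):
    """Stand-in for real-time audio-video analysis of the capture."""
    return frame.get("face_centered", False)

def cue(message):
    """Stand-in for a computer-controlled audio/visual cue to the subject."""
    print("DIRECTOR:", message)

def active_capture(frames):
    """Loop until the analysis judges the capture good, cueing as needed."""
    for frame in frames:
        if analyze(frame):
            return frame                      # one good capture, many uses
        cue("Please look at the camera.")     # feedback into the loop
    return None                               # capture failed; handle error

takes = [{"face_centered": False}, {"face_centered": True}]
good = active_capture(takes)
print("good capture:", good)
```

The point of the loop is that direction happens at capture time: the system does not passively record takes for later sorting, it steers the subject toward a usable capture.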
43. Active Capture
- Automated direction
  - No need for a director
  - Real-time audio-video analysis in an interactive control loop
  - Computer-controlled interactive audio and visual cues
- Automated cinematography
  - No need for a camera and sound crew
  - Real-time audio-video analysis in an interactive control loop
  - Automated post-production reframing and relighting of video
44. Active Capture: Good Capture
45. Active Capture: Error Handling
46. Jim Lanahan in an MCI Ad
47. Jim Lanahan in a T2 Trailer
48. Jim Lanahan in an @Home Banner
49. Evolution of Media Production
- Customized production
- Skilled creation of one media product
- Mass production
- Automatic replication of one media product
- Mass customization
- Skilled creation of adaptive media templates
- Automatic production of customized media
50. Editing Paradigm Has Not Changed
51. Central Idea: Movies as Programs
- Movies change from being static data to programs
- Shots are inputs to a program that computes new media based on content representation and functional dependency (US Patents 6,243,087 and 5,969,716)
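The "movies as programs" idea can be sketched as a function from annotated shots to an edit list: the movie is computed from content representation rather than stored as a fixed sequence. The shot dictionaries and the narrative `pattern` below are hypothetical illustrations, not the patented method.

```python
# A movie as a program: shots go in, an edit list comes out,
# computed from the shots' content descriptors.

def compute_movie(shots, pattern):
    """shots: list of {"id", "descriptor"} dicts.
    pattern: the descriptors the output should follow, in order.
    Returns the ids of the chosen shots as an edit list."""
    edit_list = []
    for wanted in pattern:
        for shot in shots:
            if shot["descriptor"] == wanted:
                edit_list.append(shot["id"])
                break
    return edit_list

shots = [
    {"id": "s1", "descriptor": "establishing"},
    {"id": "s2", "descriptor": "reaction"},
    {"id": "s3", "descriptor": "close-up"},
]
print(compute_movie(shots, ["establishing", "close-up", "reaction"]))
# ['s1', 's3', 's2']
```

Swapping in different shots with the same descriptors yields a different movie from the same program, which is exactly what makes the assets reusable.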
52. Automatic Video and Audio Editing
Automatically edit the output movie based on
content representation of dialogue and sound
Example of editing based on dialogue
Example of synchronizing video to music
53. Automatic Audio-Video Synchronization
Raw Celery Chopping Video
U2 "Numb" Audio
Unsynched "Numb" Celery Music Video
Synched "Numb" Celery Music Video
54. Adaptive Media
- Adaptive Media Templates
  - Co-adapt template media assets and input media assets
  - Based on the content of the media assets and a set of functions and parameters
  - To compute unique customized and personalized media results
- Adaptive Media Functions
  - Take in media and metadata → produce new media
55. Adaptive Media Design Space
[Diagram: design space with axes for Structure and Content, each ranging from author-generated to not author-generated; examples include traditional movie making, compilation movie making, and historical documentary movie making]
56. Adaptive Media Design Space
57. The Blank Page Approach
58. Captain Zoom IV MadLib
59. Constructing With Lego Blocks
60. Video MadLibs and Video Lego
- Video MadLibs
  - Adaptive media template with open slots
  - Structure is fixed
  - Content can be varied
- Video Lego
  - Reusable media components that know how to fit together
  - Structure is constrained
  - Content can be varied
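A Video MadLib, a template whose structure is fixed while the content of its open slots varies, can be sketched directly as data. The slot names and clip filenames below are invented for illustration; they are not assets from the actual system.

```python
# A Video MadLib as data: fixed structure, open slots marked <...>.

TEMPLATE = ["intro_logo", "<user_greeting>", "product_shot", "<user_reaction>"]

def fill_madlib(template, user_clips):
    """Replace each open slot with the user's clip for that slot."""
    out = []
    for slot in template:
        if slot.startswith("<") and slot.endswith(">"):
            out.append(user_clips[slot[1:-1]])   # content can be varied
        else:
            out.append(slot)                     # structure is fixed
    return out

clips = {"user_greeting": "jim_waves.mov", "user_reaction": "jim_screams.mov"}
print(fill_madlib(TEMPLATE, clips))
# ['intro_logo', 'jim_waves.mov', 'product_shot', 'jim_screams.mov']
```

Video Lego differs in that the components themselves carry the constraints on how they may combine, rather than a single fixed slot order; the MadLib is the simpler, fully fixed case.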
61. Automated Media Production Process
Reusable Online Asset Database
62. Technology Summary
- Active Capture automates direction and cinematography to create reusable media assets
- Adaptive Media uses adaptive media templates and automatic editing functions to eliminate the need for editing on the part of end users
- Media Streams provides a framework for creating metadata to make media assets searchable and reusable
- Together, these technologies will automate, personalize, and speed up media production, distribution, and reuse
63. Patents
- Patents Issued
  - Time-Based Media Processing System. US Patent 6,243,087. Continuation of US Patent 5,969,716. Filed September 28, 1999. Issued June 5, 2001.
- Patents Pending
  - Automatic Personalized Media Creation System. Filed January 3, 2000.
  - Automatic User Performance Capture System. Filed January 3, 2000.
  - Automatic Media Editing System. Filed January 3, 2000.
  - Method for Creating Reusable Automatic Personalized Media. Filed January 3, 2000.
  - Automatic Media and Advertising System. Filed January 3, 2000.
  - Automatic Electronic Advertising Viewership Tracking System. Filed January 3, 2000.
  - Automatic Personalized Media Identification System. Filed January 3, 2000.
  - Secure Uniform Resource Locator System. Filed January 3, 2000.
64. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
65. Methodological Considerations
- Techne-centered methodology
  - Construction of theories informed by constructing artifacts
  - Construction of artifacts informed by (de)constructing theories
  - Practitioners: Kuleshov, Eisenstein, Papert, the Narrative Intelligence Reading Group
- Inherently interdisciplinary activity
  - Information science, computer science, film theory and production, media studies, semiotics, user interface and interaction design and testing
66. Presentation Outline
- Problem Setting
- Representing Video
- Current Approaches
- New Solutions
- Methodological Considerations
- Future Work
67. Computational Media
- More intimately integrate two great 20th-century inventions
68. Technical Research Challenges
- Develop an end-to-end metadata system for automated media capture, processing, management, and reuse
- Creating metadata
  - Represent action sequences and higher-level narrative structures
  - Integrate legacy metadata (keywords, natural language)
  - Gather more and better metadata at the point of capture (develop metadata cameras)
  - Develop human-in-the-loop indexing algorithms and interfaces
- Using metadata
  - Develop media components (MediaLego)
  - Integrate linguistic and other query interfaces
69. Non-Technical Challenges
- Standardization of media metadata (MPEG-7)
- Broadband infrastructure and deployment
- Intellectual property and economic models for
sharing and reuse of media assets
70. Garage Cinema Research Projects
- Media Metadata
  - Moving Media Streams to MPEG-7 (XML)
  - Creating a Java-based Web annotation and retrieval front-end
  - Integration throughout the production cycle
- Active Capture
  - Developing more Active Capture routines
- Adaptive Media
  - Developing higher-order functions
- Hello World Application
  - E-Berkeley Photo ID
71. For More Info
- Marc Davis Email
  - marc@sims.berkeley.edu
- Marc Davis Web Site
  - www.sims.berkeley.edu/marc
- Spring 2003 course on Multimedia Information at UC Berkeley SIMS