Transcript and Presenter's Notes

Title: Prof. Marc Davis


1
Towards Computational Media: Metadata for
Media Automation and Reuse
  • Prof. Marc Davis
  • University of California at Berkeley
  • School of Information Management and Systems
  • www.sims.berkeley.edu/marc

2
Marc Davis Background
  • New Assistant Professor at SIMS (School of
    Information Management and Systems)
  • Background

1980-1984: B.A. from Wesleyan University in the College of Letters
1984-1987: M.A. from the University of Konstanz in Literary Theory and Philosophy
1990-1995: Ph.D. from the MIT Media Laboratory in Media Arts and Sciences
1993-1998: Member of the Research Staff and Project Coordinator at Interval Research Corporation
1999-2002: Chairman and CTO of Amova
3
Marc Davis Research
  • Creating technology and applications that will
    enable daily media consumers to become daily
    media producers
  • Research and teaching in the theory, design, and
    development of digital media systems for creating
    and using media metadata to automate media
    production and reuse

4
Presentation Outline
  • Problem Setting
  • Representing Video
  • Current Approaches
  • New Solutions
  • Methodological Considerations
  • Future Work

5
Presentation Outline
  • Problem Setting
  • Representing Video
  • Current Approaches
  • New Solutions
  • Methodological Considerations
  • Future Work

6
Global Media Network
  • Digital video produced anywhere by anyone
    accessible to anyone anywhere
  • Today's video users become tomorrow's video
    producers
  • Not 500 channels, but 500,000,000 video Web sites

7
What is the Problem?
  • Today people cannot easily create, find, edit,
    share, and reuse media
  • Computers don't understand video content
  • Video is opaque and data rich
  • We lack structured representations
  • Without content representation (metadata),
    manipulating digital video will remain like
    word-processing with bitmaps

8
Technology Goals
  • Goals
  • Increase access to media content
  • Decrease effort in media handling and reuse
  • Improve usefulness of media content
  • Technology
  • Create metadata about media content
  • Use metadata to automate media production and
    reuse

9
Presentation Outline
  • Problem Setting
  • Representing Video
  • Current Approaches
  • New Solutions
  • Methodological Considerations
  • Future Work

10
Representing Video
  • Streams vs. Clips
  • Video syntax and semantics
  • Ontological issues in video representation

11
Video is Temporal
12
Streams vs. Clips
13
Stream-Based Representation
  • Makes annotation pay off
  • The richer the annotation, the more numerous the
    possible segmentations of the video stream
  • Clips
  • Change from being fixed segmentations of the
    video stream, to being the results of retrieval
    queries based on annotations of the video stream
  • Annotations
  • Create representations which make clips, not
    representations of clips (see the sketch below)
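A minimal sketch of this idea in Python (all names hypothetical, not Media Streams itself): annotations are time intervals over one continuous stream, and a clip is computed on demand as the merged extent of the annotations matching a query, rather than stored as a fixed segmentation.

from dataclasses import dataclass

@dataclass
class Annotation:
    start: float        # seconds into the stream
    end: float
    descriptor: str     # e.g. "dog", "biting", "Steve"

def clips_for(annotations, descriptor):
    """Derive clips from annotations: each matching interval
    is a retrievable segment; overlapping matches are merged."""
    spans = sorted((a.start, a.end) for a in annotations
                   if a.descriptor == descriptor)
    clips = []
    for start, end in spans:
        if clips and start <= clips[-1][1]:   # overlaps previous clip
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips

# The same stream yields a different segmentation per query:
stream = [Annotation(0, 12, "dog"), Annotation(3, 7, "biting"),
          Annotation(3, 9, "Steve")]
print(clips_for(stream, "dog"))     # [(0, 12)]
print(clips_for(stream, "biting"))  # [(3, 7)]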

14
Video Syntax and Semantics
  • The Kuleshov Effect
  • Video has a dual semantics
  • Sequence-independent invariant semantics of shots
  • Sequence-dependent variable semantics of shots

15
Ontological Issues for Video
  • Video plays with rules for identity and
    continuity
  • Space
  • Time
  • Character
  • Action

16
Space and Time: Actual vs. Inferable
  • Actual Recorded Space and Time
  • GPS
  • Studio space and time
  • Inferable Space and Time
  • Establishing shots
  • Cues and clues

17
Character and Continuity
  • Identity of character is constructed through
  • Continuity of actor
  • Continuity of role
  • Alternative continuities
  • Continuity of actor only
  • Continuity of role only

18
Representing Action
  • Physically-based description for
    sequence-independent action semantics
  • Abstract vs. conventionalized descriptions
  • Temporally and spatially decomposable actions and
    subactions
  • Issues in describing sequence-dependent action
    semantics
  • Mental states (emotions vs. expressions)
  • Cultural differences (e.g., bowing vs. greeting)

19
Cinematic Actions
  • Cinematic actions support the basic narrative
    structure of cinema
  • Reactions/Proactions
  • Nodding, screaming, laughing, etc.
  • Focus of Attention
  • Gazing, head-turning, pointing, etc.
  • Locomotion
  • Walking, running, etc.
  • Cinematic actions can occur
  • Within the frame/shot boundary
  • Across the frame boundary
  • Across shot boundaries (see the data-model
    sketch below)
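One way to make this taxonomy concrete is as an annotation data model; a hedged Python sketch (class and field names are hypothetical, not Media Streams' actual schema):

from dataclasses import dataclass
from enum import Enum

class CinematicAction(Enum):
    # Hypothetical enumeration of the three categories above
    REACTION = "reaction/proaction"   # nodding, screaming, laughing
    ATTENTION = "focus of attention"  # gazing, head-turning, pointing
    LOCOMOTION = "locomotion"         # walking, running

@dataclass
class ActionAnnotation:
    action: CinematicAction
    start: float         # seconds into the stream
    end: float
    shots: list[int]     # shot indices spanned; more than one
                         # means the action crosses shot boundaries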

20
Presentation Outline
  • Problem Setting
  • Representing Video
  • Current Approaches
  • New Solutions
  • Methodological Considerations
  • Future Work

21
The Search for Solutions
  • Current approaches to creating metadata don't
    work
  • Signal-based analysis
  • Keywords
  • Natural language
  • Need standardized metadata framework
  • Designed for video and rich media data
  • Human and machine readable and writable
  • Standardized and scalable
  • Integrated into media capture, archiving,
    editing, distribution, and reuse

22
Signal-Based Parsing
  • Practical problem
  • Parsing unstructured, unknown video is very, very
    hard
  • Theoretical problem
  • Mismatch between percepts and concepts

23
Perceptual/Conceptual Issue
Similar Percepts / Dissimilar Concepts: Clown Nose vs. Red Sun
24
Perceptual/Conceptual Issue
Dissimilar Percepts / Similar Concepts: John Dillinger's Car vs. Timothy McVeigh's Car
25
Signal-Based Parsing
  • Effective and useful automatic parsing
  • Video
  • Scene break detection (sketched after this list)
  • Camera motion analysis
  • Low level visual similarity
  • Feature tracking
  • Audio
  • Pause detection
  • Audio pattern matching
  • Simple speech recognition
  • Approaches to automated parsing
  • At the point of capture, integrate the recording
    device, the environment, and agents in the
    environment into an interactive system
  • After capture, use human-in-the-loop algorithms
    to leverage human and machine intelligence
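As a sketch of the first item above, scene break (hard cut) detection can be done by thresholding frame-to-frame histogram differences. This assumes OpenCV (cv2); the 64-bin histogram and the 0.5 threshold are illustrative, not tuned values.

import cv2

def detect_cuts(path, threshold=0.5):
    """Flag frames whose grayscale histogram differs sharply
    from the previous frame's: a crude hard-cut detector."""
    cap = cv2.VideoCapture(path)
    cuts, prev_hist, i = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Correlation near 1.0 means similar frames; a dip signals a cut
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < threshold:
                cuts.append(i)
        prev_hist, i = hist, i + 1
    cap.release()
    return cuts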

26
Keywords vs. Semantic Descriptors
dog, biting, Steve
27
Keywords vs. Semantic Descriptors
dog, biting, Steve
28
Why Keywords Don't Work
  • Are not a semantic representation
  • Do not describe relations between descriptors
    (see the example below)
  • Do not describe temporal structure
  • Do not converge
  • Do not scale
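The relation problem is easy to see in code; a small illustrative sketch (the structures are hypothetical): the same keyword set covers opposite scenes, while even a minimal semantic descriptor keeps who-does-what-to-whom distinct.

# Keywords cannot distinguish who bites whom:
scene_a = {"dog", "biting", "Steve"}   # dog bites Steve
scene_b = {"dog", "biting", "Steve"}   # Steve bites dog!
assert scene_a == scene_b              # identical under keyword search

# A minimal semantic descriptor keeps the relation explicit:
scene_a_sem = {"agent": "dog", "action": "bite", "patient": "Steve"}
scene_b_sem = {"agent": "Steve", "action": "bite", "patient": "dog"}
assert scene_a_sem != scene_b_sem      # retrievably different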

29
Natural Language vs. Visual Language
Jack, an adult male police officer, while walking
to the left, starts waving with his left arm, and
then has a puzzled look on his face as he turns
his head to the right; he then drops his facial
expression and stops turning his head,
immediately looks up, and then stops looking up
after he stops waving but before he stops
walking.
30
Natural Language vs. Visual Language
Jack, an adult male police officer, while walking
to the left, starts waving with his left arm, and
then has a puzzled look on his face as he turns
his head to the right; he then drops his facial
expression and stops turning his head,
immediately looks up, and then stops looking up
after he stops waving but before he stops
walking.
31
Notation for Time-Based Media: Music
32
Presentation Outline
  • Problem Setting
  • Representing Video
  • Current Approaches
  • New Solutions
  • Methodological Considerations
  • Future Work

33
New Solutions for Creating Metadata
After Capture
During Capture
34
New Solutions for Creating Metadata
After Capture
During Capture
35
After Capture: Media Streams
36
Media Streams Features
  • Key features
  • Stream-based representation (better segmentation)
  • Semantic indexing (what things are similar to)
  • Relational indexing (who is doing what to whom)
  • Temporal indexing (when things happen)
  • Iconic interface (designed visual language)
  • Universal annotation (standardized markup schema)
  • Key benefits
  • More accurate annotation and retrieval
  • Global usability and standardization
  • Reuse of rich media according to content and
    structure

37
New Solutions for Creating Metadata
After Capture
During Capture
38
Moore's Law for Cameras
[Chart: camera prices over time (2000, 2002; $400, $40), comparing the Kodak DC40, Kodak DX4900, SiPix StyleCam Blink, and Nintendo GameBoy Camera]
39
From Manual to Automated Production
Manual production (1990-2000): manual production process, manual generic tools, point solutions, post-production focus
  • Results
  • Software tools are difficult to use
  • Difficult to store, retrieve, and edit media
  • User is focused on production
  • Result: the advantages of digital media are lost
Automated production:
  • Results
  • Easy-to-use automated solutions
  • Reusable, personalizable media assets
  • Experience and activity driven
40
Creating Metadata During Capture
Current capture paradigm: multiple captures to get 1 good capture
New capture paradigm: 1 good capture drives multiple uses
41
Active Capture
42
Active Capture
  • Active engagement and communication among the
    capture device, agent(s), and the environment
  • Re-envision capture as a control system with
    feedback (see the loop sketched below)
  • Use multiple data sources and communication to
    simplify the capture scenario
  • Use HCI to support human-in-the-loop algorithms
    for computer vision and audition
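A minimal sketch of capture as a feedback control loop, with the recording device, signal analysis, and cueing passed in as functions; all names and the toy analysis here are hypothetical.

def active_capture(record, analyze, cue, max_attempts=3):
    """Capture as a control system: cue the subject, record,
    analyze the signal, and re-cue until the take is good."""
    cue("Look at the camera and wave.")
    for _ in range(max_attempts):
        take = record(seconds=3.0)
        ok, correction = analyze(take)   # e.g. face and gesture found?
        if ok:
            return take                  # one good capture, many uses
        cue(correction)                  # e.g. "Please step closer."
    return None                          # fall back to human review

# Toy demo with a stubbed device and analysis:
if __name__ == "__main__":
    results = iter([(False, "Step closer."), (True, "")])
    clip = active_capture(
        record=lambda seconds: object(),
        analyze=lambda take: next(results),
        cue=print,
    )
    print("good take" if clip else "no usable take")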

43
Active Capture
  • Automated direction
  • No need for director
  • Real-time audio-video analysis in an interactive
    control loop
  • Computer-controlled interactive audio and visual
    cues
  • Automated cinematography
  • No need for camera and sound crew
  • Real-time audio-video analysis in an interactive
    control loop
  • Automated post-production reframing and
    relighting of video

44
Active Capture: Good Capture
45
Active Capture: Error Handling
46
Jim Lanahan in an MCI Ad
47
Jim Lanahan in T2 Trailer
48
Jim Lanahan in an @Home Banner
49
Evolution of Media Production
  • Customized production
  • Skilled creation of one media product
  • Mass production
  • Automatic replication of one media product
  • Mass customization
  • Skilled creation of adaptive media templates
  • Automatic production of customized media

50
Editing Paradigm Has Not Changed
51
Central Idea: Movies as Programs
  • Movies change from being static data to being
    programs
  • Shots are inputs to a program that computes new
    media based on content representation and
    functional dependency (US Patents 6,243,087 and
    5,969,716), as sketched below
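A hedged sketch of the shift, with hypothetical names and metadata (an illustration of the idea, not the patented system): the movie is a function from annotated shots to an output sequence, so new inputs recompute the result.

def movie(shots):
    """A 'movie as program': the edit is computed from content
    metadata rather than stored as a fixed sequence."""
    intro = [s for s in shots if "establishing" in s["tags"]]
    action = sorted((s for s in shots if "locomotion" in s["tags"]),
                    key=lambda s: s["start"])
    return intro[:1] + action      # functional dependency on metadata

shots = [
    {"id": "A", "tags": {"establishing"}, "start": 0.0},
    {"id": "B", "tags": {"locomotion"}, "start": 9.0},
    {"id": "C", "tags": {"locomotion"}, "start": 4.0},
]
print([s["id"] for s in movie(shots)])   # ['A', 'C', 'B']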

52
Automatic Video and Audio Editing
Automatically edit the output movie based on
content representation of dialogue and sound
Example of editing based on dialogue
Example of synchronizing video to music
53
Automatic Audio-Video Synchronization
Raw celery-chopping video
U2 "Numb" audio
Unsynched "Numb" celery music video
Synched "Numb" celery music video (a synchronization sketch follows)
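A minimal sketch of the synchronization step, assuming beat times and candidate visual events (e.g., chopping strokes) have already been extracted as lists of seconds; the numbers below are illustrative.

def synchronize(beats, candidate_cuts):
    """For each musical beat, pick the nearest visual event so
    the edited video's cuts land on the beat grid."""
    cuts = []
    for beat in beats:
        nearest = min(candidate_cuts, key=lambda t: abs(t - beat))
        cuts.append((beat, nearest))   # (audio time, video source time)
    return cuts

# Celery-chopping strokes vs. beats of the song:
beats = [0.0, 0.5, 1.0, 1.5]
strokes = [0.07, 0.62, 0.98, 1.38, 1.71]
print(synchronize(beats, strokes))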
54
Adaptive Media
  • Adaptive Media Templates
  • Co-adapt template media assets and input media
    assets
  • Based on the content of the media assets and a
    set of functions and parameters
  • To compute unique customized and personalized
    media results
  • Adaptive Media Functions
  • Take in media and metadata → produce new media
    (a type sketch follows)
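A hedged, type-level sketch of such a function in Python (all names hypothetical):

from typing import Any, Callable, Dict, List

Asset = Any   # placeholder: a media asset together with its metadata

# Adaptive media function:
# (template assets, user assets, parameters) -> new media
AdaptiveMediaFn = Callable[[List[Asset], List[Asset], Dict], Asset]

def picture_in_picture(template, user, params):
    """Toy instance: overlay the first user asset on the template."""
    return {"base": template[0], "overlay": user[0],
            "at": params.get("position", "corner")}

fn: AdaptiveMediaFn = picture_in_picture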

55
Adaptive Media Design Space
[2×2 design space: Content (author-generated vs. not) × Structure (author-generated vs. not); quadrants include Traditional Movie Making, Compilation Movie Making, and Historical Documentary Movie Making]
56
Adaptive Media Design Space
57
The Blank Page Approach
58
Captain Zoom IV MadLib
59
Constructing With Lego Blocks
60
Video MadLibs and Video Lego
  • Video MadLibs
  • Adaptive media template with open slots
    (sketched after this list)
  • Structure is fixed
  • Content can be varied
  • Video Lego
  • Reusable media components that know how to fit
    together
  • Structure is constrained
  • Content can be varied
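A minimal Video MadLib sketch (all names and queries hypothetical): the structure is a fixed, ordered list of open slots, and each slot is filled by a metadata query against the user's assets.

TEMPLATE = [                      # fixed structure: ordered open slots
    {"slot": "hero_intro", "query": {"action": "wave"}},
    {"slot": "reaction",   "query": {"action": "scream"}},
    {"slot": "hero_exit",  "query": {"action": "walk"}},
]

def fill_template(template, assets):
    """Fill each slot with the first asset whose metadata
    matches the slot's query; the structure stays fixed."""
    movie = []
    for slot in template:
        match = next((a for a in assets
                      if all(a["meta"].get(k) == v
                             for k, v in slot["query"].items())), None)
        movie.append((slot["slot"], match and match["id"]))
    return movie

assets = [{"id": "take1", "meta": {"action": "wave"}},
          {"id": "take2", "meta": {"action": "walk"}}]
print(fill_template(TEMPLATE, assets))
# [('hero_intro', 'take1'), ('reaction', None), ('hero_exit', 'take2')]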

61
Automated Media Production Process
Reusable Online Asset Database
62
Technology Summary
  • Active Capture automates direction and
    cinematography to create reusable media assets
  • Adaptive Media uses adaptive media templates and
    automatic editing functions to eliminate the need
    for editing on the part of end users
  • Media Streams provides a framework for creating
    metadata to make media assets searchable and
    reusable
  • Together, these technologies will automate,
    personalize, and speed up media production,
    distribution, and reuse

63
Patents
  • Patents Issued
  • Time-Based Media Processing System. US Patent
    6,243,087. Continuation of US Patent 5,969,716.
    Filed September 28, 1999. Issued June 5,
    2001.
  • Patents Pending
  • Automatic Personalized Media Creation System.
    Filed January 3, 2000.
  • Automatic User Performance Capture System.
    Filed January 3, 2000.
  • Automatic Media Editing System. Filed January
    3, 2000.
  • Method for Creating Reusable Automatic
    Personalized Media. Filed January 3, 2000.
  • Automatic Media and Advertising System. Filed
    January 3, 2000.
  • Automatic Electronic Advertising Viewership
    Tracking System. Filed January 3, 2000.
  • Automatic Personalized Media Identification
    System. Filed January 3, 2000.
  • Secure Uniform Resource Locator System. Filed
    January 3, 2000.

64
Presentation Outline
  • Problem Setting
  • Representing Video
  • Current Approaches
  • New Solutions
  • Methodological Considerations
  • Future Work

65
Methodological Considerations
  • Techne-centered methodology
  • Construction of theories informed by constructing
    artifacts
  • Construction of artifacts informed by
    (de)constructing theories
  • Practitioners: Kuleshov, Eisenstein, Papert, the
    Narrative Intelligence Reading Group
  • Inherently interdisciplinary activity
  • Information science, computer science, film
    theory and production, media studies, semiotics,
    user interface and interaction design and testing

66
Presentation Outline
  • Problem Setting
  • Representing Video
  • Current Approaches
  • New Solutions
  • Methodological Considerations
  • Future Work

67
Computational Media
  • More intimately integrate two great 20th-century
    inventions: the computer and the motion picture

68
Technical Research Challenges
  • Develop end-to-end metadata system for automated
    media capture, processing, management, and reuse
  • Creating metadata
  • Represent action sequences and higher level
    narrative structures
  • Integrate legacy metadata (keywords, natural
    language)
  • Gather more and better metadata at the point of
    capture (develop metadata cameras)
  • Develop human-in-the-loop indexing algorithms
    and interfaces
  • Using metadata
  • Develop media components (MediaLego)
  • Integrate linguistic and other query interfaces

69
Non-Technical Challenges
  • Standardization of media metadata (MPEG-7)
  • Broadband infrastructure and deployment
  • Intellectual property and economic models for
    sharing and reuse of media assets

70
Garage Cinema Research Projects
  • Media Metadata
  • Moving Media Streams to MPEG-7 (XML)
  • Creating Java-based Web annotation and retrieval
    front-end
  • Integration throughout production cycle
  • Active Capture
  • Developing more Active Capture routines
  • Adaptive Media
  • Developing higher-order functions
  • Hello World Application
  • E-Berkeley Photo ID

71
For More Info
  • Marc Davis Email
  • marc@sims.berkeley.edu
  • Marc Davis Web Site
  • www.sims.berkeley.edu/marc
  • Spring 2003 course on Multimedia Information at
    UC Berkeley SIMS