Lecture 10: Metadata for Media (Transcript)

1
Lecture 10: Metadata for Media
SIMS 202: Information Organization and Retrieval
  • Prof. Ray Larson and Prof. Marc Davis
  • UC Berkeley SIMS
  • Tuesday and Thursday, 10:30 am - 12:00 pm
  • Fall 2003
  • http://www.sims.berkeley.edu/academics/courses/is202/f03/

2
Today's Agenda
  • Review of Last Time
  • Metadata for Motion Pictures
  • Representing Video
  • Current Approaches
  • Media Streams
  • Discussion Questions
  • Action Items for Next Time

3
Today's Agenda
  • Review of Last Time
  • Metadata for Motion Pictures
  • Representing Video
  • Current Approaches
  • Media Streams
  • Discussion Questions
  • Action Items for Next Time

4
The Media Opportunity
  • Vastly more media will be produced
  • Without ways to manage it (metadata creation and
    use) we lose the advantages of digital media
  • Most current approaches are insufficient and
    perhaps misguided
  • Great opportunity for innovation and invention
  • Need interdisciplinary approaches to the problem

5
What is the Problem?
  • Today people cannot easily find, edit, share, and
    reuse media
  • Computers don't understand media content
  • Media is opaque and data rich
  • We lack structured representations
  • Without content representation (metadata),
    manipulating digital media will remain like
    word-processing with bitmaps

6
Traditional Media Production Chain vs. Metadata-Centric Production Chain
[Diagram: metadata created and used across pre-production, production, post-production, and distribution]
7
Automated Media Production Process
8
Technology Summary
  • Media Streams provides a framework for creating
    metadata throughout the media production cycle to
    make media assets searchable and reusable
  • Active Capture automates direction and
    cinematography using real-time audio-video
    analysis in an interactive control loop to create
    reusable media assets
  • Adaptive Media uses adaptive media templates and
    automatic editing functions to mass customize and
    personalize media and thereby eliminate the need
    for editing on the part of end users
  • Together, these technologies will automate,
    personalize, and speed up media production,
    distribution, and reuse

9
Active Capture
10
Active Capture: Reusable Shots
11
Marc Davis in Godzilla Scene
12
Evolution of Media Production
  • Customized production
    • Skilled creation of one media product
  • Mass production
    • Automatic replication of one media product
  • Mass customization
    • Skilled creation of adaptive media templates
    • Automatic production of customized media

13
Central Idea: Movies as Programs
  • Movies change from being static data to programs
  • Shots are inputs to a program that computes new
    media based on content representation and
    functional dependency (US Patents 6,243,087 and
    5,969,716); a toy sketch of this idea follows

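One way to read this slide is as a tiny program: shots are annotated data, and a template is a function that computes a new sequence from those annotations. The Python sketch below is a hypothetical illustration; the Shot class, descriptor strings, and greeting_template are invented for this example and are not part of Media Streams or the cited patents.

    # Hypothetical sketch of "movies as programs": shots are data, a template is a
    # function that computes a new sequence from content annotations.
    from dataclasses import dataclass

    @dataclass
    class Shot:
        clip_id: str
        duration: float         # seconds
        descriptors: frozenset  # content annotations, e.g. {"character:jack", "action:waving"}

    def greeting_template(shots, character):
        """Assemble a tiny greeting sequence for `character` from annotated shots."""
        def find(*required):
            wanted = set(required) | {f"character:{character}"}
            return next(s for s in shots if wanted <= s.descriptors)

        # The "program": an establishing shot, then a wave, then a close-up.
        return [find("shot:establishing"), find("action:waving"), find("shot:close-up")]

    shots = [
        Shot("s1", 4.0, frozenset({"character:jack", "shot:establishing"})),
        Shot("s2", 2.5, frozenset({"character:jack", "action:waving"})),
        Shot("s3", 1.5, frozenset({"character:jack", "shot:close-up"})),
    ]
    print([s.clip_id for s in greeting_template(shots, "jack")])  # ['s1', 's2', 's3']
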
14
Today's Agenda
  • Review of Last Time
  • Metadata for Motion Pictures
  • Representing Video
  • Current Approaches
  • Media Streams
  • Discussion Questions
  • Action Items for Next Time

15
Representing Video
  • Streams vs. Clips
  • Video syntax and semantics
  • Ontological issues in video representation

16
Video is Temporal
17
Streams vs. Clips
18
Stream-Based Representation
  • Makes annotation pay off
    • The richer the annotation, the more numerous the
      possible segmentations of the video stream
  • Clips
    • Change from being fixed segmentations of the
      video stream to being the results of retrieval
      queries based on annotations of the video stream
  • Annotations
    • Create representations which make clips, not
      representations of clips (see the query sketch
      after this list)

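As an illustration of clips as query results rather than fixed segments, here is a small hypothetical Python sketch: annotations are (start, end, descriptor) spans over one continuous stream, and a clip is whatever span a query computes. The data and function names are invented for this example.

    # Hypothetical sketch: annotations are layered over a continuous video stream,
    # and a "clip" is just the time span returned by a query over those layers.
    annotations = [
        (0.0, 12.0, "location:kitchen"),
        (3.0,  6.0, "character:jack"),
        (4.0,  5.5, "action:waving"),
        (9.0, 11.0, "character:jill"),
    ]

    def clips_for(descriptor):
        """Return (start, end) spans whose annotation matches the descriptor."""
        return [(start, end) for start, end, d in annotations if d == descriptor]

    def clip_with_all(*descriptors):
        """Intersect spans so one query computes a clip nobody segmented by hand.
        Assumes each descriptor has at least one annotated span."""
        spans = [clips_for(d) for d in descriptors]
        start = max(s[0][0] for s in spans)
        end = min(s[0][1] for s in spans)
        return (start, end) if start < end else None

    print(clips_for("character:jack"))                        # [(3.0, 6.0)]
    print(clip_with_all("character:jack", "action:waving"))   # (4.0, 5.5)
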
19
Video Syntax and Semantics
  • The Kuleshov Effect
  • Video has a dual semantics
    • Sequence-independent invariant semantics of shots
    • Sequence-dependent variable semantics of shots

20
Ontological Issues for Video
  • Video plays with rules for identity and continuity
    • Space
    • Time
    • Person
    • Action

21
Space and Time: Actual vs. Inferable
  • Actual Recorded Space and Time
    • GPS
    • Studio space and time
  • Inferable Space and Time
    • Establishing shots
    • Cues and clues

22
Time: Temporal Durations
  • Story (Fabula) Duration
    • Example: Brushing teeth in the story world (5
      minutes)
  • Plot (Syuzhet) Duration
    • Example: Brushing teeth in the plot world (1
      minute: 6 steps of 10 seconds each)
  • Screen Duration
    • Example: Brushing teeth on screen (10 seconds: 2
      shots of 5 seconds each); the three durations are
      compared in the sketch below

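For concreteness, the tooth-brushing example works out as follows (a trivial Python sketch of the arithmetic on this slide):

    # Toy computation of the three durations for the tooth-brushing example.
    story_duration = 5 * 60      # the action takes 5 minutes in the story world
    plot_steps     = [10] * 6    # the plot presents 6 steps of 10 seconds each
    screen_shots   = [5, 5]      # the screen shows 2 shots of 5 seconds each

    print(story_duration)        # 300 seconds of story (fabula) time
    print(sum(plot_steps))       # 60 seconds of plot (syuzhet) time
    print(sum(screen_shots))     # 10 seconds of screen time
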
23
Character and Continuity
  • Identity of character is constructed through
    • Continuity of actor
    • Continuity of role
  • Alternative continuities
    • Continuity of actor only
    • Continuity of role only

24
Representing Action
  • Physically-based description for
    sequence-independent action semantics
    • Abstract vs. conventionalized descriptions
    • Temporally and spatially decomposable actions and
      subactions
  • Issues in describing sequence-dependent action
    semantics
    • Mental states (emotions vs. expressions)
    • Cultural differences (e.g., bowing vs. greeting)

25
Cinematic Actions
  • Cinematic actions support the basic narrative
    structure of cinema
    • Reactions/Proactions: nodding, screaming,
      laughing, etc.
    • Focus of Attention: gazing, head turning,
      pointing, etc.
    • Locomotion: walking, running, etc.
  • Cinematic actions can occur
    • Within the frame/shot boundary
    • Across the frame boundary
    • Across shot boundaries

26
Today's Agenda
  • Review of Last Time
  • Metadata for Motion Pictures
  • Representing Video
  • Current Approaches
  • Media Streams
  • Discussion Questions
  • Action Items for Next Time

27
The Search for Solutions
  • Current approaches to creating metadata don't work
    • Signal-based analysis
    • Keywords
    • Natural language
  • Need a standardized metadata framework
    • Designed for video and rich media data
    • Human- and machine-readable and writable
    • Standardized and scalable
    • Integrated into media capture, archiving,
      editing, distribution, and reuse

28
Signal-Based Parsing
  • Practical problem
  • Parsing unstructured, unknown video is very, very
    hard
  • Theoretical problem
  • Mismatch between percepts and concepts

29
Perceptual/Conceptual Issue
Similar Percepts / Dissimilar Concepts
Clown Nose vs. Red Sun
30
Perceptual/Conceptual Issue
Dissimilar Percepts / Similar Concepts
John Dillinger's Car vs. Timothy McVeigh's Car
31
Signal-Based Parsing
  • Effective and useful automatic parsing
    • Video
      • Shot boundary detection (see the sketch after
        this list)
      • Camera motion analysis
      • Low-level visual similarity
      • Feature tracking
      • Face detection
    • Audio
      • Pause detection
      • Audio pattern matching
      • Simple speech recognition
      • Speech vs. music detection
  • Approaches to automated parsing
    • At the point of capture, integrate the recording
      device, the environment, and agents in the
      environment into an interactive system
    • After capture, use human-in-the-loop algorithms
      to leverage human and machine intelligence

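Shot boundary detection, the first item under video parsing, is commonly approximated by comparing color histograms of adjacent frames. The Python sketch below uses OpenCV for that purpose; it is an assumption about one possible implementation, not the method used in the lecture, and the file name and threshold are hypothetical.

    # Minimal sketch of histogram-difference shot boundary detection (an assumed
    # approach, not the lecture's). Requires: pip install opencv-python
    import cv2

    def shot_boundaries(path, threshold=0.6):
        """Return frame indices where adjacent frames' gray histograms correlate poorly."""
        cap = cv2.VideoCapture(path)
        boundaries, prev_hist, i = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
            cv2.normalize(hist, hist)
            if prev_hist is not None:
                # Correlation near 1.0 means similar frames; a sharp drop suggests a cut.
                if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                    boundaries.append(i)
            prev_hist, i = hist, i + 1
        cap.release()
        return boundaries

    # Example (hypothetical file): print(shot_boundaries("lecture_clip.mp4"))
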
32
Keywords vs. Semantic Descriptors
dog, biting, Steve
33
Keywords vs. Semantic Descriptors
dog, biting, Steve
34
Why Keywords Don't Work
  • Are not a semantic representation
  • Do not describe relations between descriptors (see
    the short illustration after this list)
  • Do not describe temporal structure
  • Do not converge
  • Do not scale

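The first two bullets can be shown in a few lines: the keyword bag for "dog biting Steve" is identical to the one for "Steve biting dog", while a relational descriptor keeps who is doing what to whom. A hypothetical Python illustration:

    # Keyword sets lose the relations between descriptors.
    shot_a_keywords = {"dog", "biting", "Steve"}   # dog bites Steve
    shot_b_keywords = {"dog", "biting", "Steve"}   # Steve bites dog
    print(shot_a_keywords == shot_b_keywords)      # True: the bag of words cannot tell them apart

    # A relational (agent, action, patient) descriptor preserves who does what to whom.
    shot_a_relation = ("dog", "biting", "Steve")
    shot_b_relation = ("Steve", "biting", "dog")
    print(shot_a_relation == shot_b_relation)      # False: the relation distinguishes them
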
35
Natural Language vs. Visual Language
Jack, an adult male police officer, while walking
to the left, starts waving with his left arm, and
then has a puzzled look on his face as he turns
his head to the right; he then drops his facial
expression and stops turning his head,
immediately looks up, and then stops looking up
after he stops waving but before he stops
walking.
36
Natural Language vs. Visual Language
Jack, an adult male police officer, while walking
to the left, starts waving with his left arm, and
then has a puzzled look on his face as he turns
his head to the right; he then drops his facial
expression and stops turning his head,
immediately looks up, and then stops looking up
after he stops waving but before he stops
walking.
37
Notation for Time-Based Media: Music
38
Visual Language Advantages
  • A language designed as an accurate and readable
    representation of time-based media
  • For video, especially important for actions,
    expressions, and spatial relations
  • Enables Gestalt view and quick recognition of
    descriptors due to designed visual similarities
  • Supports global use of annotations

39
Today's Agenda
  • Review of Last Time
  • Metadata for Motion Pictures
  • Representing Video
  • Current Approaches
  • Media Streams
  • Discussion Questions
  • Action Items for Next Time

40
After Capture: Media Streams
41
Media Streams Features
  • Key features
    • Stream-based representation (better segmentation)
    • Semantic indexing (what things are similar to)
    • Relational indexing (who is doing what to whom)
    • Temporal indexing (when things happen)
    • Iconic interface (designed visual language)
    • Universal annotation (standardized markup schema)
  • Key benefits
    • More accurate annotation and retrieval
    • Global usability and standardization
    • Reuse of rich media according to content and
      structure

42
Media Streams GUI Components
  • Media Time Line
  • Icon Space
    • Icon Workshop
    • Icon Palette

43
Media Time Line
  • Visualize video at multiple time scales
  • Write and read multi-layered iconic annotations
  • One interface for annotation, query, and
    composition

44
Media Time Line
45
Icon Space
  • Icon Workshop
    • Utilize categories of video representation
    • Create iconic descriptors by compounding iconic
      primitives
    • Extend set of iconic descriptors
  • Icon Palette
    • Dynamically group related sets of iconic
      descriptors
    • Reuse descriptive effort of others
    • View and use query results

46
Icon Space
47
Icon Space: Icon Workshop
  • General to specific (horizontal)
    • Cascading hierarchy of icons with increasing
      specificity on subordinate levels
  • Combinatorial (vertical)
    • Compounding of hierarchically organized icons
      across multiple axes of description (see the toy
      sketch after this list)

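One way to picture the Icon Workshop's two axes is a toy data structure: each descriptive axis is a general-to-specific path, and a compound descriptor picks one term per axis. The categories, terms, and function names below are invented for illustration and are not the actual Media Streams icon hierarchy.

    # Toy sketch (hypothetical categories): compound descriptors combine one node
    # per descriptive axis, and each axis is a general-to-specific path.
    hierarchy = {
        "character": ["person", "adult", "adult male", "police officer"],
        "action":    ["moving", "locomotion", "walking"],
        "location":  ["outdoors", "street"],
    }

    def compound(**choices):
        """Combine one term per axis into a single compound descriptor."""
        for axis, term in choices.items():
            assert term in hierarchy[axis], f"{term!r} not defined on the {axis} axis"
        return tuple(sorted(choices.items()))

    def generalize(axis, term):
        """Move one step up the axis toward the more general term."""
        path = hierarchy[axis]
        i = path.index(term)
        return path[max(i - 1, 0)]

    d = compound(character="police officer", action="walking", location="street")
    print(d)                                          # compound descriptor across three axes
    print(generalize("character", "police officer"))  # 'adult male'
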
48
Icon Space: Icon Workshop Detail
49
Icon Space: Icon Palette
  • Dynamically group related sets of iconic
    descriptors
  • Collect icon sentences
  • Reuse descriptive effort of others

50
Icon Space: Icon Palette Detail
51
Video Retrieval in Media Streams
  • Same interface for annotation and retrieval
  • Assembles responses to queries as well as finding
    them
  • Query responses use semantics to degrade
    gracefully (see the query relaxation sketch after
    this list)

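The "degrade gracefully" point suggests query relaxation: if nothing matches the exact descriptor, back off to a more general concept until something does. A hypothetical Python sketch (the hierarchy and index are invented):

    # Graceful degradation as query relaxation up a concept hierarchy.
    parents = {"german shepherd": "dog", "dog": "animal", "animal": None}

    index = {
        "dog":    ["shot_17"],
        "animal": ["shot_17", "shot_42"],
    }

    def retrieve(term):
        """Return shots for the most specific term that has any matches."""
        while term is not None:
            hits = index.get(term, [])
            if hits:
                return term, hits
            term = parents.get(term)   # relax the query to the parent concept
        return None, []

    print(retrieve("german shepherd"))  # ('dog', ['shot_17']): no exact match, relaxed to 'dog'
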
52
Media Streams Technologies
  • Minimal video representation distinguishing
    syntax and semantics
  • Iconic visual language for annotating and
    retrieving video content
  • Retrieval-by-composition methods for repurposing
    video

53
Non-Technical Challenges
  • Standardization of media metadata (MPEG-7)
  • Broadband infrastructure and deployment
  • Intellectual property and economic models for
    sharing and reuse of media assets

54
Today's Agenda
  • Review of Last Time
  • Metadata for Motion Pictures
  • Representing Video
  • Current Approaches
  • Media Streams
  • Discussion Questions
  • Action Items for Next Time

55
Discussion Questions (Davis)
  • John Snydal on Media Streams
  • What is the target audience of users
    (annotators/retrievers) for Media Streams? In the
    article the following groups are mentioned:
  • Content providers
  • Video editors
  • News teams
  • Documentary film makers
  • Film archives
  • Stock photo houses
  • Video archivists
  • Video producers
  • (international audience)
  • (illiterate and preliterate people)
  • Is it possible that Media Streams could satisfy
    the needs, goals and requirements of all of these
    groups, or would it be more appropriate to
    develop separate, tailored applications for the
    unique needs of each group?

56
Discussion Questions (Davis)
  • danah boyd on Media Streams
  • Icons require visual literacy. Icons are also
    culturally constructed. Thus, for them to work as
    an information access bit, people must learn the
    visual language; it is not inherent. What are
    the social consequences of a system dependent on
    unfamiliar cues?

57
Discussion Questions (Davis)
  • danah boyd on Media Streams
  • Films are constructed narratives. But most
    commonplace storytelling is not. Even in a
    creative form, people often piece together found
    objects instead of finding objects to fit their
    story. (Think teenage girls making collages out
    of the latest YM.) Storytelling also happens
    around media far more than through media (i.e.
    telling a story about a picture rather than using
    a collection of pictures to tell a story). My
    guess is that this social phenomenon goes beyond
    the retrieval issues. Do you think that Media
    Streams would encourage new behavior regarding
    storytelling or will it only be useful for those
    with a constructed narrative in mind? Why (not)?

58
Discussion Questions (Davis)
  • Jesse Mendelsohn on Media Streams
  • Media Streams does not allow iconic descriptions
    of emotion or scene-interpretation. How would
    someone searching stock footage for a
    suspenseful scene of two men beating each other
    go about doing it? The actual sense of suspense
    and the act of beating cannot be iconified.
    Does this limit Media Streams' ability or is
    there a way around it within its capabilities as
    described?

59
Discussion Questions (Davis)
  • Jesse Mendelsohn on Media Streams
  • In order for Media Streams to work well, it relies
    on the availability of a very large and
    extensive resource of well-annotated video. Is
    the current annotation process too primitive
    and/or time consuming to allow Media Streams to
    work to its full potential? Will changing how
    Media Streams can be used to annotate video or
    changing video annotation methods in general make
    Media Streams more effective?

60
Today's Agenda
  • Review of Last Time
  • Metadata for Motion Pictures
  • Representing Video
  • Current Approaches
  • Media Streams
  • Discussion Questions
  • Action Items for Next Time

61
Assignment 4.1
  • Assignment 4.1
  • Phone Metadata Design - Part 1
  • Due Oct 2

62
Next Time
  • Database Design (RRL)
  • Readings
  • Handouts in Class
  • Database Modeling and Design, Ch. 2: The ER
    Model - Basic Concepts (Teorey, T.J.)
  • Logical Database Design and the Relational Model
    (F. R. McFadden, J. A. Hoffer)