Amplifying Video Information-Seeking Success through Rich, Exploratory Interfaces

Transcript and Presenter's Notes

1
Amplifying Video Information-Seeking Success
through Rich, Exploratory Interfaces
Mike Christel (christel@cs.cmu.edu)
School of Computer Science, Carnegie Mellon University
KES-IIMSS, July 11, 2008
2
Talk Outline
  • Creating metadata for video information sets
  • Informedia demonstrations (oral history
    collection, news video collection)
  • Types of search beyond fact-finding
  • Exploratory search through multiple views
  • Evaluation hurdles
  • Discussion
  • Now is a perfect opportunity for leveraging user
    involvement for better video information-seeking
    experiences

3
User Involvement
  • User Correction: corrective action for metadata
    errors (analogous to Harry Shum's vision at
    Microsoft for human-assisted computer vision
    success)
  • User Control: driving the interface to overcome
    metadata errors
  • User Context: more useful interfaces driven
    implicitly by context

4
CMU Informedia Digital Video Research
  • Details at http://www.informedia.cs.cmu.edu
  • Speech recognition and alignment
  • Image processing
  • Named entity tagging
  • Synchronized metadata for search and navigation
  • Fast, direct video access to oral histories,
    news, etc.
  • Demonstration oral history corpus: 913 hours of
    interviews from 400 individuals, 18,254 interview
    story segments (average story segment length of 3
    minutes)
  • Demonstration news corpus: TRECVID 2006 test set
    (165 hours of U.S., Arabic, and Chinese news with
    79,484 reference shots)

5
Speech Recognition Functions
  • Generates transcript (if one is not given) to
    enable text-based retrieval from spoken language
    documents
  • Improves text synchronization to audio/video in
    presence of scripts (align speech with text)
  • Supplies necessary information for library
    segmentation and multimedia abstractions (e.g.,
    break stories apart at silence points rather than
    in the middle of sentences; see the sketch below)

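A minimal sketch of the silence-based story breaking mentioned above, assuming frame-level RMS energy over float audio samples in [-1, 1]; the thresholds are illustrative guesses, not the Informedia system's tuned values.

    import numpy as np

    def silence_breakpoints(samples, rate, frame_ms=25, energy_thresh=0.01,
                            min_silence_ms=500):
        # Compute per-frame RMS energy, mark long low-energy runs, and
        # return sample offsets at the middle of each run, suitable as
        # story break points that avoid cutting mid-sentence.
        frame_len = int(rate * frame_ms / 1000)
        n_frames = len(samples) // frame_len
        rms = np.array([np.sqrt(np.mean(samples[i*frame_len:(i+1)*frame_len]**2))
                        for i in range(n_frames)])
        silent = rms < energy_thresh
        breaks, run_start = [], None
        for i, s in enumerate(silent):
            if s and run_start is None:
                run_start = i                       # silence run begins
            elif not s and run_start is not None:
                if (i - run_start) * frame_ms >= min_silence_ms:
                    breaks.append(((run_start + i) // 2) * frame_len)
                run_start = None                    # silence run ends
        return breaks
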
6
Speech Alignment Example
7
Image Understanding Functions
  • Scene segmentation
  • Similarity matching (color-based sketch below)
  • Camera motion determination and object tracking
  • Optical Character Recognition (OCR) on video text
    and titles
  • Face detection and recognition
  • Ongoing research work in object identification
    and scene characterization, e.g., indoor/outdoor,
    road, building, etc.

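The "images containing similar colors" demonstrations that follow can be approximated by histogram matching. A hedged sketch of one plausible realization (not the Informedia algorithm): compare normalized joint RGB histograms by histogram intersection.

    import numpy as np

    def color_histogram(image, bins=8):
        # image: H x W x 3 uint8 array; joint RGB histogram normalized to sum 1
        hist, _ = np.histogramdd(image.reshape(-1, 3),
                                 bins=(bins, bins, bins),
                                 range=((0, 256),) * 3)
        return hist / hist.sum()

    def color_similarity(img_a, img_b):
        # Histogram intersection: 1.0 for identical color mixes, 0.0 for disjoint
        return np.minimum(color_histogram(img_a), color_histogram(img_b)).sum()
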
8
Images containing similar colors
9
Images containing similar shapes
10
Images containing similar content
11
Goal: Automatic Video Characterization
[Figure: Yellowstone video frame annotated with shots,
camera motion, objects, action, captions, and scenery]
12
Goal: Automatic Video Characterization
[Figure: the same annotated frame, repeated as a slide build]
13
Automated Video Processing
  • Produces descriptive metadata for video libraries
  • Metadata has higher error rates than metadata
    produced by careful human annotation
  • Errors in metadata can be reduced:
  • By more computation-intensive algorithms
  • By taking advantage of video frame-to-frame
    redundancy
  • By folding in context, e.g., probable text sizes
    in video
  • By folding in extra sources of knowledge, e.g., a
    dictionary for cleaning up VOCR (sketched below),
    or labeled data revealing patterns for named
    entity detection
  • By human review and correction, which can
    generate additional labeled data for machine
    learning

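A minimal sketch of the dictionary clean-up idea for VOCR: replace a noisy token with its closest dictionary word when the edit distance is small relative to the token length. The threshold and dictionary are illustrative, not the Informedia system's.

    def edit_distance(a, b):
        # Standard Levenshtein distance via dynamic programming
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j-1] + 1,
                               prev[j-1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def correct_token(token, dictionary, max_rel_dist=0.34):
        best = min(dictionary, key=lambda w: edit_distance(token.lower(), w))
        if edit_distance(token.lower(), best) <= max_rel_dist * len(token):
            return best
        return token  # too far from every word: leave unchanged

    print(correct_token("HARTSF1ELD", {"hartsfield", "airport", "atlanta"}))
    # -> "hartsfield"
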
14
Camera and Motion Detection
[Figure: two examples: a camera pan vs. right object
motion (not a pan left)]
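One plausible heuristic for the distinction pictured above, sketched with OpenCV's Farneback optical flow (an assumption, not the Informedia detector): a pan moves nearly the whole frame coherently, while object motion leaves most pixels still.

    import cv2
    import numpy as np

    def classify_motion(prev_gray, next_gray, moving_thresh=1.0):
        # Dense optical flow between two grayscale frames
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)     # per-pixel motion magnitude
        frac = (mag > moving_thresh).mean()    # fraction of moving pixels
        if frac > 0.7:
            return "camera pan"                # nearly everything moves
        elif frac > 0.05:
            return "object motion"             # localized motion, static scene
        return "static"
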
15
Text and Face Detection
16
Video OCR Block Diagram
Text Area Detection
Video
Text Area Preprocessing
Commercial OCR
ASCII Text
17
Video Frames → Filtered Frames → AND-ed Frames
(1/2 s intervals)
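A sketch of the AND-ing step: overlaid caption text persists across half-second samples while the background scene changes, so intersecting binarized frames keeps stable text pixels and discards transient background detail. The brightness threshold is illustrative.

    import numpy as np

    def and_text_masks(gray_frames, thresh=180):
        # gray_frames: H x W uint8 frames sampled 1/2 s apart
        mask = None
        for frame in gray_frames:
            binary = frame > thresh            # candidate caption pixels
            mask = binary if mask is None else (mask & binary)
        return mask  # True only where text stayed put across all samples
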
18
VOCR Preprocessing Problems
19
Augmenting VOCR with Dictionary Look-up
20
Name-It: Face/Name Association
21
Named Entity Extraction
F. Kubala, R. Schwartz, R. Stone, and R. Weischedel,
"Named Entity Extraction from Speech," Proc. DARPA
Workshop on Broadcast News Understanding Systems,
Lansdowne, VA, February 1998.
"CNN national correspondent John Holliman is at
Hartsfield International Airport in Atlanta. Good
morning, John." "But there was one situation here at
Hartsfield where one airplane flying from Atlanta to
Newark, New Jersey yesterday had a mechanical problem
and it caused a backup that spread throughout the
whole system because even though there were a lot of
planes flying to the New York area from the Atlanta
area yesterday, ..."
Key: Place, Time, Organization/Person
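For illustration only, a toy gazetteer lookup that reproduces the tagging shape shown above; the cited BBN extractor uses trained statistical models, not a lookup table.

    GAZETTEER = {  # hypothetical lookup table, for illustration only
        "john holliman": "Person",
        "hartsfield international airport": "Place",
        "atlanta": "Place",
        "newark": "Place",
        "new jersey": "Place",
        "cnn": "Organization",
        "yesterday": "Time",
    }

    def tag_entities(text):
        # Marks the first occurrence of each known phrase in the transcript
        lower = text.lower()
        tags = []
        for phrase, label in GAZETTEER.items():
            start = lower.find(phrase)
            if start != -1:
                tags.append((text[start:start + len(phrase)], label))
        return tags
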
22
Enhancing Library Utility via Better Metadata
23
Improving the Interface via Usage Context
  • Example: query-based thumbnail selection
    (sketched below)

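A hedged sketch of query-based thumbnail selection, assuming a hypothetical data layout: rather than always showing a segment's first shot, show the shot whose aligned transcript text overlaps the user's query most.

    def pick_thumbnail(shots, query):
        # shots: list of dicts like {"frame": ..., "text": "..."}
        q = set(query.lower().split())
        def overlap(shot):
            return len(q & set(shot["text"].lower().split()))
        return max(shots, key=overlap)["frame"]
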
24
Improving Utility through End-User Control
  • Example: filtering a storyboard based on visual
    concepts, with the user controlling precision and
    recall (sketched below)

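A minimal sketch of that control, assuming each shot carries per-concept confidence scores (a hypothetical layout): the user's slider sets the cutoff, where a high cutoff favors precision and a low cutoff favors recall.

    def filter_storyboard(shots, concept, cutoff):
        # shots: list of dicts like {"id": 17, "scores": {"outdoor": 0.83}}
        return [s for s in shots if s["scores"].get(concept, 0.0) >= cutoff]

    shots = [{"id": 1, "scores": {"outdoor": 0.91}},
             {"id": 2, "scores": {"outdoor": 0.40}}]
    print(filter_storyboard(shots, "outdoor", 0.5))  # precision-leaning cutoff
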
25
Improving the Metadata via User Interaction
  • Example: collecting positive and implicit
    negative sets of labeled shot data for visual
    concepts (sketched below)
  • Reference: Ming-yu Chen et al., ACM Multimedia
    2005

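In the spirit of the slide (the actual method of Chen et al. differs in detail), a sketch of harvesting labels from interaction: shots the user explicitly selects become positives, and shots that were displayed but skipped become implicit negatives.

    def harvest_labels(displayed_ids, selected_ids):
        positives = set(selected_ids)
        implicit_negatives = set(displayed_ids) - positives
        return positives, implicit_negatives

    pos, neg = harvest_labels(displayed_ids=[101, 102, 103, 104],
                              selected_ids=[102])
    # pos == {102}; neg == {101, 103, 104}: both feed a
    # concept classifier's training set
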
26
User Involvement
  • User Correction: corrective action for metadata
    errors (analogous to Harry Shum's vision at
    Microsoft for human-assisted computer vision
    success)
  • User Control: driving the interface to overcome
    metadata errors
  • User Context: more useful interfaces driven
    implicitly by context

27
Video Summaries (without User Context)
  • The BBC rushes video summarization task in
    TRECVID 2007 and TRECVID 2008 shows the
    difficulty of the task
  • A video summary is a condensed version of some
    information, such that various judgments about
    the full information can be made using only the
    summary, with less time and effort than would be
    required using the full information source
  • Maximum 4% of video duration (2% in TRECVID
    2008); see the worked example below
  • Benefits of this TRECVID task: it provides a
    reasonably large video collection to be
    summarized, a uniform method of creating ground
    truth, and a uniform scoring mechanism

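A worked example of the cap: a 25-minute rush video allows at most 0.04 x 25 x 60 = 60 seconds of summary at the 4% limit (30 seconds at 2%). Below, a sketch of filling that budget greedily by a hypothetical per-shot novelty score.

    def build_summary(shots, video_secs, cap=0.04):
        # shots: dicts like {"start": 12.0, "secs": 3.5, "novelty": 0.8}
        budget = cap * video_secs              # e.g., 60 s for a 25-min video
        chosen, used = [], 0.0
        for shot in sorted(shots, key=lambda s: s["novelty"], reverse=True):
            if used + shot["secs"] <= budget:
                chosen.append(shot)
                used += shot["secs"]
        return sorted(chosen, key=lambda s: s["start"])  # restore story order
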
28
BBC Rushes
  • 42 test videos (plus development ones) from BBC
    Archive
  • Test videos:
  • minimum duration: 3.3 minutes
  • maximum duration: 36.4 minutes
  • mean duration: 25 minutes
  • Raw (unedited) rush video with a great deal of
    redundancy (repeated takes), mixed quality audio,
    junk frames

29
Video Summaries (with/without User Context)
  • BBC rushes video has no context to build from
  • However, users often provide cues as to what is
    important, as will be seen shortly

30
Storyboards: TRECVID Search Success
  • For the shot-based directed search information
    retrieval task evaluated at TRECVID, storyboards
    have consistently and overwhelmingly produced the
    best performance (see references in paper, e.g.,
    Snoek et al. 2007)
  • Motivated users can navigate through thousands of
    shot thumbnails in storyboards, better even than
    with extreme video retrieval interfaces: 2,487
    shots on average per 15-minute topic for TRECVID
    2006 (Christel/Yan, CIVR 2007)
  • Storyboard benefits: a packed visual overview,
    with only trivial interactive control needed for
    "overview, zoom and filter, details on demand"
    (Shneiderman's Visual Information-Seeking Mantra)

31
Beyond Fact-Finding
  • CACM April 2006 special issue on this topic
  • G. Marchionini ("Exploratory Search: From Finding
    to Understanding," CACM 49, April 2006) breaks
    down 3 types of search activities:
  • Lookup (fact-finding: solving a stated/understood
    need)
  • Learn
  • Investigate
  • Computer scientists and information retrieval
    specialists emphasize evaluation of lookup
    activities (NIST TREC)
  • Real-world interest in learn/investigate for an
    oral history collection: at a State Univ. of New
    York at Buffalo workshop, library science and
    humanities participants were quite interested in
    learn/investigate activities

32
Exploratory Search (Demonstrations)
  • Examples where storyboards are still useful:
    visual review, e.g., of disaster field footage
  • Where storyboards fail:
  • Showing other facets like time, space,
    co-occurrence, named entities (When did disasters
    occur? Which ones? Where?)
  • Providing collection understanding, a holistic
    view of what's in, say, 100s of segments or 1000s
    of matching shots
  • Providing a window into visually homogeneous
    results, e.g., results from a color search
    perhaps, or a corpus of just lecture slides, or
    head-and-shoulder interview shots
  • Claim: Storyboards are not sufficient, but are
    part of a useful suite of tools/interfaces for
    interactive video search

33
Anecdotal Support for Claim
  • Collected 2006-2007 from:
  • Government analysts with news data
  • History students and faculty with oral history
    data
  • Views tested:
  • Timeline
  • Visualization By Example (VIBE) Plot (query
    terms)
  • Map View
  • Named Entity view (people, places, organizations)
  • Text-dominant views
  • Nested Lists (pre-defined clusters by
    contributor)
  • Common Text (on-the-fly grouping of common
    phrases)

34
Anecdotal Results
  • 38 HistoryMakers corpus users (mostly students,
    15 female, average age 24): experienced web
    searchers, modest digital video experience
  • 6 intelligence analysts (1 female; 2 older than
    40, 3 in their 30s, 1 in their 20s): very
    experienced text searchers, experienced web
    searchers, novice video searchers
  • View use was minimal aside from Common Text
  • Text titling and text transcripts were used
    frequently
  • A bit of evidence for collection understanding
    (e.g., differences in topic between New York and
    Chicago), but overall, cautious use of default
    settings for initial trial(s)

35
Evaluation Hurdles
  • How does one evaluate information visualization
    for promoting exploratory video search?
  • Low-level simple tasks vs. complex real-world
    tasks
  • Traditional measures of effectiveness,
    efficiency, and satisfaction are problematic: is
    a fast interface for exploration good or bad?
  • HCI discount usability techniques offer some
    support, but ecological validity may limit the
    impact of conclusions (e.g., HCII students found
    Common Text well suited for History students)
  • Look to the field of Visual Analytics for help,
    e.g., Plaisant
  • "First hour with system" studies or "developer
    as user" insights are too limiting; rather,
    consider Multi-dimensional In-depth Long-term
    Case studies (MILC)

36
Concluding Points - 1
  • "Interactive" allows human direction to
    compensate for automation shortcomings and
    varying needs
  • Interactive fact-finding: better than automated
    fact-finding in visual shot retrieval (TRECVID)
  • Interactive computer vision: has successes (Harry
    Shum at Microsoft, Michael Brown et al. at NUS)
  • Interactive view/facet control: ??? (too early
    to tell)
  • Users need scaffolding/support to get started
  • Evaluations need to run longer term, in depth,
    with case studies to see what has benefit (MILC)
  • Keep track of facet-based interfaces, e.g., the
    Bungee View work by Mark Derthick (Carnegie
    Mellon University) on web-based faceted browsing
    of image/video resources

37
Concluding Points - 2
  • Storyboards work well for visual overview
  • Video surrogates can be made more effective,
    efficient, and satisfying when tailored to user
    activity (leverage context)
  • Interface should provide easy tuning of precision
    vs. recall
  • As cheap storage and transmission produce a
    wealth of digital video, exploratory search will
    gain emphasis for video repositories
  • Augment automatically produced metadata with
    human-provided descriptors (take advantage of
    what users are willing to volunteer, and in fact
    solicit additional feedback from humans through
    motivating games that allow for human
    computation, a research focus of Luis von Ahn at
    Carnegie Mellon University)

38
Credits
  • Many members of the Informedia Project, CMU
    research community, and The HistoryMakers
    contributed to this work, including:
  • Informedia Project Director: Howard Wactlar
  • The HistoryMakers Executive Director: Julieanna
    Richardson
  • HistoryMakers beta testers: Joe Trotter (CMU
    History Dept.), SUNY at Buffalo and all UB
    Workshop participants, Schomburg Center for
    Research in Black Culture (NY Public Library),
    Randforce Associates, University of Illinois (3
    campuses)
  • Informedia user interface: Ron Conescu, Neema
    Moraveji
  • Informedia processing: Alex Hauptmann, Ming-yu
    Chen, Wei-Hao Lin, Rong Yan, Jun Yang
  • Informedia library essentials: Bob Baron, Bryan
    Maher
  • This work was supported by the National Science
    Foundation under Grant Nos. IIS-0205219 and
    IIS-0705491