Title: Amplifying Video Information-Seeking Success through Rich, Exploratory Interfaces
Slide 1: Amplifying Video Information-Seeking Success through Rich, Exploratory Interfaces
Mike Christel (christel_at_cs.cmu.edu)
School of Computer Science, Carnegie Mellon University
KES-IIMSS, July 11, 2008
Slide 2: Talk Outline
- Creating metadata for video information sets
- Informedia demonstrations (oral history collection, news video collection)
- Types of search beyond fact-finding
- Exploratory search through multiple views
- Evaluation hurdles
- Discussion: now is a perfect opportunity for leveraging user involvement for better video information-seeking experiences
Slide 3: User Involvement
- User Correction: corrective action for metadata errors (analogous to Harry Shum's vision at Microsoft for human-assisted computer vision success)
- User Control: driving the interface to overcome metadata errors
- User Context: more useful interfaces driven implicitly by context
Slide 4: CMU Informedia Digital Video Research
- Details at http://www.informedia.cs.cmu.edu
- Speech recognition and alignment
- Image processing
- Named entity tagging
- Synchronized metadata for search and navigation
- Fast, direct video access to oral histories, news, etc.
- Demonstration oral history corpus: 913 hours of interviews from 400 individuals, 18,254 interview story segments (average story segment length of 3 minutes)
- Demonstration news corpus: TRECVID 2006 test set (165 hours of U.S., Arabic, and Chinese news with 79,484 reference shots)
Slide 5: Speech Recognition Functions
- Generates a transcript (if one is not given) to enable text-based retrieval from spoken-language documents
- Improves text synchronization to audio/video in the presence of scripts (align speech with text; see the sketch below)
- Supplies necessary information for library segmentation and multimedia abstractions (e.g., break stories apart at silence points rather than in the middle of sentences)
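To make the alignment function concrete, here is a minimal sketch, not the Informedia implementation (which aligns at the acoustic level using the recognizer itself): script words inherit timestamps from time-stamped recognizer output via word-level edit-distance alignment. The word lists and function name are hypothetical.

def align_script_to_asr(script_words, asr_words):
    # asr_words: list of (word, start_time) pairs from the recognizer
    n, m = len(script_words), len(asr_words)
    # dp[i][j] = edit cost of aligning first i script words to first j ASR words
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if script_words[i - 1].lower() == asr_words[j - 1][0].lower() else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Trace back, attaching an ASR timestamp to each exactly matched script word
    times, i, j = {}, n, m
    while i > 0 and j > 0:
        sub = 0 if script_words[i - 1].lower() == asr_words[j - 1][0].lower() else 1
        if dp[i][j] == dp[i - 1][j - 1] + sub:
            if sub == 0:
                times[i - 1] = asr_words[j - 1][1]
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return times  # script word index -> start time, for matched words only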
Slide 6: Speech Alignment Example
Slide 7: Image Understanding Functions
- Scene segmentation
- Similarity matching
- Camera motion determination and object tracking
- Optical Character Recognition (OCR) on video text and titles
- Face detection and recognition
- Ongoing research work in object identification and scene characterization, e.g., indoor/outdoor, road, building, etc.
Slide 8: Images containing similar colors
Slide 9: Images containing similar shapes
Slide 10: Images containing similar content
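To illustrate the "similar colors" matching of slide 8, a minimal sketch of color-histogram comparison by histogram intersection; the uint8 RGB numpy-array inputs and the 8-level quantization are assumptions, and Informedia's actual similarity features are richer.

import numpy as np

def color_histogram(img, bins=8):
    # Quantize each RGB channel into `bins` levels and build a joint histogram
    q = (img.astype(int) // (256 // bins)).reshape(-1, 3)
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()  # normalize so image size does not matter

def color_similarity(img_a, img_b):
    # Histogram intersection: 1.0 means identical color distributions
    return np.minimum(color_histogram(img_a), color_histogram(img_b)).sum()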
Slides 11-12: Goal: Automatic Video Characterization
[Figure: Yellowstone footage broken into shots, annotated with camera motion, objects, action, captions, and scenery]
Slide 13: Automated Video Processing
- Produces descriptive metadata for video libraries
- Metadata has more errors than metadata produced by careful, human-provided annotation
- Errors in metadata can be reduced:
  - By more computation-intensive algorithms
  - By taking advantage of video frame-to-frame redundancy (sketched below)
  - By folding in context, e.g., probable text sizes in video
  - By folding in extra sources of knowledge, e.g., a dictionary for cleaning up VOCR, or labeled data revealing patterns for named entity detection
  - By human review and correction, which can generate additional labeled data for machine learning
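As a concrete illustration of the frame-to-frame redundancy bullet, a minimal sketch: the same caption is visible across many frames, so per-frame OCR errors can be voted away character by character. The per-frame OCR strings are hypothetical.

from collections import Counter

def vote_ocr(readings):
    # readings: OCR results for the same caption from consecutive frames
    width = max(len(r) for r in readings)
    padded = [r.ljust(width) for r in readings]
    # Majority vote per character position across the frames
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*padded)).strip()

print(vote_ocr(["CNN NEVVS", "CNN NEWS", "CMN NEWS"]))  # -> "CNN NEWS"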
Slide 14: Camera and Motion Detection
[Figure: pan example vs. rightward object motion (not a pan left)]
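A minimal sketch of the pan-versus-object-motion distinction using dense optical flow from OpenCV (the library choice and thresholds are assumptions, not the Informedia method): a pan moves nearly the whole frame coherently, while object motion moves only a region.

import cv2
import numpy as np

def classify_horizontal_motion(prev_gray, next_gray, coherence=0.7):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    dx = flow[..., 0].ravel()
    moving = np.abs(dx) > 1.0  # pixels with noticeable horizontal motion
    if not moving.any():
        return "static"
    # A left pan shifts scene content rightward (positive dx) everywhere;
    # a rightward-moving object yields positive dx in only part of the frame
    direction = "pan left" if dx[moving].mean() > 0 else "pan right"
    return direction if moving.mean() > coherence else "object motion"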
Slide 15: Text and Face Detection
Slide 16: Video OCR Block Diagram
[Diagram: Video -> Text Area Detection -> Text Area Preprocessing -> Commercial OCR -> ASCII Text]
Slide 17: Video Frames, Filtered Frames, AND-ed Frames (1/2 s intervals)
[Figure: video frames sampled every 1/2 s, filtered, then AND-ed together to sharpen stable caption text]
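A minimal sketch of the AND-ing step (assuming grayscale numpy frames and bright caption text): overlaid text stays fixed across frames while the background changes, so a pixel-wise minimum over the filtered frames, an "AND" for grayscale, suppresses background clutter and keeps the stable text.

import numpy as np

def and_frames(filtered_frames):
    # filtered_frames: grayscale frames sampled at 1/2 s intervals
    stacked = np.stack(filtered_frames)  # shape: (n_frames, height, width)
    return stacked.min(axis=0)  # bright only where every frame is bright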
Slide 18: VOCR Preprocessing Problems
Slide 19: Augmenting VOCR with Dictionary Look-up
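A minimal sketch of the dictionary look-up idea: snap each recognized word to the closest dictionary entry when string similarity is high enough. The tiny dictionary and the 0.7 cutoff are illustrative assumptions.

import difflib

DICTIONARY = ["hartsfield", "international", "airport", "atlanta"]

def correct(word, cutoff=0.7):
    # Closest dictionary entry above the similarity cutoff, if any
    match = difflib.get_close_matches(word.lower(), DICTIONARY, n=1, cutoff=cutoff)
    return match[0] if match else word

print(correct("A1RP0RT"))  # VOCR confusions 1/I and 0/O -> "airport"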
Slide 20: Name-It Face/Name Association
Slide 21: Named Entity Extraction
F. Kubala, R. Schwartz, R. Stone, and R. Weischedel, "Named Entity Extraction from Speech," Proc. DARPA Workshop on Broadcast News Understanding Systems, Lansdowne, VA, February 1998.
Example passage: "CNN national correspondent John Holliman is at Hartsfield International Airport in Atlanta. Good morning, John." "But there was one situation here at Hartsfield where one airplane flying from Atlanta to Newark, New Jersey yesterday had a mechanical problem and it caused a backup that spread throughout the whole system because even though there were a lot of planes flying to the New York area from the Atlanta area yesterday, ..."
Key: Place, Time, Organization/Person
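As a present-day illustration of named entity tagging on the slide's excerpt, a minimal sketch using spaCy (one of many NER toolkits; the slide itself cites BBN's 1998 speech NER work, not spaCy):

import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with an NER component
text = ("CNN national correspondent John Holliman is at Hartsfield "
        "International Airport in Atlanta. Good morning, John.")
for ent in nlp(text).ents:
    print(ent.text, ent.label_)  # e.g., John Holliman PERSON, Atlanta GPE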
Slide 22: Enhancing Library Utility via Better Metadata
Slide 23: Improving the Interface via Usage Context
- Example: query-based thumbnail selection (sketched below)
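A minimal sketch of query-based thumbnail selection (the data structures are hypothetical): rather than a fixed keyframe, represent a segment by the shot whose transcript best matches the user's query terms.

def pick_thumbnail(segment_shots, query_terms):
    # segment_shots: list of (keyframe_id, transcript_text) pairs, one per shot
    def score(transcript):
        words = set(transcript.lower().split())
        return sum(term.lower() in words for term in query_terms)
    # Keyframe of the shot mentioning the most query terms
    return max(segment_shots, key=lambda shot: score(shot[1]))[0]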
Slide 24: Improving Utility through End-User Control
- Example: filtering a storyboard based on visual concepts, with the user controlling precision and recall (sketched below)
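A minimal sketch of that control (the score data are hypothetical): each shot carries confidence scores from automatic concept detectors, and a user-set threshold trades precision against recall: a strict threshold shows fewer, more reliable shots, while a lenient one shows more shots with more noise.

def filter_storyboard(shots, concept, threshold):
    # shots: list of dicts like {"id": 17, "scores": {"road": 0.82}}
    return [s for s in shots if s["scores"].get(concept, 0.0) >= threshold]

shots = [{"id": 1, "scores": {"road": 0.9}}, {"id": 2, "scores": {"road": 0.4}}]
print(filter_storyboard(shots, "road", 0.8))  # strict slider: only shot 1
print(filter_storyboard(shots, "road", 0.3))  # lenient slider: both shots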
Slide 25: Improving the Metadata via User Interaction
- Example: collecting positive and implicit negative sets of labeled shot data for visual concepts (sketched below)
- Reference: Ming-yu Chen et al., ACM Multimedia 2005
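A minimal sketch of harvesting such labels from interaction (the click-based policy here is an illustrative assumption, not necessarily the exact policy of Chen et al.): shots the user marks relevant become positives for a concept, and shots the user viewed but skipped become implicit negatives for detector retraining.

def harvest_labels(viewed_shots, clicked_shots):
    positives = set(clicked_shots)
    # Implicit negatives: shots shown to the user but never selected
    negatives = set(viewed_shots) - positives
    return positives, negatives

pos, neg = harvest_labels(viewed_shots=[1, 2, 3, 4], clicked_shots=[2, 4])
print(pos, neg)  # {2, 4} {1, 3}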
Slide 26: User Involvement
- User Correction: corrective action for metadata errors (analogous to Harry Shum's vision at Microsoft for human-assisted computer vision success)
- User Control: driving the interface to overcome metadata errors
- User Context: more useful interfaces driven implicitly by context
Slide 27: Video Summaries (without User Context)
- BBC rushes video summarization task in TRECVID 2007 and TRECVID 2008 shows the difficulty of the task
- A video summary is a condensed version of some information, such that various judgments about the full information can be made using only the summary, taking less time and effort than would be required using the full information source
- Maximum 4% of the original duration (2% in TRECVID 2008); see the budget sketch below
- Benefits of this TRECVID task: provides a reasonably large video collection to be summarized, a uniform method of creating ground truth, and a uniform scoring mechanism
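To make the duration budget concrete, a minimal baseline sketch (an assumption for illustration, not a TRECVID submission): keep evenly spaced one-second excerpts until the 2% budget is spent; real systems also remove repeated takes and junk frames.

def summary_excerpt_starts(video_seconds, budget_fraction=0.02, clip_len=1.0):
    n_clips = max(1, int(video_seconds * budget_fraction / clip_len))
    step = video_seconds / n_clips
    # Start times (in seconds) of the excerpts concatenated into the summary
    return [round(i * step, 1) for i in range(n_clips)]

print(summary_excerpt_starts(25 * 60))  # 25-minute rush -> 30 one-second excerpts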
Slide 28: BBC Rushes
- 42 test videos (plus development ones) from the BBC Archive
- Test videos:
  - minimum duration: 3.3 minutes
  - maximum duration: 36.4 minutes
  - mean duration: 25 minutes
- Raw (unedited) rushes video with a great deal of redundancy (repeated takes), mixed-quality audio, and junk frames
Slide 29: Video Summaries (with/without User Context)
- BBC rushes video has no context to build from
- However, users often provide cues as to what is important, as will be seen shortly
Slide 30: Storyboards: TRECVID Search Success
- For the shot-based directed search information retrieval task evaluated at TRECVID, storyboards have consistently and overwhelmingly produced the best performance (see references in paper, e.g., Snoek et al. 2007)
- Motivated users can navigate through thousands of shot thumbnails in storyboards, better even than with "extreme video retrieval" interfaces: 2487 shots on average per 15-minute topic for TRECVID 2006 (Christel/Yan, CIVR 2007)
- Storyboard benefits: packed visual overview, with only trivial interactive control needed for "overview, zoom and filter, details on demand" (Shneiderman's Visual Information-Seeking Mantra)
Slide 31: Beyond Fact-Finding
- CACM April 2006 special issue on this topic
- G. Marchionini ("Exploratory Search: From Finding to Understanding," CACM 49, April 2006) breaks down three types of search activities:
  - Lookup (fact-finding: solving a stated/understood need)
  - Learn
  - Investigate
- Computer scientists and information retrieval specialists emphasize evaluation of lookup activities (NIST TREC)
- Real-world interest in learn/investigate for an oral history collection: at a State Univ. of New York at Buffalo workshop, library science and humanities participants were quite interested in learn/investigate activities
Slide 32: Exploratory Search (Demonstrations)
- Examples where storyboards are still useful: visual review, e.g., of disaster field footage
- Where storyboards fail:
  - Showing other facets like time, space, co-occurrence, named entities (When did disasters occur? Which ones? Where?)
  - Providing collection understanding, a holistic view of what's in, say, 100s of segments or 1000s of matching shots
  - Providing a window into visually homogeneous results, e.g., results from a color search perhaps, or a corpus of just lecture slides, or head-and-shoulder interview shots
- Claim: storyboards are not sufficient, but are part of a useful suite of tools/interfaces for interactive video search
Slide 33: Anecdotal Support for Claim
- Collected 2006-2007 from:
  - Government analysts with news data
  - History students and faculty with oral history data
- Views tested:
  - Timeline
  - Visualization By Example (VIBE) plot (query terms)
  - Map View
  - Named Entity view (people, places, organizations)
  - Text-dominant views:
    - Nested Lists (pre-defined clusters by contributor)
    - Common Text (on-the-fly grouping of common phrases)
Slide 34: Anecdotal Results
- 38 HistoryMakers corpus users (mostly students, 15 female, average age 24): experienced web searchers with modest digital video experience
- 6 intelligence analysts (1 female; 2 older than 40, 3 in their 30s, 1 in their 20s): very experienced text searchers, experienced web searchers, novice video searchers
- View use was minimal aside from Common Text
- Text titling and text transcripts were used frequently
- A bit of evidence for collection understanding (e.g., differences in topic between New York and Chicago), but overall, cautious use of default settings for initial trial(s)
Slide 35: Evaluation Hurdles
- How does one evaluate information visualization for promoting exploratory video search?
- Low-level simple tasks vs. complex real-world tasks
- Even the traditional measures of effectiveness, efficiency, and satisfaction are problematic: is a fast interface for exploration good or bad?
- HCI discount usability techniques offer some support, but ecological validity may limit the impact of conclusions (e.g., HCII students found Common Text well suited for History students)
- Look to the field of Visual Analytics for help, e.g., Plaisant: "first hour with the system" studies or "developer as user" insights are too limiting; rather, consider Multi-dimensional In-depth Long-term Case studies (MILC)
Slide 36: Concluding Points - 1
- "Interactive" allows human direction to compensate for automation shortcomings and varying needs:
  - Interactive fact-finding is better than automated fact-finding in visual shot retrieval (TRECVID)
  - Interactive computer vision has successes (Harry Shum at Microsoft, Michael Brown et al. at NUS)
  - Interactive view/facet control: ??? (too early to tell)
- Users need scaffolding/support to get started
- Evaluations need to run longer term, in depth, with case studies to see what has benefit (MILC)
- Keep track of facet-based interfaces, e.g., the Bungee View work by Mark Derthick (Carnegie Mellon University) on faceted web browsing of image/video resources
Slide 37: Concluding Points - 2
- Storyboards work well for visual overview
- Video surrogates can be made more effective, efficient, and satisfying when tailored to user activity (leverage context)
- The interface should provide easy tuning of precision vs. recall
- As cheap storage and transmission produce a wealth of digital video, exploratory search will gain emphasis for video repositories
- Augment automatically produced metadata with human-provided descriptors: take advantage of what users are willing to volunteer, and in fact solicit additional feedback from humans through motivating games that allow for human computation (a research focus of Luis von Ahn at Carnegie Mellon University)
Slide 38: Credits
- Many members of the Informedia Project, the CMU research community, and The HistoryMakers contributed to this work, including:
  - Informedia Project Director: Howard Wactlar
  - The HistoryMakers Executive Director: Julieanna Richardson
  - HistoryMakers beta testers: Joe Trotter (CMU History Dept.), SUNY at Buffalo and all UB Workshop participants, Schomburg Center for Research in Black Culture, NY Public Library, Randforce Associates, University of Illinois (3 campuses)
  - Informedia user interface: Ron Conescu, Neema Moraveji
  - Informedia processing: Alex Hauptmann, Ming-yu Chen, Wei-Hao Lin, Rong Yan, Jun Yang
  - Informedia library essentials: Bob Baron, Bryan Maher
- This work was supported by the National Science Foundation under Grant Nos. IIS-0205219 and IIS-0705491