Information Retrieval - PowerPoint PPT Presentation

About This Presentation
Title:

Information Retrieval

Description:

Information Retrieval Non-Textual Materials 2 – PowerPoint PPT presentation

Slides: 21
Provided by: wya49
Transcript and Presenter's Notes

1
Information Retrieval
Non-Textual Materials 2
2
Automatic Creation of Surrogates for Non-textual
Materials
Discovery of non-textual materials usually requires surrogates.
How far can these surrogates be created automatically?
Automatically created surrogates are much less expensive than
manually created ones, but have high error rates.
If surrogates have high rates of error, is it possible to have
effective information discovery?
3
Example: Informedia Digital Video Library
Collections: Segments of video programs, e.g., TV and radio news
and documentary broadcasts. Cable News Network, British Open
University, WQED television.
Segmentation: Automatically broken into short segments of video,
such as the individual items in a news broadcast.
Size: More than 4,000 hours, 2 terabytes.
Objective: Research into automatic methods for organizing and
retrieving information from video.
Funding: NSF, DARPA, NASA and others.
Principal investigator: Howard Wactlar (Carnegie Mellon
University).
4
Informedia Digital Video Library
History
Carnegie Mellon has broad research programs in speech
recognition, image recognition, and natural language processing.
1994: A basic mock-up demonstrated the general concept of a
system that uses speech recognition to build an index from a
sound track, matched against spoken queries. (DARPA funded.)
1994-1998: Informedia developed the concept of multi-modal
information discovery with a series of user interface
experiments. (NSF/DARPA/NASA Digital Libraries Initiative.)
1998 onward: Continued research and a commercial spin-off (which
failed).
5
The Challenge
A video sequence is awkward for information discovery:
  • Textual methods of information retrieval cannot be applied.
  • Browsing requires the user to view the sequence. Fast
    skimming is difficult.
  • Computing requirements are demanding (MPEG-1 requires 1.2
    Mbits/sec).
Surrogates are required.
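As a rough check on the computing requirements, the sketch below works out the storage needed for the 4,000 hours quoted for the collection at the MPEG-1 rate of 1.2 Mbits/sec. It assumes a constant bit rate and decimal units (1 Mbit = 1,000,000 bits, 1 TB = 10^12 bytes); the function name is made up for illustration.

```python
def video_storage_bytes(hours: float, mbits_per_sec: float) -> float:
    """Bytes needed to store `hours` of video at a constant bit rate."""
    seconds = hours * 3600
    bits = seconds * mbits_per_sec * 1_000_000  # assumption: decimal megabits
    return bits / 8


collection_bytes = video_storage_bytes(hours=4000, mbits_per_sec=1.2)
collection_terabytes = collection_bytes / 1e12  # assumption: decimal terabytes
print(f"{collection_terabytes:.2f} TB")
```

Under these assumptions the result is about 2.16 TB, which is in line with the 2 terabytes stated for the collection.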
6
Multi-Modal Information Discovery
The multi-modal approach to information retrieval:
  • Computer programs analyze video materials for clues, e.g.,
    changes of scene.
  • Methods come from artificial intelligence, e.g., speech
    recognition, natural language processing, image recognition.
  • Analysis covers the video track, sound track, closed
    captioning if present, and any other information.
  • Each mode gives imperfect information. Therefore use many
    approaches and combine the evidence.
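One simple way to combine evidence from several imperfect modes is a weighted sum of per-mode relevance scores. The sketch below is illustrative only: the slides do not say how Informedia combines evidence, and the mode names, weights, and scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-mode weights; the slides do not give any.
MODE_WEIGHTS = {"speech": 0.5, "captions": 0.3, "video_ocr": 0.2}


def combine_evidence(mode_scores, weights=MODE_WEIGHTS):
    """Fuse per-mode relevance scores into one ranked list.

    mode_scores maps a mode name to {segment_id: score in [0, 1]}.
    A segment missing from a mode contributes nothing for that mode.
    Returns (segment_id, combined_score) pairs, best first.
    """
    combined = defaultdict(float)
    for mode, scores in mode_scores.items():
        weight = weights.get(mode, 0.0)
        for segment_id, score in scores.items():
            combined[segment_id] += weight * score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


scores = {
    "speech": {"seg1": 0.4, "seg2": 0.9},
    "captions": {"seg1": 0.8},
    "video_ocr": {"seg1": 1.0, "seg3": 0.5},
}
ranking = combine_evidence(scores)
print(ranking)
```

Here seg1 has only a weak speech match but is supported by captions and on-screen text, so it ranks above seg2, which has a strong match in a single mode.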

7
Informedia Library Creation
[Diagram. Labels: Audio, Text, Video, Segmentation]
8
User
Querying via natural language
Browsing via multimedia surrogates
Requested segments and metadata
9
Text Extraction
Source
  • Sound track: Automatic speech recognition using Sphinx II and
    III recognition systems. (Unrestricted vocabulary, speaker
    independent, multi-lingual, background sounds.) Error rates
    25% and up.
  • Closed captions: Digitally encoded text. (Not on all video.
    Often inaccurate.)
  • Text on screen: Can be extracted by image recognition and
    optical character recognition. (Matches speaker with name.)
Query
  • Spoken query: Automatic speech recognition using the same
    system as is used to index the sound track.
  • Typed by user
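Error rates for speech recognition are conventionally reported as word error rate: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the length of the reference. A minimal sketch, assuming whitespace tokenization and case-insensitive matching (the slides do not say how the quoted rates were computed):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


wer = word_error_rate(
    "the president spoke to congress today",
    "the present spoke congress today",
)
print(f"{wer:.1%}")
```

The example has one substitution and one deletion against a six-word reference, so the rate is 2/6, about 33%.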
10
Image Understanding
Informedia has developed specialized tools for various aspects
of image understanding:
  • scene break detection
  • segmentation
  • icon selection
  • image similarity matching
  • camera motion and object tracking
  • video-OCR (recognize text on screen)
  • face detection and association

11
Multimodal Metadata Extraction
12
An Evaluation Experiment
Test corpus: 602 news stories from CNN, etc. Average length 672
words. Manually transcribed to obtain accurate text.
Speech recognition of text using Sphinx II (50.7% error rate).
Errors introduced artificially to give error rates from 0% to
80%.
Relative precision and recall (using a vector ranking) were used
as measures of retrieval performance.
As word error rate increased from 0% to 50%:
  • Relative precision fell from 80% to 65%
  • Relative recall fell from 90% to 80%
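The slides do not say how the artificial errors were introduced. As a hedged illustration of the idea, the sketch below corrupts an accurate transcript to a chosen error rate by substituting randomly selected words. It assumes substitution errors only (a real recognizer also inserts and deletes words); the function name and replacement vocabulary are hypothetical.

```python
import random


def corrupt_transcript(words, error_rate, vocabulary, seed=0):
    """Substitute round(error_rate * len(words)) words at random positions.

    Each replacement is drawn from `vocabulary` and always differs from
    the word it replaces, so the substitution count is exact.
    `vocabulary` must contain at least two distinct words.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    n_errors = round(error_rate * len(words))
    positions = set(rng.sample(range(len(words)), n_errors))
    corrupted = []
    for index, word in enumerate(words):
        if index in positions:
            choices = [w for w in vocabulary if w != word]
            corrupted.append(rng.choice(choices))
        else:
            corrupted.append(word)
    return corrupted


clean = "a b c d e f g h i j".split()
noisy = corrupt_transcript(clean, error_rate=0.5, vocabulary=["x", "y"])
n_changed = sum(1 for c, n in zip(clean, noisy) if c != n)
print(noisy, n_changed)
```

Indexing the corrupted transcripts at several error rates and running the same queries against each gives a retrieval-versus-error-rate curve of the kind summarized above.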
13
Speech recognition and retrieval performance
14
User Interface Concepts
Users need a variety of ways to search and browse, depending on
the task being carried out and preferred style of working:
  • Visual icons
  • One-line headlines
  • Film strip views
  • Video skims
  • Transcript following of audio track
  • Collages
  • Semantic zooming
  • Results set
  • Named faces
  • Skimming
15
(No Transcript)
16
Thumbnails, Filmstrips and Video Skims
Thumbnail: A single image that illustrates the content of a
video.
Filmstrip: A sequence of thumbnails that illustrate the flow of
a video segment.
Video skim: A short video that summarizes the contents of a
longer sequence, by combining shorter sequences of video and
sound that provide an overview of the full sequence.
17
Creating a Filmstrip
Separate video sequence into shots: Use techniques from image
recognition to identify dramatic changes in scene. Frames with
similar color characteristics are assumed to be part of a single
shot.
Choose a sample frame: Default is to select the middle frame
from the shot. If there is camera motion, select the frame where
the motion ends.
User feedback: Frames are tied to time sequence.
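The first two steps can be sketched with coarse color histograms: a large histogram change between consecutive frames is treated as a shot boundary, and the middle frame of each shot becomes its thumbnail. This is a minimal sketch, not Informedia's method. Frames are assumed to be plain lists of (r, g, b) pixels, the bin count and threshold are arbitrary, and the camera-motion rule is omitted.

```python
from collections import Counter


def color_histogram(frame, bins_per_channel=4):
    """Normalized coarse RGB histogram of a frame (list of (r, g, b))."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in frame)
    total = len(frame)
    return {bucket: n / total for bucket, n in counts.items()}


def histogram_distance(h1, h2):
    """Half the L1 distance: 0.0 if identical, 1.0 if disjoint."""
    buckets = set(h1) | set(h2)
    return 0.5 * sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in buckets)


def split_into_shots(frames, threshold=0.5):
    """Return (start, end) frame index pairs, end exclusive, per shot."""
    if not frames:
        return []
    shots, start = [], 0
    previous = color_histogram(frames[0])
    for index in range(1, len(frames)):
        current = color_histogram(frames[index])
        if histogram_distance(previous, current) > threshold:
            shots.append((start, index))  # dramatic change: close the shot
            start = index
        previous = current
    shots.append((start, len(frames)))
    return shots


def middle_frames(shots):
    """Index of the middle frame of each shot (lower middle if even)."""
    return [(start + end - 1) // 2 for start, end in shots]


red = [(255, 0, 0)] * 4   # tiny all-red frame
blue = [(0, 0, 255)] * 4  # tiny all-blue frame
frames = [red, red, red, blue, blue]
shots = split_into_shots(frames)
keyframes = middle_frames(shots)
print(shots, keyframes)
```

Because each thumbnail is a frame index, it stays tied to a position in the time sequence, so selecting it can start playback at that point.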
18
Creating Video Skims
Static
  • Precomputed based on video and audio phrases
  • Fixed compression, e.g., one minute skim of 10 minute
    sequence
Dynamic
  • After a query, skim is created to emphasize context of the
    hit
  • Variable compression selected by user
  • Adjustable during playback
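A static skim with fixed compression can be thought of as choosing phrases until a time budget is filled. The sketch below is one possible reading, not the Informedia algorithm: it assumes each phrase already has a hypothetical importance score and greedily keeps the highest-scoring phrases that fit into the source length divided by the compression factor.

```python
def build_static_skim(phrases, source_seconds, compression=10):
    """Select phrases for a skim of about source_seconds / compression.

    phrases: list of (start_sec, end_sec, importance) tuples, where
    importance is a hypothetical precomputed score.
    Returns the chosen (start_sec, end_sec) pairs in playback order.
    """
    budget = source_seconds / compression
    chosen, used = [], 0.0
    ranked = sorted(phrases, key=lambda phrase: phrase[2], reverse=True)
    for start, end, _importance in ranked:
        length = end - start
        if used + length <= budget:  # skip phrases that would overrun
            chosen.append((start, end))
            used += length
    return sorted(chosen)


phrases = [
    (0, 20, 0.9),
    (100, 130, 0.8),
    (200, 230, 0.7),
    (400, 410, 0.6),
]
skim = build_static_skim(phrases, source_seconds=600, compression=10)
skim_seconds = sum(end - start for start, end in skim)
print(skim, skim_seconds)
```

For the 10 minute example the budget is 60 seconds. A dynamic skim could reuse the same selection with scores boosted near the query hit and a user-chosen compression value.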
19
Limits to Scalability
Informedia has demonstrated effective information discovery with
moderately large collections.
Problems with increased scale:
  • Technical: storage, bandwidth, etc.
  • Diversity of content: difficult to tune heuristics
  • User interfaces: complexity of browsing grows with scale
20
Lessons Learned
  • Searching and browsing must be considered integrated parts
    of a single information discovery process.
  • Data (content and metadata), computing systems (e.g., search
    engines), and user interfaces must be designed together.
  • Multi-modal methods compensate for incomplete or error-prone
    data.