Information Retrieval - PowerPoint PPT Presentation



Slides: 21
Provided by: wya49



Information Retrieval
Non-Textual Materials 2
Automatic Creation of Surrogates for Non-textual Materials
  • Discovery of non-textual materials usually requires surrogates.
  • How far can these surrogates be created automatically?
  • Automatically created surrogates are much less expensive than
    manually created ones, but have high error rates.
  • If surrogates have high rates of error, is it possible to have
    effective information discovery?
Example: Informedia Digital Video Library
  • Collections: Segments of video programs, e.g., TV and radio news
    and documentary broadcasts. Sources include Cable News Network,
    British Open University, and WQED television.
  • Segmentation: Automatically broken into short segments of video,
    such as the individual items in a news broadcast.
  • Size: More than 4,000 hours, 2 terabytes.
  • Objective: Research into automatic methods for organizing and
    retrieving information from video.
  • Funding: NSF, DARPA, NASA and others.
  • Principal investigator: Howard Wactlar (Carnegie Mellon)
Informedia Digital Video Library
History
  • Carnegie Mellon has broad research programs in speech
    recognition, image recognition, and natural language processing.
  • 1994: A basic mock-up demonstrated the general concept of a
    system using speech recognition to build an index from a sound
    track matched against spoken queries. (DARPA funded.)
  • 1994-1998: Informedia developed the concept of multi-modal
    information discovery with a series of user interface
    experiments. (NSF/DARPA/NASA Digital Libraries Initiative.)
  • 1998 onward: Continued research and a commercial spin-off
    (which failed).
The Challenge
A video sequence is awkward for information discovery:
  • Textual methods of information retrieval cannot be applied.
  • Browsing requires the user to view the sequence. Fast skimming
    is difficult.
  • Computing requirements are demanding (MPEG-1 requires 1.2
    Mbits/sec).
Surrogates are required.
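As a rough check on the computing requirements, the MPEG-1 bit rate above can be combined with the collection size quoted for Informedia (more than 4,000 hours). The sketch below assumes a constant 1.2 Mbits/sec across the whole collection and decimal terabytes; the function name is illustrative only.

```python
def mpeg1_storage_terabytes(hours, mbits_per_sec=1.2):
    """Estimate storage for video at a constant bit rate, in decimal TB."""
    seconds = hours * 3600
    total_bits = seconds * mbits_per_sec * 1_000_000
    total_bytes = total_bits / 8
    return total_bytes / 1e12


# 4,000 hours at 1.2 Mbits/sec
print(round(mpeg1_storage_terabytes(4000), 2))
```

Under these assumptions the estimate comes out at about 2.16 terabytes, which is consistent with the 2 terabyte figure given for the collection.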
Multi-Modal Information Discovery
  • The multi-modal approach to information retrieval uses computer
    programs to analyze video materials, e.g., for changes of scene.
  • It draws on methods from artificial intelligence, e.g.,
    speech recognition, natural language processing,
    image recognition.
  • It analyzes the video track, sound track, closed
    captioning if present, and any other information.
  • Each mode gives imperfect information. Therefore use
    many approaches and combine the evidence.
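One simple way to combine imperfect evidence from several modes is a weighted average of per-mode relevance scores. This is only a sketch under stated assumptions: the mode names, weights, and scores are invented for illustration, and this is not presented as Informedia's actual method.

```python
def combine_evidence(mode_scores, weights):
    """Weighted average of per-mode relevance scores in [0, 1].

    Modes missing from mode_scores are skipped, so a segment without
    closed captions is not penalized for lacking them.
    """
    total = 0.0
    weight_sum = 0.0
    for mode, score in mode_scores.items():
        w = weights.get(mode, 0.0)
        total += w * score
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0


# Hypothetical weights and scores for one video segment.
weights = {"speech": 0.5, "captions": 0.3, "video_ocr": 0.2}
segment = {"speech": 0.6, "video_ocr": 0.9}
print(combine_evidence(segment, weights))
```

Normalizing by the weights of the modes actually present keeps scores comparable between segments that have different modes available.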

Informedia Library Creation
(Diagram labels: querying via natural language; browsing via
multimedia surrogates; requested segments and metadata.)
Text Extraction
  • Source, sound track: Automatic speech recognition using Sphinx II
    and III recognition systems. (Unrestricted vocabulary, speaker
    independent, multi-lingual, background sounds.) Error rates of
    25% and up.
  • Source, closed captions: Digitally encoded text. (Not on all
    video. Often inaccurate.)
  • Source, text on screen: Can be extracted by image recognition
    and optical character recognition. (Matches speaker with name.)
  • Query, spoken: Automatic speech recognition using the same
    system as is used to index the sound track.
  • Query, typed by user.
Image Understanding
Informedia has developed specialized tools for various aspects of
image understanding:
  • scene break detection
  • segmentation
  • icon selection
  • image similarity matching
  • camera motion and object tracking
  • video-OCR (recognize text on screen)
  • face detection and association

Multimodal Metadata Extraction
An Evaluation Experiment
  • Test corpus: 602 news stories from CNN, etc. Average length 672
    words.
  • Manually transcribed to obtain accurate text.
  • Speech recognition of text using Sphinx II (50.7% error rate).
  • Errors introduced artificially to give error rates from 0% to
    80%.
  • Relative precision and recall (using a vector ranking) were
    used as measures of retrieval performance.
  • As word error rate increased from 0% to 50%, relative precision
    fell from 80% to 65% and relative recall fell from 90% to 80%.
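The slides do not define word error rate, so the sketch below assumes the usual definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref) if ref else 0.0


print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions are counted, this measure can exceed 1.0 when the recognizer outputs many more words than the reference contains.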
Speech recognition and retrieval performance
User Interface Concepts
Users need a variety of ways to search and browse, depending on the
task being carried out and preferred style of working:
  • Visual icons
  • one-line headlines
  • film strip views
  • video skims
  • transcript following of audio track
  • Collages
  • Semantic zooming
  • Results set
  • Named faces
Thumbnails, Filmstrips and Video Skims
  • Thumbnail: A single image that illustrates the content of a
    video.
  • Filmstrip: A sequence of thumbnails that illustrate the flow of
    a video segment.
  • Video skim: A short video that summarizes the contents of a
    longer sequence, by combining shorter sequences of video and
    sound that provide an overview of the full sequence.
Creating a Filmstrip
  • Separate video sequence into shots: Use techniques from image
    recognition to identify dramatic changes in scene. Frames with
    similar color characteristics are assumed to be part of a
    single shot.
  • Choose a sample frame: Default is to select the middle frame
    from the shot. If camera motion, select frame where motion
    ends.
  • User feedback: Frames are tied to time
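The first two filmstrip steps can be sketched as follows. This is a minimal illustration, not Informedia's implementation: it assumes each frame is a non-empty list of (r, g, b) pixels, uses a coarse color histogram with an arbitrary threshold to detect shot boundaries, and applies only the default middle-frame rule (the camera-motion rule is not covered).

```python
def color_histogram(frame, bins=4):
    """Normalized histogram over coarse RGB bins; frame is a list of (r, g, b)."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    n = len(frame)
    return [h / n for h in hist]


def split_into_shots(frames, threshold=0.5):
    """Return (start, end) frame index pairs, one per shot."""
    if not frames:
        return []
    shots = []
    start = 0
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        curr = color_histogram(frames[i])
        # L1 distance between normalized histograms lies in [0, 2].
        diff = sum(abs(a - b) for a, b in zip(prev, curr))
        if diff > threshold:  # large color change: assume a new shot
            shots.append((start, i - 1))
            start = i
        prev = curr
    shots.append((start, len(frames) - 1))
    return shots


def filmstrip(frames, threshold=0.5):
    """Pick the middle frame index of each shot (the default rule)."""
    return [(s + e) // 2 for s, e in split_into_shots(frames, threshold)]


# Toy input: five all-red frames followed by four all-blue frames.
red = [(250, 10, 10)] * 16
blue = [(10, 10, 250)] * 16
frames = [red] * 5 + [blue] * 4
print(filmstrip(frames))
```

Comparing each frame only with its predecessor catches abrupt cuts; gradual transitions such as fades would need a different test.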
Creating Video Skims
  • Static: Precomputed based on video and audio phrases. Fixed
    compression, e.g., one minute skim of 10 minute sequence.
  • Dynamic: After a query, skim is created to emphasize context of
    the hit. Variable compression selected by user. Adjustable
    during playback.
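A static skim with fixed compression could be approximated by choosing phrases until a time budget is filled. The sketch below is an assumption-laden illustration: the per-phrase importance score, the greedy selection, and the function name are all hypothetical, since the slides do not say how phrases are chosen.

```python
def static_skim(phrases, compression=10):
    """Select phrases for a skim of about 1/compression of the total length.

    phrases: list of (start_sec, end_sec, score) tuples, where score is a
    hypothetical importance value. Returns the chosen phrases in time order.
    """
    total = sum(end - start for start, end, _ in phrases)
    budget = total / compression
    chosen = []
    used = 0.0
    # Greedy: take the highest-scoring phrases that still fit the budget.
    for start, end, score in sorted(phrases, key=lambda p: p[2], reverse=True):
        length = end - start
        if used + length <= budget:
            chosen.append((start, end, score))
            used += length
    return sorted(chosen)


# Four 30-second phrases compressed 2:1 into a 60-second skim.
phrases = [(0, 30, 0.2), (30, 60, 0.9), (60, 90, 0.5), (90, 120, 0.7)]
print(static_skim(phrases, compression=2))
```

A dynamic skim could reuse the same selection with scores recomputed from the query and a compression value chosen by the user.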
Limits to Scalability
Informedia has demonstrated effective information discovery with
moderately large collections.
Problems with increased scale:
  • Technical: storage, bandwidth, etc.
  • Diversity of content: difficult to tune heuristics
  • User interfaces: complexity of browsing grows with scale
Lessons Learned
  • Searching and browsing must be considered integrated parts of a
    single information discovery process.
  • Data (content and metadata), computing systems (e.g., search
    engines), and user interfaces must be designed together.
  • Multi-modal methods compensate for incomplete or error-prone
    data.