Multimedia Information Retrieval - PowerPoint PPT Presentation


Slides: 28
Provided by: padmamundu

Multimedia Information Retrieval
  • Unlike alphanumeric data, multimedia data do not
    have any semantic structure
  • Achieving symmetry between annotation and query
    is difficult
  • Retrieval is based on similarity between the
    query and stored information rather than on
    exact matching
  • Stored information is represented using indexing

IR Model
  • Information is preprocessed to extract features
    and semantic contents
  • Indexed based on these features and semantics
  • The user's query is processed and its main
    features are extracted
  • The query's features are then compared with the
    features or index of each information item in
    the database
  • Information items whose features are similar to
    those of the query are retrieved and presented
    to the user

Design Issues
  • Indexing
  • a mechanism that reduces the search space for a
    query without losing any relevant information
  • Similarity Computation
  • easy to compute and should conform to human
    judgment of similarity

Performance Measures
  • Retrieval speed, recall, precision
  • Recall measures the ability of retrieving
    relevant information items from the database
  • defined as the ratio between the number of
    retrieved relevant items and the total number of
    relevant items in the database
  • Precision measures retrieval accuracy
  • defined as the ratio between the number of
    retrieved relevant items and the number of total
    retrieved items
  • Recall and precision are usually considered
    together; a system may have
  • high recall and low precision
  • high precision and low recall
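A minimal Python sketch of the two ratios defined above, assuming items are identified by hypothetical IDs and the relevance judgments are given as a set. The two calls illustrate the trade-off: returning many items raises recall at the cost of precision, and returning few does the opposite.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: IDs of the items the system returned
    relevant:  IDs of all items in the database judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved relevant items
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical judgments: 4 relevant items in the database.
relevant = {"d1", "d2", "d3", "d4"}
# Returning many items: high recall, low precision.
print(recall_precision(["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"], relevant))  # (1.0, 0.5)
# Returning few items: high precision, low recall.
print(recall_precision(["d1"], relevant))  # (0.25, 1.0)
```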

Text Retrieval
  • Text may be used to annotate other media such as
    audio, images and video, and conventional IR
    techniques are then used to retrieve multimedia
    information
  • Boolean IR systems or text-pattern search systems
  • Substantial effort is spent in analyzing the
    contents of the documents and in generating
    keywords and indices
  • Boolean queries are keywords connected with
    logical operators (AND, OR, NOT)

File Structures
  • Flat files
  • Inverted files
  • for each term a separate index is constructed
    that stores the document identifiers for all
    documents containing the term
  • each term and the document IDs containing the
    term are organized into one row
  • searching and retrieval are fast because only
    the rows containing the query terms need to be
    retrieved, and there is no need to search the
    whole database
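The inverted file described above can be sketched in Python as a dictionary from each term to the IDs of the documents containing it, with a Boolean query (AND, OR, NOT) answered by set operations on just those rows. The function names, whitespace tokenization, and sample documents are hypothetical illustrations, not the presentation's own implementation.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each term to the sorted IDs of the documents that contain it."""
    rows = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            rows[term].add(doc_id)
    return {term: sorted(ids) for term, ids in rows.items()}

def boolean_query(index, all_ids, must=(), any_of=(), must_not=()):
    """Evaluate (m1 AND m2 ...) AND (a1 OR a2 ...) AND NOT (n1 OR n2 ...).

    Only the rows of the query terms are read; the documents themselves
    are never scanned.
    """
    result = set(all_ids)
    for term in must:                      # AND
        result &= set(index.get(term, []))
    if any_of:                             # OR
        result &= set().union(*(index.get(t, []) for t in any_of))
    for term in must_not:                  # NOT
        result -= set(index.get(term, []))
    return sorted(result)

docs = {
    "D1": "image retrieval by color",
    "D2": "text retrieval with boolean queries",
    "D3": "video indexing by color histogram",
}
index = build_inverted_file(docs)
print(index["color"])                                                        # ['D1', 'D3']
print(boolean_query(index, docs, must=["retrieval"], must_not=["boolean"]))  # ['D1']
print(boolean_query(index, docs, any_of=["video", "text"]))                  # ['D2', 'D3']
```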

  • Nearness parameters used in query specification
    help define the topic more precisely and
    therefore increase the probable relevance of the
    retrieved items
  • Within-sentence and adjacency specifications in
    queries
  • Term location information is included in the
    inverted file
  • term_i: (document id, paragraph no., sentence
    no., word no.)
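  • For example, suppose an inverted file has the
    following entries:
  • information: (R99, 10, 8, 3), (R155, 15, 3, 6),
    (R166, 2, 3, 1)
  • retrieval: (R77, 9, 7, 2), (R99, 10, 8, 4),
    (R166, 10, 2, 5)
  • then only R99 has "information" immediately
    followed by "retrieval" in the same sentence;
    R166 contains both terms, but in different
    paragraphs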
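Using the positional entries from the example (document id, paragraph no., sentence no., word no.), a within-sentence test can compare (document, paragraph, sentence) triples, and an adjacency test can check that the second term sits at the next word position. This is a Python sketch with hypothetical function names; with these entries both tests return only R99.

```python
# Postings: (document id, paragraph no., sentence no., word no.)
inverted_file = {
    "information": [("R99", 10, 8, 3), ("R155", 15, 3, 6), ("R166", 2, 3, 1)],
    "retrieval":   [("R77", 9, 7, 2), ("R99", 10, 8, 4), ("R166", 10, 2, 5)],
}

def within_sentence(index, t1, t2):
    """Documents in which t1 and t2 occur in the same sentence."""
    s1 = {(d, p, s) for d, p, s, _ in index.get(t1, [])}
    s2 = {(d, p, s) for d, p, s, _ in index.get(t2, [])}
    return sorted({d for d, _, _ in s1 & s2})

def adjacent(index, t1, t2):
    """Documents in which t2 immediately follows t1 in the same sentence."""
    pos2 = set(index.get(t2, []))
    return sorted({d for d, p, s, w in index.get(t1, []) if (d, p, s, w + 1) in pos2})

print(within_sentence(inverted_file, "information", "retrieval"))  # ['R99']
print(adjacent(inverted_file, "information", "retrieval"))         # ['R99']
```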

  • Stop words -- grammatical functional words, such
    as of, the, and a.
  • Stemming -- reducing words to a common root form
  • Thesaurus -- list of synonyms
  • Weighting -- term significance derived from
    occurrence frequency within a document and among
    different documents
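The slide does not give a weighting formula; tf-idf (term frequency in the document multiplied by the log of the inverse document frequency) is swapped in here as one common way to combine frequency within a document and across documents. This Python sketch also drops the stop words listed on the slide; stemming and thesaurus lookup are omitted, and the sample documents are hypothetical.

```python
import math
from collections import Counter

STOP_WORDS = {"of", "the", "and", "a"}  # the examples given on the slide

def tokenize(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def tf_idf(docs):
    """Weight = term frequency in the document * ln(N / document frequency)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    df = Counter()
    for terms in tokens.values():
        df.update(set(terms))  # count each term once per document
    weights = {}
    for doc_id, terms in tokens.items():
        tf = Counter(terms)
        weights[doc_id] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return weights

docs = {
    "D1": "the retrieval of images and the retrieval of video",
    "D2": "a survey of text retrieval",
    "D3": "retrieval of audio",
}
w = tf_idf(docs)
print(w["D1"]["retrieval"])         # 0.0: it occurs in every document, so it does not discriminate
print(round(w["D1"]["images"], 3))  # 1.099 = 1 * ln(3/1)
```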

Relevance Feedback
  • Query modification
  • terms occurring in documents previously
    identified as relevant are added to the original
    query or the weight of such terms is increased
  • terms occurring in documents previously
    identified as irrelevant are deleted from the
    query or the weight of such terms is reduced
  • Document modification
  • terms in the query, but not in the user-judged
    relevant documents, are added to the document
    index list with an initial weight
  • weights of index terms in the query and also in
    relevant documents are increased by a certain
    amount
  • weights of index terms not in the query but in
    the relevant documents are decreased by a certain
    amount
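The query-modification rules above are close in spirit to the Rocchio formula, which is swapped in here as one well-known concrete form; the slide itself does not specify how much weights change. This Python sketch uses sparse term-weight dictionaries, and the coefficients alpha = 1.0, beta = 0.75, gamma = 0.15 are conventional illustrative values, not values from the presentation. Document modification is not shown.

```python
from collections import defaultdict

def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Modify a query (term -> weight) using user-judged documents.

    Terms of relevant documents are added or boosted; terms of irrelevant
    documents are reduced, and any term whose weight drops to zero or
    below is deleted from the query.
    """
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_query[term] += beta * w / len(relevant)
    for doc in irrelevant:
        for term, w in doc.items():
            new_query[term] -= gamma * w / len(irrelevant)
    return {term: w for term, w in new_query.items() if w > 0}

query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}]
irrelevant = [{"jaguar": 1.0, "car": 1.0}]
print(rocchio(query, relevant, irrelevant))
# jaguar is boosted to about 1.6, cat is added with weight 0.75, car is dropped
```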

Problems with Annotation
  • Automatically generating descriptive keywords or
    extracting semantic information to build
    classification hierarchies for broad varieties of
    images is difficult
  • Involving human operators makes the process
    time-consuming and subjective
  • retrieval fails if the user forms a query based
    on key words not employed by the operator
  • retrieval fails if the query refers to elements
    of image content that were not described
  • certain visual properties, such as textures and
    shapes, are difficult or nearly impossible to
    describe with text for general-purpose use

Content-based IR
  • Retrieve visual data using queries based on the
    visual content of an image or video: patterns,
    colors, textures, shapes, layout, and location
  • useful when it is necessary to verify that a
    trademark or logo has not been used by another
    company
  • useful for comparing fabric patterns
  • Search is driven by first establishing one or
    more sample images and then identifying specific
    features of those sample images which need to
    match images from the database

Audio Search and Retrieval
  • Keywords can be highly subjective because of a
    different perspective or even a different
  • Audio is hard to browse directly since it must
    be auditioned in real time (unlike video, which
    can be keyframed)
  • Two categories: speech and non-speech
  • with speech, indexing and retrieval are based on
    obtaining the spoken words, either manually or
    by a speech recognition technique
  • with non-speech, indexing and retrieval may be
    based on text annotation (but will it help a
    query like find the first occurrence of the note

Image Database Issues
  • Selection, derivation, and computation of image
    features and objects that provide useful query
    functionality
  • Retrieval methods based on similarity, as opposed
    to exact matching
  • User interface that supports the visual
    expression of queries and allows query refinement
    and navigation of results
  • Indexing that is compatible with the
    expressiveness of the queries
  • A system architecture that supports this approach

Color Analysis
  • Color distribution is represented as a histogram
    of intensity values, each of whose bins
    corresponds to a range of pixel values
  • Histograms are compared by an intersection,
    $\sum_{j=1}^{N} \min(I_j, Q_j)$, where $I_j$ and
    $Q_j$ are the pixel counts in bin j of the image
    and query histograms
  • This sum may be interpreted as enumerating the
    number of pixels which are common to both
    histograms
  • This value may be normalized by the total number
    of pixels in one of the two histograms
  • Computationally expensive -- O(NM), where N is
    the number of histogram bins and M is the total
    number of images in the database
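A minimal NumPy sketch of histogram intersection as described above, normalized by the query's pixel count. It assumes 8-bit grayscale images and 16 equal-width bins (both hypothetical choices); ranking touches every bin of every stored image, which is the O(NM) cost. A real system would precompute the database histograms rather than recompute them per query.

```python
import numpy as np

def intensity_histogram(image, n_bins=16):
    """Histogram of 8-bit pixel values; each bin covers 256 / n_bins values."""
    hist, _ = np.histogram(image, bins=n_bins, range=(0, 256))
    return hist

def intersection(h_image, h_query):
    """Pixels common to both histograms, normalized by the query's pixel count."""
    return np.minimum(h_image, h_query).sum() / h_query.sum()

def rank(database, query, n_bins=16):
    """Score every stored image: N bins x M images, i.e. O(NM) work."""
    hq = intensity_histogram(query, n_bins)
    scores = {name: intersection(intensity_histogram(img, n_bins), hq)
              for name, img in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

query = np.full((8, 8), 200, dtype=np.uint8)
mixed = query.copy()
mixed[:4] = 20                      # top half dark, bottom half bright
database = {"same": query.copy(), "mixed": mixed,
            "dark": np.full((8, 8), 20, dtype=np.uint8)}
print(rank(database, query))        # same: 1.0, mixed: 0.5, dark: 0.0
```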

Color Analysis (contd.)
  • Reduce search time by reducing the number of
    histogram bins
  • transform RGB representation (coarse segmentation
    of color space)
  • apply clustering technique to determine K best
    colors in a given color space (clustering process
    takes into account the color distribution of
    images over the entire database)
  • since a small number of histogram bins tend to
    capture the majority of pixels of an image, only
    the largest bins in terms of pixel counts need be
    selected as the representation of any histogram;
    as long as the bins of the query and image
    histograms are appropriately matched, the
    intersection may be computed over this reduced
    set
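The reduced-bin idea can be sketched in Python by keeping only the k largest bins of each histogram and intersecting over the bins both kept. Here bins are "appropriately matched" simply by sharing a bin index, which is an assumption; k = 3 and the histogram values are hypothetical.

```python
import numpy as np

def top_bins(hist, k):
    """Keep only the k largest bins as {bin index: pixel count}."""
    order = np.argsort(hist)[::-1][:k]
    return {int(j): int(hist[j]) for j in order if hist[j] > 0}

def reduced_intersection(img_bins, query_bins, query_total):
    """Intersection computed only over bins retained in both reduced sets."""
    common = img_bins.keys() & query_bins.keys()
    return sum(min(img_bins[j], query_bins[j]) for j in common) / query_total

hist_q = np.array([50, 30, 10, 5, 3, 2])   # query histogram, 100 pixels
hist_i = np.array([40, 35, 0, 20, 5, 0])   # stored image histogram
full = np.minimum(hist_i, hist_q).sum() / hist_q.sum()
reduced = reduced_intersection(top_bins(hist_i, 3), top_bins(hist_q, 3), hist_q.sum())
print(full, reduced)  # 0.78 0.7
```

The reduced score is an approximation: the small bins that were discarded contributed the remaining 0.08.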

Color Analysis (contd.)
  • Disadvantages
  • histogram-based similarity computation lacks
    information about location (this problem may be
    solved by dividing an image into sub-areas and
    calculating a histogram for each of those
    sub-areas)
  • image representations in the image database, as
    well as queries, have to be the same
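The sub-area remedy for missing location information can be sketched in NumPy: split each grayscale image into a grid, histogram each cell, and average the normalized intersections of corresponding cells. The 2 x 2 grid and 8 bins are hypothetical choices. The two test images have identical global histograms but opposite layouts, so only the sub-area version separates them.

```python
import numpy as np

def local_histograms(image, grid=2, n_bins=8):
    """Split a grayscale image into grid x grid sub-areas and histogram each one."""
    h, w = image.shape
    hists = []
    for r in range(grid):
        for c in range(grid):
            block = image[r * h // grid:(r + 1) * h // grid,
                          c * w // grid:(c + 1) * w // grid]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, 256))
            hists.append(hist)
    return np.array(hists)  # shape (grid * grid, n_bins)

def local_similarity(img, query, grid=2, n_bins=8):
    """Mean normalized intersection over corresponding sub-areas."""
    h_img = local_histograms(img, grid, n_bins)
    h_q = local_histograms(query, grid, n_bins)
    return float(np.mean(np.minimum(h_img, h_q).sum(axis=1) / h_q.sum(axis=1)))

a = np.full((8, 8), 200, dtype=np.uint8)
a[:4] = 20                               # dark top, bright bottom
b = np.full((8, 8), 20, dtype=np.uint8)
b[:4] = 200                              # bright top, dark bottom
print(local_similarity(a, b, grid=1))    # 1.0: one global histogram cannot tell them apart
print(local_similarity(a, b, grid=2))    # 0.0: the sub-area histograms can
```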

Texture Analysis
  • Statistical methods are used to characterize
    texture in terms of the spatial distribution of
    image intensity
  • Tamura features
  • contrast -- quantified from the statistical
    distribution of pixel intensities
  • coarseness -- a measure of the granularity of
    the texture
  • directionality -- to compute this measure, a
    gradient vector is calculated at each pixel
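A NumPy sketch of two of the Tamura features. Contrast uses Tamura's definition, sigma divided by the fourth root of the kurtosis (mu4 / sigma^4). The directionality part is a simplification: it takes gradients with `np.gradient` (central differences) rather than Tamura's own operators, and returns only the histogram of strong-edge directions, without the peak-sharpness score Tamura derives from it. The threshold of 10 is a hypothetical choice, and coarseness is omitted.

```python
import numpy as np

def tamura_contrast(image):
    """Tamura contrast: sigma / kurtosis**(1/4) of the intensity distribution."""
    x = image.astype(np.float64).ravel()
    sigma2 = x.var()
    if sigma2 == 0:
        return 0.0  # a flat image has no contrast
    kurtosis = np.mean((x - x.mean()) ** 4) / sigma2 ** 2
    return float(np.sqrt(sigma2) / kurtosis ** 0.25)

def directionality_histogram(image, n_bins=8, threshold=10.0):
    """Histogram of gradient angles in [0, pi) at pixels with a strong gradient."""
    gy, gx = np.gradient(image.astype(np.float64))  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)       # fold opposite directions together
    hist, _ = np.histogram(angle[magnitude > threshold], bins=n_bins, range=(0, np.pi))
    return hist

flat = np.full((8, 8), 128, dtype=np.uint8)
binary = np.zeros((8, 8), dtype=np.uint8)
binary[:, 4:] = 255                         # half black, half white
print(tamura_contrast(flat), tamura_contrast(binary))  # 0.0 127.5

stripes = np.zeros((8, 8), dtype=np.uint8)
stripes[:, 2:4] = 255
stripes[:, 6:8] = 255                       # vertical stripes: every edge is vertical
print(directionality_histogram(stripes))    # every count falls in the first bin
```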

Shape Analysis
  • Histogram of significant edges
  • Ordered list of interest points
  • Chain-code-based shape representation and
    similarity measure

Chain Code-based Shape Analysis
  • Chain code
  • 4-directional
  • 8-directional
  • Grid spacing
  • Normalization process -- starting point,
    rotation, scale
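A Python sketch of generating a 4- or 8-directional chain code from a closed boundary. It assumes the boundary has already been resampled onto the chosen grid and is given as an ordered list of (x, y) grid points, and it numbers directions counterclockwise starting from east with the y axis pointing up; that numbering is a common convention, assumed here rather than taken from the slides.

```python
# Direction numbers, counterclockwise from east (y axis pointing up).
DIRS_4 = {(1, 0): 0, (0, 1): 1, (-1, 0): 2, (0, -1): 3}
DIRS_8 = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
          (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points, directions=DIRS_4):
    """Chain code of a closed boundary given as ordered grid points (x, y).

    Consecutive points, including last -> first, must be grid neighbours
    one unit apart; resampling onto the grid is assumed to be done already.
    """
    code = []
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        step = (x1 - x0, y1 - y0)
        if step not in directions:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not grid neighbours")
        code.append(directions[step])
    return "".join(map(str, code))

# A 2 x 1 rectangle traced counterclockwise from its lower-left corner.
rect = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
print(chain_code(rect))                                # 001223
print(chain_code([(0, 0), (1, 0), (0, 1)], DIRS_8))    # 036
```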

Starting Point Normalization
  • Treat the chain code generated by an arbitrary
    starting point as a circular sequence of
    direction numbers
  • Redefine the starting point such that the
    resulting sequence of numbers forms an integer of
    minimum magnitude
  • 0303332221211010 (arbitrary starting point)
  • 0030333222121101 (after normalizing)
  • After normalizing, the shape boundary has a
    unique chain code (for a fixed orientation and
    grid size)
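Starting-point normalization amounts to choosing the circular rotation of the code that reads as the smallest integer. This brute-force Python sketch tries every rotation (quadratic in the code length; linear-time methods such as Booth's algorithm exist) and reproduces the example above.

```python
def normalize_start(code):
    """Rotate a circular chain code so that it reads as the smallest integer."""
    # All rotations have the same length, so string order equals numeric order.
    return min(code[i:] + code[:i] for i in range(len(code)))

print(normalize_start("0303332221211010"))  # 0030333222121101
```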

Shape Number
  • Rotation normalization is needed because a
    boundary after rotation has a different chain
    code. Rotation changes the spatial relationships
    between the grid space and boundary.
  • First difference of the chain code reflects
    spatial relationships between boundary segments
    which are independent of rotation
  • The difference is computed by counting
    (counterclockwise) the number of directions that
    separate two adjacent elements of the code
  • Shape number of a boundary is defined as the
    first difference of the smallest magnitude
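A Python sketch of the first difference and the shape number for a 4-directional code. The code is treated circularly, so the first element compares the first digit with the last. The sample chain code 10103322 is illustrative; rotating that boundary by 90 degrees adds 1 (mod 4) to every digit, giving 21210033, and the shape number is unchanged.

```python
def first_difference(code, n_dirs=4):
    """Counterclockwise direction steps between adjacent elements, circularly.

    Element i is (code[i] - code[i-1]) mod n_dirs; element 0 uses the last digit.
    """
    c = [int(ch) for ch in code]
    return "".join(str((c[i] - c[i - 1]) % n_dirs) for i in range(len(c)))

def shape_number(code, n_dirs=4):
    """First difference rotated to its smallest magnitude."""
    d = first_difference(code, n_dirs)
    return min(d[i:] + d[:i] for i in range(len(d)))

print(first_difference("10103322"))  # 33133030
print(shape_number("10103322"))      # 03033133
print(shape_number("21210033"))      # 03033133 (same boundary rotated by 90 degrees)
```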

Unique Shape Number
  • Shape boundaries need to be made invariant to
    rotation and scale
  • Solution -- orient the resampling grid along the
    principal axis of the shape boundary. In this
    case, the grid and the boundary have fixed
    spatial relationships.
  • Major axis is defined as the line segment between
    two farthest points on the boundary. Minor axis
    is perpendicular to the major axis and its length
    is such that a rectangle formed by these axes
    will enclose the shape boundary.

Scale Normalization
  • Eccentricity of the boundary -- ratio of the
    lengths of the major and minor axes
  • Basic Rectangle -- rectangle formed by the major
    and the minor axes of a boundary
  • Shape number obtained using basic rectangle will
    be unique
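Following the definitions above, a Python sketch of the basic rectangle: the major axis joins the two boundary points farthest apart (found here by a brute-force search over all pairs), and the minor-axis length is the extent of the boundary measured perpendicular to it, so the rectangle encloses every point. The function name and the sample rhombus are hypothetical.

```python
import itertools
import math

def basic_rectangle(points):
    """Return (major-axis endpoints, major length, minor length, eccentricity)."""
    # Major axis: farthest pair of boundary points (O(n^2) pairs checked).
    p, q = max(itertools.combinations(points, 2), key=lambda pq: math.dist(*pq))
    major = math.dist(p, q)
    ux, uy = (q[0] - p[0]) / major, (q[1] - p[1]) / major  # unit vector along the major axis
    # Signed distance of each point from the major axis, along the perpendicular (-uy, ux).
    offsets = [(x - p[0]) * -uy + (y - p[1]) * ux for x, y in points]
    minor = max(offsets) - min(offsets)
    return (p, q), major, minor, major / minor

rhombus = [(0, 0), (3, 1), (6, 0), (3, -1)]
print(basic_rectangle(rhombus))  # (((0, 0), (6, 0)), 6.0, 2.0, 3.0)
```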

Unique Chain Code
  • Algorithm
  • select the first digit as any number within the
    chain code direction range, say 0
  • the second digit differs from the first digit by
    an amount determined by the first digit of the
    shape number
  • use the shape number to determine the rest of the
    digits in the unique chain code
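The algorithm above can be sketched in Python for a 4-directional code: fix the first digit at 0, then obtain each next digit by adding the next shape-number digit counterclockwise (mod 4). The last shape-number digit is not used to build a digit; it must lead from the last digit back to the first, which this sketch checks. The shape number 03033133 is an illustrative input.

```python
def unique_chain_code(shape_number, n_dirs=4, first=0):
    """Rebuild a chain code from a shape number.

    The first digit is fixed (0 by default); each next digit is the previous
    one plus the next shape-number digit, counted counterclockwise modulo
    n_dirs. The last shape-number digit must lead back to the first digit.
    """
    steps = [int(ch) for ch in shape_number]
    code = [first]
    for step in steps[:-1]:
        code.append((code[-1] + step) % n_dirs)
    if (code[0] - code[-1]) % n_dirs != steps[-1]:
        raise ValueError("digits do not describe a closed boundary")
    return "".join(map(str, code))

print(unique_chain_code("03033133"))  # 00332121
```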

Similarity Measurement
  • The distance d between two boundaries is defined
    as the number of grid cells covered by only one
    of the two boundaries
  • boundaries with the same unique chain code have
    distance 0
  • Obtain a binary number for each boundary (one
    bit per grid cell, set if the boundary covers it)
  • The exclusive OR of the binary numbers of the two
    boundaries is computed, and the number of 1s in
    the result is the distance d
  • Similarity is 1 - (d/N)
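A Python sketch of the XOR-based distance, encoding each boundary's covered grid cells (col, row) as the bits of one integer. The slide does not define N; this sketch assumes N is the total number of grid cells (width x height). The grid size and cell sets are hypothetical.

```python
def grid_bits(cells, width):
    """Binary number with one bit per grid cell, set if the boundary covers that cell."""
    bits = 0
    for col, row in cells:
        bits |= 1 << (row * width + col)
    return bits

def boundary_distance(cells_a, cells_b, width):
    """d = number of 1s in the exclusive OR of the two binary numbers."""
    return bin(grid_bits(cells_a, width) ^ grid_bits(cells_b, width)).count("1")

def boundary_similarity(cells_a, cells_b, width, height):
    """1 - d/N, taking N to be the total number of grid cells (an assumption)."""
    return 1 - boundary_distance(cells_a, cells_b, width) / (width * height)

a = {(0, 0), (1, 0), (0, 1), (1, 1)}
b = {(1, 0), (2, 0), (1, 1), (2, 1)}   # the same square shifted one cell right
print(boundary_distance(a, b, 4), boundary_similarity(a, b, 4, 4))  # 4 0.75
print(boundary_similarity(a, a, 4, 4))                              # 1.0
```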

Indexing and Retrieval of Video
  • Video is normally made up of a number of logical
    units or segments (video shots), each consisting
    of frames that
  • depict the same scene
  • result from a single camera operation
  • contain a distinct event or action (signifying
    the presence of the same object)
  • Consecutive frames on either side of a camera
    break generally display a significant
    quantitative change in the content (other camera
    operations such as dissolve, wipe, fade-in, and
    fade-out require sophisticated measures to
    quantify the change)
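A NumPy sketch of detecting camera breaks from the significant change between consecutive frames: the difference metric is the sum of absolute differences between the frames' intensity histograms, and a cut is declared wherever it exceeds a threshold. The 16 bins, the threshold of 64, and the synthetic frames are hypothetical choices, and gradual transitions (dissolve, wipe, fade) are not handled.

```python
import numpy as np

def histogram_difference(frame_a, frame_b, n_bins=16):
    """Sum of absolute bin differences between two grayscale frames' histograms."""
    ha, _ = np.histogram(frame_a, bins=n_bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=n_bins, range=(0, 256))
    return int(np.abs(ha - hb).sum())

def detect_cuts(frames, threshold):
    """Indices i where a cut is declared between frames[i - 1] and frames[i]."""
    return [i for i in range(1, len(frames))
            if histogram_difference(frames[i - 1], frames[i]) > threshold]

# Three dark frames followed by three bright frames (8 x 8 pixels each).
frames = [np.full((8, 8), v, dtype=np.uint8) for v in (20, 22, 24, 200, 202, 204)]
print(detect_cuts(frames, threshold=64))  # [3]
```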

Shot Detection
  • Difference metrics between frames are based on
    the comparison of pixel intensity histograms
  • Difference thresholds are chosen such that all
    boundaries are detected and false detections are
    minimized
  • Dealing with gradual changes requires
    sophisticated techniques
  • Indexing is done by finding a representative
    frame and features of this frame are extracted
    and indexed based on text, color, shape, and/or