1
Semantic Image Annotation and Retrieval
  • R. Manmatha
  • Center for Intelligent Information Retrieval
  • University of Massachusetts Amherst

2
Motivation
  • How can we retrieve images?
  • One approach: image retrieval based on similarity search of image features between a visual query and a database image.
  • Not very effective.
  • Doesn't lend itself to textual queries.
  • Another approach: manually annotate images using text and then use a text search engine.
  • Used by many libraries.
  • Expensive and tedious.
  • Aside: Google image search is based on textual cues such as the filename and surrounding text.
  • Example: gandhi.jpg may be a picture of Gandhi.

3
Automatic Image Annotation
  • Can we automatically annotate unseen images with
    keywords?
  • Given a training set of blobs and image
    annotations, learn a model and annotate a test
    set of images.
  • Example: the annotation for this picture would be "tiger, grass".
  • Question: do we need to recognize objects?

4
Object Recognition
  • A classic unsolved problem in computer vision.
  • Humans can do it easily.
  • How? We don't know.
  • Two main questions:
  • What is the object in a picture?
  • Where is it in the picture?
  • Annotation: enough to answer the first question. Answering the second question requires labeling image segments.
  • In some cases, as in face recognition, finding a face in an image is considered face detection, and finding the identity of the specific individual is considered face recognition.

5
Statistical Data-driven Approaches
  • Statistical data-driven approaches have been successful in many areas:
  • Information Retrieval, Machine Translation, Optical Character Recognition, Information Extraction, ...
  • Early work in vision focused on single images, pairs of images, or short video sequences.
  • One reason: computational cost.
  • Recent successful work in object detection and recognition uses large amounts of data for training and testing.

6
Object Detection/Recognition
  • Focused on a few specific objects like faces and cars.
  • Learn the joint probability for different regions forming the object.
  • Train on examples of the specific object.
  • Solve a two-class classification problem.
  • Features: wavelets, Gabor filters, ...
  • Examples:
  • Schneiderman and Kanade's face detector.
  • Requires training images of cut-out faces.
  • Works fairly well but still makes mistakes.
  • (Pictures from their webpage.)
  • Fergus, Perona and Zisserman, CVPR 2003.
  • Other objects like motorcycles, cars, ...
  • Training images of the object and background.
  • Learns a single class at a time.
  • (Picture from their webpage.)

7
Image Annotation
  • Describe an image using words or image features.
  • Like vocabularies in two different languages describing the same thing:
  • Visterms and words.
  • Visterms: e.g., segment the image, compute features over each region, and cluster the features into a discrete vocabulary.
  • Or partition the image into regions and compute a set of continuous features.
  • Translate and retrieve.
  • Analogous to cross-lingual retrieval.
  • Use a training set of images and captions and learn annotations.
  • Characteristics of most of these approaches:
  • Associate words (semantics) with pictures.
  • Context is important in images. Take advantage of the association of different regions in the image:
  • A tiger is associated with grass, not a computer.
  • Note: for text, performance on cross-lingual retrieval equals or exceeds mono-lingual retrieval.

8
Image Vocabulary
  • Images are segmented into semantic regions (Blobworld, Normalized-cuts algorithm).
  • Segments are clustered into a fixed number of "blobs" (visterms). Each blob is a word in the image vocabulary.
  • Any image can be represented by a small set of blobs (Duygulu et al., ECCV 2002); see the sketch after the figure below.

(Figure: example images, their segmented regions, and the blobs the regions map to.)
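A minimal sketch of the vocabulary-building step, assuming region features have already been extracted; the function name, the use of k-means, and the vocabulary size of 500 are illustrative, not the exact settings of Duygulu et al.:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_visterm_vocabulary(region_features, n_blobs=500):
        # region_features: one (n_regions, n_dims) array per training image.
        # Pool the feature vectors of all segmented regions across images.
        all_regions = np.vstack(region_features)
        kmeans = KMeans(n_clusters=n_blobs).fit(all_regions)
        # Each region is replaced by the index of its nearest cluster (its blob),
        # so an image becomes a small set of blob indices.
        return kmeans, [kmeans.predict(f) for f in region_features]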

9
Models
  • Co-occurrence Model (Mori et al.): compute the co-occurrence of visterms and words (see the sketch after this list).
  • Mean precision of 0.07.
  • Translation Model (Duygulu, Barnard, de Freitas and Forsyth): treat it as a problem of translating from the vocabulary of blobs (discrete visterms) to that of words. (Also try labeling regions.)
  • Mean precision of 0.14.
  • Cross Media Relevance Model (Jeon, Lavrenko, Manmatha): use a relevance model (a language-modeling approach). Discrete model.
  • Mean precision of 0.33.
  • Correlation Latent Dirichlet Allocation (Blei and Jordan): the model generates words and regions based on a latent factor. A direct comparison on the same dataset is not available. (Also try labeling regions.)
  • Our guess is that it is comparable to or slightly worse than the relevance model.
  • Continuous Relevance Model (Lavrenko, Manmatha, Jeon): relevance model with continuous features.
  • Mean precision of 0.6.
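A minimal sketch of the simplest of these, the co-occurrence model, assuming images are already represented as blob indices paired with caption words; the data layout and function name are illustrative:

    from collections import Counter

    def cooccurrence_model(training_set, n_blobs):
        # training_set: iterable of (blob_indices, caption_words) pairs, one per image.
        counts = {b: Counter() for b in range(n_blobs)}
        for blobs, words in training_set:
            for b in set(blobs):
                for w in words:
                    counts[b][w] += 1
        # Normalize the co-occurrence counts into conditional probabilities P(w | b).
        model = {}
        for b, counter in counts.items():
            total = sum(counter.values())
            if total:  # skip blobs that never co-occur with any word
                model[b] = {w: c / total for w, c in counter.items()}
        return model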

10
Cross Media Relevance Models
  • Goal: estimate a relevance model, the joint distribution P(w, b_1, ..., b_m) of words and visterms.
  • I.e., find the probability of observing word w together with visterms b_1, ..., b_m.
  • To annotate an image whose visterms are grass, tiger, water, and road:
  • Compute P(w | b_grass, b_tiger, b_water, b_road) for each word w.
  • If the top three probabilities are for the words grass, water, and tiger,
  • then annotate the image with grass, water, tiger.

(Figure: example image with regions labeled tiger, water, grass.)
11
Relevance Models
  • Annotation probability (the original slide shows two equivalent forms; a reconstruction follows).
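The equations on the original slide did not survive the transcript. A plausible reconstruction, based on the cited SIGIR '03 paper (the notation here is mine): rank words either by the conditional probability given the image's visterms, or, since the denominator does not depend on w, directly by the joint probability:

    P(w \mid b_1, \ldots, b_m) = \frac{P(w, b_1, \ldots, b_m)}{P(b_1, \ldots, b_m)}
    \quad \text{or, equivalently for ranking,} \quad
    P(w, b_1, \ldots, b_m)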

J. Jeon, V. Lavrenko and R. Manmatha, "Automatic Image Annotation and Retrieval Using Cross-Media Relevance Models," to appear in SIGIR 2003.
12
Training
  • The joint distribution is computed as an expectation over the images J in the training set (reconstructed below).
  • Given J, the events are independent.
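The equation itself is missing from the transcript; a reconstruction from the two assumptions above and the cited paper, with T denoting the training set:

    P(w, b_1, \ldots, b_m) = \sum_{J \in T} P(J) \, P(w \mid J) \prod_{i=1}^{m} P(b_i \mid J)

where P(w | J) and P(b_i | J) are smoothed maximum-likelihood estimates from image J's caption words and blobs.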

13
Annotation
  • Compute P(w | I) for different words w.
  • Probabilistic annotation:
  • Annotate the image with every possible w in the vocabulary, with associated probabilities.
  • Useful for retrieval but not for people.
  • Fixed-length annotation (see the sketch below):
  • For people, take the top (say 3 or 4) words for every image and annotate images with them.
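A minimal sketch of the fixed-length variant, assuming a scoring function p_word_given_image such as the relevance model above; the function and argument names are illustrative:

    import heapq

    def annotate(image_blobs, vocabulary, p_word_given_image, k=4):
        # Score every vocabulary word against the image's blobs, keep the top k.
        scores = {w: p_word_given_image(w, image_blobs) for w in vocabulary}
        return heapq.nlargest(k, scores, key=scores.get)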

14
Retrieval
  • Language modeling approach.
  • Given a query Q, the probability of drawing Q from image I is given below.
  • Or use the probabilistic annotation.
  • Rank images according to this probability.
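The equation on the original slide is missing from the transcript; a reconstruction under the usual language-modeling assumption that query words are drawn independently:

    P(Q \mid I) = \prod_{q \in Q} P(q \mid I)

where P(q | I) comes from the probabilistic annotation of image I.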

15
Samples - good
  • Annotation examples - CMRM
  • Retrieval examples: top 4 images, CMRM.

(Figures: retrieval results for the queries "tiger" and "pillar".)