1
Semantic Image Annotation and Retrieval
  • R. Manmatha
  • Center for Intelligent Information Retrieval
  • University of Massachusetts Amherst

2
Motivation
  • How can we retrieve images?
  • One approach: image retrieval based on similarity search of image features between a visual query and a database image.
  • Not very effective.
  • Doesn't lend itself to textual queries.
  • Another approach: manually annotate images using text and then use a text search engine.
  • Used by many libraries.
  • Expensive and tedious.
  • Aside: Google image search is based on textual cues such as the filename and surrounding text.
  • Example: gandhi.jpg may be a picture of Gandhi.

3
Automatic Image Annotation
  • Can we automatically annotate unseen images with
    keywords?
  • Given a training set of blobs and image
    annotations, learn a model and annotate a test
    set of images.
  • Example: the annotation for this picture would be "tiger, grass".
  • Question: do we need to recognize objects?

4
Object Recognition
  • A classic unsolved problem in computer vision.
  • Humans can do it easily.
  • How? We don't know.
  • Two main questions:
  • What is the object in a picture?
  • Where is it in the picture?
  • Annotation: enough to answer the first question. Answering the second question requires labeling image segments.
  • In some cases, as in face recognition, finding a face in an image is considered face detection, and finding the identity of the specific individual is considered face recognition.

5
Statistical Data-driven Approaches
  • Statistical data-driven approaches have been successful in many areas:
  • Information Retrieval, Machine Translation, Optical Character Recognition, Information Extraction, ...
  • Early work in vision focused on single images, pairs of images, or short video sequences.
  • One reason: computational cost.
  • Recent successful work in object detection and recognition uses large amounts of data for training and testing.

6
Object Detection/Recognition
  • Focused on a few specific objects like faces and cars.
  • Learn the joint probability for different regions forming the object.
  • Train on examples of the specific object.
  • Solve a two-class classification problem.
  • Features: wavelets, Gabor filters, ...
  • Examples:
  • Schneiderman and Kanade's face detector.
  • Requires training images of cut-out faces.
  • Works fairly well but still makes mistakes.
  • (Pictures from their webpage.)
  • Fergus, Perona and Zisserman, CVPR 2003.
  • Other objects like motorcycles, cars, ...
  • Training images of the object and background.
  • Learns a single class at a time.
  • (Picture from their webpage.)

7
Image Annotation
  • Describe an image using words or image features.
  • Like vocabularies in two different languages describing the same thing:
  • Visterms and words.
  • Visterms: e.g., segment the image, compute features over each region, and cluster the features into a discrete vocabulary.
  • Or partition the image into regions and compute a set of continuous features.
  • Translate and retrieve.
  • Analogous to cross-lingual retrieval.
  • Use a training set of images and captions and learn annotations.
  • Characteristics of most of these approaches:
  • Associate words (semantics) with pictures.
  • Context is important in images. Take advantage of the association of different regions in the image:
  • A tiger is associated with grass, not a computer.
  • Note: for text, performance on cross-lingual retrieval equals or exceeds mono-lingual retrieval.

8
Image Vocabulary
  • Images are segmented into semantic regions (Blobworld, Normalized-cuts algorithm).
  • Segments are clustered into a fixed number of "blobs" (visterms). Each blob is a word in the image vocabulary.
  • Any image can be represented by a small set of blobs (Duygulu et al., ECCV 2002); see the sketch after the figure below.

(Figure: example images, their segmented regions, and the blobs the regions map to.)
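A minimal sketch of the vocabulary-building step, assuming region features have already been extracted; the function name, the use of k-means, and the vocabulary size of 500 are illustrative, not the exact settings of Duygulu et al.:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_visterm_vocabulary(region_features, n_blobs=500):
        # region_features: one (n_regions, n_dims) array per training image.
        # Pool the feature vectors of all segmented regions across images.
        all_regions = np.vstack(region_features)
        kmeans = KMeans(n_clusters=n_blobs).fit(all_regions)
        # Each region is replaced by the index of its nearest cluster (its blob),
        # so an image becomes a small set of blob indices.
        return kmeans, [kmeans.predict(f) for f in region_features]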

9
Models
  • Co-occurrence Model (Mori et al.): compute the co-occurrence of visterms and words (see the sketch after this list).
  • Mean precision of 0.07.
  • Translation Model (Duygulu, Barnard, de Freitas and Forsyth): treat it as a problem of translating from the vocabulary of blobs (discrete visterms) to that of words. (Also try labeling regions.)
  • Mean precision of 0.14.
  • Cross Media Relevance Model (Jeon, Lavrenko, Manmatha): use a relevance model (a language-modeling approach). Discrete model.
  • Mean precision of 0.33.
  • Correlation Latent Dirichlet Allocation (Blei and Jordan): the model generates words and regions based on a latent factor. A direct comparison on the same dataset is not available. (Also try labeling regions.)
  • Our guess is that it is comparable to or slightly worse than the relevance model.
  • Continuous Relevance Model (Lavrenko, Manmatha, Jeon): relevance model with continuous features.
  • Mean precision of 0.6.
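A minimal sketch of the simplest of these, the co-occurrence model, assuming images are already represented as blob indices paired with caption words; the data layout and function name are illustrative:

    from collections import Counter

    def cooccurrence_model(training_set, n_blobs):
        # training_set: iterable of (blob_indices, caption_words) pairs, one per image.
        counts = {b: Counter() for b in range(n_blobs)}
        for blobs, words in training_set:
            for b in set(blobs):
                for w in words:
                    counts[b][w] += 1
        # Normalize the co-occurrence counts into conditional probabilities P(w | b).
        model = {}
        for b, counter in counts.items():
            total = sum(counter.values())
            if total:  # skip blobs that never co-occur with any word
                model[b] = {w: c / total for w, c in counter.items()}
        return model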

10
Cross Media Relevance Models
  • Goal: estimate a relevance model, the joint distribution P(w, b_1, ..., b_m) of words and visterms.
  • I.e., find the probability of observing word w together with visterms b_1, ..., b_m.
  • To annotate an image whose visterms are grass, tiger, water, and road:
  • Compute P(w | b_grass, b_tiger, b_water, b_road) for each word w.
  • If the top three probabilities are for the words grass, water, and tiger,
  • then annotate the image with grass, water, tiger.

(Figure: example image with regions labeled tiger, water, grass.)
11
Relevance Models
  • Annotation probability (the original slide shows two equivalent forms; a reconstruction follows).
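The equations on the original slide did not survive the transcript. A plausible reconstruction, based on the cited SIGIR '03 paper (the notation here is mine): rank words either by the conditional probability given the image's visterms, or, since the denominator does not depend on w, directly by the joint probability:

    P(w \mid b_1, \ldots, b_m) = \frac{P(w, b_1, \ldots, b_m)}{P(b_1, \ldots, b_m)}
    \quad \text{or, equivalently for ranking,} \quad
    P(w, b_1, \ldots, b_m)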

J. Jeon, V. Lavrenko and R. Manmatha, "Automatic Image Annotation and Retrieval Using Cross-Media Relevance Models," to appear in SIGIR 2003.
12
Training
  • The joint distribution is computed as an expectation over the images J in the training set (reconstructed below).
  • Given J, the events are independent.
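The equation itself is missing from the transcript; a reconstruction from the two assumptions above and the cited paper, with T denoting the training set:

    P(w, b_1, \ldots, b_m) = \sum_{J \in T} P(J) \, P(w \mid J) \prod_{i=1}^{m} P(b_i \mid J)

where P(w | J) and P(b_i | J) are smoothed maximum-likelihood estimates from image J's caption words and blobs.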

13
Annotation
  • Compute P(w | I) for different words w.
  • Probabilistic annotation:
  • Annotate the image with every possible w in the vocabulary, with associated probabilities.
  • Useful for retrieval but not for people.
  • Fixed-length annotation (see the sketch below):
  • For people, take the top (say 3 or 4) words for every image and annotate images with them.
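A minimal sketch of the fixed-length variant, assuming a scoring function p_word_given_image such as the relevance model above; the function and argument names are illustrative:

    import heapq

    def annotate(image_blobs, vocabulary, p_word_given_image, k=4):
        # Score every vocabulary word against the image's blobs, keep the top k.
        scores = {w: p_word_given_image(w, image_blobs) for w in vocabulary}
        return heapq.nlargest(k, scores, key=scores.get)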

14
Retrieval
  • Language modeling approach.
  • Given a query Q, the probability of drawing Q from image I is given below.
  • Or use the probabilistic annotation.
  • Rank images according to this probability.
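The equation on the original slide is missing from the transcript; a reconstruction under the usual language-modeling assumption that query words are drawn independently:

    P(Q \mid I) = \prod_{q \in Q} P(q \mid I)

where P(q | I) comes from the probabilistic annotation of image I.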

15
Samples - good
  • Annotation examples - CMRM
  • Retrieval examples: top 4 images, CMRM.

(Figures: retrieval results for the queries "tiger" and "pillar".)