Title: Robust Statistical Techniques for the Categorization of Images Using Associated Text
1. Robust Statistical Techniques for the Categorization of Images Using Associated Text
2. Text Categorization
- Text categorization (TC) refers to the automatic labeling of documents, using natural language text contained in or associated with each document, into one or more pre-defined categories.
- Idea: TC techniques can be applied to image captions or articles to label the corresponding images.
3. Clues for Indoor versus Outdoor: Text (as opposed to visual image features)
Denver Summit of Eight leaders begin their first
official meeting in the Denver Public Library,
June 21.
The two engines of an Amtrak passenger train lie in the mud at the edge of a marsh after the train, bound for Boston from Washington, derailed on the bank of the Hackensack River, just after crossing a bridge.
4. Two Paradigms of Research
- Machine learning (ML) techniques
- Common in the literature
- Usually involve the exploration of new algorithms applied to bag-of-words representations of documents
- Novel representation
- Rare in the literature
- Usually more specific, but often interesting and can lead to substantial improvement
- Important for certain tasks involving images!
5. Contributions
- General
- An in-depth exploration of the categorization of images based on associated text
- Incorporating research into Newsblaster
- Novel machine learning (ML) techniques
- The creation of two novel TC approaches
- The combination of high-precision/low-recall rules with other systems
- Novel representation
- The use of Natural Language Processing (NLP) techniques
- The use of low-level image features
6. Framework
- Collection of experiments
- Various tasks
- Multiple techniques
- No clear winner for all tasks
- Characteristics of tasks often dictate which techniques work best
- No Free Lunch
7. Overview
- The Main Idea
- Description of Corpus
- Novel ML Systems
- NLP Based System
- High-Precision/Low-Recall Rules
- Image Features
- Newsblaster
- Conclusions and Future Work
8. Corpus
- Raw data
- Postings from news-related Usenet newsgroups
- Over 2000 include embedded captioned images
- Data sets
- Multiple sets of categories representing various levels of abstraction
- Mutually exclusive and exhaustive categories
9. Indoor
Outdoor
10. Events Categories
Politics
Struggle
Disaster
Crime
Other
11. Subcategories for Disaster Images
Category | F1
Politics | 89
Struggle | 88
Disaster | 97
Crime | 90
Other | 59
12. Disaster Image Categories
Affected People
Workers Responding
Other
Wreckage
13. Subcategories for Politics Images
Category | F1
Politics | 89
Struggle | 88
Disaster | 97
Crime | 90
Other | 59
14. Politics Image Categories
Meeting
Civilians
Announcement
Other
Military
Politician Photographed
15. Collect Labels to Train Systems
16. Overview
- The Main Idea
- Description of Corpus
- Novel ML Systems
- NLP Based System
- High-Precision/Low-Recall Rules
- Image Features
- Newsblaster
- Conclusions and Future Work
17. Two Novel ML Approaches
- Density estimation
- Can be applied to the results of any other system that calculates a similarity score for every category
- Often improves performance
- Provides probabilistic confidence measures for predictions
- BINS
- Uses binning to estimate accurate term weights for words with scarce evidence
- Smoothing leads to robust performance
- Extremely competitive for two data sets in my corpus
18. Density Estimation
- First apply a standard system
- For each document, compute a similarity score for every category
- Apply to training documents as well as test documents
- For each test document
- Find all documents from the training set with similar category scores
- Use the categories of close training documents to predict the categories of test documents
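The steps above can be sketched in a few lines. This toy version is an assumption-laden simplification: it uses Euclidean distance between category score vectors and a vote fraction over the k closest training documents, whereas the real system derives calibrated probabilities.

```python
import math
from collections import Counter

def density_estimate(test_scores, train_scores, train_labels, k=3):
    """Predict the category of a test document from the categories of the
    k training documents whose category score vectors (produced by some
    standard first-pass system) lie closest to the test document's vector."""
    nearest = sorted(
        zip((math.dist(test_scores, s) for s in train_scores), train_labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k   # prediction plus a crude confidence

# Toy score vectors over (Politics, Struggle, Disaster, Crime, Other)
train = [([100, 75, 20, 30, 5], "Struggle"),
         ([100, 40, 30, 90, 10], "Crime"),
         ([90, 25, 50, 110, 25], "Crime"),
         ([40, 30, 80, 25, 40], "Disaster")]
pred, conf = density_estimate([85, 35, 25, 95, 20],
                              [s for s, _ in train],
                              [c for _, c in train])
```

Here the two nearest training documents are Crime documents, so the test document is labeled Crime even if the first-pass system itself preferred another category.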
19. Density Estimation Example
[Diagram: the category score vector of a test document, e.g. (85, 35, 25, 95, 20) over (Politics, Struggle, Disaster, Crime, Other), is compared against the score vectors of six training documents with known categories (two Crime, two Struggle, one Disaster, one Politics); distances range from 20.0 for the closest to 106.4 for the farthest. Rocchio/TFIDF predicts Struggle for the test document, while density estimation predicts Crime with probability 0.679.]
20. Density Estimation Significantly Improves Performance for the Indoor versus Outdoor Data Set
21. Density Estimation Slightly Degrades Performance for the Events Data Set
22. Density Estimation Sometimes Improves Performance, Always Provides Confidence Measures
(Data sets: Indoor versus Outdoor; Events: Politics, Struggle, Disaster, Crime, Other)
23. Results of Density Estimation Experiments for the Indoor versus Outdoor Data Set
Confidence Range | Images | Overall Accuracy
High (P ≥ 0.9) | 285 | 92.6
Medium (0.9 > P ≥ 0.7) | 98 | 75.5
Low (0.7 > P ≥ 0.5) | 62 | 72.6

Results of Density Estimation Experiments for the Events Data Set
Confidence Range | Documents | Overall Accuracy
High (P ≥ 0.9) | 301 | 94.4
Medium (0.9 > P ≥ 0.7) | 68 | 79.4
Low (0.7 > P ≥ 0.5) | 60 | 53.3
Very Low (P < 0.5) | 14 | 42.9
24. BINS System: Naïve Bayes with Smoothing
- Binning based on smoothing in the speech recognition literature
- Not enough training data to estimate term weights for words with scarce evidence
- Words with similar statistical features are grouped into a common bin
- Estimate a single weight for each bin
- This weight is assigned to all words in the bin
- Credible estimates even for small (or zero) counts
25. Binning Uses Statistical Features of Words
Intuition | Word | Indoor Category Count | Outdoor Category Count | Quantized IDF
Clearly Indoor | conference | 14 | 1 | 4
Clearly Indoor | bed | 1 | 0 | 8
Clearly Outdoor | plane | 0 | 9 | 5
Clearly Outdoor | earthquake | 0 | 4 | 6
Unclear | speech | 2 | 2 | 6
Unclear | ceremony | 3 | 8 | 5
26. plane
- Sparse data
- plane does not occur in any Indoor training documents
- Infinitely more likely to be Outdoor???
- Assign plane to bins of words with similar features (e.g. IDF, category counts)
- In the first half of the training set, plane appears in
- 9 Outdoor documents
- 0 Indoor documents
27. Lambdas = Weights
- First half of training set: assign words to bins
- Second half of training set: estimate term weights
28. Lambdas for plane: 4.03 times more likely in an Outdoor document
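The 4.03 figure lines up with the λ difference of -2.01 shown for plane on the next slide if the lambdas are base-2 log likelihood ratios; that base is an inference from the numbers, not something stated on the slides.

```python
import math

# If lambdas are base-2 log likelihood ratios, a word that is 4.03 times
# more likely to appear in an Outdoor document than an Indoor one gets
# lambda_Indoor - lambda_Outdoor = -log2(4.03), matching the -2.01 shown
# for plane. (The base of the logarithm is inferred, not stated.)
ratio = 4.03                  # P(plane | Outdoor) / P(plane | Indoor)
lam_diff = -math.log2(ratio)  # lambda_Indoor - lambda_Outdoor
```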
29. Binning → Credible Log Likelihood Ratios
Intuition | Word | λ_Indoor - λ_Outdoor | Indoor Category Count | Outdoor Category Count | Quantized IDF
Clearly Indoor | conference | 4.84 | 14 | 1 | 4
Clearly Indoor | bed | 1.35 | 1 | 0 | 8
Clearly Outdoor | plane | -2.01 | 0 | 9 | 5
Clearly Outdoor | earthquake | -1.00 | 0 | 4 | 6
Unclear | speech | 0.84 | 2 | 2 | 6
Unclear | ceremony | -0.50 | 3 | 8 | 5
30. Lambdas Decrease with IDF
31. Methodology of BINS
- Divide the training set into two halves
- First half used to determine bins for words
- Second half used to determine lambdas for bins
- For each test document
- Map every word to a bin for each category
- Add lambdas, obtaining a score for each category
- Switch the halves of training and repeat
- Combine results and assign each document to the category with the highest score
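A minimal sketch of the two-half methodology for a single category. The words, counts, binning features, and add-half smoothing below are all illustrative assumptions; the point is only that every word pooled into a bin shares the bin's one smoothed weight.

```python
import math
from collections import defaultdict

# First half of training: each word's binning features,
# here (category count, quantized IDF).
features = {
    "plane":      (0, 5),   # never seen in an Indoor doc in half 1
    "glacier":    (0, 5),   # lands in the same bin as plane
    "conference": (14, 4),
}
# Second half of training: per word, (Indoor docs containing the word,
# total Indoor docs).
half2 = {"plane": (0, 50), "glacier": (1, 50), "conference": (12, 50)}

# Pool the second-half evidence of every word in a bin, then estimate
# one smoothed weight per bin (add-half smoothing is an assumption).
pooled = defaultdict(lambda: [0, 0])
for word, feat in features.items():
    k, n = half2[word]
    pooled[feat][0] += k
    pooled[feat][1] += n
bin_weight = {feat: math.log2((k + 0.5) / (n + 1))
              for feat, (k, n) in pooled.items()}

# Every word in a bin shares the bin's weight, so even a zero-count
# word like plane gets a credible (finite) estimate.
w_plane = bin_weight[features["plane"]]
```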
32. Binning Improves Performance for the Indoor versus Outdoor Data Set
33. Binning Improves Performance for the Events Data Set
34. BINS: A Robust Version of Naïve Bayes
(Data sets: Indoor versus Outdoor; Events: Politics, Struggle, Disaster, Crime, Other)
35. Combining Bin Weights and Naïve Bayes Weights
- Idea
- It might be better to use the Naïve Bayes weight when there is enough evidence for a word
- Back off to the bin weight otherwise
- BINS allows combinations of weights to be used based on the level of evidence
- How can we automatically determine when to use which weights???
- Entropy
- Minimum Squared Error (MSE)
36. BINS Allows User to Combine Weights
[Charts: combination weights based on Entropy and based on MSE, plus two fixed schemes, COMBO 1 and COMBO 2: use only the bin weight for evidence of 0, average the bin weight and NB weight for evidence of 1, and use only the NB weight for evidence of 2 or more.]
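The evidence-based back-off on this slide can be sketched as a small function. The thresholds follow the COMBO scheme shown; the function name and the assumption that evidence is an integer count are mine.

```python
def combined_weight(bin_w, nb_w, evidence):
    """Back off between the bin weight and the Naive Bayes weight by the
    amount of evidence (training occurrences) behind a word, following
    the COMBO scheme sketched on the slide."""
    if evidence == 0:
        return bin_w                  # no evidence: trust the bin weight
    if evidence == 1:
        return (bin_w + nb_w) / 2     # scarce evidence: average the two
    return nb_w                       # 2 or more: trust Naive Bayes
```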
37. Appropriately Combining the Bin Weight and the Naïve Bayes Weight Leads to the Best Performance Yet
(Data sets: Indoor versus Outdoor; Events: Politics, Struggle, Disaster, Crime, Other)
38. BINS Performs the Best of All Systems Tested
(Data sets: Indoor versus Outdoor; Events: Politics, Struggle, Disaster, Crime, Other)
39. How Can We Improve Results?
- One idea: Label more documents!
- Usually works
- Boring
- Another idea: Use unlabeled documents!
- Easily obtainable
- But can this really work???
- Maybe it can
40. Binning Using Unlabeled Documents
- Apply the system to unlabeled documents
- Choose documents with confident predictions
- Each word gets a new feature: the number of unlabeled documents containing the word that are confidently predicted to belong to each category (unlabeled category counts)
- Probably less important than regular category counts
- Binning provides a natural mechanism for weighting the new feature appropriately
41. Determining Confident Predictions
- BINS computes a score for each category
- BINS predicts the category with the highest score
- Confidence for the predicted category is the score of that category minus the score of the second-place category
- Confidence for a non-predicted category is the score of that category minus the score of the chosen category
- Cross-validation experiments can be used to determine a confidence cutoff for each category
- Maximize Fβ for the category
- A beta of 1 gives precision and recall equal weight; a lower beta weights precision higher
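The cutoff search described above can be sketched as follows. The function names and the toy held-out data are assumptions; beta defaults to 0.5 so precision is weighted more heavily than recall, as the slide suggests.

```python
def f_beta(precision, recall, beta):
    """F_beta = (1 + b^2) * P * R / (b^2 * P + R); beta < 1 favors precision."""
    if precision == 0 or recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def best_cutoff(scored, beta=0.5):
    """scored: (confidence, belongs_to_category) pairs from held-out data.
    Return the confidence cutoff whose accepted set maximizes F_beta."""
    positives = sum(1 for _, y in scored if y)
    best = (-1.0, None)
    for cutoff in sorted({c for c, _ in scored}):
        accepted = [y for c, y in scored if c >= cutoff]
        tp = sum(accepted)
        p = tp / len(accepted) if accepted else 0.0
        r = tp / positives if positives else 0.0
        best = max(best, (f_beta(p, r, beta), cutoff))
    return best[1]

# Toy held-out predictions for one category
cut = best_cutoff([(0.9, True), (0.8, True), (0.6, False),
                   (0.4, True), (0.2, False)])
```

With this data, accepting everything at confidence 0.8 or above gives perfect precision at two-thirds recall, which maximizes F with beta 0.5.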
42. Use Fβ to Optimize Confidence Cutoffs (example for a single category)
43. Use Fβ to Optimize Confidence Cutoffs (important region of graph highlighted)
44. Should the New Feature Matter?
45. Does the New Feature Help?
- No
- Why???
- New features add info but make bins smaller
- Perhaps more data isn't needed in the first place
- Should more data matter?
- Hard to accumulate more labeled data
- Easy to try out less labeled data!
46. Does Size Matter?
47. Overview
- The Main Idea
- Description of Corpus
- Novel ML Systems
- NLP Based System
- High-Precision/Low-Recall Rules
- Image Features
- Newsblaster
- Conclusions and Future Work
48. Disaster Image Categories
Affected People
Workers Responding
Other
Wreckage
49. Performance of Standard Systems: Not Very Satisfying
50. Ambiguity for Disaster Images: Workers Responding vs. Affected People
Philippine rescuers carry a fire victim March 19
who perished in a blaze at a Manila disco.
Hypothetical alternative caption: A fire victim who perished in a blaze at a Manila disco is carried by Philippine rescuers March 19.
51. Summary of Observations About Task
Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco.
- Need to distinguish foreground from background, determine the focus of the image
- Not all words are important; some are misleading
- Hypothesis: the main subject and verb are particularly useful for this task
- Problematic for bag-of-words approaches
- Need linguistic analysis to determine predicate-argument relationships
52. Hypothesis: Subject and Verb are Useful Clues
Subject | Verb | Category | Guessable?
Truck | makes | Wreckage | No
couple | mourn | Affected People | Yes
blocks | suffered | Wreckage | Yes
NAME | gather | Affected People | No
child | sleeps | Affected People | Yes
inspectors | search | Workers Responding | Yes
NAME | observes | Workers Responding | No
workers | confer | Workers Responding | Yes
child | covers | Affected People | Yes
chimney | stands | Wreckage | Yes
53. Experiments with Human Subjects: 4 Conditions (Test Hypothesis: Subject and Verb are Useful Clues)
SENT | First sentence of caption | Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco.
RAND | All words from first sentence in random order | At perished disco who Manila a a in 19 carry Philippine blaze victim a rescuers March fire
IDF | Top two TFIDF words | disco rescuers
S-V | Subject and verb | subject: rescuers, verb: carry
54. Experiments with Human Subjects: Results (Hypothesis: Subject and Verb are Useful Clues)
- Syntax is important
- SENT > RAND
- S-V > IDF
- Subject and verb are especially important
Condition | Average Time (in seconds)
RAND | 68
SENT | 34
IDF | 22
S-V | 20
55. Experiments with Human Subjects: Results (Hypothesis: Subject and Verb are Useful Clues)
- More words are better than fewer words
- SENT, RAND > S-V, IDF
- Syntax is important
- SENT > RAND; S-V > IDF
Condition | Average Time (in seconds)
RAND | 68
SENT | 34
IDF | 22
S-V | 20
56. RAND is Very Slow!
Condition | Average Time (in seconds)
RAND | 68
SENT | 34
IDF | 22
S-V | 20
- Perhaps human subjects unscrambled the words, regaining syntactic information
57. Using Just Two Words (S-V) is Almost as Good as Using All the Words (Bag of Words)
58. Operational NLP Based System
- Extract subjects and verbs from all documents in the training set
- Pipeline: Sentence → POS tagger → CASS shallow parser → Extract subject and verb → WordNet maps to base form → Output
- Extraction accuracy: Subjects 83.9, Verbs 80.6
- For each test document
- Extract the subject and verb
- Compare to those from the training set using a novel method of word-to-word similarity
- Based on similarities, generate a score for every category
59. Word Similarity
- Examine a large extended corpus to generate many subject/verb pairs
- Use these to compute similarities
60. Choosing a Category
- For a given test document d, calculate a total score for every category c
- Choose the category with the highest score
- If the subject is a NAME, it's a bit more complicated
61. The NLP Based System Beats All Others by a Considerable Margin
62. Politics Image Categories
Meeting
Civilians
Announcement
Other
Military
Politician Photographed
63. The NLP Based System is in the Middle of the Pack for the Politics Image Data Set
64. Why is the Performance for the NLP Based System not as Strong for the Politics Image Data Set?
- A much wider range of performance scores
- Range for Politics images is 36% to 64.7%
- Range for Disaster images is 54% to 59.7%
- The top systems are harder to beat
- Too many proper names as subjects
- 60% of test instances for Politics images
- Only 13% of test instances for Disaster images
- For 60% of test documents, only one word (the main verb) is being used to determine the prediction
65. Overview
- The Main Idea
- Description of Corpus
- Novel ML Systems
- NLP Based System
- High-Precision/Low-Recall Rules
- Image Features
- Newsblaster
- Conclusions and Future Work
66. The Original Premise
- For the Disaster image data set, the performance of the NLP based system still leaves room for improvement
- The NLP based system achieves 65% overall accuracy for the Disaster image data set
- Humans viewing all words in random order achieve about 75%
- Humans viewing the full first sentence achieve over 90%
- The main subject and verb are particularly important, but sometimes other words might offer good clues
67. Higinio Guereca carries family photos he retrieved from his mobile home, which was destroyed as a tornado moved through the Central Florida community, early December 27.
68. Choosing Indicative Words
- Let x be the number of training documents containing a word w
- Let p be the proportion of these documents that belong to category c
- If x > X and p > P, then w is indicative of c
- X and P can be varied to generate lists of indicative words
- Lists can be pruned manually
69. Selected Indicative Words for the Disaster Image Data Set
Word | Indicated Category | Total Count (x) | Proportion (p)
her | Affected People | 7 | 1.0
his | Affected People | 7 | 0.86
family | Affected People | 6 | 0.83
relatives | Affected People | 6 | 1.0
rescue | Workers Responding | 15 | 1.0
search | Workers Responding | 9 | 1.0
similar | Other | 2 | 1.0
soldiers | Workers Responding | 6 | 1.0
workers | Workers Responding | 12 | 1.0
70. Selected Indicative Words for the Politics Image Data Set
Word | Indicated Category | Total Count (x) | Proportion (p)
hands | Meeting | 10 | 0.90
journalists | Announcement | 4 | 1.0
local | Civilians | 4 | 1.0
media | Announcement | 3 | 1.0
presidential | Politician Photographed | 9 | 0.78
press | Announcement | 7 | 0.71
reporters | Announcement | 8 | 0.88
meeting | Meeting | 15 | 0.73
session | Meeting | 6 | 0.83
victory | Politician Photographed | 6 | 0.83
waves | Politician Photographed | 4 | 1.0
wife | Politician Photographed | 6 | 1.0
71. High-Precision/Low-Recall Rules
- If a word w that indicates category c occurs in a document d, then assign d to c
- Every selected indicative word has an associated rule of the above form
- Each rule is very accurate but rarely applicable
- If only the rules are used
- most predictions will be correct (hence, high precision)
- most instances of most categories will remain unlabeled (hence, low recall)
72. Combining the High-Precision/Low-Recall Rules with Other Systems
- Two-pass approach
- Conduct a first pass using the indicative words and the high-precision/low-recall rules
- For documents that are still unlabeled, fall back to some other system
- Compared to the fall-back system
- If the rules are more accurate for the documents to which they apply, overall accuracy will improve!
- Intended to improve the NLP based system, but easy to test with other systems as well
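The two-pass combination is a small wrapper around any fall-back classifier. This is a sketch: the real system's handling of documents where several rules fire may differ, and the rules and stand-in fall-back below are invented.

```python
def two_pass(doc_words, rules, fall_back):
    """First pass: fire any high-precision rule whose indicative word
    appears in the document. Second pass: for documents no rule covers,
    defer to another classifier."""
    for word, category in rules.items():
        if word in doc_words:
            return category
    return fall_back(doc_words)

# Invented rules and a trivial stand-in for the fall-back system
rules = {"rescue": "Workers Responding", "her": "Affected People"}
always_other = lambda words: "Other"

first = two_pass({"rescue", "workers", "carry"}, rules, always_other)
second = two_pass({"tornado", "storm"}, rules, always_other)
```

If the rules are more accurate than the fall-back on the documents they cover, every such wrapper can only help overall accuracy.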
73. The Rules Improve Every Fall-Back System for the Disaster Image Data Set
74. The Rules Improve 7 of 8 Fall-Back Systems for the Politics Image Data Set
75. Overview
- The Main Idea
- Description of Corpus
- Novel ML Systems
- NLP Based System
- High-Precision/Low-Recall Rules
- Image Features
- Newsblaster
- Conclusions and Future Work
76. Low-Level Image Features
- Collaboration with Paek and Benitez
- They have provided me with information, pointers to resources, and code
- I have reimplemented some of their code
- Color histograms
- Based on entire images or image regions
- Can be used as input to machine learning approaches (e.g. kNN, SVMs)
77. Color
- Three components to color
- Red, green, blue (RGB)
- Hue, saturation, value (HSV)
- Can convert from RGB to HSV
- Can quantize HSV triples
- 18 hues x 3 saturations x 3 values + 4 grays = 166 slots
78. Color Histograms
- For each pixel of an image, compute its quantized HSV triple
- The color histogram of an image is a vector such that
- There are 166 dimensions
- Each dimension represents one possible HSV triple
- The value of a dimension is the proportion of pixels with the associated HSV triple
- Can be computed for image regions and concatenated together
- Can be input for machine learning techniques
79. Images Divided into 8 x 8 Rectangular Regions of Equal Size
80. Using Color Histograms to Predict Labels for the Indoor versus Outdoor Data Set
81. Combining Text and Image Features
- Combining systems has had mixed results in the TC literature, but
- Most attempts have involved systems that use the same features (bag of words)
- There is little reason to believe that indicative text is correlated with indicative low-level image features
- Most text based systems are beating the image based systems, but
- Distance from the optimal hyperplane can be used as a confidence measure for a support vector machine
- Predictions with high confidence may be more accurate than those of text systems
82. Accuracy of Support Vector Machine Approach Tends to be Higher when Confidence is Greater
Distance Cutoff | Overall Accuracy | % of Images Above Cutoff
3.5 | --- | 0.0
3.0 | 100.0 | 0.4
2.5 | 87.5 | 1.8
2.0 | 92.3 | 5.8
1.5 | 94.4 | 16.0
1.0 | 91.0 | 34.1
0.5 | 84.6 | 70.1
0.0 | 78.0 | 100.0
83. The Combination of Text and Image Beats Text Alone (most systems show small gains, one has a major improvement)
84. Overview
- The Main Idea
- Description of Corpus
- Novel ML Systems
- NLP Based System
- High-Precision/Low-Recall Rules
- Image Features
- Newsblaster
- Conclusions and Future Work
86. Newsblaster Categories
U.S. News
World News
Finance
Entertainment
Science/Technology
Sports
94. Newsblaster
- A pragmatic showcase for NLP
- My contributions
- Extraction of images and captions from web pages
- Image browsing interface
- Categorization of stories (clusters) and images
- Scripts that allow users to suggest labels for articles with incorrect predictions
95. Overview
- The Main Idea
- Description of Corpus
- Novel ML Systems
- NLP Based System
- High-Precision/Low-Recall Rules
- Image Features
- Newsblaster
- Conclusions and Future Work
96. Summary
- Examined several methods of categorizing images
- No clear winner for all tasks
- BINS is very competitive
- NLP can lead to substantial improvement, at least for certain tasks
- High-precision/low-recall rules are likely to improve performance for tough tasks
- Image features show promise
- Newsblaster demonstrates pragmatic benefits of my work
97. Conclusions
- TC techniques can be used to categorize images
- The approach that should be used depends on the specific task being considered
- Important and timely
- Increased prevalence of images on the web, large corpora of images, and personal collections of images
- Tools will be needed for better browsing, searching, and filtering
98. Future Work
- BINS
- Explore additional binning features
- Explore use of unlabeled data
- NLP and TC
- Improve the current system
- Explore additional categories
- Image features
- Explore additional low-level image features
- Explore better methods of combining text and image features
- Pragmatic benefits
- Investigate end user applications
- Expand to video (perhaps using closed captions)
99. And Now the Questions