Robust Statistical Techniques for the Categorization of Images Using Associated Text - PowerPoint PPT Presentation

About This Presentation
Number of Views:188
Avg rating:3.0/5.0
Slides: 100
Provided by: CarlS158

Transcript and Presenter's Notes



1
Robust Statistical Techniques for the
Categorization of Images Using Associated Text
2
Text Categorization
  • Text categorization (TC) refers to the automatic
    labeling of documents, using natural language
    text contained in or associated with each
    document, into one or more pre-defined
    categories.
  • Idea: TC techniques can be applied to image
    captions or articles to label the corresponding
    images.

3
Clues for Indoor versus Outdoor: Text (as opposed
to visual image features)
Denver Summit of Eight leaders begin their first
official meeting in the Denver Public Library,
June 21.
The two engines of an Amtrak passenger train lie
in the mud at the edge of a marsh after the train,
bound for Boston from Washington, derailed on the
bank of the Hackensack River, just after crossing
a bridge.
4
Two Paradigms of Research
  • Machine learning (ML) techniques
  • Common in the literature
  • Usually involve the exploration of new algorithms
    applied to bag of words representations of
    documents
  • Novel representation
  • Rare in the literature
  • Usually more specific, but often interesting and
    can lead to substantial improvement
  • Important for certain tasks involving images!

5
Contributions
  • General
  • An in-depth exploration of the categorization of
    images based on associated text
  • Incorporating research into Newsblaster
  • Novel machine learning (ML) techniques
  • The creation of two novel TC approaches
  • The combination of high-precision/low-recall
    rules with other systems
  • Novel representation
  • The use of Natural Language Processing (NLP)
    techniques
  • The use of low-level image features

6
Framework
  • Collection of Experiments
  • Various tasks
  • Multiple techniques
  • No clear winner for all tasks
  • Characteristics of tasks often dictate which
    techniques work best
  • No Free Lunch

7
Overview
  1. The Main Idea
  2. Description of Corpus
  3. Novel ML Systems
  4. NLP Based System
  5. High-Precision/Low-Recall Rules
  6. Image Features
  7. Newsblaster
  8. Conclusions and Future Work

8
Corpus
  • Raw data
  • Postings from news-related Usenet newsgroups
  • Over 2000 include embedded captioned images
  • Data sets
  • Multiple sets of categories representing various
    levels of abstraction
  • Mutually exclusive and exhaustive categories

9
Indoor
Outdoor
10
Events Categories
Politics
Struggle
Disaster
Crime
Other
11
Subcategories for Disaster Images

Category  F1
Politics  89
Struggle  88
Disaster  97
Crime     90
Other     59
12
Disaster Image Categories
Affected People
Workers Responding
Other
Wreckage
13
Subcategories for Politics Images

Category  F1
Politics  89
Struggle  88
Disaster  97
Crime     90
Other     59
14
Politics Image Categories
Meeting
Civilians
Announcement
Other
Military
Politician Photographed
15
Collect Labels to Train Systems
16
Overview
  1. The Main Idea
  2. Description of Corpus
  3. Novel ML Systems
  4. NLP Based System
  5. High-Precision/Low-Recall Rules
  6. Image Features
  7. Newsblaster
  8. Conclusions and Future Work

17
Two Novel ML Approaches
  • Density estimation
  • Can be applied to the results of any other system
    that calculates a similarity score for every
    category
  • Often improves performance
  • Provides probabilistic confidence measures for
    predictions
  • BINS
  • Uses binning to estimate accurate term weights
    for words with scarce evidence
  • Smoothing leads to robust performance
  • Extremely competitive for two data sets in my
    corpus

18
Density Estimation
  • First apply a standard system
  • For each document, compute a similarity or score
    for every category.
  • Apply to training documents as well as test
    documents.
  • For each test document
  • Find all documents from training set with similar
    category scores.
  • Use categories of close training documents to
    predict categories of test documents.
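The two steps above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance between score vectors and a k-nearest-neighbor vote; the function and parameter names are mine, not the thesis's:

```python
import math

def density_estimate(test_scores, train_scores, train_labels, k=5):
    """Sketch of the density-estimation step: find the training
    documents whose category-score vectors lie closest to the test
    document's vector, then vote among their actual categories."""
    # Euclidean distance between two category-score vectors.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    neighbors = sorted(zip(train_scores, train_labels),
                       key=lambda sl: dist(test_scores, sl[0]))[:k]
    # The fraction of close neighbors in each category serves as a
    # probabilistic confidence measure for the prediction.
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    best = max(votes, key=votes.get)
    return best, votes[best] / k
```

The returned fraction is what makes density estimation useful even when it does not change the prediction: it attaches a confidence to every decision.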

19
Density Estimation Example

Test document category score vector: (85, 35, 25, 95, 20)

Category score vectors for training documents, with their actual
categories and distances from the test document's vector:

  (100, 40, 30, 90, 10)   (Crime)     20.0
  (80, 45, 20, 75, 10)    (Struggle)  27.4
  (90, 25, 50, 110, 25)   (Crime)     36.7
  (100, 75, 20, 30, 5)    (Struggle)  91.4
  (60, 95, 20, 30, 5)     (Politics)  92.5
  (40, 30, 80, 25, 40)    (Disaster)  106.4

Predictions: Rocchio/TFIDF predicts Struggle; density estimation
predicts Crime (probability .679).
20
Density Estimation Significantly Improves Performance
for the Indoor versus Outdoor Data Set
21
Density Estimation Slightly Degrades Performance
for the Events Data Set
22
Density Estimation Sometimes Improves Performance,
Always Provides Confidence Measures
Indoor versus Outdoor
Events: Politics, Struggle, Disaster, Crime, Other
23
Results of Density Estimation Experiments for the
Indoor versus Outdoor Data Set

Confidence  Range          # of Images  Overall Accuracy
High        P ≥ 0.9        285          92.6%
Medium      0.9 > P ≥ 0.7   98          75.5%
Low         0.7 > P ≥ 0.5   62          72.6%

Results of Density Estimation Experiments for the
Events Data Set

Confidence  Range          # of Documents  Overall Accuracy
High        P ≥ 0.9        301             94.4%
Medium      0.9 > P ≥ 0.7   68             79.4%
Low         0.7 > P ≥ 0.5   60             53.3%
Very Low    0.5 > P         14             42.9%
24
BINS System: Naïve Bayes Smoothing
  • Binning based on smoothing in the speech
    recognition literature
  • Not enough training data to estimate term weights
    for words with scarce evidence
  • Words with similar statistical features are
    grouped into a common bin
  • Estimate a single weight for each bin
  • This weight is assigned to all words in the bin
  • Credible estimates even for small (or zero) counts

25
Binning Uses Statistical Features of Words

Intuition        Word        Indoor Category Count  Outdoor Category Count  Quantized IDF
Clearly Indoor   conference  14                     1                       4
Clearly Indoor   bed          1                     0                       8
Clearly Outdoor  plane        0                     9                       5
Clearly Outdoor  earthquake   0                     4                       6
Unclear          speech       2                     2                       6
Unclear          ceremony     3                     8                       5
26
plane
  • Sparse data
  • plane does not occur in any Indoor training
    documents
  • Infinitely more likely to be Outdoor ???
  • Assign plane to bins of words with similar
    features (e.g. IDF, category counts)
  • In first half of training set, plane appears
    in
  • 9 Outdoor documents
  • 0 Indoor documents

27
Lambdas (Weights)
  • First half of training set: Assign words to bins
  • Second half of training set: Estimate term
    weights

28
Lambdas for plane: 4.03 times more likely in an
Outdoor document
29
Binning → Credible Log Likelihood Ratios
Intuition        Word        λIndoor minus λOutdoor  Indoor Category Count  Outdoor Category Count  Quantized IDF
Clearly Indoor   conference   4.84                   14                     1                       4
Clearly Indoor   bed          1.35                    1                     0                       8
Clearly Outdoor  plane       -2.01                    0                     9                       5
Clearly Outdoor  earthquake  -1.00                    0                     4                       6
Unclear          speech       0.84                    2                     2                       6
Unclear          ceremony    -0.50                    3                     8                       5
30
Lambdas Decrease with IDF
31
Methodology of BINS
  • Divide training set into two halves
  • First half used to determine bins for words
  • Second half used to determine lambdas for bins
  • For each test document
  • Map every word to a bin for each category
  • Add lambdas, obtaining a score for each category
  • Switch halves of training and repeat
  • Combine results and assign each document to
    category with highest score
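The split-and-smooth idea above can be sketched in a few lines. This is an illustrative simplification, assuming documents are represented as sets of words, binning on category count alone (a real system would also use quantized IDF), and add-one smoothing; all names are mine:

```python
import math
from collections import defaultdict

def bin_lambdas(first_half, second_half, category):
    """Sketch of BINS-style smoothing: words are grouped into bins by
    their category count in the first half of the training set, and one
    log-likelihood weight (lambda) per bin is estimated from the
    second half. Each half is a list of (set_of_words, category)."""
    # Step 1: bin words by how many first-half documents of `category`
    # contain them.
    bins = defaultdict(set)
    for word in {w for doc, _ in first_half for w in doc}:
        count = sum(1 for doc, c in first_half if c == category and word in doc)
        bins[count].add(word)

    # Step 2: estimate one smoothed weight per bin from the second
    # half, pooling occurrences of every word in the bin.
    lambdas = {}
    n_pos = sum(1 for _, c in second_half if c == category) or 1
    n_neg = sum(1 for _, c in second_half if c != category) or 1
    for key, words in bins.items():
        pos = sum(1 for doc, c in second_half if c == category
                  for w in words if w in doc)
        neg = sum(1 for doc, c in second_half if c != category
                  for w in words if w in doc)
        # Add-one smoothing keeps the estimate finite even for words
        # with zero counts in a category (the "plane" problem).
        lambdas[key] = math.log(((pos + 1) / n_pos) / ((neg + 1) / n_neg))
    return bins, lambdas
```

Because every word in a bin shares one pooled estimate, even a word never seen with a category gets a credible, finite weight instead of an infinite log likelihood ratio.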

32
Binning Improves Performance
for the Indoor versus Outdoor Data Set
33
Binning Improves Performance
for the Events Data Set
34
BINS: Robust Version of Naïve Bayes
Indoor versus Outdoor
Events: Politics, Struggle, Disaster, Crime, Other
35
Combining Bin Weights and Naïve Bayes Weights
  • Idea
  • It might be better to use the Naïve Bayes weight
    when there is enough evidence for a word
  • Back off to the bin weight otherwise
  • BINS allows combinations of weights to be used
    based on the level of evidence
  • How can we automatically determine when to use
    which weights???
  • Entropy
  • Minimum Squared Error (MSE)

36
BINS Allows User to Combine Weights
  • Based on Entropy
  • Based on MSE
Example mixtures (COMBO 1, COMBO 2):
  • Use only the bin weight for evidence of 0
  • Average the bin weight and NB weight for evidence of 1
  • Use only the NB weight for evidence of 2 or more
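The example mixture above, written out as a sketch (the hard cutoffs at 0, 1, and 2+ follow the slide's example; a real system could tune the mixing weights via entropy or MSE, and this function name is mine):

```python
def combined_weight(evidence_count, bin_weight, nb_weight):
    """Back off to the bin weight when a word has little evidence,
    and trust the plain Naive Bayes weight once evidence suffices."""
    if evidence_count == 0:
        return bin_weight                      # no evidence: bin only
    if evidence_count == 1:
        return 0.5 * (bin_weight + nb_weight)  # some evidence: average
    return nb_weight                           # enough evidence: NB only
```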
37
Appropriately Combining the Bin Weight and the
Naïve Bayes Weight Leads to the Best Performance
Yet
Indoor versus Outdoor
Events: Politics, Struggle, Disaster, Crime, Other
38
BINS Performs the Best of All Systems Tested
Indoor versus Outdoor
Events: Politics, Struggle, Disaster, Crime, Other
39
How Can We Improve Results?
  • One idea: Label more documents!
  • Usually works
  • Boring
  • Another idea: Use unlabeled documents!
  • Easily obtainable
  • But can this really work???
  • Maybe it can

40
Binning Using Unlabeled Documents
  • Apply system to unlabeled documents
  • Choose documents with confident predictions
  • Each word has a new feature: the number of
    unlabeled documents containing the word that are
    confidently predicted to belong to each category
    (unlabeled category counts)
  • Probably less important than regular category
    counts
  • Binning provides a natural mechanism for
    weighting the new feature appropriately

41
Determining Confident Predictions
  • BINS computes a score for each category
  • BINS predicts category with highest score
  • Confidence for predicted category is score of
    that category minus score of second place
    category
  • Confidence for non-predicted category is score of
    that category minus score of chosen category
  • Cross validation experiments can be used to
    determine a confidence cutoff for each category
  • Maximize Fβ for the category
  • A beta of 1 gives precision and recall equal
    weight; a lower beta weights precision higher
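The cutoff selection described above can be sketched as follows. This assumes cross-validation has produced, for one category, a list of (confidence, is-in-category) pairs; the function names, the linear scan over observed confidences, and the default beta are illustrative choices, not the thesis's:

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta combines precision and recall; beta < 1 weights
    precision more heavily, as noted above."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def best_cutoff(scored, beta=0.5):
    """Choose the confidence cutoff for one category that maximizes
    F_beta, trying each observed confidence as a candidate cutoff."""
    total_pos = sum(1 for _, y in scored if y)
    best = (0.0, None)
    for cutoff, _ in scored:
        accepted = [(c, y) for c, y in scored if c >= cutoff]
        tp = sum(1 for _, y in accepted if y)
        p = tp / len(accepted) if accepted else 0.0
        r = tp / total_pos if total_pos else 0.0
        score = f_beta(p, r, beta)
        if score > best[0]:
            best = (score, cutoff)
    return best[1]
```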

42
Use Fβ to Optimize Confidence Cutoffs (example
for a single category)
43
Use Fβ to Optimize Confidence Cutoffs (important
region of graph highlighted)
44
Should the New Feature Matter?
45
Does the New Feature Help?
  • No
  • Why???
  • New features add info but make bins smaller
  • Perhaps more data isn't needed in the first place
  • Should more data matter?
  • Hard to accumulate more labeled data
  • Easy to try out less labeled data!

46
Does Size Matter?
47
Overview
  1. The Main Idea
  2. Description of Corpus
  3. Novel ML Systems
  4. NLP Based System
  5. High-Precision/Low-Recall Rules
  6. Image Features
  7. Newsblaster
  8. Conclusions and Future Work

48
Disaster Image Categories
Affected People
Workers Responding
Other
Wreckage
49
Performance of Standard Systems:
Not Very Satisfying
50
Ambiguity for Disaster Images:
Workers Responding vs. Affected People
Philippine rescuers carry a fire victim March 19
who perished in a blaze at a Manila disco.
Hypothetical alternative caption: A fire victim
who perished in a blaze at a Manila disco is
carried by Philippine rescuers March 19.
51
Summary of Observations About Task
Philippine rescuers carry a fire victim March 19
who perished in a blaze at a Manila disco.
  • Need to distinguish foreground from background,
    determine the focus of the image
  • Not all words are important; some are misleading
  • Hypothesis: the main subject and verb are
    particularly useful for this task
  • Problematic for bag of words approaches
  • Need linguistic analysis to determine predicate
    argument relationships

52
Hypothesis: Subject and Verb are Useful Clues
Subject     Verb      Category            Guessable?
Truck       makes     Wreckage            No
couple      mourn     Affected People     Yes
blocks      suffered  Wreckage            Yes
NAME        gather    Affected People     No
child       sleeps    Affected People     Yes
inspectors  search    Workers Responding  Yes
NAME        observes  Workers Responding  No
workers     confer    Workers Responding  Yes
child       covers    Affected People     Yes
chimney     stands    Wreckage            Yes
53
Experiments with Human Subjects, 4 Conditions:
Test Hypothesis that Subject and Verb are Useful Clues

SENT  First sentence of caption: Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco.
RAND  All words from first sentence in random order: At perished disco who Manila a a in 19 carry Philippine blaze victim a rescuers March fire
IDF   Top two TFIDF words: disco rescuers
S-V   Subject and verb: subject rescuers, verb carry
54
Experiments with Human Subjects, Results:
Hypothesis that Subject and Verb are Useful Clues
  • Syntax is important
  • SENT > RAND
  • S-V > IDF
  • Subject and verb are especially important

Condition  Average Time (in seconds)
RAND       68
SENT       34
IDF        22
S-V        20
55
Experiments with Human Subjects, Results:
Hypothesis that Subject and Verb are Useful Clues
  • More words are better than fewer words
  • SENT, RAND > S-V, IDF
  • Syntax is important
  • SENT > RAND; S-V > IDF

Condition  Average Time (in seconds)
RAND       68
SENT       34
IDF        22
S-V        20
56
RAND is Very Slow!
Condition  Average Time (in seconds)
RAND       68
SENT       34
IDF        22
S-V        20
  • Perhaps human subjects unscrambled words,
    regaining syntactic information

57
Using Just Two Words (S-V) is
Almost as Good as All the Words (Bag of Words)
58
Operational NLP Based System
  • Extract subjects and verbs from all documents in
    training set (Subjects: 83.9%, Verbs: 80.6%)
  • Pipeline: Sentence → POS tagger → CASS shallow
    parser → extract subject and verb → WordNet maps
    to base form → output
  • For each test document
  • Extract subject and verb
  • Compare to those from training set using a novel
    method of word-to-word similarity
  • Based on similarities, generate a score for every
    category

59
Word Similarity
  • Examine a large extended corpus to generate many
    subject/verb pairs
  • Use these to compute similarities

60
Choosing a Category
  • For a given test document d, calculate a total
    score for every category c
  • Choose the category with the highest score
  • If the subject is a NAME, it is a bit more complicated

61
The NLP Based System Beats All Others by a
Considerable Margin
62
Politics Image Categories
Meeting
Civilians
Announcement
Other
Military
Politician Photographed
63
The NLP Based System is in the Middle of the Pack
for the Politics Image Data Set
64
Why is the Performance for the NLP Based System
not as Strong for the Politics Image Data Set?
  • A much wider range of performance scores
  • Range for Politics images is 36% to 64.7%
  • Range for Disaster images is 54% to 59.7%
  • The top systems are harder to beat
  • Too many proper names as subjects
  • 60% of test instances for Politics images
  • Only 13% of test instances for Disaster images
  • For 60% of test documents, only one word (the
    main verb) is being used to determine the
    prediction

65
Overview
  1. The Main Idea
  2. Description of Corpus
  3. Novel ML Systems
  4. NLP Based System
  5. High-Precision/Low-Recall Rules
  6. Image Features
  7. Newsblaster
  8. Conclusions and Future Work

66
The Original Premise
  • For the Disaster image data set, the performance
    of the NLP based system still leaves room for
    improvement
  • The NLP based system achieves 65% overall accuracy
    for the Disaster image data set
  • Humans viewing all words in random order achieve
    about 75%
  • Humans viewing the full first sentence achieve
    over 90%
  • The main subject and verb are particularly important,
    but sometimes other words might offer good clues

67
Higinio Guereca carries family photos he
retrieved from his mobile home which was
destroyed as a tornado moved through the Central
Florida community, early December 27.
68
Choosing Indicative Words
  • Let x be the number of training documents
    containing a word w
  • Let p be the proportion of these documents that
    belong to category c
  • If x > X and p > P, then w is indicative of c
  • X and P can be varied to generate lists of
    indicative words
  • Lists can be pruned manually
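The selection rule above can be sketched directly. This is an illustrative implementation (the function name and document representation are mine); ties and the manual pruning step are left out:

```python
from collections import defaultdict

def indicative_words(docs, X, P):
    """A word w is indicative of category c when it appears in more
    than X training documents (x > X) and the proportion of those
    documents belonging to c exceeds P. `docs` is a list of
    (set_of_words, category) pairs."""
    doc_count = defaultdict(int)                        # x: docs containing w
    cat_count = defaultdict(lambda: defaultdict(int))   # docs in c containing w
    for words, cat in docs:
        for w in words:
            doc_count[w] += 1
            cat_count[w][cat] += 1
    rules = {}
    for w, x in doc_count.items():
        if x <= X:
            continue
        cat, n = max(cat_count[w].items(), key=lambda kv: kv[1])
        if n / x > P:
            rules[w] = cat          # high-precision rule: w -> cat
    return rules
```

Varying X and P trades off how many rules are generated against how accurate each one is, which matches the slide's note that the lists can then be pruned manually.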

69
Selected Indicative Words for the Disaster Image
Data Set

Word       Indicated Category  Total Count (x)  Proportion (p)
her        Affected People      7               1.0
his        Affected People      7               0.86
family     Affected People      6               0.83
relatives  Affected People      6               1.0
rescue     Workers Responding  15               1.0
search     Workers Responding   9               1.0
similar    Other                2               1.0
soldiers   Workers Responding   6               1.0
workers    Workers Responding  12               1.0
70
Selected Indicative Words for the Politics Image
Data Set

Word          Indicated Category       Total Count (x)  Proportion (p)
hands         Meeting                  10               0.90
journalists   Announcement              4               1.0
local         Civilians                 4               1.0
media         Announcement              3               1.0
presidential  Politician Photographed   9               0.78
press         Announcement              7               0.71
reporters     Announcement              8               0.88
meeting       Meeting                  15               0.73
session       Meeting                   6               0.83
victory       Politician Photographed   6               0.83
waves         Politician Photographed   4               1.0
wife          Politician Photographed   6               1.0
71
High-Precision/Low-Recall Rules
  • If a word w that indicates category c occurs in a
    document d, then assign d to c
  • Every selected indicative word has an associated
    rule of the above form
  • Each rule is very accurate but rarely applicable
  • If only rules are used
  • most predictions will be correct (hence, high
    precision)
  • most instances of most categories will remain
    unlabeled (hence, low recall)

72
Combining the High-Precision/Low-Recall Rules
with Other Systems
  • Two-pass approach
  • Conduct a first-pass using the indicative words
    and the high-precision/low-recall rules
  • For documents that are still unlabeled, fall back
    to some other system
  • Compared to the fall back system
  • If the rules are more accurate for the documents
    to which they apply, overall accuracy will
    improve!
  • Intended to improve the NLP based system, but
    easy to test with other systems as well
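The two-pass approach above reduces to a few lines. This sketch assumes a rule fires on the first indicative word found; how the real system resolves multiple firing rules is not stated on the slide, and the names here are mine:

```python
def two_pass_predict(doc_words, rules, fallback):
    """First pass: apply the high-precision/low-recall rules (a map
    from indicative word to category). Second pass: for documents no
    rule covers, defer to `fallback`, any function from a document's
    words to a category."""
    for w in doc_words:
        if w in rules:
            return rules[w]          # a rule fires: high precision
    return fallback(doc_words)       # no rule fires: fall back
```

Because the rules only fire on a small, accurate subset of documents, overall accuracy improves whenever they beat the fallback system on exactly those documents.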

73
The Rules Improve Every Fall Back System for the
Disaster Image Data Set
74
The Rules Improve 7 of 8 Fall Back Systems for
the Politics Image Data Set
75
Overview
  1. The Main Idea
  2. Description of Corpus
  3. Novel ML Systems
  4. NLP Based System
  5. High-Precision/Low-Recall Rules
  6. Image Features
  7. Newsblaster
  8. Conclusions and Future Work

76
Low-Level Image Features
  • Collaboration with Paek and Benitez
  • They have provided me with information, pointers
    to resources, and code
  • I have reimplemented some of their code
  • Color histograms
  • Based on entire images or image regions
  • Can be used as input to machine learning
    approaches (e.g. kNN, SVMs)

77
Color
  • Three components of color
  • Red, green, blue (RGB)
  • Hue, saturation, value (HSV)
  • Can convert from RGB to HSV
  • Can quantize HSV triples
  • 18 hues × 3 saturations × 3 values + 4 grays =
    166 slots

78
Color Histograms
  • For each pixel of image, compute its quantized
    HSV triple
  • The color histogram of an image is a vector such that
  • There are 166 dimensions
  • Each dimension represents one possible HSV triple
  • Value of dimension is proportion of pixels with
    associated HSV triple
  • Can be computed for image regions and
    concatenated together
  • Can be input for machine learning techniques
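The 166-slot scheme above can be sketched as follows. The subdivision into 18 hues × 3 saturations × 3 values + 4 grays follows the earlier slide, but the exact saturation threshold for treating a pixel as gray is my assumption, not something the slides specify:

```python
import colorsys

def quantize_hsv(r, g, b):
    """Map an RGB pixel (0-255 per channel) to one of 166 slots:
    slots 0-161 cover 18 hues x 3 saturations x 3 values, and
    slots 162-165 hold 4 gray levels for low-saturation pixels."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < 0.1:                       # near-gray pixel (threshold assumed)
        return 162 + min(int(v * 4), 3)
    return (min(int(h * 18), 17) * 3 + min(int(s * 3), 2)) * 3 + min(int(v * 3), 2)

def color_histogram(pixels):
    """166-dimensional histogram: each dimension holds the proportion
    of pixels with the associated quantized HSV triple."""
    hist = [0.0] * 166
    for r, g, b in pixels:
        hist[quantize_hsv(r, g, b)] += 1.0
    return [h / len(pixels) for h in hist]
```

Per-region histograms are the same computation restricted to each of the 8 x 8 rectangles and concatenated, giving a fixed-length feature vector for kNN or SVMs.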

79
Images Divided into 8 x 8 Rectangular Regions of
Equal Size
80
Using Color Histograms to Predict Labels for the
Indoor versus Outdoor Data Set
81
Combining Text and Image Features
  • Combining systems has had mixed results in the TC
    literature, but
  • Most attempts have involved systems that use the
    same features (bag of words)
  • There is little reason to believe that indicative
    text is correlated with indicative low-level
    image features
  • Most text based systems are beating the image
    based systems, but
  • Distance from the optimal hyperplane can be used as a
    confidence measure for a support vector machine
  • Predictions with high confidence may be more
    accurate than text systems
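The combination idea above amounts to a simple gate. This sketch treats the SVM's distance from the hyperplane as a confidence and defers to the image-based prediction only above a cutoff; using a fixed cutoff parameter is my framing of it, and 1.5 simply echoes a high-accuracy row in the experiments:

```python
def combine_text_image(text_pred, image_pred, svm_distance, cutoff=1.5):
    """Keep the text system's prediction unless the image-based SVM
    is far from its separating hyperplane, i.e. highly confident."""
    if abs(svm_distance) >= cutoff:
        return image_pred    # image SVM confident: trust it
    return text_pred         # otherwise: trust the text system
```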

82
Accuracy of Support Vector Machine Approach Tends
to be Higher when Confidence is Greater
Distance Cutoff  Overall Accuracy  % of Images Above Cutoff
3.5              ---                 0.0
3.0              100.0               0.4
2.5               87.5               1.8
2.0               92.3               5.8
1.5               94.4              16.0
1.0               91.0              34.1
0.5               84.6              70.1
0.0               78.0             100.0
83
The Combination of Text and Image Beats Text Alone:
Most systems show small gains, one has
major improvement
84
Overview
  1. The Main Idea
  2. Description of Corpus
  3. Novel ML Systems
  4. NLP Based System
  5. High-Precision/Low-Recall Rules
  6. Image Features
  7. Newsblaster
  8. Conclusions and Future Work

85
(No Transcript)
86
Newsblaster Categories
U.S. News
World News
Finance
Entertainment
Science/Technology
Sports
87
(No Transcript)
88
(No Transcript)
89
(No Transcript)
90
(No Transcript)
91
(No Transcript)
92
(No Transcript)
93
(No Transcript)
94
Newsblaster
  • A pragmatic showcase for NLP
  • My contributions
  • Extraction of images and captions from web pages
  • Image browsing interface
  • Categorization of stories (clusters) and images
  • Scripts that allow users to suggest labels for
    articles with incorrect predictions

95
Overview
  1. The Main Idea
  2. Description of Corpus
  3. Novel ML Systems
  4. NLP Based System
  5. High-Precision/Low-Recall Rules
  6. Image Features
  7. Newsblaster
  8. Conclusions and Future Work

96
Summary
  • Examined several methods of categorizing images
  • No clear winner for all tasks
  • BINS is very competitive
  • NLP can lead to substantial improvement, at least
    for certain tasks
  • High-precision/low-recall rules are likely to
    improve performance for tough tasks
  • Image features show promise
  • Newsblaster demonstrates pragmatic benefits of my
    work

97
Conclusions
  • TC techniques can be used to categorize images
  • Approach that should be used depends on the
    specific task being considered
  • Important and timely
  • The increasing prevalence of images on the web, large
    corpora of images, and personal collections of
    images
  • Tools will be needed for better browsing,
    searching, and filtering

98
Future Work
  • BINS
  • Explore additional binning features
  • Explore use of unlabeled data
  • NLP and TC
  • Improve current system
  • Explore additional categories
  • Image features
  • Explore additional low-level image features
  • Explore better methods of combining text and
    image
  • Pragmatic benefits
  • Investigate end user applications
  • Expand to video (perhaps using closed captions)

99
And Now the Questions