Title: Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers
1. Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers
Victor Sheng, Foster Provost, Panos Ipeirotis
- Stern School of Business, New York University
2. Outsourcing KDD preprocessing
- Traditionally, data mining teams have invested substantial internal resources in data formulation, information extraction, cleaning, and other preprocessing
  - Raghu, from his Innovation Lecture: "the best you can expect are noisy labels"
- Now we can outsource preprocessing tasks, such as labeling, feature extraction, verifying information extraction, etc.
  - using Mechanical Turk, Rent-a-Coder, etc.
  - quality may be lower than expert labeling (much?)
  - but low costs can allow massive scale
- The ideas may also apply to focusing user-generated tagging, crowdsourcing, etc.
3. ESP Game (by Luis von Ahn)
4. Other free labeling schemes
- Open Mind initiative (www.openmind.org)
- Other GWAP (games with a purpose)
  - Tag a Tune
  - Verbosity (tag words)
  - Matchin (image ranking)
- Web 2.0 systems?
  - Can/should tagging be directed?
5. Noisy labels can be problematic
- Many tasks rely on high-quality labels for objects
  - learning predictive models
  - searching for relevant information
  - finding duplicate database records
  - image recognition/labeling
  - song categorization
- Noisy labels can lead to degraded task performance
6. Quality and Classification Performance
Here, labels are values for the target variable.
- Labeling quality increases → classification quality increases
[Figure: learning curves (classification accuracy vs. number of training examples) for labeler quality P = 1.0, 0.8, 0.6, 0.5]
7. Summary of results
- Repeated labeling can improve data quality and model quality (but not always)
- When labels are noisy, repeated labeling can be preferable to single labeling even when labels aren't particularly cheap
- When labels are relatively cheap, repeated labeling can do much better (omitted)
- Round-robin repeated labeling does well
- Selective repeated labeling improves substantially
8. Our Focus: Labeling Using Multiple Noisy Labelers
- Repeated labeling and data quality
- Repeated labeling and classification quality
- Selective repeated labeling
9. Majority Voting and Label Quality
- Ask multiple labelers; keep the majority label as the "true" label
- Quality is the probability of being correct
[Figure: majority-vote label quality vs. number of labelers, for individual labeler quality P = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0; P is the probability of an individual labeler being correct]
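The curves on this slide follow directly from the binomial distribution: with an odd number n of independent labelers, each correct with probability p, the majority label is correct exactly when more than half of them are. A minimal sketch (the function name is mine):

```python
from math import comb

def majority_quality(p: float, n: int) -> float:
    """Probability that a majority vote of n independent labelers,
    each correct with probability p, yields the correct label.
    Requires odd n so that ties cannot occur."""
    assert n % 2 == 1, "use an odd number of labelers to avoid ties"
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(n // 2 + 1, n + 1))
```

This reproduces the qualitative shape of the figure: quality rises with more labelers when p > 0.5 (e.g., majority_quality(0.8, 5) ≈ 0.942 > 0.8), stays flat at p = 0.5, and falls when p < 0.5.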
10. Tradeoffs for Modeling
- Get more labels → improve label quality → improve classification
- Get more examples → improve classification
[Figure: learning curves for labeler quality P = 1.0, 0.8, 0.6, 0.5]
11. Basic Labeling Strategies
- Single Labeling (SL)
  - get as many data points as possible, one label each
- Round-robin Repeated Labeling
  - Fixed Round Robin (FRR): keep labeling the same set of points
  - Generalized Round Robin (GRR): repeatedly label data points, giving the next label to the point with the fewest labels so far
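GRR as described above can be implemented with a min-heap keyed on the number of labels each point has so far; all names here are illustrative:

```python
import heapq

def generalized_round_robin(examples, acquire_label, budget):
    """Generalized Round Robin (GRR): spend `budget` label acquisitions,
    always giving the next label to the example with the fewest so far.
    `acquire_label(x)` asks one (noisy) labeler for a label of x."""
    # heap entries: (num_labels_so_far, tie_breaker, example_index)
    heap = [(0, i, i) for i in range(len(examples))]
    heapq.heapify(heap)
    labels = {i: [] for i in range(len(examples))}
    for _ in range(budget):
        count, tie, i = heapq.heappop(heap)
        labels[i].append(acquire_label(examples[i]))
        heapq.heappush(heap, (count + 1, tie, i))
    return labels
```

With `acquire_label` simulating a labeler of quality p (e.g., returning the true label with probability 0.6), this yields label multisets whose sizes never differ by more than one across examples.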
12. Fixed Round Robin vs. Single Labeling
[Figure: learning curves, FRR (100 examples) vs. SL]
p = 0.6 (labeling quality), 100 examples
With high noise, repeated labeling is better than single labeling.
13. Fixed Round Robin vs. Single Labeling
[Figure: learning curves, single labeling vs. FRR (50 examples)]
p = 0.8 (labeling quality), 50 examples
With low noise, more (single-labeled) examples are better.
14. Gen. Round Robin vs. Single Labeling
P = labeling quality, k = number of labels per example; here P = 0.6, k = 5
[Figure: learning curves, GRR vs. SL]
Repeated labeling is better than single labeling.
15. Tradeoffs for Modeling
- Get more labels → improve label quality → improve classification
- Get more examples → improve classification
[Figure: learning curves for labeler quality P = 1.0, 0.8, 0.6, 0.5]
16. Selective Repeated-Labeling
- We have seen:
  - with enough examples and noisy labels, getting multiple labels is better than single labeling
  - when we consider costly preprocessing, the benefit is magnified (omitted -- see paper)
- Can we do better than the basic strategies?
- Key observation: we have additional information to guide the selection of data for repeated labeling -- the current multiset of labels
  - Example: {+,-,+,+,-,+} vs. {+,+,+,+}
17. Natural Candidate: Entropy
- Entropy is a natural measure of label uncertainty
  - E(+,+,+,+,+,+) = 0
  - E(+,-,+,-,+,-) = 1
- Strategy: get more labels for examples with high-entropy label multisets
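The two example values above come straight from Shannon entropy of the empirical label distribution (a sketch; the function name is mine):

```python
from math import log2

def label_entropy(pos: int, neg: int) -> float:
    """Shannon entropy (in bits) of the empirical distribution
    of a binary label multiset with `pos` positives and `neg` negatives."""
    n = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # 0 * log(0) is taken as 0
            p = count / n
            h -= p * log2(p)
    return h

# label_entropy(6, 0) = 0.0  (all labels agree)
# label_entropy(3, 3) = 1.0  (maximum disagreement)
```

It also demonstrates the scale invariance criticized on the next slides: (3+, 2-) and (600+, 400-) have identical entropy despite very different evidence.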
18. What Not to Do: Use Entropy
Improves at first, but hurts in the long run.
19. Why Not Entropy?
- In the presence of noise, entropy will be high even with many labels
- Entropy is scale invariant
  - (3+, 2-) has the same entropy as (600+, 400-)
20. Estimating Label Uncertainty (LU)
- Observe +s and -s and compute Pr(+|observations) and Pr(-|observations)
- Label uncertainty = tail of the beta distribution below 0.5
[Figure: beta probability density function over the label proportion (0.0 to 1.0); S_LU is the shaded tail below 0.5]
21. Label Uncertainty
- p = 0.7
- 5 labels: (3+, 2-)
- Entropy ≈ 0.97
- CDF_b = 0.34
22. Label Uncertainty
- p = 0.7
- 10 labels: (7+, 3-)
- Entropy ≈ 0.88
- CDF_b = 0.11
23. Label Uncertainty
- p = 0.7
- 20 labels: (14+, 6-)
- Entropy ≈ 0.88
- CDF_b = 0.04
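The CDF values on slides 21-23 can be reproduced by evaluating a Beta(pos + 1, neg + 1) posterior (i.e., a uniform Beta(1,1) prior, which matches the slide numbers) at 0.5. For integer parameters the regularized incomplete beta function reduces to a binomial tail sum, so no special-function library is needed:

```python
from math import comb

def label_uncertainty(pos: int, neg: int) -> float:
    """S_LU: posterior probability that the true P(+) is below 0.5,
    under a Beta(pos + 1, neg + 1) posterior (uniform prior).
    Uses the identity, for integer a and b:
      I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) * x^j * (1-x)^(a+b-1-j),
    which at x = 0.5 simplifies to C(n, j) * 0.5^n terms."""
    a, b = pos + 1, neg + 1
    n = a + b - 1
    return sum(comb(n, j) * 0.5**n for j in range(a, n + 1))

# Reproduces the slide values:
# (3+, 2-)  -> 0.34   (5 labels)
# (7+, 3-)  -> 0.11   (10 labels)
# (14+, 6-) -> 0.04   (20 labels)
```

Unlike entropy, this score shrinks as consistent evidence accumulates, which is exactly the behavior slides 21-23 illustrate.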
24. Label Uncertainty vs. Round Robin
(Similar results across a dozen data sets.)
25. Recall: Gen. Round Robin vs. Single Labeling
P = labeling quality, k = number of labels per example; here P = 0.6, k = 5
[Figure: learning curves, GRR vs. SL]
Multi-labeling is better than single labeling.
26. Label Uncertainty vs. Round Robin
similar results across a dozen data sets
27. Another Strategy: Model Uncertainty (MU)
- Learning a model of the data provides an alternative source of information about label certainty
- Model uncertainty: get more labels for instances that cannot be modeled well
- Intuition?
  - for data quality: low-certainty regions may be due to incorrect labeling of the corresponding instances
  - for modeling: why improve training-data quality where the model is already certain?
28. Yet Another Strategy: Label & Model Uncertainty (LMU)
- Combine label and model uncertainty (LMU): avoid examples where either strategy is certain
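One simple way to realize "avoid examples where either strategy is certain" is to combine the two scores with a geometric mean, which is driven toward zero whenever either component is small. (Treat this particular combination rule as an assumption of the sketch; the names are mine.)

```python
from math import sqrt

def lmu_score(s_lu: float, s_mu: float) -> float:
    """Combine a label-uncertainty score and a model-uncertainty score.
    The geometric mean is small when EITHER score is small, matching
    the slide's rule: skip examples where either strategy is certain.
    (Assumed combination rule, not confirmed by the slides.)"""
    return sqrt(s_lu * s_mu)

# e.g., labels look uncertain but the model is already confident
# -> low priority for another label:
# lmu_score(0.9, 0.01) ≈ 0.095
```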
29. Comparison
Model Uncertainty alone also improves quality.
[Figure: label quality vs. number of labels for Label & Model Uncertainty, Label Uncertainty, and GRR]
30. Comparison: Model Quality
Across 12 domains, LMU is always better than GRR. LMU is statistically significantly better than LU and MU.
[Figure: model-quality learning curves; Label & Model Uncertainty highlighted]
31. Summary of results
- Micro-task outsourcing (e.g., MTurk, Rent-a-Coder, the ESP game) has changed the landscape for data formulation
- Repeated labeling can improve data quality and model quality (but not always)
- When labels are noisy, repeated labeling can be preferable to single labeling even when labels aren't particularly cheap
- When labels are relatively cheap, repeated labeling can do much better (omitted)
- Round-robin repeated labeling can do well
- Selective repeated labeling improves substantially
32. Opens up many new directions
- Strategies using the learning-curve gradient
- Estimating the quality of each labeler
- Example-conditional quality
- Increased compensation vs. labeler quality
- Multiple real labels
- Truly soft labels
- Selective repeated tagging
33. Thanks! Q & A?
34. What if different labelers have different qualities?
- (Sometimes) the quality of multiple noisy labelers is better than the quality of the best labeler in the set
  - here, 3 labelers with qualities p-d, p, p+d
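The three-labeler claim is easy to check by enumeration: the majority is correct whenever at least two of the three labelers are. A sketch (the function name is mine):

```python
def majority3(p1: float, p2: float, p3: float) -> float:
    """Probability that a majority vote of three independent labelers
    (correct with probabilities p1, p2, p3) is correct, i.e., that
    at least two of the three labelers are correct."""
    return (p1 * p2 * p3
            + p1 * p2 * (1 - p3)
            + p1 * (1 - p2) * p3
            + (1 - p1) * p2 * p3)

# With qualities p - d, p, p + d for p = 0.8, d = 0.1:
# majority3(0.7, 0.8, 0.9) = 0.902 > 0.9, the quality of the best labeler
```

So for these values the vote of three unequal labelers beats even the best individual among them.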
35. Mechanical Turk Example
36. Estimating Labeler Quality
- (Dawid & Skene, 1979): multiple diagnoses
  - Initially assume equal qualities
  - Estimate true labels for the examples
  - Estimate the qualities of the labelers given the true labels
  - Repeat until convergence