Title: Data Annotation using Human Computation
1. Data Annotation using Human Computation
- HOANG Cong Duy Vu
- 07/10/2009
2. Outline
- Introduction
- Data Annotation with GWAP
- Data Annotation with AMT
- Characterization of Multiple Dimensions
- Correlation Analysis
- Future Directions
- Conclusion
3. Introduction
- Data annotation refers to the task of adding specific information to raw data.
- In computational linguistics, this includes information such as morphology, POS, syntax, semantics, and discourse.
- In computer vision, it includes information such as image labels, regions, and video descriptions.
4. Introduction
- Annotated data is extremely important for computational learning problems and for training AI algorithms.
- However, it is also very non-trivial to obtain, due to:
  - Ambiguity in processing information
  - A money- and time-consuming, labor-intensive, and error-prone annotation process
5. Introduction
- Motivating facts
  - Gaming data [1]
    - Each day, more than 200 million hours are spent playing games in the U.S.
    - By age 21, the average American has spent more than 10,000 hours playing games, equivalent to five years of working a full-time job at 40 hours per week.
  - With the explosion of web services, we can consider taking advantage of this community popularity.
- How can we leverage this for annotation?
6. Introduction
- Human computation emerges as a viable synergy for data annotation.
- Its main idea is to harness what humans are good at but machines are poor at:
  - Use the ability and speed of a community in solving particular tasks
  - Computer programs can simultaneously serve other purposes (e.g., educational entertainment)
7. Introduction
- What is human computation?
  - A computer science technique in which a computational process performs its function by outsourcing certain steps to humans (from Wikipedia)
  - Also called human-based computation or human computing
  - A more general term: crowdsourcing
- Typical frameworks
  - Game With A Purpose (GWAP)
  - Amazon Mechanical Turk (AMT)
8. Data Annotation with GWAP
- GWAP: Game With A Purpose
  - Pioneered by Luis von Ahn at CMU in his PhD thesis in 2005
- GWAPs are online games with a special mechanism:
  - Humans enjoy playing games provided by computers
  - Humans help computers do annotation tasks implicitly integrated into such games
9. Data Annotation with GWAP
- Developers: build everything for both server and clients
- Bots: developers sometimes create bots that play the role of real players, since the number of players in a GWAP is limited at any given time
- Players: people who play the game, with pairwise interaction
- GUI: Graphical User Interface
- Data sources: the data that need to be annotated
10. Data Annotation with GWAP
- Input-output mechanism of GWAP [1]
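In the output-agreement design described in [1] (the ESP Game style), two randomly paired players see the same input and independently type labels; a label counts as an annotation only when both players produce it. A minimal sketch, where the function name and the taboo-word handling are illustrative assumptions rather than the actual gwap.com implementation:

```python
def output_agreement(guesses_a, guesses_b, taboo=()):
    """Output-agreement round: a label becomes an annotation only when
    both players independently enter it (taboo words are excluded)."""
    agreed = {g.lower() for g in guesses_a} & {g.lower() for g in guesses_b}
    return sorted(agreed - {t.lower() for t in taboo})

# Example round on one image: the players agree only on "dog".
labels = output_agreement(["dog", "puppy", "grass"], ["Dog", "park"])
```

Once a label has been agreed on often enough, it is typically added to the taboo list to force players toward new, more specific labels.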
11. Data Annotation with GWAP
- Screenshot captured from http://www.gwap.com (Computer Vision Game)
12. Data Annotation with GWAP
- Example 2: Video description tagging
- Screenshot captured from http://www.gwap.com (Computer Vision Game)
13. Data Annotation with GWAP
- Example 3: Online word games
- Screenshot captured from http://wordgame.stanford.edu/freeAssociation.html (Natural Language Processing Game)
14. Data Annotation with GWAP
- Recently, various GWAP games have been developed across a wide range of AI domains:
  - Computer Vision: ESPGame, Peekaboom, TagATune, Google Image Labeler, ...
  - Semantic Web: OntoGame, Verbosity
  - Natural Language Processing: online word games (Categorilla, Categodzilla, and Free Association), Phrase Detectives
15. Data Annotation with GWAP
- Results obtained so far
  - ESP Game dataset [1] (CMU): 100,000 images with English labels from ESPGame (1.8 GB)
  - Online word games [2] (Stanford): 800,000 data instances for semantic processing
  - TagATune music data (CMU): 25,863 clips, 5,405 source mp3s with 188 unique tags (from http://musicmachinery.com/2009/04/01/magnatagatune-a-new-research-data-set-for-mir/)
16. Data Annotation with GWAP
- Advantages
  - Free
  - People always love playing games
  - Fun, attractive, and sometimes addictive
- Disadvantages
  - The highly visual design of a game requires much effort
  - Integrating annotation tasks into games is hard, equivalent to thinking up algorithms
  - Very hard to design GWAP games for complex processing tasks
17. Data Annotation with GWAP
- Players have fun and enjoy the games
  - The more players play the game, the more annotated data we obtain.
- Question: if the games are not fun, can we still attract many people to join?
- A viable answer: Amazon Mechanical Turk (AMT)?
18. Data Annotation with AMT
- AMT: Amazon Mechanical Turk
  - One of the tools of Amazon Web Services
  - A wide-ranging marketplace for work
  - Utilizes human intelligence for tasks that computers are unable to do but humans can do effectively
  - Located at https://www.mturk.com/mturk/
19. Data Annotation with AMT
- Screenshot captured from https://www.mturk.com/mturk/
20. Data Annotation with AMT
- Requesters: define tasks, known as HITs (Human Intelligence Tasks), using an interactive GUI or the APIs provided by AMT
- HIT: each HIT lets a requester specify task instructions, required qualifications, duration, and monetary reward
- Broker: web services playing an intermediate role to supply, assist, and unveil everything
- Workers: people who solve HIT tasks to earn money
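To make the HIT notion concrete, the sketch below models what a HIT definition carries (instructions, qualifications, duration, reward, and a repetition count). All field names here are illustrative assumptions, not the real AMT API:

```python
from dataclasses import dataclass, field

@dataclass
class HIT:
    """Illustrative HIT definition; field names are assumptions,
    not the actual AMT request parameters."""
    title: str
    instructions: str
    reward_usd: float               # paid per accepted assignment
    duration_s: int                 # time a worker has to finish
    qualifications: list = field(default_factory=list)
    max_assignments: int = 1        # repetition, used for quality control

hit = HIT(
    title="Label this image",
    instructions="Type 3 words describing the image.",
    reward_usd=0.05,
    duration_s=300,
    qualifications=["approval_rate >= 95%"],
    max_assignments=3,
)
# Worker payments for one fully assigned HIT (before any platform fees).
total_cost = hit.reward_usd * hit.max_assignments
```

Requesters then publish such definitions through AMT and later approve or reject each submitted assignment.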
21. Data Annotation with AMT
22. Data Annotation with AMT
- Statistics related to AMT
  - The AMT service was initially launched publicly in 2005
  - According to a March 2007 report (from http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk), there were more than 100,000 workers in over one hundred countries
- Why can it attract so many participants?
23. Data Annotation with AMT
- AMT seems to have wider reach due to its ease and simplicity
- Results obtained so far
  - Statistics from the Amazon website: 69,452 HITs currently available
  - Some of them make their annotated data public
    - Sorokin [3] (UIUC): 25,000 annotated images at a cost of $800 (about $0.032 per image)
    - Snow [4] (Stanford): linguistic annotation (WSD, temporal ordering, word similarity, ...)
24. Data Annotation with AMT
- Advantages
  - For users
    - HITs are not hard to solve
    - Easy to earn money while still relaxing
  - For developers
    - The APIs provided by AMT help build HITs easily
    - Diverse demographics of users on the Amazon website
    - We can hopefully obtain large-scale annotated data very quickly over time
25. Data Annotation with AMT
- Disadvantages
  - Hard to control and maintain the tradeoff between data quantity and quality
  - Effective strategies are needed
26. Data Annotation with AMT
- Example 1: Word similarity (NLP task)
- Screenshot captured from http://nlpannotations.googlepages.com/wordsim_sample.html
27. Data Annotation with AMT
- Screenshot captured from http://visionpc.cs.uiuc.edu/largescale/protocols/4/index.html (Computer Vision task)
28. Characterization of Multiple Dimensions
- Overview of the dimensions considered
  - Setup Effort: interfaces for the people participating in annotation should be designed to ensure the objective of obtaining large, clean, and useful data
  - Quality of Annotation: the quality of annotation, i.e., the accuracy of annotated outputs
  - Annotation Participation: factors relating to the participants in the annotation process
  - Data Selection: figuring out where and which data need to be annotated
29. Characterization of Multiple Dimensions
- Setup Effort
  - UI design / visual impact
    - The graphical characteristics of the user interface design
    - A substantial factor that determines the efficiency of the annotation process
    - GWAPs need much effort, focused mainly on the GUI, to make the game entertaining enough to motivate players
    - AMT HIT tasks need less effort to build, but should still be designed to be fun, easy, and attractive
  - Scale: none/low/basic/average/distinctive/excellent
30. Characterization of Multiple Dimensions
- Fun
  - A very significant factor, because players simply will not join if GWAP games are not fun
  - Making a game fun is itself a design problem
  - Some ways:
    - Timing (GWAP, AMT)
    - Scores, top scores, top players (GWAP)
    - Levels (GWAP, AMT)
    - Money (AMT)
  - Scale: none/low/fair/high/very high
31. Characterization of Multiple Dimensions
- Payment
  - Makes the annotation process more motivating
  - For example:
    - In GWAP, player pairs earn scores
    - In AMT, workers receive monetary payment or bonuses from requesters
  - Scale: none/score/monetary payment
32. Characterization of Multiple Dimensions
- Cheating
  - Unmotivated or lazy participants sometimes use tricks when doing annotation tasks
  - Some ways to avoid this:
    - Filter players using IP addresses, locations, or training (GWAP)
    - Use qualifications (AMT)
  - Scale: none/low/possible/high (-)
33. Characterization of Multiple Dimensions
- Implementation Cost
  - Various costs:
    - Designing annotation tasks
    - Creating timing controllers
    - Game mechanism (online or offline)
    - Network infrastructure (client/server, peer-to-peer)
    - Records and statistics (user scores, player skill, qualifications)
    - Building intelligent bots
  - Scale: none/low/fair/high/very high
34. Characterization of Multiple Dimensions
- Exposure
  - Relates to social impact; letting people know about the system is very important
  - A GWAP must popularize itself on social webs, contributor sites, and gaming sites
  - AMT, under the umbrella of Amazon's web services, has higher impact
  - Scale: none/low/fair/high
35. Characterization of Multiple Dimensions
- Centralization
  - Measures whether a single entity or owner defines which tasks are presented to workers
  - In the case of GWAP games [1], there are currently five games; on AMT, anyone can define their own tasks for their own evaluation purposes
  - Scale: yes/no (-)
36. Characterization of Multiple Dimensions
- Scale
  - A metric of how many tasks the system will be able to accomplish
  - GWAP can produce extremely large volumes of data, because the operating costs are low
  - AMT scales really well, but it costs money
  - For example:
    - With many millions of tasks to accomplish, GWAP is the better approach
    - At 10,000 tasks, AMT will do well (and requires less effort to set up and to approve submitted tasks)
  - Scale: none/low/fair/high/very high
37. Characterization of Multiple Dimensions
- Annotation Participation
  - Number of participants
    - Utilize people with different skills to improve the diversity and quality of annotated data
    - A small study (http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html) indicated that the demographics of AMT currently correlate with the demographics of Internet users
  - Scale: none/low/fair/high/very high
38. Characterization of Multiple Dimensions
- Motivation
  - Reflects the attractiveness of annotation systems
  - Some reasons to participate:
    - Money
    - Entertainment/fun
    - Killing free time
    - Challenge/self-competition, ...
  - Scale: none/low/fair/high
39. Characterization of Multiple Dimensions
- Interaction
  - Different ways to create interaction among participants
  - Scale: none / multiple without interaction / multiple with pairwise interaction / multiple with multiple interaction
40. Characterization of Multiple Dimensions
- Qualification
  - Restricts workers to ensure that only qualified workers can do the tasks
  - Scale: none/low/fair/high
41. Characterization of Multiple Dimensions
- Data Selection
  - Size
    - Choosing which data resources will be annotated
    - Scale: none/small/fair/large/very large
42. Characterization of Multiple Dimensions
- Coverage
  - Whether the data covers the expected real population and distribution of the data
  - Scale: none/low/fair/high
43. Characterization of Multiple Dimensions
- Quality of Annotation
  - Annotation accuracy
    - Use different strategies to control the quality of annotation
    - Use repetition: do not consider an output correct until a certain number of players have entered it
    - Use post-processing steps to re-evaluate the annotated data
  - Scale: none/low/fair/high/very high
44. Characterization of Multiple Dimensions
- Inter-annotator agreement
  - A means of measuring agreement among data annotators
  - Scale: none/low/fair/high
45. Characterization of Multiple Dimensions
- Quality control
  - Filter bad data out; integrate a correction model to minimize errors during the annotation process
  - For example:
    - In AMT, developers approve all submitted HIT tasks, using a voting threshold to approve answers
    - In GWAP, check all data contributed by players after a fixed time
  - Scale: none/low/fair/high/very high
46. Characterization of Multiple Dimensions
- Usability
  - Annotated data should prove useful and have real-world impact
  - Scale: none/low/fair/high
47. Characterization of Multiple Dimensions
- Annotation Speed
  - Measures how many labels per day/hour/minute can be obtained
  - Scale: none/slow/fair/fast
48. Characterization of Multiple Dimensions
- Annotation Cost
  - Measures the total cost to be paid to get annotated data
  - Scale: none/cheap/fair/expensive
49. Correlation Analysis
- To analyze correlations between dimensions:
  - Collect information on the human computation systems available so far
  - 28 popular systems covering 4 types of human computation
50. Correlation Analysis
- Human computation systems
51. Correlation Analysis
- Pearson correlation matrix between dimensions
- Statistics of ratings of human computation systems according to four dimensions
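A Pearson correlation matrix like the one on this slide is built from pairwise correlations of dimension ratings. Below is a minimal sketch of the coefficient itself, using made-up per-system ratings (the rating values are assumptions, not the study's data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two rating vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical ratings of five systems on two dimensions (0-4 scale).
setup_effort = [1, 2, 3, 4, 2]
data_selection = [1, 3, 3, 4, 1]
r = pearson(setup_effort, data_selection)
```

Computing this for every pair of dimensions over the 28 systems yields the correlation matrix.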
52. Correlation Analysis
- Relationships of Setup Effort vs. Annotation Participation and vs. Data Selection
  - Annotation Participation high, Setup Effort low: people can attract many participants by spending money to employ them for annotation tasks
  - Annotation Participation low, Setup Effort low: this can happen with manual/in-house annotation, e.g., annotating a small amount of domain data with a few expert annotators
  - Annotation Participation low, Setup Effort high: the annotation task is so hard that annotators are not interested in doing it
  - Data Selection high, Setup Effort low: unlikely, because if Data Selection is high the data used for annotation can be large, so setup effort cannot be low
  - Data Selection low, Setup Effort high: not feasible in reality, because Setup Effort is partially based on the organization, storage, and manipulation of the data used in annotation tasks
53. Correlation Analysis
- Relationships of Setup Effort vs. Quality of Annotation, and of Annotation Participation vs. Data Selection
  - In fact, these dimensions do not mutually affect each other.
  - Quality of Annotation high, Setup Effort low: possible for many reasons, e.g., annotation tasks that are simple to set up and also easy for annotators to do
  - Quality of Annotation low, Setup Effort high: the task can be difficult for annotators to annotate
54. Correlation Analysis
- Relationships of Quality of Annotation vs. Annotation Participation, and of Quality of Annotation vs. Data Selection
  - Quality of Annotation high, Data Selection low: can happen when annotating small amounts of data
  - Quality of Annotation low, Data Selection high: large data can make annotation difficult
  - Annotation Participation high, Quality of Annotation low: due to reasons such as very difficult annotation tasks, or poor setup that keeps annotators from doing well
  - Annotation Participation high and Quality of Annotation high: people can employ many expert annotators with high qualification, training, and supervision, which leads to high Quality of Annotation
  - Annotation Participation low, Quality of Annotation high: this case cannot happen in reality
55. Future Directions
- Human computation is an emerging research area, raising several possible research directions [5]:
  - Theories of what makes some human computation tasks fun and addictive
  - Active learning from imperfect human labelers
  - Creation of intelligent bots in human computation games
  - Cost versus reliability of labelers
56. Conclusion
- A comprehensive introduction to human computation frameworks (GWAP and AMT)
- Characterization of multiple dimensions
- Correlation analysis
- Future directions related to human computation
57. Thanks for your attention!
58. References
- [1] Luis von Ahn et al. General Techniques for Designing Games with a Purpose. CACM 2008.
- [2] Vickery et al. Online Word Games for Semantic Data Collection. EMNLP 2008.
- [3] Sorokin et al. Utility Data Annotation with Amazon Mechanical Turk. CVPR Workshops 2008.
- [4] Snow et al. Cheap and Fast -- But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. EMNLP 2008.
- [5] http://www.hcomp2009.org