Data Annotation using Human Computation
  • HOANG Cong Duy Vu
  • 07/10/2009

  • Introduction
  • Data Annotation with GWAP
  • Data Annotation with AMT
  • Characterization of Multiple Dimensions
  • Correlation Analysis
  • Future Directions
  • Conclusion

  • Data annotation refers to the task of adding
    specific information to raw data
  • In computational linguistics: information such
    as morphology, POS, syntax, semantics, ...
  • In computer vision: information such as image
    labels, regions, video descriptions, ...

  • Annotated data is extremely important for
    computational learning problems and training AI
  • But it is also very non-trivial to obtain, due
    to:
  • Ambiguity in processing information
  • A money- and time-consuming, labor-intensive
    and error-prone process

  • Motivating facts
  • Gaming data [1]:
  • Each day, more than 200 million hours are spent
    playing games in the U.S.
  • By age 21, the average American has spent more
    than 10,000 hours playing games, equivalent to
    five years of working a full-time job at 40
    hours per week
  • With the explosion of web services, consider
    taking advantage of this community popularity

How can we leverage this for annotation?
  • Human computation emerges as a viable synergy for
    data annotation.
  • Its main idea is to harness what humans are good
    at but machines are poor at.
  • Use the ability and speed of a community to
    solve particular tasks
  • Computer programs can simultaneously be used for
    other purposes (e.g. educational entertainment)

  • What is human computation?
  • Human computation is a CS technique in which a
    computational process performs its function by
    outsourcing certain steps to humans (from
    Wikipedia)
  • Also called human-based computation, human ...
  • A more general term: crowdsourcing
  • Typical frameworks:
  • Game With A Purpose (GWAP)
  • Amazon Mechanical Turk (AMT)

Data Annotation with GWAP
  • GWAP - Game With A Purpose
  • Pioneered by Luis von Ahn at CMU in his PhD
    thesis in 2005
  • GWAPs are online games with a special
    mechanism:
  • Humans enjoy playing games provided by computers
  • While playing, humans help computers with the
    implicit annotation tasks integrated into such
    games

Data Annotation with GWAP
  • How does GWAP work?

  • Developers: build everything for both server
    and clients
  • Bots: developers sometimes create bots that act
    as real players, since the number of players in
    a GWAP is limited at any given time
  • Players: people who play the game, with
    pairwise interaction
  • GUI: Graphical User Interface
  • Data sources: the data that need to be
    annotated

Data Annotation with GWAP
  • Input-output mechanism of GWAP [1]

Data Annotation with GWAP
  • Example 1: Image labeling

Captured from http//
Computer Vision Game

Data Annotation with GWAP
  • Example 2: Video description tagging

Captured from http//
Computer Vision Game

Data Annotation with GWAP
  • Example 3: Online word games

Captured from http//
Natural Language Processing Game

Data Annotation with GWAP
  • Recently, various GWAP games have been
    developed across a wide range of AI domains
  • Computer Vision: ESPGame, Peekaboom, TagATune,
    Google Image Labeler, ...
  • Semantic web: OntoGame, Verbosity
  • Natural Language Processing: Word Online Games
    (Categorilla, Categodzilla, and Free
    Association), Phrase Detectives

Data Annotation with GWAP
  • Results obtained so far
  • ESP Game dataset [1] (CMU): 100,000 images with
    English labels from ESPGame (1.8 GB)
  • Online word games [2] (Stanford): 800,000 data
    instances for semantic processing
  • TagATune music data [1] (CMU): 25,863 clips
    from 5,405 source MP3s, with 188 unique tags

[1] from http//

Data Annotation with GWAP
  • Advantages
  • Free
  • People always love playing games
  • Fun, attractive and sometimes addictive
  • Disadvantages
  • Highly visual game design requires much effort
  • Integrating annotation tasks into games is
    hard, comparable to thinking up algorithms
  • Very hard to design GWAP games for complex
    processing tasks

Data Annotation with GWAP
  • Players have fun and enjoy the games
  • The more players play the game, the more
    annotated data is obtained
  • Question: if the games are not fun, can we
    still attract many people to join?
  • A viable answer: Amazon Mechanical Turk (AMT)?

Data Annotation with AMT
  • AMT: Amazon Mechanical Turk
  • One of the tools of Amazon Web Services
  • A wide-ranging marketplace for work
  • Uses human intelligence to perform tasks that
    computers are unable to do but humans can do
    effectively
  • Located at https//

Data Annotation with AMT
Captured from https//

Data Annotation with AMT
  • How does AMT work?

  • Requesters: define tasks, known as HITs (Human
    Intelligence Tasks), using an interactive GUI
    or the APIs provided by AMT
  • HIT: each HIT lets requesters specify task
    instructions, required qualifications,
    duration, and a monetary reward
  • Broker: web services playing an intermediary
    role to supply, assist and expose everything
  • Workers: people who want to solve HITs to earn
    money

Data Annotation with AMT
  • An example of a HIT
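
The parts of a HIT named above (instructions, required qualifications, duration, reward) can be pictured as a small record. The Python sketch below is only an illustration: the class name `Hit`, its field names, the `is_eligible` helper and the sample values are all hypothetical and are not the actual AMT API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Hit:
    """Hypothetical record for one Human Intelligence Task (not the AMT API)."""
    instructions: str              # what the worker is asked to do
    reward_usd: float              # monetary reward per completed task (assumed unit)
    duration_minutes: int          # time allowed to finish the task
    required_qualifications: List[str] = field(default_factory=list)

    def is_eligible(self, worker_qualifications: Set[str]) -> bool:
        """A worker may take the HIT only if they hold every required qualification."""
        return set(self.required_qualifications) <= worker_qualifications


# Made-up example values, loosely modeled on a word-similarity task.
hit = Hit(
    instructions="Rate the similarity of the two words from 0 to 10.",
    reward_usd=0.02,
    duration_minutes=5,
    required_qualifications=["english"],
)
print(hit.is_eligible({"english", "image-labeling"}))  # True
print(hit.is_eligible(set()))                          # False
```

In this sketch the broker's role would be to store such records and show each one only to workers for whom `is_eligible` holds.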

Data Annotation with AMT
  • Statistics related to AMT
  • The AMT service was first launched publicly in
    2005
  • According to a report [1] in March 2007, there
    were more than 100,000 workers in over one
    hundred countries

Why can it attract so many participants?
[1] from http//

Data Annotation with AMT
  • AMT seems to have a wider reach due to its ease
    and simplicity
  • Results obtained so far
  • Statistics from the Amazon website:
  • 69,452 HITs currently available
  • Some requesters make their annotated data
    public:
  • Sorokin [3] (UIUC): 25,000 annotated images at
    a cost of $800
  • Snow [4] (Stanford): linguistic annotation
    (WSD, temporal ordering, word similarity, ...)

Data Annotation with AMT
  • Advantages
  • For users
  • HITs are not hard to solve
  • Easy to earn money while still relaxing
  • For developers
  • APIs provided by AMT make it easy to build HITs
  • Diverse demographics of users on the Amazon
    website
  • Potential to obtain large-scale annotated data
    quickly

Data Annotation with AMT
  • Disadvantages
  • Hard to control and maintain the tradeoff between
    data quantity and quality
  • Need effective strategies

Data Annotation with AMT
  • Example 1: Word Similarity

Captured from http//
NLP task

Data Annotation with AMT
  • Example 2: Image Labeling

Captured from http//
CV task

Characterization of Multiple Dimensions
  • Overview of the dimensions considered

  • Setup Effort: creating the interfaces that
    interact with the people participating in the
    annotation process; these should be designed to
    ensure the objective of obtaining large, clean
    and useful data
  • Quality of Annotation: the quality of the
    annotation, or accuracy of the annotated outputs
  • Annotation Participation: factors relating to
    the participants in the annotation process
  • Data Selection: figuring out where and which
    data need to be annotated

Characterization of Multiple Dimensions
  • Setup Effort
  • UI design/Visual impact
  • Graphical characteristics of the user interface
  • A substantial factor that determines the
    efficiency of the annotation process
  • GWAPs need much effort, focused mainly on the
    GUI, to make the game entertaining enough to
    motivate players
  • AMT needs little effort to build HITs, but they
    should still be designed to be fun and easy
  • Scale - none/low/basic/average/distinctive/excellent

Characterization of Multiple Dimensions
  • Fun
  • A very significant factor, simply because
    players will not join if GWAP games are not fun
  • Making a game fun is akin to designing
    algorithms
  • Some ways
  • Timing (GWAP and AMT)
  • Scores, top scores, top players (GWAP)
  • Levels (GWAP and AMT)
  • Money (AMT)
  • Scale - none/low/fair/high/very high

Characterization of Multiple Dimensions
  • Payment
  • Makes participants more motivated in the
    annotation process
  • For example:
  • In GWAP, player pairs are rewarded with higher
    scores
  • In AMT, workers receive monetary payment or
    bonuses from requesters
  • Scale - none/score/monetary payment

Characterization of Multiple Dimensions
  • Cheating
  • Sometimes, unmotivated and lazy participants use
    some tricks when doing annotation tasks
  • Some ways to avoid this:
  • Filter players using IP address, location,
    training (GWAP)
  • Use qualifications (AMT)
  • Scale - none/low/possible/high (-)

Characterization of Multiple Dimensions
  • Implementation Cost
  • Various costs
  • Designing annotation tasks
  • Creation of timing controllers
  • Game mechanism (online or offline)
  • Network infrastructure (client/server, ...)
  • Records and statistics (user scores, player
    skill, ...)
  • Building intelligent bots
  • Scale - none/low/fair/high/very high

Characterization of Multiple Dimensions
  • Exposure
  • Relates to social impact: letting people know
    about the system is very important
  • A GWAP must do this itself, by promoting the
    game on social networks, contributor sites and
    gaming sites
  • AMT is under the umbrella of Amazon's web
    services -> higher impact
  • Scale - none/low/fair/high

Characterization of Multiple Dimensions
  • Centralization
  • Measures whether there is a single entity or
    owner that defines which tasks are presented
    to workers
  • In the case of GWAP games [1], there are
    currently 5 games. For AMT, anyone can define
    their own tasks for their own evaluation
    purposes
  • Scale - yes/no (-)

Characterization of Multiple Dimensions
  • Scale
  • A measure of how many tasks the system will be
    able to accomplish
  • GWAP can produce extremely large volumes of
    data, because the operating costs are low
  • AMT scales really well, but it costs money
  • For example:
  • If we have many millions of tasks to
    accomplish, GWAP is a better approach
  • At 10,000 tasks, AMT will do well (and requires
    less effort to set up and less effort to
    approve submitted tasks)
  • Scale - none/low/fair/high/very high
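
The trade-off between the two frameworks at different task counts can be sketched as a simple linear cost model: a fixed setup cost plus a cost per task. The Python sketch below is only an illustration; every number in it is a made-up assumption, chosen so that GWAP has a high setup cost and a near-zero per-task cost while AMT has the opposite profile. None of the figures come from the slides.

```python
def total_cost(setup_cost: float, cost_per_task: float, n_tasks: int) -> float:
    """Linear cost model: one-off setup cost plus a cost for every task."""
    return setup_cost + cost_per_task * n_tasks


# Hypothetical cost profiles (assumed values, for illustration only).
GWAP = {"setup_cost": 50_000.0, "cost_per_task": 0.0005}  # expensive to build, cheap to run
AMT = {"setup_cost": 500.0, "cost_per_task": 0.03}        # cheap to build, paid per task


def cheaper_platform(n_tasks: int) -> str:
    """Return the name of the platform with the lower total cost for n_tasks."""
    gwap = total_cost(n_tasks=n_tasks, **GWAP)
    amt = total_cost(n_tasks=n_tasks, **AMT)
    return "GWAP" if gwap < amt else "AMT"


print(cheaper_platform(10_000))      # AMT
print(cheaper_platform(10_000_000))  # GWAP
```

Under these assumed numbers the model reproduces the rule of thumb above: AMT wins at 10,000 tasks, GWAP wins at many millions.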

Characterization of Multiple Dimensions
  • Annotation participation
  • Number of participants
  • Use people with different skills to improve the
    diversity and quality of the annotated data
  • A small study [6] indicated that the
    demographics of AMT workers currently correlate
    with the demographics of Internet users
  • Scale - none/low/fair/high/very high

[6] http//

Characterization of Multiple Dimensions
  • Motivation
  • Reflects the attractiveness of the annotation
    system
  • Some reasons to participate:
  • for money
  • for entertainment/fun
  • for killing free time
  • for challenge/self-competition, ...
  • Scale - none/low/fair/high

Characterization of Multiple Dimensions
  • Interaction
  • Different ways to create interaction among
    participants
  • Scale - none/multiple without interaction/multiple
    with pair-wise interaction/multiple with
    multiple interaction

Characterization of Multiple Dimensions
  • Qualification
  • Restrict tasks so that only qualified workers
    can do them
  • Scale - none/low/fair/high

Characterization of Multiple Dimensions
  • Data Selection
  • Size
  • choose which data resources will be annotated
  • Scale - none/small/fair/large/very large

Characterization of Multiple Dimensions
  • Coverage
  • Whether the data covers the expected real
    population and distribution of the data
  • Scale - none/low/fair/high

Characterization of Multiple Dimensions
  • Quality of Annotation
  • Annotation accuracy
  • Use different strategies to control the quality
    of annotation:
  • Use repetition: do not consider an output
    correct until a certain number of players have
    entered it
  • Use post-processing steps to re-evaluate the
    annotated data
  • Scale - none/low/fair/high/very high
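
The repetition strategy can be sketched in a few lines of Python. This is a minimal illustration, not the mechanism of any particular game: the function name `accepted_labels`, the `(player_id, label)` input format, the case-insensitive matching and the default threshold of three players are all assumptions.

```python
from typing import Dict, Iterable, Set, Tuple


def accepted_labels(entries: Iterable[Tuple[str, str]], min_players: int = 3) -> Set[str]:
    """Accept a label only once at least `min_players` distinct players entered it."""
    players_per_label: Dict[str, Set[str]] = {}
    for player_id, label in entries:
        normalized = label.strip().lower()  # treat "Dog" and "dog " as the same label
        players_per_label.setdefault(normalized, set()).add(player_id)
    return {
        label
        for label, players in players_per_label.items()
        if len(players) >= min_players
    }


# Hypothetical entries for one image; p1 typing "dog" twice counts only once.
entries = [
    ("p1", "dog"), ("p2", "Dog"), ("p3", "dog"), ("p1", "dog"),
    ("p4", "puppy"), ("p2", "puppy"),
]
print(accepted_labels(entries))  # {'dog'}
```

Counting distinct players rather than raw entries keeps a single participant from pushing a label over the threshold alone.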

Characterization of Multiple Dimensions
  • Inter-annotator agreement
  • A measure of the agreement among data
    annotators
  • Scale - none/low/fair/high
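
The slides do not say which agreement measure is meant. As one common choice, the Python sketch below computes Cohen's kappa for two annotators, which corrects the raw agreement rate for the agreement expected by chance. The function name and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # Fraction of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on four items.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.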

Characterization of Multiple Dimensions
  • Quality control
  • Filter out the bad data; integrate a correction
    model to minimize errors during annotation
  • For example:
  • In AMT, developers review all submitted HITs
    and use a voting threshold to approve them
  • In GWAP, check all data contributed by players
    after a fixed time
  • Scale - none/low/fair/high/very high
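
A voting threshold for approving submitted answers might look like the Python sketch below. It is an assumption about how such a rule could work, not AMT's own approval mechanism: the function name `approve_by_vote` and the default threshold of 60% are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def approve_by_vote(answers: Sequence[str], threshold: float = 0.6) -> Optional[str]:
    """Return the majority answer if its share of the votes reaches `threshold`, else None."""
    if not answers:
        return None
    label, votes = Counter(answers).most_common(1)[0]
    return label if votes / len(answers) >= threshold else None


# Hypothetical answers from several workers for the same task.
print(approve_by_vote(["cat", "cat", "dog"]))  # cat
print(approve_by_vote(["cat", "dog"]))         # None
```

Tasks for which the function returns `None` could be sent to more workers or checked by hand, which is one way to trade data quantity against quality.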

Characterization of Multiple Dimensions
  • Usability
  • Annotated data should prove useful and have a
    real-world impact
  • Scale - none/low/fair/high

Characterization of Multiple Dimensions
  • Annotation Speed
  • Measures how many labels can be obtained per
    day/hour/minute
  • Scale - none/slow/fair/fast

Characterization of Multiple Dimensions
  • Annotation Cost
  • Measures the total cost to be paid to get the
    annotated data
  • Scale - none/cheap/fair/expensive
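
Annotation cost and speed are easy to compare across systems once they are normalized per label. The Python sketch below uses the Sorokin figures quoted earlier in this deck (25,000 annotated images for a cost of 800, with US dollars as an assumed unit). The function names and the speed example (1,200 labels in 8 hours) are hypothetical.

```python
def cost_per_label(total_cost: float, n_labels: int) -> float:
    """Average cost of one label."""
    if n_labels <= 0:
        raise ValueError("n_labels must be positive")
    return total_cost / n_labels


def labels_per_hour(n_labels: int, hours: float) -> float:
    """Average annotation speed in labels per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return n_labels / hours


# Figures from the deck: 25,000 images annotated for a total cost of 800.
print(cost_per_label(800, 25_000))  # 0.032
# Made-up speed example: 1,200 labels collected in 8 hours.
print(labels_per_hour(1_200, 8))    # 150.0
```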

Correlation Analysis
  • To analyze correlation between dimensions
  • Collect information on the human computation
    systems available so far
  • 28 popular systems with 4 types of human

Correlation Analysis
  • Human computation systems

Correlation Analysis
Pearson correlation matrix between dimensions
Statistics of the ratings of human computation systems according to the four dimensions
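
A Pearson correlation matrix over such ratings can be computed by first mapping the ordinal scale labels to numbers. The Python sketch below shows one way to do it. The numeric encoding and the five sample ratings per dimension are assumptions made for illustration; they are not the ratings of the 28 systems analyzed in the slides.

```python
from math import sqrt
from typing import Dict, List, Sequence

# Assumed ordinal encoding of the rating scale used in this deck.
SCALE = {"none": 0, "low": 1, "fair": 2, "high": 3, "very high": 4}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings of five systems on the four top-level dimensions.
ratings: Dict[str, List[str]] = {
    "Setup Effort": ["high", "very high", "low", "fair", "low"],
    "Annotation Participation": ["high", "high", "fair", "low", "very high"],
    "Data Selection": ["fair", "high", "low", "low", "very high"],
    "Quality of Annotation": ["high", "fair", "high", "very high", "fair"],
}
encoded = {dim: [SCALE[v] for v in vals] for dim, vals in ratings.items()}
matrix = {a: {b: pearson(encoded[a], encoded[b]) for b in encoded} for a in encoded}

for dim, row in matrix.items():
    print(dim, [round(r, 2) for r in row.values()])
```

Treating an ordinal scale as evenly spaced numbers is itself a simplification; a rank-based coefficient such as Spearman's would avoid that assumption.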
Correlation Analysis
Relationships between Setup Effort vs. Annotation
Participation and Data Selection
  • Annotation Participation is high and Setup
    Effort is low: in this case, many participants
    can be attracted by spending a lot of money to
    employ them for annotation tasks
  • Annotation Participation is low and Setup
    Effort is also low: this can happen with
    manual/in-house annotation, e.g. when a small
    amount of domain data is annotated by a few
    expert annotators
  • Annotation Participation is low and Setup
    Effort is high: the annotation task is so hard
    that annotators are not interested in doing it
  • Data Selection is high and Setup Effort is low:
    this is unlikely, because if Data Selection is
    high the data used for annotation can be large,
    so the setup effort cannot be low
  • Data Selection is low and Setup Effort is high:
    not feasible in reality, because Setup Effort
    is partially based on the organization, storage
    and manipulation of the data used in annotation
    tasks

Correlation Analysis
Relationships between Setup Effort vs. Quality of
Annotation and Annotation Participation and Data
Selection
  • In fact, these dimensions do not affect each
    other
  • Quality of Annotation is high and Setup Effort
    is low: this can have many reasons, e.g. the
    annotation tasks are simple to set up and also
    easy for annotators to do
  • Quality of Annotation is low and Setup Effort
    is high: the task can be difficult for
    annotators to annotate

Correlation Analysis
Relationships between Quality of Annotation vs.
Annotation Participation and Quality of
Annotation vs. Data Selection
  • Quality of Annotation is high and Data
    Selection is low: this can happen when a small
    amount of data is annotated
  • Quality of Annotation is low and Data Selection
    is high: large data can lead to difficulty in
    annotation
  • Annotation Participation is high and Quality of
    Annotation is low: this can happen because the
    annotation tasks are too difficult, or because
    Setup Effort is poor, so that annotators cannot
    do well
  • Both Annotation Participation and Quality of
    Annotation are high: many expert annotators can
    be employed, with high qualification, training
    and supervision, which leads to high Quality of
    Annotation
  • Annotation Participation is low and Quality of
    Annotation is high: this case cannot happen in
    reality

Future Directions
  • Human computation is an emerging research area,
    raising several possible research directions
    [5]:
  • Theories about what makes some human computation
    tasks fun and addictive
  • Active learning from imperfect human labelers
  • Creation of intelligent bots in human computation
  • Cost versus reliability of labelers

  • A comprehensive introduction to human
    computation frameworks (GWAP and AMT)
  • Characterization of multiple dimensions
  • Some additional correlation analysis
  • Future directions relating to human computation

Thanks for your attendance!
  • Questions are welcome!

  • 1. Luis von Ahn et al. General Techniques for
    Designing Games with a Purpose. CACM 2008
  • 2. Vickrey et al. Online Word Games for
    Semantic Data Collection. EMNLP 2008
  • 3. Sorokin et al. Utility data annotation with
    Amazon Mechanical Turk. CVPR Workshops 2008
  • 4. Snow et al. Cheap and Fast -- But is it
    Good? Evaluating Non-Expert Annotations for
    Natural Language Tasks. EMNLP 2008
  • 5. http//