GATE Evaluation Tools - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: GATE Evaluation Tools


1
GATE Evaluation Tools
  • GATE Training Course
  • October 2006
  • Kalina Bontcheva

2
System development cycle
  1. Collect corpus of texts
  2. Manually annotate a gold standard
  3. Develop system
  4. Evaluate performance
  5. Go back to step 3 until the desired performance
    is reached

3
Corpora and System Development
  • Gold standard data is created by manual annotation
  • Corpora are typically divided into a training and
    a testing portion
  • Rules and/or learning algorithms are developed or
    trained on the training part
  • They are tuned on the testing portion in order to
    optimise
  • Rule priorities, rule effectiveness, etc.
  • Parameters of the learning algorithm and the
    features used (typical routine: 10-fold cross
    validation; see the sketch after this list)
  • Evaluation set: the best system configuration is
    run on this data and the system performance is
    obtained
  • No further tuning once the evaluation set is used!
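The training/testing split and the 10-fold routine above can be illustrated with a
short, self-contained sketch. This is not GATE code; the document names, fold count
and class name are invented purely for illustration.

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;

  // Minimal sketch of a 10-fold split over a corpus of document names.
  public class FoldSplitSketch {
      public static void main(String[] args) {
          List<String> docs = new ArrayList<>();
          for (int i = 1; i <= 100; i++) {
              docs.add("doc" + i + ".xml");      // placeholder document names
          }
          Collections.shuffle(docs);             // randomise before splitting

          int k = 10;                            // 10-fold cross-validation
          for (int fold = 0; fold < k; fold++) {
              List<String> test = new ArrayList<>();
              List<String> train = new ArrayList<>();
              for (int i = 0; i < docs.size(); i++) {
                  if (i % k == fold) {
                      test.add(docs.get(i));     // held-out portion for this fold
                  } else {
                      train.add(docs.get(i));    // used to develop rules / train the learner
                  }
              }
              System.out.printf("fold %d: %d train, %d test%n",
                                fold, train.size(), test.size());
          }
      }
  }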

4
Some NE Annotated Corpora
  • MUC-6 and MUC-7 corpora - English
  • CONLL shared task corpora
    http://cnts.uia.ac.be/conll2003/ner/ - NEs in English and German
    http://cnts.uia.ac.be/conll2002/ner/ - NEs in Spanish and Dutch
  • TIDES surprise language exercise (NEs in Cebuano
    and Hindi)
  • ACE English - http://www.ldc.upenn.edu/Projects/ACE/


6
The MUC-7 corpus
  • 100 documents in SGML
  • News domain
  • Named Entities:
  • 1880 Organizations (46%)
  • 1324 Locations (32%)
  • 887 Persons (22%)
  • Inter-annotator agreement very high (97%)
  • http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_proceedings/marsh_slides.pdf

7
The MUC-7 Corpus (2)
  • <ENAMEX TYPE="LOCATION">CAPE CANAVERAL</ENAMEX>,
    <ENAMEX TYPE="LOCATION">Fla.</ENAMEX> &MD;
    Working in chilly temperatures
    <TIMEX TYPE="DATE">Wednesday</TIMEX>
    <TIMEX TYPE="TIME">night</TIMEX>,
    <ENAMEX TYPE="ORGANIZATION">NASA</ENAMEX> ground crews
    readied the space shuttle Endeavour for launch on
    a Japanese satellite retrieval mission.
  • <p>
  • Endeavour, with an international crew of six, was
    set to blast off from the
    <ENAMEX TYPE="ORGANIZATION|LOCATION">Kennedy Space Center</ENAMEX>
    on <TIMEX TYPE="DATE">Thursday</TIMEX> at
    <TIMEX TYPE="TIME">4:18 a.m. EST</TIMEX>,
    the start of a 49-minute launching period. The
    <TIMEX TYPE="DATE">nine day</TIMEX> shuttle
    flight was to be the 12th launched in darkness.

8
ACE: Towards Semantic Tagging of Entities
  • MUC NE tags segments of text whenever that text
    represents the name of an entity
  • In ACE (Automated Content Extraction), these
    names are viewed as mentions of the underlying
    entities. The main task is to detect (or infer)
    the mentions in the text of the entities
    themselves
  • Rolls together the NE and CO tasks
  • Domain- and genre-independent approaches
  • ACE corpus contains newswire, broadcast news (ASR
    output and cleaned), and newspaper reports (OCR
    output and cleaned)

9
ACE Entities
  • Dealing with:
  • Proper names, e.g., England, Mr. Smith, IBM
  • Pronouns, e.g., he, she, it
  • Nominal mentions, e.g., the company, the spokesman
  • Identify which mentions in the text refer to
    which entities, e.g.,
  • Tony Blair, Mr. Blair, he, the prime minister, he
  • Gordon Brown, he, Mr. Brown, the chancellor

10
ACE Example
  <entity ID="ft-airlines-27-jul-2001-2"
          GENERIC="FALSE"
          entity_type="ORGANIZATION">
    <entity_mention ID="M003"
                    TYPE="NAME"
                    string="National Air Traffic Services">
    </entity_mention>
    <entity_mention ID="M004"
                    TYPE="NAME"
                    string="NATS">
    </entity_mention>
    <entity_mention ID="M005"
                    TYPE="PRO"
                    string="its">
    </entity_mention>
    <entity_mention ID="M006"
                    TYPE="NAME"
                    string="Nats">
    </entity_mention>

11
Annotate Gold Standard: Manual Annotation in GATE GUI
12
Ontology-Based Annotation (coming in GATE 4.0)
13
Two GATE evaluation tools
  • AnnotationDiff
  • Corpus Benchmark Tool

14
AnnotationDiff
  • Graphical comparison of 2 sets of annotations
  • Visual diff representation, like tkdiff
  • Compares one document at a time, one annotation
    type at a time
  • Gives scores for precision, recall, F-measure,
    etc. (see the worked sketch after this list)
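The precision, recall and F-measure figures that AnnotationDiff reports follow the
standard definitions: precision is the share of the system's annotations that match
the gold standard, recall is the share of gold-standard annotations the system found,
and F-measure is their harmonic mean. The sketch below is a stand-alone calculation
over correct/spurious/missing counts with invented numbers; it is not the
AnnotationDiff source code.

  // Stand-alone sketch of the standard precision/recall/F-measure calculation.
  public class PRFSketch {
      static double precision(int correct, int spurious) {
          return correct / (double) (correct + spurious);   // correct / all system annotations
      }
      static double recall(int correct, int missing) {
          return correct / (double) (correct + missing);    // correct / all gold-standard annotations
      }
      static double fMeasure(double p, double r) {
          return (p + r == 0) ? 0.0 : 2 * p * r / (p + r);  // harmonic mean of P and R
      }
      public static void main(String[] args) {
          int correct = 90, spurious = 10, missing = 20;    // illustrative counts only
          double p = precision(correct, spurious);
          double r = recall(correct, missing);
          System.out.printf("P=%.2f R=%.2f F1=%.2f%n", p, r, fMeasure(p, r));
      }
  }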

15
Annotation Diff
16
Corpus Benchmark Tool
  • Compares annotations at the corpus level
  • Compares all annotation types at the same time,
    i.e. gives an overall score, as well as a score
    for each annotation type
  • Enables regression testing, i.e. comparison of 2
    different versions against gold standard
  • Visual display, can be exported to HTML
  • Granularity of results: the user can decide how
    much information to display
  • Results in terms of Precision, Recall, F-measure

17
Corpus structure
  • The corpus benchmark tool requires a particular
    directory structure
  • Each corpus must have a clean and a marked
    directory
  • Clean holds the unannotated versions, while marked
    holds the annotated (gold standard) ones
  • There may also be a processed subdirectory; this
    is a datastore (unlike the other two)
  • Corresponding files in each subdirectory must have
    the same name (an example layout is sketched below)
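For example, a corpus prepared for the tool might be laid out as below. The corpus
and file names are invented; only the clean/marked/processed naming and the
matching-file-names requirement follow the slide.

  mycorpus/
    clean/        unannotated documents
      doc01.xml
      doc02.xml
    marked/       gold-standard (manually annotated) versions, same file names
      doc01.xml
      doc02.xml
    processed/    a GATE datastore holding previously stored system output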

18
How it works
  • Uses the clean, marked, and processed directories
  • corpus_tool.properties must be in the directory
    from which GATE is executed
  • It specifies configuration information about:
  • Which annotation types are to be evaluated
  • The threshold below which to print out debug info
  • The input set name and key set name
    (an illustrative file is sketched after this list)
  • Modes:
  • Default: regression testing
  • Human-marked against already stored, processed
    results
  • Human-marked against current processing results
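A configuration file covering the items above might look roughly like the sketch
below. The property names are assumed placeholders chosen to match the slide's
description, not verified keys; check the GATE documentation for the exact names
the tool expects.

  # corpus_tool.properties -- illustrative sketch only; property names are assumed
  # annotation types to be evaluated
  annotTypes=Person;Organization;Location
  # threshold below which per-document debug information is printed
  threshold=0.5
  # annotation set holding the gold-standard (key) annotations
  keySetName=Key
  # annotation set holding the system's (response) annotations
  responseSetName=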

19
Corpus Benchmark Tool