Evaluation metrics for Natural Language Processing

Slides: 22
Provided by: let9


1
Evaluation metrics for Natural Language Processing
  • Richard Krooman

2
Evaluation
  • What is evaluation
  • NLP systems in a nutshell
  • NLP evaluations
  • Example metrics
  • Problems and solutions
  • Metrics
  • Human Metric

3
What is evaluation
  • Measure how well a system works
  • Wikipedia: "Evaluation is systematic
    determination of merit, worth, and significance
    of something or someone using criteria against a
    set of standards."

4
NLP Systems in a nutshell
  • Some text or data
  • Compute a best text
  • The resulting text
  • How well does the text reflect what we want?

5
NLP evaluations
  • Create the result ourselves and compare
    (in testing the correct result is known)
  • How well does the result show what we want to
    know from the input?
  • Is it understandable?
  • Is it well written?

6
Example metrics
  • A number describing a text
  • text
  • Generated text / Target text
  • words / text
  • correctly placed letters
  • Etc.

7
Problems and solutions
  • Problems
    • Computing metrics is easy, interpreting them is
      hard
    • Humans don't think alike
  • Solutions
    • Create a metric which evaluates results as a
      human would
    • Take the average over many human judgments

8
Metrics overview
  • Simple String Accuracy
  • Generated String Accuracy
  • Simple Tree Accuracy
  • Generated Tree Accuracy
  • Human Metric

9
Metrics: Simple String Accuracy
  • Compare the correct result to the generated result
    at the word level, counting
  • Insertions (I)
  • Deletions (D)
  • Substitutions (S)
  • Result
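
The formula that followed "Result" did not survive in this transcript. A commonly used definition of simple string accuracy, assumed here rather than taken from the slide, divides the number of edit operations by the length of the correct (target) string. The symbol R for that length is introduced only for this sketch:

```latex
\[
\text{Simple String Accuracy} = 1 - \frac{I + D + S}{R}
\]
```

Under this definition a perfect match scores 1, and the score can drop below 0 when the number of edits exceeds R.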

10
Min Edit Example
  • The same procedure, but at the character level

11
Example
  • There was estimate for phase the second no cost
  • There was estimate for phase the second no (D)
  • There was estimate for phase the second phase (S)
  • There was estimate for (D) the second phase
  • There was no (I) estimate for the second phase
  • There was no cost (I) estimate for the second
    phase
  • There was no cost estimate for the second phase
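
The edit sequence above can be reproduced with a standard minimum edit distance (Levenshtein) computation. The following is a minimal sketch, not the presenter's implementation: the function names are hypothetical, and the accuracy is assumed to be 1 minus the number of edits divided by the number of words in the correct sentence.

```python
def edit_distance(target, generated):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `generated` into `target`.
    Works on any sequences: lists of words or plain strings."""
    rows, cols = len(generated) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete the first i generated items
    for j in range(cols):
        dist[0][j] = j  # insert the first j target items
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if generated[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1]


def simple_string_accuracy(target, generated):
    """Assumed definition: 1 - (I + D + S) / (number of target words)."""
    target_words = target.split()
    errors = edit_distance(target_words, generated.split())
    return 1 - errors / len(target_words)


target = "There was no cost estimate for the second phase"
generated = "There was estimate for phase the second no cost"

print(edit_distance(target.split(), generated.split()))     # 5
print(round(simple_string_accuracy(target, generated), 3))  # 0.444
```

The five edits match the two deletions, one substitution and two insertions listed on the slide. Passing two plain strings instead of word lists gives the character-level variant.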

12
Metrics: Generated String Accuracy
  • Compare the correct result to the generated result
    at the word level, counting
  • Insertions
  • Deletions
  • Substitutions
  • Movements
  • Result
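
The formula that followed "Result" is missing from this transcript. One plausible reconstruction, stated as an assumption: a word that is deleted in one place and inserted in another counts as a single movement M, and I' and D' are the insertions and deletions that remain after such pairs are removed. R, the number of words in the correct sentence, is a symbol introduced for this sketch:

```latex
\[
\text{Generated String Accuracy} = 1 - \frac{M + I' + D' + S}{R}
\]
```

Applied to the example on slide 11, "cost" is both deleted and inserted, so it would count as one movement. That leaves I' = 1 ("no"), D' = 1 ("phase") and S = 1, and with R = 9 the score would be 1 - 4/9, or about 0.56.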

13
Article specific
  • Everything up to now can be applied to all NLP
    software
  • The rest of the presentation covers more
    problem-specific metrics, which can still be
    applied in most cases
  • The article's corpus is the Wall Street Journal

14
Metrics: Tree Based

15
Metrics: Tree Based

16
Create a human metric
  • Compute the metrics of this presentation
  • Evaluate human evaluations
  • Throw in statistics
  • Find a vector through each of the results of the
    metrics
  • Find which vector best matches with the human
    evaluations
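
One possible reading of these steps is to score each metric by how strongly it correlates with the averaged human grades and keep the best one. The sketch below assumes that reading. The function names are hypothetical and every number is a made-up toy value for illustration, not a result from the presentation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_metric(metric_scores, human_scores):
    """Return the metric whose scores correlate best with the human grades,
    together with the correlation of every metric."""
    correlations = {
        name: pearson(scores, human_scores)
        for name, scores in metric_scores.items()
    }
    best = max(correlations, key=correlations.get)
    return best, correlations


# Toy data (invented): mean human grade for four generated sentences,
# and the scores two hypothetical metrics gave the same sentences.
human = [0.9, 0.4, 0.7, 0.2]
metrics = {
    "metric_a": [0.8, 0.5, 0.6, 0.3],
    "metric_b": [0.5, 0.6, 0.4, 0.5],
}

best, correlations = best_metric(metrics, human)
print(best)  # metric_a
```

A regression that combines several metrics into one score would be a natural extension. Correlation per metric is used here only because it is the simplest way to compare candidates.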

17
Results of the metrics
18
Human evaluation
  • Take a corpus (WSJ)
  • Find a large group of humans
  • Select a paragraph (part) of the corpus
  • Give a few variants of the last line
  • Ask how well the lines are written

19
Human results
  • Web-based experiment
  • Give grades (understandability, quality)
  • 3-5 generated lines
  • Original line
  • The original line in the WSJ
  • Understandability mean 0.8689
  • Quality mean 0.6639

20
Resulting Metrics
21
Questions?