1
You Can't Beat Frequency (Unless You Use
Linguistic Knowledge): A Qualitative Evaluation
of Association Measures for Collocation and Term
Extraction
  • Joachim Wermter and Udo Hahn
  • Jena University
  • ACL 2006 Regular Conference Paper

2
Objective
  • Compare the performance of frequency, t-test, LSM
    and LPM methods on collocation extraction and
    domain-specific automatic term recognition
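
To make the two purely statistical baselines concrete, here is a minimal sketch of ranking candidates by raw frequency and by the t-test. It uses word bigrams over a token list as a simplifying assumption (the slides do not give the candidate extraction), the textbook t-score with the sample variance approximated by the sample mean, and the hypothetical helper names `t_score` and `rank_bigrams`.

```python
import math
from collections import Counter


def t_score(f_xy, f_x, f_y, n):
    """t-score of a bigram under the usual independence null hypothesis."""
    observed = f_xy / n                # sample mean: relative bigram frequency
    expected = (f_x / n) * (f_y / n)   # mean expected if the words were independent
    # For rare events the sample variance is approximated by the sample mean.
    return (observed - expected) / math.sqrt(observed / n)


def rank_bigrams(tokens):
    """Return (ranking by frequency, ranking by t-score), best candidate first."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens) - 1  # number of bigram positions, also used as corpus size

    def t(bigram):
        return t_score(bigrams[bigram], unigrams[bigram[0]], unigrams[bigram[1]], n)

    # Ties are broken alphabetically so that the rankings are deterministic.
    by_freq = sorted(bigrams, key=lambda b: (-bigrams[b], b))
    by_t = sorted(bigrams, key=lambda b: (-t(b), b))
    return by_freq, by_t
```

Calling `rank_bigrams(text.split())` on a tokenized corpus yields both rankings, which can then be compared position by position.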

3
Collocation Extraction
  • Extract idioms
  • e.g. kick the bucket

4
Domain-Specific Term Extraction
  • Extract domain-specific phrases
  • e.g. mitochondrial inheritance

5
Corpus

6
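LSM
  • Limited Syntagmatic Modifiability: a linguistic
    knowledge-based method for collocation
    extraction proposed by the same authors in
    another paper
  • Assumes that idioms are less modifiable by
    supplements
  • e.g. "kick the beautiful bucket" is an unlikely
    variant of "kick the bucket"
  • Probability of a PNV (preposition-noun-verb)
    triple occurring with the supplement Supp_k
  • f(x): frequency of x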

7
LSM
  • Modifiability of a PNVtriple
  • Probability of a PNVtriple
  • Collocation Score
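
The formulas on these two slides did not survive in the transcript, so the following is only a sketch of one plausible reading, not the authors' definition. It assumes that the supplement probability is a relative frequency over the supplements seen with a triple, that modifiability is the product of those probabilities, and that the collocation score multiplies modifiability by the triple's relative frequency. The function names and the English example counts are hypothetical.

```python
from math import prod


def supplement_probs(supp_counts):
    """Relative frequency of each supplement observed with one triple (assumed)."""
    total = sum(supp_counts.values())
    return {supp: count / total for supp, count in supp_counts.items()}


def modifiability(supp_counts):
    """Product of supplement probabilities (assumed definition).

    The value is high when one supplement (for example the empty one)
    dominates, i.e. when the triple is hard to modify.
    """
    return prod(supplement_probs(supp_counts).values())


def lsm_score(triple, triple_freq, supp_counts_by_triple):
    """Assumed collocation score: modifiability times relative triple frequency."""
    p_triple = triple_freq[triple] / sum(triple_freq.values())
    return modifiability(supp_counts_by_triple[triple]) * p_triple


# Made-up counts: "" stands for "no supplement between the words".
triple_freq = {"kick the bucket": 10, "paint the bucket": 10}
supp_counts_by_triple = {
    "kick the bucket": {"": 9, "beautiful": 1},
    "paint the bucket": {"": 2, "red": 2, "old": 2, "big": 2, "new": 2},
}
idiom = lsm_score("kick the bucket", triple_freq, supp_counts_by_triple)
free = lsm_score("paint the bucket", triple_freq, supp_counts_by_triple)
```

With these invented counts the idiom scores higher than the freely modifiable combination even though both have the same frequency, which is the behaviour the modifiability assumption is meant to capture.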

8
LPM
  • Limited Paradigmatic Modifiability: a linguistic
    knowledge-based method for automatic term
    recognition proposed by the same authors in
    another paper
  • Assumes that the words in a phrase are less
    interchangeable
  • e.g. mitochondrial inheritance vs. money inheritance
  • Modifiability of a phrase
  • mod_k(n-gram): modifiability when k words are
    replaced
  • sel_i: a particular replacement

9
LPM
  • Phrase Score
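
The phrase-score formula is also missing from the transcript. As a hedged sketch, assume that each selection of k word positions is turned into wildcard slots, that the modifiability of one selection is the n-gram's frequency divided by the frequency of everything matching the slotted pattern, that mod_k multiplies over all selections of size k, and that the phrase score multiplies mod_k over k = 1..n. All names and counts below are hypothetical.

```python
from itertools import combinations
from math import prod


def pattern_freq(ngram, slots, ngram_freq):
    """Total frequency of same-length n-grams that agree with `ngram` outside `slots`."""
    fixed = [i for i in range(len(ngram)) if i not in slots]
    return sum(
        count
        for cand, count in ngram_freq.items()
        if len(cand) == len(ngram) and all(cand[i] == ngram[i] for i in fixed)
    )


def mod_k(ngram, k, ngram_freq):
    """Product over every choice of k slots of f(ngram) / f(slotted pattern)."""
    return prod(
        ngram_freq[ngram] / pattern_freq(ngram, set(slots), ngram_freq)
        for slots in combinations(range(len(ngram)), k)
    )


def phrase_score(ngram, ngram_freq):
    """Assumed phrase score: product of mod_k for k = 1 .. n."""
    return prod(mod_k(ngram, k, ngram_freq) for k in range(1, len(ngram) + 1))


# Made-up bigram counts for illustration only.
ngram_freq = {
    ("mitochondrial", "inheritance"): 8,
    ("maternal", "inheritance"): 2,
    ("mitochondrial", "dna"): 2,
    ("the", "money"): 8,
    ("the", "house"): 40,
    ("his", "money"): 20,
}
term = phrase_score(("mitochondrial", "inheritance"), ngram_freq)
non_term = phrase_score(("the", "money"), ngram_freq)
```

Both bigrams occur 8 times here, yet the term scores higher because its words are rarely swapped for others, while "the" and "money" combine freely.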

10
Evaluation Criteria
  • Compared to the baseline frequency ranking, a
    good ranking function should have the following
    four characteristics
  • Criterion 1: keep the true positives (TPs) in the
    upper portion of the list
  • Criterion 2: keep the true negatives (TNs) in the
    lower portion of the list
  • Criterion 3: demote TNs from the upper portion
  • Criterion 4: promote TPs from the lower portion
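
The four characteristics can be counted mechanically once a ranking is compared against the frequency baseline. The sketch below assumes both rankings contain the same candidates, that the list is split into an upper and a lower portion at a configurable cut (the midpoint by default, which is an assumption, not the paper's setting), and that gold-standard true positives are given as a set. The function name `criteria_counts` is hypothetical.

```python
def criteria_counts(baseline, ranking, true_positives, cut=0.5):
    """Count the candidates that satisfy each of the four criteria.

    baseline, ranking: lists of the same candidates, best first.
    true_positives: set of candidates judged correct in the gold standard.
    cut: fraction of the list treated as the upper portion (assumed 0.5).
    """
    k = int(len(baseline) * cut)
    base_upper = set(baseline[:k])
    rank_upper = set(ranking[:k])
    counts = {
        "tp_kept_upper": 0,  # criterion 1
        "tn_kept_lower": 0,  # criterion 2
        "tn_demoted": 0,     # criterion 3
        "tp_promoted": 0,    # criterion 4
    }
    for item in baseline:
        is_tp = item in true_positives
        was_upper = item in base_upper
        now_upper = item in rank_upper
        if is_tp and was_upper and now_upper:
            counts["tp_kept_upper"] += 1
        elif not is_tp and not was_upper and not now_upper:
            counts["tn_kept_lower"] += 1
        elif not is_tp and was_upper and not now_upper:
            counts["tn_demoted"] += 1
        elif is_tp and not was_upper and now_upper:
            counts["tp_promoted"] += 1
    return counts
```

Candidates that move in the wrong direction (TPs demoted, TNs promoted) fall into none of the four buckets, so a low total is itself a sign of a poor ranking.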

11
Collocation Extraction Results

12
Automatic Term Recognition Results

13
Observations
  • CE (collocation extraction) Criterion 1
  • The t-test and frequency methods have similar
    performance
  • LSM promotes some TPs to the top 1/6 of the list
  • ATR (automatic term recognition) Criterion 1
  • The t-test and frequency methods have similar
    performance
  • LPM promotes a few TPs to the top 1/6 of the list

14
Observations
  • CE Criterion 2
  • LSM promotes many more TNs to the upper portion
    than the t-test method (bad)
  • ATR Criterion 2
  • Same as above, with LPM in place of LSM

15
Observations
  • CE Criterion 3
  • LSM demotes many more TNs to the lower portion
    than the t-test
  • ATR Criterion 3
  • Same as above, with LPM in place of LSM

16
Observations
  • CE Criterion 4
  • LSM promotes more TPs to the upper portion than
    the t-test
  • ATR Criterion 4
  • Same as above, with LPM in place of LSM

18
Conclusion
  • The LSM and LPM methods perform better than the
    t-test and frequency methods
  • Purely statistical methods perform worse than
    linguistic knowledge-based methods