Finding High-frequent Synonyms of a Domain-specific Verb in English Sub-language of MEDLINE Abstracts Using WordNet

1
Finding High-frequent Synonyms of a
Domain-specific Verb in English Sub-language of
MEDLINE Abstracts Using WordNet
  • Chun Xiao and Dietmar Rösner
  • Institut für Wissens-
  • und Sprachverarbeitung (IWS),
  • Faculty of Computer Science, University of
    Magdeburg, 39016 Magdeburg, Germany

2
Introduction: MEDLINE Abstracts
  • MEDLINE
  • Domain: clinical medicine, biomedicine,
    biological and physical sciences
  • Source: articles from over 4,600 journals
    published throughout the world
  • Coverage: abstracts are included for about 52% of
    the articles.
  • PubMed, an application of UMLS (Unified Medical
    Language System), provides links within MEDLINE
    to the full text of 15 clinical medical journals.
  • Available at http://www.ncbi.nlm.nih.gov/PubMed/

3
Available Resources in the Experiment
  • The test corpus consists of 800 MEDLINE abstracts
    extracted from the GENIA Corpus V3.0p and V3.01.
  • Available at http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/
  • WordNet 1.7.1

4
Extraction of a Specific Relation
  • Inhibitory relation
  • Example: Secreted from activated T cells and
    macrophages, bone marrow-derived MIP-1
    alpha/GOS19 inhibits primitive hematopoietic stem
    cells and appears to be involved in the
    homeostatic control of stem cell proliferation.
  • Semantic annotations in the GENIA corpus
  • protein_molecule
  • cell_type

5
High-frequent Verbs in the Test Corpus
6
Synonym Sets (Synsets) of the Verb inhibit
  • Synsets in WordNet
  • Sense 1
  • suppress, stamp down, inhibit, subdue, conquer,
    curb
  • => control, hold in, hold, contain, check,
    curb, moderate
  • Sense 2
  • inhibit
  • => restrict, restrain, trammel, limit,
    bound, confine, throttle
  • Synset in the test corpus of MEDLINE abstracts
  • inhibit, block, prevent, etc.

7
Problem
  • Occurrences of verbs from the two synsets in the
    test corpus of MEDLINE abstracts
  • WN-synonyms: suppress (69), limit (16), restrict
    (5)
  • Non-WN-synonyms: block (124), reduce (119),
    prevent (53)
  • How can WordNet synsets and information from the
    corpus be combined to create domain-specific verb
    synsets?

8
Three Definitions
  • Language unit: a text segment (a sentence,
    several sentences, a paragraph, etc.) that
    expresses one semantic topic.
  • Core word: the verb whose synset in the test
    corpus is to be found. E.g., in this test
    inhibit is the core word.
  • Keyword: a word whose corresponding verb base
    form is the core word. E.g., in this test
    inhibitor, inhibiting, and so on are keywords.

9
Example
  • We performed an analysis of the
    mechanisms by which two PKC inhibitors,
    Calphostin C and Staurosporine, prevent the
    FN-induced IL-1beta response. Both inhibitors
    blocked the secretion of IL-1beta protein into
    the media of peripheral blood mononuclear cells
    exposed to FN.
  • Language unit: two sentences
  • Core word: inhibit
  • Keyword: inhibitor (2 times)
  • Local context: searching window size > 3
  • Verbs around the first keyword: perform,
    prevent, block, expose
  • Verbs around the second keyword: prevent,
    perform, block, expose
  • In the following test, the language unit is
    selected to be the whole abstract.
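The collection step in this example can be sketched in Python. This is a minimal illustration, not the authors' implementation: the token list is a hand-lemmatized, hand-tagged reduction of the two example sentences, the function name verbs_around_keywords is hypothetical, and the window is assumed to count verbs on each side of a keyword (the slide does not state the unit).

```python
from typing import List, Set, Tuple

# (lemma, coarse POS tag); lemmas and tags are assigned by hand for illustration.
Token = Tuple[str, str]

def verbs_around_keywords(tokens: List[Token], keywords: Set[str], window: int) -> List[List[str]]:
    """For each keyword occurrence, return up to `window` verbs to its left
    (nearest first) followed by up to `window` verbs to its right."""
    verb_positions = [i for i, (_, pos) in enumerate(tokens) if pos == "VERB"]
    contexts = []
    for i, (lemma, _) in enumerate(tokens):
        if lemma not in keywords:
            continue
        left = [tokens[p][0] for p in reversed(verb_positions) if p < i][:window]
        right = [tokens[p][0] for p in verb_positions if p > i][:window]
        contexts.append(left + right)
    return contexts

# Reduced version of the two example sentences (content words only).
tokens = [
    ("we", "PRON"), ("perform", "VERB"), ("analysis", "NOUN"),
    ("mechanism", "NOUN"), ("inhibitor", "NOUN"), ("prevent", "VERB"),
    ("response", "NOUN"), ("inhibitor", "NOUN"), ("block", "VERB"),
    ("secretion", "NOUN"), ("cell", "NOUN"), ("expose", "VERB"),
]

contexts = verbs_around_keywords(tokens, {"inhibitor"}, window=4)
for verbs in contexts:
    print(verbs)
```

Under these assumptions the two printed lists contain the same verbs as listed on the slide for the first and second keyword.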

10
Idea Description
  • Assumption
  • Synonyms of a verb co-occur with the keywords
    of that verb much more frequently than with
    other words in the language unit.
  • Method
  • The verb chunks around the keywords are
    therefore collected; the synonyms of the core
    word are then selected from them and filtered
    using WordNet synset information.
  • One resource: WordNet synset information
  • The other resource: local context information in
    the test corpus

11
Distribution of Keywords of inhibit in the Test
Corpus
12
Verbs around the Keywords in the Test Corpus
13
Method Description I
  • Expansion of WordNet Synsets (Si)
  • S1: the collection of verbs that are synonyms of
    any synonym of the core word
  • S2: the collection of verbs that are synonyms of
    any verb in S1
  • Expansion of Stoplist (STOPk)
  • STOP0: 15 stop-verbs selected manually from the
    high-frequency verbs in the test corpus (e.g.,
    suggest, indicate, and also the high-frequency
    antonyms of the core word)
  • STOP1: the collection of verbs that are synonyms
    of any verb in STOP0
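The two expansions follow the same pattern: take a verb set and add the synonyms of every member. A minimal Python sketch, assuming a synonym lookup is available. The table TOY_SYNONYMS below is invented for illustration and is not actual WordNet content; the names synonyms_of and expand are hypothetical, and S0 is used here as a label for the unexpanded synset of the core word.

```python
from typing import Dict, Set

# Hypothetical toy table standing in for a WordNet synonym lookup.
# The entries are made up for illustration; they are NOT WordNet data.
TOY_SYNONYMS: Dict[str, Set[str]] = {
    "inhibit": {"inhibit", "suppress", "curb"},
    "suppress": {"suppress", "repress"},
    "curb": {"curb", "restrain"},
    "repress": {"repress", "quash"},
    "suggest": {"suggest", "propose"},
}

def synonyms_of(verb: str) -> Set[str]:
    """Return the synonyms of a verb; a verb missing from the table maps to itself."""
    return TOY_SYNONYMS.get(verb, {verb})

def expand(verbs: Set[str]) -> Set[str]:
    """One expansion step: the union of the synonyms of every verb in the set."""
    expanded: Set[str] = set()
    for verb in verbs:
        expanded |= synonyms_of(verb)
    return expanded

s0 = synonyms_of("inhibit")   # synset of the core word
s1 = expand(s0)               # synonyms of all synonyms of the core word
s2 = expand(s1)               # synonyms of all verbs in S1

stop0 = {"suggest", "indicate"}   # two of the manually selected stop-verbs
stop1 = expand(stop0)             # synonyms of all verbs in STOP0

print(sorted(s1), sorted(s2), sorted(stop1))
```

Each step can only grow the set, which is why the number of expansion steps trades recall against precision.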

14
Method Description II
  • Verb list from the corpus (Vj)
  • Verbs around the keywords are collected within a
    local context with a searching window of size j.
  • Synonym candidate list (Sg)
  • If a verb is in Vj and also in Si, but not in
    STOPk, then add it to Sg.
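The selection rule amounts to a set intersection followed by a set difference: Sg = (Vj ∩ Si) \ STOPk. A minimal Python sketch under stated assumptions: the verb lists are toy data, the function name synonym_candidates is hypothetical, and ranking the candidates by how often they occur around the keywords is an addition for readability that the slides do not describe.

```python
from collections import Counter
from typing import Iterable, List, Set

def synonym_candidates(v_j: Iterable[str], s_i: Set[str], stop_k: Set[str]) -> List[str]:
    """Keep verbs that occur around the keywords (Vj) and in the expanded
    synset (Si) but not in the stoplist (STOPk); most frequent first."""
    counts = Counter(v_j)
    kept = [verb for verb in counts if verb in s_i and verb not in stop_k]
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(kept, key=lambda verb: (-counts[verb], verb))

# Toy inputs, invented for illustration only.
v_j = ["block", "prevent", "block", "suggest", "promote",
       "suppress", "perform", "block", "prevent"]
s_i = {"inhibit", "suppress", "block", "prevent", "promote"}
stop_k = {"suggest", "promote"}

s_g = synonym_candidates(v_j, s_i, stop_k)
print(s_g)
```

In this toy run, perform is dropped because it is not in Si, and promote is dropped because it is in STOPk even though it is in Si.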

15
Evaluation
  • Gold standard list (SG)
  • A manually created synonym list extracted from
    the test corpus.
  • Consists of the 10 most frequently occurring
    verbs: 3 come directly from the WordNet synset of
    inhibit; the remaining 7 come from its hypernym
    set or the expanded list of its synonyms.
  • Recall and precision
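The slides do not spell out the formulas, so the sketch below assumes the standard set-based definitions: recall is the share of SG verbs that appear in the candidate list Sg, and precision is the share of Sg verbs that appear in SG. The verb sets are toy data and the function name recall_precision is hypothetical.

```python
from typing import Set, Tuple

def recall_precision(candidates: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Standard set-based recall and precision of a candidate list
    against a gold standard list; empty inputs yield 0.0."""
    hits = len(candidates & gold)
    recall = hits / len(gold) if gold else 0.0
    precision = hits / len(candidates) if candidates else 0.0
    return recall, precision

# Toy sets, invented for illustration; not the lists used in the experiment.
s_g = {"block", "prevent", "suppress", "bind"}
gold = {"inhibit", "block", "prevent", "suppress", "reduce"}

recall, precision = recall_precision(s_g, gold)
print(f"recall={recall:.2f} precision={precision:.2f}")
```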

16
Result
  • 60% recall of SG <=> 93.05% of the occurrences in
    the test corpus

17
Conclusions and Future Work
  • Conclusions
  • English sublanguage of MEDLINE abstracts
  • The core word and its keywords were
    high-frequency
  • Multiword verb structures have not yet been
    considered
  • Balance between recall and precision: expansion
    of Si and STOPk should be limited.
  • Future work
  • Consideration of other WordNet information
    besides synsets
  • Automatic creation of stoplists
  • Extraction of multiword verb structures
  • Utilization of syntactic information.

18
Thanks!
19
Looking forward to your questions!
21
Possible Errors
  • POS tagging errors between
  • Adjectives <=> Past participles
  • Manual errors when selecting stop-verbs

22
Question or Hope
  • Can WordNet provide access to multiword
    expressions?