Title: Finding High-frequent Synonyms of a Domain-specific Verb in English Sub-language of MEDLINE Abstracts Using WordNet
1 Finding High-frequent Synonyms of a Domain-specific Verb in English Sub-language of MEDLINE Abstracts Using WordNet
- Chun Xiao and Dietmar Rösner
- Institut für Wissens- und Sprachverarbeitung (IWS), Faculty of Computer Science, University of Magdeburg, 39016 Magdeburg, Germany
2 Introduction: MEDLINE Abstracts
- MEDLINE
  - Domain: clinical medicine, biomedicine, biological and physical sciences
  - Source: articles from over 4,600 journals published throughout the world
  - Coverage: abstracts are included for about 52% of the articles.
- PubMed, an application of UMLS (Unified Medical Language System), provides links within MEDLINE to the full text of 15 clinical medical journals.
- Available at http://www.ncbi.nlm.nih.gov/PubMed/
3 Available Resources in the Experiment
- The test corpus consists of 800 MEDLINE abstracts extracted from the GENIA Corpus V3.0p and V3.01.
- Available at http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/
- WordNet 1.7.1
4 Extraction of a Specific Relation
- Inhibitory relation
- Example: Secreted from activated T cells and macrophages, bone marrow-derived MIP-1 alpha/GOS19 inhibits primitive hematopoietic stem cells and appears to be involved in the homeostatic control of stem cell proliferation.
- Semantic annotations in the GENIA corpus
  - protein_molecule
  - cell_type
5 High-frequent Verbs in the Test Corpus
6 Synonym Sets (Synsets) of the Verb inhibit
- Synset in WordNet
  - Sense 1
    - suppress, stamp down, inhibit, subdue, conquer, curb
    - => control, hold in, hold, contain, check, curb, moderate
  - Sense 2
    - inhibit
    - => restrict, restrain, trammel, limit, bound, confine, throttle
- Synset in test corpus of MEDLINE abstracts
  - inhibit, block, prevent, etc.
7 Problem
- Occurrences of verbs in the two synsets in the test corpus of MEDLINE abstracts
  - WN-synonyms: suppress (69), limit (16), restrict (5)
  - Non-WN-synonyms: block (124), reduce (119), prevent (53)
- How can WordNet synsets and information from the corpus be combined to create domain-specific verb synsets?
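The gap between the two groups can be made concrete with the figures on this slide. The following is a minimal Python sketch that partitions the six listed verb counts by whether the verb appears among the WordNet entries for inhibit shown on slide 6 (synonyms or hypernyms). The variable names are illustrative, and only the six counts given here are used.

```python
# WordNet 1.7.1 entries for "inhibit" as listed on slide 6.
WN_SENSE_1 = {"suppress", "stamp down", "inhibit", "subdue", "conquer", "curb"}
WN_SENSE_1_HYPERNYMS = {"control", "hold in", "hold", "contain", "check", "curb", "moderate"}
WN_SENSE_2 = {"inhibit"}
WN_SENSE_2_HYPERNYMS = {"restrict", "restrain", "trammel", "limit", "bound", "confine", "throttle"}

wn_related = WN_SENSE_1 | WN_SENSE_1_HYPERNYMS | WN_SENSE_2 | WN_SENSE_2_HYPERNYMS

# Occurrence counts in the test corpus, as listed on slide 7.
corpus_counts = {
    "suppress": 69, "limit": 16, "restrict": 5,
    "block": 124, "reduce": 119, "prevent": 53,
}

# Split the corpus verbs by whether WordNet relates them to "inhibit".
in_wordnet = {v: n for v, n in corpus_counts.items() if v in wn_related}
not_in_wordnet = {v: n for v, n in corpus_counts.items() if v not in wn_related}

wn_total = sum(in_wordnet.values())
non_wn_total = sum(not_in_wordnet.values())

print("WordNet-related verbs:", sorted(in_wordnet), wn_total)
print("Other corpus verbs:", sorted(not_in_wordnet), non_wn_total)
```

For these six verbs, the ones WordNet does not relate to inhibit account for 296 occurrences against 90 for the WordNet-related ones, which is the mismatch the question on this slide addresses.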
8 Three Definitions
- Language unit: a text segment (a sentence, several sentences, a paragraph, etc.) that expresses one semantic topic.
- Core word: the verb whose synset in the test corpus is to be found. E.g., in this test, inhibit is the core word.
- Keyword: a word whose corresponding verb base form is the core word. E.g., in this test, inhibitor, inhibiting, and so on are keywords.
9 Example
- We performed an analysis of the mechanisms by which two PKC inhibitors, Calphostin C and Staurosporine, prevent the FN-induced IL-1beta response. Both inhibitors blocked the secretion of IL-1beta protein into the media of peripheral blood mononuclear cells exposed to FN.
- Language unit: two sentences
- Core word: inhibit
- Keyword: inhibitor (2 times)
- Local context: searching window size > 3
  - Verbs around the first keyword: perform, prevent, block, expose
  - Verbs around the second keyword: prevent, perform, block, expose
- In the following test, the language unit is selected to be the whole abstract.
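The local-context search in this example can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the input is assumed to be already POS-tagged and lemmatized, the window is assumed to count verbs on each side of a keyword (the slides do not spell out the unit), and a window of 3 is chosen only for illustration. The function name `verbs_around_keywords`, the coarse `VERB`/`NOUN` tags, and the hand-tagged, abridged token list are all hypothetical.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str, str]  # (surface form, lemma, coarse POS tag)


def verbs_around_keywords(tokens: List[Token], keywords: Set[str], window: int) -> List[List[str]]:
    """For each keyword occurrence, collect up to `window` verb lemmas on each side."""
    verb_positions = [i for i, (_, _, pos) in enumerate(tokens) if pos == "VERB"]
    contexts = []
    for i, (_, lemma, _) in enumerate(tokens):
        if lemma not in keywords:
            continue
        left = [p for p in verb_positions if p < i]
        right = [p for p in verb_positions if p > i]
        # Keep the `window` nearest verbs on each side (none when window is 0).
        nearest = left[max(0, len(left) - window):] + right[:window]
        contexts.append([tokens[p][1] for p in nearest])
    return contexts


# Hand-tagged, abridged version of the two example sentences (content words only).
tagged = [
    ("performed", "perform", "VERB"),
    ("analysis", "analysis", "NOUN"),
    ("inhibitors", "inhibitor", "NOUN"),
    ("prevent", "prevent", "VERB"),
    ("response", "response", "NOUN"),
    ("inhibitors", "inhibitor", "NOUN"),
    ("blocked", "block", "VERB"),
    ("secretion", "secretion", "NOUN"),
    ("exposed", "expose", "VERB"),
]

contexts = verbs_around_keywords(tagged, {"inhibitor"}, window=3)
for n, verbs in enumerate(contexts, start=1):
    print(f"keyword occurrence {n}: {verbs}")
```

On this toy input both keyword occurrences yield the same four verbs as the slide (perform, prevent, block, expose), listed here in text order rather than in the order shown above.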
10 Idea Description
- Assumption
  - The synonyms of a verb co-occur with the keywords of the verb much more frequently than with other words in the language unit.
- Method
  - The verb chunks around the keywords are therefore collected; from these, the synonyms of the core word are selected and filtered using WordNet synset information.
  - One resource: WordNet synset information
  - The other resource: local context information in the test corpus
11 Distribution of Keywords of inhibit in the Test Corpus
12 Verbs around the Keywords in the Test Corpus
13 Method Description I
- Expansion of WordNet synsets (Si)
  - S1: the collection of verbs that are synonyms of any synonym of the core word
  - S2: the collection of verbs that are synonyms of any verb in S1
- Expansion of stoplist (STOPk)
  - STOP0: 15 stop-verbs selected manually from the high-frequent verbs in the test corpus (e.g., suggest, indicate), including the high-frequent antonyms of the core word
  - STOP1: the collection of verbs that are synonyms of any verb in STOP0
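The two expansions can be sketched with a single helper. This is a minimal Python sketch under several assumptions that the slides do not state: S0 is taken to be the core word together with its direct synonyms, each expansion level is cumulative (it keeps the verbs of the previous level), and the synonym lookup is a small hand-made table `TOY_SYNONYMS`, not actual WordNet output. The names `expand` and `expanded_synset` are hypothetical.

```python
from typing import Dict, Set

# Hand-made toy synonym table (an assumption for illustration, not WordNet data).
TOY_SYNONYMS: Dict[str, Set[str]] = {
    "inhibit": {"suppress", "curb"},
    "suppress": {"inhibit", "repress"},
    "curb": {"inhibit", "restrain"},
    "repress": {"suppress", "quash"},
    "suggest": {"propose"},
}


def expand(seed: Set[str], synonyms: Dict[str, Set[str]], steps: int) -> Set[str]:
    """Add the synonyms of every verb in the set, repeated `steps` times."""
    current = set(seed)
    for _ in range(steps):
        expanded = set(current)
        for verb in current:
            expanded |= synonyms.get(verb, set())
        current = expanded
    return current


def expanded_synset(core: str, synonyms: Dict[str, Set[str]], i: int) -> Set[str]:
    """S_i under the assumption that S_0 is the core word plus its direct synonyms."""
    return expand({core}, synonyms, i + 1)


s0 = expanded_synset("inhibit", TOY_SYNONYMS, 0)
s1 = expanded_synset("inhibit", TOY_SYNONYMS, 1)
s2 = expanded_synset("inhibit", TOY_SYNONYMS, 2)

# STOP_1 adds the synonyms of every verb in the manually chosen STOP_0.
stop0 = {"suggest", "indicate"}
stop1 = expand(stop0, TOY_SYNONYMS, 1)

print("S0:", sorted(s0))
print("S1:", sorted(s1))
print("S2:", sorted(s2))
print("STOP1:", sorted(stop1))
```

Each additional level can only grow the set, so in this sketch the expansion depth is the knob that would trade recall against precision.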
14 Method Description II
- Verb list from the corpus (Vj)
  - Verbs around the keywords in a local context with a searching window of size j are collected.
- Synonym candidate list (Sg)
  - If a verb is in Vj and also in Si, but not in STOPk, then it is added to Sg.
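The selection rule amounts to a set intersection followed by a set difference. A minimal Python sketch, in which the function name `synonym_candidates` and the three small example sets are illustrative assumptions rather than data from the study:

```python
from typing import Iterable, Set


def synonym_candidates(corpus_verbs: Iterable[str],
                       expanded_synset: Iterable[str],
                       stoplist: Iterable[str]) -> Set[str]:
    """Keep verbs that are in V_j and in S_i but not in STOP_k."""
    return (set(corpus_verbs) & set(expanded_synset)) - set(stoplist)


# Toy inputs, chosen only to exercise each branch of the rule.
v_j = {"block", "prevent", "suggest", "perform", "suppress"}
s_i = {"suppress", "block", "prevent", "suggest", "curb"}
stop_k = {"suggest", "indicate"}

s_g = synonym_candidates(v_j, s_i, stop_k)
print(sorted(s_g))
```

In this toy run, perform is dropped because it is not in S_i, curb because it never occurs near a keyword, and suggest because it is on the stoplist.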
15 Evaluation
- Gold standard list (SG)
  - A manually created synonym list extracted from the test corpus.
  - Consists of the 10 most frequently occurring verbs, of which 3 come directly from the WordNet synset of inhibit and the remaining 7 come from its hypernym set or the expanded list of its synonyms.
- Recall and precision
16 Result
- 60% recall of SG, corresponding to 93.05% of the occurrences in the test corpus
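One way to read this result is that recall over the verbs of SG and the share of corpus occurrences covered are two different measures. The following minimal Python sketch shows how they can diverge. The verbs are placeholders and the counts are invented for illustration; the slides do not list SG or its per-verb counts, so none of the numbers below come from the study. The function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def recall_precision(candidates: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Type-level recall and precision of a candidate list against a gold list."""
    hits = candidates & gold
    recall = len(hits) / len(gold) if gold else 0.0
    precision = len(hits) / len(candidates) if candidates else 0.0
    return recall, precision


def occurrence_coverage(candidates: Set[str], gold_counts: Dict[str, int]) -> float:
    """Share of the gold verbs' corpus occurrences that the candidates account for."""
    total = sum(gold_counts.values())
    covered = sum(n for verb, n in gold_counts.items() if verb in candidates)
    return covered / total if total else 0.0


# Invented toy data: five placeholder gold verbs with skewed frequencies.
gold_counts = {"verb_a": 50, "verb_b": 30, "verb_c": 15, "verb_d": 4, "verb_e": 1}
candidates = {"verb_a", "verb_b", "verb_c", "verb_x"}

recall, precision = recall_precision(candidates, set(gold_counts))
coverage = occurrence_coverage(candidates, gold_counts)
print(f"recall={recall:.2f} precision={precision:.2f} coverage={coverage:.2f}")
```

With a skewed frequency distribution, finding only the most frequent gold verbs gives a modest type-level recall but a high share of occurrences, which is the pattern the result above reports.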
17 Conclusions and Future Work
- Conclusions
  - English sublanguage of MEDLINE abstracts
  - The core word and its keywords were high-frequent
  - Multiword verb structures were not yet considered
  - Balance between recall and precision: expansion of Si and STOPk should be limited.
- Future work
  - Consideration of other WordNet information besides synsets
  - Automatic creation of stoplists
  - Extraction of multiword verb structures
  - Utilization of syntactic information
18 Thanks!
19 Looking forward to your questions!
21 Possible Errors
- POS tagging errors: adjectives confused with past participles
- Errors in the manual selection of stop-verbs
22 Question or Hope
- Can WordNet provide a way to access multiword expressions?