New Logo Here - PowerPoint PPT Presentation


Title: New Logo Here


1
T25 Lexical Tools for UMLS Developers, 11/10/2002, 8:00 AM
New Logo Here!
 
2
Lexical Tools for UMLS Developers
November 10, 2002
Allen C. Browne, Guy Divita, Chris Lu
Lister Hill National Center for Biomedical Communications
National Library of Medicine
3
(No Transcript)
4
Lexical Tools for UMLS Developers
  • The SPECIALIST lexicon (Browne)
  • The lexical tools (Divita/Lu)
  • Coffee break, 10:00-10:30
  • Lexical tools, continued (Divita/Lu)

5
Text processing
Lexical tools
SPECIALIST LEXICON
6
The SPECIALIST Lexicon
  • A syntactic lexicon
  • Biomedical and general English
  • Over 180,000 records

7
The SPECIALIST Lexicon
  • General English
    • 10,000 most frequent words from the American Heritage word frequency list
    • 2,000 words used by the Longman Dictionary of Contemporary English
    • Verbs and adjectives identified by heuristics

8
Lexicon Growth
9
George A. Miller, The Science of Words, 1991
10
The SPECIALIST Lexicon
  • Morphology
    • Inflection
    • Derivation
  • Orthography
    • Spelling variants
  • Syntax
    • Complementation for verbs, nouns, and adjectives

11
Morphology
  • Inflectional
    • nucleus -- nuclei
    • cauterize, cauterizes, cauterized, cauterizing
    • red, redder, reddest
  • Derivational
    • laryngeal -- larynx
    • transport -- transportation

12
Orthography
13
Orthography
Spelling Variation
  • align -- aline
  • Graves' disease -- Graves's disease -- Graves disease
  • anesthetize -- anaesthetise
  • esophagus -- oesophagus

14
British and American Spelling
  • criticise -- criticize
  • naturalise -- naturalize
  • centre -- center
  • foetus -- fetus

15
Syntax -- Verb Complements
  • intran
    • I'll treat.
  • tran=np
    • He treated the patient.
  • ditran=np,pphr(with,np)
    • She treated the patient with the drug.

16
Syntax -- Verb Complements
{base=treat
entry=E0061964
    cat=verb
    variants=reg
    intran
    tran=np
    tran=pphr(with,np)
    tran=pphr(of,np)
    ditran=np,pphr(to,np)
    ditran=np,pphr(with,np)
    ditran=np,pphr(for,np)
    cplxtran=np,advbl
    nominalization=treatment|noun|E0061968
}
17
The 2003 SPECIALIST Lexicon
18
(No Transcript)
19
Lexicon Unit Records
{base=chronic
entry=E0016869
    cat=adj
    variants=inv
    position=attrib(1)
    position=pred
    stative
}
{base=Kaposi's sarcoma
spelling_variant=Kaposi sarcoma
entry=E0003576
    cat=noun
    variants=uncount
    variants=reg
    variants=glreg
}
{base=aspirate
entry=E0010803
    cat=verb
    variants=reg
    tran=np
    nominalization=aspiration|noun|E0010804
}
{base=in
entry=E0033870
    cat=prep
}
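The unit records above share one layout: a `{base=` line, an `entry=` line, one slot per line, and a closing `}`. A minimal sketch of reading that layout, assuming one slot per line and treating a slot with no value (such as `stative` or `intran`) as a flag; `parse_unit_record` is a hypothetical helper, not part of the lexical tools distribution.

```python
def parse_unit_record(text: str) -> dict[str, list]:
    """Parse one unit record into {slot: [values...]}.

    Assumes one slot per line. A slot with no '=' (e.g. 'stative', 'intran')
    is a flag and is stored as True. Repeated slots (e.g. 'position',
    'variants') keep every value in the order they appear.
    """
    record: dict[str, list] = {}
    for raw in text.splitlines():
        # Drop indentation and the '{' / '}' record delimiters.
        line = raw.strip().lstrip("{").rstrip("}").strip()
        if not line:
            continue
        # Split on the first '=' only, so values like 'aspiration|noun|E0010804' stay whole.
        slot, sep, value = line.partition("=")
        record.setdefault(slot, []).append(value if sep else True)
    return record


CHRONIC = """{base=chronic
entry=E0016869
    cat=adj
    variants=inv
    position=attrib(1)
    position=pred
    stative
}"""

rec = parse_unit_record(CHRONIC)
print(rec["position"])  # ['attrib(1)', 'pred']
```

Keeping every slot as a list matters because slots such as `variants` and `position` repeat within one record.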
20
Noun Variants
  • Kaposi's sarcoma
  • Kaposi's sarcomas
  • Kaposi's sarcomata
  • Kaposi sarcoma
  • Kaposi sarcomas
  • Kaposi sarcomata

{base=Kaposi's sarcoma
spelling_variant=Kaposi sarcoma
entry=E0003576
    cat=noun
    variants=uncount
    variants=reg
    variants=glreg
}
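The six noun variants follow from the record: each spelling (base and spelling_variant) appears in the singular, with a reg plural (-s) and with a glreg plural (-ma to -mata). A minimal sketch of that combination; `noun_variants` is hypothetical, and it inflects only the last word and applies only the one glreg rule that `sarcoma` needs.

```python
def noun_variants(spellings: list[str]) -> list[str]:
    """List each spelling with its singular, reg plural, and (for -ma) glreg plural.

    Simplification: only the last word is inflected, the reg plural just adds
    's', and the only glreg rule applied is ma -> mata.
    """
    forms = []
    for spelling in spellings:
        head, _, last = spelling.rpartition(" ")
        prefix = head + " " if head else ""
        forms.append(spelling)                  # singular (also the uncount form)
        forms.append(prefix + last + "s")       # variants=reg
        if last.endswith("ma"):
            forms.append(prefix + last + "ta")  # variants=glreg: ma -> mata
    return forms


print(noun_variants(["Kaposi's sarcoma", "Kaposi sarcoma"]))
```

The `uncount` variant adds no new string; it contributes a second agreement value for the singular form.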
21
Regular Nouns
  • The plural suffix is s.
  • y becomes ie following a consonant before s.
  • e is inserted before s if the base ends in s, z, x, ch, or sh.
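The three regular-noun rules can be written as a small function. A minimal sketch, assuming lowercase input and a noun already known to be `variants=reg`; `reg_plural` is a hypothetical name, and nouns marked glreg, irreg, uncount, or plur are out of scope.

```python
VOWELS = set("aeiou")


def reg_plural(noun: str) -> str:
    """Apply the regular-noun plural rules to a lowercase noun."""
    # y becomes ie after a consonant: 'artery' -> 'arteries'
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # e is inserted after s, z, x, ch, sh: 'sinus' -> 'sinuses', 'rash' -> 'rashes'
    if noun.endswith(("s", "z", "x", "ch", "sh")):
        return noun + "es"
    # Otherwise the plural suffix is just s
    return noun + "s"


print([reg_plural(n) for n in ["artery", "sinus", "rash", "day", "cell"]])
```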
22
Regular Nouns
23
Greco-Latin Regular Nouns
24
Uncount Nouns (abstract or mass)
{base=smallpox
entry=E0056359
    cat=noun
    variants=uncount
}
{base=potassium
entry=E0049387
    cat=noun
    variants=uncount
}
  • a smallpox
  • two smallpoxes
  • much smallpox
  • a potassium
  • two potassiums
  • much potassium

25
Fixed Plural Nouns
{base=scissors
entry=E0054633
    cat=noun
    variants=plur
}
{base=police
entry=E0048616
    cat=noun
    variants=plur
}
26
Irregular Nouns
{base=larynx
entry=E0036919
    cat=noun
    variants=irreg|larynges|
    variants=reg
}
{base=corpus
entry=E0019113
    cat=noun
    variants=irreg|corpora|
    variants=reg
}
27
Regular Verbs
  • The third person present tense suffix is s.
  • y becomes ie following a consonant before s.
  • e is inserted between s, z, x, ch, or sh and s.
  • The past tense suffix is ed.
  • y becomes ie following a consonant before ed.
  • Final e is deleted before ed.
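The regular-verb rules can be sketched the same way. This assumes lowercase input and treats a final s as triggering the inserted e, which dismiss -> dismisses on the next slide requires. The slide gives no rule for -ing, so the one below (drop a final e unless the verb ends in ee) is an assumption; `reg_verb` is hypothetical.

```python
VOWELS = set("aeiou")


def _consonant_y(verb: str) -> bool:
    """True if the verb ends in y preceded by a consonant (e.g. 'dry')."""
    return verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS


def reg_verb(verb: str) -> tuple[str, str, str]:
    """Return (third person singular, past, present participle) for a reg verb."""
    # Third person: y -> ie after a consonant, e inserted after sibilants, then s
    if _consonant_y(verb):
        pres3 = verb[:-1] + "ies"
    elif verb.endswith(("s", "z", "x", "ch", "sh")):
        pres3 = verb + "es"
    else:
        pres3 = verb + "s"
    # Past: y -> ie after a consonant, then a final e is deleted before ed
    stem = verb[:-1] + "ie" if _consonant_y(verb) else verb
    past = (stem[:-1] if stem.endswith("e") else stem) + "ed"
    # Present participle (assumed rule): drop a final e unless it is 'ee'
    ing = (verb[:-1] if verb.endswith("e") and not verb.endswith("ee") else verb) + "ing"
    return pres3, past, ing


for v in ["dismiss", "agree", "dry", "cauterize"]:
    print(v, reg_verb(v))
```

Routing the past tense through the y -> ie step and then the e-deletion step reproduces dry -> dried without a special case.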

28
Regular Verbs
  • dismiss: dismisses, dismissed, dismissing
  • agree: agrees, agreed, agreeing
  • dry: dries, dried, drying

29
Regular Doubling Verbs
  • End in a CVC pattern
  • Double the final consonant before ed and ing.
  • Are otherwise regular
  • variants=regd
  • e.g. control: controls, controlled, controlling
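Doubling verbs differ from regular ones only before ed and ing. A minimal sketch, assuming the lexicon has already marked the verb `variants=regd`; `is_cvc` and `regd_verb` are hypothetical, and excluding a final w, x, or y from the CVC test is an added assumption.

```python
VOWELS = set("aeiou")


def is_cvc(word: str) -> bool:
    """Rough test for a final consonant-vowel-consonant pattern."""
    if len(word) < 3:
        return False
    c1, v, c2 = word[-3:]
    return c1 not in VOWELS and v in VOWELS and c2 not in VOWELS and c2 not in "wxy"


def regd_verb(verb: str) -> tuple[str, str, str]:
    """Inflect a regd verb: double the final consonant before ed and ing."""
    doubled = verb + verb[-1]
    # The -s form follows the ordinary regular rule
    pres3 = verb + ("es" if verb.endswith(("s", "z", "x", "ch", "sh")) else "s")
    return pres3, doubled + "ed", doubled + "ing"


print(is_cvc("control"), regd_verb("control"))
```

The CVC shape is necessary but not sufficient. "open" also ends in CVC yet gives "opened", because English doubling depends on stress. That is why the lexicon records regd per entry instead of computing it.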

30
Irregular Verbs
{base=dive
    cat=verb
    variants=reg
    variants=irreg|dives|dove|dove|diving|
    intran
    intran;part(in)
    ...
31
Dive vs. Dove
32
Regular Adjectives and Adverbs
  • The comparative suffix is er.
  • The superlative suffix is est.
  • y becomes ie after a consonant before er or est.
  • Final e is deleted before er or est.
  • e.g. green: greener, greenest

33
Regular Doubling Adjectives and Adverbs
  • CVC final pattern
  • Final consonant is doubled before er or est.
  • Otherwise regular
  • e.g. red: redder, reddest
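The regular and doubling adjective rules from the last two slides, sketched together. This assumes lowercase input and a known variants value (reg or regd); `reg_adj` and `regd_adj` are hypothetical names.

```python
VOWELS = set("aeiou")


def reg_adj(adj: str) -> tuple[str, str]:
    """Comparative and superlative for a reg adjective or adverb."""
    stem = adj
    # y becomes ie after a consonant...
    if adj.endswith("y") and len(adj) > 1 and adj[-2] not in VOWELS:
        stem = adj[:-1] + "ie"
    # ...and a final e is deleted before er or est
    if stem.endswith("e"):
        stem = stem[:-1]
    return stem + "er", stem + "est"


def regd_adj(adj: str) -> tuple[str, str]:
    """Comparative and superlative for a regd adjective: double the final consonant."""
    doubled = adj + adj[-1]
    return doubled + "er", doubled + "est"


print(reg_adj("green"), reg_adj("pale"), reg_adj("dry"), regd_adj("red"))
```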

34
Ancillary Data Bases
  • Synonymy
    • sm.db
  • Derivation
    • dm.db, dm.rules
  • Inflection
    • im.rules
  • Neoclassical compounds
    • nc.db

35
Derivational Facts and Rules
dm.facts
treatment|noun|treat|verb
prohibition|noun|prohibitive|adj
cell lineage|noun|cell line|noun
photochemotherapeutic|adj|photochemotherapy|noun
pharmacotherapeutic|adj|pharmacotherapy|noun
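Derivational facts pair two terms with their categories, and a lookup should work in either direction. A minimal sketch, assuming four pipe-separated fields per line (term|category|term|category); `load_facts` is a hypothetical helper.

```python
DM_FACTS = """treatment|noun|treat|verb
prohibition|noun|prohibitive|adj
cell lineage|noun|cell line|noun
photochemotherapeutic|adj|photochemotherapy|noun
pharmacotherapeutic|adj|pharmacotherapy|noun"""


def load_facts(text: str) -> dict[str, list[tuple[str, str]]]:
    """Index each fact both ways: term -> [(related term, its category)]."""
    index: dict[str, list[tuple[str, str]]] = {}
    for line in text.splitlines():
        a, a_cat, b, b_cat = line.split("|")
        index.setdefault(a, []).append((b, b_cat))
        index.setdefault(b, []).append((a, a_cat))
    return index


facts = load_facts(DM_FACTS)
print(facts["treat"], facts["treatment"])
```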
36
Derivational Facts and Rules
dm.rules
e.g. alienation -- alienate
ation|noun|ate|verb
ration|rate
station|state
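A derivational rule such as ation|noun|ate|verb rewrites a suffix and changes the category. This sketch reads `ration|rate` and `station|state` as exceptions the rule must not produce. That reading is an assumption, as are the rule tuple and the `derive` helper.

```python
# Hypothetical encoding of one dm.rules entry: (in suffix, in cat, out suffix, out cat)
RULE = ("ation", "noun", "ate", "verb")
# Assumed exceptions: applying the rule would wrongly give 'rate' and 'state'
EXCEPTIONS = {"ration", "station"}


def derive(word: str, rule=RULE, exceptions=EXCEPTIONS):
    """Apply a suffix-replacement rule; return (derived word, category) or None."""
    suffix_in, _cat_in, suffix_out, cat_out = rule
    if word in exceptions or not word.endswith(suffix_in):
        return None
    return word[: -len(suffix_in)] + suffix_out, cat_out


print(derive("alienation"), derive("ration"))
```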
37
Inflectional Facts and Rules
im.rules, noun rules (glreg)
us|noun|singular|i|noun|plural
antus|anti
ma|noun|singular|mata|noun|plural
a|noun|singular|ae|noun|plural
um|noun|singular|a|noun|plural
on|noun|singular|a|noun|plural
sis|noun|singular|ses|noun|plural
is|noun|singular|ides|noun|plural
men|noun|singular|mina|noun|plural
ex|noun|singular|ices|noun|plural
x|noun|singular|ces|noun|plural
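The glreg rules map a singular suffix to a plural suffix. Because several suffixes overlap (sis and is, ma and a, ex and x), the longest matching suffix should win. A minimal sketch under that assumption; `GLREG_RULES` and `glreg_plural` are hypothetical names, and the unclear `antus|anti` line is left out.

```python
# (singular suffix, plural suffix) pairs transcribed from the glreg rules above
GLREG_RULES = [
    ("sis", "ses"), ("men", "mina"), ("ma", "mata"), ("us", "i"),
    ("um", "a"), ("on", "a"), ("is", "ides"), ("ex", "ices"),
    ("x", "ces"), ("a", "ae"),
]


def glreg_plural(noun: str):
    """Return the Greco-Latin plural using the longest matching suffix, or None."""
    for sing, plur in sorted(GLREG_RULES, key=lambda r: len(r[0]), reverse=True):
        if noun.endswith(sing):
            return noun[: -len(sing)] + plur
    return None


print([glreg_plural(n) for n in ["nucleus", "sarcoma", "diagnosis", "foramen", "cortex"]])
```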
38
Neoclassical compounds
nc.db
abdomin(o)|abdomen|root
ab|away from|prefix
acanth(o)|prickle|root
acar(o)|mite|root
acetabul(o)|acetabulum|root
ad|towards|prefix
agogue|inducing|terminal
albumin(o)|albumin|root
sis|condition|terminal
stomy|surgical opening|terminal
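With entries like these, a two-part neoclassical word can be split into a root or prefix plus a terminal. This sketch assumes `(o)` marks an optional connecting vowel and handles exactly two parts. `NC_DB`, `forms`, and `decompose` are hypothetical, and "abdominostomy" is only an illustrative input.

```python
# Entries transcribed from the nc.db lines above: part -> (meaning, kind)
NC_DB = {
    "abdomin(o)": ("abdomen", "root"),
    "ab": ("away from", "prefix"),
    "acanth(o)": ("prickle", "root"),
    "acar(o)": ("mite", "root"),
    "acetabul(o)": ("acetabulum", "root"),
    "ad": ("towards", "prefix"),
    "agogue": ("inducing", "terminal"),
    "albumin(o)": ("albumin", "root"),
    "sis": ("condition", "terminal"),
    "stomy": ("surgical opening", "terminal"),
}


def forms(part: str) -> list[str]:
    """Surface forms of a part: 'acanth(o)' may appear as 'acantho' or 'acanth'."""
    if part.endswith("(o)"):
        stem = part[:-3]
        return [stem + "o", stem]
    return [part]


def decompose(word: str):
    """Split a word into (part, meaning) pairs: one root/prefix plus one terminal."""
    for term, (term_meaning, kind) in NC_DB.items():
        if kind != "terminal" or not word.endswith(term):
            continue
        head = word[: -len(term)]
        for part, (meaning, part_kind) in NC_DB.items():
            if part_kind != "terminal" and head in forms(part):
                return [(part, meaning), (term, term_meaning)]
    return None


print(decompose("acanthosis"))
```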
39
Synonyms
sm.db
alar|adj|wing|noun
amygdaline|adj|tonsil|noun
articular|adj|joint|noun
bulbar|adj|medulla oblongata|noun
furuncular|adj|boil|noun
genicular|adj|knee|noun
hepatocellular|adj|liver cells|noun
lazar|adj|leprosy|noun
lenticular|adj|crystalline lens|noun
ypsiliform|adj|upsiloid|adj
wolfram|noun|tungsten|noun
double vision|noun|diplopia|noun
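One use of a synonym table like sm.db is query expansion: a phrase is rewritten by substituting a synonym for any whole-word match. A minimal sketch using a few of the pairs above, assuming four pipe-separated fields per line and symmetric synonymy; `load_pairs` and `expand` are hypothetical.

```python
import re

SM_DB = """alar|adj|wing|noun
articular|adj|joint|noun
furuncular|adj|boil|noun
genicular|adj|knee|noun
hepatocellular|adj|liver cells|noun
double vision|noun|diplopia|noun"""


def load_pairs(text: str) -> list[tuple[str, str]]:
    """Read term pairs and add the reverse direction (synonymy is symmetric)."""
    pairs = []
    for line in text.splitlines():
        a, _a_cat, b, _b_cat = line.split("|")
        pairs += [(a, b), (b, a)]
    return pairs


def expand(phrase: str, pairs) -> set[str]:
    """Return the phrase plus one variant per synonym that matches a whole word or phrase."""
    variants = {phrase}
    for src, dst in pairs:
        pattern = r"\b" + re.escape(src) + r"\b"
        if re.search(pattern, phrase):
            # A function replacement avoids re interpreting backslashes in dst
            variants.add(re.sub(pattern, lambda _m: dst, phrase))
    return variants


pairs = load_pairs(SM_DB)
print(expand("genicular pain", pairs))
```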
40
Relational Tables
  • One line records
  • Pipe-separated fields
  • Keyed to EUI
  • LRAGR matches forms to EUIs
  • Word index LRWD

41
Relational Tables
  • LRAGR - Agreement
  • LRCMP - Complements
  • LRFIL - Files
  • LRFLD - Fields
  • LRMOD - Modification
  • LRNOM - Nominalization
  • LRPRN - Pronouns
  • LRPRP - Properties
  • LRSPL - Spelling
  • LRTRM - Trademarks
  • LRWD - Word index

42
LRAGR
Agreement and Inflection
  • EUI - Entry ID
  • STR - Inflected form
  • SCA - Syntactic category
  • AGR - agreement information
  • BAS - Base form (morphological)
  • CIT - Citation form (base)

43
LRAGR
E0003576|Kaposi sarcomas|noun|count(thr_plur)|Kaposi sarcoma|Kaposi's sarcoma|
E0003576|Kaposi sarcomata|noun|count(thr_plur)|Kaposi sarcoma|Kaposi's sarcoma|
E0003576|Kaposi sarcoma|noun|count(thr_sing)|Kaposi sarcoma|Kaposi's sarcoma|
E0003576|Kaposi sarcoma|noun|uncount(thr_sing)|Kaposi sarcoma|Kaposi's sarcoma|
E0003576|Kaposi's sarcomas|noun|count(thr_plur)|Kaposi's sarcoma|Kaposi's sarcoma|
E0003576|Kaposi's sarcomata|noun|count(thr_plur)|Kaposi's sarcoma|Kaposi's sarcoma|
E0003576|Kaposi's sarcoma|noun|count(thr_sing)|Kaposi's sarcoma|Kaposi's sarcoma|
E0003576|Kaposi's sarcoma|noun|uncount(thr_sing)|Kaposi's sarcoma|Kaposi's sarcoma|
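Because LRAGR rows are one-line, pipe-separated records in a fixed field order (EUI, STR, SCA, AGR, BAS, CIT), matching an inflected form to its EUIs reduces to indexing rows by STR. A minimal sketch, assuming exact string matching and tolerating a trailing pipe; `load_lragr` is hypothetical.

```python
from collections import defaultdict

LRAGR_FIELDS = ["EUI", "STR", "SCA", "AGR", "BAS", "CIT"]

LRAGR_ROWS = """E0003576|Kaposi sarcomata|noun|count(thr_plur)|Kaposi sarcoma|Kaposi's sarcoma|
E0003576|Kaposi's sarcomas|noun|count(thr_plur)|Kaposi's sarcoma|Kaposi's sarcoma|
E0003576|Kaposi's sarcoma|noun|count(thr_sing)|Kaposi's sarcoma|Kaposi's sarcoma|
E0003576|Kaposi's sarcoma|noun|uncount(thr_sing)|Kaposi's sarcoma|Kaposi's sarcoma|"""


def load_lragr(text: str) -> dict[str, list[dict[str, str]]]:
    """Index LRAGR rows by inflected form (STR) so a form maps to its rows and EUIs."""
    by_form: dict[str, list[dict[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        values = line.rstrip("|").split("|")
        row = dict(zip(LRAGR_FIELDS, values))
        by_form[row["STR"]].append(row)
    return by_form


index = load_lragr(LRAGR_ROWS)
print([(r["EUI"], r["AGR"]) for r in index["Kaposi's sarcoma"]])
```

A real lookup would probably normalize case before matching. Exact matching keeps the sketch short.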
44
Number Words
  • one, thirteen, fifty, thousand, million
  • Not in the lexicon.
  • No part of speech
  • Used to construct number expressions
  • Three thousand eight hundred and five
  • To be released in the 2003 lexicon.
  • Accompanying number tools.

45
{base=two
cat=number_word
entry=N0000003
    variant=second|ordinal
    variant=second|denominator,singular|part_denominator
    variant=second|denominator,plural|part_denominator
    variant=half|denominator,singular|full_denominator
    variant=halves|denominator,plural|full_denominator
    number_type=unit
    value=2
    digit=2
}
46
{base=twenty
cat=number_word
entry=N0000021
    variants=reg
    number_type=decade
    value=20
    digit=2
}
{base=twelve
cat=number_word
entry=N0000013
    variants=reg
    number_type=teen
    value=12
}
{base=billion
cat=number_word
entry=N0000032
    variants=reg
    number_type=magnitude
    power=3
}
{base=sexdecillion
cat=number_word
entry=N0000046
    variants=reg
    number_type=magnitude
    power=17
}
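The number_type slots suggest how a number expression such as "three thousand eight hundred and five" can be evaluated. Units, teens, and decades add into a running group, and a magnitude word multiplies the group and closes it. This sketch reads `power` as an exponent of 1000, which fits billion = 1000^3 and sexdecillion = 1000^17 = 10^51 in the American short scale; that reading is an assumption. Words without a record on the slides, and the handling of "hundred", are also assumptions. `NUMBER_WORDS` and `number_value` are hypothetical, and the `digit` slot is not used.

```python
# word -> (number_type, value or power); entries marked "assumed" have no record on the slides
NUMBER_WORDS = {
    "one": ("unit", 1),            # assumed
    "two": ("unit", 2),
    "three": ("unit", 3),          # assumed
    "five": ("unit", 5),           # assumed
    "eight": ("unit", 8),          # assumed
    "twelve": ("teen", 12),
    "twenty": ("decade", 20),
    "thousand": ("magnitude", 1),  # assumed: 1000 ** 1
    "billion": ("magnitude", 3),
    "sexdecillion": ("magnitude", 17),
}


def number_value(expression: str) -> int:
    """Evaluate an English number expression built from NUMBER_WORDS."""
    total, group = 0, 0  # group holds the part below the next magnitude word
    for word in expression.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word == "hundred":  # assumed handling: multiplies the current group
            group *= 100
            continue
        if word not in NUMBER_WORDS:
            raise ValueError(f"not a known number word: {word!r}")
        number_type, n = NUMBER_WORDS[word]
        if number_type == "magnitude":
            total += group * 1000 ** n  # power read as an exponent of 1000
            group = 0
        else:  # unit, teen, or decade: add the value
            group += n
    return total + group


print(number_value("Three thousand eight hundred and five"))  # 3805
```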
47
Text processing
Lexical tools
SPECIALIST LEXICON
48
Lexical Tools
  • Wordind -- breaks strings into words
    • Produces the Metathesaurus word indexes (MRXW)
  • LVG -- performs various lexical transformations
  • NORM -- a selection of LVG transformations
    • Used for Metathesaurus indexing
    • Produces the Metathesaurus normalized word and string indexes (MRXNW, MRXNS)
    • Used to access those indexes
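The slide does not give Wordind's exact tokenization rules. This sketch assumes a common approach: lowercase the string and split at every character that is not an ASCII letter or digit. `word_index_terms` is a hypothetical name, and the real tool may differ, for example in how it treats apostrophes or non-ASCII letters.

```python
import re


def word_index_terms(text: str) -> list[str]:
    """Lowercase the string and split at every non-alphanumeric (ASCII) character."""
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]


print(word_index_terms("Hodgkin's disease, NOS"))  # ['hodgkin', 's', 'disease', 'nos']
```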

49
Normalization
  • Hodgkin Disease
  • HODGKINS DISEASE
  • Hodgkin's Disease
  • Disease, Hodgkin's
  • HODGKIN'S DISEASE
  • Hodgkin's disease
  • Hodgkins Disease
  • Hodgkin's disease NOS
  • Hodgkin's disease, NOS
  • Disease, Hodgkins
  • Diseases, Hodgkins
  • Hodgkins Diseases
  • Hodgkins disease
  • hodgkin's disease
  • DiseaseHodgkins
  • Disease, Hodgkin
  • disease hodgkin
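The list above collapses to one normalized string, `disease hodgkin`, which is its last item. Below is a rough sketch of a NORM-like pipeline: lowercase, drop a genitive 's, turn punctuation into spaces, remove stop words, uninflect, and sort the words alphabetically. It approximates the kind of LVG transformations NORM selects; it is not the tool itself. The stop-word set and `naive_uninflect` are hypothetical stand-ins, since LVG uninflects through the lexicon instead of stripping a final s.

```python
import re

STOP_WORDS = {"nos", "of", "the", "and"}  # assumed subset, for illustration only


def naive_uninflect(word: str) -> str:
    """Crude stand-in for lexicon-based uninflection: strip a plural -s."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def norm(text: str) -> str:
    s = text.lower()
    s = re.sub(r"'s\b", "", s)         # remove genitive 's
    s = re.sub(r"[^a-z0-9]+", " ", s)  # punctuation -> space
    words = [naive_uninflect(w) for w in s.split() if w not in STOP_WORDS]
    return " ".join(sorted(words))     # word order is ignored


print(norm("Hodgkin's disease, NOS"))  # disease hodgkin
```

With this sketch, every form in the list except the run-together `DiseaseHodgkins` reduces to `disease hodgkin`.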