The SPECIALIST Lexicon and Lexical Tools - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: The SPECIALIST Lexicon and Lexical Tools


1
The SPECIALIST Lexicon and Lexical Tools
  • Allen Browne
  • Guy Divita
  • Chris Lu

3
Text processing
Lexical tools
SPECIALIST LEXICON
4
The SPECIALIST Lexicon
  • A syntactic lexicon
  • Biomedical and general English
  • Over 180,000 records

5
The SPECIALIST Lexicon
  • General English
  • 10,000 most frequent words from the American
    Heritage word frequency list
  • 2,000 words used by Longman's Dictionary of
    Contemporary English
  • Verbs and adjectives identified by heuristics

6
Lexicon Growth
7
George A. Miller, The Science of Words, 1991
8
The SPECIALIST Lexicon
  • Morphology
  • Inflection
  • Derivation
  • Orthography
  • Spelling variants
  • Syntax
  • Complementation for verbs, nouns, and adjectives

9
Morphology
  • Inflectional
  • nucleus -- nuclei
  • cauterize, cauterizes, cauterized, cauterizing
  • red, redder, reddest
  • Derivational
  • laryngeal -- larynx
  • transport -- transportation

10
Inflectional Morphology
The pitcher wound up and he flang the ball at the
batter. The batter swang and missed. The pitcher
flang the ball again and this time the batter
connected. He hit a high fly right to the center
fielder. The center fielder was all set to catch
the ball, but at the last minute his eyes were
blound by the sun and he dropped it. --J. H.
"Dizzy" Dean
11
Derivational Morphology
Dictionary ology ist
12
Orthography
Spelling Variation
  • align -- aline
  • Graves' disease -- Graves's disease -- Graves
    disease
  • anesthetize -- anaesthetise
  • esophagus -- oesophagus

13
British and American Spelling
  • criticise -- criticize
  • naturalise -- naturalize
  • centre -- center
  • foetus -- fetus

14
Syntax -- Verb Complements
  • intran
  • I'll treat.
  • tran=np
  • He treated the patient.
  • ditran=np,pphr(with,np)
  • She treated the patient with the drug.

15
Syntax -- Verb Complements
base=treat
entry=E0061964
cat=verb
variants=reg
intran
tran=np
tran=pphr(with,np)
tran=pphr(of,np)
ditran=np,pphr(to,np)
ditran=np,pphr(with,np)
ditran=np,pphr(for,np)
cplxtran=np,advbl
nominalization=treatment|noun|E0061968
16
The 2003 SPECIALIST Lexicon
18
Lexicon Unit Records
base=chronic
entry=E0016869
cat=adj
variants=inv
position=attrib(1)
position=pred
stative

base=Kaposi's sarcoma
spelling_variant=Kaposi sarcoma
entry=E0003576
cat=noun
variants=uncount
variants=reg
variants=glreg

base=aspirate
entry=E0010803
cat=verb
variants=reg
tran=np
nominalization=aspiration|noun|E0010804

base=in
entry=E0033870
cat=prep
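A unit record of this kind can be read into a simple dictionary. The following is a minimal sketch, assuming one slot per line, "=" between slot name and value, "|" between subfields, and blank lines between records (the blank-line separator and the function name parse_records are assumptions made for illustration, not part of the distributed format).

```python
from collections import defaultdict

# Sample records in an assumed slot=value layout, one slot per line.
SAMPLE = """\
base=chronic
entry=E0016869
cat=adj
variants=inv
position=attrib(1)
position=pred
stative

base=aspirate
entry=E0010803
cat=verb
variants=reg
tran=np
nominalization=aspiration|noun|E0010804
"""


def parse_records(text):
    """Parse blank-line separated records into dicts of slot -> list of values."""
    records = []
    for block in text.strip().split("\n\n"):
        slots = defaultdict(list)
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            name, sep, value = line.partition("=")
            # A bare slot such as "stative" or "intran" carries no value.
            slots[name].append(value.split("|") if sep else None)
        records.append(dict(slots))
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record["base"][0][0], record["cat"][0][0], sorted(record))
```

Slots may repeat (position, variants), so every slot maps to a list rather than a single value.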
19
Noun Variants
  • Kaposi's sarcoma
  • Kaposi's sarcomas
  • Kaposi's sarcomata
  • Kaposi sarcoma
  • Kaposi sarcomas
  • Kaposi sarcomata

base=Kaposi's sarcoma
spelling_variant=Kaposi sarcoma
entry=E0003576
cat=noun
variants=uncount
variants=reg
variants=glreg
20
Regular Nouns
The plural suffix is s. y becomes ie following a
consonant before s. e is inserted before s if the
base ends in s, z, x, ch, or sh.
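The regular noun rule can be sketched in a few lines. This is an illustration only, taking the sibilant endings to be s, z, x, ch, and sh; the function name regular_plural is hypothetical and is not part of the lexical tools.

```python
VOWELS = "aeiou"


def regular_plural(base: str) -> str:
    """Form the regular plural of a noun base form."""
    # e is inserted before s after a sibilant ending.
    if base.endswith(("s", "z", "x", "ch", "sh")):
        return base + "es"
    # y becomes ie following a consonant.
    if len(base) > 1 and base.endswith("y") and base[-2] not in VOWELS:
        return base[:-1] + "ies"
    # Otherwise the plural suffix is simply s.
    return base + "s"


if __name__ == "__main__":
    for noun in ("sarcoma", "biopsy", "x-ray", "abscess", "stitch"):
        print(noun, "->", regular_plural(noun))
```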
21
Regular Nouns
22
Greco-Latin Regular Nouns
23
Uncount Nouns (abstract or mass)
base=smallpox
entry=E0056359
cat=noun
variants=uncount

base=potassium
entry=E0049387
cat=noun
variants=uncount
  • a smallpox
  • two smallpoxes
  • much smallpox
  • a potassium
  • two potassiums
  • much potassium

24
Fixed Plural Nouns
base=scissors
entry=E0054633
cat=noun
variants=plur

base=police
entry=E0048616
cat=noun
variants=plur
25
Irregular Nouns
base=larynx
entry=E0036919
cat=noun
variants=irreg|larynges|
variants=reg

base=corpus
entry=E0019113
cat=noun
variants=irreg|corpora|
variants=reg
26
Regular Verbs
  • The third person present tense suffix is s.
  • y becomes ie following a consonant before s.
  • e is inserted between s, z, x, ch, or sh and s.
  • The past tense suffix is ed.
  • y becomes i following a consonant before ed.
  • Final e is deleted before ed.

27
Regular Verbs
  • dismiss -- dismisses, dismissed, dismissing
  • agree -- agrees, agreed, agreeing
  • dry -- dries, dried, drying
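The regular verb paradigm in these examples can be generated with a short function. The sketch below follows the stated rules for the s and ed forms; the ing rule (drop a final e unless the base ends in ee) is not given on the slides and is an assumption chosen to match the examples, and the name regular_verb_forms is hypothetical.

```python
VOWELS = "aeiou"


def _consonant_y(base: str) -> bool:
    """True if the base ends in y preceded by a consonant."""
    return len(base) > 1 and base.endswith("y") and base[-2] not in VOWELS


def regular_verb_forms(base: str):
    """Return (third person singular, past, present participle)."""
    # Third person singular present.
    if base.endswith(("s", "z", "x", "ch", "sh")):
        third = base + "es"
    elif _consonant_y(base):
        third = base[:-1] + "ies"
    else:
        third = base + "s"

    # Past tense: a final e absorbs the e of ed.
    if _consonant_y(base):
        past = base[:-1] + "ied"
    elif base.endswith("e"):
        past = base + "d"
    else:
        past = base + "ed"

    # Present participle (assumed rule, not stated on the slides).
    if base.endswith("e") and not base.endswith("ee"):
        participle = base[:-1] + "ing"
    else:
        participle = base + "ing"

    return third, past, participle


if __name__ == "__main__":
    for verb in ("dismiss", "agree", "dry", "cauterize"):
        print(verb, "->", ", ".join(regular_verb_forms(verb)))
```

Verbs ending in ie (die, tie) are not handled by the assumed ing rule.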

28
Regular Doubling Verbs
  • End in a CVC pattern
  • Double the final consonant before ed and ing.
  • Are otherwise regular
  • variants=regd
  • e.g. control -- controls, controlled, controlling
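A sketch of the doubling pattern follows. Because a CVC ending alone does not predict doubling (visit does not double), the sketch assumes the caller already knows the verb is marked regd in its lexical record; the CVC check is only a sanity test, and both function names are hypothetical.

```python
VOWELS = "aeiou"


def ends_in_cvc(base: str) -> bool:
    """True if the last three letters are consonant, vowel, consonant."""
    if len(base) < 3:
        return False
    first, middle, last = base[-3], base[-2], base[-1]
    return first not in VOWELS and middle in VOWELS and last not in VOWELS


def regd_verb_forms(base: str):
    """Return (third person singular, past, present participle) for a regd verb."""
    if not ends_in_cvc(base):
        raise ValueError(f"{base!r} does not end in a CVC pattern")
    doubled = base + base[-1]  # double the final consonant
    return base + "s", doubled + "ed", doubled + "ing"


if __name__ == "__main__":
    print(regd_verb_forms("control"))
```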

29
Irregular Verbs
base=dive
cat=verb
variants=reg
variants=irreg|dives|dove|dove|diving|
intran
intran;part(in)
...
30
Dive vs. Dove
31
Regular Adjectives and Adverbs
  • The comparative suffix is er.
  • The superlative suffix is est.
  • y becomes i after a consonant before er or est.
  • Final e is deleted before er or est.
  • e.g. green -- greener, greenest

32
Regular Doubling Adjectives and Adverbs
  • CVC final pattern
  • Final consonant is doubled before er or est.
  • Otherwise regular
  • e.g. red -- redder, reddest
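Comparative and superlative forms for regular and regular doubling adjectives can be sketched together. Here y is taken to become i before er and est (happy, happier), and the doubling flag stands in for the lexicon's own marking of doubling entries; the function name and the flag are hypothetical.

```python
VOWELS = "aeiou"


def regular_adj_forms(base: str, doubling: bool = False):
    """Return (comparative, superlative) for a regular adjective or adverb."""
    if doubling:
        stem = base + base[-1]      # red -> redd
    elif len(base) > 1 and base.endswith("y") and base[-2] not in VOWELS:
        stem = base[:-1] + "i"      # happy -> happi
    elif base.endswith("e"):
        stem = base[:-1]            # large -> larg
    else:
        stem = base                 # green -> green
    return stem + "er", stem + "est"


if __name__ == "__main__":
    print(regular_adj_forms("green"))
    print(regular_adj_forms("red", doubling=True))
```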

33
Ancillary Data Bases
  • Synonymy
  • sm.db
  • Derivation
  • dm.db, dm.rules
  • Inflection
  • im.rules
  • Neoclassical compounds
  • nc.db

34
Derivational Facts and Rules
dm.facts
treatment|noun|treat|verb
prohibition|noun|prohibitive|adj
cell lineage|noun|cell line|noun
photochemotherapeutic|adj|photochemotherapy|noun
pharmacotherapeutic|adj|pharmacotherapy|noun
35
Derivational Facts and Rules
dm.rules
e.g. alienation -- alienate
ation|noun|ate|verb
ration|rate
station|state
36
Inflectional Facts and Rules
im.rules
Noun rules (glreg)
us|noun|singular|i|noun|plural
antus|anti
ma|noun|singular|mata|noun|plural
a|noun|singular|ae|noun|plural
um|noun|singular|a|noun|plural
on|noun|singular|a|noun|plural
sis|noun|singular|ses|noun|plural
is|noun|singular|ides|noun|plural
men|noun|singular|mina|noun|plural
ex|noun|singular|ices|noun|plural
x|noun|singular|ces|noun|plural
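The glreg suffix rules can be applied with a simple suffix table. The sketch below is an illustration, not the im.rules engine: it picks the longest matching singular suffix (so sarcoma takes ma/mata rather than a/ae), and that tie-breaking choice and the name glreg_plural are assumptions.

```python
# (singular suffix, plural suffix) pairs taken from the glreg noun rules.
GLREG_RULES = [
    ("us", "i"), ("ma", "mata"), ("a", "ae"), ("um", "a"), ("on", "a"),
    ("sis", "ses"), ("is", "ides"), ("men", "mina"), ("ex", "ices"), ("x", "ces"),
]


def glreg_plural(base: str):
    """Return the Greco-Latin plural of base, or None if no rule applies."""
    # Try longer suffixes first so the most specific rule wins (assumption).
    for singular, plural in sorted(GLREG_RULES, key=lambda r: len(r[0]), reverse=True):
        if base.endswith(singular):
            return base[: -len(singular)] + plural
    return None


if __name__ == "__main__":
    for noun in ("nucleus", "Kaposi's sarcoma", "bacterium", "diagnosis", "index"):
        print(noun, "->", glreg_plural(noun))
```

A noun such as larynx would come out wrong under the x rule, which is why its record lists the plural as an irregular variant.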
37
Neoclassical compounds
nc.db
abdomin(o)|abdomen|root
ab|away from|prefix
acanth(o)|prickle|root
acar(o)|mite|root
acetabul(o)|acetabulum|root
ad|towards|prefix
agogue|inducing|terminal
albumin(o)|albumin|root
sis|condition|terminal
stomy|surgical opening|terminal
38
Synonyms
sm.db
alar|adj|wing|noun
amygdaline|adj|tonsil|noun
articular|adj|joint|noun
bulbar|adj|medulla oblongata|noun
furuncular|adj|boil|noun
genicular|adj|knee|noun
hepatocellular|adj|liver cells|noun
lazar|adj|leprosy|noun
lenticular|adj|crystalline lens|noun
ypsiliform|adj|upsiloid|adj
wolfram|noun|tungsten|noun
double vision|noun|diplopia|noun
39
Relational Tables
  • One line records
  • Pipe-separated fields (|)
  • Keyed to EUI
  • LRAGR matches forms to EUIs
  • Word index LRWD

40
Relational Tables
  • LRAGR - Agreement
  • LRCMP - Complements
  • LRFIL - Files
  • LRFLD - Fields
  • LRMOD - Modification
  • LRNOM - Nominalization
  • LRPRN - Pronouns
  • LRPRP - Properties
  • LRSPL - Spelling
  • LRTRM - Trademarks
  • LRWD - Word index

41
LRAGR
Agreement and Inflection
  • EUI - Entry ID
  • STR - Inflected form
  • SCA - Syntactic category
  • AGR - agreement information
  • BAS - Base form (morphological)
  • CIT - Citation form (base)

42
LRAGR
E0003576|Kaposi sarcomas|noun|count(thr_plur)|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi sarcomata|noun|count(thr_plur)|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi sarcoma|noun|count(thr_sing)|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi sarcoma|noun|uncount(thr_sing)|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcomas|noun|count(thr_plur)|Kaposi's sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcomata|noun|count(thr_plur)|Kaposi's sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcoma|noun|count(thr_sing)|Kaposi's sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcoma|noun|uncount(thr_sing)|Kaposi's sarcoma|Kaposi's sarcoma
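Since LRAGR matches forms to EUIs, a lookup from inflected form to entry can be built directly from its rows. The sketch below assumes six pipe-separated fields in the order EUI, STR, SCA, AGR, BAS, CIT and tolerates a trailing pipe; the class and function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class AgrRow(NamedTuple):
    eui: str   # EUI - entry ID
    form: str  # STR - inflected form
    sca: str   # SCA - syntactic category
    agr: str   # AGR - agreement information
    bas: str   # BAS - base form
    cit: str   # CIT - citation form


SAMPLE = """\
E0003576|Kaposi sarcomas|noun|count(thr_plur)|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcomata|noun|count(thr_plur)|Kaposi's sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcoma|noun|uncount(thr_sing)|Kaposi's sarcoma|Kaposi's sarcoma
"""


def parse_lragr(text: str):
    """Parse one-line, pipe-separated LRAGR-style rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.rstrip("|").split("|")
        rows.append(AgrRow(*fields[:6]))
    return rows


def index_by_form(rows):
    """Map each lowercased inflected form to the set of EUIs it belongs to."""
    index = defaultdict(set)
    for row in rows:
        index[row.form.lower()].add(row.eui)
    return index


if __name__ == "__main__":
    index = index_by_form(parse_lragr(SAMPLE))
    print(index["kaposi's sarcomata"])
```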
43
Number Words
  • one, thirteen, fifty, thousand, million
  • Not in the lexicon.
  • No part of speech
  • Used to construct number expressions
  • Three thousand eight hundred and five
  • To be released in the 2003 lexicon.
  • Accompanying number tools.

44
base=two
cat=number_word
entry=N0000003
variant=second|ordinal
variant=second|denominator,singular|part_denominator
variant=second|denominator,plural|part_denominator
variant=half|denominator,singular|full_denominator
variant=halves|denominator,plural|full_denominator
number_type=unit
value=2
digit=2
45
base=twenty
cat=number_word
entry=N0000021
variants=reg
number_type=decade
value=20
digit=2

base=twelve
cat=number_word
entry=N0000013
variants=reg
number_type=teen
value=12

base=billion
cat=number_word
entry=N0000032
variants=reg
number_type=magnitude
power=3

base=sexdecillion
cat=number_word
entry=N0000046
variants=reg
number_type=magnitude
power=17
46
sixty four million four hundred thousand
64 x 1,000,000 + 400,000 = 64,400,000
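The way a number expression composes from number words can be sketched as follows. This is not the number tools themselves: the word tables are abbreviated, magnitude words are read as powers of 1,000 (billion with power 3 is 1000 ** 3), and the treatment of "hundred" and "and" is an assumption.

```python
SMALL = {  # unit and teen values (ten and eleven grouped here for brevity)
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
DECADES = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
MAGNITUDES = {"thousand": 1, "million": 2, "billion": 3}  # power of 1,000


def number_value(expression: str) -> int:
    """Compute the integer value of an English number expression."""
    total = 0  # value of the completed magnitude groups
    group = 0  # value of the group currently being read (0..999)
    for word in expression.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in SMALL:
            group += SMALL[word]
        elif word in DECADES:
            group += DECADES[word]
        elif word == "hundred":
            group = (group or 1) * 100
        elif word in MAGNITUDES:
            total += (group or 1) * 1000 ** MAGNITUDES[word]
            group = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + group


if __name__ == "__main__":
    print(number_value("sixty four million four hundred thousand"))
    print(number_value("Three thousand eight hundred and five"))
```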
47
Text processing
Lexical tools
SPECIALIST LEXICON
48
Lexical Tools
  • Wordind -- breaks strings into words
  • Produces the Metathesaurus word indexes (MRXW)
  • LVG -- performs various lexical transformations
  • NORM -- a selection of LVG transformations
  • Used for Metathesaurus indexing
  • Produces the Metathesaurus Normalized word and
    string indexes (MRXNW MRXNS)
  • Used to access those indexes

49
Normalization
  • Hodgkin Disease
  • HODGKINS DISEASE
  • Hodgkin's Disease
  • Disease, Hodgkin's
  • HODGKIN'S DISEASE
  • Hodgkin's disease
  • Hodgkins Disease
  • Hodgkin's disease NOS
  • Hodgkin's disease, NOS
  • Disease, Hodgkins
  • Diseases, Hodgkins
  • Hodgkins Diseases
  • Hodgkins disease
  • hodgkin's disease
  • DiseaseHodgkins
  • Disease, Hodgkin
  • disease hodgkin
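The effect of normalization on these variants can be imitated with a rough stand-in. The sketch below lowercases, drops genitive 's, replaces punctuation with spaces, removes a stop word, strips a final s, and sorts the words. The real NORM uninflects words using the lexicon rather than a final-s heuristic, and the stop-word list here is an assumption limited to what this example needs.

```python
import re

STOP_WORDS = {"nos"}  # assumed minimal list, enough for this example


def crude_norm(term: str) -> str:
    """Very rough imitation of normalization; not the NORM program."""
    text = term.lower()
    text = re.sub(r"'s\b", "", text)           # remove genitive 's
    text = re.sub(r"[^a-z0-9 ]+", " ", text)   # punctuation becomes a space
    words = []
    for word in text.split():
        if word in STOP_WORDS:
            continue
        # Crude uninflection: strip one final s (real NORM uses the lexicon).
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return " ".join(sorted(words))             # word order is ignored


if __name__ == "__main__":
    variants = [
        "Hodgkin Disease", "HODGKINS DISEASE", "Disease, Hodgkin's",
        "Hodgkin's disease, NOS", "Diseases, Hodgkins",
    ]
    print({crude_norm(v) for v in variants})
```

The run-together string DiseaseHodgkins is not handled, since the sketch has no way to split it into words.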