Lexical Tools - PowerPoint PPT Presentation

Slides: 61
Provided by: Div62
Transcript and Presenter's Notes

1
Lexical Tools
2
The Lexical Tools
  • Introduction
  • Norm
  • WordInd
  • Lvg
  • Additional Tools Developed by NLM

3
Lexical Tools Introduction
  • Command line tools
  • norm
  • lvg
  • wordInd
  • Web GUI, Lexical Gui Tool (lgt)
  • Embeddable Java APIs

4
Lexical Tools Introduction
  • These tools are good for
  • aggressive text pattern matching
  • making word, term, phrase indexes
  • matching queries with indexed entries
  • increasing recall and/or precision

5
Lexical Tools Introduction
  • Characteristics of all the command line tools
  • take input from the screen or a file
  • put their results to the screen or a file
  • Interpret fielded text
  • Can be told which fields contain what type of
    information

6
Lexical Tools Norm
Metathesaurus English Strings -> norm -> Normalized string index (MRXNS.ENG) -> WordInd -> Normalized word index (MRXNW.ENG)
7
Lexical Tools Norm
Query -> norm -> Normed term
Normed term -> Normalized string index / Normalized word index -> SUIs -> Metathesaurus Concepts
Result: Metathesaurus Concepts that match the normalized query
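The lookup on this slide can be sketched in Python. This is a minimal illustration, not the NLM implementation: `simple_norm` is a hypothetical stand-in for norm (it only lowercases, replaces punctuation with spaces, and sorts the words), and the two rows reuse identifiers that appear in later slides of this presentation.

```python
import re
from collections import defaultdict

def simple_norm(text):
    # Hypothetical stand-in for norm: lowercase, punctuation to spaces, sort words.
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(sorted(words))

def build_index(rows):
    # Map each normalized string to the (CUI, SUI) pairs that produced it.
    index = defaultdict(set)
    for cui, sui, string in rows:
        index[simple_norm(string)].add((cui, sui))
    return index

def lookup(index, query):
    # Normalize the query the same way, then look it up in the index.
    return sorted(index.get(simple_norm(query), set()))

rows = [
    ("C0035440", "S0003894", "Rheumatic carditis, acute"),
    ("C0185495", "S0298948", "Denis-Browne splint strapping"),
]
index = build_index(rows)
print(lookup(index, "acute rheumatic carditis"))
```

Because both the indexed strings and the query pass through the same normalization, differences in case, punctuation, and word order no longer prevent a match.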
8
Lexical Tools Norm
  • Norm abstracts away from
  • case
  • punctuation
  • word order
  • possessive forms
  • inflectional variation

9
Lexical Tools Norm
Hodgkin's Diseases, NOS
remove genitives
replace punctuation with spaces
remove stop words
lowercase
uninflect each word
word order sort
10-15
Lexical Tools Norm
Hodgkin's Diseases, NOS
  remove genitives
Hodgkin Diseases, NOS
  replace punctuation with spaces
Hodgkin Diseases NOS
  remove stop words
Hodgkin Diseases
  lowercase
hodgkin diseases
  uninflect each word
hodgkin disease
  word order sort
disease hodgkin
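The six norm steps can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the actual norm program: `STOP_WORDS` and `UNINFLECT` are tiny hypothetical tables invented for this example, whereas the real tool relies on its own stop word list and lexicon.

```python
import re

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"of", "and", "with", "for", "nos", "to", "in", "by", "on", "the"}
UNINFLECT = {"diseases": "disease", "eyes": "eye"}

def norm_sketch(term):
    # 1. remove genitives: drop 's at the end of a word
    text = re.sub(r"(\w)'s\b", r"\1", term)
    # 2. replace punctuation with spaces
    text = re.sub(r"[^\w\s]", " ", text)
    # 3. remove stop words (compared case-insensitively)
    words = [w for w in text.split() if w.lower() not in STOP_WORDS]
    # 4. lowercase
    words = [w.lower() for w in words]
    # 5. uninflect each word using the toy table
    words = [UNINFLECT.get(w, w) for w in words]
    # 6. word order sort
    return " ".join(sorted(words))

print(norm_sketch("Hodgkin's Diseases, NOS"))
```

Each step feeds the next, so the intermediate strings follow the same sequence as the slides.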
16
Lexical Tools Norm
Norm to
17
Lexical Tools WordInd
18
Lexical Tools WordInd
WordInd is a tool to break terms into words. It
is used to take a row from a Metathesaurus table
that contains a term, sentence, paragraph, or story,
and break the text part of that row into its
constituent words.
19
Lexical Tools WordInd
  • Breaks words into tokens
  • Passes other fields to output, untouched
  • Lowercases
  • Removes white space and punctuation

20
Lexical Tools WordInd
  • Useful command line options for wordInd

21
Lexical Tools WordInd
> wordInd -t:7 -F:1:6
C0185495|ENG|P|L0223844|PF|S0298948|Denis-Browne splint strapping|3
C0185495|S0298948|denis
C0185495|S0298948|browne
C0185495|S0298948|splint
C0185495|S0298948|strapping
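The behavior in this example can be imitated with a short Python function. This is only a sketch: `word_index_rows` is a hypothetical name, the fields are assumed to be pipe-delimited and 1-based, and the real wordInd may tokenize differently in edge cases.

```python
import re

def word_index_rows(row, term_field, keep_fields):
    # Split the fielded row; field numbers are 1-based.
    fields = row.split("|")
    term = fields[term_field - 1]
    kept = [fields[i - 1] for i in keep_fields]
    # Lowercase, then treat any run of non-alphanumerics as a separator.
    words = [w for w in re.split(r"[^0-9a-z]+", term.lower()) if w]
    # Emit one output row per word, carrying the kept fields along untouched.
    return ["|".join(kept + [w]) for w in words]

row = "C0185495|ENG|P|L0223844|PF|S0298948|Denis-Browne splint strapping|3"
for line in word_index_rows(row, term_field=7, keep_fields=[1, 6]):
    print(line)
```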
22
Lexical Tools Lvg
23
Lexical Tools Flow Components
24
Lexical Tools Flow Components
25
Lexical Tools Flow Components
26
Lexical Tools Flow Components
27
Lexical Tools Flow Components
28
Lexical Tools Flows
leave -> inflect -> leave, leaves, leaving, left
29
Lexical Tools Flows
> lvg -f:i
leave
leave|leave|128|1|i|1
leave|leave|128|512|i|1
leave|leaves|128|8|i|1
leave|left|1024|64|i|1
leave|left|1024|32|i|1
leave|leave|1024|1|i|1
leave|leave|1024|262144|i|1
leave|leave|1024|1024|i|1
leave|leaves|1024|128|i|1
leave|leaving|1024|16|i|1

30
Lexical Tools A Serial Flow
Input term -> lowercase -> strip diacritics -> remove possessive -> remove stop words -> strip punctuation -> word order sort -> Output term
Flow components can be arranged so that the
output of one is the input to another.
31
Lexical Tools A Serial Flow
> lvg -f:l:q:g:t:p:w
The Gougerot-Sjögren's Syndrome
The Gougerot-Sjögren's Syndrome|gougerotsjogren syndrome|2047|16777215|l+q+g+t+p+w|1
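A serial flow is function composition: each component takes the previous component's output. The Python sketch below mimics the six components of this example. The function names and the three-word `STOP_WORDS` set are assumptions made for illustration and are not part of lvg.

```python
import re
import string
import unicodedata
from functools import reduce

STOP_WORDS = {"the", "of", "and"}  # hypothetical toy list

def lowercase(t):
    return t.lower()

def strip_diacritics(t):
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", t)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def remove_possessive(t):
    return re.sub(r"'s\b", "", t)

def remove_stop_words(t):
    return " ".join(w for w in t.split() if w not in STOP_WORDS)

def strip_punctuation(t):
    return t.translate(str.maketrans("", "", string.punctuation))

def word_order_sort(t):
    return " ".join(sorted(t.split()))

def serial_flow(*components):
    # The output of one component is the input to the next.
    return lambda term: reduce(lambda acc, f: f(acc), components, term)

flow = serial_flow(lowercase, strip_diacritics, remove_possessive,
                   remove_stop_words, strip_punctuation, word_order_sort)
print(flow("The Gougerot-Sjögren's Syndrome"))
```

The order matters: stripping punctuation before removing the possessive, for example, would leave a stray "s" on the word.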
32
Lexical Tools Parallel Flows
Input term -> noOperation -> Output term
Input term -> Uninflect -> synonyms -> Output terms
Multiple flows can be defined.
33
Lexical Tools Parallel Flows
> lvg -f:n -f:B:y
ear
ear|ear|2047|1048575|n|1        (First Flow)
ear|aural|1|1|B+y|2             (Second Flow)
ear|auricularis|1|1|B+y|2       (Second Flow)
ear|otic|1|1|B+y|2              (Second Flow)
ear|otor|1|1|B+y|2              (Second Flow)
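Parallel flows can be pictured as several independent functions applied to the same input, with each result tagged by its flow history and flow number. The sketch below assumes toy lookup tables (`UNINFLECT`, `SYNONYMS`) and hypothetical function names; it illustrates the idea only and does not reproduce lvg's output fields.

```python
# Hypothetical toy tables, for illustration only.
UNINFLECT = {"ears": "ear"}
SYNONYMS = {"ear": ["aural", "otic"]}

def no_operation(term):
    return [term]

def uninflect_then_synonyms(term):
    base = UNINFLECT.get(term, term)
    return list(SYNONYMS.get(base, []))

def run_parallel(term, flows):
    # Every flow sees the original input term; flows are numbered from 1.
    rows = []
    for flow_number, (history, flow) in enumerate(flows, start=1):
        for variant in flow(term):
            rows.append((term, variant, history, flow_number))
    return rows

flows = [("n", no_operation), ("B+y", uninflect_then_synonyms)]
for row in run_parallel("ear", flows):
    print("|".join(str(field) for field in row))
```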
34
Lexical Tools Fielded Output
> lvg -f:L
leaves
leaves|leaves|1152|136|L|1
Field 1, Input Term: leaves
Field 2, Output Term: leaves
Field 3, Categories: 1152
Field 4, Inflections: 136
Field 5, Flow history: L
Field 6, Flow Number: 1
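For downstream processing, a fielded output row can be split into named fields. The sketch below assumes the six pipe-delimited fields named on this slide, in the order input term, output term, categories, inflections, flow history, flow number. `LvgRecord` and `parse_lvg_line` are hypothetical names, not part of the Lexical Tools API.

```python
from typing import NamedTuple

class LvgRecord(NamedTuple):
    input_term: str
    output_term: str
    categories: int
    inflections: int
    flow_history: str
    flow_number: int

def parse_lvg_line(line):
    # Tolerate a trailing newline or trailing field separator.
    parts = line.rstrip("\n").rstrip("|").split("|")
    inp, out, cats, infls, history, flow_no = parts[:6]
    return LvgRecord(inp, out, int(cats), int(infls), history, int(flow_no))

record = parse_lvg_line("leaves|leaves|1152|136|L|1")
print(record)
```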
35
Lexical Tools Fielded Output

Output term: leaves
Categories: noun, verb
Inflections: plural (noun), pres3ps (verb)
36
Lexical Tools Categories
Categories 1152
Category bit vector
Category  verb  pron  prep  noun  modal  det  conj  compl  aux  adv  adj
Bit       1     0     0     1     0      0    0     0      0    0    0
1152 = 1024 (verb) + 128 (noun)
37
Lexical Tools Categories
Bit Vector positions
Category  verb  pron  prep  noun  modal  det  conj  compl  aux  adv  adj
Position  10    9     8     7     6      5    4     3      2    1    0
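A category value such as 1152 can be decoded by testing each bit. The sketch below assumes the categories occupy bit positions 0 to 10 in alphabetical order (adj = 0 through verb = 10), which agrees with noun = 128 and verb = 1024 in the examples in this presentation. The function names are hypothetical.

```python
# Assumed bit order: position 0 is adj, position 10 is verb.
CATEGORY_NAMES = ["adj", "adv", "aux", "compl", "conj", "det",
                  "modal", "noun", "prep", "pron", "verb"]

def decode_categories(value):
    # Return the names of all categories whose bit is set in value.
    return [name for bit, name in enumerate(CATEGORY_NAMES) if value & (1 << bit)]

def encode_categories(names):
    # Combine category names into a single integer bit vector.
    return sum(1 << CATEGORY_NAMES.index(name) for name in names)

print(decode_categories(1152))
print(encode_categories(["adj", "noun"]))
```

The same arithmetic explains filter arguments: a value of 129 selects adjectives and nouns, and 2047 selects all eleven categories.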
38
Lexical Tools Inflections
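Inflections 136
Inflection bit vector (20 bits shown): only the bits for present 3ps and plural are set to 1; all other bits, including present participle, comparative, superlative, and base, are 0.
136 = 128 (present 3ps) + 8 (plural)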
39
Lexical Tools Inflections
40
Lexical Tools Inflections
41
Lexical Tools Post Flow Options
42
Lexical Tools Post Flow Options
-SC  Show category names
-SI  Show inflection names
-SC -SI  Show the category and inflection names
> lvg -f:L -SC -SI
phosphoprotein
phosphoprotein|phosphoprotein|<noun>|<base+singular>|L|1
sclerosing
sclerosing|sclerosing|<adj+verb>|<base+presPart+positive>|L|1
43
Lexical Tools Post Flow Options
Mark the end of the set of variants returned
Mark the end of processing
> lvg -f:L -ccgi
behavior
behavior|behavior|128|513|L|1
__THE_END__
44
Lexical Tools Post Flow Options
Specify fields for outputs
Display only the 8th and 6th fields from the output
> lvg -f:u -t:7 -F:8:6
C0035440|ENG|S|L0035434|VW|S0003894|Rheumatic carditis, acute
acute Rheumatic carditis|S0003894
45
Lexical Tools Post Flow Options
Display only the input term field when using
fielded input
Display only the input term from the fielded
input to the output
> lvg -f:u -t:7 -ti
C0035440|S0003894|Rheumatic carditis, acute
Rheumatic carditis, acute|acute Rheumatic carditis|2047|16777215|u|1
46
Lexical Tools Post Flow Options
Restrict the number of variants returned
Limit the number of output terms to 2
> lvg -f:i -R:2
foo
foo|foo|128|1|i|1
foo|foos|128|8|i|1
Note: The unrestricted output would have
produced 12 rows otherwise.
Note: Dangerous! Do not try this at home!
47
Lexical Tools Global Behaviors
Output terms
Input term
Output terms
48
Lexical Tools No Operation
-f:n
Copies the input term to the output with no
transformation
> lvg -f:n -f:d -f:y -SC -SI
force
force|force|<all>|<all>|n|1
force|forcefully|<adv>|<base>|d|2
force|forceful|<adj>|<base>|d|2
force|forcible|<adj>|<base>|d|2
force|dynamic|<adj>|<base>|y|3
49
Lexical Tools Inflect
-f:i
Generate inflectional variants
> lvg -f:i -SC -SI
West Nile Virus
West Nile Virus|West Nile virus|<noun>|<base>|i|1
West Nile Virus|West Nile virus|<noun>|<singular>|i|1
West Nile Virus|West Nile viruses|<noun>|<plural>|i|1
50
Lexical Tools Inflect
Generate inflections, filter by category and/or
inflection
-f:ici~cats+infls
> lvg -f:ici~128+ALL -SC -SI
bioassay
bioassay|bioassay|<noun>|<base>|ici|1
bioassay|bioassay|<noun>|<singular>|ici|1
bioassay|bioassays|<noun>|<plural>|ici|1
51
Lexical Tools Inflect Simple
Generate inflections noting simplified inflection
tags
-f:is
> lvg -f:is
cut|cut|1024|1|is|1
cut|cutting|1024|16|is|1
cut|cut|1024|32|is|1
cut|cut|1024|64|is|1
cut|cut|1024|128|is|1
cut|cuts|1024|128|is|1
cut|cut|128|1|is|1
cut|cuts|128|8|is|1
52
Lexical Tools Uninflect by Term
-f:b
Uninflect by term
> lvg -f:b -SC -SI
left atria
left atria|left atrium|<noun>|<base>|b|1
53
Lexical Tools Derivations
-f:d
Generate derivations
> lvg -f:d -SC -SI
diagnostic
diagnostic|diagnosis|<noun>|<base>|d|1
diagnostic|diagnostics|<noun>|<base>|d|1
diagnostic|diagnose|<verb>|<base>|d|1
diagnostic|diagnostical|<adj>|<base>|d|1
54
Lexical Tools Derivations
-f:dc~cats
Generate derivations, filter by category
> lvg -f:dc~129 -SC -SI
Reduce
reduce|reducer|<noun>|<base>|d|1
reduce|reduction|<noun>|<base>|d|1
reduce|reducible|<adj>|<base>|d|1
55
Lexical Tools Synonyms
-f:y
Generate synonyms
> lvg -f:y -SC -SI
kidney
kidney|nephric|<adj>|<base>|y|1
kidney|nephritic|<adj>|<base>|y|1
kidney|renal|<adj>|<base>|y|1
56
Lexical Tools Normalize (norm)
-f:N
Remove genitives, then replace punctuation with
spaces, then remove stop words, then lowercase,
then uninflect each word, then word order sort.
> lvg -f:N
Syndrome, Dry Eyes
Syndrome, Dry Eyes|dry eye syndrome|2047|1|g+o+t+l+B+w|1
57
Lexical Resources
58
Building an Index Using The Lexical Tools
  • Can we build a tool that increases recall?
  • Can we build a tool that increases precision?

59
Building an Index Using The Lexical Tools
  • Can we build a tool that increases precision?
  • Case
  • Constrain by part of speech
  • Filter to the lexicon

60
Building an Index Using The Lexical Tools
  • Can we build a tool that increases recall?
  • Include
  • synonyms
  • derivations
  • acronyms and their expansions
  • spelling variants
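The two strategies can be sketched as query expansion followed by filtering. The Python below is a minimal illustration under assumptions: the synonym table and the small `LEXICON` of category values are toy data built from the kidney example in this presentation, with adj = 1 and noun = 128, and the function names are hypothetical.

```python
ADJ, NOUN = 1, 128  # category bit values as used in the lvg examples

# Toy data, for illustration only.
SYNONYMS = {"kidney": ["nephric", "nephritic", "renal"]}
LEXICON = {"kidney": NOUN, "nephric": ADJ, "nephritic": ADJ, "renal": ADJ}

def expand_for_recall(term, synonyms):
    # Recall: add variants of the term so more index entries can match.
    return {term} | set(synonyms.get(term, []))

def filter_for_precision(candidates, lexicon, allowed_categories):
    # Precision: keep only candidates found in the lexicon whose
    # category bit vector overlaps the allowed categories.
    return {c for c in candidates if lexicon.get(c, 0) & allowed_categories}

expanded = expand_for_recall("kidney", SYNONYMS)
print(sorted(expanded))
print(sorted(filter_for_precision(expanded, LEXICON, ADJ)))
```

Expansion widens the set of strings that can match, while the part-of-speech and lexicon filters narrow it again, so the two steps trade off against each other.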