Additional NLS Tools

1
Additional NLS Tools
  • NLS's Java NLP tools
  • MMTx
  • GSpell

2
NLS Java NLP Tools
  • Tokenizer
  • Lexical Lookup
  • NP Parser
  • Document-centric
  • Java programs and APIs

3
Java NLP Tools: Tokenizer
  • Tokenizes a document into sections (paragraphs), sentences, and tokens
  • Can handle free text, HTML, and MEDLINE abstracts

Figure: Document → Sections (Section 1) → Sentences (Sentence 1) → Tokens (Token 1)
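To make the nesting concrete, here is a minimal Python sketch of the Sections → Sentences → Tokens hierarchy for free text. It is not the NLS tokenizer or its Java API: the class names, the `segment` function, and the splitting rules (a blank line ends a section; `.`, `?` or `!` ends a sentence; words and single punctuation marks are tokens) are assumptions chosen for illustration. Offsets are document character positions with an inclusive end, which is how the tokenizer's own output appears to count them.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    text: str
    start: int  # document offset of the first character
    end: int    # document offset of the last character (inclusive)


@dataclass
class Sentence:
    text: str
    start: int
    tokens: List[Token] = field(default_factory=list)


@dataclass
class Section:
    text: str
    start: int
    sentences: List[Sentence] = field(default_factory=list)


# Hypothetical, deliberately naive segmentation rules (not the NLS rules).
SECTION_RE = re.compile(r"[^\n]+(?:\n[^\n]+)*")            # runs of non-blank lines
SENTENCE_RE = re.compile(r"[^.?!\s][^.?!]*(?:[.?!]+|$)")   # text up to ., ? or !
TOKEN_RE = re.compile(r"\w+|[^\w\s]")                      # a word or one punctuation mark


def segment(text: str) -> List[Section]:
    """Build the Sections -> Sentences -> Tokens hierarchy with document offsets."""
    sections = []
    for sec_m in SECTION_RE.finditer(text):
        section = Section(sec_m.group(), sec_m.start())
        for sen_m in SENTENCE_RE.finditer(sec_m.group()):
            sen_start = sec_m.start() + sen_m.start()
            sentence = Sentence(sen_m.group(), sen_start)
            for tok_m in TOKEN_RE.finditer(sen_m.group()):
                tok_start = sen_start + tok_m.start()
                tok_end = tok_start + len(tok_m.group()) - 1
                sentence.tokens.append(Token(tok_m.group(), tok_start, tok_end))
            section.sentences.append(sentence)
        sections.append(section)
    return sections
```

Note that `follow-up` comes out as three tokens (`follow`, `-`, `up`); grouping them back into one term is left to a later lookup step.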
4
Java NLP Tools: Tokenizer
  • Usage: tokenize.bat|sh <options>, where the options are:
  • --fileName=fileName
  • --outputFileName=fileName
  • --inputType=freeText|HTML|medlineCitations
  • --sections
  • --sentences
  • --tokens
  • --pipedOutput
  • --indicate_citation_end

5
Java NLP Tools: Tokenizer
tokenize.bat --inputFile=5.txt --inputType=freeText --sentences --tokens --pipedOutput
  • Sentence|1|97|182|But those follow-up tests have been inconclusive, state and federal officials said.
  • Token|16|97|99|0|0|But
  • Token|17|101|105|1|0|those
  • Token|18|108|113|2|0|follow
  • Token|19|114|114|2|0|-
  • Token|20|115|116|3|0|up
  • Token|21|118|122|4|0|tests
  • Token|22|124|127|5|0|have
  • Token|23|129|132|6|0|been
  • Token|24|134|145|7|0|inconclusive
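A program consuming `--pipedOutput` only has to split each record on `|`. The Python sketch below reads the Token records; the field layout (record type, token id, start offset, inclusive end offset, two numeric fields whose meaning the slides do not give, then the token text) is inferred from the sample output and should be treated as an assumption, as should the names `PipedToken`, `parse_token_line`, and `tokens_from_output`.

```python
from typing import List, NamedTuple


class PipedToken(NamedTuple):
    token_id: int
    start: int    # offset of the first character
    end: int      # offset of the last character (inclusive, judging by the sample)
    extra: tuple  # two numeric fields; their meaning is not stated on the slide
    text: str


def parse_token_line(line: str) -> PipedToken:
    """Parse one 'Token|...' record (assumed layout: 7 pipe-separated fields)."""
    # maxsplit=6 keeps any '|' inside the token text intact
    tag, token_id, start, end, f1, f2, text = line.rstrip("\n").split("|", 6)
    if tag != "Token":
        raise ValueError(f"not a Token record: {line!r}")
    return PipedToken(int(token_id), int(start), int(end), (int(f1), int(f2)), text)


def tokens_from_output(lines: List[str]) -> List[PipedToken]:
    """Keep only Token records, skipping Sentence (and any other) records."""
    return [parse_token_line(line) for line in lines if line.startswith("Token|")]
```

A quick consistency check is that `end - start + 1` equals the token's length, which holds for every Token record in the sample.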

6
Java NLP Tools: Lexical Lookup
  • Chunks tokens into terms found in the SPECIALIST Lexicon or matched by regular expressions

Figure: Document → Sections (Section 1) → Sentences (Sentence 1) → Lexical Elements (Lexical Element 1) → Tokens
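Lexical lookup can group adjacent tokens into one term when the sequence is a lexicon entry, which is how `follow`, `-`, `up` can become the single element `follow-up`. Below is a minimal Python sketch of greedy longest-match lookup. The six-entry `LEXICON` is a made-up stand-in for the SPECIALIST Lexicon, the number pattern stands in for the regular-expression terms, and longest-match is an assumed strategy for illustration, not a description of the NLS implementation.

```python
import re
from typing import Dict, List, Tuple

Token = Tuple[str, int, int]  # (text, start offset, inclusive end offset)

# Hypothetical stand-in for the SPECIALIST Lexicon: lower-cased token
# sequences mapped to a part of speech.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("but",): "conj",
    ("those",): "det",
    ("follow",): "verb",
    ("follow", "-", "up"): "adj",
    ("up",): "adv",
    ("tests",): "noun",
}
MAX_TERM_TOKENS = 3                       # longest lexicon entry, in tokens
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")  # stand-in for a regular-expression term


def lexical_lookup(text: str, tokens: List[Token]) -> List[Tuple[str, str, int, int]]:
    """Chunk tokens into (term, category, start, end) by greedy longest match."""
    elements = []
    i = 0
    while i < len(tokens):
        # Try the longest token run first, shrinking until there is a lexicon hit.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            key = tuple(tok[0].lower() for tok in tokens[i:i + n])
            if key in LEXICON:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                elements.append((text[start:end + 1], LEXICON[key], start, end))
                i += n
                break
        else:
            # No lexicon entry starts here: fall back to the regex, else "unknown".
            tok_text, start, end = tokens[i]
            kind = "number" if NUMBER_RE.fullmatch(tok_text) else "unknown"
            elements.append((tok_text, kind, start, end))
            i += 1
    return elements
```

Because the longest run is tried first, `follow-up` wins over the single-token entry `follow`; the surface text is sliced from the original string so the hyphen is preserved.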
7
Java NLP Tools: Lexical Lookup
  • Usage: LexicalLookup.bat|sh <options>, where the options are:
  • --fileName=fileName
  • --outputFileName=fileName
  • --inputType=freeText|HTML|medlineCitations
  • --sections
  • --sentences
  • --lexicalElements
  • --lexicalEntries
  • --tokens
  • --pipedOutput

8
Java NLP Tools: Lexical Lookup
LexicalLookup.bat --inputFile=5.txt --inputType=freeText --lexicalElements --lexicalEntries --pipedOutput
  • Lexical Element|17|LEXICON|prep|But|97|99
  • LexicalEntry|but|conj|base|E0014465
  • LexicalEntry|but|prep|base|E0014464
  • Lexical Element|18|LEXICON|det|those|101|105
  • LexicalEntry|those|det|plural|E0060728
  • LexicalEntry|those|pron|base|E0060729
  • Lexical Element|20|LEXICON|adj|follow-up|108|116
  • LexicalEntry|follow-up|adj|base|E0028422
  • Lexical Element|23|LEXICON|noun|tests|118|122
  • LexicalEntry|tests|verb|pres3s|E0060349
  • LexicalEntry|tests|noun|plural|E0060348
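Each Lexical Element record is followed by the LexicalEntry records that matched it, so a consumer can rebuild the grouping by attaching entries to the most recent element. In this Python sketch the `group_entries` function and the field names (`source`, `category`, `form`, `inflection`, `eui`) are assumptions inferred from the sample; the E-numbers appear to be SPECIALIST Lexicon entry identifiers (EUIs).

```python
from typing import Dict, List


def group_entries(lines: List[str]) -> List[Dict]:
    """Attach each LexicalEntry record to the Lexical Element record before it."""
    elements: List[Dict] = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if fields[0] == "Lexical Element":
            # assumed layout: tag|id|source|category|text|start|end
            _, elem_id, source, category, text, start, end = fields
            elements.append({"id": int(elem_id), "source": source, "category": category,
                             "text": text, "start": int(start), "end": int(end),
                             "entries": []})
        elif fields[0] == "LexicalEntry" and elements:
            # assumed layout: tag|form|category|inflection|eui
            _, form, category, inflection, eui = fields
            elements[-1]["entries"].append({"form": form, "category": category,
                                            "inflection": inflection, "eui": eui})
    return elements
```

The ambiguous word `tests` keeps both its verb and noun entries; filtering the entries by the element's own category (noun) selects E0060348.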

9
Java NLP Tools: NpParser
  • Chunks sentences into simple phrases

10
Java NLP Tools: NpParser
  • Usage: npParser.bat|sh <options>, where the options are:
  • --fileName=fileName
  • --outputFileName=fileName
  • --inputType=freeText|HTML|medlineCitations
  • --sections
  • --sentences
  • --phrases --nps --mincoMan
  • --lexicalElements
  • --lexicalEntries
  • --tokens
  • --pipedOutput

11
Java NLP Tools: NpParser
npParser.bat --inputFile=5.txt --inputType=freeText --phrases --pipedOutput
  • Phrase|0|0|10|The company|company
  • Phrase|1|12|14|has
  • Phrase|2|16|24|forwarded
  • Phrase|3|26|39|some materials|materials
  • Phrase|4|41|62|to a state laboratory|state laboratory
  • Phrase|5|64|74|in Richmond|Richmond
  • Phrase|6|76|86|for further|further
  • Phrase|7|88|94|testing
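The toy Python chunker below reproduces the eight phrases and their heads from a hand-tagged version of the sentence. The tags and the three rules (a preposition opens a new phrase, each verb or auxiliary is a phrase on its own, and the head is the trailing run of nouns and adjectives) are assumptions chosen to fit this one example; they are not NpParser's actual algorithm, and the `chunk` function is hypothetical.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, part of speech), supplied by hand here

BREAK_BEFORE = {"prep"}       # a preposition opens a new phrase
STANDALONE = {"aux", "verb"}  # each verb or auxiliary becomes its own phrase
HEAD_TAGS = {"noun", "adj"}   # the trailing run of these tags is taken as the head


def chunk(tagged: List[Tagged]) -> List[Tuple[str, str]]:
    """Split a tagged sentence into (phrase text, head text) pairs."""
    phrases: List[List[Tagged]] = []
    current: List[Tagged] = []
    for word, tag in tagged:
        if tag in STANDALONE:
            if current:
                phrases.append(current)
            phrases.append([(word, tag)])
            current = []
        elif tag in BREAK_BEFORE and current:
            phrases.append(current)
            current = [(word, tag)]
        else:
            current.append((word, tag))
    if current:
        phrases.append(current)

    result = []
    for phrase in phrases:
        head: List[str] = []
        # Walk backwards collecting nouns/adjectives; stop at anything else.
        for word, tag in reversed(phrase):
            if tag not in HEAD_TAGS:
                break
            head.insert(0, word)
        result.append((" ".join(w for w, _ in phrase), " ".join(head)))
    return result
```

Phrases built around a verb get an empty head, matching the records above that carry no head field.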

12
MMTx: MetaMap Technology Transfer
  • Maps text phrases to Metathesaurus concepts
  • Java implementation of MetaMap

Figure: MMTx pipeline: Document → Tokenization → POS Tagger Client → Lexical Lookup → Parser → for each phrase (Phrase 1): Variant Generation → Candidate Retrieval → Evaluation → Final Mapping → Post-processing/Presentation
13
MMTx
  • Usage: MMTx <options> --fileName=infile --outputFileName=outfile
  • --strict_model --moderate_model --relaxed_model
  • --KSYear=year --mm_data_version=customName
  • --threshold=lowestScore
  • --truncate_candidates_mappings
  • --term_processing --allow_overmatches --allow_concept_gaps
  • --composite_phrases
  • --prefer_multiple_concepts
  • --fielded_output

14
MMTx
MMTx --inputFile=5.txt --inputType=freeText
  • Processing 00000000.tx.3: One problem is caused by the VecTest itself, which uses a dipstick to measure the presence of a protein associated with the parasite that causes malaria.
  • Phrase: "One problem"
  • Meta Candidates (2):
  • 861 Problem, NOS [Finding,Pathologic Function]
  • 694 One [Quantitative Concept]
  • Meta Mapping (888):
  • 694 One [Quantitative Concept]
  • 861 Problem, NOS [Finding,Pathologic Function]
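Candidate and mapping lines can be post-processed with a regular expression. This Python sketch assumes the MetaMap-style layout `score concept-name [semantic type,semantic type]` and, in the spirit of the `--threshold=lowestScore` option, drops candidates scoring below a cutoff; the `Candidate` type and the `parse_candidates` function are hypothetical helpers, not part of MMTx.

```python
import re
from typing import List, NamedTuple


class Candidate(NamedTuple):
    score: int
    name: str
    semantic_types: List[str]


# Assumed line layout: "<score> <concept name> [<type>,<type>,...]"
CANDIDATE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+\[([^\]]*)\]\s*$")


def parse_candidates(lines: List[str], threshold: int = 0) -> List[Candidate]:
    """Parse candidate lines, drop those scoring below threshold, best first."""
    found = []
    for line in lines:
        m = CANDIDATE_RE.match(line)
        # Header lines such as "Meta Candidates (2):" do not match and are skipped.
        if m and int(m.group(1)) >= threshold:
            found.append(Candidate(int(m.group(1)), m.group(2), m.group(3).split(",")))
    return sorted(found, key=lambda c: c.score, reverse=True)
```

The lazy name group stops at the last space before the bracket, so concept names that contain commas, such as `Problem, NOS`, survive intact.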

15
GSpell
16
GSpell
  • Spelling suggestion tool
  • Pure Java application with Java APIs
  • Support for multi-word dictionary entries

17
GSpell Usage
  • Usage: GSpellFind.sh|bat <options>, where the options are:
  • --dictionary=NameOfDictionary
  • --inputFile=Source --outputFile=target
  • --truncate=N --considerNCandidates=N
  • --maxEditDistance=N
  • --fieldedText --termField=X --correctField=Y
  • --reportTime --version --help

18
GSpell Example
  • anonomous|anonymous|1.0|0.8734230160180236|NGrams
  • anonomous|allonomous|2.0|0.5819672267388108|NGrams
  • anonomous|autonomous|2.0|0.5819672267388108|NGrams
  • anonomous|anadromous|3.0|0.2958160192082048|NGrams
  • anonomous|analogous|3.0|0.2958160192082048|NGrams
  • anonomous|anomalous|3.0|0.2958160192082048|NGrams
  • anonomous|anonymously|3.0|0.295816019208248|NGrams
  • anonomous|anonymes|3.0|0.2958160192082048|Metaphone
  • anonomous|anonyms|3.0|0.2958160192082048|Metaphone
  • anonomous|acoprous|4.0|0.11470810702102521|NGrams
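The third field of each suggestion appears to match the Levenshtein edit distance between the misspelling and the suggestion: one substitution turns anonomous into anonymous, while acoprous needs four edits. A standard two-row dynamic-programming implementation in Python is sketched below; `edit_distance` is an illustrative name, and nothing here implies that GSpell computes its distance, or its similarity score in the fourth field, this way.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))         # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]
```

Keeping only two rows makes memory proportional to the shorter dictionary term rather than to the product of both lengths.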

19
GSpell Indexing
  • Usage: GSpellIndex.sh|bat <options>, where the options are:
  • --dictionary=NameOfDictionary
  • --inputFile=SourceFile
  • --reportTime --version --help
  • Input file format: one word per line
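The "NGrams" and "Metaphone" labels in the example output suggest that candidates come from a character n-gram index and from a phonetic key. The Python sketch below shows only the n-gram idea: it indexes a one-entry-per-line dictionary by character bigrams, retrieves every entry sharing a bigram with the query, and ranks them by the Dice coefficient. The bigram size, the Dice ranking, and the names `build_index` and `suggest` are assumptions for illustration, not GSpell's actual index format. Multi-word entries work unchanged, because the space is simply part of a bigram.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def bigrams(term: str) -> Set[str]:
    """Distinct character bigrams of a lower-cased term."""
    t = term.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def build_index(dictionary_lines: List[str]) -> Dict[str, Set[str]]:
    """Map each bigram to the dictionary entries (one per line) containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for line in dictionary_lines:
        term = line.strip()
        if term:  # skip blank lines
            for gram in bigrams(term):
                index[gram].add(term)
    return index


def suggest(word: str, index: Dict[str, Set[str]], n: int = 5) -> List[Tuple[str, float]]:
    """Rank entries sharing at least one bigram with word by Dice coefficient."""
    query = bigrams(word)
    candidates = set().union(*(index.get(g, set()) for g in query))
    scored = []
    for term in candidates:
        grams = bigrams(term)
        dice = 2 * len(query & grams) / (len(query) + len(grams))
        scored.append((term, dice))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score, then alphabetical
    return scored[:n]
```

A cheap index lookup narrows the dictionary to a short candidate list, which a more expensive measure such as edit distance could then re-rank.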

20
Downloadable Resources
  • umlslex.nlm.nih.gov: Lvg, Java NLP Tools, GSpell
  • mmtx.nlm.nih.gov

21
Lexical Tools for UMLS Developers
Allen C. Browne, Guy Divita, Chris Lu
Lister Hill National Center for Biomedical Communications, National Library of Medicine
Lexical Systems: umlsLex.nlm.nih.gov
Email: umlslex@nlm.nih.gov
Knowledge Source Server: http://umlsks.nlm.nih.gov
UMLS Information: http://umlsInfo.nlm.nih.gov