WSTA Lecture 14: Part-of-speech Tagging


1
WSTA Lecture 14: Part-of-speech Tagging

Tags
introduction
tagged corpora, tagsets
Tagging
motivation
Simple unigram tagger
Markov model tagging
Rule based tagging
Evaluation

Slide credits: Steven Bird
2
NLP versus IR

So far we have covered predominantly IR
processing, stemming, indexing, querying, etc.
mostly bag-of-words and vector space models
word order unimportant
word inflections unimportant
What do we mean by natural language processing?
and how does this differ from / overlap with IR?

3
Tags 1: Ambiguity

time flies like an arrow
fruit flies like a banana
ambiguous headlines
http://www.snopes.com/humor/nonsense/head97.htm
British Left Waffles on Falkland Islands
Juvenile Court to Try Shooting Defendant

4
Tags 2: Representations to resolve ambiguity
5
Exercise: tag some headlines

British Left Waffles on Falkland Islands
Juvenile Court to Try Shooting Defendant

6
Tags 3: Tagged Corpora

The/DT limits/NNS to/TO legal/JJ absurdity/NN
stretched/VBD another/DT notch/NN this/DT
week/NN when/WRB the/DT Supreme/NNP Court/NNP
refused/VBD to/TO hear/VB an/DT appeal/NN from/IN
a/DT case/NN that/WDT says/VBZ corporate/JJ
defendants/NNS must/MD pay/VB damages/NNS even/RB
after/IN proving/VBG that/IN they/PRP could/MD
not/RB possibly/RB have/VB caused/VBN the/DT
harm/NN ./.
Source: Penn Treebank Corpus (nltk/data/treebank/wsj_0130)
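The word/TAG notation above is easy to process mechanically. Below is a minimal sketch in plain Python (no NLTK needed; for a single token NLTK offers nltk.tag.str2tuple) that splits each token at its last slash, so a word that itself contains a slash keeps it. The function name parse_tagged and the policy of skipping untagged tokens are hypothetical choices for illustration.

```python
def parse_tagged(text):
    """Split whitespace-separated word/TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits at the LAST '/', so 'and/or/CC' -> ('and/or', 'CC')
        word, sep, tag = token.rpartition("/")
        if not sep:
            # Hypothetical policy: ignore tokens that carry no tag at all
            continue
        pairs.append((word, tag))
    return pairs


sample = ("The/DT limits/NNS to/TO legal/JJ absurdity/NN "
          "stretched/VBD another/DT notch/NN ./.")
print(parse_tagged(sample)[:3])
# [('The', 'DT'), ('limits', 'NNS'), ('to', 'TO')]
```

Splitting at the last slash matters for the final token ./., which becomes ('.', '.').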

7
Another kind of tagging: Sense Tagging

The Pantheon's interior/a , still in its
original/a form/a ,
interior (a) inside a space (b) inside a
country and at a distance from the coast or
border (c) domestic (d) private.
original (a) relating to the beginning of
something (b) novel (c) that from which a copy
is made (d) mentally ill or eccentric.
form (a) definite shape or appearance (b) body
(c) mould (d) particular structural character
exhibited by something (e) a style as in music,
art or literature (f) homogeneous polynomial in
two or more variables ...

8
Significance of Parts of Speech

a word's POS tells us a lot about the word and
its neighbors
limits the range of meanings (deal), pronunciations
(object as noun vs object as verb), or both (wind)
helps in stemming
limits the range of following words for speech recognition (ASR)
helps select nouns from a document for IR
More advanced uses (these won't make sense yet)
basis for chunk parsing
parsers can build trees directly on the POS tags
instead of maintaining a lexicon
first step for many different NLP tasks

9
What does Tagging do?

Collapses Distinctions
Lexical identity may be discarded
e.g. all personal pronouns tagged with PRP
Introduces Distinctions
Ambiguities may be removed
e.g. deal tagged with NN or VB; deal tagged with
DEAL1 or DEAL2
Helps classification and prediction
There are many tagsets. This is due to
the different ways to define a tag
the need to balance classification and prediction
harder/easier classification task vs more/less
information about context

10
Tagged Corpora

Brown Corpus
The first digital corpus (1961), Francis and
Kucera, Brown U
Contents: 500 texts, each 2000 words long,
from American books, newspapers, magazines,
representing 15 genres
e.g. science fiction, romance fiction, press reportage,
scientific writing, popular lore
See nltk/data/brown/
See reading for definition of Brown tags
Penn Treebank
First syntactically annotated corpus
Contents: 1 million words from the WSJ, with POS tags and
syntax trees
See nltk/data/treebank/ (5% sample)

11
Tagged Corpora in other languages

Parsed treebanks in many other languages
Basque, Bulgarian, Chinese, Czech, Finnish,
French
German, Greek, Hebrew, Hungarian, Irish, Italian
Japanese, Korean, Persian, Romanian, Spanish
Swedish and many more!
All with part-of-speech annotation
language specific tag sets
recent work on mapping to common tag set
https://code.google.com/p/universal-pos-tags/
http://universaldependencies.github.io/docs/
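Mapping a language-specific tagset onto a common one is essentially a lookup table that collapses distinctions. The dictionary below is a partial, illustrative mapping from some Penn Treebank tags to coarse categories modelled on the universal tagset; it is an assumption for illustration, not the official mapping file, and any unlisted tag falls back to X ("other").

```python
# Partial, illustrative mapping (assumption): Penn Treebank -> coarse universal-style tags.
# Consult the official mapping files for the complete, authoritative table.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "DT": "DET", "IN": "ADP", "PRP": "PRON", "PRP$": "PRON",
    "CD": "NUM", "CC": "CONJ", "TO": "PRT", ".": ".",
}


def to_universal(tagged):
    """Collapse fine-grained tags to coarse ones; unknown tags become 'X'."""
    return [(word, PTB_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged]


print(to_universal([("they", "PRP"), ("could", "MD"), ("harm", "NN")]))
# [('they', 'PRON'), ('could', 'VERB'), ('harm', 'NOUN')]
```

The coarse tags lose information (e.g. tense on VBD vs VBZ), which is the classification vs prediction trade-off discussed for tagset design.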

12
Application of tagged corpora: genre
classification
13
Important Treebank Tags

NN   noun                      JJ   adjective
NNP  proper noun               CC   coordinating conjunction (and/or/..)
DT   determiner (the/a/..)     CD   cardinal number
IN   preposition (in/of/..)    PRP  personal pronoun (I/you/..)
VB   verb                      RB   adverb (gently, now)
-R   comparative (better)
-S   superlative (bravest) or plural
-$   possessive (my)

14
Verb Tags

VBP base present take
VB infinitive take
VBD past took
VBG present participle taking
VBN past participle taken
VBZ present 3sg takes
MD modal can, would

15
Simple Tagging in NLTK

Reading Tagged Corpora
>>> from nltk.corpus import treebank
>>> treebank.fileids()
>>> treebank.tagged_sents('wsj_0001.mrg')[0]
[(u'Pierre', u'NNP'), (u'Vinken', u'NNP'), (u',', u','), (u'61', u'CD'),
 (u'years', u'NNS'), (u'old', u'JJ'), (u',', u','), (u'will', u'MD'),
 (u'join', u'VB'), (u'the', u'DT'), ...]
see also Brown corpus, Conll2000, Alpino and more
Tagging a string
>>> import nltk
>>> nltk.tag.pos_tag('Fruit flies like a banana'.split())
[('Fruit', 'NN'), ('flies', 'NNS'), ('like', 'IN'), ('a', 'DT'), ('banana', 'NN')]
(N.b. this used a maximum entropy tagger at the time; NLTK 3.1 and later use an averaged perceptron tagger by default)

16
Tagging Algorithms

rule based taggers
original methods, based on layers of rules about
how to tag words based on their context (e.g.,
Brill tagger)
unigram tagger
assign the tag which is the most probable for the
word in question, based on frequency in a
training corpus
bigram tagger, n-gram tagger
inspect one or more tags in the context (usually the
immediate left context)
Maximum entropy and HMM taggers (next lecture)

17
Unigram Tagging

Unigram: table of tag frequencies for each word
e.g. in the tagged WSJ sample (from the Penn Treebank)
deal: NN (11), VB (1), VBP (1)
Training
load a corpus
count the occurrences of each (word, tag) in the
corpus
Tagging
lookup the most common tag for each word to tag
Gets about 90% accuracy!
See the code in nltk.tag.UnigramTagger
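The training and tagging steps above fit in a few lines of plain Python. This is a from-scratch sketch, not NLTK's UnigramTagger; the function names and the tiny training set are hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Count (word, tag) occurrences and keep the most frequent tag per word."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_unigram(model, words, default=None):
    """Look up each word's most common tag; unseen words get `default`."""
    return [(w, model.get(w, default)) for w in words]


# Hypothetical toy training data
train = [
    [("the", "DT"), ("deal", "NN"), ("fell", "VBD"), ("through", "RP")],
    [("a", "DT"), ("deal", "NN"), ("was", "VBD"), ("signed", "VBN")],
    [("they", "PRP"), ("deal", "VBP"), ("cards", "NNS")],
]
model = train_unigram(train)
print(tag_unigram(model, ["they", "deal", "a", "deal"]))
# [('they', 'PRP'), ('deal', 'NN'), ('a', 'DT'), ('deal', 'NN')]
```

Note that deal in "they deal" comes out as NN: the table only ever returns the word's majority tag, whatever the context.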

18
The problem with unigram taggers

what evidence do they consider when assigning a
tag?
when does this method fail?

19
Fixing the problem using a bigram tagger

construct sentences involving a word which can
have two different parts of speech
e.g. wind: noun, verb
The wind blew forcefully
I wind up the clock
gather statistics for current tag, based on
(i) current word (ii) previous tag
result: a 2-D array of frequency distributions
what does this look like?
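One way to picture the 2-D array is a table keyed by (previous tag, current word), where each cell holds a frequency distribution over the current tag. Below is a hypothetical from-scratch sketch trained on the two wind sentences; the Penn Treebank tags assigned to them by hand are an assumption.

```python
from collections import Counter, defaultdict


def train_bigram(tagged_sents, start="<s>"):
    """Frequency distribution of the current tag for each (previous tag, word) pair."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = start  # pseudo-tag before the first word
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    return counts


def tag_bigram(counts, words, start="<s>"):
    """Tag left to right, feeding each predicted tag into the next context.

    An unseen (previous tag, word) pair yields None, which then becomes the context.
    """
    tagged, prev = [], start
    for word in words:
        dist = counts.get((prev, word))  # .get does not add keys to the defaultdict
        tag = dist.most_common(1)[0][0] if dist else None
        tagged.append((word, tag))
        prev = tag
    return tagged


train = [
    [("The", "DT"), ("wind", "NN"), ("blew", "VBD"), ("forcefully", "RB")],
    [("I", "PRP"), ("wind", "VBP"), ("up", "RP"), ("the", "DT"), ("clock", "NN")],
]
counts = train_bigram(train)
print(tag_bigram(counts, ["The", "wind", "blew"]))
# [('The', 'DT'), ('wind', 'NN'), ('blew', 'VBD')]
print(tag_bigram(counts, ["I", "wind", "up"]))
# [('I', 'PRP'), ('wind', 'VBP'), ('up', 'RP')]
```

The previous tag now separates the two uses of wind, but any unseen context leaves the word untagged, which motivates the next slide.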

20
Generalizing the context
21
Bigram and n-gram taggers

n-gram tagger: considers the n-1 previous tags
how big does the model get?
how much data do we need to train it?
Sparse-data problem
As n gets large, the chance of having seen all
possible patterns of tags during training
diminishes (large: n > 3)
Approaches
Combine taggers (backoff, weighted average)
statistical estimation of the probability of
unseen events
See nltk.tag.sequential.NgramTagger
and various others in nltk.tag package
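A minimal sketch of backoff, assuming the same (previous tag, word) bigram context as before: try the bigram table, fall back to the unigram table, then to a default tag (NN here, an arbitrary choice). NLTK expresses the same idea through the backoff argument of its sequential taggers (e.g. DefaultTagger, UnigramTagger, BigramTagger); the code below is a standalone illustration with hypothetical names, not that implementation.

```python
from collections import Counter, defaultdict


def most_common_tag(dist):
    """Return the most frequent tag in a Counter, or None if it is missing/empty."""
    return dist.most_common(1)[0][0] if dist else None


def train(tagged_sents, start="<s>"):
    """Build unigram (word) and bigram (previous tag, word) frequency tables."""
    uni, bi = defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sents:
        prev = start
        for word, tag in sent:
            uni[word][tag] += 1
            bi[(prev, word)][tag] += 1
            prev = tag
    return uni, bi


def tag_with_backoff(uni, bi, words, default="NN", start="<s>"):
    """Bigram context first, then the word's unigram tag, then the default tag."""
    tagged, prev = [], start
    for word in words:
        tag = (most_common_tag(bi.get((prev, word)))
               or most_common_tag(uni.get(word))
               or default)
        tagged.append((word, tag))
        prev = tag
    return tagged


uni, bi = train([
    [("The", "DT"), ("wind", "NN"), ("blew", "VBD"), ("forcefully", "RB")],
    [("I", "PRP"), ("wind", "VBP"), ("up", "RP"), ("the", "DT"), ("clock", "NN")],
])
print(tag_with_backoff(uni, bi, ["I", "wind", "the", "kite"]))
# [('I', 'PRP'), ('wind', 'VBP'), ('the', 'DT'), ('kite', 'NN')]
```

Here "the" after VBP was never seen as a bigram, so the unigram table supplies DT, and the unknown word "kite" receives the default NN.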

22
Markov Model Taggers

Recall n-gram language model
similar problem of modelling next word given
previous words, similar issues with sparsity and
estimation
here we focus on generating tag sequences rather
than words
both are instances of a Markov model
tag sequence modelled as a Markov chain
each tag is linked to a word in the sequence
Can we just predict each tag in sequence?
need to know the preceding tag(s)
but these are unknown
Next lecture, we'll explore this further using
Hidden Markov Models
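To make "tag sequence modelled as a Markov chain" concrete: under a first-order chain, P(t_1 ... t_n) is the product of P(t_i | t_{i-1}), with a start symbol before t_1. The sketch estimates these transition probabilities by relative frequency (maximum likelihood, no smoothing, no end symbol) from hypothetical toy tag sequences. Unseen transitions get probability 0, which is the sparsity problem noted above. It ignores the words entirely; linking tags to words is what the hidden Markov model adds.

```python
from collections import Counter, defaultdict


def train_transitions(tag_seqs, start="<s>"):
    """Maximum-likelihood estimates of P(tag_i | tag_{i-1}) from tag sequences."""
    counts = defaultdict(Counter)
    for tags in tag_seqs:
        prev = start
        for tag in tags:
            counts[prev][tag] += 1
            prev = tag
    probs = {}
    for prev, nxt in counts.items():
        total = sum(nxt.values())
        probs[prev] = {tag: c / total for tag, c in nxt.items()}
    return probs


def sequence_prob(trans, tags, start="<s>"):
    """P(t_1..t_n) as a product of first-order transition probabilities."""
    p, prev = 1.0, start
    for tag in tags:
        p *= trans.get(prev, {}).get(tag, 0.0)  # unseen transition -> 0
        prev = tag
    return p


# Hypothetical toy tag sequences
trans = train_transitions([["DT", "NN", "VBD"], ["DT", "NN", "VBZ"], ["PRP", "VBP", "DT", "NN"]])
print(sequence_prob(trans, ["DT", "NN", "VBD"]))  # about 0.333 (2/3 * 1 * 1/2)
print(sequence_prob(trans, ["NN", "DT"]))         # 0.0: <s> -> NN never seen
```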

23
The Brill rule-Based Tagger

The Linguistic Complaint
where is the linguistic knowledge of a tagger?
just a massive table of numbers
aren't there any linguistic insights that could
emerge from the data?
Transformation-Based Tagging / Brill Tagging
Tag each word with its most likely tag
Repeatedly correct tags based on context
Example rule: NN -> VB if PREVTAG is TO (retag NN as VB when the previous tag is TO)
to/TO race/NN -> to/TO race/VB
Other contexts
PREV1OR2TAG, PREV1OR2WD, WDNEXTTAG, ...
See nltk.tag.brill.BrillTagger
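Applying (not learning) transformation rules can be sketched as below, restricted to the PREVTAG template. Assumptions: a rule is a hypothetical (from_tag, to_tag, prev_tag) triple, rules are applied in order, and each rule's conditions are checked against the tags as they stood before that rule ran. A real Brill tagger also learns the ordered rule list from training data, which this sketch does not attempt.

```python
def apply_prevtag_rule(tagged, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the previous token is tagged prev_tag.

    Conditions are checked against the tags before this rule was applied.
    """
    original = [tag for _, tag in tagged]
    result = list(tagged)
    for i in range(1, len(tagged)):
        if original[i] == from_tag and original[i - 1] == prev_tag:
            result[i] = (tagged[i][0], to_tag)
    return result


def apply_rules(tagged, rules):
    """Apply each (from_tag, to_tag, prev_tag) rule in turn to the whole sentence."""
    for rule in rules:
        tagged = apply_prevtag_rule(tagged, *rule)
    return tagged


# Initial tags as a unigram tagger might assign them
initial = [("to", "TO"), ("race", "NN"), ("tomorrow", "NN")]
print(apply_rules(initial, [("NN", "VB", "TO")]))
# [('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')]
```

Unlike the probability tables of the n-gram taggers, the learned rules can be read directly as linguistic statements, which is the point of the linguistic complaint above.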

24
Evaluating Tagger Performance

Need an objective measure of performance
Commonly use per-token accuracy
measured against heldout gold standard data
fraction of words tagged correctly
Simple methods get about 90% accuracy
1- and 2-gram taggers
Brill tagger
HMMs get about 95% and CRFs about 97% accuracy
see nltk.tag.hmm, tnt, crf, stanford, senna, ...
Why can't we get 100%?
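Per-token accuracy as defined above can be computed with a short from-scratch sketch. It assumes gold and predicted data are lists of sentences of (word, tag) pairs over identical tokens, and raises an error otherwise; the example data are hypothetical.

```python
def accuracy(gold_sents, pred_sents):
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(gold_sents) != len(pred_sents):
        raise ValueError("gold and predicted must contain the same sentences")
    correct = total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        if [w for w, _ in gold] != [w for w, _ in pred]:
            raise ValueError("token mismatch between gold and predicted sentence")
        for (_, gold_tag), (_, pred_tag) in zip(gold, pred):
            if gold_tag == pred_tag:
                correct += 1
            total += 1
    return correct / total if total else 0.0


gold = [[("they", "PRP"), ("deal", "VBP"), ("cards", "NNS")]]
pred = [[("they", "PRP"), ("deal", "NN"), ("cards", "NNS")]]
print(accuracy(gold, pred))  # 2 of 3 tokens correct
```

Because every token counts equally, frequent unambiguous words (the, of, punctuation) lift the score, which is one reason simple baselines already reach around 90%.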

25
Tagging broader lessons

Tagging has several properties that are typical
of NLP
classification (words have properties)
disambiguation through representation
sequence learning from annotated corpora
simple, general methods
conditional frequency distributions
Cool things you can do now: elementary NLU, NLG
Review
tokenization and tagging: segmentation and
annotation of words
chunking: segmentation and annotation of word
sequences

26
Readings

One of
Jurafsky & Martin, chapter 5
Manning & Schütze, chapter 10
NLTK tagging tutorial
http://www.nltk.org/book/ch05.html
Next lecture
tagging with (hidden) Markov models
other sequence tagging tasks
named entity tagging
shallow parsing
