Title: CIS 530 Part of Speech Tagging
1 CIS 530 - Part of Speech Tagging
- Reading: JM 5.1-5.5.1 (MS 3.1, 4.3.2, 10.1-10.3)
- (A few slides adapted from Dan Jurafsky, Jim Martin, Dekang Lin, Rada Mihalcea, and Bonnie Dorr.)
2 Parts of Speech
- 8 (ish) traditional parts of speech
- Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
- This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.)
- Called parts-of-speech, lexical categories, word classes, morphological classes, lexical tags, POS
- We'll use POS most frequently
- (This and the next 4 slides from Dan Jurafsky, from slides by Jim Martin, Dekang Lin, and Bonnie Dorr.)
3 POS examples for English
- N    noun         chair, bandwidth, pacing
- V    verb         study, debate, munch
- ADJ  adjective    purple, tall, ridiculous
- ADV  adverb       unfortunately, slowly
- P    preposition  of, by, to
- PRO  pronoun      I, me, mine
- DET  determiner   the, a, that, those
4 Open vs. Closed classes
- Open
- Nouns, Verbs, Adjectives, Adverbs.
- Why open? These classes readily accept newly coined words.
- Closed
- determiners: a, an, the
- pronouns: she, he, I
- prepositions: on, under, over, near, by, ...
5 Open Class Words
- Every known human language has nouns and verbs
- Nouns: people, places, things
- Classes of nouns
- proper vs. common
- count vs. mass
- Verbs: actions and processes
- Adjectives: properties, qualities
- Adverbs: hodgepodge!
- "Unfortunately, John walked home extremely slowly yesterday"
6 Closed Class Words
- Differ more from language to language than open class words
- Examples:
- prepositions: on, under, over, ...
- particles: up, down, on, off, ...
- determiners: a, an, the, ...
- pronouns: she, who, I, ...
- conjunctions: and, but, or, ...
- auxiliary verbs: can, may, should, ...
- numerals: one, two, three, third, ...
7 Prepositions from CELEX
8 Pronouns in CELEX
9 Conjunctions
10 Auxiliaries
11 NLP Task I: Determining Part of Speech Tags
12 POS Tagging: Definition
- The process of assigning a part-of-speech or lexical class marker to each word in a corpus
13 POS Tagging example
- WORD   TAG
- the    DET
- koala  N
- put    V
- the    DET
- keys   N
- on     P
- the    DET
- table  N
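This mapping can be reproduced by a minimal lexicon-lookup tagger. The lexicon and tag names follow the slide's example; a real tagger must handle ambiguous and unknown words far more carefully:

```python
# Minimal lexicon-lookup tagger reproducing the slide's example.
# The lexicon is hand-built from the slide; a real tagger learns it
# from a tagged corpus and resolves ambiguity statistically.
LEXICON = {
    "the": "DET", "koala": "N", "put": "V",
    "keys": "N", "on": "P", "table": "N",
}

def tag(words):
    # Unknown words fall back to N, a common default for open-class words.
    return [(w, LEXICON.get(w, "N")) for w in words]

print(tag("the koala put the keys on the table".split()))
```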
14 What is POS tagging good for?
- Speech synthesis
- How to pronounce "lead"?
- INsult vs. inSULT
- OBject vs. obJECT
- OVERflow vs. overFLOW
- DIScount vs. disCOUNT
- CONtent vs. conTENT
- Stemming for information retrieval
- Knowing a word is a N tells you it gets plurals
- Can search for "aardvarks", get "aardvark"
- Parsing, speech recognition, etc.
- Possessive pronouns (my, your, her) likely to be followed by nouns
- Personal pronouns (I, you, he) likely to be followed by verbs
15 Equivalent Problem in Bioinformatics
- Durbin et al., Biological Sequence Analysis, Cambridge University Press.
- Several applications, e.g. proteins:
- From primary structure ATCPLELLLD
- Infer secondary structure HHHBBBBBC...
16 History (from Yair Halevi, Bar-Ilan U.)
- Greene and Rubin: rule based, ~70% accuracy
- Brown Corpus created (EN-US), 1 million words
- Brown Corpus tagged
- LOB Corpus created (EN-UK), 1 million words
- LOB Corpus tagged
- POS tagging separated from other NLP
- HMM tagging (CLAWS): 93-95% accuracy
- DeRose/Church: efficient HMM, sparse data, 95% accuracy
- Transformation-based tagging (Eric Brill): rule based, 95% accuracy
- Tree-based statistics (Helmut Schmid): 96% accuracy
- Trigram tagger (Kempe): 96% accuracy
- Neural network: 96% accuracy
- Combined methods: 98% accuracy
- British National Corpus (tagged by CLAWS)
- Penn Treebank Corpus (WSJ, 4.5M words)
17 POS Tag Sets for English: Design
18 Penn Treebank Tagset
19 A Simplified Tagset for English
- Tagsets for English had grown progressively larger since the Brown Corpus, up until the Penn Treebank project.
20 Rationale behind British/European tag sets
- "To provide distinct codings for all classes of words having distinct grammatical behaviour" (Garside et al. 1987)
- The Lund tagset for adverbs distinguishes between:
- Adjunct: Process, Space, Time
- Wh-type: Manner, Reason, Space, Time, Wh-type S
- Conjunct: Appositional, Contrastive, Inferential, Listing, ...
- Disjunct: Content, Style
- Postmodifier: else
- Negative: not
- Discourse Item: Appositional, Expletive, Greeting, Hesitator, ...
21 One of Several Reasons for a Smaller Tagset
- Many tags are unique to particular lexical items, and can be recovered automatically if desired.
22 Syntactic Recoverability
- Prepositions vs. Subordinating Conjunctions
- "Since the last meeting, things have changed." (preposition)
- "Since we first learned about stochastic methods, things have changed." (subordinating conjunction)
- We tag both as IN
- Subject vs. Object Pronouns
- Recoverable from position in parse tree
- "to" as Preposition vs. "to" as Auxiliary
- Can be recovered by position in parse tree
- BIG MISTAKE: The parser needs this information.
23 POS Tagging - Statistical Models
24 Task I: Determining Part of Speech Tags
- The Problem
- The Old Solution: combinatoric search.
- If each of n words has k tags on average, try the k^n combinations until one works.
25 NLP Task I: Determining Part of Speech Tags
- The Old Solution: depth-first search.
- If each of n words has k tags on average, try the k^n combinations until one works.
- Machine Learning Solution: automatically learn Part of Speech (POS) assignment.
- The best techniques achieve 96-97% accuracy per word on new materials, given large training corpora.
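The old combinatoric search can be sketched directly. Here `TAGS_FOR` and `acceptable` are toy stand-ins for a real lexicon and grammar; the point is the k^n enumeration:

```python
from itertools import product

# Toy lexicon: each word lists its possible tags (k options per word).
TAGS_FOR = {"the": ["DET"], "can": ["N", "V", "AUX"], "rusts": ["N", "V"]}

def acceptable(tags):
    # Toy grammar check: determiners must precede nouns,
    # and the sentence must end in a verb.
    det_ok = all(t2 == "N" for t1, t2 in zip(tags, tags[1:]) if t1 == "DET")
    return det_ok and tags[-1] == "V"

def brute_force_tag(words):
    candidates = [TAGS_FOR[w] for w in words]
    for tags in product(*candidates):  # enumerates all k^n combinations
        if acceptable(tags):
            return list(tags)          # "until one works"
    return None

print(brute_force_tag(["the", "can", "rusts"]))
```

With 3 words averaging 2 tags this is only 6 candidates, but the cost grows exponentially in sentence length, which is why this approach was abandoned.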
26 Simple Statistical Approaches: Idea 1
27 Simple Statistical Approaches: Idea 2
- For a string of words
- W = w1 w2 w3 ... wn
- find the string of POS tags
- T = t1 t2 t3 ... tn
- which maximizes P(T|W)
- i.e., the probability of tag string T given that the word string was W
- i.e., that W was tagged T
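P(T|W) is hard to estimate directly. By Bayes' rule, and because P(W) is constant with respect to T, the maximization can be rewritten as:

```latex
\hat{T} = \arg\max_{T} P(T \mid W)
        = \arg\max_{T} \frac{P(W \mid T)\, P(T)}{P(W)}
        = \arg\max_{T} P(W \mid T)\, P(T)
```

Here P(T) is a prior over tag sequences and P(W|T) is the likelihood of the words given the tags; both are easier to estimate from a tagged corpus than P(T|W) itself.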
28 Again, the Sparse Data Problem
- A Simple, Impossible Approach to Compute P(T|W):
- Count up instances of the string "heat oil in a large pot" in the training corpus, and pick the most common tag assignment to the string.
29 A BOTEC Estimate of What We Can Estimate
- What parameters can we estimate with a million words of hand-tagged training data?
- Assume a uniform distribution of 5000 words and 40 part-of-speech tags.
- Rich models often require vast amounts of data
- Good estimates of models with bad assumptions often outperform better models which are badly estimated
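A rough calculation under these assumptions makes the point concrete (the particular parameter tables compared here are my choice of illustration, not necessarily the slide's):

```python
WORDS, TAGS, TOKENS = 5_000, 40, 1_000_000

# Number of parameters in each candidate model, vs. a 1M-word corpus.
tables = {
    "P(tag | word)":           WORDS * TAGS,   # 200,000
    "P(tag | previous tag)":   TAGS * TAGS,    # 1,600 (bigram tag model)
    "P(tag | two prev tags)":  TAGS ** 3,      # 64,000 (trigram tag model)
    "P(word | previous word)": WORDS ** 2,     # 25,000,000 -- hopeless
}
for name, params in tables.items():
    # Average observations per parameter if counts were spread uniformly.
    print(f"{name}: {params:,} parameters, ~{TOKENS / params:.2f} observations each")
```

A bigram tag model gets hundreds of observations per parameter; a word-bigram model has 25x more parameters than training tokens, so richer models do indeed require vastly more data.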
30 A Practical Statistical Tagger
31 A Practical Statistical Tagger II
- But we can't accurately estimate more than tag bigrams or so
- Again, we change to a model that we CAN estimate
32 A Practical Statistical Tagger III
- So, for a given string W = w1 w2 w3 ... wn, the tagger needs to find the string of tags T which maximizes P(T) P(W|T), approximated with tag bigrams as the product over i of P(t_i | t_i-1) P(w_i | t_i)
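This maximization can be done efficiently with the Viterbi algorithm rather than enumerating all k^n tag sequences. A minimal sketch of a bigram tagger, assuming hand-set toy probabilities (`TRANS`, `EMIT`, and the probability floor are illustrative, not estimated from any corpus):

```python
import math

def viterbi(words, tags, trans, emit, floor=1e-6):
    """Find argmax over tag sequences of prod_i P(t_i|t_{i-1}) P(w_i|t_i), in log space."""
    lp = lambda table, key: math.log(table.get(key, floor))
    # best[t] = log-score of the best partial sequence ending in tag t
    best = {t: lp(trans, ("<s>", t)) + lp(emit, (t, words[0])) for t in tags}
    back = []
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            q = max(tags, key=lambda q: prev[q] + lp(trans, (q, t)))
            best[t] = prev[q] + lp(trans, (q, t)) + lp(emit, (t, w))
            ptr[t] = q  # remember the best predecessor tag
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    t = max(best, key=best.get)
    seq = [t]
    for ptr in reversed(back):
        t = ptr[t]
        seq.append(t)
    return seq[::-1]

# Toy probabilities for illustration only.
TRANS = {("<s>", "DET"): 0.8, ("DET", "N"): 0.9, ("N", "V"): 0.6, ("N", "N"): 0.3}
EMIT = {("DET", "the"): 0.6, ("N", "can"): 0.1, ("V", "rusts"): 0.05, ("N", "rusts"): 0.01}
print(viterbi(["the", "can", "rusts"], ["DET", "N", "V"], TRANS, EMIT))
```

Dynamic programming makes the cost O(n * k^2) instead of O(k^n), which is what makes the statistical tagger practical.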
33 Training and Performance
- To estimate the parameters of this model, count occurrences in an annotated training corpus
- Because many of these counts are small, smoothing is necessary for best results
- Such taggers typically achieve about 95-96% correct tagging, for tag sets of 40-80 tags.
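Estimation reduces to counting tag bigrams and word/tag pairs in the corpus. A minimal sketch using add-one (Laplace) smoothing, one simple choice among many smoothing methods (the tiny corpus and function names are illustrative):

```python
from collections import Counter

def train(tagged_sents, k=1.0):
    """Estimate smoothed P(t_i | t_{i-1}) and P(w_i | t_i) from [(word, tag), ...] sentences."""
    trans, emit, tag_counts = Counter(), Counter(), Counter()
    for sent in tagged_sents:
        prev = "<s>"
        for word, t in sent:
            trans[(prev, t)] += 1   # count tag bigrams
            emit[(t, word)] += 1    # count word/tag pairs
            tag_counts[t] += 1
            prev = t
        tag_counts["<s>"] += 1
    tagset = [t for t in tag_counts if t != "<s>"]
    vocab = {w for (_, w) in emit}

    def p_trans(prev, t):  # add-k smoothed transition probability
        return (trans[(prev, t)] + k) / (tag_counts[prev] + k * len(tagset))

    def p_emit(t, w):      # add-k smoothed emission probability (+1 for unknown words)
        return (emit[(t, w)] + k) / (tag_counts[t] + k * (len(vocab) + 1))

    return p_trans, p_emit

corpus = [[("the", "DET"), ("koala", "N"), ("put", "V")]]
p_trans, p_emit = train(corpus)
print(p_trans("DET", "N"), p_trans("DET", "V"))  # seen transition outweighs unseen
```

Smoothing guarantees that no transition or emission gets zero probability, so unseen but plausible tag sequences are not ruled out entirely.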