Title: CIS 530 Part of Speech Tagging
1 CIS 530 - Part of Speech Tagging
- Reading: JM 5.1-5.5.1 (MS 3.1, 4.3.2, 10.1-10.3)
- (A few slides adapted from Dan Jurafsky, Jim Martin, Dekang Lin, Rada Mihalcea, and Bonnie Dorr.)
2 Parts of Speech
- 8 (ish) traditional parts of speech
- Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
- This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.)
- Called parts-of-speech, lexical categories, word classes, morphological classes, lexical tags, POS
- We'll use POS most frequently
- (This and the next 4 slides from Dan Jurafsky, from slides by Jim Martin, Dekang Lin, and Bonnie Dorr.)
3 POS examples for English
- N    noun         chair, bandwidth, pacing
- V    verb         study, debate, munch
- ADJ  adjective    purple, tall, ridiculous
- ADV  adverb       unfortunately, slowly
- P    preposition  of, by, to
- PRO  pronoun      I, me, mine
- DET  determiner   the, a, that, those
4 Open vs. Closed classes
- Open
- Nouns, Verbs, Adjectives, Adverbs.
- Why open? These classes readily accept newly coined words.
- Closed
- determiners: a, an, the
- pronouns: she, he, I
- prepositions: on, under, over, near, by, ...
5 Open Class Words
- Every known human language has nouns and verbs
- Nouns: people, places, things
- Classes of nouns
- proper vs. common
- count vs. mass
- Verbs: actions and processes
- Adjectives: properties, qualities
- Adverbs: hodgepodge!
- "Unfortunately, John walked home extremely slowly yesterday"
6 Closed Class Words
- Differ more from language to language than open class words
- Examples:
- prepositions: on, under, over, ...
- particles: up, down, on, off, ...
- determiners: a, an, the, ...
- pronouns: she, who, I, ...
- conjunctions: and, but, or, ...
- auxiliary verbs: can, may, should, ...
- numerals: one, two, three, third, ...
7 Prepositions from CELEX
8 Pronouns in CELEX
9 Conjunctions
10 Auxiliaries
11 NLP Task I: Determining Part of Speech Tags
12 POS Tagging: Definition
- The process of assigning a part-of-speech or lexical class marker to each word in a corpus
13 POS Tagging example
- WORD   TAG
- the    DET
- koala  N
- put    V
- the    DET
- keys   N
- on     P
- the    DET
- table  N
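This mapping can be reproduced by a minimal lexicon-lookup tagger. The lexicon and tag names follow the slide's example; a real tagger must handle ambiguous and unknown words far more carefully:

```python
# Minimal lexicon-lookup tagger reproducing the slide's example.
# The lexicon is hand-built from the slide; a real tagger learns it
# from a tagged corpus and resolves ambiguity statistically.
LEXICON = {
    "the": "DET", "koala": "N", "put": "V",
    "keys": "N", "on": "P", "table": "N",
}

def tag(words):
    # Unknown words fall back to N, a common default for open-class words.
    return [(w, LEXICON.get(w, "N")) for w in words]

print(tag("the koala put the keys on the table".split()))
```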
14 What is POS tagging good for?
- Speech synthesis
- How to pronounce "lead"?
- INsult vs. inSULT
- OBject vs. obJECT
- OVERflow vs. overFLOW
- DIScount vs. disCOUNT
- CONtent vs. conTENT
- Stemming for information retrieval
- Knowing a word is a N tells you it gets plurals
- Can search for "aardvarks", get "aardvark"
- Parsing, speech recognition, etc.
- Possessive pronouns (my, your, her) likely to be followed by nouns
- Personal pronouns (I, you, he) likely to be followed by verbs
15 Equivalent Problem in Bioinformatics
- Durbin et al., Biological Sequence Analysis, Cambridge University Press.
- Several applications, e.g. proteins:
- From primary structure ATCPLELLLD
- Infer secondary structure HHHBBBBBC...
16 History (from Yair Halevi, Bar-Ilan U.)
- Greene and Rubin: rule based, ~70% accuracy
- Brown Corpus created (EN-US), 1 million words
- Brown Corpus tagged
- LOB Corpus created (EN-UK), 1 million words
- LOB Corpus tagged
- POS tagging separated from other NLP
- HMM tagging (CLAWS): 93-95% accuracy
- DeRose/Church: efficient HMM, sparse data, 95% accuracy
- Transformation-based tagging (Eric Brill): rule based, 95% accuracy
- Tree-based statistics (Helmut Schmid): 96% accuracy
- Trigram tagger (Kempe): 96% accuracy
- Neural network: 96% accuracy
- Combined methods: 98% accuracy
- British National Corpus (tagged by CLAWS)
- Penn Treebank Corpus (WSJ, 4.5M words)
17 POS Tag Sets for English: Design
18 Penn Treebank Tagset
19 A Simplified Tagset for English
- Tagsets for English had grown progressively larger since the Brown Corpus, up until the Penn Treebank project.
20 Rationale behind British/European tag sets
- "To provide distinct codings for all classes of words having distinct grammatical behaviour" (Garside et al. 1987)
- The Lund tagset for adverbs distinguishes between:
- Adjunct: Process, Space, Time
- Wh-type: Manner, Reason, Space, Time, Wh-type S
- Conjunct: Appositional, Contrastive, Inferential, Listing, ...
- Disjunct: Content, Style
- Postmodifier: else
- Negative: not
- Discourse Item: Appositional, Expletive, Greeting, Hesitator, ...
21 One of Several Reasons for a Smaller Tagset
- Many tags are unique to particular lexical items, and can be recovered automatically if desired.
22 Syntactic Recoverability
- Prepositions vs. Subordinating Conjunctions
- "Since the last meeting, things have changed." (preposition)
- "Since we first learned about stochastic methods, things have changed." (subordinating conjunction)
- We tag both as IN
- Subject vs. Object Pronouns
- Recoverable from position in parse tree
- "to" as Preposition vs. "to" as Auxiliary
- Can be recovered by position in parse tree
- BIG MISTAKE: The parser needs this information.
23 POS Tagging - Statistical Models
24 Task I: Determining Part of Speech Tags
- The Problem
- The Old Solution: combinatoric search.
- If each of n words has k tags on average, try the k^n combinations until one works.
25 NLP Task I: Determining Part of Speech Tags
- The Old Solution: depth-first search.
- If each of n words has k tags on average, try the k^n combinations until one works.
- Machine Learning Solution: automatically learn Part of Speech (POS) assignment.
- The best techniques achieve 96-97% accuracy per word on new materials, given large training corpora.
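The old combinatoric search can be sketched directly. Here `TAGS_FOR` and `acceptable` are toy stand-ins for a real lexicon and grammar; the point is the k^n enumeration:

```python
from itertools import product

# Toy lexicon: each word lists its possible tags (k options per word).
TAGS_FOR = {"the": ["DET"], "can": ["N", "V", "AUX"], "rusts": ["N", "V"]}

def acceptable(tags):
    # Toy grammar check: determiners must precede nouns,
    # and the sentence must end in a verb.
    det_ok = all(t2 == "N" for t1, t2 in zip(tags, tags[1:]) if t1 == "DET")
    return det_ok and tags[-1] == "V"

def brute_force_tag(words):
    candidates = [TAGS_FOR[w] for w in words]
    for tags in product(*candidates):  # enumerates all k^n combinations
        if acceptable(tags):
            return list(tags)          # "until one works"
    return None

print(brute_force_tag(["the", "can", "rusts"]))
```

With 3 words averaging 2 tags this is only 6 candidates, but the cost grows exponentially in sentence length, which is why this approach was abandoned.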
26 Simple Statistical Approaches: Idea 1
27 Simple Statistical Approaches: Idea 2
- For a string of words
- W = w1 w2 w3 ... wn
- find the string of POS tags
- T = t1 t2 t3 ... tn
- which maximizes P(T|W)
- i.e., the probability of tag string T given that the word string was W
- i.e., that W was tagged T
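P(T|W) is hard to estimate directly. By Bayes' rule, and because P(W) is constant with respect to T, the maximization can be rewritten as:

```latex
\hat{T} = \arg\max_{T} P(T \mid W)
        = \arg\max_{T} \frac{P(W \mid T)\, P(T)}{P(W)}
        = \arg\max_{T} P(W \mid T)\, P(T)
```

Here P(T) is a prior over tag sequences and P(W|T) is the likelihood of the words given the tags; both are easier to estimate from a tagged corpus than P(T|W) itself.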
28 Again, the Sparse Data Problem
- A Simple, Impossible Approach to Compute P(T|W):
- Count up instances of the string "heat oil in a large pot" in the training corpus, and pick the most common tag assignment to the string.
29 A BOTEC Estimate of What We Can Estimate
- What parameters can we estimate with a million words of hand-tagged training data?
- Assume a uniform distribution of 5000 words and 40 part-of-speech tags.
- Rich models often require vast amounts of data
- Good estimates of models with bad assumptions often outperform better models which are badly estimated
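A rough calculation under these assumptions makes the point concrete (the particular parameter tables compared here are my choice of illustration, not necessarily the slide's):

```python
WORDS, TAGS, TOKENS = 5_000, 40, 1_000_000

# Number of parameters in each candidate model, vs. a 1M-word corpus.
tables = {
    "P(tag | word)":           WORDS * TAGS,   # 200,000
    "P(tag | previous tag)":   TAGS * TAGS,    # 1,600 (bigram tag model)
    "P(tag | two prev tags)":  TAGS ** 3,      # 64,000 (trigram tag model)
    "P(word | previous word)": WORDS ** 2,     # 25,000,000 -- hopeless
}
for name, params in tables.items():
    # Average observations per parameter if counts were spread uniformly.
    print(f"{name}: {params:,} parameters, ~{TOKENS / params:.2f} observations each")
```

A bigram tag model gets hundreds of observations per parameter; a word-bigram model has 25x more parameters than training tokens, so richer models do indeed require vastly more data.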
30 A Practical Statistical Tagger
31 A Practical Statistical Tagger II
- But we can't accurately estimate more than tag bigrams or so
- Again, we change to a model that we CAN estimate
32 A Practical Statistical Tagger III
- So, for a given string W = w1 w2 w3 ... wn, the tagger needs to find the string of tags T which maximizes P(T) P(W|T), approximated with tag bigrams as the product over i of P(t_i | t_i-1) P(w_i | t_i)
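This maximization can be done efficiently with the Viterbi algorithm rather than enumerating all k^n tag sequences. A minimal sketch of a bigram tagger, assuming hand-set toy probabilities (`TRANS`, `EMIT`, and the probability floor are illustrative, not estimated from any corpus):

```python
import math

def viterbi(words, tags, trans, emit, floor=1e-6):
    """Find argmax over tag sequences of prod_i P(t_i|t_{i-1}) P(w_i|t_i), in log space."""
    lp = lambda table, key: math.log(table.get(key, floor))
    # best[t] = log-score of the best partial sequence ending in tag t
    best = {t: lp(trans, ("<s>", t)) + lp(emit, (t, words[0])) for t in tags}
    back = []
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            q = max(tags, key=lambda q: prev[q] + lp(trans, (q, t)))
            best[t] = prev[q] + lp(trans, (q, t)) + lp(emit, (t, w))
            ptr[t] = q  # remember the best predecessor tag
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    t = max(best, key=best.get)
    seq = [t]
    for ptr in reversed(back):
        t = ptr[t]
        seq.append(t)
    return seq[::-1]

# Toy probabilities for illustration only.
TRANS = {("<s>", "DET"): 0.8, ("DET", "N"): 0.9, ("N", "V"): 0.6, ("N", "N"): 0.3}
EMIT = {("DET", "the"): 0.6, ("N", "can"): 0.1, ("V", "rusts"): 0.05, ("N", "rusts"): 0.01}
print(viterbi(["the", "can", "rusts"], ["DET", "N", "V"], TRANS, EMIT))
```

Dynamic programming makes the cost O(n * k^2) instead of O(k^n), which is what makes the statistical tagger practical.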
33 Training and Performance
- To estimate the parameters of this model, count occurrences in an annotated training corpus
- Because many of these counts are small, smoothing is necessary for best results
- Such taggers typically achieve about 95-96% correct tagging, for tag sets of 40-80 tags.
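Estimation reduces to counting tag bigrams and word/tag pairs in the corpus. A minimal sketch using add-one (Laplace) smoothing, one simple choice among many smoothing methods (the tiny corpus and function names are illustrative):

```python
from collections import Counter

def train(tagged_sents, k=1.0):
    """Estimate smoothed P(t_i | t_{i-1}) and P(w_i | t_i) from [(word, tag), ...] sentences."""
    trans, emit, tag_counts = Counter(), Counter(), Counter()
    for sent in tagged_sents:
        prev = "<s>"
        for word, t in sent:
            trans[(prev, t)] += 1   # count tag bigrams
            emit[(t, word)] += 1    # count word/tag pairs
            tag_counts[t] += 1
            prev = t
        tag_counts["<s>"] += 1
    tagset = [t for t in tag_counts if t != "<s>"]
    vocab = {w for (_, w) in emit}

    def p_trans(prev, t):  # add-k smoothed transition probability
        return (trans[(prev, t)] + k) / (tag_counts[prev] + k * len(tagset))

    def p_emit(t, w):      # add-k smoothed emission probability (+1 for unknown words)
        return (emit[(t, w)] + k) / (tag_counts[t] + k * (len(vocab) + 1))

    return p_trans, p_emit

corpus = [[("the", "DET"), ("koala", "N"), ("put", "V")]]
p_trans, p_emit = train(corpus)
print(p_trans("DET", "N"), p_trans("DET", "V"))  # seen transition outweighs unseen
```

Smoothing guarantees that no transition or emission gets zero probability, so unseen but plausible tag sequences are not ruled out entirely.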