SIMS 290-2: Applied Natural Language Processing - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
SIMS 290-2: Applied Natural Language Processing
Marti Hearst, Sept 22, 2004
2
Today
  • Cascaded Chunking
  • Example of Using Chunking: Word Associations
  • Evaluating Chunking
  • Going to the next level: Parsing

3
Cascaded Chunking
  • Goal: create chunks that include other chunks
  • Examples:
  • PP consists of preposition + NP
  • VP consists of verb followed by PPs or NPs
  • How to make it work in NLTK
  • The tutorial is a bit confusing; I attempt to
    clarify it here
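A minimal sketch of such a cascade using the current `nltk.RegexpParser` API (which differs from the interface in the 2004 tutorial); the grammar and the hand-tagged sentence below are illustrative, not the lecture's own code:

```python
import nltk

# Stages run in order; loop=2 re-runs the whole cascade so chunks built
# late in pass 1 can feed earlier stages in pass 2.
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}   # determiner + adjectives + nouns
  PP: {<IN><NP>}            # preposition + NP
  VP: {<VB.*><NP|PP>+}      # verb followed by NPs or PPs
"""
chunker = nltk.RegexpParser(grammar, loop=2)

# A hand-tagged example sentence (no tagger or corpus download needed).
sentence = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
tree = chunker.parse(sentence)
print(tree)
```

Note that the PP and VP stages match the NP and PP labels produced by earlier stages, which is exactly the "higher-level tag" matching described below.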

4
Creating Cascaded Chunkers
  • Start with a sentence token
  • A list of words with parts of speech assigned
  • Create a fresh one or use one from a corpus

5
Creating Cascaded Chunkers
  • Create a set of chunk parsers
  • One for each chunk type
  • Each one takes as input some kind of list of
    tokens, and produces as output a NEW list of
    tokens
  • You can decide what this new list is called
  • Examples: NP-CHUNK, PP-CHUNK, VP-CHUNK
  • You can also decide what to name each occurrence
    of the chunk type, as it is assigned to a subset
    of tokens
  • Examples: NP, VP, PP
  • How to match higher-level tags?
  • It just seems to match their string description
  • So be certain that their names do not overlap
    with POS tags

9
Let's do some text analysis
  • Let's try this on more complex sentences
  • First, read in part of a corpus
  • Then, count how often each word occurs with each
    POS
  • Determine some common verbs, choose one
  • Make a list of sentences containing that verb
  • Test out the chunker on them and examine further
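The steps above can be sketched as follows, with a tiny hypothetical hand-tagged corpus standing in for real corpus data:

```python
from collections import Counter

# Hypothetical hand-tagged mini-corpus in place of a real one.
tagged_sents = [
    [("She", "PRP"), ("drinks", "VBZ"), ("coffee", "NN")],
    [("They", "PRP"), ("drink", "VBP"), ("tea", "NN")],
    [("He", "PRP"), ("reads", "VBZ"), ("books", "NNS")],
]

# Count how often each word occurs with each POS tag.
word_pos = Counter((w.lower(), t) for sent in tagged_sents for w, t in sent)

# Penn Treebank verb tags all start with "VB"; find the common verbs.
verbs = Counter(w.lower() for sent in tagged_sents
                for w, t in sent if t.startswith("VB"))

# Collect the sentences containing a chosen verb, ready for the chunker.
def sentences_with(verb):
    return [s for s in tagged_sents
            if any(w.lower().startswith(verb) for w, _ in s)]

print(verbs.most_common())
print(len(sentences_with("drink")))
```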

13
Why didn't this parse work?
17
Corpus Analysis for Discovery of Word Associations
  • Classic paper by Church & Hanks showed how to use
    a corpus and a shallow parser to find interesting
    dependencies between words
  • "Word Association Norms, Mutual Information, and
    Lexicography", Computational Linguistics, 16(1),
    1990
  • http://www.research.att.com/~kwc/publications.html
  • Some cognitive evidence
  • Word association norms: which word do people say
    most often after hearing another word
  • Given doctor: nurse, sick, health, medicine,
    hospital
  • People respond more quickly to a word if they've
    seen an associated word
  • E.g., if you show bread, they're faster at
    recognizing butter than nurse (vs. a nonsense
    string)

18
Corpus Analysis for Discovery of Word Associations
  • Idea: use a corpus to estimate word associations
  • Association ratio: log( P(x,y) / (P(x)P(y)) )
  • The probability of seeing x followed by y vs. the
    probability of seeing x anywhere times the
    probability of seeing y anywhere
  • P(x) is how often x appears in the corpus
  • P(x,y) is how often y follows x within w words
  • Interesting associations with doctor:
  • x = honorary, y = doctor
  • x = doctors, y = dentists
  • x = doctors, y = nurses
  • x = doctors, y = treating
  • x = examined, y = doctor
  • x = doctors, y = treat
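The association ratio above (a pointwise mutual information score) can be sketched directly from these definitions; the toy corpus and window size here are illustrative:

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for real data.
words = ("the doctor examined the patient the doctor treated the patient "
         "the nurse helped the doctor").split()

window = 3  # count y within w words after x
unigrams = Counter(words)
pairs = Counter()
for i, x in enumerate(words):
    for y in words[i + 1:i + 1 + window]:
        pairs[(x, y)] += 1

N = len(words)

def assoc_ratio(x, y):
    # log2( P(x,y) / (P(x) * P(y)) )
    return math.log2((pairs[(x, y)] / N) /
                     ((unigrams[x] / N) * (unigrams[y] / N)))

print(assoc_ratio("doctor", "treated"))
```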

19
Corpus Analysis for Discovery of Word Associations
  • Now let's make use of syntactic information.
  • Look at which words and syntactic forms follow a
    given verb, to see what kinds of arguments it
    takes
  • Compute triples of subject-verb-object
  • Example: nouns that appear as the object of the
    verb drink
  • martinis, cup_water, champagne, beverage,
    cup_coffee, cognac, beer, cup, coffee, toast,
    alcohol
  • What can we note about many of these words?
  • Example: verbs that have telephone as their
    object
  • sit_by, disconnect, answer, hang_up, tap,
    pick_up, return, be_by, spot, repeat, place,
    receive, install, be_on
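Once subject-verb-object triples are available (however they were extracted), grouping them to surface patterns like those above is straightforward; the triples here are hypothetical:

```python
from collections import defaultdict

# Hypothetical subject-verb-object triples, as a chunker might produce.
triples = [
    ("she", "drink", "martini"),
    ("he", "drink", "beer"),
    ("they", "drink", "champagne"),
    ("she", "answer", "telephone"),
    ("he", "pick_up", "telephone"),
]

objects_of = defaultdict(set)    # verb -> objects it takes
verbs_taking = defaultdict(set)  # object -> verbs that take it

for subj, verb, obj in triples:
    objects_of[verb].add(obj)
    verbs_taking[obj].add(verb)

print(sorted(objects_of["drink"]))
print(sorted(verbs_taking["telephone"]))
```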

20
Corpus Analysis for Discovery of Word Associations
  • The approach has become standard
  • Entire collections are available
  • Dekang Lin's Dependency Database
  • Given a word, retrieve words that have a
    dependency relationship with the input word
  • Dependency-based Word Similarity
  • Given a word, retrieve the words that are most
    similar to it, based on dependencies
  • http://www.cs.ualberta.ca/~lindek/demos.htm

21
Example: Dependency Database, sell
22
Example: Dependency-based Similarity, sell
23
Homework Assignment
  • Choose a verb of interest
  • Analyze the context in which the verb appears
  • Can use any corpus you like
  • Can train a tagger and run it on some fresh text
  • Example: What kinds of arguments does it take?
  • Improve on my chunking rules to get better
    characterizations

24
Evaluating the Chunker
  • Why not just use accuracy?
  • Accuracy = correct / total number
  • Definitions:
  • Total: number of chunks in the gold standard
  • Guessed: set of chunks that were labeled
  • Correct: of the guessed, which were correct
  • Missed: how many correct chunks were not guessed?
  • Precision = correct / guessed
  • Recall = correct / total
  • F-measure = 2(Prec × Recall) / (Prec + Recall)
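These measures translate directly into code:

```python
# Precision, recall, and F-measure over chunk counts.
def precision(correct, guessed):
    return correct / guessed

def recall(correct, total):
    return correct / total

def f_measure(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

p = precision(80, 120)
r = recall(80, 100)
print(round(f_measure(p, r), 2))
```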

25
Example
  • Assume the following numbers:
  • Total = 100
  • Guessed = 120
  • Correct = 80
  • Missed = 20
  • Precision = 80 / 120 ≈ 0.67
  • Recall = 80 / 100 = 0.80
  • F-measure = 2(.67 × .80) / (.67 + .80) ≈ 0.73

26
Evaluating in NLTK
  • We have some already chunked text from the
    Treebank
  • The code below uses the existing parse to compare
    against, and generates tokens of type word/tag to
    parse with our own chunker
  • We have to add location information so the
    evaluation code can compare which words have been
    assigned which labels
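A sketch of why location information matters: representing each chunk by a (start, end, label) span lets gold and guessed chunks be matched by position and then scored with precision and recall. The spans below are hypothetical:

```python
# Each chunk is a (start, end, label) span over token positions.
def score(gold, guessed):
    correct = len(set(gold) & set(guessed))  # exact span + label matches
    p = correct / len(guessed)
    r = correct / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold    = [(0, 2, "NP"), (3, 5, "NP"), (6, 9, "VP")]
guessed = [(0, 2, "NP"), (3, 4, "NP"), (6, 9, "VP")]  # one boundary error

print(score(gold, guessed))
```

Without the positions, two identically labeled chunks over different words would be indistinguishable, which is exactly the problem the added location information solves.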

27
How to get better accuracy?
  • Use a full syntactic parser
  • These days the probabilistic ones work
    surprisingly well
  • They are getting faster too.
  • Prof. Dan Klein's is very good and easy to run
  • http://nlp.stanford.edu/downloads/lex-parser.shtml

32
Next Week
  • Shallow Parsing Assignment
  • Due on Wed Sept 29
  • Next week
  • Read paper on end-of-sentence disambiguation
  • Presley and Barbara lecturing on categorization
  • We will read the categorization tutorial the
    following week