CSA2050: Natural Language Processing

1
CSA2050 Natural Language Processing
  • Tagging 3 and Chunking
  • Transformation Based Tagging
  • Chunking

2
Tagging 3 and Chunking Lecture
  • Slides based on notes by Mike Rosner and Marti
    Hearst
  • Additions from NLTK tutorials

3
3 Approaches to Tagging
  1. Rule-Based Tagger: ENGTWOL tagger (Voutilainen,
    1995)
  2. Stochastic Tagger: HMM-based tagger
  3. Transformation-Based Tagger: Brill tagger (Brill,
    1995)

4
Transformation-Based Tagging
  • A combination of rule-based and stochastic
    tagging methodologies
  • Like rule-based tagging: rules are used to
    specify tags in a certain environment
  • Like stochastic tagging: machine learning is
    used.
  • Transformation-Based Learning (TBL)

5
Transformation Based Error Driven Learning
[Diagram, after Brill (1996): unannotated text is passed through the
initial-state annotator to produce annotated text; the learner compares
this annotation against the truth (a manually annotated corpus) and
proposes transformation rules.]
6
TBL Requirements
  • Initial State Annotator
  • List of allowable transformations
  • Scoring function
  • Search strategy

7
Initial State Annotation
  • Input
  • Corpus
  • Dictionary
  • Frequency counts for each entry
  • Output
  • Corpus tagged with most frequent tags

8
TBL Requirements
  • Initial State Annotator
  • List of allowable transformations
  • Scoring function
  • Search strategy

9
Transformations
  • Each transformation comprises
  • A source tag
  • A target tag
  • A triggering environment
  • Example
  • Source tag: NN
  • Target tag: VB
  • Triggering environment: previous tag is TO

10
More Examples
Source tag   Target tag   Triggering environment
NN           VB           previous tag is TO
VBP          VB           one of the three previous tags is MD
JJR          RBR          next tag is JJ
VBP          VB           one of the two previous words is n't
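The first rule, for example, retags to/TO race/NN as to/TO race/VB. A
minimal sketch of how such a transformation could be represented and
applied (the data structure and the apply_rule helper are illustrative,
not Brill's implementation):

  # Illustrative sketch: a transformation is (source tag, target tag, trigger),
  # where the trigger is a predicate over the tagged context.
  def apply_rule(tagged, source, target, trigger):
      """Return a new (word, tag) list with the rule applied at every position."""
      out = list(tagged)
      for i, (word, tag) in enumerate(tagged):
          if tag == source and trigger(tagged, i):
              out[i] = (word, target)
      return out

  # "Change NN to VB when the previous tag is TO"
  prev_is_to = lambda tagged, i: i > 0 and tagged[i - 1][1] == "TO"
  print(apply_rule([("to", "TO"), ("race", "NN")], "NN", "VB", prev_is_to))
  # [('to', 'TO'), ('race', 'VB')]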
11
Allowable transforms based on fixed schemas
12
Set of Possible Transformations
  • The set of possible transformations is
    enumerated by allowing
  • every possible tag or word
  • in every possible slot
  • in every possible schema
  • This set can get quite large

13
TBL Requirements
  • Initial State Annotator
  • List of allowable transformations
  • Scoring function
  • Search strategy

14
Scoring Function
  • For a given tagging state of the corpus and for
    a given transformation:
  • For every word position in the corpus
  • If the rule applies and yields a correct tag,
    increment score by 1
  • If the rule applies and yields an incorrect tag,
    decrement score by 1
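A minimal sketch of this scoring loop, reusing the illustrative rule
representation from the earlier sketch (tagged and gold are parallel
lists of (word, tag) pairs):

  def score_rule(tagged, gold, source, target, trigger):
      # +1 for every position where the rule applies and yields the gold tag,
      # -1 for every position where it applies and yields a wrong tag
      score = 0
      for i, (_, tag) in enumerate(tagged):
          if tag == source and trigger(tagged, i):
              score += 1 if target == gold[i][1] else -1
      return score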

15
TBL Requirements
  • Initial State Annotator
  • List of allowable transformations
  • Scoring function
  • Search strategy

16
The Basic Algorithm
  • Label every word with its most likely tag
  • Repeat the following while improvement >
    threshold
  • Examine every possible transformation, selecting
    the one that results in the most improved tagging
  • Retag the data according to this rule
  • Append this rule to output list
  • Return output list of transformations
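Current NLTK packages this greedy loop; a minimal sketch, assuming NLTK
and its treebank sample are installed, using the stock fntbl37 rule
templates rather than Brill's original template set (the corpus split
sizes below are arbitrary):

  from nltk.corpus import treebank
  from nltk.tag import DefaultTagger, UnigramTagger
  from nltk.tag.brill import fntbl37
  from nltk.tag.brill_trainer import BrillTaggerTrainer

  train = treebank.tagged_sents()[:3000]
  test = treebank.tagged_sents()[3000:3200]

  # Initial-state annotator: most-frequent-tag baseline, backing off to NN
  initial = UnigramTagger(train, backoff=DefaultTagger("NN"))

  # Greedily learn a small list of transformation rules on top of it
  trainer = BrillTaggerTrainer(initial, fntbl37(), trace=0)
  tagger = trainer.train(train, max_rules=20)

  print(tagger.accuracy(test))     # .evaluate(test) on older NLTK versions
  for rule in tagger.rules()[:5]:  # the learned transformations are readable
      print(rule)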

17
TBL Remarks
  • Execution speed: the TBL tagger is slower than
    the HMM approach.
  • Learning speed is slow: Brill's implementation
    took over a day (600k tokens)
  • BUT
  • Learns small number of simple, non-stochastic
    rules
  • Can be made to work faster with Finite State
    Transducers

18
Tagging Unknown Words
  • New words are added to (newspaper) language: 20
    per month
  • Plus many proper names
  • Unknown words increase error rates by 1-2%
  • Methods
  • Assume the unknowns are nouns.
  • Assume the unknowns have a probability
    distribution similar to words occurring once in
    the training set.
  • Use morphological information, e.g. words ending
    in -ed tend to be tagged VBN (see the sketch
    below).
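A minimal sketch of the morphological heuristic using NLTK's
RegexpTagger as a guesser for unknown words (the suffix patterns are
illustrative, not a tuned set):

  import nltk

  # Guess tags for out-of-vocabulary words from their shape; anything left
  # over falls back to the "assume it's a noun" heuristic.
  unknown_word_tagger = nltk.RegexpTagger([
      (r".*ed$", "VBN"),      # past participles
      (r".*ing$", "VBG"),     # gerunds
      (r".*ly$", "RB"),       # adverbs
      (r"^[A-Z].*$", "NNP"),  # capitalised: likely a proper noun
      (r".*", "NN"),          # default: common noun
  ])

  # Invented example tokens, tagged purely from morphology
  print(unknown_word_tagger.tag(["Framptonized", "blicking", "sparsely", "wug"]))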

19
Evaluation
  • The result is compared with a manually coded
    Gold Standard
  • Typically accuracy reaches 95-97%
  • This may be compared with the result for a
    baseline tagger (one that uses no context).
  • Important: 100% accuracy is impossible, even for
    human annotators.

20
A word of caution
  • 95% accuracy: every 20th token wrong
  • 96% accuracy: every 25th token wrong
  • an improvement of 25%, from 95% to 96%???
  • 97% accuracy: every 33rd token wrong
  • 98% accuracy: every 50th token wrong

21
How much training data is needed?
  • When working with the STTS tagset (50 tags) we
    observed
  • a strong increase in accuracy when testing on
    10,000, 20,000, ..., 50,000 tokens,
  • a slight increase in accuracy when testing on up
    to 100,000 tokens,
  • hardly any increase thereafter.

22
Summary
  • Tagging decisions are conditioned on a wider
    range of events than in the HMM models mentioned
    earlier. For example, left and right context can
    be used simultaneously.
  • Learning and tagging are simple, intuitive and
    understandable.
  • Transformation-based learning has also been
    applied to sentence parsing.

23
The Three Approaches Compared
  • Rule-based
  • Hand-crafted rules
  • It takes too long to come up with good rules
  • Portability problems
  • Stochastic
  • Find sequence with highest probability (Viterbi)
  • Result of training not accessible to humans
  • Large storage needs for intermediate results
    whilst training
  • Transformation
  • Rules are learned
  • Small number of rules
  • Rules can be inspected and modified by humans

24
Shallow/Chunk Parsing
  • Goal: divide a sentence into a sequence of
    chunks.
  • Chunks are non-overlapping regions of a text
  • [I] saw [a tall man] in [the park].
  • Chunks are non-recursive
  • A chunk cannot contain other chunks
  • Chunks are non-exhaustive
  • Not all words are included in chunks

25
Chunk Parsing Examples
  • Noun-phrase chunking
  • [I] saw [a tall man] in [the park].
  • Verb-phrase chunking
  • The man who was in the park saw me.
  • Prosodic chunking
  • [I saw] [a tall man] [in the park].
  • Question answering
  • What Spanish explorer discovered the
    Mississippi River?

26
Motivation
  • Locating information
  • e.g., text retrieval
  • Index a document collection on its noun phrases
  • Ignoring information
  • Generalize in order to study higher-level
    patterns
  • e.g. phrases involving "gave" in the Penn
    Treebank:
  • gave NP; gave up NP in NP; gave NP up;
    gave NP help; gave NP to NP
  • Sometimes a full parse has too much structure
  • Too nested
  • Chunks usually are not recursive

27
Representation
  • BIO (or IOB) tags
  • Trees
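For illustration, here are the two representations of the NP chunks in
"the little cat sat on the mat", converted with NLTK's tree/IOB utility
(a minimal sketch, assuming NLTK is installed):

  import nltk
  from nltk.chunk import tree2conlltags

  # Tree representation: chunks are flat subtrees under the sentence node
  tree = nltk.Tree("S", [
      nltk.Tree("NP", [("the", "DT"), ("little", "JJ"), ("cat", "NN")]),
      ("sat", "VBD"), ("on", "IN"),
      nltk.Tree("NP", [("the", "DT"), ("mat", "NN")]),
  ])

  # IOB/BIO representation: one (word, POS, chunk tag) triple per token, where
  # B-NP begins a chunk, I-NP continues it, and O marks tokens outside chunks
  print(tree2conlltags(tree))
  # [('the', 'DT', 'B-NP'), ('little', 'JJ', 'I-NP'), ('cat', 'NN', 'I-NP'),
  #  ('sat', 'VBD', 'O'), ('on', 'IN', 'O'),
  #  ('the', 'DT', 'B-NP'), ('mat', 'NN', 'I-NP')]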

28
Comparison with Full Parsing
  • Parsing is usually an intermediate stage
  • Builds structures that are used by later stages
    of processing
  • Full parsing is a sufficient but not necessary
    intermediate stage for many NLP tasks
  • Parsing often provides more information than we
    need
  • Shallow parsing is an easier problem
  • Less word-order flexibility within chunks than
    between chunks
  • More locality
  • Fewer long-range dependencies
  • Less context-dependence
  • Less ambiguity

29
Chunks and Constituency
  • Constituents: [a tall man [in [the park]]]
  • Chunks: [a tall man] [in the park]
  • A constituent is part of some higher unit in the
    hierarchical syntactic parse
  • Chunks are not constituents
  • Constituents are recursive
  • But, chunks are typically subsequences of
    constituents
  • Chunks do not cross major constituent boundaries

30
Chunk Parsing in NLTK
  • Chunk parsers usually ignore lexical content
  • Only need to look at part-of-speech tags
  • Possible steps in chunk parsing
  • Chunking, unchunking
  • Chinking
  • Merging, splitting
  • Evaluation
  • Compare to a Baseline
  • Evaluate in terms of
  • Precision, Recall, F-Measure
  • Missed (False Negative), Incorrect (False
    Positive)
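As a quick reminder of how these quantities relate (a worked sketch with
invented counts):

  # Precision: fraction of proposed chunks that are correct
  # Recall:    fraction of gold-standard chunks that were found
  # F-measure: harmonic mean of precision and recall
  true_pos, false_pos, false_neg = 80, 20, 40   # invented counts

  precision = true_pos / (true_pos + false_pos)               # 0.80
  recall = true_pos / (true_pos + false_neg)                  # 0.67
  f_measure = 2 * precision * recall / (precision + recall)   # about 0.73
  print(precision, recall, round(f_measure, 2))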

31
Chunk Parsing in NLTK
  • Define a regular expression that matches the
    sequences of tags in a chunk
  • A simple noun phrase chunk regexp
  • (Note that <NN.*> matches any tag starting with
    NN)
  • <DT>? <JJ>* <NN.?>*
  • Chunk all matching subsequences
  • the/DT little/JJ cat/NN sat/VBD on/IN the/DT
    mat/NN
  • [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT
    mat/NN]
  • If matching subsequences overlap, the first one
    gets priority
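A minimal sketch of this NP rule with current NLTK's RegexpParser (the
grammar-string interface; the noun part is written <NN.*>+ here so that
at least one noun is required and empty matches are impossible):

  import nltk

  # NP chunk: optional determiner, any adjectives, one or more nouns
  chunker = nltk.RegexpParser(r"NP: {<DT>?<JJ>*<NN.*>+}")

  sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
              ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
  print(chunker.parse(sentence))
  # (S (NP the/DT little/JJ cat/NN) sat/VBD on/IN (NP the/DT mat/NN))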

32
Unchunking
  • Remove any chunk with a given pattern
  • e.g., UnChunkRule('<NN|DT>+', 'Unchunk NN/DT
    sequences')
  • Combine with ChunkRule('<NN|DT|JJ>+', ...)
  • Chunk all matching subsequences
  • Input
  • the/DT little/JJ cat/NN sat/VBD on/IN the/DT
    mat/NN
  • Apply chunk rule
  • [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT
    mat/NN]
  • Apply unchunk rule
  • [the/DT little/JJ cat/NN] sat/VBD on/IN the/DT
    mat/NN
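A minimal sketch of this chunk-then-unchunk sequence, assuming the
ChunkRule and UnChunkRule classes in nltk.chunk.regexp still behave as
in the older NLTK tutorials these slides follow:

  from nltk.chunk.regexp import ChunkRule, RegexpChunkParser, UnChunkRule

  rules = [
      ChunkRule("<DT|JJ|NN>+", "Chunk sequences of DT, JJ and NN"),
      UnChunkRule("<DT|NN>+", "Unchunk chunks made only of DT and NN"),
  ]
  chunker = RegexpChunkParser(rules, chunk_label="NP")

  sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
              ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
  print(chunker.parse(sentence))
  # (S (NP the/DT little/JJ cat/NN) sat/VBD on/IN the/DT mat/NN)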

33
Chinking
  • A chink is a subsequence of the text that is not
    a chunk.
  • Define a regular expression that matches the
    sequences of tags in a chink
  • A simple chink regexp for finding NP chunks
  • (<VB.?>|<IN>)+
  • First apply a chunk rule to chunk everything
  • Input
  • the/DT little/JJ cat/NN sat/VBD on/IN the/DT
    mat/NN
  • ChunkRule('<.*>*', 'Chunk everything')
  • [the/DT little/JJ cat/NN sat/VBD on/IN the/DT
    mat/NN]
  • Apply the chink rule above
  • [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT
    mat/NN]
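In current NLTK the chunk-everything-then-chink strategy fits in a
single RegexpParser grammar, with }...{ marking the chink (a minimal
sketch):

  import nltk

  grammar = r"""
    NP:
      {<.*>+}       # chunk everything...
      }<VBD|IN>+{   # ...then chink sequences of VBD and IN
  """
  chunker = nltk.RegexpParser(grammar)

  sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
              ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
  print(chunker.parse(sentence))
  # (S (NP the/DT little/JJ cat/NN) sat/VBD on/IN (NP the/DT mat/NN))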

34
Merging
  • Combine adjacent chunks into a single chunk
  • Define a regular expression that matches the
    sequences of tags on both sides of the point to
    be merged
  • Example
  • Merge a chunk ending in JJ with a chunk starting
    with NN
  • MergeRule('<JJ>', '<NN>', 'Merge adjs and
    nouns')
  • [the/DT little/JJ] [cat/NN] sat/VBD on/IN
    the/DT mat/NN
  • [the/DT little/JJ cat/NN] sat/VBD on/IN the/DT
    mat/NN
  • Splitting is the opposite of merging
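A minimal sketch of MergeRule with nltk.chunk.regexp, deliberately
starting from fragmented chunks so there is something to merge (the
first two ChunkRule patterns are illustrative scaffolding, not part of
the slide's example):

  from nltk.chunk.regexp import ChunkRule, MergeRule, RegexpChunkParser

  rules = [
      ChunkRule("<DT><JJ>", "Chunk determiner + adjective"),
      ChunkRule("<NN>", "Chunk lone nouns"),
      MergeRule("<JJ>", "<NN>", "Merge adjs and nouns"),
  ]
  chunker = RegexpChunkParser(rules, chunk_label="NP")

  sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
              ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
  print(chunker.parse(sentence))
  # (S (NP the/DT little/JJ cat/NN) sat/VBD on/IN the/DT (NP mat/NN))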

35
Next Sessions
  • NLTK Exercises