1
A tutorial of Maximum Entropy Approach for NLP
  • Yang Yongsheng
  • 24.Aug.2001

2
Outline
  • Motivation Example
  • What's Maximum Entropy (ME)?
  • ME for a POS tagger
  • How to train an ME model
  • Testing the ME model
  • ME approach for other NLP applications

3
Motivation Example
  • For a POS tagger
  • go → VB, VBP, or NN?
  • We have a model p with the constraint
  • p(go,VB) + p(go,VBP) + p(go,NN) = 1
  • Suppose we observe one more constraint from the training data
  • p(go,VB) = 1/2
  • So what about p(go,VBP) and p(go,NN)?

4
Motivation Example (cont.)
  • There are infinitely many models that satisfy the above constraints.
  • The most uniform model satisfying them is
  • p(go,VB) = 1/2, p(go,VBP) = p(go,NN) = 1/4
  • Question: Can we always find a most uniform model subject to a set of constraints?

5
Two problems
  • What exactly is meant by "uniform", and how can one measure the uniformity of a model?
  • How does one find the most uniform model subject to a set of constraints like those we have described?

6
Conditional Entropy
  • A mathematical measure of the uniformity of a conditional distribution p(y|x) is provided by the conditional entropy:
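This is the standard conditional entropy from Berger et al.'s maximum entropy framework, with p̃(x) denoting the empirical distribution of contexts in the training data:

```latex
H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)
```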

7
Maximum Entropy (ME)
  • To select a model from a set C of allowed probability distributions, choose the model p* in C with maximum entropy H(p)
  • Principle of ME: model all that is known and assume nothing about that which is unknown
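Formally, the selected model is:

```latex
p^{*} = \operatorname*{arg\,max}_{p \in C} H(p)
```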

8
Train a Model for an ME tagger
9
Context and Context Predicate
  • Context (c) and context predicate (cp)
  • A history for a prediction
  • e.g. c = {t-1, t-2t-1, w0, w-2, w-1, w1, w2}
  • A context predicate (cp) denotes one element of the context (c)
  • Example: a context for the word "board"
  • c = {t-1=DT, t-2t-1=VBZ,DT, w0=board, w-1=the, w-2=increases, w1=to, w2=seven}
  • cp1 = (t-1=DT), cp2 = (t-2t-1=VBZ,DT), …
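A minimal Python sketch of assembling such a context (the function and padding names are illustrative, not from the deck):

```python
BOUNDARY = "<s>"  # padding value for positions outside the sentence

def extract_context(words, tags, i):
    """Build the context c for position i from the surrounding words
    and the two previously assigned tags."""
    def w(j):
        return words[j] if 0 <= j < len(words) else BOUNDARY
    def t(j):
        return tags[j] if 0 <= j < len(tags) else BOUNDARY
    return {
        "t-1": t(i - 1),
        "t-2t-1": t(i - 2) + "," + t(i - 1),
        "w0": w(i),
        "w-1": w(i - 1),
        "w-2": w(i - 2),
        "w1": w(i + 1),
        "w2": w(i + 2),
    }

# The slide's example: "... increases the board to seven"
words = ["increases", "the", "board", "to", "seven"]
tags = ["VBZ", "DT"]  # tags assigned so far
print(extract_context(words, tags, 2))
# {'t-1': 'DT', 't-2t-1': 'VBZ,DT', 'w0': 'board', ...}
```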

10
Event
  • Event (c,t)
  • In an ME POS tagger, an event is generated from each word in the training data
  • An event = a prediction (t) + a context (c)
  • Example: the event for the word "board" with tag NN
  • Event (c,t): pred=NN with context {t-1=DT, t-2t-1=VBZ,DT, w0=board, w-1=the, w-2=increases, w1=to, w2=seven}

11
Feature Candidate and Feature Set
  • Feature Candidate (cp,t)
  • A feature candidate (cp,t) = a context predicate (cp) + a prediction (t)
  • e.g. (cp,t): t-1=DT and pred=NN
  • Feature Selection and Feature Set (F)
  • Feature Set (F): the feature candidates that occur more than N (= 10) times in the training data

12
Feature Function
  • Feature function
  • A binary-valued function
  • Given a context (c) and a prediction (t): if the feature candidate (cp,t) can be found in the feature set (F), return 1; otherwise return 0
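A sketch of this binary feature function in Python. Representing each selected feature as a (context-predicate key, value, tag) triple is an assumption for illustration; the deck does not specify a representation:

```python
def make_feature_function(feature_set):
    """Return a function mapping (c, t) to the indices j with
    f_j(c, t) = 1; all other features are implicitly 0."""
    def active(c, t):
        return [j for j, (cp_key, cp_val, tag) in enumerate(feature_set)
                if tag == t and c.get(cp_key) == cp_val]
    return active

F = [("t-1", "DT", "NN"), ("w0", "board", "NN")]
active = make_feature_function(F)
print(active({"t-1": "DT", "w0": "board"}, "NN"))  # [0, 1] -> both fire
```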

13
Probability Model for MaxEnt
  • The probability model is defined over C x T, where C is the set of contexts and T is the tag set:
  • p(c,t) = π · μ · α1^f1(c,t) · … · αk^fk(c,t)
  • where π is a normalization constant, μ, α1, …, αk are positive model parameters, f1, …, fk is the feature set, and each parameter αj corresponds to a feature fj
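A sketch of the conditional distribution p(t|c) this model induces: when conditioning on c, the constants π and μ cancel, leaving a product of αj over the active features, renormalized over the tag set (active is the feature function from the previous sketch; names are illustrative):

```python
def p_tag_given_context(alphas, active, tag_set, c):
    """p(t|c) = prod_j alpha_j^f_j(c,t) / sum_t' prod_j alpha_j^f_j(c,t')."""
    scores = {}
    for t in tag_set:
        prod = 1.0
        for j in active(c, t):  # indices with f_j(c, t) = 1
            prod *= alphas[j]
        scores[t] = prod
    z = sum(scores.values())    # normalize over the tag set
    return {t: s / z for t, s in scores.items()}
```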

14
Compute Parameters for Model
  • With the contextual information (c) and the events (c,t), we can generate a feature set (F) by doing feature selection
  • The goal of training:
  • Find values for the parameters α1, …, αk, where each parameter αj (1 ≤ j ≤ k) corresponds to exactly one feature fj in the feature set (F)

15
GIS Algorithm
  • GIS (Generalized Iterative Scaling) is the algorithm used to find values for the parameters of the maximum entropy model p*
  • The GIS procedure requires that the feature values sum to a constant for every event (c,t) in the training data: Σj fj(c,t) = C

16
GIS Algorithm(Cont.)
  • Since no such constant exists in our training data, we set C = max over all events (c,t) of Σj fj(c,t)
  • and add a correction feature fk+1(c,t) = C − Σj fj(c,t) for every event (c,t) in the training set

17
GIS Algorithm(Cont.)
  • The following procedure will converge to p*:
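Assuming the standard GIS formulation, each iteration scales every parameter by the ratio of the feature's observed (empirical) expectation to its expectation under the current model:

```latex
\alpha_j^{(n+1)} = \alpha_j^{(n)}
  \left( \frac{E_{\tilde{p}}\, f_j}{E_{p^{(n)}}\, f_j} \right)^{1/C}
```

A sketch of one such iteration in Python, reusing p_tag_given_context from the earlier sketch (gis_step and events are illustrative names):

```python
def gis_step(alphas, active, tag_set, events, C):
    """One GIS update. events is a list of (context, gold_tag) pairs."""
    k = len(alphas)
    observed = [0.0] * k  # empirical expectation of each feature
    expected = [0.0] * k  # expectation under the current model
    for c, gold in events:
        for j in active(c, gold):
            observed[j] += 1.0
        dist = p_tag_given_context(alphas, active, tag_set, c)
        for t, pt in dist.items():
            for j in active(c, t):
                expected[j] += pt
    # scale each parameter; leave it unchanged if its feature never fires
    return [a * (o / e) ** (1.0 / C) if e > 0 else a
            for a, o, e in zip(alphas, observed, expected)]
```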

18
Test the Model
  • The test corpus is tagged one sentence at a time
  • The testing procedure requires a search to list
    the candidate tag sequences for the sentence
  • The tag sequence with the highest probability is
    chosen as the answer

19
Search Algorithm
  • The search algorithm is a top-K breadth-first search (BFS)
  • Given a sentence w1, …, wn, a candidate tag sequence t1, …, tn has conditional probability
  • p(t1 … tn | w1 … wn) = Π(i=1..n) p(ti | ci), where ci is the context of the i-th word
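A sketch of this top-K search in Python, reusing extract_context and p_tag_given_context from the earlier sketches (beam_search_tags is an illustrative name):

```python
def beam_search_tags(words, alphas, active, tag_set, K=3):
    """Tag a sentence, keeping the K most probable partial tag
    sequences at each word position."""
    beams = [([], 1.0)]  # (tag sequence so far, its probability)
    for i in range(len(words)):
        expanded = []
        for tags, prob in beams:
            c = extract_context(words, tags, i)
            dist = p_tag_given_context(alphas, active, tag_set, c)
            for t, pt in dist.items():
                expanded.append((tags + [t], prob * pt))
        # prune to the K best partial sequences
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:K]
    best_tags, best_prob = beams[0]
    return best_tags
```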

20
Data flow of ME tagger
21
An example of search algorithm
  • A search procedure with 5 words and K = 3

22
Applying ME model to other NLP applications
  • Any classification problem with contextual information can be solved with an ME model
  • Most NLP problems can be treated as classification problems
  • Chinese segmentation, POS tagging, phrase chunking, parsing, machine translation, …

23
The key for applying ME
  • Define a set of context templates
  • Do a good feature selection
  • That's it. We can use a single ME toolkit to do the training and classification jobs.