1
A tutorial of Maximum Entropy Approach for NLP
  • Yang Yongsheng
  • 24.Aug.2001

2
Outline
  • Motivation Example
  • What's Maximum Entropy (ME)?
  • ME for a POS tagger
  • How to train an ME model
  • Testing the ME model
  • ME approach for other NLP applications

3
Motivation Example
  • For a POS tagger
  • go → VB, VBP, or NN?
  • We have a model p with the constraint
  • p(go,VB) + p(go,VBP) + p(go,NN) = 1
  • Suppose we observe one more constraint from the training data
  • p(go,VB) = 1/2
  • So what about p(go,VBP) and p(go,NN)?

4
Motivation Example (cont.)
  • There are infinitely many models that satisfy the above constraints.
  • The most uniform model satisfying them is
  • p(go,VB) = 1/2, p(go,VBP) = p(go,NN) = 1/4
  • Question: Can we always find a most uniform model subject to a set of constraints?

5
Two problems
  • What exactly is meant by "uniform", and how can one measure the uniformity of a model?
  • How does one find the most uniform model subject to a set of constraints like those we have described?

6
Conditional Entropy
  • A mathematical measure of the uniformity of a conditional distribution p(y|x) is provided by the conditional entropy:
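This is the standard conditional entropy from Berger et al.'s maximum entropy framework, with p̃(x) denoting the empirical distribution of contexts in the training data:

```latex
H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)
```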

7
Maximum Entropy (ME)
  • To select a model from a set C of allowed probability distributions, choose the model p* in C with maximum entropy H(p)
  • Principle of ME: model all that is known and assume nothing about that which is unknown
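Formally, the selected model is:

```latex
p^{*} = \operatorname*{arg\,max}_{p \in C} H(p)
```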

8
Train a Model for an ME tagger
9
Context and Context Predicate
  • Context (c) and context predicate (cp)
  • A history for a prediction
  • e.g. c = {t-1, t-2t-1, w0, w-2, w-1, w1, w2}
  • A context predicate (cp) denotes one element of the context (c)
  • Example: a context for the word "board"
  • c = {t-1=DT, t-2t-1=VBZ,DT, w0=board, w-1=the, w-2=increases, w1=to, w2=seven}
  • cp1 = (t-1=DT), cp2 = (t-2t-1=VBZ,DT), …
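A minimal Python sketch of assembling such a context (the function and padding names are illustrative, not from the deck):

```python
BOUNDARY = "<s>"  # padding value for positions outside the sentence

def extract_context(words, tags, i):
    """Build the context c for position i from the surrounding words
    and the two previously assigned tags."""
    def w(j):
        return words[j] if 0 <= j < len(words) else BOUNDARY
    def t(j):
        return tags[j] if 0 <= j < len(tags) else BOUNDARY
    return {
        "t-1": t(i - 1),
        "t-2t-1": t(i - 2) + "," + t(i - 1),
        "w0": w(i),
        "w-1": w(i - 1),
        "w-2": w(i - 2),
        "w1": w(i + 1),
        "w2": w(i + 2),
    }

# The slide's example: "... increases the board to seven"
words = ["increases", "the", "board", "to", "seven"]
tags = ["VBZ", "DT"]  # tags assigned so far
print(extract_context(words, tags, 2))
# {'t-1': 'DT', 't-2t-1': 'VBZ,DT', 'w0': 'board', ...}
```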

10
Event
  • Event (c,t)
  • In an ME POS tagger, an event is generated from each word in the training data
  • An event = a prediction (t) + a context (c)
  • Example: the event for the word "board" with tag NN
  • Event (c,t): pred=NN with context {t-1=DT, t-2t-1=VBZ,DT, w0=board, w-1=the, w-2=increases, w1=to, w2=seven}

11
Feature Candidate and Feature Set
  • Feature Candidate (cp,t)
  • A feature candidate (cp,t) = a context predicate (cp) + a prediction (t)
  • e.g. (cp,t): t-1=DT and pred=NN
  • Feature Selection and Feature Set (F)
  • Feature Set (F): the feature candidates that occur more than N (= 10) times in the training data

12
Feature Function
  • Feature function
  • A binary-valued function
  • Given a context (c) and a prediction (t): if the feature candidate (cp,t) can be found in the feature set (F), return 1; otherwise return 0
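A sketch of this binary feature function in Python. Representing each selected feature as a (context-predicate key, value, tag) triple is an assumption for illustration; the deck does not specify a representation:

```python
def make_feature_function(feature_set):
    """Return a function mapping (c, t) to the indices j with
    f_j(c, t) = 1; all other features are implicitly 0."""
    def active(c, t):
        return [j for j, (cp_key, cp_val, tag) in enumerate(feature_set)
                if tag == t and c.get(cp_key) == cp_val]
    return active

F = [("t-1", "DT", "NN"), ("w0", "board", "NN")]
active = make_feature_function(F)
print(active({"t-1": "DT", "w0": "board"}, "NN"))  # [0, 1] -> both fire
```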

13
Probability Model for MaxEnt
  • The probability model is defined over C x T, where C is the set of contexts and T is the tag set:
  • p(c,t) = π · μ · α1^f1(c,t) · … · αk^fk(c,t)
  • where π is a normalization constant, μ, α1, …, αk are positive model parameters, f1, …, fk is the feature set, and each parameter αj corresponds to a feature fj
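A sketch of the conditional distribution p(t|c) this model induces: when conditioning on c, the constants π and μ cancel, leaving a product of αj over the active features, renormalized over the tag set (active is the feature function from the previous sketch; names are illustrative):

```python
def p_tag_given_context(alphas, active, tag_set, c):
    """p(t|c) = prod_j alpha_j^f_j(c,t) / sum_t' prod_j alpha_j^f_j(c,t')."""
    scores = {}
    for t in tag_set:
        prod = 1.0
        for j in active(c, t):  # indices with f_j(c, t) = 1
            prod *= alphas[j]
        scores[t] = prod
    z = sum(scores.values())    # normalize over the tag set
    return {t: s / z for t, s in scores.items()}
```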

14
Compute Parameters for Model
  • With the contextual information (c) and the events (c,t), we can generate a feature set (F) by doing feature selection
  • The goal of training:
  • Find values for the parameters α1, …, αk, where each parameter αj (1 ≤ j ≤ k) corresponds to exactly one feature fj in the feature set (F)

15
GIS Algorithm
  • GIS (Generalized Iterative Scaling) is the algorithm used to find values for the parameters of the maximum entropy model p*
  • The GIS procedure requires that the feature values sum to a constant for every event (c,t) in the training data: Σj fj(c,t) = C

16
GIS Algorithm(Cont.)
  • Since no such constant exists in our training data, we set C = max over all events (c,t) of Σj fj(c,t)
  • and add a correction feature fk+1(c,t) = C − Σj fj(c,t) for every event (c,t) in the training set

17
GIS Algorithm(Cont.)
  • The following procedure will converge to p*:
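Assuming the standard GIS formulation, each iteration scales every parameter by the ratio of the feature's observed (empirical) expectation to its expectation under the current model:

```latex
\alpha_j^{(n+1)} = \alpha_j^{(n)}
  \left( \frac{E_{\tilde{p}}\, f_j}{E_{p^{(n)}}\, f_j} \right)^{1/C}
```

A sketch of one such iteration in Python, reusing p_tag_given_context from the earlier sketch (gis_step and events are illustrative names):

```python
def gis_step(alphas, active, tag_set, events, C):
    """One GIS update. events is a list of (context, gold_tag) pairs."""
    k = len(alphas)
    observed = [0.0] * k  # empirical expectation of each feature
    expected = [0.0] * k  # expectation under the current model
    for c, gold in events:
        for j in active(c, gold):
            observed[j] += 1.0
        dist = p_tag_given_context(alphas, active, tag_set, c)
        for t, pt in dist.items():
            for j in active(c, t):
                expected[j] += pt
    # scale each parameter; leave it unchanged if its feature never fires
    return [a * (o / e) ** (1.0 / C) if e > 0 else a
            for a, o, e in zip(alphas, observed, expected)]
```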

18
Test the Model
  • The test corpus is tagged one sentence at a time
  • The testing procedure requires a search to list
    the candidate tag sequences for the sentence
  • The tag sequence with the highest probability is
    chosen as the answer

19
Search Algorithm
  • The search algorithm is a top-K breadth-first search (BFS)
  • Given a sentence w1, …, wn, a candidate tag sequence t1, …, tn has conditional probability
  • p(t1 … tn | w1 … wn) = Π(i=1..n) p(ti | ci), where ci is the context of the i-th word
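A sketch of this top-K search in Python, reusing extract_context and p_tag_given_context from the earlier sketches (beam_search_tags is an illustrative name):

```python
def beam_search_tags(words, alphas, active, tag_set, K=3):
    """Tag a sentence, keeping the K most probable partial tag
    sequences at each word position."""
    beams = [([], 1.0)]  # (tag sequence so far, its probability)
    for i in range(len(words)):
        expanded = []
        for tags, prob in beams:
            c = extract_context(words, tags, i)
            dist = p_tag_given_context(alphas, active, tag_set, c)
            for t, pt in dist.items():
                expanded.append((tags + [t], prob * pt))
        # prune to the K best partial sequences
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:K]
    best_tags, best_prob = beams[0]
    return best_tags
```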

20
Data flow of ME tagger
21
An example of search algorithm
  • A search procedure with 5 words and K = 3

22
Applying ME model to other NLP applications
  • Any classification problem with contextual information can be solved with an ME model
  • Most NLP problems can be treated as classification problems
  • Chinese segmentation, POS tagging, phrase chunking, parsing, machine translation, …

23
The key for applying ME
  • Define a set of context templates
  • Do a good feature selection
  • That's it. We can use a single ME toolkit to do the training and classification jobs.