Transcript and Presenter's Notes

Title: Maximum Entropy Model


1
Maximum Entropy Model and Its Application in NLP
  • Luo Weihua
  • Software Division, ICT
  • 2002-12-26

2
Main content
  • An overview of ME philosophy
  • Mathematical structure of MEM
  • Algorithm for parameter estimation
  • Method for feature selection
  • Examples of the application of ME in NLP

3
An overview of ME philosophy
  • Essential tasks of statistical modelling:
  – Feature selection: e.g., in a sequence of words uttered by a speaker,
    informative statistics include the average number of words per
    sentence, the percentage of nouns, and the number of times each
    different word is used
  – Model selection: predicting the future output of the process

4
An overview of ME philosophy
  • A simple example: a translation decision
  • Constraints:
  • p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
  • p(dans) = 1?
  • p(dans) = 0.5 and p(en) = 0.5?
  • p(dans) = p(en) = p(à) = p(au cours de) = p(pendant) = 1/5?

(Figure: the English word "in" and its five candidate French translations: dans, en, à, au cours de, pendant)
5
An overview of ME philosophy
  • Update 1:
  • p(dans) + p(en) = 3/10
  • p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
  • ⇒ p(dans) = p(en) = 3/20,
    p(à) = p(au cours de) = p(pendant) = 7/30
  • Update 2:
  • p(dans) + p(à) = 1/2
  • p(dans) + p(en) = 3/10
  • p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
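
After Update 2 the most uniform distribution satisfying all three constraints is no longer obvious by inspection; finding it is exactly what the maximum entropy machinery provides. A numerical check of the idea, assuming NumPy and SciPy are available (constraint values taken from the slide):

  import numpy as np
  from scipy.optimize import minimize

  words = ["dans", "en", "à", "au cours de", "pendant"]
  constraints = [
      {"type": "eq", "fun": lambda p: p.sum() - 1.0},       # total probability
      {"type": "eq", "fun": lambda p: p[0] + p[1] - 3/10},  # p(dans) + p(en)
      {"type": "eq", "fun": lambda p: p[0] + p[2] - 1/2},   # p(dans) + p(à)
  ]

  def neg_entropy(p):
      return float(np.sum(p * np.log(p + 1e-12)))  # minimize -H(p)

  res = minimize(neg_entropy, np.full(5, 0.2),
                 bounds=[(0.0, 1.0)] * 5, constraints=constraints)
  print(dict(zip(words, res.x.round(3))))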

6
An overview of ME philosophy
  • Occam's Razor:
  • "Nunquam ponenda est pluralitas sine necessitate" (plurality should
    never be posited without necessity)
  • Laplace's Principle of Insufficient Reason: when one has no
    information to distinguish between the probabilities of two events,
    the best strategy is to consider them equally likely
  • E.T. Jaynes's Principle of Maximum Entropy

7
Maximum Entropy Modelling
  • Contextual information and its representation
  • Feature Function
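
A feature here is a binary-valued function of the context x and the output y. A minimal illustration in Python, in the spirit of the "in"-translation example (the event layout with a next_word field is hypothetical):

  def f_april(x, y):
      # Fires when "in" is translated as "en" and "April" follows it.
      return 1 if y == "en" and x.get("next_word") == "April" else 0

  x = {"word": "in", "next_word": "April"}   # a context
  print(f_april(x, "en"))                    # -> 1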

8
Maximum Entropy Modelling
  • Constraint equation / constraint
  • Simplification:
  • p(y, x) ≈ p(y|x) · p̃(x)

9
Maximum Entropy Modelling
  • Question: the collection of candidate features is vast?
  • Solution: separate the wheat from the chaff
  • Note: feature vs. constraint
  – feature: a binary-valued function of (y, x)
  – constraint: an equation that imposes equality between the expected
    value of the feature function in the model and its expected value in
    the data
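
The two expectations that a constraint equates, sketched in Python (p_model is a hypothetical conditional model p(y|x); events is the observed (x, y) sample):

  def empirical_expectation(events, f):
      # E_p~[f]: average value of f over the observed (x, y) pairs.
      return sum(f(x, y) for x, y in events) / len(events)

  def model_expectation(events, p_model, f, labels):
      # E_p[f] = sum_x p~(x) sum_y p(y|x) f(x, y); duplicates in the
      # sample implement the empirical weighting p~(x).
      xs = [x for x, _ in events]
      return sum(p_model(y, x) * f(x, y)
                 for x in xs for y in labels) / len(xs)

  # The constraint on f requires:
  # model_expectation(...) == empirical_expectation(...)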

10
The maximum entropy principle
  • Given
  • Task: devise a model based on the constraints

11
The maximum entropy principle
  • What does "uniform" mean?
  • → Shannon entropy
  • Principle of ME:
  • To select a model p from a set C of allowed probability
    distributions, choose the distribution p_ME ∈ C with maximum
    entropy H(p):
  • p_ME = argmax_{p ∈ C} H(p)
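
Written out, the conditional entropy being maximized (the standard form in this framework) is:

  H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)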

12
Parametric Form
  • Preliminary:
  – Definition 1
  – Definition 2
  • Lagrange multiplier analysis
  • Suppose λ* maximizes Ψ(λ); then p_λ*(y|x) is the model in C that
    maximizes H(p)
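
The two definitions referenced above appear as images in the original slides; as a reconstruction following the standard development (Berger et al.), Definition 1 is the log-linear family with normalizer Z, and Definition 2 is the dual objective Ψ(λ):

  p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)} \exp\Big(\sum_i \lambda_i f_i(x,y)\Big),
  \qquad Z_\lambda(x) = \sum_y \exp\Big(\sum_i \lambda_i f_i(x,y)\Big)

  \Psi(\lambda) = -\sum_x \tilde{p}(x) \log Z_\lambda(x) + \sum_i \lambda_i\, \tilde{p}(f_i)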

13
Parametric Form
  • Relation to maximum likelihood
  • Re-interpretation in terms of log-likelihood
  • The model p* ∈ C with maximum entropy is the model in the
    parametric family p_λ(y|x) that maximizes the likelihood of the
    sample
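
The log-likelihood in question, for reference:

  L_{\tilde{p}}(p) \equiv \log \prod_{x,y} p(y \mid x)^{\tilde{p}(x,y)}
                 = \sum_{x,y} \tilde{p}(x,y) \log p(y \mid x)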

14
Parameters Estimation
  • Improved Iterative Scaling algorithm
  • (1) Start with λi = 1 for all i ∈ {1, …, n}
  • (2) Repeat step (3) until all λi values have converged to within a
    preset threshold
  • (3) For each i ∈ {1, 2, …, n} do the following:
    (a) let Δλi be the solution to the IIS update equation (see the
        sketch below)
    (b) update the value of λi according to λi ← λi + Δλi
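
The slide's update equation is an image; in the standard IIS formulation it reads Σ_{x,y} p̃(x) p_λ(y|x) f_i(x, y) exp(Δλ_i f#(x, y)) = E_p̃[f_i], with f#(x, y) = Σ_i f_i(x, y). A minimal Python sketch solving it by bisection, assuming binary features and a small label set (events is the training sample; not the authors' code):

  import math

  def iis_pass(lam, feats, events, labels, tol=1e-6):
      # One sweep of step (3): update each weight lam[i] in turn.
      # feats: binary feature functions f_i(x, y); events: (x, y) pairs.
      n = len(events)
      xs = [x for x, _ in events]

      def p_model(y, x):
          # Current conditional model p_lambda(y | x).
          scores = {yy: math.exp(sum(l * f(x, yy) for l, f in zip(lam, feats)))
                    for yy in labels}
          return scores[y] / sum(scores.values())

      def f_sharp(x, y):
          return sum(f(x, y) for f in feats)  # total active features

      for i, f in enumerate(feats):
          emp = sum(f(x, y) for x, y in events) / n  # E_p~[f_i]
          if emp == 0:
              continue  # feature never fires in the sample; skip it
          lo, hi = -10.0, 10.0  # bisection brackets for delta = Δλ_i
          while hi - lo > tol:
              mid = (lo + hi) / 2
              # Left-hand side of the update equation at delta = mid.
              val = sum(p_model(y, x) * f(x, y) * math.exp(mid * f_sharp(x, y))
                        for x in xs for y in labels) / n
              if val < emp:
                  lo = mid
              else:
                  hi = mid
          lam[i] += (lo + hi) / 2  # step (b)
      return lam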

15
Feature Selection
  • A universe of candidate constraints for the translation of "in"
  • An example:
  • in → rhinocéros

16
Feature Selection
  • Two objectives in feature selection:
  – sufficiently rich to capture as much information about the random
    process as possible
  – robust enough to generalize to new data
  • Two categories of constraints:
  – those likely to occur in other samples of output from the same
    process
  – those that are just random fluctuations of the data

17
Feature Selection
  • An intuitive solution: select the constraints corresponding to
    features observed many times in the empirical data
  • Its danger(!):
  • Ep̃ f1 = 1000/2000 = 1/2
  • Ep̃ f2 = 1001/2000

18
Feature Selection
  • Incremental feature selection algorithm
  • (1) Begin with S empty
  • (2) Compute the maximum entropy model p_S(y|x) that is consistent
    with all the constraints in S
  • (3) For each candidate constraint f not in S, evaluate the gain G(f)
    of adding f to S as follows:
    (a) compute the maximum entropy model p_{S∪f} that is consistent
        with f and the constraints already in S
    (b) let G(f) be the increase in conditional log-likelihood:
        G(f) = L(p_{S∪f}) − L(p_S)

19
Feature Selection
  • (4) Select the constraint f with the largest gain. If this gain is
    statistically significant, then add f to S and go to step (2).
    Otherwise, stop.
  • Efficiently computing gains:
  – Bottleneck: step 3(a) is an exorbitant albeit exact computation
  – Restriction: in general, all parameter values change with the
    imposition of a new constraint
  – Assumption: the addition of a new feature does not affect the
    optimal values of the parameters associated with the other features
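
Steps (1)-(4) as a Python sketch, with the step-3(a) bottleneck visible as the refit inside the loop; train_model (fits the ME model consistent with a constraint set) and log_likelihood are hypothetical helpers, and min_gain stands in for the significance test:

  def select_features(candidates, train_model, log_likelihood, min_gain=1e-3):
      S = []                                  # step (1): begin with S empty
      p_S = train_model(S)                    # step (2): fit p_S
      while True:
          gains = {}
          for f in candidates:
              if f in S:
                  continue
              p_Sf = train_model(S + [f])     # step 3(a): exact but expensive
              gains[f] = log_likelihood(p_Sf) - log_likelihood(p_S)  # 3(b)
          if not gains:
              return S, p_S
          best = max(gains, key=gains.get)    # step (4): largest gain
          if gains[best] < min_gain:          # significance proxy
              return S, p_S
          S.append(best)
          p_S = train_model(S)                # back to step (2)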

20
Feature Selection
  • Approximation: hold the weights of the features already in S fixed
    and adjust only the weight α of the candidate feature f, where
  • α_max = argmax_α L(p^α_{S∪f})
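
Spelled out (a reconstruction of the slide's image content, following Berger et al.): with the old weights held fixed and only α varied,

  G_{S \cup f}(\alpha) \equiv L\big(p_{S \cup f}^{\alpha}\big) - L(p_S), \qquad
  \widetilde{G}(f) \equiv \max_{\alpha} G_{S \cup f}(\alpha) = G_{S \cup f}(\alpha_{\max})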

21
Case studies
  • Word Translation
  • Segmentation
  • Word Reordering

22
Word Translation
  • Background
  • Consider
  • Viterbi alignment

(Diagram: E → F translation; unwieldy?)
23
Word Translation
  • Estimation of:
  • p(n|e), the probability that the English word e generates n French
    words
  • p(f|e), the probability that the English word e generates the
    French word f
  • distortion probabilities d(F, A|E), which model how the words are
    reordered in the French sentence after they are generated
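
Schematically, these parameters compose the standard alignment-model translation probability (a reconstruction; the slide's own formulas are images):

  P(F \mid E) = \sum_{A} P(F, A \mid E)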

24
Word Translation
  • Brown model: EM algorithm

25
Word Translation
  • Major shortcoming of the Brown model: the probabilities are
    context-free

26
Word Translation
  • Modelling p(f|e) with MEM
  • Improvement
  • Form of the features

27
Word Translation
  • feature templates
  • Training events
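
The template and training-event images are not reproduced here; a hypothetical instance of such a template in Python (the nearby_words field and the choice of trigger word are illustrative, not from the slides):

  def make_template_feature(french_word, trigger_word):
      # Fires when y is `french_word` and `trigger_word` occurs in the
      # English context window around "in".
      def f(x, y):
          return 1 if y == french_word and trigger_word in x["nearby_words"] else 0
      return f

  f1 = make_template_feature("pendant", "weeks")  # e.g. "... in the weeks ..."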

28
Word Translation
  • Consider

(Figure: a Template 1 feature)
29
Word Translation
  • Construction of a ME model p_in(f|x)
  • Sample: 10,000 instances from the Hansard corpus

30
Word Translation
  • ME model to predict the French translation of "to run"

31
Word Translation
  • Resulting improved French-to-English translations

32
Segmentation
  • Definition:
  • rift: a position i in a French sentence such that for all j < i,
    a_j ≤ a_i, and for all j > i, a_j ≥ a_i
  • Task:
  • r_i = 1 if and only if a rift belongs at f_i
  • Modelling p(r_i|x), denoted by p(rift_i|x)
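
A direct encoding of the rift definition, assuming a[j] gives the English position aligned to French word j:

  def is_rift(i, a):
      # Position i is a rift iff a_j <= a_i for all j < i
      # and a_j >= a_i for all j > i.
      return (all(a[j] <= a[i] for j in range(i)) and
              all(a[j] >= a[i] for j in range(i + 1, len(a))))

  a = [1, 2, 2, 4, 3, 5]                               # a toy alignment
  print([i for i in range(len(a)) if is_rift(i, a)])   # -> [0, 1, 2, 5]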

33
Segmentation
  • Conditioning information x
  • The problem of overfitting
  – Phenomenon: beyond a certain point, the model p fits itself to
    quirks in the empirical data
  – Solution: cross-validation
  – empirical data = training data + held-out data
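
A minimal sketch of the held-out discipline (the split ratio is illustrative):

  def split(events, held_out_fraction=0.1):
      # Partition the empirical sample into training and held-out parts.
      k = int(len(events) * (1 - held_out_fraction))
      return events[:k], events[k:]

  # Grow the feature set on the training part; stop adding features
  # once log-likelihood on the held-out part stops improving.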

34
Segmentation
35
Segmentation
36
Word Reordering
  • noun de noun phrases and their English equivalents

37
Word Reordering
  • (y,x) pair
  • Sets used in noun de noun word reordering

38
Word Reordering
  • Examples of features

39
Word Reordering
40
Word Reordering
  • noun de noun model performance: simple approach vs. maximum entropy

41
Word Reordering
42
Thank you!