Minimum Error Rate Training in Statistical Machine Translation


Provided by: annatin

Learn more at: http://faculty.washington.edu

1
Minimum Error Rate Training in Statistical
Machine Translation

By Franz Och, 2003
Presented By Anna Tinnemore, 2006

2
GOAL

To directly optimize translation quality
WHY??
Standard training criteria do not correlate directly with
popular evaluation criteria
F-Measure (parsing)
Mean Average Precision (ranked retrieval)
BLEU, multi-reference word error rate
(statistical machine translation)

Problem: the statistical approach and the automatic
evaluation methods classify errors differently.
Solution (maybe): optimize model parameters
according to individual evaluation methods

4
Background

The standard decision rule (choose the most probable
translation) is optimal under a zero-one loss function
A different metric would have a different optimal
decision rule

5
Background, continued

Problem: finding suitable feature functions (M of them)
and parameter values (λ)
MMI (max mutual info)
One unique global optimum
Algorithms guaranteed to find it
Optimal translation quality?

6
So what?

Review automatic evaluation criteria
Two training criteria that might help
New training algorithm for optimizing an
unsmoothed error count
Och's approach
Evaluation of training criteria

7
Translation quality metrics

mWER (multi-reference word error rate)
Edit distance to the closest reference translation
mPER (multi-reference position-independent
error rate)
Like mWER, but each sentence is treated as a bag of words
(word order ignored)
BLEU
Geometric mean of n-gram precisions, with a penalty for
translations that are too short
NIST
Weighted precision of n-grams
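
As a rough illustration of the mWER error count, here is a minimal sketch in Python. It assumes sentences are already split into word lists; the function names edit_distance and mwer_errors are made up for this example and do not come from the paper or the slides.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))  # distance from empty hyp to ref[:j]
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to empty ref
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def mwer_errors(hyp, refs):
    """Error count for mWER: edit distance to the closest reference."""
    return min(edit_distance(hyp, ref) for ref in refs)
```

An error rate would then divide the summed error counts by a reference length; that normalization is left out here.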

8
Training

Minimize error rate
Problems
argmax operation (Eq. 6): no gradient, and no guarantee of finding the global optimum
Many local optima

9
Smoothed Error Count

This is easier to deal with than the last function,
but still tricky
Performance doesn't change much with smoothing
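
The formula for this slide did not survive in the transcript. The sketch below shows one common way to write a smoothed error count, assuming a log-linear model over an n-best list: each candidate's error is weighted by its model probability raised to a power alpha and renormalized. The data layout, the function name and the default alpha are assumptions for illustration only.

```python
import math


def smoothed_error_count(nbest, lambdas, alpha=3.0):
    """Expected error under a sharpened log-linear distribution.

    nbest:   one list per source sentence; each entry is a pair
             (feature_values, error_count) for one candidate translation.
    lambdas: model parameters, one per feature.
    alpha:   smoothing exponent (arbitrary default); a larger alpha puts
             more weight on the top-scoring candidate.
    """
    total = 0.0
    for candidates in nbest:
        # p(e|f)^alpha is proportional to exp(alpha * lambda . h(e, f))
        scores = [alpha * sum(l * h for l, h in zip(lambdas, feats))
                  for feats, _ in candidates]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        z = sum(weights)
        total += sum(w / z * err for w, (_, err) in zip(weights, candidates))
    return total
```

With alpha = 0 every candidate counts equally; as alpha grows, the value approaches the error of the single best-scoring candidate per sentence, which is the unsmoothed count.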

11
Unsmoothed Error Count

Standard: Powell's algorithm with grid-based line
optimization
Fine-grained grid: slow
Coarse grid: may miss the optimal solution
NEW: line optimization algorithm for the log-linear model
Guaranteed to find the optimal solution along each line
Much faster and more stable

12
New Algorithm

Each candidate translation in C corresponds to a
line in γ
(t and m are constant with respect to γ)
f(γ; f), the maximum over these lines, is piecewise linear

13
Algorithm: the nitty-gritty

For every source sentence f
Compute the ordered sequence of linear intervals that
make up f(γ; f)
Compute each change in error count between
intervals
Merge all sequences γ^f and ΔE^f
Traverse the sequence of boundaries while keeping
track of error count to find the optimal γ
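
The steps above can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: it assumes each candidate is given as a triple (t, m, err), where t + γ·m is its model score along the search line and err is its error count. The names crossing, upper_envelope and line_search are made up for this example.

```python
def crossing(a, b):
    """Value of gamma at which lines a and b (different slopes) intersect."""
    return (a[0] - b[0]) / (b[1] - a[1])


def upper_envelope(lines):
    """Lines that are best for some gamma, ordered by increasing slope."""
    best = {}
    for t, m, err in lines:
        # Among parallel lines only the highest one can ever win.
        if m not in best or t > best[m][0]:
            best[m] = (t, m, err)
    hull = []
    for line in sorted(best.values(), key=lambda l: l[1]):
        # Drop the last line if the new one overtakes it before it ever wins.
        while len(hull) >= 2 and \
                crossing(hull[-2], line) <= crossing(hull[-2], hull[-1]):
            hull.pop()
        hull.append(line)
    return hull


def line_search(sentences):
    """Return (gamma, error) minimizing the total error along the line."""
    error = 0   # total error as gamma goes to minus infinity
    events = []  # merged (boundary, change in error count) pairs
    for lines in sentences:
        hull = upper_envelope(lines)
        error += hull[0][2]
        for a, b in zip(hull, hull[1:]):
            events.append((crossing(a, b), b[2] - a[2]))
    events.sort()
    if not events:
        return 0.0, error
    best_gamma, best_error = events[0][0] - 1.0, error
    i = 0
    while i < len(events):
        gamma = events[i][0]
        while i < len(events) and events[i][0] == gamma:
            error += events[i][1]  # apply every change at this boundary
            i += 1
        if error < best_error:
            right = events[i][0] if i < len(events) else gamma + 2.0
            best_gamma, best_error = (gamma + right) / 2.0, error
    return best_gamma, best_error
```

The returned γ is the midpoint of the best interval (or one unit beyond the outermost boundary), which keeps the choice away from the boundaries where the winning candidate flips.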

14
Baseline

Same as alignment template approach
This log-linear model had M = 8 features
Extract n-best candidate translations from all
possible translations
Wait a minute . . .

15
N-best???

Overfitting? Unseen data?
First, compute n-best list using made-up
parameter values. Use this list to train model
for new parameters.
Second, use new parameters, do new search, make
new n-best list, append to old n-best list
Third, use new list to train model for even
better parameters

Keep going until the n-best list doesn't change
(all possible translations are in the list)
Each iteration generates approx. 200 additional
translations
The algorithm only takes 5-7 iterations to
converge
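
The loop described on these two slides can be sketched as below. This is a hedged outline only: decode_nbest and optimize are hypothetical callbacks standing in for the real search and the real parameter training, and max_iters is an arbitrary safety limit.

```python
def train_with_growing_nbest(decode_nbest, optimize, params, max_iters=20):
    """Alternate between decoding and training on an accumulated n-best pool.

    decode_nbest(params) -> iterable of hashable candidate translations
    optimize(pool, params) -> new parameter values trained on the pool
    Returns (params, pool, number of decoding rounds performed).
    """
    pool = set()
    for iteration in range(1, max_iters + 1):
        new = set(decode_nbest(params)) - pool
        if not new:
            # The search produced nothing unseen: the list no longer changes.
            return params, pool, iteration
        pool |= new                      # append to the old n-best list
        params = optimize(pool, params)  # retrain on the merged list
    return params, pool, max_iters
```

Because the pool only grows, parameters trained in a later round are always evaluated against every candidate seen so far, which is what guards against tuning to a single stale n-best list.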

17
Additional Sneaky Stuff

Problem with MMI (maximum mutual information):
Reference sentences have to be part of the n-best
list
Solution:
Fake reference sentences, of course
Select from the n-best list those sentences with
the fewest word errors with respect to the REAL
references, and call these pseudo-references
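
Picking pseudo-references amounts to a minimum over the n-best list. A minimal sketch, assuming some word-error function is supplied by the caller; select_pseudo_references and error_fn are names invented for this example.

```python
def select_pseudo_references(nbest, references, error_fn):
    """Return every n-best candidate with the fewest word errors.

    error_fn(candidate, references) -> number of word errors of the
    candidate with respect to the real reference translations.
    """
    scored = [(error_fn(cand, references), cand) for cand in nbest]
    fewest = min(err for err, _ in scored)
    # Keep all ties: several candidates may be equally close to the references.
    return [cand for err, cand in scored if err == fewest]
```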

18
Experiment

2002 TIDES Chinese-English small data track task
News text from Chinese to English
Note: no rule-based components were used to translate
numbers, dates, or names

19
Development Corpus Results
20
Test Corpus Results
21
Conclusions

Alternative training criteria which directly
relate to quality of translation
Unsmoothed and smoothed error count on
development corpus
Optimizing error rate in training yields better
results on unseen test data
Maybe true translation quality is also
increased
We don't know, because the evaluation metrics need
help

22
Future Questions

How many parameters can be reliably estimated
using differing criteria on development corpora
of various sizes?
Do the criteria used make a difference?
Which error rate criterion (smoothed or unsmoothed)
should be optimized in training?

23
Boasting

This approach applies to any evaluation technique
If the evaluation methods ever get better, this
algorithm will yield correspondingly better
results

24
Side-stepping

It's possible that this algorithm could be used
to overfit the evaluation method, giving
falsely inflated scores
It's not our problem. The developers of the
evaluation methods should design them so this can't
happen

25
. . . And Around The World

This algorithm has a place wherever evaluation
methods are used
It could yield improvements in these other areas
as well

26
Questions, observations, accolades . . .
27
My Observations

Improvements do not seem significant
This exposes a problem in the evaluation metrics,
but does nothing to solve it
Seems like a good idea, but has many unanswered
questions regarding optimal implementation
