Title: Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
1. Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
- Marine CARPUAT and Dekai WU
- Human Language Technology Center
- Department of Computer Science and Engineering
- HKUST
2. New resources for SMT: context-dependent phrasal translation lexicons
- A key new resource: Phrase Sense Disambiguation (PSD) for SMT (Carpuat & Wu 2007)
  - entirely automatically acquired
  - consistently improves 8 translation quality metrics (EMNLP 2007)
  - fully phrasal, just like conventional SMT lexicons (TMI 2007)
  - but much larger than conventional lexicons!
- Why is this extremely large resource necessary?
- Is its contribution observably useful?
- Is it used by the SMT system differently than conventional SMT lexicons?
3. Our finding: context-dependent lexicons directly improve lexical choice in SMT
- Exploit the available vocabulary better for phrasal segmentation
  - more and longer phrases are used in decoding
  - consistent with other findings (TMI 2007): fully phrasal context-dependent lexicons yield more reliable improvements than single-word lexicons
- Select better translation candidates
  - even after compensating for differences in phrasal segmentation
  - improvements in BLEU, TER, METEOR, etc. really reflect improved lexical choice
4. Problems with current SMT systems
- Input: [Chinese source sentence]
- Ref.: Prof. Zhang gave a lecture on China and India to a packed audience.
- SMT1: Prof. Zhang to a group of people on China and India class.
- SMT2: Prof. Zhang and a group of people go into class on China and India.
5. Translation lexicons in SMT are independent of context!
[Chinese source sentence]
Prof. Zhang gave a lecture on China and India to a packed audience.
[Chinese source sentence]
Everyone is welcome to attend class tomorrow, on the topic China and India.
6. Phrasal lexicons in SMT are independent of context too!
[Chinese source sentence]
Prof. Zhang gave a lecture on China and India to a packed audience.
[Chinese source sentence]
Everyone is welcome to attend class tomorrow, on the topic China and India.
7. Current SMT systems are hurt by very weak models of context
- Translation disambiguation models are too simplistic
  - phrasal lexicon translation probabilities are static, so not sensitive to context
- Context in the input language is modeled only weakly, by phrase segments
- Context in the output language is modeled only weakly, by n-grams
- Error analysis reveals many lexical choice errors
- Yet there have been few attempts at directly modeling context
8. Today's SMT systems ignore the contextual features that would help lexical choice
- No full sentential context: merely local n-gram context
- No POS information: merely the surface form of words
- No structural information: merely word n-gram identities
9. Correct translation disambiguation requires rich context features
[Chinese source sentence]
Prof. Zhang gave a lecture on China and India to a packed audience.
[Chinese source sentence]
Everyone is welcome to attend class tomorrow, on the topic China and India.
11. Today's SMT systems ignore context in their phrasal translation lexicons
- c_j(f): the entire input sentence context of each occurrence of input phrase f is ignored
12. But context-dependent lexical choice does not necessarily improve translation quality
- Early pilot study (Brown et al. 1991)
  - used the single most discriminative feature to disambiguate between 2 English translations of a French word
  - WSD improved French-English translation quality, but not on a significant vocabulary, and allowing only 2 senses
- Context-dependent lexical choice helps word alignment, but not really translation quality (García Varea et al. 2001, 2002)
  - a maximum-entropy-trained bilexicon replaces the IBM-4/5 translation probabilities
  - improves AER on the Canadian Hansards and Verbmobil tasks
  - small improvements in WER and PER by rescoring n-best lists, but not statistically significant (García Varea & Casacuberta 2005)
13. Context-dependent modeling improves quality of statistical MT (Carpuat & Wu 2007)
- Introduced context-dependent phrasal lexicons for SMT
  - leverage WSD techniques for SMT lexical choice
  - generalize conventional WSD to Phrase Sense Disambiguation
- Context-dependent modeling always improves SMT accuracy
  - on all tasks: 3 different IWSLT06 datasets, NIST04
  - on all 8 common automatic metrics: BLEU, NIST, METEOR, METEOR(synsets), TER, WER, PER, CDER
14. No other WSD-for-SMT approach improves translation quality as consistently
- Until recently, using WSD to improve SMT quality met with mixed or disappointing results
  - Carpuat & Wu ACL-2005; Cabezas & Resnik (unpublished)
- Last year, for the first time, different approaches showed that WSD can help translation quality
  - WSD improved BLEU (but how about other metrics?) on 3 Chinese-English tasks (Carpuat et al. IWSLT-2006)
  - WSD improved BLEU (but how about other metrics?) on the Chinese-English NIST task (Chan et al. ACL-2007)
  - WSD improved METEOR (but not BLEU!) on the Spanish-English Europarl task (Giménez & Màrquez WMT-2007)
  - phrasal WSD improved BLEU, NIST, METEOR (but how about error rates?) on the Italian-English and Chinese-English IWSLT tasks (Stroppa et al. TMI-2007)
- But no other approach improves on 8 metrics on 4 different tasks
15. But how useful are the context-dependent lexicons as resources?
- Improving translation quality is great, but
  - metrics aggregate the impact of many different factors
  - metrics ignore how translation hypotheses are generated
- Context-dependent lexicons are more expensive to train, so
  - are their contributions observably useful?
- Direct analysis needed: how do SMT systems use context-dependent vs. conventional lexicons?
16. Learning context-dependent vs. conventional lexicons for SMT
- Both lexicons are learned from the same word-aligned parallel data
- Both cover the same phrasal input vocabulary
- Both know the same phrasal translation candidates
- The only difference: an additional context-dependent parameter
  - dynamically computed vs. static conventional scores
  - WSD modeling vs. MLE in conventional lexicons
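The static-vs-dynamic contrast can be made concrete. The sketch below (with made-up phrase pairs, not the paper's data) shows the conventional side: a phrase table's p(e|f) estimated once by relative frequency (MLE), so every occurrence of a source phrase gets the same distribution regardless of its sentence. The context-dependent lexicon would instead rescore the same candidates per input sentence.

```python
from collections import Counter, defaultdict

def mle_phrase_table(phrase_pairs):
    """Conventional lexicon: static p(e|f) by relative frequency (MLE)
    over extracted phrase pairs, as in a Pharaoh/Moses phrase table."""
    joint = Counter(phrase_pairs)
    marginal = Counter(f for f, _ in phrase_pairs)
    table = defaultdict(dict)
    for (f, e), count in joint.items():
        table[f][e] = count / marginal[f]
    return table

# Toy phrase pairs (hypothetical pinyin/English): MLE collapses every
# occurrence into one context-independent distribution.
pairs = [("jiang ke", "gave a lecture"),
         ("jiang ke", "gave a lecture"),
         ("jiang ke", "attend class")]
table = mle_phrase_table(pairs)
# p(e|f): 2/3 "gave a lecture", 1/3 "attend class" -- for every sentence
```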
17. Word Sense Disambiguation provides appropriate models of context
- WSD has long targeted the questions of
  - how to design context features
  - how to combine contextual evidence into a sense prediction
- Senseval/SemEval have extensively evaluated WSD systems
  - with different feature sets
  - with different machine learning classifiers
- Senseval multilingual lexical sample tasks
  - use observable lexical translations as senses
  - just like lexical choice in SMT
  - e.g. Senseval-2003 English-Hindi, SemEval-2007 Chinese-English
18. Leveraging a Senseval WSD system
- Top Senseval-3 Chinese Lexical Sample system (Carpuat et al. 2004)
- Standard classification models
  - maximum entropy, SVM, boosted decision stumps, naïve Bayes
- Rich lexical and syntactic features
  - bag-of-words sentence context
  - position-sensitive co-occurring words and POS tags
  - basic syntactic dependency features
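A minimal sketch of the feature types listed above. The feature names and the ±2 window are illustrative assumptions, not the authors' exact configuration, and the syntactic dependency features are omitted here:

```python
def wsd_features(tokens, pos_tags, i):
    """Build a sparse feature dict for the target word at index i:
    bag-of-words sentence context plus position-sensitive neighboring
    words and their POS tags."""
    # bag-of-words context: every other token in the sentence
    feats = {f"bow={w}": 1 for j, w in enumerate(tokens) if j != i}
    # position-sensitive co-occurring words and POS tags (window of 2)
    for off in (-2, -1, 1, 2):
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"w[{off}]={tokens[j]}"] = 1
            feats[f"pos[{off}]={pos_tags[j]}"] = 1
    return feats

feats = wsd_features(["the", "lecture", "was", "packed"],
                     ["DT", "NN", "VBD", "VBN"], 1)
```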
19. Generalizing WSD to PSD for context-dependent phrasal translation lexicons
- One PSD model per input language phrase
  - regardless of POS, length, etc.
- A generalization of standard WSD models
  - sense candidates are the phrase translation candidates seen in training
- The sense candidates are extracted just like the conventional SMT phrasal lexicon
  - typically, output language phrases consistent with the intersection of bidirectional IBM alignments
23. Extracting PSD senses and training examples from word-aligned parallel text
? ?? ?? ?? ? ? ???? ? ? ?
is there a new - age music concert within the next few days ?
Extracted PSD training instances:
? ?? ?? ?? <t sense="within">?</t> ? ???? ? ? ?
? ?? ?? ?? ? ? <t sense="new - age music">????</t> ? ? ?
? <t sense="within the next few days">?? ?? ?? ?</t> ? ???? ? ? ?
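The extraction illustrated above can be sketched as follows. This is an illustrative simplification of the standard consistency check over word-alignment links, not the authors' actual code; every consistent source span yields one PSD instance whose sense label is the aligned target phrase and whose context is the whole source sentence:

```python
def extract_psd_instances(src, tgt, align, max_len=5):
    """Emit (source phrase, sense, context) triples from one sentence pair.
    `align` holds (src_idx, tgt_idx) word-alignment links. A source span
    [i, j] is kept only if its links fall inside one target span and no
    word of that target span aligns back outside [i, j]."""
    instances = []
    for i in range(len(src)):
        for j in range(i, min(i + max_len, len(src))):
            linked = [t for s, t in align if i <= s <= j]
            if not linked:
                continue
            lo, hi = min(linked), max(linked)
            # consistency: the target span must not align outside [i, j]
            if all(i <= s <= j for s, t in align if lo <= t <= hi):
                instances.append((" ".join(src[i:j + 1]),   # PSD target phrase
                                  " ".join(tgt[lo:hi + 1]), # its sense label
                                  tuple(src)))              # sentence context
    return instances

# Toy sentence pair with a monotone 1-to-1 alignment (illustrative only):
src = ["zui", "jin", "ji", "tian"]
tgt = ["the", "next", "few", "days"]
instances = extract_psd_instances(src, tgt, [(0, 0), (1, 1), (2, 2), (3, 3)])
```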
24. Integrating context-dependent lexicons into phrase-based SMT architectures
- The context-dependent phrasal lexicon probabilities
  - are conditional translation probabilities
  - can naturally be added as a feature in log-linear translation models
- Unlike conventional translation probabilities, they are
  - dynamically computed
  - dependent on full-sentence context
- Decoding can make full use of context-dependent phrasal lexicon predictions at all stages of decoding
  - unlike in n-best reranking
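The integration amounts to one extra term in the standard log-linear score. Feature names and values below are made up for illustration; the point is that the PSD feature is recomputed per input sentence while the phrase-table features stay fixed:

```python
import math

def loglinear_score(feature_values, weights):
    """Standard log-linear model score: weighted sum of log feature values."""
    return sum(weights[name] * math.log(v)
               for name, v in feature_values.items())

weights = {"p(e|f)": 1.0, "p(f|e)": 1.0, "lm": 1.0, "psd": 1.0}

# Static phrase-table features for one phrase pair (hypothetical values):
static = {"p(e|f)": 0.4, "p(f|e)": 0.3, "lm": 0.05}
base = loglinear_score(static, weights)

# The PSD feature is recomputed per input sentence, so the same phrase
# pair can score differently in different sentences:
with_psd = loglinear_score({**static, "psd": 0.7}, weights)
```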
25. Evaluating context-dependent phrasal translation lexicons
- Lexical choice only
  - vs. translation quality (Carpuat & Wu EMNLP 2007)
- Integrated evaluation in SMT
  - vs. stand-alone as in Senseval (Carpuat et al. 2004)
- Fully phrasal lexicons only
  - vs. single-word context-dependent lexicons (Carpuat & Wu TMI 2007)
- Translation task
  - test set: NIST-04 Chinese-English text translation
  - 1788 sentences, 4 reference translations
  - standard phrase-based SMT decoder (Moses)
26. Experimental setup: learning the lexicons
- Standard conventional lexicon learning
  - newswire Chinese-English corpus, 2M sentences
  - standard word-alignment methodology: GIZA++
  - intersection using grow-diag heuristics (Koehn et al. 2003)
  - standard Pharaoh/Moses phrase table: maximum phrase length 10; translation probabilities in both directions, lexical weights
- Context-dependent lexicons
  - use the exact same word-aligned parallel data
  - train a WSD model for each known phrase
27. Step 1: evaluating phrasal segmentation with context-dependent vs. conventional lexicons
- Goal: compare the phrasal segmentations of the input sentence used to produce the top hypothesis
- Method
  - we do not evaluate accuracy: there is no gold-standard phrasal segmentation!
  - instead, we analyze how the input phrases available in the lexicons are used
28. SMT uses longer input phrases with context-dependent lexicons
- Context-dependent lexicons help use longer, less ambiguous phrases
29. SMT uses more input phrase types with context-dependent lexicons
- 26% of the phrase types used with the context-dependent lexicon are not used with the conventional lexicon
- 96% of those lexicon entries are truly phrasal (not single words)
- Context-dependent lexicons make better use of the available input language vocabulary
30. SMT uses more rare phrases with context-dependent lexicons
- With context modeling, less training data is needed for phrases to be used
31. Step 2: comparing translation selection
- Goal: compare translation selection only
- Method
  - we compare the accuracy of translation selection for identical segments only, because different lexicons yield different phrasal segmentations
  - a translation is considered accurate if it matches any of the reference translations, because the input sentence and references are not word-aligned
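The matching criterion can be sketched as below. Substring matching against the references is a simplification of what any such protocol must do; the variable names are illustrative, not from the paper:

```python
def selection_accuracy(phrase_choices, references):
    """A phrase translation counts as correct if it occurs in any reference
    translation of the sentence -- a proxy made necessary because input and
    references are not word-aligned."""
    hits = sum(any(choice in ref for ref in refs)
               for choice, refs in zip(phrase_choices, references))
    return hits / len(phrase_choices)

# Two identically-segmented spans, each checked against the same
# hypothetical reference:
refs = ["prof. zhang gave a lecture on china and india to a packed audience ."]
acc = selection_accuracy(["gave a lecture", "go into class"], [refs, refs])
# only "gave a lecture" occurs in the reference, so accuracy is 0.5
```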
32. Context-dependent lexicon predictions match references better
- Context-dependent lexicons yield more matches than conventional lexicons
- 48% of the errors made with conventional lexicons are corrected with context-dependent lexicons

                         Conventional match   Conventional no match
Context-dep. match             1435                  2139
Context-dep. no match           683                  2272
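The 48% figure follows directly from the contingency table above: of the segments where the conventional lexicon fails to match any reference, count the fraction the context-dependent lexicon gets right.

```python
# Counts copied from the contingency table above (rows: context-dependent
# lexicon; columns: conventional lexicon).
both_match = 1435       # both lexicons match a reference
ctx_only_match = 2139   # context-dependent corrects a conventional error
conv_only_match = 683   # conventional matched, context-dependent did not
neither_match = 2272

conventional_errors = ctx_only_match + neither_match
corrected_fraction = ctx_only_match / conventional_errors
# 2139 / 4411, i.e. roughly 48% of conventional-lexicon errors corrected
```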
33. Conclusion: context-dependent phrasal translation lexicons are useful resources for SMT
- A key new resource: Phrase Sense Disambiguation (PSD) for SMT (Carpuat & Wu 2007)
  - entirely automatically acquired
  - consistently improves 8 translation quality metrics (EMNLP 2007)
  - fully phrasal, just like conventional SMT lexicons (TMI 2007)
  - but much larger than conventional lexicons!
- Why is this extremely large resource necessary?
- Is its contribution observably useful?
- Is it used by the SMT system differently than conventional SMT lexicons?
34. Conclusion: context-dependent phrasal translation lexicons are useful resources for SMT
- Improve phrasal segmentation
  - exploit the available input vocabulary better
  - more phrases, longer phrases, and more rare phrases are used in decoding
  - consistent with other findings: fully phrasal context-dependent lexicons yield more reliable improvements than single-word lexicons (Carpuat & Wu TMI 2007)
- Improve translation candidate selection
  - even after compensating for differences in phrasal segmentation
- Genuinely improve lexical choice
  - not just BLEU and other metrics!
35. Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
- Marine CARPUAT and Dekai WU
- Human Language Technology Center
- Department of Computer Science and Engineering
- HKUST
36. Translation quality evaluation: not just BLEU, but 8 automatic metrics
- N-gram matching metrics
- BLEU4
- NIST
- METEOR
- METEORsynsets
- augmented with WordNet synonym matching
- Edit distances
- TER
- WER
- PER
- CDER
37. Context-dependent modeling consistently improves translation quality

Test set  Experiment  BLEU   NIST   METEOR  METEOR (no syn)  TER    WER    PER    CDER
IWSLT 1   SMT         42.21  7.888  65.40   63.24            40.45  45.58  37.80  40.09
          SMT+WSD     42.38  7.902  65.73   63.64            39.98  45.30  37.60  39.91
IWSLT 2   SMT         41.49  8.167  66.25   63.85            40.95  46.42  37.52  40.35
          SMT+WSD     41.97  8.244  66.35   63.86            40.63  46.14  37.25  40.10
IWSLT 3   SMT         49.91  9.016  73.36   70.70            35.60  40.60  32.30  35.46
          SMT+WSD     51.05  9.142  74.13   71.44            34.68  39.75  31.71  34.58
NIST      SMT         20.41  7.155  60.21   56.15            76.76  88.26  61.71  70.32
          SMT+WSD     20.92  7.468  60.30   56.79            71.34  83.37  57.29  67.38
38. Results are statistically significant
- NIST results are statistically significant at the 95% level
- Tested using paired bootstrap resampling
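Paired bootstrap resampling can be sketched as below. Sentence-level scores are assumed additive here for simplicity; for a corpus-level metric like BLEU one would resample the sufficient statistics instead. This is an illustration of the test, not the authors' implementation:

```python
import random

def paired_bootstrap(scores_a, scores_b, trials=1000, seed=0):
    """Resample the test set with replacement `trials` times and count
    how often system A outscores system B on the resampled set.
    A proportion above 0.95 means A is better at the 95% level."""
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(trials):
        idx = [rng.randrange(n) for _ in range(n)]  # paired resample
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / trials

# Toy scores where A beats B on every sentence:
p = paired_bootstrap([0.9, 0.8, 0.7, 0.6], [0.5, 0.4, 0.3, 0.2], trials=500)
```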
39. Translations with context-dependent phrasal lexicons often differ from SMT translations

Test set  Translations changed by context modeling (%)
IWSLT 1   25.49
IWSLT 2   30.40
IWSLT 3   29.25
NIST      95.74
40. Context-dependent modeling helps even for the small, single-domain IWSLT task
- IWSLT is a single-domain task with very short sentences
- Even in these conditions, context-dependent phrasal lexicons are helpful
  - there are genuine sense ambiguities, e.g. "turn" vs. "transfer"
  - context features are available: 19 observed features per occurrence of a Chinese phrase
41. The most useful context features are not available in standard SMT
- The 3 most useful context feature types are
  - the POS tag of the word preceding the target phrase
  - the POS tag of the word following the target phrase
  - bag-of-words context
- We use the weights learned by the maximum entropy classifier to determine the most useful features
  - we normalize the feature weights for each WSD model
  - and then compute the average weight of each feature type
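The normalize-then-average procedure can be sketched as below. The feature-name convention (type before "=") and the use of absolute weights are assumptions for illustration:

```python
from collections import defaultdict

def average_weight_by_type(models):
    """Normalize absolute maxent feature weights within each per-phrase
    WSD model, then average the normalized weight per feature type
    across models, to rank feature types by usefulness."""
    totals, counts = defaultdict(float), defaultdict(int)
    for weights in models:                 # one weight dict per phrase model
        z = sum(abs(w) for w in weights.values()) or 1.0
        for feat, w in weights.items():
            ftype = feat.split("=")[0]     # e.g. "pos[-1]", "bow"
            totals[ftype] += abs(w) / z
            counts[ftype] += 1
    return {t: totals[t] / counts[t] for t in totals}

# Two toy per-phrase models with hypothetical learned weights:
avg = average_weight_by_type([{"pos[-1]=DT": 2.0, "bow=bank": 1.0},
                              {"pos[-1]=NN": 1.0, "bow=the": 1.0}])
```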
42. Dynamic context-dependent sense predictions are better than static predictions
- Context-dependent modeling often helps rank the correct translation first
- Even when context-dependent modeling picks the same translation candidate, the WSD scores are more discriminative than the baseline translation probabilities
  - better at overriding incorrect LM predictions
  - gives higher confidence to translate longer input phrases when appropriate
43. Context-dependent modeling improves phrasal lexical choice: examples
45. Context-dependent modeling prefers longer phrases
- Input: [Chinese source sentence]
- Ref.: No parliament members voted against him.
- SMT: Without any congressmen voted against him.
- SMT+WSD: No congressmen voted against him.
48. Context-dependent modeling prefers longer phrases
- The average length of the Chinese phrases used is higher with the context-dependent phrasal lexicon
- This confirms that
  - context-dependent predictions for all phrases are useful
  - context-dependent predictions should be available at all stages of decoding
- This explains why using WSD for single words only has a less reliable impact on translation quality
  - as in Cabezas & Resnik 2005, Carpuat et al. 2006
49. Context-dependent lexicons should be phrasal to always help translation

Test set  Experiment    BLEU   NIST   METEOR  METEOR (no syn)  TER    WER    PER    CDER
1         SMT           42.21  7.888  65.40   63.24            40.45  45.58  37.80  40.09
          word lex.     41.94  7.911  65.55   63.52            40.59  45.61  37.75  40.09
          phrasal lex.  42.38  7.902  65.73   63.64            39.98  45.30  37.60  39.91
2         SMT           41.49  8.167  66.25   63.85            40.95  46.42  37.52  40.35
          word lex.     41.31  8.161  66.23   63.72            41.34  46.82  37.98  40.69
          phrasal lex.  41.97  8.244  66.35   63.86            40.63  46.14  37.25  40.10
3         SMT           49.91  9.016  73.36   70.70            35.60  40.60  32.30  35.46
          word lex.     49.73  9.017  73.32   70.82            35.72  40.61  32.10  35.30
          phrasal lex.  51.05  9.142  74.13   71.44            34.68  39.75  31.71  34.58
51. Context-dependent modeling improves quality of statistical MT
- Presented context-dependent phrasal lexicons for SMT
  - leverage WSD techniques for SMT lexical choice
- Context-dependent modeling always improves SMT accuracy
  - on all tasks: 3 different IWSLT06 datasets, NIST04
  - on all 8 common automatic metrics: BLEU, NIST, METEOR, METEOR(synsets), TER, WER, PER, CDER
- Why?
  - the most useful context features are unavailable to current SMT systems
  - better phrasal segmentation
  - better phrasal lexical choice: more accurate rankings, more discriminative scores
52. Maxent-based sense disambiguation in Candide (Berger et al. 1996)
- No evaluation of impact on translation quality
  - only 2 example sentences; no contrastive evaluation by human judgment nor any automatic metric
  - the extension by García Varea et al. does not significantly improve translation quality
- Still does not model input language context
- Overly simplified context model
  - does not use full sentential context: only 3 words to the left and 3 words to the right
  - does not generalize over word identities: only words, no POS tags
  - does not generalize to phrasal disambiguation targets: only single words
- Does not augment the existing SMT model
  - only replaces the context-independent translation probability