Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation - PowerPoint PPT Presentation

Learn more at: http://www.lrec-conf.org
Transcript and Presenter's Notes



1
Evaluation of Context-Dependent Phrasal
Translation Lexicons for Statistical Machine
Translation
  • Marine CARPUAT and Dekai WU
  • Human Language Technology Center
  • Department of Computer Science and Engineering
  • HKUST

2
New resources for SMT: context-dependent phrasal
translation lexicons
  • A key new resource for Phrase Sense
    Disambiguation (PSD) for SMT [Carpuat & Wu 2007]
  • Entirely automatically acquired
  • Consistently improves 8 translation quality
    metrics [EMNLP 2007]
  • Fully phrasal, just like conventional SMT lexicons
    [TMI 2007]
  • But much larger than conventional lexicons!
  • Why is this extremely large resource necessary?
  • Is its contribution observably useful?
  • Is it used by the SMT system differently than
    conventional SMT lexicons?

3
Our finding: context-dependent lexicons directly
improve lexical choice in SMT
  • Exploit the available vocabulary better for
    phrasal segmentation
  • more and longer phrases are used in decoding
  • consistent with other findings [TMI 2007]
  • fully phrasal context-dependent lexicons yield
    more reliable improvements than single-word
    lexicons
  • Select better translation candidates
  • even after compensating for differences in
    phrasal segmentation
  • improvements in BLEU, TER, METEOR, etc. really
    reflect improved lexical choice

4
Problems with current SMT systems
  • Input: [Chinese source sentence; characters lost in transcript]
  • Ref.: Prof. Zhang gave a lecture on China and
    India to a packed audience.
  • SMT1: Prof. Zhang to a group of people on China
    and India class.
  • SMT2: Prof. Zhang and a group of people go into
    class on China and India.

5
Translation lexicons in SMT are independent of
context!
[Chinese source sentence]
Prof. Zhang gave a lecture on China and India
to a packed audience.
[Chinese source sentence]
Everyone is welcome to attend class tomorrow, on
the topic China and India.
6
Phrasal lexicons in SMT are independent of
context too!
[Chinese source sentence]
Prof. Zhang gave a lecture on China and India
to a packed audience.
[Chinese source sentence]
Everyone is welcome to attend class tomorrow, on
the topic China and India.
7
Current SMT systems are hurt by very weak models
of context
  • Translation disambiguation models are too
    simplistic
  • Phrasal lexicon translation probabilities are
    static, so not sensitive to context
  • Context in the input language is modeled only
    weakly, by phrase segments
  • Context in the output language is modeled only
    weakly, by n-grams
  • Error analysis reveals many lexical choice errors
  • Yet, few attempts at directly modeling context

8
Today's SMT systems ignore the contextual
features that would help lexical choice
  • No full sentential context
  • merely local n-gram context
  • No POS information
  • merely surface form of words
  • No structural information
  • merely word n-gram identities

9
Correct translation disambiguation requires rich
context features
[Chinese source sentence]
Prof. Zhang gave a lecture on China and India
to a packed audience.
[Chinese source sentence]
Everyone is welcome to attend class tomorrow, on
the topic China and India.
10
Today's SMT systems ignore context in their
phrasal translation lexicons
11
Today's SMT systems ignore context in their
phrasal translation lexicons
c_j(f)
Entire input sentence context
12
But context-dependent lexical choice does not
necessarily improve translation quality
  • Early pilot study [Brown et al. 1991]
  • use the single most discriminative feature to
    disambiguate between 2 English translations of a
    French word
  • WSD improves French-English translation quality,
    but not on a significant vocabulary, and allowing
    only 2 senses
  • Context-dependent lexical choice helps word
    alignment, but not really translation quality
    [García-Varea et al. 2001, 2002]
  • maximum-entropy trained bilexicon replaces
    IBM-4/5 translation probabilities
  • improves AER on Canadian Hansards and Verbmobil
    tasks
  • small improvement in WER and PER by rescoring
    n-best lists, but not statistically significant
    [García-Varea & Casacuberta 2005]

13
Context-dependent modeling improves quality of
Statistical MT [Carpuat & Wu 2007]
  • Introduced context-dependent phrasal lexicons for
    SMT
  • leverage WSD techniques for SMT lexical choice
  • generalize conventional WSD to Phrase Sense
    Disambiguation
  • Context-dependent modeling always improves SMT
    accuracy
  • on all tasks: 3 different IWSLT06 datasets,
    NIST04
  • on all 8 common automatic metrics: BLEU, NIST,
    METEOR, METEOR+synsets, TER, WER, PER, CDER

14
No other WSD-for-SMT approach improves
translation quality as consistently
  • Until recently, using WSD to improve SMT quality
    has met with mixed or disappointing results
  • [Carpuat & Wu ACL-2005], [Cabezas & Resnik, unpublished]
  • Last year, for the first time, different
    approaches showed that WSD can help translation
    quality
  • WSD improved BLEU (but how about other
    metrics?) on 3 Chinese-English tasks
    [Carpuat et al. IWSLT-2006]
  • WSD improved BLEU (but how about other
    metrics?) on the Chinese-English NIST task
    [Chan et al. ACL-2007]
  • WSD improved METEOR (but not BLEU!) on the
    Spanish-English Europarl task
    [Giménez & Màrquez WMT-2007]
  • Phrasal WSD improves BLEU, NIST, METEOR (but how
    about error rates?)
  • on Italian-English and Chinese-English IWSLT
    tasks [Stroppa et al. TMI-2007]
  • But no other approach improves on 8 metrics on 4
    different tasks

15
But how useful are the context-dependent lexicons
as resources?
  • Improving translation quality is great, but
  • Metrics aggregate impact of many different
    factors
  • Metrics ignore how translation hypotheses are
    generated
  • Context-dependent lexicons are more expensive to
    train, so
  • Are their contributions observably useful?
  • Direct analysis needed: how do SMT systems use
    context-dependent vs. conventional lexicons?

16
Learning context-dependent vs. conventional
lexicons for SMT
  • Both are learned from the same word-aligned parallel data
  • cover the same phrasal input vocabulary
  • know the same phrasal translation candidates
  • Only difference: an additional context-dependent
    parameter
  • dynamically computed vs. static conventional
    scores
  • uses WSD modeling vs. MLE in conventional
    lexicons

17
Word Sense Disambiguation provides appropriate
models of context
  • WSD has long targeted the questions of
  • how to design context features
  • how to combine contextual evidence into a sense
    prediction
  • Senseval/SemEval have extensively evaluated WSD
    systems
  • with different feature sets
  • with different machine learning classifiers
  • Senseval multilingual lexical sample tasks
  • use observable lexical translations as senses
  • just like lexical choice in SMT
  • E.g. Senseval-2003 English-Hindi, SemEval-2007
    Chinese-English

18
Leveraging a Senseval WSD system
  • Top Senseval-3 Chinese Lexical Sample
    system [Carpuat et al. 2004]
  • standard classification models
  • maximum entropy, SVM, boosted decision stumps,
    naïve Bayes
  • rich lexical and syntactic features
  • bag-of-words sentence context
  • position sensitive co-occurring words and POS
    tags
  • basic syntactic dependency features
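The feature types listed above (bag-of-words sentence context plus position-sensitive words and POS tags) can be sketched as a maxent-style binary feature dictionary. This is an illustrative reconstruction only; the function name, feature encoding, and toy sentence are invented, not taken from the authors' system.

```python
# Illustrative sketch of WSD context-feature extraction: bag-of-words
# sentence context and position-sensitive word/POS features around a
# target token. Feature-name format is invented for this example.
def extract_features(words, pos_tags, target):
    """words / pos_tags are parallel token lists; target indexes the phrase."""
    feats = {}
    # bag-of-words sentence context (excluding the target token itself)
    for i, w in enumerate(words):
        if i != target:
            feats["bow=" + w] = 1
    # position-sensitive co-occurring words and POS tags in a small window
    for offset in (-2, -1, 1, 2):
        j = target + offset
        if 0 <= j < len(words):
            feats["w%+d=%s" % (offset, words[j])] = 1
            feats["pos%+d=%s" % (offset, pos_tags[j])] = 1
    return feats

words = ["he", "banks", "the", "plane"]
tags = ["PRP", "VBZ", "DT", "NN"]
f = extract_features(words, tags, 1)  # disambiguate "banks"
```

The same dictionary feeds any of the classifiers named on the slide (maxent, SVM, boosted stumps, naïve Bayes); the basic syntactic dependency features would require a parser and are omitted here.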

19
Generalizing WSD to PSD for context-dependent
phrasal translation lexicons
  • One PSD model per input language phrase
  • regardless of POS, length, etc.
  • Generalization of standard WSD models
  • Sense candidates are the phrase translation
    candidates seen in training
  • The sense candidates are extracted just like the
    conventional SMT phrasal lexicon
  • typically, output language phrases consistent
    with the intersection of bidirectional IBM
    alignments
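The "consistent with the intersection of bidirectional IBM alignments" criterion above is the standard phrase-extraction consistency check, which can be sketched as follows. This is a generic illustration with an invented toy alignment, not the authors' code.

```python
# Sketch of the standard consistency check used in phrase extraction:
# a phrase pair is consistent with the word alignment if no link leaves
# the phrase-pair box and at least one link falls inside it.
def consistent(alignment, src_span, tgt_span):
    """alignment: set of (src_i, tgt_j) links; spans are inclusive (start, end)."""
    has_inside_link = False
    for (i, j) in alignment:
        in_src = src_span[0] <= i <= src_span[1]
        in_tgt = tgt_span[0] <= j <= tgt_span[1]
        if in_src != in_tgt:  # link crosses the box boundary
            return False
        if in_src and in_tgt:
            has_inside_link = True
    return has_inside_link

# toy alignment: src word 0 <-> tgt word 0, 1 <-> 2, 2 <-> 1
links = {(0, 0), (1, 2), (2, 1)}
```

Every consistent target span becomes one sense candidate for its source phrase, which is why the PSD sense inventory coincides with the conventional phrase table.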

20
Extracting PSD senses and training examples from
word-aligned parallel text
[Chinese source sentence, word-aligned to:]
is there a new - age music concert within
the next few days ?
21
Extracting PSD senses and training examples from
word-aligned parallel text
[Chinese source sentence]
is there a new - age music concert within
the next few days ?
Extracted PSD training instances:
... <t sense="within">[Chinese phrase]</t> ...
22
Extracting PSD senses and training examples from
word-aligned parallel text
[Chinese source sentence]
is there a new - age music concert within
the next few days ?
Extracted PSD training instances:
... <t sense="within">[Chinese phrase]</t> ...
... <t sense="new - age music">[Chinese phrase]</t> ...
23
Extracting PSD senses and training examples from
word-aligned parallel text
[Chinese source sentence]
is there a new - age music concert within
the next few days ?
Extracted PSD training instances:
... <t sense="within">[Chinese phrase]</t> ...
... <t sense="new - age music">[Chinese phrase]</t> ...
<t sense="within the next few days">[Chinese phrase]</t> ...
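The instance extraction on these slides can be sketched as below. The helper name `make_instances` and the placeholder tokens `w0`..`w3` are invented (the original Chinese words did not survive this transcript); each extracted source phrase is wrapped in a `<t sense="...">` tag whose value is its aligned target phrase.

```python
# Sketch of PSD training-instance extraction: each source phrase known to
# the phrase table is marked in its sentence, and its aligned target
# phrase is recorded as the "sense" label.
def make_instances(src_tokens, phrase_pairs):
    """phrase_pairs: list of ((start, end) inclusive, target_phrase)."""
    instances = []
    for (start, end), sense in phrase_pairs:
        marked = (src_tokens[:start]
                  + ['<t sense="%s">' % sense]
                  + src_tokens[start:end + 1]
                  + ["</t>"]
                  + src_tokens[end + 1:])
        instances.append(" ".join(marked))
    return instances

src = ["w0", "w1", "w2", "w3"]
out = make_instances(src, [((1, 1), "within"),
                           ((1, 3), "within the next few days")])
```

Note that overlapping phrases each yield their own training instance, which is how one sentence pair produces several examples as shown on the slides.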
24
Integrating context-dependent lexicon into
phrase-based SMT architectures
  • The context-dependent phrasal lexicon
    probabilities
  • Are conditional translation probabilities
  • can naturally be added as a feature in log linear
    translation models
  • Unlike conventional translation probabilities,
    they are
  • dynamically computed
  • dependent on full-sentence context
  • Decoding can make full use of the context-dependent
    phrasal lexicon's predictions at all stages of
    decoding
  • unlike in n-best reranking
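The log-linear combination described above can be sketched in a few lines: the dynamically computed PSD probability enters as one more weighted log feature next to the static phrase-table and language-model scores. All feature values and weights below are made up for illustration.

```python
# Minimal sketch of a log-linear model score: a weighted sum of log
# feature values. The PSD feature differs from the others only in that
# its value is recomputed per input sentence rather than read from a
# static table.
import math

def loglinear_score(features, weights):
    return sum(weights[name] * math.log(p) for name, p in features.items())

features = {
    "p_phrase": 0.4,  # static phrase-table translation probability
    "p_lm": 0.1,      # language model probability
    "p_psd": 0.7,     # context-dependent PSD probability (per sentence)
}
weights = {"p_phrase": 1.0, "p_lm": 0.5, "p_psd": 1.0}
score = loglinear_score(features, weights)
```

Because the PSD score is just another feature, its weight can be tuned alongside the others and its predictions are visible to the decoder at every hypothesis expansion, unlike n-best reranking.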

25
Evaluating context-dependent phrasal translation
lexicons
  • lexical choice only
  • vs. translation quality [Carpuat & Wu EMNLP 2007]
  • integrated evaluation in SMT
  • vs. stand-alone as in Senseval [Carpuat et al.
    2004]
  • fully phrasal lexicons only
  • vs. single-word context-dependent lexicon
    [Carpuat & Wu TMI 2007]
  • Translation task
  • Test set: NIST-04 Chinese-English text
    translation
  • 1788 sentences
  • 4 reference translations
  • Standard phrase-based SMT decoder (Moses)

26
Experimental setup: Learning the lexicons
  • Standard conventional lexicon learning
  • Newswire Chinese-English corpus
  • 2M sentences
  • Standard word-alignment methodology
  • GIZA++
  • Intersection using grow-diag heuristics [Koehn
    et al. 2003]
  • Standard Pharaoh/Moses phrase table
  • Maximum phrase length: 10
  • Translation probabilities in both directions,
    lexical weights
  • Context-dependent lexicons
  • Use the exact same word-aligned parallel data
  • Train a WSD model for each known phrase

27
Step 1: Evaluating phrasal segmentation with
context-dependent vs. conventional lexicons
  • Goal: compare the phrasal segmentation of the
    input sentence used to produce the top hypothesis
  • Method
  • We do not evaluate accuracy
  • There is no gold standard phrasal segmentation!
  • Instead, we analyze how the input phrases
    available in lexicons are used

28
SMT uses longer input phrases with
context-dependent lexicons
  • Context-dependent lexicons help use longer, less
    ambiguous phrases

29
SMT uses more input phrase types with
context-dependent lexicons
  • 26% of phrase types used with the context-dependent
    lexicon are not used with the conventional lexicon
  • 96% of those lexicon entries are truly phrasal
    (not single words)
  • Context-dependent lexicons make better use of
    available input language vocabulary

30
SMT uses more rare phrases with context-dependent
lexicons
  • With context modeling, less training data is
    needed for phrases to be used

31
Step 2: Comparing translation selection
  • Goal: compare translation selection only
  • Method
  • We compare accuracy of translation selection for
    identical segments only
  • Because different lexicons yield different
    phrasal segmentations
  • A translation is considered accurate if it
    matches any of the reference translations
  • Because input sentence and references are not
    word-aligned
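The matching criterion above can be sketched directly: without word-aligned references, a selected translation counts as accurate if it appears in any reference. The example strings are illustrative, not from the test set.

```python
# Sketch of slide 31's accuracy criterion: a phrase translation is
# accurate if it occurs as a substring of any reference translation.
def matches_any_reference(candidate, references):
    return any(candidate in ref for ref in references)

refs = ["no parliament members voted against him .",
        "none of the legislators voted against him ."]
```

This is deliberately permissive: it can credit a translation that matches a reference in a different position, but it applies identically to both lexicons being compared.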

32
Context-dependent lexicon predictions match
references better
  • Context-dependent lexicons yield more matches
    than conventional lexicons
  • 48% of errors made with conventional lexicons are
    corrected with context-dependent lexicons

                          Conventional: Match   Conventional: No match
Context-dep.: Match              1435                   2139
Context-dep.: No match            683                   2272
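The 48% figure follows directly from the contingency table; recomputing it makes the bookkeeping explicit (dictionary keys below are just labels for the four cells).

```python
# Recomputing the slide's 48% figure from the 2x2 contingency table
# (rows: context-dependent lexicon; columns: conventional lexicon).
table = {
    ("match", "match"): 1435,        # both lexicons match a reference
    ("match", "no_match"): 2139,     # only the context-dependent lexicon matches
    ("no_match", "match"): 683,      # only the conventional lexicon matches
    ("no_match", "no_match"): 2272,  # neither matches
}
conventional_errors = table[("match", "no_match")] + table[("no_match", "no_match")]
corrected = table[("match", "no_match")]
corrected_rate = corrected / conventional_errors  # fraction of errors fixed
```

The off-diagonal cells also show the trade-off: 2139 conventional errors are fixed while only 683 previously correct choices are lost.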
33
Conclusion: context-dependent phrasal translation
lexicons are useful resources for SMT
  • A key new resource for Phrase Sense
    Disambiguation (PSD) for SMT [Carpuat & Wu 2007]
  • Entirely automatically acquired
  • Consistently improves 8 translation quality
    metrics [EMNLP 2007]
  • Fully phrasal, just like conventional SMT lexicons
    [TMI 2007]
  • But much larger than conventional lexicons!
  • Why is this extremely large resource necessary?
  • Is its contribution observably useful?
  • Is it used by the SMT system differently than
    conventional SMT lexicons?

34
Conclusion: context-dependent phrasal translation
lexicons are useful resources for SMT
  • Improve phrasal segmentation
  • Exploit available input vocabulary better
  • More phrases, longer phrases, and rarer
    phrases are used in decoding
  • Consistent with other findings
  • fully phrasal context-dependent lexicons yield
    more reliable improvements than single-word
    lexicons [Carpuat & Wu TMI 2007]
  • Improve translation candidate selection
  • Even after compensating for differences in
    phrasal segmentation
  • Genuinely improve lexical choice
  • Not just BLEU and other metrics!

35
Evaluation of Context-Dependent Phrasal
Translation Lexicons for Statistical Machine
Translation
  • Marine CARPUAT and Dekai WU
  • Human Language Technology Center
  • Department of Computer Science and Engineering
  • HKUST

36
Translation quality evaluation: not just BLEU, but
8 automatic metrics
  • N-gram matching metrics
  • BLEU4
  • NIST
  • METEOR
  • METEORsynsets
  • augmented with WordNet synonym matching
  • Edit distances
  • TER
  • WER
  • PER
  • CDER

37
Context-dependent modeling consistently improves
translation quality
Test set  Experiment  BLEU   NIST   METEOR  METEOR (no syn)  TER    WER    PER    CDER
IWSLT 1   SMT         42.21  7.888  65.40   63.24            40.45  45.58  37.80  40.09
IWSLT 1   SMT+WSD     42.38  7.902  65.73   63.64            39.98  45.30  37.60  39.91
IWSLT 2   SMT         41.49  8.167  66.25   63.85            40.95  46.42  37.52  40.35
IWSLT 2   SMT+WSD     41.97  8.244  66.35   63.86            40.63  46.14  37.25  40.10
IWSLT 3   SMT         49.91  9.016  73.36   70.70            35.60  40.60  32.30  35.46
IWSLT 3   SMT+WSD     51.05  9.142  74.13   71.44            34.68  39.75  31.71  34.58
NIST      SMT         20.41  7.155  60.21   56.15            76.76  88.26  61.71  70.32
NIST      SMT+WSD     20.92  7.468  60.30   56.79            71.34  83.37  57.29  67.38
38
Results are statistically significant
  • NIST results are statistically significant at the
    95% level
  • Tested using paired bootstrap resampling
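Paired bootstrap resampling over per-sentence scores can be sketched as below: resample the test set with replacement many times and count how often system B's total beats system A's. The per-sentence scores here are invented for illustration; real runs use per-sentence metric statistics.

```python
# Hedged sketch of paired bootstrap resampling for MT significance
# testing: both systems are scored on the same resampled sentence sets,
# so per-sentence difficulty is controlled for.
import random

def paired_bootstrap(scores_a, scores_b, trials=1000, seed=0):
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(trials):
        sample = [rng.randrange(n) for _ in range(n)]
        if sum(scores_b[i] for i in sample) > sum(scores_a[i] for i in sample):
            wins += 1
    return wins / trials  # fraction of resampled sets on which B wins

a = [0.20, 0.30, 0.25, 0.40, 0.10]
b = [0.30, 0.35, 0.30, 0.45, 0.20]  # better than a on every sentence
p = paired_bootstrap(a, b)
```

A win fraction of 0.95 or higher corresponds to significance at the 95% level reported on the slide.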

39
Translations with context-dependent phrasal
lexicons often differ from SMT translations
Test set  Translations changed by context modeling
IWSLT 1   25.49%
IWSLT 2   30.40%
IWSLT 3   29.25%
NIST      95.74%
40
Context-dependent modeling helps even for small
and single-domain IWSLT
  • IWSLT is a single-domain task with very short
    sentences
  • Even in these conditions, context-dependent
    phrasal lexicons are helpful
  • there are genuine sense ambiguities
  • E.g.
  • turn vs. transfer
  • context features are available
  • 19 observed features per occurrence of a Chinese
    phrase

41
The most useful context features are not
available in standard SMT
  • The 3 most useful context feature types are
  • POS tag of word preceding the target phrase
  • POS tag of word following the target phrase
  • Bag-of-word context
  • We use the weights learned by the maximum entropy
    classifier to determine the most useful features
  • we normalize feature weights for each WSD model
  • and then compute the average weight of each feature
    type
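The normalize-then-average analysis above can be sketched as follows; the two toy models, their feature names, and weights are invented for illustration.

```python
# Sketch of the feature-type analysis: normalize maxent weights within
# each per-phrase WSD model, then average by feature type across models.
def average_weight_by_type(models):
    """models: list of dicts mapping feature names like 'pos-1=DT' to weights."""
    totals, counts = {}, {}
    for weights in models:
        norm = sum(abs(w) for w in weights.values()) or 1.0
        for name, w in weights.items():
            ftype = name.split("=")[0]  # e.g. 'pos-1', 'bow'
            totals[ftype] = totals.get(ftype, 0.0) + abs(w) / norm
            counts[ftype] = counts.get(ftype, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}

models = [{"pos-1=DT": 2.0, "bow=music": 1.0, "pos+1=NN": 1.0},
          {"pos-1=IN": 3.0, "bow=class": 1.0}]
avg = average_weight_by_type(models)
```

Normalizing within each model first keeps phrases with many translation candidates (and hence larger weight vectors) from dominating the cross-model average.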

42
Dynamic context-dependent sense predictions are
better than static predictions
  • Context-dependent modeling often helps rank the
    correct translation first
  • Even when context-dependent modeling picks the
    same translation candidate, the WSD scores are
    more discriminative than baseline translation
    probabilities
  • better at overriding incorrect LM predictions
  • gives higher confidence to translate longer input
    phrases when appropriate

43
Context-dependent modeling improves phrasal
lexical choice: examples
45
Context-dependent modeling prefers longer phrases
  • Input: [Chinese source sentence]
  • Ref.: No parliament members voted against him .
  • SMT: Without any congressmen voted against him .
  • SMT+WSD: No congressmen voted against him .

48
Context-dependent modeling prefers longer phrases
  • Average length of Chinese phrases used is higher
    with context-dependent phrasal lexicon
  • This confirms that
  • Context-dependent predictions for all phrases are
    useful
  • Context-dependent predictions should be available
    at all stages of decoding
  • This explains why using WSD for single words only
    has a less reliable impact on translation quality
  • as in [Cabezas & Resnik 2005], [Carpuat et al.
    2006]

49
Context-dependent lexicons should be phrasal to
always help translation
Test set  Experiment    BLEU   NIST   METEOR  METEOR (no syn)  TER    WER    PER    CDER
IWSLT 1   SMT           42.21  7.888  65.40   63.24            40.45  45.58  37.80  40.09
IWSLT 1   word lex.     41.94  7.911  65.55   63.52            40.59  45.61  37.75  40.09
IWSLT 1   phrasal lex.  42.38  7.902  65.73   63.64            39.98  45.30  37.60  39.91
IWSLT 2   SMT           41.49  8.167  66.25   63.85            40.95  46.42  37.52  40.35
IWSLT 2   word lex.     41.31  8.161  66.23   63.72            41.34  46.82  37.98  40.69
IWSLT 2   phrasal lex.  41.97  8.244  66.35   63.86            40.63  46.14  37.25  40.10
IWSLT 3   SMT           49.91  9.016  73.36   70.70            35.60  40.60  32.30  35.46
IWSLT 3   word lex.     49.73  9.017  73.32   70.82            35.72  40.61  32.10  35.30
IWSLT 3   phrasal lex.  51.05  9.142  74.13   71.44            34.68  39.75  31.71  34.58
50
No other WSD-for-SMT approach improves
translation quality as consistently
  • Until recently, using WSD to improve SMT quality
    has met with mixed or disappointing results
  • [Carpuat & Wu ACL-2005], [Cabezas & Resnik, unpublished]
  • Last year, for the first time, different
    approaches showed that WSD can help translation
    quality
  • WSD improved BLEU (but how about other
    metrics?) on 3 Chinese-English tasks
    [Carpuat et al. IWSLT-2006]
  • WSD improved BLEU (but how about other
    metrics?) on the Chinese-English NIST task
    [Chan et al. ACL-2007]
  • WSD improved METEOR (but not BLEU!) on the
    Spanish-English Europarl task
    [Giménez & Màrquez WMT-2007]
  • Phrasal WSD improves BLEU, NIST, METEOR (but how
    about error rates?)
  • on Italian-English and Chinese-English IWSLT
    tasks [Stroppa et al. TMI-2007]
  • But no other approach improves on 8 metrics on 4
    different tasks

51
Context-dependent modeling improves quality of
Statistical MT
  • Presenting context-dependent phrasal lexicons for
    SMT
  • leverage WSD techniques for SMT lexical choice
  • Context-dependent modeling always improves SMT
    accuracy
  • on all tasks - 3 different IWSLT06 datasets,
    NIST04
  • on all 8 common automatic metrics: BLEU, NIST,
    METEOR, METEOR+synsets, TER, WER, PER, CDER
  • Why?
  • Most useful context features are unavailable to
    current SMT systems
  • Better phrasal segmentation
  • Better phrasal lexical choice
  • more accurate rankings
  • more discriminative scores

52
Maxent-based sense disambiguation in Candide
[Berger 1996]
  • No evaluation of impact on translation quality
  • only 2 example sentences, no contrastive
    evaluation by human judgment nor any automatic
    metric
  • the extension by [García-Varea et al.] does not
    significantly improve translation quality
  • Still does not model input language context
  • Overly simplified context model
  • does not use full sentential context
  • only 3 words to the left, 3 words to the right
  • does not generalize over word identities
  • only words, no POS tags
  • does not generalize to phrasal disambiguation
    targets
  • only words
  • Does not augment the existing SMT model
  • only replaces the context-independent translation
    probability