Title: A Discriminative Framework for Bilingual Word Alignment
1. A Discriminative Framework for Bilingual Word Alignment
- Robert C. Moore
- Natural Language Processing Group
- Microsoft Research
2. Micro-tutorial on current statistical MT
- Word align a parallel bilingual corpus
- Extract translation pairs of contiguous phrases
- Optimize weighted linear combination of:
  - Log of phrase-translation probabilities
  - Log of word-translation probabilities
  - Degree of nonmonotonicity of translation
  - Log of n-gram target-language string probability
  - Number of target-language words
  - Other features
- Find highest-scoring possible translation for source-language sentences
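The weighted linear combination above is just a dot product of feature values and weights. The sketch below illustrates this; the feature names and all numbers are illustrative assumptions, not values from the talk.

```python
import math

def score(features, weights):
    """Weighted linear combination of translation features."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for one candidate translation.
features = {
    "log_phrase_trans_prob": math.log(0.4),
    "log_word_trans_prob":   math.log(0.2),
    "nonmonotonicity":       1.0,            # degree of reordering
    "log_target_lm_prob":    math.log(0.01), # n-gram string probability
    "target_word_count":     4.0,
}
# Hypothetical weights, as would be tuned on held-out data.
weights = {
    "log_phrase_trans_prob": 1.0,
    "log_word_trans_prob":   0.5,
    "nonmonotonicity":       -0.3,
    "log_target_lm_prob":    0.8,
    "target_word_count":     0.1,
}

candidate_score = score(features, weights)
```

Decoding then amounts to searching for the target sentence that maximizes this score.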
3. Example of word alignment and phrase extraction
- I don't speak French
- Je ne parle pas Français
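To make the example concrete, here is a sketch of standard phrase-pair extraction under the usual consistency criterion; the link set is my reading of the slide's alignment (with "don't" linked to both "ne" and "pas"), not an annotation from the talk.

```python
src = ["I", "don't", "speak", "French"]
tgt = ["Je", "ne", "parle", "pas", "Français"]
# Word alignment as (source index, target index) links.
links = [(0, 0), (1, 1), (1, 3), (2, 2), (3, 4)]

def extract_phrases(links, src_len, tgt_len, max_len=5):
    """Extract contiguous phrase pairs consistent with the alignment:
    no link may connect a word inside the pair to a word outside it."""
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            tgt_positions = [t for s, t in links if i1 <= s <= i2]
            if not tgt_positions:
                continue
            j1, j2 = min(tgt_positions), max(tgt_positions)
            if j2 - j1 >= max_len:
                continue
            # Consistency: every link touching the target span must
            # originate inside the source span.
            if all(i1 <= s <= i2 for s, t in links if j1 <= t <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs

pairs = extract_phrases(links, len(src), len(tgt))
phrase_pairs = [(" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1]))
                for (i1, i2), (j1, j2) in pairs]
```

This yields pairs such as ("don't speak", "ne parle pas") and ("French", "Français"), while inconsistent spans like ("don't", "ne") are rejected because "pas" links outside them.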
5. Standard approach to word alignment
- Generative models, maximum-likelihood training using approximations to EM:
  - IBM Models 1-5
  - Aachen HMM-based model
- Problems:
  - Free parameters not trained by EM are difficult to optimize
  - A generative story is required to add new features
6. Our approach based on discriminative training
- A weighted linear model
- Applied using beam-search aligner
- Trained with the averaged perceptron algorithm on a small number (200) of hand-aligned sentence pairs
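The training loop can be sketched as a generic averaged structured perceptron (after Collins, 2002); the decoder and features below are toy stand-ins for the beam-search aligner, not the authors' code.

```python
from collections import defaultdict

def averaged_perceptron(examples, feats, decode, epochs=10):
    """examples: (input, gold_output) pairs; feats(x, y) -> feature counts;
    decode(x, w) -> best output under weights w."""
    w = defaultdict(float)       # current weights
    total = defaultdict(float)   # running sum of weights, for averaging
    n = 0
    for _ in range(epochs):
        for x, gold in examples:
            pred = decode(x, w)
            if pred != gold:
                # Standard perceptron update: promote gold, demote prediction.
                for f, v in feats(x, gold).items():
                    w[f] += v
                for f, v in feats(x, pred).items():
                    w[f] -= v
            for f, v in w.items():
                total[f] += v
            n += 1
    return {f: v / n for f, v in total.items()}

# Toy two-label task as a usage illustration.
def feats(x, y):
    return {(f, y): v for f, v in x.items()}

def decode(x, w):
    return max(["A", "B"],
               key=lambda y: sum(w.get((f, y), 0.0) * v for f, v in x.items()))

avg = averaged_perceptron([({"f1": 1.0}, "A"), ({"f2": 1.0}, "B")],
                          feats, decode)
```

Averaging the weights over all updates gives the regularization effect that makes the perceptron competitive on small training sets like the 200 pairs used here.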
7. Technical advantages
- Discriminative training generally superior to maximum-likelihood training
- Conceptually very simple: easy to understand and implement
- Easy to add new features without having to invent a generative story
- Fast to train with the averaged perceptron
8. Two models
- First model based on log-likelihood-ratio (LLR) word-association statistics
- Two versions of a second model based on the conditional probability of a link cluster, given co-occurrence, trained on alignments produced by a simpler model:
  - LLR-based discriminative model
  - LLR-based greedy heuristic model
9. Log-likelihood-ratio measure of word association
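The body of this slide (the formula itself) is not preserved in this transcript; the LLR statistic Moore uses elsewhere (Moore, 2004) for the association between a target word $t$ and a source word $s$ has the form:

```latex
\mathrm{LLR}(t, s) \;=\; 2 \sum_{t^{?} \in \{t, \neg t\}} \;\sum_{s^{?} \in \{s, \neg s\}} C(t^{?}, s^{?}) \,\log \frac{p(t^{?} \mid s^{?})}{p(t^{?})}
```

where $C(t^{?}, s^{?})$ counts sentence pairs by whether each word occurs, and the probabilities are maximum-likelihood estimates from those counts.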
10. Features for the LLR-based model
- Sum of LLR scores for linked word pairs
- Two nonmonotonicity features, computed by ordering linked word pairs by source token position, then by target token position, and:
  - Summing backward jumps in target position
  - Counting backward jumps in target position
- One-to-many feature, counting the number of links in which one of the linked words also participates in another link
- Unlinked-word feature, counting the number of words with no link
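The two nonmonotonicity features can be sketched as follows; this is one plausible reading of the slide, not the authors' exact code.

```python
def nonmonotonicity_features(links):
    """links: (source_pos, target_pos) pairs.
    Order by source position, breaking ties by target position,
    then measure backward jumps in the resulting target positions."""
    targets = [t for _, t in sorted(links)]
    back = [prev - cur for prev, cur in zip(targets, targets[1:]) if prev > cur]
    return sum(back), len(back)  # (summed jump sizes, jump count)
```

For the "I don't speak French" example, the links (0,0), (1,1), (1,3), (2,2), (3,4) give the target sequence 0, 1, 3, 2, 4, with one backward jump of size 1.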
11. Hard constraints in the LLR-based model
- No many-to-many links
- No more than three words linked to one word
- A link is allowed only if it has the highest LLR score in the sentence for at least one of its words
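These three constraints can be checked as below; the precomputed best-score tables are hypothetical helpers I introduce for illustration, not part of the slides.

```python
from collections import Counter

def meets_hard_constraints(links, llr, best_src, best_tgt):
    """links: set of (source_pos, target_pos) pairs; llr: score per link;
    best_src / best_tgt: highest LLR available for each source / target
    word in this sentence pair (hypothetical precomputed tables)."""
    src_deg = Counter(s for s, _ in links)
    tgt_deg = Counter(t for _, t in links)
    for s, t in links:
        if src_deg[s] > 1 and tgt_deg[t] > 1:   # no many-to-many links
            return False
        if src_deg[s] > 3 or tgt_deg[t] > 3:    # at most three words per word
            return False
        # Link allowed only if it is the best-scoring option
        # for at least one of its words.
        if llr[(s, t)] < best_src[s] and llr[(s, t)] < best_tgt[t]:
            return False
    return True
```

Checking constraints per candidate alignment keeps the search space small without requiring the model to learn them from only 200 training pairs.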
12. Features for the CLP-based model
- Sum of logs of discounted estimates of conditional link-cluster probabilities
- Nonmonotonicity features
- Unlinked-word feature
13. Novel alignment search
- Greedy search used by Liu, Liu, and Lin:
  - Start with the empty alignment
  - Until improvement obtained < threshold:
    - Estimate how much each remaining possible link would improve the alignment
    - Add the estimated best link
  - Performs alignment evaluations
- Our alignment search performs alignment evaluations
14. Alignment search for the LLR-based model
- Initialize existing alignments to contain the empty alignment and its score
- For each possible link L, in decreasing order of LLR score:
  - Initialize recent alignments to be empty
  - For each existing alignment A:
    - Create a new alignment adding L to A
    - For each link L′ in A overlapping with L:
      - Create a new alignment adding L to A and removing L′
  - For each new alignment A′:
    - If A′ meets the hard constraints and is not in recent alignments, compute its score, and if it exceeds the threshold, add it to recent alignments
  - Add recent alignments to existing alignments, sort by score, and prune to the N best
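The loop above can be sketched roughly as follows; the scoring function and the one-to-one constraint in the usage example are simplified stand-ins (the real model allows one-to-three links and uses the full feature set).

```python
def beam_align(links_by_llr, score, ok, beam=10, threshold=0.0):
    """Beam search over alignments, following the slide's outline.
    links_by_llr: candidate links, pre-sorted by decreasing LLR score;
    score: frozenset of links -> model score; ok: hard-constraint check."""
    existing = [frozenset()]  # start from the empty alignment
    for link in links_by_llr:
        recent = set()
        for A in existing:
            # New alignment adding the link, plus, for each link in A
            # overlapping it (sharing a word), a variant swapping that link out.
            candidates = [A | {link}]
            for other in A:
                if other[0] == link[0] or other[1] == link[1]:
                    candidates.append((A | {link}) - {other})
            for new in candidates:
                if ok(new) and new not in recent and score(new) >= threshold:
                    recent.add(new)
        # Merge, sort by score, and prune to the N best.
        existing = sorted(set(existing) | recent, key=score, reverse=True)[:beam]
    return existing[0]

# Tiny usage example with made-up LLR scores.
llr = {(0, 0): 5.0, (1, 1): 4.0, (0, 1): 3.0}

def total_llr(A):
    return sum(llr[l] for l in A)

def one_to_one(A):
    srcs = [s for s, _ in A]
    tgts = [t for _, t in A]
    return len(set(srcs)) == len(srcs) and len(set(tgts)) == len(tgts)

best = beam_align(sorted(llr, key=llr.get, reverse=True), total_llr, one_to_one)
```

Considering links in decreasing LLR order means the strongest associations shape the beam early, so good partial alignments are unlikely to be pruned.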
16. Alignment search for the CLP-based model
- Initialize existing alignments to contain the empty alignment and its score
- For each possible link cluster L, in decreasing order of log conditional link probability:
  - Initialize recent alignments to be empty
  - For each existing alignment A:
    - Create a new alignment adding L to A and removing any overlapping link clusters
    - If the new alignment is not in recent alignments, compute its score, and if it exceeds the threshold, add it to recent alignments
  - Add recent alignments to existing alignments, sort by score, and prune to the N best
17. Evaluation methodology
- Used 500K English-French (mostly unannotated) sentence pairs from the 2003 parallel text workshop
- 447 annotated sentence pairs, evenly split into training set and test set
- Evaluated on recall, precision, and alignment error rate (AER)
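These metrics follow the standard definitions (Och and Ney, 2000), with sure links S and possible links P (S ⊆ P). A minimal implementation, with made-up links as the usage example:

```python
def alignment_metrics(A, S, P):
    """A: hypothesized links; S: sure gold links; P: possible gold links."""
    A, S, P = set(A), set(S), set(P)
    precision = len(A & P) / len(A)
    recall = len(A & S) / len(S)
    aer = 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    return precision, recall, aer

# Made-up example: one hypothesized link falls outside the possible set.
p, r, aer = alignment_metrics(
    A={(0, 0), (1, 1), (2, 2)},
    S={(0, 0), (1, 1)},
    P={(0, 0), (1, 1), (2, 3)},
)
```

AER rewards matching all sure links while only penalizing hypothesized links that are not even possible, so a perfect score does not require guessing every possible link.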
18. Evaluation results
19. Conclusions
- Discriminatively trained linear models for bilingual word alignment can be:
  - Simpler to implement than the standard approach
  - Easier to add features to than the standard approach
  - Easier to optimize than the standard approach
  - At least as accurate as the standard approach