Title: Search Applications: Machine Translation
1 Search Applications: Machine Translation
- Next time: Constraint Satisfaction
- Reading for today: see the Machine Translation paper under links
- Reading for next time: Chapter 5
2 Homework Questions?
3 Agenda
- Introduction to machine translation
- Statistical approaches
- Use of parallel data
- Alignment
- What functions must be optimized?
- Comparison of A* and greedy local search (hill climbing) algorithms for translation
- How they work
- Their performance
4 Approach to Statistical MT
- Translate from past experience
- Observe how words, phrases, and sentences are translated
- Given new sentences in the source language, choose the most probable translation in the target language
- Data: a large corpus of parallel text
- E.g., Canadian parliamentary proceedings
5 Data
- Example
- Ce n'est pas clair.
- It is not clear.
- Quantity
- 200 billion words (2004 MT evaluation)
- Sources
- Hansards: Canadian parliamentary proceedings
- Hong Kong: official documents published in multiple languages
- Newspapers published in multiple languages
- Religious and literary works
6 Alignment: the first step
- Which sentences or paragraphs in one language correspond to which paragraphs or sentences in another language? (Or which words?)
- Problems
- Translators don't use word-for-word translations
- Crossing alignments
- Types of alignment
- 1:1 (90% of the cases)
- 1:2, 2:1
- 3:1, 1:3
7 An example of 2:2 alignment
8
- Fertility: a word may be translated by more than one word
- Notamment - in particular (fertility 2)
- Limonades - soft drinks
- Fertility 0: a word translated by zero words
- Des ventes - sales
- Les boissons à base de cola - cola drinks
- Many to many
- Elles rencontrent toujours plus d'adeptes - The growing popularity
9 Bead for sentence alignment
- A group of sentences in one language that corresponds in content to some group of sentences in the other language
- Either group can be empty
- How much content has to overlap between sentences to count as an alignment?
- An overlapping clause can be sufficient
10 Methods for alignment
- Length based
- Offset alignment
- Word based
- Anchors (e.g., cognates)
11 Word-Based Alignment
- Assume the first and last sentences of the texts align (anchors).
- Then, until most sentences are aligned (a sketch of the envelope filtering follows below):
- Form an envelope of possible alignments from the Cartesian product of the two lists of sentences
- Exclude alignments if they cross anchors or are too distant
- Choose pairs of words that tend to co-occur in alignments
- Find pairs of source and target sentences which contain many possible lexical correspondences
- The most reliable pairs augment the set of anchors
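A minimal sketch of the envelope-filtering step in Python, assuming anchors are given as (source index, target index) pairs; the function name and the max_offset threshold are illustrative, not from the paper:

    def candidate_pairs(n_source, n_target, anchors, max_offset=5):
        # Yield (i, j) sentence pairs from the Cartesian product that
        # neither cross an anchor nor lie too far from the diagonal.
        for i in range(n_source):
            for j in range(n_target):
                # An anchor (a, b) is crossed when i and j fall on
                # opposite sides of it.
                if any((i < a) != (j < b) for a, b in anchors):
                    continue
                # Exclude pairs whose positions differ too much.
                if abs(i - j) > max_offset:
                    continue
                yield (i, j)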
12 The Noisy Channel Model for MT
[Diagram: a noisy channel turns English e into French f]
- Language model: P(e)
- Translation model: P(f|e)
- Decoder: ê = argmax_e P(e|f) = argmax_e P(e) P(f|e)
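In log space the decoder's argmax becomes a sum of the two model scores. A minimal sketch, assuming hypothetical lm_logprob (log P(e)) and tm_logprob (log P(f|e)) functions and a finite list of candidate translations:

    def decode(f, candidates, lm_logprob, tm_logprob):
        # Noisy-channel argmax: choose e maximizing log P(e) + log P(f|e).
        return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(f, e))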
13 The problem
- Language model: constructed from a large corpus of English
- Bigram model: probability of word pairs
- Trigram model: probability of 3 words in a row
- From these, compute sentence probability
- Translation model: can be derived from alignment
- For any pair of English/French words, what is the probability that pair is a translation?
- Decoding is the problem: given an unseen French sentence, how do we determine the translation?
14 Language Model
- Predict the next word given the previous words
- P(w_n | w_1 ... w_{n-1})
- Markov assumption
- Only the last few words affect the next word
- Usual cases: bigram, trigram, 4-gram
- Sue swallowed the large green ___.
- Parameter estimation (20,000-word vocabulary)
- Bigram: 20,000 × 19,000 ≈ 400 million
- Trigram: 20,000² × 19,000 ≈ 8 trillion
- 4-gram: 20,000³ × 19,000 ≈ 1.6 × 10^17
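To make the Markov assumption concrete, here is a minimal bigram model sketch with maximum-likelihood counts over a toy two-sentence corpus (the corpus is invented for illustration; a real model needs smoothing for unseen pairs):

    from collections import Counter

    corpus = ["<s> sue swallowed the large green pill </s>".split(),
              "<s> sue swallowed the pill </s>".split()]

    bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
    unigrams = Counter(w for s in corpus for w in s)

    def p_bigram(w, prev):
        # Maximum-likelihood estimate of P(w | prev).
        return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

    def sentence_prob(sentence):
        # P(sentence) under the bigram Markov assumption.
        p = 1.0
        for prev, w in zip(sentence, sentence[1:]):
            p *= p_bigram(w, prev)
        return p

    print(sentence_prob("<s> sue swallowed the pill </s>".split()))  # 0.5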
15 Translation Model
- For a particular word alignment, multiply the m translation probabilities
- P(Jean aime Marie | John loves Mary) ≈
- P(Jean | John) × P(aime | loves) × P(Marie | Mary)
- Then sum the probabilities of all alignments
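A toy sketch of this computation, summing the product of word-translation probabilities over all one-to-one alignments; the probability table t is invented for the example, and a full model such as IBM Model 1 would also handle fertility and a NULL word:

    from itertools import permutations
    from math import prod

    # Invented word-translation probabilities P(f_word | e_word).
    t = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.9}

    def p_alignment(f_words, e_words, alignment):
        # Multiply translation probabilities along one alignment, where
        # alignment[i] gives the English position for f_words[i].
        return prod(t.get((f, e_words[a]), 1e-6)
                    for f, a in zip(f_words, alignment))

    def p_f_given_e(f_words, e_words):
        # Sum the alignment probabilities over all one-to-one alignments.
        return sum(p_alignment(f_words, e_words, a)
                   for a in permutations(range(len(e_words)), len(f_words)))

    print(p_f_given_e(["Jean", "aime", "Marie"], ["John", "loves", "Mary"]))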
16 Decoding is NP-complete
- When considering any word re-ordering
- Swapped words
- Words with fertility n (insertions)
- Words with fertility 0 (deletions)
- Usual strategy: examine a subset of likely possibilities and choose from that
- Search error: the decoder returns e, but there exists some e' such that P(e'|f) > P(e|f)
17 Example Decoding Errors
- Search error:
- Source: Permettez que je donne un exemple à la chambre.
- Human translation: Let me give the House one example.
- Decoder output: Let me give an example in the House.
- Model error:
- Source: Vous avez besoin de toute l'aide disponible.
- Human translation: You need all the help you can get.
- Decoder output: You need of the whole benefits available.
18 Search
- Traditional decoding method: stack decoder
- A* algorithm
- Deeply explore each hypothesis
- Fast greedy algorithm
- Much faster than A*
- How often does it fail?
- Integer Programming Method
- Transform to Traveling Salesman (see paper)
- Very slow
- Guaranteed to find the best choice
19 Large branching factors
- Machine translation
- Input: a sequence of n words, each with up to 200 possible target-word translations
- Output: a sequence of m words in the target language that has a high score under some goodness criterion
- Search space
- A 6-word French sentence has 10^300 distinct translation scores under the IBM Model 4 translation model (Soricut, Knight, Marcu, AMTA 2002)
20 Stack decoder (A*)
- Initialize the stack with an empty hypothesis
- Loop
- Pop h, the best hypothesis, off the stack
- If h is a complete sentence, output h and terminate
- For each possible next word w, extend h by adding w and push the resulting hypothesis onto the stack
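A minimal Python sketch of this loop using a priority queue; score (higher is better), next_words, and is_complete are hypothetical hooks, and a real A* decoder would add an admissible completion heuristic and pruning:

    import heapq

    def stack_decode(score, next_words, is_complete, max_len=25):
        heap = [(0.0, [])]                 # (negated score, partial hypothesis)
        while heap:
            neg_score, h = heapq.heappop(heap)   # pop the best hypothesis
            if is_complete(h):
                return h                   # first complete pop is the answer
            if len(h) >= max_len:
                continue
            for w in next_words(h):        # extend h by each possible next word
                h2 = h + [w]
                heapq.heappush(heap, (-score(h2), h2))
        return None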
21 Complications
- It's not a simple left-to-right translation
- Because we multiply probabilities as we add words, shorter hypotheses will always win
- Use multiple stacks, one for each length (see the sketch after this list)
- Given fertility possibilities, when we add a new target word for an input source word, how many do we add?
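One way to realize the one-stack-per-length idea, sketched with the same hypothetical score / next_words / is_complete hooks as before and an invented beam width:

    import heapq

    def multi_stack_decode(score, next_words, is_complete, max_len=25, beam=10):
        # One stack per hypothesis length, so hypotheses only compete with
        # others of the same length and short ones cannot dominate.
        stacks = {0: [(0.0, [])]}          # length -> heap of (neg score, hyp)
        best = None
        for length in range(max_len):
            for neg, h in heapq.nsmallest(beam, stacks.get(length, [])):
                for w in next_words(h):
                    h2 = h + [w]
                    if is_complete(h2) and (best is None or
                                            score(h2) > score(best)):
                        best = h2
                    heapq.heappush(stacks.setdefault(length + 1, []),
                                   (-score(h2), h2))
        return best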
22 Example
23 Hill climbing
- function HillClimbing(problem, initial-state, queuing-fn)
- node ← MakeNode(initial-state(problem))
- while T do
- next ← Best(SearchOperator-fn(node, cost-fn))
- if IsBetter-fn(next, node) then node ← next; continue
- else if GoalTest(node) then return node
- else exit
- end while
- return Failure

MT instantiation (Germann et al., ACL-2001):
node ← targetGloss(sourceSentence)
while T do
  next ← Best(LocallyModifiedTranslationOf(node))
  if IsBetter(next, node) then node ← next; continue
  else print node; exit
end while
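The loop above maps directly onto a few lines of Python; a minimal sketch, assuming a neighbors function that yields the locally modified translations and a score function where higher is better:

    def hill_climb(initial, neighbors, score):
        # Greedy local search: repeatedly move to the best neighbor,
        # stopping at a local optimum.
        node = initial
        while True:
            candidates = list(neighbors(node))
            if not candidates:
                return node
            best = max(candidates, key=score)
            if score(best) <= score(node):
                return node            # no improving neighbor: local optimum
            node = best

For MT, initial is the word-for-word gloss of the source sentence and neighbors applies the change operators listed on the next slide.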
24 Types of changes
- Translate one or two words (j1, e1, j2, e2)
- Translate and insert (j, e1, e2)
- Remove word of fertility 0 (i)
- Swap segments (i1, i2, j1, j2)
- Join words (i1, i2)
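As one concrete illustration, the swap-segments operator can be written as a pure list operation; this sketch assumes half-open index ranges [i1, i2) and [j1, j2) with i2 <= j1, which is an assumption about the indexing, not necessarily the paper's convention:

    def swap_segments(words, i1, i2, j1, j2):
        # Swap the non-overlapping segments words[i1:i2] and words[j1:j2],
        # assuming i1 <= i2 <= j1 <= j2 (half-open Python slices).
        return (words[:i1] + words[j1:j2] + words[i2:j1]
                + words[i1:i2] + words[j2:])

    # e.g. swap_segments(["a", "b", "c", "d", "e"], 0, 1, 3, 5)
    # -> ["d", "e", "b", "c", "a"]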
25 Example
- Total of 77,421 possible translations attempted
28 How to search better?
- MakeNode(initial-state(problem))
- RemoveFront(Q)
- SearchOperator-fn(node, cost-fn)
- queuing-fn(problem, Q, (Next,Cost))
29 Example 1: Greedy Search
MakeNode(initial-state(problem))

Machine Translation (Marcu and Wong, EMNLP-2002):
node ← targetGloss(sourceSentence)
while T do
  next ← Best(LocallyModifiedTranslationOf(node))
  if IsBetter(next, node) then node ← next; continue
  else print node; exit
end while
30 Climbing the wrong peak
Which sentence is more grammatical?
1. better bart than madonna , i say
2. i say better than bart madonna ,

Can you make a sentence with these words?
a and apparently as be could dissimilar firing identical neural really so things thought two
31 Language-model stress-testing
- Input: a bag of words
- Output: the best sequence according to a linear combination of
- an n-gram LM
- a syntax-based LM (Collins, 1997)
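For the small bags used here (3-7 words), this stress test can be run by brute force; a minimal sketch, where logprob is a hypothetical stand-in for the combined LM score:

    from itertools import permutations

    def best_order(bag, logprob):
        # Exhaustively score every ordering of the bag (7 words -> 5,040
        # permutations) and return the highest-scoring sequence.
        return max(permutations(bag), key=logprob)

    # e.g. best_order(["better", "bart", "than", "madonna", ",", "i", "say"],
    #                 logprob)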
32 Size: 3-7 words long
- Best searched
- 32.3: i say better than bart madonna ,
- Original word order
- 41.6: better bart than madonna , i say
- SBLM trained on an additional 160k WSJ sentences.
33 End of Class Questions