Title: Search Applications: Machine Translation
1 Search Applications: Machine Translation
- Next time: Constraint Satisfaction
- Reading for today: see the Machine Translation paper under links
- Reading for next time: Chapter 5
2 Homework Questions?
3 Agenda
- Introduction to machine translation
- Statistical approaches
- Use of parallel data
- Alignment
- What functions must be optimized?
- Comparison of A* and greedy local search (hill climbing) algorithms for translation
- How they work
- Their performance
4 Approach to Statistical MT
- Translate from past experience
- Observe how words, phrases, and sentences are translated
- Given new sentences in the source language, choose the most probable translation in the target language
- Data: a large corpus of parallel text
- E.g., Canadian parliamentary proceedings
5 Data
- Example
- Ce n'est pas clair.
- It is not clear.
- Quantity
- 200 billion words (2004 MT evaluation)
- Sources
- Hansards: Canadian parliamentary proceedings
- Hong Kong: official documents published in multiple languages
- Newspapers published in multiple languages
- Religious and literary works
6 Alignment: the first step
- Which sentences or paragraphs in one language correspond to which paragraphs or sentences in another language? (Or which words?)
- Problems
- Translators don't use word-for-word translations
- Crossing alignments
- Types of alignment
- 1:1 (90% of the cases)
- 1:2, 2:1
- 3:1, 1:3
7 An example of 2:2 alignment
8
- Fertility: a word may be translated by more than one word
- Notamment - in particular (fertility 2)
- Limonades - soft drinks
- Fertility 0: a word translated by zero words
- Des ventes - sales
- Les boissons à base de cola - cola drinks
- Many to many
- Elles rencontrent toujours plus d'adeptes - The growing popularity
9 Bead for sentence alignment
- A group of sentences in one language that corresponds in content to some group of sentences in the other language
- Either group can be empty
- How much content has to overlap between sentences to count as an alignment?
- An overlapping clause can be sufficient
10 Methods for alignment
- Length based
- Offset alignment
- Word based
- Anchors (e.g., cognates)
11 Word-Based Alignment
- Assume the first and last sentences of the texts align (anchors).
- Then, until most sentences are aligned (a sketch of the envelope filtering follows below):
- Form an envelope of possible alignments from the Cartesian product of the two lists of sentences
- Exclude alignments if they cross anchors or are too distant
- Choose pairs of words that tend to co-occur in alignments
- Find pairs of source and target sentences which contain many possible lexical correspondences
- The most reliable pairs augment the set of anchors
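A minimal sketch of the envelope-filtering step in Python, assuming anchors are given as (source index, target index) pairs; the function name and the max_offset threshold are illustrative, not from the paper:

    def candidate_pairs(n_source, n_target, anchors, max_offset=5):
        # Yield (i, j) sentence pairs from the Cartesian product that
        # neither cross an anchor nor lie too far from the diagonal.
        for i in range(n_source):
            for j in range(n_target):
                # An anchor (a, b) is crossed when i and j fall on
                # opposite sides of it.
                if any((i < a) != (j < b) for a, b in anchors):
                    continue
                # Exclude pairs whose positions differ too much.
                if abs(i - j) > max_offset:
                    continue
                yield (i, j)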
12 The Noisy Channel Model for MT
[Diagram: a noisy channel turns English e into French f]
- Language model: P(e)
- Translation model: P(f|e)
- Decoder: ê = argmax_e P(e|f) = argmax_e P(e) P(f|e)
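In log space the decoder's argmax becomes a sum of the two model scores. A minimal sketch, assuming hypothetical lm_logprob (log P(e)) and tm_logprob (log P(f|e)) functions and a finite list of candidate translations:

    def decode(f, candidates, lm_logprob, tm_logprob):
        # Noisy-channel argmax: choose e maximizing log P(e) + log P(f|e).
        return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(f, e))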
13 The problem
- Language model: constructed from a large corpus of English
- Bigram model: probability of word pairs
- Trigram model: probability of 3 words in a row
- From these, compute sentence probability
- Translation model: can be derived from alignment
- For any pair of English/French words, what is the probability that pair is a translation?
- Decoding is the problem: given an unseen French sentence, how do we determine the translation?
14 Language Model
- Predict the next word given the previous words
- P(w_n | w_1 ... w_{n-1})
- Markov assumption
- Only the last few words affect the next word
- Usual cases: bigram, trigram, 4-gram
- Sue swallowed the large green ___.
- Parameter estimation (20,000-word vocabulary)
- Bigram: 20,000 × 19,000 ≈ 400 million
- Trigram: 20,000² × 19,000 ≈ 8 trillion
- 4-gram: 20,000³ × 19,000 ≈ 1.6 × 10^17
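To make the Markov assumption concrete, here is a minimal bigram model sketch with maximum-likelihood counts over a toy two-sentence corpus (the corpus is invented for illustration; a real model needs smoothing for unseen pairs):

    from collections import Counter

    corpus = ["<s> sue swallowed the large green pill </s>".split(),
              "<s> sue swallowed the pill </s>".split()]

    bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
    unigrams = Counter(w for s in corpus for w in s)

    def p_bigram(w, prev):
        # Maximum-likelihood estimate of P(w | prev).
        return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

    def sentence_prob(sentence):
        # P(sentence) under the bigram Markov assumption.
        p = 1.0
        for prev, w in zip(sentence, sentence[1:]):
            p *= p_bigram(w, prev)
        return p

    print(sentence_prob("<s> sue swallowed the pill </s>".split()))  # 0.5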
15 Translation Model
- For a particular word alignment, multiply the m translation probabilities
- P(Jean aime Marie | John loves Mary) ≈
- P(Jean | John) × P(aime | loves) × P(Marie | Mary)
- Then sum the probabilities of all alignments
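A toy sketch of this computation, summing the product of word-translation probabilities over all one-to-one alignments; the probability table t is invented for the example, and a full model such as IBM Model 1 would also handle fertility and a NULL word:

    from itertools import permutations
    from math import prod

    # Invented word-translation probabilities P(f_word | e_word).
    t = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.9}

    def p_alignment(f_words, e_words, alignment):
        # Multiply translation probabilities along one alignment, where
        # alignment[i] gives the English position for f_words[i].
        return prod(t.get((f, e_words[a]), 1e-6)
                    for f, a in zip(f_words, alignment))

    def p_f_given_e(f_words, e_words):
        # Sum the alignment probabilities over all one-to-one alignments.
        return sum(p_alignment(f_words, e_words, a)
                   for a in permutations(range(len(e_words)), len(f_words)))

    print(p_f_given_e(["Jean", "aime", "Marie"], ["John", "loves", "Mary"]))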
16 Decoding is NP-complete
- When considering any word re-ordering
- Swapped words
- Words with fertility n (insertions)
- Words with fertility 0 (deletions)
- Usual strategy: examine a subset of likely possibilities and choose from that
- Search error: the decoder returns e, but there exists some e' such that P(e'|f) > P(e|f)
17 Example Decoding Errors
- Search error:
- Source: Permettez que je donne un exemple à la chambre.
- Human translation: Let me give the House one example.
- Decoder output: Let me give an example in the House.
- Model error:
- Source: Vous avez besoin de toute l'aide disponible.
- Human translation: You need all the help you can get.
- Decoder output: You need of the whole benefits available.
18 Search
- Traditional decoding method: stack decoder
- A* algorithm
- Deeply explore each hypothesis
- Fast greedy algorithm
- Much faster than A*
- How often does it fail?
- Integer Programming Method
- Transform to Traveling Salesman (see paper)
- Very slow
- Guaranteed to find the best choice
19 Large branching factors
- Machine translation
- Input: a sequence of n words, each with up to 200 possible target-word translations
- Output: a sequence of m words in the target language that has a high score under some goodness criterion
- Search space
- A 6-word French sentence has 10^300 distinct translation scores under the IBM Model 4 translation model (Soricut, Knight, Marcu, AMTA 2002)
20 Stack decoder (A*)
- Initialize the stack with an empty hypothesis
- Loop
- Pop h, the best hypothesis, off the stack
- If h is a complete sentence, output h and terminate
- For each possible next word w, extend h by adding w and push the resulting hypothesis onto the stack
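A minimal Python sketch of this loop using a priority queue; score (higher is better), next_words, and is_complete are hypothetical hooks, and a real A* decoder would add an admissible completion heuristic and pruning:

    import heapq

    def stack_decode(score, next_words, is_complete, max_len=25):
        heap = [(0.0, [])]                 # (negated score, partial hypothesis)
        while heap:
            neg_score, h = heapq.heappop(heap)   # pop the best hypothesis
            if is_complete(h):
                return h                   # first complete pop is the answer
            if len(h) >= max_len:
                continue
            for w in next_words(h):        # extend h by each possible next word
                h2 = h + [w]
                heapq.heappush(heap, (-score(h2), h2))
        return None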
21 Complications
- It's not a simple left-to-right translation
- Because we multiply probabilities as we add words, shorter hypotheses will always win
- Use multiple stacks, one for each length (see the sketch after this list)
- Given fertility possibilities, when we add a new target word for an input source word, how many do we add?
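One way to realize the one-stack-per-length idea, sketched with the same hypothetical score / next_words / is_complete hooks as before and an invented beam width:

    import heapq

    def multi_stack_decode(score, next_words, is_complete, max_len=25, beam=10):
        # One stack per hypothesis length, so hypotheses only compete with
        # others of the same length and short ones cannot dominate.
        stacks = {0: [(0.0, [])]}          # length -> heap of (neg score, hyp)
        best = None
        for length in range(max_len):
            for neg, h in heapq.nsmallest(beam, stacks.get(length, [])):
                for w in next_words(h):
                    h2 = h + [w]
                    if is_complete(h2) and (best is None or
                                            score(h2) > score(best)):
                        best = h2
                    heapq.heappush(stacks.setdefault(length + 1, []),
                                   (-score(h2), h2))
        return best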
22 Example
23 Hill climbing
- function HillClimbing(problem, initial-state, queuing-fn)
- node ← MakeNode(initial-state(problem))
- while T do
- next ← Best(SearchOperator-fn(node, cost-fn))
- if IsBetter-fn(next, node) then node ← next; continue
- else if GoalTest(node) then return node
- else exit
- end while
- return Failure

MT instantiation (Germann et al., ACL-2001):
node ← targetGloss(sourceSentence)
while T do
  next ← Best(LocallyModifiedTranslationOf(node))
  if IsBetter(next, node) then node ← next; continue
  else print node; exit
end while
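The loop above maps directly onto a few lines of Python; a minimal sketch, assuming a neighbors function that yields the locally modified translations and a score function where higher is better:

    def hill_climb(initial, neighbors, score):
        # Greedy local search: repeatedly move to the best neighbor,
        # stopping at a local optimum.
        node = initial
        while True:
            candidates = list(neighbors(node))
            if not candidates:
                return node
            best = max(candidates, key=score)
            if score(best) <= score(node):
                return node            # no improving neighbor: local optimum
            node = best

For MT, initial is the word-for-word gloss of the source sentence and neighbors applies the change operators listed on the next slide.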
24 Types of changes
- Translate one or two words (j1, e1, j2, e2)
- Translate and insert (j, e1, e2)
- Remove word of fertility 0 (i)
- Swap segments (i1, i2, j1, j2)
- Join words (i1, i2)
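As one concrete illustration, the swap-segments operator can be written as a pure list operation; this sketch assumes half-open index ranges [i1, i2) and [j1, j2) with i2 <= j1, which is an assumption about the indexing, not necessarily the paper's convention:

    def swap_segments(words, i1, i2, j1, j2):
        # Swap the non-overlapping segments words[i1:i2] and words[j1:j2],
        # assuming i1 <= i2 <= j1 <= j2 (half-open Python slices).
        return (words[:i1] + words[j1:j2] + words[i2:j1]
                + words[i1:i2] + words[j2:])

    # e.g. swap_segments(["a", "b", "c", "d", "e"], 0, 1, 3, 5)
    # -> ["d", "e", "b", "c", "a"]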
25 Example
- Total of 77,421 possible translations attempted
28 How to search better?
- MakeNode(initial-state(problem))
- RemoveFront(Q)
- SearchOperator-fn(node, cost-fn)
- queuing-fn(problem, Q, (Next,Cost))
29 Example 1: Greedy Search
MakeNode(initial-state(problem))

Machine Translation (Marcu and Wong, EMNLP-2002):
node ← targetGloss(sourceSentence)
while T do
  next ← Best(LocallyModifiedTranslationOf(node))
  if IsBetter(next, node) then node ← next; continue
  else print node; exit
end while
30 Climbing the wrong peak
Which sentence is more grammatical?
1. better bart than madonna , i say
2. i say better than bart madonna ,

Can you make a sentence with these words?
a and apparently as be could dissimilar firing identical neural really so things thought two
31 Language-model stress-testing
- Input: a bag of words
- Output: the best sequence according to a linear combination of
- an n-gram LM
- a syntax-based LM (Collins, 1997)
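For the small bags used here (3-7 words), this stress test can be run by brute force; a minimal sketch, where logprob is a hypothetical stand-in for the combined LM score:

    from itertools import permutations

    def best_order(bag, logprob):
        # Exhaustively score every ordering of the bag (7 words -> 5,040
        # permutations) and return the highest-scoring sequence.
        return max(permutations(bag), key=logprob)

    # e.g. best_order(["better", "bart", "than", "madonna", ",", "i", "say"],
    #                 logprob)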
32 Size: 3-7 words long
- Best searched
- 32.3: i say better than bart madonna ,
- Original word order
- 41.6: better bart than madonna , i say
- SBLM trained on an additional 160k WSJ sentences.
33 End of Class Questions