Search Applications: Machine Translation

Slides: 34
Provided by: Kathleen268



Search Applications: Machine Translation
  • Next time: Constraint Satisfaction
  • Reading for today: see Machine Translation
    paper under links
  • Reading for next time: Chapter 5

Homework Questions?

  • Introduction to machine translation
  • Statistical approaches
  • Use of parallel data
  • Alignment
  • What functions must be optimized?
  • Comparison of A* and greedy local search (hill
    climbing) algorithms for translation
  • How they work
  • Their performance

Approach to Statistical MT
  • Translate from past experience
  • Observe how words, phrases, and sentences are
    translated
  • Given new sentences in the source language,
    choose the most probable translation in the
    target language
  • Data: large corpus of parallel text
  • E.g., Canadian parliamentary proceedings

  • Example
  • Ce n'est pas clair.
  • It is not clear.
  • Quantity
  • 200 billion words (2004 MT evaluation)
  • Sources
  • Hansards: Canadian parliamentary proceedings
  • Hong Kong official documents published in
    multiple languages
  • Newspapers published in multiple languages
  • Religious and literary works

Alignment: the first step
  • Which sentences or paragraphs in one language
    correspond to which paragraphs or sentences in
    another language? (Or what words?)
  • Problems
  • Translators don't use word-for-word translations
  • Crossing alignments
  • Types of alignment
  • 1:1 (90% of the cases)
  • 1:2, 2:1
  • 3:1, 1:3

An example of 2:2 alignment
  • Fertility: a word may be translated by more than
    1 word
  • Notamment - in particular (fertility 2)
  • Limonades - soft drinks
  • Fertility 0: a word translated by 0 words
  • Des ventes - sales
  • Les boissons à base de cola - cola drinks
  • Many to many
  • Elles rencontrent toujours plus d'adeptes - The
    growing popularity

Bead for sentence alignment
  • A group of sentences in one language that
    corresponds in content to some group of sentences
    in the other language
  • Either group can be empty
  • How much content has to overlap between sentences
    to count it as alignment?
  • An overlapping clause can be sufficient

Methods for alignment
  • Length based
  • Offset alignment
  • Word based
  • Anchors (e.g., cognates)

Word Based Alignment
  • Assume first and last sentences of the texts
    align (anchors).
  • Then, until most sentences are aligned:
  • Form an envelope of alignments from the Cartesian
    product of the list of sentences
  • Exclude alignments if they cross anchors or too
  • Choose pairs of words that tend to occur in
  • Find pairs of source and target sentences which
    contain many possible lexical correspondences.
  • The most reliable augment the set of anchors

The Noisy Channel Model for MT
  • Language model: $P(e)$
  • Translation model: $P(f \mid e)$
  • Decoder: $\hat{e} = \arg\max_e P(e \mid f)$

The problem
  • Language model constructed from a large corpus of
  • Bigram model: probability of word pairs
  • Trigram model: probability of 3 words in a row
  • From these, compute sentence probability
  • Translation model can be derived from alignment
  • For any pair of English/French words, what is the
    probability that pair is a translation?
  • Decoding is the problem: given an unseen French
    sentence, how do we determine the translation?
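
A minimal sketch of the decoding rule, assuming an explicit list of candidate translations. By Bayes' rule, maximizing P(e|f) is the same as maximizing P(e) P(f|e), because P(f) is constant for a given input. The tables LM and TM and every number in them are hypothetical, invented only for illustration.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
LM = {  # language model P(e)
    "it is not clear": 0.02,
    "it not is clear": 0.0001,
    "this is not clear": 0.01,
}
TM = {  # translation model P(f | e) for the single French input used below
    ("ce n'est pas clair", "it is not clear"): 0.3,
    ("ce n'est pas clair", "it not is clear"): 0.3,
    ("ce n'est pas clair", "this is not clear"): 0.2,
}

def decode(f, candidates):
    """Return argmax_e P(e) * P(f | e) over an explicit candidate list."""
    def log_score(e):
        # Work in log space to avoid underflow on longer sentences.
        return math.log(LM[e]) + math.log(TM[(f, e)])
    return max(candidates, key=log_score)

best = decode("ce n'est pas clair", list(LM))
print(best)
```

A real decoder cannot enumerate candidates like this; the search methods on the later slides exist because the candidate space is far too large.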

Language Model
  • Predict the next word given the previous words
  • $P(w_n \mid w_1 \ldots w_{n-1})$
  • Markov assumption
  • Only the last few words affect the next word
  • Usual cases: bigram, trigram, 4-gram
  • Sue swallowed the large green ___.
  • Parameter estimation
  • Bigram: $20{,}000 \times 19{,}999 \approx 400$ million
  • Trigram: $20{,}000^2 \times 19{,}999 \approx 8$ trillion
  • 4-gram: $20{,}000^3 \times 19{,}999 \approx 1.6 \times 10^{17}$
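
The bigram case can be sketched as follows, assuming unsmoothed maximum-likelihood estimates and sentence boundary markers `<s>` and `</s>` (both are assumptions, not taken from the slides). The helper free_parameters is also hypothetical; it assumes each history leaves V - 1 free probabilities, which reproduces the rounded totals for a 20,000-word vocabulary.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # every token that serves as a history
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    return unigrams, bigrams

def sentence_prob(sentence, unigrams, bigrams):
    """Product of P(w_n | w_{n-1}) = c(w_{n-1}, w_n) / c(w_{n-1}), without smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0                            # unseen history
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

def free_parameters(vocab_size, n):
    """V^(n-1) histories, each with V - 1 free outcome probabilities."""
    return vocab_size ** (n - 1) * (vocab_size - 1)

corpus = ["sue swallowed the large green pill", "sue swallowed the pill"]
uni, bi = train_bigram(corpus)
p = sentence_prob("sue swallowed the pill", uni, bi)
print(p, free_parameters(20_000, 2), free_parameters(20_000, 3))
```

Without smoothing, any sentence containing an unseen bigram gets probability zero, which is why practical language models smooth their estimates.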

Translation Model
  • For a particular word alignment, multiply the m
    translation probabilities
  • $P(\text{Jean aime Marie} \mid \text{John loves Mary})$
  • $P(\text{Jean} \mid \text{John}) \times P(\text{aime} \mid \text{loves}) \times P(\text{Marie} \mid \text{Mary})$
  • Then sum the probabilities of all alignments
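
The two steps above (multiply along one alignment, then sum over alignments) can be sketched as below. This is a simplified illustration, assuming every French word aligns to exactly one English word and ignoring alignment priors, the empty word, and normalization. The table T, the default value 0.01, and all numbers are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical word-translation table t(f | e); the numbers are invented.
T = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.9}

def t(f_word, e_word):
    """Word-translation probability, with a small default for unlisted pairs."""
    return T.get((f_word, e_word), 0.01)

def alignment_prob(f_words, e_words, alignment):
    """Product of the m word-translation probabilities for one alignment.

    alignment[j] is the index of the English word that French word j aligns to.
    """
    return prod(t(f, e_words[a]) for f, a in zip(f_words, alignment))

def translation_prob(f_words, e_words):
    """Sum alignment_prob over every assignment of French words to English positions."""
    positions = range(len(e_words))
    return sum(alignment_prob(f_words, e_words, a)
               for a in product(positions, repeat=len(f_words)))

f = "Jean aime Marie".split()
e = "John loves Mary".split()
monotone = alignment_prob(f, e, (0, 1, 2))   # Jean-John, aime-loves, Marie-Mary
total = translation_prob(f, e)
print(monotone, total)
```

Enumerating alignments is exponential in sentence length, so this form is only usable on very short sentences.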

Decoding is NP-complete
  • When considering any word re-ordering
  • Swapped words
  • Words with fertility n (insertions)
  • Words with fertility 0 (deletions)
  • Usual strategy: examine a subset of likely
    possibilities and choose from that
  • Search error: decoder returns $e$ but there exists
    some $e'$ s.t. $P(e' \mid f) > P(e \mid f)$

Example Decoding Errors
  • Search error
  • Permettez que je donne un exemple à la chambre.
  • Let me give the House one example.
  • Let me give an example in the House
  • Model error
  • Vous avez besoin de toute l'aide disponible.
  • You need all the help you can get.
  • You need of the whole benefits available.

  • Traditional decoding method: stack decoder
  • A* algorithm
  • Deeply explore each hypothesis
  • Fast greedy algorithm
  • Much faster than A*
  • How often does it fail?
  • Integer programming method
  • Transform to Traveling Salesman (see paper)
  • Very slow
  • Guaranteed to find the best choice

Large branching factors
  • Machine Translation
  • Input: sequence of n words, each with up to 200
    possible target word translations.
  • Output: sequence of m words in the target
    language that has high score under some goodness
  • Search space
  • A 6-word French sentence has $10^{300}$ distinct
    translation scores under the IBM M4 translation
    model [Soricut, Knight, Marcu, AMTA 2002].

Stack decoder: A*
  • Initialize the stack with an empty hypothesis
  • Loop
  • Pop h, the best hypothesis, off the stack
  • If h is a complete sentence, output h and
    terminate
  • For each possible next word w, extend h by adding
    w and push the resulting hypothesis onto the
    stack
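
The loop above can be sketched with a priority queue standing in for the stack. This is a simplified illustration under strong assumptions: translation is monotone, every source word has fertility 1, and no heuristic estimate of the remaining cost is used (so it is best-first search on accumulated cost, i.e. A* with a zero heuristic). The tables CANDIDATES and BIGRAM and all their numbers are hypothetical.

```python
import heapq
import math

# Hypothetical toy models; all probabilities are invented for illustration.
CANDIDATES = {"ce": {"it": 0.6, "this": 0.4},
              "est": {"is": 0.9, "be": 0.1},
              "clair": {"clear": 0.7, "light": 0.3}}
BIGRAM = {("<s>", "it"): 0.5, ("<s>", "this"): 0.3, ("it", "is"): 0.6,
          ("this", "is"): 0.5, ("is", "clear"): 0.4, ("is", "light"): 0.1}

def step_cost(prev, word, src):
    """Negative log of translation probability times bigram probability."""
    p = CANDIDATES[src][word] * BIGRAM.get((prev, word), 0.01)
    return -math.log(p)

def stack_decode(source):
    """Pop the cheapest hypothesis, output it if complete, else extend it by one word."""
    stack = [(0.0, ())]                       # (cost so far, target words so far)
    while stack:
        cost, hyp = heapq.heappop(stack)      # best hypothesis off the stack
        if len(hyp) == len(source):           # complete sentence: output and stop
            return list(hyp), cost
        src = source[len(hyp)]
        prev = hyp[-1] if hyp else "<s>"
        for word in CANDIDATES[src]:          # each possible next word
            new_cost = cost + step_cost(prev, word, src)
            heapq.heappush(stack, (new_cost, hyp + (word,)))
    return None, math.inf

best, cost = stack_decode(["ce", "est", "clair"])
print(best, math.exp(-cost))
```

Because every step cost is non-negative, the first complete hypothesis popped is the cheapest one in this toy setting. Real stack decoders add reordering, fertility, and pruning, which this sketch leaves out.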

  • It's not a simple left-to-right translation
  • Because we multiply probabilities as we add
    words, shorter hypotheses will always win
  • Use multiple stacks, one for each length
  • Given fertility possibilities, when we add a new
    target word for an input source word, how many do
    we add?

Hill climbing
  • function HillClimbing(problem, initial-state,
  • node ← MakeNode(initial-state(problem))
  • while T do
  • next ← Best(SearchOperator-fn(node, cost-fn))
  • if (IsBetter-fn(next, node)) then node ← next; continue
  • else if (GoalTest(node)) then return node
  • else exit
  • end while
  • return Failure
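
A minimal runnable sketch of the same idea, assuming the caller supplies a neighbor generator and a score function to maximize. The parameter names are hypothetical, and the goal test and Failure branch of the pseudocode are left out for brevity: the function simply returns the local optimum it reaches.

```python
def hill_climbing(initial_state, neighbors, score):
    """Move to the best neighbor while it improves the score; stop at a local optimum."""
    node = initial_state
    while True:
        candidates = list(neighbors(node))
        if not candidates:
            return node
        best = max(candidates, key=score)
        if score(best) > score(node):
            node = best          # accept the improvement and keep climbing
        else:
            return node          # no neighbor is better: local optimum

# Usage: maximize a one-dimensional function over the integers.
def f(x):
    return -(x - 3) ** 2

result = hill_climbing(10, lambda x: [x - 1, x + 1], f)
print(result)
```

On a function with several peaks the same loop stops at whichever peak is nearest the starting point, which is the failure mode the later slides call climbing the wrong peak.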

MT (Germann et al., ACL-2001)
  • node ← targetGloss(sourceSentence)
  • while T do
  • next ← Best(LocallyModifiedTranslationOf(node))
  • if (IsBetter(next, node)) then node ← next; continue
  • else print node; exit
  • end while

Types of changes
  • Translate one or two words (j1, e1, j2, e2)
  • Translate and insert (j, e1, e2)
  • Remove word of fertility 0 (i)
  • Swap segments (i1, i2, j1, j2)
  • Join words (i1, i2)

  • Total of 77,421 possible translations attempted

How to search better?
  • MakeNode(initial-state(problem))
  • RemoveFront(Q)
  • SearchOperator-fn(node, cost-fn)
  • queuing-fn(problem, Q, (Next,Cost))
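
The four components listed above fit into one generic search loop, sketched below. The point of the sketch is that swapping the queuing function changes the search strategy while the loop stays the same. The function names, the toy GRAPH, and its edge costs are hypothetical.

```python
from collections import deque

def general_search(initial_state, is_goal, successors, queuing_fn):
    """Generic search loop; queuing_fn decides where new nodes enter the queue."""
    queue = deque([(initial_state, 0)])            # MakeNode: (state, path cost)
    while queue:
        state, cost = queue.popleft()              # RemoveFront(Q)
        if is_goal(state):
            return state, cost
        children = [(s, cost + c) for s, c in successors(state)]  # search operators
        queue = queuing_fn(queue, children)
    return None

def enqueue_at_end(queue, children):
    """Breadth-first: new nodes go to the back of the queue."""
    queue.extend(children)
    return queue

def enqueue_by_cost(queue, children):
    """Uniform-cost: keep the queue sorted by accumulated path cost."""
    return deque(sorted(list(queue) + children, key=lambda item: item[1]))

# Hypothetical toy graph: state -> list of (next state, step cost).
GRAPH = {"S": [("A", 1), ("B", 5)], "A": [("G", 10)], "B": [("G", 1)], "G": []}

bfs = general_search("S", lambda s: s == "G", GRAPH.__getitem__, enqueue_at_end)
ucs = general_search("S", lambda s: s == "G", GRAPH.__getitem__, enqueue_by_cost)
print(bfs, ucs)
```

Breadth-first returns the first goal it reaches, here via the expensive path, while the cost-ordered queue finds the cheaper one.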

Example 1: Greedy Search
  • MakeNode(initial-state(p

Machine Translation (Marcu and Wong, EMNLP-2002)
  • node ← targetGloss(sourceSentence)
  • while T do
  • next ← Best(LocallyModifiedTranslationOf(node))
  • if (IsBetter(next, node)) then node ← next; continue
  • else print node; exit
  • end while

Climbing the wrong peak
  • Which sentence is more grammatical?
  • 1. better bart than madonna , i say
  • 2. i say better than bart madonna ,
  • Can you make a sentence with these words?
  • a and apparently as be could dissimilar firing
    identical neural really so things thought
Language-model stress-testing
  • Input: bag of words
  • Output: best sequence according to a linear
    combination of an
  • n-gram LM
  • syntax-based LM (Collins, 1997)
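
A toy version of this stress test, assuming only a bigram score (no syntax-based LM) and a swap-two-words neighborhood. It compares greedy hill climbing over word order with exhaustive search, which is feasible only for very short bags. The table BIGRAM, its scores, and the default of -5.0 are hypothetical.

```python
from itertools import combinations, permutations

# Hypothetical bigram log-scores, invented for illustration; unlisted pairs get -5.0.
BIGRAM = {("better", "bart"): -1.0, ("bart", "than"): -1.0,
          ("than", "madonna"): -1.0, ("i", "say"): -0.5,
          ("madonna", "i"): -2.0, ("say", "better"): -1.5}

def lm_score(words):
    """Sum of bigram log-scores over adjacent word pairs (higher is better)."""
    return sum(BIGRAM.get(pair, -5.0) for pair in zip(words, words[1:]))

def swap_neighbors(words):
    """All orderings reachable by swapping two positions."""
    for i, j in combinations(range(len(words)), 2):
        swapped = list(words)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        yield swapped

def greedy_order(bag):
    """Hill climbing over word order: take the best swap while it improves the score."""
    node = list(bag)
    while True:
        best = max(swap_neighbors(node), key=lm_score)
        if lm_score(best) <= lm_score(node):
            return node          # no swap improves the score: local optimum
        node = best

def exhaustive_order(bag):
    """Try every permutation and keep the highest-scoring one."""
    return list(max(permutations(bag), key=lm_score))

bag = ["say", "madonna", "better", "i", "than", "bart"]
greedy = greedy_order(bag)
optimal = exhaustive_order(bag)
print(lm_score(greedy), greedy)
print(lm_score(optimal), optimal)
```

Whenever the greedy score falls below the exhaustive one, the gap is a search error in the sense defined earlier. If the exhaustive winner is itself a poor sentence, the fault lies with the model rather than the search.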

Size: 3-7 words long
  • Best searched: 32.3, i say better than bart madonna ,
  • Original word order: 41.6, better bart than madonna , i say

SBLM trained on an additional 160k WSJ
End of Class Questions