An Integrated Phrase Segmentation/Alignment Algorithm for Statistical Machine Translation

1
An Integrated Phrase Segmentation/Alignment
Algorithm for Statistical Machine Translation
  • Joy
  • Advisors: Stephan Vogel and Alex Waibel

2
Outline
  • Background
  • Phrase Alignment Algorithms in SMT
  • Segmentation Approaches
  • Integrated Segmentation and Alignment Algorithm
    (ISA)
  • Experiments
  • Discussions

3
Statistical Machine Translation
  • Statistical Machine Translation (Brown et al., 1993)
  • Noisy Channel Model
  • Translating from F to E
  • Given a test sentence f, generate the translation
    ê, which is
    $\hat{e} = \arg\max_e \Pr(e)\,\Pr(f \mid e)$
  • Pr(e): Language Model (LM)
  • Pr(f|e): Translation Model (TM)
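
The noisy-channel decision rule on this slide can be sketched in a few lines. This is only an illustration: the two probability tables, the candidate list, and the function name `decode` are made-up assumptions, not part of the presentation or of any real decoder.

```python
import math

# Toy, made-up log-probabilities (illustrative assumptions only).
# LM_LOGPROB[e] stands in for log Pr(e); TM_LOGPROB[(f, e)] for log Pr(f|e).
LM_LOGPROB = {
    "the house": math.log(0.6),
    "house the": math.log(0.1),
}
TM_LOGPROB = {
    ("das haus", "the house"): math.log(0.5),
    ("das haus", "house the"): math.log(0.5),
}


def decode(f, candidates):
    """Return the candidate e that maximizes log Pr(e) + log Pr(f|e)."""
    return max(candidates, key=lambda e: LM_LOGPROB[e] + TM_LOGPROB[(f, e)])


best = decode("das haus", ["the house", "house the"])
print(best)
```

Here the translation model scores both word orders equally, so the language model decides between them.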

4
Training
  • Training
  • Using large English corpora (e.g. Wall Street
    Journal) to train an LM
  • Using bilingual corpora (e.g. Canadian Hansard)
    to train the TM
  • To get the building blocks for Pr(f|e)
  • Word to word translation or phrase to phrase
    translations
  • Reordering information
  • Other features

5
Alignment
  • Alignment for one sentence pair (e, f)
  • Suppose e has l words, $e = e_1 \dots e_l$,
  • and f has m words, $f = f_1 \dots f_m$
  • Then an alignment a can be represented as a sequence
    $a = a_1 a_2 \dots a_m$
  • of m values, each between 0 and l.
  • $a_j = i$ means $f_j$ is aligned to $e_i$, where $e_0$
    stands for the NULL word
  • In short, the alignment tells us which word in e is
    translated into which word in f
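
As a small illustration of this representation, the sketch below stores an alignment as a list of m integers and reads off the aligned word pairs. The example sentences and the helper name `aligned_pairs` are assumptions made for illustration.

```python
def aligned_pairs(e_words, f_words, a):
    """Read off (f_j, e_i) pairs from an alignment vector.

    e_words[0] must be the NULL word, so e_words has l + 1 entries.
    a[j] = i means the j-th word of f is aligned to e_words[i].
    """
    l = len(e_words) - 1
    if len(a) != len(f_words):
        raise ValueError("alignment needs one value per word of f")
    if any(not 0 <= i <= l for i in a):
        raise ValueError("each alignment value must lie between 0 and l")
    return [(f_word, e_words[i]) for f_word, i in zip(f_words, a)]


e_words = ["NULL", "the", "house"]   # l = 2, position 0 is the NULL word
f_words = ["das", "haus", "ja"]      # m = 3
a = [1, 2, 0]                        # "ja" has no counterpart, so it maps to NULL
pairs = aligned_pairs(e_words, f_words, a)
print(pairs)
```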

6
Alignment Example
7
Alignment Models
  • Alignment algorithms
  • IBM Models 1 to 5 (Brown et al.)
  • HMM model, similar to IBM Model 2 (Vogel)
  • Competitive linking (Melamed)
  • Flow Network (Gaussier)
  • Others

8
IBM Model 1
  • IBM model 1
  • Easy to train
  • Simple to understand
  • Used very often in MT research
  • One serious problem for IBM models
  • Word-to-word alignment assumption

9
Phrase-to-phrase Alignment
  • Phrase-to-phrase alignment is better
  • Mismatch between languages
  • Phrases encapsulate the context of words
  • Phrases encapsulate local reordering

10
Outline
  • Background
  • Phrase Alignment Algorithms in SMT
  • Segmentation Approaches
  • Integrated Segmentation and Alignment Algorithm
    (ISA)
  • Experiments
  • Discussions

11
Alignment Algorithms
  • Based on initial word alignment
  • Train word alignment
  • Read off phrase-to-phrase alignments from Viterbi
    path
  • Examples
  • HMM phrase alignment (Vogel)
  • Alignment templates from IBM 4 (Och)
  • Bilingual bracketing (Wu, B. Zhao)
  • Popular in SMT research

12
Outline
  • Background
  • Phrase Alignment Algorithms in SMT
  • Segmentation Approaches
  • Integrated Segmentation and Alignment Algorithm
    (ISA)
  • Experiments
  • Discussions

13
Segmentation Approaches
  • Identify monolingual phrases and segment/bracket
    phrases into one unit (super-word) (Zhang 2000)
  • Train the regular word-to-word alignment

14
Problems in Segmentation Approaches
  • Segmentation uses only monolingual information
  • Good segmentations may make alignment even harder

15
Outline
  • Background
  • Phrase Alignment Algorithms in SMT
  • Segmentation Approaches
  • Integrated Segmentation and Alignment Algorithm
    (ISA)
  • Experiments
  • Discussions

16
Integrated Segmentation and Alignment
  • Let's look at an example first

17
Integrated Segmentation and Alignment
  • Represent a sentence pair (e, f) as a matrix D
  • $D(i,j) = I(e_i, f_j)$, where I is a modified point-wise
    mutual information
  • A partition over D is a series of non-overlapping
    rectangular regions $d_1, d_2, \dots, d_m$.
  • Region $d_k(r_s, r_e, c_s, c_e)$ indicates that words
    $e_{r_s} \dots e_{r_e}$ are aligned to $f_{c_s} \dots f_{c_e}$
  • Segmentation and alignment are achieved at the
    same time
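
A rough sketch of how such a matrix D could be filled is given below. The slides do not say how the point-wise mutual information is modified, so this example uses plain PMI estimated from sentence-level co-occurrence counts. The toy corpus and the names `pmi_table` and `build_matrix` are illustrative assumptions.

```python
import math
from collections import Counter


def pmi_table(corpus):
    """Plain PMI from sentence-level co-occurrence (an assumed stand-in for I)."""
    n = len(corpus)
    e_count, f_count, ef_count = Counter(), Counter(), Counter()
    for e_sent, f_sent in corpus:
        e_set, f_set = set(e_sent), set(f_sent)
        e_count.update(e_set)
        f_count.update(f_set)
        ef_count.update((e, f) for e in e_set for f in f_set)
    # PMI(e, f) = log( p(e, f) / (p(e) p(f)) ) with p estimated as count / n
    return {
        (e, f): math.log(c * n / (e_count[e] * f_count[f]))
        for (e, f), c in ef_count.items()
    }


def build_matrix(e_sent, f_sent, pmi, floor=float("-inf")):
    """D[i][j] = I(e_i, f_j); pairs never seen together get `floor`."""
    return [[pmi.get((e, f), floor) for f in f_sent] for e in e_sent]


corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "house"], ["ein", "haus"]),
]
pmi = pmi_table(corpus)
D = build_matrix(["the", "house"], ["das", "haus"], pmi)
for row in D:
    print(row)
```

Rows index the words of e and columns index the words of f, matching the definition of D(i,j) on the slide.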

18
Integrated Segmentation and Alignment
  • Best partition should yield maximum
  • Searching all possible partitions is
    computationally intractable
  • The number of partitions is exponential in
    sentence length
  • Dynamic programming (DP) is not a good fit.
  • "An optimal policy has the property that whatever
    the initial state and the initial decisions are,
    the remaining decisions must constitute an
    optimal policy with regard to the state resulting
    from the first decision." (Richard Bellman's
    Principle of Optimality)
  • But here, the decision of how to expand the first
    cell changes the search space for the rest of the
    cells
  • Instead, use a computationally cheap algorithm to
    find good partitions

19
An Example
20
Computationally Cheap Algorithm
  • Assumption
  • If the translation for e1 e2 is f, then I(e1, f)
    should be very similar to I(e2, f).
  • Algorithm
  • Step 1: find the cell in D with the maximum value of I
  • Step 2: expand this cell to a rectangular region
    in which all cells have an I value similar to
    that of this cell
  • Repeat Step 1 and Step 2 until no more regions can
    be found
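
The two steps above can be sketched as a greedy loop. Several details are assumptions, since the slides do not specify them: "similar" is taken to mean "at least `ratio` times the seed value", a region consumes its rows and columns so that later regions cannot reuse them, and the search stops once the best remaining cell is no larger than `min_value`. The function name `isa_partition` and the toy matrix are also illustrative.

```python
def isa_partition(D, ratio=0.8, min_value=0.0):
    """Greedy sketch: returns regions as (rs, re, cs, ce), inclusive indices."""
    n_rows, n_cols = len(D), len(D[0])
    free_rows, free_cols = set(range(n_rows)), set(range(n_cols))
    regions = []

    def similar(rows, cols, seed):
        # All candidate cells must be unused and close to the seed value.
        return all(
            i in free_rows and j in free_cols and D[i][j] >= ratio * seed
            for i in rows
            for j in cols
        )

    while free_rows and free_cols:
        # Step 1: best remaining cell.
        seed, r, c = max((D[i][j], i, j) for i in free_rows for j in free_cols)
        if seed <= min_value:
            break
        rs = re = r
        cs = ce = c
        # Step 2: grow the rectangle one row or column at a time.
        grown = True
        while grown:
            grown = False
            if rs > 0 and similar([rs - 1], range(cs, ce + 1), seed):
                rs, grown = rs - 1, True
            if re < n_rows - 1 and similar([re + 1], range(cs, ce + 1), seed):
                re, grown = re + 1, True
            if cs > 0 and similar(range(rs, re + 1), [cs - 1], seed):
                cs, grown = cs - 1, True
            if ce < n_cols - 1 and similar(range(rs, re + 1), [ce + 1], seed):
                ce, grown = ce + 1, True
        regions.append((rs, re, cs, ce))
        free_rows -= set(range(rs, re + 1))
        free_cols -= set(range(cs, ce + 1))
    return regions


# Toy matrix: rows are 4 words of e, columns are 3 words of f.
# The first two rows both score high against column 0, as the assumption predicts.
D = [
    [2.0, 0.1, 0.0],
    [1.9, 0.2, 0.1],
    [0.1, 1.5, 0.2],
    [0.0, 0.1, 1.2],
]
regions = isa_partition(D)
print(regions)
```

In this toy case the first region spans two rows and one column, which corresponds to a two-word phrase in e aligned to a single word in f.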

21
Example: Apply the Algorithm
22
Estimate the probabilities for phrase translations
  • The decoder needs the conditional probabilities
    P(f|e)
  • These cannot be estimated directly because of data
    sparseness
  • Convert I(f, e) to P(f|e)
  • IBM Model 1 style
  • Context-dependent style

23
Outline
  • Background
  • Phrase Alignment Algorithms in SMT
  • Segmentation Approaches
  • Integrated Segmentation and Alignment Algorithm
    (ISA)
  • Experiments
  • Discussions

24
Experiments
  • Chinese-English
  • Small data track
  • Evaluation: NIST score against 4 human references

25
Results
  • Baseline: IBM Model 1 + HMM phrase
  • Compared to using ISA only, and ISA + Baseline

26
T-test
  • Student's t-test at the sentence level
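
A sentence-level t-test of this kind can be sketched as below. The slide does not say whether the test was paired, so this example assumes a paired test over per-sentence scores from two systems on the same test sentences. The scores and the function name `paired_t_statistic` are made up for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired per-sentence scores (n - 1 degrees of freedom)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with n >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Made-up per-sentence scores for two systems on the same four sentences.
system_a = [0.5, 0.6, 0.7, 0.9]
system_b = [0.4, 0.4, 0.5, 0.5]
t = paired_t_statistic(system_a, system_b)
print(round(t, 3))
```

The resulting statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a significance level.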

27
Compared to IBM1
  • Large data track (2.6M English words, 414K
    Chinese words)

28
No IBM1 is Better
  • Small data track (LDCIBM1ISA)
  • ISA is better than IBM1 even on unigram matches

29
Summary
  • Integrated Alignment and Segmentation
  • Simple algorithm
  • Enhanced translation quality
  • Better than IBM models
  • Higher quality than HMM alignment
  • A major component in the CMU SMT system

30
ISA Toolkit
  • Location
  • /afs/cs.cmu.edu/user/joy/Release/PhraseAlign
  • Documentation
  • /afs/cs.cmu.edu/user/joy/Release/PhraseAlign/documentation/readme.txt
  • Speed
  • Example: 4,172 sentence pairs (133K En words, 20K
    Ch words)
  • About 160 seconds for the alignment (10 loops for
    each sentence pair)

31
Selected References
  • Franz Josef Och, Christoph Tillmann, Hermann Ney,
    "Improved Alignment Models for Statistical
    Machine Translation," Proceedings of the Joint
    Conference of Empirical Methods in Natural
    Language Processing and Very Large Corpora, pp.
    20-28. University of Maryland, College Park, MD,
    June 1999.
  • Stephan Vogel, Hermann Ney, and Christoph
    Tillmann, "HMM-based Word Alignment in
    Statistical Translation," Proceedings of COLING
    '96: The 16th International Conference on
    Computational Linguistics, pp. 836-841.
    Copenhagen, August 1996.
  • Stephan Vogel, Ying Zhang, Fei Huang, Alicia
    Tribble, Ashish Venugopal, Bing Zhao, Alex
    Waibel, "The CMU Statistical Translation System,"
    to appear in the Proceedings of MT Summit IX, New
    Orleans, LA, U.S.A., September 2003.
  • Ying Zhang, Ralf D. Brown, Robert E. Frederking
    and Alon Lavie, "Pre-processing of Bilingual
    Corpora for Mandarin-English EBMT," Proceedings
    of MT Summit VIII, Santiago de Compostela, Spain,
    September 2001.
  • Ying Zhang, Stephan Vogel, Alex Waibel,
    "Integrated Phrase Segmentation and Alignment
    Algorithm for Statistical Machine Translation,"
    Proceedings of the International Conference on
    Natural Language Processing and Knowledge
    Engineering (NLP-KE'03), Beijing, China, October
    2003.