An Integrated Phrase Segmentation/Alignment Algorithm for Statistical Machine Translation


1
An Integrated Phrase Segmentation/Alignment Algorithm for Statistical Machine Translation

Joy
Advisors: Stephan Vogel and Alex Waibel

2
Outline

Background
Phrase Alignment Algorithms in SMT
Segmentation Approaches
Integrated Segmentation and Alignment Algorithm (ISA)
Experiments
Discussions

3
Statistical Machine Translation

Statistical Machine Translation (Brown et al., 1993)
Noisy Channel Model
Translating from F to E
Given a test sentence f, generate the translation e*, which is
$e^* = \arg\max_e \Pr(e \mid f) = \arg\max_e \Pr(e)\,\Pr(f \mid e)$
Pr(e): Language Model (LM)
Pr(f|e): Translation Model (TM)
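
The noisy channel decision rule can be illustrated with a minimal sketch: among a set of candidate translations, pick the one that maximizes log Pr(e) + log Pr(f|e). The function name `noisy_channel_best`, the candidate list, and the toy probabilities are hypothetical and only for illustration; a real decoder searches a far larger space.

```python
import math

def noisy_channel_best(candidates, lm_logprob, tm_logprob, f):
    """Return the candidate e that maximizes log Pr(e) + log Pr(f|e)."""
    return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(f, e))

# Hypothetical toy scores (not from the presentation).
toy_lm = {"the house": math.log(0.6), "house the": math.log(0.1)}
toy_tm = {("das haus", "the house"): math.log(0.5),
          ("das haus", "house the"): math.log(0.5)}

best = noisy_channel_best(
    list(toy_lm),
    lm_logprob=lambda e: toy_lm[e],
    tm_logprob=lambda f, e: toy_tm[(f, e)],
    f="das haus",
)
print(best)
```

Here the translation model scores both candidates equally, so the language model decides the word order.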

4
Training

Training
Use large English corpora (e.g., Wall Street Journal) to train the LM
Use bilingual corpora (e.g., Canadian Hansard) to train the TM
To get the building blocks for Pr(f|e):
Word-to-word or phrase-to-phrase translations
Reordering information
Other features

5
Alignment

Alignment for one sentence pair (e, f)
Suppose e has l words, $e = e_1 \dots e_l$, and f has m words, $f = f_1 \dots f_m$
Then an alignment a can be represented as $a = a_1 a_2 \dots a_m$, a sequence of m values, each between 0 and l
$a_j = i$ means $f_j$ is aligned to $e_i$, where $e_0$ stands for the NULL word
In short, the alignment tells us which word in e is translated into which word in f
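
As a small illustration of this representation, the sketch below stores an alignment as a list of m integers. The sentence pair and the helper name `aligned_pairs` are made up for the example. Because Python lists are 0-based, `a[j-1]` holds the value a_j, and index 0 on the English side is reserved for the NULL word.

```python
# Toy sentence pair (hypothetical). e[0] is the NULL word e_0.
e = ["NULL", "I", "like", "tea"]   # e_0 .. e_l, with l = 3
f = ["我", "喜欢", "茶"]            # f_1 .. f_m, with m = 3
a = [1, 2, 3]                      # a_j = i: f_j is aligned to e_i

def aligned_pairs(e, f, a):
    """Pair every f word with the e word it is aligned to."""
    assert len(a) == len(f), "one alignment value per f word"
    assert all(0 <= i < len(e) for i in a), "each a_j must lie in 0..l"
    return [(f_word, e[i]) for f_word, i in zip(f, a)]

print(aligned_pairs(e, f, a))
```

An f word with no English counterpart would simply get the value 0, pointing at NULL.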

6
Alignment Example
7
Alignment Models

Alignment algorithms
IBM Models 1 to 5 (Brown et al.)
HMM model, similar to IBM Model 2 (Vogel)
Competitive linking (Melamed)
Flow Network (Gaussier)
Others

8
IBM Model 1

IBM model 1
Easy to train
Simple to understand
Used very often in MT research
One serious problem for IBM models
Word-to-word alignment assumption
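
To show why IBM Model 1 is considered easy to train, here is a minimal sketch of its standard EM training loop for the lexicon probabilities t(f|e). The function name `train_ibm1`, the toy corpus, and the iteration count are assumptions for illustration; this is not the training code used in the presentation.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """corpus: list of (e_words, f_words). Returns t[(f, e)], an estimate of Pr(f|e)."""
    f_vocab = {f for _, fs in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of e being linked at all
        for es, fs in corpus:
            es = ["NULL"] + list(es)              # e_0 is the NULL word
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # normalizer for this f word
                for e in es:
                    c = t[(f, e)] / z             # posterior that f links to e
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts per e word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

toy_corpus = [(["the", "house"], ["das", "haus"]),
              (["the", "book"], ["das", "buch"]),
              (["a", "book"], ["ein", "buch"])]
t = train_ibm1(toy_corpus)
print(t[("das", "the")] > t[("haus", "the")])
```

Each f word is explained by exactly one e word, which is the word-to-word assumption the slide identifies as the serious problem.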

9
Phrase-to-phrase Alignment

Phrase-to-phrase alignment is better
Mismatch between languages
Phrases encapsulate the context of words
Phrases encapsulate local reordering

10
Outline

Background
Phrase Alignment Algorithms in SMT
Segmentation Approaches
Integrated Segmentation and Alignment Algorithm (ISA)
Experiments
Discussions

11
Alignment Algorithms

Based on initial word alignment
Train word alignment
Read off phrase-to-phrase alignments from the Viterbi path
Examples
HMM phrase alignment (Vogel)
Alignment templates from IBM 4 (Och)
Bilingual bracketing (Wu, B. Zhao)
Popular in SMT research
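
One common way to read phrase pairs off a word alignment is a consistency check: a source span and a target span form a phrase pair if no word inside either span is linked to a word outside the other. The sketch below is a simplified version of that idea, not the exact procedure of any of the cited systems. The function name `extract_phrases`, the `max_len` limit, and the toy links are assumptions, and unaligned words are not handled.

```python
def extract_phrases(links, e_len, max_len=4):
    """links: set of 0-based (i, j) word links between e[i] and f[j].
    Returns inclusive spans ((i1, i2), (j1, j2)) consistent with the links."""
    phrases = set()
    for i1 in range(e_len):
        for i2 in range(i1, min(i1 + max_len, e_len)):
            # f positions linked to the e span [i1, i2]
            js = [j for (i, j) in links if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 >= max_len:
                continue
            # consistency: no f word in [j1, j2] links outside [i1, i2]
            if all(i1 <= i <= i2 for (i, j) in links if j1 <= j <= j2):
                phrases.add(((i1, i2), (j1, j2)))
    return phrases

# Toy alignment with a local reordering: e1 <-> f2 and e2 <-> f1.
toy_links = {(0, 0), (1, 2), (2, 1)}
print(sorted(extract_phrases(toy_links, e_len=3)))
```

In the toy example the reordered words are captured together as the pair ((1, 2), (1, 2)), which illustrates how phrases encapsulate local reordering.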

12
Outline

Background
Phrase Alignment Algorithms in SMT
Segmentation Approaches
Integrated Segmentation and Alignment Algorithm (ISA)
Experiments
Discussions

13
Segmentation Approaches

Identify monolingual phrases and segment/bracket each phrase into one unit (a super-word) (Zhang 2000)
Then train the regular word-to-word alignment on the segmented text

14
Problems in Segmentation Approaches

Segmentation uses only monolingual information
A segmentation that is good monolingually may make alignment even harder

15
Outline

Background
Phrase Alignment Algorithms in SMT
Segmentation Approaches
Integrated Segmentation and Alignment Algorithm (ISA)
Experiments
Discussions

16
Integrated Segmentation and Alignment

Let's look at an example first

17
Integrated Segmentation and Alignment

Represent a sentence pair (e, f) as a matrix D
$D(i,j) = I(e_i, f_j)$, where I is a modified point-wise mutual information
A partition over D is a series of non-overlapping rectangular regions $d_1, d_2, \dots, d_m$
Region $d_k = (r_s, r_e, c_s, c_e)$ indicates that words $e_{r_s} \dots e_{r_e}$ are aligned to $f_{c_s} \dots f_{c_e}$
Segmentation and alignment are achieved at the same time
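
A rough sketch of how such a matrix could be built is given below. The slides do not say how the point-wise mutual information is modified, so this example uses plain PMI over sentence-level co-occurrence as a stand-in. The function names `pmi_table` and `build_matrix` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def pmi_table(corpus):
    """Plain PMI from sentence-level co-occurrence counts (an assumption;
    the presentation uses a modified PMI whose details are not given)."""
    n = len(corpus)
    e_cnt, f_cnt, ef_cnt = Counter(), Counter(), Counter()
    for es, fs in corpus:
        e_set, f_set = set(es), set(fs)
        e_cnt.update(e_set)
        f_cnt.update(f_set)
        ef_cnt.update((e, f) for e in e_set for f in f_set)
    # PMI = log( p(e, f) / (p(e) p(f)) ) = log( c * n / (c_e * c_f) )
    return {(e, f): math.log(c * n / (e_cnt[e] * f_cnt[f]))
            for (e, f), c in ef_cnt.items()}

def build_matrix(es, fs, pmi):
    """D[i][j] = I(e_i, f_j); pairs never seen together get -inf."""
    return [[pmi.get((e, f), float("-inf")) for f in fs] for e in es]

toy_corpus = [(["the", "house"], ["das", "haus"]),
              (["the", "book"], ["das", "buch"]),
              (["a", "book"], ["ein", "buch"])]
pmi = pmi_table(toy_corpus)
D = build_matrix(["the", "house"], ["das", "haus"], pmi)
```

Rows of `D` index the words of e and columns index the words of f, matching D(i,j) = I(e_i, f_j).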

18
Integrated Segmentation and Alignment

Best partition should yield maximum
Computationally intractable to search all possible partitions
Exponential in sentence length
Dynamic programming (DP) is not a good fit
"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." -- Richard Bellman's Principle of Optimality
But here, the decision of how to expand the first cell changes the search space for the rest of the cells
Use a computationally cheap algorithm to find good partitions

19
An Example
20
Computationally Cheap Algorithm

Assumption
If the translation for $e_1 e_2$ is f, then $I(e_1, f)$ should be very similar to $I(e_2, f)$
Example
Algorithm
Step 1: find the cell in D with the maximum value of I
Step 2: expand this cell into a rectangular region in which all cells have an I similar to this cell's
Repeat Step 1 and Step 2 until no more regions can be found
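
The two steps can be sketched as a greedy loop. Several details below are assumptions, since the slides do not specify them: "similar" is taken to mean at least `ratio` times the seed value, the search stops once the best remaining cell is not above `min_score`, and the rows and columns of a found region are blocked so that each word joins at most one phrase pair. The function name `isa_partition` and the toy matrix are hypothetical.

```python
def isa_partition(D, ratio=0.7, min_score=0.0):
    """Greedy sketch; returns regions (r0, r1, c0, c1), inclusive and 0-based."""
    n_rows, n_cols = len(D), len(D[0])
    used_r, used_c = set(), set()
    regions = []
    while True:
        # Step 1: best remaining cell
        free = [(D[i][j], i, j)
                for i in range(n_rows) if i not in used_r
                for j in range(n_cols) if j not in used_c]
        if not free:
            break
        seed, i, j = max(free)
        if seed <= min_score:
            break
        r0, r1, c0, c1 = i, i, j, j

        def ok(rows, cols):
            # every new cell must be unused and similar to the seed
            return all(r not in used_r and c not in used_c
                       and D[r][c] >= ratio * seed
                       for r in rows for c in cols)

        # Step 2: grow the rectangle one row or column at a time
        grown = True
        while grown:
            grown = False
            if r0 > 0 and ok([r0 - 1], range(c0, c1 + 1)):
                r0, grown = r0 - 1, True
            if r1 < n_rows - 1 and ok([r1 + 1], range(c0, c1 + 1)):
                r1, grown = r1 + 1, True
            if c0 > 0 and ok(range(r0, r1 + 1), [c0 - 1]):
                c0, grown = c0 - 1, True
            if c1 < n_cols - 1 and ok(range(r0, r1 + 1), [c1 + 1]):
                c1, grown = c1 + 1, True
        regions.append((r0, r1, c0, c1))
        used_r.update(range(r0, r1 + 1))
        used_c.update(range(c0, c1 + 1))
    return regions

toy_D = [[2.0, 1.9, 0.1],
         [0.2, 0.1, 3.0]]
print(isa_partition(toy_D))
```

On the toy matrix, the first e word is grouped with the first two f words because their scores are similar, which is the segmentation and the alignment emerging in a single pass.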

21
Example: Apply the Algorithm
22
Estimate the probabilities for phrase translations

The decoder needs the conditional probabilities P(f|e)
They cannot be estimated directly because of data sparseness
Convert I(f,e) to P(f|e)
IBM Model 1 style
Context-dependent style

23
Outline

Background
Phrase Alignment Algorithms in SMT
Segmentation Approaches
Integrated Segmentation and Alignment Algorithm (ISA)
Experiments
Discussions

24
Experiments

Chinese-English
Small data track
Evaluation: NIST score against 4 human references

25
Results

Baseline: IBM Model 1 + HMM phrase
Compared with using ISA only, and with ISA + Baseline

26
T-test

Student's t-test at the sentence level
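
For readers unfamiliar with a sentence-level t-test, a minimal sketch follows. It assumes a paired test over per-sentence scores from two systems on the same test sentences; the slide does not state whether the test was paired. The function name `paired_t` and the scores are made up for illustration and are not results from the presentation.

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired per-sentence scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical per-sentence scores for two systems.
t_stat, dof = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t_stat, 4), dof)
```

The t statistic is then compared against the t distribution with n - 1 degrees of freedom to judge significance.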

27
Compared to IBM1

Large data track (2.6M English words, 414K Chinese words)

28
No IBM1 is Better

Small data track (LDC + IBM1 + ISA)
ISA is better than IBM1 even on unigram matches

29
Summary

Integrated Alignment and Segmentation
Simple algorithm
Enhanced translation quality
Better than IBM models
Higher quality than HMM alignment
A major component in the CMU SMT system

30
ISA Toolkit

Location
/afs/cs.cmu.edu/user/joy/Release/PhraseAlign
Documentation
/afs/cs.cmu.edu/user/joy/Release/PhraseAlign/documentation/readme.txt
Speed
Example: 4,172 sentence pairs (133K English words, 20K Chinese words)
About 160 seconds for the alignment (10 loops for each sentence pair)

31
Selected References

Franz Josef Och, Christoph Tillmann, Hermann Ney, "Improved Alignment Models for Statistical Machine Translation," Proceedings of the Joint Conference of Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 20-28, University of Maryland, College Park, MD, June 1999.
Stephan Vogel, Hermann Ney, and Christoph Tillmann, "HMM-based Word Alignment in Statistical Translation," Proceedings of COLING '96: The 16th International Conference on Computational Linguistics, pp. 836-841, Copenhagen, August 1996.
Stephan Vogel, Ying Zhang, Fei Huang, Alicia Tribble, Ashish Venugopal, Bing Zhao, Alex Waibel, "The CMU Statistical Translation System," to appear in the Proceedings of MT Summit IX, New Orleans, LA, U.S.A., September 2003.
Ying Zhang, Ralf D. Brown, Robert E. Frederking and Alon Lavie, "Pre-processing of Bilingual Corpora for Mandarin-English EBMT," Proceedings of MT Summit VIII, Santiago de Compostela, Spain, September 2001.
Ying Zhang, Stephan Vogel, Alex Waibel, "Integrated Phrase Segmentation and Alignment Algorithm for Statistical Machine Translation," in the Proceedings of International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE'03), Beijing, China, October 2003.