Statistically Motivated Example-based Machine Translation using Translation Memory
1
Statistically Motivated Example-based Machine
Translation using Translation Memory
  • Sandipan Dandapat , Sara Morrissey, Sudip Kumar
    Naskar, Harold Somers
  • CNGL, School of Computing, DCU

2
Introduction
  • Machine Translation is the process of
    automatically transferring information (syntactic
    and semantic) from one language to another
  • RBMT is characterized by linguistic rules
  • SMT uses a mathematical model based on
    probability distributions estimated from a
    parallel corpus
  • EBMT integrates both rule-based and data-driven
    techniques
  • EBMT is often linked to a related technique,
    Translation Memory (TM), which stores past
    translations in a database
  • Both EBMT and TM share the idea of reusing
    existing translations
  • But EBMT is an automated translation technique,
    whereas TM is an interactive tool for human
    translators

3
Introduction
SMT vs. EBMT
  • SMT works well when a significant amount of
    training data is available; EBMT can be developed
    with a limited example base
  • SMT is good for open-domain translation; EBMT is
    good for restricted domains and works well when
    the test and training sets are close
  • SMT has shown difficulties with free-word-order
    languages; EBMT reuses segments of a test
    sentence that can be found in the source side of
    the example base
4
Our Attempt
  • We use EBMT and TM to tackle the English-Bangla
    language pair
  • This pair has proved troublesome, with low BLEU
    scores for various SMT approaches (Islam et al.,
    2010)
  • We attempt to translate medical-receptionist
    dialogues, primarily for appointment scheduling
  • Our goals
  • Integrate EBMT and TM for better translation in a
    restricted domain
  • EBMT helps to find the closest match, and TM is
    good for translating segments of a sentence

5
Creating English -Bangla Parallel Corpus
  • The task: create a manually translated
    English-Bangla parallel corpus for training
  • Points to consider: native speakers, translation
    challenges (literal vs. explicit)
  • Methodology
  • Manual translation by a native speaker
  • Discussions on translation conventions
  • Corpus example
  • English
  • Hello, can I get an appointment sometime later
    this week?
  • Bangla

(Bangla script not preserved in this transcript)
6
Notes on Translation Challenges
  • Non-alteration of source text
  • Literal translation of source
  • Which doctor would you prefer?
  • I don't mind
  • Bangla

7
Size and Type of the Corpora
  • Because of the stages just described, it is
    time-consuming to collect a large amount of
    medical-receptionist dialogue
  • Thus, our corpus comprises 380 dialogue turns
  • In transcription, this works out at just under
    3000 words (about 8 words per dialogue turn)
  • A very small corpus by any standard

8
Note on Size of the Corpora
  • How many examples are needed to build a
    data-driven MT system?

System    Language Pair        Size
TTL       English → Turkish    488
TDMT      English → Japanese   350
EDGAR     German → English     303
ReVerb    English → German     214
ReVerb    Irish → English      120
METLA-1   English → French     29
METLA     English → Urdu       7
  • No SMT system has been developed with only 380
    parallel sentences
  • But many EBMT systems have been developed with
    such small corpora

9
Structure of the Corpus
  • Medical-receptionist dialogue comprises very
    similarly structured sentences
  • Example
  • (1) a. I need a medical for my insurance company.
  • b. I need a medical for my new job.
  • (2) a. The doctor told me to come back for a
    follow up appointment.
  • b. The doctor told me to call back in a week.
  • Thus, it might be helpful to reuse the
    translation of common parts while translating new
    sentences
  • This leads us to use EBMT

10
Main Idea
  • Input
  • Ok, I have booked you in for eleven fifteen on
    Friday with Dr. Thomas.
  • Fuzzy match in the example base
  • Ok, I have booked you in for three thirty on
    Thursday with Dr. Kelly.
  • → Part of the translation comes from the
    example-base fuzzy match
  • → Part of the translation comes from the
    Translation Memory or SMT
  • Ok, I have booked you in for eleven fifteen on
    Friday with Dr. Thomas.

11
Building Translation Memory (TM)
  • We build TM automatically from our small
    patient-dialogue corpus
  • We use Moses to build two TMs
  • Aligned phrase pairs from the Moses phrase table
    (phrase table, PT)
  • Aligned word pairs based on GIZA++ (lexical
    table, LT)

LT (word pairs) and PT (phrase pairs), e.g.
  hello → …                        (LT)
  eleven → …                       (LT)
  come in on friday instead → …    (PT)
  , but dr finn → …                (PT)
(the Bangla side of these entries was not preserved
in this transcript)
  • We keep all the target equivalents of a source
    phrase in the TM, sorted by phrase translation
    probability
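As a rough sketch of this step (not the authors' code): the standard Moses phrase table is a `|||`-delimited text file, and a TM that keeps all target equivalents of each source phrase, sorted by translation probability, might be loaded as below. The `score_index` parameter and the toy lines are assumptions; which score column holds the direct phrase probability depends on the Moses configuration.

```python
from collections import defaultdict

def load_phrase_table(lines, score_index=2):
    """Build a TM from Moses phrase-table lines: for each source phrase,
    keep every target phrase sorted by the chosen score, highest first.
    Assumes the usual `src ||| tgt ||| s1 s2 s3 s4 ...` layout."""
    tm = defaultdict(list)
    for line in lines:
        src, tgt, scores = [f.strip() for f in line.split("|||")[:3]]
        prob = float(scores.split()[score_index])
        tm[src].append((prob, tgt))
    # Sort each entry by probability (descending) and drop the scores.
    return {src: [t for _, t in sorted(pairs, reverse=True)]
            for src, pairs in tm.items()}

lines = [
    "hello ||| tgt_a ||| 0.1 0.2 0.6 0.4",
    "hello ||| tgt_b ||| 0.3 0.1 0.9 0.2",
]
print(load_phrase_table(lines)["hello"])  # → ['tgt_b', 'tgt_a']
```

A plain lexical table (LT) can be loaded the same way, treating each word pair as a one-word phrase.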

12
Building Translation Memory (TM)
  • We build TM automatically from our small
    patient-dialogue corpus
  • We use Moses to build two TMs
  • Aligned phrase pairs from the Moses phrase table
    (phrase table - PT)
  • Aligned word pairs based on GIZA (lexical
    table - LT)
  • We keep all the target equivalents of a source
    phrase in the TM which are stored in a sorted
    order based on the phrase translation probability

13
Our Approach
  • Our EBMT system, like most, has three stages
  • Matching: find the closest match to the input
    sentence
  • Adaptability: find the translations of the
    desired segments
  • Recombination: combine the translations of the
    desired segments

14
Matching
  • We find the closest sentence (Sc) in the example
    base for the input sentence (S) to be translated
  • We use a word-based edit-distance metric to find
    this closest matching sentence in the example
    base

S:  Ok, I have booked you in for eleven fifteen on
    Friday with Dr. Thomas.
Sc: Ok, I have booked you in for three thirty on
    Thursday with Dr. Kelly.
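The matching step above can be sketched as follows (a minimal illustration, not the authors' implementation; the helper names and toy example base are assumptions):

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def closest_match(s, example_base):
    """Return the example sentence with minimum word-level
    edit distance to input sentence s."""
    tokens = s.lower().split()
    return min(example_base, key=lambda e: edit_distance(tokens, e.lower().split()))

base = ["ok , i have booked you in for three thirty on thursday with dr. kelly .",
        "i need a medical for my insurance company ."]
s = "ok , i have booked you in for eleven fifteen on friday with dr. thomas ."
print(closest_match(s, base))  # → the "three thirty on thursday" sentence
```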
15
Matching
  • We take the associated translation (Sct) of Sc
    as the skeleton translation of the input
    sentence S

S:   Ok, I have booked you in for eleven fifteen on
     Friday with Dr. Thomas.
Sc:  Ok, I have booked you in for three thirty on
     Thursday with Dr. Kelly.
Sct: AchchhA, Ami ApanAra janya bRRihaspatibAra
     tinaTe tirishe DAH kelira sAthe buk karechhi.
     (romanised; the Bangla script was not preserved)
We will use some segments of Sct to produce a new
translation
16
Adaptability
  • We extract the translations of the inappropriate
    fragments of the input sentence (S)
  • To do this, we align three sentences: the input
    (S), the closest source-side match (Sc) and its
    target equivalent (Sct)
  • Mark the mismatched portions between the input
    sentence (S) and the closest source-side match
    (Sc) using edit distance
  • S:  ok , i've booked you in for <eleven fifteen>
    on <friday> with dr <thomas> .
  • Sc: ok , i've booked you in for <three thirty>
    on <thursday> with dr <kelly> .

17
Adaptability
  • We extract the translations of the inappropriate
    fragments of the input sentence (S)
  • To do this, we align three sentences: the input
    (S), the closest source-side match (Sc) and its
    target equivalent (Sct)
  • Further, we align the mismatched portions of Sc
    with its associated translation Sct using our TMs
    (PT and LT)
  • S:   ok , i've booked you in for <eleven fifteen>
    on <friday> with dr <thomas> .
  • Sc:  ok , i've booked you in for <three thirty>
    on <thursday> with dr <kelly> .
  • Sct: …, … <1: …> <0: … …> … <2: …> … (Bangla
    skeleton with numbered slots; script not
    preserved in this transcript)
  • The number in the angle brackets keeps track of
    the order of the appropriate fragments

18
Recombination
Substitute, add or delete segments of the input
sentence (S) in the skeleton translation (Sct)
S:   ok , i've booked you in for <eleven fifteen>
     on <friday> with dr <thomas> .
Sc:  ok , i've booked you in for <three thirty>
     on <thursday> with dr <kelly> .
Sct: …, … <1: …> <0: … …> … <2: …> … (Bangla
     skeleton; script not preserved)
>>   …, … <1: Friday> <0: eleven fifteen> …
     <2: Thomas> … (slots filled with the new source
     fragments still to be translated)
  • Possible ways of obtaining Tx, the translation
    of a fragment x
  • Tx = SMT(x)
  • Tx = PT(x)
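A minimal sketch of this recombination step, under stated assumptions: the skeleton is represented as a list mixing literal target tokens with integer slot indices, the TM is a plain dict, and the placeholder target strings are invented here because the Bangla text did not survive in this transcript. Fragments absent from the TM fall back to an SMT callback, i.e. Tx = SMT(x).

```python
def recombine(skeleton, mismatches, tm, smt=None):
    """Fill each numbered slot in the skeleton with the TM translation
    of the corresponding new source fragment; fall back to SMT(x),
    or to the untranslated fragment if no SMT backend is given."""
    out = []
    for piece in skeleton:
        if isinstance(piece, int):          # slot index → new source fragment
            fragment = mismatches[piece]
            translated = tm.get(fragment)
            if translated is None:
                translated = smt(fragment) if smt else fragment
            out.append(translated)
        else:                               # literal skeleton token(s)
            out.append(piece)
    return " ".join(out)

# Hypothetical skeleton with slots ordered as on the slide (<1>, <0>, <2>);
# "<bn: ...>" strings stand in for the lost Bangla targets.
skeleton = ["head", 1, 0, "mid", 2, "tail"]
mismatches = ["eleven fifteen", "friday", "thomas"]
tm = {"eleven fifteen": "<bn: eleven fifteen>", "friday": "<bn: friday>"}
print(recombine(skeleton, mismatches, tm, smt=lambda x: f"<smt: {x}>"))
# → head <bn: friday> <bn: eleven fifteen> mid <smt: thomas> tail
```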

19
Recombination Algorithm
20
Experiments
  • We conduct 5 different experiments
  • Baselines
  • SMT: we use OpenMaTrEx (http://www.openmatrex.org)
  • EBMT: based on the matching step alone; we take
    the skeleton translation as the output
  • Our approach
  • EBMT + TM(PT): uses only the phrase table during
    recombination
  • EBMT + TM(PT,LT): uses both the phrase and
    lexical tables during recombination
  • EBMT + SMT: untranslated segments are translated
    using SMT

21
Results
  • Data used for the experiments
  • Training data: 381 parallel sentences
  • Test data: 41 sentences disjoint from the
    training set
  • We use BLEU and NIST scores for automatic
    evaluation

System            BLEU   NIST
SMT               39.32  4.84
EBMT              50.38  5.32
EBMT + TM(PT)     57.47  5.92
EBMT + TM(PT,LT)  57.56  6.00
EBMT + SMT        52.01  5.51
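The BLEU metric behind these scores can be illustrated with a simplified sentence-level sketch (an assumption for illustration, not the evaluation tooling the authors used; real evaluations aggregate clipped n-gram counts over the whole test corpus):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    1..max_n-gram precisions, times a brevity penalty."""
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(c[i:i + n]) for i in range(len(c) - n + 1))
        ref = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        overlap = sum((cand & ref).values())   # clipped matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / max(len(c), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

NIST differs mainly in weighting rarer (more informative) n-grams more heavily.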
22
Results
  • Manual evaluation: 4 different native speakers
    were asked to rate the translations on two scales

Fluency                  Adequacy
5 Flawless Bangla        5 All
4 Good Bangla            4 Most
3 Non-native Bangla      3 Much
2 Disfluent Bangla       2 Little
1 Incomprehensible       1 None

System            Fluency  Adequacy
SMT               3.00     3.16
EBMT + TM(PT)     3.50     3.55
EBMT + TM(PT,LT)  3.50     3.70
EBMT + SMT        3.44     3.52
23
Example Translations
24
Assessment of Error Types
  • Wrong source-target alignments in the phrase
    table and lexical table lead to incorrect
    substitutions

25
Assessment of Error Types
  • Erroneous translations are generated during
    recombination

in a few minutes
  in → a. niYe  b. niYe Asate  c. Asuna
  a few minutes → kaYeka miniTa derite
  in a few minutes → niYe kaYeka miniTa derite
(romanised; the Bangla script was not preserved)
26
Observations
  • The baseline EBMT system scores higher on all
    metrics than the baseline SMT system
  • The combination of EBMT and TM is more accurate
    than both the baseline SMT and EBMT systems
  • The combination of SMT with EBMT improves
    somewhat over baseline EBMT but is less accurate
    than the combination of TM with EBMT

27
Conclusion and Future Work
  • We have presented initial investigations into
    combining TM within an EBMT framework
  • The integration of TM with EBMT has improved
    translation quality
  • The error analysis shows that syntax-based
    matching and adaptation might help to reduce
    false-positive adaptations
  • Use of morpho-syntactic information during
    recombination might improve translation quality

28
Thank you!
Questions?