Wrapper Syntax for Example-Based Machine Translation

1
Wrapper Syntax for Example-Based Machine
Translation
  • Karolina Owczarzak, Bart Mellebeek, Declan
    Groves, Josef Van Genabith, Andy Way
  • National Centre for Language Technology
  • School of Computing
  • Dublin City University

2
Overview
  • TransBooster wrapper technology for MT
      • motivation
      • decomposition process
      • variables and template contexts
      • recomposition
  • Example-Based Machine Translation
      • marker-based EBMT
  • Experiment
      • English-Spanish
      • Europarl, Wall Street Journal section of Penn II
        Treebank
      • automatic and manual evaluation
  • Comparison with previous experiments

3
TransBooster wrapper technology for MT
  • Assumption:
      • MT systems perform better at translating short
        sentences than long ones.
  • Decompose long sentences into shorter and
    syntactically simpler chunks, send them for
    translation, and recompose the output
  • Decomposition is linguistically guided by a
    syntactic parse of the sentence

4
TransBooster wrapper technology for MT
  • TransBooster technology is universal and can be
    applied to any MT system
  • Experiments to date:
      • TB and Rule-Based MT (Mellebeek et al., 2005a,b)
      • TB and Statistical MT (Mellebeek et al., 2006a)
      • TB and Multi-Engine MT (Mellebeek et al., 2006b)
  • TransBooster outperforms baseline MT systems

5
TransBooster decomposition
  • Input: syntactically parsed sentence (Penn II
    format)
  • Decompose into pivot and satellites
      • pivot: usually the main predicate (plus
        additional material)
      • satellites: arguments and adjuncts
  • Recursively decompose satellites if longer than x
    leaves
  • Replace satellites around pivot with variables
      • static: simple same-type phrases with known
        translation
      • dynamic: simplified version of original
        satellites
      • send off to translation
  • Insert each satellite into a template context
      • static: simple predicate with known translation
      • dynamic: simpler version of original clause
        (pivot + simplified arguments, no adjuncts)
      • send off to translation
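
The first two steps (reading a Penn II parse and splitting a clause into pivot and satellites) can be illustrated with a minimal Python sketch. It assumes a simple S → NP VP clause, treats the verb directly under VP as the pivot and the subject and remaining VP children as satellites. The function names (`parse`, `leaves`, `split_clause`) are hypothetical, and this is not TransBooster's actual pivot-finding logic. The parse string is the one from the decomposition example slide.

```python
import re

# Parse of the example sentence, in Penn II bracketed format.
TREE = ("(S (NP (NP (DT the) (NN chairman)) (, ,) (NP (NP (DT a) "
        "(JJ long-time) (NN rival)) (PP (IN of) (NP (NNP Bill) "
        "(NNP Gates)))) (, ,)) (VP (VBZ likes) (NP (ADJP (JJ fast) "
        "(CC and) (JJ confidential)) (NNS deals))) (. .))")

def parse(text):
    """Turn a bracketed parse into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:                     # a word (leaf)
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]

def split_clause(tree):
    """Assumed heuristic: pivot = verb under VP, satellites = subject + other VP children."""
    _, children = tree
    subject = next(c for c in children if c[0] == "NP")
    vp = next(c for c in children if c[0] == "VP")
    pivot = [c for c in vp[1] if c[0].startswith("VB")]
    others = [c for c in vp[1] if not c[0].startswith("VB")]
    pivot_text = " ".join(w for p in pivot for w in leaves(p))
    satellites = [" ".join(leaves(s)) for s in [subject] + others]
    return pivot_text, satellites

pivot, satellites = split_clause(parse(TREE))
print(pivot)
print(satellites)
```

The satellites come out in tokenised form (punctuation separated by spaces); a fuller version would also apply the recursive step to any satellite longer than x leaves.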

6
TransBooster decomposition example
  • (S (NP (NP (DT the) (NN chairman)) (, ,) (NP (NP
    (DT a) (JJ long-time) (NN rival)) (PP (IN of) (NP
    (NNP Bill) (NNP Gates)))) (, ,)) (VP (VBZ likes)
    (NP (ADJP (JJ fast) (CC and) (JJ confidential))
    (NNS deals))) (. .))
  • [The chairman, a long-time rival of Bill
    Gates,]ARG1 [likes]pivot [fast and confidential
    deals]ARG2.

Static variables and template contexts:
  • [The man]V1 [likes]pivot [cars]V2.
  • [The chairman, a long-time rival of Bill
    Gates,]ARG1 [is sleeping]V1.
  • [The man sees]V1 [fast and confidential
    deals]ARG2.

Dynamic variables and template contexts:
  • [The chairman]V1 [likes]pivot [deals]V2.
  • [The chairman, a long-time rival of Bill
    Gates,]ARG1 [likes deals]V1.
  • [The chairman likes]V1 [fast and confidential
    deals]ARG2.

All of these strings are sent to the MT engine.

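
How the six strings on this slide are assembled can be sketched in a few lines of Python. This is only an illustration: the helper name `mt_inputs` is hypothetical, and the simplified (dynamic) variables and contexts are supplied by hand here, whereas the real system derives them from the parse.

```python
def mt_inputs(pivot, satellites, variables, arg1_context, arg2_context):
    """Build the skeleton and the two satellite-in-context strings."""
    arg1, arg2 = satellites
    var1, var2 = variables
    return [
        f"{var1} {pivot} {var2}.",    # skeleton: variables around the pivot
        f"{arg1} {arg1_context}.",    # ARG1 inside a template context
        f"{arg2_context} {arg2}.",    # ARG2 inside a template context
    ]

SATELLITES = ("The chairman, a long-time rival of Bill Gates,",
              "fast and confidential deals")

# Static: simple phrases and predicates with known translations.
static = mt_inputs("likes", SATELLITES, ("The man", "cars"),
                   "is sleeping", "The man sees")

# Dynamic: simplified versions of the original satellites and clause.
dynamic = mt_inputs("likes", SATELLITES, ("The chairman", "deals"),
                    "likes deals", "The chairman likes")

for line in static + dynamic:
    print(line)
```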
7
TransBooster recomposition
  • MT output: a set of translations with dynamic and
    static variables and contexts for a sentence S
  • Remove translations of dynamic variables and
    contexts from translation of S
  • If unsuccessful, back off to translation with
    static variables and contexts, remove those
  • Recombine translated pivot and satellites into
    output sentence

8
TransBooster recomposition example
Source: The chairman, a long-time rival of Bill Gates,
likes fast and confidential deals.

Dynamic variables and template contexts:
  • [The chairman]V1 [likes]pivot [deals]V2. → El
    presidente tiene gusto de repartos.
  • [The chairman, a long-time rival of Bill
    Gates,]ARG1 [likes deals]V1. → El presidente, un
    rival de largo plazo de Bill Gates, tiene gusto de
    repartos.
  • [The chairman likes]V1 [fast and confidential
    deals]ARG2. → El presidente tiene gusto de
    repartos rápidos y confidenciales.

Static variables and template contexts:
  • [The man]V1 [likes]pivot [cars]V2. → El hombre
    tiene gusto de automóviles.
  • [The chairman, a long-time rival of Bill
    Gates,]ARG1 [is sleeping]V1. → El presidente, un
    rival de largo plazo de Bill Gates, está durmiendo.
  • [The man sees]V1 [fast and confidential
    deals]ARG2. → El hombre ve repartos rápidos y
    confidenciales.

Recomposed translation: El presidente, un rival de
largo plazo de Bill Gates, tiene gusto de repartos
rápidos y confidenciales.

Original translation: El presidente, rival de
largo plazo de Bill Gates, gustos ayuna y los
repartos confidenciales.

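
The recomposition steps, including the back-off from dynamic to static variables, can be sketched as below. The function names and the dictionary layout are hypothetical. The isolated translations of the variables and contexts ("El presidente", "tiene gusto de repartos", and so on) are assumed here for illustration; the slide only shows the full translated strings. Plain substring matching stands in for whatever alignment the real system uses.

```python
def strip_context(translation, context):
    """Remove the translated context; return the remaining satellite or None."""
    if context not in translation:
        return None
    return translation.replace(context, "", 1).strip(" .")

def recompose(bundle):
    """Rebuild the sentence from one set of translations, or return None."""
    arg1 = strip_context(bundle["arg1_in_context"], bundle["arg1_context"])
    arg2 = strip_context(bundle["arg2_in_context"], bundle["arg2_context"])
    var1, var2 = bundle["variables"]
    skeleton = bundle["skeleton"]
    if None in (arg1, arg2) or var1 not in skeleton or var2 not in skeleton:
        return None
    # Swap each translated variable for the translated satellite.
    return skeleton.replace(var1, arg1, 1).replace(var2, arg2, 1)

def recompose_with_backoff(dynamic, static):
    """Try the dynamic translations first, then fall back to the static ones."""
    return recompose(dynamic) or recompose(static)

DYNAMIC = {
    "skeleton": "El presidente tiene gusto de repartos.",
    "variables": ("El presidente", "repartos"),
    "arg1_in_context": "El presidente, un rival de largo plazo de "
                       "Bill Gates, tiene gusto de repartos.",
    "arg1_context": "tiene gusto de repartos",
    "arg2_in_context": "El presidente tiene gusto de repartos rápidos "
                       "y confidenciales.",
    "arg2_context": "El presidente tiene gusto de",
}
STATIC = {
    "skeleton": "El hombre tiene gusto de automóviles.",
    "variables": ("El hombre", "automóviles"),
    "arg1_in_context": "El presidente, un rival de largo plazo de "
                       "Bill Gates, está durmiendo.",
    "arg1_context": "está durmiendo",
    "arg2_in_context": "El hombre ve repartos rápidos y confidenciales.",
    "arg2_context": "El hombre ve",
}

print(recompose_with_backoff(DYNAMIC, STATIC))
```

In this sketch both the dynamic and the static route rebuild the same recomposed sentence shown on the slide; the static route is only used when a dynamic variable or context cannot be located in the MT output.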
9
EBMT Overview
  • An aligned bilingual corpus
  • Input text is matched against this corpus
  • The best match is found and a translation is
    produced

[Figure: EX (input) → search → F2, F4 → FX (output)]
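
The matching step can be illustrated with a very small Python sketch that looks up the closest source sentence in an aligned example base and returns its stored translation. The two example pairs are taken from the TransBooster recomposition example (lowercased, punctuation dropped); `best_match` is a hypothetical name, and word-level similarity from the standard `difflib` module is an assumption standing in for a real EBMT matcher, which would also recombine fragments rather than return one whole example.

```python
from difflib import SequenceMatcher

# Toy aligned corpus: (English source, Spanish target).
EXAMPLES = [
    ("the man likes cars", "el hombre tiene gusto de automóviles"),
    ("the chairman likes deals", "el presidente tiene gusto de repartos"),
]

def best_match(sentence, examples):
    """Return the example pair whose source side is most similar to the input."""
    words = sentence.lower().split()

    def score(pair):
        return SequenceMatcher(None, words, pair[0].split()).ratio()

    return max(examples, key=score)

source, target = best_match("The chairman likes fast deals", EXAMPLES)
print(source)
print(target)
```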
10
EBMT Marker-Based Chunking
Marker words (English ↔ French):
  • the, a, these, ... ↔ le, la, l', une, un, ces, ...
  • on, of, ... ↔ sur, d', ...

English phrase: on virtually all uses of asbestos
French translation: sur virtuellement tous usages
d'asbeste

Marker chunks:
  • on virtually ↔ sur virtuellement
  • all uses ↔ tous usages
  • of asbestos ↔ d'asbeste

Lexical chunks:
  • on ↔ sur
  • virtually ↔ virtuellement
  • all ↔ tous
  • uses ↔ usages
  • of ↔ d'
  • asbestos ↔ asbeste

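
A minimal sketch of marker-based chunking on this example: a new chunk starts at each marker word, as long as the chunk being closed already holds at least one non-marker word. The marker sets contain only the words shown on the slide, plus "all" and "tous", which are assumed to be markers because the slide's chunks require it. Pairing chunks by position is a simplification for this example; it is not how a full marker-based EBMT system aligns chunks.

```python
import re

EN_MARKERS = {"the", "a", "these", "on", "of", "all"}
FR_MARKERS = {"le", "la", "l'", "une", "un", "ces", "sur", "d'", "tous"}

def tokenize(text):
    """Lowercase and split into words, keeping elided forms such as d' whole."""
    return re.findall(r"\w+'|\w+", text.lower())

def marker_chunks(tokens, markers):
    """Start a new chunk at each marker once the current chunk has content."""
    chunks, current, has_content = [], [], False
    for tok in tokens:
        if tok in markers and has_content:
            chunks.append(current)
            current, has_content = [], False
        current.append(tok)
        if tok not in markers:
            has_content = True
    if current:
        chunks.append(current)
    return chunks

en = marker_chunks(tokenize("on virtually all uses of asbestos"), EN_MARKERS)
fr = marker_chunks(tokenize("sur virtuellement tous usages d'asbeste"), FR_MARKERS)

# Naive positional pairing of chunks (works here because the order matches).
for en_chunk, fr_chunk in zip(en, fr):
    print(" ".join(en_chunk), "<->", " ".join(fr_chunk))
```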
11
EBMT System Overview
12
Experiment
  • English-Spanish
  • Two test sets
      • Wall Street Journal section of Penn II Treebank:
        800 sentences
      • Europarl: 800 sentences
  • Out-of-domain factor
      • TransBooster developed on perfect Penn II trees
      • EBMT trained on 958K English-Spanish Europarl
        sentences

13
Experiment Results
Automatic evaluation
  • Results for EBMT vs TransBooster on 741-sentence
    test set from Europarl.
  • Results for EBMT vs TransBooster on 800-sentence
    test set from Penn II Treebank.
14
Experiment - Results
Manual evaluation
  • 100 randomly selected sentences from the Europarl
    (EP) test set
      • source English sentence
      • EBMT translation
      • EBMT + TransBooster translation
  • 3 judges, native speakers of Spanish fluent in
    English
  • Accuracy and fluency: relative scale for
    comparing the two translations

Inter-judge agreement (Kappa): Fluency 0.948,
Accuracy 0.926
Absolute quality gain when using TransBooster:
Fluency 19.33% of sentences, Accuracy 15.67% of
sentences
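
The slide does not say which kappa variant was used for the three judges. As an illustration of the underlying idea only, the sketch below computes Cohen's kappa for two judges: observed agreement corrected for the agreement expected by chance. The function name and the judgement labels are hypothetical and are not data from the experiment.

```python
from collections import Counter

def cohen_kappa(judge_a, judge_b):
    """Cohen's kappa for two judges labelling the same items."""
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    # Chance agreement: product of each judge's label proportions, summed.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical judgements: which translation each judge preferred.
judge_1 = ["TB", "TB", "EBMT", "EBMT"]
judge_2 = ["TB", "EBMT", "EBMT", "EBMT"]
print(cohen_kappa(judge_1, judge_2))
```

For more than two judges, a multi-rater statistic such as Fleiss' kappa, or an average over judge pairs, would be needed.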
15
Experiment Results
TB improvements
Example 1
  • Source: women have decided that they wish to
    work, that they wish to make their work
    compatible with their family life.
  • EBMT: hemos decidido su deseo de trabajar, su
    deseo de hacer su trabajo compatible con su vida
    familiar. empresarias
  • TB: mujeres han decidido su deseo de trabajar,
    su deseo de hacer su trabajo compatible con su
    vida familiar.

Example 2
  • Source: if this global warming continues, then
    part of the territory of the eu member states
    will become sea or desert.
  • EBMT: si esto continúa calentamiento global,
    tanto dentro del territorio de los estados
    miembros tendrán tornarse altamar o desértico
  • TB: si esto calentamiento global perdurará,
    entonces parte del territorio de los estados
    miembros de la unión europea tendrán tornarse
    altamar o desértico

16
Previous experiments
  • TransBooster vs. SMT on 800-sentence test set
    from Europarl.
  • TransBooster vs. EBMT on 800-sentence test set
    from Europarl.
  • TransBooster vs. EBMT on 800-sentence test set
    from Penn II Treebank.

17
Previous experiments
  • TransBooster vs. SMT on 800-sentence test set
    from Europarl.
  • TransBooster vs. EBMT on 800-sentence test set
    from Europarl.

18
Previous experiments
  • TransBooster vs. Rule-Based MT on 800-sentence
    test set from Penn II Treebank.
  • TransBooster vs. SMT on 800-sentence test set
    from Penn II Treebank.
  • TransBooster vs. EBMT on 800-sentence test set
    from Penn II Treebank.

19
Summary
  • TransBooster is a universal technology to
    decompose and recompose MT text
  • Net improvement in translation quality against
    EBMT
      • Fluency: 19.33% of sentences; Accuracy: 15.67%
        of sentences
  • Successful experiments to date: rule-based MT,
    phrase-based SMT, multi-engine MT, EBMT
  • Journal article in preparation

20
Thank You