1
A dependency-based statistical machine translation model: a work in progress
CJNLP 2006
  • Xiaodong SHI
  • Institute of Artificial Intelligence
  • Xiamen University
  • 15/11/2006
  • Shanghai

2
Outline
  • Overview of Syntax-based SMT
  • Why dependency model?
  • Model
  • Decoding
  • Training and Experiments

3
Overview of Syntax-based SMT
  • Benefits of syntax-based SMT
  • Can represent reordering naturally, e.g. by transduction grammar rules such as
  • Yamada: SOV → SVO
  • Can represent complex translation patterns
  • e.g. a discontinuous Chinese expression → "different from"
  • Quirk: ne…pas → not
  • Can represent syntax constraints
  • agreement(person, gender, number, case, …)
  • Can use syntax-based language models

4
  • Practical benefits: a smaller data footprint, so it can be used in SMT systems for mobile devices
  • E.g. we used a dependency-based WSD module in our mobile-phone MT system

5
(No Transcript)
6
Major syntax-based SMT models
  • Dekai Wu: ITG (1995)
  • A → [B C] | <B C>
  • Special case: BTG
  • Alshawi, 2000
  • Head transducer: simultaneous induction of simple source and target dependency trees
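To make the ITG rule notation concrete, here is a minimal sketch (mine, not from the talk) of how a straight rule [B C] keeps child order while an inverted rule <B C> swaps it on the target side:

# Minimal illustration of ITG-style reordering: a binary node is either
# "straight" ([B C]: keep order) or "inverted" (<B C>: swap order).

def itg_yield(node):
    """Flatten an ITG derivation tree into target word order."""
    if isinstance(node, str):           # leaf: a (translated) word
        return [node]
    orientation, left, right = node     # ("straight" | "inverted", B, C)
    l, r = itg_yield(left), itg_yield(right)
    return l + r if orientation == "straight" else r + l

# An SOV source rendered SVO by inverting the object-verb node:
tree = ("straight", "he", ("inverted", "apples", "eats"))
print(itg_yield(tree))                  # ['he', 'eats', 'apples']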

7
  • Yamada, 2001
  • Tree-to-string: the source is parsed into a tree, which is then converted into a target string
  • Gildea, 2003
  • Tree-based alignment
  • Cmejrek, 2003
  • Dependency-based model for Czech-English

8
  • Graehl, 2004
  • Tree-to-string transducers; extends Yamada
  • Melamed, 2004
  • Synchronous Multitext grammar; open-sourced as GenPar
  • Fox, 2005
  • Dependency-based SMT
  • David Chiang, 2005
  • Hierarchical phrases

9
  • Ding, 2005
  • Probabilistic synchronous dependency insertion grammar
  • Quirk, 2005
  • Dependency tree to string; treelets
  • Yang Liu, 2006
  • Tree-to-string: a source tree is mapped to a target string

10
Why dependency grammar?
  • No nonterminals, so it is simpler than CFG (and easier to visualize on a small screen)
  • Naturally lexicalized (no need to introduce a lexicalized version of the grammar)
  • Dependency trees are more isomorphic cross-linguistically (Fox, 2002)
  • More amenable to deep semantic processing (e.g. FAS)

11
A new dependency-based SMT model
  • Tree-to-tree
  • A target tree is needed to enforce grammaticality
  • A source tree is needed to constrain reordering
  • Synchroneity-neutral (parsing / translation)
  • Synchronous: bilingual constraints can be exploited, and joint optimization may be cheaper
  • Non-synchronous: we can use more context in translation
  • Relaxed isomorphism: a learnable mapping

12
Tree mapping is based on treelet mapping
  • Quirk: a treelet is an arbitrary connected subgraph (not necessarily a subtree) of a dependency tree
  • Our view: a treelet is a generalized pattern of a treelet in Quirk's sense. For a sentence w1 w2 … wn, a treelet is a structured subsequence
  • F(wi1) F(wi2) … F(wik) (i1, …, ik ∈ {1, …, n}),
  • where each F maps a word to some of its features (e.g. word form, lemma, POS, …); see the sketch below
  • (the definition is applicable to other tree formalisms, such as CFG trees)
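A minimal sketch (my reading of the definition above, with invented feature names) of the abstraction function F, which projects a word onto a chosen subset of its features:

# Each word is a dict of features; F keeps only some of them, with None
# marking an ignored ("don't care") position.

FEATURES = ("form", "lemma", "pos", "deprel")

def F(word, keep=("form",)):
    """Project a word onto the features we keep."""
    return tuple(word[f] if f in keep else None for f in FEATURES)

w = {"form": "loves", "lemma": "love", "pos": "VBZ", "deprel": "ROOT"}
print(F(w))                            # word-form F: phrase-like pattern
print(F(w, keep=("pos",)))             # POS-only F: a much more general pattern
print(F(w, keep=("lemma", "pos")))     # a richer F for a richer phrase model

With the identity F and only a contiguity constraint, this collapses to classical phrases, which is the first point on the next slide.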

13
Characteristics
  • F is an abstraction function
  • If F is the identity function and no structure is considered other than that the words form a contiguous sequence, we get the classical phrase-based SMT model
  • A variety of F functions can be used to induce a richer phrase-based model:
  • F maps named entities to a few simple symbols (as in a CASIA model); this can be used to create a more powerful phrase-based SMT model if NEs are recognized first
  • F maps words to clustered word classes (as in Och's alignment template model)

14
Characteristics
  • If some words are ignored but the dependency relations are kept intact, we get the generalization power of a nonterminal in a CFG (see the sketch below):
  • I love students
  • I love him
  • →
  • I love OBJ
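A toy sketch of that generalization step (the flat (word, relation) encoding is a simplification of mine):

def generalize(treelet, drop={"OBJ"}):
    """Replace (word, relation) pairs whose relation is in `drop` by the
    bare relation label, keeping the rest of the pattern intact."""
    return tuple(rel if rel in drop else word for word, rel in treelet)

t1 = (("I", "SUBJ"), ("love", "ROOT"), ("students", "OBJ"))
t2 = (("I", "SUBJ"), ("love", "ROOT"), ("him", "OBJ"))
print(generalize(t1))   # ('I', 'love', 'OBJ')
print(generalize(t2))   # ('I', 'love', 'OBJ') -- both collapse to one pattern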

15
Characteristics
  • Any variation of F can be tried
  • Even the dependency relation can be left underspecified to simplify the model

16
Bilingual treelet mapping
  • Treelet mapping need not be isomorphic: headedness may swap directions, and dependency relations may change
  • Treelet mapping can encode arbitrary translation patterns
  • Treelet mapping implicitly constrains word order
  • EBMT can be recast as a special case of the above model

17
A simple example
18
DPSMT translation process
  • Parse the source sentence into dependency trees
  • Construct the target dependency trees by treelet mapping
  • Select the best target tree by some probabilistic criterion
  • Flatten the target tree to get the translation
  • Parsing and translation can be done simultaneously or separately (see the sketch below)
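An illustrative toy of the four-step pipeline; the one-entry treelet table and all helper names here are invented for the example:

TREELETS = {("N1", "的", "N2"): ("N2", "of", "N1")}   # one toy treelet mapping

def parse(words):                 # 1. stand-in source "parser"
    return tuple(words)

def map_treelets(tree):           # 2. build target-tree candidates by mapping
    return [TREELETS.get(tree, tree)]

def score(tree):                  # 3. stand-in probabilistic criterion
    return 1.0

def flatten(tree):                # 4. linearize the chosen target tree
    return " ".join(tree)

def translate(words):
    return flatten(max(map_treelets(parse(words)), key=score))

print(translate(["N1", "的", "N2"]))   # -> N2 of N1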

19
Decoding
  • Bottom-up dynamic programming is a natural choice
  • A beam-search strategy can be used to keep the mapped treelets at each higher-up node
  • A log-linear model can be used as the ranking function and to prune unviable alternatives when selecting the best partial parses
  • The final target tree can be obtained from the target trees of the source root node (see the sketch below)
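A minimal sketch of the bottom-up beam search suggested above; the hypothesis format and scores are invented, and word order between head and dependents is ignored here for brevity:

BEAM = 3

def decode(node_options, child_beams):
    """node_options: [(score, word)] translation options for this node.
    child_beams: one beam per dependent, each a list of (score, words).
    Returns at most BEAM combined hypotheses for the subtree."""
    beam = [(0.0, [])]
    for child in child_beams:                          # combine child beams
        beam = [(s + cs, w + cw) for s, w in beam for cs, cw in child]
        beam = sorted(beam, key=lambda h: -h[0])[:BEAM]    # prune
    beam = [(s + ns, w + [nw]) for s, w in beam for ns, nw in node_options]
    return sorted(beam, key=lambda h: -h[0])[:BEAM]

# Tiny usage: a head word with two already-decoded dependents.
head = [(-0.1, "eats"), (-0.7, "has")]
deps = [[(-0.2, ["he"])], [(-0.3, ["apples"]), (-0.9, ["apple"])]]
print(decode(head, deps)[0])   # best hypothesis for the subtree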

20
Optimization function
  • Log-linear models (see the generic form below)
  • Common feature functions are:
  • Word or phrase translation probabilities
  • Dependency language models, e.g. bigram models
  • Reordering probabilities
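For reference, the generic log-linear form that is standard in SMT, where the h_i are the feature functions listed above and the lambda_i their weights (the normalizer drops out of the argmax):

\hat{t} \;=\; \arg\max_{t} \Pr(t \mid s) \;=\; \arg\max_{t} \sum_{i} \lambda_i \, h_i(s, t)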

21
Training idea
  • Get a bilingual dependency treebank
  • Seed the initial tree-mapping probabilities from the bilingual phrases, by projecting the phrases onto the trees
  • Learn bilingual treelet mappings by running the EM algorithm on the bilingual treebank

22
Where is the data?
  • We have only:
  • Penn treebanks
  • the Chinese Dependency Corpus (from HIT-IR)
  • lots of bilingual corpora
  • open-source tools
  • We might as well use the Penn Chinese Treebank (CTB 1.0) and its English translation (LDC2003E07) first

23
Training
  • Preparation
  • We find that CTB 1.0 has some oddities that need to be accounted for (4175 Chinese sentences but only 4172 English sentences):
  • Chinese 643a, 643b ↔ English 643
  • Chinese 1159a, 1159b ↔ English 1159
  • Chinese 2477a, 2477b ↔ English 2477 (some mistakes in the translation!)
  • We combined the split Chinese sentences in the bilingual text file (4172), but split the three English sentences in the bilingual treebank (4175)

24
Training
  • Use GIZA++ to do two-way training:
  • input: ctb1.0.txt
  • pregiza ctb1.0.txt
  • plain2snt eng chn
  • giza 5.cfg -s eng.vcb -t chn.vcb -c eng_chn.snt
  • giza 5.cfg -s chn.vcb -t eng.vcb -c chn_eng.snt

25
Training
  • Extract bilingual phrases:
  • utf8toansi f2e.VA3.final f2e_giza_alignment.txt
  • utf8toansi e2f.VA3.final e2f_giza_alignment.txt
  • align_word f2e_giza_alignment.txt e2f_giza_alignment.txt 5 (heuristic)
  • dictionary
  • extract_phrase
  • com_prob

26
Training
  • Convert CTB 1.0 into dependency format:
  • java -jar Penn2Malt.jar
  • Convert the English translation of CTB 1.0 (LDC2003E07) into a dependency treebank:
  • tokenize
  • mxpost
  • Maltparser f eng/option.dat

27
Algorithm to learn the treelet mapping
  • (1) Seed the treelet probabilities
  • (2) For every bilingual tree pair <C, E>:
  • (3) for every phrase pair induced by the tree/sentence pair, decide whether the treelet is qualified:
  • if not, discard it
  • if yes, put it into the treelet base
  • then generalize it, and put the generalized treelets into the treelet base
  • (4) Recalculate the treelet probabilities using the counts, and go to (2)
  • (5) Stop when the number of iterations is reached (see the sketch below)
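A hedged sketch of that loop; the helpers extract, qualified, and generalize are hypothetical stand-ins, and the current probabilities are passed to extract since letting the model guide pair induction is what makes iterating useful:

from collections import defaultdict

def learn_treelet_probs(tree_pairs, extract, qualified, generalize, iters=5):
    """Count qualified treelet pairs (plus their generalizations) and
    renormalize into p(target treelet | source treelet)."""
    prob = {}
    for _ in range(iters):
        counts = defaultdict(float)
        for c_tree, e_tree in tree_pairs:
            for pair in extract(c_tree, e_tree, prob):  # (src, tgt) treelets
                if not qualified(pair):
                    continue                            # drop noisy pairs
                counts[pair] += 1.0
                for g in generalize(pair):              # add generalized forms
                    counts[g] += 1.0
        totals = defaultdict(float)
        for (src, tgt), c in counts.items():
            totals[src] += c
        prob = {(src, tgt): c / totals[src] for (src, tgt), c in counts.items()}
    return prob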

28
  • What are the good treelets?
  • There is a lot of noise in the bilingual phrases

29
Generalize
  • If a sub-treelet is matched, it can be replaced by a "don't care" or by a POS-only F

30
A small-scale experiment
  • Purpose: to improve the reordering of phrase-based SMT
  • We have a monotone decoder, Caravan, which performs relatively well (developed for the silkroad open-source project; it took part in both NIST and IWSLT 2006)
  • Other teams have shown that a simple distance-based reordering probability does not improve the BLEU score

31
Monotone is bad
  • We can improve the reordering by using syntax
  • One special case is the Chinese particle 的
  • N1 的 N2 → N2 of N1
  • A 的 N → A N
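A toy sketch of those two 的 rules (a real system would pick the rule from the parse and model scores, not from a boolean flag):

def reorder_de(x, n, x_is_noun=True):
    """'x 的 n' -> 'n of x' when x is a noun (N1 的 N2), or 'x n' when x is
    an adjective (A 的 N); 的 itself is dropped in both cases."""
    return f"{n} of {x}" if x_is_noun else f"{x} {n}"

print(reorder_de("China", "economy"))                      # economy of China
print(reorder_de("beautiful", "flower", x_is_noun=False))  # beautiful flower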

32
Training
  • From among the 0.56M extracted phrases, we took those containing 的 and limited our phrase projection to treelets of maximum depth 2 (i.e. the longest path from the root of the subtree to any other node is at most 2; see the check below)
  • The F function is word form or POS; only the 2 words before 的 are used
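A small sketch of the depth-2 check (the tree encoding is mine):

def depth(node):
    """node = (label, [children]); a leaf has depth 0."""
    label, children = node
    return 0 if not children else 1 + max(depth(c) for c in children)

# 'N1 的 N2' with N2 as head: the longest root-to-node path has 2 edges.
t = ("N2", [("的", [("N1", [])])])
print(depth(t), depth(t) <= 2)   # 2 True -> this treelet would be kept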

33
Decoding
  • Parse the Chinese into a dependency structure to get information that aids the phrase-based decoding
  • If the word before N is 的, we consider alternative reorderings of the translation of N, with a penalty that is a function of its distortion log-probability (see the sketch below)
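A hedged sketch of how such a penalty could enter the hypothesis score (the weight name LAMBDA_D and all the numbers are invented):

import math

LAMBDA_D = 0.5   # hypothetical distortion weight

def reordering_penalty(distortion_prob):
    """Log-probability penalty paid by a 的-triggered reordering;
    monotone hypotheses pay nothing."""
    return LAMBDA_D * math.log(distortion_prob)

score_monotone  = -4.2                                  # toy model scores
score_reordered = -3.9 + reordering_penalty(0.6)        # -3.9 + 0.5*ln(0.6)
print(max(score_monotone, score_reordered))             # reordered wins here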

34
Results for NIST 2005
  •        Pharaoh   Caravan   Caravan + reordering
  • BLEU   0.0753    0.1008    0.1053 (的)
  •                            0.1084 (的 + prep)

35
Training without a bilingual treebank
  • The algorithm changes slightly: we must compute the best alignment
  • The probabilities trained above can be used to seed the new corpora

36
Treelet Translation Probability Estimation
37
(No Transcript)
38
Viterbi Alignment
39
Concluding remarks
  • A dependency SMT model is proposed
  • Dependency info is helpful to phrase based SMT
  • Full-fledged DPSMT need heavy work

40
Thanks!