1
A dependency-based statistical machine translation model: a work in progress
CJNLP 2006
  • Xiaodong SHI
  • Institute of Artificial Intelligence
  • Xiamen University
  • 15/11/2006
  • Shanghai

2
Outline
  • Overview of Syntax-based SMT
  • Why dependency model?
  • Model
  • Decoding
  • Training and Experiments

3
Overview of Syntax-based SMT
  • Benefits of syntax-based SMT
  • Can represent reordering naturally, e.g. by transduction grammar rules such as
  • Yamada: SOV → SVO
  • Can represent complex translation patterns
  • e.g. a discontinuous Chinese expression → "different from"
  • Quirk: ne…pas → not
  • Can represent syntax constraints
  • agreement(person, gender, number, case, …)
  • Can use syntax-based language models

4
  • Practical benefits: a smaller data footprint, so it can be used in SMT systems for mobile devices
  • E.g. we used a dependency-based WSD module in our mobile-phone MT system

5
(No Transcript)
6
Major syntax-based SMT models
  • Dekai Wu: ITG (1995)
  • A → [B C] | <B C>
  • Special case: BTG
  • Alshawi, 2000
  • Head transducer: simultaneous induction of simple source and target dependency trees
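To make the ITG rule notation concrete, here is a minimal sketch (mine, not from the talk) of how a straight rule [B C] keeps child order while an inverted rule <B C> swaps it on the target side:

# Minimal illustration of ITG-style reordering: a binary node is either
# "straight" ([B C]: keep order) or "inverted" (<B C>: swap order).

def itg_yield(node):
    """Flatten an ITG derivation tree into target word order."""
    if isinstance(node, str):           # leaf: a (translated) word
        return [node]
    orientation, left, right = node     # ("straight" | "inverted", B, C)
    l, r = itg_yield(left), itg_yield(right)
    return l + r if orientation == "straight" else r + l

# An SOV source rendered SVO by inverting the object-verb node:
tree = ("straight", "he", ("inverted", "apples", "eats"))
print(itg_yield(tree))                  # ['he', 'eats', 'apples']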

7
  • Yamada, 2001
  • Tree-to-string: the source is parsed into a tree, which is then converted into a target string
  • Gildea, 2003
  • Tree-based alignment
  • Cmejrek, 2003
  • Dependency-based model for Czech-English

8
  • Graehl, 2004
  • Tree-to-string transducers; extends Yamada
  • Melamed, 2004
  • Synchronous Multitext grammar; open-sourced as GenPar
  • Fox, 2005
  • Dependency-based SMT
  • David Chiang, 2005
  • Hierarchical phrases

9
  • Ding, 2005
  • Probabilistic synchronous dependency insertion grammar
  • Quirk, 2005
  • Dependency tree to string; treelets
  • Yang Liu, 2006
  • Tree-to-string: a source tree is mapped to a target string

10
Why dependency grammar?
  • No nonterminals, so it is simpler than CFG (and easier to visualize on a small screen)
  • Naturally lexicalized (no need to introduce a lexicalized version of the grammar)
  • Dependency trees are more isomorphic cross-linguistically (Fox, 2002)
  • More amenable to deep semantic processing (e.g. FAS)

11
A new dependency-based SMT model
  • Tree-to-tree
  • A target tree is needed to enforce grammaticality
  • A source tree is needed to constrain reordering
  • Synchroneity-neutral (parsing / translation)
  • Synchronous: bilingual constraints can be exploited, and joint optimization may be cheaper
  • Non-synchronous: we can use more context in translation
  • Relaxed isomorphism: a learnable mapping

12
Tree mapping is based on treelet mapping
  • Quirk: a treelet is an arbitrary connected subgraph (not necessarily a subtree) of a dependency tree
  • Our view: a treelet is a generalized pattern of a treelet in Quirk's sense. For a sentence w1 w2 … wn, a treelet is a structured subsequence
  • F(wi1) F(wi2) … F(wik) (i1, …, ik ∈ {1, …, n}),
  • where each F maps a word to some of its features (e.g. word form, lemma, POS, …); see the sketch below
  • (the definition is applicable to other tree formalisms, such as CFG trees)
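A minimal sketch (my reading of the definition above, with invented feature names) of the abstraction function F, which projects a word onto a chosen subset of its features:

# Each word is a dict of features; F keeps only some of them, with None
# marking an ignored ("don't care") position.

FEATURES = ("form", "lemma", "pos", "deprel")

def F(word, keep=("form",)):
    """Project a word onto the features we keep."""
    return tuple(word[f] if f in keep else None for f in FEATURES)

w = {"form": "loves", "lemma": "love", "pos": "VBZ", "deprel": "ROOT"}
print(F(w))                            # word-form F: phrase-like pattern
print(F(w, keep=("pos",)))             # POS-only F: a much more general pattern
print(F(w, keep=("lemma", "pos")))     # a richer F for a richer phrase model

With the identity F and only a contiguity constraint, this collapses to classical phrases, which is the first point on the next slide.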

13
Characteristics
  • F is an abstraction function
  • If F is the identity function and no structure is considered other than that the words form a contiguous sequence, we get the classical phrase-based SMT model
  • A variety of F functions can be used to induce a richer phrase-based model:
  • F maps named entities to a few simple symbols (as in a CASIA model); this can be used to create a more powerful phrase-based SMT model if NEs are recognized first
  • F maps words to clustered word classes (as in Och's alignment template model)

14
Characteristics
  • If some words are ignored but the dependency relations are kept intact, we get the generalization power of a nonterminal in a CFG (see the sketch below):
  • I love students
  • I love him
  • →
  • I love OBJ
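A toy sketch of that generalization step (the flat (word, relation) encoding is a simplification of mine):

def generalize(treelet, drop={"OBJ"}):
    """Replace (word, relation) pairs whose relation is in `drop` by the
    bare relation label, keeping the rest of the pattern intact."""
    return tuple(rel if rel in drop else word for word, rel in treelet)

t1 = (("I", "SUBJ"), ("love", "ROOT"), ("students", "OBJ"))
t2 = (("I", "SUBJ"), ("love", "ROOT"), ("him", "OBJ"))
print(generalize(t1))   # ('I', 'love', 'OBJ')
print(generalize(t2))   # ('I', 'love', 'OBJ') -- both collapse to one pattern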

15
Characteristics
  • Any variation of F can be tried
  • Even the dependency relation can be left underspecified to simplify the model

16
Bilingual treelet mapping
  • Treelet mapping need not be isomorphic: headedness may swap directions, and dependency relations may change
  • Treelet mapping can encode arbitrary translation patterns
  • Treelet mapping implicitly constrains word order
  • EBMT can be recast as a special case of the above model

17
A simple example
18
DPSMT translation process
  • Parse the source sentence into dependency trees
  • Construct the target dependency trees by treelet mapping
  • Select the best target tree by some probabilistic criterion
  • Flatten the target tree to get the translation
  • Parsing and translation can be done simultaneously or separately (see the sketch below)
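An illustrative toy of the four-step pipeline; the one-entry treelet table and all helper names here are invented for the example:

TREELETS = {("N1", "的", "N2"): ("N2", "of", "N1")}   # one toy treelet mapping

def parse(words):                 # 1. stand-in source "parser"
    return tuple(words)

def map_treelets(tree):           # 2. build target-tree candidates by mapping
    return [TREELETS.get(tree, tree)]

def score(tree):                  # 3. stand-in probabilistic criterion
    return 1.0

def flatten(tree):                # 4. linearize the chosen target tree
    return " ".join(tree)

def translate(words):
    return flatten(max(map_treelets(parse(words)), key=score))

print(translate(["N1", "的", "N2"]))   # -> N2 of N1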

19
Decoding
  • Bottom-up dynamic programming is a natural choice
  • A beam-search strategy can be used to keep the mapped treelets at each higher-up node
  • A log-linear model can be used as the ranking function and to prune unviable alternatives when selecting the best partial parses
  • The final target tree can be obtained from the target trees of the source root node (see the sketch below)
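A minimal sketch of the bottom-up beam search suggested above; the hypothesis format and scores are invented, and word order between head and dependents is ignored here for brevity:

BEAM = 3

def decode(node_options, child_beams):
    """node_options: [(score, word)] translation options for this node.
    child_beams: one beam per dependent, each a list of (score, words).
    Returns at most BEAM combined hypotheses for the subtree."""
    beam = [(0.0, [])]
    for child in child_beams:                          # combine child beams
        beam = [(s + cs, w + cw) for s, w in beam for cs, cw in child]
        beam = sorted(beam, key=lambda h: -h[0])[:BEAM]    # prune
    beam = [(s + ns, w + [nw]) for s, w in beam for ns, nw in node_options]
    return sorted(beam, key=lambda h: -h[0])[:BEAM]

# Tiny usage: a head word with two already-decoded dependents.
head = [(-0.1, "eats"), (-0.7, "has")]
deps = [[(-0.2, ["he"])], [(-0.3, ["apples"]), (-0.9, ["apple"])]]
print(decode(head, deps)[0])   # best hypothesis for the subtree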

20
Optimization function
  • Log-linear models (see the generic form below)
  • Common feature functions are:
  • Word or phrase translation probabilities
  • Dependency language models, e.g. bigram models
  • Reordering probabilities
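For reference, the generic log-linear form that is standard in SMT, where the h_i are the feature functions listed above and the lambda_i their weights (the normalizer drops out of the argmax):

\hat{t} \;=\; \arg\max_{t} \Pr(t \mid s) \;=\; \arg\max_{t} \sum_{i} \lambda_i \, h_i(s, t)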

21
Training idea
  • Get a bilingual dependency treebank
  • Seed the initial tree-mapping probabilities from the bilingual phrases, by projecting the phrases onto the trees
  • Learn bilingual treelet mappings by running the EM algorithm on the bilingual treebank

22
Where is the data?
  • We have only:
  • Penn treebanks
  • the Chinese Dependency Corpus (from HIT-IR)
  • lots of bilingual corpora
  • open-source tools
  • We might as well use the Penn Chinese Treebank (CTB 1.0) and its English translation (LDC2003E07) first

23
Training
  • Preparation
  • We find that CTB 1.0 has some oddities that need to be accounted for (4175 Chinese sentences but only 4172 English sentences):
  • Chinese 643a, 643b ↔ English 643
  • Chinese 1159a, 1159b ↔ English 1159
  • Chinese 2477a, 2477b ↔ English 2477 (some mistakes in the translation!)
  • We combined the split Chinese sentences in the bilingual text file (4172), but split the three English sentences in the bilingual treebank (4175)

24
Training
  • Use GIZA++ to do two-way training:
  • input: ctb1.0.txt
  • pregiza ctb1.0.txt
  • plain2snt eng chn
  • giza 5.cfg -s eng.vcb -t chn.vcb -c eng_chn.snt
  • giza 5.cfg -s chn.vcb -t eng.vcb -c chn_eng.snt

25
Training
  • Extract bilingual phrases:
  • utf8toansi f2e.VA3.final f2e_giza_alignment.txt
  • utf8toansi e2f.VA3.final e2f_giza_alignment.txt
  • align_word f2e_giza_alignment.txt e2f_giza_alignment.txt 5 (heuristic)
  • dictionary
  • extract_phrase
  • com_prob

26
Training
  • Convert CTB 1.0 into dependency format:
  • java -jar Penn2Malt.jar
  • Convert the English translation of CTB 1.0 (LDC2003E07) into a dependency treebank:
  • tokenize
  • mxpost
  • Maltparser f eng/option.dat

27
Algorithm to learn the treelet mapping
  • (1) Seed the treelet probabilities
  • (2) For every bilingual tree pair <C, E>:
  • (3) for every phrase pair induced by the tree/sentence pair, decide whether the treelet is qualified:
  • if not, discard it
  • if yes, put it into the treelet base
  • then generalize it, and put the generalized treelets into the treelet base
  • (4) Recalculate the treelet probabilities using the counts, and go to (2)
  • (5) Stop when the number of iterations is reached (see the sketch below)
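A hedged sketch of that loop; the helpers extract, qualified, and generalize are hypothetical stand-ins, and the current probabilities are passed to extract since letting the model guide pair induction is what makes iterating useful:

from collections import defaultdict

def learn_treelet_probs(tree_pairs, extract, qualified, generalize, iters=5):
    """Count qualified treelet pairs (plus their generalizations) and
    renormalize into p(target treelet | source treelet)."""
    prob = {}
    for _ in range(iters):
        counts = defaultdict(float)
        for c_tree, e_tree in tree_pairs:
            for pair in extract(c_tree, e_tree, prob):  # (src, tgt) treelets
                if not qualified(pair):
                    continue                            # drop noisy pairs
                counts[pair] += 1.0
                for g in generalize(pair):              # add generalized forms
                    counts[g] += 1.0
        totals = defaultdict(float)
        for (src, tgt), c in counts.items():
            totals[src] += c
        prob = {(src, tgt): c / totals[src] for (src, tgt), c in counts.items()}
    return prob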

28
  • What are the good treelets?
  • There is a lot of noise in the bilingual phrases

29
Generalize
  • If a sub-treelet is matched, it can be replaced by a "don't care" or by a POS-only F

30
A small-scale experiment
  • Purpose: to improve the reordering of phrase-based SMT
  • We have a monotone decoder, Caravan, which performs relatively well (developed for the silkroad open-source project; it took part in both NIST and IWSLT 2006)
  • Other teams have shown that a simple distance-based reordering probability does not improve the BLEU score

31
Monotone is bad
  • We can improve the reordering by using syntax
  • One special case is the Chinese particle 的
  • N1 的 N2 → N2 of N1
  • A 的 N → A N
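A toy sketch of those two 的 rules (a real system would pick the rule from the parse and model scores, not from a boolean flag):

def reorder_de(x, n, x_is_noun=True):
    """'x 的 n' -> 'n of x' when x is a noun (N1 的 N2), or 'x n' when x is
    an adjective (A 的 N); 的 itself is dropped in both cases."""
    return f"{n} of {x}" if x_is_noun else f"{x} {n}"

print(reorder_de("China", "economy"))                      # economy of China
print(reorder_de("beautiful", "flower", x_is_noun=False))  # beautiful flower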

32
Training
  • From among the 0.56M extracted phrases, we took those containing 的 and limited our phrase projection to treelets of maximum depth 2 (i.e. the longest path from the root of the subtree to any other node is at most 2; see the check below)
  • The F function is word form or POS; only the 2 words before 的 are used
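A small sketch of the depth-2 check (the tree encoding is mine):

def depth(node):
    """node = (label, [children]); a leaf has depth 0."""
    label, children = node
    return 0 if not children else 1 + max(depth(c) for c in children)

# 'N1 的 N2' with N2 as head: the longest root-to-node path has 2 edges.
t = ("N2", [("的", [("N1", [])])])
print(depth(t), depth(t) <= 2)   # 2 True -> this treelet would be kept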

33
Decoding
  • Parse the Chinese into a dependency structure to get information that aids the phrase-based decoding
  • If the word before N is 的, we consider alternative reorderings of the translation of N, with a penalty that is a function of its distortion log-probability (see the sketch below)
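A hedged sketch of how such a penalty could enter the hypothesis score (the weight name LAMBDA_D and all the numbers are invented):

import math

LAMBDA_D = 0.5   # hypothetical distortion weight

def reordering_penalty(distortion_prob):
    """Log-probability penalty paid by a 的-triggered reordering;
    monotone hypotheses pay nothing."""
    return LAMBDA_D * math.log(distortion_prob)

score_monotone  = -4.2                                  # toy model scores
score_reordered = -3.9 + reordering_penalty(0.6)        # -3.9 + 0.5*ln(0.6)
print(max(score_monotone, score_reordered))             # reordered wins here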

34
Results for NIST 2005
  •        Pharaoh   Caravan   Caravan + reordering
  • BLEU   0.0753    0.1008    0.1053 (的)
  •                            0.1084 (的 + prep)

35
Training without a bilingual treebank
  • The algorithm changes slightly: we must compute the best alignment
  • The probabilities trained above can be used to seed the new corpora

36
Treelet Translation Probability Estimation
37
(No Transcript)
38
Viterbi Alignment
39
Concluding remarks
  • A dependency SMT model is proposed
  • Dependency info is helpful to phrase based SMT
  • Full-fledged DPSMT need heavy work

40
Thanks!