Towards Syntactically Constrained Statistical Word Alignment

Learn more at: http://www.cs.cmu.edu

1
Towards Syntactically Constrained Statistical
Word Alignment
  • Greg Hanneman
  • 11-734 Advanced Machine Translation Seminar
  • April 30, 2008

2
Outline
  • The word alignment problem
  • Base approaches
  • Syntax-based approaches
  • Distortion models
  • Tree-to-string models
  • Tree-to-tree models
  • Discussion

3
Word Alignment
  • Parallel sentence pair F and E
  • Most general: map a subset of F to a subset of E

4
Word Alignment
  • Very large alignment spaces!
  • An n-word parallel sentence has n^2 possible links
    and 2^(n^2) possible alignments
  • Restrict to one-to-one alignments: n! possible
    alignments
  • Alignment models try to restrict or learn a
    probability distribution over this space to get
    the best alignment of a sentence
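
To make the growth of these spaces concrete, here is a minimal sketch that evaluates the three counts from the slide for a few sentence lengths. The function name `alignment_space_sizes` is illustrative only, and both sentences are assumed to have the same length n.

```python
import math

def alignment_space_sizes(n: int) -> dict:
    """Sizes of the alignment spaces for an n-word by n-word sentence pair."""
    links = n * n  # every (e_i, f_j) pair is a candidate link
    return {
        "links": links,
        "unconstrained": 2 ** links,       # each link is independently on or off
        "one_to_one": math.factorial(n),   # one permutation per alignment
    }

for n in (2, 5, 10):
    sizes = alignment_space_sizes(n)
    print(n, sizes["links"], sizes["unconstrained"], sizes["one_to_one"])
```

Even at n = 10 the unconstrained space is far beyond enumeration, which is why the models that follow either restrict the space or put a probability distribution over it.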

5
Outline
  • The word alignment problem
  • Base approaches
  • Syntax-based approaches
  • Distortion models
  • Tree-to-string models
  • Tree-to-tree models
  • Discussion

6
A Generative Story (Brown et al. 1990)

7
The Framework
  • F: words f_1 ... f_j ... f_n
  • E: words e_1 ... e_i ... e_m
  • Compute P(F, A | E) for hidden alignment variable
    A = a_1 ... a_j ... a_n
  • The major step: decomposition, model parameters,
    EM algorithm, etc.
  • a_j = i: word f_j is aligned to word e_i

8
The IBM Models (Brown et al. 1993; Och and Ney 2003)
  • Model 1: "bag of words"; word order doesn't
    affect alignment
  • Model 2: position of words being aligned does
    matter
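
As an illustration of the Model 1 idea, the following is a minimal sketch of EM training for lexical translation probabilities t(f | e) on a toy corpus. It is a simplification under stated assumptions: there is no NULL word, the three-sentence corpus is invented for the example, and the names `train_model1` and `viterbi_alignment` are hypothetical rather than taken from any toolkit.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """corpus: list of (f_words, e_words). Returns t[(f, e)] approximating P(f | e)."""
    f_vocab = {f for f_words, _ in corpus for f in f_words}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of e being linked at all
        for f_words, e_words in corpus:
            for f in f_words:
                # E-step: posterior over which e generated f; positions play no role
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts per English word
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def viterbi_alignment(f_words, e_words, t):
    """a_j = index i of the English word that best explains f_j."""
    return [max(range(len(e_words)), key=lambda i: t[(f, e_words[i])])
            for f in f_words]

# Toy corpus, assumed for illustration only
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_model1(corpus)
print(viterbi_alignment(["das", "Haus"], ["the", "house"], t))
```

Because the E-step never looks at word positions, shuffling either side of a sentence pair leaves the learned table unchanged, which is the "bag of words" property. Model 2 would add a position-dependent term to the posterior.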

9
The IBM Models (Brown et al. 1993; Och and Ney 2003)
  • Later models use more implicit structural or
    linguistic information, but not really syntax,
    and not really overtly
  • Fertility: P(φ | e_i) of e_i producing φ words in F
  • Distortion: P(τ, π | E) for a set of F words τ in
    a permutation π
  • Previous alignments: probabilities for positions in
    F of the different words of a fertile e_i

10
The HMM Model (Vogel et al. 1996; Och and Ney 2003)
  • Linguistic intuition: words, and their
    alignments, tend to clump together in clusters
  • a_j depends on absolute size of jump between it
    and a_{j-1}
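
The jump dependence can be sketched as a transition distribution estimated from jump counts. This is only an illustration: it conditions on the signed jump width a_j - a_{j-1}, uses add-one smoothing as an assumption, and the helper names `jump_counts` and `transition_prob` are hypothetical.

```python
from collections import Counter

def jump_counts(alignments):
    """Count jump widths a_j - a_{j-1} over a list of alignment vectors."""
    counts = Counter()
    for a in alignments:
        for prev, cur in zip(a, a[1:]):
            counts[cur - prev] += 1
    return counts

def transition_prob(prev, cur, m, counts, smooth=1.0):
    """P(a_j = cur | a_{j-1} = prev) over target positions 0..m-1.

    The probability depends only on the jump width, not on absolute positions.
    """
    def weight(i):
        return counts[i - prev] + smooth
    return weight(cur) / sum(weight(i) for i in range(m))

# Toy alignment vectors, assumed for illustration
counts = jump_counts([[0, 1, 2, 3], [0, 1, 3, 2]])
print(transition_prob(prev=1, cur=2, m=4, counts=counts))
```

With mostly monotone training alignments, a jump of +1 receives the most mass, so consecutive source words are pushed toward adjacent target words.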

11
Discriminative Training
  • Consider all possible alignments, score them, and
    pick the best ones under some set of constraints
  • Can incorporate arbitrary features; generative
    models more fixed
  • Generative models: EM requires lots of unlabeled
    training data; discriminative requires some
    labeled data

12
Discriminative Alignment (Taskar et al. 2005)
  • Co-occurrence
  • Position difference
  • Co-occurrence of following words
  • Word-frequency rank
  • Model 4 prediction

13
Outline
  • The word alignment problem
  • Base approaches
  • Syntax-based approaches
  • Distortion models
  • Tree-to-string models
  • Tree-to-tree models
  • Discussion

14
Syntax-Based Approaches
  • Constrain alignment space by looking beyond flat
    text stream: take higher-level sentence structure
    into account
  • Representations:
  • Constituency structure
  • Inversion Transduction Grammar
  • Dependency structure
15
An MT Motivation

16
Syntax-Based Distortion (DeNero and Klein 2007)
  • Syntax-based MT should start from syntax-aware
    word alignments
  • HMM model + target-language parse trees: prefer
    alignments that respect tree
  • Handled in distortion model: jumps should reflect
    tree structure

17
Syntax-Based Distortion (DeNero and Klein 2007)
  • HMM distortion: size of jump between a_{j-1} and a_j
  • Syntactic distortion: tree path between a_{j-1} and
    a_j
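
The contrast between a linear jump and a tree path can be sketched with a small helper that measures how far up and down the parse tree one must travel between two leaves. This is an illustration of the idea only, not the parameterization used by DeNero and Klein; the nested-tuple tree encoding and the names `leaf_paths` and `tree_jump` are assumptions.

```python
def leaf_paths(tree, prefix=()):
    """Yield, for each leaf from left to right, its child-index path from the root.

    A tree is either a word (str) or a tuple (label, child, child, ...).
    """
    if isinstance(tree, str):
        yield prefix
    else:
        _label, *children = tree
        for k, child in enumerate(children):
            yield from leaf_paths(child, prefix + (k,))

def tree_jump(tree, i, j):
    """Return (edges up, edges down) on the tree path from leaf i to leaf j."""
    paths = list(leaf_paths(tree))
    a, b = paths[i], paths[j]
    shared = 0  # depth of the lowest common ancestor
    while shared < min(len(a), len(b)) and a[shared] == b[shared]:
        shared += 1
    return len(a) - shared, len(b) - shared

# Toy English parse, assumed for illustration
tree = ("S", ("NP", "the", "dog"), ("VP", "chased", ("NP", "the", "cat")))
print(tree_jump(tree, 0, 1))  # "the" -> "dog": same constituent
print(tree_jump(tree, 1, 2))  # "dog" -> "chased": crosses the NP/VP boundary
```

Both example jumps have linear width 1, but the second has a longer tree path because it leaves one constituent and enters another. A syntactic distortion model can assign the two different probabilities.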

18
Syntax-Based Distortion (DeNero and Klein 2007)
  • Training: 100,000 parallel French-English and
    Chinese-English sentences with English parse
    trees
  • Both E→F and F→E, combined with different
    unions and intersections, plus thresholds
  • Test: hand-aligned Hansards and NIST MT 2002 data

19
Syntax-Based Distortion (DeNero and Klein 2007)
  • HMMs roughly equal, better than GIZA++
  • Soft union for French; hard union for Chinese;
    competitive thresholding

20
Tree-to-String Models

21
Tree-to-String Models
  • New generative story
  • Word-level fertility and distortion replaced with
    node insertion and sibling reordering
  • Lexical translation still the same
  • Word alignment produced as a side effect from
    lexical translations

22
Tree-to-String Alignment (Yamada and Knight 2001)
  • Discussed in other sessions this semester
  • Training: 2121 short Japanese-English sentences,
    modified Collins parser output for English
  • Test: first 50 sentences of training corpus
  • Beat IBM Model 5 on human judgements; perplexity
    between Model 1 and Model 5

23
Subtree Cloning (Gildea 2003)
  • Original tree-to-string model is too strict
  • Syntactic divergences, reordering
  • Soft constraint: allow alignments that violate
    tree structure, but at a cost
  • Tweak the tree side of the alignment to contain
    things needed for the string side
  • Ex.: SVO to OSV

24
Subtree Cloning (Gildea 2003)

25
Subtree Cloning (Gildea 2003)
[Figure: parse tree; surviving node labels: S, VP, AUX, VP, "do"]

26
Subtree Cloning (Gildea 2003)

27
Subtree Cloning (Gildea 2003)
  • For a node n_p:
  • Probability of cloning something as a new child
    of n_p: single EM-learned constant for all n_p
  • Probability of making that clone a node n_c:
    uniform over all n_c
  • Surprising that this works

28
Subtree Cloning (Gildea 2003)
  • Compared with IBM 1-3, basic tree-to-string,
    basic tree-to-tree models
  • Training: 4982 Korean-English sentence pairs,
    with manual Korean parse trees
  • Test: 101 hand-aligned held-out sentences

29
Subtree Cloning (Gildea 2003)
  • Cloning helps: as good or better than IBM
  • Tree-to-tree model runs faster

30
Tree-to-Tree Models
  • Alignment must conform to tree structure on both
    sides: space is more constrained
  • Requires more transformation operations to handle
    divergent structures (Gildea 2003)
  • Or we could be more permissive

31
Inversion Transduction Grammar (Wu 1997)
  • For bilingual parsing; get one-to-one word
    alignment as a side effect
  • Parallel binary-branching trees with reordering

32
ITG Operations
  • A → [A A]
  • Produce A1 A2 in source and target streams
  • A → <A A>
  • Produce A1 A2 in source stream, A2 A1 in
    target stream
  • A → e / f
  • Produce e in source stream, f in target stream
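
The three operations can be sketched as a small recursive generator that walks a derivation and emits both streams. The tuple encoding of derivation nodes ("straight", "inverted", "lex") and the function name `generate` are assumptions made for this example.

```python
def generate(node):
    """Return (source_words, target_words) produced by an ITG derivation.

    node is one of:
      ("straight", left, right)  -- A -> [A A]
      ("inverted", left, right)  -- A -> <A A>
      ("lex", e, f)              -- A -> e / f
    """
    kind = node[0]
    if kind == "lex":
        _, e, f = node
        return [e], [f]
    _, left, right = node
    src_l, tgt_l = generate(left)
    src_r, tgt_r = generate(right)
    if kind == "straight":
        return src_l + src_r, tgt_l + tgt_r   # same order in both streams
    if kind == "inverted":
        return src_l + src_r, tgt_r + tgt_l   # children swapped in target stream
    raise ValueError(f"unknown rule type: {kind}")

# Toy derivation, assumed for illustration
derivation = ("straight",
              ("lex", "the", "la"),
              ("inverted", ("lex", "white", "blanche"), ("lex", "house", "maison")))
source, target = generate(derivation)
print(source, target)
```

Each lexical rule contributes exactly one link, so the one-to-one word alignment can be read directly off the derivation.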

33
ITG Operations
  • Canonical form ITG produces only one derivation
    for a given alignment
  • S → A | B | C
  • A → [A B] | [B B] | [C B] | [A C] | [B C] |
    [C C]
  • B → <A A> | <B A> | <C A> | <A C> | <B C> |
    <C C>
  • C → e / f

34
Alignment with ITG (Zhang and Gildea 2004)
  • Compared IBM 1, IBM 4, ITG, and tree-to-string
    (with and without cloning)
  • Training: Chinese-English (18,773) and
    French-English (20,000) sentences less than 25
    words long
  • Test: hand-aligned Chinese-English (48) and
    French-English (447)

35
Alignment with ITG (Zhang and Gildea 2004)
  • ITG best, or at least as good as IBM or
    tree-to-string plus cloning
  • ITG has no linguistic syntax

36
Dependency Parsing
  • Discussed in other sessions this semester
  • Notion of violating phrasal cohesion
  • Usually bad, but not always

37
Dependencies + ITG (Cherry and Lin 2006)
  • Find invalid dependency spans; assign score of -∞
    if used by the ITG parser
  • Simple model: maximize co-occurrence score with
    penalty for distant words
  • ITG reduces AER by 13% relative; dependencies +
    ITG reduce by 34%
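
For readers unfamiliar with the metric, here is a minimal sketch of alignment error rate (AER) as defined by Och and Ney (2003), together with the arithmetic behind a "relative" reduction. The link sets and the baseline and improved error rates in the example are invented and are not results from any of the cited papers.

```python
def aer(predicted, sure, possible):
    """AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where S is a subset of P."""
    possible = possible | sure  # sure links always count as possible
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))

def relative_reduction(baseline, improved):
    """Fraction of the baseline error that was removed."""
    return (baseline - improved) / baseline

# Hypothetical links as (e_index, f_index) pairs
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
predicted = {(0, 0), (1, 1), (2, 2)}
print(aer(predicted, sure, possible))
print(relative_reduction(0.20, 0.15))  # invented error rates
```

A relative figure depends on the baseline: the same absolute drop in AER is a larger relative reduction when the starting error is already low.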

38
Dependencies + ITG (Cherry and Lin 2006)
  • Discriminative training with an SVM
  • Feature vector for each ITG rule instance
  • Features from Taskar et al. 2005
  • Feature marking ITG inversion rules
  • Feature (penalty) marking invalid spans based on
    dependency tree

39
Dependencies + ITG (Cherry and Lin 2006)
  • Compared Taskar et al. to D-ITG with hard and
    soft constraints
  • Training: 50,000 French-English sentence pairs
    for counts and probabilities; 100 hand-annotated
    pairs with derived ITG trees for discriminative
    training
  • Test: 347 hand-annotated sentences from 2003
    parallel text workshop

40
Dependencies + ITG (Cherry and Lin 2006)
  • Relative improvement smaller in discriminative
    training scenario with stronger objective
    function
  • Hard constraint starts to hurt recall

41
Outline
  • The word alignment problem
  • Base approaches
  • Syntax-based approaches
  • Distortion models
  • Tree-to-string models
  • Tree-to-tree models
  • Discussion

42
All These Tradeoffs
  • Mathematical and statistical correctness vs.
    computability
  • Simple model vs. capturing linguistic phenomena
  • Not enough syntactic information vs. too much
    syntactic information
  • Ruling out bad alignments vs. keeping good
    alignments around

43
Alignment Spaces
  • Completely unconstrained: every alignment link
    (e_i, f_j) either on or off
  • Permutation space: one-to-one alignment with
    reordering (Taskar et al. 2005)
  • ITG space: permutation space satisfying binary
    tree constraint (Wu 1997)
  • Dependency space: permutation space maintaining
    phrasal cohesion
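
How much the binary tree constraint removes from permutation space can be checked by brute force for short sentences. The sketch below tests whether a permutation can be built from straight and inverted binary combinations. The function names `in_itg_space` and `itg_space_size` are illustrative, and the exhaustive enumeration is only practical for small n.

```python
from itertools import permutations
from math import factorial

def in_itg_space(perm):
    """True if perm can be derived with binary straight/inverted ITG rules."""
    if len(perm) <= 1:
        return True
    for k in range(1, len(perm)):
        left, right = perm[:k], perm[k:]
        straight = max(left) < min(right)   # left block stays before right block
        inverted = min(left) > max(right)   # the two blocks swap order
        if (straight or inverted) and in_itg_space(left) and in_itg_space(right):
            return True
    return False

def itg_space_size(n):
    """Number of permutations of n words that lie in ITG space."""
    return sum(in_itg_space(p) for p in permutations(range(n)))

for n in range(1, 7):
    print(n, itg_space_size(n), factorial(n))
```

For n = 4 only two of the 24 permutations are excluded, the "inside-out" patterns such as (1, 3, 0, 2). The excluded fraction grows with sentence length.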

44
Alignment Spaces
  • D-ITG space: Dependency ∩ ITG space (Cherry and
    Lin 2006)
  • HD-ITG space: D-ITG space where each span must
    contain a head (Cherry and Lin 2006a)

45
Examining Alignment Spaces (Cherry and Lin 2006a)
  • Alignment score
  • Learned co-occurrence score
  • Gold-standard oracle score

46
Examining Alignment Spaces (Cherry and Lin 2006a)
  • Learned co-occurrence score
  • More restricted spaces give better results

47
Examining Alignment Spaces (Cherry and Lin 2006a)
  • Oracle score: subsets of permutation space
  • ITG rules out almost nothing correct
  • Beam search in dependency space does worst

48
Conclusions
  • Base alignment models are mathematical, with
    limited notions of sentence structure
  • Syntax-aware alignment helpful for syntax-aware
    MT (DeNero and Klein 2007)
  • Using structure as a hard constraint is harmful
    for divergent sentences; tweaking trees (Gildea
    2003) or using soft constraints (Cherry and Lin
    2006) helps fix this

49
Conclusions
  • Surprise winner: ITG
  • Computationally straightforward
  • Permissive, simple grammar that mostly only rules
    out bad alignments (Cherry and Lin 2006a)
  • Does a lot, even when it's not the best
  • Discriminative framework looks promising and
    flexible: can incorporate generative models as
    features (Taskar et al. 2005)

50
Towards the Future
  • Easy-to-run GIZA++ made complicated IBM models
    the norm; promising discriminative or
    syntax-based models currently lack such a toolkit
  • Syntax-based discriminative techniques:
    morphology, POS, semantic information
  • Any other ideas?

51
References
  • Brown, P., J. Cocke, S. Della Pietra, V. Della
    Pietra, F. Jelinek, J. Lafferty, R. Mercer, and
    P. Roossin, "A statistical approach to machine
    translation," Computational Linguistics,
    16(2):79-85, 1990.
  • Brown, P., S. Della Pietra, V. Della Pietra, and
    R. Mercer, "The mathematics of statistical
    machine translation: Parameter estimation,"
    Computational Linguistics, 19(2):263-311, 1993.
  • Cherry, Colin and Dekang Lin, "Soft syntactic
    constraints for word alignment through
    discriminative training," Proceedings of the
    COLING/ACL Poster Session, 105-112, 2006.
  • Cherry, Colin and Dekang Lin, "A comparison of
    syntactically motivated alignment spaces,"
    Proceedings of EACL, 145-152, 2006a.
  • DeNero, John and Dan Klein, "Tailoring word
    alignments to syntactic machine translation,"
    Proceedings of ACL, 17-24, 2007.
  • Gildea, Daniel, "Loosely tree-based alignment for
    machine translation," Proceedings of ACL, 80-87,
    2003.

52
References
  • Och, Franz and Hermann Ney, "A systematic
    comparison of various statistical alignment
    models," Computational Linguistics, 29(1):19-51,
    2003.
  • Taskar, B., S. Lacoste-Julien, and D. Klein, "A
    discriminative matching approach to word
    alignment," Proceedings of HLT/EMNLP, 73-80,
    2005.
  • Vogel, S., H. Ney, and C. Tillmann, "HMM-based
    word alignment in statistical translation,"
    Proceedings of COLING, 836-841, 1996.
  • Wu, Dekai, "Stochastic inversion transduction
    grammars and bilingual parsing of parallel
    corpora," Computational Linguistics,
    23(3):377-403, 1997.
  • Yamada, Kenji and Kevin Knight, "A syntax-based
    statistical translation model," Proceedings of
    ACL, 523-530, 2001.
  • Zhang, Hao and Daniel Gildea, "Syntax-based
    alignment: Supervised or unsupervised?"
    Proceedings of COLING, 418-424, 2004.