1
Phrase Extraction in PB-SMT
  • Ankit K Srivastava
  • NCLT/CNGL Presentation May 6, 2009

2
About
  • Phrase-based statistical machine translation
  • Methods for phrase extraction
  • Phrase induction via percolated dependencies
  • Experimental setup & evaluation results
  • Other facts & figures
  • Moses customization
  • Ongoing & future work
  • Endnote

3
PB-SMT Modeling

4
PB-SMT
  • Translate sequences of words (phrases) rather
    than single words
  • Segment the input, translate each segment,
    reorder the output
  • Components: translation model, language model,
    decoder
  • ê = argmax_e p(e | f) = argmax_e p(f | e) p(e)
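For reference, the standard formulation behind this objective (Koehn et al., 03); the LaTeX reconstruction below is mine, not from the slides:

    % Noisy-channel objective: best target sentence e for a source f
    \hat{e} = \arg\max_e p(e \mid f) = \arg\max_e p(f \mid e)\, p_{LM}(e)

    % Phrase-based decomposition: f is segmented into phrases
    % \bar{f}_1 \dots \bar{f}_I, each translated with probability \phi,
    % and d penalizes reordering (distortion) between phrases
    p(\bar{f}_1^I \mid \bar{e}_1^I)
      = \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\,
        d(start_i - end_{i-1} - 1)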

5
Learning Phrase Translations

6
Extraction I
  • Input is sentence-aligned parallel corpora
  • Most approaches use word alignments
  • Extract (learn) phrase pairs
  • Build a phrase translation table

7
Extraction II
(Koehn et al., 03)
  • Get word alignments in both directions (src2tgt,
    tgt2src)
  • Symmetrize with the grow-diag-final heuristic
  • Extract all phrase pairs consistent with the word
    alignment (see the sketch below)
  • Yields non-syntactic phrases: STR
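To make the consistency criterion concrete, here is a minimal Python sketch (my illustration, not the Moses code; it omits the expansion over unaligned boundary words that the full algorithm performs):

# Sketch of phrase-pair extraction consistent with a word alignment.
# A span pair is consistent iff no alignment link connects a word
# inside one span to a word outside the other (Koehn et al., 03).

def extract_phrases(n_src, n_tgt, alignment, max_len=7):
    """alignment: set of (src_idx, tgt_idx) links; returns span pairs."""
    phrases = []
    for s1 in range(n_src):
        for s2 in range(s1, min(s1 + max_len, n_src)):
            # Target positions linked to the source span [s1, s2]
            tgt = [j for (i, j) in alignment if s1 <= i <= s2]
            if not tgt:
                continue
            t1, t2 = min(tgt), max(tgt)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: nothing in [t1, t2] links outside [s1, s2]
            if all(s1 <= i <= s2 for (i, j) in alignment if t1 <= j <= t2):
                phrases.append(((s1, s2), (t1, t2)))
    return phrases

# Toy run: 3-word sentences, monotone one-to-one alignment
print(extract_phrases(3, 3, {(0, 0), (1, 1), (2, 2)}))
# -> [((0, 0), (0, 0)), ((0, 1), (0, 1)), ((0, 2), (0, 2)), ...]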

8
Extraction III
  • Sentence-aligned and word-aligned text
  • Monolingual parsing of both SRC & TGT
  • Align subtrees and extract string pairs (sketched
    below)
  • Yields syntactic phrases
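A hedged sketch of that last step (my illustration; the tree encoding and node-alignment format here are invented for the example, not the actual data structures of any cited aligner):

# Once two parse trees are node-aligned, each aligned node pair
# contributes the surface strings (yields) of its subtrees as a
# syntactic phrase pair.

def yield_of(tree):
    """Leaf words of a (label, children) tree; leaves are (POS, word)."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [w for child in body for w in yield_of(child)]

def phrases_from_node_alignment(src_nodes, tgt_nodes, links):
    """links: index pairs into the two node lists."""
    return [(" ".join(yield_of(src_nodes[i])),
             " ".join(yield_of(tgt_nodes[j])))
            for (i, j) in links]

# Toy usage with a single aligned NP pair (EN-FR, invented)
src = [("NP", [("DT", "the"), ("NN", "board")])]
tgt = [("NP", [("DT", "le"), ("NN", "conseil")])]
print(phrases_from_node_alignment(src, tgt, [(0, 0)]))
# -> [('the board', 'le conseil')]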

9
Extraction IV
(Tinsley et al., 07)
  • Parse using a constituency parser
  • Phrases are syntactic constituents: CON

(ROOT
  (S (NP (NNP Vinken))
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board))
             (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
             (NP (NNP Nov) (CD 29))))))
10
Extraction V
(Hearne et al., 08)
  • Parse using a dependency parser
  • Phrases have head-dependent relationships: DEP

HEAD      DEPENDENT
join      Vinken
join      will
board     the
join      board
join      as
director  a
director  nonexecutive
as        director
29        Nov
join      29
11
Extraction VI
  • Numerous other phrase extraction methods exist
  • Estimate phrase translations directly (Marcu &
    Wong, 02)
  • Use a heuristic other than grow-diag-final
  • Use marker-based chunks (Groves & Way, 05)
  • All models considered herein are string-to-string
    translation models

12
Head Percolation and Phrase Extraction

13
Percolation I
  • It is straightforward to convert a constituency
    tree to an unlabeled dependency tree
    (Gaifman, 65)
  • Use head percolation tables to identify the head
    child in a constituency representation (Magerman,
    95)
  • The dependency tree is obtained by recursively
    applying head-child and non-head-child heuristics
    (Xia & Palmer, 01)

14
Percolation II
  • (NP (DT the) (NN board))
  • Table entry: NP  right  NN/NNP/CD/JJ (scan the
    children right-to-left for the first NN/NNP/CD/JJ)
  • Head-marked: (NP-board (DT the) (NN board))
  • "the" is a dependent of "board" (see the sketch
    below)
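A minimal Python sketch of this conversion (my reconstruction of the recursive head-finding idea; the table entries mirror the slides, the code itself is illustrative):

# Magerman-style head percolation: for each constituent, pick the
# head child via (direction, priority-list) rules, then make every
# non-head child's lexical head depend on it (Xia & Palmer, 01).

PERC_TABLE = {
    "NP": ("right", ["NN", "NNP", "CD", "JJ"]),
    "PP": ("left",  ["IN", "PP"]),
    "S":  ("right", ["VP", "S"]),
    "VP": ("left",  ["VB", "VP"]),
}

def head_word(tree, deps):
    """tree: (label, children) or (POS, word). Returns the lexical
    head, appending (head, dependent) pairs to deps."""
    label, body = tree
    if isinstance(body, str):          # preterminal: (POS, word)
        return body
    direction, priorities = PERC_TABLE.get(label, ("left", []))
    kids = body if direction == "left" else list(reversed(body))
    head_child = next((k for k in kids if k[0] in priorities), kids[0])
    head = head_word(head_child, deps)
    for child in body:
        if child is not head_child:
            deps.append((head, head_word(child, deps)))
    return head

deps = []
head_word(("NP", [("DT", "the"), ("NN", "board")]), deps)
print(deps)                            # -> [('board', 'the')]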

15
Percolation III
INPUT: constituency tree + head percolation table

(ROOT
  (S (NP (NNP Vinken))
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board))
             (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
             (NP (NNP Nov) (CD 29))))))

NP  right  NN / NNP / CD / JJ
PP  left   IN / PP
S   right  VP / S
VP  left   VB / VP

OUTPUT: head-dependent pairs

HEAD      DEPENDENT
join      Vinken
join      will
board     the
join      board
join      as
director  a
director  nonexecutive
as        director
29        Nov
join      29
16
Percolation IV
  • cf. the Extraction III slide (syntactic phrases)
  • Parse by applying head percolation tables to
    constituency-annotated trees
  • Align the trees, extract surface chunks
  • Phrases have head-dependent relations: PERC

17
Tools, Resources, and MT System Performance

18
System setup I
RESOURCE TYPE      | NAME                                             | DETAILS
Corpora            | JOC, EUROPARL                                    | Chiao et al., 06; Koehn, 05
Parsers            | Berkeley Parser, Syntex Parser, Head Percolation | Petrov et al., 06; Bourigault et al., 05; Xia & Palmer, 01
Alignment Tools    | GIZA++, Phrase Heuristics, Tree Aligner          | Och & Ney, 03; Koehn et al., 03; Zhechev, 09
Lang. Modeling     | SRILM Toolkit                                    | Stolcke, 02
Decoder            | Moses                                            | Koehn et al., 07
Evaluation Scripts | BLEU, NIST, METEOR, WER, PER                     | Papineni et al., 02; Doddington, 02; Banerjee & Lavie, 05
19
System setup II
CORPORA  | TRAIN   | DEV   | TEST
JOC      | 7,723   | 400   | 599
EUROPARL | 100,000 | 1,889 | 2,000

  • All 4 systems are run with the same
    configuration (with MERT tuning) on 2 different
    datasets
  • They differ only in their phrase tables (chunk
    counts below)

CORPORA  | STR     | CON   | DEP   | PERC
JOC      | 236 K   | 79 K  | 74 K  | 72 K
EUROPARL | 2,145 K | 663 K | 583 K | 565 K
20
System setup III
SYSTEM | BLEU  | NIST | METEOR | WER   | PER

On JOC (7K) data:
STR    | 31.29 | 6.31 | 63.91  | 61.09 | 47.34
CON    | 30.64 | 6.34 | 63.82  | 60.72 | 45.99
DEP    | 30.75 | 6.31 | 64.12  | 61.34 | 46.77
PERC   | 29.19 | 6.09 | 62.12  | 62.69 | 48.21

On EUROPARL (100K) data:
STR    | 28.50 | 7.00 | 57.83  | 57.43 | 44.11
CON    | 25.64 | 6.55 | 55.26  | 60.77 | 46.82
DEP    | 25.24 | 6.59 | 54.65  | 60.73 | 46.51
PERC   | 25.87 | 6.59 | 55.63  | 60.76 | 46.48
21
Analyzing Str, Con, Dep, and Perc

Analysis w.r.t. Europarl data only
22
Analysis I
  • Number of common and unique phrase pairs between
    each pair of extraction types
  • Maybe we should combine the phrase tables (a
    counting sketch follows the table)

Phrase Types | Common to both | Unique in 1st type | Unique in 2nd type
DEP & PERC   | 369 K          | 213 K              | 195 K
CON & PERC   | 492 K          | 171 K              | 72 K
STR & PERC   | 127 K          | 2,018 K            | 437 K
CON & DEP    | 391 K          | 271 K              | 191 K
STR & DEP    | 128 K          | 2,016 K            | 454 K
STR & CON    | 144 K          | 2,000 K            | 518 K
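Counts like these can be reproduced with a few lines of Python (a sketch; the file names are placeholders, and the format assumed is the standard Moses "src ||| tgt ||| scores" phrase table):

# Compare two phrase tables: shared vs. unique (source, target) pairs.

def load_pairs(path):
    with open(path, encoding="utf-8") as f:
        return {tuple(line.split(" ||| ")[:2]) for line in f}

dep = load_pairs("phrase-table.dep")      # placeholder paths
perc = load_pairs("phrase-table.perc")
print(len(dep & perc), len(dep - perc), len(perc - dep))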
23
Analysis II
  • Concatenate phrase tables and re-estimate
    probabilities
  • 15 different phrase-table combinations: sum of
    4Cr over 1 ≤ r ≤ 4
  • S = STR, C = CON, D = DEP, P = PERC

UNI | BI         | TRI           | QUAD
S   | SC, SD, SP | SCD, SCP, SDP | SCDP
C   | CD, CP     | CDP           | -
D   | DP         | -             | -
P   | -          | -             | -
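The table enumerates 4 + 6 + 4 + 1 = 15 non-empty subsets; a short Python check (illustrative):

from itertools import combinations

# All non-empty subsets of the four phrase tables
combos = ["".join(c) for r in range(1, 5)
          for c in combinations("SCDP", r)]
print(len(combos))   # -> 15
print(combos)        # -> ['S', 'C', 'D', 'P', 'SC', 'SD', ..., 'SCDP']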
24
Analysis III
  • All 15 systems are run with the same
    configuration (with MERT tuning)
  • They differ only in their phrase tables
  • This is combination at the translation-model
    level

25
Analysis IV
Performance on Europarl
26
Analysis V
Sample outputs for one Europarl test sentence:
  • REF: Does the commission intend to seek more
    transparency in this area?
  • S: Will the commission ensure that more than
    transparency in this respect?
  • C: The commission will the commission ensure
    greater transparency in this respect?
  • D: The commission will the commission ensure
    greater transparency in this respect?
  • P: Does the commission intend to ensure greater
    transparency in this regard?
  • SC: Will the commission ensure that more
    transparent in this respect?
  • SD: Will the commission ensure that more
    transparent in this respect?
  • SP: Does the commission intend to take to ensure
    that more than openness in this regard?
  • CD: The commission will the commission ensure
    greater transparency in this respect?
  • CP: The commission will the commission ensure
    greater transparency in this respect?
  • DP: The commission will the commission ensure
    greater transparency in this respect?
  • SCD: Does the commission intend to take to ensure
    that more transparent commit?
  • SCP: Does the commission intend to take in this
    regard to ensure greater transparency?
  • SDP: Does the commission intend to take in this
    regard to ensure greater transparency?
  • CDP: The commission will the commission ensure
    greater transparency in this respect?
  • SCDP: Does the commission intend to take to
    ensure that more transparent suspected?

27
Analysis VI
  • Which phrases does the decoder use?
  • Decoder trace on SCDP
  • Out of 11,748 phrases used: S (5,204), C (2,441),
    D (2,319), P (2,368)

28
Analysis VII
  • Automatic per-sentence evaluation using TER on a
    test set of 2,000 sentences (Snover et al., 06)
  • C (1,120), P (331), D (301), S (248)
  • Manual per-sentence evaluation on a random test
    set of 100 sentences using pairwise system
    comparison
  • P>C (27), P>D (5), SC>SCP (11)

29
Analysis VIII
  • Treat the different phrase-table combinations as
    individual MT systems
  • Perform system combination using the MBR-CN
    framework (Du et al., 09)
  • This is combination at the system level (an MBR
    selection sketch follows the table)

SYSTEM | BLEU  | NIST | METEOR | WER   | PER
STR    | 29.46 | 7.11 | 58.87  | 56.43 | 43.03
CON    | 28.93 | 6.79 | 57.34  | 58.54 | 44.83
DEP    | 28.38 | 6.81 | 56.59  | 58.61 | 44.74
PERC   | 29.27 | 6.82 | 57.72  | 58.37 | 44.53
MBR    | 29.52 | 6.85 | 57.84  | 58.13 | 44.40
CN     | 30.70 | 7.06 | 58.52  | 55.87 | 42.86
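For intuition, a minimal sentence-level MBR selection sketch (my illustration with a toy unigram-overlap gain standing in for BLEU; this is not the Du et al., 09 MBR-CN implementation):

# Pick the hypothesis with maximal expected gain (minimal expected
# loss) against the other systems' outputs for the same sentence.

def mbr_select(hypotheses):
    """hypotheses: list of token lists, one per MT system."""
    def gain(h, r):                    # toy unigram overlap, a
        return len(set(h) & set(r)) / max(len(set(r)), 1)   # BLEU stand-in
    return max(hypotheses,
               key=lambda h: sum(gain(h, r) for r in hypotheses
                                 if r is not h))

systems = [["does", "the", "commission", "intend"],
           ["will", "the", "commission", "ensure"],
           ["does", "the", "commission", "ensure"]]
print(" ".join(mbr_select(systems)))   # -> "does the commission ensure"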
30
Analysis IX
  • Using the Moses baseline phrases (STR) is
    essential for coverage. SIZE matters!
  • However, adding any other system to STR increases
    the baseline score. Symbiotic!
  • Hence, do not replace STR, but supplement it.

31
Analysis X
  • CON seems to be the best combination with STR
    (SC seems to be the best-performing system)
  • It has the most chunks in common with PERC
  • Whether PERC harms a CON system needs more
    analysis (bias between CON & PERC)

32
Analysis XI
  • DEP chunks differ from PERC chunks, despite the
    equivalent syntactic representation
  • DEP can be substituted by PERC
  • The difference lies in knowledge induced from
    dependency vs. constituency parsing. A different
    aligner?

33
Analysis XII
  • PERC is a unique knowledge source. Is it just a
    simple case of parser combination?
  • Sometimes, it helps.
  • Needs more work on finding connection with CON /
    DEP

34
Customizing Moses for syntax-supplemented phrase
tables

35
Moses customization
  • Incorporating syntax (CON, DEP, PERC)
  • Reordering model
  • Phrase scoring (new features; a sketch follows
    this list)
  • Decoder parameters
  • Log-linear combination of T-tables
  • Good phrase translations may be lost by the
    decoder. How can we ensure they remain intact?
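One way to realize the "new features" and "combination of T-tables" items is to merge tables and tag each entry with per-table origin indicators that MERT can weight. A hypothetical sketch, not Moses internals (real Moses feature values are probabilities whose logs enter the model, so the 0/1 indicators here are only illustrative):

# Merge several Moses-style phrase tables ("src ||| tgt ||| scores"),
# appending one origin indicator per input table as extra features.
# On duplicate (src, tgt) entries, the first table's scores are kept
# (a real system would re-estimate them instead).

def merge_tables(paths, out_path):
    merged = {}
    for k, path in enumerate(paths):
        with open(path, encoding="utf-8") as f:
            for line in f:
                src, tgt, scores = line.rstrip("\n").split(" ||| ")[:3]
                entry = merged.setdefault((src, tgt),
                                          [scores, [0] * len(paths)])
                entry[1][k] = 1        # mark the table(s) this pair is in
    with open(out_path, "w", encoding="utf-8") as out:
        for (src, tgt), (scores, origin) in sorted(merged.items()):
            flags = " ".join(str(v) for v in origin)
            out.write(f"{src} ||| {tgt} ||| {scores} {flags}\n")

# merge_tables(["pt.str", "pt.con", "pt.dep", "pt.perc"], "pt.merged")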

36
Work in Progress and Future Plans

37
Ongoing & future work
  • Scaling: data size, language pair, language
    direction
  • Bias between CON & PERC
  • Combining phrase pairs
  • Combining systems
  • Classify performance into sentence types
  • Improve the quality of phrase pairs in PB-SMT

38
Endnote

39
Endnote
  • Explored 3 linguistically motivated phrase
    extractions against Moses phrases
  • Improves the baseline. The highest recorded gain
    is a 10% relative increase in BLEU on 100K
  • Rather than pursuing ONE way, combine options
  • Need more analysis of supplementing the phrase
    table with multiple syntactic T-tables

40
Thank You!
41
Phrase Extraction in PB-SMT
  • Phrase-based Statistical Machine Translation
    (PB-SMT) models, the most widely researched
    paradigm in MT today, rely heavily on the
    quality of phrase pairs induced from large
    amounts of training data. There are numerous
    methods for extracting these phrase translations
    from parallel corpora. In this talk I will
    describe phrase pairs induced from percolated
    dependencies and contrast them with three
    pre-existing phrase extractions. I will also
    present the performance of the individual phrase
    tables and their combinations in a PB-SMT system.
    I will then conclude with ongoing experiments and
    future research directions.

42
Thanks!
Andy Way
John Tinsley
Sylwia Ozdowska
Sergio Penkale
Patrik Lambert
Jinhua Du
Ventsislav Zhechev