Title: Using Percolated Dependencies in PBSMT
1Using Percolated Dependencies in PBSMT
CLUKI XII April 24, 2009
- Ankit K. Srivastava and Andy Way
- Dublin City University
2About
3Syntactic Parsing and Head Percolation
4Parsing I Constituency Structure
- Vinken will join the board as a nonexecutive director Nov 29
  (ROOT
    (S
      (NP (NNP Vinken))
      (VP (MD will)
        (VP (VB join)
          (NP (DT the) (NN board))
          (PP (IN as)
            (NP (DT a) (JJ nonexecutive) (NN director)))
          (NP (NNP Nov) (CD 29))))))
5Parsing II Dependency Structure
- Vinken will join the board as a nonexecutive director Nov 29
- HEAD DEPENDENT
- join Vinken
- join will
- board the
- join board
- join as
- director a
- director nonexecutive
- as director
- 29 Nov
- join 29
6Parsing III Head Percolation
- It is straightforward to convert a constituency tree to an unlabeled dependency tree (Gaifman 1965)
- Use head percolation tables to identify the head child in a constituency representation (Magerman 1995)
- The dependency tree is obtained by recursively applying head child and non-head child heuristics (Xia & Palmer 2001)
- (NP (DT the) (NN board))
- NP right NN/NNP/CD/JJ
- (NP-board (DT the) (NN board))
- the is dependent on board
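The NP example above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: only the NP row of `HEAD_TABLE` comes from the slide, and reading the rule as "scan from the right, trying NN first, then NNP, CD, JJ" is an assumption, as are the helper names `parse`, `head_child` and `percolate`.

```python
import re

# Hypothetical head percolation table: label -> (scan direction, priority list).
# Only the NP row is taken from the slide; a real table has one row per label.
HEAD_TABLE = {
    "NP": ("right", ["NN", "NNP", "CD", "JJ"]),
}

def parse(text):
    """Parse a bracketed tree into (label, children) or (tag, word) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":        # preterminal: (TAG word)
            word = tokens[pos]
            pos += 2                  # consume the word and ")"
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1                      # consume ")"
        return (label, children)

    return node()

def head_child(label, children):
    """Index of the head child; labels missing from the table default to the rightmost child."""
    direction, priorities = HEAD_TABLE.get(label, ("right", []))
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:         # assumed semantics: earlier labels win
        for i in order:
            if children[i][0] == wanted:
                return i
    return order[0]

def percolate(tree, deps):
    """Return the lexical head of `tree`, appending (head, dependent) pairs to `deps`."""
    label, rest = tree
    if isinstance(rest, str):
        return rest
    heads = [percolate(child, deps) for child in rest]
    h = head_child(label, rest)
    for i, word in enumerate(heads):
        if i != h:
            deps.append((heads[h], word))
    return heads[h]

deps = []
root = percolate(parse("(NP (DT the) (NN board))"), deps)
print(root, deps)   # board [('board', 'the')]
```

The head word percolates up to the NP node (the slide's `NP-board`), and every non-head child's head word becomes a dependent of it.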
7Parsing IV Three Parses
- Constituency (phrase-structure) parses: CON (requires a CON parser)
- Dependency (head-dependent) parses: DEP (requires a DEP parser)
- Percolated (head-dependent) parses: PERC (requires a CON parser plus heuristics)
8Phrase-Based Statistical Machine Translation
9PBSMT I Framework
- $\arg\max_e p(e \mid f) = \arg\max_e p(f \mid e)\, p(e)$
- Decoder, Translation Model, Language Model
- PBSMT framework in Moses (Koehn et al., 2007)
- Phrase Table in Translation Model: align words, extract phrases, score phrases
- Different methods to extract phrases
- Moses phrase extraction as baseline system
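The decision rule at the top of this slide follows from Bayes' rule. A short derivation, assuming the usual convention that $f$ is the source sentence and $e$ the target sentence (the slide does not define the symbols, and $\hat{e}$ is introduced here only to name the result):

```latex
\begin{align*}
\hat{e} &= \arg\max_{e} \, p(e \mid f) \\
        &= \arg\max_{e} \, \frac{p(f \mid e)\, p(e)}{p(f)} \\
        &= \arg\max_{e} \, p(f \mid e)\, p(e)
\end{align*}
```

The denominator $p(f)$ does not depend on $e$, so it drops out of the maximisation. The factor $p(f \mid e)$ is the translation model, $p(e)$ is the language model, and the search for $\hat{e}$ is the decoder's job.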
10PBSMT II Non-syntactic Phrase Extraction
- baseline Moses
- Get word alignments (src2tgt, tgt2src)
- Perform grow-diag-final heuristics (Koehn et al., 2003)
- Extract phrase pairs consistent with the word alignments
- String-based (non-syntactic) phrases: STR
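The consistency criterion in the extraction step can be sketched as follows. This is a simplified illustration rather than the Moses implementation: it does not extend phrases over unaligned boundary words, and the function name `extract_phrases`, the `max_len` limit, and the toy sentence pair with its alignment are all assumptions made for the example.

```python
def extract_phrases(src, tgt, alignment, max_len=7):
    """Return phrase pairs consistent with `alignment`, a set of (src_idx, tgt_idx) links."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any source word in [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word in [j1, j2] may link outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

# Toy example (made up): "blue" and "house" swap order in French.
src = ["the", "blue", "house"]
tgt = ["la", "maison", "bleue"]
alignment = {(0, 0), (1, 2), (2, 1)}
phrases = extract_phrases(src, tgt, alignment)
for pair in sorted(phrases):
    print(pair)
```

The span "the blue" is rejected: its target side would have to cover "la maison bleue", but "maison" is linked to "house", which lies outside the source span.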
11PBSMT III Syntactic Phrase Extraction
- Get word alignments (src2tgt, tgt2src)
- Parse src sentences
- Parse tgt sentences
- Use Tree Aligner to align subtree nodes (Zhechev 2009)
- Extract surface-level chunks from parallel treebanks
- Previously: Tinsley et al., 2007; Hearne et al., 2008
- Syntactic phrases: CON, DEP, PERC
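Extracting surface-level chunks from aligned tree nodes can be sketched as below. The tree aligner's real output format is not given on the slide, so representing each link as a pair of `(label, start, end)` nodes is an assumption, as are the helper names and the toy tree pair.

```python
def leaves(tree):
    """Words of a (label, children) / (tag, word) tree, left to right."""
    label, rest = tree
    if isinstance(rest, str):
        return [rest]
    return [w for child in rest for w in leaves(child)]

def node_spans(tree, start=0):
    """Return (end, spans), where spans lists (label, start, end) per node in post-order."""
    label, rest = tree
    if isinstance(rest, str):
        return start + 1, [(label, start, start + 1)]
    end, found = start, []
    for child in rest:
        end, sub = node_spans(child, end)
        found.extend(sub)
    found.append((label, start, end))
    return end, found

def chunk_pairs(src_tree, tgt_tree, links):
    """Turn node links (assumed aligner output) into surface-level phrase pairs."""
    src_words, tgt_words = leaves(src_tree), leaves(tgt_tree)
    return [(" ".join(src_words[s1:e1]), " ".join(tgt_words[s2:e2]))
            for (_, s1, e1), (_, s2, e2) in links]

# Toy parallel trees (made up; the tags on the French side are illustrative only).
src_tree = ("NP", [("DT", "the"), ("JJ", "blue"), ("NN", "house")])
tgt_tree = ("NP", [("DT", "la"), ("NN", "maison"), ("JJ", "bleue")])
_, src_nodes = node_spans(src_tree)
_, tgt_nodes = node_spans(tgt_tree)

# Hypothetical links: NP <-> NP and NN(house) <-> NN(maison).
links = [(src_nodes[3], tgt_nodes[3]), (src_nodes[2], tgt_nodes[1])]
chunks = chunk_pairs(src_tree, tgt_tree, links)
print(chunks)   # [('the blue house', 'la maison bleue'), ('house', 'maison')]
```

Unlike string-based extraction, only spans that correspond to linked tree nodes yield phrase pairs, which is why the syntactic phrase tables are much smaller than STR.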
12System Design
13System I Tools and Resources
- English-French parallel corpora
- Phrase Structure Parsers (En, Fr)
- Dependency Structure Parsers (En, Fr)
- Head Percolation tables (En, Fr)
- Statistical Tree Aligner
- GIZA++ Word Aligner
- SRILM (Language Modeling) Toolkit
- Moses Decoder
14System II Entries in Phrase tables Europarl
- PERC is a unique knowledge source, but is it useful?
15System III Combinations
- Concatenate phrase tables and re-estimate probabilities
- 15 different systems: $\sum_{r=1}^{4} \binom{4}{r} = 15$
- STR, CON, DEP, PERC
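The count of 15 systems and the combination step can be sketched as follows. The enumeration is exact; the re-estimation is only an illustration that assumes plain relative-frequency scoring over pooled counts, and the function name `combine` and the toy counts are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations

CHUNK_TYPES = ["STR", "CON", "DEP", "PERC"]

# Every non-empty subset of the four chunk types: C(4,1) + C(4,2) + C(4,3) + C(4,4).
systems = [combo
           for r in range(1, len(CHUNK_TYPES) + 1)
           for combo in combinations(CHUNK_TYPES, r)]
print(len(systems))   # 15

def combine(tables):
    """Pool phrase-pair counts and re-estimate p(tgt | src) by relative frequency."""
    counts = Counter()
    for table in tables:              # each table maps (src, tgt) -> count
        counts.update(table)
    totals = defaultdict(int)
    for (src, _), c in counts.items():
        totals[src] += c
    return {(src, tgt): c / totals[src] for (src, tgt), c in counts.items()}

# Toy counts (made up) for two chunk types.
str_table = {("the board", "le conseil"): 2, ("the board", "la commission"): 1}
perc_table = {("the board", "le conseil"): 1}
probs = combine([str_table, perc_table])
print(probs[("the board", "le conseil")])   # 0.75
```

Because the counts are pooled before normalising, a phrase pair found by several extraction methods gains probability mass relative to one found by a single method.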
16MT Systems and Evaluation
17Numbers I Evaluation - JOC
18Numbers II Evaluation - Europarl
19Numbers III Uniquely best
- Evaluate MT systems STR, CON, DEP, PERC on a per-sentence level (Translation Error Rate)
- JOC (440 sentences)
- Europarl (2000 sentences)
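One way to count "uniquely best" sentences is sketched below. It assumes that a system is uniquely best on a sentence when its TER is strictly lower than every other system's (ties count for nobody); the slide does not spell out the criterion, and the scores are made up for illustration.

```python
def uniquely_best(scores):
    """scores: system name -> list of per-sentence TER values (lower is better).
    Returns, per system, the number of sentences it wins outright."""
    names = list(scores)
    n_sentences = len(next(iter(scores.values())))
    wins = {name: 0 for name in names}
    for k in range(n_sentences):
        ranked = sorted(names, key=lambda name: scores[name][k])
        best, runner_up = ranked[0], ranked[1]
        if scores[best][k] < scores[runner_up][k]:   # strict: ties count for nobody
            wins[best] += 1
    return wins

# Made-up TER scores for three sentences.
scores = {
    "STR":  [0.30, 0.50, 0.40],
    "CON":  [0.35, 0.45, 0.40],
    "DEP":  [0.40, 0.55, 0.45],
    "PERC": [0.32, 0.60, 0.50],
}
wins = uniquely_best(scores)
print(wins)   # {'STR': 1, 'CON': 1, 'DEP': 0, 'PERC': 0}
```

In the toy data the third sentence is a tie between STR and CON, so neither system is credited for it.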
20Numbers IV Adding PERC Europarl
21Analysis of Results
22Analysis I STR
- Using Moses baseline phrases (STR) is essential for coverage. SIZE matters!
- However, adding any system to STR increases baseline score. Symbiotic!
- Hence, do not replace STR, but augment it.
23Analysis II CON
- Seems to be the best combination with STR (SC, i.e. STR+CON, seems to be the best performing system)
- Shares the most chunks with PERC
- Does PERC harm a CON system? Needs more analysis
24Analysis III DEP
- PERC chunks are different from DEP chunks, despite the two being formally equivalent
- PERC can substitute for DEP
25Analysis IV PERC
- Is a unique knowledge source.
- Sometimes, it helps.
- Needs more work on finding connection with CON / DEP
26Conclusion and Future Work
27Conclusion and Future Work
- Extended Hearne et al., 2008 by
  - scaling up data size from 7.7K to 100K
  - introducing percolated dependencies in PBSMT
- Manual evaluation
- More analysis of results
- More combining strategies
- Seek to determine whether each chunk type "owns" particular sentence types
28Thanks
- asrivastava _at_ computing.dcu.ie