1. Using Information about Multi-word Expressions for the Word-Alignment Task
- Sriram Venkatapathy and Aravind K. Joshi
2. Goal
- Show that information about multi-word expressions (MWEs), e.g. their compositionality, can be used effectively for tasks such as Machine Translation (MT).
- Subtask: Word alignment.
3. Table of contents
- Motivation and Task description
- Alignment algorithm
- Features
- Results
- Conclusion and Future work
4. Motivation
- It has previously been suggested that information about MWEs is helpful for MT.
- But this has not been shown empirically.
- We need to explore this possibility.
5. Verb-based MWEs
- The verb is the head.
- Example: spilling the beans
- Challenge for machine translation: the entire source expression is translated as a single verbal unit in the target language.
- Example: The cycling event took place in Philadelphia
- Lit. trans.: Philadelphia mein saikling ki pratiyogitaa jagaha li
- (Philadelphia in cycling event place take)
6. Verb-based MWEs
- We identify expressions headed by a verb using a dependency parser (Shen, 2006).
- Example: (dependency-parse figure)
7. Task: Alignment of verb-based MWEs
- Align verb-based MWEs in the source language with words in the target-language sentence in a parallel corpus.
8. Fraction of non-compositional MWEs
- In the source language (400 sentence pairs):
- Number of verb-dependent relations: 2209
- Number of times a verb and its dependent aligned with the same word of the target sentence: 193 (e.g., took place → hui)
- Percentage: 9% (significant!)
9. Table of contents
- Motivation and Task description
- Word-Alignment algorithm
- Features
- Results
- Conclusion and Future work
10. Algorithm
- Popular models for word alignment are the generative models (IBM models, GIZA++).
- But these models do not easily incorporate additional parameters which might be helpful for alignment.
- Discriminative models: determine the appropriate alignment based on the values of a set of parameters.
11. Algorithm (2)
- The best alignment: â = argmax_a score(a | S, T)
- Here, S is the source verb-based MWE and T is the target sentence.
- score(a | S, T) = scoreLa(a) + scoreG(a)
12. Algorithm (3)
- Computational complexity of an exhaustive search:
- Total possibilities: N^(V+A)
- where N = number of words in the target sentence, V = number of verbs in the source sentence, A = number of dependents in the source sentence.
- Hence, the need for an approximate beam-search algorithm.
13. Alignment algorithm (Beam search)
- Three main steps:
- Populate the beam: use local features to determine the K-best alignments of verbs and dependents with words in the target sentence.
- Re-order the beam: re-order the above alignments using more complex features.
- Post-processing: extend alignments to include other links that can be inferred.
14. Populate the Beam
- Obtain K-best candidate alignments using local scores.
- The local score is computed by looking at the features of each individual alignment link independently.
- scoreL(s, t) = W · fL(s, t)
- scoreLa(a) = Σ_(s,t)∈a scoreL(s, t)
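As an illustrative sketch (not the authors' implementation), the two local scoring quantities above are just a dot product per link and a sum over links:

```python
def score_local(w, f_local):
    # scoreL(s, t) = W · fL(s, t): dot product of the weight vector
    # with the feature vector of one alignment link.
    return sum(wi * fi for wi, fi in zip(w, f_local))

def score_la(w, link_feature_vectors):
    # scoreLa(a): sum of local scores over all links (s, t) in alignment a.
    return sum(score_local(w, f) for f in link_feature_vectors)
```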
15. Populate the Beam - 2
- Task: populate the beam in decreasing order of scoreLa(a).
- Compute the local score of each source word (verb and dependents) with every target word.
- Sort and store in local beams.
- Example of local beams for the sentence pair:
- The cycling event took place in Philadelphia
- Philadelphia mein saikling ki pratiyogitaa hui
16. Example of local beams
- (Figure: local beams for the source words Took, Place, Philadelphia, Event)
- Complexity: O((V + A) · N log N)
17. Example of local beams
- (Figure: local beams for the source words Took, Place, Philadelphia, Event)
- Best alignment (score 357):
  Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia
18. Populate the Beam - 2
- Boundary: partitions the local beams into two sets of links (explored and unexplored).
- The aim is to extend the boundary until the global beam has K (beam size) entries.
- The next link to include in the boundary is the link with the least difference in score from the top score of its local beam (greedy fashion).
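A minimal sketch of this greedy population step, assuming each local beam is a list of (target word, score) pairs sorted by descending score. It simplifies the boundary mechanism to a best-first search that always expands the candidate with the smallest score loss:

```python
import heapq

def populate_beam(local_beams, k):
    # local_beams: {source_word: [(target_word, score), ...] sorted desc}
    # Returns up to k alignments ordered by total local score scoreLa(a).
    words = list(local_beams)
    start = tuple(0 for _ in words)  # take the top link of every local beam

    def total(idx):
        return sum(local_beams[w][i][1] for w, i in zip(words, idx))

    heap = [(-total(start), start)]
    seen = {start}
    beam = []
    while heap and len(beam) < k:
        neg, idx = heapq.heappop(heap)
        beam.append(({w: local_beams[w][i][0] for w, i in zip(words, idx)}, -neg))
        for j, w in enumerate(words):  # greedily move one link down its local beam
            if idx[j] + 1 < len(local_beams[w]):
                nxt = idx[:j] + (idx[j] + 1,) + idx[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-total(nxt), nxt))
    return beam
```

With two local beams this reproduces the slide's pattern: the global beam is filled in strictly decreasing score order.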
19. Populate the Beam - 3
- (Figure: local beams for Took, Place, Philadelphia, Event)
- Global beam:
- Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (357)
- Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (352)
20. Populate the Beam - 4
- (Figure: local beams for Took, Place, Philadelphia, Event)
- Global beam:
- Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (357)
- Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (352)
- Event → Pratiyogitaa, Took → ki, Place → Philadelphia, Philadelphia → Philadelphia (351)
- Event → Pratiyogitaa, Took → ki, Place → Hui, Philadelphia → Philadelphia (346)
21. Re-order the Beam
- Use global scores to re-order the beam.
- scoreG(a) = W · fG(a)
- Overall score = scoreLa(a) + scoreG(a)
- Global features look at properties of the entire alignment configuration instead of individual links.
22. Re-order the Beam - 2
- Beam size K = 5
- Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (352 + 51 = 403)
- Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (357 + 39 = 396)
- Event → Pratiyogitaa, Took → ki, Place → Philadelphia, Philadelphia → Philadelphia (351 + 36 = 387)
- Event → Pratiyogitaa, Took → ki, Place → Hui, Philadelphia → Philadelphia (346 + 40 = 386)
- Event → Pratiyogitaa, Took → Hui, Place → saikling, Philadelphia → Philadelphia (346 + 30 = 376)
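Sketched in code, the re-ordering step simply re-sorts the K-best entries by their combined score; `global_score` here is a hypothetical stand-in for the learned W · fG(a):

```python
def reorder_beam(beam, global_score):
    # beam: list of (alignment, local_score) pairs from the populate step.
    # global_score: function mapping an alignment to scoreG(a).
    # Returns the beam sorted by overall score = scoreLa(a) + scoreG(a).
    return sorted(beam, key=lambda e: e[1] + global_score(e[0]), reverse=True)
```

With the numbers from the table above, the entry scoring 352 + 51 = 403 overtakes the one scoring 357 + 39 = 396.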
23. Post-processing
- The previous steps align one source word with one target word.
- But in Hindi, for compound verbs (and complex predicates), the verb in English is aligned to all the words which are part of the compound verb in Hindi.
- (Figure: alignment example glossed "I Shyam book lose give")
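This extension step can be sketched as follows, where `compound_groups` is an assumed input listing which target positions form one compound verb (the slide does not specify how these groups are obtained):

```python
def extend_to_compound(alignment, compound_groups):
    # alignment: {source_word: target_index}, one link per source word.
    # compound_groups: list of sets of target indices, each forming one
    # compound verb in the target sentence.
    extended = {s: {t} for s, t in alignment.items()}
    for s, t in alignment.items():
        for group in compound_groups:
            if t in group:
                extended[s] = set(group)  # infer links to the whole compound
    return extended
```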
24. Table of contents
- Motivation and Task description
- Word-Alignment algorithm
- Features
- Results
- Conclusion and Future work
25. Features - Local
- DiceWords (Taskar et al., 2005)
- The Dice coefficient of the source word and target word is defined as:
- DiceWords(s, t) = Count(s, t) / (Count(s) + Count(t))
- where s is the source word and t is the target word.
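A direct sketch of the feature as defined on the slide (note that the textbook Dice coefficient carries a factor of 2 in the numerator; the slide's formula does not):

```python
def dice_words(count_s, count_t, count_st):
    # DiceWords(s, t) = Count(s, t) / (Count(s) + Count(t)), per the slide.
    if count_s + count_t == 0:
        return 0.0
    return count_st / (count_s + count_t)
```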
26. Features - Local
- DiceRoots: Dice coefficient of the lemmatized forms of s and t.
- Dict: whether there exists a dictionary entry from source word s to target word t.
- Null: whether the source word is aligned with nothing in the target language.
27. Features - Global
- AvgDist: average distance between words in the target language which are aligned to verbs in the source language. AvgDist is normalized by the number of target-sentence words.
- Overlap: stores the count of pairs of verbs in the source-language sentence which align with the same word in the target-language sentence.
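A sketch of the two global features, assuming the alignment has already been reduced to the list of target positions aligned to source verbs (the exact pairwise averaging is my reading of "average distance"):

```python
from itertools import combinations

def avg_dist(verb_target_positions, target_len):
    # AvgDist: average pairwise distance between target positions aligned
    # to source verbs, normalized by the target sentence length.
    pairs = list(combinations(verb_target_positions, 2))
    if not pairs or target_len == 0:
        return 0.0
    return sum(abs(i - j) for i, j in pairs) / (len(pairs) * target_len)

def overlap(verb_target_positions):
    # Overlap: number of pairs of source verbs aligned to the same target word.
    return sum(1 for i, j in combinations(verb_target_positions, 2) if i == j)
```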
28. Features - MWE information
- MergePos
- Determines the likelihood of a dependent (with a given POS tag) aligning with the same word in the target language as the word to which its verb is aligned.
- Example: in the figure, the feature merge_RP is active.
- (Figure: alignment example glossed "He ran went")
29. Features - MWE information
- MergeMI
- Associates point-wise mutual information with the cases where the dependent has the same alignment in the target language as its verb.
- The mutual information (MI) is classified into three groups according to its absolute value:
- LOW if in the range 0-2
- MED if in the range 3-6
- HIGH if above 6
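The bucketing above is a simple threshold function; a sketch using the slide's thresholds:

```python
def mi_bucket(mi):
    # Classify point-wise mutual information by absolute value.
    v = abs(mi)
    if v <= 2:
        return "LOW"
    if v <= 6:
        return "MED"
    return "HIGH"
```

The bucket name then combines with the dependent's POS tag to form features such as merge_RP_HIGH.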
30. Features - MWE information
- The feature merge_RP_HIGH is active in the following figure.
- (Figure: alignment example glossed "He ran went")
31. Online large margin learning
- For parameter optimization, we used an online large-margin algorithm called MIRA (McDonald et al., 2005).
- Let the number of sentence pairs be m. The source sentences are dependency parsed (Shen et al., 2006).
- Let âq be the gold alignment for the qth sentence.
32. Online large margin training
- The generic large-margin algorithm we use can be defined as:

  Initialize W0, W, i = 0
  For p = 1 to NIterations
      For q = 1 to m
          Get the K best predictions (a1, a2, ..., aK) for the qth training example using the current model Wi
          Compute Wi+1 by updating Wi based on the predictions, the sentence and the gold alignment
          i = i + 1
          W = W + Wi+1
  W = W / (NIterations · m)
33. Online large margin training
- The updated weight vector Wi+1 is computed such that:
- there is a minimum change from Wi,
- and the score of the gold alignment exceeds the score of each prediction by a margin equal to the number of mistakes in that prediction.
- This can be stated as the following optimization problem:
- Minimize ||Wi+1 - Wi|| such that
  Score(âq) - Score(aq) > Mistakes(âq, aq) for each prediction aq
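For intuition, here is the closed-form MIRA update for the simplified single-prediction case (the paper's version enforces the constraint over all K predictions simultaneously): move W minimally along the feature difference so the margin constraint holds.

```python
def mira_update(w, f_gold, f_pred, mistakes):
    # Single-constraint MIRA step: find the smallest change to w such that
    # score(gold) - score(pred) >= mistakes.
    diff = [g - p for g, p in zip(f_gold, f_pred)]      # f(gold) - f(pred)
    margin = sum(wi * d for wi, d in zip(w, diff))      # current score gap
    norm_sq = sum(d * d for d in diff)
    if norm_sq == 0:
        return list(w)                                  # nothing to separate
    tau = max(0.0, (mistakes - margin) / norm_sq)       # closed-form step size
    return [wi + tau * d for wi, d in zip(w, diff)]
```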
34. Table of contents
- Motivation and Task description
- Word-Alignment algorithm
- Features
- Results
- Conclusion and Future work
35. Results - Dataset
- 400 word-aligned sentence pairs
- 294 sentence pairs for training
- 106 sentence pairs for testing
- Source sentences are dependency parsed.
- For training, we use simple heuristics to modify the training data such that for every word in the source sentence, only one corresponding word exists in the target language.
36. Experiments with GIZA++
- We evaluated our approach by comparing our results with the state-of-the-art GIZA++.
- GIZA++ was trained using an English-Hindi corpus of 50,000 sentence pairs.
37. Experiments with GIZA++
- We then lemmatize the words on both the source and target sides of the parallel corpus and run GIZA++ again.
38. Experiments with our model
- We trained our model on the training set of 294 sentence pairs.
- Beam size = 3, number of iterations = 3
- The following are the results when we add the local features, the global features (AvgDist, Overlap) and the GIZA++ probabilities.
39. Experiments with our model
- We now add the features representing properties of MWEs.
- We see that adding the MergeMI feature decreases the AER by about 3 percentage points.
40. Table of contents
- Motivation and Task description
- Word-Alignment algorithm
- Features
- Results
- Conclusion and Future work
41. Conclusion
- We have proposed a discriminative approach for using compositionality information about verb-based MWEs for the word-alignment task.
- We have shown that by adding information about MWEs (through point-wise MI in our work), we obtain a decrease in AER from 0.5279 to 0.4913.
42. Future work
- Conduct the experiments on a standard word-aligned dataset (English-French Europarl).
- Try better measures of compositionality and see how they affect word-alignment accuracy.
- Design a framework such that information about MWEs can be used in end-to-end MT.
43. Thank you for listening
- Questions and suggestions?