1. Using Information about Multi-word Expressions for the Word-Alignment Task
- Sriram Venkatapathy and Aravind K. Joshi
2. Goal
- Show that information about multi-word expressions (MWEs), e.g. their compositionality, can be used effectively for tasks such as Machine Translation (MT).
- Subtask: Word alignment.
3. Table of contents
- Motivation and Task description
- Alignment algorithm
- Features
- Results
- Conclusion and Future work
4. Motivation
- It has previously been suggested that information about MWEs is helpful for MT.
- But this has not been shown empirically.
- We need to explore this possibility.
5. Verb-based MWEs
- The verb is the head.
- Example: spilling the beans
- Challenge for machine translation: the entire source expression is translated as a single verbal unit in the target language.
- Example: The cycling event took place in Philadelphia
- Lit. trans.: Philadelphia mein saikling ki pratiyogitaa jagaha li
- (Philadelphia in cycling event place take)
6. Verb-based MWEs
- We identify expressions headed by a verb using a dependency parser (Shen, 2006).
- Example: (dependency-parse figure)
7. Task: Alignment of verb-based MWEs
- Align verb-based MWEs in the source language with words in the target-language sentence in a parallel corpus.
8. Fraction of non-compositional MWEs
- In the source language (400 sentence pairs):
- Number of verb-dependent relations: 2209
- Number of times a verb and its dependent aligned with the same word of the target sentence: 193 (e.g., took place → hui)
- Percentage: 9% (significant!)
9. Table of contents
- Motivation and Task description
- Word-Alignment algorithm
- Features
- Results
- Conclusion and Future work
10. Algorithm
- Popular models for word alignment are the generative models (IBM models, GIZA++).
- But these models do not easily incorporate additional parameters which might be helpful for alignment.
- Discriminative models: determine the appropriate alignment based on the values of a set of parameters.
11. Algorithm (2)
- The best alignment: â = argmax_a score(a | S, T)
- Here, S is the source verb-based MWE and T is the target sentence.
- score(a | S, T) = scoreLa(a) + scoreG(a)
12. Algorithm (3)
- Computational complexity of an exhaustive search:
- Total possibilities: N^(V+A)
- where N = number of words in the target sentence, V = number of verbs in the source sentence, A = number of dependents in the source sentence.
- Hence, the need for an approximate beam-search algorithm.
13. Alignment algorithm (Beam search)
- Three main steps:
- Populate the beam: use local features to determine the K-best alignments of verbs and dependents with words in the target sentence.
- Re-order the beam: re-order the above alignments using more complex features.
- Post-processing: extend alignments to include other links that can be inferred.
14. Populate the Beam
- Obtain K-best candidate alignments using local scores.
- The local score is computed by looking at the features of each individual alignment link independently.
- scoreL(s, t) = W · fL(s, t)
- scoreLa(a) = Σ_(s,t)∈a scoreL(s, t)
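As an illustrative sketch (not the authors' implementation), the two local scoring quantities above are just a dot product per link and a sum over links:

```python
def score_local(w, f_local):
    # scoreL(s, t) = W · fL(s, t): dot product of the weight vector
    # with the feature vector of one alignment link.
    return sum(wi * fi for wi, fi in zip(w, f_local))

def score_la(w, link_feature_vectors):
    # scoreLa(a): sum of local scores over all links (s, t) in alignment a.
    return sum(score_local(w, f) for f in link_feature_vectors)
```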
15. Populate the Beam - 2
- Task: populate the beam in decreasing order of scoreLa(a).
- Compute the local score of each source word (verb and dependents) with every target word.
- Sort and store in local beams.
- Example of local beams for the sentence pair:
- The cycling event took place in Philadelphia
- Philadelphia mein saikling ki pratiyogitaa hui
16. Example of local beams
- (Figure: local beams for the source words Took, Place, Philadelphia, Event)
- Complexity: O((V + A) · N log N)
17. Example of local beams
- (Figure: local beams for the source words Took, Place, Philadelphia, Event)
- Best alignment (score 357):
  Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia
18. Populate the Beam - 2
- Boundary: partitions the local beams into two sets of links (explored and unexplored).
- The aim is to extend the boundary until the global beam has K (beam size) entries.
- The next link to include in the boundary is the link with the least difference in score from the top score of its local beam (greedy fashion).
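A minimal sketch of this greedy population step, assuming each local beam is a list of (target word, score) pairs sorted by descending score. It simplifies the boundary mechanism to a best-first search that always expands the candidate with the smallest score loss:

```python
import heapq

def populate_beam(local_beams, k):
    # local_beams: {source_word: [(target_word, score), ...] sorted desc}
    # Returns up to k alignments ordered by total local score scoreLa(a).
    words = list(local_beams)
    start = tuple(0 for _ in words)  # take the top link of every local beam

    def total(idx):
        return sum(local_beams[w][i][1] for w, i in zip(words, idx))

    heap = [(-total(start), start)]
    seen = {start}
    beam = []
    while heap and len(beam) < k:
        neg, idx = heapq.heappop(heap)
        beam.append(({w: local_beams[w][i][0] for w, i in zip(words, idx)}, -neg))
        for j, w in enumerate(words):  # greedily move one link down its local beam
            if idx[j] + 1 < len(local_beams[w]):
                nxt = idx[:j] + (idx[j] + 1,) + idx[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-total(nxt), nxt))
    return beam
```

With two local beams this reproduces the slide's pattern: the global beam is filled in strictly decreasing score order.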
19. Populate the Beam - 3
- (Figure: local beams for Took, Place, Philadelphia, Event)
- Global beam:
- Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (357)
- Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (352)
20. Populate the Beam - 4
- (Figure: local beams for Took, Place, Philadelphia, Event)
- Global beam:
- Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (357)
- Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (352)
- Event → Pratiyogitaa, Took → ki, Place → Philadelphia, Philadelphia → Philadelphia (351)
- Event → Pratiyogitaa, Took → ki, Place → Hui, Philadelphia → Philadelphia (346)
21. Re-order the Beam
- Use global scores to re-order the beam.
- scoreG(a) = W · fG(a)
- Overall score = scoreLa(a) + scoreG(a)
- Global features look at properties of the entire alignment configuration instead of individual links.
22. Re-order the Beam - 2
- Beam size K = 5
- Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (352 + 51 = 403)
- Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (357 + 39 = 396)
- Event → Pratiyogitaa, Took → ki, Place → Philadelphia, Philadelphia → Philadelphia (351 + 36 = 387)
- Event → Pratiyogitaa, Took → ki, Place → Hui, Philadelphia → Philadelphia (346 + 40 = 386)
- Event → Pratiyogitaa, Took → Hui, Place → saikling, Philadelphia → Philadelphia (346 + 30 = 376)
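Sketched in code, the re-ordering step simply re-sorts the K-best entries by their combined score; `global_score` here is a hypothetical stand-in for the learned W · fG(a):

```python
def reorder_beam(beam, global_score):
    # beam: list of (alignment, local_score) pairs from the populate step.
    # global_score: function mapping an alignment to scoreG(a).
    # Returns the beam sorted by overall score = scoreLa(a) + scoreG(a).
    return sorted(beam, key=lambda e: e[1] + global_score(e[0]), reverse=True)
```

With the numbers from the table above, the entry scoring 352 + 51 = 403 overtakes the one scoring 357 + 39 = 396.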
23. Post-processing
- The previous steps align one source word with one target word.
- But in Hindi, for compound verbs (and complex predicates), the verb in English is aligned to all the words which are part of the compound verb in Hindi.
- (Figure: alignment example glossed "I Shyam book lose give")
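This extension step can be sketched as follows, where `compound_groups` is an assumed input listing which target positions form one compound verb (the slide does not specify how these groups are obtained):

```python
def extend_to_compound(alignment, compound_groups):
    # alignment: {source_word: target_index}, one link per source word.
    # compound_groups: list of sets of target indices, each forming one
    # compound verb in the target sentence.
    extended = {s: {t} for s, t in alignment.items()}
    for s, t in alignment.items():
        for group in compound_groups:
            if t in group:
                extended[s] = set(group)  # infer links to the whole compound
    return extended
```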
24. Table of contents
- Motivation and Task description
- Word-Alignment algorithm
- Features
- Results
- Conclusion and Future work
25. Features - Local
- DiceWords (Taskar et al., 2005)
- The Dice coefficient of the source word and target word is defined as:
- DiceWords(s, t) = Count(s, t) / (Count(s) + Count(t))
- where s is the source word and t is the target word.
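A direct sketch of the feature as defined on the slide (note that the textbook Dice coefficient carries a factor of 2 in the numerator; the slide's formula does not):

```python
def dice_words(count_s, count_t, count_st):
    # DiceWords(s, t) = Count(s, t) / (Count(s) + Count(t)), per the slide.
    if count_s + count_t == 0:
        return 0.0
    return count_st / (count_s + count_t)
```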
26. Features - Local
- DiceRoots: Dice coefficient of the lemmatized forms of s and t.
- Dict: whether there exists a dictionary entry from source word s to target word t.
- Null: whether the source word is aligned with nothing in the target language.
27. Features - Global
- AvgDist: average distance between words in the target language which are aligned to verbs in the source language. AvgDist is normalized by the number of target-sentence words.
- Overlap: stores the count of pairs of verbs in the source-language sentence which align with the same word in the target-language sentence.
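A sketch of the two global features, assuming the alignment has already been reduced to the list of target positions aligned to source verbs (the exact pairwise averaging is my reading of "average distance"):

```python
from itertools import combinations

def avg_dist(verb_target_positions, target_len):
    # AvgDist: average pairwise distance between target positions aligned
    # to source verbs, normalized by the target sentence length.
    pairs = list(combinations(verb_target_positions, 2))
    if not pairs or target_len == 0:
        return 0.0
    return sum(abs(i - j) for i, j in pairs) / (len(pairs) * target_len)

def overlap(verb_target_positions):
    # Overlap: number of pairs of source verbs aligned to the same target word.
    return sum(1 for i, j in combinations(verb_target_positions, 2) if i == j)
```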
28. Features - MWE information
- MergePos
- Determines the likelihood of a dependent (with a given POS tag) aligning with the same word in the target language as the word to which its verb is aligned.
- Example: in the figure, the feature merge_RP is active.
- (Figure: alignment example glossed "He ran went")
29. Features - MWE information
- MergeMI
- Associates point-wise mutual information with the cases where the dependent has the same alignment in the target language as its verb.
- The mutual information (MI) is classified into three groups according to its absolute value:
- LOW if in the range 0-2
- MED if in the range 3-6
- HIGH if above 6
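The bucketing above is a simple threshold function; a sketch using the slide's thresholds:

```python
def mi_bucket(mi):
    # Classify point-wise mutual information by absolute value.
    v = abs(mi)
    if v <= 2:
        return "LOW"
    if v <= 6:
        return "MED"
    return "HIGH"
```

The bucket name then combines with the dependent's POS tag to form features such as merge_RP_HIGH.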
30. Features - MWE information
- The feature merge_RP_HIGH is active in the following figure.
- (Figure: alignment example glossed "He ran went")
31. Online large margin learning
- For parameter optimization, we used an online large-margin algorithm called MIRA (McDonald et al., 2005).
- Let the number of sentence pairs be m. The source sentences are dependency parsed (Shen et al., 2006).
- Let âq be the gold alignment for the qth sentence.
32. Online large margin training
- The generic large-margin algorithm we use can be defined as:

  Initialize W0, W, i = 0
  For p = 1 to NIterations
      For q = 1 to m
          Get the K best predictions (a1, a2, ..., aK) for the qth training example using the current model Wi
          Compute Wi+1 by updating Wi based on the predictions, the sentence and the gold alignment
          i = i + 1
          W = W + Wi+1
  W = W / (NIterations · m)
33. Online large margin training
- The updated weight vector Wi+1 is computed such that:
- there is a minimum change from Wi,
- and the score of the gold alignment exceeds the score of each prediction by a margin equal to the number of mistakes in that prediction.
- This can be stated as the following optimization problem:
- Minimize ||Wi+1 - Wi|| such that
  Score(âq) - Score(aq) > Mistakes(âq, aq) for each prediction aq
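For intuition, here is the closed-form MIRA update for the simplified single-prediction case (the paper's version enforces the constraint over all K predictions simultaneously): move W minimally along the feature difference so the margin constraint holds.

```python
def mira_update(w, f_gold, f_pred, mistakes):
    # Single-constraint MIRA step: find the smallest change to w such that
    # score(gold) - score(pred) >= mistakes.
    diff = [g - p for g, p in zip(f_gold, f_pred)]      # f(gold) - f(pred)
    margin = sum(wi * d for wi, d in zip(w, diff))      # current score gap
    norm_sq = sum(d * d for d in diff)
    if norm_sq == 0:
        return list(w)                                  # nothing to separate
    tau = max(0.0, (mistakes - margin) / norm_sq)       # closed-form step size
    return [wi + tau * d for wi, d in zip(w, diff)]
```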
34. Table of contents
- Motivation and Task description
- Word-Alignment algorithm
- Features
- Results
- Conclusion and Future work
35. Results - Dataset
- 400 word-aligned sentence pairs
- 294 sentence pairs for training
- 106 sentence pairs for testing
- Source sentences are dependency parsed.
- For training, we use simple heuristics to modify the training data such that for every word in the source sentence, only one corresponding word exists in the target language.
36. Experiments with GIZA++
- We evaluated our approach by comparing our results with the state-of-the-art GIZA++.
- GIZA++ was trained using an English-Hindi corpus of 50,000 sentence pairs.
37. Experiments with GIZA++
- We then lemmatize the words on both the source and target sides of the parallel corpus and run GIZA++ again.
38. Experiments with our model
- We trained our model on the training set of 294 sentence pairs.
- Beam size = 3, number of iterations = 3
- The following are the results when we add the local features, the global features (AvgDist, Overlap) and the GIZA++ probabilities.
39. Experiments with our model
- We now add the features representing properties of MWEs.
- We see that adding the MergeMI feature decreases the AER by about 3 percentage points.
40. Table of contents
- Motivation and Task description
- Word-Alignment algorithm
- Features
- Results
- Conclusion and Future work
41. Conclusion
- We have proposed a discriminative approach for using compositionality information about verb-based MWEs for the word-alignment task.
- We have shown that by adding information about MWEs (through point-wise MI in our work), we obtain a decrease in AER from 0.5279 to 0.4913.
42. Future work
- Conduct the experiments on a standard word-aligned dataset (English-French Europarl).
- Try better measures of compositionality and see how they affect word-alignment accuracy.
- Design a framework such that information about MWEs can be used in end-to-end MT.
43. Thank you for listening
- Questions and suggestions?