1
Using Information about Multi-word Expressions
for the Word-Alignment Task
  • Sriram Venkatapathy and Aravind K. Joshi

2
Goal
  • Show that information about multi-word
    expressions (MWEs), e.g., their compositionality,
    can be used effectively for tasks such as Machine
    Translation (MT).
  • Subtask: Word alignment.

3
Table of contents
  • Motivation and Task description
  • Alignment algorithm
  • Features
  • Results
  • Conclusion and Future work

4
Motivation
  • It has previously been suggested that information
    about MWEs is helpful for MT.
  • But this has not been proven empirically.
  • We need to explore this possibility.

5
Verb-based MWEs
  • The verb is the head.
  • Example: spilling the beans
  • Challenge for machine translation: the entire
    source expression is translated as a single verbal
    unit in the target language.
  • Example: The cycling event took place in
    Philadelphia
  • Lit. trans.: Philadelphia mein saikling ki
    pratiyogitaa jagaha li
  •   (Philadelphia in cycling event place take)

6
Verb-based MWEs
  • We identify the expressions headed by a verb
    using a dependency parser (Shen, 2006)
  • Example

7
Task: Alignment of verb-based MWEs
  • Align verb-based MWEs (source language) with
    words in the target-language sentence in a
    parallel corpus.

8
Fraction of non-compositional MWEs
  • In source language (400 sentence pairs)
  • Number of verb-dependent relations
    2209
  • Number of times verb and dependent
  • aligned with same word of target sentence
    193
  • (ex took place ? hui)
  • Percentage
    9 (significant!)

9
Table of contents
  • Motivation and Task description
  • Word-Alignment algorithm
  • Features
  • Results
  • Conclusion and Future work

10
Algorithm
  • Popular models for word alignment are the
    generative models (IBM models, GIZA).
  • But these models do not easily incorporate
    additional parameters which might be helpful for
    alignment.
  • Discriminative models determine the appropriate
    alignment based on the values of a set of
    parameters.

11
Algorithm (2)
  • The best alignment: â = argmax_a score(a | S, T)
  • Here, S is the source verb-based MWE and T is the
    target sentence.
  • score(a | S, T) = scoreLa(a) + scoreG(a)
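This decomposition can be sketched in code. A minimal, hypothetical version: the feature functions, weight dictionary, and candidate representation below are assumptions, with `score_local` and `score_global` standing in for scoreLa and scoreG.

```python
def score_local(a, weights, local_features):
    # scoreLa(a): sum of per-link local scores W . fL(s, t)
    return sum(
        sum(weights.get(f, 0.0) * v for f, v in local_features(s, t).items())
        for s, t in a)

def score_global(a, weights, global_features):
    # scoreG(a): W . fG(a) over features of the whole alignment
    return sum(weights.get(f, 0.0) * v
               for f, v in global_features(a).items())

def best_alignment(candidates, weights, local_features, global_features):
    # â = argmax_a [ scoreLa(a) + scoreG(a) ]
    return max(candidates,
               key=lambda a: score_local(a, weights, local_features)
                           + score_global(a, weights, global_features))
```

In practice the argmax cannot be taken over all candidates exhaustively, which motivates the beam search on the following slides.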

12
Algorithm (3)
  • Computational complexity of exhaustive search:
  • Total possibilities: N^(V+A)
  • where N = number of words in the target sentence,
  •       V = number of verbs in the source sentence,
  •       A = number of dependents in the source
    sentence.
  • Hence the need for an approximate beam-search
    algorithm.

13
Alignment algorithm (Beam search)
  • Three main steps:
  • Populate the beam:
  •   Use local features to determine the K-best
    alignments of verbs and dependents with words in
    the target sentence.
  • Re-order the beam:
  •   Re-order the above alignments using more
    complex features.
  • Post-processing:
  •   Extend alignments to include other links that
    can be inferred.

14
Populate the Beam
  • Obtain the K-best candidate alignments using
    local scores.
  • The local score is computed by looking at the
    features of each alignment link independently.
  • scoreL(s, t) = W · fL(s, t)
  • scoreLa(a) = Σ_(s,t)∈a scoreL(s, t)

15
Populate the Beam - 2
  • Task: Populate the beam in decreasing order of
    scoreLa(a).
  • Compute the local score of each source word (verb
    and dependents) with every target word.
  • Sort and store in local beams.
  • Example of local beams for the sentence pair:
  • The cycling event took place in Philadelphia
  • Philadelphia mein saikling ki pratiyogitaa hui
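Building the local beams can be sketched as below. The function names and the toy `local_score` are assumptions; the point is one independently sorted candidate list per source word, which yields the O((V + A) · N log N) cost stated on the next slide.

```python
def build_local_beams(source_words, target_words, local_score):
    # One "local beam" per source word (verb or dependent): every
    # target word scored independently, sorted best-first.
    # (V + A) sorts of N items each -> O((V + A) * N log N).
    beams = {}
    for s in source_words:
        candidates = sorted(((local_score(s, t), t) for t in target_words),
                            key=lambda c: -c[0])
        beams[s] = candidates
    return beams
```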

16
Example of local beams
[Figure: local beams for Took, Place, Philadelphia, Event]
Complexity: O((V + A) · N log N)
17
Example of local beams
[Figure: local beams for Took, Place, Philadelphia, Event]
Best alignment:
Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (score 357)
18
Populate the Beam - 2
  • Boundary: partitions the local beams into two
    sets of links (explored and unexplored).
  • The aim is to modify the boundary until the
    global beam has K (beam size) entries.
  • The next link to be included in the boundary is
    the link with the least difference in score from
    the top score of its local beam (greedy fashion).
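One concrete way to realize this step is a lazy k-best combination over the sorted local beams, sketched below. This is a standard heap-based stand-in for the greedy boundary expansion described above, not the paper's exact procedure; the list layout (one sorted `(score, target)` list per source word) is an assumption.

```python
import heapq

def k_best_alignments(local_beams, k):
    # local_beams: one best-first-sorted list of (score, target) per
    # source word. Returns the k highest-scoring full alignments
    # (one link per source word) with their total local scores.
    start = tuple(0 for _ in local_beams)          # top of every beam
    def total(idx):
        return sum(beam[i][0] for beam, i in zip(local_beams, idx))
    heap = [(-total(start), start)]                # max-heap via negation
    seen = {start}
    results = []
    while heap and len(results) < k:
        neg, idx = heapq.heappop(heap)
        links = tuple(beam[i][1] for beam, i in zip(local_beams, idx))
        results.append((-neg, links))
        # Successors: step one position deeper in exactly one beam.
        for pos in range(len(idx)):
            if idx[pos] + 1 < len(local_beams[pos]):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-total(nxt), nxt))
    return results
```

Because each beam is sorted, the best unexplored successor always differs from an already-popped alignment in a single link, matching the greedy intuition on the slide.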

19
Populate the Beam - 3
[Figure: local beams for Took, Place, Philadelphia, Event]
Global beam:
Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (score 357)
Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (score 352)
20
Populate the Beam - 4
[Figure: local beams for Took, Place, Philadelphia, Event]
Global beam:
Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (score 357)
Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (score 352)
Event → Pratiyogitaa, Took → ki, Place → Philadelphia, Philadelphia → Philadelphia (score 351)
Event → Pratiyogitaa, Took → ki, Place → Hui, Philadelphia → Philadelphia (score 346)
21
Re-order the Beam
  • Use global scores to re-order the beam.
  • scoreG(a) = W · fG(a)
  • Overall score = scoreLa(a) + scoreG(a)
  • Global features look at properties of the entire
    alignment configuration instead of the alignment
    links locally.
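The re-ordering step then amounts to re-ranking the beam entries by local score plus global score, as in the hypothetical sketch below (the `(local_score, alignment)` beam layout is an assumption).

```python
def reorder_beam(beam, score_global):
    # beam: list of (local_score, alignment) from the populate step.
    # Re-rank by the overall score scoreLa(a) + scoreG(a); the global
    # score sees the whole alignment configuration at once.
    rescored = [(local + score_global(a), local, a) for local, a in beam]
    rescored.sort(key=lambda x: -x[0])     # best overall score first
    return rescored
```

With the numbers on the next slide, an entry with local score 352 and global score 51 (total 403) overtakes one with local score 357 and global score 39 (total 396).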

22
Re-order the Beam - 2
Beam size K = 5
Event → Pratiyogitaa, Took → Hui, Place → Hui, Philadelphia → Philadelphia (352 + 51 = 403)
Event → Pratiyogitaa, Took → Hui, Place → Philadelphia, Philadelphia → Philadelphia (357 + 39 = 396)
Event → Pratiyogitaa, Took → ki, Place → Philadelphia, Philadelphia → Philadelphia (351 + 36 = 387)
Event → Pratiyogitaa, Took → ki, Place → Hui, Philadelphia → Philadelphia (346 + 40 = 386)
Event → Pratiyogitaa, Took → Hui, Place → saikling, Philadelphia → Philadelphia (346 + 30 = 376)
23
Post-processing
  • The previous steps align one source word with
    one target word.
  • But in Hindi, for compound verbs (and complex
    predicates), the verb in English should be aligned
    to all the words that are part of the compound
    verb in Hindi.

[Figure gloss: I Shyam book lose give]
24
Table of contents
  • Motivation and Task description
  • Word-Alignment algorithm
  • Features
  • Results
  • Conclusion and Future work

25
Features - Local
  • DiceWords (Taskar et al., 2005)
  • The Dice coefficient of the source word s and
    target word t is defined as

  • DiceWords(s, t) = 2 · Count(s, t) / (Count(s) + Count(t))

  • where s is the source word and t is the target
    word.
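As a sketch, the feature reduces to one arithmetic step over corpus counts (the factor of 2 is the standard Dice definition; the exact counting scheme used in the paper is not spelled out on the slide):

```python
def dice_words(count_s, count_t, count_st):
    # DiceWords(s, t) = 2 * Count(s, t) / (Count(s) + Count(t))
    # count_st: co-occurrence count of s and t in the parallel corpus.
    if count_s + count_t == 0:
        return 0.0
    return 2.0 * count_st / (count_s + count_t)
```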

26
Features - Local
  • DiceRoots: the Dice coefficient of the
    lemmatized forms of s and t.
  • Dict: whether there exists a dictionary entry
    from source word s to target word t.
  • Null: whether the source word is aligned with
    nothing in the target language.

27
Features - Global
  • AvgDist: the average distance between the words
    in the target language which are aligned to verbs
    in the source language. AvgDist is normalized by
    the number of target-sentence words.
  • Overlap: the count of pairs of verbs in the
    source-language sentence which align with the
    same word in the target-language sentence.
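A hypothetical sketch of the two global features; reading AvgDist as the average pairwise distance between the aligned target positions is an assumption, since the slide does not fix the exact definition:

```python
def avg_dist(positions, sentence_length):
    # AvgDist: average pairwise distance between target-word positions
    # aligned to source verbs, normalized by target sentence length.
    pairs = [abs(i - j)
             for k, i in enumerate(positions) for j in positions[k + 1:]]
    if not pairs or sentence_length == 0:
        return 0.0
    return (sum(pairs) / len(pairs)) / sentence_length

def overlap(verb_alignments):
    # Overlap: number of pairs of source verbs aligned to the same
    # target word.
    count = 0
    for k, t1 in enumerate(verb_alignments):
        for t2 in verb_alignments[k + 1:]:
            if t1 == t2 and t1 is not None:
                count += 1
    return count
```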

28
Features - MWE information
  • MergePos
  • Determines the likelihood of a dependent (with a
    POS tag) to align with the same word in the
    target language as the word to which its verb is
    aligned.
  • Example: In the figure, the feature merge_RP
    will be active.

[Figure gloss: He ran went]
29
Features - MWE information
  • MergeMI
  • Associates Point-wise Mutual Information with the
    cases where the dependents have the same
    alignment in the target language as their verb.
  • The mutual information (MI) is classified into
    three groups according to its absolute value:
  • LOW if in the range 0–2
  • MED if in the range 3–6
  • HIGH if above 6
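The bucketing and the resulting feature name can be sketched as below. The log base, the treatment of the gap between the stated ranges (here a single threshold at 2), and the feature-name format are assumptions.

```python
import math

def pmi(p_joint, p_x, p_y):
    # Point-wise mutual information (log base 2 assumed).
    return math.log2(p_joint / (p_x * p_y))

def mi_bucket(mi):
    # Bucket |MI| as on the slide: LOW up to 2, MED up to 6, else HIGH.
    a = abs(mi)
    if a <= 2:
        return "LOW"
    if a <= 6:
        return "MED"
    return "HIGH"

def merge_feature(pos_tag, mi):
    # e.g. merge_RP_HIGH for a particle merging with its verb at
    # high PMI, as in the figure on the next slide.
    return "merge_%s_%s" % (pos_tag, mi_bucket(mi))
```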

30
Features - MWE information
  • The feature merge_RP_HIGH is active in the
    following figure.

[Figure gloss: He ran went]
31
Online large margin learning
  • For parameter optimization, we used an online
    large-margin algorithm called MIRA (McDonald et
    al., 2005).
  • Let the number of sentence pairs be m. The source
    sentences are dependency parsed (Shen et al.,
    2006).
  • Let âq be the gold alignment for the q-th
    sentence.

32
Online large margin training
  • The generic large-margin algorithm we use can be
    defined as:
  • Initialize W0 = 0, W = 0, i = 0
  • For p = 1 to NIterations
  •   For q = 1 to m
  •     Get the K best predictions (a1, a2, ..., aK)
        for the q-th training example using the
        current model Wi
  •     Compute Wi+1 by updating Wi based on the
        predictions, the sentence and the gold
  •     i = i + 1
  •     W = W + Wi+1
  • W = W / (NIterations · m)
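The loop above can be sketched as follows. The `k_best` and `update` callbacks are hypothetical placeholders: in the paper `update` is MIRA's constrained optimization (next slide), but any margin-style update fits this skeleton, which mainly shows the weight averaging.

```python
def train(examples, n_iterations, k_best, update, dim):
    # Generic online large-margin training loop.
    # k_best(w, x): K best predicted alignments under weights w.
    # update(w, preds, x, gold): returns the new weight vector W_{i+1}.
    w = [0.0] * dim          # current weights W_i
    w_sum = [0.0] * dim      # running sum of all W_{i+1} for averaging
    for _ in range(n_iterations):
        for x, gold in examples:
            preds = k_best(w, x)
            w = update(w, preds, x, gold)
            w_sum = [a + b for a, b in zip(w_sum, w)]
    n = n_iterations * len(examples)     # NIterations * m
    return [a / n for a in w_sum]        # averaged weights W
```

Averaging the weight vectors over all updates (rather than keeping only the final one) is the usual way to reduce overfitting in online large-margin training.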

33
Online large margin training
  • The updated weight vector Wi+1 is computed such
    that:
  • there is a minimum change in Wi,
  • and the score of the gold alignment exceeds the
    score of each prediction by a margin equal to the
    number of mistakes in that prediction.
  • This can be stated as the following optimization
    problem:
  • Minimize ||Wi+1 - Wi|| such that
  •   Score(âq) - Score(a) ≥ Mistakes(âq, a)
      for each prediction a

34
Table of contents
  • Motivation and Task description
  • Word-Alignment algorithm
  • Features
  • Results
  • Conclusion and Future work

35
Results - Dataset
  • 400 word-aligned sentence pairs
  • 294 sentence pairs for training
  • 106 sentence pairs for testing
  • Source sentences are dependency parsed.
  • For training, we use simple heuristics to modify
    the training data such that for every word in the
    source sentence, only one corresponding word
    exists in the target language.

36
Experiments with Giza
  • We evaluated our approach by comparing our
    results with those of the state-of-the-art Giza.
  • Giza was trained using an English-Hindi corpus of
    50,000 sentence pairs.

37
Experiments with Giza
  • We then lemmatized the words on both the source
    and target sides of the parallel corpus and ran
    Giza again.

38
Experiments with our model
  • We trained our model on the training set of 294
    sentence pairs.
  • Beam size = 3, number of iterations = 3
  • The following are the results when we add the
    local features, the global features (AvgDist,
    Overlap) and the Giza probabilities.

39
Experiments with our model
  • We now add the features representing the
    properties of MWEs.
  • We see that by adding the MergeMI feature, the
    AER decreased by 3 percentage points.

40
Table of contents
  • Motivation and Task description
  • Word-Alignment algorithm
  • Features
  • Results
  • Conclusion and Future work

41
Conclusion
  • We have proposed a discriminative approach for
    using the compositionality information about
    verb-based MWEs for the word-alignment task.
  • We have shown that by adding information about
    MWEs (through point-wise MI in our paper), we
    obtain a decrease in AER from 0.5279 to 0.4913.

42
Future work
  • Conduct the experiments on a standard
    word-aligned dataset (the English-French Europarl
    dataset).
  • Try better measures of compositionality and see
    how they affect word-alignment accuracy.
  • Design a framework such that information about
    MWEs can be used in an end-to-end MT system.

43
  • THANK YOU FOR LISTENING
  • Questions and suggestions?