UIMA-based MEMT: Multi-Engine Machine Translation

Provided by: AlonL

1
UIMA-based MEMT: Multi-Engine Machine Translation
  • Faculty
    • Alon Lavie, Jaime Carbonell, Eric Nyberg
  • Students and Staff
    • Gregory Hanneman, Justin Merrill
    • (Shyamsundar Jayaraman, Satanjeev Banerjee)

2
MEMT Goals and Approach
  • Combine the output of multiple MT engines into a
    synthetic output that outperforms the originals
    in translation quality
  • Synthetic combination of the originals, NOT
    selecting the best system

3
Synthetic Translation MEMT
  • Approach
    • Start with the output sentences of the various MT
      engines
    • Explicitly align the words that are common
      between any pair of systems, and apply
      transitivity
    • Use the alignments as reinforcement and as
      indicators of possible locations for the words
    • Each engine has a weight that is used for the
      words that it contributes
    • The decoder searches for the combination of
      words and phrases that optimizes a scoring
      function combining the alignment weights with a
      language model (LM) score
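The combination approach above can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation: alignment is reduced to greedy one-to-one exact matching between each pair of outputs, transitivity is applied with union-find, a word's support is the sum of the weights of the distinct engines in its alignment cluster, and a candidate's score averages that support over its words and adds a weighted LM log-probability. The averaging, `lm_weight`, the per-engine weights, and the `flat_lm` placeholder are assumptions; the decoder's search over candidates is not shown.

```python
from collections import defaultdict
from itertools import combinations


def align_outputs(outputs):
    """Greedy one-to-one exact-match alignment between every pair of system
    outputs, closed transitively with union-find (a simplifying assumption,
    not the original aligner). Tokens are (system_index, position) pairs;
    returns a map from each token to its alignment-cluster id."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for i, sentence in enumerate(outputs):
        for j in range(len(sentence)):
            find((i, j))  # register every token, aligned or not
    for (i, a), (k, b) in combinations(enumerate(outputs), 2):
        used = set()  # positions in b already aligned to a word of a
        for j, word in enumerate(a):
            for m, other in enumerate(b):
                if other == word and m not in used:
                    used.add(m)
                    union((i, j), (k, m))  # shared cluster gives transitivity
                    break
    return {token: find(token) for token in parent}


def cluster_support(clusters, weights):
    """Sum the weights of the distinct engines contributing a word to each
    cluster, so words proposed by several engines are reinforced."""
    engines = defaultdict(set)
    for (system_index, _), root in clusters.items():
        engines[root].add(system_index)
    return {root: sum(weights[s] for s in members) for root, members in engines.items()}


def score(hypothesis, outputs, clusters, support, lm_logprob, lm_weight=0.5):
    """Score a candidate built from (system_index, position) tokens:
    length-normalised alignment support plus a weighted LM log-probability
    (both the normalisation and lm_weight are assumptions)."""
    if not hypothesis:
        raise ValueError("empty hypothesis")
    words = [outputs[s][p] for s, p in hypothesis]
    alignment = sum(support[clusters[tok]] for tok in hypothesis) / len(hypothesis)
    return alignment + lm_weight * lm_logprob(words)


def flat_lm(words):
    """Placeholder LM: scores every word sequence equally."""
    return 0.0


if __name__ == "__main__":
    outputs = [["the", "cat", "sat"], ["the", "cat", "sits"], ["a", "cat", "sat"]]
    weights = [1.0, 0.75, 0.5]  # hypothetical per-engine weights
    clusters = align_outputs(outputs)
    support = cluster_support(clusters, weights)
    sat = score([(0, 0), (0, 1), (0, 2)], outputs, clusters, support, flat_lm)
    sits = score([(0, 0), (1, 1), (1, 2)], outputs, clusters, support, flat_lm)
    print(sat > sits)  # True: "sat" is backed by two engines, "sits" by one
```

With a neutral LM, "the cat sat" outscores "the cat sits" purely through alignment reinforcement; a real LM would then trade this off against fluency.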

4
Architecture and Engineering
  • Challenge: How do we construct an effective
    architecture for running MEMT within large-scale
    distributed projects?
  • Example: the GALE project
    • Multiple MT engines running at different
      locations
    • Input may be text or the output of speech
      recognizers; output may go downstream to other
      applications (IE, summarization, TDT, etc.)
  • Approach: use IBM's UIMA (Unstructured
    Information Management Architecture)
    • Provides support for building robust processing
      workflows with heterogeneous components
    • Components can act as annotators at the
      character level within documents

5
UIMA-based MEMT
  • The MT engines and the MEMT engine are set up as
    distributed servers
    • Communication over socket connections
    • Sentence-by-sentence translation
  • Java wrappers convert these servers into UIMA-style
    annotator components
  • UIMA-based workflows implement a variety of
    independent tasks, with results stored in a
    common Annotations Database (ADB)
    • Translation workflows
    • MEMT workflow
    • Evaluation/scoring workflow
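The socket-based, sentence-by-sentence setup above can be sketched from the client side. This is a hypothetical Python illustration only: the actual wrappers were written in Java, and the newline-delimited UTF-8 protocol and the `EngineClient` class are assumptions, not the system's documented interface.

```python
import socket


class EngineClient:
    """Hypothetical client for a remote MT engine server. Assumes a
    line-based protocol: one UTF-8 source sentence per line goes out,
    one translated line comes back."""

    def __init__(self, host, port, timeout=30.0):
        self.sock = socket.create_connection((host, port), timeout=timeout)
        self.reader = self.sock.makefile("r", encoding="utf-8", newline="\n")

    def translate(self, sentence):
        # A newline inside the sentence would break the framing, so flatten it.
        line = sentence.replace("\n", " ") + "\n"
        self.sock.sendall(line.encode("utf-8"))
        reply = self.reader.readline()
        if not reply:
            raise ConnectionError("engine closed the connection")
        return reply.rstrip("\n")

    def translate_document(self, sentences):
        # Sentence-by-sentence: one request/response round trip per sentence.
        return [self.translate(s) for s in sentences]

    def close(self):
        self.reader.close()
        self.sock.close()
```

A UIMA-style annotator wrapper would call something like `translate` once per sentence span; one round trip per sentence keeps the protocol simple at the cost of extra latency.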

6
UIMA-based MEMT Examples
  • Translation Workflow
    • Retrieve document from ADB
    • Annotate document with translation annotator X
    • Write back new annotation into ADB
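The three-step translation workflow above can be sketched in Python. Everything here is a hypothetical stand-in: the ADB is an in-memory class, sentences are split one per line, and `translate` is any callable (for example, a client that sends each sentence to a remote engine over a socket). Each annotation records the character span it covers, in the spirit of UIMA's begin/end character-offset annotations.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    label: str  # producing annotator, e.g. "X"
    begin: int  # character offsets of the source span covered
    end: int
    value: str  # translation of text[begin:end]


class AnnotationsDB:
    """Hypothetical in-memory stand-in for the Annotations Database (ADB)."""

    def __init__(self):
        self.texts = {}
        self.annotations = {}

    def add_document(self, doc_id, text):
        self.texts[doc_id] = text
        self.annotations[doc_id] = []

    def read(self, doc_id):
        return self.texts[doc_id]

    def write(self, doc_id, new_annotations):
        self.annotations[doc_id].extend(new_annotations)


def sentence_spans(text):
    """Naive segmentation (an assumption): one sentence per non-empty line,
    returned as (begin, end) character offsets."""
    spans, pos = [], 0
    for line in text.split("\n"):
        if line.strip():
            spans.append((pos, pos + len(line)))
        pos += len(line) + 1  # step past the newline
    return spans


def translation_workflow(adb, doc_id, label, translate):
    text = adb.read(doc_id)  # 1. retrieve the document from the ADB
    new = [Annotation(label, b, e, translate(text[b:e]))  # 2. annotate each sentence
           for b, e in sentence_spans(text)]
    adb.write(doc_id, new)  # 3. write the new annotations back
    return new


if __name__ == "__main__":
    adb = AnnotationsDB()
    adb.add_document("d1", "hola mundo\nadios")
    # str.upper stands in for a real engine's translate function.
    for ann in translation_workflow(adb, "d1", "X", str.upper):
        print(ann)
```

Keeping the engine behind a plain `translate` callable lets the same workflow run annotator X, Y, or Z without change.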

7
UIMA-based MEMT Examples
  • MEMT Workflow
    • Retrieve document translation annotations labeled
      by X, Y, Z from ADB
    • Annotate the document with a new MEMT
      annotation
    • Write back MEMT annotation into ADB
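The MEMT workflow can be sketched the same way. In this hypothetical Python version the ADB is a plain dict of `Annotation` tuples, annotations from different engines are matched by identical character spans (which assumes all engines translated the same sentence segmentation), spans missing any requested engine are skipped, and `combine` stands in for the actual MEMT engine, such as the alignment-and-decoding combination described on slide 3.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    label: str  # producing annotator: "X", "Y", "Z" or "MEMT"
    begin: int  # character offsets of the source span covered
    end: int
    value: str  # translation text for that span


def memt_workflow(adb, doc_id, labels, combine, out_label="MEMT"):
    """adb: dict mapping doc_id -> list of Annotation (hypothetical ADB stand-in).
    combine: callable taking one hypothesis per engine, in `labels` order."""
    # 1. Retrieve the translation annotations produced by the requested
    #    engines, grouped by the source span they cover.
    by_span = defaultdict(dict)
    for ann in adb[doc_id]:
        if ann.label in labels:
            by_span[(ann.begin, ann.end)][ann.label] = ann.value
    # 2. Annotate each span that every engine translated with a combined output.
    new = [Annotation(out_label, b, e, combine([hyps[name] for name in labels]))
           for (b, e), hyps in sorted(by_span.items())
           if all(name in hyps for name in labels)]
    # 3. Write the MEMT annotations back into the ADB.
    adb[doc_id].extend(new)
    return new


if __name__ == "__main__":
    adb = {"d1": [Annotation("X", 0, 5, "a b"), Annotation("Y", 0, 5, "a c"),
                  Annotation("Z", 0, 5, "a b")]}
    # " / ".join is only a placeholder showing what the combiner receives.
    print(memt_workflow(adb, "d1", ["X", "Y", "Z"], " / ".join))
```

Skipping incomplete spans is one design choice; a real system might instead combine whatever subset of engines is available.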

8
Conclusions and Open Research Issues
  • New sentence-level MEMT approach with promising
    performance (details?)
  • Easy to run on both research and COTS systems
  • UIMA-based architecture design for effective
    integration in large distributed systems/projects
    such as GALE
  • Main Open Research Issues
    • Improvements to the underlying algorithm: better
      word alignments, artificial word alignments
    • Confidence scores at the sentence or word level
    • Decoding is still suboptimal
    • Oracle scores show there is much room for
      improvement
    • Need for additional discriminant features
    • Extend the approach to multi-engine speech
      recognition (SR) combination
    • Engineering issues: synchronization and
      human-friendly interfaces to workflows