New Directions in Machine Translation: Introduction

Provided by: mathi



Why MT Matters
  • Economics
    • Costs? / Quality? / Turnaround?
    • Many MT developers, customers, and sponsors have
      been investing heavily for years.
  • Politics
    • Multi-lingual Countries / Minority Languages
  • Intelligence Gathering
    • Governments / Companies / Individuals
  • Research
    • AI / CS / Linguistics / Psychology / and so on

Recent Trends
  • PC-based MT Systems
  • Online MT Services, MT on Demand
    • Email, Web pages, Uploads
  • Sub-language MT Systems
  • Dialog-based (Speech-to-Speech) MT Systems
  • Computer-Assisted Translation

Classifying MT Systems
  • Operations
    • Fully-Automatic MT
    • Semi-automatic MT
    • Computer-Assisted Translation (CAT tools)
  • Input
    • Unrestricted Texts
    • Restricted Texts (e.g. Technical Manuals) / MT in
      Sub-languages / Controlled Languages
  • Quality
    • High / Low / Acceptable / Applicable / Readable
    • How to evaluate an MT system?
  • Strategies (see next page)

MT Strategies
  • Fundamentals
    • Direct Translation MT
    • Transfer-based MT
    • Interlingua MT
  • Linguists vs. Empiricists
  • New Strategies
    • Knowledge-based MT
    • Example-based MT
    • Statistics-based MT
    • Hybrid MT
  • "Japanese manufacturers know well that a single
    linguistic theory cannot lead to a good MT
    system. They realize that a huge amount of
    language phenomena must be processed in an ad-hoc
    manner." (M. Nagao)

Direct MT

Transfer-based MT
  • SL-TL lexicon, transfer rules
  • SL: source language; TL: target language

Interlingua-based MT
  • Source Text (ST) → Interlingua representation (SL-TL lexicon) → Target Text (TT)

Knowledge-based MT
  • All world knowledge? A long-term research goal
  • Practical systems, e.g. CMU's KANT
    • narrow domain
    • domain model defines all semantic classes and
      instances to represent all concepts in the domain
    • each concept definition includes
      • concept head (name of the concept)
      • slots: allowable semantic roles
      • fillers: allowable concept classes that the
        roles can contain
    • disambiguation by filler restriction
  • knowledge acquisition
    • automatic or semi-automatic
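
The concept-definition structure above (head, slots, fillers) and disambiguation by filler restriction can be illustrated with a minimal Python sketch. This is not KANT's actual representation: the domain, the concept names, the `parent` link, and the `disambiguate` helper are all hypothetical and chosen only to show the idea that a slot's allowed filler classes can rule out word senses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    head: str                          # concept head (name of the concept)
    parent: Optional[str] = None       # hypothetical is-a link to a semantic class
    # slots: semantic role -> fillers (allowable concept classes for that role)
    slots: Dict[str, Set[str]] = field(default_factory=dict)


# Hypothetical narrow-domain model (illustrative only).
DOMAIN = {c.head: c for c in [
    Concept("OBJECT"),
    Concept("MACHINE", parent="OBJECT"),
    Concept("ANIMAL", parent="OBJECT"),
    Concept("HUMAN", parent="ANIMAL"),
    Concept("CRANE-MACHINE", parent="MACHINE"),
    Concept("CRANE-BIRD", parent="ANIMAL"),
    Concept("REPAIR", slots={"agent": {"HUMAN"}, "theme": {"MACHINE"}}),
    Concept("FEED", slots={"agent": {"HUMAN"}, "theme": {"ANIMAL"}}),
]}

# Hypothetical lexicon: surface word -> candidate concepts (word senses).
LEXICON = {"crane": ["CRANE-MACHINE", "CRANE-BIRD"]}


def is_a(concept: Optional[str], cls: str) -> bool:
    """Walk up the parent links to test class membership."""
    while concept is not None:
        if concept == cls:
            return True
        concept = DOMAIN[concept].parent
    return False


def disambiguate(event: str, role: str, word: str) -> List[str]:
    """Keep only the senses of `word` that the event's slot accepts as fillers."""
    allowed = DOMAIN[event].slots[role]
    return [sense for sense in LEXICON[word]
            if any(is_a(sense, cls) for cls in allowed)]


print(disambiguate("REPAIR", "theme", "crane"))
print(disambiguate("FEED", "theme", "crane"))
```

In this toy domain, "repair the crane" keeps only the machine sense because the `theme` slot of `REPAIR` accepts only `MACHINE` fillers, while "feed the crane" keeps only the bird sense.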

Example-based MT
  • A companion module to improve MT quality
  • Typically includes the following (Nirenburg 1995)
    • sentence-aligned corpus
    • intra-language matching
      • find chunks from the source language part of
        the corpus which are the best candidates for
        matching an input chunk
    • inter-language matching
      • find the target language chunk corresponding
        to the chunk from the source language part of
        the corpus
    • chunk combination
  • The PANGLOSS Mark III Machine Translation System.
    S. Nirenburg, Technical Report CMU-CMT-95-145.
    1995. (available online at http//www.lti.cs.cmu.e
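
The three example-based steps listed on this slide (intra-language matching, inter-language matching, chunk combination) can be sketched in a few lines of Python. This is a toy illustration, not the PANGLOSS implementation: it assumes the example base is already aligned at chunk level and that the input arrives pre-segmented into chunks, and the example pairs, the similarity measure, and the `threshold` parameter are all hypothetical.

```python
import difflib

# Hypothetical chunk-aligned example base (English -> French), illustrative only.
EXAMPLES = [
    ("the printer", "l'imprimante"),
    ("is out of paper", "n'a plus de papier"),
    ("the scanner", "le scanner"),
    ("is new", "est neuf"),
]


def intra_language_match(chunk, examples):
    """Find the source-side example chunk most similar to the input chunk."""
    scores = [
        difflib.SequenceMatcher(None, chunk.split(), src.split()).ratio()
        for src, _ in examples
    ]
    best = max(range(len(examples)), key=scores.__getitem__)
    return best, scores[best]


def inter_language_match(index, examples):
    """Return the target-language chunk aligned with the matched source chunk."""
    return examples[index][1]


def translate(chunks, examples, threshold=0.6):
    """Chunk combination: naive left-to-right concatenation of matched chunks.

    Chunks with no sufficiently similar example are passed through in <...>.
    """
    output = []
    for chunk in chunks:
        index, score = intra_language_match(chunk, examples)
        if score >= threshold:
            output.append(inter_language_match(index, examples))
        else:
            output.append("<" + chunk + ">")
    return " ".join(output)


print(translate(["the scanner", "is out of paper"], EXAMPLES))
```

Real systems need sub-sentential alignment and a smarter combination step (word order, agreement); the concatenation here only marks where that step would sit.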

Statistics-based MT (1)
  • Maximize Pr(S|T) = Pr(S) Pr(T|S) / Pr(T)
  • Pr(S): source language model
  • Pr(T|S): translation model
  • lexical translation, distortion, and fertility
  • Some comments (Machine Translation 7(4))
  • "I joined the attack without realizing that
    precisely what the research was doing was to
    question some of the fundamental assumptions
    underlying MT research since 1966 ... With
    hindsight, I can see that what this research was
    doing was saying that in the 20 years since
    ALPAC, the second generation architecture had led
    to only slightly better results than the
    architecture it replaced" (Harold Somers)
  • My initial reaction was the same as Somers. The
    integration of a CANDIDE-type engine into a
    traditional MT architecture should probably at
    the deepest level the architecture allows (John

Statistics-based MT (2)
  • Machine Translation 7(4)
  • "... not only does it need no linguistics or
    linguists, but no foreign speakers either. ...
    about 43% of sentences correctly translated. That
    compares badly with SYSTRAN which is usually
    assigned figures of around 65% ... even if it did
    equal SYSTRAN's level of performance, it is not
    clear what inferences we should draw. ... we must
    always remember that they need millions of words
    of parallel texts even to start ... The problems
    noted then were of long-distance dependencies ...
    French and English were a lucky choice ... we
    have good historical reasons for believing that a
    purely statistical method cannot do high-quality
    MT" (Yorick Wilks)
  • Word alignment
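
The maximization on the Statistics-based MT slides can be made concrete with a small sketch. It reads S as the candidate sentence scored by the language model and T as the observed text, which is how the slide labels Pr(S) and Pr(T|S). Because Pr(T) is the same for every candidate, choosing the S that maximizes Pr(S|T) is the same as maximizing Pr(S) Pr(T|S). The probability tables below are invented toy numbers, not estimates from any corpus, and `decode` is a hypothetical helper that searches an explicit candidate list rather than the full sentence space.

```python
import math

OBSERVED = "la maison est petite"

# Hypothetical language model Pr(S): prefers fluent word order.
LANGUAGE_MODEL = {
    "the house is small": 0.02,
    "the home is small": 0.01,
    "small the house is": 0.0001,
}

# Hypothetical translation model Pr(T|S) for the observed text T.
TRANSLATION_MODEL = {
    (OBSERVED, "the house is small"): 0.30,
    (OBSERVED, "the home is small"): 0.25,
    (OBSERVED, "small the house is"): 0.30,
}


def decode(observed, candidates, language_model, translation_model):
    """Return the candidate S maximizing log Pr(S) + log Pr(T|S).

    Pr(T) is constant across candidates, so it is dropped from the score.
    """
    def score(candidate):
        return (math.log(language_model[candidate])
                + math.log(translation_model[(observed, candidate)]))
    return max(candidates, key=score)


best = decode(OBSERVED, list(LANGUAGE_MODEL), LANGUAGE_MODEL, TRANSLATION_MODEL)
print(best)
```

Note how the scrambled candidate ties with the fluent one under the toy translation model and is rejected only by the language model; that division of labour is the point of factoring the problem this way.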

  • Traditional Evaluation Metrics (Church & Hovy)
    • System-based Metrics
      • easy to measure, but only for a particular system
      • e.g. 60 sub-grammars, 900 rewriting rules, ...
    • Text-based Metrics
      • sentence-based metrics
        • e.g. % of semantically or syntactically correct
          sentences
      • compressibility metrics
      • amount of post-editing metrics
    • Cost-based Metrics: cost, time (per N words)
  • Demos (must avoid misleading)
  • Developer's view or customer's view
Some MT Problems
  • Morphological ambiguity
  • Lexical ambiguity and structural ambiguity
  • Lexical mismatch and structural mismatch
  • Idioms and collocations
  • Ill-formed input
  • World knowledge

CAT Tools
  • Pre-editing and post-editing environments with
    linguistic analyses
  • Translation Memory
    • "As the translator translates the text, each
      sentence (translation unit) is also saved
      automatically to a sophisticated translation unit
      database memory. As he translates, any similar
      sentence already in the memory will appear on
      screen for editing." (Ian Gordon)
  • Alignment Tools
  • Terminology Management
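
The translation memory behaviour described in the quotation (store each translation unit, then surface similar stored sentences for editing) might look roughly like the sketch below. It is not how any particular CAT product works: the `TranslationMemory` class, the word-level similarity from Python's standard `difflib`, and the 0.7 threshold are all assumptions made for illustration.

```python
import difflib


class TranslationMemory:
    """Toy translation memory: stores units and retrieves fuzzy matches."""

    def __init__(self):
        self.units = []  # list of (source sentence, target sentence)

    def store(self, source, target):
        """Save a finished translation unit."""
        self.units.append((source, target))

    def lookup(self, sentence, threshold=0.7):
        """Return (score, source, target) for stored units similar to `sentence`,
        best match first. Similarity is a word-level ratio between 0 and 1."""
        words = sentence.lower().split()
        matches = []
        for source, target in self.units:
            score = difflib.SequenceMatcher(
                None, words, source.lower().split()
            ).ratio()
            if score >= threshold:
                matches.append((score, source, target))
        return sorted(matches, key=lambda match: match[0], reverse=True)


memory = TranslationMemory()
memory.store("Press the red button to start.",
             "Appuyez sur le bouton rouge pour commencer.")

for score, source, target in memory.lookup("Press the green button to start."):
    print(round(score, 2), source, "->", target)
```

The translator would then edit the retrieved target sentence (here, changing the colour) instead of translating from scratch.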

  • Exchange Standard
  • (Multilingual) Text Formats
  • Lexicons
  • Knowledge Bases
  • Translation Memories
  • Evaluation Standard

Future Direction
  • Exploratory Research or Prototype Research?
  • Modular Design (cf. Somers' comments)
  • Better Linguistic Theories
  • Lexicon Construction
  • Hybrid MT (Mainline MT engine Additional
  • Spoken Language (Dialog-based) MT
  • MT Evaluation
  • Computer-Assisted Translation / User-Friendly
  • Sub-language MT Systems
  • Distributed MT / Networked MT
  • MT on Demand

  • Journal of Machine Translation (Kluwer)
  • Proceedings of TMI, MT Summit, AMTA
  • Proceedings of ACL, COLING, ROCLING
  • E-Print Archive http//
  • AAMT http//
  • EAMT http//
  • The Association for Computational Linguistics http//
  • The LINGUIST List http//
  • Translation Research Group http//
  • Localization Industry Standards Association http//

  • ISI @ USC http//
  • CMU/LTI http//
  • Verbmobil http//
  • C-STAR II http//
  • GETA http//
  • Machine Translation at PAHO (ACG/T) http//
  • METEO http//
  • WordNet Bibliography http//

  • Globalink, Inc. http//
  • SYSTRAN http//
  • Logos Corporation http//
  • TRADOS http//
  • A.I.SOFT http//
  • CSK Home Page http//
  • http//
  • OKI Software http//
  • KODENSHA http//
  • ASTRANSAC http//