Chapter 21: Machine Translation - PowerPoint PPT Presentation

Slides: 55
Provided by: Inderje9
Transcript and Presenter's Notes

Title: Chapter 21: Machine Translation


1
Chapter 21 Machine Translation
  • Heshaam Faili
  • hfaili_at_ece.ut.ac.ir
  • University of Tehran

2
What is MT?
  • Machine Translation (MT) means translation using
    computers.
  • Machine-aided human translation (MAHT)
  • Human-aided machine translation (HAMT)
  • Fully automated machine translation (FAMT)
  • Fully human translation

3
Some definitions
  • Machine translation (MT) is the application of
    computers to the task of translating texts from
    one natural language to another. (EAMT)
  • Machine Translation (MT) as it is generally
    known --- the attempt to automate all, or part of
    the process of translating from one human
    language to another. (Arnold, D. J., Machine
    Translation: An Introductory Guide)
  • MT "presumably means going by algorithm from
    machine-readable source text to useful target
    text, without recourse to human translation or
    editing." (ALPAC report, 1966)

4
An Example Translation between Chinese and English
5
Different tasks with MT
  • Tasks for which a rough translation is adequate
  • Tasks where a human post-editor is used
  • Tasks limited to small sublanguage domains in
    which fully automatic high quality translation
    (FAHQT) is still achievable
  • Tasks with Software Localization

6
Machine Translation History
  • 1946-1954: Optimistic attitude towards the new
    technologies in MT
  • 1949: Informal memorandum
  • Word-for-word translation, especially
    Russian-English
  • 1954: The Georgetown University demonstration
  • Vocabulary: 250 words; grammar: 6 rules; corpus:
    a few simple Russian sentences

7
Machine Translation History
  • 1954-1966: Criticism of MT
  • 1966: ALPAC report (Automatic Language Processing
    Advisory Committee)
  • MT is slower, not very reliable and twice as
    expensive as human translation

8
Machine Translation History
  • 1966-1975: Revision of the aims and goals of MT
  • Definition of more realistic goals
  • Limitation of the research to technical languages
  • Syntactical analysis of the source text
  • Development of different translation strategies

9
Machine Translation History
  • 1975-1989: Increasing interest in and promotion
    of MT
  • Rapid increase in the demand for translations
  • Improvements in hardware and software
  • The use of artificial intelligence methods is now
    possible

10
Machine Translation History
  • 1990-2000
  • Development of commercial products based on
    personal computers
  • Specialized supplementary information (medicine,
    law, economics...)
  • Translation of spoken language (VERBMOBIL)

11
Machine Translation History
  • 2000-Now
  • Statistical Approaches and Hybrid Models
  • Google Translation Engine (http://translate.google.com)
  • Yearly official MT evaluation campaign
    (http://www.nist.gov)
  • Automated MT Evaluation (NIST, BLEU)

12
Machine Translation History
13
What happened between ALPAC and Now?
  • Need for MT and other NLP applications confirmed
  • Change in expectations
  • Computers have become faster, more powerful
  • WWW
  • Political state of the world
  • Maturation of Linguistics
  • Development of hybrid statistical/symbolic
    approaches

14
Language Similarities or Differences
  • Universals: aspects that are true of every
    language
  • e.g., every language has words referring to
    people; every language has nouns and verbs
  • Typology: the study of systematic cross-linguistic
    similarities and differences
  • Morphological aspects
  • Isolating vs. polysynthetic
  • Agglutinative vs. fusional
  • Syntactic aspects
  • SVO, SOV, or VSO
  • Syntactic-morphological aspects
  • Head-marking vs. dependent-marking
  • Specific differences: date formats and standards,
    verb tense differences
  • Lexical differences: different scenes

15
Lexical Differences
English: leg, foot, paw. French: étape, patte,
jambe, pied
16
Different Machine Translation Systems
  • Rule-based
  • Statistical Approaches
  • Hybrid systems (using a statistical approach in a
    rule-based architecture, or ...)

17
Three MT Approaches: Direct, Transfer, Interlingua
18
Machine Translation Architectures
  • Direct architecture
  • Direct architecture was used for most MT systems
    of the first generation
  • there are no intermediate stages in the process
    of translation

19
Direct Architecture, 4 Steps
20
Machine Translation Architectures
  • Characteristics of direct MT systems 
  • no complex linguistic theories or parsing
    strategy
  • make use of syntactic, semantic and lexical
    similarities between the source and the
    target-language
  • based on a single language pair
  • direct MT systems are robust, they even
    translate sentences with incomplete information
  • dictionaries are the most important components of
    the direct MT systems
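
The central role of the dictionary in a direct system can be illustrated with a minimal sketch of the lookup step alone. The English-Spanish lexicon below is a hypothetical toy invented for this example, not taken from any real system, and the sketch deliberately omits the morphological processing and local reordering that real direct systems add around the lookup.

```python
# Hypothetical toy lexicon (assumption for illustration only).
LEXICON = {
    "mary": "Maria",
    "the": "la",
    "green": "verde",
    "witch": "bruja",
}


def direct_translate(sentence, lexicon=LEXICON):
    """Translate word for word; unknown words pass through unchanged."""
    words = sentence.split()
    return " ".join(lexicon.get(word.lower(), word) for word in words)


print(direct_translate("the green witch"))
print(direct_translate("the purple witch"))
```

Because there is no analysis stage, the output keeps the English word order, and an out-of-vocabulary word such as "purple" is simply copied. This is one way to read the robustness claim above: the system always produces some output, even with incomplete information.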

21
Machine Translation Architectures
  • Transfer architecture
  • It consists of three separate stages
  • analysis
  • Transfer (Syntactical or Lexical)
  • synthesis/generation

22
Transfer Architecture
23
Transfer Example, English -> Spanish: Mary did not
slap the green witch
24
Transfer: English -> Japanese
25
Some Examples
26
Persian Example
  • I ate the apple -> [Persian text lost in extraction]
  • VP -> V NP  =>  VP -> NP RA V
  • I asked the man -> [Persian text lost in extraction]
  • VP -> V NP  =>  VP -> AZ NP V
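
A syntactic transfer rule such as VP -> V NP => VP -> NP RA V can be sketched as a small tree rewrite. The tree encoding (label, children) and the function names are assumptions made for this illustration. Only the structural step is shown: the words stay in English, because lexical transfer would be a separate step.

```python
def transfer_vp(tree):
    """Rewrite VP -> V NP as VP -> NP RA V; leave other trees unchanged."""
    label, children = tree
    child_labels = [c[0] if isinstance(c, tuple) else None for c in children]
    if label == "VP" and child_labels == ["V", "NP"]:
        verb, obj = children
        # RA stands for the object marker inserted by the target-side rule.
        return ("VP", [obj, ("RA", ["RA"]), verb])
    return tree


def leaves(tree):
    """Collect the words of a (label, children) tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


source_vp = ("VP", [("V", ["ate"]), ("NP", ["the", "apple"])])
print(leaves(transfer_vp(source_vp)))
```

The rewritten tree yields the order object, RA, verb, which is the verb-final pattern the rule describes.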

27
Machine Translation Architectures
  • Characteristics of transfer MT systems
  • consist of complete linguistic conceptions, not
    only single grammatical or syntactic rules
  • the analysis and generation components can be
    used again for further language pairs, if the
    components are exactly separated
  • the dictionaries of the transfer MT systems are
    also separated

28
Machine Translation Architectures
  • Interlingua architecture
  • The interlingua system consists of two stages 
  • The source text is analysed into an interlingual
    representation from which the text of the target
    language will be directly generated
  • Semantic Analyzer

29
Interlingua Architecture
30
Machine Translation Architectures
  • Interlingua architecture
  • Advantage
  • The interlingua representation can be used for
    any other language
  • Disadvantage
  • It is difficult to create language-independent
    representations
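
The advantage can be made concrete with a small count, under the usual textbook assumption that a transfer system needs one transfer module per ordered language pair, while an interlingua system needs only one analyzer and one generator per language. The function names are hypothetical.

```python
def transfer_module_count(n_languages):
    """One transfer module per ordered (source, target) pair."""
    return n_languages * (n_languages - 1)


def interlingua_module_count(n_languages):
    """One analyzer into, and one generator out of, the interlingua per language."""
    return 2 * n_languages


for n in (2, 5, 23):
    print(n, transfer_module_count(n), interlingua_module_count(n))
```

The transfer count grows quadratically with the number of languages and the interlingua count only linearly, which is why the approach is attractive for many-language settings despite the difficulty of designing the representation.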

31
Statistical Approaches
32
Statistical Approach
  • 3 stages
  • Language model P(E)
  • Translation model P(F|E)
  • Decoder
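
How the three components fit together can be sketched as follows: the decoder searches for the target sentence E that maximizes P(E) * P(F|E) for a given source sentence F. All probabilities below are invented toy values, and the exhaustive search over a fixed candidate list is a simplification; real decoders search a very large hypothesis space.

```python
import math

# Hypothetical toy models (assumptions for illustration only).
LANGUAGE_MODEL = {  # P(E)
    "the green witch": 0.6,
    "the witch green": 0.05,
}
TRANSLATION_MODEL = {  # P(F | E), keyed by (F, E)
    ("la bruja verde", "the green witch"): 0.4,
    ("la bruja verde", "the witch green"): 0.5,
}


def score(source, candidate):
    """Return log(P(E) * P(F|E)), or -inf if either probability is zero."""
    p = LANGUAGE_MODEL.get(candidate, 0.0) * TRANSLATION_MODEL.get(
        (source, candidate), 0.0
    )
    return math.log(p) if p > 0 else float("-inf")


def decode(source, candidates):
    """Pick the candidate E that maximizes P(E) * P(F|E)."""
    return max(candidates, key=lambda e: score(source, e))


print(decode("la bruja verde", ["the witch green", "the green witch"]))
```

In this toy setup the translation model slightly prefers the word-for-word order, but the language model outweighs it, so the fluent candidate wins. That division of labor is the point of keeping the two models separate.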

33
SYSTRAN
  • Developed in the late 1950s by Peter Toma
  • Initial system for Russian-English translations
  • Later adapted for US Air Force and NASA
  • Adaptation for other languages
  • Important because it had a big influence on many
    Japanese MT systems

34
SYSTRAN
  • Rule-based System
  • Using finite state grammar (ATN)
  • Using a large knowledge-base
  • Works with 23 languages, especially EU languages
  • Customers: AltaVista, Lycos, AOL, CompuServe,
    Terra, Google, Apple (?), ...

35
AppTek TranSphere
  • Rule-based System
  • Using LFG (Lexical Functional Grammar)
  • Analyze the semantic, morphological and syntactic
    structures in English and produce their
    equivalents in the target language
  • Utilize a general-purpose lexicon in addition to
    special domain micro-dictionaries
  • Translate English to Arabic, Korean, Chinese,
    Turkish, Persian/Dari and Pashto-English
  • Bi-Translate French, German, Italian, Portuguese,
    Russian, Spanish, Ukrainian, Hebrew and Dutch

36
MÉTÉO
  • Development of an English-French translation
    system by the TAUM Group to cope with the
    bilingual policy of the Canadian government
  • 1975: Contract to develop a system to translate
    public weather forecasts
  • 1984: Development of Météo 2
  • This program proved to be more reliable, faster
    and more cost-effective
  • 1989: Development of a French-English version

37
Sakhr Enterprise Machine Translation
  • Using transfer Architecture
  • Analysis on all linguistic levels: morphological,
    lexical, syntactic and semantic
  • Arabic - English

38
CiyaTran MT
  • English to Arabic-script languages:
    Arabic, Persian, Pashto
  • Analyzing the semantic, morphological and
    syntactical structure of input text
  • Utilizing Fuzzy Logic and Statistical Analysis
  • Using a general-purpose lexicon, as well as 85
    domain-specific databases with over 3,000,000
    words and phrases

39
ARIANE (GETA)
  • 1960-1970: Development of the CETA system for three
    language pairs
  • Change of the name to ARIANE (GETA) as the system
    was changed into a Transfer system

40
EUROTRA
  • Developed for the translation requirements within
    the European Community
  • A system designed to replace the Systran system
    because of its several limitations
  • 3 phases in the development of the program
  • One of the biggest MT projects in terms of
    expenditure, organizations and people involved

41
Google Translation
  • Launched in 2004
  • Beta versions for English-Arabic and
    English-Chinese
  • Fully statistical
  • Commercial use; no technical documentation found
  • In 2005, became the best translator for these two
    language pairs (http://www.nist.gov)

42
Shiraz Project
  • This project involved the creation of an
    extensible research prototype of a Persian to
    English machine translation system
  • Persian to English
  • Transfer Based Translation
  • Syntactic Analysis
  • Unification Based context free grammar
  • Stopped

43
Moses statistical MT
  • Open source, written in C++
  • allows you to automatically train translation
    models for any language pair.
  • All you need is a collection of translated texts
    (parallel corpus).
  • beam-search
  • phrase-based

44
PSMT (Prolog Statistical Machine Translation)
  • Used Prolog to Translate simple structures
  • 3 sections
  • Language Model Learner
  • Dictionary Learner
  • Search Program

45
Phramer Statistical Machine Translation
  • Phrase-based
  • Open source, written in Java
  • Using Bayesian model

46
EGYPT
  • Statistical MT
  • French-English
  • Academic
  • Some workshops related to EGYPT established

47
MT Challenges Ambiguity
  • Syntactic ambiguity: I saw the man with the
    telescope

Reading 1 (PP attached to the VP):
[S [NP I] [VP [VP [V saw] [NP the man]] [PP with the telescope]]]
Reading 2 (PP attached to the NP):
[S [NP I] [VP [V saw] [NP [NP the man] [PP with the telescope]]]]
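
The two attachments can be checked mechanically by counting parses with a CKY-style chart. The grammar and lexicon below are a small hand-written assumption, chosen only to cover this sentence; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP modifies the verb phrase
    ("NP", "NP", "PP"),  # PP modifies the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": "NP", "saw": "V", "the": "Det", "man": "N",
    "hill": "N", "telescope": "N", "with": "P", "on": "P",
}


def count_parses(sentence):
    """Count the distinct S parses of the sentence under the toy grammar."""
    words = sentence.lower().split()
    n = len(words)
    chart = defaultdict(int)  # (start, end, label) -> number of subtrees
    for i, word in enumerate(words):
        chart[i, i + 1, LEXICON[word]] += 1
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, split, left] * chart[split, end, right]
                    )
    return chart[0, n, "S"]


print(count_parses("I saw the man with the telescope"))
print(count_parses("I saw the man on the hill with the telescope"))
```

Under this grammar the first sentence has two parses, and adding a second prepositional phrase raises the count to five. This shows how quickly attachment ambiguity grows with sentence length.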
48
MT Challenges Ambiguity
  • Syntactic ambiguity: I saw the man on the hill
    with the telescope
  • Lexical ambiguity
  • E: book
  • Semantic ambiguity
  • Homography: ball (E) -> pelota, baile (S)
  • Polysemy: kill (E) -> matar, acabar (S)
  • Semantic granularity: esperar (S) -> wait, expect,
    hope (E); be (E) -> ser, estar (S); fish (E) -> pez,
    pescado (S)

49
How do we evaluate MT?
  • Human-based Metrics
  • Semantic Invariance
  • Pragmatic Invariance
  • Lexical Invariance
  • Structural Invariance
  • Spatial Invariance
  • Fluency
  • Accuracy
  • Do you get it?
  • Automatic metrics: BLEU

50
BiLingual Evaluation Understudy (BLEU; Papineni,
2001)
  • Automatic technique, but ...
  • Requires the pre-existence of Human (Reference)
    Translations
  • Produce corpus of high-quality human translations
  • Judge closeness numerically (word-error rate)
  • Compare n-gram matches between candidate
    translation and 1 or more reference translations

51
Other Evaluation metrics
  • BLEU, NIST, TER, precision and recall, METEOR, ...
  • BLEU ranks each MT output by a weighted average
    of the number of n-gram overlaps with the human
    translations

52
BLEU metric
53
BLEU: bad results with plain 1-gram precision,
better with modified precision
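
The difference between plain and modified n-gram precision can be sketched as below. The degenerate candidate made only of "the" is the standard illustration of the problem; the function names are assumptions for this sketch. Modified precision clips each candidate n-gram count at the maximum number of times that n-gram appears in any single reference.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def plain_precision(candidate, references, n=1):
    """Fraction of candidate n-grams that occur in any reference."""
    cand = ngrams(candidate, n)
    seen = {g for ref in references for g in ngrams(ref, n)}
    return sum(g in seen for g in cand) / len(cand)


def modified_precision(candidate, references, n=1):
    """Like plain precision, but with counts clipped by the references."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
print(plain_precision(candidate, references))
print(modified_precision(candidate, references))
```

Plain unigram precision gives this useless candidate a perfect score, because every token occurs in a reference. Clipping reduces the score to 2/7, since "the" appears at most twice in any one reference.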
54
BLEU metric
Keeps a penalty for candidates that are too short
Effective reference length
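
A minimal sketch of the brevity penalty and the final score follows, assuming the commonly used formulation: the effective reference length r is the reference length closest to the candidate length c, the penalty is 1 when c > r and exp(1 - r/c) otherwise, and the score is the penalty times the geometric mean of the modified n-gram precisions. The uniform weights and the function names are assumptions of this sketch.

```python
import math


def effective_reference_length(cand_len, ref_lens):
    """Reference length closest to the candidate; ties go to the shorter one."""
    return min(ref_lens, key=lambda r: (abs(r - cand_len), r))


def brevity_penalty(cand_len, ref_lens):
    """Penalize candidates shorter than the effective reference length."""
    if cand_len == 0:
        return 0.0
    r = effective_reference_length(cand_len, ref_lens)
    return 1.0 if cand_len > r else math.exp(1 - r / cand_len)


def bleu(precisions, cand_len, ref_lens):
    """Brevity penalty times the geometric mean of the n-gram precisions."""
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(cand_len, ref_lens) * math.exp(log_mean)


print(brevity_penalty(7, [6, 7]))
print(brevity_penalty(3, [6, 7]))
print(bleu([0.5, 0.5], 10, [10]))
```

Without the penalty, a very short candidate could reach high precision by emitting only a few safe words. The exponential term lowers its score in proportion to how far it falls short of the reference.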