Machine Translation: Challenges and Approaches
Transcript and Presenter's Notes
1
Machine Translation: Challenges and Approaches
Invited Lecture, Introduction to Natural Language Processing, Fall 2008
  • Nizar Habash, Associate Research Scientist
  • Center for Computational Learning Systems
  • Columbia University

2
  • Currently, Google offers translations between the following languages:

Arabic Bulgarian Catalan Chinese Croatian Czech Danish Dutch Filipino Finnish French German Greek Hebrew Hindi Indonesian Italian Japanese Korean Latvian Lithuanian Norwegian Polish Portuguese Romanian Russian Serbian Slovak Slovenian Spanish Swedish Ukrainian Vietnamese
3
  • Thank you for your attention!
  • Questions?

4
BBC found similar support!!!
5
Road Map
  • Multilingual Challenges for MT
  • MT Approaches
  • MT Evaluation

6
Multilingual Challenges
  • Orthographic Variations
  • Ambiguous spelling
  • [Arabic script example]
  • Ambiguous word boundaries
  • Lexical Ambiguity
  • Bank → [Arabic word] (financial) vs. [Arabic word] (river)
  • Eat → essen (human) vs. fressen (animal)

7
Multilingual Challenges: Morphological Variations
  • Affixation vs. Root+Pattern

write → written    ktb → mktwb (Buckwalter transliteration)
kill → killed    qtl → mqtwl
do → done    fEl → mfEwl
  • Tokenization

And the cars → and the cars
wAlSyArAt → w Al SyArAt
Et les voitures → et les voitures
8
Translation Divergences: Conflation
[Word-alignment diagram for the sentences below]
English: I am not here
Arabic: lst hnA (Buckwalter transliteration), gloss: I-am-not here
French: Je ne suis pas ici, gloss: I not am not here
9
Translation Divergences: Head Swap and Categorial
English: John swam across the river quickly
Spanish: Juan cruzó rápidamente el río nadando (Gloss: John crossed fast the river swimming)
Arabic: [Arabic script] (Gloss: sped John crossing the-river swimming)
Chinese: [Chinese script] (Gloss: John quickly (DE) swam cross the (Quantifier) river)
Russian: Джон быстро переплыл реку (Gloss: John quickly cross-swam river)
10
Road Map
  • Multilingual Challenges for MT
  • MT Approaches
  • MT Evaluation

11
MT Approaches: MT Pyramid
[Pyramid diagram: analysis climbs from source word through source syntax to source meaning; generation descends from target meaning through target syntax to target word; translation can transfer at any of the three levels]
12
MT Approaches: Gisting Example
Source (Spanish): Sobre la base de dichas experiencias se estableció en 1988 una metodología.
Gist (word-for-word): Envelope her basis out speak experiences them settle at 1988 one methodology.
Human translation: On the basis of these experiences, a methodology was arrived at in 1988.
13
MT Approaches: MT Pyramid
[MT pyramid diagram, repeated]
14
MT Approaches: Transfer Example
  • Transfer Lexicon
  • Map SL structure to TL structure
[Diagram: Spanish dependency tree headed by "poner" (subj X, obj "mantequilla", mod "en" with obj Y) maps to an English tree headed by "butter" (subj X, obj Y)]
X puso mantequilla en Y → X buttered Y
15
MT Approaches: MT Pyramid
[MT pyramid diagram, repeated]
16
MT Approaches: Interlingua Example, Lexical Conceptual Structure (Dorr, 1993)
17
MT Approaches: MT Pyramid
[MT pyramid diagram, repeated]
18
MT Approaches: MT Pyramid
[MT pyramid diagram, repeated]
19
MT Approaches: Statistical vs. Rule-based
[MT pyramid diagram, annotated to contrast statistical and rule-based approaches]
20
Statistical MT: Noisy Channel Model
Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
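For reference, the noisy-channel decomposition this slide is built on (the standard SMT formulation, stated here in LaTeX since the slide's figure did not survive transcription):

$$\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)$$

Here $P(f \mid e)$ is the translation model, $P(e)$ is the language model, and decoding is the search for $\hat{e}$.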
21
Statistical MT: Automatic Word Alignment
Slide based on Kevin Knight's http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
  • GIZA++
  • A statistical machine translation toolkit used to train word alignments.
  • Uses Expectation-Maximization with various constraints to bootstrap alignments (toy sketch below)

Maria no dio una bofetada a la bruja verde
[Alignment grid linking the Spanish and English words]
Mary did not slap the green witch
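As a concrete illustration of the EM idea, here is a toy IBM Model 1 trainer (a minimal sketch, not GIZA++'s actual implementation; the two-sentence corpus is invented):

```python
from collections import defaultdict

# Toy parallel corpus: (foreign, English) token lists.
corpus = [
    (["la", "casa"], ["the", "house"]),
    (["la", "bruja"], ["the", "witch"]),
]

# t(f|e): word-translation probabilities, initialized uniformly.
t = defaultdict(lambda: 0.25)

for _ in range(10):                      # EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # normalizers per English word
    for f_sent, e_sent in corpus:
        for f in f_sent:
            # E-step: distribute f's alignment probability over e_sent
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    # M-step: re-estimate t(f|e) from expected counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("bruja", "witch")], 3))   # high: EM pins the alignment down
print(round(t[("bruja", "the")], 3))     # low: "the" is explained by "la"
```

Because "la" and "the" co-occur in both sentence pairs, EM gradually shifts probability mass so that "bruja" aligns with "witch" rather than "the".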
22
Statistical MT: IBM Model (Word-based Model)
http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
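The simplest word-based model, IBM Model 1, assigns a sentence-pair probability by summing over all word alignments (standard formulation, for reference):

$$P(f \mid e) \;=\; \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)$$

where $m$ and $l$ are the foreign and English sentence lengths, $e_0$ is a special NULL word, and $t(f_j \mid e_i)$ is the word-translation table trained with EM as sketched above.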
23
Phrase-Based Statistical MT
Slide courtesy of Kevin Knight, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
[Phrase alignment diagram:
Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada
Morgen → Tomorrow, fliege → will fly, ich → I, nach Kanada → in Canada, zur Konferenz → to the conference]
  • Foreign input segmented into phrases
  • a phrase is any sequence of words
  • Each phrase is probabilistically translated into English (estimation sketched below)
  • P(to the conference | zur Konferenz)
  • P(into the meeting | zur Konferenz)
  • Phrases are probabilistically re-ordered
  • See Koehn et al., 2003 for an intro.
  • This is state-of-the-art!
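A minimal sketch of how such phrase probabilities are typically estimated, by relative frequency over phrase pairs extracted from a word-aligned corpus (the toy counts below are invented):

```python
from collections import defaultdict

# Toy extracted phrase pairs (foreign, english).
extracted = [
    ("zur Konferenz", "to the conference"),
    ("zur Konferenz", "to the conference"),
    ("zur Konferenz", "into the meeting"),
    ("nach Kanada", "in Canada"),
]

pair_count = defaultdict(int)
f_count = defaultdict(int)
for f, e in extracted:
    pair_count[(f, e)] += 1
    f_count[f] += 1

def phrase_prob(e, f):
    """Relative-frequency estimate: P(e | f) = count(f, e) / count(f)."""
    return pair_count[(f, e)] / f_count[f]

print(phrase_prob("to the conference", "zur Konferenz"))  # 2/3
print(phrase_prob("into the meeting", "zur Konferenz"))   # 1/3
```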

24
Word Alignment Induced Phrases
Slide courtesy of Kevin Knight, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
Maria no dió una bofetada a la bruja verde
[Alignment grid]
Mary did not slap the green witch
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
25
Word Alignment Induced Phrases
Slide courtesy of Kevin Knight, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
Maria no dió una bofetada a la bruja verde
[Alignment grid]
Mary did not slap the green witch
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the)
26
Word Alignment Induced Phrases
Slide courtesy of Kevin Knight, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
Maria no dió una bofetada a la bruja verde
[Alignment grid]
Mary did not slap the green witch
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the) (Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch)
27
Word Alignment Induced Phrases
Slide courtesy of Kevin Knight, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
Maria no dió una bofetada a la bruja verde
[Alignment grid]
Mary did not slap the green witch
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the) (Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch) (Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch)
28
Word Alignment Induced Phrases
Slide courtesy of Kevin Knight, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
Maria no dió una bofetada a la bruja verde
[Alignment grid]
Mary did not slap the green witch
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the) (Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch) (Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) (Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)
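A minimal sketch of the extraction criterion these slides build up, over a hand-written alignment approximating the grid (illustrative; not code from the lecture). A phrase pair is kept exactly when its words are aligned only to each other:

```python
# Minimal phrase-extraction sketch over a hand-written word alignment.
src = "Maria no dió una bofetada a la bruja verde".split()
tgt = "Mary did not slap the green witch".split()

# Alignment links (src index, tgt index); "a" is left unaligned here.
links = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
         (6, 4), (7, 6), (8, 5)}

def consistent(s1, s2, t1, t2):
    """A phrase pair is consistent if no link crosses its boundary."""
    inside = [(i, j) for i, j in links if s1 <= i <= s2 and t1 <= j <= t2]
    crossing = [(i, j) for i, j in links
                if (s1 <= i <= s2) != (t1 <= j <= t2)]
    return inside and not crossing

phrases = set()
for s1 in range(len(src)):
    for s2 in range(s1, len(src)):
        for t1 in range(len(tgt)):
            for t2 in range(t1, len(tgt)):
                if consistent(s1, s2, t1, t2):
                    phrases.add((" ".join(src[s1:s2 + 1]),
                                 " ".join(tgt[t1:t2 + 1])))

print(("bruja verde", "green witch") in phrases)           # True
print(("no dió una bofetada", "did not slap") in phrases)  # True
```

Unaligned words such as "a" may attach to adjacent phrases, which is why the slide's list contains overlapping pairs of many sizes.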
29
Advantages of Phrase-Based SMT
Slide courtesy of Kevin Knight, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
  • Many-to-many mappings can handle non-compositional phrases
  • Local context is very useful for disambiguating
  • interest rate → ...
  • interest in → ...
  • The more data, the longer the learned phrases
  • Sometimes whole sentences
30
MT Approaches: Statistical vs. Rule-based vs. Hybrid
[MT pyramid diagram, annotated to contrast statistical, rule-based, and hybrid approaches]
31
MT Approaches: Practical Considerations
  • Resource Availability
  • Parsers and Generators
  • Input/Output compatibility
  • Translation Lexicons
  • Word-based vs. Transfer/Interlingua
  • Parallel Corpora
  • Domain of interest
  • Bigger is better
  • Time Availability
  • Statistical training, resource building
32
Road Map
  • Multilingual Challenges for MT
  • MT Approaches
  • MT Evaluation

33
MT Evaluation
  • More art than science
  • Wide range of metrics/techniques
  • interface, scalability, faithfulness, space/time complexity, etc.
  • Automatic vs. Human-based
  • Dumb Machines vs. Slow Humans
34
Human-based Evaluation Example: Accuracy Criteria
[Table of accuracy rating criteria]
35
Human-based Evaluation Example: Fluency Criteria
[Table of fluency rating criteria]
36
Automatic Evaluation Example: BLEU Metric (Papineni et al., 2001)
  • BLEU
  • BiLingual Evaluation Understudy
  • Modified n-gram precision with length penalty
  • Quick, inexpensive and language independent
  • Correlates highly with human evaluation
  • Bias against synonyms and inflectional variations
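For reference, the standard BLEU definition behind these bullets (not transcribed from the slide):

$$\mathrm{BLEU} \;=\; \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big), \qquad \mathrm{BP} \;=\; \min\!\big(1,\; e^{\,1 - r/c}\big)$$

where $p_n$ is the modified $n$-gram precision, $w_n = 1/N$ (typically $N = 4$), $c$ is the candidate length, and $r$ is the reference length.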

37
Automatic Evaluation Example: BLEU Metric
  • Test Sentence
  • colorless green ideas sleep furiously

Gold Standard References
  • all dull jade ideas sleep irately
  • drab emerald concepts sleep furiously
  • colorless immature thoughts nap angrily
38
Automatic Evaluation Example: BLEU Metric
  • Test Sentence
  • colorless green ideas sleep furiously

Gold Standard References
  • all dull jade ideas sleep irately
  • drab emerald concepts sleep furiously
  • colorless immature thoughts nap angrily
Unigram precision: 4/5 (only "green" is unmatched)
39
Automatic Evaluation Example: BLEU Metric
  • Test Sentence
  • colorless green ideas sleep furiously
  • (shown four times on the slide, with each matched n-gram highlighted)

Gold Standard References
  • all dull jade ideas sleep irately
  • drab emerald concepts sleep furiously
  • colorless immature thoughts nap angrily

Unigram precision: 4/5 = 0.8
Bigram precision: 2/4 = 0.5
BLEU score: (a_1 × a_2 × ... × a_n)^(1/n) = (0.8 × 0.5)^(1/2) ≈ 0.6325, i.e. 63.25
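A minimal sketch reproducing the slide's arithmetic (modified n-gram precision clipped against the references; the brevity penalty is omitted since candidate and references have equal length):

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clip each candidate n-gram count by its max count in any reference."""
    cand = ngrams(candidate, n)
    matched = 0
    for gram, count in cand.items():
        max_ref = max(ngrams(ref, n)[gram] for ref in references)
        matched += min(count, max_ref)
    return matched / sum(cand.values())

cand = "colorless green ideas sleep furiously".split()
refs = ["all dull jade ideas sleep irately".split(),
        "drab emerald concepts sleep furiously".split(),
        "colorless immature thoughts nap angrily".split()]

p1 = modified_precision(cand, refs, 1)  # 4/5 = 0.8
p2 = modified_precision(cand, refs, 2)  # 2/4 = 0.5
bleu = (p1 * p2) ** 0.5                 # geometric mean ≈ 0.6325
print(p1, p2, round(bleu, 4))
```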
40
Automatic Evaluation Example: METEOR (Lavie and Agarwal, 2007)
  • Metric for Evaluation of Translation with Explicit word Ordering
  • Extended matching between translation and reference
  • Porter stems, WordNet synsets
  • Unigram Precision, Recall, parameterized F-measure
  • Reordering Penalty
  • Parameters can be tuned to optimize correlation with human judgments
  • Not biased against non-statistical MT systems
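The METEOR papers combine these pieces roughly as follows (standard formulation; $\alpha$, $\beta$, $\gamma$ are the tunable parameters the bullets refer to):

$$F_{mean} = \frac{P \cdot R}{\alpha P + (1 - \alpha) R}, \qquad \mathrm{Pen} = \gamma \left(\frac{\#\,chunks}{\#\,matches}\right)^{\beta}, \qquad \mathrm{METEOR} = (1 - \mathrm{Pen}) \cdot F_{mean}$$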

41
Metrics MATR Workshop
  • Workshop at the AMTA 2008 conference
  • Association for Machine Translation in the Americas
  • Evaluating evaluation metrics
  • Compared 39 metrics
  • 7 baselines and 32 new metrics
  • Various measures of correlation with human judgment
  • Different conditions: text genre, source language, number of references, etc.
42
Automatic Evaluation Example: SEPIA (Habash and El Kholy, 2008)
  • A syntactically-aware evaluation metric
  • (Liu and Gildea, 2005; Owczarzak et al., 2007; Giménez and Màrquez, 2007)
  • Uses dependency representation
  • MICA parser (Nasr and Rambow, 2006)
  • 77% of all structural bigrams are surface n-grams of size 2, 3, or 4
  • Includes dependency surface span as a factor in the score
  • long-distance dependencies should receive a greater weight than short-distance dependencies
  • Higher degree of grammaticality?
43
Interested in MT?
  • Contact me (habash@cs.columbia.edu)
  • Research courses, projects
  • Languages of interest
  • English, Arabic, Hebrew, Chinese, Urdu, Spanish, Russian, ...
  • Topics
  • Statistical, Hybrid MT
  • Phrase-based MT with linguistic extensions
  • Component improvements or full-system improvements
  • MT Evaluation
  • Multilingual computing
44
  • Thank You