Machine Translation (Level 2) - PowerPoint PPT Presentation

1
Machine Translation (Level 2)
  • Anna Sågvall Hein
  • GSLT Course, September 2004

2
Translation
  • substitute the text material of one language
    (SL) by the equivalent text material of another
    language (TL) (Catford 1965: 20)
  • Translation consists in producing in the target
    language the closest natural equivalent of the
    text material of the source language, first
    with respect to meaning and second with respect
    to style (Nida 1975: 32)
  • Translation is in theory impossible, but in
    practice fairly possible (Mounin 1967)
  • Catford, J. C. (1965), A Linguistic Theory of
    Translation, Oxford University Press, England.
  • Mounin, G. (1967), Les problèmes théoriques de
    la traduction, Paris.
  • Nida, E. (1975), A Framework for the Analysis and
    Evaluation of Theories of Translation, in
    Brislin, R. W. (ed.) (1975), Translation:
    Application and Research, Gardner Press, New
    York.

3
Equivalence
  • form
  • meaning
  • style
  • effect

4
Formal and dynamic equivalence
  • Formal equivalence focuses attention on the
    message itself, in both form and content. It aims
    to allow the reader to understand as much of the
    SL context as possible.
  • Dynamic equivalence is based on the principle of
    equivalent effect, i.e. that the relationship
    between receiver and message should aim at being
    the same as that between the original receivers
    and the SL message.
  • (Nida 1975)

5
Can computers translate?
  • Not a simple yes or no: it depends on the purpose
    of the translation and the required quality.

6
Classical problems with MT
  • unrealistic expectations
  • bad translations
  • difficulties in integrating MT in the work flow
  • the Ericsson case

7
Feasibility of machine translation
  • quality in relation to purpose
  • control of the source language
  • human machine interaction
  • re-use of translations
  • evaluation

8
Quality
  • publishing quality
  • editing quality
  • browsing quality

9
Translation related tasks
  • translation
  • browsing
  • gisting
  • drafting
  • message dissemination
  • cross-language information searches
  • cross-language interchanges

10
MT as a cross-language communication tool
  • MT is used not only for pure translation purposes
    but also for writing in a foreign language and
    for browsing (Hutchins 2001)
  • Hutchins, J., 2001, Towards a new vision for MT,
    Introductory speech at MT Summit VIII conference,
    18-22 September 2001
    (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.pdf)

11
Control of the source language
  • spell checked and grammar checked SL
  • sublanguage
    • domain
    • text type
  • controlled language

12
Spell checking and grammar checking
  • If there are spelling errors or typos in the SL
    dictionary search will fail
  • If there are grammatical errors in the SL
    grammatical analysis will fail
  • Where and how should spell and grammar checking
    be accounted for? Before or in the process?

13
Controlled language
  • consistent authoring of source texts
  • reduction of ambiguity
  • full linguistic coverage
  • controlled vocabulary
  • full lexical coverage
  • controlled grammar
  • full grammatical coverage
  • controlled language checking
  • e.g. Scania Checker

14
Ex. of controlled languages
  • Simplified English
  • KANT controlled English
  • Scania Swedish
  • Scania checker

15
Human intervention
  • before
    • language checking
  • during
    • e.g. ambiguity resolution
  • after
    • post-editing

16
Re-use of translations
  • translation memories
  • translation dictionaries incl. terminologies
  • lexicalistic translation
  • statistical machine translation
  • example-based translation

17
Evaluation of MT
  • human
  • automatic
  • using a gold standard
  • coverage (recall)
  • quality (precision)
  • global similarity measures
  • merge of recall and precision
  • BLEU, NIST
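
A minimal sketch of the automatic measures listed above: coverage (recall), quality (precision) and their merge into a single score, computed over word tokens against a gold-standard translation. The function name `unigram_scores` and the whitespace tokenisation are assumptions made for illustration. BLEU and NIST also use longer n-grams and further weighting, which are not reproduced here.

```python
from collections import Counter


def unigram_scores(candidate: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) over word tokens, using clipped counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: a word counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    # Harmonic mean as one simple way of merging recall and precision.
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


precision, recall, f1 = unigram_scores("the cab", "tilt the cab")
print(precision, recall, f1)
```

For the candidate "the cab" against the gold standard "tilt the cab", every candidate word is correct (precision 1.0) but one reference word is missing (recall 2/3).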

18
Why machine translation?
  • cheaper
  • faster
  • more consistent
  • when it succeeds

19
What is MT proper?
  • To be considered as MT, a system should:
    • provide minimally correct morphology
    • provide minimal syntactic processing
    • provide minimal semantic processing
    • handle and produce full sentences
  • Hutchins, J., 2000, The IAMT Certification
    initiative and defining translation system
    categories
    (http://nl.ijs.si/eamt00/proc/Hutchins.pdf)

20
Examples of MT products
  • Systran (http://babelfish.altavista.com/)
  • Comprendium (based on Metal)
  • ProMT (http://www.translate.ru/eng)
  • ESTeam
  • See further
    http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-4.pdf,
    http://www.foreignword.com/Technology/mt/mt.htm

21
Basic strategies
  • direct translation
  • rule-based translation
  • transfer
  • interlingua
  • example-based translation
  • statistical translation
  • hybrids

22
Direct translation
  • no complete intermediary sentence structure
  • translation proceeds in a number of steps, each
    step dedicated to a specific task
  • the most important component is the bilingual
    dictionary
  • typically general language
  • problems with:
    • ambiguity
    • inflection
    • word order and other structural shifts

23
Simplistic approach
  • sentence splitting
  • tokenisation
  • handling capital letters
  • dictionary look-up and lexical substitution incl.
    some heuristics for handling ambiguities
  • copying unknown words, digits, signs of
    punctuation etc.
  • formal editing
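
The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not a reference implementation: the function names are invented here, the two dictionary entries are toy entries (not taken from Norstedts), and ambiguity heuristics are left out entirely.

```python
import re

# Toy bilingual dictionary (hypothetical entries for illustration only).
TOY_DICT = {"tippa": "tilt", "hytten": "the cab"}


def split_sentences(text: str) -> list[str]:
    # Sentence splitting: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence: str) -> list[str]:
    # Tokenisation: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def translate_sentence(sentence: str, dictionary: dict[str, str]) -> str:
    out = []
    for token in tokenise(sentence):
        # Capital letters are neutralised for look-up; unknown words,
        # digits and punctuation are copied unchanged.
        out.append(dictionary.get(token.lower(), token))
    text = " ".join(out)
    # Formal editing: no space before punctuation, capitalised first letter.
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text[:1].upper() + text[1:]


def translate(text: str, dictionary: dict[str, str] = TOY_DICT) -> str:
    return " ".join(translate_sentence(s, dictionary) for s in split_sentences(text))


print(translate("Tippa hytten."))
```

Because every token keeps its source position, the sketch inherits the word order problems of direct translation.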

24
Advanced classical approach (Tucker 1987)
  • Source text dictionary look-up and morphological
    analysis
  • Identification of homographs
  • Identification of compound nouns
  • Identification of noun and verb phrases
  • Processing of idioms

25
Advanced approach, cont.
  • processing of prepositions
  • subject-predicate identification
  • syntactic ambiguity identification
  • synthesis and morphological processing of target
    text
  • rearrangement of words and phrases in target text

26
Feasibility of the direct translation strategy
  • Is it possible to carry out the direct
    translation steps as suggested by Tucker with
    sufficient precision without relying on a
    complete sentence structure?

27
Assignment 1: manual direct translation
  • Sv. Ytterst handlar kampen för sysselsättning om
    att hålla samman Sverige. →
  • En. Ultimately, the fight for full employment
    concerns the cohesion of Swedish society.
  • (from Statement of Government Policy 1996)
  • Define an algorithm and a dictionary (based on
    Norstedts) for simplistic translation of the
    example.
  • Present the model and the result.

28
Assignment 1, cont.
  • Improve the result stepwise in accordance with
    the advanced direct translation strategy
  • Specify each step carefully and demonstrate its
    effect on the translation.
  • Evaluate and discuss the final result.
  • Translate the example using Systran
    (http://kwic.systran.fr/systran/svdemo) and
    discuss the differences in an evaluative way
  • Report the assignment and up-load on the web
    (041001)

29
Current trends in direct translation
  • re-use of translations
  • translation memories of sentences and
    sub-sentence units such as words, phrases and
    larger units
  • lexicalistic translation
  • example-based translation
  • statistical translation
  • Will re-use of translations overcome the problems
    with the direct translation approach that were
    discussed above?
  • If so, how can they be handled?

30
Systran
  • System Translation
  • developed in the US by Peter Toma
  • first version 1969 (Ru-En)
  • EC bought the rights of Systran in 1976
  • currently 18 language pairs
  • demo version sv-en in 2003
    (http://kwic.systran.fr/systran/svdemo)
  • http://babelfish.altavista.com/

31
Systran, cont.
  • more than 1,600,000 dictionary units
  • 20 domain dictionaries
  • daily use by EC translators, administrators of
    the European institutions
  • originally a direct translation strategy
  • see HS
  • today more of a transfer-based strategy

32
Ex. 1: fairly good translation / Systran sv-en
  • "Enskilda företagare som inte bildat bolag
    klassificeras hit."
  • "Individual entrepreneurs that have not formed
    companies are classified here."
  • The system has recognised bildat as a perfect form
    and translates the tense form correctly as have
    formed, with the negation not in the right place.

33
Ex. 2: word order problem / Systran sv-en
  • "När byarna kontaktades hade de inte ens utsatts
    för influensa."
  • "When the villages were contacted had they not
    even been exposed to flu."
  • The system has not found the subject and predicate
    and therefore produces the wrong word order.

34
Ex. 3: ambiguity problem / Systran sv-en
  • "Vad kan vi lära av Arrawetestammen?"
  • "What can we faith of the Arawete?"
  • The system does not find the connection between
    kan and lära and therefore does not see that lära
    is a verb.

35
Ex. 4: ambiguity problem / Systran sv-en
  • "Extrapoleringen går till så här."
  • "The extrapolation goes to so here."
  • The system does not know the particle verb gå
    till and therefore translates incorrectly, word
    for word.

36
Systran Linguistic Resources
  • Dictionaries
  • POS Definitions
  • Inflection Tables
  • Decomposition Tables
  • Segmentation Dictionaries
  • Disambiguation Rules
  • Analysis Rules

37
Systran Processing Steps
  • Analysis
  • Lookup
  • Compound Decomposition
  • Disambiguation
  • Syntactic Analysis
  • Compound Expansion
  • Sentence Transfer
  • Initial Target Structure
  • Lookup
  • Default Transfer of Attributes
  • Structure Transformation

38
Systran Processing Steps (cont)
  • Sentence Synthesis
  • Structure Transformation
  • Inflection lookup
  • Surface Transformation

39
Motivations for transfer-based translation
  • lexical ambiguity
  • structural differences
  • See further Ingo 91

40
Example 1
  • Sv. Fyll på olja i växellådan. →
  • En. Fill gearbox with oil.
  • (from the Scania corpus)
  • fyll på → fill
  • obj → adv
  • adv → obj

41
Example 2
  • Sv. I oljefilterhållaren sitter en
    överströmningsventil. →
  • En. The oil filter retainer has an overflow
    valve.
  • (from the Scania corpus)
  • sitter → has
  • adv → subj
  • subj → obj

42
Transfer-based translation
  • intermediary sentence structure
  • basic processes
    • analysis
    • transfer
    • generation (synthesis)
  • language modules
    • dictionary and grammar of SL
    • transfer dictionary and transfer rules
    • dictionary and grammar of TL

43
Direct translation
[Figure labels: SL, TL, Transfer, Interlingua, Metal, Multra]

44
Levels of intermediary structure
  • cf. JM, Chapter 21
  • word order

45
Metal
  • See HS

46
MULTRA
  • Multilingual Support for Translation and Writing
  • translation engine
  • transfer-based
  • shake-and-bake
  • modular
  • unification-based
  • preference machinery
  • trace-able

47
(No Transcript)
48
Analysis
  • chart parser (Lisp → C)
  • procedural formalism
  • unification and other kinds of operations
  • sentence structure
  • feature structure
  • grammatical relations
  • surface order implicit via grammatical relations
  • See further Sågvall Hein & Starbäck (99), Weijnitz
    (02), Dahllöf (89)

49
Transfer
  • unification-based
  • declarative formalism
  • Multra transfer formalism (Beskow 93)
  • lexical and structural rules
  • rules are partially ordered
  • a more specific rule takes precedence over a less
    specific one
  • specificity in terms of number of transfer
    equations
  • all applicable rules are applied
  • written in Prolog
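
A small sketch of the precedence principle above: among the applicable rules, the one with more equations is ordered first. It is written in Python rather than in the Multra formalism, and it counts only source-side conditions as a stand-in for transfer equations, which is a simplifying assumption. The class `Rule`, the function `by_specificity`, the flat feature paths and the label `tänka-think` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    label: str
    source: dict  # feature path -> required value, one entry per equation

    def applies(self, features: dict) -> bool:
        # A rule is applicable when every one of its equations is satisfied.
        return all(features.get(path) == value for path, value in self.source.items())


def by_specificity(rules: list, features: dict) -> list:
    """Return the applicable rules, most specific (most equations) first."""
    applicable = [rule for rule in rules if rule.applies(features)]
    return sorted(applicable, key=lambda rule: len(rule.source), reverse=True)


RULES = [
    Rule("tänka-think", {"verb lex sym": "tänka.vb.1"}),
    Rule(
        "tänka.på-think.about",
        {"verb lex sym": "tänka.vb.1", "obj.prep prep lex sym": "på.pp.1"},
    ),
]

features = {"verb lex sym": "tänka.vb.1", "obj.prep prep lex sym": "på.pp.1"}
print([rule.label for rule in by_specificity(RULES, features)])
```

With the preposition present, the contextual rule is ordered before the general one; without it, only the general rule is applicable.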

50
Generation
  • syntactic generation
  • Multra syntactic generation formalism (Beskow
    97a)
  • PATR-like style
  • unification
  • concatenation
  • typed features
  • morphological generation (Beskow 97b)
  • lexical insertion rules
  • morphological realisation and phonological finish
    in Prolog
  • written in Prolog

51
An example: Tippa hytten.
  • Tippa hytten.
( (PHR.CAT CL
   MODE IMP
   SUBJ 2ND
   VERB (WORD.CAT VERB
         INFF IMP
         DIAT ACT
         LEX TIPPA.VB.1
         VSURF )
   OBJ.DIR (PHR.CAT NP
            NUMB SING
            GENDER UTR
            CASE BASIC
            DEF DEF
            HEAD (LEX HYTT.NN.1
                  WORD.CAT NOUN)))
  REG (V1.LEM TIPPA.VB)
  SEP (WORD.CAT SEP

52
Transfer structure
VERB     WORD.CAT VERB
         LEX TILT.VB.0
         DIAT ACT
         INFF IMP
OBJ.DIR  PHR.CAT NP
         DEF DEF
         NUMB SING
         HEAD  WORD.CAT NOUN
               LEX CAB.NN.0
MODE     IMP
SUBJ     2ND
VSURF
SEP      WORD.CAT SEP
         LEX STOP.SR.0
PHR.CAT  CL

53
Generation
  • Tilt the cab.  

54
A grammar rule
defrule legal.obj
  <?1 phr.cat> 'np,
  not <?1 case> 'gen,
  not <?1 case> 'subj

55
Transfer rules
  • copy feature
  • delete feature
  • transfer feature
  • assign feature
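
The four rule types can be illustrated on the "Tippa hytten." structure. The sketch below is Python over nested dictionaries, not the Multra transfer formalism. Which features are deleted (REG, GENDER, CASE) is inferred from comparing the source and transfer structures in the example, and "assign feature" is read here as setting a target value from a lexical table; both are assumptions. SEP and VSURF are left out because their values are incomplete in the example.

```python
# Lexical correspondences taken from the "Tippa hytten." example.
LEXICON = {"TIPPA.VB.1": "TILT.VB.0", "HYTT.NN.1": "CAB.NN.0"}
# Features with no counterpart in the target structure (inferred, see above).
DELETE = {"REG", "GENDER", "CASE"}


def transfer(source: dict) -> dict:
    target = {}
    for name, value in source.items():
        if name in DELETE:
            continue  # delete feature: nothing is produced on the target side
        if isinstance(value, dict):
            target[name] = transfer(value)  # transfer feature: recurse into the value
        elif name == "LEX":
            target[name] = LEXICON[value]  # assign feature: new lexical value
        else:
            target[name] = value  # copy feature: same value on both sides
    return target


SOURCE = {
    "PHR.CAT": "CL",
    "MODE": "IMP",
    "SUBJ": "2ND",
    "VERB": {"WORD.CAT": "VERB", "INFF": "IMP", "DIAT": "ACT", "LEX": "TIPPA.VB.1"},
    "OBJ.DIR": {
        "PHR.CAT": "NP",
        "NUMB": "SING",
        "GENDER": "UTR",
        "CASE": "BASIC",
        "DEF": "DEF",
        "HEAD": {"LEX": "HYTT.NN.1", "WORD.CAT": "NOUN"},
    },
    "REG": {"V1.LEM": "TIPPA.VB"},
}

print(transfer(SOURCE))
```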

56
Copy feature
LABEL mode
SOURCE <mode> ?x1
TARGET <mode> ?x2
TRANSFER

57
Delete feature
LABEL REG
SOURCE <REG> ANY
TARGET <> <>
TRANSFER

58
Transfer feature
LABEL OBJ.DIR
SOURCE <OBJ.DIR> ?x1
TARGET <OBJ.DIR> ?x2
TRANSFER ?x1 <> ?x2

59
Define feature
LABEL trycka.in-press
SOURCE <lex sym> trycka.vbin.ab.1
       <word.cat> VERB
TARGET <lex> press.vb.1
       <word.cat> VERB
TRANSFER

60
A generation rule
LABEL CL.IMP
X1 ---> X2 X3 X4
<X1 PHR.CAT> CL
<X1 VERB> <X2>
<X1 TYPE> IMP
<X1 OBJ.DIR> <X3>
<X1 SEP> <X4>

61
A contextual lexical rule
LABEL tänka.på-think.about
SOURCE <verb lex sym> tänka.vb.1
       <obj.prep phr.cat> pp
       <obj.prep prep> ?prep
       <obj.prep prep lex sym> på.pp.1
       <obj.prep rect> ?rect1
TARGET <obj.prep phr.cat> pp
       <obj.prep prep word.cat> PREP
       <obj.prep prep lex> about.pp.1
       <obj.prep rect> ?rect2
TRANSFER ?rect1 <> ?rect2

62
A generation trace
  • 1-Applying Rule cl-sep
  • 1- Applying Rule cl.imp
  • 1- Applying Rule subj2nd-verb-obj.dir
  • 1- Applying Rule verb.main.act
  • 1- Applying Rule np.the-df
  • 1- Applying Rule ng.noun-def
  • 1-Success!

63
Language resources in the MATS system
  • dictionary in a database with different views
  • analysis grammar
  • transfer grammar
  • incl. contextually defined lexical rules
  • generation grammar

64
sv-en_LinkLexicon
65
en_Inflections
66
en_LemmaLexicon
67
en_LexemeLexicon
68
en_Lexicon
69
en_StemLexicon
70
sv_Inflections
71
sv_LemmaLexicon
72
sv_LexemeLexicon
73
sv_Lexicon
74
sv_StemLexicon
75
The MATS system
  • Frozen demo

76
Assignment 2: Working with MATS
  • http://stp.ling.uu.se/evapet/mt04/assignment2.html

77
Lexicalistic translation
  • Identify (lexical) translation units in the
    source sentence
  • Translate each unit separately (considering the
    context)
  • Order the result in agreement with a model of the
    target language
  • Formulation due to Lars Ahrenberg; see further AH
    (reading list). See also Beaven, John L.,
    Shake-and-Bake Machine Translation, Coling 92,
    Nantes, 23-28 August 1992.
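
A minimal sketch of the three steps, using the word order problem from the Systran example "hade de inte" as input. The units are translated one by one and the result is then ordered by a target language model, here a table of bigram counts. The lexicon, the counts and the function names are hypothetical, and trying every permutation is only feasible for very short inputs.

```python
from itertools import permutations

# Hypothetical unit translations and target-language bigram counts.
LEXICON = {"hade": "had", "de": "they", "inte": "not"}
BIGRAMS = {("they", "had"): 3, ("had", "not"): 3, ("had", "they"): 1}


def translate_units(units: list, lexicon: dict) -> list:
    # Each unit is translated separately; unknown units are copied.
    return [lexicon.get(unit, unit) for unit in units]


def order_by_model(words: list, bigram_counts: dict) -> tuple:
    """Return the ordering of the translated units that the model scores highest."""

    def score(sequence):
        return sum(bigram_counts.get(pair, 0) for pair in zip(sequence, sequence[1:]))

    return max(permutations(words), key=score)


target_units = translate_units(["hade", "de", "inte"], LEXICON)
print(order_by_model(target_units, BIGRAMS))
```

With these toy counts the source order "had they not" scores 1, while "they had not" scores 6 and is selected.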

78
T4F: a lexicalistic system
  • processes in T4F
    • tokenisation
    • tagging
    • transfer
    • transposition
    • filtering
  • See further AH (in the reading list)

79
Interlingua translation
  • See SN

80
(No Transcript)
81
(No Transcript)
82
(No Transcript)
83
Applications of alignment
  • translation memories
  • translation dictionaries
  • lexicalistic translation
  • statistical machine translation
  • example-based translation

84
Translation memories
  • based on sentence links
  • optionally, sub sentence links
  • See further Macklovitch, E. (2000)
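
A translation memory based on sentence links can be sketched as a table of source and target sentences with exact and fuzzy look-up. The sketch uses `difflib.SequenceMatcher` from the Python standard library for the similarity score. The function name `tm_lookup` and the threshold 0.75 are assumptions, and the two stored pairs are example sentences from this presentation.

```python
from difflib import SequenceMatcher

# Sentence links: source sentence -> stored translation.
MEMORY = {
    "Tippa hytten.": "Tilt the cab.",
    "Fyll på olja i växellådan.": "Fill gearbox with oil.",
}


def tm_lookup(segment: str, memory: dict, threshold: float = 0.75):
    """Return (translation, score) for the most similar stored source, or None."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[1]:
            best = (target, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


print(tm_lookup("Tippa hytten.", MEMORY))
print(tm_lookup("Tippa hytten", MEMORY))
print(tm_lookup("Vad kan vi lära?", MEMORY))
```

An exact match scores 1.0, a near match is returned with its score so that a translator can judge it, and a segment below the threshold yields no proposal.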

85
Translation dictionaries
  • based on word links
  • refinement of word links

86
Refinement of word alignment data
  • neutralise capital letters where appropriate
  • lemmatise or tag source and target units
  • identify ambiguities
  • search for criteria to resolve them
  • identify partial links
  • compounds?
  • remove or complete them
  • manual revision?
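
The first refinement steps can be sketched as follows: capital letters are neutralised, link frequencies are counted, and source units with more than one target are flagged as ambiguous. The link data and the function name `refine_links` are hypothetical, and the sketch lower-cases everything rather than only "where appropriate".

```python
from collections import Counter, defaultdict

# Hypothetical word links (source unit, target unit) from an aligned corpus.
LINKS = [
    ("Motorn", "the motor"),
    ("motorn", "the engine"),
    ("motorn", "the motor"),
    ("Hytten", "the cab"),
]


def refine_links(links):
    """Return (dictionary with link frequencies, ambiguous entries)."""
    # Neutralise capital letters before counting identical links.
    counts = Counter((source.lower(), target.lower()) for source, target in links)
    table = defaultdict(dict)
    for (source, target), frequency in counts.items():
        table[source][target] = frequency
    # A source unit with several targets is a candidate for disambiguation criteria.
    ambiguous = {source: alts for source, alts in table.items() if len(alts) > 1}
    return dict(table), ambiguous


dictionary, ambiguous = refine_links(LINKS)
print(dictionary)
print(sorted(ambiguous))
```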

87
Informally about statistical MT
  • build a translation dictionary based on word
    alignment
  • aim for as big fragments as possible
  • keep information on link frequency
  • build an n-gram model of the target language
  • implement a direct translation strategy
  • including alternatives ordered by length and
    frequency
  • process the output by the n-gram model filtering
    out the best alternatives and adjust the
    translation accordingly
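
The recipe above can be sketched with a translation dictionary that keeps link frequencies and a bigram model of the target language that filters the alternatives. All data, names and the add-one smoothing are assumptions for illustration; fragments longer than one word and the ordering by length are not modelled. The input is the ambiguity example "Vad kan vi lära", where lära has two alternatives.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical dictionary with link frequencies and a tiny target corpus.
DICTIONARY = {
    "vad": {"what": 4},
    "kan": {"can": 4},
    "vi": {"we": 4},
    "lära": {"faith": 3, "learn": 2},
}
CORPUS = ["what can we learn", "we can learn", "we learn"]


def train_bigrams(corpus):
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def lm_log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab = len(unigrams)
    seq = ["<s>"] + list(tokens)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(seq, seq[1:])
    )


def link_log_prob(source, candidate, dictionary):
    """Log relative link frequency of each chosen alternative."""
    total = 0.0
    for s, t in zip(source, candidate):
        alternatives = dictionary.get(s)
        if alternatives:
            total += math.log(alternatives[t] / sum(alternatives.values()))
    return total


def best_translation(source, dictionary, unigrams, bigrams):
    """Direct, word-for-word translation; unknown words are copied."""
    options = [list(dictionary.get(word, {word: 1})) for word in source]
    return max(
        product(*options),
        key=lambda c: lm_log_prob(c, unigrams, bigrams) + link_log_prob(source, c, dictionary),
    )


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
print(best_translation(["vad", "kan", "vi", "lära"], DICTIONARY, UNIGRAMS, BIGRAMS))
```

Although "faith" has the higher link frequency in this toy dictionary, the bigram model prefers "we learn", so the verb reading is chosen.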

88
Example-based MT
  • HS (in the reading list)

89
Some current research topics
  • intersentential dependencies
  • hybrid systems: data-driven and rule-driven
  • improved alignment techniques
  • improved language modeling in ST
  • automatic learning from post-editing
  • translation by structural correspondences
  • translation of spoken language
  • improved preference strategies
  • ambiguity preserving translation

90
Intersentential dependencies
  • pronoun resolution
  • lexical ambiguity resolution, such as
    • (torkar)motorn → the motor
    • (förbrännings)motorn → the engine
  • fluency
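
One simple way to picture the lexical case: the bare noun motorn is translated according to the most recent compound ending in -motorn in the preceding text. This is a sketch of the idea only, not a description of any system. The function name, the default choice and the second test sentence are assumptions; the two correspondences are the ones listed above.

```python
import re

# Correspondences from the list above: antecedent compound -> translation of "motorn".
HEAD_TRANSLATIONS = {
    "torkarmotorn": "the motor",
    "förbränningsmotorn": "the engine",
}


def resolve_motorn(previous_text: str, default: str = "the motor") -> str:
    """Translate 'motorn' according to the latest compound antecedent."""
    # Compounds only: at least one character must precede "motorn".
    antecedents = re.findall(r"\w+motorn", previous_text.lower())
    for word in reversed(antecedents):  # most recent antecedent first
        if word in HEAD_TRANSLATIONS:
            return HEAD_TRANSLATIONS[word]
    return default  # assumption: fall back to a fixed choice


print(resolve_motorn("Torkarmotorn M2 är sammankopplad med omkopplare S24."))
print(resolve_motorn("Förbränningsmotorn startar."))
```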

91
Preserving the information structure
  • information structure is expressed in different
    ways in the source and the target
  • syntactic clues are exploited in the analysis to
    compute the information structure (topic-focus
    articulation)
  • information structure is used to guide the
    generation

92
An example
  • Sv. Torkarmotorn M2 är sammankopplad med omkopplare S24 och intervallrelä R22. För att inte motorn skall överbelastas, t.ex. om torkarbladen fastnat, finns en inbyggd termovakt som bryter strömmen till motorn när
  • En. Wiper motor M2 is connected to switch S24 and intermittent relay R22. To prevent motor overload, e.g. if the wiper blade gets stuck, there is an integral thermal sensor which breaks the current to the motor when

93
Preferences
  • syntactic preferences
  • the principle of right association
  • the principle of minimal attachment
  • two-stage processing
  • semantic preferences
  • lexical selectional restrictions
  • lexical contextual rules
  • conceptual taxonomies
  • likelihood of occurrence
  • See further Bennet, P. & Paggio, P., 1993,
    Preference in Eurotra.

94
Preferences in Multra
  • parsing
  • a formalism for expressing syntactic preferences
    in the parse
  • not fully developed
  • transfer
  • contextual lexical rules
  • rule specificity
  • generation
  • rule specificity

95
Hybrid systems
  • aims
  • components
  • problems
  • architecture
  • scores

96
Aims of a hybrid system
  • simple techniques for simple tasks
  • complex techniques for complex tasks

97
Components of a hybrid system
  • component strategies
  • translation memory
  • full sentences
  • fragments
  • direct translation
  • statistical translation
  • EBMT

98
Component strategies, cont.
  • rule-based translation
  • simplistic analysis (cf. direct translation)
  • word by word (S → sequence of words)
  • phrase by phrase (S → sequence of phrases)
  • partial parsing
  • full parsing

99
Problems of a hybrid system
  • how does the system know when a simple technique
    is appropriate?
  • does the source tell?
  • does the target tell?

100
Architecture and scores
  • simple first?
  • concerting results?
  • scoring?
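
One possible reading of "simple first" is a cascade: the component strategies are tried in order of increasing complexity, and the first one that returns a result wins. The sketch below shows only that control flow; it does not answer how results should be combined or scored. The two component functions, their data and all names are hypothetical.

```python
from typing import Callable, Optional

Strategy = Callable[[str], Optional[str]]

# Hypothetical resources for two very simple component strategies.
MEMORY = {"Tippa hytten.": "Tilt the cab."}
LEXICON = {"tippa": "tilt", "hytten": "the cab"}


def translation_memory(sentence: str) -> Optional[str]:
    # Full-sentence match only; returns None when the sentence is not stored.
    return MEMORY.get(sentence)


def word_by_word(sentence: str) -> Optional[str]:
    tokens = sentence.rstrip(".").split()
    if not all(token.lower() in LEXICON for token in tokens):
        return None  # give up so that a later strategy can try
    return " ".join(LEXICON[token.lower()] for token in tokens).capitalize() + "."


def simple_first(sentence: str, strategies: list) -> tuple:
    """Try the strategies in order; return (strategy name, translation)."""
    for name, strategy in strategies:
        result = strategy(sentence)
        if result is not None:
            return name, result
    return "untranslated", sentence


STRATEGIES = [("translation memory", translation_memory), ("word by word", word_by_word)]
print(simple_first("Tippa hytten.", STRATEGIES))
print(simple_first("Tippa hytten", STRATEGIES))
```

In this arrangement each strategy decides for itself whether it is appropriate by returning a result or not, which is only one of the options raised on the previous slide.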

101
Improved techniques for re-use of translation
  • combining clues for word alignment (Tiedemann
    2003)
  • interactive word alignment (Ahrenberg et al.
    2003)
  • parallel treebanks

102
Translation by structural correspondences
  • LFG
  • HPSG

103
Translation of spoken language
  • See Krauwer, Steven (ed.), 2000, Machine Translation,
    June 2000, Volume 15, Issue 1-2, Special issue on
    Spoken Language Translation.