Title: Machine Translation (Level 2)
1 Machine Translation (Level 2)
- Anna Sågvall Hein
- GSLT Course, September 2004
2 Translation
- replacing the text material of one language (SL) by the equivalent text material of another language (TL) (Catford 1965: 20)
- Translation consists in producing in the target language the closest natural equivalent of the source-language text material, first with respect to meaning and second with respect to style (Nida 1975: 32)
- Translation is in theory impossible, but in practice fairly possible (Mounin 1967)
- Catford, J. C. (1965), A Linguistic Theory of Translation, Oxford University Press, England.
- Mounin, G. (1967), Les problèmes théoriques de la traduction, Paris.
- Nida, E. (1975), A Framework for the Analysis and Evaluation of Theories of Translation, in Brislin, R. W. (ed.) (1975), Translation: Applications and Research, Gardner Press, New York.
3 Equivalence
- form
- meaning
- style
- effect
4 Formal and dynamic equivalence
- Formal equivalence focuses attention on the message itself, in both form and content. It aims to allow the reader to understand as much of the SL context as possible.
- Dynamic equivalence is based on the principle of equivalent effect, i.e. that the relationship between receiver and message should aim at being the same as that between the original receivers and the SL message.
- (Nida 1975)
5 Can computers translate?
- Not a simple yes or no: it depends on the purpose of the translation and the required quality.
6 Classical problems with MT
- unrealistic expectations
- bad translations
- difficulties in integrating MT in the work flow
- the Ericsson case
7 Feasibility of machine translation
- quality in relation to purpose
- control of the source language
- human-machine interaction
- re-use of translations
- evaluation
8 Quality
- publishing quality
- editing quality
- browsing quality
9 Translation-related tasks
- translation
- browsing
- gisting
- drafting
- message dissemination
- cross-language information searches
- cross-language interchanges
10 MT as a cross-language communication tool
- MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001)
- Hutchins, J. (2001), Towards a new vision for MT, introductory speech at the MT Summit VIII conference, 18-22 September 2001 (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.pdf)
11 Control of the source language
- spell-checked and grammar-checked SL
- sublanguage
- domain
- text type
- controlled language
12 Spell checking and grammar checking
- If there are spelling errors or typos in the SL, dictionary search will fail.
- If there are grammatical errors in the SL, grammatical analysis will fail.
- Where and how should spell and grammar checking be accounted for: before or during the process?
13 Controlled language
- consistent authoring of source texts
- reduction of ambiguity
- full linguistic coverage
- controlled vocabulary
- full lexical coverage
- controlled grammar
- full grammatical coverage
- controlled language checking
- e.g. Scania Checker
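Controlled language checking compares each source sentence against an approved vocabulary and a set of grammar restrictions. The sketch below is a minimal illustration only, not the Scania Checker: the word list `APPROVED_WORDS`, the 20-word limit `MAX_WORDS` and the function `check_sentence` are hypothetical, and only vocabulary and sentence length are checked (no grammar).

```python
import re

# Hypothetical controlled vocabulary and sentence-length limit (illustration only).
APPROVED_WORDS = {"fill", "the", "gearbox", "with", "oil", "tilt", "cab", "check", "level"}
MAX_WORDS = 20


def check_sentence(sentence: str) -> list[str]:
    """Return the controlled-language violations found in one sentence."""
    problems = []
    # Letters only (Unicode-aware, so Swedish å, ä, ö would also be matched).
    words = re.findall(r"[^\W\d_]+", sentence.lower())
    if len(words) > MAX_WORDS:
        problems.append(f"sentence too long ({len(words)} words > {MAX_WORDS})")
    for word in words:
        if word not in APPROVED_WORDS:
            problems.append(f"unapproved word: {word!r}")
    return problems


print(check_sentence("Fill the gearbox with oil."))  # []
print(check_sentence("Top up the gearbox with lubricant."))
```

A real checker would also need a controlled grammar and full lexical coverage of the domain, so that every flagged word can be matched to an approved alternative.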
14 Ex. of controlled languages
- Simplified English
- KANT controlled English
- Scania Swedish
- Scania checker
15 Human intervention
- before
- language checking
- during
- e.g. ambiguity resolution
- after
- post-editing
16 Re-use of translations
- translation memories
- translation dictionaries incl. terminologies
- lexicalistic translation
- statistical machine translation
- example-based translation
17 Evaluation of MT
- human
- automatic
- using a gold standard
- coverage (recall)
- quality (precision)
- global similarity measures
- combination of recall and precision
- BLEU, NIST
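With a gold standard, coverage and quality can be read as recall and precision of the candidate's words, and BLEU combines n-gram precisions (up to 4-grams) with a brevity penalty that stands in for recall. The sketch below is a simplified, single-reference, sentence-level version for illustration; standard BLEU is computed over a whole test corpus, and NIST (which weights n-grams by informativeness) is not shown. The function names and the candidate sentence are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def precision_recall(candidate, reference):
    """Unigram precision (quality) and recall (coverage) against one gold standard."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    return overlap / len(candidate), overlap / len(reference)


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: one reference, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        clipped = sum((cand & ngrams(reference, n)).values())  # clipped n-gram matches
        if total == 0 or clipped == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "ultimately the fight for full employment concerns the cohesion of swedish society".split()
candidate = "ultimately the fight for employment concerns the cohesion of sweden".split()
print(precision_recall(candidate, reference))  # (0.9, 0.75)
print(round(bleu(candidate, reference), 3))    # 0.539
```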
18 Why machine translation?
- cheaper
- faster
- more consistent
- when it succeeds
19 What is MT proper?
- To be considered as MT, a system should provide
- minimally correct morphology
- minimal syntactic processing
- minimal semantic processing
- handle and produce full sentences
- Hutchins, J. (2000), The IAMT Certification initiative and defining translation system categories (http://nl.ijs.si/eamt00/proc/Hutchins.pdf)
20 Examples of MT products
- Systran (http://babelfish.altavista.com/)
- Comprendium (based on Metal)
- ProMT (http://www.translate.ru/eng)
- ESTeam
- See further http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-4.pdf, http://www.foreignword.com/Technology/mt/mt.htm
21 Basic strategies
- direct translation
- rule-based translation
- transfer
- interlingua
- example-based translation
- statistical translation
- hybrids
22 Direct translation
- no complete intermediary sentence structure
- translation proceeds in a number of steps, each step dedicated to a specific task
- the most important component is the bilingual dictionary
- typically general language
- problems with
- ambiguity
- inflection
- word order and other structural shifts
23 Simplistic approach
- sentence splitting
- tokenisation
- handling capital letters
- dictionary look-up and lexical substitution, incl. some heuristics for handling ambiguities
- copying unknown words, digits, punctuation marks etc.
- formal editing
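The simplistic steps can be sketched as a small pipeline. The sketch assumes a toy Swedish-English dictionary (`DICTIONARY`, with hypothetical entries) and resolves ambiguity with the crudest possible heuristic, always taking the first listed sense. Applied to a sentence from the Scania corpus, it reproduces the typical weaknesses of the strategy: a wrong sense for the particle, a wrong preposition and source word order.

```python
import re

# Toy Swedish-English dictionary with hypothetical entries; ambiguous
# words list several senses.
DICTIONARY = {
    "fyll": ["fill"],
    "på": ["on", "at"],
    "olja": ["oil"],
    "i": ["in", "into"],
    "växellådan": ["the gearbox"],
}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by white space."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Words (incl. å, ä, ö), numbers and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def translate_token(token):
    senses = DICTIONARY.get(token.lower())  # neutralise capitals for look-up
    if senses is None:
        return token      # copy unknown words, digits and punctuation
    return senses[0]      # naive heuristic: always take the first sense


def formal_editing(words):
    text = " ".join(words)
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)  # no space before punctuation
    return text[:1].upper() + text[1:]            # sentence-initial capital


def translate(text):
    return " ".join(formal_editing([translate_token(t) for t in tokenise(s)])
                    for s in split_sentences(text))


print(translate("Fyll på olja i växellådan."))  # Fill on oil in the gearbox.
```

Compare the human translation "Fill gearbox with oil.": fixing it requires recognising fyll på as a particle verb and reorganising the object and adverbial, which goes beyond word-by-word substitution.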
24 Advanced classical approach (Tucker 1987)
- source text dictionary look-up and morphological analysis
- identification of homographs
- identification of compound nouns
- identification of noun and verb phrases
- processing of idioms
25 Advanced approach, cont.
- processing of prepositions
- subject-predicate identification
- syntactic ambiguity identification
- synthesis and morphological processing of target text
- rearrangement of words and phrases in target text
26 Feasibility of the direct translation strategy
- Is it possible to carry out the direct
translation steps as suggested by Tucker with
sufficient precision without relying on a
complete sentence structure?
27 Assignment 1: manual direct translation
- Sv. Ytterst handlar kampen för sysselsättning om att hålla samman Sverige. →
- En. Ultimately, the fight for full employment concerns the cohesion of Swedish society.
- (from Statement of Government Policy 1996)
- Define an algorithm and a dictionary (based on Norstedts) for simplistic translation of the example.
- Present the model and the result.
28 Assignment 1, cont.
- Improve the result stepwise in accordance with the advanced direct translation strategy.
- Specify each step carefully and demonstrate its effect on the translation.
- Evaluate and discuss the final result.
- Translate the example using Systran (http://kwic.systran.fr/systran/svdemo) and discuss the differences in an evaluative way.
- Report the assignment and upload it on the web (041001).
29 Current trends in direct translation
- re-use of translations
- translation memories of sentences and sub-sentence units such as words, phrases and larger units
- lexicalistic translation
- example-based translation
- statistical translation
- Will re-use of translations overcome the problems with the direct translation approach that were discussed above?
- If so, how can they be handled?
30 Systran
- System Translation
- developed in the US by Peter Toma
- first version 1969 (Ru-En)
- the EC bought the rights to Systran in 1976
- currently 18 language pairs
- demo version sv-en in 2003 (http://kwic.systran.fr/systran/svdemo)
- http://babelfish.altavista.com/
31 Systran, cont.
- more than 1,600,000 dictionary units
- 20 domain dictionaries
- daily use by EC translators and administrators of the European institutions
- originally a direct translation strategy
- see HS
- today more of a transfer-based strategy
32 Ex. 1: fairly good translation / Systran sv-en
- "Enskilda företagare som inte bildat bolag klassificeras hit."
- "Individual entrepreneurs that have not formed companies are classified here."
- The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right position.
33 Ex. 2: word order problem / Systran sv-en
- "När byarna kontaktades hade de inte ens utsatts för influensa."
- "When the villages were contacted had they not even been exposed to flu."
- The system has not found the subject and predicate and therefore produces the wrong word order.
34 Ex. 3: ambiguity problem / Systran sv-en
- "Vad kan vi lära av Arrawetestammen?"
- "What can we faith of the Arawete?"
- The system does not find the connection between kan and lära and therefore does not see that lära is a verb.
35 Ex. 4: ambiguity problem / Systran sv-en
- "Extrapoleringen går till så här."
- "The extrapolation goes to so here."
- The system does not know the particle verb gå till and therefore translates it incorrectly word for word.
36 Systran Linguistic Resources
- Dictionaries
- POS Definitions
- Inflection Tables
- Decomposition Tables
- Segmentation Dictionaries
- Disambiguation Rules
- Analysis Rules
37 Systran Processing Steps
- Analysis
- Lookup
- Compound Decomposition
- Disambiguation
- Syntactic Analysis
- Compound Expansion
- Sentence Transfer
- Initial Target Structure
- Lookup
- Default Transfer of Attributes
- Structure Transformation
38 Systran Processing Steps (cont.)
- Sentence Synthesis
- Structure Transformation
- Inflection lookup
- Surface Transformation
39 Motivations for transfer-based translation
- lexical ambiguity
- structural differences
- See further Ingo 91
40 Example 1
- Sv. Fyll på olja i växellådan. →
- En. Fill gearbox with oil.
- (from the Scania corpus)
- fyll på → fill
- obj → adv
- adv → obj
41 Example 2
- Sv. I oljefilterhållaren sitter en överströmningsventil. →
- En. The oil filter retainer has an overflow valve.
- (from the Scania corpus)
- sitter → has
- adv → subj
- subj → obj
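Examples 1 and 2 can be read as transfer rules that pair a lexical substitution with a remapping of grammatical relations. The sketch below assumes the clause has already been analysed into a verb lemma plus relation slots filled by noun lemmas; the rule table `TRANSFER_RULES`, the lemma notation `fylla_på` and the noun glossary `NOUNS` are hypothetical, and target word order and prepositions (e.g. the with of Example 1) are left to generation.

```python
# Hypothetical transfer entries: source verb lemma -> (target verb, relation remapping).
TRANSFER_RULES = {
    "fylla_på": ("fill", {"obj": "adv", "adv": "obj"}),  # Example 1
    "sitta": ("have", {"adv": "subj", "subj": "obj"}),   # Example 2
}
# Hypothetical noun glossary for the Scania examples.
NOUNS = {
    "olja": "oil",
    "växellåda": "gearbox",
    "oljefilterhållare": "oil filter retainer",
    "överströmningsventil": "overflow valve",
}


def transfer(clause):
    """Transfer a clause given as {'verb': lemma, relation: noun lemma, ...}."""
    target_verb, relation_map = TRANSFER_RULES[clause["verb"]]
    result = {"verb": target_verb}
    for relation, noun in clause.items():
        if relation == "verb":
            continue
        # Relations without a special mapping keep their label.
        result[relation_map.get(relation, relation)] = NOUNS.get(noun, noun)
    return result


print(transfer({"verb": "fylla_på", "obj": "olja", "adv": "växellåda"}))
print(transfer({"verb": "sitta", "adv": "oljefilterhållare", "subj": "överströmningsventil"}))
```

Because the relation changes are tied to the verb entry, a word-by-word strategy without such relations has no place to express them.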
42 Transfer-based translation
- intermediary sentence structure
- basic processes
- analysis
- transfer
- generation (synthesis)
- language modules
- dictionary and grammar of SL
- transfer dictionary and transfer rules
- dictionary and grammar of TL
43 [Figure: translation strategies between SL and TL: direct translation, transfer (Metal, Multra) and interlingua]
44 Levels of intermediary structure
- cf. JM, Chapter 21
- word order
45 Metal
46 MULTRA
- Multilingual Support for Translation and Writing
- translation engine
- transfer-based
- shake-and-bake
- modular
- unification-based
- preference machinery
- traceable
48 Analysis
- chart parser (Lisp → C)
- procedural formalism
- unification and other kinds of operations
- sentence structure
- feature structure
- grammatical relations
- surface order implicit via grammatical relations
- See further Sågvall Hein and Starbäck (99), Weijnitz (02), Dahllöf (89)
49 Transfer
- unification-based
- declarative formalism
- Multra transfer formalism (Beskow 93)
- lexical and structural rules
- rules are partially ordered
- a more specific rule takes precedence over a less specific one
- specificity in terms of number of transfer equations
- all applicable rules are applied
- written in Prolog
50 Generation
- syntactic generation
- Multra syntactic generation formalism (Beskow 97a)
- PATR-like style
- unification
- concatenation
- typed features
- morphological generation (Beskow 97b)
- lexical insertion rules
- morphological realisation and phonological finish
- written in Prolog
51 An example
- Tippa hytten. (En. Tilt the cab.)
( (PHR.CAT CL
   MODE IMP
   SUBJ 2ND
   VERB (WORD.CAT VERB
         INFF IMP
         DIAT ACT
         LEX TIPPA.VB.1
         VSURF )
   OBJ.DIR (PHR.CAT NP
            NUMB SING
            GENDER UTR
            CASE BASIC
            DEF DEF
            HEAD (LEX HYTT.NN.1
                  WORD.CAT NOUN)))
  REG (V1.LEM TIPPA.VB)
  SEP (WORD.CAT SEP
52 Transfer structure
VERB      WORD.CAT VERB
          LEX TILT.VB.0
          DIAT ACT
          INFF IMP
OBJ.DIR   PHR.CAT NP
          DEF DEF
          NUMB SING
          HEAD  WORD.CAT NOUN
                LEX CAB.NN.0
MODE      IMP
SUBJ      2ND
VSURF
SEP       WORD.CAT SEP
          LEX STOP.SR.0
PHR.CAT   CL
53 Generation
54 A grammar rule
defrule legal.obj <?1 phr.cat> 'np, not <?1 case> 'gen, not <?1 case> 'subj
55 Transfer rules
- copy feature
- delete feature
- transfer feature
- assign feature
56 Copy feature
LABEL mode
SOURCE <mode> ?x1
TARGET <mode> ?x2
TRANSFER
57 Delete feature
LABEL REG
SOURCE <REG> ANY
TARGET <> <>
TRANSFER
58 Transfer feature
LABEL OBJ.DIR
SOURCE <OBJ.DIR> ?x1
TARGET <OBJ.DIR> ?x2
TRANSFER ?x1 <> ?x2
59 Define feature
LABEL trycka.in-press
SOURCE <lex sym> trycka.vbin.ab.1
       <word.cat> VERB
TARGET <lex> press.vb.1
       <word.cat> VERB
TRANSFER
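The four rule types can be illustrated on the analysis of Tippa hytten. The sketch below is deterministic Python, not Multra's unification-based Prolog formalism: feature structures are nested dicts, and the rule tables `COPY`, `DELETE`, `TRANSFER` and `LEXICON` are hypothetical, chosen so that the output matches the transfer structure given for this example (VSURF and SEP are left out).

```python
# Hypothetical rule tables, one per rule type.
COPY = {"PHR.CAT", "WORD.CAT", "MODE", "SUBJ", "INFF", "DIAT", "NUMB", "DEF"}
DELETE = {"REG", "GENDER", "CASE"}
TRANSFER = {"VERB", "OBJ.DIR", "HEAD"}                          # recurse into the value
LEXICON = {"TIPPA.VB.1": "TILT.VB.0", "HYTT.NN.1": "CAB.NN.0"}  # define a new LEX value


def transfer_fs(source):
    """Map a source feature structure (nested dicts) to a target structure."""
    target = {}
    for feature, value in source.items():
        if feature in DELETE:
            continue
        if feature in TRANSFER:
            target[feature] = transfer_fs(value)
        elif feature == "LEX":
            target[feature] = LEXICON[value]
        elif feature in COPY:
            target[feature] = value
        else:
            raise KeyError(f"no transfer rule for feature {feature}")
    return target


ANALYSIS = {  # Tippa hytten. (VSURF and SEP omitted)
    "PHR.CAT": "CL", "MODE": "IMP", "SUBJ": "2ND",
    "VERB": {"WORD.CAT": "VERB", "INFF": "IMP", "DIAT": "ACT", "LEX": "TIPPA.VB.1"},
    "OBJ.DIR": {"PHR.CAT": "NP", "NUMB": "SING", "GENDER": "UTR", "CASE": "BASIC",
                "DEF": "DEF", "HEAD": {"LEX": "HYTT.NN.1", "WORD.CAT": "NOUN"}},
    "REG": {"V1.LEM": "TIPPA.VB"},
}
print(transfer_fs(ANALYSIS))
```

Note how GENDER, CASE and REG disappear (they only matter for Swedish), while the lexical rules replace the LEX values.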
60 A generation rule
LABEL CL.IMP
X1 ---> X2 X3 X4
<X1 PHR.CAT> CL
<X1 VERB> <X2>
<X1 TYPE> IMP
<X1 OBJ.DIR> <X3>
<X1 SEP> <X4>
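A rule such as CL.IMP builds the surface string by concatenating the daughters X2 X3 X4 (verb, direct object, separator). A minimal sketch, assuming the transfer structure for Tippa hytten as a nested dict and a hypothetical base-form table `BASE_FORMS`; it tests MODE, the feature found in the transfer structure, and morphology is trivial here because the imperative verb and the singular noun both surface in their base form.

```python
# Hypothetical lexical insertion table: lexeme -> base form.
BASE_FORMS = {"TILT.VB.0": "tilt", "CAB.NN.0": "cab", "STOP.SR.0": "."}


def generate(fs):
    """Return the list of surface words for a target feature structure."""
    cat = fs.get("PHR.CAT", fs.get("WORD.CAT"))
    if cat == "CL" and fs.get("MODE") == "IMP":
        # cf. CL.IMP: X1 ---> X2 X3 X4 (verb, direct object, separator)
        return generate(fs["VERB"]) + generate(fs["OBJ.DIR"]) + generate(fs["SEP"])
    if cat == "NP":
        article = ["the"] if fs.get("DEF") == "DEF" else ["a"]
        return article + generate(fs["HEAD"])
    # VERB, NOUN, SEP: lexical insertion of the base form.
    return [BASE_FORMS[fs["LEX"]]]


def realise(words):
    """Concatenate words, attach the full stop and capitalise the first letter."""
    text = " ".join(words).replace(" .", ".")
    return text[:1].upper() + text[1:]


TRANSFER_STRUCTURE = {
    "PHR.CAT": "CL", "MODE": "IMP", "SUBJ": "2ND",
    "VERB": {"WORD.CAT": "VERB", "LEX": "TILT.VB.0", "DIAT": "ACT", "INFF": "IMP"},
    "OBJ.DIR": {"PHR.CAT": "NP", "DEF": "DEF", "NUMB": "SING",
                "HEAD": {"WORD.CAT": "NOUN", "LEX": "CAB.NN.0"}},
    "SEP": {"WORD.CAT": "SEP", "LEX": "STOP.SR.0"},
}
print(realise(generate(TRANSFER_STRUCTURE)))  # Tilt the cab.
```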
61 A contextual lexical rule
LABEL tänka.på-think.about
SOURCE <verb lex sym> tänka.vb.1
       <obj.prep phr.cat> pp
       <obj.prep prep> ?prep
       <obj.prep prep lex sym> på.pp.1
       <obj.prep rect> ?rect1
TARGET <obj.prep phr.cat> pp
       <obj.prep prep word.cat> PREP
       <obj.prep prep lex> about.pp.1
       <obj.prep rect> ?rect2
TRANSFER ?rect1 <> ?rect2
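The interplay between a contextual lexical rule and rule specificity can be sketched under one simple reading of the principles listed for Multra transfer: every applicable rule is applied, rules are applied from least to most specific, and so the more specific rule's values win. Specificity is approximated by the number of source conditions, standing in for the number of transfer equations. The rule set and the flattened path notation (e.g. `obj.prep.prep`) are hypothetical; Multra itself uses partially ordered unification rules written in Prolog.

```python
# Hypothetical lexical rules: (label, source conditions, target values).
RULES = [
    ("tänka-think", {"verb": "tänka.vb.1"}, {"verb": "think.vb.1"}),
    ("på-on", {"obj.prep.prep": "på.pp.1"}, {"obj.prep.prep": "on.pp.1"}),
    ("tänka.på-think.about",
     {"verb": "tänka.vb.1", "obj.prep.prep": "på.pp.1"},
     {"verb": "think.vb.1", "obj.prep.prep": "about.pp.1"}),
]


def apply_rules(source):
    """Apply all matching rules, least specific first, so specific ones win."""
    applicable = [rule for rule in RULES
                  if all(source.get(path) == value for path, value in rule[1].items())]
    applicable.sort(key=lambda rule: len(rule[1]))  # specificity = number of conditions
    target = {}
    for _label, _conditions, values in applicable:
        target.update(values)
    return target, [label for label, _, _ in applicable]


print(apply_rules({"verb": "tänka.vb.1", "obj.prep.prep": "på.pp.1"}))
```

Without the contextual rule, tänka på would come out as think on; the more specific rule overrides the general preposition rule only in the context of tänka.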
62 A generation trace
- 1-Applying Rule cl-sep
- 1- Applying Rule cl.imp
- 1- Applying Rule subj2nd-verb-obj.dir
- 1- Applying Rule verb.main.act
- 1- Applying Rule np.the-df
- 1- Applying Rule ng.noun-def
- 1-Success!
63 Language resources in the MATS system
- dictionary in a database with different views
- analysis grammar
- transfer grammar
- incl. contextually defined lexical rules
- generation grammar
64 sv-en_LinkLexicon
65 en_Inflections
66 en_LemmaLexicon
67 en_LexemeLexicon
68 en_Lexicon
69 en_StemLexicon
70 sv_Inflections
71 sv_LemmaLexicon
72 sv_LexemeLexicon
73 sv_Lexicon
74 sv_StemLexicon
75 The MATS system
76 Assignment 2: Working with MATS
- http://stp.ling.uu.se/evapet/mt04/assignment2.html
77 Lexicalistic translation
- Identify (lexical) translation units in the source sentence.
- Translate each unit separately (considering the context).
- Order the result in agreement with a model of the target language.
- Formulation due to Lars Ahrenberg; see further AH (reading list); see also Beaven, J. L. (1992), Shake-and-Bake Machine Translation, COLING-92, Nantes, 23-28 August 1992.
78 T4F: a lexicalistic system
- processes in T4F
- tokenisation
- tagging
- transfer
- transposition
- filtering
- See further AH (in the reading list)
79 Interlingua translation
83 Applications of alignment
- translation memories
- translation dictionaries
- lexicalistic translation
- statistical machine translation
- example-based translation
84 Translation memories
- based on sentence links
- optionally, sub-sentence links
- See further Macklovitch, E. (2000)
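A translation memory look-up returns the stored translation of the most similar earlier sentence. A minimal sketch, assuming a hypothetical in-memory list of sentence links (`MEMORY`), word-level similarity from Python's `difflib.SequenceMatcher`, and an arbitrary threshold of 0.7; commercial tools use their own fuzzy-match scores.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory built from sentence links (sv -> en).
MEMORY = [
    ("Fyll på olja i växellådan.", "Fill gearbox with oil."),
    ("Tippa hytten.", "Tilt the cab."),
]


def lookup(sentence, threshold=0.7):
    """Return (score, source, target) for the best match at or above threshold."""
    words = sentence.lower().split()
    best = None
    for source, target in MEMORY:
        score = SequenceMatcher(None, words, source.lower().split()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    return best if best is not None and best[0] >= threshold else None


print(lookup("Fyll på olja i motorn."))  # (0.8, 'Fyll på olja i växellådan.', 'Fill gearbox with oil.')
```

A fuzzy match like this one still needs post-editing (motorn is not växellådan); sub-sentence links would be needed to repair the differing part automatically.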
85 Translation dictionaries
- based on word links
- refinement of word links
86 Refinement of word alignment data
- neutralise capital letters where appropriate
- lemmatise or tag source and target units
- identify ambiguities
- search for criteria to resolve them
- identify partial links
- compounds?
- remove or complete them
- manual revision?
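The first refinement steps can be sketched on a handful of word links. The raw links `RAW_LINKS` and the lemma table `LEMMAS` below are hypothetical; capitals are neutralised everywhere (a real system would keep them for proper names), only the source side is lemmatised, and an ambiguity is simply a source lemma linked to more than one target.

```python
from collections import Counter, defaultdict

# Hypothetical raw word links (sv, en) and a toy Swedish lemma table.
RAW_LINKS = [("Motorn", "motor"), ("motorn", "Motor"), ("motorn", "engine"), ("Oljan", "oil")]
LEMMAS = {"motorn": "motor", "oljan": "olja"}


def refine(links):
    """Neutralise capitals, lemmatise the source side and find ambiguous links."""
    counts = defaultdict(Counter)
    for source, target in links:
        source, target = source.lower(), target.lower()
        counts[LEMMAS.get(source, source)][target] += 1
    ambiguous = {s: dict(targets) for s, targets in counts.items() if len(targets) > 1}
    return dict(counts), ambiguous


counts, ambiguous = refine(RAW_LINKS)
print(ambiguous)  # {'motor': {'motor': 2, 'engine': 1}}
```

The remaining steps, searching for criteria that resolve such ambiguities (e.g. the context of torkarmotor versus förbränningsmotor) and handling partial links, need more than link counts.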
87 Informally about statistical MT
- build a translation dictionary based on word alignment
- aim for as big fragments as possible
- keep information on link frequency
- build an n-gram model of the target language
- implement a direct translation strategy
- including alternatives ordered by length and frequency
- process the output with the n-gram model, filtering out the best alternatives, and adjust the translation accordingly
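The informal recipe can be sketched with single-word links only: relative link frequencies give a translation table, an add-one smoothed bigram model scores target word sequences, and a direct word-by-word strategy generates all alternatives, which are then filtered by the combined score. The links `LINKS`, the corpus `TARGET_CORPUS` and all function names are hypothetical; larger fragments and ordering alternatives by length are not modelled.

```python
import itertools
import math
from collections import Counter, defaultdict

# Hypothetical word links (source, target) and a tiny target-language corpus.
LINKS = [("byt", "change"), ("byt", "change"), ("byt", "replace"),
         ("olja", "oil"), ("olja", "oils")]
TARGET_CORPUS = ["change oil", "change the oil", "replace oil filter"]


def translation_table(links):
    """Relative link frequencies, read as P(target | source)."""
    counts = defaultdict(Counter)
    for source, target in links:
        counts[source][target] += 1
    return {s: {t: n / sum(c.values()) for t, n in c.items()} for s, c in counts.items()}


def bigram_model(sentences):
    """Return an add-one smoothed bigram log-probability function."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    size = len(vocab)

    def logprob(words):
        tokens = ["<s>"] + words + ["</s>"]
        return sum(math.log((bigrams[a, b] + 1) / (histories[a] + size))
                   for a, b in zip(tokens, tokens[1:]))

    return logprob


def translate(sentence, table, lm):
    """Word-by-word alternatives, filtered by link frequency plus the n-gram model."""
    # Unknown words are copied, as in direct translation.
    options = [list(table.get(w, {w: 1.0}).items()) for w in sentence.split()]
    best_score, best_words = -math.inf, []
    for choice in itertools.product(*options):
        words = [target for target, _ in choice]
        score = sum(math.log(p) for _, p in choice) + lm(words)
        if score > best_score:
            best_score, best_words = score, words
    return " ".join(best_words)


table = translation_table(LINKS)
lm = bigram_model(TARGET_CORPUS)
print(translate("byt olja", table, lm))  # change oil
```

Here oil and oils are equally frequent links for olja, so it is the target n-gram model that decides between them.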
88 Example-based MT
89 Some current research topics
- intersentential dependencies
- hybrid systems: data-driven and rule-driven
- improved alignment techniques
- improved language modeling in ST
- automatic learning from post-editing
- translation by structural correspondences
- translation of spoken language
- improved preference strategies
- ambiguity preserving translation
90 Intersentential dependencies
- pronoun resolution
- lexical ambiguity resolution, such as
- (torkar)motorn → the motor
- (förbrännings)motorn → the engine
- fluency
91 Preserving the information structure
- information structure is expressed in different ways in the source and the target
- syntactic clues are exploited in the analysis to compute the information structure (topic-focus articulation)
- information structure is used to guide the generation
92 An example
- Sv. Torkarmotorn M2 är sammankopplad med omkopplare S24 och intervallrelä R22. För att inte motorn skall överbelastas, t.ex. om torkarbladen fastnat, finns en inbyggd termovakt som bryter strömmen till motorn när
- En. Wiper motor M2 is connected to switch S24 and intermittent relay R22. To prevent motor overload, e.g. if the wiper blade gets stuck, there is an integral thermal sensor which breaks the current to the motor when
93 Preferences
- syntactic preferences
- the principle of right association
- the principle of minimal attachment
- two-stage processing
- semantic preferences
- lexical selectional restrictions
- lexical contextual rules
- conceptual taxonomies
- likelihood of occurrence
- See further Bennett, P. and Paggio, P. (1993), Preference in Eurotra.
94 Preferences in Multra
- parsing
- a formalism for expressing syntactic preferences in the parse
- not fully developed
- transfer
- contextual lexical rules
- rule specificity
- generation
- rule specificity
95 Hybrid systems
- aims
- components
- problems
- architecture
- scores
96 Aims of a hybrid system
- simple techniques for simple tasks
- complex techniques for complex tasks
97 Components of a hybrid system
- component strategies
- translation memory
- full sentences
- fragments
- direct translation
- statistical translation
- EBMT
98 Component strategies, contd
- rule-based translation
- simplistic analysis (cf. direct translation)
- word by word (S → sequence of words)
- phrase by phrase (S → sequence of phrases)
- partial parsing
- full parsing
99 Problems of a hybrid system
- How does the system know when a simple technique is appropriate?
- Does the source tell?
- Does the target tell?
100 Architecture and scores
- simple first?
- concerting results?
- scoring?
101 Improved techniques for re-use of translations
- combining clues for word alignment (Tiedemann 2003)
- interactive word alignment (Ahrenberg et al. 2003)
- parallel treebanks
102 Translation by structural correspondences
103 Translation of spoken language
- See Krauwer, S. (ed.) (2000), Machine Translation, Volume 15, Issue 1-2, June 2000, special issue on Spoken Language Translation.