Title: Making machine translation work


1
Making machine translation work
  • By Stefan, Simon, Lisa, Nina and Dennis

2
Making machine translation work
  • Introduction
  • Human versus Machine Translation
  • Methods in Machine Translation
  • Example-Based Machine Translation

3
Making machine translation work
  • Group work: HT vs. MT
  • Try to translate the following proverb:
  • → Wer A sagt, muss auch B sagen.
  • HT: use your language knowledge
  • MT: use Babel Fish (http://babelfish.altavista.com/tr)

4
Making machine translation work
  • Possible solution
  • HT: In for a penny, in for a pound.
  • MT: Who says A, also B must say.
  • To what extent is such a translation suitable/appropriate?

5
Human and Machine Translation
  • HT and MT differ in two main points
  • 1. Mode of process
  • 2. Mode of product
  • based on different specifications and theoretical
    positions
  • both modes are used for comparison

6
Human and Machine Translation
  • Mode of process
  • By comparing the modes of process you
  • gain knowledge about the respective stages and
    intersections
  • can make decisions about choices of alternative
    methods
  • and about new designs of translation methods

7
Human and Machine Translation
  • Mode of product
  • By comparing the modes of product you
  • check the appropriateness of the translation
  • figure out the most efficient method
  • → the MT product must be usable in the same way
    as the human product
  • → secure a basis of equality

8
Human and Machine Translation
  • Another criterion for comparison
  • text input must be a constant so that the
    products are comparable
  • → helps to formulate guidelines for HT or MT texts

9
Human and Machine Translation - translation
processes -
  • Translation as problem solving

10
Human and Machine Translation - translation
processes -
  • Four major steps
  • (a) SL linguistic de-composition
  • (b) Problem identification at the SL linguistic
    and cognitive level
  • (c) Problem solution at the cognitive and TL
    linguistic level (knowledge base)
  • (d) TL linguistic re-composition

11
Human and Machine Translation - translation
processes -
  • Characteristics of HT
  • Knowledge base is flexible
  • Problems can be transferred
  • Intuition/experience of the translator
  • Knowledge base expands constantly

12
Human and Machine Translation - translation
processes -
  • MT model of problem solving

13
Human and Machine Translation - translation
processes -
  • Characteristics of MT
  • Knowledge base is relatively limited and rigid
  • Has fixed and pre-established connections
  • Limited possibility of transferring problems
  • less semantic and pragmatic level experience
  • Lack of essential world-knowledge

14
Human and Machine Translation - translation
processes
  • Major levels of comparison (human module vs. machine module)
  • Comprehension vs. Analysis
  • Matching vs. Transfer
  • Writing vs. Generation/Synthesis

15
Human and Machine Translation - translation
processes
  • Comprehension vs. Analysis

  • Human: adapts to innovations; high amount of interpretative
    capacity; inferencing
  • Machine: works retrospectively; limited amount of
    interpretative capacity

16
Human and Machine Translation - translation
processes
  • Matching vs. Transfer

  • Human: compensation of items which cannot be matched in
    conventional ways
  • Machine: equivalents cannot be pre-planned or incorporated

17
Human and Machine Translation - translation
processes
  • Writing vs. Generation/Synthesis

  • Human: can respond to syntactic or lexical innovations or
    deviations; can create equivalences
  • Machine: works prospectively

18
Human and Machine Translation - translation
products -
  • Products can be compared with regard to
  • the nature of the output language
  • the produced text

19
Human and Machine Translation - translation
products -
  • The nature of MT language
  • MT language is constructed and artificial (the
    computer can't produce sentences on its own)
  • it corresponds to the designer's perception of SL
    and TL
  • has no creative potential (it is not as flexible
    and multifunctional as HT language)
  • it excludes emotive, aesthetic or other meanings
  • → each MT system produces its own language (e.g.
    Weidner English or Atlas English)

20
Human and Machine Translation - translation
products -
  • The nature of MT language
  • MT systems are one-way converters (they only
    recognize words that belong to the system)
  • MT language often needs post-editing

21
Human and Machine Translation - translation
products -
  • Flexibility vs. rigidity in text types
  • MT language is conceived on the sentence level
  • → no distinction between text types is possible
  • → MT systems can only handle text types they have
    been programmed for
  • → unknown text types cause unacceptable output

22
Human and Machine Translation - translation
products -
23
Human and Machine Translation - translation
products -
  • Challenge for MT language
  • construction of user-friendly artificial language
  • optimum transfer of information from SL/NL to AL
  • to convince users that AL is as efficient as
    NL

24
The Pragmatic Circumstances of Automation in
Translation
  • Methods of MT
  • Linguistic approach
  • Semantic approach
  • Users of MT systems
  • Some MT systems
  • Functional types of MT

25
Methods of MT: Linguistic approach
  • three strategies
  • Analysis of the source text
  • Mode of transfer
  • Generation of target text

26
Linguistic approach: Three main subtypes
  • a) Language-pair-specific direct systems
  • Earliest type of system
  • Reflects the design philosophy of the 1950s and
    1960s
  • Exploited direct correspondences between two
    languages
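
The word-for-word idea behind direct systems can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny LEXICON and the function direct_translate are invented for this example and do not describe any real MT system. The output keeps the German word order, which is why such translations read oddly.

```python
# Hypothetical toy lexicon (German -> English), invented for illustration.
LEXICON = {
    "wer": "who", "a": "A", "sagt": "says", "muss": "must",
    "auch": "also", "b": "B", "sagen": "say",
}


def direct_translate(sentence: str) -> str:
    """Swap every source word for its lexicon entry, keeping source word order."""
    # Drop the final full stop and separate commas from words.
    tokens = sentence.rstrip(".").replace(",", " ,").split()
    # Unknown words are passed through unchanged.
    words = [t if t == "," else LEXICON.get(t.lower(), t) for t in tokens]
    text = " ".join(words).replace(" ,", ",")
    return text[0].upper() + text[1:] + "."


print(direct_translate("Wer A sagt, muss auch B sagen."))
# Who A says, must also B say.
```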

27
Linguistic approach: Three main subtypes
  • b) Interlingual systems
  • SL text is transformed into a semantic and syntactic
    representation (equivalent of the transfer phase)
    which is common to at least two languages
  • Text in another language can be generated
    from this representation
  • "transform from a source language A into a
    target language B, using rules expressed in a
    third language C" (Cherry, 1966)
  • Two phases: 1. Analysis in terms of the
    interlingual representation 2. TL sentences are
    produced from this representation.

28
Linguistic approach: Three main subtypes
  • c) Transfer systems
  • Analysis phase: SL text is processed to the depth
    required by the rules of its grammar
  • Transfer phase: the result of analysis is transformed
    into a representation, based on the target language,
    for the generation of a target language text
  • Generation phase: the transfer representation is
    then transformed into a text in the TL without
    any further back-reference to the results of
    analysis.
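
The three phases of a transfer system can be sketched as three small functions. This is a toy under stated assumptions: the Clause structure, the three-entry LEXICON and the function names are hypothetical, and the "grammar" only knows that a German subordinate clause is verb-final while English puts the verb second.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject: str
    obj: str
    verb: str


# Hypothetical transfer lexicon (German -> English).
LEXICON = {"wer": "who", "a": "A", "sagt": "says"}


def analyse(german_clause: str) -> Clause:
    """Analysis: read a verb-final German clause as subject, object, verb."""
    subject, obj, verb = german_clause.split()
    return Clause(subject, obj, verb)


def transfer(clause: Clause) -> Clause:
    """Transfer: replace each lexical item by its target-language equivalent."""
    return Clause(*(LEXICON[w.lower()] for w in (clause.subject, clause.obj, clause.verb)))


def generate(clause: Clause) -> str:
    """Generation: emit English subject-verb-object order."""
    return f"{clause.subject} {clause.verb} {clause.obj}"


print(generate(transfer(analyse("Wer A sagt"))))
# who says A
```

Generation works only from the transferred structure and never looks back at the German string, which mirrors the "no back-reference" point above.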

29
The semantic approach
  • Semantic processes only operate after the
    identification of syntactic structures.
  • Chief component is semantic parsing, i.e.
    analysis of semantic features instead of, or in
    addition to, grammatical categories.
  • The system does understand the SL text, before
    translation begins.

30
Users of MT systems
  • The translator as producer
  • Machine to provide cheaper, faster and a larger
    volume of production, without significant loss of
    quality
  • Clearly seen as an industry product

31
Users of MT systems
  • The writer as translation producer
  • Writers gain a certain degree of independence
    from translators, who exclusively determined form
    and quality of the end product
  • Writers may want to develop bi- or multilingual
    texts directly rather than write a text for
    subsequent translation

32
Users of MT systems
  • Readers of translation
  • to be able to by-pass the time-consuming and
    costly human translation circuit, and instead
    obtain instant translations produced by an MT
    system

33
Users of MT systems
  • The information supplier
  • possibilities of providing translated versions
    automatically as part of the general information
    supply, e.g. multilingual versions of electronic
    journals or databases

34
Some MT systems
  • ATLAS
  • Japanese system, based on structural transfer,
    for specialised technical texts
  • CULT
  • Interactive system, for on-line translation of
    texts in the field of mathematics from Chinese
    into English

35
Some MT systems
  • METEO
  • The Canadian Federal Government system for the
    production of bilingual French-English weather
    reports
  • SYSTRAN
  • Oldest commercially available MT system; used for
    un-edited output, for post-editing, for
    restricted-language document input and for
    general use in the French Minitel system
  • Largest number of language pairs, all EC languages

36
Functional types of machine translation
  • Two possible modes of viewing automatic
    translation
  • See the computer as an aid to human translation
  • Accept that the computer provides a translation
    service sui generis which is not comparable to
    the human variety

37
MT as human translation aid
  • MT as an aid to translators
  • Intended to accelerate the human process of
    translation
  • Output is artificial to the extent that it does
    not conform to certain expectations
  • End user still wants a human product, but will
    accept MT as long as it is either cheaper or
    produced more quickly

38
MT as human translation aid
  • Systems are greatly improved by concentrating on
    particular text types and ranges of vocabulary
  • Systems offer subject-specific modules of
    vocabulary and phraseology that can be switched
    into the process

39
Machine assisted human translation
40
Machine assisted human translation
  • Check text against an automated dictionary
  • Ignores common words and function words
  • Looks up translation equivalents for special
    vocabulary items
  • Speed up the process
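
A dictionary check of this kind might look as follows. The sketch is hypothetical: STOPWORDS and GLOSSARY are small invented sets, and lookup_terms is a name chosen for this example. It skips common and function words and returns translation equivalents only for the special vocabulary it knows.

```python
import re

# Hypothetical word lists, invented for illustration.
STOPWORDS = {"the", "is", "a", "of", "and", "in", "to", "on"}
GLOSSARY = {"plate": "Teller", "apple": "Apfel", "table": "Tisch"}


def lookup_terms(text):
    """Return {term: equivalent} for glossary terms found in the text."""
    found = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOPWORDS:
            continue  # common and function words are ignored
        if word in GLOSSARY:
            found[word] = GLOSSARY[word]
    return found


print(lookup_terms("The apple is on the table."))
# {'apple': 'Apfel', 'table': 'Tisch'}
```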

41
Machine assisted human translation
42
Machine assisted human translation
  • Text is pre-translated automatically
  • Output not adequate for direct use or
    post-editing
  • Offers words and expressions
  • Translators reduce the time for dictionary look-up
  • Saves the time of actually typing the found
    translation equivalents

43
Machine assisted human translation
44
Machine assisted human translation
  • MT produces artificial language (AL2)
  • Post-editing effort must be less than that
    required for a full human translation

45
Machine assisted human translation: Three-stage
machine assistance
46
Machine assisted human translation
  • Text is prepared for MT by human pre-editing
  • System produces output in AL2 which post-editors
    can convert into a NL2 document
  • Final document is not distinguishable from a
    human translation

47
Machine assisted translation
  • These models of MT hide the true nature of MT
  • Rather an aid than an alternative to human
    translation
  • Application is limited
  • simplest and the most difficult types of MT
    systems to design
  • Examples: ALPS, ATLAS, WEIDNER, SYSTRAN

48
Translation by reference to existing models
  • System scans existing documents using a
    text-deconstruction method of text comparison
  • Identifies similar passages and offers these to
    the translator as models for the new task

49
MT as text-type specific independent systems
  • automatic in the sense that human intervention
    is not required between input and output
  • Is used
  • Without the intervention of a human translator
  • As a text-production system for previously edited

50
MT as text-type specific independent systems
  • Three forms of output
  • Raw translation in AL2 suitable for post-editing
    and possible conversion to NL2
  • A final AL2 version which can be used almost in the
    same way as natural language text, if the input has been
    pre-edited
  • Unedited final translation, i.e. an artificial
    language, which is acceptable for readers

51
Reader-oriented MT
  • Readers accept difficult-to-read texts if they
    are cheap and above all fast

52
Reader-oriented MT
  • Output is machine-produced and therefore by
    definition an artificial product which may be
    easier or more difficult to understand than a NL
    text
  • Not comparable to a human translation
  • L2 readers receive a text in L1
  • They submit the text to MT in full knowledge that the
    output is a machine-translated text

53
Writer-oriented MT
  • Writers know better than anybody else what they
    want to say
  • Translators have to interpret what writers have
    said
  • Machine asks questions about elements which it
    cannot analyse

54
Writer-oriented MT
55
Writer oriented editing of pre-translated text
  • System offers menus of existing SL text segments
    which are pre-translated
  • E.g. business letters: choice of type of letter,
    separate menus within the types

56
EBMT
57
Definition
  • "Man does not translate a simple sentence by
    doing deep linguistic analysis, rather, man does
    translation, first, by properly decomposing an
    input sentence into certain fragmental phrases [...],
    then by translating these phrases into other language
    phrases, and finally by properly composing these
    fragmental translations into one long sentence. The
    translation of each fragmental phrase will be
    done by the analogy translation principle with
    proper examples as its reference." (Nagao)

58
Model EBMT
59
  • EBMT does not presuppose an analytic translation
  • → it is an analogy-based translation system
  • Founded on
  • 1) translation by decomposing
  • 2) translation of phrases
  • 3) composing fragments into a long sentence

60
  • EBMT is built on a bilingual corpus
  • 1) a corpus of sentence pairs sharing a fixed part
    ("How much is")
  • Example: How much is the bread? → Wie teuer ist
    das Brot?
  • How much is the car? → Wie teuer ist das Auto?
  • → question varies by just one element (minimal
    pair: the bread/the car)
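
How a minimal pair can be turned into a reusable pattern is sketched below. The function names (split_words, generalise) and the slot marker X are assumptions made for this illustration, not part of any described system. The sketch compares the two source sentences and the two target sentences word by word and replaces the single differing position with a slot.

```python
def split_words(sentence):
    """Tokenise on whitespace after dropping the final question mark."""
    return sentence.rstrip("?").split()


def generalise(pair_a, pair_b):
    """Turn two sentence pairs that differ in one word into templates plus a lexicon."""
    templates, variants = [], []
    # First pass compares the two source sentences, second pass the two targets.
    for sent_a, sent_b in zip(pair_a, pair_b):
        words_a, words_b = split_words(sent_a), split_words(sent_b)
        diffs = [i for i, (x, y) in enumerate(zip(words_a, words_b)) if x != y]
        if len(words_a) != len(words_b) or len(diffs) != 1:
            raise ValueError("not a minimal pair")
        i = diffs[0]
        templates.append(" ".join(words_a[:i] + ["X"] + words_a[i + 1:]) + "?")
        variants.append((words_a[i], words_b[i]))
    (src_a, src_b), (tgt_a, tgt_b) = variants
    return templates[0], templates[1], {src_a: tgt_a, src_b: tgt_b}


result = generalise(
    ("How much is the bread?", "Wie teuer ist das Brot?"),
    ("How much is the car?", "Wie teuer ist das Auto?"),
)
print(result)
# ('How much is the X?', 'Wie teuer ist das X?', {'bread': 'Brot', 'car': 'Auto'})
```

The template only works here because both nouns take "das"; a noun with a different gender would need the article adjusted as well.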

61
  • Often linked with translation memory (TM)
  • "it must in fact be possible to produce a
    programme, which would enable the word processor
    to remember whether any part of a new text
    typed into it had already been translated."
  • → compare T9 typing on mobile phones
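
The "remembering" idea behind a translation memory can be sketched as a plain lookup of previously translated segments. MEMORY and pretranslate are hypothetical names for this illustration, and the sentence splitting is deliberately naive.

```python
import re

# Hypothetical memory of segments that have already been translated.
MEMORY = {"How much is the bread?": "Wie teuer ist das Brot?"}


def pretranslate(text):
    """Split text into sentences and attach the stored translation, if any."""
    segments = [s.strip() for s in re.findall(r"[^.?!]+[.?!]", text)]
    return [(segment, MEMORY.get(segment)) for segment in segments]


print(pretranslate("How much is the bread? Where is the plate?"))
# [('How much is the bread?', 'Wie teuer ist das Brot?'), ('Where is the plate?', None)]
```

Segments that come back with None are the ones the translator still has to handle.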

62
History
  • Until the 80s → rule-based translation
  • Research dominated by corpus-based approaches
  • 1) statistical machine translation
  • 2) EBMT
  • first suggested by Nagao Makoto in 1984
  • soon attracted the attention of scientists in the
    field of natural language processing.

63
Matching
  • First task in an EBMT system
  • Searches for a word or phrase that closely
    matches the source language input
  • → Example: Where is the plate
  • → Correct translation: Wo ist der Teller
  • → and not: Wo ist die Platte
  • most appropriate word is inserted
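
One simple way to approximate the matching step is character-level similarity from Python's standard difflib module. This is a sketch, not the matching method of any particular EBMT system: the EXAMPLES list and the function best_match are invented for illustration, and real systems use richer similarity measures.

```python
from difflib import SequenceMatcher

# Hypothetical example base of (source, translation) pairs.
EXAMPLES = [
    ("Where is the plate?", "Wo ist der Teller?"),
    ("Where is the station?", "Wo ist der Bahnhof?"),
    ("The steel plate is thick.", "Die Stahlplatte ist dick."),
]


def best_match(query):
    """Return the closest stored example and its similarity score (0.0 to 1.0)."""
    def score(example):
        return SequenceMatcher(None, query.lower(), example[0].lower()).ratio()

    best = max(EXAMPLES, key=score)
    return best, score(best)


(source, translation), similarity = best_match("Where is the plate")
print(source, "->", translation)
```

Because the closest stored example already uses "Teller", the retrieved translation carries the right sense of "plate" without any dictionary lookup.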

64
Problems
  • Long passages → low probability of complete match
  • Short passages → probability of ambiguity
  • Sentences are not translated completely but are
    divided into smaller sections
  • → often incoherent translation results

65
  • Problem of the size of the example database
  • Some of the systems are more experimental than
    others
  • Adding examples improves translation performance
  • No further improvement once the number of examples
    becomes too large

66
Problem: suitability of examples
  • Some examples have identical translation
  • Same phrase may have two different translations
    caused by inconsistency
  • Too great a variety of examples may cause problems
    with the choice of the exact word
  • Ambiguity
  • Can lead to overgeneralization

67
Problem: storage of examples
  • Normally words are stored with no further
    information
  • To avoid ambiguity and to limit the choice
  • → Expansion of examples by adding contextual
    markers
  • → context is taken into account to help find the
    right word
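
Storing examples with contextual markers could look like the sketch below. The data layout, the marker sets and the function choose_translation are all assumptions made for this illustration; the point is only that the surrounding words decide between "Teller" and "Platte".

```python
import re

# Hypothetical example base: (source word, contextual markers, translation).
EXAMPLES = [
    ("plate", {"dinner", "kitchen", "eat"}, "Teller"),
    ("plate", {"steel", "metal", "weld"}, "Platte"),
]


def choose_translation(word, sentence):
    """Pick the translation whose contextual markers overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    candidates = [
        (len(markers & context), translation)
        for source, markers, translation in EXAMPLES
        if source == word
    ]
    if not candidates:
        return None
    # On a tie, max() keeps the first candidate in EXAMPLES order.
    return max(candidates, key=lambda c: c[0])[1]


print(choose_translation("plate", "Put the plate on the kitchen table"))  # Teller
print(choose_translation("plate", "Weld the steel plate"))                # Platte
```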

68
Suitable translation problems
  • EBMT best suited for sublanguage translation
  • EBMT is often more suitable than MT
  • Antidote to structure-preserving translation as
    first choice

69
Adaptability
  • Most difficult step in EBMT process
  • Appropriate fragments have to be extracted from
    the text
  • Problems: 1) words have to find their correspondences
    in the matched portions
  • 2) find the correct
    recombination, which is appropriate and
    grammatical

70
Boundary Friction: Problem of inflection
  • → Example I: I ate the apple
  • Translation: Ich aß der Apfel (incorrect; the
    accusative "den Apfel" is required)
  • Example II: The apple is on the table
  • Translation: Der Apfel liegt auf dem Tisch.
  • To solve the problem the translation system has
    to contain a grammatical system of the target
    language
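
The boundary friction in the apple example can be made concrete with a minimal sketch. It assumes a hypothetical one-table "grammar" (ACCUSATIVE_ARTICLE) that only knows how German definite articles change in the accusative singular; the function names are invented for this illustration.

```python
# German definite articles, nominative -> accusative (singular).
ACCUSATIVE_ARTICLE = {"der": "den", "die": "die", "das": "das"}


def naive_recombine(verb_part, noun_phrase):
    """Glue two retrieved fragments together without any grammatical check."""
    return f"{verb_part} {noun_phrase}"


def recombine_as_object(verb_part, noun_phrase):
    """Glue the fragments together, putting the noun phrase into the accusative."""
    article, noun = noun_phrase.split(maxsplit=1)
    return f"{verb_part} {ACCUSATIVE_ARTICLE.get(article, article)} {noun}"


print(naive_recombine("Ich aß", "der Apfel"))      # Ich aß der Apfel
print(recombine_as_object("Ich aß", "der Apfel"))  # Ich aß den Apfel
```

The fragment "der Apfel" is correct as a subject in Example II but needs its article changed once it is reused as a direct object.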

71
  • Examples should be similar in internal and
    external context
  • Example-retrieval can be scored on two counts
  • → closeness of the match between the input text
    and the example
  • → the adaptability of the example, on the basis
    of the relationship between the representations
    of the example and its translations

72
Recombination
  • New generation of the target text
  • Last action in translation-process
  • Often not possible to put translated phrases
    together
  • → Example: It's raining outside → Es regnet nach
    draußen
  • Recombination has to make sure that the phrases
    are put together consistently
  • → Example: It's raining outside → Es regnet
    draußen

73
Computational Problems
  • Huge costs in terms of
  • → storage
  • → creation
  • → matching/retrieval algorithms
  • SPEED as a main issue
  • → A computer-translation has to be as fast as a
    speech-translation

74
Flavours of EBMT
  • Used as a component in an MT system
  • EBMT can be used
  • → with other engines
  • → for certain problems
  • → when some other component cannot deliver a
    result
  • EBMT bitter rival to the existing engines
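
Using EBMT "when some other component cannot deliver a result" amounts to a fallback chain. The sketch below is purely illustrative: both engines are hypothetical stand-ins implemented as lookups (RULE_COVERAGE for a rule-based component, EXAMPLE_BASE for the EBMT component), and translate simply tries them in order.

```python
# Hypothetical stand-ins for two engines; real components would be far richer.
RULE_COVERAGE = {"Where is the plate?": "Wo ist der Teller?"}
EXAMPLE_BASE = {"How much is the bread?": "Wie teuer ist das Brot?"}


def rule_engine(sentence):
    """Stand-in for a rule-based engine; returns None when it has no result."""
    return RULE_COVERAGE.get(sentence)


def example_engine(sentence):
    """Stand-in for the EBMT component, used here as the fallback."""
    return EXAMPLE_BASE.get(sentence)


def translate(sentence, engines):
    """Try each engine in order and return the first result with the engine name."""
    for name, engine in engines:
        result = engine(sentence)
        if result is not None:
            return name, result
    return None, None


ENGINES = [("rules", rule_engine), ("ebmt", example_engine)]
print(translate("How much is the bread?", ENGINES))
# ('ebmt', 'Wie teuer ist das Brot?')
```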

75
Example-based transfer
  • Examples are stored as trees or other complex
    structures in example-based transfer systems.
  • → "In these systems, source language input
    strings are analysed into structured
    representations in a conventional manner, only
    transfer is on the basis of examples rather than
    rules, and then generation of the target language
    output is again done in a traditional way." (H.
    Somers)

76
Generalization
  • Syntactic category
  • Example
  • → play baseball → yakyu o suru
  • → play tennis → tenisu o suru
  • → play the piano → piano o hiku
  • → play the violin → baiorin o hiku
  • Different vocabulary for "play" in Japanese;
    engine has to distinguish whether an instrument
    or sport is meant
  • Play x (NP/sport) → x (NP) o suru
  • Play x (NP/instrument) → x (NP) o hiku
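
The two generalized rules can be sketched as a category lookup. The dictionaries and the function translate_play are hypothetical names for this illustration; the vocabulary is limited to the four examples on the slide.

```python
# Category and translation of each noun, taken from the four slide examples.
CATEGORY = {"baseball": "sport", "tennis": "sport",
            "piano": "instrument", "violin": "instrument"}
NOUN = {"baseball": "yakyu", "tennis": "tenisu",
        "piano": "piano", "violin": "baiorin"}
# One Japanese verb per category: the generalized rules.
VERB_BY_CATEGORY = {"sport": "suru", "instrument": "hiku"}


def translate_play(phrase):
    """Translate 'play X' or 'play the X' using the category of X."""
    words = phrase.lower().split()
    if not words or words[0] != "play":
        raise ValueError("expected a phrase starting with 'play'")
    noun = words[-1]  # the last word is the noun; an optional 'the' is skipped
    return f"{NOUN[noun]} o {VERB_BY_CATEGORY[CATEGORY[noun]]}"


print(translate_play("play tennis"))      # tenisu o suru
print(translate_play("play the violin"))  # baiorin o hiku
```

Adding a new sport or instrument then needs only a new dictionary entry, not a new example pair.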

78
  • Semantic category
  • A word must be chosen first
  • Word is generated
  • Word-level rule is made up
  • The quality of the translation rules depends on
    the quality of the thesaurus
  • Works best with non-idiomatic texts

79
  • Automatic category
  • A simpler approach
  • Less initial analysis of the corpora
  • → I am coming → geliyorum
  • → I am going → gidiyorum
  • → I am come+ing → gel+Hyor+yHm
  • → I am go+ing → gid+Hyor+yHm
  • → "I am" stays fixed, while "come" and "go" differ

80
Multi-engine system
  • EBMT plus two other techniques: knowledge-based MT
    and a lexical transfer engine
  • Multi-engine system combines EBMT with
    rule-based and corpus-based approaches
  • User can
  • → modify the results
  • → intervene in the choice of translation
  • → edit the output

81
Conclusion
  • What counts as EBMT?
  • Use of a bilingual corpus
  • Use of a reference corpus
  • What is the aim of EBMT?
  • → to generalize the examples as much as possible
  • What is the problem of EBMT?
  • → Some translations are suitable, some are not

82
  • Advantages of EBMT
  • → Examples are real language data: overgeneration
    is reduced
  • → Linguistic knowledge can be more easily enriched
    by adding more examples
  • → can be quickly developed
  • → not as a rival but as an alternative

83
Literature
  • Somers, H. (2003). An overview of EBMT. In
    Michael Carl and Andy Way (eds), Recent Advances
    in Example-Based Machine Translation. Dordrecht:
    Kluwer, 3-57.
  • Sager, J. (1994). Language Engineering and
    Translation: Consequences of Automation.
    Amsterdam. 267-292.