Machine Translation MT - PowerPoint PPT Presentation

Slides: 34
Provided by: Shaf92


Transcript and Presenter's Notes

Title: Machine Translation MT

Machine Translation (MT) and Computer-Assisted
Translation (CAT)
Machine Translation
  • Introduction to MT systems
  • Generations of MT systems
  • Different types of MT systems
  • Construction of MT systems
  • Knowledge representation
  • Knowledge processing
  • MT engines
  • New directions of MT systems
  • Evaluation of MT and CAT systems

Machine Translation: Introduction
  • Machine translation (MT) is a long-term
    scientific dream of enormous social, political
    and commercial importance.
  • It was one of the earliest applications suggested
    for digital computers, but turning this dream
    into reality has turned out to be a much harder
    task.
  • Despite various problems and difficulties, some
    degree of machine translation is now a daily
    reality, and it is likely that in the future the
    bulk of routine technical and business
    translation will be done with some kind of
    machine translation tool.

Machine Translation: History
  • The history of MT research has gone through a
    number of phases in which certain frameworks have
    been dominant.
  • First generation: from the late 1960s, the
    syntactic orientation was dominant, with syntactic
    transfer approaches.
  • Second generation: in the 1980s, the AI
    orientation was popular and more attention was
    paid to semantics.
  • Third generation: from the 1990s, the corpus-based
    model with example-based methodologies has been the
    focus of much translation activity (e.g. old
    versions of electronic dictionaries).
  • Fourth generation: from the 2000s, research on spoken
    translation has developed into a major focus of
    MT activity (e.g. the latest versions of electronic
    dictionaries).
  • Last ten years: research on Computer-Assisted
    Translation (CAT) has developed into a major focus
    of translation activity.

How To Construct an MT system
Knowledge Representation
  • Different kinds of knowledge are generally needed
    for machine translation and must be represented
    in such a way that they can be processed
    automatically by MT systems:
  • Knowledge of the source language
  • Knowledge of the target language
  • Knowledge of the various correspondences between
    source language and target language (at least
    knowledge of how individual words can be
    translated)
  • Knowledge of the culture, social conventions,
    etc.
  • Several kinds of linguistic knowledge are usually
    needed:
  • Phonological knowledge
  • Morphological knowledge
  • Syntactic knowledge
  • Semantic knowledge

Knowledge Representation: Dictionary
  • The central and largest component of an MT system
    is the dictionary.
  • The size and quality of the dictionary limit the
    scope and coverage of a system and the quality of
    translation that can be expected.
  • Electronic dictionaries for MT must at least
    represent the information we can find in paper
    dictionaries in an appropriate fashion.

Knowledge Representation: Paper Dictionaries
Knowledge Representation: Electronic Dictionaries
  • Entries in an MT monolingual dictionary are
    equivalent to collections of attributes and
    values, like the following:
  • lex = button, cat = n, ntype = common, number = sing,
    human = no, concrete = yes.
  • lex = button, cat = v, vtype = main, finite, person,
  • Entries can be implemented as records in a
  • Entries in an MT bilingual dictionary are generally
    represented by translation rules, like the
    following:
  • Button ?? ?? / ???? / ???/ ??? ... ???
  • This allows the replacement of certain
    source-language-oriented information with the
    corresponding target-language information.
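The attribute-value entries and translation rules described above can be sketched as plain records. This is only an illustrative sketch: the function names `lookup` and `transfer` are hypothetical, and the French target words are assumptions chosen for the example (the target-language side of the slide's rule did not survive extraction).

```python
# Monolingual entries: one attribute-value record per reading of a word.
MONOLINGUAL = [
    {"lex": "button", "cat": "n", "ntype": "common",
     "number": "sing", "human": "no", "concrete": "yes"},
    {"lex": "button", "cat": "v", "vtype": "main"},
]

# Bilingual entries as translation rules: (source lex, category) -> target lex.
# The French words are illustrative assumptions, not taken from the slides.
BILINGUAL = {
    ("button", "n"): "bouton",
    ("button", "v"): "boutonner",
}


def lookup(lex, cat):
    """Return all monolingual records matching a word and a category."""
    return [e for e in MONOLINGUAL if e["lex"] == lex and e["cat"] == cat]


def transfer(entry):
    """Copy an entry, replacing the source word with its target equivalent."""
    target = dict(entry)
    target["lex"] = BILINGUAL[(entry["lex"], entry["cat"])]
    return target


noun = lookup("button", "n")[0]
print(transfer(noun))
```

Keying the bilingual rules on both the word and its category is what lets the noun and verb readings of "button" receive different translations.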

Knowledge Representation: Morphology
  • Morphology is concerned with the internal
    structure of words and how words can be formed.
  • MT and CAT systems must include a morphological
    component that can recognize different word
    formation processes:
  • Inflection: a word is derived from another word
    form while maintaining the same part of speech or
    category (walk → walks).
  • Describe regular inflections by general rules:
  • (lex = walks, cat = v, finite, person = 3rd, number = sing,
    tense = pres) ↔ V + s
  • Describe irregular inflections by explicit rules:
  • (lex = be, cat = v, finite, person = 3rd, number = sing,
    tense = pres) ↔ is
  • Derivation: a word of a different category is
    derived from another word or word stem by
    application of a process involving stems and
    affixes (grammar → grammatical, arrive → arrival).
  • Regular derivational processes can be described
    by rules.
  • Irregular derivations can be handled simply by
    listing all derived words.
  • Compounding: a new word or unit is formed by
    combining two or more words.
Knowledge Representation: Syntax and Grammars
  • Syntax is concerned with how sentences can be
    made up out of words.
  • To describe syntax, a grammar (set of rules) is
    generally used in MT and CAT.
  • For the first kind of information, programmers
    and developers, in consultation with linguists,
    have to represent the division of the
    sentence into its constituent parts and the
    categorization of these parts as nominal, verbal,
    and so on.
  • Consider that in English a sentence consists of a
    noun phrase followed by an auxiliary verb
    followed by a verb phrase. A noun phrase consists
    of ..., etc. We can represent this knowledge by
    the following grammar:
  • S → NP (AUX) VP
  • NP → (DET) (ADJ) N PP*
  • VP → V (NP) PP*
  • PP → P NP
  • N → user | printer
  • V → clean
  • AUX → should
  • DET → the | a
  • P → with
  • "a user should clean the printer" is a sentence
    in the above grammar.
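To check that a sentence belongs to the grammar above, the rules can be written down as data and run through a small recognizer. This is a sketch under stated assumptions: parentheses are read as "optional", the trailing PP in the NP and VP rules is read as "zero or more PPs" (otherwise the example sentence would not be accepted), and the names `GRAMMAR`, `LEXICON`, `ends` and `accepts` are hypothetical.

```python
# Each right-hand side is a list of (symbol, kind); kind is
# "one" (required), "opt" (optional) or "star" (zero or more).
GRAMMAR = {
    "S":  [[("NP", "one"), ("AUX", "opt"), ("VP", "one")]],
    "NP": [[("DET", "opt"), ("ADJ", "opt"), ("N", "one"), ("PP", "star")]],
    "VP": [[("V", "one"), ("NP", "opt"), ("PP", "star")]],
    "PP": [[("P", "one"), ("NP", "one")]],
}
LEXICON = {
    "N": {"user", "printer"}, "V": {"clean"}, "AUX": {"should"},
    "DET": {"the", "a"}, "P": {"with"}, "ADJ": set(),
}


def ends(symbol, words, start):
    """Positions where a phrase of type `symbol` starting at `start` can end."""
    if symbol in LEXICON:
        if start < len(words) and words[start] in LEXICON[symbol]:
            return {start + 1}
        return set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for child, kind in rhs:
            nxt = set()
            for p in positions:
                matched = ends(child, words, p)
                if kind == "one":
                    nxt |= matched
                elif kind == "opt":
                    nxt |= matched | {p}
                else:  # "star": keep matching until nothing new is reached
                    reach, frontier = {p}, matched
                    while frontier - reach:
                        new = frontier - reach
                        reach |= new
                        frontier = set().union(
                            *(ends(child, words, q) for q in new))
                    nxt |= reach
            positions = nxt
        result |= positions
    return result


def accepts(sentence):
    """True if the whole sentence can be derived from S."""
    words = sentence.lower().split()
    return len(words) in ends("S", words, 0)


print(accepts("a user should clean the printer"))
```

The recognizer only answers yes or no; it does not build the phrase structure.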

Knowledge Representation: Meaning
  • Knowledge about the meaning of sentences is an
    important part of the translation process and
    allows MT and CAT systems to produce better results.
  • Three useful kinds of knowledge relating to
    meaning can be distinguished:
  • Semantic knowledge: the meaning of words and
    sentences independently of the context they
    appear in.
  • Pragmatic knowledge: the meaning of expressions in
    context.
  • Real-world or common-sense knowledge.
  • It is useful to represent these kinds of knowledge
    in MT and CAT systems in order to increase their
    performance. Accomplishing this goal has proved to be
    the most difficult task in developing MT and
    CAT systems.

Knowledge Processing
  • We now give an idea of how knowledge can be
    manipulated automatically by MT systems.
  • This can be done in two stages: parsing and
    generation.
  • Parsing is the process of taking an input string
    of expressions and producing representations
    appropriate to the translation.
  • Generation is the process of taking an
    appropriate representation and producing the
    corresponding sentence.
  • A graphical representation will be used for the
    parsing and generation processes. However, the
    internal representations are lists (very useful
    data structures).
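As a rough illustration of a list-based internal representation, a parse tree can be held as nested lists whose first element is the category label. The sketch below assumes this layout (it is not confirmed by the slides) and reduces generation to its simplest possible form, reading the words back off the tree; the name `generate` is hypothetical.

```python
# A parse tree as nested lists: [label, child, child, ...]; leaves are words.
TREE = ["S",
        ["NP", ["DET", "the"], ["N", "user"]],
        ["AUX", "should"],
        ["VP", ["V", "clean"],
               ["NP", ["DET", "the"], ["N", "printer"]]]]


def generate(node):
    """Return the words of a tree, left to right."""
    if isinstance(node, str):  # a leaf is just a word
        return [node]
    words = []
    for child in node[1:]:  # skip the category label at index 0
        words.extend(generate(child))
    return words


print(" ".join(generate(TREE)))
```

A real generator would also reorder constituents and inflect words for the target language; this sketch only shows how a list structure maps back to a string.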

Knowledge Processing: Parsing
  • The task of a parser is to take a formal grammar
    and a sentence and:
  • Check whether the sentence is indeed grammatical
  • Show how the words are combined into phrases
  • Different parsing methods exist; they are
    subdivided into two categories: the top-down parsing
    method and the bottom-up parsing method.
  • Examples of parsing, using the grammar defined in the
    previous section and the sentence "the user should
    clean the printer", are given below.

Parsing: Bottom-Up algorithm
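The slide's diagram is not part of the transcript, so the following is only one possible sketch of a bottom-up method: a shift-reduce parser with backtracking. It assumes the grammar from the syntax section rewritten as plain rules (optional elements listed explicitly, repeated PPs expressed as NP → NP PP and VP → VP PP, ADJ left out because the lexicon has no adjectives). All names are hypothetical.

```python
RULES = [
    ("S", ("NP", "AUX", "VP")), ("S", ("NP", "VP")),
    ("NP", ("DET", "N")), ("NP", ("N",)), ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)), ("VP", ("VP", "PP")),
    ("PP", ("P", "NP")),
]
LEXICON = {"user": "N", "printer": "N", "clean": "V", "should": "AUX",
           "the": "DET", "a": "DET", "with": "P"}


def parse(stack, words):
    """Return a tree [label, children...] covering all words, or None."""
    if not words and len(stack) == 1 and stack[0][0] == "S":
        return stack[0]
    # Reduce: replace the top of the stack if it matches a right-hand side.
    for lhs, rhs in RULES:
        n = len(rhs)
        if len(stack) >= n and tuple(t[0] for t in stack[-n:]) == rhs:
            result = parse(stack[:-n] + [[lhs] + stack[-n:]], words)
            if result is not None:
                return result
    # Shift: move the next word onto the stack under its category.
    if words and words[0] in LEXICON:
        return parse(stack + [[LEXICON[words[0]], words[0]]], words[1:])
    return None


def parse_sentence(sentence):
    return parse([], sentence.lower().split())


print(parse_sentence("the user should clean the printer"))
```

The parser starts from the words and builds phrases upward until a single S remains; when a reduction leads to a dead end, it backtracks and tries the next rule or a shift.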
Parsing: Top-Down algorithm
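The slide's diagram is not part of the transcript, so the following is only one possible sketch of a top-down method: a recursive-descent parser with backtracking. It assumes the grammar from the syntax section rewritten as plain alternatives, with each phrase limited to at most one trailing PP and ADJ left out; these simplifications, and all names, are assumptions made for the example.

```python
GRAMMAR = {
    "S":  [["NP", "AUX", "VP"], ["NP", "VP"]],
    "NP": [["DET", "N", "PP"], ["DET", "N"], ["N", "PP"], ["N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"], ["V", "PP"], ["V"]],
    "PP": [["P", "NP"]],
}
LEXICON = {"N": {"user", "printer"}, "V": {"clean"}, "AUX": {"should"},
           "DET": {"the", "a"}, "P": {"with"}}


def parse(symbol, words, start):
    """Yield (tree, end) for each way `symbol` can cover words[start:end]."""
    if symbol in LEXICON:
        if start < len(words) and words[start] in LEXICON[symbol]:
            yield [symbol, words[start]], start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(rhs, words, start, [symbol])


def expand(rhs, words, start, tree):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield tree, start
        return
    for child, mid in parse(rhs[0], words, start):
        yield from expand(rhs[1:], words, mid, tree + [child])


def parse_sentence(sentence):
    """Return the first tree that covers the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


print(parse_sentence("the user should clean the printer"))
```

The parser starts from S and expands rules downward until it reaches the words. A left-recursive rule such as NP → NP PP would make this method loop forever, which is why the PP repetition is flattened here.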
Latest Engines in MT
  • Speech Recognition MT: trying to apply to MT
    techniques which have been highly successful in
    Automatic Speech Recognition.
  • Computer-assisted Translation: the idea is to
    collect a bilingual corpus of translation pairs
    and then use a best-match algorithm to find the
    closest example to the source phrase in question.
    Ex.: Trados, Wordfast, etc.
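The best-match idea can be sketched with the standard library's `difflib.SequenceMatcher`, which scores string similarity between 0 and 1. This is only an illustration: commercial tools use their own matching algorithms, and the translation pairs and the name `best_match` are assumptions made for the example.

```python
from difflib import SequenceMatcher

# A tiny bilingual corpus of (source, target) pairs; contents are illustrative.
MEMORY = [
    ("Clean the printer.", "Nettoyez l'imprimante."),
    ("Switch off the printer.", "Éteignez l'imprimante."),
    ("Replace the cartridge.", "Remplacez la cartouche."),
]


def best_match(source, memory=MEMORY):
    """Return the closest stored pair and its similarity score (0.0 to 1.0)."""
    def score(pair):
        return SequenceMatcher(None, source.lower(), pair[0].lower()).ratio()

    best = max(memory, key=score)
    return best, score(best)


pair, similarity = best_match("Clean the printers.")
print(pair, round(similarity, 2))
```

A score of 1.0 corresponds to an exact match; anything lower is a fuzzy match that the translator would still need to check and adapt.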

What is a CAT Tool?
  • CAT stands for "Computer-Aided Translation".
    The terms "Translation Memory" and "TM" are
    sometimes used to refer to the same type of tool.
    A CAT tool is a computer program that helps a
    translator to work efficiently. This is
    achieved through three main functions:
  • A CAT tool breaks texts into segments (sentences
    or sentence fragments) and presents the segments
    in a convenient way, to make translating easier
    and faster. In some tools, for example Trados,
    each segment is presented in a special box, and
    the translation can be entered in another box
    right below the source text.

  • The translation of each segment is saved together
    with the source text. Source text and translation
    will always be treated and presented as a
    translation unit (TU). You can return to a
    segment at any time to check the translation.
    There are special functions which help to
    navigate through the text and to find segments
    which need to be translated or revised (quality
    control).
  • The main function of a CAT tool is to save the
    translation units in a database, called a
    translation memory, so that they can be re-used
    for any other text, or even in the same text,
    through special "search" features. The search
    functions of CAT tools can also find segments
    which do not match 100%. This saves time and
    effort and helps the translator to use consistent
    terminology.
Evaluation of MT and CAT Systems
  • The evaluation of MT and CAT systems is a complex
    task. This is not only because many different
    factors are involved, but because measuring
    translation performance is itself difficult.
  • Clarity: a traditional way of assessing the
    quality of translation is to assign scores to
    output sentences.
  • Accuracy: it is important to check whether the
    meaning of the source is preserved in the
    translation.
  • Error analysis: tries to establish how seriously
    errors affect the translation output.
  • Test suite: running the system on a large corpus
    of test texts will reveal different possible
    errors.
How to start using Trados?
  • Steps to follow for creating, opening and
    exporting a translation memory, and further basic
    features of the software.
  • Note that these steps
    correspond to SDL Trados 2006, so some menus may
    be different in other Trados versions.

To create a translation memory
  • 1. Go to Windows / Start / All Programs / SDL
    International / SDL Trados 2006 / Translator's
    Workbench. The software will start running and
    will request the user name.
  • 2. Go to File / New.
  • 3. A window will appear in which you choose
    the source and target languages by clicking on
    Add. Then click on Create.
  • 4. A window will appear in which you enter
    the name for the TM and browse to where to save it.
  • Note: Next time you open Translator's Workbench,
    the last memory used will be opened by default.

To open a translation memory
  • There are two ways of opening a translation
    memory:
  • 1. You can double-click on the icon of the TM you
    wish to open,
  • or open Trados Translator's Workbench.
  • 2. Go to File / Open.
  • 3. A window will be displayed, where you have to
    look for the TM you want to open and, once found,
    click on it.

  • The Trados TM will provide an existing equivalent
    sentence in the target language (TL) if it matches
    100%.
  • The Trados TM will provide suggested words or
    phrases in different colors if the equivalent
    sentence does not match 100%.
  • Easily select from the possible suggestions
    offered by the TM.
  • Confirm the suggestions offered by the TM or
    simply type your own words or phrases.
  • Press Ctrl+Enter to confirm and move on to
    the next sentence.
  • Once you have finished translating the whole text,
    click File / Save As and rename the file to save
    it.

Some pieces of advice
  • Don't press Enter when you are inside a
    translation unit, since you can break it.
  • Using keyboard commands speeds up
    the job.
  • If you have any problems, go to Help / Help Topics
    in Translator's Workbench.

Creating a Multiterm Termbase to Use in SDL
Trados Studio
  • 1. Multiterm is a separate program; it is not part of
    Trados or Studio. It needs to be downloaded and
    installed separately, and it appears as a
    standalone program in the SDL folder of your All
    Programs list in Windows. If you don't see it
    there, go to your SDL account and
    download and install the program from the My
    Downloads page.
  • 2. Termbases cannot be created
    in Trados or Studio. The "Create New Termbase"
    link on the SDL Trados main page and the
    "Terminology Management" button on the Studio
    home page are merely links that take you to
    Multiterm, if it is installed on your computer.

Creating a simple Multiterm termbase
  • Multiterm can be as simple or as complex as you
    want it to be. In this example, the simplest kind
    of termbase will be created:
  • source term
    target term.
  • No other index fields will be included.
  • 1. Open
    SDL Multiterm Desktop, go to File, then select
    Create Termbase, then save your termbase in the
    dialog box that opens.

  • Click Next on Steps 1 to 5 of the Termbase Wizard,
    choose your languages, and click Finish.
  • In this case the termbase has been created but
    it's empty.

  • To manually add terms, click on the Terms tab at
    the bottom left of your screen, then press F3 or click
    on the Add New Entry icon right under the Edit
    menu. You will see the Entry screen.

  • Double-click on the little box next to the pencil
    icon and enter the term for each entry.

  • Press F12 to save the changes.
  • The term is now part of your termbase and
    therefore will be available when you use the
    termbase in Studio. This concludes the basics of
    termbase creation.