Title: The Role of Translation Tools in the Information Age
1The Role of Translation Tools in the
Information Age
- Feng Zhiwei
- Institute of Applied Linguistics, MOE, China
- zwfengde_at_hotmail.com
- First Sino-German Symposium on Knowledge
Handling, 2007, Beijing
2Abstract
- We are in the information age. The snowballing acceleration of available information has drastically changed the way translators work. This paper introduces the translation tools of the information age: machine translation systems, translation resources on the Internet and on CD-ROM, computer-assisted terminology management systems, parallel corpora, translation memories, localization tools, and computer-aided machine translation systems. If translators use these tools properly, they can improve the efficiency and quality of translation.
3Keywords
- multilingualism, machine translation system,
Internet, CD-ROM, translation resources,
computer-assisted terminology management system,
parallel corpora, translation memories,
localization tools, computer-aided machine
translation system.
4English lingua franca?
- 80 percent of all business transactions in Denmark are carried out in English.
- Many large corporations have adopted English as their official language.
- 85 percent of international organizations use English as their working language.
- In Europe, 99 percent of all international organizations have English as one of their official languages.
5English lingua franca?
- 98 percent of all German physicists and 83 percent of all German chemists publish their findings in English.
- 90 percent of all scientific publications are written in English.
- The majority of Nobel Prizes go to laureates who are citizens of countries where English is the official language.
- English is the default language for international scientific conferences, no matter where they take place or what their specific topics are.
- english-reader.JPG
6Linguistic uniformity or multilingualism ?
- There are about 6,000 different languages in the world, each with its own cultural background.
- In Wales, in the United Kingdom, people speak Welsh.
- In the USA, some people speak Spanish.
- Multilingualism is widespread and necessary.
7Multilingualism in European Union
- In the EU institutions, the original 15 member states have the privilege of using their state languages to conduct official business.
- This multilingualism is made possible by the work of about 4,000 in-house translators, interpreters and terminologists, and many more freelancers.
- With 11 official languages (for 15 states) and 110 possible language-pair combinations, 2 billion euros were spent on translation in 1997.
- This does not include the more than 200,000 pages translated by the EC-SYSTRAN MT system each year.
8Multilingualism in European Union -- cont
- Each additional official language increases the demand by 250 to 300 linguists.
- With the expansion of the EU by as many as 12 new members and the integration of 10 new languages, the number of combinations would grow quadratically rather than linearly, resulting in 420 combinations for 21 languages.
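The figures of 110 and 420 combinations are consistent with counting ordered source-target pairs, that is n(n-1) for n languages. A minimal sketch in Python, where the function name is purely illustrative:

```python
def directed_language_pairs(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n languages.

    Each language can be translated into every other language,
    so there are n * (n - 1) translation directions.
    """
    return n_languages * (n_languages - 1)


# 11 official languages (15 member states) and 21 languages after enlargement
for n in (11, 21):
    print(n, "languages ->", directed_language_pairs(n), "combinations")
```

Adding 10 languages to 11 almost quadruples the number of translation directions, which is why each new official language is so costly.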
9Localization in business
- Clients will only buy in their own language.
- Sellers need to speak the language of the customer and to adapt their conduct and products to the specific characteristics of the local market.
- Localization applies not only to products, but also to the methods of design, production, marketing and distribution.
10Internet and translation
- The Internet is becoming a multilingual network.
- Between 2000 and 2005, Internet use in English grew by only 126.9%, while Russian grew by 664.5%, Portuguese by 327.3%, Chinese by 309.6% and French by 235.9%.
- The growth rate in the number of Internet users is much higher in non-English-speaking countries than in English-speaking countries.
- The dominant position of English has been broken.
- Translation is becoming more and more important.
- internet-langue.doc
11Information explosion
- In the information age, the snowballing acceleration of available information has resulted in an information explosion.
- The amount of knowledge to be processed within the next decade is larger than the amount of knowledge accumulated during the past 2,500 years.
- 165,000 scientific journals are currently being published.
- 20,000 new scientific papers are produced every day.
12Information explosion
- The amount of data circulating on the Internet on any given day is larger than all the information available throughout the 19th century (Der Spiegel, 1996).
- The combined vocabulary of technical and scientific disciplines amounted to 30 million words in 1991 (Siemens, 1991).
13Translation market
- According to a study by Allied Business Intelligence, the global translation market amounted to 10.4 billion in 1999 and 17.2 billion in 2003.
- In 1997, the EU-funded ASSIM study estimated the total turnover of the translation markets of the 18 member states of the EU and EEA (European Economic Area) at 3.75 billion euros, with software, audio-visual and multimedia translation constituting 20 percent of the total.
14Translation market
- According to the ASSIM study, the total number of in-house and external translators in the EU and EEA exceeds 100,000.
- The total turnover of the translation market in mainland China has exceeded 10 billion yuan.
- Electronic translation tools can help translators cope with this demand.
15Using electronic translation tools
- More than 50 percent of the translators interviewed for the 1997 ASSIM report were using electronic dictionaries, and about one-third were using translation memory systems.
- Many translators in mainland China and Hong Kong were using electronic tools in their translation work.
16Electronic Translation Tools
- Machine Translation System
- Translation Resources on Internet
- Translation Resources on CD-ROM
- Computer-Assisted Terminology Management
- Parallel Corpus as Translation Tools
- Translation Memory
- Localization Tools
- Computer-Assisted Translation System
17Machine Translation System
- The first attempts to mechanize translation were made as early as the 1930s.
- Weaver memorandum (1949):
- "I have a text in front of me which is written in Russian, but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text."
- MT is a decoding system.
18Warren Weaver (1947)
- Ciphertext: ingcmpnqsnwf cv fpn owoktvcv hu ihgzsnwfv rqcffnw cw owgcnwf kowazoanv ...
- The plaintext is filled in step by step: first the letter e, then the words "the" and "he", then a tentative "of" that is corrected to "is", until the whole sentence is deciphered.
- Plaintext: decipherment is the analysis of documents written in ancient languages ...
27Warren Weaver (1947)
When I look at an article in Russian, I say to myself: "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode."
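The decipherment example on the preceding slides is a simple letter-substitution cipher. As a rough illustration of Weaver's "decoding" metaphor, the sketch below applies the substitution key that can be read off the ciphertext and its plaintext. The key covers only the letters that occur in the example, and the names `CIPHER`, `PLAIN` and `decode` are illustrative. Real machine translation is of course far harder than applying a fixed letter table.

```python
# Cipher letter -> plain letter, read off the example pair
# "ingcmpnqsnwf cv fpn ..." / "decipherment is the ...".
CIPHER = "ingcmpqswfvokthuzra"
PLAIN = "deciphrmntsalyofuwg"
KEY = str.maketrans(CIPHER, PLAIN)


def decode(ciphertext: str) -> str:
    """Replace every known cipher letter by its plaintext letter."""
    return ciphertext.translate(KEY)


message = ("ingcmpnqsnwf cv fpn owoktvcv hu ihgzsnwfv "
           "rqcffnw cw owgcnwf kowazoanv")
print(decode(message))
# decipherment is the analysis of documents written in ancient languages
```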
28FAHQT can not be achieved
- But Fully Automatic High-Quality Translation (FAHQT) cannot be achieved using today's technology.
- Bar-Hillel's example (1959):
- "John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy."
- How can a system decide that "pen" here means a playpen and not the writing tool? This is very difficult using today's technology!
29MT in SAP company.
- SAP uses the mainframe-based Metal MT system for translations from German to English.
- SAP sees MT as enhancing productivity and as being of growing importance in the company's translation methodology.
- SAP has found that using MT can, under the best circumstances, be two to four times faster than traditional translation methods.
30Multilingual MT system -- Systran
- The Systran translation from English to Chinese is readable.
- Systran website: http://www.systransoft.com
- systran.JPG
31Multilingual Intelligent hand-phone
- Beijing City Guide is a multilingual translation mobile phone (Beijing Information Development Company, 2006-08).
- hand-phone.jpg
- When a foreign visitor types in "I want to Beijing Hotel", the phone translates it as ???????.
- When a taxi driver types in ??????, the phone translates it as "You are welcome to Beijing".
32Translation Resources on Internet
- The Internet is a language resource for translation.
- Finding data on the Internet is no problem at all.
- But finding reliable information is a rather difficult task.
- Finding the information you really need can be very time-consuming and often frustrating.
33Three strategies for Internet search
- Institutional search through a URL (Uniform Resource Locator).
- Thematic search via subject trees.
- Keyword search via a search engine. A search engine basically consists of two components:
- A large index of words contained in web documents.
- Retrieval software that lets you search for words in the index and then displays the matching documents on the screen.
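The two components of a search engine can be illustrated with a toy inverted index. This is only a minimal sketch: the function names and the three sample documents are invented for illustration, and real search engines add ranking, stemming and much more.

```python
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Component 1: map every word to the set of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            word = token.strip(".,;:!?()\"'")
            if word:
                index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Component 2: return the documents that contain all query words."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Hypothetical sample documents
documents = {
    "doc1": "Translation memory systems recycle existing translations.",
    "doc2": "Machine translation is a decoding system.",
    "doc3": "Terminology management helps the translator.",
}
index = build_index(documents)
print(search(index, "translation"))         # ['doc1', 'doc2']
print(search(index, "translation system"))  # ['doc2']
```

Note that the second query misses doc1 because "systems" and "system" are indexed as different words. This is one reason why keyword search can be frustrating for translators.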
34Libraries online and virtual bookstores
- In order to understand the source text, it may be necessary to access libraries and browse virtual bookstores.
- Via the OPAC (http://catalog.loc.gov), you can search major libraries.
- catalog.jpg
- Via Amazon (http://www.amazon.com), you can browse virtual bookstores on the web.
- amazon.JPG
35General encyclopedias
- Via Britannica Online, you can search the Encyclopaedia Britannica online.
- http://www.britannica.com
- britannica.JPG
36Specialized encyclopedias
- PC Webopedia is a specialized online encyclopedia.
- It is an English-language reference work for information and communication technology (ICT). It contains a multitude of ICT terms, including elaborate and easy-to-understand definitions.
- Via PC Webopedia (http://www.pcwebopedia.com), you can search the ICT encyclopedia online.
- webopedia.JPG
37General monolingual dictionaries
- The Merriam-Webster Collegiate Dictionary is available online.
- Via Merriam-Webster (http://www.m-w.com), you can search either the dictionary or the thesaurus.
- Simply enter your search terms and click the search button.
- M-W.JPG
38General multilingual dictionaries
- One-Look Dictionary is a search platform that lets you search about 600 word lists, glossaries, dictionaries and databases simultaneously.
- Via OneLook (http://www.onelook.com), you can search online multilingual dictionaries.
- one-look.JPG
- Via ???? (http://cb.kingsoft.com), you can search Chinese-English or English-Chinese dictionaries online.
- kingsoft.JPG
39Multilingual Terminology databases
- Via Termite (http://www.itu.int), you can search the online multilingual terminology database of the International Telecommunication Union (ITU).
- itu.JPG
- Via Eurodicautom (http://europa.eu.int/eurodicautom), you can search the EU's online multilingual terminology database. In 1999, it contained 5.5 million entries and 180,000 abbreviations in the EU's 11 official languages.
- eurodautom.JPG
40Newspaper and magazine archives
- Via the Spanish newspaper ABC (http://www.abc.es), the German newspaper Die Welt (http://www.welt.de) and the American magazine Newsweek (http://www.newsweek.com), you can search for background information related to your translation.
- abc.JPG
- welt.JPG
- Newsweek.JPG
41Translation Resources on CD-ROM
- Translation resources on CD-ROM are offline language resources: a CD-ROM can offer information without a network connection.
- CD-ROMs are able to store vast amounts of data (in general around 650 MB). They are highly suitable for storing multimedia information or huge amounts of textual data. The contents of the 32-volume edition of the Encyclopaedia Britannica can be stored on a single CD-ROM.
- CD-ROMs are fairly cheap to produce.
- Multimedia capability: graphics, audio and video sequences are easily integrated.
- The use of hyperlinks allows for effective networking of entries (for cross-references, synonyms, etc.).
42Encyclopedia on CD-ROM
- Encyclopedia of China on CD-ROM
- encyclop.jpg
- Britannica Concise Encyclopedia on CD-ROM
- concise-encyclo.JPG
43Specialized encyclopedia on CD-ROM
- Construction Installation Encyclopedia
- constrution.JPG
44General dictionaries on CD-ROM
- Oxford English Dictionary (OED) on CD-ROM
- OED-3.JPG
- Bibliorom Larousse on CD-ROM
- larouse.JPG
45Electronic dictionaries in palm
- Children's talking dictionary and spelling corrector
- Various English dictionaries
- Bilingual dictionaries
- Franklin.JPG
46Computer-assisted terminology management
- Professional translation is mostly technical translation.
- A technical translator is forced to keep up with the many fast changes taking place in the fields of information technology, manufacturing, business, medicine, biotechnology, etc.
- It would be unrealistic to expect a translator to be a born expert in all these fields.
- But the translator must be an expert in quickly finding the information that he or she is lacking.
- The search for terminology can take up to 75% of a translator's time.
- Terminology management is a general term for the documentation, storage, manipulation and presentation of specialized vocabulary. It can help the translator resolve terminology problems.
47Trados MultiTerm
- Main functions of Trados MultiTerm (http://trados.com):
- Creating a new terminology database entry.
- Importing terminology data.
- Importing data from a word processor
- Retrieving terminology data
- Exporting terminology data
- Creating word lists, glossaries or dictionaries.
- Distributing terminology data
- Exporting data via WinWord
- Exchanging data between a word processor and MultiTerm
- trados.JPG
- trados-term.JPG
48Corpora as translation tools
- A corpus constitutes the raw textual material for various forms of linguistic analysis. A parallel corpus can help the translator compare the source language and the target language.
- Using corpora to check the acceptability of a translated text.
- Using Internet documents to create a corpus.
- Retrieving data from your corpus with WordSmith (http://www1.oup.com/elt/catalogue/Multimedia/Wordsmith).
- Creating a word list.
- A concordance shows the occurrences of a given search term in its textual context.
- Finding the keywords of a short article.
- wordsmith.JPG
- Using Alta Vista Personal (http://altavista.com) to index and search local documents.
- altvista.JPG
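The word list and concordance functions mentioned above can be sketched in a few lines of Python. This is not how WordSmith is implemented. It is only a minimal illustration with invented function names, using a short passage from the King James Bible as the corpus.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(text: str, top: int = 5) -> list[tuple[str, int]]:
    """Frequency-ordered word list of the corpus."""
    return Counter(tokenize(text)).most_common(top)


def concordance(text: str, term: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines: each hit with `width` words on either side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == term.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


corpus = ("And God said, Let there be light: and there was light. "
          "And God saw the light, that it was good: "
          "and God divided the light from the darkness.")

print(word_list(corpus, top=3))
for line in concordance(corpus, "light", width=2):
    print(line)
```

Each concordance line shows the search term in square brackets with its immediate context, which lets a translator check how a word is actually used.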
49Bible bilingual parallel corpus
- The following segments are taken from a Bible corpus (http://www.o-bible.com/b5/int.html):
- 1:1
- hb5: ? ? ? ? ? ? ? ?
- kjv: In the beginning God created the heaven and the earth.
- bbe: At the first God made the heaven and the earth.
- 1:2
- hb5: ? ? ? ? ? ? . ? ? ? ? . ? ? ? ? ? ? ? ? ? ?
- kjv: And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
- bbe: And the earth was waste and without form; and it was dark on the face of the deep: and the Spirit of God was moving on the face of the waters.
- 1:3
- hb5: ? ? ? ? ? ? ? ? ? ? ? ?
- kjv: And God said, Let there be light: and there was light.
- bbe: And God said, Let there be light: and there was light.
- 1:4
- hb5: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
- kjv: And God saw the light, that it was good: and God divided the light from the darkness.
- bbe: And God, looking on the light, saw that it was good: and God made a division between the light and the dark,
- 1:5
- hb5: ? ? ? ? ? ? ? ? ? ? . ? ? ? ? ? ? ? ? ? ? ? ? ? ?
- kjv: And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.
- bbe: Naming the light, Day, and the dark, Night. And there was evening and there was morning, the first day.
50Translation memories localization tools
- Since many products are based on previously existing products, the corresponding documentation is also based on prior documentation.
- Research has shown that 50% or more of the elements in a text can be repeated in the same text.
- If those elements have been translated previously, it is useful for translators to be able to recycle that prior work.
- Translation memories (TMs) recycle existing translations so as to reduce time and costs as well as improve quality and consistency.
51Three categories of search results in TMs
- Perfect or exact match: The translation unit found in the database corresponds exactly to the new source text element (100% match).
- Full match: The new source text element is identical to a stored translation unit except for variable elements such as dates, numbers, times, measurements, etc.
- Fuzzy match: All other matches that do not match an existing segment exactly but fall within a user-defined minimum match value (e.g. 75%) are fuzzy matches. The sentence match with the highest degree of similarity is displayed first. All other matches with a lower degree of similarity are added to a match list which can be accessed by the user.
- If no match is found, the sentence has to be translated manually. The new translation is stored in the database.
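The three match categories can be sketched as a small lookup routine. This is only an illustration under stated assumptions, not the algorithm of any commercial TM system. Similarity is measured with Python's difflib, variable elements are approximated by a crude number pattern, the 0.75 threshold mirrors the 75% example above, and the sample memory is invented.

```python
import re
from difflib import SequenceMatcher

# Crude stand-in for "variable elements": numbers, dates, times
VARIABLE = re.compile(r"\d+(?:[.,:/]\d+)*")


def normalize(segment: str) -> str:
    """Replace variable elements by a placeholder."""
    return VARIABLE.sub("<NUM>", segment)


def lookup(segment: str, memory: dict[str, str], threshold: float = 0.75):
    """Return (category, matches); matches are (score, source, target), best first."""
    if segment in memory:
        return "exact", [(1.0, segment, memory[segment])]
    full = [(1.0, src, tgt) for src, tgt in memory.items()
            if normalize(src) == normalize(segment)]
    if full:
        return "full", full
    scored = [(SequenceMatcher(None, segment, src).ratio(), src, tgt)
              for src, tgt in memory.items()]
    fuzzy = sorted((m for m in scored if m[0] >= threshold), reverse=True)
    if fuzzy:
        return "fuzzy", fuzzy
    return "none", []  # translate manually, then store the new pair


# Hypothetical English-German translation memory
memory = {
    "See page 12.": "Siehe Seite 12.",
    "Switch off the printer.": "Schalten Sie den Drucker aus.",
}

for sentence in ("See page 12.", "See page 47.",
                 "Switch off the scanner.", "Hello world."):
    category, matches = lookup(sentence, memory)
    print(f"{sentence!r}: {category}", matches[:1])
```

In a real system, a full match would also insert the new variable element into the target, and the user would confirm or edit every fuzzy match before it is stored.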
52Benefits of using TMs
- With a translation memory system, the level of benefit is proportional to the degree of repetition in the document.
- The use of a TM can result in enormous savings, both for the client and for the translator or translation agency.
- Increase in income.
- Elimination of repetitive translation tasks.
- Consistency.
53Translation memory of TRADOS
- Translator's Workbench (http://www.trados.com)
- trados-TM.JPG
- SDL-Trados 2006 freelance
- SDL-Trados 2006 professional
54Software localization tools
- Localization is the process of adapting a product to the specific situation of its target market. This includes not only translating the texts accompanying the product but also adapting it to the cultural norms of the local market.
- Corel Catalyst (http://alchemysoftware.ie).
- catalyst.JPG
- Passolo (pass software localizer) (http://www.passolo.com).
- passolo.JPG
55Computer-Aided Translation(Computer-Assisted
Translation)
- MT refers to computerized systems responsible for the production of translations from one natural language to another, with or without human assistance. The central core of MT itself is the automation of the full translation process (Hutchins & Somers, 1992).
- According to the degree of automation or the degree of human involvement in the translation process, we use the following terms:
- FAHQT: Fully Automatic High-Quality Translation
- FAMT: Fully Automatic Machine Translation
- HAMT: Human-Aided Machine Translation
- MAHT: Machine-Aided Human Translation
56FAHQT FAMT
- FAHQT, based on the idea that MT systems were capable of producing translations of a quality comparable to that of human translators, was abandoned.
- FAMT is possible, but its output is only a raw translation of poor quality.
57HAMT
- In HAMT, the source text is decoded and analyzed by the system, not by the human operator, whose task consists of assisting in the translation process.
- Human involvement:
- Pre-editing
- Interaction
- Post-editing
58Pre-editing
- Preparing the source text in order to avoid problems from the outset.
- Avoid idiomatic expressions.
- Avoid omitting the pronoun before a verb.
- Avoid omitting relative pronouns.
- Break up long sentences into shorter ones.
- Keep to standard, formal English in which grammatical connections are clearly expressed.
59Human-machine interaction
- The system pauses during the translation process.
- For example,
- when the MT system cannot resolve syntactic or semantic ambiguities in source text analysis.
- when it cannot decide between one target language equivalent and another.
- Errors can thus be avoided at the analysis stage.
60Post-editing
- Correction of the target text generated by the MT system.
- Whether post-editing is conducted, and to what extent, largely depends on the quality required by the user.
61MAHT
- MAHT includes the use of aids such as electronic dictionaries, terminology databases, translation memory systems and other electronic tools.
- In contrast with FAMT and HAMT, in MAHT the decoding and analysis of the source text lies in the hands of the translator.
- The term CAT is sometimes used to cover both HAMT and MAHT.
62????
- 1. ???,??????M,??????????,2004??
- 2. ???,??????? ???????????M,???????,2003??
- 3. D. Jurafsky,J. Martin,????????(???????)M,???????,2005??
- 4. F. Austermuehl, Electronic Tools for Translators, St. Jerome Publishing, 2001.
63Thank you!