Coping with Babel - PowerPoint PPT Presentation

About This Presentation

Title:

Coping with Babel

Slides: 44

Provided by: gca

Transcript and Presenter's Notes

Title: Coping with Babel

1
Coping with Babel

How to Localize XML

2
Designing for Localization

L10N - adapting material for target markets
Document design can seriously impact the costs of
translation and localization.
Other language rules can differ significantly
from English.
There are clear dos and don'ts.
Overriding principle is good XML practice.

3
Entity references

Do not use entity references for word substitution
<para>Use a &tool; to release the &catch;.</para>
Cause problems for inflected languages
Cause problems for parsing/translation tools
Use boilerplate text instead

4
Translatable attributes

Avoid using translatable attributes
<para>Use a <tool id="a1098" name="claw hammer"/> to release the CPU retention catch.</para>
Cause problems for inflected languages
Cause extra burden for translators
More to go wrong

5
CDATA sections

Avoid using CDATA sections that may contain translatable text
<tmpl><![CDATA[<p>Please refer to the <em>index page</em> for further information</p>]]></tmpl>
Lose syntactical control
Segmentation problems
How are translation tools to cope?
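
The loss of syntactic control can be seen with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module (the choice of parser is an assumption made for illustration; the markup follows the slide's example). It parses the same content once inside a CDATA section and once as real elements.

```python
import xml.etree.ElementTree as ET

# Markup hidden inside a CDATA section (based on the slide's example).
cdata_doc = ("<tmpl><![CDATA[<p>Please refer to the <em>index page</em> "
             "for further information</p>]]></tmpl>")
# The same content written as real elements.
plain_doc = ("<tmpl><p>Please refer to the <em>index page</em> "
             "for further information</p></tmpl>")

cdata_root = ET.fromstring(cdata_doc)
plain_root = ET.fromstring(plain_doc)

# Inside CDATA the parser sees one opaque string and no child elements.
cdata_children = [el.tag for el in cdata_root.iter() if el is not cdata_root]
# Without CDATA the inline structure is visible to tools.
plain_children = [el.tag for el in plain_root.iter() if el is not plain_root]

print(cdata_children)
print(plain_children)
print(cdata_root.text)
```

In the CDATA case a translation tool receives the tags as ordinary characters, so it cannot protect them or segment around them.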

6
Processing instructions

Avoid processing instructions in translatable text
<para>Use a <?tool name="claw hammer"?> to release the CPU retention catch.</para>
Syntactically weak
Confuse translation memory operations

7
Infinite Naming Schemes

Avoid the use of infinite naming schemes
<resources xml:lang="en">
<err001>Cannot open file 1.</err001>
<hint001>Hint: does file 1 exist.</hint001>
<err002>Incorrect value.</err002>
<hint002>Hint: Must be between 1 and 2.</hint002>
<err003>Connection timeout.</err003>
...
</resources>
No clear element definitions
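
One possible alternative, sketched here under assumptions, is a fixed element name that carries the identifier in an attribute, so that a DTD or schema can define the element once. The `message` element and its `id` and `type` attributes are hypothetical names chosen for this illustration; they do not come from the presentation, and the message texts are abbreviated.

```python
import re
import xml.etree.ElementTree as ET

numbered = """<resources xml:lang="en">
<err001>Cannot open file.</err001>
<hint001>Does the file exist?</hint001>
<err002>Incorrect value.</err002>
</resources>"""

# Hypothetical mapping from element-name prefix to a fixed "type" value.
KINDS = {"err": "error", "hint": "hint"}

def to_fixed_names(root):
    """Rewrite <err001>-style children as <message id=... type=...>."""
    out = ET.Element(root.tag, root.attrib)
    for child in root:
        m = re.fullmatch(r"([a-z]+)(\d+)", child.tag)
        if m is None or m.group(1) not in KINDS:
            raise ValueError(f"unexpected element name: {child.tag}")
        msg = ET.SubElement(out, "message", id=child.tag, type=KINDS[m.group(1)])
        msg.text = child.text
    return out

fixed = to_fixed_names(ET.fromstring(numbered))
print(ET.tostring(fixed, encoding="unicode"))
```

With this shape the set of element names stays finite however many messages are added.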

8
Typographical elements

Avoid the use of "typographical" elements
<para><b>Do not use</b> <br/> type elements.</para>
Bad XML practice.
Causes problems for translators.
Target language text may be in the opposite order.

9
Do not break sentences

Never break a linguistically complete text unit over more than one non-inline element
<para>
<line>This text should not be</line>
<line>broken this way: the translated text may well be in a different order.</line>
</para>

10
XML Translation Standards

LISA - Localization Industry Standards Association http://www.lisa.org
OASIS - Organization for the Advancement of Structured Information Standards http://www.oasis-open.org
W3C - World Wide Web Consortium http://www.w3c.org
OLIF Consortium http://www.olif.net

11
LISA Standards

TMX - Translation Memory Exchange format http://www.lisa.org/tmx
TBX - Termbase Exchange format http://www.lisa.org/tbx
SRX - Segmentation Rules Exchange format http://www.lisa.org/srx
GMX - GILT Metrics Exchange format http://www.lisa.org/gmx

12
OASIS L10N Standards

XLIFF - XML Localization Interchange File Format http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
TransWS - Translation Web Services http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws

13
W3C and OLIF

W3C to start on Localization Directives standard.
OLIF - Open Lexicon Interchange Format http://www.olif.net

14
xmltm

XML Text Memory
A radical new approach to translating XML
documents

15
Computational Linguistic Methodologies

Machine Translation
Translation Memory
Hybrid Linguistic Inferencing Engines
Terminology

16
Translation memory

Advent in early 1980s
Intermediate format
Alignment
Storage
Leveraged memory
Fuzzy matching: statistical
Advantages: cost reduction, consistency
Drawbacks: proofreading, managing memories
No significant advances in technology
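
As an illustration of statistical fuzzy matching against a stored memory, here is a minimal sketch. It uses the similarity ratio from Python's difflib as a stand-in score; this is an assumption for demonstration, not the measure used by any particular translation memory tool. The memory entries and the placeholder targets (TARGET-1 and so on) are invented.

```python
from difflib import SequenceMatcher

# Toy memory: source segment -> stored translation (placeholders, invented).
memory = {
    "Cannot open file.": "TARGET-1",
    "Incorrect value.": "TARGET-2",
    "Connection timeout.": "TARGET-3",
}

def best_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the closest stored segment, or None."""
    if not memory:
        return None
    scored = [
        (SequenceMatcher(None, segment, source).ratio(), source, target)
        for source, target in memory.items()
    ]
    score, source, target = max(scored)
    return (score, source, target) if score >= threshold else None

exact = best_match("Incorrect value.", memory)
fuzzy = best_match("Cannot open the file.", memory)
miss = best_match("zzzz qqqq", memory)
print(exact, fuzzy, miss)
```

A match below 100% still needs a translator to review it, which is the proofreading drawback noted above.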

17
XML namespace

Major new feature of XML compared to SGML
Allows the mapping of different ontological
entities onto the same representation
Allows different ways to look at the same data
Namespaces can be made transparent

18
xmltm namespace

Text Memory namespace
Can be mapped onto any XML document
Vertical view of document in terms of text
segments
Can be totally transparent

19
xmltm namespace
Example of the use of namespace in an XML document
<document xmlns:tm="urn:xml-Intl-tm"> <tm:tm>
<section> <para> <tm:te>
<tm:tu>Namespace is very flexible.</tm:tu>
<tm:tu>It is very easy to use.</tm:tu>
</tm:te>
</para>
20
xmltm namespace
[Diagram: the original document view (doc, title, section, para, text) shown beside the tm namespace view (tm, te, tu). Each text node corresponds to a te element, and each sentence within it to a tu element.]
21
xmltm namespace
[Diagram: in the original document view a text node contains two sentences; in the tm namespace view the same text is a te element containing two tu elements.]
22
xmltm namespace
original document view
<para>
Namespace is very simple. It is easy to use.
</para>
tm namespace view
<para>
<tm:te id="e1">
<tm:tu id="u1.1">Namespace is very simple.</tm:tu>
<tm:tu id="u1.2">It is easy to use.</tm:tu>
</tm:te>
</para>
[Diagram labels: text corresponds to te; each sentence corresponds to a tu.]
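
The mapping from the original view to the tm namespace view can be sketched as follows. This is a minimal illustration, not the xmltm implementation: the namespace URI is taken from the slide as reconstructed, the sentence splitter is deliberately naive, and the function name `add_text_memory` is hypothetical. It only handles paragraphs without inline child elements.

```python
import re
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI as read from the slide; an assumption
ET.register_namespace("tm", TM_NS)

def add_text_memory(para, te_number):
    """Wrap the text of one <para> in tm:te / tm:tu elements."""
    text = (para.text or "").strip()
    # Naive segmentation: split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    para.text = None
    te = ET.SubElement(para, f"{{{TM_NS}}}te", id=f"e{te_number}")
    for i, sentence in enumerate(sentences, start=1):
        tu = ET.SubElement(te, f"{{{TM_NS}}}tu", id=f"u{te_number}.{i}")
        tu.text = sentence
    return para

para = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
add_text_memory(para, 1)
print(ET.tostring(para, encoding="unicode"))
```

A real segmenter would follow language-specific rules such as those exchanged in SRX.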
23
xmltm Text Memory

Author memory
Maintain memory of source text
Authoring statistics
Authoring tool input
Translation memory
Automatic alignment
Maintain perfect link of source and target text
Reduce translation costs

24
xmltm DOM differencing
Source Document -> Updated Source Document
tu id=1 -> tu id=1
tu id=2 -> (deleted)
tu id=3 -> tu id=3
tu id=4 -> tu id=4
tu id=5 -> tu id=7, origid=5 (modified)
tu id=6 -> tu id=6
(none) -> tu id=8 (new)
25
xmltm Author Memory

Namespace aware differencing
Identify changes from the previous version
Unique text unit identifiers are maintained
Modification history
Text units can be loaded into a database
Authoring environment integration
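
Once unique text unit identifiers are maintained, changes between versions can be classified by identifier. The sketch below is not the DOM differencing algorithm itself, which the slides do not describe; it assumes the identifiers and `origid` links already exist and only shows the classification. The data mirrors the differencing slide, with invented placeholder texts.

```python
def diff_text_units(old, new):
    """Classify text units across two versions of a document.

    old maps id -> text; new maps id -> (text, origid), where origid is the
    id of the unit this one replaced, or None.
    """
    replaced = {origid for _, origid in new.values() if origid is not None}
    status = {}
    for uid in old:
        # A unit that vanished and was not superseded counts as deleted.
        if uid not in new and uid not in replaced:
            status[uid] = "deleted"
    for uid, (text, origid) in new.items():
        if uid in old and old[uid] == text:
            status[uid] = "unchanged"
        elif origid is not None or uid in old:
            status[uid] = "modified"
        else:
            status[uid] = "new"
    return status

old_units = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F"}
new_units = {1: ("A", None), 3: ("C", None), 4: ("D", None),
             7: ("E, revised", 5), 6: ("F", None), 8: ("G", None)}
status = diff_text_units(old_units, new_units)
print(status)
```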

26
xmltm Translation Memory

The tm namespace can be used to create XLIFF
files
Automatic alignment of source and target
languages
Allows for more focused translation matching
Perfect matching
Leveraged matching from document - identical text
Leveraged matching from database
Modified text unit matching
Linguistically enhanced fuzzy matching
Non translatable text unit identification
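
The matching types listed above can be read as a cascade. The following sketch shows one possible ordering; the order of the checks, the threshold, the rule for non-translatable text (no alphabetic characters) and the function name `classify` are all assumptions made for illustration, and the targets are placeholders.

```python
from difflib import SequenceMatcher

def classify(unit, previous, database, threshold=0.7):
    """Return (match type, suggested target) for one text unit.

    previous maps id -> (source text, target text) from the last translated
    version; database maps source text -> target text.
    """
    uid, text, origid = unit["id"], unit["text"], unit.get("origid")
    if not any(ch.isalpha() for ch in text):
        return ("non translatable", text)
    if uid in previous and previous[uid][0] == text:
        return ("perfect match", previous[uid][1])
    for source, target in previous.values():
        if source == text:  # identical text elsewhere in the document
            return ("doc leveraged match", target)
    if text in database:
        return ("DB leveraged match", database[text])
    if origid in previous:  # modified unit: compare with its original
        source, target = previous[origid]
        if SequenceMatcher(None, source, text).ratio() >= threshold:
            return ("fuzzy match", target)
    return ("requires translation", None)

previous = {1: ("Open the cover.", "T1"), 3: ("Close the cover.", "T3"),
            5: ("Release the retention catch.", "T5")}
database = {"Switch off the power.": "DB1"}
units = [
    {"id": 1, "text": "Open the cover."},
    {"id": 2, "text": "1098"},
    {"id": 7, "text": "Release the CPU retention catch.", "origid": 5},
    {"id": 8, "text": "Close the cover."},
    {"id": 9, "text": "Switch off the power."},
    {"id": 10, "text": "Remove the fan."},
]
results = {u["id"]: classify(u, previous, database) for u in units}
print(results)
```

Only a perfect match is reused without review; leveraged matches go to proofing and fuzzy matches to translation, as the matching slides indicate.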

27
xmltm translation
Source Document -> XLIFF Document -> Translated Document
tu id=1 -> trans-unit id=1 -> tu id=1
tu id=2 -> trans-unit id=2 -> tu id=2
tu id=3 -> trans-unit id=3 -> tu id=3
tu id=4 -> trans-unit id=4 -> tu id=4
tu id=5 -> trans-unit id=5 -> tu id=5
tu id=6 -> trans-unit id=6 -> tu id=6
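
The step from text units to XLIFF trans-units, each keeping the identifier of its tu, can be sketched as below. The sketch assumes XLIFF 1.2 and its namespace (the presentation does not state a version), and the function name `to_xliff`, the file name and the language codes are illustrative defaults.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"  # XLIFF 1.2 assumed
ET.register_namespace("", XLIFF_NS)

def to_xliff(units, source_lang="en", target_lang="pl", original="doc.xml"):
    """Build a minimal XLIFF tree with one trans-unit per (id, text) pair."""
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", version="1.2")
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for uid, text in units:
        # The trans-unit id repeats the tu id, so the target can be merged back.
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", id=str(uid))
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return xliff

xliff_doc = to_xliff([(1, "Namespace is very simple."), (2, "It is easy to use.")])
print(ET.tostring(xliff_doc, encoding="unicode"))
```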
28
xmltm translated document
[Diagram: the translated document view shown beside the translated tm namespace view. The structure matches the source document (doc, title, section, para with te and tu elements); the text and sentences appear in the target language, labelled "tekst" and "zdanie".]
29
xmltm perfect alignment
Perfect alignment of Source Document and Translated Document text units:
tu id=1 <-> tu id=1
tu id=2 <-> tu id=2
tu id=3 <-> tu id=3
tu id=4 <-> tu id=4
tu id=5 <-> tu id=5
tu id=6 <-> tu id=6
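
Because the translated document keeps the same tu identifiers as the source, alignment reduces to a lookup by identifier. A minimal sketch, assuming the namespace URI read from the earlier slide and using invented sample documents:

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI as read from the slide; an assumption
TU = f"{{{TM_NS}}}tu"

def text_units(xml_text):
    """Map tu id -> text for every tm:tu element in a document."""
    return {tu.get("id"): tu.text for tu in ET.fromstring(xml_text).iter(TU)}

def align(source_xml, target_xml):
    """Pair source and target text units that share an identifier."""
    source, target = text_units(source_xml), text_units(target_xml)
    return [(uid, source[uid], target[uid]) for uid in source if uid in target]

TEMPLATE = ('<doc xmlns:tm="{ns}"><para><tm:te id="e1">'
            '<tm:tu id="u1.1">{a}</tm:tu><tm:tu id="u1.2">{b}</tm:tu>'
            '</tm:te></para></doc>')
source_doc = TEMPLATE.format(ns=TM_NS, a="sentence 1", b="sentence 2")
target_doc = TEMPLATE.format(ns=TM_NS, a="zdanie 1", b="zdanie 2")

pairs = align(source_doc, target_doc)
print(pairs)
```

No statistical alignment step is needed, since the pairing does not depend on sentence order or length.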
30
xmltm perfect matching
Perfect matching of the Updated Source Document against the Matched Target Document:
tu id=1 -> tu id=1
tu id=2 (deleted)
tu id=3 -> tu id=3
tu id=4 -> tu id=4
tu id=7 (modified) -> tu id=7: requires translation
tu id=6 -> tu id=6
tu id=8 (new) -> tu id=8: requires translation
31
xmltm contextual memory
Perfect alignment of Source Document and Translated Document text units:
tu id=1 <-> tu id=1
tu id=2 <-> tu id=2
tu id=3 <-> tu id=3
tu id=4 <-> tu id=4
tu id=5 <-> tu id=5
tu id=6 <-> tu id=6
32
xmltm leveraged DB memory
Perfect alignment of Source Document and Translated Document text units, with a database (DB):
tu id=1 <-> tu id=1
tu id=2 <-> tu id=2
tu id=3 <-> tu id=3
tu id=4 <-> tu id=4
tu id=5 <-> tu id=5
tu id=6 <-> tu id=6
33
xmltm in-document leveraged matching
Perfect matching of the Updated Source Document against the Matched Target Document:
tu id=1 -> tu id=1
tu id=2 (deleted)
tu id=3 -> tu id=3
tu id=4 -> tu id=4
tu id=7 (modified) -> tu id=7: requires translation
tu id=6 -> tu id=6
tu id=8 (new, same text as tu id=3) -> tu id=8: leveraged match, requires proofing
34
xmltm in-document fuzzy matching
Perfect matching of the Updated Source Document against the Matched Target Document:
tu id=1 -> tu id=1
tu id=2 (deleted)
tu id=3 -> tu id=3
tu id=4 -> tu id=4
tu id=7 (modified, origid=5) -> tu id=7: fuzzy match, requires translation
tu id=6 -> tu id=6
tu id=8 (new, same text) -> tu id=8: leveraged match, requires proofing
35
xmltm db leveraged matching
Perfect matching of the Updated Source Document against the Matched Target Document, with a database (DB):
tu id=1 -> tu id=1
tu id=2 (deleted)
tu id=3 -> tu id=3
tu id=4 -> tu id=4
tu id=7 (modified, origid=5) -> tu id=7: fuzzy match, requires translation
tu id=6 -> tu id=6
tu id=8 (new, same text) -> tu id=8: doc leveraged match, requires proofing
tu id=9 -> tu id=9: DB leveraged match, requires proofing
36
xmltm non translatable text
Perfect matching of the Updated Source Document against the Matched Target Document, with a database (DB):
tu id=1 -> tu id=1
tu id=2 (non translatable) -> tu id=2: requires no translation
tu id=3 -> tu id=3
tu id=4 -> tu id=4
tu id=7 -> tu id=7: fuzzy match, requires translation
tu id=6 -> tu id=6
tu id=8 (new, same text) -> tu id=8: doc leveraged match, requires proofing
tu id=9 -> tu id=9: DB leveraged match, requires proofing
37
Traditional Translation Scenario
[Flow diagram. Labels: Publishing, Translation, source text, Extracted text, tm process, Prepared text, Translate, QA, Translated text]
38
xmltm Translation Scenario
[Flow diagram. Labels: Publishing, xml source text, Automatic Process, Extracted text, leveraged matching, tm process, Prepared text, web interface (Web), Translator, Translate, QA, Automatic Process, xml target text]
39
xmltm matching