Open standards in use in localisation: an engineering approach

1
Open standards in use in localisation: an engineering approach
Andrés Vega, DCU, Dublin, Ireland, 12th June 2009
2
Agenda
  • Introduction: Why Standards?
  • Part 1: Unicode and OpenType Fonts
  • Part 2: XML, CMS and DITA
  • Part 3: TMX, XLIFF, TBX and SRX
  • Final thoughts and Q&A
  • About the author and Tek

3
Why Standards?
  • Allow faster technology development
  • Assembling standard components
  • Concentrating effort on specialisation
  • Increase competition, focused on features (not
    compatibility)
  • Facilitate interoperability
  • Open standards allow information to be shared
  • (Not locked into proprietary standards)
  • Complementary tools may be developed
  • Choose the right tool/resource for each job
  • Guarantee future compatibility
  • Provide conformance validation mechanisms
  • Standard verification serves as a QA procedure

4
Part 1: Encodings and Unicode
  • Terminology
  • Pre-Unicode Encodings (ASCII, ANSI, Multibyte)
  • Unicode
  • Unicode Workflow example
  • Unicode Transition issues (FrameMaker)
  • Unicode Transition issues (QuarkXPress)
  • OpenType fonts

5
Terminology
  • Coded Character Set
  • (Set of characters associated with codes)
  • Defined in RFC 2978
  • Code point (number associated with a
    character)
  • Encoding / Charset
  • (Coded character set with a character
    encoding scheme)
  • Character mapping
  • (Relation between code points of two
    different encodings)
  • Alias (Alternate name for an encoding)
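The terms above can be illustrated with Python's built-in codecs. This is only a sketch: `cp1250` and `iso8859_2` are Python's names for Windows Central European and ISO Latin-2, and `map_byte` is a hypothetical helper. It shows a character's code point, the byte each charset uses for it, and the character mapping between the two charsets.

```python
# Code point vs. encoded byte vs. character mapping, using Python's built-in codecs.

def code_point(ch: str) -> str:
    """Return the Unicode code point of a character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

def map_byte(byte: int, src: str, dst: str) -> int:
    """Hypothetical helper: map one byte from charset `src` to charset `dst`
    by going through the Unicode character both charsets share."""
    return bytes([byte]).decode(src).encode(dst)[0]

for ch in "\u0141\u015b":                       # Ł and ś
    cp1250 = ch.encode("cp1250")                # Windows Central European
    latin2 = ch.encode("iso8859_2")             # ISO-8859-2 (Latin-2)
    print(ch, code_point(ch), cp1250.hex().upper(), latin2.hex().upper())
# Ł U+0141 A3 A3   same byte in both charsets
# ś U+015B 9C B6   different bytes, so a character mapping is needed

print(hex(map_byte(0x9C, "cp1250", "iso8859_2")))   # 0xb6
```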

6
Chronology
  • Proprietary Encodings (manufacturer-dependent)
  • ASCII (ANSI X3.4-1968, 7-bit encoding)
  • ASCII national variants (ISO 646)
  • MS-DOS code pages (1980s, 8-bit encoding)
  • Double-byte and multibyte encodings
  • ISO-8859-n: a family of 8-bit encodings defined by ISO
  • Windows CPs
  • Unicode

7
ASCII -> ANSI
  • ASCII (American Standard Code for Information
    Interchange): 128 characters, US English
    only
  • Positions 0 - 31 and 127 are reserved for control
    characters. They have standardized names and
    descriptions, but usage varies.
  • American English characters range from 32
    (space) to 126 (tilde ~).
  • There are several national variants of ASCII
    (still only 128 characters). In these variants, some
    special characters have been replaced by national
    letters (and other symbols).
  • Positions 128 - 255 are not used in ASCII. They
    belong to the ANSI code pages
  • ANSI code pages extend the ASCII character set to
    support specific languages/scripts. There
    are five main groups:
  • Windows CPs (and also old MS-DOS CPs)
  • Mac CPs
  • ISO-8859-n CPs (n=1, Latin-1, to n=16)
  • Other ASCII-compatible CPs (KOI-8, ASMO, ...)
  • IBM EBCDIC
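The ranges above can be made concrete in a few lines of Python. `ascii_class` is a hypothetical helper; the second part decodes the same upper-half byte with three Windows code pages to show that bytes 128-255 only have a meaning once a code page is chosen.

```python
def ascii_class(byte: int) -> str:
    """Classify a byte value using the ASCII ranges described above."""
    if not 0 <= byte <= 255:
        raise ValueError("not a byte value")
    if byte <= 31 or byte == 127:
        return "control"
    if byte <= 126:
        return "printable"      # 32 (space) .. 126 (tilde)
    return "extended"           # 128-255: not ASCII, meaning depends on the code page

# The same byte 0xF0 under three Windows code pages
for cp in ("cp1252", "cp1250", "cp1251"):
    print(cp, bytes([0xF0]).decode(cp))
# cp1252 ð   (Western)
# cp1250 đ   (Central European)
# cp1251 р   (Cyrillic)
```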

8
ASCII National Variants (ISO 646)
9
Visual comparison: Western codepages
  • ASCII (ANSI_X3.4-1968), Windows Western
    (CP1252), ISO Latin-1 (8859-1), UNIX
  • EBCDIC (Western 500V1), Mac Roman
  • New line
  • Unix: LF (0A)
  • Mac: CR (0D)
  • Win/DOS: CR LF (0D 0A)
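Mixed new-line conventions are a common source of segmentation and diff noise. A minimal Python sketch that reports which convention a byte string uses and normalises it to Unix LF; `newline_style` and `to_unix` are hypothetical helper names.

```python
def newline_style(data: bytes) -> str:
    """Report which new-line convention a byte string uses."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf        # bare LF (Unix)
    cr = data.count(b"\r") - crlf        # bare CR (classic Mac)
    found = [name for name, n in (("Win/DOS CR LF", crlf), ("Unix LF", lf), ("Mac CR", cr)) if n]
    if not found:
        return "none"
    return found[0] if len(found) == 1 else "mixed"

def to_unix(data: bytes) -> bytes:
    """Normalise CR LF and bare CR to LF (CR LF first, so it is not doubled)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

print(newline_style(b"line 1\r\nline 2\r\n"))   # Win/DOS CR LF
print(newline_style(b"line 1\r\nline 2\n"))     # mixed
```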

10
Examples with codepoints

11
Then came Unicode
  • Challenges
  • Too Many Character sets
  • Three great families (ANSI, DBCS, BiDi) and three
    application types
  • Multilingual data (storage, display, processing)
  • Cross-platform and character set
    inter-conversion issues
  • Information loss: WROCŁAW becomes WROC?AW
  • Fallback (CE text within ASCII): WROCŁAW becomes WROCLAW
  • Cross-platform (Mac): WROCAW
  • Misreading (Trad. Chinese): WROCxW
  • What Unicode is
  • Universal character encoding standard by the
    Unicode Consortium
  • 21-bit character set with 3 main encoding forms
    (UTF-32, UTF-16, UTF-8)
  • Not just the character set
  • Character properties (Name, Category, Casing,
    Decomposition, ...)
  • Annexes, Technical Reports (Comparison,
    Sorting, Hyphenation, ...)
  • What Unicode is not
  • A glyph repertoire: the glyphs provided are examples,
    not canonical!
  • Unicode alone does not provide language support!
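The WROCŁAW information-loss case, and the point that Unicode is more than a character set, can both be checked with Python's standard `unicodedata` module. This is an illustration only: `city` is a sample string, and the fallback (WROCLAW) is not reproduced because Ł has no Unicode decomposition, so a plain-ASCII fallback needs its own mapping table.

```python
import unicodedata

city = "WROC\u0141AW"   # WROCŁAW; U+0141 is LATIN CAPITAL LETTER L WITH STROKE

# Information loss: Ł has no ASCII equivalent, so a lossy conversion inserts "?"
print(city.encode("ascii", errors="replace"))   # b'WROC?AW'

# Properties Unicode defines for each character, beyond its code point
for ch in ("\u0141", "\u00e9"):                 # Ł and é
    print(unicodedata.name(ch),                 # character name
          unicodedata.category(ch),             # general category (Lu/Ll = upper/lower-case letter)
          ch.lower(), ch.upper(),               # case mappings
          repr(unicodedata.decomposition(ch)))  # decomposition mapping ('' if none)
# LATIN CAPITAL LETTER L WITH STROKE Lu ł Ł ''
# LATIN SMALL LETTER E WITH ACUTE Ll é É '0065 0301'
```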

12
Unicode (Benefits and Issues)
  • Unicode benefits
  • One vendor-neutral encoding standard for all
    languages
  • Stable, but it keeps evolving
  • Multilingual rendering/storage/transfer (no
    conversion, no corruption)
  • Unified content processes (globalised, web-
    enabled)
  • Internationalisation
  • Easy conversion from/to/between legacy codepages
  • Issues or drawbacks with Unicode
  • Size (ANSI: 1 byte, DBCS: 1-2 bytes, UTF-8: 1-4 bytes,
    UTF-16: 2 or 4 bytes per character)
  • UniHan related (font dependence, gaiji and
    variants)
  • Inconsistent implementation choices across
    scripts
  • Several ways to encode the same accented character
    (pre-composed, or base letter plus combining mark)
  • Implementation issues
  • Script enabling requires input, display,
    storage, retrieval and output support
  • Bidirectional support and complex script issues
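Because the same accented letter can be stored pre-composed or as a base letter plus a combining mark, text should be normalised before it is compared or looked up in a TM. A minimal sketch with Python's `unicodedata.normalize`; `same_text` is a hypothetical helper name, and NFC is one reasonable default, not the only choice.

```python
import unicodedata

precomposed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"     # e + COMBINING ACUTE ACCENT

print(precomposed == decomposed)             # False: different code point sequences
print(len(precomposed), len(decomposed))     # 1 2

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after Unicode normalisation (NFC by default)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_text(precomposed, decomposed))    # True
```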

13
Unicode encodings
  • Unicode encoding forms: examples (text: This is “Unicode text”)
  • UTF-16 Little Endian (least significant byte
    first)
  • Seen as ANSI: ÿþT h i s i s U n i c o d e t e x t
  • FFFE540068006900730020006900730020001C2055006E00690063006F0064006500200074006500780074001D20
  • UTF-16 Big Endian (most significant byte first)
  • Seen as ANSI: þÿ T h i s i s U n i c o d e t e x t
  • FEFF00540068006900730020006900730020201C0055006E00690063006F0064006500200074006500780074201D
  • UTF-8 (byte-based encoding, uses 1, 2, 3 or 4
    bytes per character)
  • Seen as ANSI: ï»¿This is â€œUnicode textâ€
  • EFBBBF5468697320697320E2809C556E69636F64652074657874E2809D
  • BOM (Byte Order Mark): character U+FEFF
  • UTF-16LE: FF FE (needed to detect the byte order)
  • UTF-16BE: FE FF (needed to detect the byte order)
  • UTF-8: EF BB BF (optional, can be omitted)
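The hex dumps on this slide can be regenerated with Python's codecs, which is a quick way to check an encoding form and its BOM. A sketch: `sniff_bom` is a hypothetical helper, and it deliberately ignores UTF-32 BOMs (FF FE 00 00 would be misreported as UTF-16LE).

```python
import codecs
from typing import Optional

TEXT = "This is \u201cUnicode text\u201d"   # This is “Unicode text”

encoded = {
    "UTF-16LE": codecs.BOM_UTF16_LE + TEXT.encode("utf-16-le"),
    "UTF-16BE": codecs.BOM_UTF16_BE + TEXT.encode("utf-16-be"),
    "UTF-8": TEXT.encode("utf-8-sig"),       # utf-8-sig prepends EF BB BF
}
for name, data in encoded.items():
    print(name, data.hex().upper())

def sniff_bom(data: bytes) -> Optional[str]:
    """Guess the encoding form from the BOM; None if there is no BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return None
```

Without a BOM (for example, UTF-8 files that omit it), the encoding has to come from metadata or a heuristic instead.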

14
Unicode streamlines workflows
  • Pre-Unicode Workflow (FrameMaker 7)
  • Risk of character corruption in all orange steps
    (the middle three groups)
  • The final document has issues in TOC and index
    generation and in searches
  • Unicode Workflow (FrameMaker 8)

[Diagram: workflow stages File Preparation, Translation Review, Back Conversion, DTP and Merge. Pre-Unicode: the English FrameMaker file with design fonts is split into files to localise as Western, CE, Cyrillic, Turkish, Greek and Baltic RTF with matching fonts, converted back to FrameMaker files (design, CE, Cyrillic, Turkish, Greek and Baltic fonts) and merged into a multilingual target document with several ANSI fonts. Unicode: English FrameMaker with design fonts, UTF-8 XML, UTF-16 TTX, then a UTF-8 FrameMaker file with the original design fonts, giving a multilingual document with the design fonts.]
15
Example 1: ANSI codepage issues in RTF
  • Trados saves .doc files as RTF files before
    processing
  • Word .doc segmented file: Apie šį vadovą
    (correct Lithuanian)
  • .RTF saved on an English PC: Apie ðá vadovà
  • Header \rtf1\adeflang1025\ansi\ansicpg1252\uc1
  • \adeff0\deff0\stshfdbch13\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe1042
  • \fonttbl\f0\froman\fcharset0\fprq2\\panose
    02020603050405020304Times New Roman
  • \f1\fswiss\fcharset0\fprq2\\panose
    020b0604020202020204Arial
  • ()
  • 100\gt\rtlch\fcs1 \af0 \ltrch\fcs0
    \cf1\lang1063\langfe1033\loch\af1015
  • \hich\af1015\dbch\af0\langnp1063\insrsid13789330\charrsid337550 Apie \'f0\'e1 vadov\'e0
  • .RTF saved on a Baltic PC: Apie šį vadovą
  • Header \rtf1\ansi\ansicpg1252\uc1
    \deff0\deflang1033\deflangfe1033\fonttbl
  • \f0\froman\fcharset186\fprq2\\panose
    02020603050405020304Times New Roman
  • \f1\fswiss\fcharset186\fprq2\\panose
    020b0604020202020204Arial ()
  • ()
  • 100\gt\cf1\lang1063\langfe1033\loch\af2462\hich
    \af2462\dbch\af0\langnp1063
  • Apie \'f0\'e1 vadov\'e0
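Both RTF files store the same escaped bytes (\'f0\'e1 ... \'e0). What differs is the font charset (\fcharset0 vs \fcharset186), which decides the code page used to read them. The sketch below isolates only that decoding step: `rtf_run_to_bytes` is a hypothetical helper, and a real RTF reader also resolves \ansicpg, font switches and Unicode (\u) escapes.

```python
def rtf_run_to_bytes(run: str) -> bytes:
    """Hypothetical helper: turn an RTF text run with \\'hh escapes into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(run):
        if run.startswith("\\'", i):
            out.append(int(run[i + 2:i + 4], 16))   # \'hh -> one byte
            i += 4
        else:
            out += run[i].encode("ascii")            # plain 7-bit character
            i += 1
    return bytes(out)

# Font charsets from the two headers above, mapped to Windows code pages
FCHARSET_TO_CODEPAGE = {0: "cp1252", 186: "cp1257"}  # 0 = ANSI (Western), 186 = Baltic

run = r"Apie \'f0\'e1 vadov\'e0"
raw = rtf_run_to_bytes(run)
for fcharset, codepage in FCHARSET_TO_CODEPAGE.items():
    print(fcharset, raw.decode(codepage))
# 0 Apie ðá vadovà
# 186 Apie šį vadovą
```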

16
Unicode simplifies processes
[Diagram: connection setup between the client and Tek: hardware VPN for Mac OS X Japanese and Mac OS 9 Chinese (STEPXpress), software VPN client, Internet, router and STEP server, plus a PC setup (Quark 7, OpenType). The specific setup for Chinese and Japanese was removed after Quark's migration to Unicode.]
17
Example 2: Quark Western PC mapping issue
  • Quark 6: Polish text imported into a file on a PC
  • Displays OK after a CE font is applied (lower half)

18
Example 2: Quark Western PC mapping issue
  • But when the file is opened on a Mac
  • Extended characters are read as if they were
    Windows Western.
  • Some can be mapped to Mac Roman, but the resulting
    Mac Roman characters do not correspond to the intended
    CE characters in the Mac Latin 2 encoding.
  • Other characters cannot be mapped and are replaced
    by fallbacks
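The failure can be simulated with Python codecs. This is an assumption-laden sketch, not Quark's actual conversion logic: Polish text stored as CE (cp1250) bytes is misread as Windows Western, then converted to Mac Roman, where characters Mac Roman lacks become "?" fallbacks.

```python
polish = "za\u017c\u00f3\u0142\u0107"      # zażółć
stored = polish.encode("cp1250")          # bytes written with a CE code page

# Opened on the Mac, the extended bytes are read as Windows Western...
misread = stored.decode("cp1252")
print(misread)                            # za¿ó³æ

# ...then converted to Mac Roman: ¿, ó and æ exist there (as the wrong letters),
# but ³ does not, so it is replaced by a "?" fallback
mac_bytes = misread.encode("mac_roman", errors="replace")
print(mac_bytes.decode("mac_roman"))      # za¿ó?æ
```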

19
Unicode transition issues
  • Transition issues
  • Mixed content: legacy and UTF-8 (FrameMaker)
  • FM7 -> FM8 update: English is seen OK, but importing
    the old template variables corrupts the ANSI
    variables (filter version issue)
  • Localisation tools, filters, etc. not fully
    adapted or tested
  • Example: style names containing extended
    characters
  • New filter for FrameMaker 8: English names
    are OK (UTF-8 = ASCII)
  • German-designed file: the filter does
    not accept UTF-8 style names
  • Backwards conversions: Unicode version saved as
    non-Unicode version

[Diagram: file states: ANSI content, ANSI variables, ANSI template; UTF-8 content, ANSI variables, ANSI template; UTF-8 content, corrupt variables, ANSI template; TTX]
20
Example 3: Trados 7 TM imported into Trados 6
  • Trados 7 export is UTF-8, but Trados 6 does not
    recognize it and imports it as ANSI
  • Issue seen as UTF-8
  • Issue seen as ANSI
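The "seen as ANSI" symptom is UTF-8 bytes decoded with a Windows code page. A minimal Python sketch of a safer import step, assuming the heuristic "honour a BOM, try strict UTF-8, then fall back to cp1252"; `decode_tm_export` is a hypothetical name. The heuristic works because legacy ANSI text with extended characters is rarely valid UTF-8, but it is not a guarantee.

```python
def decode_tm_export(data: bytes) -> str:
    """Hypothetical import step: BOM first, then strict UTF-8, then Windows Western."""
    if data.startswith(b"\xef\xbb\xbf"):
        return data[3:].decode("utf-8")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")

export = "Traducci\u00f3n".encode("utf-8")     # UTF-8 export without a BOM
print(export.decode("cp1252"))                 # TraducciÃ³n  (what an ANSI-only import shows)
print(decode_tm_export(export))                # Traducción
```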

21
OpenType fonts
  • Challenges
  • Two font families (TrueType and PostScript),
    two font technologies
  • Inter-platform issues
  • Benefits of OpenType
  • Supports large character sets (Unicode,
    multiscript)
  • Glyph variants supported: solves Unicode UniHan
    ambiguities
  • Supports advanced typography
  • Font embedding control
  • Features
  • One font format that can carry either TrueType or
    PostScript (CFF) outline data
  • Glyph substitution
  • Glyph positioning
  • Script and language information

22
Part 2: XML and CMS
  • Markup languages
  • XML
  • CMS
  • DITA

23
Markup languages: SGML, HTML, XML
  • Markup text
  • Plain text + tags
  • Tags define the structure, layout and/or
    formatting of the text
  • Markup languages timeline
  • GML: IBM, 1978
  • SGML: ISO standard, 1986 (meta-language to create
    markup languages)
  • HTML: HyperText Markup Language (hypertext
    links)
  • - Derived from SGML, 1980-90
  • - HTML 2.0: first proper HTML specification
    (1995)
  • XML: eXtensible Markup Language (1998)
  • Other markup languages
  • XHTML, DHTML (HTML + CSS + Scripting + DOM), ...
  • XML-based standards: DITA, S1000D, TMX, TBX,
    XLIFF, ...
  • Other: RTF, MIF, DocBook, TeX, ...

24
HTML vs XML: visual comparison of markup
  • HTML
  • Declaration: does not exist
  • Doctype: HTML or none
  • Elements
  • The HTML element can be omitted
  • Defined by pairs of start-end tags. Some tags
    may have no closing tag (<p>, <hr>)
  • Names are case insensitive
  • Tag pairs can be interwoven
  • <b><I>bold and italic</B></i>
  • Attributes
  • Names and sometimes values already defined by
    the standard
  • Quotes around values are optional
  • XML
  • Declaration: recommended (required in XML 1.1)
  • Doctype: may link to a DTD or Schema
  • Elements
  • Only one root element, and it is required
  • All tags must be closed (or self-closed)
  • <LineBreak/>
  • - Element names are case sensitive
  • - Tags have to be correctly nested
  • <b><i>bold and italic</i></b>
  • Attributes
  • Any names and values can be defined
  • - All attribute values must be quoted, with single or double quotes
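The XML rules in the right-hand column are exactly what a parser enforces. A small sketch using Python's standard `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper, and it only checks well-formedness (ElementTree does not validate against a DTD).

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<b><i>bold and italic</i></b>"))   # True: correctly nested
print(is_well_formed("<b><I>bold and italic</B></i>"))   # False: interwoven tags, case mismatch
print(is_well_formed("<p>para<hr></p>"))                 # False: <hr> never closed
print(is_well_formed("<p>para<hr/></p>"))                # True: self-closed
print(is_well_formed("<img src=logo.png/>"))             # False: unquoted attribute value
```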

25
HTML example: translator's view
Edit view within Trados TagEditor
26
XML example: translator's view
Edit view within Trados TagEditor
27
XML
  • eXtensible Markup Language (Meta-language for
    markup languages)
  • Used to define, share and validate information
    (data and structure)
  • An XML document contains
  • XML declaration: <?xml version='1.1'
    encoding='UTF-8' standalone='yes'?>
  • Document Type declaration(s): <!DOCTYPE root
    SYSTEM "rootDTD.dtd">
  • Elements: <element attribute="value">Content</element>
    or <element/>
  • Other: comments, entities/NCRs, processing instructions,
    conditional sections
  • Specific syntax (well-formed XML)
  • Only one root element
  • Tags in nested open/close pairs: <tag> </tag>
  • Element names obey certain naming conventions
  • Elements may contain attributes
  • DTD (valid XML)
  • Defines rules on structure, valid tags and
    attributes, and valid data
  • Guarantees reliable data exchange between
    different systems
  • Can be included in each XML document, but is normally
    external

28
XML (General benefits)
  • Simple (XML is plain text) but can embed any
    content type
  • Platform independent, Unicode encoded
  • Content is easily validated, so cross-platform data
    transfer is safer
  • Structured (defines structural relationships
    within data)
  • Open and extensible: a well-supported standard
  • Metadata and version control capable
  • Format independent
  • Powerful data transformation tools (XSL):
    multiple outputs

29
XML (Localisation benefits and issues)
  • Localisation benefits
  • Structured: content can be detached and merged (updates
    handling)
  • XML support easily implemented in localisation
    processes/tools
  • Easy validation against the DTD
  • Extensible: XML-based localisation standards
    XLIFF, TMX, TBX, ...
  • Metadata (source/target version control,
    updates, element status)
  • Format independent
  • Single-sourcing (localized once, published into
    many formats)
  • Source content and formatting changes are not
    interdependent
  • Content localisation and proofreading before
    formatting (DTP)
  • Issues
  • Transition needs to be well planned and
    performed
  • Segmentation issues (DTD needs to be
    multilingual-aware)
  • Source: For more information see page <xref
    refpage="xxx">
  • Japanese: ????<xref refpage="xxx">??????????

30
Content Management Systems
  • What are Content Management Systems?
  • Sets of tools configured around a data
    repository (database)
  • Designed to manage information in small
    meaningful bits
  • Product based
  • Topic based
  • Information is isolated from format
  • Store localized content layers (as other
    alternative content layers)
  • Provide tools for
  • Consistent content authoring (Style and
    Terminology)
  • Version control
  • Change tracking
  • Workflow capabilities

31
CMS (Benefits)
  • General benefits
  • Granularity (no redundancy)
  • Reuse (content reuse and multi output)
  • Improved Quality and Consistency
  • Single-source and multi-publishing
  • Easy rebranding/reformatting
  • Metadata info and version control
  • Workflow and Automation
  • Localisation benefits
  • Workflow status control features
  • Localisation of updates via content deltas:
    improved time-to-market
  • Localisation independent from output format
    (better matching)
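"Localisation of updates via content deltas" boils down to sending only the elements that are new or changed since the last localised version. A minimal Python sketch, assuming each CMS element has a stable id (the ids and texts below are hypothetical); a real CMS would also track deletions, moves and version numbers.

```python
from typing import Dict

def delta(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Return the elements (keyed by a hypothetical CMS element id) that are new
    or whose source text changed since the last localised version."""
    return {elem_id: text for elem_id, text in new.items() if old.get(elem_id) != text}

v1 = {"t1": "Press Start.", "t2": "Select a language.", "t3": "Close the lid."}
v2 = {"t1": "Press Start.", "t2": "Select your language.",
      "t3": "Close the lid.", "t4": "Restart the device."}

print(delta(v1, v2))   # {'t2': 'Select your language.', 't4': 'Restart the device.'}
```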

32
CMS (Issues)
  • Issues
  • Authoring for reuse (topic model, single-source,
    cross-reference)
  • Segmentation issues
  • LF characters (0A) not validated: segmentation
    issue
  • Localisation readiness
  • CMS must be multilingual-enabled (storage, I/O,
    processing)
  • Localisation workflow support
  • Strong version control and version rollback
  • Capability to export up-to-date paired TM
    content
  • Integration with LQA tools
  • No ROI increase in the short run (DTP is
    still needed!)

33
CMS Localisation Workflow
[Diagram: CMS localisation workflow across the client, Tek and the client's validators. Steps shown: select only delta content from the CMS (XML); translation (TTX format); revision (TTX format); content validation in tracked-changes RTF, prepared for proofreading (colour-coded RTF format); insertion of validation changes (TTX, TMs); full document in XML; preprocessing of XML; layout consistency validation in PDF file; import to FrameMaker; DTP in FrameMaker; delivery in FrameMaker.]
34
DITA
  • DITA (Darwin Information Typing Architecture)
  • Topic-based XML framework for writing and
    delivering information
  • Developed by IBM (1999-2000) to replace the
    complex IBMDoc format
  • Later became a public OASIS standard (2005)
  • Fast implementation in authoring and content
    management tools
  • The DITA model consists of:
  • A Document Type Definition (DTD)
  • Specifies the base DITA types, their elements and how
    they can be defined
  • (the base DITA information type is the Topic,
    specialised into Concept, Task and Reference)
  • A set of XSLT stylesheets that control the
    output
  • Writers use them in conjunction with an XML
    processor to convert DITA documents to more usable
    formats, such as HTML or PDF.

35
DITA (Components)
  • DITA topics
  • XML elements that contain the information of each
    information 'topic'. Each topic can consist of a
    concept, a related task with its action steps and
    a set of references to other topics.
  • DITA Maps
  • XML elements that establish hierarchical
    relationships among topics.
  • Relationship Tables
  • XML elements that establish non-hierarchical
    relationships among topics.
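A DITA map expresses hierarchy simply by nesting <topicref> elements. The Python sketch below walks that nesting; the file names are hypothetical, and the DOCTYPE a real .ditamap would carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

DITAMAP = """<map>
  <topicref href="installing.dita">
    <topicref href="requirements.dita"/>
    <topicref href="setup.dita"/>
  </topicref>
  <topicref href="troubleshooting.dita"/>
</map>"""

def topic_outline(map_xml: str):
    """Walk nested <topicref> elements and return (depth, href) pairs in document order."""
    def walk(elem, depth):
        for ref in elem.findall("topicref"):
            yield depth, ref.get("href")
            yield from walk(ref, depth + 1)
    return list(walk(ET.fromstring(map_xml), 0))

for depth, href in topic_outline(DITAMAP):
    print("  " * depth + href)
```

Non-hierarchical links (relationship tables) would need a separate pass over <reltable> elements.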

36
DITA (Benefits)
  • DITA aims for
  • Reuse: not only of content, but also of design
    and processes.
  • Content reuse: being topic-based, each element
    has complete meaning and can be separately
    created and maintained, yet it can be combined
    with other topics for different outputs.
  • Single-sourcing: form is separated from
    content.
  • Design reuse: allows information sharing while
    making it easy to extend the design for specific
    needs.
  • Process reuse: uses overrides to inherit all
    basic and intermediate processes and still allow
    for custom processes when needed.
  • Standardization: intended to last without major
    rework.
  • Strongly typed: a strong but generic core that can
    be used as a fallback for light implementations.
  • Flexible through specialization: new types can be
    created from the core types, so specializations can
    be defined and implemented for specific uses.

37
DITA Example
38
Part 3: Interchange formats
  • TMX
  • XLIFF
  • TBX
  • SRX

39
TMX
  • What is TMX?
  • Translation Memory eXchange
  • Standard by LISA (Localization Industry Standards
    Association)
  • Provides a standard method for describing TM
    data
  • XML-compliant (validated against the TMX DTD)
  • Uses other ISO standards for date, time, language
    and country codes
  • Consists of:
  • Container format specification
  • Translation unit elements <tu>
  • Optional format description elements (font
    change, ...)
  • Subflows (footnotes, index entries)
  • Low-level meta-markup format for segment content
  • Segment element <seg>
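The container structure above (<tu> holding one <tuv> per language, each with a <seg>) is simple to read with standard XML tools, which is much of TMX's value for moving TMs between tools. A Python sketch, assuming a minimal TMX 1.4 file: the header attribute values are placeholders, and `tm_pairs` is a hypothetical helper that ignores inline codes inside <seg>.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"   # how ElementTree names xml:lang

TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Translatable content.</seg></tuv>
      <tuv xml:lang="es"><seg>Contenido traducible.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def tm_pairs(tmx_bytes: bytes, src: str, tgt: str):
    """Return (source, target) segment pairs for one language pair."""
    root = ET.fromstring(tmx_bytes)
    pairs = []
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs

print(tm_pairs(TMX.encode("utf-8"), "en-US", "es"))
```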

40
TMX (Benefits)
  • Transfer TM assets across tools/vendors
  • Prevents character corruption (Unicode)
  • Provides clients with control over their
    translated assets
  • Non-proprietary and vendor neutral
  • Can be integrated with LQA tools
  • Provides Translators/Vendors with freedom of
    tool choice
  • Specialized tools share TM assets
  • Tools may be outdated, assets will not
  • Facilitates work distribution/outsourcing

41
TMX (Issues)
  • Issues
  • Tag handling issues
  • TMX DTD cannot validate inline codes
  • TMX compliance level varies
  • Segmentation issues
  • Different segmentation rules on different CAT
    tools
  • Sentence-based (TM) vs field-based (CMS,
    database export)
  • Consequence: reduced translation leverage

42
TMX (Examples)
TMX Version 1.4b (exported from Trados 7)
43
TMX (Examples)
Translation unit in TMX 1.1 format from Trados 7
Translation unit in TMX 1.4b format from Trados 7
44
XLIFF
  • XML Localisation Interchange File Format
  • Standard by OASIS
  • Tool-neutral, XML-based standard localisation
    resource container format
  • To store/transfer/manipulate localizable
    content, context and other info
  • Built-in support for CAT tools and related
    standards (TBX, TMX)
  • Features
  • Translation suggestions (TM, glossary, MT) to
    approve or edit
  • Metadata: translate flag, notes, context info,
    version
  • Hierarchical data structures
  • Abstraction of formatting and inline codes
  • Structural formatting stored in the skeleton
    file
  • Inline formatting can be dealt with in two ways:
  • Replaced by <g> (paired) and <x/> (isolated) tags
    (OpenTag style)
  • Encapsulated in <bpt>/<ept> (paired), <it> or <ph>
    (isolated) tags

45
XLIFF (Description)
  • Separates localizable and non-localizable
    content
  • Non-localizable: skeleton (separate or embedded)
  • Localizable: 'file' elements with header
    (metadata) and body
  • Body can contain 'trans-unit' and 'bin-unit'
    elements
  • Each trans-unit can have:
  • <trans-unit id="abc123" resname="resourceID"
    restype="string" translate="yes">
  • unique id, resource id, resource type,
    translate yes/no
  • <source xml:lang="en-US">Translatable
    content.</source>
  • Translatable content: source and language
  • <target xml:lang="es" state="needs-review-translation">Traducción.</target>
  • Current translation and its state
  • <alt-trans match-quality="100" tool="TM">
    <source>Translatable content.</source>
    <target xml:lang="es">Contenido
    traducible.</target></alt-trans>
  • alt-trans: translation suggestion(s)
  • </trans-unit> (closing tag)
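Because XLIFF keeps state and suggestions inside each trans-unit, side processes such as review or LQA can query them directly. A Python sketch built on the trans-unit above, assuming XLIFF 1.2 (namespace urn:oasis:names:tc:xliff:document:1.2); the `original` file name is hypothetical, `review_queue` is a hypothetical helper, and it assumes match-quality holds a plain number.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.properties" source-language="en-US" target-language="es" datatype="plaintext">
    <body>
      <trans-unit id="abc123" resname="resourceID" restype="string" translate="yes">
        <source xml:lang="en-US">Translatable content.</source>
        <target xml:lang="es" state="needs-review-translation">Traducci\u00f3n.</target>
        <alt-trans match-quality="100" tool="TM">
          <source>Translatable content.</source>
          <target xml:lang="es">Contenido traducible.</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def review_queue(xliff_xml: str):
    """List translatable units whose target still needs review, with the best suggestion."""
    root = ET.fromstring(xliff_xml)
    queue = []
    for tu in root.iterfind(".//x:trans-unit", NS):
        if tu.get("translate", "yes") == "no":
            continue
        target = tu.find("x:target", NS)
        if target is None or not target.get("state", "").startswith("needs-review"):
            continue
        alts = tu.findall("x:alt-trans", NS)
        best = max(alts, key=lambda a: float(a.get("match-quality", "0")), default=None)
        suggestion = best.findtext("x:target", namespaces=NS) if best is not None else None
        queue.append((tu.get("id"), target.text, suggestion))
    return queue

print(review_queue(XLIFF))
```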

46
XLIFF (Benefits for translation)
  • Benefits for the translation process
  • One common format on which to translate
  • One (or few) translatable document
  • Control on Translatable/Non-translatable content
  • Better information handling (context, notes,
    metadata)
  • Better TM matching due to formatting abstraction
  • Concurrent tool processing visible at review
    stage
  • Support for all localisation phases
  • Supports metrics info on each trans-unit

47
XLIFF (Other Benefits and Drawbacks)
  • Benefits for localisation tool developers
  • Common platform for tool developers to write to
  • Easy adoption of new formats (new filters to
    XLIFF)
  • All generic XML processing benefits
  • Drawbacks
  • Conversion tools needed into XLIFF and back
  • Many XLIFF features are not implemented by most
    tools
  • Segmentation is inherent to XLIFF file
    generation
  • As opposed to tailored tools, WYSIWYG is
    difficult to attain

48
XLIFF Workflow
  • No XLIFF Scenario
  • XLIFF Scenario

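[Diagram. No-XLIFF scenario, "Many formats!": .mif, .xml, .htm and .rtf handled in an SGML editor; .dll, .rc and .resx in a software editor. XLIFF scenario, "Many filters!": the same formats (.mif, .xml, .htm, .rtf, .dll, .rc, .resx) are converted to XLIFF, with editing and LQA working on the XLIFF.]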
49
LISA terminology exchange standard TBX
  • What is TBX?
  • Term Base eXchange standard by LISA
  • XML based, vendor-neutral, open standard
  • Why TBX?
  • Terminology handled using proprietary standards
  • Difficult to share
  • Difficult to develop tools to enhance term
    adherence
  • Glossary format choice linked to translation
    tool
  • Glossary usually maintained by the LSP
  • Limited client control

50
TBX (Benefits and Implementation status)
  • Benefits
  • Better control of terminology (source
    consistency)
  • Improved quality
  • Improved consistency
  • Improved terminology control at target
  • Reduced glossarisation effort (localisation
    phase)
  • Master provided with source
  • Allows automated QA checks
  • Platform and tool independent glossaries (global
    consistency)
  • Unify terminology across platforms/formats/vendors
  • Current status
  • TBX Basic (Lighter approach)
  • TBX Checker
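"Allows automated QA checks" can be as simple as verifying that each approved source term present in a segment has its approved target term in the translation. A Python sketch, assuming the term pairs have already been read from a TBX file into a dict (the glossary below is hypothetical); matching is naive case-insensitive whole-word search, with no inflection handling.

```python
import re
from typing import Dict, List

def term_issues(source: str, target: str, glossary: Dict[str, str]) -> List[str]:
    """Flag approved terms whose target equivalent is missing from the translation."""
    issues = []
    for src_term, tgt_term in glossary.items():
        in_source = re.search(rf"\b{re.escape(src_term)}\b", source, re.IGNORECASE)
        in_target = re.search(rf"\b{re.escape(tgt_term)}\b", target, re.IGNORECASE)
        if in_source and not in_target:
            issues.append(f"'{src_term}' should be translated as '{tgt_term}'")
    return issues

glossary = {"hard disk": "disco duro", "folder": "carpeta"}   # hypothetical TBX-derived pairs
print(term_issues("Copy the folder to the hard disk.",
                  "Copie el directorio en el disco duro.", glossary))
# ["'folder' should be translated as 'carpeta'"]
```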

51
TBX Example
  • TBX Basic example from LISA

52
LISA segmentation rules standard SRX
  • What is SRX?
  • Segmentation Rules eXchange format
  • Describes how localisation tools segment text
    for processing
  • Benefits
  • Standardises segmentation process (avoid
    segmentation issues)
  • Structure and Elements
  • <srx>: root element, contains one of each:
    <header>, <body>
  • <header>: attributes segmentsubflows, cascade; may
    contain <formathandle>
  • <formathandle>: defines how to handle boundary
    formatting
  • <body>: contains one of each: <languagerules>,
    <maprules>
  • <languagerules>: contains one or more
    <languagerule>
  • <languagerule>: contains one or more <rule>
  • <rule>: attribute break; contains a pair
    <beforebreak>, <afterbreak>
  • <beforebreak>, <afterbreak>: contain the
    segmentation regular expressions
  • <maprules>: encloses several <languagemap> elements
    defining rule precedence
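To show how <rule> elements interact, here is a simplified Python sketch of SRX rule evaluation. It assumes the SRX convention that rules are tried in order at each candidate position and the first rule whose <beforebreak> and <afterbreak> patterns both match decides. It ignores cascading, <maprules> and format handling, and Python's `re` stands in for the regular-expression flavour the SRX specification defines. The two rules are illustrative, not taken from any tool.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Rule:
    """One SRX <rule>: break="yes"/"no" plus <beforebreak>/<afterbreak> patterns."""
    brk: bool
    before: str
    after: str

def segment(text: str, rules: List[Rule]) -> List[str]:
    """At every position between characters, the first rule whose before/after
    patterns both match decides whether the text breaks there."""
    compiled = [(r.brk, re.compile(f"(?:{r.before})\\Z"), re.compile(r.after)) for r in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for brk, before, after in compiled:
            if before.search(text[:pos]) and after.match(text, pos):
                if brk:
                    segments.append(text[start:pos])
                    start = pos
                break   # first matching rule wins
    segments.append(text[start:])
    return segments

RULES = [
    Rule(False, r"\bMr\.\s+", r"[A-Z]"),   # exception: no break after "Mr."
    Rule(True, r"[.?!]\s+", r"[A-Z]"),     # break after sentence-final punctuation
]
print(segment("Mr. Smith arrived. He sat down.", RULES))
# ['Mr. Smith arrived. ', 'He sat down.']
```

Dropping the exception rule makes the same text split after "Mr. " as well, which is the kind of tool-to-tool segmentation mismatch that reduces TM leverage.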

53
SRX example
  • SRX tool within Passolo

54
Final Thoughts
  • Unicode
  • As a rule, use it. If delivery uses other
    encoding, convert at final stage
  • XML
  • Powerful for single-source, multi-output
    requirements
  • CMS
  • Costly; depends on volume. First consider an
    XML-only model, then migrate
  • DITA
  • Use it if it matches your data model. It will
    reduce the migration effort to a CMS
  • TMX
  • Use for safe tool-to-tool TM transfer, especially
    from software into documentation
  • XLIFF
  • Still not fully implemented. A good alternative
    for Java and Web content.
  • Use it to unify side processes (LQA)
  • TBX
  • Use to exchange glossary info. Good for clients
  • SRX
  • Very much needed, but still few implementations.

55
About the Author - Andrés Vega
  • 9 years of experience as a Localisation Engineer
    with Tek Translation International.
  • Specializing in complex project engineering with
    special focus on CMS, encodings and complex
    scripts.
  • Previously a programming languages teacher
    (OO programming, C and Java).
  • Background in Chemistry and Healthcare.

56
About Tek: multilingual translation and
localisation business solutions designed to meet
the needs of Life Sciences, IT and Manufacturing
  • Since 1961
  • Over 65 languages
  • Expert Resources and Service
  • Located in US, Spain, Brazil, China, Ireland, UK,
    Denmark
  • Scalability
  • Simplification and standardisation
  • ISO 9001:2000 certification
  • Follow-the-sun
  • Solutions-based approach for best business value
  • Tek OneWorld Platform for your language
    industry needs
  • Business Intelligence
  • Language Quality Solutions
  • Open Connectivity, WW Collaboration

57
Q&A
Andrés Vega Muñoz
Localisation Engineer, Tek Translation International
Email: av_at_tektrans.com
www.tektrans.com