The Theory and Practice of PseudoTranslation - PowerPoint PPT Presentation

About This Presentation
Title:

The Theory and Practice of PseudoTranslation

Slides: 27
Provided by: Yah992

Transcript and Presenter's Notes



1
The Theory and Practice of Pseudo-Translation
2
About this Presentation
  • Addison Phillips
  • Internationalization Architect, Yahoo!
  • Introduce Pseudo-Translation, its uses and
    several methods of implementation.

3
What's Pseudo-Translation?
  • Convert text (typically English ASCII text) to
    non-random, non-ASCII text.
  • Typically to test a product for localizability
    before actual translations are ready.

  key          display string
  dialogTitle  Dialog Title
  aMessage     This is a message.
4
What is it good for?
  • Testing localizability
      • makes hard-coded strings visible
      • makes problems with resources and resource
        formats visible
      • makes layout problems emanating from
        length/height/font changes visible
  • Does anybody really do that any more?

5
Testing Software
  • It's also good for functional QA
      • generate data in non-ASCII encodings
      • generate non-ASCII data
      • generate test cases from ASCII

6
Your Mission
  • Pseudo-translate text to support localizability
    and other test purposes using general-purpose
    code.
  • Easy to deploy and customize
  • Support functional, encoding, and localizability
    testing

7
Anatomy of a Pseudo-Translation
  • Source string: MSG012Translate me.
  • Translated string: @@@Tràñslátè mé.
  • Labeled parts: DNT (do-not-translate) identifier,
    pre-pend, post-pend
8
More Anatomy
  • Example: MSG012Traanslaatee Mee
  • Labeled parts: identifier inclusion, boundary
    marking, vowel/letter doubling
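To make the anatomy concrete, here is a minimal Java sketch that assembles a pseudo-translation from three of the labeled parts: the identifier is kept verbatim, vowels are doubled to simulate text swell, and square brackets mark the string boundary. The class and method names and the choice of brackets as boundary markers are assumptions for illustration, not the deck's code.

```java
class AnatomyDemo {
    // Hypothetical helper built from the labeled parts:
    // identifier inclusion, vowel doubling, and boundary marking.
    static String pseudoTranslate(String identifier, String text) {
        // Boundary marker + identifier included as-is (never translated).
        StringBuilder sb = new StringBuilder("[").append(identifier);
        for (char c : text.toCharArray()) {
            sb.append(c);
            if ("aeiouAEIOU".indexOf(c) >= 0) {
                sb.append(c); // vowel doubling simulates text swell
            }
        }
        return sb.append(']').toString(); // closing boundary marker
    }

    public static void main(String[] args) {
        System.out.println(pseudoTranslate("MSG012", "Translate me.")); // [MSG012Traanslaatee mee.]
    }
}
```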
9
Interesting Boundary-Marking Patterns
  • This is a concatenated string.
  • This is a message-formatted string.
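A sketch of why boundary markers are interesting, assuming square brackets as the markers (the slide's own markers did not survive): a sentence built by concatenating two separately translated resources shows a boundary in the middle, while a single message-formatted pattern stays in one piece. `java.text.MessageFormat` is the standard JDK class; `BoundaryDemo` and `mark` are hypothetical names.

```java
import java.text.MessageFormat;

class BoundaryDemo {
    // Wrap each translatable resource in brackets so its boundaries are visible.
    static String mark(String s) {
        return "[" + s + "]";
    }

    // Anti-pattern: one sentence assembled from two resources.
    static String concatenated() {
        return mark("This is a ") + mark("concatenated string.");
    }

    // Better: one translatable pattern with a placeholder, filled in at runtime.
    static String messageFormatted(String arg) {
        return MessageFormat.format(mark("This is a {0} string."), arg);
    }

    public static void main(String[] args) {
        System.out.println(concatenated());                         // [This is a ][concatenated string.]
        System.out.println(messageFormatted("message-formatted")); // [This is a message-formatted string.]
    }
}
```

The bracket pair in the middle of the first line is the tell-tale sign of concatenation that a tester can spot without reading the code.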

10
Types of Pseudo-Translation
  • Algorithmic Translation
      • Maps code points using ASCII Math
  • Mapped Translation
      • Maps code points using a table
  • Random Translation
      • Maps code points at random
  • Simulated Encoding
      • Generate encoding-based test string
  • Auto-magic Translation
      • See graphic
  • More?

11
Algorithmic
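  • The most common example: zenkaku (full-width) ASCII
      • abcd → ａｂｃｄ
      • U+0061 U+0062 U+0063 U+0064 →
        U+FF41 U+FF42 U+FF43 U+FF44
  • Add 0xFEE0 to ASCII! (This works for the printable
    range U+0021 to U+007E, which maps to U+FF01 to
    U+FF5E.)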
  • Produces strings that are visibly ASCII but
    structurally multi-byte (in SJIS or UTF-8 for
    example)
  • Produces characters that are kind of chubby,
    tests Asian font support, and stresses
    layout/spacing
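The zenkaku rule fits in a few lines of Java. This is an illustrative sketch, not the deck's code: it shifts printable ASCII by 0xFEE0 and, as an added assumption, maps the ASCII space to the ideographic space U+3000, because U+0020 + 0xFEE0 = U+FF00 is not an assigned character.

```java
import java.nio.charset.StandardCharsets;

class FullWidth {
    // Zenkaku pseudo-translation: printable ASCII U+0021..U+007E plus 0xFEE0
    // gives the full-width forms U+FF01..U+FF5E.
    static String toFullWidth(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c >= 0x21 && c <= 0x7E) {
                sb.append((char) (c + 0xFEE0));
            } else if (c == ' ') {
                sb.append('\u3000'); // assumption: ideographic space for ASCII space
            } else {
                sb.append(c); // everything else passes through unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wide = toFullWidth("abcd");
        System.out.println(wide);                                         // ａｂｃｄ
        System.out.println(wide.getBytes(StandardCharsets.UTF_8).length); // 12
    }
}
```

Each full-width letter takes three bytes in UTF-8 (two in Shift_JIS), so code that assumes one byte per character fails visibly.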

12
Mapped
  • Convert characters using a mapping table
  • e → é
  • Produces strings that are non-ASCII and exercise
    casing, sorting, encoding, display, and other
    capabilities of the software, but are still
    recognizably English
  • Often used with vowel doubling or pre/post-fixing
    to simulate text swell.
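A minimal mapped translator, assuming a small hand-picked table (the deck's actual table is not shown). Characters without an entry pass through unchanged, which keeps the output recognizably English.

```java
import java.util.Map;

class MappedTranslator {
    // Hypothetical mapping table; any character without an entry is left alone.
    private static final Map<Character, Character> TABLE = Map.of(
            'a', 'á', 'e', 'é', 'i', 'í', 'o', 'ö', 'u', 'ü',
            'A', 'Å', 'E', 'É', 'n', 'ñ', 'c', 'ç', 'y', 'ý');

    static String translate(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            char mapped = TABLE.getOrDefault(c, c); // fall back to the original character
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("Translate me")); // Tráñsláté mé
    }
}
```

Because each character has exactly one replacement, the same input always yields the same output.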

13
Mapping Rotation
  • Single character-character mappings produce
    predictable results. Same string
    pseudo-translates the same way every time.
  • Character rotation replaces one character with
    different characters so that the same string is
    rarely the same twice.
  • Exercises assumptions in the code about strings
    having to match.

14
Rotation Frequency
  • Using variable-length replacement lists helps
    ensure pseudo-strings aren't the same
  • Do more than just the vowels. Do consonants too!
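Rotation with variable-length replacement lists can be sketched as follows, assuming hand-picked lists for a few vowels and consonants. Each character keeps its own position in its list, and that state persists between calls, so repeated translations of the same string differ. The lists and the `RotatingTranslator` name are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RotatingTranslator {
    // Hypothetical replacement lists of deliberately different lengths,
    // covering consonants as well as vowels.
    private static final Map<Character, List<Character>> LISTS = Map.of(
            'a', List.of('á', 'à', 'â', 'ä', 'å'),
            'e', List.of('é', 'è', 'ê'),
            'o', List.of('ó', 'ö'),
            'n', List.of('ñ', 'ń', 'ň', 'ņ'),
            's', List.of('ś', 'š', 'ş'),
            't', List.of('ţ', 'ť'));

    // Next position in each character's list; kept between calls so the
    // same input rarely produces the same output twice.
    private final Map<Character, Integer> next = new HashMap<>();

    String translate(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            List<Character> options = LISTS.get(c);
            if (options == null) {
                sb.append(c); // no replacement list: keep the character
                continue;
            }
            int i = next.getOrDefault(c, 0);
            sb.append(options.get(i % options.size()).charValue());
            next.put(c, i + 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RotatingTranslator t = new RotatingTranslator();
        System.out.println(t.translate("note")); // ñóţé
        System.out.println(t.translate("note")); // ńöťè
    }
}
```

For "note" the lists used have lengths 4, 2, 2 and 3, so the output only repeats after lcm(4, 2, 2, 3) = 12 calls; if all four lists had the same length n, it would repeat every n calls. The per-instance counters are not thread-safe.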

15
Random
  • Replace each character with a random character
  • Exercises a larger range of Unicode or the target
    encoding.
  • But produces strings that are never legible.
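A random translator is even simpler. This sketch, an assumption rather than the deck's code, replaces every non-whitespace code point with a randomly chosen assigned character from U+00A0 to U+FFFD, skipping surrogates, private-use, control and format characters so the result is still well-formed text. A fixed seed makes failures reproducible, which matters when the output is illegible.

```java
import java.util.Random;

class RandomTranslator {
    private final Random random;

    RandomTranslator(long seed) {
        // A fixed seed gives the same "random" output on every run.
        this.random = new Random(seed);
    }

    String translate(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp ->
                sb.appendCodePoint(Character.isWhitespace(cp) ? cp : randomCodePoint()));
        return sb.toString();
    }

    // Draw from U+00A0..U+FFFD until we hit an assigned, printable BMP character.
    private int randomCodePoint() {
        while (true) {
            int cp = 0x00A0 + random.nextInt(0xFFFE - 0x00A0);
            int type = Character.getType(cp);
            if (type != Character.UNASSIGNED
                    && type != Character.SURROGATE
                    && type != Character.PRIVATE_USE
                    && type != Character.CONTROL
                    && type != Character.FORMAT) {
                return cp;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new RandomTranslator(42L).translate("Hello world"));
    }
}
```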

16
Simulated Encoding
  • QA projects frequently require globs of text that
    exercise a particular character encoding.
  • You could download some from the Web, but how do
    you know if the encoding is correct
    through-and-through? How can you exercise the
    full range of the encoding, including rare
    characters?
  • Replace an ASCII text buffer with a similarly
    sized and structured text in a given encoding (or
    using a given character repertoire).
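One way to build such a buffer, sketched under stated assumptions: collect every non-ASCII BMP letter the target charset can encode (using the JDK's `CharsetEncoder.canEncode`), then copy the ASCII buffer, keeping whitespace and punctuation in place and swapping each letter or digit for a random letter from that repertoire. The `SimulatedEncoding` class and its method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SimulatedEncoding {
    // Every non-ASCII BMP letter that the target charset can encode.
    static List<Character> repertoire(Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<Character> letters = new ArrayList<>();
        for (int cp = 0x80; cp <= 0xFFFD; cp++) {
            char c = (char) cp;
            if (Character.isLetter(c) && encoder.canEncode(c)) {
                letters.add(c);
            }
        }
        return letters;
    }

    // Same length and layout as the ASCII buffer: whitespace and punctuation
    // are kept, each letter or digit becomes a random letter from the repertoire.
    static String simulate(String ascii, Charset charset, long seed) {
        List<Character> letters = repertoire(charset);
        if (letters.isEmpty()) {
            throw new IllegalArgumentException(charset + " has no non-ASCII letters");
        }
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder(ascii.length());
        for (char c : ascii.toCharArray()) {
            sb.append(Character.isLetterOrDigit(c)
                    ? letters.get(random.nextInt(letters.size()))
                    : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(simulate("Hello, world 42!", StandardCharsets.ISO_8859_1, 7L));
    }
}
```

To exercise a multi-byte encoding, pass something like `Charset.forName("Shift_JIS")`; most JDK builds include it, but only US-ASCII, ISO-8859-1, UTF-8 and the three UTF-16 variants are guaranteed on every Java platform. Because the repertoire is rebuilt on every call, a real tool would likely cache it.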

17
Auto-Magical
  • Use a machine translation Web service or tool to
    make text in a given language.
  • Babelfish, for example
  • Produces texts that are (usually, mostly, sorta)
    in the target language.

18
Funkadelic
19
An Approach to Pseudo-Translation
  • I used Java for this.
  • Three main components
      • Expose text for translation
      • Perform translations
      • Manage translation process
  • Probably could have used ICU Transliterator
    classes for this, but probably wouldn't have
    learned as much

20
Expose Text
  • ITextProvider
      • An interface that exposes only the translatable
        bits of text. (Not every character in your
        properties/JSON/XLIFF/message catalog should be
        pseudo-translated.)
      • Implementers implement this interface for their
        specific file format or input source.
      • So I don't have to provide a parser for every
        format.
      • Provides for segmentation of text into
        translation units
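The slide names `ITextProvider` but not its methods, so this Java sketch invents a minimal hypothetical shape for it: a provider exposes indexed translation units and accepts translated text back. The example implementation reads `.properties` content with `java.util.Properties` and treats each value as one unit, so keys are never exposed and never pseudo-translated.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

class TextProviderDemo {
    // Hypothetical interface: only translatable segments are visible to translators.
    interface ITextProvider {
        int size();                           // number of translation units
        String getUnit(int index);            // source text of one unit
        void setUnit(int index, String text); // store the translated text
    }

    // Hypothetical implementation for .properties content:
    // each value is one translation unit; keys stay hidden.
    static class PropertiesTextProvider implements ITextProvider {
        private final Properties props = new Properties();
        private final List<String> keys;

        PropertiesTextProvider(String propertiesText) {
            try {
                props.load(new StringReader(propertiesText));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Sort keys so the units have a stable order.
            keys = new ArrayList<>(new TreeSet<>(props.stringPropertyNames()));
        }

        public int size() { return keys.size(); }
        public String getUnit(int index) { return props.getProperty(keys.get(index)); }
        public void setUnit(int index, String text) { props.setProperty(keys.get(index), text); }
        String get(String key) { return props.getProperty(key); }
    }

    // Runs any string-to-string translation over every unit a provider exposes.
    static void translateAll(ITextProvider provider, UnaryOperator<String> translator) {
        for (int i = 0; i < provider.size(); i++) {
            provider.setUnit(i, translator.apply(provider.getUnit(i)));
        }
    }

    public static void main(String[] args) {
        PropertiesTextProvider p = new PropertiesTextProvider(
                "dialogTitle=Dialog Title\naMessage=This is a message.\n");
        translateAll(p, s -> "[" + s + "]");
        System.out.println(p.get("dialogTitle")); // [Dialog Title]
        System.out.println(p.get("aMessage"));    // [This is a message.]
    }
}
```

A JSON or XLIFF provider would implement the same three methods, leaving the translation code untouched.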

21
Manage Translation
  • TranslatorProvider
      • Implemented as a Service Provider Interface
        (SPI)
      • Loads the various translation classes, including
        those developed by users separately from the
        pseudo-translation library
      • Provides a consistent interface to the
        translation process
      • ...and I don't have to provide every possible
        Translator class
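The SPI idea maps onto the JDK's `java.util.ServiceLoader`. In this sketch, `Translator`, `BracketTranslator` and this `TranslatorProvider` are hypothetical stand-ins, not the deck's classes: the provider registers a built-in translator and then adds any implementation a user lists in a `META-INF/services` file on the classpath, with no recompile of the library.

```java
import java.util.Map;
import java.util.ServiceLoader;
import java.util.Set;
import java.util.TreeMap;

class TranslatorProviderDemo {
    // Hypothetical service type. Providers must be public and have a
    // public no-argument constructor so ServiceLoader can instantiate them.
    public abstract static class Translator {
        public abstract String getName();
        public abstract String translate(String text);
    }

    // A built-in translator shipped with the library.
    public static class BracketTranslator extends Translator {
        public String getName() { return "bracket"; }
        public String translate(String text) { return "[" + text + "]"; }
    }

    static class TranslatorProvider {
        private final Map<String, Translator> byName = new TreeMap<>();

        TranslatorProvider() {
            register(new BracketTranslator());
            // User classes are discovered from a classpath resource named
            // META-INF/services/TranslatorProviderDemo$Translator that lists
            // one implementation class (binary name) per line.
            for (Translator t : ServiceLoader.load(Translator.class)) {
                register(t);
            }
        }

        private void register(Translator t) { byName.put(t.getName(), t); }

        Set<String> names() { return byName.keySet(); }

        Translator get(String name) {
            Translator t = byName.get(name);
            if (t == null) {
                throw new IllegalArgumentException("No translator named " + name);
            }
            return t;
        }
    }

    public static void main(String[] args) {
        TranslatorProvider provider = new TranslatorProvider();
        System.out.println(provider.names());
        System.out.println(provider.get("bracket").translate("Translate me."));
    }
}
```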

22
Translate Stuff
  • Translator
      • An abstract class that defines the translation
        process.
  • AbstractPseudoTranslator
      • A convenience class that adds features
        specific to pseudo-translation, such as pre- and
        post-pending strings.
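The split between `Translator` and `AbstractPseudoTranslator` resembles the template-method pattern. In this hypothetical sketch (the deck does not show its method signatures), the convenience class owns pre- and post-pending, and a subclass supplies only the text transformation; here the hypothetical `AccentTranslator` swaps a few letters for accented ones.

```java
class TranslatorDemo {
    // Hypothetical base class: defines what a translation is.
    abstract static class Translator {
        abstract String translate(String source);
    }

    // Hypothetical convenience class: handles the pre- and post-pended
    // strings so subclasses only transform the text itself.
    abstract static class AbstractPseudoTranslator extends Translator {
        private final String prefix;
        private final String suffix;

        AbstractPseudoTranslator(String prefix, String suffix) {
            this.prefix = prefix;
            this.suffix = suffix;
        }

        @Override
        final String translate(String source) {
            return prefix + transform(source) + suffix;
        }

        protected abstract String transform(String source);
    }

    // Example subclass: a tiny character substitution.
    static class AccentTranslator extends AbstractPseudoTranslator {
        AccentTranslator() {
            super("[", "]");
        }

        @Override
        protected String transform(String source) {
            return source.replace('a', 'à').replace('e', 'è').replace('n', 'ñ');
        }
    }

    public static void main(String[] args) {
        System.out.println(new AccentTranslator().translate("Translate me.")); // [Tràñslàtè mè.]
    }
}
```

Making `translate` final keeps every subclass consistent about where the pre- and post-pended strings go.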

23
Why SPIs?
  • Provide users with the ability to create their
    own classes and integrate them without a re-compile.
  • Who do I mean by users?
      • Mostly QA folks: integration of
        pseudo-translation with automated test harnesses.

24
Summary
  • And a couple of live demos, since I don't learn
    quickly

[Linked graphic: Yahoo! Image Search results for "live demos"]
25
PseudoTranslator Demo
26
Q&A
  • Presentation available (soon) at
    http://www.inter-locale.com