Challenges of web internationalization - PowerPoint PPT Presentation

1
Challenges of web internationalization
  • Badi Kumar S
  • Yahoo! Inc.

2
Objectives
  • Key International goals for businesses
  • Challenges of global UI design
  • Challenges related to data formats
  • Standard Unicode Character Set
  • Open Source i18n library ICU

3
Key International Goals
  • Business Goals: Growth, Growth, Growth
  • Reduce development, test, and integration costs
  • (Re)Use more components globally, reliably
  • Shorten time to market
  • Minimize changes for each language/region
  • Simultaneous worldwide shipment (SimShip)
  • Efficient, cost-effective localization
  • (Localization cost) times (Regional Markets)
  • Quality

4
How Do We Get There?
  • Successful Publishing Around The World
  • Desirable content
  • Consistent with local laws
  • Compatible with local infrastructure
  • Acceptable pricing models
  • Desirable delivery formats (UI)
  • Native language(s), Local customs

5
Global User Interface Design
  • Language
  • Data Format/Presentation
  • Imagery, Symbolism
  • Use of Color
  • Layout
  • Media (Audio, Video, etc.)

6
Date Formats
7
Supporting International Formats
  • Stored value: 0000 0100 0000 0000 (0x0400)
  • Format(Value, style) with Value = 0x0400, Style = European: result 1.024
  • Format(Value, style) with Value = 0x0400, Style = American: result 1,024
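
The Format(Value, style) idea above can be sketched in a few lines of Python. This is only an illustration: the style names and the `SEPARATORS` table are assumptions made for the example, and a real application would take these conventions from a locale library rather than a hand-written table.

```python
# Hypothetical style table: (grouping separator, decimal separator).
SEPARATORS = {
    "american": (",", "."),
    "european": (".", ","),
}

def format_number(value, style):
    """Format a number using the separators of the given style."""
    group, decimal = SEPARATORS[style]
    text = f"{value:,}"  # Python's "," option always groups with commas
    # Swap both separators in one pass so they do not overwrite each other.
    return text.translate(str.maketrans({",": group, ".": decimal}))

value = 0x0400  # the same stored value in both cases
print(format_number(value, "european"))  # 1.024
print(format_number(value, "american"))  # 1,024
```

The stored value never changes; only the presentation layer consults the style.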
8
Definitions
  • Internationalization (I18n)
  • To design and develop an application without
    built-in cultural assumptions that is efficient
    to localize
  • Localization (L10n)
  • To tailor an application to meet the needs of a
    particular region, market, or culture

9
Definition of terms
  • Translation: To render an application into
    another language
  • Globalization (G11N)
  • Companies participating in the global economy,
    establishing themselves in foreign markets
  • Adapting products and services to end users'
    cultural and linguistic requirements
  • I18N + L10N = G11N

Y4!
10
Data Validation
  • Phone Numbers
  • USA: 1 (781) 789-1898
  • France: 33.1.6172.8041
  • Number of digits is not fixed
  • Identifiers may use international text
  • Postal codes, License plates, et al.

Courtesy License Plates of the World
www.worldlicenseplates.com
11
Data Validation
  • Validation logic chosen dynamically
  • (Based on intl, user preference, etc.)

/* choose validation logic */
if (intl == jp) then validator = JapaneseDateFormat
if (intl == uk) then validator = EnglishDateFormat
if (intl == fr) then validator = EuropeanDateFormat
result = ValidateData(input, validator)
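
A minimal Python sketch of the same dispatch, assuming the `intl` codes from the slide and a hand-written `DATE_PATTERNS` table. The table and the exact date layouts are assumptions for illustration; the point is that the validator is looked up at run time instead of being hard-coded.

```python
from datetime import datetime

# Assumed date layouts per site code; a production system would obtain
# these from a locale library rather than from a hand-written table.
DATE_PATTERNS = {
    "jp": "%Y/%m/%d",  # year/month/day
    "uk": "%d/%m/%Y",  # day/month/year
    "fr": "%d/%m/%Y",  # day/month/year
}

def validate_date(text, intl):
    """Return a date if text matches the pattern chosen for intl, else None."""
    pattern = DATE_PATTERNS[intl]  # choose validation logic
    try:
        return datetime.strptime(text, pattern).date()
    except ValueError:
        return None

print(validate_date("2004/12/31", "jp"))  # 2004-12-31
print(validate_date("31/12/2004", "uk"))  # 2004-12-31
print(validate_date("12/31/2004", "uk"))  # None (there is no month 31)
```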
12
Titles and Addresses
  • United Kingdom
    Mr. Badi Kumar
    Yahoo! Inc
    210 Bath Road
    Slough, Berkshire
    England SL1 3XE
  • Japan
    Japan 104-0032
    Tokyo Chuo-Ku Hacchoubori 3-11-12
    Taiki Building
    Yahoo! KK
    Tanaka-san
  • Figure callouts: Country, Title
  • Not only layout, but tab order has to change! And
    watch out for exit validation of country!
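
One way to keep address layout out of application code is to drive it from a per-country field order. The sketch below is an assumption-laden illustration: the field names and the `FIELD_ORDER` table are invented for the example and are read off the two sample addresses on this slide. The same table could also drive the tab order of an input form.

```python
# Hypothetical field orders, derived from the two sample addresses.
FIELD_ORDER = {
    "uk": ["name", "company", "street", "city", "country_postcode"],
    "jp": ["country_postcode", "city", "street", "building", "company", "name"],
}

def address_lines(fields, country):
    """Return the address fields in the order used by the given country."""
    return [fields[key] for key in FIELD_ORDER[country] if key in fields]

uk = {
    "name": "Mr. Badi Kumar",
    "company": "Yahoo! Inc",
    "street": "210 Bath Road",
    "city": "Slough, Berkshire",
    "country_postcode": "England SL1 3XE",
}
jp = {
    "name": "Tanaka-san",
    "company": "Yahoo! KK",
    "building": "Taiki Building",
    "street": "Hacchoubori 3-11-12",
    "city": "Tokyo Chuo-Ku",
    "country_postcode": "Japan 104-0032",
}

print(address_lines(uk, "uk"))  # name first, country and postcode last
print(address_lines(jp, "jp"))  # country and postcode first, name last
```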
13
Sorting
  • English: ABC...RSTUVWXYZ
  • German: AÄB...NOÖ...SßTUÜVYZ
  • Swedish/Finnish: AB...STUVWXYZÅÄÖ
  • Norwegian: AB...VWXYÜZÆØÅ
  • Note: Y = Ü
  • Spanish: ch sorts between c and d
  • Example order: Color, Charlar, Dar
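
The Spanish example can be reproduced with a custom sort key. This is a simplified sketch that handles only the "ch" digraph from the slide (it ignores ll, ñ and accents), and `spanish_key` is a name invented for the example. In practice a collation library such as ICU would supply the ordering.

```python
import string

# Alphabet with the digraph "ch" inserted between "c" and "d".
LETTERS = list(string.ascii_lowercase)
LETTERS.insert(LETTERS.index("d"), "ch")
RANK = {letter: position for position, letter in enumerate(LETTERS)}

def spanish_key(word):
    """Sort key that treats "ch" as a single letter after "c"."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            # Unknown characters sort after the known alphabet.
            key.append(RANK.get(word[i], len(RANK)))
            i += 1
    return key

words = ["Dar", "Charlar", "Color"]
print(sorted(words))                   # ['Charlar', 'Color', 'Dar']
print(sorted(words, key=spanish_key))  # ['Color', 'Charlar', 'Dar']
```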

14
Text Processing
  • Sorting
  • Line Wrapping
  • Word Breaking, Hyphenation
  • Capitalization
  • Quotes
  • Styles (Bold, Italic, Ruby, Amikake)
  • Writing direction (LTR, RTL, Vertical)

15
Internationalization Libraries
  • Avoid implementing regional formats yourself!
  • IBM ICU, Basis Technology Rosette
  • Native OS or programming language support
  • Windows, Java APIs
  • POSIX locales, formatters

16
Summary
  • Encapsulate data parsing/formatting and use an
    internationalized, locale-aware API
  • Complex data (e.g. address) changes positions,
    fields, size and tab order
  • Color should be a cue, not the sole indicator
  • Graphics, audio, video may change
  • Text display separate from graphics
  • Generalize and abstract for global use

17
Aren't these problems solved already?
  • Yes! Partly by the open source library called ICU.

18
Unicode Character Set
Example Unicode Characters
19
Unicode Character Standard
  • Developed by the Unicode Consortium
  • www.unicode.org
  • Covers all major living scripts
  • Version 4.0 has 96,000 characters
  • Capacity for 1 million characters
  • Unicode character set = ISO 10646
  • Unicode adds character properties and algorithms
  • ISO and Unicode work together to synchronize
  • ISO support enhances international acceptance

20
Unicode Worldwide, Multilingual
  • 17 Planes of 64K
  • Code points 0-10FFFF, 21 bits
  • Basic Multilingual Plane (BMP)
  • Common characters
  • 1st Supplementary Plane
  • archaic, fictional characters
  • 2nd Supplementary Plane
  • Ideographs
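
Because each plane holds 64K (0x10000) code points, the plane number is simply the code point shifted right by 16 bits. A short sketch follows; `plane_of` is a helper name invented for the example.

```python
def plane_of(code_point):
    """Return the Unicode plane (0 to 16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16  # each plane spans 0x10000 code points

print(plane_of(ord("A")))   # 0  (Basic Multilingual Plane)
print(plane_of(0x1D11E))    # 1  (musical symbol G clef)
print(plane_of(0x20000))    # 2  (supplementary ideographs)
print(plane_of(0x10FFFF))   # 16 (last plane)
```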

21
17 Planes of 64K
22
Unicode Character Set
  • Organized by scripts into blocks

23
Unicode Is Generative
  • Composition can create new characters
  • Base + non-spacing (combining) character(s)
  • A + ◌̊ = Å
  • U+0041 + U+030A = U+00C5
  • a + ◌̂ + ◌̣ = ậ
  • U+0061 + U+0302 + U+0323 = U+1EAD
  • a + ◌̣ + ◌̂ = ậ
  • U+0061 + U+0323 + U+0302 = U+1EAD
  • Note: Unicode notation is U+hhhh
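
The compositions on this slide can be checked with Python's standard `unicodedata` module. The sketch applies Normalization Form C (NFC), which composes a base character with its combining marks where a precomposed character exists. The choice of NFC is an assumption of this example, since the slide does not name a normalization form.

```python
import unicodedata

def compose(text):
    """Compose base + combining sequences (Normalization Form C)."""
    return unicodedata.normalize("NFC", text)

ring = compose("\u0041\u030A")          # A + combining ring above
acd1 = compose("\u0061\u0302\u0323")    # a + circumflex + dot below
acd2 = compose("\u0061\u0323\u0302")    # a + dot below + circumflex

print(f"U+{ord(ring):04X}")  # U+00C5
print(f"U+{ord(acd1):04X}")  # U+1EAD
print(acd1 == acd2)          # True: mark order does not matter here
```

Comparing strings only after normalizing them avoids treating these equivalent sequences as different text.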

24
Unicode Characteristics
  • Multilingual
  • All scripts/languages, one character set
  • Character Properties
  • Case, digit, alpha/letter/ideogram, directional
    class, mirroring, combining class, etc. provided
    by Unicode
  • Logical order for bidirectional languages
  • Round Trip Conversion To Legacy Encodings
  • Byte Order Mark (BOM)
  • Big vs. Little endian and encoding identifier
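
Several of the properties listed above are exposed by Python's standard `unicodedata` and `codecs` modules. The sketch below only queries them; the sample characters are chosen for illustration.

```python
import codecs
import unicodedata

print(unicodedata.category("A"))             # Lu (uppercase letter)
print(unicodedata.category("5"))             # Nd (decimal digit)
print(unicodedata.bidirectional("\u05D0"))   # R  (Hebrew alef, right-to-left)
print(unicodedata.mirrored("("))             # 1  (mirrored in RTL text)
print(unicodedata.combining("\u0302"))       # 230 (combining class of circumflex)
print("ß".upper())                           # SS (case mapping can change length)

# The byte order mark identifies the endianness of UTF-16 data.
print(codecs.BOM_UTF16_BE)                   # b'\xfe\xff'
print(codecs.BOM_UTF16_LE)                   # b'\xff\xfe'
```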

25
Unicode Characteristics
  • 3 equivalent encoding forms
  • UTF-8: 8-bit, variable width, multi-byte (max. 4
    bytes)
  • UTF-16: 16-bit, variable width, surrogates (max.
    2 code units)
  • UTF-32: 32-bit, fixed width (max. 1 code unit)
  • UCS-2 is old terminology; don't use it.
  • Design avoids multi-byte performance problems
  • Algorithm specifications provide interoperability
  • Allows one binary program image to be used
    worldwide
  • Developers do not need to be linguists to
    implement

26
Storage and Serialization Formats
  • UTF-32
  • 32 bits per character
  • One unit per character
  • Unicode only goes to 10FFFF (21 bits)
  • UTF-16
  • 16 bits per code unit
  • Characters above U+FFFF use a surrogate pair,
    i.e. two code units per character
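
The code unit counts can be verified directly, and the surrogate pair for a supplementary character follows from simple arithmetic. In this sketch `surrogate_pair` is a helper written for the example, and the sample characters are arbitrary.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

for ch in ["A", "é", "中", "\U0001D11E"]:
    utf16_units = len(ch.encode("utf-16-be")) // 2
    utf32_units = len(ch.encode("utf-32-be")) // 4
    print(f"U+{ord(ch):04X}: UTF-16 units={utf16_units}, UTF-32 units={utf32_units}")

high, low = surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")  # D834 DD1E
```

Only the last sample needs two UTF-16 code units; every sample is exactly one UTF-32 code unit.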

27
Properties of UTF-8
  • Transforms Unicode to sequences of octets
  • ASCII-compatible (Characters 0-127)
  • Non-ASCII characters are either 2, 3 or 4 bytes
  • European generally 2, CJK generally 3, higher
    planes 4.
  • Result
  • Algorithms searching for ASCII characters (e.g.,
    / \ < > ? - a b c d etc.) work correctly
  • String length is not greatly increased
  • All of Unicode supported
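
A short check of these properties in Python. The sample characters and the sample path are arbitrary choices for illustration. The key point is that every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a byte-level search for an ASCII character such as "/" can never match inside another character.

```python
# Bytes needed per character in UTF-8.
for ch in ["A", "é", "中", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}: {len(ch.encode('utf-8'))} byte(s)")

# Bytes of non-ASCII characters never fall in the ASCII range.
print(all(byte >= 0x80 for byte in "日本語".encode("utf-8")))  # True

# So splitting raw bytes on an ASCII delimiter is safe.
raw = "/données/日本語/file.txt".encode("utf-8")
parts = [segment.decode("utf-8") for segment in raw.split(b"/")]
print(parts)  # ['', 'données', '日本語', 'file.txt']
```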

28
Choosing a UTF
  • UTF-8
  • Good choice for migrating legacy software and
    file formats (ASCII compatibility, multi-byte
    encoding)
  • Best storage form for European languages
  • UTF-16
  • More efficient for sorting, processing
  • Best storage for Asian languages
  • Requires wide character datatypes
  • Good choice for new implementations
  • UTF-32: efficient processing, wastes memory

29
Unicode Does Not Equal Internationalization
  • Unicode simplifies development
  • Single source code
  • Enables multilingual processing
  • Properties reduce research for each language
  • Unicode does not fix all internationalization
    problems
  • E.g. date, time, number and other formats
  • Linguistic processing can require additional
    algorithms, data (e.g. word breaking)
  • Continue to identify and support cultural
    requirements
  • Conversion to native encodings for interface to
    legacy software, systems can impose limitations

30
Summary Unicode
  • Well-supported, ubiquitous, and often required in
    integrated environments.
  • Simplifies working with many languages
  • Large character set requires consideration
  • Requires removing assumptions that 1 character is
    1 byte or word.

31
Developing Software For The World: Summary
  • Gather international requirements.
  • Design internationalization in early.
  • Use global images, widgets, etc. where possible.
    Plan for localization elsewhere.
  • Test international data early (pseudo-localize).
    Involve international testers.
  • Maximize locale-independence.
  • Use Unicode.
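
Pseudo-localization, mentioned above, can be sketched very simply: replace ASCII vowels with accented look-alikes, pad the string to simulate translation growth, and wrap it in brackets so truncation is visible. The 30% expansion factor, the bracket markers and the function name are assumptions of this example, not an established convention.

```python
# Map ASCII vowels to accented look-alikes so the text stays readable.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")

def pseudo_localize(text, expansion=0.3):
    """Return a pseudo-translated version of a user-visible string."""
    accented = text.translate(ACCENTED)
    # Pad to simulate longer translations; always add at least one character.
    padding = "~" * max(1, round(len(text) * expansion))
    # Brackets make clipped or concatenated strings easy to spot in the UI.
    return f"[{accented}{padding}]"

print(pseudo_localize("Save file"))  # [Sávé fílé~~~]
```

Strings that appear in the UI without accents or brackets were probably hard-coded instead of loaded from resources.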

32
International Components for Unicode (ICU)
  • Why ICU?
  • Open source
  • Flexible
  • Portable foundation
  • In sync with the standards including Unicode and
    CLDR. UnicodeString
  • Minimizes cost
  • Solves most of the problems related to i18n
  • Comprehensive functionalities for globalization
    requirements

33
ICU Features
  • Text: Unicode text handling, full character
    properties and character set conversions (500
    code pages)
  • Analysis: Unicode regular expressions; full
    Unicode sets; character, word and line boundaries
  • Comparison: language-sensitive collation and
    searching

34
ICU Features (contd.)
  • Transformations: normalization, upper/lowercase,
    script transliterations (50 pairs)
  • Locales: comprehensive data (230); resource
    bundle architecture
  • Complex Text Layout: Arabic, Hebrew, Indic and
    Thai
  • Formatting and Parsing: multi-calendar and time
    zone, dates, times, numbers, currencies, messages

35
Questions?