Going Global: Publishing in Asia with Dynatext and Dynaweb - PowerPoint PPT Presentation

Title: Going Global: Publishing in Asia with Dynatext and Dynaweb


1
Going Global: Publishing in Asia with Dynatext and Dynaweb
  • Jason Olson
  • Basis Technology Corporation

2
Motivation: Being Multilingual
Motivation
  • Target Population
  • Market Importance

3
Target Population
Motivation
  • 90 million people speak Korean as their first language
  • 1.5 billion people speak Chinese as their first language

4
Why are these Markets Important?
Motivation
  • Significant Growth
  • Your multinational customers are in these markets
  • Your competition is in these markets

5
Challenges of Asian Languages
Challenges
  • Unfamiliar characters, and a lot of them
  • Encoding issues - Multiple encoding schemes
  • SGML issues
  • Testing
  • Dynatext and Dynaweb

6
What are these characters?
Challenges Characters
  • Fear of the unknown ("it's all Greek to me")

7
What is an Encoding?
Challenges Encoding
  • A mapping between the characters of a script and
    a sequence of numbers.
  • ASCII, EBCDIC: Basic Latin, numerals, and symbols
  • ISO 8859-1 (Latin-1): Extended Latin (Western Europe)
  • ISO 8859-2 (Latin-2): Extended Latin (Eastern Europe)
  • ISO 8859-5: Latin, Cyrillic
  • ISO 8859-6: Latin, Arabic
  • ISO 8859-7: Latin, Greek
  • JIS X 0208-1990: Latin, Greek, Cyrillic, Hiragana, Katakana, Kanji
  • KSC 5601-1987: Latin, Greek, Cyrillic, Hiragana, Katakana, Hangul, Hanja
  • GB 2312-80 (CP 936): Latin, Greek, Cyrillic, Hiragana, Katakana, Hanzi

8
Multiple Encoding Schemes
Challenges Encoding
  • Korean
    • KSC5601
  • Chinese
    • GB2312
    • BIG5
    • CP950
  • Japanese
    • Shift-JIS
    • JIS
    • ISO-2022-JP
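
Because one language can arrive in several encodings, the same bytes read differently depending on which encoding is assumed. The sketch below, again using Python's standard codecs as an assumption-laden stand-in, decodes one byte string under several candidates; `try_decodings` is a hypothetical helper name.

```python
def try_decodings(data, candidates):
    """Decode data under each candidate encoding.

    Returns {encoding: text}, with None where the bytes are not valid
    in that encoding.
    """
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


raw = "中文".encode("gb2312")  # Simplified Chinese bytes
results = try_decodings(raw, ["gb2312", "big5", "euc_kr", "ascii"])
for name, text in results.items():
    print(f"{name:8} {text!r}")
```

Only the correct encoding recovers the original text; the others either fail outright or silently produce unrelated characters, which is the harder case to catch.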

9
SBCS vs. DBCS vs. MBCS
Challenges Encoding
  • Single Byte per Character
    • ASCII, ISO 8859-1 through 8859-10
  • Two (Double) Bytes per Character
    • Shift-JIS, EUC-KR, GB2312, Big5
  • Multiple Bytes per Character
    • EUC-JP, ISO-2022, UTF-8
  • Lead-byte / Trail-byte processing
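
A short sketch of how many bytes a character occupies under different encodings, using Python's standard codecs. The helper `byte_widths` and the sample string are illustrative only. Note that in practice the "double-byte" encodings mix one-byte and two-byte characters in the same stream, which is what makes lead-byte / trail-byte processing necessary.

```python
def byte_widths(text, encoding):
    """Return the number of bytes each character of text needs in encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# ASCII letter, half-width katakana (U+FF71), Han character (U+4E2D)
text = "Aｱ中"
for enc in ("shift_jis", "euc_jp", "utf-8"):
    print(f"{enc:10} {byte_widths(text, enc)}")
```

Software that assumes a fixed number of bytes per character will miscount or split characters in any of these encodings.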

10
Testing
Challenges Testing
  • Is it right?

11
SGML Issues
Challenges SGML
  • Getting your SGML translated
  • Trail-byte gotchas
  • Style Sheet Restrictions

12
Trail-byte Gotchas
Challenges SGML
  • Overlap between ASCII and Trail Byte Range

13
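
The overlap can be demonstrated with a well-known Shift-JIS case: the second byte of 表 is 0x5C, the same value as the ASCII backslash. The sketch below is an illustration using Python's codecs, not the parser in Dynatext or Dynaweb; other encodings with trail bytes in the ASCII range, such as Big5, can show the same effect. The function names are hypothetical.

```python
BACKSLASH = b"\\"  # the single byte 0x5C


def naive_contains(data, ascii_byte):
    """Byte-level search: wrongly matches trail bytes of two-byte characters."""
    return ascii_byte in data


def safe_contains(data, encoding, ascii_char):
    """Decode to Unicode first, then search whole characters."""
    return ascii_char in data.decode(encoding)


data = "表".encode("shift_jis")  # bytes 0x95 0x5C
print(data.hex())                                # 955c
print(naive_contains(data, BACKSLASH))           # True: a false positive
print(safe_contains(data, "shift_jis", "\\"))    # False: no backslash in the text
```

A parser that scans raw bytes for markup or escape characters can be misled in the same way, so character boundaries must be tracked or the text decoded before scanning.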
Style Sheet Restrictions
Challenges SGML
  • Unrecognized Characters / Symbols
  • Encoding notation

14
The Solution
Solution
  • Technical Side
    • PLS Architecture
    • Rosette
    • KCCPLS
  • Application Side
    • Getting Translated Docs
    • Using the PLS Correctly
    • Testing

15
PLS Architecture
Solution Technical Side
  • INSO and Basis Tech jointly designed the PLS architecture
  • PLM - Does Language Specific Processing
    • word-breaking
    • date formats
    • printing
  • ECM - Does Encoding Conversions (Unicode / Legacy)
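
The slides do not show the ECM interface, so the following is only a sketch of the general idea of converting between legacy encodings by going through Unicode. It uses Python's standard codecs, and `convert` is a hypothetical name, not an ECM or Rosette call.

```python
def convert(data, source, target):
    """Convert bytes from one encoding to another, with Unicode as the pivot."""
    text = data.decode(source)   # legacy bytes -> Unicode text
    return text.encode(target)   # Unicode text -> target bytes


korean = "한국어".encode("euc_kr")
as_utf8 = convert(korean, "euc_kr", "utf-8")
round_trip = convert(as_utf8, "utf-8", "euc_kr")
print(korean.hex(), as_utf8.hex(), round_trip == korean)
```

Using Unicode as the pivot means each encoding needs only one pair of tables (to and from Unicode) rather than a table for every pair of encodings.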

16
Rosette Library for Unicode Handling
Solution Technical Side
  • Provides Encoding Conversions
    • Unicode / Legacy Encodings
  • Provides Character Classifications and Transforms
  • Cross Platform
    • WinNT, Win95, Macintosh, Unix, Mainframe
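
Rosette's own API is not shown in the slides. As a rough illustration of what character classification and transforms mean, here is a sketch using Python's standard `unicodedata` module; the helper names `script_of` and `fold_width` and the script labels are assumptions made for this example.

```python
import unicodedata

# Character-name prefixes mapped to a coarse script label (illustrative only).
_PREFIXES = (
    ("CJK UNIFIED IDEOGRAPH", "Han"),
    ("HANGUL", "Hangul"),
    ("HIRAGANA", "Hiragana"),
    ("KATAKANA", "Katakana"),
    ("LATIN", "Latin"),
)


def script_of(ch):
    """Classify a character by the prefix of its Unicode character name."""
    name = unicodedata.name(ch, "")
    for prefix, label in _PREFIXES:
        if name.startswith(prefix):
            return label
    return "Other"


def fold_width(text):
    """Transform: fold compatibility forms (e.g. full-width Latin) via NFKC."""
    return unicodedata.normalize("NFKC", text)


print([script_of(ch) for ch in "A中한あ"])
print(fold_width("ＡＢＣ１２３"))
```

Classification of this kind is what language-specific steps such as word-breaking build on, since they need to know which script each character belongs to.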

17
Marriage of PLS and Rosette
Solution Technical Side
  • The creation of the KCCPLS
  • Internals are completely Unicode
  • ECM directly uses Rosette encoding conversions
  • PLM is built upon Rosette character
    classification

18
The Result: The KCCPLS
Solution Technical Side
  • The KCCPLS is a simple addition to your extant
    Dynaweb or Dynatext installation
  • You can correctly process and display Korean,
    Simplified Chinese, Traditional Chinese

19
What does it mean?
Solution Technical Side

20
Future PLS
Solution Technical Side
  • On-the-fly conversions between Traditional Chinese (TC) and Simplified Chinese (SC)
  • Source data in Unicode - Multilingual Documents
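
As a toy illustration of Traditional-to-Simplified conversion on Unicode text, the sketch below applies a tiny hand-written character table. The table and the name `tc_to_sc` are hypothetical and far from complete; real conversion is not strictly one-to-one and needs context-aware handling, which this sketch does not attempt.

```python
# A tiny, illustrative Traditional -> Simplified table (not a real data set).
TC_TO_SC = {
    "國": "国",
    "語": "语",
    "漢": "汉",
    "學": "学",
}


def tc_to_sc(text):
    """Replace each Traditional character that has an entry in the table."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in text)


print(tc_to_sc("漢語"))  # 汉语
print(tc_to_sc("中國"))  # 中国
```

Working on Unicode text rather than on Big5 or GB2312 bytes keeps the conversion independent of the encodings used for input and output.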

21
Getting Translated Docs
Solution Application Side
  • Identify materials to be translated
  • Select a reputable Localization vendor with a
    good track record
  • Prepare an English Glossary
  • Translate glossary into target language
  • Translation and editing
  • Charlotte

22
Using the PLS with Dynaweb
Solution Application Side
  • Run the Installer
    • InstallShield on Windows
    • install script on Unix
  • Make sure your pls.map has a locale, character set, and language definition that matches your books

23
Using the PLS with Dynaweb
Solution Application Side
  • the PLS.MAP

24
Using the PLS with Dynaweb
Solution Application Side
  • Edit setlocale.dwc

25
Using the PLS with Dynaweb
Solution Application Side
  • Setup of the Browser
    • Proper fonts on system
    • Set encoding correctly

26
Using the PLS with Dynatext
Solution Application Side
  • Run the Installer
  • Check pls.map
  • Change font declarations in Stylesheets
  • Have proper language support
    • Native language version of Windows
    • English Windows with Language Packs
    • Unix with Locales

27
Testing
Solution Application Side

28
References
References
  • CJKV Information Processing, Ken Lunde, O'Reilly, ISBN 1-56592-224-7
  • Developing International Software, Nadine Kano, Microsoft Press, ISBN 1-55615-840-8
  • The Unicode Standard, Version 2.0, The Unicode Consortium, ISBN 0-201-48345-9
  • unicode.basistech.com

29
Get more information
  • http://inso.basistech.com

30
Other Products from Basis Technology
  • Japanese Morphological Analyzer
  • Chinese Morphological Analyzer
  • Chinese Script Converter