Unicode - PowerPoint PPT Presentation

Slides: 21
Provided by: mark738
Learn more at: https://icu-project.org

1
Unicode
  • Mark Davis
  • Unicode Consortium President
  • IBM Chief SW Globalization Architect
  • 2003-09-24

2
Universal Character Encoding
  • Unique number for every character
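A minimal sketch of what "unique number for every character" means in practice, using Python's standard `unicodedata` module. The helper name `describe` and the three sample characters are chosen here for illustration and are not taken from the slides.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and formal name of one character, e.g. 'U+0041 ...'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # Latin A, Latin Z with caron, Devanagari KA: three scripts, one numbering space.
    for ch in "A\u017D\u0915":
        print(describe(ch))
```

Each character maps to exactly one number (its code point), regardless of platform, program, or language.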


3
Unifies all Languages
  • 96 thousand characters, so far
  • All characters accessible at the same time, in
    the same document
  • A, Ž, ... (sample characters from other scripts)

4
Lingua Franca for Computers
  • Developed & supported by industry leaders
  • Apple, HP, IBM, JustSystem, Microsoft, Oracle,
    SAP, Sun, Sybase, Unisys, ...
  • Required by modern standards
  • XML, HTML, Java, ECMAScript (JavaScript), LDAP,
    CORBA 3.0, WML, Perl, etc.
  • Implemented in
  • All modern operating systems, browsers, and other
    products

5
International Domain Names
  • Approved - Unicode-Based
  • Examples
  • http://??????.com
  • http://?a??a????.com
  • http://????.com
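As a rough illustration of how a Unicode domain name travels over ASCII-only DNS, the sketch below uses Python's built-in `idna` codec, which implements the 2003 IDNA rules. The hostname `bücher.example` and the helper names `to_ascii` and `to_unicode` are assumptions made for this example, not names from the slides.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII-compatible ("xn--") form."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII-compatible hostname back to Unicode."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "b\u00fccher.example"   # hypothetical hostname containing U+00FC
    wire = to_ascii(name)
    print(wire)                    # form sent to DNS
    print(to_unicode(wire))        # form shown to the user
```

The user sees and types the Unicode form; only the encoded form reaches the DNS infrastructure.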

6
Standard Resources
  • www.unicode.org
  • Online Standard
  • Technical Reports
  • FAQs
  • General Information
  • Discussion Forums, Conferences

7
Programming Resources
  • System APIs
  • Windows, Java, Unix, Oracle, DB2, Sybase, Mac,
    Linux, ...
  • Languages
  • Java, JavaScript, C#, Perl 5.6.0, C, C++, SQL, ...
  • Cross-platform libraries
  • ICU, Rosette, ...

8
Stability
  • Developers / other standards need absolute
    stability
  • Characters are never moved or deleted
  • Ordering of characters is by collation, not
    binary order. See UTS #10, Unicode Collation
    Algorithm
  • Characters may be deprecated (discouraged).
  • Characters never change names
  • Annotations are used to clarify usage
  • See Unicode Policies
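The point about collation versus binary order can be seen in a small sketch. The `rough_key` function below is a deliberately crude stand-in written for this example (strip combining marks, fold case); it is not the Unicode Collation Algorithm, for which a library such as ICU would normally be used.

```python
import unicodedata


def rough_key(s: str) -> str:
    """Crude illustrative sort key: drop combining marks, then fold case.

    Assumption for illustration only; this is NOT UTS #10 collation.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


words = ["zebra", "Apple", "\u00e9clair", "apple"]

binary_order = sorted(words)                # compares raw code point values
rough_order = sorted(words, key=rough_key)  # closer to what a reader expects

if __name__ == "__main__":
    print(binary_order)
    print(rough_order)
```

Because ordering comes from a separate collation step, code points never have to be moved to "fix" sort order, which is what makes the stability guarantee workable.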

9
Indic Support in Unicode
  • ISCII is the basis for characters and allocation
  • Consortium actively engaged with Indian
    Government, which is a member
  • Welcomes addition of missing characters (e.g.
    Vedic), clarifications or corrections of usage

10
Structural Similarities with ISCII
  • Within script, layout and contents nearly
    identical
  • Independent & dependent vowels
  • Halant model for representing conjuncts
  • conjuncts / half-forms not directly encoded
  • represented by sequences instead
  • Phonetic sequence order in syllables
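The halant model and phonetic ordering above can be sketched with Devanagari code points (Unicode names the halant "virama"). The particular syllables are chosen here as examples; how the sequences are drawn depends on the font and shaping engine, which this sketch does not exercise.

```python
import unicodedata

KA = "\u0915"            # DEVANAGARI LETTER KA
VIRAMA = "\u094D"        # DEVANAGARI SIGN VIRAMA (halant)
SSA = "\u0937"           # DEVANAGARI LETTER SSA
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I

# A conjunct is not a single encoded character: it is consonant + halant + consonant.
conjunct_kssa = KA + VIRAMA + SSA

# Phonetic order: the vowel sign is stored after the consonant,
# even though it is drawn to the consonant's left.
syllable_ki = KA + VOWEL_SIGN_I


def names(text: str) -> list:
    """List the formal Unicode name of each code point in text."""
    return [unicodedata.name(c) for c in text]


if __name__ == "__main__":
    print(len(conjunct_kssa), names(conjunct_kssa))
    print(len(syllable_ki), names(syllable_ki))
```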

11
Structural Differences with ISCII
  • Unicode is stateless
  • No shifting to get different scripts
  • Each character has a unique number
  • Unicode is uniform
  • No extension bytes necessary
  • All characters coded in the same space
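A small sketch of "stateless" and "same space": characters from several scripts sit side by side in one string, and each is identified by its own code point with no shift or script-switching codes in between. The sample string and the helper `script_prefix` (which simply takes the first word of the character name) are assumptions made for illustration.

```python
import unicodedata


def script_prefix(ch: str) -> str:
    """Return the first word of the character's Unicode name (a rough script label)."""
    return unicodedata.name(ch).split()[0]


# Latin A, Devanagari A, Tamil A in one string, with no mode switches between them.
mixed = "A\u0905\u0B85"

labels = [script_prefix(c) for c in mixed]
code_points = [f"U+{ord(c):04X}" for c in mixed]

if __name__ == "__main__":
    print(labels)
    print(code_points)
```

Any character can be read in isolation: its meaning does not depend on what came before it in the text.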

12
Additional Characters
  • Indian Government is developing proposals for
  • Additions of missing characters
  • Vedic
  • Individual characters for certain scripts
  • Annotations and Descriptions

13
Global Applications now support languages of India
  • Companies supporting Indic with Unicode
  • OpenType fonts
  • Font support for Indic
  • Microsoft Windows
  • Java (IBM contributed ICU Indic Layout)
  • Linux

14
Benefits for India
  • All documents, anywhere in the world, can have
    Indic text
  • Allows seamless multilingual documents in India
  • including scriptures and minority languages
  • Opens up software export market, beyond English
  • Connects India to the world

15
How India Can Contribute
  • Effective Communication with the Unicode
    Consortium
  • Provide Resources for Development
  • Descriptions of Usage
  • Descriptions of Character Shaping
  • Transliteration Tables from Script to Script
  • Collation Information
  • OpenType fonts

16
What Developers Can Do
  • Interwork with existing ISCII systems
  • Move to Unicode for future developments
  • Java, Windows, Linux, ...

17
The Future
  • The world is moving rapidly to Unicode
  • Unicode makes India open to the world
  • The world comes to you, and
  • You go to the world
  • You can help

18
Q & A

19
Backup Slides

20
Multiple Forms
  • UTF-8: maximal compatibility with 8-bit systems
  • UTF-16: good storage, interoperability with
    Windows/Java
  • UTF-32: simplest processing
  • Fast, lossless conversion
  • See Forms of Unicode
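The three encoding forms can be compared with a short sketch. The sample text and the helper name `encoded_sizes` are chosen for this example; the big-endian variants are used only so that no byte order mark is added to the byte counts.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in each Unicode encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
        "utf-32": len(text.encode("utf-32-be")),
    }


# Latin A (U+0041), Devanagari KA (U+0915), Deseret LONG I (U+10400, outside the BMP).
text = "A\u0915\U00010400"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")

# Lossless conversion: UTF-8 -> UTF-16 -> back to the same string.
round_trip = utf8.decode("utf-8").encode("utf-16-be").decode("utf-16-be")

if __name__ == "__main__":
    print(encoded_sizes(text))
    print(round_trip == text)
```

UTF-8 spends 1, 3, and 4 bytes on the three characters, UTF-16 spends 2, 2, and 4, and UTF-32 always spends 4; all three represent exactly the same code points.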