UNICODE - PowerPoint PPT Presentation

Title: UNICODE


1
UNICODE Indic Scripts
  • Dr. Mukul K Sinha
  • Expert Software Consultants Ltd., New Delhi
  • expert_at_vsnl.com

2
Indian Languages and Scripts
  • Indian languages: 22 constitutionally recognized
  • Scripts: 11 (3: Mithilakshar / Ol Chiki / Meitei Mayek)
  • Languages per script: Devanagari 8, Bangla 3, Gurmukhi,
    Gujarati, Oriya,
  • Tamil, Telugu, Kannada, Malayalam / Roman /
    Perso-Arabic 3
  • Devanagari: (Sanskrit, Hindi, Marathi, Nepali, Maithili,
    Santhali, Bodo, Dogri)
  • Bangla: (Bangla, Assamese, Manipuri); Perso-Arabic: (Urdu,
    Kashmiri, Sindhi)
  • Literate population: 65%, English: 5%,
    mostly multilingual
  • Digital divide = language divide?!
  • Need for an Indic-language computing
    environment!

3
Scripts by Population (%)
  • Devanagari - 49.76
  • Bangla - 10.2
  • Telugu - 8.1
  • Tamil - 6.5
  • Arabic/Persian - 6.1
  • Gujarati - 5.1
  • Kannada - 4.0
  • Malayalam - 3.7
  • Oriya - 3.5
  • Gurmukhi - 2.9
  • ..
  • Source: Govt. of India, 1997 survey

4
Scripts: Print and Internet
  • Share by script (year 2000):
              Population   Print   Web
      Latin       39         72     84
      CJK         22         25     13
      Indic       22         2.2    0.3
  • Languages vs. scripts:
              Languages   Scripts
      Europe     (11)         1
      Indic      (184)       12

5
Indic Scripts: ISCII
  • Dept. of Electronics: Indian Script Standard
    Committee, 1986-88
  • ISCII 1998.
  • ISCII 1989: 8-bit character code (lower set =
    ASCII)
  • Escape sequence for script identity.
  • Brahmi-based, covering 9 Indic scripts
  • (Transliteration automatic)
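
The "transliteration automatic" point follows from every Brahmi-derived script sharing one code layout. Python's standard library has no ISCII codec, so the sketch below illustrates the same idea with the Unicode Indic blocks instead, which are laid out in parallel and sit 0x80 code points apart. The helper name `transliterate` and the `BLOCK_START` table are illustrative assumptions, and the result is only a rough letter-for-letter mapping, since not every position is filled in every script.

```python
import unicodedata

# First code point of each Unicode Indic block; each block spans 0x80 positions.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}


def transliterate(text: str, src: str, dst: str) -> str:
    """Shift characters from the src block to the same offset in the dst block.

    Hypothetical helper: a character is left unchanged when it lies outside
    the source block or when the target position has no assigned character.
    """
    base_src, base_dst = BLOCK_START[src], BLOCK_START[dst]
    out = []
    for ch in text:
        cp = ord(ch)
        if base_src <= cp < base_src + 0x80:
            candidate = chr(cp - base_src + base_dst)
            # unicodedata.name() returns the default ("") for unassigned code points
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)


word = "\u0915\u092E\u0932"  # Devanagari KA, MA, LA
print(transliterate(word, "devanagari", "kannada"))
print(transliterate(word, "devanagari", "bengali"))
```

Production transliteration needs per-script exception tables; the fixed offset only captures the shared-layout idea from the slide.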

6
Unicode Consortium - History
  • Consortium / non-profit / registered 1991, USA.
  • Open standard / ISO / W3C
  • Members: Corporate / Institutional - voting;
    Associate / Individual - non-voting
  • To meet implementation needs
    that will not be invalidated at any time in future

7
Unicode
  • For Characters (Scripts / Ideographs / Symbols)
  • NOT Glyphs.
  • Platform independent.
  • Content Inter-operable / Inter-change.
  • Availability of tools for text processing.
  • Global Presence.

8
Unicode Versions
  • Version 1 / 1.1 / 1.2: 1991 / 92 / 93
  • Version 2.0 / 2.1: 1996 / 97 (Internet, Web)
  • Version 3.0 / 3.1 / 3.2: 2000 / 01 / 02 (Govt. of
    India)
  • Version 4.0 / 4.1: 2003 / 2004; 96,382
    (70,207)
  • Version 5.0 -

9
Unicode Technical Specification
  • Code space: 0 to 10FFFF (hexadecimal)
  • Basic Multilingual Plane (Plane 0):
    0 to FFFF, 16-bit code (2 bytes)
  • E000 to F8FF (Private Use Area)
  • Plane 1 .. Plane 2 .. Plane 15 (PUA-A),
    Plane 16 (PUA-B)
  • PUA: national-standard / vendor-specific codes
    (e.g. Japanese: NEC, Fujitsu)
  • Indic scripts: one code page for each script
  • For information exchange: UTF-8 (8-bit byte
    string);
    each character takes 1 to 4 bytes
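
The plane layout and the 1-to-4-byte UTF-8 rule above can be checked with a few lines of Python. This is a minimal sketch: the helper names `plane`, `is_private_use` and `describe` are illustrative, and the private-use ranges are those defined by the Unicode Standard (E000..F8FF in the BMP, plus planes 15 and 16).

```python
def plane(code_point: int) -> int:
    """Return the plane number (0..16) of a code point; plane 0 is the BMP."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space 0..10FFFF")
    return code_point >> 16


def is_private_use(code_point: int) -> bool:
    """True for the BMP Private Use Area and for PUA-A / PUA-B."""
    return (
        0xE000 <= code_point <= 0xF8FF          # BMP PUA
        or 0xF0000 <= code_point <= 0xFFFFD     # Plane 15 (PUA-A)
        or 0x100000 <= code_point <= 0x10FFFD   # Plane 16 (PUA-B)
    )


def describe(ch: str):
    """Return (code point label, plane, UTF-8 length in bytes, private use?)."""
    cp = ord(ch)
    return (f"U+{cp:04X}", plane(cp), len(ch.encode("utf-8")), is_private_use(cp))


# ASCII letter, accented Latin letter, Devanagari KA, a Plane 1 (Kaithi) letter
for sample in ["A", "\u00E9", "\u0915", "\U00011083"]:
    print(describe(sample))
```

ASCII stays at one byte, BMP Indic letters take three, and anything beyond the BMP takes four.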

10
Unicode Indic Scripts
  • Conjuncts: glyphs for rendering (not encoded in
    Unicode)
  • - Multiple ways to express
  • ZWNJ (U+200C) / ZWJ (U+200D)
  • Collation: Unicode order and language order
    differ
  • - Devanagari: Hindi / Marathi
  • - Latin (Danish / Norwegian Ä / Ö)
  • (Hungarian / German)
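
Both points on this slide can be illustrated briefly. The first part shows that a Devanagari conjunct is stored as a plain character sequence (consonant + virama + consonant) and that ZWNJ / ZWJ only change the requested rendering. The second part shows that sorting by code point differs from a language's expected order. The `german_like_key` function is a crude stand-in written for this sketch, not a real collation algorithm; actual applications would use a locale-aware collator.

```python
import unicodedata

KA, VIRAMA, SSA = "\u0915", "\u094D", "\u0937"
ZWNJ, ZWJ = "\u200C", "\u200D"

# Three encodings of "k + ssa"; the font decides the final glyphs.
conjunct = KA + VIRAMA + SSA                 # conjunct ligature where available
explicit_virama = KA + VIRAMA + ZWNJ + SSA   # ZWNJ: keep a visible virama
half_form = KA + VIRAMA + ZWJ + SSA          # ZWJ: request the half form of KA


def code_points(text: str) -> str:
    """List the code points of a string, e.g. 'U+0915 U+094D'."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


for label, seq in [("conjunct", conjunct),
                   ("explicit virama", explicit_virama),
                   ("half form", half_form)]:
    print(f"{label:16} {code_points(seq)}")


def german_like_key(word: str):
    """Illustrative key: compare base letters first, ignoring diacritics."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, word)


words = ["Zebra", "\u00C4pfel", "Apfel"]
print(sorted(words))                       # code-point order
print(sorted(words, key=german_like_key))  # diacritic-insensitive order
```

By code point, "Äpfel" sorts after "Zebra"; with the illustrative key it sorts next to "Apfel", which is closer to what a German reader expects.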

11
Unicode Stability Policy
  • Encoding Stability
  • - Code NOT to be moved /removed
  • - Later version Superset of Earlier Version
  • Name Stability
  • - Character Name NOT to be changed
  • Identity Stability
  • - Identifying characteristic unchanged
  • - Glyph / Case Mapping / ...
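
Encoding and name stability mean that a code point and its character name can be treated as permanent identifiers. The sketch below uses Python's `unicodedata` module to round-trip between the two. It also shows one well-known consequence of name stability: the misspelled name of U+FE18 remains in the standard because names cannot be changed.

```python
import unicodedata

# Which Unicode version this Python build ships with
print(unicodedata.unidata_version)

# Code point -> name -> code point round trip for Devanagari KA
ka = "\u0915"
ka_name = unicodedata.name(ka)
print(ka_name)
assert unicodedata.lookup(ka_name) == ka

# The name of U+FE18 contains the typo "BRAKCET"; name stability keeps it as is
typo_name = unicodedata.name("\uFE18")
print(typo_name)
```

Because later versions are supersets of earlier ones, text encoded against an older version stays valid under the database reported by `unidata_version`.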

12
Unicode: Stages for Acceptance
  • Initial & Explore Stage
  • Chakma / Newari (BMP), Kaithi / Ahom / Indus
    Valley (Plane 1)
  • Proposal in Early Committee Recommendation
  • Manipuri (BMP), Brahmi (Plane 1)
  • Approved Proposal with ISO
  • Lepcha / Ol Chiki / Saurashtra (BMP)
  • Finalized Encoding: Script in Pre-Publication
  • Code pages for Limbu / Kharoshthi / Syloti Nagri

13
Tasks
  • Convergence: State Govts. / language communities
  • Active participation in Unicode:
    recognized languages
  • Additional initiatives for
    other Indic-region scripts
  • E-governance applications: Unicode-compliant
    Indic scripts