1
Unicode and the University: How script encoding
can change future research, teaching, and
outreach of the academy
  • Deborah Anderson
  • Researcher, Dept. of Linguistics
  • UC Berkeley

Yale University, 30 June 2005
3
1. Background: What is Unicode?
  • International character encoding standard
  • Character encoding: the assignment of a number to a
    letter or symbol found in a text
  • T is 0054 (U+0054)
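
The number assigned to a character can be checked directly. A minimal Python sketch using only the built-in ord and chr functions; the helper name code_point_label is made up for illustration and is not part of any library:

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    # ord() gives the code point as an integer; format it as 4+ hex digits.
    return f"U+{ord(ch):04X}"


print(code_point_label("T"))  # U+0054
print(chr(0x0054))            # T
```

The same helper works for any character, since every encoded letter or symbol has exactly one such number.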

4
1. Background: What is Unicode?
  • Unicode is how MS Word documents and webpages are
    typically stored on modern computers.
  • Unicode serves as the basis of standards for
    keyboards, fonts, e-mail and web documents, and
    text searching today.
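
When text is stored, each code point is written out as bytes by an encoding form such as UTF-8 or UTF-16. The slide does not say which form a given application uses, so the sketch below simply shows both for a two-character string, assuming Python:

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


text = "T\u00e9"  # "T" (U+0054) followed by "é" (U+00E9)

utf8 = text.encode("utf-8")        # variable-length: 1 byte for T, 2 for é
utf16 = text.encode("utf-16-be")   # 2 bytes each for these characters

print(hex_bytes(utf8))   # 54 c3 a9
print(hex_bytes(utf16))  # 00 54 00 e9

# Decoding with the matching encoding form restores the original text.
assert utf8.decode("utf-8") == text
```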

5
1. Background: What is Unicode?
  • Unicode encodes characters, not glyphs
  • Character: the smallest component of written
    language that has semantic value
  • Glyph: the surface representation of a
    character as it appears on the printed page, on
    your monitor, etc.; glyphs make up fonts

6
1. Background: What is Unicode?
  • Unicode's domain
  • Abstract character: LATIN SMALL LETTER A
  • Font's domain
  • Glyphs: a a a a a a
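
The abstract character and its name can be inspected independently of any font. A small sketch, assuming Python's standard unicodedata module (the names come from whatever Unicode version ships with the interpreter):

```python
import unicodedata

ch = "a"

# The character's identity is its code point and name, not its shape.
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+0061 LATIN SMALL LETTER A

# The name maps back to the same character, whatever font later draws it.
assert unicodedata.lookup("LATIN SMALL LETTER A") == ch
```

How the letter looks on screen is decided later, when a font supplies a glyph for U+0061.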

7
1. Background: What is Unicode?
  • Unicode Standard 4.1 defines over 97,000
    characters, including a wide variety of symbols
    (IPA, math symbols, arrows, punctuation, etc.)
  • Over 50 different scripts are covered
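
The size of the repertoire can be counted from a Unicode character database. The sketch below assumes Python's unicodedata module, which bundles a later Unicode version than 4.1, so the count it prints will be larger than the figure on the slide; the function name count_assigned is illustrative only:

```python
import sys
import unicodedata


def count_assigned() -> int:
    """Count code points that are assigned to a character."""
    # Skip unassigned (Cn), private-use (Co), and surrogate (Cs) code points.
    skipped = {"Cn", "Co", "Cs"}
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in skipped
    )


print("Unicode version:", unicodedata.unidata_version)
print("Assigned characters:", count_assigned())
```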

8
1. Background: What is Unicode?
www.unicode.org
9
Missing Modern Scripts
...but over 80 scripts are missing
  • China: Lanna, Naxi Geba, Naxi Tomba, Nushu,
    Pollard
  • Africa: Bamum, Bassa, Mende, Vai
  • Southeast Asia (excluding China): Batak, Cham,
    Javanese, Kayah Li, Pahawh Hmong, Rejang,
    Viet Thai
  • India, Nepal, and/or Bangladesh: Chakma, Kaithi,
    Lepcha, Methei/Manipuri, Newari, Ol Chiki,
    Saurashtra, Sorang Sompeng, Varang Kshiti
  • Other: Sutton SignWriting, Blissymbolics,
    Moonscript

10
Missing Historic Scripts
  • Ahom
  • Alpine
  • Aramaic
  • Avestan
  • Aztec Pictograms
  • Balti
  • Bamum
  • Brahmi
  • Büthakukye
  • Byblos
  • Carian
  • Chalukya Chola
  • Cypro-Minoan
  • Egyptian Hieroglyphs
  • Elbasan
  • Elymaic
  • Glagolitic
  • Grantha
  • Hatran
  • Hungarian Runic
  • Iberian
  • Indus Valley
  • Jurchin
  • Kaithi
  • Kawi
  • Khotanese
  • Kitan Large Script
  • Kitan Small Script
  • Landa
  • Linear A
  • Luwian
  • Lycian
  • Lydian
  • Mandaic
  • Manichaean
  • Mayan Hieroglyphs
  • Meroitic
  • Modi
  • Nabataean
  • North Arabic
  • Numidian
  • Old Permic
  • Orkhon
  • Pahlavi
  • Palmyrene
  • Phoenician
  • Proto-Elamite
  • Pyu
  • Rongo Rongo
  • Samaritan
  • Satavahana
  • Sharada
  • Siddham
  • South Arabian
  • Soyombo
  • Takri
  • Tangut Ideographs
  • Uighur
  • Vedic accents

11

Missing symbols
  • Technical symbols
  • Map symbols
12
http://linguistics.berkeley.edu/sei
13
1. Background: How to Get Unicode to Work
  • The character/script has to be included in The
    Unicode Standard
  • A Unicode-compliant font with the needed glyph(s)
  • Recent operating system (OS X, Windows XP, 2000)
  • Recent browser (Firefox, Safari, Internet
    Explorer)
  • Unicode-compliant text editor and input method
    (MS Word, Word 2004, Mellel)

14
2. The Unicode approval process
  • Proposals for new characters and scripts must be
    approved by two standards bodies
  • 1. Unicode Technical Committee

15
Unicode Technical Committee
16
2. The Unicode approval process
  • Proposals for new characters and scripts must be
    approved by two standards bodies
  • 1. Unicode Technical Committee
  • 2. ISO Working Group 2 (composed of 32 voting
    national body representatives)

17
2. The Unicode approval process
  • Represented at UTC: computer companies,
    companies requiring multilingual capabilities,
    other organizations
  • Represented at ISO: governments
  • Q: Who represents scholars, educators, and
    minority communities?

18
2. Unicode approval process
  • In May 2005 Unicode added a new membership level
  • Full membership: 1 vote in UTC
  • Institutional membership: 1 vote
  • Supporting membership: ½ vote
  • Associate membership: no vote

21
Side Note: Other UTC Activities

22
3. Unicode and the University: Opportunities and
Issues
  • Offers a single international standard for
    interchange of text data
  • Supported by major corporations and governments
  • Provides a format that will remain pivotal for
    many years to come
  • Capable of representing all the languages of the
    world in their original script (with space for
    over 1 million characters)
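
The "over 1 million" figure follows from the layout of the Unicode code space: 17 planes of 65,536 code points each. A short arithmetic check, assuming Python (the constant names are illustrative):

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

code_space = PLANES * CODE_POINTS_PER_PLANE
print(code_space)  # 1114112

# The highest code point is U+10FFFF, so the space holds maxunicode + 1 values.
assert code_space == sys.maxunicode + 1
```

Not every one of these code points is available for characters (surrogates and noncharacters are reserved), but the usable space still exceeds one million.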

23
3. Opportunities: Teaching
  • Impact on online course materials
  • Multilingual materials will be widely accessible
    without requiring special fonts (for both local
    and distance learning)
  • Text materials will be searchable
  • Once a script is in Unicode and Unicode fonts
    are available, chat, email, and blogs in that
    script will be possible
  • Helps keep alive the study of lesser-known
    languages, both historic and modern

24
3. Opportunities: Research
  • Research papers will be easily shared with
    colleagues and publishers (without relying on
    special fonts)
  • Possible to search within a document and across
    the Internet in the original script (without
    transliteration or transcription)
  • Text will be saved in a standardized way, without
    resorting to proprietary font encodings
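
Searching in the original script works reliably when both the query and the text are normalized first, because the same accented letter can be stored either precomposed or as a base letter plus a combining mark. A minimal sketch, assuming Python's unicodedata module; the contains helper and the sample strings are made up for illustration:

```python
import unicodedata


def contains(haystack: str, needle: str) -> bool:
    """Substring search after normalizing both strings to NFC."""
    norm_haystack = unicodedata.normalize("NFC", haystack)
    norm_needle = unicodedata.normalize("NFC", needle)
    return norm_needle in norm_haystack


# Query: "λόγος" with a precomposed ό (U+03CC).
needle = "\u03bb\u03cc\u03b3\u03bf\u03c2"
# Stored text: the same word with ο (U+03BF) + combining acute (U+0301).
document = "\u03bf \u03bb\u03bf\u0301\u03b3\u03bf\u03c2"

print(needle in document)          # False: code point sequences differ
print(contains(document, needle))  # True: equal after normalization
```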

25
3. Opportunities: Outreach
  • By participating in Unicode, the university can
    help ensure linguistic diversity and cultural
    preservation (not a special concern of
    corporations)
  • Once a script is in Unicode, the university can
    take an active role in developing literacy
    materials (and creating fonts) with those in
    underdeveloped nations

26
Multikulti.org.uk
27
Ethnomed.org (Harborview Medical Center)
28
www.nkoinstitute.com
29
3. Opportunities: Benefits to Unicode
  • Gives the university's (and library's) perspective
  • Provide more eyes to review proposals and
    technical annexes, standards, and reports
  • University can serve as a conduit to specialists
    and user communities, who might otherwise be left
    out (e.g., Lepcha)

30
3. Issues at the University
  • Scholars have relied on work-arounds to get
    characters to appear:
  • non-Unicode fonts (used in documents, PDFs,
    webpages)
  • transliteration and transcription systems (e.g.,
    Beta Code for Greek)
  • use of images in webpages
  • Reluctance to change over to Unicode-compliant
    fonts (and migrate data)
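
Migrating data out of a transliteration scheme is largely a table lookup. The sketch below converts a tiny subset of Beta Code to Unicode Greek, assuming Python. It is deliberately partial: it handles only the 24 plain letters and word-final sigma, and ignores accents, breathings, capitals, and the rest of the scheme. The function name beta_to_unicode is hypothetical.

```python
import re

# Plain-letter subset of Beta Code (ASCII letter -> Greek letter).
BETA_TO_GREEK = dict(
    zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω")
)


def beta_to_unicode(text: str) -> str:
    """Convert unaccented Beta Code letters to Unicode Greek."""
    # Characters outside the table (spaces, punctuation) pass through.
    greek = "".join(BETA_TO_GREEK.get(ch, ch) for ch in text.lower())
    # A sigma at the end of a word takes its final form.
    return re.sub(r"σ\b", "ς", greek)


print(beta_to_unicode("logos"))  # λογος
```

A real migration would need the full scheme and careful review by specialists, which is part of why the change-over is felt to be costly.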

31
3. Issues at the University
  • Unicode is not especially well understood by
    language departments; it is considered a
    technical problem
  • Some in academia feel that problems will be
    solved if they wait long enough
  • Frustration felt when dealing with the computer
    and font industry

32
3. Issues at the University
  • Difficult to explain to administrators the
    importance of participating in Unicode
    development
  • Unicode cuts across a wide swathe of disciplines
    and areas
  • Takes a dedicated representative with time and
    travel funding to participate
  • Membership costs are felt to be prohibitive

33
Ramifications of NOT Participating in Unicode
development
  • The needs of those at the university will not be
    adequately met
  • The Web will be relegated to the more politically
    prominent languages and scripts, leaving out
    several minority scripts and furthering the
    digital divide; historic scripts will also be
    missing
  • It will be increasingly difficult to get missing
    scripts to work with off-the-shelf software

34
4. How to Assist
  • Participate in the Unicode Consortium
  • Quarterly meetings
  • Costs per year:
  • Full: $12K (rising to $15K as of 1 August)
  • Institutional: $9.5K (rising to $12K)
  • Supporting: $6K (rising to $7.5K)
  • Associate: $2K, or $1,200 for non-profits
    (rising to $2.5K, or $1,500 for non-profits)
  • Or work with SEI

35
4. How to Assist
  • Promote Unicode within the departments
  • Encourage scholars to get their professional
    societies to join Unicode (or to join themselves)
    or offer to review proposals.

36
Conclusion
  • Working with Unicode presents an opportunity to
    build the infrastructure for world-wide
    computerization and literacy.
  • Both Unicode and the university could benefit
    from this participation (but time is of the
    essence).

37
dwanders_at_berkeley.edu
SEI: linguistics.berkeley.edu/sei
Library-specific questions: jaliprand_at_pobox.com
Unicode website: www.unicode.org