Title: Unicode and the University: How script encoding can change future research, teaching, and outreach of the academy
- Deborah Anderson
- Researcher, Dept. of Linguistics
- UC Berkeley
Yale University, 30 June 2005
1. Background: What is Unicode?
- International character encoding standard
- Character encoding: assignment of a number to a letter or symbol found in a text
- Example: T is 0054 (U+0054)
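The T example can be checked in any language with Unicode strings. A minimal Python sketch, using only the standard library (the helper name `describe` is an illustrative assumption, not something from the slides):

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and the Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe("T"))  # U+0054 LATIN CAPITAL LETTER T
print(describe("a"))  # U+0061 LATIN SMALL LETTER A
print(chr(0x0054))    # T
```

The number is the same no matter which font later draws the letter.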
- Unicode is how MS Word documents and webpages are typically stored on modern computers.
- Unicode serves as the basis of standards for keyboards, fonts, e-mail and web documents, and text searching today.
- Unicode encodes characters, not glyphs
- Character: the smallest component of written language that has semantic value
- Glyphs are the surface representation of a character as it appears on the printed page, on your monitor, etc.; glyphs make up fonts
- Unicode's domain
- Abstract character: LATIN SMALL LETTER A
- Fonts' domain
- Glyphs: a a a a a a (the same character drawn in different typefaces)
- Unicode Standard 4.1 defines over 97,000 characters, including a wide variety of symbols (IPA, math symbols, arrows, punctuation, etc.)
- Over 50 different scripts are covered
- Unicode website: www.unicode.org
Missing Modern Scripts
But over 80 scripts are missing:
- China: Lanna, Naxi Geba, Naxi Tomba, Nushu, Pollard
- Africa: Bamum, Bassa, Mende, Vai
- Southeast Asia (excluding China): Batak, Cham, Javanese, Kayah Li, Pahawh Hmong, Rejang, Viet Thai
- India, Nepal, and/or Bangladesh: Chakma, Kaithi, Lepcha, Meithei/Manipuri, Newari, Ol Chiki, Saurashtra, Sorang Sompeng, Varang Kshiti
- Other: Sutton SignWriting, Blissymbolics, Moonscript
Missing Historic Scripts
- Ahom
- Alpine
- Aramaic
- Avestan
- Aztec Pictograms
- Balti
- Bamum
- Brahmi
- Büthakukye
- Byblos
- Carian
- Chalukya Chola
- Cypro-Minoan
- Egyptian Hieroglyphs
- Elbasan
- Elymaic
- Glagolitic
- Grantha
- Hatran
- Hungarian Runic
- Iberian
- Indus Valley
- Jurchin
- Kaithi
- Kawi
- Khotanese
- Kitan Large Script
- Kitan Small Script
- Landa
- Linear A
- Luwian
- Lycian
- Lydian
- Mandaic
- Manichaean
- Mayan Hieroglyphs
- Meroitic
- Modi
- Nabataean
- North Arabic
- Numidian
- Old Permic
- Orkhon
- Pahlavi
- Palmyrene
- Phoenician
- Proto-Elamite
- Pyu
- Rongo Rongo
- Samaritan
- Satavahana
- Sharada
- Siddham
- South Arabian
- Soyombo
- Takri
- Tangut Ideographs
- Uighur
- Vedic accents
Missing Symbols
- Technical symbols
- Map symbols
Script Encoding Initiative (SEI): http://linguistics.berkeley.edu/sei
1. Background: How to Get Unicode to Work
- The character/script has to be included in The Unicode Standard
- A Unicode-compliant font with the needed glyph(s)
- Recent operating system (OS X, Windows XP, 2000)
- Recent browser (Firefox, Safari, Internet Explorer)
- Unicode-compliant text editor and input method (MS Word, Word 2004, Mellel)
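The first requirement in the list, inclusion in the standard, can be tested programmatically. The sketch below assumes Python's bundled `unicodedata` module, whose data reflects whichever Unicode version the interpreter shipped with, so a script encoded after that version will still look missing. `is_encoded` is a hypothetical helper name.

```python
import unicodedata

def is_encoded(ch: str) -> bool:
    """True if the bundled Unicode database assigns a character here.

    General category 'Cn' covers unassigned code points and noncharacters.
    """
    return unicodedata.category(ch) != "Cn"

print(unicodedata.unidata_version)  # Unicode version known to this Python
print(is_encoded("T"))              # True
print(is_encoded("\u0378"))         # False while U+0378 remains unassigned
```

This covers only the first bullet. Whether the text actually displays still depends on the font, operating system, and application requirements that follow it.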
2. The Unicode approval process
- New character/script proposals must be approved by two standards bodies:
- 1. Unicode Technical Committee (UTC)
- 2. ISO Working Group 2 (composed of 32 voting national body representatives)
- Represented at UTC: computer companies, companies requiring multilingual capabilities, other organizations
- Represented at ISO: governments
- Q: Who represents scholars, educators, and minority communities?
- In May 2005 Unicode added a new membership level
- Full membership: 1 vote in UTC
- Institutional membership: 1 vote
- Supporting membership: ½ vote
- Associate membership: no vote
Side Note: Other UTC Activities
3. Unicode and the University: Opportunities and Issues
- Offers a single international standard for interchange of text data
- Supported by major corporations and governments
- Provides a format that will remain pivotal for many years to come
- Capable of representing all the languages of the world in their original script (with space for over 1 million characters)
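The figure of over a million follows from the layout of the Unicode code space: 17 planes of 65,536 code points each, running from U+0000 to U+10FFFF. A quick check in Python (the constant names are illustrative):

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points in each plane

total = PLANES * CODE_POINTS_PER_PLANE
print(total)                # 1114112
print(hex(sys.maxunicode))  # 0x10ffff, the highest code point
assert total == sys.maxunicode + 1
```

A small share of these code points is reserved (surrogates and noncharacters), so the number available for actual characters is slightly lower, but still above one million.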
3. Opportunities: Teaching
- Impact on online course materials
- Multilingual materials will be widely accessible without requiring special fonts (both for local and distance learning)
- Text materials will be searchable
- Once a script is in Unicode and Unicode fonts are available, chat, email, and blogs in that script will be possible
- Helps keep alive the study of lesser-known languages, both historic and modern
3. Opportunities: Research
- Research papers will be easily shared with colleagues and publishers (without relying on special fonts)
- Possible to search within a document and across the Internet in the original script (without transliteration or transcription)
- Text will be saved in a standardized way, without resorting to proprietary font encodings
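To illustrate searching in the original script, here is a small Python sketch. The sample text and the helper `contains` are assumptions made for illustration. Both strings are normalized to NFC before comparison, because the same accented letter can be stored either precomposed or as a base letter followed by a combining mark.

```python
import unicodedata

def contains(text: str, query: str) -> bool:
    """Search for query in text after normalizing both to NFC."""
    def nfc(s: str) -> str:
        return unicodedata.normalize("NFC", s)
    return nfc(query) in nfc(text)

# Greek sample text; the second letter is the precomposed
# U+1FC6 GREEK SMALL LETTER ETA WITH PERISPOMENI.
text = "μ\u1fc6νιν ἄειδε θεά"

# The same word typed with a decomposed accent:
# U+03B7 (eta) followed by U+0342 (combining perispomeni).
query = "μ\u03b7\u0342νιν"

print(query in text)          # False: the raw code point sequences differ
print(contains(text, query))  # True once both are normalized
```

No transliteration table is involved: the query is typed in Greek and matched against Greek.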
3. Opportunities: Outreach
- By participating in Unicode, the university can help ensure linguistic diversity and cultural preservation (not a special concern of corporations)
- Once a script is in Unicode, the university can take an active role in developing literacy materials (and creating fonts) with those in underdeveloped nations
- Multikulti.org.uk
- Ethnomed.org (Harborview Medical Center)
- www.nkoinstitute.com
3. Opportunities: Benefits to Unicode
- Gives the university's (or library's) perspective
- Provides more eyes to review proposals and technical annexes, standards, and reports
- University can serve as a conduit to specialists and user communities, who might otherwise be left out (e.g., Lepcha)
3. Issues at the University
- Scholars have relied on work-arounds to get characters to appear:
- non-Unicode fonts (used in documents, PDFs, webpages)
- transliteration and transcription systems (e.g., Beta Code for Greek)
- use of images in webpages
- Reluctance to change over to Unicode-compliant fonts (and migrate data)
- Unicode is not especially well understood by language departments; it is considered a technical problem
- Some in academia feel that problems will be solved if they wait long enough
- Frustration felt when dealing with the computer and font industry
- Difficult to explain to administrators the importance of participation in Unicode development
- Unicode cuts across a wide swathe of disciplines and areas
- Takes a dedicated representative with time and travel funding to participate
- Costs for membership felt to be prohibitive
Ramifications of NOT Participating in Unicode Development
- The needs of those at the university will not be adequately met
- The Web will be relegated to the more politically prominent languages and scripts, leaving out several minority scripts and furthering the digital divide; historic scripts will also be missing
- It will be increasingly difficult to get missing scripts to work with off-the-shelf software
4. How to Assist
- Participate in the Unicode Consortium
- Quarterly meetings
- Costs per year:
- Full: 12K (rising to 15K as of 1 August)
- Institutional: 9.5K (rising to 12K)
- Supporting: 6K (rising to 7.5K)
- Associate: 2K, or 1200 for non-profits (rising to 2.5K, or 1500 for non-profits)
- Or work with SEI
- Promote Unicode within the departments
- Encourage scholars to get their professional societies to join Unicode (or to join themselves) or to offer to review proposals
Conclusion
- Working with Unicode presents an opportunity to build the infrastructure for world-wide computerization and literacy.
- Both Unicode and the university could benefit from this participation (but time is of the essence).
Contact: dwanders_at_berkeley.edu
SEI: linguistics.berkeley.edu/sei
Library-specific questions: jaliprand_at_pobox.com
Unicode website: www.unicode.org