Unicode Support for Mathematics - PowerPoint PPT Presentation

1
Unicode Support for Mathematics
  • Murray Sargent III
  • Microsoft

2
Overview
  • Unicode math characters
  • Semantics of math characters
  • Unicode and markup
  • Multiple ways of encoding math characters
  • Not yet standardized math characters
  • Inputting math symbols

3
Unicode Math Characters
  • 340 math chars exist in ASCII, U+2200-U+22FF,
    arrows, combining marks of Unicode 3.0
  • 996 math alphanumeric characters are in Unicode
    3.1's Plane 1
  • 591 new math symbols and operators are in Unicode
    3.2's BMP
  • One math variant selector
  • One new combining character (reverse solidus).

4
Basic Set of Alphanumeric Characters
  • Latin digits (0 - 9)
  • Uppercase and lowercase Latin letters (a - z, A - Z)
  • Uppercase Greek letters Α - Ω plus the nabla ∇
    and the variant of theta ϴ given by U+03F4
  • Lowercase Greek letters α - ω plus the partial
    differential sign ∂ and glyph variants of ε, θ,
    κ, φ, ρ, and π
  • Only unaccented forms of letters are used

5
Math Alphanumeric Characters
  • Math needs various Latin and Greek alphabets like
    normal, bold, italic, script, Fraktur, and
    open-face
  • May appear to be font variations, but have
    distinct semantics
  • Without these distinctions, you get gibberish,
    violating the Unicode rule: plain text must contain
    enough info to permit the text to be rendered
    legibly, and nothing more
  • Plain-text searches should distinguish between
    alphabets, e.g., a search for script ℋ shouldn't
    match H, etc.
  • Reduces markup verbosity
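
The search point above can be illustrated with a minimal Python sketch that uses only the standard `unicodedata` module. The sample sentence and the helper name `fold_math_alphabets` are assumptions made for illustration. An exact search keeps script ℋ (U+210B) and plain H apart, while NFKC compatibility folding erases the distinction, which is the legibility loss the slide warns about.

```python
import unicodedata

SCRIPT_H = "\u210B"  # SCRIPT CAPITAL H, often used for the Hamiltonian

# Hypothetical sample text containing both the script and the plain letter.
text = "The Hamiltonian " + SCRIPT_H + " differs from the magnetic field H."

# An exact plain-text search keeps the two alphabets apart.
exact_script_hits = text.count(SCRIPT_H)
exact_plain_hits = text.count("H")


def fold_math_alphabets(s: str) -> str:
    """Hypothetical helper: erase math-alphabet distinctions via NFKC."""
    return unicodedata.normalize("NFKC", s)


# After folding, the script H is indistinguishable from the plain H.
folded_plain_hits = fold_math_alphabets(text).count("H")
```

A search tool that wants a loose match could fold both query and text, but the default should be the exact comparison.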

6
Legibility Loss
  • Without math alphabets, the Hamiltonian formula
  • ℋ = ∫ dτ (εE² + μH²)
  • becomes an integral equation
  • H = ∫ dτ (εE² + μH²)

7
Math Alphanumeric Chars (cont)
  • Plain a-z, A-Z, 0-9, α-ω, Α-Ω
  • Bold a-z, A-Z, 0-9, α-ω, Α-Ω
  • Italic a-z, A-Z, α-ω, Α-Ω
  • Bold italic a-z, A-Z, α-ω, Α-Ω
  • Script a-z, A-Z
  • Bold script a-z, A-Z
  • Fraktur a-z, A-Z
  • Bold Fraktur a-z, A-Z
  • Double struck a-z, A-Z, 0-9
  • Sans-serif a-z, A-Z, 0-9
  • Sans-serif bold a-z, A-Z, 0-9, α-ω, Α-Ω
  • Sans-serif italic a-z, A-Z
  • Sans-serif bold italic a-z, A-Z, α-ω, Α-Ω
  • Monospace a-z, A-Z, 0-9

8
How to Display Math Alphabets?
  • Can use Unicode surrogate pair mechanisms
    available on OS
  • Alternatively, bind to standard fonts and use
    corresponding BMP characters
  • The second approach is probably faster, and to
    display Unicode one needs font binding in any
    event. But most traditional fonts are not suited
    to math alphabetic characters
  • A single math font may look more consistent
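
Both display routes can be sketched in a few lines of Python. The function names are hypothetical, and the second function covers only the bold capital range as an example. The first computes the UTF-16 surrogate pair for a Plane 1 character. The second maps a math bold capital back to a BMP letter plus a style name, as a renderer that binds to a bold font might do.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


MATH_BOLD_CAPITAL_A = 0x1D400  # first character of the math alphanumerics block


def to_bmp_and_style(ch: str) -> Tuple[str, str]:
    """Hypothetical font-binding step, handling math bold capitals only."""
    offset = ord(ch) - MATH_BOLD_CAPITAL_A
    if 0 <= offset < 26:
        return "bold", chr(ord("A") + offset)
    return "normal", ch


high, low = to_surrogate_pair(MATH_BOLD_CAPITAL_A)
utf16_bytes = chr(MATH_BOLD_CAPITAL_A).encode("utf-16-be")
style, base = to_bmp_and_style(chr(MATH_BOLD_CAPITAL_A + 2))  # math bold C
```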

9
Math Alphabetics via Glyph Variants
  • One approach to the math alphanumerics would be
    to use a set of math glyph variant selectors
  • Such a tag would follow a base character
    imparting a math style
  • Approach was dropped since it seemed likely to be
    abused
  • One math variant selector does exist to offer a
    different line slant for some composite symbols
  • Other variant selectors are being defined for
    nonmath purposes, e.g., Han variants

10
Multiple Character Encodings
  • As with nonmath characters, math symbols can
    often be encoded in multiple ways, composed and
    decomposed
  • E.g., ≠ can be U+003D, U+0338 or U+2260
  • Recommendation: use the fully composed symbol,
    e.g., U+2260 for ≠
  • For alphabetic characters, use combining-mark
    sequences to get consistent typography
  • Some representations use markup for the
    alphabetic cases. This allows multicharacter
    combining marks.
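
The two encodings of ≠ can be checked with Python's standard `unicodedata` module. This is a small sketch: NFC yields the fully composed symbol recommended above, and NFD yields the base character followed by the combining long solidus overlay.

```python
import unicodedata

decomposed = "=\u0338"  # EQUALS SIGN + COMBINING LONG SOLIDUS OVERLAY
composed = "\u2260"     # NOT EQUAL TO

# The two spellings are different strings until they are normalized.
raw_equal = decomposed == composed

nfc = unicodedata.normalize("NFC", decomposed)  # compose
nfd = unicodedata.normalize("NFD", composed)    # decompose
```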

11
Compatibility Holes
  • Compatibility holes (reserved positions) exist in
    some Unicode sequences to avoid duplicate
    encodings (ugh!)
  • E.g., U+2071-U+2073 are holes for ¹²³, which are
    U+00B9, U+00B2, and U+00B3, respectively
  • Math alphanumerics have holes corresponding to
    Letterlike symbols.
  • Recommendation: you can use the hole codes
    internally, but must import and export the
    standard codes.
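
The import/export recommendation can be sketched as a pair of translation tables. The table below lists only four of the holes, and the function names are hypothetical. Internally the hole codes keep each alphabet contiguous, so a letter is just a base code point plus an offset. On export the holes are replaced by the standard Letterlike Symbols codes.

```python
# Hole code point -> standard code point (a partial table for illustration).
HOLE_TO_STANDARD = {
    0x1D455: 0x210E,  # math italic small h  -> PLANCK CONSTANT
    0x1D49D: 0x212C,  # math script capital B -> SCRIPT CAPITAL B
    0x1D4A3: 0x210B,  # math script capital H -> SCRIPT CAPITAL H
    0x1D53A: 0x2102,  # math double-struck capital C -> DOUBLE-STRUCK CAPITAL C
}
STANDARD_TO_HOLE = {std: hole for hole, std in HOLE_TO_STANDARD.items()}

MATH_SCRIPT_CAPITAL_A = 0x1D49C


def script_capital(letter: str) -> str:
    """Internal form: contiguous arithmetic, may land on a hole."""
    return chr(MATH_SCRIPT_CAPITAL_A + ord(letter) - ord("A"))


def export_text(internal: str) -> str:
    """Replace hole codes by the standard codes before writing out."""
    return internal.translate(HOLE_TO_STANDARD)


def import_text(external: str) -> str:
    """Map standard codes back onto the holes for internal use."""
    return external.translate(STANDARD_TO_HOLE)


exported_h = export_text(script_capital("H"))
exported_a = export_text(script_capital("A"))  # not a hole, so unchanged
```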

12
Nonstandard Characters
  • People will always invent new math characters
    that aren't yet standardized.
  • Use private use area for these with a
    higher-level marking that these are for math.
  • This approach can lead to collisions in the math
    community (unless a standard is maintained)
  • Cut/copy in plain text can have collisions with
    other uses of the private use area

13
Unicode and Markup
  • Unicode was never intended to represent all
    aspects of text
  • Language attribute: sort order, word breaks
  • Rich (fancy) text formatting: built-up fractions
  • Content tags: headings, abstract, author, figure
  • Glyph variants: Poetica font has 58 ampersands,
    Mantinia font has novel ligatures (TT, TE, etc.)
  • MathML adds XML tags for math constructs, but
    seems awfully wordy

14
Unicode Plain Text
  • Can do a lot with plain text, e.g., BiDi
  • Grey zone: use of embedded codes
  • Unicode ascribes semantics to characters, e.g.,
    paragraph mark, right-to-left mark
  • Lots of interesting punctuation characters in
    range U+2000 to U+204F
  • Extensive character semantics/properties tables,
    including mathematical, numerical

15
Unicode Character Semantics
  • Math characters have math property
  • Math characters are numeric, variable, or
    operator, but not a combination
  • Properties are useful in parsing math plain text
  • MathML doesn't use these properties; every
    quantity is explicitly tagged
  • Properties still can be useful for inputting text
    for MathML (no one wants to type all those tags!)
  • Sometimes default properties need to be overruled
  • Would be useful to have more math properties
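
A rough Python sketch of property-driven classification follows. Python's `unicodedata` module does not expose the Unicode Math property itself, so the general category is used here as a stand-in; that substitution, the `classify` name, and the small override table are assumptions. The overrides show why default properties sometimes need to be overruled: the ASCII hyphen-minus, for instance, is categorized as punctuation rather than as a math symbol.

```python
import unicodedata

# Hypothetical overrides for ASCII characters whose general category
# (punctuation) does not reflect their use as math operators.
OVERRIDES = {"-": "operator", "/": "operator", "*": "operator"}


def classify(ch: str) -> str:
    """Classify one character as numeric, variable, operator, or other."""
    if ch in OVERRIDES:
        return OVERRIDES[ch]
    category = unicodedata.category(ch)
    if category == "Nd":          # decimal digits, including math digits
        return "numeric"
    if category.startswith("L"):  # letters, including math alphanumerics
        return "variable"
    if category == "Sm":          # math symbols such as = + and N-ary sum
        return "operator"
    return "other"


kinds = [classify(ch) for ch in "x=2"]
```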

16
Plain Text Encoding
  • TeX fraction numerator is what follows a { up to
    keyword \over
  • Denominator is what follows the \over up to the
    matching }
  • { } are not printed
  • Simple rules give unambiguous plain text, but
    results don't look like math
  • How to make a plain text that looks like math?

17
Simple plain text encoding
  • Simple operand is a span of alphanumeric
    characters
  • E.g., simple numerator or denominator is
    terminated by any operator
  • Operators include arithmetic operators, most
    whitespace characters, all U+22xx, an argument
    break operator (displayed as small raised dot),
    sub/superscript operators
  • Fraction operator is given by the Unicode
    fraction slash operator U+2044

18
Fractions
  • abc/d gives a built-up fraction with numerator
    abc and denominator d
  • More complicated operands use parentheses ( ),
    brackets [ ], or braces { }
  • Outermost parens aren't displayed in built-up
    form
  • E.g., plain text (a + c)/d displays as a built-up
    fraction with numerator a + c and denominator d
  • Easier to read than TeX's, e.g., {a + c \over d}
  • MathML: <mfrac><mrow><mi>a</mi><mo>+</mo>
    <mi>c</mi></mrow><mrow><mi>d</mi></mrow></mfrac>
  • Neat feature: plain text looks like math
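
The fraction rules above can be sketched as a tiny Python converter. It handles only a single fraction whose operands are either alphanumeric spans or one parenthesized group, and it accepts both the ASCII slash and the fraction slash U+2044. All function names are hypothetical, and treating every letter as its own `<mi>` element is an assumption chosen to match the MathML on the slide.

```python
import re
from xml.sax.saxutils import escape

# An operand is a parenthesized group (no nesting) or an alphanumeric span.
OPERAND = r"(\([^()]*\)|\w+)"
FRACTION = re.compile(OPERAND + r"\s*[/\u2044]\s*" + OPERAND)


def strip_outer_parens(operand: str) -> str:
    """Outermost parentheses are not displayed in built-up form."""
    if operand.startswith("(") and operand.endswith(")"):
        return operand[1:-1].strip()
    return operand


def split_fraction(text: str):
    match = FRACTION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a simple fraction: {text!r}")
    return strip_outer_parens(match.group(1)), strip_outer_parens(match.group(2))


def row(operand: str) -> str:
    """Wrap each character of an operand in a presentation MathML token."""
    parts = []
    for ch in operand:
        if ch.isspace():
            continue
        tag = "mn" if ch.isdigit() else "mi" if ch.isalpha() else "mo"
        parts.append(f"<{tag}>{escape(ch)}</{tag}>")
    return "<mrow>" + "".join(parts) + "</mrow>"


def to_mathml(text: str) -> str:
    numerator, denominator = split_fraction(text)
    return "<mfrac>" + row(numerator) + row(denominator) + "</mfrac>"


mathml = to_mathml("(a + c)/d")
```

The nine-character plain-text input expands to roughly ten times as much MathML, which is the verbosity the slide points at.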

19
Subscripts and Superscripts
  • Unicode has numeric subscripts and superscripts
    along with some operators (U+2070-U+208E)
  • Others need some kind of markup like
    <msup></msup>
  • With special subscript and superscript operators
    (not yet in Unicode), these scripts can be
    encoded in nested form
  • Use parentheses as for fractions to overrule
    built-in precedence order
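
A short Python sketch of what the existing subscript and superscript characters cover. The function names are hypothetical. Digits and the operators + - = ( ) have script forms in U+2070-U+208E (with ¹²³ living at U+00B9, U+00B2, U+00B3); anything else raises an error here, standing in for the case that needs markup or a dedicated script operator.

```python
ASCII = "0123456789+-=()"

# Superscripts: zero, then the Latin-1 one/two/three, then U+2074-U+207E.
SUPERSCRIPTS = "\u2070\u00B9\u00B2\u00B3" + "".join(
    chr(cp) for cp in range(0x2074, 0x207F)
)
# Subscripts are contiguous: U+2080-U+208E.
SUBSCRIPTS = "".join(chr(cp) for cp in range(0x2080, 0x208F))

TO_SUPER = str.maketrans(ASCII, SUPERSCRIPTS)
TO_SUB = str.maketrans(ASCII, SUBSCRIPTS)


def _convert(text: str, table: dict) -> str:
    for ch in text:
        if ch not in ASCII:
            raise ValueError(f"no Unicode script form for {ch!r}; needs markup")
    return text.translate(table)


def superscript(text: str) -> str:
    return _convert(text, TO_SUPER)


def subscript(text: str) -> str:
    return _convert(text, TO_SUB)


squared = "x" + superscript("2")
indexed = "a" + subscript("10")
```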

20
Presentation markup
  • Presentation markup directs how the math should
    be rendered.

<mrow> <mi>E</mi> <mo>=</mo> <mrow>
<mi>m</mi> <mo>&InvisibleTimes;</mo>
<msup> <mi>c</mi> <mn>2</mn>
</msup> </mrow> </mrow>
21
Content markup
  • Content markup describes the meaning of the
    expression, not the format.

<rel> <eq/> <ci>E</ci> <apply>
<times> <ci>m</ci> <apply>
<power/> <ci>c</ci>
<cn>2</cn> </apply> </times>
</apply> </rel>
22
(No Transcript)
23
Unicode TeX Example
24
Symbol Entry
  • GUI PCs can display a myriad glyphs, mathematics
    symbols, and international characters
  • Hard to input special symbols. Menu methods are
    slow. Hot keys are great but hard to learn
  • Reexamine and improve symbol-input and storage
    methods
  • With left/right Ctrl/Alt keys, PC keyboard gives
    direct access to 600 symbols. Maximum possible:
    2^100 ≈ 10^30
  • Use on-screen, customizable, keyboards and symbol
    boxes
  • Drag and drop any symbol into apps or onto keyboards

25
Hex to Unicode Input Method
  • Type Unicode character hexadecimal code
  • Make corrections as need be
  • Type Alt+x to convert to character
  • Type Alt+x to convert back to hex (useful
    especially for missing glyph character)
  • Resolve ambiguities by selection
  • Input higher-plane chars using 5 or 6-digit code
  • New MS Word standard
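
The toggle described above can be approximated in Python as follows. This is a sketch of the idea, not of Word's actual behavior: the function names, the 2-to-6-digit rule, and the decision to convert the longest trailing hex run are all assumptions. A real implementation also has to resolve ambiguities (for example, a hex code typed right after a word ending in a-f), which the slide handles by selection.

```python
import re

# Longest run of 2 to 6 hex digits at the end of the text (an assumption).
TRAILING_HEX = re.compile(r"[0-9A-Fa-f]{2,6}$")


def hex_to_char(text: str) -> str:
    """Replace a trailing hex code by the character it names."""
    match = TRAILING_HEX.search(text)
    if match is None:
        return text
    code_point = int(match.group(), 16)
    if code_point > 0x10FFFF:
        return text  # not a Unicode code point; leave the text alone
    return text[: match.start()] + chr(code_point)


def char_to_hex(text: str) -> str:
    """Replace the last character by its hexadecimal code."""
    if not text:
        return text
    return text[:-1] + format(ord(text[-1]), "X")


not_equal = hex_to_char("a 2260")
bold_a = hex_to_char("1D400")  # 5-digit code for a higher-plane character
```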

26
Built-Up Formula Heuristics
  • Math characters identify themselves and neighbors
    as math
  • E.g., fraction slash (U+2044), ASCII operators,
    U+2200-U+22FF, and U+20D0-U+20FF identify neighbors
    as mathematical
  • Math characters include various English and Greek
    alphabets
  • When heuristics fail, user can select math mode:
    WYSIWYG instead of visible math on/off codes
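
A minimal Python sketch of such a heuristic, under stated assumptions: the set of ASCII operators that count as math triggers is a guess (the hyphen is left out because it is common in prose), and the function names are hypothetical. The code-point ranges are the ones named on the slide plus the math alphanumerics block U+1D400-U+1D7FF.

```python
# Assumed set of ASCII operators that mark their neighbors as math.
ASCII_OPERATORS = set("+=<>^")

FRACTION_SLASH = "\u2044"


def is_math_trigger(ch: str) -> bool:
    """True if the character identifies itself and its neighbors as math."""
    cp = ord(ch)
    return (
        ch == FRACTION_SLASH
        or ch in ASCII_OPERATORS
        or 0x2200 <= cp <= 0x22FF    # Mathematical Operators
        or 0x20D0 <= cp <= 0x20FF    # combining marks for symbols
        or 0x1D400 <= cp <= 0x1D7FF  # math alphanumerics
    )


def looks_like_math(text: str) -> bool:
    return any(is_math_trigger(ch) for ch in text)
```

When this returns the wrong answer, the fallback on the slide applies: the user selects math mode explicitly.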

27
Operator Precedence
  • Everyone knows that multiply takes precedence
    over add, e.g., 3+5×3 = 18, not 24
  • C-language precedence is too intricate for most
    programmers to use extensively
  • TeX doesn't use precedence; relies on { } to
    define operator scope
  • In general, ( ) can be used to clarify or
    overrule precedence
  • Precedence reduces clutter, so some precedence is
    desirable (else things look like LISP!)
  • But keep it simple enough to remember easily
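
A two-level precedence scheme is easy to implement, as this Python sketch of a precedence-climbing evaluator suggests. The table, the token pattern, and the function names are assumptions for illustration; it handles integers, + - × * /, and parentheses, and does no error recovery.

```python
import re

# Two precedence levels only: additive below multiplicative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "\u00D7": 2, "/": 2}
TOKEN = re.compile(r"\s*(\d+|[-+*/()\u00D7])")


def tokenize(text: str) -> list:
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def apply_operator(op: str, left, right):
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op == "/":
        return left / right
    return left * right  # "*" or the multiplication sign


def evaluate(text: str):
    tokens = tokenize(text)
    pos = 0

    def primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expression(1)
            pos += 1  # skip the closing parenthesis
            return value
        return int(token)

    def expression(min_precedence: int):
        nonlocal pos
        left = primary()
        while (
            pos < len(tokens)
            and tokens[pos] in PRECEDENCE
            and PRECEDENCE[tokens[pos]] >= min_precedence
        ):
            op = tokens[pos]
            pos += 1
            # Operands of higher precedence bind first (left-associative).
            right = expression(PRECEDENCE[op] + 1)
            left = apply_operator(op, left, right)
        return left

    return expression(1)


with_precedence = evaluate("3+5\u00D73")
with_parens = evaluate("(3+5)\u00D73")
```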

28
Layout Operator Precedence
  • Subscript, superscript
  • Integral, sum ∫ Σ Π
  • Functions √
  • Times, divide /
  • Other operators Space ". , - Tab
  • Right brackets )
  • Left brackets (
  • End of paragraph FF EOP

29
Mathematics as a Programming Language
  • Fortran made great steps in getting computers to
    understand mathematics
  • Java and C accept Unicode variable names
  • C++ has preprocessor and operator overloading,
    but needs extensions to be really powerful
  • Use Unicode characters including math
    alphanumerics
  • Use plain-text encoding of mathematical
    expressions
  • Can't use all mathematical expressions as code,
    but can go much further than current languages go
  • When to multiply? In abstract, multiplication
    is infinitely fast and precise, but not on a
    computer
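
Python is one present-day language that accepts Unicode variable names, so it can serve as a small illustration. The values below are made up, and the assignment is read as gammap = gamma*sqrt(1 + I2), which is an assumption about the deck's sample routine. One caveat: Python normalizes identifiers with NFKC, so a math-alphabet letter such as script ℋ is folded to plain H and the semantic distinction is lost at the language level.

```python
import math

# Greek letters are legal Python identifiers (illustrative values).
γ = 2.0
I2 = 3.0
γp = γ * math.sqrt(1 + I2)  # read as: gammap = gamma*sqrt(1 + I2)

# ASCII alias so the result can be inspected without typing Greek.
gamma_p = γp

# Identifiers are NFKC-normalized: assigning to script H binds plain "H".
namespace = {}
exec("\u210B = 42", namespace)
script_h_folded_to_plain = "H" in namespace
```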

30
void IHBMWM(void)
{
    gammap = gamma*sqrt(1 + I2);
    upsilon = cmplx(gamma+gamma1, Delta);
    alphainc = alpha0*(1-(gamma*gamma*I2/gammap)/(gammap + upsilon));
    if (!gamma1 && fabs(Delta*T1) < 0.01)
        alphacoh = -half*alpha0*I2*pow(gamma/gammap, 3);
    else
    {
        Gamma = 1/T1 + gamma1;
        I2sF = (I2/T1)/cmplx(Gamma, Delta);
        betap2 = upsilon*(upsilon + gamma*I2sF);
        beta = sqrt(betap2);
        alphacoh = 0.5*gamma*alpha0*(I2sF*(gamma + upsilon)
            /(gammap*gammap - betap2))
            *((1+gamma/beta)*(beta - upsilon)/(beta + upsilon)
            - (1+gamma/gammap)*(gammap - upsilon)/(gammap + upsilon));
    }
    alpha1 = alphainc + alphacoh;
}
31
(No Transcript)
32
(No Transcript)
33
Conclusions
  • Unicode provides great support for math in both
    marked up and plain text
  • Unicode character properties facilitate
    plain-text encoding of mathematics but aren't
    used in MathML
  • Heuristics allow plain text to be built up
  • Need two more Unicode assignments: subscript and
    superscript operators
  • On-screen keyboards and symbol boxes aid formula
    entry
  • Unicode math characters could be useful for
    programming languages