Windows 2000 Indian Language Developers Conference

1
Windows 2000 Indian Language Developers
Conference
  • F. Avery Bishop
  • Senior Program Manager for Multilingual Developer
    Communications, and
  • David C. Brown
  • Development Lead for Complex Script Enabling in
    Windows Operating Systems
  • Microsoft Corporation

2
Agenda for the Day
  • Welcome and Keynote
  • International Features of Windows 2000
  • Complex Script Processing in Windows 2000
  • Uniscribe: The Unicode Script Processor
  • Lunch
  • Guidelines for supporting complex scripts in
    Win32 applications
  • Supporting Indian text in Enterprise applications
  • Introduction to OpenType Fonts
  • Microsoft developer programs in India

3
Updates on Session Materials
  • Today's presentations vary slightly from your
    session handouts
  • For updates to ppt files and demos,
    see www.microsoft.com/globaldev

4
International Features in Microsoft Windows 2000
F. Avery Bishop
Senior Program Manager
Microsoft Corporation
5
Agenda: International Features of Windows 2000
  • Definitions of key concepts
  • Windows 2000 single-binary internationalization
  • Multilingual content
  • Windows 2000 Multilanguage version
  • New complex script support, including
  • Support for Indian languages
  • Complex Scripts in web pages
  • Right-to-left layout of shell, applications

Old name: Windows NT 5.0
6
Definitions
  • Script: A set of symbols used to write one or
    more languages
  • Locale
  • A place or locality (dictionary definition)
  • Set of user preferences related to language and
    local customs
  • Language Group: Term used to describe the
    supported script families in Windows NT 5

7
Definitions
  • System Locale: Not really a locale. Determines
    which script non-Unicode applications will
    support (e.g., which Windows 9x system Windows NT
    emulates)
  • User Locale: User preferences for formatting of
    dates, currencies, numbers, etc.
  • Input Locale: Pairing of input language and
    method of input; determines what language is
    currently being entered and how

8
Definitions
  • Enabling for a script: Adding support for input,
    display, and output of the script
  • Localization: Translating user interface elements
  • Globalization: Developing software such that
    feature design and code design are not limited to
    a single locale or script

9
Definitions
  • Complex Scripts: Scripts that require contextual
    processing for display, editing, and other
    processing

10
All language versions of Windows 2000 use the
same core binary files! So what?
  • Advantages to Users
  • Can enter text in any supported language on any
    version of Windows 2000
  • Any language version of a well-written Win32 app
    runs on any language version of Windows 2000
  • Advantages to developers
  • Develop all language versions on one system
  • Can develop and ship a single binary for all
    languages

11
More on Unified language Support in Windows 2000
  • Effect of system default locale on application
  • ANSI applications require appropriate system
    locale setting
  • ANSI/Unicode applications may require system
    locale setting (more on this later)
  • Pure Unicode applications work with any system
    locale
  • Native Unicode support
  • Important: New scripts will have no code page; the
    support is through Unicode only (e.g., Indian
    scripts, Armenian, Georgian)
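
The code page point can be illustrated with a minimal Python sketch (not from the talk). It checks that a Devanagari string has no representation in a Western ANSI code page but round-trips through UTF-16, the encoding behind Win32 wide-character APIs. The sample string and the helper name `fits_code_page` are assumptions made for illustration.

```python
# Sample text (an assumption): the word "Hindi" written in Devanagari.
text = "\u0939\u093F\u0928\u094D\u0926\u0940"

def fits_code_page(s: str, code_page: str) -> bool:
    """Return True if every character of s exists in the given code page."""
    try:
        s.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

# Windows-1252 (Western European ANSI) contains no Devanagari characters.
ansi_ok = fits_code_page(text, "cp1252")

# UTF-16 little-endian: two bytes per character for this text.
utf16 = text.encode("utf-16-le")
round_trip = utf16.decode("utf-16-le")
```

An ANSI application running under a Western system locale would therefore have no way to store this string; a Unicode application is unaffected.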

12
Unicode allows processing of Multilingual Content
  • System components
  • Internet Explorer 5.0 can do amazing things!
  • Others: Winlogon, file system, Notepad, etc.
  • Unicode applications
  • Office 2000
  • Your application!

13
Windows 2000 Multilanguage Version
  • Language of menus and dialogs is a
    per-user-setting
  • Installable language modules
  • Sold through MOLP, Select, and Enterprise
    Agreement
  • Available to developers through MSDN

14
Support for Complex Scripts in Windows 2000
A complex script is one that requires special
processing, such as
  • Bi-directional (BiDi) reordering (Arabic, Hebrew)
  • Contextual shaping (Arabic, Indic family)
  • Display of combining characters (Arabic, Thai,
    Indian)
  • Specialized word-break and justification rules
    (Thai)
  • Disallowing illegal character combinations
    (Indian, Thai)

15
RTL Orientation, or Mirroring
16
Right-to-Left Mirroring API
  • One function call will mirror all windows in an
    application
  • Can also mirror selective windows
  • APIs to suppress mirroring of bitmaps
  • May need to modify coding practices

17
Support for Indian Languages in Windows 2000
  • APIs handle Devanagari and Tamil text through
    Unicode
  • Locale support
  • Time, Date, number, currency formats
  • Sorting
  • Conversion
  • Explicit function calls convert to/from ISCII
  • No Windows 98 compatibility mode

18
How We Developed Indian Script Support in Windows
2000
  • Worked with Government organizations
  • Consulted with NCST, CDAC, academics
  • Brought engineers from NCST
  • Added Indian shaping engines to Uniscribe
  • Helped define feature tables for OpenType
  • Hired Hindi/Tamil speakers to test

19
Complex Scripts in Web pages
  • IE 5.0 supports complex scripts, including
    Devanagari and Tamil in
  • Standard HTML text
  • DHTML: All properties in DOM
  • XML
  • Recommended encoding is UTF-8
  • Place charset=utf-8 in HTTP header
  • Allows mixed scripts
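
A small Python sketch (not from the talk) of why UTF-8 allows mixed scripts: Latin, Devanagari, and Tamil text travel in one byte stream under a single charset declaration. The sample string and the exact header value are assumptions for illustration.

```python
# Mixed-script sample (an assumption): Latin, Devanagari, and Tamil in one string.
page_text = "Hello नमस्ते வணக்கம்"

# A Content-Type header value carrying the charset declaration.
content_type = "text/html; charset=utf-8"

body = page_text.encode("utf-8")
decoded = body.decode("utf-8")

# ASCII stays one byte per character; Devanagari and Tamil letters take three.
bytes_per_char = {ch: len(ch.encode("utf-8")) for ch in ("H", "\u0928", "\u0BB5")}
```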

20
Demo!
21
Questions?
22
Further Information and Resources
  • http://www.microsoft.com/globaldev (Watch for
    updates!)
  • MSJ articles, e.g.,
  • Uniscribe: http://www.microsoft.com/msj/1198/multilang/multilangtop.htm
  • Multilingual UI: Coming April 1999
  • Send suggestions to nlshelp@microsoft.com

23
Break!
24
Complex Script Processing in Microsoft Windows 2000
David Brown
Development Lead
Microsoft Corporation
25
Agenda
  • Overview
  • Implementation
  • Details

26
1. Overview
  • Distinct language groups
  • Mix any and all scripts
  • Most apps are easy to develop
  • CS = Complex Script

27
Complex Script Language groups
  • Arabic, Hebrew, Indic, Thai, Vietnamese
  • Part of ALL versions of Windows 2000
  • Enable in Control Panel - Regional Settings
  • Turn it on today!

28
All scripts, any mix
  • Unicode makes representation easy
  • Common framework and APIs
  • Individual script and font handlers
  • Multilingual for no extra effort

29
Built into standard system APIs
  • Plain text
  • ExtTextOut, DrawText, TabbedTextOut
  • System edit control
  • Dialog boxes
  • Formatted text
  • Richedit
  • HTML control
  • See the Win32 SDK
  • Don't write your own formatting

30
Font fallback
  • Standard system fonts
  • For dialogs, plaintext edit controls
  • and other plaintext display
  • Dialog boxes work automatically

31
Summary
  • CS support is standard in Windows 2000
  • No restrictions on script combinations
  • Easy (unless you are implementing your own
    formatting)

32
2. Implementation
  • Callouts from GDI and USER
  • Performance
  • Text broken by script and direction
  • Script handlers
  • LPK.DLL

33
Callouts from GDI and USER
  • ExtTextOut, DrawText passed early to LPK.DLL
  • Plaintext edit control has many callouts
  • Caret placement
  • Text measurement
  • Line breaking
  • Word advance
  • Safe, stable changes to OS core

34
Fast path for non CS
  • Normal GDI 1:1 char to glyph
  • Simple side by side placement
  • No CS characters
  • If right-to-left, no neutrals
  • If Digit substitution, no digits
  • Performance is good
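
The fast-path conditions above can be sketched as a simple pre-scan in Python. This is an illustration only: the function names, the complex-script ranges, and the set of "neutral" bidi classes are assumptions, not the tables the system actually uses.

```python
import unicodedata

# Illustrative ranges only (an assumption): Hebrew, Arabic,
# Devanagari through Malayalam, and Thai.
COMPLEX_RANGES = [(0x0590, 0x05FF), (0x0600, 0x06FF),
                  (0x0900, 0x0D7F), (0x0E00, 0x0E7F)]

# Bidi classes treated as neutral in this sketch.
NEUTRAL_CLASSES = ("WS", "ON", "B", "S")

def is_complex(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMPLEX_RANGES)

def can_use_fast_path(text: str, rtl_reading: bool = False,
                      digit_substitution: bool = False) -> bool:
    """True if the text can go straight to simple 1:1 glyph output."""
    for ch in text:
        if is_complex(ch):
            return False  # complex-script character present
        if digit_substitution and ch.isdigit():
            return False  # digits would need substituting
        if rtl_reading and unicodedata.bidirectional(ch) in NEUTRAL_CLASSES:
            return False  # neutrals need bidi resolution
    return True
```

The scan is a single pass with cheap tests, so text that qualifies pays very little for the complex-script machinery.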

35
Split by script and direction
  • Separate e.g. Devanagari, Tamil, Western
  • Left-to-right or right-to-left
  • Unicode bidirectional algorithm
  • Atomic item of display
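
A rough Python sketch of splitting text by script. It guesses the script from the first word of each character's Unicode name, which is a shortcut chosen for illustration and not how the system does it; direction is ignored here. The function names are hypothetical.

```python
import unicodedata

KNOWN_SCRIPTS = ("LATIN", "DEVANAGARI", "TAMIL", "ARABIC", "HEBREW", "THAI")

def rough_script(ch: str) -> str:
    """Guess a script from the first word of the Unicode character name."""
    first = unicodedata.name(ch, "UNKNOWN").split()[0]
    return first if first in KNOWN_SCRIPTS else "COMMON"

def split_by_script(text: str):
    """Return (script, substring) runs; COMMON characters join a neighbour."""
    runs = []  # each entry is [script, substring]
    for ch in text:
        script = rough_script(ch)
        if not runs:
            runs.append([script, ch])
        elif script == "COMMON" or script == runs[-1][0]:
            runs[-1][1] += ch          # same script, or space/punctuation
        elif runs[-1][0] == "COMMON":
            runs[-1][0] = script       # leading COMMON run adopts the script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])  # script changed: start a new run
    return [tuple(r) for r in runs]
```

Each returned run can then be handed to the handler for its script.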

36
Handler for each script
  • Script shaping and reordering
  • Devanagari - matra I reordered before consonant
    cluster
  • Tamil - vowel sign O surrounds consonant cluster
  • Urdu - Initial, medial, final, alone forms
  • Various font formats
  • Backward compatibility
  • Shaping - ligatures, contextual forms
  • Placement of marks
  • Script handlers understand scripts

37
Language Pack LPK.DLL
  • Apply NLS settings (preferred digits)
  • Plaintext edit control
  • Calls to Uniscribe string handling
  • LPK.DLL is OS <-> Uniscribe bridge
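
A minimal Python sketch of digit substitution: the stored text keeps ASCII digits and only the displayed form changes. Devanagari digits are used here purely as an example; which digits are preferred is a user setting, and the function name is hypothetical.

```python
# Map ASCII digits 0-9 to Devanagari digits U+0966..U+096F.
TO_DEVANAGARI_DIGITS = {ord("0") + d: 0x0966 + d for d in range(10)}

def substitute_digits(text: str) -> str:
    """Return the display form; the caller's stored text is left unchanged."""
    return text.translate(TO_DEVANAGARI_DIGITS)

stored = "Windows 2000"
displayed = substitute_digits(stored)
```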

38
(Diagram of layers, top to bottom: Application; USER and GDI; LPK.DLL; Uniscribe)
39
Summary
  • Callouts from GDI and USER
  • Performance issues
  • Split by script and direction
  • Script handlers
  • LPK.DLL

40
3. Details
  • Clusters
  • Caret placement and Mouse hits
  • Word breaking
  • Font metrics
  • Measuring text
  • Metafiles

41
Clusters
  • Indivisible - Indian, Thai, Vietnamese
  • Divisible - Arabic

42
Caret, mouse hits
  • For indivisible clusters
  • Arrow keys skip over clusters
  • Del deletes entire cluster
  • Backspace decomposes cluster one character at a
    time
  • Arrows and Mouse select whole clusters
  • Left click snaps to nearest boundary
  • For divisible clusters
  • Caret shows proportional position
  • Use system controls or query Uniscribe
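
The indivisible-cluster editing rules can be sketched in Python. The cluster rule used here (a base character plus following combining marks, with a virama joining the next consonant) is a simplification assumed for illustration; applications should use system controls or query Uniscribe as the slide says. All function names are hypothetical.

```python
import unicodedata

def cluster_boundaries(text: str):
    """Return start offsets of clusters, plus len(text) as the last boundary."""
    starts = []
    for i, ch in enumerate(text):
        is_mark = unicodedata.category(ch).startswith("M")
        # Canonical combining class 9 identifies a virama.
        after_virama = i > 0 and unicodedata.combining(text[i - 1]) == 9
        if i == 0 or not (is_mark or after_virama):
            starts.append(i)
    return starts + [len(text)]

def delete_forward(text: str, caret: int) -> str:
    """Del: remove the entire cluster that starts at the caret."""
    if caret >= len(text):
        return text
    end = next(b for b in cluster_boundaries(text) if b > caret)
    return text[:caret] + text[end:]

def backspace(text: str, caret: int):
    """Backspace: remove only the single code point before the caret."""
    if caret == 0:
        return text, 0
    return text[:caret - 1] + text[caret:], caret - 1
```

With KA + VIRAMA + SSA + VOWEL SIGN I followed by KA, the boundaries fall at 0, 4, and 5, so arrow keys stop only there, Del at offset 0 removes four code points, and Backspace at offset 4 removes just the vowel sign.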

43
Font metrics
  • Matching the body height

44
Font metrics
  • Matching the ascender

45
Font metrics
  • Matching the descender

46
Matching fonts
  • When CS text is predominant
  • Full CS line spacing
  • Increase Western height
  • When Western text is predominant
  • Compromise line spacing
  • Accept some clipping
  • System edit control
  • Line spacing from single font
  • Richedit, HTML control
  • Line spacing adjusted for multiple fonts

47
Measuring text
  • Adding characters can make text smaller

48
Metafiles
  • Device independent
  • Store Unicode - Enhanced metafile
  • Use ExtTextOut(W)
  • Windows adjusts widths for different playback
    fonts
  • Device dependent
  • Avoid
  • Stores glyphs
  • Requires identical font for playback

49
Summary
  • Caret placement and Mouse hits
  • Word breaking
  • Font metrics
  • Measuring text
  • Metafiles
  • Format with richedit, MSHTML

50
Resources
  • Uniscribe - next talk
  • OpenType - later today
  • Win32 SDK
  • Richedit
  • RTF
  • messages
  • Text object model
  • HTML control
  • HTML
  • Document object model

51
Questions?
52
Conclusions
  • Windows 2000 is multilingual
  • Included on every CD
  • Format with system controls
  • is much easier than writing your own
  • You can write your own formatting
  • Uniscribe provides all you need

53
Uniscribe: The Unicode Script Processor
David Brown
Development Lead
Microsoft Corporation
54
Agenda
  • Overview
  • Layers
  • Low level APIs
  • High level APIs

55
1. Overview
  • Uniscribe is a DLL
  • Client applications
  • Hides language details
  • Hides OS details

56
USP10.DLL
  • Platforms
  • Windows 2000
  • Windows NT 4
  • Windows 98
  • Windows 95 (excluding Far East)
  • Single worldwide binary
  • Installs with Windows 2000, IE5, Office 2000

57
Client applications
  • Windows 2000
  • Word 2000
  • Excel 2000
  • Access 2000
  • PowerPoint 2000
  • MSHTML (IE5)
  • Richedit 3
  • MS Agent
  • FrontPage Express
  • HTML/RTF converter

58
Hides language details
  • Syllable structure (Indian, Thai)
  • Contextual shaping (Arabic)
  • Caret placement
  • Wordbreak
  • National digits
  • Bidirectional layout (Arabic, Hebrew)

59
Hides Unicode OS details
  • APIs are Unicode on all platforms
  • Hides glyph codes
  • Hides font differences
  • Shaping tables
  • Fixed repertoire fonts

60
Summary
  • Cross platform Unicode display API

61
2. Layers
  • Win32 glyph support
  • OpenType
  • Shaping engines
  • Low level APIs
  • Formatted text, Full control, Less simple
  • High level APIs
  • Plaintext, Simple

62
Win32 API
  • TrueType fonts
  • Internally indexed by glyph
  • Glyph manipulation
  • ExtTextOut(ETO_GLYPHINDEX)
  • GetGlyphOutline(GGO_GLYPHINDEX)
  • Font table access
  • GetFontData

63
OpenType
  • Provides standard table structures
  • Contextual glyph substitution
  • Mark to base attachment
  • Defines instances for scripts
  • Examples
  • Initial form of Arabic letter
  • Half form of Devanagari consonant
  • Attachment position for Nukta

64
Shaping engines
  • Per script
  • Understand language rules
  • Understand font features
  • OpenType provides full control
  • Many older fixed layout fonts

65
Low level APIs
  • Low level item support for formatting apps
  • Break string by script and direction
  • Shaping
  • Caret and mouse
  • Word breaking, justification

66
High level APIs
  • Simple string support for LPK and plaintext apps
  • Features not in low level APIs
  • Font fallback
  • Tabstops
  • Bidi highlighting
  • Similar functionality to ExtTextOut, DrawText,
    TabbedTextOut

67
Summary
  • High level plaintext APIs
  • Low level formatting APIs
  • Shaping engines
  • OpenType
  • Win32 API

68
3. Low level APIs
  • Formatting text
  • Style runs
  • Measurement
  • Paragraph filling
  • Rendering

69
One run
70
Script and Direction Boundaries
  • ScriptItemize generates items
  • Each item has single script and direction
  • Implements the Unicode Bidi algorithm
  • Application must merge items into its own style
    runs
  • Runs are unique in
  • Font, Style
  • Script, Direction

71
Glyphs and Metrics
  • One run at a time
  • ScriptShape generates
  • glyphs,
  • glyph attributes
  • map of character to glyph buffer offsets
  • ScriptPlace generates
  • advance widths
  • combining character x,y offsets

72
Line Filling
  • Measure runs in logical order until the line
    overflows
  • ScriptBreak provides codepoint attribute
    information
  • Whitespace
  • Start of word for scripts such as Thai
  • Break the overflow run using these attributes
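
The line-filling steps above can be sketched in Python. Widths and break attributes are passed in as plain lists; in a real application they would come from measurement and from ScriptBreak. The function name, the units, and the emergency-break behaviour are assumptions.

```python
def fill_line(widths, can_break_before, max_width):
    """Return how many characters fit on the line.

    widths[i]           - advance width of character i (hypothetical units)
    can_break_before[i] - True if a new line may start at character i
    """
    used = 0
    for i, w in enumerate(widths):
        if used + w > max_width:
            # Overflow: back up to the latest permitted break at or before i.
            for j in range(i, 0, -1):
                if can_break_before[j]:
                    return j
            # No break opportunity: break mid-word, but always make progress.
            return max(i, 1)
        used += w
    return len(widths)

# "aa bb cc": every character is 10 units wide; words start at offsets 3 and 6.
widths = [10] * 8
breaks = [False, False, False, True, False, False, True, False]
fit = fill_line(widths, breaks, 50)
```

Here five characters fit in 50 units, but the line is cut back to offset 3 so that "bb" is not split.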

73
Word breaking
  • ScriptBreak
  • Thai, Khmer run words together
  • This is 5 words
  • Grammatical analysis
  • Dictionary
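
A toy Python sketch of dictionary-based word breaking for text written without spaces. Greedy longest-match is an assumption chosen for illustration, not a description of what ScriptBreak does internally, and the two-word dictionary is hypothetical.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation.

    Characters that start no dictionary word become one-character words.
    """
    words, pos = [], 0
    longest = max(map(len, dictionary), default=1)
    while pos < len(text):
        for size in range(min(longest, len(text) - pos), 0, -1):
            if text[pos:pos + size] in dictionary or size == 1:
                words.append(text[pos:pos + size])
                pos += size
                break
    return words

# Hypothetical dictionary: the Thai words for "language" and "Thai".
sample_dictionary = {"\u0e20\u0e32\u0e29\u0e32", "\u0e44\u0e17\u0e22"}
words = segment("\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22", sample_dictionary)
```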

74
Layout and Rendering
  • ScriptLayout for visual order
  • Embedding levels from ScriptItemize
  • Use for generic multilingual support
  • ScriptTextOut renders each run
  • Glyphs from ScriptShape
  • Positions from ScriptPlace
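
Turning embedding levels into visual order follows the reversal rule of the Unicode bidirectional algorithm: from the highest level down to the lowest odd level, reverse every contiguous sequence of runs at that level or higher. A Python sketch with a hypothetical function name, taking one level per run:

```python
def runs_in_visual_order(levels):
    """Return run indices in left-to-right visual order."""
    order = list(range(len(levels)))
    if not levels:
        return order
    highest = max(levels)
    lowest_odd = min((lv for lv in levels if lv % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])  # flip this sequence
                i = j
            else:
                i += 1
    return order
```

For levels [0, 1, 1, 0] the two middle runs swap; for [1, 2, 1] the whole line is mirrored while the level-2 run keeps its internal order.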

75
Caret Placement & Mouse Hits
  • ScriptXtoCP, ScriptCPtoX
  • CP - character position
  • X - horizontal coordinate
  • Which edge?
  • In bi-directional text the trailing edge of one
    character is not necessarily adjacent to the
    leading edge of the next character
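
The edge question can be made concrete with a Python sketch of a character-position-to-X calculation for a single run. It assumes one glyph per character and made-up advance widths; the function name is hypothetical and is not the ScriptCPtoX signature.

```python
def cp_to_x(widths, cp, trailing, rtl=False):
    """X offset of the leading or trailing edge of character cp in one run.

    widths are per-character advance widths in logical order.
    """
    before = sum(widths[:cp])  # advance of logically earlier characters
    edge = before + (widths[cp] if trailing else 0)
    # In a right-to-left run, logical progress moves leftward from the far end.
    return sum(widths) - edge if rtl else edge

# A left-to-right run of two characters followed by a right-to-left run of two.
ltr = [10, 10]
rtl = [10, 10]
rtl_origin = sum(ltr)  # the RTL run is drawn to the right of the LTR run

trailing_of_last_ltr = cp_to_x(ltr, 1, trailing=True)
leading_of_first_rtl = rtl_origin + cp_to_x(rtl, 0, trailing=False, rtl=True)
```

The trailing edge of the last left-to-right character is at 20, but the leading edge of the logically next character is at 40, so a caret between them has two candidate positions.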

76
Leading and Trailing edges
77
Font Fallback
  • The more scripts you support the more you need
    font fallback
  • ScriptShape returns HRESULT USP_E_SCRIPT_NOT_IN_FONT
    if you ask it to shape a run that the selected
    font cannot support
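
The fallback pattern can be sketched in Python as a loop over candidate fonts. Everything here is a stand-in: `shape` is a hypothetical function, the exception plays the role of the error result, and the font names and coverage sets are invented.

```python
class ScriptNotInFont(Exception):
    """Stands in for the script-not-in-font error result."""

def shape(run, font_name, coverage):
    """Hypothetical shaper: succeeds only if the font covers every character."""
    if not all(ch in coverage[font_name] for ch in run):
        raise ScriptNotInFont(font_name)
    return [(font_name, ch) for ch in run]  # placeholder "glyphs"

def shape_with_fallback(run, fonts, coverage):
    """Try each font in order; return (font, glyphs) for the first that works."""
    for font_name in fonts:
        try:
            return font_name, shape(run, font_name, coverage)
        except ScriptNotInFont:
            continue
    raise ScriptNotInFont("no font in the fallback chain covers this run")

# Invented coverage data for two hypothetical fonts.
coverage = {
    "WesternFont": set("abcdefghijklmnopqrstuvwxyz "),
    "DevanagariFont": {chr(cp) for cp in range(0x0900, 0x0980)},
}
font, glyphs = shape_with_fallback("\u0915\u093F", ["WesternFont", "DevanagariFont"], coverage)
```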

78
Summary
  • Script
  • Itemize
  • Shape, Place
  • Break, Layout
  • TextOut
  • CPtoX, XtoCP

79
4. High level APIs
  • Purpose
  • Analysis
  • Display
  • Font fallback

80
Purpose
  • For Windows 2000
  • ExtTextOut
  • DrawText
  • System edit control
  • Cross-platform Unicode plaintext display
  • Easier than low level APIs

81
Analyze
  • ScriptStringAnalyse
  • Itemizes, shapes, places etc.
  • Features
  • Variety of tabbing options
  • Clipping, justification
  • Font fallback
  • Control character representation
  • hotkey substitution
  • password entry
  • Returns handle to analysis

82
Querying the analysis
  • ScriptString
  • Size
  • pcOutChars
  • pLogAttr
  • GetOrder
  • CPtoX, XtoCP
  • GetLogicalWidths
  • Validate

83
Displaying and freeing
  • ScriptStringOut
  • Clipping rect like ExtTextOut
  • Selection highlighting
  • ScriptStringFree

84
Bidi highlighting
  • Arabic letters right-to-left
  • Arabic numbers left-to-right
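
The different directions of letters and numbers are visible in the Unicode bidirectional classes, which Python exposes through `unicodedata`. A short sketch (the sample characters are chosen for illustration):

```python
import unicodedata

samples = {
    "Arabic letter alef": "\u0627",
    "Arabic-Indic digit one": "\u0661",
    "European digit one": "1",
    "Latin letter a": "a",
}

# AL = Arabic letter (right-to-left), AN = Arabic number,
# EN = European number, L = left-to-right letter.
bidi_classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}
```

Because digits carry number classes, not a letter class, a number inside Arabic text keeps its digits in left-to-right order, which is why a logical selection can appear as separate highlighted pieces on screen.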

85
Font fallback
  • When font in HDC is missing
  • Codepoints
  • Fallback clusters with codepoints not present in
    the font
  • Scripts
  • Fallback when Item script not supported by the
    font
  • Finally to GDI for Far East font linking
  • Requires Microsoft Sans Serif

86
Summary
  • ScriptString
  • Analyse
  • query analysis ...
  • Out
  • Free

87
Demo
88
Resources
  • OpenType talk
  • Complex script sample CSSAMP
  • Win32 SDK
  • Microsoft Systems Journal
  • November 1998

89
Questions?
90
Conclusion
  • Unicode plaintext display APIs
  • Unicode formatted text support APIs
  • Cross-platform

91
Lunch!