IDN: Technology, Status, Overview, and Directions

1
IDN: Technology, Status, Overview, and Directions
  • John C KLENSIN, Ph.D.
  • APT Workshop on ENUM and IDN
  • August 2003

2
Internationalized Domain Names (IDN)
  • Term used in many ways
  • Strictly, domain name labels that represent names
    containing non-host name characters.
  • Only host name (or LDH) strings are actually
    entered into the DNS.
  • Sometimes, IDN is used to refer to a
    fully-qualified domain name that contains at
    least one non-LDH label.
  • Sometimes used to refer to other ways of
    internationalization or localization
  • Keywords,
  • Special searching or directory mechanisms, etc.

3
Internationalization and Users
  • Users typically do not want internationalization
    (or multilingual capability) but
  • Systems that are localized: adapted to their
    particular
  • Language
  • Writing system and character codes
  • Location
  • Interests
  • Internationalization is
  • A means to localization
  • Necessary given the global nature of the Internet

4
Internationalization and the Internet
  • Consideration given to international characters
    in the 1970s
  • Character set standards weren't ready
  • Project that led to MIME
  • multimedia email capability
  • initiated largely to standardize and permit
    non-ASCII characters
  • Web
  • Recognized requirement early
  • Details only for Western European languages until
    mid-90s
  • All were done by tagging
  • Tagging is consistent with localization approaches

5
DNS Internationalization
  • Tension between
  • Network-facing identifier
  • User-facing name (of a company, product,
    organization, ...)
  • Constraints on solutions
  • Short label strings: no reasonable way to tag
  • Uniqueness of names
  • Potential for confusion or fraud
  • Requirement for non-ASCII names is clear but
  • Caution is in order: many possible traps and
    risks
  • Hard to go back if too permissive

6
Why Look at This Now?
  • Many opportunities for confusion
  • Some national regulation may be in order.
  • Some real technical constraints
  • Assuming the DNS is like something else and
    proceeding on that basis can be problematic
  • Good tutorial in forthcoming US National Academy
    of Engineering/ National Research Council report
  • Much better to think through things now than to
    try to undo or redo later.
  • Easier and safer to adopt narrow rules and then
    expand as understanding grows than to try to
    restrict what was previously permitted.
  • Advice: Permit only what you fully understand and
    need.

7
IETF Encoding Standards or Local Variations
  • Utility and spread of the Internet has depended
    critically on
  • End to end connectivity
  • Internet hosts can reach each other or understand
    why not
  • DNS integrity
  • Any DNS reference means the same thing,
    worldwide
  • There is huge flexibility within the existing
    standards for per-zone (local/national) policies
    and decisions.
  • A country that adopts its own protocols or DNS
    string interpretations is likely to isolate its
    businesses and users from global connectivity and
    global markets.

8
The History of the LDH Name
  • Concerns in the 1970s about user confusion and
    transcription from non-computer forms
  • Eliminating, where possible, characters that
    could be confused when written
  • Hence
  • Case-insensitive
  • Prohibition of "_": could be confused with "-"
  • Prohibition of national use character positions
  • Resulted in "host name" rules: letters, digits,
    and hyphen
  • Host name rules are about
  • What can be registered in a zone
  • Applications restrictions
  • Ultimately not the DNS technology, which can
    store binary strings with few restrictions.
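
The letters-digits-hyphen rule above is easy to express in code. The following is a minimal Python sketch, not part of the original slides. The function names are hypothetical, and the 63-octet label limit and the ban on leading or trailing hyphens are taken from the usual host name conventions rather than from this slide.

```python
import re

# One LDH label: letters, digits, hyphen; 1 to 63 characters;
# no hyphen at the start or the end (assumed conventional host name rule).
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Return True if a single label uses only letters, digits, and hyphen."""
    return _LDH_LABEL.fullmatch(label) is not None


def is_ldh_name(name: str) -> bool:
    """Return True if every label of a dotted name is an LDH label."""
    labels = name.rstrip(".").split(".")
    return all(is_ldh_label(label) for label in labels)


if __name__ == "__main__":
    for candidate in ["example", "EXAMPLE-1", "my_host", "-abc"]:
        print(candidate, is_ldh_label(candidate))
```

The check applies to what may be registered and what applications accept; as the slide notes, the DNS itself would store many other octet strings.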

9
Why Internationalize Domain Names?
  • Important concern about people using their own
    languages and characters
  • Use of domain names in interfaces by end users,
    not just as system/network identifiers.

10
Representing Unicode/ISO 10646
  • No tagging equals no national character sets
  • Unlike applications (such as the web), no room in
    DNS for character set tagging, so a
    comprehensive, universal character set (UCS)
    is a requirement
  • More characters, mixing scripts
  • Many opportunities for problems from look-alikes
    that were not present in ASCII alone
  • Ambiguities about
  • Scripts
  • Case-matching
  • Unification

11
Applications and International Characters
  • Most Internet application protocols defined for
    ASCII, or at least seven-bit characters
  • Often not an accident or ignorance: consider use
    of IA4 and IA5 in many ITU Recommendations
  • Waiting for applications to be upgraded could
  • Be a long wait
  • Involve some unpredictability with sender not
    knowing receiver capabilities
  • Plug-ins and patches do not yield a consistent
    user experience

12
The IETF IDNA Standard
  • Internationalizing Domain Names in Applications
  • Some mappings within Unicode
  • Normalization of different ways to represent some
    characters
  • Mapping of some similar or identical characters
  • Some case-mappings
  • Some forbidden characters/ code points
  • But many issues not addressed
  • Encoding Unicode characters into LDH form for the
    DNS
  • The "xn--" string
  • Applications and character representations.
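
As an illustration of the normalization, case-mapping, and encoding steps listed above, the sketch below uses the "idna" codec from Python's standard library, which implements the 2003 IDNA RFCs. The helper name to_ace and the sample names are assumptions made for this example, not material from the slides.

```python
import unicodedata


def to_ace(name: str) -> str:
    """Convert a Unicode domain name to its LDH ("xn--") form.

    Relies on Python's built-in "idna" codec, which applies the
    mapping and normalization steps before the Punycode encoding.
    """
    return name.encode("idna").decode("ascii")


composed = "b\u00fccher.example"     # u-umlaut as a single code point
decomposed = "bu\u0308cher.example"  # "u" followed by a combining diaeresis
uppercase = "B\u00dcCHER.example"    # same label in upper case

# Two different code point sequences for the same visible text.
print(composed == decomposed)
print(unicodedata.normalize("NFKC", decomposed) == composed)

# All three inputs map to the same LDH string: xn--bcher-kva.example
for name in (composed, decomposed, uppercase):
    print(to_ace(name))
```

Only the LDH string is sent to the DNS; the Unicode form exists in the application and its user interface.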

13
Current Status: IETF
  • IDNA complete and awaiting more implementation
    and user experience
  • General recognition that additional registration
    restrictions are needed but
  • IETF is not going to specify it
  • Unlike LDH, seen as a per-zone problem
  • EPP Registrar-Registry protocol
  • Can accommodate internationalized names
  • Some registry decisions about extensions:
    registrars may not be able to use the same
    techniques with different registries
  • Preliminary efforts underway (no working groups
    yet) on
  • Fully-internationalized URIs (IRIs)
  • Email addresses

14
Technology Developments
  • Several browser plug-ins
  • No known implementations in widely-available
    general-purpose browsers or other applications
    yet.
  • DNS diagnostic tools (nslookup, dig, etc.) not yet
    upgraded to permit entry/display of Unicode
    strings.
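
Where a tool shows only the encoded form, the conversion back to Unicode can be done separately. This is a small sketch using the "idna" codec in Python's standard library; the helper name to_display is hypothetical and no particular diagnostic tool is assumed.

```python
def to_display(ace_name: str) -> str:
    """Convert an LDH ("xn--") domain name back to its Unicode form."""
    return ace_name.encode("ascii").decode("idna")


if __name__ == "__main__":
    # An encoded name as it might appear in raw DNS tool output.
    print(to_display("xn--bcher-kva.example"))
```

Labels without the "xn--" prefix pass through unchanged, so the helper can be applied to any name a tool prints.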

15
Current Status: ICANN
  • Prohibition/ recommendation against labels
    starting with two characters and two hyphens if
    they are not IDNA strings.
  • Recommendation established just before Montreal
    meeting
  • Specifies language-based registry restrictions
    but no details
  • Agreed to by CJK registries
  • Some resistance from some gTLD registries
  • Growing feeling that it will need some revision,
    but no plan about how and when to do one.
  • Continuing uncertainty about gTLDs
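
The restriction on labels that start with two characters and two hyphens can be checked mechanically at registration time. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the check looks only at the "xn--" prefix, without verifying that the rest of the label decodes as a valid IDNA string.

```python
def violates_hyphen_rule(label: str) -> bool:
    """Return True for a label with hyphens in positions 3 and 4
    that does not carry the IDNA "xn--" prefix."""
    lowered = label.lower()
    has_reserved_shape = len(lowered) >= 4 and lowered[2:4] == "--"
    return has_reserved_shape and not lowered.startswith("xn--")


if __name__ == "__main__":
    # Sample labels are made up for illustration.
    for label in ["ab--cd", "xn--bcher-kva", "example", "a--b"]:
        print(label, violates_hyphen_rule(label))
```

A registry applying the rule strictly would probably also reject "xn--" labels that fail to decode; that step is left out here.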

16
The Meaning of Language
  • JET, ICANN, etc., use the term "language" to
    describe tables and rules.
  • Not the normal usage
  • Really Zone-Language-Script
  • No one really knows what the limits of a
    language are, although governments can make
    decisions within their territories.
  • Scripts actually overlap in strange ways.
    Neither Unicode Consortium nor ISO have been able
    to define scripts associated with particular
    languages
  • E.g., for some zones in Western Europe the
    appropriate language-script is generic
    European, i.e., Latin-1. For others, more
    specific lists of characters may be needed.

17
Look-alike Character Confusion
  • Much focus so far on CJK
  • Characters based on Chinese Han writing
  • Making differently-encoded or different-appearing
    characters match
  • Alphabetic language problem may turn out to be
    harder
  • Common origins ?
  • Avoiding having similar-looking, but distinct,
    characters confused.
  • Not new: 1 and l, 0 and O
  • USA
  • pectopan
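
One rough way to see the look-alike problem is to list which scripts the characters of a label come from. The sketch below is a heuristic only: it takes the first word of each character's Unicode name as its script, which is an assumption made for brevity rather than a proper script property lookup. The function name and sample strings are illustrative and not taken from the slides.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts used in a label.

    Heuristic: the first word of the Unicode character name
    (for example "LATIN", "CYRILLIC", "GREEK"). Digits and
    hyphens are skipped because they are shared across scripts.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


if __name__ == "__main__":
    latin = "example"
    mixed = "ex\u0430mple"  # third letter is CYRILLIC SMALL LETTER A
    print(latin == mixed)   # the strings look alike but differ
    print(scripts_in(latin))
    print(scripts_in(mixed))
```

A label that reports more than one script is not necessarily deceptive, but it is a candidate for closer review under a zone's registration rules.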

18
JET Guidelines and Their Extensions
  • Per-zone, per-language restrictions on
    registration
  • The idea of a variant character and IDN Package
  • Mixed-script labels are
  • Particularly good opportunities for deception
  • Sometimes useful
  • Well-defined (now) for CJK, but
  • Alphabets may be harder
  • Particularly difficult issues with
    Roman-Greek-Cyrillic (pecopan, EAH,)
  • Labels as words in a language
  • Not a traditional approach: excludes fanciful
    labels
  • Dictionary lookup is an approach, but may cause
    other problems.
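
The variant-character idea can be pictured as a per-zone table that expands one requested label into a set of labels handled together. The sketch below is a deliberately simplified model and not the procedure from the JET Guidelines themselves: the table contents, the names VARIANTS and idn_package, and the rule "every combination belongs to the package" are all assumptions for illustration.

```python
from itertools import product

# Hypothetical variant table for one zone: each character maps to the
# characters the zone treats as equivalent to it.
VARIANTS = {
    "\u56fd": {"\u56fd", "\u570b"},  # simplified and traditional forms
    "\u570b": {"\u56fd", "\u570b"},
}


def idn_package(label: str, table=VARIANTS) -> set:
    """Return the requested label plus every variant spelling of it."""
    choices = [sorted(table.get(ch, {ch})) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    requested = "\u4e2d\u56fd"
    for member in sorted(idn_package(requested)):
        print(member, member.encode("idna").decode("ascii"))
```

In this model the registry reserves the whole package for one registrant, which is where the questions about the cost of reserved and activated variants arise.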

19
Major Issues
  • Multilingual strings
  • Labels and names
  • Variant charging in JET-like models
  • Cost of a reserved label
  • Cost of activation given that the label has no
    value to anyone else
  • DNS as an administrative hierarchy
  • New types of conflict/ dispute problems

20
Technical Interoperability
  • IDNA is entirely a client algorithm and
    procedure, hence depends on correct client
    implementations.
  • Plug-ins may help, but only with specific
    applications.
  • Open source development effort being put
    together.
  • JET Guidelines and similar approaches are
    registry-dependent
  • Do not raise interoperability issues.
  • May raise user experience ones

21
Administrative Hierarchy Issues
  • Policy and trust relationships
  • No cross-tree cross-references to branches of
    hierarchy
  • Organizational branding
  • http://www.product.tld/ or
  • http://www.organization.tld/product

22
New Dispute and Resolution Issues
  • ICANN-WIPO UDRP assumes
  • Homogeneous scripts and language characters
  • Conflicts about rights to identical names
  • but not
  • Labels constructed from line or box-drawing
    characters
  • Look-alike characters and strings from different
    scripts
  • Translations, transcriptions, transcodings
  • Is the relevant name the IDNA encoding or its
    display/presentation form?

23
Problems IDNs Don't Solve
  • Registration policy issues
  • "This language is more important"
  • The gTLD problem
  • Applications and local character sets
  • Even JET Guidelines won't eliminate confusion
  • DNS is a poor search mechanism and getting
    worse.

24
The Whois Policy Issues
  • Registration in non-ASCII and data in ???
  • Searching of a multilingual/ multiscript database
  • Reading the records
  • Information about variants and IDN Package
    contents

25
Economics
  • Domain Name Market has collapsed.
  • Original success projections and the ICANN
    Seven
  • Notions of a profitable monopoly over
    multilingual TLDs do not seem to be going
    anywhere.

26
Competition and Policy
  • Policy tradeoff between
  • More flexibility of registrations
  • Less risk of conflicts, deception, or fraud
  • Each domain or zone will need to develop its own
    policy, and there will probably be wide
    variations.
  • So-called ML.ML introduces complex questions of
    allocations
  • essentially independent TLDs.
  • ICANN policy so far: apply separately, no
    rights to added domains
  • Implications of a country deciding to go its own
    way with, e.g., local character codings.

27
The Path Forward with IDNs
  • Implementation on a per-zone and per-application
    basis
  • Development of new dispute resolution policies
  • Discovery of new interface, confusion, and
    user-level interoperability problems
  • What do you do with a domain name in a script you
    can't read or write?
  • The two hundred-sided business card
  • Local character codings and Unicode mapping
  • Running out of those names too
  • DNS name guessing is not a good search/location
    procedure in a growing network.

28
Where Will We End Up?
  • Increasing use of search engines??
  • Clear trend, but
  • As the Internet gets larger, general-purpose free
    text ones may have already peaked
  • Increasing use of interest-specific portals??
  • DNS labels, IDNA strings, or label translation
  • The idea of unique keywords
  • More separation of searching or locating and
    retrieval??
  • Stable-reference URIs ?
  • Information retrieval experience and bookmarks
  • Deliberately-populated directories??

29
Conclusions
  • IDN deployment is starting and will succeed, but
  • Registries, application developers, and users
    have a lot to learn
  • Except in special cases, early user experiences
    may not be wonderful.
  • An Internet that is
  • Optimized to local language and culture
  • Globally accessible and useful
  • may not be easily attained
  • IDNs are, at best, a useful tool in effective
    localization and use in user languages and
    scripts.

30
Balancing Localization, Internationalization, and
User Experience
  • Probably requires going beyond the DNS
  • May require an Internet presentation layer
  • Rethinking, not just patching
  • Ways to find information
  • Ways to remember what was found and accessing it
    again
  • Thinking about things librarians have known for
    centuries
  • New ideas about user interfaces
  • Translation of a good French-language-oriented
    interface into Chinese or Arabic may not produce
    a good Chinese or Arabic interface.

31
Selected Further Readings
  • Role of the DNS: RFC 3467
  • IDNA: RFCs 3490, 3491, 3492, 3454
  • Unicode evolution and stability
  • draft-faltstrom-unicode-synchronization-00.txt
    (forthcoming)
  • JET Guidelines for CJK, applications to other
    scripts
  • draft-jseng-idn-admin-04.pdf
  • draft-xdlee-idn-cdnadmin-00.txt
  • draft-klensin-reg-guidelines-00.txt
  • draft-hoffman-idn-reg-00.txt
  • Tradeoffs between labels and translations
  • draft-klensin-idn-tld-00.txt
  • Issues with domain names in unexpected forms
  • draft-klensin-name-filters-02.txt
  • Alternate searching and retrieval models
  • draft-klensin-dns-search-05.txt
  • draft-mealing-sls-02.txt
  • ICANN Policy Statements and Recommendations
  • IDN deployment statement: http://www.icann.org/announcements/announcement-20jun03.htm
  • More generally: http://www.icann.org/document-name

32
Finding these documents
  • RFCs
  • ftp://ftp.rfc-editor.org/in-notes/rfcNNNN.txt
    (NNNN is the RFC number)
  • Internet-Drafts (draft-xxxx)
  • http://www.ietf.org/internet-drafts/document-name.
  • Note that these documents are transient and that
    the two-digit number is a version number. If the
    version cited is not found, try a higher number
    or the search engine at http://www.ietf.org/ID
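
The two lookup conventions above can be written down as small helpers. This is only a sketch with hypothetical function names: it builds the RFC location from the pattern given on this slide and steps an Internet-Draft file name to its next two-digit version, without checking that either document exists.

```python
def rfc_url(number: int) -> str:
    """Build the RFC location using the pattern given on this slide."""
    return f"ftp://ftp.rfc-editor.org/in-notes/rfc{number}.txt"


def next_draft_version(draft: str) -> str:
    """Return the same draft file name with its version raised by one."""
    stem, extension = draft.rsplit(".", 1)
    base, version = stem.rsplit("-", 1)
    return f"{base}-{int(version) + 1:02d}.{extension}"


if __name__ == "__main__":
    print(rfc_url(3490))
    print(next_draft_version("draft-klensin-dns-search-05.txt"))
```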