Beyond HTML: Extensible Markup Language - PowerPoint PPT Presentation


Title: Beyond HTML: Extensible Markup Language


1
Beyond HTML: Extensible Markup Language
  • Timothy W. Cole
  • Grainger Engineering Library Information
    Center, University of Illinois at Urbana-Champaign
  • American Association of Law Libraries, 19 July
    2000
  • t-cole3_at_uiuc.edu
  • http://dli.grainger.uiuc.edu/Publications/TWCole/AALL_2000/

2
Ordered Hierarchy of Content Objects: A Definition
of Text in Computer Terms
  • Premise: A Text is the Sum of its Components
  • So a <BOOK> Could Be Defined as
    Containing: <FRONT_MATTER>, <CHAPTER>s,
    <BACK_MATTER>
  • <FRONT_MATTER> Could Contain: <BOOK_TITLE>,
    <AUTHOR>s, <PUBLISHER>
  • While Each <CHAPTER> Could Contain:
    <CHAPTER_TITLE>, <SECTION>s
  • And Each <SECTION> Could Contain: <SECTION_TITLE>,
    <PARAGRAPH>s
  • Components Chosen Reflect Anticipated Use
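
The hierarchy described on this slide can be sketched in a few lines of Python using the standard-library ElementTree module. The element names come from the slide; the text values ("A Sample Book", "First Author", and so on) and the `outline` helper are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# Build the BOOK hierarchy named on the slide. All text values are placeholders.
book = ET.Element("BOOK")

front = ET.SubElement(book, "FRONT_MATTER")
ET.SubElement(front, "BOOK_TITLE").text = "A Sample Book"
ET.SubElement(front, "AUTHOR").text = "First Author"
ET.SubElement(front, "AUTHOR").text = "Second Author"
ET.SubElement(front, "PUBLISHER").text = "Sample Press"

chapter = ET.SubElement(book, "CHAPTER")
ET.SubElement(chapter, "CHAPTER_TITLE").text = "Chapter One"
section = ET.SubElement(chapter, "SECTION")
ET.SubElement(section, "SECTION_TITLE").text = "Section 1.1"
ET.SubElement(section, "PARAGRAPH").text = "First paragraph."
ET.SubElement(section, "PARAGRAPH").text = "Second paragraph."

ET.SubElement(book, "BACK_MATTER")


def outline(element, depth=0):
    """Return the element names as an indented outline, one line per element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


book_outline = outline(book)
print("\n".join(book_outline))
```

Every object sits inside exactly one parent and siblings keep their order, which is what "ordered hierarchy" means here.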

3
Ordered Hierarchy of Content Objects (continued)
  • OHCO is a Useful, Albeit Imperfect Model
  • More Powerful Than Model of Text as a Stream of
    Characters & Formatting Instructions
  • Does Not Allow for Overlapping Content Objects
  • OHCO Model is Inherent in XML, HTML
  • XML Designed for Descriptive Content Objects, Not
    Presentational Content Objects
  • XML Syntax is Fixed, But Semantics is Extensible

4
XML Basics: Markup & Content
  • Consider:
    <?xml version='1.0' ?>
    <!-- This is an Example -->
    <author sequence='first'><LName> Col&egrave; </LName>,<FName> Tim </FName> </author>
  • Would Display As: Colè, Tim
  • This example illustrates:
  • XML Processing Instructions
  • XML Comments (Ignored by XML Applications)
  • XML Element Markup, Including an Attribute
  • XML Content, Including an Entity
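
As a rough check of the example above, the sketch below parses it with Python's standard-library ElementTree. One assumption is added that is not on the slide: a DOCTYPE line declaring the `egrave` entity, because a standalone XML parser does not know HTML's named entities. Whitespace inside the name elements is trimmed before display.

```python
import xml.etree.ElementTree as ET

# The slide's example, plus an assumed internal DTD subset that declares
# the egrave entity (not shown on the slide).
DOC = """<?xml version='1.0' ?>
<!DOCTYPE author [ <!ENTITY egrave "&#232;"> ]>
<!-- This is an Example -->
<author sequence='first'><LName> Col&egrave; </LName>,<FName> Tim </FName> </author>"""

author = ET.fromstring(DOC)

sequence = author.get("sequence")              # the attribute value
last_name = author.find("LName").text.strip()  # entity already expanded
first_name = author.find("FName").text.strip()

display = f"{last_name}, {first_name}"
print(sequence, display)
```

The comment never reaches the application: the parser discards it, and only the element, its attribute, and the expanded content remain.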

5
XML Basics (continued)
  • Well-Formed XML Rules:
  • XML Element Markup is Case-Sensitive
  • All XML Tags Must Be Closed
  • Hierarchical Nesting: No Overlapping Elements
  • All XML Attribute Values Must Be Quoted
  • Enforces Stricter Syntax than HTML
  • Facilitates Fast, Efficient Parsing
  • Extensible Semantics Provide Flexibility
  • Well-Formed XML is More Lightweight Than SGML
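
The four well-formedness rules can be exercised with any XML parser. The sketch below uses Python's standard-library ElementTree; the `is_well_formed` helper and the sample fragments are made up for illustration, one fragment per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One deliberately broken fragment per rule, plus a corrected version.
samples = {
    "case mismatch":      "<a><b></B></a>",
    "unclosed tag":       "<a><br></a>",
    "overlapping":        "<a><b><c></b></c></a>",
    "unquoted attribute": "<a x=1><b/></a>",
    "well-formed":        '<a x="1"><b><c/></b><br/></a>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

An HTML browser of the period would have tolerated most of these fragments; an XML parser must reject all but the last.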

6
Is It Valid Or Well-Formed? When Does It Matter?
  • All Web Browsers Need Is Well-Formed
  • XML Authoring Tools Need To Validate
  • Otherwise Tower of Babel Ensues
  • Indexing Agents & Schema-Specific Rendering
    Agents May Need To Validate
  • Illustrations:
  • Malformed XML
  • Well-Formed But Invalid XML
  • Valid XML
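
The three cases listed above can be told apart programmatically. Python's standard library has no validating parser, so the sketch below substitutes a toy check: `CONTENT_MODEL` is a hypothetical stand-in for a DTD that lists the exact child sequence each element may have. It is not real DTD validation, only an illustration of the distinction.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD:
# element name -> exact sequence of child element names allowed.
CONTENT_MODEL = {
    "author": ["LName", "FName"],
    "LName": [],
    "FName": [],
}


def classify(text):
    """Classify text as 'malformed', 'well-formed but invalid', or 'valid'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "malformed"
    for element in root.iter():
        allowed = CONTENT_MODEL.get(element.tag)
        children = [child.tag for child in element]
        if allowed is None or children != allowed:
            return "well-formed but invalid"
    return "valid"


malformed = "<author><LName>Cole</LName><FName>Tim</author>"
invalid = "<author><FName>Tim</FName><LName>Cole</LName></author>"
valid = "<author><LName>Cole</LName><FName>Tim</FName></author>"

for sample in (malformed, invalid, valid):
    print(classify(sample))
```

The second sample parses without error, yet its children are in the wrong order for the model; only a tool that knows the schema can catch that.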

7
Library Uses of XML: Using XML for Primary Sources
  • Facilitates Searching
  • Full-Text Searching & Field-Specific Searching
  • More Meaningful Proximity Searching
  • Better Retrieval / Browsing
  • Selective Views / Suppression of Personal Data
  • Re-Ordered Piecemeal Views
  • Illustration -- Illinois Agronomy Handbook
  • Search
  • Browsing
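
The difference between full-text and field-specific searching can be sketched as follows. The sample document, its element names (`handbook`, `chapter`, `title`, `para`), and both search helpers are invented for illustration; they are not taken from the Illinois Agronomy Handbook markup.

```python
import xml.etree.ElementTree as ET

# Invented sample content; element names are assumptions.
SAMPLE = """<handbook>
  <chapter>
    <title>Corn</title>
    <para>Planting dates affect soybean rotation.</para>
  </chapter>
  <chapter>
    <title>Soybeans</title>
    <para>Row spacing and seeding rate.</para>
  </chapter>
</handbook>"""

root = ET.fromstring(SAMPLE)


def full_text_search(root, term):
    """Titles of chapters whose text contains the term anywhere."""
    term = term.lower()
    return [ch.findtext("title") for ch in root.iter("chapter")
            if term in "".join(ch.itertext()).lower()]


def field_search(root, field, term):
    """Titles of chapters where the term occurs inside the named field."""
    term = term.lower()
    return [ch.findtext("title") for ch in root.iter("chapter")
            if any(term in (el.text or "").lower() for el in ch.iter(field))]


everywhere = full_text_search(root, "soybean")
titles_only = field_search(root, "title", "soybean")
print(everywhere, titles_only)
```

Because the markup says which text is a title, the second query can ignore passing mentions in body paragraphs.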

8
Library Uses of XML: XML for Metadata Wrapping
  • Facilitates Interchange, Normalization, ...
  • Simpler than Fixed Fields, Record Headers, Etc.
  • XML Implementations of Metadata Standards, e.g.
    RDF, EAD, DC, FGDC, US-MARC
  • Easier Routing / Handling of Specialized Content
  • In Combination with Primary Source XML
  • Automatic Extraction of Metadata From Source
  • Facilitates Authority Control

9
Library Uses of XML: XML for Document Management
  • Smarter Documents
  • XML Namespaces -- Integrating Multiple XML
    Schemas (Including XHTML)
  • Rights Management, Technical Requirements, ...
  • Facilitates Enhanced Linking Between Docs.
  • Creation of Links From Marked Up Content
  • Easy to Add or Modify Links Over Time
  • XLink & XPointer Promise More Robust Linking
  • Metadata File from Illinois DLIB Testbed
  • Schema Integrates RDF, DC, Project Design

10
Components of XML Implementations: DTDs & XML
Schemas
  • Use Either to:
  • Define Content Models
  • Declare Attributes & Entities
  • DTDs Inherited from SGML
  • DTDs Themselves Not Well-Formed XML
  • Limits on Detail of Content Model Definitions
  • Minimal Data Typing
  • XML Schemas Are Well-Formed XML
  • Data Typing & Better Content Models Supported
  • Not Yet in Widespread Use

11
Components of XML Implementations: Encoding &
Entities (Using Characters Not on Your Keyboard)
  • Computers Use 1s and 0s, but Characters form the
    Basis of Human-Readable Texts
  • Coded Character Sets (CCS) Assign Integer Values
    to Characters -- ASCII, ISO 8859, Unicode
  • Character Encoding Schemes (CES) Map Those
    Integers to Bytes -- 7-bit, 8-bit, UTF-8
  • Bytes Are Then Rendered as Glyphs by Your
    Computer, Using Font Appropriate to CCS/CES
  • Font Unavailable Or CCS/CES Misunderstood Results
    in Incorrect Character(s) on Screen

12
Components of XML Implementations: Encoding &
Entities (continued)
  • Common Ways to Deal With This Problem:
  • Select CCS/CES Appropriate to Language
  • Use Default CCS/CES, but Override Default Font
  • Use XML/HTML Named or Numeric Entity
  • HTML Understands Non-Extensible Set of Named
    Entities
  • XML Understands Numeric Entities Corresponding to
    Unicode CCS; Apart From the Five Predefined
    Entities (lt, gt, amp, apos, quot), All Named
    Entities Must Be Declared in DTD
  • Use Unicode for CCS, UTF-8 for CES - XML Defaults
  • An Illustration in HTML

13
Components of XML Implementations: Presentation -
CSS Style Sheets
  • XML Content Objects Have No Style
  • Use Cascading Style Sheets (CSS): Work Like CSS
    for HTML, Except:
  • Must Be Explicit About Everything
  • No Special Treatment of Class & ID Attributes
  • Attach CSS to XML Using Special XML PI
  • CSS Does Define Formatting
  • CSS DOES NOT Reorganize or Add Content
  • Simple XML-CSS Example & The CSS Used

14
Components of XML Implementations: Transformations
- XSLT Style Sheets
  • Some Characteristics of XSLT Style Sheets:
  • XSLT Files Are Well-Formed XML
  • XSLT Transforms to Another Schema, Or to XHTML
  • XSLT Objects Have Implicit Functionality
  • Attach XSLT To Document Using XML PI
  • XSLT Can Reorganize & Add Content
  • Still Need CSS for Presentation -- CSS Style
    Sheets Work on the Output of XSLT Processing
  • Supplement XSLT With Script To Manipulate &
    Modify Actual Content
  • Simple XSLT Example & The XSLT Style Sheet

15
The State-of-the-Art in XML Tools
  • XML Authoring:
  • Add-Ons to Established Word Processors,
    e.g. WordPerfect 9 / WordPerfect 2000
  • Tools With SGML Roots, e.g. ArborText's Epic
    (was Adept) Editor, SoftQuad's XMetaL Editor
  • New XML Tools, e.g. Vervet Logic's XMLPro,
    Extensibility's XML Authority / XML Turbo
  • So Far, There Are Fewer Authoring Tools
    Customized for Specialized XML Schemas

16
The State-of-the-Art in XML Tools (continued)
  • XML Presentation Tools:
  • Latest Releases of Netscape Navigator/Mozilla,
    and Microsoft's Internet Explorer Support XML --
    But Support is Generic, Partial, Uneven
  • Plug-Ins, Standalones Available / In Work for
    Advanced XML Schemas (CML, MML, VML, ...)
  • XML Database Integration Tools:
  • Add-Ons to Established DBMS Available / In
    Work: Microsoft SQL Server-XML Technology Preview
  • Illustration With Query, CSS, XML Source File
  • XML Query Language Specification In Work

17
Developing XML Applications: The Politics of XML
  • Evolution of XML:
  • XML Formalized as W3C Recommendation 2/98
  • Numerous Ancillary Specs Released & In
    Work: Namespaces, XSLT, XLink/XPointer, XML
    Signature
  • Numerous Early Implementors (Chemistry, Biology,
    Multimedia, Metadata)
  • Prerequisites for Community Implementations:
  • Identify Target(s) of Opportunity
  • Define Horizontal & Vertical Content Objects
  • Consensus Building & Community Buy-In
  • Test Implementations & Tool Building

18
Developing XML Applications: The Politics of XML
(continued)
  • Status of XML In Legal Community:
  • LegalXML Has Identified Targets & Begun Process of
    Defining Content Objects & Building Consensus
  • Progress in Some Areas, e.g. Court Filing (see
    also XML Court Interface)
  • Less Visible Progress in Other Workgroups,
    e.g. Reference, Public Law, Users
  • Presence (& Vested Interests) of Extensive
    Non-XML Legal Automation Systems In Place Lessens
    Motivation

19
Developing XML Applications: The Politics of XML
(continued)
  • Status of XML In Publishing & Libraries:
  • Extensive XML Work in Metadata: Unfortunately Has
    Led to Competing Stds.
  • Many Publishers Have Been Using SGML for a Decade
    or More -- But Only Internally
  • Perceived Tradeoff (probably overrated): Publicly
    Releasing Primary Sources in XML vs. Control of
    Product Marketplace
  • Problems with Early SGML Web Experiments
  • No One Wants to be First, But No One Wants to be
    Last Either

20
Future Directions
  • Continued Evolution of Standards, Tools
  • Continued Development of Community
    Implementations -- Selected Disciplines
  • Increased Use of XML Behind the Scenes
  • Carryover from SGML Trends
  • Integration of XML with Databases
  • XML Unlikely to Replace HTML, Other Document
    Formats, But Will Co-Exist
  • Magnitude of Role in Law Libraries Uncertain, but
    Likely to Have At Least Some Role