XML Tutorial - PowerPoint PPT Presentation

Slides: 46
Provided by: bobgl

1
XML Tutorial
2
Outline
  • Today's Web: created by hand, for eyes only
  • Can HTML become smarter?
  • SGML -> XML
  • The next-generation Web: XML and component-based
    commerce
  • Prologue: XML and EDI

3
A Web Created by Hand for Eyes
  • Much of the web is hand-crafted
  • HTML often exploited and extended to achieve
    specific layout and formatting
  • HTML has too low an Information IQ to enable
    many desirable applications

4
The Limits of Hand-crafting
Time to convert a word-processing document and apply HTML markup

  Number of pages   1 minute/page   10 minutes/page   60 minutes/page
  10                10 minutes      100 minutes       10 hours
  100               100 minutes     16.67 hours       12.5 days
  1,000             16.67 hours     20.83 days        4.17 months
  10,000            20.83 days      6.94 months       3.47 years
  100,000           6.94 months     5.79 years        34.72 years

  (A day is an 8-hour working day, a month is 30 working days, and a year
  is 12 months.)

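The table's arithmetic can be cross-checked by recomputing each cell as pages × minutes per page. In the minimal Python sketch below, the unit sizes are inferred from the table's own figures and are not stated on the slide: 8-hour working days, 30-day months, and 12-month years.

```python
# Unit sizes in minutes. These are assumptions inferred from the table:
# 8-hour working days, 30 working days per month, 12 months per year.
UNIT_MINUTES = {
    "minutes": 1,
    "hours": 60,
    "days": 60 * 8,
    "months": 60 * 8 * 30,
    "years": 60 * 8 * 30 * 12,
}

def markup_time(pages: int, minutes_per_page: float, unit: str) -> float:
    """Total hand-markup time for `pages` pages, in `unit`, rounded to 2 places."""
    total_minutes = pages * minutes_per_page
    return round(total_minutes / UNIT_MINUTES[unit], 2)

if __name__ == "__main__":
    # A few cells from the table.
    print(markup_time(100, 60, "days"))        # 12.5
    print(markup_time(1_000, 10, "days"))      # 20.83
    print(markup_time(100_000, 60, "years"))   # 34.72
```

Expressing the cost as a function makes the slide's point explicit. Time grows linearly with page count, so only automation, not faster typing, changes the picture.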
5
Low vs. High IQ Encoding
  • What information can be encoded?
  • How adaptable or flexible is the format for
    encoding style, structure, or markup?
  • Can the format tell you what it encodes?
  • ASCII is very low IQ: only character information
  • SGML is highest IQ: it encodes anything and
    completely specifies the encoding rules
  • PDF? HTML?

6
HTML is too low in IQ
  • HTML was designed as a simple markup language
      • simple structures: headings, lists, links
      • strong emphasis on formatting
      • weak for encoding content
  • HTML wasn't designed to encode the structure and
    semantics needed for complex applications

7
Web Applications That Need Smarter Data
  • Data interchange between Web clients
  • Moving processing from server to client
  • Multiple client-side views w/o new data
  • Information push from personalized applications

8
Can HTML be made smarter?
  • Create new tags used by your application, or use
    <META>, DIV, and CLASS (and hope they don't
    interfere elsewhere)
  • Use a standard metadata model (but which one?
    Dublin Core, PICS, P3, OPS, etc.)
  • Hide applet code in comments (platform
    dependent?)
  • Hack, hack, hack...

9
Inherent Limitations of HTML
  • Not extensible
  • Limited capability to encode structure
  • No validation
  • Lossy interchange

10
XML
  • Extensible Markup Language - a standard way of
    creating markup languages for the Web
  • a file format for data representation
  • a schema for describing data or message
    structures
  • a mechanism for extending and annotating HTML
    with semantic information
  • XML is a simplification of SGML, the Standard
    Generalized Markup Language
  • easier to understand and implement

11
HTML Apartment Listing
    <HTML>
    <HEAD>
      <TITLE>An Apartment For Rent</TITLE>
    </HEAD>
    <BODY>
      <H1>Apartment</H1>
      <P>1800 square feet, 3 bedrooms, 7 baths.
      <H2>No pets, smoking forbidden!</H2>
      <H3>Amenities</H3>
      <P>Sunny location, good view, has air-conditioner.
      <H3>Location</H3>
      <P>2008 South E. Avenue, Eureka, CA
      <H3>Cost, Etc.</H3>
      <P>Price $3600 a month
      <P>Contact (415) 123-4567
      <P>Available immediately
      <P>This offer posted 1 August 1997 in the Eureka Daily Times
    </BODY>
    </HTML>

12
An XML Apartment Listing
    <?xml version="1.0"?>
    <!DOCTYPE LISTING SYSTEM "APTLISTING.DTD">
    <LISTING>
      <ADINFO>
        <POSTED>March 26, 1997</POSTED>
        <WHERE_POSTED>Belmont Courier</WHERE_POSTED>
        <CONTACT>(650) 111-2222</CONTACT>
      </ADINFO>
      <DESCRIPTION>
        <AREA>1400 SQUARE FEET</AREA>
        <AMENITIES>1 bedroom, 1 bathroom</AMENITIES>
        <COMMENT>Small cottage in a big forest</COMMENT>
      </DESCRIPTION>
      <POLICIES>
        <PETS>Not allowed</PETS>
        <BOZOS>Not allowed</BOZOS>
      </POLICIES>
      <COST>875</COST>
    </LISTING>
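The HTML version puts every fact in the same kind of paragraph. The XML version gives each fact its own named element, so a program can pull out a field by name instead of scraping formatted text. Here is a minimal sketch with Python's standard `xml.etree.ElementTree`. The DOCTYPE line is left out because ElementTree parses XML but does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# The apartment listing from the slide, without its DOCTYPE line.
LISTING_XML = """<?xml version="1.0"?>
<LISTING>
  <ADINFO>
    <POSTED>March 26, 1997</POSTED>
    <WHERE_POSTED>Belmont Courier</WHERE_POSTED>
    <CONTACT>(650) 111-2222</CONTACT>
  </ADINFO>
  <DESCRIPTION>
    <AREA>1400 SQUARE FEET</AREA>
    <AMENITIES>1 bedroom, 1 bathroom</AMENITIES>
    <COMMENT>Small cottage in a big forest</COMMENT>
  </DESCRIPTION>
  <POLICIES>
    <PETS>Not allowed</PETS>
    <BOZOS>Not allowed</BOZOS>
  </POLICIES>
  <COST>875</COST>
</LISTING>"""

root = ET.fromstring(LISTING_XML)

# Each fact is addressed by its element path, not by its position in the text.
cost = int(root.findtext("COST"))
pets = root.findtext("POLICIES/PETS")
posted_in = root.findtext("ADINFO/WHERE_POSTED")
print(cost, pets, posted_in)
```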

13
But First: One-Minute SGML
  • Standard Generalized Markup Language, ISO 8879
  • SGML defines the markup language that specifies
    the logical rules for a given type of document
  • Markup transforms a flat stream of text into a
    set of objects or elements that can be
    manipulated by other applications
  • Since there is no universal tag set that can
    describe all documents, SGML provides the means
    for defining the tag set that meets your needs

14
SGML's Big Idea: Document Types
  • The idea of a document type is easy to understand
  • The Document Type Definition, or DTD, defines
      • the class of documents that shares a common
        information model
      • the permissible elements and attributes, their
        contents, and the order in which they occur
  • The DTD is the document schema that makes an
    instance self-describing
  • From a DTD, a parser can be generated to test any
    document for conformance
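Testing a document for conformance means checking each element's children against the content model declared for it. Python's standard library cannot read real DTDs. The sketch below therefore uses a hypothetical `CONTENT_MODELS` table, loosely modeled on declarations such as `<!ELEMENT POLICIES (PETS, BOZOS)>`. It supports only fixed sequences. Real DTDs also allow choices (`|`) and the occurrence indicators `?`, `*`, and `+`.

```python
import xml.etree.ElementTree as ET

# Hypothetical content models (not a real DTD): each element maps to the exact
# sequence of child elements it must contain. Elements not listed here must
# contain no child elements at all.
CONTENT_MODELS = {
    "LISTING": ["ADINFO", "DESCRIPTION", "POLICIES", "COST"],
    "ADINFO": ["POSTED", "WHERE_POSTED", "CONTACT"],
    "DESCRIPTION": ["AREA", "AMENITIES", "COMMENT"],
    "POLICIES": ["PETS", "BOZOS"],
}

def conformance_errors(elem: ET.Element, models: dict = CONTENT_MODELS) -> list:
    """Walk the tree and report every element whose children break its model."""
    errors = []
    expected = models.get(elem.tag, [])
    actual = [child.tag for child in elem]
    if actual != expected:
        errors.append(f"{elem.tag}: expected {expected}, found {actual}")
    for child in elem:
        errors.extend(conformance_errors(child, models))
    return errors

good = ET.fromstring("<POLICIES><PETS>Not allowed</PETS><BOZOS>Not allowed</BOZOS></POLICIES>")
bad = ET.fromstring("<POLICIES><BOZOS>Not allowed</BOZOS><PETS>Not allowed</PETS></POLICIES>")
print(conformance_errors(good))  # []
print(conformance_errors(bad))   # reports the out-of-order children of POLICIES
```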

15
Examples of Document Types
  • User manuals
  • Reference manuals
  • Directories
  • Newsletters
  • Brochures
  • Catalogs
  • Datasheets
  • Proposals
  • Dictionaries
  • Technical reports
  • Contracts
  • Regulations
  • Policies and procedures
  • Journal Articles
  • Textbooks
  • Purchase Orders
  • Invoices
  • Recipes

16
HTML as a Document Type
  • HTML can be described as an application of SGML -
    the HTML document type
  • Simple structures: headings, lists, links
  • Strong emphasis on formatting, weak for encoding
    content
  • Not designed to encode the content distinctions
    for any particular industry or application
  • But most HTML doesn't conform to the HTML DTD

17
Designing a DTD
  • Determine information requirements, purposes,
    uses (and their priorities)
      • deliver in one or more print and online formats
      • create new information products
      • interchange with other authors or publishers
      • integrate information into equipment
      • meet company, industry, customer standards

18
Designing a DTD
  • Determine process, tool, external constraints or
    standards
  • Identify and name information components and
    component containers
  • Create categories to organize the components
  • Determine when, where, how often components appear

19
Designing a DTD
  • Identify meta-information to augment the
    information components
      • bibliographic information
      • process and workflow-related information
  • Describe the component hierarchy in a graphic
    notation to visualize it
  • Transcribe the graphic notation into formal
    syntax
  • Test the analysis on sample documents
  • Document the process and the results
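The step "transcribe the graphic notation into formal syntax" can be partly mechanized once the component hierarchy is written down. The Python sketch below uses a hypothetical recipe hierarchy, since recipes are one of the document types listed earlier. It emits DTD `<!ELEMENT>` declarations: containers get a sequence content model and leaves get `(#PCDATA)`. Attributes, choices, and mixed content are deliberately left out.

```python
# Hypothetical component hierarchy for a recipe document type, as it might be
# drawn in a tree diagram during analysis. Children are listed in order; a
# trailing "+" (one or more), "*" (zero or more) or "?" (optional) is a DTD
# occurrence indicator.
RECIPE_HIERARCHY = {
    "RECIPE": ["TITLE", "INGREDIENTS", "STEPS"],
    "INGREDIENTS": ["ITEM+"],
    "STEPS": ["STEP+"],
}

def to_dtd(hierarchy: dict) -> str:
    """Transcribe the hierarchy into <!ELEMENT> declarations: containers get a
    sequence content model and every leaf component gets (#PCDATA)."""
    lines, leaves = [], []
    for name, children in hierarchy.items():
        lines.append(f"<!ELEMENT {name} ({', '.join(children)})>")
        for child in children:
            base = child.rstrip("+*?")  # strip the occurrence indicator
            if base not in hierarchy and base not in leaves:
                leaves.append(base)
    lines.extend(f"<!ELEMENT {leaf} (#PCDATA)>" for leaf in leaves)
    return "\n".join(lines)

print(to_dtd(RECIPE_HIERARCHY))
```

Testing the result on sample documents, the next step on the slide, still needs a validating parser.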

20
SGML: Close, but No Cigar
  • SGML has been successful in niches, but hasn't
    been adopted by rank-and-file Web publishers
      • the quiet revolution
      • the million-dollar secret
  • Perceived as too complex (because of features
    dating from its keystroke-minimizing origins)
  • Small vendors didn't have the clout to legitimize
    SGML in the mass market (but some of them
    cleverly dumbed down their tools for HTML)

21
XML: Right Place, Right Time
  • Looks like HTML, but acts like SGML
  • Backed by
      • World Wide Web Consortium (W3C)
      • Sun - give Java something to do
      • Microsoft - with great enthusiasm
      • Netscape - with less enthusiasm
      • SGML tool vendors and consultants
      • Innovators in the EDI community

22
Specific XML Proposals to Simplify SGML
  • All elements have start and end tags
  • All attributes are name="value"
  • Changed syntax for EMPTY elements
      • <toc> -> <toc/>
      • <graphic file="x.gif"> -> <graphic file="x.gif"/>
  • No & connector in content models
  • No inclusions and exclusions
  • DTD not necessary, because the structure can be inferred if the
    instance is well-formed
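Because end tags, empty-element tags, and quoted attributes are mandatory, a parser can check that a document is well-formed without reading any DTD. A minimal Python sketch using the standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if `text` parses as XML. No DTD is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# XML empty-element syntax parses without any DTD...
print(is_well_formed('<doc><toc/><graphic file="x.gif"/></doc>'))  # True
# ...but SGML-style empty tags, which need a DTD to declare them EMPTY, do not.
print(is_well_formed('<doc><toc><graphic file="x.gif"></doc>'))    # False
# Unquoted attribute values are rejected as well.
print(is_well_formed('<graphic file=x.gif/>'))                      # False
```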

23
XML Adoption Scenarios
  • The transition from the Web for eyes to the
    automated Web
      • 1st generation: XML leaves HTML alone
      • 2nd generation: HTML as an output format created
        from an XML instance
      • 3rd generation: XML repositories

24
1st Generation XML
  • No disruption of existing HTML production
    processes
  • XML production process may have nothing to do
    with HTML production process
  • XML for processes, HTML for eyes, but XML and
    HTML can be linked together

25
1st Generation XML Leaves HTML as Is
[Diagram, from creation to delivery: a data source goes through conversion
to XML (producing XML) and, separately, conversion to HTML (producing HTML
for eyes).]
26
2nd Generation XML
  • Creation of XML is primary process
  • Replace hand-crafted HTML with automated
    down-translation
  • Alternatively, use an XML style sheet to create
    HTML-like presentation(s)
      • instance-at-a-time retargeting
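Down-translation means that the HTML is generated from the XML instance, not written by hand. Below is a minimal Python sketch for a `LISTING` document with the element names used on the apartment slide. The HTML layout is an arbitrary choice for illustration. Changing it means editing one function, not every page. A style sheet could play the same role declaratively.

```python
import xml.etree.ElementTree as ET
from html import escape

def listing_to_html(listing_xml: str) -> str:
    """Down-translate one apartment LISTING instance into a simple HTML page."""
    root = ET.fromstring(listing_xml)

    def field(path: str) -> str:
        # Missing elements become empty strings; text is escaped for HTML.
        return escape(root.findtext(path, default=""))

    return "\n".join([
        "<html><head><title>Apartment for rent</title></head><body>",
        f"<h1>{field('DESCRIPTION/COMMENT')}</h1>",
        "<ul>",
        f"<li>Area: {field('DESCRIPTION/AREA')}</li>",
        f"<li>Rooms: {field('DESCRIPTION/AMENITIES')}</li>",
        f"<li>Pets: {field('POLICIES/PETS')}</li>",
        f"<li>Cost: {field('COST')}</li>",
        "</ul>",
        f"<p>Contact {field('ADINFO/CONTACT')}</p>",
        "</body></html>",
    ])

SAMPLE = """<LISTING>
  <ADINFO><CONTACT>(650) 111-2222</CONTACT></ADINFO>
  <DESCRIPTION>
    <AREA>1400 SQUARE FEET</AREA>
    <AMENITIES>1 bedroom, 1 bathroom</AMENITIES>
    <COMMENT>Small cottage in a big forest</COMMENT>
  </DESCRIPTION>
  <POLICIES><PETS>Not allowed</PETS></POLICIES>
  <COST>875</COST>
</LISTING>"""

print(listing_to_html(SAMPLE))
```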

27
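Up and Down Translation
[Diagram: a scale of text representations, from most to least structured:
content/structure-based text objects (SGML, XML, databases); formatted
electronic text (HTML, word-processing files); unstructured electronic text
(ASCII); printed text. Moving up takes more structure ("energy"); lower
levels are easier to translate to.]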
28
2nd Generation XML Restores Order
[Diagram: a data source goes through conversion to XML, producing an XML
source; down-translation from it produces HTML and HDML, and XML style
sheet(s) produce HTML-like presentations.]
29
HTML as an Output Format
  • Treating HTML as an output format generated from
    an SGML source repository insulates you from
    ongoing changes to HTML and the latest
    proprietary extensions
  • HTML created by down-translation can be richer
    in structure and more consistent than HTML
    created by hand at many times the cost

30
3rd Generation XML
  • reuse, not just retargeting
  • XML a first-class citizen from the start
  • content-oriented DTD
  • native authoring, or enhanced markup by editorial
    or production staff
  • no longer file-at-a-time: create a database and
    work on it
  • support for custom applications

31
3rd Generation XML Repository
[Diagram: Inputs 1-4 pass through up-translation or decomposition into an
XML repository; down-translation or assembly then produces Outputs 1-4.]
32
Retargeting and Reuse Requirements
  • different delivery channels
  • Web
  • CD-ROM, CD-ROM Web hybrids
  • Braille, large print, voice synthesis (ICADD)
  • different dialects of HTML for different
    browsers or bandwidths or as HTML changes
  • different applications (slice and dice)
  • reference manual vs help vs tutorial
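Reuse ("slice and dice") means storing each component once and assembling different products from it. The Python sketch below follows the repository picture: decomposition indexes components by `id`, and assembly builds a product from a chosen selection. The `<manual>`/`<topic>` source and its `kind` attribute are hypothetical, and the plain-text output stands in for any delivery format.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: topics are written once, tagged by kind.
SOURCE = """<manual>
  <topic id="install" kind="task"><title>Installing</title><body>Run the installer.</body></topic>
  <topic id="config" kind="reference"><title>Configuration options</title><body>All settings are optional.</body></topic>
  <topic id="first-run" kind="task"><title>First run</title><body>Start the program.</body></topic>
</manual>"""

def decompose(xml_text: str) -> dict:
    """Decomposition: index every topic component by its id."""
    root = ET.fromstring(xml_text)
    return {topic.get("id"): topic for topic in root.iter("topic")}

def assemble(repo: dict, ids: list, heading: str) -> str:
    """Assembly: build one plain-text product from the chosen components."""
    lines = [heading.upper()]
    for topic_id in ids:
        topic = repo[topic_id]
        lines.append(f"- {topic.findtext('title')}: {topic.findtext('body')}")
    return "\n".join(lines)

repo = decompose(SOURCE)
tutorial = assemble(repo, ["install", "first-run"], "Tutorial")
reference = assemble(
    repo, [tid for tid, t in repo.items() if t.get("kind") == "reference"], "Reference"
)
print(tutorial)
print(reference)
```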

33
XML for the Web's Little Languages
  • CDF -- channel definition format, eliminates the
    need for a proprietary push plug-in
  • OSD -- open software description, for
    describing configurations for automated
    distribution of software
  • PICS -- for content ratings
  • RDF -- resource description framework, merging
    Netscape and Microsoft metadata initiatives
  • CBL -- common business language in eCo framework

34
The Next-Generation Web
  PROBLEMS                   SOLUTIONS
  The Web is eyeballs-only   Metadata and object APIs: a self-describing,
                             smart Web
  No content encoding        Web catalogs and documents in their native
                             schema
  Things can't be found      Distributed registries and structure-based
                             retrieval
  No automation of tasks     Agent-based run-time environment
35
Infrastructure Requirements
  • A means of transforming legacy Internet services
    into components
      • Today's services are accessed through browsers or
        ad hoc APIs
  • An extensible semantic framework for component
    integration
      • Heterogeneity and lack of standards
  • A scalable, distributed indexing structure and
    registry services for components
      • Things can't be found systematically
  • An agent-based execution environment
      • No run-time integration or automation of tasks
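A registry addresses "things can't be found systematically" by indexing components under the services they declare. Clients then search by structure, not by browsing. The Python sketch below is a toy, in-memory stand-in with a hypothetical `Registry` class. A real registry would be distributed, persistent, and fed from component descriptions such as CTDs. The second provider name is invented for illustration.

```python
from collections import defaultdict

class Registry:
    """Toy in-memory registry (hypothetical): maps service names to providers."""

    def __init__(self):
        self._by_service = defaultdict(list)

    def register(self, provider: str, services: list) -> None:
        """Record that `provider` offers each of `services` (no duplicates)."""
        for service in services:
            if provider not in self._by_service[service]:
                self._by_service[service].append(provider)

    def find(self, service: str) -> list:
        """Return the providers registered for `service`, in registration order."""
        return list(self._by_service.get(service, []))

registry = Registry()
registry.register("United Airlines",
                  ["Passenger_Flight_Information", "Cargo_Flight_Information"])
registry.register("Hypothetical Freight Co.", ["Cargo_Flight_Information"])
print(registry.find("Cargo_Flight_Information"))
```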

36
The Internet Today
[Diagram: databases, an FTP server, applications, and three Web servers with
documents.]
37
A Commerce Type Definition (CTD)
    <!DOCTYPE Taxonomy PUBLIC "-//CommerceNet//DTD Taxonomy V1.0//EN">
    <Taxonomy>
      <Head>
        <Label>United Airlines</Label>
        <Version>1.0</Version>
        <Base>World Airline Registry: 1.12.3.7</Base>
        <Registry>toe.commerce.net:2111</Registry>
      </Head>
      <Body>
        <Services>
          <Passenger_Flight_Information>
            <Flight_Number>UA 200</Flight_Number>
            <Flight_Price_US>168.50</Flight_Price_US>
            <Flight_Dest>Honolulu, Hawaii</Flight_Dest>
          </Passenger_Flight_Information>
          <Cargo_Flight_Information>
          </Cargo_Flight_Information>
        </Services>
      </Body>
    </Taxonomy>
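Because a CTD is ordinary XML, any program can read which services a provider offers and what their fields contain. The Python sketch below assumes the price element is named `Flight_Price_US`, since XML element names cannot contain spaces. It also leaves out the DOCTYPE line, which ElementTree would not use for validation anyway.

```python
import xml.etree.ElementTree as ET

# The body of the United Airlines CTD (DOCTYPE line omitted).
CTD = """<Taxonomy>
  <Head>
    <Label>United Airlines</Label>
    <Version>1.0</Version>
  </Head>
  <Body>
    <Services>
      <Passenger_Flight_Information>
        <Flight_Number>UA 200</Flight_Number>
        <Flight_Price_US>168.50</Flight_Price_US>
        <Flight_Dest>Honolulu, Hawaii</Flight_Dest>
      </Passenger_Flight_Information>
      <Cargo_Flight_Information>
      </Cargo_Flight_Information>
    </Services>
  </Body>
</Taxonomy>"""

def offered_services(ctd_text: str) -> list:
    """List the service types a CTD advertises: the children of Body/Services."""
    root = ET.fromstring(ctd_text)
    return [service.tag for service in root.find("Body/Services")]

def passenger_fare(ctd_text: str) -> float:
    """Read the passenger flight price as a number."""
    root = ET.fromstring(ctd_text)
    return float(root.findtext("Body/Services/Passenger_Flight_Information/Flight_Price_US"))

print(offered_services(CTD))
print(passenger_fare(CTD))
```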

38
Step 1: XML Metadata
[Diagram: the same databases, FTP server, applications, and Web servers, each
now described by one or more CTDs.]
39
Step 2: Registries
[Diagram: the CTD-described services from Step 1, now also listed in
registries.]
40
Common Business Language (CBL)
  • Who am I?
      • Company name, contact, public key certificates
  • What am I?
      • Agent/object (API), document (DTD), database
        (schema)
  • Available data
      • Product list, price list, terms and conditions,
        catalog, order form
  • Available services
      • Buy, sell, RFQ, search catalog
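The four CBL questions can be answered in a single machine-readable self-description. The Python sketch below builds one with ElementTree. The element names (`Component`, `WhoAmI`, `AvailableServices`, and so on) and the sample company are hypothetical, not the actual CBL vocabulary. Public key certificates are left out.

```python
import xml.etree.ElementTree as ET

def describe_component(name, contact, kind, data, services):
    """Build a hypothetical self-description answering the four CBL questions."""
    root = ET.Element("Component")
    who = ET.SubElement(root, "WhoAmI")
    ET.SubElement(who, "CompanyName").text = name
    ET.SubElement(who, "Contact").text = contact
    ET.SubElement(root, "WhatAmI").text = kind
    available = ET.SubElement(root, "AvailableData")
    for item in data:
        ET.SubElement(available, "Data").text = item
    offered = ET.SubElement(root, "AvailableServices")
    for service in services:
        ET.SubElement(offered, "Service").text = service
    return root

component = describe_component(
    "Example Supply Co.", "sales@example.com", "document (DTD)",
    ["price list", "catalog"], ["RFQ", "search catalog"],
)
print(ET.tostring(component, encoding="unicode"))
```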

41
Step 3: CBL Components
[Diagram: the services, CTDs, and registries from Step 2, as CBL components.]
42
Step 4: Agents
[Diagram: the same services, CTDs, and registries, with agents added.]
43
Step 5: Business Services
[Diagram: the same services, CTDs, registries, and agents, joined by
matchmaking services and trust intermediaries.]
44
Wrapping Up
  • HTML will continue to exist, but most serious
    publishers will produce HTML and XML versions of
    their content from the same smarter source
  • XML unifies document and database perspectives
    and tools for Web publishing and lets them be
    automated in the same way

45
Prologue: XML and EDI
  • XML appeals to the EDI community because
      • it reinforces the move to Internet EDI
      • it suggests a way to make transaction sets easier
        to define and self-describing
  • But which kind of XML/EDI?
      • an incremental strategy of wrapping existing EDI
        transactions in XML syntax
      • a radical rethinking of EDI to create XML
        fragments for transaction components that are
        dynamically combined as needed
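The incremental strategy keeps the existing EDI data and only changes its syntax. The Python sketch below assumes X12-style text, with `*` separating data elements and `~` ending segments. It wraps each segment in a hypothetical `<Segment>` element and each data element in an `<E>` that records its position. The sample segments are illustrative and loosely modeled on purchase-order segments, not a validated transaction set.

```python
import xml.etree.ElementTree as ET

def wrap_segments(edi: str, element_sep: str = "*", segment_term: str = "~") -> ET.Element:
    """Wrap EDI text in XML without reinterpreting it: one <Segment> per
    segment, one <E pos="n"> per data element (element names hypothetical)."""
    root = ET.Element("Transaction")
    for raw in edi.split(segment_term):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the final terminator
        seg_id, *values = raw.split(element_sep)
        seg = ET.SubElement(root, "Segment", id=seg_id)
        for position, value in enumerate(values, start=1):
            ET.SubElement(seg, "E", pos=str(position)).text = value
    return root

sample = "BEG*00*SA*PO12345**19970801~PO1*1*10*EA*9.95~"
tree = wrap_segments(sample)
print(ET.tostring(tree, encoding="unicode"))
```

This wrapping is self-describing only in a shallow sense: the element names still say "segment 3, position 2" rather than "order number". The radical alternative on the slide would give transaction components meaningful names of their own.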

46
Learning More
  • The mother of all information about XML is the
    SGML Home Page - www.sil.org/sgml/xml.html
  • Best overall book for managers to get started
    with SGML and XML is ABCD...SGML by Liora Alschuler
  • Best overall book for HTML-savvy types is SGML on
    the Web by Yuri Rubinsky and Murray Maloney