XML - PowerPoint PPT Presentation

About This Presentation
Title:

XML

Slides: 48
Provided by: higg2
Transcript and Presenter's Notes

Title: XML


1
XML
  • An introduction

2
xml
  • XML, like HTML, is derived from the Standard
    Generalized Markup Language (SGML).

3
A brief introduction to XML: a simple XML document
  • <?xml version = "1.0"?>
  • <!-- a simple xml example: this is a comment -->
  • <mymessage>
  • <message>Welcome to XML!</message>
  • </mymessage>
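As a minimal sketch, the document on this slide can be loaded with Python's standard-library ElementTree module. The choice of Python and of this parser is an assumption made for illustration; the slides themselves use a validator tool.

```python
import xml.etree.ElementTree as ET

# The document from this slide, with the angle brackets written out.
SIMPLE_DOC = """<?xml version="1.0"?>
<!-- a simple xml example: this is a comment -->
<mymessage>
   <message>Welcome to XML!</message>
</mymessage>"""

root = ET.fromstring(SIMPLE_DOC)   # the root element, mymessage
message = root.find("message")     # the root's only child element

print(root.tag)       # mymessage
print(message.text)   # Welcome to XML!
```

The comment and the XML declaration are part of the prolog, so they do not show up as children of the root.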

4
In validator (file is in examples\ch05\intro.xml)
5
XML documents and format
  • An XML document contains data, not formatting
    information. As we'll learn, there are ways (XSL
    and FO files, for example) to provide formatting
    for XML, analogous to the way CSS provides
    formatting for HTML.

6
XML
  • XML documents are typically stored in a file with
    the suffix .xml, though this is not required.
    They can be created with any editor (save as
    plain text). Many packages, like MS Word, can
    save files as type .xml.
  • An XML document contains a single root which
    contains other elements. Anything appearing
    before the root is called the prolog. Elements
    directly under the root are its children. The
    structure is recursive.
  • In the example, the root's child message contains
    the text Welcome to XML!

7
The character set
  • XML characters are carriage return (CR), line
    feed (LF) and Unicode characters.
  • An XML document consists of markup and character
    data.
  • Markup is enclosed in angle brackets, < >, as in
    HTML.
  • Character data appears between the start and end
    tags.
  • An XML parser passes whitespace characters to the
    application. Insignificant whitespace can be
    collapsed in a process called normalization.
  • It is a good idea to add whitespace to an XML
    document for readability.
  • &, <, >, ' and " are reserved characters. An
    entity reference makes it possible to use these
    as characters in the character data part of an
    XML document.
  • Entity references begin with an ampersand (&) and
    end with a semicolon (;), for example &lt; for <.
  • In this way character data is not confused with
    markup.
  • Single and double quotes are used to delimit
    attribute values.
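A small sketch of how reserved characters are turned into entity references and back. It assumes Python's standard library (xml.sax.saxutils and ElementTree); the sample string is made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "if a < b && c > d"

# escape() replaces &, < and > with the entity references
# &amp;, &lt; and &gt; so the text is not mistaken for markup.
escaped = escape(raw)
print(escaped)   # if a &lt; b &amp;&amp; c &gt; d

# The parser turns the entity references back into characters.
element = ET.fromstring("<message>" + escaped + "</message>")
print(element.text == raw)   # True

# Quotes only need escaping inside attribute values; the extra
# mapping asks escape() to handle the double quote as well.
attr_value = escape('say "hi"', {'"': "&quot;"})
print(attr_value)   # say &quot;hi&quot;
```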

8
More on syntax
  • There must be exactly one root.
  • Proper nesting of elements is required.
  • Start tags require close tags.
  • Unlike HTML, the author can define her own tags
    in XML.
  • Tags are case sensitive
  • Parser needs to distinguish markup from character
    data
  • Typically, whitespace is normalized, that is,
    reduced to a single whitespace character.
  • Entity references are marked with an ampersand
    and allow us to use metacharacters (<, > and
    so on) that are part of the language syntax.
  • Entity references (for example, &lt;) allow us
    to represent and distinguish the reserved
    characters <, > and & in XML.
  • In character data, these characters may appear
    only as entity references.
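The syntax rules on this slide can be checked mechanically. The sketch below assumes Python's ElementTree as the parser and uses made-up sample strings; a non-validating parser like this one reports only whether a document is well-formed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule from the slide.
SAMPLES = {
    "properly nested": "<a><b>text</b></a>",
    "author-defined tags": "<anything><goes/></anything>",
    "improper nesting": "<a><b>text</a></b>",
    "missing close tag": "<a>text",
    "two roots": "<a/><b/>",
    "case mismatch": "<a>text</A>",
    "bare ampersand": "<a>x & y</a>",
}

results = {name: is_well_formed(text) for name, text in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the first two samples are accepted; each of the others breaks one of the rules listed above.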

9
XML intro continued
  • A DOM-based parser returns a tree structure. A
    DOM parser must process the entire document to
    create a (Java) object which may be three to four
    times the size of the original. Not advisable if
    there are storage size constraints.
  • A SAX (Simple API for XML)-based parser returns
    events. SAX parsers have a smaller footprint.
  • Many parsers can be downloaded for free and
    several come with Java 1.4.
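To make the tree-versus-events distinction concrete, here is a sketch that reads the same small document both ways. It assumes Python's standard xml.dom.minidom and xml.sax modules rather than the Java parsers the slides mention; the EventLogger class is a hypothetical helper.

```python
import xml.sax
from xml.dom import minidom

DOC = "<mymessage><message>Welcome to XML!</message></mymessage>"

# DOM: the whole document is parsed into an in-memory tree first,
# and the application then walks that tree.
dom = minidom.parseString(DOC)
message_node = dom.documentElement.getElementsByTagName("message")[0]
dom_text = message_node.firstChild.data

# SAX: the parser calls back into the application as it reads,
# and no tree is kept.
class EventLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

logger = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), logger)

# Text may arrive in more than one characters() call, so join it.
sax_text = "".join(value for kind, value in logger.events if kind == "text")

print(dom_text)
print(logger.events)
```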

10
A brief introduction to XML
  • An xml validator parses an XML document and
    indicates if it is correct.
  • A number of free Validators are available,
    including one from MS which I downloaded and used
    in this ppt.

11
Validator
  • Microsoft provides a validating program free for
    download (with JavaScript and VBScript versions)
    at
  • http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/samples/internet/xml/xml_validator/default.asp
  • Or search MSDN for "validator"
  • There are others out there:
  • http://validator.w3.org/
  • http://www.stg.brown.edu/service/xmlvalid/
  • http://www.w3schools.com/XML/xml_validator.asp

12
Link to validator program on my w drive
  • http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm
  • This is a link for the JavaScript validator
  • http://employees.oneonta.edu/higgindm/internet%20programming/validate_vbs.htm
  • This is a link for the VBScript validator

13
MS Validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm
14
Parser continued
  • The parser will indicate if the document is
    well-formed.
  • In DOM-based parsing, a plus sign (+) in the left
    margin indicates a node has children and a minus
    sign (-) indicates all child nodes have been
    expanded.
  • The MS Validator uses color coding to indicate
    child nodes can be expanded.
  • An element that stores other elements is called a
    container element.
  • The parser makes the document content available
    for further processing if it is well-formed.

15
Validator example
16
Validator
17
Reserved characters
  • <message>&lt;&gt;&amp;</message> would enable a
    character data message to contain the characters
    <>&

18
DTD: document type definition
  • A DTD file may contain the definition of an XML
    structure.
  • XML files may refer back to a dtd.
  • If an XML document has a DTD or Schema, a
    validating parser can determine not merely if it
    is well-formed XML, but whether it is valid.
  • Valid means conforming to a dtd or schema.

19
Another example: Unicode
  • lang.xml (next slide) uses Unicode character
    references to represent Arabic words.
  • lang.dtd (also shown in a later slide) defines
    entities that expand to Unicode (Arabic)
    characters for some entity references in the XML
    file.
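A short sketch of how numeric references become Unicode characters. It assumes Python's ElementTree; the element is the first line of text inside from in lang.xml, taken out of the surrounding document for illustration.

```python
import xml.etree.ElementTree as ET

# Each &#NNNN; is a decimal Unicode code point; the parser replaces
# it with the corresponding character.
element = ET.fromstring(
    "<from>&#1583;&#1575;&#1610;&#1578;&#1614;&#1604;</from>"
)

code_points = [ord(ch) for ch in element.text]
print(code_points)   # [1583, 1575, 1610, 1578, 1614, 1604]
print(len(element.text))   # 6
```

Printing the code points rather than the text itself keeps the output independent of the console's font and encoding.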

20
DTD: document type definition. A DTD file may contain the definition of an XML structure.
  • <?xml version = "1.0"?>
  • <!-- Fig. 5.4: lang.xml -->
  • <!-- Demonstrating Unicode -->
  • <!DOCTYPE welcome SYSTEM "lang.dtd">
  • <welcome>
  • <from>
  • <!-- Deitel and Associates -->
  • &#1583;&#1575;&#1610;&#1578;&#1614;&#1604;
  • &#1571;&#1606;&#1583;
  • <!-- entity -->
  • &assoc;
  • </from>
  • <subject>
  • <!-- Welcome to the world of Unicode -->
  • &#1571;&#1607;&#1604;&#1575;&#1611;
  • &#1576;&#1603;&#1605;
  • &#1601;&#1610;&#1616;
  • &#1593;&#1575;&#1604;&#1605;
  • <!-- entity -->

21
Lang.dtd
  • <!-- lang.dtd -->
  • <!ELEMENT welcome ( from, subject )>
  • <!ELEMENT from ( #PCDATA )>
  • <!ELEMENT subject ( #PCDATA )>
  • <!ENTITY assoc "&#1571;&#1587;&#1617;&#1608;&#1588;&#1616;&#1610;&#1614;&#1578;&#1618;&#1587;">
  • <!ENTITY text "&#1575;&#1604;&#1610;&#1608;&#1606;&#1610;&#1603;&#1608;&#1583;">

22
Lang.xml in validator
23
Lang.xml in IE
24
About the example
  • The DTD reference contains DOCTYPE, the name of
    the root, the SYSTEM flag indicating the DTD file
    is external, and the name of that file.
  • Root element welcome contains two elements, from
    and subject.
  • Some lines contain entity references for unicode.
  • The DTD also defines some other entity references.
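The sketch below shows an entity declared in a DTD being expanded by the parser. It makes two assumptions that differ from the slides: it uses Python's ElementTree, and it places the assoc declaration in an internal DTD subset inside the document, because this parser is not expected to fetch an external file such as lang.dtd.

```python
import xml.etree.ElementTree as ET

# Assumption: the ENTITY declaration is inlined between [ and ]
# instead of living in the external file lang.dtd.
DOC = """<?xml version="1.0"?>
<!DOCTYPE welcome [
  <!ENTITY assoc "&#1571;&#1587;&#1617;&#1608;&#1588;&#1616;&#1610;&#1614;&#1578;&#1618;&#1587;">
]>
<welcome><from>&assoc;</from></welcome>"""

root = ET.fromstring(DOC)
expanded = root.find("from").text   # &assoc; replaced by its value

code_points = [ord(ch) for ch in expanded]
print(len(code_points))   # 11
print(code_points[:3])    # [1571, 1587, 1617]
```

Because this parser does not validate, the sketch demonstrates entity expansion only, not checking the document against the ELEMENT declarations.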

25
More about markup
  • An XML end tag may be abbreviated to /> when the
    element is empty, as in
  • <emptyelt xxxx />
  • but otherwise must be a complete end tag, as in
  • <sometag> xxxxxxxxxxx </sometag>
  • Elements may or may not have content (child
    elements or character data).
  • Elements may have 0 or more attributes associated
    with them. Attributes appear in the element's
    start tag:
  • <car doors = "4"/>
  • Attribute values must appear in single or double
    quotes.
  • Element and attribute names may not contain
    blanks.
  • Here, element car has attribute doors with value
    4.
  • Attribute names may be of any length and may
    contain letters, digits, underscores, hyphens
    and periods, but must start with a letter or
    underscore.
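A brief sketch of reading an attribute from an empty element, and of the parser rejecting an unquoted attribute value. Python's ElementTree is assumed as the parser.

```python
import xml.etree.ElementTree as ET

car = ET.fromstring('<car doors="4"/>')

doors = car.get("doors")   # attribute values are always strings
is_empty = car.text is None and len(car) == 0

print(doors)      # 4
print(is_empty)   # True

# Attribute values must be quoted; without quotes the document
# is not well-formed and the parser raises an error.
try:
    ET.fromstring("<car doors=4/>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

print(unquoted_accepted)   # False
```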

26
Usage.xml uses a stylesheet
  • <?xml version = "1.0"?>
  • <!-- Fig. 5.5: usage.xml -->
  • <!-- Usage of elements and attributes -->
  • <?xml:stylesheet type = "text/xsl" href = "usage.xsl"?>
  • <book isbn = "999-99999-9-X">
  • <title>Deitel&apos;s XML Primer</title>
  • <author>
  • <firstName>Paul</firstName>
  • <lastName>Deitel</lastName>
  • </author>
  • <chapters>
  • <preface num = "1" pages = "2">Welcome</preface>
  • <chapter num = "1" pages = "4">Easy XML</chapter>
  • <chapter num = "2" pages = "2">XML Elements?</chapter>

27
Usage.xsl
  • In notes
  • <? xxxxx ?> in usage.xml represents a PI (that
    is, a processing instruction). A PI consists of
    a PI target (xml:stylesheet, in this example)
    and a PI value. Note the syntax.
  • PIs can be used to help authors embed
    application-specific data in an XML document. If
    the application processing the XML doesn't use
    the PI, then it has no effect on the XML document
    content.
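A sketch of how an application can read a processing instruction's target and value. It assumes Python's xml.dom.minidom and, as a further assumption, spells the target xml-stylesheet (the form used in the W3C stylesheet-association recommendation) rather than the spelling in usage.xml; the book element is cut down to a stub.

```python
from xml.dom import minidom, Node

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="usage.xsl"?>
<book isbn="999-99999-9-X"/>"""

dom = minidom.parseString(DOC)

# Processing instructions in the prolog are children of the
# document node, alongside the root element.
instructions = [
    node for node in dom.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
pi = instructions[0]

print(pi.target)   # xml-stylesheet
print(pi.data)     # type="text/xsl" href="usage.xsl"

# The PI is not part of the element content.
print(dom.documentElement.tagName)   # book
```

An application that does not care about stylesheets can skip these nodes and still see the same book element.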

28
Usage.xml in validator
29
Usage.XML document loaded into IE Browser uses
stylesheet to generate HTML
30
CData
  • The character data appearing in a CDATA section
    is not treated as markup by the XML parser; it is
    passed through as plain text.
  • CDATA might be used for JavaScript or VBScript.
  • A CDATA section starts with <![CDATA[ and ends
    with ]]>
  • CDATA may contain reserved characters, but not
    the text ]]>
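The following sketch compares the two ways of carrying reserved characters: entity references and a CDATA section. It assumes Python's ElementTree, and the code fragment inside the sample elements is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same character data, written with entity references...
escaped = ET.fromstring(
    "<sample>if ( x &lt; 5 &amp;&amp; y &gt; 3 )</sample>"
)

# ...and inside a CDATA section, where < and & need no escaping.
cdata = ET.fromstring(
    "<sample><![CDATA[if ( x < 5 && y > 3 )]]></sample>"
)

print(escaped.text)                 # if ( x < 5 && y > 3 )
print(escaped.text == cdata.text)   # True

# Outside a CDATA section the same raw text is rejected.
try:
    ET.fromstring("<sample>if ( x < 5 && y > 3 )</sample>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

print(raw_accepted)   # False
```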

31
Text example 5.7
  • <?xml version = "1.0"?>
  • <!-- Fig. 5.7: cdata.xml -->
  • <!-- CDATA section containing C++ code -->
  • <book title = "C++ How to Program" edition = "3">
  • <sample>
  • // C++ comment
  • if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
  • cerr &lt;&lt; this-&gt;displayError();
  • </sample>
  • <sample>
  • <![CDATA[
  • // C++ comment
  • if ( this->getX() < 5 && value[ 0 ] != 3 )
  • cerr << this->displayError();
  • ]]>
  • </sample>
  • C++ How to Program by Deitel &amp; Deitel
  • </book>

32
CData example from text 5.7
33
Cdata.xml in MS validator (file is in
examples\ch05)
34
letter.xml - I removed blank lines to get it to
fit here
  • <?xml version = "1.0"?>
  • <letter>
  • <contact type = "from">
  • <name>Jane Doe</name>
  • <address1>Box 12345</address1>
  • <address2>15 Any Ave.</address2>
  • <city>Othertown</city>
  • <state>Otherstate</state>
  • <zip>67890</zip>
  • <phone>555-4321</phone>
  • <flag gender = "F"/>
  • </contact>
  • <contact type = "to">
  • <name>John Doe</name>
  • <address1>123 Main St.</address1>
  • <address2></address2>
  • <city>Anytown</city>
  • <state>Anystate</state>
  • <zip>12345</zip>

35
letter.xml in Validator
36
namespaces
  • Naming collisions can occur when XML authors use
    the same tag names.
  • Namespaces provide a mechanism for making tag
    references unambiguous.
  • A namespace prefix appears in the start and end
    tags, followed by a colon. So,
  • <movie:character>Scrooge</movie:character> can be
    differentiated from
    <ascii:character>colon</ascii:character>
  • Namespace prefixes are tied to a unique URI in
    the XML document. Almost any name can be used to
    create a namespace prefix.
  • In this example ascii and movie are namespace
    prefixes. Namespace prefixes can precede element
    and attribute names to avoid collisions.
  • A URL may be used for a URI. The only
    requirement, though, is uniqueness, as the URLs
    are not visited by the parser.

37
Namespace example 5.8
  • <?xml version = "1.0"?>
  • <!-- Fig. 5.8: namespace.xml -->
  • <!-- Namespaces -->
  • <text:directory xmlns:text = "urn:deitel:textInfo"
  • xmlns:image = "urn:deitel:imageInfo">
  • <text:file filename = "book.xml">
  • <text:description>A book list</text:description>
  • </text:file>
  • <image:file filename = "funny.jpg">
  • <image:description>A funny picture</image:description>
  • <image:size width = "200" height = "100"/>
  • </image:file>
  • </text:directory>
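A sketch of how a namespace-aware parser sees the two file elements from this example. It assumes Python's ElementTree, which reports each element name as {URI}localname; the short prefixes t and i in the lookup table are arbitrary choices for this sketch, and the child elements are trimmed.

```python
import xml.etree.ElementTree as ET

DOC = """<text:directory xmlns:text="urn:deitel:textInfo"
                xmlns:image="urn:deitel:imageInfo">
   <text:file filename="book.xml"/>
   <image:file filename="funny.jpg"/>
</text:directory>"""

root = ET.fromstring(DOC)
print(root.tag)   # {urn:deitel:textInfo}directory

# The prefix used for searching does not have to match the one in
# the document; only the URI it maps to matters.
ns = {"t": "urn:deitel:textInfo", "i": "urn:deitel:imageInfo"}
text_files = [f.get("filename") for f in root.findall("t:file", ns)]
image_files = [f.get("filename") for f in root.findall("i:file", ns)]

print(text_files)    # ['book.xml']
print(image_files)   # ['funny.jpg']
```

Both children are named file, yet each lookup finds only the one in its own namespace.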

38
Namespace.xml in validator (file is in examples\ch05)
39
Namespace.xml example 5.8 in IE
40
Namespaces continued
  • Typing a prefix on every element can be tedious.
    A default namespace can be declared so that
    unprefixed elements in the XML document belong
    to it. (Unprefixed attributes are not placed in
    the default namespace.)

41
Default namespaces
  • <?xml version = "1.0"?>
  • <!-- Fig. 5.9: defaultnamespace.xml -->
  • <!-- Using Default Namespaces -->
  • <directory xmlns = "urn:deitel:textInfo"
  • xmlns:image = "urn:deitel:imageInfo">
  • <file filename = "book.xml">
  • <description>A book list</description>
  • </file>
  • <image:file filename = "funny.jpg">
  • <image:description>A funny picture</image:description>
  • <image:size width = "200" height = "100"/>
  • </image:file>
  • </directory>

42
Default namespaces
  • Now, file is in the default namespace.
  • Compare this example to the earlier namespace
    example where text and image were distinct
    namespaces.
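A sketch showing that unprefixed elements pick up the default namespace. It assumes Python's ElementTree and a trimmed copy of the default-namespace example.

```python
import xml.etree.ElementTree as ET

DOC = """<directory xmlns="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
   <file filename="book.xml"/>
   <image:file filename="funny.jpg"/>
</directory>"""

root = ET.fromstring(DOC)
plain_file, image_file = list(root)

# Unprefixed elements belong to the default namespace.
print(root.tag)         # {urn:deitel:textInfo}directory
print(plain_file.tag)   # {urn:deitel:textInfo}file

# Prefixed elements keep their own namespace.
print(image_file.tag)   # {urn:deitel:imageInfo}file

# The unprefixed attribute has no namespace at all.
print(plain_file.attrib)   # {'filename': 'book.xml'}
```

The parser reports the same names as it would for the prefixed version of the document, so the two spellings are equivalent to an application.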

43
Defaultnamespace.xml in IE
44
Day planner case study (to be continued)
  • <?xml version = "1.0"?>
  • <!-- Fig. 5.10: planner.xml -->
  • <!-- Day Planner XML document -->
  • <planner>
  • <year value = "2000">
  • <date month = "7" day = "15">
  • <note time = "1430">Doctor&apos;s appointment</note>
  • <note time = "1620">Physics class at BH291C</note>
  • </date>
  • <date month = "7" day = "4">
  • <note>Independence Day</note>
  • </date>
  • <date month = "7" day = "20">
  • <note time = "0900">General Meeting in room 32-A</note>
  • </date>
  • <date month = "7" day = "20">
  • <note time = "1900">Party at Joe&apos;s</note>
  • </date>
  • <date month = "7" day = "20">

45
Planner.xml in validator
46
Day planner using a Java GUI. A SAX parser is used
to parse the document (in text, chapter 8).
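As a rough sketch of the SAX approach, the handler below collects the notes from a planner document as events arrive. It is not the Java program from the text: Python's xml.sax is assumed, NoteCollector is a hypothetical class, and the document is a short excerpt of planner.xml with closing tags added so that it is well-formed.

```python
import xml.sax

# Excerpt of planner.xml; the closing year and planner tags are
# added here because the slide shows only part of the file.
PLANNER = """<planner>
  <year value="2000">
    <date month="7" day="15">
      <note time="1430">Doctor&apos;s appointment</note>
      <note time="1620">Physics class at BH291C</note>
    </date>
    <date month="7" day="4">
      <note>Independence Day</note>
    </date>
  </year>
</planner>"""

class NoteCollector(xml.sax.ContentHandler):
    """Builds a list of (month, day, time, text) from SAX events."""

    def __init__(self):
        super().__init__()
        self.notes = []
        self._date = (None, None)
        self._time = None
        self._chunks = None   # list while inside a note, else None

    def startElement(self, name, attrs):
        if name == "date":
            self._date = (attrs.get("month"), attrs.get("day"))
        elif name == "note":
            self._time = attrs.get("time")   # None if no time given
            self._chunks = []

    def characters(self, content):
        # Text can arrive in several pieces, so collect and join.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "note":
            month, day = self._date
            self.notes.append((month, day, self._time, "".join(self._chunks)))
            self._chunks = None

collector = NoteCollector()
xml.sax.parseString(PLANNER.encode("utf-8"), collector)

for note in collector.notes:
    print(note)
```

Only the current date and note are held in memory at any time, which is the smaller footprint the earlier slide attributes to SAX.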
47
Homework on this section
  • Install an xml validator
  • Create your own xml file and validate it.
  • Post screenshots of your XML file and what
    validator.