XML I - PowerPoint PPT Presentation

About This Presentation
Title:

XML I


Slides: 24
Provided by: femi6

Transcript and Presenter's Notes

Title: XML I


1
XML I
2
  • Learning Objectives
  • What is XML
  • Features of XML
  • Uses of XML
  • Structure of an XML document
  • Document Type Declaration
  • Document Type Definitions (DTDs)

3
  • What is XML?
  • XML stands for Extensible Markup Language.
  • It is NOT a version of HTML.
  • Derived from SGML (Standard Generalized Markup
    Language), which was established in 1986 as a
    standard for generalized electronic document
    exchange.
  • Has 3 main features: structure, extensibility and
    validation.
  • XML defines a framework for transmitting
    structured data, hence an XML document is
    essentially a structured document for storing
    information.
  • Allows creation of custom mark-up tags for
    describing virtually anything.
  • XML documents are processed by an XML processor.

4
  • Uses of XML
  • Its main use is storing structured data and
    exchanging it between the applications that make
    up a system.
  • Examples of XML applications are Chemical Markup
    Language (CML), Extensible Financial Reporting
    Markup Language (XFRML) and Mathematical Markup
    Language (MathML).
  • Used in e-commerce to store and transmit product
    and other data, including financial information.
  • Used in Open Financial Exchange (OFX).
  • Used in search engines to store and search data.
  • Applied in virtually every sector.

5
  • By including, or referencing a Document type
    definition (DTD), XML documents can be validated.
  • XML Syntax Fundamentals
  • XML syntax describes the constructs used to
    define the structure and layout of an XML
    document, as well as the constraints involved.
  • An XML processor is a software module that reads
    an XML document, and provides access to its
    content and structure.
  • XML processors typically process documents on
    behalf of applications, and are readily available
    as software plug-ins.
  • Internet Explorer 5.0 is an example of an XML
    application that processes and displays XML
    documents.
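As a hedged illustration of what an XML processor does for an application, Python's standard-library `xml.etree.ElementTree` module can play that role: it reads the document and hands back a tree whose structure the application can walk. The small document and the helper name `element_paths` are hypothetical, chosen only for this sketch.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical document used only for illustration.
DOC = """<?xml version="1.0"?>
<addressbook>
  <contact><name>Tony Benn</name><city>London</city></contact>
</addressbook>"""

def element_paths(xml_text):
    """Return a 'parent/child' path for every element, in document order,
    i.e. the structure the processor exposes to the application."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.append(path)
        for child in elem:  # iterate over direct child elements
            walk(child, path)

    walk(root, "")
    return paths

if __name__ == "__main__":
    for p in element_paths(DOC):
        print(p)
```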

6
  • Entity: The basic building block of an XML
    document. Contains either parsed or unparsed
    data.
  • Parsed data consists of characters that are
    treated as character data or mark-up, and are
    processed by an XML processor.
  • Unparsed data is handled as raw content and is
    not processed.
  • E.g. in <name>John</name>, <name> and </name> are
    mark-up, while John is character data.
  • Markup: Used to describe a document's storage
    structure (entities) and logical structure
    (elements).
  • Elements: Describe the logical structure. They
    have start tags (e.g. <name>) and end tags
    (</name>), or a single empty-element tag
    (<name/>).
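The split between mark-up and character data in the <name>John</name> example can be observed with Python's low-level `xml.parsers.expat` parser, which reports tags and text through separate callbacks. This is a minimal sketch; the function name `markup_and_text` is hypothetical, and attributes are deliberately ignored to keep the output short.

```python
from xml.parsers import expat

def markup_and_text(xml_text):
    """Return (kind, value) pairs in document order: 'markup' for tags,
    'chardata' for the text between them. Attributes are ignored here."""
    events = []
    parser = expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of text in one callback
    parser.StartElementHandler = lambda name, attrs: events.append(("markup", f"<{name}>"))
    parser.EndElementHandler = lambda name: events.append(("markup", f"</{name}>"))
    parser.CharacterDataHandler = lambda data: events.append(("chardata", data))
    parser.Parse(xml_text, True)  # True: this is the final chunk of input
    return events

if __name__ == "__main__":
    for kind, value in markup_and_text("<name>John</name>"):
        print(kind, value)
```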

7
  • XML mark-up components include:
  • Tags: The most obvious component of XML syntax,
    used to describe elements.
  • Processing instructions: Passed by the parser to
    the application. Begin with <? and end with ?>.
    The XML declaration uses the same delimiters, e.g.
    <?xml version="1.0"?> indicates that the document
    is based on XML version 1.0.
  • Document type declarations: Used to specify
    information about the document, including the
    document's root element and its Document Type
    Definition (DTD). Must appear after the XML
    declaration but before the root element, e.g.

      <?xml version="1.0"?>
      <!DOCTYPE addressbook SYSTEM "Addressbook.dtd">
      <addressbook>
        <contact>

  • The name addressbook declared in line 2 must
    correspond to <addressbook> in line 3, the root
    element of the document.
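A non-validating parser does not enforce the rule that the DOCTYPE name matches the root element, but it does report both, so the check is easy to sketch. The version below uses `xml.parsers.expat`, which by default does not fetch the external DTD file; the function name is hypothetical.

```python
from xml.parsers import expat

def doctype_matches_root(xml_text):
    """Return (doctype_name, system_id, root_name, matches)."""
    info = {"doctype": None, "system_id": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["doctype"] = name
        info["system_id"] = system_id

    def on_start(name, attrs):
        if info["root"] is None:  # only the first start tag is the root element
            info["root"] = name

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return info["doctype"], info["system_id"], info["root"], info["doctype"] == info["root"]

DOC = ('<?xml version="1.0"?>\n'
       '<!DOCTYPE addressbook SYSTEM "Addressbook.dtd">\n'
       '<addressbook><contact/></addressbook>')
```

Note that a document whose DOCTYPE names a different root still parses without error here; the mismatch only makes it invalid, which is something a validating parser would report.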

8
  • Entity references: Used to assign aliases to
    pieces of data. They are written between an
    ampersand (&) and a semicolon (;). E.g. &apos;
    corresponds to an apostrophe (') while &amp;
    corresponds to &.
  • Comments: Used to present information that is
    technically not part of the document's content.
    Begin with <!-- and end with -->.
  • Marked (CDATA) sections: Used to block off text
    that is to be passed over by the parser. Defined
    by enclosing the text within <![CDATA[ and ]]>.
    E.g. <![CDATA[<name>John</name>]]>. In this
    example, the name element is not recognized as
    mark-up; the whole string, including John, is
    passed to the application as literal character
    data.
  • CDATA sections are commonly used to quote a piece
    of XML code, e.g. in a tutorial.
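The effect of entity references and CDATA sections is visible in what the parser hands to the application. In this sketch (the `<note>` document is hypothetical), `xml.etree.ElementTree` expands &apos; and &amp;, and the CDATA content arrives as plain text rather than as a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing entity references and a CDATA section.
DOC = "<note>Tom&apos;s &amp; Jerry: <![CDATA[<name>John</name>]]></note>"

def application_view(xml_text):
    """Return the root's text and the tags of its child elements."""
    root = ET.fromstring(xml_text)
    return root.text, [child.tag for child in root]

if __name__ == "__main__":
    text, children = application_view(DOC)
    print(text)      # entity references expanded, CDATA kept as literal text
    print(children)  # no <name> child: it was never treated as mark-up
```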

9
  • Styling XML for display
  • Accomplished in 2 ways:
  • With CSS.
  • With XSL, which is more complex and advanced than
    CSS.
  • Parsing XML
  • Parsers can be validating or non-validating.
  • Validating parsers validate XML documents against
    a DTD or XML Schema; non-validating parsers check
    only that documents are well-formed.
  • Examples of XML parsers are the Lark and Larval
    XML parsers for Java, Sun's Project X parser for
    Java, IBM's XML Parser for Java, Oracle's XML
    parser for Java and IBM's XML Parser for C.
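For comparison with the parsers listed above, Python's standard library ships a non-validating, event-driven (SAX) parser in `xml.sax`, backed by expat. The sketch below streams through a document and counts start tags; the class and function names are hypothetical.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags by name as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

def count_elements(xml_text):
    handler = ElementCounter()
    # Raises xml.sax.SAXParseException if the input is not well-formed;
    # no DTD validation is performed.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts

if __name__ == "__main__":
    print(count_elements("<addressbook><contact/><contact/></addressbook>"))
```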

10
  • Example of an XML Document

      <?xml version="1.0"?>
      <!DOCTYPE addressbook SYSTEM "Addressbook.dtd">
      <addressbook>
        <contact>
          <name>Tony Benn</name>
          <address>210 Temple road</address>
          <city>London</city>
          <postcode>NW9 0RT</postcode>
          <phone>02082049565</phone>
        </contact>
        <contact>
          <name>Peter Bloggs</name>

11
          <address>230 The Vale</address>
          <city>London</city>
          <postcode>NW6 2BT</postcode>
          <phone>02082029517</phone>
        </contact>
      </addressbook>

  • The above example is a well-formed XML document
    used to store contact information. However, it is
    not yet valid!
  • Note that the root element (<addressbook>) has
    nested child elements, each defined with opening
    and closing tags.
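Once the address book is well-formed, an application can read it without any DTD. A minimal sketch with `xml.etree.ElementTree`, turning each contact into a dictionary; the DOCTYPE line is left out because this parser does not validate, and the function name `load_contacts` is hypothetical.

```python
import xml.etree.ElementTree as ET

ADDRESSBOOK = """<addressbook>
  <contact>
    <name>Tony Benn</name><address>210 Temple road</address>
    <city>London</city><postcode>NW9 0RT</postcode><phone>02082049565</phone>
  </contact>
  <contact>
    <name>Peter Bloggs</name><address>230 The Vale</address>
    <city>London</city><postcode>NW6 2BT</postcode><phone>02082029517</phone>
  </contact>
</addressbook>"""

def load_contacts(xml_text):
    """Return one {child tag: text} dict per <contact> element."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in contact}
            for contact in root.findall("contact")]

if __name__ == "__main__":
    for c in load_contacts(ADDRESSBOOK):
        print(c["name"], c["postcode"])
```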

12
  • XML Data Modelling
  • Involves describing the structure of XML
    documents for the purpose of validation.
  • After defining a data model, you can create
    structured XML documents that must adhere to that
    model to be valid.
  • Valid vs well-formed XML: It is perfectly legal
    to create an XML document without a data model,
    in which case the document can be well-formed
    but not valid.
  • There are 2 approaches to creating data models:
  • DTDs (Document Type Definitions) and
  • XML Schemas.
  • The data model (DTD or XML Schema) defines the
    arrangement of mark-up and character data within
    a valid XML document, e.g. the order and nesting
    of the elements.
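Well-formedness can be checked without any data model. The sketch below, using `xml.etree.ElementTree` (helper name hypothetical), answers only the well-formed question; a True result says nothing about validity, which needs a DTD or XML Schema and a validating parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as XML; validity is NOT checked."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

if __name__ == "__main__":
    print(is_well_formed("<a><b/></a>"))      # properly nested
    print(is_well_formed("<a><b></a>"))       # <b> never closed
    print(is_well_formed("<a></a><a></a>"))   # two root elements
```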

13
  • Modelling Data with DTDs
  • DTDs (Document Type Definitions) rely on
    specialized syntax for describing the structure
    of an XML vocabulary (class of documents).
  • DTDs can be broken down into 2 subsets:
  • Internal (or local) DTD: Mark-up declarations are
    contained in the prolog (the section of the
    document preceding the root element) of the same
    document.
  • External DTD: External mark-up declarations that
    can be referenced by one or more documents.
  • The 2 subsets may be combined, with the internal
    subset taking precedence.
  • The DTD declares every element, attribute and
    entity used in the XML document.
  • It must be declared or referenced in the
    document type declaration.

14
  • Example: Addressbook.dtd

      <!ELEMENT addressbook (contact+)>
      <!ELEMENT contact (name, address, city, postcode, phone)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT address (#PCDATA)>
      <!ELEMENT city (#PCDATA)>
      <!ELEMENT postcode (#PCDATA)>
      <!ELEMENT phone (#PCDATA)>

  • The start of a document conforming to it:

      <addressbook>
        <contact>
          <name>Tony Benn</name>
          <address>210 Temple road</address>
          <city>London</city>
          <postcode>NW9 0RT</postcode>
          <phone>02082049565</phone>
        </contact>
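Python's standard library has no validating parser, so as a rough illustration of what validation against the contact declaration involves, the rules can be hand-coded. This is a hypothetical sketch, not a DTD validator: the expected child order is copied by hand from the contact declaration, and only the root name, the contact children and the "text only" (#PCDATA) rule are checked.

```python
import xml.etree.ElementTree as ET

# Hand-copied from the contact declaration; a validating parser would
# read this from Addressbook.dtd instead.
CONTACT_CHILDREN = ["name", "address", "city", "postcode", "phone"]

def addressbook_errors(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "addressbook":
        errors.append(f"root is <{root.tag}>, expected <addressbook>")
    for i, contact in enumerate(root, start=1):
        if contact.tag != "contact":
            errors.append(f"child {i} is <{contact.tag}>, expected <contact>")
            continue
        tags = [child.tag for child in contact]
        if tags != CONTACT_CHILDREN:  # sequence: exact names, exact order
            errors.append(f"contact {i} has children {tags}")
        for child in contact:
            if len(child) > 0:  # (#PCDATA) allows text only, no child elements
                errors.append(f"contact {i}: <{child.tag}> contains elements")
    return errors

GOOD = ("<addressbook><contact><name>A</name><address>B</address><city>C</city>"
        "<postcode>D</postcode><phone>E</phone></contact></addressbook>")
BAD = "<addressbook><contact><name>A</name><city>C</city></contact></addressbook>"
```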

15
  • Document type declaration syntax
  • <!DOCTYPE rootElem SYSTEM "ExtDTDRef"
    [InternalDTDDecl]>
  • where rootElem is the root element, ExtDTDRef is
    the external DTD reference, and InternalDTDDecl
    is the internal DTD declaration.
  • Illustration

      <!DOCTYPE movies SYSTEM "Movies.dtd" [
        <!ELEMENT actor (#PCDATA)>
      ]>
      <movies>
        <title>Lord of the Rings</title>
        <!-- the other child elements go here -->

  • External DTDs are more commonly used, and are
    especially useful when you are creating multiple
    documents of the same class, when you would like
    to use an existing DTD, or when you want to make
    your documents as concise as possible.
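Both parts of the declaration are reported by `xml.parsers.expat`: the DOCTYPE callback gives the root name and the external reference, and element declarations from the internal subset arrive one by one. In this sketch (function name hypothetical) the movies example is closed off so that it is well-formed; expat does not fetch Movies.dtd by default, so only the internal declarations are seen.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE movies SYSTEM "Movies.dtd" [
  <!ELEMENT actor (#PCDATA)>
]>
<movies>
  <title>Lord of the Rings</title>
</movies>"""

def internal_declarations(xml_text):
    """Return (doctype info, names of elements declared in the internal subset)."""
    info = {}
    declared = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.update(root=name, system_id=system_id, internal=bool(has_internal_subset))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # model describes the content model; only the element name is kept here.
    parser.ElementDeclHandler = lambda name, model: declared.append(name)
    parser.Parse(xml_text, True)
    return info, declared
```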

16
  • Internal DTDs are preferable when you are
    creating only one document, or when you want to
    reduce the overhead associated with your
    documents.
  • Elements and Attributes
  • The primary contents described in a DTD are
    elements and attributes.
  • Think of an element as a logical unit of
    information, and an attribute as a characteristic
    of that information.
  • By looking at a document as a group of
    information objects, it is usually possible to
    associate each object with an element. Any
    leftover information would usually be represented
    as attributes.
  • Another approach is to consider the type of
    information and how it will be used.
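The same piece of information can be modelled either way. This hypothetical sketch stores a contact's city once as a child element and once as an attribute, and shows that an application reads the two with different calls: `get()` for an attribute, `findtext()` for a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical variants of the same contact.
AS_ELEMENT = "<contact><name>Tony Benn</name><city>London</city></contact>"
AS_ATTRIBUTE = '<contact city="London"><name>Tony Benn</name></contact>'

def city_of(xml_text):
    contact = ET.fromstring(xml_text)
    # Use the attribute if present; otherwise fall back to a child element.
    return contact.get("city") or contact.findtext("city")

if __name__ == "__main__":
    print(city_of(AS_ELEMENT), city_of(AS_ATTRIBUTE))
```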

17
  • Attributes provide tighter constraints on
    information, while elements are very loosely
    constrained and are better suited for long
    strings of text.
  • Attributes can be constrained against a
    predefined list of values, and can have default
    values.
  • Attributes are very concise, and are easier to
    parse.
  • However, they cannot contain nested information.
  • Elements
  • Declared with element declarations in the DTD.
  • Syntax: <!ELEMENT ElementName Type>
  • ElementName corresponds to the tag used to mark
    up that element in the XML document.
  • Type specifies the content. XML supports 4
    content types:

18
  • Empty type: The element doesn't contain any
    content, but may contain attributes. In the DTD,
    such elements are declared in the form
    <!ELEMENT ElementName EMPTY>
  • E.g. <!ELEMENT img EMPTY>
  • Empty elements can be written in the XML document
    in 2 ways:
  • A start tag immediately followed by an end tag,
    with nothing (not even a space) in between, e.g.
    <img src="pic.gif"></img>.
  • An empty-element tag, e.g. <img/> or
    <img src="pic.gif"/>.
  • Element-only type: The element contains only
    child elements. Declared as
    <!ELEMENT ElementName contentModel>
  • The content model is specified using a
    combination of special element declaration
    symbols and child element names.
  • The symbols represent the relationship of each
    child to the container element.
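To see that the two spellings of an empty element are equivalent, and that even a single space breaks that equivalence, a parser's view of each can be compared. A minimal sketch with `xml.etree.ElementTree`; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def same_empty_element(a, b):
    """Compare tag, attributes, text and number of children of two elements."""
    ea, eb = ET.fromstring(a), ET.fromstring(b)
    return (ea.tag, ea.attrib, ea.text, len(ea)) == (eb.tag, eb.attrib, eb.text, len(eb))

if __name__ == "__main__":
    print(same_empty_element('<img src="pic.gif"></img>', '<img src="pic.gif"/>'))
    # A space between the tags is character data, so the element is no longer empty.
    print(same_empty_element('<img src="pic.gif"> </img>', '<img src="pic.gif"/>'))
```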

19
  • Table of Special Symbols

Symbol               Usage
Parentheses ( )      Enclose a sequence or choice group of child elements
Comma (,)            Separates the items in a sequence and establishes the order in which they must appear
Pipe (|)             Separates the items in a choice group of elements
No symbol            The child element must appear exactly once
Question mark (?)    The child element may appear once or not at all
Asterisk (*)         The child element can appear any number of times (zero or more)
Plus sign (+)        The child element must appear at least once

Example: <!ELEMENT resume (intro, (education | experience), hobbies?, references)>
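These symbols behave like regular-expression operators over the sequence of child element names, which suggests a way to sketch the check a validating parser performs: translate the content model into a Python regular expression and match the children against it. This is an illustration under simplifying assumptions (element-only models; #PCDATA and mixed content are not handled), not how any particular parser is implemented; the function names are hypothetical.

```python
import re

NAME = r"[A-Za-z_:][\w.:\-]*"
TOKEN = re.compile(rf"{NAME}|[(),|?*+]")

def model_to_regex(model):
    """Translate an element-only content model into a compiled regex that
    matches child names, each followed by a single space."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")           # non-capturing group
        elif tok == ",":
            continue                      # sequence: items simply follow each other
        elif tok in ")|?*+":
            parts.append(tok)             # same meaning as in a regex
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))

def matches(model, children):
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None

RESUME = "(intro, (education | experience), hobbies?, references)"

if __name__ == "__main__":
    print(matches(RESUME, ["intro", "education", "references"]))
    print(matches(RESUME, ["intro", "references"]))  # choice group missing
```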

20
  • Mixed Elements
  • Contain both character data and child elements.
    The simplest mixed element is one declared to
    contain only character data.
  • Takes the following form:
  • <!ELEMENT ElementName (#PCDATA)>
  • E.g. <!ELEMENT city (#PCDATA)>
  • When child elements are also allowed, the form is
    <!ELEMENT ElementName (#PCDATA | Child1 | Child2)*>
  • ANY Elements
  • The ANY element, so named because it is declared
    with the keyword ANY, can contain any combination
    of character data and declared elements.
  • Due to its lack of structure, you should avoid
    using it.
  • Typically used during development of a DTD, but
    should not appear in a production DTD.
  • Form: <!ELEMENT ElementName ANY>
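In a mixed element, text and child elements interleave. Assuming a hypothetical declaration such as <!ELEMENT para (#PCDATA | city)*>, this sketch with `xml.etree.ElementTree` lists the pieces in document order; ElementTree stores the text before the first child as `text` and the text after each child as that child's `tail`.

```python
import xml.etree.ElementTree as ET

def mixed_parts(xml_text):
    """Return the character data and child elements of the root, in order."""
    root = ET.fromstring(xml_text)
    parts = []
    if root.text:
        parts.append(("text", root.text))
    for child in root:
        parts.append(("element", child.tag))
        if child.tail:  # text that follows this child inside the parent
            parts.append(("text", child.tail))
    return parts

if __name__ == "__main__":
    print(mixed_parts("<para>Visit <city>London</city> in May.</para>"))
```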

21
  • Attributes
  • Used to specify additional information about
    elements.
  • Within an element, attributes form name/value
    pairs that describe a particular property of the
    element.
  • Declared in a DTD with an attribute list
    declaration, which takes the form
  • <!ATTLIST ElementName AttrName AttrType Default>
  • There are 4 kinds of default that can be
    specified:
  • #REQUIRED: The attribute is required
  • #IMPLIED: The attribute is optional
  • #FIXED "value": The attribute has a fixed value
  • "default": The default value of the attribute
  • #REQUIRED means that the attribute is required:
    you must specify that attribute whenever you use
    the element.
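The four kinds of default can be told apart from what `xml.parsers.expat` reports for each attribute declaration: a default value (or None) and a "required" flag, where a required flag together with a value signals #FIXED. This sketch classifies the declarations of a hypothetical player element; the DTD and function name are illustrative only.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE player [
  <!ELEMENT player EMPTY>
  <!ATTLIST player team     CDATA #REQUIRED
                   nickname CDATA #IMPLIED
                   sport    CDATA #FIXED "football"
                   status   CDATA "active">
]>
<player team="Arsenal"/>"""

def attribute_defaults(xml_text):
    """Map (element, attribute) to the kind of default it was declared with."""
    result = {}

    def on_attlist(elname, attname, attype, default, required):
        if required and default is None:
            kind = "#REQUIRED"
        elif required:                 # required flag plus a value means #FIXED
            kind = f'#FIXED "{default}"'
        elif default is None:
            kind = "#IMPLIED"
        else:
            kind = f'default "{default}"'
        result[(elname, attname)] = kind

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return result
```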

22
  • Attribute Type
  • Must be specified in addition to the attribute's
    default.
  • XML supports 10 attribute types:
  • CDATA: Character data (any string)
  • Enumerated: A list of allowed string values
  • NOTATION: A notation declared somewhere else in
    the DTD
  • ENTITY: An external binary (unparsed) entity
  • ENTITIES: Multiple external binary entities
    separated by whitespace
  • ID: A unique identifier
  • IDREF: A reference to an ID declared somewhere
    else in the document
  • IDREFS: Multiple references to IDs declared
    somewhere else in the document
  • NMTOKEN: A name consisting of XML name token
    characters (letters, digits, periods, hyphens,
    colons and underscores)
  • NMTOKENS: Multiple names consisting of XML name
    token characters
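One of the checks a validating parser performs for ID and IDREF attributes is that every IDREF points at an ID that exists in the document. ElementTree does not read DTDs, so in this sketch the names of the ID and IDREF attributes (`id` and `ref`) are assumptions passed in as parameters, and the cast document is hypothetical.

```python
import xml.etree.ElementTree as ET

def dangling_idrefs(xml_text, id_attr="id", ref_attr="ref"):
    """Return IDREF values that do not match any ID in the document.
    id_attr/ref_attr stand in for attributes declared as ID/IDREF."""
    root = ET.fromstring(xml_text)
    ids = {e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None}
    return [e.get(ref_attr) for e in root.iter()
            if e.get(ref_attr) is not None and e.get(ref_attr) not in ids]

CAST = '<cast><actor id="a1"/><role ref="a1"/><role ref="a2"/></cast>'

if __name__ == "__main__":
    print(dangling_idrefs(CAST))
```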

23
  • String Attributes
  • The most commonly used attribute type.
  • Example
  • <!ATTLIST player team CDATA #REQUIRED>
  • In the above example, the team to which a player
    belongs is a required character data attribute
    that must be defined in the player element.
  • <!ATTLIST player team CDATA #IMPLIED> would have
    made the team optional.
  • Another example
  • <!ELEMENT movie (Producer, Director, Actor,
    Writer, Duration)>
  • <!ATTLIST movie type (comedy | thriller)
    #REQUIRED>
  • In this example, the movie element contains the
    declared child elements, and it also has a
    mandatory enumerated attribute called type, which
    has 2 possible values.
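An enumerated attribute can be checked by reading the allowed values from the ATTLIST declaration and comparing them with the attribute values found in the document. This sketch does that with `xml.parsers.expat`; to keep it short, the movie element is declared EMPTY here instead of with the child elements from the slide, and the template and function names are hypothetical.

```python
from xml.parsers import expat

DOC_TEMPLATE = """<!DOCTYPE movie [
  <!ELEMENT movie EMPTY>
  <!ATTLIST movie type (comedy | thriller) #REQUIRED>
]>
<movie type="{value}"/>"""

def enum_violations(xml_text):
    """Report attribute values that are not in their declared enumeration."""
    allowed = {}
    problems = []

    def on_attlist(elname, attname, attype, default, required):
        if attype.startswith("("):  # enumerated type, reported like "(comedy|thriller)"
            values = {v.strip() for v in attype.strip("()").split("|")}
            allowed[(elname, attname)] = values

    def on_start(name, attrs):
        for attname, value in attrs.items():
            values = allowed.get((name, attname))
            if values is not None and value not in values:
                problems.append(f"{name}/@{attname}={value!r} not in {sorted(values)}")

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist  # declarations arrive before the root element
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return problems

if __name__ == "__main__":
    print(enum_violations(DOC_TEMPLATE.format(value="comedy")))
    print(enum_violations(DOC_TEMPLATE.format(value="horror")))
```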