Extensible Markup Language XML - PowerPoint PPT Presentation


Slides: 58
Provided by: asuman9


Title: Extensible Markup Language XML

Extensible Markup Language (XML)
  • Extensible Markup Language has become the
    universal standard for representing data
  • XML started out as a standard data exchange
    format for the Web
  • Yet, it has quickly become the fundamental
    instrument in the development of Web-based online
    information services and electronic commerce
  • Almost all recent electronic commerce standards
    are based on XML

  • A subset of SGML (Standard Generalized Markup
    Language), it is defined by the World Wide Web
    Consortium (http://www.w3.org)
  • It is a fee-free open standard.
  • HTML provides a universal method of displaying
    data; XML provides a universal method of
    describing data
  • Provides the ability to describe data in an open
    text-based format and deliver it using the standard
    HTTP protocol

  • At present, many applications on the Web use XML
    for hosting large amounts of structured and
    semi-structured data
  • Representation of information in XML documents
    has been increasing at an astonishing pace
  • According to Meta Group, by 2003, about 65% of
    corporate data will be stored in an XML format

XML: The Unifying Technology
[Figure: labels "XML Messaging", "Maturity of Web Infrastructure", "Browse the Web", "Program the Web"]
XML helps address the challenge
  • The data is self-describing
  • e.g. the meaning of the data is included:
    identifiers surround every bit of data,
    indicating what it means
  • Far more flexible method of representing
    transmitted information
  • e.g. batched orders sent together can have
    different fields and format without breaking apps
    on each end
  • Open, standard technologies for moving,
    processing and validating the data
  • e.g. the XML parser can automatically parse,
    validate, and feed the information to an
    application, instead of every application having
    to include this functionality

XML: An Example
Data stream in a typical interface…
Electronic Commerce, 100, Turban, 25,
Same data stream in XML…
Electronic Commerce <QUANTITY>100</QUANTITY> Turban
25 <PUBLISHER>Addison-Wesley</PUBLISHER>
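
The contrast between the two data streams can be sketched in Python. This is only an illustration: the wrapper element BOOK and the tag names TITLE, AUTHOR and PRICE are assumptions, since only QUANTITY and PUBLISHER are hinted at in the slide.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Positional stream: the reader must already know what each field means.
flat = "Electronic Commerce,100,Turban,25,Addison-Wesley"
fields = next(csv.reader(io.StringIO(flat)))

# Hypothetical tag names (assumptions made for this sketch).
names = ["TITLE", "QUANTITY", "AUTHOR", "PRICE", "PUBLISHER"]

# Tagged stream: every value is wrapped in an element that names it.
book = ET.Element("BOOK")
for name, value in zip(names, fields):
    ET.SubElement(book, name).text = value

xml_stream = ET.tostring(book, encoding="unicode")
print(xml_stream)
```

In the flat stream, swapping two fields silently changes the meaning; in the tagged stream, each value keeps its label wherever it appears.
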
Markup (or Tagging)
  • XML uses textual markups to define data
  • An XML document comprises a collection of
    tagged elements, each containing a start tag
    (<name>), an end tag (</name>), and the
    content between the two tags
  • Example:
  • <PONumber>1234ABCD</PONumber>

Tagging Data in XML
  • <PONumber>1234ABCD</PONumber>
  • Considering the content only, it is not possible
    to understand what 1234ABCD stands for
  • The tag name PONumber intuitively tells that the
    content is a purchase order number
  • Similarly, an XML element might be tagged as
    name, gender, birth date, salary, price,…
  • XML is extensible in the sense that users can
    create their own vocabularies; the tag names are
    neither predefined nor limited
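
As a minimal sketch of how a program sees such a tagged element, Python's standard xml.etree.ElementTree module exposes the tag name and the content separately. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Parse a single tagged element.
element = ET.fromstring("<PONumber>1234ABCD</PONumber>")

tag_name = element.tag   # the label that gives the content its meaning
content = element.text   # the raw value between the start and end tags

print(tag_name, content)
```
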

Adding Structure to data
  • Tagged elements may be nested to any depth to
    provide structured data, or may be repeated to
    represent a list of values
  • A valid XML document contains a single root
    element, which constitutes the top level of
    the document
  • In other words, a valid XML document represents a
    tree of elements
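
Nesting and repetition can be sketched as follows. The element names are borrowed from the purchase order example used elsewhere in these slides, but the exact document structure and the second line item are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order: one root, nested children, a repeated LineItem.
doc = """
<PurchaseOrder>
  <PONumber>1234ABCD</PONumber>
  <PurchaseOrderDate>20030601</PurchaseOrderDate>
  <LineItem><QuantityOrdered>16</QuantityOrdered><UnitPrice>95</UnitPrice></LineItem>
  <LineItem><QuantityOrdered>2</QuantityOrdered><UnitPrice>10</UnitPrice></LineItem>
</PurchaseOrder>
"""
root = ET.fromstring(doc)


def depth(element):
    """Return the number of levels in the tree rooted at element."""
    return 1 + max((depth(child) for child in element), default=0)


# Repeated elements form a list of values.
line_items = root.findall("LineItem")
total = sum(
    int(item.findtext("QuantityOrdered")) * int(item.findtext("UnitPrice"))
    for item in line_items
)
print(root.tag, len(line_items), depth(root), total)
```
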

Giving Meaning and Structure to Data
[Figure: annotated XML fragment with the values 1234ABCD, 20030601, 16, 95, …; callouts mark a start tag, an end tag, an element, another element, and an attribute]
Giving Structure to Data
Well-formed and valid XML documents
  • There are two levels of correctness of an XML
    document:
  • Well-formed. A well-formed document conforms to
    all of XML's syntax rules. For example, if an
    element has an opening tag with no closing tag
    and is not self-closing, it is not well-formed.
  • Valid. A valid document additionally conforms to
    some semantic rules. These rules are either
    user-defined, or included as an XML schema or
    DTD.
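
The first level of correctness can be checked with any conforming parser. A minimal sketch using Python's standard library follows; the helper name is_well_formed is an assumption. Note that this parser checks well-formedness only and does not validate against a schema or DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))   # closed and properly nested
print(is_well_formed("<a><b></a>"))    # <b> is never closed
print(is_well_formed("<a/><b/>"))      # two root elements
```
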

Well-formed documents: XML syntax
  • The only indispensable syntactical requirement is
    that the document has exactly one root element
    (alternatively called the document element).
  • The root element can be preceded by an optional
    XML declaration, for example
    <?xml version="1.0" encoding="UTF-8"?>, which states
  • the version of XML
  • the character encoding and external dependencies.
  • The specification requires that processors of XML
    support the pan-Unicode character encodings UTF-8
    and UTF-16

Well-formed documents: XML syntax
  • XML comments start with <!-- and end with -->.
  • The text enclosed by the root tags may contain an
    arbitrary number of XML elements. The basic
    syntax for one element is
  • <name attribute="value">content</name>
  • Here, content is some text which may again
    contain XML elements.

Another example
Well-formed documents: XML syntax
  • Attribute values must always be quoted, using
    single or double quotes (' or ")
  • Each attribute name may appear only once in
    any element.
  • Proper nesting: elements may never overlap
  • Not well-formed: <p>Normal <em>emphasized
    <strong>strong emphasized</em>
    strong</strong></p>

  • Empty-element tag: it has three equivalent forms
  • <… author="John" genre="science-fiction"
    date="2009-Jan-01" />
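
These syntax rules can be exercised with a parser. The sketch below is illustrative only: the element names p, em, strong and book are assumptions, and the helper parses is not part of any library.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Overlapping elements are rejected; properly nested ones are accepted.
overlapping = "<p>Normal <em>emphasized <strong>strong emphasized</em> strong</strong></p>"
nested = "<p>Normal <em>emphasized <strong>strong emphasized</strong></em></p>"

# Attribute values must be quoted, and a name may appear only once per element.
unquoted = "<book author=John/>"
duplicate = '<book author="John" author="Jane"/>'

# The three spellings of an empty element parse to the same thing.
empty_forms = ["<book></book>", "<book />", "<book/>"]
serialized = {ET.tostring(ET.fromstring(form), encoding="unicode") for form in empty_forms}

print(parses(overlapping), parses(nested), parses(unquoted), parses(duplicate))
print(len(serialized))
```
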

Entity references
  • An entity in XML is a named body of data, usually
    text, such as an unusual character.
  • An entity reference is a placeholder that
    represents that entity
  • It consists of the entity's name preceded by an
    ampersand ("&") and followed by a semicolon (";")
  • XML has five predeclared entities
  • &amp; (&) ampersand
  • &lt; (<) less than
  • &gt; (>) greater than
  • &apos; (') apostrophe
  • &quot; (") quotation mark
  • More entities can be declared in the document's
    Document Type Definition (DTD), covered later.
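
A short sketch of the predeclared entities in practice: a parser replaces the references with the characters they stand for, and a serializer has to do the reverse. The element name note and the sample text are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Reading: the parser expands the five predeclared entity references.
element = ET.fromstring(
    "<note>a &lt; b &amp;&amp; c &gt; d, &quot;quoted&quot; &apos;text&apos;</note>"
)
decoded = element.text

# Writing: markup characters in data must be turned back into references.
encoded = escape("a < b & c")
restored = unescape(encoded)

print(decoded)
print(encoded)
```

By default, escape() from xml.sax.saxutils handles &, < and >; quotes only need escaping inside attribute values.
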

Well-formed documents
  • The document complies with its declared character
    encoding.
  • The encoding may be declared either externally
    ("Content-Type" header of HTTP) or internally.
  • Element names are case-sensitive.
  • ...
  • Choosing meaningful names implies the semantics
    of elements and attributes to a human reader

Valid documents: XML semantics
  • By leaving the names, allowable hierarchy, and
    meanings of the elements and attributes open and
    definable by a customizable schema or DTD, XML
    provides a syntactic foundation for the creation
    of purpose specific, XML-based markup languages.
  • The schema merely supplements the syntax rules
    with a set of constraints.
  • Schemas typically restrict element and attribute
    names and their allowable containment hierarchies

  • For example, an element named 'birthday' contains three
    elements, 'year', 'month' and 'day', each containing only
    character data.
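
The kind of constraint a schema expresses can be sketched by hand. This is not a schema validator, only a hand-written check of the 'birthday' rule described above; the function name and the fixed child order (year, month, day) are assumptions.

```python
import xml.etree.ElementTree as ET


def is_valid_birthday(element):
    """Check the containment rule: birthday = year, month, day (text only)."""
    if element.tag != "birthday":
        return False
    children = list(element)
    if [child.tag for child in children] != ["year", "month", "day"]:
        return False
    # Each child must hold character data only: no sub-elements, non-empty text.
    return all(len(child) == 0 and bool((child.text or "").strip()) for child in children)


good = ET.fromstring("<birthday><year>1990</year><month>6</month><day>1</day></birthday>")
missing_day = ET.fromstring("<birthday><year>1990</year><month>6</month></birthday>")
nested = ET.fromstring("<birthday><year><era>AD</era></year><month>6</month><day>1</day></birthday>")

print(is_valid_birthday(good), is_valid_birthday(missing_day), is_valid_birthday(nested))
```

All three documents are well-formed; only the first also satisfies the containment rule.
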

Valid documents: XML semantics
  • An XML document that complies with a particular
    schema/DTD, in addition to being well-formed, is
    said to be valid.
  • An XML schema is expressed in terms of constraints
    on the structure and content of documents
  • Before SGML and XML, software designers had to
    define special file formats and special-purpose
    parsers and writers.
  • XML's regular structure and strict parsing rules
    allow software designers to leave parsing to
    standard tools
  • Well-tested tools exist to validate an XML
    document "against" a schema

Document Type Definition (DTD)
  • The principal purpose of the DTD is to declare
    the hierarchy of document elements
  • A document type definition defines
  • The names of the elements,
  • The content model of each element,
  • How often and in which order elements may
    appear,
  • Whether end-tags can be abbreviated,
  • The possible presence of attributes and their
    default values,
  • The names of the entities

An Example DTD
  • <!ELEMENT … (… PurchaseOrderDate, LineItem)>
  • <!ELEMENT … (… QuantityOrdered, UnitPrice)>
  • <!-- other elements are skipped --> ...

  • A DTD specifies the structure of an XML element
    by specifying the names of its sub-elements and
    attributes
  • Sub-element structure is specified using the
    operators
  • * set with zero or more elements
  • + set with one or more elements
  • ? optional
  • | or
  • All values are assumed to be string values,
    unless the type is ANY, in which case the value
    can be an arbitrary XML fragment
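
Because the operators *, +, ? and | mean the same thing in regular expressions, a simple content model can be turned into a regular expression over child element names. The sketch below is a teaching aid under stated assumptions, not a DTD validator: the function names are hypothetical, the content models shown are invented for illustration, and #PCDATA, EMPTY, ANY and attributes are not handled.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate a simple DTD content model into a regex over child names."""
    compact = "".join(model.split())  # drop whitespace
    # Each element name becomes a group matching "name;".
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda match: "(?:" + re.escape(match.group()) + ";)",
        compact,
    )
    # In a DTD "," means sequence, which in a regex is plain concatenation.
    return re.compile(pattern.replace(",", ""))


def children_match(element, model):
    """Return True if the element's children satisfy the content model."""
    child_names = "".join(child.tag + ";" for child in element)
    return content_model_to_regex(model).fullmatch(child_names) is not None


order = ET.fromstring(
    "<PurchaseOrder><PONumber>1</PONumber><PurchaseOrderDate>2</PurchaseOrderDate>"
    "<LineItem/><LineItem/></PurchaseOrder>"
)

print(children_match(order, "(PONumber, PurchaseOrderDate, LineItem+)"))
print(children_match(order, "(PONumber, PurchaseOrderDate, LineItem?)"))
print(children_match(order, "(PONumber, PurchaseOrderDate?, (LineItem | Note)*)"))
```
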

  • There is a special attribute id which can occur
    once for each element
  • EMPTY: the element has no content
  • Empty elements usually have attributes that give
    them useful properties
  • There is no concept of a root of a document: an
    XML document conforming to a DTD can be rooted at
    any element specified in the DTD

Element Identity, IDs, and ID References
  • To support element sharing, XML reserves an
    attribute of type ID, which allows a unique key
    to be associated with an element
  • An attribute of type IDREF allows an element to
    refer to another element with the designated key,
    and an attribute of type IDREFS may refer to
    multiple elements
  • [Example markup not preserved; surviving content: John, ..., 1995]
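
How ID and IDREF(S) attributes link elements can be sketched as follows. Everything in the sample document is hypothetical: the element names, the attribute names id, father and children, and the people other than John. ElementTree does not read attribute types from a DTD, so the code simply treats id as the key by convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "id" plays the role of an ID attribute,
# "father" of an IDREF, and "children" of an IDREFS (space-separated).
doc = """
<family>
  <person id="p1" children="p2 p3"><name>John</name></person>
  <person id="p2" father="p1"><name>Mary</name></person>
  <person id="p3" father="p1"><name>Tom</name></person>
</family>
"""
root = ET.fromstring(doc)

# Build the key index; an ID value must be unique within the document.
by_id = {}
for element in root.iter():
    key = element.get("id")
    if key is not None:
        if key in by_id:
            raise ValueError(f"duplicate ID: {key}")
        by_id[key] = element


def resolve(element, attribute):
    """Follow an IDREF or IDREFS attribute to the referenced elements."""
    return [by_id[ref] for ref in element.get(attribute, "").split()]


father_of_mary = resolve(by_id["p2"], "father")[0].findtext("name")
children_of_john = [person.findtext("name") for person in resolve(by_id["p1"], "children")]
print(father_of_mary, children_of_john)
```
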

  • Entities represent the physical structure of an
    XML document
  • Two types of entities:
  • General entities apply within the top level
    element and in attribute values
  • Parameter entities apply within the internal and
    external DTD subsets
  • Entity reference in a document:
  • This contract is between &recipient;
    and &contractor; and the award is
  • Entity reference expanded:
  • This contract is between METU and EC
    and the award is 1 EURO.
  • By changing the entity declarations you can
    create any contract.
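
The contract idea can be sketched with general entities declared in an internal DTD subset, which Python's standard parser expands while reading. The entity names recipient, contractor and award, the element name contract, and the helper make_contract are assumptions made for this sketch; the second set of values is invented.

```python
import xml.etree.ElementTree as ET

# Document template: only the entity declarations change between contracts.
TEMPLATE = """<!DOCTYPE contract [
  <!ENTITY recipient "{recipient}">
  <!ENTITY contractor "{contractor}">
  <!ENTITY award "{award}">
]>
<contract>This contract is between &recipient; and &contractor; and the award is &award;.</contract>"""


def make_contract(recipient, contractor, award):
    """Fill in the entity declarations and return the expanded text."""
    document = TEMPLATE.format(recipient=recipient, contractor=contractor, award=award)
    return ET.fromstring(document).text


first = make_contract("METU", "EC", "1 EURO")
second = make_contract("University A", "Agency B", "2 EURO")
print(first)
print(second)
```

The values are pasted into the declarations unescaped, so this sketch only works for plain text without quotes, "&" or "%" characters.
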

General Entities
  • General entity declaration:
  • <!ENTITY xml "Extensible Markup Language">
  • Entity reference in a document:
  • The &xml; is derived from ISO 8879, an
    International Standard. <… label="&xml;"/>
  • Entity reference expanded:
  • The Extensible Markup Language is derived
    from ISO 8879, an International Standard.

Parameter Entities
  • A parameter entity is for use only in DTDs
  • Parameter entities carry information for use in
    the markup declaration, often a set of common
    attributes shared by several elements or a link
    to an outside DTD.
  • Parameter entities whose references are purely
    within DTD are known as internal entities,
    whereas references that draw information from
    outside files are external entities
  • Parameter entities use a % sign both in their
    declaration and in their references to
    distinguish themselves from general entities

Parameter Entities
  • Parameter entity declaration
  • Parameter entity reference in DTD
  • Parameter entity reference expanded

  • The DTD is the oldest schema format for XML
  • Disadvantages
  • It has no support for newer features of XML, most
    importantly namespaces.
  • It lacks expressiveness. Certain formal aspects
    of an XML document cannot be captured in a DTD.
  • It uses a custom non-XML syntax, inherited from
    SGML, to describe the schema.
  • Still used in many applications because it is
    considered the easiest to read and write.

Valid documents: XML semantics
  • Other schema languages
  • XML Schema (XSD) (will see)
  • RELAX NG (specified by OASIS, now an ISO standard
    as part of DSDL)
  • ISO DSDL (Document Schema Description Languages)

  • Schematron

XML Namespaces
  • Namespaces are a simple and straightforward way
    to distinguish names used in XML documents, no
    matter where they come from
  • The only reason namespaces exist, is to give
    elements and attributes programmer-friendly names
    that will be unique across the whole Internet

  • xmlns:h="http://www.w3.org/HTML/1998/html4"
  • Book Review
  • XML A
  • AuthorPrice
  • PagesDate
  • Simon St. Laurent
  • 31.98
  • 352
  • 1998/01

XML Namespaces
  • The prefixes are linked to the full names using
    the attributes on the top element whose names
    begin with xmlns
  • The prefixes are just shorthand placeholders for
    the full names
  • Those full names are URIs, i.e. Web addresses
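
A minimal sketch of how a parser treats prefixes: it replaces each prefix with the full name (the URI), so two elements called title stay distinct. The second namespace URI, the prefix b and the book title are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define a "title" element.
doc = """
<h:html xmlns:h="http://www.w3.org/HTML/1998/html4"
        xmlns:b="http://example.org/books">
  <h:title>Book Review</h:title>
  <b:title>Some XML Book</b:title>
</h:html>
"""
root = ET.fromstring(doc)

# The parser stores the full name: {namespace URI}local-name.
root_name = root.tag

# Prefixes used for searching are local shorthand; only the URIs must match.
ns = {"html": "http://www.w3.org/HTML/1998/html4", "book": "http://example.org/books"}
html_title = root.find("html:title", ns).text
book_title = root.find("book:title", ns).text

print(root_name)
print(html_title, "|", book_title)
```
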

Extensibility in XML
  • Anyone can invent new tags and attach a meaning
    to those tags
  • But if every user creates their own XML definition
    for describing their data, it is not possible to
    achieve interoperability
  • For example, one may prefer to use the tag name
    POR, while another prefers using the tag name
  • In other words, a tagged document is not very
    useful without some kind of agreement on the tags
    among inter-operating applications

Extensibility in XML
  • Anyone can invent new tags and attach a meaning
    to those tags
  • For example
  • This device
  • This device
  • But if every user creates their own XML definition
    for describing their data, it is not possible to
    achieve interoperability

Agreement on tags is necessary
  • In other words, a tagged document is not very
    useful without some kind of agreement on the tags
    among inter-operating applications

Mobile Device
Hand Held Device
Many Efforts for Standardized Tags…
  • HL7 for healthcare
  • RosettaNet for supply chain integration in
    Information Technology and Electronic Components
  • GS1 again in supply chain
  • ebXML for eBusiness
  • Common Business Library (CBL) for electronic
    catalogs, purchase orders, etc.
  • …

XML Parsers
  • A parser takes an XML document and makes its
    structure and content available to an application
    through an API
  • There are two main Application Programming
    Interfaces (APIs) for writing parsers
  • Document Object Model (DOM) and
  • Simple API for XML (SAX)
  • Today, many parsers are both DOM and SAX compliant
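
The difference between the two APIs can be sketched with Python's standard library, which ships both a DOM implementation (xml.dom.minidom) and a SAX interface (xml.sax). The sample document and the PriceHandler class are assumptions; the element names are borrowed from the purchase order example.

```python
import xml.sax
from xml.dom import minidom

doc = (
    "<PurchaseOrder>"
    "<LineItem><UnitPrice>95</UnitPrice></LineItem>"
    "<LineItem><UnitPrice>10</UnitPrice></LineItem>"
    "</PurchaseOrder>"
)

# DOM: the whole document is loaded into a tree that can be queried afterwards.
dom = minidom.parseString(doc)
dom_prices = [int(node.firstChild.data) for node in dom.getElementsByTagName("UnitPrice")]


# SAX: the parser reports events as it reads; the handler keeps what it needs.
class PriceHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.prices = []
        self._buffer = None  # collects text while inside a UnitPrice element

    def startElement(self, name, attrs):
        if name == "UnitPrice":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "UnitPrice":
            self.prices.append(int("".join(self._buffer)))
            self._buffer = None


handler = PriceHandler()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_prices = handler.prices

print(dom_prices, sax_prices)
```

DOM is convenient when the application needs random access to the document; SAX avoids holding the whole tree in memory, which matters for large documents.
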

XML DOM Parser
A parser validates and makes the data
contained in an XML document available
to the application
XSLT Processor
  • Converts an XML document to another form
  • An XSL style sheet is a set of transformation
    instructions for converting a source XML document
    to a target document

Why XML?
Critique of XML: Advantages
  • It is text-based.
  • It supports Unicode, allowing almost any
    information in any written human language to be
    communicated.
  • It can represent the most general computer
    science data structures: records, lists and
    trees.
  • Its self-documenting format describes structure
    and field names as well as specific values.
  • The strict syntax and parsing requirements make
    the necessary parsing algorithms extremely
    simple, efficient, and consistent.
  • XML is heavily used as a format for document
    storage and processing, both online and offline.

  • It is based on international standards.

Critique of XML: Advantages
  • It allows validation using schema languages such
    as XSD and Schematron, which makes effective
    unit-testing, firewalls, acceptance testing,
    contractual specification and software
    construction easier.
  • The hierarchical structure is suitable for most
    (but not all) types of documents.
  • It manifests as plain text files, which are less
    restrictive than other proprietary document
    formats.
  • It is platform-independent
  • Forward and backward compatibility are relatively
    easy to maintain
  • Its predecessor, SGML, has been in use since
    1986, so there is extensive experience and
    software available.
  • An element fragment of a well-formed XML document
    is also a well-formed XML document.

Critique of XML: Disadvantages
  • XML syntax is redundant or large relative to
    binary representations of similar data.
  • The redundancy may affect application efficiency
    through higher storage, transmission and
    processing costs.
  • XML syntax is verbose relative to other
    alternative 'text-based' data transmission
    formats.
  • No intrinsic data type support: XML provides no
    specific notion of "integer", "string",
    "boolean", "date", and so on

Critique of XML: Disadvantages
  • The hierarchical model for representation is
    limited in comparison to the relational model or
    an object oriented graph.
  • Expressing overlapping (non-hierarchical) node
    relationships requires extra effort.
  • XML namespaces are problematic to use and
    namespace support can be difficult to correctly
    implement in an XML parser.
  • XML is commonly depicted as "self-documenting"
    but this depiction ignores critical ambiguities.

Some well-known XML based languages and
  • RSS: Rich Site Summary
  • Ajax
  • SOAP: Simple Object Access Protocol
  • WSDL: Web Services Description Language
  • SVG: Scalable Vector Graphics
  • Regarding office apps: OASIS, OpenOffice,
    Microsoft Office
  • HL7 Clinical Document Architecture (CDA)
  • ...

HL7 Clinical Document Architecture (CDA)
  • A specification for document exchange using
  • XML,
  • the HL7 Reference Information Model (RIM)
  • Version 3 methodology
  • and vocabulary (SNOMED, ICD, local,…)
  • CDA Header
  • Metadata required for document discovery,
    management, retrieval
  • CDA Body
  • Clinical report
  • Discharge Summary
  • Referral

Clinical Document Architecture
  • Level One
  • The unconstrained CDA Specification
  • Only the header is well structured
  • Level Two
  • Section Level Templates are applied with coded
  • Level Three
  • Entry Level Templates are applied
  • Machine Processable!

HL7 CDA Example