Why use XML - PowerPoint PPT Presentation

About This Presentation
Title:

Why use XML

Slides: 48
Provided by: scie314

Transcript and Presenter's Notes


1
Why use XML?
2
Sun's slogan
  • Java + XML
  • portable programs + portable data

3
Markup
  • Information added to data to enhance its
    meaning
  • Identification of parts, boundaries and
    relationships of elements within documents
  • Identification of attributes

Without markup, most documents appear as
meaningful to machines as this document does to
humans
4
What's wrong with HTML?
  • <html>
  • <head>
  • <title>
  • Martin's page
  • </title>
  • </head>
  • <body bgcolor="#ffffff">
  • <p align="center">
  • Some text or other
  • </p>
  • </body>
  • </html>
  • HTML is the most successful electronic publishing
    language ever invented, but ....
  • .... HTML is for presentation, not content,
    limiting its applicability

5
XML example
  • <?xml version="1.0" encoding="UTF-8"?>
  • <Curriculum>
  • <Course Title="Z" Lect="Bogdanov">
  • <Lecture>Props</Lecture>
  • <Lecture>Predicates</Lecture>
  • <Lecture>Sets</Lecture>
  • </Course>
  • </Curriculum>

6
XML example
  • Encodes
  • boundaries

7
XML example
  • Encodes
  • boundaries
  • roles
  • eg course v lecture

8
XML example
  • Encodes
  • boundaries
  • roles
  • eg course v lecture
  • positions

9
XML example
  • Encodes
  • boundaries
  • roles
  • eg course v lecture
  • positions
  • containment
  • eg lecture is part of course

10
XML example
  • Encodes
  • boundaries
  • roles
  • eg course v lecture
  • positions
  • containment
  • eg lecture is part of course
  • attributes
  • eg title
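Each kind of information in the list above can be read back by a program once the document is parsed. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` as the parser (the slides do not prescribe one) and a hypothetical helper `summarise`, applied to the example from slide 5:

```python
import xml.etree.ElementTree as ET

CURRICULUM = """<?xml version="1.0" encoding="UTF-8"?>
<Curriculum>
  <Course Title="Z" Lect="Bogdanov">
    <Lecture>Props</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
</Curriculum>"""


def summarise(xml_text):
    """Return (root role, [(Title, Lect, [lectures in document order])])."""
    root = ET.fromstring(xml_text)            # boundaries: the parser finds every start/end tag
    courses = []
    for course in root.findall("Course"):     # roles and containment: Course inside Curriculum
        lectures = [lec.text for lec in course.findall("Lecture")]  # positions: order is kept
        courses.append((course.get("Title"), course.get("Lect"), lectures))  # attributes
    return root.tag, courses


print(summarise(CURRICULUM))
# ('Curriculum', [('Z', 'Bogdanov', ['Props', 'Predicates', 'Sets'])])
```

The same tree could be walked with any XML parser; none of this information has to be guessed from layout, as it would with HTML.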

11
A brief history of markup
  • GML: Generalised Markup Language
  • Developed in the 60s and 70s by IBM
  • Used for IBM technical manuals
  • SGML: Standardised GML (Standard Generalized Markup Language)
  • 70s and 80s, with an ANSI standard in 1983
  • Flexible and very general, but complex and costly to implement
  • HTML
  • Early 90s: compact markup for hypertext documents
  • Now seen as a step backwards
  • XML
  • Late 90s: W3C Recommendation in 1998

12
XML is
  • Simpler than SGML
  • More flexible than HTML
  • A subset of SGML
  • Not a markup language itself, but a toolkit for defining markup languages
  • however, it is common to refer to documents as being written in XML
  • Surrounded by a family of technologies which extend its use (eg XSLT for transformation)

13
XML features
  • Can represent most kinds of information unambiguously (unlike HTML)
  • Easily customizable
  • Supports internationalization through Unicode
  • Allows validation of documents
  • Easy for both humans and machines to read
  • Open standard, managed by the W3C

14
Applications of a better data format
  • Better search engines
  • find all places selling X
  • Customised data presentation from a single source
  • HTML
  • WML (wireless markup language)
  • PDF
  • Reliable information exchange
  • CML (chemical structures)
  • VoxML (voice)
  • B2B transactions of all types
  • MathML
  • etc

15
XHTML
Source: http://www.w3.org/TR/xhtml1/
16
MathML example
  • Can be used for display or calculations
  • Source: http://www.dessci.com/support/tutorials/mathml/default.stm

17
SVG (scalable vector graphics)
  • SVG benefits
  • Zooming 
  • Text stays text. Text in SVG images remains
    editable and searchable
  • Small file size 
  • Display independence
  • Interactivity and intelligence

Source: http://www-106.ibm.com/developerworks/education/transforming-xml/xmltosvg
18
VoiceXML
Source: http://www.w3.org/TR/voicexml20/dml1.3.1
19
DocBook
Source: http://nis-www.lanl.gov/rosalia/mydocs/docbook-intro/get-going.html
20
Apache documentation
Source: http://jakarta.apache.org/ecs/index.html
21
WML
Source: http://www.wap-uk.com/Developers/Tutorial.htm
22
A closer look at XML documents
23
Simple example
  • <?xml version="1.0" encoding="UTF-8"?>
  • <!DOCTYPE Curriculum SYSTEM "Curric.DTD">
  • <Curriculum>
  • <Course Title="Z" Lect="Kyrill Bogdanov">
  • <Lecture>Propositions</Lecture>
  • <Lecture>Predicates</Lecture>
  • <Lecture>Sets</Lecture>
  • </Course>
  • <Course Title="UML" Lect="Marian Gheorge">
  • <Lecture>Use Cases</Lecture>
  • <Lecture>Class Diagrams</Lecture>
  • </Course>
  • </Curriculum>

24
Document prolog
  • The XML declaration at the start of the prolog indicates that the document is marked up in XML
  • Format
  • <?xml prop="value" ... ?>
  • Parameter values must be quoted with single or double quotes (unlike HTML)
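The parser reports the declaration's values to programs, and refuses an unquoted value. A minimal sketch using the expat binding in Python's standard library; the helper name `read_xml_declaration` is hypothetical:

```python
import xml.parsers.expat


def read_xml_declaration(text):
    """Return (version, encoding) from the XML declaration.

    Raises xml.parsers.expat.ExpatError if the document is not well-formed,
    for example when a declaration value is not quoted.
    """
    found = {"version": None, "encoding": None}

    def on_declaration(version, encoding, standalone):
        # expat passes encoding=None when the declaration omits it
        found["version"] = version
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(text, True)                  # True: this is the whole document
    return found["version"], found["encoding"]


print(read_xml_declaration('<?xml version="1.0" encoding="UTF-8"?><Curriculum/>'))  # ('1.0', 'UTF-8')
print(read_xml_declaration("<?xml version='1.0'?><Curriculum/>"))                  # ('1.0', None)
try:
    read_xml_declaration("<?xml version=1.0?><Curriculum/>")   # unquoted value
except xml.parsers.expat.ExpatError as err:
    print("rejected:", err)
```

Single and double quotes are treated alike; only the missing quotes cause a failure.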

25
Document type declaration
  • Names the root element against which the document is validated
  • Format
  • <!DOCTYPE root-element SYSTEM "dtd">
  • DTD is optional (see later)
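A non-validating parser still reports the document type declaration but does not act on it: the requirement that the DOCTYPE name match the root element is a validity constraint, so only a validating parser enforces it. A sketch with Python's expat binding; `doctype_info` is a hypothetical helper, and `Curric.DTD` is never opened:

```python
import xml.parsers.expat


def doctype_info(text):
    """Report the DOCTYPE name, its SYSTEM identifier and the actual root element."""
    info = {"doctype": None, "system_id": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["doctype"] = name
        info["system_id"] = system_id

    def on_start(tag, attributes):
        if info["root"] is None:              # the first start tag is the root element
            info["root"] = tag

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(text, True)                  # the external DTD is not fetched
    return info


good = '<!DOCTYPE Curriculum SYSTEM "Curric.DTD"><Curriculum><Course/></Curriculum>'
print(doctype_info(good))
# {'doctype': 'Curriculum', 'system_id': 'Curric.DTD', 'root': 'Curriculum'}

# Accepted by expat although the names disagree; a validating parser would reject it.
mismatch = '<!DOCTYPE Curriculum SYSTEM "Curric.DTD"><Course/>'
print(doctype_info(mismatch))
```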

26
Elements
  • Building blocks of XML documents
  • Exactly one root element per document
  • Format
  • <name att="value">
  • content
  • </name>
  • Or
  • <name att="val"/>
  • Non-empty elements must have a closing tag (unlike HTML)
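Two of the points above can be checked with a parser, here Python's `xml.etree.ElementTree` (our choice for illustration): the empty-element form `<name att="val"/>` yields the same element as a start/end pair with no content, and a second top-level element is rejected because a document has exactly one root.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<name att="val"/>')
long_form = ET.fromstring('<name att="val"></name>')

# Same tag, same attributes and no text content in both cases.
print(short_form.tag, short_form.attrib, short_form.text)   # name {'att': 'val'} None
print(long_form.tag, long_form.attrib, long_form.text)      # name {'att': 'val'} None

try:
    ET.fromstring('<Course Title="Z"/><Course Title="UML"/>')   # two root elements
except ET.ParseError as err:
    print("rejected:", err)                                   # the parser's error message
```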

27
Elements
  • Elements may contain other elements
  • An element's start and end tags must reside within the same parent (ie boxes cannot overlap)
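The "boxes cannot overlap" rule is exactly what a stack checks: each start tag opens a box inside the current one, and each end tag must close the innermost open box. A deliberately simplified sketch; `tags_nest_properly` is hypothetical, scans tags with a regular expression, and ignores comments, CDATA sections and `>` inside attribute values, so it is not a real XML parser:

```python
import re

# Matches <name ...>, </name> and <name .../>; <?xml ...?> and <!DOCTYPE ...> do not match.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")


def tags_nest_properly(text):
    """Return True if every end tag closes the most recently opened element."""
    open_elements = []                             # stack of boxes currently open
    for closing, name, self_closing in TAG.findall(text):
        if self_closing:
            continue                               # <name/> opens and closes at once
        if not closing:
            open_elements.append(name)             # open a new box inside the current one
        elif not open_elements or open_elements.pop() != name:
            return False                           # end tag does not match the innermost box
    return not open_elements                       # every opened box has been closed


print(tags_nest_properly("<curriculum><course>Z</course></curriculum>"))   # True
print(tags_nest_properly("<curriculum><course>Z</curriculum></course>"))   # False
```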

28
Attributes
  • Used to identify specific elements, or to add detail to them
  • Values must be quoted (single or double)
  • Not always clear whether to use attributes or
    elements

29
Attributes or elements?
  • Attributes shouldn't really hold content
  • Attribute order is ignored, whilst element order is significant
  • Attribute values can be restricted (see DTDs later)
  • Use attributes as unique references if needed
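The ordering point can be seen by parsing two versions of a course that differ only in the order of their attributes and of their lectures. A minimal sketch with Python's `xml.etree.ElementTree` (our choice of parser):

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<Course Title="Z" Lect="Kyrill Bogdanov">'
                  '<Lecture>Sets</Lecture><Lecture>Predicates</Lecture></Course>')
b = ET.fromstring('<Course Lect="Kyrill Bogdanov" Title="Z">'
                  '<Lecture>Predicates</Lecture><Lecture>Sets</Lecture></Course>')

# Attributes form an unordered name -> value mapping, so swapping them changes nothing.
print(a.attrib == b.attrib)                                                  # True

# Child elements form an ordered sequence, so swapping them gives different content.
print([lecture.text for lecture in a] == [lecture.text for lecture in b])   # False
```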

30
An XML document is a tree

Curriculum
  Course
    Lecture
    Lecture
    Lecture
  Course
    Lecture
    Lecture
31
with content at its leaves

Curriculum
  Course
    Lecture
      Propositions
    Lecture
      Predicates
    Lecture
      Sets
  Course
    Lecture
      Use cases
    Lecture
      Class diagrams
32
and attributes

Curriculum
  Course  [Title="Z", Lect="Kyrill Bogdanov"]
    Lecture
      Propositions
    Lecture
      Predicates
    Lecture
      Sets
  Course  [Title="UML", Lect="Marian Gheorge"]
    Lecture
      Use cases
    Lecture
      Class diagrams
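A program sees the same tree. A minimal sketch that walks the parsed example with Python's `xml.etree.ElementTree` and prints it as an indented outline, marking attributes with `@` and text leaves in quotes; the `outline` helper and this notation are our own:

```python
import xml.etree.ElementTree as ET

DOC = """<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>"""


def outline(element, depth=0):
    """Return indented lines: element name, then its attributes, text leaf and children."""
    pad = "  " * depth
    lines = [pad + element.tag]
    for name, value in element.attrib.items():
        lines.append(f"{pad}  @{name}={value}")
    text = (element.text or "").strip()      # skip whitespace used only for indentation
    if text:
        lines.append(f'{pad}  "{text}"')     # content sits at the leaves
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(DOC))))
```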
33
Well-formedness
  • Element containing text or elements must have
    start and end tags

Good: <curriculum> <course>Z</course> <course>Java</course> </curriculum>
Bad: <curriculum> <course>Z <course>Java </curriculum>
34
Well-formedness
  • Element containing text or elements must have
    start and end tags
  • Empty-element tags must close with />

Good: <graphic filename="icon.png"/>
Bad: <graphic filename="icon.png">
35
Well-formedness
  • Element containing text or elements must have
    start and end tags
  • Empty-element tags must close with />
  • Attributes must be in quotes

Good: <course Title="Java"> <course Title='Z'>
Bad: <course Title=UML>
36
Well-formedness
  • Element containing text or elements must have
    start and end tags
  • Empty-element tags must close with />
  • Attributes must be in quotes
  • Elements must not overlap

Good: <curriculum> <course>Z</course> </curriculum>
Bad: <curriculum> <course>Z </curriculum> </course>
37
Well-formedness
  • Element containing text or elements must have
    start and end tags
  • Empty-element tags must close with />
  • Attributes must be in quotes
  • Elements may not overlap
  • Markup chars must not appear in parsed content

Good: <equation>5 &lt; 2</equation>
Bad: <equation>5 < 2</equation>
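Character data containing `<` (or `&`) has to be written as an entity reference such as `&lt;` (or `&amp;`). In Python's standard library, used here for illustration, `xml.sax.saxutils.escape` does this by hand, `ElementTree` does it automatically when serialising, and parsing reverses it:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "5 < 2"
print(escape(raw))                                   # 5 &lt; 2

equation = ET.Element("equation")
equation.text = raw                                  # store the plain character data
print(ET.tostring(equation, encoding="unicode"))     # <equation>5 &lt; 2</equation>

parsed = ET.fromstring("<equation>5 &lt; 2</equation>")
print(parsed.text)                                   # 5 < 2
```

Letting the library serialise the text is usually safer than escaping strings by hand.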
38
Well-formedness
  • Element containing text or elements must have
    start and end tags
  • Empty-element tags must close with />
  • Attributes must be in quotes
  • Elements may not overlap
  • Markup chars must not appear in parsed content
  • Element names start with letters or _, and
    contain letters, numbers, -, . and _

Good: <curriculum> <_course> <time-slot> <book.chapter>
Bad: <1example> <the first> <bookchapter>
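The Good/Bad pairs on these slides can be checked mechanically, because a conforming XML parser must reject a document that is not well-formed. A sketch using Python's `xml.etree.ElementTree`; the `is_well_formed` helper is hypothetical, and each example is adapted into a complete, single-root document:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the standard-library parser accepts text as an XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# (rule, good example, bad example)
EXAMPLES = [
    ("start and end tags",
     "<curriculum><course>Z</course><course>Java</course></curriculum>",
     "<curriculum><course>Z<course>Java</curriculum>"),
    ("empty-element tags close with />",
     '<graphic filename="icon.png"/>',
     '<graphic filename="icon.png">'),
    ("attribute values in quotes",
     "<course Title='Z'/>",
     "<course Title=UML/>"),
    ("no overlapping elements",
     "<curriculum><course>Z</course></curriculum>",
     "<curriculum><course>Z</curriculum></course>"),
    ("no raw markup characters in content",
     "<equation>5 &lt; 2</equation>",
     "<equation>5 < 2</equation>"),
    ("legal element names",
     "<book.chapter><_course/><time-slot/></book.chapter>",
     "<1example/>"),
]

for rule, good, bad in EXAMPLES:
    print(f"{rule}: good accepted={is_well_formed(good)}, bad accepted={is_well_formed(bad)}")
```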
39
Why the rules?
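  • Unlike HTML, an arbitrary XML document doesn't necessarily have a grammar
  • eg HTML knows that a <p> cannot contain another <p>, so the end tag is optional: the parser can infer where it belongs
  • Without a grammar, an XML parser cannot make such inferences, so every boundary must be marked explicitly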
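The HTML case can be sketched with Python's `html.parser` module, which tokenises tags but applies no grammar of its own. The hypothetical `ParagraphInferrer` below adds a single grammar rule (a `<p>` cannot contain a `<p>`) and uses it to supply the omitted end tags; `ElementTree`, which knows no such rule, rejects the same input:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ParagraphInferrer(HTMLParser):
    """Hypothetical sketch: knows one grammar rule, that <p> cannot contain <p>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # elements currently open
        self.events = []         # start tags as "p", end tags (written or inferred) as "/p"

    def _close_through(self, tag):
        # Close open elements up to and including the innermost `tag`.
        while self.open_tags:
            closed = self.open_tags.pop()
            self.events.append("/" + closed)
            if closed == tag:
                break

    def handle_starttag(self, tag, attrs):
        if tag == "p" and "p" in self.open_tags:
            self._close_through("p")          # grammar rule: the new <p> ends the old one
        self.open_tags.append(tag)
        self.events.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:             # ignore stray end tags
            self._close_through(tag)


SNIPPET = "<body><p>First paragraph<p>Second paragraph</body>"

inferrer = ParagraphInferrer()
inferrer.feed(SNIPPET)
inferrer.close()
print(inferrer.events)    # ['body', 'p', '/p', 'p', '/p', '/body']

try:
    ET.fromstring(SNIPPET)                    # no grammar, so no way to infer </p>
except ET.ParseError as err:
    print("XML parser:", err)
```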

40
Pros and cons of adding a grammar
  • CONS
  • More effort during development
  • Grammars need to be maintained
  • Can slow down processing
  • Need to learn a new syntax (although it is
    trivial for CS)
  • PROS
  • Grammars enable documents to be validated
  • Enforce restrictions such as required fields,
    limited choices
  • Serves as a clear description of the syntax for
    users and developers
  • Can act as a standard eg XHTML
  • Good for debugging

41
Document Type Definition (DTD)
  • <!ELEMENT Curriculum (Course+)>
  • <!ELEMENT Course (Lecture+)>
  • <!ATTLIST Course
  • Title CDATA #REQUIRED
  • Lect CDATA #REQUIRED
  • >
  • <!ELEMENT Lecture (#PCDATA)>
  • A DTD is a sequence of declarations
  • Doesn't conform to XML syntax
  • Easy to understand for CS
  • The #PCDATA keyword stands for parsed character data and means that the textual content will be parsed for markup and entity references (see later)
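Python's standard-library XML parsers do not validate against a DTD (third-party libraries such as lxml do). To show what a validating parser checks for this particular DTD, here is a hand-coded sketch; `check_curriculum` is hypothetical and assumes one-or-more content models for Course and Lecture:

```python
import xml.etree.ElementTree as ET


def check_curriculum(xml_text):
    """Return a list of rule violations; an empty list means the document passes.

    Hand-coded for this one DTD, assuming:
      Curriculum contains one or more Course elements,
      Course contains one or more Lecture elements and has #REQUIRED Title and Lect,
      Lecture contains text only (#PCDATA).
    """
    root = ET.fromstring(xml_text)                 # well-formedness is checked first
    problems = []
    if root.tag != "Curriculum":
        problems.append(f"root element is {root.tag}, not Curriculum")
    courses = list(root)
    if not courses or any(child.tag != "Course" for child in courses):
        problems.append("Curriculum must contain one or more Course elements only")
    for course in courses:
        if course.tag != "Course":
            continue
        for required in ("Title", "Lect"):
            if required not in course.attrib:
                problems.append(f"Course is missing required attribute {required}")
        lectures = list(course)
        if not lectures or any(child.tag != "Lecture" for child in lectures):
            problems.append("Course must contain one or more Lecture elements only")
        for lecture in lectures:
            if len(lecture) > 0:                   # a child element inside #PCDATA content
                problems.append("Lecture may contain text only")
    return problems


good = ('<Curriculum><Course Title="Z" Lect="Kyrill Bogdanov">'
        '<Lecture>Sets</Lecture></Course></Curriculum>')
print(check_curriculum(good))                      # []
```

A real validating parser derives these checks from the DTD text itself rather than hard-coding them.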

42
Element definitions
  • <!ELEMENT article
  • (title, subtitle?, author, (para|table|list), biblio?)
  • >
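A content model reads like a regular expression over the sequence of child element names: `,` is sequence, `|` is choice and `?` marks an optional item. A sketch applying this model with Python's `re` module; `article_children_valid` is hypothetical, and we assume the `(para|table|list)` group may occur one or more times (`+`); drop the `+` from the pattern if exactly one is intended:

```python
import re
import xml.etree.ElementTree as ET

# (title, subtitle?, author, (para|table|list)+, biblio?) as a regex over child names,
# each name followed by one space so that names cannot run together.
ARTICLE_MODEL = re.compile(r"title (subtitle )?author ((para|table|list) )+(biblio )?")


def article_children_valid(xml_text):
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return ARTICLE_MODEL.fullmatch(child_names) is not None


print(article_children_valid("<article><title/><author/><para/><table/></article>"))  # True
print(article_children_valid("<article><author/><title/><para/></article>"))          # False
```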

43
Attribute definitions
  • <!ATTLIST Course
      Title    CDATA          #REQUIRED
      Lect     CDATA          #REQUIRED
      Lect2    CDATA          #IMPLIED
      Semester (first|second) "first"
  • >
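The declarations above behave differently when a document omits an attribute: #REQUIRED makes that an error, #IMPLIED leaves it absent, and a quoted default such as "first" is supplied by a parser that reads the DTD; an enumerated type such as (first|second) limits the allowed values. A hand-coded sketch of those rules; the table and `apply_attlist` are hypothetical, not a library API:

```python
# Hypothetical table mirroring the ATTLIST declaration for Course:
# name -> (allowed values, or None for CDATA; "#REQUIRED", "#IMPLIED" or a default value)
COURSE_ATTRIBUTES = {
    "Title": (None, "#REQUIRED"),
    "Lect": (None, "#REQUIRED"),
    "Lect2": (None, "#IMPLIED"),
    "Semester": (("first", "second"), "first"),
}


def apply_attlist(attributes, declarations=COURSE_ATTRIBUTES):
    """Return the attributes with defaults filled in; raise ValueError on a violation."""
    result = dict(attributes)
    for name in result:
        if name not in declarations:
            raise ValueError(f"undeclared attribute {name}")
    for name, (allowed, default) in declarations.items():
        if name not in result:
            if default == "#REQUIRED":
                raise ValueError(f"missing required attribute {name}")
            if default != "#IMPLIED":
                result[name] = default             # omitted: the declared default applies
        elif allowed is not None and result[name] not in allowed:
            raise ValueError(f"{name}={result[name]!r} is not one of {allowed}")
    return result


print(apply_attlist({"Title": "Z", "Lect": "Kyrill Bogdanov"}))
# {'Title': 'Z', 'Lect': 'Kyrill Bogdanov', 'Semester': 'first'}
```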

44
Entities
  • DTDs can also contain entity definitions
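  • Simplest use is to substitute text into any parsed text (#PCDATA), eg the reference &uos; is replaced by "The University of Sheffield"
  • CDATA sections (<![CDATA[ ... ]]>) are not parsed, so entity references inside them are not substituted

<!ENTITY uos "The University of Sheffield">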
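Python's `xml.etree.ElementTree` (our choice for illustration) expands entities declared in a document's internal DTD subset, so the substitution can be observed directly; the same reference inside a CDATA section stays literal text:

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE Course [
  <!ENTITY uos "The University of Sheffield">
]>
<Course>
  <Where>&uos;</Where>
  <Literal><![CDATA[&uos;]]></Literal>
</Course>"""

root = ET.fromstring(DOC)
print(root.find("Where").text)      # The University of Sheffield
print(root.find("Literal").text)    # &uos;
```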
45
Alternative to DTDs: XML Schema
  • XML Schema is a grammar-definition language which itself uses XML syntax (a W3C Recommendation since 2001)
  • Adds better typing
  • Predefined types such as byte, float, long, time, date, boolean, base64Binary, language, anyURI
  • Bounds on data values
  • Pattern matching on values
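Schema datatypes and facets can be pictured as successive checks on a value's text. A hand-coded sketch for the predefined `byte` type (integers from -128 to 127), narrowed by minInclusive/maxInclusive bounds and a pattern facet; `valid_byte` is hypothetical, and Python's regex syntax stands in for XML Schema's own regular-expression dialect:

```python
import re

INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")     # lexical form of XML Schema integer types


def valid_byte(text, min_inclusive=-128, max_inclusive=127, pattern=None):
    """Check a value against byte, optional bounds facets and an optional pattern facet."""
    lexical = text.strip()                       # integer types ignore surrounding whitespace
    if INTEGER_LEXICAL.fullmatch(lexical) is None:
        return False                             # not an integer literal at all
    if pattern is not None and re.fullmatch(pattern, lexical) is None:
        return False                             # pattern facet applies to the lexical form
    value = int(lexical)
    # the byte value space, narrowed by the minInclusive / maxInclusive facets
    return max(-128, min_inclusive) <= value <= min(127, max_inclusive)


print(valid_byte("42"))                          # True
print(valid_byte("200"))                         # False: outside the byte range
print(valid_byte("9", max_inclusive=7))          # False: outside the bounds facet
print(valid_byte("7", pattern=r"[0-9]{2}"))      # False: does not match the pattern
```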

46
Overall summary
  • Knowledge of XML is essential for all computer
    scientists
  • Should lead to a better web with easier-to-find information
  • Interoperability
  • Impetus towards robust and open standards within industry sectors
  • Supports internationalisation via Unicode
  • Hardly ever need to write a parser again!

47
Online documents
  • XML Tutorials for Programmers
  • http://www-106.ibm.com/developerworks/education/tutorial-prog/abstract.html
  • (online XML parser -- requires registration)
  • Transforming XML to PDF
  • http://www-106.ibm.com/developerworks/education/transforming-xml/xmltopdf
  • Why XML?
  • http://www.w3.org/XML/1999/XML-in-10-points
  • XSL for fun and diversion
  • http://www-106.ibm.com/developerworks/library/hands-on-xsl/
  • Simplify XML programming with JDOM
  • http://www-106.ibm.com/developerworks/java/library/j-jdom/
  • Easy Java/XML integration with JDOM, Part 1
  • http://www.javaworld.com/jw-05-2000/jw-0518-jdom.html
  • Tip: Using JDOM and XSLT
  • http://www-106.ibm.com/developerworks/java/library/x-tipjdom.html

48
Websites
  • http://www.xml.org
  • http://www.jdom.org
  • Special thanks to Professor Martin Cooke,
    m.cooke_at_dcs.shef.ac.uk for the primary creation
    of these slides and their content.