ACE104 Lecture 2 - PowerPoint PPT Presentation

Slides: 51
Provided by: peopleCsU

1
ACE104 Lecture 2
  • XML
  • Simple XML Schema

2
XML in messaging
  • Most modern languages have a way of representing
    structured data.
  • Typical flow of events in an application:
      1. Read data (file, db, socket)
      2. Unmarshal the data into objects
      3. Manipulate the objects in the program
      4. Marshal the objects back out (file, db, socket)
  • Many language-specific technologies reduce
    these steps: RMI, object serialization in any
    language, CORBA (actually somewhat language
    neutral), MPI, etc.
  • XML provides a very appealing alternative that
    hits the sweet spot for many applications.
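
A minimal sketch of the read / convert / manipulate / write-back flow, using XML as the data format and Python's standard xml.etree.ElementTree module. The student record and the function name birthday are made up for illustration; in a real application the text would come from a file, database, or socket.

```python
import xml.etree.ElementTree as ET

# Hypothetical input text standing in for data read from a file, db, or socket.
RAW = "<student><name>Andrew</name><age>39</age></student>"

def birthday(xml_text):
    """Read a student record, add one year to the age, and write it back out."""
    root = ET.fromstring(xml_text)                # text -> in-memory tree
    age = int(root.findtext("age"))               # field -> native integer
    root.find("age").text = str(age + 1)          # manipulate in the program
    return ET.tostring(root, encoding="unicode")  # tree -> text again

result = birthday(RAW)
print(result)
```

The same four steps apply whatever the storage format is; XML only fixes how the data looks while it is outside the program.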

3
User-defined types in programming languages
  • One view of XML is as a text-based,
    programming-language-neutral way of representing
    structured information. Compare

4
Sample XML Schema
  • In XML, a (common) datatype description is
    called an XML schema.
  • DTD and Relax NG are other common alternatives.
  • The example below uses a schema just for illustration purposes.
  • Note that the schema itself is written in XML.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified">
      <xs:element name="student">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="ssn" type="xs:string"/>
            <xs:element name="age" type="xs:integer"/>
            <xs:element name="gpa" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Ignore this For now
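
Because a schema is itself an XML document, any XML parser can read it. The sketch below parses the student schema with Python's standard xml.etree.ElementTree and lists each declared field with its type. The schema text is the slide's schema with its punctuation restored; the function name declared_fields is made up for illustration. Note that this only reads the schema, it does not validate anything against it.

```python
import xml.etree.ElementTree as ET

# The XML Schema namespace in ElementTree's "{uri}localname" notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="student">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="ssn" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
        <xs:element name="gpa" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def declared_fields(schema_text):
    """Map the name of every typed element declaration to its declared type."""
    root = ET.fromstring(schema_text)
    return {
        el.get("name"): el.get("type")
        for el in root.iter(XS + "element")
        if el.get("type") is not None  # skips "student", which has no type attribute
    }

fields = declared_fields(SCHEMA)
print(fields)
```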
5
Alternative schema
  • In this example studentType is defined separately
    rather than anonymously

    <xs:schema>
      <xs:element name="student" type="studentType"/>
      <xs:complexType name="studentType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="ssn" type="xs:string"/>
          <xs:element name="age" type="xs:integer"/>
          <xs:element name="gpa" type="xs:decimal"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>

new type defined separately
6
Alternative DTD
  • We can also use a DTD (Document Type Definition).
    It is much simpler than a schema but also much less
    powerful (notice the lack of types).

    <!DOCTYPE Student [
      <!-- Each XML file is stored in a document whose name is the same as the root node -->
      <!ELEMENT Student (name,ssn,age,gpa)>
      <!-- Student has four child elements -->
      <!ELEMENT name (#PCDATA)>
      <!-- name is parsed character data -->
      <!ELEMENT ssn (#PCDATA)>
      <!ELEMENT age (#PCDATA)>
      <!ELEMENT gpa (#PCDATA)>
    ]>

7
Another alternative: Relax NG
  • Gaining in popularity
  • Can be very simple to write and at same time has
    many more features than DTD
  • Still much less common than Schema

8
Creating instances of types
In programming languages, we instantiate objects:

C:
    struct Student s1, s2;
    s1.name = "Andrew";
    s1.ssn = "123-45-6789";

Java:
    Student s = new Student();
    s.name = "Andrew";
    s.ssn = "123-45-6789";
    ...

Fortran:
    type(Student) :: s1
    s1%name = "Andrew"
    ...
9
Creating XML documents
  • XML is not a programming language!
  • In XML we make a Student object in an XML file
    (Student.xml):

    <Student>
      <name>Andrew</name>
      <ssn>123-45-6789</ssn>
      <age>39</age>
      <gpa>2.0</gpa>
    </Student>

  • Think of this as like a serialized object.
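
To make the "serialized object" idea concrete, here is a rough sketch that turns an in-memory object into the Student document above. The Student dataclass and the helper to_xml are illustrative assumptions, not part of the lecture; only the standard library is used.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET

@dataclass
class Student:
    name: str
    ssn: str
    age: int
    gpa: float

def to_xml(obj):
    """Build one element per dataclass field, under a root named after the class."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return root

s1 = Student("Andrew", "123-45-6789", 39, 2.0)
xml_text = ET.tostring(to_xml(s1), encoding="unicode")
print(xml_text)
```

The output is a single line of text with no indentation; the element structure is the same as in the hand-written file.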

10
XML and Schema
  • Note that there are two parts to what we did
  • Defining the structure layout
  • Defining an instance of the structure
  • The first is done with an appropriate Schema or
    DTD.
  • The second is the XML part
  • Both can go in the same file, or an XML file can
    refer to an external Schema or DTD (typical)
  • From this point on we use only Schema
  • Exercise 1

11
?
  • Question: What can we do with such a file?
  • Some answers:
  • Write corresponding Schema to define its content
  • Write XSL transformation to display
  • Parse into a programming language

12
Exercise 1
13
Exercise 1 Solution
    <?xml version="1.0" encoding="UTF-8"?>
    <cars>
      <car>
        <make>dodge</make>
        <model>ram</model>
        <color>red</color>
        <year>2004</year>
        <mileage>22000</mileage>
      </car>
      <car>
        <make>Ford</make>
        <model>Pinto</model>
        <color>white</color>
        <year>1980</year>
        <mileage>100000</mileage>
      </car>
    </cars>
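
One of the answers listed earlier was to parse such a file into a programming language. A small sketch of that for the cars document, using Python's standard xml.etree.ElementTree; the function name parse_cars and the choice of plain dictionaries are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

CARS_XML = """<cars>
  <car>
    <make>dodge</make><model>ram</model><color>red</color>
    <year>2004</year><mileage>22000</mileage>
  </car>
  <car>
    <make>Ford</make><model>Pinto</model><color>white</color>
    <year>1980</year><mileage>100000</mileage>
  </car>
</cars>"""

def parse_cars(text):
    """Return one dict per <car>, with the numeric fields converted to int."""
    cars = []
    for car in ET.fromstring(text).findall("car"):
        record = {child.tag: child.text for child in car}
        record["year"] = int(record["year"])
        record["mileage"] = int(record["mileage"])
        cars.append(record)
    return cars

cars = parse_cars(CARS_XML)
oldest = min(cars, key=lambda c: c["year"])
print(oldest["make"], oldest["model"])
```

Everything in the file arrives as text; the int conversions are the program's decision, which is exactly the kind of type information a schema can record.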
14
Some sample XML documents
15
Order / Whitespace
Note that element order is important, but the layout of whitespace
between tags generally is not. However the document below is indented
or line-wrapped, it has the same element structure as far as the XML
parser is concerned:

    <Article >
      <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline>
      <authors>
        <author> Joe Garden</author>
        <author> Tim Harrod</author>
      </authors>
      <abstract>Dan Spengler, CEO of the direct-mail-marketing firm
        Mailbox of Savings, took umbrage Monday at the use of the
        term <it>junk mail</it> </abstract>
      <body type="url" > http://www.theonion.com/archive/3-11-01.html </body>
    </Article>
16
Molecule Example
  • XML is extremely useful for standardizing data
    sharing within specialized domains. Below is a
    part of the Chemical Markup Language describing a
    water molecule and its constituents

    <?xml version="1.0" ?>
    <CML>
      <MOL TITLE="Water" >
        <ATOMS>
          <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
          <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
          <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>

17
Rooms example
  • A typical example showing a few more XML
    features

    <?xml version="1.0" ?>
    <rooms>
      <room name="Red">
        <capacity>10</capacity>
        <equipmentList>
          <equipment>Projector</equipment>
        </equipmentList>
      </room>
      <room name="Green">
        <capacity>5</capacity>
        <equipmentList />
        <features>
          <feature>No Roof</feature>
        </features>
      </room>
    </rooms>

18
Suggestion
  • Try building each of those documents in an XML
    builder tool (XMLSpy, Oxygen, etc.) or at least
    an XML-aware editor.
  • Note it is not required to create a schema to do
    this. Just create new XML document and start
    building.

19
Dissecting an XML Document
20
Things that can appear in an XML document
  • Elements: simple, complex, empty, or mixed
    content model; attributes.
  • The XML declaration
  • Processing instructions (PIs): <? ... ?>
  • Most common is <?xml-stylesheet ... ?>
  • <?xml-stylesheet type="text/css" href="mys.css"?>
  • Comments: <!-- comment text -->

21
Parts of an XML document

    <?xml version="1.0" ?>
    <CML>
      <MOL TITLE="Water" >
        <ATOMS>
          <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
          <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
          <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>

Callouts on the slide: Declaration; Tags (Begin Tags, End Tags); Attributes; Attribute Values.
An XML element is everything from (including) the
element's start tag to (including) the element's
end tag.
22
XML and Trees
  • Tags give the structure of a document. They
    divide the document up into elements, starting at
    the topmost element, the root element. The stuff
    inside an element is its content; content can
    include other elements along with character
    data.

Tree for the water-molecule document (the leaves are character data):

    CML (root element)
      MOL
        ATOMS
          ARRAY: H O H
        BONDS
          ARRAY: 1 2
          ARRAY: 2 3
          ARRAY: 1 1
23
XML and Trees

    <?xml version="1.0" ?>
    <CML>
      <MOL TITLE="Water" >
        <ATOMS>
          <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
          <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
          <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>

Tree for this document (the leaves are the data sections):

    CML (root element)
      MOL
        ATOMS
          ARRAY: H O H
        BONDS
          ARRAY: 1 2
          ARRAY: 2 3
          ARRAY: 1 1
24
XML and Trees
Tree for the rooms document:

    rooms
      room
        capacity: 10
        equipmentList
          equipment: Projector
      room
        capacity: 5
        equipmentList
        features
          feature: No Roof
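
The tree view can be produced mechanically. The sketch below walks the rooms document with Python's standard xml.etree.ElementTree and prints one indented line per element. The helper name outline is made up, and attributes such as name="Red" are deliberately left out to keep it short.

```python
import xml.etree.ElementTree as ET

ROOMS_XML = """<rooms>
  <room name="Red">
    <capacity>10</capacity>
    <equipmentList><equipment>Projector</equipment></equipmentList>
  </room>
  <room name="Green">
    <capacity>5</capacity>
    <equipmentList />
    <features><feature>No Roof</feature></features>
  </room>
</rooms>"""

def outline(element, depth=0):
    """Return indented lines for element and all of its descendants."""
    text = (element.text or "").strip()       # ignore whitespace-only text
    line = "  " * depth + element.tag
    if text:
        line += ": " + text
    lines = [line]
    for child in element:                     # children come in document order
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(ROOMS_XML))
print("\n".join(tree_lines))
```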
25
More detail on elements
26
Element relationships
  • Book is the root element.
  • Title, prod, and chapter are child elements of book.
  • Book is the parent element of title, prod, and chapter.
  • Title, prod, and chapter are siblings (or sister elements)
    because they have the same parent.

    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>
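
A short sketch of the same relationships in code, using Python's standard xml.etree.ElementTree on a trimmed copy of the book document. ElementTree elements do not keep a pointer to their parent, so the sketch builds a child-to-parent map first; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
  </chapter>
</book>"""

root = ET.fromstring(BOOK_XML)

# Map every element to its parent by visiting each parent's direct children.
parent_of = {child: parent for parent in root.iter() for child in parent}

children = [child.tag for child in root]        # child elements of book
title = root.find("title")
parent_tag = parent_of[title].tag               # parent of title
siblings = [el.tag for el in parent_of[title] if el is not title]

print(children, parent_tag, siblings)
```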
27
Well formed XML
28
Well-formed vs Valid
  • An XML document is said to be well-formed if it
    obeys basic semantic and syntactic constraints.
  • This is different from a valid XML document,
    which (as we will see in more depth) properly
    matches a schema.

29
Rules for Well-Formed XML
  • An XML document is considered well-formed if it
    obeys the following rules:
  • There must be one element that contains all
    others (the root element).
  • All tags must be balanced:
      <BOOK>...</BOOK>
      <BOOK />
  • Tags must be nested properly:
      <BOOK> <LINE> This is OK </LINE> </BOOK>
      <LINE> <BOOK> This is </LINE> definitely NOT </BOOK> OK
  • Element names are case-sensitive, so this is not well-formed:
      <P>This is not ok, even though we do it all the time in HTML!</p>
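
These rules can be tried out with any XML parser, since a conforming parser rejects input that is not well-formed. A minimal sketch using Python's standard xml.etree.ElementTree, which raises ParseError in that case; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = [
    "<BOOK><LINE>This is OK</LINE></BOOK>",               # properly nested
    "<BOOK />",                                           # balanced empty element
    "<LINE><BOOK>This is</LINE> definitely NOT</BOOK>",   # improperly nested
    "<P>Case matters</p>",                                # start/end names differ
    "<a/><b/>",                                           # no single root element
]

results = [is_well_formed(s) for s in samples]
for sample, ok in zip(samples, results):
    print(ok, sample)
```

This checks well-formedness only; whether a document is valid against a schema is a separate question.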

30
More Rules for Well-Formed XML
  • Attribute values in a tag must be in quotes:
      <ITEM CATEGORY="Home and Garden" Name="hoe-matic t500">
  • Comments are allowed:
      <!-- They are done just as in HTML -->
  • A document should begin with an XML declaration
    (recommended; strictly optional in XML 1.0):
      <?xml version="1.0" ?>
  • Special characters must be escaped; the most
    common are
      <  &  "  '  >   (written &lt; &amp; &quot; &apos; &gt;)
      <formula> x &lt; y+2x </formula>
      <cd title="&quot; mmusic">
31
Naming Rules
  • Naming rules for XML elements
  • Names may contain letters, numbers, and other
    characters
  • Names must not start with a number or punctuation
    character
  • Names must not start with the letters xml (or XML
    or Xml ..)
  • Names cannot contain spaces
  • Any name can be used, no words are reserved, but
    the idea is to make names descriptive. Names
    with an underscore separator are typical
  • Examples: <first_name>, <date_of_birth>, etc.
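
A rough checker for the naming rules above. It is a deliberate simplification: it accepts only ASCII letters, digits, underscore, hyphen, and period, whereas real XML names also allow colons (used for namespaces) and many non-ASCII letters. The function name is_ok_name is made up.

```python
import re

# First character: letter or underscore; the rest: letters, digits, "_", "-", ".".
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_ok_name(name):
    """Apply the slide's rules to an ASCII-only element name."""
    if not _NAME.fullmatch(name):
        return False                      # bad first character, space, etc.
    return not name.lower().startswith("xml")  # "xml" prefix in any case is reserved

for candidate in ["first_name", "date_of_birth", "2nd_name", "first name", "XmlData"]:
    print(candidate, is_ok_name(candidate))
```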

32
XML Tools
  • XML can be created with any text editor
  • Normally we use an XML-friendly editor
  • e.g. XMLSpy
  • nXML emacs extensions
  • MSXML on Windows
  • Oxygen
  • Etc etc.
  • To check and validate XML, use either these tools
    and/or xmllint on Unix systems.

33
Another View
  • XML-as-data is one way to introduce XML
  • Another is as a markup language similar to html.
  • One typically says that html has a fixed tag set,
    whereas XML allows the definition of arbitrary
    tags
  • This analogy is particularly useful when the goal
    is to use XML for text presentation -- that is,
    when most of our data fields contain text
  • Note that mixed element/text fields are
    permissible in XML

34
Article example
    <Article >
      <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline>
      <authors>
        <author> Joe Garden</author>
        <author> Tim Harrod</author>
      </authors>
      <abstract>Dan Spengler, CEO of the direct-mail-marketing firm
        Mailbox of Savings, took umbrage Monday at the use of the
        term <it>junk mail</it>. </abstract>
      <body type="url" > http://www.theonion.com/archive/3-11-01.html </body>
    </Article>
35
More uses of XML
  • There is more!
  • A very popular use of XML is as a base syntax for
    programming languages (the elements become
    program control structures)
  • XSLT, BPEL, ant, etc. are good examples
  • XML is ubiquitous, and one must understand it
    deeply to be efficient and productive
  • Many other current and potential uses -- up to
    the creativity of the programmer

36
XML Schema
  • There are many details of the schema
    specification to cover. It is extremely rich,
    flexible, and somewhat complex
  • We will do this in detail next lecture
  • Now we begin with a brief introduction

37
XML Schema
  • XML itself does not restrict which elements
    may exist in a document.
  • In a given application, you want to fix a
    vocabulary -- what elements make sense, what
    their types are, etc.
  • Use a Schema to define an XML dialect
  • MusicXML, ChemXML, VoiceXML, ADXML, etc.
  • Restrict documents to those tags.
  • A schema can be used to validate a document, i.e.
    to see if it obeys the rules of the dialect.

38
Schema determine
  • What sort of elements can appear in the document.
  • What elements MUST appear
  • Which elements can appear as part of another
    element
  • What attributes can appear or must appear
  • What kind of values can/must be in an attribute.

39
    <?xml version="1.0" encoding="UTF-8"?>
    <library>
      <book id="b0836217462" available="true">
        <isbn>0836217462</isbn>
        <title lang="en">Being a Dog is a Full-Time Job</title>
        <author id="CMS">
          <name>Charles Schulz</name>
          <born>1922-11-26</born>
          <dead>2000-02-12</dead>
        </author>
        <character id="PP">
          <name>Peppermint Patty</name>
          <born>1966-08-22</born>
          <qualification>bold, brash, and tomboyish</qualification>
        </character>
        <character id="Snoopy">
          <name>Snoopy</name>
          <born>1950-10-04</born>
          <qualification>extroverted beagle</qualification>
        </character>
        <character id="Schroeder">
          <name>Schroeder</name>
          <born>1951-05-30</born>
          <qualification>brought classical music to the Peanuts Strip</qualification>
        </character>
        <character id="Lucy">
          <name>Lucy</name>
          <born>1952-03-03</born>
          <qualification>bossy, crabby, and selfish</qualification>
        </character>
      </book>
    </library>

  • We start with a sample XML document and reverse
    engineer a schema as a simple example.
  • First identify the elements: author, book, born,
    character, dead, isbn, library, name,
    qualification, title
  • Next categorize by content model:
      Empty: contains nothing
      Simple: only text nodes
      Complex: only sub-elements
      Mixed: text nodes + sub-elements
  • Note content model independent

40
Content models
  • Simple content model: name, born, title, dead,
    isbn, qualification
  • Complex content model: library, character, book,
    author

41
Content Types
  • We further distinguish between complex and simple
    content Types
  • Simple Type An element with only text nodes and
    no child elements or attributes
  • Complex Type All other cases
  • We also say (and require) that all attributes
    themselves have simple type

42
Content Types
  • Simple content type: name, born, dead, isbn,
    qualification
  • Complex content type: library, character, book,
    author, title
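
The two classifications (content model, then simple versus complex type) can be sketched in code. The version below follows the definitions given on these slides and treats whitespace-only text as no text, which is an assumption of this sketch. It runs on a shortened copy of the library document; the function names are made up.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<library>
  <book id="b0836217462" available="true">
    <isbn>0836217462</isbn>
    <title lang="en">Being a Dog is a Full-Time Job</title>
    <author id="CMS">
      <name>Charles Schulz</name>
      <born>1922-11-26</born>
    </author>
  </book>
</library>"""

def has_text(el):
    """True if non-whitespace text appears directly inside el."""
    if (el.text or "").strip():
        return True
    # Text that follows a child element is stored in that child's .tail
    return any((child.tail or "").strip() for child in el)

def content_model(el):
    children, text = len(el) > 0, has_text(el)
    if children and text:
        return "mixed"
    if children:
        return "complex"
    return "simple" if text else "empty"

def content_type(el):
    """Simple type: only text, no child elements, no attributes. Else complex."""
    if content_model(el) == "simple" and not el.attrib:
        return "simple"
    return "complex"

root = ET.fromstring(SAMPLE)
models = {el.tag: content_model(el) for el in root.iter()}
types = {el.tag: content_type(el) for el in root.iter()}
print(models)
print(types)
```

Note how title comes out with a simple content model but a complex type: the only difference is its lang attribute.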

43
Exercise 2 answer
  • In the previous <book> example:
  • book has element content, because it contains
    other elements.
  • chapter has mixed content, because it contains
    both text and other elements.
  • para has simple content (or text content), because
    it contains only text.
  • prod has empty content, because it contains neither
    text nor elements (its information is in its attributes).

44
Building the schema
  • Schemas are XML documents.
  • They must contain a schema root element, like this:

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.w3schools.com"
               xmlns="http://www.w3schools.com"
               elementFormDefault="qualified">
      ... ...
    </xs:schema>

  • We will discuss details in a bit. Note that the
    part highlighted in yellow on the slide can be
    excluded for now.

45
Flat schema for library
Start by defining all of the simple types (including attributes):

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="name" type="xs:string"/>
      <xs:element name="qualification" type="xs:string"/>
      <xs:element name="born" type="xs:date"/>
      <xs:element name="dead" type="xs:date"/>
      <xs:element name="isbn" type="xs:string"/>
      <xs:attribute name="id" type="xs:ID"/>
      <xs:attribute name="available" type="xs:boolean"/>
      <xs:attribute name="lang" type="xs:language"/>
    </xs:schema>
46
Complex types with simple content
Now to complex types with simple content:

    <title lang="en"> Being a Dog is </title>

    <xs:element name="title">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:string">
            <xs:attribute ref="lang"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

The element named title has a complex type with simple content,
obtained by extending the predefined datatype xs:string with the
attribute named lang that is defined in this schema.
47
Complex Types
All other types are complex types with complex content. For example:

    <xs:element name="library">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="book" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="author">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="id"/>
      </xs:complexType>
    </xs:element>
48
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="name" type="xs:string"/>
      <xs:element name="qualification" type="xs:string"/>
      <xs:element name="born" type="xs:date"/>
      <xs:element name="dead" type="xs:date"/>
      <xs:element name="isbn" type="xs:string"/>
      <xs:attribute name="id" type="xs:ID"/>
      <xs:attribute name="available" type="xs:boolean"/>
      <xs:attribute name="lang" type="xs:language"/>
      <xs:element name="title">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute ref="lang"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="library">
        <xs:complexType>
          <xs:sequence>
            <xs:element maxOccurs="unbounded" ref="book"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="author">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="name"/>
            <xs:element ref="born"/>
            <xs:element ref="dead" minOccurs="0"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="book">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="isbn"/>
            <xs:element ref="title"/>
            <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute ref="available"/>
          <xs:attribute ref="id"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="character">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="name"/>
            <xs:element ref="born"/>
            <xs:element ref="qualification"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>
49
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="library">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="book" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="isbn" type="xs:integer"/>
                  <xs:element name="title">
                    <xs:complexType>
                      <xs:simpleContent>
                        <xs:extension base="xs:string">
                          <xs:attribute name="lang" type="xs:language"/>
                        </xs:extension>
                      </xs:simpleContent>
                    </xs:complexType>
                  </xs:element>
                  <xs:element name="author" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="name" type="xs:string"/>
                        <xs:element name="born" type="xs:date"/>
                        <xs:element name="dead" type="xs:date"/>
                      </xs:sequence>
                      <xs:attribute name="id" type="xs:ID"/>
                    </xs:complexType>
                  </xs:element>
                  <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="name" type="xs:string"/>
                        <xs:element name="born" type="xs:date"/>
                        <xs:element name="qualification" type="xs:string"/>
                      </xs:sequence>
                      <xs:attribute name="id" type="xs:ID"/>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
                <xs:attribute type="xs:ID" name="id"/>
                <xs:attribute name="available" type="xs:boolean"/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Same schema but with everything defined locally!
50
Next Lecture
  • Even with this simple example there are many
    design issues to discuss:
  • When is a flat layout better?
  • When is a nested layout better?
  • What are the scoping rules?
  • When to use ref vs. defining a new type?
  • Schema in depth is the topic of the next lecture