eXtensible Markup Language

1
eXtensible Markup Language
  • Jesús Ibáñez, Toni Navarrete, Josep Blat
  • Universitat Pompeu Fabra

2
eXtensible Markup Language
  • New Internet mark-up metalanguage
  • Predecessors: SGML, HTML, DHTML
  • Extensibility, structure and validation
  • An adaptation of SGML for the WWW

3
eXtensible Markup Language
  • Defined as standard by W3C (Generic SGML
    Editorial Review Board - XML Working Group)
  • XML ≠ HTML
  • XML = SGML-- (a simplified subset of SGML)
  • XML, DTD (Document Type Definition) and XSL
    (eXtensible Stylesheet Language)

4
Main Characteristics
  • Describing document content semantically
  • Decoupling semantic description from presentation
  • Allowing each user community to define its own
    tags, for instance <PRICE>, <AUTHOR>,
    <SECTION>, <DATE>, <IMPORTANCE LEVEL="Expert">

5
XML Example (without DTD)
  • <?xml version="1.0" standalone="yes"?>
  • <conversation>
  • <greeting>Hello world!</greeting>
  • <answer>Stop it, I'm getting
    off!</answer>
  • </conversation>

6
Example with DTD (1)
  • <!DOCTYPE Book [
  • <!ELEMENT Book (Title, Author, Date, ISBN,
    Publisher)>
  • <!ELEMENT Title (#PCDATA)>
  • <!ELEMENT Author (#PCDATA)>
  • <!ELEMENT Date (#PCDATA)>
  • <!ELEMENT ISBN (#PCDATA)>
  • <!ELEMENT Publisher (#PCDATA)>
  • ]>

7
Example with DTD (2)
  • <?xml version="1.0" standalone="no"?>
  • <!DOCTYPE Book SYSTEM "file://localhost/xml-course
    /xsl/Book.dtd">
  • <Book>
  • <Title>My Life and Times</Title>
  • <Author>Paul McCartney</Author>
  • <Date>July, 1998</Date>
  • <ISBN>94303-12021-43892</ISBN>
  • <Publisher>McMillan
    Publishing</Publisher>
  • </Book>

8
DTDs
  • Allow new sets of tags to be created
  • Examples
  • <!ELEMENT Title (#PCDATA)>
  • <!ELEMENT Disk (Disk)+> (1 or more)
  • <!ELEMENT Book (Book)*> (0 or more)
  • ? (0 or 1), "," (sequence), | (choice)
  • Attributes
  • <!ATTLIST ARTICLE DATE CDATA #IMPLIED> (CDATA means
    Character Data)
  • <!ATTLIST PERSON GENDER (male | female)
    #IMPLIED> (optional)
  • <!ATTLIST PERSON GENDER (male | female)
    #REQUIRED> (required)

9
DTDs
  • <!DOCTYPE Discography [
  • <!ELEMENT Discography (Disk)+>
  • <!ELEMENT Disk (Title, Group, Song+)>
  • <!ELEMENT Title (#PCDATA)>
  • <!ELEMENT Group (#PCDATA)>
  • <!ELEMENT Song (titleS, Duration)>
  • <!ELEMENT titleS (#PCDATA)>
  • <!ELEMENT Duration (#PCDATA)>
  • ]>

10
DTDs
  • <Discography>
  • <Disk>
  • <Title>Brothers in Arms</Title>
  • <Group>Dire Straits</Group>
  • <Song>
  • <titleS>Money for nothing</titleS>
  • <Duration>520</Duration>
  • </Song>
  • <Song>
  • <titleS>So far away</titleS>
  • <Duration>410</Duration>
  • </Song>
  • ...
  • </Disk>
  • <Disk>
  • <Title>On every street</Title>
  • <Group>Dire Straits</Group>
  • <Song>
  • ...
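
A DTD content model is essentially a regular expression over the names of an element's children. The sketch below makes that concrete for the Discography example. It is not a DTD validator (Python's standard library has none): the content models are rewritten by hand as regular expressions, the CONTENT_MODELS table and both function names are hypothetical, and it assumes that a Disk may hold one or more Song elements, as the instance document suggests.

```python
import re
import xml.etree.ElementTree as ET

# Content models of the Discography DTD, rewritten by hand as regular
# expressions over child names, each name followed by one space.
# (Hypothetical encoding chosen for this sketch.)
CONTENT_MODELS = {
    "Discography": r"(Disk )+",
    "Disk": r"Title Group (Song )+",
    "Song": r"titleS Duration ",
}


def children_match(element: ET.Element) -> bool:
    """Check one element's sequence of children against its content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        # Elements declared as (#PCDATA) must not contain child elements.
        return len(element) == 0
    names = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, names) is not None


def conforms(element: ET.Element) -> bool:
    """Check the element and, recursively, all of its descendants."""
    return children_match(element) and all(conforms(c) for c in element)


DOC = """<Discography>
  <Disk>
    <Title>Brothers in Arms</Title>
    <Group>Dire Straits</Group>
    <Song><titleS>Money for nothing</titleS><Duration>520</Duration></Song>
    <Song><titleS>So far away</titleS><Duration>410</Duration></Song>
  </Disk>
</Discography>"""

# The same document with the mandatory Group element removed.
BAD = DOC.replace("<Group>Dire Straits</Group>", "")

print(conforms(ET.fromstring(DOC)))  # True
print(conforms(ET.fromstring(BAD)))  # False
```

A real validating parser also checks attribute declarations and entities, which this sketch ignores.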

11
DTDs
  • <!DOCTYPE publications [
  • <!ELEMENT publications (disk | book)*>
  • <!ELEMENT book ... >
  • <!ELEMENT disk ... >
  • ]>

12
DTDs
  • <publications>
  • <disk>
  • <titledisk>Brothers in Arms</titledisk>
  • <group>Dire Straits</group>
  • <song>
  • <titleS>Money for nothing</titleS>
  • <duration>520</duration>
  • </song>
  • ...
  • </disk>
  • <book>
  • <titlebook>Cien años de soledad</titlebook>
  • <writer>Gabriel García Márquez</writer>
  • ...
  • </book>
  • <book>
  • <titlebook>La ciudad de los prodigios</titlebook>
  • <writer>Eduardo Mendoza</writer>
  • ...

13
DTDs
  • <?xml version="1.0"?>
  • <!DOCTYPE file [
  • <!ELEMENT file (name, surname+, address,
    picture?)>
  • <!ELEMENT name (#PCDATA)>
  • <!ATTLIST name sex (male | female) #IMPLIED>
  • <!ELEMENT surname (#PCDATA)>
  • <!ELEMENT address (#PCDATA)>
  • <!ELEMENT picture EMPTY>
  • ]>
  • <file>
  • <name sex="male">Toni</name>
  • <surname>Navarrete</surname>
  • <surname>Terrasa</surname>
  • <address>Rambla 32</address>
  • </file>

14
Well formed vs valid
  • Valid XML: the content conforms to the rules of
    the associated DTD.
  • Completeness, good format and attribute values of
    the XML data are ensured.
  • Well formed: conforms to XML syntax
  • An XML document without a DTD can be well formed
    but, of course, cannot be valid.
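
The distinction can be tried out with a non-validating parser, which checks well-formedness only. Below is a minimal sketch using Python's xml.etree.ElementTree; the helper name is_well_formed is hypothetical. ElementTree never checks a document against a DTD, so a True result here says nothing about validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. it is well formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


GOOD = "<conversation><greeting>Hello world!</greeting></conversation>"
# The <greeting> element is never closed, so the nesting is broken.
BAD = "<conversation><greeting>Hello world!</conversation>"

print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False
```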

15
XML Schemata
  • XML Schemata define the structure of XML
    documents (as DTDs do)
  • BUT in XML syntax. Advantages: the same parser can
    validate them, and tools can create them dynamically
  • Use of Namespaces
  • Improved data type definition (41 instead of 10,
    plus user-defined)
  • Object orientation: new types are allowed by extension
    or restriction of previous ones
  • Validation (a document against a schema, a schema against
    the schema of schemas)

16
Schema definition
  • An XML document whose root is schema and within
    which elements and attributes are defined
  • <?xml version="1.0"?>
  • <schema>
  • ... elements and attributes definition
  • </schema>
  • element definition
  • <element name="name of the element"
  • type="type of the element"
  • options...
  • >

17
Simple types of elements
  • string: character string
  • boolean (false, 0, true, 1)
  • float (32 bits)
  • double (64 bits)
  • decimal (arbitrary-precision decimal numbers; integer is derived from it)
  • timeDuration
  • recurringDuration (several subtypes)
  • binary
  • uriReference (Uniform Resource Identifier)
  • And types derived from these basic ones
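
To make the idea of typed content concrete, here is a small sketch that converts the text of an element into a Python value according to a schema type name. It covers only a few of the types listed above, and the CONVERTERS table and the function names are hypothetical, not part of any schema library. Python's float is a 64-bit double, so the 32-bit float type is only approximated.

```python
from decimal import Decimal


def parse_boolean(lexical: str) -> bool:
    """Map the four lexical forms false, 0, true, 1 to a Python bool."""
    value = lexical.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a boolean lexical form: {lexical!r}")


# Hypothetical mapping from schema type names to Python converters.
CONVERTERS = {
    "string": str,
    "boolean": parse_boolean,
    "float": float,    # approximation: Python floats are 64-bit
    "double": float,
    "decimal": Decimal,
}


def convert(type_name: str, lexical: str):
    """Convert element text to a Python value for the given type name."""
    return CONVERTERS[type_name](lexical)


print(convert("boolean", "1"))      # True
print(convert("decimal", "22.95"))  # 22.95
```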

18
Data type structure

19
Example
  • <?xml version="1.0" encoding="ISO-8859-1"?>
  • <bookshop>
  • <book isbn="84-111-1111-1">
  • <title>El Quijote</title>
  • <author>Miguel de Cervantes</author>
  • <publisher>Plaza y Janés</publisher>
  • <character>Don Quijote</character>
  • <character>Sancho Panza</character>
  • <character>Dulcinea</character>
  • <character>Rocinante</character>
  • </book>
  • <book isbn="84-222-2222-2">
  • <title>La ciudad de los prodigios</title>
  • <author>Eduardo Mendoza</author>
  • <publisher>Seix-Barral</publisher>
  • <character>Onofre Boubila</character>
  • <character>Efren Castells</character>
  • </book>
  • <book isbn="84-333-3333-3">
  • ...

XML document before the schema is defined

20
Building blocks: simple elements and cardinality
  • Simple elements
  • <element name="title" type="string" />
  • <element name="author" type="string" />
  • <element name="publisher" type="string" />
  • <element name="character"
  • minOccurs="0" maxOccurs="unbounded" />
  • A DTD would be like
  • <!ELEMENT title (#PCDATA)>
  • For cardinality, minOccurs and maxOccurs replace the DTD
    symbols ?, * and +
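
The correspondence between the DTD occurrence symbols and minOccurs/maxOccurs can be written down as a small table. The sketch below does that and counts children to check one cardinality constraint; DTD_TO_OCCURS and occurs_ok are hypothetical names, and None stands in for "unbounded". When neither attribute is given, XML Schema defaults both to 1, which matches a DTD name with no symbol.

```python
import xml.etree.ElementTree as ET

# DTD occurrence symbol -> (minOccurs, maxOccurs); None means "unbounded".
DTD_TO_OCCURS = {
    "": (1, 1),      # no symbol: exactly once (the schema default)
    "?": (0, 1),     # optional
    "*": (0, None),  # zero or more
    "+": (1, None),  # one or more
}


def occurs_ok(parent, child_name, min_occurs, max_occurs):
    """Check how many direct children named child_name the parent has."""
    count = len(parent.findall(child_name))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


BOOK = ET.fromstring(
    "<book isbn='84-111-1111-1'>"
    "<title>El Quijote</title>"
    "<author>Miguel de Cervantes</author>"
    "<publisher>Plaza y Janés</publisher>"
    "<character>Don Quijote</character>"
    "<character>Sancho Panza</character>"
    "<character>Dulcinea</character>"
    "<character>Rocinante</character>"
    "</book>"
)

print(occurs_ok(BOOK, "title", *DTD_TO_OCCURS[""]))       # True
print(occurs_ok(BOOK, "character", *DTD_TO_OCCURS["*"]))  # True
print(occurs_ok(BOOK, "character", *DTD_TO_OCCURS["?"]))  # False
```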

21
Building blocks: complex types
  • The element book is composite, thus we define it
    as a complex type
  • <element name="book">
  • <complexType>
  • <sequence>
  • <element name="title" type="string" />
  • <element name="author" type="string" />
  • <element name="publisher" type="string" />
  • <element name="character" minOccurs="0"
    maxOccurs="unbounded" />
  • </sequence>
  • </complexType>
  • </element>

22
Alternative: naming complex types
  • We could also define a complex type with a name
  • <element name="book" type="Booktype" />
  • <complexType name="Booktype">
  • <element name="title" type="string" />
  • <element name="author" type="string" />
  • <element name="publisher" type="string" />
  • <element name="character" minOccurs="0"
    maxOccurs="unbounded" />
  • </complexType>

23
Remark: combining both is not allowed
  • <element name="book" type="Booktype">
  • <complexType name="Booktype">
  • <element name="title" type="string" />
  • <element name="author" type="string" />
  • <element name="publisher" type="string" />
  • <element name="character" minOccurs="0"
    maxOccurs="unbounded" />
  • </complexType>
  • </element>

24
Building blocks: empty elements
  • Elements such as the HTML tags <hr> or <img ...> are
    empty
  • <hr />
  • <img src="image.gif" />
  • An empty element has to be declared as an implicit complex
    type

<element name="hr">
  <complexType content="empty" />
</element>
<element name="img">
  <complexType content="empty">
    <attribute name="src" type="string" />
  </complexType>
</element>

25
A level upwards ...
  • Let us define bookshop
  • <element name="bookshop">
  • <complexType>
  • <element name="book"
  • minOccurs="0" maxOccurs="unbounded">
  • <complexType>
  • ...
  • </complexType>
  • </element>
  • </complexType>
  • </element>

A schema definition is a BOTTOM-UP process

26
Attribute definition
  • Elements can have attributes associated with them
  • In DTDs, we would write
  • <!ATTLIST book isbn CDATA #REQUIRED>
  • In XML Schema
  • <attribute name="name of the attribute"
  • type="type of the attribute"
  • options of the attribute ...
  • >

27
Attribute definition
  • At the end of the element definition
  • <element name="book" minOccurs="0"
    maxOccurs="unbounded">
  • <complexType>
  • <element name="title" type="string" />
  • <element name="author" type="string" />
  • <element name="publisher" type="string" />
  • <element name="character"
  • minOccurs="0" maxOccurs="unbounded" />
  • <attribute name="isbn" type="string" />
  • </complexType>
  • </element>

28
General ordering
  • The definitions are ordered for better
    readability
  • 1) Simple types definition
  • 2) Attributes definition
  • 3) Complex types definition

29
Referencing the schema
  • We then add the schema reference to the XML
    document. Assuming the document is book.xml and the
    bookshop schema is book.xsd, we would write
  • <?xml version="1.0" encoding="ISO-8859-1"?>
  • <bookshop
  • xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
  • xsi:noNamespaceSchemaLocation="book.xsd"
  • >
  • ...
  • </bookshop>

30
Namespaces
  • An XML Namespace is a collection of names (of
    elements and attributes) identified by a URI
  • Namespaces are a very flexible tool. They promote the re-use
    and mixing of schemata and names.
  • For instance, we could use elements from two
    namespaces
  • <BOOKS>
  • <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
  • xmlns:money="urn:Finance:Money">
  • <bk:TITLE>A Suitable Boy</bk:TITLE>
  • <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
  • </bk:BOOK>
  • </BOOKS>
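
A namespace-aware parser resolves each prefix to its URI, so the name a program sees is the pair (URI, local name) rather than the prefix. The sketch below parses the example with Python's xml.etree.ElementTree, which writes such names as {uri}local. The NS dictionary is only a lookup aid for the search paths; its prefixes need not match those in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<BOOKS>
  <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
           xmlns:money="urn:Finance:Money">
    <bk:TITLE>A Suitable Boy</bk:TITLE>
    <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
  </bk:BOOK>
</BOOKS>"""

# Prefix -> URI map used only to write the search paths below.
NS = {"bk": "urn:BookLovers.org:BookInfo", "money": "urn:Finance:Money"}

root = ET.fromstring(DOC)
title = root.find("bk:BOOK/bk:TITLE", NS)
price = root.find("bk:BOOK/bk:PRICE", NS)
# Attributes from a namespace are looked up by their expanded name.
currency = price.get("{urn:Finance:Money}currency")

print(root.tag)    # BOOKS (no namespace)
print(title.tag)   # {urn:BookLovers.org:BookInfo}TITLE
print(currency)    # US Dollar
```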

31
Namespaces
  • http://www.w3.org/2000/10/XMLSchema
  • This is the namespace for the schemata. The prefix
    xsd is commonly used; if no prefix is used, it is the default namespace
  • http://www.w3.org/2000/10/XMLSchema-instance
  • Namespace for the documents instantiated from a
    schema. The prefix xsi is usually used.

32
Example
  • <schema xmlns="http://www.w3.org/2000/10/XMLSchema" (1)
  • targetNamespace="http://www.upf.es/namespaces/Book" (2)
  • elementFormDefault="qualified" (3)
  • xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
  • xsi:schemaLocation=
  • "http://www.w3.org/2000/10/XMLSchema
  • http://www.w3.org/2000/10/XMLSchema.xsd"
  • xmlns:bk="http://www.upf.es/namespaces/Book">

(1) Indicates the default namespace, which is XMLSchema.
(2) Indicates that the elements and attributes in this schema are defined in the namespace http://www.upf.es/namespaces/Book.
(3) Indicates that all the elements declared in this namespace and used in instance documents have to be namespace-qualified (had we used unqualified, only the global elements would have to be).

33
Example (2)
  • <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
  • targetNamespace="http://www.upf.es/namespaces/Book"
  • elementFormDefault="qualified"
  • xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" (4)
  • xsi:schemaLocation= (5)
  • "http://www.w3.org/2000/10/XMLSchema (6)
  • http://www.w3.org/2000/10/XMLSchema.xsd" (7)
  • xmlns:bk="http://www.upf.es/namespaces/Book">

(4) Indicates that this XML document is an instance of the general schema for schemas.
(5) This is the namespace in which the attribute schemaLocation is defined.
(6) The namespace of the general schema for schemas.
(7) The URI of this schema for schemas.

34
Example (3)
  • <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
  • targetNamespace="http://www.upf.es/namespaces/Book"
  • elementFormDefault="qualified"
  • xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
  • xsi:schemaLocation=
  • "http://www.w3.org/2000/10/XMLSchema
  • http://www.w3.org/2000/10/XMLSchema.xsd"
  • xmlns:bk="http://www.upf.es/namespaces/Book"> (8)

(8) We give the target namespace a prefix to make it easier to reference, for instance <element ref="bk:Title" minOccurs="1" maxOccurs="1"/>

35
Example (and 4)
  • In the instance document
  • <bookshop xmlns="http://www.upf.es/namespaces/Book" (1)
  • xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" (2)
  • xsi:schemaLocation="http://www.upf.es/namespaces/Book
    http://www.upf.es/namespaces/book.xsd"> (3)

(1) We define the default namespace of the document.
(2) We include the namespace in which schema instantiation is defined (xsi).
(3) With schemaLocation we specify, for that namespace, where the schema for this document is located (book.xsd).
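
The value of xsi:schemaLocation is a whitespace-separated list of pairs: a namespace URI followed by the location of a schema for it. The sketch below reads that attribute from an instance document and turns it into a dictionary. The function name schema_locations is hypothetical, and the document assumes that the namespace half of the pair is the document's default namespace.

```python
import xml.etree.ElementTree as ET

# The schema-instance namespace used throughout these slides.
XSI = "http://www.w3.org/2000/10/XMLSchema-instance"

DOC = """<bookshop xmlns="http://www.upf.es/namespaces/Book"
    xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
    xsi:schemaLocation="http://www.upf.es/namespaces/Book
                        http://www.upf.es/namespaces/book.xsd"/>"""


def schema_locations(root):
    """Return {namespace URI: schema location} from xsi:schemaLocation."""
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # Tokens alternate: namespace, location, namespace, location, ...
    return dict(zip(tokens[0::2], tokens[1::2]))


root = ET.fromstring(DOC)
print(root.tag)  # {http://www.upf.es/namespaces/Book}bookshop
print(schema_locations(root))
```

The parser only exposes the hint; fetching the schema and validating against it is left to a schema processor.
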
36
Other important concepts
  • ID and IDREFS
  • DOM (Document Object Model)
  • XPath
  • XPointer
  • XLink

37
ID and IDREFS
  • ID: attribute for the unique identification of an
    element, a role similar to that of a URI. Example assigning
    the identity attack
  • <paragraph id="attack">Suddenly the skies were
    filled with aircraft</paragraph>
  • IDREFS (one or more identity references): the easiest way of
    referring to an ID. Example: a DTD declares the
    employee attributes empnumber as an ID and
    boss as IDREFS; here we say that Hank's ID is
    emp126 and his boss is emp124 (defined earlier)
  • <employee empnumber="emp126" boss="emp124">
    Hank</employee>
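
A validating parser enforces that ID values are unique and that every reference points to an existing ID. A non-validating parser sees plain attributes, so the lookup has to be done by hand. The sketch below does so; the staff wrapper element, the employee named Ann and the function boss_of are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET

# <staff> and the employee "Ann" are made up for this example;
# only Hank's element comes from the slide.
DOC = """<staff>
  <employee empnumber="emp124">Ann</employee>
  <employee empnumber="emp126" boss="emp124">Hank</employee>
</staff>"""

root = ET.fromstring(DOC)

# Index every employee by its ID attribute.
by_id = {e.get("empnumber"): e for e in root.iter("employee")}


def boss_of(emp_id):
    """Follow the boss reference; return the boss's name or None."""
    boss_id = by_id[emp_id].get("boss")
    return by_id[boss_id].text if boss_id is not None else None


print(boss_of("emp126"))  # Ann
print(boss_of("emp124"))  # None
```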

38
DOM (Document Object Model)
  • DOM is a technology for accessing and
    manipulating parts of an XML document
  • DOM models a document as a tree whose nodes are
    its elements
  • Then some properties and methods exist for the
    objects, allowing the access and manipulation
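
As an illustration, here is a minimal sketch of DOM access and manipulation using Python's xml.dom.minidom, a small DOM implementation in the standard library. The sample document is a shortened version of the Book example, and the variable names are arbitrary.

```python
from xml.dom import minidom

DOC = ("<Book><Title>My Life and Times</Title>"
       "<Author>Paul McCartney</Author></Book>")

dom = minidom.parseString(DOC)
book = dom.documentElement            # root element node

# Access: find an element and read its text child.
title = book.getElementsByTagName("Title")[0]
title_text = title.firstChild.data

# Manipulation: create a new element and attach it to the tree.
date = dom.createElement("Date")
date.appendChild(dom.createTextNode("July, 1998"))
book.appendChild(date)

child_names = [node.nodeName for node in book.childNodes]
print(title_text)   # My Life and Times
print(child_names)  # ['Title', 'Author', 'Date']
```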

39
XPath
  • XPath is a language for referencing parts of an
    XML document
  • It is used, for instance, to transform a document
    through XSL
  • XPath works on a tree model of the document, much like
    DOM, and uses paths (similar to URLs) to reference
    parts of it
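
A few path expressions can be tried with Python's xml.etree.ElementTree, which supports only a limited subset of XPath (child steps, attribute tests and positions, among others). The document below is a shortened version of the bookshop example.

```python
import xml.etree.ElementTree as ET

DOC = """<bookshop>
  <book isbn="84-111-1111-1">
    <title>El Quijote</title>
    <character>Don Quijote</character>
    <character>Sancho Panza</character>
  </book>
  <book isbn="84-222-2222-2">
    <title>La ciudad de los prodigios</title>
    <character>Onofre Boubila</character>
  </book>
</bookshop>"""

root = ET.fromstring(DOC)

# All titles: a path of child steps.
titles = [t.text for t in root.findall("./book/title")]

# Select a book by the value of its isbn attribute.
by_isbn = root.find("./book[@isbn='84-222-2222-2']/title").text

# Positions are 1-based: second character of the first book.
second = root.find("./book[1]/character[2]").text

print(titles)   # ['El Quijote', 'La ciudad de los prodigios']
print(by_isbn)  # La ciudad de los prodigios
print(second)   # Sancho Panza
```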

40
XPointer
  • XPointer is a language for pointing at a part of
    an XML document
  • XPointer uses XPath for pointing
  • XPointer enables linking

41
Linking using XML: XLink
  • XLink is a language for describing how to link
    resources in XML
  • We use attributes for the element link in the
    namespace xlink at "http://www.w3.org/XML/XLink/1.0"
  • The attributes are used to describe end-points,
    traversal, effect, resources

42
Tools
  • XML Browsers (visualisers)
  • XML Editors
  • XML Parsers
  • XML Servers
  • Relational DB to XML converters
  • XSL Editors
  • XSL Processors

43
XSL
  • Allows a presentation design to be applied to an XML
    document, generating HTML, PDF, mail, SMS
    messages, ...
  • Builds on CSS and DSSSL (SGML)

44
XSL
<?xml version="1.0"?>
<!DOCTYPE BookCatalogue SYSTEM "file://localhost/xml-course/xsl/BookCatalogue.dtd">
<BookCatalogue>
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>July, 1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
  <Book>
    <Title>Illusions: The Adventures of a Reluctant Messiah</Title>
    <Author>Richard Bach</Author>
    <Date>1977</Date>
    <ISBN>0-440-34319-4</ISBN>
    <Publisher>Dell Publishing Co.</Publisher>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
</BookCatalogue>

45
XSL
Document /
  PI <?xml version="1.0"?>
  DocumentType <!DOCTYPE BookCatalogue ...>
  Element BookCatalogue
    Element Book
      Element Title
        Text My Life ...
      Element Author
        Text Paul McCartney
      Element Date
        Text July, 1998
      Element ISBN
        Text 94303-12021-43892
      Element Publisher
        Text McMillin Publishing
    Element Book
      ...
    Element Book
      ...

46
XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="BookCatalogue">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Book">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Title">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Author">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Date">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="ISBN">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Publisher">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
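
In this stylesheet every element template simply recurses into its children and the text() template copies each text node, so the output is the concatenated text of the document. Python's standard library has no XSLT processor, so the sketch below is not XSLT: it is a hand-written recursion (the function apply_templates is hypothetical) that mimics what these particular templates do.

```python
import xml.etree.ElementTree as ET


def apply_templates(node):
    """Mimic the templates above: recurse into child elements and
    copy every text node to the output, in document order."""
    out = [node.text or ""]            # text before the first child
    for child in node:
        out.append(apply_templates(child))
        out.append(child.tail or "")   # text following the child
    return "".join(out)


DOC = ("<BookCatalogue><Book>"
       "<Title>My Life and Times</Title>"
       "<Author>Paul McCartney</Author>"
       "</Book></BookCatalogue>")

root = ET.fromstring(DOC)
print(apply_templates(root))  # My Life and TimesPaul McCartney
```

Changing what a template emits, for example wrapping the output in HTML elements, changes the presentation without touching the XML data.
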
47
XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>Book Catalogue</TITLE>
      </HEAD>
      <BODY>
        <xsl:apply-templates/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="BookCatalogue">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Book">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Title">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Author">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Date">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="ISBN">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Publisher">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>

BookCatalogue.xsl (the HTML elements in the first template are the additions)

48
XML-based formats
  • XML is an architecture, not an application
  • SMIL (Synchronized Multimedia Integration
    Language)
  • RDF (Resource Description Framework) for metadata
  • CDF (Channel Definition Format) for Microsoft channels
  • MathML (Mathematical Markup Language)
  • CML (Chemical Markup Language)
  • BSML (Bioinformatic Sequence Markup Language)
  • JML
  • WIDL (B2B integration)

49
Processing
  • Two approaches to processing XML documents, using
    Java as the programming language
  • DOM (Document Object Model)
  • tree structure (nodes, elements and text), most
    used
  • SAX (Simple API for XML; serial access)
  • event based
  • Faster and less memory-hungry, but more difficult
    to program
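
The event-based style can be sketched as follows. The slides assume Java; this example uses Python's xml.sax module instead, whose handler interface follows the same SAX design. The class name TitleHandler and the shortened catalogue are made up for the example. The parser never builds a tree: the handler receives start, character and end events and must keep its own state, which is what makes SAX frugal with memory and harder to program.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collect the text of every <Title> element from SAX events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None   # a list while inside <Title>, else None

    def startElement(self, name, attrs):
        if name == "Title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


DOC = (b"<BookCatalogue>"
       b"<Book><Title>My Life and Times</Title></Book>"
       b"<Book><Title>The First and Last Freedom</Title></Book>"
       b"</BookCatalogue>")

handler = TitleHandler()
xml.sax.parseString(DOC, handler)
print(handler.titles)
# ['My Life and Times', 'The First and Last Freedom']
```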

50
Some references
  • http://www.w3.org/
  • Official website with all the standards
  • http://www.xml.com/
  • Website from O'Reilly. A lot of good
    documentation and resources.
  • http://www.xfront.com/
  • Very good tutorials on XSL and XML Schema
  • http://xml.apache.org
  • Apache parsers and documentation (Xerces, Xalan,
    ...)
  • Java and XML. B. McLaughlin. O'Reilly, 2000
  • Of interest for combining the two using Apache
    parsers