Chapter 2 Structured Web Documents in XML - PowerPoint PPT Presentation

Provided by: ICS87

1
Chapter 2 Structured Web Documents in XML
  • Grigoris Antoniou
  • Frank van Harmelen

2
An HTML Example
    <h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2>
    <i>by <b>V. Marek</b> and <b>M. Truszczynski</b></i><br>
    Springer 1993<br>
    ISBN 0387976892

3
The Same Example in XML
    <book>
      <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
      <author>V. Marek</author>
      <author>M. Truszczynski</author>
      <publisher>Springer</publisher>
      <year>1993</year>
      <ISBN>0387976892</ISBN>
    </book>

4
HTML versus XML: Similarities
  • Both use tags (e.g. <h2> and </year>)
  • Tags may be nested (tags within tags)
  • Human users can read and interpret both HTML and
    XML representations quite easily
  • But how about machines?

5
Problems with Automated Interpretation of HTML Documents
  • Consider an intelligent agent trying to retrieve the names of the
    authors of the book
  • The authors' names could appear immediately after the title
  • or immediately after the word "by"
  • Are there two authors?
  • Or just one, called "V. Marek and M. Truszczynski"?

6
HTML vs XML: Structural Information
  • HTML documents do not contain structural information: the pieces of
    the document and their relationships
  • XML is more easily accessible to machines because
  • every piece of information is described
  • relations are also defined through the nesting structure
  • E.g., the <author> tags appear within the <book> tags, so they
    describe properties of the particular book

7
HTML vs XML: Structural Information (2)
  • A machine processing the XML document would be able to deduce that
    the author element refers to the enclosing book element, rather than
    relying on proximity considerations
  • XML allows the definition of constraints on values
  • E.g. a year must be a number of four digits
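A minimal Python sketch (standard-library `xml.etree.ElementTree` only) of how a program can exploit this nesting: it collects the authors as the `author` children of `book` and checks the four-digit year constraint. The regular-expression check is only an illustration; in practice such a constraint would be declared in a DTD or XML Schema and checked by a validator.

```python
import re
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>
"""

book = ET.fromstring(BOOK_XML)

# Every <author> child belongs to the enclosing <book>: no guessing by proximity.
authors = [a.text for a in book.findall("author")]

# Illustrative value constraint: a year must be a number of four digits.
year_ok = re.fullmatch(r"\d{4}", book.findtext("year")) is not None

print(authors)  # ['V. Marek', 'M. Truszczynski']
print(year_ok)  # True
```

Two separate `author` elements answer the question of how many authors there are, which the HTML version leaves open.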

8
HTML vs XML: Formatting
  • The HTML representation provides more than the XML representation:
    the formatting of the document is also described
  • The main use of an HTML document is to display information, so it
    must define formatting
  • XML: separation of content from display
  • The same information can be displayed in different ways

9
HTML vs XML: Another Example
  • In HTML:
    <h2>Relationship matter-energy</h2>
    <i> E = M × c2 </i>
  • In XML:
    <equation>
      <meaning>Relationship matter-energy</meaning>
      <leftside> E </leftside>
      <rightside> M × c2 </rightside>
    </equation>

10
HTML vs XML: Different Use of Tags
  • In both HTML documents (the book and the equation), the same tags are
    used
  • In XML, completely different tags are used
  • HTML tags define display: color, lists, etc.
  • XML tags are not fixed: they are user-definable
  • XML is a meta markup language: a language for defining markup
    languages

11
XML Vocabularies
  • Web applications must agree on common
    vocabularies to communicate and collaborate
  • Communities and business sectors are defining
    their specialized vocabularies
  • mathematics (MathML)
  • bioinformatics (BSML)
  • human resources (HRML)

12
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

13
The XML Language
  • An XML document consists of
  • a prolog
  • a number of elements
  • an optional epilog (not discussed)

14
Prolog of an XML Document
  • The prolog consists of
  • an XML declaration and
  • an optional reference to external structuring
    documents
    <?xml version="1.0" encoding="UTF-16"?>
    <!DOCTYPE book SYSTEM "book.dtd">

15
XML Elements
  • The things the XML document talks about
  • E.g. books, authors, publishers
  • An element consists of
  • an opening tag
  • the content
  • a closing tag
    <lecturer>David Billington</lecturer>

16
XML Elements (2)
  • Tag names can be chosen almost freely.
  • The first character must be a letter, an
    underscore, or a colon
  • No name may begin with the string "xml" in any combination of cases,
    e.g. "Xml", "xML"
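The naming rules above can be turned into a small checker. This is a minimal Python sketch under simplifying assumptions: it handles ASCII names only, and the characters allowed after the first one (letters, digits, hyphen, period, underscore, colon) are taken from the general XML naming rules rather than from this slide. `is_permissible_name` is a hypothetical helper.

```python
import re

# Assumption: ASCII-only simplification of the XML name rules.
# First character: letter, underscore or colon; then letters, digits, _ . : -
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")

def is_permissible_name(name: str) -> bool:
    """Return True if `name` may be used as a tag name (simplified check)."""
    if not _NAME.fullmatch(name):
        return False
    # Names beginning with "xml" in any combination of cases are reserved.
    return not name.lower().startswith("xml")

for candidate in ["lecturer", "_id", "2ndAuthor", "Xml-data", "xMLtag"]:
    print(candidate, is_permissible_name(candidate))
```

Here `lecturer` and `_id` are accepted, `2ndAuthor` fails the first-character rule, and both `Xml-data` and `xMLtag` fail the reserved-prefix rule.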

17
Content of XML Elements
  • Content may be text, or other elements, or
    nothing
    <lecturer>
      <name>David Billington</name>
      <phone>+61 - 7 - 3875 507</phone>
    </lecturer>
  • If there is no content, then the element is called empty; it can be
    abbreviated as follows:
    <lecturer/> for <lecturer></lecturer>

18
XML Attributes
  • An empty element is not necessarily meaningless
  • It may have some properties in terms of
    attributes
  • An attribute is a name-value pair inside the
    opening tag of an element
    <lecturer name="David Billington" phone="+61 - 7 - 3875 507"/>

19
XML Attributes: An Example
    <order orderNo="23456" customer="John Smith"
           date="October 15, 2002">
      <item itemNo="a528" quantity="1"/>
      <item itemNo="c817" quantity="3"/>
    </order>

20
The Same Example without Attributes
    <order>
      <orderNo>23456</orderNo>
      <customer>John Smith</customer>
      <date>October 15, 2002</date>
      <item>
        <itemNo>a528</itemNo>
        <quantity>1</quantity>
      </item>
      <item>
        <itemNo>c817</itemNo>
        <quantity>3</quantity>
      </item>
    </order>

21
XML Elements vs Attributes
  • Attributes can be replaced by elements
  • When to use elements and when attributes is a
    matter of taste
  • But attributes cannot be nested
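To illustrate that attributes can be replaced by elements, here is a small Python sketch that rewrites the attribute version of the order into the element version. `attributes_to_elements` is a hypothetical helper; it ignores text content (enough for this example) and places the converted attributes before the original children.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<order orderNo="23456" customer="John Smith" date="October 15, 2002">
  <item itemNo="a528" quantity="1"/>
  <item itemNo="c817" quantity="3"/>
</order>
"""

def attributes_to_elements(elem: ET.Element) -> ET.Element:
    """Return a copy of `elem` in which every attribute becomes a child element.

    Text content is ignored; the order example has none.
    """
    result = ET.Element(elem.tag)
    # Each attribute becomes a child element whose text is the attribute value.
    for name, value in elem.attrib.items():
        ET.SubElement(result, name).text = value
    # The original children follow, converted recursively and in order.
    for child in elem:
        result.append(attributes_to_elements(child))
    return result

converted = attributes_to_elements(ET.fromstring(ORDER_XML))
print(converted.findtext("orderNo"))                               # 23456
print([i.findtext("quantity") for i in converted.findall("item")])  # ['1', '3']
```

The reverse direction only works for flat data: the `item` elements could not be turned into attributes of `order`, because attributes cannot be nested.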

22
Well-Formed XML Documents
  • Syntactically correct documents
  • Some syntactic rules
  • Only one outermost element (called root element)
  • Each element contains an opening and a
    corresponding closing tag
  • Tags may not overlap
  • E.g. <author><name>Lee Hong</author></name> is not well-formed
  • Attributes within an element have unique names
  • Element and tag names must be permissible
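A quick way to test these rules in Python is to hand the text to a non-validating parser: `xml.etree.ElementTree.fromstring` raises `ParseError` for input that is not well-formed. This is a sketch; `is_well_formed` and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "ok": "<author><name>Lee Hong</name></author>",
    "overlapping tags": "<author><name>Lee Hong</author></name>",
    "two root elements": "<author/><author/>",
    "missing closing tag": "<author><name>Lee Hong</name>",
    "duplicate attribute": '<lecturer name="A" name="B"/>',
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; each of the others violates one of the rules on the slide.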

23
The Tree Model of XML Documents: An Example
    <email>
      <head>
        <from name="Michael Maher"
              address="michaelmaher@cs.gu.edu.au"/>
        <to name="Grigoris Antoniou"
            address="grigoris@cs.unibremen.de"/>
        <subject>Where is your draft?</subject>
      </head>
      <body>
        Grigoris, where is the draft of the paper you promised me
        last week?
      </body>
    </email>

24
The Tree Model of XML Documents: An Example (2)
25
The Tree Model of XML Docs
  • The tree representation of an XML document is an
    ordered labeled tree
  • There is exactly one root
  • There are no cycles
  • Each non-root node has exactly one parent
  • Each node has a label.
  • The order of elements is important
  • but the order of attributes is not important
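The tree model can be explored with Python's `xml.etree.ElementTree`, which parses the email example into exactly one root element with ordered children. A minimal sketch; `show` is a hypothetical helper that prints the labeled tree, sorting attributes because their order carries no meaning. The message body is shortened to one line here.

```python
import xml.etree.ElementTree as ET

EMAIL_XML = """
<email>
  <head>
    <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/>
    <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/>
    <subject>Where is your draft?</subject>
  </head>
  <body>Grigoris, where is the draft of the paper you promised me last week?</body>
</email>
"""

def show(node: ET.Element, depth: int = 0) -> None:
    """Print the labeled tree, one node per line, indented by depth."""
    attrs = ", ".join(f"{k}={v!r}" for k, v in sorted(node.attrib.items()))
    print("  " * depth + node.tag + (f" [{attrs}]" if attrs else ""))
    for child in node:  # children are visited in document order
        show(child, depth + 1)

root = ET.fromstring(EMAIL_XML)  # exactly one root element
show(root)

# Attributes form a mapping, so their order does not matter.
same = (ET.fromstring('<to name="G" address="g@x"/>').attrib
        == ET.fromstring('<to address="g@x" name="G"/>').attrib)
print(same)  # True
```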

26
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

27
Structuring XML Documents
  • Define all the element and attribute names that
    may be used
  • Define the structure
  • what values an attribute may take
  • which elements may or must occur within other
    elements, etc.
  • If such structuring information exists, the
    document can be validated

28
Structuring XML Documents (2)
  • An XML document is valid if
  • it is well-formed, and
  • it respects the structuring information it uses
  • There are two ways of defining the structure of
    XML documents
  • DTDs (the older and more restricted way)
  • XML Schema (offers extended possibilities)

29
XML Schema
  • Significantly richer language for defining the
    structure of XML documents
  • Its syntax is based on XML itself
  • so it is not necessary to write separate tools
  • Reuse and refinement of schemas
  • Expand or delete already existing schemas
  • Sophisticated set of data types, compared to DTDs (which only
    support strings)

30
XML Schema (2)
  • An XML schema is an element with an opening tag like
    <schema xmlns="http://www.w3.org/2000/10/XMLSchema" version="1.0">
  • Structure of schema elements:
  • element and attribute types using data types

31
Element Types
    <element name="email"/>
    <element name="head" minOccurs="1" maxOccurs="1"/>
    <element name="to" minOccurs="1"/>
  • Cardinality constraints:
  • minOccurs="x" (default value 1)
  • maxOccurs="x" (default value 1)
  • Generalizations of *, ?, + offered by DTDs
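The generalization can be made concrete: the DTD operators `?`, `*` and `+` correspond to particular minOccurs/maxOccurs pairs. A small Python sketch; `occurs_ok` is a hypothetical helper, and the value `unbounded` for maxOccurs is the one used in the schema examples of this chapter.

```python
from typing import Union

Bound = Union[int, str]  # an integer, or the string "unbounded"

def occurs_ok(count: int, min_occurs: int = 1, max_occurs: Bound = 1) -> bool:
    """Check a number of occurrences against minOccurs/maxOccurs (defaults 1)."""
    if count < min_occurs:
        return False
    return max_occurs == "unbounded" or count <= int(max_occurs)

# The DTD operators as special cases of (minOccurs, maxOccurs):
DTD_EQUIVALENTS = {
    "?": (0, 1),            # optional
    "*": (0, "unbounded"),  # zero or more
    "+": (1, "unbounded"),  # one or more
}

for op, (lo, hi) in DTD_EQUIVALENTS.items():
    print(op, [occurs_ok(n, lo, hi) for n in (0, 1, 5)])
# ? [True, True, False]
# * [True, True, True]
# + [False, True, True]
```

Any other pair, such as minOccurs="2" maxOccurs="4", has no single DTD operator, which is what makes the schema constraints a generalization.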

32
Attribute Types
    <attribute name="id" type="ID" use="required"/>
    <attribute name="speaks" type="Language"
               use="default" value="en"/>
  • Existence: use="x", where x may be optional or required
  • Default value: use="x" value="...", where x may be default or fixed

33
Data Types
  • There is a variety of built-in data types
  • Numerical data types: integer, Short, etc.
  • String types: string, ID, IDREF, CDATA, etc.
  • Date and time data types: time, Month, etc.
  • There are also user-defined data types
  • simple data types, which cannot use elements or
    attributes
  • complex data types, which can use these

34
Data Types (2)
  • Complex data types are defined from already
    existing data types by defining some attributes
    (if any) and using
  • sequence, a sequence of existing data type
    elements (order is important)
  • all, a collection of elements that must appear
    (order is not important)
  • choice, a collection of elements, of which one
    will be chosen

35
A Data Type Example
    <complexType name="lecturerType">
      <sequence>
        <element name="firstname" type="string"
                 minOccurs="0" maxOccurs="unbounded"/>
        <element name="lastname" type="string"/>
      </sequence>
      <attribute name="title" type="string" use="optional"/>
    </complexType>
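Python's standard library has no XML Schema validator, so the following hand-written check only sketches what `lecturerType` requires: zero or more `firstname` elements followed by exactly one `lastname`, plus an optional `title` attribute. `valid_lecturer` and the sample documents are hypothetical; a real project would use a schema-aware validator instead (for example lxml's `etree.XMLSchema`).

```python
import xml.etree.ElementTree as ET

def valid_lecturer(elem: ET.Element) -> bool:
    """Hand-written check of the lecturerType content model (a sketch, not XSD)."""
    # Only the optional 'title' attribute is declared.
    if set(elem.attrib) - {"title"}:
        return False
    tags = [child.tag for child in elem]
    # <sequence>: firstname (minOccurs="0", maxOccurs="unbounded") ...
    i = 0
    while i < len(tags) and tags[i] == "firstname":
        i += 1
    # ... followed by exactly one lastname (default minOccurs = maxOccurs = 1).
    return tags[i:] == ["lastname"]

checks = {
    "first and last name": '<lecturer title="Dr"><firstname>Ann</firstname>'
                           "<lastname>Lee</lastname></lecturer>",
    "last name only": "<lecturer><lastname>Lee</lastname></lecturer>",
    "wrong order": "<lecturer><lastname>Lee</lastname>"
                   "<firstname>Ann</firstname></lecturer>",
    "undeclared attribute": '<lecturer phone="507"><lastname>Lee</lastname></lecturer>',
}
results = {label: valid_lecturer(ET.fromstring(x)) for label, x in checks.items()}
print(results)
```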

36
XML Schema: The Email Example
    <element name="email" type="emailType"/>
    <complexType name="emailType">
      <sequence>
        <element name="head" type="headType"/>
        <element name="body" type="bodyType"/>
      </sequence>
    </complexType>

37
XML Schema: The Email Example (2)
    <complexType name="headType">
      <sequence>
        <element name="from" type="nameAddress"/>
        <element name="to" type="nameAddress"
                 minOccurs="1" maxOccurs="unbounded"/>
        <element name="cc" type="nameAddress"
                 minOccurs="0" maxOccurs="unbounded"/>
        <element name="subject" type="string"/>
      </sequence>
    </complexType>

38
XML Schema: The Email Example (3)
    <complexType name="nameAddress">
      <attribute name="name" type="string" use="optional"/>
      <attribute name="address" type="string" use="required"/>
    </complexType>
  • Similar for bodyType
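A hand-written Python sketch of the `headType` and `nameAddress` constraints, for illustration only: it is not a real XML Schema validator, and `valid_head`, `valid_name_address` and the `HEAD_TYPE` table are hypothetical names, with `None` standing for `unbounded`. The example addresses ending in `example.org` are made up.

```python
import xml.etree.ElementTree as ET

HEAD_XML = """
<head>
  <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/>
  <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/>
  <subject>Where is your draft?</subject>
</head>
"""

# headType as (tag, minOccurs, maxOccurs) in sequence order; None = "unbounded".
HEAD_TYPE = [("from", 1, 1), ("to", 1, None), ("cc", 0, None), ("subject", 1, 1)]

def valid_name_address(elem: ET.Element) -> bool:
    """nameAddress: 'address' required, 'name' optional, nothing else."""
    return ("address" in elem.attrib
            and set(elem.attrib) <= {"name", "address"}
            and len(elem) == 0)

def valid_head(head: ET.Element) -> bool:
    """Check the children of <head> against the headType sequence."""
    children = list(head)
    i = 0
    for tag, min_occurs, max_occurs in HEAD_TYPE:
        count = 0
        # Consume consecutive children with this tag, up to maxOccurs.
        while (i < len(children) and children[i].tag == tag
               and (max_occurs is None or count < max_occurs)):
            i += 1
            count += 1
        if count < min_occurs:
            return False
    if i != len(children):  # leftover children: wrong order or undeclared element
        return False
    return all(valid_name_address(c) for c in children
               if c.tag in ("from", "to", "cc"))

results = {
    "email head": valid_head(ET.fromstring(HEAD_XML)),
    "no to": valid_head(ET.fromstring(
        '<head><from address="a@example.org"/><subject>s</subject></head>')),
    "from without address": valid_head(ET.fromstring(
        '<head><from name="A"/><to address="b@example.org"/>'
        "<subject>s</subject></head>")),
}
print(results)
```

Consuming each particle greedily is safe here because consecutive particles in `headType` have different tags.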

39
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

40
Namespaces
  • An XML document may use more than one DTD or
    schema
  • Since each structuring document was developed
    independently, name clashes may appear
  • The solution is to use a different prefix for
    each DTD or schema
  • prefix:name

41
An Example
    <vu:instructors xmlns:vu="http://www.vu.com/empDTD"
                    xmlns:gu="http://www.gu.au/empDTD"
                    xmlns:uky="http://www.uky.edu/empDTD">
      <uky:faculty uky:title="assistant professor"
                   uky:name="John Smith"
                   uky:department="Computer Science"/>
      <gu:academicStaff gu:title="lecturer"
                        gu:name="Mate Jones"
                        gu:school="Information Technology"/>
    </vu:instructors>

42
Namespace Declarations
  • Namespaces are declared within an element and can
    be used in that element and any of its children
    (elements and attributes)
  • A namespace declaration has the form
    xmlns:prefix="location"
  • location is the address of the DTD or schema
  • If a prefix is not specified, as in xmlns="location", then the
    location is used by default for unprefixed names
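Namespace-aware parsers replace each prefix by the location it is bound to. In Python's `xml.etree.ElementTree`, `uky:title` becomes `{http://www.uky.edu/empDTD}title`, so the two `title` attributes in the example no longer clash. A minimal sketch; the second, one-line document (with a default namespace) is an illustrative assumption, not part of the original example.

```python
import xml.etree.ElementTree as ET

INSTRUCTORS_XML = """
<vu:instructors xmlns:vu="http://www.vu.com/empDTD"
                xmlns:gu="http://www.gu.au/empDTD"
                xmlns:uky="http://www.uky.edu/empDTD">
  <uky:faculty uky:title="assistant professor"
               uky:name="John Smith"
               uky:department="Computer Science"/>
  <gu:academicStaff gu:title="lecturer"
                    gu:name="Mate Jones"
                    gu:school="Information Technology"/>
</vu:instructors>
"""

root = ET.fromstring(INSTRUCTORS_XML)
print(root.tag)  # {http://www.vu.com/empDTD}instructors

# Prefixes used in search paths are mapped to locations explicitly.
ns = {"gu": "http://www.gu.au/empDTD", "uky": "http://www.uky.edu/empDTD"}
faculty = root.find("uky:faculty", ns)
staff = root.find("gu:academicStaff", ns)
print(faculty.get("{http://www.uky.edu/empDTD}title"))  # assistant professor
print(staff.get("{http://www.gu.au/empDTD}title"))      # lecturer

# Without a prefix (xmlns="location"), the location applies to unprefixed element names.
default = ET.fromstring('<staff xmlns="http://www.gu.au/empDTD"><member/></staff>')
print(default[0].tag)  # {http://www.gu.au/empDTD}member
```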

43
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

44
Addressing and Querying XML Documents
  • In relational databases, parts of a database can
    be selected and retrieved using SQL
  • The same is necessary for XML documents
  • Query languages: XQuery, XQL, XML-QL
  • The central concept of XML query languages is a path expression
  • It specifies how a node, or a set of nodes, in the tree
    representation of the XML document can be reached

45
XPath
  • XPath is the core of XML query languages
  • Language for addressing parts of an XML document.
  • It operates on the tree data model of XML
  • It has a non-XML syntax

46
Types of Path Expressions
  • Absolute (starting at the root of the tree)
  • Syntactically they begin with the symbol /
  • It refers to the root of the document (situated
    one level above the root element of the document)
  • Relative to a context node

47
An XML Example
    <library location="Bremen">
      <author name="Henry Wise">
        <book title="Artificial Intelligence"/>
        <book title="Modern Web Services"/>
        <book title="Theory of Computation"/>
      </author>
      <author name="William Smart">
        <book title="Artificial Intelligence"/>
      </author>
      <author name="Cynthia Singleton">
        <book title="The Semantic Web"/>
        <book title="Browser Technology Revised"/>
      </author>
    </library>

48
Tree Representation
49
Examples of Path Expressions in XPath
  • Address all author elements
  • /library/author
  • Addresses all author elements that are children
    of the library element node, which resides
    immediately below the root
  • /t1/.../tn, where each t(i+1) is a child node of t(i), is a path
    through the tree representation

50
Examples of Path Expressions in XPath (2)
  • Address all author elements
  • //author
  • Here // says that we should consider all elements
    in the document and check whether they are of
    type author
  • This path expression addresses all author
    elements anywhere in the document
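Python's `xml.etree.ElementTree` supports only a subset of XPath in `find`/`findall`, and its paths are always relative to an element. The following sketch evaluates the two queries above on the library example; since ElementTree has no separate document node, a hypothetical wrapper element (`document-node`) stands in for `/`, one level above the root element.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

library = ET.fromstring(LIBRARY_XML)

# Hypothetical stand-in for the XPath document node "/".
document = ET.Element("document-node")
document.append(library)

# /library/author: author children of the library root element
q1 = document.findall("library/author")

# //author: author elements anywhere in the document
q2 = document.findall(".//author")

print([a.get("name") for a in q1])  # ['Henry Wise', 'William Smart', 'Cynthia Singleton']
print(len(q2))                      # 3
```

In this document both queries select the same three elements; they would differ if `author` elements also occurred deeper in the tree.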

51
Examples of Path Expressions in XPath (3)
  • Address the location attribute nodes within
    library element nodes
  • /library/@location
  • The symbol @ is used to denote attribute nodes

52
Examples of Path Expressions in XPath (4)
  • Address all title attribute nodes within book
    elements anywhere in the document, which have the
    value Artificial Intelligence
  • //book/@title[.="Artificial Intelligence"]

53
Examples of Path Expressions in XPath (5)
  • Address all books with title Artificial
    Intelligence
  • //book[@title="Artificial Intelligence"]
  • The test within square brackets is a filter expression
  • It restricts the set of addressed nodes.
  • Difference from query 4:
  • Query 5 addresses book elements, the title of
    which satisfies a certain condition.
  • Query 4 collects title attribute nodes of book
    elements
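Queries 3 to 5 with the same ElementTree subset, as a sketch. ElementTree paths can filter on attributes (`[@title='...']`) but cannot return attribute nodes, so queries 3 and 4 read attribute values from the selected elements instead. The contrast between queries 4 and 5 shows up in the result types: strings (attribute values) versus `book` elements.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

library = ET.fromstring(LIBRARY_XML)

# Query 3, /library/@location: read the attribute of the root element.
q3 = [library.get("location")] if library.tag == "library" else []

# Query 4, //book/@title[.="Artificial Intelligence"]: attribute values.
q4 = [b.get("title") for b in library.iter("book")
      if b.get("title") == "Artificial Intelligence"]

# Query 5, //book[@title="Artificial Intelligence"]: book elements.
q5 = library.findall(".//book[@title='Artificial Intelligence']")

print(q3)                    # ['Bremen']
print(q4)                    # ['Artificial Intelligence', 'Artificial Intelligence']
print([b.tag for b in q5])   # ['book', 'book']
```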

54
Tree Representation of Query 4
55
Tree Representation of Query 5
56
Examples of Path Expressions in XPath (6)
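  • Address the first author element node in the XML document
  • //author[1]
  • (Strictly, this addresses every author element that is the first
    author child of its parent; here all authors share one parent, so
    only one element is addressed)
  • Address the last book element within the first author element node
    in the document
  • //author[1]/book[last()]
  • Address all book element nodes without a title attribute
  • //book[not(@title)]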
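The positional filters of query 6 are also in ElementTree's XPath subset (`[1]`, `[last()]`), but `not(...)` is not, so the last query is emulated with a Python comprehension. A sketch on the library example.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

library = ET.fromstring(LIBRARY_XML)

# //author[1]: author elements that are the first author child of their parent
first_author = library.findall(".//author[1]")

# //author[1]/book[last()]: the last book child of that author
last_book = library.findall(".//author[1]/book[last()]")

# //book[not(@title)]: not() is unsupported in ElementTree, so filter in Python
untitled = [b for b in library.iter("book") if "title" not in b.attrib]

print([a.get("name") for a in first_author])  # ['Henry Wise']
print([b.get("title") for b in last_book])    # ['Theory of Computation']
print(untitled)                               # []
```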

57
Lecture Outline
  • Introduction
  • Detailed Description of XML
  • Structuring
  • DTDs
  • XML Schema
  • Namespaces
  • Accessing, querying XML documents: XPath
  • Transformations: XSLT

58
Displaying XML Documents
    <author>
      <name>Grigoris Antoniou</name>
      <affiliation>University of Bremen</affiliation>
      <email>ga@tzi.de</email>
    </author>
  • may be displayed in different ways, for example:
        Grigoris Antoniou          Grigoris Antoniou
        University of Bremen       University of Bremen
        ga@tzi.de                  ga@tzi.de

59
Style Sheets
  • Style sheets can be written in various languages
  • E.g. CSS2 (cascading style sheets level 2)
  • XSL (extensible stylesheet language)
  • XSL includes
  • a transformation language (XSLT)
  • a formatting language
  • Both are XML applications

60
XSL Transformations (XSLT)
  • XSLT specifies rules with which an input XML
    document is transformed to
  • another XML document
  • an HTML document
  • plain text
  • The output document may use the same DTD or
    schema, or a completely different vocabulary
  • XSLT can be used independently of the formatting
    language

61
XSLT (2)
  • Move data and metadata from one XML
    representation to another
  • XSLT is chosen when applications that use
    different DTDs or schemas need to communicate
  • XSLT can be used for machine processing of
    content without any regard to displaying the
    information for people to read.
  • In the following we use XSLT only to display XML
    documents

62
XSLT Transformation into HTML
    <xsl:template match="/author">
      <html>
        <head><title>An author</title></head>
        <body bgcolor="white">
          <b><xsl:value-of select="name"/></b><br/>
          <xsl:value-of select="affiliation"/><br/>
          <i><xsl:value-of select="email"/></i>
        </body>
      </html>
    </xsl:template>

63
Style Sheet Output
    <html>
      <head><title>An author</title></head>
      <body bgcolor="white">
        <b>Grigoris Antoniou</b><br>
        University of Bremen<br>
        <i>ga@tzi.de</i>
      </body>
    </html>
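Python's standard library has no XSLT processor, so the following is not XSLT: it is a minimal Python emulation of what this one template does, filling three placeholders with the values of the `name`, `affiliation` and `email` children (the role of `xsl:value-of`). `TEMPLATE` and `render_author` are hypothetical names; with an XSLT processor (for example lxml's `etree.XSLT`) the stylesheet itself would be run.

```python
import xml.etree.ElementTree as ET
from html import escape

AUTHOR_XML = """
<author>
  <name>Grigoris Antoniou</name>
  <affiliation>University of Bremen</affiliation>
  <email>ga@tzi.de</email>
</author>
"""

# The fixed part of the output, with one placeholder per xsl:value-of.
TEMPLATE = (
    '<html><head><title>An author</title></head><body bgcolor="white">'
    "<b>{name}</b><br>{affiliation}<br><i>{email}</i>"
    "</body></html>"
)

def render_author(author: ET.Element) -> str:
    """Copy the text of each child element into its placeholder."""
    values = {tag: escape(author.findtext(tag, default=""))
              for tag in ("name", "affiliation", "email")}
    return TEMPLATE.format(**values)

html_out = render_author(ET.fromstring(AUTHOR_XML))
print(html_out)
```

Escaping the copied text keeps characters such as `<` or `&` in the data from breaking the generated HTML.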

64
Observations About XSLT
  • XSLT documents are XML documents
  • XSLT resides on top of XML
  • The XSLT document defines a template
  • In this case an HTML document, with some
    placeholders for content to be inserted
  • xsl:value-of retrieves the value of an element and copies it into
    the output document
  • It places some content into the template

65
A Template
    <html>
      <head><title>An author</title></head>
      <body bgcolor="white">
        <b>...</b><br>
        ...<br>
        <i>...</i>
      </body>
    </html>

66
Auxiliary Templates
  • We have an XML document with details of several
    authors
  • It is a waste of effort to treat each author
    element separately
  • In such cases, a special template is defined for
    author elements, which is used by the main
    template

67
Example of an Auxiliary Template
    <authors>
      <author>
        <name>Grigoris Antoniou</name>
        <affiliation>University of Bremen</affiliation>
        <email>ga@tzi.de</email>
      </author>
      <author>
        <name>David Billington</name>
        <affiliation>Griffith University</affiliation>
        <email>david@gu.edu.net</email>
      </author>
    </authors>

68
Example of an Auxiliary Template (2)
    <xsl:template match="/">
      <html>
        <head><title>Authors</title></head>
        <body bgcolor="white">
          <xsl:apply-templates select="authors"/>
          <!-- Apply templates for AUTHORS children -->
        </body>
      </html>
    </xsl:template>

69
Example of an Auxiliary Template (3)
    <xsl:template match="authors">
      <xsl:apply-templates select="author"/>
    </xsl:template>

    <xsl:template match="author">
      <h2><xsl:value-of select="name"/></h2>
      Affiliation: <xsl:value-of select="affiliation"/><br/>
      Email: <xsl:value-of select="email"/>
      <p/>
    </xsl:template>

70
Multiple Authors Output
    <html>
      <head><title>Authors</title></head>
      <body bgcolor="white">
        <h2>Grigoris Antoniou</h2>
        Affiliation: University of Bremen<br>
        Email: ga@tzi.de
        <p>
        <h2>David Billington</h2>
        Affiliation: Griffith University<br>
        Email: david@gu.edu.net
        <p>
      </body>
    </html>

71
Explanation of the Example
  • The xsl:apply-templates element causes all children of the context
    node to be matched against the selected path expression
  • E.g., if the current template applies to /, then the element
    xsl:apply-templates applies to the root element
  • i.e. the authors element (/ is located above the root element)
  • If the current context node is the authors element, then the element
    xsl:apply-templates select="author" causes the template for the
    author elements to be applied to all author children of the authors
    element

72
Explanation of the Example (2)
  • It is good practice to define a template for each
    element type in the document
  • Even if no specific processing is applied to
    certain elements, the xslapply-templates element
    should be used
  • E.g. authors
  • In this way, we work from the root to the leaves
    of the tree, and all templates are applied
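The root-to-leaves processing can be mimicked in plain Python to show the control flow: each template is a function registered for an element name, and `apply_templates` dispatches on the selected nodes. This is an emulation under simplifying assumptions (no built-in default templates, only the three templates of the example), not an XSLT processor; all names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

AUTHORS_XML = """
<authors>
  <author>
    <name>Grigoris Antoniou</name>
    <affiliation>University of Bremen</affiliation>
    <email>ga@tzi.de</email>
  </author>
  <author>
    <name>David Billington</name>
    <affiliation>Griffith University</affiliation>
    <email>david@gu.edu.net</email>
  </author>
</authors>
"""

def authors_template(node, out):
    # match="authors": no output of its own, only passes control to the children
    apply_templates(node.findall("author"), out)

def author_template(node, out):
    # match="author": emits the per-author part of the page
    out.append(f"<h2>{escape(node.findtext('name'))}</h2>")
    out.append(f"Affiliation: {escape(node.findtext('affiliation'))}<br>")
    out.append(f"Email: {escape(node.findtext('email'))}")
    out.append("<p>")

# Hypothetical registry mapping element names to "templates".
TEMPLATES = {"authors": authors_template, "author": author_template}

def apply_templates(nodes, out):
    """Rough analogue of xsl:apply-templates: run the template for each node's tag."""
    for node in nodes:
        TEMPLATES[node.tag](node, out)

def transform(root):
    # match="/": emits the page frame and applies templates to the root element
    out = ['<html><head><title>Authors</title></head><body bgcolor="white">']
    apply_templates([root], out)
    out.append("</body></html>")
    return "\n".join(out)

result = transform(ET.fromstring(AUTHORS_XML))
print(result)
```

Dropping `authors_template` from the registry would make `apply_templates` fail with a `KeyError` on the `authors` element, a crude counterpart of the advice to define a template for each element type.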

73
Summary
  • XML is a metalanguage that allows users to define
    markup
  • XML separates content and structure from
    formatting
  • XML is the de facto standard for the
    representation and exchange of structured
    information on the Web
  • XML is supported by query languages

74
Points for Discussion in Subsequent Chapters
  • The nesting of tags does not have standard
    meaning
  • The semantics of XML documents is not accessible
    to machines, only to people
  • Collaboration and exchange are supported if there
    is underlying shared understanding of the
    vocabulary
  • XML is well-suited for close collaboration, where
    domain- or community-based vocabularies are used
  • It is not so well-suited for global communication.