XML - PowerPoint PPT Presentation

Slides: 41
Provided by: sebastia45

Transcript and Presenter's Notes

Title: XML


1
XML
  • An introduction in relation to semistructured data

2
Overview
  • Background / History
  • Basic syntax
  • XML and semistructured data
  • Document type definitions
  • Extensions for XML
  • Paraphernalia

3
Overview
  • Background / History
  • SGML
  • SGML, HTML and XML
  • World Wide Web Consortium
  • Basic syntax
  • XML and semistructured data
  • Document type definitions
  • Extensions for XML
  • Paraphernalia

4
Standard Generalized Markup Language (SGML)
  • models information exclusively on the basis of its
    internal structure and its function
  • ⇒ platform-independent storage of structured
    information
  • standard ISO 8879 from 1986

5
SGML, HTML and XML
  • SGML → (web application) → HTML (HTML is one special
    application of SGML)
  • XML ⊂ SGML (XML is a subset of SGML)

6
Why XML from SGML?
  • SGML
  • is exceedingly complex and difficult to
    understand
  • is formally so complex that online applications
    have difficulty processing it in reasonable
    time
  • has many properties that were not designed for
    use in network environments (remember that it is
    a standard from 1986)

7
World Wide Web Consortium
  • Nov 1996: initial XML draft
  • Dec 1997: XML 1.0 Proposed Recommendation
  • Feb 1998: W3C Recommendation Extensible Markup
    Language (XML) 1.0
  • Oct 2000: XML 1.0, 2nd edition

8
Overview
  • Background / History
  • Basic syntax
  • Elements
  • Attributes
  • Well-formed XML documents
  • XML and semistructured data
  • Document type definitions
  • Extensions for XML
  • Paraphernalia

9
Elements
  • element: <tag> content </tag>
  • <tag>, </tag>: markups
  • content: structures between markups
  • no predefined tags
  • basic content (no markups) is treated as text:
    PCDATA (Parsed Character Data)
  • abbreviation for empty elements: <tag />

10
Example
  • <personnel>
  •   <person>
  •     <name> John Cage </name>
  •     <function> Bearer </function>
  •   </person>
  •   <person>
  •     <name> Elaine Vassal </name>
  •     <function> chief secretary </function>
  •   </person>
  • </personnel>

11
Attributes
  • sometimes called property in data models
  • (name=value) pairs
  • value is always a string (CDATA by default; a DTD
    can declare stricter types such as NMTOKEN)
  • allow grouping of elements
  • ambiguity: information as attribute or as element?

12
Example
  • <personnel>
  •   <person sex='m'>
  •     <name> John Cage </name>
  •     <function department='civil rights'> Bearer
        </function>
  •   </person>
  •   <person sex='f'>
  •     <name> Elaine Vassal </name>
  •     <function department='admin'> chief secretary
        </function>
  •   </person>
  • </personnel>
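
To see how element content and attributes are read differently by a program, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `summarize` is an assumption made for illustration; it is not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The personnel document from the slide, with attribute values quoted.
DOC = """
<personnel>
  <person sex='m'>
    <name>John Cage</name>
    <function department='civil rights'>Bearer</function>
  </person>
  <person sex='f'>
    <name>Elaine Vassal</name>
    <function department='admin'>chief secretary</function>
  </person>
</personnel>
"""

def summarize(xml_text):
    """Return (name, sex, function, department) for every person element."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.findall("person"):
        function = person.find("function")
        rows.append((
            person.findtext("name").strip(),  # element content (PCDATA)
            person.get("sex"),                # attribute of <person>
            function.text.strip(),            # element content (PCDATA)
            function.get("department"),       # attribute of <function>
        ))
    return rows

if __name__ == "__main__":
    for row in summarize(DOC):
        print(row)
```

Attributes come back as plain strings via `get`, while nested information is reached by navigating child elements.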

13
Well-formed XML documents
  • an XML document is well-formed if
  • tags nest properly
  • (not <t1><t2></t1></t2>)
  • attributes are unique within one element
  • (not <tag att='a' att='b'>)
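
Both conditions can be checked by handing the text to any XML parser. The following is a small sketch, assuming Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` is chosen here for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for improperly nested tags, duplicate attributes, etc.
        return False

if __name__ == "__main__":
    print(is_well_formed("<t1><t2></t2></t1>"))      # properly nested
    print(is_well_formed("<t1><t2></t1></t2>"))      # tags overlap
    print(is_well_formed("<tag att='a' att='b'/>"))  # attribute repeated
```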

14
Overview
  • Background / History
  • Basic syntax
  • XML and semistructured data
  • Simple transformations
  • Differences that make transformation more
    difficult
  • Additional constructs
  • Document type definitions
  • Extensions for XML
  • Paraphernalia

15
Simple transformations
  • with basic XML syntax (no attributes, tree as
    data structure)
  • from XML to ssd:
  • <person>
  •   <name> John Cage </name>
  •   <function> Bearer </function>
  • </person>
  • ⇒ {person: {name: "John Cage", function:
    "Bearer"}}
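
The XML-to-ssd direction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ssd expression is modeled as nested `(label, value)` pairs (a list is used so that repeated labels survive), attributes and mixed content are ignored, and the name `to_ssd` is hypothetical.

```python
import xml.etree.ElementTree as ET

def to_ssd(element):
    """Map an element to a (label, value) pair.

    The value is the stripped text for a leaf element, otherwise a list
    of (label, value) pairs for the child elements.
    """
    children = list(element)
    if not children:
        return (element.tag, (element.text or "").strip())
    return (element.tag, [to_ssd(child) for child in children])

if __name__ == "__main__":
    xml_text = (
        "<person><name> John Cage </name>"
        "<function> Bearer </function></person>"
    )
    print(to_ssd(ET.fromstring(xml_text)))
```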

16
Simple transformations II
  • from ssd to XML (transformation function T):
  • T(atomic value) = atomic value
  • T({l1: v1, ..., ln: vn}) =
  •   <l1> T(v1) </l1>
  •   ...
  •   <ln> T(vn) </ln>
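
The transformation function T can be written almost literally as a recursive function. The sketch below assumes an ssd value is either an atomic value or a list of `(label, value)` pairs, and that every label is a valid XML element name; both are modeling assumptions made here, not statements from the slides.

```python
from xml.sax.saxutils import escape

def T(value):
    """Transform an ssd value into XML text.

    A list of (label, value) pairs becomes a sequence of elements;
    anything else is treated as an atomic value and emitted as text.
    """
    if isinstance(value, list):
        return "".join(f"<{label}>{T(v)}</{label}>" for label, v in value)
    # Atomic value: escape characters that would be read as markup.
    return escape(str(value))

if __name__ == "__main__":
    person = [("person", [("name", "John Cage"), ("function", "Bearer")])]
    print(T(person))
```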

17
Differences that make transformation more difficult
  • different semantics of labels
  • element or attribute
  • order
  • mixing elements and text

18
Semantics of labels
  • XML
  • graphs with labels on nodes
  • ssd
  • graphs with labels on edges

XML: <person> <name>Alan</name> <age>42</age>
<email>ab@com</email> </person>
ssd: {person: {name: "Alan", age: 42, email: "ab@com"}}

19
Element or attribute
  • ambiguity between representation of information
    as element or as attribute
  • ⇒ different possibilities of encoding
  • in particular in combination with references
  • <a> <b id='o123'> some string </b></a>
  • <a c='o123' />
  • or
  • <a b='o123' />
  • <a> <c id='o123'> some string </c></a>

20
Order
  • ssd model based on unordered collections
  • XML elements are ordered
  • but XML attributes are not
  • unordered data can be processed more efficiently
  • ⇒ for data exchange applications, ignore the order
    of XML elements

21
Mixing elements and text
  • XML allows mixing of PCDATA and subelements
  • <talk>
  •   XML - An introduction in relation to
      semistructured data
  •   <speaker> Sebastian Bitzer </speaker>
  • </talk>
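
Mixed content shows up directly in parser APIs. As a hedged illustration, Python's `xml.etree.ElementTree` keeps text that precedes the first child in `.text` and text that follows a child in that child's `.tail`; the helper `mixed_content` below is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def mixed_content(element):
    """List the text chunks and child elements of an element in document order."""
    parts = []
    if element.text and element.text.strip():
        parts.append(("text", element.text.strip()))
    for child in element:
        parts.append(("element", child.tag))
        # Text following a child element is stored on the child as .tail.
        if child.tail and child.tail.strip():
            parts.append(("text", child.tail.strip()))
    return parts

if __name__ == "__main__":
    talk = ET.fromstring(
        "<talk>XML - An introduction in relation to semistructured data"
        "<speaker>Sebastian Bitzer</speaker></talk>"
    )
    print(mixed_content(talk))
```

An ssd model has no direct counterpart for such interleaved text, which is why mixed content complicates the transformation.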

22
Additional constructs in XML
  • comments
  • <!-- comment -->
  • processing instructions
  • <?application-name instruction-text?>
  • CDATA (for escaping)
  • <![CDATA[ markups won't be processed here ]]>
  • entities
  • e.g. &auml; but also external files can be
    declared as entities, e.g. a GIF file as pic-1
23
Overview
  • Background / History
  • Basic syntax
  • XML and semistructured data
  • Document type definitions
  • DTDs as grammars
  • DTDs as schemas
  • Attributes
  • Valid XML documents
  • Limitations
  • Extensions for XML
  • Paraphernalia

24
DTDs as grammar
  • a document type definition (DTD) serves as a grammar
    for the underlying XML document
  • is precisely a context-free grammar (non-terminal
    → ordered list of one or more terminals and
    non-terminals)
  • can be recursive

25
Definitions
  • DTD:
  • <!DOCTYPE root-name [ element-def.s ]>
  • element-def.s:
  • <!ELEMENT name ( content model )>
  • content model:
  • ordered list of names of elements which can
    occur in the outer element

26
Variations of content model
  • <!ELEMENT r1 (a?, b*, (c | d+))>
  • means that elements of type r1 contain
  • 0 or 1 a (a is optional) and
  • arbitrarily many b (0 - ∞) and
  • either exactly 1 c (c is obligatory)
  • or at least 1 d (d is required)
  • groups can be built, too
  • <!ELEMENT r2 ((a, b)+, c?)>
  • means at least one sequence of a followed by
    b comes in front of the optional c
27
DTDs as Schemas
  • DTD:
  • <!DOCTYPE db [
  •   <!ELEMENT db ((r1,r2))>
  •   <!ELEMENT r1 ((a,b,c) | (a,c,b) | (b,a,c) | (b,c,a) |
      (c,a,b) | (c,b,a))>
  •   <!ELEMENT r2 ((c, d) | (d, c))>
  •   <!ELEMENT a (#PCDATA)>
  •   <!ELEMENT b (#PCDATA)>
  •   <!ELEMENT c (#PCDATA)>
  •   <!ELEMENT d (#PCDATA)>
  • ]>
  • can be seen as a representation of the relational
    schema r1(a,b,c), r2(c,d)

28
Declaring attributes
  • <!ATTLIST el.name att.name1 type1 spec1
  •   att.name2 type2 spec2
  • >
  • el.name: element which is modified by the att.s
  • type: often CDATA, but also more restricted,
    e.g. (m|f) for male or female in att. sex
  • spec: #REQUIRED, #IMPLIED, #FIXED or default value

29
Unique Identifiers
  • e.g.
  • <!ATTLIST person id ID #REQUIRED
  •   mom IDREF #IMPLIED
  •   dad IDREF #IMPLIED
  •   children IDREFS #IMPLIED>
  • instance:
  • <person id='john' mom='jane' dad='james'
    children='jack jim'>

30
Valid XML documents
  • an XML document is valid if
  • the document is well-formed
  • it additionally has a DTD
  • it conforms to that DTD:
  • elements are only nested as described in the DTD
  • only attributes allowed by the DTD are used
  • all attributes of type ID have distinct
    values
  • all IDREF and IDREFS values refer to existing
    identifiers
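
The last two conditions are easy to check by hand. Python's standard library parsers do not validate against a DTD, so the sketch below hard-codes which attributes play the ID, IDREF and IDREFS roles (following the person example with id, mom, dad and children). The function `check_ids` and the sample documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ID_ATTR = "id"                 # attribute assumed to be declared as ID
IDREF_ATTRS = ("mom", "dad")   # attributes assumed to be declared as IDREF
IDREFS_ATTRS = ("children",)   # attributes assumed to be declared as IDREFS

def check_ids(root):
    """Return error messages for duplicate IDs and dangling references."""
    errors = []
    seen = set()
    for el in root.iter():
        ident = el.get(ID_ATTR)
        if ident is None:
            continue
        if ident in seen:
            errors.append(f"duplicate ID: {ident}")
        seen.add(ident)
    for el in root.iter():
        refs = [el.get(a) for a in IDREF_ATTRS if el.get(a) is not None]
        for a in IDREFS_ATTRS:
            # An IDREFS value is a whitespace-separated list of IDs.
            refs.extend((el.get(a) or "").split())
        for ref in refs:
            if ref not in seen:
                errors.append(f"dangling reference: {ref}")
    return errors

GOOD = (
    "<family>"
    "<person id='john' mom='jane' dad='james' children='jack jim'/>"
    "<person id='jane'/><person id='james'/>"
    "<person id='jack'/><person id='jim'/>"
    "</family>"
)
BAD = (
    "<family>"
    "<person id='john' children='jack jim'/>"
    "<person id='jack'/><person id='jack'/>"
    "</family>"
)

if __name__ == "__main__":
    print(check_ids(ET.fromstring(GOOD)))
    print(check_ids(ET.fromstring(BAD)))
```

Note that this check cannot tell whether a reference points to the right kind of element, which is one of the limitations of IDREF.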

31
Limitations of DTDs as schemas (summarized)
  • order
  • only one atomic type (#PCDATA, but no INT etc.)
  • names are global (partial solution: namespaces)
  • IDREFs are not constrained to a certain type
    (a mother reference should point to a person)

32
Overview
  • Background / History
  • Basic syntax
  • XML and semistructured data
  • Document type definitions
  • Extensions for XML
  • DCD
  • Document navigation
  • Paraphernalia

33
Document Content Definitions
  • making typing more precise
  • seems to have been abandoned
  • recent approach: XML Schema, which must e.g.
  • provide for primitive data typing, including
    byte, date, integer, sequence, SQL & Java
    primitive data types, etc.
  • allow creation of user-defined datatypes, such as
    datatypes that are derived from existing
    datatypes and which may constrain certain of its
    properties
  • mechanism for URI reference to standard semantic
    understanding of a construct
  • (http://www.w3.org/TR/NOTE-xml-schema-req)

34
XLink & XPointer
  • pointing to arbitrary positions in documents
  • using IDs or relative position
  • links can be defined externally to both source
    and target (files)

35
Overview
  • Background / History
  • Basic syntax
  • XML and semistructured data
  • Document type definitions
  • Extensions for XML
  • Paraphernalia
  • RDF
  • Stylesheets
  • SAX and DOM

36
Resource Description Framework
  • for representing metadata
  • consists of data model and syntax
  • simple form: edge-labelled graph
  • additionally:
  • containers (bag, sequence or alternative)
  • higher-order statements (John says that ...)

37
Stylesheets
  • to specify presentation of data
  • Cascading Style Sheets (CSS)
  • associate with each element type a presentation
  • Extensible Stylesheet Language (XSL)
  • specifies the presentation of a class of XML
    documents by describing how an instance of the
    class is transformed into an XML document that
    uses the formatting vocabulary
  • http://www.w3.org/Style/XSL/

38
SAX and DOM
  • Application Programming Interfaces
  • Simple API for XML (SAX)
  • standard for parsing
  • Document Object Model (DOM)
  • interface that will allow programs and scripts to
    dynamically access and update the content,
    structure and style of documents
  • parses the whole document and builds a tree
    representation of it
  • http://www.w3.org/DOM/
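
The difference between the two interfaces is easiest to see side by side. The following is a minimal sketch using the `xml.sax` and `xml.dom.minidom` modules of Python's standard library; the sample document and the names `NameCollector`, `names_with_sax` and `names_with_dom` are chosen for illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = (
    "<personnel>"
    "<person><name>John Cage</name></person>"
    "<person><name>Elaine Vassal</name></person>"
    "</personnel>"
)

class NameCollector(xml.sax.ContentHandler):
    """SAX: react to parsing events as they stream by; no tree is built."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self.names.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so append to the current name.
        if self._in_name:
            self.names[-1] += content

    def endElement(self, tag):
        if tag == "name":
            self._in_name = False

def names_with_sax(text):
    handler = NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names

def names_with_dom(text):
    """DOM: load the whole document into a tree, then navigate it."""
    dom = minidom.parseString(text)
    return [node.firstChild.data for node in dom.getElementsByTagName("name")]

if __name__ == "__main__":
    print(names_with_sax(DOC))
    print(names_with_dom(DOC))
```

SAX keeps memory use low for large documents, while DOM allows random access and updates once the tree is in memory.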

39
Outlook
  • Database issues
  • How are we going to model XML? (graphs).
  • How are we going to query XML? (XML-QL)
  • How are we going to store XML (in a relational
    database? object-oriented?)
  • How are we going to process XML efficiently? (uh
    well..., um..., ah..., get some good grad
    students!)

Raghu Ramakrishnan, http://www.cs.wisc.edu/cs784-1/handouts/intro-ssxml.ppt

40
References
  • S. Abiteboul, P. Buneman, and D. Suciu, Data on
    the Web: From Relations to Semistructured Data
    and XML, Morgan Kaufmann Publishers, San
    Francisco, 2000
  • H. Lobin, Informationsmodellierung in XML und
    SGML, Berlin, Heidelberg, 2000
  • World Wide Web Consortium, Extensible Markup
    Language (XML), http://www.w3.org/XML/