DTD (Document Type Definition) PowerPoint PPT Presentation


Title: DTD (Document Type Definition)


1
DTD (Document Type Definition)
  • Imposing Structure on XML Documents
  • (W3Schools on DTDs)

2
Motivation
  • A DTD adds syntactic requirements on top of the
    well-formedness requirement
  • It helps in eliminating errors when creating or
    editing XML documents
  • It clarifies the intended semantics
  • It simplifies the processing of XML documents

3
An Example
  • In an address book, where can a phone number
    appear?
  • Under <person>, under <name>, or under both?
  • If we have to check for all possibilities,
    processing takes longer and it may not be clear
    to whom a phone belongs

4
Document Type Definitions
  • Document Type Definitions (DTDs) impose structure
    on XML documents
  • There is some relationship between a DTD and a
    schema, but it is not close; hence the need for
    additional typing systems (XML schemas)
  • The DTD is a syntactic specification

5
Example: An Address Book
  • <person>
  • <name>Homer Simpson</name>
  • <greet>Dr. H. Simpson</greet>
  • <addr>1234 Springwater Road</addr>
  • <addr>Springfield USA, 98765</addr>
  • <tel>(321) 786 2543</tel>
  • <fax>(321) 786 2544</fax>
  • <tel>(321) 786 2544</tel>
  • <email>homer@math.springfield.edu</email>
  • </person>

6
Specifying the Structure
  • name: a name element
  • greet?: an optional greet element (0 or 1
    occurrences)
  • name, greet?: a name followed by an optional
    greet

7
Specifying the Structure (cont'd)
  • addr*: 0 or more address lines
  • tel | fax: a tel or a fax element
  • (tel | fax)*: 0 or more repeats of tel or fax
  • email*: 0 or more email elements

8
Specifying the Structure (cont'd)
  • So the whole structure of a person entry is
    specified by
  • name, greet?, addr*, (tel | fax)*, email*
  • This is known as a regular expression
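
Because the content model is a regular expression, it can be tried out with an ordinary regex engine. The following is only a minimal sketch, assuming the model is name, greet?, addr*, (tel | fax)*, email*; the encoding of each child tag as "tag," and the names PERSON_MODEL and matches_person are illustrative choices, not part of any DTD tool.

```python
import re

# Content model: name, greet?, addr*, (tel | fax)*, email*
# Each child tag is encoded as "tag," so the model becomes a plain regex.
PERSON_MODEL = re.compile(r"name,(greet,)?(addr,)*((tel|fax),)*(email,)*")

def matches_person(child_tags):
    """Return True if the sequence of child tag names fits the person model."""
    encoded = "".join(tag + "," for tag in child_tags)
    return PERSON_MODEL.fullmatch(encoded) is not None

# The Homer Simpson entry from the address-book example.
homer = ["name", "greet", "addr", "addr", "tel", "fax", "tel", "email"]

print(matches_person(homer))                     # True
print(matches_person(["greet", "name"]))         # False: name must come first
print(matches_person(["name", "email", "tel"]))  # False: tel after email
```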

9
Element Type Definition
  • For each element type E, a declaration of the
    form
  • <!ELEMENT E P>
  • where P is a regular expression, i.e.,
  • P ::= EMPTY | ANY | #PCDATA | E |
  •       (P1, P2) | (P1 | P2) | P? | P+ | P*
  • E: element type
  • P1, P2: concatenation
  • P1 | P2: disjunction
  • P?: optional
  • P+: one or more occurrences
  • P*: the Kleene closure

10
Summary of Regular Expressions
  • A: the tag (i.e., element) A occurs
  • e1, e2: the expression e1 followed by e2
  • e*: 0 or more occurrences of e
  • e?: optional, 0 or 1 occurrences
  • e+: 1 or more occurrences
  • e1 | e2: either e1 or e2
  • (e): grouping

11
The Definition of an Element Consists of Exactly One of the Following
  • A regular expression (as defined earlier)
  • EMPTY means that the element has no content
  • ANY means that content can be any mixture of
    #PCDATA and elements defined in the DTD
  • Mixed content, which is defined as described on
    the next slide
  • (#PCDATA)

12
The Definition of Mixed Content
  • Mixed content is described by a repeatable OR
    group
  • (#PCDATA | element-name | ...)*
  • Inside the group, no regular expressions are
    allowed, just element names
  • #PCDATA must be first, followed by 0 or more
    element names, separated by |
  • The group can be repeated 0 or more times
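
To see what mixed content looks like once parsed, here is a small sketch using Python's standard xml.etree.ElementTree. The element names p and b are hypothetical, standing in for a declaration such as (#PCDATA | b)*; the helper mixed_items is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical element <p> whose content model would be (#PCDATA | b)*
p = ET.fromstring("<p>Hello <b>bold</b> world</p>")

def mixed_items(elem):
    """List text chunks and child element names of elem in document order."""
    items = []
    if elem.text:                      # text before the first child
        items.append(("text", elem.text))
    for child in elem:
        items.append(("element", child.tag))
        if child.tail:                 # text that follows this child
            items.append(("text", child.tail))
    return items

print(mixed_items(p))
# [('text', 'Hello '), ('element', 'b'), ('text', ' world')]
```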

13
An Address-Book XML Document with an Internal DTD
  • <?xml version="1.0" encoding="UTF-8"?>
  • <!DOCTYPE addressbook [
  • <!ELEMENT addressbook (person*)>
  • <!ELEMENT person
  • (name, greet?, address*, (fax | tel)*,
    email*)>
  • <!ELEMENT name (#PCDATA)>
  • <!ELEMENT greet (#PCDATA)>
  • <!ELEMENT address (#PCDATA)>
  • <!ELEMENT tel (#PCDATA)>
  • <!ELEMENT fax (#PCDATA)>
  • <!ELEMENT email (#PCDATA)>
  • ]>

The syntax of a DTD is not XML syntax
14
The Rest of the Address-Book XML Document
<addressbook>
  <person>
    <name>Jeff Cohen</name>
    <greet>Dr. Cohen</greet>
    <email>jc@penny.com</email>
  </person>
</addressbook>
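
Putting the two halves together gives a complete document with an internal DTD. As a rough sketch, it can be loaded with Python's standard xml.etree.ElementTree; note that this parser only checks well-formedness and does not validate against the DTD, so the declarations are read but not enforced. The variable name DOC is illustrative.

```python
import xml.etree.ElementTree as ET

# Internal DTD followed by the document body.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, greet?, address*, (fax | tel)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<addressbook>
  <person>
    <name>Jeff Cohen</name>
    <greet>Dr. Cohen</greet>
    <email>jc@penny.com</email>
  </person>
</addressbook>
"""

root = ET.fromstring(DOC)   # well-formedness check only, no DTD validation
for person in root.findall("person"):
    print([child.tag for child in person])   # ['name', 'greet', 'email']
    print(person.findtext("name"))           # Jeff Cohen
```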
15
Regular Expressions
  • Each regular expression determines a
    corresponding finite-state automaton
  • Let's start with a simpler example
  • name, addr*, email

A double circle denotes an accepting state
This suggests a simple parsing program
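
One possible shape for such a parsing program, assuming the simpler example is name, addr*, email: a transition table plus a loop over the child tags. The state numbering and the names TRANSITIONS, ACCEPTING and accepts are illustrative, not taken from the slides.

```python
# Finite-state automaton for: name, addr*, email
# State 0: start; state 1: name seen (addr may repeat); state 2: email seen.
TRANSITIONS = {
    (0, "name"): 1,
    (1, "addr"): 1,
    (1, "email"): 2,
}
ACCEPTING = {2}   # the "double circle" state

def accepts(tags):
    """Run the automaton over a sequence of child tag names."""
    state = 0
    for tag in tags:
        key = (state, tag)
        if key not in TRANSITIONS:
            return False          # no transition: reject immediately
        state = TRANSITIONS[key]
    return state in ACCEPTING

print(accepts(["name", "addr", "addr", "email"]))  # True
print(accepts(["name", "email"]))                  # True
print(accepts(["name", "addr"]))                   # False: email is missing
print(accepts(["addr", "name", "email"]))          # False: wrong order
```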
16
Another Example
  • name, address*, (tel | fax)*, email*

17
Some Things are Hard to Specify
  • Each employee element should contain name, age
    and ssn elements in some order
  • <!ELEMENT employee
  • ( (name, age, ssn) | (age, ssn, name) |
  • (ssn, name, age) | ...
  • )>
  • Suppose that there were many more fields!

18
Some Things are Hard to Specify (cont'd)
  • <!ELEMENT employee
  • ( (name, age, ssn) | (age, ssn, name) |
  • (ssn, name, age) | ...
  • )>
  • Suppose there were many more fields!

There are n! different orders of n elements, so the
size of the declaration is not even polynomial in n.
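
The growth can be made concrete with a short sketch that writes out the "any order" content model for a list of fields. The helper name any_order_model is hypothetical; it simply enumerates every permutation as one alternative.

```python
from itertools import permutations
from math import factorial

def any_order_model(fields):
    """Build a DTD content model that allows the fields in any order."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(fields)]
    return "(" + " | ".join(alternatives) + ")"

model = any_order_model(["name", "age", "ssn"])
print(model)
# ((name, age, ssn) | (name, ssn, age) | (age, name, ssn) | (age, ssn, name) | (ssn, name, age) | (ssn, age, name))

# Number of alternatives needed for n fields.
for n in (3, 5, 10):
    print(n, factorial(n))   # 3 6, then 5 120, then 10 3628800
```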
19
Specifying Attributes in the DTD
  • <!ELEMENT height (#PCDATA)>
  • <!ATTLIST height
  • dimension CDATA #REQUIRED
  • accuracy CDATA #IMPLIED>
  • The dimension attribute is required
  • The accuracy attribute is optional
  • CDATA is the type of the attribute; it means
    character data, and may take any literal string
    as a value

20
The Format of an Attribute Definition
  • <!ATTLIST element-name attr-name attr-type
    default-value>
  • The default value is given inside quotes
  • attribute types
  • CDATA
  • ID, IDREF, IDREFS

21
Summary of Attribute Default Values
  • #REQUIRED means that the attribute must be
    included in the element
  • #IMPLIED means that the attribute is optional
  • #FIXED "value"
  • The given value (inside quotes) is the only
    possible one
  • "value"
  • The default value of the attribute if none is
    given
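
The four kinds of default can be mimicked in a few lines. This is a simplified model rather than a real DTD processor: the table HEIGHT_ATTRS reuses dimension and accuracy from the height example, while version and scale are hypothetical attributes added only to show #FIXED and a plain default. Undeclared attributes are not checked here.

```python
# Declaration table: attribute name -> (kind, value).
# kind is one of "REQUIRED", "IMPLIED", "FIXED", "DEFAULT".
HEIGHT_ATTRS = {
    "dimension": ("REQUIRED", None),
    "accuracy": ("IMPLIED", None),
    "version": ("FIXED", "1.0"),      # hypothetical, for illustration
    "scale": ("DEFAULT", "metric"),   # hypothetical, for illustration
}

def apply_declarations(decls, attrs):
    """Check attrs against decls and fill in fixed and default values."""
    result = dict(attrs)
    for name, (kind, value) in decls.items():
        if kind == "REQUIRED" and name not in attrs:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "FIXED":
            if name in attrs and attrs[name] != value:
                raise ValueError(f"attribute {name!r} must be {value!r}")
            result[name] = value
        if kind == "DEFAULT" and name not in attrs:
            result[name] = value
        # IMPLIED: nothing to do, the attribute may simply be absent
    return result

print(apply_declarations(HEIGHT_ATTRS, {"dimension": "cm"}))
# {'dimension': 'cm', 'version': '1.0', 'scale': 'metric'}
```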

22
Recursive DTDs
  • <!DOCTYPE genealogy [
  • <!ELEMENT genealogy (person*)>
  • <!ELEMENT person (
  • name,
  • dateOfBirth,
  • person, -- mother
  • person )> -- father
  • ...
  • ]>
  • What is the problem with this?
  • A parser does not notice it!

Each person should have a father and a mother.
This leads to either infinite data or a person
that is a descendant of herself.
23
Recursive DTDs (cont'd)
  • <!DOCTYPE genealogy [
  • <!ELEMENT genealogy (person*)>
  • <!ELEMENT person (
  • name,
  • dateOfBirth,
  • person?, -- mother
  • person? )> -- father
  • ...
  • ]>
  • What is the problem with this now?

If a person only has a father, how can you
tell that he has a father and does not have a
mother?
24
Using ID and IDREF Attributes
  • <!DOCTYPE family [
  • <!ELEMENT family (person*)>
  • <!ELEMENT person (name)>
  • <!ELEMENT name (#PCDATA)>
  • <!ATTLIST person
  • id ID #REQUIRED
  • mother IDREF #IMPLIED
  • father IDREF #IMPLIED
  • children IDREFS #IMPLIED>
  • ]>

25
IDs and IDREFs
  • ID attribute: unique within the entire document.
  • An element can have at most one ID attribute.
  • No default (fixed default) value is allowed.
  • #REQUIRED: a value must be provided
  • #IMPLIED: a value is optional
  • IDREF attribute: its value must be some other
    element's ID value in the document.
  • IDREFS attribute: its value is a set; each
    element of the set is the ID value of some other
    element in the document.
  • <person id="p898" father="p332" mother="p336"
  • children="p982 p984 p986">

26
Some Conforming Data
  • <family>
  • <person id="lisa" mother="marge"
    father="homer">
  • <name>Lisa Simpson</name>
  • </person>
  • <person id="bart" mother="marge"
    father="homer">
  • <name>Bart Simpson</name>
  • </person>
  • <person id="marge" children="bart lisa">
  • <name>Marge Simpson</name>
  • </person>
  • <person id="homer" children="bart lisa">
  • <name>Homer Simpson</name>
  • </person>
  • </family>

27
ID References do not Have Types
  • The attributes mother and father are references
    to IDs of other elements
  • However, those are not necessarily person
    elements!
  • The mother attribute is not necessarily a
    reference to a female person

28
An Alternative Specification
  • <?xml version="1.0" encoding="UTF-8"?>
  • <!DOCTYPE family [
  • <!ELEMENT family (person*)>
  • <!ELEMENT person (name, mother?, father?,
    children?)>
  • <!ATTLIST person id ID #REQUIRED>
  • <!ELEMENT name (#PCDATA)>
  • <!ELEMENT mother EMPTY>
  • <!ATTLIST mother idref IDREF #REQUIRED>
  • <!ELEMENT father EMPTY>
  • <!ATTLIST father idref IDREF #REQUIRED>
  • <!ELEMENT children EMPTY>
  • <!ATTLIST children idrefs IDREFS #REQUIRED>
  • ]>

29
The Revised Data
  • <family>
  • <person id="marge">
  • <name>Marge Simpson</name>
  • <children idrefs="bart lisa"/>
  • </person>
  • <person id="homer">
  • <name>Homer Simpson</name>
  • <children idrefs="bart lisa"/>
  • </person>
  • <person id="bart">
  • <name>Bart Simpson</name>
  • <mother idref="marge"/>
  • <father idref="homer"/>
  • </person>
  • <person id="lisa">
  • <name>Lisa Simpson</name>
  • <mother idref="marge"/>
  • <father idref="homer"/>
  • </person>
  • </family>

30
Consistency of ID and IDREF Attribute Values
  • If an attribute is declared as ID
  • The associated value must be distinct, i.e.,
    different elements (in the given document) must
    have different values for the ID attribute (no
    confusion)
  • Even if the two elements have different element
    names
  • If an attribute is declared as IDREF
  • The associated value must exist as the value of
    some ID attribute (no dangling pointers)
  • Similarly for all the values of an IDREFS
    attribute
  • ID, IDREF and IDREFS attributes are not typed
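
These two constraints are easy to check by hand. The sketch below is a simplified stand-in for what a validating parser does: it assumes attribute types are looked up by attribute name alone (the table ATTR_TYPES mirrors the person declaration with id, mother, father and children), and the function name check_references is illustrative.

```python
import xml.etree.ElementTree as ET

# Attribute name -> declared type (simplification: not keyed by element).
ATTR_TYPES = {"id": "ID", "mother": "IDREF", "father": "IDREF", "children": "IDREFS"}

def check_references(root, attr_types):
    """Report duplicate IDs and dangling IDREF/IDREFS values."""
    errors, ids, refs = [], set(), []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            kind = attr_types.get(name)
            if kind == "ID":
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            elif kind == "IDREFS":
                refs.extend(value.split())   # whitespace-separated list
    for ref in refs:
        if ref not in ids:
            errors.append(f"dangling reference {ref!r}")
    return errors

good = ET.fromstring(
    '<family>'
    '<person id="lisa" mother="marge" father="homer"><name>Lisa Simpson</name></person>'
    '<person id="marge" children="lisa"><name>Marge Simpson</name></person>'
    '<person id="homer" children="lisa"><name>Homer Simpson</name></person>'
    '</family>'
)
bad = ET.fromstring(
    '<family>'
    '<person id="bart" mother="marge"><name>Bart Simpson</name></person>'
    '<person id="bart"><name>Another Bart</name></person>'
    '</family>'
)

print(check_references(good, ATTR_TYPES))  # []
print(check_references(bad, ATTR_TYPES))
# ["duplicate ID 'bart'", "dangling reference 'marge'"]
```

Because the references are untyped, this check would also accept father="marge": it only asks whether the target ID exists, not what kind of element carries it.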

31
Adding a DTD to the Document
  • A DTD can be internal
  • The DTD is part of the document file
  • or external
  • The DTD and the document are in separate files
  • An external DTD may reside
  • In the local file system
  • (where the document is)
  • In a remote file system

32
Connecting a Document with its DTD
  • An internal DTD
  • <?xml version="1.0"?>
  • <!DOCTYPE db [<!ELEMENT ...> ]>
  • <db> ... </db>
  • A DTD from the local file system
  • <!DOCTYPE db SYSTEM "schema.dtd">
  • A DTD from a remote file system
  • <!DOCTYPE db SYSTEM
    "http://www.schemaauthority.com/schema.dtd">

33
Well-Formed XML Documents
  • An XML document (with or without a DTD) is
    well-formed if
  • Tags are syntactically correct
  • Every start tag has a matching end tag
  • Tags are properly nested
  • There is a single root element
  • A start tag does not have two occurrences of the
    same attribute

An XML document must be well formed
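
Most of these conditions can be observed with any XML parser, since a parser must reject input that is not well-formed. A minimal sketch with Python's standard xml.etree.ElementTree follows; the helper name is_well_formed and the db sample documents are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<db><a>1</a></db>"))   # True
print(is_well_formed("<db><a>1</db></a>"))   # False: improperly nested
print(is_well_formed("<db><a>1</a>"))        # False: missing end tag
print(is_well_formed('<db x="1" x="2"/>'))   # False: duplicate attribute
print(is_well_formed("<a/><b/>"))            # False: more than one root
```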
34
Valid Documents
  • A well-formed XML document is valid if it
    conforms to its DTD, that is,
  • The document conforms to the regular-expression
    grammar,
  • The types of attributes are correct, and
  • The constraints on references are satisfied