XML - PowerPoint PPT Presentation

Provided by: jeff456

Title: XML


1
XML
  • Semistructured Data
  • Extensible Markup Language
  • Document Type Definitions

2
Semistructured Data
  • Another data model, based on trees.
  • Motivation: flexible representation of data.
  • Often, data comes from multiple sources with
    differences in notation, meaning, etc.
  • Motivation: sharing of documents among systems
    and databases.

3
The Information-Integration Problem
  • Related data exists in many places and could, in
    principle, work together.
  • But different databases differ in:
    • Model (relational, object-oriented?).
    • Schema (normalized/unnormalized?).
    • Terminology: are consultants employees?
      Retirees? Subcontractors?
    • Conventions (meters versus feet?).

4
Example
  • Every bar has a database.
  • One may use a relational DBMS; another keeps the
    menu in an MS-Word document.
  • One stores the phones of distributors, another
    does not.
  • One distinguishes ales from other beers, another
    doesn't.
  • One counts beer inventory by bottles, another by
    cases.

5
Two Approaches to Integration
  • Warehousing: Make copies of the source data at
    a central site and transform them to a common
    schema.
    • Reconstruct data daily/weekly, but do not try to
      keep it more up-to-date than that.
  • Mediation: Create a view of all sources, as if
    they were integrated.
    • Answer a view query by translating it to the
      terminology of the sources and querying them.
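
The contrast between the two approaches can be sketched in a few lines of Python. Everything here is hypothetical: the two in-memory sources, the inventory figures, and the 24-bottles-per-case conversion are assumptions chosen only to echo the bottles-versus-cases example.

```python
# Hypothetical sources: one bar counts inventory in bottles, another in cases.
SOURCE_1 = {"Bud": 48}    # bottles
SOURCE_2 = {"Bud": 3}     # cases
BOTTLES_PER_CASE = 24     # assumed conversion factor


def wrapper_1(beer):
    """Wrapper for source 1: values are already in bottles."""
    return SOURCE_1.get(beer, 0)


def wrapper_2(beer):
    """Wrapper for source 2: convert cases to the common unit (bottles)."""
    return SOURCE_2.get(beer, 0) * BOTTLES_PER_CASE


def mediator_total(beer):
    """Mediation: translate the query and ask every source at query time."""
    return wrapper_1(beer) + wrapper_2(beer)


def build_warehouse():
    """Warehousing: copy and transform all data once, then query the copy."""
    beers = set(SOURCE_1) | set(SOURCE_2)
    return {beer: wrapper_1(beer) + wrapper_2(beer) for beer in beers}
```

If a source changes after `build_warehouse()` runs, the warehouse copy stays stale until it is rebuilt, while `mediator_total` sees the change on the next query.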

6
Warehouse Diagram
[Diagram: a Warehouse fed by two Wrappers, one for Source 1 and one for Source 2.]

7
A Mediator
[Diagram: a Mediator that queries two Wrappers, one for Source 1 and one for Source 2.]

8
Graphs of Semistructured Data
  • Nodes = objects.
  • Labels on arcs (attributes, relationships).
  • Atomic values at leaf nodes (nodes with no arcs
    out).
  • Flexibility: no restriction on:
    • Labels out of a node.
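
One way to picture such a graph in code is a nested dictionary, where each node maps an arc label to a list of successors. This is only a sketch: the node contents below are hypothetical, and a graph with shared or cyclic nodes would also need a visited set during traversal.

```python
# Hypothetical data graph: each node is a dict mapping an arc label to a list
# of successor nodes; atomic values (str, int) are the leaves.
bud = {"name": ["Bud"], "prize": [{"year": [1995], "award": ["Gold"]}]}
miller = {"name": ["Miller"]}     # no prize arc: labels are not prescribed
root = {"beer": [bud, miller]}    # the same label may leave a node twice


def leaves(node):
    """Yield every atomic value reachable from node, depth first."""
    if not isinstance(node, dict):
        yield node
        return
    for successors in node.values():
        for child in successors:
            yield from leaves(child)
```

Here `list(leaves(root))` gives `["Bud", 1995, "Gold", "Miller"]`, and nothing forces the two beer nodes to carry the same labels.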

9
Example Data Graph
[Figure: a data graph rooted at a node "root". Arc labels: beer (twice), bar, manf (twice), prize, name (three times), year, award, servedAt, addr. Leaf values: A.B., Bud, Gold, 1995, Miller, Maple, Joe's.]

10
XML
  • XML = Extensible Markup Language.
  • While HTML uses tags for formatting (e.g.,
    "italic"), XML uses tags for semantics (e.g.,
    "this is an address").
  • Key idea: create tag sets for a domain (e.g.,
    genomics), and translate all data into properly
    tagged XML documents.

11
Well-Formed and Valid XML
  • Well-Formed XML allows you to invent your own
    tags.
  • Similar to labels in semistructured data.
  • Valid XML involves a DTD (Document Type
    Definition), a grammar for tags.

12
Well-Formed XML
  • Start the document with a declaration, surrounded
    by <?xml ... ?>.
  • Normal declaration is:
    • <?xml version="1.0" standalone="yes"?>
    • "Standalone" = "no DTD provided."
  • Balance of document is a root tag surrounding
    nested tags.

13
Tags
  • Tags, as in HTML, are normally matched pairs, as
    <FOO> ... </FOO>.
  • Tags may be nested arbitrarily.
  • XML tags are case sensitive.
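
These rules can be checked mechanically. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser as the checker (the helper name `is_well_formed` is made up for this illustration):

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

With this helper, `<FOO><BAR></BAR></FOO>` is accepted, while `<FOO></foo>` (case mismatch) and `<FOO><BAR></FOO></BAR>` (improper nesting) are rejected.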

14
Example Well-Formed XML
  • <?xml version="1.0" standalone="yes"?>
  • <BARS>
  •   <BAR><NAME>Joe's Bar</NAME>
  •     <BEER><NAME>Bud</NAME>
  •       <PRICE>2.50</PRICE></BEER>
  •     <BEER><NAME>Miller</NAME>
  •       <PRICE>3.00</PRICE></BEER>
  •   </BAR>
  •   <BAR> . . .
  • </BARS>
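
As a rough illustration of how a program might read such a document, the sketch below parses a one-bar version of the example with Python's standard library and collects a price list. The function name `price_list` is an assumption of this sketch, and the elided second bar is left out so that the text parses.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""


def price_list(text):
    """Map (bar name, beer name) to the price found in the document."""
    root = ET.fromstring(text)
    prices = {}
    for bar in root.findall("BAR"):
        bar_name = bar.findtext("NAME")
        for beer in bar.findall("BEER"):
            prices[(bar_name, beer.findtext("NAME"))] = float(beer.findtext("PRICE"))
    return prices
```

Note that `bar.findtext("NAME")` looks only at direct children, so the bar's NAME is not confused with the NAME nested inside each BEER.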

15
XML and Semistructured Data
  • Well-Formed XML with nested tags is exactly the
    same idea as trees of semistructured data.
  • We shall see that XML also enables nontree
    structures, as does the semistructured data model.
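
The tree correspondence can be made concrete with a small sketch that walks a parsed document and prints tags as interior nodes and text as leaves. The helper name `outline` is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def outline(elem, depth=0):
    """Return the element tree as indented lines: tags as nodes, text as leaves."""
    lines = ["  " * depth + elem.tag]
    text = (elem.text or "").strip()
    if text:
        lines.append("  " * (depth + 1) + text)
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines
```

For `<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>` this yields the lines BEER, NAME, Bud, PRICE, 2.50, each indented by its depth in the tree.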

16
Example
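  • The <BARS> XML document is:
[Tree diagram: root BARS with three BAR children. The first BAR has a NAME leaf "Joe's Bar" and two BEER children, each with a NAME and a PRICE leaf (Bud, 2.50; Miller, 3.00). The other BAR subtrees are elided (". . .").]
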
17
DTD Structure
  • <!DOCTYPE <root tag> [
  •   <!ELEMENT <name> (<components>)>
  •   . . . more elements . . .
  • ]>

18
DTD Elements
  • The description of an element consists of its
    name (tag), and a parenthesized description of
    any nested tags.
  • Includes order of subtags and their multiplicity.
  • Leaves (text elements) have #PCDATA (Parsed
    Character DATA) in place of nested tags.

19
Example DTD
  • <!DOCTYPE BARS [
  •   <!ELEMENT BARS (BAR*)>
  •   <!ELEMENT BAR (NAME, BEER+)>
  •   <!ELEMENT NAME (#PCDATA)>
  •   <!ELEMENT BEER (NAME, PRICE)>
  •   <!ELEMENT PRICE (#PCDATA)>
  • ]>

20
Element Descriptions
  • Subtags must appear in order shown.
  • A tag may be followed by a symbol to indicate its
    multiplicity.
    • * = zero or more.
    • + = one or more.
    • ? = zero or one.
  • Symbol | can connect alternative sequences of
    tags.

21
Example Element Description
  • A name is an optional title (e.g., "Prof."), a
    first name, and a last name, in that order, or it
    is an IP address:
  • <!ELEMENT NAME (
  •   (TITLE?, FIRST, LAST) | IPADDR
  • )>
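
Element descriptions behave much like regular expressions over the sequence of child tags. The sketch below leans on that analogy: it is not how a validating parser is implemented, and the helper `children_match` and the hand-written patterns are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def children_match(elem, model):
    """Check elem's child-tag sequence against a regex standing in for a
    DTD element description. Each tag is followed by one space."""
    sequence = "".join(child.tag + " " for child in elem)
    return re.fullmatch(model, sequence) is not None


# (TITLE?, FIRST, LAST) | IPADDR, written over space-terminated tag names.
NAME_MODEL = r"(?:(?:TITLE )?FIRST LAST )|(?:IPADDR )"

# (NAME, BEER+): one NAME followed by one or more BEER.
BAR_MODEL = r"NAME (?:BEER )+"
```

Under these patterns a NAME with children FIRST, LAST matches, one with only IPADDR matches, and one with LAST before FIRST does not, because subtags must appear in the order shown.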

22
Use of DTDs
  • Set standalone="no".
  • Either:
    • Include the DTD as a preamble of the XML
      document, or
    • Follow DOCTYPE and the <root tag> by SYSTEM and a
      path to the file where the DTD can be found.

23
Example (a)
  • <?xml version="1.0" standalone="no"?>
  • <!DOCTYPE BARS [
  •   <!ELEMENT BARS (BAR*)>
  •   <!ELEMENT BAR (NAME, BEER+)>
  •   <!ELEMENT NAME (#PCDATA)>
  •   <!ELEMENT BEER (NAME, PRICE)>
  •   <!ELEMENT PRICE (#PCDATA)>
  • ]>
  • <BARS>
  •   <BAR><NAME>Joe's Bar</NAME>
  •     <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
  •     <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  •   </BAR>
  •   <BAR> . . .
  • </BARS>

24
Example (b)
  • Assume the BARS DTD is in file bar.dtd.
  • <?xml version="1.0" standalone="no"?>
  • <!DOCTYPE BARS SYSTEM "bar.dtd">
  • <BARS>
  •   <BAR><NAME>Joe's Bar</NAME>
  •     <BEER><NAME>Bud</NAME>
  •       <PRICE>2.50</PRICE></BEER>
  •     <BEER><NAME>Miller</NAME>
  •       <PRICE>3.00</PRICE></BEER>
  •   </BAR>
  •   <BAR> . . .
  • </BARS>
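
The two ways of attaching a DTD can be told apart by inspecting the DOCTYPE. A minimal sketch, assuming Python's standard `xml.dom.minidom`; the shortened documents and the helper name `doctype_info` are made up for this illustration. The standard-library parsers read the DOCTYPE but do not validate the document against it, and this sketch never opens bar.dtd.

```python
from xml.dom import minidom

# Style (a): the DTD is included as a preamble (shortened, hypothetical DTD).
INTERNAL = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME)>
  <!ELEMENT NAME (#PCDATA)>
]>
<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>"""

# Style (b): the DOCTYPE points at a file with SYSTEM.
EXTERNAL = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>"""


def doctype_info(text):
    """Return (root tag named in DOCTYPE, SYSTEM path or None)."""
    doctype = minidom.parseString(text).doctype
    return doctype.name, doctype.systemId
```

For `EXTERNAL` the result is `("BARS", "bar.dtd")`; for `INTERNAL` the root tag is the same but there is no system path.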

25
Attributes
  • Opening tags in XML can have attributes.
  • In a DTD,
  • <!ATTLIST E . . . >
  • declares an attribute for element E, along with
    its datatype.

26
Example Attributes
  • Bars can have an attribute kind, a character
    string describing the bar.
  • <!ELEMENT BAR (NAME, BEER*)>
  • <!ATTLIST BAR kind CDATA #IMPLIED>

27
Example Attribute Use
  • In a document that allows BAR tags, we might see:
  • <BAR kind="sushi">
  •   <NAME>Akasaka</NAME>
  •   <BEER><NAME>Sapporo</NAME>
  •     <PRICE>5.00</PRICE></BEER>
  •   ...
  • </BAR>
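
Reading such an attribute from a program might look like the sketch below, which assumes Python's standard `xml.etree.ElementTree`. Because kind is declared #IMPLIED, it is optional, so the code has to cope with its absence. The helper name `bar_kind` is hypothetical.

```python
import xml.etree.ElementTree as ET


def bar_kind(bar_xml):
    """Return the value of the kind attribute, or None if the BAR omits it."""
    bar = ET.fromstring(bar_xml)
    return bar.get("kind")
```

For `<BAR kind="sushi"><NAME>Akasaka</NAME></BAR>` this returns `"sushi"`; for the same element without the attribute it returns `None`.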

28
IDs and IDREFs
  • Attributes can be pointers from one object to
    another.
  • Compare to HTML's NAME="foo" and HREF="#foo".
  • Allows the structure of an XML document to be a
    general graph, rather than just a tree.

29
Creating IDs
  • Give an element E an attribute A of type ID.
  • When using tag <E> in an XML document, give its
    attribute A a unique value.
  • Example:
  • <E A="xyz">

30
Creating IDREFs
  • To allow objects of type F to refer to another
    object with an ID attribute, give F an attribute
    of type IDREF.
  • Or, let the attribute have type IDREFS, so the F
    object can refer to any number of other objects.

31
Example IDs and IDREFs
  • Let's redesign our BARS DTD to include both BAR
    and BEER subelements.
  • Both bars and beers will have ID attributes
    called name.
  • Bars have SELLS subobjects, consisting of a
    number (the price of one beer) and an IDREF
    theBeer leading to that beer.
  • Beers have attribute soldBy, which is an IDREFS
    leading to all the bars that sell it.

32
The DTD
  • <!DOCTYPE BARS [
  •   <!ELEMENT BARS (BAR*, BEER*)>
  •   <!ELEMENT BAR (SELLS+)>
  •   <!ATTLIST BAR name ID #REQUIRED>
  •   <!ELEMENT SELLS (#PCDATA)>
  •   <!ATTLIST SELLS theBeer IDREF #REQUIRED>
  •   <!ELEMENT BEER EMPTY>
  •   <!ATTLIST BEER name ID #REQUIRED>
  •   <!ATTLIST BEER soldBy IDREFS #IMPLIED>
  • ]>

33
Example Document
  • <BARS>
  •   <BAR name="JoesBar">
  •     <SELLS theBeer="Bud">2.50</SELLS>
  •     <SELLS theBeer="Miller">3.00</SELLS>
  •   </BAR>
  •   <BEER name="Bud" soldBy="JoesBar SuesBar" />
  • </BARS>
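
Following these pointers in a program amounts to a lookup by ID. The sketch below assumes Python's standard `xml.etree.ElementTree`, which treats ID and IDREF values as ordinary strings, so the index is built by hand. The SuesBar element, its 2.75 price, and the Miller element are hypothetical additions that make every reference resolvable.

```python
import xml.etree.ElementTree as ET

DOC = """<BARS>
  <BAR name="JoesBar">
    <SELLS theBeer="Bud">2.50</SELLS>
    <SELLS theBeer="Miller">3.00</SELLS>
  </BAR>
  <BAR name="SuesBar">
    <SELLS theBeer="Bud">2.75</SELLS>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
  <BEER name="Miller" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(DOC)

# Index every element that carries the ID attribute "name".
# A non-validating parser does not check that these values are unique.
by_id = {elem.get("name"): elem for elem in root.iter() if elem.get("name")}


def beers_sold(bar_id):
    """Follow each SELLS theBeer IDREF from a bar to its BEER element."""
    return [(by_id[s.get("theBeer")].get("name"), float(s.text))
            for s in by_id[bar_id].findall("SELLS")]


def sellers(beer_id):
    """Follow the soldBy IDREFS (space-separated IDs) from a beer to its bars."""
    return [by_id[ref].get("name") for ref in by_id[beer_id].get("soldBy", "").split()]
```

Here `beers_sold("JoesBar")` gives `[("Bud", 2.5), ("Miller", 3.0)]` and `sellers("Bud")` gives `["JoesBar", "SuesBar"]`, so the document can be traversed as a graph in both directions.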

34
Empty Elements
  • We can do all the work of an element in its
    attributes.
  • Like BEER in previous example.
  • Another example: SELLS elements could have
    attribute price rather than a value that is a
    price.

35
Example Empty Element
  • In the DTD, declare:
  • <!ELEMENT SELLS EMPTY>
  • <!ATTLIST SELLS theBeer IDREF #REQUIRED>
  • <!ATTLIST SELLS price CDATA #REQUIRED>
  • Example use:
  • <SELLS theBeer="Bud" price="2.50"/>
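
The two designs for SELLS carry the same information. A small sketch, assuming Python's standard `xml.etree.ElementTree` and a made-up helper `sells_price`, reads the price from either form:

```python
import xml.etree.ElementTree as ET


def sells_price(sells_xml):
    """Read the price from the price attribute if present, else from the text."""
    elem = ET.fromstring(sells_xml)
    raw = elem.get("price") if elem.get("price") is not None else elem.text
    return float(raw)
```

Both `<SELLS theBeer="Bud" price="2.50"/>` and `<SELLS theBeer="Bud">2.50</SELLS>` give 2.5.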