From Semistructured Data to XML: Migrating The Lore Data Model and Query Language

1
From Semistructured Data to XML: Migrating The
Lore Data Model and Query Language
  • Roy Goldman, Jason McHugh, Jennifer Widom
  • Stanford University

http://www-db.stanford.edu/lore/
2
Introduction
  • Lore
  • Originally a DBMS designed specifically for
    semistructured data
  • Semistructured data models and XML share many
    similarities
  • Migrating Lore to work with XML
  • Modifications to data model
  • Changes to query language
  • Changes to DataGuides

3
OEM (Object Exchange Model)
  • Lore's original data model
  • All entities are atomic or complex objects
  • Each object has a unique object identifier (oid)
  • Atomic objects contain a value from one of the
    atomic types (integer, real, string, etc.)
  • Complex objects are sets of <label, subobject>
    pairs
  • Can be thought of as a labeled directed graph
  • objects are nodes
  • complex objects have labeled outgoing edges
  • atomic objects contain their value
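
As a rough illustration of the OEM description above, the following Python sketch models atomic and complex objects as nodes of a labeled directed graph. The class names, the oid counter, and the sample labels are assumptions made for this example only and are not taken from Lore.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Union

# Hypothetical oid generator; real OEM oids are assigned by the DBMS.
_next_oid = count(1)


@dataclass
class Atomic:
    """An atomic object: a unique oid plus a value of an atomic type."""
    value: Union[int, float, str]
    oid: int = field(default_factory=lambda: next(_next_oid))


@dataclass
class Complex:
    """A complex object: a unique oid plus <label, subobject> pairs."""
    # Stored as a list of tuples because these objects are not hashable.
    children: list = field(default_factory=list)
    oid: int = field(default_factory=lambda: next(_next_oid))

    def add(self, label, obj):
        """Add a labeled outgoing edge to obj and return obj."""
        self.children.append((label, obj))
        return obj

    def follow(self, label):
        """Return every subobject reachable over an edge with this label."""
        return [obj for lbl, obj in self.children if lbl == label]


# Sample graph; the labels are illustrative.
group = Complex()
member = group.add("Member", Complex())
member.add("Name", Atomic("Jennifer Widom"))

names = [obj.value for obj in member.follow("Name")]
print(names)
```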

4
Differences between XML and OEM
  • XML has attributes
  • XML is ordered, OEM is not
  • XML does not directly support graph structure
  • Uses special attribute types to encode graph
    structure
  • Example
  • <Person Id="P1" Name="Jeff Ullman" Colleague="P2"/>
  • <Person Id="P2" Name="Jennifer Widom"
    Colleague="P1"/>
  • <Publication Title="A First Course In Database
    Systems" Author="P1 P2"/>
  • Attribute Id is of type ID, Colleague is of type
    IDREF, and Author is of type IDREFS

[Figure: graph of the example; node labels Jeff Ullman and Jennifer Widom, edge labels Colleague and Author]
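
One way to picture how ID, IDREF, and IDREFS attributes encode graph structure is to resolve them by hand. The sketch below parses the three example elements with Python's standard xml.etree.ElementTree and turns the references into edges. The enclosing DB root element and the hand-written tables of attribute types are assumptions of this sketch: a real system would read the attribute types from the DTD.

```python
import xml.etree.ElementTree as ET

# The <DB> wrapper is hypothetical; it only gives the document a single root.
DOC = """
<DB>
  <Person Id="P1" Name="Jeff Ullman" Colleague="P2"/>
  <Person Id="P2" Name="Jennifer Widom" Colleague="P1"/>
  <Publication Title="A First Course In Database Systems" Author="P1 P2"/>
</DB>
"""

# Attribute types declared by hand (assumption: no DTD is being parsed).
ID_ATTRS = ("Id",)
IDREF_ATTRS = ("Colleague", "Author")  # IDREFS values are space-separated


def crosslinks(root):
    """Return (source, attribute name, target) edges for every reference."""
    by_id = {}
    for el in root.iter():
        for name in ID_ATTRS:
            if name in el.attrib:
                by_id[el.attrib[name]] = el
    edges = []
    for el in root.iter():
        for name in IDREF_ATTRS:
            for ref in el.attrib.get(name, "").split():
                edges.append((el, name, by_id[ref]))
    return edges


root = ET.fromstring(DOC)
edges = crosslinks(root)
for src, label, dst in edges:
    print(src.get("Name") or src.get("Title"), "--" + label + "->", dst.get("Name"))
```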
5
Literal vs. Semantic Data Model
  • Should an XML data model be a literal tree
    corresponding to XML's text representation?
    (where IDREF(S) are nothing but string
    attributes)
  • Or should it be a graph that includes all the
    intended links? (preserving the semantic graph
    structure)
  • It should be... BOTH!
  • Both literal and semantic modes should be
    supported
  • The user or application can select between the two

6
Lore's XML Data Model
  • An XML element is a pair <eid, value>
  • eid is a unique element identifier
  • value is either an atomic text string or a
    complex value containing the following four
    components
  • A string-valued tag corresponding to the XML tag
    for that element
  • An ordered list of attribute-name/atomic-value
    pairs (attribute-name is a string, atomic-value
    has an atomic type)
  • An ordered list of crosslink subelements of the
    form <label, eid> where label is a string.
    Crosslink subelements are introduced via an
    attribute of type IDREF(S)
  • An ordered list of normal subelements of the form
    <label, eid> where label is a string. Normal
    subelements are introduced via lexical nesting
    within an XML document
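
The element definition above maps fairly directly onto a pair of record types. The following Python sketch is one possible encoding, not Lore's actual storage layout: the class names, the integer eids, and the choice to keep IDREF attributes in the attribute list alongside the crosslinks they introduce are all assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class ComplexValue:
    """The four components of a complex element value."""
    tag: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    crosslinks: List[Tuple[str, int]] = field(default_factory=list)   # <label, eid>
    subelements: List[Tuple[str, int]] = field(default_factory=list)  # <label, eid>


@dataclass
class Element:
    """An XML element: a pair <eid, value>."""
    eid: int
    value: Union[str, ComplexValue]  # atomic text string or complex value


# Hypothetical eids 1 and 2 for the two Person elements of the earlier example.
store: Dict[int, Element] = {
    1: Element(1, ComplexValue(
        "Person",
        attributes=[("Id", "P1"), ("Name", "Jeff Ullman"), ("Colleague", "P2")],
        crosslinks=[("Colleague", 2)])),
    2: Element(2, ComplexValue(
        "Person",
        attributes=[("Id", "P2"), ("Name", "Jennifer Widom"), ("Colleague", "P1")],
        crosslinks=[("Colleague", 1)])),
}


def follow(store, eid, label):
    """Return the elements reached from eid over edges with this label."""
    value = store[eid].value
    if isinstance(value, str):  # atomic text has no outgoing edges
        return []
    edges = value.crosslinks + value.subelements
    return [store[target] for lbl, target in edges if lbl == label]


colleague = follow(store, 1, "Colleague")[0]
print(colleague.value.tag, dict(colleague.value.attributes)["Name"])
```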

7
XML Document/Graph Example
  • eids appear within nodes (1, 2, etc.)
  • Attributes appear within brackets next to the
    nodes
  • Two types of edges
  • Normal subelement edges labeled with destination
    subelement's tag (solid line)
  • Crosslink edges labeled with the attribute name
    that introduced the link (dashed line)
  • Semantic vs. Literal
  • In semantic mode, omit attributes of type
    IDREF(S)
  • In literal mode, omit crosslink edges
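
The two viewing rules above can be read as a filter over one stored element. The sketch below is an illustration under assumed names: the dictionary layout, the idref_attrs field, and the eid value 2 are hypothetical, and the function only shows which attributes and edges each mode would expose.

```python
def visible(element, mode):
    """Return (attributes, edges) of one element as seen in the given mode."""
    if mode == "semantic":
        # Omit attributes of type IDREF(S); keep the crosslink edges.
        attrs = [(n, v) for n, v in element["attributes"]
                 if n not in element["idref_attrs"]]
        edges = element["subelements"] + element["crosslinks"]
    elif mode == "literal":
        # Keep every attribute as a plain string; omit crosslink edges.
        attrs = list(element["attributes"])
        edges = list(element["subelements"])
    else:
        raise ValueError("mode must be 'semantic' or 'literal'")
    return attrs, edges


# Hypothetical encoding of one Person element; eid 2 is made up.
person = {
    "tag": "Person",
    "attributes": [("Id", "P1"), ("Name", "Jeff Ullman"), ("Colleague", "P2")],
    "idref_attrs": {"Colleague"},
    "crosslinks": [("Colleague", 2)],
    "subelements": [],
}

semantic_view = visible(person, "semantic")
literal_view = visible(person, "literal")
print(semantic_view)
print(literal_view)
```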

8
Migrating Lorel (Lore's query language)
  • Distinguishing between attributes and subelements
  • Lorel uses path expressions
  • A sequence of labels such as
    DBGroup.Member.Project.Title
  • Can also contain wildcards and regular
    expressions
  • Path expression qualifiers differentiate between
    attributes and subelements
  • Placing a > before a label matches subelements
    only
  • Placing a @ before a label matches attributes
    only
  • Absence of qualifier means match both
  • Examples
  • DBGroup.Member.>Name will match name elements
    that are subelements of DBGroup.Member elements
  • DBGroup.Member.@Name will match name attributes
    of DBGroup.Member elements
  • DBGroup.Member.Name will match both
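
To make the three matching behaviours concrete, here is a small Python sketch that evaluates a dotted path over an ElementTree document. It writes the subelement qualifier as > and the attribute qualifier as @, which is this sketch's reading of the qualifiers described above. The match function, the restriction that only the last label may carry a qualifier, and the sample document are all assumptions of the example: this is not Lorel's evaluator.

```python
import xml.etree.ElementTree as ET

# Made-up sample: one member carries Name as an attribute, one as a subelement.
SAMPLE = """
<DBGroup>
  <Member Name="Jeff Ullman"/>
  <Member><Name>Jennifer Widom</Name></Member>
</DBGroup>
"""


def match(root, path):
    """Evaluate a dotted path whose last label may start with > or @."""
    *prefix, last = path.split(".")
    # Walk the leading labels as plain subelement steps.
    nodes = [root] if prefix and prefix[0] == root.tag else []
    for label in prefix[1:]:
        nodes = [child for node in nodes for child in node.findall(label)]
    want_attr = not last.startswith(">")  # ">" restricts to subelements
    want_sub = not last.startswith("@")   # "@" restricts to attributes
    name = last.lstrip(">@")
    results = []
    for node in nodes:
        if want_attr and name in node.attrib:
            results.append(("attribute", node.attrib[name]))
        if want_sub:
            results.extend(("subelement", c.text) for c in node.findall(name))
    return results


root = ET.fromstring(SAMPLE)
only_sub = match(root, "DBGroup.Member.>Name")
only_attr = match(root, "DBGroup.Member.@Name")
both = match(root, "DBGroup.Member.Name")
print(only_sub, only_attr, both, sep="\n")
```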

9
Migrating Lorel (continued...)
  • Comparisons
  • How do we compare two different things? (for
    example, comparing constants with attribute
    values)
  • All XML components are treated as atomic
    values...
  • Functions that transform elements into strings
  • Flatten(e): Ignoring all tags, recursively
    serialize all text values in the subtree rooted
    at element e
  • Concatenate(e): Concatenates all immediate text
    children of element e (subelements are ignored)
  • Tag(e): Returns the XML tag of element e
  • Eid(e): Returns the eid of element e as a string
  • XML(e): Transforms the graph, starting with
    element e, into an XML document
  • Default Semantics (when no functions are
    specified)
  • atomic (Text) element: the text itself
  • elements with no attributes and only one or more
    Text elements as children: concatenation of the
    children's text values
  • all others: the element's eid represented as a
    string
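
The string-conversion functions and the default coercion rule can be approximated over Python's ElementTree, as sketched below. ElementTree stores text in the text and tail fields instead of separate Text elements, so the atomic case has no direct counterpart here. The eids table (numbered in document order) and the sample document are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def flatten(e):
    """Ignore all tags and serialize every text value in the subtree of e."""
    return "".join(e.itertext())


def concatenate(e):
    """Join only the immediate text children of e; subelements are ignored."""
    return "".join([e.text or ""] + [child.tail or "" for child in e])


def tag(e):
    """Return the XML tag of e."""
    return e.tag


def default_value(e, eids):
    """Coerce e to a string when no function is specified."""
    if not e.attrib and len(e) == 0 and e.text:
        # No attributes and only text children: use the text.
        return concatenate(e)
    return str(eids[e])  # all other elements: the eid as a string


# Made-up sample document.
root = ET.fromstring(
    "<Member><Name>Jennifer Widom</Name>"
    "<Office>Gates <Room>R1</Room> Building</Office></Member>"
)
# Hypothetical eids, assigned in document order starting at 1.
eids = {el: i for i, el in enumerate(root.iter(), start=1)}

name, office = root.find("Name"), root.find("Office")
print(flatten(office))
print(repr(concatenate(office)))
print(default_value(name, eids), default_value(office, eids))
```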

10
Migrating Lorel (continued...)
  • Range qualifiers
  • The expression [range] can be optionally applied
    to any path expression component or variable
  • Example: select y from DBGroup.Member x,
    x.Office[1-2] y
  • returns the first two Office subelements of every
    group member
  • Example: select y[1-2] from DBGroup.Member x,
    x.Office y
  • returns the first two Office subelements over ALL
    members
  • Order-by clause
  • Query results are ordered lists of eids that
    identify the elements selected by the query
    (attributes are coerced into elements)
  • order-by-document-order orders results based on
    original XML document
  • Newly constructed elements are placed at the end
    of the document order with no specified order
    among them
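
The difference between the two range examples above is whether the range is applied once per member or once to the whole list of bindings. The Python sketch below mimics both readings with list slicing. The member and office names are made up, and apply_range is a hypothetical helper, not part of Lorel.

```python
# Made-up data: each member paired with its Office subelements, in order.
members = [
    ("m1", ["o1", "o2", "o3"]),
    ("m2", ["o4"]),
    ("m3", ["o5", "o6"]),
]


def apply_range(items, lo, hi):
    """Keep the items at 1-based positions lo..hi, inclusive."""
    return items[lo - 1:hi]


# Range on the path component: the first two offices of every member.
per_member = [o for _, offices in members for o in apply_range(offices, 1, 2)]

# Range on the variable: the first two offices over all members.
all_offices = [o for _, offices in members for o in offices]
overall = apply_range(all_offices, 1, 2)

print(per_member)
print(overall)
```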

11
Migrating Lorel (continued...)
  • Transformations and structured results
  • Using queries to restructure XML data
  • The with clause (added to the standard
    select-from-where construct)
  • Query result will replicate all data selected by
    the select clause, along with all data reachable
    via a set of path expressions in the with clause
  • Skolem functions
  • Allows more expressive data restructuring
  • Accepts a list of variables as arguments and
    produces one unique element for every binding of
    elements and/or attributes to the arguments
  • Updates
  • Lorel supports an expressive update language
  • Changes for XML model
  • ability to create both attributes and elements
  • order-relevant updates
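
The Skolem functions mentioned above return the same new element whenever they are called with the same bindings, which is what makes grouping-style restructuring possible. A minimal Python sketch of that behaviour follows. The Skolem class, the AuthorEntry tag, the dictionary element format, and the second title are assumptions made for illustration only.

```python
class Skolem:
    """Produce one unique element per distinct tuple of arguments."""

    def __init__(self, name):
        self.name = name
        self._made = {}  # argument tuple -> element created for it

    def __call__(self, *args):
        if args not in self._made:
            self._made[args] = {"tag": self.name, "key": args, "children": []}
        return self._made[args]


# Bindings of (author, title); the second title is made up.
bindings = [
    ("Jeff Ullman", "A First Course In Database Systems"),
    ("Jennifer Widom", "A First Course In Database Systems"),
    ("Jeff Ullman", "Hypothetical Second Title"),
]

by_author = Skolem("AuthorEntry")
for author, title in bindings:
    # The same author always maps to the same element, so titles accumulate.
    by_author(author)["children"].append(title)

ullman = by_author("Jeff Ullman")
print(ullman["children"])
```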

12
Migrating Lorel (continued...)
  • DataGuides
  • Can be used when a DTD is not supplied
  • A notion of order must be introduced
  • Problem - could result in very large DataGuides
  • When DTDs exist, DataGuides are built from those
    DTDs
  • Combining DTDs and DataGuides
  • DTDs available for specific portions of an XML
    database
  • DataGuides can be used over portions not
    specified by DTDs
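
For a rough picture of a structural summary that can stand in when no DTD is supplied, the sketch below collects every distinct label path of a small XML tree into a nested dictionary. It is a deliberate simplification under stated assumptions: it ignores order and crosslinks, marks attributes with an @ prefix as a convention of this example, and uses a made-up sample document. It should not be read as Lore's DataGuide algorithm.

```python
import xml.etree.ElementTree as ET


def dataguide(element, guide=None):
    """Merge the label paths at and below element into a nested dict."""
    if guide is None:
        guide = {}
    node = guide.setdefault(element.tag, {})
    for name in element.attrib:
        node.setdefault("@" + name, {})  # "@" prefix: convention of this sketch
    for child in element:
        dataguide(child, node)  # repeated labels collapse into one entry
    return guide


# Made-up sample document.
root = ET.fromstring(
    "<DBGroup>"
    "<Member Name='Jeff Ullman'><Office/></Member>"
    "<Member><Name>Jennifer Widom</Name><Office/><Office/></Member>"
    "</DBGroup>"
)

guide = dataguide(root)
print(guide)
```

Each label path appears once in the result no matter how many elements share it, which is the property that makes such a summary useful for browsing structure.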

13
Conclusion
  • As of June 1999, the migration of Lore to an XML
    model is nearly complete