Web Data Management with XML - PowerPoint PPT Presentation

Loading...

PPT – Web Data Management with XML PowerPoint presentation | free to download - id: 6912f5-ZTUyN



Loading


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation
Title:

Web Data Management with XML

Description:

Web Data Management with XML Sanjay Kumar Madria Department of Computer Science University of Missouri-Rolla MO 65401 madrias_at_umr.edu www.umr.edu/~madrias – PowerPoint PPT presentation

Number of Views:137
Avg rating:3.0/5.0
Slides: 161
Provided by: sanjay65
Learn more at: http://web.mst.edu
Category:

less

Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Web Data Management with XML


1
Web Data Management with XML
  • Sanjay Kumar Madria
  • Department of Computer Science
  • University of Missouri-Rolla
  • MO 65401
  • madrias_at_umr.edu
  • www.umr.edu/madrias

2
Source and Acknowledgement
  • This tutorial is based on material obtained from
    different papers, web sites and students
    presentation. Selected papers are solely based on
    the speakers own decision.
  • It reflects speakers own understanding of the
    material and their importance.

3
XML V/S HTML
  • HTML Text Presentation
  • XML Data Structure
  • HTML Fixed Tags, difficult to extract data,
    good for human consumption
  • XML User defined tags, good for application,
    easy to extract data from structure (data is
    separated), self-describing, schema and data
    together
  • XML slow (about 1) but more used in intranet

4
XML - standards
  • Schema (XML schema), Stylesheet (XSL), Resource
    Description Framework (RDF...), XPATH, XPointer
  • WML, MathML, SMIL (multimedia), SportsML , RSS
    (news),...
  • Xquery

5
Some Upcoming Featureshttp//www.oasis-open.org/c
over/xml.html
  • WS-Inspection specification - how an application
    can discover an XML Web service description on a
    Web server (IBM and Microsoft )
  • Data Extraction Language - XML format for
    describing data conversion processes from other
    data formats to XML
  • VoiceXML
  • Cascading Style Sheets (CSS2) specification for
    mobile devices

6
How is XML Used
  • XML handles information content of diverse data
    sources including
  • Structured and semi-structured documents
  • Relational databases
  • Legacy databases
  • Object repositories
  • Database views (as XML)
  • Electronic data interchange (EDI)
  • Facilitate B2B communication
  • Data exchange between data sources

7
XML Data Management- Main Issues
  • XML Query Languages
  • XML Data and Schema Integration
  • XML Indexing
  • XML views and storing XML in other Data Models
    (Relational )
  • XML Updates

8
  • XML Query Language Requirements
  • Expressive power
  • Standard relational language operations SPJ,
    IDU
  • Restructuring operations reduction, merge,
  • Formal Semantics
  • Important for dealing with query transformation
    and optimization.
  • Compositionality
  • The output of a query should be in the same
    language as the input

9
XML Query Capabilities
  • Query multiple documents
  • Query distributed data stored in a variety of
    formats such as Relational and OO databases.
  • XML query must be translatable to query facility
    for underlying data.
  • Ability to create XML schemas from non-XML data
    sources
  • Query should be usable on a document without a
    schema
  • Presence of a schema should permit query
    validation

10
  • XML Query Languages
  • http//www.w3.org/TR/xquery/
  • Alin Deutsch, Mary F. Fernandez, Daniela
    Florescu, Alon Y. Levy, David Maier, Dan Suciu
    Querying XML Data. IEEE Data Engineering Bulletin
    22(3) 10-18 (1999)
  • A. Bonifati and S. Ceri. Comparative analysis of
    five XML query languages. SIG-MOD Record, March
    2000.
  • R. Goldman, J. Mchugh and J. Widom, From
    Semi-structured Data to XML Migrating Lore to
    XML Model and Query Language. In ACM SIGMOD WebDB
    workshop99.

11
Xquery (June, 2001)
  • Derived from Quilt, which borrowed features from
    several other languages.
  • From XPath and XQL -Path expression syntax
  • From XML-QL - Notion of binding variables
  • From SQL - SELECT-FROM-WHERE.
  • From OQL - Different kinds of expressions that
    can be nested with full generality.
  • Influenced by query languages such as Lorel and
    YATL.

12
XML
  • ltdbgt
  • ltlab ID"baselab manager"smith1"gt
  • ltnamegtSeattle Bio Lablt/namegt
  • ltlocationgt
  • ltcitygtSeattlelt/citygt
  • ltcountrygtUSAlt/countrygt
  • lt/locationgt
  • lt/labgt
  • ltbiologist ID"smith1"gt
  • ltlastnamegtSmithlt/lastnamegt
  • lt/biologistgt
  • lt/dbgt

13
XQL Node-labeled tree
14
Path Expression
  • (Q) In the second chapter of the document named
    "zoo.xml", find the figure(s) with caption "Tree
    Frogs".
  • document("zoo.xml")//chapter2//figurecaption
    "Tree Frogs"
  • First step locates the root node of a document.
  • Second step locates the second chapter of the
    document (ordinal number)
  • Third step finds figure elements occurring
    anywhere within the chapter, but retains those
    figure elements that have a caption with the
    value "Tree Frogs."

15
Element Constructors
  • A query often needs to generate new elements. The
    simplest way to generate a new element is to
    embed the element directly in a query using XML
    notation.
  • (Q) Generate an ltempgt element that has an "empid"
    attribute and nested ltnamegt and ltjobgt elements.
  • ltemp empid "12345"gt
  • ltnamegtJohn Smithlt/namegt
  • ltjobgtAnthropologistlt/jobgt
  • lt/empgt

16
Element Constructors
  • (Q) Generate an ltempgt element that has an "empid"
    attribute. The value of the attribute and the
    content of the element are specified by variables
    that are bound in other parts of the query.
  • ltemp empid idgt
  • name
  • job
  • lt/empgt

17
FLWR Expressions
  • A FLWR (pronounced "flower") expression is
    constructed from FOR, LET, WHERE, and RETURN
    clauses, which must appear in a specific order.
  • A FLWR expression binds values to one or more
    variables and then uses these variables to
    construct a result.

18
FLWR Expressions
  • (Q) List each publisher and the average price of
    its books.
  • FOR p IN distinct(document("bib.xml")//publisher)
  • LET a avg(document("bib.xml")//bookpublisher
    p/price)
  • RETURN
  • ltpublishergt
  • ltnamegt p/text() lt/namegt
  • ltavgpricegt a lt/avgpricegt
  • lt/publishergt

19
Sorting
  • A sequence can be ordered by means of a SORTBY
    clause that contains one or more "ordering
    expressions."
  • (Q)List all books with price greater than 100,
    in order by first author within each group of
    books with the same first author, list the books
    in order by title.
  • document("bib.xml")//bookprice gt 100 SORTBY
    (author1, title)

20
Operators and Expressions
  • (Q) Prepare a "critical sequence" report
    consisting of all elements that occur between the
    first and second incision in the first procedure.
  • ltcritical_sequencegt
  • LET p //procedure1
  • FOR e IN // AFTER (p//incision)1
  • BEFORE (p//incision)2
  • RETURN shallow(e)
  • lt/critical_sequencegt
  • The shallow function makes a shallow copy of a
    node, including attributes but not including
    subelements.

21
Conditional Expressions
  • (Q) Make a list of holdings, ordered by title.
    For journals, include the editor, and for all
    other holdings, include the author.
  • FOR h IN //holding
  • RETURN
  • ltholdinggt
  • h/title,
  • IF (h/_at_type "Journal")
  • THEN h/editor
  • ELSE h/author
  • lt/holdinggt
  • SORTBY (title)

22
Quantified Expressions
  • (Q) Find titles of books in which both sailing
    and windsurfing are mentioned in the same
    paragraph.
  • FOR b IN //book
  • WHERE SOME p IN b//para SATISFIES (contains(p,
    "sailing") AND contains(p, "windsurfing"))
  • RETURN b/title

23
XML-QL Features
  • Extraction How will data be extracted from
    large XML documents?
  • Transformation How will XML data be exchanged
    between user communities using different but
    related DTD's?
  • Integration How will XML data from multiple XML
    sources be integrated?
  • Conversion of data between relational or OO to XML

24
BIB . DTD
  • lt!ELEMENT book (author, title, publisher)gt
  • lt!ATTLIST book year CDATAgt
  • lt!ELEMENT article (author, title, year?,
    (shortversionlongversion))gt
  • lt!ATTLIST article type CDATAgt
  • lt!ELEMENT publisher (name, address)gt
  • lt!ELEMENT author (firstname?, lastname)gt

25
XML Data Model
root
Unique -id
book
book
(year1998)
(year1995)
author
title
author
title
publisher
author
publisher
Foundations for ...
lastname
lastname
An introduction
name
lastname
name
Date
Datwen
Addison-Wesley
Addison-Wesley
Date
26
Selection/Extraction
  • Find all the names of the authors whose publisher
    is
  • Addison-Wesley
  • WHERE ltbookgt
  • ltpublishergtltnamegt Addison-Wesley lt/namegt
    lt/publishergt
  • lttitlegt t lt/titlegt
  • ltauthorgt a lt/authorgt
  • lt/bookgt IN "www.a.b.c/bib.xml"
  • CONSTRUCT a
  • Result
  • ltlastnamegt Date lt/lastnamegt

27
Constructing new XML dataReduction Restructure
  • WHERE ltbookgt
  • ltpublishergt ltnamegt Addison-Wesley lt/gtlt/gt
  • lttitlegt t lt/gt
  • ltauthorgt a lt/gt
  • lt/gt IN "www.a.b.c/bib.xml"
  • CONSTRUCT ltresultgt
  • ltauthorgt a lt/gt
  • lttitlegt t lt/gt
  • lt/gt

28
XML-QL Example Data
ltbibgt ltbook year1995gt lttitlegt An
Introduction to DB Systems lt/titlegt ltauthorgt
ltlastnamegt Date lt/lastnamegtlt/authorgt ltpublishergt
ltnamegt Addison-Wesleylt/namegt lt/publishergt lt/bookgt
ltbook year1995gt lttitlegt Foundations for
OR Databases lt/titlegt ltauthorgt ltlastnamegt Date
lt/lastnamegtlt/authorgt ltauthorgt ltlastnamegt
Darwen lt/lastnamegtlt/authorgt ltpublishergtltnamegt
Addison-Wesleylt/namegt lt/publishergt lt/bookgt lt/bibgt
29
Grouping with Nested Queries Preserve structure
  • Where ltbookgt p lt/bookgt IN "www.a.b.c/bib.xml ,
  • ltpublishergt ltnamegtAddison-Wesley lt/gt lt/gt IN p ,
  • lttitlegt t lt/gt IN p
  • CONSTRUCT
  • ltresultgt
  • lttitlegt t lt/gt
  • WHERE ltauthorgt a lt/gt IN p
  • CONSTRUCT ltauthorgt a lt/gt
  • lt/gt

30
Reduction
  • Where ltbookgt ltpublishergt ltnamegt
  • Addition-wesleylt/gt lt/gt
  • lttitlegtt lt/gt Element_As x
  • ltauthorgt alt/gt Element_As y
  • lt/gt lt/gt IN www.a.b.c/bib/xml
  • Construct ltresultgt x y lt/gt

31
Joining element by values
  • WHERE ltarticlegt
  • ltauthorgt
  • ltfirstnamegt fn lt/gt -- firstname f
  • ltlastnamegt ln lt/gt -- firstname l
  • lt/gt
  • lt/gt CONTENT_AS a IN "www.a.b.c/bib.xml,
  • ltbook year y gt
  • ltauthorgt
  • ltfirstnamegt fn lt/gt -- join the same
    firstname f
  • ltlastnamegt ln lt/gt -- join the same
    lastname l
  • lt/gt
  • lt/gt IN "www.a.b.c/bib.xml,
  • y gt 1995
  • CONSTRUCT ltarticlegt a lt/gt

32
Element Identity, IDs, and ID Reference
  • ltperson ID o123gt
  • ltfirstnamegt John lt/firstnamegt
  • ltlastnamegt Smith lt/lastnamegt
  • lt/persongt
  • ltperson ID o234gt
  • ...
  • lt/persongt
  • ltarticle author o123 o234gt
  • lttitlegt lt/titlegt
  • ltyeargt 1995 lt/yeargt
  • lt/articlegt

33
XML graph including ID IDREFS
root
article
person
person
author
first name
first name
title
last name
year
last name
1995
John
Smith
34
Writing queries using IDs
  • Without IDs
  • WHERE ltarticlegtltauthorgtltlastnamegt n lt/gtlt/gtlt/gt IN
    abc.xml
  • Using IDREF (All last name, title pairs)
  • WHERE ltarticle author igt
  • lttitlegt lt/gt ELEMENT_AS t
  • lt/gt,
  • ltperson ID igt
  • ltlastnamegt lt/gt ELEMENT_AS l
  • lt/gt
  • CONSTRUCT ltresultgt t l lt/gt

35
Tag Variables, No schema required
WHERE lt p gt -- p can be article,
book lttitlegt t lt/gt ltyeargt1995 lt/gt --
referring attr. as an element lt e gtltlastnamegt
Date lt/gt lt/gt lt/gt IN "bib.xml", e IN
author, editor CONSTRUCT lt p gt
lttitlegt t lt/gt lt e gt Date lt/gt
lt/gt
All publications published in 1995 in which Date
is either an author, or an editor
36
Regular Path Expressions
  • XML data can specify nested and cyclic
    structures, such as trees, directed acyclic
    graphs, and arbitrary graphs.
  • The following DTD defines a self-recursive
    element part
  • lt!ELEMENT part (name brand part)gt
  • lt!ELEMENT name CDATAgt
  • lt!ELEMENT brand CDATAgt

37
Regular Path Expressions (cont)
  • Here part is a regular path expression, and
  • matches any sequence of edges, all of which
  • are labeled part
  • WHERE lt part gt
  • ltnamegt r lt/gt
  • ltbrandgt Ford lt/gt
  • lt/gt IN "www.a.b.c/parts.xml"
  • CONSTRUCT ltresultgt r lt/gt

38
Regular Path Expressions (cont)
  • The wildcard matches any tag and appear
    wherever a tag is permitted
  • Example
  • WHERE lt gt
  • ltnamegt r lt/gt
  • ltbrandgt Ford lt/gt
  • lt/gt IN "www.a.b.c/parts.xml"
  • CONSTRUCT ltresultgt r lt/gt

39
Transforming XML data with Skolem function
  • WHERE ltgt ltauthorgt
  • ltfirstnamegt fn lt/gt
  • ltlastnamegt ln lt/gt
  • lt/gt
  • lttitlegt t lt/gt
  • lt/gt IN "www.a.b.c/bib.xml",
  • CONSTRUCT ltperson IDPersonID(fn, ln)gt
  • ltfirstnamegt fn lt/gt
  • ltlastnamegt ln lt/gt
  • ltpublicationtitlegt t lt/gt
  • lt/gt
  • lt!ELEMENT person (lastname, firstname,
  • address?, phone?, publicationtitle)

40
Integrating data from multiple XML sources
  • WHERE ltpersongt
  • ltnamegtlt/gt ELEMENT_AS n
  • ltssngt ssn lt/gt
  • lt/gt IN payroll.xml",
  • lttaxpayergt
  • ltssngt ssn lt/gt
  • ltincomegtlt/gt ELEMENT_AS i
  • lt/gt IN "taxpayers.xml
  • CONSTRUCT ltresultgt n i lt/gt

41
Integrating data from multiple XML sources
(Skolem function)
WHERE ltpersongt ltnamegt lt/gt ELEMENT_AS
n ltssngt ssn lt/gt lt/gt IN payroll.xml"
CONSTRUCT ltresult IDSSNID(ssn)gt n lt/gt
WHERE lttaxpayergt ltssngt ssn lt/gt ltincomegt lt/gt
ELEMENT_AS i lt/gt IN "taxpayers.xml" CONSTRUCT
ltresult IDSSNID(ssn)gt n i lt/gt
42
Integrating data (cont)
All titles published in 95, in addition the
month of journal articles and the publishers for
books WHERE lt e gt lttitlegt t lt/gt ltyeargt
1995 lt/gt lt/gt CONTENT_AS p IN
"www.a.b.c/bib.xml" CONSTRUCT ltresult
IDResultID(p)gt lttitlegt t lt/gt lt/gt WHERE e
article", ltmonthgt m lt/gt IN p CONSTRUCT
ltresult IDResultID(p)gt ltmonthgt m lt/gt lt/gt
WHERE e "book", ltpublishergt q lt/gt IN p
CONSTRUCT ltresult IDResultID(p)gt ltpublishergt q
lt/gt lt/gt
43
Functions definitions and DTDs
  • function query()
  • CONSTRUCT ltresultgt
  • findDeclaredIncomes("taxpayers.xml","payroll.xml
    ")
  • lt/resultgt
  • function findDeclaredIncome(Taxpayers,Employees)
  • WHERE lttaxpayergtltssngt s lt/gt
  • ltincomegt x lt/gtlt/gt IN Taxpayer,
  • ltemployeegtltssngt s lt/gt
  • ltnamegt n lt/gt lt/gt IN Employees
  • CONSTRUCT ltresultgtltnamegt n lt/gtltincomegt x lt/gt
    lt/gt

44
Embedding queries in data
  • ltresultgt
  • ltarticlesgt
  • WHERE ltarticlegt lttitlegt t lt/gtltyeargt y lt/gt
  • lt/gt IN www.a.b.c/bib.xml, y gt
    1995
  • CONSTRUCT lttitlegt t lt/gt
  • lt/gt
  • ltbooksgt
  • WHERE ltbookgt lttitlegt t lt/gt ltyeargt y lt/gt
  • lt/gt IN www.a.b.c/bib.xml, y gt1995
  • CONSTRUCT lttitlegt t lt/gt
  • lt/gt
  • lt/gt

45
Indexes for element
  • XML support element-order variables.
  • Example
  • ltaigt lt/gt
  • ltxjgt lt/gt
  • here i and j are bind to an integer 0, 1, 2
    that represent the index in the local order of
    the edges.

46
Indexes for element (graph)
root
book 0
book1
( 1 )
( 8 )
(year1998)
( 2 )
(year1995)
author3
title0
author2
( 14 )
title0
publisher1
author2
publisher1
( 9 )
( 12 )
( 10 )
( 6 )
Foundations for ...
lastname0
( 4)
( 3 )
An introduction
lastname0
name0
lastname0
( 13 )
name0
( 11 )
( 15 )
Date
Datwen
( 5 )
Addison-Wesley
( 7 )
Addison-Wesley
Date
47
Indexes for element (cont.)
  • Example
  • Retrieves all the persons whose lastname
    precedes the firstname
  • WHERE ltpersongt p lt/gt IN www.a.b.c/people.xml
  • ltfirstname k gt x lt/gt IN p,
  • ltlastnamej gt y lt/gt IN p,
  • j lt k
  • CONSTRUCT ltpersongt p lt/gt

48
ORDER-BY
Reverse the order of all authors in a
publication WHERE ltpubgt p lt/gt IN
www.a.b.c/people.xml, CONSTRUCT ltpubgt WHERE
ltauthorkgt a lt/gt IN p ORDER-BY k
DESCENDING CONSTRUCT ltauthorgt a lt/gt WHERE lt e
gt v lt/gt IN p e ! author CONSTRUCT ltegt
v lt/gt lt/pubgt
49
Web Warehousing
  • http//www-rocq.inria.fr/verso/xyleme/
  • www.xyleme.com
  • www.cais.ntu.edu.sg/whoweda/

50
Current Web
  • Keywords to retrieve URLs Difficult to extract
    data of interest
  • Query results cannot be directly processed
  • Documents change on the Web expiration
  • Sites are unavailableFile not found
  • Wrappers
  • Expensive
  • Incomplete
  • Short-lived, not adapted to the Web constant
    changes

51
Web Warehouse Technology
  • A warehouse of XML documents from the Web, Xyleme
  • Stores huge quantities of data
  • Warehouse is not a search engine (only index) or
    a mediator (only virtual data)
  • Warehouse incorporates data evolution/changes

52
Warehouse (Xylene) Challenges
  • Data Acquisition and Maintenance Logical level
  • discover data of interest and maintain it up to
    date
  • Discover XML pages on the web that are of
    interest for customers
  • For this crawl the web (HTMLXML, store only XML)
  • Bounded resources
  • Natix Repository Physical Level
  • store this data and index it so that it can be
    processed efficiently
  • Query Processing Logical level
  • support efficiently an SQL-style query language

53
  • Semantic Integration Application Level
  • Understand DTD and tags, partition the Web into
    semantic domains, provide a simple view of each
    domain
  • Change Control - Application Level
  • Monitor the web and offer services such as Query
    Subscription
  • Metadata management

54
Page Scheduling
  • Decide which page to read next
  • discovery (read first) and refresh (read again)
  • Based on
  • Importance of the page
  • read often important pages
  • also used to order query results
  • Change rate of the page
  • dont read a page that is probably up-to-date

55
Natix Repository
  • Instead of storing each tree node in a separate
    record, we store whole documents( or subtrees of
    documents) together in one record.
  • Typical data trees may not fit on a single page.
    So the data trees are distributed data over
    several pages.

f1
Physical Tree
r1
p2
p1
Proxy object
h2
r3
h2
r2
Helper aggregate object
f7
f6
f5
f2
f3
f4
56
Natix Repository
  • Splitting a record
  • A records subtree before a split

57
The WHOWEDA Project
  • WHOWEDA A WareHouse of WEb DAta
  • To design and implement a web warehousing system
    capable of effective extraction, management, and
    processing of information on the World Wide Web
  • Data model WHOM (WareHouse Object Model)
    handles XML and HTML
  • Change Management
  • Knowledge Discovery and Mining
  • Journal papers in WWW journal00, DKE01, TKDE
    (under review),
  • Conference papers in ICDCS00, ICPADS00
    DOLAP00, DASFAA99, DAWAK99, FODO98, ER98,
    DEXA98, .
  • Under submission DKE, DPDS, CJ, KAIS

58
Detecting Changes in Web Documents
  • Gregory Cobena, Serge Abiteboul, Amélie Marian
    Detecting Changes in XML Documents. ICDE 2002
  • Sourav S. Bhowmick, Sanjay Kumar Madria, Wee
    Keong Ng, Ee-Peng Lim Detecting and Representing
    Relevant Web Deltas using Web Join. ICDCS 2000
    246-253

59
Change Control
60
Overview of Whoweda Approach
  • Step 1 Two snapshots of old and new relevant
    data coupled from the Web using global web
    coupling operation and materialized in two web
    tables.
  • Step 2 Web join, left outer join and right outer
    joined operations are performed on these two web
    tables
  • Result is joined, left and right outer joined web
    tables
  • Step 3 Delta web tables containing different
    types of web deltas generated from these
    resultant web tables.

61
XML Schema Integration
  • http//data.cs.washington.edu/integration/tukwila
  • Denise Draper, Alon Y. Halevy, Daniel S. Weld
    The Nimble XML Data Integration System. ICDE
    2001 155-160
  • MIX http//www.db.ucsd.edu/projects/MIX/

62
Data Integration
E-Commerce applications use data from different
sources and need to be integrated. A mediated
schema is created to represent a particular
application domain and data sources are mapped as
views over the mediated schema.
63
XML Integration
  • The modeling problem How to model the global
    schema, the sources, and the relationships
    between the two
  • The querying problem How to answer queries
    expressed on the global schema
  • Query is expressed in terms of the global schema,
    and the mediator reformulate the query in terms
    of a set of queries at the sources
  • Query Plan, decomposex the query into a set of
    subqueries to the sources
  • Subqueries are shipped to the sources, and the
    results are assembled into the final answer

64
Different ways to integrate XML data
  • Integrating XML documents
  • Mapping of local schemas to global integrated
    schema if the global schema is known or querying
    the data to obtain the required global schema.
  • Integrating XML Schema

65
Complexities in integrating XML Data
  1. Need to extract the schema from the document.
  2. Integrate the schemas obtained or perform mapping
    from the individual schema documents to the
    global schema if the global schema is already
    present.
  3. Parse the XML documents and integrate the data
    according to the global schema.
  4. Querying on XML documents can be done to obtain
    the integrated document.

66
Tukwila Data Integration System
  • Uses a mediated schema to integrate data from
    different sources.
  • User asks a query over the mediated schema and
    the data Integration system reformulates the
    query over the data sources and executes it.
  • Uses an Query Re-formulator and Optimizer to
    query large amounts of data efficiently.
  • Map the query from the mediated schema to data
    sources.
  • Uses an x-scan operator that can query streaming
    XML data.

67
Querying XML Stream of Data
  • In many applications involving XML we must be
    able to process queries over streams of incoming
    XML data (whose content changes continuously)
    without
  • first loading the data into a local repository
    stored in some internal representation
  • Evaluating regular path expressions using either
    index structures or join operations across the
    tables or objects
  • X-scan matches regular path expression patterns
    from the query, returning results in pipelined
    fashion as the data streams across the network.

68
(No Transcript)
69
  • WHERE ltdbgt       ltlabgt       
    ltnamegtnlt/gt        lt_gtltcitygtclt/gtlt/gt       lt/gt
    ELEMENT_AS l      lt/gt IN "fig1.xml"

70
Tukwila x-scan operator
71
Tukwila x-scan operator contd..
72
(No Transcript)
73
XML Schema Integration Model- Another Approach
  • Automated integration of XML schemas -
  • traditional forms of view integration and
    database integration.
  • Integrated schema forms the basis for a valid
    query over a particular set of XML documents.
  • Schemas to be integrated currently validate a
    set of existing XML documents, data integrity and
    continued document delivery are chief concerns of
    the integration process.

74
XSchema Integration Model
  • Object-oriented data model called XSDM ( XML
    Schema Data Model )
  • Three-layered architecture consisting of
    pre-integration, comparison and integration is
    used for the integration.
  • A global schema meets the following criteria
    completeness, minimality and understandability.

75
Three Phases of integration
Pre-Integration Element, attribute and datatype
definitions are extracted through parsing the
actual schema document. Comparison
Correspondences between elements and attributes
are determined either by using semantic learning
or using human interaction. Integration
Conflicts that exist between the corresponding
elements and/or attributes such as naming
conflicts, datatype conflicts and structural
conflicts are resolved.
76
XML Schema Data Model (XSDM)
  • Four structures are defined Node Object, Child
    Object, Datatype Object and Attribute Object.
  • Node Object Represents an element, which may be
    either non-terminal or terminal.
  • Each node represents another set of structures
    that define the node Name, Namespace,
    Attribute, Datatype, Substitution Group Name,
    Child list and Node Type which has six types
    terminal, sequence, choice, all, any or empty.
  • Child Object Represents an element, which is a
    part of childList. Each child has structures that
    define itself Name, namespace, Max Occurances,
    and Min Occurances.

77
XML Schema Data Model (XSDM) contd..
Datatype Object Represents datatype of elements
and attributes. The structures that define this
are Name, Variety(atomic, union, list),
Kind(simple and derived datatype), and
Constraining Facets. Attribute Object
Represents attributes associated with a
non-terminal or terminal element. The structures
that define an attribute Name, Namespace, Use,
DataType, and value(default value).
78
Example
79
Graphical Representation of XML Schemas
80
Graphical representation of sample schema for GSE1
81
Graphical representation of sample schema for GSE2
82
Conflict Resolution
  • Naming Conflicts
  • Synonym Naming Conflict Different names but same
    definition. Solved using substitution group
    names.
  • Homonym Naming conflict Same name but different
    structure.
  • Homonym conflicts at Non-terminals are called
    structural conflicts and
  • at terminals they are called datatype conflicts.

83
Conflict Resolution contd..
  • Datatype Scale differences
  • Disjoint or incompatible datatypes union
  • E.g. String, integer
  • Compatible datatypes scale adjustment
  • E.g. Integer, float
  • Enumerated datatype taking set of all the
    enumerations
  • E.g. a,b, b,c gt a,b,c
  • Scale differences constraint facet redefinition

84
Conflict Resolution contd..
  • Structural Conflicts
  • Type Conflicts Terminal in one schema and
    non-terminal in another schema Add both to the
    global schema.
  • Key conflicts
  • If both schemas have their individual keys, then
    the global schemas key should be a composite of
    both the keys.
  • If an element is declared as key in one schema
    and as a non-key in other schema, a complete
    knowledge of the data present in the documents is
    required.
  • If the same element is declared as key in both
    the schemas, a prefix can be added to the keys to
    make the key elements unique globally.

85
Integration phase
  • Constructing correspondences table - contain the
    information about the corresponding
    elements/attributes.
  • Constructing dependencies table The
    elements/attributes are integrated only after
    their dependencies are integrated.

86
Graphical representation of Global schema obtained
87
Indexing XML Data
  • Xset
  • Region algebra
  • Indexes for Arbitrary Semistructured Data
  • T-indexes
  • Index Fabric (VLDB01)
  • Resources
  • Index Structures for Path Expressions by T. Milo
    and Suciu, in ICDT'99
  • XSet description http//www.openhealth.org/XSet/
  • Data on the Web Abiteboul, Buneman, Suciu
    section 8.2

88
The Data
  • Semistructured data instance a large graph

89
The queries
  • Regular expressions (using Lorel-like syntax)

SELECT X fROM (Bib..author).(lastnamefirstname).
Abiteboul X
Select x from part._.supplier.name x
Requires to traverse data from root, return all
nodes x reachable by a path matching the given
path expression.
Select X From part._.supplier name X,
address Philadelphia
Need index on values to narrow search to parts of
the database that contain the string
Philadelphia.
90
Analyzing the problem
  • What kind of data
  • tree data (XML) easier to index
  • graph data used in more complex applications
  • What kind of queries
  • restricted regular expressions (e.g. XPath) may
    be more efficient
  • arbitrary regular expressions rarely encountered
    in practice

91
XSet a simple index for XML
  • Part of the Ninja project at Berkeley
  • Example XML data

92
XSet a simple index for XML
  • Each node a hashtable
  • Each entry list of pointers to data nodes (not
    shown)

93
XSet Efficient query evaluation
(R1) SELECT X FROM part.name X
-yes (R2) SELECT X FROM part.supplier.name X
-yes (R3) SELECT X FROM .supplier.name X
-maybe (R4) SELECT X FROM part..subpart.name X
-maybe
  • To evaluate R1, look for part in the root hash
    table h1, follow the link to table h2, then look
    for name.
  • R4 following part leads to h2 traverse all
    nodes in the index (corresponding to ), then
    continue with the path subpart.name.
  • Thus, explore the entire subtree dominated by h2.
  • Will be efficient if index is small and fits in
    memory
  • R3 leading wild card forces to consider all
    nodes in the index tree, resulting in less
    efficient computation than for R4.
  • Can index the index itself.
  • Retrieve all hash tables that contain a supplier
    entry, continue a normal search from there.

94
Region Algebras
  • Structured text text with tags (like XML)
  • Powerful indexing techniques
  • Critical limitationordered data only (like text)
  • Assume data given as an XML text file, and
    implicit ordering in the file.
  • Less critical limitation restricted regular
    expressions

95
Region Algebras Definitions
  • data sequence of characters c1c2c3
  • region segment of the text in a file
  • representation (x,y) cx,cx1, cy, x start
    position, y end position of the region
  • example ltsectiongt lt/sectiongt
  • region set a set of regions s.t. any two
    regions are either disjoint or one included in
    the other
  • example all ltsectiongt regions (may be nested)
  • Tree data each node defines a region and each
    set of nodes define a region set.
  • example region p2 consisting of text under p2,
    set p2,s2,s1 is a region set with three regions

96
Representation of a region set
  • Example the ltsubpartgt region set
  • region algebra operators on region set,
  • s1 op s2 defines a new region set

97
Region algebra some operators
  • s1 intersect s2 r r? s1, r ?s2
  • s1 included s2 r r?s1, ?r ? s2, r ? r
  • s1 including s2 r r? s1, ?r ? s2, r ? r
  • s1 parent s2 r r? s1, ?r ? s2, r is a
    parent of r
  • s1 child s2 r r? s1, ?r ? s2, r is child of
    r

Examples ltsubpartgt included ltpartgt s1, s2,
s3, s5 ltpartgt including ltsubpartgt p2,
p3 ltnamegt child ltpartgt n1, n3, n12
98
Efficient computation of Region Algebra Operators
  • Example s1 included s2
  • s1 (x1,x1'), (x2,x2'),
  • s2 (y1,y1'), (y2,y2'),
  • (i.e. assume each consists of disjoint regions)
  • Algorithm
  • if xi lt yj then i i 1
  • if xi' gt yj' then j j 1
  • otherwise print (xi,xi'), do i i 1
  • Can do in sub-linear time when one region is
    very small

99
From path expressions to region expressions
  • Use region algebra operators to answer regular
    path expressions
  • Only restricted forms of regular path expressions
    can be translated into region algebra operators
  • expressions of the form R1.R2Rn, where each Ri
    is either a label constant or the Kleene closure
    .

part.name name child (part child
root) part.supplier.name name child (supplier
child (part child root)) .supplier.name
name child supplier part..subpart.name name
child (subpart included (part child root))
Region expressions correspond to simple XPath
expressions
100
From path expressions to region expressions
  • Answering more complex queries
  • Translates into the following region algebra
    expression
  • Philadelphia denotes a region set consisting of
    all regions corresponding to the word
    Philadelphia in the text.
  • Such a region can be computed dynamically using a
    full text index.
  • Region expressions correspond to simple XPath
    expressions

Select X From .subpart name X,
.supplier.address Philadelphia
Name child (subpart includes (supplier parent
(address intersect Philadelphia)))
101
Indexes for Arbitrary Semistructured Data
  • A semistructured data instance that is a DAG

102
Indexes for Arbitrary Semistructured Data
  • The data represents employees and projects in a
    company.
  • Two kinds of employees programmers and
    statisticians
  • Three kinds of links to projects leads,
    workson, consultants
  • Index graph reduced graph that summarizes all
    paths from root in the data graph

103
  • Example node p1 paths from root to p1 labeled
    with the following five sequences
  • Project
  • Employee.leads
  • Employee.workson
  • Programmer.employee.leads
  • Programmer.employee.workson
  • Node p2 paths from root to p2 labeled by same
    five sequences
  • p1 and p2 are language-equivalent

104
Indexes for Arbitrary Semistructured Data
  • For each node x in the data graph,
  • Lx w ? a path from the root to x
    labeled w
  • ?x,y x ? y ? Lx Ly
  • x y x ? y
  • Nodes(I) x
    x ? nodes(G)
  • I
  • Edges(I) x
    y x? ? x, y? ? y, x? y?

105
Indexes for Arbitrary Semistructured Data
  • We have the following equivalences
  • e1 ? e2
  • e3 ? e4 ? e5
  • p1 ? p2
  • p3 ? p4
  • p5 ? p6 ? p7

106
Indexes for Arbitrary Semistructured Data
  • Computing path expression queries
  • Compute query on I and obtain set of index nodes
  • Compute union of all extents
  • Returns nodes h8, h9.
  • Their extents are p5, p6, p7 and p8,
    respectively
  • result set p5, p6, p7, p8
  • Always size(I) ? size(G)
  • Efficient when I can be stored in main memory
  • Checking x ? y is expensive.

Select X From statistician.employee.(leadsconsult
s) X
107
T-Indexes
  • Milo Suciu ICDT 99
  • 1-index
  • data graph
  • arbitrary regular expressions
  • 2-index, T-index for more complex queries,
    consisting of more regular expressions.

108
T-Indexes
  • T-index template index
  • Trades space for generality
  • The class of paths associated with a given
    T-index is specified by a path template
  • Example 1 x y. Here can
    be replaced by any regular expression.
  • Example 2 (.Restaurant) x y. The first
    regular expression is fixed this T-index takes
    less space but is less general.
  • T-indexes can be generated efficiently.
  • The size of a T-index associated to a single
    regular expression is at most linear in that of
    the database

P
P
P
P
109
1-Indexes
  • Database DB (V,E,Roots), V is finite set of
    nodes, E is a set of labeled edges, R is a set of
    root nodes.
  • Regular path expressions
  • P ? ? ƒ (PP) (P.P) P. where ƒ
    are formulas defined over predicates p1, p2,on
    the set of data values.
  • A path expression p v0 ? v1 ? v2vn-1 ? vn
  • Queries regular path expressions q(DB)
  • A query path is an expression of the form
  • P1 x1 P2 x2 Pn xn, xi variable names,
    Pis path expressions
  • A query has the form
  • Select x1, x2, , xn from P1 x1 P2 x2 Pn xn

a1
a2
an
110
1-Indexes
  • Path template t T1 x1 T2 x2 T3 x3, Ti a
    regular expression or or
  • Instantiating query paths
  • Query path q instantiating and
    by regular path expression and some formula,
    respectively, in template t
  • Example path template t (.Restaurant) x1
    x2 Name x3 x4
  • Query path instantiations
  • q1 (.Restaurant) x1 x2 Name x3 Fridays
    x4
  • q2 (.Restaurant) x1 x2 Name x3 _ x4
    ( _ is a predicate with True)
  • q3 (.Restaurant) x1 ( ? _ ) x2 Name x3
    Fridays x4

P
F
P
F
P
F
111
1-Indexes
  • Goal compute efficiently queries q ? inst(
    x)
  • A first attempt
  • ?u?V. Lu a1an v0 ? ? vn ?DB, v0?Root,
    vnu
  • ?u,v?V. u ? v ? Lu Lv
  • ?u?V. u v u ? v

P
a1
an
112
1-Indexes
  • Nodes(I) u u in nodes(DB)
  • Edges(I) u ? u? ?u ? u, ?u? ? u?,
    (u ? u?) ? Edges(DB)
  • Roots(I) r r ? roots(DB)

a
a
I
q(DB) u ? u? ? q(I), u ? u
Example
Inefficient construction cost
113
1-Indexes
  • Example

x ? y ? z, but x ?s y ?s z and x ?b y ?b z
114
Analyzing1-Indexes
  • Storing I-index
  • Associate an oid s to each node in I
  • Store graph I in standard form
  • Store for each node s, extent(s)
  • Extent(s) v s is an oid for v
  • Always size(I) lt size(DB) (unlike Dataguide)
  • Always can compute in O(nlogn) time nsize(DB)
  • When DB is a tree ?b , ?s , ? coincide
  • no penalty for ?b , ?s
  • 1-index Dataguide XSet

115
Analyzing1-Indexes
  • Do we have size(I) ltlt size(DB) ? No. Two worst
    cases
  • Facts
  • in theory except for these two DBs, size(I) ltlt
    size(DB)
  • in practice its a different story. Experiments
    size(I) ? 1/3 size(DB)

116
Evaluating Query Paths with 1-indexes
  • Example query q t.a x
  • The evaluation of q follows two paths t.a in I
    rather than five in DB and unions their extents
    7,13 ? 8,10,12
  • The extents in strong data guide overlap, hence
    storage may be larger

117
2-Indexes
  • Database DB (V, E, Roots)
  • Queries select x1, x2 from x1 P x2, with P a
    regular path expression
  • Template x1 x2.
  • Find pairs of nodes (x1, x2)
  • L(u,v) a1 an u ? ? v in DB
  • (u,v) ? (u?,v?) ? L(u,v) L(u?,v?)
  • Since computing ? is expensive, refinements ? are
    used instead that satisfy
  • (v,u) ? (v?,u?) ? (v,u) ? (v?,u?)

P
a1
an
118
2-Indexes
  • Nodes(I) (u,v) u,v ? Nodes(DB)
  • I2 Roots(I) (u,u) u ?
    Nodes(DB)
  • Edges(I) (u,v) ? (u,v?) v ? v? ?
    Edges(DB)
  • Storing I2
  • The graph
  • Extent(s) (v,u), for each node s representing
    the equivalence class (v,u)
  • L(v,u)(DB) L(v,u)(I2),
  • L(v,u)(DB) represents paths between v and u
  • L(v,u)(I2) represents the paths in the 2-index
    I2, between some root of the index and (v,u)
  • Query evaluation
  • To compute select x, y from x P y, we compute
    the query path P y on I2 and take the union of
    the extents.
  • This saves the search, but may have to start at
    several roots in I2, which is only one in case of
    acyclic databases

a
a
119
2-Index Example
  • Cost size(I) ? O(n2)
  • May be less in practice, similar to PAT trees
    (Patricia tree) for text databases

120
Querying and Storing XML Views of Relational Data
  • eXcelon The XML application development
    environment.
  • http//www.odi.com/excelon/main.htm.
  • Object Design's eXcelon XML repository maps XML
    documents to objects and supports queries over
    them based on the XQL language.
  • Jayavel, Jerry, et al. Querying XML Views of
    Relational Data, VLDB01)
  • www.microsoft.com/XML
  • Oracle 8i and DB2 supports XML views
  • Stored and Silkroute

121
Motivation
  • Most web data will continue to be stored in
    relational databases (more than 90)
  • Need some way to convert relational data to XML
  • XPERANTO (IBM) allows existing relational data to
    be viewed and queried as XML

122
Web Services Example
Supplier provides an XML View of its Data
XQuery over Catalog
Internet
Buyer
XQuery Result
XQuery
XQuery Result
Application CodeConvert XQuery toSQL Query
Application CodeConvert RelationalData to XML
Supplier
SQL Query
SQL Result
Relational Database
123
High-Level Architecture
XQuery Query
Query Result
XPERANTO
XQuery to SQLConverter
Tagger
SQL Query
SQL Result
Relational Database
push data- and memory-intensive computationdown
to relational engine
124
Example Relational Data
order
id custname custnum
10 Smith Construction 7734
9 Western Builders 7725
item
payment
oid desc cost
10 generator 8000
10 backhoe 24000
oid due amt
10 1/10/01 20000
10 6/10/01 12000
125
Default XML View
ltdbgt ltordergt ltrowgt ltidgt10 lt/idgt
ltcustnamegt Smith Construction lt/custnamegt
lt/rowgt ltrowgt ltidgt 9 lt/idgt
ltcustnamegtWestern Builders lt/custnamegt lt/rowgt
lt/ordergt ltitemgt ltrowgt ltoidgt 10
lt/oidgt ltdescgt generator lt/descgt ltcostgt 8000
lt/costgt lt/rowgt ltrowgt ltoidgt 10 lt/oidgt
ltdescgt backhoe lt/descgt ltcostgt 24000 lt/costgt
lt/rowgt lt/itemgt ltpaymentgt
similar to ltordergt and ltitemgt
lt/paymentgt lt/dbgt
126
XML View for Purchase Order
ltorder id10gt ltcustomergt Smith
Construction lt/customergt ltitemsgt
ltitem descriptiongenerator gt
ltcostgt 8000 lt/costgt lt/itemgt
ltitem descriptionbackhoegt
ltcostgt 24000 lt/costgt lt/itemgt
lt/itemsgt ltpaymentsgt
ltpayment due1/10/01gt
ltamountgt 20000 lt/amountgt lt/paymentgt
ltpayment due6/10/01gt
ltamountgt 12000 lt/amountgt
lt/paymentgt lt/paymentsgtlt/ordergt
127
Creating an XPERANTO View
create view orders as ( for order in
view(default)/order/row return ltorder
idorder/idgt ltcustomergt
order/custname lt/customergt
ltitemsgt lt/itemsgt
ltpaymentsgt
lt/paymentsgt lt/ordergt)
for item in view(default)/item/row where
order/id item/oid return ltitem
descriptionitem/desc gt
ltcostgt item/cost lt/costgt lt/itemgt
for payment in
view(default)/item/row
where order/id payment/oid
return ltpayment duepayment/dategt

ltamountgt payment/amount lt/amountgt
lt/paymentgt
sortby(_at_due)
128
Query for querying XML View
Get all orders of customer Smith
for order in view(orders)where
order/customer/text() like Smith return order
129
Query Processing in XPERANTO
XQuery
Query Result
XPERANTO Query Engine
XQuery Parser
XQGM
Query Rewrite View Composition
XQGM
Computation Pushdown
TaggerRuntime
Tagger Graph
SQL Query
Tuples
RDBMS
130
XQGM
  • Intermediate representation
  • General enough to capture semantics of a powerful
    language such as XQuery
  • Easy translation to SQL
  • XQGM based on
  • Borrows from other work on XML algebras
  • An extension of DB2s QGM

131
XQGM (contd.)
  • XQGM consists of
  • Operators
  • Functions (invoked inside operators)
  • Operators capture manipulation of relationships
    (similar to relational operators)
  • Functions capture manipulation of XML entities
    (elements, attributes, etc.)
  • XML construction functions
  • XML navigation functions

132
for order in view(orders)where
order/customer/text() like
Smith return order
133
Updating XML
  • XML stored in relations relational data
    published in XML.
  • Need to support updates of XML data stored in
    relations

Igor Tatarinov, Ives, Halvey et al. , Updating
XML , ACM Sigmod 2001
134
Example
  • ltbookdbgt
  • ltbook publisherphgt
  • lttitlegtA First Course in Database
    Systemslt/titlegt
  • ltauthorgt
  • ltnamegtUllmanlt/namegt
  • lt/authorgt
  • ltauthorgt
  • ltnamegtWidomlt/namegt
  • lt/authorgt
  • lt/bookgt
  • ltpublisher codephgt
  • ltnamegtPrentice Halllt/namegt
  • lt/publishergt
  • lt/bookdbgt

135
bookdb
publisher
book
publisherph
codeph
IDREF
name
title
author
author
A First Course
Prentice Hall
name
name
Ullman
Widom
XML Data Tree
136
XML Update Operations
  • Update primitives
  • Delete (child) (recursive)
  • Insert (content)
  • Replace (delete insert not efficient)
  • Move (copy delete not always feasible)
  • Multiple operations in a statement
  • Nested updates
  • Focus on semantics rather than syntax!

137
Querying XML Data
  • Xquery - Does not support updates

FOR b IN document(bookdb.xml)/book WHERE
b/author/name Bernstein RETURN b
  • Entire book elements are retrieved!

138
Multiple Update Operations
FOR book IN document(bookdb.xml)/book, pric
e IN book/price, WHERE price gt 100 UPDATE
book REPLACE price WITH price0.50 INSERT
ltcloseout/gt
  • Need to DELETE/INSERT/REPLACE in a single update!
  • Single UPDATE in SQL (when NULLs are used)

139
Semantic Issues in XML Updates
  • IDs and IDREFs
  • Cant duplicate XML IDs
  • Cant leave dangling references
  • Non-deterministic updates
  • There is more than one way (XPath expression) to
    get to an XML element
  • Would like to detect it at compile time

140
Deletion IDREFs
  • Solutions
  • Dont allow delete at all
  • Remove incoming refs
  • Delete entire referrers
  • Reattach to new parent? (schema violation likely)

141
Insertion (Copying) IDs IDREFs
  • Cant duplicate IDs
  • Elements with IDs cant be copied
  • XML IDs are created by user/app gt cant generate
    new IDs
  • OK to move (or copy into a different document)
  • Can copy IDREFs

142
Ambiguous Updates
  • Multiple paths to the same element gt
    non-deterministic update
  • Solution constrain the language
  • Update operations can only modify the children of
    the element being updated
  • Limit XPath expressions that can be used

Book/Price 10 Book/Price 0.8
book
IDREF
143
Implementing XML updates over a relational store
  • Map XML schema (DTD) to relations
  • Shared Inlining method SGT99
  • In our example
  • Book(tuple_id, title, price)
  • Author(parent_id, name)
  • tuple_id and parent_id link child tuples to their
    parents
  • Tuple ids are different from XML IDs!!
  • Map XQuery updates to SQL
  • Minimize the number of SQL statements

144
Deletion Methods
  • Trigger-based
  • Using per-tuple triggers
  • Using per-statement triggers
  • Cascading delete
  • Using Access Support Relations (ASRs)

145
Per-tuple Trigger-based Delete
  • Per-tuple trigger on table Book
  • DELETE FROM Author
  • WHERE parent_id deleted.tuple_id
  • Can use index fast !

146
Per-stmt Trigger-based Delete
  • Per-statement trigger on Book
  • DELETE FROM Author
  • WHERE parent_id NOT IN
  • (SELECT tuple_id FROM Book)
  • Have to scan entire Author table - slow!

147
Insertion Methods
  • Copying is the hard (interesting) case
  • Challenge assign new tuple ids so that the new
    parent/child pairs are linked correctly

148
Storing Semistructured Data with STORED
  • Technique to store and manage semistructured data
    with relational database systems
  • STORED - Semistructured TO Relational Data
  • Apply STORED to XML data as an instance of
    semistructured data

149
Introduction
  • Lorel and Tsimmis store data as graphs
  • Strudel stores data as structured text externally
    and as graph internally
  • XML data is stored along with tags that encode
    the schema
  • Advantages quick new data load, seamless changes
    of old data structures
  • Disadvantages space cost, processing time cost,
    no use for commercial RDBMS

150
STORED Usage
  • Aggressive mapping from semistructured to
    relational models (includes query mapping)
  • Overflow graphs store parts of semistructured
    data not fit in the schema
  • Usage 1 to store and manage existing
    semistructured data
  • Usage 2 to convert relational sources into
    semistructured format (XML for instance)

151
Contribution
  • STORED storage mapping of semistructured to
    relational data overflow graphs
  • Algorithm for constructing relational schema and
    STORED mapping
  • Algorithm for generating overflow mapping for a
    given relational mapping (can exploit DTD)
  • Query and update rewriting algorithm

152
(No Transcript)
153
(No Transcript)
154
Simple Storage Queries
  • M1 FROM Audit.taxpayer X
  • name N,
  • addr street S, zip Z
  • STORE Tpr1(X, N, S, Z)
  • M2 FROM Audit.taxpayer X
  • name N,
  • addr street S, zip Z ,
  • OPTaudited A,
  • OPTtaxamount T
  • STORE Tpr3(X, N, S, Z, A, T)
  • All variables (except intermediate) in the FROM
    clause must be used in STORE clause
  • Optional attributes can also be stored

155
Multiple Attributes
  • M1 FROM Audit.taxpayer X
  • name N,
  • audited A1,
  • OPT audited A2
  • STORE Taxpr7(N, A1, A2)
  • M2 FROM Audit.taxpayer X
  • name1 N,
  • audited1 A1,
  • OPT audited2 A2
  • STORE Taxpr7(N, A1, A2)
  • Objects may have multiple occurrences of the same
    attribute
  • STORED queries are rewritten with indexed
    attribute names

156
Label Variables
  • PhoneBook
  • John phone03-6540000, fax 09-7659900,
  • Joe phone 06-6542311,fax 08-8648000
  • FROM PhoneBook.L X
  • phone P, fax F
  • STORE R(X, L, P, F)
  • Possible to store data as attributes
  • Label variables in STORED are stored in relations
    as values

157
Overflow Queries
  • M1 FROM Audit.taxpayer X
  • name N,
  • addr street S, zip Z, O _ ,
  • OVERFLOW G(O)
  • STORE Taxpr1(X,
About PowerShow.com