Achieving Adaptivity for OLAP-XML Federations

Torben Bach Pedersen, Aalborg University. Joint work with Dennis Pedersen, TARGIT. DOLAP 2003.

1
Achieving Adaptivity for OLAP-XML Federations
  • Torben Bach Pedersen
  • Aalborg University
  • Joint work with Dennis Pedersen, TARGIT

2
Overview
  • Background: OLAP-XML federations
  • New challenges
  • XML data changes
  • Slow or unreliable XML sources
  • Schema changes in data sources
  • Other challenges
  • Integration in TARGIT architecture
  • Other applications of the techniques
  • Conclusion and future work
  • Related work

3
Data Warehousing OLAP
  • Multidimensional analysis: TARGIT Analysis

4
OLAP
  • Good for complex ad hoc queries
  • Simple: natural, graphical queries
  • Fast: pre-aggregation
  • A number of problems with physical integration
  • Short-term and varying data needs
  • Population, product info, ...
  • Dynamic data
  • Stock quotes, competitor pricing, ...
  • Data with limited access
  • Competitor product info, public databases, ...

5
OLAP-XML Federations
  • Traditional OLAP architecture

6
OLAP-XML Federations
  • Logical integration of XML data
  • External dimensions
  • External measures
  • Data combined at query time

[Figure: Client, Federation, XML, Cube]

7
OLAP-XML Federations
  • Logical integration of XML data
  • External dimensions
  • External measures
  • Data combined at query time
  • Transparent for users
  • Flexible: many XML sources
  • Quick: running in a few minutes
  • Data is always fresh
  • Performance often comparable to physical
    integration

8
XPath Queries for Fetching XML
  • <Books>
  • <Book>
  • <Title>1984</Title>
  • <Author>Orwell</Author>
  • </Book>
  • <Book>
  • <Title>Of Mice and Men</Title>
  • <Author>Steinbeck</Author>
  • </Book>
  • </Books>
  • /Books/Book[Author='Steinbeck']/Title

XPath
Dimension value
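
As a rough illustration of how the XPath expression on this slide picks a dimension value out of the document, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The variable names are made up for the example, and because ElementTree evaluates paths relative to the root element, the leading `/Books` step is written as `.`.

```python
import xml.etree.ElementTree as ET

# The sample document from the slide.
BOOKS_XML = """
<Books>
  <Book><Title>1984</Title><Author>Orwell</Author></Book>
  <Book><Title>Of Mice and Men</Title><Author>Steinbeck</Author></Book>
</Books>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <Books> element

# Equivalent of /Books/Book[Author='Steinbeck']/Title, relative to <Books>.
titles = [t.text for t in root.findall("./Book[Author='Steinbeck']/Title")]
print(titles)  # ['Of Mice and Men']
```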
9
Old And New TARGIT Architecture
10
New Challenges
  • Our previous work focused on basic aspects
  • Flexibility
  • General performance
  • Implementation
  • New: what can go wrong? Need for adaptivity
  • XML data changes
  • XML sources slow or unreliable
  • Schema changes (XML, OLAP, federation)
  • We often have no control over the XML sources
  • A solution has broad interest: views over XML
    sources

11
XML Data Changes
  • Basic federation
  • XML data is integrated at query time => XML data
    changes are handled automatically
  • However, XML data is cached for performance
  • Cache timeout value ensures fresh data (set
    manually or automatically)
  • Cache timeout of 0 => always fetch from source
  • Only a few current XML databases inform about
    changes
  • Xyleme allows users to subscribe to changes
  • Only delta should be transferred
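
The cache timeout described above could look roughly like the following Python sketch. The class `TimedCache`, its parameters, and the fake fetch function are all hypothetical names chosen for illustration; the slides do not show the federation's actual cache code.

```python
import time


class TimedCache:
    """Cache query results and refetch them once they exceed a timeout."""

    def __init__(self, fetch, timeout_seconds, clock=time.monotonic):
        self.fetch = fetch              # function that queries the XML source
        self.timeout = timeout_seconds  # 0 means: always fetch from source
        self.clock = clock
        self.entries = {}               # query -> (fetched_at, data)

    def get(self, query):
        now = self.clock()
        entry = self.entries.get(query)
        if entry is not None and now - entry[0] < self.timeout:
            return entry[1]             # still fresh: serve from cache
        data = self.fetch(query)        # missing or expired: go to the source
        self.entries[query] = (now, data)
        return data


# Demo with a fake clock so the example does not need to sleep.
calls = []

def fake_fetch(query):
    calls.append(query)
    return f"result #{len(calls)}"

now = [0.0]
cache = TimedCache(fake_fetch, timeout_seconds=60, clock=lambda: now[0])
first = cache.get("/Books/Book/Title")   # miss: fetched from the source
second = cache.get("/Books/Book/Title")  # hit: still within the timeout
now[0] = 61.0
third = cache.get("/Books/Book/Title")   # expired: fetched again
print(first, second, third, len(calls))
```

With `timeout_seconds=0` the freshness test never succeeds, so every call goes to the source, which matches the "0 cache timeout" bullet.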

12
ICE: Information and Content Exchange
  • Protocol proposed by W3C for automatically
    informing about and requesting changes
  • Supported by major vendors
  • Push: subscribe to changes and keep cache
    up-to-date
  • Pull: request changes from source at query time
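
To make the push/pull distinction concrete, here is a small Python sketch. It does not use or imitate the real ICE message format; `Source`, `CachedCopy`, `subscribe`, and `changes_since` are invented stand-ins that only show where the two modes differ and that only the delta is transferred.

```python
class Source:
    """Stand-in for an XML source that records and reports its changes."""

    def __init__(self):
        self.version = 0
        self.log = []          # (version, key, value) change records
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, key, value):
        self.version += 1
        change = (self.version, key, value)
        self.log.append(change)
        for notify in self.subscribers:   # push: tell subscribers right away
            notify(change)

    def changes_since(self, version):     # pull: return only the delta
        return [c for c in self.log if c[0] > version]


class CachedCopy:
    """A cache that is brought up to date by applying change records."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, change):
        version, key, value = change
        self.data[key] = value
        self.version = version


source = Source()
pushed = CachedCopy()
source.subscribe(pushed.apply)   # push mode: cache follows every change
pulled = CachedCopy()            # pull mode: cache catches up on demand

source.update("Orwell", "1984")
source.update("Steinbeck", "Of Mice and Men")

# At query time, the pull-mode cache requests only what it has missed.
for change in source.changes_since(pulled.version):
    pulled.apply(change)

print(pushed.data == pulled.data, pulled.version)
```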

13
Slow and Unreliable XML Sources
  • Overload, maintenance, HW breakdown, attacks
  • Often we have no influence on this
  • Incremental presentation for user
  • What if source is too slow or no reply at all?
  • Inform user that the system is not working?
  • Specification of alternative sources
  • Several queries per external dimension/measure
  • Increased fault tolerance, also better performance

14
Slow and Unreliable XML Sources
  • Start several queries and use the fastest
  • Always uses the fastest, but heavy load on
    sources
  • Use first response time as indicator for total
    time
  • Start one query at a time
  • Minimal load on sources, but slower
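
The two extremes on this slide, starting all queries at once versus one at a time, can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function names and the simulated sources are assumptions, and `cancel_futures` requires Python 3.9 or later.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def fetch_fastest(fetchers):
    """Start every fetcher at once and return the first successful result."""
    pool = ThreadPoolExecutor(max_workers=len(fetchers))
    pending = {pool.submit(f) for f in fetchers}
    try:
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    return future.result()
        raise RuntimeError("all sources failed")
    finally:
        # Do not wait for slower queries. Threads that are already running
        # cannot be interrupted, so they still load their sources.
        pool.shutdown(wait=False, cancel_futures=True)


def fetch_one_at_a_time(fetchers):
    """Try the fetchers in order; move on only when one fails."""
    for fetch in fetchers:
        try:
            return fetch()
        except Exception:
            continue
    raise RuntimeError("all sources failed")


def make_source(name, delay, fail=False):
    """Simulated XML source (hypothetical): sleeps, then answers or fails."""
    def fetch():
        time.sleep(delay)
        if fail:
            raise IOError(f"{name} is unavailable")
        return name
    return fetch


SOURCES = [
    make_source("slow mirror", 0.3),
    make_source("broken source", 0.0, fail=True),
    make_source("fast mirror", 0.05),
]

print(fetch_fastest(SOURCES))
print(fetch_one_at_a_time(SOURCES))
```

The parallel version gets the fastest working source at the cost of querying all of them; the sequential version queries as few sources as possible but is stuck with whichever working source comes first in the list.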

15
Slow and Unreliable XML Sources
  • Alternative sources of lower quality: better than
    no data?
  • Alternatives
  • Expired cache data
  • Google, Xyleme, The WayBack Machine
  • Backup-disk, tape
  • Etc.

Source           Speed      Quality
Local cache      Fastest    Fresh
Original source  Fast?      Freshest
Expired cache    Fastest    Old
Backup source    Fast/slow  Very old

16
Slow and Unreliable XML Sources
  • In practice?
  • Sources with equal priority chosen at random
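
One way to read this, together with the source table on the previous slide, is as an ordered fallback list in which ties are broken at random. The sketch below assumes numeric priorities and a "mirror" source that are not in the slides, and it is not the authors' Algorithm 1, which the transcript does not reproduce.

```python
import random


def order_sources(sources, rng=random):
    """Sort by priority (lower number first); shuffle sources that tie."""
    by_priority = {}
    for name, priority in sources:
        by_priority.setdefault(priority, []).append(name)
    ordered = []
    for priority in sorted(by_priority):
        group = by_priority[priority]
        rng.shuffle(group)           # equal priority: random order
        ordered.extend(group)
    return ordered


def fetch_with_fallback(sources, fetch, rng=random):
    """Return (source name, data) from the first source that answers."""
    for name in order_sources(sources, rng):
        try:
            return name, fetch(name)
        except Exception:
            continue
    return None, None


# Hypothetical priorities; "mirror" is an invented alternative source.
SOURCES = [
    ("backup source", 4),
    ("original source", 2),
    ("expired cache", 3),
    ("mirror", 2),
    ("local cache", 1),
]

def fake_fetch(name):
    # Simulate an outage of everything except stale and backup data.
    if name in ("local cache", "original source", "mirror"):
        raise IOError(f"{name} did not answer")
    return f"data from {name}"

order = order_sources(SOURCES)
used, data = fetch_with_fallback(SOURCES, fake_fetch)
print(order)
print(used, data)
```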

17
Result: Algorithm for Fetching XML Data
18
Experiments
  • 1st experiment: fetching a 137 KB dimension
  • Start 8 queries; when the first 3 respond, (cancel)
    the last 5; when the fastest query finishes, (cancel)
    the remaining 2
  • A fast reply is a good indication of overall speed
  • 2nd experiment: search local cache, then Google
    cache
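
The staged cancellation used in the first experiment can be sketched with `asyncio`. The source names and all delays below are made up for the illustration and are not measurements from the experiment; each simulated query reports its first response through a queue and then takes a while longer to finish.

```python
import asyncio


async def query(name, first_reply, total, responded):
    """Simulated query: report the first response, then finish later."""
    await asyncio.sleep(first_reply)
    await responded.put(name)
    await asyncio.sleep(total - first_reply)
    return name


async def staged_race(specs, keep=3):
    responded = asyncio.Queue()
    tasks = {
        name: asyncio.create_task(query(name, first, total, responded))
        for name, first, total in specs
    }
    # Stage 1: keep the first `keep` queries to respond, cancel the rest.
    survivors = [await responded.get() for _ in range(keep)]
    for name, task in tasks.items():
        if name not in survivors:
            task.cancel()
    # Stage 2: when the fastest survivor finishes, cancel the others.
    done, pending = await asyncio.wait(
        [tasks[name] for name in survivors],
        return_when=asyncio.FIRST_COMPLETED,
    )
    for task in pending:
        task.cancel()
    # Let the cancelled tasks wind down before returning.
    await asyncio.gather(*tasks.values(), return_exceptions=True)
    return survivors, done.pop().result()


# Eight hypothetical sources: (name, first-reply delay, total time) in seconds.
SPECS = [(f"s{i}", 0.02 * i, 0.1 * i) for i in range(1, 9)]

survivors, winner = asyncio.run(staged_race(SPECS))
print(survivors, winner)
```

In this simulation the query that replies first also finishes first, which is the assumption behind using the first response time as an indicator of total time.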

19
Schema Changes In XML Sources
  • How to synchronize XML views after schema change?
    (solution described in separate paper)

[Figure: two Bibliography trees, each with the nodes Book, Publisher, Author, Title, Price, PName, and AName]
/Bibliography/Author[AName='Orwell']/Book/Title
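
To see why a schema change breaks an XPath-based view, consider the sketch below. The two document layouts are assumptions made for this example (the exact trees from the slide's figure are not recoverable here), and the rewritten path is written by hand; the synchronization technique itself is described in the separate paper and is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed layout before the change: books are nested under their author.
AUTHOR_FIRST = """
<Bibliography>
  <Author><AName>Orwell</AName><Book><Title>1984</Title></Book></Author>
</Bibliography>
"""

# Assumed layout after the change: authors are nested under their book.
BOOK_FIRST = """
<Bibliography>
  <Book><Title>1984</Title><Author><AName>Orwell</AName></Author></Book>
</Bibliography>
"""

# The view from the slide, relative to the <Bibliography> root.
OLD_VIEW = "./Author[AName='Orwell']/Book/Title"
# A hand-written equivalent for the changed layout.
NEW_VIEW = "./Book/Author[AName='Orwell']/../Title"


def titles(xml_text, path):
    return [t.text for t in ET.fromstring(xml_text).findall(path)]


before = titles(AUTHOR_FIRST, OLD_VIEW)  # view works on the old layout
broken = titles(BOOK_FIRST, OLD_VIEW)    # same view finds nothing afterwards
repaired = titles(BOOK_FIRST, NEW_VIEW)  # rewritten view works again
print(before, broken, repaired)
```

An empty result on a view that used to return data is one simple signal that the source schema may have changed.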
20
Additional Challenges
  • Changes to federation schema
  • Cache may be invalidated
  • Discard affected cache results (unproblematic)
  • OLAP data changes
  • Cache may be invalidated
  • Less frequent than XML data changes => cache will
    often have expired anyway
  • OLAP schema changes
  • Federated schema may be invalidated
  • Rare and easy to detect (and correct)

21
Integrating Techniques - Architecture
22
Integrating Techniques - Query Processing
  • Query Evaluator: splits query into XML and OLAP parts
    and determines query plan based on cost
  • Execution Engine: coordinates and executes plan
  • Cache Manager: maintains cache, e.g., through ICE
  • XML Component interface: fetches XML data, chooses
    between available XML sources (Algorithm 1)
  • View Synchronizer: handles schema changes
  • Metadata Manager: manages info about external
    dimensions and measures, and XML component
    characteristics

23
Other Applications
  • All XPath-based views on XML data
  • Links to parts of XML documents
  • Web pages
  • Documents (DocBook)
  • Software applications
  • and many more
  • Automatic recreation of broken links
  • Increased fault tolerance and performance using
    alternative sources

24
Conclusion and Future Work
  • Operational problems in OLAP-XML federations
  • XML data changes
  • Slow and unreliable XML sources
  • Using several sources (Algorithm 1)
  • Experiment with Algorithm 1
  • Techniques integrated into federation
    architecture
  • Schema evolution and other challenges
  • Future work
  • TARGIT implementation and testing
  • Using techniques in other applications

25
Related Work
  • Data changes in XML/semistructured documents
  • Xyleme, Zhuge
  • Schema changes in scientific documents
  • Not XML
  • Adaptive/dynamic query optimization
  • Telegraph project
  • We use once per source, rather than per tuple
  • Does not consider one or more of: OLAP-XML
    concepts, schema changes, slow and unreliable
    sources
  • Own previous OLAP-XML work is not adaptive