Materialized%20View%20Selection%20in%20a%20Multidimensional%20Database - PowerPoint PPT Presentation

About This Presentation
Title:

Materialized%20View%20Selection%20in%20a%20Multidimensional%20Database

Description:

Time, which can be characterized by Timestamp, Date, Week, Month, Quarter, Year. ... It depends on the number of attributes of the dimensions of the MDDB ... – PowerPoint PPT presentation

Number of Views:82
Avg rating:3.0/5.0
Slides: 24
Provided by: ksu7
Learn more at: https://www.cs.kent.edu
Category:

less

Transcript and Presenter's Notes

Title: Materialized%20View%20Selection%20in%20a%20Multidimensional%20Database


1
Materialized View Selection in a Multidimensional
Database
  • Presenter Dong Wang
  • 3/14/2006

2
outlines
  • What is multidimensional database.
  • Why materialize views.
  • The cost evaluation.
  • The MDred-lattice.

3
Multidimensional Database
  • A multidimensional database (MDDB) is a data
    repository that provides an integrated
    environment for decision support queries that
    require complex aggregations on huge amounts of
    historical data.
  • An MDDB is a relational data warehouse where the
    information is organized following the so-called
    star-model.

4
A Practical Example
  • Consider the MDDB for a large store chain,
    characterized by a large number of stores, each
    of which is a supermarket selling a wide variety
    of different products. We can identify the
    following dimensions
  • Product, which can be characterized by
    Product_id, Department, Manufactured_date and
    Price.
  • Store, which can be characterized by Store_id,
    store address (which can be decomposed into City,
    State, and Zip).
  • Time, which can be characterized by Timestamp,
    Date, Week, Month, Quarter, Year.

5
The schema of the example
Sales
Transaction_id Timestamp Product_id Store_id
6
Example queries
  • Query 1 the total sales for year 2003.
  • SELECT SUM (Price)
  • FROM Sales, Time, Product
  • WHERE Sales.Product_id Product.Product_id
  • AND Sales.Timestamp Time.Timestamp
  • AND Time.Year 2003
  • Query 2 the total sales for store at Ohio.
  • SELECT SUM (Price)
  • FROM Sales, Store, Product
  • WHERE Sales.Product_id Product.Product_id
  • AND Sales.Store_id Store.Store_id
  • AND Store.State Ohio

7
How many views an MDDB can have?
  • It depends on the number of attributes of the
    dimensions of the MDDB
  • without hierarchies on the dimensional tables,
    the number is
  • In our example database with only 3 dimension
    tables of 6, 4, 4 attributes, this number is
    18785, but for a real-world database with 50
    attributes, this number is 2501015,

8
outlines
  • What is multidimensional database.
  • Why materialize views.
  • The cost evaluation.
  • Data-cube lattice, MD-lattice and MDred-lattice.

9
Materialized View
  • A materialized view is the result of some
    queries, which we choose to store in the
    database, rather than reconstructing it as needed
    in response to queries.
  • INSERT INTO SalesV1
  • SELECT SUM (Price)
  • FROM Sales, Time, Product
  • WHERE Sales.Product_id Product.Product_id
  • AND Sales.Timestamp Time.Timestamp
  • GROUP BY (Time.Year)

The materialized view SalesV1 can answer the
query 1 directly.
10
outlines
  • What is multidimensional database.
  • Why materialize views.
  • The cost evaluation.
  • MDred-lattice.

11
The MDmat-Problemthe cost
  • Query cost the cost of computing query qi,
    given a set of materializations M. We want to
    minimize this cost.
  • Update cost , here mi is the ith view in M
    and
  • fmi is the frequency mj is updated and cu(mi) is
    the update cost for mi. We want to minimize this
    cost too.
  • So, given the query set and the materialized
    view set, the cost of this solution is the sum of
    the above two costs

12
choose the right views to materialize
  • Compare to the possible views we can have, the
    number of queries is extremely small. Consider
    the data-cube lattice we have below, among the
    total 16 nodes, only 4 nodes may be used to
    answer queries. So we can only select a small
    number of views to materialize.

13
Functional Dependence
  • Functional dependency is a constraint on the
    content of the dimension table
  • for each tuple pair t1,t2 and fd Al?Ar,
    t1Alt2Al?t1Art2Ar
  • Examples
  • 1. In the dimension table Store, we have fds1
    Store_id ?Zip, fds2 Zip ? City, fds3 City
    ?State.
  • 2. In the dimension talbe Time, we have fdt1
    timestamp ?week, fdt2 timestamp ?date, fdt3
    date ?month, fdt4 month ?quarter, fdt5 quarter
    ?year.

Use the attributes hierarchy, we can get the
multidimensional lattice.
14
The MD-lattice
Timestamp
Date
Week
Month
Quarter
Year
all
all
The MD-lattice of the Store dimension
The MD-lattice of the Time dimension
15
Candidate Views
  • Its impossible (and no need) to materialize all
    the possible views in the data cube. We only need
    the views which can help us to answer the
    queries.
  • We only consider the views that can provide some
    contribution to reduce the total cost, the
    candidate views.
  • A view vi belonging to an MD-lattice is a
    candidate view if one of the following two
    conditions holds
  • 1. View vi is associated to some query qi
  • 2. There exist two candidate views vj and vk,
    and vi is the least upper bound of vj and
    vk.

16
The materialization of a non-candidate view will
not help
  • Suppose there is a non-candidate view vi and its
    materialized. We consider two cases
  • There is no candidate view depending on vi. Since
    vi will not change the query cost, and the update
    cost for view vi is always positive, so
    materialize vi will not help.
  • At least one candidate view exists depending on
    vi. Say theres a candidate view vj depending on
    vi. Since the size of vj is smaller than vi, we
    can see the update cost of vj is always smaller
    than vi. That means the materialization of vi
    always costs more.
  • Conclusion we should always choose the candidate
    view to materialize.

case 1 case 2 Both views are
materialized only the non-candidate view is
materialized
17
Candidate views examples
  • For query 1 on slide 5, we can choose the view
    SalesV1 to materialize.
  • For query 2, we can do
  • CREAT MATERIALIZED VIEW SalesV2
  • SELECT SUM (Price)
  • FROM Sales, Store, Product
  • WHERE Sales.Product_id Product.Product_id
  • AND Sales.Store_id Store.Store_id
  • GROUP BY (Store.State)
  • In both examples, we choose the view which is
    associated to the query to materialize.

18
outlines
  • What is multidimensional database.
  • Why materialize views.
  • The cost evaluation.
  • The MDred-lattice.

19
The MDred-lattice
  • Given an MD-lattice and a set of queries Q, the
    set of its candidate views forms the
    MDred-lattice.
  • The MDred-lattice Construction Algorithm

20
An MD-lattice construction
  • Suppose we have two queries
  • query 1 the total sale of the week 50.
  • query 2 the total sale of the 3rd quarter of
    year 2005.
  • From the MDred-lattice construction algorithm,
    first we need to materialize the views group by
    attribute Week and attribute Quarter to answer
    the queries, then we need to extend the view set
    by adding the least upper bound, attribute
    Timestamp to the view set.

21
The cost evaluation
  • Suppose we have two queries qj and qk, consider
    both the query cost and the update cost, we have
    two options
  • Option 1 materialize vj and vk. The total cost
    is
  • Option 2 only materialize vi, which is the
    least upper bound of vj and vk. The total cost is

22
The cost evaluation (cont.)
  • For option 1, let fu0.8, cu(vj)100, fqj0.5,
    cqj(vj)100, cu(vk)100, cqk(vk)100, we can get
  • C10.81000.51000.81000.5100260.
  • For option 2, let fu0.8, the update cost will be
    larger (since the cardinality of vi is larger),
    say cu(vi)120, the query cost will also be
    larger (since additional aggregation will be used
    to answer the queries), say cqj(vi)110,
    cqk(vi)110, we can get
  • C20.81200.51100.5110206.
  • So option 2 is the better choice!

23
References
  • Materialized view selection in a multidimensional
    database. Elena Baralis, Stefano Paraboschi and
    Ernest Teniente. Proceedings of the 23rd VLDB
    Conference.1997
  • Designing Data Warehouses. Dimitri Theodoratos,
    Timos Sellis. 1999
Write a Comment
User Comments (0)
About PowerShow.com