Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7 - PowerPoint PPT Presentation

About This Presentation
Title:

Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7

Description:

Title: Data Warehousing Overview: Issues, Terminology, Products and Research Author: Janet Wiener Last modified by: Shivnath Babu Created Date: 6/13/1997 9:22:38 PM – PowerPoint PPT presentation

Number of Views:349
Avg rating:3.0/5.0
Slides: 29
Provided by: Janet241
Category:

less

Transcript and Presenter's Notes

Title: Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7


1
Cube Computation and Indexes for Data Warehouses
CPS 196.03Notes 7
2
Processing
  • ROLAP servers vs. MOLAP servers
  • Index Structures
  • Cube computation
  • What to Materialize?
  • Algorithms

3
ROLAP Server
  • Relational OLAP Server

tools
Special indices, tuning Schema is denormalized
4
MOLAP Server
  • Multi-Dimensional OLAP Server

M.D. tools
multi-dimensional server
could also sit on relational DBMS
5
MOLAP
Total annual sales of TV in U.S.A.
6
MOLAP
B
7
Challenges in MOLAP
  • Storing large arrays for efficient access
  • Row-major, column major
  • Chunking
  • Compressing sparse arrays
  • Creating array data from data in tables
  • Efficient techniques for Cube computation

Topics are discussed in the paper for reading
8
Index Structures
  • Traditional Access Methods
  • B-trees, hash tables, R-trees, grids,
  • Popular in Warehouses
  • inverted lists
  • bit map indexes
  • join indexes
  • text indexes

9
Inverted Lists
. . .
data records
inverted lists
age index
10
Using Inverted Lists
  • Query
  • Get people with age 20 and name fred
  • List for age 20 r4, r18, r34, r35
  • List for name fred r18, r52
  • Answer is intersection r18

11
Bit Maps
. . .
age index
data records
bit maps
12
Bitmap Index
  • Index on a particular column
  • Each value in the column has a bit vector bit-op
    is fast
  • The length of the bit vector of records in the
    base table
  • The i-th bit is set if the i-th row of the base
    table has the value for the indexed column
  • not suitable for high cardinality domains

Base table
Index on Region
Index on Type
13
Using Bit Maps
  • Query
  • Get people with age 20 and name fred
  • List for age 20 1101100000
  • List for name fred 0100000001
  • Answer is intersection 010000000000
  • Good if domain cardinality small
  • Bit vectors can be compressed

14
Join
  • Combine SALE, PRODUCT relations
  • In SQL SELECT FROM SALE, PRODUCT WHERE ...

15
Join Indexes
join index
16
Cube Computation for Data Warehouses
17
Counting Exercise
  • How many cuboids are there in a cube?
  • The full or nothing case
  • When dimension hierarchies are present
  • What is the size of each cuboid?

18
Lattice of Cuboids
129
all
city
product
date
city, product
city, date
product, date
city, product, date
19
Dimension Hierarchies
all
state
city
20
Dimension Hierarchies
all
product
city
date
product, date
city, product
city, date
state
city, product, date
state, date
state, product
state, product, date
not all arcs shown...
21
Efficient Data Cube Computation
  • Data cube can be viewed as a lattice of cuboids
  • The bottom-most cuboid is the base cuboid
  • The top-most cuboid (apex) contains only one cell
  • How many cuboids in an n-dimensional cube with L
    levels?
  • Materialization of data cube
  • Materialize every (cuboid) (full
    materialization), none (no materialization), or
    some (partial materialization)
  • Selection of which cuboids to materialize
  • Based on size, sharing, access frequency, etc.

22
Derived Data
  • Derived Warehouse Data
  • indexes
  • aggregates
  • materialized views (next slide)
  • When to update derived data?
  • Incremental vs. refresh

23
Idea of Materialized Views
  • Define new warehouse tables/arrays

24
Efficient OLAP Processing
  • Determine which operations should be performed on
    available cuboids
  • Transform drill, roll, etc. into corresponding
    SQL and/or OLAP operations, e.g., dice
    selection projection
  • Determine which materialized cuboid(s) should be
    selected for OLAP
  • Let the query to be processed be on brand,
    province_or_state with the condition year
    2004, and there are 4 materialized cuboids
    available
  • 1) year, item_name, city
  • 2) year, brand, country
  • 3) year, brand, province_or_state
  • 4) item_name, province_or_state where year
    2004
  • Which should be selected to process the query?
  • Explore indexing structures compressed vs.
    dense arrays in MOLAP

25
What to Materialize?
  • Store in warehouse results useful for common
    queries
  • Example

total sales
day 2
. . .
day 1
129
materialize
26
Materialization Factors
  • Type/frequency of queries
  • Query response time
  • Storage cost
  • Update cost

Will study a concrete algorithm later
27
Iceberg Cube
  • Computing only the cuboid cells whose count or
    other aggregates satisfying the condition like
  • HAVING COUNT() gt minsup
  • Motivation
  • Only a small portion of cube cells may be above
    the water in a sparse cube
  • Only calculate interesting cellsdata above
    certain threshold

28
Challenges in MOLAP
  • Storing large arrays for efficient access
  • Row-major, column major
  • Chunking
  • Compressing sparse arrays
  • Creating array data from data in tables
  • Efficient techniques for Cube computation

Topics are discussed in the paper for reading
Write a Comment
User Comments (0)
About PowerShow.com