Data Warehousing and OLAP MIS 542 Chapter

4

- 2014/2015 Fall

Chapter 3 Data Warehousing and OLAP Technology

for Data Mining

- What is a data warehouse?
- A multi-dimensional data model
- Data warehouse architecture
- Data warehouse implementation
- Further development of data cube technology
- From data warehousing to data mining

What is Data Warehouse?

- Defined in many different ways, but not

rigorously. - A decision support database that is maintained

separately from the organizations operational

database - Support information processing by providing a

solid platform of consolidated, historical data

for analysis. - A data warehouse is a subject-oriented,

integrated, time-variant, and nonvolatile

collection of data in support of managements

decision-making process.W. H. Inmon - Data warehousing
- The process of constructing and using data

warehouses

Data WarehouseSubject-Oriented

- Organized around major subjects, such as

customer, product, sales. - Focusing on the modeling and analysis of data for

decision makers, not on daily operations or

transaction processing. - Provide a simple and concise view around

particular subject issues by excluding data that

are not useful in the decision support process.

Data WarehouseIntegrated

- Constructed by integrating multiple,

heterogeneous data sources - relational databases, flat files, on-line

transaction records - Data cleaning and data integration techniques are

applied. - Ensure consistency in naming conventions,

encoding structures, attribute measures, etc.

among different data sources - E.g., Hotel price currency, tax, breakfast

covered, etc. - When data is moved to the warehouse, it is

converted.

Data WarehouseTime Variant

- The time horizon for the data warehouse is

significantly longer than that of operational

systems. - Operational database current value data.
- Data warehouse data provide information from a

historical perspective (e.g., past 5-10 years) - Every key structure in the data warehouse
- Contains an element of time, explicitly or

implicitly - But the key of operational data may or may not

contain time element.

Data WarehouseNon-Volatile

- A physically separate store of data transformed

from the operational environment. - Operational update of data does not occur in the

data warehouse environment. - Does not require transaction processing,

recovery, and concurrency control mechanisms - Requires only two operations in data accessing
- initial loading of data and access of data.

Data Warehouse vs. Heterogeneous DBMS

- Traditional heterogeneous DB integration
- Build wrappers/mediators on top of heterogeneous

databases - Query driven approach
- When a query is posed to a client site, a

meta-dictionary is used to translate the query

into queries appropriate for individual

heterogeneous sites involved, and the results are

integrated into a global answer set - Complex information filtering, compete for

resources - Data warehouse update-driven, high performance
- Information from heterogeneous sources is

integrated in advance and stored in warehouses

for direct query and analysis

Data Warehouse vs. Operational DBMS

- OLTP (on-line transaction processing)
- Major task of traditional relational DBMS
- Day-to-day operations purchasing, inventory,

banking, manufacturing, payroll, registration,

accounting, etc. - OLAP (on-line analytical processing)
- Major task of data warehouse system
- Data analysis and decision making
- Distinct features (OLTP vs. OLAP)
- User and system orientation customer vs. market
- Data contents current, detailed vs. historical,

consolidated - Database design ER application vs. star

subject - View current, local vs. evolutionary, integrated
- Access patterns update vs. read-only but complex

queries

OLTP vs. OLAP

Why Separate Data Warehouse?

- High performance for both systems
- DBMS tuned for OLTP access methods, indexing,

concurrency control, recovery - Warehousetuned for OLAP complex OLAP queries,

multidimensional view, consolidation. - Different functions and different data
- missing data Decision support requires

historical data which operational DBs do not

typically maintain - data consolidation DS requires consolidation

(aggregation, summarization) of data from

heterogeneous sources - data quality different sources typically use

inconsistent data representations, codes and

formats which have to be reconciled

Chapter 2 Data Warehousing and OLAP Technology

for Data Mining

- What is a data warehouse?
- A multi-dimensional data model
- Data warehouse architecture
- Data warehouse implementation
- Further development of data cube technology
- From data warehousing to data mining

From Tables and Spreadsheets to Data Cubes

- A data warehouse is based on a multidimensional

data model which views data in the form of a data

cube - A data cube, such as sales, allows data to be

modeled and viewed in multiple dimensions - Dimensions
- A sale data warehouse with respect to dimension
- time, item, branch, location
- Dimension tables, such as
- item (item_name, brand, type), or
- time(day, week, month, quarter, year)
- Facts numerical measures
- dollars_sold sales amount in dollars
- units_sold number of units sold
- Fact table contains measures (such as

dollars_sold) and keys to each of the related

dimension tables

A 2-D data cube A table or spreadsheet for

sales from AllElectronics

AllElectronics sales data for items sold per

quarter in the city of Istanbul

Time dimension organized in quarters item

dimension by types of items sold The fact or

measure is dollar_sold

A three dimensional cube

Dimensions time, item, location for the cities

A Data Cube Representation of the same data

- Sales volume as a function of product, month, and

region

Dimensions Product, Location, Time Hierarchical

summarization paths

izmir

locations

ankara

istabbul

items

phon

comp

Quarters

Q1

A 4-D Cube as a series of 3-D cubes

Dimensions item,time,location,supplier

From Supplier A

From Supplier B

From Supplier C

locations

items

Quarters

n-Dimensional Cube

- Any n-D data as a series of (n-1)-D cubes
- In data warehousing literature,
- A data cube is referred to as a cuboid
- The lattice of cuboids forms a data cube.
- The cuboid holding the lowest level of

summarization is called a base cuboid. - the 4-D cuboid is the base cuboid for the given

four dimensions - The top most 0-D cuboid, which holds the

highest-level of summarization, is called the

apex cuboid. - Here this is the total sales, or dollars_sold

summarized over all four dimensions - typically denoted by all

Cube A Lattice of Cuboids

all

0-D(apex) cuboid

time

item

location

supplier

1-D cuboids

time,item

time,location

item,location

location,supplier

2-D cuboids

time,supplier

item,supplier

time,location,supplier

time,item,location

3-D cuboids

item,location,supplier

time,item,supplier

4-D(base) cuboid

time, item, location, supplier

Conceptual Modeling of Data Warehouses

- Modeling data warehouses dimensions measures
- Star schema A fact table in the middle connected

to a set of dimension tables - Snowflake schema A refinement of star schema

where some dimensional hierarchy is normalized

into a set of smaller dimension tables, forming a

shape similar to snowflake - Fact constellations Multiple fact tables share

dimension tables, viewed as a collection of

stars, therefore called galaxy schema or fact

constellation

Example of Star Schema

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_sales

Measures

(No Transcript)

Example of Snowflake Schema

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_sales

Measures

A unnormalized City Table

City province and Country is repeated For every

steet in Istanbul

A Normalized City Table

Unnecessary repitations of province end country

are eliminated Memory gain but complex queries

Example of Fact Constellation

Shipping Fact Table

time_key

Sales Fact Table

item_key

time_key

shipper_key

item_key

from_location

branch_key

to_location

location_key

dollars_cost

units_sold

units_shipped

dollars_sold

avg_sales

Measures

Measures Three Categories

- A multidimensional point in the data cube space
- dimension-value pairs
- (timeQ1,locationIstanbul,itemcomputer)
- A data cube measure is a numerical function that

can be evaluated at each point in the data cube

space - computed for a given point by aggregating the

data corresponding to the respective

dimension-value pairs defining the given point

Measures Three Categories

- distributive
- Suppose the data D is partitioned into n sets Di

i 1,..n - the computation of the function f on each

partition derives one aggregate value - Ai f(Di) i 1,..,n, f(D)f(A1,A2,..,An)
- if the result derived by applying the function to

n aggregate values is the same as that derived by

applying the function on all the data without

partitioning. - The function can be computed in a distributed

manner - E.g., count(), sum(), min(), max().

Example Sum()

- Data set D1,3,6,8,9
- Sum(D) 27
- Partition the set into D1 end D2 as
- D11,3,6), D28.9
- Sum(D1) 10, Sum(D2) 17
- Sum(sum(D1),sum(D2)) sum(10,17) 27

sum(D)

Measures Three Categories

- algebraic if it can be computed by an algebraic

function with M arguments (where M is a bounded

integer), each of which is obtained by applying a

distributive aggregate function. - E.g., avg(), min_N(), standard_deviation().
- E.g.,avg() sum()/count()
- both sum() and count() are distributive agg.

Functions - Show that
- min_N(), standard_deviation().

Measures Three Categories

- holistic if there is no constant bound on the

storage size needed to describe a subaggregate. - There is no an algebric function with M

arguments(M being bounded) that characterizes

the computation - E.g., median(), mode(), rank().

Example

- The relational database scheme for AE
- time(time_key,day,day_of_week,month,quarter,year)
- item(item_key,item_name,brand,type,supplier_type)
- branch(branch_key,branch_name,branch_type)
- location(location_key,street,city,province_or_stat

e,country) - sales(time_key,item_key,branch_key,location_key,nu

mber_of_units_sold,price)

Example cont.

- Select s.time_key,s.item_key,s.branch_key,s.locati

on_key, - sum(s.number_of_units_solds.price),
- sum(s.number_of_units_sold)
- from time t, item i,branch b,location l,sales s,
- where s.time_keyt.time_key and
- s.item_keyi.item_key and
- s.branch_keyb.branch_key and
- s.location_keyl.location_key
- group by s.time_key, s.item_key,

s.branch_key,s.location_key

Example Cont.

- The cube created is the base cuboid of the

sales_star datacube - it contains all of the dimensions
- granularity of each is at the join key level
- by changing the group by clauses
- E.g.,
- group by t.month sum up the measures of each

group by month - removing group by s.branch_key generate a

higher-level cuboid - recoving all group bys total sum of dollars sold

and total count of units_sold - zero-dimensional cuboid is apex cuboid

Time by month

- Select t.year,t.month,s.item_key,s.branch_key,s.lo

cation_key,sum(s.number_of_units_solds.price),sum

(s.number_of_units_sold) - from time t, item i,branch b,location l,sales s,
- where s.time_keyt.time_key and
- s.item_keyi.item_key and
- s.branch_keyb.branch_key and
- s.location_keyl.location_key
- group by t.year, t.month, s.item_key,

s.branch_key,s.location_key

A three dimensional cuboid

- Select s.time_key,s.item_key, s.location_key,
- sum(s.number_of_units_solds.price),
- sum(s.number_of_units_sold)
- from time t, item i,branch b,location l,sales s,
- where s.time_keyt.time_key and
- s.item_keyi.item_key and
- s.branch_keyb.branch_key and
- s.location_keyl.location_key
- group by s.time_key, s.item_key,s.location_key

Concept hierarchies

- Defines a sequence of mappings from a set of

low-level concepts to high-level more general

concepts - E.g., dimension location is described by
- number,street,city,province_or_state,zipcode and

country - are related by a total order, forming a concept

hierarchy - streetltcityltprovince_or_stateltcountry
- The attributes of a dimension may be organized in

a partial order, forming a lattice - day,week,month,quarter, year
- dayltmonthltquarter,weekltyear

A Concept Hierarchy Dimension (location)

all

all

Europe

North_America

...

region

Mexico

Canada

Spain

Germany

...

...

country

Vancouver

...

...

Toronto

Frankfurt

city

M. Wind

L. Chan

...

office

Partially ordered co

- The attributes of a dimension may be organized in

a partial order, forming a lattice - day,week,month,quarter, year
- dayltmonthltquarter,weekltyear
- predefined in the data mining system
- time
- fiscal year starting on April 1
- academic year starting on September 1

Multidimensional Data

- Sales volume as a function of product, month, and

region

Dimensions Product, Location, Time Hierarchical

summarization paths

Region

Industry Region Year Category

Country Quarter Product City Month

Week Office Day

Product

A partially ordered hierarchy

Month

Cuboids Corresponding to the Cube

all

0-D(apex) cuboid

location

item

time

1-D cuboids

item,time

item,location

time, location

2-D cuboids

3-D(base) cuboid

item, time, location

Set grouping hierarchy

- Set-grouping hierarchy
- discretizing or grouping values for a given

dimension or attribute - Ex price
- There may be more than one concept hierarchy for

a given attribute or dimension based on

different user viewpoints - price by defining ranges for
- inexpensive, moderately_priced,expensive

How defined?

- provided by manually by
- system users
- domain experts
- knowledge engineers or
- automatically generated based on statistical

analysis of the data distribution

Typical OLAP Operations

- In the multidimensional model data are organized

into multiple dimensions - each dimension contains multiple levels of

abstraction defined by concept hierarchies - This organization provides users with the

flexibility to view data from different

perspectives

Example

- Refer to figure 2.10 in Hans book
- data cube for AllElls sales
- dimensions location,time,item
- location -- city
- time -- quarters
- item -- item types
- measure displayed is dollars-sold

Cuboids Corresponding to the Cube

all

0-D(apex) cuboid

location

item

time

1-D cuboids

item,time

item,location

time, location

2-D cuboids

3-D(base) cuboid

item, time, location

Roll-up (drill-up)

- Climbing up a concept hierarchy for a dimension

or - by dimension reduction
- Exroll-up operation aggregates data by ascending

the location hierarchy - from the level of city
- to the level of country
- rather than grouping the data by city,the cubes
- groups the data by country

By a drill up opperation examine sales By country

rather than city level

roll up

- when performed by dimension reduction
- one or more dimensions are removed from the cube
- Ex a sales cube with location and time
- roll-up may remove the time dimension
- aggregation of total sales by location
- rather than by location and by time

Two dimensional cuboid

One dim. cuboid

Drill-down (roll-down)

- reverse of roll-up
- navigates from less detailed data to more

detailed data - from higher level summary to lower level summary

or detailed data, or - stepping down a concept hierarchy for a dimension
- introducing new dimensions
- Ex drill-down for time
- dayltmonthltquarterltyear
- form the level of quarter
- to the more detailed level of month
- Adding a new dimension to the data

Drill down

Slice and dice

- Slice a selection on one dimension of the cube
- resulting in subcube
- Ex sale data are selected for dimension time

using time Q1 - dice defines a subcube by performing a selection

on two or more dimensions - Ex a dice opp. Based on
- locationtoronto or vencover and
- time Q1 or Q2 and
- item home entertainment or computer

slice

dice

Pivot (rotate)

- Visualization opp. Rotates the data axes in view

to provide an alternative presentation of data - Exitem and location axes in a 2-D slice are

rotated - or transforming a 3-D cube into a series of 2-D

planes

Other OLAP operations

- drill across involving (across) more than one

fact table - drill through through the bottom level of the

cube to its back-end relational tables (using

SQL) - ranking the top N or bottom N items in lists
- moving averages
- growth rates
- interests

Parent Child Dimensions

- Based on two dimension table columns that

together define the lieage relationships among

the members of the dimension - Member key columnidentifies each member
- Parent key column identifies the parent of each

member

Example A HR dimension

Poal West

James Smith

Jill Kelly

Amy Joens

John Grande

Jo Brown

A Concept Hierarchy Dimension (location)

all

all

Europe

North_America

...

region

Mexico

Canada

Spain

Germany

...

...

country

Vancouver

...

...

Toronto

Frankfurt

city

M. Wind

L. Chan

...

office

Example Parent Child Dimension

- Emp._Name Emp_Number Man_Emp_Num
- James Smith 1 3
- Amy Jones 2 3
- Paul West 3 3
- Jill Kelly 4 3
- John Grande 5 1
- Jo Brown 6 1
- Emp_Num identifies each member
- Man_Emp_Nun identifies the parent of each member

A Star-Net Query Model

- Radial lines from a central point
- each line represents a concept hierarchy for a

dimension - each abstraction level is called a footprint
- granularities available for use by OLAP
- Exfigure 2.11
- four radial lines,for concept hierarchies
- location,customer,item,time
- time line has 4 footprints
- day,month,quarter,year

A Star-Net Query Model

Customer Orders

Shipping Method

Customer

CONTRACTS

AIR-EXPRESS

ORDER

TRUCK

PRODUCT LINE

Product

Time

DAILY

QTRLY

ANNUALY

PRODUCT ITEM

PRODUCT GROUP

CITY

SALES PERSON

COUNTRY

DISTRICT

REGION

DIVISION

Each circle is called a footprint

Location

Organization

Promotion

Chapter 2 Data Warehousing and OLAP Technology

for Data Mining

- What is a data warehouse?
- A multi-dimensional data model
- Data warehouse architecture
- Data warehouse implementation
- Further development of data cube technology
- From data warehousing to data mining

Design of a Data Warehouse A Business Analysis

Framework

- Four views regarding the design of a data

warehouse - Top-down view
- allows selection of the relevant information

necessary for the data warehouse - Data source view
- exposes the information being captured, stored,

and managed by operational systems - Data warehouse view
- consists of fact tables and dimension tables
- Business query view
- sees the perspectives of data in the warehouse

from the view of end-user

Data Warehouse Design Process

- Top-down, bottom-up approaches or a combination

of both - Top-down Starts with overall design and planning

(mature) - Bottom-up Starts with experiments and prototypes

(rapid) - From software engineering point of view
- Waterfall structured and systematic analysis at

each step before proceeding to the next - Spiral rapid generation of increasingly

functional systems, short turn around time, quick

turn around - Typical data warehouse design process
- Choose a business process to model, e.g., orders,

invoices, etc. - Choose the grain (atomic level of data) of the

business process - Choose the dimensions that will apply to each

fact table record - Choose the measure that will populate each fact

table record

Multi-Tiered Architecture

Monitor Integrator

OLAP Server

Metadata

Analysis Query Reports Data mining

Serve

Data Warehouse

Data Marts

Data Sources

OLAP Engine

Front-End Tools

Data Storage

Three Data Warehouse Models

- Enterprise warehouse
- collects all of the information about subjects

spanning the entire organization - Data Mart
- a subset of corporate-wide data that is of value

to a specific groups of users. Its scope is

confined to specific, selected groups, such as

marketing data mart - Independent vs. dependent (directly from

warehouse) data mart - Virtual warehouse
- A set of views over operational databases
- Only some of the possible summary views may be

materialized

Data Warehouse Development A Recommended Approach

Multi-Tier Data Warehouse

Distributed Data Marts

Enterprise Data Warehouse

Data Mart

Data Mart

Model refinement

Model refinement

Define a high-level corporate data model

Storage of the cube

- Cuboids are referred as aggregations
- One factor affecting storage requirements
- Sparsity the amount of empty cells in a cube
- The base cuboid is likely to contain many empty

cells - it is a spares cube or array
- the 0 or lower dimensional cuboids are less

spares than the higher dimensional ones - it is not likely that they contain empty cells
- Moving along higher levels for the dimension

hierarchy - the cuboids becomes less spares or more dense

Two dimensional sparse cuboid

PC Prt CD DV

01.10.2003 10

02.10.2003 1

03.10.2003 4 2

04.10.2003 2

items

01.10.2003 10

02.10.2003 1

03.10.2003 6

04.10.2003 2

One dimensional Densed cuboid

OLAP Server Architectures Relational OLAP

(ROLAP)

- Use relational or extended-relational DBMS to

store and manage warehouse data and OLAP middle

ware to support missing pieces - query response is generally slower
- low storage requirement
- Include optimization of DBMS backend,

implementation of aggregation navigation logic,

and additional tools and services - greater scalability
- appropriate for large data sets that are

infrequently queried - historical data from less recent previous years

Multidimensional OLAP (MOLAP)

- Array-based multidimensional storage engine

(sparse matrix techniques) - fast indexing to pre-computed summarized data
- a two-level storage representation
- dense subcubes are stored as array structures
- spars subcubes are stored by compression

techniques - appropriate for cubes with frequent use and rapid

query response

Hybrid OLAP (HOLAP)

- combines ROLAP and MOLAP benefiting from
- greater scalability of ROLAP
- faster computation of MOLAP
- Large volumes of data base cuboid is stored in a

relational database - aggregations are stored as arrays
- appropriate for for cubes that requre
- rapid query response for summaries based on a

large amount of base data

Efficient Data Cube Computation

- Data cube can be viewed as a lattice of cuboids
- The bottom-most cuboid is the base cuboid
- The top-most cuboid (apex) contains only one cell
- Example A data cube containing
- item,city,year as dimensions and
- sales_in_ as measure
- typical queries are
- compute sum of sales, grouping by item and city
- compute sum of sales, grouping by item
- compute sum of sales, grouping by city
- what is the total number of cuboids or group by s

for this data cube - 238

Cuboids Corresponding to the Cube

all

0-D(apex) cuboid

country

product

date

1-D cuboids

product,date

product,country

date, country

2-D cuboids

3-D(base) cuboid

product, date, country

Efficient Data Cube Computation

- How many cuboids in an n-dimensional cube with 1

levels? - 2n cuboids including the base cuboid
- in OLAP compute all or at least some of the

cuboids in advance - fast response time
- avoids some redundant computation

Number of cuboids

- if the cube has many dimensions with multiple

level hierarchies - T is total number of cuboids
- Li is the number of levels associated with

dimension i - excluding the top level all
- as generalizing to all is equivalent to the

removal of a dimension

Materialization of data cube

- There are three choices for data cube

materialization given a base cuboid - Materialize every (cuboid) (full

materialization), - huge amounts of memory space
- none (no materialization), or
- slow processing of queries
- some (partial materialization)
- trade-off between storage space and response time
- Selection of which cuboids to materialize
- Based on size, sharing, access frequency, etc.
- A heuristic approach for cuboid selection
- materialize the set of cuboids on which other

popularly referenced cuboids are based

Processing Cubes

- Complete load of the cube
- all dimension and fact table data is read and
- all specified aggregations cuboids are calculated
- process a cube when
- its structure is new or
- its dimensions or measures have been edited
- Incrementally updating a cube
- new data is added but existing data not changed

and cube structure si the same - Refreshing
- data cleared and reloaded
- its aggregations recalculated
- faster then processingno design of aggregation

tables

Calculated Members

- Dimension member or measure whose value is

computed at run time using an expression - Only the definitions are stored but values exists

only in memory upon a query - do not increase in cube size
- Ex if sales and cost are included in the base

fact table - a profit measure can be a calculated member
- profit sales cost
- Average_sales sales/_items_sold

Virtual cubes

- Combination of multiple cubes in one logical cube
- can be based on a single cube to expose only

selected subsets of measures and dimensions - Require no physical space
- store only the dimensions information not actual

data - provide a valuable security functiton
- limiting the access of some users

Member Properties

- Attribute of a dimension member
- provides additional information about the member
- a column in the same dimension table as the

associated members - used in queries
- provide users more options when analysing cube

data

Example time table

- A typical time table
- (time_id,day,month,quarter,year,business

day,leap,day of the week) - dimension levels dayltmonthltquarterltyear
- member properties for day
- weekend or business day0 or 1
- day of the week1,2,3,...,7
- a member property for year is
- whether it is leap year or not0 or 1

Virtual Dimensions

- Logical dimension based on a member property of a

level in a physical dimension - enables users to analyze cube data based on the

member properties of dimension levels - add a virtual dimension to a cube only if
- the dimension that supplies its member property

is also included in the cube - adding a virtual dimension does not increase cube

size - not affect cube processing time
- calculated in memory when needed
- query processing time is slower

Example

- The business day column was a member property for

day level of the time dimension - the user may want to investigate sales by
- type of day (business or weekend)
- makes business day member property as a virtual

dimension of the sale cube

Exercise

- How do you represent the same organizational

chart by treditional concept hyerarchies?

Data Warehouse Usage

- Three kinds of data warehouse applications
- Information processing
- supports querying, basic statistical analysis,

and reporting using crosstabs, tables, charts and

graphs - Analytical processing
- multidimensional analysis of data warehouse data
- supports basic OLAP operations, slice-dice,

drilling, pivoting - Data mining
- knowledge discovery from hidden patterns
- supports associations, constructing analytical

models, performing classification and prediction,

and presenting the mining results using

visualization tools. - Differences among the three tasks

From On-Line Analytical Processing to On Line

Analytical Mining (OLAM)

- Why online analytical mining?
- High quality of data in data warehouses
- DW contains integrated, consistent, cleaned data
- Available information processing structure

surrounding data warehouses - ODBC, OLEDB, Web accessing, service facilities,

reporting and OLAP tools - OLAP-based exploratory data analysis
- mining with drilling, dicing, pivoting, etc.
- On-line selection of data mining functions
- integration and swapping of multiple mining

functions, algorithms, and tasks. - Architecture of OLAM

An OLAM Architecture

Layer4 User Interface

Mining query

Mining result

User GUI API

OLAM Engine

OLAP Engine

Layer3 OLAP/OLAM

Data Cube API

Layer2 MDDB

MDDB

Meta Data

Database API

FilteringIntegration

Filtering

Layer1 Data Repository

Data Warehouse

Data cleaning

Databases

Data integration

Summary

- Data warehouse
- A subject-oriented, integrated, time-variant, and

nonvolatile collection of data in support of

managements decision-making process - A multi-dimensional model of a data warehouse
- Star schema, snowflake schema, fact

constellations - A data cube consists of dimensions measures
- OLAP operations drilling, rolling, slicing,

dicing and pivoting - OLAP servers ROLAP, MOLAP, HOLAP
- Efficient computation of data cubes
- Partial vs. full vs. no materialization
- Multiway array aggregation
- Bitmap index and join index implementations
- Further development of data cube technology
- Discovery-drive and multi-feature cubes
- From OLAP to OLAM (on-line analytical mining)

References (I)

- S. Agarwal, R. Agrawal, P. M. Deshpande, A.

Gupta, J. F. Naughton, R. Ramakrishnan, and S.

Sarawagi. On the computation of multidimensional

aggregates. In Proc. 1996 Int. Conf. Very Large

Data Bases, 506-521, Bombay, India, Sept. 1996. - D. Agrawal, A. E. Abbadi, A. Singh, and T. Yurek.

Efficient view maintenance in data warehouses.

In Proc. 1997 ACM-SIGMOD Int. Conf. Management of

Data, 417-427, Tucson, Arizona, May 1997. - R. Agrawal, J. Gehrke, D. Gunopulos, and P.

Raghavan. Automatic subspace clustering of high

dimensional data for data mining applications. In

Proc. 1998 ACM-SIGMOD Int. Conf. Management of

Data, 94-105, Seattle, Washington, June 1998. - R. Agrawal, A. Gupta, and S. Sarawagi. Modeling

multidimensional databases. In Proc. 1997 Int.

Conf. Data Engineering, 232-243, Birmingham,

England, April 1997. - K. Beyer and R. Ramakrishnan. Bottom-Up

Computation of Sparse and Iceberg CUBEs. In

Proc. 1999 ACM-SIGMOD Int. Conf. Management of

Data (SIGMOD'99), 359-370, Philadelphia, PA, June

1999. - S. Chaudhuri and U. Dayal. An overview of data

warehousing and OLAP technology. ACM SIGMOD

Record, 2665-74, 1997. - OLAP council. MDAPI specification version 2.0. In

http//www.olapcouncil.org/research/apily.htm,

1998. - J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D.

Reichart, M. Venkatrao, F. Pellow, and H.

Pirahesh. Data cube A relational aggregation

operator generalizing group-by, cross-tab and

sub-totals. Data Mining and Knowledge Discovery,

129-54, 1997.

References (II)

- V. Harinarayan, A. Rajaraman, and J. D. Ullman.

Implementing data cubes efficiently. In Proc.

1996 ACM-SIGMOD Int. Conf. Management of Data,

pages 205-216, Montreal, Canada, June 1996. - Microsoft. OLEDB for OLAP programmer's reference

version 1.0. In http//www.microsoft.com/data/oled

b/olap, 1998. - K. Ross and D. Srivastava. Fast computation of

sparse datacubes. In Proc. 1997 Int. Conf. Very

Large Data Bases, 116-125, Athens, Greece, Aug.

1997. - K. A. Ross, D. Srivastava, and D. Chatziantoniou.

Complex aggregation at multiple granularities.

In Proc. Int. Conf. of Extending Database

Technology (EDBT'98), 263-277, Valencia, Spain,

March 1998. - S. Sarawagi, R. Agrawal, and N. Megiddo.

Discovery-driven exploration of OLAP data cubes.

In Proc. Int. Conf. of Extending Database

Technology (EDBT'98), pages 168-182, Valencia,

Spain, March 1998. - E. Thomsen. OLAP Solutions Building

Multidimensional Information Systems. John Wiley

Sons, 1997. - Y. Zhao, P. M. Deshpande, and J. F. Naughton. An

array-based algorithm for simultaneous

multidimensional aggregates. In Proc. 1997

ACM-SIGMOD Int. Conf. Management of Data,

159-170, Tucson, Arizona, May 1997.