Title: A Framework for Data Warehouse Solutions Stefano Spaccapietra
1A Framework for Data Warehouse
SolutionsStefano Spaccapietra
2General Architecture
OLAP Server
Data acquisition
Data extraction
External Sources
OLAP
queries/ reports
Data Warehouse
Query and Data Analysis Component
Data Integration Component
data mining
Internal Sources
Monitoring Administration
Construction maintenance
3Business Intelligence Applications
F r o n t e n d
Business engine
models
Analysis Query/ reporting Data mining
Business manager Business analyst
data
DW
Transaction DB
Telecom transaction data call detail records
BI applications traffic analysis, customer
loyalty analysis, fraud detection,
promotion effectiveness measure
4OLAP servers
- Memory management
- Efficient computation over data cubes
- MOLAP stores cubes in Multidimensional DB
- queries MDB
- ROLAP cubes are virtual views over the DW
- queries Relational DB some view
materialization
5OLAP functionalities
- Drill-down
- Examine data at a lower level of detail
- Drill-up
- Examine data at a higher level of detail
- Slicing Dicing
- Combination of drill-down and drill-up on
different dimensions
6Drilling Slicing
7OLAP servers limitations
- Designed for static operation refresh
recompute - Centralized tools not possible to distribute
computations - Not well integrated with DM tools
8Distributed DW architecture
- Local OLAP servers and DW
- Global DW and OLAP server
- Distribution by geography or by business sector
- Advantage
- Fast local detection of significant events or new
trends - Particularly relevant if there is a leading site
9Virtual DW
- DW as views over the operational DB
- Reduced latency
- Additional query load
- Needs cleaning and consolidation on the fly
(performance?) - Needs appropriate policy for storing reusable
summaries
10Entreprise Information Portal
- Similar to virtual DW
- Provides Web-based consolidated views
11Dynamic DW
Data extraction
Transaction data
Archives
- DW updates at few minutes intervals
- Transaction stream staged for cleaning and
loading - Build snapshot profile cubes, incremental merge
- Computation of cubes pipilined at higher summary
levels (hours, days) - Caching
- Periodic archiving
12The Data Warehouse
- Stores summary data to support decision making
- Subject oriented
- Integrated
- Non-volatile
- Covers a large timespan
- Fast access
13Subject orientation
Sales system
Employee data
Payroll system
Customer data
Vendor data
Purchasing system
Operational data DW
14Integrated data set
Sales system
Payroll system
Customer data
Purchasing system
15Non-volatile
DBMS
DW
access
create
Customer data
Sales system
update
delete
load
insert
16Data issues
- Massive volume
- 5 terabytes already exists
- 10 terabytes expected soon
- Dispersed, often difficult to access
- Badly or not at all integrated
- Complex
- Not structured for business queries
17The Multidimensional Idea
Region
Sales
granularity
Year
Product category
Quarter
Product type
Product
3 dimensions
18The Cube
Region
Mobiles Fax Standard
Vaud Fribourg Neuchatel
Product type
1999
1998
1997
Year
Sales of standard telephones in 1997 in Vaud
region
19Star-join Schema
the dimension tables
the fact table
20Storing the Cube
- Relational database
- normalized
- Sales (Region, Year, Product
type, amount) - unnormalized
- Sales (Region, Product type, 1999, 1998, 1997)
21Multidimensional storage
- In multidimensional database
- Each normalized tuple can be represented as a
point in a virtual multidimensional space (the
number of dimensions is given by the number of
attributes) - As the database grows, the multidimensional space
is partitioned so that each part contains no more
than x tuples (typically, x tuples 1 physical
block) - A directory is kept to memorize where the parts
are stored - When a new tuple is inserted, the system computes
in which part it has to be stored according to
the current directory - If there is room, the new tuple is stored
- If not, a new part is added and the old part,
plus the new tuple, is split among the old and
new parts the directory is consequently uodated
22Data Integration
- Male,female
- 1, 0
- M,F
- cm
- inches
- yards
- Jan.22,1999
- 22/01/99
- 01/22/99
23Metadata
- Where the data comes from
- Any transformation that has been applied
- The current description
- Coding / reference system in use
- Versions, if any
- Quality indicator
- Security rules
- Update frequency, latency
Meta Data Coalition
24DW development lifecycle
- Quoted from W.H.Inmon Building the DW
- Implement warehouse
- Integrate data
- Test for bias
- Program against data
- Design DSS system
- Analyze results
- Understand requirements
25DW Monitoring
- Identify growth factors and rate
- Identify what data is being used
- Identify who is using the data, and when
- ? Avoid constant growth
- ? Plan for evolution (trends)
- Identify useful granularity levels
- Control response time (latency)
26Granularity
- High level of detail
- Details of every phone call by a customer for a
month - 200 records per month, 40000 bytes per month
- Low level of detail
- Summary of phone calls by a cusomer for a month
- 1 record per month, 200 bytes per month
- of calls, average duration, long distance
calls, of unsuccessful calls,
27Queries vs/ granularity
- High-level of detail
- Did Phil Rochat call Alex Smith in Geneva last
Friday? - Low level of detail
- How many long distance calls have been made, on
the average, by customers in Geneva last month? - of records to search of Geneva customers
- 200 times as much with high level of detail data
- 1 second ? 320 seconds
28Partitioning
- To improve performances flexibility without
giving up on the details - By date, business type, geography,
DW
? Data marts
29DW development strategy
30Themes for Data Warehousing
- Introduction to MIS M.A.
- A Framework for Data Warehouse Solutions S.S.
- Populating the Data Warehouse S.S.
- Demonstration Sybase B.I.
- Data Modeling and Metadata Design M.A.
- Keeping Spatial and Temporal Information S.S.
- Infrastructure M.A.
- Deployment Procedures M.A.
- Conclusion M.A. S.S.