Introduction to Data Warehousing - PowerPoint PPT Presentation

1 / 43
About This Presentation
Title:

Introduction to Data Warehousing

Description:

Data Granularity In an operational system, data is usually kept at the lowest level of detail. In a DW, data is summarized at different levels. – PowerPoint PPT presentation

Number of Views:106
Avg rating:3.0/5.0
Slides: 44
Provided by: swa133
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Data Warehousing


1
Introduction to Data Warehousing
2
Why Data Warehouse?
3
Scenario 1
ABC Pvt. Ltd is a company with branches at
Mumbai, Delhi, Chennai and Bangalore. The Sales
Manager wants quarterly sales report. Each branch
has a separate operational system.
4
Scenario 1 ABC Pvt Ltd.
Mumbai
Delhi
Sales per item type per branch for first quarter.
Sales Manager
Chennai
Banglore
5
Solution 1ABC Pvt Ltd.
  • Extract sales information from each database.
  • Store the information in a common repository at a
    single site.

6
Solution 1ABC Pvt Ltd.
Mumbai
Data Warehouse
Report
Delhi
Query Analysis tools
Sales Manager
Chennai
Banglore
7
Scenario 2
One Stop Shopping Super Market has huge
operational database. Whenever Executives wants
some report the OLTP system becomes slow and data
entry operators have to wait for some time.
8
Scenario 2 One Stop Shopping
Data Entry Operator
Operational Database
Report
Management
Wait
Data Entry Operator
9
Solution 2
  • Extract data needed for analysis from operational
    database.
  • Store it in another system, the data warehouse.
  • Refresh warehouse at regular intervals so that it
    contains up to date information for analysis.
  • Warehouse will contain data with historical
    perspective.

10
Solution 2
Data Entry Operator
Operational database
Data Warehouse
Report
Extract data
Manager
Transaction
Data Entry Operator
11
Scenario 3
Cakes Cookies is a small, new company. The
chairman of this company wants his company to
grow. He needs information so that he can make
correct decisions.
12
Solution 3
  • Improve the quality of data before loading it
    into the warehouse.
  • Perform data cleaning and transformation before
    loading the data.
  • Use query analysis tools to support adhoc queries.

13
Solution 3
14
Summing up?
  • Why do you need a warehouse?
  • Operational systems could not provide strategic
    information
  • Executive and managers need such strategic
    information for
  • Making proper decision
  • Formulating business strategies
  • Establishing goals
  • Setting objectives
  • Monitoring results

15
Why operational data is not capable of producing
valuable information?
  • Data is spread across incompatible structures and
    systems
  • Not only that, improvements in technology had
    made computing faster, cheaper and available

16
FAILURES OF PAST DECISION-SUPPORT SYSTEMS
OLTP systems
17
Decision support systems
18
Operational and informational
19
What is Data Warehouse??Is it the only viable
solution
20
Business intelligence at DW
21
Functional definition of a DW
  • The data warehouse is an informational
    environment that
  • Provides an integrated and total view of the
    enterprise
  • Makes the enterprises current and historical
    information easily available for decision making
  • Makes decision-support transactions possible
    without hindering operational systems
  • Renders the organizations information
    consistent
  • Presents a flexible and interactive source of
    strategic information

22
Questions????
  • Describe five differences between operational
    systems and informational systems
  • A data warehouse in an environment, not a
    product. Discuss.

23
)A data warehouse is not a single software or
hardware product that provide the strategic
information. It is a computing environment where
the users can find strategic information, an
environment where the users are put directly in
touch with the data to make better decisions. It
is a user centric environment. The
characteristics of this computing environment
are, 1. An ideal environment for data analysis
and decision support. 2. Fluid, flexible and
interactive 3. It is hundreds percent
user-driven 4. Provide the ability to discover
answer to complex and unpredictable questions
24
Building Blocks of a Datawarehouse
25
Inmonss definition
A data warehouse is - subject-oriented, -
integrated, - time-variant, -
nonvolatile collection of data in support of
managements decision making process.
26
Subject-oriented
  • Data warehouse is organized around subjects such
    as sales, product, customer.
  • It focuses on modeling and analysis of data for
    decision makers.
  • Excludes data not useful in decision support
    process.

27
(No Transcript)
28
Integration
  • Data Warehouse is constructed by integrating
    multiple heterogeneous sources.
  • Data Preprocessing are applied to ensure
    consistency.

RDBMS
Data Warehouse
Legacy System
Flat File
Data Processing Data Transformation
29
(No Transcript)
30
Integration
  • In terms of data.
  • encoding structures.
  • Measurement of
  • attributes.
  • physical attribute.
  • of data
  • naming conventions.
  • Data type format

remarks
31
Time-variant
  • Provides information from historical perspective,
    e.g. past 5-10 years
  • Every key structure contains either implicitly or
    explicitly an element of time, i.e., every record
    has a timestamp.
  • The time-variant nature in a DW
  • Allows for analysis of the past
  • Relates information to the present
  • Enables forecasts for the future

32
Non-volatile
  • Data once recorded cannot be updated.
  • Data warehouse requires two operations in data
    accessing
  • Initial loading of data
  • Incremental loading of data

33
(No Transcript)
34
Data Granularity
  • In an operational system, data is usually kept at
    the lowest level of detail.
  • In a DW, data is summarized at different levels.
  • Three data levels in a banking data warehouse

Daily Detail Monthly Summary Quaterly Summary
Account Account Account
Activity Date Month Month
Amount No. of transactions No. of transactions
Deposit/ Withdraw Withdrawals Withdrawals
Deposits Deposits
Beginning Balance Beginning Balance
Ending Balance Ending Balance
35
Operational v/s Information System
Features Operational Information
Characteristics Operational processing Informational processing
Orientation Transaction Analysis
User Clerk,DBA,database professional Knowledge workers
Function Day to day operation Decision support
Data Content Current Historical, archived, derived
View Detailed, flat relational Summarized, multidimensional
DB design Application oriented Subject oriented
Unit of work Short ,simple transaction Complex query
Access Read/write Read only
36
Operational v/s Information System
Features Operational Information
Focus Data in Information out
No. of records accessed tens/ hundreds millions
Number of users thousands hundreds
DB size 100MB to GB 100 GB to TB
Usage Predictable, repetitive Ad hoc, random, heuristic
Response Time Sub-seconds Several seconds to minutes
Priority High performance,high availability High flexibility,end-user autonomy
Metric Transaction throughput Query throughput
37
Two approaches in designing a DW
Top-down approach Bottom-up approach
Enterprise view of data Narrow view of data
Inherently architected Inherently incremental
Single, central storage of data Faster implementation of manageable parts
Centralized rules and control Each datamart is developed independently
Takes longer time to build Comparatively less time than a DW
Higher risk to failure Less risk of failure
Needs higher level of cross-functional skills Unmanageable interfaces

38
Bottom Up Approach
39
Top Down Approach
40
A Practical Approach-Kimball
  1. Plan and Define requirements
  2. Create a surrounding architecture
  3. Conform and Standardize the data Content
  4. Implement Data Warehouse as series of super-mart
    one at a time.

41
An Incremental Approach
Glossary
Common Business Metrics
Common Business Rules
Common Business Dimensions
Common Logical Subject Area ERD
Individual Architected Data Marts
42
The Eventual Result
Architected Enterprise Foundation
43
Data Warehouse Holds multiple subject
areas Holds very detailed information Works to
integrate all data sources Does not necessarily
use a dimensional model but feeds dimensional
models. Data Mart Often holds only one subject
area- for example, Finance, or Sales May hold
more summarised data (although many hold full
detail) Concentrates on integrating information
from a given subject area or set of source
systems Is built focused on a dimensional model
using a star schema.
44
Data Warehouse verses data marts
Write a Comment
User Comments (0)
About PowerShow.com