Title: Analyzing and Defining Business Requirements for a Data Warehouse
1Chapter 5
Data Warehouse Fundamentals
- Analyzing and Defining Business
Requirements for a Data Warehouse
Paul K Chen
1
2Chapter 5- Objectives
- Learn what is the definition of the business
requirements - Understand the role of business dimensions
related to DW business requirements - Learn specifically the steps in defining and
recording DW business requirements - Review methods for gathering requirements (JAD,
Interviews and Sampling) - Discuss briefly architecture concepts impacted by
business requirements -
3Definition of The Business Requirements
- The definition of requirements is the users
statement of how he or she wants to do business,
and the information required to support his or
her new methods of operations.
4Definition of The Business Requirements
- The requirements can be broadly divided into two
areas - 1 Functional requirementswritten in user
terminology since it is user operations that are
being described. - 2 Non-functional requirements these are the
limitations and demands imposed upon the
computing solutions such as architectural plan,
data storage specifications and information
system performance expectations.
5Requirements As the Driving Force for Data
Warehousing
- Understand why business requirements are the
driving force - Discuss how requirements drive every development
phase - Specifically Learn how requirements influence
data design - Review the impact of requirements on architecture
- Note the special considerations for ETL and
metadata - Examine how requirements shape information
delivery
6Business Requirements As the Driving Force
Business Requirements
Maintenance
Planning Management
Deployment
Design Architecture Infrastructure Data
Acquisition Data Storage Information Delivery
Construction Architecture Infrastructure Data
Acquisition Data Storage Information Delivery
7Dimensional Nature of Business Data
The business data of sales units (fact) is
measured and analyzed in three dimensional.
Product
Geography
Time
8Examples of Business Facts and Dimensions
Manufacturing Company
Supermarket Chain
Supermarket Chain
Cust-ship-to
Time
Time
Promotion
Ship from
Sale Unit
Shipment
Ship Mode
Store
Product
Deal
Product
Airline Company
Insurance Business
Customer
Time
Agent
Time
Flight
Claim
Frequent Flyer Flights
Claims
Fare Class
Insured Party
Airport
Status
Status
Policy
9Defining and Recording Information Requirements
for a Data Warehouse
- Nine-Step Methodology includes the following
steps - Step 1 Choosing the process
- Step 2 Choosing the grain
- Step 3 Identifying and conforming the dimensions
- Step 4 Choosing the facts
- Step 5 Storing pre-calculations in the fact
table - Step 6 Rounding out the dimension tables
- Step 7 Choosing the duration of the database
- Step 8 Tracking slowly changing dimensions
- Step 9 Deciding the query priorities and the
query modes.
10Step 1 Choosing The Process (Subject Area)
- The process (function) refers to the subject
matter of a particular data mart. - First data mart built should be the one that is
most likely to be delivered on time, within
budget, and to answer the most commercially
important business questions.
11Subject Area
- Selecting the first subject area or areas to be
populated - Use the enterprise level data model in
selecting appropriate subject area(s) - Three Options
- -- Implement a single subject area (best
option) - -- Implement a subset of a subject area
- -- Implement a subset of several subject areas
(most common) - Determine how much data should be loaded and its
variety
12Step 2 Choosing The Grain
- Decide what a record of the fact table is to
represent. - Identify dimensions of the fact table. The grain
decision for the fact table also determines the
grain of each dimension table. - Also include time as a core dimension, which is
always present in star schemas. Due to disk space
constraint, data selected must be time relevant
in terms of trend, predictability, and
profitability for the enterprise.
13Step 3 Identifying And Conforming The Dimensions
- Dimensions set the context for asking questions
about the facts in the fact table. - If any dimension occurs in two data marts, they
must be exactly the same dimension, or one must
be a mathematical subset of the other. - A dimension used in more than one data mart is
referred to as being conformed (shared).
14Step 4 Choosing The Facts
- The grain of the fact table determines which
facts can be used in the data mart. - Facts should be numeric and additive.
- Unusable facts include
- non-numeric facts
- non-additive facts
- fact at different granularity from other facts in
table.
15Fact Criteria
- Weight the Fact attributes based upon the
following - criteria
- They exhibit measurable results to the Users and
Management. - They are visible within the business and through
management. - They are manageable.
16Subject Area
- Subject areas are collections of like data that
support - analysis of the major subjects in a business.
Election - criteria
- They consist of two or more attributes.
- They are essential to the successful operation of
the target system or business area to meet client
objectives. - They can be defined by governing business rules.
17Step 5 Storing Pre-Calculations In The Fact Table
- Once the facts have been selected each should be
re-examined to determine whether there are
opportunities to use pre-calculations.
18ADD DERIVED DATA
- Benefits
- Less space used
- Enhanced performance
-
- Breaking_lease
- Percentage_of_breaking_lease (lt 3_months)
- Percentage_of_breaking_lease (gt3 but lt 6
months) - Percentage_of_breaking_lease (gt6 but lt9
months) - Percentage_of_breaking_lease (gt9 but lt12
months) - Percentage_of_breaking_lease (gt 12 months)
19Add Summarization Schemes
- Simple summation
- Summation by group
- Aggregation
- Vertical summarization
20Simple Summation --Add Summarization Schema
Individual Daily sales Date Product Qty
Sales Jan 1 nuts 100 300 Jan 1
nuts 200 600 Jan 2 nuts 300
900 Jan 2 nuts 100 300 Jan 3
Nuts 50 150 Jan 3 Nuts 40
120
Daily Sales Summary Date Product Qty
Sales Jan 1 Nuts 300 900 Jan 2
Nuts 400 1,200 Jan 3 Nuts 90
20
21Summation By Group
- Group data attributes based on usage and
- stability.
- Group stable and slowly changing data all in one
table - Group unstable and frequently changing data all
in another table
22Aggregation
- Aggregation is used to create data marts.
- For instance, a group of users frequently
perform analysis comparing sales across
geographic regions, broken by product line. If a
data mart were created that stores the sales data
already aggregated to the desired level, the
users queries would be simpler.
23Aggregation
Add up amounts by day in sql SELECT date, sum
(amt) FROM SALE GROUP BY date
p
sale
Store
date
amt
ans
date
sum
p1 p2 p1 p2
1 1 2 1
1 2 4 3
c1 c2 c3 c1
6 4
1 2
Roll Up Drill Down
24Vertical Summarization
- Summarization building upon a single dimensional
- theme
-
- Monthly renters
- Total of all renters
- Total of new renters
- Total rental income
-
- Monthly sales
- Staff name
- Total sales
- Total houses sold
25Step 6 Rounding Out The Dimension Tables
- Text descriptions are added to the dimension
tables. - Text descriptions should be as intuitive and
understandable to the users as possible. - Usefulness of a data mart is determined by the
scope and nature of the attributes of the
dimension tables.
26Step 7 Choosing The Duration Of The Database
- Duration measures how far back in time the fact
table goes. For ex. Insurance Tax
Considerations. - Very large fact tables raise at least two very
significant data warehouse design issues. - Often difficult to source increasing old data.
- It is mandatory that the old versions of the
important dimensions be used, not the most
current versions, known as the Slowly Changing
Dimension problem.
27Step 8 Tracking Slowly Changing Dimensions
- Slowly changing dimension problem means that the
proper description of the old dimension data must
be used with old fact data. - Often, a generalized key must be assigned to
important dimensions in order to distinguish
multiple snapshots of dimensions over a period of
time.
28Step 9 Deciding The Query Priorities And The
Query Modes
- Most critical physical design issues affecting
the end-users perception includes - physical sort order of the fact table on disk
- presence of pre-stored summaries or aggregations.
- Additional physical design issues include
administration, backup, indexing performance, and
security.
29Criteria For Assessing The Dimensionality Of A
Data Warehouse
- Criteria proposed by Ralph Kimball to measure the
extent to which a system supports the dimensional
view of data warehousing. - Twenty criteria divided into three broad groups
architecture, administration, and expression.
30Architectural Criteria
- Architectural criteria describes way the entire
system - is organized.
- Explicit declaration
- Conformed dimensions and facts
- Dimensional integrity
- Open aggregate navigation
- Dimensional symmetry
- Dimensional scalability
- Sparsity tolerance
-
31Administration Criteria
- Administration criteria are considered to be
essential - to the smooth running of a dimensionally-oriente
d - data warehouse.
- Graceful modification
- Dimensional replication
- Changed dimension notification
- Surrogate key administration
- International consistency
-
-
32Expression Criteria
- Expression criteria are mostly analytic
capabilities - that are needed in real-life situations.
- Multiple-dimension hierarchies
- Ragged-dimension hierarchies
- Multiple valued dimensions
- Slowly changing dimensions
- Roles of a dimension
- Hot-swappable dimensions
- On-the-fly fact range dimension
- On-the-fly behaviour dimension
-
33Business Requirements (Use Automaker Sales as an
example)
- In order to get an idea of the data to be used by
the sales and - Inventory department, a facilitation session was
held with - 15 key end users and the IT data warehouse team.
The - following business questions were generated from
that - meeting
- What is the sales trend in quantity and dollar
amounts sold each Make, Model, Series and Color
for a specific dealer, for each
34Matching User Requirements to DW Data
Requirements (Develop Fact Table)
- Primary Key
- dealer_id
- month_year
- sales_area_id
- make
- model
- series
35Determine Dimensions Attributes
- Dimensions
- sales_area_dim
- sales_time_dim
- dealer_dim
- Attributes
- dealer_mms_sales_qty
- dealer_mms_sales_dollar_amt
- dealer_ytd_mms_sales_qty
- dealer_ytd_mms_sales_amt
- dealer_inventory_qty
36Determine Dimensions Attributes
37Collecting The Business Requirements via JAD
Sessions
- JAD (Joint Application Development) vs.
Traditional Way of - Gathering Requirements
- JAD sessions (also called facilitated session)
are used to gather information and feedback and
confirm the results of requirements gathering. - JAD sessions replace the traditional way of
conducting a series of interviews on a one-to-one
basis with the users. - Advantages Achieving consensus during the
session when - multiple sources of information exist, raising
and - addressing issues or assigning them for
resolution, and - immediately confirming information.
38JAD Session
- JAD sessions are used to scope the project. Each
session should last two to three day. They are
very focused and fast-paced. - JAD sessions can be very formal and follow strict
guidelines or be informal group sessions.
39JAD - Roles
- Whether they are formal or informal, there are
four - necessary roles to be filled
- Facilitator
- The Facilitator is the session leader. It is the
facilitators - responsibility to ensure that the objectives of
the sessions - are met.
- Scribes(s)
- Scribes are responsible for recording the
minutes of the - session and optionally constructing deliverables
using an - automated tool as the session progresses.
40JAD - Roles
- User
- The users provide knowledge specific to the
scope of the - project.
- Developers
- Developers are the team members who will be
building - the system.
41JAD Session
- The session is divided into three segments
- Introduction Welcoming remarks description of
the facilities such as rest room locations,
messages, reviewing the agenda and setting
expectations. - Conducting the session To confirm deliverables
set out in the session objectives. - Wrapping up the session By summarizing progress
towards the objectives reviewing the agenda for
the next one and obtaining feedback from the
participants.
42JAD Session
- Potential drawbacks
- The commitment of a large block of time for all
participants - Requirements collected could be less than
satisfactory due to unpredictability of the JAD
session or organizational culture not
sufficiently developed to enable the concerted
efforts required to be productive in a JAD
setting.
43Five Steps in Interview Preparation
- Reading background material
-
- Establishing interview objectives
- Deciding when to interview
- Preparing the interviewee
- Deciding on question type and structure
-
44Two Types of Questions Open-End Questions vs.
Closed Questions
- Open-ended interview questions
- Open describes the interviewees options for
responding. - They are open.
- Advantages
- Putting the interviewee at ease
- Allowing more spontaneity
- Disadvantages
- Possibly losing control of the interview
- May not get the types of answers you want
-
45Two Types of Questions Open-End Questions vs.
Closed Questions
- Closed interview questions
- Such as How many subordinates do you have?
- Benefits
- Getting to relevant data
- Keeping control over the interview
- Drawbacks
- Failing to obtain rich detail
- Intimidating the interviewee
46Three Basic Ways of Structuring Interviews
- Pyramid Structure Starting from closed
questions, then gradually expand into open
territory. - Funnel Structure The reverse of pyramid
structure approach. - Diamond-Shaped A combination of the two above
structures.
47The Needs for Sampling
- Containing costs
- Speeding up the data gathering
- Improving effectiveness
- Reducing bias
48Sampling Design
- Four steps
- Determine the data to be collected or described
- Determine the population to be sampled
- Choose the type of sample
- Decide on the sample size
49Kinds of Information Sought in Investigation
- Type of hard data (other than interviewing
and observation) - Quantitative Data - Reports for decision making
- Performance reports
- Records
- Data capture forms
50Kinds of Information Sought in Investigation
- Qualitative Data
- Memos
- Signs in bulletin boards and in work areas
- Corporate websites
- Manuals
- Policy handbooks
51The Architectural Plan (Non-Functional
Requirements)
Relevant Architecture Concepts
Impacted by Requirements
- Client / Server Architecture
- Data Warehouse Parallel Database Technology
- RAID Technology
52Client / Server Architecture
Machine Configuration Example
53Client / Server architecture provides flexibility
to support different combinations of host machine
configurations.
3 Tier hosting configuration for Typical Queries
(or any process that makes a high volume of DB
calls) suffers in performance due to Network
Messaging overhead.
542 Tier hosting configuration supports the
performance requirements of Typical Queries
by eliminating 99.8 of Network Messaging
overhead.
55Data Warehouse Parallel Database Technologies
- Shared memory architecture (SMP)
- All the servers share all the data
- Shared nothing architecture (MPP)
- Each server has its own partition of data
56RAID Technology
- RAID 0 Sector interleave, no error checking (no
redundancy) - RAID 1 Mirroring (duplicate copy)
- RAID 2 Bit interleave with error correction
codes on multiple drives - RAID 3 Bit interleave with error correction on
single drive - RAID 4 Sector interleave with dedicated parity
drive - RAID 5 Sector interleave, parity stored on all
drives
57Tugas
- Jelaskan apa yang dimaksud dengan bussines
requirement? - Apa yang Anda lakukan dalam bussiness requirement?