Analyzing and Defining Business Requirements for a Data Warehouse - PowerPoint PPT Presentation

About This Presentation
Title:

Analyzing and Defining Business Requirements for a Data Warehouse

Description:

Transparencies for Chapter 1 of textbook Database Systems: A Practical Approach to Design, Implementation and Management – PowerPoint PPT presentation

Number of Views:948
Avg rating:3.0/5.0
Slides: 58
Provided by: ThomasCon63
Category:

less

Transcript and Presenter's Notes

Title: Analyzing and Defining Business Requirements for a Data Warehouse


1
Chapter 5
Data Warehouse Fundamentals
  • Analyzing and Defining Business
    Requirements for a Data Warehouse

Paul K Chen
1
2
Chapter 5- Objectives
  • Learn what is the definition of the business
    requirements
  • Understand the role of business dimensions
    related to DW business requirements
  • Learn specifically the steps in defining and
    recording DW business requirements
  • Review methods for gathering requirements (JAD,
    Interviews and Sampling)
  • Discuss briefly architecture concepts impacted by
    business requirements

3
Definition of The Business Requirements
  • The definition of requirements is the users
    statement of how he or she wants to do business,
    and the information required to support his or
    her new methods of operations.

4
Definition of The Business Requirements
  • The requirements can be broadly divided into two
    areas
  • 1 Functional requirementswritten in user
    terminology since it is user operations that are
    being described.
  • 2 Non-functional requirements these are the
    limitations and demands imposed upon the
    computing solutions such as architectural plan,
    data storage specifications and information
    system performance expectations.

5
Requirements As the Driving Force for Data
Warehousing
  • Understand why business requirements are the
    driving force
  • Discuss how requirements drive every development
    phase
  • Specifically Learn how requirements influence
    data design
  • Review the impact of requirements on architecture
  • Note the special considerations for ETL and
    metadata
  • Examine how requirements shape information
    delivery

6
Business Requirements As the Driving Force
Business Requirements
Maintenance
Planning Management
Deployment
Design Architecture Infrastructure Data
Acquisition Data Storage Information Delivery
Construction Architecture Infrastructure Data
Acquisition Data Storage Information Delivery
7
Dimensional Nature of Business Data
The business data of sales units (fact) is
measured and analyzed in three dimensional.
Product
Geography
Time
8
Examples of Business Facts and Dimensions
Manufacturing Company
Supermarket Chain
Supermarket Chain
Cust-ship-to
Time
Time
Promotion
Ship from
Sale Unit
Shipment
Ship Mode
Store
Product
Deal
Product
Airline Company
Insurance Business
Customer
Time
Agent
Time
Flight
Claim
Frequent Flyer Flights
Claims
Fare Class
Insured Party
Airport
Status
Status
Policy
9
Defining and Recording Information Requirements
for a Data Warehouse
  • Nine-Step Methodology includes the following
    steps
  • Step 1 Choosing the process
  • Step 2 Choosing the grain
  • Step 3 Identifying and conforming the dimensions
  • Step 4 Choosing the facts
  • Step 5 Storing pre-calculations in the fact
    table
  • Step 6 Rounding out the dimension tables
  • Step 7 Choosing the duration of the database
  • Step 8 Tracking slowly changing dimensions
  • Step 9 Deciding the query priorities and the
    query modes.

10
Step 1 Choosing The Process (Subject Area)
  • The process (function) refers to the subject
    matter of a particular data mart.
  • First data mart built should be the one that is
    most likely to be delivered on time, within
    budget, and to answer the most commercially
    important business questions.

11
Subject Area
  • Selecting the first subject area or areas to be
    populated
  • Use the enterprise level data model in
    selecting appropriate subject area(s)
  • Three Options
  • -- Implement a single subject area (best
    option)
  • -- Implement a subset of a subject area
  • -- Implement a subset of several subject areas
    (most common)
  • Determine how much data should be loaded and its
    variety

12
Step 2 Choosing The Grain
  • Decide what a record of the fact table is to
    represent.
  • Identify dimensions of the fact table. The grain
    decision for the fact table also determines the
    grain of each dimension table.
  • Also include time as a core dimension, which is
    always present in star schemas. Due to disk space
    constraint, data selected must be time relevant
    in terms of trend, predictability, and
    profitability for the enterprise.

13
Step 3 Identifying And Conforming The Dimensions
  • Dimensions set the context for asking questions
    about the facts in the fact table.
  • If any dimension occurs in two data marts, they
    must be exactly the same dimension, or one must
    be a mathematical subset of the other.
  • A dimension used in more than one data mart is
    referred to as being conformed (shared).

14
Step 4 Choosing The Facts
  • The grain of the fact table determines which
    facts can be used in the data mart.
  • Facts should be numeric and additive.
  • Unusable facts include
  • non-numeric facts
  • non-additive facts
  • fact at different granularity from other facts in
    table.

15
Fact Criteria
  • Weight the Fact attributes based upon the
    following
  • criteria
  • They exhibit measurable results to the Users and
    Management.
  • They are visible within the business and through
    management.
  • They are manageable.

16
Subject Area
  • Subject areas are collections of like data that
    support
  • analysis of the major subjects in a business.
    Election
  • criteria
  • They consist of two or more attributes.
  • They are essential to the successful operation of
    the target system or business area to meet client
    objectives.
  • They can be defined by governing business rules.

17
Step 5 Storing Pre-Calculations In The Fact Table
  • Once the facts have been selected each should be
    re-examined to determine whether there are
    opportunities to use pre-calculations.

18
ADD DERIVED DATA
  • Benefits
  • Less space used
  • Enhanced performance
  • Breaking_lease
  • Percentage_of_breaking_lease (lt 3_months)
  • Percentage_of_breaking_lease (gt3 but lt 6
    months)
  • Percentage_of_breaking_lease (gt6 but lt9
    months)
  • Percentage_of_breaking_lease (gt9 but lt12
    months)
  • Percentage_of_breaking_lease (gt 12 months)

19
Add Summarization Schemes
  • Simple summation
  • Summation by group
  • Aggregation
  • Vertical summarization

20
Simple Summation --Add Summarization Schema
Individual Daily sales Date Product Qty
Sales Jan 1 nuts 100 300 Jan 1
nuts 200 600 Jan 2 nuts 300
900 Jan 2 nuts 100 300 Jan 3
Nuts 50 150 Jan 3 Nuts 40
120
Daily Sales Summary Date Product Qty
Sales Jan 1 Nuts 300 900 Jan 2
Nuts 400 1,200 Jan 3 Nuts 90
20
21
Summation By Group
  • Group data attributes based on usage and
  • stability.
  • Group stable and slowly changing data all in one
    table
  • Group unstable and frequently changing data all
    in another table

22
Aggregation
  • Aggregation is used to create data marts.
  • For instance, a group of users frequently
    perform analysis comparing sales across
    geographic regions, broken by product line. If a
    data mart were created that stores the sales data
    already aggregated to the desired level, the
    users queries would be simpler.

23
Aggregation
Add up amounts by day in sql SELECT date, sum
(amt) FROM SALE GROUP BY date
p
sale
Store
date
amt
ans
date
sum
p1 p2 p1 p2
1 1 2 1
1 2 4 3
c1 c2 c3 c1
6 4
1 2
Roll Up Drill Down
24
Vertical Summarization
  • Summarization building upon a single dimensional
  • theme
  • Monthly renters
  • Total of all renters
  • Total of new renters
  • Total rental income
  • Monthly sales
  • Staff name
  • Total sales
  • Total houses sold

25
Step 6 Rounding Out The Dimension Tables
  • Text descriptions are added to the dimension
    tables.
  • Text descriptions should be as intuitive and
    understandable to the users as possible.
  • Usefulness of a data mart is determined by the
    scope and nature of the attributes of the
    dimension tables.

26
Step 7 Choosing The Duration Of The Database
  • Duration measures how far back in time the fact
    table goes. For ex. Insurance Tax
    Considerations.
  • Very large fact tables raise at least two very
    significant data warehouse design issues.
  • Often difficult to source increasing old data.
  • It is mandatory that the old versions of the
    important dimensions be used, not the most
    current versions, known as the Slowly Changing
    Dimension problem.

27
Step 8 Tracking Slowly Changing Dimensions
  • Slowly changing dimension problem means that the
    proper description of the old dimension data must
    be used with old fact data.
  • Often, a generalized key must be assigned to
    important dimensions in order to distinguish
    multiple snapshots of dimensions over a period of
    time.

28
Step 9 Deciding The Query Priorities And The
Query Modes
  • Most critical physical design issues affecting
    the end-users perception includes
  • physical sort order of the fact table on disk
  • presence of pre-stored summaries or aggregations.
  • Additional physical design issues include
    administration, backup, indexing performance, and
    security.

29
Criteria For Assessing The Dimensionality Of A
Data Warehouse
  • Criteria proposed by Ralph Kimball to measure the
    extent to which a system supports the dimensional
    view of data warehousing.
  • Twenty criteria divided into three broad groups
    architecture, administration, and expression.

30
Architectural Criteria
  • Architectural criteria describes way the entire
    system
  • is organized.
  • Explicit declaration
  • Conformed dimensions and facts
  • Dimensional integrity
  • Open aggregate navigation
  • Dimensional symmetry
  • Dimensional scalability
  • Sparsity tolerance

31
Administration Criteria
  • Administration criteria are considered to be
    essential
  • to the smooth running of a dimensionally-oriente
    d
  • data warehouse.
  • Graceful modification
  • Dimensional replication
  • Changed dimension notification
  • Surrogate key administration
  • International consistency

32
Expression Criteria
  • Expression criteria are mostly analytic
    capabilities
  • that are needed in real-life situations.
  • Multiple-dimension hierarchies
  • Ragged-dimension hierarchies
  • Multiple valued dimensions
  • Slowly changing dimensions
  • Roles of a dimension
  • Hot-swappable dimensions
  • On-the-fly fact range dimension
  • On-the-fly behaviour dimension

33
Business Requirements (Use Automaker Sales as an
example)
  • In order to get an idea of the data to be used by
    the sales and
  • Inventory department, a facilitation session was
    held with
  • 15 key end users and the IT data warehouse team.
    The
  • following business questions were generated from
    that
  • meeting
  • What is the sales trend in quantity and dollar
    amounts sold each Make, Model, Series and Color
    for a specific dealer, for each

34
Matching User Requirements to DW Data
Requirements (Develop Fact Table)
  • Primary Key
  • dealer_id
  • month_year
  • sales_area_id
  • make
  • model
  • series

35
Determine Dimensions Attributes
  • Dimensions
  • sales_area_dim
  • sales_time_dim
  • dealer_dim
  • Attributes
  • dealer_mms_sales_qty
  • dealer_mms_sales_dollar_amt
  • dealer_ytd_mms_sales_qty
  • dealer_ytd_mms_sales_amt
  • dealer_inventory_qty

36
Determine Dimensions Attributes
37
Collecting The Business Requirements via JAD
Sessions
  • JAD (Joint Application Development) vs.
    Traditional Way of
  • Gathering Requirements
  • JAD sessions (also called facilitated session)
    are used to gather information and feedback and
    confirm the results of requirements gathering.
  • JAD sessions replace the traditional way of
    conducting a series of interviews on a one-to-one
    basis with the users.
  • Advantages Achieving consensus during the
    session when
  • multiple sources of information exist, raising
    and
  • addressing issues or assigning them for
    resolution, and
  • immediately confirming information.

38
JAD Session
  • JAD sessions are used to scope the project. Each
    session should last two to three day. They are
    very focused and fast-paced.
  • JAD sessions can be very formal and follow strict
    guidelines or be informal group sessions.

39
JAD - Roles
  • Whether they are formal or informal, there are
    four
  • necessary roles to be filled
  • Facilitator
  • The Facilitator is the session leader. It is the
    facilitators
  • responsibility to ensure that the objectives of
    the sessions
  • are met.
  • Scribes(s)
  • Scribes are responsible for recording the
    minutes of the
  • session and optionally constructing deliverables
    using an
  • automated tool as the session progresses.

40
JAD - Roles
  • User
  • The users provide knowledge specific to the
    scope of the
  • project.
  • Developers
  • Developers are the team members who will be
    building
  • the system.

41
JAD Session
  • The session is divided into three segments
  • Introduction Welcoming remarks description of
    the facilities such as rest room locations,
    messages, reviewing the agenda and setting
    expectations.
  • Conducting the session To confirm deliverables
    set out in the session objectives.
  • Wrapping up the session By summarizing progress
    towards the objectives reviewing the agenda for
    the next one and obtaining feedback from the
    participants.

42
JAD Session
  • Potential drawbacks
  • The commitment of a large block of time for all
    participants
  • Requirements collected could be less than
    satisfactory due to unpredictability of the JAD
    session or organizational culture not
    sufficiently developed to enable the concerted
    efforts required to be productive in a JAD
    setting.

43
Five Steps in Interview Preparation
  • Reading background material
  • Establishing interview objectives
  • Deciding when to interview
  • Preparing the interviewee
  • Deciding on question type and structure

44
Two Types of Questions Open-End Questions vs.
Closed Questions
  • Open-ended interview questions
  • Open describes the interviewees options for
    responding.
  • They are open.
  • Advantages
  • Putting the interviewee at ease
  • Allowing more spontaneity
  • Disadvantages
  • Possibly losing control of the interview
  • May not get the types of answers you want

45
Two Types of Questions Open-End Questions vs.
Closed Questions
  • Closed interview questions
  • Such as How many subordinates do you have?
  • Benefits
  • Getting to relevant data
  • Keeping control over the interview
  • Drawbacks
  • Failing to obtain rich detail
  • Intimidating the interviewee

46
Three Basic Ways of Structuring Interviews
  • Pyramid Structure Starting from closed
    questions, then gradually expand into open
    territory.
  • Funnel Structure The reverse of pyramid
    structure approach.
  • Diamond-Shaped A combination of the two above
    structures.

47
The Needs for Sampling
  • Containing costs
  • Speeding up the data gathering
  • Improving effectiveness
  • Reducing bias

48
Sampling Design
  • Four steps
  • Determine the data to be collected or described
  • Determine the population to be sampled
  • Choose the type of sample
  • Decide on the sample size

49
Kinds of Information Sought in Investigation
  • Type of hard data (other than interviewing
    and observation) - Quantitative Data
  • Reports for decision making
  • Performance reports
  • Records
  • Data capture forms

50
Kinds of Information Sought in Investigation
  • Qualitative Data
  • Memos
  • Signs in bulletin boards and in work areas
  • Corporate websites
  • Manuals
  • Policy handbooks

51
The Architectural Plan (Non-Functional
Requirements)
Relevant Architecture Concepts
Impacted by Requirements
  • Client / Server Architecture
  • Data Warehouse Parallel Database Technology
  • RAID Technology

52
Client / Server Architecture
Machine Configuration Example
53
Client / Server architecture provides flexibility
to support different combinations of host machine
configurations.
3 Tier hosting configuration for Typical Queries
(or any process that makes a high volume of DB
calls) suffers in performance due to Network
Messaging overhead.
54
2 Tier hosting configuration supports the
performance requirements of Typical Queries
by eliminating 99.8 of Network Messaging
overhead.
55
Data Warehouse Parallel Database Technologies
  • Shared memory architecture (SMP)
  • All the servers share all the data
  • Shared nothing architecture (MPP)
  • Each server has its own partition of data

56
RAID Technology
  • RAID 0 Sector interleave, no error checking (no
    redundancy)
  • RAID 1 Mirroring (duplicate copy)
  • RAID 2 Bit interleave with error correction
    codes on multiple drives
  • RAID 3 Bit interleave with error correction on
    single drive
  • RAID 4 Sector interleave with dedicated parity
    drive
  • RAID 5 Sector interleave, parity stored on all
    drives

57
Tugas
  • Jelaskan apa yang dimaksud dengan bussines
    requirement?
  • Apa yang Anda lakukan dalam bussiness requirement?
Write a Comment
User Comments (0)
About PowerShow.com