Dimensional Modeling Overview - PowerPoint PPT Presentation

1 / 56
About This Presentation
Title:

Dimensional Modeling Overview

Description:

Dimensional Modeling Overview * * * * You need to consider two factors ,user requirements and the realities of your data,in tandem to decide on these key design ... – PowerPoint PPT presentation

Number of Views:149
Avg rating:3.0/5.0
Slides: 57
Provided by: docserveF
Category:

less

Transcript and Presenter's Notes

Title: Dimensional Modeling Overview


1
Dimensional Modeling Overview
2
Agenda
  • Dimensional Modeling Overview
  • Dimensional Modeling Steps
  • Dimensional Modeling Framework
  • Retail Sales - Case Study
  • Dimensional Modeling Tips

3
(No Transcript)
4
From Requirement to Data Design
  • The requirements definition completely drives the
    Data design for the DW.
  • Data design consists of putting together the data
    structures.
  • A group of data elements form a data structure.
  • Logical data design includes determination of the
    various data elements that are needed and
    combination of the data elements into structures
    of data and also establishing relationships among
    the data structures.

5
Design Decisions
  • Choosing the process.
  • Selecting the subjects from the information
    packages for the first set of logical structures
    to be designed.
  • Choosing the grain.
  • Determining the level of detail for the data
    structures.
  • Identifying and conforming the dimensions.
  • Choosing the business dimensions (such as
    product, market, time, etc.) to be included in
    the first set of structures .
  • Choosing the facts.
  • Selecting the metrics or units of measurements
    (such as product sale units, dollar sales, dollar
    revenue, etc.) included in set of structures.
  • Choosing the duration of the database.
  • Determining how far back in time you should go
    for historical data.

6
Dimensional Modeling Overview
  • What is Dimensional Modeling
  • It is a logical design technique to structure the
    business dimensions and the metrics that are
    analyzed along these dimensions.
  • A logical design technique that seeks to present
    the data in a standard framework that is
    intuitive and allows for high performance access
  • It is inherently dimensional and adheres to a
    discipline that uses relational model with some
    important restrictions
  • The fundamental idea of dimensional modeling is
    that nearly every type of business data can be
    represented as a kind of cube of data
  • The model has also proved to provide high
    performance for queries and analysis

7
Dimensional Modeling Overview
  • Components of Dimensional Model
  • Fact Table
  • The fact table contains facts or measurements of
    the business
  • Dimension Table
  • The dimension tables contain textual attributes
    that describe the facts

8
Dimensional Modeling Overview Sample Report
Translation

Sales Rep Performance Report Sales Rep Performance Report Sales Rep Performance Report Sales Rep Performance Report Sales Rep Performance Report
Central Region        
      Jul-00 Aug-00
      (Dollars) (Dollars)
Chicago District Chicago District Chicago District 879 878
Adams Adams Adams 345 456
Brown Brown Brown 564 565
Frederickson Frederickson Frederickson 657 768
         
Minneapolis District Minneapolis District Minneapolis District 890 789
Andersen Smith Andersen Smith Andersen Smith 909 978
Central Region Total Central Region Total Central Region Total 4244 4434
Facts
Dimension
Report,row and column heading
Numeric report values
9
Dimensional Modeling Overview
  • Dimension Tables
  • Dimension Tables are the entry points into the
    data warehouse
  • Dimension tables are designed especially for
    selection and grouping under a common head
  • Determine contextual background for facts
  • Parameters for OLAP
  • Common Dimensions
  • Date
  • Product
  • Location/Region
  • Customers
  • The dimensional model to represent the
    information contained in the information package,
    the data structure must be represent with
    Metrics, business dimensions and attributes for
    each business dimension.

10
Dimensional Modeling Overview
  • Dimension Tables
  • Dimension Table Characteristics
  • Serve as report labels and query constraints
  • By words
  • Where clauses
  • Provide Descriptive Information
  • Minimal codes
  • Embedded meaning as attributes
  • Represent hierarchical relationships
  • Let see Product business dimension example,
  • When we want to analyze the fact by products.

11
Dimensional Modeling Overview
Product Dimension Table with Sample Rows
Product Key
Product Desc
Size
Brand Desc
Class
CHEERIOS
0001
CHEERIOS 10 OZ
Family
10 OZ
0002
CHEERIOS 24 OZ
CHEERIOS
Family
24 OZ
0003
LUCKY CHARMS 10 OZ
LUCKY CHARMS
Kids
10 OZ
12
Big Dimensions
Retail_Fact
time_Key
Store_Key
Customer_Key
Product_key
Sales_dollars
Units_sold
Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
13
Big Dimensions
Customer Dimension
Customer_key
Customer_ID
Customer_name
Customer_address
Date_of_birth
Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
Customer Demographics Dimension
Customer_demographics_key
Age_Band
Gender
Income_Band
Number_of_children
Marital_status
14
Dirty dimensions
  • A dirty dimension is the one in which data
    quality cannot be guaranteed
  • Data about the same customer can appear multiple
    times

Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
15
Hierarchies in Dimensions
  • Multiple Hierarchies
  • Dimension tables can represent multiple
    hierarchies roll-ups
  • For example ,Store Dimension could have
  • the following hierarchies
  • Physical Geography
  • Zip, City, County, State, Country
  • Sales Organization
  • District, Region, Zone
  • Distribution Roll-up
  • Distribution Center, Distribution Center Region

Store Dimension
Store_key
Store_description
Store_type
Zip
City
State
Sales_region
Sales_zone
Distribution Center
Distribution Center Region
16
Hierarchies in Dimensions
Dimension Tables can represent multiple
hierarchical roll-ups
17
Dimensional Modeling Overview
  • Fact Table
  • The fact table is where the numerical
    measurements of the business are stored
  • Facts
  • The detail information in a Fact tables
  • For Examples Sales Quantity, Unit Sales Price,
    Sales Amount etc.
  • Key performance indicators of the business
  • Numeric in Nature
  • Analyzed across the dimensions
  • Multi-part key
  • Foreign keys to dimension tables
  • Date is always a key

Sales Fact
DATE KEY
ITEM KEY
STORE KEY
PROMOTION KEY
POS TRXN
Sales Quantity
Unit Sales Price
Sales Amount
18
Dimensional Modeling Overview
  • So far we have formed fact table and dimension
    tables.
  • How should these tables be arranged in the
    dimensional model?
  • What are the relationships and how should we mark
    the relationships in the model?
  • The dimensional model should primarily facilitate
    queries and analyses. What would be the types of
    queries and analyses?
  • Before combining these tables in dimensional
    model. What are the requirements?
  • The model should provide the best data access.
  • The whole model must be query-centric.
  • It must be optimized for queries and analysis.
  • The model must show that the dimension tables
    interact with the fact table.
  • It should structured in such a way that every
    dimension can interact equally with the fact
    table.
  • The model should allow drilling down or rolling
    up along dimension hierarchies.

19
  • With this rqts., each of the dimension tables are
    directly relates to fact table in the middle.
  • Such an arrangement in the dimensional model
    looks like a star formation, with the fact table
    at the core of a star and the dimension tables
    along the spikes of the star.
  • The dimensional model is therefore called a STAR
    schema.
  • See figure

20
Star like Model
Dimension2
Dimension3
Fact
Dimension1
Dimensionn
21
STAR schema for AUTO Sales
PRODUCT
AUTO SALES
TIME
DEALER
PAYMENT METHOD
CUSTOMER DEMO GRAPHICS
22
Typical Star model
Product Dimension
Product_key
Description
Brand
Category
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Sales Fact
Date_key
Product_key
Store_key
Dollars_sold
Units_sold
Dollars_cost
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
23
Star model..
  • It consists of sales fact table in the middle of
    schema diagram. It have 3 dimension tables of
    Date, Product and Store.
  • The user will analyze the sales using dollar
    sold, unit sold and dollar cost.
  • From the STAR schema structure intuitively
    answers the questions for a given amount of
    dollars, what was the product sold? Who was the
    customer? Which store sold the product? When was
    the order placed?
  • Constraints and filters of queries are easily
    understood by looking at the star.
  • A common type of analysis is the drilling down
    the summary numbers to get at the details at the
    lower levels by filtering queries.

24
Snowflake Design
Low cardinality redundant attributes moved to
sub dimension tables
25
Snowflake Design
  • Issues
  • Only few tools optimized for snowflake schema
  • When the tool is not optimized for snowflake
    design
  • Presentation more complex
  • Browsing is slower
  • Problems with multiple joins

26
Snowflake Design
27
Dimensional Modeling Steps
  • Identify the Business Process
  • A major operational process that is supported by
    some kind of legacy system(s) from which data can
    be collected for the purpose of the data
    warehouse
  • Example orders, invoices, shipments, inventory,
    sales
  • Identify the Grain
  • The fundamental lowest level of data represented
    in a fact table for the business process
  • Example individual transactions, individual
    daily snapshots
  • Identify the Dimensions
  • Choose the dimensions that will apply to each
    fact table record
  • Identify the Facts
  • Choose the measured facts that will populate each
    fact table record

28
Dimensional Modeling Steps
Key Inputs
Dimensional Modeling Steps
Output
Business Requirements
  1. Identify the business Process
  2. Identify the Grain
  3. Identify the Dimensions
  4. Identify the Facts

Dimensional Model
Data Reality
Resist the temptation to model data by looking
copy books alone
29
Shared Dimensions Must Conform
Conformed Dimension
  • Option 1 Identical dimensions with the same
    keys, labels, definitions and values

DATE KEY
ITEM KEY
STORE KEY
PROMO KEY
Sales Fact
Item Key
Item Desc.
Brand Desc.
Category
..
Sales Schema
Inventory Schema
DATE KEY
ITEM KEY
STORE KEY
Inventory Fact
Item Key
Item Desc.
Brand Desc.
Category
..
30
Conformed Dimension
  • Option 2 Subset of base dimension

Sales Schema
DATE KEY
ITEM KEY
STORE KEY
PROMO KEY
Sales
Item Key
Item Desc.
Brand Desc.
Category
Desc.
..
DATE KEY
Day-of-week
Week Desc
Month Desc
Item key Item Desc Brand Desc
Category Desc 0001 Cheerios 10oz
Cheerios Cereal
Forecast Schema
Month Key
Brand Key
Estimate
Sales
Brand Key
Brand Desc.
Category
Desc.
..
Month KEY
Month Desc
Brand key Brand Desc Category Desc 1001
Cheerios Cereal
31
Slowly Changing Dimensions
  • Dimension attributes evolve over time
  • For example, customers change their names, move,
    have children, adjust their Incomes
  • For every dimension attribute, need to identify
    Changes strategy
  • May use combination of strategies within same
    dimension table

32
Slowly Changing Dimensions
Type 1 Overwrite the changed
attributes Original record Item Key Item
Desc Dept 12345 Sim City 3000 Educational
S/W Updated record Item Key Item
Desc Dept 12345 Sim City 3000 Strategy S/W
33
Slowly Changing Dimensions
Type 2 Add a New Dimension Record Original
record Item Key Item Desc Dept 12345 Sim City
3000 Educational S/W Additional record Item
Key Item Desc Dept 12345 Sim City
3000 Strategy S/W
34
Slowly Changing Dimensions
Type 3 Add a Prior Attribute Original
record Item Key Item Desc Dept 12345 Sim City
3000 Educational S/W Updated record Item
Key Item Desc Dept Prior Dept 12345 Sim
City 3000 Strategy S/W Educational S/W
35
Slowly Changing Dimensions
  • Data Warehouse Keys ie., STAR schema keys
  • All tables (facts and dimensions) should use Data
    Warehouse generated surrogate keys
  • It is possible that the customer no., of
    discontinued customers are reassigned to new
    customers. We will have a problem because the
    same customer no,. Could relate to the data for
    the newer customer and also to the data of the
    retried customer. Therefore do not use such keys
    as a primary keys for dimensional tables.
  • The surrogate keys are simply system generated
    sequence numbers.
  • Each row in a dimension table is identified by a
    unique value of an attribute designated as the
    primary key of the dimension.
  • Each dimension table is in 1M relationship with
    central fact table. So the primary key of each
    dimension table must be a foreign key in the fact
    table.

36
Additive Measures
  • The ability of measures to be added across all
    dimensions of the fact table.
  • Measures could be fully additive, semi additive
    or non additive
  • Fully Additive - The values of the attributes
    summed up by simple addition, Aggregation is a
    fully additive measures is done by simple
    addition. Sales Quantity, Revenue
  • Semi Additive - Account Balance, Inventory,
    number of customers (Measure of Intensity, head
    counts)
  • Non-Additive - Profit margin (Ratios and
    Percentages) i.e., Ratio or Percentages should
    not be added 13, 30, etc.

37
Factless Fact tables to track attendance of
students
Time Dimension
time_key
time attributes..
Student Dimension
Student_key
Student attributes..
Student Attendance Fact
time_Key
Student_Key
Course_Key
Faculty_Key
Facility_key
Course Dimension
Course_key
Course attributes..
Faculty Dimension
Faculty_key
Faculty attributes..
Facility Dimension
Facility_key
Facility attributes..
38
Factless Fact tables Coverage tables
Sales Fact (revisited)
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Product Dimension
Product_key
Description
Brand
Category
Sales Fact
Date_key
Product_key
Promotion_key
Store_key
Dollars_sold
Units_sold
Dollars_cost
Promotion Dimension
Promotion_key
Promotion_name
Discount
Event
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
39
Factless Fact tables Coverage tables
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Product Dimension
Product_key
Description
Brand
Category
Sales Fact
Date_key
Product_key
Promotion_key
Store_key
Promotion Dimension
Promotion_key
Promotion_name
Discount
Event
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
40
(No Transcript)
41
Dimensional Modeling Framework
Identify Subject Area, Grain
Conceptual Level
Identify Major Dimension Facts, Conform
Dimensions across Facts
Detail Facts with Measures
Level of detail
Logical Level
  • Detail Dimensions with Hierarchies Attributes
  • Slowly changing Dimensions Policies

Source-Data Model Mapping
Physical Level
Pre-calculations, Aggregates, Indexes, Data
Structures, Source-Physical Model Mapping
42
Dimensional Modeling Framework
  • STEP1 Choose the process
  • Chose a process or subject area from the list of
    subject areas identified
  • Examples of this could be Sales Analysis,
    Strategic Sourcing, Human Resources
  • STEP2 Choose the Grain
  • Choose the level of detail
  • Every data mart / warehouse should be based on
    the most granular (atomic) data that can possibly
    be collected and stored.

43
Dimensional Modeling Framework
  • STEP3 Identifying Dimensions Dimension
    Hierarchy

44
Dimensional Modeling Framework
  • STEP4 Choose the Measures
  • Identify all the measures for the fact table
  • STEP5 Conforming the dimensions
  • Common dimensions across the Facts/ data marts
    have to be exactly same or subset of the main
    dimension table

45
Dimensional Modeling Framework
  • STEP6 Adding Attributes to Dimension Tables
  • Enhance the depth of analysis
  • Examples Customer Age, Address, Profession,
    Product color, flavor, product size, packaging
    type etc.

46
Dimensional Modeling Framework
  • STEP7 Storing Pre-calculations in the Fact
    table
  • Calculated based on one or more base measures
  • STEP8 Choosing the Duration of the Database
  • Need for analyzing the data over a period of time

47
Dimensional Modeling Framework
  • STEP9 Track Slowly Changing Dimensions
  • Certain kinds of dimension attribute changes need
    to be handled differently in Data Warehouse
  • Type I Overwrite
  • Type II - History
  • Type III Add new column example -
    Organizational changes

48
(No Transcript)
49
Retail Sales - Case Study
  • Background
  • Chain consists of over 100 grocery stores in five
    states
  • Stores average 60,000 SKUs in departments such as
    frozen foods, dairy etc.
  • Bar codes are scanned directly into the cash
    registers PoS system
  • Items are promoted via coupons, temporary price
    reductions, ads and in-store promotions
  • Analytical Requirements
  • Need to know what is selling in store each day in
    order to evaluate product movement, as well as to
    see how sales are impacted by promotions
  • Need to understand the mix of items in a consumer
    market basket

50
Retail Sales - Case Study
  • Identify the Business Process
  • Identify the Grain
  • Identify the Dimensions
  • Identify the Measures
  • Sales
  • Transaction Item (Daily Sales)
  • Date, Location, Item, Promotion
  • Quantity, Price, Amount

51
Retail Sales - Case Study
Resultant Sales Schema
52
Retail Sales - Case Study
Time Dimension Time Dimension Time Dimension
Date Key Date Day of Week Day Number in Month Month Quarter Year Holiday Indicator
1 1/1/1999 Friday 1 January Q1 1999 Holiday
2 1/2/1999 Saturday 2 January Q1 1999 Non-Holiday
3 1/3/1999 Sunday 3 January Q1 1999 Non-Holiday
4 1/4/1999 Monday 4 January Q1 1999 Non-Holiday

Item Dimension Item Dimension Item Dimension
Itemkey Item Description SKU Number Dept Size Package Type Brand Category
1 Lasagna 6 OZ 90706287103 Grocery 6 OZ Box Cold Gourmet Frozen Foods
2 Beef Stew 6 OZ 16005393282 Grocery 6 OZ Box Cold Gourmet Frozen Foods
3 Extra Nougat 2 OZ 46817560065 Grocery 6 OZ Can Chewy Candy
53
Retail Sales - Case Study
Promotion Dimension Promotion Dimension Promotion Dimension
Promo Key Promo Name Price Reduction Ad Type Media Type Promo Begin Date End Date
1 Blue Ribbon Discounts Temporary Daily Paper Paper 2000 1/1/1999 1/15/1999
2 Red Carpet Closeout Markdown Sunday Paper Paper 1000 1/3/1999 1/10/1999
3 Ad Blitz None Paper and Radio Paper and Radio 7000 1/15/1999 1/30/1999

Sales Fact Sales Fact
Date Key Item Key Store Key Promo Key POS Trxn Sales Qty Unit Sales Price Sales Amt
1 1 1 15 763457893 1 4.59 4.59
1 2 1 1 763457893 2 0.89 1.78
1 5 11 19 763457894 1 2.56 2.56
2 13 5 8 763457923 1 0.33 0.33
2 5 11 12 763457998 1 1.29 1.29
54
(No Transcript)
55
Dimensional Modeling Tips
  • Carefully choose the labels to identify data
    marts, dimension, attributes and facts
  • An attribute can live in one and only one
    dimension, whereas a fact can be repeated in
    multiple fact tables
  • If a single dimension appears to reside in more
    than one places, several roles are probably being
    played. Name the roles uniquely and treat them as
    separate dimensions
  • A single field in the underlying source data can
    have one or more logical columns associated with
    it
  • E.g., A product attribute field may translate to
    product code, product short description, and
    product long description
  • Every fact should have a default aggregation rule
    (sum, min, max, latest, semi-additive, special
    algorithm, and not aggregatable)
  • This will serve as a requirements list for query
    and report writers tools evaluations

56
Thank You
  • References
  • Ralph Kimball
  • The Data Warehouse Toolkit
  • The Data Warehouse Lifecycle Toolkit
Write a Comment
User Comments (0)
About PowerShow.com