Title: Building a data Warehouse in a Manufacturing Environment Brian Laks Alexey Leontovich
1Building a data Warehouse in a Manufacturing
EnvironmentBrian LaksAlexey Leontovich
2What is a Data Warehouse?
- Transactional data which has been reorganized
specifically for fast data retrieval. - A system which answers Business Intelligence (BI)
questions. - A set of tools to query, present, and analyze
information.
3Advantages of Data Warehouse
- Allows to query multiple databases in a unified
way. - Queries do not impact transactional performance.
- Quickly answers Business Intelligence questions.
- Generates answers to questions you may not have
thought of.
4Steps to Creating Data Warehouse
5Gathering Requirements
- In the classic software engineering sense, this
is the most important step - Identify Stakeholders and Experts
- Identify Key Business Intelligence questions from
the top down.
6Example BI Questions from Manufacturing
- What types of defects are for a particular part
number? - What are the top 10 defects for a particular
Manufacturing Order? - Which operators are finding the defects?
- Which Site has the most defects
7Designing the Data Warehouse
- Fact Tables Grow quickly. Represent Key BI
information. - What Kind of work was done.
- What kinds of problems were found
- Dimensions Slowly growing, represent
information about a fact or several fact tables - When was the work done
- Who did the work
8Convert ER Diagram in OLTP System to OLAP Cubes
Operator
Test Instance
Defect
Repair
Tracked History
Unit
Manufacturing Order
Part Number
T I m e
T I m e
T I m e
Defects
Defects
Defects
P art Number
Manufacturing Order
Operator
9Data Warehouse Schema
- Star Schema
- Single fact with many dimensions
- Snowflake schema adds additional dimensions about
dimensions.
Employee Dimension -employee key -Employee
Name -shift -Pay Rate
Station Dimension -Station key -Factory -division
-Line -Operation
Defect Fact Table -Employee Key -Part key -Route
key -Time Key -Station Key -MO Key -Defect
Code -Repair Code
Route Dimension -Route key -Part Number -Route
Step -Operation
Time Dimension -Time key -Hour -Shift -Day -Week -
Quarter
MO Dimension -MO key -Part key -MO start
Date -Promised Date
Part Dimension -Part key -Part Number -Part
Revision
10Schema Guidelines
- KISS Simple means faster
- Dimensions can have relationships between them
- Fact tables need not be normalized, but caution
should be used. - Relationships can exist between Fact tables but
not recommended. - Should be designed with the BI questions in mind.
11Extracting Data from OLTP DB
- Extracting required data from Online
Transactional Processes may be slow. - System may currently be in production
- OLTP DBs are designed for fast transactions, not
necessarily fast queries. - Queries become quite complex as Fact and
Dimensional data is extracted simultaneously.
Many joins typically exist in these queries - Data is loaded into a temporary location.
12Incremental Load Approach
- Must identify what has been already extracted, to
prevent duplication. - Older data can be further aggregated.
- Must provide rollback function in case of
failure. - Can update records much closer to real time with
smaller and more frequent loads.
13Transformations
- Changing of data from one type to another.
- Processing some data to mean something different.
- Often applications are written which manipulate
the data and do the transformations. - Data can be tested and validated in the temporary
Location.
14Loading the OLAP DB
- Always need to check for duplication of records.
- Rollback method may be beneficial.
- Complex system of updating existing records may
be needed. - Additional aggregation after loading.
15Testing
- Data Accuracy.
- Data retrieval speed.
- Does data allow answering of BI questions.
16Reporting
- Reports from properly designed data warehouse
are - Accurate
- Timely
- Fast
- Understandable
- Flexible (slice and dice)
- These reports answer the BI questions
17Analytical Reporting / ETL Tools
- Common Reporting
- Excel
- Crystal Reports
- Enterprise Reporting and Analysis / ETL
- Cognos
- Business Objects
- ETL
- Data Junction
- Ascential DataStage
- Ab Initio
- Informatica
- Data Mirror
- SQL server, Oracle, DB2/400
18(No Transcript)
19(No Transcript)
20(No Transcript)