Zhangxi Lin - PowerPoint PPT Presentation

1 / 47
About This Presentation
Title:

Zhangxi Lin

Description:

Case - Adventure Works Cycles (AWC) data warehousing. Course Structure ... Microsoft Visual Studio .NET is the fundamental tool for application development ... – PowerPoint PPT presentation

Number of Views:89
Avg rating:3.0/5.0
Slides: 48
Provided by: zlinB
Category:

less

Transcript and Presenter's Notes

Title: Zhangxi Lin


1
ISQS 6339 Business IntelligenceData Warehousing
Approaches
  • Zhangxi Lin
  • Texas Tech University

2
Outline
  • Microsoft SQL Server for data warehousing
  • Data warehousing methodology
  • Case - Adventure Works Cycles (AWC) data
    warehousing
  • Course Structure

3
Microsoft SQL Server for Data Warehousing
4
Unified Dimensional Model (UDM)
  • A SQL Server 2005 technology
  • A UDM is a structure that sits over the top of a
    data mart and looks exactly like an OLAP system
    to the end user.
  • Advantages
  • No need for a data mart.
  • Can be built over one or more OLTP systems.
  • Mixed data mart and OLTP system data
  • Can include data from database from other vendors
    and XML-formatted data
  • Allows OLAP cubes to be built directly on top of
    transactional data
  • Low latency
  • Ease of creation and maintenance

5
Microsoft BI Toolset
  • Relational engine (RDBMS)
  • T-SQL
  • .NET Framework Command Language Runtime (CLR)
  • SQL Server Integration Services (SSIS) ETL
  • Data Transformation Pipeline (DTP)
  • Data Transformation Runtime (DTR)
  • SQL Server Analysis Service (SSAS) queries, ad
    hoc use, OLAP, data mining
  • Multi-Dimensional eXpressions (MDX) a scripting
    language for data retrieval from dimensional
    database
  • Dimension design
  • Cube design
  • Data mining
  • SQL Server Reporting Services (SSRS) ad hoc
    query, report building
  • Microsoft Visual Studio .NET is the fundamental
    tool for application development

6
Structure and Components of Business Intelligence
MS SQL Server 2005
SSMS
SSIS
SSAS
BIDS
SSRS
SAS EG
SAS EM
7
Understanding the Cube Designer Tabs
  • Cube Structure Use this tab to modify the
    architecture of a cube.
  • Dimension Usage Use this tab to define the
    relationships between dimensions and measure
    groups, and the granularity of each dimension
    within each measure group.
  • Calculations Use this tab to examine
    calculations that are defined for the cube, to
    define new calculations for the whole cube or for
    a subcube, to reorder existing calculations, and
    to debug calculations step by step by using
    breakpoints.
  • KPIs Use this tab to create, edit, and modify
    the Key Performance Indicators (KPIs) in a cube.
  • Actions Use this tab to create or modify
    drillthrough, reporting, and other actions for
    the selected cube..
  • Partitions Use this tab to create and manage the
    partitions for a cube. Partitions let you store
    sections of a cube in different locations with
    different properties, such as aggregation
    definitions.
  • Perspectives Use this tab to create and manage
    the perspectives in a cube. A perspective is a
    defined subset of a cube, and is used to reduce
    the perceived complexity of a cube to the
    business user.
  • Translations Use this tab to create and manage
    translated names for cube objects, such as month
    or product names.
  • Browser Use this tab to view data in the cube.

8
Data Warehousing Methodology
9
Dimensional Modeling Process
  • High level dimensional model design
  • Choosing business model
  • Declaring the grain
  • Choosing dimensions
  • Identifying the facts
  • Detailed dimensional model development
  • Dimensional model review and validation
  • IS
  • Core users
  • Business community
  • Final design iteration

10
Warehouse Development Approaches
  • Big bang approach
  • Incremental approach
  • Top-down incremental approach
  • Bottom-up incremental approach

11
Big Bang Approach
12
Top-Down Approach
  • Analyze requirements at the enterprise level
  • Develop conceptual information model
  • Identify and prioritize subject areas
  • Complete a model of selected subject area
  • Map to available data
  • Perform a source system analysis
  • Implement base technical architecture
  • Establish metadata, extraction, and load
    processes for the initial subject area
  • Create and populate the initial subject area
    data mart within the overall warehouse
  • framework

13
Bottom-Up Approach
  • Define the scope and coverage of the data
    warehouse and analyze the source systems
    within this scope
  • Define the initial increment based on the
    political pressure, assumed business benefit
    and data volume
  • Implement base technical architecture and
    establish metadata, extraction, and load
    processes as required by increment
  • Create and populate the initial subject
    areas within the overall warehouse framework

14
Incremental Approach to Warehouse Development
  • Multiple iterations
  • Shorter implementations
  • Validation of each phase

Increment 1
Strategy
Definition
Analysis
Design
Build
Iterative
Production
15
Data Warehousing Process Components
  • Methodology
  • Architecture
  • Extraction, Transformation, and Load (ETL)
  • Implementation
  • Operation and Support

16
Methodology
  • Ensures a successful data warehouse
  • Encourages incremental development
  • Provides a staged approach to an enterprisewide
    warehouse
  • Safe
  • Manageable
  • Proven
  • Recommended

17
Architecture
  • Provides the planning, structure, and
    standardization needed to ensure integration of
    multiple components, projects, and processes
    across time.
  • Establishes the framework, standards, and
    procedures for the data warehouse at an
    enterprise level.
  • The Data Warehousing Institute

18
Extraction, Transformation, and Load (ETL)
  • Effective data extract, transform and load (ETL)
    processes represent the number one success factor
    for your data warehouse project and can absorb up
    to 70 percent of the time spent on a typical data
    warehousing project.
  • DM Review, March 2001

19
Implementation
Data Warehouse Architecture
Ex., Incremental Implementation
Implementation
Increment 1
Increment 2
Increment n
20
Operation and Support
  • Data access and reporting
  • Refreshing warehouse data
  • Monitoring
  • Responding to change

21
Phases of theIncremental Approach
  • Strategy
  • Definition
  • Analysis
  • Design
  • Build
  • Production

Increment 1
22
Strategy Phase Deliverables
  • Business goals and objectives
  • Data warehouse purpose, objectives, and scope
  • Enterprise data warehouse logical model
  • Incremental milestones
  • Source systems data flows
  • Subject area gap analysis

23
Strategy Phase Deliverables
  • Data acquisition strategy
  • Data quality strategy
  • Metadata strategy
  • Data access environment
  • Training strategy

24
Data Warehouse Case - AWC
25
Adventure Works Cycles (AWC)
  • A fictitious multinational manufacturer and
    seller of bicycles and accessories
  • Based on Bothell, Washington, USA and has
    regional sales offices in several countries
  • http//www.msftdwtoolkit.com/

26
Basic Business Information
  • Product orders by category
  • Product Orders by Country/Region
  • Product Orders by Sales Channel
  • Customers by Sales Channel Snapshot

27
AWC Business Requirements - Interview summary
  • Interviewee Brian Welker, VP of Sales
  • Sales to resellers 37 million last year
  • 17 people report to him including 3 regional
    sales managers
  • Previous problem Hard to get information out of
    the companys system
  • Major analytic areas
  • Sales planning
  • Growth analysis
  • Customer analysis
  • Territory analysis
  • Sales performance
  • Basic sales reporting
  • Price lists
  • Special offers
  • Customer satisfaction
  • International support
  • Success criteria
  • Easy data access, Flexible reporting and
    analyzing, All data in one place
  • Whats missing? A lot No indication of
    business value

28
Business Processes
  • Purchase Orders
  • Distribution Center Deliveries
  • Distribution Center Inventory
  • Store Deliveries
  • Store Inventory
  • Store Sales

29
Analytic Themes
  • See the Excel file AW_Analytic_Themes_List.xls

30
AWCs Bus Matrix
31
Prioritization Grid
High
Customer Profitability
Orders
Orders Forecast
Product Profitability
Business Value / Impact
Call Tracking
Exchange Rates
Returns
Manufacturing Costs
Feasibility
Low
High
Low
32
Business Dimensional Lifecycle
Business Reqts definition
Technical Arch. Design
Product Selection Installation
Growth
Dimensional Modeling
Physical Design
ETL design Development
Deployment
Project Planning
BI Appl. Specification
BI Application Development
Maintenance
Project Management
33
Sub-activities before project requirement
Enterprise (horizontal) Business Requirements Def
inition
Enterprise (business process) Business Requireme
nts Definition
Requirements Prioritization Process
Initial Project Scope
Project Planning
Project management
34
Enterprise requirements definition process
Preparation
Conduct Business And IT Interview
Use Data Profiling To Research Data Sources as
Needed
Write up Interview Summaries with Analytic Themes
Identify Business Process from Analytic Themes
Build Initial Bus Matrix
Conduct Senior Management Prioritization Session
Write up the Overall Requirements
Definition Document
35
ISQS 6339 Data Management Business
IntelligenceCourse Structure Review

36
Outline
  • Objectives
  • Components of this course
  • How this course is conducted
  • How to do better in this course

37
Objectives
  • To learn the general principles of business
    intelligence
  • To be able to design and construct a data
    warehouse/data mart
  • To develop the data processing skills,
    specifically ETL system implementation.
  • To be able to perform OLAP tasks
  • To develop general data preparation skills for
    analytic tasks

38
Technical skills
  • Data warehousing with Microsoft SQL Server 2005
  • Data analysis with Enterprise Guide
  • Basic data processing by SAS programming

39
Components of the course
  • Principles of data warehousing
  • Dimensional data model
  • Information integration and flow design in the
    ETL (extraction, transformation, and loading)
    process 
  • Online analytical processing (OLAP)
  • Data reporting and query techniques 
  • Data preparation for analytics

40
How this course is conducted
  • Lectures in basic principles of data warehousing
    and data analysis
  • Students are expected to read the textbook to
    enhanced the knowledge
  • In-class exercises
  • Skill acquisition
  • Homework assignments
  • Review and practice what you are supposed to have
    learned

41
Sample Quiz for the Graduate Level BI Course
(fall 2007)
  • 1. True/False questions (1 point each)
  • (Five questions)
  • 2. Using SSRS to create a list report in four
    columns from any database table (5 points).
  • 3. Using SAS Enterprise Guide to accomplish the
    following tasks (10 points)
  • Open three Cotton data warehousing tables,
    location, variety and the fact table.
  • Create a query by linking all tables (hint to
    improve the efficiency of joining tables, you may
    import data into SAS format first)
  • By filtering the data to keep main variety
    information, location, a few interested columns
    that have numeric values.
  • Create a one-way frequency table from the
    filtered data showing whatever table that makes
    sense.

42
Evolving Focuses of BI Courses
  • ISQS 3358, Spring 2007
  • Focusing on data analysis with SAS Enterprise
    Guide, 5 projects based on real dataset from the
    industry
  • Teaching effectiveness evaluation 4.00
  • Main comments
  • Positive Learned from projects
  • Negative Not learned enough as data warehousing
    was missing. Some students complained wasting of
    time.
  • ISQS 6339, Fall 2007
  • Focusing on data warehousing with Microsoft SQL
    Server, plus SAS Programming and the applications
    of SAS Enterprise Guide
  • Teaching effectiveness evaluation 4.67
  • Main comments
  • Positive The contents are useful
  • Negative The instructor is not familiar enough
    with SQL Server
  • ISQS 3358, Spring 2008
  • Same focuses as ISQS 6339
  • A different textbook with different datasets
    coming with the book

43
Strategies to Do Better
  • You need the textbook.
  • Read it and do exercises with it
  • Do the in-class exercises seriously. At the
    beginning you may feel difficult to follow, but
    you will get better soon. Again, you need preview
    of the chapters to be covered.
  • Always follow the pace of the exercises. Your
    current results of the exercise could be the
    basis of the next one. If you missed one you must
    make it up.
  • Be focused on skill development. So you need to
    spend more time practicing. Make use of Lab 363
    in other time.
  • Contact the instructor for any questions

44
Grading Policy
  • Five quizzes out of six (150 points)
  • In-class exercises (100 points)
  • Project(s) (150 points)
  • Open-book final Exam (100 points)
  • The total is 500 points
  • Extra bonus credit up to 30 points based on the
    evaluation of the involvement in the class
    activities.

45
Bonus Credit
  • Extra exercises for those willing to spend more
    time in learning BI
  • Creative activities in BI learning to benefit the
    class
  • Actively involved in class discussion and helping
    others in learning

46
Preview/review - Chapter 4, 5 and 6
  • Main topics
  • Data warehousing process and design phase
  • Dimensional modeling (UMD and data mart)
  • How to define a data mart with SQL Server
    Management Studio (SSMS)
  • How to create a data mart with Business
    Intelligence Development Studio (BIDS) based on a
    cube template

47
Questions
  • What is the relationship between OLAP system and
    cubes in the context of data warehousing?
  • Can you tell what is main difference between UMD
    and data mart?
  • Do you know how to design a dimensional model?
  • Do you know how to construct a cube from data
    sources you defined?
  • Do you know how to create a data mart
  • Using SSMS (manually defining tables, import from
    files or databases)
  • Using BIDS based on a cube template
Write a Comment
User Comments (0)
About PowerShow.com