Data Warehousing and Online Analytical Processing OLAP - PowerPoint PPT Presentation

1 / 47
About This Presentation
Title:

Data Warehousing and Online Analytical Processing OLAP

Description:

http://msdn2.microsoft.com/en-us/library/aa902672(SQL.80).aspx ... Microsoft battling Oracle and IBM (DB2) for dominance in the corporate database market ... – PowerPoint PPT presentation

Number of Views:2090
Avg rating:3.0/5.0
Slides: 48
Provided by: marki152
Category:

less

Transcript and Presenter's Notes

Title: Data Warehousing and Online Analytical Processing OLAP


1
Data Warehousing and Online Analytical Processing
(OLAP)
  • Data is good

2
BI Consumer Pool
Operations research / management science
Data Warehouse Design Considerations
http//msdn2.microsoft.com/en-us/library/aa902672(
SQL.80).aspx
3
Microsoft Examples
Excel, Access, Reporting Services, Data Analyzer,
Office, 3rd party tools
SQL Server, Access
Analysis Services (SQL Server
SQL Server, Access
SSIS
4
What is a Data Warehouse?
Data warehouses are databases designed for
analysis.
Data is
  • Subject oriented
  • Integrated
  • Time-Variant
  • Nonvolatile
  • Data enters DW from operational environment,
    transaction processing systems (TPS).

5
1. Subject Orientation
  • TPS organized around processes, functions
  • billing, banking, purchasing, payroll, etc.
  • DW organized around subjects
  • customers, vendors, encounters, sales
  • Transactions
  • TPS processes transactions
  • DW stores summary info related to transactions
  • TPS - keeps data needed for transaction
  • DW - keeps data needed for analysis

6
2. Integration
  • DW must integrate data from different apps
  • Create consistency across applications
  • naming conventions
  • measurement of variables (units)
  • data types
  • encoding
  • DSS analyst - use the data, not worry about
    credibility/consistency of data
  • often best person to find subtle data problems

7
3. Time Variancy
  • TPS accurate at moment of access
  • DW accurate as of some moment in time

Operational - current value data
Data warehouse - snapshot data
  • Time horizon 5-10 years
  • Key contains an element of time
  • Once snapshot made, data cannot be updated
  • Time horizon 60-90 days
  • Key may or may not have an element of time
  • Data can be updated

8
4. Nonvolatility
Change
Replace
Insert
Insert
Load
Access
Delete
Change
Operational
Data warehouse
9
Case for Data Warehousing
  • Can perform querying/reporting on servers/disks
    not associated with TPS - performance of both
    affected
  • Can use query/reporting optimization which would
    not be appropriate for TPS
  • Less technical expertise needed to create
    queries/reports
  • Facilitates reporting across different TPSs
  • Facilitates storage of historical data
  • Prevents analysts from requiring access to live
    TPS

Source Data Warehouse Information Center,
http//www.dwinfocenter.org/
10
What is a Data Mart
  • Departmental based DSS database
  • Same architectural foundation as DW
  • Data mart is reconcilable with DW
  • Users are more likely business analysts than IT
    technicians (some overlap likely)
  • farmers (know what they want)
  • explorers (not sure what they want)
  • more farmers than explorers

11
Why Data Marts?
  • DW gets large, competition among users
  • Data becomes harder to customize in DW
  • Analysis tools for dealing with LARGE datasets
    may not be as elegant as tools designed for
    smaller datasets
  • Dept can customize data as it flows into data
    mart - no need to serve entire corporation
  • Dept can choose amount of historical data to
    include in data mart.
  • Less concern about impact on DW of
    computationally intensive analyses
  • Dept can choose own data mart DB software
  • Dept can choose specialized analytical software

12
Multi-dimensional Data Modeling
  • Designed to facilitate analysis (not
    transactions)
  • Extremely common in data warehousing
  • A de-facto standard
  • Concept has been around for a long time
  • Intuitive concept of many dimensions or
    perspectives on business measures or facts
  • view sales facts (such as total sales) from
    customer, product and time perspective
  • Multidimensional conceptual view of data for user
  • Rich dimensional structuring with hierarchical
    referencing
  • Efficient specification of dimensions and
    calculations
  • Separation of structure and representation
  • Underlying database technology will vary -
    relational, proprietary data cubes
  • Hypercube an n-sided cube
  • User can slice, dice, rotate, browse through
    multidimensional cubes of data

13
Data Models Relational vs. Multi-dimensional
  • Transaction focused
  • Focus on many linked, normalized tables
  • One big complex data model
  • Very little redundancy
  • Analysis focused
  • Normalized fact table joined to a few highly
    non-normalized dimension tables
  • Many simple, intuitive data models
  • Lots of redundancy

14
Data (Hyper) cubes
2-d to 3-d cube
Rotating the cube
15
A 6-D Multidimensional Type Structure Example
Data generating events happen at the intersection
of specific instances of each dimension. Total
Sales(Store5, 65 Males, Feb, Actual, Shirts)
OLAP Solutions by Thomsen
16
Bringing Facts and Dimensions TogetherThe
Star-Schema
simple
few joins
symmetric
extendable
Dimensions are highly non-normalized (i.e. lots
of redundancy
Atomic data
From 1st ed of Kimballs Data Warehouse
Toolkit, notice no transaction in the Sales
Fact table
17
What is OLAP?
  • Software tool providing multi-dimensional view of
    data for business analysis
  • Example of Decision Support or Business
    Intelligence tool
  • Fast data access and fast computations
  • Interactive, flexible user interface
  • Slice, dice, drill-down
  • Excel Pivot Table and Pivot Chart are examples of
    simple OLAP tools

18
Defining OLAP - ANALYSIS
  • Business logic and statistical analysis relevant
    to end user
  • Should not require programming for everything
  • Analysis can be via vendors tools or link to
    generic analytical platform such as spreadsheet
  • Examples include time series analysis, cost
    allocation, currency translation, goal seeking,
    ad-hoc multi-dimensional structural changes (cube
    building), non-procedural modeling, exception
    alerting, and data mining.
  • Capabilities vary widely by vendor and market

19
Common cube operations
  • Pivot or Rotate change which dimensions and/or
    levels within dimensions are shown on row and
    column axes
  • Roll-up aggregate or combine cells within a
    dimension according to some mathematical
    operation
  • Uses a hierarchy definition for the dimension
  • Commonly this is summation or count
  • Drill down examine data a greater level of
    detail
  • Add another row or column header which is further
    down the concept hierarchy
  • Slice select a subset of a cube by constraining
    the value of some dimension
  • Ex Select cells for month January in time
    dimension
  • Dice select a subset of a cube by constraining
    two or more dimensions
  • Drill through access atomic level detail data

20
A Few Products
  • Microsoft Analysis Services
  • Part of SQL Server 2005
  • Create OLAP cubes, 10 data mining algorithms
  • Tableau
  • A new, pretty amazing pivoting tool
  • Cognos
  • Recently bought by IBM
  • Hyperion Essbase
  • Full suite of business intelligence developer and
    end user tools
  • Purchased by Oracle
  • Business Objects (Crystal)
  • Full suite of business intelligence developer and
    end user tools
  • Microstrategy
  • Oracle
  • Information Builders
  • Home of WebFocus, a web based OLAP tool
  • Pentaho
  • A new open source business intelligence project
  • http//www.pentaho.org/

21
Lets OLAP
  • Download the following from the Downloads section
    of course web
  • Lets look at Excel Pivot Tutorial
  • Can even publish Pivot Tables to Web

22
A Call Center ExampleTech Support for MS Office
  • Technology enabled business processes
  • Massive amount of data captured by ACD
  • Some data analysis done by ACD
  • Difficult operational questions related to
    staffing/scheduling impact on service level
  • Many call centers in many industries

23
Steps in Multi-dimensional ModelingCall Center
Example
  • Choose business process
  • Servicing technical support calls
  • Choose grain of process
  • Individual phone calls
  • Choose dimensions
  • Customer, application, problem, time
  • Choose measured facts
  • time on hold, service time of call

24
The Star SchemaA multi-dimensional data model
Non-normalized
Non-normalized
Non-normalized
Normalized
Non-normalized
25
One E-R vs. Many Stars
Analysis focus
Transaction focus
One E-R model for all the business process.
One star per modeled business process.
26
What is SQL Server?
  • Microsofts industrial strength database
    management system
  • Intended for client/server architecture
  • Includes numerous tools for creating, modifying
    and working with SQL Server databases
  • It is NOT a front end development tool (i.e. no
    Forms, Reports, VB code modules as in Access)
  • Microsoft battling Oracle and IBM (DB2) for
    dominance in the corporate database market
  • SQL Server using price, performance, ease of use
    and Microsoft ubiquity as marketing levers
  • Comes in various flavors such as Enterprise,
    Developer and Personal
  • You can get Personal through the MSDNAA program
    here at OU
  • Personal edition (which is installed in this lab)
    has server and client running on same machine

27
MS Access Two tools in one
(1) Tools for creating Forms, Reports, Macros,
Modules
(2) Database engine (Jet) .mdb file is container
for both data tables as well as front end
elements.
MS SQL Server
Web applications
SQL Server client side tools such as Enterprise
Manager, Query Analyzer, and DTS
Windows apps created with VB, C, C, Java
Application software such as Excel or Access
Crystal Reports, Business Objects, etc.
SQL Server Databases (1) data file(2)
transaction log file
Transact-SQL or T-SQL is SQL Servers SQL
dialect
28
Main Components of SQL Server 2005
(2) Important Client Tools
(1) On the Server Side
  • SQL Server database engine
  • runs as a service on computer acting as the
    database server
  • server listens for requests from clients
  • Analysis Services
  • OLAP and data mining
  • Reporting Services
  • Integration Services
  • Management Studio primary tool for managing SQL
    Server databases, security and other objects
  • Query Builders tools for building and analyzing
    SQL statements
  • Business Intelligence Development Studio - build
    various BI applications
  • Analysis Services projects
  • Integration Services projects
  • Report Services projects

29
Understanding MS Analysis Services (AS)
Manage cubes with Analysis Manager
  • Included in SQL Server 2000
  • Was called OLAP Services in SQL Server 7
  • Includes cube building capabilities
  • Draws data from the data warehouse and constructs
    MD cubes for analysis
  • Cube builder is heavily wizardized
  • Includes a few data mining models
  • Runs as a service in Windows
  • AS includes various administrative tools as well
    as a simple data browsing tool
  • Its NOT meant to be an OLAP front end pivoting
    tool

Cubes get data from warehouse and make data
available to reports
Facilitates DSS app dev
Various dbs
AS got another huge facelift in SQL Server 2005
and is gaining strategic importance for Microsoft
30
Introducing Analysis Services Projects
  • Tools for managing OLAP cubes and data mining
    models in MS SQL Server AS
  • Uses special version of MS Visual Studio
  • SQL Server Books Online
  • A one stop shop for creating analytical
    applications using Microsoft technology

31
Databases
Analysis Services Databases
32
Cube Editor
Star Schema
Our Call Center Data Warehouse
Dimensions for analysis BY
33
Dimension Hierarchies
  • A conceptual relationship between related items
    in a dimension
  • Highest level in hierarchy usually has least
    number of distinct values
  • Hierarchies facilitate roll up of lower levels
    into higher levels as well as drill down from
    higher levels to lower levels
  • A dimension can have multiple hierarchies
  • MS AS restricts to one hierarchy per dimension
  • Other dimension attributes can be designated as
    Member Properties
  • MS AS 2005 will support multiple hierarchies
    within a dimension

34
Time Dimension
  • Almost all MD models have a time or date
    dimension
  • Can build this in advance independent of the
    application
  • Include many, many attributes that describe
    dates/times such as various time periods in
    calendar time as well as fiscal time, holiday
    indicators, weekday names, etc.
  • Lets explore the smallish time dimension in
    FoodMart and a slightly richer one in grocer.mdb
  • Some tools (including AS) will let you create a
    time dimension from any date/time field
  • Probably better to create your own fully seeded
    date table for the time dimension
  • Ensures no time gaps due to no facts
  • Kimball recommends creating separate date table
    and time table
  • Far less space required

35
Designing Cubes - Facts
  • Often cubes based on underlying data modeled as a
    star-schema (or one of its variants)
  • Identifying fact table and associated measures
  • Fact table also contains keys that join to the
    dimension tables
  • Dimensions often shared across multiple cubes

36
Here are the three measured facts in our call
center cube.
Cubes know about hierarchical relationships
within dimensions. This is how the cube can roll
up calculations between levels of the hierarchy.
37
Designing Cubes - Dimensions
  • We are picking a set of dimension tables as well
    as the fields within each table making up the
    levels of each dimension.
  • Dimension design involves picking dimension table
    and fields representing hierarchical levels
    within the dimension

38
Reporting Options for Analysis Services Cubes MS
Excel
  • Excel can link to OLAP cubes
  • Can build Excel Pivot Tables based on cubes
  • Need MS Query to connect to a cube
  • Differences between pivoting on data in Excel vs.
    data in OLAP cube
  • Row limits
  • Hierarchies
  • Some limitations in pivot table functionality
    with OLAP cubes (e.g. no calculated fields or
    items)
  • Can create a local cube for offline pivoting
  • From Pivot Table toolbar Offline OLAP and then
    youre prompted to save a .cub file

39
Reference Library
40
BI Resources
  • The Data Warehousing Institute
    http//www.tdwi.org/
  • Kimball and Associates http//www.ralphkimball.com
    ./html/articles.html A Dimensional Modeling
    Manifesto Kimball, R. http//www.dbmsmag.com/970
    8d15.html
  • DSS Resources http//dssresources.com/
  • Data Warehousing Information Center
    http//www.dwinfocenter.org/
  • Intelligent Enterprise http//www.intelligententer
    prise.com/
  • DM Review http//dmreview.com/
  • KDNuggets http//www.kdnuggets.com/
  • IT Toolbox http//www.ittoolbox.com/
    http//businessintelligence.ittoolbox.com/http/
    /datawarehouse.ittoolbox.com/
  • OLAP Report http//www.olapreport.com/
  • Some free stuff (nice history of OLAP and
    commentary on industry trends
  • Other stuff costs
  • http//www.mosha.com/msolap/
  • Awesome set of resources from the lead developer
    on MS SQL Server Analysis Server team

41
Some Good Books and Articles
  • The Data Warehouse Toolkit Kimball, R.
  • Definitive, Microsoft SQL Server 2005 based 3rd
    edition now out
  • OLAP Solutions Thomsen, E.
  • Definitive, abstract and dense, good
  • MDX Solutions With Microsoft SQL Server Analysis
    Services 2005 and Hyperion Essbase by George
    Spofford
  • MDX is the SQL for cubes
  • Data Mining with SQL Server 2005 (Paperback) by
    ZhaoHui Tang (Author), Jamie MacLennan
  • Data Warehouse Design Solutions Adamson and
    Venerable
  • Multi-D DW designs from lots of different
    industries
  • Very practical, uses realistic situations to
    reinforce the concepts
  • Summers Rubber Company designs its data warehouse
  • Gorla, Narasimhaiah Krehbiel, Steve
  • Interfaces Mar/Apr 1999 29, 2 ABI/INFORM
    Global

42
More AS Tutorials and Resources
  • http//www.mosha.com/msolap/
  • This is the granddaddy of MS SQL Server Analysis
    Services resoures. Mosha Pasumansky is the MS
    development lead on AS engine.
  • Site is gold mine of information and links
    regarding AS and related software
  • He participates in microsoft.public.sqlserver.olap
  • Great Blog at http//www.sqljunkies.com/WebLog/mos
    ha/
  • Introduction to Analysis Services - by William
    Pearson (series of articles) http//www.databasejo
    urnal.com/article.php/1459531/
  • Very nice series of MS AS tutorials
  • Best practices for Business Intelligence using
    the Microsoft Data Warehousing Framework
  • A white paper from Microsoft

43
The various MS pivot tools
  • Pivot Table Service
  • Included as part of AS but also installed as part
    of MS Office
  • Client liason to AS
  • Can also create and communicate with local cubes
    that are not part of AS
  • Excel Pivot Table
  • A flexible crosstab type reporting tool which is
    part of Excel
  • Office Pivot Table List
  • Part of Office Web Components
  • Its an Active X control
  • Similar to Excel Pivot Table but with some
    differences in functionality

44
1. Select Business Process to Model
  • Some natural business activity supported by one
    or more information systems
  • Examples purchasing, orders, shipments,
    inventory, admissions, service calls,
    installations, bookings, production
  • We represent each activity as a table of facts
    with many related dimension tables
  • Designing and creating the fact and dimension
    tables is the critical part of creating a useful
    multidimensional data model the cube
  • This is a combination of business analysis and
    technical design

45
2. The Fact Table
  • Fact table contains
  • numeric measures describing some aspect of the
    business process
  • Foreign keys into dimension tables
  • Degenerate dimensions such as transaction
    numbers, invoice numbers, etc.
  • Each row is a measurement
  • Each row is at the same grain
  • Many useful facts are numeric and additive
  • Cubes tend to be quite sum and count centric
  • Well encounter additive, semi-additive and
    non-additive facts
  • Fact tables are usually deep (many rows) and
    narrow (not so many columns)
  • Three common fact table types
  • Transaction facts EX sales
  • Periodic snapshot facts EX daily inventory
    facts
  • Accumulating snapshot facts EX status of
    something as it moves through relatively well
    defined phases or stages such as a product as it
    moves from order to manufacture to shipment

46
2. Declare Grain of the Process
  • Specify exactly what a fact table row represents
  • An individual line item on a customers retail
    sales ticket as measured by a scanner device
  • A line item on a hospital bill
  • A single hotel reservation
  • Daily inventory snapshot for each product in a
    warehouse
  • Monthly snapshot of a 401k account
  • This is the lowest level of detail you are
    choosing to store in the multidimensional data
    model
  • Table is sparse in sense that rows representing
    the fact that nothing happened (i.e. values of 0)
    are usually NOT stored
  • EXAMPLE Lets check the grain and measures of
    the fact tables in FoodMart 2000.
  • Sales Facts
  • Inventory Facts

47
4. Choose dimensions that apply to each fact
table row
  • How do people describe the data that results from
    the fact table rows?
  • Dimension richness is what really makes the data
    warehouse
  • Lots of them filled with descriptive business
    information
  • The business perspectives BY which people
    describe their data
  • Time, customers, products, locations, suppliers,
    prospect type, problem, scenarios
  • Source of query constraints, groupings, report
    labels
  • Dimension tables are wide (many fields),
    denormalized (plenty of redundancy), with usually
    not too many rows
  • Use surrogate keys (meaningless long integers)
    instead of natural keys (actual meaningful
    primary keys from source systems such as customer
    s, invoice numbers, etc.)
  • Robust to changes in source systems
  • Can handle source system key reuse
  • Source keys may be complex strings (slow) with
    embedded info (hard to use)
  • So, is something a Measured fact or Dimension
    attribute?
  • Is it a measurement that takes on lots of values
    participates in calculations?
  • Or is it a relatively constant value that
    participates in constraints?
Write a Comment
User Comments (0)
About PowerShow.com