Chapter%208%20(Part%20C) - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter%208%20(Part%20C)

Description:

Core requirement of building a realistic' business model. ... Red Brick Systems was first to implement many essential OLAP functions (as ... – PowerPoint PPT presentation

Number of Views:62
Avg rating:3.0/5.0
Slides: 101
Provided by: thomas850
Category:

less

Transcript and Presenter's Notes

Title: Chapter%208%20(Part%20C)


1
Chapter 8 (Part C)
Database Modeling and Design
  • OLAP IN THE DATA WAREHOUSE

Paul Chen
2
Summary of Topics
  • 1. OLAP Definition, Key features and Benefits
  • 2. How OLAP differs from OATP?
  • 3. Multidimensional data ? What? Why? How To
    Use?
  • 4. OLAP Server Query, Features and
    Applications.
  • 5. Category of OLAP tools -Multi-Dimensional
    OLAP (MOLAP), Relational OLAP (ROLAP), and
    Managed Query Environment (MQE).
  • 6. OLAP extensions to SQL.
  • 7. The Microsoft Data Warehousing Framework.

3
Data Warehousing and End-User Access Tools
  • Accompanying growth in data warehouses is
    increasing demands for more powerful access tools
    providing advanced analytical capabilities.
  • Key developments include
  • Online analytical processing (OLAP)
  • SQL extensions for complex data analysis
  • Data mining tools.

4
Limitations of Other Analysis Methods
  • SQL has been the accepted interface for
    retrieving and manipulating data from relational
    databases. These methods are used in OLTP systems
    and in data warehousing environments (referring
    to the environments with simple queries and
    routine reports).
  • Now consider information retrieval and
    manipulation in these environments- reports
    writers and spreadsheets.
  • Report writers two features- the ability to
    point and click for generating and issuing SQL
    calls, and the capability to format the output
    reports. However, report writers do not support
    multidimensionality. With basic report writers,
    you cannot drill down to lower levels in the
    dimensions. You cannot rotate the results by
    switching rows and columns. The report writers do
    not provide aggregate navigation. Once the report
    is formatted and run, you cannot alter the
    presentation of the result data sets.
    Spreadsheets are still cumbersome for showing all
    the aggregate levels and multidimensional views,
    let alone doing calculations for roll-up and
    drill-down.

5
Topic 1 OLAP Definition
  • On-Line Analytical Processing (OLAP) is a
    category of software technology that enables
    analysts, managers and executives to gain insight
    into data through fast, consistent, interactive
    access to a wide variety of possible views of
    information that has been transformed from raw
    data to reflect the real dimensionality of the
    enterprise as understood by the user
  • -- DBMS Magazine, April, 1995

6
Key Features of OLAP
  • Supports analysis, dynamic synthesis and
    consolidation of large volumes of
    multi-dimensional data. Types of analysis ranges
    from basic navigation and browsing (slicing and
    dicing) to calculations, to more complex analyses
    such as time series and complex modeling.
  • Is able to drill down or roll up with each
    dimension.
  • Is capable of applying mathematical formulas and
    calculations to measures.

7
Key Features of OLAP
  • Can easily answer who? and what?.
  • Ability to answer what if? and why? type
    questions.
  • Distinguishes OLAP from general-purpose query
  • tools.
  • Enables users to gain a deeper understanding and
    knowledge about various aspects of their
    corporate data through fast, consistent,
    interactive access to a wide variety of possible
    views of the data.
  • Can be implemented on the web.

8
OLAP Benefits
  • Increased productivity of end-users.
  • Reduced backlog of applications development for
    IT staff.
  • Retention of organizational control over the
    integrity of corporate data.
  • Reduced query drag and network traffic on OLTP
    systems or on the data warehouse.
  • Improved potential revenue and profitability.

9
Topic 2 OLAP VS OLTP (ON-LINE TRANSACTION
PROCESSING)
  • OLTP (RELATIONAL)
  • ATOMIZED
  • PRESENT
  • RECORD-AT-A-TIME
  • PROCESS ORIENTED
  • OLAP (MULTIDIMENSIONAL)
  • SUMMARIZED
  • HISTORICAL
  • MANY RECORDS-AT-A-TIME
  • SUBJECT ORIENTED

10
OLAP VS OLTP
  • WHILE OLTP APPLICATIONS GENERALLY DO NOT
    REQUIRE HISTORICAL DATA, NEARLY EVERY OLAP
    APPLICATION IS CONCERNED WITH VIEWING TRENDS AND
    THEREFORE REQUIRES HISTORICAL DATA. OLTP
    APPLICATIONS AND DATABASE TEND TO BE ORGANIZED
    AROUND SPECIFIC PROCESSES (SUCH AS ORDER ENTRY),
    OLAP APPLICATIONS TEND TO BE SUBJECT-ORIENTED
    ANSWERING SUCH QUESTIONS AS WHAT PRODUCTS ARE
    SELLING WELL OR WHAT ARE MY WEAKEST SALES
    OFFICES?

11
Topic 3 Multi-dimensional Database
  • In a multidimensional database, data is stored as
    Facts and Dimensions instead of rows and columns
  • Multi-dimensional structures are best visualized
    as cubes of data, and cubes within cubes of data.
    Each side of a cube is a dimension.

12
Sample Star Schema
Time
Fact Table
Sales
Time key Date Month quarter year
Store
Product Key Store Key Time Key Fixed
Cost Variable cost Profit margin YTD_Sales_dollars
_by_store YTD_Sales_dollar_by_category YTD_Sales_B
y_department
Month
Product
Product
Store
Product Key
Store key Store name region
Product Name
Category
Product line
Department
13
Kinds of Queries
  • Display the total sales of all products for past
    five years in all stores.
  • Compare total sales for all stores, product by
    product, between years 2000 and 1999
  • Show comparison of total sales for all stores,
    product by product, between years 2000 and 1999
    only for those products for reduced sales.

14
Cube
  • Users often view and analyze data
    multidimensionality, using hierarchical
    segmentation along each dimension. Thus a user
    may analyze sales along the time dimension (such
    as months within quarters with years), along
    geographical dimension (cities with regions
    within countries), along the organizational
    dimension (sales persons within branches within
    territories). We can conceptualize the approach
    as a cube.

15
Cube
Fact Table View
Multi-Dimensional Cube
Property sale
p
Branch
c1
c2
c3
sale
week
price
p2
2
week 2
p1 p2 p3 p4
1 2 2 1
1 2 4 3
c1 c2 c3 c1
4
p3
C3
C1
C2
week 1
1
P1 P4
3
P property
Cells roughly equivalent to records in a
relational database.

16
WHAT IS MULTIDIMENSIONAL DATA?
  • RELATIONAL DATABASES ARE ORGANIZED AROUND A
    LIST OF RECORDS. EACH RECORD CONTAINS RELATED
    INFORMATION WHICH ARE ORGANIZED INTO FIELDS.
  • CUS NAME CUSTOMER TELEPHONE ADDRESS
  • JACK 10001 345-4444 40
    MAIN
  • WALTER 10002 345-6666 30
    ELM
  • HOOVER 10003 345-8588 6
    BELLRED
  • THIS TABLE HAS ONLY ONE DIMENSION.

17
WHAT IS MULTIDIMENSIONAL DATA?
  • LOOKING AT CUSTOMER BY TELEPHONE OR
  • TELEPHONE BY CUSTOMER ONLY PRODUCES
  • A ONE-FOR-ONE CORRESPONDENCE.

18
WHAT IS MULTIDIMENSIONAL DATA?
  • Lets take a look at an example of a relational
    table where
  • there is more than one-to-one correspondence
    between
  • the fields.
  • In the following table, we have sales data for
    each product
  • In each region-- four products sold in three
    regions. The
  • Sales data is a two-dimensional matrix (Product
    and
  • Region).

19
product/region/sales table
Product Region Sales Nuts
East 50 Nuts
West 40 Nuts
Central 30 Screws East
60 Screws West
50 Screws Central 60
Bolts East 100
Bolts West 120 Bolts
Central 80 Washers
East 90 Washers
West 100 Washers Central
40
20
A Much Better Way To Represent the Data
Two-Dimensional Matrix
  • PRODUCT EAST WEST CENTRAL
  • NUTS 50 40 30
  • SCREWS 60 50 60
  • BOLTS 100 120 80
  • WASHERS 90 100 40

21
QUERY ON MULTIDIMENSIONAL DATA
  • QUESTIONS LIKE WHAT WERE TOTAL SALES OF
    NUTS? OR WHAT WERE TOTAL SALES FOR THE EAST? TO
    FIND THE ANSWER IN THE TWO DIMENSIONAL TABLE,
    JUST FIND THE CELL CALLED EAST AND ADD UP ALL
    THE NUMBERS IN THE COLUMN.

22
AGGREGATION THE KEY TO CONSISTENTLY FAST RESPONSE
  • PRODUCT EAST WEST CENTRAL TOTAL
  • NUTS 50 40 30
    120
  • SCREWS 60 50 60 170
  • BOLTS 100 120 80
    300
  • WASHERS 90 100 40 230
  • TOTAL 300 310
    210 820

23
Multiple Reads Database Writes
  • In the above example, computing the totals
    involves 28 (44 34) database reads and eight
    database writes. A typical relational database
    can read about 200 records per second and writes
    perhaps 20 records per second. So consolidating
    this tiny database would take less than one
    second. However, for some larger tables,
    computing for totals could take days or even
    weeks to consolidate.

24
Relational Database vs. Multidimensional OLAP
Server
  • With a multidimensional OLAP server, we can
    perform the same consolidation with row and
    column arithmetic. Whereas a relational database
    can access a few hundred records per second, a
    good OLP server should be capable of
    consolidating 20,000 to 30,000 cells (equivalent
    to records in the relational table) per second,
    including the time to write the totals to the
    database. The multidimensional OLAP database will
    take up less space since the names of the regions
    and products are not repeated in the
    multidimensional database as they are in the
    relational database.

25
Multiple Hierarchies And Classes Within Dimensions
  • The single biggest factor in determining how many
    dimensions for a database is the existence of
    multiple hierarchies and classes within
    dimensions.
  • Classes are typically attributes such as size,
    color and other characteristics that define a
    subset of the members of a dimension.
  • For example, a database of shampoo sales might
    want to roll up product sales by size (6 oz, 15
    oz), by type(dry hair, oily hair) and possibly by
    other attributes such as scented/unscented, brand
    name, and so on.

26
TERMINOLOGY
  • Dimension roughly equivalent to Fields in a
    relational database.
  • In the multidimensional data, Product and
    Region are both dimensions.
  • Cells roughly equivalent to Records in a
    relational
  • database.

27
Simple Hierarchies (Roll up) Within Dimensions
  • The preceding table can be represented
    graphically as follows

Region Total
Central
East
West
Product Total
Bolts
Washers
nuts
Screws
Individual products roll up into a Product Total
28
Multiple Levels of Hierarchies
29
Slice and Dice
  • A three-dimensional array has a total of six
    faces, or views. A four-dimensional array has
    twelve views. An n-dimensional array has n(n-1)
    views. The ability to rotate the data cube is the
    main technology for multi-dimensional reporting
    and is sometimes called slice and dice.

Region
Product
Actual/Forecast
30
Practical Limitations on Database Size
  • In general, as the number of dimensions
    increases, the number of cells in the database
    increases exponentially.
  • for ex., a two-dimensional database with 100
    products and 100 regions would have 10,000 cells.
    If we add a third dimension for time with 52
    weeks, we now have 520,000 cells.
  • Most commercial OLAP servers hit the cell
    limit long before they run out of dimensions.

31
Topic 4 A list of some important features
supported by some OLAP Servers
  • Special time-series data types
  • Special dimensions for variables (complex
    mathematical relationships, such as computed
    averages, and simultaneous equations)
  • Multiple hierarchies within a dimension
  • Classes with a dimension
  • Virtual variables (computed on the fly at run
    time, such as gross margin derived from
    revenues and expenses.

32
Examples of OLAP applications in various
functional areas
33
OLAP Applications
  • Although OLAP applications are found in widely
    divergent functional areas, all have following
    key features
  • multi-dimensional views of data
  • support for complex calculations
  • time intelligence.

34
OLAP Applications - multi-dimensional views of
data
  • Core requirement of building a realistic
    business model.
  • Provides basis for analytical processing through
    flexible access to corporate data.
  • The underlying database design that provides the
    multi-dimensional view of data should treat all
    dimensions equally.

35
OLAP Applications - support for complex
calculations
  • Must provide a range of powerful computational
    methods such as that required by sales
    forecasting, which uses trend algorithms such as
    moving averages and percentage growth.
  • Mechanisms for implementing computational methods
    should be clear and non-procedural.

36
OLAP Applications time intelligence
  • Key feature of almost any analytical application
    as performance is almost always judged over time.
  • Time hierarchy is not always used in same manner
    as other hierarchies.
  • Concepts such as year-to-date and
    period-over-period comparisons should be easily
    defined.

37
Representing Multi-Dimensional Data
  • Example of two-dimensional query.
  • What is the total revenue generated by property
    sales in each city, in each quarter of 1997?
  • Choice of representation is based on types of
    queries end-user may ask.
  • Compare representation - three-field relational
    table versus two-dimensional matrix.

38
Multi-dimensional Data as Three-field table
versus Two-dimensional Matrix
39
Representing Multi-Dimensional Data
  • Example of three-dimensional query.
  • What is the total revenue generated by property
    sales for each type of property (Flat or House)
    in each city, in each quarter of 1997?
  • Compare representation - four-field relational
    table versus three-dimensional cube.

40
Multi-dimensional Data as Four-field Table versus
Three-dimensional Cube
41
Representing Multi-Dimensional Data
  • Cube represents data as cells in an array.
  • Relational table only represents
    multi-dimensional data in two dimensions.

42
Multi-Dimensional OLAP Servers
  • Use multi-dimensional structures to store data
    and relationships between data.
  • Multi-dimensional structures are best visualized
    as cubes of data, and cubes within cubes of data.
    Each side of cube is a dimension.
  • A cube can be expanded to include other
    dimensions.

43
Multi-Dimensional OLAP Servers
  • A cube supports matrix arithmetic.
  • Multi-dimensional query response time depends on
    how many cells have to be added on the fly.
  • As number of dimensions increases, number of the
    cubes cells increases exponentially.

44
Multi-Dimensional OLAP Servers
  • However, majority of multi-dimensional queries
    use summarized, high-level data.
  • Solution is to pre-aggregate (consolidate) all
    logical subtotals and totals along all
    dimensions.
  • Pre-aggregation is valuable, as typical
    dimensions are hierarchical in nature.
  • (e.g. Time dimension hierarchy - years, quarters,
    months, weeks, and days)

45
Multi-Dimensional OLAP Servers
  • Predefined hierarchy allows logical
    pre-aggregation and, conversely, allows for a
    logical drill-down.
  • Supports common analytical operations
  • Consolidation
  • Drill-down
  • Slicing and dicing.

46
Multi-Dimensional OLAP Servers
  • Consolidation - aggregation of data such as
    simple roll-ups or complex expressions
    involving inter-related data.
  • Drill-Down - is reverse of consolidation and
    involves displaying the detailed data that
    comprises the consolidated data.
  • Slicing and Dicing - (also called pivoting)
    refers to the ability to look at the data from
    different viewpoints.

47
Multi-Dimensional OLAP servers
  • Can store data in a compressed form by
    dynamically selecting physical storage
    organizations and compression techniques that
    maximize space utilization.
  • Dense data (ie., data that exists for high
    percentage of cells) can be stored separately
    from sparse data (ie., significant percentage of
    cells are empty).

48
Multi-Dimensional OLAP Servers
  • Ability to omit empty or repetitive cells can
    greatly reduce the size of the cube and the
    amount of processing.
  • Allows analysis of exceptionally large amounts of
    data.

49
Multi-Dimensional OLAP Servers
  • In summary, pre-aggregation, dimensional
    hierarchy, and sparse data management can
    significantly reduce the size of the cube and the
    need to calculate values on-the-fly.
  • Removes need for multi-table joins and provides
    quick and direct access to arrays of data, thus
    significantly speeding up execution of
    multi-dimensional queries.

50
Codds Rules for OLAP Systems
  • In 1993, E.F. Codd formulated twelve rules as the
    basis for selecting OLAP tools.
  • Multi-dimensional conceptual view
  • Transparency
  • Accessibility
  • Consistent reporting performance
  • Client-server architecture
  • Generic dimensionality

51
Codds rules for OLAP Systems
  • Dynamic sparse matrix handling
  • Multi-user support
  • Unrestricted cross-dimensional operations
  • Intuitive data manipulation
  • Flexible reporting
  • Unlimited dimensions and aggregation levels.

52
Codds Rules for OLAP Systems
  • There are proposals to re-defined or extended the
    rules. For example, to also include
  • Comprehensive database management tools
  • Ability to drill down to detail (source record)
    level
  • Incremental database refresh
  • SQL interface to the existing enterprise
    environment

53
Categories of OLAP Tools
  • OLAP tools are categorized according to the
    architecture of the underlying database.
  • Three main categories of OLAP tools include
  • Multi-dimensional OLAP (MOLAP or MD-OLAP)
  • Relational OLAP (ROLAP), also called
    multi-relational OLAP
  • Managed query environment (MQE)

54
Topic 5 Multi-Dimensional OLAP (MOLAP)
  • Use specialized data structures and
    multi-dimensional Database Management Systems
    (MDDBMSs) to organize, navigate, and analyze
    data.
  • Data is typically aggregated and stored according
    to predicted usage to enhance query performance.

55
Multi-Dimensional OLAP (MOLAP)
  • Use array technology and efficient storage
    techniques that minimize the disk space
    requirements through sparse data management.
  • Provides excellent performance when data is used
    as designed, and the focus is on data for a
    specific decision-support application.

56
Multi-Dimensional OLAP (MOLAP)
  • Traditionally, require a tight coupling with the
    application layer and presentation layer.
  • Recent trends segregate the OLAP from the data
    structures through the use of published
    application programming interfaces (APIs).

57
Typical Architecture for MOLAP Tools
58
MOLAP Tools - Development Issues
  • Underlying data structures are limited in their
    ability to support multiple subject areas and to
    provide access to detailed data.
  • Navigation and analysis of data is limited
    because the data is designed according to
    previously determined requirements.

59
MOLAP Tools - Development Issues
  • MOLAP products require a different set of skills
    and tools to build and maintain the database,
    thus increasing the cost and complexity of
    support.

60
Relational OLAP (ROLAP)
  • Fastest-growing style of OLAP technology.
  • Supports RDBMS products using a metadata layer -
    avoids need to create a static multi-dimensional
    data structure - facilitates the creation of
    multiple multi-dimensional views of the
    two-dimensional relation.

61
Relational OLAP (ROLAP)
  • To improve performance, some products use SQL
    engines to support complexity of
    multi-dimensional analysis, while others
    recommend, or require, the use of highly
    denormalized database designs such as the star
    schema.

62
Typical Architecture for ROLAP Tools
63
ROLAP Tools - Development Issues
  • Middleware to facilitate the development of
    multi-dimensional applications. (Software that
    converts the two-dimensional relation into a
    multi-dimensional structure).
  • Development of an option to create persistent,
    multi-dimensional structures with facilities to
    assist in the administration of these structures.

64
Managed Query Environment (MQE)
  • Relatively new development.
  • Provide limited analysis capability, either
    directly against RDBMS products, or by using an
    intermediate MOLAP server.

65
Managed Query Environment (MQE)
  • Deliver selected data directly from DBMS or via a
    MOLAP server to desktop (or local server) in form
    of a datacube, where it is stored, analyzed, and
    maintained locally.
  • Promoted as being relatively simple to install
    and administer with reduced cost and maintenance.

66
Typical Architecture for MQE Tools
67
MQE Tools - Development Issues
  • Architecture results in significant data
    redundancy and may cause problems for networks
    that support many users.
  • Ability of each user to build a custom datacube
    may cause a lack of data consistency among users.
  • Only a limited amount of data can be efficiently
    maintained.

68
Topic 6 OLAP Extensions to SQL
  • SQL promoted as easy-to-learn, nonprocedural,
    free-format, DBMS-independent, and international
    standard.
  • However, major disadvantage has been inability to
    represent many of the questions most commonly
    asked by business analysts.
  • IBM and Oracle jointly proposed OLAP extensions
    to SQL early in 1999, adopted as an amendment to
    SQL.

69
OLAP Extensions to SQL
  • Many database vendors including IBM, Oracle,
    Informix, and Red Brick Systems have already
    implemented portions of specifications in their
    DBMSs.
  • Red Brick Systems was first to implement many
    essential OLAP functions (as Red Brick
    Intelligent SQL (RISQL)), albeit in advance of
    the standard.

70
OLAP Extensions to SQL - RISQL
  • Designed for business analysts.
  • Set of extensions that augments SQL with a
    variety of powerful operations appropriate to
    data analysis and decision-support applications
    such as ranking, moving averages, comparisons,
    market share, this year versus last year.

71
Use of the RISQL CUME Function
  • Show the quarterly sales for branch office B003,
    along with the monthly year-to-date figures.
  • SELECT quarter, quarterlySales,
    CUME(quarterlySales) AS Year-to-Date
  • FROM BranchSales
  • WHERE branchNo 'B003'

72
Use of the RISQL MOVINGAVG / MOVINGSUM Function
  • Show the first six monthly sales for branch
    office B003 without the effect of seasonality.
  • SELECT month, monthlySales,
  • MOVINGAVG(monthlySales) AS 3-MonthMovingAvg,
  • MOVINGSUM(monthlySales) AS 3-MonthMovingSum
  • FROM BranchSales
  • WHERE branchNo 'B003'

73
Topic 7 The Microsoft Data Warehousing Framework
  • The Microsoft Data Warehousing Framework is an
    open architecture that is easily integrated with
    existing systems. The Microsoft SQL Server DTS
    tool is used to import, export, and repair or
    transform data (where it is necessary). The
    Framework contains a rich object-oriented
    programming interface for customized data
    warehousing implementations. There is also a
    user interface, the Microsoft SQL Server Analysis
    Services Manager that can be used to configure
    the data warehouse and to populate or update the
    content in a cube. It can be used to schedule
    tasks, monitor performance, and perform queries
    on the data warehouse.

74
The Microsoft Data Warehousing Framework
Microsoft Office

OLAP Services

Operation data
Web Brouser, tools, portals Servers
OLE DB
OLE DB
OLE DB
Data Transformation Services
User
Other DW
Relational Data Store

DWA member query Reporting, and Analytical tools
External data
Microsoft Repository
75
Object Linking and Embedding for Databases (OLE
DB)
  • Microsoft has defined set of data objects,
    collectively known as OLE DB.
  • Allows OLE-oriented applications to share and
    manipulate sets of data as objects.
  • OLE DB is an object-oriented specification based
    on C API.
  • Components can be treated as data consumers and
    data providers. Consumers take data from OLE DB
    interfaces and providers expose OLE DB
    interfaces.

76
Microsoft Tools To Implement A Data Warehouse
System
  • Microsoft SQL Server 2000 provides a set of tools
    to implement a data
  • warehouse system, including decision support and
    analytical tools
  • SQL Server Relational Databases
  • Data Transform Services
  • Meta Data Servicesusually stored in SQL Server
    or Microsoft Access Databases that can be
    accessed through administrative interfaces.
  • Analytical Services providing OLAP technology
    to organize large quantities of DW data into
    cubes for rapid analysis and sophisticated data
    mining technology.
  • Replicationoften used to distribute data and
    coordinate updates of distributed data in OLTP
    systems.
  • English querya development tool for creating
    client applications that transform English into
    the syntax of SQL to query relational databases,
    or the syntax of Multidimensional (MDX) to query
    OLAP cubes.

77
Microsoft Data WarehouseProduct
  • Data warehousing software has been included with
    Microsoft SQL Servertrade since the release
    of version 7.0 in 1998.

78
The Cube as a Model for the Data Warehouse
  • The cube is an imperfect yet satisfactory name
    for a data warehouse repository. How does a data
    warehouse cube differ from a geometrical cube?
    There are a few important differences. A data
    warehouse cube is defined by any number of
    dimensions (it is not limited to three, and
    sometimes a data-warehousing cube may have fewer
    than three dimensions). Dimensions describe a
    data-warehousing cube just as width, height, and
    depth describe a geometrical cube. Where it is
    appropriate, dimensions can be organized into
    any number of levels.

79
The Cube as a Model for the Data Warehouse
  • The relationship between two dimensions can be
    modeled using a grid. Dimensions are like the
    labels of along the axes of the grid. The cells
    are facts. Facts correspond to the cross product
    of each dimension of the cube. The data in the
    cell is a measure. Measures are the whole reason
    for the cube. If the cube is about the number of
    items sold, the measure is a count of the number
    of items sold. To repeat the grid example, the
    measure is the number that you would find in the
    grid cell.

80
Dimensions and Levels
  • Levels are used to organize dimensions into
    smaller units where necessary. Levels may also
    contain other levels, depending on how they are
    configured in the cube. For example, consider
    that there is a region dimension. Perhaps this
    grocery store operates in three states and uses
    the state boundaries as territorial boundaries.
    Let's say that the region dimension contains
    three levels California, Oregon, and Washington.
    If the business has additional sub-regions such
    as Seattle, Olympia, Yakima, and Spokane in the
    state of Washington, these levels can be added as
    levels to the Washington region, even if such
    detail is not needed for the California and
    Oregon region. Levels are just a convenient way
    to organize facts for a dimension.

81
Facts and Measures
  • A fact is about the combination of the various
    dimensions. Locating a fact is like using a
    coordinate system. Just like a position in a
    mathematical cube such as the origin, which
    might be represented by (x0,y0,z0), a fact
    would be represented by specific combination of
    dimensions such as
  • (Productbroccoli, RegionSeattle,
    TimeWednesday) yielding a specific fact about
    broccoli being sold in Seattle on Wednesday.
    Depending on the way that the cube is being
    used, the fact may show a measure of something
    like 580 units sold or perhaps a different
    measure like 860.00 in sales. The meaning of
    the measure depends on how the cube is defined.

82
Aggregations
  • The mathematical operations of count and sum are
    an essential part of the reason data warehouses
    are useful. These are aggregations. Once
    dimensions are organized and a cube is being
    processed, the aggregations are calculated.
    Generally, aggregations are calculated
    immediately after the cube is initially
    populated or there is a change to the content of
    the cube.

83
Using a Data Warehouse to Make Decisions
  • Consider a grocery store. Let's say that a
    promotion has been running for a few days and
    the grocer needs to decide whether or not to run
    the promotion again. A question that the grocer
    might pose would be something like, "Has more of
    the product been sold during the promotion
    period compared to prior periods?"

84
Using a Data Warehouse to Make Decisions
  • A grocery store inventory system may record
    prices, products, sales, and
  • promotions in a transactional database using
    normalized structures.
  • Inventory systems are optimized for inserting
    and updating records and
  • perhaps for simple procedural selections such
    as retrieving the cost of an
  • item. It is much less likely that the system is
    organized in such a way that it
  • would be just as efficient to produce a report
    that details on a day-by-day
  • and product-by-product basis the effectiveness
    of a sale. In fact, there is
  • usually a contradiction between systems that
    are designed for transactional
  • efficiency and those that are designed for
    efficient queries. This is where a
  • data warehouse should be used. The data
    warehouse is a separate
  • repository that uses the relevant data from
    existing sources in a structure
  • that has been optimized for selection.

85
Using a Data Warehouse to Make Decisions
  • In this case, it is easy to use a data warehouse
    to answer the grocer's question. The sum of the
    fact records that measure the number of items
    sold using the cube dimensions of products,
    promotion, and time can produce the needed
    results.

86
Using a Data Warehouse to Make Decisions
  • To contrast this technique, the information in
    the other systems may not even be in the same
    database. The inventory data source may not be
    the same data source as the customer data
    source or the employee data source. Even if the
    systems are in the same database, it would still
    be a chore to build a system of queries that
    will combine and aggregate the results in a way
    that will produce the correct answer. In fact,
    this effort of combining data sources and
    aggregating the results is just what the data
    warehousing software does best.

87
Viewing a Slice and the Programming Interface to
A Cube
  • Although fashioning a data warehouse in multiple
    dimensions may be a simple design choice, and
    performing queries that produce results that span
    several dimensions may not be a significant
    chore for the processor, the very constitution
    of multi-dimensional output can make it difficult
    to display. Charts, graphs, and tables are
    almost always presented in two dimensions.
    There are some good three-dimensional charting
    tools, but beyond that, the chart becomes more
    of a puzzle than a visual aid. A common
    technique of viewing multi-dimensional output is
    to view the output one two-dimensional "slice"
    of a cube at a time. This is the way that the
    Microsoft SQL Server Analysis Services Tool
    displays output.

88
Using DSO (Decision Support Object)
  • Fortunately, output is not restricted to two
    dimensions. Microsoft SQL Server Analysis
    Services provides a programming interface to
    multi-dimensional data warehouse output the DSO
    (Decision Support Object). DSO can be used
    programmatically to access the various
    dimensions.

89
Using MDX
  • MDX (multidimensional extensions) is a syntax
    designed for querying multidimensional objects
    and data. For such systems, it is more efficient
    and intuitive to use than SQL, which was designed
    for an entirely different set of objects. The
    grammar of an MDX query has a similar feel to the
    grammar of an SQL query. Observe the MDX query
    below which would select the sales figures for
    the broccoli sold in Seattle on Wednesday that I
    used in the example above

90
Using MDX
  • Microsoft SQL Server OLAP Services provides an
    architecture for access to multidimensional data.
    This data is summarized, organized, and stored
    in multidimensional structures for rapid response
    to user queries. Through OLE DB for OLAP, a
    PivotTable Service provides client access to
    this multidimensional online analytical
    processing
  • (OLAP) data. For expressing queries to this
    data, OLE DB for OLAP employs a full-fledged,
    highly functional expression syntax
    multidimensional expressions (MDX).
  •  

91
Using MDX
  • OLE DB for OLAP is a set of Component Object
    Model (COM) interfaces designed to extend OLE DB
    for efficient access to multidimensional data.
    ADO has been extended with new objects,
    collections, and methods that are designed to
    take advantage of OLE DB for OLAP. These
    extensions are collectively known as ADO MD
    (multidimensional) and are designed to provide a
    simple, high-level object model for accessing
    tabular and OLAP data.
  •  

92
MDX expression
  • SELECT axis specification ON COLUMNS,
  • axis specification ON ROWS
  • FROM cube_name
  • WHERE slicer_specification

93
SELECT
  • Measures.Sales ON COLUMNS
  • Time.Wednesday ON ROWS
  • FROM MySalesCube
  • WHERE Region.Washington.Seattle
  • AND Product.Vegetable.Broccholi
  •  
  • The output of this query would be a column
    labeled Sales, a row labeled Wednesday and a
    single grid cell at the intersection with the
    sales figure
  • 860.00.

94
SELECT
  • The SQL Server Analysis Server Manager has an
    interface that accepts MDX queries.
    Alternatively, MDX queries can be incorporated
    into programs that employ the DSO.

95
Microsoft SQL Data Transformation Services (DTS)
  • DTS is a set of graphical tools and programmable
    objects
  • to extract, transform, and consolidate data from
    disparate
  • services into single or multiple destinations.
  • By using DTS, you can
  • Access heterogeneous data sources-including
    relational and non-relational data sources.
  • Import, export, and transform data- moving entire
    tales, and copying other types of database
    objects.
  • Create reusable data transformations and
    functions-creating data transformations that
    manipulate data as it moves from sources to
    destination and creating functions that perform
    operations external to DTS, such as sending
    e-mail.

96
Microsoft SQL Data Transformation Services
(Contd)
  • Automate data loads-to accommodate differing
    deployment environments, you can automate DTS
    data loads for recurring executions by using the
    SQL Server Agent a scheduling component of SQL
    Serve.
  • Manage metadata- recording metadata about data
    loads, such as the owner of a data load.
  • Customize and extend functionality-providing a
    customized data transport solution that you can
    use to integrate supplied functionality with
    customized programmatic extensions.

97
Microsoft SQL Data Transformation Services
(Contd)
Transformation Package Workflow
Transformation Package Designer
Schema Designer

  1. Define Data Flow
  2. Define Transformations
  3. Populate Star Schema

98
Populating Data Warehouse Structures
  • By using DTS you can
  • Access heterogeneous data structures by using
    OLE DB.
  • Import, export, and transform datausing the DTS
    import/export wizard.
  • Create reusable data transformations and
    functions.
  • Automate data loads.
  • Manage metadata.

99
Applying the Wizard
  • Because of its generic, easy-to-use features,
    you can apply the DTS
  • Import/Export Wizard in many data transfer
    situations
  • Performing ad-hoc table and data transfers.
  • Copying data subsets.
  • Transferring data to new destinations.
  • Developing a prototyping package.
  • When you configure the source and destination,
    you can connect to
  • Both
  • OLE DB data sources.
  • ODBC data sources.

100
The Microsoft Data Warehousing Framework
  • There is more information about how to build a
    cube using the Microsoft SQL Server Analysis
    Services Manager at MSDN Online, as well as
    guidelines to consider for the design and
    configuration of your data warehouse. See the
    "How to" article at this link
  • http//msdn.microsoft.com/library/psdk/sql/aghti
    ntro_2vov.htm.
Write a Comment
User Comments (0)
About PowerShow.com