Data Warehouses - PowerPoint PPT Presentation


PPT – Data Warehouses PowerPoint presentation | free to download - id: 4a0131-YmI2Y


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation

Data Warehouses


Data Warehouses Kathy S. Schwaig Outline Data Explosion Data Warehouses Multi-dimensional databases Now that we have gathered so much data, What do we do with it? – PowerPoint PPT presentation

Number of Views:123
Avg rating:3.0/5.0
Slides: 35
Provided by: ksumailKe


Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Data Warehouses

Data Warehouses
  • Kathy S. Schwaig

  • Data Explosion
  • Data Warehouses
  • Multi-dimensional databases

Portions of this presentation are adapted from _at_
J. Han, Simon Fraser University, Canada, 2000
Now that we have gathered so much data, What do
we do with it?
  • I never waste memory on things that can easily
    be stored and retrieved from elsewhere.

  • - Albert Einstein

Data Explosion Problem
  • Automated data collection tools and mature
    database technology lead to tremendous amounts of
    data stored in databases.
  • We are drowning in data, but starving for

What is a Data Warehouse?
  • An integrated and consistent store of
    subject-oriented data, structured for query and
    retrieval in order to support management decision

  • A data warehouse is where the information systems
    department puts data to be turned into
  • One cannot just dump masses of data into a disk
    drive and expect it to be usable.

Goal of Data Warehousing
  • Resolve enormous data access difficulties
  • Unavailable data hidden in transaction systems
  • Delays as underpowered systems try to perform
    huge, complex queries
  • Complex, user-hostile interfaces
  • Difficulties in discovering patterns in large
    amounts of data
  • Competition for computer resources between
    transaction systems and decision support systems

On-line Transaction Processing (OLTP)
  • Traditional database management systems (DBMS)
    used for on-line transaction processing (OLTP).
  • Order entry update status field of order 445522
  • Banking transfer 100 from account 55779 to
    account 99321
  • Characteristics
  • detailed up-to-date data
  • structured, repetitive tasks
  • short transactions
  • read and/or update a few records

An OLTP example
  • You call retailer Lands End, where you have done
    business before. The exchange might be
  • Hi, this is Mr. Smith. Id like to place an
  • Your phone number, please?
  • 555-555-1212
  • (Pulls up your file) Yes, Mr. Smith. What can I
    help you with?
  • Id like to order merchandise number 2222
  • I see you were a little late last year in
    getting your Christmas presents ordered. Would
    you like some suggestions to get the process
    started earlier?
  • Sure, Why not?
  • Last year, you bought your Aunt Jennifer a
    scarf. We have a lovely pair of gloves to match
    --they are on special for only 19. Should I add
    those to your order?
  • Uh...sure.
  • And would you like the card to say the same as
    last year?
  • Yes, please.

Decision Support versus Transaction Processing
  • Characteristics and usage patterns of operational
    systems (transaction processing systems) used to
    automate business processes and those of a
    Decision Support System are fundamentally
    different but linked. Why?

What is a Data Warehouse?
  • Facility for integrating data
  • Organizes and stores data for analytical
    processing from historical perspective
  • Maintained separately from organizations
    operational database

Data Warehouse Architecture
DSS Server
Analysis Query Reports Data mining
other sources
Data Warehouse
Operational DBs
Data Marts
Data Sources
Characteristics of a Data Warehouse
  • Subject-oriented
  • Integrated
  • Non-volatile
  • Time-varying

1. Subject Oriented
  • Oriented to the major subject areas of the
  • E.g. insurance company customer, product,
    transaction, policy, claim, account
  • Operational database and applications may be
    organized differently
  • E.g. based on type of insurance's auto, life,
    medical, fire.

2. Integrated
  • Inconsistencies in encoding and naming
    conventions exist among data sources. Why?
  • Data converted

3. Non-Volatile
  • Operational data regularly accessed and
    manipulated a record at a time. Update performed
    in operational environment.
  • Warehouse data loaded and accessed.
  • Update of data does not occur in the data
    warehouse environment.

4. Time Variant...
  • A data warehouse is a time-variant collection
    of data, meaning time is a variable in accessing
    the data.

Time Variant
  • Time horizon for data longer than that of
    operational systems.
  • Operational database contains current value data.
  • Data warehouse data is a sophisticated series of
  • The key structure of operational data may or may
    not contain some element of time. The key
    structure of the data warehouse always contains
    some element of time.

Data Mart
  • A data mart is a smaller version of a data
    warehouse, typically containing data related to a
    single functional area of the firm or having
    limited scope in some other way.
  • It can be a useful first step to a full-scale
    data warehouse.

Data The Critical Issue
  • Users need to gather, analyze, report on business
    information to help organizations gain
    competitive advantage.
  • Most companies have a wealth of legacy data.
  • Worthless if
  • existence unknown
  • cannot be found
  • cannot be understood
  • incorrect

Data Transformation
  • Simple transformation -- e.g. change data type of
    field from integer to character
  • Cleansing scrubbing -- consistent format, valid
  • Integration -- data from multiple sources and map
    field by field into data warehouse.
  • Aggregation / summarization

Sample Operations
  • Roll up -- summarize data
  • total sales volume last year by product category
    by region
  • Roll down, drill down, drill through -- go from
    higher level summary to lower level summary or
    detailed data
  • For a particular product category, find the
    detailed sales data for each salesperson by date
  • Slice and dice
  • Sales of beverages in the West over the last 6

Why Multi-Dimensional Databases?
  • No single "best" data structure for all
    applications within an enterprise. Need good
    conceptual fit with the way end-users visualize
    business data
  • Most business people already think about their
    businesses in multidimensional terms
  • Managers tend to ask questions about product
    sales in different markets over specific time

Adapted from Arun Rai 1999
What is a Multi-Dimensional Database?
  • A multidimensional database (MDD) is a computer
    software system designed for the efficient and
    convenient storage and retrieval of large volumes
    of data that are
  • (1) intimately related
  • (2) stored, viewed and analyzed from different
    perspectives. Perspectives called dimensions.

Contrasting Relational and Multi-Dimensional
Models An Example
The Relational Structure
Multidimensional Array Structure
Mutlidimensional Representation
View Data An Example
  • Assume that each dimension has 10 positions, as
    shown in the cube above
  • How many records would be there in a relational
  • Implications for viewing data from an end-user

Data Warehousing and The World Wide Web
  • Access and transfer large numbers of data
    relatively easily and economically
  • Integration of external data into data warehouse
  • Issues of data integrity, accuracy, quality
  • Quality rating versus price

  • Data Mining
  • Data Visualization
  • (Coming Next)

  • Data versus Information
  • Data Warehouse Architecture
  • Characteristics
  • Applications

Appendix Operational Data Store and Data
Warehouse Characteristic
Characteristic Operational Data Store
Data Warehouse
  • How is it built?
  • User requirements
  • Area of support

One application or subject area at a time. Well
defined prior to logical design. Day-to-day
business operations. Relatively small number of
records retrieved via a single query.
Typically multiple subject areas at a time.
Often vague and conflicting.
Decision support for managerial activities.
Large data sets scanned to retrieve results
from either single or multiple queries.
Type of access
Tuned for frequent access to small amounts of
Tuned for infrequent access to large amounts of
Frequency of access
Volume of data
Similar to typical daily volume of operational
Much larger than typical daily transaction volume.
Appendix Operational Data Store and Data
Warehouse Characteristic (contd)
Characteristic Operational Data Store
Data Warehouse
Retention period Currency of
data Availability of data Typical unit of
analysis Design focus
Retained as necessary to meet daily operating
Retention period is indeterminate and must
support historical reporting, comparison, and
Up-to-the-minuet real time.
Typically represents a static point in time.
High and immediate availability may be required.
Immediate availability is less critical.
Small, manageable, transaction-level units.
Large, unpredictable,variable units.
High-performance, limited flexibility.
High flexibility, high-performance.
Appendix Characteristics of a Data Warehouse
  • Subject orientation. Data are organized
    based on how the
  • users
    to them.
  • Integrated. All
    inconsistencies regarding naming

  • convention and value representations are

  • removed.
  • Nonvolatile. Data are stored in
    read-only format and do not
  • change
    over time.
  • Time variant. Data are not current
    but normally time-series.
  • Summarized. Operational data are
    mapped into a decision-usable
  • format.
  • Large volume. Time-series data sets
    are normally quite large.
  • Not normalized. DW data can be, and
    often are, redundant.
  • Metadata. Data about data
    are stored.
  • Data sources. Data come from
    internal and external unintegrated

  • operational systems