Fundamentals of Data Warehousing - PowerPoint PPT Presentation

Loading...

PPT – Fundamentals of Data Warehousing PowerPoint presentation | free to download - id: 6f59a0-ZTU2M



Loading


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation
Title:

Fundamentals of Data Warehousing

Description:

Fundamentals of Data Warehousing Dr. Akhtar Ali School of Computing, Engineering and Information Sciences – PowerPoint PPT presentation

Number of Views:63
Avg rating:3.0/5.0
Slides: 21
Provided by: UNIVERSI182
Learn more at: http://computing.unn.ac.uk
Category:

less

Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Fundamentals of Data Warehousing


1
Fundamentals of Data Warehousing
  • Dr. Akhtar Ali
  • School of Computing, Engineering and Information
    Sciences

2
Lecture Outline
  • Inmons Four Characteristics of a DW
  • Subject-Oriented, Integrated, Time Variant and
    Non-Volatile
  • Some Useful Definitions
  • Issues in Data Warehousing
  • DW Design
  • Extraction
  • Integration
  • Creation and Loading
  • DW Maintenance
  • Querying and Optimization

3
1.1 Subject-Oriented
  • Data is categorized and stored by business
    subject rather than
  • by application.

Equity Plans
Shares
Loans
Insurance
Operational Systems
4
1.2 Integrated
Data on a given subject is defined and stored
once.
Savings Application
No Application Flavor
Current Accounts Application
Loans Application
Subject Customer
Data Warehouse
Operational Environment
5
1.3 Time Variant
Data is stored as a series of snapshots, each
representing a period of time.
Data Warehouse
6
1.4 Non-Volatile
Typically data in the data warehouse is not
directly updated or deleted.
Load
INSERT Read (e.g. SELECT) UPDATE DELETE
Read
7
2.1 What is an Operational Data Store?
  • An operational data store (ODS) is the point of
    integration for operational/transaction-oriented
    systems.
  • For example Banks typically have several
    independent systems set up to support different
    financial products e.g. loans, checking accounts,
    savings accounts etc.
  • The advent of ATMs helped push many banks to
    create an ODS to integrate current balances and
    recent transactional data from these separate
    accounts under one customer number.
  • Such ODSs are normally kept separate from a DW.
  • An ODS may be seen as a lowest layer of a DW for
    lower-management to access detailed as well as
    integrated data.
  • This means that an ODS may be seen as a front
    edge of a DW.
  • Such ODSs are normally kept as part of a DW.

8
2.2 What is a Data Mart?
  • A data mart (DMT) is a logical subset of a
    complete DW.
  • A DMT is a complete pie-wedge of the overall DW
    pie.
  • A DW is a made up of the union of all its DMTs.
  • Some people take the definition literally. They
    create several independent DMTs to meet the needs
    of several departments.
  • Will everyone be happy? Well, maybe. There may be
    serious issues of integrating these DMTs
    together.
  • A DMT is an extension of a DW.
  • Data is integrated as it enters the DW. DMTs then
    derive data from the central source, the DW.
  • Each department gets its own DMT.
  • Each department determines which of the data
    warehouse contents are of interest .
  • These subject areas are then replicated into the
    smaller and local DMT so that users can get to
    the data they want with less interference from
    other departments.

9
2.3 What is On-Line Analytical Processing (OLAP) ?
  • OLAP is complementary to data warehousing.
  • OLAP embodies general activities of querying and
    presenting text and number data from DWs.
  • OLAP is based on dimensional modelling as opposed
    to entity-relationship (ER) modelling.
  • A dimensional model may contain the same
    information as an ER model but packages data in a
    symmetric form.
  • A dimensional model is geared towards user
    understandability and high performance query
    processing.
  • ROLAP (Relational OLAP)
  • A set of user interfaces and applications that
    give a relational database (RDB) a dimensional
    flavour.
  • MOLAP (Multi-dimensional OLAP)
  • A set of user interfaces, applications and
    proprietary database technologies that have a
    strong dimensional flavour.

10
2.4 What is Data Mining (DM) ?
  • DM is often defined as finding hidden
    information in a database.
  • Alternatively, DM is exploratory data analysis,
    data driven discovery, and deductive learning.
  • Data mining software is a class of tools that
    apply artificial intelligence techniques to the
    analysis of data.
  • Given access to data, DM tools dig through the
    data looking for patterns and discovering
    relationships that the user might have never
    suspected.
  • DM tools work against an operational database or
    a DW.
  • Since data in a DW is usually integrated and
    summarized it may be more efficient to use it for
    DM.
  • But a DM tool may find more useful information
    from an operational database (compared to a DW)
    as a DW usually hosts data to support anticipated
    DSS and may miss out data useful for DM.

11
3.0 Generic Data WarehouseArchitecture
12
3.1 Warehouse Design
  • Influenced by both maintenance and querying
  • Many trade-offs
  • Space vs. update time vs. query performance
  • Logical model of data
  • ER vs. Dimensional, Relational vs. OR vs. OO,
    ROLAP vs. MOLAP
  • Identify sources of data
  • Identify warehouse data what to materialize?
  • Which summary tables?
  • Which fact/dimensional tables?
  • Which indices?
  • Choose software and hardware

13
3.2 Data Extraction
  • Selecting relevant data from data sources (DSs)
    and moving it into DW.
  • DS types
  • Database (e.g. relational), flat file, WWW, XML,
    COBOL, etc
  • How to obtain the data?
  • Using data replication servers/tools
  • Dump file or Export tools
  • ODBC/JDBC/CORBA/RMI/COM and DCOM
  • Third party Wrappers/Middleware/Agents
  • Other activities data transformation, change
    detection (monitoring), cleansing etc.

14
3.2.1 Monitors
  • Detecting changes (of interest to a DW) in data
    sources and propagate to DW.
  • How?
  • Triggers
  • Replication servers/tools
  • Log Sniffer
  • Compare query results
  • Compare snapshots/dumps/exported data

15
3.2.1 Data Cleansing
  • Finds and removes duplicate tuples
  • For example Judie Harris Morris vs. Judie H.
    Morris
  • Detect inconsistent or wrong data
  • Attribute values that do not match (because of
    wrong data types or violating certain constraints
    e.g. for Gender attribute a value of N meaning
    neither may be rejected).
  • Unreadable or incomplete data
  • Notify DSs of errors found during the cleansing
    process

16
3.3 Data Integration
  • Receive data (changes) from multiple
    wrappers/monitors/data cleansers and integrate
    into DW.
  • Often Rule-based
  • Actions
  • Resolve inconsistencies
  • Eliminate duplicates
  • Integrate into DW
  • Summarize data
  • Fetch more data from DSs
  • Notify users that DW is now up-to-date

17
3.4 .1 Warehouse Loading
  • Includes all of the previous processes
  • Similar to loading/populating a database but
    complex due to heterogeneity of data and dealing
    with multiple DSs, possibly remote and external.
  • Building indices
  • Checking integrity constraints, etc.
  • Issue huge volumes of data but small time window
    to complete the process.
  • Computation of additional data
  • Auxiliary data to facilitate DW maintenance and
    support/speed up querying and analysis.

18
3.4.2 Warehouse Creation
  • A DW can be seen as a collection of materialized
    views (MVs) over DSs.
  • Contains a copy of data (collected from DSs)
    tailored to end-users.
  • Steps
  • Create DW schema (e.g. creating fact and
    dimensional tables, defining MVs)
  • Load warehouse
  • Start monitoring for changes at DSs
  • Update/Maintain DW as needed.

19
3.5 DW Maintenance
  • DSs on which a DW is based may change over time.
  • Changes at DSs may require changes at a DW.
  • How often to propagate changes to a DW?
  • At night, weekly/fortnightly/monthly,
    immediately, etc.
  • Off-line or on-line
  • Most current vendor products take a warehouse
    off-line during maintenance
  • How to propagate changes to a DW?
  • Completely re-build all affected tables at the DW
    (easy but inefficient)
  • Apply changes to affected tables incrementally
    (efficient but difficult)
  • Read my paper about MOVIE or wait until we
    discuss this topic in detail.

20
3.6 Querying and Optimization
  • Queries are long-running and complex
  • Multiple/Nested joins and aggregation
  • Usually touch all tuples kind of queries
  • Query language or analysis tools must support
    multi-dimensional operations
  • Pivot, Slice/Dice, Rollup, Percentile, etc
  • Standard SQL does not provide adequate operations
  • Solution pre-compute partial answers and reuse
  • Drawback it may increase DW maintenance
  • Emergence of warehouse management systems (WHMS
    e.g. ADMS, WHIPS)
About PowerShow.com