Difference between Data Warehouse and Data Mining? - PowerPoint PPT Presentation

About This Presentation
Title:

Difference between Data Warehouse and Data Mining?

Description:

What exactly is a Data Warehouse? Termed as a special type of database, a Data Warehouse is used for storing large amounts of data, such as analytics, historical, or customer data, which can be leveraged to build large reports and also ensure data mining against it.@ What is Data mining? The process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions’ Call us at For any queries, please contact: +1 940 440 8084 / +91 953 383 7156 TODAY to join our Online IT Training course & find out how Max Online Training.com can help you embark on an exciting and lucrative IT career. – PowerPoint PPT presentation

Number of Views:1133

less

Transcript and Presenter's Notes

Title: Difference between Data Warehouse and Data Mining?


1
(No Transcript)
2
  • Data Warehousing
  • OLAP
  • Data Mining
  • Further Reading

Enroll Now https//goo.gl/QbTVal
3
Data Warehousing
  • OLTP (online transaction processing) systems
  • range in size from megabytes to terabytes
  • high transaction throughput
  • Decision makers require access to all data
  • Historical and current
  • 'A data warehouse is a subject-oriented,
    integrated, time-variant and non-volatile
    collection of data in support of managements
    decision-making process' (Inmon 1993)

Enroll Now https//goo.gl/QbTVal
4
Benefits
  • Potential high returns on investment
  • 90 of companies in 1996 reported return of
    investment (over 3 years) of gt 40
  • Competitive advantage
  • Data can reveal previously unknown, unavailable
    and untapped information
  • Increased productivity of corporate
    decision-makers
  • Integration allows more substantive, accurate and
    consistent analysis

5
Typical Architecture
Mainframe operational n/w,h/w data
Warehouse mgr
Reporting query, app development,EIS tools
Meta-data
Highly summarized data
Departmental RDBMS data
Load mgr
Query manager
OLAP tools
Lightly summarized data
Private data
DBMS
Detailed data
Warehouse mgr
Data-mining tools
External data
Archive/backup
Source Connolly and Begg p1157
6
Data Warehouses
  • Types of Data
  • Detailed
  • Summarised
  • Meta-data
  • Archive/Back-up

Enroll Now https//goo.gl/QbTVal
7
Information Flows
Operational data source 1
Warehouse Mgr
Meta-flow
Reporting query, app development,EIS tools
Meta- data
Highly summ. data
Inflow
Outflow
Load mgr
Query manager
OLAP tools
Lightly summ.
Upflow
DBMS
Detailed data
Warehouse mgr
Data-mining tools
Downflow
Archive/backup
Operational data source n
Source Connolly and Begg p1162
8
Information Flow Processes
  • Five primary information flows
  • Inflow - extraction, cleansing and loading of
    data from source systems into warehouse
  • Upflow - adding value to data in warehouse
    through summarizing, packaging and distributing
    data
  • Downflow - archiving and backing up data in
    warehouse
  • Outflow - making data available to end users
  • Metaflow - managing the metadata

9
Problems of Data Warehousing
  1. Underestimation of resources for data loading
  2. Hidden problems with source systems
  3. Required data not captured
  4. Increased end-user demands
  5. Data homogenization
  6. High demand for resources
  7. Data ownership
  8. High maintenance
  9. Long duration projects
  10. Complexity of integration

10
Data Warehouse Design
  • Data must be designed to allow ad-hoc queries to
    be answered with acceptable performance
    constraints
  • Queries usually require access to factual data
    generated by business transactions
  • e.g. find the average number of properties rented
    out with a monthly rent greater than 700 at each
    branch office over the last six months
  • Uses Dimensionality Modelling

11
Dimensionality Modelling
  • Similar to E-R modelling but with constraints
  • composed of one fact table with a composite
    primary key
  • dimension tables have a simple primary key which
    corresponds exactly to one foreign key in the
    fact table
  • uses surrogate keys based on integer values
  • Can efficiently and easily support ad-hoc
    end-user queries

Enroll Now https//goo.gl/QbTVal
12
Star Schemas
  • The most common dimensional model
  • A fact table surrounded by dimension tables
  • Fact tables
  • contains FK for each dimension table
  • large relative to dimension tables
  • read-only
  • Dimension tables
  • reference data
  • query performance speeded up by denormalising
    into a single dimension table

13
E-R Model Example
Enroll Now https//goo.gl/QbTVal
14
Star Schema Example
Enroll Now https//goo.gl/QbTVal
15
Other Schemas
  • Snowflake schemas
  • variant of star schema
  • each dimension can have its own dimensions
  • Starflake schemas
  • hybrid structure
  • contains mixture of (denormalised) star and
    (normalised) snowflake schemas

16
OLAP
  • Online Analytical Processing
  • dynamic synthesis, analysis and consolidation of
    large volumes of multi-dimensional data
  • normally implemented using specialized
    multi-dimensional DBMS
  • a method of visualising and manipulating data
    with many inter-relationships

17
Codds OLAP Rules
  • 1. Multi-dimensional conceptual view
  • 2. Transparency
  • 3. Accessibility
  • 4. Consistent reporting performance
  • 5. Client-server architecture
  • 6. Generic dimensionality
  • 7. Dynamic sparse matrix handling
  • 8. Multi-user support
  • 9. Unrestricted cross-dimensional operations
  • 10. Intuitive data manipulation

18
OLAP Tools
  • Categorised according to architecture of
    underlying database
  • Multi-dimensional OLAP
  • data typically aggregated and stored according to
    predicted usage
  • use array technology
  • Relational OLAP
  • use of relational meta-data layer with enhanced
    SQL
  • Managed Query Environment
  • deliver data direct from DBMS or MOLAP server to
    desktop in form of a datacube

19
MOLAP
RDB Server
MOLAP server
Request
Result
Load
Database/Application Logic Layer
Presentation Layer
Enroll Now https//goo.gl/QbTVal
20
ROLAP
ROLAP server
Request
SQL
RDB Server
Result
Result
Database Layer
Presentation Layer
Application Logic Layer
Enroll Now https//goo.gl/QbTVal
21
MQE
End-user tools
RDB Server
SQL
Result
MOLAP server
Request
Load
Result
Enroll Now https//goo.gl/QbTVal
22
Data Mining
  • The process of extracting valid, previously
    unknown, comprehensible and actionable
    information from large databases and using it to
    make crucial business decisions
  • focus is to reveal information which is hidden
    or unexpected
  • patterns and relationships are identified by
    examining the underlying rules and features of
    the data
  • work from data up
  • require large volumes of data

23
Example Data Mining Applications
  • Retail/Marketing
  • Identifying buying patterns of customers
  • Finding associations among customer demographic
    characteristics
  • Predicting response to mailing campaigns
  • Market basket analysis

Enroll Now https//goo.gl/QbTVal
24
Example Data Mining Applications
  • Banking
  • Detecting patterns of fraudulent credit card use
  • Identifying loyal customers
  • Predicting customers likely to change their
    credit card affiliation
  • Determining credit card spending by customer
    groups

Enroll Now https//goo.gl/QbTVal
25
Data Mining Techniques
  • Four main techniques
  • Predictive Modeling
  • Database Segmentation
  • Link Analysis
  • Deviation Direction

Enroll Now https//goo.gl/QbTVal
26
Data Mining Techniques
  • Predictive Modelling
  • using observations to form a model of the
    important characteristics of some phenomenon
  • Techniques
  • Classification
  • Value Prediction

Enroll Now https//goo.gl/QbTVal
27
Classification Example- Tree Induction
Customer renting property gt 2 years
No
Yes
Customer age gt 25 years?
Rent property
No
Yes
Buy property
Rent property
Enroll Now https//goo.gl/QbTVal
28
Data Mining Techniques
  • Database Segmentation
  • to partition a database into an unknown number of
    segments (or clusters) of records which share a
    number of properties
  • Techniques
  • Demographic clustering
  • Neural clustering

Enroll Now https//goo.gl/QbTVal
29
Segmentation Scatterplot Example
Enroll Now https//goo.gl/QbTVal
30
Data Mining Techniques
  • Link Analysis
  • establish associations between individual records
    (or sets of records) in a database
  • e.g. when a customer rents property for more
    than two years and is more than 25 years old,
    then in 40 of cases, the customer will buy the
    property
  • Techniques
  • Association discovery
  • Sequential pattern discovery
  • Similar time sequence discovery

31
Data Mining Techniques
  • Deviation Detection
  • identify outliers, something which deviates
    from some known expectation or norm
  • Statistics
  • Visualisation

Enroll Now https//goo.gl/QbTVal
32
Deviation Detection Visualisation Example
33
Mining and Warehousing
  • Data mining needs single, separate, clean,
    integrated, self-consistent data source
  • Data warehouse well equipped
  • populated with clean, consistent data
  • contains multiple sources
  • utilises query capabilities
  • capability to go back to data source

34
Further Reading
  • Connolly and Begg, chapters 31 to 34.
  • W H Inmon, Building the Data Warehouse, New York,
    Wiley and Sons, 1993.
  • Benyon-Davies P, Database Systems (2nd ed),
    Macmillan Press, 2000, ch 34, 35 36.

Enroll Now https//goo.gl/QbTVal
35
(No Transcript)
36
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com