Data Warehouse CSC5301 - PowerPoint PPT Presentation

1 / 22
About This Presentation
Title:

Data Warehouse CSC5301

Description:

Hachim Haddouti, born in 1969, married, one baby 9 weeks ... Seasonal adaptions, e.g. when to produce how many skis, bikinis, convertibles, ... – PowerPoint PPT presentation

Number of Views:86
Avg rating:3.0/5.0
Slides: 23
Provided by: university73
Category:

less

Transcript and Presenter's Notes

Title: Data Warehouse CSC5301


1
Data WarehouseCSC5301
  • Hachim Haddouti

2
About Me
  • Hachim Haddouti, born in 1969, married, one baby
    9 weeks
  • Ph.D. in Computer Science (Database Management
    Systems) at Technical University of Munich under
    Supervision of Prof. Bayer (Inventor of B-Tree)
  • Master in Computer Science (Knowledge Management
    Systems) at Techical University of Berlin
  • Project Manager at BMW Munich Germany
  • Senior Consultant and Project Manager at
    DaimlerChrysler Services (now called T-Systems,
    Deutsche Telekom)
  • Research Scientist with Prof. R. Bayer in
    Technical University of Munich
  • UNESCO Consultant
  • Visiting Scientist at Tsukuba University, Japan,
    University of Sta. Barbara University,
    California University of Catania, Italy Beijing
    Univ China
  • Area of Interest DBMS, Digital Libraries,
    Document Content Knowledge Management, XML
    databases and Web technologies, Multilinguality
    etc.
  • More at www.haddouti.de

3
Do You Remember?
DSS
Data cube
MD
OLTP
RollUp
drill down
Slice/dice
MD
ROLAP
Star schema
MOLAP
Data mining
Data extraction
Fact table
4
Why DW?
  • Mining of mobile phone calls
  • (Caller, Callee, Time, Duration, Geogr.
    Location) 100 B/tuple
  • In Germany
  • 107 users 10 calls/(dayuser) 100 B/call
  • 1010 B/day 31012 B/year 3 TB/year
  • Scanning data at 107 B/s takes
  • 31012/107 3105 s 3 days

5
Data Warehouses
  • Subject-oriented, integrated, time-variant,
    nonvolatile collection of data in support of
    managements decision-making process Inmon (AP
    analytical processing is missing)
  • Used for analysis of existing data
  • Resolves performance issues suffered by
    operational RDBMSs and OLTPs

6
Data Warehouse Architecture Figure 9.7
7
  • Model
  • need abstract model with above operations
  • suitable datastructures
  • very large databases
  • Relational Model?
  • one-dimensional access via primary key
  • nm relationships are 2-dimensional (FK1,
    FK2)

8
The Multidimensional Data Model
  • Requirements must support typical analyses,
    queries like
  • Sales of a product group digital cameras in
    Nov, Dec Jan Feb in Munich area
  • sorted by sales of each product in
  • sorted by sales in numbers
  • sorted by shops

9
Data model
  • MD Model
  • better performance
  • Better data organisation
  • Better visualization
  • Business queries (why, what if)
  • ER Model
  • a disaster for querying a huge amount of data
    (time)
  • not understandable for users and they can not be
    navigated usefully by DBMS software.
  • hard to visualize many possible connections
    between tables,
  • To avoid redundancy

10
Typical DWH Analyses/Queries
  • What are the consequences of new orders for
    production capacity w.r. to investment,
    personnel, maintenance, extra hours, ...
  • Seasonal adaptions, e.g. when to produce how many
    skis, bikinis, convertibles, ...
  • Influence of external financing on profits

11
  • Operations
  • aggregation
  • slice
  • dice (cube)
  • rollup to coarser level
  • drill down to more detailed level
  • grouping
  • sorting

12
Data Cube Representation
13
Slicing on Time Dimension
14
Dicing on Part Dimension
15
Steps to build a DWH
  • Acquisition of data
  • Data cleansing
  • Storage
  • Processing AP
  • Maintenance, ...
  • Not possible with classical DB-technology alone

16
On-Line Analytical Processing
  • OLTP (online transaction processing) for
    operational data of enterprise, e.g. in
    relational DBMS, IMS, SAP/R3, ...
  • DSS Decision Support System to store
    data/information for strategic management
    decisions aggregations, summaries, etc.
  • Optimized to work with data warehouses
  • Used to answer questions
  • Allows users to perceive data as a
    multidimensional data cube
  • Data mining

17
OLTP versus OLAP
  • Thematic focus
  • OLTP many small transactions (microscopic view
    of business processes, individual steps at lowest
    level, single order, delivery)
  • OLAP finances in general, personnel in general,
    ...
  • OLAP requires integration and unification of many
    detailed data into big picture
  • Time orientation
  • Durability data extracted once, no updates

18
Technical Comparison OLTP vs OLAP
  • OLTP high rate of updates, several thousand t/s
  • OLAP read only transactions, very complex, DWH
    is loaded at certain time intervals, e.g. after
    the end of the month, quarter
  • Compute intensive
  • Special systems with new access methods, e.g.
    multidimensional data organization and access
    methods
  • Special OLAP systems necessary to offload OLTP
    systems

19
ROLAP and MOLAP
  • Solution 1 ROLAP relational online analytical
    processing, built on top of relational DBS,
    additional middleware or client front end (star
    schema)
  • Solution 2 MOLAP multidimensional online
    analytical processing
  • new model
  • new data organizations
  • new algorithms
  • new query languages
  • new optimization techniques

20
Data Warehouse Structure
21
Rules for OLAP Systems
  • Multidimensional conceptual view
  • Transparency
  • Accessibility
  • Consistent reporting performance
  • Client/server architecture
  • Generic dimensionality

22
Rules for OLAP Systems
  • Dynamic sparse matrix handling
  • Multiuser support
  • Unrestricted, cross-dimensional operations
  • Intuitive data manipulation
  • Flexible reporting
  • Unlimited dimensions and aggregation levels
Write a Comment
User Comments (0)
About PowerShow.com