Data Warehousing - PowerPoint PPT Presentation

Loading...

PPT – Data Warehousing PowerPoint presentation | free to download - id: 41f70b-YzMwY



Loading


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation
Title:

Data Warehousing

Description:

Data Warehousing Tyler Helmle Introduction Definition History Processes and Terms Management Techniques Definition Collection of Data Subject Oriented Integrated Time ... – PowerPoint PPT presentation

Number of Views:98
Avg rating:3.0/5.0
Slides: 55
Provided by: helm4
Learn more at: http://www.uwplatt.edu
Category:
Tags: data | warehousing

less

Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Data Warehousing


1
Data Warehousing
  • Tyler Helmle

2
Introduction
  • Definition
  • History
  • Processes and Terms
  • Management Techniques

3
Definition
  • Collection of Data
  • Subject Oriented
  • Integrated
  • Time-Variant
  • Non-Volatile
  • To Support Managements Decision Making Process

4
Subject Oriented
  • Particular Subject
  • Instead of Companies Ongoing Operations

5
Integrated
  • Gathered from variety of Sources
  • Merged into a Whole

6
Time-Variant
  • Specific Time Period
  • Monthly
  • Weekly
  • Daily
  • Etc.

7
Non-Volatile
  • Stable
  • Never Remove Data
  • Consistent Picture of Business

8
Definition
  • Bill Inmon
  • 1990
  • Single Subject
  • Data Mart
  • Possible to be Volatile
  • Size Contraints
  • Rolling Lengths

9
Definition
  • Ralph Kimball
  • a copy of transaction data specifically
    structured for query and analysis.
  • Less insight
  • Less depth
  • Not Less Accurate

10
History
  • Concept Originated (Late 1980s)
  • IBM Researchers
  • Barry Delvin and Paul Murphy
  • Business Data Warehouse
  • Operational Systems to Decision Support

11
History
  • Trends
  • Corporations had multiple Decision Support
    Systems
  • Expensive
  • Redundant
  • Get to one efficient warehouse
  • Data Marts

12
History
  • Bill Inmon
  • Father of the Data Warehouse
  • Building of the Data Warehouse
  • 1991
  • Taught

13
Data Warehouse
14
Design
  • Designing and Rolling Out Warehouse
  • Surajit Chaudhuri and Umeshwar Dayal

15
Design
  • Define the architecture, do capacity planning,
    and select the storage servers, database and OLAP
    servers, and tools
  • Integrate the servers, storage, and client tools
  • Design the warehouse schema and views
  • Define the physical warehouse organization, data
    placement, partitioning, and access methods

16
Design (Cont.)
  • Connect the sources using gateways, ODBC drivers,
    or other wrappers
  • Design and implement scripts for data extraction,
    cleaning, transformation, load, and refresh
  • Populate the repository with the schema and view
    definitions, scripts, and other metadata

17
Design (Cont.)
  • Design and implement end-user applications
  • Roll out the warehouse and applications

18
Design (Cont.)
  • Database Designs
  • Multidimensional Approach
  • Normalized Approach

19
Data Warehouse
20
Raw Data
  • Business will Decide
  • Department?
  • Market Data?
  • Customer Data?
  • Data Granularity

21
Data Warehouse
22
Data Extracting
  • Extract from Source Systems
  • Database
  • Flat Files
  • Converts into format for transformation

23
Data Transforming
  • Data Validation
  • Difficult
  • Common Problems Occur

24
Data Transforming
  • Selection of Columns
  • Translation of Data
  • Make all data consistent
  • Derive New Calculated Value
  • Filtering
  • Sorting
  • Joining of Data

25
Data Transforming
  • Transposing
  • Splitting Columns

26
Data Transforming Tools
  • Data Migration
  • Transformation of Data
  • Data Scrubbing
  • Domain Specific Knowledge
  • Fixing or Eliminating Data
  • Data Auditing
  • Patterns
  • Inconsistencies

27
Data Load
  • Batch Processing
  • Visibility and Management
  • Administrator
  • Start, Cancel, Suspend, Resume Load
  • Large
  • Quick and Efficient

28
Data Load
  • Parallel Processing
  • Splitting Data Files to provide Parallel Access
  • Pipeline
  • Simultaneous running of several components
  • Component
  • Simultaneous running of multiple processes
  • All three types usually operate at same time

29
Data Refresh
  • When?
  • How?

30
Data Warehouse
31
Data Mart
  • Subset of Data
  • Major Data Subject
  • Increase Performance
  • Separate Security
  • Prove ROI before adding to Data Warehouse

32
Data Mart
  • Ease of Creation
  • Lower Cost than Data Warehouse

33
Data Mart
  • Cons
  • Limited Scalability
  • Duplication of Data
  • Data Inconsistency
  • Low ability to leverage enterprise sources of data

34
Data Warehouse
35
Metadata
  • Data about Data
  • Clarifies Data
  • Explains Data
  • Helps Navigate Through Data
  • Facilitates (Both Human and Computers)
  • Understanding
  • Usage
  • Management

36
Metadata
  • Different from Data
  • Sometimes can be both
  • Point of view
  • Types
  • Descriptive
  • Administrative
  • Structural
  • Technical
  • Use

37
Metadata
  • Content
  • Describe Resource
  • Describe Content
  • Mutability
  • Immutable
  • Mutable
  • Logical Function
  • Sub-symbolic
  • Symbolic
  • Logical

38
Metadata
  • Back End
  • Extract, Transform, Load
  • Front End
  • Label Screens
  • Create Reports

39
Data Warehouse
40
End Product
  • Analysis
  • Querying
  • Reporting
  • Data Mining

41
Data Mining
  • Requires Large Amount of Data
  • Determine Patterns in Data
  • Determine Relationships in Data
  • Analyze Results and Conclude

42
Standard Reporting
  • Weekly/Monthly/Yearly
  • Same Constant Data
  • Recent Information
  • Arrives at Expected Time on Expected Medium

43
Configurable Reports
  • Select Parameters to Show on Report
  • Ex.
  • Time Period
  • Region
  • Support of Data Remains the Same

44
Ad Hoc Reporting
  • End Users Pick and Choose Data
  • Requires
  • Training
  • Knowledge of Data
  • Do Not Want Users Miss Using the Data

45
Dashboards / Scorecards
  • Like Standard Reports
  • Summarized
  • Easy to Read
  • Graphical

46
Data Warehouse
47
Data Warehouse Management
  • Data Warehouse Administrator
  • Visibility
  • Knowledge of what is happening
  • Loading, Extracting, Etc.

48
Data Warehouse Management
  • Responsibilities
  • What is data being used for?
  • Who is using the data?
  • What is the response time?
  • What kinds of activities are being submitted?

49
Data Warehouse Management
  • Data Management (Data Monitor)
  • Counts of Data
  • Analyzes Profiles of Data
  • Determines Threshold of Data
  • Makes DWAs job easier

50
Data Warehouse Management
  • Security
  • Confidential?
  • Access Restriction
  • Encryption
  • Decryption

51
Pros
  • One Common Data Source
  • Ease of Reporting and Analysis
  • Inconsistencies Identified Right Away
  • Retrieve Data Without Slowing Down Operational
    Systems
  • Show Actual Performance

52
Cons
  • Data Is Not Always Most Current
  • High Costs
  • Installation
  • Maintenance
  • Support
  • Hard To Stay Ahead

53
Summary
  • Data Warehousing can prove to be a success
  • Common Knowledge of a database technology
  • As technology and research continues, the
    possibilities are endless

54
Questions?
About PowerShow.com