Data Warehouses - PowerPoint PPT Presentation

1 / 52
About This Presentation
Title:

Data Warehouses

Description:

... slow or unpredictable, providing inadequate support to online analytical users. ... For example finance organizations keeps access rights restricted. ... – PowerPoint PPT presentation

Number of Views:61
Avg rating:3.0/5.0
Slides: 53
Provided by: liorr
Category:
Tags: data | warehouses

less

Transcript and Presenter's Notes

Title: Data Warehouses


1
?????? ?????? ?????
????? ??????Data Warehousing
???? ????? ????
2
??????
  • ?????? ???
  • ????? ?????? ????? ??????
  •  
  • ???? ?????
  • ????? ???? ??? ??????, ????? ?????? ?? ?????
    ?????? ?????? ??? ????? ????????? ????? ?????
    ??????? ??????? ???? ??????. ???? ??? ?? ????? ??
    ?????, ????? ?????? ?? ??? ?????? RDBMS, ?????
    ?????? ????? ETL ?????? ?????? ?????? ????? ????
    ?????? ?"? ?????? ??????? ??? OLAP.
  •  
  • ????? ????? ?????? ????????? ?? ???? ?????????
    ??????? ????? ?? ??? ????? ????? ?????? ???????
    ??? ????? ????? ??????? ???? ?????? Analysis
    Services ?-Data Transformation Services ?? SQL
    SERVER.

3
???? ?????
4
Impact of Technological Revolution
CPU, Disk, Memory power
Desktop Power and ease
Server Power and ease
Hardware prices
Software prices
(based upon Gupta, An Introduction to Data
Warehousing, 1997)
5
?????? ????? ?????? ?????? ?????? ?? ??????
  • ????? ???? ???? ??????-???? ?????? ????????
    ?????? ????? (??? ???? ?????, ????? ???????)
  • ?????? ??? ?????? ???? ???? ????.
  • ?? ???? ????? ?? ??????? ?????

6
We are drowning in information, but starving
for knowledge
Now that we have gathered so much data, what do
we do with it?
U. Fayyad
J. Naisbett
7
The Big Picture
Data Out
Data In
8
Operational Business Applications
  • Support daily process in the organization.
  • Used by operational stuff.
  • Transaction based - OLTP On Line Transaction
    Processing

Examples ERP Enterprise Resource Planning CRM
Customer Relationship Management Billing Vertica
l Applications Banking, Insurance, Communication
etc.
9
Operational database Properties
  • Support large numbers of concurrent users who are
    actively adding and modifying data.
  • Represent the constantly changing state of an
    organization but don't save its history.
  • Contain large amounts of data, including
    extensive data used to verify transactions.
  • Have complex structures.
  • Are tuned to be responsive to transaction
    activity.
  • Provide the technology infrastructure to support
    the day-to-day operations of an organization.

10
Decision Support Applications/Systems (DSA/DSS)
  • Support the decisions in the organization
    Information technology to help the make faster
    better decisions
  • Used by knowledge worker (executive, manager,
    analyst, etc.) to analyze large volumes of data?
  • Report Oriented
  • What were the sales volumes by region and
    product category for the last year?
  • Which orders should we fill to maximize
    revenues?
  • Dynamic Not a routine

11
Why DSS cant use Operational databases?
  • Different goals -gt different data
  • missing data Decision support requires
    historical data which operational DBs do not
    typically maintain
  • data consolidation DS requires consolidation of
    data from heterogeneous sources.
  • data quality different sources typically use
    inconsistent data representations, codes and
    formats which have to be reconciled.
  • Data is not understood and users disagree on data
    definitions
  • Reports are inconsistent and Users don't trust
    the reports.
  • Operational Systems performance Analytical
    queries that summarize large volumes of data
    adversely affect the ability of the system to
    respond to online transactions.

12
Why DSS cant use Operational databases?
(continued)
  • DSS performance as DB tuned for OLTP, responding
    to complex analysis queries can be slow or
    unpredictable, providing inadequate support to
    online analytical users.
  • Analysts do not have the technical expertise
    required to create ad hoc queries against the
    complex data structure. As we said analysis is
    dynamic, meaning IT has to work hard.
  • Security becomes more complicated when online
    analysis is combined with online transaction
    processing.
  • For example finance organizations keeps access
    rights restricted. To let analyst explorer the
    entire database may jeopardize security.

13
Data Warehouse - Definition
  • A subject-oriented, integrated, time-variant, and
    read-only database in support of managements
    decision making

14
Data Warehouse is a Database with unique
properties
15
Data WarehouseSubject-Oriented
  • Focusing on the modeling and analysis of data for
    decision makers, not on daily operations or
    transaction processing. Simply put, this means
    that the data warehouse is focused on a business
    concept (for example, sales) rather than a
    business process (for example, issuing invoices).
  • Provide a simple and concise view around
    particular subject issues by excluding data that
    are not useful in the decision support process.

16
Data WarehouseIntegrated
  • Constructed by integrating multiple,
    heterogeneous data sources
  • relational databases, flat files, external data,
    etc
  • Data cleaning and data integration techniques are
    applied.
  • Ensure consistency in naming conventions,
    encoding structures, attribute measures, etc.
    among different data sources
  • E.g. currency
  • When data is moved to the warehouse, it is
    converted.

17
Data WarehouseTime Variant
  • The time horizon for the data warehouse is
    significantly longer than that of operational
    systems.
  • Operational database current value data.
  • Data warehouse data provide information from a
    historical perspective (e.g., past 5-10 years)
  • Every key structure in the data warehouse
  • Contains an element of time, explicitly or
    implicitly
  • But the key of operational data may or may not
    contain time element.

18
Data WarehouseRead Only
  • Operational update of data does not occur in the
    data warehouse environment.
  • Does not require transaction processing,
    recovery, and concurrency control mechanisms
  • Requires only two operations in data accessing
  • initial loading of data and access of data.

19
Data Warehouse Other Definitions
  • A read-only analytical database that is used as
    the foundation of a decision support process -
    Poe and Reeves, 1995
  • Managed data situated after and outside the
    operational systems - Gupta, 1997

20
Data Warehouse Goals
  • Foundation for DS Information Access that is
    Understandable, Navigable, Fast.
  • Combining data from heterogeneous data sources
    into a single homogenous structure.
  • Organize data in simplified structures for
    efficiency of analytical queries rather than for
    transaction processing.
  • Contain transformed data that is valid,
    consistent, and formatted for analysis.
  • Provide stable data that represents business
    history.
  • Security Requirements.
  • Operational Database efficiency.

21
OLTP vs. DW
22
(No Transcript)
23
Data Warehouse Concept
OLAP
ERP
ETL
Security
Historic Data
CRM
Data Mining
Operational Databases
Data Warehouse
External
24
????? ????? ?? ???? ??????
  • This is an ongoing process, not a one-time
    solution.

25
Design
  • Determining which subject areas should be
    included and developing a set of agreed-upon
    definitions.
  • This requires interviews with end users, business
    analysts, and executives to understand and
    document the scope of the information
    requirements.
  • Only after a thorough understanding of the
    business issues can we translate requirements
    into a data warehouse logical design.

26
Populate
  • Following the design, systems are put in place to
    populate the data warehouse from operational
    systems on an ongoing basis.
  • Because the operational systems and the data
    warehouse have different representations of the
    data, populating the data warehouse requires
    transformations of the data summarization,
    translation, decoding, elimination of invalid
    data, and so on.
  • These processes need to be automated so that they
    can be performed on an ongoing basis extracting,
    transforming, and moving the source data as often
    as needed to meet the business requirements of
    the data warehouse.

27
Analyze
  • Finally, information is made available to the
    business analysts and executives for browsing,
    analysis, and reporting.
  • Many tools can be used to assist in the analysis
    stage, from simple report writers to advanced
    data mining tools.
  • Ultimately, however, the analysis effort drives
    the final iterations of the data warehousing
    process revisions in the design of the data
    warehouse in order to accommodate new
    information, improve system performance, or allow
    new types of analysis. With these changes the
    process begins again, and continues through the
    life of the data warehouse.

28
DW 1980s
29
DW 1980s
30
DW Early 1990s
31
DW Late 1990s
32
DW Late 1990s
33
ERP-Based Data WarehouseAn Example
  • Order processing
  • 2 second response time
  • Last 6 months orders

Daily closed orders
  • Data Warehouse
  • Last 5 years data
  • Response time 2 seconds to 60 minutes
  • Data is not modified

  • Product Inventory
  • 10 second response time
  • Last 20 inventory transactions

Weekly product inventory
Weekly programs
  • Marketing
  • 30 second response time
  • Last 2 years programs

(based upon Gupta, An Introduction to Data
Warehousing, 1997)
34
DW Projects - Examples
  • Banking industry. (Credit analysis)
  • Communications. (Call analysis)
  • Retail Industry. (Forecasting)
  • Healthcare industry. (Logistics)

35
Pitfalls (Or Why not?)
  • Design implementation of ETL will take more
    time than planned.
  • Need of data that is not captured
  • Maintenance.
  • inappropriate Implementation.
  • Synchronization of Data.
  • Dependency on other systems. (See next slide)
  • Immature.
  • Cost.
  • Vague Goals - Most data warehouse missions are
    stated in vague data management terms.
  • Project scope will increase
  • Knowledge workers will reject the system.
  • Trying to solve the wrong problem.

36
The dangers in dependency on other systems
  • The people responsible for setting up and
    maintaining the application have moved on.
  • The application was outsourced, and the vendor no
    longer exists, or has merged with another company
    with new priorities.
  • An emergency update was made under extreme time
    pressures, and proper documentation procedures
    were not followed
  • The documentation regarding changes was misfiled
    or lost
  • Proper data entry standards, policies and
    accountability were not in place, permitting
    users to creatively use fields for unintended
    purposes.

37
Top Five Challenges
  • 58 - Obtaining skilled experienced staff
  • 56 - Managing data quality and consistency
  • 42 - Managing project scope
  • 41 - Managing business meta data
  • 37 - Managing end-user expectations
  • Filter very challenging
  • TDWI Conference Surveys, based on 344 responses,

38
Is The Organization Ready?
  • Strong business sponsorship
  • Compelling business motivation
  • Adequate infrastructure and resources
  • Alignment between IT and business
  • Strong analytic culture

39
Other DW Stores
  • Data Mart
  • Staging Area

40
Data Mart
  • A data mart like data warehouse is
    subject-oriented, integrated, time-variant, and
    read-only.
  • However it is point solution to specific business
    need, i.e. Data marts are often used to provide
    information to functional segments of the
    organization.
  • From many aspects data mart is also considered as
    a miniature data warehouse

41
Data Mart
  • Typical examples are data marts for the sales
    department, the inventory and shipping
    department, the finance department, upper level
    management, and so on.
  • Data marts can also be used to segment data
    warehouse data to reflect a geographically
    compartmentalized business in which each region
    is relatively autonomous. For example, a large
    service organization may treat regional operating
    centers as individual business units, each with
    its own data mart that contributes to the master
    data warehouse.

42
Data Mart
  • Another way to differentiate a data warehouse
    from a data mart is to look at the data's
    consumers and format.
  • DW is not necessarily optimized to direct access.
    Canned reporting utilities consume warehouse
    data, whose storage is usually cryptic.
  • Data Mart is optimized to direct access The user
    community consumes data mart data, whose storage
    is usually in a more readable format. For
    example, to reduce the need for complex queries
    and assist business users who might be
    uncomfortable with the SQL language.

43
Types of Architectures
  • Top Down
  • Data warehouse centric
  • Bottom Up
  • Data mart centric
  • Hybrid
  • Start with bottom up, migrate to top down
  • Federated
  • Do the best you can

44
Top-down or Bottom-up?
  • In the top-down approach, a data warehouse is
    built in an iterative manner, business area by
    business area, and underlying dependent data
    marts are created as required from the EDW
    contents.
  • In the bottom-up approach, independent data marts
    are created with the view to integrating them
    into an enterprise data warehouse at some time in
    the future (Federated Data Warehouse)
  • There is much debate in the industry about the
    pros and cons of the two approaches.
  • Top-Down offers several powerful positive
    attributes and if the site has the high-level
    sustainable pain and political will required to
    be successful, then you should closely examine
    this option.
  • Nowadays there is a steady trend toward the use
    of independent data marts This is less than
    elegant, less than perfect, but a political and
    practical reality.

45
Top down vs. Bottom Up
46
Federated Data Warehouse
A federated DW architecture is an overall system
architecture that accommodates multiple DW/data
mart (DM) systems. As the Internet is a network
of networks, a federated DW architecture is an
architecture of architectures. It provides a
framework for the integration, to the greatest
extent possible, of disparate DW, DM and
analytical application systems The idea A
federated DW architecture shares as much core
information among the various systems as
possible.
47
Cost Justification and ROI
  • Do the benefits of a data warehouse justify its
    cost?
  • The white paper by W.H. Inmon contains an
    analysis of the cost-effectiveness of data
    warehousing.
  • Read the white paper

48
Data Warehouse Related Tools
  • Data Retrieval and analysis
  • Query and reports tools.
  • Data mining.
  • OLAP Clients.
  • Infrastructure
  • ETL tools.
  • Databases.
  • Design tools.
  • Administration.
  • OLAP Engines.

49
???? ?????
  • 40 ??????
  • 60 ???? ???? (??? ???? ????)

50
?????? - ???? ????
???? ??????? ?????, ????? ?????? ?? ???? ??????
??? (Data Mart)
  • ???? ????????
  • ????? ??????? ??????? ???? ????? ????? (????? !)
  • ?????? ?????? ???????? ?? ??????? ?????? ?????
  • ?????? ?????? ?? ????? ?????
  • ???? ??????? (?????? ???????)
  • ????? ?-7 ?????? ?????? ?????? ???????.
  • ??? 100 ?-1000 ?????? ??? ???? ?????? ??????
    ???????.
  • ????? ????? ??? ?? ????? ?????? ?-20 ??????.

51
?????? - ?????
  • ???? ?????? (????? ?????? ????? ???????)
  • ???? ?????? ????? ???????????.
  • ???? ???? ??????? ?????????.
  • ????? ?????? ?? ???? ???????.
  • ????? ????? ??????? ?? ???? ???????.
  • ????? ????? ????? ?? ???? ??????? ????? ?????.
  • ????? ????? ????? ?????? ???? ?????? (?-ACCESS
    ?? SQL-SERVER)
  • ????? ?-OLAP ??????? DAS.
  • ????? ????? ?- ETL.
  • ????? ?- ETL ??????? DTS.
  • ????? ?-OLAP ???? ????? (Data Analyzer)

52
?????? ???????
  • ????? ?? ???????.
  • ???? ???? 20 ???? ???? ???????? ????????. ?-5
    ???? ?????? ??????.
  • ???? ??"? ???? ????? ?????? ?? ?????.
  • ???"? ????? ????? ?? ????? WORD.
  • ???? ?????? ?????? ??? ?????? ?????? ????????.
  • ???? ??????? ??????.
  • ???? ??????? ????? ?? ???? ???????.
  • ???? ????? ?? ???? ?-OLAP.
  • ???? ?-DTS.
  • ????? ???? ?? ?????? ?????? ?? ??????? ????
    ??????? ?????.
Write a Comment
User Comments (0)
About PowerShow.com