Title: Data Warehouses
1?????? ?????? ?????
????? ??????Data Warehousing
???? ????? ????
2??????
- ?????? ???
- ????? ?????? ????? ??????
-
- ???? ?????
- ????? ???? ??? ??????, ????? ?????? ?? ?????
?????? ?????? ??? ????? ????????? ????? ?????
??????? ??????? ???? ??????. ???? ??? ?? ????? ??
?????, ????? ?????? ?? ??? ?????? RDBMS, ?????
?????? ????? ETL ?????? ?????? ?????? ????? ????
?????? ?"? ?????? ??????? ??? OLAP. -
- ????? ????? ?????? ????????? ?? ???? ?????????
??????? ????? ?? ??? ????? ????? ?????? ???????
??? ????? ????? ??????? ???? ?????? Analysis
Services ?-Data Transformation Services ?? SQL
SERVER.
3???? ?????
4Impact of Technological Revolution
CPU, Disk, Memory power
Desktop Power and ease
Server Power and ease
Hardware prices
Software prices
(based upon Gupta, An Introduction to Data
Warehousing, 1997)
5?????? ????? ?????? ?????? ?????? ?? ??????
- ????? ???? ???? ??????-???? ?????? ????????
?????? ????? (??? ???? ?????, ????? ???????) - ?????? ??? ?????? ???? ???? ????.
- ?? ???? ????? ?? ??????? ?????
6We are drowning in information, but starving
for knowledge
Now that we have gathered so much data, what do
we do with it?
U. Fayyad
J. Naisbett
7The Big Picture
Data Out
Data In
8Operational Business Applications
- Support daily process in the organization.
- Used by operational stuff.
- Transaction based - OLTP On Line Transaction
Processing
Examples ERP Enterprise Resource Planning CRM
Customer Relationship Management Billing Vertica
l Applications Banking, Insurance, Communication
etc.
9Operational database Properties
- Support large numbers of concurrent users who are
actively adding and modifying data. - Represent the constantly changing state of an
organization but don't save its history. - Contain large amounts of data, including
extensive data used to verify transactions. - Have complex structures.
- Are tuned to be responsive to transaction
activity. - Provide the technology infrastructure to support
the day-to-day operations of an organization.
10Decision Support Applications/Systems (DSA/DSS)
- Support the decisions in the organization
Information technology to help the make faster
better decisions - Used by knowledge worker (executive, manager,
analyst, etc.) to analyze large volumes of data? - Report Oriented
- What were the sales volumes by region and
product category for the last year? - Which orders should we fill to maximize
revenues? - Dynamic Not a routine
11Why DSS cant use Operational databases?
- Different goals -gt different data
- missing data Decision support requires
historical data which operational DBs do not
typically maintain - data consolidation DS requires consolidation of
data from heterogeneous sources. - data quality different sources typically use
inconsistent data representations, codes and
formats which have to be reconciled. - Data is not understood and users disagree on data
definitions - Reports are inconsistent and Users don't trust
the reports. - Operational Systems performance Analytical
queries that summarize large volumes of data
adversely affect the ability of the system to
respond to online transactions.
12Why DSS cant use Operational databases?
(continued)
- DSS performance as DB tuned for OLTP, responding
to complex analysis queries can be slow or
unpredictable, providing inadequate support to
online analytical users. - Analysts do not have the technical expertise
required to create ad hoc queries against the
complex data structure. As we said analysis is
dynamic, meaning IT has to work hard. - Security becomes more complicated when online
analysis is combined with online transaction
processing. - For example finance organizations keeps access
rights restricted. To let analyst explorer the
entire database may jeopardize security.
13Data Warehouse - Definition
- A subject-oriented, integrated, time-variant, and
read-only database in support of managements
decision making
14Data Warehouse is a Database with unique
properties
15Data WarehouseSubject-Oriented
- Focusing on the modeling and analysis of data for
decision makers, not on daily operations or
transaction processing. Simply put, this means
that the data warehouse is focused on a business
concept (for example, sales) rather than a
business process (for example, issuing invoices). - Provide a simple and concise view around
particular subject issues by excluding data that
are not useful in the decision support process.
16Data WarehouseIntegrated
- Constructed by integrating multiple,
heterogeneous data sources - relational databases, flat files, external data,
etc - Data cleaning and data integration techniques are
applied. - Ensure consistency in naming conventions,
encoding structures, attribute measures, etc.
among different data sources - E.g. currency
- When data is moved to the warehouse, it is
converted.
17Data WarehouseTime Variant
- The time horizon for the data warehouse is
significantly longer than that of operational
systems. - Operational database current value data.
- Data warehouse data provide information from a
historical perspective (e.g., past 5-10 years) - Every key structure in the data warehouse
- Contains an element of time, explicitly or
implicitly - But the key of operational data may or may not
contain time element.
18Data WarehouseRead Only
- Operational update of data does not occur in the
data warehouse environment. - Does not require transaction processing,
recovery, and concurrency control mechanisms - Requires only two operations in data accessing
- initial loading of data and access of data.
19Data Warehouse Other Definitions
- A read-only analytical database that is used as
the foundation of a decision support process -
Poe and Reeves, 1995 - Managed data situated after and outside the
operational systems - Gupta, 1997
20Data Warehouse Goals
- Foundation for DS Information Access that is
Understandable, Navigable, Fast. - Combining data from heterogeneous data sources
into a single homogenous structure. - Organize data in simplified structures for
efficiency of analytical queries rather than for
transaction processing. - Contain transformed data that is valid,
consistent, and formatted for analysis. - Provide stable data that represents business
history. - Security Requirements.
- Operational Database efficiency.
21OLTP vs. DW
22(No Transcript)
23Data Warehouse Concept
OLAP
ERP
ETL
Security
Historic Data
CRM
Data Mining
Operational Databases
Data Warehouse
External
24????? ????? ?? ???? ??????
- This is an ongoing process, not a one-time
solution.
25Design
- Determining which subject areas should be
included and developing a set of agreed-upon
definitions. - This requires interviews with end users, business
analysts, and executives to understand and
document the scope of the information
requirements. - Only after a thorough understanding of the
business issues can we translate requirements
into a data warehouse logical design.
26Populate
- Following the design, systems are put in place to
populate the data warehouse from operational
systems on an ongoing basis. - Because the operational systems and the data
warehouse have different representations of the
data, populating the data warehouse requires
transformations of the data summarization,
translation, decoding, elimination of invalid
data, and so on. - These processes need to be automated so that they
can be performed on an ongoing basis extracting,
transforming, and moving the source data as often
as needed to meet the business requirements of
the data warehouse.
27Analyze
- Finally, information is made available to the
business analysts and executives for browsing,
analysis, and reporting. - Many tools can be used to assist in the analysis
stage, from simple report writers to advanced
data mining tools. - Ultimately, however, the analysis effort drives
the final iterations of the data warehousing
process revisions in the design of the data
warehouse in order to accommodate new
information, improve system performance, or allow
new types of analysis. With these changes the
process begins again, and continues through the
life of the data warehouse.
28DW 1980s
29DW 1980s
30DW Early 1990s
31DW Late 1990s
32DW Late 1990s
33ERP-Based Data WarehouseAn Example
- Order processing
- 2 second response time
- Last 6 months orders
Daily closed orders
- Data Warehouse
- Last 5 years data
- Response time 2 seconds to 60 minutes
- Data is not modified
- Product Inventory
- 10 second response time
- Last 20 inventory transactions
Weekly product inventory
Weekly programs
- Marketing
- 30 second response time
- Last 2 years programs
(based upon Gupta, An Introduction to Data
Warehousing, 1997)
34DW Projects - Examples
- Banking industry. (Credit analysis)
- Communications. (Call analysis)
- Retail Industry. (Forecasting)
- Healthcare industry. (Logistics)
35Pitfalls (Or Why not?)
- Design implementation of ETL will take more
time than planned. - Need of data that is not captured
- Maintenance.
- inappropriate Implementation.
- Synchronization of Data.
- Dependency on other systems. (See next slide)
- Immature.
- Cost.
- Vague Goals - Most data warehouse missions are
stated in vague data management terms. - Project scope will increase
- Knowledge workers will reject the system.
- Trying to solve the wrong problem.
36The dangers in dependency on other systems
- The people responsible for setting up and
maintaining the application have moved on. - The application was outsourced, and the vendor no
longer exists, or has merged with another company
with new priorities. - An emergency update was made under extreme time
pressures, and proper documentation procedures
were not followed - The documentation regarding changes was misfiled
or lost - Proper data entry standards, policies and
accountability were not in place, permitting
users to creatively use fields for unintended
purposes.
37Top Five Challenges
- 58 - Obtaining skilled experienced staff
- 56 - Managing data quality and consistency
- 42 - Managing project scope
- 41 - Managing business meta data
- 37 - Managing end-user expectations
- Filter very challenging
- TDWI Conference Surveys, based on 344 responses,
38Is The Organization Ready?
- Strong business sponsorship
- Compelling business motivation
- Adequate infrastructure and resources
- Alignment between IT and business
- Strong analytic culture
39Other DW Stores
40Data Mart
- A data mart like data warehouse is
subject-oriented, integrated, time-variant, and
read-only. - However it is point solution to specific business
need, i.e. Data marts are often used to provide
information to functional segments of the
organization. - From many aspects data mart is also considered as
a miniature data warehouse
41Data Mart
- Typical examples are data marts for the sales
department, the inventory and shipping
department, the finance department, upper level
management, and so on. - Data marts can also be used to segment data
warehouse data to reflect a geographically
compartmentalized business in which each region
is relatively autonomous. For example, a large
service organization may treat regional operating
centers as individual business units, each with
its own data mart that contributes to the master
data warehouse.
42Data Mart
- Another way to differentiate a data warehouse
from a data mart is to look at the data's
consumers and format. - DW is not necessarily optimized to direct access.
Canned reporting utilities consume warehouse
data, whose storage is usually cryptic. - Data Mart is optimized to direct access The user
community consumes data mart data, whose storage
is usually in a more readable format. For
example, to reduce the need for complex queries
and assist business users who might be
uncomfortable with the SQL language.
43Types of Architectures
- Top Down
- Data warehouse centric
- Bottom Up
- Data mart centric
- Hybrid
- Start with bottom up, migrate to top down
- Federated
- Do the best you can
44Top-down or Bottom-up?
- In the top-down approach, a data warehouse is
built in an iterative manner, business area by
business area, and underlying dependent data
marts are created as required from the EDW
contents. - In the bottom-up approach, independent data marts
are created with the view to integrating them
into an enterprise data warehouse at some time in
the future (Federated Data Warehouse) - There is much debate in the industry about the
pros and cons of the two approaches. - Top-Down offers several powerful positive
attributes and if the site has the high-level
sustainable pain and political will required to
be successful, then you should closely examine
this option. - Nowadays there is a steady trend toward the use
of independent data marts This is less than
elegant, less than perfect, but a political and
practical reality.
45Top down vs. Bottom Up
46Federated Data Warehouse
A federated DW architecture is an overall system
architecture that accommodates multiple DW/data
mart (DM) systems. As the Internet is a network
of networks, a federated DW architecture is an
architecture of architectures. It provides a
framework for the integration, to the greatest
extent possible, of disparate DW, DM and
analytical application systems The idea A
federated DW architecture shares as much core
information among the various systems as
possible.
47Cost Justification and ROI
- Do the benefits of a data warehouse justify its
cost? - The white paper by W.H. Inmon contains an
analysis of the cost-effectiveness of data
warehousing. - Read the white paper
48Data Warehouse Related Tools
- Data Retrieval and analysis
- Query and reports tools.
- Data mining.
- OLAP Clients.
- Infrastructure
- ETL tools.
- Databases.
- Design tools.
- Administration.
- OLAP Engines.
49???? ?????
- 40 ??????
- 60 ???? ???? (??? ???? ????)
50?????? - ???? ????
???? ??????? ?????, ????? ?????? ?? ???? ??????
??? (Data Mart)
- ???? ????????
- ????? ??????? ??????? ???? ????? ????? (????? !)
- ?????? ?????? ???????? ?? ??????? ?????? ?????
- ?????? ?????? ?? ????? ?????
- ???? ??????? (?????? ???????)
- ????? ?-7 ?????? ?????? ?????? ???????.
- ??? 100 ?-1000 ?????? ??? ???? ?????? ??????
???????. - ????? ????? ??? ?? ????? ?????? ?-20 ??????.
51?????? - ?????
- ???? ?????? (????? ?????? ????? ???????)
- ???? ?????? ????? ???????????.
- ???? ???? ??????? ?????????.
- ????? ?????? ?? ???? ???????.
- ????? ????? ??????? ?? ???? ???????.
- ????? ????? ????? ?? ???? ??????? ????? ?????.
- ????? ????? ????? ?????? ???? ?????? (?-ACCESS
?? SQL-SERVER) - ????? ?-OLAP ??????? DAS.
- ????? ????? ?- ETL.
- ????? ?- ETL ??????? DTS.
- ????? ?-OLAP ???? ????? (Data Analyzer)
52?????? ???????
- ????? ?? ???????.
- ???? ???? 20 ???? ???? ???????? ????????. ?-5
???? ?????? ??????. - ???? ??"? ???? ????? ?????? ?? ?????.
- ???"? ????? ????? ?? ????? WORD.
- ???? ?????? ?????? ??? ?????? ?????? ????????.
- ???? ??????? ??????.
- ???? ??????? ????? ?? ???? ???????.
- ???? ????? ?? ???? ?-OLAP.
- ???? ?-DTS.
- ????? ???? ?? ?????? ?????? ?? ??????? ????
??????? ?????.