Data Warehouse and Data Mining - PowerPoint PPT Presentation

Title: Data Warehouse and Data Mining. Subject: Data Warehouse & Data Mining Application. Author: Zhujianqiu. Last modified by: hdj. Created: 4/11/2001 10:27:14 AM.


1
???????????
  • ?????????????

?????? 2001?6?7?
2
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

3
??????
  • ????
  • ??????????

4
????????
  • A data warehouse is a subject-oriented,
    integrated, non-volatile, and time-variant
    collection of data in support of management's
    decisions (Inmon, 1996).
  • A data warehouse is a set of methods,
    techniques, and tools that may be leveraged
    together to produce a vehicle that delivers data
    to end users on an integrated platform
    (Ladley, 1997).
  • A data warehouse is a process of creating,
    maintaining, and using a decision-support
    infrastructure (Appleton, 1995; Haley, 1997;
    Gardner, 1998).
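
Two of the properties in Inmon's definition, non-volatile and time-variant, can be illustrated with a minimal sketch. It assumes a hypothetical customer record; the class and field names are made up for illustration and do not come from the slides. Rows are only appended, each stamped with a date, so past states stay queryable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerSnapshot:
    customer_id: int   # one key shared by all source systems (integrated)
    city: str
    valid_from: date   # every row carries a date (time-variant)


class CustomerHistory:
    """Append-only store: rows are added, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot):
        self._rows.append(snapshot)

    def as_of(self, customer_id, when):
        """Return the latest snapshot valid on or before `when`, or None."""
        candidates = [r for r in self._rows
                      if r.customer_id == customer_id and r.valid_from <= when]
        return max(candidates, key=lambda r: r.valid_from, default=None)


history = CustomerHistory()
history.load(CustomerSnapshot(1, "Shanghai", date(2000, 1, 1)))
history.load(CustomerSnapshot(1, "Beijing", date(2001, 3, 1)))

# The older state is still available after the newer one is loaded.
print(history.as_of(1, date(2000, 6, 1)).city)
print(history.as_of(1, date(2001, 6, 7)).city)
```

An operational system would typically overwrite the city in place; keeping both rows is what lets a warehouse answer questions about any past date.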

5
?????????? Inmon,1996
  • ????
  • ??????????????????(?????,???????????????)
  • ???????????????
  • ????????????????
  • ???????????????(??????Customer ID)
  • ??????????(??????????????)
  • ???????????????(???,???,???)
  • ??
  • ?????????????
  • ???
  • ?????,?????????????
  • ??????(???)
  • ??????

6
????Data Mart, ODS
  • Data Mart
  • ???? -- ???,??????????????
  • Operational Data Store
  • ?????? ODS??????????????????,????DB?????????,
    ?DW ????????????????????????(Subject-Oriented)????????? ?????????

7
????ETL, ???,??,??
  • ETL
  • ETL (Extract/Transformation/Load)?????????????Microsoft
    DTS, IBM Visual Warehouse, etc.
  • ???
  • ???????,??????????????????,???????????
  • ??
  • ????????????????????????????????,?????
  • ??
  • ??????????????,?????????
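
The three ETL stages named on this slide (Extract/Transformation/Load) can be sketched with the Python standard library alone. This is an illustrative sketch, not how DTS or Visual Warehouse work: the CSV layout, the cleaning rules, and the `sales` table are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with typical quality problems:
# stray whitespace, a missing amount, inconsistent currency casing.
SOURCE_CSV = """customer_id,amount,currency
 1 ,100.0,USD
2,,USD
3,250.5,usd
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and normalize the rest."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows with a missing measure
        clean.append((int(row["customer_id"].strip()),
                      float(amount),
                      row["currency"].strip().upper()))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(customer_id INTEGER, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer_id, amount, currency FROM sales ORDER BY customer_id"
).fetchall()
print(loaded)
```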

8
??????????
  • ?????OLAP
  • ??????
  • ????
  • ?????????????
  • ?????????
  • ????????

9
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

10
???????????
  • ????
  • ETL??
  • ????(Repository)??????
  • ?????????

11
???? Pieter ,1998
[Figure: architecture diagram. Layers: Source Databases, Central Data Warehouse, Architected Data Marts, Data Access and Analysis. Components: Relational, Appl. Package, Legacy, and External sources; Data Cleansing Tool; Central Data Warehouse (RDBMS); Metadata Exchange; Data Marts (RDBMS, MDB) with Local Metadata; Mid-Tier; End-User DW Tools.]
12
?ODS?????
[Figure: architecture diagram with an ODS. Layers: Source Databases, Central Data Warehouse and ODS, Architected Data Marts, Data Access and Analysis. Components: ODS with OLTP Tools; Central Data Warehouse (RDBMS); Metadata Exchange; Data Marts (RDBMS) with Local Metadata; Mid-Tier; End-User DW Tools.]
13
???????Douglas Hackney ,2001
[Figure: Sources: Oracle Financials, i2 Supply Chain, Siebel CRM, 3rd Party, e-Commerce. Targets: Packaged Oracle Financial Data Warehouse, Custom Marketing Data Warehouse, Packaged I2 Supply Chain Non-Architected Data Mart, Subset Data Marts.]
14
???????/????????
[Figure: Sources: Oracle Financials, i2 Supply Chain, Siebel CRM, 3rd Party, e-Commerce. Shared components: Common Staging Area, Real Time ODS. Targets: Federated Financial Data Warehouse, Federated Marketing Data Warehouse, Federated Packaged I2 Supply Chain Data Marts, Subset Data Marts, Analytical Applications. Real-time functions: Data Mining and Analytics; Segmentation, Classification, Qualification, Offerings, etc.]
15
??????BI????
[Figure: BI architecture. Sources: front- and back-office OLTP, e-Business systems, external information providers. Acquisition: ETL tools, DW templates, data profiling and reengineering tools, demand-driven data acquisition and analysis. Core: metadata interchange; federated data warehouse and data mart systems; decision engine models, rules, and metrics. Analysis: OLAP and data mining tools, analysis templates, analytic application development tools and components, analytic applications (CRM, Supply Chain, EPM, Financial, and HR Analytics and Reporting). Delivery: EKP (Enterprise Knowledge Management Portal), business information and recommendations, informed decisions and actions.]
16
?????????-???????????
  • ????????????????
  • ??????????????????

[Figure: Sources: Relational, Package, Legacy, External source. Components: Data Clean Tool, Data Staging, Enterprise Data Warehouse (RDBMS), RDBMS ROLAP.]
17
ETL??
  • ????????????????
  • ????????????
  • ???????????
  • ??????????
  • ??????????

18
ETL??????
19
??????????
  • ?????????????????????????-Alex Berson etc,
    1999
  • ?????
  • ??????????????????????????,??????????????????
  • ?????
  • ????(????????????????,?????????)
  • ????????????????
  • ????????????
  • ??????
  • ????,????,????,??????,??????,????,??

20
??????????
  •   ?????
  • ??????????,??
  • ??????????,?????????????????
  • Internet??
  • ???????????,??????????????????????????????????????
    ,?
  • ?????????
  • ??,????(??,??),???,???????,????

21
??????????
  • ????(metadata repository)??? (Martin
    Staudt, 2000)

22
?????????
  • ??
  • OLAP
  • ????

23
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

24
??????
  • ????(Top-Down)
  • ????(Bottom Up)
  • ?????
  • ??????

25
Top-down Approach
  • Build Enterprise data warehouse
  • Common central data model
  • Data re-engineering performed once
  • Minimize redundancy and inconsistency
  • Detailed and historical data; global data discovery
  • Build data marts from the Enterprise Data
    Warehouse (EDW)
  • Subset of EDW relevant to department
  • Mostly summarized data
  • Direct dependency on EDW data availability

[Figure: top-down approach. Labels: External Data, Operational Data, Local Data Mart (two).]
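
The top-down idea above, where a data mart is a mostly summarized, department-relevant subset of the EDW, can be sketched in a few lines. The table names, columns, and figures below are made up for illustration; SQLite stands in for whatever RDBMS hosts the warehouse.

```python
import sqlite3

edw = sqlite3.connect(":memory:")

# Hypothetical detailed EDW table covering all departments.
edw.execute(
    "CREATE TABLE edw_sales (department TEXT, sale_month TEXT, amount REAL)"
)
edw.executemany("INSERT INTO edw_sales VALUES (?, ?, ?)", [
    ("marketing", "2001-05", 120.0),
    ("marketing", "2001-05", 80.0),
    ("marketing", "2001-06", 50.0),
    ("finance", "2001-05", 999.0),
])

# The data mart keeps only the marketing rows and stores monthly
# totals instead of detail rows.
edw.execute("""
    CREATE TABLE marketing_mart AS
    SELECT sale_month, SUM(amount) AS total_amount
    FROM edw_sales
    WHERE department = 'marketing'
    GROUP BY sale_month
""")

mart_rows = edw.execute(
    "SELECT sale_month, total_amount FROM marketing_mart ORDER BY sale_month"
).fetchall()
print(mart_rows)
```

Because the mart is derived from `edw_sales`, it can only be refreshed when the EDW data is available, which is the direct dependency the slide points out.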
26
????????
  • ?????????
  • ???????????
  • ??? ROI -- ???????????
  • ????? -- ????????
  • ????????????????
  • ?????????
  • ????????????
  • ?????????????
  • ?????????
  • ?????????
  • ??EDB?????????

????? (??)
??????
??????
?????? EDB
27
?????? ????
  • Example of Star Schema

28
?????? ????
  • Example of Snowflake Schema

[Figure: snowflake schema. A Sales Fact Table keyed by Date, Product, Store, and Customer, with measurements unit_sales, dollar_sales, and Yen_sales; the date dimension is split into Date, Month, and Year tables.]
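
As a rough sketch of the star layout, the code below builds a fact table surrounded by denormalized dimension tables in SQLite and runs one typical query. The fact-table measures `unit_sales` and `dollar_sales` come from the slide; the dimension columns, key names, and sample rows are assumptions, and the Customer dimension is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one denormalized table per dimension (star schema).
CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: foreign keys to each dimension plus the measurements.
CREATE TABLE sales_fact (
    date_id      INTEGER REFERENCES date_dim(date_id),
    product_id   INTEGER REFERENCES product_dim(product_id),
    store_id     INTEGER REFERENCES store_dim(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);

INSERT INTO date_dim VALUES (1, 5, 2001), (2, 6, 2001);
INSERT INTO product_dim VALUES (10, 'tea'), (11, 'rice');
INSERT INTO store_dim VALUES (100, 'Shanghai');
INSERT INTO sales_fact VALUES
    (1, 10, 100, 3, 30.0),
    (1, 11, 100, 1, 5.0),
    (2, 10, 100, 2, 20.0);
""")

# A typical star-schema query: join the fact table to one dimension
# and aggregate a measure along it.
monthly_sales = conn.execute("""
    SELECT d.year, d.month, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN date_dim AS d ON d.date_id = f.date_id
    GROUP BY d.year, d.month
    ORDER BY d.year, d.month
""").fetchall()
print(monthly_sales)
```

A snowflake variant would split `date_dim` further, for example into separate month and year tables, trading an extra join for less redundancy.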
29
???(OLTP)??? --- ???
30
????
???
???
31
????
32
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

33
?????? Inmon,1996
  • ??????
  • ???????????
  • ????????
  • ?????(??)
  • ?????????????DASD????????
  • ????
  • ?????????
  • ????????????
  • ???????????
  • ?????????
  • ?????????????
  • ???,??????????
  • ???/?????????????(?/?)
  • ???????/??
  • ?????

34
?????? Inmon,1996
  • ????????
  • ??????????
  • ??????????
  • ?????????
  • ?SQL??
  • ???????
  • ???????
  • ?????????????
  • ????
  • I/O???CPU?????,?????????????
  • ????(?????????)
  • ????
  • ????(?????????????)
  • ??????(????????????)
  • ????

35
?????? Inmon,1996
  • ??????,?????????
  • ????????????/???????????????
  • ??DBMS?????DBMS??
  • ??????????????
  • ??????10GB/100GB/TB
  • ??DBMS???????,????Lock???Commit????CheckPoint????
    ?Log?????DeadLock??? Rollback.
  • ??????,????,??DBMS??????
  • ??????DBMS??????,????DBMS????
  • ??DBMS?????????????,???????DSS????
  • ??DBMS??
  • ??DBMS?????
  • ??DBMS????????????,?????????
  • ??DBMS(OLAP)?????,??????????????
  • ??????(DASD/??)

36
?????? Inmon,1996
  • ???????????
  • DSS?????IT??????,????????
  • ??????????????????????
  • ?????????????,????????????/??
  • ??????(????)
  • ???????(????/??/????/??)
  • ???????(????/????/??/??/????)
  • ???????(??????????????/????/????/????)
  • ??????
  • ????(???)
  • ??????(CDC)(??)

37
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

38
?????? Inmon, 1999
  • ??
  • ??
  • ??
  • ????

??? ??,??????, ???????,2000?5?
39
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

40
?????? DW??????
DW????? ?100-500????? ???????? ?????
DW????? ???? Meta Group Survey ????3000 ???????
41
DW???????
DW????? ???? Meta Group Survey ????3000 ???????
42
How Much?
  • $3-6m for a mid-size company, less if smaller, more
    if larger
  • $10m for large organizations, large data sets
  • 10-50% annual maintenance costs
  • 33% Hardware / 33% Software / 33% Services

43
How Long?
  • 2-4 years for 80/20 of full system for mid-size
    company
  • 6-12 months for initial iteration
  • 3-6 months for subsequent iterations

44
How Risky?
  • For EDW projects, 20% (Meta) to 70% (OTR, DWN)
    fail
  • High failure rate for non-business-driven
    initiatives
  • Very few systems meet the expectations of the
    business
  • Failure is due to soft issues, not to
    technology
  • Massive upside to successful projects (100% -
    2000% ROI)
  • 99% politics, 1% technology

45
????
  • Inmon, W.H., Building the Data Warehouse, John
    Wiley and Sons, 1996.
  • Ladley, John, Operational Data Stores: Building an
    Effective Strategy, in Data Warehouse: Practical
    Advice from the Experts, Prentice Hall, Englewood
    Cliffs, NJ, 1997.
  • Gardner, Stephen R., Building the Data
    Warehouse, Communications of the ACM, September 1998,
    Volume 41, Number 9, 52-60.
  • Douglas Hackney, http://www.egltd.com, DW101: A
    Practical Overview, 2001.
  • Pieter R. Mimno, The Big Picture - How Brio
    Competes in the Data Warehousing Market,
    Presentation to Brio Technology, August 4, 1998.
  • Alex Berson, Stephen Smith, Kurt Thearling,
    Building Data Mining Applications for CRM,
    McGraw-Hill, 1999.
  • Martin Staudt, Anca Vaduva, Thomas Vetterli, The
    Role of Metadata for Data Warehousing, 2000.
  • W.H. Inmon, Ken Rudin, Christopher K. Buss, Ryan
    Sousa, Data Warehouse Performance, John Wiley &
    Sons, 1999.

46
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

47
????????
  • ????????
  • ?????????
  • ????????

48
????????
  • ????
  • Data Mining Upsides
  • Data Mining Downsides
  • Data Mining Uses
  • Data Mining Industry and Applications
  • Data Mining Costs

49
????
Clustering 22%, Direct Marketing 14%, Cross-Sell Models 12%
(www.kdnuggets.com, 2001/6/11 News)
50
Data Mining Upsides
  • Discovery of previously unknown relationships,
    trends, anomalies, etc.
  • ????????????????
  • Powerful competitive weapon
  • ????? ??
  • Automation of repetitive analysis
  • ?????????
  • Predictive capabilities

51
Data Mining Downsides
  • Knowledge discovery technology immature
  • Long learning and tuning cycles for some
    technologies
  • Black box technology minimizes confidence
  • VLDB (Very Large Data Base) requirements

52
Data Mining Uses
  • Discover anomalies, outliers and exceptions in
    process data
  • Discover behavior and predict outcomes of
    customer relationships
  • Churn management
  • Target marketing (market of one)
  • Promotion management
  • Fraud detection
  • Pattern ID matching (dark programs, science)
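
The first use in this list, discovering anomalies and outliers in process data, can be illustrated with one of the simplest possible techniques: flagging values that lie far from the mean in units of standard deviation (a z-score test). This is a minimal sketch with made-up readings and an assumed threshold of 2.0; commercial mining tools generally use more robust methods.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical process measurements with one obviously odd reading.
process_readings = [10, 11, 9, 10, 12, 10, 50]
print(find_outliers(process_readings))
```

On small samples a single extreme value inflates the standard deviation, so the threshold needs to be chosen with the sample size in mind.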

53
Data Mining Industry and Applications
  • From research prototypes to data mining products,
    languages, and standards
  • IBM Intelligent Miner, SAS Enterprise Miner, SGI
    MineSet, Clementine, MS/SQLServer 2000, DBMiner,
    BlueMartini, MineIt, DigiMine, etc.
  • A few data mining languages and standards (esp.
    MS OLEDB for Data Mining).
  • Application achievements in many domains
  • Market analysis, trend analysis, fraud detection,
    outlier analysis, Web mining, etc.

54
Data Mining Costs
  • Desktop tools: $500 and up (MSFT coming at a low
    price point)
  • Server / MF based: $20,000 to $700,000
  • Must also add the cost of extensive consulting for
    high-end tools
  • Don't forget long training and learning curve
    time
  • Ongoing process, not task automation software

55
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

56
??????
  • ????
  • ?????
  • ???????????
  • ?????????
  • ???????
  • ??????????

57
????
  • 1989 IJCAI Workshop on Knowledge Discovery in
    Databases
  • Knowledge Discovery in Databases (G.
    Piatetsky-Shapiro and W. Frawley, 1991)
  • 1991-1994 Workshops on Knowledge Discovery in
    Databases
  • Advances in Knowledge Discovery and Data Mining
    (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and
    R. Uthurusamy, 1996)
  • 1995-1998 International Conferences on Knowledge
    Discovery in Databases and Data Mining
    (KDD95-98)
  • Journal of Data Mining and Knowledge Discovery
    (1997)
  • 1998 ACM SIGKDD, SIGKDD 1999-2001 conferences,
    and SIGKDD Explorations
  • More conferences on data mining
  • PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM,
    DaWaK, SPIE-DM, etc.

58
Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning (AI), Visualization, Information Science, and Other Disciplines.]
59
A Multi-Dimensional View of Data Mining
  • Databases to be mined
  • Relational, transactional, object-relational,
    active, spatial, time-series, text, multi-media,
    heterogeneous, legacy, WWW, etc.
  • Knowledge to be mined
  • Characterization, discrimination, association,
    classification, clustering, trend, deviation and
    outlier analysis, etc.
  • Techniques utilized
  • Database-oriented, data warehouse (OLAP), machine
    learning, statistics, visualization, neural
    network, etc.
  • Applications adapted
  • Retail, telecommunication, banking, fraud
    analysis, DNA mining, stock market analysis, Web
    mining, Weblog analysis, etc.
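
Of the kinds of knowledge listed above, association is easy to make concrete. The sketch below computes the two standard measures for an association rule, support and confidence, over a handful of made-up retail baskets; the function names and data are illustrative only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is usually reported only when both measures clear user-chosen minimum thresholds.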

60
Research Progress in the Last Decade
  • Multi-dimensional data analysis: data warehouse
    and OLAP (on-line analytical processing)
  • Association, correlation, and causality analysis
  • Classification: scalability and new approaches
  • Clustering and outlier analysis
  • Sequential patterns and time-series analysis
  • Similarity analysis: curves, trends, images,
    texts, etc.
  • Text mining, Web mining and Weblog analysis
  • Spatial, multimedia, scientific data analysis
  • Data preprocessing and database compression
  • Data visualization and visual data mining
  • Many others, e.g., collaborative filtering

61
Research Directions (Han J. W., 2001)
  • Web mining
  • Towards integrated data mining environments and
    tools
  • Vertical (or application-specific) data mining
  • Invisible data mining
  • Towards intelligent, efficient, and scalable data
    mining methods

62
Towards Integrated Data Mining Environments and
Tools
  • OLAP Mining: Integration of Data Warehousing and
    Data Mining
  • Querying and Mining: An Integrated Information
    Analysis Environment
  • Basic Mining Operations and Mining Query
    Optimization
  • Vertical (or application-specific) data mining
  • Invisible data mining

63
Querying and Mining: An Integrated Information
Analysis Environment
  • Data mining as a component of DBMS, data
    warehouse, or Web information system
  • Integrated information processing environment
  • MS/SQLServer-2000 (Analysis service)
  • IBM IntelligentMiner on DB2
  • SAS EnterpriseMiner: data warehousing and mining
  • Query-based mining
  • Querying database/DW/Web knowledge
  • Efficiency and flexibility: preprocessing,
    on-line processing, optimization, integration,
    etc.

64
Vertical Data Mining
  • Generic data mining tools? Too simple to match
    domain-specific, sophisticated applications
  • Expert knowledge and business logic represent
    many years of work in their own fields!
  • Data mining, business logic, and domain experts
  • A multi-dimensional view of data miners
  • Complexity of data: Web, sequence, spatial,
    multimedia, ...
  • Complexity of domains: DNA, astronomy, market,
    telecom, ...
  • Domain-specific data mining tools
  • Provide concrete, killer solutions to specific
    problems
  • Feedback to build more powerful tools

65
Invisible Data Mining
  • Build mining functions into daily information
    services
  • Web search engines (link analysis, authoritative
    pages, user profiles), adaptive web sites, etc.
  • Improvement of query processing: history data
  • Making service smart and efficient
  • Benefits from/to data mining research
  • Data mining research has produced many scalable,
    efficient, novel mining solutions
  • Applications feed new challenge problems to
    research

66
Towards Intelligent Tools for Data Mining
  • Integration paves the way to intelligent mining
  • Smart interface brings intelligence
  • Easy to use, understand and manipulate
  • One picture may be worth 1,000 words
  • Visual and audio data mining
  • Human-Centered Data Mining
  • Towards self-tuning, self-managing,
    self-triggering data mining

67
Integrated Mining: A Booster for Intelligent
Mining
  • Integration paves the way to intelligent mining
  • Data mining integrates with DBMS, DW, WebDB, etc.
  • Integration inherits the power of up-to-date
    information technology: querying, MD analysis,
    similarity search, etc.
  • Mining can be viewed as querying database
    knowledge
  • Integration leads to standard interface/language,
    function/process standardization, utility, and
    reachability
  • Efficiency and scalability bring intelligent
    mining to reality

68
??????????
  • CRISP-DM
  • ?????(CRoss-Industry Standard Process for Data
    Mining)
  • XML
  • ?????????
  • SOAP (Simple Object Access Protocol)
  • ????????????
  • PMML
  • ????????
  • OLE DB for Data Mining
  • ????????API???

69
??
  • ??????
  • ???????????
  • ??????
  • ??????(?????????)
  • ??????
  • ??????
  • ????????
  • ?????????
  • ????????(??????)

70
????????
  • ??????
  • ????(?????????)
  • ?????????
  • ??????
  • ?????????

71
??????(1)
  • ???,????????,??????????,??????????????????????????
    ?????????,????????????????????

72
??????(2)
  • ??????????????????
  • ???????????,????????????
  • ????????????????????,???????,???????
  • ??????????????????

73
????
  • ????
  • ???? ???? ????
  • ?????????
  • ????
  • ?????????
  • ?????????
  • ?????????

74
???????? ???? ????
????????
?????
??(DNA)??????????????????
?????
????????????????????????
???????
??????????????????????????
75
?????????
????????
?????????
????
?????
76
????
  • ????????????
  • ETL??
  • ?????????
  • ?????????????
  • ?????????

77
??????????????????
78
??????????????????
79
?????????
???
???
???
?????????
?????????
????
1???????? 2 ?????? 3 ?????? 4 ???????? 5 ???????
1 ??????? 2 ???????? 3 ???????? 4 ??????
1 ??????? 2 ??????? 3 ?????? 4 ???????? 5 ??????
80
??????
  • ?????????????
  • ?????????
  • ???????????(ERP,CRM)
  • ETL(????????)?????
  • ????????????????
  • ?????????

81
Any Questions?
Zhujianqiu_at_hotmail.com