Data Warehouses - PowerPoint PPT Presentation

Slides: 31
Provided by: liorr
Transcript and Presenter's Notes

Title: Data Warehouses


2
Knowledge Discovery in Databases (KDD)

3
Data Mining

4
Knowledge Discovery in Databases (KDD): Basic
Steps - based upon Fayyad, Piatetsky-Shapiro, and
Smyth (1996)
  • Data selection
  • Cleaning and pre-processing of target data
    • Removing noisy and erroneous data
    • Handling missing data values
  • Dimensionality reduction and transformation
    • Selecting most important features (attributes)
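
The cleaning and pre-processing steps above can be sketched in a few lines of Python. This is only an illustration, not the presenter's method: the records, the field names (income, age), the validity range for age, and the choice of mean imputation are all assumptions made for the example.

```python
# Hypothetical raw records; None marks a missing value.
records = [
    {"income": 4200.0, "age": 35},
    {"income": None, "age": 41},      # missing income
    {"income": 3900.0, "age": -7},    # erroneous age
    {"income": 5100.0, "age": 29},
]

def clean(rows):
    """Drop rows with an impossible age, then fill missing incomes with the mean."""
    # Removing noisy and erroneous data (assumed valid age range: 0..120).
    valid = [r for r in rows if 0 <= r["age"] <= 120]
    # Handling missing values: replace None with the mean of the known incomes.
    known = [r["income"] for r in valid if r["income"] is not None]
    mean_income = sum(known) / len(known)
    return [
        {**r, "income": mean_income if r["income"] is None else r["income"]}
        for r in valid
    ]

cleaned = clean(records)
```

Mean imputation is just one simple option; dropping incomplete rows or using a model-based estimate are common alternatives.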

5
Knowledge Discovery in Databases
(continued): Basic Steps - based upon Fayyad,
Piatetsky-Shapiro, and Smyth (1996)
  • Data Mining
    • Selecting DM methods and tools
    • Selecting the methods' parameters
    • Building a model

6
Knowledge Discovery in Databases
(continued): Basic Steps - based upon Fayyad,
Piatetsky-Shapiro, and Smyth (1996)
  • Post-processing, interpretation, and evaluation
    of results
    • Visualization of results
    • Evaluation of discovered patterns by statistical
      significance, interestingness, importance,
      relevance, actionability, etc.
  • Returning to previous steps if necessary

7
Taxonomy of Data Mining Methods
  • Verification
    • Goodness of Fit
    • Hypothesis testing
    • Analysis of Variance
  • Discovery
    • Prediction
      • Regression
      • Classification (Bayesian Networks, Decision
        Trees, Support Vectors, Neural Networks, Others)
    • Description
      • Clustering
      • Association Rules
      • Linguistic Summary
      • Visualization

8
Data Mining vs. Statistics
9
Methods of Data Mining
  • Classification
    • Credit approval
    • Fraud detection
    • Churn detection
    • Medical diagnosis
    • Pattern recognition
  • Clustering
    • Customer analysis
    • Document clustering

10
Methods of Data Mining (cont'd)
  • Association Rules
    • Retail analysis
  • Regression
    • Forecasting

11
Look at all of the different industries
www.kdnuggets.com
12
Typical questions addressed by DM
  • Which customers are most likely to drop their
    cell phone service?
  • What is the probability that a customer will
    purchase at least 100 worth of merchandise from
    a particular mail-order catalog?
  • Which prospects are most likely to respond to a
    particular offer?

13
Example
  • X - bad situation
  • O - good situation
  • The graph represents historical data.
  • How to decide when to give a loan?

14
Rule of Thumb Example
  • X - bad situation
  • O - good situation
  • 3 good cases will not get the loan
  • 2 bad cases will get the loan.

15
Data Mining Methods
  • Classification - the learning of a function
    that maps a data item into one of several
    predefined classes.
  • 1 good case will not get the loan
  • 2 bad cases will get the loan.

16
Data Mining Methods
  • Regression - the learning of a function that
    maps a data item to a real-valued variable, for
    example the debt value as a function of income.
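
A straight-line fit is the simplest instance of regression as defined above. The sketch below uses ordinary least squares, which the slides do not name; the (income, debt) pairs are made up for illustration and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical (income, debt) pairs, invented for this example.
incomes = [1.0, 2.0, 3.0, 4.0]
debts = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(incomes, debts)
# Predict the debt for an unseen income value.
predicted_debt = intercept + slope * 5.0
```

The learned function maps any income to a real-valued debt estimate, which is what distinguishes regression from classification.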

17
Data Mining Methods
  • Clustering - identifying sets of items with
    common characteristics
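
One common way to find such sets is k-means, which is not named on the slide and is used here only as an example. The sketch below runs it on single numbers; the purchase counts and the starting centers are assumptions.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign to nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical yearly purchase counts for eight customers.
purchases = [1, 2, 3, 2, 20, 22, 19, 21]
centers, clusters = k_means_1d(purchases, centers=[0.0, 10.0])
```

With this data the customers split into an occasional-buyer group and a frequent-buyer group, without any predefined class labels.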

18
ID3 Algorithm: Example - Complete decision tree
outlook
  • sunny: humidity
    • high: N
    • normal: P
  • overcast: P
  • rain: windy
    • true: N
    • false: P
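
Classifying an instance with this tree means following one branch per attribute test until a leaf is reached. The sketch below assumes the flattened labels correspond to the standard play-tennis tree (sunny and high humidity gives N, sunny and normal humidity gives P, overcast gives P, rain and windy gives N, rain and not windy gives P); the tuple-and-dict encoding is a choice made for this example.

```python
# Inner nodes are (attribute, branches) pairs; leaves are the labels "P" or "N".
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "N", "normal": "P"}),
    "overcast": "P",
    "rain": ("windy", {"true": "N", "false": "P"}),
})

def classify(tree, instance):
    """Follow one branch per attribute test until a leaf label is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree

label = classify(TREE, {"outlook": "sunny", "humidity": "high", "windy": "false"})
```

Only the attributes on the chosen path are consulted: for a sunny day the tree never looks at windy.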
19
Decision Tree Structure: Main Components
  • Nodes - tests of some attribute
  • Branches - the possible values of the attribute
  • Leaves (leaf nodes) - classifications
  • Path (from the tree root to a leaf) - conjunction
    of attribute tests

20
Decision Tree Learning
Example: a simple decision tree - university
studies. Rule Extraction
  • If the test is over 700, then the final grade is
    above average
  • If the test is between 600 and 700, and the
    gender is male, then the final grade is below
    average
  • If the test is between 600 and 700, and the
    gender is female, then the final grade is above
    average
  • If the test is below 600 and the place of birth
    is Diaspora, then the final grade is average
  • If the test is below 600 and the place of birth
    is Israel, then the final grade is below average
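
The five rules above translate directly into nested conditions, one root-to-leaf path per rule. In this sketch the function name is hypothetical, the handling of scores of exactly 600 and 700 is an assumption (the slide does not say which band they belong to), and gender and birthplace are assumed to take only the values named in the rules.

```python
def predict_final_grade(test_score, gender, birthplace):
    """Apply the five extracted rules; each return is one leaf of the tree."""
    if test_score > 700:
        return "above average"
    if test_score >= 600:  # assumption: 600 and 700 fall in the middle band
        return "below average" if gender == "male" else "above average"
    # Below 600: the place of birth decides.
    return "average" if birthplace == "Diaspora" else "below average"

grade = predict_final_grade(650, "female", "Israel")
```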

21
Decision Tree Learning: Appropriate Problems
  • Instances are described by a fixed set of
    attributes
  • Each predicting attribute takes a small number of
    disjoint possible values
  • The target function has discrete output values
    (each value is a class / concept)
  • The training data may contain errors (noise)
  • The training data may contain missing attribute
    values

22
Bayesian Networks
23
Bayesian Approach
Classify the new instance by the most probable
target value
24
Naive Bayes Classifier
Assumption: all attribute values are
conditionally independent given the target value
25
Naive Bayes Classifier example (two target
values)
26
Naive Bayes Classifier example (continued)
  • New instance:
  • Outlook = sunny
  • Temperature = cool
  • Humidity = high
  • Wind = strong
  • Play Tennis = Yes / No?

27
Naive Bayes Classifier example (continued)
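  • P(Yes) = 9/14 = 0.64
  • P(No) = 5/14 = 0.36
  • P(Yes) P(Sunny | Yes) P(Cool | Yes) P(High | Yes)
    P(Strong | Yes)
    < P(No) P(Sunny | No) P(Cool | No) P(High | No)
    P(Strong | No)
  • The product for No is larger, so the new instance
    is classified as Play Tennis = No.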
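
The comparison can be reproduced by counting. The training table from the earlier slide is not in this transcript, so the sketch below assumes the widely used 14-row PlayTennis data set from Mitchell's Machine Learning textbook, which matches the priors 9/14 and 5/14 given above; the function name is hypothetical and no smoothing is applied.

```python
from collections import Counter

# Assumed training table: (outlook, temperature, humidity, wind, play).
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]

def naive_bayes_scores(rows, instance):
    """Return P(v) times the product of P(a_i | v) for every target value v."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for target, count in class_counts.items():
        score = count / len(rows)  # prior P(v)
        for i, value in enumerate(instance):
            # Relative frequency of this attribute value within class v.
            matches = sum(1 for r in rows if r[-1] == target and r[i] == value)
            score *= matches / count
        scores[target] = score
    return scores

scores = naive_bayes_scores(DATA, ("sunny", "cool", "high", "strong"))
prediction = max(scores, key=scores.get)
```

Under that assumed table the scores come out at roughly 0.0053 for yes and 0.0206 for no, so the prediction is no.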

28
Multilayer Neural Network: A Sample
Backpropagation Network
Interconnection weights
Input Layer
Hidden Layer
Output Layer
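
How the three layers and the interconnection weights fit together can be seen in a single forward pass. The sketch below is a simplification: the weights are hypothetical, the sigmoid activation is an assumption, bias terms are omitted, and the backpropagation training step that would adjust the weights is not shown.

```python
import math

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: input layer -> hidden layer -> output layer."""
    # Each hidden unit takes a weighted sum of all inputs.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_weights]
    # Each output unit takes a weighted sum of all hidden activations.
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in output_weights]

# Hypothetical interconnection weights: 2 inputs, 2 hidden units, 1 output.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
output_weights = [[1.0, -1.0]]

outputs = forward([1.0, 1.0], hidden_weights, output_weights)
```

Training would compare outputs with the known target value and propagate the error backwards to update both weight matrices.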
29
Neural Network
30
Artificial Neural Networks: Appropriate Problems
  • Instances are represented by many attributes
  • The target function may be discrete-valued,
    real-valued, or a vector of several discrete /
    real-valued attributes
  • The training data may contain errors
  • Long training times are acceptable
  • Fast prediction of the target function in a new
    instance may be required
  • The ability to understand the learned target
    function is not important