DataMining - PowerPoint PPT Presentation

Title:

DataMining

Slides: 32
Provided by: kentm

Transcript and Presenter's Notes


1
DataMining
  • By
  • Guan Hang Su
  • CS157A section 2 fall 2005

2
Outline
  • Overview
  • ---- Define Data Mining
  • ---- Foundation of Data Mining
  • ---- Scope of Data Mining
  • ---- Techniques in data mining
  • ---- Applications

3
What is DataMining?
  • Discovering hidden value in your data
    warehouse

4
Define Data Mining
  • The automated extraction of hidden predictive
    information from (large) databases
  • Three key words:
  • Automated
  • Hidden
  • Predictive
  • A statistical methodology is implicit
  • Data mining lets you be proactive
  • Prospective rather than Retrospective

5
The Foundations of Data Mining
  • Data mining techniques are the result of a long
    process of research and product development. This
    evolution began when business data was first
    stored on computers, continued with improvements
    in data access, and more recently, generated
    technologies that allow users to navigate through
    their data in real time. Data mining takes this
    evolutionary process beyond retrospective data
    access and navigation to prospective and
    proactive information delivery.

6
The Foundations of Data Mining (cont.)
  • Data mining is ready for application in the
    business community because it is supported by
    three technologies that are now sufficiently
    mature:
  • Massive data collection
  • Powerful multiprocessor computers
  • Data mining algorithms

7
The Scope of Data Mining
  • Data mining derives its name from the
    similarities between searching for valuable
    business information in a large database (for
    example, finding linked products in gigabytes of
    store scanner data) and mining a mountain for a
    vein of valuable ore.
  • Both processes require either sifting
    through an immense amount of material or
    intelligently probing it to find exactly where
    the value resides.

8
The Scope of Data Mining (cont.)
  • Given databases of sufficient size and
    quality, data mining technology can generate new
    business opportunities by providing these
    capabilities:
  • Automated prediction of trends and behaviors
  • Automated discovery of previously unknown
    patterns

9
The Scope of Data Mining (cont.)
  • Automated prediction of trends and behaviors
  • ---- Data mining automates the process of
    finding predictive information in large
    databases. Questions that traditionally required
    extensive hands-on analysis can now be answered
    directly from the data.
  • Typical examples of predictive problems:
  • 1) Targeted marketing
  • 2) Forecasting bankruptcy

10
The Scope of Data Mining (cont.)
  • Automated discovery of previously unknown
    patterns
  • ---- Data mining tools sweep through databases
    and identify previously hidden patterns in one
    step.
  • Example of pattern discovery: the analysis of
    retail sales data to identify seemingly unrelated
    products that are often purchased together.
  • Other pattern discovery problems include
    detecting fraudulent credit card transactions and
    identifying anomalous data that could represent
    data entry keying errors.
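
The retail example above (products that are often purchased together) can be sketched as a simple pair count over shopping baskets. This is only a minimal illustration, not the method of any particular data mining tool; the function name `pair_counts` and the basket contents are made up for the example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical scanner data: each inner list is one customer's basket.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
]

# The most frequently co-purchased pair and the number of baskets it appears in.
top_pair, top_count = pair_counts(baskets).most_common(1)[0]
print(top_pair, top_count)
```

Counting every pair is only practical for small product catalogs; real tools prune rare items first, which this sketch does not attempt.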

11
Techniques in data mining
  • The most commonly used techniques in data mining:
  • Artificial neural networks
  • Decision trees
  • Genetic algorithms
  • Nearest neighbor method
  • Rule induction

12
  • Artificial neural networks: Non-linear predictive
    models that learn through training and resemble
    biological neural networks in structure.
  • Decision trees: Tree-shaped structures that
    represent sets of decisions. These decisions
    generate rules for the classification of a
    dataset. Specific decision tree methods include
    Classification and Regression Trees (CART) and
    Chi-Square Automatic Interaction Detection
    (CHAID).
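
To make the idea of a decision generating a classification rule concrete, here is a rough sketch of how a single split could be chosen. It uses Gini impurity, the criterion commonly associated with CART, but it is a simplified illustration and not a full CART or CHAID implementation; the function names and the income and risk data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value < threshold' test."""
    best = (None, float("inf"))
    # Try each distinct value (except the smallest) as a candidate threshold.
    for t in sorted(set(values))[1:]:
        left = [lab for v, lab in zip(values, labels) if v < t]
        right = [lab for v, lab in zip(values, labels) if v >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical training data: income in thousands and a known risk label.
incomes = [20, 30, 35, 45, 60, 80]
risk = ["bad", "bad", "bad", "good", "good", "good"]

threshold, impurity = best_split(incomes, risk)
print(threshold, impurity)
```

A full tree would repeat this search on each side of the split until the groups are pure or too small to split further.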

13
  • Genetic algorithms: Optimization techniques that
    use processes such as genetic combination,
    mutation, and natural selection in a design based
    on the concepts of evolution.
  • Nearest neighbor: A data mining technique that
    performs prediction by finding the prediction
    value of records (near neighbors) similar to the
    record to be predicted.

14
  • Rule induction: The extraction of useful if-then
    rules from data based on statistical significance.
  • Other techniques:
  • Bayesian networks
  • ---- Naïve Bayes
  • Support vector machines
  • Many more

15
  • Decision Trees
  • Nearest Neighbor classification
  • Neural Networks
  • Rule Induction
  • K-means Clustering

16
Example of Neural Network
  • Difficult interpretation
  • Tends to overfit the data
  • Extensive amount of training time
  • A lot of data preparation
  • Works with all data types

17
Example of Rule Induction
  • Description
  • Produces decision trees:
  • ---- income < 40K
  • -------- job > 5 yrs then good risk
  • -------- job < 5 yrs then bad risk
  • ---- income > 40K
  • -------- high debt then bad risk
  • -------- low debt then good risk
  • Or rule sets:
  • ---- Rule 1 for good risk
  • -------- if income > 40K
  • -------- if low debt
  • ---- Rule 2 for good risk
  • -------- if income < 40K
  • -------- if job > 5 years
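
The rule set on this slide can be written down directly as code. The sketch below is one possible encoding; the function name `credit_risk` and its parameters are made up for the example. The slide does not say what happens at exactly 40K income or exactly 5 years on the job, so the code assumes those cases fall through to "bad".

```python
def credit_risk(income, years_on_job, high_debt):
    """Classify an applicant as 'good' or 'bad' risk using the slide's two rules."""
    # Rule 1 for good risk: income above 40K and low debt.
    if income > 40_000 and not high_debt:
        return "good"
    # Rule 2 for good risk: income below 40K and more than 5 years on the job.
    if income < 40_000 and years_on_job > 5:
        return "good"
    # Assumption: anything not covered by a good-risk rule is treated as bad risk,
    # including the boundary cases the slide leaves unspecified.
    return "bad"


print(credit_risk(50_000, 1, high_debt=False))   # covered by Rule 1
print(credit_risk(30_000, 6, high_debt=True))    # covered by Rule 2
print(credit_risk(30_000, 2, high_debt=False))   # covered by neither rule
```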

18
K-Nearest-Neighbor (kNN) Models
  • Use entire training database as the model
  • Find nearest data point and do the same thing as
    you did for that record

[Figure: plot of Age (axis up to 100) against Doses (0 to 1000)]
  • Very easy to implement. More difficult to use in
    production.
  • Disadvantage: huge models
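
A minimal sketch of the nearest-neighbor idea, assuming records described by the two quantities on the slide's figure (age and doses). The outcome labels, the data, and the function name `knn_predict` are hypothetical. Note that the "model" is just the stored training list, which is why the slide calls these models huge.

```python
import math
from collections import Counter


def knn_predict(training, query, k=1):
    """Predict the label of `query` by majority vote among its k nearest records.

    `training` is a list of ((age, doses), label) pairs and serves as the whole model.
    """
    # Sort all stored records by Euclidean distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: (age, doses) -> known outcome.
training = [
    ((25, 200), "responded"),
    ((30, 250), "responded"),
    ((70, 800), "no response"),
    ((65, 900), "no response"),
]

print(knn_predict(training, (28, 220)))
print(knn_predict(training, (68, 850), k=3))
```

Because doses span a much wider range than age, they dominate the raw distance here; in practice the features would usually be rescaled first, which this sketch omits.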
19
Example of Decision Trees
20
How Data Mining Works
  • How exactly is data mining able to tell you
    important things that you didn't know or what is
    going to happen next? The technique that is used
    to perform these feats in data mining is called
    modeling.
  • Modeling is simply the act of building a model in
    one situation where you know the answer and then
    applying it to another situation that you don't.

21
  • Computers are loaded with lots of information
    about a variety of situations where the answer is
    known, and the data mining software then runs
    through that data and distills the
    characteristics that should go into the model.
  • Once the model is built, it can be used in
    similar situations where you don't know the
    answer.

22
Some results of Data Mining
  • Forecasting what may happen in the future.
  • Classifying people or things into groups by
    recognizing patterns.
  • Clustering people or things into groups based on
    their attributes.
  • Sequencing what events are likely to lead to
    later events
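
As an illustration of the clustering result listed above, here is a rough one-dimensional sketch of k-means, the clustering technique named earlier in the deck. The function name, the starting centers, and the data are assumptions for the example; real implementations work on many attributes at once and choose starting centers more carefully.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around centers by repeated assign-then-average steps."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its closest center.
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical attribute values (for example, monthly spend) and two starting guesses.
values = [1, 2, 3, 10, 11, 12]
centers, clusters = kmeans_1d(values, centers=[0, 5])
print(centers, clusters)
```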

23
Example
  • For example, say that you are the director of
    marketing for a telecommunications company and
    you'd like to acquire some new long distance
    phone customers. You could either:
  • 1) Randomly mail out coupons to the general
    population, or
  • 2) Use the business experience stored in your
    database to build a model, then choose the right
    targets.

24
Cont..
  • As the marketing director you have access to a
    lot of information about all of your customers:
    their age, sex, credit history and long distance
    calling usage.
  • The problem is that you don't know the long
    distance calling usage of these prospects (since
    they are most likely now customers of your
    competition).

25
  • We'd like to concentrate on those prospects who
    have large amounts of long distance usage. We can
    accomplish this by building a model.

26
  • For instance, a simple model for a
    telecommunications company might be:
  • 98% of my customers who make more than
    $60,000/year spend more than $80/month on long
    distance.
  • With this model in hand, new customers can be
    selectively targeted.
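
The telecommunications example can be sketched in two steps: measure how well the income rule holds on existing customers (where long distance spending is known), then apply it to prospects (where it is not). The thresholds of 60,000/year and 80/month come from the slide; the function names, field names, and records below are hypothetical.

```python
def rule_confidence(customers, min_income=60_000, min_spend=80):
    """Fraction of customers above min_income who spend more than min_spend per month."""
    high_income = [c for c in customers if c["income"] > min_income]
    if not high_income:
        return 0.0
    hits = sum(1 for c in high_income if c["ld_spend"] > min_spend)
    return hits / len(high_income)


def select_targets(prospects, min_income=60_000):
    """Apply the rule to prospects, whose long distance usage is unknown."""
    return [p["name"] for p in prospects if p["income"] > min_income]


# Hypothetical existing customers: the answer (long distance spend) is known.
customers = [
    {"income": 75_000, "ld_spend": 120},
    {"income": 90_000, "ld_spend": 95},
    {"income": 62_000, "ld_spend": 40},
    {"income": 30_000, "ld_spend": 15},
]

# Hypothetical prospects: only income is known.
prospects = [
    {"name": "A", "income": 85_000},
    {"name": "B", "income": 28_000},
]

print(rule_confidence(customers))
print(select_targets(prospects))
```

A rule is worth using for targeting only when its confidence on known customers is high, as with the 98 figure quoted on the slide.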

27
Architecture for Data Mining
  • To best apply these advanced techniques, they
    must be fully integrated with a data warehouse as
    well as flexible interactive business analysis
    tools.
  • Many data mining tools currently operate outside
    of the warehouse, requiring extra steps for
    extracting, importing, and analyzing the data.
    Furthermore, when new insights require
    operational implementation, integration with the
    warehouse simplifies the application of results
    from data mining.

28
  • [Figure: an architecture for advanced analysis
    in a large data warehouse]

29
Data Mining Applications
  • The US Drug Enforcement Administration needed to
    be more effective in its drug busts.
  • Analyzed suspects' cell phone usage to focus
    investigations.

30
  • HSBC needed to cross-sell more effectively by
    identifying profiles that would be interested in
    higher yielding investments.
  • Reduced direct mail costs by 30% while
    garnering 95% of the campaign's revenue.

31
Bibliography
  • http://www.thearling.com/dmintro/dmintro_frame.htm
  • http://www.thearling.com/text/dwhite/dmwhite.htm
  • http://www.cs.sjsu.edu/faculty/lee/cs157/25SpL22DataMining.ppt
  • http://www.oracle.com/technology/products/bi/odm/index.html