Another Look at Data Mining - PowerPoint PPT Presentation

About This Presentation
Title:

Another Look at Data Mining

Description:

Another Look at Data Mining Why do we mine? What do we mine? How do we mine? What is Data Mining Data mining discovers meaningful new correlations, hidden patterns ... – PowerPoint PPT presentation

Slides: 28
Provided by: cisBentle
Learn more at: http://cis.bentley.edu

Transcript and Presenter's Notes

Title: Another Look at Data Mining


1
Another Look at Data Mining
  • Why do we mine?
  • What do we mine?
  • How do we mine?

2
What is Data Mining
  • Data mining discovers meaningful new
    correlations, hidden patterns and relationships
    in your data
  • Conceptual descendant of statistics
  • Combines machine learning, statistics, and
    databases
  • Knowledge discovery: the process of building and
    implementing a data mining solution

3
Data Mining Overview
  • Knowledge Discovery in Databases, KDD
  • No one data mining approach
  • Each tool is viewed logically as a client
    application
  • Can reside on separate machine or in separate
    process and access data warehouse
  • RDBMS or proprietary OLAP embed data mining
    capabilities deeply within engines to improve
    efficiency and add extensions
  • Requires a good foundation in terms of a data
    warehouse

4
Data Mining Overview (cont)
  • Common algorithmic approaches
  • association, affinity grouping
  • predicting, sequence-based analysis
  • clustering
  • classification
  • estimation
  • Steps are data selection, data transformation,
    data mining, and result interpretation.
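
As a rough illustration of the association (affinity grouping) approach listed above, the sketch below computes support and confidence for item pairs. The baskets and the function name pair_rules are made up for this example; it is not taken from any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (illustration data only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_rules(baskets):
    """Return {(a, b): (support, confidence)} for every rule a -> b."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                      # share of baskets with both items
        rules[(a, b)] = (support, count / item_counts[a])  # confidence of a -> b
        rules[(b, a)] = (support, count / item_counts[b])  # confidence of b -> a
    return rules

rules = pair_rules(baskets)
print(rules[("butter", "bread")])
```

Support measures how often the pair occurs at all; confidence measures how often the second item appears given the first.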

5
Strategic Benefit of Data Mining
  • Direct Marketing
  • Trend Analysis
  • Fraud detection
  • Forecasting in Financial Markets

6
Why Data Mining Now?
  • Economics
  • Unprecedented affordability of MIPS and MB
  • Parallel computing
  • Enormous amounts of data can be processed
  • Popularity of data warehouses, data marts
  • Relatively clean data available

7
Data Mining compared to Traditional Analysis
  • Traditional Analysis
  • Did sales of product X increase in Nov.?
  • Do sales of product X decrease when there is a
    promotion on product Y?
  • Data mining is result oriented
  • What are the factors that determine sales of
    product X?

8
Data Mining compared to Traditional Analysis (cont)
  • Traditional analysis is incremental
  • Does billing level affect turnover?
  • Does location affect turnover?
  • Analyst builds model step by step
  • Data Mining is result oriented
  • Identify the factors and predict turnover

9
Steps in Data Mining
  • Data Manipulation - can be 70-80% of data mining
    effort
  • data cleaning
  • missing values
  • data derivation
  • merging data
  • Defining a study
  • Supervised - articulating goal, choosing dependent
    variable or output and specifying data fields
  • Unsupervised - group similar types of data or
    identify exceptions
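
The data manipulation steps above (missing values, data derivation, merging data) might look like the following minimal sketch. The records, field names, the 100-unit spending threshold, and the choice of mean imputation are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
customers = [
    {"id": 1, "age": 34, "spend": 120.0},
    {"id": 2, "age": None, "spend": 80.0},
    {"id": 3, "age": 50, "spend": 200.0},
]
# Hypothetical second source keyed by the same id.
regions = {1: "East", 2: "West"}

def prepare(customers, regions):
    """Fill missing ages, derive a flag, and merge in the region."""
    known_ages = [c["age"] for c in customers if c["age"] is not None]
    fill_age = mean(known_ages)  # missing values: simple mean imputation
    rows = []
    for c in customers:
        row = dict(c)  # copy so the input is left untouched
        if row["age"] is None:
            row["age"] = fill_age
        row["high_spender"] = row["spend"] > 100            # data derivation
        row["region"] = regions.get(row["id"], "UNKNOWN")   # merging data
        rows.append(row)
    return rows

prepared = prepare(customers, regions)
```

Mean imputation is only one of several ways to treat missing values; which one is appropriate depends on the study being defined.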

10
Steps in Data Mining (cont)
  • Reading the data and building the model
  • model summarizes large amounts of data by
    accumulating indicators (frequencies, weight,
    conjunctions, differentiation)
  • Understanding the model
  • Know the particular model
  • Prediction
  • Choose the best outcome based on historical data

11
Models
  • Genetic Algorithms
  • Neural Nets
  • Agents
  • Statistics
  • Visualization

12
Genetic Algorithms
  • Artificial intelligence system that mimics the
    evolutionary, survival-of-the-fittest processes
    to generate increasingly better solutions to a
    problem.
  • Genetic algorithms produce several generations of
    solutions, choosing the best of the current set
    for each new generation.
  • Examples
  • Generating human faces based on a few known
    features.
  • Generating solutions to routing problems.
  • Generating stock portfolios.

13
EVOLUTION IN GENETIC ALGORITHMS
  • SELECTION - or survival of the fittest. The key
    is to give preference to better outcomes.
  • CROSSOVER - combining portions of good outcomes
    in the hope of creating an even better outcome.
  • MUTATION - randomly trying combinations and
    evaluating the success (or failure) of the
    outcome.
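
A minimal sketch of the three operators above, applied to a toy problem (maximizing the number of 1-bits in a string). The objective, population size, mutation rate, and the decision to carry the best individual forward unchanged are assumptions chosen for illustration, not part of the original slides.

```python
import random

def fitness(bits):
    """Toy objective (an assumption): the number of 1-bits."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]       # SELECTION: keep the fitter half
        children = [pop[0][:]]               # carry the best forward unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)
            child = mum[:cut] + dad[cut:]    # CROSSOVER: splice two parents
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]         # MUTATION: random bit flips
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(fitness(best), history[0])
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next.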

14
Neural Nets
  • Mathematical Model of the Way a Brain Functions
  • Machine learning approach by which historical
    data can be examined for pattern recognition
  • A neural network simulates the human ability to
    classify things based on the experience of seeing
    many examples.
  • Pros - Numerical Data
  • Cons - Opaque, Art or Science

http://www.attar.com/
15
  • Example
  • Distinguishing different chemical compounds
  • Detecting anomalies in human tissue that may
    signify disease
  • Reading handwriting
  • Detecting fraud in credit card use
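
To make "classifying based on the experience of seeing many examples" concrete, here is a minimal sketch of a single artificial neuron (a perceptron) that learns the logical AND function from labeled examples. Real neural networks use many such units and different training methods; the function names and parameters here are illustrative assumptions.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of inputs exceeds zero."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(examples, epochs=20, lr=1):
    """Perceptron learning rule: nudge weights toward each missed target."""
    n_inputs = len(examples[0][0])
    weights = [0] * n_inputs
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Historical examples: inputs and the known correct class.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)
print([predict(weights, bias, x) for x, _ in and_examples])
```

The learned weights are just numbers with no obvious business meaning, which hints at why the slides list "opaque" as a drawback.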

16
Intelligent Agents
  • Software entities that carry out some set of
    operations on behalf of a user or program with some
    degree of autonomy and employ some knowledge or
    representation of the user's goals and desires.
  • Some common characteristics
  • ability to communicate, cooperate and coordinate
    with other agents
  • ability to act autonomously to achieve collective
    goal of system

17
Intelligent Agents (cont)
  • Tasks
  • automate repetitive tasks
  • finding and filtering information
  • summarizing complex data
  • Capability to learn and make recommendations
  • Black box approach hides complexity and allows
    for design of scalable system

18
Comparison
19
Statistics
  • SAS, SPSS
  • Pros - Established technology
  • Cons - Needs assumptions, nominal variable
    handling, management acceptance?

20
Visualization
  • Data visualization refers to technologies that
    support visualization of information
  • Includes digital images, GIS, multi-dimensions,
    3-D presentations, animations
  • http://www.almaden.ibm.com/cs/quest/demo/assoc/general.html

21
Data Mining is Not a Silver Bullet
  • It does not
  • Find answers to questions you don't ask
  • Eliminate the need for domain experience
  • Remove the need for data analysis skills

22
Data Mining Software
  • http://www.kdnuggets.com/software/
  • http://www.attar.com/ download
  • http://www.cs.bham.ac.uk/anp/software.html
    software listing

23
Six Rules of Data Quality by Ken Orr
  • 1. Data that is not used cannot be correct for
    very long
  • 2. Data Quality in an information system is a
    function of its use, not its collection
  • 3. Data quality will ultimately be no better than
    its most stringent use
  • 4. Data quality problems tend to become worse
    with the age of the system
  • 5. The less likely it is that some data element
    will change, the more traumatic it will be when it
    finally does change.
  • 6. Information overload affects data quality

24
Data Quality Software
  • http://www.rulequest.com/gritbot-info.html

25
General DW Data transformation
  • Resolve inconsistent legacy formats
  • Strip out unwanted fields
  • Interpret codes into text
  • Combine data from multiple sources under a common
    key
  • Find fields used for multiple purposes and
    interpret each field's value based on context
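
Several of these transformations can be sketched in a few lines. The date formats, the status code table, and the field names (cust_id, opened, status, filler) below are hypothetical; a real warehouse load would take them from the source systems' documentation.

```python
from datetime import datetime

# Assumed legacy date layouts and code table (illustration only).
LEGACY_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")
STATUS_CODES = {"A": "Active", "C": "Closed"}

def parse_legacy_date(text):
    """Resolve inconsistent legacy formats into a single date type."""
    for fmt in LEGACY_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def clean_record(raw):
    """Keep wanted fields only, with codes interpreted into text."""
    return {
        "customer_id": raw["cust_id"],
        "opened": parse_legacy_date(raw["opened"]),
        "status": STATUS_CODES.get(raw["status"], "Unknown"),
        # Any other field (for example raw["filler"]) is stripped out.
    }

record = clean_record(
    {"cust_id": 17, "opened": "03/15/1999", "status": "A", "filler": "xx"}
)
print(record)
```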

26
Data transformation for Data Mining
  • Flag normal, abnormal, out of bounds or
    impossible facts
  • Recognize random or noise values from context and
    mask out
  • Apply uniform treatment to NULL values
  • Flag fact records with changed status
  • Classify individual record by one of its
    aggregates
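
A minimal sketch of the first three items: flagging out-of-bounds values, masking them out, and treating NULLs uniformly. The readings and the valid range are invented for illustration; in practice the bounds come from domain knowledge about each field.

```python
# Hypothetical readings; None and "" both stand for a missing value.
readings = [72.5, None, -999.0, 68.0, "", 250.0]

VALID_RANGE = (0.0, 150.0)  # assumed plausible bounds for this field

def transform(values, valid_range=VALID_RANGE):
    """Attach a flag to every value and mask out unusable ones."""
    low, high = valid_range
    out = []
    for v in values:
        if v is None or v == "":
            # Uniform treatment: every kind of NULL becomes None.
            out.append({"value": None, "flag": "null"})
        elif low <= v <= high:
            out.append({"value": v, "flag": "normal"})
        else:
            # Impossible or sentinel values are masked, but the flag remains.
            out.append({"value": None, "flag": "out_of_bounds"})
    return out

for row in transform(readings):
    print(row)
```

Keeping the flag alongside the masked value lets a later mining step tell "never recorded" apart from "recorded but implausible".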

27
Conclusion
  • For successful data mining
  • data analysis and mining goals must be identified
    and formulated
  • appropriate data must be selected, cleaned and
    prepared for queries and business analysis
  • http://www.rulequest.com/cubist-examples.html#BOSTON
  • http://www.almaden.ibm.com/cs/quest/