MSCS 282: Data Mining - PowerPoint PPT Presentation

About This Presentation
Title:

MSCS 282: Data Mining

Description:

Craig A. Struble, Ph.D. Dept. of Mathematics, Statistics, and Computer Science ... MSCS 282: Data Mining - Craig A. Struble, Ph.D. 9. Typical Tasks in Data Mining ... – PowerPoint PPT presentation

Slides: 23
Provided by: CraigAS7
Learn more at: http://www.mscs.mu.edu

Transcript and Presenter's Notes

Title: MSCS 282: Data Mining


1
MSCS 282 Data Mining
  • Craig A. Struble, Ph.D.
  • Dept. of Mathematics, Statistics, and Computer
    Science
  • Marquette University
  • craig.struble_at_marquette.edu

2
Overview
  • Welcome
  • Syllabus
  • Student Introductions
  • Introduction to Data Mining
  • Data Mining and Knowledge Discovery
  • Typical Tasks
  • Models

3
Knowledge Discovery and Data Mining
  • Knowledge Discovery
  • The nontrivial process of identifying valid,
    novel, potentially useful, and ultimately
    understandable patterns in data (Fayyad et al.,
    1996).
  • Data Mining
  • A step in the knowledge discovery process
  • Application of algorithms to extract meaningful
    patterns
  • Data Dredging
  • Blind application of data mining techniques

4
Knowledge Discovery in Databases
[Figure: the KDD process. Stages: Data, Data Warehouse, Prepared data, Patterns, Knowledge, Knowledge Base. Steps: Cleaning and Integration; Selection and Transformation; Data Mining; Evaluation and Visualization.]

5
Typical Tasks in Data Mining
  • Classification
  • Prediction
  • Clustering
  • Association Analysis
  • Summarization

6
Typical Tasks in Data Mining
  • Classification
  • From data with known labels, create a classifier
    that determines which label to apply to a new
    observation
  • E.g. Label loan applications as low, medium, or
    high risk
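
A minimal sketch of the loan example, assuming a 1-nearest-neighbour classifier (one of many possible classifiers, not one named on the slide). The features, the training rows, and the income scaling are all hypothetical.

```python
import math

# Hypothetical labeled data: (income in $1000s, debt-to-income ratio) -> risk label.
TRAINING = [
    ((85.0, 0.10), "low"),
    ((60.0, 0.30), "medium"),
    ((30.0, 0.60), "high"),
    ((90.0, 0.15), "low"),
    ((45.0, 0.45), "medium"),
    ((25.0, 0.70), "high"),
]


def distance(a, b):
    # Divide income by 100 (an assumed scaling) so both features
    # contribute on a comparable range.
    return math.hypot((a[0] - b[0]) / 100.0, a[1] - b[1])


def classify(applicant, training=TRAINING):
    """Return the label of the closest known example (1-nearest neighbour)."""
    _, label = min(training, key=lambda row: distance(row[0], applicant))
    return label
```

Calling `classify((80.0, 0.12))` labels a new application by copying the label of its most similar known example.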

7
Typical Tasks in Data Mining
  • Prediction
  • Given a collection of data with known numeric
    outputs, create a function that outputs a
    predicted value from a new set of inputs.
  • E.g. Given historical consumption of milk in the
    U.S., predict what the consumption will be over
    the next five years.
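
A minimal sketch of the prediction task, assuming a straight-line model fitted by ordinary least squares. The yearly consumption figures are made up for illustration and are not real milk statistics.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical data: year -> consumption (arbitrary units).
years = [2000, 2001, 2002, 2003, 2004]
consumption = [200.0, 198.0, 196.0, 194.0, 192.0]

model = fit_line(years, consumption)
# Predicted values for the following five years.
forecast = [predict(model, year) for year in range(2005, 2010)]
```

The fitted function is the model; the forecast is simply that function evaluated on new inputs.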

8
Typical Tasks in Data Mining
  • Clustering
  • Identify natural groupings in data
  • Unsupervised learning, no predefined groups
  • E.g. A city planner grouping houses by value,
    location, and house type.
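
A minimal sketch of clustering, assuming k-means on a single attribute (house value). The slide's example also uses location and house type; those are left out here to keep the sketch short. The values and the deterministic choice of starting centers are assumptions.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters by repeatedly moving each center
    to the mean of the values assigned to it."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[i * (len(ordered) - 1) // max(1, k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical house values in $1000s.
house_values = [100, 310, 120, 300, 110, 320]
centers, clusters = kmeans_1d(house_values, k=2)
```

No labels are supplied; the two groups emerge from the data alone, which is what makes this unsupervised.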

9
Typical Tasks in Data Mining
  • Association Analysis
  • Identify relationships in data from co-occurring
    terms or items.
  • E.g. Analyze grocery store purchases to identify
    items most commonly purchased together. This is
    often used to create coupons and sales, such as
    "buy chips and get $0.50 off salsa."
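
A minimal sketch of the grocery example: count how often each pair of items appears in the same basket, then measure how often one item accompanies another. The baskets are hypothetical, and "confidence" is the usual association-rule term, not one used on the slide.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


# Hypothetical purchase data.
baskets = [
    ["chips", "salsa", "soda"],
    ["chips", "salsa"],
    ["bread", "milk"],
    ["chips", "soda"],
    ["milk", "salsa", "chips"],
]

top_pair, top_count = pair_counts(baskets).most_common(1)[0]
chips_to_salsa = confidence(baskets, "chips", "salsa")
```

A pair that is both frequent and high-confidence is a natural candidate for a joint promotion.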

10
Typical Tasks in Data Mining
  • Summarization
  • Given a data set, summarize the important
    characteristics of the data.
  • E.g. calculate mean and standard deviation,
    determine statistical distribution, identify most
    commonly appearing attribute values, etc.
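
A minimal sketch of summarization using Python's standard `statistics` module. The data values are hypothetical; the population standard deviation is used here, and `statistics.stdev` would give the sample version instead.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.pstdev(values),  # population standard deviation
        "most_common": Counter(values).most_common(1)[0][0],
        "min": min(values),
        "max": max(values),
    }


summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```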

11
Typical Tasks in Data Mining
  • Sequence Analysis
  • Given data collected over time, identify trends
    in the data that may be used to predict future
    events
  • E.g. Analyzing stock data to identify stocks that
    will perform well vs. those that will perform
    poorly.
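
A minimal sketch of trend identification, assuming a simple moving average as the smoothing step. The price series is hypothetical, and an upward trend in past data is only a signal, not a guarantee of future performance.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def trend(series, window=3):
    """Compare the first and last smoothed values to label the overall trend."""
    smoothed = moving_average(series, window)
    if smoothed[-1] > smoothed[0]:
        return "up"
    if smoothed[-1] < smoothed[0]:
        return "down"
    return "flat"


# Hypothetical daily closing prices.
prices = [10, 11, 10, 12, 13, 14]
direction = trend(prices)
```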

12
What is Data Mining?
  • Data Mining Process

[Flowchart: Fit a Model -> Calculate Performance -> Meet Criteria? If No, fit another model; if Yes, Interpret Model.]

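
The loop on this slide (fit a model, calculate performance, check the criteria, then interpret) can be sketched as follows. The model family (a single threshold on hours studied), the data, and the accuracy target are all hypothetical.

```python
def fit_until_good(data, candidate_thresholds, target_accuracy):
    """Try candidate models in turn until one meets the performance criterion."""
    for threshold in candidate_thresholds:
        # Fit a model: predict "passed" when hours >= threshold.
        def model(hours, t=threshold):
            return hours >= t

        # Calculate performance: fraction of examples predicted correctly.
        accuracy = sum(model(x) == y for x, y in data) / len(data)

        # Meet criteria? If yes, hand the model back for interpretation.
        if accuracy >= target_accuracy:
            return threshold, accuracy
    # No candidate met the criteria.
    return None, 0.0


# Hypothetical data: (hours studied, passed the exam).
data = [(2, False), (4, False), (6, True), (8, True), (3, False), (7, True)]
best_threshold, best_accuracy = fit_until_good(data, [2, 4, 5], target_accuracy=0.9)
```

Interpreting the result here means reading the threshold: in this made-up data, students who studied at least that many hours passed.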
13
Data Mining Algorithms
  • Apply/create a model
  • A model is an abstract description of data
  • What is the model's function? (i.e. what task
    does it perform?)
  • How is the model represented? (e.g. mathematical
    function, rules, Gaussian distribution)

14
Data Mining Algorithms
  • Determine the preference criterion
  • Given two models, which one is better?
  • Examples: goodness of fit, prediction accuracy,
    size/complexity, etc.
  • Search algorithm
  • Good models are found by searching the space of
    all possible models
  • How is this space organized and searched?
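
A minimal sketch of a preference criterion, assuming a score that adds goodness of fit (sum of squared errors) to a complexity penalty per parameter. The penalty form, the data, and both candidate models are illustrative assumptions, not a specific named criterion.

```python
def sum_squared_error(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data)


def preference_score(predict, n_params, data, penalty=1.0):
    """Lower is better: fit error plus an assumed penalty per parameter."""
    return sum_squared_error(predict, data) + penalty * n_params


# Hypothetical data that happens to follow y = 2x + 1.
data = [(0, 1), (1, 3), (2, 5), (3, 7)]

# Two candidate models: (prediction function, number of parameters).
candidates = {
    "constant": (lambda x: 4.0, 1),
    "linear": (lambda x: 2.0 * x + 1.0, 2),
}

scores = {
    name: preference_score(predict, n_params, data)
    for name, (predict, n_params) in candidates.items()
}
preferred = min(scores, key=scores.get)
```

Raising `penalty` shifts the preference toward smaller models, which is one way to trade accuracy against size.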

15
Data Mining Models
  • Mathematical Functions
  • Mathematical combination of attribute values
  • E.g. linear model, non-linear model, support
    vectors, etc.
  • CPU performance prediction

16
Data Mining Models
  • Decision Trees

[Figure: decision tree. Root node: Study (< 10 hours / > 10 hours). Internal nodes: Do Homework, Test Well (Yes / No branches). Leaf grades: A, B, C, F.]

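
A decision tree can be read as nested if/else tests. The exact branches of the slide's tree cannot be recovered from this transcript, so the structure below is an assumption that only reuses the slide's node names and grades.

```python
def predict_grade(study_hours, does_homework, tests_well):
    """Walk a small, assumed decision tree from the root to a leaf grade."""
    # Root test: hours of study. The slide does not say which branch
    # exactly 10 hours falls on; it is sent to the second branch here.
    if study_hours < 10:
        # Assumed subtree: homework decides the grade.
        if does_homework:
            return "C"
        return "F"
    # Assumed subtree: test-taking ability decides the grade.
    if tests_well:
        return "A"
    return "B"
```

Each path from the root to a leaf corresponds to one rule, such as "fewer than 10 hours and no homework gives F" in this assumed tree.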
17
Data Mining Models
  • Neural Networks

[Figure: neural network diagram labeled with the values 0.8, 0.23, -0.48, 0.5, 0.67, 1.5, 1.93, -0.88, -0.81, -0.4, 0.18.]

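
A minimal sketch of how a small feed-forward neural network computes an output. The layout (two inputs, two hidden units, one output), the sigmoid activation, and the assignment of numbers to weights and biases are assumptions; some numbers are borrowed from the slide's figure, but the figure's actual topology is not recoverable from this transcript.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)


def forward(inputs, hidden_layer, output_unit):
    """Compute the hidden activations, then the single output activation."""
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_unit
    return neuron(hidden, out_weights, out_bias)


# Assumed parameters: each hidden unit is (weights, bias).
hidden_layer = [([0.8, 0.23], -0.48), ([0.5, 0.67], 1.5)]
output_unit = ([1.93, -0.88], -0.81)

output = forward([1.0, 0.0], hidden_layer, output_unit)
```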
18
Data Mining Models
  • Mixture Models

19
Data Mining Models
  • Bayesian Networks

[Figure: Bayesian network with nodes Earthquake, Burglary, Alarm, John Calls, Mary Calls.]

20
Searching the Model Space
  • Concept generalization is searching
  • Almost all search algorithms are heuristic
  • Optimal models are not guaranteed
  • Enumerating the space involves bias
  • Language bias: what the model can represent
  • Search bias: which models are ignored
  • Overfitting-avoidance bias: how models are
    simplified to handle outliers
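
A minimal sketch of why heuristic search does not guarantee an optimal model, assuming greedy hill climbing over a tiny, hypothetical model space. Models are numbered 0 to 6, each with a made-up quality score, and each model's neighbors are the adjacent numbers.

```python
def hill_climb(score, start, neighbors, max_steps=100):
    """Move to the best neighbor until no neighbor improves the score."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current  # local optimum: no neighbor is better
        current = best
    return current


# Hypothetical quality score for each of seven candidate models.
SCORES = [1, 3, 2, 1, 4, 6, 5]


def score(model):
    return SCORES[model]


def neighbors(model):
    return [m for m in (model - 1, model + 1) if 0 <= m < len(SCORES)]


from_left = hill_climb(score, 0, neighbors)
from_middle = hill_climb(score, 3, neighbors)
```

Starting at model 0, the search stops at a local optimum and never reaches the best model; starting at model 3, it does. Which models get ignored depends on the starting point and the neighbor rule, which is the search bias the slide describes.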

21
Searching the Model Space
[Figure: two candidate decision trees, Model 1 and Model 2, built from the nodes Study (< 10 hours / > 10 hours), Do Homework, and Test Well, with Yes / No branches and leaf grades A, B, C, F.]

22
Goals of the Course
  • Understand the KDD process
  • Know major data mining tasks
  • Learn common data mining models
  • Be able to mine moderately sized data sets
  • Learn how to evaluate data mining algorithms
  • Learn how to incorporate data mining into a
    project or product