The Marriage of Market Basket Analysis to Predictive Modeling

Slides: 23
Provided by: integrated96

1
The Marriage of Market Basket Analysis to
Predictive Modeling
  • Sanford Gayle

2
How Would You Mine This Transactional Data?
3
Is Data Mining Simply Market Basket Analysis?
4
Market Basket Analysis identifies the rule
/our_company/bboard/ → hr/café/, but
  • How do you use this information?
  • Can the information be used to develop a
    predictive model?
  • More generally, how do you develop predictive
    models using transactional tables?

5
Data Mining Software Objectives
  • Predictive Modeling
  • Clustering
  • Market Basket Analysis
  • Feature Discovery: that is, improving the
    predictive accuracy of existing models

6
Agenda
  • Converting a transactional table to a
    modeling table
  • The curse of dimensionality: possible fixes
  • A feature discovery process using market basket
    analysis output as an input to predictive
    modeling
  • A dimensional reduction scheme using confidence

7
DM Table Structures
  • Transactional tables (Market Basket Analysis)
      Trans-id  page   spend  count
      id-1      page1      0      1
      id-1      page2      0      1
      id-1      page3      0      1
      id-1      page4  19.99      1
      id-1      page5      0      1
      id-2      page1      0      1
  • Modeling tables (modeling and clustering tools)
      Trans-id  page   spend  count
      id-1      .      19.99      5
      id-2      .          0      1

8
Converting Transactional Into Modeling Data
  • Continuous variable case - easy
  • Collapse the spend or count columns via the sum,
    mean, or frequency statistic for each
    transaction-id value
  • proc sql;
      create table new as
      select id, sum(amount) as total
      from old
      group by id;
    quit;
  • Categorical variable case - challenging
  • It seems the detail page information is lost when
    the rows are rolled-up or collapsed
  • However, with transposition you collapse the rows
    onto a single row for each id, with each distinct
    page now being a column in the modeling table and
    taking the count or sum statistic as its value
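Both cases above can be sketched in a few lines of plain Python, using the example rows from the DM Table Structures slide. This is only an illustration, not the presenter's code: the names rows, totals, and transposed are made up here, and filling unvisited pages with 0.0 rather than a missing value is an assumption.

```python
from collections import defaultdict

# Transactional rows from the example: (trans_id, page, spend, count)
rows = [
    ("id-1", "page1", 0.0, 1),
    ("id-1", "page2", 0.0, 1),
    ("id-1", "page3", 0.0, 1),
    ("id-1", "page4", 19.99, 1),
    ("id-1", "page5", 0.0, 1),
    ("id-2", "page1", 0.0, 1),
]

# Continuous case: collapse spend and count to one row per id.
totals = defaultdict(lambda: {"spend": 0.0, "count": 0})
for trans_id, _page, spend, count in rows:
    totals[trans_id]["spend"] += spend
    totals[trans_id]["count"] += count

# Categorical case: transpose so each distinct page becomes a column
# holding the summed spend for that id (0.0 where the page was not visited).
pages = sorted({page for _id, page, _spend, _count in rows})
transposed = {trans_id: dict.fromkeys(pages, 0.0) for trans_id in totals}
for trans_id, page, spend, _count in rows:
    transposed[trans_id][page] += spend
```

Here totals holds one collapsed row per id, while transposed keeps the page detail as one column per distinct page.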

9
The Input Discovery Process
  • Existing modeling table contains:
  • id-1, age, income, job-category, married,
    recency, frequency, zip-code
  • New potential predictors from the transpose:
  • id-1, spend on page1, spend on page2, spend on
    page3, spend on page4, spend on page5
  • Augment existing modeling table with the new
    inputs and, hopefully, discover new, significant
    predictors to improve predictive accuracy
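The augmentation step amounts to a left join on the id. A minimal sketch follows, assuming both tables are keyed by id; every value and the column name spend_page4 are invented for illustration.

```python
# Existing modeling table keyed by id (all values are made up for illustration).
existing = {
    "id-1": {"age": 34, "income": 52000, "married": 1},
    "id-2": {"age": 51, "income": 61000, "married": 0},
}

# New candidate predictors from the transpose step (hypothetical column name).
new_inputs = {
    "id-1": {"spend_page4": 19.99},
}
new_columns = sorted({col for cols in new_inputs.values() for col in cols})

# Left join: keep every existing row; ids without transactions get None.
augmented = {}
for key, row in existing.items():
    extra = new_inputs.get(key, {})
    augmented[key] = {**row, **{col: extra.get(col) for col in new_columns}}
```

Ids with no transactions keep their original columns and receive None for each new input, which is where the sparsity discussed on the next slide comes from.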

10
Problem with Transpose Method
  • Suppose the server has 1,000 distinct pages; the
    transpose method now produces 1,000 new columns
    instead of 5
  • Sparsity: new columns have a preponderance of
    missing values; e.g., with 5 pages, id-2 will have
    4 missing values and only 1 non-missing value
  • Regression, Neural, and Cluster tools struggle
    with this many variables, especially when there
    is such a preponderance of the same values (e.g.,
    zeros or missing)
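One way to quantify the sparsity problem is the share of cells that are zero or missing. The sketch below assumes a hypothetical 1,000-page server on which each visitor viewed a single page; the function name sparsity and the data are illustrative only.

```python
def sparsity(table):
    """Return the fraction of cells that are zero or missing (None)."""
    cells = [value for row in table.values() for value in row.values()]
    empty = sum(1 for value in cells if value is None or value == 0)
    return empty / len(cells)


# Hypothetical server with 1,000 pages; each visitor touched a single page.
pages = [f"page{i}" for i in range(1, 1001)]
table = {}
for visitor, visited in [("id-1", "page4"), ("id-2", "page1")]:
    row = dict.fromkeys(pages, 0.0)
    row[visited] = 1.0
    table[visitor] = row

share_empty = sparsity(table)  # 1,998 of 2,000 cells are zero
```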

11
The Curse of Dimensionality
  • Suppose interest lies in a second classification
    column too; e.g., both time (hour) and page
    visited
  • Transpose method now produces 1,024 (1,000 + 24)
    new variables, assuming no interest in interactions
  • If interactions are of interest, then there will
    be 24,000 (1,000 x 24) new variables generated
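The two column counts follow from simple arithmetic, shown here as a short check. The variable names are illustrative; the figures are the ones on the slide.

```python
n_pages = 1000  # distinct pages, from the slide
n_hours = 24    # distinct hours of the day

# No interactions: one column per page plus one column per hour.
main_effect_columns = n_pages + n_hours

# With interactions: one column per (page, hour) combination.
interaction_columns = n_pages * n_hours
```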

12
General Fix
  • Reduce the number of levels of the categorical
    variable (e.g., using confidence)
  • Use the transpose method to convert the
    transactional table to a modeling table
  • Add the new inputs to the traditional modeling
    table in an effort to improve predictive accuracy

13
Creating Rules-Based Dummy Variables
  • Obtain rules using market basket analysis
  • Choose the rule of interest
  • Identify folks having the rule of interest in
    their market basket
  • Create a dummy variable flagging them
  • Augment the traditional modeling table with the
    dummy variable
  • Use the dummy variable as an input or target in a
    predictive modeling tool
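The flagging steps can be sketched with SQL run through Python's built-in sqlite3 module. This is an assumption-laden illustration, not the SAS code from the presentation: the table name trans, the column has_rule, the sample rows, and the rule page1 → page2 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table trans (id text, page text)")
conn.executemany(
    "insert into trans values (?, ?)",
    [("id-1", "page1"), ("id-1", "page2"), ("id-2", "page1"), ("id-3", "page2")],
)

# Rule of interest (hypothetical): page1 -> page2.
antecedent, consequent = "page1", "page2"

# Flag each id with 1 if its basket contains both sides of the rule, else 0.
query = """
    select id,
           case when sum(case when page = ? then 1 else 0 end) > 0
                 and sum(case when page = ? then 1 else 0 end) > 0
                then 1 else 0 end as has_rule
    from trans
    group by id
    order by id
"""
dummy = dict(conn.execute(query, (antecedent, consequent)).fetchall())
conn.close()
```

The resulting dummy maps each id to 0 or 1 and can be joined to the traditional modeling table like any other input.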

14
Using SQL to Identify Folks Having a Rule of
Interest in Their Market Basket
15
Creating a Rule-Based Dummy Variable
16
The All-Info Table
17
Feature Discovery: A new potential predictor or
input
18
Possible Sub-setting Criteria
  • Any rule of interest
  • The confidence - e.g., all rules having
    confidence of 100% (optimal level of confidence?)
  • The support - e.g., all rules having support >
    10 (optimal level of support?)
  • The lift - e.g., all rules having lift > 5
    (optimal level of lift?)
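For reference, the three criteria can be computed as below using the standard textbook definitions. This is a sketch only: a given market basket tool may scale its output differently (for example, as percentages), the function name rule_metrics and the baskets are invented, and the code assumes the antecedent and consequent each occur in at least one basket.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent.

    Values are fractions, so 0.8 corresponds to 80%.
    """
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent in b)
    n_c = sum(1 for b in baskets if consequent in b)
    n_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = n_both / n             # share of baskets containing both items
    confidence = n_both / n_a        # share of antecedent baskets with consequent
    lift = confidence / (n_c / n)    # confidence relative to consequent's base rate
    return support, confidence, lift


# Hypothetical baskets for illustration.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"milk"},
    {"beer"},
]
support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
```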

19
Using Confidence as the Basis for a
Reclassification Scheme
  • Suppose diapers → beer has a confidence of 100%
  • Then the two levels diapers and beer can be
    mapped into the value diapers → beer, it seems
  • Actually, both the rule and its reverse must have
    a confidence of 100%

20
The Confidence Reclassification Scheme
  • If confidence for both the rule and its reverse is
    > 80%, then combine the two levels into the
    rule-based level
  • e.g., page1 and page2 are both mapped into
    page1 → page2
  • Using 80% instead of 100% will introduce
    inaccuracy, but the analyst overwhelmed with too
    many levels will likely be willing to trade
    a little accuracy for dimensional reduction
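A minimal sketch of the reclassification idea follows: two levels are merged only when the confidence of the rule and of its reverse both exceed a threshold. The function names, the 0.8 default, the combined label format, and the sample baskets are assumptions for illustration; the code also assumes each level appears in at least one basket.

```python
def confidence(baskets, a, b):
    """Confidence of a -> b: share of baskets containing a that also contain b."""
    with_a = [basket for basket in baskets if a in basket]
    return sum(1 for basket in with_a if b in basket) / len(with_a)


def reclassify(baskets, a, b, threshold=0.8):
    """Map levels a and b to one combined level if both directions pass."""
    if confidence(baskets, a, b) > threshold and confidence(baskets, b, a) > threshold:
        combined = f"{a}->{b}"
        return [
            {combined if item in (a, b) else item for item in basket}
            for basket in baskets
        ]
    return baskets


# Hypothetical baskets: page1 and page2 always occur together.
baskets = [
    {"page1", "page2"},
    {"page1", "page2", "page3"},
    {"page3"},
]
merged = reclassify(baskets, "page1", "page2")
unchanged = reclassify(baskets, "page1", "page3")
```

In merged, page1 and page2 collapse into a single level, so the transpose produces one column where it previously produced two; page1 and page3 fail the test and are left alone.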

21
The Confidence Reclassification Scheme
  • Use the transpose method to generate candidate
    predictors
  • Augment the traditional modeling table with the
    new candidate predictors table
  • Develop an enhanced model using some of the
    candidate predictors in the hope of fostering
    predictive accuracy

22
Contact Information
  • Sanford.Gayle_at_sas.com