Analysis of Reliance Home Comfort (RHC) Survey Data (fragment)


1
Analysis of Reliance Home Comfort (RHC) Survey
Data (fragment)
2
Objectives
  • Show the potential of Business Intelligence
    Solutions in the development and analysis of a
    complex survey study
  • Illustrate the benefits of combining statistical
    and data mining approaches in survey data
    analysis
  • Formulate new, important business questions that
    can be answered only within the data mining
    modeling paradigm

3
Brief description of the 2008 Reliance Home Comfort
(RHC) Brand Ad Tracking Study
  • The study evaluates client awareness of, and
    ability to recognize, the 7 most popular
    Canadian home comfort products and services
    companies:
  • Reliance Home Comfort, Direct Energy, Lennox,
    Carrier, Air One, Sears, Home Depot
  • The household phone survey was conducted by
    agents who asked customers to identify at most 3
    of those 7 companies. Therefore, the number of
    recognized companies could be between 0 and 3.
  • The questionnaire contained about 300 questions,
    but because it had a hierarchical structure, the
    average time to complete the survey was
    approximately 15 minutes.
  • Examples of questions:
  • When you think of COMPANIES that provide
    ESSENTIAL HOME COMFORT products and services,
    which company comes to mind FIRST?
  • Have you seen or heard any advertising from any
    companies that provide ESSENTIAL HOME COMFORT
    products and services in the past 3 months?

4
Executive Summary: BI Solutions
  • Business Intelligence Solutions (BIS) is a
    well-established statistical/data mining/GIS
    company that conducts business in the USA and
    Canada.
  • Our specialization is complex, unstructured
    business problems for data-rich firms. Our
    multidisciplinary team includes professionals in
    applied statistics, data mining, GIS, and
    software application development.
  • Our employees include professionals with PhD
    degrees in diverse quantitative fields: Applied
    Statistics, Data Mining/Machine Learning,
    Operations Research, and Differential Equations.
  • The team members are authors of more than 100
    published papers on diverse applications of data
    mining and other quantitative fields to market
    research, customer relationship management,
    pilot study design, etc.
  • BIS has access to the best statistical,
    visualization, data mining, and GIS software on
    the world market.
  • The essence of our approach is to understand and
    analyze our clients' business problems and the
    corresponding data through the prism of
    dissimilar statistical/data mining models. As a
    result, we are always able to produce the best
    possible model/results and help our clients in
    the most effective and scientifically sound way.

5
Exploratory Data Analysis (EDA) and Data
Complexity
6
Example of Data Transformation
Exploratory Data Analysis (EDA) and data
preprocessing are vital steps of any data
analysis project.
Chart: Original First Response (Q9) Frequency
(original data, 22 categories)
Chart: Modified First Response (Q9) Frequency
(transformed data, 9 categories)
The categories MorEnergy, Prestige Home Comfort,
and Roy Inch & Sons have no variance and do not
produce useful information in the analysis.
Therefore, these categories should be aggregated.
This example demonstrates the necessity of these
preliminary steps: the predictability of the
constructed variable Modified First Response
turns out to be much higher than that of the
original First Response (Q9).
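The aggregation of sparse categories described on this slide can be sketched in Python. This is a minimal illustration only: the response counts, the threshold min_count, and the label "Other" are assumptions made for the example and are not taken from the study.

```python
from collections import Counter

def collapse_rare_categories(responses, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(responses)
    return [r if counts[r] >= min_count else other_label for r in responses]

# Hypothetical first-response answers (not the study's data).
raw = (["Reliance Home Comfort"] * 6 + ["Direct Energy"] * 5 +
       ["MorEnergy", "Prestige Home Comfort", "Roy Inch & Sons"])

# min_count=3 is an arbitrary illustrative threshold.
modified = collapse_rare_categories(raw, min_count=3)

print("Original categories:", len(set(raw)))
print("Modified categories:", len(set(modified)))
print(Counter(modified))
```

Categories that occur only once carry almost no information for a model, so merging them reduces the number of levels without discarding any respondents.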
7
Modified First Response (Q9) by Region (Q1)
The company that comes to mind FIRST (Q9) differs
significantly across regions (Q1); the p-value of
the Chi-Square test is 0.0003.
Chart axes: Region, Binary Q9
For Sudbury/Thunder Bay residents, Reliance Home
Comfort comes to mind FIRST 6 times more often
than for Hamilton residents. Contrasting RHC with
the aggregated other companies (Other), we can
note that Other has a practically uniform
distribution. Therefore, the advertising/marketing
of RHC in Burlington, Hamilton, and Oakville has
to be improved.
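A Chi-Square test of independence of the kind cited on this slide can be sketched with the Python standard library. The contingency table below is hypothetical (three unnamed regions by RHC/Other) and does not reproduce the study's counts or its p-value; the function names are illustrative.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X > x) of a chi-square variable with dof degrees of freedom."""
    a, y = dof / 2.0, x / 2.0
    if y <= 0:
        return 1.0
    # Series expansion of the regularized lower incomplete gamma function P(a, y).
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= y / (a + k)
        total += term
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_independence(table):
    """Pearson chi-square test of independence for a 2-D table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)

# Hypothetical counts: rows are regions, columns are (RHC first, Other first).
table = [[30, 70],
         [10, 90],
         [20, 80]]

stat, dof, p_value = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, dof = {dof}, p-value = {p_value:.5f}")
```

A small p-value indicates that the share of respondents naming RHC first is unlikely to be the same in every region.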
8
The 5-point scale statements (questions Q74a -
Q74f) should be analyzed separately for
interviewees who heard about the company by word
of mouth and for those who did not.
Spearman correlation is a non-parametric
(distribution-free) measure of the relationship
between two variables.
Q70a (How did you hear about the company: Word of
mouth / Recommendation) = Yes: just 2 pairs of
questions out of 15 have non-significant
correlation.
Q70a (How did you hear about the company: Word of
mouth / Recommendation) = No: 10 pairs of
questions out of 15 have non-significant
correlation.
The two groups have different correlation
structures.
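The Spearman correlation used on this slide is the Pearson correlation of the ranks, with tied answers given the mean of their ranks. The sketch below assumes hypothetical 5-point answers; it also shows why six questions (Q74a to Q74f) yield 15 pairs. Significance testing of each coefficient is not shown.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

questions = ["Q74a", "Q74b", "Q74c", "Q74d", "Q74e", "Q74f"]
pairs = list(combinations(questions, 2))
print("Number of question pairs:", len(pairs))

# Hypothetical 5-point answers from seven respondents (not the study's data).
answers_a = [1, 2, 3, 4, 5, 5, 4]
answers_b = [2, 1, 3, 5, 4, 5, 4]
print("Spearman correlation:", round(spearman(answers_a, answers_b), 3))
```

Running the same computation separately on the Yes and No subgroups of Q70a gives the two correlation structures that the slide compares.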
9
Exploratory Data Analysis summary
  • RHC survey data analysis requires sophisticated
    approaches due to the high complexity of the data.
  • The complexity can be characterized by:
  • High dimensionality (about 300
    attributes/questions)
  • Uncharacterizable non-linearities
  • Hierarchy among attributes
  • Presence of differently scaled attributes
    (numeric, binary, and nominal)
  • The vast majority of attributes are nominal
  • A large percentage of categorical attributes with
    huge numbers of categories and non-uniform
    frequency distributions
  • A large percentage of missing values for some
    attributes/predictors

10
Data Mining Application (Decision Tree and
TreeNet) to Survey Data Analysis
11
Fragment of Decision Tree for Binary First
Response (Q9). Binary First Response: RHC or
OTHER
Example of an If-Then scenario that can be
answered by the Decision Tree: if all interviewees
gave the highest score to the quality of RHC
products and services, how would the probability
of First Response = RHC change?
Providing high-quality products and services is a
great predictor of Binary First Response.
The probability of First Response = RHC jumps by
about 100%, from 0.11 for the whole sample to
0.21 for interviewees who experienced good
quality.
RHC has a weak association with low quality of
products and services.
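As a quick arithmetic check of the jump described on this slide, the relative change between the two rounded probabilities can be computed directly. The function and variable names are illustrative; only the two probabilities come from the slide.

```python
def relative_change(baseline, new_value):
    """Relative change from baseline to new_value, expressed as a fraction."""
    return (new_value - baseline) / baseline

p_whole_sample = 0.11  # P(First Response = RHC) for the whole sample (from the slide)
p_good_quality = 0.21  # same probability for interviewees who experienced good quality

change = relative_change(p_whole_sample, p_good_quality)
print(f"Relative increase: {change:.0%}")
```

With the rounded figures shown, the increase works out to roughly 91%, that is, the probability nearly doubles.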
12
TreeNet Intro
  • TreeNet (Stochastic Gradient Boosting) was
    invented in 1999 by Stanford University Professor
    Jerome Friedman. It is among the most flexible
    and powerful data mining tools.
  • Salford Systems, a California-based data mining
    software development company
    (http://www.salford-systems.com), implemented and
    commercialized this invention as the TreeNet
    product in 2003. It was the first stochastic
    gradient boosting tool in the data mining
    industry worldwide.
  • Intensive research has shown that TreeNet
    models are among the most accurate of any known
    modeling techniques.
  • A TreeNet model is a non-parametric, non-linear
    regression and can be described as a linear
    combination of a large number of small trees.
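The idea of a model built as a linear combination of many small trees can be illustrated with a generic stochastic gradient boosting sketch. This is not TreeNet's implementation: it assumes squared-error loss, a single predictor, one-split trees (stumps), and hypothetical data, and every function name and default value below is chosen for the example.

```python
import random

def fit_stump(x, residuals):
    """Find the one-split regression tree (stump) that minimizes squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left) +
               sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def fit_boosted_stumps(x, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Each round fits a stump to the residuals of a random subsample."""
    rng = random.Random(seed)
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    n_sub = max(2, int(subsample * len(y)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(y)), n_sub)   # the "stochastic" part
        xs = [x[i] for i in idx]
        rs = [y[i] - pred[i] for i in idx]       # negative gradient of squared error
        if len(set(xs)) < 2:
            continue                             # no valid split in this subsample
        stump = fit_stump(xs, rs)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, xi) for p, xi in zip(pred, x)]
    return base, learning_rate, stumps

def predict(model, xi):
    """Prediction = constant + learning_rate * sum of all stump outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)

def mean_squared_error(model, x, y):
    return sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)

# Hypothetical one-predictor data with a non-linear relationship.
x = [i / 10 for i in range(40)]
y = [xi ** 2 for xi in x]

model = fit_boosted_stumps(x, y)
baseline = (sum(y) / len(y), 0.1, [])  # constant-only model with no trees
print("MSE, constant only:", round(mean_squared_error(baseline, x, y), 3))
print("MSE, boosted stumps:", round(mean_squared_error(model, x, y), 3))
```

Each stump is a weak model on its own; the accuracy comes from adding many of them, each scaled by a small learning rate.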

13
Drivers of Modified First Response (Q9)
  • The most important predictor of the values of
    First Response (Q9) is Q75a (Age of interviewee)
  • Q1 (Region) and Q78 (Income) are examples of
    predictors with a modest impact on First Response
    (Q9)
  • Q8 (Gender) is an example of a predictor that
    has no impact on First Response (Q9)

Chart: predictor importance for the probability
of Modified First Response = RHC
14
Misclassification Rate: TreeNet model for
Modified First Response (Q9) prediction
Cost Matrix
The cost of a correct classification equals 0, and
the cost of an incorrect classification equals 1.
Prediction Accuracy (learning data: 60%)
The Percent Error is the smallest for Reliance
Home Comfort (best accuracy): Pct Error = 0.00%.
The Percent Error is the largest for Union Energy
(worst accuracy): Pct Error = 27.59%. On average,
the Percent Error for Modified First Response
across all 9 categories is 15.79%.
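The per-category Percent Error and the 0/1 cost matrix on this slide can be illustrated with a small confusion-matrix calculation. The counts below are hypothetical and use only two categories; they do not reproduce the study's figures, and the function names are illustrative.

```python
def percent_error_by_class(confusion):
    """confusion[actual][predicted] holds counts; returns percent error per actual class."""
    result = {}
    for actual, row in confusion.items():
        total = sum(row.values())
        wrong = sum(count for predicted, count in row.items() if predicted != actual)
        result[actual] = 100.0 * wrong / total
    return result

def zero_one_cost(actual, predicted):
    """Cost matrix from the slide: 0 for a correct classification, 1 otherwise."""
    return 0 if actual == predicted else 1

def average_cost(confusion, cost):
    """Average misclassification cost per case under the given cost function."""
    total = sum(sum(row.values()) for row in confusion.values())
    weighted = sum(cost(actual, predicted) * count
                   for actual, row in confusion.items()
                   for predicted, count in row.items())
    return weighted / total

# Hypothetical confusion matrix (actual category -> predicted category -> count).
confusion = {
    "RHC":   {"RHC": 20, "Other": 0},
    "Other": {"RHC": 5,  "Other": 15},
}

per_class = percent_error_by_class(confusion)
overall = average_cost(confusion, zero_one_cost)
print(per_class)
print("Overall misclassification rate:", overall)
```

Under a 0/1 cost matrix, the average cost is simply the overall share of misclassified cases.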
15
TreeNet model: Impact of "You mentioned that you
are familiar with RHC" (Q15) on the Probability of
Binary First Response (Q9) = RHC, controlling for
all other predictors
Using the TreeNet model, it is possible to answer
diverse If-Then business questions. For example,
if the response Telemarketing were increased by
10%, how would the probability of First Response
= RHC change?
Chart labels: the highest positive impact on the
Probability of First Response (Q9) = RHC; the
highest negative impact on the Probability of
First Response (Q9) = RHC
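One common way to estimate the impact of a single predictor while controlling for all others is partial dependence: force the predictor to a chosen value for every respondent, keep the other answers as observed, and average the model's predictions. Whether the chart on this slide was produced exactly this way is an assumption. In the sketch below, toy_model is a hypothetical stand-in for a fitted model, and the feature names and rows are invented for illustration.

```python
def partial_dependence(model, rows, feature, values):
    """Average model prediction with `feature` forced to each value in turn."""
    curve = {}
    for value in values:
        predictions = [model({**row, feature: value}) for row in rows]
        curve[value] = sum(predictions) / len(predictions)
    return curve

def toy_model(row):
    """Hypothetical stand-in for a fitted model's P(First Response = RHC)."""
    return 0.05 + 0.10 * row["familiar_with_rhc"] + 0.02 * row["saw_ad"]

# Hypothetical respondents (not the study's data).
rows = [
    {"familiar_with_rhc": 0, "saw_ad": 0},
    {"familiar_with_rhc": 1, "saw_ad": 0},
    {"familiar_with_rhc": 0, "saw_ad": 1},
    {"familiar_with_rhc": 1, "saw_ad": 1},
]

curve = partial_dependence(toy_model, rows, "familiar_with_rhc", [0, 1])
for value, probability in curve.items():
    print(f"familiar_with_rhc = {value}: average probability = {probability:.2f}")
```

The same averaging answers If-Then questions: change the answers of a chosen share of respondents, re-score everyone, and compare the average probability before and after.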
16
TreeNet summary
  • The TreeNet algorithm has about 20 different
    options that can be controlled by a researcher.
  • Using the default options did not produce a good
    model.
  • Determining the best set of options (the optimal
    model) is time-consuming and requires experience
    and expertise.
  • TreeNet is an appropriate tool for the analysis
    of complex survey data.
  • TreeNet is a perfect tool for:
  • Prediction and scoring
  • Estimation of the probability of an event of
    interest
  • Identification of predictor importance and
    drivers
  • If-Then scenario analysis

17
Conclusion
  • Typical survey data analysis questions are:
  • Segmenting respondents
  • Identifying the drivers of a question of interest
  • Relationships between different survey questions
  • Predictability of the answer to a question under
    consideration
  • Diverse If-Then scenarios
  • Combining primary and secondary data to answer
    unique business questions
  • The essence of our approach is to understand and
    analyze our clients' business problems and the
    corresponding data through the prism of
    dissimilar statistical/data mining models.
  • The synergy of data mining and traditional
    statistics makes it possible to extract the
    maximum useful information from complex survey
    data.
  • As a result, we are always able to produce the
    best possible model/results and help our clients
    in the most effective and scientifically sound
    way.
