Title: Analysis of Reliance Home Comfort RHC Survey Data fragment
1. Analysis of Reliance Home Comfort (RHC) Survey Data (fragment)
2. Objectives
- To show the potential of a Business Intelligence solution in the development and analysis of a complex survey study
- To illustrate the fruitfulness of the synergy of statistical and data mining approaches in survey data analysis
- To formulate new, important business questions that can be answered only within the data mining modeling paradigm
3. Brief description of the 2008 Reliance Home Comfort (RHC) Brand Ad Tracking Study
- The study evaluates customers' awareness of, and ability to recognize, the 7 most popular providers of home comfort products and services in Canada:
  - Reliance Home Comfort
  - Direct Energy
  - Lennox
  - Carrier
  - Air One
  - Sears
  - Home Depot
- The household phone survey was conducted by agents who asked customers to identify at most 3 of those 7 companies. Therefore, the number of recognized companies per respondent is between 0 and 3.
- The questionnaire contained about 300 questions, but it had a hierarchical structure, so the average time to complete the survey was approximately 15 minutes.
- Examples of questions:
  - When you think of COMPANIES that provide ESSENTIAL HOME COMFORT products and services, which company comes to mind FIRST?
  - Have you seen or heard any advertising from any companies that provide ESSENTIAL HOME COMFORT products and services in the past 3 months?
4. Executive Summary: BI Solutions
- Business Intelligence Solutions (BIS) is a well-established statistical/data mining/GIS company that conducts business in the USA and Canada.
- Our specialization is complex, unstructured business problems for data-rich firms. Our multidisciplinary team includes professionals in applied statistics, data mining, GIS, and software application development.
- Among our employees are professionals with PhD degrees in diverse quantitative fields: Applied Statistics, Data Mining/Machine Learning, Operations Research, and Differential Equations.
- The team members are authors of more than 100 published papers on diverse applications of data mining and other quantitative fields to market research, customer relationship management, pilot study design, etc.
- BIS has access to the best statistical, visualization, data mining, and GIS software on the world market.
- The essence of our approach is to understand and analyze our client's business problem and the corresponding data through the prism of dissimilar statistical/data mining models. As a result, we are always able to produce the best possible model/results and help our clients in the most effective and scientifically sound way.
5. Exploratory Data Analysis (EDA) and Data Complexity
6. Example of Data Transformation
Exploratory Data Analysis (EDA) and data preprocessing are a vital step of any data analysis project.
[Figure: Original First Response (Q9) frequency; original data, 22 categories]
[Figure: Modified First Response (Q9) frequency; transformed data, 9 categories]
The categories MorEnergy, Prestige Home Comfort, and Roy Inch & Sons have no variance and do not produce useful information in the analysis. Therefore, these categories should be aggregated.
This example demonstrates the necessity of these preliminary steps: it turns out that the predictability of the constructed variable Modified First Response is much higher than that of the original First Response (Q9).
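The aggregation of uninformative categories described on this slide can be sketched in Python. This is only an illustration, not the procedure used in the study: the function name `collapse_rare_categories`, the `min_count` threshold, the label "Other", and the sample answers are all hypothetical.

```python
from collections import Counter

def collapse_rare_categories(values, min_count=5, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical first-response answers (NOT the survey data).
responses = (["Reliance Home Comfort"] * 6
             + ["Direct Energy"] * 5
             + ["MorEnergy"]
             + ["Prestige Home Comfort"])

modified = collapse_rare_categories(responses, min_count=5)
print(Counter(modified))
```

A frequency threshold is just one possible rule; the slide's own criterion (categories with no variance) may have been applied differently.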
7. Modified First Response (Q9) by Region (Q1)
The company that comes to mind FIRST (Q9) differs significantly across regions (Q1); the p-value of the Chi-Square test is 0.0003.
[Figure: Binary Q9 by Region]
For Sudbury/Thunder Bay residents, Reliance Home Comfort comes to mind FIRST 6 times more often than for Hamilton residents. Contrasting RHC with the aggregate of the other companies (Other), we can note that Other has a practically uniform distribution. Therefore, the advertising/marketing of RHC in Burlington, Hamilton, and Oakville has to be improved.
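The Chi-Square test of independence behind this slide can be sketched as follows. The counts are hypothetical (they are not the survey data), and the function name `chi_square_independence` is an illustrative choice. The closed-form p-value used here is valid only for 2 degrees of freedom; in practice a library routine such as `scipy.stats.chi2_contingency` would normally be used.

```python
import math

def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are regions, columns are (RHC, Other).
table = [
    [30, 70],  # region A
    [10, 90],  # region B
    [20, 80],  # region C
]
stat, dof = chi_square_independence(table)

# For df = 2 ONLY, the chi-square upper tail has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2) if dof == 2 else None
print(f"chi-square = {stat:.2f}, df = {dof}, p = {p_value}")
```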
8. The 5-point scale statements (questions Q74a - Q74f) should be analyzed separately for interviewees who heard about the company by word of mouth and for those who did not
Spearman correlation is a non-parametric (distribution-free) measure of the monotonic relationship between two variables.
Q70a (how the interviewee heard about the company: Word of mouth / Recommendation) = Yes:
- Just 2 pairs of questions out of 15 have a non-significant correlation
Q70a (how the interviewee heard about the company: Word of mouth / Recommendation) = No:
- 10 pairs of questions out of 15 have a non-significant correlation
The two groups have a different correlation structure.
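A minimal sketch of the per-group Spearman analysis is shown below. The ratings are hypothetical and cover only three of the six Q74 statements; the helper names are illustrative. Significance testing is omitted, so the sketch only shows how the pairwise coefficients are obtained separately for each group.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical 5-point ratings, split by the Q70a answer (NOT the survey data).
groups = {
    "Yes": {"Q74a": [5, 4, 4, 3, 5, 2], "Q74b": [5, 4, 3, 3, 4, 2], "Q74c": [4, 5, 4, 2, 5, 1]},
    "No":  {"Q74a": [1, 3, 5, 2, 4, 3], "Q74b": [3, 3, 1, 5, 2, 4], "Q74c": [2, 5, 3, 1, 4, 4]},
}
for answer, ratings in groups.items():
    for q1, q2 in combinations(sorted(ratings), 2):
        print(answer, q1, q2, round(spearman(ratings[q1], ratings[q2]), 3))
```

With all six statements Q74a to Q74f, `combinations` yields the 15 pairs mentioned on the slide.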
9. Exploratory Data Analysis summary
- RHC survey data analysis requires sophisticated approaches due to the high complexity of the data.
- The complexity can be characterized by:
  - High dimensionality (about 300 attributes/questions)
  - Uncharacterizable non-linearities
  - Hierarchy among attributes
  - Presence of differently scaled attributes (numeric, binary, and nominal)
  - Vast majority of attributes are nominal
  - Large percentage of categorical attributes with huge numbers of categories and non-uniform frequency distributions
  - Large percentage of missing values for some attributes/predictors
10. Data Mining Application (Decision Tree and TreeNet) to Survey Data Analysis
11. Fragment of Decision Tree for Binary First Response (Q9)
Binary First Response: RHC or OTHER.
Example of an If-Then scenario that can be answered by a Decision Tree: if all interviewees gave the highest score to the quality of RHC products and services, how would the probability of First Response = RHC change?
- Providing high-quality products and services is a strong predictor of Binary First Response.
- The probability of First Response = RHC nearly doubles (an increase of about 100%), from 0.11 for the whole sample to 0.21 for interviewees who experienced good quality.
- RHC has a weak association with low quality of products and services.
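The arithmetic behind this If-Then reading of a tree node can be sketched as below. The records, field names (`quality`, `first_rhc`), and helper names are hypothetical; only the two probabilities 0.11 and 0.21 come from the slide. With those rounded values the relative increase works out to about 91%, that is, close to doubling.

```python
def relative_change(p_before, p_after):
    """Relative change of a probability, as a fraction of the starting value."""
    return (p_after - p_before) / p_before

def node_probability(records, target, condition=None):
    """Share of records where record[target] is true, optionally within a node."""
    node = [r for r in records if condition is None or condition(r)]
    return sum(1 for r in node if r[target]) / len(node)

# Probabilities quoted on the slide (rounded to two decimals).
print(f"{relative_change(0.11, 0.21):.0%}")

# Hypothetical interview records (NOT the survey data).
records = [
    {"quality": 5, "first_rhc": True},
    {"quality": 5, "first_rhc": False},
    {"quality": 5, "first_rhc": False},
    {"quality": 5, "first_rhc": True},
    {"quality": 3, "first_rhc": False},
    {"quality": 2, "first_rhc": False},
    {"quality": 4, "first_rhc": False},
    {"quality": 1, "first_rhc": False},
    {"quality": 3, "first_rhc": False},
    {"quality": 4, "first_rhc": False},
]
overall = node_probability(records, "first_rhc")
high_quality = node_probability(records, "first_rhc", lambda r: r["quality"] == 5)
print(overall, high_quality, relative_change(overall, high_quality))
```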
12. TreeNet Intro
- TreeNet (Stochastic Gradient Boosting) was invented in 1999 by Stanford University Professor Jerome Friedman. It is among the most flexible and powerful data mining tools.
- Salford Systems, a California-based data mining software development company (http://www.salford-systems.com), implemented and commercialized this invention as the TreeNet product in 2003. It was the first stochastic gradient boosting tool in the data mining industry.
- Intensive research has shown that TreeNet models are among the most accurate of any known modeling technique.
- A TreeNet model is a non-parametric, non-linear regression that can be described as a linear combination of a large number of small trees.
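The idea of a model built as a linear combination of many small trees can be illustrated with a generic stochastic gradient boosting sketch. This is not TreeNet's implementation: it uses one-split trees (stumps), squared-error loss, a single numeric predictor, and synthetic data, and every name and parameter value below is an assumption chosen for illustration.

```python
import math
import random

def fit_stump(xs, residuals):
    """Best single-split regression tree (stump) under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    if best is None:  # all x identical in this subsample: constant tree
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def boost(xs, ys, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Return a model: base value + learning_rate * sum of small trees."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    trees = []
    k = min(len(xs), max(2, int(subsample * len(xs))))
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), k)           # the "stochastic" part
        residuals = [ys[i] - preds[i] for i in idx]   # negative gradient of squared error
        tree = fit_stump([xs[i] for i in idx], residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def model(x):
        return base + learning_rate * sum(t(x) for t in trees)
    return model

# Synthetic data (NOT the survey data).
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]
model = boost(xs, ys)

mean_y = sum(ys) / len(ys)
mse_baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_baseline, mse_boosted)
```

Each tree is fitted to the current residuals on a random subsample, and the final prediction is the weighted sum of all trees, which is what "linear combination of small trees" refers to.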
13. Drivers of Modified First Response (Q9)
- The most important predictor of the First Response (Q9) is Q75a (Age of interviewee).
- Q1 (Region) and Q78 (Income) are examples of predictors with a modest impact on First Response (Q9).
- Q8 (Gender) is an example of a predictor that has no impact on First Response (Q9).
[Figure: Predictor importance for the probability of Modified First Response = RHC]
14. Misclassification Rate: TreeNet model for Modified First Response (Q9) prediction
Cost Matrix: the cost of a correct classification equals 0, and the cost of an incorrect classification equals 1.
Prediction Accuracy (learning data, 60%):
- The Percent Error is the smallest for Reliance Home Comfort (best accuracy): Pct Error = 0.00%.
- The Percent Error is the largest for Union Energy (worst accuracy): Pct Error = 27.59%.
- On average, the Percent Error for Modified First Response across all 9 categories is 15.79%.
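Under a 0/1 cost matrix, the percent error per category is simply the share of misclassified cases within each actual category. The sketch below uses hypothetical labels, not the study's predictions, and the function name is illustrative. It reports the overall error rate as well; the slide does not state how its average was computed.

```python
from collections import defaultdict

def percent_error_by_class(actual, predicted):
    """Percent of cases in each actual class that were predicted incorrectly."""
    totals, errors = defaultdict(int), defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        errors[a] += int(a != p)  # 0/1 cost: 0 if correct, 1 if incorrect
    return {c: 100.0 * errors[c] / totals[c] for c in totals}

# Hypothetical actual and predicted first responses (NOT the survey data).
actual = ["RHC"] * 4 + ["Union Energy"] * 4 + ["Direct Energy"] * 2
predicted = (["RHC"] * 4
             + ["Union Energy", "RHC", "Union Energy", "Union Energy"]
             + ["Direct Energy", "RHC"])

per_class = percent_error_by_class(actual, predicted)
overall = 100.0 * sum(a != p for a, p in zip(actual, predicted)) / len(actual)
print(per_class, overall)
```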
15. TreeNet model: Impact of "You mentioned that you are familiar with RHC" (Q15) on the Probability of Binary First Response (Q9) = RHC, controlling for all other predictors
Using the TreeNet model, it is possible to answer diverse If-Then business questions. For example, if the response Telemarketing increased by 10%, how would the probability of First Response = RHC change?
[Figure: Q15 responses with the highest positive impact and the highest negative impact on the Probability of First Response (Q9) = RHC]
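One common way to read the effect of a single predictor while controlling for the others is a partial dependence calculation: force the predictor to a given value for every respondent, keep all other answers as they are, and average the predicted probabilities. The slide does not say how its chart was produced, so this is an assumption. The stand-in model `toy_model`, its coefficients, the field names, and the records below are all hypothetical.

```python
def partial_dependence(model, rows, feature, values):
    """Average prediction when `feature` is forced to each value in turn."""
    result = {}
    for v in values:
        preds = [model({**row, feature: v}) for row in rows]
        result[v] = sum(preds) / len(preds)
    return result

# Hypothetical stand-in for a fitted model: returns P(First Response = RHC).
def toy_model(row):
    p = 0.05
    if row["familiar_with_rhc"] == "yes":
        p += 0.20
    if row["region"] == "Sudbury/Thunder Bay":
        p += 0.10
    return p

# Hypothetical respondents (NOT the survey data).
rows = [
    {"familiar_with_rhc": "yes", "region": "Hamilton"},
    {"familiar_with_rhc": "no", "region": "Sudbury/Thunder Bay"},
    {"familiar_with_rhc": "no", "region": "Oakville"},
    {"familiar_with_rhc": "yes", "region": "Sudbury/Thunder Bay"},
]
pd_q15 = partial_dependence(toy_model, rows, "familiar_with_rhc", ["yes", "no"])
print(pd_q15)
```

The same function answers an If-Then question for any other predictor by changing `feature` and `values`.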
16. TreeNet summary
- The TreeNet algorithm has about 20 different options that can be controlled by a researcher.
- Using the default options did not produce a good model.
- Determining the best set of options (the optimal model) is time consuming and requires experience and expertise.
- TreeNet is an appropriate tool for the analysis of complex survey data.
- TreeNet is well suited for:
  - Prediction and scoring
  - Estimation of the probability of an event of interest
  - Identification of predictor importance and drivers
  - If-Then scenario analysis
17. Conclusion
- Typical survey data analysis questions are:
  - Segmenting respondents
  - Identifying the drivers of a question of interest
  - Relationships between different survey questions
  - Predictability of the answer to a question under consideration
  - Diverse If-Then scenarios
  - Combining primary and secondary data to answer a unique business question
- The essence of our approach is to understand and analyze our client's business problem and the corresponding data through the prism of dissimilar statistical/data mining models.
- The synergy of data mining and traditional statistics makes it possible to extract the maximum useful information from complex survey data.
- As a result, we are always able to produce the best possible model/results and help our clients in the most effective and scientifically sound way.