Title: Combining Cross Sectional Market Research Surveys: An Application of Statistical Matching
- Catherine Frethey-Bentham
- Most quantitative market research in New Zealand uses cross-sectional data
Useful because:
- We get a snapshot of a given situation at any point in time
- Relatively simple, easy and timely to conduct
- E.g., change in the proportion of mobile phone users from one year to the next
- Allows estimation of net changes, that is, changes at the aggregate level
- Representative of the population
- But cross-sectional data is not always ideal
- Consider this situation:
[Figure: Time 1 and Time 2]
Limitations of Cross-Sectional Surveys
- Net changes only
- Which segment did segment five gain most of its members from?
- Are there other changes between segments that are not obvious on the surface?
- No past history, or memory, to see what happens to individuals
- Require data that provide more detail about how people are changing
- Now, consider this situation:
[Figure: Time 1 and Time 2]
Example: Profiling Change
Rows: Market Segments - time 1. Columns: Market Segments - time 2 (1 2 3 4 5).
- Segment 1: 95
- Segment 2: 73, 25
- Segment 3: 96
- Segment 4: 62, 37
- Longitudinal data has many benefits, including:
- Some phenomena are inherently longitudinal in nature
- As researchers we should be able to capture this
- E.g., links between current events and outcomes and past history
- Such as current brand choice and past brand choice
Longitudinal Data Permit
- Tracing the dynamics of behaviours
- Can observe how circumstances change with time spent in state
- Identifying the influence of past behaviours on current behaviours
- Ability to make causal inference enhanced by temporal ordering
- Repeated observations on individuals allow for the possibility of controlling for unobserved individual characteristics (measurement error)
- But longitudinal data is not without its limitations
- Time
- Cost
- Attrition rates estimated at 6-50% between two survey rounds
A Little Background to my Problem
- Otago Lifestyles Study
- Four cross-sectional consumer lifestyles studies undertaken by The University of Otago
- Exploring psychographics, consumer lifestyles and demographics of New Zealanders at a point in time
- Initial desire to conduct a study to explore change in lifestyle segments over time, but problems with repeated cross-sectional studies
- Study 1: 1989
- Study 2: 1995/96
- Study 3: 2000
- Study 4: 2005
Note: Samples are independent but drawn from the same population.
- There must be another way
- Data Fusion???
Data Fusion
- Data Fusion (Statistical Matching)
- Typically involves matching each unit in one database with similar (but not identical) units in another database
- Traditional Uses
- Typically used to explore dependency relationships when one data set contains independent variables and another contains dependent variables
- E.g., media research: to merge dependent and independent variables into one data set
- Media consumption and product purchased
Research Objective
- To use data fusion (statistical matching) methods to develop a methodology capable of modelling gross change using repeated cross-sectional surveys
- The intention is to create pseudo panel data that depicts change in consumer lifestyles over time
[Figure: at time one, 45 purchase brand X; at time two, 35 still purchase brand X and 10 don't]
Research Summary
[Diagram labels: Time 1, Time 2; Respondent Set A, Respondent Set B; Ageing Characteristics, Common Variables, Matching Cohorts, Population Weightings]
Note: Respondent Set A and Respondent Set B are independent of one another.
Design over Multiple Time Periods
Research Summary
Respondent Set A (Time 1):
- Matching Characteristics (t1), e.g., Age, Gender
- Changeable Characteristics (t1), e.g., TV Viewing
- Cluster Membership (t1), e.g., 4
Respondent Set B (Time 2):
- Matching Characteristics (t2), e.g., Age, Gender
- Changeable Characteristics (t2), e.g., TV Viewing
- Cluster Membership (t2), e.g., 6
Merged data set, one row per match:
- Columns: Match (t1), Match (t2), Behavioural (t1), Behavioural (t2), Cluster (t1), Cluster (t2)
- Match 1: ..., Cluster (t1) = 4, Cluster (t2) = 6
- Match 2: ...
Example: Profiling Change
Creating a single merged database allows the exploration of gross change across time.
Rows: Market Segments - time 1. Columns: Market Segments - time 2 (1 2 3 4 5).
- Segment 1: 95
- Segment 2: 73, 25
- Segment 3: 96
- Segment 4: 62, 37
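A table of this kind can be tabulated directly from the merged data set. The following is a minimal sketch, assuming each matched record reduces to a pair (segment at time 1, segment at time 2) and that the cells are row percentages; the function name `transition_table` and the sample pairs are hypothetical and are not the study's data.

```python
from collections import Counter


def transition_table(pairs, segments_t1, segments_t2):
    """Row percentages: of those in segment i at time 1,
    the share found in segment j at time 2."""
    counts = Counter(pairs)
    table = {}
    for s1 in segments_t1:
        row_total = sum(counts[(s1, s2)] for s2 in segments_t2)
        table[s1] = {
            s2: (100.0 * counts[(s1, s2)] / row_total if row_total else 0.0)
            for s2 in segments_t2
        }
    return table


# Hypothetical matched records: (segment at time 1, segment at time 2).
pairs = [(1, 1)] * 19 + [(1, 2)] * 1 + [(2, 2)] * 15 + [(2, 5)] * 5

table = transition_table(pairs, segments_t1=[1, 2], segments_t2=[1, 2, 5])
for s1, row in table.items():
    print(s1, {s2: round(pct, 1) for s2, pct in row.items()})
```

Reading along a row shows where the members of a time-1 segment ended up, which is the gross change that the segment totals alone cannot reveal.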
Procedure
- Step One
- Validation Study
- To test how accurate the results are compared to panel data
- Using ACNielsen's PanelViews (Australia) panel data
Consider the question:
Do individuals report buying more or fewer environmentally friendly products than before?
Initial Results
Example: "I try to buy environmentally friendly products" (ENVIRON)
[Figure: Year 2000 and Year 2001 responses]
So, does it always work this well?
Main Assumption of Matching
- Conditional Independence Assumption (CIA)
- Y1 and Y2 are conditionally independent given X
- I.e., the common variables contain all the information about the relationship between Y1 and Y2
[Diagram: Behavioural Traits (Y1), e.g., purchase habits, at Time 1; Behavioural Traits (Y2), e.g., purchase habits, at Time 2; Common Variables (Xi), e.g., income, materialism]
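Written out, the assumption says that the joint distribution of the two behavioural variables factorises once the common variables are fixed. The notation f for a density or probability function is an assumption of this note, not taken from the slides.

```latex
f(y_1, y_2 \mid x) = f(y_1 \mid x)\, f(y_2 \mid x)
\quad\Longrightarrow\quad
f(x, y_1, y_2) = f(y_1 \mid x)\, f(y_2 \mid x)\, f(x)
```

Each factor on the right can be estimated from a single cross-sectional survey, which is why the matched file is only as good as this assumption.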
Overcoming the CIA Problem
- Can utilise multiple regression as a tool to help choose common variables
- A large R-square when a Y variable is regressed on common (X) variables (at any given time period) is a necessary, but not sufficient, condition for a good match
- A good match is also dependent on the patterns of the residuals from regression analyses
- We must use the panel data to obtain these residuals
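The check described above can be sketched as follows. This is an illustration only: it uses a one-predictor least-squares fit as a stand-in for the multiple regression, and the helper names and the small panel data set are hypothetical. The same panel members are observed in both years, so the residuals from the two regressions can be compared person by person.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (single predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    b0, b1 = fit_line(x, y)
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def r_squared(x, y):
    mean_y = sum(y) / len(y)
    ss_res = sum(r * r for r in residuals(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def correlation(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sum((a - mean_u) ** 2 for a in u) ** 0.5
    sd_v = sum((b - mean_v) ** 2 for b in v) ** 0.5
    return cov / (sd_u * sd_v)


# Hypothetical panel: one common variable x and the same item in two years.
x = [1, 2, 3, 4, 5, 6]
y_2000 = [2.5, 3.5, 6.5, 7.5, 10.5, 11.5]
y_2001 = [3.5, 4.5, 7.5, 8.5, 11.5, 12.5]

r2_2000 = r_squared(x, y_2000)
r2_2001 = r_squared(x, y_2001)
residual_corr = correlation(residuals(x, y_2000), residuals(x, y_2001))
print(round(r2_2000, 3), round(r2_2001, 3), round(residual_corr, 3))
```

In this made-up data both R-square values are high, yet the residuals from the two years move together almost perfectly. That is the situation in which a large R-square is not sufficient: something the common variable does not capture still links the two time periods.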
Patterns in the residuals
- Looking at the residuals from the panel data
[Figure: two panels, Variable ENVIRON and Variable BUDGET, each plotting Residuals - 2000 against Residuals - 2001]
- There is still some pattern here that our common variables are unable to explain
- Good outcome: no/little pattern in the residuals
Procedure
- Step Two
- Simulation Study
- Undertaken to simulate, and understand the effects of, different scenarios that might occur
- How does the method perform when non-matching variables change slowly, systematically, rapidly/instantaneously?
- How does the method perform with different sample sizes?
Procedure
- Step Three
- Apply Method
- Using TNS New Zealand Lifestyle and Opinions Survey
- Series of cross-sectional studies collected annually
- Same lifestyles and demographic questions collected at each phase
- Data collected between 1998 and 2005
- Sample of approximately 8000 individuals at each period, randomly selected from the New Zealand population
Issues for Further Investigation
- Must have same variables across studies
- Cannot account for new (and different) members of the population
- Error introduced between time periods due to matching
Technical Notes
- Preparation of the Matching Framework
- Population weightings
- All data weighted so as to be representative of the Australian population
- Matching cohorts based on age groups
- 16-34, 35-44, 45-54, 55+
- Choice of common variables
- Between seven and ten common variables used: a combination of demographic and psychographic variables
- Data adjusted for change over time using secondary data from the Australian Bureau of Statistics
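The cohort step can be sketched in a few lines. This is a sketch under stated assumptions: the last age group is read as 55 and over, and time-one respondents are aged by the gap between surveys before a cohort is assigned, which is one plausible reading of the "ageing characteristics" in the research summary. The names `cohort` and `aged_cohort` are hypothetical.

```python
# Age groups from the technical notes; the open-ended top group is an assumption.
COHORTS = [(16, 34, "16-34"), (35, 44, "35-44"), (45, 54, "45-54"), (55, None, "55+")]


def cohort(age):
    """Return the matching-cohort label for an age, or None if under 16."""
    for low, high, label in COHORTS:
        if age >= low and (high is None or age <= high):
            return label
    return None


def aged_cohort(age_at_time_one, years_between_surveys):
    """Cohort a time-one respondent would fall into at time two (assumed ageing rule)."""
    return cohort(age_at_time_one + years_between_surveys)


print(cohort(34), aged_cohort(34, 1), cohort(70))
```

Matching would then be restricted to donors and recipients that share a cohort label, so that a respondent is never paired with someone from a clearly different age group.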
Technical Notes
- Use of unconstrained matching algorithm
- Nearest neighbour method: achieves best matches possible at an individual unit level
- Results from survey at time one (donors) fused onto those at time two (recipients)
- Minimum distance between matched elements is good
- Using Gower's distance
- Initially a few elements matched too many times
- Constraints on the number of matches imposed where necessary
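The matching step can be illustrated with a small sketch. It is not the study's implementation: it assumes records are dictionaries, computes Gower's distance with range-scaled numeric variables and simple mismatch for categorical ones, and enforces the cap on donor reuse greedily in recipient order. The names `gower_distance`, `match` and `max_uses`, the age range of 40, and the sample records are all hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower's distance between two records with the same keys.

    Numeric variables contribute |a - b| / range; categorical variables
    contribute 0 if equal and 1 otherwise. The result is the mean contribution.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            total += abs(a[key] - b[key]) / numeric_ranges[key]
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def match(recipients, donors, common, numeric_ranges, max_uses=None):
    """Pick the nearest available donor for each recipient on the common variables.

    max_uses caps how often one donor may be used (None means unconstrained).
    Returns a list of (recipient_index, donor_index, distance).
    """
    uses = [0] * len(donors)
    result = []
    for r_idx, rec in enumerate(recipients):
        best = None
        for d_idx, don in enumerate(donors):
            if max_uses is not None and uses[d_idx] >= max_uses:
                continue  # this donor has reached its cap
            dist = gower_distance({k: rec[k] for k in common},
                                  {k: don[k] for k in common},
                                  numeric_ranges)
            if best is None or dist < best[1]:
                best = (d_idx, dist)
        if best is None:
            raise ValueError("no donor available; raise max_uses or add donors")
        uses[best[0]] += 1
        result.append((r_idx, best[0], best[1]))
    return result


# Hypothetical records: donors from time one, recipients from time two.
donors = [
    {"age": 30, "gender": "F", "segment": 1},
    {"age": 50, "gender": "M", "segment": 4},
    {"age": 36, "gender": "F", "segment": 2},
]
recipients = [
    {"age": 31, "gender": "F", "segment": 1},
    {"age": 32, "gender": "F", "segment": 5},
    {"age": 52, "gender": "M", "segment": 5},
]
common = ["age", "gender"]
ranges = {"age": 40.0}  # assumed spread of ages, used to scale the numeric term

unconstrained = match(recipients, donors, common, ranges)
constrained = match(recipients, donors, common, ranges, max_uses=1)

# Pseudo-panel rows: (segment at time one from the donor, segment at time two).
pseudo_panel = [(donors[d]["segment"], recipients[r]["segment"])
                for r, d, _ in constrained]
print([d for _, d, _ in unconstrained], [d for _, d, _ in constrained], pseudo_panel)
```

Without a cap, the first donor is used twice; with `max_uses=1` the second recipient falls back to the next-closest donor, at the price of a slightly larger distance. That trade-off between match quality and donor reuse is the reason the notes impose constraints only where necessary.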