Combining Cross Sectional Market Research Surveys: An Application of Statistical Matching - PowerPoint PPT Presentation

Combining Cross-Sectional Market Research Surveys: An Application of Statistical Matching

Catherine Frethey-Bentham

Most Quantitative Market Research in New Zealand
uses Cross-sectional Data

Useful because

We get a snapshot of a given situation at any point in time

Relatively simple, easy and timely to conduct

Allows estimation of net changes, that is, changes at the aggregate level
E.g., change in the proportion of mobile phone users from one year to the next

Representative of the population

But cross-sectional data is not always ideal.
Consider this situation

[Figure: Time 1 vs. Time 2]
Limitations of Cross-Sectional Surveys

Net changes only
Which segment did segment five gain most of
its members from?
Are there other changes between segments that
are not obvious on the surface?
No past history ("memory") showing what happens to individuals
We require data that provide more detail about how people are changing

Now, consider this situation

[Figure: Time 1 vs. Time 2]
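Example: Profiling Change
[Table: market segment membership at time 1 (rows, segments 1-4) against time 2 (columns, segments 1-5). Recoverable cell values: segment 1: 95; segment 2: 73 and 25; segment 3: 96; segment 4: 62 and 37. Column positions are not recoverable.]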
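The difference between net change (all that repeated cross-sections reveal) and gross change (what a panel shows) can be illustrated with a short Python sketch. The function name `net_and_gross` and the toy segment data are hypothetical; the sketch simply assumes each individual's segment is known at both times, as in a panel or a matched pseudo-panel.

```python
from collections import Counter


def net_and_gross(t1, t2):
    """Compare net and gross change for paired segment memberships.

    t1[i] and t2[i] are the segments of the same individual i at time 1 and
    time 2. Returns (net, table): net[s] is the change in segment s's share of
    the sample; table[a][b] is the percentage of time-1 segment a found in
    segment b at time 2.
    """
    segments = sorted(set(t1) | set(t2))
    n = len(t1)
    before, after = Counter(t1), Counter(t2)
    # Net change: difference in segment shares between the two times
    net = {s: (after[s] - before[s]) / n for s in segments}
    # Gross change: row percentages of the time-1 by time-2 cross-tabulation
    pairs = Counter(zip(t1, t2))
    table = {
        a: {b: 100 * pairs[(a, b)] / before[a] for b in segments}
        for a in segments
        if before[a] > 0
    }
    return net, table


# Two hypothetical toy panels with identical segment sizes at both times
stable = net_and_gross([1, 1, 2, 2], [1, 1, 2, 2])
churn = net_and_gross([1, 1, 2, 2], [2, 1, 1, 2])
print(stable)
print(churn)
```

Both toy panels show zero net change, yet in the second one half of each segment has switched; only the paired (gross) table reveals this.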

Longitudinal data has many benefits, including:

Some phenomena are inherently longitudinal in nature
As researchers we should be able to capture this
E.g., links between current events and outcomes and past history, such as current brand choice and past brand choice

Longitudinal Data Permit

Tracing the dynamics of behaviours
Can observe how circumstances change with time spent in a state
Identifying the influence of past behaviours on current behaviours
Ability to make causal inferences is enhanced by temporal ordering
Repeated observations on individuals allow for the possibility of controlling for unobserved individual characteristics (measurement error)

But longitudinal data is not without its limitations

Time
Cost

Panel attrition: attrition rates estimated at 6-50% between two survey rounds

Panel conditioning
A Little Background to my Problem

Otago Lifestyles Study
Four cross-sectional consumer lifestyles studies
undertaken by The University of Otago.
Exploring psychographics, consumer lifestyles and
demographics of New Zealanders at a point in time
Initial desire to conduct a study exploring change in lifestyle segments over time, but repeated cross-sectional studies pose problems for this

Study 1: 1989
Study 2: 1995/96
Study 3: 2000
Study 4: 2005
Note: Samples are independent but drawn from the same population.

There must be another way
Data Fusion???

Data Fusion

Data Fusion (Statistical Matching)
Typically involves matching each unit in one
database with similar (but not identical) units
in another database
Traditional Uses
Typically used to explore dependency
relationships when one data set contains
independent variables and another contains
dependent variables
E.g., media research: merging independent and dependent variables into one data set, such as media consumption and products purchased
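The core matching step can be sketched in Python with NumPy: each recipient record is paired with its most similar donor on the common variables, and the donor's unique variable is copied across. This is a minimal illustration with hypothetical names; it uses standardised Euclidean distance for brevity, whereas the technical notes at the end of this presentation mention Gower's distance.

```python
import numpy as np


def nearest_neighbour_fuse(recipient_x, donor_x, donor_y):
    """Unconstrained nearest-neighbour statistical matching.

    recipient_x: (n_r, k) common variables in the recipient file
    donor_x:     (n_d, k) the same common variables in the donor file
    donor_y:     (n_d,)   a variable observed only in the donor file
    Returns the donor index chosen for each recipient and the imputed values.
    """
    recipient_x = np.asarray(recipient_x, dtype=float)
    donor_x = np.asarray(donor_x, dtype=float)
    # Standardise with donor-file statistics so that no variable dominates
    # the distance merely because of its units
    mu = donor_x.mean(axis=0)
    sd = donor_x.std(axis=0)
    sd[sd == 0] = 1.0
    r = (recipient_x - mu) / sd
    d = (donor_x - mu) / sd
    # Squared Euclidean distance between every recipient and every donor
    dist = ((r[:, None, :] - d[None, :, :]) ** 2).sum(axis=2)
    idx = dist.argmin(axis=1)
    return idx, np.asarray(donor_y)[idx]


# Hypothetical common variables: (age, owns a car 0/1)
idx, y = nearest_neighbour_fuse([[22, 1], [68, 1]],
                                [[20, 1], [50, 0], [70, 1]],
                                ["A", "B", "C"])
print(idx, y)
```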

Research Objective

To use data fusion (statistical matching) methods
to develop a methodology capable of modelling
gross change using repeated cross-sectional
surveys
The intention is to create pseudo panel data
that depicts change in consumer lifestyles over
time

[Figure: at time one, 45% purchase brand X; at time two, 35% still purchase brand X and 10% don't]
Research Summary
[Diagram: Respondent Set A (Time 1) and Respondent Set B (Time 2), linked via ageing characteristics, common variables, matching cohorts and population weightings]
Note: Respondent Set A and Respondent Set B are independent of one another
Design over Multiple Time Periods
Research Summary
Respondent Set A (Time 1): Matching Characteristics (t1), e.g., age, gender | Changeable Characteristics (t1), e.g., TV viewing | Cluster Membership (t1), e.g., 4
Respondent Set B (Time 2): Matching Characteristics (t2), e.g., age, gender | Changeable Characteristics (t2), e.g., TV viewing | Cluster Membership (t2), e.g., 6
Merged file: Match (t1) | Match (t2) | Behavioural (t1) | Behavioural (t2) | Cluster (t1) | Cluster (t2)
Match 1: ... | ... | ... | ... | 4 | 6
Match 2: ...
Example: Profiling Change
[Table: the time 1 by time 2 market segment cross-tabulation from the earlier "Example Profiling Change" slide, repeated]
Creating a single merged database allows the exploration of gross change across time
Procedure

Step One
Validation Study
To test how accurate the results are compared to
panel data
Using ACNielsen's PanelViews (Australia) panel data
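One way to quantify how closely fused results reproduce the panel is to compare the joint (time 1, time 2) response distributions. The presentation does not state which accuracy measure was used; total variation distance is an assumption here, and the function names and toy data are hypothetical. Following the design in the technical notes, time-one answers are the donated (imputed) ones.

```python
from collections import Counter


def joint_distribution(y1, y2):
    """Share of respondents in each (time 1, time 2) response combination."""
    n = len(y1)
    return {pair: count / n for pair, count in Counter(zip(y1, y2)).items()}


def total_variation(p, q):
    """Total variation distance: 0 = identical distributions, 1 = no overlap."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical validation data for four panel members
y2 = [1, 1, 2, 2]          # their own time-2 answers
y1_panel = [1, 2, 2, 2]    # their true time-1 answers (known from the panel)
y1_fused = [1, 1, 2, 2]    # time-1 answers donated by matched respondents
error = total_variation(joint_distribution(y1_panel, y2),
                        joint_distribution(y1_fused, y2))
print(error)
```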

Initial Results

Consider the question.
Do individuals report buying more or fewer
environmentally friendly products than before?
Initial Results
Example: "I try to buy environmentally friendly products" (ENVIRON)
[Figures: responses for Year 2000 and Year 2001]
So, does it always work this well?

No such luck!

[Figure: Response 2001]
Main Assumption of Matching

Conditional Independence Assumption (CIA)
Y1 and Y2 are conditionally independent given X
I.e., the common variables contain all the
information about the relationship between Y1 and
Y2

[Diagram: Common Variables (X), e.g., income, materialism, linked to Behavioural Traits (Y1) at Time 1, e.g., purchase habits, and Behavioural Traits (Y2) at Time 2, e.g., purchase habits]
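Formally, in the notation commonly used in the statistical matching literature (here f denotes a density or probability function; this notation is an assumption, not taken from the slides), the CIA states:

```latex
Y_1 \perp\!\!\!\perp Y_2 \mid X
\iff
f(y_1, y_2 \mid x) = f(y_1 \mid x)\, f(y_2 \mid x)
\;\Rightarrow\;
f(y_1, y_2) = \int f(y_1 \mid x)\, f(y_2 \mid x)\, f(x)\, dx
```

Each factor on the right can be estimated from a single cross-section (f(y1 | x) at time 1, f(y2 | x) at time 2), which is why matching works under the CIA and why it cannot recover any link between Y1 and Y2 that X does not carry.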
Overcoming the CIA Problem

Can utilise multiple regression as a tool to help
choose common variables
A large R-square when a Y variable is regressed
on common (X) variables (at any given time
period) is a necessary, but not sufficient,
condition for a good match
A good match is also dependent on the patterns of
the residuals from regression analyses
We must use the panel data to obtain these
residuals
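The regression check described above can be sketched with NumPy. The function names and simulated data are hypothetical, and the sketch assumes the same time-invariant common variables X at both periods and linear relationships. Under those assumptions a residual correlation near zero is consistent with the CIA, while a clearly non-zero one signals structure the common variables do not explain.

```python
import numpy as np


def r_squared_and_residuals(y, X):
    """Ordinary least squares of y on X (plus intercept): R-squared and residuals."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred), resid


def residual_link(y1, y2, X):
    """Panel check: R-squared at each time and the correlation between the
    time-1 and time-2 residuals of the same individuals."""
    r2_1, e1 = r_squared_and_residuals(y1, X)
    r2_2, e2 = r_squared_and_residuals(y2, X)
    return r2_1, r2_2, float(np.corrcoef(e1, e2)[0, 1])


# Simulated panel (hypothetical): u is a stable trait not captured by X,
# so the residuals at the two times are strongly correlated
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 1))
u = rng.normal(size=n)
y1 = X[:, 0] + u + 0.1 * rng.normal(size=n)
y2 = X[:, 0] + u + 0.1 * rng.normal(size=n)
print(residual_link(y1, y2, X))
```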

Patterns in the Residuals

Looking at the residuals from the panel data

[Figures: residuals for 2000 against residuals for 2001, for the variables ENVIRON and BUDGET]
There is still some pattern here that our common variables are unable to explain
Good outcome: no or little pattern in the residuals
Procedure

Step Two
Simulation Study
Undertaken to simulate, and understand the
effects of, different scenarios that might occur
How does the method perform when non-matching
variables change slowly, systematically,
rapidly/instantaneously?
How does the method perform with different sample
sizes?
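A toy version of such a simulation can be sketched in Python with NumPy. Everything here is a hypothetical set-up rather than the study's design: one common variable x, a yes/no behaviour driven by x, and a `flip_prob` controlling how much the behaviour changes between waves (near 0 = slow change, 1 = complete re-draw). The two waves are treated as independent cross-sections, matched on x, and the matched "still yes" share is compared with the true panel value; `n` can be varied to explore sample size.

```python
import numpy as np


def simulate_and_match(n, flip_prob, seed=0):
    """Return (true, matched) share of time-1 'yes' respondents still 'yes' at time 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=2 * n)
    p = 1.0 / (1.0 + np.exp(-2.0 * x))            # P(yes | x)
    y1 = rng.random(2 * n) < p
    # Each person's answer is re-drawn between waves with probability flip_prob
    redraw = rng.random(2 * n) < flip_prob
    y2 = np.where(redraw, rng.random(2 * n) < p, y1)
    true_stay = (y1 & y2).sum() / y1.sum()

    # Two independent cross-sections: first half seen at time 1, second at time 2
    xa, ya = x[:n], y1[:n]      # donors (time 1)
    xb, yb = x[n:], y2[n:]      # recipients (time 2)
    # Nearest neighbour on the single common variable via sorted search
    order = np.argsort(xa)
    xs = xa[order]
    pos = np.clip(np.searchsorted(xs, xb), 1, n - 1)
    left_closer = (xb - xs[pos - 1]) <= (xs[pos] - xb)
    donor = order[np.where(left_closer, pos - 1, pos)]
    y1_fused = ya[donor]
    matched_stay = (y1_fused & yb).sum() / y1_fused.sum()
    return true_stay, matched_stay


for flip in (0.0, 0.5, 1.0):
    print(flip, simulate_and_match(5000, flip))
```

With `flip_prob=0` the true share is 1, but matching can only return roughly what x predicts, showing how persistent non-matching behaviour violates the CIA; with `flip_prob=1` the CIA holds by construction and the two shares should be close.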

Procedure

Step Three
Apply Method
Using TNS New Zealand Lifestyle and Opinions
Survey
Series of cross-sectional studies collected
annually
Same lifestyles and demographic questions
collected at each phase
Data collected between 1998 and 2005
Sample of approximately 8000 individuals at each period, randomly selected from the New Zealand population

Thank You!

Issues for Further Investigation

Must have the same variables across studies
Cannot account for new (and different) members of
the population
Error introduced between time periods due to
matching

Technical Notes

Preparation of the Matching Framework
Population weightings
All data weighted to be representative of the Australian population
Matching cohorts based on age groups
16-34, 35-44, 45-54, 55+
Choice of common variables
Between seven and ten common variables used, a combination of demographic and psychographic variables
Data adjusted for change over time using
secondary data from The Australian Bureau of
Statistics
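The age cohorts above can be expressed as a small lookup so that matching only happens within a cohort. The ageing step shown, adding the years elapsed to a time-1 respondent's age before assigning a cohort, is one plausible reading of "ageing characteristics" and is hypothetical, as are the function names.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower bounds of the matching cohorts: 16-34, 35-44, 45-54, 55+
COHORT_BOUNDS = [16, 35, 45, 55]
COHORT_LABELS = ["16-34", "35-44", "45-54", "55+"]


def cohort(age):
    """Map an age to its matching cohort label (None if under 16)."""
    i = bisect_right(COHORT_BOUNDS, age) - 1
    return COHORT_LABELS[i] if i >= 0 else None


def age_donor(age_at_t1, years_elapsed):
    """Hypothetical ageing step: project a time-1 respondent's age to time 2."""
    return age_at_t1 + years_elapsed


def group_by_cohort(ages):
    """Indices of respondents in each cohort; matching is then done per group."""
    groups = defaultdict(list)
    for i, age in enumerate(ages):
        groups[cohort(age)].append(i)
    return dict(groups)


print(group_by_cohort([20, 40, 60, 30]))
print(cohort(age_donor(34, 1)))
```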

Technical Notes

Use of an unconstrained matching algorithm
Nearest neighbour method: achieves the best matches possible at the individual unit level
Results from the survey at time one (donors) fused onto those at time two (recipients)
Minimum distance between matched elements: good
Using Gower's distance
Initially a few elements were matched too many times
Constraints on the number of matches imposed where necessary
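Gower's distance and a cap on donor reuse can be sketched together in Python. Gower's distance here is the standard form (range-scaled absolute difference for numeric variables, 0/1 mismatch for categorical ones, averaged with equal weights); the greedy rule for imposing the cap is an assumption, since the slides do not say how the constraints were applied, and the function names are hypothetical. With a tight cap some recipients may be left unmatched (`None`).

```python
def gower_distance(a, b, is_numeric, ranges):
    """Gower's distance between two records: mean over variables of
    |a - b| / range for numeric variables and 0/1 mismatch for categorical ones."""
    total = 0.0
    for k, numeric in enumerate(is_numeric):
        if numeric:
            total += abs(a[k] - b[k]) / ranges[k] if ranges[k] > 0 else 0.0
        else:
            total += 0.0 if a[k] == b[k] else 1.0
    return total / len(is_numeric)


def constrained_match(recipients, donors, is_numeric, max_uses):
    """Assign each recipient a donor, using no donor more than max_uses times.
    Greedy heuristic: take the closest remaining (recipient, donor) pairs first."""
    cols = list(zip(*(list(recipients) + list(donors))))
    # Ranges of numeric variables, computed over both files combined
    ranges = [(max(c) - min(c)) if num else None for c, num in zip(cols, is_numeric)]
    pairs = sorted(
        (gower_distance(r, d, is_numeric, ranges), i, j)
        for i, r in enumerate(recipients)
        for j, d in enumerate(donors)
    )
    assigned, uses = {}, [0] * len(donors)
    for _, i, j in pairs:
        if i not in assigned and uses[j] < max_uses:
            assigned[i] = j
            uses[j] += 1
    return [assigned.get(i) for i in range(len(recipients))]


# Hypothetical records: (age, gender)
recipients = [(25, "F"), (26, "F"), (60, "M")]
donors = [(25, "F"), (59, "M")]
print(constrained_match(recipients, donors, [True, False], max_uses=3))
print(constrained_match(recipients, donors, [True, False], max_uses=1))
```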