Title: Complex Survey Samples


1
Complex Survey Samples
  • Explaining the Miracle: Statistics and Analysis
    in Public Health
  • APHEO Conference 2007, October 14-16, 2007

Susan Bondy, Department of Public Health
Sciences, University of Toronto
2
Outline
  • Goals of complex survey analysis
  • What is simple, what is complex
  • Issues and implications of complexities
  • Working with software
  • Tips for working with expert analysts

3
What we report from surveys
  • Descriptive statistics
  • Mean, median, counts, totals
  • Measures of difference, association and effect
  • diff, risk diff, OR, RR, rho, etc.
  • Always reported with expression of variance
  • Margin of Error (MOE or +/- part)
  • Confidence intervals
  • Point estimate versus variance
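
As a minimal sketch of how a point estimate and its expression of variance fit together, the following computes a proportion, its margin of error and a confidence interval. It assumes a simple random sample; the function name and the counts are hypothetical and are not taken from the presentation.

```python
import math

def proportion_with_moe(cases, n, z=1.96):
    """Point estimate, margin of error and CI for a proportion.

    Assumes a simple random sample; z=1.96 gives an approximate 95% interval.
    """
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)  # standard error under simple random sampling
    moe = z * se                     # the "+/-" part
    return p, moe, (p - moe, p + moe)

# Hypothetical numbers: 130 cases among 1000 respondents.
p, moe, (low, high) = proportion_with_moe(130, 1000)
print(f"{100 * p:.1f}% +/- {100 * moe:.1f} (95% CI {100 * low:.1f}-{100 * high:.1f})")
```

With a complex design the point estimate is computed the same way (with weights), but this standard error formula no longer applies.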

4
Meet two users of survey data
The Modeller
The Describer
5
  • The describer
  • Population inference is #1
  • ALWAYS need truly population-representative samples
  • Sometimes just descriptive statistics (rates)
  • Interest in comparisons
  • monitoring and surveillance (e.g., across time,
    space, sub-populations)
  • Consistency is important
  • The modeller
  • Hypothesis tests are #1
  • Analyses simulate controlled experiments
  • Rarely need truly population-representative samples
  • Interest in comparison
  • Replication of experiments
  • Differences between studies more interesting
  • Extending and testing theory

6
Complex samples
7
Simple Random Sample
  • Selection into sample is entirely at random
  • Each member of pop has same chance of being in
    the sample
  • No strata, no clusters, self-weighting
  • Statistically efficient (all observations are
    independent; tightest margins of error)

8
Complex designs
  • Selection by cluster
  • Stratification
  • Probability sample weights
  • Finite population correction
  • Worst of all
  • Mishmashes of all the above
  • where you can't have the information

9
Cluster sampling
10
Cluster sampling
  • E.g., people by FAMILY, students by CLASS, teeth
    by MOUTH, etc.
  • Now WELL recognized as a problem
  • Non-independence means loss of statistical power
    (variance understated, if ignored)
  • Need
  • New statistics textbooks
  • More expensive software
  • will return to software options

11
Sample logistic results
12
Repeat after me
  • Failure to account for non-independence of
    observations, in the analysis, will always result
    in an underestimation of variances
  • Confidence intervals narrower
  • p-values smaller
  • results less conservative
  • than they should be
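
A rough sense of how much variance is understated can be had from the common approximation DEFF = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intracluster correlation. The sketch below assumes roughly equal cluster sizes; the function names and the numbers are hypothetical illustrations.

```python
import math

def cluster_deff(avg_cluster_size, icc):
    """Approximate design effect from clustering: 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1) * icc

def corrected_se(naive_se, deff):
    """Inflate a standard error computed as if observations were independent."""
    return naive_se * math.sqrt(deff)

# Hypothetical survey: 1000 students sampled as 40 classes of 25, ICC = 0.05.
deff = cluster_deff(avg_cluster_size=25, icc=0.05)
effective_n = 1000 / deff
print(f"DEFF = {deff:.2f}, effective sample size = {effective_n:.0f}")
print(f"naive SE 0.010 becomes {corrected_se(0.010, deff):.4f}")
```

Even a small positive intracluster correlation inflates the variance noticeably once clusters are large, which is why the naive confidence intervals come out too narrow.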

13
Stratification
14
What is stratification?
  • Division of the target population into groups or
    layers from which samples are drawn
  • e.g., Plan for reports on
  • Youth
  • Smaller popn regions

15
Goals of stratification
  • For PLANNED descriptions of sub-populations
  • E.g., regions, age-groups
  • For design correction
  • To prevent extreme unrepresentativeness
  • e.g., empty groups, extreme weights
  • To improve precision of the overall (or full pop)
    estimates
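
To illustrate why stratification and weights go together, here is a small sketch of a stratified mean in which each stratum contributes in proportion to its population share. The data, population sizes and function name are made up for illustration, and the finite population correction is ignored.

```python
from statistics import mean, variance

def stratified_mean(strata):
    """Stratified estimate of a population mean and its variance.

    strata: list of (population_size, sample_values) pairs, one per stratum.
    Ignores the finite population correction.
    """
    total = sum(size for size, _ in strata)
    estimate = sum(size / total * mean(values) for size, values in strata)
    var = sum((size / total) ** 2 * variance(values) / len(values)
              for size, values in strata)
    return estimate, var

# Hypothetical: a large region and a deliberately oversampled small region,
# 10 respondents each (1 = has the outcome).
large = (9000, [1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
small = (1000, [1, 1, 0, 1, 1, 0, 1, 1, 1, 0])
estimate, var = stratified_mean([large, small])
unweighted = mean(large[1] + small[1])
print(f"weighted {estimate:.2f}, unweighted {unweighted:.2f}")
```

Because the small region was oversampled, the unweighted mean over-represents it; the stratum shares act as the weights that restore the population picture.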

Implications
16
They come as a pair
WEIGHTS
Stratification
17
Impact of weights in analysis
  • Impacts precision: a huge DEFF (design effect) issue
  • Other model problems
  • E.g., can create highly influential observations
  • Restricts software and analysis choices
  • When, why of weights?

18
Repeat after me
  • You knew clustering affected variance estimates
    and had to be taken into account
  • Sometimes WEIGHTS have an even bigger bad effect
    on precision !
  • Always use software and procedures specific to
    complex survey data, even when weighting is your
    only complexity.
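
One quick way to see how much weights alone cost in precision is Kish's approximation for the design effect due to unequal weighting, n * sum(w^2) / (sum(w))^2. The sketch below is only a rule-of-thumb check, not a replacement for survey software; the weights are hypothetical.

```python
def kish_weighting_deff(weights):
    """Kish's approximate design effect from unequal weights."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return n * total_sq / (total * total)

def effective_sample_size(weights):
    """Number of equally weighted observations with about the same precision."""
    return len(weights) / kish_weighting_deff(weights)

equal = [1.0] * 100
unequal = [1.0] * 90 + [10.0] * 10   # hypothetical: a few heavily up-weighted people
print(f"equal weights:   DEFF = {kish_weighting_deff(equal):.2f}")
print(f"unequal weights: DEFF = {kish_weighting_deff(unequal):.2f}, "
      f"effective n = {effective_sample_size(unequal):.0f}")
```

In this made-up case ten heavily up-weighted respondents reduce 100 interviews to roughly a third of that in effective sample size.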

19
But wait a minute, I've been told unweighted is
sometimes better
20
Scenario A
[Figure: points labelled "People up-weighted" and "People down-weighted"]
Weighted or unweighted gives the same slope!

21
Scenario B
  • Something correlated with relative weights is
    associated with a different slope

[Figure: Readiness to quit against Exposure to materials, with lines labelled "Low educ.", "Over educated" and "Weighted"]
22
Scenario C
[Figure: Annoyance ratings against Distance from airport (km), showing a weighted slope and an unweighted slope]
23
Scenario C
[Figure: Annoyance ratings against Distance from airport (km), showing a single weighted or unweighted curve]
24
Modeller's adage
  • If weighted and unweighted differ, then both are
    wrong
  • There must be a complex relationship, or better
    model, to find and describe
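
Scenarios A and B can be reproduced with a few lines of weighted least squares. This is a sketch on made-up, noise-free data: two groups share the same x values, one group carries weight 5 and the other weight 1, and the helper names are hypothetical.

```python
def wls_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def two_group_slopes(slope_up, slope_down):
    """Return (unweighted, weighted) slopes for an up- and a down-weighted group."""
    xs = list(range(10))
    x = xs + xs
    y = [1 + slope_up * xi for xi in xs] + [1 + slope_down * xi for xi in xs]
    w = [5.0] * 10 + [1.0] * 10      # first group up-weighted, second down-weighted
    return wls_slope(x, y, [1.0] * 20), wls_slope(x, y, w)

scenario_a = two_group_slopes(slope_up=2.0, slope_down=2.0)  # same slope in both groups
scenario_b = two_group_slopes(slope_up=0.5, slope_down=2.0)  # slope differs by group
print("Scenario A (unweighted, weighted):", scenario_a)
print("Scenario B (unweighted, weighted):", scenario_b)
```

When the two slopes disagree, as in Scenario B, neither single line describes both groups, which is the modeller's cue to look for the interaction.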

25
Public health epidemiologists are always DESCRIBERS
26
Scenario B
  • Something correlated with relative weights is
    associated with a different slope

[Figure: Readiness to quit against Exposure to materials, with lines labelled "Low educ." and "Over educated"]
Population-weighted is the TRUE population estimate of the
net or average effect
27
Model all possible interactions with age, sex and
geography strata?
  • Yes,
  • Do look for effect modification where there are
    good grounds (show net and specific data)
  • No,
  • In hundreds of age x sex x region strata, some random
    variation by chance
  • In large samples lots of meaningless interactions
    can be detected
  • Pop average effect is still pop average effect

28
Message so far
  • Can never ignore
  • Cluster sampling
  • Weighting
  • So, HOW to analyze data?

29
2 most commonly used methods for complex survey variance
estimation
  • Taylor series
  • aka linearized variance estimation
  • Bootstrap
  • Usually achieved using bootstrap replicate
    resampling weights
30
Taylor Series
  • Complex linear equations to estimate corrected
    variance for every estimate
  • Requires assumptions about data !
  • Normal distribution assumptions
  • Large sample sizes
  • Very difficult for user to know
  • when limits are being pushed
  • When procedure is accepted or controversial
  • Requires full design information
  • Even more approximate with more complex designs

31
Using Taylor-series type software
  • 1) Use syntax (or even boxes) to declare the
    following
  • Weight variable
  • Stratification variable
  • Group unit for cluster sampling
  • Primary sampling unit or PSU
  • (Ignore requests for finite population info)
  • 2) Run your analysis as available in software
  • Using only special commands for complex samples
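
To show what the software does with those three declarations, here is a from-scratch sketch of the linearized (Taylor series) variance of a weighted mean. It treats PSUs as sampled with replacement within strata, so no finite population information is used. The function name and data are hypothetical, and real survey procedures handle many more cases than this.

```python
from collections import defaultdict

def linearized_mean_variance(y, w, stratum, psu):
    """Weighted mean and its Taylor-linearized variance.

    y, w, stratum, psu are equal-length lists: outcome, weight variable,
    stratification variable and primary sampling unit for each respondent.
    """
    total_w = sum(w)
    est = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized score of each respondent, summed up to the PSU level.
    scores = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, stratum, psu):
        scores[h][j] += wi * (yi - est) / total_w
    var = 0.0
    for h, psu_totals in scores.items():
        n_h = len(psu_totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        zbar = sum(psu_totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psu_totals.values())
    return est, var

# Hypothetical data: the same 8 answers analysed as 2 clusters and as 8 independent units.
y = [1, 1, 1, 1, 5, 5, 5, 5]
w = [1.0] * 8
one_stratum = ["all"] * 8
est, var_clustered = linearized_mean_variance(y, w, one_stratum, psu=[0, 0, 0, 0, 1, 1, 1, 1])
_, var_independent = linearized_mean_variance(y, w, one_stratum, psu=list(range(8)))
print(f"mean {est:.1f}: clustered variance {var_clustered:.2f}, "
      f"as-if-independent variance {var_independent:.2f}")
```

The point estimate is identical in both runs; only the variance changes once the PSU is declared.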

32
Survey estimates
  • Prevalence 13.0 (95% CI 10.0-16.0)
  • Odds ratio 2.1 (95% CI 1.6-4.0)

Usual weighted point estimate
Variance calculated from a formula, then substituted
into things like CIs
33
Bootstrap variance weights
  • Sampling variability observed, not calculated
    from a fixed formula
  • Felt to reflect true sampling variability,
  • As due to chance alone if survey really repeated
    an infinite number of times
  • Virtually free of assumptions
  • Tends to be more appropriate and conservative
    when assumptions for linearization fail
  • Very broadly applicable

34
Creation of bootstrap replicate weights
  • Someone takes a lot of random COMPLEX sub-samples
    of the full survey dataset (500 times)
  • The full algorithm for popn weighting is applied
    to each sub-sample
  • When obs not in sample, weight = zero
  • Rest re-weighted to reflect popn again
  • RESULT
  • 500 weights,
  • When applied to full dataset, simulates taking
    500 samples again
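
The steps above can be sketched as follows. The presentation does not say which algorithm produced the weights, so this uses a simplified Rao-Wu style rescaled bootstrap as an assumption: within each stratum, n - 1 of the n PSUs are drawn with replacement and the weights are rescaled. The population re-weighting step (applying the full weighting algorithm to each sub-sample) is omitted, and all names and data are hypothetical.

```python
import random
from collections import defaultdict

def bootstrap_replicate_weights(w, stratum, psu, n_reps=500, seed=0):
    """Return n_reps lists of replicate weights, one weight per respondent."""
    rng = random.Random(seed)
    psus_by_stratum = defaultdict(set)
    for h, j in zip(stratum, psu):
        psus_by_stratum[h].add(j)
    replicates = []
    for _ in range(n_reps):
        factor = {}
        for h, psus in psus_by_stratum.items():
            ids = sorted(psus)
            n_h = len(ids)
            if n_h < 2:
                raise ValueError(f"stratum {h!r} needs at least two PSUs")
            draws = rng.choices(ids, k=n_h - 1)      # resample PSUs with replacement
            for j in ids:
                # PSUs not drawn get factor 0; the rest are scaled back up.
                factor[(h, j)] = n_h / (n_h - 1) * draws.count(j)
        replicates.append([wi * factor[(h, j)] for wi, h, j in zip(w, stratum, psu)])
    return replicates

# Hypothetical design: 2 strata, 3 PSUs each, one respondent per PSU.
w = [10.0] * 6
stratum = ["a", "a", "a", "b", "b", "b"]
psu = [0, 1, 2, 0, 1, 2]
reps = bootstrap_replicate_weights(w, stratum, psu, n_reps=50, seed=1)
print(reps[0])
```

Each replicate is a full-length weight column, so an analyst can rerun any estimate with it without ever resampling the data.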

35
Bootstrapping (with weights)
  • Point estimates taken from full sample
  • Mean 13.0
  • Same point estimate taken from 500 B.S. samples
  • Observed variability in 500 B.S. estimates
    becomes variance for mean of 13.0.
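
A minimal sketch of that calculation: the estimate is recomputed with each set of replicate weights, and the spread of those results around the full-sample estimate is taken as its variance. The tiny set of replicate weights here is hand-made for illustration (a real survey file would carry something like 500), and the function names are hypothetical.

```python
import math

def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def bootstrap_variance(y, full_weights, replicate_weights, estimator=weighted_mean):
    """Full-sample estimate plus variance observed across replicate estimates."""
    theta = estimator(y, full_weights)
    thetas = [estimator(y, rw) for rw in replicate_weights]
    var = sum((t - theta) ** 2 for t in thetas) / len(thetas)
    return theta, var

# Hypothetical data with four hand-made replicate weight sets.
y = [0, 1, 0, 1]
full_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
    [2.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
]
theta, var = bootstrap_variance(y, full_weights, replicate_weights)
half_width = 1.96 * math.sqrt(var)
print(f"estimate {theta:.2f}, 95% CI {theta - half_width:.2f} to {theta + half_width:.2f}")
```

The same function works for any estimator that accepts data and weights, which is what makes the approach so broadly applicable.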

36
Survey estimates
  • Prevalence 13.0 (95% CI 10.0-16.0)
  • Odds ratio 2.1 (95% CI 1.6-4.0)

Usual weighted point estimate
Variance reflects OBSERVED variance in 500
estimates of prev. and OR.
37
Software options (more?)
38
Beware
  • Stick to procedures custom-designed for complex
    survey samples
  • Will handle weights properly
  • Will give useful statistics, such as DEFF
  • Bootstrapping without a set of BS weights
  • If you aren't screaming in pain, you haven't got
    it right

39
Tips for working in partnership
  • Get a geek to generate lots of useful sets of BS
    Weights for your survey
  • e.g., your favourite standard popn
  • Does take expertise, but once done it benefits many,
    many users
  • Get a nerd to do only your variance corrections
    for you
  • Use your favourite software and keep very
    detailed programs (recodes, restrictions, etc)
  • Have them repeat very defined results tables

40
Embargoed
Not for release. Preliminary analyses pending
adjustment of variance estimates to account for
complex survey design
41
Q & A