Title: Winning the War of Attrition? Sampling, response analysis and weighting using the National Pupil Database
1Winning the War of Attrition? Sampling, response analysis and weighting using the National Pupil Database
- James Halse
- Young People Analysis, DCSF
- james.halse_at_dcsf.gsi.gov.uk
2Overview
- The way we were: sampling from school records for the Youth Cohort Studies (YCS)
- A new way of sampling for the Longitudinal Study of Young People in England (LSYPE)
- Analysis of response rates and non-response bias using NPD
- Weighting for non-response on LSYPE
- Applying the lessons learned to the next cohort of the YCS
3The way we were - the YCS
- Youth Cohort Studies were a multimode panel study of young people, starting in the spring after year 11 and following these young people 1, 2 and 3 years later
- In theory a simple random sample: the Department wrote to all schools and asked for names and addresses of pupils born on 3 dates within any month (e.g. 5th, 15th, 25th)
- Issued sample drawn from information provided by schools
- Some attempt to correct for school non-response
- For cohorts 11 and 12, attempt to increase the number of young people from ethnic minorities by over sampling in LAs with a high proportion of pupils from minority ethnic groups
4YCS response
- Non-response and attrition are a big problem
- Attempts to deal with this by increasing the
sample size
Response rate (per cent) at each sweep, by age of cohort. "Latest %" is the latest sweep achieved sample as a percentage of the initial issued sample; "Latest n" is the achieved sample size at the latest sweep.
Cohort  Initial issued sample  Age 16  Age 17  Age 18  Age 19  Latest %  Latest n
9       22,500                 65      66      65      76      21        4,800
10      25,000                 56      74      71      77      23        5,600
11      30,000                 56      76      75      79      21        6,200
12      30,000                 47      70      70      64      15        4,400
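To see how quickly attrition compounds, the sweep rates can be multiplied together. The following is a rough Python sketch that assumes each sweep's rate is conditional on response at the previous sweep; the slide does not state the base for each rate, so the product need not reproduce the reported "latest sweep" column exactly. The names `YCS_COHORTS` and `cumulative_retention` are illustrative only.

```python
from math import prod

# Issued sample and response rates (per cent) at the age 16, 17, 18 and
# 19 sweeps, copied from the table above.
YCS_COHORTS = {
    9: (22_500, [65, 66, 65, 76]),
    10: (25_000, [56, 74, 71, 77]),
    11: (30_000, [56, 76, 75, 79]),
    12: (30_000, [47, 70, 70, 64]),
}


def cumulative_retention(rates_pct):
    """Share of the issued sample still responding after every sweep.

    Assumes each rate is conditional on response at the previous sweep.
    """
    return prod(rate / 100 for rate in rates_pct)


for cohort, (issued, rates) in YCS_COHORTS.items():
    share = cumulative_retention(rates)
    print(f"Cohort {cohort}: {share:.1%} of issued sample, "
          f"roughly {issued * share:,.0f} cases")
```

Under this assumption, four sweeps with rates in the 60s and 70s leave only around a fifth of the issued sample, which is why increasing the issued sample alone is an expensive remedy.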
5Non-response bias
- But the real concern is differential
non-response, especially over 4 sweeps
YCS cohort 11 respondents at each sweep by year 11 attainment (per cent)
Year 11 attainment  Population  Sweep 1  Sweep 2  Sweep 3  Sweep 4
8+ A*-C             36          49       54       56       60
5-7 A*-C            15          17       16       16       15
1-4 A*-C            24          22       20       18       17
1+ D-G              20          9        8        7        6
None                4           4        3        2        1
6Achieved sample sizes by selected characteristics and sweep, YCS cohort 12
Characteristic      Sweep 1  Sweep 2  Sweep 3
Black Caribbean     152      92       60
Black African       193      140      98
Indian              495      380      279
Pakistani           382      260      179
Bangladeshi         156      106      79
Mixed               316      203      147
<5 D-G (no A*-C)    270      137      82
No qualifications   240      118      62
7YCS Weighting for non-response
- Cell weighting at sweep 1 (attainment, region, school type and sex)
- CHAID for sweep 2 onwards using information collected at previous sweeps
- Lowest response rate is at initial sweep, but this is the stage at which we have least information for non-response weighting
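Cell weighting is commonly implemented by giving every respondent in a cell the ratio of the population count to the respondent count for that cell. The slides do not spell out the exact YCS procedure, so the following is a minimal Python sketch under that assumption; the function name, cell labels and counts are made up for illustration.

```python
from collections import Counter


def cell_weights(population_cells, respondent_cells):
    """Return {cell: weight}, where weight = population count / respondent count.

    Each argument is an iterable with one cell label per person, for
    example a tuple such as (attainment band, sex).
    """
    population = Counter(population_cells)
    respondents = Counter(respondent_cells)
    weights = {}
    for cell, n_population in population.items():
        n_respondents = respondents.get(cell, 0)
        if n_respondents == 0:
            # An empty cell cannot be weighted up; cells must be merged first.
            raise ValueError(f"no respondents in cell {cell!r}")
        weights[cell] = n_population / n_respondents
    return weights


# Hypothetical example: two cells with different response rates.
population = [("high", "F")] * 4 + [("low", "M")] * 6
respondents = [("high", "F")] * 2 + [("low", "M")] * 2
print(cell_weights(population, respondents))
```

The cell with the lower response rate receives the larger weight, which is the source of the weight differentials noted later for the YCS.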
8Problems with the YCS
- Burden on schools to provide details for sample frame
- Boosting number of sample members from LAs or schools with high proportion of minority ethnic pupils was inefficient
- Declining response rates and differential non-response led to very small sample sizes for some groups by 3rd or 4th sweep
- Little information for sweep 1 non-response weighting
- Large differentials in non-response weights leading to large design effects and reduced sample efficiency (55% efficient at 11.4)
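The slide does not say how the efficiency figure was computed. A common approximation for the efficiency loss due to unequal weights is Kish's formula, (sum of weights) squared divided by (n times the sum of squared weights). The Python sketch below assumes that formula; the function names and example weights are illustrative.

```python
def kish_efficiency(weights):
    """Kish's approximate sample efficiency: (sum w)^2 / (n * sum w^2).

    Equals 1.0 when all weights are equal and falls as weights vary more.
    """
    n = len(weights)
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return (total * total) / (n * total_of_squares)


def effective_sample_size(weights):
    """Number of equally weighted cases giving about the same precision."""
    return len(weights) * kish_efficiency(weights)


# Hypothetical weights: three cases at 1 and one case weighted up to 3.
example = [1, 1, 1, 3]
print(kish_efficiency(example), effective_sample_size(example))
```

On this reading, an efficiency of 55% would mean the weighted sample is worth a little over half its nominal size for estimating a mean.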
9Things can only get better: the Longitudinal Study of Young People in England (LSYPE)
- Similar to YCS in that it is a study of transitions from compulsory education, but:
- Face to face
- Started when pupils were in year 9 (age 13/14)
- Plan to continue till young people are aged 25
- Includes interviews with parents
- Much more detailed (e.g. attitudes to school, bullying, parental employment histories)
- Used incentives (conditional at wave 1, unconditional thereafter)
- For LSYPE use a 2 stage Probability Proportional to Size (PPS) design with schools as PSUs
- Sample drawn directly from PLASC
- But had to approach schools for contact details, so drew a large enough sample to allow for some non-cooperation from schools
10LSYPE Sampling schools
- Maintained schools stratified into deprived/non-deprived
- Deprived schools sampled with fraction 1.5 times greater than non-deprived
- Within each stratum, a size measure was calculated dependent on number of pupils from major ethnic minority groups (Indian, Pakistani, Bangladeshi, Black African, Black Caribbean, Mixed) in year 8 at that school
- A small sample of independent schools also selected
11Sampling pupils
- Within each school, selection probabilities were calculated for pupils to ensure issued sample target numbers of 1,000 from each of the main ethnic minority groups
- Importantly, the way ethnic minorities were boosted means that all pupils within an ethnic group and within a school deprivation stratum were sampled with the same probability as one another
12LSYPE response
- About three quarters of schools sampled cooperated
- Of the issued sample, the overall response rate was 74% (including partial responses)
- Some evidence of response bias
13Analysis of LSYPE response
- Use NPD to analyse school non-response and pupil level non-response separately
- Run logistic regression models to find variables associated with propensity to respond
- Start with variables in sample frame and add attainment variables
- For school non-response, significant terms in the model were deprivation strata and whether or not the school was in London
- For pupil non-response, significant terms are attainment, ethnicity and region, plus an interaction between White ethnicity and region
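One way to write down the pupil-level model described above, as a sketch only: the slides name the significant terms but give neither the coding of the variables nor the fitted coefficients, so every symbol below is an assumption introduced for illustration.

```latex
\[
\operatorname{logit} \Pr(R_j = 1)
  = \beta_0
  + \boldsymbol{\beta}_A^{\top} \mathbf{a}_j
  + \boldsymbol{\beta}_E^{\top} \mathbf{e}_j
  + \boldsymbol{\beta}_G^{\top} \mathbf{g}_j
  + \boldsymbol{\beta}_{WG}^{\top} \left( w_j \, \mathbf{g}_j \right)
\]
```

Here $R_j$ indicates whether pupil $j$ responded, $\mathbf{a}_j$ holds the attainment variables, $\mathbf{e}_j$ and $\mathbf{g}_j$ are indicator vectors for ethnic group and region, and $w_j$ is an indicator for White ethnicity, so the last term is the interaction between White and region.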
14LSYPE non-response weighting wave 1
- School non-response and pupil non-response treated separately
- Logistic regression model used to estimate probability of response p
- To create weights, take reciprocal of p (i.e. 1/p) and rescale by dividing by mean of 1/p
- School non-response and pupil level non-response weights combined with design weights to create final weight
- Generally speaking, non-response weights are inversely correlated with design weights, so the loss of efficiency is small
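The weighting steps above can be sketched in a few lines of Python. The response propensities are taken as given, standing in for predictions from a fitted logistic regression model. The slide says only that the weights were "combined", so combining them by multiplication is an assumption here, as are the function names and example values.

```python
def nonresponse_weights(propensities):
    """Reciprocal of each response propensity, rescaled to have mean 1."""
    for p in propensities:
        if not 0 < p <= 1:
            raise ValueError(f"propensity must be in (0, 1], got {p}")
    raw = [1.0 / p for p in propensities]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def final_weights(design, school_nr, pupil_nr):
    """Combine the three weights per respondent by multiplication (assumed)."""
    return [d * s * p for d, s, p in zip(design, school_nr, pupil_nr)]


# Hypothetical respondents: predicted pupil-level response propensities.
pupil_weights = nonresponse_weights([0.9, 0.75, 0.5])
print(pupil_weights)
print(final_weights([1.0, 1.0, 2.0], [1.0, 1.2, 1.0], pupil_weights))
```

Rescaling by the mean keeps the weighted sample size equal to the achieved sample size, so only the relative sizes of the weights matter.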
15LSYPE waves 2 and 3 response
- Good response rates (89% wave 2, 93% wave 3)
- Model response using both NPD variables and information collected at earlier sweeps
- NPD variables had stronger association with propensity to respond at wave 2 than at wave 1
- Adding survey variables to the model only explains a bit more than the NPD variables
16YCS 13
- Similar sample design to LSYPE
- Face to face
- 2 stage PPS design
- Over sample ethnic minorities using school census
- But:
- Over sample low attainers (defined as those with no A*-Cs and fewer than 5 D-Gs) by a factor of 2
- Postcode sectors are PSUs as opposed to schools (smaller design effects)
- Full address collected through school census, by-passing need to go through schools
17YCS 13 response (maintained sector)
Outcome                                      Number  Per cent
Cases with a final outcome                   10380   100.0
Response                                     7174    69.1
No contact                                   696     6.7
Refusal                                      889     8.6
Could not find address/address inaccessible  224     2.2
Mover                                        896     8.6
Other unproductive                           448     4.3
Ineligible                                   53      0.5
- Note the high proportion of movers and address
problems
18YCS 13 response by selected characteristics
Characteristics                Issued  Achieved  Response rate (%)
Very low attainers (<5 D-G)    2138    1194      56
Others                         7713    5642      73
Indian                         514     377       73
Pakistani                      628     470       75
Bangladeshi                    490     369       75
Black Caribbean                672     395       59
Black African                  710     427       60
Mixed                          470     305       65
White                          6366    4493      71
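The response rate column is the achieved count as a percentage of the issued count. A short Python sketch recomputes it from the figures in the table above; the dictionary and function names are illustrative.

```python
# (issued, achieved) counts copied from the table above.
YCS13_RESPONSE = {
    "Very low attainers (<5 D-G)": (2138, 1194),
    "Others": (7713, 5642),
    "Indian": (514, 377),
    "Pakistani": (628, 470),
    "Bangladeshi": (490, 369),
    "Black Caribbean": (672, 395),
    "Black African": (710, 427),
    "Mixed": (470, 305),
    "White": (6366, 4493),
}


def response_rate(issued, achieved):
    """Achieved sample as a percentage of the issued sample."""
    return 100 * achieved / issued


for group, (issued, achieved) in YCS13_RESPONSE.items():
    print(f"{group}: {response_rate(issued, achieved):.1f}%")
```

Keeping the unrounded rates makes it easier to compare groups whose rounded rates coincide.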
19Benefits of sampling from the NPD
- Wealth of information from which to design your sample
- Run simulations to help decide on the optimum design for your requirements and budget
- Easy to over sample key groups of interest and/or those least likely to respond
- Lots of information to use for non-response weighting
- Now that addresses are collected through school census, school non-cooperation is not an issue
- Can follow up drop outs longitudinally through the admin data
20Drawbacks of sampling from the NPD
- Address information missing or not up to date, but 2006 was the first year in which schools were required to supply addresses in the school census so this should improve
- Data quality in school census is a potential problem, e.g. discrepancies between census report and self reported ethnicity
21Any questions?
- For more information on LSYPE see our page at ESDS longitudinal: http://www.esds.ac.uk/longitudinal/access/lsype/L5545.asp
- YCS downloads and documentation: http://www.esds.ac.uk/search/indexSearch.asp?ctxmlSnq133233
22LSYPE sampling technical slides
- Taken from "A new method for sample designs with disproportionate stratification", paper given to the AAPOR annual conference 2005 by Peter Lynn, Patten Smith and Iain Noble
23Sampling Method for LSYPE
- Construct size measure $S_i$ in each PSU (school)
- $S_i = \sum_k N_{ik} (n_k / N_k)$
- Where
- $S_i$ = the size measure for PSU $i$
- $N_{ik}$ = the number in sub-population group $k$ in PSU $i$
- $n_k$ = number required in issued sample in sub-population group $k$
- $N_k$ = number in sub-population group $k$ in the population
- Select $m$ PSUs with probability proportional to $S_i$
- $P(\mathrm{PSU}) = m S_i / \sum_i S_i$
24Method
- Within each PSU, select 2nd stage units with probability $P_{jki}$
- $P_{jki} = (n(s) / S_i)(n_k / N_k)$
- Where
- $P_{jki}$ = conditional probability of selecting 2nd stage unit $j$ in sub-population group $k$ in PSU $i$
- $n(s)$ = total number to be selected in each PSU
25Result
- Overall probability of selection of 2nd stage unit, $P_{jk}$, is constant within sub-population $k$
- $P_{jk} = n_k / N_k$
- Total number selected in each PSU is fixed at $n(s)$
- Therefore avoid precision losses through corrective (design) weighting and excessive variation in cluster sizes
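The result can be checked numerically. The Python sketch below builds the size measures, the PSU selection probabilities and the conditional pupil probabilities for a small made-up population, using exact fractions so that the equality can be tested without rounding. The counts, targets and function names are hypothetical, and the sketch assumes that the number of PSUs times the per-PSU take equals the total issued sample.

```python
from fractions import Fraction


def size_measures(N, n):
    """Return (S, frac): S[i] = sum_k N[i][k] * n[k]/N_k, frac[k] = n[k]/N_k.

    N[i][k] is the number of pupils in group k in PSU i; n[k] is the
    issued sample target for group k.
    """
    groups = range(len(n))
    group_totals = [sum(row[k] for row in N) for k in groups]
    frac = [Fraction(n[k], group_totals[k]) for k in groups]
    S = [sum(row[k] * frac[k] for k in groups) for row in N]
    return S, frac


def selection_probabilities(N, n, m):
    """Return (P(PSU) per PSU, conditional pupil probability per PSU and group)."""
    S, frac = size_measures(N, n)
    per_psu_take = Fraction(sum(n), m)  # n(s), assuming m * n(s) = sum of n_k
    p_psu = [m * s_i / sum(S) for s_i in S]
    p_within = [[(per_psu_take / s_i) * f for f in frac] for s_i in S]
    return p_psu, p_within


# Hypothetical population: 4 schools, 2 groups, targets of 20 per group.
N_ik = [[40, 10], [30, 30], [50, 20], [80, 40]]
n_k = [20, 20]
m = 2

p_psu, p_within = selection_probabilities(N_ik, n_k, m)
_, frac = size_measures(N_ik, n_k)
for i, row in enumerate(p_within):
    overall = [p_psu[i] * p for p in row]
    expected_take = sum(N_ik[i][k] * row[k] for k in range(len(n_k)))
    print(i, overall == frac, expected_take)
```

For every school the overall probability equals the group's target fraction and the expected take equals the per-PSU take, whatever the ethnic mix of the school.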
26LSYPE some complications
- Sample deprived schools (top quintile in students entitled to free school meals) at 1.5 times the rate of other schools
- Calculations resulted in $P > 1$ for some schools
- Calculations resulted in $P > 1$ for students in some small schools (happens when $S_i < (n_k / N_k) \, n(s)$)
- Small schools covering small proportion of student population: fieldwork inefficiencies
- No data on current number of year 9 students
27Dealing with the complications
- Deprived schools: separate stratum with higher sampling fraction
- Schools for which calculations give $P > 1$: sample with certainty and select pupils with appropriate sampling fraction for ethnic group
- Small schools where students in a group for which calculations give $P > 1$: select all pupils in the group and apply weight
- Small schools, for fieldwork efficiency reasons: omit schools for which no. students selected would be less than 12
- No information on no. Year 9s: use previous no. year 8s as proxy, and then select new year 9 pupils during interviewer school visits
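The two fixes for probabilities above 1, and the small-school cut-off, can be sketched as follows. This is an illustration under stated assumptions, not the documented LSYPE procedure: the slides say a weight is applied in take-all groups but not its form, so the sketch assumes it is the target overall probability divided by the probability actually achieved. All function and parameter names are hypothetical.

```python
from fractions import Fraction

MIN_TAKE = 12  # schools with a smaller expected take were omitted


def resolve_probabilities(p_school, p_pupil, target):
    """Apply the P > 1 fixes to one (school, group) pair.

    p_school: computed school selection probability (may exceed 1)
    p_pupil:  computed conditional pupil probability (may exceed 1)
    target:   intended overall probability n_k / N_k for the group
    Returns (school probability, pupil probability, corrective weight).
    """
    if p_school > 1:
        # Certainty school: pupils are sampled at the full target fraction.
        return Fraction(1), target, Fraction(1)
    if p_pupil > 1:
        # Take-all group: the overall probability falls short of the
        # target, so weight up by target / achieved (assumed form).
        return p_school, Fraction(1), target / p_school
    return p_school, p_pupil, Fraction(1)


def keep_school(expected_take):
    """Drop schools whose expected number of selected pupils is below 12."""
    return expected_take >= MIN_TAKE


# Hypothetical cases, each with target overall probability 1/10.
print(resolve_probabilities(Fraction(6, 5), Fraction(1, 12), Fraction(1, 10)))
print(resolve_probabilities(Fraction(1, 20), Fraction(2), Fraction(1, 10)))
print(keep_school(9), keep_school(12))
```

In the certainty case the overall probability stays on target and no extra weight is needed; in the take-all case the overall probability is below target, which is why a corrective weight is required.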