Title: Small Area Estimation: Performance and Data Policy Implications Learned from the EURAREA Project
1Small Area Estimation: Performance and Data Policy Implications Learned from the EURAREA Project
Neil Higgins and Martin Ralphs
Spatial Analysis and Modelling Branch, Methodology Group, Office for National Statistics, United Kingdom
2EURAREA Project
- EURAREA is a research project funded by the European Union under FP5.
- The project covers six countries, and the consortium includes academic and NSI partners in seven.
- The project has explored small area estimation methods and evaluated their performance using simulation studies based on population census and register datasets.
3Topics for this talk
- Estimator performance when estimating local area values.
- Estimator performance when estimating the distribution of area values.
- Some misspecification problems for synthetic and composite estimators.
- We use general comparisons across all countries, and then specific examples from the United Kingdom and Sweden to illustrate particular points.
4We consider several estimator types
- Design-based estimators
  - Direct
  - Generalised REGression (GREG), model-assisted
- Synthetic and composite estimators
  - Area-level synthetic estimator
  - Composite estimator (weighted combination of direct and area-level synthetic)
- Enhanced estimators
  - Composite estimator that borrows strength over TIME
5Design-based Estimators
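[Formulae for the direct and GREG estimators not preserved in transcript. The surviving annotation identifies one term as a regression estimate.]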
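For reference, the textbook forms of the two design-based estimators of an area mean can be sketched as below. This is a minimal illustration, not the project's implementation: the function names, the weight vector `w`, the known area covariate means `xbar_pop` and the toy data are all assumptions introduced here.

```python
import numpy as np

def direct_mean(y, w):
    """Weighted direct estimate of an area mean using only that area's sample."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

def greg_mean(y, x, w, xbar_pop, beta):
    """Regression prediction at the known area covariate means, plus the
    weighted mean of the sample residuals as a design-based correction."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)                # shape (n, p): sampled units
    xbar_pop = np.asarray(xbar_pop, dtype=float)  # shape (p,): known area means
    beta = np.asarray(beta, dtype=float)          # shape (p,): fitted coefficients
    residuals = y - x @ beta
    return float(xbar_pop @ beta + direct_mean(residuals, w))

# Hypothetical toy sample: intercept plus one covariate.
y = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 2.0]
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [1.0, 1.0]                  # y = 1 + x fits this sample exactly
print(direct_mean(y, w))           # 2.25
print(greg_mean(y, x, w, xbar_pop=[1.0, 1.5], beta=beta))  # 2.5
```

With a zero coefficient vector the GREG estimate reduces to the direct estimate, which is one way to see that the regression term only adds information when the covariates are predictive.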
6Area-level Synthetic Estimator
Estimator: [formula not preserved in transcript]
Model: standard linear regression model with area-level covariates.
The within-area variance component is estimated by the pooled within-area residual variance.
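One common way to write an area-level model of this kind is sketched below. The symbols are assumptions, not taken from the slide: $\bar{y}_d$ is the sample mean in area $d$, $\bar{x}_d$ and $\bar{X}_d$ are sample and population covariate means, $u_d$ is an area effect, $n_d$ is the area sample size, and $n = \sum_d n_d$.

```latex
\bar{y}_d = \bar{x}_d^{\top}\beta + u_d + \bar{e}_d,
\qquad u_d \sim N(0, \sigma_u^2),
\qquad \bar{e}_d \sim N\!\left(0, \frac{\sigma_e^2}{n_d}\right)

\hat{\bar{Y}}_d^{\mathrm{SYN}} = \bar{X}_d^{\top}\hat{\beta},
\qquad
\hat{\sigma}_e^2 = \frac{1}{n - D}\sum_{d=1}^{D}\sum_{i \in s_d}\left(y_{id} - \bar{y}_d\right)^2
```

Under this reading, the synthetic estimate for an area depends on its own sample only through the pooled coefficient estimate $\hat{\beta}$.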
7Composite Estimator
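[Formula not preserved in transcript: a weighted combination of the direct and area-level synthetic estimators.]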
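A minimal sketch of a composite estimator follows, assuming the usual shrinkage weight built from a between-area variance and a unit-level variance. The weight formula, the function name and the variance values are assumptions for illustration; the slide's own weight is not preserved.

```python
import numpy as np

def composite_estimate(direct, synthetic, n_d, sigma2_u, sigma2_e):
    """Weighted combination of direct and synthetic estimates.

    gamma_d = sigma2_u / (sigma2_u + sigma2_e / n_d) is an assumed weight:
    it approaches 1 (trust the direct estimate) as the area sample size
    n_d grows. n_d must be positive for every area.
    """
    direct = np.asarray(direct, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    n_d = np.asarray(n_d, dtype=float)
    gamma = sigma2_u / (sigma2_u + sigma2_e / n_d)
    return gamma * direct + (1.0 - gamma) * synthetic, gamma

# Two hypothetical areas with the same inputs but different sample sizes.
est, gamma = composite_estimate(
    direct=[10.0, 10.0], synthetic=[20.0, 20.0], n_d=[4, 36],
    sigma2_u=1.0, sigma2_e=4.0,
)
print(gamma)  # weights of about 0.5 and 0.9
print(est)    # estimates of about 15 and 11
```

The area with the larger sample stays close to its direct estimate, while the area with four sampled units is pulled halfway towards the synthetic value.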
8Performance Criteria
- Criterion: minimising Mean Squared Error (MSE).
- Empirical measure: Average Empirical MSE (AEMSE).
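As an illustration of the empirical measure, the sketch below computes an average empirical MSE from simulation output. The array layout (one row per simulated sample, one column per area) and the function name are assumptions made here.

```python
import numpy as np

def aemse(estimates, truth):
    """Average empirical MSE.

    estimates: array of shape (K, D), K simulated samples for D areas.
    truth:     array of shape (D,), the true area values.
    The squared error is averaged over simulations within each area,
    and the resulting per-area values are then averaged over areas.
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    emse_per_area = np.mean((estimates - truth) ** 2, axis=0)
    return float(np.mean(emse_per_area))

# Hypothetical toy case: 2 simulations, 2 areas.
truth = [1.0, 2.0]
estimates = [[1.0, 3.0],
             [3.0, 2.0]]
print(aemse(estimates, truth))  # 1.25
```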
9Average performance when estimating values for
individual small areas
These graphs show ranked estimator performance
based on empirical Mean Squared Error averaged
across many simulations. Synthetic and composite
estimators are usually as good as or better than
design-based estimators at NUTS3 level, and
virtually always perform best at NUTS 4 / 5.
10Performance Relative to National Sample Mean at
NUTS3 and NUTS5
- Estimator performance at NUTS3 and NUTS5 for three target variables in Sweden.
- Scores are relative to the performance of the National Sample Mean (NSM = 1.0), and root AEMSE is the criterion.
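One plausible reading of the relative score, stated here as an assumption because the slide gives no formula, is the ratio of root AEMSE values:

```latex
\mathrm{score}(\hat{Y}) = \frac{\sqrt{\mathrm{AEMSE}(\hat{Y})}}{\sqrt{\mathrm{AEMSE}(\hat{Y}^{\mathrm{NSM}})}}
```

On this scale the National Sample Mean scores exactly 1.0, and a score below 1.0 indicates an estimator with smaller root AEMSE than the NSM.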
11Composite Estimator with TIME Effects
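Estimator: [formula not preserved in transcript]
Model: [formula not preserved in transcript]
The time-varying area term is a random effect that follows independent AR(1) processes for d = 1, 2, ..., D.
The remaining term is a random residual.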
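A model consistent with this description could be written as below. The exact specification used in the project is not preserved, so the unit-level form, the symbols $u_{dt}$, $e_{idt}$, $\rho$ and $\varepsilon_{dt}$, and the normality assumptions are illustrative only.

```latex
y_{idt} = x_{idt}^{\top}\beta + u_{dt} + e_{idt},
\qquad e_{idt} \sim N(0, \sigma_e^2)

u_{dt} = \rho\, u_{d,t-1} + \varepsilon_{dt},
\qquad |\rho| < 1,
\qquad \varepsilon_{dt} \sim N(0, \sigma_{\varepsilon}^2),
\qquad d = 1, 2, \ldots, D
```

Because each area's effect is correlated with its own past values, estimates for the current period can borrow strength from earlier survey rounds in the same area.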
12Enhanced Estimators Borrowing Strength Over
Time
Comparing the performance of Direct,
Area-Synthetic and Composite Estimators with a
Composite estimator incorporating temporal
autocorrelation at area level.
13Estimation and Resource Allocation
- When allocating resources to specific areas, the best estimate for each specific area is required.
- However, there are important policy problems for which the information required is the shape of the area distribution:
  - Overall assessment of the importance of spatial inequality.
  - Number of areas below a particular threshold.
- The first is important for international comparisons. The second is important for obtaining EU subsidies!
- How well do our SAE methods capture the shape of the distribution of area values?
14Estimating the distribution of area values
Comparing the standard deviation of true and
estimated area means.
15Estimating the distribution of area values
- The underlying reason for this pattern is that, in design-based estimation, the sampling process adds an additional layer of variability to the underlying between-area variability of the true area values, and therefore var(design-based estimates) > var(true area values).
- On the other hand, the synthetic estimator approximates the value produced by regressing the true area values on the covariates, and so var(synthetic estimates) < var(true area values).
Note: in this slide, var = between-area variance.
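The two inequalities can be checked with a small simulation. The sketch below is illustrative only: the number of areas, sample size, variance values and the single covariate are assumptions, and the sampling error of the direct estimate is simulated directly rather than by drawing unit-level samples.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_d = 2000, 20            # hypothetical: 2000 areas, 20 sampled units each
sigma_u, sigma_e = 1.0, 4.0  # assumed between-area and unit-level std. deviations

x = rng.normal(size=D)                                   # area-level covariate
true_mean = 2.0 * x + rng.normal(scale=sigma_u, size=D)  # true area means

# Direct estimate: truth plus sampling error with variance sigma_e^2 / n_d.
direct = true_mean + rng.normal(scale=sigma_e / np.sqrt(n_d), size=D)

# Synthetic estimate: fitted values from regressing the direct estimates on x.
X = np.column_stack([np.ones(D), x])
beta_hat, *_ = np.linalg.lstsq(X, direct, rcond=None)
synthetic = X @ beta_hat

var_true = true_mean.var()
var_direct = direct.var()
var_synth = synthetic.var()
print(f"true {var_true:.2f}  direct {var_direct:.2f}  synthetic {var_synth:.2f}")
```

The direct estimates are over-dispersed because sampling error is added on top of the true between-area variation, while the synthetic estimates are under-dispersed because the part of the area variation not explained by the covariate is dropped.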
20Adjustment Methods
- Adjustments for over-shrinkage have been proposed:
  - Shen and Louis (1998)
  - Spjøtvoll and Thomsen (1987)
- More empirical work on practicability and performance should now be a priority.
21Misspecification: Ecological Effects
Area estimates produced by unit-level and area-level synthetic estimators for three target variables in Sweden. Scores are relative to the AEMSE performance of the National Sample Mean (NSM = 1.0).
22Misspecification - Ecological effects
Synthetic estimates produced by unit-level and
area-level models plotted against true area means
in Sweden for income in NUTS 5 areas.
23Misspecification: Logistic model
Comparison of area-level linear and logistic
estimates with true area means in Sweden for ILO
Unemployment at NUTS 5 level. The curvature in
the scatter plot of logistic estimates occurs
because the primary covariate (claimant count) is
linearly related to the target variable but
this has not been taken into account in the model
fitting process.
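The mechanism can be illustrated with synthetic numbers. In the sketch below the true area proportion is exactly linear in a single covariate, and a logit-linear curve is fitted by least squares on the logit scale. That fitting shortcut, the covariate range and the coefficients are assumptions for illustration, not the project's fitting procedure or data.

```python
import numpy as np

x = np.linspace(0.01, 0.20, 50)   # hypothetical claimant-count rate per area
p_true = 0.02 + 0.9 * x           # assumed truth: linear in the covariate

# Fit logit(p) = a + b * x by least squares on the logit scale.
logit = np.log(p_true / (1.0 - p_true))
b, a = np.polyfit(x, logit, deg=1)   # polyfit returns the slope first
p_logistic = 1.0 / (1.0 + np.exp(-(a + b * x)))

# Fit p = c + d * x directly on the original scale.
d, c = np.polyfit(x, p_true, deg=1)
p_linear = c + d * x

err_logistic = p_logistic - p_true
print(err_logistic[[0, 25, -1]])          # low, middle and high covariate values
print(np.max(np.abs(p_linear - p_true)))  # linear fit error
```

The logistic fit overshoots at both ends of the covariate range and undershoots in the middle, which is the kind of curvature a scatter plot against the true means would show, while the linear fit reproduces the truth up to rounding error.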
24Conclusions
- Model-based and composite SAE methods are able to provide high quality local estimates for small areas, but model misspecification issues require careful consideration.
- Further gains are possible by borrowing strength across time and space.
- SAE methods are less effective at summarising the shape of the distribution of area values.
- There are methods to overcome this problem; testing and further development should be a priority for future European research.
25Backup Slide: Explanation of slide 14
Standard deviation of the population area means
Standard deviation of the estimated area means
26Backup Slide: Explanation of slide 15
Variance of the population area means
Variance of the estimated area means
Note: in this slide, var = between-area variance.
27Unit-level Synthetic Estimator
Standard linear model with unit-level covariates
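A unit-level model of this type is often written as below. The random area effect $u_d$ and all symbols are assumptions made here, since the slide's formula is not preserved.

```latex
y_{id} = x_{id}^{\top}\beta + u_d + e_{id},
\qquad u_d \sim N(0, \sigma_u^2),
\qquad e_{id} \sim N(0, \sigma_e^2)

\hat{\bar{Y}}_d^{\mathrm{SYN,unit}} = \bar{X}_d^{\top}\hat{\beta}
```

Here $\hat{\beta}$ is estimated from individual records, so it reflects within-area relationships between $y$ and $x$. These can differ from the between-area relationships captured by an area-level fit.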
28Area-level Synthetic Logistic
Standard logistic regression model with
area-level covariates
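For a proportion such as an unemployment rate, an area-level logistic model could take the following form. The binomial assumption, the random effect $u_d$ and the notation are illustrative, not taken from the slide.

```latex
y_d \sim \mathrm{Binomial}(n_d, p_d),
\qquad
\operatorname{logit}(p_d) = \log\frac{p_d}{1 - p_d} = \bar{x}_d^{\top}\beta + u_d

\hat{p}_d^{\mathrm{SYN}} = \frac{\exp\!\left(\bar{X}_d^{\top}\hat{\beta}\right)}{1 + \exp\!\left(\bar{X}_d^{\top}\hat{\beta}\right)}
```

The back-transformation keeps every estimate between 0 and 1, but it also makes the estimate a curved function of the covariates.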