Small Area Estimation: Performance and Data Policy Implications Learned from the EURAREA Project

1
Small Area Estimation: Performance and Data
Policy Implications Learned from the EURAREA
Project
Neil Higgins and Martin Ralphs
Spatial Analysis and Modelling Branch, Methodology Group
Office for National Statistics, United Kingdom
2
EURAREA Project
  • EURAREA is a research project funded by the
    European Union under FP5.
  • The project covers six countries, and the
    consortium includes academic and NSI partners in
    seven.
  • The project has explored small area estimation
    methods and evaluated their performance using
    simulation studies based on population census and
    register datasets.

3
Topics for this talk
  • Estimator performance when estimating local area
    values.
  • Estimator performance when estimating the
    distribution of area values.
  • Some misspecification problems for synthetic and
    composite estimators.
  • We use general comparisons across all countries,
    and then specific examples from the United
    Kingdom and Sweden to illustrate particular
    points.

4
We consider several estimator types
  • Design-based estimators
      • Direct
      • Generalised REGression (GREG), model-assisted
  • Synthetic and composite estimators
      • Area-level synthetic estimator
      • Composite estimator (weighted combination of
        direct and area-level synthetic)
  • Enhanced estimators
      • Composite estimator that borrows strength over
        time

5
Design-based Estimators
(Estimator formulas not in transcript. Surviving
text: "where ... and ... is a regression estimate.")
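A hedged reconstruction of the two design-based estimators, assuming the standard weighted (Hájek-type) direct estimator of an area mean and the usual GREG form. The notation (design weights w_i, area sample s_d, known area covariate means \bar{X}_d, coefficient estimate \hat{\beta}) is an assumption and is not taken from the slide:

```latex
\[
\hat{\bar{Y}}_d^{\mathrm{DIRECT}} = \frac{1}{\hat{N}_d} \sum_{i \in s_d} w_i\, y_i,
\qquad
\hat{N}_d = \sum_{i \in s_d} w_i
\]
\[
\hat{\bar{Y}}_d^{\mathrm{GREG}} = \hat{\bar{Y}}_d^{\mathrm{DIRECT}}
  + \Bigl( \bar{X}_d - \frac{1}{\hat{N}_d} \sum_{i \in s_d} w_i\, x_i \Bigr)^{\top} \hat{\beta}
\]
```

Under this reading, GREG corrects the direct estimate by the regression-predicted effect of the gap between the known area covariate mean and its sample estimate.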
6
Area-level Synthetic Estimator
Estimator: (formula not in transcript)
Model: standard linear regression model with
area-level covariates.
(Symbol not in transcript) is estimated by the
pooled within-area residual variance.
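A minimal sketch of an area-level synthetic estimator, assuming a single covariate and ordinary least squares fitted to area sample means. The function names and the one-covariate simplification are illustrative assumptions, not the project's implementation. `pooled_within_area_variance` pools squared deviations of unit values from their own area mean, which is one plausible reading of the pooled within-area variance mentioned on the slide.

```python
from statistics import mean


def fit_area_level_model(x_area, y_area):
    """OLS of area sample means (y_area) on area covariate means (x_area).

    Returns (intercept, slope). Single-covariate simplification (assumption).
    """
    mx, my = mean(x_area), mean(y_area)
    sxx = sum((x - mx) ** 2 for x in x_area)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_area, y_area))
    slope = sxy / sxx
    return my - slope * mx, slope


def synthetic_estimate(x_d, intercept, slope):
    """Synthetic estimate for one area: the fitted value at its covariate mean."""
    return intercept + slope * x_d


def pooled_within_area_variance(samples_by_area):
    """Pooled variance of unit values around their own area mean.

    Denominator is (total sample size - number of areas).
    """
    ss = 0.0
    n = 0
    for values in samples_by_area:
        m = mean(values)
        ss += sum((v - m) ** 2 for v in values)
        n += len(values)
    return ss / (n - len(samples_by_area))
```

Note that the synthetic estimate depends on an area only through its covariate value, so two areas with the same covariates receive the same estimate regardless of their own sample data.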
7
Composite Estimator
(Formula not in transcript; only the word "where"
survives.)
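A minimal sketch of a composite estimator as a weighted combination of the direct and area-level synthetic estimates. The shrinkage weight used here, gamma_d = sigma2_u / (sigma2_u + sigma2_e / n_d), is the common mixed-model form and is an assumption; the transcript does not preserve the weight shown on the slide. All parameter names are hypothetical.

```python
def composite_estimate(direct, synthetic, n_d, sigma2_u, sigma2_e):
    """Weighted combination of direct and synthetic estimates for one area.

    n_d       area sample size
    sigma2_u  between-area (random effect) variance, assumed known or estimated
    sigma2_e  within-area unit-level variance, assumed positive
    """
    if n_d == 0:
        # No sample in the area: only the synthetic estimate is available.
        return synthetic
    gamma = sigma2_u / (sigma2_u + sigma2_e / n_d)
    return gamma * direct + (1.0 - gamma) * synthetic
```

With this weight, areas with large samples lean towards the direct estimate, and areas with small samples lean towards the synthetic estimate.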
8
Performance Criteria
  • Criterion: minimising Mean Squared Error (MSE)
  • Empirical measure: Average Empirical MSE (AEMSE)
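
A minimal sketch of how an average empirical MSE can be computed from simulation output, assuming it is the squared error averaged over both simulation replicates and areas. The data layout (`estimates[k][d]`) and the function name are illustrative assumptions.

```python
def average_empirical_mse(estimates, truth):
    """Average squared error over simulations and areas.

    estimates[k][d]  estimate for area d in simulation replicate k
    truth[d]         true (population) value for area d
    """
    n_sims = len(estimates)
    n_areas = len(truth)
    total = 0.0
    for replicate in estimates:
        for est, true_value in zip(replicate, truth):
            total += (est - true_value) ** 2
    return total / (n_sims * n_areas)


# Two areas, two replicates: squared errors are 1, 1, 1 and 4.
example = average_empirical_mse([[11.0, 19.0], [9.0, 22.0]], [10.0, 20.0])
```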

9
Average performance when estimating values for
individual small areas
These graphs show ranked estimator performance
based on empirical Mean Squared Error averaged
across many simulations. Synthetic and composite
estimators are usually as good as or better than
design-based estimators at NUTS3 level, and
virtually always perform best at NUTS4/5.
10
Performance Relative to National Sample Mean at
NUTS3 and NUTS5
  • Estimator performance at NUTS3 and NUTS5 for
    three target variables in Sweden.
  • Scores are relative to the performance of the
    National Sample Mean (NSM = 1.0); the criterion
    is root AEMSE.
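
One way to read the relative score, assuming it is the ratio of the estimator's root AEMSE to that of the National Sample Mean (the function name is hypothetical):

```python
import math


def relative_root_aemse(aemse_estimator, aemse_nsm):
    """Root AEMSE of an estimator divided by root AEMSE of the NSM.

    The NSM itself scores 1.0; values below 1.0 mean the estimator
    has a smaller root AEMSE than the NSM.
    """
    return math.sqrt(aemse_estimator / aemse_nsm)
```

For example, an estimator whose AEMSE is a quarter of the NSM's would score 0.5 on this scale.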

11
Composite Estimator with TIME Effects
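Estimator: (formula not in transcript)
Model: (formula not in transcript)
(Symbol not in transcript) is a random effect that
follows independent AR(1) processes for
d = 1, 2, ..., D.
(Symbol not in transcript) is a random residual term.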
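A hedged sketch of a model consistent with this description, assuming an area-by-time linear model with covariates x_{dt}. The symbols u_{dt}, e_{dt}, \rho and \varepsilon_{dt} are assumptions; whether the slide's model was written at unit or area level is not recoverable from the transcript.

```latex
\[
y_{dt} = x_{dt}^{\top} \beta + u_{dt} + e_{dt}
\]
\[
u_{dt} = \rho\, u_{d,t-1} + \varepsilon_{dt},
\qquad |\rho| < 1,
\qquad d = 1, 2, \dots, D
\]
```

Here u_{dt} plays the role of the random effect following an independent AR(1) process within each area d, and e_{dt} the random residual term.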
12
Enhanced Estimators: Borrowing Strength Over
Time
Comparing the performance of Direct,
Area-Synthetic and Composite Estimators with a
Composite estimator incorporating temporal
autocorrelation at area level.
13
Estimation and Resource Allocation
  • When allocating resources to specific areas, the
    best estimate for each specific area is required.
  • However, there are important policy problems for
    which the information required is the shape of
    the area distribution:
      • Overall assessment of the importance of
        spatial inequality.
      • Number of areas below a particular threshold.
  • The first is important for international
    comparisons. The second is important for
    obtaining EU subsidies!
  • How well do our SAE methods capture the shape of
    the distribution of area values?

14
Estimating the distribution of area values
Comparing the standard deviation of true and
estimated area means.
15
Estimating the distribution of area values
  • The underlying reason for this pattern is that,
    in design-based estimation, the sampling process
    adds an additional layer of variability to the
    underlying between-area variability of the true
    area values, and therefore
    var(estimated area means) > var(true area means).
  • On the other hand, the synthetic estimator
    approximates the value produced by regressing the
    true area values on the covariates, and so
    var(estimated area means) < var(true area means).

Note: in this slide, var = between-area variance.
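The argument above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the project's simulation design: one covariate, normally distributed area effects and sampling errors, and a synthetic estimator fitted by ordinary least squares to the direct estimates. All parameter values are arbitrary.

```python
import random
from statistics import mean, pvariance


def simulate_dispersion(n_areas=2000, seed=1):
    """Return between-area variances of (true, direct, synthetic) area means."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_areas)]
    # True area means: linear in the covariate plus an area effect.
    true_means = [2.0 + 1.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Direct estimates: true means plus sampling error.
    direct = [t + rng.gauss(0.0, 1.0) for t in true_means]
    # Synthetic estimates: OLS fitted values of direct estimates on x.
    mx, my = mean(x), mean(direct)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, direct))
    slope = sxy / sxx
    intercept = my - slope * mx
    synthetic = [intercept + slope * xi for xi in x]
    return pvariance(true_means), pvariance(direct), pvariance(synthetic)


var_true, var_direct, var_synthetic = simulate_dispersion()
```

In this setup the direct estimates are over-dispersed relative to the true area means because sampling error is added on top of the true spread, while the synthetic estimates are under-dispersed because the fitted values discard the area effects.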
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Adjustment Methods
  • Adjustments for over-shrinkage have been
    proposed:
      • Shen and Louis (1998)
      • Spjøtvoll and Thomsen (1987)
  • More empirical work on practicability and
    performance should now be a priority.

21
Misspecification: Ecological Effects
Area estimates produced by unit-level and
area-level synthetic estimators for three target
variables in Sweden. Scores are relative to the
AEMSE performance of the National Sample Mean
(NSM = 1.0).
22
Misspecification: Ecological Effects
Synthetic estimates produced by unit-level and
area-level models plotted against true area means
in Sweden for income in NUTS5 areas.
23
Misspecification: Logistic Model
Comparison of area-level linear and logistic
estimates with true area means in Sweden for ILO
unemployment at NUTS5 level. The curvature in
the scatter plot of logistic estimates occurs
because the primary covariate (claimant count) is
linearly related to the target variable, but
this has not been taken into account in the model
fitting process.
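A deterministic illustration of this effect, under simplifying assumptions: the target proportion is taken to be exactly linear in a claimant rate, and the logistic model is approximated by a least-squares line fitted on the logit scale rather than by maximum likelihood. The variable names and numeric values are invented for illustration and are not Swedish data.

```python
import math
from statistics import mean


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit_scale_fit_residuals(covariate, proportion):
    """Fit logit(proportion) on the covariate by OLS, back-transform,
    and return fitted minus true proportion for each area."""
    z = [logit(p) for p in proportion]
    mx, mz = mean(covariate), mean(z)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxz = sum((x - mx) * (zi - mz) for x, zi in zip(covariate, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    fitted = [inv_logit(intercept + slope * x) for x in covariate]
    return [f - p for f, p in zip(fitted, proportion)]


# Hypothetical areas: the target is exactly linear in the claimant rate.
claimant_rate = [0.01 * k for k in range(1, 31)]
unemployment = [0.02 + 0.9 * c for c in claimant_rate]
residuals = logit_scale_fit_residuals(claimant_rate, unemployment)
```

The residuals are positive at both ends of the covariate range and negative in between, which is the kind of systematic curvature a scatter plot of such estimates against true values would show. Entering the covariate on a scale that matches the link function would be one way to address this.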
24
Conclusions
  • Model-based and composite SAE methods are able to
    provide high quality local estimates for small
    areas, but model misspecification issues require
    careful consideration.
  • Further gains are possible by borrowing strength
    across time and space.
  • SAE methods are less effective at summarising the
    shape of the distribution of area values.
  • There are methods to overcome this problem;
    testing and further development should be a
    priority for future European research.

25
Backup Slide: Explanation of slide 14
Standard deviation of the population area means
Standard deviation of the estimated area means
26
Backup Slide: Explanation of slide 15
Variance of the population area means
Variance of the estimated area means
Note: in this slide, var = between-area variance.
27
Unit-level Synthetic Estimator
Standard linear model with unit-level covariates
28
Area-level Synthetic Logistic
Standard logistic regression model with
area-level covariates