Evaluating Anti-Poverty Programs: Concepts and Methods

Provided by: World191

Learn more at: https://message.worldbank.org

1

Evaluating Anti-Poverty Programs Concepts and
Methods
Norbert Schady
Development Research Group

2
Outline of presentation

Introduction: The evaluation problem
Possible solutions
1. Experimental evaluations
Randomization
2. Quasi-experimental evaluations
Instrumental variables
Regression discontinuity
3. Non-experimental evaluations
OLS
Matching methods
Differences-in-differences
Learning more from evaluations

3
Outline of presentation

Big disclaimer! I will frequently be drawing on
my own work in this presentation for examples

4
The evaluation problem

Assigned programs
Some units (individuals, households, villages)
get the program
Some do not
Examples
Social fund selects from applicants
School construction: some villages get a new
school, others get nothing
Cash transfers to eligible households only
Ex-post evaluation

5
The evaluation problem

Impact is the difference between the relevant
outcome indicator with the program and that
without it
However, we can never observe someone in two
different states of nature at the same time
While a post-intervention indicator is observed,
its value in the absence of the program is not: it
is a counterfactual
So all evaluation is essentially a problem of
missing data
Calls for counterfactual analysis

6
Naïve comparisons can be deceptive

Common practices
Compare outcomes after the intervention to those
before, or
Compare units (people, households, villages) with
and without the anti-poverty program
Potential biases from failure to control for
Other changes over time under the counterfactual,
or
Unit characteristics that influence program
placement

7
We observe an outcome indicator,

[Figure label: Intervention]

8
and its value rises after the program

[Figure label: Intervention]

9
However, we need to identify the counterfactual

[Figure label: Intervention]

10
since only then can we determine the impact of
the intervention

11
The evaluation problem

However, we never observe the counterfactual, and
so have to estimate it
Making comparisons between treated and
control (or comparison) groups

12
Alternative solutions

Experimental evaluations (Social experiments)
Program is randomly assigned
If properly carried out, corrects for observable
and unobservable differences between treated and
controls
Estimates ATE
Quasi-experimental evaluations
Instrumental variables
Regression discontinuity
Can correct for observable and unobservable
differences, but estimated treatment effect is
local
Non-experimental evaluations (observational
studies)
OLS
Matching techniques
Exogenous placement conditional on observables
Differences in differences or higher-order
differencing
Can correct for time-invariant, additive
differences (including in unobservables) between
treated and controls

13
Randomization

Lottery used to assign households to treatment
and control groups
If sample is large enough, this equates all
characteristics, observable and unobservable, of
both groups
Differences in outcomes can then be credibly
interpreted as program impacts
No need for complicated econometrics or
conditioning variables
A simple difference of means suffices
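
As a rough illustration of why the lottery matters, the sketch below simulates a program with a known effect and compares a difference of means under random assignment with the same comparison when units self-select on an unobservable. Everything in it (the `ability` variable, the coefficients, the sample size) is a hypothetical assumption chosen for the example, not something taken from the presentation.

```python
import random
import statistics


def difference_in_means(assignment, n=20000, true_effect=2.0, seed=0):
    """Simulate a program and return mean(treated) - mean(control)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # hypothetical unobservable that raises the outcome
        if assignment == "lottery":
            gets_program = rng.random() < 0.5  # coin flip, unrelated to ability
        else:
            gets_program = ability > 0  # self-selection on the unobservable
        outcome = 10 + 3 * ability + true_effect * gets_program + rng.gauss(0, 1)
        (treated if gets_program else control).append(outcome)
    return statistics.fmean(treated) - statistics.fmean(control)


randomized = difference_in_means("lottery")
self_selected = difference_in_means("self-selected")
print(f"true effect 2.00 | lottery {randomized:.2f} | self-selected {self_selected:.2f}")
```

Under the lottery the simple difference should land close to the true effect; under self-selection it absorbs the difference in the unobservable as well.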

14
Randomization

Randomization checks
Check random assignment
Check whether conditioning on X variables makes a
difference
Check whether cross-sectional and
first-differenced analysis yields similar results

15
Conclusion Randomization

Randomization is the benchmark for
quasi-experimental and non-experimental
evaluation methods
Has become much more popular in developing
countries in recent decades, and with good reason
Groundbreaking example of PROGRESA
However, randomization is no panacea
Often infeasible: political and moral
difficulties of denying treatment to eligible
beneficiaries who have lost a lottery
Be thoughtful about extrapolating from estimated
parameters

16
What is the estimated parameter? Is it
policy-relevant?

Randomization estimates Average Treatment Effect
(ATE) if all households in treatment group
receive the treatment and all those in control
group do not
If compliance in treatment group is imperfect,
then can estimate Intent-to-Treat (ITT): the
impact of being offered the program
Or can scale up ITT, dividing by program take-up,
to estimate Treatment-on-the-Treated (TT)
Program take-up: $R$
ITT estimate of program effect: $\beta_1$
TT estimate of program effect: $\beta_2 = \beta_1 / R$
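
A small worked example of the scaling from ITT to TT. The numbers are hypothetical, and the sketch assumes that nobody in the control group receives the program, so that take-up in the treatment group is the only source of non-compliance.

```python
def treatment_on_treated(itt, take_up):
    """Scale an intent-to-treat estimate by the take-up rate R (0 < R <= 1)."""
    if not 0 < take_up <= 1:
        raise ValueError("take-up rate must lie in (0, 1]")
    return itt / take_up


itt = 0.06      # hypothetical: the offer raises enrollment by 6 percentage points
take_up = 0.75  # hypothetical: 75% of those offered the program participate
tt = treatment_on_treated(itt, take_up)
print(f"ITT = {itt:.3f}, R = {take_up:.2f}, TT = {tt:.3f}")
```

With these inputs the effect on those who actually participate is about 8 percentage points, larger than the ITT because the ITT averages in the non-participants.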

17
What is the estimated parameter? Is it
policy-relevant?

Deeper problem: randomization often implemented
in small-scale pilots, with highly motivated
staff
Impact of large-scale, perhaps nationwide program
may be very different
US literature: the impact of attending preschool
on school outcomes
Perry Preschool program compared to Head Start
Difficult problem to overcome
In some cases, randomization takes place in the
context of a large-scale program
PROGRESA in Mexico
However, this tends to be politically difficult
to sustain
Oportunidades evaluations in Mexico

18
Quasi-experimental analysis RDD

Threshold M below which individuals are eligible
for treatment, above which they are ineligible
Intuition behind approach is you compare
individuals just above and just below this
threshold value
Proxy means: determine eligibility for programs
Scholarships in Cambodia (Filmer and Schady 2008)
School fee reduction program in Bogota (Barrera,
Linden and Urquiola 2007)
Geographic jurisdiction: program implemented in
some areas but not others
Piso Firme in Mexico: comparisons of households
just across the border in Coahuila and Durango
states (Cattaneo et al. 2007)
Class size on test scores in Bolivia (Urquiola
2007)

19
Quasi-experimental analysis RDD

Sharp RDD: the threshold $M$ perfectly predicts who
receives a given treatment and who does not
Regress outcome on flexible formulation of
control function, and dummy for treatment
Estimate $Y_i = \alpha + \delta f(C_i) + \Phi \cdot 1(C_i < M) + \varepsilon_i$
Note that, by definition, $1(C_i < M) = T_i$
Can also estimate control function
nonparametrically, above and below threshold
Fuzzy RDD: the threshold is a significant but
imperfect predictor of treatment
Estimate $Y_i = \alpha + \delta f(C_i) + \Phi T_i + \varepsilon_i$, where $T_i$ is
instrumented with $1(C_i < M)$

20
Quasi-experimental analysis RDD

Identifying assumption: no discontinuity in
counterfactual values at threshold
Essentially, threshold is given exogenously and
individuals respond mechanically to it
Can be violated if there is sorting
Urquiola and Verhoogen (2008): sorting in Chilean
education system
Schools do not want to add another class because
it is expensive
Increase fees to limit enrollment
Parents understand school behavior, and
more-educated parents sort themselves into schools
with smaller class sizes
Discontinuity in observable (and perhaps
unobservable) characteristics at threshold
violates identifying assumption
RDD check: present evidence of no observable
differences at threshold

21
Quasi-experimental analysis RDD
[Figure: Intent-to-treat effects of 45 versus no
scholarship (LHS) and 60 versus 45 (RHS).
Source: Filmer and Schady (2008)]
22
What is the estimated parameter? Is it
policy-relevant?

RDD estimates treatment effects at the threshold
If there is heterogeneity of treatment effects,
this may not correspond to the ATE
However, it may be a policy-relevant parameter
for a small expansion of the program near the
threshold
For example, for targeted programs, it will
estimate effect of expanding coverage of program
to incorporate marginal individuals

23
Quasi-experimental analysis IV

Intuition: identifying exogenous variation using
a third variable
Outcome regression
$Y_i = \beta T_i + \Phi X_i + \varepsilon_i$
Concern is that there are differences between
treated ($T_i = 1$) and control ($T_i = 0$) individuals that
are not captured by vector $X_i$
Induces correlation between $T_i$ and $\varepsilon_i$
Biased estimates of program effects
Solution: identify a variable $Z_i$ that is
correlated with $T_i$ (first stage) but is
uncorrelated with $\varepsilon_i$ (exclusion restriction)

24
Quasi-experimental analysis IV

Steps
1. Regress $T_i = \beta_1 Z_i + \Phi_1 X_i + \varepsilon_i$
Predict $\hat{T}_i$
This gives you the exogenous variation in $T_i$
2. Regress $Y_i = \beta_2 \hat{T}_i + \Phi_2 X_i + \nu_i$
In practice, this is done in one step to get the
correct standard errors
Practical difficulty: finding convincing
instruments (the exclusion restriction cannot be
tested)
If exclusion restriction does not hold, biases
can be severe

25
Quasi-experimental analysis IV

Some examples
Partially randomized design
Angrist et al. (2002) on impact of vouchers on
test scores in Colombia
Schady and Araujo (2008) on impact of cash
transfers on enrollment in Ecuador
Lottery to determine access to Bono de Desarrollo
Humano cash transfer program
But substantial contamination of control group,
which appears to be non-random
Want to determine impact of program on enrollment
Solution: regress enrollment on treatment, with
treatment instrumented by the lottery
Since the lottery was randomized, it is not
correlated with regression error term

26
Quasi-experimental analysis IV

Some examples
Political variables as instruments
Want to assess the impact of new school
infrastructure on enrollment in Peru
But placement of school infrastructure may be
endogenous
Maybe communities with tastes for education
clamor more for a new school, and tastes are
unobserved
Maybe program administrators want to place
schools in very disadvantaged areas or in areas
where they expect the returns to be highest
In any of these cases, a simple regression of
school outcomes (enrollment, test scores) on new
school infrastructure could be biased

27
Quasi-experimental analysis IV

Schady (2000) shows that the distribution of
expenditures on school infrastructure in the
Fujimori administration was partially determined
by political considerations
Districts that had voted for Fujimori in 1990 but
against Fujimori in 1993 were more likely to
receive school investments than other, comparable
districts (a buy-back strategy)
Paxson and Schady (2002) use this to construct an
instrument for school infrastructure
Regress enrollment on school infrastructure, with
school infrastructure instrumented with the
change in the share voting for Fujimori
Exclusion restriction: changes in vote share
uncorrelated with regression error term

28
Quasi-experimental analysis IV

Some examples
Program glitches
Impact of Bolsa Alimentação CCT program
Software used by program could not read special
characters
As a result, people whose names had special
characters (for example, Ângela, João, José,
Gonçalves) were rejected by the system, and did
not receive Bolsa payments in a first phase
Interested in estimating the effect of Bolsa on
an outcome, but participation in program may be
endogenous
Regress outcome (say, height-for-age z-score) on
Bolsa, with Bolsa instrumented with whether or
not applicant had special character in name

29
What is the estimated parameter? Is it
policy-relevant?

If identifying assumptions hold, IV estimates are
LATE: they estimate the impact of treatment on
outcome for complier households (Imbens and
Angrist 1994; Angrist, Imbens and Rubin 1996)
These are households whose probability of
receiving the treatment was affected by the
instrument
So, in partial randomization example, these
exclude individuals who would have received
transfers no matter what, as well as those who
would not have received transfers no matter what
Note that this is a counterfactual comparison: we
cannot identify these individuals in practice
So, if there is heterogeneity of treatment
effects, so that some households respond
differently to an intervention than others, it is
hard to extrapolate to another population, even if
IV estimator is unbiased

30
What is the estimated parameter? Is it
policy-relevant?

Also, if there is selection on expected returns
(Card 1999; Heckman and Vytlacil 2005), so that
those who stand to benefit the most are most
likely to select into the program, this selection
effect is incorporated into the estimated
treatment effects
Imagine creating a program that randomly assigns
fee waivers to some districts in a country but
not others
Since program is randomized, you can estimate
impact of fee waiver on school attainment without
additional complications

31
What is the estimated parameter? Is it
policy-relevant?

But say you also want to use this design to
estimate the impact of school attainment on wages
In theory, you could run a regression of wages on
schooling, with schooling instrumented with
whether a district was selected into the fee
waiver program
However, if those who stood to gain the most from
schooling were also more likely to respond to the
fee waiver program, as seems plausible (so-called
Roy selection), then the IV estimates of schooling
on wages include (i) the effect of schooling on
wages, and (ii) a selection effect
Heckman calls this essential heterogeneity
Card (1999, 2001) argues that this is the reason
why, contrary to expectations, instrumenting
schooling generally results in higher estimates
of the returns to schooling than those obtained
by OLS

32
Detour 1 What is the estimated parameter? Is
it policy-relevant?

So, is the estimated parameter policy relevant?
Not if you are interested in estimating the
effect of schooling on wages for the population
at large
However, it may be the right parameter if you are
considering expanding the fee waiver program and
you want to assess how this will affect wages

33
Conclusion Quasi-experimental methods

Quasi-experimental methods can be appealing
because, in the best of circumstances, they
approach the design of a randomized study
Can control for observable and unobservable
differences between treated and control
households
However, estimates are generally local in one
way or another
Makes it difficult to extrapolate to other
population groups if there is heterogeneity of
effects
Also (especially with instrumental variables)
they are opportunistic, and the exclusion
restriction is untestable
Cannot count on finding a good instrument after
a program has been rolled out and using this to
assess impact

34
Observational methods OLS

The intuition behind OLS and matching estimators
of impact is that you can correct for differences
between treated and control groups by
including a vector of characteristics Xi
Equivalently, that there is selection on
observables only
Basic set-up
$Y_i = \beta T_i + \Phi X_i + \varepsilon_i$
The coefficient ß is then an estimate of the
average treatment effect
Concerns
Selection on unobservables
Using observations outside the region of common
support
Parametric assumption

35
Observational methods Matching

Match on the probability of participation
Ideally we would match on the entire vector X of
observed characteristics
However, this is practically impossible, since X
could be huge
PSM: match on the basis of the propensity score
(Rosenbaum and Rubin 1983)
Basic steps
Step 1: Regress participation on observable
characteristics
$T_i = \beta_1 X_i + \varepsilon_i$
Predict $\hat{T}_i$, the propensity score
Step 2: Restrict sample to ensure common
support
Failure of common support is an important source
of bias in observational studies (Heckman et al.
1997)

36
Density of scores for participants
37
Density of scores for non-participants
38
Density of scores for non-participants
39
Observational methods Matching

Basic steps (continued)
Step 3: For each participant, find a sample of
non-participants with similar propensity scores
Various weighting schemes
Step 4: Compare the outcome indicators
The difference is the estimate of the gain due to
the program for that observation
Step 5: Calculate the mean of these individual
gains to obtain the average overall gain

40
Observational methods Matching

Many recent developments in the matching
literature
For example, Hirano, Imbens, and Ridder (2003)
show that a reweighting of the data by the
propensity score performs well
Step 1: Predict propensity score, $\hat{T}_i$, as
before
Step 2: Run OLS for outcome equation, weighting
treated households by $1/\hat{T}_i$ and comparison
households by $1/(1 - \hat{T}_i)$
This produces a fully efficient estimator of the
Average Treatment Effect with conservative
standard errors
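
A minimal sketch of the reweighting idea on simulated data. It assumes a single discrete covariate, so the propensity score can be estimated as the share of treated units in each cell, and it uses the fact that weighted OLS of the outcome on a treatment dummy alone equals the difference between the two weighted means. The cell probabilities and coefficients are hypothetical; this is not the estimator exactly as implemented in Hirano, Imbens, and Ridder (2003).

```python
import random
from collections import defaultdict


def ipw_estimate(n=20000, true_effect=1.0, seed=4):
    """Return the propensity-score-reweighted difference in mean outcomes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])  # observed characteristic (hypothetical)
        t = 1 if rng.random() < (0.2, 0.5, 0.8)[x] else 0
        y = 1 + 2 * x + true_effect * t + rng.gauss(0, 1)
        rows.append((x, t, y))
    # Step 1: propensity score = share of treated units within each x cell.
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, number total]
    for x, t, _ in rows:
        counts[x][0] += t
        counts[x][1] += 1
    p_hat = {x: treated / total for x, (treated, total) in counts.items()}
    # Step 2: weight treated by 1/p and comparison units by 1/(1 - p).
    sum_w_t = sum_wy_t = sum_w_c = sum_wy_c = 0.0
    for x, t, y in rows:
        if t:
            w = 1 / p_hat[x]
            sum_w_t += w
            sum_wy_t += w * y
        else:
            w = 1 / (1 - p_hat[x])
            sum_w_c += w
            sum_wy_c += w * y
    return sum_wy_t / sum_w_t - sum_wy_c / sum_w_c


ipw_effect = ipw_estimate()
print(f"reweighted estimate: {ipw_effect:.2f} (true effect 1.00)")
```

The weights make each group resemble the full sample in terms of `x`, which is why the weighted comparison targets the average treatment effect rather than the effect on participants.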

41
Conclusion OLS, matching

Low cost, and can use existing data sets
(censuses, survey)
However, need high-quality data with information
on many X variables for treated and comparison
observations
Matching is more flexible than OLS and does not
make use of data outside the region of common
support
This can be an important advantage
However, both methods are based on the assumption
of selection on observables only (no selection on
unobservables)
This is untestable and has to be argued on a
case-by-case basis
In practice, single-difference OLS and matching
can often be badly biased by unobserved
heterogeneity, correlated with treatment

42
Observational methods DD and higher-order
differences

Observed changes over time for non-participants
provide the counterfactual for participants
Steps
Collect baseline data on non-participants and
(probable) participants before the program
Compare with data after the program
Subtract the two differences, or use a regression
with a dummy variable for participant
This allows for selection bias but it must be
time-invariant and additive

43
Diff-in-diff requires that the bias be additive
and time-invariant

44
Observational methods DD and higher-order
differences

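In practice, estimate a regression of the
following form
$E_i = \beta T_i + \delta Y_i + \Phi (Y_i \cdot T_i) + \varepsilon_i$
where $Y_i$ indicates the post-program period and
$\Phi$ is the difference-in-difference estimate
of program impact
Note that this is equivalent to a regression in
first differences
$E_{it} - E_{it-1} = \delta + \Phi T_i + (\varepsilon_{it} - \varepsilon_{it-1})$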
Both approaches can also be supplemented with a
vector of characteristics Xi
Can also combine with matching
Step 1: Match observations on the basis of their
baseline observable characteristics
Step 2: Test whether outcome grew by more in
treated than in comparison units (individuals,
schools, districts)
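
A minimal simulation of the first-difference version, under the assumption the method relies on: selection bias that is additive and constant over time. The size of the bias, the common time trend, and the program effect are hypothetical numbers chosen for illustration.

```python
import random
import statistics


def simulate_dd(n=10000, true_effect=1.0, seed=5):
    """Return (post-only difference, difference-in-differences estimate)."""
    rng = random.Random(seed)
    change_t, change_c, after_t, after_c = [], [], [], []
    for _ in range(n):
        t = rng.random() < 0.5
        # Time-invariant unit effect; participants start 2.0 higher on average.
        fixed = rng.gauss(2.0 if t else 0.0, 1)
        before = fixed + rng.gauss(0, 1)
        # A common trend of 0.5 affects both groups; only participants gain.
        after = fixed + 0.5 + true_effect * t + rng.gauss(0, 1)
        if t:
            change_t.append(after - before)
            after_t.append(after)
        else:
            change_c.append(after - before)
            after_c.append(after)
    single_difference = statistics.fmean(after_t) - statistics.fmean(after_c)
    double_difference = statistics.fmean(change_t) - statistics.fmean(change_c)
    return single_difference, double_difference


single_diff, dd_estimate = simulate_dd()
print(f"true effect 1.00 | post-only {single_diff:.2f} | DD {dd_estimate:.2f}")
```

Differencing within each unit removes the fixed component, and comparing the two groups' changes removes the common trend; if the bias varied over time, neither step would remove it.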

45
Observational methods DD and higher-order
differences

Example 1: Galiani, Gertler and Schargrodsky
(2005) on impact of privatization of water
services on child mortality in Argentina
Did child mortality decrease by more in districts
that privatized water than in those that did not?
More convincing when you can show that
pre-existing trends were the same in both groups
(as they do)
Example 2: Berlinski, Galiani and Gertler (2005)
on impact of preschool attendance on test scores
in primary school
Preschool construction program: did test scores
increase by more in provinces and among cohorts
exposed to the construction program when they
were of preschool age?
More convincing with placebo experiment: only
the affected cohorts in provinces that received
the preschool intervention saw gains in test
scores

46
Observational methods DD and higher-order
differences

Example 3: Filmer and Schady (2008). Did female
school enrollment grow by more in schools that
offered female scholarships than in other schools
in Cambodia?
Yes, but
These same schools appear to have higher
pre-intervention growth rates in female
enrollment
So, triple-differencing
Did the school enrollment of girls, relative to
that of boys, grow by more in schools that
offered female scholarships?
Yes, and there were no pre-existing differences
between treated and control schools in the growth
rate of the boy-girl enrollment ratio
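
The triple-difference logic can be written out as arithmetic. The enrollment counts below are hypothetical and are not taken from Filmer and Schady (2008); they are chosen only so that each step is easy to follow.

```python
def diff_in_diff(treated, comparison):
    """Each argument is a (before, after) pair for one group of schools."""
    return (treated[1] - treated[0]) - (comparison[1] - comparison[0])


# Hypothetical enrollment counts: (before, after).
girls_dd = diff_in_diff(treated=(100, 130), comparison=(100, 115))
boys_dd = diff_in_diff(treated=(100, 112), comparison=(100, 104))

# Boys were not eligible, so their double difference picks up whatever
# differential growth the scholarship schools had for other reasons.
triple_difference = girls_dd - boys_dd
print(girls_dd, boys_dd, triple_difference)
```

Subtracting the boys' double difference nets out school-level growth that is unrelated to the scholarship, provided that growth would have been the same for girls and boys.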

47
Conclusion DD and higher-order differencing

More convincing than OLS or matching with a
single, post-intervention survey
Requires careful planning for baseline
Particularly convincing when there are placebo
experiments
Things that you would not expect to change do not
change
Scholarship program for 7th graders should have
no effects (or very small effects) on enrollment
in (say) 1st grade
Cohorts not exposed to program should not behave
differently from those who are
No apparent differences in pre-existing trends in
outcomes

48
Detour spillover effects

What if the effects of the treatment spill over
to the control group, or if there are general
equilibrium effects?
Intervention 1: Provide deworming drugs in Kenya
(Miguel and Kremer)
Program benefits extend not just to those who
receive the drugs, but also to other children in
the study areas
Intervention 2: Scholarships to low-SES girls in
Cambodia (Filmer and Schady 2008)
Concern that increased enrollment among
scholarship recipients affects enrollment
decisions of other children in same grade
Can be serious threat to identification
Possible solution: move to a higher unit of
aggregation, comparing treated and control
villages or schools, rather than individuals

49
Detour anticipation effects

What if people in the control group expect that
they will be incorporated into the treatment in
the future and change their behavior accordingly?
Consumption smoothing
Simple version of permanent income hypothesis:
all of short-term transfer income should be
invested
Or maybe control households change their behavior
(schooling, health-seeking, asset ownership)
because they think that this makes it more likely
that they will receive benefits?
Very hard to rule out
Collect qualitative data
Collect data from before baseline, and test for
unexpected changes in behavior among controls

50
Conclusions and future challenges

Moving beyond averages: assessing the impact of
program on different population groups
Great deal of accumulating evidence of
heterogeneity of treatment effects
A positive overall effect may hide a great deal
of variability, possibly including zero or
negative effects for some groups

51
Conclusions and future challenges

Open up the black box provided by impact
evaluations
What features of program matter?
For example, in explaining the impact of a CCT on
outcomes, is it the cash that matters? The
condition? The fact that transfers are made to
women?
Various options for trying to untangle possible
explanations
Structural models (Todd and Wolpin 2007) or
ex-ante simulation (Bourguignon, Ferreira and
Leite 2003)
Randomize alternative program features, perhaps
on a small-scale pilot basis (forthcoming
evaluation of a CCT in Morocco)
Collect information on other intermediate
outcomes, and see whether these help shed light
on underlying mechanisms

52
Conclusions and future challenges

Example: Macours, Schady and Vakis (2008) on
impact of the Atención a Crisis CCT program in
Nicaragua on child cognitive development among
children of preschool age
Program resulted in an improvement in language
ability of 0.17 to 0.22 standard deviations
Was it the cash, the social marketing of the
program, or the gender of the beneficiaries?
Literature identifies two key risk factors for
inadequate cognitive development in poor
countries
Inadequate nutrition (calories, proteins,
micronutrients)
Inadequate early stimulation
Program resulted in increase in food expenditures
and diversification of food consumption (out of
staples, and into fruits, vegetables, animal
proteins), and increase in stimulation inputs