Title: Evaluating Anti-Poverty Programs: Concepts and Methods
1 Evaluating Anti-Poverty Programs: Concepts and Methods
- Norbert Schady
- Development Research Group
2Outline of presentation
- Introduction: The evaluation problem
- Possible solutions
- 1. Experimental evaluations
- Randomization
- 2. Quasi-experimental evaluations
- Instrumental variables
- Regression discontinuity
- 3. Non-experimental evaluations
- OLS
- Matching methods
- Differences-in-differences
- Learning more from evaluations
3Outline of presentation
- Big disclaimer! I will frequently be drawing on
my own work in this presentation for examples
4The evaluation problem
- Assigned programs
- Some units (individuals, households, villages) get the program
- Some do not
- Examples
- Social fund selects from applicants
- School construction: some villages get a new school, others get nothing
- Cash transfers to eligible households only
- Ex-post evaluation
5The evaluation problem
- Impact is the difference between the relevant outcome indicator with the program and that without it
- However, we can never observe someone in two different states of nature at the same time
- While a post-intervention indicator is observed, its value in the absence of the program is not; it is a counterfactual
- So all evaluation is essentially a problem of missing data
- Calls for counterfactual analysis
6Naïve comparisons can be deceptive
- Common practices
- Compare outcomes after the intervention to those before, or
- Compare units (people, households, villages) with and without the anti-poverty program
- Potential biases from failure to control for
- Other changes over time under the counterfactual, or
- Unit characteristics that influence program placement
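The two biases can be written out with a standard potential-outcomes decomposition. The notation is an assumption introduced here for illustration and does not appear in the slides: $Y_i(1)$ and $Y_i(0)$ are unit $i$'s outcomes with and without the program, and $T_i = 1$ marks units that receive it.

```latex
\begin{align*}
% With/without comparison: impact on the treated plus placement bias
E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0]
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid T_i = 1]}_{\text{impact on the treated}}
   + \underbrace{E[Y_i(0) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0]}_{\text{bias from program placement}} \\[6pt]
% Before/after comparison: impact on the treated plus other changes over time
E[Y_i^{\text{after}} \mid T_i = 1] - E[Y_i^{\text{before}} \mid T_i = 1]
  &= \underbrace{E[Y_i^{\text{after}}(1) - Y_i^{\text{after}}(0) \mid T_i = 1]}_{\text{impact on the treated}}
   + \underbrace{E[Y_i^{\text{after}}(0) \mid T_i = 1] - E[Y_i^{\text{before}} \mid T_i = 1]}_{\text{change under the counterfactual}}
\end{align*}
```

In both lines the naive comparison equals the impact only when the second term is zero.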
7We observe an outcome indicator,
Intervention
8 and its value rises after the program
Intervention
9However, we need to identify the counterfactual
Intervention
10 since only then can we determine the impact of
the intervention
11The evaluation problem
- However, we never observe the counterfactual, and so have to estimate it
- Making comparisons between treated and control (or comparison) groups
12Alternative solutions
- Experimental evaluations (social experiments)
- Program is randomly assigned
- If properly carried out, corrects for observable and unobservable differences between treated and controls
- Estimates ATE
- Quasi-experimental evaluations
- Instrumental variables
- Regression discontinuity
- Can correct for observable and unobservable differences, but estimated treatment effect is local
- Non-experimental evaluations (observational studies)
- OLS
- Matching techniques
- Exogenous placement conditional on observables
- Differences in differences or higher-order differencing
- Can correct for time-invariant, additive differences (including in unobservables) between treated and controls
13Randomization
- Lottery used to assign households to treatment and control groups
- If sample is large enough, this equates all characteristics, observable and unobservable, of both groups
- Differences in outcomes can then be credibly interpreted as program impacts
- No need for complicated econometrics or conditioning variables
- A simple difference of means suffices
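A minimal sketch of the difference-of-means estimator under a lottery, using only the Python standard library. The simulated households, the true impact of 2.0, and the function name `difference_in_means` are hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean, variance


def difference_in_means(treated, control):
    """Return (estimate, standard error) of mean(treated) - mean(control)."""
    estimate = mean(treated) - mean(control)
    # Standard error for two independent samples with unequal variances.
    se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    return estimate, se


# Hypothetical simulated data: a fair lottery assigns households to treatment.
rng = random.Random(0)
TRUE_IMPACT = 2.0
treated, control = [], []
for _ in range(20000):
    baseline = rng.gauss(10.0, 3.0)  # stands in for all household characteristics
    if rng.random() < 0.5:           # the lottery
        treated.append(baseline + TRUE_IMPACT)
    else:
        control.append(baseline)

estimate, se = difference_in_means(treated, control)
print(f"estimated impact = {estimate:.2f} (s.e. {se:.2f}), true impact = {TRUE_IMPACT}")
```

Because assignment does not depend on `baseline`, the two groups have the same expected baseline and the raw difference recovers the impact up to sampling error.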
14Randomization
- Randomization checks
- Check random assignment
- Check whether conditioning on X variables makes a difference
- Check whether cross-sectional and first-differenced analyses yield similar results
15 Conclusion: Randomization
- Randomization is the benchmark for quasi-experimental and non-experimental evaluation methods
- Has become much more popular in developing countries in recent decades, and with good reason
- Groundbreaking example of PROGRESA
- However, randomization is no panacea
- Often infeasible: political and moral difficulties of denying treatment to eligible beneficiaries who have lost a lottery
- Be thoughtful about extrapolating from estimated parameters
16What is the estimated parameter? Is it
policy-relevant?
- Randomization estimates the Average Treatment Effect (ATE) if all households in the treatment group receive the treatment and all those in the control group do not
- If compliance in the treatment group is imperfect, then can estimate Intent-to-Treat (ITT), the impact of being offered the program
- Or can inflate ITT by program take-up to estimate Treatment-on-the-Treated (TT)
- Program take-up: $R$
- ITT estimate of program effect: $\beta_1$
- TT estimate of program effect: $\beta_2 = \beta_1 / R$
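The ITT-to-TT scaling can be sketched in a few lines. The numbers and the function name `itt_and_tt` are hypothetical; the sketch also assumes that nobody in the control group receives the program and that being offered the program has no effect on households that do not take it up.

```python
from statistics import mean


def itt_and_tt(y_offered, y_control, took_up):
    """Return (ITT, take-up rate R, TT = ITT / R).

    y_offered: outcomes of everyone assigned to treatment (takers and non-takers)
    y_control: outcomes of the control group
    took_up:   1/0 flags, in the same order as y_offered, for actual participation
    """
    itt = mean(y_offered) - mean(y_control)
    take_up = mean(took_up)
    if take_up == 0:
        raise ValueError("no take-up: TT is undefined")
    return itt, take_up, itt / take_up


# Hypothetical numbers: half of the offered households take up the program,
# and each participant gains 2 units.
y_offered = [12, 12, 10, 10]
took_up = [1, 1, 0, 0]
y_control = [10, 10, 10, 10]

itt, r, tt = itt_and_tt(y_offered, y_control, took_up)
print(f"ITT = {itt}, take-up R = {r}, TT = {tt}")
```

With half of the offered group participating, the ITT is half the gain per participant, and dividing by $R$ undoes that dilution.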
17What is the estimated parameter? Is it
policy-relevant?
- Deeper problem: randomization often implemented in small-scale pilots, with highly motivated staff
- Impact of large-scale, perhaps nationwide program may be very different
- US literature: the impact of attending preschool on school outcomes
- Perry Preschool program compared to Head Start
- Difficult problem to overcome
- In some cases, randomization takes place in the context of a large-scale program
- PROGRESA in Mexico
- However, this tends to be politically difficult to sustain
- Oportunidades evaluations in Mexico
18Quasi-experimental analysis RDD
- Threshold $M$ below which individuals are eligible for treatment, above which they are ineligible
- Intuition behind approach is that you compare individuals just above and just below this threshold value
- Proxy means: determines eligibility for programs
- Scholarships in Cambodia (Filmer and Schady 2008)
- School fee reduction program in Bogota (Barrera, Linden and Urquiola 2007)
- Geographic jurisdiction: program implemented in some areas but not others
- Piso Firme in Mexico: comparisons of households just across the border in Coahuila and Durango states (Cattaneo et al. 2007)
- Class size on test scores in Bolivia (Urquiola 2007)
19Quasi-experimental analysis RDD
- Sharp RDD: the threshold $M$ perfectly predicts who receives a given treatment and who does not
- Regress outcome on flexible formulation of control function, and dummy for treatment
- Estimate $Y_i = \alpha + \delta f(C_i) + \Phi \, \mathbf{1}(C_i < M) + \varepsilon_i$
- Note that, by definition, $\mathbf{1}(C_i < M) = T_i$
- Can also estimate control function nonparametrically, above and below threshold
- Fuzzy RDD: the threshold is a significant but imperfect predictor of treatment
- Estimate $Y_i = \alpha + \delta f(C_i) + \Phi T_i + \varepsilon_i$, where $T_i$ is instrumented with $\mathbf{1}(C_i < M)$
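A minimal sketch of a sharp RDD estimate, fitting a separate straight line on each side of the threshold and comparing the two fitted values at the threshold. The linear control function, the bandwidth, the noiseless data, and the names `fit_line` and `sharp_rdd` are all assumptions made for illustration; applied work would normally use a more flexible specification and report standard errors.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def sharp_rdd(score, outcome, threshold, bandwidth):
    """Jump in the outcome at the threshold (eligible side minus ineligible side).

    Units with score < threshold are treated. A separate straight line is
    fitted on each side, using only observations within `bandwidth` of the
    threshold, and the two fitted values at the threshold are compared.
    """
    below = [(c, y) for c, y in zip(score, outcome) if threshold - bandwidth <= c < threshold]
    above = [(c, y) for c, y in zip(score, outcome) if threshold <= c <= threshold + bandwidth]
    a_below, b_below = fit_line(*zip(*below))
    a_above, b_above = fit_line(*zip(*above))
    return (a_below + b_below * threshold) - (a_above + b_above * threshold)


# Hypothetical noiseless data: the outcome rises with the score, and
# treatment (score below 50) adds exactly 3.
score = list(range(100))
outcome = [1 + 0.1 * c + (3 if c < 50 else 0) for c in score]

effect = sharp_rdd(score, outcome, threshold=50, bandwidth=20)
print(f"estimated jump at the threshold = {effect:.3f}")
```

A naive comparison of mean outcomes below and above the threshold would mix the jump with the underlying slope in the score; extrapolating both fitted lines to the threshold separates the two.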
20Quasi-experimental analysis RDD
- Identifying assumption: no discontinuity in counterfactual values at threshold
- Essentially, threshold is given exogenously and individuals respond mechanically to it
- Can be violated if there is sorting
- Urquiola and Verhoogen (2008): sorting in Chilean education system
- Schools don't want to add another class because it is expensive
- Increase fees to limit enrollment
- Parents understand school behavior, and more highly educated parents sort themselves into schools with smaller class sizes
- Discontinuity in observable (and perhaps unobservable) characteristics at threshold violates identifying assumption
- RDD check: present evidence of no observable differences at threshold
21Quasi-experimental analysis RDD
Figure: Intent-to-treat effects of 45 versus no scholarship (LHS) and 60 versus 45 (RHS). Source: Filmer and Schady (2008)
22What is the estimated parameter? Is it
policy-relevant?
- RDD estimates treatment effects at the threshold
- If there is heterogeneity of treatment effects, this may not correspond to the ATE
- However, it may be a policy-relevant parameter for a small expansion of the program near the threshold
- For example, for targeted programs, it will estimate effect of expanding coverage of program to incorporate marginal individuals
23Quasi-experimental analysis IV
- Intuition: identifying exogenous variation using a third variable
- Outcome regression
- $Y_i = \beta T_i + \Phi X_i + \varepsilon_i$
- Concern is that there are differences between treated ($T_i = 1$) and control ($T_i = 0$) individuals that are not captured by vector $X_i$
- Induces correlation between $T_i$ and $\varepsilon_i$
- Biased estimates of program effects
- Solution: identify a variable $Z_i$ that is correlated with $T_i$ (first stage) but is uncorrelated with $\varepsilon_i$ (exclusion restriction)
24Quasi-experimental analysis IV
- Steps
- 1. Regress $T_i = \beta_1 Z_i + \Phi_1 X_i + \varepsilon_i$
- Predict $\hat{T}_i$
- This gives you the exogenous variation in $T_i$
- 2. Regress $Y_i = \beta_2 \hat{T}_i + \Phi_2 X_i + \mu_i$
- In practice, this is done in one step to get the correct standard errors
- Practical difficulty: finding convincing instruments (the exclusion restriction cannot be tested)
- If exclusion restriction does not hold, biases can be severe
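The two steps can be sketched on simulated data, again with the standard library only. Everything in the simulation is hypothetical: an unobserved trait `u` drives both participation and the outcome, the instrument `z` is a randomized offer, there are no covariates $X_i$, and the true effect is set to 2.0. The sketch returns only a point estimate; as noted above, standard errors taken from a manual second stage would be incorrect.

```python
import random
from statistics import mean


def slope_and_intercept(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def two_stage_least_squares(y, t, z):
    """Point estimate of the effect of t on y, using z as the instrument."""
    a1, b1 = slope_and_intercept(z, t)      # step 1: regress T on Z
    t_hat = [a1 + b1 * zi for zi in z]      # predicted T: the exogenous variation
    _, b2 = slope_and_intercept(t_hat, y)   # step 2: regress Y on T-hat
    return b2


# Hypothetical simulation: u is unobserved, raises participation and the
# outcome, and is independent of the randomized offer z.
rng = random.Random(1)
TRUE_EFFECT = 2.0
y, t, z = [], [], []
for _ in range(50000):
    zi = 1 if rng.random() < 0.5 else 0
    u = rng.gauss(0.0, 1.0)
    ti = 1 if zi + u > 0.5 else 0
    y.append(TRUE_EFFECT * ti + 3.0 * u + rng.gauss(0.0, 1.0))
    t.append(ti)
    z.append(zi)

_, ols = slope_and_intercept(t, y)
iv = two_stage_least_squares(y, t, z)
print(f"OLS = {ols:.2f}, IV = {iv:.2f}, true effect = {TRUE_EFFECT}")
```

In this setup OLS overstates the effect because participants have systematically higher `u`, while the IV estimate uses only the variation in participation induced by the offer.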
25Quasi-experimental analysis IV
- Some examples
- Partially randomized design
- Angrist et al. (2002) on impact of vouchers on test scores in Colombia
- Schady and Araujo (2008) on impact of cash transfers on enrollment in Ecuador
- Lottery to determine access to Bono de Desarrollo Humano cash transfer program
- But substantial contamination of control group, which appears to be non-random
- Want to determine impact of program on enrollment
- Solution: regress enrollment on treatment, with treatment instrumented by the lottery
- Since the lottery was randomized, it is not correlated with regression error term
26Quasi-experimental analysis IV
- Some examples
- Political variables as instruments
- Want to assess the impact of new school infrastructure on enrollment in Peru
- But placement of school infrastructure may be endogenous
- Maybe communities with tastes for education clamor more for a new school, and tastes are unobserved
- Maybe program administrators want to place schools in very disadvantaged areas or in areas where they expect the returns to be highest
- In any of these cases, a simple regression of school outcomes (enrollment, test scores) on new school infrastructure could be biased
27Quasi-experimental analysis IV
- Schady (2000) shows that the distribution of expenditures on school infrastructure in the Fujimori administration was partially determined by political considerations
- Districts that had voted for Fujimori in 1990 but against Fujimori in 1993 were more likely to receive school investments than other, comparable districts (a buy-back strategy)
- Paxson and Schady (2002) use this to construct an instrument for school infrastructure
- Regress enrollment on school infrastructure, with school infrastructure instrumented with the change in the share voting for Fujimori
- Exclusion restriction: changes in vote share uncorrelated with regression error term
28Quasi-experimental analysis IV
- Some examples
- Program glitches
- Impact of Bolsa Alimentação CCT program
- Software used by program could not read special characters
- As a result, people whose names had special characters (for example, Ângela, João, José, Gonçalves) were rejected by the system, and did not receive Bolsa payments in a first phase
- Interested in estimating the effect of Bolsa on an outcome, but participation in program may be endogenous
- Regress outcome (say, height-for-age z-score) on Bolsa, with Bolsa instrumented with whether or not applicant had special character in name
29What is the estimated parameter? Is it
policy-relevant?
- If identifying assumptions hold, IV estimates are LATE: they estimate the impact of treatment on outcome for complier households (Imbens and Angrist 1994; Angrist, Imbens and Rubin 1996)
- These are households whose probability of receiving the treatment was affected by the instrument
- So, in partial randomization example, these exclude individuals who would have received transfers no matter what, as well as those who would not have received transfers no matter what
- Note that this is a counterfactual comparison: we cannot identify these individuals in practice
- So, if there is heterogeneity of treatment effects, so that some households respond differently to an intervention than others, it is hard to extrapolate to another population, even if IV estimator is unbiased
30What is the estimated parameter? Is it
policy-relevant?
- Also, if there is selection on expected returns (Card 1999; Heckman and Vytlacil 2005), so that those who stand to benefit the most are most likely to select into the program, this selection effect is incorporated into the estimated treatment effects
- Imagine creating a program that randomly assigns fee waivers to some districts in a country but not others
- Since program is randomized, you can estimate impact of fee waiver on school attainment without additional complications
31What is the estimated parameter? Is it
policy-relevant?
- But say you also want to use this design to estimate the impact of school attainment on wages
- In theory, you could run a regression of wages on schooling, with schooling instrumented with whether a district was selected into the fee waiver program
- However, if those who stood to gain the most from schooling were also more likely to respond to the fee waiver program, as seems plausible (so-called Roy selection), then the IV estimates of schooling on wages include (i) the effect of schooling on wages, and (ii) a selection effect
- Heckman calls this essential heterogeneity
- Card (1999; 2001) argues that this is the reason why, contrary to expectations, instrumenting schooling generally results in higher estimates of the returns to schooling than those obtained by OLS
32Detour 1 What is the estimated parameter? Is
it policy-relevant?
- So, is the estimated parameter policy relevant?
- Not if you are interested in estimating the effect of schooling on wages for the population at large
- However, it may be the right parameter if you are considering expanding the fee waiver program and you want to assess how this will affect wages
33 Conclusion: Quasi-experimental methods
- Quasi-experimental methods can be appealing because, in the best of circumstances, they approach the design of a randomized study
- Can control for observable and unobservable differences between treated and control households
- However, estimates are generally local in one way or another
- Makes it difficult to extrapolate to other population groups if there is heterogeneity of effects
- Also (especially with instrumental variables) they are opportunistic, and the exclusion restriction is untestable
- Cannot count on finding a good instrument after a program has been rolled out and using this to assess impact
34Observational methods OLS
- The intuition behind OLS and matching estimators of impact is that you can correct for differences between treated and control groups by including a vector of characteristics $X_i$
- Equivalently, that there is selection on observables only
- Basic set-up
- $Y_i = \beta T_i + \Phi X_i + \varepsilon_i$
- The coefficient $\beta$ is then an estimate of the average treatment effect
- Concerns
- Selection on unobservables
- Using observations outside the region of common support
- Parametric assumption
35Observational methods Matching
- Match on the probability of participation
- Ideally we would match on the entire vector $X$ of observed characteristics
- However, this is practically impossible, since $X$ could be huge
- PSM: match on the basis of the propensity score (Rosenbaum and Rubin 1983)
- Basic steps
- Step 1: Regress participation on observable characteristics
- $T_i = \beta_1 X_i + \varepsilon_i$
- Predict $\hat{T}_i$, the propensity score
- Step 2: Restrict sample to ensure common support
- Failure of common support is an important source of bias in observational studies (Heckman et al. 1997)
36Density of scores for participants
37Density of scores for non-participants
38Density of scores for non-participants
39Observational methods Matching
- Basic steps (continued)
- Step 3: For each participant, find a sample of non-participants with similar propensity scores
- Various weighting schemes
- Step 4: Compare the outcome indicators
- The difference is the estimate of the gain due to the program for that observation
- Step 5: Calculate the mean of these individual gains to obtain the average overall gain
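Steps 2 to 5 can be sketched with one-to-one nearest-neighbor matching, which is only one of the possible weighting schemes. The propensity scores are assumed to have been estimated already (Step 1 is not shown), and the data and the function name `psm_nearest_neighbor` are hypothetical.

```python
from statistics import mean


def psm_nearest_neighbor(treated, controls):
    """Average gain for participants via one-to-one nearest-neighbor matching.

    treated, controls: lists of (propensity_score, outcome) pairs.
    Matching is done with replacement, after trimming to the common support.
    """
    # Step 2: keep only scores in the range covered by both groups.
    lo = max(min(p for p, _ in treated), min(p for p, _ in controls))
    hi = min(max(p for p, _ in treated), max(p for p, _ in controls))
    treated_cs = [(p, y) for p, y in treated if lo <= p <= hi]
    controls_cs = [(p, y) for p, y in controls if lo <= p <= hi]
    if not treated_cs or not controls_cs:
        raise ValueError("no common support")

    gains = []
    for p, y in treated_cs:
        # Step 3: the non-participant with the closest propensity score.
        _, y_match = min(controls_cs, key=lambda c: abs(c[0] - p))
        # Step 4: gain for this participant.
        gains.append(y - y_match)
    # Step 5: mean of the individual gains.
    return mean(gains)


# Hypothetical (score, outcome) pairs; scores are assumed already estimated.
treated = [(0.60, 12.0), (0.80, 15.0), (0.95, 20.0)]
controls = [(0.20, 5.0), (0.62, 9.0), (0.82, 12.0)]

gain = psm_nearest_neighbor(treated, controls)
print(f"average gain on the common support = {gain}")
```

The participant with score 0.95 and the non-participant with score 0.20 fall outside the common support and are dropped, so the estimate applies only to participants who have comparable non-participants.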
40Observational methods Matching
- Many recent developments in the matching literature
- For example, Hirano, Imbens, and Ridder (2003) show that a reweighting of the data by the propensity score performs well
- Step 1: Predict propensity score, $\hat{T}_i$, as before
- Step 2: Run OLS for outcome equation, weighting treated households by $1/\hat{T}_i$ and comparison households by $1/(1-\hat{T}_i)$
- This produces a fully efficient estimator of the Average Treatment Effect with conservative standard errors
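A minimal sketch of the reweighting idea. With no covariates in the outcome equation, weighted OLS of the outcome on a constant and the treatment dummy reduces to a difference in weighted means, which is what the hypothetical function `ipw_ate` below computes. The propensity scores are assumed known (in practice they come from Step 1), and the eight households are invented for illustration.

```python
def ipw_ate(outcomes, treated, pscores):
    """Difference in inverse-probability-weighted mean outcomes.

    Treated units get weight 1/p and comparison units 1/(1-p); the result
    equals the coefficient on treatment in a weighted OLS of the outcome on
    a constant and the treatment dummy.
    """
    num_t = den_t = num_c = den_c = 0.0
    for y, t, p in zip(outcomes, treated, pscores):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t:
            num_t += y / p
            den_t += 1.0 / p
        else:
            num_c += y / (1.0 - p)
            den_c += 1.0 / (1.0 - p)
    return num_t / den_t - num_c / den_c


# Hypothetical data with one binary characteristic x. Households with x = 1
# are more likely to participate (p = 0.75 versus 0.25) and have higher
# outcomes regardless; the true gain is 2 for everyone.
x = [0, 0, 0, 0, 1, 1, 1, 1]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
outcomes = [2, 0, 0, 0, 12, 12, 12, 10]
pscores = [0.25 if xi == 0 else 0.75 for xi in x]

n_treated = sum(treated)
naive = (sum(y for y, t in zip(outcomes, treated) if t) / n_treated
         - sum(y for y, t in zip(outcomes, treated) if not t) / (len(treated) - n_treated))
weighted = ipw_ate(outcomes, treated, pscores)
print(f"naive difference = {naive}, reweighted estimate = {weighted:.3f}")
```

The naive difference is inflated because participants are concentrated among households with `x = 1`; the weights rebalance both groups to the full-sample mix of `x`.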
41 Conclusion: OLS, matching
- Low cost, and can use existing data sets (censuses, surveys)
- However, need high-quality data with information on many X variables for treated and comparison observations
- Matching is more flexible than OLS and does not make use of data outside the region of common support
- This can be an important advantage
- However, both methods are based on the assumption of selection on observables only (that is, no selection on unobservables)
- This is untestable and has to be argued on a case-by-case basis
- In practice, single-difference OLS and matching can often be badly biased by unobserved heterogeneity that is correlated with treatment
42Observational methods DD and higher-order
differences
- Observed changes over time for non-participants provide the counterfactual for participants
- Steps
- Collect baseline data on non-participants and (probable) participants before the program
- Compare with data after the program
- Subtract the two differences, or use a regression with a dummy variable for participant
- This allows for selection bias, but it must be time-invariant and additive
43Diff-in-diff requires that the bias be additive
and time-invariant
44Observational methods DD and higher-order
differences
- In practice, estimate a regression of the following form
- $E_{it} = \beta T_i + \delta Y_t + \Phi (Y_t T_i) + \varepsilon_{it}$
- where $T_i$ is a dummy for participants, $Y_t$ is a dummy for the post-program period, and $\Phi$ is the difference-in-difference estimate of program impact
- Note that this is equivalent to a regression in first differences
- $E_{it} - E_{it-1} = \delta + \Phi T_i + (\varepsilon_{it} - \varepsilon_{it-1})$
- Both approaches can also be supplemented with a vector of characteristics $X_i$
- Can also combine with matching
- Step 1: Match observations on the basis of their baseline observable characteristics
- Step 2: Test whether outcome grew by more in treated than in comparison units (individuals, schools, districts)
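With two groups and two periods, the difference-in-difference estimate is a double difference of four means. The sketch below uses hypothetical enrollment figures built so that participants start from a lower level, both groups share a common trend of 3 points, and the true impact is 3 points; the function name `diff_in_diff` is also an assumption.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change among participants minus change among non-participants."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical enrollment rates (percent). Participants start 2 points lower
# (an additive, time-invariant difference) and both groups share a common
# upward trend of 3 points; the program adds 3 points.
treated_before, treated_after = [8.0, 10.0], [14.0, 16.0]
control_before, control_after = [10.0, 12.0], [13.0, 15.0]

dd = diff_in_diff(treated_before, treated_after, control_before, control_after)
naive_with_without = mean(treated_after) - mean(control_after)
naive_before_after = mean(treated_after) - mean(treated_before)
print(f"DD = {dd}, with/without = {naive_with_without}, before/after = {naive_before_after}")
```

The with/without comparison understates the impact because of the lower starting level, the before/after comparison overstates it because of the common trend, and the double difference removes both.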
45Observational methods DD and higher-order
differences
- Example 1: Galiani, Gertler and Schargrodsky (2005) on impact of privatization of water services on child mortality in Argentina
- Did child mortality decrease by more in districts that privatized water than in those that did not?
- More convincing when you can show that pre-existing trends were the same in both groups (as they do)
- Example 2: Berlinski, Galiani and Gertler (2005) on impact of preschool attendance on test scores in primary school
- Preschool construction program: did test scores increase by more in provinces and among cohorts exposed to the construction program when they were of preschool age?
- More convincing with placebo experiment: only the affected cohorts in provinces that received the preschool intervention saw gains in test scores
46Observational methods DD and higher-order
differences
- Example 3: Filmer and Schady (2008). Did female school enrollment grow by more in schools that offered female scholarships than in other schools in Cambodia?
- Yes, but
- These same schools appear to have higher pre-intervention growth rates in female enrollment
- So, triple-differencing
- Did the school enrollment of girls, relative to that of boys, grow by more in schools that offered female scholarships?
- Yes, and there were no pre-existing differences between treated and control schools in the growth rate of the boy-girl enrollment ratio
47 Conclusion: DD and higher-order differencing
- More convincing than OLS or matching with a single, post-intervention survey
- Requires careful planning for baseline
- Particularly convincing when there are placebo experiments
- Things that you would not expect to change don't change
- Scholarship program for 7th graders should have no effects (or very small effects) on enrollment in (say) 1st grade
- Cohorts not exposed to program should not behave differently from those who are
- No apparent differences in pre-existing trends in outcomes
48Detour spillover effects
- What if the effects of the treatment spill over to the control group, or if there are general equilibrium effects?
- Intervention 1: Provide deworming drugs in Kenya (Miguel and Kremer)
- Program benefits extend not just to those who receive the drugs, but also to other children in the study areas
- Intervention 2: Scholarships to low-SES girls in Cambodia (Filmer and Schady 2008)
- Concern that increased enrollment among scholarship recipients affects enrollment decisions of other children in same grade
- Can be serious threat to identification
- Possible solution: move to a higher unit of aggregation and compare treated and control villages or schools, rather than individuals
49Detour anticipation effects
- What if people in the control group expect that they will be incorporated into the treatment in the future and change their behavior accordingly?
- Consumption smoothing
- Simple version of permanent income hypothesis: all of short-term transfer income should be invested
- Or maybe control households change their behavior (schooling, health-seeking, asset ownership) because they think that this makes it more likely that they will receive benefits?
- Very hard to rule out
- Collect qualitative data
- Collect data from before baseline, and test for unexpected changes in behavior among controls
50Conclusions and future challenges
- Moving beyond averages: assessing the impact of program on different population groups
- Great deal of accumulating evidence of heterogeneity of treatment effects
- A positive overall effect may hide a great deal of variability, possibly including zero or negative effects for some groups
51Conclusions and future challenges
- Open up the "black box" provided by impact evaluations
- What features of program matter?
- For example, in explaining the impact of a CCT on outcomes, is it the cash that matters? The condition? The fact that transfers are made to women?
- Various options for trying to untangle possible explanations
- Structural models (Todd and Wolpin 2007) or ex-ante simulation (Bourguignon, Ferreira and Leite 2003)
- Randomize alternative program features, perhaps on a small-scale pilot basis (forthcoming evaluation of a CCT in Morocco)
- Collect information on other intermediate outcomes, and see whether these help shed light on underlying mechanisms
52 Conclusions and future challenges
- Example: Macours, Schady and Vakis (2008) on impact of the Atención a Crisis CCT program in Nicaragua on child cognitive development among children of preschool age
- Program resulted in an improvement in language ability of 0.17 to 0.22 standard deviations
- Was it the cash, the social marketing of the program, or the gender of the beneficiaries?
- Literature identifies two key risk factors for inadequate cognitive development in poor countries
- Inadequate nutrition (calories, proteins, micronutrients)
- Inadequate early stimulation
- Program resulted in increase in food expenditures and diversification of food consumption (out of staples, and into fruits, vegetables, animal proteins), and increase in stimulation inputs
54 Conclusions and future challenges
- Example: Macours, Schady and Vakis (2008), continued
- But can the changes in inputs be fully explained by the increase in income?
- Engel curve analysis