Title: Why Stay in the Dark About Real Program Results Shedding New Light on Methods for Revitalizing Evalu
1. Why Stay in the Dark About Real Program Results?
Shedding New Light on Methods for Revitalizing
Evaluation in Health
- Charles Teller, USAID
- Philip Setel, MEASURE Evaluation
2. To stay in the dark or not on evidence-based
results of PRH programs?
- Why do we need rigorous program evaluation and
evaluation research?
3. What is the current situation on M&E of GH
programs in general, and in USAID in particular?
4. What is the new USAID directive on "revitalizing"
evaluation?
- Objective is to clearly demonstrate results that
USAID is achieving with taxpayer funds.
- Mission actions
- Appointing an M&E officer
- Setting aside funds for evaluation during design
phase
- Preparing Mission Order on M&E
- Preparing an annual Mission Evaluation Plan
- Providing evaluation training for CTOs, TAs, SO
Leaders
- Offering incentives to those who promote the use of
evaluations
5. What is PRH doing on revitalization?
- New strategic framework with CAs
- MEASURE Evaluation Policy Program Coordination
updating training
- New M&E Working Group
- Individual training and mentoring to new CAs
- Rigorous evaluation studies (examples)
- Updating indicators manuals in M&E
- Supporting PMP development under new Fragile
States strategy.
6. What methodological innovations are being made by the
MEASURE Evaluation project to address these
issues?
- DDU, Capacity Building, Com-Based Info systems
(PRISM, SAVVY, PLACE), GIS, Pop-Environ.,
poverty-equity, etc.
7. Can we do M&E business as usual in the so-called
Fragile States?
- Innovations in M&E/Strategic Information/surveillance
on gender-based violence
- USAID through MEASURE Evaluation providing ideas
on design for such a system, including a
decision-support approach and indicators with
WHO, UNICEF, UNFPA and others
8. SUMMARY: to light a candle or curse the darkness?
9. MEASURE Evaluation: no innovation without
evaluation
- Charles mentioned several of the methods MEASURE
Evaluation has produced or is developing and
applying.
- Priorities for Local AIDS Control Efforts (PLACE)
- Sample Vital Registration with Verbal Autopsy
(SAVVY)
- PRISM
- Poverty Measures
- Innovation and revitalization are all very nice,
but hang on a second: how do we know that these
innovations are up to the task?
10. Some examples
- Limited time to discuss a number of activities
and how we do our best to ensure that the M&E
methods we develop answer the questions that need
to be asked.
- I'll discuss two
- SAVVY
- Verbal Autopsy validation
- Poverty Measures
- Consumption Expenditure Proxy validation
11. SAVVY
- Demographic Surveillance and Mortality Surveillance
- D is for Denominators!
- Mortality surveillance based on verbal autopsy
(VA)
- How do we validate VA?
12. Background and objectives
- Compare the VA to a gold standard (i.e. medical
records)
- Validation of VA procedures for three age groups
- Perinatal/neonatal
- Post-neonatal < 5
- Age 5 and over
- Cause of death list/coding is important!
- International comparability
- International Classification of Diseases (ICD)
13. Methods
- A time sample of deaths, or a quota sample of a
certain number of deaths by various causes.
- Must have
- Death occurred in health facility, or
- Death occurred at home, but contact with a health
facility before death (so some record)
- AND
- A VA for the same individuals to use as the basis
of comparison.
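The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `validate_va`, the cause labels, and the paired data are hypothetical and are not results from the study. The medical record (MR) diagnosis is treated as the gold standard, and the sketch reports sensitivity, specificity, and the cause-specific mortality fraction (CSMF) from each source for one cause.

```python
def validate_va(pairs, cause):
    """Compare VA against the medical record (gold standard) for one cause.

    pairs: list of (medical_record_cause, va_cause) tuples, one per death.
    Returns sensitivity, specificity, and the cause-specific mortality
    fraction (CSMF) according to each source.
    """
    tp = sum(1 for mr, va in pairs if mr == cause and va == cause)
    fn = sum(1 for mr, va in pairs if mr == cause and va != cause)
    fp = sum(1 for mr, va in pairs if mr != cause and va == cause)
    tn = sum(1 for mr, va in pairs if mr != cause and va != cause)
    n = len(pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "csmf_mr": (tp + fn) / n,  # share of deaths with this cause per MR
        "csmf_va": (tp + fp) / n,  # share of deaths with this cause per VA
    }


# Made-up illustrative pairs (MR cause, VA cause); not study data.
pairs = (
    [("malaria", "malaria")] * 8
    + [("malaria", "pneumonia")] * 2
    + [("pneumonia", "pneumonia")] * 6
    + [("pneumonia", "malaria")] * 1
    + [("injury", "injury")] * 3
)
result = validate_va(pairs, "malaria")
print(result)
```

A VA can have modest sensitivity for a cause and still give a usable CSMF if false positives and false negatives roughly cancel, which is why both kinds of measure are worth reporting.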
14. Coding and Cause of Death Assignment
- Coding
- ICD training provided to coding physicians;
deaths coded to ICD-10 core four-digit levels
- 3-line death certificates produced for all VAs
and medical records
- No physician codes both MR and VA for same
individual
- Validity of ICD coding verified using tools from
US National Center for Health Statistics.
15. But how many carats is the gold standard?
- After verifying validity of underlying COD
- Appropriate diagnostic tests
- Appropriate treatment
- Documented signs
- Reported presenting symptoms
- Consistent past medical history
16. Summing up good performance (Tanzania example)
- Perinatal and neonatal causes
- birth asphyxia/respiratory disorders
- intrauterine complications
- Pneumonia
- Post-neonatal child causes
- Pneumonia
- injuries
17. Summing up good performance
- Population age 5 and over
- HIV/AIDS (ICD codes B20-B24)
- Malaria
- Tuberculosis (ICD codes A15-A19)
- Cerebrovascular diseases
- Injuries
- Direct maternal causes
18. Summing up VA validation issues
- Generalizability of hospital-based validation
results to community-based data (no practical
validation method!).
- VA performed reasonably well (according to
specified criteria) for at least 9 causes across
all age groups
- Cause-specific mortality rates possible
- For causes that did not perform well
- Trends and priority setting generally still OK
- Is this poorer performance due to sample sizes?
- Or inherent limitations of VA?
- We don't know yet
- How many carats is the gold standard and how do
we factor this into validation studies?
19. Poverty Measurement
- Wealth in people versus wealth in things
- Wealth in things: Permanent Income (PI)
- Consumption Expenditure as best guess of PI
- Proxies for Consumption Expenditure
- Get you absolute and relative measures
- i.e. how many are below the poverty line?
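As a small illustration of absolute versus relative measures, the sketch below computes a poverty headcount ratio from consumption figures. The function names, the consumption values, and the poverty line are hypothetical. The relative line at 60% of the median is a common convention assumed here for illustration; it is not taken from the presentation.

```python
from statistics import median


def headcount_ratio(consumption, poverty_line):
    """Share of households whose consumption is below the poverty line."""
    if not consumption:
        raise ValueError("consumption list is empty")
    poor = sum(1 for c in consumption if c < poverty_line)
    return poor / len(consumption)


def relative_headcount(consumption, fraction_of_median=0.6):
    """Headcount against a relative line set as a fraction of the median.

    The 60% default is an assumed convention, not the presenters' choice.
    """
    return headcount_ratio(consumption, fraction_of_median * median(consumption))


# Hypothetical monthly consumption per adult equivalent (arbitrary units).
consumption = [10, 20, 30, 40, 100]
absolute = headcount_ratio(consumption, poverty_line=25)
relative = relative_headcount(consumption)
print(absolute, relative)
```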
20. How to develop and validate a rapid consumption
expenditure proxy
- NOT EASY!
21. Which Construct?
- From theory standpoint, options were many
- huge literature and menu of poverty measures
- Quickly narrowed to 2, taking constraints and
criteria of estimating PI into account
- asset index approach
- validated consumption expenditure proxy (CEP)
- (cf Morris 2000)
22. Development of a CEP
- First available data from a Household Budget
Survey (HBS) or Living Standards Measurement
Survey (LSMS) used to develop preliminary models,
separately for rural and urban households.
- Models identified limited set of potential
variables from a sub-set of variables.
- Full HBS or LSMS data then used to evaluate and
thereby adjust the most appropriate model.
- Final models used to predict estimates of monthly
household consumption expenditure per adult
equivalent in an evaluation study.
23. Household Budget Survey (Tanzania)
- 2000/01 National Bureau of Statistics HBS
provided source data.
- 22,000 households
- Consumption expenditure per adult equivalent
calculated on the basis of
- Detailed expenditure data collected over a 28-day
period, combined with
- a 12-month recall on major items of expenditure.
- Billions and billions of variables! (well, not
that many, but too many to include for an
evaluation study!)
24. Model Development Data
- Regression modeling used with
- Household level variables, e.g. type of toilet
facilities, access to water, ownership of a
number of assets, etc., as poverty proxies.
- If source data set allows, separate models can be
developed and validated for sub-national areas where
evaluation is desired (regional level probably
lowest level in most cases).
25. Minimization and Validation
- Analytical Methods
- Variables selected using a backward elimination
procedure, but considering the possible
conceptual/local importance of variables
previously removed from the model (e.g. spending
money on fertilizer or seasonal labor).
- Model developed using part of the data, and
validated on remaining observations.
- Basic validation question: How well does the
minimal model predict the true consumption of the
household?
26. Validation: Model applied to an external data set
- Data set A used for fitting the model
- Remaining data (set B) used for validating model
- r = 0.72
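The minimization and split-sample validation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the variables, coefficients, and data are synthetic, and the elimination rule (drop the variable whose removal costs the least R², stop once the loss exceeds a threshold) is a simple stand-in for whatever criterion the analysts applied. The manual step of reconsidering removed variables for conceptual or local importance is not automated here.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y, cols):
    """Least-squares coefficients (intercept first) using only columns `cols`."""
    Z = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(Z[0])
    xtx = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, X, cols):
    return [beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols)) for row in X]


def r_squared(X, y, cols):
    pred = predict(fit_ols(X, y, cols), X, cols)
    mean_y = sum(y) / len(y)
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - rss / tss


def backward_eliminate(X, y, n_vars, max_r2_loss=0.02):
    """Repeatedly drop the variable whose removal costs the least R^2."""
    cols = list(range(n_vars))
    while len(cols) > 1:
        full = r_squared(X, y, cols)
        trials = [(r_squared(X, y, [c for c in cols if c != d]), d) for d in cols]
        best_r2, drop = max(trials)
        if full - best_r2 > max_r2_loss:
            break  # every remaining variable matters too much to drop
        cols.remove(drop)
    return cols


def pearson_r(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# Synthetic households: hypothetical variables and coefficients, not HBS data.
NAMES = ["household_size", "meat_days", "owns_bicycle", "noise"]
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    row = [rng.randint(1, 10), rng.randint(0, 7), rng.randint(0, 1), rng.gauss(0, 1)]
    X.append(row)
    y.append(50 - 3 * row[0] + 4 * row[1] + 10 * row[2] + rng.gauss(0, 8))

# Records are generated in random order, so a positional split is a random split.
X_a, y_a = X[:300], y[:300]  # set A: minimize and fit the model
X_b, y_b = X[300:], y[300:]  # set B: held out for validation

kept = backward_eliminate(X_a, y_a, len(NAMES))
beta = fit_ols(X_a, y_a, kept)
r_b = pearson_r(y_b, predict(beta, X_b, kept))
print([NAMES[c] for c in kept], round(r_b, 2))
```

The correlation computed on set B answers the basic validation question more honestly than the fit on set A, because set B played no part in choosing variables or estimating coefficients.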
27. Model Results
- Best predictors of consumption expenditure
explained 60-65% of variation in consumption
expenditure
- Compression toward the mean (misclassifies some
of the poorest).
- Common variables to consider
- Household size
- Education level of head of household
- Number of days meat eaten in past week.
- Urban variables
- Status of walls
- Whether household owned an iron, an electric/gas
stove, an automobile
- In past month whether household paid money to
purchase certain food items
- Rural variables
- Area of land used for farming/pastoralism
- Whether household spent money to purchase
agricultural inputs.
- Number of persons employed in household (inc.
self employed)
- Main source of drinking water
- In past 12 months whether household spent money
to purchase fertiliser/manure
- Whether household owned a bicycle, owned a bed
net
- Toilet facility available
- Main fuel used for lighting.
28. Best predictors of consumption expenditure
(rural)
- Kilimanjaro (rural) R² = 65%
- Age of household head
- Area of land used for farming/pastoralism
- In past 12 months whether household spent money
to purchase seeds.
- In past 12 months whether household spent money
to purchase fertiliser/manure.
- Whether household owned a bicycle, sofa, lamp
- Main source of cash income.
- Morogoro (rural) R² = 56%
- Sex and age of household head
- Number of persons employed in household (inc.
self employed)
- Dependency ratio
- Number of persons per sleeping room
- Main source of drinking water
- In past 12 months whether household spent money
to purchase fertiliser/manure
- Whether household owned a bicycle, owned a bed
net
- Status of walls
- Toilet facility available
- Main fuel used for lighting.
29. Model Performance
30. AMMP CEP Compared to PCA Index
- Some concerns relating to PCA-derived asset
index
- Variable selection?
- Connection to wealth
- Binary variables
- Would one set of PCA coefficients, nationally
derived, be likely to give meaningful results?
- Is it valid to regard a categorical variable
(e.g. main source of drinking water) as a set of
independent binary variables in PCA?
- To what extent is the asset index suitable for
determining wealth quintiles?
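The concern about categorical variables can be made concrete with a small sketch. When one categorical variable (for example, main source of drinking water) is split into binary indicators, each household has exactly one indicator set to 1, so the indicators are dependent by construction. The category labels and data below are hypothetical.

```python
def one_hot(values):
    """Split a categorical variable into one binary indicator per category."""
    cats = sorted(set(values))
    return cats, [[1 if v == c else 0 for c in cats] for v in values]


def pearson_r(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# Hypothetical main source of drinking water for six households.
water = ["piped", "well", "river", "piped", "well", "piped"]
cats, dummies = one_hot(water)

# Exactly one indicator is 1 per household, so the columns always sum to 1.
row_sums = [sum(row) for row in dummies]

piped = [row[cats.index("piped")] for row in dummies]
well = [row[cats.index("well")] for row in dummies]
r_piped_well = pearson_r(piped, well)
print(row_sums, round(r_piped_well, 2))
```

Indicators from the same categorical variable are negatively correlated simply because a household in one category cannot be in another, so a PCA that treats them as independent items is partly picking up this built-in structure rather than wealth.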
31. Graphical Comparison
[Figure: three panels: (1) PCA Asset Index (r = 0.46), (2) Additive Index (r = 0.44), (3) CEP (r = 0.76)]
32. Conclusions 1
- Proxies generally perform poorly, but CEP may be
best of worst so far
- Method requires HBS or similar, plus separate
modeling for each region of a country
- CEP approach stood up well to model assessment
criteria and cross-validation against an external
data set.
33. Conclusions 2
- Performance was reasonable for predicting means
- So able to use these in relating to health
outcome variables (measured at community level).
- Predictions at an individual household level are
much less reliable.
- Less than 50% of population classified into
correct quintile, but results much better
compared with similar results from alternatives.
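The "correct quintile" figure can be computed as in the sketch below: rank households by true and by predicted consumption, cut each ranking into fifths, and count how often the two quintiles agree. The function names and the data are hypothetical, and ties are broken by input order, which is an assumption of this sketch.

```python
def quintiles(values):
    """Assign each value a quintile from 1 (lowest fifth) to 5 (highest fifth)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    q = [0] * n
    for rank, i in enumerate(order):
        q[i] = rank * 5 // n + 1
    return q


def quintile_agreement(actual, predicted):
    """Share of households placed in the same quintile by both measures."""
    qa, qp = quintiles(actual), quintiles(predicted)
    return sum(1 for a, p in zip(qa, qp) if a == p) / len(actual)


# Hypothetical true and proxy-predicted consumption for ten households.
actual = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
predicted = [12, 18, 45, 35, 52, 58, 95, 75, 85, 90]
agreement = quintile_agreement(actual, predicted)
print(agreement)
```

Exact quintile agreement is a demanding criterion: a household just either side of a cut point counts as misclassified even when the predicted value is close, which is one reason group means can be estimated reasonably while household-level classification looks poor.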
34. Recommendations
- No innovation without validation!
- In 2005 this should be non-negotiable.
- How good are these new tools?
- How good are the old ones?
- Keep your eyes on the evaluation prize!
- Given the non-negotiable need to know something
concrete about method performance
- How much power does your evaluation need?
- Plausibility? Explanatory? Validating right
priorities for health decision-making
- Do you get what you pay for?
- Depends on the decisions and results
35.
- cteller_at_usaid.gov
- USAID
- psetel_at_unc.edu
- MEASURE Evaluation
- Carolina Population Center
- University of North Carolina at Chapel Hill
- http://www.cpc.unc.edu/measure
36. Pearls
- (To be developed in session)