Title: Improved County Level Estimation of Crop Yield Using Model-Based Methodology With a Spatial Component
1. Improved County Level Estimation of Crop Yield Using Model-Based Methodology With a Spatial Component
- Michael E. Bellow, USDA/NASS
2. Outline
- Background
- Simulation Methodology
- Results of Ten State Study
- Convergence Evaluation
- Summary
3. County Level Commodity Estimation
- NASS program since 1917
- Estimates used by private sector, academia, government
- Data from various sources used
- NASS County Estimates System developed to facilitate the estimation process
4. Available Data Sources
- Voluntary response surveys of farm operators
- List frame control data (lists of known farming operations)
- Previous year official estimates
- Census of Agriculture data (NASS conducts Census every five years)
- Earth resources satellite data
5. County Crop Yield Estimation
- Yield is ratio of crop production to harvested area (acres)
- Accurate estimation challenging due to
  - reliable administrative data seldom available
  - high year-to-year variability of yields (weather-sensitive)
  - lack of adequate sample survey data
6. Desirable Features of a County Yield Estimation Method
- Repeatability
- Accurate variance estimation
- Produce estimates for counties having no survey data
7. Ratio (R) Estimator
- Traditional crop yield estimator used by NASS
- Computed as ratio between production and harvested area estimates (with minor adjustment)
- Can produce inconsistent yields due to fluctuations in harvested acreage
- No utilization of survey data from counties other than the one being estimated
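As a rough illustration of the ratio estimator, the sketch below divides total production by total harvested acreage for a single county. The function name and the sample figures are hypothetical, and the "minor adjustment" mentioned on the slide is not specified in the source, so it is left out.

```python
def ratio_yield(production, harvested_acres):
    """Return yield as total production divided by total harvested acres.

    Hypothetical helper: the slide's "minor adjustment" is not described
    in the source and is therefore not applied here.
    """
    total_acres = sum(harvested_acres)
    if total_acres <= 0:
        raise ValueError("harvested acreage must be positive")
    return sum(production) / total_acres


# Made-up reports for one county: production in bushels, area in acres.
county_production = [4200.0, 6100.0, 1500.0]
county_acres = [100.0, 150.0, 40.0]
print(ratio_yield(county_production, county_acres))
```

Because the denominator is the county's own harvested acreage, a small change in reported acres moves the estimate directly, which is one way to read the slide's remark about inconsistent yields.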
8. Model-Based County Estimation Methods
- Based on linear or non-linear models relating true yields to survey reported values
- Generally fit using an iterative algorithm
- Convergence not always guaranteed
- Estimates can be adjusted for consistency with published state figures
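The slides do not say how estimates are adjusted for consistency with state figures. One simple possibility, shown here purely as an assumption, is proportional scaling so that the acreage-weighted mean of the county yields equals the published state yield. The function name and inputs are hypothetical.

```python
def benchmark_to_state(county_yields, county_acres, state_yield):
    """Scale county yields so their acreage-weighted mean equals state_yield.

    Assumed adjustment rule (proportional scaling); the source does not
    describe the rule actually used.
    """
    total_acres = sum(county_acres)
    implied_state = sum(y * a for y, a in zip(county_yields, county_acres)) / total_acres
    factor = state_yield / implied_state
    return [y * factor for y in county_yields]


# Made-up example: two counties with equal harvested acreage.
adjusted = benchmark_to_state([40.0, 60.0], [100.0, 100.0], 55.0)
print(adjusted)
```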
9. Stasny-Goel (SG) Method
- Developed at Ohio State University under cooperative agreement with NASS
- Assumes mixed effects model with farm size group as fixed effect and county as random effect
- Random effect assumed multivariate normal with covariance matrix reflecting spatial correlation among neighboring counties
  - $\operatorname{corr}(t_i, t_j) = \rho$ if county $i$ borders county $j$, $0$ otherwise
- EM algorithm used to fit model
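The neighbor correlation structure on this slide can be sketched as a matrix with ones on the diagonal, a common correlation value for bordering counties, and zeros elsewhere. The code below is a minimal illustration in plain Python. The function names, the county indexing and the parameter name `rho` are assumptions, and the positive-definiteness check is included only because a matrix of this form is not a valid correlation matrix for every value of the parameter.

```python
import math


def neighbor_correlation(n_counties, borders, rho):
    """Build a matrix with 1 on the diagonal, rho for bordering pairs, else 0."""
    corr = [[1.0 if i == j else 0.0 for j in range(n_counties)]
            for i in range(n_counties)]
    for i, j in borders:
        corr[i][j] = rho
        corr[j][i] = rho
    return corr


def is_positive_definite(matrix):
    """Attempt a Cholesky factorization; it succeeds only for positive definite input."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][i] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Hypothetical layout: three counties in a row (0 borders 1, 1 borders 2).
borders = [(0, 1), (1, 2)]
print(is_positive_definite(neighbor_correlation(3, borders, 0.5)))
print(is_positive_definite(neighbor_correlation(3, borders, 0.9)))
```

For the three-county chain used here the matrix stops being positive definite once the correlation exceeds about 0.707, so any fitting routine built on this structure would need to keep the parameter in an admissible range.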
10. Stasny-Goel Method (cont.)
- Previous year county yields used to derive initial estimates of county and size group effects
- Processing continues until at least one of the following two conditions is satisfied
  - relative group and log-likelihood distances fall below preset limits
  - maximum allowable number of iterations reached
- County yield estimates computed as weighted averages of individual farm level estimates (weights derived from Census of Agriculture data)
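The two stopping conditions on this slide can be sketched as a generic iteration loop. This is not the SG program itself: `step` and `log_likelihood` stand in for one EM update and its log-likelihood, and the relative distances are one plausible reading of the slide's wording. All names and tolerances are assumptions.

```python
def run_until_converged(step, log_likelihood, initial, param_tol, loglik_tol, max_iter):
    """Iterate `step` until both relative changes are small or max_iter is hit.

    Returns (parameters, iterations_used, converged_flag).
    """
    params = list(initial)
    loglik = log_likelihood(params)
    for iteration in range(1, max_iter + 1):
        new_params = step(params)
        new_loglik = log_likelihood(new_params)
        # Relative distance between successive parameter vectors.
        change = sum((a - b) ** 2 for a, b in zip(new_params, params)) ** 0.5
        scale = max(sum(b * b for b in params) ** 0.5, 1e-12)
        # Relative change in the log-likelihood.
        ll_change = abs(new_loglik - loglik) / max(abs(loglik), 1e-12)
        params, loglik = new_params, new_loglik
        if change / scale < param_tol and ll_change < loglik_tol:
            return params, iteration, True
    # Iteration limit reached without meeting the tolerances.
    return params, max_iter, False


# Toy stand-ins: the update contracts toward 4.0 and the "log-likelihood" peaks there.
toy_step = lambda p: [0.5 * (p[0] + 4.0)]
toy_loglik = lambda p: -(p[0] - 4.0) ** 2 - 1.0
print(run_until_converged(toy_step, toy_loglik, [0.0], 1e-8, 1e-8, 200))
```

Returning the converged flag alongside the estimate makes it easy to tabulate how often a run stops only because the iteration limit was reached.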
11. Griffith (G) Method
- Developed by Dr. Dan Griffith at Syracuse University under cooperative agreement with NASS
- Predicts yield values using published number of farms producing crop of interest
- Assumes autoregressive model
- Employs Box-Cox and Box-Tidwell transformations
- Spatial imputation routine can compute estimates for counties with missing survey data
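For readers unfamiliar with the Box-Cox transformation named on this slide, a minimal sketch of the standard one-parameter form and its inverse follows. How the Griffith method chooses the power parameter is not described in the source, so `lam` is simply passed in, and the sample yields are made up.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox power transformation to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(transformed, lam):
    """Map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(t) for t in transformed]
    return [(lam * t + 1.0) ** (1.0 / lam) for t in transformed]


# Made-up yields; lam = 0.5 is an arbitrary illustrative choice.
yields = [30.0, 45.5, 62.0]
transformed = box_cox(yields, 0.5)
print(transformed)
print(inverse_box_cox(transformed, 0.5))
```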
12. Previous Research on Model-Based Methods
- Stasny, Goel and Rumsey (1991): early version of SG method tested on Kansas wheat production data
- Stasny et al. (1995): improved version of SG tested on Ohio corn yield data
- Crouse (2000): SG evaluated for Michigan corn and barley yield
- Griffith (2000): Griffith method tested on Michigan corn yield data
- Bellow (2004): SG and Griffith methods compared for North Dakota oats and barley yield (presented at FCSM Research Conference)
13. Ten-State Research Study
- Compare performance of Stasny-Goel, Griffith and ratio methods for various crops in ten geographically dispersed states
  - NY, OH, MI, TN, MS, FL, ND, OK, CO, WA
- Criteria for comparison: bias, variance, MSE, outlier properties, convergence percentage
14. States In Study Area
15. Post-Stratification Size Groups
- NASS statewide survey data post-stratified by county and farm size based on Census of Agriculture (COA) data (two or three size groups defined)
- Percentages of Census farm acres by size group used as weights for SG algorithm
- Equal total land in farms criterion used to form groups
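One way to read the "equal total land in farms" criterion is to sort farms by size and cut the list wherever cumulative acreage reaches the next equal share of the total. The sketch below follows that reading and then computes the share of acres in each group, which is the kind of weight the slide mentions. The function names, the tie handling and the sample acreages are all assumptions, not the NASS procedure.

```python
def size_group_cutoffs(farm_acres, n_groups):
    """Return upper acreage cutoffs so each group holds roughly equal total land."""
    ordered = sorted(farm_acres)
    total = sum(ordered)
    cutoffs = []
    running = 0.0
    for acres in ordered:
        running += acres
        # Close a group once cumulative land reaches the next equal share.
        next_share = total * (len(cutoffs) + 1) / n_groups
        if len(cutoffs) < n_groups - 1 and running >= next_share:
            cutoffs.append(acres)
    return cutoffs


def assign_group(acres, cutoffs):
    """Return the index of the first group whose cutoff is at least `acres`."""
    for group, cutoff in enumerate(cutoffs):
        if acres <= cutoff:
            return group
    return len(cutoffs)


def acre_share_weights(farm_acres, cutoffs):
    """Return each group's share of total farm acres."""
    totals = [0.0] * (len(cutoffs) + 1)
    for acres in farm_acres:
        totals[assign_group(acres, cutoffs)] += acres
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Made-up farm sizes in acres, split into two groups.
farms = [10, 10, 20, 40, 80]
cuts = size_group_cutoffs(farms, 2)
print(cuts, acre_share_weights(farms, cuts))
```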
16. Data Sources For Research Study
- 2002-03 Quarterly Agricultural Survey
- 2001-03 County Estimates Survey
- 2001-02 official crop yield estimates (previous year data)
- 2002 Census of Agriculture (number of farms, land in farms)
17. Simulation Procedure
- Multiple regression performed on survey reported yield vs. official county yields, weighted average neighbor yields, size group membership variables
- Artificial population of 10,000 simulated survey data sets used to compute true population parameter values
- 250 sample data sets selected at random from population
18. Simulation Procedure (cont.)
- Moran's I computed to test whether simulated data sets reflect spatial correlation of real survey data
- SG, G and R methods applied to each of the 250 sampled data sets
- Average simulated parameter values compared with corresponding population values for each estimation method
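Moran's I, used on this slide as a check on spatial correlation, can be computed directly from county values and a neighbor list. The sketch below uses binary weights (1 for bordering counties, 0 otherwise), which is an assumption since the slides do not state the weighting scheme. The county layout and values are made up.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights.

    `neighbors` maps each county index to the indices of bordering counties
    (assumed symmetric); every listed pair gets weight 1.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0         # sum over i, j of w_ij * dev_i * dev_j
    weight_total = 0.0  # sum of all weights
    for i, adjacent in neighbors.items():
        for j in adjacent:
            cross += dev[i] * dev[j]
            weight_total += 1.0
    sum_sq = sum(d * d for d in dev)
    return (n / weight_total) * (cross / sum_sq)


# Hypothetical chain of four counties with steadily increasing yields.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1.0, 2.0, 3.0, 4.0], chain))
```

Values well above zero indicate that neighboring counties tend to have similar yields, which is the pattern the simulated data sets are meant to reproduce.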
19. Measures of Estimator Performance
- Absolute Bias: average absolute difference between simulated yield estimates and true (population) yield
- Variance: sample variance of simulated yield estimates
- Mean Square Error: average squared deviation between simulated estimates and true yield (SG program also computes analytic MSE)
- Lower (Upper) Tail Proximity: average absolute difference between 5th (95th) percentile of simulated yield estimates and true yield
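The measures defined on this slide can be sketched for a single county as follows. The function name is hypothetical, and the percentile rule is whatever Python's `statistics.quantiles` uses by default, which may differ from the rule used in the study. Tail proximity is computed here as the absolute gap between the percentile and the true yield for one county, leaving any further averaging (for example across counties) to the caller.

```python
import statistics


def performance_measures(estimates, true_yield):
    """Summarize simulated yield estimates for one county against the true yield."""
    n = len(estimates)
    abs_bias = sum(abs(e - true_yield) for e in estimates) / n
    variance = statistics.variance(estimates)  # sample variance (n - 1 divisor)
    mse = sum((e - true_yield) ** 2 for e in estimates) / n
    # 19 cut points at 5% steps: first is the 5th percentile, last the 95th.
    cuts = statistics.quantiles(estimates, n=20)
    return {
        "absolute_bias": abs_bias,
        "variance": variance,
        "mse": mse,
        "lower_tail_proximity": abs(cuts[0] - true_yield),
        "upper_tail_proximity": abs(cuts[-1] - true_yield),
    }


# Made-up estimates from a handful of simulation runs.
print(performance_measures([48.0, 50.0, 52.0, 54.0], 50.0))
```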
20. Pairwise Estimator Comparison for Absolute Bias
Percent of counties favoring each method (higher percentage indicates the better method)
                     Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
Crop                 SG        R              SG        G
Barley               90        10             82        18
Corn                 92        8              66        34
Cotton (upland)      86        14             58        42
Dry Beans            93        7              73        27
Oats                 88        12             63        37
Rye                  83        17             47        53
Sorghum              84        16             59        41
Soybeans             88        12             62        38
Sunflower            94        6              69        31
Tobacco (burley)     98        2              56        44
Wheat (spring)       83        17             78        22
Wheat (winter)       83        17             66        34
21. Pairwise Estimator Comparison for Variance
Percent of counties favoring each method
                     Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
Crop                 SG        R              SG        G
Barley               100       0              51        49
Corn                 99.9      0.1            33        67
Cotton (upland)      100       0              13        87
Dry Beans            100       0              20        80
Oats                 100       0              36        64
Rye                  97        3              77        23
Sorghum              98        2              25        75
Soybeans             100       0              40        60
Sunflower            100       0              56        44
Tobacco (burley)     100       0              49        51
Wheat (spring)       100       0              62        38
Wheat (winter)       100       0              43        57
22. Pairwise Estimator Comparison for MSE
Percent of counties favoring each method
                     Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
Crop                 SG        R              SG        G
Barley               92        8              77        23
Corn                 94        6              62.5      37.5
Cotton (upland)      89        11             55        45
Dry Beans            96        4              75        25
Oats                 90        10             61        39
Rye                  87        13             40        60
Sorghum              84        16             51        49
Soybeans             89        11             57        43
Sunflower            95.5      4.5            65        35
Tobacco (burley)     100       0              53        47
Wheat (spring)       85        15             80        20
Wheat (winter)       86        14             64        36
23. Pairwise Estimator Comparison for Lower Tail Proximity (LTP)
Percent of counties favoring each method
                     Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
Crop                 SG        R              SG        G
Barley               92        8              55        45
Corn                 93        7              41        59
Cotton (upland)      84        16             41        59
Dry Beans            96        4              64        36
Oats                 94        6              52        48
Rye                  90        10             40        60
Sorghum              97        3              59        41
Soybeans             85        15             38        62
Sunflower            96        4              56        44
Tobacco (burley)     100       0              31        69
Wheat (spring)       99        1              69        31
Wheat (winter)       89        11             50        50
24. Pairwise Estimator Comparison for Upper Tail Proximity (UTP)
Percent of counties favoring each method
                     Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
Crop                 SG        R              SG        G
Barley               93        7              61        39
Corn                 98        2              56        44
Cotton (upland)      97        3              53        47
Dry Beans            98        2              49        51
Oats                 92        8              43        57
Rye                  97        3              33        67
Sorghum              84        16             32        68
Soybeans             99        1              53        47
Sunflower            91        9              43        57
Tobacco (burley)     98        2              69        31
Wheat (spring)       85        15             47        53
Wheat (winter)       90        10             53        47
25. Additional Bias Evaluation
- Wilcoxon Rank Sum Test: compare median absolute error (over simulation runs) of SG vs. R, SG vs. G for each county
- Wilcoxon Signed Rank Test: assess whether median error of SG, G, R is negative, positive or zero (two one-sided tests performed for each county)
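The signed rank classification described on this slide (two one-sided tests per county, leading to a verdict of negative, positive or zero bias) can be sketched as below. This is a simplified illustration: it uses the large-sample normal approximation without tie or continuity corrections, and the 0.05 significance level is an assumption, since the slides do not state the level or the exact test variant used.

```python
import math
from statistics import NormalDist


def average_ranks(magnitudes):
    """Rank values from smallest to largest, giving tied values their mean rank."""
    order = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k])
    ranks = [0.0] * len(magnitudes)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and magnitudes[order[end + 1]] == magnitudes[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of 1-based ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def classify_bias(errors, alpha=0.05):
    """Classify median error as 'negative', 'positive' or 'zero' via two one-sided tests."""
    nonzero = [e for e in errors if e != 0]
    n = len(nonzero)
    if n == 0:
        return "zero"
    ranks = average_ranks([abs(e) for e in nonzero])
    w_plus = sum(r for r, e in zip(ranks, nonzero) if e > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p_negative = NormalDist().cdf(z)  # alternative: median error < 0
    p_positive = 1.0 - p_negative     # alternative: median error > 0
    if p_negative < alpha:
        return "negative"
    if p_positive < alpha:
        return "positive"
    return "zero"


# Made-up errors (estimate minus true yield) that are all underestimates.
print(classify_bias([-float(k) for k in range(1, 21)]))
```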
26. Results of Rank Sum Tests on Absolute Bias
Percent of counties favoring each method
                     Stasny-Goel vs. Ratio       Stasny-Goel vs. Griffith
Crop                 SG      R       Neither     SG      G       Neither
Barley               82      9       10          74      13      13
Corn                 85      7       8           62      27      11
Cotton (upland)      78      13      9           54      33      13
Dry Beans            84      7       9           67      22      11
Oats                 76      11      13          61      30      10
Rye                  63      10      27          40      40      20
Sorghum              65      13      22          56      35      10
Soybeans             80      12      9           60      32      7
Sunflower            85      5       10          66      25      8
Tobacco (burley)     95      2       3           45      38      16
Wheat (spring)       78      15      7           75      17      9
Wheat (winter)       72      16      12          61      27      12
All                  79      11      10          62      27      11
27. Summary of Signed Rank Test Results (All Crops Combined)
                 Bias < 0                Bias > 0                Bias = 0
Method           No. Counties    %       No. Counties    %       No. Counties    %
Stasny-Goel      1607            59      887             32      243             9
Griffith         1456            54      1174            43      82              3
Ratio            292             11      245             9       2200            80
28. Percent of Counties With Average Underestimate Less Than 10% of True Yield
(highest percentage indicates the best method)
Crop                 Stasny-Goel    Griffith    Ratio
Barley               81             62          46
Corn                 83             71          42
Cotton (upland)      79             78          64.5
Dry Beans            95             74          62.5
Oats                 70.5           54          21
Rye                  41             52          13
Sorghum              52             41          11
Soybeans             84             76          62
Sunflower            80             63.5        50
Tobacco (burley)     93             98          27
Wheat (spring)       94             55          54
Wheat (winter)       86             75          51.5
29. Convergence Issues
- SG algorithm not guaranteed to converge within fixed limit on number of iterations
- Non-convergence associated with numerical instability conditions
- Yield estimates produced for non-convergent runs may be suspect
- Convergence generally most reliable for highly prevalent crops, least reliable for rare crops
30. Algorithm Convergence Percentage By Crop (Limit of 5000 Iterations)
Crop                 Stasny-Goel    Griffith
Barley               93             68
Corn                 87             77
Cotton (upland)      81             89
Dry Beans            89             75
Oats                 80             71
Rye                  74             83
Sorghum              85             66
Soybeans             93             73
Sunflower            90.5           80
Tobacco (burley)     41             52
Wheat (spring)       63             52.5
Wheat (winter)       88             65
31. Two Approaches to Dealing With SG Non-Convergence
- SG(1): use estimate generated at final allowable iteration ($N_0$)
- SG(2): keep track of which iteration ($i$) maximized the log-likelihood
  - if $i < N_0$, rerun algorithm to iteration $i$ and use that estimate
  - if $i = N_0$, resume processing at iteration ($N_0 + 1$) and continue until either
    - convergence occurs (use that estimate) OR
    - log-likelihood decreases from one iteration to next (use estimate at next-to-last iteration)
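The SG(2) rule can be sketched as a small driver around a generic iterative fit. This is an illustration, not the SG program: `step`, `log_likelihood` and `has_converged` are hypothetical stand-ins, the estimate from the best iteration is stored rather than obtained by rerunning (equivalent for a deterministic algorithm), and `extra_limit` is an added safeguard that does not appear on the slide.

```python
def sg2_fallback(step, log_likelihood, has_converged, initial, n0, extra_limit=10000):
    """SG(2)-style rule for a run that has not converged within n0 iterations.

    Returns (estimate, iteration_at_which_it_was_produced).
    """
    params = initial
    best_params, best_loglik, best_iter = None, float("-inf"), 0
    for iteration in range(1, n0 + 1):
        params = step(params)
        loglik = log_likelihood(params)
        if loglik > best_loglik:
            best_params, best_loglik, best_iter = params, loglik, iteration
    if best_iter < n0:
        # Log-likelihood peaked before the limit: use that iteration's estimate.
        return best_params, best_iter
    # Peak is at iteration n0: keep iterating from n0 + 1 onward.
    previous, previous_loglik = params, best_loglik
    for iteration in range(n0 + 1, n0 + extra_limit + 1):
        current = step(previous)
        current_loglik = log_likelihood(current)
        if current_loglik < previous_loglik:
            return previous, iteration - 1  # next-to-last iterate
        if has_converged(previous, current):
            return current, iteration
        previous, previous_loglik = current, current_loglik
    return previous, n0 + extra_limit


# Toy stand-ins: the iterate walks upward while the "log-likelihood" peaks at 3.
walk = lambda p: p + 1
peak_at_three = lambda p: -(p - 3) ** 2
close = lambda a, b: abs(a - b) < 1e-9
print(sg2_fallback(walk, peak_at_three, close, 0, 5))
print(sg2_fallback(walk, peak_at_three, close, 0, 2))
```

SG(1) corresponds to simply returning the iterate held after the first loop, so the two approaches can share the same fitting code.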
32. Non-Convergence Study
- Does SG(1) or SG(2) outperform ratio estimator in cases where SG failed to converge?
- Six cases with high non-convergence percentage selected for comparison of SG(1), SG(2) and R
  - 2002 CO barley (37 simulation runs)
  - 2002 MS soybeans (105)
  - 2002 NY winter wheat (39)
  - 2002 ND dry beans (38)
  - 2002 OH oats (50)
  - 2003 OK rye (59)
33. Combined Pairwise Estimator Comparison for Non-Convergence Test Cases
Percent of counties favoring each method
                 SG(1) vs. Ratio     SG(2) vs. Ratio     SG(1) vs. SG(2)
Measure          SG(1)    R          SG(2)    R          SG(1)    SG(2)
Absolute Bias    78       22         80       20         23       77
Variance         95       5          99       1          0        100
MSE              81       19         83       17         15       85
LTP              74       26         88       12         13       87
UTP              84       16         90       10         15       85
34. Summary
- SG yield estimation method outperforms R in all efficiency categories and G in most categories (G outperforms R)
- Convergence problems can be alleviated using enhanced SG approach
- SG method recommended for integration into NASS County Estimates System