Title: Applying Geostatistical Methods to Lattice Data: An Initial Examination of U.S. Presidential Elections in Iowa
1Applying Geostatistical Methods to Lattice Data
An Initial Examination of U.S. Presidential
Elections in Iowa
- A.C. Thomas
- Statistics 225
- December 14, 2004
2Sources/Guides
- Main source: Hierarchical Models, chapters 2 and 3 (geostatistical and spatial data)
- Data sources: http://www.sos.state.ia.us/elections/results/ (1996/2000)
- http://www.cnn.com/ (2004)
- Special thanks: Brad Carlin (UMN), Andy Gelman (Columbia), Paul Edlefsen (Harvard)
- geoR: P.J. Ribeiro and P.J. Diggle
3Motivation
- In this course, we have learned about three different methods of examining spatial data (depending on relevant conditions), with some interchangeability among them
- Often, we may not have the tools to examine data sets using one method (e.g. the shortcomings of R in manipulating lattice data)
- In this case, we will compare and contrast the effectiveness of a geostatistical method used on lattice data with that of a lattice method, through self cross-validation
4Interrelationship
- Geostatistics and kriging: use variograms and distance relationships to predict quantities across distances
- Lattices: use neighbour relationships to predict quantities across distances
- Direct similarities: some weighting schemes across distances directly resemble covariograms
5Why election data?
- Why not?
- Spatial organization is well understood and constant in time (county borders have not changed across data sets) and built into R (maps library)
- While specific challengers change over time, parties are relatively constant, as are other control variables
- Ramifications are germane to the functioning of society (and the insatiable appetite of news junkies and policy wonks)
6Questions
- For this data set, does a geostatistical approximation produce a result comparable in error to a lattice model?
- If so, can we use fitted information from one election to predict the complete results of the next one? (And how much are we off?)
7Chosen model: Iowa
8Why Iowa?
- 99 counties of roughly equal area, removing a possible nuisance (and they are rectilinear, so easier to draw)
- Swing state, with a rough vote balance over time
- Not too big, not too small in either population or size
9Simplification: No third parties
- For now, considering only the votes for Democratic and Republican candidates in presidential elections from 1996-2004
- Not so bad in 2000/2004, when the independent vote was about 3% of the total
- Worse in 1996 (Perot's successful campaign drew a lot), up to 10% of total votes
10Iowa in 1996 (Dole, Clinton)
11Iowa in 2000 (Bush, Gore)
12Iowa in 2004 (Bush, Kerry)
13Initial impressions
- There seems to be a tendency to vote more Republican the further west we look
- (Observation, courtesy Matt Anthony: as we go east, we hit Illinois, a Democratic core.)
- What is the population distribution by county over time?
14Iowa's total voters, 1996
15Iowa's total voters, 2000
16Iowa's total voters, 2004
17Quick-and-dirty non-spatial analysis
- Question: how does population size correlate with the Democratic vote?
- Correlation between blue vote and total vote:
- 1996: r = 0.18
- 2000: r = 0.30
- 2004: r = 0.29
- So population would appear to be an important covariate.
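A minimal sketch of how such a correlation could be computed, in Python rather than the R used for the talk. The county figures are hypothetical toy values, not the Iowa data, and `pearson_r` is a name introduced here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy counties (NOT the Iowa data): total two-party votes
# and the Democratic share of that vote in each county.
total_votes = [4000, 9000, 15000, 52000, 180000]
dem_share = [0.44, 0.47, 0.46, 0.52, 0.55]

r = pearson_r(total_votes, dem_share)
print(f"r = {r:.2f}")
```

With the real data, `total_votes` and `dem_share` would each hold 99 entries, one per county.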
18Geostatistical analysis
- Locations: centroids of each county (obtained through the centroid.polygon function in the maps library of R)
- Data: Republican percentage of vote (arbitrarily chosen, not necessarily personal political affiliation)
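For readers who want to see what a polygon centroid calculation involves, here is a small Python sketch using the standard shoelace formula. It is a stand-in for illustration, not the R routine named above, and it assumes the county boundary is a simple polygon whose coordinates can be treated as planar.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices.

    Uses the shoelace formula; vertices may be listed clockwise or
    counter-clockwise, and the first vertex need not be repeated at the end.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon (zero area)")
    # Centroid = sum / (6 * area), and 6 * area = 3 * twice_area
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)

# Hypothetical rectangular "county" (most Iowa counties are close to this shape)
county_outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(county_outline)
print(centroid)
```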
19Initial data plots: Unaltered
20Initial fitting
- Semivariogram appears to increase without bound, suggesting nonstationarity
- Plan: use Universal Kriging with this semivariogram
- Problem: trend appears to be a power law, with power greater than 2 (impossible to fit with conventional definitions)
- Possible solutions: a) remove trend from data; b) don't care.
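As an illustration of what "increases without bound" looks like, the sketch below computes a classical (Matheron) empirical semivariogram in Python on a hypothetical grid with a built-in west-east trend. The grid, the trend and the bin width are assumptions made for the example; the talk itself used geoR on the county centroids.

```python
import math

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Classical estimator: half the mean squared difference per distance bin.

    Returns a list of (bin midpoint, semivariance, pair count) for the
    non-empty bins.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            k = int(d // bin_width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * bin_width, sums[k] / (2.0 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]

# Hypothetical 4 x 4 grid of "centroids" with a linear trend in x
coords = [(float(x), float(y)) for x in range(4) for y in range(4)]
values = [0.40 + 0.02 * x for x, y in coords]

bins = empirical_semivariogram(coords, values, bin_width=1.0, n_bins=5)
for midpoint, gamma, pairs in bins:
    print(f"h ~ {midpoint:.1f}: gamma = {gamma:.6f} ({pairs} pairs)")
```

Because the toy data are pure trend, the estimated semivariance keeps growing with distance instead of levelling off at a sill, which is the behaviour described above.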
21Plan A: Remove trend from data
- What it does: lets us remove known spatial dependence and look at other trends
- Initial look: major discrepancies.
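Removing a second-degree spatial trend can be sketched as an ordinary least-squares fit of a quadratic surface followed by subtraction. The Python below is an assumption-laden illustration: the function names and the toy grid are hypothetical, and the fit solves the normal equations with a small hand-written solver rather than any routine from the talk.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def quad_basis(x, y):
    """Second-degree polynomial terms in the two coordinates."""
    return [1.0, x, y, x * x, x * y, y * y]

def fit_quadratic_trend(coords, values):
    """Least-squares coefficients from the normal equations (X'X) b = X'z."""
    rows = [quad_basis(x, y) for x, y in coords]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xtz = [sum(r[i] * z for r, z in zip(rows, values)) for i in range(p)]
    return solve_linear(xtx, xtz)

def detrend(coords, values, beta):
    """Residuals after subtracting the fitted quadratic surface."""
    return [z - sum(b * f for b, f in zip(beta, quad_basis(x, y)))
            for (x, y), z in zip(coords, values)]

# Hypothetical 5 x 5 grid whose values are an exact quadratic surface
coords = [(float(x), float(y)) for x in range(5) for y in range(5)]
values = [0.50 - 0.03 * x + 0.002 * x * x + 0.001 * x * y for x, y in coords]

beta = fit_quadratic_trend(coords, values)
residuals = detrend(coords, values, beta)
print(max(abs(r) for r in residuals))
```

To return the trend after kriging the residuals, add the same fitted surface back to each predicted residual.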
22Plan B: Don't care.
- The goodness of fit only tails off at the end
- Preliminary results show the other option (Plan A) to be extremely inaccurate due to noise levels in the residual data
23Second trend removed, data centered
24Exploratory Kriging
25Meaningful Kriging
- Since we want to test the predictive power of this method, we should test it on our current data through cross-validation
- Key: remove one point, use the semivariogram with the remaining points to interpolate the value at the removed centroid, and repeat for each centroid
- Then, return the trend to the data and compare with the original values
- Use universal kriging with second-degree trend
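The leave-one-out loop can be sketched as follows. To keep the example short this Python version swaps in ordinary kriging with a fixed power variogram, not the universal kriging with second-degree trend used in the talk, and it does not refit the variogram at each step. The grid, the variogram parameters and all function names are hypothetical.

```python
import math

def power_variogram(h, nugget, sigma2, phi):
    """gamma(h) = nugget + sigma2 * h**phi for h > 0, and 0 at h = 0."""
    return 0.0 if h == 0 else nugget + sigma2 * h ** phi

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_krige(coords, values, target, variogram):
    """Ordinary-kriging prediction at `target` from the given points."""
    n = len(coords)
    # System: [[Gamma, 1], [1', 0]] [weights, mu] = [gamma_0, 1]
    a = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve_linear(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values))

def leave_one_out(coords, values, variogram):
    """Predict each point from all the others; returns the predictions."""
    preds = []
    for i in range(len(coords)):
        rest_c = coords[:i] + coords[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        preds.append(ordinary_krige(rest_c, rest_v, coords[i], variogram))
    return preds

def gamma(h):
    # Assumed parameters for the toy example only
    return power_variogram(h, nugget=0.0, sigma2=1e-3, phi=1.5)

# Hypothetical 3 x 3 grid of centroids with a west-east trend
coords = [(float(x), float(y)) for x in range(3) for y in range(3)]
values = [0.45 + 0.03 * x for x, y in coords]

preds = leave_one_out(coords, values, gamma)
errors = [z - p for z, p in zip(values, preds)]
print([round(e, 4) for e in errors])
```

The per-point errors are on the scale of the modelled quantity (here, a vote share); multiplying by each county's vote total would turn them into vote counts.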
261996 Redux: Predicted Values
- In total, Dole receives 9,726 more votes than predicted.
- Absolute error: 43,526
- Total 2-party votes: 1,112,902
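One plausible reading of these two figures is that the net error is the signed sum of county-level vote differences and the absolute error is the sum of their magnitudes. The Python sketch below works under that assumption, which the slides do not spell out; the county numbers are hypothetical.

```python
def vote_errors(actual_rep_votes, predicted_rep_share, two_party_totals):
    """Net and absolute error, in votes, from county-level share predictions.

    A positive net error means the candidate received more votes than
    predicted. Both definitions are assumptions about how the slide
    figures were computed.
    """
    net = 0.0
    absolute = 0.0
    for actual, share, total in zip(actual_rep_votes, predicted_rep_share,
                                    two_party_totals):
        diff = actual - share * total
        net += diff
        absolute += abs(diff)
    return net, absolute

# Hypothetical two-county example (NOT the Iowa data)
actual = [600, 300]
predicted_share = [0.5, 0.4]
totals = [1000, 1000]

net, absolute = vote_errors(actual, predicted_share, totals)
print(net, absolute)
```

Errors in opposite directions cancel in the net figure but not in the absolute one, which is why the absolute error can be several times larger.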
27Fitting variograms between models
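- For all, the power model was the appropriate choice: $\gamma(t) = \tau^2 + \sigma^2 t^{\phi}$, where $t$ is distance
- 1996: $\sigma^2 = 9.24 \times 10^{-4}$, $\phi = 1.98$, $\tau^2 = 0.031$
- 2000: $\sigma^2 = 9.93 \times 10^{-4}$, $\phi = 2.00$, $\tau^2 = 0$
- 2004: $\sigma^2 = 1.16 \times 10^{-3}$, $\phi = 2.00$, $\tau^2 = 0.025$
- All roughly identical, even with different total averages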
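To compare the three fits directly, the fitted power model can be evaluated at a few distances. This Python sketch uses the parameter values from this slide; the distances and their unit are assumptions, since the slides do not state the coordinate system, and the function name is hypothetical.

```python
def power_variogram(h, nugget, sigma2, phi):
    """Power model: nugget + sigma2 * h**phi for h > 0, and 0 at h = 0."""
    if h < 0:
        raise ValueError("distance must be non-negative")
    if h == 0:
        return 0.0
    return nugget + sigma2 * h ** phi

# Fitted parameters as reported on the slide (nugget = tau^2)
fits = {
    1996: {"nugget": 0.031, "sigma2": 9.24e-4, "phi": 1.98},
    2000: {"nugget": 0.0, "sigma2": 9.93e-4, "phi": 2.00},
    2004: {"nugget": 0.025, "sigma2": 1.16e-3, "phi": 2.00},
}

# Assumed distances, in whatever unit the centroids were recorded in
for h in (0.5, 1.0, 2.0, 4.0):
    row = ", ".join(f"{year}: {power_variogram(h, **p):.4f}"
                    for year, p in fits.items())
    print(f"h = {h}: {row}")
```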
282000 Predicted
- Prediction: Bush gets 26,000 more votes
- Absolute error: 181,880
- Total Bush/Gore votes: 1,272,890
292004 Prediction
- Prediction: Bush gets 32,094 more votes
- Absolute difference: 74,458
- Total votes: 1,479,702
30Naïve Neighbour
- For a baseline comparison, take the simplest (stupidest) lattice cross-validation test: ask your neighbour (trivial SAR weights)
- Predicted value at a square is simply the mean of its border-sharing neighbours (data is Republican percentage of vote)
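The naive neighbour rule is short enough to sketch in full. In this Python illustration the four-county layout, its adjacency list and the vote shares are hypothetical; real use would need the border-sharing neighbours of all 99 counties.

```python
def naive_neighbour_predictions(values, neighbours):
    """Predict each county as the plain mean of its border-sharing neighbours.

    `values` maps county -> observed Republican share;
    `neighbours` maps county -> list of adjacent counties.
    """
    preds = {}
    for county, adjacent in neighbours.items():
        if not adjacent:
            raise ValueError(f"{county} has no neighbours")
        preds[county] = sum(values[n] for n in adjacent) / len(adjacent)
    return preds

# Hypothetical 2 x 2 block of counties:
#   A B
#   C D
neighbours = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
rep_share = {"A": 0.40, "B": 0.50, "C": 0.60, "D": 0.55}

preds = naive_neighbour_predictions(rep_share, neighbours)
errors = {c: rep_share[c] - preds[c] for c in rep_share}
print(preds)
```

Each county's own value never enters its prediction, so this is already a leave-one-out test with no fitting step.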
31NN 1996
- Dole: 10,819 more predicted
- Total deviation: 40,923
32NN 2000
- Bush gets 28,535 extra in prediction
- Total deviation: 59,670
33NN 2004
- Bush gets 37,175 more
- Total deviation: 76,926
34Cross-validation summary
Year  Geostat error  NN error  Geostat total error  NN total error  Voting pop.
1996          9,726    10,819               43,526          40,923    1,112,902
2000         26,000    28,535               61,485          59,670    1,272,890
2004         32,094    37,175               74,458          76,926    1,479,702
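Dividing the total errors by the voting population puts the two methods on a common scale. The Python sketch below uses only the figures from the summary table; the dictionary layout and key names are introduced here for illustration.

```python
# Figures copied from the cross-validation summary table
summary = {
    1996: {"geostat_total": 43526, "nn_total": 40923, "voters": 1112902},
    2000: {"geostat_total": 61485, "nn_total": 59670, "voters": 1272890},
    2004: {"geostat_total": 74458, "nn_total": 76926, "voters": 1479702},
}

# Total error as a fraction of the two-party vote, per method and year
relative = {
    year: (row["geostat_total"] / row["voters"], row["nn_total"] / row["voters"])
    for year, row in summary.items()
}

for year, (geostat, nn) in relative.items():
    print(f"{year}: geostat {geostat:.2%}, naive neighbour {nn:.2%}")
```

Both methods land at roughly 4 to 5 percent of the two-party vote in every year, with neither consistently ahead.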
35Conclusions
- Data is definitely not stationary, even after removing trends
- Good kriging is about as effective as naïve neighbour, both without covariates
- Prediction with these tools at this simple level is not yet accurate enough
- Each method overpredicts the Republican vote
- Fitting information for each year is very close
36Future Developments and Unanswered Questions
New!
- I've since introduced universal co-kriging with population, past voting behavior and second-degree spatial dependences using the gstat package.
- Needed: data from the last 4 elections, conveniently packaged.
- Other prediction using spatial methods.