1
Applying Geostatistical Methods to Lattice Data
An Initial Examination of U.S. Presidential
Elections in Iowa

A.C. Thomas
Statistics 225
December 14, 2004

2
Sources/Guides

Main source: Hierarchical Models, chapters 2
and 3 (geostatistical and spatial data)
Data sources: http://www.sos.state.ia.us/elections/results/
(1996/2000), http://www.cnn.com/ (2004)
Special thanks: Brad Carlin (UMN), Andy Gelman
(Columbia), Paul Edlefsen (Harvard)
geoR: P.J. Ribeiro and P.J. Diggle

3
Motivation

In this course, we have learned about three
different methods of examining spatial data
(depending on relevant conditions), which are
to some extent interchangeable
Often, we may not have the tools to examine a data
set using one method (e.g. the shortcomings of R
in manipulating lattice data)
In this case, we will compare and contrast the
effectiveness of a geostatistical method used on
lattice data with that of a lattice method,
using cross-validation on the same data

4
Interrelationship

Geostatistics and kriging: use variograms and
distance relationships to predict quantities
across distances
Lattices: use neighbour relationships to
predict quantities across distances
Direct similarities: some weighting schemes
across distances directly resemble covariograms
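
To make the similarity concrete, here is a minimal sketch contrasting a lattice-style weight matrix (equal weight on each bordering neighbour) with a distance-decay weight matrix. The three-site layout, the exponential decay and the `scale` value are illustrative assumptions, not taken from the slides.

```python
import math

def adjacency_weights(neighbours, n):
    """Row-normalised lattice weights: 1/k for each of a site's k neighbours."""
    w = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbours.items():
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def distance_weights(points, scale):
    """Row-normalised weights that decay exponentially with distance."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = math.exp(-math.dist(points[i], points[j]) / scale)
        total = sum(w[i])
        w[i] = [v / total for v in w[i]]
    return w

# Hypothetical layout: three sites on a line at x = 0, 1, 2.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
neighbours = {0: [1], 1: [0, 2], 2: [1]}

w_adj = adjacency_weights(neighbours, 3)
w_dist = distance_weights(points, scale=0.5)
```

With a short decay scale, almost all of the distance-based weight falls on the nearest site, so the two matrices look alike; a longer scale spreads weight to non-neighbours.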

5
Why election data?

Why not?
Spatial organization is well understood and
constant in time (county borders have not changed
across data sets) and built into R (maps library)
While specific challengers change over time,
parties are relatively constant, as are other
control variables
Ramifications are germane to the functioning of
society (and the insatiable appetite of news
junkies and policy wonks)

6
Questions

For this data set, does a geostatistical
approximation produce a result comparable in
error to a lattice model?
If so, can we use fitted information from one
election to predict the complete results of the
next one? (And how much are we off?)

7
Chosen model: Iowa
8
Why Iowa?

99 counties which have roughly equal area,
removing a possible nuisance (and are
rectilinear, so easier to draw)
Swing state, with a rough vote balance over time
Not too big, not too small in either population
or size

9
Simplification: No third parties

For now, considering only the votes for Democratic
and Republican candidates in presidential
elections from 1996-2004
Not so bad in 2000/2004, when the independent vote
was about 3% of total
Worse in 1996 (Perot's successful campaign drew a
lot), up to 10% of total votes

10
Iowa in 1996 (Dole, Clinton)
11
Iowa in 2000 (Bush, Gore)
12
Iowa in 2004 (Bush, Kerry)
13
Initial impressions

There seems to be a tendency to vote more
Republican the further west we look
(Observation, courtesy Matt Anthony: as we go
east, we hit Illinois, a Democratic core.)
What is the population distribution by county
over time?

14
Iowa's total voters, 1996
15
Iowa's total voters, 2000
16
Iowa's total voters, 2004
17
Quick-and-dirty non-spatial analysis

Question: how does population size correlate with
the Democratic vote?
Correlation between blue vote and total vote:
1996: r = 0.18
2000: r = 0.30
2004: r = 0.29
So population would appear to be an important
covariate.
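
A minimal sketch of how such a correlation could be computed, assuming r is the Pearson coefficient between each county's Democratic share and its total vote count. The county figures below are made up for illustration and are not the Iowa data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical counties: Democratic share of the two-party vote, total votes.
dem_share = [0.45, 0.52, 0.48, 0.60]
total_votes = [4000, 15000, 7000, 90000]

r = pearson_r(dem_share, total_votes)
```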

18
Geostatistical analysis

Locations: centroids of each county (obtained
through centroid.polygon function in maps library
of R)
Data: Republican percentage of vote (arbitrarily
chosen, not necessarily personal political
affiliation)
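
For readers without R at hand, the area-weighted centroid of a simple polygon can be sketched with the shoelace formula. This is a generic illustration and is not claimed to reproduce what centroid.polygon does internally; the rectangle is a made-up county outline.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area), and 6 * area = 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)

# Hypothetical rectilinear county, 4 units wide and 2 units tall.
county = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centre = polygon_centroid(county)
```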

19
Initial data plots: Unaltered
20
Initial fitting

Semivariogram appears to increase without bound,
suggesting nonstationarity
Plan: use universal kriging with this
semivariogram
Problem: trend appears to be a power law, with
power greater than 2 (impossible to fit with
conventional definitions, which require a power
below 2)
Possible solutions: a) remove trend from data; b)
don't care.
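
Why a trend makes the semivariogram grow without bound can be seen in a small sketch. It uses the classical (Matheron) estimator, half the mean squared difference between pairs in each distance bin, on made-up data consisting of a pure linear trend. The binning scheme and the data are assumptions for illustration only.

```python
import math

def empirical_semivariogram(points, values, bin_width, n_bins):
    """Return (mean pair distance, semivariance) for each non-empty bin."""
    sq_sums = [0.0] * n_bins
    dist_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            b = int(d // bin_width)
            if b < n_bins:
                sq_sums[b] += (values[i] - values[j]) ** 2
                dist_sums[b] += d
                counts[b] += 1
    return [
        (dist_sums[b] / counts[b], sq_sums[b] / (2 * counts[b]))
        for b in range(n_bins)
        if counts[b] > 0
    ]

# Hypothetical transect: five sites on a line whose value equals x (pure trend).
points = [(float(x), 0.0) for x in range(5)]
values = [x for x, _ in points]

gamma_hat = empirical_semivariogram(points, values, bin_width=1.0, n_bins=5)
```

For this trend-only data the estimate equals half the squared distance, so it keeps rising instead of levelling off at a sill.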

21
Plan A: Remove trend from data

What it does: lets us remove known spatial
dependence, look at other trends
Initial look:
major discrepancies.
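
One way to remove a spatial trend is to fit a polynomial surface in the centroid coordinates by least squares and keep the residuals. The sketch below assumes a second-degree surface and solves the normal equations with a small hand-written solver; the grid and coefficients are made up, and the slides do not say how the trend was actually fitted.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def quad_terms(x, y):
    """Design-matrix row for a second-degree surface in (x, y)."""
    return [1.0, x, y, x * x, x * y, y * y]

def detrend(points, values):
    """Fit the quadratic surface by least squares; return (coefficients, residuals)."""
    rows = [quad_terms(x, y) for x, y in points]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * t for b, t in zip(beta, r)) for r in rows]
    return beta, [v - f for v, f in zip(values, fitted)]

# Hypothetical 4 x 4 grid of centroids with an exactly quadratic surface.
true_beta = [0.5, 0.02, -0.01, 0.003, 0.001, -0.002]
points = [(float(x), float(y)) for x in range(4) for y in range(4)]
values = [sum(b * t for b, t in zip(true_beta, quad_terms(x, y))) for x, y in points]

beta, residuals = detrend(points, values)
```

To "return the trend to the data" later, add the fitted surface back to whatever is predicted for the residuals.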

22
Plan B: Don't care.

The goodness of fit only tails off at the end
Preliminary results show the other option to be
extremely inaccurate due to noise levels in
residual data

23
Second trend removed, data centered
24
Exploratory Kriging
25
Meaningful Kriging

Since we want to test the predictive power of
this method, we should test it on our current
data through cross-validation
Key: remove one point at a time, fit the semivariogram
to the remaining points, and interpolate the value at
the removed centroid
Then, return trend to data and compare with
original values
Use universal kriging with second-degree trend
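
The leave-one-out loop can be sketched as follows. Note the simplifications, all of which are assumptions made for brevity: this uses ordinary kriging rather than universal kriging, a fixed power-model semivariogram rather than one refitted at each step, and a made-up 5 x 5 grid rather than the Iowa centroids.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def power_variogram(h, nugget, scale, power):
    """Semivariance nugget + scale * h**power for h > 0, and 0 at h = 0."""
    return 0.0 if h == 0 else nugget + scale * h ** power

def ordinary_krige(points, values, target, gamma):
    """Ordinary kriging prediction at target; weights are forced to sum to 1."""
    n = len(points)
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint row
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]    # drop the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values))

def leave_one_out(points, values, gamma):
    """Predict each site from all the others."""
    preds = []
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        preds.append(ordinary_krige(rest_points, rest_values, points[i], gamma))
    return preds

def gamma(h):
    # Hypothetical parameters, not the fitted Iowa values.
    return power_variogram(h, nugget=0.0, scale=1.0, power=1.5)

points = [(float(x), float(y)) for x in range(5) for y in range(5)]
values = [x + y for x, y in points]

predictions = leave_one_out(points, values, gamma)
total_abs_error = sum(abs(p - v) for p, v in zip(predictions, values))
```

Summing the signed differences instead of the absolute ones gives the net over- or under-prediction, which is the other quantity reported on the results slides.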

26
1996 Redux: Predicted Values

In total, Dole receives 9,726 more votes than
predicted.
Absolute error: 43,526
Total 2-party votes: 1,112,902

27
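Fitting variograms between models

For all, the power model was an appropriate choice:
$\gamma(t) = \tau^2 + \sigma^2 t^{\varphi}$
1996: $\sigma^2 = 9.24 \times 10^{-4}$, $\varphi = 1.98$, $\tau^2 = 0.031$
2000: $\sigma^2 = 9.93 \times 10^{-4}$, $\varphi = 2.00$, $\tau^2 = 0$
2004: $\sigma^2 = 1.16 \times 10^{-3}$, $\varphi = 2.00$, $\tau^2 = 0.025$
All roughly identical, even with different total
averages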
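
To compare the three fits side by side, the sketch below evaluates the power model at a few distances using the parameters listed above. It assumes the model has the form nugget + scale * distance^power, with the nugget being the t2 value, the scale the s2 value and the power the j value. The distances are arbitrary and their units are not stated on the slide.

```python
def power_model(t, nugget, scale, power):
    """Power-model semivariance at distance t > 0."""
    return nugget + scale * t ** power

# Fitted parameters as listed on the slide: (nugget, scale, power).
fits = {
    1996: (0.031, 9.24e-4, 1.98),
    2000: (0.0, 9.93e-4, 2.00),
    2004: (0.025, 1.16e-3, 2.00),
}

distances = [0.5, 1.0, 2.0]  # arbitrary illustration distances
curves = {
    year: [power_model(t, *params) for t in distances]
    for year, params in fits.items()
}

for year, gammas in curves.items():
    print(year, ["%.5f" % g for g in gammas])
```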

28
2000: Predicted

Prediction: Bush gets 26,000 more votes
Absolute error: 181,880
Total Bush/Gore votes: 1,272,890

29
2004: Prediction

Prediction: Bush gets 32,094 more votes
Absolute difference: 74,458
Total votes: 1,479,702

30
Naïve Neighbour

For a baseline comparison, take the simplest
(stupidest) lattice cross-validation test: ask
your neighbour (trivial SAR weights)
Predicted value at a county is simply the mean of
its border-sharing neighbours (data is Republican
percentage of vote)
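
A minimal sketch of the naive-neighbour test, including the conversion from predicted shares to the vote-count errors reported on the following slides. The four counties, their adjacency and their vote totals are made up, and converting shares to votes by multiplying by each county's two-party total is an assumption about how the totals were obtained.

```python
def neighbour_mean_predictions(share, neighbours):
    """Predict each county's share as the mean share of its bordering counties."""
    return {
        county: sum(share[n] for n in nbrs) / len(nbrs)
        for county, nbrs in neighbours.items()
    }

def vote_errors(share, predicted, totals):
    """Return (net error, total absolute deviation) in votes."""
    diffs = [(predicted[c] - share[c]) * totals[c] for c in share]
    return sum(diffs), sum(abs(d) for d in diffs)

# Hypothetical 2 x 2 block of counties:  A B
#                                        C D
neighbours = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
rep_share = {"A": 0.40, "B": 0.50, "C": 0.60, "D": 0.50}
two_party_votes = {"A": 1000, "B": 2000, "C": 1000, "D": 2000}

predicted = neighbour_mean_predictions(rep_share, neighbours)
net_error, total_deviation = vote_errors(rep_share, predicted, two_party_votes)
```

The net error lets over- and under-predictions cancel across counties, while the total deviation does not, which is why the two columns in the summary can differ so much.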

31
NN 1996

Dole: 10,819 more predicted
Total deviation: 40,923

32
NN 2000

Bush gets 28,535 extra in prediction
Total deviation: 59,670

33
NN 2004

Bush gets 37,175 more
Total deviation: 76,926

34
Cross-validation summary

Year | Geostat error | NN error | Geostat total error | NN total error | Voting pop.
1996 | 9,726 | 10,819 | 43,526 | 40,923 | 1,112,902
2000 | 26,000 | 28,535 | 61,485 | 59,670 | 1,272,890
2004 | 32,094 | 37,175 | 74,458 | 76,926 | 1,479,702
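
To put the total errors on a common scale, they can be expressed as a percentage of the voting population. The sketch below does this arithmetic with the figures from the summary; the dictionary layout and key names are just an illustrative choice.

```python
# Total absolute errors and voting population, copied from the summary.
summary = {
    1996: {"geostat": 43526, "nn": 40923, "voters": 1112902},
    2000: {"geostat": 61485, "nn": 59670, "voters": 1272890},
    2004: {"geostat": 74458, "nn": 76926, "voters": 1479702},
}

def percent_of_voters(row, method):
    """Total absolute error for a method as a percentage of all voters."""
    return 100.0 * row[method] / row["voters"]

pct = {
    year: {m: percent_of_voters(row, m) for m in ("geostat", "nn")}
    for year, row in summary.items()
}

for year in sorted(pct):
    print(year, "geostat %.2f%%" % pct[year]["geostat"], "NN %.2f%%" % pct[year]["nn"])
```

On this scale both methods misallocate roughly 4 to 5 percent of the votes in every year, with neither consistently ahead.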
35
Conclusions

Data is definitely not stationary, even after
removing trends
Good kriging is about as effective as naïve
neighbour, both without covariates
Prediction with these tools at this simple level
is not yet accurate enough
Each method overpredicts the Republican vote
Fitting information for each year is very close

36
Future Developments and Unanswered Questions
New!

I've since introduced universal co-kriging with
population, past voting behavior and
second-degree spatial dependences using the gstat
package.
Needed: data from the last 4 elections,
conveniently packaged. Other prediction using
spatial methods.
