Applying Geostatistical Methods to Lattice Data: An Initial Examination of U.S. Presidential Elections in Iowa

1
Applying Geostatistical Methods to Lattice Data
An Initial Examination of U.S. Presidential
Elections in Iowa
  • A.C. Thomas
  • Statistics 225
  • December 14, 2004

2
Sources/Guides
  • Main source: Hierarchical Models, chapters 2
    and 3 (geostatistical and spatial data)
  • Data sources:
    http://www.sos.state.ia.us/elections/results/
    (1996/2000)
  • http://www.cnn.com/ (2004)
  • Special thanks: Brad Carlin (UMN), Andy Gelman
    (Columbia), Paul Edlefsen (Harvard)
  • geoR: P.J. Ribeiro and P.J. Diggle

3
Motivation
  • In this course, we have learned three different
    methods of examining spatial data (chosen
    according to the relevant conditions), which are
    to some extent interchangeable
  • Often, we may not have the tools to examine a
    data set using one method (e.g. the shortcomings
    of R in manipulating lattice data)
  • In this case, we will compare the effectiveness
    of a geostatistical method applied to lattice
    data against that of a lattice method, through
    cross-validation on the data itself

4
Interrelationship
  • Geostats and kriging: using variograms and
    distance relationships to predict quantities
    across distances
  • Lattices: using neighbour relationships to
    predict quantities across distances
  • Direct similarities: some weighting schemes
    across distances directly resemble covariograms

5
Why election data?
  • Why not?
  • Spatial organization is well understood and
    constant in time (county borders have not changed
    across data sets) and built into R (maps library)
  • While specific challengers change over time,
    parties are relatively constant, as are other
    control variables
  • Ramifications are germane to the functioning of
    society (and the insatiable appetite of news
    junkies and policy wonks)

6
Questions
  • For this data set, does a geostatistical
    approximation produce a result comparable in
    error to a lattice model?
  • If so, can we use fitted information from one
    election to predict the complete results of the
    next one? (And how much are we off?)

7
Chosen model: Iowa
8
Why Iowa?
  • 99 counties which have roughly equal area,
    removing a possible nuisance (and are
    rectilinear, so easier to draw)
  • Swing state, with a rough vote balance over time
  • Not too big, not too small in either population
    or size

9
Simplification: No third parties
  • For now, considering only the votes for
    Democratic and Republican candidates in
    presidential elections from 1996-2004
  • Not so bad in 2000/2004, when the independent
    vote was about 3% of the total
  • Worse in 1996 (Perot's successful campaign drew a
    lot), up to 10% of total votes
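
Dropping third parties means each county is summarised by its Republican share of the two-party vote. A minimal Python sketch of that calculation follows; the talk itself used R, and the function name and the county counts are hypothetical, not taken from the Iowa data.

```python
def two_party_share(rep_votes, dem_votes):
    """Republican share of the two-party vote; third-party votes are ignored."""
    total = rep_votes + dem_votes
    if total == 0:
        raise ValueError("no two-party votes cast")
    return rep_votes / total


# Hypothetical county: 4,200 Republican, 3,800 Democratic, 500 other votes.
# The 500 other votes simply do not enter the calculation.
share = two_party_share(4200, 3800)
print(f"Republican two-party share: {share:.3f}")
```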

10
Iowa in 1996 (Dole, Clinton)
11
Iowa in 2000 (Bush, Gore)
12
Iowa in 2004 (Bush, Kerry)
13
Initial impressions
  • There seems to be a tendency to vote more
    Republican the further west we look
  • (Observation, courtesy Matt Anthony: as we go
    east, we hit Illinois, a Democratic core.)
  • What is the population distribution by county
    over time?

14
Iowa's total voters, 1996
15
Iowa's total voters, 2000
16
Iowa's total voters, 2004
17
Quick-and-dirty non-spatial analysis
  • Question: how does population size correlate
    with the Democratic vote?
  • Correlation between blue vote and total vote:
  • 1996: r = 0.18
  • 2000: r = 0.30
  • 2004: r = 0.29
  • So population would appear to be an important
    covariate.
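
The correlations above are ordinary Pearson coefficients. The sketch below shows the calculation in Python under two assumptions: that "blue vote" means the Democratic share in each county, and that the five counties listed (which are invented for illustration) stand in for the real 99.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counties: total two-party votes and Democratic share.
total_votes = [3000, 8000, 15000, 40000, 120000]
dem_share = [0.42, 0.47, 0.45, 0.52, 0.58]
print(f"r = {pearson_r(total_votes, dem_share):.2f}")
```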

18
Geostatistical analysis
  • Locations: centroids of each county (obtained
    through the centroid.polygon function in the
    maps library of R)
  • Data: Republican percentage of vote (arbitrarily
    chosen, not necessarily reflecting personal
    political affiliation)
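
One common definition of a polygon centroid is the area-weighted one, computed with the shoelace formula. The Python sketch below implements that definition; whether it matches exactly what centroid.polygon computes is an assumption, and the rectangle is a hypothetical stand-in for a county outline.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    The vertex list is treated as closed (last vertex joins the first)
    and may be in clockwise or counter-clockwise order.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_area.
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Hypothetical rectilinear "county".
print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))
```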

19
Initial data plots: Unaltered
20
Initial fitting
  • Semivariogram appears to increase without bound,
    suggesting nonstationarity
  • Plan: use Universal Kriging with this
    semivariogram
  • Problem: Trend appears to be power law, with
    power greater than 2 (impossible to fit with
    conventional definitions)
  • Possible solutions: a) remove trend from data. b)
    don't care.
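
For reference, an empirical semivariogram of the kind discussed here can be computed with the classical binned estimator: half the mean squared difference over all pairs of points whose separation falls in a distance bin. The Python sketch below is illustrative only (the talk used geoR); the function name, bin settings and toy data are assumptions. The toy values follow a pure linear trend, which is enough to make the estimate keep growing with distance.

```python
import math


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Classical binned semivariogram estimate.

    Returns a list of (bin_centre, gamma_hat, pair_count) for non-empty
    bins, where bin b covers distances [b * bin_width, (b + 1) * bin_width).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            b = int(d // bin_width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    result = []
    for b in range(n_bins):
        if counts[b] > 0:
            centre = (b + 0.5) * bin_width
            result.append((centre, sums[b] / (2 * counts[b]), counts[b]))
    return result


# Hypothetical 4 x 4 grid of centroids with a west-to-east linear trend.
grid = [(float(i), float(j)) for i in range(4) for j in range(4)]
shares = [0.60 - 0.03 * x for x, _ in grid]
for centre, gamma, pairs in empirical_semivariogram(grid, shares, 1.0, 5):
    print(f"distance ~{centre:.1f}: gamma = {gamma:.5f} ({pairs} pairs)")
```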

21
Plan A: Remove trend from data
  • What it does: lets us remove known spatial
    dependence, look at other trends
  • Initial look: major discrepancies.
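
One way to remove a spatial trend is to fit a polynomial surface in the coordinates by least squares and keep the residuals. The Python sketch below fits a second-degree surface, matching the second-degree trend mentioned later in the talk; it is not the author's R code, and all function names and the toy grid are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def quadratic_terms(x, y):
    """Design-matrix row for a second-degree trend surface."""
    return [1.0, x, y, x * x, x * y, y * y]


def detrend(coords, values):
    """Fit the quadratic surface by least squares; return (beta, residuals)."""
    rows = [quadratic_terms(x, y) for x, y in coords]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'z.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtz = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    beta = solve_linear(xtx, xtz)
    fitted = [sum(b * t for b, t in zip(beta, r)) for r in rows]
    return beta, [v - f for v, f in zip(values, fitted)]


# Hypothetical 4 x 4 grid of centroids with a smooth trend in the shares.
grid = [(float(i), float(j)) for i in range(4) for j in range(4)]
shares = [0.5 + 0.1 * x - 0.05 * y + 0.01 * x * x for x, y in grid]
beta, residuals = detrend(grid, shares)
print([round(b, 4) for b in beta])
```

Adding the fitted surface back to kriged residuals would return predictions to the original scale.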

22
Plan B: Don't care.
  • The goodness of fit only tails off at the end
  • Preliminary results show the other option to be
    extremely inaccurate due to noise levels in
    residual data

23
Second trend removed, data centered
24
Exploratory Kriging
25
Meaningful Kriging
  • Since we want to test the predictive power of
    this method, we should test it on our current
    data through cross-validation
  • Key: remove one point, use the semivariogram
    with the remaining points to interpolate the
    value at the removed centroid, and repeat for
    each centroid
  • Then, return trend to data and compare with
    original values
  • Use universal kriging with second-degree trend
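
The cross-validation loop described above can be sketched as follows. This is a pure-Python illustration, not the geoR workflow used in the talk. The kriging system is written in its semivariogram form with a pluggable trend: a constant trend gives ordinary kriging, and passing the six second-degree terms would give the universal-kriging form. All function names, the grid, the share values and the variogram parameters are hypothetical, and the exponent is deliberately kept below 2 so that the power model is a valid variogram.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def power_variogram(t, nugget, sigma2, phi):
    """Power-law semivariogram; zero at zero separation by definition."""
    return 0.0 if t == 0 else nugget + sigma2 * t ** phi


def constant_trend(x, y):
    return [1.0]


def krige_at(target, coords, values, variogram, trend=constant_trend):
    """Kriging prediction at `target` from the given points."""
    n = len(coords)
    f = [trend(x, y) for x, y in coords]
    p = len(f[0])
    a = [[0.0] * (n + p) for _ in range(n + p)]
    for i in range(n):
        for j in range(n):
            a[i][j] = variogram(math.dist(coords[i], coords[j]))
        for k in range(p):
            a[i][n + k] = f[i][k]  # trend constraints
            a[n + k][i] = f[i][k]
    rhs = [variogram(math.dist(c, target)) for c in coords] + trend(*target)
    weights = solve_linear(a, rhs)[:n]
    return sum(w * v for w, v in zip(weights, values))


def leave_one_out(coords, values, variogram, trend=constant_trend):
    """Predict each point from all the others."""
    preds = []
    for i in range(len(coords)):
        rest_c = coords[:i] + coords[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        preds.append(krige_at(coords[i], rest_c, rest_v, variogram, trend))
    return preds


# Hypothetical 3 x 3 grid of centroids and Republican shares.
grid = [(float(i), float(j)) for i in range(3) for j in range(3)]
shares = [0.58, 0.55, 0.56, 0.52, 0.50, 0.51, 0.47, 0.46, 0.44]
vg = lambda t: power_variogram(t, 0.0, 1.0, 1.5)
for actual, pred in zip(shares, leave_one_out(grid, shares, vg)):
    print(f"actual {actual:.2f}  predicted {pred:.3f}")
```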

26
1996 Redux: Predicted Values
  • In total, Dole receives 9,726 more votes than
    predicted.
  • Absolute error: 43,526
  • Total 2-party votes: 1,112,902
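
The two error figures on this slide can be read as a net error (signed differences summed over counties) and an absolute error (unsigned differences summed). The Python sketch below shows that bookkeeping under the assumption that predicted shares are converted to votes using each county's two-party total; the slide does not spell out the calculation, and the numbers in the example are invented.

```python
def vote_errors(pred_share, actual_share, two_party_totals):
    """Return (net, absolute) error in votes, summed over counties.

    Net error is actual minus predicted Republican votes, so a positive
    value means the candidate received more votes than predicted.
    """
    net = 0.0
    absolute = 0.0
    for p, a, n in zip(pred_share, actual_share, two_party_totals):
        diff = (a - p) * n
        net += diff
        absolute += abs(diff)
    return net, absolute


# Hypothetical two-county example: errors of +50 and -40 votes.
net, absolute = vote_errors([0.50, 0.40], [0.55, 0.38], [1000, 2000])
print(f"net {net:.0f} votes, absolute {absolute:.0f} votes")
```

Errors of opposite sign cancel in the net figure but not in the absolute one, which is why the absolute error is several times larger.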

27
Fitting variograms between models
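  • For all, the power model was an appropriate
    choice: $\gamma(t) = \tau^2 + \sigma^2 t^{\varphi}$
  • 1996: $\sigma^2 = 9.24 \times 10^{-4}$, $\varphi = 1.98$, $\tau^2 = 0.031$
  • 2000: $\sigma^2 = 9.93 \times 10^{-4}$, $\varphi = 2.00$, $\tau^2 = 0$
  • 2004: $\sigma^2 = 1.16 \times 10^{-3}$, $\varphi = 2.00$, $\tau^2 = 0.025$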
  • All roughly identical, even with different total
    averages
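
Reading the fitted model as a power-law semivariogram with nugget tau^2, scale sigma^2 and exponent phi (an interpretation of the slide's notation), the three fits can be compared by evaluating them at a few separations. The Python sketch below does this; the distance units are not stated in the source, so the separations used are arbitrary illustrative values.

```python
# Fitted parameters as listed on the slide.
FITS = {
    1996: {"tau2": 0.031, "sigma2": 9.24e-4, "phi": 1.98},
    2000: {"tau2": 0.0, "sigma2": 9.93e-4, "phi": 2.00},
    2004: {"tau2": 0.025, "sigma2": 1.16e-3, "phi": 2.00},
}


def power_semivariogram(t, tau2, sigma2, phi):
    """gamma(t) = tau2 + sigma2 * t**phi for t > 0, and 0 at t = 0."""
    if t < 0:
        raise ValueError("separation must be non-negative")
    return 0.0 if t == 0 else tau2 + sigma2 * t ** phi


# Arbitrary separations, in whatever units the centroids were recorded in.
for year, params in FITS.items():
    row = [power_semivariogram(t, **params) for t in (0.5, 1.0, 2.0)]
    print(year, [f"{g:.5f}" for g in row])
```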

28
2000 Predicted
  • Prediction: Bush gets 26,000 more votes
  • Absolute error: 181,880
  • Total Bush/Gore votes: 1,272,890

29
2004 Prediction
  • Prediction: Bush gets 32,094 more votes
  • Absolute difference: 74,458
  • Total votes: 1,479,702

30
Naïve Neighbour
  • For a baseline comparison, take the simplest
    (stupidest) lattice cross-validation test: ask
    your neighbour (trivial SAR weights)
  • Predicted value at a county is simply the mean
    of its border-sharing neighbours (data is
    Republican percentage of vote)
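
The naïve neighbour rule is simple enough to state in a few lines of Python. In the sketch below the four "counties" and their adjacency are hypothetical (a 2 x 2 block where counties sharing an edge are neighbours); the real analysis would use the Iowa county adjacency.

```python
def neighbour_mean_predictions(values, neighbours):
    """Predict each unit as the plain mean of its border-sharing neighbours.

    `values` maps unit -> observed share; `neighbours` maps unit -> list
    of adjacent units. The unit's own value is never used.
    """
    predictions = {}
    for unit, adjacent in neighbours.items():
        if not adjacent:
            raise ValueError(f"{unit} has no neighbours")
        predictions[unit] = sum(values[n] for n in adjacent) / len(adjacent)
    return predictions


# Hypothetical 2 x 2 block of counties:  A B
#                                         C D
shares = {"A": 0.60, "B": 0.50, "C": 0.55, "D": 0.45}
adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
for county, pred in neighbour_mean_predictions(shares, adjacency).items():
    print(f"{county}: actual {shares[county]:.2f}, predicted {pred:.3f}")
```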

31
NN 1996
  • Dole: 10,819 more predicted
  • Total deviation: 40,923

32
NN 2000
  • Bush gets 28,535 extra in prediction
  • Total deviation: 59,670

33
NN 2004
  • Bush gets 37,175 more
  • Total deviation: 76,926

34
Cross-validation summary
Year  Geostat error  NN error  Geostat total error  NN total error  Voting pop.
1996          9,726    10,819               43,526          40,923    1,112,902
2000         26,000    28,535               61,485          59,670    1,272,890
2004         32,094    37,175               74,458          76,926    1,479,702
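
To put the totals on a common scale, each total error can be divided by that year's voting population. The Python sketch below does this arithmetic on the figures from the summary table; both methods come out at roughly 4 to 5 percent of the two-party vote in every year.

```python
# Rows copied from the summary table:
# year -> (geostat net, NN net, geostat total, NN total, voting population)
SUMMARY = {
    1996: (9726, 10819, 43526, 40923, 1112902),
    2000: (26000, 28535, 61485, 59670, 1272890),
    2004: (32094, 37175, 74458, 76926, 1479702),
}


def total_error_rates(row):
    """Return (geostat, NN) total error as a fraction of the voting population."""
    _, _, geo_total, nn_total, population = row
    return geo_total / population, nn_total / population


for year, row in SUMMARY.items():
    geo, nn = total_error_rates(row)
    print(f"{year}: geostat {geo:.2%}, NN {nn:.2%}")
```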
35
Conclusions
  • Data is definitely not stationary, even after
    removing trends
  • Good kriging is about as effective as naïve
    neighbour, both without covariates
  • Prediction with these tools at this simple level
    is not yet accurate enough
  • Each method overpredicts the Republican vote
  • Fitted variogram parameters are very close from
    year to year

36
Future Developments and Unanswered Questions
New!
  • I've since introduced universal co-kriging with
    population, past voting behavior and
    second-degree spatial dependences using the gstat
    package.
  • Needed: data from the last 4 elections,
    conveniently packaged. Other predictions using
    spatial methods.