Spatial Modelling 1 : Incorporating spatial modelling in a random effects structure - PowerPoint PPT Presentation

Slides: 34

Provided by: Mat5158


1
Lecture 22

Spatial Modelling 1: Incorporating spatial modelling in a random effects structure

2
Lecture Contents

Introduction to spatial modelling
Nested random effect levels
House price dataset
Including distance as a fixed effect
Direction effects
Focused clustering (Falkirk dataset)

3
Spatial statistical modelling

Here we require a statistical approach that accounts for the spatial location at which each response is collected, so the model fitted to the data needs to account for spatial effects.
This may be to account for any effects of location in the model, or to predict values of the response at other locations via some form of interpolation that accounts for other predictor variables, the spatial location, or both.

4
Types of spatial data

There are many forms of spatial data, but we can broadly divide them into three types (Cressie 1993):
Geostatistical data: measurements are taken at a fixed number of chosen locations in a geographical area.
Lattice data: measurements are taken on a regular lattice, with a measurement collected at each point of the lattice.
Point process data: each observation is the location at which a response (event) occurs, and its co-ordinates are recorded.

5
Geostatistical data

Such data are collected in various fields, particularly mining and the earth sciences.
A measurement, e.g. percentage coal ash, is taken at each of a number of locations.
Methods such as variograms and kriging are used to analyse such data.
Other application areas include weather maps and agricultural field trials.
Note that such data are not ideally suited to standard random effects modelling.
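As a rough illustration of the first of these methods, the sketch below computes the classical (Matheron) empirical semivariogram: half the average squared difference between pairs of measurements, grouped by the distance separating them. The function name `empirical_semivariogram` and the equal-width distance bins set by `bin_width` are our own illustrative choices, not part of the lecture, and kriging itself is not shown.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, bin_width):
    """Matheron's estimator: gamma(h) = sum of (z_i - z_j)^2 / (2 * N(h)),
    where N(h) is the number of pairs whose separation falls in bin h.
    Returns {bin midpoint distance: semivariance} for non-empty bins."""
    if len(coords) != len(values):
        raise ValueError("coords and values must have the same length")
    sq_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            d = math.dist(coords[i], coords[j])
            b = int(d // bin_width)  # index of the distance bin
            sq_diff_sums[b] += (values[i] - values[j]) ** 2
            pair_counts[b] += 1
    return {
        (b + 0.5) * bin_width: sq_diff_sums[b] / (2 * pair_counts[b])
        for b in sorted(pair_counts)
    }


# Hypothetical measurements at three points along a transect.
print(empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [1.0, 2.0, 4.0], 1.5))
# {0.75: 1.25, 2.25: 4.5}
```

When nearby measurements are more alike than distant ones, the semivariance rises with distance; kriging then uses a model variogram fitted to points like these.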

6
Disease mapping

One particular type of spatial modelling that is often linked with random effects modelling is disease mapping.
Here cases of a disease (in humans or animals) are observed over a chosen region, e.g. a country.
We then wish to infer the relative risk of the disease for a particular individual at a particular location, based on the data collected.
Both of this afternoon's practicals consider disease mapping datasets. The other two types of spatial data, lattice data and point process data, both arise with disease data.

7
Lattice Data

Such data are common in many fields, for example image analysis, where the pixels of an image lie on a regular rectangular lattice.
More importantly for us, we will consider disease count data, where counts of a disease are recorded for contiguous regions on a map.
Although a map is not regular, we can construct a lattice from it by identifying neighbouring regions and linking them.
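A minimal sketch of turning a map into a lattice, assuming each region is stored as a polygon whose shared boundaries use identical vertices (real boundary files often need snapping or a GIS library to guarantee this). Two regions are linked only if they share a boundary segment, so regions meeting at a single corner are not neighbours (sometimes called "rook" adjacency). The five regions below are hypothetical, not the map on the slides.

```python
from itertools import combinations


def polygon_edges(vertices):
    """Undirected boundary segments of a polygon given as a vertex list."""
    n = len(vertices)
    return {frozenset((vertices[k], vertices[(k + 1) % n])) for k in range(n)}


def build_lattice(regions):
    """Link two regions if they share at least one boundary segment."""
    edges = {name: polygon_edges(v) for name, v in regions.items()}
    neighbours = {name: set() for name in regions}
    for a, b in combinations(regions, 2):
        if edges[a] & edges[b]:  # a common segment means a shared border
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


# Hypothetical five-region map: a 2x2 block of unit squares (A-D)
# plus a taller region E along the right-hand side.
regions = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(0, 1), (1, 1), (1, 2), (0, 2)],
    "D": [(1, 1), (2, 1), (2, 2), (1, 2)],
    "E": [(2, 0), (3, 0), (3, 2), (2, 2), (2, 1)],
}
lattice = build_lattice(regions)
for name in sorted(lattice):
    print(name, sorted(lattice[name]))
```

Here A and D touch only at the point (1, 1), so they are not linked.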

8
Example
Here we see a map of 5 regions in the left hand
picture, and on the right it has been converted
to a lattice with connections between regions
that share boundaries.
9
Point process data

These data are also commonly found in disease mapping, although point process data arise in many applications where events are observed at particular locations.
Each item of data consists of the location of an event, the response (type of event) and, potentially, predictor variables for the event.
Note: Rasmus has worked more extensively in this area and will be happy to answer questions here.

10
Disease point process modelling

In disease mapping our data are typically binary, i.e. people are infected with (or die from) a disease or they are not.
The data occur in point process form, but there are two problems with analysing them as a point process:
1. All our responses are 1, as we only observe the infected (or dead) people!
2. Due to confidentiality and the sensitive nature of medical data, the data often cannot be released as individual records.
To counter point 1 we could sample control cases at random from the population; however, point 2 means that we typically total up cases for fixed areas and use a Poisson model on the resulting lattice data.
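To illustrate the step from point data to area counts, the sketch below totals point cases into the cells of a regular grid. Grid cells are a simplifying assumption: in practice cases are usually totalled by administrative units such as census areas, which needs a point-in-polygon test not shown here. The function name and data are hypothetical.

```python
from collections import Counter


def aggregate_to_grid(case_xy, cell_size, nx, ny):
    """Count point cases in each cell of an nx-by-ny grid of square cells
    with lower-left corner at (0, 0). Cases outside the grid are ignored."""
    counts = Counter()
    for x, y in case_xy:
        i, j = int(x // cell_size), int(y // cell_size)
        if 0 <= i < nx and 0 <= j < ny:
            counts[(i, j)] += 1
    # Report every cell, including those with zero cases: zeros are
    # genuine observations for a Poisson count model.
    return {(i, j): counts[(i, j)] for i in range(nx) for j in range(ny)}


# Hypothetical case locations (in km) on a 2x2 grid of 1 km cells.
cases = [(0.5, 0.5), (0.2, 0.9), (1.5, 0.1)]
print(aggregate_to_grid(cases, 1.0, 2, 2))
# {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 0}
```

These counts, together with expected counts for each area, are the kind of lattice data that a Poisson model is then fitted to.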

11
Why might there be spatial effects?

This depends on the response variable and the application area.
Geography may itself be a predictor of our response, or a surrogate for other factors.
Many factors that might influence the response are linked to location, e.g. weather, deprivation, altitude, pollution and wealth.
So if our response is influenced by any of these factors, accounting for spatial effects may improve our model.

12
Nested random effects/ levels of geography

The simplest link to random effects models is to consider nested random effects.
We have considered pupils nested in schools and cows nested in herds.
In some sense the schools and herds are spatial units, in that schools generally take children from their locality and a herd is based on a particular farm. However, we could also fit where the pupils live as a further, more explicitly spatial, classification of the data.
On the next slide we consider a dataset with more levels of geography.

13
UK house prices dataset

An MMath student of mine (David Goodacre) studied a dataset of house prices in the UK. The data, supplied by the Nationwide Building Society, consist of average house prices in areas of the UK over a 12-year period (1992-2003). The data cover 753 towns in the UK, and there are three levels of geography (towns nested in counties nested in regions).
Note that if we had individual house sale information we could have considered point process approaches, but here we consider random effects modelling.

14
A 4-level VC model for the house price dataset

The following model was fitted to the data
where i indexes year, j indexes town, k indexes
county and l indexes region. The response, y is
the log of the average price.
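A sketch of a model consistent with this description and with the parameters reported on the estimates slide (three fixed effects and four variances), assuming t denotes year (how year was coded, e.g. whether it was centred, is not stated) and using the MLwiN convention of f, v, u and e for the level 4, 3, 2 and 1 random effects. Because the random effects only shift the intercept, the fitted curves for different towns, counties and regions are parallel.

```latex
\begin{aligned}
y_{ijkl} &= \beta_0 + \beta_1 t_{ijkl} + \beta_2 t_{ijkl}^2 + f_l + v_{kl} + u_{jkl} + e_{ijkl},\\
f_l &\sim N(0, \sigma^2_f), \quad v_{kl} \sim N(0, \sigma^2_v), \quad
u_{jkl} \sim N(0, \sigma^2_u), \quad e_{ijkl} \sim N(0, \sigma^2_e).
\end{aligned}
```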
This model can be fitted using both frequentist
and likelihood methods in packages that allow
four levels in the model.

15
Links with other topics

It is worth noting that this house price dataset is a repeated measures dataset, as you considered yesterday.
It also contains missing data, as any year in which a postal town had fewer than 50 sales leads to a missing observation.
However, here we assume the data are missing at random (MAR) conditional on the model we are fitting.

16
Estimates for house price dataset

The IGLS estimates for the model are given below.

Parameter                      Estimate (SE)
β0 (intercept)                  4.036 (0.067)
β1 (year)                      -0.020 (0.002)
β2 (year²)                      0.009 (0.0001)
σ_f² (region)                   0.045 (0.021)
σ_v² (county)                   0.016 (0.004)
σ_u² (town)                     0.045 (0.003)
σ_e² (year within town)         0.013 (0.0002)

Here we see that the model consists of parallel curves, with both year and year² very significant.
The variance is greatest between regions and between postal towns.
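To make the comparison of variances concrete, the sketch below divides each variance estimate from the table by their sum, giving the share of the random variation in log price at each level (often called the variance partition coefficient). The level labels follow the MLwiN f/v/u/e convention and are our reading of the table; standard errors are ignored.

```python
# IGLS variance estimates from the table (log average price scale).
variances = {
    "region": 0.045,  # sigma^2_f
    "county": 0.016,  # sigma^2_v
    "town": 0.045,    # sigma^2_u
    "year": 0.013,    # sigma^2_e (year within town)
}


def variance_partition(components):
    """Proportion of the total random-effect variance at each level."""
    total = sum(components.values())
    return {level: v / total for level, v in components.items()}


shares = variance_partition(variances)
for level, share in shares.items():
    print(f"{level}: {share:.1%}")
# region: 37.8%
# county: 13.4%
# town: 37.8%
# year: 10.9%
```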
17
Region Level Effects
Here we see that the south east of the UK and
London are the most expensive whilst Scotland the
North and Wales are the cheapest.
18
County level effects
After accounting for regions, the pattern of county effects is more sporadic. We can, however, pick out two counties, Cheshire in the North West and Surrey in the South East, that are more expensive than their neighbours.
19
Region level predictions

Here we see a graph of region level predictions

20
Further Modelling

In his project Dave looked at random slopes models at the various levels of the model, so that we could pick out whether the increase in prices differed between regions.
He also looked at fitting models of a more spatial nature (see next lecture).

21
Why are spatial effects different?

The main difference with spatial effects is that we have additional information about each (spatial) unit.
For example, if we observe the average house price of a town in Grampian, a town in Surrey and two towns in Berkshire, then we know something about the spatial relationship between these towns.
We might expect prices in the two Berkshire towns to be similar to each other, and more similar to the Surrey town (also in the South East) than to the Grampian town (in Scotland).
In our current models we fit an effect for Berkshire, which captures some of the relationship between its two towns, and a South East effect, which captures the link with the Surrey town.

22
Problems with the nested classification approach

As we have seen, the nested classification approach can capture much of the spatial variability; however, we have to decide on the geographic definitions of the areas.
We generally use readily available definitions, e.g. county and region, but there is no guarantee that these are the best classifications.
We also have the problem of border effects: for example, two towns on either side of a regional border will share neither region nor county effects, yet may have very similar prices.
We will look at another approach here before studying more complex spatial approaches in the next lecture.

23
Including location in fixed effects

There may be a spatial trend: for example, UK house prices generally fall as we move north and west. We could therefore add two (fixed effect) predictors giving the N/S and E/W co-ordinates of each point.
If the unit of observation is an area, e.g. a postal town, we would generally use the co-ordinates of the unit's centroid.
If a linear relationship is not sensible, we could consider polynomial terms in each direction. For example (excluding random effects):
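One possible form of such a model, assuming quadratic terms in each direction, with E_i and N_i the east/west and north/south co-ordinates of the centroid of unit i (symbols introduced here for illustration):

```latex
y_i = \beta_0 + \beta_1 E_i + \beta_2 E_i^2 + \beta_3 N_i + \beta_4 N_i^2 + e_i,
\qquad e_i \sim N(0, \sigma^2_e)
```

A cross-product term E_i N_i could also be added if the trend is not aligned with the co-ordinate axes.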

24
Distance effects

Another possibility for UK house prices is to consider the distance from London, which can be constructed from the co-ordinates of each point. The graph to the left shows the combined region and county effects and suggests that a distance-from-London effect might be appropriate.
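A minimal sketch of constructing such a distance, assuming co-ordinates are eastings and northings in metres on a projected grid such as the British National Grid, where straight-line (Euclidean) distance is adequate at UK scales; with latitude and longitude a great-circle formula would be needed instead. The London reference point below is an approximate, illustrative value, not one taken from the lecture.

```python
import math

# Approximate grid reference for central London (easting, northing in
# metres). Illustrative assumption only.
LONDON_EN = (530_000.0, 180_000.0)


def distance_km(easting, northing, origin=LONDON_EN):
    """Euclidean distance from `origin` on a projected grid, in km."""
    return math.hypot(easting - origin[0], northing - origin[1]) / 1000.0


# A hypothetical town centroid 3 km east and 4 km north of the origin.
print(distance_km(533_000.0, 184_000.0))  # 5.0
```

The resulting distance can then be entered as a fixed effect alongside (or instead of) the raw co-ordinates.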

25
Distance and direction effects

In some scenarios the direction, as well as the distance, from a particular point is important.
This is not the case with house prices, but with pollution data direction can be very important: a dominant wind direction will mean that some directions away from the source experience more pollution than others.
We will next look at a dataset from Falkirk in Scotland that is analysed in Lawson, Browne and Vidal Rodeiro (2003).

26
Focused Clustering

One research area in public health looks at the impact of pollution sources on the health status of communities. The detection of patterns of health events associated with pollution sources is known as focused clustering. The statistical modelling involved usually relates to the point process nature of such data.
Lawson, Browne and Vidal Rodeiro (2003) devote a whole chapter to focused clustering and include some fairly complex models that can be fitted in WinBUGS. Here we will look at some simpler models that can be fitted in MLwiN to a dataset from Falkirk in Scotland.

27
Respiratory cancer in Falkirk

The figure to the right shows the census geographies of 26 regions around a foundry (marked on the figure) in Falkirk, Scotland. It is thought conceivable that the foundry was an air pollution hazard in the early 1970s, prior to the study.
This could have had an impact on the respiratory cancer experience of those living in the areas close to the foundry.

28
Falkirk dataset

The data consist of observed and expected counts of respiratory cancer cases in the period 1978-1983.
We first compare the standardized mortality ratios (SMRs, i.e. observed/expected) with the locations of the centroids of the 26 areas in Falkirk (relative to the foundry) to look for patterns.
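The slides do not say how the expected counts were obtained. A common approach, assumed here, is indirect standardisation: apply reference age-specific rates to each area's population to get the expected count, then divide observed by expected. All names and numbers below are hypothetical.

```python
def expected_count(pop_by_age, ref_rate_by_age):
    """Indirectly standardised expected count for one area: the number of
    cases the area would have if it experienced the reference rates."""
    return sum(pop_by_age[age] * ref_rate_by_age[age] for age in pop_by_age)


def smr(observed, expected):
    """Standardized mortality ratio (observed / expected); a value of 1
    means the area matches the reference population."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Hypothetical area: 2000 people aged 40-59 and 1000 aged 60+.
pop = {"40-59": 2000, "60+": 1000}
ref_rates = {"40-59": 0.001, "60+": 0.004}  # hypothetical study-period rates
e = expected_count(pop, ref_rates)
print(round(e, 6), round(smr(9, e), 6))  # 6.0 1.5
```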

29
Position of the sites

Note in the graphs to the right that the three highest SMRs are close to the source in both the N/S and E/W directions.
We can convert these locations into distance and direction measures.
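A minimal sketch of that conversion, assuming centroid co-ordinates are given as east and north offsets (in metres) from the foundry and that direction is measured as a compass bearing clockwise from north; the lecture does not state which direction convention was used.

```python
import math


def distance_and_bearing(dx, dy):
    """Convert an offset from the source (dx metres east, dy metres north)
    into a distance and a compass bearing in degrees clockwise from north.
    The bearing is arbitrary (0) when the point coincides with the source."""
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return distance, bearing


# Hypothetical centroids relative to the source: north, east, south-west.
for dx, dy in [(0, 1000), (1000, 0), (-1000, -1000)]:
    d, b = distance_and_bearing(dx, dy)
    print(f"{d:7.1f} m  {b:5.1f} deg")
```

Because direction is circular (a bearing of 359° is close to 1°), entering the raw bearing as a linear predictor can be misleading; a common alternative is to use its sine and cosine as two predictors.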

30
Distance and direction

Here there appears to be a negative relationship between distance and SMR, but no obvious pattern with respect to direction.

31
(Extra) Poisson modelling

We have modelled the effects of deprivation,
distance and direction in the following Poisson
model
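A sketch of a standard form for such a model, writing O_i and E_i for the observed and expected counts in area i and dep_i, dist_i and dir_i for deprivation, distance and direction, and assuming the log expected count enters as an offset (the exact specification fitted is not given here). The extra-Poisson parameter φ scales the Poisson variance: φ > 1 indicates overdispersion and φ < 1 underdispersion.

```latex
\begin{aligned}
\mathbb{E}(O_i) &= \mu_i, \qquad \operatorname{var}(O_i) = \phi\,\mu_i,\\
\log \mu_i &= \log E_i + \beta_0 + \beta_1\,\mathrm{dep}_i + \beta_2\,\mathrm{dist}_i + \beta_3\,\mathrm{dir}_i .
\end{aligned}
```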
Note that we have used first-order MQL in MLwiN and allowed extra-Poisson variation. The estimated extra-Poisson parameter shows there is less variation than a Poisson distribution would imply, so we will also try fitting SMR as a Normally distributed response.
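MLwiN estimates the extra-Poisson parameter within its MQL fit. A simpler, different estimator that conveys the same idea, swapped in here purely for illustration, is the Pearson chi-square statistic divided by the residual degrees of freedom: values well below 1 suggest less variation than a Poisson model implies. The fitted means would come from a fitted model; the numbers below are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square / residual degrees of freedom for a count model.
    Roughly 1 under a Poisson model; below 1 suggests underdispersion."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((o - m) ** 2 / m for o, m in zip(observed, fitted))
    return chi2 / df


# Hypothetical observed counts and fitted means from a 1-parameter model.
print(pearson_dispersion([4, 6, 5], [5.0, 5.0, 5.0], 1))
```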

32
Normal response model for SMR

Here none of the predictors has a significant effect, which is probably because the dataset is so small (26 areas).
We do see, however, that the estimated risk reduces as distance from the foundry increases, and also for areas with larger deprivation scores (suggesting higher rates in less deprived areas, though not significantly).
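To illustrate the Normal response approach with a single predictor, the sketch below fits SMR = a + b × distance by ordinary least squares and reports the slope's standard error; a slope that is small relative to its standard error is not significant, which is more likely with only a handful of areas. The lecture's model also includes deprivation and direction, and the data below are hypothetical.

```python
import math


def ols_with_se(x, y):
    """Fit y = a + b*x by ordinary least squares.
    Returns (a, b, se_b), where se_b is the standard error of the slope."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # residual variance / Sxx
    return a, b, se_b


# Hypothetical distances (km) from the source and SMRs for five areas.
dist_km = [0.5, 1.0, 2.0, 3.0, 4.0]
smr_values = [1.6, 1.3, 1.1, 0.9, 1.0]
a, b, se_b = ols_with_se(dist_km, smr_values)
print(f"slope {b:.3f} (SE {se_b:.3f}), t = {b / se_b:.2f}")
```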

33
Information for the practical

In the practical we will return to using nested random effects to account for spatial effects.
Our data are from the European Community and consist of male deaths from malignant melanoma in 9 countries of the EU.
The practical is a (modified) chapter from Browne (2003) and looks at MCMC methods for this dataset. The dataset is also analysed using quasi-likelihood methods in the MLwiN User's Guide, and you are welcome to try these methods as well.