Spatial Modelling 1: Incorporating spatial modelling in a random effects structure

Slides: 34
Provided by: Mat5158

1
Lecture 22
  • Spatial Modelling 1: Incorporating spatial
    modelling in a random effects structure

2
Lecture Contents
  • Introduction to spatial modelling
  • Nested random effect levels
  • House price dataset
  • Including distance as a fixed effect
  • Direction effects
  • Focused clustering (Falkirk dataset)

3
Spatial statistical modelling
  • Here we require a statistical approach that
    accounts for the spatial location at which a
    response is collected. This means that the model
    that is fitted to the data needs to account for
    the spatial effects.
  • The aim may be to account for any effects due to
    location, or to predict values of the response at
    other locations via some form of interpolation
    that uses other predictor variables and/or the
    spatial location.

4
Types of spatial data
  • There are many forms of spatial data but we can
    broadly divide these into three types (Cressie
    1993):
  • Geostatistical data: here measurements are taken
    at a fixed number of chosen locations in a
    geographical area.
  • Lattice data: here the locations form a regular
    lattice and a measurement is collected at each
    point on this lattice.
  • Point process data: here the locations of the
    responses are themselves the data, so the
    co-ordinates of each observation are recorded.

5
Geostatistical data
  • Such data are collected in various fields,
    particularly mining and earth sciences.
  • A measurement, e.g. percentage coal ash, is taken
    at each of a number of locations.
  • Methods such as variograms and spatial Kriging
    are used to analyse such data.
  • Other application areas include weather maps and
    agricultural field trials.
  • Note: such data are not ideally suited to
    standard random effect modelling.

6
Disease mapping
  • One particular type of spatial modelling that is
    often linked with random effect modelling is
    disease mapping.
  • Here cases of a disease (either human or animal)
    are observed over a chosen region e.g. a country.
    We then wish to infer the relative risk of the
    disease for a particular individual at a
    particular location based on the data collected.
  • Both our practicals this afternoon will consider
    disease mapping datasets. The other two types of
    spatial data relate to disease data.

7
Lattice Data
  • Such data is common in many fields, for example
    image analysis where the pixels in an image are
    found on a regular rectangular lattice.
  • More importantly we will consider disease count
    data where counts of a disease are recorded for
    contiguous regions on a map.
  • Although a map is not regular we can construct a
    lattice from a map by identifying neighbouring
    regions and linking neighbouring regions to form
    a lattice.

8
Example
Here we see a map of 5 regions in the left hand
picture, and on the right it has been converted
to a lattice with connections between regions
that share boundaries.
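
A minimal Python sketch of this conversion, assuming a hypothetical set of shared boundaries for five regions labelled A to E (the actual map from the slide is not reproduced in this transcript, so the pairs below are invented for illustration):

```python
# Hypothetical boundary list: each pair is two regions that share a border.
# These pairs are illustrative only; they are not read off the slide's map.
shared_boundaries = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_lattice(pairs):
    """Return a dict mapping each region to the sorted list of its neighbours."""
    neighbours = {}
    for r1, r2 in pairs:
        # A shared boundary is symmetric, so record the link in both directions.
        neighbours.setdefault(r1, set()).add(r2)
        neighbours.setdefault(r2, set()).add(r1)
    return {region: sorted(adj) for region, adj in sorted(neighbours.items())}


lattice = build_lattice(shared_boundaries)
for region, adj in lattice.items():
    print(region, "->", adj)
```

Each region becomes a node and each shared boundary becomes a connection, which is the neighbourhood structure that lattice-based spatial models work from.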
9
Point process data
  • Such data are also commonly found in disease
    mapping, although they may arise in any
    application where cases of an event are seen at
    particular locations.
  • Each item of data consists of the location of an
    event, the response (type of event) and
    potentially predictor variables for the event.
  • Note: Rasmus has worked more extensively in this
    area and will be happy to answer questions here.

10
Disease point process modelling
  • In disease mapping our data are typically binary,
    i.e. people are infected with (or die from) a
    disease or are not.
  • The data occur in point process form but there
    are 2 problems with analysing them as a point
    process:
      1. All our responses are 1 as we only observe
         the infected/dead people!
      2. Due to confidentiality and the sensitive
         nature of medical data, the data often cannot
         be released as individual records.
  • To counter point 1 we could sample control cases
    at random from the population. However, point 2
    means that we typically total up cases for fixed
    areas and use a Poisson model on the lattice data
    that this creates.
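
The aggregation step can be sketched in Python as follows, assuming hypothetical case records that carry only an area label (the labels and records are invented for illustration):

```python
from collections import Counter

# Hypothetical individual case records, reduced to the area in which each
# case occurred. Real records would not normally be released in this form.
case_areas = ["area1", "area3", "area1", "area2", "area3", "area1"]

# All areas in the study region, including any with no observed cases.
all_areas = ["area1", "area2", "area3", "area4"]


def count_cases(cases, areas):
    """Total the cases per area, keeping zero counts for areas with no cases."""
    tally = Counter(cases)
    return {area: tally.get(area, 0) for area in areas}


counts = count_cases(case_areas, all_areas)
print(counts)
```

Areas with no cases are kept with a count of zero, since a zero is still an observation for a Poisson model.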

11
Why might there be spatial effects?
  • This depends on the response variable and
    application area.
  • It is possible that geography is itself a
    predictor for our response or is a surrogate for
    other factors.
  • Many factors can be linked to location e.g.
    weather, deprivation, altitude, pollution, wealth
    which might influence the response.
  • So if our response is influenced by any of these
    factors then accounting for spatial effects may
    improve our model.

12
Nested random effects/ levels of geography
  • The simplest link to random effect models is to
    consider nested random effects.
  • We have considered pupils nested in schools and
    cows nested in herds.
  • In some sense the schools and herds are spatial
    units in that schools generally take children
    from their locality and a herd is based on a
    particular farm. However we could also fit where
    the pupils live as another classification of the
    data which is more spatial.
  • On the next slide we consider a dataset with more
    levels of geography.

13
UK house prices dataset
  • An MMath student of mine (David Goodacre) studied
    a dataset of house prices in the UK. The data
    supplied by the Nationwide building society
    consists of average house prices in areas of the
    UK over a 12-year period (1992-2003). The data is
    for 753 towns in the UK and there are 3 levels of
    geography (towns nested in counties nested in
    regions).
  • Note that if we had individual house sale
    information then we could have considered point
    process approaches but here we consider random
    effect modelling.

14
A 4-level VC model for the house price dataset
  • The following model was fitted to the data
  • where i indexes year, j indexes town, k indexes
    county and l indexes region. The response, y, is
    the log of the average price.
  • This model can be fitted using both likelihood
    (frequentist) and Bayesian methods in packages
    that allow four levels in the model.
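
The model equation itself did not survive in this transcript. A form consistent with the indices above and with the parameters reported on the estimates slide (an intercept, two year terms and four variance components) would be the following. The assignment of f, v and u to region, county and town, and the exact coding of the year predictor, are assumptions:

```latex
\begin{aligned}
y_{ijkl} &= \beta_0 + \beta_1\,\mathrm{year}_{ijkl} + \beta_2\,\mathrm{year}^2_{ijkl}
            + f_l + v_{kl} + u_{jkl} + e_{ijkl}, \\
f_l &\sim N(0, \sigma^2_f), \quad
v_{kl} \sim N(0, \sigma^2_v), \quad
u_{jkl} \sim N(0, \sigma^2_u), \quad
e_{ijkl} \sim N(0, \sigma^2_e).
\end{aligned}
```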

15
Links with other topics
  • It is worth noting that this house price dataset
    is a repeated measures dataset as you considered
    yesterday.
  • It also contains missing data, as any year in
    which there were fewer than 50 sales in a postal
    town leads to a missing observation.
  • However, here we assume the data are missing at
    random (MAR) conditional on the model we are
    fitting.

16
Estimates for house price dataset
  • Below are given IGLS estimates for the model

Parameter   Estimate (SE)
β0          4.036 (0.067)
β1          -0.020 (0.002)
β2          0.009 (0.0001)
σ²f         0.045 (0.021)
σ²v         0.016 (0.004)
σ²u         0.045 (0.003)
σ²e         0.013 (0.0002)

Here we see that the model consists of parallel
curves with both year and year² very significant.
The variance is greatest between regions and
between postal towns.
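
As a rough guide to where the variation lies, the four variance estimates in the table can be expressed as shares of their total. This is a small Python sketch using only the reported estimates; the dictionary keys are the subscripts of the variance parameters:

```python
# Variance component estimates from the table, keyed by subscript.
variances = {"f": 0.045, "v": 0.016, "u": 0.045, "e": 0.013}

# Total variance not explained by the fixed effects.
total = sum(variances.values())

# Share of that total attributed to each component.
shares = {name: value / total for name, value in variances.items()}

for name, share in shares.items():
    print(f"sigma^2_{name}: {share:.1%} of total variance")
```

The two components estimated at 0.045 take equal shares, and together they account for most of the total.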
17
Region Level Effects
Here we see that the South East of the UK and
London are the most expensive whilst Scotland, the
North and Wales are the cheapest.
18
County level effects
After accounting for regions the pattern of
county effects is more sporadic. We can however
pick out 2 counties, Cheshire in the North West
and Surrey in the South East, that are more
expensive than their neighbours.
19
Region level predictions
  • Here we see a graph of region level predictions

20
Further Modelling
  • In his project Dave looked at random slopes
    models at the various levels of the model, so
    that we could pick out whether the increase in
    prices was different in different regions.
  • He also looked at fitting models of a more
    spatial nature! See next lecture.

21
Why are spatial effects different?
  • The main difference with spatial effects is that
    we have additional information about each
    (spatial) unit.
  • For example if we observe the average house price
    of a town in Grampian, a town in Surrey and 2
    towns in Berkshire then we know something of the
    spatial relation of these towns.
  • We might expect the prices in the 2 towns in
    Berkshire to be similar, and to be more similar
    to those in Surrey, which is also in the South
    East, than to those in Grampian, which is in
    Scotland.
  • In our current models we will fit an effect for
    Berkshire which will capture some of the
    relationship between its 2 towns and a South East
    effect that will capture the link with the Surrey
    town.

22
Problems with the nested classification approach
  • As we have seen the nested classification
    approach can capture much of the spatial
    variability however we have to decide on the
    geographic definitions of areas.
  • We generally use easily available definitions
    e.g. county and region but there is no guarantee
    that these are the best classifications.
  • We also have the problem of border effects, for
    example two towns on either side of a region
    border will not share either region or county
    effects but may have very similar prices.
  • We will look at another approach here before
    studying more complex spatial approaches in the
    next lecture.

23
Including location in fixed effects
  • There may be a trend, e.g. house prices in the UK
    generally fall as we move North and West. We
    could therefore add in two (fixed effect)
    predictors giving the N/S and E/W co-ordinates of
    each point.
  • If the unit of observation is an area e.g. postal
    town we would generally use the co-ordinates of
    the centroid of the unit.
  • If a linear relationship is not sensible then we
    could consider polynomial terms in each
    direction. For example (excluding random effects)
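
The example formula is missing from this transcript. One quadratic form of the kind described, writing E_i and N_i (names assumed here) for the E/W and N/S co-ordinates of point i, would be:

```latex
y_i = \beta_0 + \beta_1 E_i + \beta_2 N_i + \beta_3 E_i^2 + \beta_4 N_i^2
```

Higher-order terms in each direction could be added in the same way if the quadratic is not flexible enough.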

24
Distance effects
  • Another possibility in terms of UK house prices
    is to consider the distance from London. This
    distance can be constructed from the co-ordinates
    of each point. The graph to the left gives the
    combined region and county effects and suggests a
    distance from London effect might be appropriate.

25
Distance and direction effects
  • In some scenarios the direction as well as the
    distance from a particular point is important.
  • This is not the case with house prices. In
    pollution data, however, direction can be very
    important: a dominant wind direction suggests
    that particular directions away from the source
    will experience more pollution than others.
  • We will next look at a dataset from Falkirk in
    Scotland that is analysed in Lawson, Browne and
    Vidal Rodeiro (2003).

26
Focused Clustering
  • One research area in public health looks at the
    impact of sources of pollution on the health
    status of communities. The detection of patterns
    of health events associated with pollution
    sources is known as focused clustering. The
    statistical modelling involved usually relates to
    the point process nature of such data.
  • Lawson, Browne and Vidal Rodeiro (2003) devote a
    whole chapter to focused clustering and include
    some fairly complex models that can be fitted in
    WinBUGS. Here we will look at some simpler
    models that can be fitted in MLwiN to a dataset
    from Falkirk in Scotland.

27
Respiratory cancer in Falkirk
  • The figure to the right shows the census
    geographies of 26 regions found around a foundry
    (marked on the figure) in Falkirk, Scotland. It
    is thought conceivable that the foundry was an
    air pollution hazard in the early 1970s, prior to
    the study. This could have an impact on the
    respiratory cancer experience of those living in
    the areas close to the foundry.

28
Falkirk dataset
  • The data consists of observed and expected counts
    of respiratory cancer cases in the time period
    1978-1983.
  • We first compare the standardized mortality
    ratios (SMRs), i.e. observed/expected, against
    the locations of the centroids of the 26 areas in
    Falkirk (relative to the foundry) to look for
    patterns.
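
The SMR calculation is a simple ratio per area. The Python sketch below uses hypothetical observed and expected counts for three areas; these are not the Falkirk values:

```python
# Hypothetical observed and expected counts for three areas
# (invented for illustration, not the Falkirk data).
observed = [4, 2, 7]
expected = [2.5, 3.1, 5.0]


def smr(obs, exp):
    """SMR for one area: observed count divided by expected count."""
    if exp <= 0:
        raise ValueError("expected count must be positive")
    return obs / exp


smrs = [smr(o, e) for o, e in zip(observed, expected)]
print([round(s, 2) for s in smrs])
```

An SMR above 1 means more cases were observed in the area than expected, and an SMR below 1 means fewer.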

29
Position of the sites
  • Note in the graphs to the right that the 3
    highest SMRs are close to the source both in the
    N/S and E/W directions.
  • We can convert these locations to distance and
    direction measures.
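
One way to make this conversion is sketched below in Python, assuming each centroid is given as an (east, north) offset from the foundry. The bearing convention (degrees clockwise from north) is an assumption, as the slides do not say which convention was used, and the sample offsets are invented:

```python
import math


def distance_and_direction(east, north):
    """Convert offsets from the source (east, north) to distance and bearing.

    The bearing is in degrees clockwise from north, in the range [0, 360).
    """
    distance = math.hypot(east, north)
    # atan2(east, north) measures the angle from the north axis towards east.
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    return distance, bearing


# Hypothetical centroid offsets from the foundry, in arbitrary units.
centroids = [(0.0, 2.0), (3.0, 0.0), (-1.0, -1.0)]
for east, north in centroids:
    d, b = distance_and_direction(east, north)
    print(f"east={east:+.1f} north={north:+.1f} distance={d:.2f} bearing={b:.1f}")
```

Because direction is circular (a bearing just below 360 degrees is close to 0), it is usually plotted or modelled with that wrap-around in mind.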

30
Distance and direction
  • Here we see that there appears to be a negative
    relationship between distance and SMR but no
    obvious pattern with direction.

31
(Extra) Poisson modelling
  • We have modelled the effects of deprivation,
    distance and direction in the following Poisson
    model
  • Note that we have used 1st order MQL in MLwiN and
    allowed extra-Poisson variation. The fit shows
    less variation than a Poisson distribution would
    imply, so we will also try fitting SMR as a
    Normally distributed response.
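
The fitted equation is missing from this transcript. A standard log-linear form for observed counts O_i, with the expected counts E_i entering as an offset, is sketched below. The symbol names, the single linear term for each of deprivation and distance, and the use of cosine and sine terms for the direction angle θ_i are assumptions rather than the model actually fitted:

```latex
\begin{aligned}
O_i &\sim \mathrm{Poisson}(\mu_i), \\
\log \mu_i &= \log E_i + \beta_0 + \beta_1\,\mathrm{deprivation}_i
             + \beta_2\,\mathrm{distance}_i
             + \beta_3 \cos\theta_i + \beta_4 \sin\theta_i, \\
\mathrm{Var}(O_i) &= \phi\,\mu_i .
\end{aligned}
```

Allowing extra-Poisson variation means estimating the scale parameter φ rather than fixing it at 1; an estimate below 1 corresponds to less variation than the Poisson distribution implies.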

32
Normal response model for SMR
  • Here we see that none of the predictors has a
    significant effect, which is probably because the
    dataset is so small.
  • We do see, however, that the risk reduces as
    distance from the foundry increases and as
    deprivation score increases (suggesting higher
    rates in less deprived areas, but not
    significantly).

33
Information for the practical
  • In the practical we will return to using nested
    random effects to account for spatial effects.
  • Our data is from the European community and
    consists of male deaths from malignant melanoma
    in 9 countries in the EU.
  • The practical is a (modified) chapter from Browne
    (2003) and looks at MCMC methods for this
    dataset. It is also analysed using
    quasi-likelihood methods in the MLwiN user's
    guide and you are welcome to try these methods
    too.