# Recent Advances in Statistical Ecology using Computationally Intensive Methods - PowerPoint PPT Presentation

View by Category
Title:

## Recent Advances in Statistical Ecology using Computationally Intensive Methods

Description:

### Recent Advances in Statistical Ecology using Computationally Intensive Methods Ruth King Overview Introduction to wildlife populations and identification of questions ... – PowerPoint PPT presentation

Number of Views:65
Avg rating:3.0/5.0
Slides: 39
Provided by: ruthk155
Category:
Tags:
Transcript and Presenter's Notes

Title: Recent Advances in Statistical Ecology using Computationally Intensive Methods

1
Recent Advances in Statistical Ecology using
Computationally Intensive Methods
• Ruth King

2
Overview
• Introduction to wildlife populations and
identification of questions of interest.
• Motivating example.
• Issues to be addressed
• Missing data
• Model discrimination.
• Summary.
• Future research.

3
Wildlife Populations
• In recent years there has been increasing
interest in wildlife populations.
• Often we may be interested in population changes
over time, e.g. if there is a declining
population. (Steve Buckland)
• Alternatively, we may be interested in the
underlying dynamics of the system, in order to
obtain a better understanding of the population.
• We shall concentrate on this latter problem, with
particular focus on identifying factors that
affect demographic rates.

4
Data Collection
• Data are often collected via some form of
capture-recapture study.
• Observers go out into the field and record all
animals that are seen at a series of capture
events.
• Animals may be recorded via simply resightings or
recaptures (of live animals) and recoveries (of
• At each capture event all unmarked animals are
uniquely marked all observed animals are
recorded and subsequently released back into the
population.

5
Data
• Each animal is uniquely identifiable so our data
consist of the capture histories for each
individual observed in the study.
• A typical capture history may look like
• 0 1 1 0 0 1 2
• 0/1 corresponds to the individual being
unobserved/observed at that capture time and
• 2 denotes an individual is recovered dead.
• We can then explicitly write down the
corresponding likelihood as a function of
survival (?), recapture (p) and recovery (?)
rates.

6
Likelihood
• The likelihood is the product over all
individuals of the probability of their
corresponding capture history, conditional on
their initial capture.
• For example, for an individual with history
• 0 1 1 0 0 1 2
• the contribution to the likelihood is
• ?2 p3 ?3 (1-p4) ?4 (1-p5) ?5 p6 (1-?6) ?7.
• Then, we can use this likelihood to estimate the
parameter values (either MLEs or posterior
distributions).

7
Covariates
• Covariates are often used to explain temporal
heterogeneity within the parameters.
• Typically these are environmental factors, such
as resource availability, weather conditions or
human intervention.
• Alternatively, heterogeneity in a population can
often be explained via different (individual)
covariates.
• For example, sex, condition or breeding status.
• We shall consider the survival rates to be
possibly dependent on the different covariates.

8
Case Study Soay sheep
• We consider mark-recapture-recovery (MRR) data
relating to Soay sheep on the island of Hirta.
• The sheep are free from human activity and
external competition/predation.
• Thus, this population is ideal for investigating
the impact of different environmental and/or
individual factors on the sheep.
• We consider annual data from 1986-2000, collected
on female sheep (1079 individuals).

This is joint work with Steve Brooks and Tim
Coulson
9
Covariate Information
• Individual covariates
• Coat type (1dark, 2light)
• Horn type (1polled, 2scurred, 3classical)
• Birth weight (real normalised)
• Age (in years)
• Weight (real - normalised)
• Number of lambs born to the sheep in the spring
prior to summer census (0, 1, 2)
• And in the spring following the census (0, 1, 2).
• Environmental covariates
• NAO index population size March rainfall
Autumn rainfall and March temperature.

10
Survival Rates - ?
• Let the set of environmental covariate values at
time t be denoted by xt.
• The survival rate for animal i, of age a, at time
t, is given by,
• logit ?i,a,t ?a ?aT xt ?aT yi ?aT zi,t
?a,t
• Here, yi denotes the set of time-independent
covariates and zi,t the time varying covariates
• ?a,t N(0,?a2) denotes a random effect.
• There arise two issues here missing covariate
values and model choice.

11
Issue 1 Missing Data
• The capture histories and time-invariant
covariate values data are presented in the form
• Note that there are also many missing values for
the time-dependent weight covariate.

12
Problem
• Given the set of covariate values, the
corresponding survival rate can be obtained, and
hence the likelihood calculated.
• However, a complexity arises when there are
unknown (i.e. missing) covariate values, removing
the simple and explicit expression for the
likelihood.

13
Missing Data Classical Approaches
• Typical classical approaches to missing data
problems include
• Ignoring individuals with missing covariate
values
• EM algorithm (can be difficult to implement and
computationally expensive)
• Imputation of missing values using some
underlying model (e.g. Gompertz curve).
• Conditional approach for time-varying covariates
(Catchpole, Morgan and Tavecchia, 2006 in
submission)
• We consider a Bayesian approach, where we assume
an underlying model for the missing data which
allows us to account for the corresponding
uncertainty of the missing values.

14
Bayesian Approach
• Suppose that we wish to make inference on the
parameters ?, and the data observed corresponds
to capture histories, c, and covariate values,
vobs.
• Then, Bayes theorem states
• ?(?c, vobs) Ç L(c, vobs ?) p(?)
• The posterior distribution is very complex and so
we use Markov chain Monte Carlo (MCMC) to obtain
estimates of the posterior statistics of
interest.
• However, in our case the likelihood is
analytically intractable, due to the missing
covariate values.

15
Auxiliary Variables
• We treat the missing covariate values (vmis) as
parameters or auxiliary variables (AVs).
• We then form the joint posterior distribution
over the parameters, ?, and AVs, given the
capture histories c and observed covariate values
yobs
• ?(?, vmis c, vobs) Ç L(c ?, vmis , vobs)
• f(vmis, vobs ?) p(?)
• We can now sample from the joint posterior
distribution ?(?, vmis c, vobs).
• We can integrate out over the missing covariate
values, vmis, within the MCMC algorithm to obtain
a sample from ?(? c, vobs).

16
f(vmis, vobs ?) Categorical Data
• For categorical data (coat type and horn type),
we assume the following model.
• Let y1,i denote the horn type of individual i.
Then, y1,i 2 1,2,3, and we assume that,
• y1,i Multinomial(1,q),
• where q q1, q2, q3.
• Thus, we have additional parameters, q, which can
be regarded as the underlying probability of each
horn type.
• The qs are updated within the MCMC algorithm, as
well as the y1,is which are unknown.
• We assume the analogous model for coat type.

17
f(vmis, vobs ?) Continuous Data
• Let y3,i denote the birth weight of individual i.
• Then, a possible model is to assume that,
• y3,i N(?, ?2),
• where ? and ?2 are to be estimated.
• For the weight of individual i, aged a at time t,
denoted by z1,i,t we set,
• z1,i,t N(z1,i,t-1 ?a ?t, ?w2),
• where the parameters ?a, ?t and ?w2 are to be
estimated.
• In general, the modelling assumptions will depend
on the system under study.

18
Practical Implications
• Within each step of the MCMC algorithm, we not
only need to update the parameter values ?, but
also impute the missing covariate values and
random effects (if present).
• This can be computationally expensive for large
amounts of missing data.
• The posterior results may depend on the
underlying model for the covariates a
sensitivity analysis can be performed using
different underlying models.
• (State-space modelling can also be implemented
using similar ideas see Steves talk).

19
Issue 2 Model Selection
• For the sheep data set we can now deal with the
issue of missing covariate values.
• But.. how do we decide which covariates to use
and/or the age structure? often there may be a
large number of possible covariates and/or age
structures.
• Discriminating between different models can often
be of particular interest, since they represent
competing biological hypotheses.
• Model choice can also be important for future
predictions of the system.

20
Possible Models
• We only want to consider biologically plausible
models.
• We have uncertainty on the age structure of the
survival rates, and consider models of the form
• ?1a ?a1b ?j
• k is unknown a priori and also the covariate
dependence for each age group.
• We consider similar age-type models for p and ?
with possible arbitrary time dependence.
• E.g. ?1(N,BW), ?27(W,L),?8(N,P)/p(t)/?1,?2(t)
• The number of possible models is immense!!

21
Bayesian Approach
• We treat the model itself to be an unknown
parameter to be estimated.
• Then, applying Bayes Theorem we obtain the
posterior distribution over both parameter and
model space
• ?(?m, m data) Ç L(data ?m, m) p(?m) p(m).
• Here ?m denotes the parameters in model m.

22
Reversible Jump MCMC
• The reversible jump (RJ)MCMC algorithm allows us
to construct a Markov chain with stationary
distribution equal to the posterior distribution.
• It is simply an extension of the
Metropolis-Hastings algorithm that allows moves
between different dimensions.
• This algorithm is needed because the number of
parameters, ?m, in model m, may differ between
models.
• Note that this algorithm needs only one Markov
chain, regardless of the number of models.

23
Posterior Model Probabilities
• Posterior model probabilities are able to
quantitatively compare different models.
• The posterior probability of model m is defined
as,
• ?(m data) s ?(?m,m data) d?m.
• These are simply estimated within the RJMCMC
algorithm as the proportion of the time the chain
is in the given model.
• We are also able to obtain model-averaged
estimates of parameters, which takes into account
both parameter and model uncertainty.

24
• The RJMCMC algorithm is the most widely used
algorithm to explore and summarise a posterior
distribution defined jointly over parameter and
model space.
• The posterior model probabilities can be
sensitive to the priors specified on the
parameters (p(?)).
• The acceptance probabilities for reversible jump
moves are typically lower than MH updates.
• Longer simulations are generally needed to
explore the posterior distribution.
• Only a single Markov chain is necessary,
irrespective of the number of possible models!!

25
Problem 1
• Constructing efficient RJ moves can be difficult.
• This is particularly the case when updating the
number of age groups for the survival rates.
• This step involves
• Proposing a new age structure (typically
• Proposing a covariate dependence for the new age
structure
• Proposing new parameter values for this new
model.
• It can be very difficult to construct the Markov
chain with (reasonably) high acceptance rates.

26
Example Reversibility Problem
• One obvious (and tried!) method for adding a
change-point would be as follows.
• Suppose that we propose to split age group ac.
• We propose new age groups ab and b1c.
• Consider a small perturbation for each (non-zero)
regression parameter, e.g.,
• ?ab ?ac ? and ?b1c ?ac - ?.
• where ? N(0,??2).
• However, to satisfy the reversibility
constraints, a change-point can only be removed
when the covariate dependence structure is the
same for two consecutive age groups

27
Example High Posterior Mass
• An alternative proposal would be to set
• ?ab ?ac (i.e. keep all parameters same for
ab).
• Propose a model (in terms of covariates) for
?b1c
• Choose each possible model with equal probability
(reverse move always possible)
• Choose model that differs by at most one
individual and one environmental covariate.
• The problem now lies in proposing sensible
parameter values for the new model.
• One approach is to use posterior estimates of the
parameters from a saturated model (full
covariate dependence for some age structure) as
the mean of the proposal distribtion.

28
Problem 2
• Consider the missing covariates vmis.
• Then, if the covariate is present, we have,
• ?(vmis vobs, ?, data) / L(data ?, vobs,
vmis)
• f(vmis, vobs ?).
• However, if the covariate is not present in the
model, we have,
• ?(vmis vobs, ?, data) / f(vmis, vobs ?).
• Thus, adding (or removing) the covariate values
may be difficult, due to (potentially) fairly
different posterior conditional distributions.
• One way to avoid this is to simultaneously update
the missing covariate values in the model move.

29
Soay Sheep Analysis
• We now use these techniques for analysing the
(complex) Soay sheep data.
• Discussion with experts identified several points
of particular interest
• the age-dependence of the parameters
• identification of the covariates influencing the
survival rates
• whether random effects are present.
• We focus on these issues on the results presented.

30
Prior Specification
• We place vague priors on the parameters present
in each model.
• Priors also need to be specified on the models.
• Placing an equal prior on each model places a
high prior mass on models with a large number of
age groups, since the number of models increases
with the number of age groups.
• Thus, we specify an equal prior probability on
each marginal age structure and a flat prior over
the covariate dependence given the age structure
or on time-dependence.

31
Results Survival Rates
• The marginal models for the age groups with
largest posterior support are
• Note that with probability 1, lambs have a
distinct survival rate.
• Often the models with most posterior support are
close neighbours of each other.

Age-structure Posterior probability
?1 ?27 ?8 0.943
?1 ?27 ?89 ?10 0.052
32
Covariate Dependence
BF 3
BF 20
BF Bayes factor
33
Influence of Covariates
Weight
Population size
Age 1
Age 10
34
Marginalisations
• Presenting only marginal results can hide some of
the more intricate details.
• This is most clearly seen from another MRR data
analysis relating to the UK lapwing population.
• There are two covariates time and fdays a
measure of the harshness of the winter.
• Without going into too many details, we had MRR
data and survey data (estimates of total
population size).
• An integrated analysis was performed, jointly
analysing both data sets.

35
Marginal Models
• The marginal models with most posterior support
for the demographic parameters are
• (a) ?1 1st year survival (b) ?a adult
survival
• Model Post. prob. Model Post. prob.
• ?1(fdays) 0.636 ?a(fdays,t) 0.522
• ?1 0.331 ?a(fdays) 0.407
• (c) ? productivity (d) l recovery rate
• Model Post. prob. Model Post. prob.
• r 0.497 l(t) 0.730
• r(t) 0.393 l(fdays,t) 0.270

This is joint work with Steve Brooks, Chiara
Mazzetta and Steve Freeman
36
Warning!
• The previous (marginal) posterior estimates hides
some of the intricate details.
• The marginal models of the adult survival rate
and productivity rate are highly correlated.
• In particular, the joint posterior probabilities
for these parameters are
• Model Post. prob.
• ?a(fdays,t), ? 0.45
• ?a(fdays), ?(t) 0.36
• Thus, there is strong evidence that either adult
survival rate OR productivity rate is time
dependent.

37
Summary
• Bayesian techniques are widely used, and allow
complex data analyses.
• Covariates can be used to explain both temporal
and individual heterogeneity.
• However, missing values can add another layer of
complexity and the need to make additional
assumptions.
• The RJMCMC algorithm can explore the possible
models and discriminate between competing
biological hypotheses.
• The analysis of the Soay sheep has stimulated new
discussion with biologists, in terms of the
factors that affect their survival rates.

38
Further Research
• The development of efficient and generic RJMCMC
algorithms.
• Assessing the posterior sensitivity on the model
specification for the covariates.
• Investigation of the missing at random assumption
for the covariates in both classical and Bayesian
frameworks.
• Prediction in the presence of time-varying
covariate information.