Lecture 6 Your data and models are never perfect - PowerPoint PPT Presentation

Slides: 15
Provided by: canh5
Learn more at: http://www.sortie-nd.org


1
Lecture 6: Your data and models are never perfect
Making choices in research design and analysis that you can defend
2
Designing your study: Tradeoffs are everywhere
  • Sampling design and scope of inference
  • Tradeoffs between randomization and
    stratification
  • Allocating sampling effort
  • Tradeoffs between sample size and measurement
    error
  • Sample size and model complexity
  • Replication and independence
  • Spatial autocorrelation
  • Collinearity in your data, and parameter
    tradeoffs in your models

3
Classical Sampling Theory
  • Randomization vs. stratification
    • Randomization: unbiased inference about
      populations
    • Stratification: parameterization of robust,
      predictive models
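To make the randomization vs. stratification tradeoff concrete for model parameterization, here is a minimal Python sketch. It assumes a hypothetical population of sites whose predictor values cluster at the low end of a gradient (the mixture below is invented for illustration), and the helper names `sxx`, `slope_se` and `stratified_sample` are likewise hypothetical. It compares a simple random sample with a sample stratified into equal-width bins along the gradient. Under ordinary least squares with a constant error SD sigma, the slope's standard error is sigma / sqrt(Sxx), so more spread in x means a better-constrained slope.

```python
import math
import random


def sxx(xs):
    """Sum of squared deviations of xs from their mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def slope_se(xs, sigma=1.0):
    """OLS slope standard error, sigma / sqrt(Sxx), for a known error SD sigma."""
    return sigma / math.sqrt(sxx(xs))


def stratified_sample(population, n_bins, per_bin, rng):
    """Draw per_bin sites at random from each of n_bins equal-width bins along x."""
    lo, hi = min(population), max(population)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x in population:
        # The maximum value would land at index n_bins, so clamp it into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        bins[idx].append(x)
    sample = []
    for b in bins:
        sample.extend(rng.sample(b, min(per_bin, len(b))))
    return sample


rng = random.Random(42)
# Hypothetical landscape: ~90% of sites clustered near x = 2, ~10% spread over 0-10.
population = [rng.gauss(2.0, 0.5) if rng.random() < 0.9 else rng.uniform(0.0, 10.0)
              for _ in range(10_000)]

random_design = rng.sample(population, 20)
stratified_design = stratified_sample(population, n_bins=5, per_bin=4, rng=rng)

print(f"random:     SE(slope) = {slope_se(random_design):.3f}")
print(f"stratified: SE(slope) = {slope_se(stratified_design):.3f}")
```

The random design still gives unbiased inference about the population (for example, its mean x), while the stratified design deliberately over-represents the rare end of the gradient and, in exchange, pins down the slope much more tightly.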

4
Allocation of Sampling Effort and Inference
  • Scope of scientific inference
    • Is ecology a science of case studies, with no
      formal scope of inference?
  • Strength of evidence for one hypothesis
    (relative to others): at the end of the day, is
    this all you can ever really hope to assess from
    your results?
  • Remember: to a likelihoodist, all of the
    information relevant to inference about a
    hypothesis is contained within the data
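Strength of evidence for one hypothesis relative to another is usually expressed as a likelihood ratio. Here is a minimal Python sketch. It assumes hypothetical binomial data (7 survivors out of 20 seedlings) and two point hypotheses about the survival probability. None of these numbers come from the lecture, and the function names are illustrative.

```python
import math


def binom_log_lik(p, k, n):
    """Log-likelihood of survival probability p given k survivors out of n."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def likelihood_ratio(p1, p2, k, n):
    """Evidence for hypothesis p1 relative to p2: L(p1 | data) / L(p2 | data)."""
    return math.exp(binom_log_lik(p1, k, n) - binom_log_lik(p2, k, n))


k, n = 7, 20  # hypothetical data: 7 survivors out of 20
lr = likelihood_ratio(0.35, 0.6, k, n)
print(f"LR(p=0.35 vs p=0.6) = {lr:.1f}")
```

The binomial coefficient cancels in the ratio, so the evidence depends on the data only through the likelihood, which is the likelihoodist point on the slide. Here the data favor p = 0.35 over p = 0.6 by a factor of about 12.7.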

5
Allocating effort to precision vs. replication:
the benefits of large sample sizes
  • Signal vs. noise
    • If you can see the signal, do you care how much
      noise there is?
    • Understanding can embrace uncertainty, but
      prediction loves precision
  • Why do we love a high R² (and why don't
    statisticians share our preoccupation with
    goodness of fit)?
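Here is a small simulation sketch of "seeing the signal through the noise". It assumes a hypothetical linear relationship y = 1 + 0.5x with normal noise (SD 5) that is much larger than the signal. With enough samples, the slope estimate becomes precise because its standard error shrinks roughly as 1/sqrt(n). R² stays low throughout, because by construction the signal explains only about 8% of the variance. `ols_fit` is a hypothetical helper, not a library function.

```python
import math
import random


def ols_fit(xs, ys):
    """Simple linear regression: returns (intercept, slope, slope SE, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - my) ** 2 for y in ys)
    sigma2 = rss / (n - 2)  # residual variance estimate
    return intercept, slope, math.sqrt(sigma2 / sxx), 1 - rss / tss


rng = random.Random(1)
results = {}
for n in (20, 200, 2000):
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1 + 0.5 * x + rng.gauss(0, 5) for x in xs]  # noise SD 5 swamps the signal
    _, b, se, r2 = ols_fit(xs, ys)
    results[n] = (b, se, r2)
    print(f"n={n:5d}  slope={b:.2f}  SE={se:.3f}  R^2={r2:.2f}")
```

A low R² therefore need not undermine an inference about the slope. It does, however, limit how precisely any single new observation can be predicted.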

6
Sample size and model complexity
  • How many parameters in a model can your data
    support? How do you know if your model is
    overspecified?
    • Minimum of observations per parameter
    • What's your comfort zone? (mine is shrinking
      over time)
  • How many parameters should your model contain?
    • If parsimony is a core principle of science,
      don't we have to accept a certain level of
      uncertainty?
    • You can always add more terms to a model to
      increase R², but at what cost to generality?
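To illustrate how more terms raise R² but at a cost, here is a NumPy sketch. It assumes hypothetical data generated from a straight line plus noise. Polynomials of increasing degree never fit worse in-sample, because R² cannot fall for nested least-squares fits. AIC, by contrast, charges for every extra parameter. Here it is computed for a Gaussian likelihood as n ln(RSS/n) + 2k, up to an additive constant, where k counts the polynomial coefficients plus the error variance. AIC is one common criterion; the lecture does not say which one it uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = np.linspace(0, 10, n)
y = 2.0 + 0.8 * x + rng.normal(0, 2.0, n)  # the true model is linear

tss = np.sum((y - y.mean()) ** 2)
r2_by_degree, aic_by_degree = {}, {}
for degree in range(1, 7):
    coefs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coefs, x)) ** 2)
    k = degree + 2  # (degree + 1) coefficients, plus the error variance
    r2_by_degree[degree] = 1 - rss / tss
    aic_by_degree[degree] = n * np.log(rss / n) + 2 * k
    print(f"degree {degree}: R^2 = {r2_by_degree[degree]:.4f}, AIC = {aic_by_degree[degree]:.1f}")
```

Because the data come from a line, the lowest AIC will usually (though not always) fall at a low degree, even though R² keeps creeping upward.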

7
Independence of Observations vs. Residuals
Definition of independence: if two events (A
and B) are independent, then $P(A,B) = P(A)\,P(B)$.
But if you don't know P(A) and P(B), how do you
check whether $P(A,B) = P(A)\,P(B)$?
Why is independence important?
But what needs to be independent? The errors,
not the observations!
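One practical answer to the question above is to estimate P(A) and P(B) from the same data as marginal frequencies. You can then compare the observed joint counts with the counts expected under independence. Here is a minimal Python sketch of Pearson's chi-square statistic for a 2×2 table. The counts are hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 count table [[n(A,B), n(A,~B)], [n(~A,B), n(~A,~B)]].

    Expected counts use the estimated marginals: E_ij = n * P(row i) * P(col j).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


independent = [[30, 20], [30, 20]]  # joint frequencies equal the product of the marginals
dependent = [[40, 10], [10, 40]]
print(chi_square_2x2(independent))  # 0.0
print(chi_square_2x2(dependent))    # 36.0
```

With 1 degree of freedom, 36 is far beyond the 5% critical value of about 3.84.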
8
The bugaboo of spatial autocorrelation
One of the most misapplied statements in
ecology: "In such a case, because the value
at any one locality can be at least partly
predicted by the values at neighboring points,
these values are not stochastically independent
from one another." (Legendre, P. 1993. Spatial
autocorrelation: trouble or new paradigm?
Ecology 74: 1659-1673)
But does spatial autocorrelation in observed
values necessarily imply lack of independence of
the residuals?
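Spatial autocorrelation in the sense of the Legendre quotation is commonly quantified with Moran's I. Here is a minimal Python sketch. It assumes a one-dimensional transect where each point's neighbors are the adjacent points (weight 1) and all other weights are 0. Real spatial analyses would usually use 2-D, distance-based weights. The helper names are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = value - mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(zi * zi for zi in z)


def adjacency_weights(n):
    """Binary weights for a transect: points i and j are neighbors if |i - j| == 1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


gradient = [float(i) for i in range(10)]                        # smooth trend along the transect
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(10)]  # neighbors always differ
w = adjacency_weights(10)
print(round(morans_i(gradient, w), 3))     # 0.778: neighboring values are similar
print(round(morans_i(alternating, w), 3))  # -1.0: neighboring values are opposite
```

The gradient case shows that a smooth trend alone is enough to produce strong positive autocorrelation in the observed values.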
9
What needs to be independent?
The errors, not the observations!
If your observations are spatially autocorrelated
because they share similar values of their
independent variables, this does not necessarily
violate the assumption that the errors are
independent
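The point can be checked with a small simulation. This Python sketch assumes a hypothetical transect where the predictor x rises smoothly along the transect and y = 2 + 3x plus independent normal errors. The observed y values are strongly autocorrelated only because neighboring points share similar x. The residuals from the correctly specified regression are not autocorrelated. `lag1_autocorrelation` is a hypothetical helper.

```python
import random


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence ordered along the transect."""
    n = len(series)
    mean = sum(series) / n
    dev = [s - mean for s in series]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / sum(d * d for d in dev)


rng = random.Random(7)
n = 500
x = [10 * i / (n - 1) for i in range(n)]        # smooth gradient along the transect
y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]  # independent errors

# Ordinary least-squares fit of y on x
mx, my = sum(x) / n, sum(y) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print(f"lag-1 autocorrelation of y:         {lag1_autocorrelation(y):.2f}")
print(f"lag-1 autocorrelation of residuals: {lag1_autocorrelation(residuals):.2f}")
```

If x were left out of the model, its trend would move into the residuals and they would be autocorrelated. That is the case the independence assumption actually cares about.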
10
[Figure: Spatial autocorrelation of seedling density in a New Zealand temperate rainforest, and spatial autocorrelation in the residuals of an inverse model predicting seedling density as a function of adult tree distribution]
11
Consequences of spatial autocorrelation
  • What are the statistical consequences of spatial
    autocorrelation?
    • To a frequentist, the consequences are quite
      serious: inflation of degrees of freedom for
      test statistics
    • To a likelihoodist, the issue is simply one of
      identifying any bias in parameter estimation; as
      long as there are no demons involved, the bias is
      generally restricted to an underestimate of
      variance terms
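The frequentist "inflation of degrees of freedom" can be made concrete with a standard large-sample approximation. Consider the mean of a stationary AR(1) series with lag-1 correlation rho. Its variance is what you would get from only about n(1 - rho)/(1 + rho) independent observations. Here is a Python sketch of that formula. It applies to estimating a mean under AR(1) dependence, not to every model, and the function name is hypothetical.

```python
def effective_sample_size(n, rho):
    """Approximate effective n for the mean of a stationary AR(1) series.

    For large n, Var(mean) ~= (sigma^2 / n) * (1 + rho) / (1 - rho), where sigma^2 is
    the marginal variance, so the sample acts like n * (1 - rho) / (1 + rho)
    independent observations.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1 - rho) / (1 + rho)


for rho in (0.0, 0.3, 0.6, 0.9):
    print(f"rho = {rho:.1f}: 100 samples act like about {effective_sample_size(100, rho):.0f}")
```

At rho = 0.6, treating 100 autocorrelated samples as 100 independent ones overstates the information in them by roughly a factor of four.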

12
Dealing with Autocorrelation
  • Frequentists
    • A plethora of gyrations: quasi-likelihood,
      variance inflation factors, Mantel tests, and a
      variety of adjustments of degrees of freedom
  • Likelihoodists
    • Recognize that the variance is underestimated
      and move on
    • Model the spatial autocorrelation in the error
      term explicitly
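Modeling the autocorrelation in the error term explicitly can be sketched with the simplest case: errors along a transect that follow a stationary AR(1) process. The Python below writes the exact Gaussian log-likelihood, profiles out sigma² analytically, and grid-searches rho. The AR(1) structure, the simulated errors, and the grid search are all assumptions made for illustration. A 2-D field study would more likely use a distance-based covariance function, fitted with an optimizer jointly with the regression parameters.

```python
import math
import random


def ar1_profile_log_lik(e, rho):
    """Exact Gaussian AR(1) log-likelihood of errors e, with sigma^2 profiled out.

    e[0] ~ N(0, sigma^2 / (1 - rho^2)); e[t] | e[t-1] ~ N(rho * e[t-1], sigma^2).
    Returns (log-likelihood, MLE of sigma^2) for the given rho.
    """
    n = len(e)
    ss = (1 - rho ** 2) * e[0] ** 2 + sum((e[t] - rho * e[t - 1]) ** 2 for t in range(1, n))
    sigma2 = ss / n
    log_lik = -0.5 * n * math.log(2 * math.pi * sigma2) + 0.5 * math.log(1 - rho ** 2) - 0.5 * n
    return log_lik, sigma2


def fit_ar1(e, step=0.01):
    """Grid-search MLE of rho over (-1, 1); returns (rho, sigma^2, log-likelihood)."""
    best = None
    for i in range(1, int(round(2 / step))):
        rho = -1 + i * step
        ll, s2 = ar1_profile_log_lik(e, rho)
        if best is None or ll > best[2]:
            best = (rho, s2, ll)
    return best


rng = random.Random(3)
true_rho, n = 0.7, 2000
e = [rng.gauss(0, 1) / math.sqrt(1 - true_rho ** 2)]  # draw the start from the stationary distribution
for _ in range(n - 1):
    e.append(true_rho * e[-1] + rng.gauss(0, 1))

rho_hat, sigma2_hat, _ = fit_ar1(e)
print(f"rho = {rho_hat:.2f}, sigma^2 = {sigma2_hat:.2f}")
```

Unlike "recognize that the variance is underestimated and move on", this approach makes rho a parameter of the model, so its estimate and uncertainty are part of the fit.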

13
Collinearity in your data and parameter tradeoffs
in your models
  • Collinearity is probably just as common as
    autocorrelation, and just as often misinterpreted
    by reviewers! How much scatter do you need to
    separate the effects of two different independent
    variables?
  • Identifying collinearity is easy, but determining
    whether it is a problem generally depends on
    examining the model-fitting process
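Identifying collinearity is indeed easy. For two predictors, a common diagnostic is the variance inflation factor, VIF = 1/(1 - r²), where r is the correlation between the two predictors. It gives the factor by which collinearity inflates the sampling variance of each slope, relative to uncorrelated predictors. Here is a minimal Python sketch with hypothetical predictor values. As the slide stresses, a large VIF flags a possible problem but does not by itself show that the fit is unusable.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def vif_two_predictors(x1, x2):
    """Variance inflation factor shared by both slopes in y ~ x1 + x2."""
    r2 = pearson_r(x1, x2) ** 2
    if r2 >= 1.0 - 1e-12:
        return math.inf  # perfectly collinear: the two effects cannot be separated
    return 1.0 / (1.0 - r2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_scattered = [2.0, 6.0, 1.0, 5.0, 3.0, 4.0]  # plenty of scatter relative to x1
x2_collinear = [1.1, 2.0, 2.9, 4.2, 5.0, 5.8]  # nearly a copy of x1
print(round(vif_two_predictors(x1, x2_scattered), 2))
print(round(vif_two_predictors(x1, x2_collinear), 1))
```

The scattered pair gives a VIF close to 1, while the nearly collinear pair gives a VIF of around 200, so each slope's variance is inflated roughly two-hundredfold.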

14
Covariance and tradeoffs among model parameters
  • Identifying parameter tradeoffs
    • Invert the Hessian to get the parameter
      variance/covariance matrix
    • Examine the likelihood surface
  • Parameter tradeoffs
    • Structural (any time there are multiplicative
      terms in your model, you should pay attention)
    • Empirical (whenever there is very strong
      collinearity in the data for a set of
      independent variables, there are likely to be
      tradeoffs and covariance among parameters using
      those variables)
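Inverting the Hessian of the negative log-likelihood at the maximum-likelihood estimate gives the approximate variance/covariance matrix of the parameters. Rescaling that matrix to correlations exposes tradeoffs. Here is a NumPy sketch. It assumes a hypothetical linear model y = a + bx with normal errors of known SD and x values far from zero (10 to 20), a setup in which intercept and slope trade off strongly. The Hessian is approximated by central finite differences rather than derived analytically, and the helper names are hypothetical.

```python
import numpy as np


def neg_log_lik(theta, x, y, sigma):
    """Gaussian negative log-likelihood for y = a + b*x with known error SD sigma."""
    a, b = theta
    resid = y - (a + b * x)
    return 0.5 * len(y) * np.log(2 * np.pi * sigma ** 2) + np.sum(resid ** 2) / (2 * sigma ** 2)


def numerical_hessian(f, theta, h=1e-3):
    """Central finite-difference Hessian of the scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    hess = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            di = np.zeros(k)
            dj = np.zeros(k)
            di[i] = h
            dj[j] = h
            hess[i, j] = (f(theta + di + dj) - f(theta + di - dj)
                          - f(theta - di + dj) + f(theta - di - dj)) / (4 * h * h)
    return hess


rng = np.random.default_rng(5)
sigma = 1.0
x = np.linspace(10, 20, 50)
y = 3.0 + 0.5 * x + rng.normal(0, sigma, x.size)

# MLE of (a, b): with sigma known, this is the least-squares solution
X = np.column_stack([np.ones_like(x), x])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

hess = numerical_hessian(lambda t: neg_log_lik(t, x, y, sigma), theta_hat)
cov = np.linalg.inv(hess)  # parameter variance/covariance matrix
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
print("SE(a), SE(b):", sd)
print("corr(a, b):", corr[0, 1])
```

The intercept-slope correlation here is strongly negative (close to -0.98), because any small increase in the slope can be offset by lowering the intercept. Refitting with x centered on its mean removes this particular correlation. That is a reminder that some tradeoffs arise from where the data sit relative to the model's parameterization.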