ESM 206A Data Analysis for Environmental Science - PowerPoint PPT Presentation

About This Presentation
Title:

ESM 206A Data Analysis for Environmental Science


Slides: 36
Provided by: brucek64

Transcript and Presenter's Notes

Title: ESM 206A Data Analysis for Environmental Science


1
ESM 206A: Data Analysis for Environmental Science
and Management
  • Winter 2009
  • Hunter Lenihan and Nick Parker

2
Course Objectives
  • Learn how to use quantitative data analysis to:
    • Make decisions regarding compliance with
      environmental standards or performance of new
      management strategies
    • Assess the impact of past management actions or
      development projects
    • Make predictions about the likely outcome of
      proposed policy or management actions
  • In ESM 206A, learn to use the following tools:
    • Hypothesis formulation and testing
    • t-tests, Analysis of Variance (ANOVA)
    • Ordinary Least Squares regression
      • Single variable
      • Multiple variable
    • Regression with discrete dependent variables
      (time permitting)
      • Probit and Logit

3
Instructors
To make an appointment, find an open time on my
Corporate Time schedule, add a meeting, and send
me an email so I'm aware of it. Note that if you
schedule something for the immediate future, I
may not find out about it in time.
4
Class format
  • Lectures meet twice per week
    • Winter quarter
    • T-TR 2:00-3:15
    • Feb. 10, 12, 17, 19, 24, 26
    • Mar. 3, 6, 10, 12
  • Labs meet once per week, in the GIS lab.
    • There are 3 sections in Winter quarter
      • W 2:00-2:50, W 4:30-5:20, TR 8:30-9:20
      • Weeks 7-10 (No lab the first week!)

5
Class format
  • Labs
  • These provide you the opportunity to learn the
    nuts and bolts of running the analyses
  • Will work on problem sets in lab
  • The lab sections are usually quite full. If you
    need to switch sections on a continuing basis,
    please find someone to swap with.

6
Micro-exam
  • A relatively short assignment that you will turn
    in for a grade.
  • Typically one extended problem that involves both
    conceptual and technical aspects
  • Treat it as a take-home exam: once you open it,
    no help from peers or instructors (but you can
    use notes, books, online resources)
  • One micro-exam this quarter, due 4 PM, Friday,
    12 March

7
Text
  • Statistical Methods in Water Resources by Helsel
    and Hirsch
  • PDF available on course webpage
  • Individual chapters (smaller files) at
    http://pubs.usgs.gov/twri/twri4a3/html/pdf_new.html
  • Other readings/links will be posted as appropriate

8
Computing
  • Excel can be used for simple analysis, using the
    Analysis ToolPak
  • We will mostly use JMP
  • JMP is comprehensive, reliable, but expensive
    (but UCSB pays!)
  • Based on SAS, the world's most powerful stat
    program/system
  • JMP provides a somewhat user-friendly interface
    to SAS
  • The course webpage is at
    http://www.bren.ucsb.edu/academics/course.asp?number=206A
  • We will use this page for the whole year

9
Definition of statistics
  • Mathematical science pertaining to the
    collection, analysis, interpretation or
    explanation, and presentation of data
  • Provides tools for prediction and forecasting
    based on data
  • Applicable to all fields of science

10
Definition of statistics
  • Descriptive statistics: methods used to
    summarize or describe a collection of data
  • Inferential statistics: patterns in data are
    modeled in a way that accounts for randomness and
    uncertainty in the observations. The models are
    then used to draw inferences about the process or
    population being studied.

11
Definition of statistics
  • Descriptive and inferential statistics = applied
    statistics
  • Three basic types
    • Monte Carlo analysis: makes minimal assumptions
      about the data. Uses randomizations of observed
      data as a basis for inference.
    • Parametric analysis: assumes the data were
      sampled from an underlying distribution of known
      form (e.g., normal). Estimates the parameters of
      the distribution from the data. Estimates
      probabilities from observed frequencies of events
      and uses these probabilities as a basis for
      inference (frequentist inference).
    • Bayesian analysis: also assumes the data were
      sampled from an underlying distribution of known
      form. Estimates parameters not only from data but
      also from prior knowledge, and assigns
      probabilities to these parameters.
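The Monte Carlo idea of using randomizations of the observed data can be sketched as a simple permutation test. This is a minimal illustration, not the course's own procedure: the function name `permutation_test`, the number of shuffles, and the seedling counts are all assumptions invented for the example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_shuffles=2000, seed=0):
    """Two-sided randomization test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Reassign the group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction: the observed labelling counts as one arrangement.
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical seedling counts per site, invented for illustration.
outside = [27, 31, 24, 29, 26, 30, 28, 25]
inside = [19, 22, 17, 24, 20, 18, 23, 21]
p_value = permutation_test(outside, inside)
print(f"permutation p-value: {p_value:.4f}")
```

The only assumption the test makes is that, if there were no difference, the group labels would be interchangeable; no distributional form is needed, which is the contrast with the parametric approach.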

12
Definition of statistics
  • Mathematical statistics: concerned with the
    theoretical basis of the subject

13
Applied statistics
  • Common goal: investigate causality
  • Draw conclusions on the effect of changes in the
    values of predictors (independent variables) on
    dependent variables (response)
  • Y = a + βX
  • There are two major types of investigations
    (studies): experimental and observational
  • Difference between the two types: how the study
    is conducted. Each can be very effective.
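As a minimal sketch of how the intercept a and slope β of a linear model relating a response Y to a single predictor X might be estimated, the following uses the closed-form ordinary least squares solution. The function name `fit_line` and the data values are assumptions made for illustration; in the course itself the fitting is done in JMP.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares estimates (a, b) for Y = a + b*X."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical predictor and response values, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
print(f"Y = {a:.2f} + {b:.2f} X")
```

A fitted slope describes association only; whether it can be read as a causal effect depends on how the study was conducted.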

14
Tools to learn
Experimental studies
Observational studies
  • Hypothesis formulation and testing
  • t-tests
  • Analysis of Variance (ANOVA)
  • Ordinary Least Squares regression
  • Multiple regression
  • Probit and Logit regression

15
Some notation and formulas
  • For a random variable called x, the sample
    statistics are:
    • Mean
    • Variance
    • Standard deviation
  • The population statistics are called:
    • Mean
    • Variance
    • Standard deviation
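The formulas themselves did not survive in this transcript. As a hedged sketch, the conventional definitions are assumed here: the sample mean is the sum of the observations divided by n, the sample variance divides the sum of squared deviations by n - 1, and the population variance divides by N. The variable names (`x_bar`, `s2`, `mu`, `sigma2`) follow common textbook notation and are not confirmed by the slide; the data are invented.

```python
from statistics import mean, variance, stdev, pvariance, pstdev

# Hypothetical observations of a random variable x.
x = [21, 25, 23, 29, 27]

# Sample statistics: sum of squared deviations divided by n - 1.
x_bar = mean(x)
s2 = variance(x)
s = stdev(x)

# Population statistics: treat x as the whole population, divide by N.
mu = mean(x)
sigma2 = pvariance(x)
sigma = pstdev(x)

print(f"sample:     mean={x_bar}, variance={s2}, sd={s:.3f}")
print(f"population: mean={mu}, variance={sigma2}, sd={sigma:.3f}")
```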

16
Data for examples in lectures
17
Data from traps used by lobster fishery
18
www.calobtser.org
Matt Kay Bren PhD student (Kay_at_lifesci.ucsb.edu)
19
Fishery
20
(No Transcript)
21
Hypothesis formulation and testing
22
Karl Raimund Popper (1902-1994)
  • One of the most influential philosophers of
    science of the 20th century
  • Professor at the London School of Economics
  • Repudiated the classical observationalist /
    inductivist approach to science as the only
    method
  • Advanced empirical falsification
  • Vigorous defense of liberal democracy and the
    principles of social criticism

23
The Scientific Method - from a Popperian
perspective
  • 1. Conception - Inductive reasoning
    • a. Observations
    • b. Theory
    • c. Problem
    • d. Belief
  • 2. Leads to Insight and a General Hypothesis
  • 3. Assessment is done by
    • a. Formulating specific hypotheses
    • b. Comparison with new observations
  • 4. Which leads to
    • a. Falsification - and rejection of
      insight, and specific and general
      hypotheses, or
    • b. Confirmation - and retesting of
      alternative hypotheses

[Diagram: Conception (inductive reasoning) draws on
a perceived problem, previous observations, belief,
and existing theory to produce INSIGHT and a general
hypothesis. Assessment (deductive reasoning) leads
to confirmation or falsification.]
24
Absolute vs. measured differences
Example - Specific hypothesis: the number of oak
seedlings is higher in areas outside oil-polluted
(impacted) sites than inside polluted sites.
What counts as a difference? Are these different?
25
Statistical analysis - cause, probability, and
effect
II) Statistical Methodology - General
A) Null hypotheses (Ho)
1) In most sciences, we are faced with
a) NOT whether something is true or false
(Popperian decision)
b) BUT rather the degree to which an effect
exists (if at all) - a statistical decision.
B) Therefore 2 competing statistical hypotheses
are posed
a) HA: there is a difference in effect between
(usually posed as )
b) Ho: there is no difference in effect between
26
Statistical analysis - cause, probability, and
effect
27
Statistical analysis - cause, probability, and
effect
The logic of statistical tests: how they are
performed
1) Assume the Null hypothesis (Ho) is true (e.g., no
difference in number of oak seedlings in impact
and non-impact sites).
2) Compare measurements - generally this means
comparing two sample distributions
(determined from the experiment or survey)
  • Distributions are generally compared by
    comparing means and the estimate of error
    associated with the sampling of the means. The
    simplest case is the Standard Error of the Mean
    (SE or SEM):
    SEM = sx / n^0.5, i.e., standard deviation (sx) /
    square root of the level of replication (n)
3) Determine the probability that distributions
are similar/different
4) Compare this probability (the p-value) with a
critical value to assign significance
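The four steps above can be sketched for the oak seedling example as follows. This is an illustrative sketch under stated assumptions: it uses an equal-variance (pooled) two-sample t statistic, the seedling counts are invented, the function names `sem` and `pooled_t` are hypothetical, and step 4 is carried out by comparing the statistic with a tabulated critical t value rather than by computing an exact p-value.

```python
from math import sqrt
from statistics import mean, stdev, variance


def sem(sample):
    """Standard error of the mean: standard deviation / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


def pooled_t(sample_1, sample_2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1)
                  + (n2 - 1) * variance(sample_2)) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / se_diff, df


# Hypothetical oak seedling counts per site, invented for illustration.
non_impact = [27, 31, 24, 29, 26, 30, 28, 25, 27, 23]
impact = [19, 22, 17, 24, 20, 18, 23, 21, 20, 16]

# Steps 1-3: assume Ho, then measure how far apart the means are
# relative to the sampling error of their difference.
t_stat, df = pooled_t(non_impact, impact)

# Step 4: two-tailed critical t for alpha = 0.05 and df = 18, taken from
# a standard t table. It applies only to this example's sample sizes.
T_CRITICAL = 2.101
reject_null = abs(t_stat) > T_CRITICAL

print(f"SEM non-impact = {sem(non_impact):.2f}, SEM impact = {sem(impact):.2f}")
print(f"t = {t_stat:.2f}, df = {df}, reject Ho: {reject_null}")
```

With other sample sizes the critical value changes, so in practice the p-value reported by a statistics package would be used instead of a hard-coded table entry.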



What is a distribution (around a mean)?
28
Calculation of statistical distribution
Distribution of Oak Seedlings - pre-impact or
non-polluted site
Sites: 100
Mean per site: 25
Total seedlings: 2500
29
Evaluate effect of sample size on calculation of
and confidence in Mean
Compare for sample sizes of 5, 10, 20, 50, 99
cells
Iterate 50 times
Example: sample 10 sites to determine the mean,
repeated for fifty iterations
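The resampling exercise described on this slide can be sketched as a small simulation. The population below is an assumption: it is constructed so that 100 sites hold 2500 seedlings in total (true mean 25), but the symmetric spread of values around 25 is invented for illustration, as are the function name `sample_means` and the random seeds.

```python
import random
from statistics import mean, stdev

# Hypothetical population: 100 sites, 2500 seedlings, true mean 25 per site.
# The symmetric spread around 25 is an assumption made for illustration.
population = []
for i in range(50):
    offset = i % 11
    population += [25 - offset, 25 + offset]


def sample_means(n_sites, iterations=50, rng=None):
    """Means of repeated random samples of n_sites drawn without replacement."""
    rng = rng or random.Random(0)
    return [mean(rng.sample(population, n_sites)) for _ in range(iterations)]


rng = random.Random(42)
spread = {}
for n in (5, 10, 20, 50, 99):
    means = sample_means(n, rng=rng)
    # The standard deviation of the 50 means shows how much the
    # estimate of the mean varies at this sample size.
    spread[n] = stdev(means)
    print(f"n={n:2d}  lowest={min(means):.1f}  highest={max(means):.1f}  "
          f"sd of means={spread[n]:.2f}")
```

As the sample size approaches the full 100 sites, the sample means cluster ever more tightly around the true mean of 25.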
30
Means: 21.5, 22.3, 23.0, 23.9, 24.9, 25.1, 25.8,
26.5, 27.8, 29.9, etc.
True Mean: 25
31
Effect of number of observations on estimate of
Mean
32
Statistical comparison of two sample distributions
Ho: X1 = X2
HA: X1 ≠ X2
[Figure: two sample distributions, labeled X2 and X1]
33
How to estimate optimal sample size
1) Do a preliminary study of variables that will
be evaluated in the project
2) Plot the mean and some estimate of variability
of the data as a function of sample size
3) Look for the "sweet spot" where estimates of
mean and variance (or standard deviation) converge
on a stable value
4) Calculate a "bang for buck" relationship to
determine if a robust design (sufficient
sampling) can be paid for
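The steps above can be sketched roughly as follows. This is only an illustration: the pilot counts, the cost per site, the tolerance, and the function names `running_estimates` and `smallest_stable_n` are all assumptions, and the stability check looks at the running mean only, whereas a fuller version would also track the variance.

```python
from statistics import mean, stdev


def running_estimates(pilot):
    """Mean and standard deviation of the first n pilot observations, n >= 2."""
    return [(n, mean(pilot[:n]), stdev(pilot[:n]))
            for n in range(2, len(pilot) + 1)]


def smallest_stable_n(pilot, tolerance):
    """First n after which the running mean stays within tolerance of the
    full-pilot mean. The tolerance is a judgment call, not a fixed rule."""
    final_mean = mean(pilot)
    estimates = running_estimates(pilot)
    for i, (n, _, _) in enumerate(estimates):
        if all(abs(m - final_mean) <= tolerance for _, m, _ in estimates[i:]):
            return n
    return len(pilot)


# Hypothetical pilot counts and per-site cost, invented for illustration.
pilot = [18, 24, 31, 29, 22, 26, 25, 24, 26, 25, 25, 25]
COST_PER_SITE = 150  # assumed currency units per replicate site

for n, m, sd in running_estimates(pilot):
    print(f"n={n:2d}  mean={m:.2f}  sd={sd:.2f}")

n_stable = smallest_stable_n(pilot, tolerance=0.4)
print(f"stable at n = {n_stable}, estimated cost = {n_stable * COST_PER_SITE}")
```

Plotting the printed mean and standard deviation against n gives the curve in which the sweet spot is sought; multiplying the chosen n by the cost per site gives the budget side of the trade-off.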
34
Trade off between accuracy and cost
[Figure: cost and accuracy plotted against sample
size (e.g., number of replicate sites)]
35
(No Transcript)