Stedinger, J.R., C. M. Crainiceanu, D. Ruppert, C.T. Behr, and R. McKay, Statistical Models of Crypt - PowerPoint PPT Presentation


1
Stedinger, J.R., C. M. Crainiceanu, D. Ruppert,
C.T. Behr, and R. McKay, Statistical Models of
Cryptosporidium Concentrations in Natural Waters,
seminar presented to the New York City Dept. of
Environmental Protection, Valhalla, NY, March 14,
2002.
  • The report describes statistical methods for the
    analysis of the Cryptosporidium concentrations
    in natural waters, using the ICR data set as an
    example. Zero counts are part of the sampling
    variation of count data and will be modeled as
    zero counts to allow correct inferences
    concerning environmental concentrations. The
    hierarchical structure viewed as an empirical
    Bayesian model allows prediction of the
    distribution of concentrations at different sites
    as a function of the site, season, and water
    matrix. This is what agencies need for risk
    characterization.
  • Obtaining these results required extending
    available regression models for discrete data
    with random effects and hierarchical
    structure. A Generalized Linear Mixed Model (GLMM)
    with a hierarchical structure that includes
    site, region, hydrologic, and watershed effects
    was developed. Markov Chain Monte Carlo
    (MCMC) simulation is used to compute the
    posterior distributions of the parameters. The
    Bayesian computations are carried out in WinBUGS,
    a flexible statistical software package.
  • These models were applied separately to stream
    and reservoir source waters. Results indicate
    that streams have a much higher average
    concentration, turbidity is a significant
    indicator of concentrations, and that seasonal
    variations are relevant in some cases. Models of
    variations in recovery rates are also considered
    and show a small dependence upon turbidity.

2
Statistical Models of Cryptosporidium
Concentrations in Natural Waters
  • Prof. Jery Stedinger (Civil & Env. Engin.)
  • Prof. David Ruppert (ORIE)
  • Ciprian Crainiceanu (Statistics)
  • Christopher Behr (Civil & Env. Engin.)
  • R. J. MacKay (ORIE)
  • Cornell University

3
Outline
  • Background Cryptosporidium parvum
  • Cryptosporidium Data as Counts
  • Model Formulation and Computation
  • Analyses of Cryptosporidium concentrations
  • Recovery Rate Analysis
  • Conclusions

4
Concern about Cryptosporidium
  • C. parvum causes mild to serious infections
  • Outbreaks: largest in Milwaukee (1993)
  • Potentially high endemic levels
  • 2,000-3,000 reported cases/yr between 1995-97
  • Unreported cases estimated between 0.2% and 2% of
    population for industrialized countries
  • ≈ 0.5 to 5 million cases/year in U.S.

5
ICR Datasets
6
Oocyst Recovery: IFA Method (Walker, 1995)
[Flow diagram. Labels: Raw Water (V), Fiber Filter, Stomacher,
Suspended Solution, Centrifuge, Separated Liquid, Top Layer,
Pellet of Solids, Fraction (F) of Pellet, Acetate Membrane,
Dye added, Slides.]
Oocysts counted DISCRETELY!
7
Issues in Cryptosporidium Data
  • Cryptosporidium testing methods
  • Immunofluorescence assay method (ICR)
  • Immunomagnetic separation (1623)
  • Difficult to detect
  • Low mean recovery rates
  • ICR ≈ 10%; 1623 ≈ 40%
  • High variability (CV = σ/μ)
  • ICR CV ≈ 100%; 1623 CV ≈ 50%

8
Issues in Cryptosporidium Data
  • Many zero counts (93% in ICR; 85% in ICRSS)
  • Many sites have only zero counts

9
Problems posed by Zeros
  • Data source: NJ, Delaware River, 1997,
    LeChevallier et al.
  • Volume analyzed was always 5 liters.
  • Without adjustment for recovery:
  • Average non-zero conc. = 0.48 oocysts/L
  • Total_counts/total_volume = 0.19 oocysts/L
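
The gap between the two averages on this slide comes entirely from how the zero counts are treated. A minimal Python sketch, using hypothetical counts (not the LeChevallier et al. data) chosen only so that both summary figures are reproduced:

```python
# Hypothetical counts for 25 samples of 5 L each (NOT the actual NJ data);
# chosen so the two summaries match the slide's 0.48 and 0.19 oocysts/L.
VOLUME_L = 5.0
counts = [0] * 15 + [1, 1, 1, 2, 2, 2, 3, 3, 4, 5]

nonzero = [c for c in counts if c > 0]

# Average concentration over samples with at least one oocyst (zeros ignored).
avg_nonzero_conc = sum(c / VOLUME_L for c in nonzero) / len(nonzero)

# Pooled estimate: total counts over total volume (zeros carry information).
pooled_conc = sum(counts) / (VOLUME_L * len(counts))

print(f"average non-zero conc.:      {avg_nonzero_conc:.2f} oocysts/L")
print(f"total counts / total volume: {pooled_conc:.2f} oocysts/L")
```

With a constant sample volume, the ratio of the two estimates equals the share of samples with a non-zero count, so discarding zeros inflates the estimate by the inverse of that share.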

10
NJ Data as Censored LN
  • Detection limit = 1/(effective volume)
  • Maximum likelihood estimation (MLE)
  • Recovery rate R = 10% (fixed)
  • Lognormal percentiles (oocysts/L):
        50th    90th    99th    Mean
        1.40    4.95    13.9    2.28
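
For a two-parameter lognormal, the median and mean together fix both parameters, so the other percentiles can be checked. The sketch below assumes that form and uses only the median and mean quoted on the slide. It is a consistency check, not the censored-data MLE itself.

```python
import math
from statistics import NormalDist

median, mean = 1.40, 2.28  # oocysts/L, from the slide

# For a lognormal: median = exp(mu), mean = exp(mu + sigma^2 / 2).
mu = math.log(median)
sigma = math.sqrt(2.0 * math.log(mean / median))

def percentile(p):
    """p-th quantile of the lognormal with the mu and sigma above."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

p90, p99 = percentile(0.90), percentile(0.99)
cv = math.sqrt(math.exp(sigma ** 2) - 1.0)  # coefficient of variation
print(f"90th: {p90:.2f}  99th: {p99:.1f}  CV: {cv:.1f}")
```

The computed 90th and 99th percentiles land close to the tabulated 4.95 and 13.9 oocysts/L.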

11
Data as Poisson Counts
  • E[Number of Counts] = R V C
  • R = recovery rate; C = concentration
  • V = effective volume (known)
  • Models:
  • Poisson counts; R = 10%; C = const
  • Poisson counts; R = 10%; C ~ Gamma
  • Poisson counts; R ~ Beta; C ~ Gamma
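
A short sketch of the first two models, assuming the NJ figures quoted earlier (5 L per sample, 0.19 oocysts/L unadjusted, R = 10%). The gamma coefficient of variation of 1.6 is an assumption, taken from the Gamma/P row of the NJ results table. The negative binomial formula for the zero probability is the standard result for a gamma-mixed Poisson.

```python
import math

R = 0.10        # fixed recovery rate (10%)
V = 5.0         # litres analysed per sample
pooled = 0.19   # total counts / total volume, oocysts/L (no recovery adjustment)

# Poisson model with constant C: E[Y] = R * V * C, so the MLE is pooled / R.
c_hat = pooled / R

# Probability of a zero count in one sample when C is constant.
p0_const = math.exp(-R * V * c_hat)

# If C ~ Gamma with the same mean and coefficient of variation cv, the count
# is negative binomial and P(Y = 0) = (1 + R*V*mean / k) ** (-k), k = 1 / cv^2.
cv = 1.6        # assumption: CV from the Gamma/P row for NJ
k = 1.0 / cv ** 2
p0_gamma = (1.0 + R * V * c_hat / k) ** (-k)

print(f"C_hat = {c_hat:.1f} oocysts/L")
print(f"P(zero), constant C: {p0_const:.2f}; gamma C: {p0_gamma:.2f}")
```

Letting the concentration vary raises the probability of a zero count at the same mean, which is one reason a mixed Poisson model suits zero-heavy data.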

12
Results of Model Fitting, NJ
    Distribution    50th   90th   99th   Mean   CV
    Lognormal       1.4    5.0    14     2.3    1.3
    Poisson         1.9    -      -      1.9    -
    Gamma/P         0.7    5.4    15     1.9    1.6
    G/B(0.6)/P      1.0    5.0    12     1.9    1.3
    G/B(1.2)/P      1.4    3.5    6      1.8    0.8

13
Model Fits: West Virginia
    Distribution    50th   90th   99th   Mean   CV
    Lognormal       0.1    17     930    187
    Poisson         0.2    -      -      0.2    0
    Gamma/P         0.05   15     85     5.6    3.1
    G/B(0.6)/P      1.0    4.0    8.8    1.6    1.1
    G/B(1.2)/P      1.4    2.7    4.1    1.6    0.5

14
West Virginia Data
15
Outline
  • Background Cryptosporidium parvum
  • Cryptosporidium Data as Counts
  • Model Formulation and Computation
  • Analyses of Cryptosporidium concentrations
  • Recovery Rate Analysis
  • Conclusions

16
National ICR Crypto Data Set
17
Modeling Implications
  • Linear regression is inappropriate
  • Limited information in zero counts
  • Counts not normally distributed
  • Information at most sites insufficient to
    estimate mean concentration for site
  • Need to combine information from different sites
  • Sites can be viewed as residing in regions
  • Together: Generalized Linear Mixed Model

Use a Poisson Count Model
Use Hierarchical model w/ regional & site effects
18
Bayesian Pathogen Concentration Model
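Model Elements
  • $Y_{ij}$ = pathogen counts
  • $C_{ij}$ = pathogen conc.
  • $V_{ij}$ = volume of water
  • $R_{ij}$ = recovery rate
  • $X_{ij}$ = predictor matrix
  • Random effects: $\tau_{ij}$ = time-site effects, $s_j$ = site
    effects, $r_{k(j)}$ = regional effects, $k(j)$ = region for site $j$
Hierarchical Model
  • $Y_{ij} \sim \mathrm{Poisson}(V_{ij} C_{ij} R_{ij})$
  • $\log C_{ij} = X_{ij}^T \beta + \tau_{ij}$
  • $R_{ij} \sim \mathrm{Beta}(a, b)$
  • where $\tau_{ij} \sim N(s_j, \sigma_\tau^2)$,
    $s_j \sim N(r_{k(j)}, \sigma_s^2)$,
    $r_{k(j)} \sim N(\mu, \sigma_r^2)$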
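
To make the nesting concrete, the sketch below simulates counts from a model of this shape: regional effects, then site effects, then time-site effects on the log-concentration scale, a Beta recovery rate, and a Poisson count. Every numeric value, the single covariate, and the function names are hypothetical, chosen for illustration and not taken from the ICR analysis.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate(n_regions=3, sites_per_region=4, n_obs=6, seed=1):
    rng = random.Random(seed)
    # All numeric values below are hypothetical, for illustration only.
    mu, sd_r, sd_s, sd_t = -1.0, 0.5, 0.7, 1.0   # log-concentration scale
    beta, a, b, volume = 0.4, 2.0, 14.0, 5.0     # slope, Beta(a, b), litres
    rows = []
    for k in range(n_regions):
        r_k = rng.gauss(mu, sd_r)                # regional effect
        for j in range(sites_per_region):
            s_j = rng.gauss(r_k, sd_s)           # site effect within region
            for _ in range(n_obs):
                x = rng.gauss(0.0, 1.0)          # one covariate, e.g. log-turbidity
                tau = rng.gauss(s_j, sd_t)       # time-site effect
                conc = math.exp(beta * x + tau)  # log C = x * beta + tau
                rec = rng.betavariate(a, b)      # recovery rate in (0, 1)
                rows.append((k, j, poisson(rng, volume * conc * rec)))
    return rows

rows = simulate()
zero_share = sum(1 for _, _, y in rows if y == 0) / len(rows)
print(f"{len(rows)} simulated counts, {zero_share:.0%} zeros")
```

Even with moderate concentrations, low recovery and small volumes produce many zero counts, which is the feature the hierarchical model is built to handle.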
19
Bayesian Statistical Approach
  • Frequentist Approach
  • Maximizing likelihood function $f(y \mid \theta)$ given data
    $y$ yields point estimates of $\theta$ (MLEs)
  • Bayesian Approach
  • Provide prior distribution $\pi(\theta)$
  • Determine posterior distribution $p(\theta \mid y)$
  • where $p(\theta \mid y) \propto f(y \mid \theta)\,\pi(\theta)$
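
The proportionality between posterior, likelihood, and prior can be illustrated on a one-parameter case. The sketch below assumes a single constant concentration with Poisson counts and a Gamma prior. The data totals and prior values are hypothetical, and the grid approximation is used only because it mirrors the formula directly.

```python
import math

total_count = 24        # hypothetical: oocysts counted over all samples
exposure = 0.10 * 125   # hypothetical: recovery rate x total litres analysed
a, b = 1.0, 0.5         # hypothetical Gamma(shape a, rate b) prior

def log_posterior(c):
    """Unnormalised log p(c | y) = log f(y | c) + log pi(c)."""
    log_lik = total_count * math.log(exposure * c) - exposure * c
    log_prior = (a - 1.0) * math.log(c) - b * c
    return log_lik + log_prior

grid = [0.001 * i for i in range(1, 10001)]      # c from 0.001 to 10 oocysts/L
logp = [log_posterior(c) for c in grid]
peak = max(logp)
weights = [math.exp(lp - peak) for lp in logp]   # rescale to avoid underflow
grid_mean = sum(c * w for c, w in zip(grid, weights)) / sum(weights)

# Conjugate check: the posterior is Gamma(a + total_count, rate b + exposure).
exact_mean = (a + total_count) / (b + exposure)
print(f"grid mean {grid_mean:.3f}, exact mean {exact_mean:.3f}")
```

A grid works here because there is one parameter. It does not scale to a model with a random effect per observation, which is why simulation methods are needed.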

20
Bayesian Computation
  • Want posterior distribution of parameters $\beta$, $\sigma$, $\mu$
  • Basic model has a Poisson residual, a recovery rate,
    and a random effect for each observation; in addition,
    site means are linked, as are the ten regional
    effects.
  • These must be integrated out to compute the
    likelihood.
  • With 100 sites and 18 observations per site,
    such integration is analytically intractable
  • Use Markov Chain Monte Carlo simulation

21
Outline
  • Background Cryptosporidium parvum
  • Cryptosporidium Data as Counts
  • Model Formulation and Computation
  • Analyses of Cryptosporidium concentrations
  • WQ Prediction & Health Risk Analysis
  • Recovery Rate Analysis
  • Conclusions

22
Model Covariates
  • Time-site Covariates
  • Log-turbidity, Carbonate hardness,
  • Total organic carbon,
  • Hydrologic Variables (stream sites only)
  • Seasonal Effects
  • Spline Function
  • Temperature Anomaly
  • Site-Specific Covariates
  • Urban land area, sediment export potential, pop.
  • Log-Avg Residence Time (res./lake sites only)

23
Modeling Objectives
  • Water Quality Prediction (WQP)
  • Focus: covariates that vary over time & place
  • Includes all relevant covariates
  • Model for Health Risk Analysis (HRA)
  • Focus: covariates known over time at given place
  • Includes site characteristics and time function

24
HRA for Reservoir/Lake Sites
  • Non-significant parameters shown in italics.
  • Other parameters in full model: temp. anomaly,
    log-population, log-urban land area, soil
    permeability, sediment export, log residence time,
    seasonal spline coefficients.

25
WQP for Reservoir/Lake Sites
  • Non-significant parameters shown in italics.
  • Other parameters in full model: temp. anomaly,
    log-population, log-urban land area, soil
    permeability, sediment export, log residence time,
    seasonal spline coefficients.

26
WQP for Stream Sites
  • Non-significant parameters shown in italics.
  • Other parameters in full model: total organic
    carbon, temp. anomaly, log-population, soil
    permeability, sediment export, hydrologic
    variables.

27
Summary of Results
28
Outline
  • Background Cryptosporidium parvum
  • Cryptosporidium Data as Counts
  • Model Formulation
  • Model Computation and Parameterization
  • Analyses of Cryptosporidium concentrations
  • Recovery Rate Analysis
  • Conclusions

29
Recovery rates
  • For a given laboratory, the recovery rate R equals the
    probability that a lab technician observes and
    counts a pathogen originally in the sample.
  • Because recovery rates for Crypto and Giardia are
    small and highly variable, ignoring recovery
    rates would underestimate concentrations and
    exaggerate variability.

30
EPA ICR-Spiking Study
31
Recovery rate model
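  • $N_{ij} \sim \mathrm{Gamma}(a, b)$
  • $Z_{ij} \sim \mathrm{Poisson}(\lambda_{ij})$
  • $\lambda_{ij} = V_{ij} R_{ij} N_{ij} / VT_{ij}$
  • $\log[R_{ij}/(1 - R_{ij})] = X_{ij}^T \beta + \tau_{ij}$
  • $\tau_{ij} \sim N(\mathrm{lab}_j, \sigma_\tau^2)$
  • $\mathrm{lab}_j \sim N(\mu, \sigma_{\mathrm{lab}}^2)$
  • $N_{ij}$ = number of pathogens spiked
  • $VT_{ij}$ = total vol.
  • $V_{ij}$ = effective vol. analyzed
  • $R_{ij}$ = recovery rate
  • $Z_{ij}$ = pathogens counted
  • $X_{ij}$ = covariates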
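
The recovery model differs from the concentration model mainly in its logit link, which keeps each recovery rate between 0 and 1. A minimal simulation sketch follows. All parameter values, the single covariate, and the shape/scale reading of the Gamma for the spiked dose are assumptions made for illustration, not estimates from the ICR spiking study.

```python
import math
import random

def inv_logit(z):
    """Map a real number to (0, 1); inverse of log(r / (1 - r))."""
    return 1.0 / (1.0 + math.exp(-z))

def poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_spikes(n_labs=4, n_per_lab=5, seed=2):
    rng = random.Random(seed)
    # Hypothetical values throughout.
    mu, sd_lab, sd_t, beta = -2.0, 0.3, 0.8, -0.2
    v_total, v_eff = 100.0, 50.0                     # litres
    out = []
    for j in range(n_labs):
        lab_j = rng.gauss(mu, sd_lab)                # laboratory effect
        for _ in range(n_per_lab):
            n_spiked = rng.gammavariate(25.0, 20.0)  # shape 25, scale 20: mean 500
            x = rng.gauss(0.0, 1.0)                  # covariate, e.g. log-turbidity
            tau = rng.gauss(lab_j, sd_t)             # sample-level effect
            rec = inv_logit(beta * x + tau)          # logit(R) = x * beta + tau
            lam = v_eff * rec * n_spiked / v_total   # expected count
            out.append((j, rec, poisson(rng, lam)))
    return out

spikes = simulate_spikes()
mean_rec = sum(rec for _, rec, _ in spikes) / len(spikes)
print(f"{len(spikes)} spiked samples, mean simulated recovery {mean_rec:.2f}")
```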

32
Posterior means for parameters of interest ICR
spiking data
33
Recovery rates conclusions
  • Recovery rates are small & highly variable
  • Turbidity is statistically significant for
    Cryptosporidium but not for Giardia
  • Laboratory effects are appreciable for Giardia
    but not for Cryptosporidium

34
Conclusions
  • Cryptosporidium and Giardia data are discrete
    counts, with many zeros.
  • Recovery rates for Cryptosporidium and Giardia
    are small & highly variable.
  • Turbidity is statistically significant for
    Crypto; lab effect is significant for Giardia.
  • A Bayesian analysis of a hierarchical
    Generalized Linear Mixed Model (GLMM) is able to
    evaluate the 350-site ICR data

35

Catskill Lower Effluent Chamber (CATLEFF)

    Date         Total Giardia    Total Crypto
                 (cysts/50L)      (oocysts/50L)
    9-Mar-02     0                0
    6-Mar-02     0                0
    4-Mar-02     5                0
    3-Mar-02     0                0
    2-Mar-02     0                0
    1-Mar-02     1                1
    28-Feb-02    2                0
    27-Feb-02    2                0
    26-Feb-02    5                0
    25-Feb-02    5                1
    24-Feb-02    1                1
    19-Feb-02    6                2
    13-Feb-02    0                0
    11-Feb-02    4                1
    6-Feb-02     4                1
    4-Feb-02     2                2
    28-Jan-02    1                0
    22-Jan-02    1                0
    14-Jan-02    1                0
36
Thoughts about NYC
  • Long records for 3 key sites
  • Used 2 analysis methods with different recovery
    rate characteristics (ICR, 1623)
  • Interested in
  • quality prediction
  • long-term health risk assessment

37
Thoughts about NYC
  • Bayesian analysis can integrate
  • count data,
  • observed WQ characteristics,
  • recovery rate distributions (ICR, 1623),
  • and
  • natural variability and
  • persistence
  • of environmental concentrations of pathogens

39
References
Crainiceanu, C. M., D. Ruppert, J.R. Stedinger,
and C.T. Behr, Improving MCMC Mixing for a GLMM
Describing Pathogen Concentrations in Water
Supplies, in Case Studies in Bayesian Analysis,
Springer-Verlag Series, New York, 2002.

Crainiceanu, C. M., D. Ruppert, and J.R.
Stedinger, Bayesian recovery rates modelling for
waterborne pathogens, technical report, Cornell
University, 2002.

Behr, C.T., Modeling Cryptosporidium
Concentrations: A Bayesian GLMM of Regional
Discrete Count Data, MS Thesis, Cornell Univ.,
August, 2001.

Stedinger, J. R., and R. J. MacKay,
Interpretation of Cryptosporidium and Giardia
Monitoring Data Generated by ICR Program, invited
presentation, EPA Workshop on Statistics,
Washington DC, Nov. 19, 1998.

We also published: Walker, F.R., Jr., and Jery R.
Stedinger, Fate and Transport model of
Cryptosporidium, J. of Environmental Engineering,
125(4), 325-333, 1999.
40
Generalized Linear Mixed Model
  • Hierarchical Poisson-lognormal structure
  • Log link function
  • Random effects captured by τ (time-site) and s (site)

41
Gibbs Sampling Method
  • Gibbs Sampling is one type of
  • Markov Chain Monte Carlo method
  • General idea: used to obtain $p(\theta \mid y)$
  • Start with initial values of $\theta$
  • Iteratively sample values from $p(\theta \mid y)$
  • Many iterations yield $p(\theta \mid y)$ empirically

42
Gibbs Sampling Algorithm
  • Each iteration i, Gibbs Sampler (GS) generates
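  • $\theta_1^{(i)} \sim p(\theta_1 \mid \theta_2^{(i-1)}, \ldots, \theta_d^{(i-1)}, y)$
  • $\theta_d^{(i)} \sim p(\theta_d \mid \theta_1^{(i)}, \ldots, \theta_{d-1}^{(i)}, y)$
  • Since $(\theta_1^{(i)}, \ldots, \theta_d^{(i)}) \sim p(\theta \mid y)$ as $i \to \infty$,
  • after $T$ iterations, the posterior mean
  • $\bar{\theta}_j \approx m_j = \frac{1}{T} \sum_{i=1}^{T} \theta_j^{(i)}$, with $E(m_j) = \bar{\theta}_j$
  • Obtain $p(\theta \mid y)$ and marginal $p(\theta_i \mid y, \theta_j)$ for $i \neq j$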
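
The alternating draws can be sketched on a small conjugate model where both full conditionals are Gamma distributions. This is a toy stand-in, not the authors' GLMM or their WinBUGS code: counts are Poisson with sample-level concentrations $c_i \sim \mathrm{Gamma}(a, \text{rate } b)$, the shape $a$ is assumed known, and the data and prior values are hypothetical.

```python
import random

# Hypothetical counts and exposure (recovery rate x litres) per sample.
counts = [0] * 15 + [1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
exposure = 0.10 * 5.0
a = 0.5            # assumed known gamma shape for the concentrations
g, h = 1.0, 1.0    # hypothetical Gamma(shape g, rate h) prior on the rate b

def gibbs(n_iter=3000, burn_in=500, seed=3):
    rng = random.Random(seed)
    b = 1.0                                    # initial value
    mean_conc_draws = []
    for it in range(n_iter):
        # c_i | b, y ~ Gamma(shape a + y_i, rate b + exposure);
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        conc = [rng.gammavariate(a + y, 1.0 / (b + exposure)) for y in counts]
        # b | c ~ Gamma(shape g + n * a, rate h + sum(c))
        b = rng.gammavariate(g + len(counts) * a, 1.0 / (h + sum(conc)))
        if it >= burn_in:                      # discard early, unconverged draws
            mean_conc_draws.append(sum(conc) / len(conc))
    return mean_conc_draws

draws = gibbs()
posterior_mean = sum(draws) / len(draws)
print(f"posterior mean of average concentration: {posterior_mean:.2f} oocysts/L")
```

Averaging the retained draws is the estimate $m_j$ of the posterior mean described on the slide. The burn-in length and iteration count here are arbitrary choices.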