F. Jay Breidt, - PowerPoint PPT Presentation

1
Nonparametric Survey Regression Estimation Using
Penalized Splines
  • F. Jay Breidt,
  • Colorado State University
  • Jean D. Opsomer
  • Iowa State University
  • (more folks acknowledged soon)
  • Research supported by EPA STAR Grants
  • R-82909501 (CSU) and R-82909601 (OSU)

2
The Usual Disclaimer
  • The work reported here was developed under STAR
    Research Assistance Agreements CR-829095 and
    CR-829096 awarded by the U.S. Environmental
    Protection Agency (EPA) to Colorado State
    University and Oregon State University. This
    presentation has not been formally reviewed by
    EPA. The views expressed here are solely those of
    the authors. EPA does not endorse any products or
    commercial services mentioned in this report.

3
Outline
  • Background
  • Scales of inference
  • Specific versus generic
  • Model-assisted and model-based inference
  • Penalized splines
  • Comparison to other smoothers; two-stage; small
    area
  • Variations: network data, increment data
  • Other
  • Non-Gaussian time series
  • Summary
  • Status of STARMAP.2 and DAMARS.5

4
Scales of Inference in Surveys
  • Large area
  • sample itself suffices for inference
  • no model needed
  • Medium area
  • use auxiliary information through a model
  • model helps inference but is not critical
  • Small area
  • sample size is small or zero
  • inference must be based on a model

5
Specific and Generic Inference
  • Specific: one study variable, few population
    parameters
  • lots of modeling resources to specify, estimate,
    and diagnose a model
  • willingness to defend the model
  • Generic: many study variables, many population
    parameters
  • no resources to model every variable
  • no single model is adequate/defensible

6
Generic Inferences in Aquatic Resources
  • Generic inference is a common problem for
    federal, state, and tribal agencies
  • Example: conduct a survey and prepare a report
  • analyze large numbers of chemical, biological,
    and physical variables
  • estimate means, quantiles, and distribution
    functions
  • break down both by political classifications and
    by various ecological classifications

7
Model-Assisted Survey Inference
  • Scarce modeling resources for generic inference,
    so we don't trust models
  • Can we use a model without depending on the
    model?
  • Model-assisted inference
  • efficiency gains if model is right
  • sensible inference even if model is wrong

8
Model-Assisted Estimators
  • Form of model-assisted estimator
  • (model-based prediction) + (design bias adjustment)
  • model incorporates auxiliary information
  • bias adjustment corrects for bad models
  • Classical parametric model-assisted
  • prediction from linear regression model
  • Our idea: nonparametric model-assisted
  • prediction from kernel regression or other
    smoother (JB & JO (2000), Annals of Statistics)
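
The estimator form on this slide, a model-based prediction plus a design bias adjustment, can be sketched in Python as a generalized difference estimator of a population total. This is only an illustration under stated assumptions: the function name, the toy data, and the working model m(x) = 2x are hypothetical and do not come from the slides.

```python
def model_assisted_total(x_pop, x_s, y_s, pi_s, predict):
    """Model-assisted estimate of a population total.

    x_pop    : auxiliary variable for every population element
    x_s, y_s : auxiliary and study variable for the sampled elements
    pi_s     : inclusion probabilities of the sampled elements
    predict  : any fitted regression function m(x) (hypothetical)
    """
    # Model-based prediction: sum of fitted values over the population.
    prediction = sum(predict(x) for x in x_pop)
    # Design bias adjustment: inverse-probability-weighted sample residuals.
    adjustment = sum((y - predict(x)) / p for x, y, p in zip(x_s, y_s, pi_s))
    return prediction + adjustment


# Toy example: population of 10, sample of 3 with equal probabilities 3/10.
x_pop = [float(i) for i in range(1, 11)]
x_s, y_s = [2.0, 5.0, 9.0], [4.5, 10.0, 19.0]
pi_s = [0.3, 0.3, 0.3]

# A working model m(x) = 2x; it does not need to be correct.
est = model_assisted_total(x_pop, x_s, y_s, pi_s, lambda x: 2.0 * x)

# With the trivial model m(x) = 0 the estimator reduces to Horvitz-Thompson.
ht = model_assisted_total(x_pop, x_s, y_s, pi_s, lambda x: 0.0)
```

Because the residual term is weighted by 1/pi, a poor choice of `predict` costs efficiency but does not remove the design-based correction.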

9
Why Nonparametric?
  • More flexible model specification
  • smooth mean function, positive variance function
  • Approximately correct more often
  • more opportunities for efficiency gains from
    auxiliary information
  • often, not a large efficiency loss if parametric
    specification is correct

10
Goals of Our Research
  • Focus on generic inference
  • Use flexible nonparametric models to reduce
    misspecification bias
  • model-assisted: medium area problem
  • model-based: small area problem
  • Make the methods operationally feasible for state
    and tribal agencies
  • linear smoothers generate generic weights

11
Penalized Splines
  • Very useful class of linear smoothers
  • Readily fits into standard linear mixed model
    framework
  • Modular, extensible, computationally convenient
  • Automated smoothing parameter selection and
    fitting with standard software
  • Several ongoing projects
  • Model-assisted p-spline estimation (Gerda
    Claeskens, JO, JB); two-stage extensions (Mark
    Delorey)
  • Small area p-spline estimation (Gerda, Giovanna
    Ranalli, Goran Kauermann, JO, JB)
  • Smoothing on networks (Giovanna, JB)
  • Semiparametric mixed models for
    increment-averaged core data (Nan-Jung Hsu, Steve
    Ogle, JB)

12
Penalized Splines
  • Truncated linear basis allows slope changes at
    each of many knots
  • Penalize for unnecessary slope changes

13
P-Splines: Influence of Penalty
  • Fits with increasing penalty parameter

14
Penalized Splines: Computation
  • Computation using S-Plus
  • Set up design matrix: truncated linear splines
  • Z <- outer(x, knots, "-")
  • Z <- Z * (Z > 0)
  • C <- cbind(one, x, Z)
  • Solve for spline with fixed degrees of freedom
  • D <- diag(c(rep(0, 2), rep(1, K)))
  • mhat <- X %*% solve(t(C) %*% diag(1/pi) %*% C +
    lambda^2 * D) %*% t(C) %*% diag(1/pi) %*% y
  • For data-determined df/roughness penalty, can use
    lme() to select via REML
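
For readers without S-Plus, the fixed-penalty computation on this slide can be sketched in plain Python. This is a minimal sketch, assuming a truncated linear basis, weights 1/pi, and a fixed penalty lambda. The helper names `solve` and `pspline_fit` and the toy data are hypothetical, and a hand-written elimination routine stands in for the S-Plus `solve()`. Fitted values are returned at the sample points only.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def pspline_fit(x, y, pi, knots, lam):
    """Weighted penalized least squares with a truncated linear basis."""
    # Design matrix rows: intercept, slope, then (x - knot)_+ for each knot.
    C = [[1.0, xi] + [max(xi - k, 0.0) for k in knots] for xi in x]
    p = 2 + len(knots)
    w = [1.0 / p_i for p_i in pi]
    # Normal equations t(C) W C + lam^2 D, with D = diag(0, 0, 1, ..., 1).
    A = [[sum(w[i] * C[i][a] * C[i][b] for i in range(len(x)))
          for b in range(p)] for a in range(p)]
    for j in range(2, p):
        A[j][j] += lam ** 2  # only the knot coefficients are penalized
    rhs = [sum(w[i] * C[i][a] * y[i] for i in range(len(x))) for a in range(p)]
    beta = solve(A, rhs)
    fitted = [sum(c * b for c, b in zip(row, beta)) for row in C]
    return beta, fitted


# Toy data: a single slope change at x = 0.5, equal inclusion probabilities.
x = [i / 20 for i in range(21)]
y = [abs(xi - 0.5) for xi in x]
pi = [0.5] * len(x)
knots = [0.25, 0.5, 0.75]

beta0, fit0 = pspline_fit(x, y, pi, knots, lam=0.0)        # no penalty
beta_big, fit_big = pspline_fit(x, y, pi, knots, lam=1e6)  # heavy penalty
```

With no penalty the fit follows the kink in the toy data. With a very large penalty the knot coefficients shrink toward zero and the fit approaches a straight line, which is the behaviour the "Influence of Penalty" slide describes.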

15
Model-Assisted P-Spline Estimator
  • Model-based prediction + design bias adjustment
  • Asymptotically design-unbiased and design
    consistent
  • Asymptotic variance given by

16
Design of Simulation Study
  • Model-assisted estimators
  • Polynomial regression
  • Poststratification (piecewise constant)
  • Local polynomial regression (kernel)
  • Penalized spline
  • Model-based estimator
  • Penalized spline
  • All use common degrees of freedom: 3 or 6
  • Eight response variables on one population
  • Two noise levels
  • N = 1000
  • Designs: SI or STSI
  • 1000 replicate samples of size n = 50

17
Estimator Comparisons: Common Degrees of Freedom
18
MSE Ratio Relative to Model-Assisted Penalized
Splines
19
Further Results from Simulation
  • Variance estimation
  • For all estimators, variance estimator has
    negative bias
  • Weighted residual variance estimator performs
    better
  • Confidence interval coverage
  • Somewhat less than nominal for all estimators
    (90-92%)
  • Undercoverage not as severe as bias would suggest
  • Negative weights: (2 df) x (2 designs) x (1000
    reps) x (50 weights) = 200,000 weights
  • 902 negative REG weights
  • 145 negative LLR weights
  • 2 negative MA weights

20
Two-Stage P-Spline Estimation
  • Available auxiliary information in two-stage
    sampling
  • All clusters
  • All elements
  • All elements in sampled clusters
  • Mark Delorey (poster): focus on first case
  • Simulation study comparing Horvitz-Thompson,
    regression, model-based p-spline, model-assisted
    p-spline with and without cluster random effects
  • Operational issues with df, cluster variance
    component
  • Some results: p-spline is good!

21
Semiparametric Small Area Estimation
  • Gerda, Giovanna, Goran Kauermann, JO, JB
  • Example: ANC level for Northeastern lakes
  • 557 observations over 113 HUCs
  • Average sample size/HUC: 4.9
  • 64 HUCs contain fewer than 5 observations
  • Site-specific covariates: lake location and
    elevation
  • Simple way to capture spatial effects?

22
Semiparametric Small Area Model
  • Replace linear function of covariates by more
    general model
  • direct estimator = truth + sampling error
  • truth = semiparametric regression + area-specific
    deviation
  • Semiparametric regression expressed as linear
    mixed model
  • Thin plate splines
  • Low-rank radial basis functions

23
Small Area Estimation Results
  • EBLUP for this model easily handled with
    standard software (SAS proc mixed, SPlus lme())

24
P-Splines for Increment Data
  • Common for soil, sediment core data
  • Datum represents not a single depth point but a
    depth increment (e.g., cylinder of soil 2.5cm in
    diameter x 15cm high, collected at 20-35 cm)
  • Ignoring increment structure leads to biased,
    inconsistent estimators
  • Integrate linear mixed model representation
  • Definite integral of truncated linear basis
    (x-κ)_+ becomes differenced quadratic basis
  • (top-κ)_+^2 - (bottom-κ)_+^2
  • Immediate extension to small area estimation
  • E.g., soil mapping by map unit symbol
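
The step from a point basis to an increment basis can be checked numerically. The Python sketch below integrates the truncated linear basis function over a depth increment and compares the result with the closed-form difference of squares. It assumes depth increases downward, so top < bottom. The slide's expression agrees with this closed form up to the factor 1/2 and the sign convention for depth. The function names, the knot at 25 cm, and the midpoint-rule check are illustrative assumptions.

```python
def trunc_linear(x, knot):
    """Truncated linear basis function (x - knot)_+."""
    return max(x - knot, 0.0)


def increment_basis(top, bottom, knot):
    """Closed-form integral of (x - knot)_+ over the increment [top, bottom]."""
    return 0.5 * (trunc_linear(bottom, knot) ** 2 - trunc_linear(top, knot) ** 2)


def midpoint_integral(f, a, b, n=20000):
    """Numerical check: midpoint rule with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Soil core collected at 20-35 cm depth; knot placed at 25 cm for illustration.
top, bottom, knot = 20.0, 35.0, 25.0
exact = increment_basis(top, bottom, knot)
approx = midpoint_integral(lambda x: trunc_linear(x, knot), top, bottom)

# Dividing by the increment length gives the increment-averaged basis value.
increment_average = exact / (bottom - top)
```

Replacing each point-level basis column by its increment integral (or average) keeps the model linear in the coefficients, so the same mixed-model fitting machinery still applies.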

25
Carbon Sequestration
  • (Nan-Jung Hsu, Steve Ogle, JB) Broad class of
    semiparametric mixed models for
    increment-averaged data

26
Smoothing on Networks
  • Current research with post-doc, Giovanna Ranalli
  • have noisy data on stream network
  • have within-network distance measure (rather
    than as the crow flies)
  • want interpolations at unsampled locations in
    network
  • Semiparametric methodology readily extends to
    this setting
  • low-rank radial basis functions
  • Possible real data from EPA (John Faustini)
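
The difference between within-network distance and distance as the crow flies can be illustrated with a small Python sketch. It assumes a toy network of three straight segments meeting at one confluence. The segment names, angles, and site positions are hypothetical, and the distance rule used here holds only for this single-confluence layout.

```python
import math

# Toy network: three segments meeting at one confluence at the origin.
# Each segment is a unit direction vector pointing away from the confluence.
SEGMENTS = {
    "trib_a": (math.sin(0.1), math.cos(0.1)),
    "trib_b": (-math.sin(0.1), math.cos(0.1)),
    "main": (0.0, -1.0),
}


def coords(segment, dist):
    """Planar coordinates of the point `dist` along `segment`."""
    ux, uy = SEGMENTS[segment]
    return (ux * dist, uy * dist)


def euclidean_distance(p, q):
    """Distance as the crow flies between two (segment, dist) locations."""
    (x1, y1), (x2, y2) = coords(*p), coords(*q)
    return math.hypot(x1 - x2, y1 - y2)


def network_distance(p, q):
    """Distance travelled along the network (single-confluence case only)."""
    (seg_p, d_p), (seg_q, d_q) = p, q
    if seg_p == seg_q:
        return abs(d_p - d_q)
    return d_p + d_q  # the path must pass through the confluence


# Two sites one unit upstream on neighbouring tributaries.
site_1 = ("trib_a", 1.0)
site_2 = ("trib_b", 1.0)
crow = euclidean_distance(site_1, site_2)
stream = network_distance(site_1, site_2)
```

The two sites are close in the plane but far apart along the water, so a radial basis built on `network_distance` treats them as weakly related while a purely spatial smoother would not.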

27
Smoothing on Stream Networks
  • Toy stream network
  • Two first-order, one second-order stream segment
  • Regression function is exponential along
    straight reach (two segments), constant along
    remaining segment, continuous at intersection
  • n = 150 noisy observations obtained along network

28
Toy Network Results
  • Noisy observations smoothed via
  • Low-rank thin plate spline (2D, ignoring network
    structure)
  • Within-network radial basis functions (1D,
    accounts for network structure)
  • Network smooth offers 25-30% reduction in MISE
    over spatial smooth

29
Non-Gaussian Time Series
  • Potential models for one-dimensional spatial
    processes

30
Identification and Estimation
  • In Gaussian case, models of differing
    causality/invertibility cannot be identified
  • Identification in non-Gaussian case
  • Fit causal/invertible ARMA via Gaussian quasi-MLE
  • Examine residuals for IID-ness
  • If not IID, fit All-Pass model (LAD: Breidt,
    Davis, Trindade, Ann. Stat. (2001); MLE; rank
    estimation) to determine order of non-causality
    or non-invertibility
  • Prediction and Estimation in non-Gaussian case
  • Best MS prediction requires trickery
  • Exact MLE, Bayes for non-Gaussian MA
  • Exact and conditional MLE for MA with roots near
    unit circle [Rosenblatt, Davis, Breidt, Hsu]

31
Asymptotic Results for All-Pass
32
Where Are We Now?
  • DAMARS.5: Nonparametric model-assisted
  • 1. Extensions
  • 1.1 continuous spatial domains (Siobhan, poster;
    Giovanna, work in progress)
  • 1.2 multiple phases (Kim (PhD 2004, ISU), working
    paper)
  • 1.3 multiple auxiliary variables (gam: Gretchen,
    Goran, JO, JB, JASA 2nd submission)
  • 1.3-1.4 alternative smoothing (Gerda, JO, JB,
    p-splines, Biometrika 2nd submission; Ranalli and
    Montanari, neural nets, JASA 2nd submission)
  • Other: two-stage kernels (Kim, JO, JB, JRSS
    submission); two-stage splines (Mark, JB, poster)
  • 2. Applications
  • 2.1 CDF estimation (Alicia, JO, JB, poster, CJS
    submission)
  • 2.2 Medium area (Siobhan, JO, JB, poster)
  • 2.3 Surveys over time (Jehad Al-Jararha, JO, JB,
    spam with partial overlap)
  • 2.4 Nonresponse (da Silva and Opsomer, Survey
    Methodology 2004)

33
Where Are We Now?
  • STARMAP.2: Local Inferences
  • 1. Small area
  • 1.1-1.4 Nonparametric model-assisted for spatial
    (Siobhan, poster; Giovanna, work in progress);
    Semiparametric (Gerda, Giovanna, Goran, JO, JB,
    working paper); Increments (Nan-Jung, Steve, JB,
    working paper)
  • 1.1 MLE for all-pass (Beth, RD, JB, JMVA
    submission); rank for all-pass (Beth, RD, JB,
    working paper); Prediction for MA (Breidt and
    Hsu, Stat Sinica 2004); Exact MLE for MA
    (Nan-Jung, RD, JB)
  • Spatial trend detection (Hsin-Cheng Huang)
  • Design aspects (Bill, JB, poster)
  • 2. Deconvolution
  • Formulated as another small area estimation
    problem using constrained Bayes methods (Mark,
    JB, poster)
  • Methodology seems OK; example (88 HUCs in MAHA)
    still being tweaked; work in progress
  • 3. Causal inference
  • 3.1-3.3 (Alix G)

34
Some Summaries (these projects only)
  • Some Invited Talks and Seminars
  • Winemiller Symposium (Columbia, MO)
  • Computational Environmetrics (Chicago, IL)
  • Monitoring Symposium (Denver, CO)
  • ICSA (Singapore)
  • EMAP 2004 (Newport, RI)
  • ENAR (Pittsburgh PA)
  • IWAP (Piraeus, Greece)
  • IMS-ASA (Calcutta, India)
  • Western Ecology Division, EPA (Corvallis, OR)
  • University of Maryland (Baltimore County, MD)
  • Jean's talks

35
More Summaries (these projects only)
  • People
  • Students: Ji-Yeon Kim, ISU PhD completed Spring
    2004 (JO and JB); Bill Coar, Mark Delorey, Jehad
    Al-Jararha, CSU PhD work in progress; ISU
    student?
  • Post-Doctoral Research Associate: Giovanna
    Ranalli
  • Visiting Research Scientists: Nan-Jung Hsu and
    Hsin-Cheng Huang
  • Unsuspecting Collaborators: Gerda Claeskens and
    Goran Kauermann
  • Papers
  • 2 appeared, 2 tentatively accepted, 1 invited
    revision, 4 submitted, n working papers

36
Optimal Sampling Design under Frame Imperfections
  • Motivated by problems with RF3 perennial
    classification
  • About 20% errors of omission and of commission!
  • Previous work: logistic regression for
    probability of perennial as function of
    covariates (Bill Coar)
  • Compare optimal biased and unbiased designs using
    anticipated MSE criterion
  • Account for differential costs (in frame, not in
    frame; perennial, non-perennial)
  • Minimize AMSE for fixed cost
  • Further work
  • Asymptotic results for cases of negligible,
    non-negligible bias
  • Empirical results