Introduction to the Bootstrap

- Marieke Timmerman

Baron von Münchhausen

What is bootstrapping?

- Resampling-based method to obtain inferential information on parameter estimates
- Resampling-based methods:
- bootstrap (inferential info on parameters)
- jackknife
- permutation test (significance testing)
- cross-validation (validity of full model)

Standard inference based on analytically derived results

- Derivation of the sampling distribution of $\hat{\theta}$ using assumptions about the population distribution function
- and estimate the standard error/confidence interval

When standard inference fails

- Distributional assumptions violated
- Derivation of sampling distribution impossible or too complex

Alternative: Bootstrap

Repeat many times

Core idea of the bootstrap

- The empirical distribution function (EDF) becomes equal to the population distribution function (PDF) as $n \to \infty$
- For $n < \infty$, assume that the EDF is representative of the PDF

Repeat many times

Estimate standard error or confidence interval

Example: CI for population mean $\mu$

Repeat many times

Bootstrap sampling distribution of $\hat{\theta}$

Used for estimating the standard error (SE) and confidence interval (CI)
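A minimal sketch of the procedure for the population mean, assuming a small made-up sample and the plain percentile interval. The function names, the number of bootstrap samples, and the data are illustrative assumptions, not taken from the slides.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_boot=2000, seed=0):
    """Return n_boot values of `statistic`, each computed on one resample."""
    rng = random.Random(seed)
    n = len(sample)
    # Drawing n values with replacement is the same as sampling from the EDF.
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]


def percentile_ci(boot_values, alpha=0.025):
    """Central 1 - 2*alpha interval read off the sorted bootstrap values."""
    ordered = sorted(boot_values)
    left = ordered[int(alpha * len(ordered))]
    right = ordered[int((1 - alpha) * len(ordered)) - 1]
    return left, right


sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 4.4]  # made-up data
boot_means = bootstrap_distribution(sample, statistics.mean)
se_boot = statistics.stdev(boot_means)  # bootstrap estimate of the SE
ci_left, ci_right = percentile_ci(boot_means)
print(f"mean={statistics.mean(sample):.2f}  SE={se_boot:.2f}  "
      f"CI=({ci_left:.2f}, {ci_right:.2f})")
```

The standard deviation of the bootstrap values plays the role of the SE, and the interval endpoints are simply order statistics of the same values.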

- Example: $y_i = \beta_0 + \beta_1 x_i + e_i$
- 1. Which $\theta$?
- 2. Sample(s) drawn from which population(s)?
- 3. How to define the EDF?
- 4. Is s(x) non-unique?
- 5. How to estimate CIs from distribution of s(x)?

3. How to define the EDF?

$y_i = \beta_0 + \beta_1 x_i + e_i$, $i = 1, \ldots, n$

- Resampling: draw n times with replacement
- non-parametric: resample i from $i = 1, \ldots, n$; the EDF is $(x_i, y_i)$, $i = 1, \ldots, n$
- semi-parametric
- parametric
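The three ways of defining the EDF for the regression example can be sketched as follows. The slides spell out only the non-parametric variant; the semi-parametric version (resampling fitted residuals) and the parametric version (drawing normal errors) shown here are the usual textbook forms and are assumptions about what the slides intended. All function names and data are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1


def resample_cases(x, y, rng):
    """Non-parametric: draw (x_i, y_i) pairs with replacement."""
    while True:
        idx = rng.choices(range(len(x)), k=len(x))
        # Redraw in the rare case that all x values coincide (slope undefined).
        if len({x[i] for i in idx}) > 1:
            return [x[i] for i in idx], [y[i] for i in idx]


def resample_residuals(x, y, rng):
    """Semi-parametric (assumed form): keep x fixed, resample fitted residuals."""
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    e_star = rng.choices(resid, k=len(x))
    return x, [b0 + b1 * xi + e for xi, e in zip(x, e_star)]


def resample_parametric(x, y, rng):
    """Parametric (assumed form): keep x fixed, draw errors from N(0, s^2)."""
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = (sum(e * e for e in resid) / (len(x) - 2)) ** 0.5
    return x, [b0 + b1 * xi + rng.gauss(0.0, s) for xi in x]


rng = random.Random(1)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
for scheme in (resample_cases, resample_residuals, resample_parametric):
    slopes = [fit_line(*scheme(x, y, rng))[1] for _ in range(1000)]
    print(scheme.__name__, "SE of slope:", round(statistics.stdev(slopes), 3))
```

The schemes differ in what they treat as random: case resampling lets the x values vary between bootstrap samples, while the other two keep the design fixed and rely on the fitted model being adequate.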

5. How to estimate CIs from distribution of s(x)?

- CIs based on bootstrap tables
- CIs based on percentiles

- Based on bootstrap tables

- Wald ( )
- Student's t-interval
- Bootstrap t-interval with

If no simple SE formula is available: use the double bootstrap (pff)

- Based on percentiles

- Percentile method
- Bias corrected percentile method
- Bias corrected and accelerated (BCa)
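The three percentile-based intervals can be compared in one sketch. It assumes the common formulation in which the bias correction z0 comes from the share of bootstrap values below the sample estimate and the acceleration comes from jackknife (leave-one-out) values; the slides do not give the formulas, so this formulation, the function names, and the data are assumptions.

```python
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def quantile(ordered, p):
    """Simple order-statistic quantile of an already sorted list."""
    idx = min(len(ordered) - 1, max(0, int(p * len(ordered))))
    return ordered[idx]


def bootstrap_cis(sample, statistic, alpha=0.025, n_boot=2000, seed=0):
    """Percentile, bias-corrected (BC) and BCa central 1 - 2*alpha intervals."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistic(sample)
    boot = sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))

    # Bias correction z0: position of theta_hat within the bootstrap values.
    below = sum(b < theta_hat for b in boot)
    below = min(max(below, 1), n_boot - 1)  # keep inv_cdf finite
    z0 = STD_NORMAL.inv_cdf(below / n_boot)

    # Acceleration from leave-one-out (jackknife) values of the statistic.
    jack = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    jack_mean = statistics.mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted(a):
        # Shift the nominal percentile levels using z0 and acceleration a.
        levels = []
        for p in (alpha, 1 - alpha):
            z = z0 + STD_NORMAL.inv_cdf(p)
            levels.append(STD_NORMAL.cdf(z0 + z / (1 - a * z)))
        return quantile(boot, levels[0]), quantile(boot, levels[1])

    return {
        "percentile": (quantile(boot, alpha), quantile(boot, 1 - alpha)),
        "BC": adjusted(0.0),       # bias corrected: acceleration set to zero
        "BCa": adjusted(accel),    # bias corrected and accelerated
    }


skewed = [0.3, 0.5, 0.9, 1.1, 1.4, 1.8, 2.2, 2.9, 4.1, 7.5]  # made-up data
cis = bootstrap_cis(skewed, statistics.variance)
for name, (left, right) in cis.items():
    print(f"{name:10s} ({left:.2f}, {right:.2f})")
```

All three intervals are read from the same sorted bootstrap values; only the percentile levels differ, which is why they stay within the range of the bootstrap distribution.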

Which bootstrap CI estimate?

- Percentile methods are (and bootstrap table methods are not):
- range preserving
- transformation respecting
- BCa usually better than ordinary percentile method
- What does "better" mean?

Quality of CI? → Coverage

- central $1 - 2\alpha$ CI: $[CI_{left}, CI_{right}]$
- $P(\theta < CI_{left}) = \alpha$ and $P(\theta > CI_{right}) = \alpha$, with $\theta$ the population parameter
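Coverage can be checked by simulation when the population is known. The sketch below assumes a hypothetical exponential population with mean 1, samples of size 30, and the percentile interval for the mean; none of these settings come from the slides.

```python
import random
import statistics


def bootstrap_percentile_ci(sample, rng, alpha=0.025, n_boot=200):
    """Percentile interval for the mean from n_boot resamples."""
    n = len(sample)
    boot = sorted(statistics.fmean(rng.choices(sample, k=n))
                  for _ in range(n_boot))
    return boot[int(alpha * n_boot)], boot[int((1 - alpha) * n_boot) - 1]


def coverage(theta, draw_sample, n_rep=300, seed=0):
    """Estimate both miss rates and the coverage percentage over n_rep samples."""
    rng = random.Random(seed)
    miss_left = miss_right = 0
    for _ in range(n_rep):
        ci_left, ci_right = bootstrap_percentile_ci(draw_sample(rng), rng)
        miss_left += theta < ci_left    # interval lies entirely above theta
        miss_right += theta > ci_right  # interval lies entirely below theta
    p_left, p_right = miss_left / n_rep, miss_right / n_rep
    return p_left, p_right, (1 - p_left - p_right) * 100


def draw_exponential(rng, n=30):
    """Hypothetical population: exponential with mean 1, so theta = 1."""
    return [rng.expovariate(1.0) for _ in range(n)]


p_left, p_right, cov = coverage(1.0, draw_exponential)
print(f"prop(theta < CI_left)={p_left:.3f}  "
      f"prop(theta > CI_right)={p_right:.3f}  coverage={cov:.1f}%")
```

Reporting the two miss rates separately shows whether an interval misses symmetrically, which a single coverage percentage would hide.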

What's next

- Principal Component Analysis
- 4. Is s(x) non-unique? How to make s(x) comparable?
- Multilevel Component Analysis
- 2. Sample(s) drawn from which population(s)?

Principal Component Analysis

X ($I \times J$): observed scores of I subjects on J variables
Z: standardized scores of X
F ($I \times Q$): principal component scores
A ($J \times Q$): principal loadings
Q: number of selected principal components
T ($Q \times Q$): rotation matrix

1. Which $\theta$?

- Loadings
- 1. Principal loadings ($A_Q$)
- 2. Rotated loadings ($A_Q T$)
- a. Procrustes rotation towards external structure
- b. use one, fixed criterion (e.g., Varimax)
- c. search for the optimal simple solution
- Oblique case: correlations between components

2. Sample(s) drawn from which Population(s)?

- observed scores of I subjects on J variables

3. How to define the EDF

- non-parametric: $X_b$, row-wise resampling of Z
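Row-wise resampling keeps each subject's J scores together, so the correlations between variables are preserved within every drawn row. A minimal sketch, with a made-up score matrix and a hypothetical helper name:

```python
import random


def resample_rows(Z, rng):
    """Draw I rows of Z with replacement; each subject's scores stay together."""
    return [list(Z[i]) for i in rng.choices(range(len(Z)), k=len(Z))]


rng = random.Random(0)
Z = [[0.5, -1.2, 0.3],
     [1.1, 0.4, -0.7],
     [-0.9, 0.8, 1.5],
     [-0.7, 0.0, -1.1]]  # made-up scores: 4 subjects, 3 variables
X_b = resample_rows(Z, rng)
print(X_b)
```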

4. Is s(x) non-unique? How to make s(x) comparable?

- Loadings
- 1. Principal loadings ($A_Q$): non-unique
- Sign of principal loadings ($A_Q$) is arbitrary
- reflect columns of $A_Q$ to the same direction

2. Rotated loadings ($A_Q T$)

a. Procrustes rotation towards external structure yields a unique rotated solution

2. Rotated loadings ($A_Q T$)

- b. use of one, fixed criterion (e.g., Varimax) yields a non-unique solution
- Sign and order of Varimax rotated loadings are arbitrary
- reflect and reorder columns of $A_Q T$
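Reflecting and reordering can be sketched as a search over column permutations followed by a sign flip per column. The matching criterion used here (largest sum of absolute column inner products with the sample loadings) is an assumption, as are the function names and the loading values; the brute-force search is only practical for a small number of components.

```python
from itertools import permutations


def column_dot(a, b, col_a, col_b):
    """Inner product of column col_a of a with column col_b of b."""
    return sum(row_a[col_a] * row_b[col_b] for row_a, row_b in zip(a, b))


def reflect_and_reorder(loadings, reference):
    """Permute and reflect columns of `loadings` to match `reference`."""
    n_cols = len(reference[0])
    # Pick the column order that agrees best with the reference, ignoring sign.
    best_perm = max(
        permutations(range(n_cols)),
        key=lambda perm: sum(
            abs(column_dot(loadings, reference, perm[q], q))
            for q in range(n_cols)
        ),
    )
    # Flip each matched column that points away from its reference column.
    signs = [
        1.0 if column_dot(loadings, reference, best_perm[q], q) >= 0 else -1.0
        for q in range(n_cols)
    ]
    return [[signs[q] * row[best_perm[q]] for q in range(n_cols)]
            for row in loadings]


# Made-up loadings: the bootstrap solution has its columns swapped and one
# column reflected relative to the sample solution.
sample_loadings = [[0.8, 0.1], [0.7, -0.2], [0.1, 0.9], [0.2, 0.6]]
boot_loadings = [[0.15, -0.75], [-0.1, -0.72], [0.85, -0.05], [0.65, -0.25]]
aligned = reflect_and_reorder(boot_loadings, sample_loadings)
print(aligned)
```

Only after this alignment do the bootstrap values of a given loading refer to the same component, so their spread can be used for an SE or CI.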

2. Rotated loadings ($A_Q T$): c. search for the optimal simple solution

- How are bootstrap solutions $A_Q T$ found?
- For each bootstrap solution: look for optimal simple loadings (infeasible); reflect and reorder columns of $A_Q T$
- For each bootstrap solution: Procrustes rotation towards optimally simple sample loadings yields a unique solution

- Fixed criterion versus Procrustes towards (simple) sample loadings
- Unstable Varimax rotated solutions over samples?

5. How to estimate CIs from the distribution of $\hat{\theta}$?

- Wald?
- BCa?

Simulation study

- CIs for Varimax rotated sample loadings
- Data properties varied:
- VAF in population (0.8, 0.6, 0.4)
- number of variables (8, 16)
- sample size (50, 100, 500)
- distribution of component scores (normal, leptokurtic, skew)
- simplicity of loading matrix (simple, halfsimple, complex)
- Design completely crossed, 1000 replicates per cell

- Simplicity of loading matrix ?
- Stability of Varimax solution of samples

Quality criteria for 95% CIs: $P(\theta < CI_{left}) = \alpha$, $P(\theta > CI_{right}) = \alpha$

- 95% coverage: $(1 - \mathrm{prop}(\theta < CI_{left}) - \mathrm{prop}(\theta > CI_{right})) \times 100$

Quality of estimated confidence intervals

Empirical example of2 PCA loadings

Multilevel Component Analysis

- Examples
- inhabitants within different countries
- measurement occasions within different subjects

2. Sample(s) drawn from which population(s)?

Which level(s) considered fixed, which random?

- different countries and samples of inhabitants
- sample of mothers and their children
- sample of hospitals and samples of patients

- level 2 (countries) fixed, level 1 (inhabitants) random
- level 2 (mothers) random, level 1 (children) fixed
- both level 2 and 1 random

3. How to define the EDF?

- MLCA (two levels: groups and objects)
- level 2 fixed, level 1 random? (multi-group)
- Resample objects within all groups
- level 2 random, level 1 fixed? (multi-observation)
- Resample groups (keeping all associated objects)
- levels 2 and 1 random? (real multilevel)
- Resample objects within resampled groups
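The three resampling schemes can be sketched with nested lists, one inner list of objects per group. The function names and group labels are hypothetical, and the sketch covers only the drawing step, not the component analysis itself.

```python
import random


def resample_objects(groups, rng):
    """Level 2 fixed, level 1 random: resample objects within every group."""
    return [rng.choices(objs, k=len(objs)) for objs in groups]


def resample_groups(groups, rng):
    """Level 2 random, level 1 fixed: resample whole groups, keep their objects."""
    return [list(g) for g in rng.choices(groups, k=len(groups))]


def resample_double(groups, rng):
    """Both levels random: resample groups, then objects within drawn groups."""
    return resample_objects(resample_groups(groups, rng), rng)


rng = random.Random(0)
groups = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3", "c4"]]  # made-up
print("object:", resample_objects(groups, rng))
print("group: ", resample_groups(groups, rng))
print("double:", resample_double(groups, rng))
```

Object resampling keeps every group and its size, whereas group and double resampling can draw the same group more than once and leave others out.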

Object resampling

Group resampling

Double resampling

Quality of estimated confidence intervals

multi-group case: level 2 fixed, level 1 random

10 groups; 20, 100, or 200 individuals per group

multi-observation case: level 2 random, level 1 fixed

20, 100, or 200 groups; 10 individuals per group

multi-group case: level 2 fixed, level 1 random

multi-level case: level 2 and level 1 random

[Figure: panels for 20 and 40 groups, with high and low loadings, at the between and within levels]

To conclude

Some remarks

- Bootstrapping is no solution for small sample sizes
- THE bootstrap procedure does not exist
- Be very careful in designing a bootstrap procedure (you may test it via simulation)
