Introduction into the Bootstrap - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction into the Bootstrap

Description:

Resampling based method to obtain inferential information on parameter estimates ... permutation test (significance testing) cross-validation (validity of full model) ... – PowerPoint PPT presentation

Provided by: URCG
Transcript and Presenter's Notes

Title: Introduction into the Bootstrap

1
Introduction into the Bootstrap
• Marieke Timmerman

Baron von Münchhausen
2
What is bootstrapping?
• Resampling based method to obtain inferential
information on parameter estimates
• Resampling based methods
• bootstrap (inferential info on parameters)
• jackknife
• permutation test (significance testing)
• cross-validation (validity of full model)

3
Standard inference based on analytically derived
results
• Derivation of the sampling distribution of the statistic using
assumptions about the Population Distribution
Function
• and estimate standard error/confidence interval

4
When Standard inference fails
• Distributional assumptions violated
• Derivation of sampling distribution impossible or
too complex

5
Alternative: Bootstrap
Repeat many times
6
Core idea of the bootstrap
• The empirical distribution function (EDF) becomes
equal to the population distribution function
(PDF) as n → ∞
• For n < ∞, assume that the EDF is representative of
the PDF

7
Repeat many times
Estimate standard error or confidence interval
8
Example: CI for population mean µ
Repeat many times
9
Bootstrap sampling distribution of the statistic
Used for estimating the Standard Error (SE) and
Confidence Interval (CI)
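
A minimal sketch of the resampling loop for the mean, assuming made-up data and a simple percentile rule for the interval. The names `bootstrap_distribution` and `percentile` are illustrative helpers, not part of the presentation.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_boot=2000, seed=0):
    """Recompute `statistic` on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))


def percentile(sorted_values, p):
    """Pick the value at proportion p of a sorted list (nearest rank)."""
    return sorted_values[round(p * (len(sorted_values) - 1))]


# Made-up observations, for illustration only.
sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 4.4]

boot_means = bootstrap_distribution(sample, statistics.fmean)

# Bootstrap SE: standard deviation of the bootstrap distribution.
se_boot = statistics.stdev(boot_means)

# Central 95% percentile interval from the same distribution.
ci_left = percentile(boot_means, 0.025)
ci_right = percentile(boot_means, 0.975)

print(f"SE = {se_boot:.3f}, 95% CI = [{ci_left:.2f}, {ci_right:.2f}]")
```

The same loop works for any statistic that can be recomputed on a resample; only the `statistic` argument changes.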
10
(No Transcript)
11
• Example: yi = β0 + β1 xi + ei
• 1. Which θ?
• 2. Sample(s) drawn from which population(s)?
• 3. How to define the EDF?
• 4. Is s(x) non-unique?
• 5. How to estimate CIs from distribution of s(x)?

12
3. How to define the EDF?
yi = β0 + β1 xi + ei, i = 1,…,n
• Resampling: draw n times with replacement
• non-parametric: resample i from i = 1,…,n; EDF is
(xi, yi), i = 1,…,n
• semi-parametric
• parametric
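
The non-parametric variant can be sketched as follows: whole (xi, yi) pairs are resampled with replacement and the slope is re-estimated each time. The data are simulated here purely for illustration, and `ols_slope` and `pairs_bootstrap` are hypothetical helper names.

```python
import random
import statistics


def ols_slope(pairs):
    """Least-squares estimate of the slope for a list of (x, y) pairs."""
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx  # assumes the x values in `pairs` are not all equal


def pairs_bootstrap(pairs, n_boot=1000, seed=0):
    """Resample the indices i = 1..n with replacement, keeping (xi, yi) together."""
    rng = random.Random(seed)
    n = len(pairs)
    return [ols_slope(rng.choices(pairs, k=n)) for _ in range(n_boot)]


# Simulated data (assumption): y = 1 + 0.5 x + noise.
data_rng = random.Random(42)
pairs = [(float(x), 1.0 + 0.5 * x + data_rng.gauss(0.0, 1.0)) for x in range(20)]

slope_hat = ols_slope(pairs)
boot_slopes = pairs_bootstrap(pairs)
se_slope = statistics.stdev(boot_slopes)

print(f"slope = {slope_hat:.3f}, bootstrap SE = {se_slope:.3f}")
```

Resampling pairs makes no assumption about the error distribution; the semi-parametric and parametric variants would instead resample residuals or draw errors from a fitted distribution.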

13
5. How to estimate CIs from distribution of
s(x)?
• CIs based on bootstrap tables
• CIs based on percentiles

14
• Based on bootstrap tables
• Wald ( )
• Student's t-interval
• Bootstrap t-interval with

If no simple SE formula is available: use of the
double bootstrap (pff)
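
A minimal sketch of a bootstrap t-interval for a mean, where a simple SE formula (s/√n) is available. The data are made up, and the function names are illustrative assumptions rather than the presenter's code.

```python
import random
import statistics


def mean_and_se(values):
    """Sample mean and its analytic standard error s / sqrt(n)."""
    n = len(values)
    return statistics.fmean(values), statistics.stdev(values) / n ** 0.5


def bootstrap_t_interval(sample, alpha=0.025, n_boot=2000, seed=0):
    """Central 1 - 2*alpha interval built from the bootstrap table of t* values."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat, se_hat = mean_and_se(sample)
    t_star = []
    for _ in range(n_boot):
        m, se = mean_and_se(rng.choices(sample, k=n))
        if se > 0:  # skip degenerate resamples with identical values
            t_star.append((m - theta_hat) / se)
    t_star.sort()

    def pick(p):
        return t_star[round(p * (len(t_star) - 1))]

    # The upper quantile of t* sets the lower limit, and vice versa.
    return theta_hat - pick(1 - alpha) * se_hat, theta_hat - pick(alpha) * se_hat


# Made-up observations, for illustration only.
sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 4.4]
ci_left, ci_right = bootstrap_t_interval(sample)
print(f"bootstrap-t 95% CI = [{ci_left:.2f}, {ci_right:.2f}]")
```

When no SE formula exists, `mean_and_se` would have to estimate the SE of each resample by a second, inner bootstrap, which is what makes the double bootstrap expensive.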
15
• Based on percentiles
• Percentile method
• Bias corrected percentile method
• Bias corrected and accelerated (BCa)
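
A sketch of the BCa interval under the usual textbook definitions, assuming the bias correction z0 is taken from the proportion of bootstrap values below the sample estimate and the acceleration a from a jackknife. Setting a = 0 gives the bias corrected percentile method, and a = 0 with z0 = 0 gives the plain percentile method. The data and the name `bca_interval` are illustrative.

```python
import random
from statistics import NormalDist, fmean


def bca_interval(sample, statistic, alpha=0.025, n_boot=2000, seed=0):
    """Central 1 - 2*alpha BCa interval for `statistic` (sample must be a list)."""
    rng = random.Random(seed)
    n = len(sample)
    nd = NormalDist()
    theta_hat = statistic(sample)
    boot = sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))

    # Bias correction: how far the bootstrap median sits from the estimate.
    prop = sum(b < theta_hat for b in boot) / n_boot
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep inside (0, 1)
    z0 = nd.inv_cdf(prop)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    jbar = fmean(jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = sum((jbar - j) ** 3 for j in jack) / den if den > 0 else 0.0

    def adjusted(p):
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(p):
        return boot[min(n_boot - 1, max(0, round(p * (n_boot - 1))))]

    return pick(adjusted(alpha)), pick(adjusted(1 - alpha))


# Made-up, right-skewed observations, for illustration only.
data = [1.2, 0.4, 3.1, 0.9, 7.5, 2.2, 0.7, 1.8, 4.6, 1.1, 0.5, 2.9]
ci_left, ci_right = bca_interval(data, fmean)
print(f"BCa 95% CI for the mean = [{ci_left:.2f}, {ci_right:.2f}]")
```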

16
Which bootstrap CI estimate?
• Percentile methods are (and bootstrap table
methods are not)
• range preserving
• transformation respecting
• BCa usually better than ordinary percentile
method
• What does better mean?

17
Quality of CI? → Coverage
• central 1−2α CI: [CIleft, CIright]
• P(θ < CIleft) = α, P(θ > CIright) = α, with θ the
population parameter
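
Coverage can be checked by simulation when the population is known. The sketch below assumes a standard normal population, the mean as parameter, and a percentile interval; the sample size and replicate counts are arbitrary choices kept small so that it runs quickly.

```python
import random
import statistics


def percentile_ci(sample, rng, alpha=0.025, n_boot=300):
    """Central 1 - 2*alpha percentile interval for the mean."""
    n = len(sample)
    boot = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot))
    return boot[round(alpha * (n_boot - 1))], boot[round((1 - alpha) * (n_boot - 1))]


def miss_rates(theta=0.0, n=30, n_rep=300, seed=1):
    """Proportions of replicates with theta < CIleft and with theta > CIright."""
    rng = random.Random(seed)
    miss_left = miss_right = 0
    for _ in range(n_rep):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        ci_left, ci_right = percentile_ci(sample, rng)
        miss_left += theta < ci_left
        miss_right += theta > ci_right
    return miss_left / n_rep, miss_right / n_rep


left, right = miss_rates()
coverage = 1 - left - right
print(f"prop(theta < CIleft) = {left:.3f}, prop(theta > CIright) = {right:.3f}")
print(f"coverage = {100 * coverage:.1f}%")
```

A good interval has both miss rates close to α, not only a total coverage close to 1−2α.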

18
(No Transcript)
19
What's next
• Principal Component Analysis
• 4. Is s(x) non-unique? How to make s(x)
comparable?
• Multilevel Component Analysis
• 2. Sample(s) drawn from which population(s)?

20
Principal Component Analysis
X (I×J): observed scores of I subjects on J variables
Z: standardized scores of X
F (I×Q): principal component scores
A (J×Q): principal loadings
Q: number of selected principal components
T (Q×Q): rotation matrix
21
1. Which θ?
• Loadings
• 1. Principal loadings (AQ)
• 2. Rotated loadings (AQT)
• a. Procrustes rotation towards external
structure
• b. use one, fixed criterion (e.g., Varimax)
• c. search for the optimal simple solution
• Oblique case: correlations between components

22
2. Sample(s) drawn from which Population(s)?
• observed scores of I subjects on J variables

23
3. How to define the EDF
• non-parametric: Xb, row-wise resampling of Z
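
Row-wise resampling keeps each subject's J scores together, so the correlations between variables are preserved in every bootstrap sample. A minimal sketch with a made-up score matrix stored as a list of rows:

```python
import random


def resample_rows(Z, rng):
    """Draw I rows of Z with replacement; each row (subject) stays intact."""
    n_rows = len(Z)
    return [Z[rng.randrange(n_rows)] for _ in range(n_rows)]


# Made-up standardized scores of I = 5 subjects on J = 3 variables.
Z = [
    [0.5, -1.2, 0.3],
    [-0.7, 0.4, 1.1],
    [1.3, 0.9, -0.8],
    [-1.1, -0.2, 0.0],
    [0.0, 0.1, -0.6],
]

rng = random.Random(0)
Xb = resample_rows(Z, rng)
for row in Xb:
    print(row)
```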

24
4. Is s(x) non-unique? How to make s(x)
comparable?
• Loadings
• 1. Principal loadings (AQ) non-unique
• Sign of Principal loadings (AQ) is arbitrary
• reflect columns of AQ to the same direction
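
One way to fix the arbitrary sign, assuming the sample loadings serve as the reference direction: flip any bootstrap column whose inner product with the matching sample column is negative. The loading values and the name `reflect_columns` are made up for illustration.

```python
def reflect_columns(A_boot, A_ref):
    """Flip the sign of each column of A_boot that points away from A_ref."""
    n_vars, n_comp = len(A_ref), len(A_ref[0])
    signs = []
    for q in range(n_comp):
        inner = sum(A_boot[j][q] * A_ref[j][q] for j in range(n_vars))
        signs.append(-1.0 if inner < 0 else 1.0)
    return [[signs[q] * A_boot[j][q] for q in range(n_comp)] for j in range(n_vars)]


# Made-up sample loadings (J = 4 variables, Q = 2 components).
A_sample = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]]
# Bootstrap loadings whose second column came out with the opposite sign.
A_boot = [[0.75, -0.15], [0.72, -0.18], [0.05, -0.85], [0.25, -0.65]]

A_aligned = reflect_columns(A_boot, A_sample)
for row in A_aligned:
    print(row)
```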

25
• 1. Principal loadings (AQ) non-unique
• Sign of Principal loadings (AQ) is arbitrary
• reflect columns of AQ to the same direction

26
2. Rotated loadings (AQT)
a. Procrustes rotation towards external structure
yields a unique rotated solution
27
2. Rotated loadings (AQT)
• b. use of one, fixed criterion (e.g., Varimax)
yields a non-unique solution
• Sign and order of Varimax rotated loadings are
arbitrary
• reflect and reorder columns of AQT
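
Reordering can be handled by searching over column permutations, which is feasible for a small number of components. The sketch assumes the sample loadings as reference and uses the sum of absolute column inner products as matching criterion; both choices, the loading values, and the name `match_columns` are assumptions for illustration.

```python
from itertools import permutations


def match_columns(A_boot, A_ref):
    """Reorder and reflect the columns of A_boot to resemble A_ref."""
    n_vars, n_comp = len(A_ref), len(A_ref[0])

    def inner(p, q):
        # Inner product of bootstrap column p with reference column q.
        return sum(A_boot[j][p] * A_ref[j][q] for j in range(n_vars))

    # best[q] is the bootstrap column assigned to reference column q.
    best = max(
        permutations(range(n_comp)),
        key=lambda perm: sum(abs(inner(perm[q], q)) for q in range(n_comp)),
    )
    signs = [-1.0 if inner(best[q], q) < 0 else 1.0 for q in range(n_comp)]
    return [[signs[q] * A_boot[j][best[q]] for q in range(n_comp)] for j in range(n_vars)]


# Made-up Varimax rotated sample loadings.
A_sample = [[0.8, 0.1], [0.7, 0.0], [0.1, 0.9], [0.0, 0.6]]
# Same structure with the columns swapped and one column reflected.
A_boot = [[-0.1, 0.8], [0.0, 0.7], [-0.9, 0.1], [-0.6, 0.0]]

A_matched = match_columns(A_boot, A_sample)
for row in A_matched:
    print(row)
```

The search visits Q! permutations, so it only suits the small Q typical of component analysis.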

28
2. Rotated loadings (AQT): c. search for the
optimal simple solution
• How are bootstrap solutions AQT found?
• For each bootstrap solution: look for optimal
simple loadings (infeasible); reflect and reorder
columns of AQT
• For each bootstrap solution: Procrustes rotation
towards optimally simple sample loadings
yields a unique solution
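
For the orthogonal case, the Procrustes step has a standard closed-form solution, shown here as a sketch. The symbols A_b (bootstrap loadings) and A_s (sample target loadings) are notation introduced for this illustration; an oblique rotation would need a different criterion.

```latex
% Orthogonal Procrustes rotation of bootstrap loadings A_b towards target A_s
\hat{T} = \arg\min_{T \,:\, T^{\top}T = I} \; \lVert A_b T - A_s \rVert_F^2 ,
\qquad
A_b^{\top} A_s = U \Sigma V^{\top} \ \text{(SVD)}
\;\;\Longrightarrow\;\;
\hat{T} = U V^{\top} .
```

Because the target is the same for every bootstrap sample, the rotated loadings A_b T̂ need no further reflection or reordering.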

29
• Fixed criterion versus Procrustes towards
(simple) sample loadings
• Unstable Varimax rotated solutions over samples?

30
5. How to estimate CIs from the distribution of
s(x)?
• Wald?
• BCa?

31
Simulation study
• CIs for Varimax rotated Sample loadings
• Data properties varied
• VAF in population (0.8,0.6,0.4)
• number of variables (8, 16)
• sample size (50, 100, 500)
• distribution of component scores (normal,
leptokurtic, skew)
• simplicity of loading matrix (simple,
halfsimple, complex)
• Design completely crossed, 1000 replicates per
cell
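
To make the size of the design concrete, the fully crossed conditions can be enumerated as below. The factor levels are taken from the list above; the dictionary keys are illustrative names.

```python
from itertools import product

# Factor levels as listed on the slide; key names are illustrative.
design = {
    "vaf": (0.8, 0.6, 0.4),
    "n_variables": (8, 16),
    "sample_size": (50, 100, 500),
    "score_distribution": ("normal", "leptokurtic", "skew"),
    "loading_simplicity": ("simple", "halfsimple", "complex"),
}
replicates_per_cell = 1000

# Completely crossed: every combination of levels is one cell.
cells = [dict(zip(design, levels)) for levels in product(*design.values())]
n_cells = len(cells)
n_datasets = n_cells * replicates_per_cell

print(n_cells, "cells,", n_datasets, "simulated data sets")
```

Each simulated data set additionally requires its own set of bootstrap samples, which is what makes such a study computationally heavy.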

32
• Simplicity of loading matrix →
• Stability of Varimax solution of samples

33
Quality criteria for 95% CIs: P(θ < CIleft) = α,
P(θ > CIright) = α
• 95% coverage = (1 − prop(θ < CIleft) − prop(θ > CIright)) × 100

34
Quality of estimated confidence intervals
35
(No Transcript)
36
Empirical example of2 PCA loadings
37
Multilevel Component Analysis
• Examples
• inhabitants within different countries
• measurement occasions within different subjects

38
2. Sample(s) drawn from which population(s)?
Which level(s) considered fixed, which random?
• different countries and samples of inhabitants
• sample of mothers and their children
• sample of hospitals and samples of patients
• level 2 (countries) fixed, level 1 (inhabitants)
random
• level 2 (mothers) random, level 1 (children)
fixed
• both level 2 and 1 random

39
3. How to define the EDF?
• MLCA (two levels: groups and objects)
• level 2 fixed, level 1 random? (multi-group)
• Resample objects within all groups
• level 2 random, level 1 fixed? (multi-observation)
• Resample groups (keeping all associated objects)
• levels 2 and 1 random? (real multilevel)
• Resample objects within resampled groups

Object resampling
Group resampling
Double resampling
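
The three schemes can be sketched for data stored as a list of groups, each group being a list of objects. The function names and the toy data are assumptions for illustration.

```python
import random


def resample_objects(groups, rng):
    """Level 2 fixed, level 1 random: resample objects within every group."""
    return [rng.choices(group, k=len(group)) for group in groups]


def resample_groups(groups, rng):
    """Level 2 random, level 1 fixed: resample groups, keeping all their objects."""
    return [list(group) for group in rng.choices(groups, k=len(groups))]


def resample_double(groups, rng):
    """Both levels random: resample groups, then objects within each drawn group."""
    return resample_objects(resample_groups(groups, rng), rng)


# Toy data: 3 groups (e.g., countries) with labelled objects (e.g., inhabitants).
groups = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3", "c4"]]

rng = random.Random(0)
object_sample = resample_objects(groups, rng)
group_sample = resample_groups(groups, rng)
double_sample = resample_double(groups, rng)

print(object_sample)
print(group_sample)
print(double_sample)
```

Object resampling keeps the group sizes and group identities fixed, whereas group and double resampling can draw the same group more than once.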
40
Quality of estimated confidence intervals
multi-group case: level 2 fixed, level 1 random
10 groups; 20, 100, or 200 individuals per group
multi-observation case: level 2 random, level 1
fixed
20, 100, or 200 groups; 10 individuals per group
41
multi-group case: level 2 fixed, level 1 random
multi-level case: level 2 and level 1 random
[Figure panels: 20 groups, 40 groups; high loadings, low loadings; between, within]
42
To conclude
43
Some remarks
• Bootstrapping is no solution for small sample
sizes
• THE bootstrap procedure does not exist
• Be very careful in designing a bootstrap
procedure (you may test it via simulation)

44
(No Transcript)