Title: Inference on SPMs: Random Field Theory
1 Inference on SPMs: Random Field Theory & Alternatives
- Thomas Nichols, Ph.D.
- Director, Modelling & Genetics, GlaxoSmithKline Clinical Imaging Centre
- http://www.fmrib.ox.ac.uk/nichols
- Zurich SPM Course
- Feb 12, 2009
2 [Figure: SPM analysis pipeline. Labels: image data; realignment & motion correction; smoothing (kernel); normalisation (anatomical reference); General Linear Model (design matrix, model fitting, parameter estimates, statistic image); Statistical Parametric Map; Thresholding & Random Field Theory; corrected thresholds & p-values]
3 Outline
- Orientation
- Assessing Statistic images
- The Multiple Testing Problem
- Random Field Theory FWE
- Permutation FWE
- False Discovery Rate
5 Assessing Statistic Images
- High threshold: good specificity, poor power (risk of false negatives)
- Med. threshold
- Low threshold: poor specificity (risk of false positives), good power
- ...but why threshold?!
6 Blue-sky inference: What we'd like
- Don't threshold, model the signal!
- Signal location?
- Estimates and CIs on (x,y,z) location
- Signal magnitude?
- CIs on change
- Spatial extent?
- Estimates and CIs on activation volume
- Robust to choice of cluster definition
- ...but this requires an explicit spatial model
7 Blue-sky inference: What we need
- Need an explicit spatial model
- No routine spatial modeling methods exist
- High-dimensional mixture modeling problem
- Activations don't look like Gaussian blobs
- Need realistic shapes, sparse
- Some initial work
- Hartvig et al., Penny et al., Xu et al.
- Not part of mass-univariate framework
8 Real-life inference: What we get
- Signal location
- Local maximum: no inference
- Center-of-mass: no inference
- Sensitive to blob-defining threshold
- Signal magnitude
- Local maximum intensity: P-values (& CIs)
- Spatial extent
- Cluster volume: P-value, no CIs
- Sensitive to blob-defining threshold
9 Voxel-level Inference
- Retain voxels above α-level threshold u_α
- Gives best spatial specificity
- The null hyp. at a single voxel can be rejected
[Figure: statistic profile over space with threshold u_α; labels "Significant voxels" and "No significant voxels"]
10 Cluster-level Inference
- Two-step process
- Define clusters by arbitrary threshold u_clus
- Retain clusters larger than α-level threshold k_α
[Figure: statistic profile over space with threshold u_clus and cluster-size threshold k_α; labels "Cluster not significant" and "Cluster significant"]
11 Cluster-level Inference
- Typically better sensitivity
- Worse spatial specificity
- The null hyp. of entire cluster is rejected
- Only means that one or more of the voxels in the cluster are active
[Figure: statistic profile over space with threshold u_clus and cluster-size threshold k_α; labels "Cluster not significant" and "Cluster significant"]
12 Set-level Inference
- Count number of blobs c
- Minimum blob size k
- Worst spatial specificity
- Can only reject global null hypothesis
[Figure: statistic profile over space with threshold u_clus and minimum blob size k. Here c = 1; only 1 cluster larger than k]
14 Hypothesis Testing
- Null Hypothesis H0
- Test statistic T
- t: observed realization of T
- α level
- Acceptable false positive rate
- Level α = P(T > u_α | H0)
- Threshold u_α controls false positive rate at level α
- P-value
- Assessment of t assuming H0
- P(T > t | H0)
- Prob. of obtaining stat. as large or larger in a new experiment
- P(Data|Null), not P(Null|Data)
15 Multiple Testing Problem
- Which of 100,000 voxels are sig.?
- α = 0.05 ⇒ 5,000 false positive voxels
- Which of (random number, say) 100 clusters are significant?
- α = 0.05 ⇒ 5 false positive clusters
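The arithmetic on this slide can be checked with a small simulation. This is only a sketch: it assumes every voxel is null and independent, so each p-value is uniform on [0, 1]; the function name is made up for illustration.

```python
import random

def count_false_positives(n_tests, alpha, seed=0):
    """Count null tests whose (uniform) p-value falls below alpha."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))

n_voxels, alpha = 100_000, 0.05
expected = n_voxels * alpha            # expected number of false positives
observed = count_false_positives(n_voxels, alpha)
print(f"expected {expected:.0f}, simulated {observed}")
```

The simulated count varies with the seed but stays close to the expected 5,000.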
16 MTP Solutions: Measuring False Positives
- Familywise Error Rate (FWE)
- Familywise Error
- Existence of one or more false positives
- FWE is probability of familywise error
- False Discovery Rate (FDR)
- FDR = E(V/R)
- R voxels declared active, V falsely so
- Realized false discovery rate: V/R
18 FWE MTP Solutions: Bonferroni
- For a statistic image T...
- T_i: ith voxel of statistic image T
- ...use α = α0/V
- α0: FWE level (e.g. 0.05)
- V: number of voxels
- u_α: α-level statistic threshold, P(T_i ≥ u_α) = α
- Conservative under correlation
- Independent: V tests
- Some dep.: ? tests
- Total dep.: 1 test
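A minimal sketch of the Bonferroni threshold follows. It assumes a Gaussian (Z) statistic image so that the standard library suffices; the slides' real-data example uses a t statistic, which would give a higher threshold. The function name is hypothetical.

```python
from statistics import NormalDist

def bonferroni_z_threshold(alpha0, n_voxels):
    """Per-voxel level alpha = alpha0 / V and the matching one-sided Z threshold."""
    alpha = alpha0 / n_voxels
    u_alpha = NormalDist().inv_cdf(1.0 - alpha)   # P(Z >= u_alpha) = alpha
    return alpha, u_alpha

alpha, u = bonferroni_z_threshold(0.05, 100_000)
print(f"per-voxel alpha = {alpha:.1e}, Z threshold = {u:.2f}")
```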
20 SPM approach: Random fields
- Consider statistic image as lattice representation of a continuous random field
- Use results from continuous random field theory
[Figure: continuous random field and its lattice representation]
21 FWE MTP Solutions: Controlling FWE w/ Max
- FWE & distribution of maximum
- FWE = P(FWE) = P(∪_i {T_i ≥ u} | H0) = P(max_i T_i ≥ u | H0)
- 100(1-α)%ile of max dist'n controls FWE
- FWE = P(max_i T_i ≥ u_α | H0) = α, where u_α = F_max^(-1)(1-α)
[Figure: distribution of the maximum with threshold u_α]
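The link between the maximum distribution and FWE can be illustrated by Monte Carlo. The sketch below assumes independent standard Gaussian voxels (no smoothness), which is not the situation in real images; the image size and number of simulations are arbitrary choices, and the helper name is hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_max_distribution(n_voxels, n_sim, seed=0):
    """Sorted maxima of n_sim null images of independent N(0,1) voxels."""
    rng = random.Random(seed)
    return sorted(
        max(rng.gauss(0.0, 1.0) for _ in range(n_voxels))
        for _ in range(n_sim)
    )

alpha, n_voxels, n_sim = 0.05, 200, 2000
maxima = simulate_max_distribution(n_voxels, n_sim)
# Empirical 100(1-alpha)%ile of the max distribution: u_alpha = F_max^(-1)(1 - alpha)
u_max = maxima[math.ceil((1 - alpha) * n_sim) - 1]
# Bonferroni threshold for comparison
u_bonf = NormalDist().inv_cdf(1 - alpha / n_voxels)
print(f"max-distribution threshold {u_max:.2f}, Bonferroni {u_bonf:.2f}")
```

Under independence the two thresholds nearly coincide; with spatial correlation the maximum-based threshold would be expected to fall below Bonferroni.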
22 FWE MTP Solutions: Random Field Theory
- Euler Characteristic χ_u
- Topological measure
- #blobs - #holes
- At high thresholds, just counts blobs
- FWE = P(Max voxel ≥ u | H0) = P(One or more blobs | H0) ≈ P(χ_u ≥ 1 | H0) ≈ E(χ_u | H0)
[Figure: random field, threshold, and suprathreshold sets; the two approximations are annotated "No holes" and "Never more than 1 blob"]
23 RFT Details: Expected Euler Characteristic
- E(χ_u) ≈ λ(Ω) |Λ|^(1/2) (u² - 1) exp(-u²/2) / (2π)²
- Ω: search region, Ω ⊂ R³
- λ(Ω): volume
- |Λ|^(1/2): roughness
- Assumptions
- Multivariate Normal
- Stationary*
- ACF twice differentiable at 0
- * Stationary
- Results valid w/out stationary
- More accurate when stat. holds
24 Random Field Theory: Smoothness Parameterization
- E(χ_u) depends on |Λ|^(1/2)
- Λ: roughness matrix
- Smoothness parameterized as Full Width at Half Maximum
- FWHM of Gaussian kernel needed to smooth a white noise random field to roughness Λ
25 Random Field Theory: Smoothness Parameterization
- RESELS
- Resolution Elements
- 1 RESEL = FWHM_x × FWHM_y × FWHM_z
- RESEL Count R
- R = λ(Ω) / (FWHM_x × FWHM_y × FWHM_z), so that λ(Ω) |Λ|^(1/2) = (4 log 2)^(3/2) R
- Volume of search region in units of smoothness
- E.g. 10 voxels, 2.5 voxel FWHM: 4 RESELS
- Beware RESEL misinterpretation
- RESELs are not the number of independent things in the image
- See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.
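As a sketch of the RESEL count as "volume in units of smoothness", the helper below divides the number of voxels by the product of the FWHMs expressed in voxels. The function name and the 3-D input values are illustrative assumptions; SPM's own resel count also accounts for the geometry of the search region, which this ignores.

```python
import math

def resel_count(n_voxels, fwhm_in_voxels):
    """Search volume divided by the volume of one RESEL (product of FWHMs)."""
    return n_voxels / math.prod(fwhm_in_voxels)

# The 1-D example from the slide: 10 voxels, FWHM of 2.5 voxels
r_1d = resel_count(10, [2.5])
# A made-up 3-D example: 64,000 voxels, FWHM of 4 voxels in each direction
r_3d = resel_count(64_000, [4.0, 4.0, 4.0])
print(r_1d, r_3d)
```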
26 Random Field Theory: Smoothness Estimation
- Smoothness est'd from standardized residuals
- Variance of gradients
- Yields resels per voxel (RPV)
- RPV image
- Local roughness est.
- Can transform into local smoothness est.
- FWHM Img = (RPV Img)^(-1/D)
- Dimension D, e.g. D = 2 or 3
27 Random Field Intuition
- Corrected P-value for voxel value t
- Pc = P(max T > t) ≈ E(χ_t) ∝ λ(Ω) |Λ|^(1/2) t² exp(-t²/2) (for large t)
- Statistic value t increases
- Pc decreases (but only for large t)
- Search volume increases
- Pc increases (more severe MTP)
- Smoothness increases (roughness |Λ|^(1/2) decreases)
- Pc decreases (less severe MTP)
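The behaviour described above can be explored numerically. This sketch assumes a 3-D Gaussian random field and keeps only the highest-dimensional term of the expected Euler characteristic, written in RESEL units via λ(Ω)|Λ|^(1/2) = (4 log 2)^(3/2) R. The function names are hypothetical; 462.9 is the RESEL count quoted later in the talk, but the resulting Z threshold is not comparable to the t thresholds reported there.

```python
import math

def expected_ec_3d(u, resels):
    """Leading (3-D) term of E(chi_u) for a Gaussian field with `resels` RESELs."""
    return (resels * (4 * math.log(2)) ** 1.5 * (u * u - 1)
            * math.exp(-u * u / 2) / (2 * math.pi) ** 2)

def rft_threshold(resels, alpha=0.05, lo=2.0, hi=10.0):
    """Bisection for u with expected_ec_3d(u) = alpha (decreasing for u > sqrt(3))."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_ec_3d(mid, resels) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

u_small = rft_threshold(462.9)          # search volume of 462.9 RESELs
u_large = rft_threshold(2 * 462.9)      # doubling the volume raises the threshold
print(f"u = {u_small:.2f} for 462.9 RESELs, u = {u_large:.2f} for twice as many")
```

Doubling the RESEL count (larger search volume or rougher images) increases the corrected P-value at a fixed t and therefore raises the threshold.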
28 RFT Details: Unified Formula
- General form for expected Euler characteristic
- χ², F, & t fields; restricted search regions; D dimensions
- E[χ_u(Ω)] = Σ_d R_d(Ω) ρ_d(u)
- R_d(Ω): d-dimensional Minkowski functional of Ω; function of dimension, space Ω and smoothness
- R_0(Ω) = χ(Ω), Euler characteristic of Ω
- R_1(Ω) = resel diameter
- R_2(Ω) = resel surface area
- R_3(Ω) = resel volume
- ρ_d(u): d-dimensional EC density of Z(x); function of dimension and threshold, specific for RF type
- E.g. Gaussian RF
- ρ_0(u) = 1 - Φ(u)
- ρ_1(u) = (4 ln 2)^(1/2) exp(-u²/2) / (2π)
- ρ_2(u) = (4 ln 2) u exp(-u²/2) / (2π)^(3/2)
- ρ_3(u) = (4 ln 2)^(3/2) (u² - 1) exp(-u²/2) / (2π)²
- ρ_4(u) = (4 ln 2)² (u³ - 3u) exp(-u²/2) / (2π)^(5/2)
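A sketch of the unified formula for a Gaussian field in up to three dimensions is given below. The ρ_2 term here includes a factor u, following the standard Gaussian EC densities. The resel counts R_0 to R_2 in the usage lines are invented for illustration; only R_3 = 462.9 is a number quoted in this talk.

```python
import math

def gaussian_ec_densities(u):
    """EC densities rho_0..rho_3 of a Gaussian random field at threshold u."""
    c = 4 * math.log(2)
    e = math.exp(-u * u / 2)
    two_pi = 2 * math.pi
    return [
        0.5 * math.erfc(u / math.sqrt(2)),       # rho_0 = 1 - Phi(u)
        c ** 0.5 * e / two_pi,                   # rho_1
        c * u * e / two_pi ** 1.5,               # rho_2
        c ** 1.5 * (u * u - 1) * e / two_pi ** 2,  # rho_3
    ]

def expected_ec(u, resels):
    """E[chi_u] = sum_d R_d * rho_d(u), with resels = [R_0, R_1, R_2, R_3]."""
    return sum(r * rho for r, rho in zip(resels, gaussian_ec_densities(u)))

rho_at_zero = gaussian_ec_densities(0.0)
ec_top_term = expected_ec(4.5, [0.0, 0.0, 0.0, 462.9])
ec_all_terms = expected_ec(4.5, [1.0, 20.0, 150.0, 462.9])  # R_0..R_2 are made up
print(f"3-D term only: {ec_top_term:.4f}; all terms: {ec_all_terms:.4f}")
```

Including the lower-dimensional terms increases the expected EC slightly, which matters most for small or irregular search regions.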
29 Random Field Theory: Cluster Size Tests
- Expected Cluster Size
- E(S) = E(N)/E(L)
- S: cluster size
- N: suprathreshold volume, λ({T > u_clus})
- L: number of clusters
- E(N) = λ(Ω) P(T > u_clus)
- E(L) ≈ E(χ_u)
- Assuming no holes
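The expected cluster size can be sketched for a 3-D Gaussian field as follows. The cluster-forming threshold of 3.0 is an arbitrary choice, E(L) uses only the 3-D term of the expected Euler characteristic, and the function name is hypothetical; the voxel and RESEL counts are the ones quoted in this talk's real-data example, used here only as plausible inputs.

```python
import math

def expected_cluster_size(n_voxels, resels, u_clus):
    """Return (E(N), E(L), E(S)) with E(S) = E(N) / E(L), Gaussian field assumed."""
    p_voxel = 0.5 * math.erfc(u_clus / math.sqrt(2))   # P(T > u_clus)
    e_n = n_voxels * p_voxel                           # expected suprathreshold voxels
    e_l = (resels * (4 * math.log(2)) ** 1.5 * (u_clus ** 2 - 1)
           * math.exp(-u_clus ** 2 / 2) / (2 * math.pi) ** 2)  # approx. E(chi_u)
    return e_n, e_l, e_n / e_l

e_n, e_l, e_s = expected_cluster_size(110_776, 462.9, 3.0)
print(f"E(N) = {e_n:.1f} voxels, E(L) = {e_l:.2f} clusters, E(S) = {e_s:.1f} voxels")
```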
30 RFT Cluster Inference & Stationarity
- Problem w/ VBM
- Standard RFT result assumes stationarity, constant smoothness
- Assuming stationarity, false positive clusters will be found in extra-smooth regions
- VBM noise very non-stationary
- Nonstationary cluster inference
- Must un-warp nonstationarity
- Reported but not implemented
- Hayasaka et al, NeuroImage 22:676-687, 2004
- Now available in SPM toolboxes
- http://fmri.wfubmc.edu/cms/softwareNS
- Gaser's VBM toolbox (w/ recent update!)
[Figure: VBM image of FWHM noise smoothness; nonstationary noise warped to stationarity]
31 Random Field Theory: Cluster Size Distribution
- Gaussian Random Fields (Nosko, 1969)
- D: dimension of RF
- t Random Fields (Cao, 1999)
- B: Beta dist'n
- U's: χ²'s
- c chosen s.t. E(S) = E(N) / E(L)
32 Random Field Theory: Cluster Size Corrected P-Values
- Previous results give uncorrected P-value
- Corrected P-value
- Bonferroni
- Correct for expected number of clusters
- Corrected Pc = E(L) Puncorr
- Poisson Clumping Heuristic (Adler, 1980)
- Corrected Pc = 1 - exp(-E(L) Puncorr)
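The two corrections can be compared directly. In this sketch the inputs (an expected 4.8 clusters and an uncorrected cluster P-value of 0.001) are invented, the function names are hypothetical, and capping the Bonferroni value at 1 is an addition not stated on the slide.

```python
import math

def corrected_p_bonferroni(e_l, p_uncorr):
    """Pc = E(L) * Puncorr, capped at 1 (the cap is an added safeguard)."""
    return min(1.0, e_l * p_uncorr)

def corrected_p_poisson(e_l, p_uncorr):
    """Poisson clumping heuristic: Pc = 1 - exp(-E(L) * Puncorr)."""
    return -math.expm1(-e_l * p_uncorr)

p_bonf = corrected_p_bonferroni(4.8, 0.001)
p_pois = corrected_p_poisson(4.8, 0.001)
print(f"Bonferroni {p_bonf:.6f}, Poisson clumping {p_pois:.6f}")
```

For small E(L)·Puncorr the two agree closely, with the Poisson version always slightly smaller.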
33 Random Field Theory: Limitations
- Sufficient smoothness
- FWHM smoothness 3-4× voxel size (Z)
- More like ~10× for low-df T images
- Smoothness estimation
- Estimate is biased when images not sufficiently smooth
- Multivariate normality
- Virtually impossible to check
- Several layers of approximations
- Stationary required for cluster size results
34 Real Data
- fMRI Study of Working Memory
- 12 subjects, block design; Marshuetz et al (2000)
- Item Recognition
- Active: View five letters, 2s pause, view probe letter, respond
- Baseline: View XXXXX, 2s pause, view Y or N, respond
- Second Level RFX
- Difference image, A-B, constructed for each subject
- One sample t test
35 Real Data: RFT Result
- Threshold
- S = 110,776
- 2 × 2 × 2 mm voxels, 5.1 × 5.8 × 6.9 mm FWHM
- u = 9.870
- Result
- 5 voxels above the threshold
- 0.0063 minimum FWE-corrected p-value
[Figure: -log10 p-value image]
37 Nonparametric Inference
- Parametric methods
- Assume distribution of statistic under null hypothesis
- Needed to find P-values, u_α
- Nonparametric methods
- Use data to find distribution of statistic under null hypothesis
- Any statistic!
38 Permutation Test: Toy Example
- Data from V1 voxel in visual stim. experiment
- A: Active, flashing checkerboard; B: Baseline, fixation
- 6 blocks, ABABAB; just consider block averages...
- Null hypothesis Ho
- No experimental effect, A & B labels arbitrary
- Statistic
- Mean difference
A       B      A      B      A      B
103.00  90.48  99.93  87.83  99.76  96.06
39 Permutation Test: Toy Example
- Under Ho
- Consider all equivalent relabelings
AAABBB ABABAB BAAABB BABBAA
AABABB ABABBA BAABAB BBAAAB
AABBAB ABBAAB BAABBA BBAABA
AABBBA ABBABA BABAAB BBABAA
ABAABB ABBBAA BABABA BBBAAA
40 Permutation Test: Toy Example
- Under Ho
- Consider all equivalent relabelings
- Compute all possible statistic values
AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86
AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15
AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67
AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25
ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82
41 Permutation Test: Toy Example
- Under Ho
- Consider all equivalent relabelings
- Compute all possible statistic values
- Find 95%ile of permutation distribution
AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86
AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15
AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67
AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25
ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82
43 Permutation Test: Toy Example
- Under Ho
- Consider all equivalent relabelings
- Compute all possible statistic values
- Find 95%ile of permutation distribution
[Figure: permutation distribution of the mean difference; horizontal axis from -8 to 8]
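The toy example can be reproduced as a short sketch using the six block averages printed earlier. It enumerates the 20 ways of labelling three blocks A and three B and computes the mean difference for each. With the rounded block averages the observed ABABAB statistic comes out near 9.44, against 9.45 on the slide; values for the other relabelings are whatever the printed data imply and are not checked against the slide's table. Function and variable names are hypothetical.

```python
from itertools import combinations

data = [103.00, 90.48, 99.93, 87.83, 99.76, 96.06]   # block averages from the slide
observed_labels = "ABABAB"

def mean_difference(labels, values):
    """Mean of blocks labelled A minus mean of blocks labelled B."""
    a = [v for lab, v in zip(labels, values) if lab == "A"]
    b = [v for lab, v in zip(labels, values) if lab == "B"]
    return sum(a) / len(a) - sum(b) / len(b)

def all_relabelings(n_blocks=6, n_active=3):
    """Yield every labelling with n_active blocks marked A and the rest B."""
    for idx in combinations(range(n_blocks), n_active):
        yield "".join("A" if i in idx else "B" for i in range(n_blocks))

perm_dist = {lab: mean_difference(lab, data) for lab in all_relabelings()}
t_obs = perm_dist[observed_labels]
# One-sided permutation p-value: share of relabelings at least as extreme as observed
p_value = sum(t >= t_obs for t in perm_dist.values()) / len(perm_dist)
print(f"{len(perm_dist)} relabelings, observed {t_obs:.2f}, p = {p_value:.2f}")
```

Because the observed labelling gives the largest of the 20 values, the p-value is 1/20, the smallest attainable with this design.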
44 Controlling FWE: Permutation Test
- Parametric methods
- Assume distribution of max statistic under null hypothesis
- Nonparametric methods
- Use data to find distribution of max statistic under null hypothesis
- Again, any max statistic!
45 Permutation Test: Other Statistics
- Collect max distribution
- To find threshold that controls FWE
- Consider smoothed variance t statistic
- To regularize low-df variance estimate
46 Permutation Test: Smoothed Variance t
- Collect max distribution
- To find threshold that controls FWE
- Consider smoothed variance t statistic
[Figure: t-statistic image and variance image]
47 Permutation Test: Smoothed Variance t
- Collect max distribution
- To find threshold that controls FWE
- Consider smoothed variance t statistic
[Figure: smoothed variance t-statistic image, with the mean difference and smoothed variance images]
48 Permutation Test: Strengths
- Requires only assumption of exchangeability
- Under Ho, distribution unperturbed by permutation
- Allows us to build permutation distribution
- Subjects are exchangeable
- Under Ho, each subject's A/B labels can be flipped
- fMRI scans not exchangeable under Ho
- Due to temporal autocorrelation
49 Permutation Test: Limitations
- Computational Intensity
- Analysis repeated for each relabeling
- Not so bad on modern hardware
- No analysis discussed below took more than 3 hours
- Implementation Generality
- Each experimental design type needs unique code to generate permutations
- Not so bad for population inference with t-tests
50 Permutation Test: Example
- fMRI Study of Working Memory
- 12 subjects, block design; Marshuetz et al (2000)
- Item Recognition
- Active: View five letters, 2s pause, view probe letter, respond
- Baseline: View XXXXX, 2s pause, view Y or N, respond
- Second Level RFX
- Difference image, A-B, constructed for each subject
- One sample, smoothed variance t test
51 Permutation Test: Example
- Permute!
- 2^12 = 4,096 ways to flip 12 A/B labels
- For each, note maximum of t image
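A sketch of the sign-flipping scheme follows, on synthetic data small enough to enumerate quickly: 8 hypothetical subjects (so 2^8 = 256 flips rather than 4,096) and 30 independent null voxels. It uses an ordinary one-sample t statistic, not the smoothed variance t used in the talk, and all names are made up for illustration.

```python
import itertools
import math
import random

def one_sample_t(values):
    """Ordinary one-sample t statistic for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)

def max_t(images, signs):
    """Maximum over voxels of the t image after flipping each subject's sign."""
    n_vox = len(images[0])
    return max(
        one_sample_t([s * img[v] for s, img in zip(signs, images)])
        for v in range(n_vox)
    )

def permutation_max_distribution(images):
    """Sorted maximum statistics over all 2^n sign flips of the n subjects."""
    n_subj = len(images)
    return sorted(max_t(images, signs)
                  for signs in itertools.product((1, -1), repeat=n_subj))

rng = random.Random(0)
images = [[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(8)]  # synthetic A-B images
maxima = permutation_max_distribution(images)
alpha = 0.05
u_perm = maxima[math.ceil((1 - alpha) * len(maxima)) - 1]   # 95%ile of the max distribution
observed_max = max_t(images, (1,) * len(images))
print(f"{len(maxima)} flips, FWE threshold {u_perm:.2f}, observed max {observed_max:.2f}")
```

Voxels whose observed t exceeds `u_perm` would be declared significant with FWE controlled at α, under the exchangeability assumption.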
52 Permutation Test: Example
- Compare with Bonferroni
- α = 0.05/110,776
- Compare with parametric RFT
- 110,776 2×2×2 mm voxels
- 5.1×5.8×6.9 mm FWHM smoothness
- 462.9 RESELs
53 [Figure: thresholded t11 statistic images]
- t11 Statistic, RF & Bonf. Threshold: u_RF = 9.87, u_Bonf = 9.80; 5 sig. vox.
- t11 Statistic, Nonparametric Threshold: u_Perm = 7.67; 58 sig. vox.
54 Does this Generalize? RFT vs Bonf. vs Perm.
55 RFT vs Bonf. vs Perm.
56 Reliability with Small Groups
- Consider n=50 group study
- Event-related Odd-Ball paradigm, Kiehl, et al.
- Analyze all 50
- Analyze with SPM and SnPM, find FWE thresh.
- Randomly partition into 5 groups of 10
- Analyze each with SPM & SnPM, find FWE thresh
- Compare reliability of small groups with full
- With and without variance smoothing
57 SPM t11: 5 groups of 10 vs. all 50, 5% FWE Threshold
[Figure: six thresholded maps (five groups of 10 subjects, and all 50). Thresholds shown: T > 10.93, T > 11.04, T > 11.01, T > 10.69, T > 10.10, T > 4.66]
- Subject numbers in the five groups of 10:
- 2 8 11 15 18 35 41 43 44 50
- 1 3 20 23 24 27 28 32 34 40
- 9 13 14 16 19 21 25 29 30 45
- 4 5 10 22 31 33 36 39 42 47
- 6 7 12 17 26 37 38 46 48 49
58 SnPM t: 5 groups of 10 vs. all 50, 5% FWE Threshold
[Figure: six thresholded maps (five groups of 10 subjects, and all 50). Thresholds shown: T > 7.06, T > 8.28, T > 6.3, T > 4.09, T > 6.49, T > 6.19]
- Subject numbers in the five groups of 10:
- 2 8 11 15 18 35 41 43 44 50
- 1 3 20 23 24 27 28 32 34 40
- 9 13 14 16 19 21 25 29 30 45
- 4 5 10 22 31 33 36 39 42 47
- 6 7 12 17 26 37 38 46 48 49
59 SnPM SmVar t: 5 groups of 10 vs. all 50, 5% FWE Threshold
[Figure: five thresholded maps (five groups of 10 subjects). Thresholds shown: T > 4.69, T > 5.04, T > 4.57, T > 4.84, T > 4.64]
- Subject numbers in the five groups of 10:
- 2 8 11 15 18 35 41 43 44 50
- 1 3 20 23 24 27 28 32 34 40
- 9 13 14 16 19 21 25 29 30 45
- 4 5 10 22 31 33 36 39 42 47
- 6 7 12 17 26 37 38 46 48 49
62 False Discovery Rate
- For any threshold, all voxels can be cross-classified
- Realized FDR
- rFDR = V0R/(V1R+V0R) = V0R/NR
- If NR = 0, rFDR = 0
- But only can observe NR, don't know V1R & V0R
- We control the expected rFDR
- FDR = E(rFDR)

            Accept Null  Reject Null
Null True   V0A          V0R          m0
Null False  V1A          V1R          m1
            NA           NR           V
63 False Discovery Rate: Illustration
[Figure: three images labelled Noise, Signal, and Signal+Noise]
64 [Figure: simulated images thresholded under three error-rate criteria]
- Control of Per Comparison Rate at 10%: Percentage of Null Pixels that are False Positives
- Control of Familywise Error Rate at 10%: Occurrence of Familywise Error (FWE)
- Control of False Discovery Rate at 10%: Percentage of Activated Pixels that are False Positives
65 Benjamini & Hochberg Procedure
- Select desired limit q on FDR
- Order p-values, p(1) ≤ p(2) ≤ ... ≤ p(V)
- Let r be largest i such that p(i) ≤ i/V × q
- Reject all hypotheses corresponding to p(1), ... , p(r).
- JRSS-B (1995) 57:289-300
[Figure: ordered p-values p(i) plotted against i/V (both axes from 0 to 1), with the line i/V × q]
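The procedure can be sketched in a few lines. The ten p-values below are invented for illustration and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (p-value threshold p(r), r); threshold is None when r = 0."""
    v = len(p_values)
    ordered = sorted(p_values)
    r = 0
    for i, p in enumerate(ordered, start=1):
        if p <= i / v * q:      # keep the largest i satisfying p(i) <= i/V * q
            r = i
    return (ordered[r - 1] if r else None), r

p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold, r = benjamini_hochberg(p_vals, q=0.05)
print(f"reject the {r} smallest p-values (p <= {threshold})")
```

Note that the loop keeps the largest qualifying index, so a p-value may be rejected even if some smaller-indexed p(i) lies above the line.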
66 Adaptiveness of Benjamini & Hochberg FDR
[Figure: ordered p-values p(i) against fractional index i/V]
- P-value threshold when no signal: α/V
- P-value threshold when all signal: α
67 Benjamini & Hochberg: Procedure Details
- Standard Result
- Positive Regression Dependency on Subsets
- P(X1≥c1, X2≥c2, ..., Xk≥ck | Xi=xi) is non-decreasing in xi
- Only required of null xi's
- Positive correlation between null voxels
- Positive correlation between null and signal voxels
- Special cases include
- Independence
- Multivariate Normal with all positive correlations
- Arbitrary covariance structure
- Replace q by q/c(V), c(V) = Σ_{i=1,...,V} 1/i ≈ log(V)+0.5772
- Much more stringent
- Benjamini & Yekutieli (2001). Ann. Stat. 29:1165-1188
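To see how stringent the arbitrary-covariance correction is, the sketch below evaluates c(V) exactly and with the log approximation. V = 110,776 is the voxel count from the talk's real-data example; q = 0.05 and the function name are assumptions.

```python
import math

def c_of_v(v):
    """Harmonic sum c(V) = 1 + 1/2 + ... + 1/V."""
    return sum(1.0 / i for i in range(1, v + 1))

v, q = 110_776, 0.05
c_exact = c_of_v(v)
c_approx = math.log(v) + 0.5772
q_adjusted = q / c_exact        # FDR level actually used in the BH step
print(f"c(V) = {c_exact:.4f} (approx. {c_approx:.4f}), q/c(V) = {q_adjusted:.4f}")
```

With this many voxels the effective q shrinks by a factor of about 12.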
68 Benjamini & Hochberg: Key Properties
- FDR is controlled: E(rFDR) ≤ q m0/V
- Conservative, if large fraction of nulls false
- Adaptive
- Threshold depends on amount of signal
- More signal, more small p-values, more p(i) less than i/V × q/c(V)
69-75 Controlling FDR: Varying Signal Extent
[Figure: seven panels with increasing signal extent; p and z thresholds as shown on each panel]
Panel  p         z
1      -         -
2      -         -
3      -         -
4      0.000252  3.48
5      0.001628  2.94
6      0.007157  2.45
7      0.019274  2.07
76 Controlling FDR: Benjamini & Hochberg
- Illustrating BH under dependence
- Extreme example of positive dependence
[Figure: ordered p-values p(i) plotted against i/V (both axes from 0 to 1), with the line i/V × q/c(V)]
77 Real Data: FDR Example
- Threshold
- Indep/PosDep: u = 3.83
- Arb Cov: u = 13.15
- Result
- 3,073 voxels above Indep/PosDep u
- <0.0001 minimum FDR-corrected p-value
[Figure: FDR threshold = 3.83, 3,073 voxels]
78 Conclusions
- Must account for multiplicity
- Otherwise have a fishing expedition
- FWE
- Very specific, not so sensitive
- Random Field Theory
- Great for single-subject fMRI, EEG
- Nonparametric / SnPM
- Much better power voxel-wise than RFT for small DF
- FDR
- Less specific, more sensitive
- Interpret with care!
- FP risk is over whole set of surviving voxels
79 References
- Most of this talk covered in these papers
- TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5): 419-446, 2003.
- TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001.
- CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.