Mixed%20effects%20models%20for%20a%20hierarchical%20analysis%20of%20fMRI%20data,%20and%20Bayesian%20model%20selection%20applied%20to%20activation%20detection - PowerPoint PPT Presentation

About This Presentation

Title:

Mixed%20effects%20models%20for%20a%20hierarchical%20analysis%20of%20fMRI%20data,%20and%20Bayesian%20model%20selection%20applied%20to%20activation%20detection

Description:

applied to activation detection. Keith Worsley12, Chuanhong Liao1, John Aston123, ... Basic idea: increase df by spatial smoothing (local pooling) of the sd. ... – PowerPoint PPT presentation

Number of Views:156

Avg rating:3.0/5.0

Slides: 61

Provided by: hong156

Category:

more less

Transcript and Presenter's Notes

Title: Mixed%20effects%20models%20for%20a%20hierarchical%20analysis%20of%20fMRI%20data,%20and%20Bayesian%20model%20selection%20applied%20to%20activation%20detection

1
Mixed effects models for a hierarchical analysis
of fMRI data,and Bayesian model
selectionapplied to activation detection

Keith Worsley12, Chuanhong Liao1, John Aston123,
Jean-Baptiste Poline4, Gary Duncan5, Vali Petre2,
Frank Morales6, Alan Evans2, Ed George7
1Department of Mathematics and Statistics, McGill
University,
2Brain Imaging Centre, Montreal Neurological
Institute,
3Imperial College, London,
4Service Hospitalier Frédéric Joliot, CEA, Orsay,
5Centre de Recherche en Sciences Neurologiques,
Université de Montréal,
6Cuban Neuroscience Centre
7Wharton School

2
fMRI data 120 scans, 3 scans each of hot, rest,
warm, rest, hot, rest,
T (hot warm effect) / S.d. t110 if no
effect
3
(No Transcript)
4
Results from 4 runs on the same subject
Run 1
Run 2
Run 3
Run 4
1
Effect,
0
E

i
_mag_ef
-1
0.2
Sd,
S

0.1
i
_mag_sd
0
5
T stat,
0
E
/ S
i
i
_mag_t
-5
5
MULTISTAT mixed effects linear model for
combining effects from different
runs/sessions/subjects

Ei effect for run/session/subject i
Si standard error of effect
Mixed effects model
Ei covariatesi c Si WNiF ? WNiR

from FMRILM
?
?
Usually 1, but could add group, treatment,
age, sex, ...
Random effect, due to variability from run to run
Fixed effects error, due to variability within
the same run
6
_mag_ef
very noisy sd
_mag_sd
and Tgt15.96 for Plt0.05 (corrected)
_mag_t
so no response is detected
7
REML estimation using the EM algorithm

Slow to converge (10 iterations by default).
Stable (maintains estimate ?2 gt 0 ), but
?2 biased if ?2 (random effect) is small, so
Re-parameterize the variance model
Var(Ei) Si2 ?2
(Si2 minj Sj2) (?2
minj Sj2)
Si2
?2
?2 ?2 minj Sj2 (less biased estimate)

?
?

8
Solution Spatial regularization of the sd

Basic idea increase df by spatial smoothing
(local pooling) of the sd.
Cant smooth the random effects sd directly, -
too much anatomical structure.
Instead,
random effects sd
fixed effects sd
which removes the anatomical structure before
smoothing.

? )
sd smooth ?
fixed effects sd
9

Average Si
?
Random effects sd, 3 df
Fixed effects sd, 440 df
Mixed effects sd, 100 df
0.2
0.15
0.1
0.05
0
divide
multiply
Random sd / fixed sd
Smoothed sd ratio _sdratio
1.5
random effect, sd ratio 1.3
1
0.5
10
Effective df depends on smoothing

dfratio dfrandom(2 1)
1 1 1
dfeff dfratio dffixed

FWHMratio2 3/2 FWHMdata2
e.g. dfrandom 3, dffixed 4 ? 110
440, FWHMdata 8mm

fixed effects analysis, dfeff 440
dfeff
FWHM 19mm
Target 100 df
random effects analysis, dfeff 3
FWHMratio
11
Final result 19mm smoothing, 100 effective df
_mag_ef
_ef
less noisy sd
_sd
_mag_sd
and Tgt4.93 for Plt0.05 (corrected)
_t
_mag_t
and now we can detect a response!
12
Estimating the delay of the response

Delay or latency to the peak of the HRF is
approximated by a linear combination of two
optimally chosen basis functions

delay
basis1
basis2
HRF
shift

HRF(t shift) basis1(t) w1(shift)
basis2(t) w2(shift)
Convolve bases with the stimulus, then add to
the linear model

Fit linear model, estimate w1 and w2
Equate w2 / w1 to estimates, then solve for
shift (Hensen et al., 2002)
To reduce bias when the magnitude is small, use
shift / (1 1/T2)
where T w1 / Sd(w1) is the T statistic for the
magnitude
Shrinks shift to 0 where there is little
evidence for a response.

w2 / w1
w1
w2
14
Shift of the hot stimulus
T stat for magnitude _mag_t T stat for
shift _del_t
Shift (secs) _del_ef Sd of shift
(secs) _del_sd
15
Shift of the hot stimulus
T stat for magnitude _mag_t T stat for
shift _del_t
Tgt4
T2
Shift (secs) _del_ef Sd of shift
(secs) _del_sd
1 sec
/- 0.5 sec
16
Combining shifts of the hot stimulus (Contours
are T stat for magnitude gt 4)
_ef
_del_ef
_sd
_del_sd
_t
_del_t
17
Shift of the hot stimulus
Shift (secs) _del_ef
T stat for magnitude _mag_t gt 4.93
18
(No Transcript)
19
FWHM the local smoothness of the noise
voxel size (1 correlation)1/2
FWHM (2 log
2)1/2
(If the noise is modeled as white noise smoothed
with a Gaussian kernel, this would be its FWHM)
Volume FWHM3
P-values depend on Resels
Resels
Local maximum T 4.5
Clusters above t 3.0, search volume resels 500
0.1
0.1
0.08
0.08
0.06
0.06
P value of local max
P value of cluster
0.04
0.04
0.02
0.02
0
0
0
500
1000
0
0.5
1
1.5
2
Resels of search volume
Resels of cluster
20
STAT_SUMMARY
In between use Discrete Local Maxima (DLM)
High FWHM use Random Field Theory
Low FWHM use Bonferroni
Bonferroni
4.7
4.6
4.5
True
T, 10 df
4.4
Random Field Theory
4.3
T, 20 df
Discrete Local Maxima (DLM)
4.2
Gaussianized threshold

4.1
Gaussian
4
3.9
Bonferroni, NResels
3.8
3.7
0
1
2
3
4
5
6
7
8
9
10
FWHM of smoothing kernel (voxels)
21
STAT_SUMMARY
In between use Discrete Local Maxima (DLM)
High FWHM use Random Field Theory
Low FWHM use Bonferroni
0.12
Gaussian
T, 20 df
T, 10 df
0.1
Random Field Theory
Bonferroni
0.08
DLM can ½ P-value when FWHM 3 voxels
0.06
P-value
0.04
Discrete Local Maxima
True
0.02
Bonferroni, NResels
0
0
1
2
3
4
5
6
7
8
9
10
FWHM of smoothing kernel (voxels)
22
(No Transcript)
23
STAT_SUMMARY example single run, hot-warm
Detected by BON and DLM but not by RFT
Detected by DLM, but not by BON or RFT
24
Bayesian Model Selection (thanks to Ed George)

Z-statistic SPM at voxel i is Zi N(mi,1), i
1, , n
Most of the mis are zero (unactivated voxels)
and a few are non-zero (activated voxels), but we
do not know which voxels are activated, and by
how much (mi)
This is a model selection problem, where we add
an extra model parameter (mi) for the mean of
each activated voxel
Simple Bayesian set-up
each voxel is independently active with
probability p
the activation is itself drawn independently
from a Gaussian distribution mi N(0,c)
The hyperparameter p controls the expected
proportion of activated voxels, and c controls
their expected activation

Surprise! This prior setup is related to the
canonical penalized sum-of-squares criterion
AF Sactivated voxels Zi2 F q
where - q is the number of activated voxels and
- F is a fixed penalty for adding an activated
voxel
Popular model selection criteria simply entail
- maximizing AF for a particular choice of F
- which is equivalent to thresholding the image
at vF
Some choices of F
- F 0 all voxels activated
- F 2 Mallows Cp and AIC
F log n BIC
F 2 log n RIC
P(Z gt vF) 0.05/n Bonferroni (almost same as
RIC!)

The Bayesian relationship with AF is obtained by
re-expressing the posterior of the activated
voxels, given the data
P(activated voxels Zs) a exp ( c/2(1c) AF )
where
F (1c)/c 2 log(1-p)/p log(1c)
Since p and c control the expected number and
size of the activation, the dependence of F on p
and c provides an implicit connection between the
penalty F and the sorts of models for which its
value may be appropriate

The awful truth p and c are unknown
Empirical Bayes idea use p and c that maximize
the marginal likelihood, which simplifies to
L(p,c Zs) a ?i (1-p)exp(-Zi2/2)
p(1c)-1/2exp(-Zi2/2(1c) )
This is identical to fitting a classic mixture
model with
- a probability of (1-p) that Zi N(0,1)
- a probability of p that Zi N(0,c)
- vF is the value of Z where the two components
are equal
Using these estimated values of p and c gives us
an adaptive penalty F, or equivalently a
threshold vF, that is implicitly based on the SPM
All we have to do is fit the mixture model but
does it work?

Same data as before hot warm stimulus, four
runs
- proportion of activated voxels p 0.57
- variance of activated voxels c 5.8 (sd
2.4)
- penalty F 1.59 (a bit like AIC)
- threshold vF 1.26 (?) seems a bit low

AIC vF 2 FDR (0.05) vF 2.67 BIC vF
3.21 RIC vF 4.55 Bon (0.05) vF 4.66
Null model N(0,1)
vF threshold where components are equal
Mixture
Histogram of SPM (n30786)
57 activated voxels, N(0,5.8)
43 un- activated voxels, N(0,1)
Z
29

Same data as before hot warm stimulus, one
run
- proportion of activated voxels p 0.80
- variance of activated voxels c 1.55
- penalty F -3.02 (?)
- all voxels activated !!!!!! What is going on?

AIC vF 2 FDR (0.05) vF 2.67 BIC vF
3.21 RIC vF 4.55 Bon (0.05) vF 4.66
Null model N(0,1)
components are never equal!
Histogram of SPM (n30768)
80 activated voxels, N(0,1.55)
Mixture
20 un- activated voxels, N(0,1)
Z
30
Difference with SPM

FMRISTAT second level analysis is univariate
(can only lead to a T test) SPM is multivariate
(can lead to an F test)
To do this, SPM has to assume various parameters
are global
Not clear if allowing for correlated contrasts
at second level improves inference for a single
contrast i.e. better T stats (in the end, most
stats are T stats, not F) in fact if the
correlations and models across subjects are
equal, nothing is gained
FMRISTAT uses spatial information to boost df
SPM is mass univariate

31
Tgt4.86
32
Tgt4.86
T gt 4.93 (P lt 0.05, corrected)
33
Tgt4.86
T gt 4.93 (P lt 0.05, corrected)
34
Tgt4.86
35
Conjunction Minimum Ti gt threshold
Minimum of Ti _conj
Average of Ti _mag_t
For P0.05, threshold 1.82
For P0.05, threshold 4.93
Efficiency 82
36
Efficiency optimum block design
Sd of hot stimulus
Sd of hot-warm
0.5
0.5
20
20
0.4
0.4
15
15
Optimum design
Magnitude
0.3
0.3
10
10
Optimum design
0.2
0.2
X
5
5
0.1
0.1
0
X
0
0
0
5
10
15
20
5
10
15
20
InterStimulus Interval (secs)
(secs)
(secs)
1
1
20
20
0.8
0.8
15
15
Delay
0.6
0.6
Optimum design X
Optimum design X
10
10
0.4
0.4
5
5
0.2
0.2
(Not enough signal)
(Not enough signal)
0
0
0
5
10
15
20
5
10
15
20
Stimulus Duration (secs)
37
Efficiency optimum event design
0.5
(Not enough signal)
uniform . . . . . . . . .
____ magnitudes . delays
random .. . ... .. .
0.45
concentrated
0.4
0.35
0.3
Sd of effect (secs for delays)
0.25
0.2
0.15
0.1
0.05
0
5
10
15
20
Average time between events (secs)
38
How many subjects?

Largest portion of variance comes from the last
stage i.e. combining over subjects
sdrun2 sdsess2
sdsubj2
nrun nsess nsubj nsess nsubj
nsubj
If you want to optimize total scanner time, take
more subjects.
What you do at early stages doesnt matter very
much!

Comparison
Different slice acquisition times
Drift removal
Temporal correlation
Estimation of effects
Rationale
Random effects
Map of the delay

SPM99
Adds a temporal derivative
Low frequency cosines (flat at the ends)
AR(1), global parameter, bias reduction not
necessary
Band pass filter, then least-squares, then
correction for temporal correlation
More robust,
but lower df
No regularization,
low df, no conjuncs
No

fmristat
Shifts the model
Splines
(free at the ends)
AR(p), voxel parameters, bias reduction
Pre-whiten, then least squares (no further
corrections needed)
More accurate, higher df
Regularization, high df, conjuncs
Yes

40
References

http//www.math.mcgill.ca/keith/fmristat
Worsley et al. (2002). A general statistical
analysis for fMRI data. NeuroImage, 151-15.
Liao et al. (2002). Estimating the delay of the
response in fMRI data. NeuroImage, 16593-606.

41
Functional connectivity

Measured by the correlation between residuals at
every pair of voxels (6D data!)
Local maxima are larger than all 12 neighbours
P-value can be calculated using random field
theory
Good at detecting focal connectivity, but
PCA of residuals x voxels is better at detecting
large regions of co-correlated voxels

Activation only
Correlation only
Voxel 2

Voxel 2

Voxel 1
Voxel 1

42
Correlations gt 0.7, Plt10-10 (corrected)
First Principal Component gt threshold
43
False Discovery Rate (FDR)

Benjamini and Hochberg (1995), Journal of the
Royal Statistical Society
Benjamini and Yekutieli (2001), Annals of
Statistics
Genovese et al. (2001), NeuroImage
FDR controls the expected proportion of false
positives amongst the discoveries, whereas
Bonferroni / random field theory controls the
probability of any false positives
No correction controls the proportion of false
positives in the volume

44
P lt 0.05 (uncorrected), T gt 1.64 5 of volume is
false
Signal Gaussian white noise
Signal
True
Noise
False
FDR lt 0.05, T gt 2.82 5 of discoveries is false
P lt 0.05 (corrected), T gt 4.22 5 probability of
any false
45
Comparison of thresholds

FDR depends on the ordered P-values
P1 lt P2 lt lt Pn. To control the FDR at a
0.05, find
K max i Pi lt (i/n) a, threshold the
P-values at PK
Proportion of true 1 0.1 0.01
0.001 0.0001
Threshold T 1.64 2.56 3.28
3.88 4.41
Bonferroni thresholds the P-values at a/n
Number of voxels 1 10 100 1000
10000
Threshold T 1.64 2.58 3.29
3.89 4.42
Random field theory resels volume / FHHM3
Number of resels 0 1 10
100 1000
Threshold T 1.64 2.82 3.46
4.09 4.65

46
P lt 0.05 (uncorrected), T gt 1.64 5 of volume is
false
47
FDR lt 0.05, T gt 2.67 5 of discoveries is false
48
P lt 0.05 (corrected), T gt 4.93 5 probability of
any false
49
(No Transcript)
50
PCA_IMAGE PCA of time ? space
1 exclude first frames
2 drift
3 long-range correlation or anatomical effect
remove by converting to of brain
4 signal?
51
FMRILM fits a linear model for fMRI time series
with AR(p) errors

Linear model
?
?
Yt (stimulust HRF) b driftt c errort
AR(p) errors
? ?
?
errort a1 errort-1 ap errort-p s WNt

unknown parameters
52
(No Transcript)
53
FMRIDESIGN example pain perception
54
(No Transcript)
55
FMRILM first step estimate the autocorrelation
?

AR(1) model errort a1 errort-1 s WNt
Fit the linear model using least squares
errort Yt fitted Yt
â1 Correlation ( errort , errort-1)
Estimating errorts changes their correlation
structure slightly, so â1 is slightly biased
which_stats _cor
Raw autocorrelation Smoothed 12.4mm Bias
corrected â1
-0.05 0

56
Effective df depends on smoothing

Variability in acor lowers df
Df depends
on contrast
Smoothing acor brings df back up

dfacor dfresidual(2 1)
1 1 2 acor(contrast of data)2
dfeff dfresidual dfacor
FWHMacor2 3/2 FWHMdata2

Hot-warm stimulus
Hot stimulus
FWHMdata 8.79
Residual df 110
Residual df 110
100
100
Target 100 df
Target 100 df
Contrast of data, acor 0.79
50
50
Contrast of data, acor 0.61
dfeff
dfeff
0
0
0
10
20
30
0
10
20
30
FWHM 10.3mm
FWHM 12.4mm
FWHMacor
FWHMacor
57
FMRILM second step refit the linear model
Pre-whiten Yt Yt â1 Yt-1, then fit using
least squares
Hot - warm effect, _mag_ef
Sd of effect, _mag_sd
1
0.25
0.2
0.5
0.15
0
0.1
-0.5
0.05
-1
0
which_stats _mag_ef _mag_sd _mag_t
T effect / sd, 110 df _mag_t
6
4
2
0
-2
-4
-6
58
Higher order AR model? Try AR(3) _AR
a
a
a
1
2
3
0.3
0.2
AR(1) seems to be adequate
0.1
0
has little effect on the T statistics
-0.1
AR(1)
AR(2)
AR(3)
5
0
-5
59
Non-isotropic data (spatially varying FWHM)

fMRI data is smoother in GM than WM
VBM data is highly non-isotropic
Has little effect on P-values for local maxima
(use average FWHM inside search region), but
Has a big effect on P-values for spatial extents
smooth regions ? big clusters,
rough regions ? small clusters, so
Replace cluster volume by cluster resels
volume / FWHM3

60
_fwhm
_fwhm
FWHM (mm) of scans (110 df)
FWHM (mm) of effects (3 df)
20
20
Resels1.90 P0.007
15
15
10
10
Resels0.57 P0.387
5
5
0
0
FWHM of effects (smoothed)
effects / scans FWHM (smoothed)
20
1.5
15
10
1
5
0
0.5

Write a Comment

User Comments (0)