Maxima of discretely sampled random fields, with an application to - PowerPoint PPT Presentation

About This Presentation
Title:

Maxima of discretely sampled random fields, with an application to

Description:

Maxima of discretely sampled random fields, with an application to bubbles' ... but face is only revealed through random bubbles' First trial: 'Sad' expression ... – PowerPoint PPT presentation

Number of Views:33
Avg rating:3.0/5.0
Slides: 65
Provided by: KeithW69
Category:

less

Transcript and Presenter's Notes

Title: Maxima of discretely sampled random fields, with an application to


1
Maxima of discretely sampled random fields, with
an application to bubbles
Sparse inference and large-scale multiple
comparisons
  • Keith Worsley,
  • McGill
  • Nicholas Chamandy,
  • McGill and Google
  • Jonathan Taylor,
  • Stanford and Université de Montréal
  • Frédéric Gosselin,
  • Université de Montréal
  • Philippe Schyns, Fraser Smith,
  • Glasgow

2
What is bubbles?
3
Nature (2005)
4
Subject is shown one of 40 faces chosen at random

Happy
Sad
Fearful
Neutral
5
but face is only revealed through random
bubbles
  • First trial Sad expression
  • Subject is asked the expression Neutral

  • Response
    Incorrect

75 random bubble centres
Smoothed by a Gaussian bubble
What the subject sees
Sad
6
Your turn
  • Trial 2

Subject response Fearful CORRECT
7
Your turn
  • Trial 3

Subject response Happy INCORRECT (Fearful)
8
Your turn
  • Trial 4

Subject response Happy CORRECT
9
Your turn
  • Trial 5

Subject response Fearful CORRECT
10
Your turn
  • Trial 6

Subject response Sad CORRECT
11
Your turn
  • Trial 7

Subject response Happy CORRECT
12
Your turn
  • Trial 8

Subject response Neutral CORRECT
13
Your turn
  • Trial 9

Subject response Happy CORRECT
14
Your turn
  • Trial 3000

Subject response Happy INCORRECT (Fearful)
15
Bubbles analysis
  • E.g. Fearful (3000/4750 trials)

Trial 1 2 3 4
5 6 7 750
Sum
Correct trials
Thresholded at proportion of correct
trials0.68, scaled to 0,1
Use this as a bubble mask
Proportion of correct bubbles (sum correct
bubbles) /(sum all bubbles)
16
Results
  • Mask average face
  • But are these features real or just noise?
  • Need statistics

Happy Sad
Fearful Neutral
17
Statistical analysis
  • Correlate bubbles with response (correct 1,
    incorrect 0), separately for each expression
  • Equivalent to 2-sample Z-statistic for correct
    vs. incorrect bubbles, e.g. Fearful
  • Very similar to the proportion of correct
  • bubbles

ZN(0,1) statistic
Trial 1 2 3 4
5 6 7 750
Response 0 1 1 0
1 1 1 1
18
Results
  • Thresholded at Z1.64 (P0.05)
  • Sparse inference and large-scale multiple
    comparisons - correction?

ZN(0,1) statistic
Average face Happy Sad
Fearful Neutral
19
Three methods so far
  • The set-up
  • S is a subset of a D-dimensional lattice (e.g.
    pixels)
  • Z(s) N(0,1) at most points s in S
  • Z(s) N(µ(s),1), µ(s)gt0 at a sparse set of
    points
  • Z(s1), Z(s2) are spatially correlated.
  • To control the false positive rate to a we want
    a good approximation to a P(maxS Z(s) t)
  • Bonferroni (1936)
  • Random field theory (1970s)
  • Discrete local maxima (2005, 2007)

20
P(maxS Z(s) t) 0.05
Random field theory
Bonferroni
Discrete local maxima
Z(s)
21
Random field theory Euler Characteristic (EC)
blobs - holes
Excursion set s Z(s) t for neutral face
EC 0 0 -7 -11
13 14 9 1 0
Heuristic At high thresholds t, the holes
disappear, EC 1 or 0, E(EC) P(max Z
t).
  • Exact expression for E(EC) for all thresholds,
  • E(EC) P(max Z t) is extremely accurate.

22
Random field theoryThe details
Intrinsic volumes or Minkowski functionals
23
Random field theoryThe brain mapping version
EC0(S)
Resels0(S)
Resels1(S)
EC1(S)
Resels2(S)
EC2(S)
Resels (Resolution elements)
EC densities
24
Discrete local maxima
  • Bonferroni applied to events
  • Z(s) t and Z(s) is a discrete
    local maximum i.e.
  • Z(s) t and neighbour Zs Z(s)
  • Conservative
  • If Z(s) is stationary, with
  • Cor(Z(s1),Z(s2)) ?(s1-s2),
  • all we need is
  • PZ(s) t and neighbour Zs
    Z(s)
  • a (2D1)-variate integral

Z(s2)

Z(s-1) Z(s) Z(s1)

Z(s-2)
25
Discrete local maxima Markovian trick
  • If ? is separable s(x,y),
  • ?((x,y)) ?((x,0))
    ?((0,y))
  • e.g. Gaussian spatial correlation function
  • ?((x,y)) exp(-½(x2y2)/w2)
  • Then Z(s) has a Markovian property
  • conditional on central Z(s), Zs on
  • different axes are independent
  • Z(s1) - Z(s2) Z(s)
  • So condition on Z(s)z, find
  • Pneighbour Zs z Z(s)z dPZ(sd) z
    Z(s)z
  • then take expectations over Z(s)z
  • Cuts the (2D1)-variate integral down to a
    bivariate integral

Z(s2)

Z(s-1) Z(s) Z(s1)

Z(s-2)
26
(No Transcript)
27
Results, corrected for search
  • Random field theory threshold Z3.92 (P0.05)
  • DLM threshold Z3.92 (P0.05) same
  • Bonferroni threshold Z4.87 (P0.05) nothing

ZN(0,1) statistic
Average face Happy Sad
Fearful Neutral
28
Results, corrected for search
  • FDR threshold Z
  • 4.87 3.46 3.31
    4.87
  • (Q0.05)

ZN(0,1) statistic
Average face Happy Sad
Fearful Neutral
29
Comparison
  • Bonferroni (1936)
  • Conservative
  • Accurate if spatial correlation is low
  • Simple
  • Discrete local maxima (2005, 2007)
  • Conservative
  • Accurate for all ranges of spatial correlation
  • A bit messy
  • Only easy for stationary separable Gaussian data
    on rectilinear lattices
  • Even if not separable, always seems to be
    conservative
  • Random field theory (1970s)
  • Approximation based on assuming S is continuous
  • Accurate if spatial correlation is high
  • Elegant
  • Easily extended to non-Gaussian, non-isotropic
    random fields

30
Random field theory Non-Gaussian non-iostropic
31
Referee report
  • Why bother?
  • Why not just do simulations?

32
fMRI data 120 scans, 3 scans each of hot, rest,
warm, rest, hot, rest,
T (hot warm effect) / S.d. t110 if no
effect
33
Bubbles task in fMRI scanner
  • Correlate bubbles with BOLD at every voxel
  • Calculate Z for each pair (bubble pixel, fMRI
    voxel) a 5D image of Z statistics

Trial 1 2 3 4
5 6 7 3000
fMRI
34
Discussion thresholding
  • Thresholding in advance is vital, since we cannot
    store all the 1 billion 5D Z values
  • Resels5(image Resels2146.2) (fMRI
    Resels31057.2)
  • for P0.05, threshold is t 6.22 (approx)
  • The threshold based on Gaussian RFT can be
    improved using new non-Gaussian RFT based on
    saddle-point approximations (Chamandy et al.,
    2006)
  • Model the bubbles as a smoothed Poisson point
    process
  • The improved thresholds are slightly lower, so
    more activation is detected
  • Only keep 5D local maxima
  • Z(pixel, voxel) gt Z(pixel, 6 neighbours of voxel)
  • gt Z(4 neighbours of
    pixel, voxel)

35
Discussion modeling
  • The random response is Y1 (correct) or 0
    (incorrect), or YfMRI
  • The regressors are Xjbubble mask at pixel j, j1
    240x38091200 (!)
  • Logistic regression or ordinary regression
  • logit(E(Y)) or E(Y) b0X1b1X91200b91200
  • But there are only n3000 observations (trials)
  • Instead, since regressors are independent, fit
    them one at a time
  • logit(E(Y)) or E(Y) b0Xjbj
  • However the regressors (bubbles) are random with
    a simple known distribution, so turn the problem
    around and condition on Y
  • E(Xj) c0Ycj
  • Equivalent to conditional logistic regression
    (Cox, 1962) which gives exact inference for b1
    conditional on sufficient statistics for b0
  • Cox also suggested using saddle-point
    approximations to improve accuracy of inference
  • Interactions? logit(E(Y)) or E(Y)b0X1b1X91200
    b91200X1X2b1,2

36
P(maxS Z(s) t) 0.05
Random field theory
Bonferroni
Discrete local maxima
Z(s)
37
Bayesian Model Selection (thanks to Ed George)
  • Z-statistic at voxel i is Zi N(mi,1), i 1,
    , n
  • Most of the mis are zero (unactivated voxels)
    and a few are non-zero (activated voxels), but we
    do not know which voxels are activated, and by
    how much (mi)
  • This is a model selection problem, where we add
    an extra model parameter (mi) for the mean of
    each activated voxel
  • Simple Bayesian set-up
  • each voxel is independently active with
    probability p
  • the activation is itself drawn independently
    from a Gaussian distribution mi N(0,c)
  • The hyperparameter p controls the expected
    proportion of activated voxels, and c controls
    their expected activation

38
  • Surprise! This prior setup is related to the
    canonical
  • penalized sum-of-squares criterion
  • AF Sactivated voxels Zi2 F q
  • where - q is the number of activated voxels and
  • - F is a fixed penalty for adding an activated
    voxel
  • Popular model selection criteria simply entail
  • - maximizing AF for a particular choice of F
  • - which is equivalent to thresholding the image
    at vF
  • Some choices of F
  • - F 0 all voxels activated
  • - F 2 Mallows Cp and AIC
  • F log n BIC
  • F 2 log n RIC
  • P(Z gt vF) 0.05/n Bonferroni (almost same as
    RIC!)

39
  • The Bayesian relationship with AF is obtained by
    re-expressing the posterior of the activated
    voxels, given the data
  • P(activated voxels Zs) a exp ( c/2(1c) AF )
  • where
  • F (1c)/c 2 log(1-p)/p log(1c)
  • Since p and c control the expected number and
    size of the activation, the dependence of F on p
    and c provides an implicit connection between the
    penalty F and the sorts of models for which its
    value may be appropriate

40
  • The awful truth p and c are unknown
  • Empirical Bayes idea use p and c that maximize
    the
  • marginal likelihood, which simplifies to
  • L(p,c Zs) a ?i (1-p)exp(-Zi2/2)
    p(1c)-1/2exp(-Zi2/2(1c) )
  • This is identical to fitting a classic mixture
    model with
  • - a probability of (1-p) that Zi N(0,1)
  • - a probability of p that Zi N(0,c)
  • - vF is the value of Z where the two components
    are equal
  • Using these estimated values of p and c gives us
    an adaptive penalty F, or equivalently a
    threshold vF, that is implicitly based on the SPM
  • All we have to do is fit the mixture model but
    does it work?

41
  • Same data as before hot warm stimulus, four
    runs
  • - proportion of activated voxels p 0.57
  • - variance of activated voxels c 5.8 (sd
    2.4)
  • - penalty F 1.59 (a bit like AIC)
  • - threshold vF 1.26 (?) seems a bit low

AIC vF 2 FDR (0.05) vF 2.67 BIC vF
3.21 RIC vF 4.55 Bon (0.05) vF 4.66
Null model N(0,1)
vF threshold where components are equal
Mixture
Histogram of SPM (n30786)
57 activated voxels, N(0,5.8)
43 un- activated voxels, N(0,1)
Z
42
  • Same data as before hot warm stimulus, one
    run
  • - proportion of activated voxels p 0.80
  • - variance of activated voxels c 1.55
  • - penalty F -3.02 (?)
  • - all voxels activated !!!!!! What is going on?

AIC vF 2 FDR (0.05) vF 2.67 BIC vF
3.21 RIC vF 4.55 Bon (0.05) vF 4.66
Null model N(0,1)
components are never equal!
Histogram of SPM (n30768)
80 activated voxels, N(0,1.55)
Mixture
20 un- activated voxels, N(0,1)
Z
43
MS lesions and cortical thickness
  • Idea MS lesions interrupt neuronal signals,
    causing thinning in down-stream cortex
  • Data n 425 mild MS patients

5.5
5
4.5
4
Average cortical thickness (mm)
3.5
3
2.5
Correlation -0.568, T -14.20 (423 df)
2
Charil et al, NeuroImage (2007)
1.5
0
10
20
30
40
50
60
70
80
Total lesion volume (cc)
44
MS lesions and cortical thickness at all pairs of
points
  • Dominated by total lesions and average cortical
    thickness, so remove these effects
  • Cortical thickness CT, smoothed 20mm
  • Subtract average cortical thickness
  • Lesion density LD, smoothed 10mm
  • Find partial correlation(lesion density, cortical
    thickness) removing total lesion volume
  • linear model CT-av(CT) 1 TLV LD, test for
    LD
  • Repeat of all voxels in 3D, nodes in 2D
  • 1 billion correlations, so thresholding
    essential!
  • Look for high negative correlations

45
Thresholding? Crosscorrelation random field
  • Correlation between 2 fields at 2 different
    locations, searched over all pairs of locations
  • one in R (D dimensions), one in S (E dimensions)
  • sample size n
  • MS lesion data P0.05, c0.300, T6.46

Cao Worsley, Annals of Applied Probability
(1999)
46
Cluster extent rather than peak height (Friston,
1994)
  • Choose a lower level, e.g. t3.11 (P0.001)
  • Find clusters i.e. connected components of
    excursion set
  • Measure cluster
  • extent by
  • Distribution
  • fit a quadratic
  • to the peak
  • Dbn. of maximum cluster extent
  • Bonferroni on N clusters E(EC).

Z
D1
extent
t
Peak height
s
Cao, Advances in Applied Probability (1999)
47
How do you choose the threshold t for defining
clusters?
  • If signal is focal i.e. FWHM of noise
  • Choose a high threshold
  • i.e. peak height is better
  • If signal is broad i.e. gtgtFWHM of noise
  • Choose a low threshold
  • i.e. cluster extent is better
  • Conclusion cluster extent is better for
    detecting broad signals
  • Alternative smooth data with filter that matches
    signal (Matched Filter Theorem) try range of
    filter widths scale space search correct
    using random field theory a lot of work
  • Cluster extent is easier!

48
Thresholding? Crosscorrelation random field
  • Correlation between 2 fields at 2 different
    locations, searched over all pairs of locations
  • one in R (D dimensions), one in S (E dimensions)
  • MS lesion data P0.05, c0.300, T6.46

Cao Worsley, Annals of Applied Probability
(1999)
49
Histogram
threshold
threshold
Conditional histogram scaled to same max at
each distance
threshold
threshold
50
The details
51
2
Tube(S,r)
r
S
52

3

53

B
A
54

55
6
? is big
Tube?(S,r)
S
r
? is small
56
2
?
U(s1)
s1
S
Tube
S
Tube
s2
s3
U(s3)
U(s2)
57
Z2
R
r
Tube(R,r)
Z1
N2(0,I)
58

Tube(R,r)
R
z
t-r
t
z1
Tube(R,r)
r
R
R
z2
z3
59

60

61
Summary
62
(No Transcript)
63
Comparison
  • Both depend on average correct bubbles, rest is
    constant

Proportion correct bubbles Average correct
bubbles / (average all bubbles 4)
  • Z(Average correct bubbles
  • average incorrect bubbles)
  • / pooled sd

64
Random field theory results
  • For searching in D (2) dimensions, P-value of
    max Z is
  • P(maxs Z(s) t) E( Euler characteristic of s
    Z(s) t)
  • ReselsD(S) ECD(t) (
    boundary terms)
  • ReselsD(S) Image area / (bubble FWHM)2
  • 146.2 (unitless)
  • ECD(t) (4 log(2))D/2 tD-1 exp(-t2/2) /
    (2p)(D1)/2
Write a Comment
User Comments (0)
About PowerShow.com