fMRI Analysis with emphasis on the general linear model

About This Presentation

Title:

fMRI Analysis with emphasis on the general linear model

Description:

Jody Culham Department of Psychology University of Western Ontario http://www.fmri4newbies.com/ fMRI Analysis with emphasis on the general linear model – PowerPoint PPT presentation

Number of Views:248

Avg rating:3.0/5.0

Slides: 74

Provided by: jculham

Category:

more less

Transcript and Presenter's Notes

Title: fMRI Analysis with emphasis on the general linear model

1
fMRI Analysiswith emphasis on the general linear
model
Jody Culham Department of Psychology University
of Western Ontario
http//www.fmri4newbies.com/
Last Update November 29, 2008 fMRI Course,
Louvain, Belgium
2
What data do we start with

12 slices 64 voxels x 64 voxels 49,152 voxels
Each voxel has 136 time points
Therefore, for each run, we have 6.7 million data
points
We often have several runs for each experiment

3
Why do we need stats?

We could, in principle, analyze data by voxel
surfing move the cursor over different areas and
see if any of the time courses look interesting

4
Why do we need stats?

Clearly voxel surfing isnt a viable option.
Wed have to do it 49,152 times in this data set
and it would require a lot of subjective
decisions about whether activation was real
This is why we need statistics
Statistics
tell us where to look for activation that is
related to our paradigm
help us decide how likely it is that activation
is real

The lies and damned lies come in when you write
the manuscript
5
Predicted Responses

fMRI is based on the Blood Oxygenation Level
Dependent (BOLD) response
It takes about 5 sec for the blood to catch up
with the brain
We can model the predicted activation in one of
two ways
shift the boxcar by approximately 5 seconds (2
images x 2 seconds/image 4 sec, close enough)
convolve the boxcar with the hemodynamic response
to model the shape of the true function as well
as the delay

PREDICTED ACTIVATION IN OBJECT AREA
PREDICTED ACTIVATION IN VISUAL AREA
BOXCAR
6
Types of Errors
p value probability of a Type I error e.g., p
lt.05 There is less than a 5 probability that a
voxel our stats have declared as active is in
reality NOT active
Slide modified from Duke course
7
Statistical Approaches in a Nutshell

t-tests
compare activation levels between two conditions
use a time-shift to account for hemodynamic lag

correlations
model activation and see whether any areas show a
similar pattern

Fourier analysis
Do a Fourier analysis to see if there is energy
at your paradigm frequency

Fourier analysis images from Huettel, Song
McCarthy, 2004, Functional Magnetic Resonance
Imaging
8
Effect of Thresholds
r 0 0 of variance p lt 1
9
Complications

Not only is it hard to determine whats real, but
there are all sorts of statistical problems

Potential problems
data may be contaminated by artifacts (e.g., head
motion, breathing artifacts)
.05 49,152 2457 significant voxels by
chance alone
many assumptions of statistics (adjacent voxels
uncorrelated with each other adjacent time
points uncorrelated with one another) are false

Whats wrong with these data?
r .24 6 of variance p lt .05
10
The General Linear Model

T-tests, correlations and Fourier analysis work
for simple designs and were common in the early
days of imaging
The General Linear Model (GLM) is now available
in many software packages and tends to be the
analysis of choice
Why is the GLM so great?
the GLM is an overarching tool that can do
anything that the simpler tests do
you can examine any combination of contrasts
(e.g., intact - scrambled, scrambled - baseline)
with one GLM rather than multiple correlations
the GLM allows much greater flexibility for
combining data within subjects and between
subjects
it also makes it much easier to counterbalance
orders and discard bad sections of data
the GLM allows you to model things that may
account for variability in the data even though
they arent interesting in and of themselves
(e.g., head motion)
as we will see later in the course, the GLM also
allows you to use more complex designs (e.g.,
factorial designs)

11
A Simple Experiment

Lateral Occipital Complex
responds when subject views objects

Blank Screen
Intact Objects
Scrambled Objects
TIME
One volume (12 slices) every 2 seconds for 272
seconds (4 minutes, 32 seconds) Condition
changes every 16 seconds (8 volumes)
12
Whats real?
A.
C.
B.
D.
13
Whats real?

I created each of those time courses based by
taking the predictor function and adding a
variable amount of random noise

signal

noise
14
Whats real?
Which of the data sets below is more convincing?
15
Formal Statistics

Formal statistics are just doing what your
eyeball test of significance did
Estimate how likely it is that the signal is real
given how noisy the data is
confidence how likely is it that the results
could occur purely due to chance?
p value probability value
If p .03, that means there is a .03/1 or 3
chance that the results are bogus
By convention, if the probability that a result
could be due to chance is less than 5 (p lt .05),
we say that result is statistically significant
Significance depends on
signal (differences between conditions)
noise (other variability)
sample size (more time points are more
convincing)

16
Lets create a time course for one LO voxel
17
Well begin with activation
Response to Intact Objects is 4X greater than
Scrambled Objects
18
Then well assume that our modelled activation is
off because a transient component
19
Our modelled activation could be off for other
reasons

All of the following could lead to inaccurate
models
different shape of function
different width of function
different latency of function

20
Variability of HRF

Aguirre, Zarahn DEsposito, 1998
HRF shows considerable variability between
subjects

different subjects

Within subjects, responses are more consistent,
although there is still some variability between
sessions

same subject, same session
same subject, different session
21
Now lets add some variability due to head motion
22
though really motion is more complex

Head motion can be quantified with 6 parameters
given in any motion correction algorithm
x translation
y translation
z translation
xy rotation
xz rotation
yz rotation
For simplicity, Ive only included parameter one
in our model
Head motion can lead to other problems not
predictable by these parameters

23
Now lets throw in a pinch of linear drift

linear drift could arise from magnet noise (e.g.,
parts warm up) or physiological noise (e.g.,
subjects head sinks)

24
and then well add a dash of low frequency noise

low frequency noise can arise from magnet noise
or physiological noise (e.g., subjects cycles of
alertness/drowsiness)
low frequency noise would occur over a range of
frequencies but for simplicity, Ive only
included one frequency (1 cycle per run) here
Linear drift is really just very low frequency
noise

25
and our last ingredient some high frequency noise

high frequency noise can arise from magnet noise
or physiological noise (e.g., subjects breathing
rate and heartrate)

26
When we add these all together, we get a
realistic time course
27
Now lets be the experimenter

First, we take our time course and normalize it
using z scores
z (x - mean)/SD
normalization leads to data where
mean zero
SD 1

28
We create a GLM with 2 predictors
?1

?2

fMRI Signal
Residuals
Design Matrix

Betas
x
what we CAN explain
what we CANNOT explain
how much of it we CAN explain

x
our data
Statistical significance is basically a ratio of
explained to unexplained variance
29
Implementation of GLM in SPM
Many thanks to Øystein Bech Gadmar for creating
this figure in SPM
? Time

SPM represents time as going down
SPM represents predictors within the design
matrix as grayscale plots (where black low,
white high) over time
SPM includes a constant to take care of the
average activation level throughout each run

30
Effect of Beta Weights

Adjustments to the beta weights have the effect
of raising or lowering the height of the
predictor while keeping the shape constant

31
The beta weight is not a correlation

correlations measure goodness of fit regardless
of scale
beta weights are a measure of scale

32
We create a GLM with 2 predictors
when ?12

when ?20.5

Betas
x
Design Matrix
what we CAN explain
what we CANNOT explain
how much of it we CAN explain

x
our data
Statistical significance is basically a ratio of
explained to unexplained variance
33
Correlated Predictors

Where possible, avoid predictors that are highly
correlated with one another
This is why we NEVER include a baseline predictor
baseline predictor is almost completely
correlated with the sum of existing predictors

r -.53

r -.53
r -.95
Two stimulus predictors
Baseline predictor
34
Which model accounts for this data?
x ß 1
x ß 0

OR
x ß 1
x ß 0

x ß 0
x ß -1

Because the predictors are highly correlated, you
cant tell which model is best

35
Contrasts in the GLM

We can examine whether a single predictor is
significant (compared to the baseline)

36
A Real Voxel

Heres the time course from a voxel that was
significant in the Intact -Scrambled comparison

37
Maximizing Your Power
signal

noise

As we saw earlier, the GLM is basically comparing
the amount of signal to the amount of noise
How can we improve our stats?
increase signal
decrease noise
increase sample size (keep subject in longer)

38
How to Reduce Noise

If you cant get rid of an artifact, you can
include it as a predictor of no interest to
soak up variance

Example Some people include predictors from the
outcome of motion correction algorithms
Corollary Never leave out predictors for
conditions that will affect your data
39
Reducing Residuals
40
Whats this ing reviewer complaining about?!

Particularly if you do voxelwise stats, you have
to be careful to follow the accepted standards of
the field. In the past few years the following
approaches have been recommended by the stats
mavens
Correction for multiple comparisons
Random effects analyses
Correction for serial correlations

41
Correction for Multiple Comparisons
With conventional probability levels (e.g., p lt
.05) and a huge number of comparisons (e.g., 64 x
64 x 12 49,152), a lot of voxels will be
significant purely by chance e.g., .05 49,152
2458 voxels significant due to chance How can
we avoid this?

Bonferroni correction
divide desired p value by number of comparisons
Example
desired p value p lt .05
number of voxels 50,000
required p value p lt .05 / 50,000 ? p lt
.000001
quite conservative
can use less stringent values
e.g., Brain Voyager can use the number of voxels
in the cortical surface
small volume correction use more liberal
thresholds in areas of the brain which you
expected to be active

42
Correction for Multiple Comparisons

Gaussian field theory
Fundamental to SPM
If data are very smooth, then the chance of noise
points passing threshold is reduced
Can correct for the number of resolvable
elements (resels) rather than number of voxels

Slide modified from Duke course
43

3) Cluster correction
falsely activated voxels should be randomly
dispersed
set minimum cluster size to be large enough to
make it unlikely that a cluster of that size
would occur by chance
assumes that data from adjacent voxels are
uncorrelated (not true)

Test-retest reliability
Perform statistical tests on each half of the
data
The probability of a given voxel appearing in
both purely by chance is the square of the p
value used in each half
e.g., .001 x .001 .000001
Alternatively, use the first half to select an
ROI and evaluate your hypothesis in the second
half.

5) Poor mans Bonferroni
Jack up the threshold till you get rid of the
schmutz (especially in air, ventricles, white
matter)
If you have a comparison where one condition is
expected to produce much more activity than the
other, turn on both tails of the comparison
Jodys rule of thumb If ya cant trust the
negatives, can ya trust the positives?

Example MT localizer data Moving rings gt
stationary rings (orange) Stationary rings gt
moving rings (blue)
45

6) False discovery rate
Genovese et al, 2002, NeuroImage
also controls the proportion of rejected
hypotheses that are falsely rejected
standard p value (e.g., p lt .01) means that a
certain proportion of all voxels will be
significant by chance (1)
FDR uses q value (e.g., q lt .01), meaning that a
certain proportion of the activated (colored)
voxels will be significant by chance (1)
works in theory, though in practice, my lab
hasnt been that satisfied

Is the region truly active?
Yes
No
Type I Error
HIT
Yes
Does our stat test indicate that the region is
active?
Type II Error
Correct Rejection
No
46
Correction for Temporal Correlations
Statistical methods assume that each of our time
points is independent. In the case of fMRI, this
assumption is false. Even in a screen saver
scan, activation in a voxel at one time is
correlated with its activation within 6
sec This fact can artificially inflate your
statistical significance.
47
Collapsed Fixed Effects Models

assume that the experimental manipulation has
same effect in each subject
treats all data as one concatenated set with one
beta per predictor (collapsed across all
subjects)
e.g., Intact 2
Scrambled .5
strong effect in one subject can lead to
significance even when others show weak or no
effects
you can say that effect was significant in your
group of subjects but cannot generalize to other
subjects that you didnt test

48
Separate Subjects Models

one beta per predictor per subject
e.g., JC Intact 2.1
JC Scrambled 0.2
DQ Intact 1.5
DQ Scrambled 1.0
KV Intact 1.2
KV Scrambled 1.3
weights each subject equally
makes data less susceptible to effects of one
rogue subject

49
Random Effects Analysis

Typical fMRI stats test whether the differences
between conditions are significant in the sample
of subjects we have tested
Often, we want to be able to generalize to the
population as a whole including all potential
subjects, not just the ones we tested
Random effects analyses allow you to generalize
to the population you tested
Brain Voyager recommends you dont even toy with
random effects unless youve got 10 or more
subjects (and 50 is best)
Random effects analyses can really squash your
data, especially if you dont have many subjects.
Sometimes we refer to the random effects button
as the make my activation go away button.
Reviewers are now requesting random effects
analyses more frequently
You dont have to worry about it if youre using
the ROI approach because (1) presumably the ROI
has already been well-established across multiple
labs and (2) posthoc analyses of results in an
ROI approach allow you to generalize to the
population (assuming you include individual
variance)

underpaid graduate students in need of a few
bucks!
50
Fixed vs. Random Effects GLM
Sample Data 1
Sample Data 2
Subject Intact beta Scram beta Diff
1 4 3 1
2 2 3 -1
3 4 1 3
SUM 10 7 3
Subject Intact beta Scram beta Diff
1 4 3 1
2 2 1 1
3 4 3 1
SUM 10 7 3

Fixed Effects GLM cannot tell the difference
between these data sets because (Intact sum -
Scram sum) is the same in both cases
In Random Effects GLM, Data set 1 would be more
likely to be significant because all 3 subjects
show a trend in the same direction (intact gt
scrambled), whereas in data set 2, only 2 of 3
subjects show a difference in that direction

51
Autocorrelation function
To calculate the magnitude of the problem, we can
compute the autocorrelation function For a voxel
or ROI, correlate its time course with itself
shifted in time Plot these correlations by the
degree of shift
original
52
BV can correct for the autocorrelation to yield
revised (usually lower) p values
BEFORE
AFTER
53
BV Preprocessing Options
54
Temporal Smoothing of Data

We have the option in our software to temporally
smooth our data (i.e., remove high temporal
frequencies)
However, I recommended that you not use this
option
Now do you understand why?

55
Clarification

correction for temporal correlations is NOT
necessary with random effects analyses, only for
fixed effects and individual subjects analysis

WARNING UNDER CONSTRUCTION need to clarify
explanation
56
Approach 1 Voxelwise Statistics

You dont necessarily need a priori hypotheses
(though sometimes you can use less conservative
stats if you have them)
Average all of your data together in Talairach
space
Compare two (or more) conditions using precise
statistical procedures within every voxel of the
brain. Any area that passes a carefully
determined threshold is considered real.
Make a list of these areas and publish it.

This is the tricky part!
57
Voxelwise Approach Example

Malach et al., 1995, PNAS
Question Are there areas of the human brain that
are more responsive to objects than scrambled
objects
You will recognize this as what we now call an LO
localizer, but Malach was the first to identify LO

LO (red) responds more to objects, abstract
sculptures and faces than to textures, unlike
visual cortex (blue) which responds well to all
stimuli
LO activation is shown in red, behind MT
activation in green
58
The Danger of Voxelwise Approaches

This is one of two tables from a paper
Some papers publish tables of activation two
pages long
How can anyone make sense of so many areas?

Source Decety et al., 1994, Nature
59
To Localize or Not to Localize?
60
Methodological Fundamentalism
The latest review I received
61
Approach 2 Region of interest (ROI) analysis

If you are looking at a well-established area
(such as visual cortex, motor cortex, or the
lateral occipital complex), its fairly easy to
activate and identify the area
Do the stats and play with the threshold till you
get something believable in the right vicinity
based on anatomical location (e.g., sulcal
landmarks) or functional location (e.g.,
Talairach coordinates from prior studies)
Once you have found the ROI, do independent
experiments, extract the time course information
and determine whether activation differences
between conditions are significant
Because the runs that are used to generate the
area are independent from those used to test the
hypothesis, liberal statistics (p lt .05) can be
used

62
Example of ROI Approach
Culham et al., 2003, Experimental Brain
Research Does the Lateral Occipital Complex
compute object shape for grasping?
Step 1 Localize LOC
Intact Objects
Scrambled Objects
63
Example of ROI Approach
Culham et al., 2003, Experimental Brain
Research Does the Lateral Occipital Complex
compute object shape for grasping?
Step 2 Extract LOC data from experimental runs
Grasping
Reaching
NS p .35
NS p .31
64
Example of ROI Approach
Very Simple Stats
BOLD Signal Change Left Hem. LOC BOLD Signal Change Left Hem. LOC
Subject Grasping Reaching
1 0.02 0.03
2 0.19 0.08
3 0.04 0.01
4 0.10 0.32
5 1.01 -0.27
6 0.16 0.09
7 0.19 0.12
Then simply do a paired t-test to see whether the
peaks are significantly different between
conditions
Extract average peak from each subject for each
condition
NS p .35
NS p .31

Instead of using BOLD Signal Change, you can
use beta weights
You can also do a planned contrast in Brain
Voyager using a module called the ROI GLM

65
Utility of Doing Both Approaches

We also verified the result with a voxelwise
approach

Verification of no LOC activation for grasping gt
reaching even at moderate threshold (p lt .001,
uncorrected)
66
Example The Danger of ROI Approaches

Example 1 LOC may be a heterogeneous area with
subdivisions ROI analyses gloss over this
Example 2 Some experiments miss important areas
(e.g., Kanwisher et al., 1997 identified one
important face processing area -- the fusiform
face area, FFA -- but did not report a second
area that is a very important part of the face
processing network -- the occipital face area,
OFA -- because it was less robust and consistent
than the FFA.

67
Comparing the two approaches

Voxelwise Analyses
Require no prior hypotheses about areas involved
Include entire brain
Often neglect individual differences
Can lose spatial resolution with intersubject
averaging
Can produce meaningless laundry lists of areas
that are difficult to interpret
You have to be fairly stats-savvy and include all
the appropriate statistical corrections to be
certain your activation is really significant
Popular in Europe

68
Comparing the two approaches

Region of Interest (ROI) Analyses
Extraction of ROI data can be subjected to simple
stats (no need for multiple comparisons,
autocorrelation or random effects corrections)
Gives you more statistical power (e.g., p lt .05)
Hypothesis-driven
Useful when hypotheses are motivated by other
techniques (e.g., electrophysiology) in specific
brain regions
ROI is not smeared due to intersubject averaging
Important for discriminating abutting areas
(e.g., V1/V2)
Easy to analyze and interpret
Neglects other areas which may play a fundamental
role
If multiple ROIs need to be considered, you can
spend a lot of scan time collecting localizer
data (thus limiting the time available for
experimental runs)
Works best for reliable and robust areas with
unambiguous definitions
Popular in North America

69
A Proposed Resolution

There is no reason not to do BOTH ROI analyses
and voxelwise analyses
ROI analyses for well-defined key regions
Voxelwise analyses to see if other regions are
also involved
Ideally, the conclusions will not differ
If the conclusions do differ, there may be
sensible reasons
Effect in ROI but not voxelwise
perhaps region is highly variable in stereotaxic
location between subjects
perhaps voxelwise approach is not powerful enough
Effect in voxelwise but not ROI
perhaps ROI is not homogenous or is
context-specific

70
Finding the Obvious
UNDER CONSTRUCTION Need to create animation of
cards

Non-independence error
occurs when statistical tests performed are not
independent from the means used to select the
brain region

Arguments from Vul Kanwisher, book chapter in
press
71
Non-independence Error

Egregious example
Identify Area X with contrast of A gt B
Do post hoc stats showing that A is statistically
higher than B
Act surprised
More subtle example of selection bias
Identify Area X with contrast of A gt B
Do post hoc stats showing that A is statistically
higher than C and C is statistically greater than
B

UNDER CONSTRUCTION Improve examples and make
graphs to illustrate
Arguments from Vul Kanwisher, book chapter in
press
72
Hypothesis- vs. Data-Driven Approaches

Hypothesis-driven
Examples t-tests, correlations, general linear
model (GLM)
a priori model of activation is suggested
data is checked to see how closely it matches
components of the model
most commonly used approach
Data-driven
Example Independent Component Analysis (ICA)
no prior hypotheses are necessary
multivariate techniques determine the patterns
in the data that account for the most variance
across all voxels
can be used to validate a model (see if the math
comes up with the components you wouldve
predicted)
can be inspected to see if there are things
happening in your data that you didnt predict
can be used to identify confounds (e.g., head
motion)