Loading...

PPT – fMRI Analysis with emphasis on the general linear model PowerPoint presentation | free to view - id: 6b5212-ODM1O

The Adobe Flash plugin is needed to view this content

fMRI Analysiswith emphasis on the general linear

model

Jody Culham Department of Psychology University

of Western Ontario

http//www.fmri4newbies.com/

Last Update November 29, 2008 fMRI Course,

Louvain, Belgium

What data do we start with

- 12 slices 64 voxels x 64 voxels 49,152 voxels
- Each voxel has 136 time points
- Therefore, for each run, we have 6.7 million data

points - We often have several runs for each experiment

Why do we need stats?

- We could, in principle, analyze data by voxel

surfing move the cursor over different areas and

see if any of the time courses look interesting

Why do we need stats?

- Clearly voxel surfing isnt a viable option.

Wed have to do it 49,152 times in this data set

and it would require a lot of subjective

decisions about whether activation was real - This is why we need statistics
- Statistics
- tell us where to look for activation that is

related to our paradigm - help us decide how likely it is that activation

is real

The lies and damned lies come in when you write

the manuscript

Predicted Responses

- fMRI is based on the Blood Oxygenation Level

Dependent (BOLD) response - It takes about 5 sec for the blood to catch up

with the brain - We can model the predicted activation in one of

two ways - shift the boxcar by approximately 5 seconds (2

images x 2 seconds/image 4 sec, close enough) - convolve the boxcar with the hemodynamic response

to model the shape of the true function as well

as the delay

PREDICTED ACTIVATION IN OBJECT AREA

PREDICTED ACTIVATION IN VISUAL AREA

BOXCAR

Types of Errors

p value probability of a Type I error e.g., p

lt.05 There is less than a 5 probability that a

voxel our stats have declared as active is in

reality NOT active

Slide modified from Duke course

Statistical Approaches in a Nutshell

- t-tests
- compare activation levels between two conditions
- use a time-shift to account for hemodynamic lag

- correlations
- model activation and see whether any areas show a

similar pattern

- Fourier analysis
- Do a Fourier analysis to see if there is energy

at your paradigm frequency

Fourier analysis images from Huettel, Song

McCarthy, 2004, Functional Magnetic Resonance

Imaging

Effect of Thresholds

r 0 0 of variance p lt 1

Complications

- Not only is it hard to determine whats real, but

there are all sorts of statistical problems

- Potential problems
- data may be contaminated by artifacts (e.g., head

motion, breathing artifacts) - .05 49,152 2457 significant voxels by

chance alone - many assumptions of statistics (adjacent voxels

uncorrelated with each other adjacent time

points uncorrelated with one another) are false

Whats wrong with these data?

r .24 6 of variance p lt .05

The General Linear Model

- T-tests, correlations and Fourier analysis work

for simple designs and were common in the early

days of imaging - The General Linear Model (GLM) is now available

in many software packages and tends to be the

analysis of choice - Why is the GLM so great?
- the GLM is an overarching tool that can do

anything that the simpler tests do - you can examine any combination of contrasts

(e.g., intact - scrambled, scrambled - baseline)

with one GLM rather than multiple correlations - the GLM allows much greater flexibility for

combining data within subjects and between

subjects - it also makes it much easier to counterbalance

orders and discard bad sections of data - the GLM allows you to model things that may

account for variability in the data even though

they arent interesting in and of themselves

(e.g., head motion) - as we will see later in the course, the GLM also

allows you to use more complex designs (e.g.,

factorial designs)

A Simple Experiment

- Lateral Occipital Complex
- responds when subject views objects

Blank Screen

Intact Objects

Scrambled Objects

TIME

One volume (12 slices) every 2 seconds for 272

seconds (4 minutes, 32 seconds) Condition

changes every 16 seconds (8 volumes)

Whats real?

A.

C.

B.

D.

Whats real?

- I created each of those time courses based by

taking the predictor function and adding a

variable amount of random noise

signal

noise

Whats real?

Which of the data sets below is more convincing?

Formal Statistics

- Formal statistics are just doing what your

eyeball test of significance did - Estimate how likely it is that the signal is real

given how noisy the data is - confidence how likely is it that the results

could occur purely due to chance? - p value probability value
- If p .03, that means there is a .03/1 or 3

chance that the results are bogus - By convention, if the probability that a result

could be due to chance is less than 5 (p lt .05),

we say that result is statistically significant - Significance depends on
- signal (differences between conditions)
- noise (other variability)
- sample size (more time points are more

convincing)

Lets create a time course for one LO voxel

Well begin with activation

Response to Intact Objects is 4X greater than

Scrambled Objects

Then well assume that our modelled activation is

off because a transient component

Our modelled activation could be off for other

reasons

- All of the following could lead to inaccurate

models - different shape of function
- different width of function
- different latency of function

Variability of HRF

- Aguirre, Zarahn DEsposito, 1998
- HRF shows considerable variability between

subjects

different subjects

- Within subjects, responses are more consistent,

although there is still some variability between

sessions

same subject, same session

same subject, different session

Now lets add some variability due to head motion

though really motion is more complex

- Head motion can be quantified with 6 parameters

given in any motion correction algorithm - x translation
- y translation
- z translation
- xy rotation
- xz rotation
- yz rotation
- For simplicity, Ive only included parameter one

in our model - Head motion can lead to other problems not

predictable by these parameters

Now lets throw in a pinch of linear drift

- linear drift could arise from magnet noise (e.g.,

parts warm up) or physiological noise (e.g.,

subjects head sinks)

and then well add a dash of low frequency noise

- low frequency noise can arise from magnet noise

or physiological noise (e.g., subjects cycles of

alertness/drowsiness) - low frequency noise would occur over a range of

frequencies but for simplicity, Ive only

included one frequency (1 cycle per run) here - Linear drift is really just very low frequency

noise

and our last ingredient some high frequency noise

- high frequency noise can arise from magnet noise

or physiological noise (e.g., subjects breathing

rate and heartrate)

When we add these all together, we get a

realistic time course

Now lets be the experimenter

- First, we take our time course and normalize it

using z scores - z (x - mean)/SD
- normalization leads to data where
- mean zero
- SD 1

We create a GLM with 2 predictors

?1

?2

fMRI Signal

Residuals

Design Matrix

Betas

x

what we CAN explain

what we CANNOT explain

how much of it we CAN explain

x

our data

Statistical significance is basically a ratio of

explained to unexplained variance

Implementation of GLM in SPM

Many thanks to Øystein Bech Gadmar for creating

this figure in SPM

? Time

- SPM represents time as going down
- SPM represents predictors within the design

matrix as grayscale plots (where black low,

white high) over time - SPM includes a constant to take care of the

average activation level throughout each run

Effect of Beta Weights

- Adjustments to the beta weights have the effect

of raising or lowering the height of the

predictor while keeping the shape constant

The beta weight is not a correlation

- correlations measure goodness of fit regardless

of scale - beta weights are a measure of scale

We create a GLM with 2 predictors

when ?12

when ?20.5

Betas

x

Design Matrix

what we CAN explain

what we CANNOT explain

how much of it we CAN explain

x

our data

Statistical significance is basically a ratio of

explained to unexplained variance

Correlated Predictors

- Where possible, avoid predictors that are highly

correlated with one another - This is why we NEVER include a baseline predictor
- baseline predictor is almost completely

correlated with the sum of existing predictors

r -.53

r -.53

r -.95

Two stimulus predictors

Baseline predictor

Which model accounts for this data?

x ß 1

x ß 0

OR

x ß 1

x ß 0

x ß 0

x ß -1

- Because the predictors are highly correlated, you

cant tell which model is best

Contrasts in the GLM

- We can examine whether a single predictor is

significant (compared to the baseline)

A Real Voxel

- Heres the time course from a voxel that was

significant in the Intact -Scrambled comparison

Maximizing Your Power

signal

noise

- As we saw earlier, the GLM is basically comparing

the amount of signal to the amount of noise - How can we improve our stats?
- increase signal
- decrease noise
- increase sample size (keep subject in longer)

How to Reduce Noise

- If you cant get rid of an artifact, you can

include it as a predictor of no interest to

soak up variance

Example Some people include predictors from the

outcome of motion correction algorithms

Corollary Never leave out predictors for

conditions that will affect your data

Reducing Residuals

Whats this ing reviewer complaining about?!

- Particularly if you do voxelwise stats, you have

to be careful to follow the accepted standards of

the field. In the past few years the following

approaches have been recommended by the stats

mavens - Correction for multiple comparisons
- Random effects analyses
- Correction for serial correlations

Correction for Multiple Comparisons

With conventional probability levels (e.g., p lt

.05) and a huge number of comparisons (e.g., 64 x

64 x 12 49,152), a lot of voxels will be

significant purely by chance e.g., .05 49,152

2458 voxels significant due to chance How can

we avoid this?

- Bonferroni correction
- divide desired p value by number of comparisons
- Example
- desired p value p lt .05
- number of voxels 50,000
- required p value p lt .05 / 50,000 ? p lt

.000001 - quite conservative
- can use less stringent values
- e.g., Brain Voyager can use the number of voxels

in the cortical surface - small volume correction use more liberal

thresholds in areas of the brain which you

expected to be active

Correction for Multiple Comparisons

- Gaussian field theory
- Fundamental to SPM
- If data are very smooth, then the chance of noise

points passing threshold is reduced - Can correct for the number of resolvable

elements (resels) rather than number of voxels

Slide modified from Duke course

- 3) Cluster correction
- falsely activated voxels should be randomly

dispersed - set minimum cluster size to be large enough to

make it unlikely that a cluster of that size

would occur by chance - assumes that data from adjacent voxels are

uncorrelated (not true)

- Test-retest reliability
- Perform statistical tests on each half of the

data - The probability of a given voxel appearing in

both purely by chance is the square of the p

value used in each half - e.g., .001 x .001 .000001
- Alternatively, use the first half to select an

ROI and evaluate your hypothesis in the second

half.

- 5) Poor mans Bonferroni
- Jack up the threshold till you get rid of the

schmutz (especially in air, ventricles, white

matter) - If you have a comparison where one condition is

expected to produce much more activity than the

other, turn on both tails of the comparison - Jodys rule of thumb If ya cant trust the

negatives, can ya trust the positives?

Example MT localizer data Moving rings gt

stationary rings (orange) Stationary rings gt

moving rings (blue)

- 6) False discovery rate
- Genovese et al, 2002, NeuroImage
- also controls the proportion of rejected

hypotheses that are falsely rejected - standard p value (e.g., p lt .01) means that a

certain proportion of all voxels will be

significant by chance (1) - FDR uses q value (e.g., q lt .01), meaning that a

certain proportion of the activated (colored)

voxels will be significant by chance (1) - works in theory, though in practice, my lab

hasnt been that satisfied

Is the region truly active?

Yes

No

Type I Error

HIT

Yes

Does our stat test indicate that the region is

active?

Type II Error

Correct Rejection

No

Correction for Temporal Correlations

Statistical methods assume that each of our time

points is independent. In the case of fMRI, this

assumption is false. Even in a screen saver

scan, activation in a voxel at one time is

correlated with its activation within 6

sec This fact can artificially inflate your

statistical significance.

Collapsed Fixed Effects Models

- assume that the experimental manipulation has

same effect in each subject - treats all data as one concatenated set with one

beta per predictor (collapsed across all

subjects) - e.g., Intact 2
- Scrambled .5
- strong effect in one subject can lead to

significance even when others show weak or no

effects - you can say that effect was significant in your

group of subjects but cannot generalize to other

subjects that you didnt test

Separate Subjects Models

- one beta per predictor per subject
- e.g., JC Intact 2.1
- JC Scrambled 0.2
- DQ Intact 1.5
- DQ Scrambled 1.0
- KV Intact 1.2
- KV Scrambled 1.3
- weights each subject equally
- makes data less susceptible to effects of one

rogue subject

Random Effects Analysis

- Typical fMRI stats test whether the differences

between conditions are significant in the sample

of subjects we have tested - Often, we want to be able to generalize to the

population as a whole including all potential

subjects, not just the ones we tested - Random effects analyses allow you to generalize

to the population you tested - Brain Voyager recommends you dont even toy with

random effects unless youve got 10 or more

subjects (and 50 is best) - Random effects analyses can really squash your

data, especially if you dont have many subjects.

Sometimes we refer to the random effects button

as the make my activation go away button. - Reviewers are now requesting random effects

analyses more frequently - You dont have to worry about it if youre using

the ROI approach because (1) presumably the ROI

has already been well-established across multiple

labs and (2) posthoc analyses of results in an

ROI approach allow you to generalize to the

population (assuming you include individual

variance)

underpaid graduate students in need of a few

bucks!

Fixed vs. Random Effects GLM

Sample Data 1

Sample Data 2

Subject Intact beta Scram beta Diff

1 4 3 1

2 2 3 -1

3 4 1 3

SUM 10 7 3

Subject Intact beta Scram beta Diff

1 4 3 1

2 2 1 1

3 4 3 1

SUM 10 7 3

- Fixed Effects GLM cannot tell the difference

between these data sets because (Intact sum -

Scram sum) is the same in both cases - In Random Effects GLM, Data set 1 would be more

likely to be significant because all 3 subjects

show a trend in the same direction (intact gt

scrambled), whereas in data set 2, only 2 of 3

subjects show a difference in that direction

Autocorrelation function

To calculate the magnitude of the problem, we can

compute the autocorrelation function For a voxel

or ROI, correlate its time course with itself

shifted in time Plot these correlations by the

degree of shift

original

BV can correct for the autocorrelation to yield

revised (usually lower) p values

BEFORE

AFTER

BV Preprocessing Options

Temporal Smoothing of Data

- We have the option in our software to temporally

smooth our data (i.e., remove high temporal

frequencies) - However, I recommended that you not use this

option - Now do you understand why?

Clarification

- correction for temporal correlations is NOT

necessary with random effects analyses, only for

fixed effects and individual subjects analysis

WARNING UNDER CONSTRUCTION need to clarify

explanation

Approach 1 Voxelwise Statistics

- You dont necessarily need a priori hypotheses

(though sometimes you can use less conservative

stats if you have them) - Average all of your data together in Talairach

space - Compare two (or more) conditions using precise

statistical procedures within every voxel of the

brain. Any area that passes a carefully

determined threshold is considered real. - Make a list of these areas and publish it.

This is the tricky part!

Voxelwise Approach Example

- Malach et al., 1995, PNAS
- Question Are there areas of the human brain that

are more responsive to objects than scrambled

objects - You will recognize this as what we now call an LO

localizer, but Malach was the first to identify LO

LO (red) responds more to objects, abstract

sculptures and faces than to textures, unlike

visual cortex (blue) which responds well to all

stimuli

LO activation is shown in red, behind MT

activation in green

The Danger of Voxelwise Approaches

- This is one of two tables from a paper
- Some papers publish tables of activation two

pages long - How can anyone make sense of so many areas?

Source Decety et al., 1994, Nature

To Localize or Not to Localize?

Methodological Fundamentalism

The latest review I received

Approach 2 Region of interest (ROI) analysis

- If you are looking at a well-established area

(such as visual cortex, motor cortex, or the

lateral occipital complex), its fairly easy to

activate and identify the area - Do the stats and play with the threshold till you

get something believable in the right vicinity

based on anatomical location (e.g., sulcal

landmarks) or functional location (e.g.,

Talairach coordinates from prior studies) - Once you have found the ROI, do independent

experiments, extract the time course information

and determine whether activation differences

between conditions are significant - Because the runs that are used to generate the

area are independent from those used to test the

hypothesis, liberal statistics (p lt .05) can be

used

Example of ROI Approach

Culham et al., 2003, Experimental Brain

Research Does the Lateral Occipital Complex

compute object shape for grasping?

Step 1 Localize LOC

Intact Objects

Scrambled Objects

Example of ROI Approach

Culham et al., 2003, Experimental Brain

Research Does the Lateral Occipital Complex

compute object shape for grasping?

Step 2 Extract LOC data from experimental runs

Grasping

Reaching

NS p .35

NS p .31

Example of ROI Approach

Very Simple Stats

BOLD Signal Change Left Hem. LOC BOLD Signal Change Left Hem. LOC

Subject Grasping Reaching

1 0.02 0.03

2 0.19 0.08

3 0.04 0.01

4 0.10 0.32

5 1.01 -0.27

6 0.16 0.09

7 0.19 0.12

Then simply do a paired t-test to see whether the

peaks are significantly different between

conditions

Extract average peak from each subject for each

condition

NS p .35

NS p .31

- Instead of using BOLD Signal Change, you can

use beta weights - You can also do a planned contrast in Brain

Voyager using a module called the ROI GLM

Utility of Doing Both Approaches

- We also verified the result with a voxelwise

approach

Verification of no LOC activation for grasping gt

reaching even at moderate threshold (p lt .001,

uncorrected)

Example The Danger of ROI Approaches

- Example 1 LOC may be a heterogeneous area with

subdivisions ROI analyses gloss over this - Example 2 Some experiments miss important areas

(e.g., Kanwisher et al., 1997 identified one

important face processing area -- the fusiform

face area, FFA -- but did not report a second

area that is a very important part of the face

processing network -- the occipital face area,

OFA -- because it was less robust and consistent

than the FFA.

Comparing the two approaches

- Voxelwise Analyses
- Require no prior hypotheses about areas involved
- Include entire brain
- Often neglect individual differences
- Can lose spatial resolution with intersubject

averaging - Can produce meaningless laundry lists of areas

that are difficult to interpret - You have to be fairly stats-savvy and include all

the appropriate statistical corrections to be

certain your activation is really significant - Popular in Europe

Comparing the two approaches

- Region of Interest (ROI) Analyses
- Extraction of ROI data can be subjected to simple

stats (no need for multiple comparisons,

autocorrelation or random effects corrections) - Gives you more statistical power (e.g., p lt .05)
- Hypothesis-driven
- Useful when hypotheses are motivated by other

techniques (e.g., electrophysiology) in specific

brain regions - ROI is not smeared due to intersubject averaging
- Important for discriminating abutting areas

(e.g., V1/V2) - Easy to analyze and interpret
- Neglects other areas which may play a fundamental

role - If multiple ROIs need to be considered, you can

spend a lot of scan time collecting localizer

data (thus limiting the time available for

experimental runs) - Works best for reliable and robust areas with

unambiguous definitions - Popular in North America

A Proposed Resolution

- There is no reason not to do BOTH ROI analyses

and voxelwise analyses - ROI analyses for well-defined key regions
- Voxelwise analyses to see if other regions are

also involved - Ideally, the conclusions will not differ
- If the conclusions do differ, there may be

sensible reasons - Effect in ROI but not voxelwise
- perhaps region is highly variable in stereotaxic

location between subjects - perhaps voxelwise approach is not powerful enough
- Effect in voxelwise but not ROI
- perhaps ROI is not homogenous or is

context-specific

Finding the Obvious

UNDER CONSTRUCTION Need to create animation of

cards

- Non-independence error
- occurs when statistical tests performed are not

independent from the means used to select the

brain region

Arguments from Vul Kanwisher, book chapter in

press

Non-independence Error

- Egregious example
- Identify Area X with contrast of A gt B
- Do post hoc stats showing that A is statistically

higher than B - Act surprised
- More subtle example of selection bias
- Identify Area X with contrast of A gt B
- Do post hoc stats showing that A is statistically

higher than C and C is statistically greater than

B

UNDER CONSTRUCTION Improve examples and make

graphs to illustrate

Arguments from Vul Kanwisher, book chapter in

press

Hypothesis- vs. Data-Driven Approaches

- Hypothesis-driven
- Examples t-tests, correlations, general linear

model (GLM) - a priori model of activation is suggested
- data is checked to see how closely it matches

components of the model - most commonly used approach
- Data-driven
- Example Independent Component Analysis (ICA)
- no prior hypotheses are necessary
- multivariate techniques determine the patterns

in the data that account for the most variance

across all voxels - can be used to validate a model (see if the math

comes up with the components you wouldve

predicted) - can be inspected to see if there are things

happening in your data that you didnt predict - can be used to identify confounds (e.g., head

motion)

Example of ICA

- hardest part is sorting the many components into

meaningful ones vs. artifacts - fingerprints may help