Title: Circular analysis in systems neuroscience with particular attention to cross-subject correlation mapping
1Circular analysis in systems neuroscience with
particular attention to cross-subject correlation
mapping
- Nikolaus Kriegeskorte
- Laboratory of Brain and Cognition, National
Institute of Mental Health
2Collaborators
- Chris I Baker
- W Kyle Simmons
- Patrick SF Bellgowan
- Peter Bandettini
3Overview
- Part 1: General introduction to circular analysis in systems neuroscience (synopsis of Kriegeskorte et al. 2009)
- Part 2: Specific issue: selection bias in cross-subject correlation mapping (following up on Vul et al. 2009)
8assumptions
9assumptions
data
results
10Circular inference
assumptions
12How do assumptions tinge results?
Through variants of selection!
13Elimination (binary selection)
assumptions selection criteria
14Example 1: Pattern-information analysis
15Experimental design
TASK (property judgment)
Simmons et al. 2006
Animate?
Pleasant?
STIMULUS (object category)
16Pattern-information analysis
- define ROI by selecting ventral-temporal voxels for which any pairwise condition contrast is significant at p<.001 (uncorr.)
- perform nearest-neighbor classification based on activity-pattern correlation
- use odd runs for training and even runs for testing
17Results
stimulus (object category)
task (judged property)
decoding accuracy
chance level
18(Figure panels: fMRI data; data from Gaussian random generator. Conditions: using all data to select ROI voxels; using only training data to select ROI voxels.)
...but we used cleanly independent training and test data!
19Conclusion for pattern-information analysis
- The test data must not be used in either...
- training a classifier (continuous weighting) or
- defining the ROI (binary weighting)
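The point of this example can be reproduced with pure noise. The following is a minimal sketch, not the analysis used in the talk: it assumes two conditions, eight runs, Gaussian noise data, and a top-k voxel selection by |t| in place of the p<.001 threshold. All function names and parameter values are hypothetical.

```python
import numpy as np

def paired_t(data):
    """t-statistic of the condition contrast A - B across runs.
    data has shape (runs, 2 conditions, voxels)."""
    diff = data[:, 0, :] - data[:, 1, :]
    n_runs = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n_runs))

def decode_accuracy(train, test):
    """Nearest-neighbor classification by pattern correlation.
    train and test have shape (2 conditions, voxels)."""
    hits = 0
    for cond in range(2):
        r = [np.corrcoef(test[cond], train[c])[0, 1] for c in range(2)]
        hits += int(np.argmax(r) == cond)
    return hits / 2

def simulate(select_on_all_data, n_sims=200, n_runs=8,
             n_voxels=2000, n_select=50, seed=0):
    """Mean decoding accuracy on noise data (true accuracy is chance, 0.5)."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_sims):
        data = rng.standard_normal((n_runs, 2, n_voxels))
        # Odd runs (1st, 3rd, ...) for training, even runs for testing.
        train_runs, test_runs = data[0::2], data[1::2]
        # Circular variant: the test runs also take part in ROI definition.
        selection_data = data if select_on_all_data else train_runs
        roi = np.argsort(np.abs(paired_t(selection_data)))[-n_select:]
        train = train_runs.mean(axis=0)[:, roi]
        test = test_runs.mean(axis=0)[:, roi]
        accuracies.append(decode_accuracy(train, test))
    return float(np.mean(accuracies))
```

Calling simulate(True) defines the ROI from all data, simulate(False) from the training runs only. Under these assumptions only the second variant should stay near chance level.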
20Data selection is key to many conventional
analyses.
- Can it entail similar biases in other contexts?
21Example 2: Regional activation analysis
22ROI definition is affected by noise
independent ROI
overfitted ROI
true region
overestimated effect
ROI-average activation
23Data sorting
assumptions sorting criteria
24Set-average tuning curves
...for data sorted by tuning
response
stimulus parameter (e.g. orientation)
noise data
25Set-average activation profiles
...for data sorted by activation
noise data
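Sorting noise data by tuning and then averaging within each set produces apparent tuning curves. A minimal sketch, assuming 4000 hypothetical units, 8 stimulus values, and two independent halves of Gaussian noise data (none of these numbers come from the slides):

```python
import numpy as np

def set_average_curves(sorting_data, measured_data):
    """Sort units by preferred stimulus in sorting_data, then average
    measured_data within each set. Row k is the set-average curve of the
    units that prefer stimulus k."""
    n_stimuli = sorting_data.shape[1]
    preferred = sorting_data.argmax(axis=1)
    return np.stack([measured_data[preferred == k].mean(axis=0)
                     for k in range(n_stimuli)])

def peak_advantage(curves):
    """Mean response at the preferred stimulus minus mean response elsewhere."""
    n = curves.shape[0]
    peak = np.trace(curves) / n
    others = (curves.sum() - np.trace(curves)) / (n * (n - 1))
    return float(peak - others)

rng = np.random.default_rng(0)
n_units, n_stimuli = 4000, 8
half1 = rng.standard_normal((n_units, n_stimuli))  # noise only, no tuning
half2 = rng.standard_normal((n_units, n_stimuli))  # independent noise

circular = set_average_curves(half1, half1)     # sorted and averaged on same data
independent = set_average_curves(half1, half2)  # sorted on one half, averaged on the other
```

peak_advantage(circular) is clearly positive although the data contain no tuning, while peak_advantage(independent) should be close to zero.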
26To avoid selection bias, we can...
- ...perform a nonselective analysis, e.g. whole-brain mapping (no ROI analysis)
- OR
- ...make sure that selection and results statistics are independent under the null hypothesis,
- because they are either
- inherently independent, e.g. independent contrasts
- or computed on independent data
27Does selection by an orthogonal contrast vector
ensure unbiased analysis?
- ROI-definition contrast: A+B
- ROI-average analysis contrast: A-B
c_selection = [1 1]^T
c_test = [1 -1]^T
orthogonal contrast vectors?
28Does selection by an orthogonal contrast vector
ensure unbiased analysis?
contrast vector
No, there can still be bias.
The design and noise dependencies matter.
design
noise dependencies
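Why orthogonal contrast vectors are not enough can be checked numerically. The covariance of two ordinary-least-squares contrast estimates depends on the design matrix and the noise covariance, not only on the contrast vectors. The sketch below assumes a simple two-condition indicator design. The function names, trial counts, and noise variances are illustrative assumptions.

```python
import numpy as np

def design(n_a, n_b):
    """Indicator design matrix with n_a trials of A followed by n_b trials of B."""
    X = np.zeros((n_a + n_b, 2))
    X[:n_a, 0] = 1.0
    X[n_a:, 1] = 1.0
    return X

def contrast_covariance(X, c1, c2, noise_cov=None):
    """Covariance between the OLS contrast estimates c1'b and c2'b.
    noise_cov is the trial-by-trial noise covariance (identity if None)."""
    n = X.shape[0]
    V = np.eye(n) if noise_cov is None else noise_cov
    pinv = np.linalg.pinv(X)      # equals (X'X)^-1 X' for full-rank X
    beta_cov = pinv @ V @ pinv.T  # covariance of the beta estimates
    return float(c1 @ beta_cov @ c2)

c_selection = np.array([1.0, 1.0])   # A+B
c_test = np.array([1.0, -1.0])       # A-B, orthogonal to c_selection

balanced = contrast_covariance(design(20, 20), c_selection, c_test)
unbalanced = contrast_covariance(design(10, 30), c_selection, c_test)

# Balanced design, but noise variance 4 in A trials and 1 in B trials.
V = np.diag(np.r_[np.full(20, 4.0), np.full(20, 1.0)])
unequal_noise = contrast_covariance(design(20, 20), c_selection, c_test, V)
```

Only the balanced design with white noise gives zero covariance. In the other two cases the selection and test statistics are dependent under the null hypothesis, so selecting by A+B can bias A-B.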
29Circular analysis
- highly sensitive
- widely accepted (examples in all high-impact journals)
- doesn't require independent data sets
- grants scientists independence from the data
- allows smooth blending of blind faith and
empiricism
31Circular analysis
Pros
- highly sensitive
- widely accepted (examples in all high-impact journals)
- doesn't require independent data sets
- grants scientists independence from the data
- allows smooth blending of blind faith and
empiricism
- the error that beautifies results
- confirms even incorrect hypotheses
- improves chances of high-impact publication
Cons
- can't think of any right now
32Part 2: Specific issue: selection bias in cross-subject correlation mapping (following up on Vul et al. 2009)
33Motivation
- Vul et al. (2009) posed a puzzle
- Why are the cross-subject correlations found in brain mapping so high?
- Selection bias is one piece of the puzzle.
- But there are more pieces and we have yet to put
them all together.
34Overview
- List and discuss six pieces of the puzzle.
- (They don't all point in the same direction!)
- Suggest some guidelines for good practice.
35Six pieces: synopsis
- Cross-subject correlation estimates are very noisy.
- Bin or within-subject averaging legitimately increases correlations.
- Selecting among noisy estimates yields large biases.
- False-positive regions are highly likely for a whole-brain mapping thresholded at p<.001, uncorrected.
- Reported correlations are high, but not highly significant.
- Studies have low power for finding realistic correlations in the brain if multiple testing is appropriately accounted for.
36Vul et al. 2009
$\rho_{\text{population}} = \rho_{\text{noise-free}} \cdot \sqrt{\text{reliability}_1 \cdot \text{reliability}_2}$
The geometric mean of the reliability is an upper
bound on the population correlation.
The reliabilities provide no bound on the sample
correlation.
37Sample correlations across small numbers of subjects are very noisy estimates of population correlations.
Piece 1
380.65
40Cross-subject correlation estimates are very noisy
95% confidence interval
correlation
10 subjects
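How wide such an interval is can be sketched with the Fisher z-transform. This is a standard approximation and an assumption here; the slides do not say how their interval was computed. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n_subjects, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                       # Fisher z-transform of r
    se = 1.0 / math.sqrt(n_subjects - 3)    # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

low, high = fisher_ci(0.65, 10)
```

For a sample correlation of 0.65 across 10 subjects, this approximation gives an interval of roughly 0.03 to 0.91.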
42The more we average (reducing noise but not signal), the higher correlations become.
Piece 2
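The effect of averaging can be illustrated with simulated data. This sketch assumes 20000 hypothetical trials with a weak linear relation (slope 0.2, unit noise) and 10 equal-sized bins formed by sorting on x. None of these values come from the slides.

```python
import numpy as np

def binned_correlation(x, y, n_bins):
    """Sort trials by x, average x and y within bins, correlate the bin means."""
    order = np.argsort(x)
    x_bins = np.array([chunk.mean() for chunk in np.array_split(x[order], n_bins)])
    y_bins = np.array([chunk.mean() for chunk in np.array_split(y[order], n_bins)])
    return float(np.corrcoef(x_bins, y_bins)[0, 1])

rng = np.random.default_rng(0)
n_trials = 20000
x = rng.standard_normal(n_trials)
y = 0.2 * x + rng.standard_normal(n_trials)   # weak trial-level relation

r_trials = float(np.corrcoef(x, y)[0, 1])      # correlation across single trials
r_binned = binned_correlation(x, y, n_bins=10) # correlation across bin averages
```

Averaging within bins removes noise but keeps the signal, so r_binned comes out far higher than r_trials even though the underlying relation is unchanged.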
43Bin-averaging inflates correlations
46- Subjects are like bins...
- For each subject, all data is averaged to give one number.
- Take-home message
- Cross-subject correlation estimates are expected to be...
- high (averaging all data for each subject)
- noisy (low number of subjects)
So what's Ed fussing about? We don't need selection bias to explain the high correlations, right?
47Selecting the maximum among noisy estimates yields large selection biases.
Piece 3
48Expected maximum correlation selected among null regions
expected maximum correlation
bias
16 subjects
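The expected maximum correlation among null regions can be estimated by simulation. A minimal sketch, assuming independent Gaussian noise regions and a Gaussian behavioral measure; the function names, region counts, and simulation count are illustrative assumptions.

```python
import numpy as np

def null_correlations(n_subjects, n_regions, rng):
    """Cross-subject Pearson correlations between a noise behavioral
    measure and n_regions independent noise regions."""
    behavior = rng.standard_normal(n_subjects)
    brain = rng.standard_normal((n_subjects, n_regions))
    zb = (behavior - behavior.mean()) / behavior.std()
    zr = (brain - brain.mean(axis=0)) / brain.std(axis=0)
    return zb @ zr / n_subjects

def expected_max_correlation(n_subjects, n_regions, n_sims=500, seed=0):
    """Average, over simulations, of the largest null correlation."""
    rng = np.random.default_rng(seed)
    maxima = [null_correlations(n_subjects, n_regions, rng).max()
              for _ in range(n_sims)]
    return float(np.mean(maxima))
```

With 16 subjects, expected_max_correlation(16, 1) is near zero, whereas expected_max_correlation(16, 500) is large although every population correlation is zero. The difference is the selection bias.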
49False-positive regions are likely to be found in whole-brain mapping using p<.001, uncorrected.
Piece 4
50Mapping with p<.001, uncorrected
Global null hypothesis is true (population correlation = 0 in all brain locations)
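Under the global null hypothesis, the chance of at least one false-positive location follows from the threshold and the number of independent tests. A minimal sketch, assuming independent tests; the function names are hypothetical and the number of independent locations in a real brain map is not known from the slides.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def expected_false_positives(alpha, n_tests):
    """Expected number of false-positive tests under the global null."""
    return alpha * n_tests
```

The probability grows quickly with the number of independent locations, for example when comparing familywise_error_rate(0.001, 500) with familywise_error_rate(0.001, 5000).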
51Reported correlations are high, but not highly significant.
Piece 5
52Reported correlations are high, but not highly significant
p<0.00001 p<0.001 p<0.01 p<0.05
54Reported correlations are high, but not highly significant
What correlations would we expect under the global null hypothesis?
(assuming each study reports the maximum of 500 independent brain locations)
p<0.00001 p<0.001 p<0.01 p<0.05
56Most of the studies have low power for finding realistic correlations with whole-brain mapping if multiple testing is appropriately accounted for.
Piece 6
see also Yarkoni 2009
57Numbers of subjects in studies reviewed by Vul et al. (2009)
(Histogram: number of correlation estimates vs. number of subjects; axis ticks at 4, 8, 16, 36, 60, 100)
58In order to find a single region with
across-subject correlation of 0.7 in the brain...
...we would need about 36 subjects
16 subjects
60- Take-home message
- Whole-brain cross-subject correlation mapping with 16 subjects does not work.
Need at least twice as many subjects.
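The dependence of power on the number of subjects can be sketched with the Fisher z approximation and a Bonferroni correction. This is an illustration under stated assumptions (500 independent tests, two-sided familywise alpha of 0.05, hypothetical function name). It is not the calculation behind the slides and need not reproduce their numbers exactly.

```python
import math
from statistics import NormalDist

def power_bonferroni(rho, n_subjects, n_tests, alpha=0.05):
    """Approximate power to detect a population correlation rho at one
    location, with a two-sided Bonferroni-corrected threshold."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / (2.0 * n_tests))
    # Expected Fisher-z statistic in units of its standard error.
    shift = math.atanh(rho) * math.sqrt(n_subjects - 3)
    # The opposite tail contributes negligibly and is ignored.
    return norm.cdf(shift - z_crit)

power_16 = power_bonferroni(0.7, 16, n_tests=500)
power_36 = power_bonferroni(0.7, 36, n_tests=500)
```

Under these assumptions power is low with 16 subjects and substantially higher with 36.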
61Conclusions
- Unless much larger numbers of subjects are used, whole-brain cross-subject correlation mapping suffers from either
- very low power to detect true regions (if we carefully correct for multiple comparisons)
- very high rates of false-positive regions (otherwise)
- If analysis is circular, selection bias is expected to be high here (because selection occurs among noisy estimates).
...in other words, it doesn't work.
62Suggestions
- Design study to have enough power to detect realistic correlations. (Need either anatomical restrictions or large numbers of subjects.)
- Consider studying trial-to-trial rather than subject-to-subject effects.
- Correct for multiple testing to avoid false positives.
- Avoid circularity: Use leave-one-subject-out procedure to estimate regional cross-subject correlations.
- Report correlation estimates with error bars.
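One way to read the leave-one-subject-out suggestion is sketched below. It is an assumption about the procedure, not the author's implementation: for each subject, the peak region is selected using all other subjects, and the held-out subject's value in that region is recorded. The correlation is then computed across the held-out values. The data are pure noise and all names and sizes are hypothetical.

```python
import numpy as np

def correlations(behavior, brain):
    """Cross-subject Pearson correlation of behavior with every region."""
    zb = (behavior - behavior.mean()) / behavior.std()
    zr = (brain - brain.mean(axis=0)) / brain.std(axis=0)
    return zb @ zr / len(behavior)

def circular_estimate(behavior, brain):
    """Select the peak region and report its correlation on the same data."""
    return float(correlations(behavior, brain).max())

def loso_estimate(behavior, brain):
    """Select the peak region without subject i, then use subject i's
    value in that region; correlate held-out values with behavior."""
    n = len(behavior)
    held_out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        region = np.argmax(correlations(behavior[keep], brain[keep]))
        held_out[i] = brain[i, region]
    return float(np.corrcoef(behavior, held_out)[0, 1])

def simulate(n_sims=100, n_subjects=16, n_regions=500, seed=0):
    """Mean circular and leave-one-subject-out estimates on null data."""
    rng = np.random.default_rng(seed)
    circ, loso = [], []
    for _ in range(n_sims):
        behavior = rng.standard_normal(n_subjects)
        brain = rng.standard_normal((n_subjects, n_regions))
        circ.append(circular_estimate(behavior, brain))
        loso.append(loso_estimate(behavior, brain))
    return float(np.mean(circ)), float(np.mean(loso))
```

On null data the circular estimate averages to a high correlation, while the leave-one-subject-out estimate averages close to zero. Single estimates remain noisy with 16 subjects, which is why error bars are still needed.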