1
Searching for structure in random field data
  • Keith J. Worsley (1,2), Thomas W. Yee (3),
    Russell B. Millar (3)
  • (1) Department of Mathematics and Statistics,
    McGill University, (2) McConnell Brain Imaging
    Centre, Montreal Neurological Institute,
    Montreal, Canada, and (3) Department of
    Statistics, University of Auckland, New Zealand
  • www.math.mcgill.ca/keith

2
What is Data Mining?
  • The June 26, 2000, issue of TIME predicted that
    one of the 10 hottest jobs of the 21st century
    will be Data Mining:
  • "research gurus will be on hand to extract
    useful tidbits from mountains of data,
    pinpointing behaviour patterns for marketers and
    epidemiologists alike."

3
Some definitions
  • Data mining is the process of selecting,
    exploring, and modeling large amounts of data to
    uncover previously unknown patterns for business
    advantage (SAS 1998 Annual Report, p. 51)
  • Data mining is the nontrivial process of
    identifying valid, novel, potentially useful, and
    ultimately understandable patterns in data
    (Fayyad)
  • Data mining is the process of discovering
    advantageous patterns in data (John)
  • Data mining is the computer automated exploratory
    data analysis of (usually) large complex data
    sets (Friedman, 1998)
  • Data mining is the search for valuable
    information in large volumes of data (Weiss and
    Indurkhya, 1998)
  • In contrast, Statistics is the science of
    collecting, organizing and presenting data.

4
Why is it called Data Mining?
  • Plentiful data can be mined for nuggets of gold
    (i.e. truth /insight/knowledge) by sifting
    through vast amounts of raw data.
  • Some statisticians have criticized it as "data
    dredging" or "a fishing expedition" in search of
    publishable P-values, or "torturing the data
    until it confesses".
  • Many DM methods are heuristic, complex, computer
    intensive, so their statistical properties are
    usually not tractable.
  • The focus of DM is often prediction and not
    statistical inference.
  • "I understand mining to be a very carefully
    planned search for valuables hidden out of
    sight, not a haphazard ramble. Mining is thus
    rewarding, but, of course, a dangerous
    activity." (D.R. Cox, in the discussion of
    Chatfield, 1995)

5
Striking fool's gold
  • The Bible Code, a best-selling book by Michael
    Drosnin, claims to find hidden messages in the
    Bible about dinosaurs, Bill Clinton, the Rabin
    assassination etc. from searches of arrays of
    letters
  • In 1992, ProCyte Corp. was dismayed when a newly
    developed drug, lamin, failed to promote general
    healing of diabetic ulcer wounds. So the company
    searched through subsets of data and found that
    lamin appeared to work on certain foot wounds.
    But that was a statistical fluke, as it turned
    out after an expensive clinical trial. Denied
    drug status, lamin is now sold as a wound
    dressing.

6
Confirming vs. Discovering
  • There are two types of DM:
  • Hypothesis testing (aka the top-down approach)
  • Knowledge Discovery in Databases (KDD)
    (aka the bottom-up approach)
  • Directed KDD tries to explain the value of some
    particular variable in terms of other variables
  • Undirected KDD identifies patterns in the data
  • Undirected KDD recognizes relationships in the
    data; directed KDD explains those relationships
    once they have been found

7
Mining the miners
  • DM so far has been largely a commercial
    enterprise. As in most gold rushes of the past,
    the goal is to mine the miners. The largest
    profits are made by selling the tools to the
    miners, rather than in doing the actual mining
  • Hardware manufacturers emphasize high
    computational requirements of DM.
  • Software developers emphasize competitive edge:
    "Your competitor is doing it, so you had better
    keep up."

8
Some commercial software
  • SAS Enterprise Miner
  • SPSS Clementine, Neural Connection and
    AnswerTree
  • IBM Intelligent Miner
  • SGI MineSet
  • NeoVista Software ASIC
  • Mathsoft S-PLUS (for small data sets)

9
Some methods
  • Hypothesis testing: regression, analysis of
    variance, time series analysis.
  • Directed KDD: classification, discrimination,
    structural equation modeling, supervised neural
    networks.
  • Undirected KDD: cluster analysis, tree methods
    (AID, CHAID, CART), principal components analysis
    (PCA), independent components analysis (ICA),
    unsupervised neural networks.

10
Allied fields
  • Exploratory Data Analysis (EDA): Tukey defined
    statistics in terms of problems rather than
    tools.
  • Informatics: research on, development of, and
    use of technological, sociological, and
    organizational tools and applications for the
    dynamic acquisition, indexing, dissemination,
    storage, querying, retrieval, visualization,
    integration, analysis, synthesis, sharing (which
    includes electronic means of collaboration), and
    publication of data such that economic and other
    benefits may be derived from the information by
    users of all sections of society.
  • Pattern recognition: given some examples of
    complex signals and the correct decisions for
    them, make decisions automatically for a stream
    of future examples, e.g. identify plants or
    tumors, or decide to buy or sell stocks.
  • Machine learning: "the study of computer
    algorithms that improve automatically through
    experience. Applications range from data mining
    programs that discover rules in large data sets,
    to information filtering systems that
    automatically learn users' interests." (Mitchell,
    1997)
  • Meta-analysis: the statistical analysis of a
    large collection of analysis results from
    individual studies for the purpose of integrating
    the findings.

11
Brain mapping data
  • We have huge databases of brain images (MRI,
    fMRI, PET, EEG, MEG, ...) together with patient
    information (age, sex, psychological tests,
    disease, genotype, ...)
  • The novelty is that the image variables are 3D
    images rather than single numbers (such as blood
    pressure, cholesterol level, ...)
  • These images can themselves be mined for
    interesting information, e.g. peaks or clusters
    of activated regions

12
Some data mining tools already used in brain
mapping
  • Regression, analysis of variance, time series
  • Cluster analysis (e.g. clustering of fMRI time
    courses)
  • PCA and ICA of the voxels × scans matrix
  • Structural equation modeling to analyze
    connectivity
  • Pattern recognition to segment gray/white/CSF
  • Meta-analysis to combine locations of activation
    from different studies

13
Tree methods Automatic Interaction Detection
(AID)
  • Morgan, J.N. and Sonquist, J.A. (1963). Problems
    in the analysis of survey data, and a proposal.
    Journal of the American Statistical Association,
    58, 415-434.
  • Kass, G.V. (1980). An exploratory technique for
    investigating large quantities of categorical
    data. Applied Statistics, 29, 119-127.
  • Worsley, K.J. (1978). Significance testing in
    Automatic Interaction Detection (AID). PhD
    Thesis, University of Auckland.

14
How AID works
  • Split observations into two groups according to
    the values of a predictor
  • Two types of predictors:
  • Monotonic: split by thresholding at some value x
    (predictor ≤ x vs. predictor > x)
  • Free: split into any two subsets, e.g. if the
    predictor takes values x1, ..., x7:
    {x1, x5, x6} vs. {x2, x3, x4, x7}
  • Choose the split that maximizes a test statistic
    for the difference in the dependent or target
    variable
  • Repeat on the two subgroups until some stopping
    criterion is reached (split is not significant,
    or subgroup size is too small)
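The split search above can be sketched in Python. This is a minimal sketch: the function names are mine, and the slides leave the test statistic generic, so I use a two-sample t statistic as one reasonable choice. Note the free-split search is exponential in the number of categories, which is why AID-family programs restrict it.

```python
import itertools
import numpy as np

def t_stat(y_left, y_right):
    # Two-sample t statistic for the difference in target means (pooled variance)
    n1, n2 = len(y_left), len(y_right)
    if n1 < 2 or n2 < 2:
        return 0.0
    v1, v2 = y_left.var(ddof=1), y_right.var(ddof=1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return abs(y_left.mean() - y_right.mean()) / np.sqrt(pooled * (1 / n1 + 1 / n2))

def best_split(x, y, monotonic=True):
    """Find the split of predictor x that maximizes the t statistic for y."""
    x, y = np.asarray(x), np.asarray(y)
    best = (0.0, None)
    if monotonic:
        # Monotonic predictor: split by thresholding at each observed value
        for c in np.unique(x)[:-1]:
            t = t_stat(y[x <= c], y[x > c])
            if t > best[0]:
                best = (t, ('<=', c))
    else:
        # Free predictor: split the observed categories into any two subsets
        cats = list(np.unique(x))
        for r in range(1, len(cats)):
            for subset in itertools.combinations(cats, r):
                inside = np.isin(x, subset)
                t = t_stat(y[inside], y[~inside])
                if t > best[0]:
                    best = (t, ('in', set(subset)))
    return best
```

Applying this recursively to each subgroup, with a stopping rule on significance and subgroup size, gives the AID tree.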

15
SPSS example: credit risk data
  • Dependent or target variable, plus predictors of
    types M M F F M M F F M M
  • M: monotonic (split by thresholding); F: free
    (split into any two subsets)
19
Brain mapping example: cortical thickness
  • Dependent or target: Sex; predictors (all
    monotonic): cortical thickness at 40962 nodes

    Subject  Node1  Node2  Node3  Node4  ...  Node40962  Sex
    1         3.73   3.05   3.93   2.30  ...       1.59    m
    2         2.95   1.17   3.33   2.75  ...       1.03    f
    3         2.30   1.23   2.56   1.20  ...       1.46    f
    4         2.64   2.19   2.57   2.25  ...       1.29    m
    5         2.39   2.76   2.51   2.82  ...       1.02    f
    6         3.26   1.85   3.31   1.70  ...       1.65    f
    7         2.68   2.52   3.23   2.30  ...       1.47    m
    8         3.60   3.66   2.90   2.25  ...       1.79    m
    9         3.27   1.43   2.88   1.81  ...       2.14    f
    ...
    321       4.10   2.67   2.83   1.78  ...       1.70    f
20
Misclassification matrix: cortical thickness

                        Actual category
                        Male    Female
    Predicted  Male      145        18
    category   Female     18       140

22
fMRI data: 120 scans, 3 scans each of hot, rest,
warm, rest, hot, rest, ...
T = (hot - warm effect) / S.d. ~ t110 if no effect
23
Brain mapping example: fMRI
  • Dependent or target: Stimulus; predictors (all
    monotonic): fMRI response at 30786 voxels

    Frame  Voxel1  Voxel2  Voxel3  Voxel4  ...  Voxel30786  Stimulus
    1        1.1     1.66    1.53    0.77  ...       -0.12   hot
    2       -0.59    0.23    0.38   -0.43  ...       -1.73   hot
    3        1.06    1.57    1.56    1.14  ...        0.64   hot
    4        1.63    1.79    0.88   -0.22  ...       -0.07   hot
    5        2.3     1.96    1.41    1.33  ...        1.76   hot
    6        1.27    1.36    0.73    0.24  ...        1.22   warm
    7        1.18    1.33    1.35    1.3   ...        0.88   warm
    8        0.98    0.9     0.47    0.18  ...        0.6    warm
    9        1.46    1.25    0.77    0.73  ...        1.3    warm
    10       0.07    0.7     1.29    1.96  ...        2.04   warm
    11       0.39    0.68    1.13    1.81  ...        1.8    warm
    12       0.04   -0.04   -0.18    0.37  ...        1.63   hot
    13      -0.06    0.2     0.29    0.49  ...        0.7    hot
    14      -0.48   -0.26   -0.19   -0.16  ...       -0.42   hot
    15      -0.09   -0.39   -0.84   -0.94  ...       -0.68   hot
    16      -0.24    0.02    0.51    1.2   ...        1.38   hot
    17      -1.52   -1.11   -1.44   -1.88  ...       -1.11   hot
    18      -0.07    0.1    -0.07   -0.24  ...        0.17   warm
    19      -1.4    -0.57    0.01    0.3   ...        0.41   warm
    ...
    117     -0.01    0.5     0.74    0.83  ...        0.99   warm
24
Misclassification matrix: fMRI

                        Actual category
                        Hot     Warm
    Predicted  Hot        51        1
    category   Warm        7       58

25
Splitting the SPM itself
Dependent or target: T statistic
Predictors: x, y, z (predictor type: ?)
  • Voxel x y z T statistic
  • 1 1.1719 -10.5469 7.2921 5.4852
  • 2 3.5156 -10.5469 7.2921 5.9170
  • 3 5.8594 -10.5469 7.2921 5.0115
  • 4 1.1719 -8.2031 7.2921 6.1082
  • 5 3.5156 -8.2031 7.2921 6.4825
  • 6 5.8594 -8.2031 7.2921 5.7299
  • 7 1.1719 -5.8594 7.2921 6.7113
  • 8 3.5156 -5.8594 7.2921 7.3540
  • 9 5.8594 -5.8594 7.2921 6.5934
  • 10 1.1719 -10.5469 14.2921 5.4519
  • 11 3.5156 -10.5469 14.2921 6.3674
  • 12 5.8594 -10.5469 14.2921 6.3184
  • 13 1.1719 -8.2031 14.2921 6.2774
  • 14 3.5156 -8.2031 14.2921 6.5888
  • 15 5.8594 -8.2031 14.2921 6.2456
  • 16 1.1719 -5.8594 14.2921 6.3583
  • 17 3.5156 -5.8594 14.2921 6.4093
  • 18 5.8594 -5.8594 14.2921 5.8665

26
How do we split on a spatial predictor?
  • Splits can be regarded as models with different
    means for the two groups
  • [Figure: monotonic and free predictors vs. the
    SPM model; smoothed and unsmoothed SPM models;
    for a spatial (free) predictor, smooth the SPM
    with a filter that matches the model]
27
So
  • Treating spatial location as a free predictor
    (for the smoothed SPM) is equivalent to simply
    thresholding the smoothed SPM
  • We can choose the threshold to control the false
    splitting rate to P < 0.05 using Bonferroni
    corrections or random field theory
  • If the model width is unknown, we can make the
    filter width another parameter of the model,
    which leads to scale space
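For a Gaussian SPM, the Bonferroni threshold is just a normal quantile; a minimal sketch (the function name is mine, and a Gaussian rather than t SPM is an assumption):

```python
from statistics import NormalDist

def bonferroni_threshold(n_voxels, alpha=0.05):
    """Gaussian threshold with P(any false positive) <= alpha over n_voxels tests."""
    return NormalDist().inv_cdf(1 - alpha / n_voxels)

# Thresholds grow only slowly with the number of tests
for n in (1, 10, 100, 1000, 10000):
    print(n, round(bonferroni_threshold(n), 2))
```

These values reproduce the Bonferroni row of the threshold comparison later in the talk (1.64, 2.58, 3.29, 3.89, 4.42).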

29
Scale space: smooth X(t) with a range of filter
widths s; the continuous wavelet transform adds an
extra dimension to the random field: X(t, s)
  • [Figure: two scale-space images, t (mm) on the
    horizontal axis and filter FWHM s (mm, on a log
    scale, 6.8 to 34) on the vertical axis; top: no
    signal; bottom: one 15mm signal]
  • A 15mm signal is best detected with a 15mm
    smoothing filter
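A minimal numerical sketch of the scale-space construction in 1D (all names are mine; I normalize each Gaussian filter to unit L2 norm, so white-noise variance is the same at every scale and the filtered values are comparable across s):

```python
import numpy as np

def gaussian_filter(fwhm):
    # Convert FWHM to the standard deviation of the Gaussian
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    half = int(np.ceil(4 * sigma))
    t = np.arange(-half, half + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    return g / np.sqrt((g**2).sum())   # unit L2 norm: noise s.d. fixed across scales

def scale_space(x, fwhms):
    """X(t, s): one row of filtered data per filter width s."""
    return np.array([np.convolve(x, gaussian_filter(f), mode='same') for f in fwhms])

# A 15mm Gaussian signal on t = -60..60mm, filtered at the slide's FWHM values
t = np.arange(-60.0, 61.0)
signal = np.exp(-t**2 / (2 * (15 / 2.355)**2))
X = scale_space(signal, [6.8, 10.2, 15.2, 22.7, 34.0])
```

The location of `X.argmax()` gives the best (s, t) pair: with no noise, the maximum sits in the 15.2mm row at t = 0, illustrating that the matching filter width detects the signal best.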
30
Matched Filter Theorem (= Gauss-Markov Theorem): to
best detect a signal + white noise, the filter
should match the signal
  • [Figure: two scale-space images, t (mm) on the
    horizontal axis and filter FWHM s (mm, on a log
    scale) on the vertical axis; top: 10mm and 23mm
    signals; bottom: two 10mm signals 20mm apart]
  • But if the signals are too close together they
    are detected as a single signal half way between
    them
31
Scale space can even separate two signals at the
same location!
  • [Figure: 8mm and 150mm signals at the same
    location; t (mm) on the horizontal axis, filter
    FWHM s (mm, on a log scale, up to 170) on the
    vertical axis]
32
FWHM 6.8mm
33
FWHM 9mm
34
FWHM 11mm
35
FWHM 15mm
36
FWHM 20mm
37
FWHM 26mm
38
FWHM 34mm
39-47
[Further frames at increasing FWHM (image slides)]
48
Functional connectivity
  • Measured by the correlation between residuals at
    every pair of voxels (6D data!)
  • Local maxima are larger than all 12 neighbours
  • P-value can be calculated using random field
    theory
  • Good at detecting focal connectivity, but
  • PCA of residuals x voxels is better at detecting
    large regions of co-correlated voxels
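A toy sketch of both measures on simulated residuals (the planted connection between two arbitrarily chosen voxels, and all names, are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
resid = rng.standard_normal((120, 50))   # scans x voxels residual matrix
resid[:, 10] += 2 * resid[:, 20]         # plant a focal connection (hypothetical)

# Correlation between residuals at every pair of voxels
corr = np.corrcoef(resid, rowvar=False)  # voxels x voxels

# First principal component of the (centered) scans x voxels residuals
centered = resid - resid.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]                              # loading of each voxel on PC 1
```

The largest off-diagonal correlation and the dominant PC 1 loading should both single out the connected pair; in the real 6D problem the correlation map is instead searched for local maxima, with P-values from random field theory.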

[Figure: scatter plots of residuals at Voxel 1 vs.
Voxel 2, contrasting activation only with
correlation only]
49
Correlations > 0.7, P < 10^-10 (corrected)
First Principal Component > threshold
50
False Discovery Rate (FDR)
  • Benjamini and Hochberg (1995), Journal of the
    Royal Statistical Society
  • Benjamini and Yekutieli (2001), Annals of
    Statistics
  • Genovese et al. (2001), NeuroImage
  • FDR controls the expected proportion of false
    positives amongst the discoveries, whereas
  • Bonferroni / random field theory controls the
    probability of any false positives
  • No correction controls the proportion of false
    positives in the volume

51
P < 0.05 (uncorrected), T > 1.64: 5% of the volume
is false positive
  • [Figure: signal + Gaussian white noise, with the
    true signal and the false positive noise marked]
FDR < 0.05, T > 2.82: 5% of discoveries are false
positives
P < 0.05 (corrected), T > 4.22: 5% probability of
any false positives
52
Comparison of thresholds
  • FDR depends on the ordered P-values
    P(1) < P(2) < ... < P(n). To control the FDR at
    α = 0.05, find K = max{ i : P(i) < (i/n) α },
    then threshold the P-values at P(K)

        Proportion of true:  1     0.1   0.01  0.001  0.0001
        Threshold T:         1.64  2.56  3.28  3.88   4.41

  • Bonferroni thresholds the P-values at α/n

        Number of voxels:    1     10    100   1000   10000
        Threshold T:         1.64  2.58  3.29  3.89   4.42

  • Random field theory: resels = volume / FWHM^3

        Number of resels:    0     1     10    100    1000
        Threshold T:         1.64  2.82  3.46  4.09   4.65
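The FDR rule can be sketched in a few lines of Python (the function name is mine; I use the usual Benjamini-Hochberg convention of a non-strict inequality in place of the slide's strict one, which rejects at least as much):

```python
import numpy as np

def bh_threshold(pvals, alpha=0.05):
    """Benjamini-Hochberg: find K = max{ i : P(i) <= (i/n) alpha } and
    return the P-value threshold P(K), or None if no P-value qualifies."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    ok = p <= alpha * np.arange(1, n + 1) / n
    return float(p[ok].max()) if ok.any() else None

# Hypothetical P-values: only the two smallest survive the FDR cut
pvals = [0.001, 0.008, 0.039, 0.041, 0.09, 0.7]
cutoff = bh_threshold(pvals)
```

Every voxel with P-value at most `cutoff` is then declared a discovery.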

53
P < 0.05 (uncorrected), T > 1.64: 5% of the volume
is false positive
54
FDR < 0.05, T > 2.66: 5% of discoveries are false
positives
55
P < 0.05 (corrected), T > 4.90: 5% probability of
any false positives