Application of Class Discovery and Class Prediction Methods to Microarray Data

1
Application of Class Discovery and Class
Prediction Methods to Microarray Data
  • Kellie J. Archer, Ph.D.
  • Assistant Professor
  • Department of Biostatistics
  • kjarcher_at_vcu.edu

2
Basis of Cancer Diagnosis
  • Pathologist makes an interpretation based upon a
    compendium of knowledge which may include
  • Morphological appearance of the tumor
  • Histochemistry
  • Immunophenotyping
  • Cytogenetic analysis
  • etc.

3
Diffuse Large B-Cell Lymphoma
4
Clinically Distinct DLBCL Subgroups
5
Improved Cancer Diagnosis: Identify sub-classes
  • Divide morphologically similar tumors into
    different groups based on response.
  • Application of microarrays: characterize
    molecular variations among tumors by monitoring
    gene expression.
  • Goal: microarrays will lead to more reliable
    tumor classification and sub-classification
    (therefore, more appropriate treatments will be
    administered, resulting in improved outcomes).

6
Distinguishing two types of acute leukemia (AML
vs. ALL)
  • Golub, T.R. et al. 1999. Molecular classification
    of cancer: class discovery and class prediction
    by gene expression monitoring. Science 286:
    531-537.
  • http://www-genome.wi.mit.edu/cgi-bin/cancer/datasets.cgi
    (near bottom of page)

7
Distinguishing AML vs. ALL
  • 38 BM samples (27 childhood ALL, 11 adult AML)
    were hybridized to Affymetrix GeneChips
  • GeneChip included 6,817 human genes.
  • Affymetrix MAS 4.0 software was used to perform
    image analysis.
  • MAS 4.0 Average Difference expression summary
    method was applied to the probe level data to
    obtain probe set expression summaries.
  • Scaling factor was used to normalize the
    GeneChips.
  • Samples were required to meet quality control
    criteria.

8
Distinguishing AML vs. ALL
  • Class comparison: neighborhood analysis
  • Class prediction: weighted voting

9
Class Discovery: Distinguishing AML vs. ALL
  • The mean of a random variable X is a measure of
    central location of the density of X.
  • The variance of a random variable is a measure of
    spread or dispersion of the density of X.
  • $\mathrm{Var}(X) = E[(X - \mu)^2]$, estimated from a sample by
    $s^2 = \sum_i (x_i - \bar{x})^2 / (n - 1)$
  • Standard deviation: $\sigma(X) = \sqrt{\mathrm{Var}(X)}$
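
As a small illustration of these definitions, the sketch below computes the sample mean, the sample variance with the n - 1 denominator, and the standard deviation, then checks them against Python's standard `statistics` module. The values are made up for illustration and are not taken from the leukemia dataset.

```python
import math
import statistics

# Hypothetical expression values for one gene (illustrative only).
x = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

n = len(x)
mean = sum(x) / n                                    # central location
var = sum((xi - mean) ** 2 for xi in x) / (n - 1)    # spread, n - 1 denominator
sd = math.sqrt(var)                                  # standard deviation

# The standard library returns the same sample estimates.
assert math.isclose(mean, statistics.mean(x))
assert math.isclose(var, statistics.variance(x))
assert math.isclose(sd, statistics.stdev(x))

print(mean, var, sd)
```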

10
Class Discovery: Distinguishing AML vs. ALL
  • For each gene, compute the log of the expression
    values. For a given gene g,

For ALL
Let $\mu_{\mathrm{ALL}}(g)$ represent the mean log expression value.
Let $\sigma_{\mathrm{ALL}}(g)$ represent the stdev log expression value.
For AML
Let $\mu_{\mathrm{AML}}(g)$ represent the mean log expression value.
Let $\sigma_{\mathrm{AML}}(g)$ represent the stdev log expression value.
11
Class Discovery: Distinguishing AML vs. ALL
Illustration using ALL AML example.xls
12
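Class Discovery: Distinguishing AML vs. ALL
  • For each gene, compute a relative class
    separation (quasi-correlation measure) as follows
    (Golub et al. 1999), here taking ALL as class 1 and AML as class 2:
    $P(g,c) = \dfrac{\mu_{\mathrm{ALL}}(g) - \mu_{\mathrm{AML}}(g)}{\sigma_{\mathrm{ALL}}(g) + \sigma_{\mathrm{AML}}(g)}$,
    where $\mu$ and $\sigma$ are the class mean and stdev of the log
    expression of gene g.
  • Define neighborhoods of radius r about classes 1
    and 2 such that $P(g,c) > r$ or
    $P(g,c) < -r$; r was chosen to be 0.3.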
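
The relative class separation score and the neighborhood rule described on this slide can be sketched as follows. This is a minimal illustration, assuming the signal-to-noise form from Golub et al. (1999); the function names `class_separation` and `in_neighborhood` and the expression values are hypothetical. The score does not depend on the base of the logarithm, because the numerator and denominator scale by the same constant.

```python
import math
import statistics

def class_separation(class1, class2):
    """Score P(g, c) = (mu1 - mu2) / (sd1 + sd2) on log expression values.

    class1, class2: raw expression values of one gene in each class
    (must be positive, since the log is taken first).
    """
    log1 = [math.log(v) for v in class1]
    log2 = [math.log(v) for v in class2]
    mu1, mu2 = statistics.mean(log1), statistics.mean(log2)
    sd1, sd2 = statistics.stdev(log1), statistics.stdev(log2)
    return (mu1 - mu2) / (sd1 + sd2)

def in_neighborhood(p, r=0.3):
    """Return 1 or 2 for the class neighborhood the gene falls in, else None."""
    if p > r:
        return 1
    if p < -r:
        return 2
    return None

# Hypothetical expression values for a single gene (not from the real dataset).
gene_all = [1200.0, 1500.0, 1350.0, 1420.0]
gene_aml = [400.0, 520.0, 610.0, 480.0]

p = class_separation(gene_all, gene_aml)
print(round(p, 2), in_neighborhood(p))
```

Swapping the two classes flips the sign of the score, which is why the two neighborhoods are defined by `P(g,c) > r` and `P(g,c) < -r`.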

13
Aside
  • This differs from Pearson's correlation and is
    therefore not confined to the [-1, 1] interval
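
To make the contrast concrete, the sketch below computes both quantities for one gene whose two classes are tightly separated. The data are hypothetical log expression values, and `pearson` is a hand-written helper, not a library call. The separation score grows without bound as the within-class spread shrinks, while the Pearson correlation with a 1/0 class indicator stays within [-1, 1].

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log expression values for one gene in two classes.
class1 = [10.0, 10.1, 9.9]
class2 = [5.0, 5.1, 4.9]

# Relative class separation: difference in means over the sum of the stdevs.
p = (statistics.mean(class1) - statistics.mean(class2)) / (
    statistics.stdev(class1) + statistics.stdev(class2)
)

# Pearson correlation between expression and a 1/0 class indicator.
r = pearson(class1 + class2, [1, 1, 1, 0, 0, 0])

print(round(p, 1), round(r, 4))
```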

14
Aside: Illustration using Correlation.xls
15
Class Discovery: Distinguishing AML vs. ALL
  • A permutation test was used to calculate whether
    the observed number of genes in a neighborhood
    was significantly higher than expected.

16
Permutation based methods
  • Permutation based adjusted p-values
  • Under the complete null, the joint distribution
    of the test statistics can be estimated by
    permuting the columns of the gene expression
    matrix
  • Permuting entire columns creates a situation in
    which membership to the Class 1 and Class 2
    groups is independent of gene expression but
    preserves the dependence structure between genes

17
Permutation based methods
18
Permutation based methods
  • Permutation algorithm: for the bth permutation,
    $b = 1, \ldots, B$
  • 1) Permute the n labels of the data matrix X
  • 2) Compute the relative class separation
    $P(g_1,c)_b, \ldots, P(g_p,c)_b$ for each gene $g_i$.
  • The permutation distribution of the relative
    class separation $P(g,c)$ for gene $g_i$, $i = 1, \ldots, p$, is
    given by the empirical distribution of
    $P(g_i,c)_1, \ldots, P(g_i,c)_B$.
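
The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original analysis code: the data are simulated, the helper names (`separation`, `permutation_distribution`) are hypothetical, and the final count-based p-value is one simple way to ask whether the observed number of genes in a neighborhood exceeds what the permutations produce.

```python
import random
import statistics

def separation(values, labels):
    """Relative class separation for one gene: (mean1 - mean2) / (sd1 + sd2)."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g2 = [v for v, lab in zip(values, labels) if lab == 2]
    return (statistics.mean(g1) - statistics.mean(g2)) / (
        statistics.stdev(g1) + statistics.stdev(g2)
    )

def permutation_distribution(X, labels, B=200, seed=0):
    """Return B rows; row b holds the permuted score for every gene.

    X is a genes-by-samples matrix (list of rows) of log expression values.
    Shuffling the label vector is equivalent to permuting whole columns of X,
    so the dependence structure between genes is preserved.
    """
    rng = random.Random(seed)
    perm = list(labels)
    dist = []
    for _ in range(B):
        rng.shuffle(perm)                                   # step 1
        dist.append([separation(row, perm) for row in X])   # step 2
    return dist

# Simulated data: 20 genes, 5 samples per class (illustrative only).
rng = random.Random(1)
labels = [1] * 5 + [2] * 5
X = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(20)]
for row in X[:5]:                 # make the first 5 genes higher in class 1
    for j in range(5):
        row[j] += 3.0

r = 0.3
observed = [separation(row, labels) for row in X]
dist = permutation_distribution(X, labels, B=200, seed=0)

# Compare the observed neighborhood size with its permutation distribution.
obs_count = sum(p > r for p in observed)
perm_counts = [sum(p > r for p in row) for row in dist]
p_value = sum(c >= obs_count for c in perm_counts) / len(perm_counts)
print(obs_count, p_value)
```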

19
Distinguishing AML vs. ALL
  • Class comparisons using neighborhood analysis
    revealed that approximately 1,100 genes were more
    highly correlated with class (AML or ALL) than would
    be expected by chance.

20
Class Prediction: Distinguishing AML vs. ALL
  • For the set of informative genes, each expression
    value $x_i$ votes for either ALL or AML, depending
    on whether it is closer to $\mu_{\mathrm{ALL}}$
    or $\mu_{\mathrm{AML}}$
  • Let $\mu_{\mathrm{ALL}}$ represent the mean expression value for
    ALL
  • Let $\mu_{\mathrm{AML}}$ represent the mean expression value for
    AML
  • Informative genes were the n/2 genes with the
    largest (most positive) P(g,c) and the n/2 genes
    with the smallest (most negative) P(g,c)
  • Golub et al. chose n = 50

21
Class Prediction: Distinguishing AML vs. ALL
  • $w_i$ is a weighting factor that reflects how well
    the gene is correlated with the class distinction;
    $w_i v_i$ is the weighted vote
  • For each sample, the weighted votes for each
    class are summed to get $V_{\mathrm{ALL}}$ and $V_{\mathrm{AML}}$
  • The sample is assigned to the class with the
    higher total, provided the Prediction Strength
    (PS) > 0.3, where
  • $PS = (V_{\mathrm{win}} - V_{\mathrm{lose}}) / (V_{\mathrm{win}} + V_{\mathrm{lose}})$
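
A minimal sketch of the voting rule on this slide follows. The slide does not define the vote itself, so the sketch assumes, following the description in Golub et al. (1999), that the vote of a gene is the distance of its expression value from the midpoint of the two class means and that the weight is the magnitude of P(g,c). The function name `weighted_vote`, the gene names, and all numbers are hypothetical.

```python
def weighted_vote(sample, genes, ps_threshold=0.3):
    """Classify one sample by weighted voting.

    sample: dict gene -> expression value x_i for the new sample.
    genes:  dict gene -> (mu_all, mu_aml, w), where w is a non-negative
            weight (assumed here to be |P(g, c)| from the training data).
    Returns (call, ps), where call is "ALL", "AML", or "uncertain".
    """
    v_all = v_aml = 0.0
    for g, (mu_all, mu_aml, w) in genes.items():
        x = sample[g]
        v = abs(x - (mu_all + mu_aml) / 2.0)     # distance from the midpoint
        if abs(x - mu_all) < abs(x - mu_aml):    # closer to the ALL mean
            v_all += w * v
        else:                                    # closer to (or tied with) AML
            v_aml += w * v
    v_win, v_lose = max(v_all, v_aml), min(v_all, v_aml)
    total = v_win + v_lose
    ps = (v_win - v_lose) / total if total > 0 else 0.0
    if ps <= ps_threshold:                       # margin too small to call
        return "uncertain", ps
    return ("ALL" if v_all > v_aml else "AML"), ps

# Hypothetical informative genes: (mean in ALL, mean in AML, weight).
genes = {"g1": (8.0, 5.0, 1.2), "g2": (4.0, 7.0, 0.9), "g3": (6.0, 6.5, 0.4)}

call_a, ps_a = weighted_vote({"g1": 7.5, "g2": 4.5, "g3": 6.4}, genes)
call_b, ps_b = weighted_vote({"g1": 7.5, "g2": 6.5, "g3": 6.25}, genes)
print(call_a, round(ps_a, 3), call_b, round(ps_b, 3))
```

In the second sample the two strongest genes disagree, so the prediction strength falls below 0.3 and no call is made.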

22
Class Prediction: Distinguishing AML vs. ALL
23
Class Prediction: Distinguishing AML vs. ALL
  • Checking model adequacy
  • Cross-validation of training dataset
  • Applied model to an independent dataset of 34
    samples

24
Class Discovery
  • Determine whether the samples can be divided
    based only on gene expression without regard to
    the class labels
  • Self-organizing maps

25
Hypothesis Testing
  • The hypothesis that two means $\mu_1$ and $\mu_2$ are equal
    is called a null hypothesis, commonly abbreviated
    H0.
  • This is typically written as H0: $\mu_1 = \mu_2$
  • Its antithesis is the alternative hypothesis,
    HA: $\mu_1 \neq \mu_2$

26
Hypothesis Testing
  • A statistical test of hypothesis is a procedure
    for assessing the compatibility of the data with
    the null hypothesis.
  • The data are considered compatible with H0 if any
    discrepancy from H0 could readily be due to
    chance (i.e., sampling error).
  • Data judged to be incompatible with H0 are taken
    as evidence in favor of HA.

27
Hypothesis Testing
  • If the sample means calculated are identical, we
    would suspect the null hypothesis is true.
  • Even if the null hypothesis is true, we do not
    really expect the sample means to be identically
    equal because of sampling variability.
  • We would feel comfortable concluding that the data
    are compatible with H0 if the difference in the
    sample means does not exceed a couple of standard
    errors.

28
T-test
  • In testing H0: $\mu_1 = \mu_2$ against HA: $\mu_1 \neq \mu_2$, note
    that we could have restated the null hypothesis
    as
  • H0: $\mu_1 - \mu_2 = 0$ and HA: $\mu_1 - \mu_2 \neq 0$
  • To carry out the t-test, the first step is to
    compute the test statistic and then compare the
    result to a t-distribution with the appropriate
    degrees of freedom (df)
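
The slide does not spell out the test statistic. One common form, assuming the unequal-variance (Welch) version of the two-sample t-test, is shown below, where $\bar{x}_k$, $s_k^2$, and $n_k$ are the sample mean, sample variance, and sample size of group k:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\mathrm{df} \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
     {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

If equal population variances are assumed instead, the variances are pooled and df = n1 + n2 - 2.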

29
T-test
  • Data must be independent random samples from
    their respective populations
  • Sample size should either be large or, in the
    case of small sample sizes, the population
    distributions must be approximately normally
    distributed.
  • When assumptions are not met, non-parametric
    alternatives are available (Wilcoxon Rank
    Sum/Mann-Whitney Test)

30
T-test: Probe set 208680_at
Sample number   ALL        AML
1               2013.7     1974.6
2               2141.9     2027.6
3               2040.2     1914.8
4               1973.3     1955.8
5               2162.2     1963.0
6               1994.8     2025.5
7               1913.3     1865.1
8               2068.7     1922.4
Mean            2038.5     1956.1
s^2             7051.284   3062.991
n               8          8
31
T-test: Probe set 208680_at
P = 0.039
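
The calculation behind this result can be sketched with the Python standard library alone, using the eight ALL and eight AML values tabulated for this probe set. The slide does not say which form of the t-test was used; this sketch assumes the unequal-variance (Welch) form and a two-sided alternative, and it obtains the p-value by numerically integrating the t density rather than calling a statistics package. Under those assumptions the result lands close to the P value reported on the slide.

```python
import math
import statistics

# Expression values for probe set 208680_at, copied from the slide's table.
all_vals = [2013.7, 2141.9, 2040.2, 1973.3, 2162.2, 1994.8, 1913.3, 2068.7]
aml_vals = [1974.6, 2027.6, 1914.8, 1955.8, 1963.0, 2025.5, 1865.1, 1922.4]

def welch_t(x, y):
    """Return (t, df) for the unequal-variance two-sample t-test."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 < T < |t|)
    return 1.0 - 2.0 * area

t, df = welch_t(all_vals, aml_vals)
p = two_sided_p(t, df)
print(round(t, 2), round(df, 1), round(p, 3))
```

Because the two groups have the same size, the pooled-variance t statistic is identical to the Welch statistic here; only the degrees of freedom, and therefore the p-value, differ slightly.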