Some Statistical Issues in Microarray Data Analysis

1
Some Statistical Issues in Microarray Data Analysis
  • Alex Sánchez
  • Estadística i Bioinformàtica
  • Departament d'Estadística, Universitat de Barcelona
  • Unitat d'Estadística i Bioinformàtica, IR-HUVH

2
Outline
  • Introduction
  • Experimental design
  • Selecting differentially expressed genes
  • Statistical tests
  • Significance testing
  • Linear models and analysis of variance
  • Multiple testing
  • Software for microarray data analysis

3
Introduction
4
Microarray experiments: overview
5
Why are we talking about statistics?
  • A microarray experiment is, as its name says, an experiment; that is:
    • It is performed to determine whether some prior hypotheses are true or false (although it can also lead to new hypotheses).
    • It is subject to errors, which may arise from many sources.

6
Sources of variability
  • Biological heterogeneity in population
  • Specimen collection/handling effects
    • Tumor: surgical biopsy (bx), fine-needle aspiration (FNA)
    • Cell line: culture condition, confluence level
  • Biological heterogeneity in specimen
  • RNA extraction
  • RNA amplification
  • Fluor labeling
  • Hybridization
  • Scanning
    • PMT voltage
    • Laser power

(Geschwind, Nature Reviews Neuroscience, 2001)
7
Categories of variability
  • Systematic variability
    • Amount of RNA in the biopsy
    • Efficiencies of lab procedures such as
      • RNA extraction,
      • reverse transcription,
      • labeling or
      • photodetection
  • Random variation
    • PCR yield
    • DNA quality
    • Spotting efficiency
    • Spot size
    • Cross-/unspecific hybridization
    • Stray signal

8
Dealing with systematic variability
  • Systematic variability has similar effects on
    many measurements
  • Corrections can be estimated from data
  • CALIBRATION or NORMALIZATION is the general name
    for processes that correct for systematic
    variability

9
Dealing with random variation
  • Random variation cannot be explicitly corrected for
  • The usual way to deal with it is to assume an ERROR MODEL (e.g. $e_i \sim N(0, \sigma^2)$)
  • Assuming these error models hold:
    • EXPERIMENTAL DESIGN is (must be) used to control the action of random variation
    • STATISTICAL INFERENCE is (must be) used to draw conclusions in the presence of random variation

10
[Figure: the microarray analysis workflow: biological question → experimental design → microarray experiment (with a quality measurement step: failed/pass) → image analysis → normalization → analysis (estimation, testing, clustering, discrimination) → biological verification and interpretation]
11
Experimental design
12
Why experimental design?
  • The objective of experimental design is to make the analysis of the data and the interpretation of the results
    • as simple and as powerful as possible,
    • given the purpose of the experiment
    • and the constraints of the experimental material.

13
Scientific aims and design choice
  • The primary focus of the experiment needs to be clearly stated, whether it is
    • to identify differentially expressed genes,
    • to search for specific gene-expression patterns, or
    • to identify phenotypic subclasses.
  • The aim of the experiment guides the design choice
    • Sometimes only one choice is reasonable
    • Sometimes different options are available

14
Designing microarray experiments
  • The appropriate design of a microarray experiment must consider
    • the design of the array
    • the allocation of mRNA samples to the slides

15
I. Layout of the array
  • Which sequences to use?
    • cDNAs? Selection of cDNAs from a library (Riken, NIA, etc.)
    • Affymetrix? Perfect-match (PM) and mismatch (MM) probes
    • Oligo probe selection (from Operon, Agilent, etc.)
  • Control probes
    • Which ones? Where should controls be placed?
  • How many sequences to use?
  • Should there be replicate spots within a slide?

16
II. Allocating samples to slides
  • Types of samples
    • Replication: technical vs biological
    • Pooled vs individual samples
  • Different design layouts / data analyses
    • Scientific aim of the experiment
    • Efficiency, robustness, extensibility
  • Physical limitations (cost)
    • Number of slides
    • Amount of material

17
Basic principles of experimental design
  • Apply the following principles to best attain the objectives of experimental design:
    • Replication
    • Local control or blocking
    • Randomization

18
1. Replication
  • It is important
    • to reduce uncertainty (increase precision)
    • to obtain sufficient power for the tests
    • as a formal basis for inferential procedures
  • Consider different types of replicates
    • Technical
      • Duplicate spots
      • Multiple hybridizations from the same sample
    • Biological
  • Replicate most what is expected to vary most!

19
Biological vs Technical Replicates
(Figure source: G. Churchill (2002), Nature Reviews)
20
Replication vs Pooling
  • mRNA from different samples is often combined to form a pooled sample, or pool. Why?
    • If each individual sample does not yield enough mRNA
    • To compensate for an excess of variability (?)
  • Statisticians tend not to like it, but pooling may be OK if properly done:
    • Combine several samples in each pool
    • Use several pools from different samples
    • Do not use pools when individual information is important (e.g. paired designs)

21
2. Blocking
  • Assume we wish to perform an experiment to compare two treatments.
  • The samples or their processing may not be homogeneous: there are blocks, e.g.
    • Subjects: male/female
    • Arrays produced in two lots (February, March)
  • If there are systematic differences between blocks, the effects of interest (e.g. treatment) may be confounded with them
    • Are observed differences attributable to the treatment effect or to confounding factors?

22
Confounding block with treatment effects
  • Two alternative designs to investigate treatment effects:
    • Left: treatment effects confounded with sex and batch effects
    • Right: treatments are balanced between blocks
      • The influence of the blocks is automatically compensated
      • Statistical analysis may separate block effects from the treatment effect

23
3. Randomisation
  • Randomly assign samples to groups to eliminate unspecific disturbances
    • Randomly assign individuals to treatments.
    • Randomise the order in which experiments are performed.
  • Randomisation is required to ensure the validity of statistical procedures.
  • Block what you can and randomize what you cannot.
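A minimal Python sketch of "block what you can and randomize what you cannot", assuming two hypothetical blocks (sex) of four subjects each and two treatments, T and C. The function names and sample IDs are illustrative only: treatments are balanced within each block and then shuffled, and the processing order is randomised as well.

```python
import random


def randomize_within_blocks(samples_by_block, treatments, seed=None):
    """Return {sample: (block, treatment)}, with treatments balanced inside each block.

    Labels are cycled to cover the block (exactly balanced when the block size
    is a multiple of the number of treatments) and then shuffled.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, samples in samples_by_block.items():
        labels = [treatments[i % len(treatments)] for i in range(len(samples))]
        rng.shuffle(labels)  # random allocation within the block
        for sample, label in zip(samples, labels):
            assignment[sample] = (block, label)
    return assignment


def random_run_order(samples, seed=None):
    """Randomise the order in which samples are processed (e.g. hybridized)."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    return order


# Hypothetical design: 8 subjects in two blocks (sex), treatments T and C.
blocks = {"male": ["M1", "M2", "M3", "M4"], "female": ["F1", "F2", "F3", "F4"]}
design = randomize_within_blocks(blocks, ["T", "C"], seed=1)
run_order = random_run_order(design, seed=2)
for sample in run_order:
    block, treatment = design[sample]
    print(sample, block, treatment)
```

Because each block contains two T and two C subjects, any systematic male/female difference affects both treatments equally instead of being confounded with them.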

24
Experimental layout
  • How are mRNA samples assigned to arrays?
  • The experimental layout has to be chosen so that the resulting analysis can be done as efficiently and robustly as possible
    • Sometimes there is only one reasonable choice
    • Sometimes several choices are available

25
Example 1: only one design choice
  • Case 1: meaningful biological control (C)
    • Samples: liver tissue from four mice treated with cholesterol-modifying drugs.
    • Question 1: genes that respond differently between the treatment (T) and the control (C).
    • Question 2: genes that respond similarly across two or more treatments relative to the control.
  • Case 2: use of a universal reference.
    • Samples: different tumor samples.
    • Question: to discover tumor subtypes.

26
Example 2: a number of different designs are suitable (1)
  • Time course experiments
  • The design choice depends on the comparisons of interest

27
How can we decide?
  • A-optimality: choose the design that minimizes the (average) variance of the estimates of the effects of interest
  • A simple example: direct vs indirect estimates
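Under the simplest assumption that every log-ratio read from a slide carries an independent error with the same variance $\sigma^2$, a direct estimate of $\log(A/B)$ from one A-vs-B slide has variance $\sigma^2$. The indirect estimate $\log(A/R) - \log(B/R)$, from two slides against a reference R, has variance $2\sigma^2$. The simulation below is a sketch under exactly that assumption; the values of $\sigma$, the true effect and the number of replicates are arbitrary.

```python
import random
from statistics import variance


def simulate_estimates(true_log_ratio=1.0, sigma=0.3, n_rep=20000, seed=7):
    """Simulate direct and indirect estimates of log(A/B); return their variances.

    Assumption: each slide's log-ratio = true value + independent N(0, sigma^2) error.
    """
    rng = random.Random(seed)
    direct, indirect = [], []
    for _ in range(n_rep):
        # Direct design: A and B hybridized on the same slide.
        direct.append(true_log_ratio + rng.gauss(0.0, sigma))
        # Indirect design: A vs R on one slide, B vs R on another.
        # Without loss of generality take log(B/R) = 0, so log(A/R) = true_log_ratio.
        a_vs_r = true_log_ratio + rng.gauss(0.0, sigma)
        b_vs_r = 0.0 + rng.gauss(0.0, sigma)
        indirect.append(a_vs_r - b_vs_r)
    return variance(direct), variance(indirect)


var_direct, var_indirect = simulate_estimates()
print(f"direct: {var_direct:.4f}  indirect: {var_indirect:.4f}  "
      f"ratio: {var_indirect / var_direct:.2f}")
```

The printed ratio should be close to 2: under these assumptions the direct design estimates this particular contrast with half the variance.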

28
Summary
  • Selection of mRNA samples is important
    • Most important: biological replicates
    • Technical replicates are also useful, but they address a different source of variation
    • If needed and possible, use pooling wisely
  • The choice of experimental layout is guided by
    • the scientific question
    • experimental design principles
    • efficiency and robustness considerations
  • The correspondence between experimental designs, linear models and ANOVA can be exploited to select the model and analyze the data

29
Experimental design, linear models and analysis of variance
  • In experimental design, the different sources of variability influencing the observed response may be identified.
  • These sources can be related to the response using a linear model.
  • Analysis of variance (ANOVA) can be used to separately estimate and test the relative importance of each source of variability.

30
Statistical methods to detect differentially
expressed genes
31
Class comparison: identifying differentially expressed genes
  • Identify genes differentially expressed between different conditions, such as
    • treatment, cell type, ... (qualitative covariates)
    • dose, time, ... (quantitative covariates)
    • survival, infection time, ... (!)
  • Estimate effects/differences between groups, usually with log-ratios, i.e. the difference on the log scale: $\log(X) - \log(Y) = \log(X/Y)$

32
What is a significant change?
  • It depends on the variability within groups, which may differ from gene to gene.
  • To assess the statistical significance of differences, conduct a statistical test for each gene.

33
Different settings for statistical tests
  • Indirect comparisons: two groups, two samples, unpaired
    • E.g. 10 individuals: 5 with diabetes, 5 healthy
    • One sample from each individual
    • Typically: two-sample t-test or similar
  • Direct comparisons: two groups, two samples, paired
    • E.g. 6 individuals with brain stroke.
    • Two samples from each: one from a healthy region (region 1) and one from the affected region (region 2).
    • Typically: one-sample t-test (also called paired t-test) or similar, based on the individual differences between conditions.
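Both statistics can be computed with the Python standard library. The sketch below assumes the usual pooled-variance two-sample t for indirect (unpaired) comparisons and a one-sample t on the within-individual differences for direct (paired) ones; the log-expression values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance


def two_sample_t(x, y):
    """Unpaired two-sample t-statistic with pooled variance (indirect comparisons)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def paired_t(x, y):
    """Paired t-statistic: a one-sample t-test on the per-individual differences."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


# Hypothetical log-expression values of one gene in 6 stroke patients.
region1 = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3]  # healthy region, patients 1..6
region2 = [7.6, 7.1, 7.9, 7.4, 7.5, 7.6]  # affected region, same patients
print("paired t:", round(paired_t(region2, region1), 2))
print("unpaired t (ignoring the pairing):", round(two_sample_t(region2, region1), 2))
```

With these made-up numbers the paired statistic is about 8.8 and the unpaired one about 3.0: pairing removes the variation between patients, which the unpaired test has to treat as noise.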

34
Different ways to do the experiment
  • An experiment may use cDNA arrays (two-colour) or Affymetrix arrays (one-colour).
  • Depending on the technology used, the allocation of conditions to slides changes.

Experiment | cDNA (2-colour) | Affy (1-colour)
10 individuals: diabetes (5), healthy (5) | Reference design: (5) Diab/Ref, (5) Heal/Ref | Comparison design: (5) Diab vs (5) Heal
6 individuals: region 1, region 2 | 6 slides, 1 individual per slide: (6) reg1/reg2 | 12 slides: (6) paired differences
35
Natural measures of discrepancy
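  • For direct comparisons in two colour, or paired comparisons in one colour: the average log-ratio, $\bar{M} = \frac{1}{n}\sum_{i=1}^{n} M_i$, where $M_i$ is the log-ratio (or within-pair log difference) for pair i
  • For indirect comparisons in two colour, or direct comparisons in one colour: the difference between the group means on the log scale, $\bar{x}_1 - \bar{x}_2$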
36
Some issues
  • Can we trust average effect sizes (average difference of means) alone?
  • Can we trust the t-statistic alone?
  • Here is evidence that the answer is no.

Gene | M1 | M2 | M3 | M4 | M5 | M6 | Mean | SD | t
A | 2.5 | 2.7 | 2.5 | 2.8 | 3.2 | 2 | 2.61 | 0.40 | 16.10
B | 0.01 | 0.05 | -0.05 | 0.01 | 0 | 0 | 0.003 | 0.03 | 0.25
C | 2.5 | 2.7 | 2.5 | 1.8 | 20 | 1 | 5.08 | 7.34 | 1.69
D | 0.5 | 0 | 0.2 | 0.1 | -0.3 | 0.3 | 0.13 | 0.27 | 1.19
E | 0.1 | 0.11 | 0.1 | 0.1 | 0.11 | 0.09 | 0.10 | 0.01 | 33.09
Courtesy of Y.H. Yang
  • Averages can be driven by outliers (gene C).
  • t-statistics can be driven by tiny variances (gene E).
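The summary columns can be recomputed from the six log-ratios, assuming a one-sample t-statistic $t = \bar{M}/(SD/\sqrt{6})$ for each gene; this assumption reproduces the table's values closely (within about 0.05). The sketch makes both problems visible: gene C has the largest mean only because of one value of 20, and gene E has the largest t only because its SD is tiny.

```python
from math import sqrt
from statistics import mean, stdev

# Log-ratios M1..M6 of the five genes in the table above.
log_ratios = {
    "A": [2.5, 2.7, 2.5, 2.8, 3.2, 2.0],
    "B": [0.01, 0.05, -0.05, 0.01, 0.0, 0.0],
    "C": [2.5, 2.7, 2.5, 1.8, 20.0, 1.0],
    "D": [0.5, 0.0, 0.2, 0.1, -0.3, 0.3],
    "E": [0.1, 0.11, 0.1, 0.1, 0.11, 0.09],
}


def one_sample_summary(values):
    """Mean, sample SD and one-sample t = mean / (SD / sqrt(n)) for H0: mean = 0."""
    m, s = mean(values), stdev(values)
    return m, s, m / (s / sqrt(len(values)))


summary = {gene: one_sample_summary(v) for gene, v in log_ratios.items()}
for gene, (m, s, t) in summary.items():
    print(f"{gene}: mean={m:.3f}  SD={s:.2f}  t={t:.2f}")

print("largest mean:", max(summary, key=lambda g: summary[g][0]))  # driven by an outlier
print("largest t:   ", max(summary, key=lambda g: summary[g][2]))  # driven by a tiny SD
```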
39
Variations in t-tests (1)
  • Let
    • $R_g$: mean observed log-ratio for gene g
    • $SE_g$: standard error of $R_g$ estimated from the data on gene g
    • $SE$: standard error of $R_g$ estimated from the data across all genes
  • Global t-test: $t = R_g / SE$
  • Gene-specific t-test: $t = R_g / SE_g$

40
Some pros and cons of the t-tests
Test | Pros | Cons
Global t-test, $t = R_g/SE$ | Yields a stable variance estimate | Assumes variance homogeneity; biased if this is false
Gene-specific t-test, $t = R_g/SE_g$ | Robust to variance heterogeneity | Low power; yields unstable variance estimates (due to few data)
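A sketch contrasting the two statistics on the five-gene table shown earlier (six replicate log-ratios per gene). The global SE is taken here as $\sqrt{\bar{s}^2/n}$, with $\bar{s}^2$ the average of the gene-wise variances; that choice is an assumption, since the slide does not say how the across-gene SE is pooled.

```python
from math import sqrt
from statistics import mean, variance

# Same five genes (6 replicate log-ratios each) as in the earlier table.
log_ratios = {
    "A": [2.5, 2.7, 2.5, 2.8, 3.2, 2.0],
    "B": [0.01, 0.05, -0.05, 0.01, 0.0, 0.0],
    "C": [2.5, 2.7, 2.5, 1.8, 20.0, 1.0],
    "D": [0.5, 0.0, 0.2, 0.1, -0.3, 0.3],
    "E": [0.1, 0.11, 0.1, 0.1, 0.11, 0.09],
}
n = 6

gene_var = {g: variance(v) for g, v in log_ratios.items()}
# Assumption: the across-gene SE uses the average of the gene-wise variances.
global_se = sqrt(mean(gene_var.values()) / n)

t_global = {g: mean(v) / global_se for g, v in log_ratios.items()}
t_gene = {g: mean(v) / sqrt(gene_var[g] / n) for g, v in log_ratios.items()}

for g in log_ratios:
    print(f"{g}: global t = {t_global[g]:6.2f}   gene-specific t = {t_gene[g]:6.2f}")
```

Gene C's single outlier dominates the pooled variance, so the global t ranks C first and shrinks every other gene, while the gene-specific t ranks E first because of its tiny variance; each statistic fails in the way the table predicts.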
41
Extensions of the t-test
  • SAM (Tibshirani, 2001)
  • Regularized t (Baldi, 2001)
  • EB-moderated t (Smyth, 2003)
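These extensions share one idea: stabilise the gene-wise variance. As a sketch of the EB-moderated version, the gene variance $s_g^2$ with $d_g$ degrees of freedom is shrunk towards a prior value $s_0^2$ carrying $d_0$ prior degrees of freedom, $\tilde{s}_g^2 = (d_0 s_0^2 + d_g s_g^2)/(d_0 + d_g)$, and $\tilde{s}_g$ replaces $s_g$ in the t-statistic. In the actual method $d_0$ and $s_0^2$ are estimated from all genes; here they are fixed by hand (hypothetical values), so the code only illustrates the shrinkage.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(values, d0, s0_sq):
    """Moderated t for H0: mean log-ratio = 0, with a fixed (hypothetical) prior.

    The gene-wise variance s_g^2 (d_g = n - 1 df) is shrunk towards s0_sq,
    which carries d0 prior degrees of freedom. With d0 = 0 this is the ordinary t.
    """
    n = len(values)
    d_g = n - 1
    s_g_sq = variance(values)
    s_tilde_sq = (d0 * s0_sq + d_g * s_g_sq) / (d0 + d_g)
    return mean(values) / sqrt(s_tilde_sq / n)


gene_e = [0.1, 0.11, 0.1, 0.1, 0.11, 0.09]  # gene E from the earlier table
ordinary = moderated_t(gene_e, d0=0, s0_sq=0.05)
moderated = moderated_t(gene_e, d0=4, s0_sq=0.05)  # hypothetical prior values
print(f"ordinary t = {ordinary:.2f}, moderated t = {moderated:.2f}")
```

With $d_0 = 0$ the function returns the ordinary t (about 33 for gene E); with the hypothetical prior, the tiny variance is pulled up and the statistic drops below 2. The moderated statistic is referred to a t distribution with $d_0 + d_g$ degrees of freedom.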
42
Up to here: can we generate a list of candidate genes?
With the tools we have, the reasonable steps to generate a list of candidate genes may be:
[Diagram: steps leading to a list of candidate DE (differentially expressed) genes]
We need an idea of how significant these values are → we'd like to assign them p-values.
43
Significance testing
44
Nominal p-values
  • After a test statistic is computed, it is convenient to convert it to a p-value: the probability that the test statistic, say $S(X)$, takes values equal to or greater than the value taken on the observed sample, say $S(X_0)$, under the assumption that the null hypothesis is true: $p = P\{S(X) \ge S(X_0) \mid H_0 \text{ true}\}$

45
Significance testing
  • Test of significance at level $\alpha$:
    • Reject the null hypothesis if the p-value is smaller than the significance level $\alpha$
    • It has advantages but is not free from criticism
  • Genes with p-values falling below a prescribed level may be regarded as significant

46
Hypothesis testing overview for a single gene

Truth \ Reported decision | H0 rejected (gene selected) | H0 accepted (gene not selected)
H0 false (affected) | TP, prob. $1-\beta$ | FN, prob. $\beta$ (type II error)
H0 true (not affected) | FP, $P(\text{reject } H_0 \mid H_0) \le \alpha$ (type I error) | TN, prob. $1-\alpha$

  • Sensitivity = TP/(TP + FN); specificity = TN/(TN + FP)
  • Positive predictive value = TP/(TP + FP); negative predictive value = TN/(TN + FN)
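When the truth is known for many genes (for instance in a simulation), the four cells of the table can be counted and the ratios at its bottom computed directly. The function name and the counts below are hypothetical.

```python
def error_rates(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from the four cells of the table."""
    return {
        "sensitivity": tp / (tp + fn),  # P(selected | affected)
        "specificity": tn / (tn + fp),  # P(not selected | not affected)
        "ppv": tp / (tp + fp),          # P(affected | selected)
        "npv": tn / (tn + fn),          # P(not affected | not selected)
    }


# Hypothetical outcome for 1000 genes, 50 of which are truly affected.
rates = error_rates(tp=40, fp=5, fn=10, tn=945)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```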
47
Calculation of p-values
  • Standard methods for calculating p-values:
    • (i) refer to a statistical distribution table (Normal, t, F, ...), or
    • (ii) perform a permutation analysis

48
(i) Tabulated p-values
  • Tabulated p-values can be obtained for standard test statistics (e.g. the t-test)
  • They often rely on the assumption of normally distributed errors in the data
  • This assumption can be checked (approximately) using a
    • histogram
    • Q-Q plot
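The points of a normal Q-Q plot can be computed with `statistics.NormalDist` from the Python standard library. The sketch below uses the plotting positions $(i - 0.5)/n$ (one common convention, assumed here) and summarises the straightness of the plot by the correlation of its points; genes A and C from the earlier table serve as examples.

```python
from statistics import NormalDist, mean


def normal_qq_points(values):
    """(theoretical N(0,1) quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(values)
    observed = sorted(values)
    std_normal = NormalDist()  # mean 0, SD 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; values near 1 suggest near-normal data."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


gene_a = [2.5, 2.7, 2.5, 2.8, 3.2, 2.0]
gene_c = [2.5, 2.7, 2.5, 1.8, 20.0, 1.0]  # contains an outlier
for name, values in (("A", gene_a), ("C", gene_c)):
    print(name, round(qq_correlation(normal_qq_points(values)), 3))
```

Gene A gives a correlation of about 0.98 and gene C, with its outlier, about 0.75. With only six values per gene these are rough checks, in line with the "(approximately)" above.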

49
Example
  • Golub data, 27 ALL vs 11 AML samples, 3051 genes
  • A t-test yields 1045 genes with p < 0.05

50
(ii) Permutation tests
  • Based on data shuffling; no distributional assumptions (only that samples are exchangeable under H0)
    • Random interchange of labels between samples
    • Estimate p-values for each comparison (gene) by using the permutation distribution of the t-statistics
  • Repeat for every possible permutation, $b = 1, \dots, B$:
    • Permute the n data points for the gene (x). The first $n_1$ are referred to as treatments, the second $n_2$ as controls
    • For each gene, calculate the corresponding two-sample t-statistic, $t_b$
  • After all the B permutations are done, put $p = \#\{b : |t_b| \ge |t_{\text{observed}}|\}/B$
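For small groups all $\binom{n}{n_1}$ relabellings can be enumerated, which is what the sketch below does for one gene with `itertools.combinations`. It assumes the pooled-variance two-sample t and a two-sided p-value; the data are hypothetical. For larger groups one would instead draw a random sample of B permutations.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def two_sample_t(x, y):
    """Pooled-variance two-sample t-statistic."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def permutation_p_value(treated, control):
    """Two-sided permutation p-value for one gene, enumerating every relabelling."""
    data = list(treated) + list(control)
    n1 = len(treated)
    t_obs = two_sample_t(treated, control)
    extreme = total = 0
    for idx in combinations(range(len(data)), n1):
        chosen = set(idx)
        x = [data[i] for i in idx]                                  # relabelled "treatments"
        y = [data[i] for i in range(len(data)) if i not in chosen]  # relabelled "controls"
        total += 1
        if abs(two_sample_t(x, y)) >= abs(t_obs):
            extreme += 1
    return t_obs, extreme / total


# Hypothetical log-expression values of one gene.
t_obs, p = permutation_p_value([2.1, 2.5, 2.8, 3.0], [0.9, 1.2, 1.4, 1.1])
print(f"t = {t_obs:.2f}, permutation p = {p:.4f}")
```

All four treatment values exceed the four controls, so only the observed split and its mirror image reach $|t_{\text{observed}}|$, giving $p = 2/70 \approx 0.029$ over the 70 possible splits.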

51
Permutation tests (2)
52
Volcano plot: fold change vs log(odds)¹
[Figure: volcano plot with regions labelled "Significant change detected" and "No change detected"]
¹ log(odds) is (roughly) proportional to −log(p-value)
53
Linear models and analysis of variance to analyze designed experiments
54
From experimental design to linear models
  • Some weaknesses of the statistical framework presented so far:
    • What to do if the treatment has more than two levels?
    • How to deal with more than one treatment or experimental condition?
    • How to deal with nuisance factors such as batch effects, covariates, etc.?
  • Most of these issues can be solved with an alternative approach: analysis of variance (ANOVA)
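For a single gene measured under k treatment levels, the one-way ANOVA F-statistic compares the variability between group means with the variability within groups. The sketch below computes it with the standard library; the function name and data are made up, and a p-value would still require the F distribution with $(k-1, n-k)$ degrees of freedom (from a table or a statistics library).

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F-statistic for one gene; groups = list of lists of values."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical log-expression of one gene under three treatment levels.
f_stat, df1, df2 = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]])
print(f"F = {f_stat:.1f} on ({df1}, {df2}) degrees of freedom")
```

Nuisance factors such as batch would enter as additional terms of a linear model (e.g. a two-way ANOVA), which this one-way sketch does not cover.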

55
Multiple testing
56
How far can we trust the decision?
  • The test "reject $H_0$ if $p \le \alpha$"
    • is said to control the type I error because, under a certain set of assumptions, the probability of falsely rejecting $H_0$ is at most a fixed small threshold $\alpha$
  • Nothing is guaranteed about P(FN)
    • Optimal tests are built trying to minimize this probability
    • In practical situations it is often high

57
What if we wish to test more than one gene at
once? (1)
  • Consider more than one test at once (assume independent tests of true null hypotheses)
    • Two tests, each at the 5% level: the probability of getting at least one false positive is now $1 - 0.95 \times 0.95 = 0.0975$
    • Three tests: $1 - 0.95^3 = 0.1426$
    • n tests: $1 - 0.95^n$
    • This converges to 1 as n increases
  • Small p-values don't necessarily imply significance! We are no longer controlling the probability of a type I error
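The arithmetic above is easy to tabulate. The sketch assumes independent tests of true null hypotheses (the function names are illustrative); for comparison it also shows that running each of the n tests at level $\alpha/n$ (the Bonferroni correction) keeps the probability of at least one false positive below $\alpha$.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """P(at least one false positive) for n independent tests of true nulls at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_prob_any_false_positive(n_tests, alpha=0.05):
    """Same probability when each test is run at level alpha / n_tests."""
    return prob_any_false_positive(n_tests, alpha / n_tests)


for n in (1, 2, 3, 10, 100, 3051):
    print(f"n = {n:5d}: unadjusted {prob_any_false_positive(n):.4f}, "
          f"Bonferroni {bonferroni_prob_any_false_positive(n):.4f}")
```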

58
What if we wish to test more than one gene at
once? (2): a simulation
  • Simulation of this process for 6,000 genes with 8 treatments and 8 controls
  • All the gene expression values were simulated i.i.d. from a N(0,1) distribution, i.e. NOTHING is differentially expressed in our simulation
  • The number of genes falsely rejected will be, on average, $6000 \times \alpha$; i.e. if we rejected all genes with a p-value of less than 1%, we would falsely reject around 60 genes
  • See example
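A sketch of the simulation described above, using only the Python standard library. To avoid needing a t-distribution function, it rejects when $|t|$ exceeds 2.977, the tabulated two-sided 1% critical value of the t distribution with $8 + 8 - 2 = 14$ degrees of freedom; the function names and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, variance

N_PER_GROUP = 8
# Two-sided 1% critical value of t with 8 + 8 - 2 = 14 df (standard t tables).
T_CRIT_14DF_1PCT = 2.977


def two_sample_t(x, y):
    """Pooled-variance two-sample t-statistic."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def count_false_rejections(n_genes=6000, seed=2024):
    """Simulate null genes (all values i.i.d. N(0,1)) and count rejections at the 1% level."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_genes):
        treated = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        if abs(two_sample_t(treated, control)) > T_CRIT_14DF_1PCT:
            rejected += 1  # every rejection here is a false positive
    return rejected


n_false = count_false_rejections()
print(f"{n_false} of 6000 null genes rejected (about 60 expected)")
```

The exact count varies with the seed but should stay close to the expected 60.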

59
Multiple testing: counting errors

Truth \ Decision reported | H0 rejected (genes selected) | H0 accepted (genes not selected) | Total
H0 false (affected) | $S \approx m_\alpha - \alpha m_0$ | $T \approx (m - m_0) - (m_\alpha - \alpha m_0)$ | $m - m_0$
H0 true (not affected) | $V \approx \alpha m_0$ | $U \approx m_0 - \alpha m_0$ | $m_0$
Total | $R = m_\alpha$ | $m - R$ | $m$

  • V: type I errors (false positives); T: type II errors (false negatives)
  • Here $m_\alpha$ is the number of genes selected at level $\alpha$. All these quantities could be known if $m_0$ were known.
60
How does type I error control extend to multiple
testing situations?
  • Selecting genes with a p-value less than $\alpha$ no longer controls P(FP)
  • What can be done?
    • Extend the idea of type I error: FWER and FDR are two such extensions
    • Look for procedures that control these extended error rates
    • Mainly by adjusting the raw p-values

61
Two main error rate extensions
  • Family-wise error rate (FWER)
    • FWER is the probability of at least one false positive
    • $\mathrm{FWER} = \Pr(\#\text{ of false discoveries} > 0) = \Pr(V > 0)$
  • False discovery rate (FDR)
    • FDR is the expected value of the proportion of false positives among the rejected null hypotheses
    • $\mathrm{FDR} = E[(V/R)\,\mathbf{1}\{R > 0\}] = E[V/R \mid R > 0]\,\Pr(R > 0)$

62
FDR and FWER controlling procedures
  • FWER
    • Bonferroni (adjusted p-value = min(n × p-value, 1))
    • Holm (1979)
    • Hochberg (1988)
    • Westfall and Young (1993): maxT and minP
  • FDR
    • Benjamini and Hochberg (1995)
    • Benjamini and Yekutieli (2001)
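Three of these adjustments are short enough to write out. The sketch below implements Bonferroni, Holm's step-down and the Benjamini-Hochberg step-up adjusted p-values in plain Python; it is illustrative rather than a replacement for a tested library implementation, and the p-values in the example are made up.

```python
def bonferroni(pvals):
    """Bonferroni: min(m * p, 1)."""
    m = len(pvals)
    return [min(m * p, 1.0) for p in pvals]


def holm(pvals):
    """Holm step-down: running max of (m - rank) * p over increasing p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(running_max, 1.0)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up: running min of m * p / rank over decreasing p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):  # k = 0 is the largest p-value (rank m)
        rank = m - k
        running_min = min(running_min, m * pvals[i] / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values for four genes
for name, adjust in (("Bonferroni", bonferroni), ("Holm", holm), ("BH", benjamini_hochberg)):
    print(name, [round(p, 3) for p in adjust(raw)])
```

A gene is selected if its adjusted p-value is below $\alpha$. For the Golub example with 3051 genes, the Bonferroni cut-off on the raw p-values is $0.05/3051 \approx 0.0000164$.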

63
Difference between controlling FWER and FDR
  • FWER: controls the probability of any (even one) false positive
    • gives many fewer genes (and false positives),
    • but you are likely to miss many truly differentially expressed genes
    • adequate if the goal is to identify a few genes that differ between two groups
  • FDR: controls the proportion of false positives
    • if you can tolerate more false positives,
    • you will get many fewer false negatives
    • adequate if the goal is to pursue the study further, e.g. to determine functional relationships among genes

64
Steps to generate a list of candidate genes revisited (2)
  • Nominal p-values: $P_1, P_2, \dots, P_G$
  • Adjusted p-values: $aP_1, aP_2, \dots, aP_G$
  • Select genes with adjusted p-values smaller than $\alpha$
  • → A list of candidate DE genes
65
Example
  • Golub data, 27 ALL vs 11 AML samples, 3051 genes
  • Bonferroni adjustment: 98 genes with $p_{adj} < 0.05$ ($p_{raw} < 0.000016$)

66
Extensions
  • Some issues we have not dealt with:
    • Replicates within and between slides
    • Several effects: use a linear model
    • ANOVA: are the effects equal?
    • Time series: selecting genes for trends
  • Different solutions have been suggested for each problem
  • There are still many open questions

67
Examples
68
Example 1: Swirl zebrafish experiment
  • Swirl is a point mutation causing defects in the organization of the developing embryo along its ventral-dorsal axis
  • As a result, some cell types are reduced and others are expanded
  • A goal of this experiment was to identify genes with altered expression in the swirl mutant compared to wild-type zebrafish

69
Example 1: experimental design
  • Each microarray contained 8848 cDNA probes (either genes or EST sequences)
  • 4 replicate slides: 2 sets of dye-swap pairs
  • For each pair, target cDNA of the swirl mutant was labeled with one of Cy5 or Cy3, and the target cDNA of the wild type was labeled with the other dye

[Diagram: swirl vs wild type, two hybridizations in each dye orientation]
70
Example 1: data analysis
  • Gene expression data on 8848 genes for 4 samples (slides), each hybridized with mutant and wild type
  • On a gene-per-gene basis this is a one-sample problem
  • Hypothesis to be tested for each gene:
    • $H_0: \log_2(R/G) = 0$
  • The decision will be based on average log-ratios
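A sketch of the per-gene computation, assuming the usual handling of dye-swaps: on the swapped slides the ratio is inverted, so that every log-ratio measures swirl relative to wild type before averaging. The function name, the intensities and the dye orientation of each slide are hypothetical.

```python
from math import log2, sqrt
from statistics import mean, stdev


def swirl_log_ratios(red, green, swirl_in_red):
    """log2(swirl / wild type) for each slide, undoing the dye swap."""
    return [log2(r / g) if s else log2(g / r)
            for r, g, s in zip(red, green, swirl_in_red)]


# Hypothetical intensities of one gene on the 4 slides (2 dye-swap pairs).
red = [1200.0, 500.0, 900.0, 600.0]         # Cy5 channel (R)
green = [600.0, 1000.0, 500.0, 1300.0]      # Cy3 channel (G)
swirl_in_red = [True, False, True, False]   # dye orientation of each slide

m = swirl_log_ratios(red, green, swirl_in_red)
t = mean(m) / (stdev(m) / sqrt(len(m)))  # one-sample t for H0: mean log-ratio = 0
print("log-ratios:", [round(v, 3) for v in m])
print(f"average log-ratio = {mean(m):.3f}, t = {t:.2f}")
```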

71
Example 2: Scavenger receptor BI (SR-BI) experiment
  • Callow et al. (2000). A study of lipid metabolism and atherosclerosis susceptibility in mice.
  • Transgenic mice with the SR-BI gene overexpressed have low HDL cholesterol levels.
  • Goal: to identify genes with altered expression in the livers of transgenic mice overexpressing SR-BI (T) compared to normal FVB control mice (C).

72
Example 2: experimental design
  • 8 treatment mice ($T_i$) and 8 control mice ($C_i$).
  • 16 hybridizations: liver mRNA from each of the 16 mice ($T_i$, $C_i$) is labelled with Cy5, while pooled liver mRNA from the control mice (C) is labelled with Cy3.
  • Probes: 6,000 cDNAs (genes), including 200 related to pathogenicity.

[Diagram: reference design; 8 T and 8 C samples, each hybridized against the pooled control reference C]
73
Example 2: data analysis
  • Gene expression data on 6348 genes for 16 samples: 8 for treatment (log(T/C)) and 8 for control (log(C/C))
  • On a gene-per-gene basis this is a two-sample problem
  • Hypothesis to be tested for each gene:
    • $H_0: \log(R_1/G) - \log(R_2/G) = 0$, i.e. the mean log-ratio is the same in both groups
  • The decision will be based on the difference of average log-ratios
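A sketch for one gene under the reference design, with hypothetical intensities and function names (three mice per group instead of eight, for brevity). Because every sample is compared with the same pooled reference, the reference drops out of the comparison, $\log(T/\text{Ref}) - \log(C/\text{Ref}) = \log(T/C)$ (exactly so if the reference signal were identical on every slide), and the analysis becomes an ordinary two-sample problem on the log-ratios.

```python
from math import log2, sqrt
from statistics import mean, variance


def two_sample_t(x, y):
    """Pooled-variance two-sample t-statistic."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def reference_design_test(treated, control):
    """Each argument: list of (Cy5 sample, Cy3 pooled reference) intensities for one gene.

    Returns the difference of average log2-ratios and the two-sample t-statistic.
    """
    m_treated = [log2(s / ref) for s, ref in treated]
    m_control = [log2(s / ref) for s, ref in control]
    return mean(m_treated) - mean(m_control), two_sample_t(m_treated, m_control)


# Hypothetical intensities (3 mice per group instead of 8, for brevity).
treated = [(800.0, 400.0), (900.0, 410.0), (700.0, 390.0)]
control = [(420.0, 400.0), (380.0, 405.0), (410.0, 395.0)]
effect, t = reference_design_test(treated, control)
print(f"difference of average log2-ratios = {effect:.3f}, t = {t:.2f}")
```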

74
Software for microarray data analysis
75
Introduction
  • Microarray experiments generate huge quantities of data, which have to be stored, managed, visualized and processed
  • Many options are available. However:
    • No tool satisfies all users' needs
    • There is a trade-off. A tool must be
      • powerful but user-friendly,
      • complete but without too many options,
      • flexible but easy to start with and to go further,
      • available, up to date, well documented but affordable

76
So, what you need is R?
  • R is an open-source system for statistical computation and graphics. It consists of
    • a language
    • a run-time environment with
      • graphics, a debugger, and
      • access to certain system functions
  • It can be used
    • interactively, through a command language
    • or by running programs stored in script files

77
http://www.r-project.org/
78
Some pros and cons
  • Pros
    • Powerful
    • Used by statisticians
    • Easy to extend
      • Creating add-on packages
      • Many already available
    • Freely available
      • Unix, Windows, Mac
    • Lots of documentation
  • Cons
    • Not very easy to learn
      • Command-based
      • Documentation sometimes cryptic
    • Memory intensive
      • Worst on Windows
    • Slow at times
  • We believe it is worth the effort!
    • If you just want to do statistical analysis: alternatives are easy to find
    • If you intend to do microarray data analysis: probably one of the best options

79
R and Microarrays
  • R is a popular tool among statisticians
  • Once they started to work with microarrays, they continued using it
    • to perform the analysis
    • to implement new tools
  • This very quickly gave rise to lots of free R-based software for analyzing microarrays
  • The Bioconductor project groups many (but not all) of these developments

80
The Bioconductor project
  • Open source and open development software project for the analysis and comprehension of genomic data.
  • Most early developments were released as R packages.
  • Extensive documentation and training material from short courses: http://www.bioconductor.org/workshop.html
  • It has reached some stability but is still evolving: what is a standard now may not be one in the future.

81
There's much more than R!
  • Have a look at
    • "My microarray software comparison"
    • http://ihome.cuhk.edu.hk/b400559/arraysoft.html
