Gene expression - PowerPoint PPT Presentation

Description:

Title: Part 1 Microarray Timeseries Analysis with replicates OSM and EGF treatments over time Author: Computer Centre Last modified by: WEHI ITS Created Date – PowerPoint PPT presentation

1
Gene expression
  • Terry Speed
  • Lecture 4, December 18, 2001

2
Thesis: the analysis of gene expression data is
going to be big in 21st century statistics
  • Many different technologies, including
  • High-density nylon membrane arrays
  • Serial analysis of gene expression (SAGE)
  • Short oligonucleotide arrays (Affymetrix)
  • Long oligo arrays (Agilent)
  • Fibre optic arrays (Illumina)
  • cDNA arrays (Brown/Botstein)

3
Total microarray articles indexed in Medline
4
Common themes
  • Parallel approach to collection of very large
    amounts of data (by biological standards)
  • Sophisticated instrumentation, requires some
    understanding
  • Systematic features of the data are at least as
    important as the random ones
  • Often more like industrial process than single
    investigator lab research
  • Integration of many data types: clinical,
    genetic, molecular, ... databases

5
Biological background
DNA
G T A A T C C T C
C A T T A G G A G
6
Idea: measure the amount of mRNA to see which
genes are being expressed in (used by) the
cell. Measuring protein might be better, but is
currently harder.
7
Reverse transcription
Clone cDNA strands, complementary to the mRNA
mRNA: G U A A U C C U C
Reverse transcriptase
cDNA: C A T T A G G A G (many copies)
8
cDNA microarray experiments
  • mRNA levels compared in many different
    contexts
  • Different tissues, same organism (brain v.
    liver)
  • Same tissue, same organism (ttt v. ctl, tumor v.
    non-tumor)
  • Same tissue, different organisms (wt v. ko, tg,
    or mutant)
  • Time course experiments (effect of ttt,
    development)
  • Other special designs (e.g. to detect spatial
    patterns).

9
cDNA microarrays
cDNA clones
10
cDNA microarrays
  • Compare the gene expression in two samples of
    cells

PRINT: cDNA from one gene on each spot
SAMPLES: cDNA labelled red/green,
e.g. treatment / control, normal / tumor
tissue
11
HYBRIDIZE: Add equal amounts of labelled cDNA
samples to microarray.
SCAN: Laser, Detector
12
Biological question: differentially expressed
genes, sample class prediction, etc.
Experimental design
Microarray experiment
16-bit TIFF files
Image analysis
(Rfg, Rbg), (Gfg, Gbg)
Normalization
R, G
Estimation
Testing
Clustering
Discrimination
Biological verification and interpretation
13
Some statistical questions
  • Image analysis: addressing, segmenting,
    quantifying
  • Normalisation: within and between slides
  • Quality: of images, of spots, of (log) ratios
  • Which genes are (relatively) up/down regulated?
  • Assigning p-values to tests/confidence to
    results.

14
Some statistical questions, ctd
  • Planning of experiments: design, sample size
  • Discrimination and allocation of samples
  • Clustering, classification of samples, of genes
  • Selection of genes relevant to any given analysis
  • Analysis of time course, factorial and other
    special experiments, ... and much more.

15
Some bioinformatic questions
  • Connecting spots to databases, e.g. to sequence,
    structure, and pathway databases
  • Discovering short sequences regulating sets of
    genes: direct and inverse methods
  • Relating expression profiles to structure and
    function, e.g. protein localisation
  • Identifying novel biochemical or signalling
    pathways, ... and much more.

16
Part of the image of one channel, false-coloured
on a scale from white (v. high) and red (high), through
yellow and green (medium), to blue (low) and black
17
Does one size fit all?
18
Segmentation: limitation of the fixed circle
method
Fixed Circle v. SRG (seeded region growing)
Inside the boundary is spot (foreground), outside
is not.
19
Some local backgrounds
Single channel grey scale
We use something different again: a smaller, less
variable value.
20
Quantification of expression
  • For each spot on the slide we calculate
  • Red intensity = Rfg - Rbg
  • (fg = foreground, bg = background), and
  • Green intensity = Gfg - Gbg,
  • and combine them in the log (base 2) ratio
  • Log2( Red intensity / Green intensity)
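
The quantification above can be sketched in a few lines of Python. The function name, the example intensities, and the choice to return None when a background-corrected intensity is not positive are illustrative assumptions, not part of the original slides.

```python
import math

def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Return log2((Rfg - Rbg) / (Gfg - Gbg)), or None if undefined.

    Assumption: spots whose background-corrected intensity is not
    positive are reported as None rather than given a log ratio.
    """
    red = r_fg - r_bg      # background-corrected red intensity
    green = g_fg - g_bg    # background-corrected green intensity
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)

# Hypothetical spots: (Rfg, Rbg, Gfg, Gbg)
spots = [(1200.0, 200.0, 700.0, 200.0), (500.0, 100.0, 1700.0, 100.0)]
ratios = [log_ratio(*s) for s in spots]
```

A positive value means the red-labelled sample is more abundant at that spot; a negative value means the green-labelled sample is.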

21
Gene Expression Data
  • On p genes for n slides: p is O(10,000), n is
    O(10-100), but growing.

                Slides
Genes   slide 1  slide 2  slide 3  slide 4  slide 5  ...
1          0.46     0.30     0.80     1.51     0.90  ...
2         -0.10     0.49     0.24     0.06     0.46  ...
3          0.15     0.74     0.04     0.10     0.20  ...
4         -0.45    -1.03    -0.79    -0.56    -0.32  ...
5         -0.06     1.06     1.35     1.09    -1.09  ...

Gene expression level of gene 5 in slide 4 (here 1.09)
= Log2( Red intensity / Green intensity)
These values are conventionally displayed on a
red (> 0), yellow (0), green (< 0) scale.
23
The red/green ratios can be spatially biased

Top 2.5% of ratios red, bottom 2.5% of ratios green
24
The red/green ratios can be intensity-biased
M = log2(R/G) = log2 R - log2 G
Values should scatter about zero.
Horizontal axis: (log2 R + log2 G)/2
25
Normalization: how we fix the previous problem
The curved line becomes the new zero line
Orange: Schadt-Wong rank invariant set
Red line: lowess smooth
Yellow: GAPDH, tubulin
Light blue: MSP pool / titration
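
A minimal sketch of intensity-dependent normalization, assuming a plain tricube-weighted local linear fit as a stand-in for lowess (no robustness iterations). The function names and the `frac` span parameter are illustrative, and this is not the authors' implementation.

```python
def local_linear_smooth(x, y, frac=0.4):
    """Tricube-weighted local linear fit of y on x, evaluated at each x.

    This is lowess without the robustness iterations (an assumption
    made to keep the sketch short).
    """
    n = len(x)
    k = max(2, int(round(frac * n)))          # points in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12             # window half-width
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalise(log_r, log_g, frac=0.4):
    """Return (A, M_normalised); the fitted curve becomes the zero line."""
    a = [(r + g) / 2 for r, g in zip(log_r, log_g)]   # average log intensity
    m = [r - g for r, g in zip(log_r, log_g)]         # log ratio M
    fit = local_linear_smooth(a, m, frac)
    return a, [mi - fi for mi, fi in zip(m, fit)]
```

Subtracting the fitted curve from M removes the intensity-dependent trend, so unchanged genes scatter about zero at every intensity.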
26
Normalizing: before
[Plot: M against average log intensity, before normalization]
27
Normalizing: after
[Plot: normalised M against average log intensity]
28
A basic problem
  • SCIENTIFIC: To determine which genes are
    differentially expressed between two sources of
    mRNA (trt, ctl).
  • STATISTICAL: To assign appropriately adjusted
    p-values to thousands of genes.

29
Apo AI experiment (Callow et al 2000, LBNL)
Goal. To identify genes with altered expression
in the livers of Apo AI knock-out mice (T)
compared to inbred C57Bl/6 control mice (C).
  • 8 treatment mice and 8 control mice
  • 16 hybridizations: liver mRNA from each of the
    16 mice (Ti, Ci) is labelled with Cy5,
    while pooled liver mRNA from the control mice
    (C) is labelled with Cy3.
  • Probes: 6,000 cDNAs (genes), including 200
    related to lipid metabolism.

30
Leukemia experiments (Golub et al 1999, WI)
  • Goal. To identify genes which are differentially
    expressed in acute lymphoblastic leukemia (ALL)
    tumours in comparison with acute myeloid
    leukemia (AML) tumours.
  • 38 tumour samples: 27 ALL, 11 AML.
  • Data from Affymetrix chips, some
    pre-processing.
  • Originally 6,817 genes; 3,051 after reduction.
  • Data therefore a 3,051 × 38 array of expression
    values.

31
Univariate hypothesis testing
Initially, focus on one gene only. We wish to test
the null hypothesis H that the gene is not
differentially expressed. In order to do so, we use
a two sample t-statistic.
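
The formula for the statistic did not survive in this transcript. As a hedged illustration, the sketch below computes the unequal-variance (Welch) form of the two-sample t-statistic for one gene; choosing that form, and the function name, are assumptions.

```python
import math
from statistics import mean, variance

def two_sample_t(x, y):
    """Two-sample t-statistic with unequal variances (Welch form).

    x, y: expression values for one gene in the two groups.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

# Hypothetical log ratios for one gene: treatment v. control
t_stat = two_sample_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Applying the function gene by gene gives one statistic per gene, which is the starting point for the multiple-testing adjustments.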

32
Single-step adjustments of $p_i$
  • Bonferroni: $\min(m p_i, 1)$, for $m$ genes
  • Šidák: $1 - (1 - p_i)^m$
  • minP method of Westfall and Young:
    $\Pr(\min_{1 \le l \le m} P_l \le p_i \mid H_0^C)$
  • maxT method of Westfall and Young:
    $\Pr(\max_{1 \le l \le m} |T_l| \ge |t_i| \mid H_0^C)$
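
The two closed-form adjustments can be written directly. This is a small sketch with illustrative function names; the minP and maxT adjustments need a permutation distribution and are not covered here.

```python
def bonferroni(pvals):
    """Single-step Bonferroni adjustment: min(m * p_i, 1)."""
    m = len(pvals)
    return [min(m * p, 1.0) for p in pvals]

def sidak(pvals):
    """Single-step Sidak adjustment: 1 - (1 - p_i)^m."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]

# Hypothetical unadjusted p-values for m = 3 genes
raw = [0.01, 0.5, 0.2]
adj_bonf = bonferroni(raw)
adj_sidak = sidak(raw)
```

The Šidák value is never larger than the Bonferroni value for the same p-value, which is why it is slightly less conservative.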

33
More powerful methods: step-down adjustments
The idea: S. Holm's modification of
Bonferroni. Also applies to Šidák, maxT, and
minP. We illustrate this last adjustment.
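
As a simple illustration of the step-down idea, here is a sketch of Holm's adjustment of Bonferroni: the smallest p-value is multiplied by m, the next by m - 1, and so on, with monotonicity enforced. The function name is illustrative.

```python
def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by increasing p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min((m - rank) * pvals[i], 1.0)    # multiplier shrinks each step
        running_max = max(running_max, candidate)      # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

adj = holm([0.01, 0.04, 0.03])
```

Only the smallest p-value receives the full Bonferroni multiplier, so the step-down version is never more conservative than the single-step one.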
34
Step-down adjustment of minP
  • Initialization: order the unadjusted p-values
    such that $p_{r_1} \le p_{r_2} \le \cdots \le p_{r_m}$. The
    indices $r_1, r_2, r_3, \ldots$ are fixed for given data.
  • Step-down adjustment:
  • 1. Compare $\min\{P_{r_1}, \ldots, P_{r_m}\}$ with $p_{r_1}$
  • 2. Compare $\min\{P_{r_2}, \ldots, P_{r_m}\}$ with $p_{r_2}$
  • 3. Compare $\min\{P_{r_3}, \ldots, P_{r_m}\}$ with $p_{r_3}$
  • ...
  • m. Compare $P_{r_m}$ with $p_{r_m}$
  • Enforce monotonicity on the adjusted $p_{r_i}$
36
gene    t          unadj. p   minP      plower      maxT
index   statistic  (× 10^4)   adjust.               adjust.
2139    -22        1.5        .53       8 × 10^-5   2 × 10^-4
4117    -13        1.5        .53       8 × 10^-5   5 × 10^-4
5330    -12        1.5        .53       8 × 10^-5   5 × 10^-4
1731    -11        1.5        .53       8 × 10^-5   5 × 10^-4
538     -11        1.5        .53       8 × 10^-5   5 × 10^-4
1489    -9.1       1.5        .53       8 × 10^-5   1 × 10^-3
2526    -8.3       1.5        .53       8 × 10^-5   3 × 10^-3
4916    -7.7       1.5        .53       8 × 10^-5   8 × 10^-3
941     -4.7       1.5        .53       8 × 10^-5   0.65
2000    3.1        1.5        .53       8 × 10^-5   1.00
5867    -4.2       3.1        .76       0.54        0.90
4608    4.8        6.2        .93       0.87        0.61
948     -4.7       7.8        .96       0.93        0.66
5577    -4.5       12         .99       0.93        0.74
37
Apo AI: histogram and Q-Q plot
ApoA1
39
Brief discussion
  • Not mentioned: strong vs weak control of Type I
    error.
  • The minP adjustment seems more conservative than
    the maxT adjustment, but is essentially
    model-free.
  • The adjusted minP values are very discrete; it
    seems that 12,870 permutations are not enough for
    6,000 tests.
  • Extends to other statistics: Wilcoxon, paired t,
    F, blocked F, ...
  • Major question in practice: minP, maxT or
    something else?
  • Wanted are guidelines for use of minP in terms of
    sample sizes and number of genes.
  • Other approaches: False Discovery Rate (V/R),
    Bayes.

40
From a study of the mouse olfactory system
Main (Auxiliary) Olfactory Bulb
VomeroNasal Organ
Olfactory Epithelium
From Buck (2000)
41
Axonal connectivity between the nose and
the mouse olfactory bulb
Neocortex
>2M, 1,800 types
Two principles: zone-to-zone projection, and
glomerular convergence
42
Of interest: the hardwiring of the vertebrate
olfactory system
  • Expression of a specific odorant receptor gene by
    an olfactory neuron.
  • Targeting and convergence of like axons to
    specific glomeruli in the olfactory bulb.

43
The biological question in this case
  • Are there genes with spatially restricted
    expression patterns within the olfactory bulb?

44
(No Transcript)
45
Layout of the cDNA Microarrays
  • Sequence verified mouse cDNAs
  • 19,200 spots in two print groups of 9,600 each
  • 4 x 4 grids, each with 25 x 24 spots
  • Controls on the first 2 rows of each grid.

[Figure: print groups pg1 and pg2]
46
Design: How We Sliced Up the Bulb
[Figure: bulb sections labelled A, D, P, L, V, M]
47
Design: Two Ways to Do the Comparisons
  • Goal: 3-D representation of gene expression

1. Compare all samples to a common reference sample
(e.g., whole bulb)
2. Multiple direct comparisons between different
samples (no common reference)
48
An Important Aspect of Our Design
Different ways of estimating the same
contrast, e.g. A compared to P:
Direct: A-P
Indirect: (A-M) + (M-P), or
(A-D) + (D-P), or -(L-A) - (P-L)
[Figure: design graph with nodes A, D, L, M, P, V]
How do we combine these?
49
Analysis using a linear model
Define a matrix X so that $E(M) = X\beta$.
Use least squares estimates for A-L, P-L, D-L, V-L, M-L.
In practice, we use robust regression. Estimates for
other estimable contrasts follow in the usual
way.
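
A minimal sketch of the linear-model idea, using ordinary least squares rather than the robust regression mentioned above. It assumes each hybridization yields a log ratio estimating (red region) minus (green region); the slide list, effect values and function names are hypothetical.

```python
REGIONS = ["A", "P", "D", "V", "M"]   # parameters: each effect relative to L

def design_row(red, green):
    """Row of X for one slide: +1 for the red region, -1 for the green."""
    row = [0.0] * len(REGIONS)
    if red != "L":
        row[REGIONS.index(red)] += 1.0
    if green != "L":
        row[REGIONS.index(green)] -= 1.0
    return row

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def least_squares(X, y):
    """Solve the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical noise-free data for one gene on a connected design
true = {"A": 1.0, "P": -0.5, "D": 0.2, "V": 0.0, "M": 0.7, "L": 0.0}
slides = [("A", "P"), ("A", "M"), ("M", "P"), ("A", "D"), ("D", "P"),
          ("L", "A"), ("P", "L"), ("V", "L"), ("D", "V")]
X = [design_row(r, g) for r, g in slides]
y = [true[r] - true[g] for r, g in slides]
beta = least_squares(X, y)            # estimates of A-L, P-L, D-L, V-L, M-L
a_minus_p = beta[0] - beta[1]         # any other contrast is a difference
```

The direct and indirect routes to a contrast such as A-P are combined automatically, because every slide that touches A or P contributes to the fit.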
50
The Olfactory Bulb Experiments
completed so far
51
Contrasts & Patterns
  • Because of the connectivity of our
    experiment, we can estimate all 15 different
    pairwise comparisons directly and/or indirectly.
  • For every gene we thus have a pattern based
    on the 15 pairwise comparisons.

Gene 15,228
52
Contrasts & patterns: another way
  • Instead of estimating pairwise comparisons
    between each of the six effects, we can come
    closer to estimating the effects themselves by
    doing so subject to the standard zero sum
    constraint (6 parameters, 5 d.f.).
  • What we estimate for A, say, subject to this
    constraint, is in reality an estimate of
  • A - 1/6 (A + P + D + V + M + L).
  • This set of parameter estimates gives results
    similar to, but better than, the ones we would
    have obtained had we carried out the experiments
    with whole-bulb reference tissue.
  • In effect we have created the whole-bulb
    reference in silico.
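
The zero-sum reparametrization is simple arithmetic once estimates relative to one region are available. This sketch assumes estimates of A-L, P-L, D-L, V-L and M-L are given as a dict; the function name and the numbers are hypothetical.

```python
def zero_sum_effects(rel_to_l):
    """Convert estimates relative to L into effects that sum to zero.

    rel_to_l: dict with estimates of A-L, P-L, D-L, V-L, M-L.
    Each returned value estimates (effect) - 1/6 (A + P + D + V + M + L).
    """
    effects = dict(rel_to_l)
    effects["L"] = 0.0                          # L relative to itself
    centre = sum(effects.values()) / len(effects)
    return {k: v - centre for k, v in effects.items()}

# Hypothetical estimates for one gene
centred = zero_sum_effects({"A": 1.0, "P": -0.5, "D": 0.2, "V": 0.0, "M": 0.7})
```

Pairwise contrasts are unchanged by the centring: the difference between the A and P values is still the A-P estimate.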

53
Alternative pattern representation
Gene 15,228 once again.
54
Reconstruction of the Bulb as a Cube: Expression
of Gene 15,228
[Figure legend: Expression Level, from High to Low]
55
Patterns, More Globally...
  • Can we identify genes with interesting patterns
    of expression across the bulb?
  • Two approaches

1. Find the genes whose expression fits specific,
predefined patterns.
2. Perform cluster analysis - see what expression
patterns emerge.
56
Clustering procedure
  • Start with a set of genes exhibiting some
    minimal level of differential expression across
    the bulb; here 650 were chosen from all 15
    contrasts.
  • Carry out hierarchical clustering, building a
    dendrogram; Mahalanobis distance and Ward
    agglomeration (minimum variance) were used.
  • Now consider all clusters of 2 or more genes
    in the tree. Singles are added separately.
  • Measure the heterogeneity h of a cluster by
    calculating the 15 SDs across the cluster of each
    of the pairwise effects, and taking the largest.
  • Choose a score s (see plots) and take all
    maximal disjoint clusters with h < s. Here we
    used s = 0.46 and obtained 16 clusters.
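
The cluster-selection step can be sketched as a top-down walk over an existing tree. The sketch assumes the dendrogram has already been built and is represented as nested tuples with gene names at the leaves; that representation, the function names, and the two-contrast toy profiles (instead of 15) are assumptions for illustration.

```python
from statistics import stdev

def leaves(node):
    """All gene names under a tree node (a node is a name or a pair)."""
    if isinstance(node, tuple):
        return leaves(node[0]) + leaves(node[1])
    return [node]

def heterogeneity(genes, profiles):
    """h: the largest across-gene SD over the pairwise effects."""
    n_effects = len(profiles[genes[0]])
    return max(stdev([profiles[g][c] for g in genes]) for c in range(n_effects))

def maximal_clusters(node, profiles, s):
    """Maximal disjoint clusters (2 or more genes) with h < s."""
    if not isinstance(node, tuple):
        return []                      # singles are handled separately
    genes = leaves(node)
    if heterogeneity(genes, profiles) < s:
        return [genes]                 # largest acceptable subtree: stop here
    return (maximal_clusters(node[0], profiles, s)
            + maximal_clusters(node[1], profiles, s))

# Toy data: 4 genes, 2 effects each
profiles = {"g1": [1.0, 0.0], "g2": [1.1, 0.1],
            "g3": [-1.0, 0.5], "g4": [-1.2, 0.4]}
tree = (("g1", "g2"), ("g3", "g4"))
clusters = maximal_clusters(tree, profiles, s=0.46)
```

Walking from the root down and stopping at the first subtree that passes the threshold guarantees the returned clusters are disjoint and as large as possible.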

57
Red: genes chosen. Blue: controls. 15 pairwise effects
59
The 16 groups systematically arranged (6 point
representation)

61
Validation of Gene 15,228 Expression Pattern by
RNA In Situ Hybridization
62
Validation of predicted patterns using in situ
hybridization and neurolucida reconstructions
from them.
63
Some statistical research stimulated by
microarray data analysis
  • Experimental design: Churchill & Kerr
  • Image analysis: Zuzan & West, ...
  • Data visualization: Carr et al
  • Estimation: Ideker et al, ...
  • Multiple testing: Westfall & Young, Storey, ...
  • Discriminant analysis: Golub et al, ...
  • Clustering: Hastie & Tibshirani, Van der Laan,
    Fridlyand & Dudoit, ...
  • Empirical Bayes: Efron et al, Newton et al, ...
  • Multiplicative models: Li & Wong
  • Multivariate analysis: Alter et al
  • Genetic networks: D'Haeseleer et al
  • ... and more

64
Acknowledgments
  • Statistical collaborators: Yee Hwa Yang (Berkeley),
    Sandrine Dudoit (Berkeley), Ingrid Lönnstedt (Uppsala),
    Natalie Thorne (WEHI), Mauro Delorenzi (WEHI)
  • CSIRO Image Analysis Group: Michael Buckley,
    Ryan Lagerstorm
  • WEHI: Glenn Begley, Suzie Grant, Rob Good
  • PMCI: Chuang Fong Kong
  • Ngai Lab (Berkeley): Cynthia Duggan, Jonathan Scolnick,
    Dave Lin, Vivian Peng, Percy Luu, Elva Diaz, John Ngai
  • LBNL: Matt Callow
  • RIKEN Genomic Sciences Center: Yasushi Okazaki,
    Yoshihide Hayashizaki


65
Some web sites
  • Technical reports, talks, software etc.:
    http://www.stat.berkeley.edu/users/terry/zarray/Html/
  • Statistical software R ("GNU's S"):
    http://lib.stat.cmu.edu/R/CRAN/
  • Packages within R environment:
  • -- Spot: http://www.cmis.csiro.au/iap/spot.htm
  • -- SMA (statistics for microarray analysis):
    http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html