Gene expression - PowerPoint PPT Presentation

Provided by: cen7154
Transcript and Presenter's Notes

Title: Gene expression


1
Gene expression
  • Statistics 246, Week 3, 2002

2
Thesis: the analysis of gene expression data is
going to be big in 21st century statistics
  • Many different technologies, including
  • High-density nylon membrane arrays
  • Serial analysis of gene expression (SAGE)
  • Short oligonucleotide arrays (Affymetrix)
  • Long oligo arrays (Agilent)
  • Fibre optic arrays (Illumina)
  • cDNA arrays (Brown/Botstein)

3
Total microarray articles indexed in Medline
4
Common themes
  • Parallel approach to collection of very large
    amounts of data (by biological standards)
  • Sophisticated instrumentation, requires some
    understanding
  • Systematic features of the data are at least as
    important as the random ones
  • Often more like industrial process than single
    investigator lab research
  • Integration of many data types: clinical,
    genetic, molecular ... databases

5
Biological background
DNA
G T A A T C C T C
C A T T A G G A G
6
Idea: measure the amount of mRNA to see which
genes are being expressed in (used by) the
cell. Measuring protein might be better, but is
currently harder.
7
Reverse transcription
Clone cDNA strands, complementary to the mRNA
mRNA: G U A A U C C U C
Reverse transcriptase
cDNA (partially synthesised): T T A G G A G
cDNA: C A T T A G G A G (many cloned copies)
8
cDNA microarray experiments
  • mRNA levels compared in many different
    contexts:
  • Different tissues, same organism (brain v.
    liver)
  • Same tissue, same organism (treatment v. control,
    tumor v. non-tumor)
  • Same tissue, different organisms (wild type v.
    knockout, transgenic, or mutant)
  • Time course experiments (effect of treatment,
    development)
  • Other special designs (e.g. to detect spatial
    patterns).

9
cDNA microarrays
cDNA clones
10
cDNA microarrays
  • Compare the gene expression in two samples of
    cells

PRINT: cDNA from one gene on each spot
SAMPLES: cDNA labelled red/green,
e.g. treatment / control, or normal / tumor
tissue
11
HYBRIDIZE: Add equal amounts of labelled cDNA
samples to microarray.
SCAN: Laser, Detector
12
Biological question: differentially expressed
genes, sample class prediction, etc.
-> Experimental design
-> Microarray experiment
-> 16-bit TIFF files
-> Image analysis
-> (Rfg, Rbg), (Gfg, Gbg)
-> Normalization
-> R, G
-> Estimation, Testing, Clustering, Discrimination
-> Biological verification and interpretation
13
Some statistical questions
  • Image analysis: addressing, segmenting,
    quantifying
  • Normalisation within and between slides
  • Quality of images, of spots, of (log) ratios
  • Which genes are (relatively) up/down regulated?
  • Assigning p-values to tests/confidence to
    results.

14
Some statistical questions, ctd
  • Planning of experiments: design, sample size
  • Discrimination and allocation of samples
  • Clustering, classification of samples, of genes
  • Selection of genes relevant to any given analysis
  • Analysis of time course, factorial and other
    special experiments..... much more.

15
Some bioinformatic questions
  • Connecting spots to databases, e.g. to sequence,
    structure, and pathway databases
  • Discovering short sequences regulating sets of
    genes: direct and inverse methods
  • Relating expression profiles to structure and
    function, e.g. protein localisation
  • Identifying novel biochemical or signalling
    pathways, ..and much more.

16
Part of the image of one channel, false-coloured
on a scale from white (v. high) and red (high), through
yellow and green (medium), to blue (low) and black
17
Does one size fit all?
18
Segmentation: limitation of the fixed circle
method
Panels: Fixed Circle; SRG (seeded region growing)
Inside the boundary is spot (foreground), outside
is not.
19
Some local backgrounds
Single channel grey scale
We use something different again: a smaller, less
variable value.
20
Quantification of expression
  • For each spot on the slide we calculate
  • Red intensity = Rfg - Rbg
  • (fg = foreground, bg = background), and
  • Green intensity = Gfg - Gbg
  • and combine them in the log (base 2) ratio
  • log2(Red intensity / Green intensity)
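
The calculation above can be sketched for a single spot as follows. This is a minimal illustration, not the software used in the lecture: the function name `log_ratio` and the example intensities are hypothetical, and returning `None` for spots whose background-corrected intensity is not positive is an assumption about how to handle an undefined logarithm.

```python
import math

def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Return log2(red / green) after background correction for one spot.

    Returns None when either corrected intensity is not positive, because
    the log ratio is undefined there (this handling is an assumption).
    """
    red = r_fg - r_bg      # Red intensity = Rfg - Rbg
    green = g_fg - g_bg    # Green intensity = Gfg - Gbg
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)

# Hypothetical spot: red 1100 over background 100, green 600 over background 100.
m = log_ratio(1100, 100, 600, 100)   # log2(1000 / 500) = 1.0
```

A value of 1.0 means the red channel is twice as bright as the green one; 0 means equal intensity.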

21
Gene Expression Data
  • On p genes for n slides: p is O(10,000), n is
    O(10-100), but growing.

Rows are genes, columns are slides:

Gene   slide 1  slide 2  slide 3  slide 4  slide 5  ...
1        0.46     0.30     0.80     1.51     0.90   ...
2       -0.10     0.49     0.24     0.06     0.46   ...
3        0.15     0.74     0.04     0.10     0.20   ...
4       -0.45    -1.03    -0.79    -0.56    -0.32   ...
5       -0.06     1.06     1.35     1.09    -1.09   ...

Each entry is a gene expression level, e.g. 1.09 is that of
gene 5 in slide 4:

log2(Red intensity / Green intensity)
These values are conventionally displayed on a
red (> 0), yellow (0), green (< 0) scale.
23
The red/green ratios can be spatially biased

Top 2.5% of ratios red, bottom 2.5% of ratios green
24
The red/green ratios can be intensity-biased
M = log2(R/G) = log2 R - log2 G
Values should scatter about zero.
(log2 R + log2 G)/2
25
Normalization: how we fix the previous problem
The curved line becomes the new zero line
Orange: Schadt-Wong rank invariant set
Red line: lowess smooth
Yellow: GAPDH, tubulin
Light blue: MSP pool / titration
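
One way to picture "the curved line becomes the new zero line" is the sketch below: fit a smooth curve to M as a function of average log intensity, then subtract it. This is a simplified illustration rather than the lecture's code. The name A for (log2 R + log2 G)/2 follows common usage and is an assumption here, as are the function names and the `frac` default; the smoother is a plain locally weighted linear fit with tricube weights and omits the robustness iterations of full lowess.

```python
import math

def ma_values(red, green):
    """Return (A, M): A = mean log2 intensity, M = log2 ratio, per spot."""
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    return a, m

def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit of y on x, evaluated at every x."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))   # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12              # bandwidth: k-th nearest distance
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalise(red, green, frac=0.5):
    """Return (A, normalised M), where the fitted curve has been subtracted from M."""
    a, m = ma_values(red, green)
    curve = lowess_fit(a, m, frac)
    return a, [mi - ci for mi, ci in zip(m, curve)]
```

After this step the normalised M values scatter about zero at every intensity, which is the property the previous slide says the raw values lack.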
26
Normalizing: before
[Plot: M (vertical axis, 2 down to -4) against a horizontal axis running from 6 to 16]
27
Normalizing: after
[Plot: M normalised, against a horizontal axis running from 6 to 16]
28
From a study of the mouse olfactory system
Main (Accessory) Olfactory Bulb
VomeroNasal Organ
Olfactory Epithelium
From Buck (2000)
29
Axonal connectivity between the nose and
the mouse olfactory bulb
Neocortex
>2M, 1,800 types
Two principles: zone-to-zone projection, and
glomerular convergence
30
Of interest: the hardwiring of the vertebrate
olfactory system
  • Expression of a specific odorant receptor gene by
    an olfactory neuron.
  • Targeting and convergence of like axons to
    specific glomeruli in the olfactory bulb.

31
The biological question in this case
  • Are there genes with spatially restricted
    expression patterns within the olfactory bulb?

33
Layout of the cDNA Microarrays
  • Sequence verified mouse cDNAs
  • 19,200 spots in two print groups of 9,600 each
  • 4 x 4 grids per print group, each grid with 25 x 24 spots
  • Controls on the first 2 rows of each grid.

[Figure: array layout showing print groups pg1 and pg2]
34
Design: How We Sliced Up the Bulb
[Figure: bulb sections labelled A, D, P, L, V, M]
35
Design: Two Ways to Do the Comparisons
  • Goal: 3-D representation of gene expression

1. Compare all samples to a common reference sample
(e.g., whole bulb)
2. Multiple direct comparisons between different
samples (no common reference)
36
An Important Aspect of Our Design
[Figure: design graph linking the samples D, A, L, M, P, V]
Different ways of estimating the same
contrast, e.g. A compared to P:
Direct: A-P
Indirect: (A-M) + (M-P), or
(A-D) + (D-P), or -(L-A) - (P-L)
How do we combine these?
37
Analysis using a linear model
Define a matrix X so that E(M) = Xβ.
Use least squares
estimates for A-L, P-L, D-L, V-L, M-L. In
practice, we use robust regression. Estimates for
other estimable contrasts follow in the usual
way.
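
As a rough sketch of how direct and indirect comparisons are combined, the code below builds a design matrix X with one row per hybridization and solves the normal equations for one gene. It uses ordinary least squares, not the robust regression mentioned above. The hybridization list and M values are invented for illustration (noise-free, with assumed true effects A = 1, P = -1, D = 0.5, V = 0, M = 0.2 relative to L), and all function names are hypothetical.

```python
REGIONS = ["A", "P", "D", "V", "M"]   # parameters: each region's effect minus L

def design_row(first, second):
    """Row of X for a hybridization whose log ratio estimates first - second."""
    row = [0.0] * len(REGIONS)
    if first != "L":
        row[REGIONS.index(first)] += 1.0
    if second != "L":
        row[REGIONS.index(second)] -= 1.0
    return row

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def least_squares(pairs, m_values):
    """Least squares estimates of A-L, P-L, D-L, V-L, M-L for one gene."""
    x = [design_row(f, s) for f, s in pairs]
    p = len(REGIONS)
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, m_values)) for i in range(p)]
    return dict(zip(REGIONS, solve(xtx, xty)))

# Hypothetical connected design and noise-free log ratios for one gene.
pairs = [("A", "P"), ("A", "M"), ("M", "P"), ("A", "D"), ("D", "P"),
         ("L", "A"), ("P", "L"), ("V", "L"), ("D", "V")]
m_values = [2.0, 0.8, 1.2, 0.5, 1.5, -1.0, -1.0, 0.0, 0.5]
estimates = least_squares(pairs, m_values)
a_minus_p = estimates["A"] - estimates["P"]   # any other contrast follows by subtraction
```

Because the design is connected, X has full column rank and every pairwise contrast is estimable from the five fitted parameters.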
38
The Olfactory Bulb Experiments
completed so far
39
Contrasts & Patterns
  • Because of the connectivity of our
    experiment, we can estimate all 15 different
    pairwise comparisons directly and/or indirectly.
  • For every gene we thus have a pattern based
    on the 15 pairwise comparisons.

Gene 15,228
40
Contrasts & patterns: another way
  • Instead of estimating pairwise comparisons
    between each of the six effects, we can come
    closer to estimating the effects themselves by
    doing so subject to the standard zero sum
    constraint (6 parameters, 5 d.f.).
  • What we estimate for A, say, subject to this
    constraint, is in reality an estimate of
  • A - (1/6)(A + P + D + V + M + L).
  • This set of parameter estimates gives results
    similar to, but better than, the ones we would
    have obtained had we carried out the experiments
    with whole-bulb reference tissue.
  • In effect we have created the whole-bulb
    reference in silico.
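
A small sketch of the zero-sum reparametrisation, assuming we start from estimates of A-L, P-L, D-L, V-L and M-L for one gene. The function name and the numbers are hypothetical; the point is only that subtracting the mean of the six effects gives A - (1/6)(A + P + D + V + M + L) and its analogues, whatever baseline was used.

```python
def zero_sum_effects(relative_to_l):
    """Turn estimates of A-L, P-L, D-L, V-L, M-L into six effects summing to zero."""
    effects = dict(relative_to_l)
    effects["L"] = 0.0                       # L - L
    mean = sum(effects.values()) / len(effects)
    return {region: value - mean for region, value in effects.items()}

# Hypothetical estimates for one gene, each relative to L.
centred = zero_sum_effects({"A": 1.0, "P": -1.0, "D": 0.5, "V": 0.0, "M": 0.2})
```

Pairwise contrasts are unchanged by the centring (here A minus P is still 2.0), so the two representations carry the same information.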

41
Alternative pattern representation
Gene 15,228 once again.
42
Reconstruction of the Bulb as a Cube: Expression
of Gene 15,228
Expression level scale: high to low
43
Patterns, More Globally...
  • Can we identify genes with interesting patterns
    of expression across the bulb?
  • Two approaches:

1. Find the genes whose expression fits specific,
predefined patterns.
2. Perform cluster analysis: see what expression
patterns emerge.
44
Clustering procedure
  • Start with a set of genes exhibiting some
    minimal level of differential expression across
    the bulb; here 650 were chosen from all 15
    contrasts.
  • Carry out hierarchical clustering, building a
    dendrogram; Mahalanobis distance and Ward
    agglomeration (minimum variance) were used.
  • Now consider all clusters of 2 or more genes
    in the tree. Singles are added separately.
  • Measure the heterogeneity h of a cluster by
    calculating the 15 SDs across the cluster of
    each of the pairwise effects, and taking the
    largest.
  • Choose a score s (see plots) and take all
    maximal disjoint clusters with h < s. Here we
    used s = 0.46 and obtained 16 clusters.
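
The last two steps can be sketched as below, assuming the dendrogram is already built. The tree is represented as nested pairs with gene names at the leaves, which is a hypothetical encoding, and the toy profiles have 2 effects per gene instead of 15 to keep the example short. The sample SD (n - 1 divisor) is used; the slides do not say which SD was meant.

```python
import statistics

def leaves(node):
    """List the genes under a tree node (a gene name or a (left, right) pair)."""
    if isinstance(node, tuple):
        return leaves(node[0]) + leaves(node[1])
    return [node]

def heterogeneity(genes, profiles):
    """h = largest SD across the cluster, taken over the pairwise effects."""
    n_effects = len(profiles[genes[0]])
    return max(
        statistics.stdev([profiles[g][k] for g in genes])
        for k in range(n_effects)
    )

def maximal_clusters(node, profiles, s):
    """Largest disjoint clusters of 2+ genes with h < s, found top-down."""
    if not isinstance(node, tuple):
        return []                      # single genes are handled separately
    genes = leaves(node)
    if heterogeneity(genes, profiles) < s:
        return [genes]                 # stop here: no larger cluster on this path
    return (maximal_clusters(node[0], profiles, s)
            + maximal_clusters(node[1], profiles, s))

# Toy example with invented values.
profiles = {
    "g1": [1.0, 0.0], "g2": [1.1, 0.1], "g3": [3.0, 0.0],
    "g4": [-1.0, 2.0], "g5": [-1.2, 2.1],
}
tree = ((("g1", "g2"), "g3"), ("g4", "g5"))
clusters = maximal_clusters(tree, profiles, s=0.46)
# clusters == [["g1", "g2"], ["g4", "g5"]]; g3 remains a single
```

Searching from the root downwards and stopping at the first node with h < s is what makes the returned clusters both maximal and disjoint.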

45
Plots guiding choice of clusters of genes
[Plots: number of genes, and number of clusters (patterns), against cluster heterogeneity h (max of 15 SDs)]
46
Red: genes chosen. Blue: controls. 15 pairwise (p/w) effects
48
The 16 groups systematically arranged (6 point
representation)

50
Validation of Gene 15,228 Expression Pattern by
RNA In Situ Hybridization
51
Gene 15,228 another in situ view
52
Gene 384 (group 3)
53
3-dimensional reconstruction from in situ data
Genes 15,228, 5,291, 8,496 and 384
54
Are the patterns we found real?
  • Here's how we attempted to show that the
    answer is a qualified yes.
  • Each cluster average (pattern) has a
    strength we can measure by its
    root-mean-square (RMS). The n = 16
    clusters we found have an average RMS of av =
    0.3, both n and av being strongly determined by
    our heterogeneity cut-off score of s = 0.46.
  • Now consider randomizing the labels (e.g.
    P-A) on our hybridizations and repeating the
    entire analysis, keeping the cut-off score at
    0.46. We typically get fewer, weaker patterns,
    with less contrast in the red-green patchwork.
    One such is on the next page.
  • 500 independent random relabellings had a
    mean av value of 0.18, an SD of 0.07 and a max
    av value of 0.26, cf. 0.3 in our data. Our
    clusters are definitely non-random in some
    sense.
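
The pieces of this check can be sketched as follows. This is only an outline under stated assumptions: the full pipeline (model fit, gene selection, clustering) that would be rerun after each relabelling is not reproduced, the function names are hypothetical, and the add-one empirical p-value is a common convention rather than something the slides specify.

```python
import math
import random

def rms(pattern):
    """Root-mean-square of one cluster average (pattern)."""
    return math.sqrt(sum(v * v for v in pattern) / len(pattern))

def average_rms(cluster_patterns):
    """av: the mean RMS over all cluster patterns found in one analysis."""
    return sum(rms(p) for p in cluster_patterns) / len(cluster_patterns)

def relabel(labels, rng):
    """Return a randomly permuted copy of the hybridization labels."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

def empirical_p(observed_av, null_avs):
    """Share of relabelled analyses with av >= observed (add-one corrected)."""
    exceed = sum(1 for v in null_avs if v >= observed_av)
    return (exceed + 1) / (len(null_avs) + 1)

# One random relabelling of a hypothetical set of hybridization labels.
rng = random.Random(0)
new_labels = relabel(["A-P", "A-M", "M-P", "A-D", "D-P"], rng)
```

With the figures quoted above, none of the 500 relabelled av values (maximum 0.26) reaches the observed 0.3, so this convention would give 1/501, roughly 0.002.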

55
Real
Random
56
Problem
  • We later tried all this with a different set
    of data, one which made use of reference mRNA,
    had generally lower S/N, and where the
    investigator sought fewer interesting patterns.
  • We found that the patterns the previous method
    discovered were similarly quite distinct in av
    values from those in randomly labelled hybs, but
    this time the av values were significantly
    lower than random. It all depends where you
    are on the curve.

57
Where next?
  • I feel that we need a new idea. The previous
    one doesn't seem to have worked. Or did it?
  • Just clustering and taking averages seems too
    easy.
  • But maybe clustering is all there is to
    patterns, once we have decided on the appropriate
    and context-dependent profile to cluster, and
    selected the genes, but I keep wondering ...

58
Some statistical research stimulated by
microarray data analysis
  • Experimental design: Churchill & Kerr
  • Image analysis: Zuzan & West, ...
  • Data visualization: Carr et al
  • Estimation: Ideker et al, ...
  • Multiple testing: Westfall & Young, Storey, ...
  • Discriminant analysis: Golub et al
  • Clustering: Hastie & Tibshirani, Van der Laan,
    Fridlyand & Dudoit, ...
  • Empirical Bayes: Efron et al, Newton et al, ...
  • Multiplicative models: Li & Wong
  • Multivariate analysis: Alter et al
  • Genetic networks: D'Haeseleer et al, and
    more

59
In closing: the pervasiveness of microarray
technology
  • and the statistical problems that go with it
  • Hybridization of target DNA or RNA to large
    numbers of probes attached to a solid support in
    a microarray format has a much wider
    applicability.
  • All such applications have their own
    statistical problems. Here are two relating to
    the previous lectures.

60
Meiosis data in which all exchanges are
precisely located (from microarrays)
Yeast
Figure courtesy of J Derisi
61
Exon Arrays can validate Exon Predictions and
assemble Gene Structures
One or more Probes per Predicted Exon
Predicted exon
Predicted exon
  • Verify predicted exons on a genome-wide scale.
  • Group exons into genes via co-regulation.

This and the next slide courtesy of Rosetta
62
Tiling arrays can identify exons and refine
gene structures
Predicted exon
Predicted exon
63
Acknowledgments
  • Statistical collaborators
    • Yee Hwa Yang (Berkeley)
    • Sandrine Dudoit (Berkeley)
    • Ingrid Lönnstedt (Uppsala)
    • Natalie Thorne (WEHI)
    • Mauro Delorenzi (WEHI)
  • CSIRO Image Analysis Group
    • Michael Buckley
    • Ryan Lagerstrom
  • WEHI
    • Glenn Begley
    • Suzie Grant
    • Rob Good
  • PMCI
    • Chuang Fong Kong
  • Ngai Lab (Berkeley)
    • Cynthia Duggan
    • Jonathan Scolnick
    • Dave Lin
    • Vivian Peng
    • Percy Luu
    • Elva Diaz
    • John Ngai
  • LBNL
    • Matt Callow
  • RIKEN Genomic Sciences Center
    • Yasushi Okazaki
    • Yoshihide Hayashizaki
