Lecture 17 Gene expression and the transcriptome II

Slides: 42

Provided by: heri4


Transcript and Presenter's Notes


1
Lecture 17: Gene expression and the transcriptome II
2
SAGE

SAGE = Serial Analysis of Gene Expression
Based on serial sequencing of 15-bp tags that are
unique to each gene
SAGE is a method to determine the absolute abundance
of every transcript expressed in a population of
cells

3
SAGE

15-bp gene-specific tags are produced by an elegant
series of molecular biology manipulations and
then concatenated into a single molecule (string)
for automated sequencing
By sequencing the concatenated fragments, the
number of copies of each tag can be counted
A list of each unique tag and its abundance in
the population is assembled
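
The counting step can be sketched in a few lines of Python. This is a simplified illustration, not the actual SAGE pipeline: it assumes the sequenced concatemer is a plain string of back-to-back 15-bp tags, and it ignores linkers, ditags and sequencing errors. The function name `count_tags` and the two example tags are made up for the example.

```python
from collections import Counter

TAG_LENGTH = 15  # tag length stated on the slide


def count_tags(concatemer, tag_length=TAG_LENGTH):
    """Split a concatemer into fixed-length tags and count each unique tag."""
    # Drop any incomplete tag at the end of the read.
    usable = len(concatemer) - len(concatemer) % tag_length
    tags = (concatemer[i:i + tag_length] for i in range(0, usable, tag_length))
    return Counter(tags)


# Two invented 15-bp tags; tag_a occurs twice in the concatemer.
tag_a = "ACGTACGTACGTACG"
tag_b = "TTTTGGGGCCCCAAA"
counts = count_tags(tag_a + tag_b + tag_a)

# The list of unique tags with their abundance, most frequent first.
for tag, n in counts.most_common():
    print(tag, n)
```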

4
SAGE

At least 50,000 tags are required per sample to
approach saturation, the point where each
expressed gene (eukaryotic cell) is represented
at least twice
SAGE costs about 5000 per sample
This is too expensive for replicated comparisons
of the kind done with microarrays

5
Transcript abundance in typical eukaryotic cell

<100 transcripts account for 20% of the total mRNA
population, each being present in between 100 and
1000 copies per cell
These encode ribosomal proteins and other core
elements of transcription and translation
machinery, histones and further taxon-specific
genes
General, basic and most important cellular
mechanisms

6
Transcript abundance in typical eukaryotic cell
(2)

Several hundred intermediate-frequency
transcripts, each present in 10 to 100 copies, make
up a further 30% of mRNA
These code for housekeeping enzymes, cytoskeletal
components and some unusually abundant cell-type
specific proteins
Pretty basic housekeeping things

7
Transcript abundance in typical eukaryotic cell
(3)

A further 50% of mRNA is made up of tens of
thousands of low-abundance transcripts (<10 copies),
some of which may be expressed at less than one copy
per cell (on average)
Most of these genes are tissue-specific or
induced only under particular conditions
Specific or special purpose products

8
Transcript abundance in typical eukaryotic cell
(4)

Get some feel for the numbers (they can be off by a
factor of 2, but the order of magnitude is about right)
If
80 transcripts × 400 copies = 32,000 (20%)
600 transcripts × 75 copies = 45,000 (30%)
25,000 transcripts × 3 copies = 75,000 (50%)
Then the total is about 150,000 mRNA molecules
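
The arithmetic for the three abundance classes can be checked with a short Python sketch. The class labels ("high", "intermediate", "low") are added here for readability and are not from the slide. The exact sum is 152,000, which the slide rounds to about 150,000, so the computed shares differ slightly from the rounded 20/30/50 split.

```python
# (label, number of distinct transcripts, copies per cell), figures from the slide
classes = [
    ("high", 80, 400),
    ("intermediate", 600, 75),
    ("low", 25_000, 3),
]

# mRNA molecules contributed by each class
molecules = {name: n_transcripts * copies for name, n_transcripts, copies in classes}
total = sum(molecules.values())

# Share of the total mRNA pool per class
shares = {name: count / total for name, count in molecules.items()}

for name, count in molecules.items():
    print(f"{name}: {count} molecules, {shares[name]:.1%} of total")
print("total:", total)
```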

9
Transcript abundance in typical eukaryotic cell
(5)

This means that most of the transcripts in a cell
population each contribute less than 0.01% of the
total mRNA
Say 1/3 of a higher eukaryote genome is expressed
in a given tissue, then about 10,000 different tags
should be detectable
Taking into account that half the transcriptome
is relatively abundant, at least 50,000 tags
should be sequenced to approach saturation
(at least 10 copies per transcript on average)
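
To get a feel for what "approaching saturation" means for a rare transcript, here is a rough Python sketch. It assumes, as a simplification not made on the slides, that tags are sampled independently, so the number of times a given tag is seen is approximately Poisson distributed. The example transcript (3 copies among 150,000 mRNA molecules) uses the illustrative numbers from the previous slide. The function name is hypothetical.

```python
import math


def prob_seen_at_least(min_count, tags_sequenced, transcript_fraction):
    """Poisson approximation of P(tag observed >= min_count times)."""
    expected = tags_sequenced * transcript_fraction  # mean number of observations
    prob_fewer = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(min_count)
    )
    return 1.0 - prob_fewer


# A low-abundance transcript: 3 copies out of roughly 150,000 mRNA molecules.
rare_fraction = 3 / 150_000

p_once = prob_seen_at_least(1, 50_000, rare_fraction)
p_twice = prob_seen_at_least(2, 50_000, rare_fraction)
print(f"seen at least once:  {p_once:.2f}")
print(f"seen at least twice: {p_twice:.2f}")
```

Under these assumptions a 3-copy transcript is expected to appear about once in 50,000 tags, so the rarest transcripts can still be missed at this sequencing depth.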

10
SAGE analysis of yeast (Velculescu et al., 1997)
[Figure: fraction of all transcripts (0 to 1.0)
plotted against number of transcripts per cell
(0.1 to 1000); values shown: 17, 38, 45]
11
SAGE quantitative comparison

A tag present in 4 copies in one sample of 50,000
tags, and in 2 copies in another sample, may be
twofold differentially expressed, but the difference
is not going to be statistically significant
Even 20 versus 10 tags might not be statistically
significant given the large number of
comparisons
Often, 10-fold over- or under-expression is taken
as the threshold
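
The point about small tag counts can be illustrated with a simple exact test in Python. This is one possible test, not necessarily the one used in the SAGE literature: it assumes both libraries have the same size (50,000 tags each), so that under the null hypothesis of equal expression each observed copy of the tag is equally likely to come from either sample (a binomial with p = 0.5). The function name and the figure of 10,000 comparisons used for the Bonferroni threshold are illustrative assumptions.

```python
from math import comb


def equal_depth_tag_test(count_a, count_b):
    """Two-sided exact binomial p-value for one tag in two equally sized libraries."""
    n = count_a + count_b
    observed = comb(n, count_a)
    # Sum the probability of every split that is no more likely than the observed one.
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


p_4_vs_2 = equal_depth_tag_test(4, 2)
p_20_vs_10 = equal_depth_tag_test(20, 10)

# With many tags tested at once, a Bonferroni-style cutoff is far stricter than 0.05.
bonferroni_threshold = 0.05 / 10_000

print(f"4 vs 2 tags:   p = {p_4_vs_2:.3f}")
print(f"20 vs 10 tags: p = {p_20_vs_10:.3f}")
print(f"per-test cutoff for 10,000 comparisons: {bonferroni_threshold:.0e}")
```

Under these assumptions neither twofold difference reaches even the uncorrected 0.05 level, which is consistent with the use of a much larger fold-change threshold.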

12
SAGE quantitative comparison

A great advantage of SAGE is that the method is
unbiased by experimental conditions
Direct comparison of data sets is possible
Data produced by different groups can be pooled
Web-based tools for performing comparisons of
samples all over the world exist (e.g. SAGEnet
and xProfiler)

13
Genome-Wide Cluster Analysis: Eisen dataset

Eisen et al., PNAS 1998
S. cerevisiae (baker's yeast)
all genes (about 6200) on a single array
measured during several processes
human fibroblasts
8600 human transcripts on array
measured at 12 time points during serum
stimulation

14
The Eisen Data

79 measurements for yeast data
collected at various time points during
diauxic shift (shutting down genes for
metabolizing sugars, activating those for
metabolizing ethanol)
mitotic cell division cycle
sporulation
temperature shock
reducing shock

15
The Data

each measurement represents
$\log(\mathrm{Red}_i / \mathrm{Green}_i)$
where Red is the test expression level and Green
is the reference level for gene G in the i-th
experiment
the expression profile of a gene is the vector
of measurements across all experiments,
$G_1, \ldots, G_n$
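
As a small illustration, the expression profile of one gene can be computed from paired red and green intensities. The intensities below are invented, and base 2 is an assumption made here for readability (the slide only says "Log"); with base 2, a value of 1 means a twofold increase and -1 a twofold decrease relative to the reference.

```python
import math


def expression_profile(red, green):
    """Return the vector of log ratios, one entry per experiment."""
    return [math.log2(r / g) for r, g in zip(red, green)]


# Invented intensities for one gene in three experiments.
red = [200.0, 100.0, 50.0]     # test sample
green = [100.0, 100.0, 100.0]  # reference sample

profile = expression_profile(red, green)
print(profile)
```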

16
The Data

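m genes measured in n experiments
$$
\begin{pmatrix}
g_{1,1} & \cdots & g_{1,n} \\
g_{2,1} & \cdots & g_{2,n} \\
\vdots  &        & \vdots  \\
g_{m,1} & \cdots & g_{m,n}
\end{pmatrix}
$$
Each row is the vector for 1 gene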
20
Eisen et al. Results

redundant representations of genes cluster
together
but individual genes can be distinguished from
related genes by subtle differences in expression
genes of similar function cluster together
e.g. 126 genes strongly down-regulated in
response to stress

21
Eisen et al. Results

126 genes down-regulated in response to stress
112 of the genes encode ribosomal and other
proteins related to translation
agrees with previously known result that yeast
responds to favorable growth conditions by
increasing the production of ribosomes

22
Partitional Clustering

divide instances into disjoint clusters
flat vs. tree structure
key issues
how many clusters should there be?
how should clusters be represented?

24
Partitional Clustering from a Hierarchical
Clustering
we can always generate a partitional clustering
from a hierarchical clustering by cutting the
tree at some level
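
A minimal sketch of this idea in Python: agglomerative clustering merges the two closest clusters step by step, and stopping once a chosen number of clusters remains is equivalent to cutting the resulting tree at the corresponding level. Single linkage and Euclidean distance are choices made for this example, not something specified on the slide, and the data points are invented.

```python
import math


def single_linkage_cut(points, num_clusters):
    """Merge the closest clusters until num_clusters remain; return index lists."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return [sorted(c) for c in clusters]


# Five invented one-dimensional expression values.
points = [(0.0,), (0.2,), (5.0,), (5.1,), (9.0,)]
partition = single_linkage_cut(points, num_clusters=2)
print(partition)
```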
25
K-Means Clustering

assume our instances are represented by vectors
of real values
put k cluster centers in same space as
instances
now iteratively move cluster centers

26
K-Means Clustering

each iteration involves two steps
assignment of instances to clusters
re-computation of the means
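
The two steps can be sketched in plain Python as follows. This is a minimal illustration with a fixed number of iterations and caller-supplied initial centers; the data points and starting centers are invented, and a practical implementation would also test for convergence.

```python
import math


def k_means(points, centers, iterations=10):
    """Alternate assignment and mean re-computation for a fixed number of iterations."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Step 1: assign each instance to its nearest cluster center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Step 2: move each center to the mean of its assigned instances.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(center)  # keep an empty cluster's center in place
        centers = new_centers
    return centers, clusters


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers, clusters = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```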

27
K-Means Clustering

in k-means clustering, instances are assigned to
one and only one cluster
soft k-means clustering is possible via the
Expectation Maximization (EM) algorithm
each cluster is represented by a normal distribution
E step: determine how likely it is that each
cluster generated each instance
M step: move cluster centers to maximize the
likelihood of the instances
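
A minimal sketch of soft k-means for one-dimensional data is given below. To keep it short it makes assumptions that go beyond the slide: every cluster shares the same fixed standard deviation `sigma`, all clusters have equal prior weight, and only the means are updated. The values and starting means are invented.

```python
import math


def soft_k_means(values, means, sigma=1.0, iterations=20):
    """EM with fixed, shared variance: returns updated means and responsibilities."""
    resp = []
    for _ in range(iterations):
        # E step: how likely is it that each cluster generated each instance?
        resp = []
        for x in values:
            weights = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M step: move each center to the responsibility-weighted mean.
        means = [
            sum(r[k] * x for r, x in zip(resp, values)) / sum(r[k] for r in resp)
            for k in range(len(means))
        ]
    return means, resp


values = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
means, resp = soft_k_means(values, means=[2.0, 7.0])
print([round(m, 3) for m in means])
```

Unlike hard k-means, every instance keeps a fractional membership in every cluster; the rows of `resp` each sum to 1.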

29
Ecogenomics
Algorithm that maps the observed clustering behaviour
of sampled gene expression data onto the
clustering behaviour of contaminant-labelled gene
expression patterns in the knowledge base
[Diagram: a sample is compared against Condition 1
(contaminant 1) through Condition n (contaminant n),
giving compatibility scores]
30
Protein-protein interactions
31

The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
If you want to know whether any particular
proteins bind to protein X, such proteins
can be found with the yeast two-hybrid system.
The two-hybrid system allows in vivo detection
of protein-protein interactions as well as the
analysis of the affinity of these interactions.
32

The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction

Two-hybrid technology exploits the fact that
transcriptional activators are modular in nature.
Two physically distinct functional domains are
necessary to get transcription
a DNA binding domain (DBD) that binds to the DNA
of the promoter and
an activation domain (AD) that binds to the
basal transcription apparatus and activates
transcription.

33

The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction

In the yeast two-hybrid system, the known gene
encoding X is cloned into the "bait" vector.
In this way, the gene for X is placed into a
plasmid next to the gene encoding a DNA-binding
domain from some transcription factor.
For instance, if the gene for X is cloned into
the pHybLex/Zeo vector, X would be expressed as a
fusion protein containing the bacterially derived
LexA DBD.

34

The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
Separately, a second gene (or a library of
cDNAs encoding potential interactors), Y, is
cloned in frame adjacent to an activation domain
of a different transcription factor. For
instance, it could be inserted next to the DNA
encoding the B42 activation domain (AD) in a
"prey" vector such as pYESTrp2.
35

The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
Thus, in one strain of yeast, a known protein X
is fused to the DNA binding domain of a
transcription factor and in another strain,
unknown proteins are fused to the activation
domain of another transcription factor. If one of
the unknown proteins combines with X, it will
bring the AD over to the DBD, and transcription
will be activated. So the plasmids containing the
"bait" (known protein/DBD) and "prey" (unknown
protein/AD) are then placed into a yeast strain,
where a marker gene has a promoter containing the
sequence bound by the bait protein DBD.
36

The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
However, in order to form a working
transcription factor, the bait needs the AD provided
by the "prey." The only way that it can get this
activation domain is if the known protein X
combines with some unknown protein Y that is
carrying the AD. If the X and Y proteins
interact, the B42 AD is brought into proximity of
the LexA DBD and transcription of the reporter
gene is activated. The activation of the reporter
gene can be screened by enzyme activity, light
release, or cell growth, depending on the type of
reporter gene activated.
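
The logic of a two-hybrid screen can be summarised as a toy Python model. It only captures the rule described above (the reporter is transcribed when bait X and a prey protein interact, bringing the AD to the DBD); it does not model false positives, expression levels or affinity. The prey names and the interaction table are invented for illustration.

```python
# Hypothetical interaction data, invented purely for this example.
KNOWN_INTERACTIONS = {frozenset({"X", "Y1"}), frozenset({"X", "Y3"})}


def reporter_active(bait, prey, interactions=KNOWN_INTERACTIONS):
    """The reporter is on only if bait (fused to DBD) and prey (fused to AD) interact."""
    return frozenset({bait, prey}) in interactions


def screen_library(bait, prey_library):
    """Return the prey proteins whose co-expression with the bait activates the reporter."""
    return [prey for prey in prey_library if reporter_active(bait, prey)]


hits = screen_library("X", ["Y1", "Y2", "Y3", "Y4"])
print(hits)
```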
41

Literature: two-hybrid systems
1. Fields, S. and Song, O. 1989. A novel genetic
system to detect protein-protein interactions.
Nature 340: 245-246.
2. Gyuris, J., Golemis, E., Chertkov, H., and
Brent, R. 1993. Cdi1, a human G1 and S phase
protein phosphatase that associates with Cdk2.
Cell 75: 791-803.
