Lecture 17 Gene expression and the transcriptome II

1
Lecture 17: Gene expression and the transcriptome II
2
SAGE
  • SAGE = Serial Analysis of Gene Expression
  • Based on serial sequencing of 15-bp tags that are
    unique to each and every gene
  • SAGE is a method to determine absolute abundance
    of every transcript expressed in a population of
    cells

3
SAGE
  • 15-bp gene-specific tags are produced by an elegant
    series of molecular biology manipulations and
    then concatenated into a single molecule (string)
    for automated sequencing
  • By sequencing the concatenated fragments, the
    number of copies of each tag can be counted
  • A list of each unique tag and its abundance in
    the population is assembled
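
The counting step above can be sketched in a few lines. This is only an illustration under a strong simplifying assumption: the concatemer is treated as a plain string of back-to-back 15-bp tags. Real SAGE reads carry additional sequence between tags, which this sketch ignores. The three tag sequences are made up for the example.

```python
from collections import Counter

TAG_LENGTH = 15  # tag length quoted on the slides


def count_tags(concatemer: str, tag_length: int = TAG_LENGTH) -> Counter:
    """Split a concatemer read into fixed-length tags and count each tag."""
    # Ignore a trailing partial tag, if any.
    usable = len(concatemer) - len(concatemer) % tag_length
    tags = (concatemer[i:i + tag_length] for i in range(0, usable, tag_length))
    return Counter(tags)


# Hypothetical toy data: made-up tags, not real gene tags.
tag_a = "ACGTACGTACGTACG"
tag_b = "TTTTGGGGCCCCAAA"
tag_c = "GATTACAGATTACAG"
read = tag_a + tag_b + tag_a + tag_c + tag_a + tag_b

abundance = count_tags(read)
for tag, copies in abundance.most_common():
    print(tag, copies)
```

The resulting `Counter` is the "list of each unique tag and its abundance" for this toy read.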

4
SAGE
  • At least 50,000 tags are required per sample to
    approach saturation, the point where each
    expressed gene (eukaryotic cell) is represented
    at least twice
  • SAGE costs about 5000 per sample
  • Too expensive to do replicated comparisons as
    is done with microarrays

5
Transcript abundance in typical eukaryotic cell
  • <100 transcripts account for 20% of the total mRNA
    population, each being present in between 100 and
    1000 copies per cell
  • These encode ribosomal proteins and other core
    elements of transcription and translation
    machinery, histones and further taxon-specific
    genes
  • General, basic and most important cellular
    mechanisms

6
Transcript abundance in typical eukaryotic cell
(2)
  • Several hundred intermediate-frequency
    transcripts, each present in 10 to 100 copies, make
    up a further 30% of mRNA
  • These code for housekeeping enzymes, cytoskeletal
    components and some unusually abundant cell-type
    specific proteins
  • Pretty basic housekeeping things

7
Transcript abundance in typical eukaryotic cell
(3)
  • A further 50% of mRNA is made up of tens of
    thousands of low-abundance transcripts (<10 copies),
    some of which may be expressed at less than one copy
    per cell (on average)
  • Most of these genes are tissue-specific or
    induced only under particular conditions
  • Specific or special purpose products

8
Transcript abundance in typical eukaryotic cell
(4)
  • Get some feel for the numbers (can be off by a
    factor of 2, but the order of magnitude is about right)
  • If:
  • 80 transcripts × 400 copies = 32,000 (about 20%)
  • 600 transcripts × 75 copies = 45,000 (about 30%)
  • 25,000 transcripts × 3 copies = 75,000 (about 50%)
  • Then the total is about 150,000 mRNA molecules
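
The arithmetic on this slide can be checked with a short script. The class names (`high`, `intermediate`, `low`) are labels introduced here for illustration; the numbers are the ones on the slide.

```python
# (number of distinct transcripts, average copies per cell) from the slide
classes = {
    "high": (80, 400),
    "intermediate": (600, 75),
    "low": (25_000, 3),
}

molecules = {name: n * copies for name, (n, copies) in classes.items()}
total = sum(molecules.values())
shares = {name: count / total for name, count in molecules.items()}

for name in classes:
    print(f"{name:>12}: {molecules[name]:>7,} molecules, {shares[name]:.1%} of total")
print(f"{'total':>12}: {total:>7,} molecules")
```

The exact sum is 152,000, which the slide rounds to 150,000; the shares come out close to the quoted 20%, 30% and 50%.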

9
Transcript abundance in typical eukaryotic cell
(5)
  • This means that most of the transcripts in a cell
    population each contribute less than 0.01% of the
    total mRNA
  • Say 1/3 of a higher eukaryote genome is expressed
    in a given tissue, then about 10,000 different tags
    should be detectable
  • Taking into account that half the transcriptome
    is relatively abundant, at least 50,000 tags
    should be sequenced to approach saturation
    (at least 10 copies per transcript on average)

10
SAGE analysis of yeast (Velculescu et al., 1997)
[Figure: fraction of all transcripts (0 to 1.0) plotted against number of transcripts per cell (0.1 to 1000); data labels 17, 38, 45]
11
SAGE quantitative comparison
  • A tag present in 4 copies in one sample of 50,000
    tags, and in 2 copies in another sample, may reflect
    a twofold difference in expression, but the
    difference is not going to be statistically
    significant
  • Even 20 versus 10 tags might not be statistically
    significant given the large number of
    comparisons
  • Often, 10-fold over- or under-expression is taken
    as the threshold
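
To see why such small counts are unconvincing, one can run an exact binomial test on the two counts. This is a sketch under stated assumptions, not the test prescribed in the lecture: both libraries are assumed to contain the same number of tags (50,000 each), so under the null hypothesis each observed copy of the tag is equally likely to come from either sample. The function name `two_sided_p` is introduced here for illustration.

```python
from math import comb


def two_sided_p(count_a: int, count_b: int) -> float:
    """Exact two-sided binomial p-value for two tag counts.

    Assumes equal library sizes, so the null hypothesis is a fair
    50/50 split of the combined count between the two samples.
    """
    n = count_a + count_b
    observed = comb(n, count_a)
    # Add up all splits that are no more likely than the observed one.
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


print(round(two_sided_p(4, 2), 3))
print(round(two_sided_p(20, 10), 3))
```

Under these assumptions the p-value is roughly 0.69 for 4 versus 2 tags and roughly 0.10 for 20 versus 10, before any correction for the thousands of tags being compared at once.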

12
SAGE quantitative comparison
  • A great advantage of SAGE is that the method is
    unbiased by experimental conditions
  • Direct comparison of data sets is possible
  • Data produced by different groups can be pooled
  • Web-based tools for performing comparisons of
    samples all over the world exist (e.g. SAGEnet
    and xProfiler)

13
Genome-Wide Cluster Analysis: Eisen dataset
  • Eisen et al., PNAS 1998
  • S. cerevisiae (baker's yeast):
      • all genes (about 6200) on a single array
      • measured during several processes
  • human fibroblasts:
      • 8600 human transcripts on array
      • measured at 12 time points during serum
        stimulation

14
The Eisen Data
  • 79 measurements for yeast data
  • collected at various time points during:
      • diauxic shift (shutting down genes for
        metabolizing sugars, activating those for
        metabolizing ethanol)
      • mitotic cell division cycle
      • sporulation
      • temperature shock
      • reducing shock

15
The Data
  • each measurement represents
    $\log(\mathrm{Red}_i / \mathrm{Green}_i)$
  • where Red is the test expression level and Green
    is the reference level for gene G in the i-th
    experiment
  • the expression profile of a gene is the vector
    of measurements across all experiments,
    $(G_1, \ldots, G_n)$
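
A minimal sketch of how one gene's expression profile is built from paired intensities. The slide does not state the base of the logarithm; base 2 is assumed here because it makes a doubling read as +1 and a halving as -1. The intensity values are hypothetical.

```python
import math


def log_ratio(red: float, green: float) -> float:
    """Log2 of test (red) over reference (green) intensity."""
    return math.log2(red / green)


# Hypothetical intensities for one gene across four experiments.
red = [1200.0, 800.0, 400.0, 3200.0]
green = [600.0, 800.0, 1600.0, 400.0]

# The expression profile is the vector of log ratios, one per experiment.
profile = [log_ratio(r, g) for r, g in zip(red, green)]
print(profile)
```

Positive entries mean the gene is more abundant in the test sample than in the reference, negative entries the opposite, and zero means no change.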

16
The Data
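  • m genes measured in n experiments

$$
\begin{pmatrix}
g_{1,1} & \cdots & g_{1,n} \\
g_{2,1} & \cdots & g_{2,n} \\
\vdots  &        & \vdots  \\
g_{m,1} & \cdots & g_{m,n}
\end{pmatrix}
$$

Each row is the vector for one gene.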
20
Eisen et al. Results
  • redundant representations of genes cluster
    together
  • but individual genes can be distinguished from
    related genes by subtle differences in expression
  • genes of similar function cluster together
  • e.g. 126 genes strongly down-regulated in
    response to stress

21
Eisen et al. Results
  • 126 genes down-regulated in response to stress
  • 112 of the genes encode ribosomal and other
    proteins related to translation
  • agrees with previously known result that yeast
    responds to favorable growth conditions by
    increasing the production of ribosomes

22
Partitional Clustering
  • divide instances into disjoint clusters
  • flat vs. tree structure
  • key issues
  • how many clusters should there be?
  • how should clusters be represented?

24
Partitional Clustering from a Hierarchical
Clustering
We can always generate a partitional clustering
from a hierarchical clustering by cutting the
tree at some level.
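
A small sketch of the idea, assuming single-linkage agglomerative clustering (the slide does not say which linkage is used). Stopping the merging once k clusters remain gives the same partition as cutting the single-linkage tree at the level where it has k branches, ties aside. The function name and the toy points are illustrative.

```python
from math import dist


def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for k in (3, 2):
    print(k, single_linkage(points, k))
```

Cutting lower in the tree (larger k) yields more, tighter clusters; cutting higher (smaller k) merges them.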
25
K-Means Clustering
  • assume our instances are represented by vectors
    of real values
  • put k cluster centers in same space as
    instances
  • now iteratively move cluster centers

26
K-Means Clustering
  • each iteration involves two steps
  • assignment of instances to clusters
  • re-computation of the means
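
The two steps can be sketched as follows. This is a minimal version under stated assumptions: Euclidean distance, initial centers supplied by the caller, and a fixed number of iterations instead of a convergence check. The toy points and starting centers are hypothetical.

```python
from math import dist


def k_means(points, centers, iterations=10):
    """Plain k-means: alternate assignment and mean re-computation."""
    centers = list(centers)
    assignment = []
    for _ in range(iterations):
        # Step 1: assign each instance to its nearest cluster center.
        assignment = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        # Step 2: move each center to the mean of its assigned instances.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its current center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, assignment = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
print(assignment)
```

In practice the result depends on the starting centers, so the procedure is usually run from several initializations.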

27
K-Means Clustering
  • in k-means clustering, instances are assigned to
    one and only one cluster
  • can do soft k-means clustering via Expectation
    Maximization (EM) algorithm
  • each cluster represented by a normal distribution
  • E step: determine how likely it is that each
    cluster generated each instance
  • M step: move cluster centers to maximize the
    likelihood of the instances
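
A minimal sketch of the E and M steps for one-dimensional data. It makes simplifying assumptions that go beyond the slide: all clusters share a fixed standard deviation `sigma` and equal mixing weights, so only the centers are re-estimated. The data values and starting centers are hypothetical.

```python
from math import exp


def soft_k_means(values, centers, sigma=1.0, iterations=20):
    """EM for a 1-D mixture of equal-weight normals with fixed width sigma."""
    centers = list(centers)
    resp = []
    for _ in range(iterations):
        # E step: how likely is it that each cluster generated each instance?
        resp = []
        for x in values:
            weights = [exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for mu in centers]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M step: move each center to the responsibility-weighted mean.
        for c in range(len(centers)):
            mass = sum(r[c] for r in resp)
            centers[c] = sum(r[c] * x for r, x in zip(resp, values)) / mass
    return centers, resp


values = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
centers, resp = soft_k_means(values, centers=[-1.0, 1.0])
print(centers)
print([round(r[0], 3) for r in resp])
```

Unlike plain k-means, every instance contributes to every center, weighted by its responsibility; as `sigma` shrinks, the responsibilities approach hard 0/1 assignments.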

29
Ecogenomics
Algorithm that maps the observed clustering behaviour
of sampled gene expression data onto the
clustering behaviour of contaminant-labelled gene
expression patterns in the knowledge base
[Diagram: a sample is given compatibility scores against Condition 1 (contaminant 1), Condition 2 (contaminant 2), Condition 3 (contaminant 3), ..., Condition n (contaminant n)]
30
Protein-protein interactions
31
The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
  If you want to know whether any particular
proteins bind to protein X, such proteins
can be found with the yeast two-hybrid system.
The two-hybrid system allows in vivo detection
of protein-protein interactions as well as the
analysis of the affinity of these interactions.
32
The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
  • Two-hybrid technology exploits the fact that
    transcriptional activators are modular in nature.
  • Two physically distinct functional domains are
    necessary to get transcription:
  • a DNA binding domain (DBD) that binds to the DNA
    of the promoter and
  • an activation domain (AD) that binds to the
    basal transcription apparatus and activates
    transcription.

33
The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
  • In the yeast two-hybrid system, the known gene
    encoding X is cloned into the "bait" vector.
  • In this way, the gene for X is placed into a
    plasmid next to the gene encoding a DNA-binding
    domain from some transcription factor.
  • For instance, if the gene for X is cloned into
    the pHybLex/Zeo vector, X would be expressed as a
    fusion protein containing the bacterially derived
    LexA DBD.

34
The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
  Separately, a second gene (or a library of
cDNAs encoding potential interactors), Y, is
cloned in frame adjacent to an activation domain
of a different transcription factor. For
instance, it could be inserted next to the DNA
encoding the B42 activation domain (AD) in a
"prey" vector such as pYESTrp2.
35
The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
  Thus, in one strain of yeast, a known protein X
is fused to the DNA binding domain of a
transcription factor and in another strain,
unknown proteins are fused to the activation
domain of another transcription factor. If one of
the unknown proteins combines with X, it will
bring the AD over to the DBD, and transcription
will be activated. So the plasmids containing the
"bait" (known protein/DBD) and "prey" (unknown
protein/AD) are then placed into a yeast strain,
where a marker gene has a promoter containing the
sequence bound by the bait protein DBD.
36
The Two-Hybrid System for the Detection of
Protein-Protein Interactions
Protein-protein interaction
  However, in order to form a working
transcription factor, the bait fusion needs the AD
provided by the "prey." The only way that it can get
this activation domain is if the known protein X
binds some unknown protein Y that is
carrying the AD. If the X and Y proteins
interact, the B42 AD is brought into proximity of
the LexA DBD and transcription of the reporter
gene is activated. The activation of the reporter
gene can be screened by enzyme activity, light
release, or cell growth, depending on the type of
reporter gene activated.
41
  • Literature: two-hybrid systems
  • 1. Fields, S. and Song, O. 1989. A novel genetic
    system to detect protein-protein interactions.
    Nature 340: 245-246.
  • 2. Gyuris, J., Golemis, E., Chertkov, H., and
    Brent, R. 1993. Cdi1, a human G1 and S phase
    protein phosphatase that associates with Cdk2.
    Cell 75: 791-803.