Title: Evolution of Protein-Coding Genes and the Generalized Mistranslation-Induced Misfolding Hypothesis
1. Evolution of Protein-Coding Genes and the Generalized Mistranslation-Induced Misfolding Hypothesis
Yuri Wolf (1), Irina Gopich (2), Eugene Koonin (1), David Lipman (1)
(1) NCBI and (2) NIDDK, NIH, Bethesda, MD, USA
MCCMB, July 2009, Moscow
2. Evolution Rate: Two Invariants
The concept of the "molecular clock": family-specific evolutionary rates remain nearly constant over long periods [Zuckerkandl E, Pauling L. Molecules as documents of evolutionary history. J Theor Biol. 1965, 8:357-366].
[Figure: sequence distance vs. time; Protein A (faster), Protein B (slower).]
Similar shape of organism-specific distributions of evolutionary rates [Grishin NV, Wolf YI, Koonin EV. From complete genomes to measures of substitution rate variability within and between proteins. Genome Res. 2000, 10:991-1000].
[Figure panels: Bacteria, Archaea, Eukaryota.]
3. Intrinsic Constraints and Importance
First explanation for differences in evolutionary rates of protein-coding genes: the rate is determined by intrinsic structural-functional constraints and gene dispensability [Wilson AC, Carlson SS, White TJ. Biochemical evolution. Annu Rev Biochem. 1977, 46:573-639].
R_i = f(P_i) g(Q_i)
Taken for granted for 20 years; empirical studies started in the late 1990s [Hurst LD, Smith NG. Do essential genes evolve slowly? Curr Biol. 1999, 9:747-750; many others in 2000-2006]. A very complex picture emerged: evolution rate appears to be very weakly correlated with dispensability. The strongest observed correlate of the evolution rate is expression level.
4. Gene Status
[Diagram: "STATUS" linked to "phenomic" measures and "evolutionary" measures.]
The concept of "gene status" as the most important meta-variable, affecting phenomic variables directly and evolutionary variables indirectly, explains the observed pattern of correlations and allows prediction of correlations for novel measurements. No mechanistic explanation is suggested, however [Wolf YI, Carmel L, Koonin EV. Unifying measures of gene function and evolution. Proc Biol Sci. 2006, 273:1507-1515].
5. Mistranslation-Induced Misfolding
"We propose, and demonstrate using a molecular-level evolutionary simulation, that selection against toxicity of misfolded proteins generated by ribosome errors suffices to create all of the observed covariation between evolutionary rate and expression" [Drummond DA, Wilke CO. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008, 134:341-352].
6. Generalized MIM Hypothesis
[Diagram labels: overall effect; intrinsic properties of the protein; expression level.]
- Specific properties of the gene determine the likelihood of a disturbance.
- Intrinsic properties of the protein (structural-functional constraints, SFC) determine the outcome of a disturbance.
- Expression level amplifies the overall effect of the disturbance (amplification by expression, ABE).
- Highly expressed genes are fine-tuned to reduce the misfolding cost.
7. Why ABE?
[Figure label: Efficiency 50%]
8. ABE and Sequence Evolution
[Figure labels: poor folding, robust folding; cost of misfolding; expression; robustness; sequence space.]
High expression -> large difference in cost -> large difference in fitness -> strongly constrained
Low expression -> small difference in cost -> small difference in fitness -> weakly constrained
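The chain of reasoning on this slide can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that the misfolding cost of a gene is the product of its expression level and the fraction of its protein molecules that misfold. The function names and all numbers are hypothetical and are not taken from the talk.

```python
def misfolding_cost(expression, misfold_fraction):
    """Toy cost: number of misfolded molecules, assumed proportional to fitness cost."""
    return expression * misfold_fraction

# Hypothetical misfolding fractions for a robustly and a poorly folding variant.
ROBUST, POOR = 0.01, 0.05

def cost_difference(expression):
    """Extra cost of carrying the poorly folding variant at a given expression level."""
    return misfolding_cost(expression, POOR) - misfolding_cost(expression, ROBUST)

low = cost_difference(expression=100)       # weakly expressed gene
high = cost_difference(expression=100_000)  # highly expressed gene
print(low, high)
```

Under this assumption the same difference in folding robustness costs a thousand times more in the highly expressed gene, which is the sense in which high expression implies stronger constraint.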
9. SFC vs. ABE
Relative importance of these factors is unknown...
[Diagram labels: ?, a]
10. SFC vs. ABE
[Diagram labels: unequal representation of different types of structures; evolution rate; expression level.]
...because in real-world data the influence of expression is conflated with the structural-functional influence (i.e. highly expressed proteins are not a random subset by structure).
11. SFC vs. ABE: the first glance
Domains in multidomain proteins: different structural/functional properties, but the same expression level.
Can be used to investigate SFC effects when ABE effects are controlled for [Wolf MY, Wolf YI, Koonin EV. Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution. Biol Direct. 2008, 3:40].
If ABE dominates, different domains within multidomain proteins should evolve at the same rate (rate homogenization); if SFC dominates, domains in multidomain proteins should evolve at their "normal" rates (rate independence). We found that neither is true, i.e. the two contributions are comparable (a ? ?).
12. New Experimental Data
Two parallel arrays of (apparently very clean) protein MassSpec abundance data, well correlated with evolution rate [Schrimpf SP et al. Comparative functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes. PLoS Biol. 2009, 7:e48].
13. SFC vs. ABE: an orthogonal look
Orthologs have the same structure and function, so the difference between their evolution rates is determined by the difference between their expression levels.
[Diagram: RATE and EXPR shown for each of two orthologs.]
Can be used to investigate ABE effects when SFC effects are controlled for.
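One way to act on this idea is to correlate the between-ortholog difference in evolution rate with the between-ortholog difference in abundance, so that anything shared by the two orthologs cancels out. The sketch below is an illustration under that assumption, not the authors' procedure. The function names and the toy data are hypothetical, and all values are taken to be on a log scale.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ortholog_difference_correlation(rate_x, rate_y, abund_x, abund_y):
    """Correlate per-gene rate differences with per-gene abundance differences."""
    d_rate = [rx - ry for rx, ry in zip(rate_x, rate_y)]
    d_abund = [ax - ay for ax, ay in zip(abund_x, abund_y)]
    return pearson(d_rate, d_abund)

# Toy data: a shared per-gene term plus a purely abundance-driven term.
shared = [0.3, -0.2, 0.1, 0.5]
abund_x = [1.0, 2.0, 3.0, 4.0]
abund_y = [1.5, 1.0, 3.5, 2.0]
rate_x = [s - 0.2 * a for s, a in zip(shared, abund_x)]
rate_y = [s - 0.2 * a for s, a in zip(shared, abund_y)]
r_diff = ortholog_difference_correlation(rate_x, rate_y, abund_x, abund_y)
print(round(r_diff, 6))
```

In the toy data the shared term drops out of the differences entirely, so the correlation reflects only the abundance-driven term. Real data would also contain noise in both measurements.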
14. Model Assumptions
- Evolution rate is a multiplicative function of
  - structural and functional constraints (SFC)
  - effect of amplification by expression (ABE)
  - other unknown factors, random noise, and errors of rate estimation.
- Both SFC and ABE effects are well approximated by power functions of a hidden SF constraint factor and protein translation rate, respectively. The SFC effect is gene- and organism-independent; the ABE effect is gene-independent.
- Translation rate is estimated by measuring the abundance of the corresponding gene product.
- Effects of unknown factors, random noise, and imperfections of rate and abundance estimation can be combined into a single random variable, independent of the other variables.
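The assumptions above can be illustrated with a short sketch of a multiplicative power-law model and its additive form on the log scale. The parameter names `sfc_power` and `abe_power`, the numerical values, and the log-normal noise are placeholders chosen for illustration. None of them are taken from the talk.

```python
import math
import random

def evolution_rate(s, t, sfc_power, abe_power, noise=1.0):
    """Multiplicative model: power functions of the hidden SF constraint
    factor s and the translation rate t, times a random factor."""
    return (s ** sfc_power) * (t ** abe_power) * noise

def log_evolution_rate(s, t, sfc_power, abe_power, log_noise=0.0):
    """The same model on the log scale, where the contributions add up."""
    return sfc_power * math.log(s) + abe_power * math.log(t) + log_noise

rng = random.Random(0)
log_noise = rng.gauss(0.0, 0.5)  # hypothetical noise level
rate = evolution_rate(2.0, 50.0, -0.6, -0.2, noise=math.exp(log_noise))
log_rate = log_evolution_rate(2.0, 50.0, -0.6, -0.2, log_noise=log_noise)
print(math.isclose(math.log(rate), log_rate))
```

Working on the log scale turns the product into a sum, which is what makes the model tractable in terms of linear correlations between log-transformed rates and abundances.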
15. Model Structure
[Diagram: nodes A_X, A_Y, T_X, T_Y, S, R_X, R_Y; edge labels a_X, a_Y, r_e, ?.]
Relationships between the model components.
16. Observables
[Diagram: nodes A_X, A_Y, T_X, T_Y, S, R_X, R_Y.]
Observed correlations.
17. Model: Basic Equations
Parameters:
- ?: strength of the SFC effect
- a_X, a_Y: strength of the ABE effect in two organisms (e.g. worm and fly)
- ?: relationship between translation rate and measured abundance
- r_e: correlation between errors in abundance measurements
Data (log scale, standardized):
- R_X,i, R_Y,i: evolution rates of orthologous genes in two organisms
- A_X,i, A_Y,i: abundances of orthologs in two organisms
Computed correlations: r_R, r_A, r_RAXX, r_RAYY, r_RAXY, r_RAYX
[Equations: ev. rate expressed through the SFC effect, the ABE effect and random factors; abundance expressed through translation rate and random factors.]
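The six observed correlations listed on this slide can be obtained from the four data vectors as sketched below. The reading of the subscripts (for example, that the X-Y cross term pairs the rate in organism X with the abundance in organism Y) is an assumption based on the naming. The dictionary keys, function names and toy data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def observed_correlations(r_x, r_y, a_x, a_y):
    """Rates (r_*) and abundances (a_*) of orthologs in organisms X and Y."""
    return {
        "r_R": pearson(r_x, r_y),     # rate vs. rate across organisms
        "r_A": pearson(a_x, a_y),     # abundance vs. abundance across organisms
        "r_RAXX": pearson(r_x, a_x),  # rate vs. abundance within X
        "r_RAYY": pearson(r_y, a_y),  # rate vs. abundance within Y
        "r_RAXY": pearson(r_x, a_y),  # rate in X vs. abundance in Y (assumed)
        "r_RAYX": pearson(r_y, a_x),  # rate in Y vs. abundance in X (assumed)
    }

# Toy log-scale values for five ortholog pairs.
r_x = [0.1, -0.4, 0.8, -1.2, 0.5]
r_y = [0.0, -0.6, 0.9, -0.8, 0.3]
a_x = [-0.2, 0.7, -0.9, 1.1, -0.4]
a_y = [0.1, 0.5, -1.0, 0.9, -0.6]
corr = observed_correlations(r_x, r_y, a_x, a_y)
print(sorted(corr))
```

Pearson correlation is unchanged by standardizing either variable, so the same values are obtained whether or not the log-scale data are standardized first.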
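18. Model Solution
a_X, a_Y and ? (strength of the SFC effect) are expressed using the observed correlations (r_R, r_A, r_RAXX, r_RAYY, r_RAXY, r_RAYX) and two unknown parameters characterizing the experimental procedure: ? (relationship between translation rate and measured abundance) and r_e.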
19. Worm and Fly: Correlations
Variables           Correlation
r_A                 0.80
r_R                 0.52
r_RAXX, r_RAYY      -0.41, -0.34
r_RAXY, r_RAYX      -0.37, -0.32
r_?                 -0.09 (p-value 1.7x10^-5)
[Diagram labels: r_RAXX, r_RAYY; R_fly, R_worm, A_fly.]
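20. Worm and Fly: Solution Area
a_X, a_Y and ? (strength of the SFC effect) are expressed through ? (relationship between translation rate and measured abundance) and r_e.
Not all possible values of 0 <= ? <= 1 and 0 <= r_e <= 1 satisfy boundary conditions (e.g. ?^2 >= 0).
[Figure: the (?, r_e) plane with regions labeled "allowed values" and "impossible values"; the point of perfect measurement is marked.]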
21. Worm and Fly: Solution Surfaces
[Figure: three panels showing a_X, a_Y and ? (strength of the SFC effect) as surfaces over the (?, r_e) plane.]
- Two approaches
  - assumption of perfect measurement (? = 1, r_e = 0)
  - Bayesian reasoning ("median" values of a_X, a_Y and ?, chosen so that the part of the allowed area where e.g. ?(?, r_e) < ? is 0.5 of the total area)

Variable    "Perfect"       "Median"
a_X, a_Y    -0.17, -0.10    -0.22, -0.13
?           -0.68           -0.64
?/a         4.0, 6.9        2.9, 4.9
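The "median" approach on this slide can be read as an area-weighted median of a solution surface over the allowed region of the two experimental parameters. The sketch below illustrates that reading on a uniform grid. It treats every allowed grid cell as equally likely, which is an assumption. The surface and the allowed region used here are hypothetical stand-ins, because the actual expressions are not given in the slides.

```python
from statistics import median

def allowed_grid_values(f, allowed, steps=100):
    """Evaluate f at cell centres of a uniform grid over the unit square,
    keeping only the cells where allowed(x, y) is true."""
    values = []
    for i in range(steps):
        for j in range(steps):
            x, y = (i + 0.5) / steps, (j + 0.5) / steps
            if allowed(x, y):
                values.append(f(x, y))
    return values

def area_median(f, allowed, steps=100):
    """Value m such that f < m on about half of the allowed area."""
    return median(allowed_grid_values(f, allowed, steps))

def toy_surface(x, y):
    """Hypothetical stand-in for a solution surface."""
    return -x * (1.0 - y)

def toy_allowed(x, y):
    """Hypothetical stand-in for the region of allowed parameter values."""
    return y <= x

m = area_median(toy_surface, toy_allowed)
values = allowed_grid_values(toy_surface, toy_allowed)
share_below = sum(v < m for v in values) / len(values)
print(round(m, 3), round(share_below, 3))
```

Because grid cells have equal area, the ordinary median of the values in allowed cells approximates the area-based median, and `share_below` comes out close to one half.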
22. Estimates with Other Data
?/a estimate for: "Perfect"; "Median"
worm/fly MassSpec: 4.0, 6.9; 2.9, 4.9
worm/fly MassSpec (bootstrap): 2.8, 20.7
worm/fly Affymetrix mRNA: 7.0, 72.5; 5.2, 24.6
human/mouse EST mRNA: 9.3, 51.1
- Conclusions (based on worm/fly MassSpec data and median estimates)
  - correlation with abundance explains 10-17% of variance in the evolution rate of protein-coding genes
  - SFC and ABE together could explain 49-57% of rate variance
  - the SFC effect is 3-5 times stronger than the ABE effect
  - SFC alone would explain 41% of rate variance; ABE alone would explain 2-5% of rate variance
  - SFC and ABE are correlated at r of approximately 0.37; the combined effect explains the remaining 8-15% of rate variance
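Several of these figures can be checked against numbers reported on the earlier slides. The check assumes that the variance explained by a single standardized factor is the square of its coefficient, which is an interpretation and is not stated in the slides. The variable names are hypothetical. The input values are copied from the worm/fly correlation table and from the "Median" column of the estimates.

```python
# Rate-abundance correlations for the worm/fly MassSpec data (slide 19).
rate_abundance_r = [-0.41, -0.34, -0.37, -0.32]
# "Median" estimates of the SFC and ABE strengths (slide 21).
median_sfc = -0.64
median_abe = [-0.22, -0.13]

def percent_explained(coefficient):
    """Squared standardized coefficient, as a rounded percentage."""
    return round(100 * coefficient ** 2)

abundance_range = sorted(percent_explained(r) for r in rate_abundance_r)
sfc_alone = percent_explained(median_sfc)
abe_alone = sorted(percent_explained(a) for a in median_abe)
sfc_to_abe = [round(median_sfc / a, 1) for a in median_abe]
print(abundance_range[0], abundance_range[-1], sfc_alone, abe_alone, sfc_to_abe)
```

Under this assumption the squared correlations span 10 to 17 percent, the squared SFC estimate gives 41 percent, the squared ABE estimates give 2 and 5 percent, and the ratios 2.9 and 4.9 correspond to the "3-5 times stronger" statement.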
23. Generalized MIM Hypothesis
[Figure: two panels, "naive MIM model" and "generalized MIM model"; axes: folding robustness, expression (high to low), fitness; annotations in each panel: narrow fitness peak, low R; wide fitness peak, high R.]
24. Acknowledgments
Irina Gopich (NIDDK)
Eugene Koonin (NCBI)
David Lipman (NCBI)
Sabine Schrimpf and Christian von Mering (of Schrimpf et al., 2009; University of Zurich, Switzerland)