Title: The Evolution of Protein Structure and Function as Studied through Structural Bioinformatics
1The Evolution of Protein Structure and Function
as Studied through Structural Bioinformatics
- Philip E. Bourne
- Skaggs School of Pharmacy and Pharmaceutical
Sciences - University of California San Diego
- pbourne_at_ucsd.edu
2Agenda
- What is structural bioinformatics and how do YOU drive it?
- Prerequisites: the sequence-structure-function relationship
- Some exciting developments
  - Using protein structure to study evolution
  - Functional prediction, pathway mapping and the RCSB PDB response
- Unsolved problems
  - Structure comparison
  - Domain definition
- What more could be done to drive the field forward?
4Personal Definition
- Improving our understanding of living systems through the study of macromolecular structure en masse
- Each structure is a data point in an effort to gain broader understanding
Structural Bioinformatics, 2nd Edition. J. Gu and P.E. Bourne (Eds.), John Wiley and Sons, NJ
What is Structural Bioinformatics?
5A Field Driven by Your Activity
[Figure: depositions to the PDB by decade; x-axis: year; y-axis: number of released entries]
What is Structural Bioinformatics?
6A Field Subject to Some Bias
[Figure: enzymes in the PDB by decade; y-axis: proportion of enzyme classes relative to total enzyme structures (percent)]
- Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206, 757
- Ribonuclease: Kartha, Bello, Harker (1967) Nature 213, 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242, 3753-3757
[Figure: RNA-containing structures in the PDB by decade; series: RNA only, protein/RNA complexes, DNA/RNA hybrid, protein/DNA/RNA complexes]
- tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem. Biophys. Res. Commun. 68, 89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, A. Klug (1974) Nature 250, 546-551
What is Structural Bioinformatics?
7A Field Subject to Some Bias PDB vs Human
Genome EC Hydrolases Begins to Illustrate
the Bias in the PDB
[Figure: EC class distribution in the PDB compared with the Ensembl human genome annotation]
- 2.5 Transferring alkyl or aryl groups: overrepresented in PDB
- 2.4 Glycosyltransferases: underrepresented in PDB
What is Structural Bioinformatics?
Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
9Sequence vs Structure
Twilight Zone
Midnight Zone
The classic HSSP curve from Sander and Schneider (1991) Proteins 9, 56-68
The Sequence Structure Function Relationship
10There Are No Absolute Rules - Similar Sequences
Different Structures
1HMPA Glycosyltransferase
1PIV1 Viral Capsid Protein
80-Residue Stretch (Yellow) with Over 40% Sequence Identity
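To make the sequence identity figure quoted on this slide concrete, here is a minimal sketch of how percent identity over an aligned stretch can be computed. The function name and the toy sequences are hypothetical; they are not the 1HMP or 1PIV sequences, and the choice to ignore gapped columns is an assumption (other conventions exist).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, non-gap columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment, for illustration only.
seq_a = "MKT-LLVAGG"
seq_b = "MRTALLIAG-"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity over ungapped columns")
```

A high value over a short stretch, as on this slide, says nothing by itself about whether the two chains share a fold.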
The Sequence Structure Function Relationship
11Structure vs Function Follows a Power Law
Distribution
- Some folds are promiscuous and adopt many
different functions - superfolds
Qian J, Luscombe NM, Gerstein M. JMB 2001; 313(4): 673-81
The Sequence Structure Function Relationship
12Examples of Superfolds..
The Sequence Structure Function Relationship
13Structure Is Highly Redundant
Structure Alignments using CE with z > 4.0
I.N. Shindyalov and P.E. Bourne 2000 Proteins 38(3), 247-260
The Sequence Structure Function Relationship
14How Can we Utilize these Seemingly Complex
Relationships?
16Nature's Reductionism
There are 20^300 possible proteins >>>> all the atoms in the Universe
9.5M protein sequences from UniProt/TrEMBL (10/09)
38,221 protein structures yield 1195 folds, 1962 superfamilies, 3902 families (SCOP 1.75)
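The size of the sequence space quoted on this slide can be checked with a few lines of arithmetic. This is a rough sketch: the chain length of 300 and the 20 residue types come from the slide, while the figure of roughly 10^80 atoms in the observable Universe is an assumption (a commonly quoted order-of-magnitude estimate, not a value from the talk).

```python
import math

RESIDUE_TYPES = 20
CHAIN_LENGTH = 300
# Assumption: commonly quoted order-of-magnitude estimate, not from the slide.
LOG10_ATOMS_IN_UNIVERSE = 80

# log10(20^300) = 300 * log10(20); working in logs avoids huge integers.
log10_sequences = CHAIN_LENGTH * math.log10(RESIDUE_TYPES)
excess_orders = log10_sequences - LOG10_ATOMS_IN_UNIVERSE

print(f"20^300 is roughly 10^{log10_sequences:.0f}")
print(f"That exceeds the assumed atom count by about {excess_orders:.0f} orders of magnitude")
```

Against a space of that size, the fold, superfamily and family counts on this slide are tiny, which is the reductionism the slide title refers to.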
Using Protein Structure to Study Evolution
17Consider First the Evolutionary History of One
Superfamily the Protein Kinase-like Superfamily
E. Scheeff and P.E. Bourne 2005 PLoS Comp. Biol.
1(5) e49.
Using Protein Structure to Study Evolution
18The Protein Kinase-like Superfamily
- A large family important to signal transduction in eukaryotes and many bacteria.
- Phosphotransferases: transfer phosphate group from ATP to Ser/Thr or Tyr residue on target protein, producing a range of downstream signaling effects.
- PKA: an example of a typical protein kinase (TPK) fold, shown in open book format
Using Protein Structure to Study Evolution
19The Protein Kinase-Like Superfamily
- A range of different families, all phosphotransferases
- A variety of different targets
- All possess a core cassette of elements shared with the TPKs
  - ATP binding
  - Catalysis
- Structures can be highly variable, particularly in the substrate binding regions
Family | Structural Representative | Phosphorylates | Biological result
Typical Protein Kinases (TPKs) | Protein Kinase A (PKA) | Ser/Thr or Tyr residues of proteins | Range of signaling effects
Alpha kinases | Channel Kinase (ChaK) | Ser/Thr residues in alpha-helices | Range of signaling effects
Actin-Fragmin Kinase (AFK) | Actin-Fragmin Kinase (AFK) | Thr residue of actin | Control of actin polymerization
Phosphatidylinositol 3- and 4-kinases | Phosphatidylinositol 3-kinase (PI3K) | Phosphatidylinositol (PI), PI-phosphates, PI-bisphosphates | Range of second-messenger signaling effects
Phosphatidylinositol phosphate kinases | Phosphatidylinositol phosphate kinase (PIPK) | PI-phosphates | Range of second-messenger signaling effects
Choline/ethanolamine kinases | Choline Kinase (CK) | Choline | Part of pathway that eventually produces phosphatidylcholine, an important constituent of membranes
Aminoglycoside Kinases | Aminoglycoside Kinases (AK) | Aminoglycoside antibiotics | Antibiotic resistance
Using Protein Structure to Study Evolution
20Method
- Begin with a multiple structure alignment using CE-MC (NAR 2004) of 30 comparable TPKs and APKs and manually correct in a pair-wise manner over a period of 1-2 person years
- Review the literature on each structure
- Review the associated sequence alignments derived from structure
E. Scheeff and P.E. Bourne 2005 PLoS Comp. Biol.
1(5) e49.
Using Protein Structure to Study Evolution
21Let Us Side Track for One Minute on Structural
Bioinformatics Methodology
Biological vs Geometric Alignments: Plastocyanin versus Azurin (from Godzik 1996)
- Maintain 9 of 10 interactions: RMSD 1.5 Å
- Maintain 5 of 10 interactions: RMSD 0.5 Å
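The two numbers contrasted on this slide, RMSD and the count of maintained interactions, measure different things. The sketch below shows one way each could be computed. It assumes the two structures are already superimposed (no optimal rotation is performed), and the function names, coordinates and residue numbers are hypothetical illustrations, not data from the plastocyanin/azurin comparison.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superimposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def maintained_interactions(alignment, interactions_a, interactions_b):
    """Count interactions in A whose aligned residue pair also interacts in B.

    alignment maps residue numbers in A to residue numbers in B.
    """
    count = 0
    for i, j in interactions_a:
        if i in alignment and j in alignment:
            mapped = (alignment[i], alignment[j])
            if mapped in interactions_b or mapped[::-1] in interactions_b:
                count += 1
    return count


# Hypothetical toy data.
toy_rmsd = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
toy_kept = maintained_interactions({1: 10, 2: 20, 3: 30},
                                   {(1, 2), (2, 3)},
                                   {(10, 20)})
```

An alignment can score well on one measure and poorly on the other, which is the trade-off the slide illustrates.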
Structural Bioinformatics Unsolved Problems
22Phosphoinositide-3 Kinase (D) and Actin-Fragmin
Kinase (E)
PKA
ChaK (Channel Kinase)
Using Protein Structure to Study Evolution
23Can We Propose an Evolutionary History for the
Protein Kinase-Like Superfamily?
- Bayesian inference of phylogeny (MrBayes)
- Manual structure alignment produces very high-quality sequence alignment of diverse homologues
- But sequence information is too degraded to produce branching with sufficient support (i.e. a high posterior probability)
- Addition of a matrix of structural characteristics (similar to morphological characteristics) produces a well-supported combined model
- Neither sequence nor structural characteristics alone are sufficient to produce a resolved tree; they must be used in combination.
PDB ID Group 1 2 3 4 5
1BO1 Atypical 0 0 0 0 1
1IA9 Atypical 1 1 1 1 0
1E8X Atypical 1 0 1 1 1
1CJA Atypical 1 0 1 1 1
1NW1 Atypical 1 0 1 0 0
1J7U Atypical 1 0 1 0 1
1CDK AGC 1 1 1 0 1
1O6L AGC 1 1 1 0 1
1OMW AGC 1 1 1 0 1
1H1W AGC 1 1 1 0 1
1MUO Other 1 1 1 0 1
1TKI CAMK 1 0 1 0 1
1JKL CAMK 1 0 1 0 1
1A06 CAMK 1 0 1 0 1
1PHK CAMK 1 0 1 0 1
1KWP CAMK 1 0 1 0 1
1IA8 CAMK 1 0 1 0 0
1GNG CMGC 1 0 1 0 1
1HCK CMGC 1 0 1 0 1
1JNK CMGC 1 0 1 0 1
1HOW CMGC 1 0 1 0 1
1LP4 Other 1 0 1 0 1
1F3M STE 1 0 1 0 1
1O6Y Other 1 0 1 0 1
1CSN CK1 1 0 1 0 1
1B6C TKL 1 0 1 0 1
2SRC TK 1 0 1 0 1
1LUF TK 1 0 1 0 1
1IR3 TK 1 0 1 0 1
1M14 TK 1 0 1 0 1
1GJO TK 1 0 1 0 1
Example columns:
1) Ion pair analogous to K72-E91 in PKA
2) α-Helix B present
3) State of α-Helix C (0 kinked, 1 straight)
4) State of Strand 4 (0 kinked, 1 straight)
5) α-Helix D present
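The structural character matrix above can be treated like a morphological character matrix. As a rough illustration, the sketch below encodes five rows copied from the table and counts how many of the five example characters differ between entries. This is not the MrBayes analysis described on the slide; the Hamming count and the function name are assumptions chosen only to show how binary structural characters separate the entries.

```python
# Five rows copied from the character matrix above (example columns 1-5).
characters = {
    "1BO1": (0, 0, 0, 0, 1),  # Atypical
    "1IA9": (1, 1, 1, 1, 0),  # Atypical
    "1CDK": (1, 1, 1, 0, 1),  # AGC
    "1TKI": (1, 0, 1, 0, 1),  # CAMK
    "2SRC": (1, 0, 1, 0, 1),  # TK
}


def character_differences(id_a: str, id_b: str) -> int:
    """Number of structural characters on which two entries differ."""
    return sum(1 for a, b in zip(characters[id_a], characters[id_b]) if a != b)


for other in ("1TKI", "2SRC", "1IA9", "1BO1"):
    print("1CDK vs", other, character_differences("1CDK", other))
```

With only these five example columns, 1TKI and 2SRC are indistinguishable, which is consistent with the slide's point that the structural characters must be combined with the sequence alignment.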
Using Protein Structure to Study Evolution
24Proposed Evolutionary History for the Protein
Kinase-Like Superfamily
[Figure: proposed phylogenetic tree; tip labels: APH, AGC, CK, CAMK, AFK, CMGC, TKL, PI3K, CK1, TK, PIPKIIβ, ChaK; branch labels shown: 0.64, 0.97, 1.0, 0.85, 0.78]
- Suggests distinctive history for atypical kinases, as opposed to intermittent divergence from the typical protein kinases (TPKs)
- TPK portion of tree shows high degree of agreement with Manning tree
- Branching is supported by species representation of kinase families
Legend:
- Atypical kinase families: blue
- Typical protein kinase groups (subfamilies): red
- Branch labels: posterior probability of branch
Using Protein Structure to Study Evolution
25What Happens if We Use Structure to Look Across
Superfamilies?
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
Using Protein Structure to Study Evolution
26To Answer this Question We Only Need to Make Use
of Existing Resources!
- SCOP: further catalogs Nature's reductionism into structural domains, folds, families and superfamilies
- SUPERFAMILY: assigns the above to fully sequenced proteomes
Using Protein Structure to Study Evolution
27Use of SCOP Superfamilies
- How do you distinguish convergent versus divergent evolution?
- The SCOP notion of a superfamily, with evidence of weak sequence relationships, can be used to discount convergence.
Using Protein Structure to Study Evolution
28Structure Provides an Evolutionary Fingerprint
Distribution among the three kingdoms, as taken from SUPERFAMILY
- Superfamily distributions would seem to be related to the complexity of life
- Update of the work of Caetano-Anollés and Caetano-Anollés (2003) Genome Research 13, 1563
[Figure: Venn diagram of superfamily counts across the three kingdoms; region labels, given as any genome / all genomes: 153/14, 21/2, 310/0, 645/49, 1, 9/1, 29/0, 68/0]
Using Protein Structure to Study Evolution
29The Unique Superfamily in Archaea d.17.6
- Archaeosine tRNA-guanine transglycosylase (tgt), C2 domain
- First step in the biosynthesis of an archaea-specific modified base, archaeosine (7-formamidino-7-deazaguanosine)
- Found in tRNAs
- Was found exclusively in Archaea.
Reference: InterPro IPR004804
Using Protein Structure to Study Evolution
30Method Distance Determination
Presence/Absence Data Matrix
SCOP superfamily (FSF) | C. intestinalis | C. briggsae | F. rubripes
a.1.1 | 1 | 1 | 1
a.1.2 | 1 | 1 | 1
a.10.1 | 0 | 0 | 1
a.100.1 | 1 | 1 | 1
a.101.1 | 0 | 0 | 0
a.102.1 | 0 | 1 | 1
a.102.2 | 1 | 1 | 1
Distance Matrix
 | C. intestinalis | C. briggsae | F. rubripes
C. intestinalis | 0 | 101 | 109
C. briggsae | | 0 | 144
F. rubripes | | | 0
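The step from the presence/absence matrix to the distance matrix can be sketched as follows. The seven superfamily rows are copied from the excerpt on this slide; the distance used here (the number of superfamilies present in one proteome but not the other) is an assumption chosen for simplicity and may differ from the measure used in the published work. The values 101, 109 and 144 in the distance matrix above come from the full data set, so this excerpt cannot reproduce them.

```python
from itertools import combinations

# Excerpt of the presence/absence matrix shown on this slide.
superfamilies = ["a.1.1", "a.1.2", "a.10.1", "a.100.1",
                 "a.101.1", "a.102.1", "a.102.2"]
presence = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}


def fsf_distance(org_a: str, org_b: str) -> int:
    """Count superfamilies present in exactly one of the two proteomes."""
    return sum(1 for a, b in zip(presence[org_a], presence[org_b]) if a != b)


# Upper-triangular distance matrix over all organism pairs.
distances = {pair: fsf_distance(*pair) for pair in combinations(presence, 2)}
for (org_a, org_b), d in distances.items():
    print(f"{org_a} vs {org_b}: {d}")
```

A matrix of this kind can then be passed to any standard distance-based tree-building method.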
Using Protein Structure to Study Evolution
31Is Structure a Useful Discriminator of Species? -
Yes
Eukaryota
Bacteria
Archaea
The method cleanly placed all species in their
correct superkingdoms
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
Using Protein Structure to Study Evolution
32If Structure is so Conserved, is it a Useful Tool in the Study of Evolution? The Answer Would Appear to be Yes
- It is possible to generate a reasonable tree of
life from merely the presence or absence of
superfamilies (FSFs) within a given proteome
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
Using Protein Structure to Study Evolution
33The Influence of Environment on Life
Chris Dupont, Scripps Institution of Oceanography, UCSD
Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827
Using Protein Structure to Study Evolution
34Consider the Distribution of Disulfide Bonds
among Folds
- Disulfides are only stable under oxidizing conditions
- Oxygen content gradually accumulated during the Earth's evolution
- The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
- Oxygen began to accumulate 2.0 billion years ago
- Logical deduction: disulfides are more prevalent in folds (organisms) that evolved later
- This would seem to hold true
- Can we take this further?
Using Protein Structure to Study Evolution
35Evolution of the Earth
- 4.5 billion years of change
- 300 ± 50 K
- 1-5 atmospheres
- Constant photoenergy
- Chemical and geological changes
- Life has evolved in this time
- The ocean was the cradle for 90% of evolution
Using Protein Structure to Study Evolution
36Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth's History
- Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (2.3 Gya) is debated; therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
- The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.
Replotted from Saito et al. 2003 Inorganica Chimica Acta 356, 308-318
Using Protein Structure to Study Evolution
37The Gaia Hypothesis
Gaia (pronounced /ˈɡeɪ.ə/ or /ˈɡaɪ.ə/; "land" or "earth", from the Greek Γαῖα) is a Greek goddess personifying the Earth
- Gaia: a complex entity involving the Earth's biosphere, atmosphere, oceans, and soil; the totality constituting a feedback system which seeks an optimal physical and chemical environment for life on this planet.
James Lovelock
Using Protein Structure to Study Evolution
38The Question
- Have the emergent properties of an organism, as judged by its protein content, been influenced by the environment?
- Will do this by consideration of the metallomes of a broad range of species
- The metallomes can only be deduced by consideration of the protein structures to which the metal is covalently bound
- Will hypothesize that these emergent properties in turn influenced the environment
Using Protein Structure to Study Evolution
39Making the Metallome of Each Species Can Only be Done from Structure and Requires Human Effort
- Start with SCOP
- Each superfamily level assignment was checked manually for metal binding
- All the structures representing the family had to bind the metal for it to be considered unambiguous
- The literature was consulted to resolve ambiguities
- SUPERFAMILY database used to map to proteomes
- 23 Archaea, 233 Bacteria, 57 Eukaryota
- Cu, Ni, Mo ignored (< 0.3% of proteome)
Using Protein Structure to Study Evolution
40Levels of Ambiguity
- Ambiguous: superfamily binds different metals or has members that are not known to bind metals
- Ditto families
- Approx. 50% of superfamilies and 10% of families are ambiguous
- Only unambiguous families used in this study
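The ambiguity rule described on this slide and the previous one can be written down as a small check. This is only a sketch of the stated criterion: the function name, the group labels and the metal annotations are hypothetical, and in the work described the assignments were made manually with reference to the literature rather than by a script.

```python
def classify_metal_binding(metals_per_structure):
    """Label a family or superfamily from per-structure metal annotations.

    metals_per_structure holds one set of bound metals per representative
    structure; an empty set means the structure is not known to bind a metal.
    """
    all_metals = set().union(*metals_per_structure)
    same_everywhere = all(m == all_metals for m in metals_per_structure)
    # Unambiguous only if every structure binds the same single metal.
    if same_everywhere and len(all_metals) == 1:
        return "unambiguous"
    return "ambiguous"


# Hypothetical annotations, for illustration only.
examples = {
    "group_1": [{"Fe"}, {"Fe"}, {"Fe"}],  # all bind Fe
    "group_2": [{"Fe"}, {"Zn"}],          # members bind different metals
    "group_3": [{"Zn"}, set()],           # one member not known to bind
}
labels = {name: classify_metal_binding(sets) for name, sets in examples.items()}
print(labels)
```

Only groups labelled unambiguous under a rule like this would be carried forward when mapping metal usage onto proteomes.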
Using Protein Structure to Study Evolution
41Superfamily Distribution As Well As Overall
Content Has Changed
Using Protein Structure to Study Evolution
42Fe Containing Proteins in Bacteria
- A quantile plot showing the percent of Bacterial proteomes each Fe-binding fold family occurs in (x).
- This plot also shows the average copy number of that fold family in the proteomes where it occurs.
- Few Fe-binding folds are in most proteomes.
- Widespread Fe-binding folds are not necessarily abundant.
- Similar trends are observed for Zn, Mn, and Co in all three Superkingdoms.
Using Protein Structure to Study Evolution
43Metal Binding Proteins are Not Consistent Across
Superkingdoms
Since these data are derived from current species
they are independent of evolutionary events such
as duplication, gene loss, horizontal transfer
and endosymbiosis
Using Protein Structure to Study Evolution
44Power Laws: Fundamental Constants in the Evolution of Proteomes
- A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).
van Nimwegen E (2006) in Koonin EV, Wolf YI, Karev GP (Eds.), Power laws, scale-free networks, and genome biology
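The slope referred to on this slide is the exponent of a power law, which can be estimated as the slope of a straight-line fit in log-log space. The sketch below uses ordinary least squares on synthetic numbers; the variable names, the choice of total domain count as the measure of proteome size, and the data are all assumptions for illustration, not values from the study.

```python
import math


def power_law_slope(proteome_sizes, group_counts):
    """Least-squares slope of log(group_counts) against log(proteome_sizes)."""
    if len(proteome_sizes) != len(group_counts) or len(proteome_sizes) < 2:
        raise ValueError("need two or more paired observations")
    xs = [math.log(s) for s in proteome_sizes]
    ys = [math.log(c) for c in group_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic proteomes (hypothetical numbers).
total_domains = [1000, 2000, 4000, 8000]
in_step = [10, 20, 40, 80]       # grows in proportion to proteome size
duplicated = [10, 40, 160, 640]  # grows as the square of proteome size

slope_in_step = power_law_slope(total_domains, in_step)
slope_duplicated = power_law_slope(total_domains, duplicated)
print(slope_in_step, slope_duplicated)
```

In this toy case the first group has a slope of 1 (in equilibrium with growth) and the second a slope of 2 (preferentially duplicated).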
Using Protein Structure to Study Evolution
45Metal Binding Proteins are Not Consistent Across
Superkingdoms
Using Protein Structure to Study Evolution
46Why are the Power Laws Different for Each
Superkingdom?
- Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are correlated to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen
- We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each Superkingdom
- This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments
Using Protein Structure to Study Evolution
47Do the Metallomes Contain Further Support for
this Hypothesis?
Using Protein Structure to Study Evolution
48e- Transfer Proteins: Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
- Fe-S clusters
  - Fe bound by S
  - Cluster held in place by Cys
  - Generally negative reduction potentials
  - Very susceptible to oxidation
- Cytochromes
  - Fe bound by heme (and amino acids)
  - Generally positive reduction potentials
  - Less susceptible to oxidation
Using Protein Structure to Study Evolution
49Hypothesis
- Emergence of cyanobacteria changed oxygen concentrations
- Impacted relative metal ion concentrations in the ocean
- Organisms evolved to use these metals in new ways to evolve new biological processes, e.g. complex signaling
- This in turn further impacted the environment
- Only protein structures could reveal such dependencies
Using Protein Structure to Study Evolution
51Our Methods are Still Not Good Enough - The 3D
Domain Assignment Problem
A domain is a fundamental structural, functional and evolutionary unit of a protein
- Compact
- Stable
- Have hydrophobic core
- Fold independently
- Perform specific function
- Can be re-shuffled and put together in different combinations
- Evolution works on the level of domain
Unsolved Problems 3D Domain Definition
52Evaluation of automatic domain assignment methods
Large structures, complex architectures
Structures with issues (all/most methods)
1dcea
Very small simple domains difficult to separate.
Issues minimum domain size, low contact density
NCBI method, PDP, DomainParser 5
Experts 3
PUU 6
1bxrc
1e88a
DomainParser 5
Experts 6
NCBI methods 8
NCBI 2
PDP 2
PUU 1
Experts 3
PDP 2
Unsolved Problems 3D Domain Definition
PUU 2
53Manual vs. Automatic Consensus
- Chains with manual consensus: 375 (80% of entire dataset)
- Chains with automatic consensus: 374 (80% of entire dataset)
- Chains with consensus (automatic or manual): 424 (90.6% of entire dataset)
- Automatic consensus only: 46 chains (10.9% of chains with consensus)
- Manual consensus only: 47 chains (11.1% of chains with consensus)
- Automatic consensus and manual consensus disagree: 3 chains (0.7% of chains with consensus)
JMB 2004 339(3), 647-678
Unsolved Problems 3D Domain Definition
54Natalie Dawson Unpublished
http://itol.embl.de/
55Natalie
57Structure determination or modeling of whole
metabolic network
58What are the implications of this?
- Biochemical reactions, pathways, and networks can now be described in the context of entire cells
- Enables more realistic simulations of the behavior of metabolic networks
- Better understanding of evolution: compare pathways between organisms
- Predict effects of mutations and drugs
- Synthetic Biology
60Pathway
63Better Interoperability Between the Data and the
Literature Upon Which it is Based
What More Could be Done to Drive the Field
Forward?
64Data
Knowledge
Database Knowledgebase Wikis
Datapacks Journals
Data Only
Annotation
Data Annotation
Data Some Annotation
Data Some Annotation Some Integration
PLoS iStructure
65The Database View
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Context
What More Could be Done to Drive the Field
Forward?
66The Literature View Web 3.0?
http://betastaging.rcsb.org
What More Could be Done to Drive the Field
Forward?
67Acknowledgements
- Protein-protein Interactions
  - JoLan Chung, Wei Wang
- Functional Flexibility
  - Jenny Gu, Michael Gribskov
- Multipolar Representation
  - Apostol Gramada
- Funding: NSF, NIH