The Evolution of Protein Structure and Function as Studied through Structural Bioinformatics

Slides: 68
Provided by: jenn68
Learn more at: http://www.sdsc.edu


Transcript and Presenter's Notes

1
The Evolution of Protein Structure and Function
as Studied through Structural Bioinformatics
  • Philip E. Bourne
  • Skaggs School of Pharmacy and Pharmaceutical
    Sciences
  • University of California San Diego
  • pbourne_at_ucsd.edu

2
Agenda
  • What is structural bioinformatics and how do YOU
    drive it?
  • Prerequisites: the sequence-structure-function
    relationship
  • Some exciting developments
  • Using protein structure to study evolution
  • Functional prediction, pathway mapping and the
    RCSB PDB response
  • Unsolved problems
  • Structure comparison
  • Domain definition
  • What more could be done to drive the field
    forward?

3
(No Transcript)
4
Personal Definition
  • Improving our understanding of living systems
    through the study of macromolecular structure en
    masse
  • Each structure is a data point in an effort to
    gain broader understanding

Structural Bioinformatics, 2nd Edition, J. Gu and P.E. Bourne (Eds.), John
Wiley and Sons, NJ
What is Structural Bioinformatics?
5
A Field Driven by Your Activity
(Figure: depositions to the PDB by decade; number of released entries vs. year)
6

A Field Subject to Some Bias
(Figure: enzymes; proportion of enzyme classes relative to total
enzyme structures, in percent, by decade)
Lysozyme: Blake, Koenig, Mair, North, Phillips,
Sarma (1965) Nature 206, 757
Ribonuclease: Kartha, Bello, Harker (1967) Nature
213, 862-865; Wyckoff, Hardman, Allewell,
Inagami, Johnson, Richards (1967) J. Biol. Chem.
242, 3753-3757
(Figure: RNA-containing structures by decade; categories:
protein/RNA complexes, RNA only, DNA/RNA hybrid,
protein/DNA/RNA complexes)
tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem
Biophys Res Commun. 68, 89-96; J.D. Robertus,
J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown,
B.F.C. Clark, A. Klug (1974) Nature 250,
546-551
7
A Field Subject to Some Bias: PDB vs Human
Genome. EC Hydrolases Begin to Illustrate
the Bias in the PDB
(Figure: panels labeled PDB and Ensembl Human Genome Annotation)
  • EC 2.5 (transferring alkyl or aryl groups): over-represented in PDB
  • EC 2.4 (glycosyltransferases): under-represented in PDB
Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
8
Agenda
  • What is structural bioinformatics and how do YOU
    drive it?
  • Prerequisites: the sequence-structure-function
    relationship
  • Some exciting developments
  • Using protein structure to study evolution
  • Functional prediction, pathway mapping and the
    RCSB PDB response
  • Unsolved problems
  • Structure comparison
  • Domain definition
  • What more could be done to drive the field
    forward?

9
Sequence vs Structure
(Figure: the classic HSSP curve from Sander and Schneider
(1991) Proteins 9, 56-68; labeled regions: Twilight Zone, Midnight Zone)
The Sequence Structure Function Relationship
10
There Are No Absolute Rules - Similar Sequences,
Different Structures
1HMPA Glycosyltransferase
1PIV1 Viral Capsid Protein
80 Residue Stretch (Yellow) with Over 40%
Sequence Identity
11
Structure vs Function Follows a Power Law
Distribution
  • Some folds are promiscuous and adopt many
    different functions - superfolds

Qian J, Luscombe NM, Gerstein M. JMB
2001; 313(4): 673-81
12
Examples of Superfolds..
13
Structure Is Highly Redundant
Structure Alignments using CE with z > 4.0
I.N. Shindyalov and P.E. Bourne 2000 Proteins
38(3), 247-260
14
How Can we Utilize these Seemingly Complex
Relationships?
15
Agenda
  • What is structural bioinformatics and how do YOU
    drive it?
  • Prerequisites: the sequence-structure-function
    relationship
  • Some exciting developments
  • Using protein structure to study evolution
  • Functional prediction, pathway mapping and the
    RCSB PDB response
  • Unsolved problems
  • Structure comparison
  • Domain definition
  • What more could be done to drive the field
    forward?

16
Nature's Reductionism
There are 20^300 possible proteins >>>> all the
atoms in the Universe
9.5M protein sequences from UniProt/TrEMBL
(10/09)
38,221 protein structures yield 1195 folds, 1962
superfamilies, 3902 families (SCOP 1.75)
Using Protein Structure to Study Evolution
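The size comparison on this slide can be checked with a few lines of arithmetic. This is only a sketch: it reads 20^300 as the number of sequences of 300 residues over 20 amino acids, and it assumes the commonly quoted order of magnitude of 10^80 atoms in the observable Universe, a figure that does not appear on the slide.

```python
import math

AMINO_ACIDS = 20               # standard amino acid alphabet
CHAIN_LENGTH = 300             # chain length implied by the slide's 20^300
ATOMS_IN_UNIVERSE_LOG10 = 80   # assumption: commonly quoted estimate, not from the slide

# Python integers are arbitrary precision, so the exact count is available.
sequence_count = AMINO_ACIDS ** CHAIN_LENGTH

# Work in log10 to compare orders of magnitude.
log10_sequences = CHAIN_LENGTH * math.log10(AMINO_ACIDS)
excess_orders = log10_sequences - ATOMS_IN_UNIVERSE_LOG10

print(f"20^300 is about 10^{log10_sequences:.1f}")
print(f"about {excess_orders:.0f} orders of magnitude above the assumed atom count")
```

Against that sequence space, the 1195 folds quoted from SCOP 1.75 show how strongly Nature has reduced the possibilities.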
17
Consider First the Evolutionary History of One
Superfamily: the Protein Kinase-like Superfamily
E. Scheeff and P.E. Bourne 2005 PLoS Comp. Biol.
1(5) e49.
18
The Protein Kinase-like Superfamily
  • A large family important to signal transduction
    in eukaryotes and many bacteria.
  • Phosphotransferases: transfer a phosphate group
    from ATP to a Ser/Thr or Tyr residue on the target
    protein, producing a range of downstream
    signaling effects.
  • PKA: an example of a typical protein kinase
    (TPK) fold, shown in open book format

19
The Protein Kinase-Like Superfamily
  • A range of different families, all
    phosphotransferases
  • A variety of different targets
  • All possess a core cassette of elements shared
    with the TPKs
    • ATP binding
    • Catalysis
  • Structures can be highly variable, particularly
    in the substrate binding regions

Family | Structural representative | Phosphorylates | Biological result
Typical Protein Kinases (TPKs) | Protein Kinase A (PKA) | Ser/Thr or Tyr residues of proteins | Range of signaling effects
Alpha kinases | Channel Kinase (ChaK) | Ser/Thr residues in alpha-helices | Range of signaling effects
Actin-Fragmin Kinase (AFK) | Actin-Fragmin Kinase (AFK) | Thr residue of actin | Control of actin polymerization
Phosphatidylinositol 3- and 4-kinases | Phosphatidylinositol 3-kinase (PI3K) | Phosphatidylinositol (PI), PI-phosphates, PI-bisphosphates | Range of second-messenger signaling effects
Phosphatidylinositol phosphate kinases | Phosphatidylinositol phosphate kinase (PIPK) | PI-phosphates | Range of second-messenger signaling effects
Choline/ethanolamine kinases | Choline Kinase (CK) | Choline | Part of pathway that eventually produces phosphatidylcholine, an important constituent of membranes
Aminoglycoside Kinases | Aminoglycoside Kinases (AK) | Aminoglycoside antibiotics | Antibiotic resistance
20
Method
  • Begin with a multiple structure alignment using
    CE-MC (NAR 2004) of 30 comparable TPKs and APKs
    and manually correct in a pair-wise manner over a
    period of 1-2 person years
  • Review the literature on each structure
  • Review the associated sequence alignments derived
    from structure

E. Scheeff and P.E. Bourne 2005 PLoS Comp. Biol.
1(5) e49.
21
Let Us Side Track for One Minute on Structural
Bioinformatics Methodology
Biological vs Geometric Alignments: Plastocyanin
versus Azurin (from Godzik 1996)
  • Maintain 9 of 10 interactions: RMSD 1.5 Å
  • Maintain 5 of 10 interactions: RMSD 0.5 Å
Structural Bioinformatics Unsolved Problems
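The two numbers quoted for each alignment measure different things, and a small sketch may make the trade-off concrete. The function names, the toy coordinates and the contact lists below are all hypothetical; the RMSD here assumes the two coordinate sets are already superposed and paired by the alignment, so no optimal superposition step is included.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between paired (x, y, z) coordinates that are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equally sized, non-empty coordinate lists")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

def maintained_interactions(contacts_a, contacts_b, alignment):
    """Count contacts in A whose aligned residues are also in contact in B."""
    contacts_b_unordered = {frozenset(pair) for pair in contacts_b}
    kept = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            if frozenset((alignment[i], alignment[j])) in contacts_b_unordered:
                kept += 1
    return kept

# Toy data (hypothetical): three aligned residues, shifted by 1 unit along z.
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
coords_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
contacts_a = [(1, 2), (2, 3)]          # residue pairs interacting in A
contacts_b = [(11, 12)]                # residue pairs interacting in B
alignment = {1: 11, 2: 12, 3: 13}      # residue in A -> aligned residue in B

toy_rmsd = rmsd(coords_a, coords_b)
toy_kept = maintained_interactions(contacts_a, contacts_b, alignment)
print(toy_rmsd, toy_kept)
```

An alignment chosen to minimize the first score need not maximize the second, which is the point of the plastocyanin versus azurin comparison.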
22
Phosphoinositide-3 Kinase (D) and Actin-Fragmin
Kinase (E)
PKA
ChaK (Channel Kinase)
23
Can We Propose an Evolutionary History for the
Protein Kinase-Like Superfamily?
  • Bayesian inference of phylogeny (MrBayes)
  • Manual structure alignment produces very
    high-quality sequence alignment of diverse
    homologues
  • But sequence information is too degraded to produce
    branching with sufficient support (i.e. a high
    posterior probability)
  • Addition of a matrix of structural
    characteristics (similar to morphological
    characteristics) produces a well supported
    combined model
  • Neither sequence nor structural characteristics
    alone are sufficient to produce a resolved tree;
    they must be used in combination.

PDB ID, kinase group, structural characteristics 1 2 3 4 5
1BO1 Atypical 0 0 0 0 1
1IA9 Atypical 1 1 1 1 0
1E8X Atypical 1 0 1 1 1
1CJA Atypical 1 0 1 1 1
1NW1 Atypical 1 0 1 0 0
1J7U Atypical 1 0 1 0 1
1CDK AGC 1 1 1 0 1
1O6L AGC 1 1 1 0 1
1OMW AGC 1 1 1 0 1
1H1W AGC 1 1 1 0 1
1MUO Other 1 1 1 0 1
1TKI CAMK 1 0 1 0 1
1JKL CAMK 1 0 1 0 1
1A06 CAMK 1 0 1 0 1
1PHK CAMK 1 0 1 0 1
1KWP CAMK 1 0 1 0 1
1IA8 CAMK 1 0 1 0 0
1GNG CMGC 1 0 1 0 1
1HCK CMGC 1 0 1 0 1
1JNK CMGC 1 0 1 0 1
1HOW CMGC 1 0 1 0 1
1LP4 Other 1 0 1 0 1
1F3M STE 1 0 1 0 1
1O6Y Other 1 0 1 0 1
1CSN CK1 1 0 1 0 1
1B6C TKL 1 0 1 0 1
2SRC TK 1 0 1 0 1
1LUF TK 1 0 1 0 1
1IR3 TK 1 0 1 0 1
1M14 TK 1 0 1 0 1
1GJO TK 1 0 1 0 1
Example columns:
  1) Ion pair analogous to K72-E91 in PKA
  2) α-Helix B present
  3) State of α-Helix C (0 kinked, 1 straight)
  4) State of Strand 4 (0 kinked, 1 straight)
  5) α-Helix D present
24
Proposed Evolutionary History for the Protein
Kinase-Like Superfamily
  • Suggests distinctive history for atypical
    kinases, as opposed to intermittent divergence
    from the typical protein kinases (TPKs)
  • TPK portion of tree shows high degree of
    agreement with Manning tree
  • Branching is supported by species representation
    of kinase families

(Figure: proposed tree; tip labels: APH, AGC, CK, CAMK, AFK, CMGC,
TKL, PI3K, CK1, TK, PIPKIIβ, ChaK; branch labels shown: 0.64, 0.97,
1.0, 0.85, 0.78)
  • Atypical kinase families: blue
  • Typical protein kinase groups (subfamilies): red
  • Branch labels: posterior probability of branch
25
What Happens if We Use Structure to Look Across
Superfamilies?
Yang, Doolittle and Bourne (2005) PNAS 102(2), 373-8
26
To Answer this Question We Only Need to Make Use
of Existing Resources!
  • SCOP: further catalogs Nature's reductionism
    into structural domains, folds, families and
    superfamilies
  • SUPERFAMILY: assigns the above to fully sequenced
    proteomes

27
Use of SCOP Superfamilies
  • How do you distinguish convergent versus
    divergent evolution?
  • The SCOP notion of SUPERFAMILY with evidence of
    weak sequence relationships can be used to
    discount convergence.

28
Structure Provides an Evolutionary Fingerprint
Distribution among the three kingdoms as taken
from SUPERFAMILY
  • Superfamily distributions would seem to be
    related to the complexity of life
  • Update of the work of Caetano-Anollés and
    Caetano-Anollés (2003) Genome Research 13, 1563

(Figure: superfamily distribution among the three kingdoms; labels
given as any genome / all genomes: 153/14, 21/2, 310/0, 645/49, 1,
9/1, 29/0, 68/0)
29
The Unique Superfamily in Archaea: d.17.6
  • Archaeosine tRNA-guanine transglycosylase (tgt),
    C2 domain
  • First step in the biosynthesis of an
    archaea-specific modified base, archaeosine
    (7-formamidino-7-deazaguanosine)
  • Found in tRNAs
  • Was found exclusively in Archaea.

Reference: InterPro IPR004804
30
Method: Distance Determination
Presence/Absence Data Matrix
SCOP superfamily (FSF) | C. intestinalis | C. briggsae | F. rubripes
a.1.1 | 1 | 1 | 1
a.1.2 | 1 | 1 | 1
a.10.1 | 0 | 0 | 1
a.100.1 | 1 | 1 | 1
a.101.1 | 0 | 0 | 0
a.102.1 | 0 | 1 | 1
a.102.2 | 1 | 1 | 1
Distance Matrix
 | C. intestinalis | C. briggsae | F. rubripes
C. intestinalis | 0 | 101 | 109
C. briggsae | | 0 | 144
F. rubripes | | | 0
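A minimal sketch of how a presence/absence matrix can be turned into pairwise distances follows. It assumes the distance is simply the number of superfamilies present in one proteome but not the other (a Hamming distance); the slide does not state the exact measure used. Only the seven rows shown on the slide are included, so the toy distances are much smaller than the 101, 109 and 144 quoted for the full data.

```python
from itertools import combinations

# The seven superfamily rows shown on the slide, in the same order.
superfamilies = ["a.1.1", "a.1.2", "a.10.1", "a.100.1",
                 "a.101.1", "a.102.1", "a.102.2"]

# One presence (1) / absence (0) vector per organism, read column-wise.
presence = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}

def hamming(u, v):
    """Number of superfamilies present in exactly one of the two proteomes."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same superfamilies")
    return sum(a != b for a, b in zip(u, v))

# Upper triangle of the distance matrix, keyed by organism pair.
distances = {
    (a, b): hamming(presence[a], presence[b])
    for a, b in combinations(presence, 2)
}

for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d}")
```

A matrix of this kind can then be passed to any standard distance-based tree-building method.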
31
Is Structure a Useful Discriminator of Species? -
Yes
Eukaryota
Bacteria
Archaea
The method cleanly placed all species in their
correct superkingdoms
Yang, Doolittle and Bourne (2005) PNAS 102(2), 373-8
32
If Structure is so Conserved, is it a Useful Tool
in the Study of Evolution? The Answer Would
Appear to be Yes
  • It is possible to generate a reasonable tree of
    life from merely the presence or absence of
    superfamilies (FSFs) within a given proteome

Yang, Doolittle and Bourne (2005) PNAS 102(2), 373-8
33
The Influence of Environment on Life
Chris Dupont, Scripps Institution of
Oceanography, UCSD
Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47),
17822-17827
34
Consider the Distribution of Disulfide Bonds
among Folds
  • Disulfides are only stable under oxidizing
    conditions
  • Oxygen content gradually accumulated during the
    Earth's evolution
  • The divergence of the three kingdoms occurred
    1.8-2.2 billion years ago
  • Oxygen began to accumulate 2.0 billion years
    ago
  • Logical deduction: disulfides are more prevalent in
    folds (organisms) that evolved later
  • This would seem to hold true
  • Can we take this further?

35
Evolution of the Earth
  • 4.5 billion years of change
  • 300 ± 50 K
  • 1-5 atmospheres
  • Constant photoenergy
  • Chemical and geological changes
  • Life has evolved in this time
  • The ocean was the cradle for 90% of evolution

36
Theoretical Levels of Trace Metals and Oxygen in
the Deep Ocean Through Earth's History
  • Whether the deep ocean became oxic or euxinic
    following the rise in atmospheric oxygen (2.3
    Gya) is debated, therefore both are shown (oxic
    ocean: solid lines; euxinic ocean: dashed lines).
  • The phylogenetic tree symbols at the top of the
    figure show one idea as to the theoretical
    periods of diversification for each Superkingdom.

Replotted from Saito et al, 2003 Inorganica
Chimica Acta 356 308-318
37
The Gaia Hypothesis
Gaia (pronounced /ˈɡeɪ.ə/ or /ˈɡaɪ.ə/; "land" or
"earth", from the Greek Γαῖα) is a Greek goddess
personifying the Earth
  • Gaia - a complex entity involving the Earth's
    biosphere, atmosphere, oceans, and soil the
    totality constituting a feedback system which
    seeks an optimal physical and chemical
    environment for life on this planet.

James Lovelock
38
The Question
  • Have the emergent properties of an organism as
    judged by its protein content been influenced by
    the environment?
  • Will do this by consideration of the metallomes
    of a broad range of species
  • The metallomes can only be deduced by
    consideration of the protein structures to which
    the metal is covalently bound
  • Will hypothesize that these emergent properties
    in turn influenced the environment

39
Making the Metallome of Each Species Can Only
be Done from Structure and Requires Human Effort
  1. Start with SCOP
  2. Each superfamily level assignment was checked
    manually for metal binding
  3. All the structures representing the family had to
    bind the metal for it to be considered
    unambiguous
  4. The literature was consulted to resolve
    ambiguities
  5. Superfamily database used to map to proteomes
  6. 23 Archaea, 233 Bacteria, 57 Eukaryota
  7. Cu, Ni, Mo ignored (< 0.3% of proteome)

40
Levels of Ambiguity
  • Ambiguous: a superfamily binds different metals or
    has members that are not known to bind metals
  • Ditto families
  • Approx. 50% of superfamilies and 10% of families
    are ambiguous
  • Only unambiguous families used in this study
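The ambiguity rule above can be written as a small classifier. This is a hedged sketch rather than the procedure actually used in the study, which relied on manual checking and the literature: the function name, the return labels and the input format (one set of bound metals per representative structure) are all hypothetical.

```python
def classify_family(metals_per_structure):
    """Label a family 'unambiguous' only if every structure binds the same single metal.

    metals_per_structure: list of sets, one per representative structure;
    an empty set means that structure is not known to bind a metal.
    """
    if not metals_per_structure:
        return ("no structures", None)
    # A member not known to bind any metal makes the family ambiguous.
    if any(len(metals) == 0 for metals in metals_per_structure):
        return ("ambiguous", None)
    # More than one distinct metal across members is also ambiguous.
    all_metals = set().union(*metals_per_structure)
    if len(all_metals) == 1:
        return ("unambiguous", next(iter(all_metals)))
    return ("ambiguous", None)

print(classify_family([{"Fe"}, {"Fe"}]))    # every member binds Fe
print(classify_family([{"Fe"}, {"Zn"}]))    # members bind different metals
print(classify_family([{"Fe"}, set()]))     # one member binds no known metal
```

Families labeled unambiguous would then be mapped to proteomes; ambiguous ones are set aside.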

41
Superfamily Distribution As Well As Overall
Content Has Changed
42
Fe Containing Proteins in Bacteria
  • A quantile plot showing the percent of Bacterial
    proteomes each Fe-binding fold family occurs in
    (x).
  • This plot also shows the average copy number of
    that fold family in the proteomes where it occurs
    (?).
  • Few Fe-binding folds are in most proteomes.
  • Widespread Fe-binding folds are not necessarily
    abundant.
  • Similar trends are observed for Zn, Mn, and Co in
    all three Superkingdoms.
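The two quantities plotted for each fold family can be computed as sketched below. The input layout (a dictionary of proteomes, each mapping fold family to copy number) and all names and values are hypothetical toy data, not taken from the study.

```python
def occupancy_and_copy_number(proteomes, family):
    """Return (percent of proteomes containing the family,
    average copy number among the proteomes where it occurs)."""
    if not proteomes:
        raise ValueError("need at least one proteome")
    counts = [copies.get(family, 0) for copies in proteomes.values()]
    present = [c for c in counts if c > 0]
    percent = 100.0 * len(present) / len(counts)
    average = sum(present) / len(present) if present else 0.0
    return percent, average

# Toy proteomes (hypothetical): fold family -> copy number.
proteomes = {
    "proteome_A": {"fam1": 2, "fam2": 1},
    "proteome_B": {"fam1": 4},
    "proteome_C": {"fam1": 3},
    "proteome_D": {"fam2": 10},
}

fam1_stats = occupancy_and_copy_number(proteomes, "fam1")
fam2_stats = occupancy_and_copy_number(proteomes, "fam2")
print(fam1_stats, fam2_stats)
```

In the toy data fam1 is the more widespread family while fam2 has the higher average copy number, which mirrors the observation that widespread folds are not necessarily abundant.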

43
Metal Binding Proteins are Not Consistent Across
Superkingdoms
Since these data are derived from current species
they are independent of evolutionary events such
as duplication, gene loss, horizontal transfer
and endosymbiosis
44
Power Laws: Fundamental Constants in the
Evolution of Proteomes
  • A slope of 1 indicates that a group of structural
    domains is in equilibrium with genome growth,
    while a slope > 1 indicates that the group of
    domains is being preferentially duplicated (or
    retained in the case of genome reductions).

van Nimwegen E (2006) in Koonin EV, Wolf YI,
Karev GP (Eds.). Power laws, scale-free
networks, and genome biology
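The slope referred to here is the exponent of a power law, which becomes a straight-line slope on a log-log plot. A minimal sketch of estimating it by ordinary least squares follows; the choice of axes (total domains per proteome against domains in the group of interest), the function name and the synthetic data are assumptions made for illustration.

```python
import math

def loglog_slope(total_domains, group_domains):
    """Least-squares slope of log(group_domains) against log(total_domains)."""
    if len(total_domains) != len(group_domains) or len(total_domains) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(x) for x in total_domains]
    ys = [math.log(y) for y in group_domains]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic proteomes (hypothetical): the group grows as size ** 1.5.
sizes = [1000, 2000, 4000, 8000]
group = [0.01 * s ** 1.5 for s in sizes]

slope = loglog_slope(sizes, group)
print(f"fitted exponent: {slope:.2f}")
```

A fitted exponent above 1, as in this synthetic case, corresponds to a group of domains that expands faster than the proteome as a whole.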
45
Metal Binding Proteins are Not Consistent Across
Superkingdoms
46
Why are the Power Laws Different for Each
Superkingdom?
  • Power laws are likely influenced by selective
    pressure. Qualitatively, the differences in the
    power law slopes describing Eukarya and Prokarya
    are correlated to the shifts in trace metal
    geochemistry that occur with the rise in oceanic
    oxygen
  • We hypothesize that proteomes contain an imprint
    of the environment at the time of the last common
    ancestor in each Superkingdom
  • This suggests that Eukarya evolved in an oxic
    environment, whereas the Prokarya evolved in
    anoxic environments

47
Do the Metallomes Contain Further Support for
this Hypothesis?
48
e- Transfer Proteins: Same Broad Function, Same
Metal, Different Chemistry Induced by the
Environment?
  • Fe-S clusters
    • Fe bound by S
    • Cluster held in place by Cys
    • Generally negative reduction potentials
    • Very susceptible to oxidation
  • Cytochromes
    • Fe bound by heme (and amino acids)
    • Generally positive reduction potentials
    • Less susceptible to oxidation

49
Hypothesis
  • Emergence of cyanobacteria changed oxygen
    concentrations
  • Impacted relative metal ion concentrations in the
    ocean
  • Organisms evolved to use these metals in new ways
    to evolve new biological processes, e.g. complex
    signaling
  • This in turn further impacted the environment
  • Only protein structures could reveal such
    dependencies

50
Agenda
  • What is structural bioinformatics and how do YOU
    drive it?
  • Prerequisites: the sequence-structure-function
    relationship
  • Some exciting developments
  • Using protein structure to study evolution
  • Functional prediction, pathway mapping and the
    RCSB PDB response
  • Unsolved problems
  • Structure comparison
  • Domain definition
  • What more could be done to drive the field
    forward?

51
Our Methods are Still Not Good Enough - The 3D
Domain Assignment Problem
A domain is a fundamental structural, functional
and evolutionary unit of a protein
  • Compact
  • Stable
  • Have hydrophobic core
  • Fold independently
  • Perform specific function
  • Can be re-shuffled and put together in different
    combinations
  • Evolution works on the level of domain
Unsolved Problems 3D Domain Definition
52
Evaluation of automatic domain assignment methods
Structures with issues (all/most methods)
  • Large structures, complex architectures
  • Very small simple domains difficult to separate.
    Issues: minimum domain size, low contact density
(Figure: example chains 1dcea, 1bxrc, 1e88a; per-method labels as
shown: NCBI method, PDP, DomainParser 5; Experts 3; PUU 6;
DomainParser 5; Experts 6; NCBI methods 8; NCBI 2; PDP 2; PUU 1;
Experts 3; PDP 2; PUU 2)
53
Manual vs. Automatic Consensus
  • Chains with manual consensus: 375 (80% of entire
    dataset)
  • Chains with automatic consensus: 374 (80% of
    entire dataset)
  • Chains with consensus (automatic or manual): 424
    (90.6% of entire dataset)
  • Automatic consensus only: 46 chains (10.9% of
    chains with consensus)
  • Manual consensus only: 47 chains (11.1% of chains
    with consensus)
  • Automatic consensus and manual consensus disagree:
    3 chains (0.7% of chains with consensus)
JMB 2004 339(3), 647-678
54
Natalie Dawson, unpublished
http://itol.embl.de/
55
Natalie
56
Agenda
  • What is structural bioinformatics and how do YOU
    drive it?
  • Prerequisites: the sequence-structure-function
    relationship
  • Some exciting developments
  • Using protein structure to study evolution
  • Functional prediction, pathway mapping and the
    RCSB PDB response
  • Unsolved problems
  • Structure comparison
  • Domain definition
  • What more could be done to drive the field
    forward?

57
Structure determination or modeling of whole
metabolic network
58
What are the implications of this?
  • Biochemical reactions, pathways, and networks can
    now be described in the context of entire cells
  • Enables more realistic simulations of the
    behavior of metabolic networks
  • Better understanding of evolution - compare
    pathways between organisms
  • Predict effects of mutations and drugs
  • Synthetic Biology

59
(No Transcript)
60
Pathway
61
(No Transcript)
62
Agenda
  • What is structural bioinformatics and how do YOU
    drive it?
  • Prerequisites: the sequence-structure-function
    relationship
  • Some exciting developments
  • Using protein structure to study evolution
  • Functional prediction, pathway mapping and the
    RCSB PDB response
  • Unsolved problems
  • Structure comparison
  • Domain definition
  • What more could be done to drive the field
    forward?

63
Better Interoperability Between the Data and the
Literature Upon Which it is Based
What More Could be Done to Drive the Field
Forward?
64
Data
Knowledge
Database Knowledgebase Wikis
Datapacks Journals
Data Only
Annotation
Data Annotation
Data Some Annotation
Data Some Annotation Some Integration
PLoS iStructure
65
The Database View
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Context
66
The Literature View Web 3.0?
http://betastaging.rcsb.org
67
Acknowledgements
  • Protein-protein Interactions
    • JoLan Chung, Wei Wang
  • Functional Flexibility
    • Jenny Gu, Michael Gribskov
  • Multipolar Representation
    • Apostol Gramada
  • Funding: NSF, NIH