Title: CS 177 Proteins I (Structure-function relationships)
1 CS 177 Proteins I (Structure-function relationships)
Review of protein structures Computational
modeling Three-dimensional structural analysis
in laboratory
2 Structure-function relationships
Recommended readings
- A science primer: Molecular modeling. http://www.ncbi.nlm.nih.gov/About/primer/molecularmod.html
- Brown, S.M. (2000) Bioinformatics, Eaton Publishing, pp. 99-119
- Veeramalai, M. and Gilbert, D. Bioinformatics tools for protein structure visualisation and analysis. http://www.brc.dcs.gla.ac.uk/mallika/Publications/scwbiw-article.htm
- Mount, D.W. (2001) Bioinformatics, Cold Spring Harbor Lab Press, pp. 382-478
3 Review of protein structure
Primary structure
Proteins are chains of amino acids joined by peptide bonds
The N-Cα-C sequence is repeated throughout the protein, forming the backbone
The bonds on each side of the Cα atom are free to rotate within spatial constraints; the angles of these bonds (phi and psi) determine the conformation of the protein backbone
The R side chains also play an important structural role
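The backbone rotation angles described above can be computed directly from atomic coordinates. The following is a minimal sketch, assuming atoms are given as plain (x, y, z) tuples; the function name `dihedral` and the example coordinates are illustrative and not part of the lecture. Phi is the dihedral angle defined by the atoms C(i-1), N(i), Cα(i), C(i), and psi by N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral (torsion) angle in degrees defined by four points.

    The rotation is measured about the central bond p1-p2.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = b2_len * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates only (not taken from a real structure).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

To obtain phi for residue i under these assumptions, pass the coordinates of C(i-1), N(i), Cα(i) and C(i) in that order; for psi, pass N(i), Cα(i), C(i) and N(i+1).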
4 Review of protein structure
Secondary structure
Interactions that occur between the C=O and N-H groups on amino acids
Much of the protein core comprises α helices and β sheets, folded into a three-dimensional configuration
- regular patterns of H bonds are formed between neighboring amino acids
- the amino acids have similar angles
- the formation of these structures neutralizes the polar groups on each amino acid
- the secondary structures are tightly packed in a hydrophobic environment
- each R side group has a limited volume to occupy and a limited number of interactions with other R side groups
Figure labels: β sheet, α helix
5 Secondary structure
Other secondary structure elements (no standardized classification)
- random coil
- loop
- others (e.g. 3₁₀ helix, β-hairpin, paperclip)
Super-secondary structure
- In addition to secondary structure elements
that apply to all proteins (e.g. helix, sheet)
there are some simple structural motifs in some
proteins
- These super-secondary structures (e.g.
transmembrane domains, coiled coils,
helix-turn-helix, signal peptides) can give
important hints about protein function
6 Review of protein structure
Q: If we have all the psi and phi angles in a protein, do we then have enough information to describe the 3-D structure?
7 Tertiary structure
- The tertiary structure describes the organization in three dimensions of all the atoms in the polypeptide
- The tertiary structure is determined by a combination of different types of bonding
  - Ionic interactions: oppositely charged residues can pull together
  - Hydrogen bonds: partially positively charged hydrogens are attracted to partially negative oxygens (weaker)
  - van der Waals interactions: hydrophobic residues become attracted to each other when forced together by exclusion from the aqueous surroundings (weakest)
- Many of these bonds are very weak and easy to break, but hundreds or thousands working together give the protein structure great stability
- If a protein consists of only one polypeptide chain, this level then describes the complete structure
8 Tertiary structure
9 Tertiary structure
Proteins can be divided into two general classes based on their tertiary structure
- Fibrous proteins have an elongated structure, with the polypeptide chains arranged in long strands. This class of proteins serves as a major structural component of cells. Examples: silk, keratin, collagen
- Globular proteins have more compact, often irregular structures. This class of proteins includes most enzymes and most proteins involved in gene expression and regulation
10 Quaternary structure
The quaternary structure defines the conformation assumed by a multimeric protein. The individual polypeptide chains that make up a multimeric protein are often referred to as protein subunits. Subunits are joined by ionic, H-bond and hydrophobic interactions.
Example: Haemoglobin (4 subunits)
11 Summary: protein structure
12 Summary: protein structure
13 Structure displays
Common displays are (among others) cartoon, spacefill, and backbone
Figure panels: spacefill, backbone, cartoon
14 Need for analyses of protein structures
A protein performs metabolic, structural, or regulatory functions in a cell. Cellular biochemistry works based on interactions between 3-D molecular structures.
The 3-D structure of a protein determines its function.
Therefore, the relationship of sequence to function is primarily concerned with understanding the 3-D folding of proteins and inferring protein functions from these 3-D structures (e.g. binding sites, catalytic activities, interactions with other molecules).
The study of protein structure is not only of fundamental scientific interest in terms of understanding biochemical processes, but also produces very valuable practical benefits
- Medicine: the understanding of enzyme function allows the design of new and improved drugs
- Agriculture: therapeutic proteins and drugs for veterinary purposes and for treatment of plant diseases
- Industry: protein engineering has potential for the synthesis of enzymes to carry out various industrial processes on a mass scale
15 Sources of protein structure information
3-D macromolecular structures are stored in databases.
The most important database: the Protein Data Bank (PDB)
The PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) and can be accessed at three different sites (plus a number of mirror sites outside the USA)
- http://rcsb.rutgers.edu/pdb (Rutgers University)
- http://www.rcsb.org/pdb/ (San Diego Supercomputer Center)
- http://tcsb.nist.gov/pdb/ (National Institute of Standards and Technology)
It is the very first bioinformatics database ever built.
16 Sources of protein structure information
Experimental structure determination
In practice, most biomolecular structures (>99% of structures in PDB) are determined using three techniques
- X-ray crystallography (low to very high resolution)
  Problem: requires crystals; it is difficult to crystallize proteins while maintaining their native conformation; not all proteins can be crystallized
17 X-ray crystallography
18 Sources of protein structure information
Experimental structure determination
In practice, most biomolecular structures (>99% of structures in PDB) are determined using three techniques
- X-ray crystallography (low to very high resolution)
  Problem: requires crystals; it is difficult to crystallize proteins while maintaining their native conformation; not all proteins can be crystallized
- Nuclear magnetic resonance (NMR) spectroscopy of proteins in solution (medium to high resolution)
  Problem: works only with small and medium size proteins (50% of proteins cannot be studied with this method); requires high solubility
- Electron microscopy and crystallography (low to medium resolution)
  Problem: (still) relatively low resolution
Experimental methods are still very time consuming and expensive. In most cases the experimental data will contain errors and/or be incomplete. Thus the initial model needs to be refined and rebuilt.
19 Sources of protein structure information
Computational modeling
Researchers have been working for decades to develop procedures for predicting protein structure that are not so time consuming and not hindered by size and solubility constraints. As protein sequences are encoded in DNA, it should in principle be possible to translate a gene sequence into an amino acid sequence, and to predict the three-dimensional structure of the resulting chain from this amino acid sequence.
20 Computational modeling
How to predict the protein structure?
Ab initio prediction of protein structure from sequence: not yet.
Problem: the information contained in protein structures lies essentially in the conformational torsion angles. Even if we only assume that every amino-acid residue has three such torsion angles, and that each of these three can only assume one of three "ideal" values (e.g., 60, 180 and -60 degrees), this still leaves us with 3^3 = 27 possible conformations per residue.
For a typical 200-amino acid protein, this would give 27^200 (roughly 1.87 x 10^286) possible conformations!
Q: Can't we just generate all these conformations, calculate their energy and see which conformation has the lowest energy?
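The size of this search space can be checked with exact integer arithmetic. The sketch below reproduces the slide's count under its own simplification (three torsion angles per residue, three allowed values each). The evaluation rate of 10^12 conformations per second is a purely hypothetical assumption, chosen only to illustrate why brute-force enumeration is not practical; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues, angles_per_residue=3, states_per_angle=3):
    """Number of backbone conformations under the slide's simplification."""
    per_residue = states_per_angle ** angles_per_residue  # 3**3 = 27
    return per_residue ** n_residues


def as_scientific(n):
    """Return (mantissa, exponent) for a positive integer n."""
    digits = str(n)
    mantissa = float(digits[0] + "." + digits[1:8])
    return mantissa, len(digits) - 1


total = conformation_count(200)
mantissa, exponent = as_scientific(total)
print(f"27^200 is about {mantissa:.2f} x 10^{exponent}")

# Hypothetical evaluation rate: an assumption for illustration only.
rate_per_second = 10 ** 12
years = total // rate_per_second // int(SECONDS_PER_YEAR)
y_mantissa, y_exponent = as_scientific(years)
print(f"Exhaustive enumeration: about {y_mantissa:.1f} x 10^{y_exponent} years")
```

Even with a far more generous rate, the exponent barely changes, which is why the question above is answered with alternative strategies rather than enumeration.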
21 Computational modeling
Solution: homology modeling
Homology (comparative) modeling attempts to predict structure on the strength of a protein's sequence similarity to another protein of known structure.
Basic idea: a significant alignment of the query sequence with a target sequence from PDB is evidence that the query sequence has a similar 3-D structure (current threshold: 40% sequence identity). Then multiple sequence alignment and pattern analysis can be used to predict the structure of the protein.
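The sequence identity threshold mentioned above can be evaluated once a pairwise alignment is available. The following is a minimal sketch, assuming the two sequences are already aligned and use "-" for gaps. It counts identity over columns where neither sequence has a gap, which is only one of several conventions in use. The function name and the example sequences are made up for illustration.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over alignment columns that contain no gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if q == t:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned sequences, for illustration only.
query = "MKT-AYIAKQR"
template = "MKSLAYLAK-R"

identity = percent_identity(query, template)
THRESHOLD = 40.0  # threshold quoted on the slide
print(f"{identity:.1f}% identity; above threshold: {identity >= THRESHOLD}")
```

Whether an alignment is significant also depends on its length and score statistics, so percent identity alone is a rough screen rather than a full criterion.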
22 Computational modeling
Flow chart for protein structure prediction (from Mount, 2001)
23 Computational modeling
Protein sequence: partial or full sequences predicted through gene finding
24 Computational modeling
Database similarity search: the sequence is used as a query in a database similarity search against proteins in PDB
25 Computational modeling
- Does the sequence align with a protein of known structure?
  - Yes: if the database similarity search reveals a significant alignment between the query sequence and a PDB target sequence, the alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure
  - No: proceed to protein family analysis
26 Computational modeling
- Protein family analysis/relationship to known structure
  - Family (structural context): structures that have a significant level of structural similarity but not necessarily significant sequence similarity
  - The goal is to exploit these structure-sequence relationships. Two questions: 1) is the new protein a member of a family, 2) does the family have a predicted structural fold?
  - Analyze the sequence for family-specific profiles and patterns (available databases: 3D-Ali, 3D-PSSM, BLOCKS, eMOTIF, INTERPRO, Pfam)
  - If the family analysis reveals that the query protein is a member of a family with a predicted structural fold, multiple alignment can be used for structural modeling
27 Computational modeling
- Protein family analysis/relationship to known structure
  - If the family analysis is unsuccessful, proceed to structural analyses
28 Computational modeling
- Structural analysis
  - Several different types of analyses are used to infer structural information
  - The presence of small amino acid motifs in a protein can be an indicator of a biochemical function associated with a particular structure. Motifs are available from the Prosite catalog
  - The spacing and arrangement of amino acids (e.g. hydrophobic amino acids) provide important structural clues that can be used for modeling
  - Certain amino acid combinations can occur in certain types of secondary structure
- These structural analyses can provide clues as to the presence of active sites and regions of secondary structure. This information can help to identify a new protein as a member of a known structural class
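As an illustration of motif scanning, the sketch below searches a sequence for the Prosite N-glycosylation site pattern (PS00001, written N-{P}-[ST]-{P} in Prosite syntax), translated by hand into a regular expression. The helper name `find_motif` and the test sequence are invented for this example, and a pattern match only suggests a candidate site rather than confirming a function.

```python
import re

# Prosite PS00001, N-{P}-[ST]-{P}: Asn, any residue except Pro,
# Ser or Thr, any residue except Pro.
# The lookahead lets overlapping matches be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (1-based start position, matched residues) for each hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical sequence, for illustration only.
sequence = "MNGTAANPSANLSK"
hits = find_motif(sequence)
for position, residues in hits:
    print(position, residues)
```

In this made-up sequence the stretch "NPSA" is not reported, because the pattern excludes proline in the second position.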
29 Computational modeling
- 3-D structural analysis in lab
  - Proteins that fail to show any relationship to proteins of known structure are candidates for structural analyses (X-ray crystallography, NMR). There are about 600 known fold families, and new structures are frequently found to have an already known structural fold. Accordingly, protein families with no relatives of known structure may represent a novel fold
30 Computational modeling summary
- Partial or full sequences predicted through gene finding
- Similarity search against proteins in PDB
- Find structures that have a significant level of structural similarity (but not necessarily significant sequence similarity)
- Alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure
- If member of a family with a predicted structural fold, multiple alignment can be used for structural modeling
- Infer structural information (e.g. presence of small amino acid motifs; spacing and arrangement of amino acids; certain typical amino acid combinations associated with certain types of secondary structure), which can provide clues as to the presence of active sites and regions of secondary structure
- Structural analyses in the lab (X-ray crystallography, NMR)
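The decision sequence summarized above can be written as a small function. This is only a sketch of the order of checks in the flow chart: the three boolean inputs stand in for the outcomes of the database search, the family analysis, and the structural analysis, and the function name and return strings are illustrative.

```python
def next_modeling_step(aligns_with_known_structure,
                       family_has_predicted_fold,
                       structural_clues_found):
    """Return the next step for a query protein sequence."""
    if aligns_with_known_structure:
        # Significant alignment with a PDB sequence.
        return "position residues using the aligned PDB structure"
    if family_has_predicted_fold:
        # Member of a family with a predicted structural fold.
        return "model structure from the family multiple alignment"
    if structural_clues_found:
        # Motifs, residue spacing or secondary-structure propensities.
        return "assign to a known structural class"
    # No relationship to any protein of known structure.
    return "candidate for X-ray crystallography or NMR"


print(next_modeling_step(False, True, False))
```

Each check is only reached when the previous one fails, mirroring the "Yes / No: proceed to ..." branches of the flow chart.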
31 Computational modeling summary
How to predict the protein structure?
- Ab initio prediction of protein structure from sequence
- Homology (comparative) modeling attempts to predict structure on the strength of a protein's sequence similarity to another protein of known structure
- Experimental structure determination
32 Computational modeling
Viewing protein structures
A number of molecular viewers are freely available and run on most computer platforms and operating systems. Examples:
- Cn3D 4.1 (stand-alone)
- Rasmol (stand-alone)
- Chime (Web browser plug-in, based on Rasmol)
- Swiss 3D viewer Spdbv (stand-alone)
All these viewers can use the PDB identification code or the structural file from PDB.
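The structural file that these viewers read is plain text, so its coordinates can also be extracted with a few lines of code. The sketch below assumes the classic fixed-column PDB format, in which ATOM records carry the atom name in columns 13-16, the residue name in 18-20, the chain in 22, the residue number in 23-26 and x, y, z in columns 31-54. The function name and the two sample records are illustrative.

```python
def parse_ca_atoms(pdb_lines):
    """Yield (chain, residue number, residue name, (x, y, z)) per C-alpha atom."""
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM records, etc.
        if line[12:16].strip() != "CA":
            continue  # keep only the alpha carbon of each residue
        residue_name = line[17:20].strip()
        chain = line[21]
        residue_number = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield chain, residue_number, residue_name, xyz


# Two illustrative ATOM records (coordinates are made up).
SAMPLE = [
    "ATOM      1  N   MET A   1      27.340  54.430  41.500",
    "ATOM      2  CA  MET A   1      26.981  53.977  40.085",
]

ca_atoms = list(parse_ca_atoms(SAMPLE))
for chain, number, name, xyz in ca_atoms:
    print(chain, number, name, xyz)
```

Keeping only the Cα atoms gives the trace that a "backbone" display draws; a real file would be passed in by iterating over an open file handle instead of the sample list.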