Bioinformatics Tools for Structural Biology - PowerPoint PPT Presentation


1
Bioinformatics Tools for Structural Biology

Rajarshi Maiti, M.Sc. Student
Supervisor: Dr. David Wishart
Faculty of Pharmacy and Pharmaceutical Sciences

698 Seminar, April 20, 2005
2
Bioinformatics Tools for Structural Biology
Part I: 3D Structure Prediction of Proteins (description of a structure prediction algorithm)
Part II: Structural Superposition of Proteins (description of a web server called SuperPose)
Part III: Protein Motions and Movements (description of a web server called MovieMaker)
3
Part I
Distance Geometry Approach for 3D Protein Structure Prediction
4
Why study Protein structures?

Many drugs and drug targets are proteins
The shape and size of a protein's active site is helpful for designing new drug molecules
Structural information for all the proteins in the human proteome could lead to robust and cost-efficient drug discovery and development

5
Protein Structure Determination

Protein folding process
One of the most challenging problems in the life sciences
Has puzzled scientists for the last 30 years
Anfinsen's experiment in 1973 demonstrated that the amino acid sequence contains enough information for the folding process
Since then, researchers have tried to find the protein folding pathway

6
Different levels of Protein structure
Primary structure
METPQTHVVIKEASGHTSERNMKLGH
Secondary structure (H: helix, B: beta sheet, C: loop/coil)
METPQTHVVIKEASGHTSERNMKLGH
HHHHHHHCCCCCCCBBBBBBBBBBBB
Tertiary structure
[Figure: folding process from secondary structure elements to the tertiary structure]
7
The Folding Process

How long is the folding process?
One might guess that a protein randomly samples
all of its conformations before settling on the
lowest energy conformer
In 1968, Cyrus Levinthal demonstrated that this
is not so

8
The Folding Process..

Levinthal's calculations
Consider a 100-residue protein in which each residue can take 2 possible conformations
Therefore there are $2^{100} \approx 1.3 \times 10^{30}$ possible conformations
If $10^{-12}$ s is allowed to sample each conformation, then
total time $\approx 1.3 \times 10^{18}$ s $\approx 4 \times 10^{10}$ years!
Age of the universe $\approx$ 13.8 billion years $\approx 4.4 \times 10^{17}$ s
Hence, proteins do not sample all possible conformations
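The arithmetic above can be checked in a few lines. This is a minimal sketch using the slide's numbers (100 residues, 2 states each, $10^{-12}$ s per state); the 13.8-billion-year age of the universe is a current estimate added for the comparison, not a figure from the original slide.

```python
# Levinthal-style estimate with the numbers from the slide above.
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year
AGE_OF_UNIVERSE_YEARS = 13.8e9          # current estimate (assumption, not from the slide)

def levinthal_time(n_residues: int, states_per_residue: int, seconds_per_state: float) -> float:
    """Seconds needed to visit every conformation exactly once."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations * seconds_per_state

total_s = levinthal_time(100, 2, 1e-12)
total_years = total_s / SECONDS_PER_YEAR
print(f"{total_s:.2e} s = {total_years:.2e} years")              # 1.27e+18 s = 4.02e+10 years
print(f"{total_years / AGE_OF_UNIVERSE_YEARS:.1f} x the age of the universe")  # 2.9 x ...
```

Even with only two states per residue, an exhaustive search would take roughly three times the age of the universe.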

9
The Folding Process..
Folding time: $10^{-6}$ to $10^{-3}$ seconds
[Figure: protein folding landscape and folding pathway leading to the native structure with minimum energy]
10
Protein structure determination

Experimental techniques
NMR Spectroscopy
X-ray crystallography
Laborious and lengthy process (months to years)
Not all proteins are experimentally amenable to
structure determination

Hence, computer-based techniques have been tried, and are still being explored, to find the 3D structures of proteins from their sequences
11
Protein structure determination
Computational methods
Ab initio or de novo
  Models the structure from sequence
  Simulates the natural folding process
  Generates rough models (> 5 Å RMSD)
Homology Modeling
  Requires sequence similarity (> 30%)
  Most accurate structure prediction method
  Quality of structure depends on the sequence similarity
  Structures are of high quality (< 1.5 Å RMSD)
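The RMSD figures quoted above measure how far a model's atoms lie from the corresponding atoms of the native structure: $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_i \lVert x_i - y_i \rVert^2}$. A minimal sketch, assuming the two coordinate lists are already superimposed and listed in corresponding order:

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between corresponding atoms.
    Assumes the structures are already superimposed."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(a, b))
    return math.sqrt(sq / len(a))

native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(native, model))  # 1.0: every atom is displaced by 1 Å
```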

12
Homology Modeling
13
Ab initio methods for 3D protein structure determination

Systematic searches
Monte Carlo sampling
Genetic Algorithms
Distance Geometry

Two main components of an ab initio prediction
program are
Efficient sampling of the conformational search
space
Scoring function to select the near native
structure

14
Systematic Search
Feasible for small peptides, not suitable for
proteins
15
Monte Carlo sampling
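Monte Carlo sampling is commonly done with the Metropolis criterion: make a small random change, always accept it if the energy drops, and accept an uphill move with probability $e^{-\Delta E / T}$. The sketch below illustrates that criterion only; the torsion-like `toy_energy` is a hypothetical stand-in for a real scoring function, and the step size and temperature are arbitrary.

```python
import math
import random
from typing import Callable, List, Tuple

def metropolis(energy: Callable[[List[float]], float], start: List[float],
               steps: int, step_size: float, temperature: float,
               rng: random.Random) -> Tuple[List[float], float]:
    """Metropolis Monte Carlo: perturb one variable per step, accept downhill
    moves always and uphill moves with probability exp(-dE/T).
    Returns the lowest-energy conformation seen."""
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        d_e = e_trial - e_current
        if d_e <= 0 or rng.random() < math.exp(-d_e / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

def toy_energy(angles: List[float]) -> float:
    """Hypothetical torsion-like energy with its minimum at 1.0 rad per angle."""
    return sum(1.0 - math.cos(a - 1.0) for a in angles)

rng = random.Random(0)
start = [0.0] * 10
best, e_best = metropolis(toy_energy, start, steps=5000, step_size=0.5,
                          temperature=0.1, rng=rng)
print(f"start E = {toy_energy(start):.3f}, best E = {e_best:.3f}")
```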
16
Genetic Algorithm

Concept borrowed from computer science
A population of conformations is spawned
Mutation, reproduction and crossover operations are performed to simulate the natural selection process

Mutation operator:
100100101001010101 → 100100001001010101
Reproduce operator:
100100001001010101 → 100100001001010101
Crossover operator:
100100001001 010101 × 001000101010 101001 → 100100001001 101001 and 001000101010 010101
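The bit-string operators above can be sketched as plain string functions. In a real genetic algorithm the mutation position and crossover point would be chosen at random; here they are fixed so the slide's examples are reproduced exactly.

```python
from typing import Tuple

def mutate(bits: str, position: int) -> str:
    """Mutation: flip the bit at `position` (0-based)."""
    flipped = "1" if bits[position] == "0" else "0"
    return bits[:position] + flipped + bits[position + 1:]

def crossover(parent1: str, parent2: str, cut: int) -> Tuple[str, str]:
    """Single-point crossover: exchange the tails of two parents after `cut`."""
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]

child = mutate("100100101001010101", 6)
print(child)  # 100100001001010101
offspring = crossover("100100001001010101", "001000101010101001", 12)
print(offspring)  # ('100100001001101001', '001000101010010101')
```

Reproduction is simply copying a string unchanged into the next generation, so it needs no function of its own.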
17
Distance based methods

Useful when interatomic/inter-residue distances are known
For N atoms/residues there are N(N-1)/2 distances
[Figure: Distance Geometry process, embedding distances (e.g. 1.2 Å, 2.2 Å, 3.5 Å, 5.4 Å) from distance space into Cartesian space]
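Counting the distances is straightforward: each unordered pair of points contributes one entry. A minimal sketch with four made-up points (hypothetical coordinates, chosen only so that some of the distances echo the figure):

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

def pairwise_distances(coords: List[Coord]) -> Dict[Tuple[int, int], float]:
    """All unique inter-point distances (i < j): N(N-1)/2 of them."""
    return {(i, j): math.dist(coords[i], coords[j])
            for i, j in combinations(range(len(coords)), 2)}

coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (1.2, 2.2, 0.0), (0.0, 2.2, 1.4)]
d = pairwise_distances(coords)
n = len(coords)
print(len(d), n * (n - 1) // 2)  # 6 6
print(round(d[(0, 1)], 2))       # 1.2
print(53 * 52 // 2)              # 1378, the count for a 53-residue protein
```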
18
Distance Geometry Process
Set up a distance matrix (upper triangle: maximum distances; lower triangle: minimum distances)

      A     B     C     D     E     F
A     0     DAB   DAC   DAD   DAE   DAF
B     dBA   0     DBC   DBD   DBE   DBF
C     dCA   dCB   0     DCD   DCE   DCF
D     dDA   dDB   dDC   0     DDE   DDF
E     dEA   dEB   dEC   dED   0     DEF
F     dFA   dFB   dFC   dFD   dFE   0

DAB: maximum distance between A and B
dAB: minimum distance between A and B
Embedding process: distance matrix → Cartesian space, one (x, y, z) per point
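One common way to carry out the randomize and embed steps is sketched below: pick a trial distance for each pair between its minimum and maximum bound, then convert the distance matrix to coordinates with the metric-matrix method (classical multidimensional scaling). This is an illustration under assumptions: the bounds are passed as two separate full matrices rather than packed into one, and the slides do not state which embedding algorithm the author's program uses.

```python
import numpy as np

def random_distances(lower: np.ndarray, upper: np.ndarray,
                     rng: np.random.Generator) -> np.ndarray:
    """'Randomize' step: choose each pairwise distance uniformly between
    its minimum (lower) and maximum (upper) bound."""
    n = lower.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = rng.uniform(lower[i, j], upper[i, j])
    return dist

def embed(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Metric-matrix embedding (classical MDS): distances -> N x dim coordinates."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # metric matrix
    eigvals, eigvecs = np.linalg.eigh(gram)             # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:dim]               # keep the largest `dim`
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

rng = np.random.default_rng(0)
true_xyz = rng.normal(size=(6, 3))                      # six points, A to F
true_d = np.linalg.norm(true_xyz[:, None] - true_xyz[None, :], axis=-1)
xyz = embed(random_distances(true_d, true_d, rng))      # exact bounds for the demo
rebuilt = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
print(np.allclose(rebuilt, true_d))                     # True
```

The embedded coordinates match the input distances up to a rotation or reflection; with looser bounds the trial distances may not be exactly embeddable, which is why negative eigenvalues are clipped.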
19
Distance Matrix for 1ROP
53 residues, so 53(53-1)/2 = 1378 distances
LARGE Distance Matrix
Computationally expensive and time consuming
20
Simplify by using a Reduced Representation

Represent the protein's secondary structure elements as blocks:
Helix → cylinder; beta sheet → rectangular board
21
Simplification.
[Figure: distance matrix setup for the full representation and for the reduced representation with four blocks, A to D]
22
How do we get the distances for the blocks?
[Figure: blocks A, B, C and D]
23
Sequence to Structure
Conformational search in distance space: Randomize → Embedding → Cartesian space
24
Sequence to Structure
Cartesian space → Excluded volume check → Coordinate mapping → Select best structure by scoring function
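An excluded volume check rejects any embedded conformation in which two elements overlap. A minimal sketch, assuming each block or residue is represented by a single point and that one hypothetical cutoff `min_dist` applies to every pair; the actual criterion used by the program is not given on the slides.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def passes_excluded_volume(points: Sequence[Coord], min_dist: float) -> bool:
    """True if no two points are closer than min_dist (hypothetical cutoff)."""
    return all(math.dist(p, q) >= min_dist for p, q in combinations(points, 2))

def filter_conformations(confs: List[Sequence[Coord]],
                         min_dist: float) -> List[Sequence[Coord]]:
    """Keep only the viable (non-clashing) conformations."""
    return [c for c in confs if passes_excluded_volume(c, min_dist)]

ok = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
clash = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(len(filter_conformations([ok, clash], min_dist=3.0)))  # 1
```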
25
Coordinate Mapping
Helices: superimpose a 27-residue helical template on the start and end coordinates of each helix block
Beta sheets: use parallel and anti-parallel beta sheet templates
Loops: insert loops from a loop library
[Figure: block end points (x1, y1, z1) to (x4, y4, z4) mapped onto the templates]
26
Conformational search using Distance Geometry
Flowchart:
METPQTHVVIKEASGHTSERNMKLGHFERDSAMNPLTVWY
→ Obtain secondary structure information from sequence
METPQTHVVIKEASGHTSERNMKLGHFERDSAMNPLTVWY
HHHHHHHCCCCCCCBBBBBBBBBBBBBCCCCCCHHHHHH
→ Convert secondary structure information to blocks
→ Conformational search
→ Best 3D structure
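The "convert secondary structure information to blocks" step amounts to run-length encoding the secondary structure string. A minimal sketch, assuming H runs become helix blocks, B runs become beta blocks, and C runs are the loops between them:

```python
from itertools import groupby
from typing import List, Tuple

def ss_to_blocks(ss: str) -> List[Tuple[str, int, int]]:
    """Collapse a per-residue secondary structure string (H = helix,
    B = beta, C = coil) into (type, first residue, last residue) blocks,
    1-based. Coil runs are skipped: they become loops between blocks."""
    blocks, pos = [], 1
    for kind, run in groupby(ss):
        length = len(list(run))
        if kind in ("H", "B"):
            blocks.append((kind, pos, pos + length - 1))
        pos += length
    return blocks

print(ss_to_blocks("HHHHHHHCCCCCCCBBBBBBBBBBBB"))
# [('H', 1, 7), ('B', 15, 26)]
```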
27
Results
All atom/residue distances → LARGE distance matrix → cluster computers
Reduced representation → small distance matrix → desktop computer
28
Results..
Desktop computer: 2.0 GHz processor, 512 MB RAM
Cluster: 216 Pentium III 450 MHz CPUs
29
Results..
[Figure: number of viable conformations in distance space after randomizing, embedding and the excluded volume check]
30
Results..
[Figure: best structure, and total time broken down into randomization, embedding, excluded volume check and coordinate mapping]
31
Results..
1ROP (55 residues): backbone RMSD 2.32 Å; search time 11.6 min; total time 16 hours
2SPZ (49 residues): backbone RMSD 3.81 Å; search time 17.2 min; total time 7 hours
1BW5 (46 residues): backbone RMSD 4.01 Å; search time 16.9 min; total time 2 hours
Blue: native protein; red: predicted protein
32
Results..
1VII (29 residues): backbone RMSD 2.63 Å; search time 5.4 min; total time 12 hours
1ENH (45 residues): backbone RMSD 3.61 Å; search time 16.9 min; total time 2 hours
1BDC (44 residues): backbone RMSD 3.17 Å; search time 17.7 min; total time 12 hours
1AHO (31 residues): backbone RMSD 3.75 Å; search time 18.9 min; total time 7 hours
Blue: native protein; red: predicted protein
33
Selection of near-native structure
A scoring function is needed to pick the near-native structure from the library of generated structures.
This is the second half of the protein folding problem!
34
Summary

A fast conformational search algorithm for 3D
structure generation for small proteins has been
developed.
Results for small proteins can be obtained in
hours, not days.
Results are comparable to algorithms that employ
cluster computing.
The algorithm can be run on a desktop computer.
Although it is not strictly an ab initio method, it could become one if secondary structure can be predicted accurately.

35
Future work

Add the ability to map complex beta sheet
topologies.
Develop a scoring function that can select the
near native structure.
Try to generate the structure of larger proteins.

36
SuperPose: A Web Server for Automated Protein Structure Superposition
Part II
http://wishart.biology.ualberta.ca/SuperPose
37
Introduction

Who Cares?
Review of Superposition
Identifying Corresponding Points Between
Structures
Multiple Structure Superposition
The SuperPose Web Site

http://wishart.biology.ualberta.ca/SuperPose
38
Who Cares?

NMR Spectroscopists

1YUA, 26 Chains
39
Who Cares?

Structural Biologists

40
Who Cares?

Evolutionary Biologists

41
Principles of SuperPosition

How do we superimpose these two cubes?

42
Principles of SuperPosition

1. Identify corresponding points.

43
Principles of SuperPosition
2. Identify the centre and the principal axes of each structure.
44
Principles of SuperPosition
3. Translate the two structures so their centres
overlap.
45
Principles of SuperPosition
4. Rotate the structures so the root-mean-square distance between corresponding points is minimized and their principal axes overlap.
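Steps 3 and 4 can be sketched with the Kabsch method, a least-squares superposition that finds the optimal rotation from the singular value decomposition of a 3 × 3 covariance matrix. This is a swapped-in technique for illustration; the slides list several methods and do not say that SuperPose uses this one. The demo superimposes a unit cube onto a rotated, shifted copy of itself.

```python
import numpy as np
from typing import Tuple

def superpose(mobile: np.ndarray, target: np.ndarray) -> Tuple[np.ndarray, float]:
    """Least-squares superposition (Kabsch/SVD) of N x 3 corresponding points.
    Returns the moved copy of `mobile` and the resulting RMSD."""
    mob_c = mobile - mobile.mean(axis=0)       # step 3: move centres to the origin
    tgt_c = target - target.mean(axis=0)
    h = mob_c.T @ tgt_c                        # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # avoid an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # step 4: optimal rotation matrix
    fitted = mob_c @ rot.T + target.mean(axis=0)
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return fitted, rmsd

cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float)
theta = np.radians(30)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = cube @ rz.T + np.array([5.0, -2.0, 3.0])
fitted, rmsd = superpose(cube, moved)
print(round(rmsd, 6))  # 0.0
```

The rotation is applied by multiplying every atom coordinate by the rotation matrix, exactly as described on the next slide.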
46
Principles of SuperPosition

Rotations can be accomplished by multiplying each
atom coordinate with appropriate rotation
matrices.

47
Superposition Methods

Rotation by Euler Angles
Lagrangian Multipliers
Least Squares Minimization
Quaternion method (fastest)

48
Quaternions

Invented by W. Hamilton in 1843
Can be thought of as quadruplets of real numbers
Very fast as computers work faster with algebraic
rather than trigonometric functions
A quaternion is an extension of the complex numbers
$q = w + xi + yj + zk$, where $w$, $x$, $y$ and $z$ are real numbers and $i$, $j$, $k$ are imaginary units with $i^2 = j^2 = k^2 = ijk = -1$
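The quaternion algebra behind this can be sketched directly: the Hamilton product, and rotation of a 3D point $v$ about a unit axis via $q v q^{*}$. This shows only the algebra; the full quaternion superposition method (finding the best-fit quaternion as the top eigenvector of a 4 × 4 matrix built from the two structures) is not reproduced here.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]   # (w, x, y, z)
Vec = Tuple[float, float, float]

def qmul(a: Quat, b: Quat) -> Quat:
    """Hamilton product, following i^2 = j^2 = k^2 = ijk = -1."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def rotate(v: Vec, axis: Vec, angle: float) -> Vec:
    """Rotate point v about a unit axis by `angle` radians using q v q*."""
    s = math.sin(angle / 2)
    q = (math.cos(angle / 2), axis[0] * s, axis[1] * s, axis[2] * s)
    q_conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = qmul(qmul(q, (0.0, *v)), q_conj)
    return (x, y, z)

print(qmul((0, 1, 0, 0), (0, 1, 0, 0)))                       # i*i = (-1, 0, 0, 0)
print(rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2))  # approximately (0, 1, 0)
```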

49
Identifying Corresponding Points Between Protein
Structures

Sequence Alignment (2TRXA - 3TRXA)

PDB_Entry_A   1 SDKIIHLTDDSFDTDVLKA--DGAILVDFWAEWCGPCKMIAPILDEIADE  48
PDB_Entry_B   1 MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEK  48

PDB_Entry_A  49 YQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQ  98
PDB_Entry_B  49 YSNVIFL-EVDVDDCQDVASECEVKCTPTFQFFKKGQ----KVGEFS-GA  92

PDB_Entry_A  99 LKEFLDANLA 108
PDB_Entry_B  93 NKEKLEATINELV 105
50
Identifying Corresponding Points Between Protein
Structures..
Problem: Low Homology
3TRX - 3GRX1
Length: 163   Identity: 11/163 (6.7%)   Similarity: 14/163 (8.6%)   Gaps: 139/163 (85.3%)

3TRX_model_de   1 MVKQIESK  8
3GRX_model_1_   1 ANVEIYTKETCPYSHRAKALLSSKGVSFQELPIDGNAAKREEMIKRSGRT  50

3TRX_model_de   9 TAFQ--------------EALDAAG--DKLVVVDFSATWCGPCKMIKPFF  42
3GRX_model_1_  51 TVPQIFIDAQHIGGYDDLYALDARGGLDPLLK  82

3TRX_model_de  43 HSLSEKYSNVIFLEVDVDDCQDVASECEVKCTPTFQFFKKGQKVGEFSGA  92
3GRX_model_1_  83  82

3TRX_model_de  93 NKEKLEATINELV 105
3GRX_model_1_  83  82
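The summary line above counts alignment columns: Length is the number of columns, Identity the columns with the same residue in both rows, and Gaps the columns where either row has a gap (11/163 = 6.7%, 139/163 = 85.3%). A minimal sketch of that counting on a short hypothetical alignment; Similarity is not computed here because it needs a substitution matrix such as BLOSUM62.

```python
from typing import Tuple

def alignment_stats(row1: str, row2: str) -> Tuple[int, int, int]:
    """Return (length, identical columns, gap columns) for a pairwise
    alignment whose rows have equal length and use '-' for gaps."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    identical = sum(a == b and a != "-" for a, b in zip(row1, row2))
    gaps = sum(a == "-" or b == "-" for a, b in zip(row1, row2))
    return len(row1), identical, gaps

length, ident, gaps = alignment_stats("ACDEFG-HI", "ACDQFGKHI")
print(f"Identity {ident}/{length} ({100 * ident / length:.1f}%), "
      f"Gaps {gaps}/{length} ({100 * gaps / length:.1f}%)")
```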
51
Identifying Corresponding Points Between Protein
Structures..
Solution: Secondary Structure Alignment
3TRX - 3GRX1
Sequence1: 3TRX_model_default_chain_default
Sequence2: 3GRX_model_1_chain_default
Score: 600   Test statistic: 5.31   Matches: 64

Sequence1  CEEEECCHHHHHHHHHHHCCEEEEEEEEECCCHHHHHCCCCCCHHHHHCC
Sequence2  CEEEEEEECCCHHHHHHHH HHHHHCC
Structure  CBBBBBBBCCCHHHHHHHH HHHHHCC

Sequence1  CEEEEEEEECCCHHHHHHHCCCCEEEEEEEECCCCCEEECCCCHHHHHHH
Sequence2  CEEEEEECCCCHHHHHHHHHCCCCCCEEEEECCCCC CHHHHHHHH
Structure  CBBBBBBCCCCHHHHHHHHHCCCCCCBBBBBCCCCC CHHHHHHHH

Sequence1  HHHCC
Sequence2  HHHCCCCCCCC
Structure  HHHCCCCCCCC
52
Identifying Corresponding Points Between Protein
Structures..
Problem: Multiple Structural Forms
Open and closed forms of calmodulin, 1A29 and 1CLL
Length: 145   Identity: 143/145 (98.6%)   Similarity: 143/145 (98.6%)   Gaps: 2/145 (1.4%)   Score: 730.0

1A29_model_de    1 QLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMI  50
1CLL_model_de    1 LTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMI  49

1A29_model_de   51 NEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISA 100
1CLL_model_de   50 NEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISA  99

1A29_model_de  101 AELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMT 144
1CLL_model_de  100 AELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 144
53
Identifying Corresponding Points Between Protein
Structures..
Solution: Subdomain Alignment
1A29 and 1CLL
54
How do we do Subdomain Alignment?
1. Create and analyze a Difference Distance Matrix
2. Create distance matrices for both structures and subtract the matrices
[Figure: two four-point structures (points 1 to 4) with their inter-point distances]

Structure 1 distance matrix
     1    2    3    4
1    0   0.9  2.0  1.2
2         0   0.9  0.7
3              0   0.8
4                   0

Structure 2 distance matrix
     1    2    3    4
1    0   0.9  2.0  2.3
2         0   0.9  1.1
3              0   0.9
4                   0

Difference Distance Matrix
     1    2    3    4
1    0    0    0   1.1
2         0    0   0.4
3              0   0.1
4                   0
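The subtraction above can be sketched with NumPy using the two 4 × 4 matrices from the slide. Taking the absolute difference is an assumption here (a signed difference is also used in practice), and `full` is a hypothetical helper that rebuilds a symmetric matrix from its upper-triangle rows.

```python
import numpy as np

def difference_distance_matrix(d_a: np.ndarray, d_b: np.ndarray) -> np.ndarray:
    """Element-wise |D_B - D_A| for two distance matrices over the same points."""
    return np.abs(d_b - d_a)

def full(upper_rows):
    """Hypothetical helper: symmetric matrix from upper-triangle rows (diagonal included)."""
    n = len(upper_rows)
    m = np.zeros((n, n))
    for i, row in enumerate(upper_rows):
        m[i, i:] = row
    return m + np.triu(m, 1).T

d_a = full([[0, 0.9, 2.0, 1.2], [0, 0.9, 0.7], [0, 0.8], [0]])
d_b = full([[0, 0.9, 2.0, 2.3], [0, 0.9, 1.1], [0, 0.9], [0]])
ddm = difference_distance_matrix(d_a, d_b)
print(np.round(ddm, 1))
```

Point pairs with near-zero entries keep the same geometry in both structures; large entries (here the pair 1 to 4) mark where the structures differ.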
55
Subdomain Alignment..
1. Plot the magnitude of each distance difference as a colour shade
2. Analyze the difference distance matrix for similar subdomains
3. The DD matrix will have regions that are similar and regions that are different
[Figure: DD matrix with a similar region and a different region marked]
56
Subdomain Alignment..
1A29 and 1CLL
57
Multiple Structure Superposition..
How do you optimally superimpose more than 2 structures?
Used for superposing structures in an NMR ensemble
Step 1: Initial 2-structure superposition
Step 2: Average structure
Step 3: 3-structure superposition (structure 3 superimposed on the average)
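The three steps above can be sketched as a progressive procedure: fit the second model onto the first, average the fitted models, fit the next model onto that average, and repeat. The pairwise fit used here is the Kabsch least-squares method, chosen for illustration; the slides do not say which pairwise method or how many refinement rounds SuperPose uses.

```python
import numpy as np
from typing import List

def fit(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Kabsch least-squares fit of mobile onto target (both N x 3)."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    u, _, vt = np.linalg.svd((mobile - mc).T @ (target - tc))
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return (mobile - mc) @ rot.T + tc

def progressive_superposition(models: List[np.ndarray]) -> List[np.ndarray]:
    """Step 1: fit model 2 on model 1. Step 2: average the fitted models.
    Step 3: fit the next model on the average; repeat for every model."""
    fitted = [models[0], fit(models[1], models[0])]
    for model in models[2:]:
        average = np.mean(fitted, axis=0)
        fitted.append(fit(model, average))
    return fitted

def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Proper rotation matrix from the QR decomposition of a random matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(1)
base = rng.normal(size=(20, 3))
ensemble = [base @ random_rotation(rng).T + rng.normal(size=3) for _ in range(4)]
fitted = progressive_superposition(ensemble)
print(all(np.allclose(f, fitted[0]) for f in fitted))  # True for exact copies
```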
58
SuperPose Outputs
59
SuperPose Home Page
http://wishart.biology.ualberta.ca/SuperPose
60
Part III
MovieMaker: A Web Server for Rapid Rendering of Protein Motions
http://wishart.biology.ualberta.ca/moviemaker/
61
Why study protein motions?

Proteins are not static: they bend, vibrate, open/close and assemble/disassemble in a variety of ways
Protein motions play an active role in
  Enzymatic activity (citrate synthase)
  Formation of assemblies (motor proteins)
  Chaperonin proteins (GroEL and GroES)
  Diseases (prion protein in scrapie; gp41 in AIDS)

62
Motions are performed by

Mobile loop regions
Streptavidin binding to biotin
Zymogen binding to chymotrypsinogen
Domain movements
Hinge motion, shear motion
Examples: alcohol dehydrogenase; opening and closing of calmodulin
63
Protein Motions

Very difficult to study
Complicated biological phenomena
Occur over timescales from $10^{-12}$ s to hours
Motions can range from a few Å to nanometers
Generated by
Molecular Dynamics (MD) simulation
XPLOR, CHARMM: take a long time (hours to days)
Interpolation between two end states
Cartesian interpolation

64
Visualizing motions or morphs

Morphs are best visualized as movies
MPEG, AVI files
Animated GIFs
Easy to create
Fast and can be easily shown in a web browser
Intermediate structures are needed between two
end conformers

65
MovieMaker web server

Uses Cartesian interpolation to generate the
intermediate structures between two end states
Does not use Molecular Dynamics
Fast and creates a variety of motions and morphs
Displays the motion as an animated GIF in a web browser

[Figure: each atom moves from (xA, yA, zA) in state A to (xB, yB, zB) in state B through intermediate structures N1, N2, N3, N4, ...]
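Cartesian interpolation places each atom of intermediate frame $k$ on the straight line between its two end positions: $x_k = x_A + \frac{k}{n-1}(x_B - x_A)$. A minimal sketch, assuming both end states list the same atoms in the same order and are already superimposed; any further refinement MovieMaker applies to the frames is not described here, and plain linear interpolation can distort bond geometry for large motions.

```python
import numpy as np
from typing import List

def cartesian_morph(start: np.ndarray, end: np.ndarray, n_frames: int) -> List[np.ndarray]:
    """Linear Cartesian interpolation between two N x 3 end states.
    Frame 0 is `start`, frame n_frames - 1 is `end`."""
    if start.shape != end.shape:
        raise ValueError("end states must list the same atoms")
    if n_frames < 2:
        raise ValueError("need at least the two end frames")
    return [start + (k / (n_frames - 1)) * (end - start) for k in range(n_frames)]

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 2.0, 0.0], [1.0, 2.0, 4.0]])
frames = cartesian_morph(a, b, 5)
print(frames[2])  # midpoint frame
```

Each frame would then be rendered as one image of the animated GIF.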
66
MovieMaker web server
http://wishart.biology.ualberta.ca/moviemaker/
67
MovieMaker web server options