Bioinformatics Tools for Structural Biology - PowerPoint PPT Presentation


1
Bioinformatics Tools for Structural Biology
  • Rajarshi Maiti
  • Supervisor: Dr. David Wishart
  • M.Sc. Student
  • Faculty of Pharmacy and Pharmaceutical
    Sciences

698 Seminar, April 20, 2005
2
Bioinformatics Tools
Structural Biology
Part I: 3D Structure Prediction of Proteins
  • Description of a structure prediction algorithm
Part II: Structural Superposition of Proteins
  • Description of a web server called SuperPose
Part III: Protein Motions and Movements
  • Description of a web server called MovieMaker
3
Part I
Distance Geometry Approach for 3D Protein
Structure Prediction
4
Why study Protein structures?
  • Many drugs and drug targets are proteins
  • The shape and size of a protein's active site
    guide the design of new drug molecules
  • Structural information for all the proteins in
    the human proteome should make drug discovery
    and development more robust and cost-efficient

5
Protein Structure Determination
  • Protein folding process
  • One of the most challenging problems in the life
    sciences
  • Has puzzled scientists over the last 30 years
  • Anfinsen's experiments, summarized in 1973,
    demonstrated that the amino acid sequence
    contains enough information to specify the fold
  • Since then, researchers have tried to find the
    protein folding pathway

6
Different levels of Protein structure
Primary structure
METPQTHVVIKEASGHTSERNMKLGH
Secondary structure
METPQTHVVIKEASGHTSERNMKLGH
HHHHHHHCCCCCCCBBBBBBBBBBBB
Folding process
Beta Sheet
Helix
Loop/Coil
Tertiary structure
7
The Folding Process
  • How long is the folding process?
  • One might guess that a protein randomly samples
    all of its conformations before settling on the
    lowest energy conformer
  • In 1968, Cyrus Levinthal demonstrated that this
    is not so

8
The Folding Process..
  • Levinthal's calculation
  • Consider a 100-residue protein, and let each
    residue take 2 possible conformations
  • Therefore there are 2^100 ≈ 1.3 × 10^30 possible
    conformations
  • If 10^-12 seconds is allowed to sample each
    conformation, then
  • total time ≈ 1.3 × 10^18 seconds ≈ 4 × 10^10 years
  • Age of the universe ≈ 14 billion years
    ≈ 4 × 10^17 seconds
  • Hence, proteins do not sample all possible
    conformations
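
The arithmetic behind Levinthal's estimate can be checked in a few lines of Python. This is a minimal sketch that uses only the numbers on the slide (100 residues, 2 conformations per residue, 10^-12 seconds per conformation); the constant names and the length of a year are illustrative assumptions.

```python
# Levinthal-style estimate using the numbers quoted on the slide.
RESIDUES = 100
STATES_PER_RESIDUE = 2
SECONDS_PER_SAMPLE = 1e-12
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

conformations = STATES_PER_RESIDUE ** RESIDUES
total_seconds = conformations * SECONDS_PER_SAMPLE
total_years = total_seconds / SECONDS_PER_YEAR

print(f"conformations: {conformations:.2e}")
print(f"search time:   {total_seconds:.2e} s = {total_years:.2e} years")
```

Even with only two states per residue, an exhaustive search takes longer than the age of the universe, which is the point of the argument.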

9
The Folding Process..
10^-6 to 10^-3 seconds
Native structure with minimum energy
Protein folding landscape
Protein folding pathway
10
Protein structure determination
  • Experimental techniques
  • NMR Spectroscopy
  • X-ray crystallography
  • Laborious and lengthy process (months to years)
  • Not all proteins are experimentally amenable to
    structure determination

Hence, computer-based techniques have been tried,
and are still being explored, to determine the 3D
structure of a protein from its sequence
11
Protein structure determination
Computational methods
Ab initio or de novo
  • Models the structure from sequence
  • Simulates the natural folding process
  • Generates rough models (> 5 Å RMSD)
Homology Modeling
  • Requires sequence similarity (> 30%)
  • Most accurate structure prediction method
  • Quality of structure depends on the sequence
    similarity
  • Structures are of high quality (< 1.5 Å RMSD)

12
Homology Modeling
13
Ab initio methods for 3D protein structure
determination
  • Systematic searches
  • Monte Carlo sampling
  • Genetic Algorithms
  • Distance Geometry

Two main components of an ab initio prediction
program are
  • Efficient sampling of the conformational search
    space
  • Scoring function to select the near-native
    structure

14
Systematic Search
Feasible for small peptides, not suitable for
proteins
15
Monte Carlo sampling
16
Genetic Algorithm
  • Concept borrowed from Computer Science
  • A population of conformations is spawned
  • Mutation, reproduction and crossover operations
    are performed to simulate the natural selection
    process

Mutation Operator
100100101001010101 → 100100001001010101
Reproduce Operator
100100001001010101 → 100100001001010101
Crossover Operator
100100001001 010101 × 001000101010 101001
→ 100100001001 101001 and 001000101010 010101
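
The three operators can be sketched on bit strings as follows. This is an illustrative sketch only: the function names, the mutation position and the crossover point are assumptions chosen to match the strings on the slide, and a real conformational search would encode torsion angles or coordinates rather than raw bits.

```python
def mutate(bits, index):
    """Mutation operator: flip the bit at position `index`."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


def reproduce(bits):
    """Reproduce operator: copy an individual unchanged."""
    return bits


def crossover(parent_a, parent_b, point):
    """Crossover operator: swap the tails of two parents after `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


parent = "100100101001010101"
mutant = mutate(parent, 6)            # assumed mutation site (7th bit)
copy = reproduce(mutant)
children = crossover(mutant, "001000101010101001", 12)  # assumed cut point

print(mutant)
print(copy)
print(children)
```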
17
Distance based methods
  • Useful when interatomic or inter-residue
    distances are known
  • N atoms/residues give N(N-1)/2 distances

[Figure: Distance Geometry process. Pairwise distances (1.2 Å, 1.4 Å, 2.2 Å, 3.5 Å, 5.4 Å) in distance space are converted into a structure by the embedding process.]
18
Distance Geometry Process
Set up a Distance Matrix

      A     B     C     D     E     F
A     0    DAB   DAC   DAD   DAE   DAF
B    dBA    0    DBC   DBD   DBE   DBF
C    dCA   dCB    0    DCD   DCE   DCF
D    dDA   dDB   dDC    0    DDE   DDF
E    dEA   dEB   dEC   dED    0    DEF
F    dFA   dFB   dFC   dFD   dFE    0

DAB: Maximum distance between A and B
dAB: Minimum distance between A and B

Embedding Process
Distance matrix → Cartesian space, with coordinates (x, y, z) for each point
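
One standard way to perform the embedding step is metric matrix embedding: build a Gram matrix from the distances, take its three largest eigenvectors, and scale them by the square roots of the eigenvalues. The sketch below is an assumption about how such a step can be done, not the implementation described in the talk. It uses a single exact distance per pair (rather than upper and lower bounds) and a plain power iteration so that it needs only the standard library; the function names `embed` and `pairwise` are hypothetical.

```python
import math


def embed(dist, dims=3, iters=500):
    """Metric matrix embedding: distances -> coordinates (up to rotation/reflection)."""
    n = len(dist)
    # Gram matrix of position vectors measured from point 0 (law of cosines).
    gram = [[(dist[i][0] ** 2 + dist[j][0] ** 2 - dist[i][j] ** 2) / 2.0
             for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for axis in range(dims):
        vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary start vector
        for _ in range(iters):  # power iteration for the dominant eigenvector
            nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        lam = sum(vec[i] * gram[i][j] * vec[j] for i in range(n) for j in range(n))
        if lam <= 1e-9:
            break  # no positive eigenvalue left: fewer dimensions are needed
        for i in range(n):
            coords[i][axis] = math.sqrt(lam) * vec[i]
        # Deflate: remove the component just extracted before the next axis.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= lam * vec[i] * vec[j]
    return coords


def pairwise(points):
    """Full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


original = [(0, 0, 0), (3, 0, 0), (0, 2, 0), (0, 0, 1), (1, 1, 1)]
target = pairwise(original)
rebuilt = pairwise(embed(target))
max_error = max(abs(target[i][j] - rebuilt[i][j])
                for i in range(5) for j in range(5))
print(f"largest distance error after embedding: {max_error:.2e}")
```

The recovered coordinates differ from the originals by a rigid rotation or reflection, but every pairwise distance is reproduced. With trial distances sampled between bounds, the Gram matrix is only approximately consistent and the embedded structure usually needs further refinement.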
19
Distance Matrix for 1ROP
53 residues, so 53(53-1)/2 = 1378 distances
LARGE Distance Matrix
Computationally expensive and time consuming
20
Simplify by using a Reduced Representation
  • Represent the protein's secondary structure
    elements as blocks.

Helix → cylinder
Beta sheet → rectangular board
21
Simplification.
[Figure: distance matrix setup for blocks A, B, C and D, shown for the full structure and for the reduced representation.]
22
How do we get the distances for the blocks?
[Figure: blocks A, B, C and D.]
23
Sequence to Structure
Randomize
Conformational search in distance space
Embedding
Embedding
Cartesian space
24
Sequence to Structure
Excluded Volume Check
Cartesian space
Coordinate mapping
Select best structure by scoring function
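
The slides do not say how the excluded volume check is implemented. A common interpretation is a steric filter that rejects any embedded structure in which two points come closer than some minimum separation. The sketch below follows that interpretation; the function name and the 3.0 Å threshold are hypothetical.

```python
import math
from itertools import combinations


def passes_excluded_volume(points, min_separation):
    """Return True if no two points are closer than `min_separation`."""
    return all(math.dist(p, q) >= min_separation
               for p, q in combinations(points, 2))


MIN_SEPARATION = 3.0  # hypothetical threshold in Å, not taken from the slides

ok_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
clashing_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.5, 0.5, 0.0)]

print(passes_excluded_volume(ok_model, MIN_SEPARATION))
print(passes_excluded_volume(clashing_model, MIN_SEPARATION))
```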
25
Coordinate Mapping
  • Helices: superimpose a 27-residue helical
    template on the start and end coordinates
  • Beta sheets: parallel and anti-parallel beta
    sheet templates
  • Loops: insert loops from a loop library

[Figure: block end points labelled (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) and (x4, y4, z4).]
26
Conformational search using Distance Geometry
Flowchart
METPQTHVVIKEASGHTSERNMKLGHFERDSAMNPLTVWY
↓ Obtain secondary structure information from sequence
METPQTHVVIKEASGHTSERNMKLGHFERDSAMNPLTVWY
HHHHHHHCCCCCCCBBBBBBBBBBBBBCCCCCCHHHHHH
↓ Convert secondary structure information to blocks
↓ Conformational search
Best 3D structure
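
The "convert secondary structure information to blocks" step can be sketched as a run-length split of the secondary structure string. This is a minimal illustration; the function name `to_blocks` and the (type, start, end) tuple layout are assumptions, and the example strings are the short ones from the "Different levels of Protein structure" slide.

```python
from itertools import groupby


def to_blocks(ss):
    """Split a secondary structure string into (type, start, end) runs.

    `start` is 0-based and `end` is exclusive.
    """
    blocks, start = [], 0
    for kind, run in groupby(ss):
        length = len(list(run))
        blocks.append((kind, start, start + length))
        start += length
    return blocks


sequence = "METPQTHVVIKEASGHTSERNMKLGH"
ss = "HHHHHHHCCCCCCCBBBBBBBBBBBB"

for kind, start, end in to_blocks(ss):
    # Print 1-based residue ranges alongside the residues in each block.
    print(kind, start + 1, end, sequence[start:end])
```

Each H or B run then becomes one block (cylinder or board) in the reduced distance matrix, while C runs become the loops that are inserted later.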
27
Results
All atom/residue distances → LARGE distance matrix → cluster computers
Reduced representation → small distance matrix → desktop computer
28
Results..
2.0 GHz processor 512 MB RAM
216 Pentium III 450 MHz CPUs
29
Results..
Randomizing Embedding Excluded Volume Check
No. of viable conformations
Distance space
30
Results..
Best Structure
Total Time
Total time Randomization
Embedding
Excluded Volume Check
Coordinate Mapping
31
Results..
Protein   Residues   BB RMSD (Å)   Search time (min)   Total time (h)
1ROP      55         2.32          11.6                16
2SPZ      49         3.81          17.2                7
1BW5      46         4.01          16.9                2

Blue: native protein. Red: predicted protein.
32
Results..
Protein   Residues   BB RMSD (Å)   Search time (min)   Total time (h)
1VII      29         2.63          5.4                 12
1ENH      45         3.61          16.9                2
1BDC      44         3.17          17.7                12
1AHO      31         3.75          18.9                7

Blue: native protein. Red: predicted protein.
33
Selection of near-native structure
Scoring function
Near native structure
Library of generated structures
The second half of the protein folding
problem!!!!!!
34
Summary
  • A fast conformational search algorithm for 3D
    structure generation for small proteins has been
    developed.
  • Results for small proteins can be obtained in
    hours, not days.
  • Results are comparable to algorithms that employ
    cluster computing.
  • The algorithm can be run on a desktop computer.
  • The method is not strictly ab initio, but it
    becomes one if the secondary structure can be
    accurately predicted from the sequence.

35
Future work
  • Add the ability to map complex beta sheet
    topologies.
  • Develop a scoring function that can select the
    near native structure.
  • Try to generate the structure of larger proteins.

36
Part II
SuperPose: A web server for Automated Protein
Structure Superposition
http://wishart.biology.ualberta.ca/SuperPose
37
Introduction
  • Who Cares?
  • Review of Superposition
  • Identifying Corresponding Points Between
    Structures
  • Multiple Structure Superposition
  • The SuperPose Web Site

38
Who Cares?
  • NMR Spectroscopists

1YUA, 26 Chains
39
Who Cares?
  • Structural Biologists

40
Who Cares?
  • Evolutionary Biologists

41
Principles of SuperPosition
  • How do we superimpose these two cubes?

42
Principles of SuperPosition
1. Identify corresponding points.

43
Principles of SuperPosition
2. Identify the common centre and the principal
axes for each structure.
44
Principles of SuperPosition
3. Translate the two structures so their centres
overlap.
45
Principles of SuperPosition
4. Rotate the two structures so the average
distance between corresponding points is
minimized, and their principal axes overlap.
46
Principles of SuperPosition
  • Rotations can be accomplished by multiplying each
    atom coordinate with appropriate rotation
    matrices.

47
Superposition Methods
  • Rotation by Euler Angles
  • Lagrangian Multipliers
  • Least Squares Minimization
  • Quaternion method (fastest)

48
Quaternions
  • Invented by W. Hamilton in 1843
  • Can be thought of as quadruplets of real numbers
  • Very fast, as computers work faster with algebraic
    than with trigonometric functions
  • A quaternion is an extension of the complex
    number
  • q = w + xi + yj + zk, where w, x, y and z are real
    numbers and i, j and k are imaginary units
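
The superposition steps (translate both structures to a common centre, then rotate to minimize the distance between corresponding points) can be sketched with a quaternion-based fit. The code below follows Horn's closed-form formulation, in which the optimal unit quaternion is the top eigenvector of a symmetric 4×4 matrix; it is an illustrative sketch, not SuperPose's own code. The function names are hypothetical, and a simple shifted power iteration stands in for a proper eigenvalue solver so that only the standard library is needed.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def superpose(mobile, target, iters=500):
    """Fit `mobile` onto `target` (paired points); return (fitted coords, RMSD)."""
    cm, ct = centroid(mobile), centroid(target)
    p = [[a[k] - cm[k] for k in range(3)] for a in mobile]   # centred mobile set
    q = [[b[k] - ct[k] for k in range(3)] for b in target]   # centred target set
    # Correlation sums: s[a][b] = sum over points of p[a] * q[b].
    s = [[sum(pi[a] * qi[b] for pi, qi in zip(p, q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric matrix; its top eigenvector is the optimal quaternion.
    n = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # Shift by a Gershgorin bound so every eigenvalue is non-negative.
    shift = max(sum(abs(x) for x in row) for row in n)
    v = [1.0, 0.5, 0.25, 0.125]  # arbitrary start vector
    for _ in range(iters):       # power iteration on (N + shift * I)
        w = [sum(n[i][j] * v[j] for j in range(4)) + shift * v[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w0, x, y, z = v
    # Rotation matrix built from the unit quaternion (w0, x, y, z).
    rot = [[w0*w0 + x*x - y*y - z*z, 2*(x*y - w0*z), 2*(x*z + w0*y)],
           [2*(x*y + w0*z), w0*w0 - x*x + y*y - z*z, 2*(y*z - w0*x)],
           [2*(x*z - w0*y), 2*(y*z + w0*x), w0*w0 - x*x - y*y + z*z]]
    fitted = [tuple(sum(rot[r][c] * pi[c] for c in range(3)) + ct[r]
                    for r in range(3)) for pi in p]
    rmsd = math.sqrt(sum(math.dist(f, t) ** 2 for f, t in zip(fitted, target))
                     / len(target))
    return fitted, rmsd


mobile = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
# Target: the same points rotated 90 degrees about z and translated.
target = [(-py + 5, px - 2, pz + 1) for px, py, pz in mobile]
fitted, rmsd = superpose(mobile, target)
print(f"RMSD after superposition: {rmsd:.2e}")
```

Note that the rotation matrix is assembled from products of quaternion components only, with no trigonometric calls, which is the speed argument made on the slide.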

49
Identifying Corresponding Points Between Protein
Structures
  • Sequence Alignment

2TRXA - 3TRXA
PDB_Entry_A    1 SDKIIHLTDDSFDTDVLKA--DGAILVDFWAEWCGPCKMIAPILDEIADE   48
PDB_Entry_B    1 MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEK   48

PDB_Entry_A   49 YQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQ   98
PDB_Entry_B   49 YSNVIFL-EVDVDDCQDVASECEVKCTPTFQFFKKGQ----KVGEFS-GA   92

PDB_Entry_A   99 LKEFLDANLA  108
PDB_Entry_B   93 NKEKLEATINELV  105
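
A sequence alignment yields corresponding points as the list of aligned residue index pairs. The sketch below is a minimal Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions for illustration; alignments like the one on the slide are normally produced with a substitution matrix and affine gap penalties.

```python
def align_pairs(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, list of aligned (i, j) index pairs)."""
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * gap
    for j in range(1, cols):
        f[0][j] = j * gap
    # Fill the dynamic programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          f[i - 1][j] + gap,     # gap in b
                          f[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right corner, collecting aligned columns.
    pairs, i, j = [], len(a), len(b)
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if f[i][j] == f[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return f[len(a)][len(b)], pairs


score, pairs = align_pairs("ACGT", "AGT")
print(score, pairs)
```

The index pairs (0-based) identify which atoms, for example the alpha carbons of the paired residues, are passed to the superposition step.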
50
Identifying Corresponding Points Between Protein
Structures..
Problem: Low Homology
3TRX - 3GRX1
Length: 163   Identity: 11/163 (6.7%)   Similarity: 14/163 (8.6%)   Gaps: 139/163 (85.3%)

3TRX_model_de    1 MVKQIESK    8
3GRX_model_1_    1 ANVEIYTKETCPYSHRAKALLSSKGVSFQELPIDGNAAKREEMIKRSGRT   50

3TRX_model_de    9 TAFQ--------------EALDAAG--DKLVVVDFSATWCGPCKMIKPFF   42
3GRX_model_1_   51 TVPQIFIDAQHIGGYDDLYALDARGGLDPLLK   82

3TRX_model_de   43 HSLSEKYSNVIFLEVDVDDCQDVASECEVKCTPTFQFFKKGQKVGEFSGA   92
3GRX_model_1_   83   82

3TRX_model_de   93 NKEKLEATINELV  105
3GRX_model_1_   83   82
51
Identifying Corresponding Points Between Protein
Structures..
Solution: Secondary Structure Alignment
3TRX - 3GRX1
Sequence1: 3TRX_model_default_chain_default
Sequence2: 3GRX_model_1_chain_default
Score: 600   Test Stat: 5.31   Matches: 64

Sequence1  CEEEECCHHHHHHHHHHHCCEEEEEEEEECCCHHHHHCCCCCCHHHHHCC
Sequence2  CEEEEEEECCCHHHHHHHH HHHHHCC
Structure  CBBBBBBBCCCHHHHHHHH HHHHHCC

Sequence1  CEEEEEEEECCCHHHHHHHCCCCEEEEEEEECCCCCEEECCCCHHHHHHH
Sequence2  CEEEEEECCCCHHHHHHHHHCCCCCCEEEEECCCCC CHHHHHHHH
Structure  CBBBBBBCCCCHHHHHHHHHCCCCCCBBBBBCCCCC CHHHHHHHH

Sequence1  HHHCC
Sequence2  HHHCCCCCCCC
Structure  HHHCCCCCCCC
52
Identifying Corresponding Points Between Protein
Structures..
Problem: Multiple Structural Forms
Open and closed forms of calmodulin: 1A29 and 1CLL
Length: 145   Identity: 143/145 (98.6%)   Similarity: 143/145 (98.6%)   Gaps: 2/145 (1.4%)   Score: 730.0

1A29_model_de    1 QLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMI   50
1CLL_model_de    1 -LTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMI   49

1A29_model_de   51 NEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISA  100
1CLL_model_de   50 NEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISA   99

1A29_model_de  101 AELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMT-  144
1CLL_model_de  100 AELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA  144
53
Identifying Corresponding Points Between Protein
Structures..
Solution: Subdomain Alignment
1A29 and 1CLL
54
How do we do Subdomain Alignment ?
1. Create Distance Matrices for both structures
2. Subtract the matrices to create a Difference
Distance Matrix, then analyze it

Distance matrix, first structure
     1     2     3     4
1    0    0.9   2.0   1.2
2          0    0.9   0.7
3                0    0.8
4                      0

Distance matrix, second structure
     1     2     3     4
1    0    0.9   2.0   2.3
2          0    0.9   1.1
3                0    0.9
4                      0

Difference Distance Matrix
     1     2     3     4
1    0     0     0    1.1
2          0     0    0.4
3                0    0.1
4                      0
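
The subtraction can be sketched as follows, using the four-point example from this slide. The function names are hypothetical, and taking the absolute value of each difference is an assumption (it matches the "magnitude" that is plotted as a colour shade). The useful property is that distance matrices do not depend on how each structure is oriented, so no prior superposition is needed.

```python
import math


def distance_matrix(coords):
    """All pairwise distances for one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def difference_distance_matrix(dm_a, dm_b):
    """Element-wise absolute difference of two equally sized distance matrices."""
    n = len(dm_a)
    return [[abs(dm_a[i][j] - dm_b[i][j]) for j in range(n)] for i in range(n)]


# The four-point example, written out as full symmetric matrices.
dm_a = [[0.0, 0.9, 2.0, 1.2],
        [0.9, 0.0, 0.9, 0.7],
        [2.0, 0.9, 0.0, 0.8],
        [1.2, 0.7, 0.8, 0.0]]
dm_b = [[0.0, 0.9, 2.0, 2.3],
        [0.9, 0.0, 0.9, 1.1],
        [2.0, 0.9, 0.0, 0.9],
        [2.3, 1.1, 0.9, 0.0]]

ddm = difference_distance_matrix(dm_a, dm_b)
for row in ddm:
    print(" ".join(f"{value:.1f}" for value in row))
```

Points 1 to 3 keep their mutual distances (differences of 0), while every distance involving point 4 changes, which marks point 4 as the part that moved.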
55
Subdomain Alignment..
1. Plot the magnitude of each difference distance
as a colour shade
2. Analyze the difference distance matrix for
similar subdomains
3. The DD matrix will have regions that are
similar, and regions that are different

[Figure: difference distance matrix plot with a similar region and a different region marked.]
56
Subdomain Alignment..
1A29 and 1CLL
57
Multiple Structure Superposition..
How do you optimally superimpose more than 2
structures?
Used for superposing structures in an NMR ensemble
Step 1: Initial 2-structure superposition
Step 2: Average structure
Step 3: 3-structure superposition (Structure 3 onto the average structure)
58
SuperPose Outputs
59
SuperPose Home Page
http://wishart.biology.ualberta.ca/SuperPose
60
Part III
MovieMaker: A web server for Rapid Rendering of
Protein Motions
http://wishart.biology.ualberta.ca/moviemaker/
61
Why study protein motions?
  • Proteins are not static: they bend, vibrate,
    open/close and assemble/disassemble in a variety
    of ways
  • Protein motions play an active role in:
  • Enzymatic activity (citrate synthase)
  • Formation of assemblies (motor proteins)
  • Chaperonin proteins (GroEL and GroES)
  • Diseases (prion protein in scrapie, gp41 in
    AIDS)

62
Motions are performed by
  • Mobile loop regions
  • Streptavidin binding to biotin
  • Zymogen binding to chymotrypsinogen
  • Domain movements
  • Hinge motion
  • Shear motion

Alcohol Dehydrogenase
Opening and Closing of Calmodulin
63
Protein Motions
  • Very difficult to study
  • Complicated biological phenomena
  • Occur on timescales from 10^-12 seconds to hours
  • Motions range from a few Å to nanometers
  • Generated by
  • Molecular Dynamics (MD) simulation
  • XPLOR, CHARMM: take a long time (hours to days)
  • Interpolation between two end states
  • Cartesian interpolation

64
Visualizing motions or morphs
  • Morphs can be best visualized as movies
  • Mpeg, avi files
  • Animated gifs
  • Easy to create
  • Fast and can be easily shown in a web browser
  • Intermediate structures are needed between two
    end conformers

65
MovieMaker web server
  • Uses Cartesian interpolation to generate the
    intermediate structures between two end states
  • Does not use Molecular Dynamics
  • Fast and creates a variety of motions and morphs
  • Displays the motion as an animated gif over a web
    browser

End state A (xA, yA, zA) → intermediate structures (N1, N2, N3, N4, ...) → end state B (xB, yB, zB)
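
Cartesian interpolation between two end states can be sketched as a straight-line blend of each atom's coordinates. This is a minimal illustration rather than MovieMaker's code: the function name is hypothetical, and it assumes the two conformers have the same atoms in the same order and have already been superposed.

```python
def interpolate(start, end, n_frames):
    """Return `n_frames` intermediate coordinate sets between two conformers."""
    frames = []
    for step in range(1, n_frames + 1):
        t = step / (n_frames + 1)  # fraction of the way from start to end
        frames.append([tuple(a + t * (b - a) for a, b in zip(atom_a, atom_b))
                       for atom_a, atom_b in zip(start, end)])
    return frames


state_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
state_b = [(4.0, 0.0, 8.0), (1.0, 5.0, 1.0)]

for frame in interpolate(state_a, state_b, 3):
    print(frame)
```

Each intermediate set corresponds to one frame of the animation. Because atoms move along straight lines, bond lengths can be distorted in the intermediate frames when the motion is large, which is the price paid for speed compared with molecular dynamics.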
66
MovieMaker web server
http://wishart.biology.ualberta.ca/moviemaker/
67
MovieMaker web server options
  • 1) Continuous Rotation

Rotation of the 21 chains of GROEL (1AON)
68
MovieMaker web server options
  • 2) Cartesian Interpolation between two conformers

Hinge motion of DNA Polymerase beta (1BPD and
2BPF)
69
MovieMaker web server options
  • 3) Small scale vibrations of Myoglobin (1MYF)

70
MovieMaker web server options
  • 4) Animation of Ligand Docking

Docking of N-ethyl sulphite morpholine to
Glutaredoxin mutant(1ABA)
71
MovieMaker web server options
  • 5) Animation of Protein Oligomerization

Oligomerization of a pentamer molecule (1C48)
72
MovieMaker web server options
  • 6) Motion between the chains of an NMR ensemble

28 chains of the pointed domain (1BQV)
73
MovieMaker web server options
  • 7) Protein unfolding/folding

Folding and Unfolding of 1ABA
74
  • Demonstration of SuperPose and MovieMaker

http://wishart.biology.ualberta.ca/SuperPose
http://wishart.biology.ualberta.ca/moviemaker/
75
Acknowledgements
  • Dr. David Wishart
  • Dr. Gary Van Domselaar
  • Haiyan Zhang
  • Other members of Wishart Lab
  • Faculty of Pharmacy and Pharmaceutical Sciences

76
Questions ?