Dr Richard Jackson School of Biochemistry - PowerPoint PPT Presentation

Title: Dr Richard Jackson School of Biochemistry


1
Dr Richard Jackson
School of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK
e-mail address: jackson_at_bmb.leeds.ac.uk
Prediction and analysis of Protein Interactions
2
Prediction and analysis of Protein Interactions
  • Finding atomic-level similarity between ligand binding sites
  • Mammalian signal transduction - of interest in a large number of therapeutic areas (V-src, c-src, fyn, p56-lck, Syk, Grb2, ZAP-70 etc.): molecular modelling of SH2 domain-peptide interactions
  • Protein-ligand docking: Q-fit, a probabilistic method for ligand docking by sampling low energy conformational space
3
Comparing Ligand binding sites in biomolecules
  • Fold - CATH domain classification
  • Secondary Structure Elements - sub-domain
    motifs
  • Residue - catalytic triad (sequence
    conserved)
  • Atomic level Similarity - catalytic sites /
    binding sites
  • all potentially independent of sequence
    conservation

Potential uses
  • Structural classification of protein-protein/ligand binding sites
  • Structural genomics: determining functional sites on proteins of unknown function (nucleotide binding?)
  • Structure-based drug design: identification of key 3D characteristics of functional group recognition

4
Comparing binding sites of Protein-small molecule
interactions
Main-chain similarity in a subset of AMP binding proteins, generated using atomic similarity (i.e. no sequence information is used) via a computer-vision-based method.
5
Ligand-binding sites
Ligand   Atom-sets
PO4      756
AMP      122
ADP      508
ATP      453
GDP      248
GTP      216
NAD      797
NADP     521
FAD      728
FMN      131
PO4 constellation local environments: 7.0 Å radius, typically 70 atoms
6
Geometric Hashing
e.g. allowed triplets (subject to distance criteria)
Atom indices   Atom types
1 3 5          C C O
1 3 6          C C N
2 4 7          N C C
2 6 8          N N C
3 5 6          C O N
3 5 7          C O C
3 6 8          C N C
4 7 9          C C O
4 7 10         C C N
5 6 8          O N C
5 7 9          O C O
5 7 10         O C N
6 8 11         N C C
7 9 10         C O N
Given matching atom types: are two triangles d-compatible?
atomset1, atomset2: 3D grid superposition
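The triplet test on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in the talk: the distance window (`d_min`, `d_max`), the tolerance `tol`, and all function names are hypothetical, and atom order within a triplet is assumed to be fixed.

```python
from itertools import combinations
from math import dist


def triangle_sides(coords, triplet):
    """Return the three pairwise distances for a triplet of atom indices."""
    i, j, k = triplet
    return (dist(coords[i], coords[j]),
            dist(coords[j], coords[k]),
            dist(coords[i], coords[k]))


def allowed_triplets(coords, d_min=1.0, d_max=7.0):
    """Enumerate triplets whose three sides all lie within [d_min, d_max] (assumed window)."""
    return [t for t in combinations(range(len(coords)), 3)
            if all(d_min <= s <= d_max for s in triangle_sides(coords, t))]


def d_compatible(types_a, coords_a, tri_a, types_b, coords_b, tri_b, tol=0.5):
    """Two triangles are taken as d-compatible if their atom types match in order
    and each pair of corresponding sides differs by at most tol (assumed value)."""
    if [types_a[i] for i in tri_a] != [types_b[i] for i in tri_b]:
        return False
    sides_a = triangle_sides(coords_a, tri_a)
    sides_b = triangle_sides(coords_b, tri_b)
    return all(abs(x - y) <= tol for x, y in zip(sides_a, sides_b))


# Toy data: the same C-C-O triangle, translated by (5, 5, 5).
types = ["C", "C", "O"]
set1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.2, 0.0)]
set2 = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in set1]
print(d_compatible(types, set1, (0, 1, 2), types, set2, (0, 1, 2)))
```

Each d-compatible pair of triangles defines a candidate rigid-body transformation, which is what the superposition step then scores.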
7
Molecule A: No. of atoms 87
Molecule B: No. of atoms 63
Geometric Hashing
n_seed Mol. A: 270 (No. of triplets)
n_seed Mol. B: 233
Number of matches: 6354
No. of transformations: 1165 (d-compatible)
Highest score: 42
Mean 9.1, SDev 7.5, Z-score 4.4
[Histogram: frequency vs. score]
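The Z-score quoted on this slide follows from the highest score, the mean and the standard deviation of the score distribution. A minimal check, with a hypothetical function name:

```python
def z_score(score, mean, sdev):
    """Number of standard deviations by which a score exceeds the mean."""
    return (score - mean) / sdev


# Values from the slide: highest score 42, mean 9.1, SDev 7.5.
print(round(z_score(42, 9.1, 7.5), 1))  # 4.4
```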
8
Comparison between BK and GH
Graph theory: maximal clique algorithm (Bron-Kerbosch)
There is good agreement for the clique sizes found by the two algorithms. The key difference is in speed.
Average for a single comparison: GH 0.1 sec., BK 11.0 sec. (i.e. about x100 more)
For an all-against-all comparison of 3700 local environments (7 million comparisons): GH 8.5 days, BK 2.5 years
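The scale of the all-against-all comparison can be checked with a short calculation. This sketch assumes each unordered pair is compared once and uses the per-comparison times from the slide. It gives roughly 8 days for GH and 2.4 years for BK, in line with the rounded figures quoted.

```python
def n_pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


SECONDS_PER_DAY = 86400

pairs = n_pairs(3700)                                # about 7 million comparisons
gh_days = pairs * 0.1 / SECONDS_PER_DAY              # geometric hashing, 0.1 s each
bk_years = pairs * 11.0 / (SECONDS_PER_DAY * 365)    # Bron-Kerbosch, 11.0 s each

print(pairs, round(gh_days, 1), round(bk_years, 1))
```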
9
All-against-all similarity matrix and
Hierarchical Clustering
GH
CL
All-against-all isomorphisms (upper triangle):
2phk_ATP_PO4_3.pdb   0  28  21  10  11   9
1atp_ATP_PO4_1.pdb   0   0  47  13  16  18
1cz7_ADP_PO4_3.pdb   0   0   0  14  16  16
1del_AMP_PO4_3.pdb   0   0   0   0  12  15
4ukd_ADP_PO4_2.pdb   0   0   0   0   0  14
1ses_AMP_PO4_2.pdb   0   0   0   0   0   0
Clustering methods: single/complete linkage, group average, Ward's information loss method
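As an illustration of the group average method listed here, the sketch below clusters the six environments directly on the similarity scores from the matrix on this slide, merging at each step the pair of clusters with the highest mean similarity. It is a plain-Python sketch with hypothetical function names, not the clustering code used in the talk.

```python
from itertools import combinations

names = ["2phk", "1atp", "1cz7", "1del", "4ukd", "1ses"]
# Upper-triangular similarity scores as shown on the slide.
upper = [
    [0, 28, 21, 10, 11, 9],
    [0, 0, 47, 13, 16, 18],
    [0, 0, 0, 14, 16, 16],
    [0, 0, 0, 0, 12, 15],
    [0, 0, 0, 0, 0, 14],
    [0, 0, 0, 0, 0, 0],
]


def sim(i, j):
    """Symmetric lookup into the upper triangle."""
    return upper[min(i, j)][max(i, j)]


def average_similarity(a, b):
    """Mean pairwise similarity between two clusters (tuples of indices)."""
    return sum(sim(i, j) for i in a for j in b) / (len(a) * len(b))


def group_average_clustering(n):
    """Repeatedly merge the two clusters with the highest average similarity."""
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(*pair))
        merges.append((a, b, average_similarity(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = group_average_clustering(len(names))
for a, b, s in merges:
    print([names[i] for i in a], [names[i] for i in b], round(s, 1))
```

With these scores the first merge joins 1atp and 1cz7 (score 47), and 2phk then joins that pair.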
10
The Nucleotide Mono-Di-Tri Phosphate
3726 nucleotide phosphate environments
AMP 108 ADP 450 etc.

CL
GH
UPGMA, Mojena's rule
0  28  21  10  11   9
0   0  37  13  16  18
0   0   0  14  16  16
0   0   0   0  12  15
0   0   0   0   0  14
0   0   0   0   0   0
CL
K     Clusters
2.0   136
2.5   48
3.0   29
4.0   11
5.0   6
6.0   4
9.0   2
K = 2.0
476 representatives
GH
11
Filtering the noise from clusters
[Figure: two panels for the FL step]
12
Adenine-X-P Clustering
[Dendrogram: cut levels K = 12.5 and K = 7.0; leaf order 1 7 9 5 8 12 3 10 6 4 2 11]
Annotate the results by and
patterns
13
The Structural P-loop (Nucleotide Mono-Di-Tri Phosphate)
This is the first coherent cluster (K = 6.0, there are 4 clusters)
  • Cluster has 32 separate group representatives (from 476)
  • Ligand: ATP > ADP > GDP > GTP
CATH/SCOP fold (domain)
  • 39 P-loop nucleotide triphosphate hydrolases
  • 2 Transducin (alpha subunit)
  • 1 Rubredoxin-like
  • 1 Phosphoenolpyruvate carboxykinase
  • 1 left-handed superhelix
PROSITE
  • 30 PS0017 - consensus pattern G-K-T/S
  • 1 PS0692
  • 1 PS0411
  • 1 PS0113
[Figure labels: G, K, T/S]
14
The FAD-binding domain
Cluster (K = 3.5, there are 14 clusters)
  • Cluster has 13 separate group representatives (from 476)
  • Ligand: FAD (19) >> ADP (2)
PROSITE
  • 2 PS0435 Peroxidase_1 - false positives! (Proximal heme-ligand signature) [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY]
CATH/SCOP fold (domain)
  • 11 FAD-binding domain
  • 5 Ferredoxin-like (marginal participation with only one amino acid residue)
  • 2 domains not classified in SCOP/CATH
[Figure labels: G, G, G]
15
Overview
Empirical tuning
GH
CL
FL
?
  • Structural classification of binding sites, e.g. 3,700 local nucleotide phosphate environments
  • Generate binding site templates
  • Ability to infer function of new proteins from Structural Genomics
16
Defining binding site clefts and cavities
using SURFNET (Laskowski, UCL)
The ligand is bound in the largest cleft in over 83% of proteins (Laskowski et al., 1996).
Thermolysin main binding site cavity (transparent blue surface) with 3 inhibitors. PDB codes: 1tmn (green), 1tlp (red) and 4tmn (blue)
17
Defining binding site clefts and cavities
  • Study of all ATP binding sites in the PDB
  • Average No. of protein atoms defined by contacts (<5 Å): 94 atoms
  • Average No. of protein atoms defined by largest binding site cavity: 755 atoms
  • Average percentage of shared atoms: 97%
  • Which SURFNET cleft did the binding site lie (mostly) in?
      Cleft:        1    2    3    ≥4
      % of sites:   89   8    2    1

Ligand
parameter dependent
18
For a new protein how long to search all known
binding sites in the PDB ?
  • There is a considerable advantage to first comparing and clustering known binding sites.
  • Based on the nucleotide mono-di-tri-phosphates, comparison and clustering reduces 3700 sites to 470 tight clusters.
  • Rough approximation: there are 6000 clusters representing all ligand functional groups in the PDB
  • For one new protein, which may have ten binding clefts of 90 atoms (at 0.1 seconds per comparison):
  • 1 hour 40 minutes per CPU (or 10 minutes on a 10xCPU cluster).
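The search-time estimate on this slide is the product of the number of cluster representatives, the number of clefts on the new protein, and the time per comparison. A minimal sketch, assuming perfect scaling across CPUs and using a hypothetical function name:

```python
def search_time_seconds(n_clusters, n_clefts, sec_per_comparison):
    """Total time to compare every cleft against every cluster representative."""
    return n_clusters * n_clefts * sec_per_comparison


total = search_time_seconds(6000, 10, 0.1)      # figures from the slide
hours, minutes = divmod(round(total / 60), 60)
print(hours, "h", minutes, "min on one CPU")    # 1 h 40 min
print(round(total / 60 / 10), "min on 10 CPUs") # 10 min
```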

19
Prediction and analysis of Protein Interactions
Richard M. Jackson
  • Finding atomic-level similarity between ligand binding sites
  • Mammalian signal transduction - of interest in a large number of therapeutic areas (V-src, c-src, fyn, p56-lck, Syk, Grb2, ZAP-70 etc.): molecular modelling of SH2 domain-peptide interactions
  • Protein-ligand docking: Q-fit, a probabilistic method for ligand docking by sampling low energy conformational space
20
SH2 Domain as a Drug Target
  • Ligand must selectively bind to SH2 domain,
    preventing association with the phosphorylated
    proteins in vivo.
  • Many potential applications, including
  • Src - regulating bone resorption,
  • Grb2 - component of ras pathway
  • Zap70 - immune suppression
  • Many structures available.

Above: AP22408 bound to Lck
A four to six residue ligand can span the pY and pY+3 binding pockets.
Recent successes:
  • Nonpeptide inhibitor AP22408 of Src with antiresorptive activity and bone-targeting properties (Shakespeare et al)
  • Low molecular weight, membrane-permeable, phosphatase-resistant Grb2 inhibitors (Novartis)
21
  • SH2 Domain Substrate Specificity
  • Ladbury and Arold: domains show high sequence and structural homology.
  • Difference between a specific and non-specific interaction is less than two orders of magnitude in affinity - sufficient for mutual exclusivity?
  • Little to distinguish between domains in terms of charge and polarity
  • Could lack of specificity lead to potential drugs exhibiting undesired effects?

Figure: four SH2 domains. Nonpolar (green), polar (grey), charged (positive blue, negative red)
22
  • SH2 Domain Substrate Specificity
  • Songyang et al.
  • Phosphopeptide library (containing randomised 8 amino-acid pTyr-containing peptides) used to determine sequence specificity of SH2 binding sites.
  • Variability in sequences provides a structural basis for selectivity
  • Group 1: select pTyr-hydrophilic-hydrophilic-Ile/Pro. Tyr at βD5
  • Group 1a: specifically select pTyr-Glu-Glu-Ile
  • Group 1b: aromatic at βD5. Other contacting residues differ from 1a
  • Group 3: select pTyr-hydrophobic-X-hydrophobic. Ile/Cys at βD5
  • Group 4: distinct residues at βD5

23
SH2 Domain multiple sequence alignment
  • Ligand contacting residues are shown in red.
  • The defined binding site is highlighted in yellow (17 positions), i.e. positions at which at least 50% of the domains have at least 1 contact with ligand

[Figure: multiple sequence alignment; visible fragment FLVVRSETES .TK]
24
Clustering of binding site amino acid identity
scores
Phylogenetic Tree (UPGMA method)
Cluster according to the pairwise sequence identity scores for the 17 amino acid positions defined as the binding site
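The binding-site definition and identity score used for this clustering can be sketched as follows. The sequences and contact flags below are toy data made up for illustration (they are not the SH2 alignment from the talk), and the function names are hypothetical.

```python
def binding_site_positions(contacts, threshold=0.5):
    """Alignment columns where at least `threshold` of the domains contact the ligand.

    contacts: one list of 0/1 flags per domain, aligned column by column.
    """
    n_domains = len(contacts)
    n_cols = len(contacts[0])
    return [c for c in range(n_cols)
            if sum(row[c] for row in contacts) / n_domains >= threshold]


def site_identity(seq_a, seq_b, positions):
    """Fraction of binding-site positions with identical (non-gap) residues."""
    same = sum(1 for p in positions
               if seq_a[p] == seq_b[p] and seq_a[p] != "-")
    return same / len(positions)


# Toy alignment of three hypothetical domains, six columns each.
seqs = {"A": "FLVRES", "B": "FLIRES", "C": "YLVRQS"}
contacts = [
    [1, 0, 1, 1, 0, 0],   # domain A
    [1, 0, 1, 0, 0, 0],   # domain B
    [0, 0, 1, 1, 1, 0],   # domain C
]

site = binding_site_positions(contacts)          # columns 0, 2 and 3
print(site)
print(round(site_identity(seqs["A"], seqs["B"], site), 2))
```

One minus the identity score could then serve as the distance fed to UPGMA, and lowering `threshold` to 0.33 would reproduce the looser site definition mentioned on this slide.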
Group B
Group A
Increasing the binding site definition (to positions where >33% of residues are in contact with ligand) does not significantly alter the groups.
Group E
Group D
Group C
25
Scorecons and PdbConlabel (Valdar and Thornton, 2000)
26
Results - Binding pocket conserved across domains
All Structures
Conservation patch dependent on SH2 domains included
Plc, VSrc, Syk NT, Cbl
Plc, Abl, CSrc, Grb2
27
Conservation patch between groups of Songyang
et al.
Group 1A
Group 1B
All Structures
Group 3
Group 4
28
Songyang et al.
Current Classification
- nck 1 - nck 2
Group 1A
Group A
- abl
Group 1B
Group B
29
Songyang et al.
Current Classification
Group C
Group 3
Group D
Group 4
Group E
?
30
Comparison of conserved region
Group 3: P85, PLC, Shp, Csw
Exploitable diversity?
Change in conservation: blue - gain in conservation score; red - loss in conservation score
PLC only
31
Prediction and analysis of Protein Interactions
Richard M. Jackson
  • Finding atomic-level similarity between ligand binding sites
  • Mammalian signal transduction - of interest in a large number of therapeutic areas (V-src, c-src, fyn, p56-lck, Syk, Grb2, ZAP-70 etc.): molecular modelling of SH2 domain-peptide interactions
  • Protein-ligand docking: Q-fit, a probabilistic method for ligand docking by sampling low energy conformational space
32
Structure-based drug design
A structure is required (X-ray/NMR)
Docking: lead generation/optimisation - by design
Dissociation constant < 10 mM
Verification by X-ray/NMR is essential
33
Q-fit
a probabilistic method for ligand docking by sampling low energy conformational space (R.M. Jackson (2002) J. Computer-Aided Mol. Design, 16, p43.)
  • Pre-calculate atomic preferences on a 3D grid
  • Uses Molecular Mechanics force field (Peter Goodford's GRID)
  • Grid points are ranked (by energy and subject to distance constraints) for each atom type present in the ligand
  • Calculate probability P(i) using the Boltzmann principle:
    $P(i) = \frac{e^{-E(i)/RT}}{\sum_{j} e^{-E(j)/RT}}$
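The Boltzmann weighting of grid points can be sketched as below. The gas constant is in kcal/(mol K) to match the energy units used later in the talk. The temperature of 298 K and the function name are assumptions made for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def boltzmann_probabilities(energies, temperature=298.0):
    """P(i) = exp(-E(i)/RT) / sum_j exp(-E(j)/RT) for a list of energies in kcal/mol."""
    rt = R_KCAL * temperature
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


# Toy grid-point energies (kcal/mol); lower energy gives higher probability.
probs = boltzmann_probabilities([-3.0, -2.5, -1.0])
print([round(p, 3) for p in probs])
```

Subtracting the minimum energy before exponentiating leaves the probabilities unchanged because the common factor cancels in the ratio.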

34
A. Rank receptor interactions - subject to distance criteria
   Rank   NH2    CH
   1      2056   1345
   2      2211   1316
   3      2175   1362
B. Rank receptor triplets
   1      2056   2211   1345
   2      2056   2211   1316
C. Find associated ligand triplets
D. Try matching ligand triplets to receptor triplets
[Figure panels: Ligand, Receptor]
35
  • Geometric Hashing
  • involves histogramming (or binning) the atom-atom distances for each possible triplet.
  • The 3 integer distances are used as an address
    to the hash table that contains the identity of
    atom triplets.
  • The receptor triplets are sorted by energy.
  • Matching proceeds taking the top scoring
    receptor triplet and uses the distances to
    compute the ligand hash table address that could
    contain a match.
  • Hash table needs to be built at run time.
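A rough sketch of the hash table described above: the three binned distances of each ligand triplet form the key, and a receptor triplet is looked up by computing the same key from its own distances. The bin width and function names are assumptions. Sorting the sides, ignoring atom types, and checking only the exact bin (not neighbouring bins) are simplifications for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def hash_key(coords, triplet, bin_width=1.0):
    """Three integer (binned) side lengths, sorted so the key is order-independent."""
    i, j, k = triplet
    sides = sorted((dist(coords[i], coords[j]),
                    dist(coords[j], coords[k]),
                    dist(coords[i], coords[k])))
    return tuple(int(s // bin_width) for s in sides)


def build_hash_table(coords, bin_width=1.0):
    """Map each key to the list of ligand atom triplets that produce it."""
    table = defaultdict(list)
    for t in combinations(range(len(coords)), 3):
        table[hash_key(coords, t, bin_width)].append(t)
    return table


def lookup(table, receptor_coords, receptor_triplet, bin_width=1.0):
    """Ligand triplets stored at the address computed from a receptor triplet."""
    return table.get(hash_key(receptor_coords, receptor_triplet, bin_width), [])


# Toy ligand, and a receptor site with the same geometry translated by 10 Å.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0), (0.0, 0.0, 3.5)]
receptor = [(x + 10.0, y + 10.0, z + 10.0) for x, y, z in ligand]

table = build_hash_table(ligand)
print(lookup(table, receptor, (0, 1, 2)))
```

In a docking run the receptor triplets would be visited in energy order, so the best-scoring triplets are matched first.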

36
Rigid-body minimization
Q-fit
  • Follows initial fragment placement
  • Percentage of atom bumps must be < 33% or the solution is rejected

Downhill Simplex minimization (Nelder and Mead, 1965) needs only function evaluations, not derivatives, so the method is very fast
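To show why no derivatives are needed, here is a compact downhill simplex sketch in plain Python. It is a simplified textbook variant (reflection, expansion, inside contraction, shrink), not the routine used in Q-fit. The quadratic `toy_energy` stands in for a docking energy, and all names are hypothetical. A rigid-body search would use six parameters (three translations, three rotations) rather than the two used here.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=500):
    """Minimise f from x0 using only function evaluations (no derivatives)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)

    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def along(t):
            """Point on the line from the worst vertex through the centroid."""
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [[(b + x) / 2 for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)


def toy_energy(x):
    """Hypothetical smooth energy surface with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


solution = nelder_mead(toy_energy, [0.0, 0.0])
print([round(v, 3) for v in solution])
```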
37
Purine-nucleoside phosphorylase / guanine
Q-fit
Rank   RMSD (Å)   Energy (kcal/mol)
1      0.336      -28.52
2      0.542      -28.28
3      0.451      -28.11
4      0.677      -28.01
5      0.147      -28.00
6      0.515      -27.58
7      0.733      -27.51
8      0.415      -27.41
9      0.884      -27.27
10     0.633      -27.05
Top five structures (including X-ray in red)
Elapsed time on Linux workstation (600 MHz): 15 sec (Geometric Hashing algorithm)
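The RMSD column and the "within 2 Å" success criterion used in the benchmarks can be sketched as follows. The `rmsd` helper assumes both coordinate lists are in the same atom order and already in the same frame (no refitting), and the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def count_within(rmsds, cutoff=2.0):
    """Number of solutions whose RMSD to the X-ray pose is below the cutoff (Å)."""
    return sum(1 for r in rmsds if r < cutoff)


# RMSD values of the top five ranked solutions from the table on this slide.
top5 = [0.336, 0.542, 0.451, 0.677, 0.147]
print(count_within(top5))  # 5
```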
38
Validation Bench-marking
Q-fit
9 complexes commonly used for validation (e.g. DOCK, FlexX, GOLD, PRO_LEADS) - using the Geometric Hashing algorithm
RMSD is for the lowest-energy solution; "Top 5" is the number of the top 5 solutions within 2 Å.
PDB    Heavy   Receptor / Ligand                              RMSD   Top 5   Time
code   atoms                                                  (Å)    < 2 Å   (sec.)
1abe   10      arabinose binding protein / α-L-arabinose      0.47   5       13
3ptb   9       trypsin / benzamidine                          1.47   5       6
2phh   10      phb hydroxylase / p-hydroxybenzoate            0.63   4       10
1ulb   11      purine-nucleoside phosphorylase / guanine      0.34   5       15
2gbp   12      periplasmic binding protein / D-glucose        0.41   5       6
4dfr   13      dihydrofolate reductase / pteridine            0.56   5       9
1ldm   6       L-lactate dehydrogenase / oxamate              2.41   3       6
3tpi   16      trypsinogen-PTI / Ile-Val                      1.06   5       21
1stp   10      streptavidin / biotin                          1.05   5       15
39
3ptb: trypsin / benzamidine
2phh: p-hydroxybenzoate hydroxylase / p-hydroxybenzoate
3tpi: trypsinogen-PTI / Ile-Val
1abe: arabinose binding protein / α-L-arabinose
40
Validation Bench-marking
Q-fit
9 problem complexes used in validating GOLD
RMSD is for the lowest-energy solution; "Top 5" is the number of the top 5 solutions within 2 Å.
PDB    Heavy   Receptor / Ligand                                   RMSD   Top 5   Time
code   atoms                                                       (Å)    < 2 Å   (sec.)
6rsa   20      ribonuclease A / uridine vanadate                   1.19   4       17
1ack   8       acetylcholinesterase / edrophonium                  0.65   3       11
1tdb   16      thymidylate synthase / UMP                          1.53   5       14
1acj   15      acetylcholinesterase / tacrine                      0.66   4       10
2ak3   18      adenylate kinase / AMP                              7.52   0       13
1baf   16      Fab fragment AN02 / dinitrophenyl hapten            4.27   0       12
2mth   11      insulin / methylparaben                             4.81   0       12
1mup   9       major urinary protein / 2-(sec-butyl) thiazoline    3.03   0       7
4fab   16      Fab fragment 4-4-20 / fluorescein (dianion)         5.28   0       6
41
Fab fragment AN02 / dinitrophenyl hapten
Q-fit

X-ray in red
42
Hydrogen bonding function
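$E_{hb} = \left[ \frac{C_{ij}}{d^{8}} - \frac{D_{ij}}{d^{6}} \right] \cos^{m}\theta$
GRID: angle donor-hydrogen-probe $\theta$; if $\theta < 90°$ then $E_{hb} = 0.0$
New addition: lone pair interaction treated in the same way. Angle acceptor-lone pair-probe $\theta$; if $\theta < 90°$ then $E_{hb} = 0.0$
[Figure: angle θ illustrated for lysine and glutamic acid]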
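The hydrogen bonding function on this slide can be written as a small function. The parameters `c_ij` and `d_ij` and the exponent `m` below are placeholders rather than GRID force field values, and an even exponent is assumed so that a linear arrangement (angle of 180 degrees) gives the full radial term.

```python
import math


def hbond_energy(d, theta_deg, c_ij, d_ij, m=4):
    """E_hb = (C_ij/d^8 - D_ij/d^6) * cos^m(theta), set to 0.0 when theta < 90 degrees.

    d is the distance to the probe, theta_deg the donor-hydrogen-probe
    (or acceptor-lone pair-probe) angle in degrees.
    """
    if theta_deg < 90.0:
        return 0.0
    radial = c_ij / d ** 8 - d_ij / d ** 6
    return radial * math.cos(math.radians(theta_deg)) ** m


# Placeholder parameters, for illustration only.
C_IJ, D_IJ = 3000.0, 1500.0
print(hbond_energy(2.8, 180.0, C_IJ, D_IJ))  # full radial term for a linear geometry
print(hbond_energy(2.8, 60.0, C_IJ, D_IJ))   # 0.0: angle below the cutoff
```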
43
Some Remaining Questions
  • Force Field
  • Currently uses the GRID force field with good results
  • Importance of hydrogen bonding in the correct placement of fragments (for many drugs the predominant force is hydrophobic)
  • X-SITE (Roman Laskowski, EBI) and BLEEP (John Mitchell, Cambridge): empirically based methods for predicting favourable interaction sites for given atom types (based on directional atomic contacts observed in high-resolution protein structures in the PDB)

Q-fit
44
Comparing scoring methods (Jaz Sodhi)
Q-fit, X-site and BLEEP
  • % of top 5 solutions correctly predicted (RMSD < 2 Å)
  • Average RMSD of the top 5 solutions
45
  • Docking in Structure-based drug design
  • Screen a database of small molecule compounds
    (will require automated parameter assignments)
  • e.g. Available Chemicals Directory (MDL) contains
    70,000 compounds
  • At 1 compound per minute it would take 12 days
    on 4x processor SGI Origin
  • or
  • Representative set of pharmacophores
  • e.g. Diversity set (Developmental Therapeutics
    Program at NCI / NIH)
  • Diversity set: 1990 compounds (reduction from 71,756 compounds)
  • At 1 compound per minute it would take 8½ hours on a 4x processor SGI Origin
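The screening times above can be reproduced approximately if "1 compound per minute" is read as a per-processor rate with perfect scaling over the four processors. That reading is an assumption, as is the function name.

```python
def screening_time_minutes(n_compounds, minutes_per_compound=1.0, n_processors=4):
    """Wall-clock minutes to dock n_compounds, assuming perfect parallel scaling."""
    return n_compounds * minutes_per_compound / n_processors


acd_days = screening_time_minutes(70_000) / (60 * 24)   # Available Chemicals Directory
diversity_hours = screening_time_minutes(1990) / 60     # NCI diversity set

print(round(acd_days, 1), "days")         # about 12 days
print(round(diversity_hours, 1), "hours") # a little over 8 hours
```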

46
Docking of individual (diversity) fragments
1. Link the fragments to create a de novo lead
2. Use a fragment to search a small molecule database for compounds containing the fragment
[Figure: docked fragments bearing CH3, NH3 and Br groups; small molecule database entries 1, 2, 3]
47
Acknowledgements
University College London: Janet Thornton, Andreas Brakoulias, Jaz Sodhi
University of Leeds: Stephen Campbell
Funding: The Wellcome Trust, BBSRC