Efficient Nearest Neighbor Searching for Motion Planning - PowerPoint PPT Presentation

About This Presentation
Title:

Efficient Nearest Neighbor Searching for Motion Planning

Description:

Automated High-Resolution Protein Structure Determination using Residual Dipolar Couplings Anna Yershova Department of Computer Science Duke University – PowerPoint PPT presentation

Slides: 47
Provided by: 1495
Learn more at: http://msl.cs.uiuc.edu

Transcript and Presenter's Notes

Title: Efficient Nearest Neighbor Searching for Motion Planning


1
Automated Protein Structure Determination using RDCs
Anna Yershova, Department of Computer Science, Duke University
February 5, 2010, NC State University
2
Introduction
Motivation
Protein Structure Determination is Important
Amino acid sequences
Structures
Functions
Protein redesign
  • High-resolution structures are needed for
  • Determining protein functions
  • Protein redesign

2
3
Introduction
Motivation
What Is Protein Structure? Primary Structure
The sequence of amino acids forms the backbone. Residues are side chains attached to the backbone.
3
Dihedral angle
Side chain
Amino acid
4
Introduction
Motivation
What Is Protein Structure? Secondary Structure Elements
Local folding is maintained by short-distance interactions.
4
5
Introduction
Motivation
What Is Protein Structure? 3D Fold
Global 3D folding is maintained by more distant interactions.
Alpha-helix
Side chain
Loop
Beta-strands
5
6
Introduction
Motivation
High-Throughput Structure Determination Is Important
The gap between sequences and structures
http://www.metabolomics.ca/News/lectures/CPI2008-short.pdf
6
7
Introduction
Motivation
Current Approaches for Structure Determination
  • X-ray crystallography. Difficulty: growing good-quality crystals
  • Nuclear Magnetic Resonance (NMR) spectroscopy. Difficulty: lengthy
    (expensive) processing and analysis of experimental data

Both require expressing and purifying proteins.
7
8
Introduction
Motivation
Bruce Donald's Lab
Michael Zeng Chittu Tripathy
Lincong Wang
Pei Zhou
Bruce Donald
Cheng-Yu Chen John MacMaster
8
9
Introduction
Motivation
Types of NMR Spectroscopy Data
[Figure: residue diagram with labels R, Ha, NOE, and B0 and the values 4.2, 8.9, 133.1, and 172.1]
  • Chemical shift (CS): unique resonance frequency, serves as an ID
  • Nuclear Overhauser effect (NOE): local distance restraint between two
    protons
  • Residual dipolar coupling (RDC): global orientational restraint for
    bond vectors

9
10
Introduction
Motivation
Resonance Assignment Problem
Assigning chemical shifts to each atom
10
Bailey-Kellogg et al., 2000, 2004
http://www.pnas.org/content/102/52/18890/suppl/DC1
11
Introduction
Motivation
NOE Assignment Problem
Obtain local distance restraints between protons
A famous bottleneck
11
Bailey-Kellogg et al., 2000, 2004
12
Introduction
Motivation
Structure Determination from NOEs
NOESY spectrum
Resonance assignments
NOE assignment
Assignment
Ambiguity
Distance Geometry
NP-Hard
Saxe '79; Hendrickson '92, '95
12
13
Protein Structure Determination is Hard
Introduction
Motivation
Traditional Structure Determination Protocol
A famous bottleneck
13
14
Protein Structure Determination is Hard
Introduction
Motivation
Traditional Structure Determination Protocol
error propagation
local minima
manual intervention for initial fold and for
evaluation of NOE assignments
A famous bottleneck
Can we have a polynomial-time algorithm using orientational restraints?
Yes: Wang and Donald, 2004; Wang et al., 2006
14
15
Introduction
Motivation
Types of NMR Spectroscopy Data
[Figure: residue diagram with labels R, Ha, NOE, and B0 and the values 4.2, 8.9, 133.1, and 172.1]
  • Chemical shift (CS): unique resonance frequency, serves as an ID
  • Nuclear Overhauser effect (NOE): local distance restraint between two
    protons
  • Residual dipolar coupling (RDC): global orientational restraint for
    bond vectors

15
16
Background
RDCs
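RDC Equation for a Single Bond
[Figure: a bond vector v between atoms a and b in an alignment medium, shown relative to the magnetic field B0]
S: Saupe matrix. S is traceless and symmetric, so S contains 5 degrees of freedom.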
16
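The single-bond relation can be illustrated with a minimal sketch. It assumes the standard form r = D_max · vᵀSv for a unit bond vector v; the slide's own equation image did not survive in the transcript, so the names `saupe_matrix`, `rdc`, and `d_max`, and the choice of (Syy, Szz, Sxy, Sxz, Syz) as the five free parameters, are illustrative assumptions rather than the presenter's notation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def saupe_matrix(syy: float, szz: float, sxy: float, sxz: float, syz: float) -> Matrix:
    """Build a symmetric, traceless 3x3 matrix from 5 free parameters (assumed parameterization)."""
    sxx = -(syy + szz)  # tracelessness fixes the third diagonal entry
    return [
        [sxx, sxy, sxz],
        [sxy, syy, syz],
        [sxz, syz, szz],
    ]


def rdc(d_max: float, s: Matrix, v: Sequence[float]) -> float:
    """Back-compute r = d_max * v^T S v; v is normalized to unit length first."""
    norm = math.sqrt(sum(c * c for c in v))
    u = [c / norm for c in v]
    return d_max * sum(u[i] * s[i][j] * u[j] for i in range(3) for j in range(3))
```

For a bond vector along the z axis the quadratic form reduces to Szz, so `rdc(d_max, S, (0, 0, 1))` is simply `d_max * Szz`.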
17
Protein Structure Determination is Hard
Introduction
Motivation
Traditional Structure Determination VS RDC-Panda
RDC-PANDA Protocol
Constaint number of NOEs
RDCs
error propagation
RDC-ANALYTIC PACKER
local minima
Global Fold
manual intervention for initial fold and for
evaluation of NOE assignments
Sidechain Placement
NOE Assignments
XPLOR-NIH
NOE Assignments 3D Structures
17
Zeng et al. (Jour. Biomolecular NMR, 2009)
18
Introduction
Motivation
Importance of Backbone Structure Determination
Global orientational restraints from RDCs
Sparse data (high-throughput, large proteins, membrane proteins)
Compute initial fold using exact solutions to
RDC equations
Avoid the NP-Hard problem of structure
determination from NOEs
Resolve NOE assignment ambiguity
Automated side-chain resonance assignment
18
19
Introduction
Motivation
Current Limitations of RDC-Panda
  • Because it requires only 2 RDCs per residue:
  • Only SSEs can be reliably determined; NOEs are needed to determine
    the structure of loops
  • Missing data is difficult to handle

19
20
Introduction
Motivation
My Current Project
  • Improve current protein structure determination
    techniques from our lab
  • Design new algorithms for protein backbone
    structure determination using orientational
    restraints from RDCs

20
21
Introduction
Motivation
Literature Overview
  • Distance geometry-based structure determination: Braun, 1987;
    Crippen and Havel, 1988; Moré and Wu, 1999
  • Heuristic-based structure determination: Brünger, 1992;
    Nilges et al., 1997; Güntert, 2003; Rieping et al., 2005
  • RDC-based structure determination: Tolman et al., 1995;
    Tjandra and Bax, 1997; Hus et al., 2001; Tian et al., 2001;
    Prestegard et al., 2004; Wang and Donald (CSB 2004);
    Wang and Donald (Jour. Biomolecular NMR, 2004);
    Wang, Mettu and Donald (JCB 2005);
    Donald and Martin (Progress in NMR Spectroscopy, 2009)
  • Heuristic-based automated NOE assignment: Mumenthaler et al., 1997;
    Nilges et al., 1997, 2003; Herrmann et al., 2002;
    Schwieters et al., 2003; Kuszewski et al., 2004; Huang et al., 2006
  • Automated NOE assignment starting with initial fold computed from
    RDCs: Wang and Donald (CSB 2005); Zeng et al. (CSB 2008);
    Zeng et al. (Jour. Biomolecular NMR, 2009)
  • Automated side-chain resonance assignment: Li and Sanctuary, 1996,
    1997; Marin et al., 2004; Masse et al., 2006;
    Zeng et al. (In submission, 2009)

21
22
Background
RDCs
RDC Equation for a Single Bond
Linear in S: a fixed v defines a hyperplane.
Quadratic in v: a fixed S defines a hyperboloid.
22
23
Background
RDCs
RDC Equation for a Single Bond
One RDC equation defines a collection of hyperplanes and involves 7 variables.
Linear in S: a fixed v defines a hyperplane.
Quadratic in v: a fixed S defines a hyperboloid.
23
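As a worked expansion of the variable count, assume the standard single-bond form r = D_max vᵀSv with a unit bond vector v = (x, y, z). The symbols D_max, x, y, and z are assumptions introduced here, since the slide's equation image is not in the transcript.

```latex
\[
\frac{r}{D_{\max}} = \mathbf{v}^{T} S\,\mathbf{v}
= S_{xx}x^{2} + S_{yy}y^{2} + S_{zz}z^{2}
  + 2S_{xy}xy + 2S_{xz}xz + 2S_{yz}yz,
\qquad x^{2} + y^{2} + z^{2} = 1 .
\]
\[
S_{xx} = -(S_{yy} + S_{zz})
\;\Longrightarrow\;
\frac{r}{D_{\max}}
= S_{yy}\,(y^{2} - x^{2}) + S_{zz}\,(z^{2} - x^{2})
  + 2S_{xy}xy + 2S_{xz}xz + 2S_{yz}yz .
\]
```

With v fixed, the right-hand side is linear in the five independent entries of S. With S fixed, it is quadratic in (x, y, z). The five entries of S plus the two degrees of freedom of a unit vector give the 7 variables.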
24
Background
RDCs
RDC Equations for a Protein Portion
24
25
Background
RDCs
RDC Equations for a Protein Portion
[Figure: protein portion with labels 1 to 4 and bond vectors u1, v1, v2]
[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223-242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol. NMR, Epub ahead of print, PMID: 19711185, 2009.
Too few equations, too many variables!
25
26
Background
RDCs
Forward Kinematics Reduces the Number of Variables
v1
Fix coordinate system.
v2
u1
26
27
Background
RDCs
RDC Equations for a Protein Portion
v1
v2
u1
27
28
Background
RDCs
RDC Equations for a Protein Portion
Recursive representation is possible!
28
29
Background
RDCs
One Equation Per Dihedral Angle is Not Enough!
  • Each equation is linear in S, and quartic in either tan(φ) or tan(ψ)
  • To solve this system, additional information is needed
  • Possible sources of additional information:
  • Additional RDC measurement(s) for each dihedral angle
  • Additional alignment media
  • Additional NOE data
  • Modeling (Ramachandran regions, steric clashes, energy function)
  • Sampling (for alignment tensors)

29
30
Background
RDC-Panda
The RDC-PANDA Structure Determination Package
  • Current requirements
  • 2 RDCs per residue to obtain SSE structures
  • Sparse NOEs to pack the SSEs
  • Current bottlenecks
  • Missing data (even in long SSEs)
  • Long loops
  • Sampling for computing alignment tensor(s)
  • Sampling for the orientation of the first peptide plane (pp)

[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223-242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol. NMR, Epub ahead of print, PMID: 19711185, 2009.
30
31
Background
RDC-Panda
When the Saupe Matrix Is Known, the Solution Can Be Found Exactly!
Ellipse equations for the CH bond vector
Wang and Donald, 2004; Donald and Martin, 2009.
32
Solution Structure of FF Domain 2 of human
transcription elongation factor CA150 (FF2) using
RDC-PANDA
Background
RDC-Panda
Solution Structure Deposited Using RDC-Panda
PDB ID: 2KIQ
In collaboration with Dr. Zhou's Lab
32
33
Current Project
Problem Formulation: NH, CH RDCs in 2 Media
We require measurements for at least 9
consecutive bond vectors (4.5 residues) in 2
media. The goal is to handle more equations and
errors.
33
34
Current Project
Relationship to Minimization
34
35
Current Project
Relationship to Minimization and SVD
Solving an overconstrained system of linear equations is equivalent to
finding the projection of the vector b onto the hyperplane spanned by the
columns of A. This is also equivalent to minimizing the least-squares
function of the residual terms.
35
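The least-squares view can be sketched for the case where the bond vectors are already known and only the Saupe parameters are unknown. This is a dependency-free illustration, not the presenter's implementation: it solves the normal equations by Gaussian elimination instead of the SVD named on the slide (an SVD-based routine such as `numpy.linalg.lstsq` would be the more usual choice). The names `unit`, `design_row`, `solve_linear`, and `fit_saupe`, and the parameter order (Syy, Szz, Sxy, Sxz, Syz), are assumptions.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def design_row(v: Sequence[float]) -> List[float]:
    """Coefficients of (Syy, Szz, Sxy, Sxz, Syz) in v^T S v for a unit vector v."""
    x, y, z = v
    return [y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: bond vectors do not determine S")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_saupe(vectors: Sequence[Sequence[float]], rdcs: Sequence[float],
              d_max: float = 1.0) -> List[float]:
    """Least-squares estimate of the 5 Saupe parameters from unit bond vectors and RDCs."""
    rows = [design_row(v) for v in vectors]
    rhs = [r / d_max for r in rdcs]
    k = 5
    # Normal equations (A^T A) s = A^T b; their solution projects b onto the columns of A.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * b for row, b in zip(rows, rhs)) for i in range(k)]
    return solve_linear(ata, atb)
```

At least five bond vectors in sufficiently different directions are needed. With more than five, the fit averages out measurement error, which is the sense in which the system is overconstrained.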
36
Current Project
Relationship to Minimization
36
37
Current Project
Relationship to Minimization and SVD
A(φi, ψi) s = b
Solving such a system of non-linear equations is
not trivial! There are multiple local minima in
the corresponding minimization problem.
37
38
Current Project
Advantages
  • If the minimization problem is solved, then:
  • Packed SSEs and loops can be computed without additional NOE data.
  • Saupe matrices for each alignment medium can be computed without
    sampling.
  • Robust handling of missing values

38
39
Current Project
The Algorithm: Initialization Using Helix
Initialize (φi, ψi) for a helix
Compute initial approximation for Si using SVD
Compute (φi, ψi) using tree search and minimization
Update Si using SVD
39
40
Current Project
The Algorithm: Protein Portion
Initialize Si to computed approximations
Compute (φi, ψi) using tree search and minimization
Update Si using SVD
40
41
Current Project
The Algorithm: Computing Dihedrals
[Figure: search tree over the dihedral angles φ1, ψ1, ..., φn, ψn, with some branches marked x]
Minimize each of the RMSD terms as a univariate function.
Iteratively minimize the RMSD function.
Compute the list of best solutions.
41
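The idea of minimizing one angle at a time can be sketched as a generic cyclic coordinate descent. This is only an illustration under stated assumptions: the objective is an arbitrary caller-supplied function standing in for the RMSD fit, each univariate minimization is a plain grid search rather than a root-finding step, and a single solution is returned instead of a list of best solutions. The names `minimize_univariate` and `coordinate_descent` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def minimize_univariate(f: Callable[[float], float], lo: float, hi: float,
                        samples: int = 360) -> float:
    """Return the grid point in [lo, hi) where f is smallest."""
    step = (hi - lo) / samples
    best_x, best_val = lo, f(lo)
    for k in range(1, samples):
        x = lo + k * step
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x


def coordinate_descent(objective: Callable[[List[float]], float],
                       start: Sequence[float], sweeps: int = 20,
                       tol: float = 1e-9) -> Tuple[List[float], float]:
    """Minimize over one angle at a time; the objective value never increases."""
    angles = list(start)
    current = objective(angles)
    for _ in range(sweeps):
        previous = current
        for i in range(len(angles)):
            def along_axis(x: float, i: int = i) -> float:
                # Objective as a function of angle i only, other angles held fixed.
                return objective(angles[:i] + [x] + angles[i + 1:])

            candidate = minimize_univariate(along_axis, -math.pi, math.pi)
            value = along_axis(candidate)
            if value < current:  # accept only strict improvements
                angles[i] = candidate
                current = value
        if previous - current < tol:
            break
    return angles, current
```

Because an update is accepted only when it lowers the objective, the sequence of objective values is non-increasing. Like the problem described on the slides, the result can still be a local minimum, so several starting points would normally be tried.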
42
Current Project
Advantages
  • The algorithm converges, since no step increases the RMSD function,
    which is bounded below by zero
  • If the data were perfect, the solution to the minimization problem
    would be the roots of the polynomials in the RMSD terms, and the
    algorithm would find ALL of them.
  • The minima of the RMSD terms give a good
    collection of initial structures for finding
    local and global minima
  • Robust handling of missing values

42
43
Preliminary Results
Preliminary Results: Ubiquitin Helix
Conformation of the portion 25-31 of the helix for human ubiquitin,
computed using NH and CH RDCs in two media (red), superimposed on the
same portion from the high-resolution X-ray structure (PDB ID 1UBQ)
(green). The backbone RMSD is 0.58 Å.
Protein: Ubq α 25-31
RMSD (Hz): CαHα 0.32, NH 0.24
Alignment tensors (Syy, Szz): (23.66, 16.48), (53.25, 7.65)
43
44
Preliminary Results
Preliminary Results: Ubiquitin Strand
Conformation of the portion 2-7 of the beta-strand for human ubiquitin,
computed using NH and CH RDCs in two media, superimposed on the same
portion from the high-resolution X-ray structure (PDB ID 1UBQ). The
backbone RMSD is 1.151 Å.
Protein: Ubq beta 2-7
RMSD (Hz): CαHα, NH
Alignment tensors (Syy, Szz): (53.32, 4.83), (48.03, 14.32)
44
45
Conclusions
  • Complete and exhaustive search over the space of
    all structures minimizing the RDC fit function
    seems feasible due to understanding the structure
    of the solution.
  • Possible and exciting extensions to more/different
    data

Funding: NIH
Thank you!
45
46
Comparison
[Figure: data requirements vs. accuracy (Ubiquitin), with labels Sparse and Accuracy]
46