Title: Multiple Structure Alignment by Optimal RMSD Implies that the Average Structure is a Consensus
1Multiple Structure Alignment by Optimal RMSD
Implies that the Average Structure is a Consensus
- Xueyi Wang and Jack Snoeyink
- Department of Computer Science
- UNC - Chapel Hill
2Outline
- Multiple structure alignment
- Average structure and wRMSD
- Minimizing wRMSD
- Results
3Protein Structure is Important
- Structures are better conserved than sequences.
- Similar structures may have similar functions.
- The number of known protein structures is increasing exponentially.
- [Figure: protein number (y-axis) by year (x-axis)]
4Uses of Structure Alignment
- Pairwise structure alignment
- -- Search for homologous proteins
- -- Protein structure classification
- Multiple structure alignment
- -- Identify structurally conserved regions
- -- Provide clues for building evolutionary trees
- -- Determine consensus structures for protein
families
5Multiple Structure Alignment
- Structure abstracted as points or vectors
- Optimization methods
- -- Some methods do pairwise structure alignment
first, then use heuristics to integrate multiple
structures.
- -- Some methods align all structures at once.
- Target functions are usually some form of RMSD,
extended from the RMSD for pairwise alignment
7Average Structure and wRMSD
- Definitions of average structure and RMSD.
- Four properties of the average structure -- the average is the best consensus!
- wRMSD.
8Definitions
- n structures (proteins) S1, S2, ..., Sn
- m points (atoms) in each structure with
correspondence, e.g. Si has pi1, pi2, ..., pim.
- Average structure $\bar{S}$ has points $\bar{p}_k = \frac{1}{n}\sum_{i=1}^{n} p_{ik}$, for k = 1, ..., m
- RMSD, or SD
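A minimal sketch of these definitions in Python, assuming that SD denotes the sum of squared Euclidean distances and that RMSD divides it by n*m before taking the root (the normalisation is an assumption). The function names and the toy coordinates are hypothetical.

```python
import math

def average_structure(structures):
    """Average the corresponding points of n structures (each a list of m 3D points)."""
    n = len(structures)
    return [tuple(sum(c) / n for c in zip(*pts)) for pts in zip(*structures)]

def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sd_to(structures, target):
    """Sum of squared distances from every structure to `target`."""
    return sum(sq_dist(p, t) for s in structures for p, t in zip(s, target))

def rmsd_to(structures, target):
    """Root mean squared distance; dividing by n*m is an assumed normalisation."""
    n, m = len(structures), len(target)
    return math.sqrt(sd_to(structures, target) / (n * m))

# Toy data (hypothetical): 3 structures with 2 corresponding points each.
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 0.0, 3.0)],
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
avg = average_structure(structures)
print(avg)                       # average point at each aligned position
print(sd_to(structures, avg))    # SD to the average structure
print(rmsd_to(structures, avg))  # RMSD to the average structure
```

Plain lists of tuples keep the sketch dependency-free; an array library would be the natural choice for real protein data.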
9Property 1
- For any aligned position k, the SD from all
structures to any point qk equals the SD to the
average point plus the SD from the average point to qk.
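Property 1 can be checked by a short expansion. The sketch below assumes SD means a sum of squared Euclidean distances, writes $p_{ik}$ for point k of structure i and $\bar{p}_k = \frac{1}{n}\sum_i p_{ik}$ for the average point. The distance from the average point to $q_k$ is counted once per structure, hence the factor n.

```latex
\begin{align*}
\sum_{i=1}^{n} \lVert p_{ik} - q_k \rVert^2
 &= \sum_{i=1}^{n} \lVert (p_{ik} - \bar{p}_k) + (\bar{p}_k - q_k) \rVert^2 \\
 &= \sum_{i=1}^{n} \lVert p_{ik} - \bar{p}_k \rVert^2
  + 2\,(\bar{p}_k - q_k) \cdot \sum_{i=1}^{n} (p_{ik} - \bar{p}_k)
  + n\,\lVert \bar{p}_k - q_k \rVert^2 \\
 &= \sum_{i=1}^{n} \lVert p_{ik} - \bar{p}_k \rVert^2
  + n\,\lVert \bar{p}_k - q_k \rVert^2
\end{align*}
```

The cross term vanishes because the deviations from the average point sum to zero.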
10Property 2
- The SD for all pairs equals, up to a constant factor
that depends only on n, the SD to the average
structure
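A small numerical check of Property 2, assuming SD is a sum of squared distances and that "all pairs" means unordered pairs of structures. Under those assumptions the pairwise SD is n times the SD to the average structure. The helper names and the random test data are hypothetical.

```python
import itertools
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def average_structure(structures):
    """Average the corresponding points of n structures."""
    n = len(structures)
    return [tuple(sum(c) / n for c in zip(*pts)) for pts in zip(*structures)]

def sd_to_average(structures):
    """Sum of squared distances from every structure to the average structure."""
    avg = average_structure(structures)
    return sum(sq_dist(p, a) for s in structures for p, a in zip(s, avg))

def sd_all_pairs(structures):
    """Sum of squared distances over all unordered pairs of structures."""
    return sum(
        sq_dist(p, q)
        for s, t in itertools.combinations(structures, 2)
        for p, q in zip(s, t)
    )

random.seed(0)
n, m = 5, 8  # hypothetical sizes: 5 structures, 8 aligned points each
structures = [
    [tuple(random.uniform(-10, 10) for _ in range(3)) for _ in range(m)]
    for _ in range(n)
]
pairs = sd_all_pairs(structures)
to_avg = sd_to_average(structures)
print(pairs, n * to_avg)  # the two values agree up to rounding
```

Because the two quantities differ only by the factor n, any alignment that minimizes one also minimizes the other.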
11Property 3
- For all n structures, the SD to the average
structure is less than the SD to any other
structure.
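Property 3 follows from Property 1 by summing over the aligned positions. In the sketch below, Q is any structure with points $q_k$, $\bar{S}$ is the average structure with points $\bar{p}_k$, and SD is assumed to be a sum of squared distances.

```latex
\begin{align*}
\mathrm{SD}(Q)
 &= \sum_{k=1}^{m} \sum_{i=1}^{n} \lVert p_{ik} - q_k \rVert^2 \\
 &= \sum_{k=1}^{m} \sum_{i=1}^{n} \lVert p_{ik} - \bar{p}_k \rVert^2
  + n \sum_{k=1}^{m} \lVert \bar{p}_k - q_k \rVert^2 \\
 &= \mathrm{SD}(\bar{S}) + n \sum_{k=1}^{m} \lVert \bar{p}_k - q_k \rVert^2
 \;\ge\; \mathrm{SD}(\bar{S})
\end{align*}
```

Equality holds only when Q coincides with the average structure.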
12Property 4
- Among the true structures, the best consensus is the one
with the minimum SD to the average structure.
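Property 4 can be illustrated by picking a consensus in two ways and checking that they agree. This is a sketch under the assumption that SD is a sum of squared distances; `best_consensus` and `most_central` are hypothetical names, and the coordinates are toy data.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sd_between(a, b):
    """Sum of squared distances between corresponding points of two structures."""
    return sum(sq_dist(p, q) for p, q in zip(a, b))

def average_structure(structures):
    """Average the corresponding points of n structures."""
    n = len(structures)
    return [tuple(sum(c) / n for c in zip(*pts)) for pts in zip(*structures)]

def best_consensus(structures):
    """Index of the input structure with the minimum SD to the average structure."""
    avg = average_structure(structures)
    return min(range(len(structures)), key=lambda i: sd_between(structures[i], avg))

def most_central(structures):
    """Index of the input structure with the minimum total SD to all structures."""
    return min(
        range(len(structures)),
        key=lambda j: sum(sd_between(s, structures[j]) for s in structures),
    )

# Toy data (hypothetical): 3 structures with 2 corresponding points each.
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 0.0, 3.0)],
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
print(best_consensus(structures), most_central(structures))
```

By Property 1, the total SD from all structures to a candidate is the SD to the average plus n times the candidate's own SD to the average, so both selections rank the candidates identically. `best_consensus` needs only one pass over the structures, while `most_central` compares every pair.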
13wRMSD
- Add a position weight wk ≥ 0 on each aligned
position k.
- Weighted RMSD (wRMSD): the squared distances at position k are scaled by wk.
- All the properties still hold.
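A minimal sketch of the weighted measure, assuming the weighted SD multiplies each position's squared distances by its weight and that wRMSD normalises by n times the total weight (the normalisation is an assumption). Function names and data are hypothetical.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def average_structure(structures):
    """Average the corresponding points of n structures."""
    n = len(structures)
    return [tuple(sum(c) / n for c in zip(*pts)) for pts in zip(*structures)]

def weighted_sd(structures, weights):
    """Sum over structures and positions of w_k times the squared distance to the average."""
    avg = average_structure(structures)
    return sum(
        w * sq_dist(p, a)
        for s in structures
        for p, a, w in zip(s, avg, weights)
    )

def wrmsd(structures, weights):
    """Weighted RMSD; assumes at least one weight is positive."""
    n = len(structures)
    return math.sqrt(weighted_sd(structures, weights) / (n * sum(weights)))

# Toy data (hypothetical): position 0 is tight, position 1 is widely spread.
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 0.0, 6.0)],
    [(3.0, 0.0, 0.0), (1.0, 0.0, -6.0)],
]
print(weighted_sd(structures, [1.0, 1.0]))  # both positions count fully
print(weighted_sd(structures, [1.0, 0.0]))  # position 1 is ignored
print(wrmsd(structures, [1.0, 0.0]))
```

Setting a weight to zero removes that position from the score without changing the correspondence between points.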
15Minimizing wRMSD
- Translate and rotate the structures in 3D space.
- For structure Si, define Ri as a 3×3 rotation
matrix and Ti as a 3×1 translation vector.
- Target function: choose every Ri and Ti to minimize the wRMSD of the transformed structures.
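One way to write the target function, offered as a reconstruction from the definitions rather than the slide's exact formula. It assumes the weighted SD form of wRMSD, with $p_{ik}$ the k-th point of structure i and $w_k$ the position weights; any constant normalisation inside the root does not change the minimizer.

```latex
\begin{align*}
\min_{R_1,\dots,R_n,\;T_1,\dots,T_n}\;
 & \sum_{i=1}^{n} \sum_{k=1}^{m} w_k \,
   \bigl\lVert R_i\, p_{ik} + T_i - \bar{p}_k \bigr\rVert^2 , \\
\text{where}\quad
 & \bar{p}_k = \frac{1}{n} \sum_{i=1}^{n} \bigl( R_i\, p_{ik} + T_i \bigr),
 \qquad R_i^{\top} R_i = I,\quad \det R_i = 1 .
\end{align*}
```

The average structure depends on the transformations themselves, which is why the minimization is carried out iteratively.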
16Algorithm
1. Translate all the structures to the origin.
2. Compute the average structure $\bar{S}$ and the wRMSD.
3. Align each structure Si to $\bar{S}$ by Horn's method.
4. Compute the new average structure $\bar{S}_{new}$ and wRMSDnew.
5. If wRMSD − wRMSDnew < ε, the algorithm terminates; otherwise, set $\bar{S} = \bar{S}_{new}$ and wRMSD = wRMSDnew, and go to step 3.
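The five steps can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: "translate to the origin" is read as moving each weighted centroid to the origin, the rotation step uses Horn's closed-form quaternion solution, the wRMSD normalisation is assumed, and `horn_rotation`, `align_structures`, `eps` and `max_iter` are hypothetical names.

```python
import numpy as np

def horn_rotation(P, Q, w):
    """Rotation R minimising sum_k w_k * ||R @ P[k] - Q[k]||^2 (Horn's quaternion method).

    P and Q are (m, 3) arrays; only the rotation about the origin is fitted.
    """
    S = (P * w[:, None]).T @ Q  # S[a, b] = sum_k w_k * P[k, a] * Q[k, b]
    Sxx, Sxy, Sxz = S[0]
    Syx, Syy, Syz = S[1]
    Szx, Szy, Szz = S[2]
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ])
    _, vecs = np.linalg.eigh(N)   # eigenvalues are returned in ascending order
    q0, qx, qy, qz = vecs[:, -1]  # unit quaternion for the largest eigenvalue
    return np.array([
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ])

def align_structures(structures, weights=None, eps=1e-8, max_iter=100):
    """Iteratively rotate n structures onto their average until wRMSD stops improving."""
    X = np.asarray(structures, dtype=float)  # shape (n, m, 3)
    n, m, _ = X.shape
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)

    # Step 1: move each structure's weighted centroid to the origin.
    centroids = (X * w[None, :, None]).sum(axis=1) / w.sum()
    X = X - centroids[:, None, :]

    def wrmsd(X, avg):
        sd = (w[None, :] * ((X - avg) ** 2).sum(axis=2)).sum()
        return np.sqrt(sd / (n * w.sum()))  # assumed normalisation

    avg = X.mean(axis=0)                    # Step 2
    current = wrmsd(X, avg)
    for _ in range(max_iter):
        # Step 3: rotate every structure onto the current average.
        X = np.stack([S @ horn_rotation(S, avg, w).T for S in X])
        new_avg = X.mean(axis=0)            # Step 4
        new = wrmsd(X, new_avg)
        if current - new < eps:             # Step 5
            return X, new_avg, new
        avg, current = new_avg, new
    return X, avg, current

# Usage on synthetic data (hypothetical): noisy copies of one random structure.
rng = np.random.default_rng(0)
base = rng.normal(size=(12, 3))
noisy = [base + 0.05 * rng.normal(size=base.shape) + shift for shift in (0.0, 1.0, -2.0)]
aligned, consensus, score = align_structures(noisy)
print(score)
```

The `max_iter` cap is a safeguard added for the sketch; the stopping rule itself is the ε test from step 5.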
17Algorithm
- wRMSD decreases in each iteration.
- Time complexity O(n m) for each iteration, where
n is the number of structures and m is the number
of points in each structure.
- is maximized when the algorithm ends.
- Implemented using MATLAB.
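A short argument for why the wRMSD cannot increase from one iteration to the next, sketched under the assumption that the weighted SD is $\mathrm{SD}_w(X, Q) = \sum_i \sum_k w_k \lVert x_{ik} - q_k \rVert^2$. Here X and X' are the structures before and after the rotation step, and $\bar{S}$ and $\bar{S}'$ are their averages.

```latex
\begin{align*}
\mathrm{SD}_w(X', \bar{S}') \;\le\; \mathrm{SD}_w(X', \bar{S}) \;\le\; \mathrm{SD}_w(X, \bar{S})
\end{align*}
```

The right inequality holds because each structure is rotated optimally onto $\bar{S}$; the left one is Property 3 applied to the rotated structures. A related identity, for structures that are only rotated about the origin, is:

```latex
\begin{align*}
\sum_{i=1}^{n} \sum_{k=1}^{m} w_k \lVert R_i\, p_{ik} - \bar{p}_k \rVert^2
 = \sum_{i=1}^{n} \sum_{k=1}^{m} w_k \lVert p_{ik} \rVert^2
 - n \sum_{k=1}^{m} w_k \lVert \bar{p}_k \rVert^2
\end{align*}
```

The first term on the right does not depend on the rotations, so lowering the weighted SD is the same as raising the weighted squared norm of the average structure.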
19Performance Test
- Test set: 23 protein families from HOMSTRAD, with
10 structures and total aligned length 100.
- All 5,000 tests for each family converge to the
same local minimum, where each test begins with random
rotations.
- Maximum iterations: 6.
- Maximum running time: 40 milliseconds.
20Consensus Structure
- Example: grs (pyridine nucleotide-disulphide
oxidoreductases class)
- [Figure panels: all 11 aligned proteins; consensus structure; structure with minimum RMSD; structure with maximum RMSD]
21Determine Conserved Core
- Multiple structure alignment by optimal RMSD is
affected by outliers.
- Solution -- Another iterative algorithm based on
the previous algorithm
- -- Align all the structures by the previous
algorithm.
- -- Find the aligned positions whose average
distance to the average point is larger than a
threshold, and set their weights to zero.
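The weight-update step can be sketched as below, assuming the structures have already been aligned and that "average distance" means the mean Euclidean distance from each structure's point to the average point. `update_core_weights`, the threshold value and the coordinates are hypothetical.

```python
import math

def average_structure(structures):
    """Average the corresponding points of n structures."""
    n = len(structures)
    return [tuple(sum(c) / n for c in zip(*pts)) for pts in zip(*structures)]

def update_core_weights(structures, weights, threshold):
    """Zero the weight of each position whose mean distance to the average point exceeds `threshold`."""
    avg = average_structure(structures)
    n = len(structures)
    new_weights = []
    for k, (a, w) in enumerate(zip(avg, weights)):
        mean_dist = sum(math.dist(s[k], a) for s in structures) / n
        new_weights.append(0.0 if mean_dist > threshold else w)
    return new_weights

# Toy data (hypothetical): positions 0 and 1 are conserved, position 2 is an outlier.
aligned = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (5.0, 6.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, -0.3, 0.0), (5.0, -6.0, 0.0)],
]
core = update_core_weights(aligned, [1.0, 1.0, 1.0], threshold=1.0)
print(core)
```

In a full run, the updated weights would be passed back to the alignment step, and the two steps would alternate until the set of zero-weight positions stops changing; that stopping rule is an assumption.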
22Conserved Core
- [Figure panels: grs (pyridine nucleotide-disulphide oxidoreductases class); proteasome A-type and B-type]
23Conclusions and Future Work
- Conclusions
- -- Average structure is the consensus.
- -- Algorithm for minimizing wRMSD.
- -- Algorithm for finding conserved region.
- Future work
- -- Minimizing RMSD for gapped multiple structure
alignment (submitted).
24Acknowledgements
- Jun (Luke) Huan.
- UNC Bioinformatics and Computational Biology
training fellowship.
- NIH grant GM-074127.
25The End