Predicting Protein Structures and Structural Features on a Genomic Scale Pierre Baldi School of Information and Computer Sciences Institute for Genomics and Bioinformatics University of California, Irvine - PowerPoint PPT Presentation

1
Predicting Protein Structures and Structural Features on a Genomic Scale
Pierre Baldi
School of Information and Computer Sciences
Institute for Genomics and Bioinformatics
University of California, Irvine
2
UNDERSTANDING INTELLIGENCE
  • Human intelligence (inverse problem)
  • AI (direct problem)
  • Choice of specific problems is key
  • Protein structure prediction is a good problem

3
PROTEINS
  • [Figure: protein backbone chain with atoms labeled N, Cα, Cβ and side chains R1, R2, R3]

4
(No Transcript)
5
Utility of Structural Information
(Baker and Sali, 2001)
6
CAVEAT
7
REMARKS
  • Structure/Folding
  • Backbone/Full Atom
  • Homology Modeling
  • Fold Recognition (Threading)
  • Ab Initio (Physical Potentials/Molecular
    Dynamics, Statistical Mechanics/Lattice Models)
  • Statistical/Machine Learning (Training Sets, SS
    prediction)
  • Mixtures ab-initio with statistical potentials,
    machine learning with profiles, etc.

8
PROTEIN STRUCTURE PREDICTION (ab initio)
9
(No Transcript)
10
Helices
  • 1GRJ (GreA Transcript Cleavage Factor from
    Escherichia coli)

11
Antiparallel β-sheets
  • 1MSC (Bacteriophage MS2 Unassembled Coat Protein
    Dimer)

12
Parallel β-sheets
  • 1FUE (Flavodoxin)

13
Contact map
14
Secondary structure prediction
15
GRAPHICAL MODELS: BAYESIAN NETWORKS
  • $X_1, \ldots, X_n$: random variables associated with
    the vertices of a DAG (Directed Acyclic Graph).
  • The local conditional distributions
    $P(X_i \mid X_j : j \text{ parent of } i)$ are the
    parameters of the model. They can be represented
    by look-up tables (costly) or other more compact
    parameterizations (Sigmoidal Belief Networks,
    XOR, etc.).
  • The global distribution is the product of the
    local characteristics:
    $P(X_1, \ldots, X_n) = \prod_i P(X_i \mid X_j : j \text{ parent of } i)$
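
The factorization on this slide can be illustrated with a minimal sketch. The three-node DAG, the variable names, and the probabilities below are made up for illustration; only the product-of-local-conditionals rule comes from the slide.

```python
from itertools import product

# Hypothetical three-node DAG: X1 -> X3 <- X2, all variables binary.
PARENTS = {"X1": (), "X2": (), "X3": ("X1", "X2")}

# Look-up tables giving P(X = 1 | parent values); the numbers are invented.
P_ONE = {
    "X1": {(): 0.2},
    "X2": {(): 0.1},
    "X3": {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99},
}

def local_prob(node, assignment):
    """P(node = its assigned value | assigned values of its parents)."""
    key = tuple(assignment[p] for p in PARENTS[node])
    p_one = P_ONE[node][key]
    return p_one if assignment[node] == 1 else 1.0 - p_one

def joint_prob(assignment):
    """Global distribution: product of the local conditional distributions."""
    prob = 1.0
    for node in PARENTS:
        prob *= local_prob(node, assignment)
    return prob

def total_mass():
    """Sum of the joint over all 2^3 assignments (should be 1)."""
    nodes = list(PARENTS)
    return sum(joint_prob(dict(zip(nodes, values)))
               for values in product((0, 1), repeat=len(nodes)))
```

The look-up table for X3 already needs four entries for two binary parents, which is the cost the slide refers to: table size grows exponentially with the number of parents.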

16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
DATA PREPARATION
  • Starting point: PDB database.
  • Remove sequences not determined by X-ray
    diffraction.
  • Remove sequences where DSSP crashes.
  • Remove proteins with physical chain breaks
    (neighboring amino acids at distances
    exceeding 4 Angstroms).
  • Remove sequences with resolution worse
    than 2.5 Angstroms.
  • Remove chains with fewer than 30 amino acids.
  • Remove redundancy (Hobohm's algorithm,
    Smith-Waterman, PAM 120, etc.).
  • Build multiple alignments (BLAST,
    PSI-BLAST, etc.).
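
The per-chain filters above can be sketched as a single predicate. The `Chain` record and its field names are hypothetical (they are not from the talk), and redundancy removal and alignment building are left out because they operate on the whole set rather than on one chain.

```python
from dataclasses import dataclass

@dataclass
class Chain:
    """Hypothetical summary record for one PDB chain."""
    pdb_id: str
    method: str               # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: float         # Angstroms; larger means worse
    length: int               # number of amino acids
    max_neighbor_dist: float  # largest distance between neighboring residues (Angstroms)
    dssp_ok: bool             # False if DSSP failed on this chain

def keep_chain(c, max_resolution=2.5, min_length=30, max_break=4.0):
    """Apply the slide's per-chain filters; thresholds default to the slide's values."""
    return (c.method == "X-RAY DIFFRACTION"
            and c.dssp_ok
            and c.max_neighbor_dist <= max_break
            and c.resolution <= max_resolution
            and c.length >= min_length)

def filter_chains(chains):
    """Return only the chains that pass every filter."""
    return [c for c in chains if keep_chain(c)]
```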

22
SECONDARY STRUCTURE PROGRAMS
  • DSSP (Kabsch and Sander, 1983) works by
    assigning potential backbone hydrogen bonds
    (based on the 3D coordinates of the backbone
    atoms) and then identifying repetitive
    bonding patterns.
  • STRIDE (Frishman and Argos, 1995) uses dihedral
    angles in addition to hydrogen bonds.
  • DEFINE (Richards and Kundrot, 1988) uses
    difference distance matrices to evaluate how
    well interatomic distances in the protein match
    those of idealized secondary structures.

23
SECONDARY STRUCTURE ASSIGNMENTS
  • DSSP classes:
  • H: alpha helix
  • E: sheet
  • G: 3-10 helix
  • S: kind of turn
  • T: beta turn
  • B: beta bridge
  • I: pi-helix (very rare)
  • C: the rest
  • CASP (harder) assignment:
  • α: H and G
  • β: E and B
  • coil: the rest
  • Alternative assignment:
  • α: H
  • β: B
  • coil: the rest
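
The reduction from eight DSSP classes to three can be written as a small lookup. This is a sketch: the output letters H, E, and C for helix, strand, and "the rest" are a convention assumed here, and the two mappings simply transcribe the assignments listed on the slide.

```python
# CASP assignment from the slide: helix = H and G, strand = E and B, rest = coil.
CASP_MAP = {"H": "H", "G": "H", "E": "E", "B": "E"}

# Alternative assignment as listed on the slide: helix = H, strand = B, rest = coil.
ALTERNATIVE_MAP = {"H": "H", "B": "E"}

def reduce_dssp(dssp_string, mapping=CASP_MAP):
    """Map each DSSP class letter to H, E, or C; anything unmapped becomes C."""
    return "".join(mapping.get(symbol, "C") for symbol in dssp_string)
```

Calling `reduce_dssp("HGEBSTIC")` applies the CASP assignment; passing `ALTERNATIVE_MAP` as the second argument applies the other one.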

24
ENSEMBLES
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
FUNDAMENTAL LIMITATIONS
  • 100% CORRECT RECOGNITION IS PROBABLY IMPOSSIBLE
    FOR SEVERAL REASONS:
  • SOME PROTEINS DO NOT FOLD SPONTANEOUSLY OR MAY
    NEED CHAPERONES
  • QUATERNARY STRUCTURE: BETA-STRAND PARTNERS MAY BE
    ON A DIFFERENT CHAIN
  • STRUCTURE MAY DEPEND ON OTHER VARIABLES:
    ENVIRONMENT, PH
  • DYNAMICAL ASPECTS
  • FUZZINESS OF DEFINITIONS AND ERRORS IN DATABASES

29
(No Transcript)
30
(No Transcript)
31
BB-RNNs
32
2D RNNs
33
2D INPUTS
  • AA at positions i and j
  • Profiles at positions i and j
  • Correlated profiles at positions i and j
  • Secondary Structure, Accessibility, etc.

34
(No Transcript)
35
PERFORMANCE (%)
               6Å     8Å     10Å    12Å
non-contacts   99.9   99.8   99.2   98.9
contacts       71.2   65.3   52.2   46.6
all            98.5   97.1   93.2   88.5
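
The column headings are distance cutoffs: a residue pair counts as a contact when its two residues lie within the cutoff. A minimal sketch of how a true contact map could be derived from coordinates is given below. Using one representative point per residue (for example Cα) and a "less than or equal" comparison are assumptions, since the slides do not specify either.

```python
import math

def contact_map(coords, cutoff):
    """Binary n x n matrix: 1 where two residues are within `cutoff` Angstroms.

    coords: one (x, y, z) tuple per residue (assumed here to be C-alpha positions).
    """
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) <= cutoff else 0
             for j in range(n)]
            for i in range(n)]

def count_contacts(cmap):
    """Number of contacting pairs with i < j."""
    n = len(cmap)
    return sum(cmap[i][j] for i in range(n) for j in range(i + 1, n))
```

Raising the cutoff can only add contacts, so the contact class makes up a larger share of all pairs at 12 Å than at 6 Å.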
36
Protein Reconstruction
Using predicted secondary structure and predicted
contact map
PDB ID: 1HCR, chain A (52 residues)
Sequence:
GRPRAINKHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPASSIKKRMN
True SS:
CCCCCCCCHHHHHHHHHHHCCCCHHHHHHHCECCHHHHHHHCCCCCCCCCCC
Pred SS:
CCCCCCCHHHHHHHHHHHHCCCCHHHHEEHECHHHHHHHHCCCHHHHHHHCC
Model 147, RMSD 3.47Å
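
One simple way to compare the true and predicted secondary structure strings is the fraction of positions where they agree. The slides do not report this figure for 1HCR, so the sketch below only shows how it could be computed. The two strings are built from the fragments as they appear in the transcript.

```python
def per_residue_accuracy(true_ss, pred_ss):
    """Fraction of positions where the two class strings agree."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("strings must have the same length")
    if not true_ss:
        raise ValueError("strings must not be empty")
    matches = sum(t == p for t, p in zip(true_ss, pred_ss))
    return matches / len(true_ss)

# 1HCR chain A, concatenated from the two fragments of each string in the transcript.
TRUE_1HCR = "CCCCCCCCHHHHHHHHHHHCCCCHHHHHHHCECCHHH" + "HHHHCCCCCCCCCCC"
PRED_1HCR = "CCCCCCCHHHHHHHHHHHHCCCCH" + "HHHEEHECHHHHHHHHCCCHHHHHHHCC"

accuracy_1hcr = per_residue_accuracy(TRUE_1HCR, PRED_1HCR)
```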
37
Protein Reconstruction
Using predicted secondary structure and predicted
contact map
PDB ID: 1BC8, chain C (93 residues)
Sequence:
MDSAITLWQFLLQLLQKPQNKHMICWTSNDGQFKLLQAEEVARLWGIRKNKPNMNYDKLSRALRYYYVKNIIKKVNGQKFVYKFVSYPEILNM
True SS:
CCCCCCHHHHHHHHCCCHHHCCCCEECCCCCEEECCCHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHCCEEECCCCCCEEEECCCCHHHCC
Pred SS:
CCCHHHHHHHHHHHHHCCCCCCEEEEECCCEEEEECCHHHHHHHHHHHCCCCCCCHHHHHHHHHHHHHCCCEEECCCCEEEEEEECCHHHHCC
Model 1714, RMSD 4.21Å
38
CASP6 Self Assessment
Evaluation based on GDT_TS of first submitted model
GDT_TS = Global Distance Test Total Score
GDT_TS = (GDT_P1 + GDT_P2 + GDT_P4 + GDT_P8) / 4
Pn = percentage of residues under distance cutoff n
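
The formula can be sketched as follows. This is a simplification stated as an assumption: the function takes per-residue distances between model and native under one fixed superposition, whereas the actual GDT procedure searches over superpositions to maximize each Pn. Distances and cutoffs are taken to be in Ångströms.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average of P1, P2, P4, P8 for one fixed superposition.

    distances: per-residue distance between model and native positions.
    Pn is the percentage of residues with distance <= n.
    """
    if not distances:
        raise ValueError("need at least one residue")
    n = len(distances)
    percentages = [100.0 * sum(d <= c for d in distances) / n for c in cutoffs]
    return sum(percentages) / len(cutoffs)
```

For five residues at distances 0.5, 1.5, 3.0, 6.0 and 10.0, the four percentages are 20, 40, 60 and 80, giving a score of 50.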
39
Hard Target Summary
  • Top 10 groups displayed, of 65 registered servers
  • Assessment on 25 new fold and fold recognition
    analogous target domains

N = number of targets predicted
Av.R. = average rank
sumZ = sum of Z scores on all targets in set
sumZpos = sum of Z scores for predictions with positive Z score

group                N    Av.R.   sumZ    sumZpos
BAKER-ROBETTA        25    9.12   27.81   27.94
baldi-group-server   24   10.04   20.47   22.33
Rokky                25   11.60   17.56   18.46
Pmodeller5           20   12.55   14.35   16.11
ZHOUSPARKS2          25   15.00   12.67   15.91
ACE                  25   13.08   11.63   14.89
Pcomb2               24   16.04   10.82   13.58
RAPTOR               24   16.33    9.81   13.21
zhousp3              25   15.72    9.30   12.90
PROTINFO-AB          19   16.74    8.62   12.59
40
Hard Target Summary
  • Top 10 groups displayed, of 65 registered servers
  • Assessment on 19 new fold and fold recognition
    analogous target domains less than 120 residues

N = number of targets predicted
Av.R. = average rank
sumZ = sum of Z scores on all targets in set
sumZpos = sum of Z scores for predictions with positive Z score

group                N    Av.R.   sumZ    sumZpos
baldi-group-server   19    6.74   20.61   20.61
BAKER-ROBETTA        19    9.11   20.44   20.57
Rokky                19   12.11   12.30   13.20
PROTINFO-AB          16   12.63   11.53   12.59
ZHOUSPARKS2          19   15.32    8.91   11.87
Pcomb2               18   15.39    9.54   11.48
Pmodeller5           15   14.47    9.25   11.00
PROTINFO             18   16.22    8.66   10.56
ACE                  19   14.21    6.98   10.24
RAPTOR               18   17.00    7.22    9.64
41
Target T0281: Detailed Target Analysis
  • Target Information
  • Length: 70 amino acids
  • Resolution: 1.52 Å
  • PDB code: 1WHZ
  • Description: Hypothetical protein from Thermus
    thermophilus HB8
  • Domains: single domain
  • Assessment
  • GDT_TS server rank of our 1st model: 2
  • GDT_TS: 51.07
  • RMSD to native: 6.15

42
Target T0281: Contact Map Comparison
(note: true map is lower left)
True Map vs. Predicted Map
True Map vs. Recovered Map
43
Target T0281: Structure Comparison
true structure
predicted structure
44
Target T0281: Structure Comparison, Superposition
True structure: thick trace
Predicted structure: thin trace
45
Target T0280_2: Detailed Target Analysis
  • Target Information
  • Length: 51 amino acids
  • Resolution: 2.00 Å
  • PDB code: 1WD5
  • Description: Putative phosphoribosyl transferase,
    T. thermophilus
  • Domains: 2nd domain, residues 53-103 of 208 AA
    sequence
  • Assessment
  • GDT_TS server rank of our 1st model: 1 (also 1st
    among human groups)
  • GDT_TS: 54.41
  • RMSD to native: 5.81

46
Target T0280_2: Contact Map Comparison
(note: true map is lower left)
True Map vs. Predicted Map
True Map vs. Recovered Map
47
Target T0281: Structure Comparison
true structure
predicted structure
48
Target T0281: Structure Comparison, Superposition
True structure: thick trace
Predicted structure: thin trace
49
THE SCRATCH SUITE
  • www.igb.uci.edu
  • DOMpro: domains
  • DISpro: disordered regions
  • SSpro: secondary structure
  • SSpro8: secondary structure
  • ACCpro: accessibility
  • CONpro: contact number
  • DI-pro: disulphide bridges
  • BETA-pro: beta partners
  • CMAP-pro: contact map
  • CCMAP-pro: coarse contact map
  • CON23D-pro: contact map to 3D
  • 3D-pro: 3D structure (homology + fold recognition
    + ab-initio)

50
(No Transcript)
51
  • SISQQTVWNQMATVRTPLNFDSSKQSFCQFSVDLLGGGISVDKTGDWITLVQNSPISNLL
  • CCCECCCCCCEEEECCCCCCCCCCCCEEEEEEECCCCEEEECCCCCCEEEEECCHHHHHH
  • CCCEEEEECEEEEECCCCCCCTCCCCEEEEEEEETCSEEEECTTTTEEEEEECCHHHHHH
  • [four rows not recoverable: only dashes survive in the transcript]
  • eeeeee---e--e-e-eee-ee-eee---------e-e--eeeeee--------------
  • RVAAWKKGCLMVKVVMSGNAAVKRSDWASLVQVFLTNSNSTEHFDACRWTKSEPHSWELI
  • HHHHHHCCCEEEEEEEEEECCEEECCCCCEEEEEEEECCCCCCCCCEEEEEECCCCCCCC
  • HHHHHHTTCEEEEEEEEEEEEEEECCCCCEEEEEEEECCCTTCCCEEEEEEECCTCCEEE
  • [four rows not recoverable: only dashes survive in the transcript]
  • -----ee---e-------e-e-ee-e-e-e-----e--eeee--e-------e-e-ee-e

52
Advantage of Machine Learning
  • Pitfalls of traditional ab-initio approaches
  • Machine learning systems take time to train
    (weeks).
  • Once trained however they can predict structures
    almost faster than proteins can fold.
  • Predict or search protein structures on a genomic
    or bioengineering scale.

53
DAG-RNNs APPROACH
  • Two steps:
  • 1. Build relevant DAG to connect inputs, outputs,
    and hidden variables.
  • 2. Use a deterministic (neural network)
    parameterization together with appropriate
    stationarity assumptions/weight sharing; the
    overall model remains probabilistic.
  • Process structured data of variable size,
    topology, and dimensions efficiently
  • Sequences, trees, d-lattices, graphs, etc
  • Convergence theorems
  • Other applications

54
(No Transcript)
55
Convergence Theorems
  • Posterior Marginals
  • sBN → dBN in distribution
  • sBN → dBN in probability (uniformly)
  • Belief Propagation
  • sBN → dBN in distribution
  • sBN → dBN in probability (uniformly)

56
Structural Databases
  • PPDB Poxvirus Proteomic Database
  • ICBS Inter Chain Beta Sheet Database

57
(No Transcript)
58
(No Transcript)
59
(No Transcript)
60
Strategies for drug design
  • Block, modulate, mediate β-sheet interactions
  • Covalent modification of a chain to prevent
    β-sheet formation

61
(No Transcript)
62
(No Transcript)
63
Three-Stage Prediction of Protein Beta-Sheets
Using Neural Networks, Alignments, and Graph
Algorithms
  • Jianlin Cheng and Pierre Baldi
  • School of Info. and Computer Sci.
  • University of California Irvine

64
Beta-Sheet Architecture
65
Importance of Predicting Beta-Sheet Structure
  • AB-Initio Structure Prediction
  • Fold Recognition
  • Model Refinement
  • Protein Design
  • Protein Stability

66
Previous Work
  • Methods
  • Statistical potential approach for strand
    alignment (Hubbard, 1994; Zhu and Braun, 1999).
  • Statistical potentials to improve beta-sheet
    secondary structure prediction (Asogawa, 1997).
  • Information theory approach for strand alignment
    (Steward and Thornton, 2000).
  • Neural networks for beta-residue contacts
    (Baldi et al., 2000).
  • Shortcomings
  • Focus on one single aspect; do not utilize
    structural contexts and evolutionary information;
    do not exploit constraints enough; not publicly
    available.

67
Three-Stage Prediction of Beta-Sheets
  • Stage 1
  • Predict beta-residue pairings using
    2D-Recursive Neural Networks (2D-RNN).
  • Stage 2
  • Align beta-strands using alignment algorithms.
  • Stage 3
  • Predict beta-strand pairs and beta-sheet
    architecture using graph algorithms.

68
Dataset and Statistics
                 Num
Chains           916
Beta residues    48,996
Residue Pairs    31,638
Beta Strand      10,745
Strand Pairs     8,172
Beta Sheet       2,533
69
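Stage 1: Prediction of Beta-Residue Pairings
Using 2D-RNN
Input Matrix I (m×m), with entry Iij at position (i,j)
2D-RNN: O = f(I)
Target / Output Matrix (m×m), with entries at position (i,j):
Tij = 0/1, Oij = Pairing Prob.
Iij covers positions i-2, i-1, i, i+1, i+2, j-2, j-1, j, j+1, j+2, plus |i-j|
Per position: 20 profiles, 3 SS, 2 SA
Total: 251 inputs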
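
The total of 251 follows from the window layout: 10 residue positions (five around i, five around j) times 25 features each (20 profile values, 3 secondary structure, 2 solvent accessibility), plus one value for the separation between i and j. The sketch below assembles such a vector. Zero-padding for window positions that fall off the ends of the sequence and encoding the separation as a single number are assumptions, not details given on the slide.

```python
PROFILE, SS, SA = 20, 3, 2
PER_RESIDUE = PROFILE + SS + SA          # 25 features per position
WINDOW = (-2, -1, 0, 1, 2)               # i-2 .. i+2 and j-2 .. j+2

def pair_input(features, i, j):
    """Build the input vector for residue pair (i, j).

    features[k] is the list of 25 numbers describing residue k.
    Positions outside the sequence are zero-padded (an assumption).
    """
    vec = []
    for center in (i, j):
        for offset in WINDOW:
            k = center + offset
            if 0 <= k < len(features):
                vec.extend(features[k])
            else:
                vec.extend([0.0] * PER_RESIDUE)
    vec.append(abs(i - j))               # sequence separation |i-j|
    return vec
```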
70
An Example Target
Protein 1VJG
Beta-Residue Pairing Map (Target Matrix)
71
An Example Output
72
Stage 2: Beta-Strand Alignment
  • Use output probability matrix as scoring matrix
  • Dynamic programming
  • Disallow gaps and use simplified searching
    algorithms

[Figure: anti-parallel alignment (strand 1..m against n..1) and parallel alignment (strand 1..m against 1..n)]
Total number of alignments: 2(m+n-1)
73
Strand Alignment and Pairing Matrix
  • The alignment score (Pseudo Binding Energy) is
    the sum of the probabilities of paired residues.
  • The best alignment is the alignment with maximum
    score.
  • Strand Pairing Matrix.

Strand Pairing Matrix of 1VJG
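
With gaps disallowed, aligning two strands of lengths m and n reduces to trying every relative shift in both directions, which gives the 2(m+n-1) candidates counted on the previous slide. The following sketch enumerates them and keeps the one with the highest sum of pairing probabilities. The function names and the 0-based indexing are illustrative choices, not the authors' code.

```python
def ungapped_alignments(m, n):
    """Yield (direction, pairs) for every gap-free alignment of two strands.

    Residues are indexed 0..m-1 on the first strand and 0..n-1 on the second.
    """
    for direction in ("parallel", "antiparallel"):
        for shift in range(-(m - 1), n):
            pairs = []
            for a in range(m):
                b = a + shift
                if 0 <= b < n:
                    # For antiparallel pairing the second strand runs backwards.
                    pairs.append((a, b if direction == "parallel" else n - 1 - b))
            yield direction, pairs

def best_alignment(prob, m, n):
    """Return (score, direction, pairs) maximizing the summed pairing probability.

    prob[a][b] is the predicted pairing probability of residue a on the
    first strand with residue b on the second strand.
    """
    scored = ((sum(prob[a][b] for a, b in pairs), direction, pairs)
              for direction, pairs in ungapped_alignments(m, n))
    return max(scored, key=lambda item: item[0])
```

The maximum score for each pair of strands is what would populate one entry of the strand pairing matrix.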
74
Stage 3: Prediction of Beta-Strand Pairings and
Beta-Sheet Architecture
Strand Pairing Constraints
75
Minimum Spanning Tree Like Algorithm
Strand Pairing Graph (SPG)
Goal: Find a set of connected subgraphs that
maximize the sum of pseudo-energy and
satisfy the constraints.
Algorithm: Minimum Spanning Tree Like Algorithm.
76
Example of MST Like Algorithm
Assembly of beta-strands
Strands 1 to 7
Step 1: Pair strands 4 and 5
Strand Pairing Matrix of 1VJG
     1    2    3    4    5    6    7
1    0
2    1.3  0
3    .94  .37  0
4    .02  .02  .04  0
5    .02  .02  .03  1.9  0
6    .10  .05  .74  .04  .04  0
7    .02  .02  .03  .02  .02  .20  0
[Figure: strands assembled so far: 4-5]
77
Example of MST Like Algorithm
Assembly of beta-strands
Strands 1 to 7
Step 2: Pair strands 1 and 2
[Figure: strands assembled so far: 4-5 and 2-1, N-terminus marked; shown with the Strand Pairing Matrix of 1VJGA, identical to the matrix on the Step 1 slide]
78
Example of MST Like Algorithm
Assembly of beta-strands
Strands 1 to 7
Step 3: Pair strands 1 and 3
[Figure: strands assembled so far: 4-5 and 2-1-3, N-terminus marked; shown with the Strand Pairing Matrix of 1VJGA, identical to the matrix on the Step 1 slide]
79
Example of MST Like Algorithm
Assembly of beta-strands
Strands 1 to 7
Step 4: Pair strands 3 and 6
[Figure: strands assembled so far: 4-5 and 2-1-3-6, N-terminus marked; shown with the Strand Pairing Matrix of 1VJGA, identical to the matrix on the Step 1 slide]
80
Example of MST Like Algorithm
Assembly of beta-strands
Strands 1 to 7
Step 5: Pair strands 6 and 7
[Figure: final assembly: 4-5 and 2-1-3-6-7, N- and C-termini marked; shown with the Strand Pairing Matrix of 1VJGA, identical to the matrix on the Step 1 slide]
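
The five assembly steps in this example can be reproduced with a simple greedy procedure: visit strand pairs in order of decreasing score and accept a pair unless it violates a constraint. The constraints slide did not survive in the transcript, so the two rules used here (at most two partners per strand and a minimum score of 0.1) are assumptions chosen for illustration. The authors' actual constraint set may differ.

```python
# Lower triangle of the 1VJG strand pairing matrix from the slides,
# keyed by (row strand, column strand).
SCORES = {
    (2, 1): 1.3,
    (3, 1): 0.94, (3, 2): 0.37,
    (4, 1): 0.02, (4, 2): 0.02, (4, 3): 0.04,
    (5, 1): 0.02, (5, 2): 0.02, (5, 3): 0.03, (5, 4): 1.9,
    (6, 1): 0.10, (6, 2): 0.05, (6, 3): 0.74, (6, 4): 0.04, (6, 5): 0.04,
    (7, 1): 0.02, (7, 2): 0.02, (7, 3): 0.03, (7, 4): 0.02, (7, 5): 0.02,
    (7, 6): 0.20,
}

def greedy_pairing(scores, max_partners=2, min_score=0.1):
    """Greedily accept the highest-scoring strand pairs.

    max_partners and min_score are hypothetical constraints, not taken
    from the slides.
    """
    partners = {}
    chosen = []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (a, b), score in ranked:
        if score < min_score:
            break                        # remaining pairs are too weak
        if (len(partners.get(a, ())) >= max_partners
                or len(partners.get(b, ())) >= max_partners):
            continue                     # one strand already has enough partners
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
        chosen.append((min(a, b), max(a, b)))
    return chosen
```

Under these assumptions the pair (2, 3), which scores .37, is skipped because strand 3 is already paired with strands 1 and 6 by the time it would be considered.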
81
Beta-Residue Pairing Results
  • Sensitivity = Specificity = 41%
  • Base-line: 2.3%. Ratio of improvement: 17.8.
  • ROC area: 0.86
  • At 5% FPR, TPR is 58%
  • CMAPpro
  • Spec. and Sens. are 27%. ROC area = 0.8.
  • TPR = 42% at 5% FPR.

82
Strand Pairing Results
  • Naïve algorithm of pairing all adjacent strands
  • Specificity: 42%
  • Sensitivity: 50%
  • MST like algorithm
  • Specificity: 53%
  • Sensitivity: 59%
  • >20% of correctly predicted strand pairs are
    non-adjacent strand pairs

83
Strand Alignment Results
On the correctly predicted pairs
        Pairing Direction   Align. All   Align. Anti-P   Align. Para.   Align. Bridge
Acc.    93                  72           69              71             88
On all native pairs
        Pairing Direction   Align. All   Align. Anti-P   Align. Para.   Align. Bridge
Acc.    84                  66           63              66             73
  • Pairing direction accuracy is 15% higher than
    that of a random algorithm.
  • Alignment accuracy is improved by >15%.

84
Application and Future Work
  • New methods for beta-residue pairings (e.g.
    Linear Programming, SVM), and for strand alignment
    and pairings. More inputs (Punta and Rost, 2005).
  • Applications
  • AB-Initio Structure Sampling (beta-sheet)
  • Fold Recognition (conservation of beta-sheets)
  • Contact Map
  • Model Refinement (pairing direction/alignment)
  • Web server and dataset
  • http://www.ics.uci.edu/~baldig/betasheet.html

85
A New Fold Example (CASP6)
  • 1S12 (T0201, 94 residues)

True SS
CEEEEECCCEEEEECCCCCHHHHHHHHHHHHHHHHHHHHCCCEEEEEECC
EEEEEECCCCHHHHHHHHHHHHHHHHHHHHCCCCEEEEECCCCCC
Predicted SS
CEEEEEECCEEEECCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHEHHCCC
CEEEEHHHHHHHHHHHHHHHHHHHHHHHHHCCCCEEEEEEECCC
True: 1-2, 2-4, 3-4, 1-5
Strand Pairing Matrix
     1    2     3    4    5
1    0    1.71  .05  .29  .33
2         0     .06  .41  .12
3               0    .22  .04
4                    0    .53
5                         0
Predicted: 1-2, 2-4, 3-4, 4-5
[Figure: structure with strands labeled in the order 5, 1, 2, 4, 3; rendered in RasMol]
86
ACKNOWLEDGMENTS
  • UCI:
  • Gianluca Pollastri, Pierre-Francois Baisnee,
    Michal Rosen-Zvi
  • Arlo Randall, S. Joshua Swamidass, Jianlin Cheng,
    Yimeng Dou, Yann Pecout, Mike Sweredoski,
    Alessandro Vullo, Lin Wu
  • James Nowick, Luis Villareal
  • DTU: Soren Brunak
  • Columbia: Burkhard Rost
  • U of Florence: Paolo Frasconi
  • U of Bologna: Rita Casadio, Piero Fariselli
  • www.igb.uci.edu/
  • www.ics.uci.edu/~pfbaldi

87
1DFN Defensin
88
A Perfectly Predicted Example
Sequence with cysteine positions identified:
MSNHTHHLKFKTLKRAWKASKYFIVGLSC29LYKFNLKSLVQTALST
LAMITLTSLVITAIIYISVGNAKAKPTSKPTIQQTQQPQNHTSPFFTEHN
YKSTHTSIQSTTLSQLLNIDTTRGITYGHSTNETQNRKIKGQSTLPATRK
PPINPSGSIPPENHQDHNNFQTLPYVPC173STC176EGNLAC182LSLC186
HIETERAPSRAPTITLKKTPKPKTTKKPTKTTIHHRT
SPETKLQPKNNTATPQQGILSSTEHHTNQSTTQI
Length: 257. Total number of cysteines: 5.
Four bonded cysteines form two disulfide bonds:
173 ------- 186 (red cysteine pair)
176 ------- 182 (blue cysteine pair)
Prediction results from DIpro (http://contact.ics.uci.edu/bridge.html):
Predicted bonded cysteines: 173, 176, 182, 186
Predicted disulfide bonds:
Bond_Index   Cys1_Position   Cys2_Position
1            173             186
2            176             182
Prediction accuracy for both bond state and bond pair is 100%.
89
A Hard Example with Many Non-Bonded Cysteines
Sequence with cysteine positions identified:
MTLGRRLAC9LFLAC14VLPALLLGGTALASEIVGGRRARPHAWP
FMVSLQLRGGHFC55GATLIAPNFVMSAAHC71VANVNVRAVRVVL
GAHNLSRREPTRQVFAVQRIFENGYDPVNLLNDIVILQLNGSATINANVQ
VAQLPAQGRRLGNGVQC151LAMGWGLLGRNRGIASVLQELNVTVVTS
LC181RRSNVC187TLVRGRQAGVC198FGDSGSPLVC208N
GLIHGIASFVRGGC223ASGLYPDAFAPVAQFVNWIDSIIQRSEDNPC254
PHPRDPDPASRTH
Length: 267. Total cysteine number: 11.
Eight bonded cysteines form four disulfide bonds:
55 ----- 71 (Red), 151 ----- 208 (Blue), 181 ----- 187 (Green), 198 ----- 223 (Purple)
Prediction results from DIpro (http://contact.ics.uci.edu/bridge.html):
Predicted bonded cysteines: 9, 14, 55, 71, 181, 187, 223, 254
Predicted disulfide bonds:
Bond_Index   Cys1_Position   Cys2_Position
1            55              71     (correct)
2            9               14     (wrong)
3            223             254    (wrong)
4            181             187    (correct)
Bond state recall = 5 / 8 = 0.625, bond state precision = 5 / 8 = 0.625
Pair recall = 2 / 4 = 0.5, pair precision = 2 / 4 = 0.5
Bond number is predicted correctly.
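
The four figures on this slide can be checked directly from the true and predicted bond lists. The sketch below uses the cysteine positions given above. The helper names are illustrative and not part of DIpro.

```python
# True and predicted disulfide bonds for this example, from the slide.
TRUE_PAIRS = {(55, 71), (151, 208), (181, 187), (198, 223)}
PRED_PAIRS = {(55, 71), (9, 14), (223, 254), (181, 187)}

def bonded_cysteines(pairs):
    """Set of cysteine positions that take part in any bond."""
    return {position for pair in pairs for position in pair}

def precision_recall(true_items, pred_items):
    """Return (precision, recall) of a predicted set against the true set."""
    hits = len(true_items & pred_items)
    return hits / len(pred_items), hits / len(true_items)

# Bond state: is each cysteine bonded or not?
state_precision, state_recall = precision_recall(
    bonded_cysteines(TRUE_PAIRS), bonded_cysteines(PRED_PAIRS))

# Bond pair: is each predicted pair an actual disulfide bond?
pair_precision, pair_recall = precision_recall(TRUE_PAIRS, PRED_PAIRS)
```

Cysteine 223 counts as a bond-state hit even though its predicted partner (254) is wrong, which is why bond-state scores are higher than pair scores.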
90
Prediction Accuracy on SP51 Dataset on All
Cysteines
Bond Num   Bond State Recall (%)   Bond State Precision (%)   Pair Recall (%)   Pair Precision (%)
1          91                      46                         74                39
2          93                      77                         61                51
3          90                      74                         54                45
4          77                      87                         52                59
5          71                      86                         33                42
6          65                      84                         27                34
7          63                      85                         36                55
8          66                      89                         27                41
9          60                      83                         23                35
10         55                      86                         30                45
11         62                      86                         34                47
12         67                      97                         17                23
15         50                      94                         27                50
16         82                      99                         11                13
17         61                      96                         22                33
18         50                      82                         6                 9
19         47                      90                         11                20
Overall bond state recall: 78%. Overall bond state precision: 74%.
Bond number prediction accuracy: 53%.
Average difference between true bond number and predicted bond number: 1.1.
91
CURRENT WORK
  • Feedback
  • Ex: SS → Contacts → SS → Contacts
  • Homology, homology, homology
  • SSpro 4.0 performs at 88%

92
(No Transcript)