Computational Analysis of Protein-DNA Interactions

1
Computational Analysis of Protein-DNA Interactions
  • Changhui (Charles) Yan
  • Department of Computer Science
  • Utah State University

2
Problem I
  • Identifying amino acid residues involved in
    protein-DNA interactions from sequence

3
Materials And Methods
  • 56 double-stranded DNA binding proteins
    previously used in the study of Jones et al.
    (2003)
  • Encoding

4
Materials And Methods
5
Naïve Bayes Classifier
  • Leave-one-out cross-validation
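
A minimal sketch of how a Naïve Bayes classifier could be evaluated with leave-one-out cross-validation over proteins. The slides do not give the encoding details, so the sliding window of residue identities, the window size, the Laplace smoothing and the toy sequences below are all assumptions made for illustration, not the author's settings.

```python
import math
from collections import defaultdict

WINDOW = 2  # residues on each side of the target residue (assumed value)
PAD = "-"   # padding symbol for window positions beyond the termini

def windows(seq):
    """Yield one fixed-width window of residue identities per position."""
    padded = PAD * WINDOW + seq + PAD * WINDOW
    for i in range(len(seq)):
        yield padded[i:i + 2 * WINDOW + 1]

def train(proteins):
    """Count class frequencies and per-window-position residue frequencies."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)  # key: (label, window position, residue)
    for seq, labels in proteins:
        for win, label in zip(windows(seq), labels):
            class_counts[label] += 1
            for pos, aa in enumerate(win):
                feat_counts[(label, pos, aa)] += 1
    return class_counts, feat_counts

def predict(model, seq, alpha=1.0, n_symbols=21):
    """Label each residue '1' (binding) or '0' by the larger log posterior."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    out = []
    for win in windows(seq):
        best, best_score = None, -math.inf
        for label, n in class_counts.items():
            score = math.log(n / total)
            for pos, aa in enumerate(win):
                # Laplace smoothing so unseen residues do not zero the product
                count = feat_counts.get((label, pos, aa), 0)
                score += math.log((count + alpha) / (n + alpha * n_symbols))
            if score > best_score:
                best, best_score = label, score
        out.append(best)
    return "".join(out)

def leave_one_out(proteins):
    """Train on all proteins but one, predict the held-out one, repeat."""
    preds = []
    for i, (seq, _) in enumerate(proteins):
        model = train(proteins[:i] + proteins[i + 1:])
        preds.append(predict(model, seq))
    return preds

# Toy data (hypothetical): sequence plus per-residue labels, '1' = binding
TOY = [
    ("MKRRAKLT", "00111000"),
    ("GSKRKAEL", "00111000"),
    ("AEKRRGLV", "00111000"),
]
PREDICTIONS = leave_one_out(TOY)
```

Holding out a whole protein at a time, rather than single residues, keeps windows from the test protein out of the training counts.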

7
Leave-One-Out Cross-Validations
                          Sequence-based                    Sequence/structure-based
                          Identities (ID)   ID + entropy    ID + rASA   ID + rASA + entropy
Correlation coefficient   0.25              0.29            0.28        0.30
Accuracy (%)              77                75              76          77
Specificity (%)           37                37              36          39
Sensitivity (%)           43                53              51          52
8
Predictions in The Context of 3-D Structures
[Figure panels: Actual, Predicted]
  • Pit-1, PDB 1au7
  • TP = 30
  • FP = 16
  • TN = 86
  • FN = 14
  • CC = 0.51 (2nd)
  • Accuracy = 79%

9
Predictions in The Context of 3-D Structures
[Figure panels: Predicted, Actual]
  • λ-Cro, PDB 6cro
  • TP = 10
  • FP = 5
  • TN = 34
  • FN = 10
  • CC = 0.37 (19th)
  • Accuracy = 73%


10
Predictions Compared With PROSITE Motifs
  • Predicted binding sites substantially overlap
    with 34 of the 37 DNA-binding PROSITE motifs
  • In 52 of the 56 proteins, the predictor
    identifies at least 20% of the DNA-binding
    residues
  • 28 of the 56 proteins contain no PROSITE motifs
    that are annotated as DNA-binding

11
Comparison With Previous Study
Method                    Naïve Bayes classifier   Ahmad and Sarai method
Correlation coefficient   0.26                     0.23
Accuracy (%)              80                       66
Specificity (%)           29                       21
Sensitivity (%)           48                       68
Ahmad, S. and Sarai, A. (2005) PSSM-based
prediction of DNA binding sites in proteins. BMC
Bioinformatics, 6, 33.
12
Summary
  • A simple sequence-based Naïve Bayes classifier
    predicts interface residues in DNA-binding
    proteins with 75% accuracy, 37% specificity, 53%
    sensitivity and a correlation coefficient of 0.29
  • Predicted binding sites
      • correctly indicate the locations of actual
        binding sites
      • substantially overlap with known PROSITE motifs

13
Problem II
  • Identification of Helix-Turn-Helix (HTH)
    DNA-binding motifs

14
HTH Motifs
  • Sequences sharing low similarities can fold into
    a similar HTH structure
  • Identifying HTH motifs from sequence is extremely
    challenging

15
Trick 1
  • Including more information
  • Amino acid sequence
  • Secondary structure

16
Hidden Markov Model (HMM)
LQQITHIANQL-GLE----KDVVRVWF
17
Hidden Markov Model (HMM_AA_SS)
LQQITHIANQL-GLE----KDVVRVWF
HHHEEHEEEHMHE----HHEEMMEH
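
The slides do not say how HMM_AA_SS combines the two tracks. One plausible reading, shown here purely as an assumption, is that each aligned column emits a composite symbol made of the residue and its secondary-structure state. The helper name and the secondary-structure string are hypothetical toy values.

```python
def pair_symbols(aa_seq, ss_seq):
    """Combine aligned residue and secondary-structure strings column by column."""
    if len(aa_seq) != len(ss_seq):
        raise ValueError("sequence and secondary-structure strings must be aligned")
    return [aa + ss for aa, ss in zip(aa_seq, ss_seq)]

AA = "LQQITHIANQL"  # first residues of the sequence shown on the slide
SS = "HHHHHHHHHCC"  # made-up states for illustration only (H = helix, C = coil)
PAIRED = pair_symbols(AA, SS)
```

A composite alphabet grows as the product of the two alphabet sizes, which is one reason a smaller amino acid alphabet can be attractive for such a model.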
18
Trick 2
  • There are similarities among the 20 naturally
    occurring amino acids
  • Reduced alphabets

19
Reduced Alphabets
  • Schemes for reducing amino acid alphabet based on
    the BLOSUM50 matrix by Henikoff and Henikoff
    (1992) derived by grouping and averaging the
    similarity matrix elements as described in the
    text. (Murphy et al. 2000)
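
A short sketch of re-encoding a sequence with a reduced alphabet. The grouping below is the 10-letter scheme as it is commonly quoted from Murphy et al. (2000); it is reproduced from memory and should be checked against the paper before use. Representing each group by its first letter is an arbitrary choice made for this example.

```python
# Assumed Murphy_10 grouping; verify against Murphy et al. (2000)
MURPHY_10_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every amino acid to the first letter of its group
AA_TO_REDUCED = {aa: group[0] for group in MURPHY_10_GROUPS for aa in group}

def reduce_sequence(seq, mapping=AA_TO_REDUCED):
    """Re-encode a sequence; unknown symbols such as gaps pass through unchanged."""
    return "".join(mapping.get(aa, aa) for aa in seq)

# Ungapped version of the sequence shown on the HMM slides
REDUCED = reduce_sequence("LQQITHIANQLGLEKDVVRVWF")
```

Under this mapping the example sequence becomes LEELSHLAEELGLEKELLKLFF, so residues with similar properties such as L, I and V now share one symbol.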

20
Cross-Families Evaluations
Model (alphabet [3])       True positives [1]   False positives [2]
HMM_AA                     3                    0
HMM_AA_SS (20 letters)     227                  0
HMM_AA_SS (Murphy_15)      474                  0
HMM_AA_SS (Murphy_10)      470                  3
HMM_AA_SS (Murphy_8)       431                  5
  1. True positive: HTH motifs that are correctly
    identified as such.
  2. False positive: non-HTH motifs that are
    identified as HTH motifs.
  3. The alphabet used to encode amino acid sequences.

21
Questions
22
Within-family Three-Fold Cross-Validations
Family (number of HTH motifs in the family)   HMM_AA   HMM_AA_SS (Murphy_15)
PF00126 (1635)                                1594     1622
PF00165 (90)                                  63       80
PF00196 (30)                                  26       30
PF04545 (164)                                 137      164
PF01022 (42)                                  39       39
PF00046 (189)                                 176      188
PF03965 (48)                                  48       48
.
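
The counts in this table are easier to compare as recognition rates. The sketch below uses only the rows listed on the slide, which may not be the complete table, and the function name is hypothetical.

```python
# family: (motifs in family, recognized by HMM_AA, recognized by HMM_AA_SS Murphy_15)
WITHIN_FAMILY = {
    "PF00126": (1635, 1594, 1622),
    "PF00165": (90, 63, 80),
    "PF00196": (30, 26, 30),
    "PF04545": (164, 137, 164),
    "PF01022": (42, 39, 39),
    "PF00046": (189, 176, 188),
    "PF03965": (48, 48, 48),
}

def recognition_rates(table):
    """Return {family: (HMM_AA rate, HMM_AA_SS rate)} as fractions of the family size."""
    return {fam: (aa / n, aa_ss / n) for fam, (n, aa, aa_ss) in table.items()}

RATES = recognition_rates(WITHIN_FAMILY)

# Pooled counts over the listed families only
TOTAL = sum(n for n, _, _ in WITHIN_FAMILY.values())
TOTAL_AA = sum(aa for _, aa, _ in WITHIN_FAMILY.values())
TOTAL_AA_SS = sum(aa_ss for _, _, aa_ss in WITHIN_FAMILY.values())
```

Over the seven listed families, HMM_AA recognizes 2083 of 2198 motifs and HMM_AA_SS (Murphy_15) recognizes 2171 of 2198.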
23
Comparisons of HMM_AA_SS with FFAS03 in
Cross-Family Evaluations
Total HTH motifs                          563
Recognized by both FFAS03 and HMM_AA_SS   135
Recognized by FFAS03 only                 24
Recognized by HMM_AA_SS only              71
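
The per-method totals follow from the overlap counts by simple arithmetic, sketched here with hypothetical variable names.

```python
TOTAL_MOTIFS = 563
BOTH = 135            # recognized by both FFAS03 and HMM_AA_SS
FFAS03_ONLY = 24
HMM_AA_SS_ONLY = 71

ffas03_total = BOTH + FFAS03_ONLY             # all motifs FFAS03 recognizes
hmm_aa_ss_total = BOTH + HMM_AA_SS_ONLY       # all motifs HMM_AA_SS recognizes
either = BOTH + FFAS03_ONLY + HMM_AA_SS_ONLY  # recognized by at least one method
neither = TOTAL_MOTIFS - either               # missed by both methods
```

This gives 159 motifs for FFAS03 and 206 for HMM_AA_SS, with 333 of the 563 motifs recognized by neither method.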
24
Putative HTH motifs in Ureaplasma parvum
Protein                  Location   Annotation from UniProt
sp|Q9PQE5|SCPB_UREPA     176-214    Participates in chromosomal partition during cell division
sp|Q9PQV6|RPOB_UREPA     540-587    DNA-directed RNA polymerase
sp|Q9PR27|SYY_UREPA      340-380    Tyrosyl-tRNA synthetase
sp|Q9PQC2|SYA_UREPA      217-265    Alanyl-tRNA synthetase
sp|Q9PQ74|DPO3A_UREPA    365-400    DNA polymerase III subunit alpha
sp|Q9PQX7|Y166_UREPA     507-553    Hypothetical protein