SPAM Project IV - PowerPoint PPT Presentation

SPAM Project IV, 03-05-2003

1
SPAM Project IV
Julia Ponomarenko
San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, USA
2
Main Aims of the Project IV
1. Improve the CE and MC-CE pairwise and multiple structure comparison algorithms, respectively
2. With the improved structure alignments, characterize structures according to new and revised domain assignments and associated domain-level annotation
3. Provide all algorithms and associated annotated domain databases to a worldwide community
Projects, specific studies performed, and components developed
  • Enhancing structural alignment with interaction patterns for DNA-binding domains
  • Study and fully automated classification of DNA-binding protein domains (accomplished and published in Bioinformatics, 2002)
  • The classification and annotation resource for DNA-binding protein domains has been built and made available to the community
  • Study of the enhanced CE algorithm: (i) detailed representation of residues, (ii) multiple HSPs
  • Study of structural and functional annotation and classification of three-helical bundle motifs (3HB) containing HTH DNA-binding motifs (in progress)
  • Literature review of structural and functional specifics of HTH and 3HB motifs
  • Analysis of HTH and 3HB motifs, and associated domains in PDB, from the point of view of their annotation and classification
3
(No Transcript)
4
(No Transcript)
5
Nuclear ribonucleoprotein A1
Papillomavirus-1 E2
6
Nuclear ribonucleoprotein A1
Papillomavirus-1 E2
7
SCOP (different superfamilies): 2bopa d.58.8.1; 2up1a d.58.7.1
CATH (similar homologous superfamily): 2bopa 3.30.70.330; 2up1a 3.30.70.330
DALI (different functional families): 2bopa DC_5_3_15; 2up1a DC_5_3_26
Transcription regulator, binds dsDNA
Telomere length regulator, binds ssDNA
8
Study and Classification of DNA-binding protein domains: data and algorithms used and developed
  • PDB: Protein Data Bank, used as the source of original structural data
  • PDP: Protein Domain Parser (Alexandrov and Shindyalov, Bioinformatics, 2003)
  • CE: protein structure alignment by Combinatorial Extension (Shindyalov, Bourne, 1998)
  • Enhanced structural alignment: optimization of the CE-based alignment using DNA-binding interaction patterns
  • Domain classification algorithm using a composite scoring function whose parameters represent domain structural similarity and matching of interaction patterns
  • Comparative analysis methodology for domain classifications using a 2x2 table representation and a set of seven statistical coefficients

9
Building representative set of DNA-binding domains
PDB: 17,304 entries (02/13/2002); 19,006 entries (10/22/2002)
[Flow chart; counts shown: 983; 805 chains; 1,547 domains; 1,254; 1,085 domains; 338 domains; 399]
Calculating classification of DNA-binding protein domains
10
Selection of DNA-binding protein chains/domains by analyzing DNA-protein contacts
  • The DNA fragment is at least 5 bp long.
  • At least 5 different protein residues are involved in the interaction with DNA.
  • The contact distance cutoff between interacting atoms was < 5 Å.
  • We did not take into account the different types of DNA (A, B, Z) because this annotation is insufficient in the PDB.
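
The selection criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the function names, the input layout (lists of residue ids with atom coordinates), and the brute-force distance search are all hypothetical; only the three thresholds come from the slide.

```python
import math

# Thresholds taken from the slide.
CONTACT_CUTOFF = 5.0        # Å, atom-atom distance must be below this
MIN_CONTACT_RESIDUES = 5    # distinct protein residues contacting DNA
MIN_DNA_BP = 5              # minimum DNA fragment length in base pairs


def contact_residues(protein_atoms, dna_atoms, cutoff=CONTACT_CUTOFF):
    """Return the set of residue ids having at least one atom within `cutoff` of a DNA atom.

    protein_atoms: list of (residue_id, (x, y, z))  -- hypothetical input layout
    dna_atoms:     list of (x, y, z)
    """
    residues = set()
    for res_id, coords in protein_atoms:
        if any(math.dist(coords, d) < cutoff for d in dna_atoms):
            residues.add(res_id)
    return residues


def is_dna_binding(protein_atoms, dna_atoms, dna_length_bp):
    """Apply the slide's criteria: DNA length and number of contacting residues."""
    if dna_length_bp < MIN_DNA_BP:
        return False
    return len(contact_residues(protein_atoms, dna_atoms)) >= MIN_CONTACT_RESIDUES
```

A real pipeline would read coordinates from PDB files and would likely use a spatial index instead of the quadratic search shown here.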

11
Representatives are different from each other as defined by the following criteria:
  • Rmsd, root mean squared deviation between two aligned and compared protein domains, > 2.0 Å
  • Z-score, statistical score obtained from CE, < 4.5
  • Sequence identity in the alignment, < 90%
  • Rnar, ratio of the number of structurally aligned residues to the smallest domain length, < 90%
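
As a rough sketch, the criteria can be written as a predicate over one pairwise alignment. The slide does not say whether the criteria are combined with "and" or "or", so the `require_all` switch below is an assumption, as are the class and field names; percentages are assumed for sequence identity and Rnar.

```python
from dataclasses import dataclass


@dataclass
class PairAlignment:
    """Hypothetical container for the CE output of one domain pair."""
    rmsd: float          # Å
    z_score: float       # CE Z-score
    seq_identity: float  # percent, within the alignment
    rnar: float          # percent, aligned residues / smallest domain length


def is_different(aln, require_all=False):
    """Check the slide's thresholds for two domains being distinct representatives.

    require_all=False treats any single criterion as sufficient (an assumption);
    require_all=True demands that every criterion holds.
    """
    checks = [
        aln.rmsd > 2.0,
        aln.z_score < 4.5,
        aln.seq_identity < 90.0,
        aln.rnar < 90.0,
    ]
    return all(checks) if require_all else any(checks)
```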

12
  • Structural comparison of representative DNA-binding protein domains to each other was performed using the CE algorithm. Two classes of parameters measuring domain similarity are considered:
  • Parameters measuring structural similarity: Rmsd, Z-score, Rnar
  • Parameter measuring the match between DNA-protein contact patterns: Rmat
[Figure legend: residue contacts with DNA; matching contacts]
A and B: DNA-binding protein domains
RmatX: ratio of the number of matched (structurally aligned) contact residues to the total number of residues involved in contacts with DNA in protein X
Rmat = min(RmatA, RmatB)
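
A minimal sketch of the Rmat calculation follows. It assumes that a contact residue counts as "matched" when the structural alignment pairs it with a residue that also contacts DNA in the other domain; the slide does not spell this out, and the function name and input layout are hypothetical.

```python
def rmat(contacts_a, contacts_b, aligned_pairs):
    """Return Rmat = min(RmatA, RmatB) as a fraction between 0 and 1.

    contacts_a, contacts_b: sets of residue ids contacting DNA in domains A and B
    aligned_pairs: iterable of (residue_in_a, residue_in_b) from the structural alignment
    """
    if not contacts_a or not contacts_b:
        return 0.0  # no DNA contacts in one domain, so nothing can match
    # Assumption: a match requires both aligned residues to be contact residues.
    matched = sum(1 for a, b in aligned_pairs
                  if a in contacts_a and b in contacts_b)
    rmat_a = matched / len(contacts_a)
    rmat_b = matched / len(contacts_b)
    return min(rmat_a, rmat_b)
```

Taking the minimum means the score is limited by the domain whose contact pattern is covered least.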
13
Illustration of the parameter Rmat measuring the match of structurally aligned residues involved in interaction with DNA
1IMHC2 (199-365): NUCLEAR FACTOR OF ACTIVATED T CELLS 5
1RAMA1 (19-193): TRANSCRIPTION FACTOR NF-KB
Total number of residues involved in the interaction with DNA: 1IMHC2 18; 1RAMA1 18
Among them structurally aligned: 1IMHC2 72% (red); 1RAMA1 74% (green)
Rmat = 72%
14
Realignment using a scoring function that takes into account the structural similarity between two protein domains and the protein-DNA contact pattern
Similarity matrix
Structure similarity term
Protein-DNA contact pattern term
where m denotes a protein residue, X denotes a protein-DNA complex, and C3 is a scaling constant
15
Illustration of the result of the realignment procedure
1IMHC2 (199-365): NUCLEAR FACTOR OF ACTIVATED T CELLS 5
1RAMA1 (19-193): TRANSCRIPTION FACTOR NF-KB
Rmsd = 2.4 Å; Rnar = 82%; Rmat = 72%
16
Comparative analysis methodology for domain classifications using a 2x2 table representation and a set of statistical coefficients
  • N: counts of matches/mismatches between two classifications
  • T: true (match), F: false (mismatch)

Jaccard's coefficient = NTT / (NTT + NTF + NFT)
Yule's (colligation) coefficient = (NTT x NFF - NTF x NFT) / (NTT x NFF + NTF x NFT)
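
The two coefficients above can be computed as in the sketch below. How the 2x2 table is filled is not stated on the slide; the sketch assumes it is built over pairs of domains, where T means the pair is grouped together by a classification and F means it is not. The function names and the dictionary input are hypothetical.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Build the 2x2 table (NTT, NTF, NFT, NFF) over all domain pairs.

    labels_a, labels_b: dicts mapping the same domain ids to class labels.
    First letter refers to classification A, second to classification B.
    """
    ntt = ntf = nft = nff = 0
    for x, y in combinations(sorted(labels_a), 2):
        same_a = labels_a[x] == labels_a[y]
        same_b = labels_b[x] == labels_b[y]
        if same_a and same_b:
            ntt += 1
        elif same_a:
            ntf += 1
        elif same_b:
            nft += 1
        else:
            nff += 1
    return ntt, ntf, nft, nff


def jaccard(ntt, ntf, nft, nff):
    """Jaccard's coefficient as written on the slide (NFF is not used)."""
    denom = ntt + ntf + nft
    return ntt / denom if denom else 0.0


def yule(ntt, ntf, nft, nff):
    """Yule's coefficient as written on the slide."""
    denom = ntt * nff + ntf * nft
    return (ntt * nff - ntf * nft) / denom if denom else 0.0
```

Jaccard's coefficient ignores pairs that both classifications separate, whereas Yule's coefficient uses all four cells and ranges from -1 to 1.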
17
Comparison of the classification for 263 (from
338) DNA-binding domain representatives with SCOP
at various threshold parameters
18
Comparison of two structural classifications
accounting (A) and not accounting (B) for
protein-DNA contacts
19
Preferred choice of parameters for the best classification of representative DNA-binding domains
[Diagram for Z-score ≥ 3.5: Rmsd, Å (ticks 3, 5) versus Rnar, % (ticks 70, 85, 100); regions labeled "Similar", "Similar if Rmat ≥ 80", and "Not similar"]
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
SPDC (Structural Protein Domain Classification) Implementation
Sun HPC 10000 / 64 CPUs (parallel using MPI)
Sun HPC 4000 / 4 CPUs (single-CPU)
Building representative set of domains: 320 CPU hours
Building DNA-binding domain classification: 2 CPU hours
Fully automated updates (quarterly, monthly, weekly - current with PDB)
25
1JMCA (183-299)
SCOP b.40.4.3: single-strand DNA-binding domain
b.40: OB-fold
26
(No Transcript)
27
(No Transcript)
28
Helix-turn-helix DNA binding motif
Myb Proto-Oncogene Protein a.4.1.3
MATA-1 (homeodomain) a.4.1.1 1.10.10.6
HIN RECOMBINASE a.4.1.2 1.10.10.6
Rmsd 1.7 Å; Z-score 3.9; sequence identity 12.2%
Rmsd 2.3 Å; Z-score 3.7; sequence identity 13.5%
Rmsd 2.0 Å; Z-score 3.7; sequence identity 15.4%
29
TFIIB C-domain binds DNA
Retinoblastoma A pocket
Cell division protein kinase
E7 peptide
TFIIB N-domain binds TBP protein
Cyclin A3
Rmsd 1.2 Å; Z-score 4.6; sequence identity 18%
Rmsd 2.4 Å; Z-score 3.7; sequence identity 6%
The 3HB motif can be involved in interactions with DNA as well as with other proteins
30
Study of structural and functional annotation and
classification of three helical bundle motifs
(3HB) containing HTH DNA-binding motifs
Representative 3HB motifs containing the HTH DNA-binding motif, from Luscombe et al., 2000.
31
Study of structural and functional annotation and
classification of three helical bundle motifs
(3HB) containing HTH DNA-binding motifs
Plan
  • Comparison of the representative 3HBs with PDB structures using the enhanced CE algorithm: (i) detailed representation of residues, (ii) multiple HSPs.
  • Post-filtering of structural neighbours based on similarity with the representative 3HBs using features (solvent accessibility, polarity of environment, secondary structure, structural similarity).
  • Study of the resulting set of 3HBs: sequence similarity, structural environment, clustering, functional classification.

32
Carbamoyl phosphate synthetase
γδ-resolvase
Rmsd 1.3 Å; Z-score 4.1; sequence identity 14%
The 3HB motif in carbamoyl phosphate synthetase appears to be involved in the stabilization of the tetramer