Title: Protein DNA Interactions From interactions to function prediction Sue Jones
1Protein DNA Interactions: From interactions to function prediction
Sue Jones
- Department of Biochemistry
- University of Sussex
- 20th Sept 2004
- EMBL Lecture Course
2Outline
- Protein-DNA Interactions importance
- Structural Data
- Predicting DNA Binding Function
- Alternative Method New Perspectives
4Protein-DNA Interactions Importance
- Gene expression
- Transcription initiation (TATA binding protein)
- RNA synthesis (RNA polymerase)
- Transcription regulation (MAX protein)
- DNA repair (DNA glycosylase, oxidative DNA damage)
5Protein-DNA Interactions Importance
- DNA packaging (Histone H2A.e)
- DNA replication (Polymerases, Ligases, single
stranded binding proteins)
6Outline
- Protein-DNA Interactions importance
- Structural Data
- Predicting DNA Binding Function
- Alternative Method New Perspectives
7DNA
- DNA has structural flexibility
- Structure described by Watson-Crick B-form

Feature            B-form            A-form
Type of helix      Right-handed      Right-handed
Diameter (nm)      2.37              2.55
Rise per bp (nm)   0.34              0.29
bp per turn        10                11
Major groove       Wide, deep        Narrow, deep
Minor groove       Narrow, shallow   Wide, shallow

[Figure: B, A and Z forms of DNA]
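As a quick check on the helix parameters in the table, the pitch (the length of one full helical turn) is the rise per base pair multiplied by the number of base pairs per turn. The following is a minimal sketch of that arithmetic, assuming the rise values are in nanometres; the dictionary and function names are illustrative only.

```python
# Helix parameters as listed in the table; nanometre units are an assumption.
HELIX = {
    "B": {"rise_per_bp": 0.34, "bp_per_turn": 10},
    "A": {"rise_per_bp": 0.29, "bp_per_turn": 11},
}


def pitch(form):
    """Length of one helical turn: rise per bp multiplied by bp per turn."""
    params = HELIX[form]
    return params["rise_per_bp"] * params["bp_per_turn"]


for form in ("B", "A"):
    print(f"{form}-form pitch: {pitch(form):.2f} nm")
```

With these values the B-form comes out at about 3.40 nm per turn and the A-form at about 3.19 nm, so the A-form is the more compact helix along its axis.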
8Structural Data
- NDB: assembles and distributes structural information about nucleic acids
- 2490 structures (25/08/04)

Protein-DNA complex   Number
Double helix          593
Single strand         57

http://ndbserver.rutgers.edu
Berman et al., 1992. Biophys J 63 p751
9Protein-DNA Interactions Structure
10Protein-DNA Interactions characteristics
- Major and minor groove binding
- DNA-binding motifs
- Positively charged surface areas
- Size: ASA 618 Å² to 2833 Å²
- Conformational changes
- DNA bending
- domain movements, quaternary changes
- Nadassy et al., 1999 Biochemistry 38 p1999
- Jones et al., 1999 J.Mol.Biol. 287 p877
11Outline
- Protein-DNA Interactions importance
- Structural Data
- Predicting DNA Binding Function
- New Perspectives
12Predicting DNA Binding Function
- Knowing a protein's function is essential in understanding
  - cellular location
  - interactions
  - biochemical pathways
  - potential as drug targets
- Prediction of protein DNA-binding site given unbound protein structure
  - electrostatic patches
  - motifs
13Predicting Function from Structure
- Structural genomics: filling in the gaps of protein structure space
- Structures solved that have low sequence identity (< 30% sequence identity)
- Potentially little or no fold similarity to any currently in the PDB
- Require algorithms to make fast, reliable function predictions
14Predicting DNA Binding Function
- Easy to make matches between globally homologous structures
- Method aims to identify remote matches based on local homology of a specific motif
- Helix-Turn-Helix (HTH)
  - C-terminal helix: major groove binding
  - 1/3 of DNA-binding protein families (16/54)
15HTH Motif Proteins
Hin Recombinase (1hcr)
Catabolic Activator Protein (1j59)
16HTH Motif Dataflow
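[Flow diagram; the connections between boxes are not recoverable from the transcript. Box labels in extraction order:]
- 120 HTH PDB chains; NDB, PDB, Literature
- PFAM, SMART; 26 Hidden Markov Models
- PDB, SAM-T99, Literature, Rasmol
- 349 HTH chains; 227 HTH proteins; 28 HMMs
- 3D-Templates; 29 SREPS; 7 HREPS
- 84 NI proteins; 86 NI proteins; 232 HTH chains; 30 SREPS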
17HTH Template Library
Template (PDB entry, chain, residues)
1hcr  A  160-181
1b9m  A  32-56
1eto  A  73-95
1ais  B  1267-1293
1jhg  A  68-91
1lmb  3  31-53
1orc  0  16-36
18Template Scanning
- Scanning template library against 3D structures
- One template T (length n) is scanned against a protein P (length m): the optimal gapless superposition is calculated at each of the m - n + 1 possible positions in P and scored by RMSD
- Based on Kabsch (1976) Acta Cryst A. 32 p922
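The scanning step can be sketched as a sliding window: at each of the m - n + 1 start positions, the template is optimally superposed on the window and the RMSD is recorded. The code below is an illustrative pure-Python sketch, not the authors' implementation. It assumes one 3D point per residue (for example a C-alpha atom), and it obtains the minimum RMSD from the singular values of the covariance matrix, which gives the same optimum as the Kabsch superposition without building the rotation matrix explicitly. All function names and the example coordinates are made up for illustration.

```python
import math


def centred(points):
    """Translate points so that their centroid is at the origin."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in points]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def sym_eigenvalues(a):
    """Eigenvalues (largest first) of a symmetric 3x3 matrix, closed form."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    r = max(-1.0, min(1.0, det3(b) / 2.0))
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


def superposed_rmsd(a, b):
    """Minimum RMSD of two equal-length point sets over rotation + translation."""
    a, b = centred(a), centred(b)
    # Covariance matrix H and the symmetric product H^T H.
    h = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    hth = [[sum(h[k][i] * h[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    sing = [math.sqrt(max(0.0, e)) for e in sym_eigenvalues(hth)]
    if det3(h) < 0.0:  # allow proper rotations only, no reflection
        sing[2] = -sing[2]
    g = sum(x * x for p in a for x in p) + sum(x * x for p in b for x in p)
    return math.sqrt(max(0.0, (g - 2.0 * sum(sing)) / len(a)))


def scan(template, protein):
    """RMSD of the template against every gapless window of the protein."""
    n, m = len(template), len(protein)
    return [superposed_rmsd(template, protein[i:i + n]) for i in range(m - n + 1)]


# Example with made-up coordinates (not real C-alpha positions).
protein = [(i * math.sin(1.3 * i), 3.0 * math.cos(0.7 * i),
            1.1 * i + math.sin(i * i)) for i in range(12)]
template = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in protein[3:9]]
scores = scan(template, protein)
print(len(scores), scores.index(min(scores)))
```

Here `scan` returns one RMSD per window, and the lowest score is expected at the window the template was copied from. In a real scan, windows scoring below a chosen RMSD threshold would be reported as candidate motifs.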
19RMSD Distributions
[Histogram of frequency against RMSD; 1.6 Å marked]
- 368/8266, 3.5% false positives
- 5/84, 1.4% false negatives
20Improving Template Specificity
- Extending templates
- Assessing motif accessible surface area (ASA)
- 2 templates: 61/8264, 0.7% false positives
- ASA threshold (990 Å²): 38/8264, 0.5% false positives
- 3 false positives were actually real HTH proteins not previously annotated
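The false positive percentages follow directly from the counts given. Below is a small sketch of that arithmetic, together with a hypothetical hit filter that combines the 1.6 Å RMSD value from the distribution slide with the 990 Å² ASA threshold. The function names are illustrative, and the direction of the ASA test (keeping motifs at or above the threshold) is an assumption, since the slide only states the threshold value.

```python
RMSD_CUTOFF = 1.6       # Å, value shown on the RMSD distribution slide
ASA_THRESHOLD = 990.0   # Å², value given on this slide


def percent(count, total):
    """Express count/total as a percentage."""
    return 100.0 * count / total


def is_hit(rmsd, motif_asa):
    # Assumption: a match is kept only if the motif is sufficiently exposed.
    return rmsd < RMSD_CUTOFF and motif_asa >= ASA_THRESHOLD


print(round(percent(61, 8264), 1))  # extended templates
print(round(percent(38, 8264), 1))  # with the ASA threshold applied
```

Rounded to one decimal place these give 0.7 and 0.5, matching the figures quoted on the slide.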
21New HTH Motif 1
- DNA Methyltransferase (MGMT)
- 110-129 C-terminal domain
- d and e helices
- Site-directed mutagenesis
1mgtA
22New HTH Motif 2
1fy7A
- Histone acetyltransferase
- 368-388 C-terminal domain
- zinc finger N-terminal domain
- protein-protein interactions
- SCOP winged helix
23New HTH Motif 3
1taq
1tau
- Polymerase I
- 673-700 fingers subdomain
- DNA contacts O helix
- New HTH precedes O helix
24Generic Templates
25Generic Templates
Sequence: Full sequence HMMs (0.001)
Structure: RMSD < 1.6 Å
26Structural Genomics Targets
- Scanned template library against 30 target
structures from MCSG
Isocitrate lyase regulator transcription factor.
(Zhang et al., J. Biol. Chem. 2002)
27Summary
- Method combined structural data from NDB and PDB with sequence data from PFAM and SMART
- Structural template library of 7 HTH motifs
- RMSD threshold from optimal superposition
- Hit rate of 88%; false positive rate of 0.5%
- Recognition across families
- Template method independent of global fold similarity
- Potential to identify new DNA-binding HTH motifs
28Online Function Prediction
http://www.ebi.ac.uk/thornton-srv/databases/PDNA-pred
29Outline
- Protein-DNA Interactions importance
- Structural Data
- Predicting DNA Binding Function
- Alternative Method New Perspectives
30Alternative Statistical Model
Statistical Models for discerning protein structures containing the DNA-binding HTH motif.
McLaughlin and Berman, J. Mol. Biol. 2003 p43.
- Decision tree model to identify key structural features
  - geometric measurements of the recognition helix (RH) and of the helices and beta sheets preceding and following it
- Key features
  - High solvent accessibility of RH
  - Hydrophobic interaction between RH and the 2nd helix preceding it
- Predicting HTH motifs within the PDB
  - 98% accuracy, 0.7% false positive rate
  - Predicted new HTH motifs
31Future Perspectives
- Extend method to other DNA-binding motifs: HLH, HhH, β-ribbon
- Using electrostatic potentials with motifs to improve method
- Spatial templates for proteins that don't use discrete motifs for DNA recognition
32Acknowledgements
Mario Garcia Carles Ferrer
- Department of Energy USA
- European Bioinformatics Institute
- Rutgers The State University