Bioinformatics and Intrinsically Disordered Proteins (IDPs)
A. Keith Dunker, Biochemistry and Molecular Biology & Center for Computational Biology / Bioinformatics, Indiana University School of Medicine

Transcript and Presenter's Notes


1
Bioinformatics and Intrinsically Disordered Proteins (IDPs)
A. Keith Dunker
Biochemistry and Molecular Biology & Center for Computational Biology / Bioinformatics
Indiana University School of Medicine
  • Presented at
  • October 22, 2010

2
Outline
  • What are Intrinsically Disordered Proteins?
  • Bioinformatics Applications to IDPs
  • Why don't IDPs form structure?
  • Predicting IDPs from amino acid sequence
  • Some important results from IDP prediction
  • An improved order / disorder amino acid scale
  • Predicting phosphorylation sites
  • Disorder and function: two examples
  • Importance of bioinformatics to IDP research

3
Definitions: Intrinsically Disordered Proteins
(IDPs) and ID Regions (IDRs)
  • Whole proteins and regions of proteins are
    intrinsically disordered if they lack stable 3D
    structure under physiological conditions,
  • But exist instead as highly dynamic, rapidly
    interconverting ensembles without particular
    equilibrium values for their coordinates or bond
    angles and with non-cooperative conformational
    changes.

4
Outline
  • What are Intrinsically Disordered Proteins?
  • Bioinformatics Applications to IDPs
  • Why don't IDPs form structure?
  • Predicting IDPs from amino acid sequence
  • Some important results from IDP prediction
  • An improved order / disorder amino acid scale
  • Predicting phosphorylation sites
  • Disorder and function: two examples
  • Importance of bioinformatics to IDP research

5
Why are IDPs / IDRs unstructured?
  • From the 1950s to now, >> 1,000 IDPs / IDRs
    studied and characterized
  • Visit http://www.disprot.org
  • Why do IDPs / IDRs lack structure?
  • Lack a ligand or partner?
  • Denatured during isolation?
  • Folding requires conditions found inside cells?
  • Lack of folding encoded by amino acid sequence?

6
Amino Acid Compositions
[Figure: amino acid compositions; panel labels: Surface, Buried]
7
Why are IDPs / IDRs unstructured?
  • To a first approximation, amino acid composition
    determines whether a protein folds or remains
    intrinsically disordered.
  • Given a composition that favors folding, the
    sequence details determine which fold.
  • Given a composition that favors not folding, the
    sequence details provide motifs for biological
    function.

8
Outline
  • What are Intrinsically Disordered Proteins?
  • Bioinformatics Applications to IDPs
  • Why don't IDPs form structure?
  • Predicting IDPs from amino acid sequence
  • Some important results from IDP prediction
  • An improved order / disorder amino acid scale
  • Predicting phosphorylation sites
  • Disorder and function: two examples
  • Importance of bioinformatics to IDP research

9
Prediction of Intrinsic Disorder
Aromaticity, Hydropathy, Charge, Complexity
Neural Networks, SVMs, etc.
10
First Machine-learning Predictor: SDR/MDR/LDR
Predictors
  • Short Disordered Regions (SDR): 7-21 missing
    AA
  • Medium Disordered Regions (MDR): 22-44
  • Long Disordered Regions (LDR): 45 or more
  • SDR / MDR / LDR predictors: neural networks
  • Training dataset: proteins with missing AA
  • SDR: 34 proteins, 11,050 AA, 38 IDR, 411 IDAA
  • MDR: 20 proteins, 4,764 AA, 22 IDR, 464 IDAA
  • LDR: 7 proteins, 2,069 AA, 7 IDR, 465 IDAA
  • Feature selection: standard sequential forward
    selection
  • Accuracy: 59-67%, estimated by 5-fold
    cross-validation
  • Better than chance; better on self than on not
    self

Romero P, et al., Proc. IEEE International
Conference on Neural Networks 1: 90-95 (1997)
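
The three length classes on this slide can be written as a small helper. This is a minimal sketch rather than the authors' code: the function name `classify_region` is hypothetical, and returning `None` for runs shorter than 7 residues is an assumption, since the slide does not say how such runs were treated.

```python
from typing import Optional


def classify_region(length: int) -> Optional[str]:
    """Map the length of a run of missing residues to the slide's classes."""
    if length >= 45:
        return "LDR"  # long disordered region: 45 or more residues
    if length >= 22:
        return "MDR"  # medium disordered region: 22-44 residues
    if length >= 7:
        return "SDR"  # short disordered region: 7-21 residues
    return None       # shorter runs get no class here (assumption)
```

For example, `classify_region(30)` returns `"MDR"`.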
11
Next: PONDR VL-XT
[Diagram: VL-XT(2) built from XN(1), VL1(2), and XC(1); position labels 11, 14, N-14, N-11]
XN, VL1, and XC are neural networks
Input features: XN 8; VL1 10; XC 8
(1) Li X et al., Genome Informat. 9: 201-213 (1999)
(2) Romero P et al., Proteins 42: 38-48 (2001)
12
Inputs for PONDR VL-XT
Accuracy (ACC) = (% Corr-O)/2 + (% Corr-D)/2
ACC (estimated by cross-validation) = 72 ± 4%
Li X et al., Genome Informat. 9: 201-213 (1999)
Romero P et al., Proteins 42: 38-48 (2001)
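
The accuracy measure on this slide appears to average the percentage of correctly predicted ordered residues and the percentage of correctly predicted disordered residues. The sketch below computes it under that reading; the function name and the single-letter labels `'O'` and `'D'` are assumptions made for illustration, and both classes must be present in the true labels.

```python
from typing import Sequence


def balanced_accuracy(true_labels: Sequence[str], predicted: Sequence[str]) -> float:
    """ACC = (% Corr-O)/2 + (% Corr-D)/2 for per-residue labels 'O' and 'D'."""
    correct = {"O": 0, "D": 0}
    total = {"O": 0, "D": 0}
    for truth, guess in zip(true_labels, predicted):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    pct_ordered = 100.0 * correct["O"] / total["O"]
    pct_disordered = 100.0 * correct["D"] / total["D"]
    return pct_ordered / 2 + pct_disordered / 2
```

Averaging the two classes this way keeps a predictor that calls every residue ordered at 50, however rare disorder is in the test set.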
13
Disorder Prediction in CASP
  • Critical Assessment of Structure Prediction
  • http://predictioncenter.org
  • CASP1 (1994) to CASP9 (2010)
  • Experimentalists provide amino acid sequences as
    they are determining the structures of proteins
  • Groups register and make structure predictions
  • After structures determined, predictions
    evaluated
  • Disorder predictions introduced in CASP5 (2002)
  • CASP PREDICTIONS ARE TRULY BLIND!!!

14
Disorder Prediction in CASP
[Figure: labeled predictors: PreDisorder, VSL2]
CASP5 (2002): sensitivity replaced AUC
15
Our Performance in CASP
  • Used VL-XT in CASP5: performed poorly on short
    disordered regions, but very well on long
    disordered regions.
  • VL trained mainly on long disordered regions.
  • Changed predictor in CASP6 and CASP7; new
    predictor ranked #1. Big improvement!
  • Did not participate in CASP 8, but would not have
    ranked #1 with current predictors.
  • What was the change that led to the large
    improvement in CASP6?

16
Predictors of Natural Disordered Regions: PONDR VL-XT and PONDR VSL2
[Diagram: VL-XT(2) built from N(1), VL1(2), and C(1); position labels 11, 14, N-14, N-11. VSL2(3) built from VL2(3) with output OL, VS2(3) with output OS, and M1(3) with outputs OM and 1-OM]
VSL2 Score = OL × OM + OS × (1 - OM)
N, VL1, and C are neural networks: N-term 8 inputs; VL1 10 inputs; C-term 8 inputs
M1, VSL2-L, and VSL2-S are support vector machines: M1 54 inputs; VL2 20 inputs; VS2 20 inputs
(1) Li X et al., Genome Informat. 9: 201-213 (1999)
(2) Romero P et al., Proteins 42: 38-48 (2001)
(3) Peng K et al., BMC Bioinfo. 7: 208 (2006)
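
Reading the score line on this slide as OL × OM + OS × (1 - OM), the meta-predictor output OM acts as a per-residue weight between the long-region and short-region predictors. A minimal sketch of that combination follows; the function and argument names are hypothetical and the inputs are assumed to lie between 0 and 1.

```python
def vsl2_score(o_long: float, o_short: float, o_meta: float) -> float:
    """Blend long- and short-region outputs: OL*OM + OS*(1 - OM)."""
    return o_long * o_meta + o_short * (1.0 - o_meta)
```

When `o_meta` is 1 the score is the long-region output alone; when it is 0 the score is the short-region output alone.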
17
Comparison on CASP 8 Dataset
ACC: 80%
AUC: 0.89
AUC = Area Under Curve
ACC = (% Corr-O)/2 + (% Corr-D)/2
Zhang P, et al. (unpublished results; not quite the
same as the CASP evaluation)
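
The slide does not say which curve the AUC refers to. Assuming it is the ROC curve, the area equals the probability that a randomly chosen disordered residue receives a higher score than a randomly chosen ordered one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and inputs are hypothetical.

```python
from typing import Sequence


def roc_auc(scores_disordered: Sequence[float], scores_ordered: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for d in scores_disordered:
        for o in scores_ordered:
            if d > o:
                wins += 1.0
            elif d == o:
                wins += 0.5
    return wins / (len(scores_disordered) * len(scores_ordered))
```

The pairwise loop is quadratic in the number of residues, which is fine for illustration; a rank-based version would be preferable on full datasets.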
18
PONDR VL-XT, PONDR VSL2B and PreDisorder
[Figure: predictions for XPA; legend: Disordered, Structured]
Iakoucheva L et al., Protein Sci 3: 561-571 (2001)
Dunker AK et al., FEBS J 272: 5129-5148 (2005)
Deng X, et al., BMC Bioinformatics 10: 436 (2009)
19
Published Predictors of Disordered Proteins
PONDRs: VSL1 ranked #1 in CASP 6 (2004); VSL2 ranked #1 in CASP 7 (2006)
[Figure: axis labels: CASP (5, 6, 7, 8) and Year; other labels: PONDRs and ",- / phobics"]
He B, et al., Cell Res 19: 929-949 (2009)
20
Outline
  • What are Intrinsically Disordered Proteins
    (IDPs)?
  • Bioinformatics Applications to IDPs
  • Why don't IDPs form structure?
  • Predicting IDPs from amino acid sequence
  • Some important results from IDP prediction
  • An improved order / disorder amino acid scale
  • Predicting phosphorylation sites
  • Disorder and function: two examples
  • Importance of bioinformatics to IDP research

21
How Abundant are IDRs/IDPs?
  • To estimate the abundance of IDPs/IDRs, predict on
    whole proteomes from many organisms.
  • ALERT!!
  • Lack of membrane-protein-specific disorder
    predictors means that estimates of disorder will
    be too low by a small percentage.
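
One way to turn per-residue predictions into a proteome-level abundance estimate is sketched below. It is an illustration only: the input is a hypothetical mapping from protein name to per-residue disorder scores, and both the 0.5 score cutoff and the 30-residue minimum for a "long" region are assumed values, not figures taken from this slide.

```python
from typing import Dict, List, Tuple


def disorder_abundance(
    scores_by_protein: Dict[str, List[float]],
    threshold: float = 0.5,  # assumed cutoff for calling a residue disordered
    min_run: int = 30,       # assumed minimum length of a long disordered region
) -> Tuple[float, float]:
    """Return (fraction of disordered residues, fraction of proteins with a long IDR)."""
    total = disordered = with_long = 0
    for scores in scores_by_protein.values():
        run = longest = 0
        for score in scores:
            total += 1
            if score > threshold:
                disordered += 1
                run += 1
                longest = max(longest, run)
            else:
                run = 0
        if longest >= min_run:
            with_long += 1
    return disordered / total, with_long / len(scores_by_protein)
```

Reporting both numbers is useful because a proteome can have a modest residue-level fraction while still having many proteins that carry one long disordered region.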

22
VSL2 Prediction of Abundance of Intrinsically
Disordered Proteins
Are organism-specific predictors sometimes
needed?
23
Archaea Phylogenetic Tree
[Figure legend: >30, >21, >14, >17, <14]
Todd Lowe (http://archaea.ucsc.edu/)
24
Predicted Disorder vs. Proteome Size
25
Why So Much Disorder? Hypothesis: Disorder Used
for Signaling
  • Sequence → Structure → Function
  • Catalysis,
  • Membrane transport,
  • Binding small molecules.
  • Sequence → Disordered Ensemble → Function
  • Signaling,
  • Regulation,
  • Recognition,
  • Control.

Dunker AK, et al., Biochemistry 41: 6573-6582 (2002)
Dunker AK, et al., Adv. Prot. Chem. 62: 25-49 (2002)
Xie H, et al., J. Proteome Res. 6: 1882-1932 (2007)

26
Outline
  • What are Intrinsically Disordered Proteins
    (IDPs)?
  • Bioinformatics Applications to IDPs
  • Why don't IDPs form structure?
  • Predicting IDPs from amino acid sequence
  • Some important results from IDP prediction
  • An improved order / disorder amino acid scale
  • Predicting phosphorylation sites
  • Importance of bioinformatics to IDP research

27
A New Order / Disorder AA Scale, Part 1
  • Collect equal numbers of O and D windows of
    length 21.
  • Calculate the value of attribute, x, for each
    window.
  • For each interval of x, count how many windows
    are O and D; from this, determine P(O | x) and
    P(D | x).
  • Plot P(O | x) and P(D | x) versus x.
  • Determine the area between the two curves.
  • Area Ratio Value (ARV) = (area between curves) /
    (total area)
  • Apply to 517 AA scales
    (http://www.genome.jp/aaindex).
  • Rank scales from smallest to largest.
  • Campen A, et al., Protein Pept Lett 15: 956-963
    (2008)
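
The steps on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the published procedure: the attribute x is taken to be the mean scale value over a window, x is divided into equal-width bins, empty bins are skipped, and the total area is taken as the width of the occupied bins times 1. All function names are hypothetical.

```python
from typing import Dict, Sequence


def window_attribute(window: str, scale: Dict[str, float]) -> float:
    """Attribute x for one window: mean scale value of its residues (assumed)."""
    return sum(scale[aa] for aa in window) / len(window)


def area_ratio_value(
    x_ordered: Sequence[float], x_disordered: Sequence[float], n_bins: int = 10
) -> float:
    """ARV = area between the P(D | x) and P(O | x) curves / total area."""
    lo = min(min(x_ordered), min(x_disordered))
    hi = max(max(x_ordered), max(x_disordered))
    if hi == lo:
        return 0.0  # every window has the same x, so the curves cannot separate
    width = (hi - lo) / n_bins
    count_o = [0] * n_bins
    count_d = [0] * n_bins
    for values, counts in ((x_ordered, count_o), (x_disordered, count_d)):
        for x in values:
            index = min(int((x - lo) / width), n_bins - 1)
            counts[index] += 1
    between = total = 0.0
    for o, d in zip(count_o, count_d):
        n = o + d
        if n == 0:
            continue  # no windows in this interval of x
        between += abs(d / n - o / n) * width
        total += width
    return between / total
```

With this definition a scale that separates the O and D windows perfectly scores 1, and a scale that cannot distinguish them scores 0.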

28
A New Order / Disorder AA Scale, Part 2
  • Overall idea: make random changes to a scale,
    test for higher ARV, repeat until no larger value
    is found.
  • Genetic Algorithm Pseudocode:
  • Choose initial population
  • Repeat
  • Evaluate the fitness of each individual
  • Select a certain portion of best-ranking
    individuals
  • Breed new population through crossover and
    mutation
  • Until terminating condition
  • ARV improved from 0.69 for the best of the 517
    scales to 0.76 for the new scale, called TOP-IDP
  • Campen A, et al., Protein Pept Lett 15: 956-963
    (2008)
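
The pseudocode above can be made concrete as a short sketch. It is not the authors' implementation: the population size, the number of survivors, the Gaussian mutation, the single-point crossover, and the fixed number of generations used as the terminating condition are all assumptions chosen for illustration, and `fitness` stands in for whatever scoring function (such as the ARV) is being maximized.

```python
import random
from typing import Callable, List

Scale = List[float]  # one value per amino acid


def evolve_scale(
    fitness: Callable[[Scale], float],
    initial: Scale,
    generations: int = 50,
    pop_size: int = 30,
    keep: int = 10,
    sigma: float = 0.1,
    seed: int = 0,
) -> Scale:
    """Genetic algorithm: select the best scales, then breed by crossover and mutation."""
    rng = random.Random(seed)
    # Initial population: the starting scale plus randomly perturbed copies.
    population = [list(initial)] + [
        [value + rng.gauss(0.0, sigma) for value in initial]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):  # terminating condition: fixed generation count
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:keep]   # best-ranking individuals survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))       # single-point crossover
            child = a[:cut] + b[cut:]
            position = rng.randrange(len(child))
            child[position] += rng.gauss(0.0, sigma)  # mutate one entry
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

Because the best-ranking individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next.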

29
P(D | x) and P(O | x) Versus x Plots: Area
Between Curves Used to Rank Attributes, x
  • Extracellular Protein AA Composition: ARV 0.07, Rank 517/517
  • Flexibility: ARV 0.69, Rank 1/517
  • TOP-IDP: ARV 0.76
  • Positive Charge: ARV 0.36, Rank 238/517
Campen A et al., Protein Peptide Lett 15:
956-963 (2008)
30
Analysis of the disorder propensity in p53 by
TOP-IDP (A), PONDR VL-XT (B) and PONDR VSL1 (C).
31
Chronology of Amino Acid Evolution: DISORDER TO
ORDER, NON-LIFE TO LIFE
Di Mauro E, et al., in Genesis: Origin of Life
on Earth and Other Planets (In press)
32
Outline
  • What are Intrinsically Disordered Proteins
    (IDPs)?
  • Bioinformatics Applications to IDPs
  • Why don't IDPs form structure?
  • Predicting IDPs from amino acid sequence
  • Some important results from IDP prediction
  • An improved order / disorder amino acid scale
  • Predicting phosphorylation sites
  • Disorder and function: two examples
  • Importance of bioinformatics to IDP research

33
New Phosphorylation Predictor
  • KNN: similarity to known sites (+ / -) of
    phosphorylation
  • Disorder scores: used VSL2
  • AA frequencies at sequence positions before and
    after phosphorylation sites

Gao J et al., Mol Cell Proteomics (In press)
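
The nearest-neighbor component can be illustrated with a toy sketch. Everything specific here is an assumption: the similarity measure (fraction of identical positions), the value of k, and the equal-length sequence windows centered on the candidate residue. The slide does not state how the published predictor measures similarity or combines it with the disorder scores and amino acid frequencies.

```python
from typing import Sequence


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length windows agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_score(
    candidate: str, positives: Sequence[str], negatives: Sequence[str], k: int = 3
) -> float:
    """Fraction of the k most similar known windows that are phosphorylated."""
    labelled = [(identity(candidate, w), 1) for w in positives]
    labelled += [(identity(candidate, w), 0) for w in negatives]
    nearest = sorted(labelled, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(label for _, label in nearest) / len(nearest)
```

A score near 1 means the candidate window resembles known phosphorylated sites more than known non-phosphorylated ones.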
34
Disorder Score vs. Phosphorylation
[Figure: disorder score versus residue position; panel annotations: 54.9% > 0.5, 91.3% > 0.5, 50.5% > 0.5, 87.6% > 0.5]
Gao J et al., Mol Cell Proteomics 9 (Epub)
(2010)
35
Outline
  • What are Intrinsically Disordered Proteins
    (IDPs)?
  • Bioinformatics Applications to IDPs
  • Why don't IDPs form structure?
  • Predicting IDPs from amino acid sequence
  • Some important results from IDP prediction
  • An improved order / disorder amino acid scale
  • Predicting phosphorylation sites
  • Disorder and function: two examples
  • Importance of bioinformatics to IDP research

36
Signaling Example 1: Calcineurin and Calmodulin
[Figure labels: B-Subunit, A-Subunit, Active Site, Autoinhibitory Peptide]
Meador W et al., Science 257: 1251-1255 (1992)
Kissinger C et al., Nature 378: 641-644 (1995)
37
Example 2: p27kip1, A Disordered Domain
[Figure labels: CDK, Cyclin A, p27kip1 (69 residues), 25, 93]
3D Structure: Russo AA et al., Nature 382:
325-331 (1996)
DD: Tompa P et al., Bioessays 4: 328-340 (2008)
38
The p27kip1 Disordered Domain Used for Signal
Integration
1. NRTK phosphorylation at Y88, signal 1.
2. Intra-molecular phosphorylation at T187, signal 2.
3. Ubiquitination at several possible loci, signal 3.
4. Proteasome digestion of p27, then cell cycle
progression.
Galea CA et al., J Mol Biol 376: 827-838 (2008)
Dunker AK & Uversky VN, Nat Chem Biol 4:
229-230 (2008)
39
Outline
  • What are Intrinsically Disordered Proteins
    (IDPs)?
  • Bioinformatics Applications to IDPs
  • Predicting IDPs from amino acid sequence
  • Some important results from IDP prediction
  • An improved order / disorder amino acid scale
  • Predicting phosphorylation sites
  • Disorder and function: two examples
  • Importance of bioinformatics to IDP research

40
Importance of Bioinformatics to IDP and Protein
Research
  • Thousands of IDPs and IDRs have been found.
  • Not one IDP or IDR is discussed in any current
    biochemistry textbook!
  • Why? IDPs and IDRs don't fit the paradigm:
  • Sequence → Structure → Function
  • New paradigm developed from bioinformatics:
  • Sequence → Disordered Ensemble → Function
  • IDP prediction is changing fundamental views of
    structure-function relationships!

41
  • Thank You!

42
Collaborators
Indiana University: Bin Xue, Jake Chen, Bill Sullivan, Predrag Radivojac, Jennifer Chen, Pedro Romero, Marc Cortese, Derrick Johnson, Chris Oldfield, Amrita Mohan, Yunlong Liu, Ann Roman, Tom Hurley, Anna DePaoli-Roach, Yuro Tagaki, Siama Zaidi, Jingwei Meng, Wei-Lun Hsu, Hua Lu, Fei Huang, Vladimir Uversky
Harbin Engineering University: Bo He, Kejun Wang
University of Idaho: Celeste J. Brown, Chris Williams
Molecular Kinetics: Yugong Cheng, Tanguy LeGall, Aaron Santner
Plant and Food Research: Xaiolin Sun
USF: Gary Daughdrill
Wright State University: Oleg Paliy
UCSD: Lilia Iakoucheva Sebat
Temple University: Zoran Obradovic, Slobodan Vucetic, Vladimir Vacic, Kang Peng, Hiongbo Xie, Siyuan Ren, Uros Midic
Enzyme Institute: Peter Tompa, Zsuzsanna Dosztanyi, Istvan Simon, Monika Fuxreiter
USU: Robert Williams