BCB 444544 Introduction to Bioinformatics - PowerPoint PPT Presentation

Provided by: drena1

1
BCB 444/544 - Introduction to Bioinformatics
Lecture 31: Predicting Protein Structure by
Threading (cont.) 31_Nov6
2
Seminars in Bioinformatics/Genomics
  • Mon Nov 6
  • Sue Lamont (An Sci, ISU): Integrated genomic
    approaches to enhance host resistance to
    food-safety pathogens
  • IG Faculty Seminar, 12:10 PM in 101 Ind Ed II
  • Thurs Nov 9
  • Hassane Mchaourab (Center for Structural
    Biology, Vanderbilt): Structural dynamics of
    multidrug transporters
  • Baker Center Seminar, 2:10 PM in Howe Hall
    Auditorium
  • Sean Rice (Biol Sci, Texas Tech): Constructing an
    exact and universal evolutionary theory
  • Applied Math/EEOB Seminar, 3:45 in 210 Bessey

3
Assignments & Reading This Week
  • Mon Nov 6
    Review: Protein Structure Prediction. Ginalski et al (2005)
    Nucleic Acids Res. 33:1874, doi:10.1093/nar/gki327
  • Wed Nov 8
    1) Review: SVMs in Bioinformatics. Yang (2004)
       Briefings in Bioinformatics 5:328, doi:10.1093/bib/5.4.328
    2) SVMs: http://en.wikipedia.org/wiki/Support_Vector_Machine
    3) ANNs: http://en.wikipedia.org/wiki/Artificial_neural_network
  • Thurs Nov 9
    Lab 10: Protein Structure Prediction
  • Fri Nov 10
    Chp 8.1 - 8.4: Proteomics (previously assigned)
4
Assignments Due this week
  • BCB 444 & 544: HW5 due at noon, Mon Nov 6 (today)
  • BCB 544 only: 544Extra2 due at noon, Mon Nov 12
  • Teams: must meet with us this week
5
Deciphering the Protein Folding Code
  • Protein Structure Prediction
    or "Protein Folding" Problem
  • Given the amino acid sequence of a protein,
    predict its 3-dimensional structure (fold)
  • "Inverse Folding" Problem
  • Given a protein fold, identify every amino acid
    sequence that can adopt that 3-dimensional
    structure

6
Tertiary Structure Prediction
  • 3 Major Approaches to Protein 3-D Structure
    Prediction
  • 1 - Ab initio
  • Comparative modeling
  • 2 - Homology modeling
  • 3 - Threading
  • "Comparative modeling": the term sometimes means
    just "homology modeling" and sometimes covers
    both "homology modeling" and
    "threading/fold recognition"
  • Most approaches exploit secondary structure
    prediction as an input or filtering step
  • Recall that 2' structure prediction can be highly
    accurate (>90% on a per-residue basis)
  • You will perform 2' structure prediction in lab
    this week

7
Steps in Threading
  • Align target sequence with template structures
    (fold library) from the Protein Data Bank (PDB)
  • Calculate energy score to evaluate goodness of
    fit between target sequence & template structure
  • Rank models based on energy scores
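
The three steps can be sketched as a small ranking loop. This is only an illustration, not the method used in the lecture: the alignment step is skipped (target and templates are assumed to have equal length), each template is reduced to a hypothetical string of buried (B) and exposed (E) sites, and `toy_score`, `rank_templates`, and the hydrophobic residue set are made-up names and choices.

```python
# Assumed hydrophobic class; the slides do not list which residues count as H.
HYDROPHOBIC = set("AVLIMFWC")

def toy_score(target_seq, burial):
    """Toy energy: -1 for every hydrophobic residue placed on a buried site."""
    return -sum(1 for aa, site in zip(target_seq, burial)
                if aa in HYDROPHOBIC and site == "B")

def rank_templates(target_seq, fold_library, score):
    """Score the target against every template; lowest energy comes first."""
    scored = [(score(target_seq, template), name)
              for name, template in fold_library.items()]
    return sorted(scored)

# Hypothetical two-entry fold library (B = buried site, E = exposed site).
fold_library = {"foldA": "BEBE", "foldB": "EBEB"}
ranking = rank_templates("LKLK", fold_library, toy_score)
print(ranking)  # [(-2, 'foldA'), (0, 'foldB')]
```

Passing the scoring function as an argument keeps the ranking step independent of whichever energy model is plugged in.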

8
A Rapid Threading Approach for Protein Structure
Prediction
Kai-Ming Ho, Physics: Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004)
Three-dimensional threading approach to protein structure
recognition. Polymer 45:687-697
9
Motivations for Assumptions of Ho Threading
Algorithm
  • Goal: Develop a threading algorithm that
  • Is simple & rapid enough to be used in
    high-throughput applications
  • Is relatively "insensitive" to sequence
    similarity between the target protein sequence &
    the sequence of the template structure
    (to enhance detection of remote homologs &
    structures that are similar due to
    convergent evolution)
  • Can be used to answer questions such as
  • What are the predicted folds of all "unassigned"
    ORFs in Arabidopsis?
  • Does Arabidopsis have a protein with structure
    similar to mammalian Tumor Necrosis Factor (TNF)?
  • Assumptions
  • Native state of a protein is its lowest free energy
    state
  • Hydrophobic interactions drive protein folding

10
Simplify Template structure representation
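Contact matrix: $C_{ij} = 1$ if residues $i$ and $j$ lie within a cutoff distance (in Å) of each other (contact), 0 otherwise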
Yungok Ihm
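
A minimal sketch of reducing a template structure to a contact matrix. The cutoff value on the slide did not survive in this transcript, so it is left as a parameter here; using one coordinate per residue and counting sequence neighbors as contacts are assumptions of this example, and `contact_matrix` is a made-up name.

```python
import math

def contact_matrix(coords, cutoff):
    """Return an N x N 0/1 matrix: entry [i][j] is 1 when residues i and j
    are distinct and closer than `cutoff` (same units as coords, e.g. Å)."""
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                c[i][j] = c[j][i] = 1  # contacts are symmetric
    return c

# Made-up coordinates: four points on a line, 4 Å apart.
toy_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
C = contact_matrix(toy_coords, cutoff=5.0)  # cutoff chosen only for the toy data
for row in C:
    print(row)
```

With real data the coordinates would come from a PDB entry (for example one representative atom per residue), and the cutoff would be whatever the method specifies.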
11
Simplify Target Sequence Representation
  • Miyazawa-Jernigan (MJ) model: inter-residue
    contact energy M(i, j) is a quasi-chemical
    approximation based on pair-wise contact
    statistics extracted from known protein
    structures in the PDB (20 x 20 matrix)
  • Li-Tang-Wingreen (LTW): factorize the MJ
    interaction matrix to reduce the number of
    parameters from 210 to 20 q values associated
    with the 20 amino acids
  • Hydrophobic-Polar (HP): represent amino acids as
    either hydrophobic (H) or polar (P); utility of
    this simple binary alphabet representation
    promoted by Dill et al.

12
Simplify Energy Function
  • Interaction counts only if two hydrophobic
    amino acid residues are in contact
  • At residue level, pair-wise hydrophobic
    interaction is dominant
  • $E = \sum_{i,j} C_{ij} U_{ij}$
  • $C_{ij}$: contact matrix
  • $U_{ij} = U(\text{residue } i, \text{residue } j)$
  • MJ: $U = U_{ij}$
  • LTW: $U = Q_i Q_j$
  • HP: $U = \{1, 0\}$

Yungok Ihm
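
A small worked example of the contact energy sum with the HP choice of U. Several things here are assumptions rather than facts from the slides: each residue pair is counted once (i < j), the hydrophobic residue set is an arbitrary illustrative choice, and U is taken literally as 1 or 0, so the result is simply the number of hydrophobic-hydrophobic contacts (an attractive interaction would enter with a negative sign, which the slide does not show).

```python
# Assumed hydrophobic class; the slides do not list which residues count as H.
HYDROPHOBIC = set("AVLIMFWC")

def u_hp(a, b):
    """HP model: 1 if both residues are hydrophobic, else 0."""
    return 1 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0

def contact_energy(contacts, seq, u):
    """Sum C_ij * U(seq[i], seq[j]) over residue pairs with i < j."""
    n = len(seq)
    return sum(contacts[i][j] * u(seq[i], seq[j])
               for i in range(n) for j in range(i + 1, n))

# Made-up 4-residue template in which the residues form a ring of contacts.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
energy = contact_energy(ring, "LLKL", u_hp)
print(energy)  # 2: contacts (0,1) and (0,3) are both L-L
```

Swapping `u_hp` for a lookup into a 20 x 20 table, or for a product of per-residue q values, would give the MJ and LTW variants without changing `contact_energy`.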
13
Energy calculation: Contact energy
Li-Tang-Wingreen (LTW): 20 parameters
[Figure: contact energy computed from solubility, hydrophobicity, and the contact matrix]
Yungok Ihm
14
Summary of Ho Threading Procedure
Yungok Ihm
15
Can complexity be further reduced?
Haibo Cao
16
Examine eigenvectors of contact matrix
Haibo Cao
17
Represent contact matrix by its
dominant eigenvector (1D profile)
  • First eigenvector (with highest eigenvalue)
    dominates the overlap between sequence and
    structure
  • Higher-ranking (rank > 4) eigenvectors are
    sequence-blind

Haibo Cao
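
A minimal sketch of extracting the dominant eigenvector of a contact matrix by power iteration, using only the standard library. Power iteration is a stand-in chosen for this example; the slides do not say how the eigenvectors were computed. The iteration is applied to C + I (a shift that leaves the eigenvectors unchanged) so that it also converges when the contact graph is bipartite, and `dominant_eigenvector` is a made-up name.

```python
import math

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration on (matrix + I) for a symmetric 0/1 contact matrix.
    Returns (eigenvalue of `matrix`, unit-length eigenvector)."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        # w = (C + I) v; the shift makes the top eigenvalue strictly dominant
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue of the unshifted matrix
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return eigenvalue, v

# Toy 3-residue chain: the middle residue contacts both ends.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
lam, profile = dominant_eigenvector(chain)
print(round(lam, 4), [round(x, 4) for x in profile])
```

In the toy chain the middle residue gets the largest component, which matches the intuition that the 1D profile is largest for the most highly connected positions.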
18
Threading Alignment: Align target sequence
vector with 1D profile of template structure
Cao et al (2004) Polymer 45:687
19
Parameters for alignment?
  • Gap penalty
  • Insertion/deletion in helices or strands is
    strongly penalized; small penalties for in/dels
    in loops
  • Gap penalties do not count in energy calculation
  • Size penalty
  • If a target residue and its aligned template
    residue differ in radius by > 0.5 Å and the
    residue is involved in > 2 contacts, the
    alignment is penalized
  • Size penalties do not count in energy calculation

[Figure: example sequence ALKKGFGHFDTSE with loop and helix regions labeled]
Yungok Ihm
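
The two alignment penalties can be sketched as follows. The slide gives only the qualitative rule for gaps, so the numeric gap penalties below are hypothetical placeholders; the size thresholds (0.5 Å, more than 2 contacts) are the ones stated on the slide. The function names and the H/E/C secondary structure alphabet are assumptions of this example.

```python
# Hypothetical penalty values; only "strong in helices/strands, small in
# loops" comes from the slide.
GAP_PENALTY = {"H": 8.0, "E": 8.0, "C": 1.0}  # helix, strand, coil/loop

def gap_penalty(template_ss, position):
    """Penalty for a gap at `position` (0-based) of the template,
    looked up from the template's secondary structure string."""
    return GAP_PENALTY[template_ss[position]]

def size_penalized(target_radius, template_radius, n_contacts,
                   max_radius_diff=0.5, max_contacts=2):
    """True when the radii differ by more than 0.5 Å and the template
    residue is involved in more than 2 contacts."""
    return (abs(target_radius - template_radius) > max_radius_diff
            and n_contacts > max_contacts)

template_ss = "CCHHHHHHCCEEEE"  # made-up template secondary structure
loop_cost = gap_penalty(template_ss, 0)
helix_cost = gap_penalty(template_ss, 4)
print(loop_cost, helix_cost)
```

Both penalties would steer the alignment only; per the slide, neither is added to the threading energy itself.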
20
How to include secondary structure?
  • Predict secondary structure of target sequence
    (PSIPRED, PROF, JPRED, SAM, GOR V)
  • $N_+$: total number of matches between the
    predicted secondary structure and the template
    structure
  • $N_-$: total number of mismatches
  • $N_s$: total number of residues selected in
    alignment
  • Global fitness: $f = 1 + (N_+ - N_-) / N_s$
  • $E_{mod} = f \times E_{threading}$

Yungok Ihm
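
A short worked example of the secondary structure correction, reading the slide's fitness formula as f = 1 + (N+ - N-) / Ns and the modified energy as Emod = f x Ethreading. It assumes every aligned position counts as either a match or a mismatch and that both strings use the same H/E/C alphabet; the function name and the example strings and energy are made up.

```python
def global_fitness(predicted_ss, template_ss):
    """f = 1 + (N+ - N-) / Ns over the aligned positions."""
    if len(predicted_ss) != len(template_ss) or not predicted_ss:
        raise ValueError("need two non-empty strings of equal length")
    n_s = len(predicted_ss)
    n_plus = sum(p == t for p, t in zip(predicted_ss, template_ss))
    n_minus = n_s - n_plus  # assumption: every non-match is a mismatch
    return 1 + (n_plus - n_minus) / n_s

predicted = "HHHHCCEE"   # predicted secondary structure of the target
template = "HHHCCCEE"    # secondary structure of the template
f = global_fitness(predicted, template)   # 7 matches, 1 mismatch
e_threading = -10.0                       # made-up threading energy
e_mod = f * e_threading
print(f, e_mod)  # 1.75 -17.5
```

With this reading, f ranges from 0 (all mismatches) to 2 (all matches), so a favorable (negative) threading energy is amplified when the predicted and template secondary structures agree.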
21
How much better is this fit than random?
  • $E_{mod}$: Sequence vs Structure
  • $E_{shuffled}$: Shuffled Sequence vs Structure
  • $E_{relative} = E_{mod} - E_{shuffled}$

Yungok Ihm
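
The shuffled-sequence baseline can be sketched as below, reading the relative energy as Emod minus Eshuffled. Averaging over many shuffles is an assumption of this example (the slide does not say how many shuffled sequences are used), and `adjacent_leu_energy` is a made-up toy energy standing in for the real sequence-versus-structure score.

```python
import random

def relative_energy(seq, energy_fn, n_shuffles=100, seed=0):
    """E_relative = E(seq) - mean E over shuffled copies of seq."""
    rng = random.Random(seed)       # fixed seed for reproducibility
    e_mod = energy_fn(seq)
    residues = list(seq)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)       # same composition, random order
        total += energy_fn("".join(residues))
    return e_mod - total / n_shuffles

def adjacent_leu_energy(seq):
    """Toy order-dependent energy: -1 per pair of neighboring leucines."""
    return -sum(1 for a, b in zip(seq, seq[1:]) if a == b == "L")

def leu_count_energy(seq):
    """Toy composition-only energy: unchanged by shuffling."""
    return -seq.count("L")

e_rel = relative_energy("LLLLKKKK", adjacent_leu_energy)
e_rel_flat = relative_energy("LLLLKKKK", leu_count_energy)
print(e_rel, e_rel_flat)
```

Shuffling preserves amino acid composition, so a score that depends only on composition gives a relative energy of zero; only the part of the fit that depends on residue order survives the subtraction.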
22
Performance Evaluation? "Blind Test"
  • CASP5 Competition
    (Critical Assessment of Protein Structure
    Prediction)
  • Given: Amino acid sequence
  • Goal: Predict 3-D structure
    (before experimental results are published)

23
Typical Results (well, actually, our BEST
Results): Ho #1-ranked CASP5 prediction for
this target
  • Target 174
  • PDB ID: 1MG7

Ho, Cao, Ihm, Wang
24
Overall Performance in CASP5 Contest: 8th out
of 180 (M. Levitt, Stanford)
  • FR: Fold Recognition
    (targets manually assessed by Nick Grishin)

Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
----  -------  -----  -----  ----  ----  -----------------
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00    7.00  7      7    Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00    6.00  3      6    Ho-Kai-Ming
9     10.45    3.00    5.50  3      6    Jones-NewFold

  • FR NgNW: number of good predictions without
    weighting for multiple models
  • FR NpNW: number of total predictions without
    weighting for multiple models

25
Regulation of Lentivirus Replication or
"Designing New HIV Therapies"
Susan Carpenter (Washington State Univ): Wendy Sparks, Yvonne Wannemuehler
Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini
Kai-Ming Ho, Physics: Yungok Ihm, Haibo Cao, Cai-zhuang Wang
Gloria Culver, BBMB: Laura Dutca
BCB Fall 06 Dobbs
26
Macromolecular interactions mediated by the Rev
protein in lentiviruses (HIV & EIAV)
[Figure: Rev interactions, one labeled protein-RNA and three labeled protein-protein]
Susan Carpenter
27
Rev is essential for lentiviral replication
  • Rev is a small nucleocytoplasmic shuttling protein
    (HIV Rev: 115 aa; EIAV Rev: 165 aa)
  • Recognizes a specific binding site on viral RNA:
    the Rev Responsive Element (RRE)
  • Interacts with CRM1 to export incompletely
    spliced viral RNAs from the nucleus to the cytoplasm
  • Specific domains of Rev mediate nuclear
    localization, RNA binding, and nuclear export
  • Critical role of Rev in lentiviral replication
    makes it an attractive target for antiviral
    (AIDS) therapy

28
Problem: no high-resolution Rev structure! Not
even for HIV Rev, despite intense effort
  • Why? Rev aggregates at concentrations needed
    for NMR or X-ray crystallography
  • What about insights from sequence comparisons?
  • "Undetectable" sequence similarity among Revs
    from different lentiviruses (e.g., EIAV vs HIV:
    < 10%)
  • But:
  • Lentiviral Rev proteins are functionally
    "homologous"

29
Hypothesis: Rev proteins share structural
features critical for function
Approach:
  • Computationally model structures of lentiviral
    Rev proteins
    - using threading algorithm (with Ho et al)
  • Predict critical residues for RNA-binding &
    protein interaction
    - using machine learning algorithms (with Honavar
      et al)
  • Test model and predictions
    - using genetic/biochemical approaches (with
      Carpenter & Culver)
    - using biophysical approaches (with Andreotti &
      Yu groups)
  • Initially focus on EIAV Rev & RRE

30
Functional domains: EIAV vs HIV Rev
  • EIAV Rev
    [Figure: domain map with exon 1 and exon 2; residue positions 1, 31, and 165 marked]
  • HIV-1 Rev
    [Figure: domain map]

NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif
31
Predicted EIAV Rev Structure
Yungok Ihm
32
Comparison of Predicted Rev Structures
Yungok Ihm
33
Structure of N-terminal region of HIV Rev
Yungok Ihm
34
Location of functional residues EIAV Rev
Critical Hydrophobic Contact?
NES
Putative RBM
Yungok Ihm
35
Mutations of hydrophobic residues predicted to be
critical for helical packing in core
L65 vs L95 & L109
  • Single mutants: Leu to Ala, Leu to Asp
  • Double mutants: Leu to Ala
  • Single Ala mutation (L → A):
    negligible effect on Rev activity
  • Single Asp mutation (L → D), inserting a charged
    aa in the hydrophobic core:
    dramatic change in Rev activity?
  • Double Ala mutation (two L → A substitutions):
    reduction in Rev activity?
Yungok Ihm
36
Functional Analysis of Rev Structural Mutants in
vivo (CAT assay)
Wendy Sparks
37
Functional domains: EIAV vs HIV Rev
  • EIAV Rev
  • HIV-1 Rev
[Figure: domain maps color-coded (red, green) for RNA interaction and protein interaction]
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif

38
Predicting the RNA-binding domain of EIAV Rev
Yungok Ihm
Highlighted regions:
 61 ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR
121 HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKR RRKHL

Michael Terribilini
EIAV Rev residues 31-165:
 31 DPQGPLESDQ WCRVLRQSLP EEKISSQTCI ARRHLGPGPT QHTPSRRDRW
 81 IREQILQAEV LQERLEWRIR GVQQVAKELG EVNRGIWREL HFREDQRGDF
131 SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKR RRKHL
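
As a quick check on residue numbering, the sequence shown on this slide (EIAV Rev residues 31-165) can be searched for the Arg-rich motifs RRDRW and KRRRK discussed later in the lecture. The helper name `find_motif` is made up for this example; the sequence itself is copied from the slide.

```python
# EIAV Rev residues 31-165, copied from the slide in blocks of 10.
EIAV_REV_31_165 = (
    "DPQGPLESDQ" "WCRVLRQSLP" "EEKISSQTCI" "ARRHLGPGPT" "QHTPSRRDRW"
    "IREQILQAEV" "LQERLEWRIR" "GVQQVAKELG" "EVNRGIWREL" "HFREDQRGDF"
    "SAWGDYQQAQ" "ERRWGEQSSP" "RVLRPGDSKR" "RRKHL"
)
FIRST_RESIDUE = 31  # numbering of the first residue in the string

def find_motif(seq, motif, first_residue=1):
    """Return (start, end) residue numbers of every occurrence of motif."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append((start + first_residue,
                     start + first_residue + len(motif) - 1))
        start = seq.find(motif, start + 1)
    return hits

rrdrw = find_motif(EIAV_REV_31_165, "RRDRW", FIRST_RESIDUE)
krrrk = find_motif(EIAV_REV_31_165, "KRRRK", FIRST_RESIDUE)
print(rrdrw, krrrk)  # [(76, 80)] [(159, 163)]
```

The same numbering places leucines at positions 65, 95, and 109, the residues targeted in the hydrophobic core mutations.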



39
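Expression of MBP-ERev deletion mutants
[Figure: EIAV Rev domain map with boundaries at residues 1, 31, 57, 125, 146, and 165; regions labeled RBM, Folding?, NES, and NLS]
MBP-ERev constructs (each fused to MBP):
  • 1-165
  • 31-165
  • 31-145
  • 57-165
  • 57-145
  • 57-124
  • 125-165
  • 146-165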
Jae-Hyung Lee
40
EIAV Rev binds specifically to RRE in vitro
Jae-Hyung Lee
41
EIAV Rev: Predictions vs Experiments
[Figure: PREDICTED structure, protein binding residues, and RNA binding residues; regions labeled RBM and NLS/RBM]
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
Jae-Hyung Lee
42
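Mutagenesis of putative RNA binding motifs
[Figure: EIAV Rev domain map with boundaries at residues 1, 31, 57, 124, 146, and 165; regions labeled RBD, RBD, NES, and NLS]
Motif sequences shown: ERLE, KRRRK, RRDRW, AADAA, AALA, KAAAK, ERDE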
Jae-Hyung Lee
43
[Figure: PREDICTED structure, protein binding residues, and RNA binding residues; labels: RBM, FOLD?, NLS, NLS/RBM, ERDE; three regions marked "?"]
Jae-Hyung Lee
44
Summary: Predictions vs Experiments
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
45
Summary
  • Computational & wet lab approaches revealed that
  • EIAV Rev has a bipartite RNA binding domain
  • Two Arg-rich RBMs are critical
  • RRDRW in central region
  • KRRRK at C-terminus, overlapping the NLS
  • Based on computational modeling, the RBMs are in
    close proximity within the 3-D structure of the
    protein
  • Lentiviral Revs & RRE binding sites may be more
    similar in structure than has been appreciated
  • Future
  • Identify "predictive rules" for protein-RNA
    recognition

Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
46
Experimentally determine the structure!
47
Building Designer Zinc Finger DNA-binding
Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey, D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al (2006) Nature Protocols, in press