BCB 444/544 - PowerPoint PPT Presentation
1
BCB 444/544
Lecture 25: Secondary Structure Prediction
Oct 21
  • Thanks to Drena Dobbs for many borrowed &
    modified PPTs

2
Required Reading (before lecture)
  • Wed Oct 21 - for Lecture 25
  • Chp 14
  • Fri Oct 23 - for Lecture 26
  • Chp 16

3
Homework Assignments
  • HW 4 posted
  • Due Monday, October 26th by 5pm

4
Required Reading
  • Yang Zhang (2008) Progress and challenges in
    protein structure prediction. Curr. Opin.
    Struct. Biol. 18:342-348.

5
544 Projects
6
Exam II
  • Exam II will be next Friday, October 31st
  • More information coming soon

7
Chou and Fasman
  • Start by computing amino acid propensities to
    belong to a given type of secondary structure

Propensities > 1 mean that residue type i is
likely to be found in the corresponding secondary
structure type.
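As a sketch (not from the slides), the propensity calculation can be written in a few lines of Python; all the counts below are invented for illustration:

```python
# Hypothetical sketch of a Chou-Fasman-style propensity: the frequency of a
# residue type within one secondary-structure state, divided by its overall
# frequency in the dataset. All counts below are made up for illustration.

def propensity(count_in_state, total_in_state, count_overall, total_overall):
    freq_in_state = count_in_state / total_in_state
    freq_overall = count_overall / total_overall
    return freq_in_state / freq_overall

# Suppose Ala appears 120 times among 1000 helical residues,
# and 300 times among 3000 residues overall:
p_helix_ala = propensity(120, 1000, 300, 3000)
print(p_helix_ala > 1)  # True -> Ala favours helix
```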
8
The GOR method
Position-dependent propensities for helix, sheet,
or turn are calculated for each amino acid. For
each position j in the sequence, eight residues
on either side are considered. A helix
propensity table contains information about the
propensity for residues at 17 positions when the
conformation of residue j is helical. The helix
propensity tables therefore have 20 x 17 entries.
Similar tables are built for strands and turns.
GOR simplification: the predicted state of AAj is
calculated as the sum of the position-dependent
propensities of all residues around AAj.
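The GOR simplification just described can be sketched as follows; the tiny propensity tables in the example are invented:

```python
# Sketch of the GOR simplification: the score for a state at position j is the
# sum of position-dependent propensities over a 17-residue window (8 residues
# on either side). Table contents here are hypothetical.

def gor_score(seq, j, table):
    """table maps (residue, offset) -> propensity, for offsets -8..+8."""
    score = 0.0
    for offset in range(-8, 9):
        k = j + offset
        if 0 <= k < len(seq):  # positions outside the sequence contribute 0
            score += table.get((seq[k], offset), 0.0)
    return score

def predict_state(seq, j, tables):
    """tables maps a state label ('H', 'E', 'C') to its propensity table."""
    return max(tables, key=lambda state: gor_score(seq, j, tables[state]))
```

With one table per state (helix, strand, turn/coil), the predicted conformation of residue j is simply the state with the largest windowed sum.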
9
Consensus Data Mining (CDM)
  • Developed by the Jernigan Group at ISU
  • Basic premise: a combination of 2 complementary
    methods can enhance performance by harnessing
    the distinct advantages of both; CDM combines
    FDM & GOR V
  • FDM - Fragment Data Mining - exploits the
    availability of sequence-similar fragments in the
    PDB, which can lead to highly accurate prediction
    - much better than GOR V - for such fragments,
    but such fragments are not available in many
    cases
  • GOR V - Garnier, Osguthorpe, Robson V - predicts
    the secondary structure of less similar fragments
    with good performance; these are protein
    fragments for which the FDM method cannot find
    suitable structures
  • For references & additional details:
    http://gor.bb.iastate.edu/cdm/

10
Neural networks
  • The most successful methods for predicting
    secondary structure are based on neural networks.
    The overall idea is that neural networks can be
    trained to recognize amino acid patterns in known
    secondary structure units, and to use these
    patterns to distinguish between the different
    types of secondary structure.
  • Neural networks classify input vectors or
    examples into categories (2 or more)
  • They are loosely based on biological neurons.

11
Biological Neurons
Dendrites receive inputs; the axon gives the output.
Image from Christos Stergiou and Dimitrios Siganos,
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
12
Artificial Neuron: the Perceptron
Image from Christos Stergiou and Dimitrios Siganos,
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
13
The perceptron
[Diagram: inputs X1..XN with weights w1..wN feed a
threshold unit T, which produces the output]
The perceptron classifies the input vector X into
two categories. If the weights and threshold T
are not known in advance, the perceptron must be
trained. Ideally, the perceptron should be trained
to return the correct answer on all training
examples, and to perform well on examples it has
never seen. The training set must contain both
types of data (i.e. with 1 and 0 output).
14
The perceptron
Notes:
- The input is a vector X, and the weights can be
  stored in another vector W.
- The perceptron computes the dot product S = X.W.
- The output F is a function of S. It is often
  discrete (i.e. 1 or 0), in which case F is the
  step function. For continuous output, a sigmoid
  is often used: it rises from 0, through 1/2 at
  the threshold, toward 1.
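A minimal sketch of the two output functions just described; the example weights, inputs, and threshold are made up:

```python
import math

# Perceptron output as a function of S = X.W: either a hard step against the
# threshold T, or a sigmoid for continuous output. Example values arbitrary.

def step_output(x, w, t):
    s = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if s >= t else 0

def sigmoid_output(x, w, t):
    s = sum(xi * wi for xi, wi in zip(x, w))
    return 1.0 / (1.0 + math.exp(-(s - t)))  # equals 1/2 when S is at T

print(step_output([1.0, 0.5], [0.4, 0.6], 0.5))  # 1, since S = 0.7 >= 0.5
```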
15
The perceptron
Training a perceptron: find the weights W that
minimize the error function
E(W) = sum over i = 1..P of (F(W.Xi) - t(Xi))^2
P = number of training data; Xi = training
vectors; F(W.Xi) = output of the perceptron;
t(Xi) = target value for Xi
Use steepest descent:
- compute the gradient of E
- update the weight vector: W <- W - e * gradient
- iterate
(e = learning rate)
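As an illustrative sketch, the iterative update can be implemented with the classic perceptron learning rule, a simple special case of the steepest-descent idea that uses the raw error in place of the full gradient; the OR training data and learning rate below are invented:

```python
# Train a perceptron with the classic update w <- w + e * (target - output) * x,
# iterating over the training set. The bias weight plays the role of -T.
# The training data (logical OR) and learning rate e are made up.

def train_perceptron(data, n_inputs, e=0.1, epochs=100):
    w = [0.0] * (n_inputs + 1)      # last entry is the bias weight
    for _ in range(epochs):
        for x, target in data:
            xb = list(x) + [1.0]    # append constant bias input
            s = sum(xi * wi for xi, wi in zip(xb, w))
            out = 1 if s >= 0 else 0
            err = target - out
            w = [wi + e * err * xi for wi, xi in zip(w, xb)]
    return w

# OR is linearly separable, so the perceptron converges on it.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights = train_perceptron(data, 2)
```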
16
Biological Neural Network
Image from http://en.wikipedia.org/wiki/Biological_neural_network
17
Artificial Neural Network
A complete neural network is a set of
perceptrons interconnected such that the
outputs of some units become the inputs of
other units. Many topologies are possible!
Neural networks are trained just like perceptrons,
by minimizing an error function.
18
Neural networks and Secondary Structure prediction
  • Experience from Chou-Fasman and GOR has shown
    that:
  • In predicting the conformation of a residue, it
    is important to consider a window around it
  • Helices and strands occur in stretches
  • It is important to consider multiple sequences

19
PHD: Secondary structure prediction using NNs
20
PHD Input
For each residue, consider a window of size 13:
13 x 20 = 260 values
21
Sequence → Structure Network
  • For each amino acid aj, a window of 13 residues
    aj-6...aj...aj+6 is considered
  • The corresponding rows of the sequence profile
    are fed into the neural network, and the output
    is 3 probabilities for aj: P(aj,alpha),
    P(aj,beta) and P(aj,other)
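As a sketch of the input encoding (PHD feeds profile rows; here each window position is one-hot encoded instead, and zero-padding at the sequence ends is an assumption):

```python
# Encode a 13-residue window around position j as a flat vector of
# 13 x 20 = 260 values. Positions falling outside the sequence are
# zero-padded (an assumption; PHD actually uses profile rows, not one-hot).

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def encode_window(seq, j, half=6):
    vec = []
    for k in range(j - half, j + half + 1):
        row = [0.0] * len(AMINO_ACIDS)
        if 0 <= k < len(seq):
            row[AMINO_ACIDS.index(seq[k])] = 1.0
        vec.extend(row)
    return vec

v = encode_window("MKTAYIAKQR", 0)  # hypothetical sequence
print(len(v))  # 260
```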

22
PHD Network 1: Sequence → Structure
[Diagram: 13 x 20 = 260 values -> Network 1 -> 3
values: Pa(i), Pb(i), Pc(i)]
23
Structure → Structure Network
  • For each aj, PHD now considers a window of 17
    residues: the probabilities P(ak,alpha),
    P(ak,beta) and P(ak,other) for k in [j-8, j+8] are
    fed into the second-layer neural network, which
    again produces probabilities that residue aj is
    in each of the 3 possible conformations

24
PHD Network 2: Structure → Structure
For each residue, consider a window of size 17:
17 x 3 = 51 values
[Diagram: 51 values -> Network 2 -> 3 values:
Pa(i), Pb(i), Pc(i)]
25
PHD
  • Jury system: PHD has trained several neural
    networks with different training sets; all neural
    networks are applied to the test sequence, and
    the results are averaged
  • Prediction: for each position, the secondary
    structure with the highest average score is
    output as the prediction
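The jury averaging can be sketched as follows; the per-network outputs in the example are invented:

```python
# Jury system sketch: average the (H, E, C) probabilities produced by several
# independently trained networks, then output the state with the highest
# average. The example network outputs are made up.

def jury_predict(outputs):
    """outputs: one (pH, pE, pC) tuple per trained network."""
    n = len(outputs)
    avg = [sum(o[i] for o in outputs) / n for i in range(3)]
    return "HEC"[avg.index(max(avg))]

state = jury_predict([(0.6, 0.3, 0.1), (0.5, 0.2, 0.3), (0.7, 0.2, 0.1)])
print(state)  # H
```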

26
Secondary Structure Prediction Methods
  • 1st Generation methods
  • Ab initio - used the relatively small dataset of
    structures available
  • Chou-Fasman - based on amino acid propensities
    (3-state)
  • GOR - also propensity-based (4-state)
  • 2nd Generation methods
  • based on the much larger datasets of structures now
    available
  • GOR II, III, IV, SOPM, GOR V, FDM
  • 3rd Generation methods
  • Homology-based & neural network-based
  • PHD, PSIPRED, SSPRO, PROF, HMMSTR, CDM
  • Meta-Servers
  • combine several different methods
  • Consensus & ensemble based
  • JPRED, PredictProtein, Proteus

27
Secondary Structure Prediction Servers
  • Prediction Evaluation?
  • Q3 score - % of residues correctly predicted
    (3-state)
  • in cross-validation experiments
  • Best results? Meta-servers
  • http://expasy.org/tools/ (scroll for 2°
    structure prediction)
  • http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
  • JPred: www.compbio.dundee.ac.uk/www-jpred
  • PredictProtein: http://www.predictprotein.org/ -
    Rost, Columbia
  • Best "individual" programs?
  • CDM: http://gor.bb.iastate.edu/cdm/ -
    Sen & Jernigan, ISU
  • FDM (not available separately as a server) -
    Cheng & Jernigan, ISU
  • GOR V: http://gor.bb.iastate.edu/ -
    Kloczkowski & Jernigan, ISU
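The Q3 score mentioned above is straightforward to compute; the example prediction strings are invented:

```python
# Q3: percentage of residues whose 3-state label (H, E, C) is predicted
# correctly. The predicted/observed strings below are made up.

def q3(predicted, observed):
    assert len(predicted) == len(observed)
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHHEECCC", "HHHEEECC"))  # 87.5 (7 of 8 residues correct)
```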

28
Secondary Structure Prediction for Different
Types of Proteins/Domains
  • For Complete proteins
  • Globular Proteins - use the methods previously
    described
  • Transmembrane (TM) Proteins - use special
    methods (next slides)
  • For Structural Domains - many under development
  • Coiled-Coil Domains (protein interaction
    domains)
  • Zinc Finger Domains (DNA binding domains)
  • others

29
SS Prediction for Transmembrane Proteins
  • Transmembrane (TM) Proteins
  • Only a few in the PDB - but ~30% of cellular
    proteins are membrane-associated!
  • Hard to determine experimentally, so prediction
    is important
  • TM domains are relatively 'easy' to predict!
  • Why? Constraints due to the hydrophobic environment
  • 2 main classes of TM proteins
  • α-helical
  • β-barrel

30
SS Prediction for TM α-Helices
  • α-Helical TM domains
  • Helices are 17-25 amino acids long (they span the
    membrane)
  • Predominantly hydrophobic residues
  • Helices are oriented perpendicular to the membrane
  • Orientation can be predicted using the "positive
    inside" rule:
  • Residues at the cytosolic (inside or cytoplasmic)
    side of a TM helix, near the hydrophobic anchor, are
    more positively charged than those on the lumenal
    side (inside an organelle in eukaryotes) or
    periplasmic side (the space between inner & outer
    membranes in gram-negative bacteria)
  • Alternating polar & hydrophobic residues provide
    clues to interactions among helices within the
    membrane
  • Servers?
  • TMHMM or HMMTOP - ~70% accuracy - confused by
    hydrophobic signal peptides (short hydrophobic
    sequences that target proteins to the
    endoplasmic reticulum, ER)
  • Phobius - ~94% accuracy - uses distinct HMM
    models for TM helices & signal peptide sequences

31
SS Prediction for TM β-Barrels
  • β-Barrel TM domains
  • β-strands are amphipathic (partly hydrophobic,
    partly hydrophilic)
  • Strands are 10-22 amino acids long
  • Every 2nd residue is hydrophobic, facing the lipid
    bilayer
  • Other residues are hydrophilic, facing the "pore" or
    opening
  • Servers? Harder problem, fewer servers
  • TBBPred - uses NN or SVM
  • Accuracy?

32
Prediction of Coiled-Coil Domains
  • Coiled-coils
  • Superhelical protein motifs or domains, with two
    or more interacting α-helices that form a
    "bundle"
  • Often mediate inter-protein (& intra-protein)
    interactions
  • 'Easy' to detect in primary sequence
  • Internal repeat of 7 residues (heptad)
  • Positions 1 & 4: hydrophobic (facing the helical
    interface)
  • Positions 2, 3, 5, 6, 7: hydrophilic (exposed to
    solvent)
  • Helical wheel representation - can be used to
    manually detect these, based on the amino acid
    sequence
  • Servers?
  • Coils, Multicoil - probability-based methods
  • 2Zip - for leucine zippers, a special type of CC
    in TFs, characterized by the Leu-rich motif
    L-X(6)-L-X(6)-L-X(6)-L