BCB 444/544 Introduction to Bioinformatics

Slides: 62
Provided by: drena1
Transcript and Presenter's Notes

1
BCB 444/544 - Introduction to Bioinformatics
Lecture 29: Protein Structure Basics
& Prediction (29_Nov1)
2
Seminars in Bioinformatics/Genomics
  • Mon Oct 30
  • Madan Bhattacharyya (Agron, ISU): Isolation of
    Signaling Genes for Phytophthora Resistance in
    Soybean
  • IG Faculty Seminar, 12:10 PM in 101 Ind Ed II
  • Thurs Nov 2
  • Peter Clote (ComS, Boston Univ): Some New
    Aspects of Protein & RNA Structure
  • Baker Center Seminar, 2:10 PM in Howe Hall
    Auditorium
  • Laura Dutca (Chem, ISU): Detailed analysis of
    E. coli primary r-Protein Interaction with 16S rRNA:
    Implications for RNP assembly
  • BBMB Seminar, 4:10 PM in 1414 MBB
  • Fri Nov 3
  • Heather Greenlee (BMS, ISU): Decoding the
    Rod-Photoreceptor Differentiation Pathway
  • BCB Faculty Seminar, 2:10 PM in W142 Lago
  • Eric Henderson (GDCB, ISU): Putting the "Bio" in
    Bionanotechnology
  • GDCB Seminar, 4:10 PM in 1414 MBB

3
Assignments & Reading This Week
  • Chp 8 Proteomics
  • More Machine Learning Algorithms
  • Mon Oct 30: Chp 8.1 Proteomics - Introduction;
    Chp 8.2 Protein 3D Structure
  • Wed Nov 1: Chp 8.3 Protein Interaction Networks;
    Chp 8.4 Measuring Proteins
  • Thurs Nov 2: Lab 9 - Attendance & Seminar
    Feedback Form (immediately after seminar)
    required
  • Peter Clote, 2:10 PM, Howe Hall Auditorium: New
    Aspects of Protein & RNA Structure
  • Fri Nov 3: Machine Learning - more Algorithms
  • Support Vector Machines (SVMs)
  • Neural Networks (ANNs)

4
Assignments & Events
  • Exam 2: keys & grades posted today; exam returned
    today after class
  • BCB 444 & 544: HW5 posted online yesterday (sorry);
    due by noon, Mon Nov 6
  • BCB 544 only: Team Projects - any questions?
  • 544Extra2: due Mon Nov 6
  • See updated Schedule (Oct 30) posted online
5
Protein Structure & Function
  • Protein Structure
  • Amino acids & characteristics
  • Structural classes & motifs
  • Protein functional families
  • Classification
  • Databases
  • Visualization

6
Protein Structure & Function
  • Protein structure - primarily determined by
    sequence
  • Protein function - primarily determined by
    structure
  • (structure determines interactions with other
    molecules)
  • Globular proteins
  • have compact hydrophobic core & hydrophilic
    surface
  • Membrane proteins
  • have special hydrophobic domains
  • often transmembrane (TM) helices
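
A minimal sketch of how hydrophobic stretches such as candidate TM helices can be flagged from sequence alone. It assumes the Kyte-Doolittle hydropathy scale with a 19-residue sliding window and a cutoff of 1.6; that method is not named on the slide and is brought in here only for illustration. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values; positive = hydrophobic.
# (Assumption: this scale is used for illustration, it is not from the slides.)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start indices of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, score in enumerate(profile) if score > threshold]


# Hypothetical toy sequence: a 20-residue Leu/Ala/Ile stretch
# flanked by charged residues.
toy = "KKDEKR" + "LLALILLAALLIALLLAALL" + "RKDEKK"
print(candidate_tm_windows(toy))
```

A window that passes the cutoff is only a hint; real TM prediction tools use more evidence than average hydropathy.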

7
Protein Structure & Function
  • Protein Folding?
  • Folded proteins are only marginally stable,
    because proteins must balance stability vs
    function
  • Intrinsically disordered: some domains of
    proteins (or even entire proteins) do not
    assume a stable "fold" until they are bound to
    their partner (protein, DNA, etc.)
  • Predicting protein structure and function can be
    very difficult -- but is increasingly important

8
4 Basic Levels of Protein Structure
9
Amino Acids
  • Each of 20 different amino acids has a different
    "R-Group" side chain attached to Cα

10
Peptide bond is rigid and planar
11
Hydrophobic Amino Acids
12
Charged Amino Acids
13
Polar Amino Acids
14
Certain side-chain configurations are
energetically favored (rotamers)
Ramachandran plot: "Allowable" psi & phi angles
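
A minimal sketch of how a backbone torsion angle can be computed from four atom positions, which is the quantity a Ramachandran plot displays. It uses the standard definitions phi = C(i-1)-N(i)-Cα(i)-C(i) and psi = N(i)-Cα(i)-C(i)-N(i+1). The function name and the test coordinates are made up for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the last point is rotated a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For residue i, phi would be dihedral(C_prev, N, CA, C) and psi would be dihedral(N, CA, C, N_next), with each argument an (x, y, z) tuple.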
15
Glycine is smallest amino acid: R group = H atom
  • Glycine residues increase backbone flexibility
    because they have no side chain beyond that H atom

16
Proline is cyclic
  • Proline residues reduce flexibility of
    polypeptide chain
  • Proline cis-trans isomerization is often a
    rate-limiting step in protein folding
  • Recent work suggests it also regulates ligand
    binding in native proteins (Andreotti)

17
Cysteines can form disulfide bonds
  • Disulfide bonds (covalent) stabilize
    3-D structures
  • In eukaryotes, disulfide bonds are found only in
    secreted proteins or extracellular domains

18
Globular proteins have a compact hydrophobic core
  • Packing of hydrophobic side chains into interior
    is main driving force for folding
  • Problem? Polypeptide backbone is highly polar
    (hydrophilic) due to polar -NH and C=O in each
    peptide unit; these polar groups must be
    neutralized
  • Solution? Form regular secondary structures,
    e.g., α-helix, β-sheet, stabilized by H-bonds

19
Exterior surface of globular proteins is
generally hydrophilic
  • Hydrophobic core formed by packed secondary
    structural elements provides compact, stable core
  • "Functional groups" of protein are attached to
    this framework; exterior has more flexible
    regions (loops) and polar/charged residues
  • Hydrophobic "patches" on protein surface are
    often involved in protein-protein interactions

20
Secondary Structural Elements
  • α Helix
  • β Sheets
  • Loops
  • Coils

21
α-Helix
  • Most abundant 2° structure in proteins
  • Average length ~10 aa's (~15 Angstroms)
  • Length varies from 5-40 aa's
  • Alignment of H-bonds creates dipole moment
    (positive charge at NH end)
  • Often at surface of core, with hydrophobic
    residues on inner-facing side, hydrophilic on
    other side
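
A minimal sketch of a "helical wheel" check for the amphipathic pattern described above: residues are placed around a circle at 360/3.6 = 100 degrees per residue (the standard α-helix value), and the hydrophobic ones are tested for clustering on one face. The binary hydrophobic set and the score are assumptions chosen only for illustration.

```python
import math

RESIDUES_PER_TURN = 3.6  # standard alpha-helix
DEG_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees

# Assumption: a simple binary hydrophobic set, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def wheel_angles(seq):
    """Angular position (degrees) of each residue viewed down the helix axis."""
    return [(i * DEG_PER_RESIDUE) % 360.0 for i in range(len(seq))]


def hydrophobic_face_score(seq):
    """Length (0 to 1) of the mean unit vector of the hydrophobic residues.

    Near 1: hydrophobic residues sit on one face of the helix.
    Near 0: they are spread evenly around it.
    """
    angles = [
        math.radians(angle)
        for angle, aa in zip(wheel_angles(seq), seq.upper())
        if aa in HYDROPHOBIC
    ]
    if not angles:
        return 0.0
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(x, y)


# Hypothetical sequences: a Leu/Lys repeat versus an all-Leu stretch.
print(hydrophobic_face_score("LKKLLKKLLKKL"))
print(hydrophobic_face_score("L" * 18))
```

A helix at the surface of the core would be expected to score high, while a fully buried or fully exposed helix would not.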

22
α-helix is stabilized by H-bonds between every
4th residue
C = black, O = red, N = blue
23
R-groups are on outside of α-helix
24
Types of α-helices
  • "Standard" α-helix: 3.6 residues per turn
  • H-bonds between C=O of residue n and
    NH of residue n+4
  • Helix ends are polar; almost always on surface of
    protein
  • Other types of helices?
  • n+5: π helix
  • n+3: 3₁₀ helix
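
The H-bond patterns above can be sketched as a short enumeration. This assumes idealized helices in which every residue takes part; the function and dictionary names are hypothetical.

```python
# Offset k: C=O of residue n accepts an H-bond from N-H of residue n + k.
HELIX_OFFSETS = {"3_10": 3, "alpha": 4, "pi": 5}


def backbone_hbonds(first, last, helix_type="alpha"):
    """List (acceptor, donor) residue pairs for an ideal helix spanning first..last."""
    k = HELIX_OFFSETS[helix_type]
    return [(n, n + k) for n in range(first, last - k + 1)]


# Ideal alpha-helix covering residues 1 to 10.
print(backbone_hbonds(1, 10, "alpha"))
```

The same span gives one more pair for a 3₁₀ helix and one fewer for a π helix, since the offset changes how many residues have a partner inside the segment.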

25
Certain amino acids are "preferred" & others are
rare in α-helices
  • Ala, Glu, Leu, Met: good helix formers
  • Pro, Gly, Tyr, Ser: poor
  • Amino acid composition & distribution varies,
    depending on location of helix in 3-D structure

26
β-Strands & Sheets
  • H-bonds formed between 5-10 consecutive residues
    in one portion of chain and another
    set of 5-10 residues farther down chain
  • Interacting regions may be adjacent (with short
    loop between) or far apart
  • β-sheets usually have all strands either parallel
    or antiparallel

27
Antiparallel β-sheet
28
Antiparallel β-sheet
29
Parallel β-sheet
30
Mixed β-sheets also occur
31
Loops
  • Connect helices and sheets
  • Vary in length and 3-D configurations
  • Are located on surface of structure
  • Are more "tolerant" of mutations
  • Are more flexible and can adopt multiple
    conformations
  • Tend to have charged and polar amino acids
  • Are frequently components of active sites
  • Some fall into distinct structural families
    (e.g., hairpin loops, reverse turns)

32
Coils
  • Regions of 2° structure that are not helices,
    sheets, or recognizable turns
  • Intrinsically disordered regions appear to play
    important functional roles

33
Globular proteins are built from recurring
structural patterns
  • Motif or supersecondary structure:
    combination of 2° structural elements
  • Domain: combination of motifs
  • Independently folding unit (foldon)
  • Functional unit

34
A few common structural motifs
  • Helix-turn-helix: e.g., DNA binding
  • Helix-loop-helix: e.g., Calcium binding
  • β-hairpin: 2 adjacent antiparallel strands
    connected by short loop
  • Greek key: 4 adjacent antiparallel strands
  • β-α-β: 2 parallel strands connected by helix

35
H-T-H & H-L-H
36
β-hairpin
37
Greek key
38
Beta-alpha-beta
39
Simple motifs combine to form domains
40
Large polypeptide chains fold into several domains
41
6 main classes of protein structure
  • 1) α Domains
  • Bundles of helices connected by loops
  • 2) β Domains
  • Mainly antiparallel sheets, usually with 2 sheets
    forming sandwich
  • 3) α/β Domains
  • Mainly parallel sheets with intervening helices,
    also mixed sheets
  • 4) α+β Domains
  • Mainly segregated helices and sheets
  • 5) Multidomain (α and β)
  • Containing domains from more than one class
  • 6) Membrane & cell-surface proteins

42
α-domain structures: coiled-coils
43
α-domain structures: 4-helix bundles
44
All-α proteins: Globins
45
β-domain structures
  • Anti-parallel β structures
  • Functionally most diverse
  • Includes
  • Up-and-down sheets or barrels
  • Propeller-like structures
  • Jelly roll barrels (from Greek key motifs)

46
Up-and-down sheets and barrel
47
Up-and-down sheets can form propeller-like
structures
48
Greek key motifs can form jelly roll barrels
49
α/β-domain structures
  • 3 main classes
  • TIM barrel: core of twisted parallel strands
    close together
  • Rossmann fold: open twisted sheet surrounded by
    helices on both sides
  • Leucine-rich motif: specific pattern of Leu
    residues; strands form a curved sheet with
    helices on outside

50
TIM barrel & Rossmann fold
51
Leucine-rich motifs can form α/β horseshoes
52
Protein structure databases & visualization
software
  • PDB: Protein Data Bank
  • http://www.rcsb.org/pdb/
  • (RCSB) - several structure viewers
  • MMDB: Molecular Modeling Database
  • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
  • (NCBI Entrez) - Cn3D viewer
  • MSD: Macromolecular Structure Database
  • http://www.ebi.ac.uk/msd
  • Especially good for interactions, binding sites
  • Other visualization tools: PyMol, JMol
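
A minimal sketch of what structure viewers read from a PDB entry: each ATOM record is a fixed-column text line carrying the atom name, residue, chain and x/y/z coordinates. The column positions follow the published PDB file format; the function name is hypothetical, and the sample record is assembled from made-up values rather than taken from a real entry.

```python
def parse_atom_record(line):
    """Parse one fixed-column ATOM/HETATM record into a dict."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }


# Made-up sample record, built field by field so the column widths are explicit.
SAMPLE = (
    "ATOM".ljust(6)        # record name, columns 1-6
    + "1".rjust(5)         # atom serial number
    + " "
    + "N".center(4)        # atom name
    + " "                  # alternate location indicator (blank)
    + "MET"                # residue name
    + " "
    + "A"                  # chain identifier
    + "1".rjust(4)         # residue sequence number
    + " " * 4              # insertion code plus three unused columns
    + "27.340".rjust(8)    # x
    + "24.430".rjust(8)    # y
    + "2.614".rjust(8)     # z
)

atom = parse_atom_record(SAMPLE)
print(atom["res_name"], atom["name"], atom["x"], atom["y"], atom["z"])
```

Coordinates parsed this way are the input for distance or torsion-angle calculations on a structure.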

53
Protein structure classification
  • SCOP = Structural Classification of Proteins
  • Levels reflect both evolutionary and structural
    relationships
  • CATH = Classification by Class, Architecture,
    Topology and Homology
  • DALI/FSSP
  • Fully automated structure alignments

For links, discussion, and comparisons of these,
see http://pdomains.rcsb.org/pdomains/index.php
54
Protein sequence databases
  • UniProt (SwissProt, PIR, EBI)
  • http://www.pir.uniprot.org
  • NCBI Protein: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

55
Protein sequence & structure analysis
  • Diamond STING Millennium - many useful structure
    analysis tools, including Protein Dossier
    http://trantor.bioc.columbia.edu/SMS/
  • SwissProt (UniProt)
  • knowledgebase
  • http://us.expasy.org/sprot
  • InterPro
  • sequence analysis tools
  • http://www.ebi.ac.uk/interpro

56
Structural Genomics
  • 20,000 "traditional" genes in human genome
  • (not including miRNAs, etc.)
  • 3,000 proteins in a typical cell
  • > 3 million sequences in UniProt
  • 40,000 protein structures in the PDB
  • Experimental determination of protein structure
    lags far behind sequence determination!
  • Goal of Structural Genomics: determine structures
    of "all" protein folds in nature, using
    combination of experimental structure
    determination methods (X-ray crystallography,
    NMR, mass spectrometry) & structure prediction

57
Structural Genomics Projects
TargetDB: database of structural genomics
targets, http://targetdb.pdb.org
Protein Structure Prediction?
58
Protein Folding
  • "Major unsolved problem in molecular biology"
  • In cells: spontaneous
  • assisted by enzymes
  • assisted by chaperones
  • In vitro: many proteins fold spontaneously
  • many do not!

59
Steps in Protein Folding
  • 1- "Collapse" - driving force is burial of
    hydrophobic aa's
  • (fast - msecs)
  • 2- Molten globule - helices & sheets form, but
    "loose"
  • (slow - secs)
  • 3- "Final" native folded state - compaction, some
    2° structures rearranged
  • Native state? - assumed to be lowest free energy
  • - may be an ensemble of structures

60
Protein Dynamics
  • Protein in native state is NOT static!
  • Function of many proteins depends on
    conformational changes, sometimes large,
    sometimes small
  • Recall
  • Globular proteins are inherently "unstable"
  • (most proteins have NOT evolved for maximum
    stability)
  • Energy difference between native and denatured
    state is very small (5-15 kcal/mol)
  • (this is equivalent to 1 or 2 H-bonds!)
  • So: the assumption of prediction methods that the
    lowest free energy structure is "native" doesn't
    help a lot! There may be many "decoy structures"
    with very similar "energy" scores
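
A minimal sketch of what a 5-15 kcal/mol stability margin means at equilibrium. It assumes a simple two-state model (native vs denatured) at 25 °C, which is an idealization not stated on the slide; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_unfolded(delta_g_kcal, temp_k=298.15):
    """Equilibrium unfolded fraction for a two-state model.

    delta_g_kcal = G(denatured) - G(native); positive means the native
    state is favored.
    """
    k_unfold = math.exp(-delta_g_kcal / (R_KCAL * temp_k))
    return k_unfold / (1.0 + k_unfold)


for dg in (5.0, 10.0, 15.0):
    print(dg, fraction_unfolded(dg))
```

Under this model even the low end of the range keeps nearly all molecules folded, yet the margin is small enough that a few lost interactions can shift the balance.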

61
Protein Structure Prediction
  • Structure is largely determined by sequence
  • BUT
  • Similar sequences can assume different
    structures
  • Dissimilar sequences can assume similar
    structures
  • Many proteins are multi-functional
  • 2 Major Protein Folding Problems
  • 1- Determination of folding pathway
  • 2- Prediction of tertiary structure from
    sequence
  • Both still largely unsolved problems