Software Can be Grouped into Two General Classes: - PowerPoint PPT Presentation

Provided by: Robert80
1
Software for Protein Structures by NMR
  • Software Can be Grouped into Two General Classes
  • Protein Based Programs
  • Calculate Protein Structures
  • XPLOR (NIH, CNS, CNX), DYANA, CHARMM, Sybyl,
    Amber, etc.
  • Visualize Protein Structures
  • Quanta, Insight II, VMD-XPLOR, RasMol, Chimera,
    MOLMOL, MolScript etc
  • Evaluate Protein Structures
  • PROCHECK, MolProbity, PROSA, WHATIF, Verify3D,
    etc.
  • NMR Based Programs
  • NMR data processing
  • NMRPipe, ACD/NMR, Felix
  • NMR data analysis/visualization
  • NMRDraw, NMRViewJ, PIPP, SPARKY, XEASY
  • Iterative Relaxation Matrix Calculations
  • IRMA, CORMA, MARDIGRAS, XPLOR, MORASS, etc
  • Automated NMR Analysis
  • AutoAssign, AutoStructure, ARIA, PINE, CANDID,
    GARANT, CS-ROSETTA, etc.
  • Not a complete list of software
  • New software is constantly being developed

2
Software for Protein Structures by NMR
  • Protein NMR Based Software Programs
  • There are multiple programs that have similar
    functions.
  • Not practical or necessary to discuss all the
    variety of programs that are available.
  • Applications will be discussed in general, with
    specific references to a limited number of
    programs.
  • Protein Based Programs Visualize Protein
    Structures
  • How is the protein structure stored?
  • No uniform format.
  • Protein Data Bank (PDB) is the closest thing to
    a uniform format
  • Most programs can read and/or write PDB file
    formats
  • Just about every program has its own proprietary
    format
  • Babel program can interconvert 47 different
    structure formats
  • Common Information in a protein structure
  • atoms, residues, chains
  • X, Y, Z coordinates

3
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Protein Data Bank (PDB) format
  • Header

HEADER gives the protein name, submission date, and unique PDB identifier. TITLE is the descriptive title of the structure. COMPND lists all compounds present. SOURCE gives the source of the sample. AUTHOR lists the authors. JRNL gives the publication information.
HEADER DNA BINDING PROTEIN 08-SEP-01 1JXS
TITLE SOLUTION STRUCTURE OF THE DNA-BINDING DOMAIN OF INTERLEUKIN
TITLE 2 ENHANCER BINDING FACTOR
COMPND MOL_ID: 1
COMPND 2 MOLECULE: INTERLEUKIN ENHANCER BINDING FACTOR
COMPND 3 CHAIN: A
COMPND 4 FRAGMENT: DNA-BINDING DOMAIN
COMPND 5 SYNONYM: ILF-1
COMPND 6 ENGINEERED: YES
SOURCE MOL_ID: 1
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS
SOURCE 3 ORGANISM_COMMON: HUMAN
SOURCE 4 GENE: ILF-1
SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI
SOURCE 6 EXPRESSION_SYSTEM_COMMON: BACTERIA
SOURCE 7 EXPRESSION_SYSTEM_STRAIN: BL21
SOURCE 8 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID
SOURCE 9 EXPRESSION_SYSTEM_PLASMID: PET21A
KEYWDS DNA-BINDING DOMAIN, WINGED HELIX
EXPDTA NMR, 20 STRUCTURES
AUTHOR W.J.CHUANG,P.P.LIU,C.LI,Y.H.HSIEH,S.W.CHEN,S.H.CHEN,W.Y.JENG
REVDAT 1 11-MAR-03 1JXS 0
JRNL AUTH P.P.LIU,Y.C.CHEN,C.LI,Y.H.HSIEH,S.W.CHEN,S.H.CHEN,
JRNL AUTH 2 W.Y.JENG,W.J.CHUANG
JRNL TITL SOLUTION STRUCTURE OF THE DNA-BINDING DOMAIN OF
JRNL TITL 2 INTERLEUKIN ENHANCER BINDING FACTOR 1 (FOXK1A)
JRNL REF PROTEINS V. 49 543 2002
JRNL REF 2 STRUCT.,FUNCT.,GENET.
4
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Protein Data Bank (PDB) format
  • Header

REMARK 210 EXPERIMENTAL DETAILS
REMARK 210 EXPERIMENT TYPE: NMR
REMARK 210 TEMPERATURE (KELVIN): 300 300 300 300
REMARK 210 PH: 6 6 6 6
REMARK 210 IONIC STRENGTH: 125 125 125 125
REMARK 210 PRESSURE: AMBIENT AMBIENT AMBIENT
REMARK 210 AMBIENT
REMARK 210 SAMPLE CONTENTS: 3MM ILF, 25MM PHOSPHATE
REMARK 210 BUFFER, 100MM NACL 3MM ILF,
REMARK 210 25MM PHOSPHATE BUFFER, 100MM
REMARK 210 NACL 3MM ILF U-15N, 25MM
REMARK 210 PHOSPHATE BUFFER, 100MM NACL
REMARK 210 2MM ILF U-15N, 13C, 25MM
REMARK 210 PHOSPHATE BUFFER, 100MM NACL
REMARK 210
REMARK 210 NMR EXPERIMENTS CONDUCTED: NOESY, DQF-COSY, TOCSY, 3D_
REMARK 210 15N-SEPARATED_NOESY, 3D_13C-
REMARK 210 SEPARATED_NOESY
REMARK 210 SPECTROMETER FIELD STRENGTH: 600 MHZ, 500 MHZ
REMARK 210 SPECTROMETER MODEL: AVANCE, DMX
REMARK 210 SPECTROMETER MANUFACTURER: BRUKER
REMARK 210
REMARK 210 STRUCTURE DETERMINATION.
REMARK 210 SOFTWARE USED: AURELIA 2.7.10, XWINNMR 2.6
REMARK 210 METHOD USED: HYBRID DISTANCE GEOMETRY-
REMARK 210 HBHA(CBCACO)NH
Description of Experimental Data
. . .
5
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Protein Data Bank (PDB) format
  • Header

REMARK 900 RELATED ENTRIES
REMARK 900 RELATED ID: 4829 RELATED DB: BMRB
REMARK 900 1H, 15N AND 13C RESONANCE ASSIGNMENTS FOR THE DNA-BINDING
REMARK 900 DOMAIN OF INTERLEUKIN ENHANCER BINDING FACTOR
DBREF 1JXS A 1 98 SWS Q01167 ILF1_HUMAN 251 348
SEQRES 1 A 98 ASP SER LYS PRO PRO TYR SER TYR ALA GLN LEU ILE VAL
SEQRES 2 A 98 GLN ALA ILE THR MET ALA PRO ASP LYS GLN LEU THR LEU
SEQRES 3 A 98 ASN GLY ILE TYR THR HIS ILE THR LYS ASN TYR PRO TYR
SEQRES 4 A 98 TYR ARG THR ALA ASP LYS GLY TRP GLN ASN SER ILE ARG
SEQRES 5 A 98 HIS ASN LEU SER LEU ASN ARG TYR PHE ILE LYS VAL PRO
SEQRES 6 A 98 ARG SER GLN GLU GLU PRO GLY LYS GLY SER PHE TRP ARG
SEQRES 7 A 98 ILE ASP PRO ALA SER GLU SER LYS LEU ILE GLU GLN ALA
SEQRES 8 A 98 PHE ARG LYS ARG ARG PRO ARG
HELIX 1 1 ALA A 9 MET A 18 1 10
HELIX 2 2 THR A 25 TYR A 37 1 13
HELIX 3 3 TRP A 47 ASN A 58 1 12
HELIX 4 4 SER A 83 ARG A 93 1 11
SHEET 1 A 3 GLN A 23 LEU A 24 0
SHEET 2 A 3 PHE A 76 ILE A 79 -1 O TRP A 77 N LEU A 24
SHEET 3 A 3 PHE A 61 VAL A 64 -1 N VAL A 64 O PHE A 76
CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1
ORIGX1 1.000000 0.000000 0.000000 0.00000
ORIGX2 0.000000 1.000000 0.000000 0.00000
ORIGX3 0.000000 0.000000 1.000000 0.00000
SCALE1 1.000000 0.000000 0.000000 0.00000
Reference to Data in other Databases (REMARK 900, DBREF)
Protein Sequence (SEQRES)
Observed Secondary Structure Elements (HELIX, SHEET)
Meaningless symmetry data, kept for consistency with X-ray
structures (CRYST1, ORIGX, SCALE)
. . .
6
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Protein Data Bank (PDB) format
  • Coordinates

Atom Type
Residue Type
Temperature Factor
Atom No.
Occupancy
Residue No.
Model Number (NMR structures typically will have
multiple models in a single PDB file)
Atom Identifier
. . .
X, Y, Z coordinates
Chain (structures composed of multiple proteins
will have a different chain for each protein)
Identifier (4 characters)
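The fields labelled above sit at fixed column positions in each ATOM record. The following is a minimal Python sketch of reading and writing one such record, assuming the standard fixed-width PDB column layout. The names `Atom`, `format_atom_line`, and `parse_atom_line`, and the coordinate values, are hypothetical illustrations and do not come from any program named in these slides.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int         # atom number
    name: str           # atom identifier, e.g. "CA"
    res_name: str       # residue type, e.g. "ASP"
    chain: str          # chain identifier
    res_seq: int        # residue number
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float  # temperature factor (B-factor)


def format_atom_line(a: Atom) -> str:
    """Write one ATOM record using the fixed-width PDB columns."""
    # Atom names shorter than 4 characters start in column 14.
    name_field = a.name if len(a.name) == 4 else f" {a.name:<3s}"
    return (
        f"ATOM  {a.serial:5d} {name_field} {a.res_name:>3s} {a.chain:1s}"
        f"{a.res_seq:4d}    {a.x:8.3f}{a.y:8.3f}{a.z:8.3f}"
        f"{a.occupancy:6.2f}{a.temp_factor:6.2f}"
    )


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM record by column (Python slices are 0-based)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
    )


# Hypothetical atom: round-trip it through the text format.
atom = Atom(1, "CA", "ASP", "A", 1, 12.345, -6.789, 0.123, 1.0, 0.0)
line = format_atom_line(atom)
parsed = parse_atom_line(line)
```

Because the format is column-based rather than whitespace-delimited, slicing by position is safer than `line.split()`, which breaks when adjacent fields run together.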
7
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Protein Data Bank (PDB) format
  • Coordinates
  • Other Features

End of Model
. . .
End of File
HETATM Identifier (non-protein atoms: small
molecules, ions, solvent, water, etc.)
Define Specific Atom Connectivity
N-Terminal NH (NH3 instead of NH)
C-Terminal O (sometimes OXT1 OXT2)
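Since an NMR entry usually holds several models in one file, a reader has to group coordinate lines by model. This is a minimal Python sketch, assuming the standard MODEL, ENDMDL, and END record names. The function name `split_models` and the abbreviated example lines are hypothetical and are not full-width PDB records.

```python
def split_models(pdb_lines):
    """Group ATOM/HETATM lines by model, using MODEL/ENDMDL records."""
    models = []
    current = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record in ("ATOM", "HETATM"):
            current.append(line)
        elif record == "ENDMDL":
            models.append(current)
            current = []
    if current:
        # Single-model file written without MODEL/ENDMDL records.
        models.append(current)
    return models


# Abbreviated, hypothetical records (only the leading columns are shown).
example = [
    "MODEL        1",
    "ATOM      1  N   ASP A   1",
    "HETATM    2  O   HOH A 101",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   ASP A   1",
    "ENDMDL",
    "END",
]
models = split_models(example)
hetero_counts = [sum(l.startswith("HETATM") for l in m) for m in models]
```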
8
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Protein Data Bank (PDB) format
  • Coordinates
  • Are internally consistent, i.e. the X,Y,Z
    coordinates of atom A are the appropriate bond
    distance away from the X,Y,Z coordinates of atom
    B.
  • The coordinates on an absolute scale are
    arbitrary i.e. there is no defined relationship
    between the coordinates of protein A and protein
    B, even if protein A and protein B are multiple
    copies of the same protein.
  • Alignment Issue
  • Proteins need to be aligned for any structural
    comparison
  • After alignment, can visually compare relative
    orientation/position of secondary structures,
    active-sites, bound ligands, position of
    side-chains, etc
  • After alignment, relative distance comparisons
    have meaning, i.e. if two helices do not overlap
    perfectly, a measured displacement of the helices
    is relevant
  • Alignment requires both rotational and
    translational transformation of one coordinate
    axis relative to the other.
  • One protein remains fixed and the other
    protein(s) are aligned to it

Figure: Protein A and Protein B on X, Y, Z axes, before and after the Align step. Before: the relative position of the 2 proteins in the X,Y,Z coordinate system is arbitrary. After: the 2 proteins are centered in the same coordinate frame.
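The point that coordinates are internally consistent but arbitrary on an absolute scale can be illustrated with a toy example. This is a minimal Python sketch under strong assumptions: the four "atoms" are invented, and the rotation and translation relating the two copies are known in advance. Real alignment programs must instead solve for the best rotation and translation by a least-squares superposition, which is not implemented here. All function names are hypothetical.

```python
import math


def rotate_z(coords, angle_deg):
    """Rotate every point about the Z axis by angle_deg."""
    a = math.radians(angle_deg)
    ca, sa = math.cos(a), math.sin(a)
    return [(ca * x - sa * y, sa * x + ca * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift every point by the same (dx, dy, dz) vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]


def rmsd(a, b):
    """Root-mean-square deviation between matched atom positions."""
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))


# Protein A: four invented atom positions.
protein_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]

# Protein B: the same shape placed in an arbitrary coordinate frame.
shift = (10.0, -5.0, 3.0)
protein_b = translate(rotate_z(protein_a, 90.0), shift)
rmsd_before = rmsd(protein_a, protein_b)

# Align B onto the fixed protein A: undo the translation, then the rotation.
shift_back = tuple(-s for s in shift)
aligned_b = rotate_z(translate(protein_b, shift_back), -90.0)
rmsd_after = rmsd(protein_a, aligned_b)
```

Before alignment the RMSD is large even though the two shapes are identical. After alignment it drops to essentially zero, which is why distance comparisons only have meaning once the structures share a coordinate frame.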
9
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Different Ways to Visualize the Same Protein
    Structure
  • Lines/Sticks
  • Connect each atom coordinate position by a
    straight line
  • Bond colored by atom type where ½ of bond
    corresponds to atom 1 and the other ½ to atom 2
  • Accurate representation of atom position
  • Poor representation of protein packing
  • Crowded
  • Reduce complexity by only displaying backbone or
    specific regions
  • Reduce complexity by zooming in on particular
    region

10
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Different Ways to Visualize the Same Protein
    Structure
  • Ball & Stick
  • Connect each atom coordinate position by a
    straight line
  • Display each atom as a sphere
  • Accurate representation of atom position
  • poor representation of protein packing
  • Crowded
  • Reduce complexity by only displaying backbone or
    specific regions
  • Reduce complexity by zooming in on particular
    region

11
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Different Ways to Visualize the Same Protein
    Structure
  • Ribbons/Cartoon
  • Connect each Cα atom coordinate position by a
    graphical representation
  • Smooth fit of Cα positions
  • Not an accurate representation of atom coordinates
  • Reduces complexity of view: no side chains,
    usually only backbone
  • Highlights secondary structure
  • β-strands typically shown as arrow pointing in
    direction of C-terminus
  • α-helix shown as a thick helical coil
  • random coil regions shown as tube
  • Highlights Overall fold and topology
  • Easy Comparison of Fold Families

12
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Different Ways to Visualize the Same Protein
    Structure
  • Space Filling/van der Waals
  • Each atom position represented by a sphere
  • radius of sphere is equal to the van der Waals
    radius
  • very accurate representation of protein
  • Highlights surface structure
  • identify binding pockets
  • can not visualize interior of protein without
    slicing through structure
  • Highlights packing
  • verify absence of holes in structure
  • verify tight packing of different domains, small
    molecule in binding pocket, etc

Color coded by domain
Space Filling emphasizes hole or channel in
protein
van der Waals radii (in Å)
13
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Different Ways to Visualize the Same Protein
    Structure
  • GRASP
  • Generates a smooth topology or shape of the
    protein's surface
  • Highlights detailed surface structure
  • identify binding pockets
  • can not visualize interior of protein without
    slicing through structure
  • Can Map properties of the protein onto the
    surface
  • electrostatic
  • NMR chemical shift changes
  • NMR dynamics, X-ray B-factors
  • Conserved Residues from Sequence Alignment

GRASP surface of acetylcholinesterase
complexed with acetylcholine, colored by
potential (red negative, blue positive)
GRASP surface of MMP-1 displaying NMR chemical
shift changes upon binding an inhibitor
14
Software for Protein Structures by NMR
  • Protein Based Programs Visualize Protein
    Structures
  • Demos using
  • Rasmol
  • VMD
  • Chimera

15
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • Compare to known Structures
  • All Structures have Problems or Errors as
    determined by software analysis
  • The challenge is to determine which, if any,
    errors are serious misinterpretation of the data
    and require correcting.
  • Three general rules of thumb
  • If the error is severe, far outside the norm, it
    is probably a mistake.
  • If errors cluster together, there is almost
    certainly a mistake.
  • If the structure has an odd conformation
  • knot, large holes, π-helix, positive φ for non-Gly, etc.

Remember: the comparison is made against typical
structures, so your "error" may simply represent a
novel fold or conformation that has not been
seen. Let the data determine the structure.
16
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • Compare a new protein structure against
    standard parameters or values
  • standard values or trends are ascertained from
    analysis of high quality, high resolution
    structures in the PDB
  • typical features as we discussed in the
    introduction to protein structures
  • PROCHECK
  • A common program used by PDB to validate
    deposited structures
  • Assesses the "stereochemical quality" of a given
    protein structure
  • reads a PDB formatted file
  • generates 10 output postscript files
  • analyzes φ, ψ, χ1, χ2 torsion angles, bond
    lengths, and bond angles
  • analyzes bad contacts (atoms too close by van
    der Waals radius)
  • analyzes hydrogen bond energy
  • analyzes G-factor
  • Provides overall and per residue analyses
  • Identifies distorted geometry
  • To run the program
  • procheck filename [chain] resolution
  • where filename = the coordinates file in
    Brookhaven format, [chain] = an optional
    one-letter chain-ID, resolution = a real number
    giving the resolution of the structure
  • Compares bond lengths and bond angles to
    database of standard small molecule values

17
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • PROCHECK
  • correct φ, ψ distribution
  • most residues should fall in the most favored
    region of Ramachandran plot

Red contours indicate preferred region of the
Ramachandran plot
Colored contours indicate allowed regions of the
Ramachandran plot
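Each point on a Ramachandran plot is a pair of backbone torsion angles: φ is defined by the atoms C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). This is a minimal Python sketch of computing one torsion angle from four atom positions. It illustrates the geometry only and is not PROCHECK's own code. The function name `dihedral` and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Invented positions: the first three atoms are fixed, the fourth is moved.
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
gauche = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
```

Applying the same function to every residue's backbone atoms gives the (φ, ψ) pairs that are then compared with the favored and allowed regions of the plot.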
18
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • PROCHECK
  • correct φ, ψ, χ1, χ2 distribution as a function
    of residue type
  • most residues should fall in the preferred
    region of the Ramachandran plots

Dark contours are preferred regions
19
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • PROCHECK
  • comparison of main chain parameters to standard
    values of comparable X-ray structures
  • consistent or better results with a comparable
    resolution structure implies a reliable structure

Value observed for structure at specified
resolution. Inside band indicates it is
consistent with other similar resolution
structures
Boxed Plot is Overall G-factor or Structure
Quality Score
Band indicates range of values observed as a
function of X-ray resolution
20
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • PROCHECK
  • comparison of side chain parameters to standard
    values of comparable X-ray structures
  • consistent or better results with a comparable
    resolution structure implies a reliable structure

Value observed for structure at specified
resolution. Inside band indicates it is
consistent with other similar resolution
structures
Band indicates range of values observed as a
function of X-ray resolution
21
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • PROCHECK
  • Complete list of structure violations
  • Per residue plot of main chain and side-chain
    parameters
  • A number of plots of statistical summaries of
    parameters

22
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • MOLPROBITY
  • Provides a variety of protein structure checks
    by comparison to standard values in PDB
  • Some overlap with Procheck
  • Some unique checks including clashes and
    structure visualization

100th percentile is the best among structures
of comparable resolution; 0th percentile is the
worst.
23
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • MOLPROBITY
  • Multi-criterion chart
  • per residue analysis of all problems

24
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • MOLPROBITY
  • Multi-criterion kinemage
  • view all problems

Bad rotamer
Bad backbone conformation
Choose what to display
Bad clash
25
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • MOLPROBITY
  • Single-criterion files
  • view all problems
  • Clash list
  • Ramachandran plot kinemage
  • Ramachandran plot PDF
  • Cβ deviation scatter plot

Clash List Atom Pair
Distance
26
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • MOLPROBITY
  • Single-criterion files

27
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • MOLPROBITY
  • Single-criterion files
  • view all problems
  • Cβ deviation scatter plot

28
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • Verify3D
  • Compares the primary sequence against the
    protein's 3D structure
  • Compares each residue's position to statistical
    distribution of the 20 amino acids against
    defined structural environments.
  • based on the total area buried and fraction of
    side-chain area covered by polar atoms

Structure Environments
29
Software for Protein Structures by NMR
Buried Hydrophobic Environment
Exposed Hydrophilic Environment
3D-1D Scoring Table
30
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • Verify3D
  • Example scoring function on a per residue basis

Actual X-ray structure
Incorrect modeled structure
31
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • ProSA-Web
  • Overall model quality (Z-score)
  • compare to typical range for known NMR and X-ray
    structures
  • calculate energy for all Cα-Cα or Cβ-Cβ
    interactions
  • generate collection of decoy folds (50,0000) by
    using database of sequence/structure fragments
  • thus, correct fold will have low energy and low
    Z-score relative to decoy structures
  • length dependent

Protein analyzed
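The Z-score idea above can be written down directly: the model's energy is expressed in standard deviations from the mean energy of the decoy folds. This is a minimal Python sketch assuming the usual statistical definition of a Z-score. The function name `z_score` and all energy values are hypothetical, and this is not ProSA-Web's own implementation.

```python
import statistics


def z_score(model_energy, decoy_energies):
    """Number of standard deviations the model lies from the decoy mean."""
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    return (model_energy - mean) / spread


# Hypothetical energies in arbitrary units.
decoy_energies = [-10.0, -5.0, 0.0, 5.0, 10.0]
model_energy = -25.0
z = z_score(model_energy, decoy_energies)
```

A strongly negative value means the model's energy is well below that of the decoys, which is what is expected for a correct fold.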
32
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • ProSA-Web
  • Model energy as a function of amino acid
    sequence
  • positive values correspond to problematic
    regions
  • single-value has large fluctuation and is of
    little value
  • averaged over a window of 40 (dark) and 10
    (light) residues

Visualize the per residue energy on the structure
(identify problematic regions)
Reliable Structure (no strain energy)
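Because single-residue energies fluctuate strongly, the plot shows them averaged over a window of residues. This is a minimal Python sketch of such a sliding-window average. It assumes a centered window that is simply truncated at the chain ends, which may differ from how ProSA-Web treats the termini. The function name `window_average` and the energy values are hypothetical.

```python
def window_average(values, window):
    """Average each value with its neighbors inside a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical per-residue energies with one isolated positive spike.
energies = [-1.0, -1.0, 2.0, -1.0, -1.0]
smoothed = window_average(energies, 3)
```

The isolated spike at the third residue is damped to zero after averaging, whereas a run of consecutive positive values would remain positive and flag a problematic region.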
33
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • WHATIF/WHATCHECK
  • Provides a variety of protein structure checks
    by comparison to standard values in PDB
  • Some overlap with Procheck
  • Some unique checks including packing parameters
  • Unique to WHATIF/WHATCHECK
  • Check for buried unsatisfied h-bond donors and
    acceptors
  • Peptide bond flip check
  • Check for amino-acid handedness
  • HIS GLN ASN side chain conformation check
  • Check for atom nomenclature
  • Side chain planarity check
  • Verification of Proline puckering
  • New Directional atomic contact analysis
  • Directional atomic contact analysis
  • Particular to X-ray Structures
  • Check for isolated water clusters
  • Atomic occupancy check
  • Symmetry check
  • Chain Name Validation

  • Similar to Procheck
  • Verification of bond lengths
  • Check for bumps (bad contacts)
  • Amino-acid side chain rotamer analysis
  • Torsion angle evaluation

34
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • WHATIF/WHATCHECK

Protein Packing Report

Warning: Low packing Z-score for some residues
The residues listed in the table below have an unusual
packing environment according to the 2nd generation
quality check. The score listed in the table is a packing
normality Z-score: positive means better than average,
negative means worse than average. Only residues scoring
less than -2.50 are listed here. These are the "unusual"
residues in the structure, so it will be interesting to
take a special look at them.
137 LYS ( 10 ) B -3.43
136 LYS ( 9 ) B -3.11
30 GLN ( 40 ) A -3.08
218 GLU ( 91 ) B -2.84
158 VAL ( 31 ) B -2.83
240 LYS ( 113 ) B -2.59
231 GLU ( 104 ) B -2.52

Warning: Abnormal packing Z-score for sequential residues
A stretch of at least four sequential residues with a 2nd
generation packing Z-score below -1.75 was found. This
could indicate that these residues are part of a strange
loop or that the residues in this range are incomplete,
but it might also be an indication of mis-threading. The
table below lists the first and last residue in each
stretch found, as well as the average residue Z-score of
the series.
134 ASN ( 7 ) B --- 137 LYS ( 10 ) B -2.65

Warning: Structural average packing Z-score a bit worrisome
The structural 2nd generation average quality control
value is a bit low. The protein is probably threaded
correctly, but either poorly refined, or it is just a
protein with an unusual (but correct) structure. The
average quality of properly refined X-ray structures is
0.0+/-1.0.
All contacts: Average -0.589, Z-score -3.74
BB-BB contacts: Average -0.178, Z-score -1.27
BB-SC contacts: Average -0.574, Z-score -3.07
SC-BB contacts: Average -0.240, Z-score -1.29
SC-SC contacts: Average -0.563, Z-score -2.79
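The two residue-level warnings in this report apply simple threshold rules: list residues below -2.50, and report stretches of at least four sequential residues below -1.75. This is a minimal Python sketch of those two rules and is not WHATCHECK's own code. The function names are hypothetical. The scores for residues 30, 136, and 137 are taken from the report, while those for residues 133, 134, 135, and 138 are invented for illustration.

```python
def low_scoring(scores, cutoff=-2.5):
    """Residue numbers whose Z-score is below the cutoff, worst first."""
    flagged = sorted((z, res) for res, z in scores.items() if z < cutoff)
    return [res for z, res in flagged]


def low_stretches(scores, cutoff=-1.75, min_len=4):
    """Runs of sequential residues that all score below the cutoff.

    Returns a list of (first_residue, last_residue, average_z) tuples.
    """
    stretches = []
    run = []

    def close_run():
        if len(run) >= min_len:
            avg = sum(scores[r] for r in run) / len(run)
            stretches.append((run[0], run[-1], avg))

    for res in sorted(scores):
        low = scores[res] < cutoff
        if low and (not run or res == run[-1] + 1):
            run.append(res)
        else:
            close_run()
            run = [res] if low else []
    close_run()
    return stretches


# 30, 136, 137 are from the report; the other values are invented.
scores = {30: -3.08, 133: -0.40, 134: -2.10, 135: -1.96,
          136: -3.11, 137: -3.43, 138: 0.20}
flagged = low_scoring(scores)
stretches = low_stretches(scores)
```

With these values the stretch rule reports residues 134 to 137, which shows how a run of moderately low scores can be flagged even when some of its members are above the single-residue cutoff.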
35
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • WHATIF/WHATCHECK Packing Score
  • For each "fixed fragment" in a protein structure
    (any "largest group" of atoms that does not
    contain a torsion angle)
  • the occurrence of all possible atom types in all
    possible positions around the fixed fragment is
    counted.
  • If a certain configuration occurs very
    frequently, it is assumed to be a preferred
    configuration.
  • All preference counts for all atoms around a
    residue are used to calculate a summary score for
    each residue.
  • Quality control score for each residue is a
    Z-score
  • Describes how well this residue feels compared
    to other similar residues in well refined
    structures.
  • If the residue Z-score is negative, it feels
    less at home than the "average" residue.
  • If the Z-score is positive, it feels more at
    home than average.
  • The individual scores are not very powerful.
  • A lot of structures have a few low-scoring
    residues.
  • More useful are
  • list of sequential residues that all have low
    scores (possibly indicating a mis-threaded
    segment),
  • overall quality control Z-score
  • Impact on modeling by homology
  • Severe.
  • If a structure has a bad quality control
    Z-score, it can not be trusted.

36
Software for Protein Structures by NMR
  • Protein Based Programs Evaluate Protein
    Structures
  • WHATIF/WHATCHECK

Buried hydrogen bond donors and acceptors that are
not involved in a hydrogen bond:
9 GLY ( 19 ) A N
11 TYR ( 21 ) A N
15 ILE ( 25 ) A O
29 ASP ( 39 ) A O
30 GLN ( 40 ) A O
31 HIS ( 41 ) A ND1
32 ILE ( 42 ) A N
33 GLN ( 43 ) A N
39 GLU ( 49 ) A O
48 SER ( 58 ) A O
60 ASP ( 70 ) A N
62 LEU ( 72 ) A N
74 LEU ( 84 ) A N
81 GLU ( 91 ) A O
84 TYR ( 94 ) A N
92 HIS ( 102 ) A NE2
101 LEU ( 111 ) A O

The pairs of atoms listed have an unusually short
distance:
45 TYR ( 55) A CZ -- 74 LEU ( 84) A CD1 0.479 2.721 INTRA
78 ARG ( 88) A CD -- 86 THR ( 96) A CG2 0.391 2.809 INTRA
109 LEU ( 119) A O -- 110 GLY ( 120) A C 0.375 2.425 INTRA
110 GLY ( 120) A N -- 111 PRO ( 121) A CD 0.365 2.635 INTRA
131 PRO ( 4) B O -- 133 GLY ( 6) B N 0.358 2.192 INTRA BF
39 GLU ( 49) A O -- 40 SER ( 50) A CB 0.349 2.451 INTRA
109 LEU ( 119) A C -- 111 PRO ( 121) A CD 0.340 2.860 INTRA
163 ASP ( 36) B O -- 165 SER ( 38) B N 0.328 2.372 INTRA
114 HIS ( 124) A O -- 115 PHE ( 125) A C 0.328 2.472 INTRA
165 SER ( 38) B O -- 166 ASP ( 39) B C 0.303 2.497 INTRA
98 PHE ( 108) A CB -- 120 ILE ( 130) A CG1 0.297 2.903 INTRA
132 LEU ( 5) B O -- 133 GLY ( 6) B C 0.296 2.504 INTRA BF
246 LEU ( 119) B O -- 247 GLY ( 120) B C 0.295 2.505 INTRA
113 THR ( 123) A CB -- 120 ILE ( 130) A CD1 0.286 2.914 INTRA
131 PRO ( 4) B O -- 132 LEU ( 5) B C 0.282 2.518 INTRA BF
151 ARG ( 24) B NH1 -- 153 LEU ( 26) B CD2 0.278 2.822 INTRA
81 GLU ( 91) A C -- 83 GLY ( 93) A N 0.277 2.623 INTRA
96 HIS ( 106) A CD2 -- 216 LEU ( 89) B CD2 0.255 2.945 INTRA
. . .
. . .
37
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • Comparison of XPLOR and DYANA
  • XPLOR
  • Also known as XPLOR-NIH, CNS and CNX
  • Calculates structures using Cartesian
    coordinates
  • Uses a modified PDB file format
  • Optimizes
  • Number of specific Target Functions to refine
    protein structure
  • 1H -1H distance (NOEs)
  • Chemical shifts (both 13C and 1H)
  • Coupling constants (3JNHCa)
  • Ramachandran database
  • Empirical Backbone-Backbone Hydrogen-Bonding
    Potential
  • Radius of Gyration
  • Residual Dipolar Coupling Constants
  • DYANA/CYANA
  • Dynamics geometry Algorithm for NMR Applications
  • Calculates structures using Torsional Space
  • Bond lengths and bond angles are kept fixed; only
    torsion angles are varied

38
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • First Step is Determining a Molecular Structure
    File for Your Specific Protein Sequence
  • Molecular Structure File (PSF)
  • Contains all the information to describe the
    connectivity of the protein
  • Contains atom/residue information (names, types,
    charges, masses, etc.)
  • Contains structure terms (bond, angle, dihedral,
    improper, etc.)
  • Does not contain atomic coordinates!
  • Information is obtained from two standard
    databases
  • Topallhdg_new.pro
  • - connectivity information for each amino acid
  • - need to define topology for ALL non-amino
    acids
  • Parallhdg_new.pro
  • - defines expected values for bond lengths, bond
    angles, etc
  • PSF patches
  • define disulphide bonds
  • define cis peptide bonds
  • PSF file is required for ALL XPLOR calculations
  • PSF file must match exactly all the information
    in the structure or

39
Software for Protein Structures by NMR
  • An Example
  • You want to compare your NMR structure with an
    X-ray structure you obtained from the PDB
  • X-ray structure
  • - does not contain hydrogens.
  • - There is a loop that doesn't have coordinates
    (no electron density)
  • - The structure contains a number of water
    molecules and detergent molecules
  • - Identifiers are 1PDB, WAT, DET
  • NMR structure
  • - has a His-tag at the C-terminus (aid in
    purification)
  • - has three additional residues at the
    N-terminus (artifact of the cloning process)
  • - the residue numbering start at 1 instead of
    185 in the X-ray structure
  • - Identifier is the atom type (C,H,N,O)
  • Your PSF file is consistent with your NMR
    structure, so XPLOR will give numerous errors
    when you try to read both the NMR and X-ray
    coordinate files. What are your options?
  • 1) Make the X-ray coordinate file exactly match
    the NMR coordinate file
  • - add hydrogens
  • - add dummy coordinates for the missing loop
    region
  • - remove all the water molecules and detergent
    molecules
  • - change identifier
  • 2) Make the NMR coordinate file exactly match
    the X-ray coordinate file and create a

40
Software for Protein Structures by NMR
mass H 1.008
mass C 12.011
mass N 14.007
mass O 15.999
residue ALA
group
atom N type=NH1 charge=-0.36 end
atom HN type=H charge=0.26 end
group
atom CA type=CT charge=0.00 end
atom HA type=HA charge=0.10 end
group
atom CB type=CT charge=-0.30 end
atom HB1 type=HA charge=0.10 end
atom HB2 type=HA charge=0.10 end
atom HB3 type=HA charge=0.10 end
group
atom C type=C charge=0.48 end
atom O type=O charge=-0.48 end
bond N HN
bond N CA
bond CA HA
bond CA CB
bond CB HB1
bond CB HB2
bond CB HB3
bond CA C
bond C O
improper HA N C CB !stereo CA
improper HB1 HB2 CA HB3 !stereo CB
end
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    calculations
  • Topallhdg_new.pro

Partial list of atomic masses
Defines and groups all atoms, assigns a type and
charge
Defines pairs of bonded atoms
Defines a group of four atoms comprising an
improper torsion angle
41
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Topallhdg_new.pro

Atoms defined by an improper angle to maintain
proper stereochemistry are boxed. Usually set to
either 0° or 180°.
Atom Types: all atoms that have the same
structural properties, i.e. same bond lengths,
bond angles, dihedrals, are classified to the same
atom type. This simplifies the assignment of
structural parameters while keeping unique atom
identifiers.
Improper: artificial dihedral definition used
primarily to maintain planar arrangement of atoms
or proper stereochemistry in the structure
(peptide bond, aromatic rings, etc.). Does not
follow the linear connectivity of a proper
dihedral angle.
The bond lengths and bond angles for CA-HA,
CB-HB1, CB-HB2, and CB-HB3 are identical, so all
are defined as CT-HA.
42
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Parallhdg_new.pro

Each entry lists the atom types, the force constant, the multiplicity (dihedrals and impropers only), and the ideal value.
bonds H NA kbon 0.98
bond CT CT kbon 1.53
angle HA CT C kang 109.5
angle CA CA CT kang 120.0
improper H X X C kpla 0 0.0
improper C X X C kpla 0 0.0
dihedral CA CA CT CT kdih 3 0.0
dihedral NA CC CT CT kdih 3 0.0
. .
Lists all possible combinations of bonds, angles,
impropers and dihedrals with ideal values, force
constants and multiplicity.
NONbonded C 0.0903 3.2072 0.0903 3.2072
NONBonded CA 0.120 3.2072 0.120 3.2072
. .
Parameterization of van der Waals equation for
atom-atom contact.
nbfix H O 44.2 1.0 44.2 1.0
nbfix H OC 44.2 1.0 44.2 1.0
. .
Parameterization of hydrogen-bond interactions.
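To show how a force constant, an ideal value, and a multiplicity turn into an energy, here is a minimal Python sketch. It assumes the commonly used harmonic bond term k(r - r0)^2 and periodic dihedral term k(1 + cos(nφ + δ)). The slide shows only the symbolic names kbon and kdih, so the numerical force constants below are hypothetical, and the exact functional forms should be checked against the XPLOR documentation.

```python
import math


def bond_energy(r, r0, k_bond):
    """Harmonic penalty for a bond length r deviating from the ideal r0."""
    return k_bond * (r - r0) ** 2


def dihedral_energy(phi_deg, k_dih, multiplicity, phase_deg=0.0):
    """Periodic torsion term with the given multiplicity and phase."""
    angle = math.radians(multiplicity * phi_deg + phase_deg)
    return k_dih * (1.0 + math.cos(angle))


# CT-CT bond with ideal length 1.53 (from the slide); k_bond is hypothetical.
e_ideal = bond_energy(1.53, 1.53, k_bond=1000.0)
e_stretched = bond_energy(1.63, 1.53, k_bond=1000.0)

# Multiplicity 3 and phase 0.0 (from the slide); k_dih is hypothetical.
e_staggered = dihedral_energy(60.0, k_dih=1.0, multiplicity=3)
e_eclipsed = dihedral_energy(0.0, k_dih=1.0, multiplicity=3)
```

With multiplicity 3 and zero phase, the torsion term has three equivalent minima per full rotation (at 60°, 180°, and 300°).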
43
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Parallhdg_new.pro
  • Defining atomic parameters is a very active area
    of molecular modeling research
  • The values in the parameter database come from
    multiple sources
  • X-ray database of high-resolution small
    molecules
  • ab initio calculations
  • experimental observations, IR, Raman, water-ion
    neutron and X-ray diffraction data, free energy
    of solvation data, etc

44
Protein Structures from an NMR Perspective
Distribution of Bond Distances in Protein
Hydrogen Bonds
45
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • XPLOR PSF Script

remarks build psf file
rtf @/PROGRAMS/xplor-nih-2.9.1/toppar/topallhdg_new.pro END
parameter @/PROGRAMS/xplor-nih-2.9.1/toppar/parallhdg_new.pro END
segment
name=" "
SETUP=TRUE
chain
LINK PEPP HEAD - * TAIL + PRO END { LINK to PRO }
LINK PEPT HEAD - * TAIL + * END
FIRSt PROP TAIL + PRO END
FIRSt NTER TAIL + * END
LAST CTER HEAD - * END
sequence MET THR LEU LYS HIS HIS HIS end
end
end
write psf output=PROTEIN.psf end
stop
Read parameter and topology files
Initiate a segment. Repeat for each individual
chain or component of the structure
Definitions in the topology file on how to make a
peptide bond and cap the N-terminus and
C-terminus
Complete protein sequence
. .
Write out the PSF file with name PROTEIN.psf
46
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • XPLOR PSF Script
  • PATCHES

      . .
      HIS HIS HIS
    end
  end
end
patch CISP reference="-"=(residue 109) reference="+"=(residue 110) end
patch DISU reference=1=(residue 29) reference=2=(residue 57) end
patch ltod reference=nil=(resid 8) end
write psf output=PROTEIN.psf end
stop

  • patch CISP: Create a cis peptide bond between residues 109
    (P) and 110
  • patch DISU: Create a disulphide bond between residues 29 and
    57
  • patch ltod: Convert residue 8 to a D-amino acid
47
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • XPLOR PSF Script
  • Using Structures and Multiple Segments

rtf
  @/PROGRAMS/xplor-nih-2.9.1/toppar/topallhdg_new.pro
  @molecule.top
END
parameter
  @/PROGRAMS/xplor-nih-2.9.1/toppar/parallhdg_new.pro
  @molecule.par
END
segment
  name="PROT"
  SETUP=TRUE
  chain
    LINK PEPP HEAD - * TAIL + PRO END { LINK to PRO }
    LINK PEPT HEAD - * TAIL + * END
    coordinates @PROTEIN.pdb end
  end
end
segment
  name="MOLE"
  SETUP=TRUE
  CHAIN
    sequence CPD end
  end
end
write psf output=PROTEIN.psf end
stop

  • rtf / parameter: Read in your parameter and topology files
    defining the molecule
  • coordinates: Instead of listing the sequence, read in a PDB file
  • segment MOLE: Define segment MOLE that contains a single copy
    of the molecule (note no LINK used)
48
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Second Step is to create a linear extended
    structure of the protein sequence using idealized
    geometry
  • Extended structure coordinate File (EXT)
  • Standard XPLOR PDB coordinate file
  • Starting point to generate a proper fold for the
    protein from experimental data

Typical extended structure created by XPLOR based
on a PSF file
49
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Third Step is to convert NMR experimental data
    into XPLOR format
  • Distance Constraints
  • a file (noe.tbl) containing a list of all
    observed/assigned NOE distance constraints

assign ( resid 3 and name HB ) ( resid 49 and name HD ) 4.0 2.2 3.0
(the three trailing numbers are labelled a, b and c)

  • XPLOR assign statement
  • Residue number and atom name for each atom
    involved in the distance constraint
  • Distance information (a b c)

Understanding the distance information (a b c)
  • A distance constraint is typically defined with a range,
    an upper and a lower bound, as opposed to an absolute number.
  • In XPLOR format, upper bound = a + c;
    in our example, upper bound = 4.0 Å + 3.0 Å = 7.0 Å
  • lower bound = a - b;
    in our example, lower bound = 4.0 Å - 2.2 Å = 1.8 Å
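The bound arithmetic for the (a b c) triple can be sketched in a few lines of Python. The function name `noe_bounds` is a hypothetical helper introduced here for illustration and is not part of XPLOR; it only mirrors the convention described on the slide.

```python
def noe_bounds(a, b, c):
    """Return (lower, upper) in Angstrom for an XPLOR-style 'a b c' triple."""
    lower = a - b  # lower bound = a - b
    upper = a + c  # upper bound = a + c
    return lower, upper


# The slide's example: 4.0 2.2 3.0
lower, upper = noe_bounds(4.0, 2.2, 3.0)
print(f"lower = {lower:.1f} A, upper = {upper:.1f} A")
```

For the slide's example this reproduces the 1.8 Å lower bound and the 7.0 Å upper bound.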
50
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Distance Constraints
  • Pseudo-Atoms/Wildcards

assign ( resid 3 and name HB ) ( resid 49 and
name HD ) 4.0 2.2 3.0

What atom is HB or HD?
  • Recall the PDB atom nomenclature: each atom gets a unique
    atom identifier, but each atom does not have a unique NMR
    resonance. A distance constraint to an Ala methyl needs to
    go to HB1, HB2 and HB3.
  • XPLOR represents these equivalent atoms with a single
    pseudo-atom that is positioned equidistant between them.
  • In the assign statement the equivalent atoms are represented
    with a wildcard. One wildcard symbol represents 1 character
    (i.e. HB matches HB1, HB2); the other represents 2 characters
    (i.e. HD matches HD11, HD12, HD13, HD21, HD22, HD23, the
    2 Leu δ methyls).
  • The distance constraint is to the pseudo-atom.

Pseudo-atom (HB)
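The idea of a pseudo-atom placed equidistant between equivalent hydrogens can be illustrated geometrically. The sketch below is not how XPLOR itself builds or stores pseudo-atoms; the coordinates are hypothetical, idealized methyl hydrogens (an equilateral triangle), and `centroid` is a helper defined only for this example.

```python
import math


def centroid(points):
    """Average position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical methyl hydrogens: an equilateral triangle in the z = 0 plane.
r = 1.0  # arbitrary radius, for illustration only
methyl_h = [
    (r * math.cos(math.radians(angle)), r * math.sin(math.radians(angle)), 0.0)
    for angle in (0.0, 120.0, 240.0)
]

pseudo = centroid(methyl_h)
distances = [math.dist(pseudo, h) for h in methyl_h]
print([round(d, 3) for d in distances])
```

The three distances come out equal, which is the sense in which the pseudo-atom stands in for all three hydrogens at once.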
51
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Distance Constraints
  • Pseudo-Atoms/Wildcards

assign ( resid 14 and name HD ) ( resid 97 and
name HD ) 4.0 2.2 5.8

Why Not Just Use Multiple Assign Statements?
  • For a distance constraint between two sets of Leu δ methyls
    there would be 36 possible combinations!
  • Multiple constraints between the same sets of atoms would
    bias or overemphasize that distance constraint relative to
    others. Each constraint would contribute independently to a
    violation energy that XPLOR attempts to minimize. Each
    duplication of a violated constraint would increase the
    likelihood that it is satisfied at the expense of other
    constraints, tipping the balance of energy to favor one
    constraint.
  • The constraints to all the hydrogens may not be
    simultaneously satisfiable in any given conformation. XPLOR
    will try to satisfy all the constraints, leading to a
    distorted structure.

Pseudo-atom (HB)
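The count of 36 follows from pairing each of the six δ-methyl hydrogens on one leucine with each of the six on the other. A minimal Python sketch of that enumeration, using the residue numbers from the example assign statement:

```python
from itertools import product

# The six delta-methyl hydrogens of a leucine side chain.
leu_delta_h = ["HD11", "HD12", "HD13", "HD21", "HD22", "HD23"]

# Every explicit hydrogen pair between residue 14 and residue 97.
pairs = [((14, h1), (97, h2)) for h1, h2 in product(leu_delta_h, repeat=2)]
print(len(pairs))
```

Each of these pairs would need its own assign statement if wildcards and pseudo-atoms were not used.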
52
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Distance Constraints
  • Pseudo-Atoms/Wildcards

assign ( resid 14 and name HD ) ( resid 97 and
name HD ) 4.0 2.2 5.8

Why Not Just Choose One Hydrogen to Represent the Set?
  • Which one do you choose?
  • How do you make the proper choice when there are multiple
    distance constraints going to the same set of hydrogens and
    when the constraints are coming from very different
    directions?

Using Pseudo-Atoms is Not a Perfect Solution.
  • The distance constraint is going to a location that is
    spatially distinct from any of the real atoms.
  • It is going to a center average location.
  • The distance constraints need to be adjusted to account for
    the location of the pseudo-atom.

Pseudo-atom (HB)
53
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Distance Constraints
  • Pseudo-Atoms/Wildcards

assign ( resid 14 and name HD ) ( resid 97 and
name HD ) 4.0 2.2 3.0
  • Distance information (the three trailing numbers)

How are the Distance Assignments Made?
  • One common approach uses a qualitative analysis of the NMR
    data to cluster the assignments as strong, medium, weak and
    very weak based on the intensity of the NOE crosspeak.
  • The following rules apply:
    Strong      2.5 0.7 0.2   (for NH-NH constraints use 2.5 0.7 0.6)
    Medium      3.0 1.2 0.3   (for NOEs with NH use 3.0 1.2 0.5)
    Weak        4.0 2.2 1.0
    Very Weak   5.0 2.0 1.0
  • The lower limit is always set to slightly less than twice
    the hydrogen van der Waals radius (1.8 Å).
  • For hydrogen bond constraints:
    constraint between O and N    2.8 0.4 0.5
    constraint between O and HN   1.8 0.3 0.5
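The qualitative classes lend themselves to a small lookup table. The Python sketch below is a hypothetical helper, not an XPLOR tool; it copies the general-case triples exactly as they appear in this transcript and omits the NH-specific variants. As transcribed, the Very Weak triple gives a lower bound of 3.0 Å rather than 1.8 Å; the numbers are kept unchanged here.

```python
# (a, b, c) triples in Angstrom, as transcribed from the slide.
NOE_CLASSES = {
    "strong": (2.5, 0.7, 0.2),
    "medium": (3.0, 1.2, 0.3),
    "weak": (4.0, 2.2, 1.0),
    "very weak": (5.0, 2.0, 1.0),
}


def assign_line(sel1, sel2, strength):
    """Format an assign statement for a qualitative NOE class."""
    a, b, c = NOE_CLASSES[strength]
    return f"assign ( {sel1} ) ( {sel2} ) {a:.1f} {b:.1f} {c:.1f}"


line = assign_line("resid 14 and name HD", "resid 97 and name HD", "weak")
print(line)
```

Keeping the classes in one table makes it easy to change a bound in a single place when the calibration is revised.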
54
Software for Protein Structures by NMR
  • Protein Based Programs Calculate Protein
    Structures
  • General overview of XPLOR Protein Structure
    Calculations
  • Distance Constraints
  • Rules for pseudo-atom distance corrections
  • 1) For non-stereoassigned CβHs add 1.0 Å to the upper
    bound
  • if HB is used instead of HB1 or HB2
  • 2) 1.0 Å is added to the upper bound for other
    methylenes
  • if HG, HD
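Rules 1) and 2) can be sketched as an adjustment to the c term of the (a b c) triple, since the upper bound is a + c. This covers only the two methylene rules listed above; the function name and the boolean flag are assumptions made for this example.

```python
def corrected_triple(a, b, c, uses_pseudo_methylene):
    """Add 1.0 Angstrom to c when a methylene pseudo-atom (HB, HG, HD) is used."""
    if uses_pseudo_methylene:
        c = c + 1.0  # raises the upper bound (a + c) by 1.0
    return a, b, c


# A weak NOE (4.0 2.2 1.0) to a non-stereoassigned methylene.
print(corrected_triple(4.0, 2.2, 1.0, uses_pseudo_methylene=True))
```

The lower bound is untouched, so the correction only loosens the constraint on the long side.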