Model Building, Refinement, and Validation

What can one see?

- will determine what can be ascertained
- will determine which parameters can be refined
- resolution-dependent
- note about maps
  - contoured in standard deviations (σ) from the mean (which is 0.0)
  - experimental-type maps contoured at 1-1.25σ
  - difference maps contoured at 3σ
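
A minimal sketch of what "contoured in σ" means, assuming the map is just a flat list of grid values (a real map is a 3D grid read from a map file, which is not shown here). Values are shifted to mean 0 and divided by their standard deviation, so a contour "at 1σ" encloses points one standard deviation above the mean. The function names are illustrative, not taken from any program.

```python
import statistics


def sigma_scale(rho):
    """Rescale map values to mean 0.0 and unit standard deviation (sigma units)."""
    mean = statistics.fmean(rho)
    sd = statistics.pstdev(rho)
    return [(v - mean) / sd for v in rho]


def points_at_or_above(rho_sigma, level):
    """Grid indices enclosed by a positive contour at `level` sigma."""
    return [i for i, v in enumerate(rho_sigma) if v >= level]


def difference_map_peaks(rho_sigma, level=3.0):
    """Difference maps are inspected at +level (density not yet modelled)
    and -level (model where the data do not support it)."""
    positive = [i for i, v in enumerate(rho_sigma) if v >= level]
    negative = [i for i, v in enumerate(rho_sigma) if v <= -level]
    return positive, negative


toy_map = [0.0, 0.0, 0.0, 0.0, 10.0]  # hypothetical grid values
scaled = sigma_scale(toy_map)
```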

[Figure: example electron density at 6Å, 3Å, and 1.8Å resolution]

Fitting a Model into Density

- start by tracing the backbone
- α-helices are easiest to identify; β-sheets are harder
- sometimes loops might be untraceable until later in the process (or never)
- side chains come later
  - check known rotamers first
- many rounds of rebuilding might be necessary

Skeletonization of the Density


Adding Side Chains

How does this Process Progress?

- first build is usually mostly backbone, with some side chains
- cycles of building and refinement continue until the model ceases to improve
- waters, ions, and ligands are typically added towards the end of the process

Refinement

- based on what I see, what can I refine?

Constraints and Restraints

- used to overcome a poor data-to-parameter ratio
- atoms not "free" → improved convergence
- geometric restraints
  - bond lengths and angles found in protein structures are well known from small-molecule x-ray crystallography
  - penalize excessive deviations from these values
- planarity restraints for rings and planar end groups
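
A minimal sketch of a geometric restraint term: each restrained bond contributes ((d − d0)/σ)², so small deviations cost little and large ones are strongly penalized. The ideal lengths and σ values below are approximate backbone targets used only for illustration, and the coordinates are hypothetical; real programs read complete restraint libraries (bonds, angles, planes, chirality) for every residue type.

```python
import math

# Approximate backbone bond targets (ideal length in Å, sigma in Å); illustrative only.
IDEAL_BONDS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def bond_restraint_penalty(coords, bonds=IDEAL_BONDS):
    """Sum of ((d - d0) / sigma)^2 over restrained bonds.

    coords maps atom name -> (x, y, z) in Å. Only bond lengths are restrained here;
    angle and planarity terms would be added in the same way.
    """
    total = 0.0
    for (a, b), (d0, sigma) in bonds.items():
        d = math.dist(coords[a], coords[b])
        total += ((d - d0) / sigma) ** 2
    return total


# Hypothetical coordinates with every bond exactly at its target length.
ideal = {"N": (0.0, 0.0, 0.0), "CA": (1.458, 0.0, 0.0),
         "C": (2.983, 0.0, 0.0), "O": (2.983, 1.231, 0.0)}
```

Stretching the N-CA bond by one σ (0.019 Å) adds exactly 1.0 to the penalty, which is how the weighting by σ expresses "excessive" deviation.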

Non-crystallographic Symmetry

- restraint: each copy of the molecule in the asymmetric unit must have an rmsd for all atoms below a user-defined value when the copies are compared with each other
- constraint: all copies in the asymmetric unit must be identical
- are the molecules identical?
  - strong density in the averaged map is a clue
  - e.g. (1.5 + 1.8)/2 = 1.65, but (1.5 + 0.0)/2 = 0.75
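
The averaging arithmetic can be written as a small sketch. It assumes the NCS operators have already been applied, so each copy's density (in σ units) is sampled at the same equivalent grid points; computing those operators is not shown. The 1.0σ cut-off is an assumption borrowed from the contour level used for experimental maps.

```python
def ncs_average(copies):
    """Average density over NCS copies sampled at equivalent grid points."""
    n = len(copies)
    return [sum(values) / n for values in zip(*copies)]


def well_supported_points(copies, level=1.0):
    """Indices where the NCS-averaged density stays at or above `level` sigma,
    i.e. where the copies plausibly agree."""
    return [i for i, v in enumerate(ncs_average(copies)) if v >= level]


copy_a = [1.5, 1.5]  # hypothetical density at two equivalent points in copy A
copy_b = [1.8, 0.0]  # the same points in copy B; the second point is empty in B
averaged = ncs_average([copy_a, copy_b])
```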

B: The Temperature Factor

- describes the mean-square displacement of an atom from its average position
- higher B → more mobile → less well ordered
- μx = μy = μz (equal displacement in all directions) if B is isotropic; need a 3x3 matrix if B is anisotropic
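
The link between B and atomic motion is the standard relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along one direction (Å²). A small sketch, assuming Cartesian U values in Å² for the anisotropic case; because the 3x3 tensor is symmetric it carries 6 independent parameters per atom instead of 1, and its equivalent isotropic B uses the mean of the diagonal.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2


def b_from_u(u_iso):
    """Isotropic B (Å^2) from the mean-square displacement along one direction, U_iso (Å^2)."""
    return EIGHT_PI_SQ * u_iso


def rms_displacement(b_iso):
    """Root-mean-square displacement (Å) along one direction implied by an isotropic B."""
    return math.sqrt(b_iso / EIGHT_PI_SQ)


def b_equivalent(u_cart):
    """Equivalent isotropic B from a symmetric 3x3 Cartesian U tensor: 8*pi^2 * trace(U) / 3."""
    return EIGHT_PI_SQ * (u_cart[0][0] + u_cart[1][1] + u_cart[2][2]) / 3.0
```

For example, B = 20 Å² corresponds to an rms displacement of roughly 0.5 Å along any direction.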

Evaluation of Refinement

- R-factors (Rwork and Rfree)
- Rfree is the same as Rwork but calculated for a percentage of the data (5-10%) not included in the refinement
- if the model really improves, Rfree should decrease along with Rwork
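
A sketch of the R-factor arithmetic, R = Σ | |Fo| − |Fc| | / Σ |Fo|, with the free set flagged once, at random, before refinement and never used to drive it. Assumptions: amplitudes are plain lists already on a common scale, and the 5% free fraction and fixed random seed are illustrative choices.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - |Fc| | / sum |Fo|; assumes Fc is already on the scale of Fo."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)


def assign_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflections as 'free', once, before refinement starts."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    return set(indices[:n_free])


def r_work_and_free(f_obs, f_calc, free_set):
    """Rwork over reflections used in refinement, Rfree over the held-out set."""
    work = [i for i in range(len(f_obs)) if i not in free_set]
    free = sorted(free_set)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free
```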

Rules of Thumb for Rwork and Rfree

- depends on resolution
- for most structures, Rfree should be less than 28%, and the spread between Rwork and Rfree should be 5% or less
- very low resolution structures (3.5Å and lower) might not conform to this
  - careful not to overstate conclusions
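
The rules of thumb can be written as a simple check, assuming R values are given as fractions (0.28 = 28%) and resolution in Å. The thresholds are only the rules of thumb listed above, not hard acceptance limits, and the function name is hypothetical.

```python
def r_factor_flags(r_work, r_free, resolution):
    """Return warnings from the Rwork/Rfree rules of thumb (R as fractions, resolution in Å)."""
    flags = []
    if resolution >= 3.5:
        flags.append("very low resolution: rules of thumb may not apply")
    if r_free >= 0.28:
        flags.append("Rfree is not below 28%")
    if r_free - r_work > 0.05:
        flags.append("Rfree - Rwork spread exceeds 5%")
    return flags
```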

Why are R-factors so High?

- geometrical restraints not sophisticated enough to find the true minimum (where R would approach the error in the data, about 4-9%)
- R-factors higher at lower resolution because there are fewer data but the same number of geometrical parameters to be satisfied
- result is numerous very small errors (0.01-0.1Å) in coordinate positions

Mechanics of Refinement

- perturb x, y, z, and B such that Fobs and Fcalc come into maximal agreement
- the old way: least-squares minimization
- assuming errors follow a Gaussian distribution, minimization would take the following form

$$\Phi = \sum_j \frac{\left(x_j - x_j^{\mathrm{calc}}\right)^2}{\sigma_j^2}$$

where $x_j^{\mathrm{calc}}$ is the predicted value of $x_j$ and $\sigma_j$ is the standard deviation for the measurement $x_j$
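
To make the least-squares idea concrete, here is a toy one-dimensional "refinement" under heavy simplifying assumptions: point atoms with unit scattering factors, no B factors, no scale factor, amplitudes only, and a single coordinate refined by golden-section search inside a bracket around the starting model. It minimizes Σ((|Fo| − |Fc|)/σ)², i.e. the Gaussian form with the measurements x_j taken to be the amplitudes |Fo(h)|. Real programs refine many x, y, z, B parameters with gradient-based methods; everything here is hypothetical.

```python
import cmath
import math


def f_calc_amplitude(positions, h):
    """|Fc(h)| for unit point atoms at fractional 1D positions (no B, no scale)."""
    return abs(sum(cmath.exp(2j * math.pi * h * x) for x in positions))


def lsq_target(positions, indices, f_obs, sigmas):
    """Gaussian least-squares target: sum over reflections of ((|Fo| - |Fc|) / sigma)^2."""
    return sum(((fo - f_calc_amplitude(positions, h)) / s) ** 2
               for h, fo, s in zip(indices, f_obs, sigmas))


def golden_section_min(fun, lo, hi, tol=1e-7):
    """Minimize a unimodal 1D function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if fun(c) < fun(d):
            hi, d = d, c          # minimum lies in [lo, d]
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d          # minimum lies in [c, hi]
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0


# Hypothetical "observed" amplitudes from atoms at x = 0.00 and x = 0.30.
indices = [1, 2, 3, 4]
f_obs = [f_calc_amplitude([0.0, 0.30], h) for h in indices]
sigmas = [0.1] * len(indices)

# Keep atom 1 fixed at 0.0 and refine atom 2 inside a bracket around a starting guess.
refined_x = golden_section_min(
    lambda x: lsq_target([0.0, x], indices, f_obs, sigmas), 0.27, 0.33)
```

The bracket matters: with amplitudes alone the target has other minima (placing the second atom at −0.30 gives identical |Fc|), which is one reason least squares only converges from a reasonably close starting model.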

More Least Squares

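- general form for refinement would be

$$\Phi = \sum_h \frac{\left(|F_o(h)| - k|F_c(h)|\right)^2}{\sigma_h^2}$$

- add geometrical restraints; in practice the σ² weights are dropped and the x-ray term is balanced against the geometry by a relative weight $w$

$$\Phi = w \sum_h \left(|F_o(h)| - k|F_c(h)|\right)^2 + \sum_g \frac{\left(g^{\mathrm{ideal}} - g^{\mathrm{model}}\right)^2}{\sigma_g^2}$$

- real space equivalent of the x-ray part is

$$\Phi = \sum_{\mathbf{r}} \left(\rho_{\mathrm{obs}}(\mathbf{r}) - \rho_{\mathrm{calc}}(\mathbf{r})\right)^2$$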

Still More Least Squares

- two ways to minimize
  - improve the model
  - introduce systematic errors that obliterate difference density
- note that the σ² weighting term is eliminated (empirically shown to converge poorly)
  - a sign that least squares is not appropriate
- also have to incorporate higher resolution data later in refinement

Why not Least Squares?

- phases of the model in the Fc term treated as error-free
- model completeness not taken into account
  - leads to bias towards the existing model
- all measurements treated as having equal information content
  - i.e. an F with F/σ = 50 is weighted the same as an F with F/σ = 2
- additional phase information not easy to incorporate
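
The weighting complaint can be seen with two hypothetical reflections that have the same misfit |Fo − Fc| = 4 but very different precision (F/σ = 50 versus F/σ = 2). With unit weights, as in the σ²-free form used in practice, both contribute 16; with 1/σ² weights the precise reflection dominates. This sketch illustrates only the weighting issue, not the phase-error or model-completeness problems.

```python
def lsq_terms(f_obs, f_calc, sigmas, weighted=True):
    """Per-reflection least-squares contributions, optionally weighted by 1/sigma^2."""
    terms = []
    for fo, fc, s in zip(f_obs, f_calc, sigmas):
        weight = 1.0 / s ** 2 if weighted else 1.0
        terms.append(weight * (fo - fc) ** 2)
    return terms


f_obs = [100.0, 20.0]   # F/sigma = 50 and F/sigma = 2
f_calc = [96.0, 16.0]   # the same misfit of 4 for both
sigmas = [2.0, 10.0]
unweighted = lsq_terms(f_obs, f_calc, sigmas, weighted=False)
weighted = lsq_terms(f_obs, f_calc, sigmas)
```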

Maximum Likelihood to the Rescue

- if we move an atom, how it is moved depends on the position of all other atoms
- if they're not in the right place and we assume they are, our choice of move for the target atom will not place it correctly
- we need an estimate of model accuracy and completeness to help guide this process
- maximum likelihood allows us to explicitly account for errors in the model, completeness of the model, and errors in the data
- additional phase info easily incorporated

Some Mathematical Background

- in these slides "|" means "given", P is probability, and L is likelihood
- assume errors in observations are independent

Nasty Math Shown for Effect, not to be Fully Understood, let alone Memorized

- need to know what the joint conditional probability of the observations given the current model, "P(obs | mod)", looks like
- the above is for an acentric reflection
- worked out in the '60s (i.e. before cable TV)

Take-home Message from Previous Slide

- ΣQ = amount of missing scattering matter
- ΣP = mass accounted for by current model
- D reflects errors in current model
- σ is the error of a given reflection
- therefore P(obs | mod) depends upon the magnitudes of Fo and Fc, the errors in Fo, the completeness of the current model, and the accuracy of the current model

Maximum Likelihood

- want to maximize P(mod | obs) ∝ P0(mod) · P(obs | mod)
- equivalent to minimizing the negative logarithm, −LLK = −log P0(mod) − log P(obs | mod) (LLK is "log-likelihood")
- P0(mod) is our geometric restraints term
- −log P(obs | mod) replaces the x-ray least-squares term
- cast in a similar form as LSQ, but with more complicated terms that reflect the complexity of the problem
- less biased towards the model, data properly weighted
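
A sketch of the Rice-type form commonly used for the likelihood of an acentric reflection, assuming the symmetry factor is 1 and that a single variance Σ lumps together missing scatterers, model error and measurement error (how a given program builds Σ from ΣQ, ΣP, D and σ is not shown):

$$P(|F_o| \mid |F_c|) = \frac{2|F_o|}{\Sigma}\exp\!\left(-\frac{|F_o|^2 + D^2|F_c|^2}{\Sigma}\right) I_0\!\left(\frac{2|F_o|\,D|F_c|}{\Sigma}\right)$$

The x-ray part of −LLK is then the sum of −log P over reflections, treated as independent. The numbers in any usage are hypothetical.

```python
import math


def bessel_i0(x):
    """Modified Bessel function I0(x) by its power series (adequate for x up to a few hundred)."""
    total, term, k = 1.0, 1.0, 0
    q = (x / 2.0) ** 2
    while True:
        k += 1
        term *= q / (k * k)
        total += term
        if term < 1e-16 * total:
            return total


def log_acentric_likelihood(f_obs, f_calc, d, sigma_total):
    """log P(|Fo| given |Fc|) for an acentric reflection (Rice form, symmetry factor 1).

    d: the D factor for model error; sigma_total: assumed combined variance from
    missing scatterers, model error and measurement error. Requires f_obs > 0.
    """
    a = d * f_calc
    return (math.log(2.0 * f_obs / sigma_total)
            - (f_obs ** 2 + a ** 2) / sigma_total
            + math.log(bessel_i0(2.0 * f_obs * a / sigma_total)))


def minus_llk(f_obs_list, f_calc_list, d, sigma_total):
    """-LLK (x-ray part), treating each reflection as independent."""
    return -sum(log_acentric_likelihood(fo, fc, d, sigma_total)
                for fo, fc in zip(f_obs_list, f_calc_list))
```

Unlike the least-squares residual, this target changes shape with D and Σ: a poor or incomplete model (small D, large Σ) makes the expected |Fo| less tightly tied to |Fc|. In practice D and Σ are typically estimated in resolution bins.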

Additional Phase Information (e.g. MIR or MAD)

- easily incorporated in maximum likelihood (unlike least-squares) as experimental phase restraints in the refinement process
- P(obs | mod) becomes P(obs | mod, experimental phases)!
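
A common way to carry experimental phase information (e.g. from MIR or MAD) is as Hendrickson-Lattman coefficients (A, B, C, D), with P(φ) ∝ exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ); this D is unrelated to the model-error D above. Because independent phase probability distributions multiply, combining them amounts to adding coefficients. The sketch shows only that bookkeeping plus the figure of merit and centroid phase by numerical summation over φ; it is not how any particular refinement program implements its phased likelihood target, and the coefficient values are hypothetical.

```python
import cmath
import math


def hl_probability(phi, hl):
    """Unnormalized P(phi) from Hendrickson-Lattman coefficients hl = (A, B, C, D)."""
    a, b, c, d = hl
    return math.exp(a * math.cos(phi) + b * math.sin(phi)
                    + c * math.cos(2 * phi) + d * math.sin(2 * phi))


def combine_hl(*sources):
    """Multiply independent phase distributions by adding their HL coefficients."""
    return tuple(sum(values) for values in zip(*sources))


def fom_and_centroid(hl, n_steps=3600):
    """Figure of merit |<exp(i phi)>| and centroid phase (degrees), by summation over phi."""
    weighted, norm = 0j, 0.0
    for i in range(n_steps):
        phi = 2.0 * math.pi * i / n_steps
        p = hl_probability(phi, hl)
        weighted += p * cmath.exp(1j * phi)
        norm += p
    mean_vector = weighted / norm
    return abs(mean_vector), math.degrees(cmath.phase(mean_vector))


mir = (2.0, 0.0, 0.0, 0.0)  # hypothetical MIR phase information
mad = (2.0, 0.0, 0.0, 0.0)  # hypothetical MAD phase information, same most-likely phase
combined = combine_hl(mir, mad)
```

Here each source alone gives a figure of merit of about 0.70, and the combined distribution about 0.86: agreeing phase information sharpens the probability distribution.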

Aside: Scaling Fo to Fc

- not as easy as you think
- Fc is calculated essentially in vacuum, whereas the real crystal (the source of Fo) has bulk solvent (i.e. not the ordered waters you can see)
- bulk solvent tends to dampen low resolution reflections (pun intended)
- poor scaling can mess up refinement
- in olden times, one would exclude all reflections at lower resolution than 8Å from refinement, despite the fact that they're the most accurately measured
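
Setting the solvent problem aside, the simplest piece of scaling is an overall least-squares scale factor: minimizing Σ w(|Fo| − k|Fc|)² over k gives k = Σ w|Fo||Fc| / Σ w|Fc|². A sketch, assuming amplitudes as plain lists and a hypothetical reflection record with a "d" key (d-spacing in Å) for the old low-resolution cut-off; real programs also fit resolution-dependent and anisotropic terms together with the bulk-solvent contribution.

```python
def lsq_scale(f_obs, f_calc, weights=None):
    """Scale k minimizing sum w (|Fo| - k|Fc|)^2, i.e. k = sum w Fo Fc / sum w Fc^2."""
    if weights is None:
        weights = [1.0] * len(f_obs)
    num = sum(w * fo * fc for w, fo, fc in zip(weights, f_obs, f_calc))
    den = sum(w * fc * fc for w, fc in zip(weights, f_calc))
    return num / den


def drop_low_resolution(reflections, d_max=8.0):
    """The 'olden times' practice: discard reflections with d-spacing above d_max (Å)."""
    return [r for r in reflections if r["d"] <= d_max]
```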

Bulk Solvent Correction: Exponential Scaling Model

- assumes the solvent and protein structure factors have exactly opposite phases (Babinet's principle)
  - only really true to 15Å
- for ksol = 0.75, Bsol = 200 Å²
  - 15Å reflection
  - 4Å reflection
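
The exponential model is commonly written as F_total = F_model(1 − k_sol exp(−B_sol s²/4)) with s = 1/d, which is what assuming exactly opposite solvent and protein phases gives. Assuming that form (conventions differ between programs), the multiplier on |Fc| can be computed for the values above.

```python
import math


def babinet_factor(d, k_sol=0.75, b_sol=200.0):
    """Multiplier on |Fc| in the exponential bulk-solvent model:
    1 - k_sol * exp(-B_sol * s^2 / 4), with s = 1/d (d in Å, B_sol in Å^2)."""
    s_sq_over_4 = 1.0 / (4.0 * d * d)
    return 1.0 - k_sol * math.exp(-b_sol * s_sq_over_4)
```

Under these assumptions the 15Å reflection is scaled to about 0.40 of its vacuum value, while the 4Å reflection keeps about 0.97, so the correction matters mostly at low resolution.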

Bulk Solvent Correction: The Mask Model

- mask out the protein, calculate structure factors for everything outside the protein mask
- no assumption about solvent phases
- ksolv and Bsolv determined by LSQ fit
- mask also optimized
- more robust than the exponential scaling model
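
A sketch of the mask model's arithmetic, assuming the common form F_total = F_model + k_sol exp(−B_sol s²/4) F_mask with s = 1/d, where F_mask is the complex structure factor of the solvent region. Computing F_mask itself (a Fourier transform of the mask) is not shown, the data are hypothetical, and the coarse grid search only stands in for the least-squares fit of ksolv and Bsolv; programs use proper optimizers and usually refine the overall scale at the same time.

```python
import math


def f_total(f_model, f_mask, d, k_sol, b_sol):
    """Complex total structure factor: model plus scaled, B-damped solvent-mask term."""
    return f_model + k_sol * math.exp(-b_sol / (4.0 * d * d)) * f_mask


def fit_solvent_parameters(f_obs, f_model, f_mask, d_spacings, k_grid, b_grid):
    """Grid-search least-squares fit of (k_sol, B_sol) to observed amplitudes."""
    best = None
    for k in k_grid:
        for b in b_grid:
            residual = sum((fo - abs(f_total(fm, fs, d, k, b))) ** 2
                           for fo, fm, fs, d in zip(f_obs, f_model, f_mask, d_spacings))
            if best is None or residual < best[0]:
                best = (residual, k, b)
    return best[1], best[2]


# Hypothetical complex model and mask structure factors at four resolutions (Å).
f_model = [100 + 20j, 80 - 30j, 60 + 10j, 40 + 5j]
f_mask = [-60 - 10j, -30 + 20j, -10 - 5j, -5 + 2j]
d_spacings = [20.0, 10.0, 6.0, 4.0]
k_grid = [0.05 * i for i in range(13)]   # 0.0 .. 0.6
b_grid = [10.0 * i for i in range(11)]   # 0 .. 100 Å^2
f_obs = [abs(f_total(fm, fs, d, k_grid[7], 50.0))
         for fm, fs, d in zip(f_model, f_mask, d_spacings)]
k_fit, b_fit = fit_solvent_parameters(f_obs, f_model, f_mask, d_spacings, k_grid, b_grid)
```

Because F_mask enters with its own phase, nothing forces the solvent contribution to oppose the model exactly, which is the key difference from the exponential model.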

Rebuilding

- after a round of refinement, model phases should be markedly improved, making the need for rebuilding evident
- side chains added
- loops built
- waters/ions/ligands added
- incorrectly-built areas remodeled

The "2Fo-Fc" Map

- is our approximation to the true map (|Fo| with the true phases)

Maximum Likelihood Maps

- 2Fo-Fc type map: coefficients 2m|Fo| − D|Fc|, with model phases
- m = figure of merit of model phases
- D = weight reflective of errors in model
- difference map: coefficients m|Fo| − D|Fc|, with model phases
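
A sketch of how the two sets of map coefficients are formed for one reflection, assuming m and D have already been estimated (that estimation is not shown). Both maps use the model phase; setting m = D = 1 recovers the plain 2Fo−Fc and Fo−Fc coefficients. The function name is illustrative.

```python
import cmath


def ml_map_coefficients(f_obs, f_calc, m, d):
    """Return complex (2mFo - DFc, mFo - DFc) coefficients using the phase of f_calc.

    f_obs: observed amplitude |Fo|; f_calc: complex model structure factor;
    m: figure of merit of the model phase; d: the D weight for model error.
    """
    phase_factor = cmath.exp(1j * cmath.phase(f_calc))
    two_fo_fc = (2.0 * m * f_obs - d * abs(f_calc)) * phase_factor
    fo_fc = (m * f_obs - d * abs(f_calc)) * phase_factor
    return two_fo_fc, fo_fc
```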


Validation

- most obvious validation is Rfree
- SFCHECK checks the structure against the data
- other methods are model-based
  - all involve comparing the present structure to well-refined structures in a database
- some deviations from "standard" parameters will be functionally and/or structurally necessary
- others will be errors in building

Procheck

- very thorough check of a variety of geometry-based criteria
  - Ramachandran plot
  - main chain bond lengths and angles
  - planarity of rings and end groups (R, D, N, E, Q)
  - torsion angles, chirality
  - close non-bonded interactions, main chain H-bonds, disulfide bond geometry
- residue-by-residue analysis of most of the above
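
The Ramachandran plot needs the backbone torsions φ (C(i−1)-N-CA-C) and ψ (N-CA-C-N(i+1)). A minimal dihedral-angle sketch using plain coordinate tuples (no PDB parsing), with the usual sign convention: looking along the central bond, the angle is positive when the front bond must turn clockwise to eclipse the back bond. The result is in degrees between −180 and 180; the helper names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees) defined by four points, e.g. C(i-1), N, CA, C for phi."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n1, b1[1] / n1, b1[2] / n1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```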


Errat

- analyzes statistics of non-bonded interactions between different atom types
- highlights unusual regions, giving a "confidence level" that a region is in error
- anything above the 99% confidence level in most cases needs to be rebuilt


Verify3D

- 3D-1D profile analysis of a structure versus its own sequence
- if a residue is in an unusual chemical environment, it will receive a bad score and should be inspected
- environment defined by
  - area of residue buried
  - fraction covered by polar atoms
  - local secondary structure

PROVE

- analyzes departures from standard atomic volumes
- presented as "Z-score" or RMS(Z-score)
  - Z-score > 3 → BAD!
  - RMS(Z-score) > 2 → BAD!
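
A sketch of the Z-score bookkeeping, assuming hypothetical reference volumes (a mean and standard deviation per atom type; PROVE's actual reference tables are not reproduced). Per-atom Z = (V − mean)/sd, and RMS(Z) summarizes a residue or a whole structure. The default cut-offs of 3 and 2 follow the numbers on the slide, but pairing 3 with per-atom |Z| and 2 with RMS(Z) is an assumption.

```python
import math

# Hypothetical reference atomic volumes (Å^3): atom type -> (mean, standard deviation).
REFERENCE_VOLUMES = {"C": (16.0, 2.0), "N": (13.0, 1.5), "O": (14.0, 1.5)}


def volume_z_scores(atoms, reference=REFERENCE_VOLUMES):
    """Z = (V - mean) / sd for each (atom_type, volume) pair."""
    return [(v - reference[t][0]) / reference[t][1] for t, v in atoms]


def rms_z(z_scores):
    """Root-mean-square of a list of Z-scores."""
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))


def prove_style_flags(atoms, atom_cutoff=3.0, rms_cutoff=2.0):
    """Indices of atoms with |Z| above atom_cutoff, and whether RMS(Z) exceeds rms_cutoff."""
    z = volume_z_scores(atoms)
    outliers = [i for i, value in enumerate(z) if abs(value) > atom_cutoff]
    return outliers, rms_z(z) > rms_cutoff
```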

