Title: Bulk Model Construction and Molecular Replacement in CCP4 Automation
1 Bulk Model Construction and Molecular Replacement in CCP4 Automation
- Ronan Keegan, Norman Stein, Martyn Winn.
2 Overview
- Brute-force search for the best model for Molecular Replacement (MR) on a target structure.
- Python script utilising HPC resources.
- Can also run on a single machine.
- Two main parts:
- Model generation using a variety of methods.
- Feeding a selection of the best models into an MR program.
- User input requirements: the target sequence and its associated MTZ file.
3 Overview
4 Process Target Information
- Calculate Molecular Weight
- Estimate number of molecules in the a.s.u.
- Parse MTZ file for any relevant parameters
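The first two steps can be illustrated with a small Python sketch. This is not the pipeline's own code. It assumes that the unit-cell parameters and the number of symmetry operators have already been read from the MTZ header (that parsing is not shown), and it uses approximate average residue masses. The number of copies in the a.s.u. comes from the Matthews relation Vm = V_cell / (n_symops × n_copies × MW). The default Vm of 2.4 Å³/Da is an assumed typical value, not a parameter taken from the original program.

```python
import math

# Approximate average masses (Da) of amino-acid residues within a chain
# (free amino acid minus one water). Values assumed from standard tables.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # added once for the chain termini


def molecular_weight(sequence):
    """Molecular weight (Da) of a protein chain from its one-letter sequence."""
    seq = "".join(sequence.split()).upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume (Å^3) from cell edges (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def estimate_copies(volume, n_symops, mw, vm=2.4):
    """Copies per a.s.u., rounded from the Matthews relation at the assumed `vm`."""
    copies = volume / (n_symops * vm * mw)
    return max(1, round(copies))
```

For example, a 100 × 100 × 50 Å orthorhombic cell with 4 symmetry operators and a 25 kDa chain gives 2 copies at Vm = 2.5 Å³/Da. A fuller treatment would weigh several candidate copy numbers rather than take a single rounded value.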
5 Searching for Homologous Structures
- Using the target sequence, the program queries services based at the EBI for homologous structures found by sequence matching (OCA).
- The top match from the sequence-based search is then used for a secondary-structure-based search using the MSDFold/SSM web service.
- Using the results from the above searches, the service also consults PQS at the EBI for any related multimeric structures.
- As an additional option, the top hits from the search can be aligned using Superpose to construct an ensemble of models to be used at the Molecular Replacement stage.
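The selection logic described above can be sketched as follows: the top sequence hit seeds the SSM search, and the top hits seed the ensemble. The `Hit` record, its `identity` field, and the `ensemble_size` and `min_identity` cut-offs are hypothetical. The actual OCA, MSDFold/SSM and PQS queries and the Superpose alignment are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    pdb_id: str
    chain: str
    identity: float  # hypothetical: fractional sequence identity to the target (0..1)


def plan_search(hits, ensemble_size=5, min_identity=0.25):
    """Return (query for the SSM search, hits to superpose into an ensemble)."""
    ranked = sorted(hits, key=lambda h: h.identity, reverse=True)  # stable sort
    if not ranked:
        return None, []
    ssm_query = ranked[0]  # the top sequence match seeds the structure search
    ensemble, seen = [], set()
    for hit in ranked:
        if hit.identity < min_identity:
            break  # ranked in descending order, so every later hit is weaker
        key = (hit.pdb_id.lower(), hit.chain)
        if key in seen:
            continue  # the same chain reported twice by different searches
        seen.add(key)
        ensemble.append(hit)
        if len(ensemble) == ensemble_size:
            break
    return ssm_query, ensemble
```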
6 Model Construction
- Once the search stage has been completed, all of the associated PDB structure files are retrieved.
- These are then manipulated in several different ways to create a plethora of possible models:
- 1) PDB Clipping (Pdbcur, Pdbset, Coord_format)
- Waters and hydrogens are removed.
- Any anomalies in the structure file, such as empty fields, are corrected (e.g. missing chain identifiers).
- The most probable conformations are selected.
- Individual chains are extracted.
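The clipping steps can be approximated in plain Python on fixed-column PDB records. This is a hypothetical stand-in and does not reproduce the behaviour of Pdbcur, Pdbset or Coord_format. It makes three assumptions: "most probable conformation" means the alternate location with the highest occupancy, blank chain IDs default to `A`, and the water residue names form a fixed list.

```python
WATER_NAMES = {"HOH", "WAT", "DOD"}  # assumed list of water residue names


def _element(line):
    """Element symbol from columns 77-78, falling back to the atom name."""
    element = line[76:78].strip()
    if not element:
        # Heuristic only: drop leading digits from the atom name ("1HB" -> "H").
        element = line[12:16].strip().lstrip("0123456789")[:1]
    return element.upper()


def clip_pdb(lines, default_chain="A"):
    """Clip ATOM/HETATM records and split them by chain.

    Drops waters and hydrogens, fills blank chain IDs, keeps the
    highest-occupancy alternate location of each atom and returns
    {chain_id: [record lines]} in first-seen order.
    """
    best = {}  # (chain, resSeq+iCode, atom name) -> (occupancy, line)
    for raw in lines:
        if not raw.startswith(("ATOM  ", "HETATM")):
            continue
        line = raw.rstrip("\r\n").ljust(80)
        if line[17:20].strip() in WATER_NAMES or _element(line) in ("H", "D"):
            continue
        chain = line[21].strip() or default_chain
        line = line[:21] + chain + line[22:]
        occ_text = line[54:60].strip()
        occupancy = float(occ_text) if occ_text else 1.0
        key = (chain, line[22:27], line[12:16])
        if key not in best or occupancy > best[key][0]:
            best[key] = (occupancy, line)  # dict keeps the first-seen key position
    chains = {}
    for (chain, _, _), (_, line) in best.items():
        line = line[:16] + " " + line[17:]  # clear the altLoc column
        chains.setdefault(chain, []).append(line.rstrip())
    return chains
```

Each value in the returned dictionary corresponds to one extracted chain and could be written out as its own model file.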
7 Model Construction
- 2) Molrep
- Uses its own sequence alignment to prune the side chains.
- Side chains are stripped to their lowest common parts.
- 3) Chainsaw (Norman Stein)
- The input sequence alignment is used to strip side chains.
- More severe pruning than the Molrep mixed model.
- Can be given many possible alignments to create different models from the same structure.
- Can use sophisticated sequence alignment methods such as PSI-BLAST and FFAS.
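Alignment-driven pruning of the kind described for Chainsaw can be sketched as a per-residue plan. The rule used here is an assumed simplification for illustration and is not Chainsaw's actual set of modes: identical residues are kept whole, mismatches are cut back to main chain plus CB, and template residues aligned to a target gap are deleted. All function names are hypothetical.

```python
MAIN_CHAIN = ("N", "CA", "C", "O")


def pruning_plan(template_aln, target_aln):
    """Map each template residue (0-based, ungapped index) to keep/truncate/delete."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    plan = {}
    index = 0
    for t_res, q_res in zip(template_aln, target_aln):
        if t_res == "-":
            continue  # target insertion: no template residue to modify
        if q_res == "-":
            plan[index] = "delete"    # template residue has no target counterpart
        elif t_res == q_res:
            plan[index] = "keep"      # conserved: keep the whole side chain
        else:
            plan[index] = "truncate"  # mismatch: cut back to main chain + CB
        index += 1
    return plan


def atoms_to_keep(action, atom_names):
    """Atom names of one template residue that survive the given action."""
    if action == "keep":
        return list(atom_names)
    if action == "truncate":
        return [name for name in atom_names if name in MAIN_CHAIN or name == "CB"]
    return []
```

Passing the same template through several alternative alignments yields several plans, and therefore several distinct models from one structure.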
8 Molecular Replacement
- A cluster or HPC resource spawns multiple MR jobs, each taking one of the constructed models along with the target structure data.
- Phaser, Amore or Molrep can be used to do the MR.
- Phaser is used for the ensemble of top hits.
- If and when the MR program fits the model structure to the target data, the resulting PDB file is processed using Refmac to assess whether it is likely to refine.
- Results are then provided to the user for all of the top-scoring models.
- The user can retrieve the refined structures along with any of the associated log files.
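On a single machine, the fan-out of one MR job per model and the ranking of the results might look like the sketch below. `run_mr` stands for a hypothetical wrapper around Phaser, Amore or Molrep that returns an `MRResult`. The `score` field, its "higher is better" convention and the `top_n` cut-off are all assumptions, and the Refmac check is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class MRResult:
    model: str    # path or name of the search model
    score: float  # hypothetical figure of merit; higher is assumed better
    solved: bool  # whether the MR program reported a placed solution


def run_all_mr(models, run_mr, max_workers=4, top_n=5):
    """Run one MR job per model in parallel and return the best solved results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_mr, models))  # results come back in input order
    solved = [r for r in results if r.solved]
    return sorted(solved, key=lambda r: r.score, reverse=True)[:top_n]
```

Threads are used instead of processes because each job would spend most of its time waiting on an external program. On a cluster, the batch scheduler plays the same role.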
9 e-HTPX
Jobs can be submitted via the e-HTPX portal to the Daresbury e-HTPX
computational resources (cluster or Condor pool) or, if the user has a
Grid Certificate, to the UK National Grid resources. Users can monitor
the job results as they are produced via a web page hosted on the
e-HTPX server machine, and they are notified by email when their job is
complete. Refined structure files are made available to the user for
download upon completion. First external user as of a couple of days
ago!
12 JCSG Targets
- N.B. good homologues are available for these targets.
- Currently working through more challenging examples.
13 Other points
- The program can also be run on a single machine in a scaled-down fashion.
- Can be run from the command line.
- Easy to swap out Phaser and run Amore, Molrep or another MR program instead.
- Modularised: model construction can be run on its own.
- Other model-generating methods can easily be inserted.
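One way to make the MR program swappable is to keep a registry of wrapper functions that share a single signature. The names below are hypothetical, and the placeholder bodies only derive an output file name. A real wrapper would build and run the corresponding CCP4 job.

```python
from typing import Callable, Dict

# Every backend takes (model_path, mtz_path) and returns the path of its output model.
MRBackend = Callable[[str, str], str]
MR_BACKENDS: Dict[str, MRBackend] = {}


def register_backend(name: str):
    """Decorator that registers an MR wrapper under a case-insensitive name."""
    def decorator(func: MRBackend) -> MRBackend:
        MR_BACKENDS[name.lower()] = func
        return func
    return decorator


@register_backend("phaser")
def run_phaser(model_path: str, mtz_path: str) -> str:
    # Placeholder body: only derives an output file name, runs nothing.
    return model_path.replace(".pdb", "_phaser.pdb")


@register_backend("molrep")
def run_molrep(model_path: str, mtz_path: str) -> str:
    # Placeholder body: only derives an output file name, runs nothing.
    return model_path.replace(".pdb", "_molrep.pdb")


def run_mr(model_path: str, mtz_path: str, program: str = "phaser") -> str:
    """Dispatch to the selected MR backend; adding a program needs no change here."""
    try:
        backend = MR_BACKENDS[program.lower()]
    except KeyError:
        raise ValueError(f"unknown MR program: {program!r}") from None
    return backend(model_path, mtz_path)
```

The same pattern would also accommodate additional model-generation methods: each one registers a function with a shared signature, and the driver code stays unchanged.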
14 Future Plans
- Make it smarter and quicker.
- Use better sequence alignment methods such as PSI-BLAST and FFAS.
- Use Norman's Chainsaw program as an extra model creation method.
- Incorporate Norman's Amore wrapper.
- Integrate it into Graeme's XIA project: make use of its scheduler code and wrappers, and provide a Model Generation module for XIA-MR.