
Number of slides: 48

Molecular Replacement
Martyn Winn, CCP4 group, Daresbury Laboratory, UK

What do we know from the diffraction data?
• The point group of the new crystal form, the volume of the asymmetric unit and hence the likely number of molecules it contains (Matthews coefficient).
Note: we often cannot be sure of the space group and will need to search for the solution in several. [Space-group determination rests on the observation of systematic absences in certain zones, e.g. only l = 4n seen on the 00l axis. Is this a 4₁ screw axis? A 4₃ screw axis? Pointless tries to decide; check the Scala plots too. Or are there two or more molecules in the asymmetric unit in the same orientation but separated by (x, y, 1/4)?]
• The quality of the experimental intensities:
– Are they complete? Saturated at low resolution? Anisotropic?
– Are the intensity statistics reasonable? Could the crystal be twinned?
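The absence test in the bracketed note can be sketched in a few lines. This is a minimal illustration, not how Pointless works: it assumes you already have I/σ(I) for the 00l axial reflections, and the cutoff of 3.0 and the example intensities are hypothetical. It only asks whether every reflection with l not a multiple of n is weak.

```python
def screw_axis_consistent(intensities_00l, n, sig_cutoff=3.0):
    """True if every 00l reflection with l not a multiple of n is weak.

    intensities_00l maps l -> I/sigma(I) for the 00l axial reflections.
    sig_cutoff is a hypothetical threshold for calling a reflection "present".
    """
    violations = [l for l, i_sig in intensities_00l.items()
                  if l % n != 0 and i_sig >= sig_cutoff]
    return not violations

# Hypothetical axial data: strong reflections only at l = 4, 8, 12
axial = {1: 0.4, 2: 1.1, 3: -0.2, 4: 25.0, 5: 0.3, 6: 0.8, 7: 0.1, 8: 40.2, 12: 18.7}
for n in (2, 4):
    print(f"consistent with l = {n}n only: {screw_axis_consistent(axial, n)}")
```

Note that the l = 4n pattern is consistent with both 4₁ and 4₃: the two enantiomorphs have identical absences, which is why the search has to be run in both.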

Data analysis before MR
• Matthews coefficient: number of copies in the a.s.u.
• Native Patterson (translational NCS)
• B-factor analysis
• Self RF (rotational NCS)
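The Matthews coefficient is simple arithmetic: V_M = V_cell / (number of symmetry operators × copies per a.s.u. × molecular weight), in Å³/Da, and the solvent fraction is roughly 1 − 1.23/V_M. The sketch below assumes a general triclinic cell-volume formula and a hypothetical P2₁2₁2₁ cell with a 25 kDa monomer. It is not the CCP4 MATTHEWS_COEF program.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume in cubic Angstrom from edges (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

def matthews(v_cell, n_symops, mw_da, n_copies):
    """Matthews coefficient V_M (A^3/Da) and estimated solvent fraction."""
    vm = v_cell / (n_symops * n_copies * mw_da)
    # 1.23 A^3/Da corresponds to a protein density of ~1.35 g/cm^3
    solvent = 1.0 - 1.23 / vm
    return vm, solvent

# Hypothetical example: P2(1)2(1)2(1) (4 symmetry operators), 25 kDa monomer
v = cell_volume(60.0, 70.0, 80.0)
for n in (1, 2, 3):
    vm, solv = matthews(v, 4, 25000.0, n)
    print(f"{n} copies: V_M = {vm:.2f} A^3/Da, solvent = {solv:.0%}")
```

Here three copies give a negative solvent fraction and can be ruled out. One and two copies both remain possible, which is exactly the ambiguity the self RF and native Patterson help resolve.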

NON-CRYSTALLOGRAPHIC TRANSLATION VECTOR (Thank you Airlie)
If the asymmetric unit contains two molecules related by a translation, then the native Patterson will have a large peak at the position representing this translation.
Unlike non-crystallographic rotations, non-crystallographic translations are not useful in structure determination (use experimental phases?). In fact, they introduce awkward structure-factor correlations that are not currently accounted for, and they can make structures difficult to refine.
If there is more than one molecule in the asymmetric unit, you should always check for non-crystallographic translation.
[Figure: asymmetric unit of an unknown crystal structure with a non-crystallographic translation. The crystal Patterson has an origin-sized peak at the translation vector.]
Molrep does this check within the program.
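A toy model shows why the native Patterson reveals the translation. The Patterson is the Fourier transform of |F|², which is the autocorrelation of the density. Every atom in molecule A therefore pairs with its copy in molecule B at the same vector t. The sketch assumes identical point atoms on an integer grid, random coordinates and a hypothetical translation t. Real programs work with measured intensities and real cells.

```python
import numpy as np

def toy_patterson(atom_grid_positions, shape):
    """Patterson of identical point atoms on integer grid positions.

    Computed as the inverse FFT of |F|^2, i.e. the circular autocorrelation
    of the density, so no phases are needed.
    """
    rho = np.zeros(shape)
    for pos in atom_grid_positions:
        rho[tuple(np.mod(pos, shape))] += 1.0     # wrap into the unit cell
    f = np.fft.fftn(rho)                          # toy structure factors
    return np.fft.ifftn(np.abs(f) ** 2).real      # Patterson = FT of intensities

def top_non_origin_peak(patt):
    """Grid index of the highest non-origin point, and its height / origin height."""
    masked = patt.copy()
    masked[0, 0, 0] = -np.inf
    idx = np.unravel_index(np.argmax(masked), patt.shape)
    return tuple(int(i) for i in idx), float(patt[idx] / patt[0, 0, 0])

rng = np.random.default_rng(0)
shape = (24, 24, 24)
mol_a = rng.integers(0, 24, size=(40, 3))   # 40 random "atoms" of molecule A
t = np.array([12, 0, 6])                    # hypothetical NCS translation (grid units)
asu = np.vstack([mol_a, mol_a + t])         # molecule B = molecule A shifted by t

patt = toy_patterson(asu, shape)
peak, height = top_non_origin_peak(patt)
print("peak at", peak, "relative height", round(height, 2))
```

With two copies, the peak at ±t collects one molecule's worth of atom pairs, which is about half the origin height. Ordinary cross-vectors collect only a few chance coincidences.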

What else can we do if there are several molecules in the asymmetric unit?
• A self-rotation function can be calculated from the measured data; it does not need a model.
• What can it show? If the molecule forms an oligomer, e.g. a dimer or trimer, then we will see a peak in the self-rotation map.
• However, this can be mixed up with crystal symmetry and be very confusing to interpret!

SELF-ROTATION FUNCTION (Thank you Airlie)
If there is more than one molecule in the asymmetric unit (and no NCT), then the rotation function of the Patterson on itself will give a peak at the angle corresponding to the relative rotation between the two.
[Figure: asymmetric unit of an unknown crystal with non-crystallographic two-fold symmetry. The crystal Patterson has the same two-fold symmetry near the origin (intramolecular peaks only).]
The self-rotation function does not need a model! This is useful for confirming or determining how many copies of the structure you have in the asymmetric unit, so it should be one of the first things you do with a new data set. If non-crystallographic symmetry is present, it is extremely useful in MIR and density modification.
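For a non-crystallographic two-fold, the self-RF peak lies on the κ = 180° section at the polar angles (ω, φ) of the axis. The sketch below converts a rotation matrix into those angles. It assumes CCP4-style polar angles in an orthogonal frame (ω measured from z, φ from x in the xy plane). It uses the identity R = 2·l·lᵀ − I for a 180° rotation about the unit axis l.

```python
import numpy as np

def twofold_polar_angles(r, tol_deg=1.0):
    """Return (omega, phi, kappa) in degrees for a two-fold rotation matrix r."""
    r = np.asarray(r, dtype=float)
    kappa = np.degrees(np.arccos(np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0)))
    if abs(kappa - 180.0) > tol_deg:
        raise ValueError(f"not a two-fold: kappa = {kappa:.1f} deg")
    # For a 180-degree rotation, (R + I) / 2 = l l^T
    llt = (r + np.eye(3)) / 2.0
    i = int(np.argmax(np.diag(llt)))      # largest component: numerically safest column
    axis = llt[:, i] / np.sqrt(llt[i, i])
    if axis[2] < 0:                       # a two-fold has no sense; use upper hemisphere
        axis = -axis
    omega = np.degrees(np.arccos(np.clip(axis[2], -1.0, 1.0)))
    phi = np.degrees(np.arctan2(axis[1], axis[0])) % 360.0
    return omega, phi, kappa

# Two-fold about the (1, 1, 0) diagonal: swaps x and y, inverts z
r_diag = [[0, 1, 0], [1, 0, 0], [0, 0, -1]]
print(twofold_polar_angles(r_diag))
```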

Self Rotation Function for S100
[Figure: symmetry-related 2-folds]

Finding search models
Need a PDB file for a structurally similar protein. This usually means a homologous protein. Either you already have one, or you search the Protein Data Bank.
The search is based on sequence alignment between the target protein and proteins in the PDB. Several bioinformatics tools can help here:
• OCA, MSDlite, MSDtarget: all use FASTA (www.ebi.ac.uk/msd)
• PSI-BLAST: iterative searching (www.ncbi.nlm.nih.gov/BLAST)
• FFAS: profile-profile alignment (ffas.ljcrf.edu/ffas-cgi/ffas.pl)

Editing search models
Don’t use a raw PDB file for molecular replacement unless it is very similar (e.g. same protein, different conditions, ligand, etc.). Edit it to:
• remove residues that don’t occur in the target
• remove side-chain atoms that don’t occur in the target
(these two steps assume a known alignment from model to target)
• remove uncertain regions of the model (check B factors, occupancies)
• remove flexible loops
Note that we don’t add anything!! Homology modelling?
Consider using individual domains and multimers (see MrBUMP below).
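A few of these edits can be applied with a plain filter over fixed-column PDB ATOM records. This is a minimal sketch, not a CCP4 tool. The B-factor threshold of 60 Å² and the `keep` set of (chain, residue number) pairs are hypothetical inputs; in practice the residue set would come from the model-to-target alignment. The `atom_line` helper exists only to build example records.

```python
def edit_search_model(pdb_lines, max_b=60.0, keep=None):
    """Return the ATOM records worth keeping in an MR search model."""
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue                                   # drop HETATM (waters, ligands), headers
        element = line[76:78].strip() or line[12:16].strip()[:1]
        if element == "H":
            continue                                   # drop hydrogens
        if line[16] not in (" ", "A"):
            continue                                   # keep only the first alternate conformation
        if float(line[60:66]) > max_b:
            continue                                   # drop poorly ordered atoms
        residue = (line[21], int(line[22:26]))         # (chain ID, residue number)
        if keep is not None and residue not in keep:
            continue                                   # drop residues not aligned to the target
        kept.append(line)
    return kept

def atom_line(serial, name, res, chain, seq, b, element, alt=" ", occ=1.0):
    """Hypothetical helper: build a fixed-column ATOM record with zero coordinates."""
    return (f"ATOM  {serial:5d} {name:<4s}{alt:1s}{res:3s} {chain:1s}{seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occ:6.2f}{b:6.2f}          {element:>2s}")

model = [
    atom_line(1, " N", "ALA", "A", 1, 20.0, "N"),
    atom_line(2, " CA", "ALA", "A", 1, 95.0, "C"),   # high B factor
    atom_line(3, " H", "ALA", "A", 1, 20.0, "H"),    # hydrogen
    atom_line(4, " CA", "GLY", "A", 2, 25.0, "C"),   # residue not in the alignment
]
kept = edit_search_model(model, max_b=60.0, keep={("A", 1)})
print(len(kept), "of", len(model), "atoms kept")
```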

Chainsaw
Norman Stein, Daresbury Lab.

MR model preparation: Chainsaw
• Molecular replacement model preparation utility that edits a PDB search model according to a sequence alignment.
• Features:
– Removes unaligned residues from the model
– Prunes non-conserved residues back to the gamma atom
– Preserves more atoms than a polyalanine model
[Figure: unmodified template, Chainsaw template and polyalanine template. Example of 1mr6 used as a template for 1tgx (38% sequence identity).]
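The per-residue rules on this slide, and the polyalanine alternative, can be sketched as a small decision function. This is a simplification and not Chainsaw's actual code. It assumes standard PDB atom names, treats CG, CG1, CG2, OG, OG1 and SG as "gamma" atoms, and ignores special cases such as glycine or proline in the target.

```python
# Simplified truncation rules; atom names follow PDB conventions.
MAIN_CHAIN_AND_CB = {"N", "CA", "C", "O", "CB"}
GAMMA_ATOMS = {"CG", "CG1", "CG2", "OG", "OG1", "SG"}

def atoms_to_keep(template_res, target_res, atom_names, mode="chainsaw"):
    """Atoms of one template residue to keep in the search model.

    target_res is None when the template residue is not aligned to the target.
    mode: "chainsaw" (prune non-conserved side chains to the gamma atom)
          or "polyala" (truncate every side chain at CB).
    """
    if target_res is None:
        return []                                    # unaligned: remove the residue
    if mode == "polyala":
        allowed = MAIN_CHAIN_AND_CB
    elif template_res == target_res:
        return list(atom_names)                      # conserved: keep the whole residue
    else:
        allowed = MAIN_CHAIN_AND_CB | GAMMA_ATOMS    # non-conserved: prune to gamma
    return [name for name in atom_names if name in allowed]

lys = ["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"]
print(atoms_to_keep("LYS", "ARG", lys))                   # mutated residue
print(atoms_to_keep("LYS", "LYS", lys))                   # conserved residue
print(atoms_to_keep("LYS", None, lys))                    # unaligned residue
print(atoms_to_keep("LYS", "LYS", lys, mode="polyala"))   # polyalanine model
```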

Running Chainsaw: complete PDB file + model-to-target alignment
Alignment from:
• the original search tool (FASTA, PSI-BLAST, etc.)
• a multiple alignment (set of search models, protein family, etc.)
• hand-created

Molrep
Alexei Vagin, York
http://www.ysbl.york.ac.uk/~alexei/molrep.html

Molrep: overview of functionality
Performs complete MR in a single step: experimental data (MTZ) + search model (PDB) → Molrep → positioned search model
• Individual steps for more difficult cases: CRF, TF, rigid-body
• Multi-copy search: locked CRF, dyad search
• Self RF
• Phased TF, spherically-averaged phased TF
• Improve search model
• Other search models: electron density map, NMR models
• Fit model in electron density map / EM map

MR for a straightforward case via the GUI
[Screenshot annotated: title, mode, MTZ file, MTZ labels, search model; RUN IT!]

Other parameters (DEFAULTS ARE GOOD)
• Low-resolution cut-off: Molrep uses a soft cut-off, Boff (keywords BOFF, COMPL, RESMIN)
• High-resolution cut-off: Molrep uses a soft cut-off, Badd (keywords BADD, SIM)
  |F|new = |F|input * exp(-Badd*s²) * (1 - exp(-Boff*s²)); defaults are estimated
• High-resolution limit: absolute cut-off (keyword RESMAX); default estimated
• Radius of Patterson sphere for CRF: default is twice the radius of gyration of the search model (keyword RAD)
[Screenshot: "Infrequently Used Parameters" in the GUI]
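To see what the two soft cut-offs do together, the weight in the |F|new formula can be evaluated across resolution. This is a sketch under stated assumptions. BADD = 10 and BOFF = 200 are hypothetical values, not Molrep's estimated defaults. The conversion s = 1/(2d), i.e. sinθ/λ, is also an assumption, since the slide does not define s.

```python
import math

def molrep_soft_weight(s, badd, boff):
    """Weight applied to |F| by the soft resolution cut-offs on the slide:
    |F|new = |F|input * exp(-Badd*s^2) * (1 - exp(-Boff*s^2))."""
    s2 = s * s
    high_res_damping = math.exp(-badd * s2)        # suppresses high-resolution terms
    low_res_damping = 1.0 - math.exp(-boff * s2)   # suppresses low-resolution terms
    return high_res_damping * low_res_damping

# Hypothetical parameter values, chosen only to show the shape of the weight.
BADD, BOFF = 10.0, 200.0
for d in (50.0, 20.0, 10.0, 5.0, 3.0, 2.0):
    s = 1.0 / (2.0 * d)    # ASSUMPTION: s = sin(theta)/lambda = 1/(2d)
    print(f"d = {d:5.1f} A   weight = {molrep_soft_weight(s, BADD, BOFF):.3f}")
```

The product is zero at s = 0, rises once Boff·s² becomes appreciable, and falls again as Badd·s² grows. It acts as a smooth band-pass rather than a hard resolution window.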

Cross Rotation Function
[Log excerpt annotated: polar angles; Euler angles (CCP4); R factor; list of top RF peaks; more details here]

Translation Function
[Log excerpt annotated: polar angles; fractional translation; R factor; score; contrast of solution; list of solutions: top TF for each RF solution]

Identification of solutions
SCORE = product of the correlation coefficient and the maximal value of the packing function
• The packing function is integrated into the TF search and removes solutions with overlapping molecules.
CONTRAST = ratio of top score to mean score:
  > 2.5: definitely a solution
  1.8–2.5: solution
  1.5–1.8: maybe a solution
  1.3–1.5: maybe not a solution, but the program accepts it
  < 1.3: probably not a solution
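The contrast bands above translate directly into a lookup. This is a sketch, not Molrep's code. The slide gives open intervals, so putting a value that falls exactly on a boundary into the lower band is an arbitrary choice made here.

```python
def molrep_contrast(top_score, mean_score):
    """Contrast = top score / mean score, with the verdict bands from the slide."""
    contrast = top_score / mean_score
    # Values exactly on a boundary go to the lower band (arbitrary choice).
    if contrast > 2.5:
        verdict = "definitely solution"
    elif contrast > 1.8:
        verdict = "solution"
    elif contrast > 1.5:
        verdict = "maybe solution"
    elif contrast > 1.3:
        verdict = "maybe not solution, but program accepts it"
    else:
        verdict = "probably not solution"
    return contrast, verdict

# Hypothetical TF scores
print(molrep_contrast(0.30, 0.10))
print(molrep_contrast(0.12, 0.10))
```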

Finding more than one copy in the asu
By default, Molrep will estimate the number of copies to find; override this with the NMON keyword.
Program flow: CRF → TF for first copy → fix first copy → TF for second copy → fix second copy → TF for third copy → ...

Solving complexes
• Choose the first component (largest, highest similarity)
• Solve for the first component (you will probably need to specify NMON explicitly)
• New Molrep job: model in = second component; fixed in = positioned first component
• Repeat for all other components
Possibility to use the spherically-averaged phased TF, using phases from the first component.

Phaser
Randy Read, Airlie McCoy, Cambridge
Phaser website: http://www-structmed.cimr.cam.ac.uk/phaser/

Performs complete MR in a single step: experimental data (MTZ) + search model (PDB) → Phaser → positioned search model
Use "MODE MR_AUTO" or "automated search" in the GUI:
• anisotropy correction
• fast rotation function
• fast translation function
• packing
• refinement and phasing
• loop over models

More functionality...
• All steps can be run separately
• Search over space groups:
– MTZ space group and its enantiomorph
– all space groups in the MTZ point group
– selected space groups
• Ensemble models (see later)
• Brute RF and TF: slow and accurate
• Normal mode analysis: generates perturbed models

MR for a straightforward case via the GUI
[Screenshot annotated: mode, MTZ file, search model, specify search target details; RUN IT!]

FRF
[Log excerpt annotated: Euler angles (CCP4); top LLG and Z-scores for FRF]

FTF
[Log excerpt annotated: fractional translation; FRF solution number; top LLG and Z-scores for FRF]

Packing
Phaser does a packing check after the FTF.
Clashes = Cα atoms closer than 3 Å
Default number of allowed clashes = 10 (beware: this was 0 in older versions)
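The clash criterion on this slide can be sketched as a pairwise distance count. This is not Phaser's packing algorithm. It assumes you have already generated the Cartesian Cα coordinates of a placed model and of one symmetry or lattice copy; crystal symmetry itself is not handled here. The 3 Å cutoff and the limit of 10 clashes are the values from the slide.

```python
import numpy as np

def count_clashes(ca_a, ca_b, cutoff=3.0):
    """Number of Calpha pairs (one from each copy) closer than cutoff (Angstrom)."""
    a = np.asarray(ca_a, dtype=float)
    b = np.asarray(ca_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # all pairwise distances
    return int(np.count_nonzero(d < cutoff))

def packs(ca_a, ca_b, max_clashes=10, cutoff=3.0):
    """Accept the placement if the number of clashes does not exceed max_clashes."""
    return count_clashes(ca_a, ca_b, cutoff) <= max_clashes

# Hypothetical coordinates: two Calpha atoms per copy
copy1 = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]]
copy2 = [[1.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
print(count_clashes(copy1, copy2), packs(copy1, copy2))
```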

Solution files
• A .sol file is produced at the end of the job
• It contains a summary of all solutions
• Each solution contains rotations and usually translations (3 DIM vs 6 DIM)
• One line per model located
• The .sol file can be read back into Phaser in later jobs
Z-score: have I solved it?
  less than 5: no
  5–6: unlikely
  6–7: possibly
  7–8: probably
  more than 8: definitely
RFZ = RF Z-score; TFZ = TF Z-score
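A Z-score measures how many standard deviations a score lies above the mean of all values sampled in the search. The sketch below computes Z-scores from a list of LLG values and applies the "have I solved it?" bands. The background LLG values are hypothetical. Phaser computes its mean and spread internally over the full search, so treat this as an illustration of the definition, not of Phaser's code.

```python
import statistics

def z_scores(llgs):
    """Z-score of each LLG value: (LLG - mean) / standard deviation over the search."""
    mean = statistics.fmean(llgs)
    sd = statistics.pstdev(llgs)
    return [(x - mean) / sd for x in llgs]

def verdict(z):
    """'Have I solved it?' bands from the slide (boundary values go to the higher band)."""
    if z < 5:
        return "no"
    if z < 6:
        return "unlikely"
    if z < 7:
        return "possibly"
    if z < 8:
        return "probably"
    return "definitely"

# Hypothetical search: 200 background LLG values plus one outstanding peak
background = [float(i % 10) for i in range(200)]
llgs = background + [60.0]
tfz = max(z_scores(llgs))
print(f"top TFZ = {tfz:.1f}: {verdict(tfz)}")
```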

Ensemble models
Phaser refers to search models as “ensembles”.
• Often, an ensemble contains a single model, as in traditional MR
• But Phaser can use an ensemble of more than one model, which may work better than any single model
• Models in an ensemble must be superposed prior to use in Phaser: use e.g. Superpose in CCP4
N.B. Phaser will complain if:
– the MWs of the models in the ensemble are too different
– the RMS difference between models is too large
(In Molrep, construct an ensemble as a pseudo-NMR PDB file.)
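Before models are combined into an ensemble, they have to be superposed. The RMS difference Phaser checks is then computed over equivalent atoms. The sketch below uses the Kabsch least-squares algorithm as a stand-in. This is a swapped-in technique, not what CCP4 Superpose does internally. It assumes the two coordinate sets are already paired atom-for-atom, e.g. aligned Cα atoms.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Least-squares (Kabsch) superposition of mobile onto target.

    Both are (N, 3) arrays of equivalent atoms. Returns the fitted mobile
    coordinates and the RMSD after fitting.
    """
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    p0, q0 = p - pc, q - qc                       # centre both sets
    h = p0.T @ q0                                 # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T       # rotation taking p0 onto q0
    fitted = p0 @ r.T + qc
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - q) ** 2, axis=1))))
    return fitted, rmsd

# Hypothetical test: a random "model" rotated 30 degrees about z and shifted
rng = np.random.default_rng(1)
pts = rng.normal(size=(20, 3)) * 10.0
c, s = np.cos(np.radians(30.0)), np.sin(np.radians(30.0))
rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = pts @ rz.T + np.array([5.0, -2.0, 1.0])
fitted, rmsd = kabsch_superpose(pts, moved)
print(f"RMSD after superposition: {rmsd:.2e} A")
```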

Finding more than one copy in the asu
• Specify > 1 in "Composition of the asymmetric unit" (keyword COMPOSITION ... NUMBER)
• Specify > 1 in "Number of copies to search for" (keyword SEARCH ... NUMBER)
Phaser will issue warnings if these numbers are wrong.
Program flow: CRF → TF for first copy → fix first copy (possibly multiple sets) → CRF for second copy → TF for second copy → fix second copy (possibly multiple sets) → ...

Complexes
As before, but:
• Define > 1 type of component: Composition of the asymmetric unit → Define another component
• Define > 1 ensemble: Define ensembles → Add ensemble
• Specify all searches: Search details → Add another search
E.g. the beta-blip example in the Phaser tutorial: http://www-structmed.cimr.cam.ac.uk/phaser/tutorial/Phaser_MR_tute.html

MrBUMP
Ronan Keegan, Martyn Winn, Daresbury Lab.

The aim of MrBUMP
• An automation framework for molecular replacement.
• Particular emphasis on generating a variety of search models.
• Can be used to generate models only.
• Wraps Phaser and/or Molrep.
• Also uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. FASTA, MAFFT).
• Uses online databases (e.g. PDB, SCOP).
• In favourable cases, gives a “one-button” solution.
• In unfavourable cases, will suggest likely search models for manual investigation (lead generation).

The Pipeline
Target MTZ & sequence → Target details → Template search → Model preparation → Molecular replacement & refinement → Check scores and exit, or select the next model

Search for homologous proteins
FASTA search of the PDB
• Sequence-based search using the sequence of the target structure.
• Can be run locally if the user has the fasta34 program installed, or remotely using the OCA web-based service hosted by the EBI.
All of the resulting PDB id codes are added to a list. These structures are called model templates.

Search for additional similar structures
• Additional structure-based search (optional)
– The top hit from the FASTA search is used as the template structure for a secondary-structure-based search.
– Uses the SSM web service provided by the EBI (a.k.a. MSDfold).
– Any new structures found are added to the list.
– Provides structural variation that is not based on direct sequence similarity to the target.
• Manual addition
– Can add further PDB id codes to the list, e.g. from FFAS or PSI-BLAST searches
– Can add local PDB files

Multiple Alignment
• After the set of PDB ids has been collected in the FASTA and SSM searches, their coordinate-based sequences are collected and put through a multiple alignment with the target sequence.
• Aims:
– Score template structures in a consistent manner, in order to prioritise them for subsequent steps
– Extract a pairwise alignment between template and target for use in the Chainsaw step. The multiple alignment should give a better set of alignments than the original pairwise FASTA alignments.
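Extracting a pairwise alignment from a multiple alignment means taking the two rows of interest and dropping the columns where both rows are gaps. Those gaps were introduced only to accommodate other sequences. A minimal sketch, assuming rows are equal-length strings with "-" as the gap character (the example rows are hypothetical):

```python
def pairwise_from_msa(target_row, template_row, gap="-"):
    """Project two rows of a multiple alignment onto a pairwise alignment.

    Columns where both rows are gaps are dropped.
    """
    if len(target_row) != len(template_row):
        raise ValueError("rows of a multiple alignment must have equal length")
    cols = [(t, m) for t, m in zip(target_row, template_row)
            if not (t == gap and m == gap)]
    return "".join(t for t, _ in cols), "".join(m for _, m in cols)

target, template = pairwise_from_msa("MK-TA--YL", "MR-SA-GY-")
print(target)
print(template)
```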

Multiple Alignment
[Figure: Jalview 2.08.1 (Barton group, Dundee) view of the multiple alignment, showing the target, the model templates and a pairwise alignment.]
We currently support ClustalW or MAFFT for the multiple alignment.

Template Model Scoring
• Alignment scoring: score = sequence identity × alignment quality
• Sequence identity:
– Ungapped sequence identity, i.e. the sequence identity of the aligned target residues
• Alignment quality:
– Depends on the alignment length, the number of gaps created in the template alignment and the extent of each of these gaps.
– The penalties for gaps and for gap size are biased so that alignments that preserve domains of the structure, rather than spreading the aligned residues out, score higher.
The top-scoring models are then used for further processing.
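The sequence-identity half of the score can be computed directly from a pairwise alignment. The sketch below counts identities only over columns where both sequences have a residue. The slide does not give the alignment-quality formula, so here it is simply a hypothetical input `alignment_quality` supplied by the caller. It is not MrBUMP's gap-penalty scheme.

```python
def ungapped_identity(target_aln, template_aln, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    aligned = [(t, m) for t, m in zip(target_aln, template_aln)
               if t != gap and m != gap]
    if not aligned:
        return 0.0
    return sum(t == m for t, m in aligned) / len(aligned)

def template_score(target_aln, template_aln, alignment_quality):
    """score = sequence identity x alignment quality (quality supplied by the caller)."""
    return ungapped_identity(target_aln, template_aln) * alignment_quality

# Hypothetical pairwise alignment and quality value
print(ungapped_identity("MKTA-YL", "MRSAGY-"))
print(template_score("MKTA-YL", "MRSAGY-", alignment_quality=0.5))
```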

Domains
• Suitable templates for target domains may exist in isolation in the PDB, or in combination with dissimilar domains
• In case of relative domain motion, you may want to solve domains separately

Domains
• Domain search:
– Top-scoring templates from the multiple alignment are tested to see if they contain any domains.
– Uses the SCOP database. This only lists domains that appear more than once in the PDB.
– The database is scanned to see if domains exist for each of the PDBs in the list of templates.
– Domains are then extracted from the parent PDB structure file and added to the list of template models as additional search models for MR.

Multimers
• Multimer search:
– Search for quaternary structures that may be used as search models.
– Better signal-to-noise ratio than a monomer, if the assembly is correct for the target.
– Multimeric structures based on the top templates are retrieved using the PQS service at the EBI, and added to the list of search models.
– PQS will soon be replaced by the PISA service at the EBI (Eugene Krissinel).
[Figure: 1n5a, 1n5b, 1n5c, 1n5d. SPLIT-ASU into 4 oligomeric files of type TRIMERIC; SPLIT-ASU into 2 oligomeric files of type DIMERIC; SYMMETRY-COMPLEX oligomeric file of type DIMERIC.]

Search Model Preparation
Search models are prepared in four ways:
1. PDBclip: the original PDB with waters removed, hydrogens removed, the most probable side-chain conformations selected and chain IDs added if missing.
2. Molrep: Molrep contains a model preparation function which aligns the template sequence with the target sequence and prunes the non-conserved side chains accordingly.
3. Chainsaw: can be given any alignment between the target and template sequences; non-conserved residues are pruned back to the gamma atom.
4. Polyalanine: created by excluding all side-chain atoms beyond the CB atom, using the Pdbset program.
An ensemble model for Phaser is also created, based on the top 5 models.

Molecular Replacement and Refinement
• The search models can be processed with Molrep or Phaser, or both.
• The resulting models from molecular replacement are passed to Refmac for restrained refinement.
• The change in the Rfree value during refinement is used as a rough estimate of how good the resulting model is:
– “success”: final Rfree < 0.35, or final Rfree < 0.5 and dropped by 20%
– “marginal”: final Rfree < 0.48, or final Rfree < 0.52 and dropped by 5%
– “failure”: otherwise
• MR scores and unrefined models are available for later inspection.
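The success criteria can be written as a small classifier. One assumption is flagged loudly: the slide does not say whether "dropped by 20%" means a relative drop of the initial Rfree or an absolute drop of 0.20. This sketch uses the relative interpretation. It is not MrBUMP's code.

```python
def classify_refinement(rfree_initial, rfree_final):
    """Classify an MR solution from the Rfree change during refinement."""
    # ASSUMPTION: "dropped by X%" is read as a relative drop of the initial Rfree.
    drop = (rfree_initial - rfree_final) / rfree_initial
    if rfree_final < 0.35 or (rfree_final < 0.50 and drop >= 0.20):
        return "success"
    if rfree_final < 0.48 or (rfree_final < 0.52 and drop >= 0.05):
        return "marginal"
    return "failure"

# Hypothetical (initial, final) Rfree pairs
for pair in [(0.55, 0.33), (0.60, 0.45), (0.55, 0.47), (0.53, 0.50), (0.52, 0.51)]:
    print(pair, classify_refinement(*pair))
```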

MrBUMP on compute clusters
• MrBUMP can take advantage of a compute cluster to farm out the molecular replacement jobs.
• Currently, Sun Grid Engine-enabled clusters are supported, but support will be added for LSF, Condor and other queuing systems if there is enough demand.
• All nodes terminate when one finds a solution.

Pre-release version of MrBUMP
• Pre-release made available in Jan 06
• Simple installation
• Currently runs on Linux and OS X; Windows version almost ready
• Comes with a CCP4 GUI
• Can also be run from the command line with keyword input
• First citation in Obiero et al., Acta Cryst. (2006). F62, 757-760
• Regular updates (currently version 0.3.2)
http://www.ccp4.ac.uk/MrBUMP

A few observations...
• In difficult cases, success in MrBUMP may depend on the particular template, chain and model preparation method
• Nevertheless, you may get several putative solutions
• The ease of subsequent model rebuilding and model completion may depend on the choice of solution
• Take the first solution, or check everything?
• We expected that users would want a quick solution; in fact, most users seem happy to let MrBUMP run for a long time (hours, days)
• Worth checking “failed” solutions!