- Number of slides: 38
Know the Limitations of Your Data – X-ray, NMR, EM. PHAR 201/Bioinformatics I. Philip E. Bourne, SSPPS, UCSD. Prerequisite Reading: Structural Bioinformatics, Chapters 4-6. PHAR 201 Lecture 3 2012 1
When You Grab a PDB File, What Are You Starting With?
Data Views • Depositor/Annotator • Type of experiment: X-ray, NMR, EM • Type of molecule: protein, nucleic acid, or protein-nucleic acid complex [Figure: four-step deposition workflow; labels: Depositor, Deposit, PDB ID, Annotate, Validate, Validation Report, Corrections, Depositor Approval, PDB Entry, Core DB, Archival Data, Distribution Site]
Annotation • Resolve nomenclature and format problems • Add missing required data items • Add higher level classifications • Review validation report and summary letter to the depositor • Produce and check final mmCIF and PDB files • Update status and load database • Check data consistency across archive
Annotation – More Specifics • Make sure entry is complete (mandatory items from mmCIF dictionary) • Format exchange – Converts between PDB and mmCIF formats – Recognizes most variants of PDB format • Check nomenclature – Residue – Polymer atoms – Hydrogen atoms – Ligand atoms
Validation • Covalent geometry – Comparison with standard values (Engh and Huber [1]; Clowney et al. [2]; Gelbin et al. [3]) – Identify outliers • Stereochemistry – Check chiral centers • Close contacts in asymmetric unit and unit cell • Occupancy • Sequence in SEQRES and coordinates • Distant waters • Experimental (SFCHECK [4]) References: [1] R. A. Engh and R. Huber. Acta Cryst. A 47 (1991): 392-400. [2] L. Clowney et al. J. Am. Chem. Soc. 118 (1996): 509-518. [3] A. Gelbin et al. J. Am. Chem. Soc. 118 (1996): 519-529. [4] A. A. Vaguine, J. Richelle, and S. J. Wodak. Acta Cryst. D 55 (1999): 191-205.
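The covalent-geometry check above can be sketched in a few lines of Python. This is only an illustration of the idea (compare each bond length with a target value and flag large deviations), not the procedure the PDB annotators actually run. The target values are approximate Engh and Huber style numbers included for illustration, and the function names and the 4-sigma cutoff are assumptions.

```python
import math

# Illustrative (mean, sigma) targets in Angstroms for peptide backbone bonds.
# These approximate the Engh & Huber parameters; consult the original tables
# before relying on them.
STANDARD_BONDS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def bond_outliers(atoms, z_cutoff=4.0):
    """Return (bond, length, z) for bonds deviating by more than z_cutoff sigmas.

    atoms: dict mapping atom name -> (x, y, z) for a single residue.
    z_cutoff: hypothetical threshold chosen for this sketch.
    """
    flagged = []
    for (name1, name2), (mean, sigma) in STANDARD_BONDS.items():
        if name1 in atoms and name2 in atoms:
            length = math.dist(atoms[name1], atoms[name2])
            z = (length - mean) / sigma
            if abs(z) > z_cutoff:
                flagged.append(((name1, name2), length, z))
    return flagged


# Toy residue with a deliberately stretched C=O bond (1.40 A).
residue = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.458, 0.0, 0.0),
    "C": (2.983, 0.0, 0.0),
    "O": (2.983, 1.40, 0.0),
}
for bond, length, z in bond_outliers(residue):
    print(bond, f"length={length:.3f} A", f"z={z:.1f}")
```

The same pattern extends to bond angles and close contacts: compute the quantity from coordinates, compare with a reference distribution, and report outliers.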
The process by which biological data in a database are annotated and validated changes over time – this introduces a temporal inconsistency
Summary Thus Far • The biocurators (annotators) are the unsung heroes of modern biology – P. E. Bourne and J. McEntyre (2006) Biocurators: Contributors to the World of Science. PLoS Comp. Biol. (Editorial) 2(10): e142 – International Society for Biocuration • As a resource developer: start right, and the need for data remediation in years to come will be less likely • As a resource user: be aware of the process used to provide the data, and hence the limitations of the data you are using
The quality of the data you use in a bioinformatics experiment is a function of the method used to collect these data – understand the method
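A first practical step in understanding the method is simply to read it from the entry. The sketch below assumes a legacy PDB-format file, in which the EXPDTA record names the experimental technique and REMARK 2 carries the resolution. The function name is hypothetical, and continuation lines of EXPDTA are ignored for brevity.

```python
def experiment_summary(pdb_lines):
    """Extract (method, resolution) from the header lines of a PDB-format file.

    Resolution is returned as None when it is absent or not numeric,
    for example "NOT APPLICABLE." in NMR entries.
    """
    method = None
    resolution = None
    for line in pdb_lines:
        if line.startswith("EXPDTA") and method is None:
            # The technique starts in column 11 of the EXPDTA record.
            method = line[10:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            fields = line.split()
            try:
                resolution = float(fields[3])
            except (IndexError, ValueError):
                resolution = None
    return method, resolution


# Hand-written header lines for illustration, not taken from a real entry.
header = [
    "EXPDTA    X-RAY DIFFRACTION",
    "REMARK   2",
    "REMARK   2 RESOLUTION.    1.74 ANGSTROMS.",
]
print(experiment_summary(header))
```

With a real file, `pdb_lines` would be the lines of the file as read from disk.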
[Figure: PDB holdings by experimental method as of Oct 5, 2011; EM: 254]
X-ray Crystallography • Oldest technique • Majority of the depositions • A number of Nobel prizes • International Union of Crystallography (IUCr), Acta Crystallographica • Method based on scattering from electrons – hydrogen atoms usually not seen (sometimes modeled in) • In fact, modeling in is an issue • Atoms of similar atomic weight are not distinguishable, e.g., O, N, C • Influence of crystal packing, e.g., malate dehydrogenase (4MDH) • Environment in crystal is highly aqueous • Produces similar structures to NMR, e.g., thioredoxin (3TRX vs 1SRX)
The X-ray Crystallography Pipeline • Basic steps: Target Selection → Crystallomics (Isolation, Expression, Purification, Crystallization) → Data Collection → Structure Solution → Refinement → Functional Annotation → Publish
Limitations – Crystallization • Crystallization: – Non-soluble – Twinning – Microheterogeneity – Disorder
Limitations – Data Collection
Limitations – Refinement
Limitations – Map Fitting • In an intricate study, the only way to be sure that the work is correct is to make your own judgment from the electron density – in practice this is rarely done • It can be done at http://eds.bmc.uu.se/eds/ • It requires that the experimental data (the 100d structure factors) be available
Limitations – Non-crystallographic Symmetry (NCS)
Limitations – Refinement • Introduces restraints/constraints that may or may not be realistic • Water has been used unnecessarily • Resolution quoted wrongly • Standards have helped • See for example: H. Weissig and P. E. Bourne (1999) An Analysis of the Protein Data Bank in Search of Temporal and Global Trends. Bioinformatics 15(10): 807-831
Limitations – Interpretation of the Biologically Active Molecule • Example: 1QQP • http://www.pdb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/bioassembly_tutorial.html
Limitations – Functional Annotation • Functional annotation is ONLY in the publication, NOT in the PDB • Attempt to address this with GO assignments • Attempt to address this with literature integration • Structural genomics – function unknown • One structure – one to many functions (power law) – functions may be unrecognized since the PDB is relatively static • Many efforts at functional annotation
Why Is Understanding Limitations Important? • Later we will study reductionism – a key process in the use of biological data • As a result of reductionism you will need to choose a representative structure for the task at hand • Understanding the limitations of the experiment will help us do this
Summary of Important Features in Using Structure Data Determined by X-ray Crystallography • Resolution is a key indicator – think about it relative to atomic resolution, i.e., 1.54 Å for a C-C single bond • Disorder (i.e., undetermined or alternative atomic coordinates) is a natural part of many structures • R factor (all) describes the agreement of the model with the experimental data. It should be better (lower) than 0.20 (Rfree 0.26)
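For readers who have not met the R factor before, a minimal sketch of the conventional definition may help: the sum of absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The function name and the toy amplitudes are assumptions, and the sketch presumes the calculated amplitudes are already on the same scale as the observed ones. Rfree uses the same formula, evaluated only on a test set of reflections excluded from refinement.

```python
def r_factor(f_obs, f_calc):
    """R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|) over paired reflections.

    Assumes f_calc is already scaled to f_obs (an assumption of this sketch).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Toy amplitudes for illustration only.
observed = [100.0, 50.0, 25.0]
calculated = [90.0, 55.0, 30.0]
print(f"R = {r_factor(observed, calculated):.3f}")
```

A lower value means better agreement, which is why the slide's thresholds are upper bounds.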
Summary of Important Features in Using Structure Data Determined by X-ray Crystallography, Cont. • B (aka temperature) factors offer indicators both to the accuracy of a structure and the most mobile regions • [Figure: 5EBX drawn with QuickPDB]
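To look at B factors yourself, one option is to average them per residue straight from the ATOM records. The sketch below assumes the fixed-column legacy PDB format (chain ID in column 22, residue number in columns 23-26, B factor in columns 61-66). The function name is hypothetical, and HETATM records and alternate locations are not treated specially.

```python
from collections import defaultdict


def mean_b_per_residue(pdb_lines):
    """Average B factor per (chain, residue number) from ATOM records."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, atom count]
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]
            resseq = int(line[22:26])
            b_factor = float(line[60:66])
            entry = sums[(chain, resseq)]
            entry[0] += b_factor
            entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Residues with unusually high averages are candidates for mobile or poorly determined regions and may deserve a closer look before the structure is used as a representative.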
NMR
Features of NMR • Limited in size (25-100 kDa) – provided labeled samples are obtainable • Selected information on proteins to ~150 kDa • Solution study – small sample needed for soluble proteins • Only a few solid state studies • Reveals hydrogen positions • Leads to an ensemble of dynamical structures – these are rarely used in bioinformatics studies • Useful in high throughput screens to determine protein-ligand interactions • Used for phasing of X-ray structures, i.e., the methods are synergistic • Until recently applicable to membrane proteins
NMR Methodology • Molecules are tumbling and vibrating with thermal motion • Samples are usually isotopically labeled; nuclei such as ¹H, ¹³C, ¹⁵N, and ³¹P in an external magnetic field have two spin states – one aligned with and one opposed to the external magnetic field • Detects and assigns chemical shifts of atomic nuclei with non-zero spin • The shifts depend on their electronic environments, i.e., identities and distances of nearby atoms • The system can be tuned to look at specific features of the characteristic spin moments • ¹H provides NOE constraints • Better resolution is obtained when the molecule is tumbling fast – size slows this – offset by higher magnetic field strengths • Protein must be soluble at high concentration and stable without aggregation – high throughput screening can show this, and folded vs unfolded, very quickly
NMR – Methodology cont. • Result is a set of distance constraints between pairs of atoms, either bonded or non-bonded • If there are sufficient constraints then an ensemble of possibilities results • Often this ensemble is averaged and constraints adjusted to conform to normal bond lengths and distances • Usually left with 15-30 members of the ensemble • Ideally less than 1 Å RMSD between models (backbone only) • Portions of the molecule with high motion have tell-tale signals, e.g., apo-calmodulin
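The "less than 1 Å RMSD between models" criterion can be checked with a short calculation. The sketch below assumes the models are already superimposed (as deposited NMR ensembles often are) and that each model is given as a list of backbone atom coordinates in the same order. It performs no fitting, so it would overestimate the RMSD for models in different frames. Function names are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z), without refitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def mean_pairwise_rmsd(models):
    """Average RMSD over all pairs of models in an ensemble."""
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy three-model ensemble with two atoms per model, for illustration only.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
print(f"mean pairwise RMSD = {mean_pairwise_rmsd(ensemble):.3f} A")
```

For models that are not pre-aligned, an optimal superposition step (for example the Kabsch algorithm) would be needed before computing the RMSD.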
BMRB http://www.bmrb.wisc.edu/
NMR Terms • COSY/NOESY spectra: Allow the through-space interactions between atoms to be measured and are used to generate a 3D structure of the protein (what we have discussed) • TROSY (Transverse Relaxation Optimized Spectroscopy): Invented about 1997. First described by Professor Kurt Wüthrich. A method for getting sharper peaks on large proteins, so it is useful for analyzing larger protein systems. TROSY is best at higher fields. If the aim is to study a large complex, or a chemical shift perturbation when a protein binds to a receptor, it is better to use a 900 MHz machine than a more standard lower field machine • Solid state NMR: Requires wider bore magnets (63 or even 89 mm diameter) than solution state NMR. The higher stored energy of these wide bore magnets means that they are significantly more difficult to build, and as a result high field solid state NMR lags behind liquid state in terms of available field strength • Multidimensional (three- and four-dimensional) NMR: Introduced about 12-15 years ago. This technology has the advantage of resolving the severe overlap in 2D spectra
In both X-ray crystallography and NMR there is the danger that the final structure reflects the model it was computed against
Additional Validation Checks • Stereochemical quality – Ramachandran plot outliers – Dihedrals, bond lengths and angles – Fold Deviation Score (FDS) – Validation Server http://deposit.rcsb.org/validate/
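The Ramachandran check rests on two backbone dihedral angles per residue: phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). The sketch below shows one way to compute them from coordinates in pure Python. The function names are assumptions, and deciding which (phi, psi) pairs count as outliers requires reference distributions that are not reproduced here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    b1_len = math.sqrt(_dot(b1, b1))
    if b1_len == 0.0:
        raise ValueError("central bond has zero length")
    b1_unit = tuple(x / b1_len for x in b1)
    y = _dot(b1_unit, _cross(n1, n2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) for one residue from five atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Plotting the resulting (phi, psi) pairs for every residue gives the Ramachandran plot; residues far from the populated regions are the ones worth inspecting.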
Use the PDB Geometry Data
Electron Microscopy • 1KVP: Structural analysis of the Spiroplasma virus, SPV4: implications for evolutionary variation to obtain host diversity among the Microviridae • Able to look at large molecular assemblies • Resolution now 30 Å to below 4 Å • Cryo-EM preserves aqueous environment (no staining) • Experimentally more tractable • Can resolve images (direct measurement of phases) or diffraction patterns • Can provide a 3D volumetric reconstruction • Suitable for the study of membrane proteins, e.g., bacteriorhodopsin (1990)
1P85: Real space refined coordinates of the 50S subunit fitted into the low resolution cryo-EM map of the EF-G.GTP state of E. coli 70S ribosome • Single particle reconstruction – multiple orientations of the same particle found in the specimen (viruses, ribosome…) • Electron tomography – 3D reconstruction of a single particle (organelles, whole cells)
Example EM Result • Example for a hybrid study that combines elements of electron crystallography and helical reconstruction with homology modeling and molecular docking approaches in order to elucidate the structure of an actin-fimbrin crosslink (Volkmann et al., 2001b). Fimbrin is a member of a large superfamily of actin-binding proteins and is responsible for crosslinking of actin filaments into ordered, tightly packed networks such as actin bundles in microvilli or stereocilia of the inner ear. The diffraction patterns of ordered paracrystalline actin-fimbrin arrays (background) were used to deduce the spatial relationship between the actin filaments (white surface representation) and the various domains of the crosslinker (the two actin-binding domains of fimbrin are pink and blue, the regulatory domain cyan). Combination of this data with homology modeling and data from docking the crystal structure of fimbrin’s N-terminal actin-binding domain into helical reconstructions (Hanein et al., 1998) allowed us to build a complete atomic model of the crosslinking molecule (foreground, color scheme as in surface representation of the array). • From Structural Bioinformatics 2005, p. 124
Example EM Result • Example for a combination of high resolution structural information from X-ray crystallography and medium resolution information from electron cryomicroscopy (here 2.1 nm). Actin and myosin were docked into helical reconstructions of actin decorated with smooth muscle myosin (Volkmann et al., 2000). Interaction of myosin with filamentous actin has been implicated in a variety of biological activities including muscle contraction, cytokinesis, cell movement, membrane transport, and certain signal transduction pathways. Attempts to crystallize actomyosin failed due to the tendency of actin to polymerize. Docking was performed using a global search with a density correlation measure (Volkmann and Hanein, 1999). The estimated accuracy of the fit is 0.22 nm in the myosin portion and 0.18 nm in the actin portion. One actin molecule is shown on the left as a molecular surface representation. The yellow area denotes the largest hydrophobic patch on the exposed surface of the filament, a region expected to participate in actomyosin interactions. The fitted atomic model of myosin is shown on the right. The transparent envelope represents the density corresponding to myosin in the 3D reconstruction. The solution set concept (see text) was used to evaluate the results and to assign probabilities for residues to take part in the interaction. The tone of red on the myosin model is proportional to this statistically evaluated probability (the more red, the higher the probability). • From Structural Bioinformatics 2005, p. 127
Small-angle X-ray Scattering (SAXS) http://en.wikipedia.org/wiki/Small-angle_X-ray_scattering • Reveals shape and size of macromolecules in the range 5-25 nm • Handles partially ordered systems • No need for crystalline sample; larger molecules than NMR, but at lower resolution • Leading to hybrid techniques
Summary Regarding Data Limitations • Pay attention to the method, its pluses and minuses • Be aware of models • Be aware of the general limitations of each method • For NMR be aware of an ensemble of structures • Be aware of hybrid models • For all methods be aware of the parameters that govern the accuracy • You will need to know these limitations for just about any bioinformatics study since it will be necessary to choose a non-redundant (NR) set – we will visit Astral and Pisces, which are tools for defining an NR set
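To make the idea of choosing a representative concrete, here is a minimal sketch that picks one entry per cluster of similar structures, preferring the best (lowest) resolution and breaking ties on R factor. It is not a description of how Astral or Pisces work. The clustering is assumed to be given, and the entry identifiers, field names, and values are hypothetical placeholders.

```python
def pick_representatives(entries, clusters):
    """Choose one entry per cluster by (resolution, R factor), lowest first.

    entries: dict id -> {"resolution": float or None, "r_factor": float or None}
    clusters: list of lists of ids (assumed precomputed, e.g. by sequence identity)
    Entries without a resolution (such as NMR ensembles) sort last.
    """
    def sort_key(entry_id):
        entry = entries[entry_id]
        resolution = entry["resolution"]
        r_value = entry["r_factor"]
        return (
            resolution if resolution is not None else float("inf"),
            r_value if r_value is not None else float("inf"),
            entry_id,  # deterministic tie-break
        )

    return [min(cluster, key=sort_key) for cluster in clusters]


# Hypothetical entries, not real PDB data.
entries = {
    "entry_a": {"resolution": 2.5, "r_factor": 0.22},
    "entry_b": {"resolution": 1.8, "r_factor": 0.19},
    "entry_c": {"resolution": None, "r_factor": None},  # e.g. an NMR ensemble
    "entry_d": {"resolution": 1.8, "r_factor": 0.17},
}
clusters = [["entry_a", "entry_b", "entry_d"], ["entry_c"]]
print(pick_representatives(entries, clusters))
```

A real selection would also weigh the other limitations covered in this lecture, such as disorder, B factors, and whether the biologically active assembly is represented.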