- Number of slides: 46
Protein Identification by Database Searching John Cottrell Matrix Science
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint
   A set of peptide molecular masses from an enzyme digest of a protein
PMF Servers on the Web
ASCQ_ME: https://www.genopole-lille.fr/logiciel/ascq_me/
Bupid: http://zlab.bu.edu/Amemee/
Mascot: http://www.matrixscience.com/search_form_select.html
MassSearch: http://www.cbrg.ethz.ch/services/MassSearch_new
MS-Fit (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
PepMAPPER: http://www.nwsr.manchester.ac.uk/mapper/
Profound (Prowl): http://prowl.rockefeller.edu/prowlcgi/profound.exe
Mowse, PeptideSearch, Protocall, Aldente, XProteo
Search Parameters
• database
• taxonomy
• enzyme
• missed cleavages
• fixed modifications
• variable modifications
• protein MW
• estimated mass measurement error
Henzel, W. J., Watanabe, C., Stults, J. T., JASMS 2003, 14, 931-942.
Peptide Mass Fingerprint
Fast, simple analysis
High sensitivity
Need database of protein sequences
• not ESTs or genomic DNA
Sequence must be present in database
• or close homolog
Not good for mixtures
• especially a minor component
[Figure: peptide backbone fragmentation nomenclature for a protonated peptide with side chains R1 to R4. N-terminal fragments are labelled a, b, c (a1/b1/c1, a2/b2/c2, a3/b3/c3) and C-terminal fragments x, y, z (x1/y1/z1, x2/y2/z2, x3/y3/z3), each numbered from its own terminus.]
Roepstorff, P. and Fohlman, J. (1984). Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 11, 601.
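The b and y series from this nomenclature are the ones most often used in database searching. As a minimal sketch (not taken from the slides), the singly charged b and y ion m/z values of a peptide can be computed from monoisotopic residue masses. The function names, the short residue table and the example peptide are illustrative assumptions.

```python
# Monoisotopic residue masses (Da); only a few residues are listed for brevity.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "K": 128.09496, "E": 129.04259, "W": 186.07931,
}
PROTON = 1.00728   # mass of a proton (Da)
WATER = 18.01056   # monoisotopic mass of H2O (Da)


def b_ions(peptide):
    """Singly charged b-ion m/z values, b1 .. b(n-1), read from the N-terminus."""
    total, ions = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 .. y(n-1), read from the C-terminus."""
    total, ions = 0.0, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + WATER + PROTON)
    return ions


if __name__ == "__main__":
    for i, (b, y) in enumerate(zip(b_ions("GWSV"), y_ions("GWSV")), start=1):
        print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

A useful sanity check is that each complementary pair b(i) + y(n-i) equals the neutral peptide mass plus two protons.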
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint
   A set of peptide molecular masses from an enzyme digest of a protein
2. Sequence Query
   Mass values combined with amino acid sequence or composition data
Mann, M. and Wilm, M., Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66, 4390-9 (1994).
1489.430 tag(650.213,GWSV,1079.335)
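A sequence tag query of this form can be checked for internal consistency: the gap between the two flanking masses should be close to the summed residue masses of the tag. The sketch below is illustrative only. The parser, the function names and the reading of the first number as the peptide mass are assumptions, and it handles only unambiguous tags (no `[Q|K]` alternatives).

```python
import re

# Monoisotopic residue masses (Da) for the residues in the example tag.
RESIDUE_MASS = {"G": 57.02146, "W": 186.07931, "S": 87.03203, "V": 99.06841}

TAG_PATTERN = re.compile(
    r"\s*([\d.]+)\s+tag\(\s*([\d.]+)\s*,\s*([A-Z]+)\s*,\s*([\d.]+)\s*\)\s*"
)


def parse_tag_query(line):
    """Split 'mass tag(low,SEQ,high)' into (peptide_mass, low, tag, high)."""
    match = TAG_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f"not a simple sequence tag query: {line!r}")
    return float(match[1]), float(match[2]), match[3], float(match[4])


def tag_gap_error(low, tag, high):
    """Observed mass gap minus the summed residue masses of the tag (Da)."""
    return (high - low) - sum(RESIDUE_MASS[aa] for aa in tag)


peptide_mass, low, tag, high = parse_tag_query("1489.430 tag(650.213,GWSV,1079.335)")
error = tag_gap_error(low, tag, high)

if __name__ == "__main__":
    print(f"tag {tag}: gap error = {error:+.3f} Da")
```

A large gap error would suggest that the tag was called incorrectly or that one of the residues carries a modification.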
Sequence Tag Servers on the Web
Mascot
• http://www.matrixscience.com/search_form_select.html
MS-Seq (Protein Prospector)
• http://prospector.ucsf.edu/prospector/mshome.htm
MultiIdent (TagIdent, etc.)
• http://www.expasy.org/tools/multiident/
PeptideSearch, Spider
Sequence Tag
Rapid search times
• Essentially a filter
Error tolerant
• Match peptide with unknown modification or SNP
Requires interpretation of spectrum
• Usually manual, hence not high throughput
Tag has to be called correctly
• Although ambiguity is OK
  2060.78 tag(977.4,[Q|K][Q|K]EE,1619.7)
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint
   A set of peptide molecular masses from an enzyme digest of a protein
2. Sequence Query
   Mass values combined with amino acid sequence or composition data
3. MS/MS Ions Search
   Uninterpreted MS/MS data from a single peptide or from a complete LC-MS/MS run
SEQUEST
Eng, J. K., McCormack, A. L. and Yates, J. R., 3rd, An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976-89 (1994).
MS/MS Ions Search Servers on the Web
Inspect: http://proteomics.ucsd.edu/LiveSearch/
Mascot: http://www.matrixscience.com/search_form_select.html
MS-Tag (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
Omssa: http://pubchem.ncbi.nlm.nih.gov/omssa/index.htm
PepFrag (Prowl): http://prowl.rockefeller.edu/prowl/pepfrag.html
PepProbe: http://bart.scripps.edu/public/search/pep_probe/search.jsp
RAId_DbS: http://www.ncbi.nlm.nih.gov/CBBResearch/qmbp/RAId_DbS/index.html
Sonar (Knexus): http://hs2.proteome.ca/prowl/knexus.html
X!Tandem (The GPM): http://thegpm.org/TANDEM/index.html
Not on-line
Byonic, Crux, greylag, MassMatrix, Myrimatch, Paragon, Peaks, PepSplice, pFind, Phenyx, ProbID, ProLuCID, ProteinLynx GS, Sequest, SIMS, SpectrumMill
MS/MS Ions Search
Easily automated for high throughput
Can get matches from marginal data
Can be slow
• No enzyme
• Many variable modifications
• Large database
• Large dataset
MS/MS is peptide identification
• Proteins by inference
Search Parameters
Sequence Database
• Swiss-Prot (~500,000 entries)
  High quality, non-redundant
• NCBInr, UniRef100 (~19,000 entries)
  Comprehensive, non-identical
• EST databases (>400,000 entries)
  Very large and very redundant
• Sequences from a single genome
  A consensus sequence
  Peptides are lost at exon-intron boundaries
(Entry counts are from mid-2012)
Search Parameters
Taxonomy
Swiss-Prot 2010_08
Mammalia (mammals) = 65104
  Primates = 26940
    Homo sapiens (human) = 20292
    Other primates = 6648
  Rodentia (Rodents) = 25473
    Mus. = 16358
      Mus musculus (house mouse) = 16307
    Rattus = 7533
    Other rodentia = 1582
  Other mammalia = 12691
Search Parameters
Mass Tolerances
• Most search engines support separate mass tolerances for precursors and fragments
• May allow fixed units (Da, mmu) or proportional (ppm, %)
• Some search engines can correct for selection of 13C peak
• Unless search engine performs some type of re-calibration, need to provide conservative estimate of mass accuracy, not precision
• This doesn’t have to be a guessing game. Run a standard, then look at the error graphs for strong matches
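The difference between fixed and proportional tolerance units can be made concrete with a small helper that converts any of the four units to an absolute window in Da. This is a sketch under simple assumptions: the function names are hypothetical, and the proportional units are taken relative to the theoretical mass.

```python
def tolerance_da(mass, tol, unit):
    """Convert a tolerance in Da, mmu, ppm or % to an absolute window in Da."""
    if unit == "Da":
        return tol
    if unit == "mmu":              # 1 mmu = 0.001 Da
        return tol / 1000.0
    if unit == "ppm":              # parts per million of the mass
        return mass * tol / 1e6
    if unit == "%":                # percent of the mass
        return mass * tol / 100.0
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def within_tolerance(observed, theoretical, tol, unit):
    """True if the observed mass lies within the tolerance of the theoretical mass."""
    return abs(observed - theoretical) <= tolerance_da(theoretical, tol, unit)


if __name__ == "__main__":
    # The same 10 ppm setting gives a wider absolute window at higher mass.
    for mass in (500.0, 2000.0):
        print(mass, "Da ->", tolerance_da(mass, 10, "ppm"), "Da window")
```

A fixed tolerance is the same width at every mass, whereas a proportional tolerance widens with mass, which is why the choice of unit matters when the precursor mass range is broad.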
Search Parameters
Enzyme can be
• Fully specific
• Non-specific (“no enzyme”)
Some search engines support
• Limited number of missed cleavage points
• Semi-specific enzymes
• Enzyme mixtures
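To illustrate how the enzyme and missed-cleavage settings determine the candidate peptides, here is a minimal in-silico digest for a fully specific enzyme. The trypsin rule used (cleave after K or R, but not before P) is a common convention assumed for this sketch rather than something stated on the slide, and the function name and test sequence are made up.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return fully specific tryptic peptides with up to N missed cleavages.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    """
    # Cleavage boundaries expressed as indices into the sequence.
    sites = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        # Joining k+1 adjacent fragments corresponds to k missed cleavages.
        last = min(start + missed_cleavages + 2, len(sites))
        for end in range(start + 1, last):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AAKGGRCCKPDDK", missed_cleavages=0))
    print(tryptic_digest("AAKGGRCCKPDDK", missed_cleavages=1))
```

Each additional missed cleavage adds more candidate peptides per protein, so the setting trades completeness against search time. A "no enzyme" search would consider every subsequence.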
Search Parameters
Common peak list formats
• DTA (Sequest)
• PKL (MassLynx)
• MGF (Mascot)
• mzData (.XML)
• mzML (.mzML)
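Of these formats, MGF is plain text and simple enough to read by hand. The sketch below parses only the basic layout: `BEGIN IONS` / `END IONS` blocks containing `KEY=value` lines followed by m/z and intensity pairs. The sample spectrum is invented, and the parser ignores many details of the real format.

```python
SAMPLE_MGF = """\
BEGIN IONS
TITLE=example spectrum
PEPMASS=745.4
CHARGE=2+
204.1 1500
361.2 320.5
END IONS
"""


def parse_mgf(text):
    """Parse MGF text into a list of {'params': dict, 'peaks': [(mz, intensity)]}."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and '#' comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((float(fields[0]), intensity))
    return spectra


spectra = parse_mgf(SAMPLE_MGF)

if __name__ == "__main__":
    for s in spectra:
        print(s["params"]["TITLE"], len(s["peaks"]), "peaks")
```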
Search Parameters
Modifications
• Fixed / static / quantitative modifications cost nothing
• Variable / differential / non-quantitative modifications are very expensive
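The cost difference comes from combinatorics. A fixed modification simply changes the mass of a residue, so each peptide still has one form. A variable modification may or may not be present at each candidate site, so a peptide with n sites has 2^n forms to test. The sketch below enumerates them; the function name and example peptides are hypothetical.

```python
from itertools import combinations


def variable_mod_forms(peptide, residue):
    """List every set of positions at which a variable modification could sit.

    Each returned tuple holds the 0-based positions that are modified; the
    empty tuple is the unmodified peptide.
    """
    sites = [i for i, aa in enumerate(peptide) if aa == residue]
    forms = []
    for k in range(len(sites) + 1):
        forms.extend(combinations(sites, k))
    return forms


if __name__ == "__main__":
    # Three methionines give 2**3 = 8 forms if oxidation (M) is variable.
    forms = variable_mod_forms("MAMPMK", "M")
    print(len(forms), "candidate forms:", forms)
```

With several variable modifications selected at once, the counts for each modification multiply, which is why search time and the number of random matches grow so quickly.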
Search Parameters
Modifications
• Common artefacts
Modification                                    Delta   Site                             Cause
Carbamylation                                   +43     N-term, K                        Urea in digest buffer
Deamidation                                     +1      N                                Low pH
Pyro-glutamic acid                              -17     Q at N-term                      Low pH
Pyro-carbamidomethyl or carboxymethyl Cys       +40     C at N-term                      Low pH, delta is relative to unmodified C
Oxidation                                       +16     M (many other residues also)     Gels
Over alkylation                                 +57     N-term, W                        Iodoacetamide
Over alkylation                                 +58     N-term, W                        Iodoacetic acid
Site Analysis
Ascore
• Beausoleil S. A., et al. (2006) Nat. Biotechnol. 24, 1285-1292
MaxQuant
• Cox J. & Mann M. (2008) Nat. Biotechnol. 26, 1367-1372
• Olsen J. V., et al. (2006) Cell 127, 635-48
Inspect, MS-Alignment, PTMFinder
• Tanner S., et al. (2008) J. Proteome Res. 7, 170-181
• Payne S., et al. (2008) J. Proteome Res. 7, 3373-3381
• Tsur D., et al. (2005) Nat. Biotechnol. 23, 1562-1567
• Tanner S., et al. (2005) Anal. Chem. 77, 4626-4639
PhosphoScore
• Ruttenberg B. E., et al. (2008) J. Proteome Res. 7, 3054-9
Debunker
• Lu B., et al. (2007) Anal. Chem. 79, 1301-10
SloMo - ETD/ECD
• Bailey C. M., et al. (2009) J. Proteome Res. 8, 1965-71
ModifiComb
• Savitski M. M., et al. (2006) Mol. Cell. Proteomics 5, 935-48
Delta Score
• Savitski M. M., et al. (2010) Mol. Cell. Proteomics mcp.M110.003830
Multi-pass Searches
Implemented under a variety of names
X!Tandem: Model refinement
Mascot: Error tolerant search
Spectrum Mill: Search saved hits, homology mode, unassigned single mass gap
Phenyx: 2-rounds
Paragon: Thorough ID, fraglet-taglet
Scoring
[Figure: distributions of total, incorrect and correct matches plotted against score]
Scoring
Receiver Operating Characteristic
Sensitivity & Specificity
Search a “decoy” database
• Decoy entries can be reversed or shuffled or randomised versions of target entries
• Decoy entries can be separate database or concatenated to target entries
Gives a clear estimate of false discovery rate
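As a rough sketch of the decoy approach, the code below builds a reversed decoy sequence and estimates the false discovery rate at a score threshold. The estimator used here (decoy matches divided by target matches above the threshold) is one common choice and is an assumption of this example, as are the function names and the invented score list.

```python
def reverse_decoy(sequence):
    """Make a decoy entry by reversing a target protein sequence."""
    return sequence[::-1]


def fdr_at_threshold(matches, threshold):
    """Estimate FDR as (decoy matches) / (target matches) at or above threshold.

    `matches` is a list of (score, is_decoy) pairs, one per peptide match.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    # Invented scores purely for illustration.
    matches = [(60, False), (55, False), (50, True), (45, False),
               (40, False), (30, True), (20, False)]
    for threshold in (50, 40, 20):
        print(threshold, fdr_at_threshold(matches, threshold))
```

Raising the threshold lowers the estimated false discovery rate at the cost of sensitivity, which is the trade-off that a score threshold controls.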
Sensitivity & Specificity
[Figure: distributions of total, incorrect and correct matches plotted against score]
Protein Inference
General approach is to create a minimal list of proteins: the “principle of parsimony” or “Occam’s razor”.
[Figure: Peptides 1, 2 and 3 assigned to Proteins A, B and C]
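One simple way to approximate a minimal protein list is a greedy set cover: repeatedly pick the protein that explains the most still-unexplained peptides. This is a sketch only. Greedy selection is not guaranteed to find the true minimum, it is not necessarily what any particular search engine does, and the peptide-to-protein mapping below is hypothetical rather than read from the slide.

```python
def minimal_protein_list(protein_to_peptides):
    """Greedy approximation of the smallest protein set explaining all peptides.

    `protein_to_peptides` maps a protein name to the set of peptides matched to it.
    Ties are broken alphabetically so the result is deterministic.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


if __name__ == "__main__":
    # Hypothetical assignment: Protein A alone accounts for every peptide.
    hits = {
        "Protein A": {"Peptide 1", "Peptide 2", "Peptide 3"},
        "Protein B": {"Peptide 1", "Peptide 3"},
        "Protein C": {"Peptide 2"},
    }
    print(minimal_protein_list(hits))
```

In this made-up case Proteins B and C are not needed to explain the data, although that does not prove they are absent from the sample.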
Further Reading
Exercises: http://www.msms.com/exercises.html