Protein Sequencing Research Group: Results of the PSRG 2011 Study
Sensitivity assessment of Edman and Mass Spectrometric Terminal Sequencing of an undisclosed protein

Current PSRG Members
• Jim Walters (Chair), Sigma-Aldrich
• J. Steve Smith, University of Texas Medical Branch at Galveston
• Wendy Sandoval, Genentech, Inc.
• Kwasi Mawuenyega, Washington University School of Medicine
• Bosong Xiang, Monsanto, Co.
• Detlev Suckau*, Bruker Daltonics
• Henriette Remmer*, University of Michigan
• Viswanatham Katta*, Genentech, Inc.
• Peter Hunziker (ad hoc), University of Zurich
• Jack Simpson (EB liaison), SAIC/National Cancer Institute at Frederick
* new members added in 2010

PSRG – Review
• 2009 Study: what techniques are complementary to Edman (two samples)
  - Edman remains reliable
  - MS-based top-down techniques performed well and show great promise; bottom-up techniques were successful when prior knowledge of the sequence or reliable database information was available
• 2010 Study: follow-on from 2009 using an antibody
  - ISD participants had to use T3 sequencing to obtain true terminal information
  - Edman analyses required deblocking of the heavy chain
  - The most complete de novo sequencing was obtained by bottom-up participants
• Status: Edman sequencing and mass spectrometry based techniques have varied strengths and weaknesses depending on several experimental factors, and both have a role in biochemical research
• 2010 Notables
  - Second year as PSRG and 3rd year for non-Edman participants
  - Three new members added
  - With a complementary role realized, we attempt to push the capabilities of the varied sequencing techniques, namely assay sensitivity

2011 PSRG Study Timeline
[Timeline figure. Date labels: Mar '10, Apr '10, May '10, Jun '10, Aug '10, Sep, Oct '10, Jan '11, Feb '11. Milestones shown: ABRF 2010; discussed ideas for 2011 study and agreed upon a study design; explored different potential study samples; settled on a designer protein not in a database; study proposal sent to EB; PSRG committee adds three new members; 2011 study announcement; samples sent to participants; extended deadline for returning data; data analysis; ABRF 2011.]

PSRG 2011 Study Objective
To obtain terminal sequence information on varying amounts of a protein sample whose sequence was not in a database

2011 Study Design – The Sample Sets
• Participants chose which of three sample sets they wanted to analyze (designated A, B or C)
• Each sample set contained three tubes (designated 1, 2 or 3)
  - Each tube contains the same recombinant protein, in increasing amounts
• Participants could request any single set (3 tubes), two sets (6 tubes), or all three sets (9 tubes)

The Protein Sample
• Recombinant protein
• Expressed in an E. coli system
• Molecular weight ~50 kDa
• Amino acid sequence of the protein is not in any public-domain database
• Sample was donated in liquid formulation in buffer
• Purified and quantified by AAA

Sample Preparation and Distribution
• Expressed protein was purified using a C-terminal His tag, then by size exclusion chromatography, and confirmed by SDS-PAGE
• Protein-containing fractions were quantified by AAA
• Dispensed into 1.5 mL tubes and lyophilized
• Dried samples were shipped as is (Set A)
• Alternatively, samples were resuspended and run on a gel (Set B) or blotted onto PVDF membrane (Set C), and the gel/membrane slices corresponding to the ~50 kDa band were sent to participants
• The tube with the lowest sample amount contains ~5 pmol, whether dried, loaded on gel, or blotted on membrane
Sample sets: A, lyophilized; B, in-gel samples; C, membrane

Requests of participants
• Analyze samples in the designated numerical order, or from lowest sample amount to highest, and report on all samples analyzed
• Edman sequencing: participants to provide amino acid yield data at every cycle
• Alternative (MS-based) methods: participants asked to provide the raw data files and peak lists, and the method used for sequence assignment
• Participants were instructed not to split samples, given the objective of the study and the relatively low sample amounts
• A co-purified E. coli protein at <20 kDa is known to be potentially present in Sample Set A, but is of no interest to the current study
• Suggested buffers for dissolving Sample Set A (lyophilized samples):
  - 0.1% TFA
  - 25 mM ammonium bicarbonate
  - 0.1% TFA / 20% acetonitrile
• Participants were asked to fill out a survey; all survey and raw data were submitted anonymously

2011 PSRG Study Sample Set Requests

Survey response results (18 out of 38 labs filled out a survey)

Survey response results

N-Terminal Techniques: Edman Degradation

Uses of Edman Sequencing
• Cleavage site determination for proteases
• Sequencing of MHC peptides
• Sequencing of synthetic peptide libraries
• Full characterization of proteins, especially recombinant proteins, that are present in large quantities
• Stoichiometry (Edman is semi-quantitative)
• Protein identification for non-model organisms which do not have extensive DNA sequencing
• Domain mapping
• Confirmation of N-terminus
• As a help for mass spectrometry sequencing to perform manual subtractions
• Product characterization for SOPs for pharma
• Can distinguish between the isobaric amino acids leucine and isoleucine
• Clonality determination or antibody sequencing for cloning
Adapted from: ESRG Presentation, ABRF 2005

Edman Workflows
Workflow: PSRG 2011 sample, direct sequencing
ABI Procise instruments: 7 × 494 HT, 2 × 494 cLC
Maximum # of correct calls from N-terminus reported, by sample amount:
Sample Set   Sample Format            5 pmol   15 pmol   45 pmol
A            Solution (lyophilized)   24       32        49
B            Gel slice                N/A      9*        N/A
C            Membrane piece           26       33        33
* no supporting data provided

Summary of Edman Data

Sample Sets A and C: N-terminal residues identified

Does increasing amount of sample increase calls?
Data trend toward longer reads as a function of increased sample amount

Edman degradation sample solubility
Sample recovery was best when organic solvent was used. Other solvents also worked (data not shown).

PSRG 2011 Edman Conclusions & Observations
Edman sequencing allows for direct determination of the protein's N-terminal sequence.
• Reliable N-terminal Edman data were obtained from the lowest-amount (5 pmol) samples for both Sample Sets A and C.
• Generally, slightly longer read lengths were observed as sample amount increased.
• Sequencing preview and lag became more evident as sample amount increased.
• Contaminating proteins in the sample did not contribute negatively to any Edman result.
  - Sample A: concentration of contaminating protein was too low to be detected.
  - Sample C: sample was “isolated” by running the gel prior to blotting.
• No C-terminal data were produced with Edman.
• One lab returned N-terminal data from Set B (gel slice) but did not provide supporting data.

N-Terminal Techniques Overview: Bottom-Up MS Techniques
Enzymatic Digestion

Uses of Bottom-up Sequencing
• Protein identification via sequencing of unique (internal) peptides and subsequent database search
• Biomarker discovery
• A high degree of sequence coverage can be achieved by utilizing different proteases for digestion and combining results
• Identification and localization of post-translational modifications
• Identification and localization of introduced protein modifications, e.g. cross-linkers
• Estimation of relative quantities of like proteins between samples via spectral counting
• Confirmation of the complete protein sequence
• De novo elucidation of complete protein sequences
• Elucidation of the N- and C-terminus, with limitations (multiple enzymes or labeling strategies)
PSRG Presentation: ABRF 2011

Bottom-Up MS Experimental – LC-MS Systems
All labs used LC separation prior to peptide analysis.
• Eksigent NanoLC-2D
• AB Sciex 4800
• Thermo LTQ XL (×2)
• Thermo LTQ-Orbitrap Velos (×2)
• Bruker Ultraflex TOF/TOF

Bottom up Sample Preparation
[Workflow figure: PSRG 2011 sample, digestion in 100 mM AmBiC or 10 mM AmBiC, MS/MS (example spectrum, m/z axis). Lab-count labels 2, 2 and 3 appear beside some entries.]
Digestion enzymes:
• Trypsin alone (1 lab)
• Multiple enzymes: Trypsin, Glu-C, Lys-C; Trypsin, Glu-C; Trypsin, Chymotrypsin; Lys-C, Lys-N
Data analysis: MASCOT; manual; Data Explorer (AB); manual de novo; PEAKS 5.2; in-house analysis software

Bottom up results

Bottom up Strategies – Lys-C/Lys-N digest:
• A Novel Method for Analyzing Protein Terminals. Kishimoto et al., ASMS 2010, TP08
• Straightforward ladder sequencing of peptides using a Lys-N metalloendopeptidase. Taouatas et al., Nature Methods, Vol. 5, No. 5, pp. 405-407, 2008
• Lys-N vendors: Associates of Cape Cod, East Falmouth, MA; Seikagaku KK, Japan
• Proteome-wide analysis of protein carboxy termini: C terminomics. Nature Methods, Vol. 7, No. 7, pp. 508-511, 2010
PSRG 03

Comparison of N-terminal protein sequence from Lys-C and Lys-N digests
[Spectra panels: Lys-C, Lys-N]
Lys-N generates the same N-terminal peptide as Lys-C, except there is no lysine in the sequence for the Lys-N peptide.
PSRG 03

Bottom up Strategies – Lys-C/Lys-N digest
Lys-N generates peptides with the same m/z as Lys-C.
• Exception 1: no lysine in the N-terminal peptide using Lys-N
• Exception 2: no lysine in the C-terminal peptide using Lys-C
PSRG 03
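
The Lys-C/Lys-N relationship on this slide can be illustrated with a small in silico digest. This is a minimal sketch, not the participant's workflow: the sequence `TOY_SEQ` and the helper names `lys_c` and `lys_n` are hypothetical, and the cleavage rules are simplified (Lys-C cuts after every lysine, Lys-N cuts before every lysine, no missed cleavages).

```python
# Hypothetical toy sequence for illustration only; not the study protein.
TOY_SEQ = "GALRKVFDEKPLVEEKQNLIR"


def lys_c(seq):
    """Simplified Lys-C digest: cleave on the C-terminal side of every K."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa == "K":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # trailing peptide that does not end in K
        peptides.append(seq[start:])
    return peptides


def lys_n(seq):
    """Simplified Lys-N digest: cleave on the N-terminal side of every K."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa == "K" and i > start:
            peptides.append(seq[start:i])
            start = i
    peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    print("Lys-C:", lys_c(TOY_SEQ))
    print("Lys-N:", lys_n(TOY_SEQ))
```

In this toy case each internal Lys-C peptide has a Lys-N counterpart with the same amino acid composition, and therefore the same mass, with the lysine moved from the C-terminal to the N-terminal end. Only the protein's N-terminal peptide (no lysine with Lys-N) and C-terminal peptide (no lysine with Lys-C) differ between the two digests, which is what makes the pair useful for flagging terminal peptides.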

C-terminal MS1 spectra from Lys-C digest
PSRG 03

C-terminal peptide spectra and de novo sequencing
PSRG 03

C-terminal peptide spectra and de novo sequencing
PSRG 03

Combining Edman and enzymatic digestion using Trypsin and Glu-C to identify N-term (Part #40)
Sequence calls using Edman on Sample C3: GALRVFDEFKPLVEEPQNLIRVFDEFKPLVKPE
MS/MS data using 4700
Participant 009

Bottom-Up Conclusions
Bottom-up analysis involves enzymatic or chemical cleavage of the protein followed by MS/MS analysis of the peptide mixture.
• Small (6-25 aa) fragments are generated that usually do not cover the complete protein sequence and may not include the terminal fragments
• Successful bottom-up analyses used multiple enzymes and relied heavily on bioinformatics or manual data interpretation
• The N-terminus and C-terminus were successfully called using lyophilized sample at 15 pmol
• The C-terminus was successfully called using in-gel sample at 15 pmol
• MALDI and ESI both showed success, as did Orbitrap and TOF/TOF instruments
• Assigning true N-terminal peptides is difficult; however, bottom-up can be used in complementary fashion with Edman or dedicated chemistry to elucidate terminal peptides

N-Terminal Techniques Overview: Top-Down MS
In-Source Decay Fragmentation

In-Source Decay (MALDI-ISD)
MALDI-MS and MS/MS
• Analyte + matrix on metal target plate
• Spot is excited with laser, ionization occurs
• Ions are resolved by mass in TOF analyzer
• Second TOF allows for MS/MS by precursor ion fragmentation
MALDI-ISD
• “pseudo-MS/MS” technique
• Decomposition of protein in the MALDI plume at

ISD and T3 Sequencing
Suckau & Resemann, Anal Chem, Vol. 75, 21 (2003)

Uses of MALDI-Top-Down Sequencing (ISD)
• Confirmation of N-terminus, even if modified (pyroGlu, methyl, acetyl, …)
• Confirmation of C-terminus (terminal read length up to 80 residues)
• Protein identification from low-complexity mixtures
• Biopharma: protein termini QC, side-product elucidation (terminal truncations or elongations)
• Fusion site confirmation in recombinant proteins
• Proteolytic degradation product assignment
• PTM elucidation: modification sites and types, PEGylation sites
• Enzyme specificity testing on protein fragments (e.g. determination of kinase phosphorylation sites)
• Full characterization of proteins that are present in large quantities
• Full de novo sequencing capability up to ~15 kDa
• Domain mapping
• Identification of ragged termini
PSRG Presentation: ABRF 2011

ISD Experimental attempts
[Workflow figure: Sample, Clean-Up, Separation, Matrix, ISD Instrumentation]
• Sample solvent: 0.1% TFA; 20% ACN/0.1% TFA
• Clean-up: C4 ZipTip; chloroform-methanol precipitation, reconstituted in 0.1% TFA
• Matrix: DAN (1,5-diaminonaphthalene); DHB (2,5-dihydroxybenzoic acid)
• ISD instrumentation: Bruker Ultraflex; AB Sciex 4800

Study Preparation: Cl-MeOH prec. ISD manual data analysis
[Two annotated spectra, 4700 Reflector Spec #1 MC=>BC=>SM5 [BP = 1052.7, 721]; axes: Mass (m/z) vs. % Intensity; inset: MS/MS on 1619. Annotated ions and residue calls include b4, b5, b7, b8, b10, y10(?), y11(?), I/L, K/Q, [PE] and [PK].]
Sequence shown: G A L R V F D E F K P L V E E (N-terminal sequence obtained from Edman analysis; residues in red on the slide are from ISD analysis)
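
The K/Q and I/L labels in the manual annotation reflect residues that are hard or impossible to tell apart from the spacing between adjacent fragment-ion peaks. A minimal sketch of that lookup, assuming standard monoisotopic residue masses; the function name `residues_for_delta` and the tolerance values are illustrative choices, not settings from the study:

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_delta(delta, tol=0.02):
    """Return residues whose mass matches a peak-to-peak spacing within tol (Da)."""
    return sorted(
        aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol
    )


if __name__ == "__main__":
    # Spacing near 113.08 Da: Leu and Ile have identical masses.
    print(residues_for_delta(113.084))
    # Spacing near 128 Da with a loose tolerance: Lys and Gln both fit.
    print(residues_for_delta(128.08, tol=0.05))
    # The same region with a tight tolerance resolves to Lys only.
    print(residues_for_delta(128.095, tol=0.01))
```

Lys and Gln differ by about 0.036 Da, so the call depends on the mass accuracy of the spectrum, whereas Leu and Ile cannot be separated by mass at any tolerance. That is one place where Edman data, which distinguish leucine from isoleucine, complement a mass-ladder read.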

Summary of Top Down Analysis
• Neither the participants nor the PSRG succeeded in obtaining terminal sequences using ISD from study samples; other top-down methods (ECD, ETD, …) were not attempted
• All participants followed routine procedures, but typical sample issues likely hindered analysis
• Potential reasons:
  - Solubility: only a fraction of the sample is recovered
  - Sample amount overestimated by traditional quantitative methods: less provided than presumed
  - Protein contamination has a significant effect in top-down: a problem and a potential!
  - Limited sample availability: no investigation of the problem and no optimization possible (intact MW, purity, solubility, …)

Protein LC-separation of 100 pmol sample
PepSwift PS-DVB (monolithic column)
[Chromatogram labels: 100 pmol casein; study sample]
Result:
• Several proteins present
• Much less protein available to the analysis than anticipated by original protein quantification
• ~5-10 pmol instead of 100 pmol study sample

Monolithic LC separation of lyophilized sample
Theoretical amount of 100 pmol
[Chromatogram label: protein of interest]
Reveals the presence of several proteins

ISD of Fraction 75 contains study sample: matches sequence, but NOT de novo

ISD of Fraction 36 + Mascot: 30S ribosomal protein S15, E. coli

ISD of Fraction 32 + Mascot: YOBA_ECOLI, fragment 27-84

ISD of Fraction 47 + Mascot: HFQ_SERP5, N-term only (homolog to E. coli??)

Summary on MALDI-ISD study follow-up work
• Expected ~50 kDa protein present, plus contamination in the 16 kDa range
• De novo sequencing was not possible due to sample amount restrictions
• Protein LC-MALDI analysis showed only ~5-10% of the expected protein is available after separation
• Multiple labs observed poor recovery from reverse-phase columns
• Protein LC-MALDI-ISD analysis, theoretically starting with 100 pmol of sample:
  - 49 N-term and 56 C-term matches, not de novo, as sample amount was much lower than thought
  - IDs of several bacterial heat shock proteins after ISD-Mascot analysis

Comments…’but not enough time’
• I had planned to isolate/capture N-terminus but did not due to lack of time
• Be more clear in instructions and allow much more time between sample arrival and data submission so that if extensive preparation is necessary, there will be time enough to perform it without affecting standard samples sequenced in the lab
• Very nice setup; but I needed more time to take full advantage. As my ISD ambitions failed (!!) I turned to proteolytic digestions and PSD: Performed a lot of bottom up analyses, mainly after sulfonation…
• Sorry, I did not have time to properly analyze the data and to do the experiment as if it would have to be done

Comments (continued)
• did not spend time to purify or evaluate low level sequences by MS... Instructions were somewhat confusing. Not clear if the sample needed purification before Edman
• Thanks! …even though we have de novo software we do NOT have a good strategy for obtaining sequence and determining N and C termini…Also, we identified quite a few peptides that likely weren't N-terminal or C-terminal…using other enzymes and finding overlapping sequences would have been a better strategy
• I wouldn't mind trying another of these after I see how to approach it
• I will be very interested in seeing the results of the mass spec analysis of these samples to which I do not have access…would like to see the comparison
• It was very tough one to get the whole sequence even though it was not the goal
• Sample has a ragged N-terminal sequence... Samples A1 to A3 were solubilized in 0.1% TFA and blotted but no sequence was observed…suggesting that no protein was in the tube or that it was insoluble in 0.1% TFA.
• Challenging but good.

Final conclusions
• Two techniques were successfully employed in this study to obtain N-terminal sequence of an undisclosed protein not present in public databases:
  - Edman degradation: lowest sample amounts of Samples A and C
  - Enzymatic digestion: 15 pmol sample amounts of Samples A and B
• For Edman, slightly longer read lengths were observed as sample amount increased; however, sequencing preview and lag also became more evident.
• De novo bottom-up was not successful unless a priori knowledge of the sequence was available (from Edman, a database, etc.). Some strategies can succeed, but the current ones have limitations.
• Top-down was not successful in obtaining terminal sequences using ISD from study samples; other top-down methods were not attempted.
  - Likely reasons: poor recovery due to solubility, interfering impurities, ionization, etc.
• Top-down was able to obtain sequence from a 100 pmol sample using a protein LC and MALDI-ISD strategy, as long as the theoretical sequence was used.
• Time is of the essence, both for the committee to appropriately design and develop the study and for participants to properly analyze samples.

Acknowledgements
• Robert English, University of Texas Medical Branch at Galveston
  - Accumulation & anonymization of data
• Shantanu Roychowdhury, Sigma-Aldrich
  - Expressed and purified protein
• Anja Resemann, Bruker Daltonics
  - LC MALDI ISD and top-down work
• Jack Simpson and the rest of the ABRF Executive Board
  - For support and scrutiny of the study proposal
• Participating labs!