Increasing sustainability in palaeoproteomics by optimizing digestion times for large-scale archaeological bone analyses
Data files
Apr 17, 2024 version files 135.48 KB
Abstract
Palaeoproteomic analysis of skeletal proteomes is used to provide taxonomic identifications for an increasing number of archaeological specimens. The success rate depends on a range of taphonomic factors and on differences in the extraction protocols employed. By analyzing 12 archaeological bone specimens from two archaeological sites, we demonstrate that reducing digestion duration from 18 to 3 hours has no measurable impact on the obtained taxonomic identifications. Peptide marker recovery, COL1 sequence coverage, and proteome complexity are likewise not significantly affected. Although we observe minor differences in sequence coverage and glutamine deamidation, these are not consistent across our dataset. A 6-fold reduction in digestion time reduces electricity consumption, and therefore CO2 emission intensities. We furthermore demonstrate that working in 96-well plates reduces electricity consumption by a further 60% in comparison to individual microtubes. Reducing digestion time therefore has no impact on the taxonomic identifications, while reducing the environmental impact of palaeoproteomic projects.
README: Increasing sustainability in palaeoproteomics by optimizing digestion times for large-scale archaeological bone analyses
https://doi.org/10.5061/dryad.cz8w9gj8j
Description of the data and file structure
Data deposited on Dryad are structured as follows:
- Digestion_time_Datasheet.csv containing all information concerning sample names, experimental information (sampling amount), and the palaeoproteomics methods data tested in this study (ZooMS and SPIN).
- Electicitymeasurement.csv containing all data gathered while measuring the electricity consumption of the three digestion times tested in the paper.
- Three folders: Full proteome MQ (txt files generated by the MaxQuant search against the Bos taurus full proteome); msd_files_3replicates (.msd files of all LC-MS/MS raw data); and SPIN MQ (txt files generated by the MaxQuant search against the SPIN database).
- Four R Markdown files containing the statistical analyses of the paper, figure generation, etc. (Full Proteome.Rmd; Main text figures.Rmd; SPIN.Rmd and ZooMS.Rmd).
Empty cells in the .csv files indicate that no data were recorded or that the corresponding column does not apply.
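The empty-cell convention above can be handled explicitly when loading the .csv files. The following is a minimal sketch in Python (the deposited analyses themselves are written in R); the column names and values in the example are hypothetical placeholders, not the actual headers of Digestion_time_Datasheet.csv.

```python
import csv
import io


def read_rows(handle):
    """Read CSV rows as dicts, mapping empty cells to None.

    None then stands for "no data recorded or column not applicable".
    """
    reader = csv.DictReader(handle)
    return [
        {key: (None if value is None or value.strip() == "" else value)
         for key, value in row.items()}
        for row in reader
    ]


# Hypothetical excerpt: these column names and values are placeholders only.
example = io.StringIO(
    "sample,digestion_h,note\n"
    "LD_01_3h,3,\n"
    "LD_01_18h,18,placeholder\n"
)
rows = read_rows(example)

# With a real file, one would instead do something like:
# with open("Digestion_time_Datasheet.csv", newline="") as fh:
#     rows = read_rows(fh)
```

Keeping missing values as None rather than as empty strings makes it harder to mistake an inapplicable column for a measured value in later filtering.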
Sharing/Access information
The MALDI-MS raw data and the associated spectra merging code linked to this paper can be found here: https://doi.org/10.5281/zenodo.8290650
The LC-MS/MS raw data and the associated MaxQuant search output files are available on the ProteomeXchange data repository under identifier PXD045027.
Code/Software
After spectral identification, proteomic data analysis was conducted largely in R v.4.1.2 using tidyverse v.1.3.1, seqinr v.4.2-8, ggpubr v.0.4.0, ggdist v.3.3.0, data.table v.1.14.2, ggsci v.2.9, progressr v.0.10.0, gmp v.0.6-6, reshape2 v.1.4.4, stringi v.1.7.6, MALDIquant v.1.2, MALDIquantForeign v.0.13, janitor v.2.2.0, and wesanderson v.0.3.6. The R scripts used for the shotgun proteomics analysis are available under Rüther et al., 2022. Deamidation was quantified based on spectral intensities. Depending on the data type, statistics were calculated using two-way ANOVA (Type II and Type III), linear modelling with lmerTest v.3.1-3, lme4 v.1.1-34, and MASS v.7.3-60, and Kruskal-Wallis tests with carData v.3.0-5, car v.3.1-0, and rstatix v.0.7.2. As prerequisites for the ANOVA tests, normal distribution of the residuals was checked using the Shapiro-Wilk normality test and homogeneity of the variances was assessed using Levene’s test.
Methods
Six bones each from La Draga (Spain, Holocene, samples LD_01 to LD_06) and Bayisha Karst Cave (China, Pleistocene, samples BKC_07 to BKC_12) were sampled for this study, giving 12 specimens in total. Each initial sample was divided into three sub-samples for the three digestion durations tested here (site code_sample number_3h, site code_sample number_6h, and site code_sample number_18h). Samples were then processed according to the ZooMS protocol: they were demineralised in 0.6 M hydrochloric acid (HCl) for 24 hours. The HCl supernatant was then removed and samples were rinsed thrice in 100 µL ammonium bicarbonate (50 mM, NH4HCO3, hereafter AmBic) for subsequent gelatinisation in a final volume of 100 µL AmBic for one hour at 65°C. Following gelatinisation, the 100 µL of AmBic solution was transferred to a new microtube, to which 0.8 µg trypsin (Promega) was added for incubation at 37°C, with mild agitation at 300 rpm (VWR, Thermal Shake lite). Digestion proceeded for 3, 6, or 18 hours. To stop trypsin digestion, 2 µL of 5% trifluoroacetic acid (TFA) was added to each sample. The digested extracts were then split into two parts for separate analyses via matrix-assisted laser desorption/ionisation-time of flight mass spectrometry (MALDI-ToF MS) and liquid-chromatography tandem mass spectrometry (LC-MS/MS). To assess any potential contamination by non-endogenous peptides, we extracted laboratory blanks alongside the samples for each enzymatic digestion condition.
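The sub-sample naming scheme described above (site code, sample number, digestion duration) can be parsed programmatically. The sketch below is in Python rather than the R used for the deposited analyses, and it assumes that labels are written exactly as, for example, LD_01_3h; the precise spelling used in the data files may differ.

```python
import re
from typing import NamedTuple


class SubSample(NamedTuple):
    site: str      # site code, e.g. "LD" or "BKC"
    number: int    # sample number within the study
    hours: int     # digestion duration in hours


# Assumed label layout: <site code>_<sample number>_<hours>h
LABEL = re.compile(r"([A-Z]+)_(\d+)_(\d+)h")


def parse_label(label: str) -> SubSample:
    """Split a sub-sample label into site, sample number, and duration."""
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised sub-sample label: {label!r}")
    site, number, hours = match.groups()
    return SubSample(site, int(number), int(hours))


# Sample numbering and durations as given in the Methods text.
SITES = {"LD": range(1, 7), "BKC": range(7, 13)}
DURATIONS_H = (3, 6, 18)

all_labels = [
    f"{site}_{number:02d}_{hours}h"
    for site, numbers in SITES.items()
    for number in numbers
    for hours in DURATIONS_H
]
```

Under these assumptions, all_labels enumerates the 36 sub-samples (12 specimens at three digestion durations), excluding laboratory blanks.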
Mass spectrometry analyses
MALDI-ToF MS and ZooMS data analysis
For ZooMS data analysis, peptides were cleaned and desalted using C18 ZipTips (Thermo Fisher) before MALDI-ToF MS analysis and subsequently spotted in triplicate, each spot consisting of 0.5 µL eluted peptides and 0.5 µL alpha-cyano-4-hydroxycinnamic acid (CHCA) matrix solution, on a 384-well Opti-ToF MALDI plate insert (AB Sciex, Framingham, MA, 01701, USA) and allowed to air-dry at room temperature. MALDI spectra were automatically acquired with an AB SCIEX 5800 MALDI-ToF spectrometer (Framingham, MA, 01701, USA) in positive reflector mode for MS acquisition. Before sample acquisition, an external plate model calibration was performed on 13 adjacent MS standard spots with a standard peptide mix (Proteomix Peptide calibration mix4, LaserBioLabs, Sophia Antipolis, France) containing bradykinin fragment 1-5 (573.315 Da), human angiotensin II (1046.542 Da), neurotensin (1672.917 Da), ACTH fragment 18-39 (2464.199 Da), and oxidised insulin B chain (3494.651 Da). The concentrations in the prepared mixture were between 27 and 167 fmol/µL. The calibration was validated according to the laboratory specifications (resolution above 10000 for 573 Da, 12000 for 1046 Da, and 15000 to 25000 for other masses; error tolerance <50 ppm). For the spectra in which peptides resulting from trypsin autolysis were detected, an internal recalibration was applied to decrease the error tolerance below 10 ppm (trypsin peptides: 842.509 Da, 1045.56 Da, and 2211.104 Da). Laser intensity was set at 50% after optimization of the signal-to-noise ratio on several spots, and up to 3,000 shots were accumulated per spot, covering a mass-to-charge range of 1000 to 3500 Da for sample analysis. The triplicate data files were merged in R and converted into .msd files. ZooMS taxonomic identifications were assessed in mMass through manual identification of peptide marker masses in comparison to a database of peptide marker series for medium- to large-sized mammals.
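The calibration tolerances above are expressed in parts per million (ppm), i.e. the relative difference between an observed and a theoretical mass multiplied by 10^6. The following Python sketch illustrates that arithmetic using the reference masses listed in the text; the observed mass in the usage line is a hypothetical value, not a measurement from this study, and the functions are not part of the instrument or mMass software.

```python
# Reference masses (Da) as listed in the text.
CALIBRANTS_DA = {
    "bradykinin fragment 1-5": 573.315,
    "human angiotensin II": 1046.542,
    "neurotensin": 1672.917,
    "ACTH fragment 18-39": 2464.199,
    "oxidised insulin B chain": 3494.651,
}
TRYPSIN_AUTOLYSIS_DA = (842.509, 1045.56, 2211.104)

EXTERNAL_TOLERANCE_PPM = 50.0   # external plate calibration
INTERNAL_TOLERANCE_PPM = 10.0   # after internal recalibration on trypsin peaks


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the absolute ppm error is strictly below the tolerance."""
    return abs(ppm_error(observed, theoretical)) < tol_ppm


# Hypothetical observed mass for the angiotensin II calibrant.
example_error = ppm_error(1046.560, CALIBRANTS_DA["human angiotensin II"])
```

In this hypothetical case the error is roughly 17 ppm, which would pass the 50 ppm external criterion but not the 10 ppm criterion applied after internal recalibration.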
Glutamine deamidation values were calculated using the Betacalc3 package.
Shotgun proteomics
For SPIN data analysis, peptide extracts were first separated using an Evosep One (Evosep, Odense, Denmark) with the 100 samples-per-day method (cycle of 14.4 min). Samples were loaded at a flow rate of 2 µL/min using mobile phases of A: 5% acetonitrile and 0.1% formic acid in H2O and B: 0.1% formic acid in H2O, with a gradient of 11.5 min at 1.5 µL/min. A home-pulled polymicro flexible fused silica capillary (150 µm inner diameter, 16 cm long) was packed with C18-bonded silica particles of 1.9 µm diameter (ReproSil-Pur, C18-AQ, Dr. Maisch, Germany). The column was mounted on an electrospray source with a column oven set at 60°C and a source voltage of +2000 V, along with an ion transfer tube set at 275°C. An Exploris 480 (Thermo Fisher Scientific) was operated in data-dependent mode, starting with an MS1 scan at a resolution of 60 000 between m/z 350 and 1400. The twelve most intense monoisotopic precursors were selected if above 2e5 intensity with a charge state between 2 and 6, and were then dynamically excluded, together with their isotopes (20 ppm), for 20 seconds after one appearance. The selected peptides were acquired on MS2 at an Orbitrap resolving power of 15000, with normalised collision energy (HCD) set at 30%, a quadrupole isolation width of 1.3 m/z, and a first m/z of 120. Quality control was assessed on HeLa cells: the QC run displayed 1289 protein groups for 5561 peptides at a repeat sequencing rate of 2.90% in MaxQuant v.2.2.3.0. The following parameters were used for this search: the raw data were searched against the human full proteome, with carbamidomethyl (C) as fixed modification and oxidation (M) and acetyl (protein N term) as variable modifications; digestion was set as tryptic and all other parameters were kept as default.
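The data-dependent precursor selection described above (top 12 by intensity, intensity above 2e5, charge state 2 to 6, dynamic exclusion within 20 ppm for 20 seconds) can be illustrated with a simplified Python sketch. This is a conceptual illustration only and not the instrument's acquisition software: the class and function names are hypothetical, isotope handling is omitted, and precursor detection is assumed to have already happened.

```python
from dataclasses import dataclass

TOP_N = 12
MIN_INTENSITY = 2e5
ALLOWED_CHARGES = range(2, 7)      # charge states 2 to 6 inclusive
EXCLUSION_PPM = 20.0
EXCLUSION_SECONDS = 20.0


@dataclass(frozen=True)
class Precursor:
    mz: float
    charge: int
    intensity: float


def is_excluded(mz, now, exclusion_list):
    """True if mz lies within 20 ppm of a precursor selected < 20 s ago."""
    return any(
        now - selected_at < EXCLUSION_SECONDS
        and abs(mz - excluded_mz) / excluded_mz * 1e6 <= EXCLUSION_PPM
        for excluded_mz, selected_at in exclusion_list
    )


def select_precursors(candidates, now, exclusion_list):
    """Pick up to TOP_N eligible precursors and record them for exclusion."""
    eligible = [
        p for p in candidates
        if p.intensity > MIN_INTENSITY
        and p.charge in ALLOWED_CHARGES
        and not is_excluded(p.mz, now, exclusion_list)
    ]
    chosen = sorted(eligible, key=lambda p: p.intensity, reverse=True)[:TOP_N]
    exclusion_list.extend((p.mz, now) for p in chosen)
    return chosen
```

A caller would keep one exclusion_list for the whole run and pass the scan time in seconds as now, so that a precursor selected in one cycle is skipped in later cycles until 20 seconds have elapsed.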
MaxQuant search
All .raw files were analysed using MaxQuant (v.2.3.1) in two different searches. The first search was performed as described in Rüther et al., 2022 against the protein sequence database provided there. Variable modifications included oxidation (M), deamidation (NQ), Gln (Q) -> pyro-Glu, Glu (E) -> pyro-Glu, and proline (P) hydroxylation. The internal MaxQuant contaminant list was replaced with an in-house database provided by Rüther et al., 2022 (Supplementary File PR200512_HumanCons.fasta). Since all specimens except one were identified as belonging to either Bos sp. or Bison sp., a second search was performed against the whole Bos taurus reference proteome (downloaded from UniProt on 2022-01-20) to explore the presence of additional non-collagenous proteins (NCPs). Variable modifications for this search included oxidation (M), deamidation (NQ), and proline (P) hydroxylation. The internal MaxQuant contaminant list was used. Both searches were run in semi-specific Trypsin/P digestion mode. Up to five variable modifications were allowed per peptide and all other settings were left as default for both searches.
Measurement of electricity consumption
A power monitor (Cowell, model no.: PMB01) was placed between the heating block (VWR, Thermal Shake lite) and the power outlet to measure electricity consumption using either 96-well plates or Eppendorf tubes for 18 hours at 37°C. The measurements for tubes (1.5 mL Eppendorf Protein LoBind, Eppendorf) and plates (PCR Plate, 96-well, low profile, non-skirted, 0.3 mL, Thermo Fisher Scientific) were conducted separately over a time frame of 18 hours and replicated three times in total. Measurements started when the heating block had reached a stable temperature of 37°C. The maximum number of tubes, 40 units, was placed in the heating block with 100 µL AmBic in each tube to imitate experimental conditions. Likewise, each well in the 96-well plate was filled with 100 µL AmBic. The emission intensity (gCO2eq; grams of carbon dioxide equivalent) was then calculated using the kWh measured and the gCO2eq/kWh values available through Electricity Maps for the dates on which our experiments were conducted. The gCO2eq/kWh values were obtained for various countries (Australia, Brazil, Germany, Denmark, France, Japan, the USA, and South Africa). With this selection, we hope to cover a range of countries where high-throughput palaeoproteomics facilities exist. Furthermore, countries differ significantly in the amount of carbon released for each unit of electricity consumed, the so-called carbon intensity, for example due to the use of nuclear energy or largely completed transitions to wind and solar energy sources. The absolute impact of electricity consumption therefore differs considerably between countries, and our selection of countries also aims to cover this range of carbon intensities. Lastly, emission intensities were calculated for each tube and PCR plate well across the three digestion durations (18h, 6h, and 3h), and for each country included in the study.
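The emission calculation described above amounts to multiplying the measured electricity consumption (kWh) by a country's carbon intensity (gCO2eq/kWh) and dividing by the number of tubes or wells in the heating block. The Python sketch below illustrates this arithmetic; all numerical inputs and country keys are hypothetical placeholders rather than values from Electicitymeasurement.csv or Electricity Maps, and the function names are our own.

```python
TUBES_PER_BLOCK = 40   # maximum number of tubes in the heating block (from the text)
WELLS_PER_PLATE = 96   # wells in one 96-well plate


def emission_g(kwh: float, intensity_g_per_kwh: float) -> float:
    """Total emissions (gCO2eq) for one heating-block run."""
    return kwh * intensity_g_per_kwh


def emission_per_unit_g(kwh: float, intensity_g_per_kwh: float, units: int) -> float:
    """Emissions (gCO2eq) attributed to a single tube or well."""
    if units <= 0:
        raise ValueError("units must be a positive integer")
    return emission_g(kwh, intensity_g_per_kwh) / units


# Placeholder inputs for illustration only; not measured values.
measured_kwh = {"tubes": 0.5, "plate": 0.25}
carbon_intensity = {"country_A": 100.0, "country_B": 400.0}
units_per_run = {"tubes": TUBES_PER_BLOCK, "plate": WELLS_PER_PLATE}

per_unit = {
    (vessel, country): emission_per_unit_g(kwh, intensity, units_per_run[vessel])
    for vessel, kwh in measured_kwh.items()
    for country, intensity in carbon_intensity.items()
}
```

The same function would be applied once per digestion duration, using the kWh value that corresponds to each duration as input.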