The name of the game: Palaeoproteomics and radiocarbon dates further refine the presence and dispersal of caprines in Eastern and Southern Africa
Data files
Nov 09, 2023 version files 109.40 KB
- COL1_African_Bovidae_ref_Janzen_LeMeillour.fasta (97.97 KB)
- README.md (11.44 KB)
Abstract
We report the first large-scale palaeoproteomics research on eastern and southern African zooarchaeological samples, thereby refining our understanding of early caprine (sheep and goat) pastoralism in Africa. Assessing caprine introductions is a complicated task because of their skeletal similarity to endemic wild bovid species and the sparse and fragmentary state of relevant archaeological remains. Palaeoproteomics has previously proved effective in clarifying species attributions in African zooarchaeological materials, but few comparative protein sequences of wild bovid species have been available. Using newly generated collagen type I sequences for wild species, as well as previously published sequences, we assess species attributions for elements originally identified as caprine or “unidentifiable bovid” from seventeen eastern and southern African sites that span seven millennia. We identified over 70% of the archaeological remains, and direct radiocarbon dating of domesticate specimens allows refinement of the chronology of caprine presence in both African regions. These results thus confirm earlier occurrences in eastern Africa and the systematic association of domesticated caprines with wild bovids at all archaeological sites. The combined biomolecular approach highlights the repeatability and accuracy of the methods for conclusive species attribution of archaeological remains from dry African environments.
“Modern” and archaeological samples protein extraction
Both reference specimen samples and archaeological samples underwent the same preparation protocol for protein extraction. Briefly, ten to twenty milligrams of bone or tooth powder were sampled using an ethanol-cleaned diamond drill (Supplementary Table 2). The archaeological specimens from Wakarida were mandibles with their teeth and were thus sampled twice to determine the best tissue for sampling. For these, we took one bone sample and one dentine sample, and assessed the organic preservation of each using the method described in Lebon et al. (2016). Only the best-preserved sample according to the threshold discussed in Le Meillour et al. (2018) is presented here (Supplementary Table 2).
After sampling, bone powders were placed in 2 mL Protein LoBind tubes (Eppendorf, Germany). We followed the protocol for African remains, which is appropriate for the extraction and characterisation of proteins from remains recovered from arid environments (Le Meillour et al., 2020). Bone or tooth powders were decalcified in a neutral pH buffer (tris(hydroxymethyl)aminomethane (Tris) 0.05 M and ethylenediaminetetraacetic acid (EDTA) 0.5 M, final pH 7.4, Sigma Aldrich, Germany) for 1 to 8 days, depending on the sample. Solutions were first replaced every 12 hours, then every 24 hours. When completely decalcified (i.e., when only a collagen “phantom” remained), the pellets were rinsed 5 times with milliQ water. The extracted proteins were solubilized in 50 mM ammonium bicarbonate (ABC, pH 8, Sigma Aldrich) for 3 hours at 65 °C. Solutions were then centrifuged for 10 min at 3000 g and collected into new microtubes. Extracted proteins were finally reduced with dithiothreitol (DTT, final concentration 10 mM, 20 min, 56 °C, 350 rpm), alkylated with iodoacetamide (final concentration 10 mM, 30 min, in the dark, at room temperature) and hydrolysed using trypsin (0.01 μg/μL, Trypsin Gold, Promega). Finally, samples were transferred into clean vials and kept at -20 °C until mass spectrometry analyses.
Mass spectrometry - UHPLC-MS/MS
Digested extracts of the bovid reference species and of the archaeological samples were analysed independently by UHPLC-MS/MS using a workflow described previously (Le Meillour et al., 2020). Separation was performed on an Ultimate 3000-RSLC system (Thermo Fisher Scientific) with an RSLC Polar Advantage II Acclaim column (2.1 × 100 mm, 120 Å, 2.2 μm) using a flow rate of 300 μL/min and a mobile phase gradient of A: H2O + formic acid (FA) 0.1% and B: acetonitrile + FA 0.08%. We used a high-resolution ESI-Q-TOF mass spectrometer (Maxis II ETD, Bruker Daltonics) in positive mode and data-dependent auto-MS/MS mode over the m/z range 200-2200. MS/MS spectra were generated by collision-induced dissociation, selecting ions with charge states between 2+ and 5+ in the m/z range 300-2200. Calibration was carried out for each run with sodium formate clusters.
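The precursor selection window quoted above (charge states 2+ to 5+, m/z 300-2200) can be written as a simple predicate. This is only an illustrative sketch: the function name is hypothetical, the bounds are assumed to be inclusive, and the instrument's actual selection logic (intensity thresholds, dynamic exclusion) is not modelled.

```python
# Selection window taken from the text; treating the bounds as inclusive is an assumption.
MZ_RANGE = (300.0, 2200.0)
CHARGE_RANGE = (2, 5)


def eligible_for_msms(mz: float, charge: int) -> bool:
    """Return True if a precursor ion falls inside the stated MS/MS selection window."""
    in_mz_window = MZ_RANGE[0] <= mz <= MZ_RANGE[1]
    in_charge_window = CHARGE_RANGE[0] <= charge <= CHARGE_RANGE[1]
    return in_mz_window and in_charge_window
```

For example, a doubly charged ion at m/z 650.3 passes, whereas a singly charged ion at the same m/z does not.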
Type I collagen sequences de novo reconstruction
The raw data were converted to .mgf using the mass spectrometer manufacturer's software, DataAnalysis (version 4.4, Bruker Daltonics). The sequences were reconstructed with a database-assisted approach using the Byonic workflow C57 (Byologic, Protein Metrics). The restricted database used for the search contained COL1A1 and COL1A2 sequences of the following bovid species: Bos taurus (P02453 and P02465), B. mutus (L8IV51 and L8HQF7), B. indicus (A0A4W2FAL4 and A0A4W2FTM9), Capra hircus (A0A452FHU9 and A0A452G3V6), Ovis aries (W5P481 and W5NTT7) and Pantholops hodgsonii (XM_005964647.1 and XM_005985683.1). The referenced sequences can be found in the NCBI and UniProt repositories and correspond to entire translated protein sequences, including signal peptide and tropocollagen. In order to reconstruct only the secreted protein sequences, we excluded these parts of the sequence from further analyses. When referring to amino acid positions, we use the sheep reference protein sequences. The alpha 1 chain starts at position 170 (Q) of the UniProt reference (W5P481), which is the first position in Supplementary Information 2, and stops at position 1229 (K). Similarly, the alpha 2 chain starts at position 80 (Q) and stops at position 1117 (A) in the UniProt reference (W5NTT7). The software parameters were set as follows: 1% FDR; up to 3 tryptic missed cleavages; carbamidomethylation of cysteines as a fixed modification; and deamidation of N and Q, Gln to pyro-Glu (N-terminal Q), phosphorylation of S and T and oxidation of M and P as variable modifications, with a maximum of 5 modifications allowed per peptide. In the Byonic workflow, proline oxidations (HyP) were considered “common modifications” and every amino acid substitution was included in the “rare modification” list.
Once results were obtained, peptides with potential single amino acid substitutions (SAPs) were manually assessed by verifying MS/MS spectra, requiring at least 2 peptide spectral matches (PSMs) in each of the two specimens sampled for a species. For each species-specific peptide detected in a sample, we verified that the alternate sequence was absent by assessing PSMs manually (Supplementary Data 2). We confirmed every sequence by alignment in the Geneious Prime® software (v. 2023.0.4) before building the final fasta file (Supplementary Information 2).
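The trimming of the reference sequences to the chain boundaries given above can be sketched as follows. This is a minimal illustration, not the procedure used to build the dataset: the function names are hypothetical, the boundaries are the sheep UniProt positions quoted in the text (1-based, inclusive), and applying them directly to another species assumes its full-length sequence shares the sheep numbering, which should be checked by alignment.

```python
# Chain boundaries quoted in the text (sheep UniProt numbering, 1-based, inclusive).
MATURE_BOUNDS = {
    "COL1A1": (170, 1229),  # W5P481: starts at Q170, stops at K1229
    "COL1A2": (80, 1117),   # W5NTT7: starts at Q80, stops at A1117
}


def trim_to_chain(full_sequence: str, chain: str) -> str:
    """Return only the residues between the stated start and stop positions."""
    start, end = MATURE_BOUNDS[chain]
    if len(full_sequence) < end:
        raise ValueError(f"{chain}: sequence has fewer than {end} residues")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return full_sequence[start - 1:end]


def to_trimmed_position(uniprot_position: int, chain: str) -> int:
    """Convert a UniProt position to its position in the trimmed chain (first residue = 1)."""
    start, end = MATURE_BOUNDS[chain]
    if not start <= uniprot_position <= end:
        raise ValueError(f"{chain}: position {uniprot_position} is outside {start}-{end}")
    return uniprot_position - start + 1
```

With these boundaries the trimmed alpha 1 chain is 1060 residues long and the trimmed alpha 2 chain 1038 residues long, and UniProt position 170 of the alpha 1 chain maps to position 1.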
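The PSM threshold described above lends itself to a small pre-screening step before manual inspection of the spectra. The sketch below is an assumption-laden illustration: the input format (one `(specimen_id, peptide_sequence)` tuple per PSM, for a single species) and the function name are hypothetical, and the filter does not replace the manual verification of MS/MS spectra.

```python
from collections import Counter

MIN_PSMS = 2       # threshold stated in the text
MIN_SPECIMENS = 2  # two reference specimens per species, as stated in the text


def peptides_meeting_psm_threshold(psms):
    """Return peptides with at least MIN_PSMS PSMs in at least MIN_SPECIMENS specimens.

    psms: iterable of (specimen_id, peptide_sequence) tuples, one tuple per PSM.
    """
    psm_counts = Counter(psms)  # PSMs per (specimen, peptide) pair
    supporting_specimens = Counter()
    for (specimen, peptide), n_psms in psm_counts.items():
        if n_psms >= MIN_PSMS:
            supporting_specimens[peptide] += 1
    return {p for p, n in supporting_specimens.items() if n >= MIN_SPECIMENS}
```

A peptide seen twice in one specimen but only once in the other is not retained, which mirrors the requirement that both specimens support the substitution.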
Archaeological samples identification using type I collagen sequences
Since Janzen et al. (2021) published COL1 sequences of some of the species we included in our initial dataset, the sequences of these “overlapping” species were excluded from this paper. Here we present data from only 9 species (compared to the 19 initially sampled for reference purposes): Ammodorcas clarkei, Eudorcas rufifrons, Gazella dorcas, Nanger dama, N. soemmerringii, Ammotragus lervia, Capra nubiana, Addax nasomaculatus and Pelea capreolus. COL1 sequences for all other species were taken from Janzen et al. (2021). All archaeological samples were searched using the MaxQuant software (v.2.1.3.0, Cox and Mann, 2008) against the updated database of bovid species (Supplementary Information 2), using the following parameters: the number of allowed tryptic missed cleavages was set to 3; carbamidomethylation of cysteines was set as a fixed modification, and deamidation (N, Q), Gln to pyro-Glu (N-terminal Q), phosphorylation (S, T) and oxidation (M, P) as variable modifications; mass tolerances were set to 10 ppm for precursor ions and 0.02 Da for fragment ions; all other parameters were left as default. We considered a species identification confident if at least 2 razor and unique peptides covered non-overlapping parts of the sequence, as determined by inspecting the evidence file produced by the MaxQuant search (Supplementary Data 3 and 4). In addition, we manually assessed the spectra of the species-specific peptides.
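The confidence criterion (at least 2 razor and unique peptides from non-overlapping parts of the sequence) can be expressed as an interval check. The following is a sketch under stated assumptions: each peptide is reduced to a hypothetical `(start, end)` pair of 1-based inclusive positions on the reference chain, "non-overlapping" is interpreted as sharing no residue, and the code does not parse MaxQuant output itself.

```python
def max_non_overlapping(intervals):
    """Largest number of mutually non-overlapping peptides (greedy interval scheduling).

    intervals: iterable of (start, end) positions, 1-based and inclusive.
    """
    count = 0
    last_end = float("-inf")
    # Taking peptides in order of increasing end position yields a maximal
    # set of intervals that share no residue.
    for start, end in sorted(intervals, key=lambda interval: interval[1]):
        if start > last_end:
            count += 1
            last_end = end
    return count


def is_confident_identification(intervals, minimum=2):
    """Apply the threshold from the text: at least `minimum` non-overlapping peptides."""
    return max_non_overlapping(intervals) >= minimum
```

Two peptides covering positions 10-25 and 20-35 overlap and therefore count once, so they would not satisfy the criterion on their own; adding a third peptide at positions 40-55 would.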
MaxQuant 2.3.0 (open source)