Supplementary information from: An extensive archaeological dental calculus dataset spanning 5000 years for ancient human oral microbiome research

Data files

Jul 08, 2025 version; total file size 407.09 MB

README.md (2.94 KB)
Supplementary_Material_1_FastQC_reports_raw.zip (82.04 MB)
Supplementary_Material_2_FastQC_reports_trimmed.zip (65.54 MB)
Supplementary_Material_3_Fastp_reports_raw.zip (15.01 MB)
Supplementary_Material_4_mapDamage.zip (244.50 MB)

Abstract

Archaeological dental calculus can provide detailed insights into the ancient human oral microbiome. We offer a multi-period, multi-site, ancient shotgun metagenomic dataset consisting of 174 samples obtained primarily from archaeological dental calculus derived from various skeletal collections in the United Kingdom. This article describes all the materials used, including the skeletons’ historical period and burial location, biological sex, and age determination, data accessibility, and additional details associated with environmental and laboratory controls. In addition, this article describes the laboratory and bioinformatic methods associated with the dataset development and discusses the technical validity of the data following quality assessments, damage evaluations, and decontamination procedures. Our approach to collecting, making accessible, and evaluating bioarchaeological metadata in advance of metagenomic analysis aims to further enable the exploration of archaeological science topics such as diet, disease, and antimicrobial resistance (AMR).

https://doi.org/10.5061/dryad.jdfn2z3mk

Description of the data and file structure

Quality and authentication reports for metagenomic sequencing data from ancient dental calculus samples and their associated controls. The dataset comprises 174 samples (348 fastq files from paired-end sequencing): modern dental calculus (n=10), archaeological dental calculus (n=133), tooth samples (n=2), environmental controls (bone; n=7), and laboratory controls (extraction and library blanks; n=22).
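As a quick consistency check, the category counts above can be summed to confirm that they account for all 174 samples and 348 paired-end fastq files. The dictionary keys in this sketch are illustrative labels, not names used in the dataset.

```python
# Sample counts as stated in the description; keys are illustrative labels only.
counts = {
    "modern_calculus": 10,
    "archaeological_calculus": 133,
    "tooth": 2,
    "environmental_control_bone": 7,
    "laboratory_blank": 22,
}

total_samples = sum(counts.values())
fastq_files = total_samples * 2  # paired-end: one R1 and one R2 file per sample

print(total_samples, fastq_files)
```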

Files and variables

File: Supplementary_Material_3_Fastp_reports_raw.zip

Description: Results of Fastp analysis for metagenomic data obtained from ancient dental calculus and the associated bone and blank controls. The zip archive contains 174 .html files, one Fastp report per sample. Each file name begins with the sample name.

File: Supplementary_Material_1_FastQC_reports_raw.zip

Description: Results of FastQC analysis for raw metagenomic data obtained from ancient dental calculus and the associated bone and blank controls. The zip archive contains 384 .html files with the FastQC reports, with two files for each sample: one for the forward reads (R1) and one for the reverse reads (R2) of the paired-end sequencing data. Each file name begins with the sample name.

File: Supplementary_Material_2_FastQC_reports_trimmed.zip

Description: Results of FastQC analysis on quality-filtered metagenomic data (after Fastp processing) obtained from ancient dental calculus and the associated bone and blank controls. The zip archive contains 349 .html files with the FastQC reports for the filtered data, with two files for each sample: one for the forward reads (R1) and one for the reverse reads (R2) of the paired-end sequencing data. Each file name begins with the sample name.
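After downloading an archive, its contents can be tallied and compared against the file counts given in these descriptions. The following is a minimal sketch using only the Python standard library; the function name `count_by_extension` is hypothetical, and the internal folder layout of the archives is not assumed.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(zip_path):
    """Count files in a zip archive by extension, ignoring directory entries."""
    tally = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries, count files only
            suffix = PurePosixPath(info.filename).suffix.lower()
            tally[suffix or "(none)"] += 1
    return tally
```

For example, `count_by_extension("Supplementary_Material_3_Fastp_reports_raw.zip")[".html"]` would give the number of HTML reports in the Fastp archive, assuming the file has been downloaded to the working directory.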

File: Supplementary_Material_4_mapDamage.zip

Description: Results of mapDamage analysis on human DNA sequences derived from filtered metagenomic data obtained from ancient dental calculus and the associated bone and blank controls. The zip archive contains 106 folders, one per analyzed sample. Each folder name includes the sample name (e.g., results_C028_S1_Human_pe_rmdup.sorted), and each folder holds the output of the mapDamage software, including .txt, .csv, .log, and .pdf files.
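The sample label can be recovered from a mapDamage folder name by stripping the fixed prefix and suffix. This sketch assumes every folder follows the pattern of the single example given above (results_C028_S1_Human_pe_rmdup.sorted); the suffix constant and the function name are assumptions, and whether the sample name is "C028_S1" or only "C028" should be checked against the sample metadata.

```python
PREFIX = "results_"
SUFFIX = "_Human_pe_rmdup.sorted"  # assumed from the one example folder name


def sample_from_folder(folder_name):
    """Return the sample label from a mapDamage results folder name, or None."""
    name = folder_name.rstrip("/")
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        return None  # folder does not follow the assumed naming pattern
    return name[len(PREFIX):len(name) - len(SUFFIX)]


print(sample_from_folder("results_C028_S1_Human_pe_rmdup.sorted"))  # C028_S1
```

Returning `None` for non-matching names makes it easy to spot folders that deviate from the assumed pattern rather than silently mislabeling them.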

Code/software

All bioinformatic scripts for the pre-processing (quality control, decontamination) and post-processing (mapDamage) protocols are available on GitHub (https://github.com/DrATedder/dental_calculus_dataset/tree/main).

Access information

All raw sequencing data have been archived in the European Nucleotide Archive as fastq files under projects PRJEB1716, PRJEB12831, and PRJEB75938.
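One way to list the fastq files for these projects is the ENA Portal API file report endpoint. The sketch below only builds the request URLs and does not perform any download; the function name and the choice of fields are assumptions made for illustration.

```python
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"
PROJECTS = ["PRJEB1716", "PRJEB12831", "PRJEB75938"]  # accessions listed above


def filereport_url(accession,
                   fields=("run_accession", "sample_alias", "fastq_ftp", "fastq_md5")):
    """Build an ENA file report URL listing the read runs of one project."""
    query = {
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    }
    # Keep commas unescaped so the field list stays readable in the URL.
    return f"{ENA_FILEREPORT}?{urlencode(query, safe=',')}"


for project in PROJECTS:
    print(filereport_url(project))
```

Each URL should return a tab-separated table with one row per sequencing run, which can then be used to fetch the fastq files and verify their checksums.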

This dataset includes FastQC reports for raw data (Supplementary Material 1), FastQC reports for trimmed data (Supplementary Material 2), Fastp reports on the impact of data quality filtering (Supplementary Material 3), and mapDamage plots of the human DNA sequences recovered from the datasets (Supplementary Material 4).

Detailed methods may be found in the associated publication. The dataset was collected from archaeological dental calculus and associated bone samples. DNA was extracted within dedicated biomolecular clean labs. After being crushed to a powder, samples were pre-digested for 5 minutes with 1 mL of 0.5M EDTA to remove possible surface contamination. This pre-digestion supernatant was removed, a further 1.1 mL of 0.5M EDTA was added, and the samples were rotated at room temperature for seven days to demineralize them fully. For the majority of samples, DNA was extracted from the dental calculus and bone samples using a protocol based on Dabney et al. (2013). All DNA extracts were quantified on a Qubit® 2.0 Fluorometer using a High-Sensitivity DNA Assay. For each DNA extract, double-stranded whole genome shotgun Illumina libraries were prepared using a protocol based on Meyer and Kircher (2010). The dental calculus libraries were pooled in equimolar concentration and subjected to paired-end sequencing either on multiple HiSeq2500 lanes at the Wellcome Trust Sanger Institute (WTSI) or on a NextSeq platform (PE 150 + 150 bp) at the Integrated Microbiome Resource (IMR) at Dalhousie.

FastQC v0.11.9 was used to assess the quality of the raw sequencing data. FastQC is a quality control tool for raw sequencing data that provides a modular collection of analyses used to gain insight into any flaws in the data before performing further analysis. The preprocessing program Fastp v0.23.2 was then used with default parameters. Fastp is a tool used to filter and trim poor-quality reads, remove adapters, correct mismatched bases in overlapping read pairs, and produce overall quality reports. Its reports include both pre- and post-filtering statistics, allowing a direct comparison of the filtering impact. Centrifuge v1.0.3 was used, with default parameters, to assign taxonomic labels by mapping sequences against the human genome, prokaryotic genomes, and viral genomes, including 106 SARS-CoV-2 complete genomes. Human reads from the Centrifuge outputs were retained, and seqtk 'subseq' was used to extract them into fastq files, which were then mapped to the human genome (hg38) (NCBI 2013) using BWA mem v0.7.17. SAMtools v1.12 (view, rmdup, flagstat, sort, and index) was then used to format the alignments and sort them into BAM files, which were run through mapDamage2 v2.2.2 with default parameters.
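The step of retaining human reads from the Centrifuge output can be sketched as follows. This is not the authors' script (those are in the GitHub repository); it is a minimal illustration that assumes the standard tab-separated Centrifuge classification output with `readID` and `taxID` columns, and that human reads are those assigned to NCBI taxonomy ID 9606. The function names and file names are hypothetical.

```python
import csv

HUMAN_TAXID = "9606"  # NCBI taxonomy ID for Homo sapiens (assumed selection rule)


def human_read_ids(classification_path):
    """Collect unique read IDs assigned to the human taxID, in file order."""
    ids = {}
    with open(classification_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row["taxID"] == HUMAN_TAXID:
                ids[row["readID"]] = None  # dict keeps order and drops duplicates
    return list(ids)


def write_name_list(read_ids, out_path):
    """Write one read ID per line, the list format expected by seqtk subseq."""
    with open(out_path, "w") as handle:
        handle.writelines(f"{read_id}\n" for read_id in read_ids)
```

The resulting list could then be passed to seqtk, for example `seqtk subseq reads.fastq human_ids.txt > human.fastq`, where the file names are placeholders. Reads assigned by Centrifuge at ranks other than the human species node would not be captured by this simple rule.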