Data from: Impacts of proactive health management on cattle and horse diets and dung biodiversity in Danish rewilding areas
Data files
May 17, 2025 version files 291.64 GB
- README.md (8.93 KB)
- RWDK_COI_classified_corrected.txt (16.10 MB)
- RWDK_COI_classified.txt (14.57 MB)
- RWDK_COI_DADA2_nochim.otus (216.61 MB)
- RWDK_COI_DADA2_nochim.table (3.78 GB)
- RWDK_COI_example_batchfileDADA2.list (64 B)
- RWDK_COI_motus_RWEU_campaign.csv (29.29 KB)
- RWDK_COI_raw_data.tar.gz (140.45 GB)
- RWDK_COI_read_table_RWEU_campaign.csv (11.46 KB)
- RWDK_COI_samples_RWEU_campaign.csv (2.82 KB)
- RWDK_ITS_classified_corrected.txt (106.48 MB)
- RWDK_ITS_classified.txt (103.72 MB)
- RWDK_ITS_DADA2_nochim.otus (40.89 MB)
- RWDK_ITS_DADA2_nochim.table (322.99 MB)
- RWDK_ITS_example_batchfileDADA2.list (64 B)
- RWDK_ITS_motus_RWEU_campaign.csv (18.03 KB)
- RWDK_ITS_raw_data.tar.gz (146.58 GB)
- RWDK_ITS_read_table_RWEU_campaign.csv (7.34 KB)
- RWDK_ITS_samples_RWEU_campaign.csv (3.79 KB)
- tagfiles.tar.gz (9.84 KB)
Abstract
Reintroducing megafauna to reinstate missing top-down trophic interactions (trophic rewilding) is increasingly being applied as a tool to promote self-regulating, biodiverse ecosystems. Even though the theoretical background is clear, and megafauna effects are documented from prehistoric ecosystems, the effects of reintroduced herbivores in contemporary ecosystems remain understudied. This includes how reintroduced megafauna interact with each other and the ecosystem, but also how current management practices affect the processes they provide. In this study, we investigated the effects of proactive health management, i.e., winter feeding and anti-parasitic treatments, on the ecosystem by examining diets of large herbivores and dung-associated invertebrate communities. We used environmental DNA metabarcoding to yield community compositions of plants and invertebrates in dung from cattle and horses from five comparable nature sites in Denmark, which differed in management, and site/population-specific properties such as availability of woody plant species, herbivore densities, and provision of winter feeding and anti-parasitic treatments. We found different diet compositions between cattle and horses, highlighting their functional differences. For example, horse samples had higher relative read abundances of graminoid and tree DNA. Supplementary feeding affected diets, by decreasing consumption of graminoids and tree species relative to forbs and legumes, probably originating from fodder, and intense feeding seemed to almost eliminate consumption of local vegetation. However, more studies are needed to generalize these findings. Several invertebrate families were associated with either cattle or horse dung, suggesting complementary effects on dung-associated invertebrate biodiversity by these large grazers. 
The taxa that responded negatively to anti-parasitic treatments were mainly parasitic nematodes (e.g., the families Ancylostomatidae, Cooperidae, and Strongylidae), suggesting that the applied treatments work as intended, but these results should be interpreted with caution due to methodological limitations.
Synthesis and application. Our findings demonstrate functional differences between cattle and horses, suggesting complementary effects on vegetation development and, consequently, biodiversity. Our results also indicate that this functionality is impacted by proactive health management actions. We suggest that potential effects on herbivory and biodiversity are carefully considered before supplementary feeding or anti-parasitic treatments are provided in year-round grazing systems, and that such interventions are avoided if possible.
Content of this data repository:
- A compressed archive (.tar.gz) is uploaded for each dataset (ITS & COI). Each archive includes a directory per library, containing two raw sequencing output files (ending in .fq.gz).
- A compressed archive (tagfiles.tar.gz) including all the files for demultiplexing (one per library, ending in tags.txt).
- An example file for each dataset to use in MetaBarFlow (*example_batchfileDADA2.list).
- The following output files from MetaBarFlow for each dataset (file names start with RWDK_COI_ and RWDK_ITS_, respectively):
  - *classified.txt (the original taxonomic classification file of ASVs from MetaBarFlow)
  - *classified_corrected.txt (the manually corrected taxonomy file)
  - *DADA2_nochim.otus (list of all ASVs defined by MetaBarFlow)
  - *DADA2_nochim.table (ASV/sample read count matrix)
- Files containing data from samples added from another sampling campaign, which covered two of the sites and was carried out in August 2022. There are three files for each dataset: one with the samples (*samples_RWEU_campaign.csv), one with the MOTUs (*motus_RWEU_campaign.csv), and one with the read count table (*read_table_RWEU_campaign.csv). These can be added to the remaining data following the scripts at https://github.com/emilthomassen/RWDK_public.
All files begin with the prefix “RWDK” which is a project ID, followed by either “ITS” or “COI” corresponding to the two different datasets. The COI dataset consists of amplicon reads generated by the BF1/BR1 COI primers (Elbrecht & Leese, 2017, https://doi.org/10.3389/fenvs.2017.00011), and the ITS dataset consists of amplicon reads generated by the ITS2-S2F/ITS4 primers (Fahner et al., 2016, https://doi.org/10.1371/journal.pone.0157505).
Bioinformatic pipeline:
Start by uncompressing the .tar.gz archives containing the tag files and the raw sequencing data for the ITS and COI datasets:
tar -xzf tagfiles.tar.gz
tar -xzf RWDK_COI_raw_data.tar.gz
tar -xzf RWDK_ITS_raw_data.tar.gz
Setting up the directory structure on an HPC cluster
mkdir root_dir
cd root_dir
mkdir -p COI/backup/data/raw_data
mkdir -p COI/tmp
mkdir -p COI/results
mkdir -p ITS/backup/data/raw_data
mkdir -p ITS/tmp
mkdir -p ITS/results
Place all downloaded files in the root directory (root_dir)
Set up directories for each library
cd COI/backup/data/raw_data
mkdir L11 L12 L13 L14 L21 L22 L23 L24 L31 L32 L33 L34 L41 L42 L43 L44 L51 L52 L53 L54 L61 L62 L63 L64 L71 L72 L73 L74
cd ../../../../ITS/backup/data/raw_data
mkdir L081 L082 L083 L084 L091 L092A L092B L093 L094 L101 L102 L103 L104 L111 L112 L113 L114 L121 L122 L123 L124 L131 L132 L133 L134 L141 L142 L143 L144
cd ../../../..
Putting the files in the right places
mv RWDK_1_1* COI/backup/data/raw_data/L11/.
mv RWDK_1_2* COI/backup/data/raw_data/L12/.
mv RWDK_1_3* COI/backup/data/raw_data/L13/.
mv RWDK_1_4* COI/backup/data/raw_data/L14/.
mv RWDK_2_1* COI/backup/data/raw_data/L21/.
mv RWDK_2_2* COI/backup/data/raw_data/L22/.
mv RWDK_2_3* COI/backup/data/raw_data/L23/.
mv RWDK_2_4* COI/backup/data/raw_data/L24/.
mv RWDK_3_1* COI/backup/data/raw_data/L31/.
mv RWDK_3_2* COI/backup/data/raw_data/L32/.
mv RWDK_3_3* COI/backup/data/raw_data/L33/.
mv RWDK_3_4* COI/backup/data/raw_data/L34/.
mv RWDK_4_1* COI/backup/data/raw_data/L41/.
mv RWDK_4_2* COI/backup/data/raw_data/L42/.
mv RWDK_4_3* COI/backup/data/raw_data/L43/.
mv RWDK_4_4* COI/backup/data/raw_data/L44/.
mv RWDK_5_1* COI/backup/data/raw_data/L51/.
mv RWDK_5_2* COI/backup/data/raw_data/L52/.
mv RWDK_5_3* COI/backup/data/raw_data/L53/.
mv RWDK_5_4* COI/backup/data/raw_data/L54/.
mv RWDK_6_1* COI/backup/data/raw_data/L61/.
mv RWDK_6_2* COI/backup/data/raw_data/L62/.
mv RWDK_6_3* COI/backup/data/raw_data/L63/.
mv RWDK_6_4* COI/backup/data/raw_data/L64/.
mv RWDK_7_1* COI/backup/data/raw_data/L71/.
mv RWDK_7_2* COI/backup/data/raw_data/L72/.
mv RWDK_7_3* COI/backup/data/raw_data/L73/.
mv RWDK_7_4* COI/backup/data/raw_data/L74/.
mv RWDK_8_1* ITS/backup/data/raw_data/L081/.
mv RWDK_8_2* ITS/backup/data/raw_data/L082/.
mv RWDK_8_3* ITS/backup/data/raw_data/L083/.
mv RWDK_8_4* ITS/backup/data/raw_data/L084/.
mv RWDK_9_1* ITS/backup/data/raw_data/L091/.
mv RWDK_9_2* ITS/backup/data/raw_data/L092A/.
mv RWDK_9_3* ITS/backup/data/raw_data/L093/.
mv RWDK_9_4* ITS/backup/data/raw_data/L094/.
mv RWDK_10_1* ITS/backup/data/raw_data/L101/.
mv RWDK_10_2* ITS/backup/data/raw_data/L102/.
mv RWDK_10_3* ITS/backup/data/raw_data/L103/.
mv RWDK_10_4* ITS/backup/data/raw_data/L104/.
mv RWDK_11_1* ITS/backup/data/raw_data/L111/.
mv RWDK_11_2* ITS/backup/data/raw_data/L112/.
mv RWDK_11_3* ITS/backup/data/raw_data/L113/.
mv RWDK_11_4* ITS/backup/data/raw_data/L114/.
mv RWDK_12_1* ITS/backup/data/raw_data/L121/.
mv RWDK_12_2* ITS/backup/data/raw_data/L122/.
mv RWDK_12_3* ITS/backup/data/raw_data/L123/.
mv RWDK_12_4* ITS/backup/data/raw_data/L124/.
mv RWDK_13_1* ITS/backup/data/raw_data/L131/.
mv RWDK_13_2* ITS/backup/data/raw_data/L132/.
mv RWDK_13_3* ITS/backup/data/raw_data/L133/.
mv RWDK_13_4* ITS/backup/data/raw_data/L134/.
mv RWDK_14_1* ITS/backup/data/raw_data/L141/.
mv RWDK_14_2* ITS/backup/data/raw_data/L142/.
mv RWDK_14_3* ITS/backup/data/raw_data/L143/.
mv RWDK_14_4* ITS/backup/data/raw_data/L144/.
Move data for the library where additional sequencing was performed (L092) to a separate directory:
cd ITS/backup/data/raw_data/L092A
mv HFVJCDRX3 ../L092B/.
cp RWDK_9_2_tags.txt ../L092B/.
cd ../../../../..
Unzip the “fq.gz” files within each library (example run):
cd COI/backup/data/raw_data/L11/
gunzip *.gz
cd ../../../../..
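Repeating this for every library is tedious; the loop below (a sketch, assuming the directory layout created above) decompresses all libraries in one pass. Files nested in run-ID subdirectories (as in L092A/L092B) may need an extra directory level:

```shell
# unzip_libraries: decompress all .fq.gz files in every library directory.
unzip_libraries() {
  local dir f
  for dir in COI/backup/data/raw_data/L*/ ITS/backup/data/raw_data/L*/; do
    for f in "$dir"*.fq.gz; do
      [ -e "$f" ] && gunzip "$f"   # skip directories with nothing to unzip
    done
  done
}
unzip_libraries   # run from root_dir
```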
- Create a file named “batchfileDADA2.list” in each sequencing directory.
- Replace “File1.fq” & “File2.fq” in “RWDK_COI_example_batchfileDADA2.list” and “RWDK_ITS_example_batchfileDADA2.list” with the names of the raw data files (forward and reverse) for each library. Example:
RWDK_1_1_FKDN230260638-1A_HW7HWDSX5_L1_1.fq RWDK_1_1_FKDN230260638-1A_HW7HWDSX5_L1_2.fq ACWGGWTGRACWGTNTAYCC ARYATDGTRATDGCHCCDGC 100
- Make sure the file uses UNIX line breaks and that the line, as exemplified, is followed by an empty line.
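Windows-style line endings are a common cause of failures at this step. A small helper (a sketch using standard tools; the function name is ours, not part of MetaBarFlow) converts a batchfile to UNIX line breaks and appends the required trailing empty line if it is missing:

```shell
# fix_batchfile: convert CRLF to LF and make sure the file ends with an
# empty line, as required for batchfileDADA2.list.
fix_batchfile() {
  local f="$1"
  tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"   # CRLF -> LF
  # tail -c 2 is empty (after command substitution strips trailing
  # newlines) only when the file already ends with a blank line.
  [ -z "$(tail -c 2 "$f")" ] || printf '\n' >> "$f"
}
```

Usage, once the batchfile is in place: `fix_batchfile COI/backup/data/raw_data/L11/batchfileDADA2.list`.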
Rename all batchfiles and tag files in each library (example run):
mv COI/backup/data/raw_data/L11/RWDK_1_1_tags.txt COI/backup/data/raw_data/L11/tags.txt
mv COI/backup/data/raw_data/L11/RWDK_COI_example_batchfileDADA2.list COI/backup/data/raw_data/L11/batchfileDADA2.list
- Continue for all COI libraries (L12, L13, etc.) by changing the paths and file names in the lines above accordingly (remember the directories for additional sequencing).
- Then do the same for the ITS libraries (L082, L083, etc.).
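The per-library renaming can likewise be scripted. A sketch, assuming the example batchfile has already been copied into each library directory and edited, and that the tag files follow the RWDK_&lt;batch&gt;_&lt;replicate&gt;_tags.txt naming used above:

```shell
# rename_library_files: give the tag file and batchfile in every library
# directory the fixed names (tags.txt, batchfileDADA2.list) that
# MetaBarFlow expects.
rename_library_files() {
  local dir f
  for dir in COI/backup/data/raw_data/L*/ ITS/backup/data/raw_data/L*/; do
    for f in "$dir"RWDK_*_tags.txt; do
      [ -e "$f" ] && mv "$f" "${dir}tags.txt"
    done
    for f in "$dir"RWDK_*batchfileDADA2.list; do
      [ -e "$f" ] && mv "$f" "${dir}batchfileDADA2.list"
    done
  done
}
rename_library_files   # run from root_dir
```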
You should now have two directories, COI and ITS, in the root directory, each containing a directory called backup/data/raw_data with 28 subdirectories (29 in the ITS directory) corresponding to the sequencing libraries for each dataset. Each library directory should include two raw data files (forward/reverse reads), a file for demultiplexing (tags.txt), and a batchfile specifying the input files, the primer sequences, and the minimum read length to use.
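A quick way to verify this layout before starting the pipeline (a sketch; the helper name is ours, not part of MetaBarFlow):

```shell
# check_layout: report library directories that are missing the
# demultiplexing tag file or the DADA2 batchfile.
check_layout() {
  local dir f missing=0
  for dir in COI/backup/data/raw_data/L*/ ITS/backup/data/raw_data/L*/; do
    [ -d "$dir" ] || continue
    for f in tags.txt batchfileDADA2.list; do
      [ -f "$dir$f" ] || { echo "MISSING: $dir$f"; missing=1; }
    done
  done
  return "$missing"
}
check_layout   # run from root_dir; prints nothing when everything is in place
```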
Now the raw sequencing data, batchfiles, and tag files are in the right places for running the MetaBarFlow pipeline found at https://github.com/evaegelyng/MetaBarFlow.
The exact scripts used in the MetaBarFlow pipeline can be found here: https://doi.org/10.5281/zenodo.15296452
Files generated by MetaBarFlow
- The remaining files in the root_dir folder are the files generated by the MetaBarFlow pipeline. These can be imported into R, where any additional analyses can be done; there is one of each file for each dataset (COI/ITS).
*classified.txt - The list of taxonomic assignments of each of the ASVs found in the corresponding dataset (note that this file has not undergone manual taxonomic edits and has not been through subsequent filtering steps listed in the manuscript)
(see https://github.com/evaegelyng/MetaBarFlow for further description)
*classified_corrected.txt - The manually corrected version of the classified list
*DADA2_nochim.otus - The list of ASVs found across all samples, which survived DADA2 and chimera filtering
*DADA2_nochim.table - The overview of which ASVs were found in which samples
Also, there are three files for each dataset (COI/ITS) with the additional data collected in another sampling campaign: the read count table (*read_table_RWEU_campaign.csv), the samples table (*samples_RWEU_campaign.csv), and the MOTU table (*motus_RWEU_campaign.csv).
The dataset consists of DNA reads from high-throughput sequencing of 295 dung samples from cattle and horses collected at five sites in Denmark in 2022. At each of the five sites, 7 samples from cattle and 7 samples from horses (except the NM site, where no horses were present) were collected in February, March, April, June, and August. At each sampling event, a field blank was collected and sequenced alongside the other samples (see the associated manuscript for details). Twenty-two of these samples (10 from each of the ML and SL sites, and two field blanks collected in August) were sequenced alongside (and will also be used in) another project and are thus not part of this raw data; the filtered data from these samples are instead provided as separate files (see the description further below) and should be appended to the main dataset for the subsequent analysis.
DNA was extracted from the samples with the Fast DNA Stool Mini Kit from Qiagen, and amplified by PCR reactions with two primer sets (see PCR reagents and thermal settings, etc. in the associated manuscript). One with the BF-1/BR-1 primers (Elbrecht & Leese, 2017, https://doi.org/10.3389/fenvs.2017.00011), targeting a 217 bp fragment of COI optimized for invertebrates, and one with the ITS2-S2F/ITS4 primers (Fahner et al., 2016, https://doi.org/10.1371/journal.pone.0157505), targeting the nuclear ITS region, and optimized for plants.
During the laboratory pipeline, 32 extraction blanks (sample names including CNE) and four PCR blanks for each amplicon pool (sample names including NTC, 28 in total) were included and sequenced alongside the rest of the samples. Hence, in total, the uploaded raw sequencing data include DNA reads obtained from 373 samples, which were separated into 7 batches. Each batch was used as a template for both primer sets and run through 4 replicate PCR reactions: libraries L11-L74 for COI and L081-L144 for ITS. For one library (L092), additional sequencing was performed, and thus two separate sets of raw data files exist for this library. See README.md for a description of how to treat these in the bioinformatic pipeline.
The raw sequencing data were run through the MetaBarFlow pipeline (https://github.com/evaegelyng/MetaBarFlow) with parameters following Thomassen et al. (2024) (https://doi.org/10.1111/mec.16847); the exact scripts are located here: https://doi.org/10.5281/zenodo.15296452. The pipeline produces an ASV list (*DADA2_nochim.otus), a matrix with read counts of each ASV in each sample (*DADA2_nochim.table), and a list with the taxonomic assignment of all ASVs (*classified.txt) for each dataset. The taxonomic identification of DNA sequencing reads for the ITS dataset was made by blasting (blastn) against a locally downloaded copy of the complete NCBI GenBank nt database (https://www.ncbi.nlm.nih.gov/), and for the COI dataset, blasting was performed against a custom-built COI database containing all COI sequences from BOLD (www.boldsystems.org) and NCBI GenBank (https://www.ncbi.nlm.nih.gov/). See Klepke et al. (2022) (https://doi.org/10.1002/edn3.340) for a description of how the database was built, and the associated publication for details of the BLAST parameters.
The final taxonomic assignment (the score_ID column in "*classified.txt") was defined as the last common ancestor of all BLAST hits within the range of sequence similarity of the best match, including hits within a 2% margin of the best ID; a species-level ID was only assigned if the best match was >98% similar.
The list of taxonomic assignments was manually checked for errors resulting from spurious reference database sequences or similar issues, and when such errors were spotted, the taxonomic assignment of the given ASV was corrected manually. In addition, for COI, ASVs identified at levels higher than species were assigned to "putative species", i.e., units sharing the same set of possible IDs. For ITS, aggregations were made at the genus level. These final, manually edited IDs are found in the "final_ID" columns of the "*classified_corrected.txt" files.
See the manuscript and associated GitHub (https://github.com/emilthomassen/RWDK_public) for further details about subsequent analysis.