ASV output data for all in the details: A first assessment for the viability of metabarcoding in diet composition analysis of African wild dogs (Lycaon pictus)
Data files
Version of Apr 09, 2025 (2 files, 43.10 KB total)
- ASV_output_data.xlsx (41.81 KB)
- README.md (1.29 KB)
Abstract
DNA metabarcoding is an emerging technique in diet composition studies and stands to fill key knowledge gaps left by traditional diet analysis methods. For endangered species such as the African wild dog, filling these gaps presents an opportunity for improved management practices and vulnerability assessments. An estimated ~600 African wild dogs remain in South Africa. These dogs are generally understood to prey upon impala Aepyceros melampus and other medium-sized ungulates. Here, we present the first assessment of DNA metabarcoding as a method for diet composition analysis of this highly social carnivore. DNA extracted from faecal samples collected across seven landscape types in the Kruger National Park (KNP) was used to determine the presence of seven unique prey taxa, including novel prey species such as the Cape hare Lepus capensis. Impala was identified as a prey item in all landscape types, which agrees with the diet preference predicted by stable isotope analysis (SIA) of the same samples and with the existing understanding of wild dog diet. With the recommended improvements, DNA metabarcoding shows promise for identifying novel prey species and validating previous records of this endangered canid's diet.
This data file contains the output ASV sequences and their taxonomic matches produced with the DADA2 pipeline in R. These are the results of DNA extracted from 11 wild dog faecal samples collected across landscape types in the Kruger National Park. One sheet contains all sequences produced by the pipeline; another contains the sequences remaining after filtering the dataset for percent identity (pid) matches greater than 95% and removing contaminant DNA. The landscape type of each sample is also given.
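The filtering that separates the full sheet from the filtered sheet can be sketched as below. This is a minimal illustration, not the authors' R code: the `AsvHit` record, the example rows, and the contaminant set are all hypothetical, since the dataset does not say which taxa were treated as contaminants. The sketch treats 95% as the cutoff and drops anything below it, along with NA (no species) matches.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AsvHit:
    asv_id: str               # hypothetical ASV label, e.g. "ASV_1"
    species: Optional[str]    # None plays the role of an "NA" (no species match)
    pid: float                # percent identity of the best reference match

# Hypothetical contaminant set: the dataset does not list which taxa were removed.
CONTAMINANTS = {"Homo sapiens"}

def filter_hits(hits: List[AsvHit], min_pid: float = 95.0,
                contaminants=CONTAMINANTS) -> List[AsvHit]:
    """Keep hits that have a species match, reach min_pid, and are not contaminants."""
    return [
        h for h in hits
        if h.species is not None
        and h.pid >= min_pid
        and h.species not in contaminants
    ]

# Made-up rows for illustration only; they are not values from ASV_output_data.xlsx.
example_hits = [
    AsvHit("ASV_1", "Aepyceros melampus", 99.1),
    AsvHit("ASV_2", "Lepus capensis", 97.4),
    AsvHit("ASV_3", "Homo sapiens", 100.0),       # dropped as a contaminant
    AsvHit("ASV_4", None, 88.0),                  # dropped: NA match
    AsvHit("ASV_5", "Aepyceros melampus", 91.5),  # dropped: below 95%
]
kept = filter_hits(example_hits)
print([h.asv_id for h in kept])  # ['ASV_1', 'ASV_2']
```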
Description of the data and file structure
The data are provided in Excel format, with the percent composition of each identified species given for each sample, together with information on the quality of each taxonomic match. NA indicates no species match. Grouping samples by landscape type allows these values to be used to estimate each species' contribution to diet in each landscape type. Version 2 was uploaded because the previous version had been generated from an earlier set of results that omitted the Swainson's francolin (Pternistis swainsonii) identification.
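One way to turn per-sample read counts into landscape-level diet estimates is sketched below. The sample IDs, landscape labels, and read counts are hypothetical placeholders, and averaging per-sample percentages (so each sample carries equal weight) is an assumed choice rather than the authors' stated method.

```python
from collections import defaultdict
from typing import Dict

def percent_composition(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert one sample's per-species read counts into percentages summing to 100."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {sp: 100.0 * n / total for sp, n in read_counts.items()}

def landscape_means(sample_comps, sample_landscape):
    """Average per-sample percent compositions within each landscape type.

    Every sample gets equal weight, and a species missing from a sample
    counts as 0% for that sample."""
    grouped = defaultdict(list)
    for sample, comp in sample_comps.items():
        grouped[sample_landscape[sample]].append(comp)
    means = {}
    for landscape, comps in grouped.items():
        species = set().union(*comps)
        means[landscape] = {
            sp: sum(c.get(sp, 0.0) for c in comps) / len(comps) for sp in species
        }
    return means

# Hypothetical sample IDs, landscape labels and read counts, for illustration only.
reads = {
    "S1": {"Aepyceros melampus": 75, "Lepus capensis": 25},
    "S2": {"Aepyceros melampus": 100},
    "S3": {"Aepyceros melampus": 40, "Pternistis swainsonii": 60},
}
landscape_of = {"S1": "landscape_A", "S2": "landscape_A", "S3": "landscape_B"}
comps = {s: percent_composition(r) for s, r in reads.items()}
means = landscape_means(comps, landscape_of)
print(means["landscape_A"])
```

Pooling all reads within a landscape before computing percentages is the main alternative. That approach lets samples with more reads dominate the estimate, which may not be desirable when sequencing depth differs between samples.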
Sharing/Access information
No other sources to share: these data were not derived from any other source.
Sample collection
Fourteen faecal samples were collected across seven landscape types of the Kruger National Park and initially stored at -20°C at the SANParks Veterinary Wildlife Services Biobank. They were released to Crossey and colleagues at the time of their study in 2021 after testing negative for tuberculosis. Samples were lyophilised from frozen, then ground and sieved through a 20 µm metal mesh strainer under sterile laboratory conditions to remove undigested material from the faecal matter.
DNA extraction and sequencing
DNA extractions were conducted by adding 40 mg of faecal sample to 140 µl of water in a 2 ml Eppendorf® (Merck) tube. Fourteen samples were extracted using the DNeasy Blood & Tissue Extraction Kit (QIAGEN) with modifications: after proteinase K (Inqaba Biotech) was added and the samples were incubated for three hours at 56°C, a one-minute vortex step was followed by a one-minute centrifugation step at 14,000 rpm. The supernatant was then processed according to the standard kit protocol. The remaining solid faecal material was stored and used for re-extraction as required. Nine of the 14 samples were re-extracted with the same protocol, using repeat additions of reagents and an additional three-hour incubation step.
Universal vertebrate primers (Anatech) designed by Wang et al. (Table 3) to amplify the 12S and 16S regions of the mitochondrial genome were used. These primers were designed to amplify a wider set of vertebrate species than previously common primers. They are degenerate, meaning that some positions in the sequence allow several possible bases, labelled according to the standard IUPAC nucleotide code. For the polymerase chain reaction (PCR; Saiki et al. 1988), the two forward primers for the 12S gene fragment (VertU V12S-U F1 and VertU V12S-U F2) were added in equal volumes.
Each amplification reaction had a total volume of 25 µl, containing 3.5 mM MgCl2, 1× reaction buffer, 0.25 mM of each of the four deoxyribonucleotide triphosphates (dNTPs; Inqaba Biotech), 0.16 µM each of the 12S or 16S forward and reverse primers, 0.75 U of SuperTherm® Taq DNA Polymerase (ThermoFisher Scientific), 8 U of BSA (bovine serum albumin, Inqaba Biotech) and 4 µl of DNA. PCR thermocycling was conducted in a 2720 Thermal Cycler (Applied Biosystems).
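The reaction recipe above fixes final concentrations but not stock concentrations. The sketch below works backwards from those final concentrations to per-reaction pipetting volumes using C1·V1 = C2·V2. Every stock value (25 mM MgCl2, 10× buffer, 10 mM dNTPs, 10 µM primers, 5 U/µl Taq, 20 U/µl BSA) and the 10% master-mix overage are hypothetical assumptions, as is treating the premixed F1/F2 forward primers as a single forward stock.

```python
REACTION_UL = 25.0

# (component, final concentration, stock concentration); stock values are hypothetical.
COMPONENTS = [
    ("MgCl2 (mM)", 3.5, 25.0),
    ("reaction buffer (x)", 1.0, 10.0),
    ("dNTP mix, each (mM)", 0.25, 10.0),
    ("forward primer (uM)", 0.16, 10.0),  # assumes F1/F2 premixed as one stock
    ("reverse primer (uM)", 0.16, 10.0),
]
# (component, units per reaction, stock activity in U/ul); activities are hypothetical.
UNIT_COMPONENTS = [
    ("Taq polymerase (U)", 0.75, 5.0),
    ("BSA (U)", 8.0, 20.0),
]
DNA_UL = 4.0

def per_reaction_volumes():
    """Volume (ul) of each stock per 25 ul reaction, with water making up the rest."""
    vols = {}
    for name, final, stock in COMPONENTS:
        vols[name] = final * REACTION_UL / stock  # C1*V1 = C2*V2 solved for V1
    for name, units, activity in UNIT_COMPONENTS:
        vols[name] = units / activity
    vols["template DNA"] = DNA_UL
    vols["water"] = REACTION_UL - sum(vols.values())
    return vols

def master_mix(n_reactions, overage=0.1):
    """Scale everything except template DNA for n reactions plus a pipetting overage."""
    scale = n_reactions * (1 + overage)
    return {k: v * scale for k, v in per_reaction_volumes().items() if k != "template DNA"}

volumes = per_reaction_volumes()
for name, ul in volumes.items():
    print(f"{name:22s} {ul:6.3f} ul")
```

Under these assumed stocks, water makes up 13.025 µl of each reaction. Template DNA is left out of the master mix because it is added separately to each tube.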
The PCR program began with an initial denaturation step of 2 min at 94°C, which separated the strands of the template DNA. This was followed by 35 cycles of denaturation (30 s at 94°C), annealing (20 s at 47°C), during which the primers bind to the template strands, and elongation (20 s at 72°C), during which the primers are extended to synthesise new strands. A final extended elongation of 5 min at 72°C ensured complete synthesis of all fragments.
PCR amplification success was assessed by electrophoresis of 4 µl of PCR product on 1.5% agarose (Separations) gels. Negative controls, with distilled water in place of DNA, showed no contamination. DNA extracted from the blood sample of a female lion (Panthera leo) was used as the positive control to confirm that the PCR reagents were functional. Between three and five replicate PCR reactions were performed per sample, depending on the amplification success of each reaction. These replicates were later combined to increase the concentration of amplified product to a level acceptable for downstream applications.
After amplification, the PCR products were purified with 1.8× sample volume of Agencourt™ AMPure™ XP Reagent (Beckman Coulter), following the protocol in the Ion Xpress™ Plus gDNA Fragment Library Preparation user guide (ThermoFisher Scientific). The DNA concentration of each sample was then measured by Qubit fluorometric quantification following the Qubit™ dsDNA HS Assay Kit user guide (ThermoFisher Scientific), using 3 µl of DNA per reading. Replicates were combined so that each sample had a concentration of at least 100 ng/µl of DNA.
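The thermocycling program and the bead ratio described above reduce to simple arithmetic, which is worked through below. The hold times are taken directly from the protocol. The sketch ignores ramp time between temperatures, so a real run takes longer. The 21 µl product volume passed to the bead helper is a hypothetical example (a 25 µl reaction minus the 4 µl run on the gel).

```python
# (step, temperature in degrees C, hold time in seconds) from the program described above.
INITIAL = ("initial denaturation", 94, 120)
CYCLE = [("denaturation", 94, 30), ("annealing", 47, 20), ("elongation", 72, 20)]
N_CYCLES = 35
FINAL = ("final elongation", 72, 300)

def hold_time_seconds() -> int:
    """Total time at hold temperatures; ramping between temperatures is ignored."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE)
    return INITIAL[2] + N_CYCLES * per_cycle + FINAL[2]

def ampure_bead_volume(sample_ul: float, ratio: float = 1.8) -> float:
    """Bead reagent volume for a ratio expressed as a multiple of the sample volume."""
    return ratio * sample_ul

total = hold_time_seconds()
print(f"{total} s = {total / 60:.1f} min of holds")  # prints: 2870 s = 47.8 min of holds
# Hypothetical 21 ul of product (25 ul reaction minus the 4 ul run on the gel).
print(f"{ampure_bead_volume(21.0):.1f} ul of beads")
```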
The eleven samples were not pooled into a single library but were kept separate. Each sample consisted of its purified 12S and 16S products combined in equal concentrations. Library construction and sequencing were conducted by the Central Analytical Facilities, Stellenbosch University, on the Ion Torrent next-generation sequencing platform.
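The sketch below shows one way to combine the 12S and 16S products so that each contributes the same amount of DNA. It reads "combined in equal concentrations" as equal DNA mass from each product, which is an interpretation. The Qubit readings and the 600 ng target are hypothetical numbers chosen for illustration.

```python
from typing import Tuple

def equal_mass_volumes(conc_12s: float, conc_16s: float, mass_ng: float) -> Tuple[float, float]:
    """Volumes (ul) of each purified product that each contribute mass_ng of DNA."""
    if conc_12s <= 0 or conc_16s <= 0:
        raise ValueError("concentrations must be positive")
    return mass_ng / conc_12s, mass_ng / conc_16s

def combined_concentration(conc_12s: float, conc_16s: float, mass_ng: float) -> float:
    """Concentration (ng/ul) of the mixture built from equal_mass_volumes."""
    v12, v16 = equal_mass_volumes(conc_12s, conc_16s, mass_ng)
    return 2 * mass_ng / (v12 + v16)

# Hypothetical Qubit readings (ng/ul) and a hypothetical 600 ng of each product.
v12, v16 = equal_mass_volumes(120.0, 200.0, 600.0)
print(v12, v16, combined_concentration(120.0, 200.0, 600.0))  # 5.0 3.0 150.0
```

The combined concentration is the harmonic mean of the two input concentrations. As a result, it is always pulled towards the weaker product.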
Metabarcoding bioinformatics
The raw sequenced reads were first pre-processed to filter and correct errors introduced during amplification and sequencing. Pre-processing was conducted on Ubuntu using bbduk.sh, a shell script from the BBTools package. Reads were trimmed to remove low-quality positions based on their Phred scores, primer sequences were removed, and reads shorter than 80 bp were discarded. Processing and post-processing were then conducted in R using the open-source package DADA2, and sequences were further filtered to a minimum length of 150 bp.
During processing, sequences were resolved into amplicon sequence variants (ASVs), which can differ from one another by as little as a single nucleotide. DADA2 trained a parametric error model on each sequencing run to identify and correct sequencing errors, minimising false positives, and collapsed the corrected sequences into ASVs.
Following ASV inference, taxonomic identification was conducted using a similarity-based method that aligns query sequences against a chosen reference database. A custom reference database was built comprising 47 likely prey species from the Kruger National Park, using sequences from annotated whole mitochondrial genomes on GenBank (Appendix, Table A1). Species were selected based on existing knowledge of the African wild dog's diet, prey species known to inhabit the Kruger National Park, and preliminary BLAST searches on GenBank. Taxonomic assignment used the R package taxonomizr, which maps accession numbers to NCBI taxonomic IDs, and taxonomic IDs to taxonomy, using NCBI taxonomy files and accession dumps. The R package phyloseq was used for further analyses, to remove overrepresented ASVs and ASVs with fewer than 10 reads, and to construct a taxonomy table. Identifications with a percent identity below 95% were excluded from further analyses.
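The read-level filters described above can be illustrated with a simplified sketch: Phred decoding, 3' quality trimming, the 80 bp and 150 bp length cutoffs, and the 10-read ASV threshold. This is not how bbduk.sh, DADA2 or phyloseq implement these steps. The Phred+33 encoding is assumed, the Q20 trimming threshold is a hypothetical value because the source does not state one, and the reads are synthetic.

```python
from typing import Dict, List, Tuple

def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]

def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases scoring below min_q (simplified; not bbduk's exact algorithm)."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

def length_filter(reads: List[Tuple[str, str]], min_len: int) -> List[Tuple[str, str]]:
    """Keep reads that are at least min_len bases long."""
    return [(s, q) for s, q in reads if len(s) >= min_len]

def drop_rare_asvs(asv_reads: Dict[str, int], min_reads: int = 10) -> Dict[str, int]:
    """Keep ASVs with at least min_reads reads in total."""
    return {asv: n for asv, n in asv_reads.items() if n >= min_reads}

# Synthetic reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
raw = [
    ("A" * 200, "I" * 200),
    ("C" * 120, "I" * 100 + "#" * 20),
    ("G" * 60, "I" * 60),
]
trimmed = [trim_3prime(s, q) for s, q in raw]
stage1 = length_filter(trimmed, 80)    # pre-processing minimum of 80 bp
stage2 = length_filter(stage1, 150)    # R-stage minimum of 150 bp
print([len(s) for s, _ in stage1], [len(s) for s, _ in stage2])  # [200, 100] [200]
```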