Data from: Dietary response of black-backed jackals (Lupulella mesomelas) to contrasted land use
Data files
Feb 26, 2026 version files 556.97 MB
- metadata_for_samara.csv (35.16 KB)
- README.md (5.70 KB)
- SAMARA_Raw_reads.7z (556.68 MB)
- Steps_to_final_game_farm_dataset.xlsx (80.85 KB)
- Steps_to_final_livestock_dataset.xlsx (70.17 KB)
- Steps_to_final_samara_Big5_dataset.xlsx (94.10 KB)
- Usearch.sh (1.42 KB)
Abstract
This dataset contains prey species detections derived from DNA metabarcoding of black-backed jackal (Lupulella mesomelas) scats collected across three land-use types in the Eastern Cape, South Africa. The data provide species-level dietary occurrence information and associated sample metadata for use in studies of trophic ecology, dietary niche breadth, and predator–prey interactions. Raw sequence data were processed through a standardized bioinformatics pipeline, and the resulting operational taxonomic unit (OTU) table was taxonomically assigned and curated to produce the final dataset. To account for variation in sequencing depth among samples, detections are provided as presence–absence records at the species level. These data were used to investigate patterns of resource use across landscapes and to compare dietary composition among sites and seasons, and they can support meta-analyses of carnivore diet based on metabarcoding approaches.
Dataset DOI: 10.5061/dryad.n02v6wx8g
Description of the data and file structure
Bioinformatics Pipeline Summary
- Quality Control
  - Initial read quality assessed using MultiQC v1.14.
  - Reads with an average Q-score < 30 were discarded.
- Trimming and Sorting
  - Paired-end reads were trimmed and sorted into sample-specific FASTQ files using Illumina's default pipeline.
  - Primer sequences were removed using Cutadapt v3.5.
  - Unassigned reads were discarded.
- Merging and Filtering
  - Forward and reverse reads were merged using USEARCH v11.0.667.
  - Merged reads were filtered to allow a maximum of one expected error per read.
- OTU Clustering and Taxonomic Assignment
  - OTUs were clustered using the UPARSE algorithm (USEARCH), with a 97 % similarity threshold.
  - Chimeric and singleton reads were removed.
  - OTUs were assigned taxonomy using BLAST against the NCBI GenBank database:
    - OTUs with ≥ 95 % identity were assigned to species level (if unique match) or lowest common taxonomic level (if multiple matches).
    - OTUs with < 95 % identity were classified as "undetermined" and excluded.
- Species-Level Aggregation and Taxonomic Corrections
  - Read counts were aggregated per species across samples.
  - For taxonomically ambiguous groups (e.g., Felidae, Bovidae, Rodentia), manual correction was done by comparing co-occurrence patterns and adjusting assignments where necessary.
- Filtering Criteria
  - OTUs identified as Homo sapiens or with read lengths < 70 bp ("None") were removed.
  - Samples with < 1,000 total reads after filtering were excluded.
  - OTUs classified as Canis spp. were attributed to black-backed jackal.
  - Samples where jackal reads made up < 1 % of total reads were excluded to avoid scat misidentification.
- Final Data Transformation
  - To account for variation in sequencing depth, data were converted to a presence-absence format.
  - A species was considered "present" if its read count was ≥ 10, based on background levels in negative controls.
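The "maximum of one expected error per read" criterion in the Merging and Filtering step can be made concrete with a short Python sketch. It is an illustration only: in the dataset the filtering was done by USEARCH, and the exact options are in Usearch.sh rather than reproduced here. The sketch assumes the standard Phred relationship, where a base with quality score Q has error probability 10^(-Q/10), and sums those probabilities over the read. The function names and example reads are hypothetical.

```python
def expected_errors(quality_scores):
    """Sum of per-base error probabilities for a list of Phred scores."""
    return sum(10 ** (-q / 10) for q in quality_scores)


def passes_max_ee(quality_scores, max_ee=1.0):
    """Keep a read only if its expected errors do not exceed max_ee."""
    return expected_errors(quality_scores) <= max_ee


# Hypothetical merged reads of 250 bases each.
high_quality = [40] * 250  # 250 * 0.0001 = 0.025 expected errors
low_quality = [20] * 250   # 250 * 0.01   = 2.5 expected errors

print(passes_max_ee(high_quality))  # True
print(passes_max_ee(low_quality))   # False
```

Summing probabilities penalises long reads with many mediocre bases, which a mean Q-score cut-off alone would not catch.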
Files and variables
File: metadata_for_samara.csv
Description:
Sample metadata used to describe each scat sample.
Variables:
- Sample name: Unique identifier for each scat sample
- Sample_title (placement number and corresponding file name): This is the sample number without the “SAM” prefix.
- Site: Land-use type where the scat was collected (Game farm, Livestock farm, or the Big 5 nature reserve Samara).
- Scat collection: Date on which the sample was collected (format: DD-MM-YYYY).
- DNA extraction date: Approximate date of DNA extraction (format: YYYY-MM-DD).
- Season: Seasonal category assigned to each sample.
- Sample description: Indicates whether the sample is a faecal sample or a DNA extraction control.
  Extraction controls contain no faecal material and consist only of extraction reagents. One control was included for every batch of 23 samples to monitor contamination and extraction success.
- Not applicable: refer to Sample description.
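The two date columns use different formats (DD-MM-YYYY for scat collection, YYYY-MM-DD for DNA extraction), so they need separate format strings when parsed. The Python sketch below shows one way to handle this with the standard library. The rows are invented for illustration, the sample names are hypothetical, and it is an assumption that a control row may carry a non-date entry such as "Not applicable", which is returned as `None` here.

```python
import csv
import io
from datetime import date, datetime


def parse_date(text, fmt):
    """Return a date, or None when the cell is not a date in this format."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


# Invented rows standing in for metadata_for_samara.csv.
example = io.StringIO(
    "Sample name,Scat collection,DNA extraction date\n"
    "SAM001,15-03-2021,2021-06-01\n"
    "CONTROL1,Not applicable,2021-06-01\n"
)

rows = []
for row in csv.DictReader(example):
    row["Scat collection"] = parse_date(row["Scat collection"], "%d-%m-%Y")
    row["DNA extraction date"] = parse_date(row["DNA extraction date"], "%Y-%m-%d")
    rows.append(row)

for row in rows:
    print(row["Sample name"], row["Scat collection"], row["DNA extraction date"])
```

To read the real file, replace the in-memory `example` with `open("metadata_for_samara.csv", newline="")` and check the column headers against the file.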
Files: Steps_to_final_livestock_dataset.xlsx, Steps_to_final_game_farm_dataset.xlsx, and Steps_to_final_samara_Big5_dataset.xlsx
Description:
These files contain the final processed prey detection datasets used for statistical analyses. Each file corresponds to one of the three land use sites (Livestock farm, Game farm, and Samara Big 5 reserve) and documents the sequential filtering steps applied to the raw read data to produce the final presence–absence matrix.
Each Excel file includes four worksheets representing the four processing steps described below. The final worksheet in each file contains the dataset used for downstream analyses.
Processing steps
Step 1 – Minimum read threshold and initial filtering
OTU sequences identified as human contamination, shorter than 70 bp, or labelled “None” were removed. Only samples with more than 1,000 total reads after these exclusions were retained. Reads assigned to Canis spp. were classified as black-backed jackal.
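Step 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the record layout (`taxon`, `length`, `counts`), the function name, and the example values are all hypothetical. The sample threshold follows the wording of this step ("more than 1,000 total reads"), so a sample with exactly 1,000 reads would be dropped.

```python
JACKAL = "Lupulella mesomelas"


def step1_filter(otus, min_length=70, min_sample_reads=1000):
    """Return (filtered OTU records, set of retained sample names)."""
    kept = []
    for otu in otus:
        # Drop human contamination, "None" labels, and short sequences.
        if otu["taxon"] in ("Homo sapiens", "None") or otu["length"] < min_length:
            continue
        # Any Canis assignment is treated as black-backed jackal.
        taxon = JACKAL if otu["taxon"].startswith("Canis") else otu["taxon"]
        kept.append({**otu, "taxon": taxon})

    # Total reads per sample after the exclusions above.
    totals = {}
    for otu in kept:
        for sample, reads in otu["counts"].items():
            totals[sample] = totals.get(sample, 0) + reads
    retained = {s for s, total in totals.items() if total > min_sample_reads}

    filtered = [
        {**otu, "counts": {s: n for s, n in otu["counts"].items() if s in retained}}
        for otu in kept
    ]
    return filtered, retained


# Invented OTU table with two samples.
otus = [
    {"taxon": "Canis lupus", "length": 100, "counts": {"S1": 600, "S2": 300}},
    {"taxon": "Homo sapiens", "length": 100, "counts": {"S1": 5000, "S2": 5000}},
    {"taxon": "Ovis aries", "length": 98, "counts": {"S1": 900, "S2": 200}},
    {"taxon": "None", "length": 50, "counts": {"S1": 40, "S2": 10}},
]

filtered, retained = step1_filter(otus)
print(sorted(retained))
print([(o["taxon"], o["counts"]) for o in filtered])
```

In this example S1 keeps 1,500 reads and is retained, while S2 keeps only 500 and is excluded.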
Step 2 – Jackal read threshold
To exclude misidentified scats, only samples in which jackal reads comprised ≥ 1% of the total reads were retained.
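The jackal read threshold amounts to a per-sample proportion check. The Python sketch below assumes read counts are held as a dictionary of taxon to reads for each sample. The function names, sample names, and counts are invented for illustration.

```python
JACKAL = "Lupulella mesomelas"


def jackal_fraction(counts):
    """Share of a sample's reads assigned to black-backed jackal."""
    total = sum(counts.values())
    return counts.get(JACKAL, 0) / total if total else 0.0


def keep_confirmed_scats(samples, min_fraction=0.01):
    """Retain samples in which jackal reads make up at least min_fraction."""
    return {
        name: counts
        for name, counts in samples.items()
        if jackal_fraction(counts) >= min_fraction
    }


# Invented read counts for three samples.
samples = {
    "S1": {JACKAL: 600, "Ovis aries": 900},   # 40 % jackal reads
    "S2": {JACKAL: 5, "Felis catus": 1995},   # 0.25 % jackal reads
    "S3": {JACKAL: 20, "Ovis aries": 1980},   # exactly 1 % jackal reads
}

confirmed = keep_confirmed_scats(samples)
print(sorted(confirmed))
```

S2 is dropped because the near absence of jackal DNA suggests the scat may have come from another species.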
Step 3 – Taxonomic cross-checking and manual curation
Closely related taxa that could not be reliably distinguished due to shared barcode polymorphisms were evaluated using correlation patterns among samples. Read counts were reassigned where necessary. This procedure was applied to selected members of groups such as Felidae, Bovidae, and Rodentia.
Step 4 – Conversion to presence–absence data
Species-level read counts were converted to binary detections to account for variation in sequencing depth. A species was recorded as present when ≥ 10 reads were detected in a sample. This threshold was based on observations from extraction and PCR blanks, which rarely exceeded 10 reads for any taxon other than Homo sapiens, Canis spp., or “None”.
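The conversion itself is a simple thresholding operation. The Python sketch below is an illustration with invented sample names and read counts, and the function name is hypothetical. It applies the ≥ 10 reads rule described in this step.

```python
def to_presence_absence(read_counts, min_reads=10):
    """Convert per-sample species read counts to 1/0 detections."""
    return {
        sample: {species: int(reads >= min_reads) for species, reads in counts.items()}
        for sample, counts in read_counts.items()
    }


# Invented species-level read counts.
read_counts = {
    "S1": {"Ovis aries": 900, "Lepus saxatilis": 9},
    "S3": {"Ovis aries": 0, "Lepus saxatilis": 10},
}

detections = to_presence_absence(read_counts)
print(detections)
```

A count of 9 is recorded as absent and a count of 10 as present, so the threshold is inclusive.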
Data structure (final worksheet)
- Sample: Sample identifier matching Sample_name in the metadata file.
- Season: Season in which the sample was collected.
- Species columns: Each column represents a prey species.
- Cell values: 1 = species detected in the sample; 0 = species not detected in the sample.
File: SAMARA_Raw_reads.7z
Description:
Raw reads from Illumina MiSeq sequencing.
Variables:
- Sample: Matches the Sample_name in the metadata file.
File: Usearch.sh
Description:
Shell script for the USEARCH workflow.
