Data from: Tracking ungulate diet: Comparing observational and DNA metabarcoding tools
Data files
Apr 29, 2026 version files 4.65 MB
- Dataset_1_Direct_observations_cattle.csv – 1.64 KB
- Dataset_1_Direct_observations_sheep.csv – 1.64 KB
- Dataset_10_Plants_family_cattle.csv – 175.83 KB
- Dataset_10_Plants_family_sheep.csv – 154.19 KB
- Dataset_11_plants_rare_cattle.csv – 165 KB
- Dataset_11_plants_rare_sheep.csv – 148.39 KB
- Dataset_12_Arthropods_POO_cattle.csv – 10.39 KB
- Dataset_12_Arthropods_POO_sheep.csv – 3.82 KB
- Dataset_13_Plants_POO_cattle.csv – 34.23 KB
- Dataset_13_Plants_POO_sheep.csv – 16.87 KB
- Dataset_2_Fecal_samples_metadata_cattle.csv – 2.89 KB
- Dataset_2_Fecal_samples_metadata_sheep.csv – 2.80 KB
- Dataset_3_plant_primers.csv – 653 B
- Dataset_4_Arthropods_prev_under2_cattle.csv – 205.78 KB
- Dataset_4_Arthropods_prev_under2_sheep.csv – 189.68 KB
- Dataset_5_non-herbivores_cattle.csv – 18.86 KB
- Dataset_5_non-herbivores_sheep.csv – 45 KB
- Dataset_6_herbivorous_insects_cattle.csv – 31.98 KB
- Dataset_6_herbivorous_insects_sheep.csv – 34.97 KB
- Dataset_7_Host_plants_cattle.csv – 15.65 KB
- Dataset_7_Host_plants_sheep.csv – 11.91 KB
- Dataset_8_insects_rare_cattle.csv – 14.47 KB
- Dataset_8_insects_rare_sheep.csv – 11.24 KB
- Dataset_9_Plants_prev_under2_cattle.csv – 1.66 MB
- Dataset_9_Plants_prev_under2_sheep.csv – 1.68 MB
- README.md – 7.83 KB
Abstract
Large mammalian herbivores (LMH) are abundant in grazing ecosystems and play a pivotal role in shaping vegetation characteristics. However, accurately determining their diets through traditional methods, such as direct observations, remains challenging, particularly in natural communities and mixed-species grazing systems. Recent studies have shown that DNA metabarcoding can effectively identify the plant composition in LMH diets as well as the plant-dwelling arthropods (PDA) incidentally ingested by LMH while grazing. Given the high specificity of herbivorous insects to their host plants, we hypothesize that DNA metabarcoding of arthropods ingested by LMH could offer valuable insights into their feeding preferences. The goal of this study is to evaluate the accuracy of plant and arthropod DNA metabarcoding methods in identifying the diets of sheep and cattle and to compare their performance with direct observations and known dietary patterns from the literature. To test this, we collected fecal samples from sheep and cattle grazing in the northeast Asian grasslands. We amplified arthropod DNA using COI mitochondrial markers and plant DNA using ITS1 markers, followed by Illumina sequencing. Additionally, we conducted field observations to identify plants grazed by sheep and cattle. The DNA metabarcoding methods provided a comprehensive view of the LMH diet. Both DNA metabarcoding methods successfully detected dietary differences between sheep and cattle, with sheep primarily consuming nutrient-rich forbs and cattle predominantly grazing on Poaceae, consistent with known foraging behaviors. Although the constant presence of arthropods across multiple samples suggests that DNA of ingested arthropods could provide complementary information regarding LMH foraging behavior, we found this information to be rather limited. However, our findings confirm that plant DNA metabarcoding is a reliable and accurate method for identifying LMH diets.
Dataset DOI: 10.5061/dryad.2fqz61323
Description of the data and file structure
This dataset accompanies a study comparing direct observations with plant and arthropod DNA metabarcoding for identifying the diets of sheep and cattle. Fecal samples were collected from sheep and cattle grazing in the northeast Asian grasslands. Arthropod DNA was amplified using COI mitochondrial markers and plant DNA using ITS1 markers, followed by Illumina sequencing. Field observations were conducted to identify the plants grazed by sheep and cattle. Most datasets are provided as a pair of CSV files, one for sheep and one for cattle.
Files and variables
Dataset_1_Direct_observations_sheep.csv
Results of direct observations on sheep diet.
Dataset_1_Direct_observations_cattle.csv
Results of direct observations on cattle diet.
Dataset_2_Fecal_samples_metadata_sheep.csv
Description of sheep fecal samples and the study site.
Dataset_2_Fecal_samples_metadata_cattle.csv
Description of cattle fecal samples and the study site.
Dataset_3_plant_primers.csv
Plant primers used for plant DNA metabarcoding.
Dataset_4_Arthropods_prev_under2_sheep.csv
Arthropod ASVs identified in sheep feces with prevalence <2 samples (removed from analysis).
Dataset_4_Arthropods_prev_under2_cattle.csv
Arthropod ASVs identified in cattle feces with prevalence <2 samples (removed from analysis).
Dataset_5_non-herbivores_sheep.csv
Non-herbivore arthropod ASVs identified in sheep feces (removed from analysis).
Dataset_5_non-herbivores_cattle.csv
Non-herbivore arthropod ASVs identified in cattle feces (removed from analysis).
Dataset_6_herbivorous_insects_sheep.csv
Herbivorous insect ASVs identified in sheep fecal samples.
Dataset_6_herbivorous_insects_cattle.csv
Herbivorous insect ASVs identified in cattle fecal samples.
Dataset_7_Host_plants_sheep.csv
Herbivorous insect ASVs identified in sheep fecal samples with known host plants. Host plants were assigned based on literature sources.
Dataset_7_Host_plants_cattle.csv
Herbivorous insect ASVs identified in cattle fecal samples with known host plants. Host plants were assigned based on literature sources.
Dataset_8_insects_rare_sheep.csv
Herbivorous insect ASVs with known host plants identified in sheep fecal samples (post subsampling; rarefied to 800 reads per sample).
Dataset_8_insects_rare_cattle.csv
Herbivorous insect ASVs with known host plants identified in cattle fecal samples (post subsampling; rarefied to 2,000 reads per sample).
Dataset_9_Plants_prev_under2_sheep.csv
Plant ASVs identified in sheep feces with prevalence <2 samples (removed from analysis).
Dataset_9_Plants_prev_under2_cattle.csv
Plant ASVs identified in cattle feces with prevalence <2 samples (removed from analysis).
Dataset_10_Plants_family_sheep.csv
Plant ASVs identified to the family level in sheep fecal samples.
Dataset_10_Plants_family_cattle.csv
Plant ASVs identified to the family level in cattle fecal samples.
Dataset_11_plants_rare_sheep.csv
Plant ASVs identified in sheep fecal samples (post subsampling; rarefied to 34,000 reads per sample).
Dataset_11_plants_rare_cattle.csv
Plant ASVs identified in cattle fecal samples (post subsampling; rarefied to 10,000 reads per sample).
Dataset_12_Arthropods_POO_sheep.csv
Percent of occurrence (POO) of herbivorous insect ASVs in sheep samples.
Dataset_12_Arthropods_POO_cattle.csv
Percent of occurrence (POO) of herbivorous insect ASVs in cattle samples.
Dataset_13_Plants_POO_sheep.csv
Percent of occurrence (POO) of plant ASVs in sheep samples.
Dataset_13_Plants_POO_cattle.csv
Percent of occurrence (POO) of plant ASVs in cattle samples.
Supplementary_information.docx (Zenodo)
Rarefaction curves, additional results, PCR conditions, and relative read abundance analysis.
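Because the file names encode the dataset number and the host species, they can be split into those parts before loading. The following is a minimal sketch, not part of the published workflow: the pattern is inferred from the file names listed in this README, and the function name `parse_dataset_filename` is hypothetical. Sorting on the parsed number avoids the alphabetical ordering in which Dataset_10 comes before Dataset_2.

```python
import re

# Pattern assumed from the file names in this README:
# Dataset_<number>_<label>[_<host>].csv, where host is "sheep" or "cattle".
FILENAME_RE = re.compile(r"^Dataset_(\d+)_(.+?)(?:_(sheep|cattle))?\.csv$")


def parse_dataset_filename(name):
    """Return (dataset_number, label, host), or None if the name does not match.

    host is None for files that are not split by species
    (for example Dataset_3_plant_primers.csv).
    """
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    number, label, host = match.groups()
    return int(number), label, host


files = [
    "Dataset_10_Plants_family_sheep.csv",
    "Dataset_2_Fecal_samples_metadata_cattle.csv",
    "Dataset_3_plant_primers.csv",
    "Dataset_1_Direct_observations_sheep.csv",
]

# Sort by the numeric dataset index rather than alphabetically.
ordered = sorted(files, key=lambda f: parse_dataset_filename(f)[0])
```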
Variables
The datasets share a common structure across files. Variables that appear in multiple files are described once here.
General sample metadata
- Sample_ID – Unique identifier for each fecal sample
- Host_species – Animal species from which the sample was collected (e.g., sheep, cattle)
- Site / Location – Sampling location (categorical)
- Date – Date of sample collection (format: YYYY-MM-DD)
- Sampling_unit / Quadrat / Plot – Identifier for sampling location or unit (if applicable)
Direct observation data (Dataset 1)
- Plant_taxon – Plant species or taxonomic group observed being consumed
- Observation_count – Number of observed feeding events (count)
DNA metabarcoding – general variables
(used across plant and arthropod datasets)
- ASV – Unique amplicon sequence variant identifier
- Sequence – DNA sequence of the ASV
- Sequence_length – Length of the DNA sequence (base pairs)
- Prevalence – Number of samples in which the ASV appears (count)
- Total – Total number of reads assigned to the ASV across all samples (count)
Sample-level read columns
- S.UG-XX – Columns representing individual samples. Each column corresponds to a unique fecal sample, with values indicating the number of sequencing reads assigned to each ASV in that sample. Column names were standardized and made unique to ensure compatibility with downstream analyses.
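The Prevalence and Total variables can in principle be recomputed from the sample-level read columns. The sketch below illustrates this with a small made-up table. It assumes that sample columns start with the prefix `S.UG-` and hold integer read counts. The prefix may differ between files, so it is exposed as a parameter, and the function name `summarize_asvs` is hypothetical.

```python
import csv
import io

# Toy table with made-up values; the real files contain many more columns.
toy_csv = """ASV,S.UG-01,S.UG-02,S.UG-03
ASV_1,120,0,35
ASV_2,0,0,7
ASV_3,4,9,0
"""


def summarize_asvs(handle, sample_prefix="S.UG-"):
    """Compute per-ASV prevalence and total reads from the sample columns."""
    summary = {}
    for row in csv.DictReader(handle):
        # Keep only the columns that hold per-sample read counts.
        reads = [int(v) for k, v in row.items() if k.startswith(sample_prefix)]
        summary[row["ASV"]] = {
            "Prevalence": sum(1 for r in reads if r > 0),  # samples with reads
            "Total": sum(reads),                           # reads across samples
        }
    return summary


summary = summarize_asvs(io.StringIO(toy_csv))

# ASVs present in fewer than 2 samples, i.e. the filter described for
# Datasets 4 and 9.
low_prevalence = [asv for asv, s in summary.items() if s["Prevalence"] < 2]
```

To apply the same function to one of the CSV files, pass an open file handle instead of the `io.StringIO` object.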
Taxonomic assignment
- Phylum – Taxonomic phylum
- Class – Taxonomic class
- Order – Taxonomic order
- Family – Taxonomic family
- Genus – Taxonomic genus
- Species – Taxonomic species (when available)
Arthropod-specific variables (Datasets 4–8, 12)
- Feeding_group – Ecological classification (e.g., herbivore, non-herbivore)
- Host_plant – Known host plant(s) of the herbivorous insect (based on literature)
- Filtered_status – Indicates whether ASVs were removed (e.g., prevalence <2, non-herbivores)
- Rarefied – Indicates whether data were subsampled (yes/no)
Plant-specific variables (Datasets 9–11, 13)
- Plant_taxon – Identified plant taxon (family/genus/species level depending on resolution)
- Taxonomic_level – Level of identification (family, genus, species)
- Filtered_status – Indicates whether ASVs were removed (e.g., prevalence <2)
- Rarefied – Indicates whether data were subsampled (yes/no)
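Rarefaction here means subsampling each sample to a fixed number of reads. This README does not state which tool was used for that step, so the following is only a generic sketch of random subsampling without replacement, using made-up read counts. The function name `rarefy`, the seed, and the choice to return `None` for samples below the target depth are all assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {ASV: reads} mapping to `depth` reads without replacement.

    Returns None when the sample holds fewer than `depth` reads.
    """
    # Expand the counts into one entry per read, then draw `depth` of them.
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    drawn = Counter(rng.sample(pool, depth))
    return {asv: drawn.get(asv, 0) for asv in counts}


# Made-up read counts for a single sample.
sample = {"ASV_1": 900, "ASV_2": 250, "ASV_3": 50}

# 800 reads is the depth reported for the sheep insect data (Dataset 8).
rarefied = rarefy(sample, depth=800)
```

Expanding every read into a list is simple but memory-hungry, so for deep samples a count-based sampler would be a better fit.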
Percent of Occurrence (POO) datasets (Datasets 12–13)
- POO (%) – Percent of occurrence; proportion of samples in which a given ASV or taxon is present (%)
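Following the definition above, POO can be computed as the share of samples in which an ASV or taxon has at least one read. The sketch below uses made-up read counts. Treating a single read as presence is an assumption, and the function name `percent_of_occurrence` is hypothetical.

```python
def percent_of_occurrence(reads_by_sample):
    """Return 100 * (samples with at least one read) / (number of samples)."""
    n_samples = len(reads_by_sample)
    if n_samples == 0:
        raise ValueError("at least one sample is required")
    present = sum(1 for reads in reads_by_sample if reads > 0)
    return 100.0 * present / n_samples


# Made-up read counts for two ASVs across four samples.
table = {
    "ASV_1": [120, 0, 35, 8],
    "ASV_2": [0, 0, 7, 0],
}

poo = {asv: percent_of_occurrence(reads) for asv, reads in table.items()}
```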
Primer information (Dataset 3)
- Primer_name – Name of primer
- Primer_sequence – Nucleotide sequence (5’–3’)
- Target_marker – Genetic marker amplified (e.g., COI, ITS1)
- Reference – Source of primer (literature citation)
Code/software
Excel
