
The generality of cryptic dietary niche differences in diverse large-herbivore assemblages

Cite this dataset

Pringle, Robert et al. (2022). The generality of cryptic dietary niche differences in diverse large-herbivore assemblages [Dataset]. Dryad.


Ecological niche differences are necessary for stable species coexistence but are often difficult to discern. Models of dietary niche differentiation in large mammalian herbivores invoke the quality, quantity, and spatiotemporal distribution of plant tissues and growth-forms but are agnostic towards food-plant species identity. Empirical support for these models is variable, suggesting that additional mechanisms of resource partitioning may be important in sustaining large-herbivore diversity in African savannas. We used DNA metabarcoding to conduct a taxonomically explicit analysis of large-herbivore diets across southeastern Africa, analyzing ~4,000 fecal samples of 30 species from 10 sites in 7 countries over 6 years. We detected 893 food-plant taxa from 124 families, but just two families—grasses and legumes—accounted for the majority of herbivore diets. Nonetheless, herbivore species almost invariably partitioned food-plant taxa; diet composition differed significantly in 97% of pairwise comparisons between sympatric species, and dissimilarity was pronounced even between the strictest grazers (grass eaters), strictest browsers (non-grass eaters), and closest relatives at each site. Niche differentiation was weakest in an ecosystem recovering from catastrophic defaunation, indicating that food-plant partitioning is driven by species interactions, and stronger at low rainfall, as expected if interspecific competition is a predominant driver. Diets differed more between browsers than grazers, which predictably shaped community organization: grazer-dominated trophic networks had higher nestedness and lower modularity. That dietary differentiation is structured along taxonomic lines complements prior work on how herbivores partition plant parts and patches and suggests that common mechanisms govern herbivore coexistence and community assembly in savannas.


This archive presents data on the diets of 30 species of large mammalian herbivores in eastern and southern Africa. Data were generated via DNA metabarcoding of fecal samples collected in 10 protected areas (7 countries) across multiple seasons and years. The methods summary below is abstracted from Pansu et al. (2022); please see that paper for additional details, references, and context.

Sample collection and analysis

Fresh fecal samples were collected during vehicle-based road surveys. Methods for samples collected in Laikipia (Kenya), Serengeti (Tanzania), Niassa (Mozambique), Gorongosa (Mozambique), Nyika (Malawi), Kafue (Zambia), Hwange (Zimbabwe), Hluhluwe-iMfolozi (South Africa), and Kruger (South Africa) were similar and are described here. Although there were subtle differences in the pipeline used for some sets of these samples (described below), all samples were collected and processed in a similar way. Samples from Addo Elephant National Park (South Africa) were collected as part of an independent project; as a result, there were more substantive methodological differences in the way those samples were processed (described under Addo samples, below).

Fecal samples were collected and processed as described by Kartzinel et al. (2015) and Pansu et al. (2019). Fresh samples without any adhering plant tissue were collected in unused plastic bags and kept cool until returned to camp the same day, where they were pre-processed as follows. We homogenized samples by kneading the bag and then transferred ~200 mm³ of sample (avoiding plant macroremains) into tubes containing silica beads and a stabilization/lysis buffer (Zymo Xpedition Stabilization/Lysis Solution, Zymo Research); tubes were vortexed for 30 s to lyse cells and then frozen for transport to Princeton University. Before import into the United States, samples were subjected to one of two antiviral treatments, as mandated by the US Department of Agriculture’s Animal and Plant Health Inspection Service (permits 122489, 123156, 130123 to R.M.P.). Samples collected in Laikipia from 2013–2016 were treated with proteinase K, heated to 95°C for 15 min, and treated with RNase A. Following issuance of revised regulations, samples collected from 2016–2018 were subjected to a heat-only treatment of 72°C for 30 min. On arrival at Princeton University, samples were frozen and stored at −80°C and later extracted in a facility dedicated to fecal DNA analysis, using the Zymo Xpedition Soil/Fecal DNA MiniPrep kit according to the manufacturer’s instructions. We performed one extraction control (sample-free extract) per extraction batch (~20 to 30 samples).

We amplified a short and variable region of the chloroplast genome, the P6 loop of the trnL intron, using the universal primers

Forward (g): 5’-GGGCAATCCTGAGCCAA-3’


Tags composed of 8 base pairs (bp), each differing by ≥4 nucleotides, were added to the 5’ end of each primer to enable the multiplexing of multiple PCR products per library before high-throughput sequencing. PCRs were carried out in a 20 μL reaction volume including 2 μL template fecal DNA extract; 0.2 μM each primer; 0.2 mM each dNTP; 1X GeneAmp PCR buffer II; 2.5 mM MgCl2; 0.5 U AmpliTaq Gold DNA Polymerase (Applied Biosystems); 4% dimethyl sulfoxide (Sigma-Aldrich); and 0.1 mg ml−1 of bovine serum albumin (New England Biolabs). Thermocycling followed a program of initial denaturing at 95°C for 10 min, followed by 35 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 30 s, with a 2-min final extension at 72°C. Extraction controls, PCR controls (using DNA-free water instead of DNA), and positive controls (made of DNA extract of known plants) were also amplified and later sequenced.
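The tag design constraint above (8-bp tags, every pair differing at ≥4 positions) can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the example tags are hypothetical and are not the tags used in the study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def tags_are_valid(tags, length=8, min_dist=4):
    """True if every tag has the given length and every pair of tags
    differs at min_dist or more positions."""
    if any(len(t) != length for t in tags):
        return False
    return all(hamming(a, b) >= min_dist for a, b in combinations(tags, 2))


# Hypothetical tags for illustration only (not the study's tag set).
example_tags = ["AACCGGTT", "TTGGCCAA", "ACGTACGT", "CATGCATG"]
print(tags_are_valid(example_tags))
```

A minimum distance of 4 means that up to three sequencing errors in a tag cannot turn it into another valid tag, which is presumably why such a spacing is used for multiplexing.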

Amplification, purification, and sequencing strategies differed slightly between samples from Laikipia (2013–2016) and those collected in other sites (2016–2018). First, for all non-Laikipia samples, we performed multiple PCR replicates (2 or 3 per extract in 2016, 3 per extract in 2017 and 2018) to monitor reproducibility of results and stochasticity of the PCR and sequencing processes. Second, PCR products from Laikipia were purified using a SequalPrep Normalization Plate Kit (Applied Biosystems); for other sites, samples were pooled per plate and purified with a MinElute PCR Purification Kit (Qiagen). Finally, libraries for Laikipia samples were prepared using a PCR-based approach and sequenced in single-end (170 bp), whereas we used PCR-free library preparation and 2×150 bp paired-end sequencing for other sites. Samples from Laikipia in 2013 and 2016, Gorongosa in 2016, and Serengeti in 2017 and 2018 were all sequenced on separate sequencing runs. Samples from Laikipia in 2014 and 2015 were sequenced together, as were the 2017 samples from all sites except Serengeti; in these cases, samples from different sites/years were placed in different libraries. All libraries were sequenced on an Illumina HiSeq 2500 platform at Princeton’s Lewis-Sigler Institute for Integrative Genomics.

Sequence data were curated using OBITools v.1.2. For libraries sequenced in paired-end (i.e., all but those from Laikipia), paired-end reads were first aligned and assembled using the illuminapairedend command; sequences with a low alignment-quality score (<40, the value corresponding to perfect alignment between the last 10 bases of each read) were discarded. Consensus sequences were then assigned to their original sample from the tag information attached to the primers, using the ngsfilter command (with default parameters allowing zero errors on tags and a maximum of two errors on primers). Identical sequences were merged with the obiuniq command, which retains information about their occurrence in each sample. Low-quality sequences were filtered out with obigrep; these included sequences with ambiguous nucleotides, those with a size outside the expected length of the barcode (<8 or >180 bp), and those represented by only one read in the entire dataset. Taxonomic assignment was performed using the ecotag command against multiple reference databases: a comprehensive local database for Mpala Conservancy in Laikipia, a partial local database for Gorongosa, a grass-specific local database for Serengeti, and a global reference database generated by in silico PCR from the EMBL database (release 134) using ecoPCR. We used the obiclean command (with parameters d = 1 and r = 0.25) to detect sequences potentially resulting from PCR and/or sequencing errors. For each PCR product, this program determines if a sequence is more likely to be a true sequence (‘head’), a sequence derived from another one (‘internal’), or a sequence from which no other sequence is derived and is itself not derived from another (‘singleton’). This information was used later in the filtering process to remove probably erroneous sequences. The fasta file was then converted into a sequence-by-sample matrix using the obitab command. Additional filtering steps were conducted in R v.3.5.3.
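The low-quality filter described above (no ambiguous nucleotides, barcode length within 8–180 bp, more than one read in the whole dataset) was run with obigrep. The Python sketch below only restates that logic for illustration; it is not the OBITools implementation, and the function name and example records are hypothetical.

```python
def keep_sequence(seq: str, total_reads: int,
                  min_len: int = 8, max_len: int = 180) -> bool:
    """Return True if a dereplicated sequence passes the three filters:
    only A/C/G/T, length within [min_len, max_len], and more than one
    read across the entire dataset."""
    unambiguous = set(seq.upper()) <= set("ACGT")
    right_length = min_len <= len(seq) <= max_len
    not_singleton_read = total_reads > 1
    return unambiguous and right_length and not_singleton_read


# Hypothetical dereplicated records: sequence -> total read count.
records = {
    "ATCCGTATTATAGGAACAATAATTTTATTTTCTAGAAAAGG": 1520,  # passes
    "ATCCNTATTATAGG": 300,                               # ambiguous base
    "ATCCG": 45,                                         # too short
    "ATCCGTGTTTTGAGAAAACAAGGGGTTCTCGAAC": 1,             # single read
}
kept = {s: n for s, n in records.items() if keep_sequence(s, n)}
print(len(kept))
```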

For sites with local reference databases, sequences were preferentially assigned to the local reference database. However, if the assignment score obtained with the local database was <98% and lower than that obtained with the global database, then the sequence was reassigned to the global database. For sites without local databases, sequences were assigned to the global reference database unless the assignment score was higher with one of the local reference databases.
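The choice between local and global assignments can be written as a small decision function. The sketch below is one reading of the rule stated above; it assumes assignment scores are expressed as fractions between 0 and 1, and the function and argument names are hypothetical.

```python
def choose_database(site_has_local_db: bool, local_score: float,
                    global_score: float, threshold: float = 0.98) -> str:
    """Return 'local' or 'global' for one sequence.

    For a site with its own local database, local_score is the score
    against that database. For a site without one, local_score is assumed
    to be the best score obtained against any of the other sites' local
    databases.
    """
    if site_has_local_db:
        # Prefer the local database unless it is both below the threshold
        # and worse than the global database.
        if local_score < threshold and local_score < global_score:
            return "global"
        return "local"
    # No local database for this site: default to the global database
    # unless a local database gives a strictly higher score.
    return "local" if local_score > global_score else "global"


print(choose_database(True, 0.97, 0.99))   # global
print(choose_database(True, 0.99, 1.00))   # local
print(choose_database(False, 0.99, 0.97))  # local
```

Note that under this reading a local assignment scoring ≥98% is kept even when the global database scores higher.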

To further curate the dataset, we first discarded PCR products with low numbers of reads. For this, we compared the density distribution of the log-transformed number of reads in controls and in true samples within each library, using the intersection of the two distributions as a threshold. We then removed sequences that were likely to have resulted from PCR or sequencing errors. For this, we first used the outputs of the obiclean analysis as follows: for each site and sampling bout, we discarded all sequences that were more frequently considered to be errors (internal) than true sequences (head or singleton) for that site in that year. For sites with local reference databases (Laikipia, Gorongosa, Serengeti), such sequences were discarded only if they did not perfectly match any sequence in the local reference database; otherwise they were retained. We also filtered out putative contaminants by discarding any sequence that had its maximal average relative read abundance (RRA) in negative controls. Similarly, sequences that displayed low similarity (<80% identity) with their closest match were considered likely to be chimaeras and/or highly degraded sequences and were therefore filtered out.
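The obiclean-based step can be illustrated as a simple vote over the PCR products of one site and sampling bout. The sketch below assumes that "more frequently considered to be errors" means the number of 'internal' calls exceeds the combined number of 'head' and 'singleton' calls; that interpretation, and the function name, are assumptions rather than details taken from the original pipeline.

```python
from collections import Counter


def is_probable_error(statuses, perfect_local_match: bool = False) -> bool:
    """Decide whether a sequence should be discarded as a likely PCR or
    sequencing error.

    statuses: the obiclean status ('head', 'internal', or 'singleton') of
    the sequence in each PCR product of one site and sampling bout.
    perfect_local_match: True if the sequence matches a local reference
    sequence perfectly, in which case it is always retained.
    """
    counts = Counter(statuses)
    mostly_internal = counts["internal"] > counts["head"] + counts["singleton"]
    return mostly_internal and not perfect_local_match


# Hypothetical status vectors for one sequence.
print(is_probable_error(["internal", "internal", "head"]))
print(is_probable_error(["internal", "internal", "head"], perfect_local_match=True))
```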

Next, we removed outlier PCR replicates (those with non-reproducible results). For each library, we iteratively determined the density distributions of within- and between-sample distances and discarded replicates that fell within the distribution of between-sample distances, the threshold being defined as the intersection of the two density distributions. This process was iterated until no further replicate was removed. Last, to reduce the impact of low-abundance false positives that can arise from tag-jumps during Illumina sequencing, we removed sequences representing <1% of reads in each sample. Remaining sequences were considered molecular operational taxonomic units (mOTUs) in subsequent analyses.
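The final per-sample abundance filter is straightforward to express in code. The sketch below drops sequences whose relative read abundance (RRA) in a sample is below 1%; the function name and the example counts are hypothetical, and whether RRA is renormalized afterwards is left open because the text does not say.

```python
def filter_low_abundance(read_counts: dict, min_rra: float = 0.01) -> dict:
    """Keep only sequences whose share of the sample's reads is at least
    min_rra. read_counts maps sequence identifiers to read counts for a
    single sample."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {seq: n for seq, n in read_counts.items() if n / total >= min_rra}


# Hypothetical sample: 1,000 reads across three sequences.
sample = {"seq_a": 950, "seq_b": 45, "seq_c": 5}
print(filter_low_abundance(sample))
```

Here `seq_c` accounts for 0.5% of reads and is removed, whereas `seq_b` (4.5%) is kept.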

Addo samples

The main methodological difference between samples collected in Addo and those from all other sites was the DNA extraction method: whereas we extracted total DNA at other sites, an extracellular DNA-extraction protocol was used for samples from Addo (Taberlet et al. 2012). Fresh samples from all ruminants, except buffalo, were collected in tubes containing silica gel and stored dry until DNA extraction in the field. Samples from hindgut fermenters and buffalo were collected in unused plastic bags and kept cool until DNA extraction in the field on the same day. Samples were extracted in the field from a much larger volume of initial material than used at other sites. For this, fecal material was mixed with an equivalent amount of saturated phosphate buffer (Na2HPO4; 0.12 M; pH = 8) for 15 minutes to desorb extracellular DNA. DNA extraction followed methods described in Kerley et al. (2018).

Although the general metabarcoding approach based on the P6 loop of the trnL intron and Illumina sequencing was similar for all samples, laboratory steps and data-filtering protocols performed at Université Grenoble Alpes differed slightly from those used at Princeton University. Specifically, the primer pair used was identical to other sites, but the composition of the PCR mix and PCR conditions differed. PCRs were performed in a 20 μL reaction volume containing 10 μL of AmpliTaq Gold 360 master mix (Applied Biosystems), 0.5 μM of each primer, 0.16 μL (20 mg/mL) of bovine serum albumin (BSA, Roche Diagnostic), and 2 μL template fecal DNA extract (diluted 10 times). Polymerase activation was performed at 95°C for 10 min, followed by 40 cycles of 95°C for 30 s (denaturation), 50°C for 30 s (primer annealing), and 72°C for 60 s (extension), with a final elongation for 7 min at 72°C. Three technical PCR replicates were performed. All experiments included extraction controls, blanks, and negative and positive PCR controls. All PCR products (samples and controls alike) were mixed together and purified using the MinElute PCR Purification Kit. Libraries were prepared using the MetaFast protocol ( protocol-amplicon-metagenomic-analysis) and sequenced in paired-end on a HiSeq 2500 platform (2×150 bp) by Fasteris (Geneva, Switzerland).

Sequence data were processed with OBITools software. Paired-end read alignment, assignment of sequences to their original samples, sequence dereplication, removal of sequences with ambiguous nucleotides, and selection of sequences based on their length (range 10–220 bp here) were conducted as described above for the non-Addo sites. In addition, sequences obtained in fewer than two different PCR replicates (with a minimum of 10 reads in at least one of them) were discarded, as were those represented by <100 reads over the entire dataset. The ecotag program was used for taxonomic assignment using a comprehensive local reference database for Addo Elephant National Park, comprising 473 plant species. The reference database was built following the methods described in Taberlet et al. (2018, p. 25), using PCR with the same P6-loop primers. Only sequences with an assignment score ≥0.99 identity were retained. Sequences with >10% of their reads observed in controls were discarded, as were samples with a sequencing depth <2,000 reads after filtering.
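The last two Addo filters (sequences with >10% of their reads in controls, and samples with <2,000 reads after filtering) can be sketched as follows. The table layout, the function name, and the example values are hypothetical; the actual filtering was done with the authors' own scripts, and the order in which the two filters were applied is assumed here.

```python
def filter_controls_and_depth(table: dict, control_ids: set,
                              max_control_frac: float = 0.10,
                              min_depth: int = 2000) -> dict:
    """table maps each sample or control id to {sequence id: reads}.
    Returns the non-control samples that pass both filters."""
    total, in_controls = {}, {}
    for sample_id, counts in table.items():
        for seq, n in counts.items():
            total[seq] = total.get(seq, 0) + n
            if sample_id in control_ids:
                in_controls[seq] = in_controls.get(seq, 0) + n

    # Keep sequences with at most max_control_frac of their reads in controls.
    kept_seqs = {seq for seq, t in total.items()
                 if t > 0 and in_controls.get(seq, 0) / t <= max_control_frac}

    result = {}
    for sample_id, counts in table.items():
        if sample_id in control_ids:
            continue
        kept = {seq: n for seq, n in counts.items() if seq in kept_seqs}
        # Drop samples whose remaining sequencing depth is too low.
        if sum(kept.values()) >= min_depth:
            result[sample_id] = kept
    return result


# Hypothetical table with two samples and one negative control.
table = {
    "s1": {"x": 3000, "y": 50},
    "s2": {"x": 1500, "y": 100},
    "neg": {"x": 10, "y": 60},
}
print(filter_controls_and_depth(table, {"neg"}))
```

In this example sequence `y` has 60 of its 210 reads in the control and is removed, after which sample `s2` falls below the depth threshold.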

Confirmation of sample identity

Whenever possible, fecal samples were collected directly after observing animals defecate. In some cases, we were forced to collect samples without a direct observation (e.g., for animals that are most active at night and/or dangerous, such as elephant, rhinoceroses, buffalo, and hippopotamus). We are highly confident of the species assignments used in this study for multiple reasons. First, the collection team always included at least one member with considerable experience in identifying herbivores and their feces. Second, fresh dung is readily distinguishable from even hours-old dung. Third, relatively few sympatric species have dung similar enough to be confused. Fourth, we performed confirmatory analyses whenever we considered it possible that identifications might be mistaken, as detailed below.

We used DNA analyses to confirm the species identity of 200 uncertain samples from 2017–2018. These DNA samples were amplified with tagged primers targeting a 16S metabarcode allowing identification of mammals to species (MamP007F: 5’-CGAGAAGACCCTATGGAGCT-3’; MamP007R: 5’-CCGAGGTCRCCCCAACC-3’). To limit amplification of human DNA, a blocking primer (MamP007_B_Hum1, 5’-GGAGCTTTAATTTATTAATGCAAACAGTACCC3-3’) was added to the mix. PCR amplifications were conducted in a final volume of 20 μL containing 2 μL of template DNA, 0.2 μM of each primer, 2 μM of human blocking primer, 0.2 mM of each dNTP, 0.2 mg mL−1 of BSA, 1X GeneAmp Gold II buffer, 2 mM MgCl2, and 1 U of AmpliTaq Gold DNA Polymerase. Thermocycling conditions included an initial denaturation at 95°C for 10 min, followed by 35 cycles of denaturing at 95°C for 30 s, annealing at 50°C for 30 s, extension at 72°C for 30 s, and a final elongation step (7 min at 72°C). PCR products were pooled and purified using the MinElute PCR Purification Kit. The library was constructed using a PCR-free protocol and sequenced in paired-end (2×250 bp) on an Illumina MiSeq platform. Data processing was performed with OBITools. Paired-end alignment, assignment of sequences to their original sample, dereplication of identical sequences, and removal of low-quality sequences were conducted as described above. We generated a mammal DNA reference database by using in silico PCR from the EMBL database (release 134) to assign sequences to herbivore taxa. PCRs with a low number of reads (<350) were discarded and non-mammalian reads were filtered out. The following criteria were used to define an acceptable assignment: (i) >98% similarity (i.e., maximum of 1 bp difference) with a reference sequence, (ii) no multiple assignments, and (iii) the top sequence was >50% of reads and was at least twice as abundant as the second. In addition, when the identity score was <100%, sequences were manually inspected to check if the barcodes of the putative species were dissimilar enough to avoid misidentification.
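The three acceptance criteria for a host-species assignment can be combined into one check per PCR product. The sketch below is an illustration under stated assumptions: identities are fractions between 0 and 1, "no multiple assignments" is taken to mean that the top sequence matches exactly one reference species, and the class, function, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostSequence:
    species: tuple    # best-matching reference species; ties give several
    identity: float   # similarity to the closest reference, 0 to 1
    reads: int        # read count in this PCR product


def accept_assignment(seqs, min_identity=0.98, min_share=0.5, min_ratio=2.0):
    """Return the accepted species name, or None if any criterion fails."""
    if not seqs:
        return None
    ranked = sorted(seqs, key=lambda s: s.reads, reverse=True)
    top = ranked[0]
    total = sum(s.reads for s in ranked)
    second = ranked[1].reads if len(ranked) > 1 else 0
    ok = (
        top.identity > min_identity            # (i) >98% similarity
        and len(top.species) == 1              # (ii) no multiple assignments
        and top.reads > min_share * total      # (iii) >50% of reads ...
        and top.reads >= min_ratio * second    # ... and at least 2x the second
    )
    return top.species[0] if ok else None


# Hypothetical PCR product with two mammalian sequences.
pcr = [
    HostSequence(("Syncerus caffer",), 1.00, 800),
    HostSequence(("Kobus ellipsiprymnus",), 0.99, 150),
]
print(accept_assignment(pcr))
```

The manual inspection applied when the identity score was below 100% is not captured by this check.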

A comparable approach was applied to all samples from Addo (2013–2014) using the same primer pair as above. In Laikipia (2013–2016), a barcoding approach based on primer pairs targeting different regions of mitochondrial COI (cytochrome c oxidase subunit I) was used to confirm the identity of 217 of the samples used in the present study. In total, we performed confirmatory analyses for 1,634 samples of 23 species, encompassing 7 of the 10 sampling sites.


T. R. Kartzinel, et al., DNA metabarcoding illuminates dietary niche partitioning by African large herbivores. Proc. Natl. Acad. Sci. U. S. A. 112, 8019–8024 (2015).

G. I. H. Kerley, et al., Diet shifts by adult flightless dung beetles Circellium bacchus, revealed using DNA metabarcoding, reflect complex life histories. Oecologia 188, 107–115 (2018).

J. Pansu, et al., Trophic ecology of large herbivores in a reassembling African ecosystem. J. Ecol. 107, 1355–1376 (2019).

J. Pansu, et al., Generality of cryptic dietary niche differentiation in diverse large-herbivore assemblages. Proc. Natl. Acad. Sci. U. S. A. doi: 10.1073/pnas.2204400119 (2022).

P. Taberlet, et al., Soil sampling and isolation of extracellular DNA from large amount of starting material suitable for metabarcoding studies. Mol. Ecol. 21, 1816–1820 (2012).

P. Taberlet, A. Bonin, L. Zinger, E. Coissac, Environmental DNA for Biodiversity Research and Monitoring (Oxford University Press, 2018).


National Science Foundation, Award: IOS-1656527

National Science Foundation, Award: DEB-1457697

National Science Foundation, Award: BCS-1461728

Cameron Schrier Foundation

Greg Carr Foundation

High Meadows Environmental Institute

National Geographic Society, Award: NGS-52921R-18

Agence Nationale de la Recherche, Award: ANR-16-CE02-0001-01

National Research Foundation, Award: 85062

Ministère Français des Affaires Étrangères, Award: 2973PM

Princeton University

Laboratory of Alpine Ecology

Nelson Mandela University