Body size modulates the extent of seasonal diet switching by large mammalian herbivores in Yellowstone National Park

Littleford-Colquhoun, Bethan 1; Geremia, Chris 2; McGarvey, Lauren 2; Merkle, Jerod 3; Hoff, Hannah 1; Anderson, Heidi 2; Segal, Carlisle 4; Kartzinel, Rebecca 1; Maywar, Ian 1; Nantais, Natalie 1; Moore, Camela 5; Kartzinel, Tyler 1

Published Nov 16, 2023; Updated Aug 07, 2024 on Dryad. https://doi.org/10.5061/dryad.h18931zst

Data files

Nov 16, 2023 version files (526.44 KB)

bold-specimendata-DS-YNPBP-R1.xlsx (160.75 KB)
bold-trnL-DS-YNPBP-R1.fas (270.20 KB)
README.md (10.09 KB)
YNPP6_completeDB_20230414.fasta (85.40 KB)

May 27, 2024 version files (526.73 KB)

bold-specimendata-DS-YNPBP-R1.xlsx (160.75 KB)
bold-trnL-DS-YNPBP-R1.fas (270.20 KB)
README.md (10.37 KB)
YNPP6_completeDB_20230414.fasta (85.41 KB)

Aug 07, 2024 version files (526.95 KB)

bold-specimendata-DS-YNPBP-R1.xlsx (160.75 KB)
bold-trnL-DS-YNPBP-R1.fas (270.20 KB)
README.md (10.60 KB)
YNPP6_completeDB_20230414.fasta (85.41 KB)

Abstract

Prevailing theories about animal foraging behaviours and the food webs they occupy offer divergent predictions about whether seasonally limited food availability promotes dietary diversification or specialisation. Emphasis on how animals compete for food predominates in work on the foraging ecology of large mammalian herbivores, whereas emphasis on how the diversity of available foods generally constrains dietary opportunity predominates in work on entire food webs. Reconciling predictions about what promotes dietary diversification is challenging because species’ different body sizes and mobilities modulate how they seek and compete for resources; the mechanistic bases of common predictions may not pertain to all species equally. We evaluated predictions about five large-herbivore species that differ in body size and mobility in Yellowstone National Park using GPS-tracking and dietary DNA. The data illuminated remarkably strong and significant correlations between body size and five key indicators of diet seasonality (R² = 0.71-0.80). Compared to smaller species, bison and elk showed muted diet seasonality and maintained access to more unique foods when winter conditions constrained food availability. Evidence from GPS collars revealed size-based differences in species’ seasonal movements and habitat-use patterns, suggesting that better accounting for the allometry of foraging behaviours may help reconcile disparate ideas about the ecological drivers of seasonal diet switching.


CHANGES: Version 2 (May 2024) contains updated files that reflect a name change for one plant taxon included in each file.

CHANGES: Version 3 (August 2024) contains an updated R script for data analyses that adds habitat Bray-Curtis dissimilarity calculations and creates maps showing sample collection sites across Yellowstone National Park.

Python scripts, R scripts, and input/output files used to quantify fine-grained dietary variation within and among populations of five large-herbivore species (pronghorn, bighorn sheep, mule deer, elk, bison) in Yellowstone National Park, USA.

First, global (step 1) and local (step 2) reference libraries are built for the trnL-P6 locus. Raw sequence reads from large-herbivore fecal samples are then cleaned and prepared (step 3) for taxonomy assignment (step 4). The taxonomies assigned using the local and global reference libraries are combined (step 5) and then analyses are conducted to determine correlations between body size and key indicators of diet seasonality (step 6).

This dataset contains all code associated with:

Creating a global reference library for the trnL locus in plants (obitools_Step 1_global ref lib.sh)
Creating a local Yellowstone National Park reference library for the trnL locus in plants (obitools_Step 2_local ref lib.sh)
Preparing and cleaning sequence reads from fecal samples (obitools_Step 3_prepare sequence reads.sh)
Assigning taxonomy to cleaned sequence reads using the local and global reference libraries (obitools_Step 4_assign taxonomy.sh)
Combining the local and global reference library taxonomy assignment outputs (R_Step 5_combine local and global library outputs.R)
Data analyses in R (R_Step 6_Data analyses.R)

Local reference library files:
This dataset also includes the specimen data (bold-specimendata-DS-YNPBP-R1.xlsx), trnL input fasta (bold-trnL-DS-YNPBP-R1.fas), and output fasta (YNPP6_completeDB_20230414.fasta) that were used to create the local reference library.

Both the input and output files for the local reference library are in FASTA format, in which each sequence begins with a single-line description (plant taxonomy ID), followed by lines of sequence data for that taxon.
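The FASTA layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the deposited code: the function name `parse_fasta` and the example records are hypothetical, and it assumes the standard FASTA convention that each description line starts with `>`.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {description: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record: the description follows the '>' character.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may span several lines; join them.
            records[header] += line
    return records


# Hypothetical records, for illustration only.
example = [
    ">taxon_A hypothetical description",
    "GGGCAATCCTGAGCCAAATC",
    "CTTTTTGA",
    ">taxon_B hypothetical description",
    "GGGCAATCCTGAGCCAAGGG",
]
records = parse_fasta(example)
print(len(records), "records parsed")
```

To read one of the deposited files, the same function could be called on an open file handle, for example `parse_fasta(open("YNPP6_completeDB_20230414.fasta"))`.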

The bold-specimendata-DS-YNPBP-R1.xlsx file contains three tabs, each holding information about the plant specimens collected for the local reference library. All columns in each tab are outlined below (cells where information was not recorded for a specimen are shown as "n/a"):

"Lab Sheet" tab:

Project Code = unique identifier for the data project
Process ID = unique code automatically generated by BOLD systems for each new record added to project
Sample ID = internal identifier for the sample being sequenced
Field ID = identifier for specimen assigned in the field
BIN = Barcode index number
Catalog Num = identifier for specimen assigned by formal collection upon accessioning (museum ID)
rbcL Seq. Length = sequence length (bps) of rbcL locus for specimen
rbcL Trace Count = number of trace files for rbcL locus per specimen
rbcL Accession = GenBank accession number for rbcL specimen record
matK Seq. Length = sequence length (bps) of matK locus for specimen
matK Trace Count = number of trace files for matK locus per specimen
matK Accession = GenBank accession number for matK specimen record
trnL-F Seq. Length = sequence length (bps) of trnL-F locus for specimen
trnL-F Trace Count = number of trace files for trnL-F locus per specimen
trnL-F Accession = GenBank accession number for trnL-F specimen record
trnH-psbA Seq. Length = sequence length (bps) of trnH-psbA locus for specimen
trnH-psbA Trace Count = number of trace files for trnH-psbA locus per specimen
trnH-psbA Accession = GenBank accession number for trnH-psbA specimen record
Image Count = number of images associated with specimen on BOLD systems
Barcode Compliant = barcode index number marked as compliant if they contain at least one sequence that meets BOLD systems standards
Contamination = indicates specimen flagged for contamination
Stop Codon = indicates presence of stop codon in loci
Flagged Record = indicates specimen or sequence that was flagged as an issue
Collection Date = date of specimen collection in the field
Identification = taxonomic assignment of specimen
Life Stage = life stage of specimen
Extra Info = extra information about specimen
Voucher Type = indicates special case for accessioning process
Institution = Full name of institution that has physical possession of the voucher specimen
Notes = comments or notes regarding collection event

"Taxonomy" tab:

SampleID = internal identifier for the sample being sequenced
Phylum = scientific name of collected specimen identified to phylum
Class = scientific name of collected specimen identified to class
Order = scientific name of collected specimen identified to order
Family = scientific name of collected specimen identified to family
Subfamily = scientific name of collected specimen identified to subfamily
Tribe = scientific name of collected specimen identified to tribe
Genus = scientific name of collected specimen identified to genus
Species = scientific name of collected specimen identified to species
Subspecies = scientific name of collected specimen identified to subspecies
Identifier = Full name of primary individual who assigned the specimen to a taxonomic group
Identification Method = The method used to identify the specimen

"Collection Data" tab:

Sample ID = internal identifier for the sample being sequenced
Collectors = The full or abbreviated names of the individuals or team responsible for collecting the sample in the field
Collection Date = Date of specimen collection
Country/Ocean = country in which the specimen was collected
State/Province = state in which the specimen was collected
Region = region in which the specimen was collected
Lat = latitude at which the specimen was collected (decimal degrees)
Lon = longitude at which the specimen was collected (decimal degrees)
Elev = elevation at which the specimen was collected (m)
Habitat = habitat classification of the site where the specimen was collected
Collection Notes = Additional collection notes

Sharing/Access information

Illumina sequence data and sample metadata are available at NCBI (BioProject accession number: PRJNA780500).

Code/Software

These coding steps are designed to follow on from one another. The files created in steps 1, 2, and 3 will be used in step 4. The files created in step 4 will be used in step 5. The files created in step 5 will be used in step 6. All code is annotated.

Steps 1-4 require the following Python packages:

cutadapt
obitools

Steps 5-6 require the following R packages:

plyr
dplyr
here
tidyverse
phyloseq
vegan
vegetarian
ggplot2
reshape2
ggpubr
cowplot
car
devtools
moments
nlme
bipartite
RColorBrewer
iNEXT
cetcolor
phangorn
padr

obitools_Step 1_global ref lib.sh - code to build a global reference database for plants. We use the ecoPCR program in obitools to simulate a PCR and to extract all sequences from EMBL that may be amplified in silico by the two primers (GGGCAATCCTGAGCCAA and CCATTGAGTCTCTGCACCTATC) used for PCR amplification.

The steps for building this reference database are:

Download the whole set of EMBL sequences
Download the NCBI taxonomy
Format them into the ecoPCR format
Use ecoPCR to simulate amplification and build a reference database based on putatively amplified barcodes together with their recorded taxonomic information
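The idea behind the in silico amplification step can be illustrated with a short Python sketch. It is not the ecoPCR implementation: it only handles exact primer matches on one strand (ecoPCR's mismatch and length settings are not modelled here), and the function names and the example template are hypothetical. Only the two primer sequences come from the text above.

```python
from typing import Optional

# Primer sequences quoted in the README.
FORWARD = "GGGCAATCCTGAGCCAA"
REVERSE = "CCATTGAGTCTCTGCACCTATC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str) -> Optional[str]:
    """Return the barcode between the two primers (primers excluded).

    Assumption: exact matches only, forward strand only. Returns None
    when either primer site is missing.
    """
    template = template.upper()
    start = template.find(FORWARD)
    if start == -1:
        return None
    inner_start = start + len(FORWARD)
    # On the forward strand, the reverse primer site appears as its
    # reverse complement downstream of the forward primer.
    end = template.find(reverse_complement(REVERSE), inner_start)
    if end == -1:
        return None
    return template[inner_start:end]


# Hypothetical template: flanking bases, primers, and a made-up barcode.
template = "AAAA" + FORWARD + "ATCCGTATT" + reverse_complement(REVERSE) + "TTTT"
print(in_silico_amplicon(template))
```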

obitools_Step 2_local ref lib.sh - code to build a local Yellowstone National Park reference database for plants. We use the ecoPCR program in obitools to simulate a PCR. All local barcode sequences can be found on BOLD and can be amplified in silico by the two primers (GGGCAATCCTGAGCCAA and CCATTGAGTCTCTGCACCTATC) used for PCR amplification. The code creates the file "YNPP6_completeDB_20230414.fasta", which is included in this dataset.
The steps for building this reference database are:

Extract trnL-P6 from BOLD sequences
Format them into the ecoPCR format
Use ecoPCR to simulate amplification and build a reference database based on putatively amplified barcodes together with their recorded taxonomic information

obitools_Step 3_prepare sequence reads.sh - code to clean and prepare raw sequence reads from large-herbivore fecal samples to determine their diets.
The following steps are taken:

Remove primers from forward and reverse reads using cutadapt
Recover full sequence reads from forward and reverse reads
Remove unaligned sequence records
Dereplicate reads into unique sequences
Denoise the sequence dataset
Clean the sequences for PCR/sequencing errors

obitools_Step 4_assign taxonomy.sh - code to assign taxonomy to sequences using the global and local reference libraries in order to get the complete list of species associated with each sample. Taxonomic assignment requires a reference database compiling all species that could be identified in the sample. Assignment is then based on sequence comparison between sample sequences and reference sequences.
The following steps are taken for both global and local reference libraries:

Assign each sequence to a taxon
Generate the final result table

R_Step 5_combine local and global library outputs.R - R code to combine local and global reference library outputs.
The following steps are taken:

Subset databases to perfect matches (100% matches)
Generate summary statistics for subset databases
Make output files required to create a phyloseq object
Build the phyloseq object for further analyses
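The first of these steps, subsetting to perfect matches and merging the two libraries' assignments, can be sketched in Python as follows. This is an illustration under stated assumptions and not a port of the deposited R script: the `Assignment` type, the function names, and the example taxa are hypothetical, and the rule that a local perfect match takes precedence over a global one is an assumption that the README does not state.

```python
from typing import Dict, NamedTuple, Tuple


class Assignment(NamedTuple):
    taxon: str
    identity: float  # best-match identity to the reference library, 0-1


def perfect_matches(assignments: Dict[str, Assignment]) -> Dict[str, Assignment]:
    """Keep only sequences with a 100% match to a reference sequence."""
    return {seq: a for seq, a in assignments.items() if a.identity == 1.0}


def combine_libraries(
    local: Dict[str, Assignment], global_: Dict[str, Assignment]
) -> Dict[str, Tuple[str, str]]:
    """Merge perfect matches into {sequence: (taxon, source library)}.

    Assumption: when both libraries match perfectly, the local
    assignment overrides the global one.
    """
    combined = {
        seq: (a.taxon, "global") for seq, a in perfect_matches(global_).items()
    }
    combined.update(
        {seq: (a.taxon, "local") for seq, a in perfect_matches(local).items()}
    )
    return combined


# Hypothetical assignments, for illustration only.
local = {"AAA": Assignment("Poa", 1.0), "CCC": Assignment("Carex", 0.98)}
global_ = {
    "AAA": Assignment("Poaceae", 1.0),
    "CCC": Assignment("Cyperaceae", 1.0),
    "GGG": Assignment("Salix", 0.95),
}
print(combine_libraries(local, global_))
```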

R_Step 6_Data analyses.R - R code for all analyses conducted on this comparative dietary dataset.
The main analyses performed:

Data filtering
Rarefaction
Calculation of dietary Bray-Curtis dissimilarity
Calculation of dietary richness
Calculation of total dietary breadth
Calculation of sample uniqueness at the sample level
Calculation of sample uniqueness at the species level
Calculation of habitat Bray-Curtis dissimilarity
Creation of supplementary maps of sample collection points within Yellowstone National Park
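Two of the metrics listed above, Bray-Curtis dissimilarity and dietary richness, can be written out in a few lines of Python. This is a sketch of the standard definitions only, not the deposited R code, and the diet profiles shown are hypothetical. Bray-Curtis dissimilarity between two samples is 1 minus twice the summed minimum abundance of each taxon, divided by the total abundance across both samples.

```python
from typing import Dict


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 to 1)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1 - 2 * shared / total


def richness(profile: Dict[str, float]) -> int:
    """Number of taxa with non-zero abundance in a profile."""
    return sum(1 for value in profile.values() if value > 0)


# Hypothetical read counts for one summer and one winter sample.
summer = {"Poa": 6, "Carex": 4}
winter = {"Poa": 2, "Artemisia": 8}
print(bray_curtis(summer, winter), richness(summer))
```

Identical profiles give a dissimilarity of 0 and profiles with no shared taxa give 1, so larger seasonal values indicate stronger diet switching.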

Methods

We obtained high-resolution diet profiles for pronghorn (Antilocapra americana; 48 kg adult body mass), bighorn sheep (Ovis canadensis; 75 kg), mule deer (Odocoileus hemionus; 85 kg), elk (Cervus canadensis; 241 kg), and bison (Bison bison; 625 kg). Fresh dung samples from 1–5 individuals per herd were combined in approximately equal volume and thoroughly mixed.

We extracted DNA from 371 fecal samples and amplified the chloroplast trnL-P6 marker using PCR (Taberlet et al., 2007). To obtain dietary profiles, we produced 2 x 150 bp paired-end Nextera libraries for sequencing on Illumina MiSeq. To identify dietary DNA sequences, we developed two reference libraries: the ‘local’ library comprised 191 unique trnL-P6 sequences from 416 specimens representing 45 plant families from Yellowstone; the ‘global’ library was built using data from the European Molecular Biology Laboratory (release 143), which yielded 21,422 unique trnL-P6 sequences representing at least 615 plant families.

FastQC was used to ensure that both per-base and per-sequence quality scores exceeded Q20, and primers were removed from forward and reverse reads using cutadapt. All further sequence identifications were performed using obitools; forward and reverse sequences were aligned using the illuminapairedend command with a minimum alignment score of 40, and only joined sequences were retained. We used the obiuniq command to group identical sequences and tally them within samples, enabling us to quantify the relative read abundance (RRA) of each sequence. Sequences that occurred ≤2 times overall or that were <8 bp or >300 bp were discarded. Sequences were considered to be likely PCR artifacts if they were highly similar to another sequence (1 bp difference) and had a much lower abundance (0.05%) in the majority of samples in which they occurred; we discarded these sequences using the obiclean command.
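The dereplication, filtering, and RRA logic described above can be sketched in Python. This illustrates the idea only and does not reproduce obiuniq or obiclean: the function names and example reads are hypothetical, and only the thresholds (more than 2 occurrences overall, length between 8 and 300 bp) are taken from the text.

```python
from collections import Counter
from typing import Dict, List


def dereplicate(reads_by_sample: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Group identical sequences and tally them within each sample."""
    return {sample: Counter(reads) for sample, reads in reads_by_sample.items()}


def filter_sequences(
    counts_by_sample: Dict[str, Counter],
    min_total: int = 3,   # sequences occurring <=2 times overall are dropped
    min_len: int = 8,     # sequences shorter than 8 bp are dropped
    max_len: int = 300,   # sequences longer than 300 bp are dropped
) -> Dict[str, Counter]:
    """Discard rare sequences and sequences outside the length bounds."""
    totals: Counter = Counter()
    for counts in counts_by_sample.values():
        totals.update(counts)
    keep = {
        seq
        for seq, n in totals.items()
        if n >= min_total and min_len <= len(seq) <= max_len
    }
    return {
        sample: Counter({seq: n for seq, n in counts.items() if seq in keep})
        for sample, counts in counts_by_sample.items()
    }


def relative_read_abundance(counts: Counter) -> Dict[str, float]:
    """Proportion of a sample's retained reads assigned to each sequence."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()} if total else {}


# Hypothetical reads, for illustration only.
reads = {
    "s1": ["ACGTACGTAC"] * 3 + ["ACGT"] * 5 + ["TTTTTTTTTT"],
    "s2": ["ACGTACGTAC", "TTTTTTTTTT"],
}
filtered = filter_sequences(dereplicate(reads))
print(relative_read_abundance(filtered["s1"]))
```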

When inferring the taxonomy of dietary sequences to be included in the final diet profiles, we required a 100% match between each dietary sequence and a reference sequence from at least one of the libraries. After removing one sample with <1000 sequence reads, we rarefied the data to equal read counts (N = 1,453 reads per sample). The final dataset included 370 samples (25–162 per large herbivore species) and 685 plant taxa (94% identified to family, 65% to genus, and 42% to species). The taxonomic assignment of plant taxa was used to characterize plant functional types using the USDA Plants Database and the expert opinion of Yellowstone National Park’s botanists.
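Rarefying to equal read counts amounts to randomly subsampling each sample's reads, without replacement, down to a common depth. The Python sketch below illustrates this with the depth quoted above (1,453 reads). The deposited analysis performs this step in R, so the function name `rarefy`, the fixed seed, and the example counts are assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(counts: Counter, depth: int, seed: int = 0) -> Counter:
    """Subsample reads without replacement to a fixed depth.

    Raises ValueError when the sample has fewer reads than the target
    depth; such samples would need to be removed beforehand.
    """
    # Expand the tallies into one entry per read.
    pool = [seq for seq, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return Counter(rng.sample(pool, depth))


# Hypothetical read tallies for one sample.
sample_counts = Counter({"seq_A": 1000, "seq_B": 800, "seq_C": 200})
rarefied = rarefy(sample_counts, depth=1453)
print(sum(rarefied.values()))
```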
