Body size modulates the extent of seasonal diet switching by large mammalian herbivores in Yellowstone National Park
Data files
Nov 16, 2023 version files 526.44 KB
- bold-specimendata-DS-YNPBP-R1.xlsx
- bold-trnL-DS-YNPBP-R1.fas
- README.md
- YNPP6_completeDB_20230414.fasta
May 27, 2024 version files 526.73 KB
- bold-specimendata-DS-YNPBP-R1.xlsx
- bold-trnL-DS-YNPBP-R1.fas
- README.md
- YNPP6_completeDB_20230414.fasta
Aug 07, 2024 version files 526.95 KB
- bold-specimendata-DS-YNPBP-R1.xlsx
- bold-trnL-DS-YNPBP-R1.fas
- README.md
- YNPP6_completeDB_20230414.fasta
Abstract
Prevailing theories about animal foraging behaviours and the food webs they occupy offer divergent predictions about whether seasonally limited food availability promotes dietary diversification or specialisation. Emphasis on how animals compete for food predominates in work on the foraging ecology of large mammalian herbivores, whereas emphasis on how the diversity of available foods generally constrains dietary opportunity predominates in work on entire food webs. Reconciling predictions about what promotes dietary diversification is challenging because species’ different body sizes and mobilities modulate how they seek and compete for resources; the mechanistic bases of common predictions may not pertain to all species equally. We evaluated predictions about five large-herbivore species that differ in body size and mobility in Yellowstone National Park using GPS-tracking and dietary DNA. The data illuminated remarkably strong and significant correlations between body size and five key indicators of diet seasonality (R² = 0.71–0.80). Compared to smaller species, bison and elk showed muted diet seasonality and maintained access to more unique foods when winter conditions constrained food availability. Evidence from GPS collars revealed size-based differences in species’ seasonal movements and habitat-use patterns, suggesting that better accounting for the allometry of foraging behaviours may help reconcile disparate ideas about the ecological drivers of seasonal diet switching.
README: Body size modulates the extent of seasonal diet switching by large mammalian herbivores in Yellowstone National Park
https://doi.org/10.5061/dryad.h18931zst
CHANGES: Version 2 (May 2024) includes updated files that reflect a name change for one plant taxon present in each file.
CHANGES: Version 3 (August 2024) includes an updated R script for data analyses that adds habitat Bray-Curtis dissimilarity calculations and maps showing sample collection sites across Yellowstone National Park.
Python scripts, R scripts, and input/output files used to quantify fine-grained dietary variation within and among populations of five large-herbivore species (pronghorn, bighorn sheep, mule deer, elk, bison) in Yellowstone National Park, USA.
First, global (step 1) and local (step 2) reference libraries are built for the *trn*L-P6 locus. Raw sequence reads from large-herbivore fecal samples are then cleaned and prepared (step 3) for taxonomy assignment (step 4). The taxonomies assigned using the local and global reference libraries are combined (step 5) and then analyses are conducted to determine correlations between body size and key indicators of diet seasonality (step 6).
This dataset contains all code associated with:
- Creating global reference library for the trnL locus in plants (obitools_Step 1_global ref lib.sh)
- Creating local Yellowstone National Park reference library for the trnL locus in plants (obitools_Step 2_local ref lib.sh)
- Preparing and cleaning sequence reads from fecal samples (obitools_Step 3_prepare sequence reads.sh)
- Assigning taxonomy to cleaned sequence reads using the local and global reference libraries (obitools_Step 4_assign taxonomy.sh)
- Combining the local and global reference library taxonomy assignment outputs (R_Step 5_combine local and global library outputs.R)
- Data analyses in R (R_Step 6_Data analyses.R)
Local reference library files:
This dataset also includes the specimen data (bold-specimendata-DS-YNPBP-R1.xlsx), *trn*L input fasta (bold-trnL-DS-YNPBP-R1.fas), and output fasta (YNPP6_completeDB_20230414.fasta) that were used to create the local reference library.
Both the input and output files for the local reference library are in FASTA format, in which each sequence begins with a single-line description (plant taxonomy ID) followed by one or more lines of sequence data for that taxon.
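As a rough illustration of the FASTA layout described above, the following Python sketch reads records into a dictionary keyed by the description line. The function name `parse_fasta` and the two example records are hypothetical and are not taken from the deposited files.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each description line (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may span several lines; join them.
            records[header] += line
    return records


# Invented two-record example; identifiers and bases are placeholders.
example = """>taxon_A
GGGCAATCC
TGAGCC
>taxon_B
ATCCGT
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```

To read one of the deposited files instead, the same function could be called on an open file handle, for example `parse_fasta(open("YNPP6_completeDB_20230414.fasta"))`.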
In the bold-specimendata-DS-YNPBP-R1.xlsx, there are 3 different tabs; each tab holds information regarding the plant specimens collected for the local reference library. All columns in each tab are outlined below (cells where information wasn't recorded for a specimen are shown with "n/a"):
"Lab Sheet" tab:
- Project Code = unique identifier for the data project
- Process ID = unique code automatically generated by BOLD systems for each new record added to project
- Sample ID = internal identifier for the sample being sequenced
- Field ID = identifier for specimen assigned in the field
- BIN = Barcode index number
- Catalog Num = identifier for specimen assigned by formal collection upon accessioning (museum ID)
- rbcL Seq. Length = sequence length (bps) of rbcL locus for specimen
- rbcL Trace Count = number of trace files for rbcL locus per specimen
- rbcL Accession = GenBank accession number for rbcL specimen record
- matK Seq. Length = sequence length (bps) of matK locus for specimen
- matK Trace Count = number of trace files for matK locus per specimen
- matK Accession = GenBank accession number for matK specimen record
- trnL-F Seq. Length = sequence length (bps) of trnL-F locus for specimen
- trnL-F Trace Count = number of trace files for trnL-F locus per specimen
- trnL-F Accession = GenBank accession number for trnL-F specimen record
- trnH-psbA Seq. Length = sequence length (bps) of trnH-psbA locus for specimen
- trnH-psbA Trace Count = number of trace files for trnH-psbA locus per specimen
- trnH-psbA Accession = GenBank accession number for trnH-psbA specimen record
- Image Count = number of images associated with specimen on BOLD systems
- Barcode Compliant = indicates whether the record is marked as compliant, i.e. contains at least one sequence that meets BOLD systems standards
- Contamination = indicates specimen flagged for contamination
- Stop Codon = indicates presence of stop codon in loci
- Flagged Record = indicates specimen or sequence that was flagged as an issue
- Collection Date = date of specimen collection in the field
- Identification = taxonomic assignment of specimen
- Life Stage = life stage of specimen
- Extra Info = extra information about specimen
- Voucher Type = indicates special case for accessioning process
- Institution = Full name of institution that has physical possession of the voucher specimen
- Notes = comments or notes regarding collection event
"Taxonomy" tab:
- SampleID = internal identifier for the sample being sequenced
- Phylum = scientific name of collected specimen identified to phylum
- Class = scientific name of collected specimen identified to class
- Order = scientific name of collected specimen identified to order
- Family = scientific name of collected specimen identified to family
- Subfamily = scientific name of collected specimen identified to subfamily
- Tribe = scientific name of collected specimen identified to tribe
- Genus = scientific name of collected specimen identified to genus
- Species = scientific name of collected specimen identified to species
- Subspecies = scientific name of collected specimen identified to subspecies
- Identifier = Full name of primary individual who assigned the specimen to a taxonomic group
- Identification Method = The method used to identify the specimen
"Collection Data" tab:
- Sample ID = internal identifier for the sample being sequenced
- Collectors = The full or abbreviated names of the individuals or team responsible for collecting the sample in the field
- Collection Date = Date of specimen collection
- Country/Ocean = country in which the specimen was collected
- State/Province = state in which the specimen was collected
- Region = region in which the specimen was collected
- Lat = latitude at which the specimen was collected (decimal degrees)
- Lon = longitude at which the specimen was collected (decimal degrees)
- Elev = elevation at which the specimen was collected (m)
- Habitat = habitat classification of the site where the specimen was collected
- Collection Notes = Additional collection notes
Sharing/Access information
Illumina sequence data and sample metadata are available at NCBI (BioProject accession number: PRJNA780500).
Code/Software
These coding steps are designed to follow on from one another. The files created in steps 1, 2, and 3 will be used in step 4. The files created in step 4 will be used in step 5. The files created in step 5 will be used in step 6. All code is annotated.
Steps 1-4 require the following Python packages:
- cutadapt
- obitools
Steps 5-6 require the following R packages:
- plyr
- dplyr
- here
- tidyverse
- phyloseq
- vegan
- vegetarian
- ggplot2
- reshape2
- ggpubr
- cowplot
- car
- devtools
- moments
- nlme
- bipartite
- RColorBrewer
- iNEXT
- cetcolor
- phangorn
- padr
obitools_Step 1_global ref lib.sh - code to build a global reference database for plants. We use the ecoPCR program in obitools to simulate a PCR and extract all sequences from EMBL that may be amplified in silico by the two primers (GGGCAATCCTGAGCCAA and CCATTGAGTCTCTGCACCTATC) used for PCR amplification.
The steps for building this reference database are:
- Download the whole set of EMBL sequences
- Download the NCBI taxonomy
- Format them into the ecoPCR format
- Use ecoPCR to simulate amplification and build a reference database based on putatively amplified barcodes together with their recorded taxonomic information
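The idea behind in silico amplification can be sketched in plain Python. This is not ecoPCR and does not reproduce its options or output; it only shows how a barcode can be located between the forward primer and the reverse complement of the reverse primer. The helper names, the mismatch parameter, and the template sequence are assumptions made for illustration; only the two primer sequences come from this README.

```python
from typing import Optional

FORWARD = "GGGCAATCCTGAGCCAA"        # primer quoted in this README
REVERSE = "CCATTGAGTCTCTGCACCTATC"   # primer quoted in this README

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_site(template: str, primer: str, max_mismatch: int, start: int = 0) -> int:
    """First index >= start where primer matches within max_mismatch, else -1."""
    for i in range(start, len(template) - len(primer) + 1):
        window = template[i:i + len(primer)]
        if sum(a != b for a, b in zip(window, primer)) <= max_mismatch:
            return i
    return -1


def in_silico_amplicon(template: str, max_mismatch: int = 0) -> Optional[str]:
    """Return the sequence between the two primer sites, or None if a site is missing."""
    fwd = find_site(template, FORWARD, max_mismatch)
    if fwd < 0:
        return None
    insert_start = fwd + len(FORWARD)
    # On the forward strand the reverse primer appears as its reverse complement.
    rev = find_site(template, reverse_complement(REVERSE), max_mismatch, insert_start)
    if rev < 0:
        return None
    return template[insert_start:rev]


# Invented template: flank + forward primer + made-up barcode + reverse site + flank.
barcode = "ATCCGTATTATAGGAACAATAATTTTATTTTCTAGAAAAGG"
template = "TTTT" + FORWARD + barcode + reverse_complement(REVERSE) + "AAAA"
print(in_silico_amplicon(template))
```

Raising `max_mismatch` lets the sketch tolerate imperfect primer sites, at the cost of more spurious matches.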
obitools_Step 2_local ref lib.sh - code to build a local Yellowstone National Park reference database for plants. We use the ecoPCR program in obitools to simulate a PCR. All local barcode sequences can be found on BOLD and can be amplified in silico by the two primers (GGGCAATCCTGAGCCAA and CCATTGAGTCTCTGCACCTATC) used for PCR amplification. The code creates the file "YNPP6_completeDB_20230414.fasta", which is included in this dataset.
The steps for building this reference database are:
- Extract *trn*L-P6 from BOLD sequences
- Format them into the ecoPCR format
- Use ecoPCR to simulate amplification and build a reference database based on putatively amplified barcodes together with their recorded taxonomic information
obitools_Step 3_prepare sequence reads.sh - code to clean and prepare raw sequence reads from large-herbivore fecal samples to determine their diets.
The following steps are taken:
- Remove primers from forward and reverse reads using cutadapt
- Recover full sequence reads from forward and reverse reads
- Remove unaligned sequence records
- Dereplicate reads into unique sequences
- Denoise the sequence dataset
- Clean the sequences for PCR/sequencing errors
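The dereplication step in the list above can be illustrated with a short Python sketch. It is a stand-in for the idea only and does not call obitools; the function name `dereplicate`, the sample labels, and the reads are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def dereplicate(reads: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Collapse (sample, sequence) reads into per-sample counts for each unique sequence."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sample, seq in reads:
        # Treat upper- and lower-case bases as the same sequence (an assumption).
        counts[seq.upper()][sample] += 1
    return dict(counts)


# Invented reads from two hypothetical samples.
reads = [
    ("S1", "ATCC"),
    ("S1", "ATCC"),
    ("S2", "ATCC"),
    ("S1", "GGTA"),
    ("S2", "atcc"),
]

unique = dereplicate(reads)
for seq, per_sample in unique.items():
    print(seq, dict(per_sample))
```

Keeping the per-sample tallies alongside each unique sequence is what later allows read abundances to be compared among samples.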
obitools_Step 4_assign taxonomy.sh - code to assign taxonomy to sequences using global and local reference libraries in order to get the complete list of species associated with each sample. Taxonomic assignment of sequences requires a reference database compiling all possible species to be identified in the sample. Assignment is then done based on sequence comparison between sample sequences and reference sequences.
The following steps are taken for both global and local reference libraries:
- Assign each sequence to a taxon
- Generate the final result table
R_Step 5_combine local and global library outputs.R - R code to combine local and global reference library outputs.
The following steps are taken:
- Subset databases to perfect matches (100% matches)
- Generate summary statistics for subset databases
- Make output files required to create a phyloseq object
- Build the phyloseq object for further analyses
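The perfect-match subsetting and the merging of the two library outputs might look roughly like the Python sketch below. It is not a translation of the R script: exact string identity stands in for a 100% match, the dictionaries mapping reference sequences to taxon names are invented, and no rule for choosing between a local and a global hit is implied, so both are simply reported side by side.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Assignment(NamedTuple):
    local: Optional[str]    # taxon from the local library, if any
    global_: Optional[str]  # taxon from the global library, if any


def combine_exact_matches(
    sequences: Iterable[str],
    local_lib: Dict[str, str],
    global_lib: Dict[str, str],
) -> Dict[str, Assignment]:
    """Keep sequences that exactly match a reference in at least one library."""
    kept: Dict[str, Assignment] = {}
    for seq in sequences:
        hit = Assignment(local_lib.get(seq), global_lib.get(seq))
        if hit.local is not None or hit.global_ is not None:
            kept[seq] = hit
    return kept


# Hypothetical libraries and dietary sequences.
local_lib = {"ATCC": "Taxon L1"}
global_lib = {"ATCC": "Taxon G1", "GGTA": "Taxon G2"}
dietary = ["ATCC", "GGTA", "TTTT"]

combined = combine_exact_matches(dietary, local_lib, global_lib)
for seq, hit in combined.items():
    print(seq, hit.local, hit.global_)
```

In this example "TTTT" is dropped because it matches neither library, while "GGTA" is retained on the strength of its global match alone.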
R_Step 6_Data analyses.R - R code for all analyses conducted on this comparative dietary dataset.
The main analyses performed:
- Data filtering
- Rarefaction
- Calculation of dietary Bray-Curtis dissimilarity
- Calculation of dietary richness
- Calculation of total dietary breadth
- Calculation of sample uniqueness at the sample level
- Calculation of sample uniqueness at the species level
- Calculation of habitat Bray-Curtis dissimilarity
- Creation of supplementary maps of sample collection points within Yellowstone National Park
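For readers unfamiliar with two of the metrics listed above, here is a minimal Python sketch of Bray-Curtis dissimilarity and taxon richness, using the standard definition BC = sum(|a_i - b_i|) / sum(a_i + b_i). The analyses in this dataset are run in R; the function names and the two diet profiles here are invented for illustration.

```python
from typing import Dict


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical, 1 = disjoint)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    return diff / total


def richness(profile: Dict[str, float]) -> int:
    """Number of taxa with non-zero abundance."""
    return sum(1 for value in profile.values() if value > 0)


# Hypothetical read counts for one herd in two seasons.
summer = {"grass": 6, "forb": 4}
winter = {"grass": 2, "shrub": 8}

print(bray_curtis(summer, winter))
print(richness(summer), richness(winter))
```

The same function applies to habitat-use profiles if the dictionary keys are habitat classes rather than plant taxa.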
Methods
We obtained high-resolution diet profiles for pronghorn (Antilocapra americana; 48 kg adult body mass), bighorn sheep (Ovis canadensis; 75 kg), mule deer (Odocoileus hemionus; 85 kg), elk (Cervus canadensis; 241 kg), and bison (Bison bison; 625 kg). Fresh dung samples from 1–5 individuals per herd were combined in approximately equal volume and thoroughly mixed.
We extracted DNA from 371 fecal samples and amplified the chloroplast trnL-P6 marker using PCR (Taberlet et al., 2007). To obtain dietary profiles, we produced 2 x 150 bp paired-end Nextera libraries for sequencing on Illumina MiSeq. To identify dietary DNA sequences, we developed two reference libraries: the ‘local’ library comprised 191 unique trnL-P6 sequences from 416 specimens representing 45 plant families from Yellowstone; the ‘global’ library was built using data from the European Molecular Biology Laboratory (release 143), which yielded 21,422 unique trnL-P6 sequences representing at least 615 plant families.
FastQC was used to ensure that both per-base and per-sequence quality scores exceeded Q20, and primers were removed from forward and reverse reads using cutadapt. All further sequence identifications were performed using obitools; forward and reverse sequences were aligned using the illuminapairedend command with a minimum alignment score of 40, and only joined sequences were retained. We used the obiuniq command to group identical sequences and tally them within samples, enabling us to quantify the relative read abundance (RRA) of each sequence. Sequences that occurred ≤2 times overall or that were <8 bp or >300 bp were discarded. Sequences were considered to be likely PCR artifacts if they were highly similar to another sequence (1 bp difference) and had a much lower abundance (0.05%) in the majority of samples in which they occurred; we discarded these sequences using the obiclean command.
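The count and length filters and the RRA calculation described above can be sketched in Python as follows. The thresholds come from the paragraph above, but the function names, the example counts, and the choice to compute RRA after filtering are assumptions; the obiclean artifact step is not reproduced.

```python
from collections import defaultdict
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # sequence -> {sample: read count}


def filter_sequences(counts: Counts, min_total: int = 3,
                     min_len: int = 8, max_len: int = 300) -> Counts:
    """Drop sequences seen <= 2 times overall or outside the 8-300 bp range."""
    return {
        seq: per_sample
        for seq, per_sample in counts.items()
        if sum(per_sample.values()) >= min_total and min_len <= len(seq) <= max_len
    }


def relative_read_abundance(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Convert read counts to the proportion of each sample's reads."""
    totals: Dict[str, int] = defaultdict(int)
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] += n
    return {
        seq: {sample: n / totals[sample] for sample, n in per_sample.items()}
        for seq, per_sample in counts.items()
    }


# Invented counts for two hypothetical samples.
counts = {
    "ATCCGTATTATAGG": {"S1": 30, "S2": 10},
    "GGTACCATGGTTAA": {"S1": 10},
    "ATCCG": {"S1": 50},            # too short, removed
    "TTGACCATGACCAT": {"S2": 2},    # seen only twice overall, removed
}

kept = filter_sequences(counts)
rra = relative_read_abundance(kept)
print(rra)
```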
When inferring the taxonomy of dietary sequences to be included in the final diet profiles, we required a 100% match between each dietary sequence and a reference sequence from at least one of the libraries. After removing one sample with <1000 sequence reads, we rarefied the data to equal read counts (N = 1,453 reads per sample). The final dataset included 370 samples (25–162 per large herbivore species) and 685 plant taxa (94% identified to family, 65% to genus, and 42% to species). The taxonomic assignment of plant taxa was used to characterize plant functional types using the USDA Plants Database and the expert opinion of Yellowstone National Park’s botanists.
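Rarefying a sample to a fixed depth can be sketched in Python as below. The depth of 1,453 reads is taken from the paragraph above; the function name, the seed, the example counts, and the choice to subsample without replacement are assumptions and may differ from what the R script does.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Counter:
    """Randomly subsample reads without replacement down to a fixed depth."""
    total = sum(counts.values())
    if total < depth:
        raise ValueError(f"sample has {total} reads, fewer than depth {depth}")
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


# Hypothetical sample with 2,000 reads across three taxa.
sample = {"taxon_A": 1200, "taxon_B": 500, "taxon_C": 300}
rarefied = rarefy(sample, depth=1453)
print(sum(rarefied.values()), dict(rarefied))
```

Samples with fewer reads than the target depth raise an error in this sketch, mirroring the removal of the low-read sample before rarefaction.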