Using habitat, morphological, and genetic characteristics to delineate the subspecies of Sharp-Tailed Grouse in south-central Wyoming
Data files
Nov 05, 2025 version; files total 82.06 MB
- datasets.zip (82 MB)
- Linux_Code.zip (9.38 KB)
- RCode.zip (38.33 KB)
- README.md (15.29 KB)
Abstract
Identifying species and subspecies is the foundation for focusing conservation efforts and studying evolutionary ecology. Subspecies delineation has occurred using multiple data types, including ecological, morphological, and genetic data. There are currently seven recognized Sharp-tailed Grouse (Tympanuchus phasianellus, Linnaeus, 1758) subspecies, with two of these subspecies occurring in Wyoming: Columbian Sharp-tailed Grouse (T. p. columbianus) and plains Sharp-tailed Grouse (T. p. jamesi). There is a third population of Sharp-tailed Grouse in south-central Wyoming with an unknown subspecific identification. Historically, this population has been classified as Columbian Sharp-tailed Grouse; however, previous genetic evidence questioned this classification. To better understand the subspecific status of this south-central Wyoming population, our study used habitat characteristics, morphological characteristics, and genetic data (microsatellite loci and single-nucleotide variants) collected from known Columbian Sharp-tailed Grouse, known plains Sharp-tailed Grouse, and the south-central Wyoming population of Sharp-tailed Grouse. We modeled differences among the populations using discriminant analysis of principal components and Random Forests classification models. Across all four datasets and both modeling techniques, we found that each population (Columbian Sharp-tailed Grouse, plains Sharp-tailed Grouse, and the south-central Wyoming population of Sharp-tailed Grouse) generally represented its own cluster. Our results suggest that the population of Sharp-tailed Grouse in south-central Wyoming is different from both Columbian and plains Sharp-tailed Grouse. We recommend further evaluation of the subspecies of Sharp-tailed Grouse using more targeted phylogenomic studies to identify if Sharp-tailed Grouse in south-central Wyoming represent a separate subspecies or are a distinct population of another subspecies. 
Our results potentially change our understanding of Columbian Sharp-tailed Grouse distribution and management and highlight the importance of using a more comprehensive approach to identifying subspecies.
Dryad DOI: https://doi.org/10.5061/dryad.gf1vhhmzz
Data Prepared by:
Jonathan D. Lautenbach, Department of Ecosystem Science and Management, University of Wyoming, 1000 E University Ave, Laramie, WY 82071.
Email: jonathan.lautenbach@gmail.com
Associated manuscript: Lautenbach, J.D., A.J. Gregory, S. Galla, A.C. Pratt, M.A. Schroeder, and J.L. Beck. For Review. Using ecological, morphological, and genetic data to delineate the subspecies of Sharp-tailed Grouse (Tympanuchus phasianellus, Linnaeus, 1758) in south-central Wyoming
Contains data on Sharp-tailed Grouse habitat association at eBird observation locations; data on Sharp-tailed Grouse morphological features collected at capture sites across different study areas; and microsatellite loci data from genetic samples collected from capture locations (plains and unknown Sharp-tailed Grouse) and hunter-harvested wing samples (Columbian Sharp-tailed Grouse).
All datasets (habitat association, morphology, microsatellite, and SNVs) are saved in the compressed folder 'datasets.zip'. All R code is saved in the compressed folder 'RCode.zip'. All Linux code is saved in the compressed folder 'Linux_Code.zip'.
NOTE: We do not recommend opening the CSV files in Excel. Large files can exceed the number of rows Excel allows and will be truncated. Consider loading the files directly into Program R or another tool.
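As an illustration of loading a CSV without a spreadsheet program, the following is a minimal sketch in Python using only the standard library. It is not part of the authors' R code. The helper name `read_rows` and the two sample rows are hypothetical, and treating the literal string `NA` as missing is an assumption based on the variable descriptions in this README.

```python
import csv
import io


def read_rows(handle):
    """Yield one dict per CSV row, converting the literal string 'NA' to None."""
    for row in csv.DictReader(handle):
        yield {key: (None if value == "NA" else value) for key, value in row.items()}


# Hypothetical two-row stand-in for a file such as habitat.csv (values are invented).
sample = io.StringIO("UID,observation_count,population\n1,3,STGRu\n2,NA,STGRc\n")
rows = list(read_rows(sample))
print(len(rows), rows[1]["observation_count"])
```

To read a real file, the same helper could be given an open file handle, for example `open("habitat.csv", newline="")`, assuming the file has been extracted from 'datasets.zip' into the working directory.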
Datasets
All datasets are saved in the compressed folder 'datasets.zip'. This folder contains the following files: habitat.csv, morphology.csv, microsatellite.csv, pileup_merged.gds, and SNVsFiltered.csv. Each of these datasets is described below.
Description of the Data and file structure.
Habitat Association Data
Description: These data are derived from eBird checklists (full download available from eBird.org). For a fuller description of the checklist variables (observation_count, common_name, state_code, protocol_type, all_species_reported, observation_date, year, day_of_year, hours_of_day, effort_hours, effort_distance_km, effort_speed_kmph, and number_observers), see eBird.org or the Best Practices for Using eBird Data document (https://ebird.github.io/ebird-best-practices/). Environmental variables were obtained from the Rangeland Analysis Platform (RAP; Robinson et al. 2019, Allred et al. 2021, Jones et al. 2021), the annual National Land Cover Database (NLCD; Jin et al. 2019), and PRISM climate data (PRISM; PRISM Climate Group 2014), or were derived from a Digital Elevation Model (DEM; USGS 2011). For more information on how these were used, see the Methods section of the manuscript.
File: habitat.csv
Variables:
- UID: unique observation ID
- observation_count: number of individuals observed during the checklist; NA indicates that the species/subspecies was present but no count was obtained
- common_name: common name of species observed (either Lesser Prairie-Chicken or Sharp-tailed Grouse)
- state_code: state code of the location of observation
- protocol_type: checklist protocol type, see eBird for description of the different protocols
- all_species_reported: were all species seen on the checklist reported (TRUE or FALSE)
- observation_date: date of the observation
- year: year of observation
- day_of_year: day of the year of observation (1–365[6])
- hours_of_day: Hour of the day of observation (1–24)
- effort_hours: hours of effort reported for checklist
- effort_distance_km: distance traveled (km; i.e., how far did the observer travel); 0 for non-traveling checklists
- effort_speed_kmph: speed traveled (km/h) during checklist (effort distance/effort in hours); 0 for non-traveling checklists
- number_observers: number of observers in the party for the checklist
- population: study population: LEPC = Lesser Prairie-Chicken; STGRc = Columbian Sharp-tailed Grouse; STGRp = plains Sharp-tailed Grouse; STGRu = unknown Sharp-tailed Grouse (focal population in south-central Wyoming)
- annualbio: average annual herbaceous biomass (lbs/acre) within 1,500 m of observation location, Rangeland Analysis Platform (RAP) data
- annualcov: average percent annual herbaceous cover within 1,500 m of observation location, RAP data
- bare: average percent bare ground within 1,500 m of observation location, RAP
- conif: average percent coniferous forest canopy cover within 1,500 m of an observation location; derived from both RAP and National LandCover Database (NLCD; NLCD class 42)
- crops: average percent of land within 1,500 m of an observation location classified as cropland (NLCD class 82); NLCD
- decid: average percent deciduous forest canopy cover within 1,500 m of an observation location; derived from both RAP and NLCD (NLCD class 41)
- devel: percent of land within 1,500 m of an observation location classified as developed lands (NLCD classes 21, 22, 23, and 24); NLCD
- emergwet: percent of land within 1,500 m of an observation location classified as emergent wetlands (NLCD classes 90 and 95); NLCD
- litter: average percent litter within 1,500 m of an observation location; RAP
- mixed: average percent mixed forest canopy cover within 1,500 m of an observation location; derived from both RAP and NLCD (NLCD class 43)
- pasture: percent of land within 1,500 m of an observation location classified as pasture lands (NLCD class 81); NLCD
- perennialbio: average perennial herbaceous biomass (lbs/acre) within 1,500 m of an observation location; RAP
- perennialcov: average percent perennial herbaceous cover within 1,500 m of an observation location; RAP
- shrub: average percent cover of shrubs within 1,500 m of an observation location; RAP
- tree: average tree canopy cover (all trees) within 1,500 m of an observation location; RAP
- water: percent of land within 1,500 m of an observation location classified as water (NLCD class 11); NLCD
- TPI: Topographic Position Index derived from DEM
- TRI: Terrain Ruggedness Index derived from DEM
- HLI: Heat Load Index derived from DEM
- precip: 30-year average precipitation (mm) from PRISM data
- maxtemp: 30-year average maximum temperature (°C) from PRISM climate data
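The `effort_speed_kmph` variable is a speed in kilometers per hour, that is, distance traveled divided by time spent. The following Python sketch illustrates that relationship, including the convention of 0 for non-traveling checklists. The function is hypothetical and is not taken from the authors' R code, and the input values are invented for illustration.

```python
def effort_speed_kmph(effort_distance_km, effort_hours):
    """Speed in km/h: distance divided by time; 0 for non-traveling checklists."""
    if effort_distance_km == 0:
        return 0.0
    if effort_hours <= 0:
        raise ValueError("effort_hours must be positive for a traveling checklist")
    return effort_distance_km / effort_hours


# Hypothetical checklists (values invented for illustration).
traveling = effort_speed_kmph(effort_distance_km=3.0, effort_hours=1.5)
stationary = effort_speed_kmph(effort_distance_km=0.0, effort_hours=0.5)
print(traveling, stationary)
```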
Morphology dataset
Description: Morphological measurements from different populations of Tympanuchus grouse
File: morphology.csv
Variables:
- UniqueID: unique identifier of the individual captured
- species: species/population of the individual captured: LEPC = Lesser Prairie-Chicken; STGRc = Columbian Sharp-tailed Grouse; STGRp = plains Sharp-tailed Grouse; STGRu = unknown Sharp-tailed Grouse
- sex: sex of bird captured; M = Male; F = Female
- age: age of bird captured: SY = second year (~10–12 months old); ASY = after second year (≥22 months old); AHY = after hatch year (≥10 months old)
- year: year bird was captured
- date: date bird was captured
- time: time of day the bird was captured
- date_time: date and time the bird was captured
- month: month of year the bird was captured: 3 = March; 4 = April; 5 = May.
- capture_method: method used to capture the bird: Funnel = walk-in funnel traps (Haukos et al. 1990, Schroeder and Braun 1991); Dropnet = drop nets (Silvy et al. 1992); NA = method not recorded
- state: US state the bird was captured in
- day_of_year: day of the year on which the bird was captured (1–365[6])
- tail: tail length (mm)
- wing: flattened wing chord length (mm)
- head: total head length, including culmen (mm)
- beak: culmen length (mm)
- tarsustoe: tarsus + longest toe measurement (mm)
- mass: mass of bird (g)
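Both habitat.csv and morphology.csv include a `day_of_year` column with a range of 1–365[6], where the bracketed 6 covers leap years. The Python sketch below shows how such a value follows from a calendar date. It is an illustration only: the function is hypothetical, and the dates are arbitrary examples rather than capture dates from the dataset.

```python
from datetime import date


def day_of_year(d):
    """Return the ordinal day within the year (1 = 1 January)."""
    return d.timetuple().tm_yday


# Arbitrary example dates: the same calendar day shifts by one in a leap year.
march_first_common = day_of_year(date(2023, 3, 1))
march_first_leap = day_of_year(date(2024, 3, 1))
last_day_leap = day_of_year(date(2024, 12, 31))
print(march_first_common, march_first_leap, last_day_leap)
```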
Microsatellite Loci Genotype data
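Description: Microsatellite genotypes, recorded as two alleles per locus. For a fuller description of the loci, see the Methods section of the manuscript, which cites where these loci were originally published.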
File: microsatellite.csv
Variables:
- birdID: unique identifier of the sample
- Species: subspecies/population of sample: cSTGR = Columbian Sharp-tailed Grouse; pSTGR = plains Sharp-tailed Grouse; uSTGR = unknown Sharp-tailed Grouse
- Sex: sex of bird: M = Male; F = Female; NA = unknown sex (wing samples, specimens that have not been sexed, or data not recorded when captured)
- ADL2301.1: first allele for the adl230 locus
- ADL2301.2: second allele for the adl230 locus
- BG16.1: first allele for the bg16 locus
- BG16.2: second allele for the bg16 locus
- LLSD7.1: first allele for the llsd7 locus
- LLSD7.2: second allele for the llsd7 locus
- LLST1.1: first allele for the llst1 locus
- LLST1.2: second allele for the llst1 locus
- SGMS066.1: first allele for the sgms06.6 locus
- SGMS066.2: second allele for the sgms06.6 locus
- SGMS068.1: first allele for the sgms06.8 locus
- SGMS068.2: second allele for the sgms06.8 locus
- SG28.1: first allele for the sg28 locus
- SG28.2: second allele for the sg28 locus
- TTD6.1: first allele for the ttd6 locus
- TTD6.2: second allele for the ttd6 locus
- TUT4.1: first allele for the tut4 locus
- TUT4.2: second allele for the tut4 locus
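Because each locus occupies two columns (suffix `.1` for the first allele and `.2` for the second), it can be convenient to group them into one genotype per locus. The following Python sketch shows one way to do so. The function name is hypothetical, the example record is invented (the allele values are not taken from the data), and the sketch assumes every non-identifier column follows the `<LOCUS>.1` / `<LOCUS>.2` naming pattern listed above.

```python
from collections import defaultdict


def genotypes_by_locus(row, id_columns=("birdID", "Species", "Sex")):
    """Group '<LOCUS>.1' / '<LOCUS>.2' columns into {locus: (allele1, allele2)}."""
    alleles = defaultdict(dict)
    for column, value in row.items():
        if column in id_columns:
            continue
        # Split on the last '.', so 'BG16.1' becomes locus 'BG16' and copy '1'.
        locus, _, copy = column.rpartition(".")
        alleles[locus][copy] = value
    return {locus: (pair.get("1"), pair.get("2")) for locus, pair in alleles.items()}


# Hypothetical record: the allele values below are invented for illustration.
record = {"birdID": "X1", "Species": "uSTGR", "Sex": "M",
          "BG16.1": "150", "BG16.2": "154", "TTD6.1": "120", "TTD6.2": "120"}
genotypes = genotypes_by_locus(record)
print(genotypes)
```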
Filtered Single Nucleotide Variants (SNVs) data
Description: Filtered SNVs used in analyses. SNVs were filtered/pruned using SNPRelate::snpgdsLDpruning. For more details, see the methods in the manuscript and R code. Both the SCWY_STGR_subspecies_Analyses.Rmd and SCWY_STGR_subspecies_AppendixA.Rmd files have the pruning methods included.
File: SNVsFiltered.csv
Variables:
- SampleID: sample name (in SRA); samples are labeled SPECIES_BIRDID, where the species codes are LEPC (Lesser Prairie-Chicken), STGRc (Columbian Sharp-tailed Grouse), STGRp (plains Sharp-tailed Grouse), and STGRu (unknown Sharp-tailed Grouse)
- Columns 2–453: SNV locations relative to the Lesser Prairie-Chicken genome
- pop: population (see the species codes under SampleID above)
- birdID: individual bird ID
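Since sample names follow the SPECIES_BIRDID pattern, the species code and bird ID can be recovered from a SampleID. The Python sketch below is a minimal illustration. It assumes the first underscore separates the two parts, and the sample name `STGRu_0001` is hypothetical, not an actual sample in the dataset.

```python
SPECIES_CODES = {
    "LEPC": "Lesser Prairie-Chicken",
    "STGRc": "Columbian Sharp-tailed Grouse",
    "STGRp": "plains Sharp-tailed Grouse",
    "STGRu": "unknown Sharp-tailed Grouse",
}


def split_sample_id(sample_id):
    """Split 'SPECIES_BIRDID' at the first underscore and validate the species code."""
    species, _, bird_id = sample_id.partition("_")
    if species not in SPECIES_CODES or not bird_id:
        raise ValueError(f"unexpected SampleID format: {sample_id!r}")
    return species, bird_id


# Hypothetical sample name, used only to illustrate the naming pattern.
code, bird = split_sample_id("STGRu_0001")
print(code, bird, SPECIES_CODES[code])
```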
Raw Single Nucleotide Variants (including single nucleotide polymorphisms [SNPs] and insertions and deletions [INDELs])
Description: Raw, unfiltered, and unpruned SNV data used to generate the SNVsFiltered.csv dataset above. These are the outputs of the Clair3 pileup model, converted in Program R to a .gds file for use with the SNPRelate package. The original .vcf.gz file is created from the sequence data using the Linux code (described below). We included the .gds file rather than the .vcf.gz file to save time for anyone wanting to check the pruning code: converting from .vcf.gz to .gds took ~450 minutes, and the .vcf.gz file can be recreated from the raw sequence data. This file is fairly large (474 MB).
File: pileup_merged.gds
R-Code
Description: The compressed folder 'RCode.zip' contains all of the R code used to produce the eBird/habitat association dataset and to analyze the data. This folder contains the following files: SCWY_STGR_subspecies_Analyses.Rmd, SCWY_STGR_subspecies_AppendixA.Rmd, STGR_subspecies_Raster_Prep.Rmd, and SCWY_STGR_subspecies_eBirdFiltering.Rmd. A brief description of each code file is given below, and each file also begins with a description of its contents. The code was run in RStudio; if you do not use RStudio, you may need to open these files in a text editor and copy and paste the code into R. Note: this code was developed on a Windows machine, and some aspects of it may need to be changed on other operating systems.
Main Manuscript analyses
File: SCWY_STGR_subspecies_Analyses.Rmd
Description: This file contains all data manipulation and analyses for the results presented in the main manuscript. All analyses were conducted in Program R within an RStudio project (.rproj) to keep all of the data in one place. If you use an .rproj, you will need to update the file paths in the script to match where you saved the data. All analyses were run on a computer running Windows 11 Enterprise with 128 GB RAM and a 12th-gen Intel Core i9-12900K 3.20 GHz processor.
Appendix A analyses
File: SCWY_STGR_subspecies_AppendixA.Rmd
Description: This file contains all data manipulation and analyses for the results presented in Appendix A of the manuscript. All analyses were conducted in Program R within an RStudio project (.rproj) to keep all of the data in one place. If you use an .rproj, you will need to update the file paths in the script to match where you saved the data. All analyses were run on a computer running Windows 11 Enterprise with 128 GB RAM and a 12th-gen Intel Core i9-12900K 3.20 GHz processor.
Habitat raster data preparation
File: STGR_subspecies_Raster_Prep.Rmd
Description: This R Markdown document contains the code used to prepare the raster layers of environmental data. Specifically, it calculates forest canopy cover from RAP and NLCD forest data, calculates topographic variables (HLI, TPI, and TRI) from a Digital Elevation Model, and generates binary landcover data from annual NLCD data. A description of all of the environmental data used can be found in the 'habitat.csv' description above. We do not include the raw raster data because these datasets are very large (>500 GB in total) and can be readily downloaded (a description of where these data were downloaded can be found in the Methods section of the manuscript and at the beginning of this code).
eBird data manipulation and filtering
File: SCWY_STGR_subspecies_eBirdFiltering.Rmd
Description: Contains code used to filter eBird observations according to the 'Best Practices for Using eBird Data' document (https://ebird.github.io/ebird-best-practices/). This code produces the observations that are used in the 'habitat.csv' file described above. eBird observations and records are archived and freely and publicly accessible through eBird, but the eBird terms and conditions do not allow third-party redistribution. Permission to download the eBird database can be requested from eBird (https://science.ebird.org/en/use-ebird-data/download-ebird-data-products). Note that we used the October 2023 database for our analyses, so data downloaded later may differ slightly.
Linux Code
All of the Linux code used to run the bioinformatics on the sequencing data is saved in the 'Linux_Code.zip' folder. This folder contains 8 bash scripts and a 'workflow.txt' document that explains what each script does and the order in which the scripts need to be run.
Description of Linux code (bioinformatics for sequencing data)
File: Linux_Code.zip
Description: Zip folder containing bash scripts to process raw sequence reads. Sequence reads are available on the SRA. Please note that you will need to modify this code to run on different machines. Programs used are described in the Methods section of the manuscript. 'pathtofiles' is a placeholder for the location where you saved the data. The Lesser Prairie-Chicken reference genome can be found on GenBank (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_026119805.1/). This code was run on a machine with a 12th-generation Intel Core i9-12900K 3.20 GHz processor (16 cores, 24 threads), 128 GB RAM, and the Ubuntu 22.04 operating system.
Raw sequence reads are deposited in the SRA (BioProject PRJNA1196947).
