Using habitat, morphological, and genetic characteristics to delineate the subspecies of Sharp-Tailed Grouse in south-central Wyoming

Lautenbach, Jonathan1; Gregory, Andrew2; Galla, Stephanie3; Pratt, Aaron4; Schroeder, Michael5; Beck, Jeffrey1

Published Nov 05, 2025 on Dryad. https://doi.org/10.5061/dryad.gf1vhhmzz

Data files

Nov 05, 2025 version files 82.06 MB

datasets.zip: 82 MB
Linux_Code.zip: 9.38 KB
RCode.zip: 38.33 KB
README.md: 15.29 KB

Abstract

Identifying species and subspecies is the foundation for focusing conservation efforts and studying evolutionary ecology. Subspecies delineation has occurred using multiple data types, including ecological, morphological, and genetic data. There are currently seven recognized Sharp-tailed Grouse (Tympanuchus phasianellus, Linnaeus, 1758) subspecies, with two of these subspecies occurring in Wyoming: Columbian Sharp-tailed Grouse (T. p. columbianus) and plains Sharp-tailed Grouse (T. p. jamesi). There is a third population of Sharp-tailed Grouse in south-central Wyoming with an unknown subspecific identification. Historically, this population has been classified as Columbian Sharp-tailed Grouse; however, previous genetic evidence questioned this classification. To better understand the subspecific status of this south-central Wyoming population, our study used habitat characteristics, morphological characteristics, and genetic data (microsatellite loci and single-nucleotide variants) collected from known Columbian Sharp-tailed Grouse, known plains Sharp-tailed Grouse, and the south-central Wyoming population of Sharp-tailed Grouse. We modeled differences among the populations using discriminant analysis of principal components and Random Forests classification models. Across all four datasets and both modeling techniques, we found that each population (Columbian Sharp-tailed Grouse, plains Sharp-tailed Grouse, and the south-central Wyoming population of Sharp-tailed Grouse) generally represented its own cluster. Our results suggest that the population of Sharp-tailed Grouse in south-central Wyoming is different from both Columbian and plains Sharp-tailed Grouse. We recommend further evaluation of the subspecies of Sharp-tailed Grouse using more targeted phylogenomic studies to identify if Sharp-tailed Grouse in south-central Wyoming represent a separate subspecies or are a distinct population of another subspecies. 
Our results potentially change our understanding of Columbian Sharp-tailed Grouse distribution and management and highlight the importance of using a more comprehensive approach to identifying subspecies.

Dryad DOI: https://doi.org/10.5061/dryad.gf1vhhmzz

Data Prepared by:

Jonathan D. Lautenbach, Department of Ecosystem Science and Management, University of Wyoming, 1000 E University Ave, Laramie, WY 82071.

Email: jonathan.lautenbach@gmail.com

Associated manuscript: Lautenbach, J.D., A.J. Gregory, S. Galla, A.C. Pratt, M.A. Schroeder, and J.L. Beck. For Review. Using ecological, morphological, and genetic data to delineate the subspecies of Sharp-tailed Grouse (Tympanuchus phasianellus, Linnaeus, 1758) in south-central Wyoming

Contains data on Sharp-tailed Grouse habitat association at eBird observation locations; data on Sharp-tailed Grouse morphological features collected at capture sites across different study areas; and microsatellite loci data from genetic samples collected from capture locations (plains and unknown Sharp-tailed Grouse) and hunter-harvested wing samples (Columbian Sharp-tailed Grouse).

All data sets (habitat association, morphology, microsatellite, and SNVs) are saved in the compressed folder 'datasets.zip.' All R code is saved in the compressed folder 'RCode.zip.' All Linux code is saved in the compressed folder 'Linux_Code.zip.'

NOTE: It is not recommended to open CSV files in Excel. Large files extend beyond the allowable number of rows and are clipped. Consider loading files directly into Program R or elsewhere.
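
As a minimal sketch of loading a file outside Excel, the Python standard library can stream a CSV row by row and count its rows without truncation. The helper names `count_rows` and `exceeds_excel_limit`, the UTF-8 encoding, and the example path are illustrative assumptions and are not part of the archived code; 1,048,576 is the per-worksheet row limit in current Excel versions.

```python
import csv

EXCEL_MAX_ROWS = 1_048_576  # per-worksheet row limit in current Excel versions


def count_rows(path):
    """Count data rows (header excluded) by streaming the CSV file."""
    # Assumption: the file is UTF-8 encoded and has a single header row.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return sum(1 for _ in reader)


def exceeds_excel_limit(n_data_rows):
    """True if the data rows plus one header row cannot fit in one Excel sheet."""
    return n_data_rows + 1 > EXCEL_MAX_ROWS
```

For example, `exceeds_excel_limit(count_rows("datasets/habitat.csv"))` would report whether that file would be clipped in Excel; the path is hypothetical and depends on where 'datasets.zip' was extracted.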

Datasets

All datasets are saved in the compressed folder 'datasets.zip.' This folder contains the following files: habitat.csv, morphology.csv, microsatellite.csv, pileup_merged.gds, and SNVsFiltered.csv. Each of these datasets is described below.

Description of the Data and file structure.

Habitat Association Data

Description: These data are derived from eBird checklists (full download available from eBird.org). For a fuller description of the checklist variables (observation_count, common_name, state_code, protocol_type, all_species_reported, observation_date, year, day_of_year, hours_of_day, effort_hours, effort_distance_km, effort_speed_kmph, and number_observers), see eBird.org or the Best Practices for Using eBird Data document (https://ebird.github.io/ebird-best-practices/). Environmental variables were obtained from the Rangeland Analysis Platform (RAP; Robinson et al. 2019, Allred et al. 2021, Jones et al. 2021), the annual National LandCover Database (NLCD; Jin et al. 2019), and PRISM climate data (PRISM; PRISM Climate Group 2014), or were derived from a Digital Elevation Model (DEM; USGS 2011). For more information on how these were used, see the Methods section of the manuscript.

File: habitat.csv

Variables:

UID: unique observation ID
observation_count: number of individuals observed during the checklist; NA indicates that the species/subspecies was present but no count was obtained
common_name: common name of species observed (either Lesser Prairie-Chicken or Sharp-tailed Grouse)
state_code: state code of the location of observation
protocol_type: checklist protocol type, see eBird for description of the different protocols
all_species_reported: whether all species seen were reported on the checklist (TRUE or FALSE)
observation_date: date of the observation
year: year of observation
day_of_year: day of the year of observation (1–365[6])
hours_of_day: hour of the day of observation (1–24)
effort_hours: hours of effort reported for checklist
effort_distance_km: distance traveled (km; i.e., how far did the observer travel); 0 for non-traveling checklists
effort_speed_kmph: speed traveled (kmph) during checklist (effort distance/effort in hours); 0 for non-traveling checklists
number_observers: number of observers in the party for the checklist
population: study population: LEPC = Lesser Prairie-Chicken; STGRc = Columbian Sharp-tailed Grouse; STGRp = plains Sharp-tailed Grouse; STGRu = unknown Sharp-tailed Grouse (focal population in south-central Wyoming)
annualbio: average annual herbaceous biomass (lbs/acre) within 1,500 m of an observation location; Rangeland Analysis Platform (RAP)
annualcov: average percent annual herbaceous cover within 1,500 m of an observation location; RAP
bare: average percent bare ground within 1,500 m of an observation location; RAP
conif: average percent coniferous forest canopy cover within 1,500 m of an observation location; derived from both RAP and National LandCover Database (NLCD; NLCD class 42)
crops: average percent of land within 1,500 m of an observation location classified as cropland (NLCD class 82); NLCD
decid: average percent deciduous forest canopy cover within 1,500 m of an observation location; derived from both RAP and NLCD (NLCD class 41)
devel: percent of land within 1,500 m of an observation location classified as developed lands (NLCD classes 21, 22, 23, and 24); NLCD
emergwet: percent of land within 1,500 m of an observation location classified as emergent wetlands (NLCD classes 90 and 95); NLCD
litter: average percent litter within 1,500 m of an observation location; RAP
mixed: average percent mixed forest canopy cover within 1,500 m of an observation location; derived from both RAP and NLCD (NLCD class 43)
pasture: percent of land within 1,500 m of an observation location classified as pasture lands (NLCD class 81); NLCD
perennialbio: average perennial herbaceous biomass (lbs/acre) within 1,500 m of an observation location; RAP
perennialcov: average percent perennial herbaceous cover within 1,500 m of an observation location; RAP
shrub: average percent cover of shrubs within 1,500 m of an observation location; RAP
tree: average tree canopy cover (all trees) within 1,500 m of an observation location; RAP
water: percent of land within 1,500 m of an observation location classified as water (NLCD class 11); NLCD
TPI: Topographic Position Index derived from DEM
TRI: Terrain Ruggedness Index derived from DEM
HLI: Heat Load Index derived from DEM
precip: 30-year average precipitation (mm) from PRISM data
maxtemp: 30-year average maximum temperature (°C) from PRISM climate data
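
Because `effort_speed_kmph` is expressed in kilometers per hour, it corresponds to distance divided by time. The sketch below recomputes it from `effort_distance_km` and `effort_hours`, returning 0 for non-traveling checklists as described above. The function name and the error handling are illustrative assumptions, not the authors' code.

```python
def effort_speed_kmph(effort_distance_km, effort_hours):
    """Return checklist travel speed in km/h; 0 for non-traveling checklists."""
    if effort_distance_km == 0:
        # Non-traveling checklists have zero distance, hence zero speed.
        return 0.0
    if effort_hours <= 0:
        # Assumption: a traveling checklist must report a positive duration.
        raise ValueError("effort_hours must be positive for a traveling checklist")
    return effort_distance_km / effort_hours
```

A checklist covering 3 km in 1.5 hours therefore has a speed of 2 km/h.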

Morphology dataset

Description: Morphological measurements from different populations of Tympanuchus grouse

File: morphology.csv

Variables:

UniqueID: unique identifier of the individual captured
species: species/population of the individual captured: LEPC = Lesser Prairie-Chicken; STGRc = Columbian Sharp-tailed Grouse; STGRp = plains Sharp-tailed Grouse; STGRu = unknown Sharp-tailed Grouse
sex: sex of bird captured; M = Male; F = Female
age: age of bird captured: SY = second year (~10–12 months old); ASY = after second year (≥22 months old); AHY = after hatch year (≥10 months old)
year: year bird was captured
date: date bird was captured
time: time of day the bird was captured
date_time: date and time the bird was captured
month: month of year the bird was captured: 3 = March; 4 = April; 5 = May
capture_method: method used to capture the bird: Funnel = walk-in funnel traps (Haukos et al. 1990, Schroeder and Braun 1991); Dropnet = drop nets (Silvy et al. 1992); NA = method not recorded
state: US state the bird was captured in
day_of_year: day of the year on which the bird was captured (1–365[6])
tail: tail length (mm)
wing: flattened wing chord length (mm)
head: total head length, including culmen (mm)
beak: culmen length (mm)
tarsustoe: tarsus + longest toe measurement (mm)
mass: mass of bird (g)
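
The `day_of_year` range of 1–365[6] reflects leap years, in which 31 December is day 366. A minimal sketch of deriving this value from a date follows; it assumes dates written as ISO 'YYYY-MM-DD' strings, which may not match the format stored in the `date` column, and the function name is illustrative.

```python
from datetime import date


def day_of_year(iso_date):
    """Return the 1-based day of the year for a 'YYYY-MM-DD' date string."""
    # Assumption: ISO-formatted input; reformat the date first if it differs.
    return date.fromisoformat(iso_date).timetuple().tm_yday
```

Under this convention, 1 March is day 60 in a common year and day 61 in a leap year.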

Microsatellite Loci Genotype data

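Description: Microsatellite genotypes for each sample, with two columns (one per allele) for each locus. For a fuller description of the loci and where they were originally published, see the Methods section of the manuscript.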

File: microsatellite.csv

Variables:

birdID: unique identifier of the sample
Species: subspecies/population of sample: cSTGR = Columbian Sharp-tailed Grouse; pSTGR = plains Sharp-tailed Grouse; uSTGR = unknown Sharp-tailed Grouse
Sex: sex of bird: M = Male; F = Female; NA = unknown sex (wing samples, specimens that have not been sexed, or data not recorded when captured)
ADL2301.1: first allele for the adl230 locus
ADL2301.2: second allele for the adl230 locus
BG16.1: first allele for the bg16 locus
BG16.2: second allele for the bg16 locus
LLSD7.1: first allele for the llsd7 locus
LLSD7.2: second allele for the llsd7 locus
LLST1.1: first allele for the llst1 locus
LLST1.2: second allele for the llst1 locus
SGMS066.1: first allele for the sgms06.6 locus
SGMS066.2: second allele for the sgms06.6 locus
SGMS068.1: first allele for the sgms06.8 locus
SGMS068.2: second allele for the sgms06.8 locus
SG28.1: first allele for the sg28 locus
SG28.2: second allele for the sg28 locus
TTD6.1: first allele for the ttd6 locus
TTD6.2: second allele for the ttd6 locus
TUT4.1: first allele for the tut4 locus
TUT4.2: second allele for the tut4 locus
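
Each locus occupies a pair of columns ending in `.1` and `.2`. As a hedged sketch, one row of the file (for example, a dictionary from `csv.DictReader`) can be regrouped into one genotype per locus. The function name is an assumption, and the allele values in the usage example are made up for illustration only.

```python
def genotypes_by_locus(row):
    """Group 'LOCUS.1'/'LOCUS.2' columns of one sample row into {locus: (a1, a2)}."""
    loci = {}
    for column, value in row.items():
        # Split on the last dot: 'BG16.1' -> ('BG16', '.', '1').
        locus, sep, allele_index = column.rpartition(".")
        if sep and allele_index in ("1", "2"):
            loci.setdefault(locus, {})[allele_index] = value
    # Columns without a '.1'/'.2' suffix (birdID, Species, Sex) are ignored.
    return {locus: (alleles.get("1"), alleles.get("2")) for locus, alleles in loci.items()}
```

For a row such as `{"birdID": "A1", "BG16.1": "140", "BG16.2": "144"}` (hypothetical values), the function returns `{"BG16": ("140", "144")}`.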

Filtered Single Nucleotide Variants (SNVs) data

Description: Filtered SNVs used in analyses. SNVs were filtered/pruned using SNPRelate::snpgdsLDpruning. For more details, see the methods in the manuscript and R code. Both the SCWY_STGR_subspecies_Analyses.Rmd and SCWY_STGR_subspecies_AppendixA.Rmd files have the pruning methods included.

File: SNVsFiltered.csv

Variables:

SampleID: sample name (in SRA); samples are labeled SPECIES_BIRDID, where SPECIES is one of the species codes LEPC (Lesser Prairie-Chicken), STGRc (Columbian Sharp-tailed Grouse), STGRp (plains Sharp-tailed Grouse), or STGRu (unknown Sharp-tailed Grouse)
Columns 2–453: SNV locations relative to the Lesser Prairie-Chicken genome
pop: population (see the species codes under SampleID above)
birdID: individual bird ID
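
Since sample names follow the SPECIES_BIRDID pattern, the species code and bird ID can be recovered by splitting on the first underscore. This is a minimal sketch: the function name is an assumption, and the bird ID in the usage example is invented for illustration.

```python
KNOWN_SPECIES_CODES = {"LEPC", "STGRc", "STGRp", "STGRu"}


def split_sample_id(sample_id):
    """Split a 'SPECIES_BIRDID' sample name into (species_code, bird_id)."""
    # Split on the first underscore only, in case a bird ID contains underscores.
    species, sep, bird_id = sample_id.partition("_")
    if not sep or species not in KNOWN_SPECIES_CODES:
        raise ValueError(f"unexpected SampleID format: {sample_id!r}")
    return species, bird_id
```

For a hypothetical sample named `STGRu_0042`, the function returns `("STGRu", "0042")`.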

Raw Single Nucleotide Variants (including single nucleotide polymorphisms [SNPs] and insertions and deletions [INDELs])

Description: Raw, unfiltered, and unpruned SNV data used to generate the SNVsFiltered.csv dataset above. These are the outputs from the pileup model of CLAIR3, converted in Program R to a .gds file for use with the SNPRelate package. The original .vcf.gz file is created from the sequence data using the Linux code (described below). We include the .gds file rather than the .vcf.gz file to save time for anyone wanting to check the pruning code: converting from .vcf.gz to .gds took ~450 minutes, and the .vcf.gz file can be recreated from the raw sequence data. This file is fairly large (474 MB).

File: pileup_merged.gds

R-Code

Description: The compressed folder 'RCode.zip' contains all of the R code used to produce the eBird/habitat association dataset and to analyze the data. This folder contains the following files: SCWY_STGR_subspecies_Analyses.Rmd, SCWY_STGR_subspecies_AppendixA.Rmd, STGR_subspecies_Raster_Prep.Rmd, and SCWY_STGR_subspecies_eBirdFiltering.Rmd. Each file is briefly described below, and each code file also begins with a description of the code. The code was run in RStudio; if you do not use RStudio, you may need to open these files in a text editor and copy and paste the code into R. Note: this code was developed on a Windows machine, and some parts may need to be changed on other operating systems.

Main Manuscript analyses

File: SCWY_STGR_subspecies_Analyses.Rmd

Description: This file contains all data manipulation and analyses for the results presented in the main manuscript. All analyses were conducted in Program R. These analyses were run using an .rproj to keep all of the data in one place. If you use an .rproj, you will need to update the file paths in the script relative to where you saved your data. All of the analyses were run on a computer with Windows 11 Enterprise installed, 128 GB RAM, and a 12th-gen Intel Core i9-12900K 3.20 GHz processor.

Appendix A analyses

File: SCWY_STGR_subspecies_AppendixA.Rmd

Habitat raster data preparation

File: STGR_subspecies_Raster_Prep.Rmd

Description: This R Markdown document contains the code used to manipulate raster data for available environmental data. Specifically, it calculates forest canopy cover from RAP and NLCD forest data, calculates topographic variables (HLI, TPI, and TRI) from a Digital Elevation Model, and generates binary landcover data from annual NLCD data. A description of all of the environmental data used can be found in the 'habitat.csv' description above. We do not include the raw raster data because these datasets are very large (>500 GB in total) and can be readily downloaded (the download sources are described in the Methods section of the manuscript and at the beginning of this code).

eBird data manipulation and filtering

File: SCWY_STGR_subspecies_eBirdFiltering.Rmd

Description: Contains code used to filter eBird observations according to the 'Best Practices for Using eBird Data' document (https://ebird.github.io/ebird-best-practices/). This code produces the observations that are used in the 'habitat.csv' document described above. eBird observations and records are archived and freely and publicly accessible through eBird, but the eBird terms and conditions do not allow for third-party redistribution. Permission can be granted to download the eBird database (https://science.ebird.org/en/use-ebird-data/download-ebird-data-products). Note that we used the October 2023 database for our analyses, so any data downloaded after this might be slightly different.

Linux Code

All of the Linux code to run the bioinformatics on sequencing data is saved in the 'Linux_Code.zip' folder. This folder contains 8 bash scripts and a 'workflow.txt' document that explains what each script does and the order in which the scripts must be run.

Description of Linux code (bioinformatics for sequencing data)

File: Linux_Code.zip

Description: Zip folder containing bash scripts to process raw sequence reads. Sequence reads are available on the SRA. Please note that you will need to modify this code to run on different machines. Programs used are described in the Methods section of the manuscript. 'pathtofiles' represents where you have the data saved. The Lesser Prairie-Chicken reference genome can be found on GenBank (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_026119805.1/). This code was run on a machine with a 12th-generation Intel Core i9-12900K 3.20 GHz processor (16 core, 24 thread), 128 GB RAM, and an Ubuntu 22.04 operating system installed.

Raw sequence reads are deposited in the SRA (BioProject PRJNA1196947).
