These data and code were used in a conservation genetics study of Blanchard’s Cricket Frog (Acris blanchardi; BCF) at the species’ northern range edge. We assessed genetic diversity, population structure, and landscape genetics across the southern Lower Peninsula of Michigan. We genotyped 777 frogs from 41 sites using 14 microsatellite markers. We ran the population assignment algorithm STRUCTURE to infer genetic populations and admixture, and found twenty distinct genetic populations across Michigan sites. We evaluated the effects of landscape features, including geographic distance, on genetic distances calculated for all pairwise combinations of sampled sites. Because the perceptual range of BCF is unknown and landscape features can have different scales of effect, we modelled landscape effects on genetic differentiation at several scales. Pairwise geographic distances, pairwise genetic distances, and landscape composition and configuration variables quantified within pairwise landscape strips are included; specific site coordinates are excluded because of the conservation status of BCF in Michigan. These are the code and data associated with the article “Landscape-driven isolation among, but high genetic diversity within, peripheral populations of a threatened frog” in the journal Diversity and Distributions. These files offer groundwork for further assessment of landscape effects on BCF via different modelling methods and a baseline for future assessments.

Dataset DOI: 10.5061/dryad.tqjq2bwbw

Description of the data and file structure

We have submitted our microsatellite data with (DATA_GP(genotypic_data_including_unused_microsat_marker).gen) and without (DATA_GP(genotypic_data_used_for_analyses).gen) a problematic marker, Acr-29, which we excluded from our analyses due to null allele problems. The DATA_IBD(segmented_IBD_data)_2.csv file contains pairwise genetic distances (Fst) and geographic distances (km) between all sampled sites. The DATA_K20(individual_admixture).xlsx file contains admixture proportions assigned to 20 genetic populations for all 777 samples. The DATA_K3(individual_admixture).xlsx file contains admixture proportions assigned to 3 genetic populations for all 777 samples. The DATA_landscape_genetics_2.csv file contains landscape configuration and composition values for pairwise landscape strips of various widths between sampled sites, along with genetic distances (Fst) and geographic distances (m), for landscape genetics modelling. The code files provide the R workflow for assessing population structure, genetic diversity, isolation by distance, and landscape genetic modelling.

Files and variables

File: DATA_landscape_genetics_2.csv

Description: These are the data we used for landscape genetics modelling. Landscape composition and configuration metrics were quantified within pairwise landscape strips between sampled sites at five different width scales centered along pairwise lines (300m, 600m, 1300m, 2600m, and a width equal to one-third the strip length). For all these scales, the same variables were calculated and named in column headers accordingly (e.g., Open_Water_300 reflects the number of open water cells within a 300m wide strip, whereas Open_Water_600 reflects the number of open water cells within a 600m wide strip). Landcover and stream data came from the United States Geological Survey (National Land Cover Database), and road data came from the United States Census Bureau (see the published paper for full methods, and see the National Land Cover Database information for a full description of landcover categories). Below are descriptions of column headers through the first width scale. All following column names reflect the same variables at larger width scales, which are indicated in the column headers. Headers labeled with "third" are variables for strips with a one-third width:length ratio (i.e., the width of a strip is one-third its length), calculated using the Length_m and third_width columns.

Variables

Start_Site: pairwise starting site
End_Site: pairwise ending site
Cluster_random_effect: geographic cluster category
Fst: genetic distance
Length_m: geographic pairwise distance in meters
third_width: one-third geographic pairwise distance in meters, to be used for creating the ratio-based strips
Open_Water_300: number of open water cells within a given 300m strip
Developed__Open_Space_300: number of open space development cells within a given 300m strip
Developed__Low_Intensity_300: number of low intensity development cells within a given 300m strip
Developed__Medium_Intensity_300: number of medium intensity development cells within a given 300m strip
Developed__High_Intensity_300: number of high intensity development cells within a given 300m strip
Barren_Land_300: number of barren land cells within a given 300m strip
Deciduous_Forest_300: number of deciduous forest cells within a given 300m strip
Evergreen_Forest_300: number of evergreen forest cells within a given 300m strip
Mixed_Forest_300: number of mixed forest cells within a given 300m strip
Shrub_Scrub_300: number of shrub-scrub cells within a given 300m strip
Herbaceous_300: number of herbaceous cells within a given 300m strip
Hay_Pasture_300: number of hay-pasture cells within a given 300m strip
Cultivated_Crops_300: number of crop cells within a given 300m strip
Woody_Wetlands_300: number of woody wetland cells within a given 300m strip
Emergent_Herbaceous_Wetlands_300: number of emergent wetland cells within a given 300m strip
Open_Water_p_300: proportion of open water cells within a given 300m strip
Developed__Open_Space_p_300: proportion of open space development cells within a given 300m strip
Developed__Low_Intensity_p_300: proportion of low intensity development cells within a given 300m strip
Developed__Medium_Intensity_p_300: proportion of medium intensity development cells within a given 300m strip
Developed__High_Intensity_p_300: proportion of high intensity development cells within a given 300m strip
Barren_Land_p_300: proportion of barren land cells within a given 300m strip
Deciduous_Forest_p_300: proportion of deciduous forest cells within a given 300m strip
Evergreen_Forest_p_300: proportion of evergreen forest cells within a given 300m strip
Mixed_Forest_p_300: proportion of mixed forest cells within a given 300m strip
Shrub_Scrub_p_300: proportion of shrub-scrub cells within a given 300m strip
Herbaceous_p_300: proportion of herbaceous cells within a given 300m strip
Hay_Pasture_p_300: proportion of hay-pasture cells within a given 300m strip
Cultivated_Crops_p_300: proportion of crop cells within a given 300m strip
Woody_Wetlands_p_300: proportion of woody wetland cells within a given 300m strip
Emergent_Herbaceous_Wetlands_p_300: proportion of emergent wetland cells within a given 300m strip
total_cells_300: total cells in a given 300m width strip, for use as denominator in proportion calculations
area_sq_m_300: area in square meters of a given 300m width strip
area_sq_km_300: area in square kilometers of a given 300m width strip
patch_count_300: number of landscape patches in a given 300m width strip
patch_d_300: patch density (number of patches divided by strip area in square kilometers) for a given 300m width strip
stream_length_m_300: total stream length in meters within a given 300m width strip
stream_length_km_300: total stream length in kilometers within a given 300m width strip
stream_count_300: number of stream shapefiles within a 300m width strip
stream_d_300: stream density (length of streams in kilometers divided by strip area in square kilometers) of a given 300m width strip
allroad_length_m_300: total road length in meters within a given 300m width strip
allroad_length_km_300: total road length in kilometers within a given 300m width strip
allroad_count_300: number of road shapefiles within a 300m width strip
allroad_d_300: road density (length of roads in kilometers divided by strip area in square kilometers) of a given 300m width strip
bigroad_length_m_300: total road (just large roads) length in meters within a given 300m width strip
bigroad_length_km_300: total road (just large roads) length in kilometers within a given 300m width strip
bigroad_count_300: number of road (just large roads) shapefiles within a 300m width strip
bigroad_d_300: road (just large roads) density (length of large roads in kilometers divided by strip area in square kilometers) of a given 300m width strip
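
The proportion and density columns above follow directly from the count, length, and area columns. The short Python sketch below shows that arithmetic for one strip. It is only an illustration: the deposited analysis code is written in R, the numbers are invented rather than taken from DATA_landscape_genetics_2.csv, and the helper names proportion and density are hypothetical.

```python
# Illustrative values only; these are NOT rows from DATA_landscape_genetics_2.csv.
row = {
    "Length_m": 10000.0,          # pairwise geographic distance in meters
    "Open_Water_300": 150,        # open water cells in the 300m strip
    "total_cells_300": 3000,      # all cells in the 300m strip
    "area_sq_km_300": 3.0,        # strip area in square kilometers
    "patch_count_300": 60,        # landscape patches in the strip
    "stream_length_km_300": 4.5,  # total stream length in kilometers
}


def proportion(count, total_cells):
    """Cover proportion: cells of one class divided by all cells in the strip."""
    return count / total_cells


def density(amount, area_sq_km):
    """Density: a count or a length in km divided by strip area in sq km."""
    return amount / area_sq_km


# Width of the ratio-based strip is one-third of the pairwise distance.
third_width = row["Length_m"] / 3

open_water_p = proportion(row["Open_Water_300"], row["total_cells_300"])
patch_d = density(row["patch_count_300"], row["area_sq_km_300"])
stream_d = density(row["stream_length_km_300"], row["area_sq_km_300"])

print(third_width, open_water_p, patch_d, stream_d)
```

The same two helpers apply unchanged to the 600, 1300, 2600, and "third" columns, since only the suffix of the column name differs between scales.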

File: DATA_K3(individual_admixture).xlsx

Description: This file contains admixture proportions assigned to 3 genetic populations for all 777 samples. We ran program STRUCTURE for multiple iterations, which were then aggregated into one admixture result file by CLUMPP.

Variables

SITE: this column provides arbitrary site names (integers) for all individuals of a given site (i.e., all the rows where “4” is given in the SITE column are individuals from “site 4”)
K1: admixture proportion for a given individual assigned to one (arbitrarily named K1) genetic population
K2: admixture proportion for a given individual assigned to one (arbitrarily named K2) genetic population
K3: admixture proportion for a given individual assigned to one (arbitrarily named K3) genetic population
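
One common way to read a table like this is to find, for each individual, the genetic population with the largest admixture proportion. The Python sketch below illustrates that on a few invented rows in the documented SITE/K1/K2/K3 layout. It assumes the spreadsheet has been saved as .csv and that each individual's proportions sum to roughly 1; the function name dominant_cluster is hypothetical, and this is not part of the deposited R code.

```python
import csv
import io

# Made-up rows in the documented layout (SITE, K1, K2, K3); not real data.
example_csv = """SITE,K1,K2,K3
4,0.90,0.06,0.04
4,0.15,0.80,0.05
7,0.02,0.03,0.95
"""


def dominant_cluster(row, cluster_columns):
    """Return the cluster column holding the largest admixture proportion."""
    return max(cluster_columns, key=lambda name: float(row[name]))


rows = list(csv.DictReader(io.StringIO(example_csv)))

# Every column other than SITE is treated as a cluster column (assumption).
cluster_columns = [name for name in rows[0] if name != "SITE"]

assignments = [(row["SITE"], dominant_cluster(row, cluster_columns)) for row in rows]

# Sanity check: proportions for one individual should sum to about 1.
row_sums = [sum(float(row[name]) for name in cluster_columns) for row in rows]

print(assignments)
```

Because the cluster columns are discovered from the header, the same sketch would work for a table with columns K1 through K20.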

File: DATA_K20(individual_admixture).xlsx

Description: This file is similar to DATA_K3(individual_admixture).xlsx, except that it contains admixture proportions assigned to 20, rather than 3, genetic populations for all 777 samples. The format is the same, with 20 arbitrarily named genetic population columns (K1-K20).

File: DATA_IBD(segmented_IBD_data)_2.csv

Description: This file contains pairwise genetic distances (Fst) and geographic distances (km) between all sampled sites, used for isolation by distance modelling.

Variables

Site1: Starting site name (arbitrary integer)
Site2: Ending site name (arbitrary integer)
Fst: Pairwise genetic distance (Fst)
Distance: Pairwise geographic distance in kilometers
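
As a rough illustration of how these four columns can be used, the Python sketch below fits an ordinary least-squares line of Fst on Distance. It covers only a simple linear trend and is not the authors' analysis, which is implemented in R and also includes segmented models. The rows are invented, the helper ols_fit is hypothetical, and a plain least-squares fit ignores the non-independence of pairwise observations.

```python
import csv
import io

# Made-up rows in the documented layout; not values from the deposited file.
example_csv = """Site1,Site2,Fst,Distance
1,2,0.02,10
1,3,0.04,20
2,3,0.06,30
"""


def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rows = list(csv.DictReader(io.StringIO(example_csv)))
distance_km = [float(row["Distance"]) for row in rows]
fst = [float(row["Fst"]) for row in rows]

# slope is the change in Fst per kilometer of geographic distance.
slope, intercept = ols_fit(distance_km, fst)
print(slope, intercept)
```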

File: DATA_GP(genotypic_data_used_for_analyses).gen

Description: Microsatellite data of called alleles for all microsatellite loci, all 777 individuals, across all 41 sites used in our analysis. Each genotype is six digits, made up of two three-digit allele lengths in base pairs; a genotype in which the same three-digit length appears twice is homozygous for that allele. The file is in the GEN (.gen) format used by many analytical tools (e.g., STRUCTURE), in which locus names are listed first and individuals’ genotypes are then given in rows, grouped by sampling site (Pop).
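
The six-digit genotype coding can be decoded with a few lines of code. The Python sketch below splits one genotype string into its two three-digit allele lengths and checks whether they match. The function names and the example genotypes are hypothetical, and the sketch handles a single genotype string only; it does not parse a whole .gen file or deal with missing-data codes.

```python
def split_genotype(code):
    """Split a six-digit genotype string into two three-digit allele lengths."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    return int(code[:3]), int(code[3:])


def is_homozygous(code):
    """True when both three-digit allele lengths in the genotype are identical."""
    first, second = split_genotype(code)
    return first == second


# Invented genotypes for illustration; not taken from the deposited files.
heterozygote = "152156"
homozygote = "148148"

print(split_genotype(heterozygote), is_homozygous(heterozygote))
print(split_genotype(homozygote), is_homozygous(homozygote))
```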

File: DATA_GP(genotypic_data_including_unused_microsat_marker).gen

Description: Same file as DATA_GP(genotypic_data_used_for_analyses).gen, except that this version includes the problematic marker, Acr-29, which we excluded from our analyses due to null allele problems.

File: bestK_3.csv

Description: The ΔK (Evanno et al. 2005) and mean natural log of the probability of the data values for each K (number of ancestral populations) value, output from STRUCTURE HARVESTER (Earl and vonHoldt 2012) and STRUCTURE (Pritchard et al. 2000).

Earl DA, vonHoldt BM (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources 4:359–361. https://doi.org/10.1007/s12686-011-9548-7

Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611–2620

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959

Variables

K: the a priori number of ancestral populations
deltaK: the ΔK value as calculated by STRUCTURE HARVESTER
Mean LnP(K): the mean natural log of the probability of the data values as calculated by STRUCTURE
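
Under the Evanno method, the K with the largest ΔK is usually taken as the best-supported number of clusters. The Python sketch below picks that K from a table with the three columns described above. The values are invented, the helper parse_float is hypothetical, and treating non-numeric deltaK entries (for example "NA" at the smallest K) as missing is an assumption about how the file may be filled in.

```python
import csv
import io

# Made-up values in the documented layout; not the contents of bestK_3.csv.
example_csv = """K,deltaK,Mean LnP(K)
1,NA,-100.0
2,4.0,-90.0
3,25.0,-80.0
4,1.5,-79.0
"""


def parse_float(text):
    """Return a float, or None when the cell is not numeric (e.g. 'NA' or blank)."""
    try:
        return float(text)
    except ValueError:
        return None


rows = list(csv.DictReader(io.StringIO(example_csv)))

# Keep only the K values that have a usable deltaK.
scored = []
for row in rows:
    delta_k = parse_float(row["deltaK"])
    if delta_k is not None:
        scored.append((int(row["K"]), delta_k))

# The K with the largest deltaK.
best_k, best_delta_k = max(scored, key=lambda pair: pair[1])
print(best_k, best_delta_k)
```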

File: CODE_landscape_genetics.R

Description: Our code used for landscape genetic analyses. The study species is listed as Threatened by the state of Michigan (USA), so we do not provide the specific coordinates or the files used in the isolation by distance section of the code (e.g., "longlat.csv", "IBD_matrix_km.csv"). However, the pairwise geographic and genetic distances are provided in the DATA_IBD(segmented_IBD_data)_2.csv file, and can be used in place of the "IBD_data_real_no10-3.csv" input on line 54 to follow the linear and segmented isolation by distance modeling. The DATA_landscape_genetics_2.csv file can be used in place of the "C:/Users/travr/Desktop/data_files/revised_data.csv" input within the code for all landscape genetics modeling (i.e., maximum-likelihood population effects) code.

File: CODE_population_genetics.R

Description: Our code used for population genetic analyses. The DATA_GP(genotypic_data_used_for_analyses).gen file can be used in place of the "GP_no29_no10-3.gen" input file in the code, and the DATA_GP(genotypic_data_including_unused_microsat_marker).gen file can be used in place of the "GP_no10-3.gen" input file in the code. The "rAR_n_no10-3.csv" input in the code can be constructed by combining the mean rarefied allelic richness values and site names in the Richness object with the sample sizes reported in the corresponding article. The bestK_3.csv file can be used in place of the "bestK2.csv" input file in the code. The DATA_K3(individual_admixture).xlsx file can be used (after saving as .csv) in place of the "k3_clumpp_no10-3.csv" input file in the code. The DATA_K20(individual_admixture).xlsx file can be used (after saving as .csv) in place of the "k20_clumpp_no10-3.csv" input file in the code.

Code/software

car
Fox J, Weisberg S, Price B (2023). car: Companion to Applied Regression. R package version 3.1-2. https://CRAN.R-project.org/package=car

CLUMPP
Jakobsson M, Rosenberg NA (2007). “CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure.” Bioinformatics, 23(14), 1801–1806.

corMLPE
Pope N (2021). corMLPE: Correlation Structure for Pairwise Data. GitHub repository. https://github.com/nspope/corMLPE

dplyr
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.4. https://CRAN.R-project.org/package=dplyr

ggplot2
Wickham H (2023). ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.4.0. https://CRAN.R-project.org/package=ggplot2

hierfstat
Goudet J (2005). “hierfstat, a package for R to compute and test hierarchical F-statistics.” Molecular Ecology Notes, 5(1), 184–186.

MuMIn
Bartoń K (2023). MuMIn: Multi-Model Inference. R package version 1.47.5. https://CRAN.R-project.org/package=MuMIn

nlme
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2023). nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme

Polychrome
Coombes KR, Wang M (2019). Polychrome: Creating and Assessing Qualitative Palettes with Many Colors. Journal of Statistical Software, 90(1), 1–23.

PopGenReport
Adamack AT, Gruber B (2014). “PopGenReport: Simplifying basic population genetic analyses in R.” Methods in Ecology and Evolution, 5(4), 384–387.

raster
Hijmans RJ (2023). raster: Geographic Data Analysis and Modeling. R package version 3.6-26. https://CRAN.R-project.org/package=raster

remotes
Hester J, Bryan J (2023). remotes: R Package Installation from Remote Repositories, Including 'GitHub'. https://CRAN.R-project.org/package=remotes

segmented
Muggeo VMR (2023). segmented: Regression Models with Break-Points/Change-Points Estimation. R package version 2.0-2. https://CRAN.R-project.org/package=segmented

STRUCTURE
Pritchard JK, Stephens M, Donnelly P (2000). “Inference of population structure using multilocus genotype data.” Genetics, 155(2), 945–959.

Data and code from: Landscape-driven isolation among, but high genetic diversity within, peripheral populations of a threatened frog
