Data and code from: Landscape-driven isolation among, but high genetic diversity within, peripheral populations of a threatened frog
Data files
Dec 15, 2025 version files (477.68 KB):
- bestK_3.csv (1.17 KB)
- CODE_landscape_genetics.R (16.18 KB)
- CODE_population_genetics.R (10.77 KB)
- DATA_GP(genotypic_data_including_unused_microsat_marker).gen (86.47 KB)
- DATA_GP(genotypic_data_used_for_analyses).gen (81.02 KB)
- DATA_IBD(segmented_IBD_data)_2.csv (24.96 KB)
- DATA_K20(individual_admixture).xlsx (106.48 KB)
- DATA_K3(individual_admixture).xlsx (34.21 KB)
- DATA_landscape_genetics_2.csv (100.95 KB)
- README.md (15.48 KB)
Abstract
These data and code were used in a conservation genetics study of Blanchard’s Cricket Frog (Acris blanchardi; BCF) at the species’ northern range edge. We assessed genetic diversity, population structure, and landscape genetics across the southern Lower Peninsula of Michigan. We genotyped 777 frogs from 41 sites using 14 microsatellite markers, and we ran the population assignment algorithm STRUCTURE to infer genetic populations and admixture. We found twenty distinct genetic populations across the Michigan sites. We evaluated the effects of landscape features, including geographic distance, on pairwise genetic distances calculated for all pairs of sampled sites. Because the perceptual range of BCF is unknown and landscape features can have different scales of effect, we modelled landscape effects on genetic differentiation at several scales. The dataset includes pairwise geographic distances, pairwise genetic distances, and various landscape composition and configuration variables quantified within pairwise landscape strips; specific site coordinates are excluded because of the conservation status of BCF in Michigan. These code and data accompany the article “Landscape-driven isolation among, but high genetic diversity within, peripheral populations of a threatened frog” in the journal Diversity and Distributions. The files provide groundwork for further assessment of landscape effects on BCF with other modelling methods, and a baseline for future assessments.
Dataset DOI: 10.5061/dryad.tqjq2bwbw
Description of the data and file structure
We provide our microsatellite data both with (DATA_GP(genotypic_data_including_unused_microsat_marker).gen) and without (DATA_GP(genotypic_data_used_for_analyses).gen) the problematic marker Acr-29, which we excluded from our analyses because of null allele problems. The DATA_IBD(segmented_IBD_data)_2.csv file contains pairwise genetic distances (Fst) and geographic distances (km) between all sampled sites. The DATA_K20(individual_admixture).xlsx file contains admixture proportions assigned to 20 genetic populations for all 777 samples. The DATA_K3(individual_admixture).xlsx file contains admixture proportions assigned to 3 genetic populations for all 777 samples. The DATA_landscape_genetics_2.csv file contains landscape configuration and composition values for pairwise landscape strips of various widths between sampled sites, together with genetic distances (Fst) and geographic distances (m), for landscape genetics modelling. The code files provide R workflows for assessing population structure, genetic diversity, isolation by distance, and landscape genetic models.
Files and variables
File: DATA_landscape_genetics_2.csv
Description: These are the data we used for landscape genetics modelling. Landscape composition and configuration metrics were quantified within pairwise landscape strips between sampled sites at five width scales, each centered along the pairwise line: 300m, 600m, 1300m, 2600m, and a width equal to one-third of the strip length. The same variables were calculated at every scale and named in the column headers accordingly (e.g., Open_Water_300 is the number of open water cells within a 300m wide strip, whereas Open_Water_600 is the number of open water cells within a 600m wide strip). Land cover data (National Land Cover Database) and stream data came from the United States Geological Survey, and road data came from the United States Census Bureau (see the published paper for full methods, and see the National Land Cover Database documentation for full descriptions of the land cover categories). Below are descriptions of the column headers for the first width scale; all subsequent columns repeat the same variables at the larger width scales indicated in their headers. Headers labeled with "third" are variables for strips with a one-third width:length ratio (i.e., the width of a strip is one-third its length), calculated using the Length_m and third_width columns.
Variables
- Start_Site: pairwise starting site
- End_Site: pairwise ending site
- Cluster_random_effect: geographic cluster category
- Fst: genetic distance
- Length_m: geographic pairwise distance in meters
- third_width: one-third of the pairwise geographic distance in meters, used to create the ratio-based strips
- Open_Water_300: number of open water cells within a given 300m strip
- Developed__Open_Space_300: number of open-space development cells within a given 300m strip
- Developed__Low_Intensity_300: number of low intensity development cells within a given 300m strip
- Developed__Medium_Intensity_300: number of medium intensity development cells within a given 300m strip
- Developed__High_Intensity_300: number of high intensity development cells within a given 300m strip
- Barren_Land_300: number of barren land cells within a given 300m strip
- Deciduous_Forest_300: number of deciduous forest cells within a given 300m strip
- Evergreen_Forest_300: number of evergreen forest cells within a given 300m strip
- Mixed_Forest_300: number of mixed forest cells within a given 300m strip
- Shrub_Scrub_300: number of shrub-scrub cells within a given 300m strip
- Herbaceous_300: number of herbaceous cells within a given 300m strip
- Hay_Pasture_300: number of hay-pasture cells within a given 300m strip
- Cultivated_Crops_300: number of crop cells within a given 300m strip
- Woody_Wetlands_300: number of woody wetland cells within a given 300m strip
- Emergent_Herbaceous_Wetlands_300: number of emergent wetland cells within a given 300m strip
- Open_Water_p_300: proportion of open water cells within a given 300m strip
- Developed__Open_Space_p_300: proportion of open-space development cells within a given 300m strip
- Developed__Low_Intensity_p_300: proportion of low intensity development cells within a given 300m strip
- Developed__Medium_Intensity_p_300: proportion of medium intensity development cells within a given 300m strip
- Developed__High_Intensity_p_300: proportion of high intensity development cells within a given 300m strip
- Barren_Land_p_300: proportion of barren land cells within a given 300m strip
- Deciduous_Forest_p_300: proportion of deciduous forest cells within a given 300m strip
- Evergreen_Forest_p_300: proportion of evergreen forest cells within a given 300m strip
- Mixed_Forest_p_300: proportion of mixed forest cells within a given 300m strip
- Shrub_Scrub_p_300: proportion of shrub-scrub cells within a given 300m strip
- Herbaceous_p_300: proportion of herbaceous cells within a given 300m strip
- Hay_Pasture_p_300: proportion of hay-pasture cells within a given 300m strip
- Cultivated_Crops_p_300: proportion of crop cells within a given 300m strip
- Woody_Wetlands_p_300: proportion of woody wetland cells within a given 300m strip
- Emergent_Herbaceous_Wetlands_p_300: proportion of emergent wetland cells within a given 300m strip
- total_cells_300: total cells in a given 300m width strip, for use as denominator in proportion calculations
- area_sq_m_300: area in square meters of a given 300m width strip
- area_sq_km_300: area in square kilometers of a given 300m width strip
- patch_count_300: number of landscape patches in a given 300m width strip
- patch_d_300: patch density (number of patches divided by strip area in square kilometers) for a given 300m width strip
- stream_length_m_300: total stream length in meters within a given 300m width strip
- stream_length_km_300: total stream length in kilometers within a given 300m width strip
- stream_count_300: number of stream shapefiles within a 300m width strip
- stream_d_300: stream density (length of streams in kilometers divided by strip area in square kilometers) of a given 300m width strip
- allroad_length_m_300: total road length in meters within a given 300m width strip
- allroad_length_km_300: total road length in kilometers within a given 300m width strip
- allroad_count_300: number of road shapefiles within a 300m width strip
- allroad_d_300: road density (length of roads in kilometers divided by strip area in square kilometers) of a given 300m width strip
- bigroad_length_m_300: total road (just large roads) length in meters within a given 300m width strip
- bigroad_length_km_300: total road (just large roads) length in kilometers within a given 300m width strip
- bigroad_count_300: number of road (just large roads) shapefiles within a 300m width strip
- bigroad_d_300: road (just large roads) density (length of large roads in kilometers divided by strip area in square kilometers) of a given 300m width strip
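Because every variable is repeated once per width scale, a common first step is pulling out the columns for one scale and checking that the proportion and density columns match their definitions above (count divided by total_cells, length or patch count divided by area_sq_km). The sketch below does this with Python's standard csv module; the toy CSV row is invented for illustration, and the helper names `columns_for_scale` and `derived_metrics` are hypothetical, not part of the deposited R code.

```python
import csv
import io

# Width-scale suffixes described for DATA_landscape_genetics_2.csv headers.
SCALES = ("300", "600", "1300", "2600", "third")


def columns_for_scale(header, scale):
    """Return header names ending in '_<scale>' ('_300' does not match '_1300')."""
    suffix = "_" + scale
    return [name for name in header if name.endswith(suffix)]


def derived_metrics(row, scale):
    """Recompute proportion and density columns from raw counts and lengths."""
    total_cells = float(row[f"total_cells_{scale}"])
    area_km2 = float(row[f"area_sq_km_{scale}"])
    return {
        f"Open_Water_p_{scale}": float(row[f"Open_Water_{scale}"]) / total_cells,
        f"patch_d_{scale}": float(row[f"patch_count_{scale}"]) / area_km2,
        f"stream_d_{scale}": float(row[f"stream_length_km_{scale}"]) / area_km2,
        f"allroad_d_{scale}": float(row[f"allroad_length_km_{scale}"]) / area_km2,
    }


# Hypothetical one-row excerpt with invented values (not real dataset rows).
TOY_CSV = (
    "Start_Site,End_Site,Fst,Length_m,third_width,Open_Water_300,"
    "total_cells_300,area_sq_km_300,patch_count_300,stream_length_km_300,"
    "allroad_length_km_300,Open_Water_1300\n"
    "1,2,0.05,9000,3000,50,1000,2.0,10,3.0,4.0,10\n"
)

reader = csv.DictReader(io.StringIO(TOY_CSV))
rows = list(reader)
header = reader.fieldnames

# Number of columns found for each width scale in the toy header.
for scale in SCALES:
    print(scale, len(columns_for_scale(header, scale)))
print(derived_metrics(rows[0], "300"))
```

Matching on the full "_<scale>" suffix, rather than on the digits alone, keeps the 300m columns from picking up 1300m columns.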
File: DATA_K3(individual_admixture).xlsx
Description: This file contains admixture proportions assigned to 3 genetic populations for all 777 samples. We ran STRUCTURE for multiple iterations, and the resulting runs were aggregated into one admixture result file with CLUMPP.
Variables
- SITE: this column provides arbitrary site names (integers) for all individuals of a given site (i.e., all the rows where “4” is given in the SITE column are individuals from “site 4”)
- K1: admixture proportion of a given individual assigned to one (arbitrarily named K1) genetic population
- K2: admixture proportion of a given individual assigned to one (arbitrarily named K2) genetic population
- K3: admixture proportion of a given individual assigned to one (arbitrarily named K3) genetic population
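Once the spreadsheet is saved as CSV (as the code notes further down suggest), each row is one individual's vector of admixture proportions. The sketch below, a minimal illustration assuming columns named SITE, K1, K2, ... exactly as described above, finds each individual's dominant cluster, checks that proportions sum to about 1, and averages proportions by site. The toy rows are invented, the 0.01 tolerance is an assumption to allow for rounding, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_admixture(text):
    """Parse a SITE,K1..Kn CSV export into cluster names and (site, q) rows."""
    reader = csv.DictReader(io.StringIO(text))
    clusters = [name for name in reader.fieldnames if name != "SITE"]
    rows = [(r["SITE"], [float(r[c]) for c in clusters]) for r in reader]
    return clusters, rows


def dominant_cluster(q, clusters):
    """Cluster with the largest admixture proportion for one individual."""
    return clusters[max(range(len(q)), key=q.__getitem__)]


def sums_to_one(q, tol=0.01):
    """Sanity check: an individual's proportions should sum to about 1."""
    return abs(sum(q) - 1.0) <= tol


def site_means(rows, n_clusters):
    """Mean admixture proportion per cluster for each site."""
    sums = defaultdict(lambda: [0.0] * n_clusters)
    counts = defaultdict(int)
    for site, q in rows:
        counts[site] += 1
        for i, value in enumerate(q):
            sums[site][i] += value
    return {site: [s / counts[site] for s in totals] for site, totals in sums.items()}


# Hypothetical rows in the K3 layout (invented values).
TOY_K3 = "SITE,K1,K2,K3\n4,0.8,0.1,0.1\n4,0.6,0.3,0.1\n7,0.1,0.1,0.8\n"

clusters, rows = read_admixture(TOY_K3)
print([dominant_cluster(q, clusters) for _, q in rows])
print(all(sums_to_one(q) for _, q in rows))
print(site_means(rows, len(clusters)))
```

Because the cluster columns are read from the header, the same functions apply unchanged to the 20-cluster file.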
File: DATA_K20(individual_admixture).xlsx
Description: This file is similar to DATA_K3(individual_admixture).xlsx, except that it contains admixture proportions assigned to 20, rather than 3, genetic populations for all 777 samples. The format is the same, with 20 arbitrarily named genetic population columns (K1-K20).
File: DATA_IBD(segmented_IBD_data)_2.csv
Description: This file contains pairwise genetic distances (Fst) and geographic distances (km) between all sampled sites, used for isolation by distance modelling.
Variables
- Site1: Starting site name (arbitrary integer)
- Site2: Ending site name (arbitrary integer)
- Fst: Pairwise genetic distance (Fst)
- Distance: Pairwise geographic distance in kilometers
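The file name and the code description below indicate both linear and segmented isolation-by-distance models. As a rough illustration only, the sketch fits a straight line of Fst on Distance by least squares and then a single-breakpoint ("hinge") model, y = b0 + b1·x + b2·max(0, x − c), choosing c by grid search over observed distances. This grid search is a simple stand-in and not the iterative estimation used by the R segmented package. Regressing raw Fst, rather than a transformed value such as Fst/(1 − Fst), is also an assumption, and the toy data are invented.

```python
def ols(x, y):
    """Ordinary least squares slope and intercept for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (m[r][3] - sum(m[r][k] * sol[k] for k in range(r + 1, 3))) / m[r][r]
    return sol


def hinge_fit(x, y, c):
    """Fit y = b0 + b1*x + b2*max(0, x - c); return (coefficients, SSE)."""
    rows = [(1.0, xi, max(0.0, xi - c)) for xi in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    coef = solve3(xtx, xty)
    sse = sum((yi - sum(ci * ri for ci, ri in zip(coef, r))) ** 2
              for r, yi in zip(rows, y))
    return coef, sse


def best_breakpoint(x, y):
    """Grid-search candidate breakpoints at interior observed distances."""
    xs = sorted(set(x))
    candidates = xs[1:-2]  # keep at least two distinct distances above c
    return min(candidates, key=lambda c: hinge_fit(x, y, c)[1])


# Hypothetical distances (km) and Fst values that plateau beyond 50 km.
dist = [float(d) for d in range(0, 101, 10)]
fst = [0.01 + 0.002 * min(d, 50.0) for d in dist]
print(ols(dist, fst))
c = best_breakpoint(dist, fst)
print(c, hinge_fit(dist, fst, c)[0])
```

The hinge term lets the slope change at c while keeping the fitted line continuous, which is the basic shape of a one-breakpoint segmented regression.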
File: DATA_GP(genotypic_data_used_for_analyses).gen
Description: Microsatellite data of called alleles for all microsatellite loci, all 777 individuals, and all 41 sites used in our analyses. Each genotype is written as six digits: two three-digit allele sizes (base pair lengths), so a genotype in which the same three-digit allele is repeated is homozygous for that allele. The file is in the .gen (GENEPOP) format used by many analytical tools (e.g., STRUCTURE): locus names are listed first, and individuals’ genotypes are then given in rows, grouped by sampling site (Pop).
File: DATA_GP(genotypic_data_including_unused_microsat_marker).gen
Description: Same as DATA_GP(genotypic_data_used_for_analyses).gen, except that this version includes the problematic marker Acr-29, which we excluded from our analyses due to null allele problems.
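To make the six-digit genotype coding concrete, the sketch below parses a tiny .gen-style text and computes observed heterozygosity per site and locus. It assumes the usual GENEPOP layout (a title line, locus names on one or more lines, "Pop" separators, and "individual , genotype genotype ..." rows) and assumes missing alleles are coded 000, the standard GENEPOP convention; the deposited files' exact conventions should be checked. The toy text and function names are hypothetical.

```python
def parse_genepop(text):
    """Return (loci, populations); each population is a list of genotype lists."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    loci, pops = [], []
    i = 1  # line 0 is the free-text title line
    while i < len(lines) and lines[i].lower() != "pop":
        loci.extend(n.strip() for n in lines[i].split(",") if n.strip())
        i += 1
    for line in lines[i:]:
        if line.lower() == "pop":
            pops.append([])
        else:
            _, genotypes = line.split(",", 1)  # "individual , g1 g2 ..."
            pops[-1].append(genotypes.split())
    return loci, pops


def split_genotype(code):
    """Split a six-digit code into two three-digit allele sizes; None if missing."""
    a, b = int(code[:3]), int(code[3:6])
    return None if a == 0 or b == 0 else (a, b)


def observed_heterozygosity(pop, locus):
    """Share of genotyped individuals that are heterozygous at one locus."""
    calls = [split_genotype(ind[locus]) for ind in pop]
    calls = [c for c in calls if c is not None]
    if not calls:
        return float("nan")
    return sum(a != b for a, b in calls) / len(calls)


# Hypothetical two-site, two-locus example (invented allele sizes).
TOY_GEN = """Toy title line
LocA, LocB
Pop
ind1 , 120124 200200
ind2 , 120120 000000
POP
ind3 , 124124 200204
"""

loci, pops = parse_genepop(TOY_GEN)
for p, pop in enumerate(pops, start=1):
    print(p, [observed_heterozygosity(pop, k) for k in range(len(loci))])
```

Missing genotypes are dropped from the denominator, so each heterozygosity value is based only on individuals genotyped at that locus.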
File: bestK_3.csv
Description: ΔK values (Evanno et al. 2005) and mean natural log probabilities of the data, LnP(K), for each value of K (the number of ancestral populations), output from STRUCTURE HARVESTER (Earl and vonHoldt 2012) and STRUCTURE (Pritchard et al. 2000).
Earl DA, vonHoldt BM (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour 4:359–361. https://doi.org/10.1007/s12686-011-9548-7
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611–2620
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Variables
- K: the a priori number of ancestral populations
- deltaK: the ΔK value as calculated by STRUCTURE HARVESTER
- Mean LnP(K): the mean natural log probability of the data, as calculated by STRUCTURE
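For readers unfamiliar with ΔK, the sketch below shows how it relates to replicate STRUCTURE runs: ΔK(K) is the absolute second difference of LnP across neighbouring K values divided by the standard deviation of LnP(K) among runs, and the K with the largest ΔK is usually taken as the best-supported value. This is a minimal sketch that takes the second difference of the run means. Evanno et al. (2005) define it per run and then average, so values may differ slightly from STRUCTURE HARVESTER output. The replicate LnP values are invented.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for each interior K from replicate LnP(D) values per K."""
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / stdev(lnp_by_k[k])
    return delta


# Hypothetical replicate LnP(D) values (invented, three runs per K).
runs = {
    1: [-1000, -1002, -998],
    2: [-900, -905, -895],
    3: [-850, -852, -848],
    4: [-845, -850, -840],
}
delta = evanno_delta_k(runs)
print(delta)
print(max(delta, key=delta.get))
```

ΔK is undefined for the smallest and largest K tested, because the second difference needs a neighbour on each side.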
File: CODE_landscape_genetics.R
Description: Our code used for landscape genetic analyses. The study species is listed as Threatened by the state of Michigan (USA), so we do not provide the specific coordinates or the files used in the isolation by distance section of the code (e.g., "longlat.csv", "IBD_matrix_km.csv"). However, the pairwise geographic and genetic distances are provided in the DATA_IBD(segmented_IBD_data)_2.csv file, which can be used in place of the "IBD_data_real_no10-3.csv" input on line 54 to follow the linear and segmented isolation by distance modeling. The DATA_landscape_genetics_2.csv file can be used in place of the "C:/Users/travr/Desktop/data_files/revised_data.csv" input for all of the landscape genetics modeling code (i.e., the maximum-likelihood population effects [MLPE] models).
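MLPE models account for the fact that pairwise observations sharing a site are not independent. Under the underlying model, y_ij = mu + tau_i + tau_j + e_ij, two pairs that share one site have residual correlation rho = s_tau² / (2·s_tau² + s_eps²), which can be at most 0.5, and pairs with no site in common are uncorrelated. The sketch below builds that correlation matrix for a few hypothetical sites to show the structure; it is not the corMLPE package, which (as we understand it) estimates a single such rho inside nlme models.

```python
from itertools import combinations


def mlpe_correlation(pairs, rho):
    """Correlation matrix among pairwise observations under an MLPE structure."""
    if not 0.0 <= rho <= 0.5:
        raise ValueError("MLPE correlation must lie in [0, 0.5]")
    n = len(pairs)
    corr = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            shared = len(set(pairs[a]) & set(pairs[b]))
            if shared == 2:
                corr[a][b] = 1.0  # the same pair (diagonal)
            elif shared == 1:
                corr[a][b] = rho  # pairs that share exactly one site
    return corr


def rho_from_variances(site_var, residual_var):
    """rho = s_tau^2 / (2 s_tau^2 + s_eps^2) for y_ij = mu + tau_i + tau_j + e_ij."""
    return site_var / (2.0 * site_var + residual_var)


# Hypothetical example: four sites give six pairwise observations.
sites = [1, 2, 3, 4]
pairs = list(combinations(sites, 2))  # (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
rho = rho_from_variances(1.0, 2.0)
corr = mlpe_correlation(pairs, rho)
print(rho)
for pair, row in zip(pairs, corr):
    print(pair, row)
```

With four sites, each pair shares a site with four other pairs and is independent of only one, which shows why ordinary regression on pairwise distances overstates the effective sample size.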
File: CODE_population_genetics.R
Description: Our code used for population genetic analyses. The DATA_GP(genotypic_data_used_for_analyses).gen file can be used in place of the "GP_no29_no10-3.gen" input file in the code, and the DATA_GP(genotypic_data_including_unused_microsat_marker).gen file can be used in place of the "GP_no10-3.gen" input file. The "rAR_n_no10-3.csv" input can be constructed by combining the mean rarefied allelic richness values and site names in the Richness object with the sample sizes reported in the corresponding article. The bestK_3.csv file can be used in place of the "bestK2.csv" input file. The DATA_K3(individual_admixture).xlsx file can be used (after saving as .csv) in place of the "k3_clumpp_no10-3.csv" input file, and the DATA_K20(individual_admixture).xlsx file can be used (after saving as .csv) in place of the "k20_clumpp_no10-3.csv" input file.
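Rarefied allelic richness corrects for unequal sample sizes by estimating how many distinct alleles would be expected in a random subsample of g gene copies: r(g) = Σ_i [1 − C(N − N_i, g) / C(N, g)], where N is the number of gene copies at the locus (twice the number of genotyped diploids) and N_i is the count of allele i. This is, as we understand it, the rarefaction approach that hierfstat's `allelic.richness` is based on, with g typically set from the smallest sample across sites. The sketch below is an illustration with invented allele counts, not a replacement for the R code.

```python
from math import comb


def rarefied_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies."""
    n = sum(allele_counts)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the number of gene copies")
    total = comb(n, g)
    # An allele is missed only if all g draws come from the other n - ni copies.
    return sum(1.0 - comb(n - ni, g) / total for ni in allele_counts if ni > 0)


def mean_rarefied_richness(loci_counts, g):
    """Average rarefied richness across loci for one site."""
    values = [rarefied_richness(counts, g) for counts in loci_counts]
    return sum(values) / len(values)


# Hypothetical allele counts for one site at two loci (invented values).
site_loci = [[3, 1], [2, 2, 2]]
print(rarefied_richness([3, 1], 2))
print(mean_rarefied_richness(site_loci, 2))
```

Setting g to the full sample size returns the raw allele count, and g = 1 always returns 1, which makes two quick sanity checks.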
Code/software
car
Fox J, Weisberg S, Price B (2023). car: Companion to Applied Regression. R package version 3.1-2. https://CRAN.R-project.org/package=car
CLUMPP
Jakobsson M, Rosenberg NA (2007). “CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure.” Bioinformatics, 23(14), 1801–1806.
corMLPE
Pope N (2021). corMLPE: Correlation Structure for Pairwise Data. GitHub repository. https://github.com/nspope/corMLPE
dplyr
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.4. https://CRAN.R-project.org/package=dplyr
ggplot2
Wickham H (2023). ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.4.0. https://CRAN.R-project.org/package=ggplot2
hierfstat
Goudet J (2005). “hierfstat, a package for R to compute and test hierarchical F-statistics.” Molecular Ecology Notes, 5(1), 184–186.
MuMIn
Bartoń K (2023). MuMIn: Multi-Model Inference. R package version 1.47.5. https://CRAN.R-project.org/package=MuMIn
nlme
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2023). nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme
Polychrome
Coombes KR, Wang M (2019). Polychrome: Creating and Assessing Qualitative Palettes with Many Colors. Journal of Statistical Software, 90(1), 1–23.
PopGenReport
Adamack AT, Gruber B (2014). “PopGenReport: Simplifying basic population genetic analyses in R.” Methods in Ecology and Evolution, 5(4), 384–387.
raster
Hijmans RJ (2023). raster: Geographic Data Analysis and Modeling. R package version 3.6-26. https://CRAN.R-project.org/package=raster
remotes
Hester J, Bryan J (2023). remotes: R Package Installation from Remote Repositories, Including 'GitHub'. https://CRAN.R-project.org/package=remotes
segmented
Muggeo VMR (2023). segmented: Regression Models with Break-Points/Change-Points Estimation. R package version 2.0-2. https://CRAN.R-project.org/package=segmented
STRUCTURE
Pritchard JK, Stephens M, Donnelly P (2000). “Inference of population structure using multilocus genotype data.” Genetics, 155(2), 945–959.
