Distinguishing intraspecific from interspecific variation in the leopard frog species complex
Data files
Apr 23, 2025 version files (2.56 GB)
- Chambersetal_PNAS_SuppMaterials.zip (2.56 GB)
- README.md (46.47 KB)
Abstract
In an era of unprecedented biodiversity loss, the need for standardized practices to describe biological variation is becoming increasingly important. As with all scientific endeavors, species delimitation needs to be explicit, testable, and refutable. A fundamental task in species delimitation is distinguishing within-species variation from among-species variation. Many species that are distributed across large geographic areas exhibit levels of genetic variation that are as great or greater than those that exist between well-defined sympatric species. Here, we provide a workflow to distinguish between intra- and interspecific genetic variation and apply the workflow to a taxonomically problematic group of frogs (the Rana pipiens complex, or leopard frogs) that are widely distributed across Mexico and Central America. Our workflow makes use of recent advancements that pair genome-scale datasets with model-based species delimitation methods, while emphasizing the need for positive evidence of reproductive isolation to confirm the validity of geographically contiguous species boundaries. We find that intraspecific geographic variation in widespread leopard frog species has resulted in considerable taxonomic inflation of species. Ten currently recognized species are not supported in our analyses, and we here synonymize them with previously named taxa. Furthermore, we find positive evidence for the presence of three undescribed species. In addition to proposing these taxonomic changes, we provide descriptions of the data or analyses that would be needed to refute and overturn our recommendations. We recommend that all species delimitation studies (especially of geographically variable groups) clarify what new evidence would be sufficient to change the taxonomic recommendations.
This repository contains data files for:
Chambers EA, Lara-Tufiño JD, Campillo-García G, Cisneros-Bernal AY, Dudek Jr. DJ, León-Règagnon V, Townsend JH, Flores-Villela O, & Hillis DM. (2025) Distinguishing species boundaries from geographic variation. Proceedings of the National Academy of Sciences 122 (19) e2423688122.
This paper provides a suggested framework for testing whether genetic variation arises through species boundaries or intraspecific geographic variation and focuses on species in the leopard frog species complex (Rana pipiens) in Mexico and Central America. All raw data and input files can be found on Dryad, and raw, demultiplexed fastqs can be found on SRA (BioProject PRJNA1233814). All scripts can also be found on GitHub.
DRYAD FILES: DATA FILES
Naming Conventions
- METHOD: File or directory names contain "rana_n-1" or "pooled" to denote pooled assembly on all samples, while the term "separate" is used to refer to different subassemblies of individuals.
- TAXONOMY: Files relating to separate assemblies are named according to the geographic region where samples were collected (except for forreri). They are as follows:
  - PACMX (n=274): Pacific coast (lowlands and foothills) of Mexico. Includes described species: R. forreri (MX), R. macroglossa (MX), R. spectabilis (Oaxaca only), R. omiltemana, R. magnaocularis, R. yavapaiensis. Also includes unnamed species (Hillis & Wilcox 2005): sp. 7 (Jalisco) and sp. 8 (Puebla), as well as the "papagayo form" described by Hillis et al. (1983) and Arcelia and Colima forms.
  - CENTAM (n=158): Samples from Central America. Includes described species: R. lenca, R. miadis, R. macroglossa (CENTAM), R. forreri (CENTAM), R. taylori. Also includes unnamed species (Hillis & Wilcox 2005): sp. 4 (PA), sp. 5 (CR), sp. 6 (CR).
  - ATL_MXPL (n=202): Atlantic coast and the Mexican Plateau. Includes described species: R. taylori, R. macroglossa, R. brownorum [not explicitly IDed], R. berlandieri, R. spectabilis, R. tlaloci, R. neovolcanica, R. chichicuahutla [not explicitly IDed]. Also includes unnamed species (Hillis & Wilcox 2005): sp. 3 (which is R. spectabilis).
  - forreri (n=104): All members from the Pacific coastal lowlands. Includes described species: R. miadis, R. forreri, R. cora, R. adleri, R. hillisi, R. floresi.
- SEQUENCING: The sequencing data were delivered in four separate projects and a number of pools within those projects. Bioinformatics assembly files are named according to the project and the pool. For example, "params-JA19529_pool3.txt" is the iPyrad parameter file for pool 3 within the JA19529 project.
  - JA19241 (n=130): pools 1-6
  - JA19242 (n=311): pools 1-13
  - JA20247 (n=20): pool 1
  - JA20248 (n=163): pools 1-7
  - We also incorporated several samples from Chambers et al. (2023). We first copied fastqs from their SRA archive and re-ran iPyrad steps 2-3 (n=9). This assembly is named "epirana" in our supplementary files; the params file is called "params-epirana.txt".
Please be aware that a number of samples were used in this assembly that were not included in the analyses in the final manuscript. This is why the iPyrad parameters and stats files include more samples (n=632) than are included in our final assembly (n=527). This was due to the following three reasons:
(1) Samples were dropped that were low quality (i.e., high amounts of missing data)
(2) Samples originated from species that were not included in the present study as they were more taxonomically distant from the leopard frog complex
(3) Samples were collected in series from existing samples and thus were effectively duplicates for our analyses
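The project-and-pool naming convention described under SEQUENCING can be handled programmatically when sorting through the parameter files. The following is a minimal sketch in Python, not part of the authors' scripts; the function name `parse_params_name`, the `AssemblyFile` type, and the regular expression are illustrative assumptions inferred only from the example names given above ("params-JA19529_pool3.txt", "params-epirana.txt", "params-rana_n-1.txt").

```python
import re
from typing import NamedTuple, Optional


class AssemblyFile(NamedTuple):
    """Hypothetical container for the parts of a params file name."""
    project: str         # e.g. "JA19529", "epirana", or "rana_n-1"
    pool: Optional[int]  # pool number, or None if the name has no pool suffix


# Assumed pattern: "params-<project>[_pool<N>].txt", based on the README examples.
_PARAMS_RE = re.compile(r"^params-(?P<project>.+?)(?:_pool(?P<pool>\d+))?\.txt$")


def parse_params_name(filename: str) -> AssemblyFile:
    """Split an iPyrad params file name into its project and pool parts."""
    match = _PARAMS_RE.match(filename)
    if match is None:
        raise ValueError(f"not an iPyrad params file name: {filename!r}")
    pool = match.group("pool")
    return AssemblyFile(match.group("project"), int(pool) if pool is not None else None)
```

For instance, `parse_params_name("params-JA19529_pool3.txt")` yields project `"JA19529"` and pool `3`, while names without a pool suffix, such as the "epirana" params file, yield a pool of `None`.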
Data Structure
Supplementary data files are organized into directories named for the part of the analysis they belong to and numbered in the order in which the analyses were performed:
- 1_Bioinformatics: contains all input and output files from running bioinformatics pipelines
- 2_Data_processing: contains files relevant to processing output from iPyrad prior to running analyses
- 3_Analyses: contains all input and output files from phylogenomic, population genomic, species delimitation, landscape genomic, and gene flow analyses
- 4_Data_visualization
Chambersetal_PNAS_SuppMaterials/
|-- 1_Bioinformatics/
| | |-- Pooled_assembly/ # iPyrad files for pooled assembly
| | |-- iPyrad_input_files/ # iPyrad input files for pooled assembly
| | | |-- params-*.txt # iPyrad parameter files
| | | |-- *_barcodes_pool*.txt # Barcodes for a given job and pool
| | | |-- params-rana_n-1.txt # iPyrad parameter files for final assembly
| | |-- iPyrad_output_files/ # iPyrad output files for pooled assembly
| | | |-- pooled_rana.vcf # Variant call format file for genotypes
| | | |-- pooled_rana.phy # Concatenated Phylip with complete datasets
| | | |-- pooled_rana.snps # Concatenated Phylip file containing only SNPs
| | | |-- pooled_rana.usnps # Concatenated Phylip file with one SNP/locus
| | | |-- pooled_rana.loci # iPyrad loci file
| | | |-- pooled_rana.snps.map # A map file
| | | |-- rana_n-1_stats.txt # Stats file from step 7
| | | |-- stats_files/ # Stats files from iPyrad steps 1-5
| | | | |-- *_s1_demultiplex_stats.txt # Output stats from demultiplexing (step 1)
| | | | |-- *_s2_rawedit_stats.txt # Output stats from step 2
| | | | |-- *_s3_cluster_stats.txt # Output stats from step 3
| | | | |-- pooled_s4_joint_estimate.txt # Output stats from step 4
| | | | |-- pooled_s5_consens_stats.txt # Output stats from step 5
| | |-- Separate_assemblies/ # iPyrad files for separate assemblies
| | | |-- iPyrad_input_files/ # iPyrad input files for separate assemblies
| | | | |-- params-*.txt # iPyrad parameter files
| | | |-- iPyrad_output_files/ # iPyrad output files for separate assemblies
| | | | |-- new_ATL_MXPL/ # iPyrad output files for ATL_MXPL assembly
| | | | | |-- *.phy # Phylip file with complete datasets
| | | | | |-- *.snps.phy # Concatenated Phylip file, only SNPs
| | | | | |-- *_stats.txt # Stats files for all steps
| | | | | |-- *.loci # iPyrad loci file
| | | | | |-- *.snps.map # iPyrad map file
| | | | | |-- *.str # Structure format file with all SNPs
| | | | | |-- *.u.snps.phy # Concatenated Phylip file
| | | | | |-- *.ustr # Structure format file
| | | | | |-- *.vcf # Variant call format file
| | | | |-- new_CENTAM/ # iPyrad output files for CENTAM assembly
| | | | | |-- *.phy # Concatenated Phylip file
| | | | | |-- *.snps.phy # Concatenated Phylip file, SNPs
| | | | | |-- *_stats.txt # Stats files for all steps
| | | | | |-- *.loci # iPyrad loci file
| | | | | |-- *.snps.map # iPyrad map file
| | | | | |-- *.str # Structure format file with all SNPs
| | | | | |-- *.u.snps.phy # Concatenated Phylip file
| | | | | |-- *.ustr # Structure format file
| | | | | |-- *.vcf # Variant call format file
| | | | |-- new_PACMX/ # iPyrad output files for PACMX assembly
| | | | | |-- *.phy # Concatenated Phylip file
| | | | | |-- *.snps.phy # Concatenated Phylip file, SNPs
| | | | | |-- *_stats.txt # Stats files for all steps
| | | | | |-- *.loci # iPyrad loci file
| | | | | |-- *.snps.map # iPyrad map file
| | | | | |-- *.str # Structure format file with all SNPs
| | | | | |-- *.u.snps.phy # Concatenated Phylip file
| | | | | |-- *.ustr # Structure format file
| | | | | |-- *.vcf # Variant call format file
| | | | |-- forreri/ # iPyrad output files for forreri assembly
| | | | | |-- *.phy # Concatenated Phylip file with complete datasets
| | | | | |-- *.snps.phy # Concatenated Phylip file containing only SNPs
| | | | | |-- *_stats.txt # Stats files for all steps
| | | | | |-- *.loci # iPyrad loci file with info on which SNPs belonged to the same locus
| | | | | |-- *.snps.map # A map file with details on which SNP belongs to which locus
| | | | | |-- *.str # Structure format file with all SNPs
| | | | | |-- *.u.snps.phy # Concatenated Phylip file with one SNP per locus
| | | | | |-- *.ustr # Structure format file with one SNP per locus
| | | | | |-- *.vcf # Variant call format file for encoding genotypes
|
| |-- 2_Data_processing/
| | |-- data_files_input_into_scripts/ # Files for input into data processing scripts
| | | |-- pooled_dataset_assignments.txt # Info on assembly assignment
| | | |-- pooled_missing.txt # Info on missing data
| | | |-- pooled_statefreqs.txt # State frequencies calculated using PAUP*
| | | |-- PACMX_metadata.txt # Metadata associated with PACMX assembly
| | | |-- ATL_MXPL_metadata.txt # Metadata associated with ATL_MXPL assembly
| | | |-- CENTAM_metadata.txt # Metadata associated with CENTAM assembly
| | | |-- forreri_metadata.txt # Metadata associated with forreri assembly
| | | |-- forreri_sites.txt # Site-level data for forreri samples
| | | |-- forreri_envpcs_results.txt # Variance explained by each PC
| | | |-- bioclim_vars.txt # Bioclimatic variable names
| | | |-- mx_centam_merged.* # Merged Mexico + CentAm shapefile for FEEMS
|
| |-- 3_Analyses/
| | |-- 1_phylo/ # Files associated with phylogenomic analysis in RAxML-ng
| | | |-- input_files/ # Input files for RAxML-ng analysis
| | | | |-- pooled_rana.snps_min10k.recode.vcf # VCF for 10K SNPs
| | | | |-- pooled_rana.snps_min10K.phy # Phylip file
| | | | |-- pooled_rana.snps_min10K_invarrem.phy # Phylip input file for RAxML-ng, invar sites removed
| | | | |-- pooled_rana.snps_min10K_invarrem.nexus # Nexus file for RAxML-ng, invar sites removed
| | | |-- output_files/ # All output files associated with RAxML-ng run
| | | | |-- pruned_tree.bestTree.tree # Best RAxML tree
| | | | |-- pruned_tree.support.tree # Best RAxML tree with bootstrap support values
| | | | |-- pruned_tree_tips.txt # Tip labels for RAxML tree
|
| | |-- 2_popgen/ # Files associated with running admixture analysis
| | | |-- ATL_MXPL/ # All admixture files associated with ATL_MXPL
| | | | |-- input_files/ # Input files for running admixture
| | | | | |-- atlmxpl_indskeep.txt # List of samples to be included for admixture analysis
| | | | | |-- ATL_MXPL_relaxed.recode.vcf # VCF with samples to be included for admixture analysis
| | | | | |-- ATL_MXPL_relaxed_0.25miss.recode.vcf # VCF with missing data filter
| | | | | |-- ATL_MXPL_relaxed_0.25miss.lmiss # Variant-based missing data report for above
| | | | | |-- ATL_MXPL_relaxed_0.25miss.imiss # Individual-based missing data report for above
| | | | | |-- ATL_MXPL_relaxed_0.25miss_ldpruned.txt # List of sites to prune
| | | | | |-- ATL_MXPL_relaxed_0.25miss_ldp.recode.vcf # LD-pruned VCF for admixture analysis
| | | | | |-- ATL_MXPL_relaxed_0.25miss_ldp.bim # Plink bim file for admixture analysis
| | | | | |-- ATL_MXPL_relaxed_0.25miss_ldp.bed # Plink bed file for admixture analysis
| | | | | |-- ATL_MXPL_relaxed_0.25miss_ldp.fam # Plink fam file for admixture analysis
| | | | | |-- ATL_MXPL_relaxed_0.25miss_ldp.log # Plink log file for admixture analysis
| | | | | |-- ATL_MXPL_relaxed_0.25miss_ldp.nosex # Plink nosex file for admixture analysis
| | | | |-- output_files/ # Admixture output files, named based on K value and repetition
| | | | | |-- *.out # Output log from admixture
| | | | | |-- *.Q # Q matrix from admixture
| | | |-- PACMX/ # All admixture files associated with PACMX assembly
| | | | |-- input_files/ # Input files for running admixture
| | | | | |-- pacmx_indskeep.txt # List of samples to be included for admixture analysis
| | | | | |-- new_PACMX_relaxed.recode.vcf # VCF with samples to be included for admixture analysis
| | | | | |-- new_PACMX_relaxed_0.25miss.recode.vcf # VCF with missing data filter
| | | | | |-- new_PACMX_relaxed_0.25miss.lmiss # Variant-based missing data report for above
| | | | | |-- new_PACMX_relaxed_0.25miss.imiss # Individual-based missing data report for above
| | | | | |-- new_PACMX_relaxed_0.25miss_ldpruned.txt # List of sites to prune
| | | | | |-- new_PACMX_relaxed_0.25miss_ldp.recode.vcf # LD-pruned VCF for admixture analysis
| | | | | |-- new_PACMX_relaxed_0.25miss_ldp.bim # Plink bim file for admixture analysis
| | | | | |-- new_PACMX_relaxed_0.25miss_ldp.bed # Plink bed file for admixture analysis
| | | | | |-- new_PACMX_relaxed_0.25miss_ldp.fam # Plink fam file for admixture analysis
| | | | | |-- new_PACMX_relaxed_0.25miss_ldp.log # Plink log file for admixture analysis
| | | | | |-- new_PACMX_relaxed_0.25miss_ldp.nosex # Plink nosex file for admixture analysis
| | | | |-- output_files/ # Admixture output files, named based on K value and repetition
| | | | | |-- *.out # Output log from admixture
| | | | | |-- *.Q # Q matrix from admixture
| | | |-- forreri/ # All admixture files associated with forreri assembly
| | | | |-- input_files/ # Input files for running admixture
| | | | | |-- forreri_0.25miss.recode.vcf # VCF with missing data filter
| | | | | |-- forreri_0.25miss.lmiss # Variant-based missing data report for above
| | | | | |-- forreri_0.25miss.imiss # Individual-based missing data report for above
| | | | | |-- forreri_0.25miss_ldpruned.txt # List of sites to prune
| | | | | |-- forreri_0.25miss_ldp.recode.vcf # LD-pruned VCF for admixture analysis
| | | | | |-- forreri_0.25miss_ldp.bim # Plink bim file for admixture analysis
| | | | | |-- forreri_0.25miss_ldp.bed # Plink bed file for admixture analysis
| | | | | |-- forreri_0.25miss_ldp.fam # Plink fam file for admixture analysis
| | | | | |-- forreri_0.25miss_ldp.log # Plink log file for admixture analysis
| | | | | |-- forreri_0.25miss_ldp.nosex # Plink nosex file for admixture analysis
| | | | |-- Output_files/ # Admixture output files, named based on K value and repetition
| | | | | |-- *.out # Output log from admixture
| | | | | |-- *.Q # Q matrix from admixture
| | | |-- CENTAM/ # All admixture files associated with CENTAM assembly
| | | | |-- input_files/ # Input files for running admixture
| | | | | |-- centam_indskeep.txt # List of samples to be included for admixture analysis
| | | | | |-- new_CENTAM_relaxed.recode.vcf # VCF with samples to be included for admixture
| | | | | |-- new_CENTAM_relaxed_0.25miss.recode.vcf # VCF with missing data filter
| | | | | |-- new_CENTAM_relaxed_0.25miss.lmiss # Variant-based missing data report for above
| | | | | |-- new_CENTAM_relaxed_0.25miss.imiss # Individual-based missing data report for above
| | | | | |-- new_CENTAM_relaxed_0.25miss_ldpruned.txt # List of sites to prune
| | | | | |-- new_CENTAM_relaxed_0.25miss_ldp.recode.vcf # LD-pruned VCF for admixture analysis
| | | | | |-- new_CENTAM_relaxed_0.25miss_ldp.bim # Plink bim file for admixture analysis
| | | | | |-- new_CENTAM_relaxed_0.25miss_ldp.bed # Plink bed file for admixture analysis
| | | | | |-- new_CENTAM_relaxed_0.25miss_ldp.fam # Plink fam file for admixture analysis
| | | | | |-- new_CENTAM_relaxed_0.25miss_ldp.log # Plink log file for admixture analysis
| | | | | |-- new_CENTAM_relaxed_0.25miss_ldp.nosex # Plink nosex file for admixture analysis
| | | | |-- output_files/ # Admixture output files, named based on K value and repetition
| | | | | |-- *.out # Output log from admixture
| | | | | |-- *.Q # Q matrix from admixture
|
| | |-- 3_hhsd/ # All files associated with running HHSD analysis
| | | | |-- mxpl/ # All HHSD files associated with mxpl group (subset from ATL_MXPL assembly)
| | | | | |-- input_files/ # HHSD input files for mxpl group
| | | | | | |-- ATL_MXPL_0.6miss_ldp.recode.vcf # VCF with stricter miss filter
| | | | | | |-- mxpl_indv.txt # Inds to be retained for mxpl HHSD analysis from ATL_MXPL
| | | | | | |-- mxpl_remove.txt # Inds to be removed from ATL_MXPL
| | | | | | |-- mxpl_06.recode.vcf # VCF with inds retained for HHSD analysis from ATL_MXPL
| | | | | | |-- mxpl_06.lmiss # Variant-based missing data report from above file
| | | | | | |-- mxpl_06.phy # Phylip file for input into HHSD
| | | | | | |-- mxpl_0675.recode.vcf # Smaller VCF with inds retained for HHSD analysis
| | | | | | |-- mxpl_0675.lmiss # Variant-based missing data report from above file
| | | | | | |-- mxpl_0675.phy # Phylip file for input into HHSD
| | | | | | |-- mxpl-Imap.txt # Population assignment for HHSD
| | | | | | |-- cf_mxpl_merge_0120.txt # Control file for merge alg with migprior(0.1, 20)
| | | | | | |-- cf_mxpl_split_0120.txt # Control file for split alg with migprior(0.1, 20)
| | | | | | |-- cf_mxpl0675_merge_0110.txt # Control file for merge with migprior(0.1, 10)
| | | | | | |-- cf_mxpl0675_split_0110.txt # Control file for split with migprior(0.1, 10)
| | | | | | |-- cf_mxpl0675_merge_220.txt # Control file for merge with migprior(2, 20)
| | | | | | |-- cf_mxpl0675_split_220.txt # Control file for split with migprior(2, 20)
| | | | | |-- output_files/ # HHSD output files for mxpl group
| | | | | | |-- mxpl_merge_0120/ # Output files for merge alg with migprior(0.1, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files corresponding to respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment from iteration
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- mxpl_split_0120/ # Output files for split algorithm with migprior(0.1, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment from iteration
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- mxpl0675_merge_0110/ # Output files for merge alg with migprior(0.1, 10)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- mxpl0675_split_0110/ # Output files for split with migprior(0.1, 10)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- mxpl0675_merge_220/ # Output files for merge algorithm with migprior(2, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- mxpl0675_split_220/ # Output files for split algorithm with migprior(2, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | |-- foothills/ # All HHSD files associated with foothills group (subset from PACMX assembly)
| | | | | |-- input_files/ # HHSD input files for foothills group
| | | | | | |-- PACMX_0.6miss_ldp.recode.vcf # VCF with stricter missing data filter
| | | | | | |-- foothills_indv.txt # Inds to be retained from PACMX
| | | | | | |-- foothills_remove.txt # Individuals to be removed from PACMX
| | | | | | |-- foothills_06.recode.vcf # VCF with inds retained for foothills analysis
| | | | | | |-- foothills_06.lmiss # Variant-based missing data report from above
| | | | | | |-- foothills_06.phy # Phylip file for input into HHSD
| | | | | | |-- foothills_07.recode.vcf # Smaller VCF with inds retained for foothills
| | | | | | |-- foothills_07.lmiss # Variant-based missing data report from above
| | | | | | |-- foothills_07.phy # Phylip file for input into HHSD
| | | | | | |-- foothills-Imap.txt # Population assignment for HHSD
| | | | | | |-- cf_foothills_merge_0120.txt # Control file for merge with migprior(0.1, 20)
| | | | | | |-- cf_foothills_split_0120.txt # Control file for split with migprior(0.1, 20)
| | | | | | |-- cf_foothills07_merge_0110.txt # Control file for merge with migprior(0.1, 10)
| | | | | | |-- cf_foothills07_split_0110.txt # Control file for split with migprior(0.1, 10)
| | | | | | |-- cf_foothills07_merge_220.txt # Control file for merge with migprior(2, 20)
| | | | | | |-- cf_foothills07_split_220.txt # Control file for split with migprior(2, 20)
| | | | | |-- output_files/ # HHSD output files for foothills group
| | | | | | |-- foothills_merge_0120/ # Output files for merge alg with migprior(0.1, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- foothills_split_0120/ # Output files for split with migprior(0.1, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- foothills07_merge_0110/ # Output files for merge with migprior(0.1, 10)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- foothills07_split_0110/ # Output files for split with migprior(0.1, 10)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment from iteration
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- foothills07_merge_220/ # Output files for merge alg with migprior(2, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment from iteration
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- foothills07_split_220/ # Output files for split alg with migprior(2, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment from iteration
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | |-- forreri/ # All HHSD files associated with forreri group
| | | | | |-- input_files/ # HHSD input files for forreri group
| | | | | | |-- forreri_0.6miss_ldp.recode.vcf # VCF with stricter missing data filter
| | | | | | |-- forreri_0.6miss.lmiss # Variant-based missing data report
| | | | | | |-- forreri_0.6miss_ldp.phy # Phylip file for input into HHSD
| | | | | | |-- forreri_0.75miss_ldp.recode.vcf # Smaller VCF with stricter missdata filter
| | | | | | |-- forreri_075.lmiss # Variant-based missing data report from above
| | | | | | |-- forreri_075.phy # Phylip file for input into HHSD
| | | | | | |-- forreri-Imap.txt # Population assignment for HHSD
| | | | | | |-- cf_forreri_merge_0120.txt # Control file for merge with migprior(0.1, 20)
| | | | | | |-- cf_forreri_split_0120.txt # Control file for split with migprior(0.1, 20)
| | | | | | |-- cf_forreri075_merge_0110.txt # Control file for merge with migprior(0.1, 10)
| | | | | | |-- cf_forreri075_split_0110.txt # Control file for split with migprior(0.1, 10)
| | | | | | |-- cf_forreri075_merge_220.txt # Control file for merge with migprior(2, 20)
| | | | | | |-- cf_forreri075_split_220.txt # Control file for split with migprior(2, 20)
| | | | | |-- output_files/ # HHSD output files for forreri group
| | | | | | |-- forr06_K5_merge_0120/ # Output files for merge with migprior(0.1, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for respective iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision for given iteration
| | | | | | | | |-- estimated_M.csv # Estimated migration rates between pops
| | | | | | | | |-- estimated_tau_theta.csv # Est tau and theta vals between pops
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- forr06_K5_split_0120/ # Output files for split alg with migprior(0.1, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision
| | | | | | | | |-- estimated_M.csv # Estimated migration rates
| | | | | | | | |-- estimated_tau_theta.csv # Estimated tau and theta vals
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- forreri075_merge_0110/ # Output files for merge alg with migprior(0.1, 10)
| | | | | | | |-- Iteration_*/ # HHSD output files for iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision
| | | | | | | | |-- estimated_M.csv # Estimated migration rates
| | | | | | | | |-- estimated_tau_theta.csv # Estimated tau and theta vals
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- forreri075_split_0110/ # Output files for split alg with migprior(0.1, 10)
| | | | | | | |-- Iteration_*/ # HHSD output files for iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision
| | | | | | | | |-- estimated_M.csv # Estimated migration rates
| | | | | | | | |-- estimated_tau_theta.csv # Estimated tau and theta vals
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment from iteration
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- forreri075_merge_220/ # Output files for merge alg with migprior(2, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision
| | | | | | | | |-- estimated_M.csv # Estimated migration rates
| | | | | | | | |-- estimated_tau_theta.csv # Estimated tau and theta vals
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
| | | | | | |-- forreri075_split_220/ # Output files for split alg with migprior(2, 20)
| | | | | | | |-- Iteration_*/ # HHSD output files for iteration
| | | | | | | | |-- decision.csv # HHSD spp delim decision
| | | | | | | | |-- estimated_M.csv # Estimated migration rates
| | | | | | | | |-- estimated_tau_theta.csv # Estimated tau and theta vals
| | | | | | | | |-- proposal_bpp_mcmc.txt # MCMC results from BPP
| | | | | | | | |-- proposal_bpp_out.txt # Output from BPP
| | | | | | | | |-- proposed_ctl.ctl # Control file for BPP
| | | | | | | | |-- proposed_imap.txt # Imap file for BPP
| | | | | | | | |-- RESULT_IMAP.txt # Resulting pop assignment
| | | | | | | | |-- RESULT_TREE.txt # Resulting tree from iteration
|
| | |-- 4_landgen/ # All files associated with running MMRR and GDM analyses
| | | |-- forreri_envlayers.tif # Environmental layers for forreri
| | | |-- forreri_PCenv.tif # Stack of three raster PCs for landgen analysis
| | | |-- forreri_cors_env.csv # Correlation coefficients between bioclim vars
| | | |-- forreri_geodist_ldp.txt # Geographic distances between forreri samples
| | | |-- forreri_gendist_ldp.txt # Genetic distances between forreri samples
| | | |-- forreri_site_freqs.txt # Mean site-based allele freqs for each locus
|
| | |-- 5_feems/ # All files associated with running FEEMS analysis
| | | |-- input_files/ # All input files associated with FEEMS analysis
| | | | |-- forreri_0.25miss_ldp_n103.recode.vcf.gz # VCF with sample missing coords removed
| | | | |-- forreri_FILT.* # VCF and Plink files for input into FEEMS
| | | | |-- forreri_coords.txt # Sampling coordinates
| | | | |-- forreri_outer.txt # Outer coordinates
| | | | |-- forr_grid.shp # Discrete global grid
| | | |-- output_files/ # All output files associated with FEEMS analysis
| | | | |-- lamb_grid.csv # Grid of lambda values for cross-validation analysis
| | | | |-- feems_edges.csv # FEEMS edge positions
| | | | |-- feems_node_pos.csv # FEEMS node positions
| | | | |-- feems_node_pos_T.csv # FEEMS node positions
| | | | |-- feems_nodes.csv # Node numbering
| | | | |-- feems_w.csv # Weights of edges
| | | | |-- mean_cv_err.csv # Results from cross-validation analysis
|
|-- 4_Data_visualization/
| |-- data_files_input_into_scripts/ # Files for input into data visualization scripts
| | |-- guatemala_bocourt.txt # Coordinates relevant to macroglossa type locality
| | |-- type_localities.txt # Coordinates for type localities
| | |-- type_localities_rec.txt # Recommended taxonomy type localities
| | |-- type_localities_notrec.txt # Type localities for synonymized taxa
| | |-- forreri_typelocalities.txt # Type localities for forreri complex
| | |-- forreri_sites.txt # Site-level data for forreri sampling coords
| | |-- PACMX_sites.txt # Site-level data for PACMX sampling coords
| | |-- *_order.txt # Ordering of inds for pop gen plotting
| |-- rana_taxonomy.txt # Taxonomic history of Rana
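Each HHSD run directory listed above contains one `Iteration_*/` folder per algorithm step, and each folder has its own `decision.csv`. The following is a minimal Python sketch for gathering those files from a single run directory. The helper name `collect_decisions` is hypothetical and not part of the deposited scripts. No column layout is assumed for `decision.csv`, so rows are returned as raw lists of strings.

```python
import csv
from pathlib import Path


def collect_decisions(run_dir):
    """Return {iteration folder name: rows of its decision.csv} for one HHSD run.

    Hypothetical helper: rows are kept as raw lists of strings because the
    column layout of decision.csv is not assumed here.
    """
    decisions = {}
    for iteration_dir in sorted(Path(run_dir).glob("Iteration_*")):
        decision_file = iteration_dir / "decision.csv"
        if not decision_file.is_file():
            continue  # skip iteration folders without a decision file
        with decision_file.open(newline="") as handle:
            decisions[iteration_dir.name] = list(csv.reader(handle))
    return decisions
```

For example, `collect_decisions("forreri075_merge_0110")` would read every iteration of that run, assuming the path is adjusted to wherever the archive was unpacked.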
ZENODO FILES: SCRIPTS AND SUPPLEMENTARY TABLES AND FIGURES doi: 10.5281/zenodo.15001401 and 10.5281/zenodo.15001403
Contains code to conduct the above analyses and data visualization. N.B.: The code is also available on GitHub; if you use that copy, place all the data files (without changing directory names!) into data/
Chambersetal_PNAS_Software/
|-- analysis/
| |-- 1_data_processing/ # Scripts to process iPyrad output files and get summary stats
| | |-- bioinformatics_processing.sh # Walkthrough for running iPyrad
| | |-- basic_data_characteristics.sh # Walkthrough and bash code to get basic summary stats, invariant sites
| | |-- basic_stats.R # Calculate average read depth from iPyrad stats files
| |-- 2_phylo/ # Scripts associated with generating the phylogenetic tree using RAxML-ng
| | |-- RAXML-ng.sh # Code to reconstruct RAxML-ng tree for pooled dataset
| | |-- remove_invariant_sites.R # Script to remove invariant sites prior to running RAXML-ng
| | |-- state_freqs.R # Calculate state frequencies for RAXML-ng
| |-- 3_popgen/ # Scripts to run admixture
| | |-- population_structure.sh # Walkthrough and code to run admixture
| | |-- LD-pruning.R # Performs LD-pruning on data prior to running admixture
| |-- 4_hhsd/ # Scripts to run species delimitation using HHSD
| | |-- HHSD.sh # Walkthrough for running HHSD
| | |-- hhsd.R # Script to generate Imap file and locus numbering
| | |-- hhsd_functions.R # Functions used by hhsd.R script
| | |-- process_loci_file.py # Script to retrieve full loci from iPyrad output files for select SNPs
| |-- 5_landgen/ # Scripts to perform landscape genomic analyses
| | |-- env_data.R # Gather and process environmental data layers for landscape genomics
| | |-- land_gen.R # Run landscape genomic analyses & generate Figs. 5 and S6
| | |-- land_gen_functions.R # Functions used by land_gen.R
| |-- 6_feems/ # Scripts to run FEEMS
| | |-- FEEMS_preprocessing.sh # Code to process data for FEEMS
| | |-- FEEMS.R # Code to generate FEEMS input files
| | |-- run_feems.py # Code to actually run FEEMS
|
|-- data_viz/ # Scripts for data visualization and figures
| |-- pop_gen_figures.R # Make any parts of figs. to do with admixture: 2B, 2C, 3B, 3C, S5
| |-- pop_gen_functions.R # Functions used by pop_gen_figures.R script
| |-- FEEMS_figures.R # Make Figs. 5C and S9
| |-- land_gen.R # Make Fig. S8; Figs. 5 and S6 created in analysis/5_landgen/land_gen.R
| |-- mapping.R # Make any figs. that are maps: 1C, 2A, 3A, S3, macroglossa type locality
| |-- mapping_functions.R # Functions used for mapping
| |-- rana_colors.R # Function to retrieve color-coding for various analyses
| |-- rana_taxonomy.R # Make Fig. S2: taxonomic history of leopard frog complex
Chambers_Rana_PNAS_SupportingInformation.pdf: Supplementary Methods, Results, Tables, and Figures
- Supporting Methods
- Supporting Results
- Taxonomic Recommendations
- Taxonomic History of the Leopard Frog Species Complex
- Note on the Type Localities of R. forreri, R. macroglossa, and R. omiltemana
- SI References
- Figures S1 to S9
- Tables S1 to S4
Supplementary Figure and Table Captions
FIGURE S1. Species delimitation workflow used in the present study.
FIGURE S2. Historical examination of the taxonomy within the Rana pipiens complex (modified from ref. 47) demonstrating the number of species recognized at the time (red line) and cumulative number of currently accepted species (black line). Major paradigm shifts related to accepted species concepts are indicated in colored boxes, with relevant time points indicated with labeled dashed lines. Details on taxonomic revisions are provided in SI Appendix, Taxonomic History of the Leopard Frog Species Complex, Taxonomic Recommendations, and Table S1. Details on species descriptions can be found in SI Appendix, Table S4.
FIGURE S3. Elevational profile and regions in Mexico referred to in text. Indicated regions combine the following Mexican biogeographic provinces: Central Mexican Plateau (includes high plains north and south provinces); Atlantic Coastal Lowlands (includes Tamaulipas, Gulf of Mexico, Yucatán, and Petén provinces); Pacific Coastal Lowlands (includes Sonoran and Pacific coast provinces); Sierra Madre de Chiapas (includes Soconusco and Chiapas highlands provinces); Sierra Madre de Oaxaca (includes the Oaxaca province). The remaining five indicated regions (Sierra Madre Occidental, Sierra Madre Oriental, Sierra Madre del Sur, Transvolcanic Belt, and Balsas Basin) are equivalent to biogeographic provinces. For simplicity, we have omitted the three Mexican biogeographic provinces that are within the states of Baja California del Norte and del Sur (California, Baja California, and del Cabo) as they fall outside the current study’s region of interest. Dashed lines indicate Mexican state boundaries. Biogeographic provinces modified based on map provided in (48).
FIGURE S4. Phylogeny for pooled assembly (n=479) reconstructed using maximum likelihood in RAxML-ng (49, 50). Tip labels are consistent with sample identifiers from bioinformatics analysis; tip circles are colorized according to the best K value obtained from ADMIXTURE results (see SI Appendix, Fig. S5). Collection localities at the state level (for Mexico and the United States; remaining countries labeled at the country level) are indicated in gray. If applicable, relevant information (e.g., paratypes, holotypes, specimens collected at type localities, specimens collected at sympatric localities, or undescribed species numbers from (7)) is provided in bolded gray text. Bootstrap support values are indicated as colorized node circles. Four separate assemblies used for analyses are indicated on the right-hand side of the figure; darker gray bars indicate individuals that were included. Abbreviations: ATL_MXPL: individuals from the Central Mexican Plateau, Transvolcanic Belt, Sierra Madre Oriental and northern Atlantic Coastal Lowlands; PACMX: individuals from the Pacific Coastal Lowlands and Balsas Basin; CENTAM: individuals from Central America; forreri: individuals belonging to the R. forreri complex.
FIGURE S5. Results from ADMIXTURE (51) for four separate assemblies: (A) Atlantic Coastal Lowlands and Central Mexican Plateau (ATL_MXPL; n = 189); (B) Central American individuals (CENTAM; n = 140); (C) Pacific Coastal Lowlands and foothills of Mexico (PACMX; n = 245); and (D) R. forreri (forreri; n = 104). Topmost panels are cross-validation mean error and standard deviation across five replicate runs for each assembly; red vertical lines indicate the optimal number of clusters selected for final analyses (Figs. 2 and 3; indicated also in bold in bar plots) and blue boxes represent K values for which bar plots are illustrated below. In all cases, individuals were included only if they had greater than 10,000 SNPs. Black vertical lines in bar plots represent subspecies or species boundaries recognized in this study. Sequencing IDs are provided at the top of bar plots; see Dataset S1 for corresponding metadata.
FIGURE S6. Heuristic hierarchical species delimitation (4) results for (A) the Pacific Coastal Lowlands and Balsas Basin (forreri grouping); (B) the foothills of the Sierra Madre Occidental and the Sierra Madre del Sur, and the western Transvolcanic Belt (foothills grouping); and (C) the Central Mexican Plateau and Atlantic Coastal Lowlands (mxpl grouping) for both HHSD merge (left) and split (right) algorithms. Bars above guide trees indicate which populations were considered in HHSD analysis and are colorized according to estimated gdi value. Bars are ordered bottom to top based on iterative merges or splits made by HHSD. Dotted lines connecting bars indicate which populations were compared.
FIGURE S7. Mantel test results for the R. forreri complex (forreri assembly; n = 103). The Mantel r statistic was 0.77 (p-value = 0.001).
FIGURE S8. (A) Environmental values for the third highest principal component from a raster PCA for the R. forreri complex study area with sampling localities used in landscape genomic analyses indicated as black circles; white lines denote country borders and Mexican states. (B) Bioclimatic variables that were included in our landscape genomic analyses loaded onto the first and third highest principal components from a raster PCA. Bioclimatic variables 13 and 14 (BIO 13 and BIO 14) loaded highly onto raster PC3 and correspond to precipitation of the wettest and driest months, respectively.
FIGURE S9. Results from cross-validation analysis from Fast Estimation of Effective Migration Surfaces (52, 53). Red dashed line indicates selected lambda value (20) which minimized cross-validation error.
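The selection rule described in the Figure S9 caption, choosing the lambda value that minimizes cross-validation error, can be sketched in Python as follows. The function name `select_lambda` is hypothetical, the numbers in the usage note are invented for illustration only, and no particular layout is assumed for `lamb_grid.csv` or `mean_cv_err.csv`: the sketch expects the grid and the mean errors to have already been read into two equal-length lists.

```python
def select_lambda(lambdas, mean_cv_errors):
    """Return the lambda whose mean cross-validation error is lowest.

    Hypothetical helper: both arguments are plain sequences of numbers,
    with mean_cv_errors[i] being the error obtained for lambdas[i].
    """
    if len(lambdas) == 0 or len(lambdas) != len(mean_cv_errors):
        raise ValueError("lambdas and mean_cv_errors must be non-empty and equal in length")
    # Index of the smallest error; ties resolve to the first occurrence.
    best_index = min(range(len(lambdas)), key=lambda i: mean_cv_errors[i])
    return lambdas[best_index]
```

With an illustrative grid `[0.1, 1, 10, 20, 100]` and made-up errors `[0.9, 0.7, 0.5, 0.4, 0.6]`, the function returns the grid value paired with the smallest error.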
TABLE S1. Currently recognized species within the Scurrilirana subgroup within the R. pipiens complex (Pantherana), indicating numbers of individuals that were included and taxonomic recommendations of the current study. Also included are undescribed species from (9) and (7) for which representatives were included in this study.
TABLE S2. Basic statistics for assemblies generated using iPyrad v.0.9.85 (54).
TABLE S3. Heuristic hierarchical species delimitation (4) results for three data groupings, three migration rate priors, and both HHSD merge and split algorithms. See Fig. 4 for a visualization of the population assignment groupings used as input populations in the HHSD guide tree. The threshold for the genealogical divergence index (gdi; ref. 5) was the same for all runs: <0.3 for the merge algorithm and >0.7 for the split algorithm. We ran each parameter set for 200,000 generations with 10% of samples discarded as burn-in. Results shown for migration priors of (0.1, 10) and (2, 20) are for downsampled datasets, while those for priors of (0.1, 20) are for full datasets. See SI Appendix, Supporting Methods for details and SI Appendix, Figure S6 for a visualization of the HHSD results.
TABLE S4. Taxonomic changes over time in the accepted number of species within the Rana pipiens complex (at the time) and the cumulative number of species that are currently recognized (before the publication of this study, other than the final point). Results are plotted in SI Appendix, Figure S2.
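The gdi thresholds given in the Table S3 caption (<0.3 to accept a merge, >0.7 to accept a split) can be illustrated with a short Python sketch. Two caveats apply. First, `gdi_no_migration` uses the standard no-migration definition, gdi = 1 - exp(-2τ/θ); under the migration models used here, HHSD derives gdi from the fitted model rather than from this closed form. Second, the function names are hypothetical, and the sketch does not model how HHSD combines the gdi values of the two populations in a pair.

```python
import math

MERGE_THRESHOLD = 0.3  # merge accepted when gdi is below this value
SPLIT_THRESHOLD = 0.7  # split accepted when gdi is above this value


def gdi_no_migration(tau, theta):
    """gdi = 1 - exp(-2 * tau / theta), valid only in the absence of migration."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 1.0 - math.exp(-2.0 * tau / theta)


def merge_accepted(gdi):
    """True when a candidate merge passes the <0.3 threshold."""
    return gdi < MERGE_THRESHOLD


def split_accepted(gdi):
    """True when a candidate split passes the >0.7 threshold."""
    return gdi > SPLIT_THRESHOLD
```

Values between the two thresholds satisfy neither rule, so a population pair in that range is neither merged by the merge algorithm nor split by the split algorithm.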
DATASET S1
Metadata for all samples included in the current study, including collection codes, catalog numbers, locality information, subspecies-level identification, date collected, and information on which individuals were included in which bioinformatic assemblies.
Samples: 25 species (+ 8 undescribed species) of leopard frogs (n=510) and 4 species of outgroup frogs (n=17); total number of samples = 527
Sequencing: ddRAD for all samples (demultiplexed fastq files available on SRA, accession number: PRJNA1233814)
Bioinformatics assembly: iPyrad
Analyses: Phylogenetic tree reconstruction (RAxML-ng), population genetic structure (admixture), species delimitation (HHSD), landscape genomics (GDM and MMRR), and gene flow (FEEMS)