A total evidence approach justifies taxonomic splitting of the endangered Pecos Gambusia into three species
Data files
Nov 05, 2025 version files (2.46 GB total):
- Gam_SNP.TRS.F09.vcf.gz (186.76 MB)
- Gan_SNP.TRS.F09.vcf.gz (68.22 MB)
- Original_landmark_photos.zip (115.59 MB)
- PCA_and_PERMANOVA.xlsx (31.50 KB)
- README.md (4.34 KB)
- Sample_metadata.txt (28.59 KB)
- SNP.TRS.QC.recode.vcf.gz (2.09 GB)
- Supplement_Measurements_counts.xlsx (40.51 KB)
Abstract
Gambusia nobilis is a federally Endangered species found across a fragmented distribution within the Pecos River Drainage of Texas and New Mexico, USA. Drought, human water usage, and potential hybridization and competition with introduced congeners threaten species persistence. Therefore, a population genomics study was conducted to provide critical information for conservation planning. Unsupervised clustering suggested hierarchical structure, with a primary K = 3, and deep divergences were detected among samples grouped into the Leon Creek watershed, the Toyah Creek watershed, and water bodies within the Bitter Lake National Wildlife Refuge (F’ST = 0.55–0.76 for putatively neutral data). Phylogenetic analyses showed three distinct clades corresponding to these groups, with split times estimated to be in the last 50,000 years. Subsequent morphological analyses detected differences among the three groups, including male colour pattern in life and the number of caudal-fin rays in both sexes. Taken as a whole, the results indicate that the endangered G. nobilis comprises three species (two of which are named herein), rather than one, and the study highlights the daunting yet critical task of describing species diversity during a period of unprecedented diversity loss.
Dataset DOI: 10.5061/dryad.jsxksn0nf
Description of the data and file structure
The Variant Call Format (VCF) files provided are gzipped files consisting of double digest restriction site associated DNA (ddRAD) sequenced samples mapped to the Gambusia affinis genome (GenBank # GCF_019740435), with single nucleotide polymorphisms (SNPs) called using a modified version of dDocent. A metadata text file is also included to describe the catch locations and species identities. Please refer to the VCF specification for the fields within the VCF files (https://samtools.github.io/hts-specs/VCFv4.2.pdf).
Files and variables
File: SNP.TRS.QC.recode.vcf.gz
Description: The full dataset consists of 395 samples with 356 unique individuals and their duplicates across 870,332 SNPs. These individuals are from the focal Gambusia locations (Leon Creek, LC; Toyah Creek, TC; New Mexico, NM) as well as Gambusia found in Texas (G. affinis, G. clarkhubbsi, G. gagei, G. geiseri, G. heterochir, G. speciosa) and other species to serve as outlier groups (Heterandria formosa, Poecilia latipinna, P. obscura, P. picta).
File: Gam_SNP.TRS.F09.vcf.gz
Description: The filtered dataset, which includes all of the samples for hybrid analysis, contains 348 samples with 312 unique individuals and their duplicates across 26,760 SNPs. These individuals are from the focal Gambusia locations (Leon Creek, LC; Toyah Creek, TC; New Mexico, NM) as well as other Gambusia found in the same locations (G. affinis and G. geiseri).
File: Gan_SNP.TRS.F09.vcf.gz
Description: The filtered dataset, which includes all of the G. nobilis from the focal locations (Leon Creek, LC; Toyah Creek, TC; New Mexico, NM), contains 232 samples with 226 unique individuals and their duplicates across 15,208 SNPs.
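As a quick sanity check on the sample and SNP counts quoted above, the header and record lines of a gzipped VCF can be tallied with the Python standard library alone. The following is a minimal sketch, not part of the authors' pipeline; the function names `summarize_vcf_lines` and `summarize_vcf` are hypothetical. It counts data records, which may differ from the SNP totals reported here if a file encodes several alleles or complex variants on one line.

```python
import gzip


def summarize_vcf_lines(lines):
    """Return (sample_names, n_records) from an iterable of VCF text lines."""
    samples = []
    n_records = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Nine fixed columns (CHROM..FORMAT) precede the sample names.
            samples = line.rstrip("\r\n").split("\t")[9:]
            continue
        if line.strip():
            n_records += 1  # one data line per variant record
    return samples, n_records


def summarize_vcf(path):
    """Open a gzipped VCF in text mode and summarize it line by line."""
    with gzip.open(path, "rt") as handle:
        return summarize_vcf_lines(handle)
```

Calling `summarize_vcf("Gan_SNP.TRS.F09.vcf.gz")` on a downloaded copy should return the sample names and the number of variant records without loading the whole file into memory.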
File: Sample_metadata.txt
Description: The sample names are included in the tab-delimited metadata file (Sample_metadata.txt) along with the sequence ID, library, index, catch location, current taxonomic identification, and proposed taxonomic identification.
Variables
- Seq_ID: The name assigned to the sequence
- Sample_ID: The name assigned to the sample
- Library: The library the sample was placed in for sequencing
- Index: The index attached to the sample during library preparation
- Location: The geographic location where the sample was collected
- Species_ID: The currently accepted species identity
- Proposed_Species_ID: The proposed species identity based upon this research
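The variables listed above can be read with Python's built-in csv module. This is a minimal sketch that assumes the first row of Sample_metadata.txt is a header using exactly these column names; the helper name `tally_by` is hypothetical.

```python
import csv
from collections import Counter


def tally_by(lines, column):
    """Count metadata rows per value of `column` in tab-delimited text.

    `lines` is any iterable of text lines (for example an open file).
    Assumes the first line is a header containing `column`.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[column] for row in reader)
```

For example, `with open("Sample_metadata.txt", newline="") as fh: print(tally_by(fh, "Proposed_Species_ID"))` would list how many samples fall under each proposed species identity.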
File: Original_landmark_photos.zip
Description: Photos used for geometric-morphometric landmark analysis between the three species of Gambusia, partitioned by species (G. nobilis, G. pyrros, G. echelleorum) and sex. File labels include sample title and sex.
File: PCA_and_PERMANOVA.xlsx
Description: Results of a PCA for the male and female geometric-morphometric datasets and the results of PERMANOVA and pairwise PERMANOVA tests on each sex. Worksheets include: Male - PCA results; Female - PCA results; PERMANOVA - results. The group abbreviations refer to the general catch location including Leon Creek watershed, TX (LC), Toyah Creek watershed, TX (TC) and Chaves County, New Mexico (NM).
File: Supplement_Measurements_counts.xlsx
Description: Excel file containing the original measurements and counts obtained from specimens of the three species of Gambusia. The worksheets in the file include: Original measurements; Table of measurements (presented as a % of SL or HL); Counts of caudal-fin rays; Table of caudal-fin ray counts (by species); Counts of scales; Counts from cleared and stained (c&s) specimens. The abbreviations of the measurements are provided to the right of the data on the Original measurements and Counts from c&s specimens spreadsheets. NA indicates values not available due to damage to the specimen.
Code/software
Further information about the code used to produce and analyze this data can be found at:
Sample collection
Fin clips or voucher specimens of Gambusia nobilis sensu lato were collected between 2020 and 2024 from twelve discrete sampling sites within the Pecos River drainage (Rio Grande basin) of Texas and New Mexico. This included three springs (Diamond Y Head Pool, HEAD; Karges, KGS; and Euphrasia, EU) within the Leon Creek watershed (LC) in Pecos County, Texas; three springs (San Solomon, SS; Phantom Lake, PL; and East Sandia, ES) within the Toyah Creek watershed (TC) in Jeff Davis and Reeves counties, Texas; and six sites within the Bitter Lake watershed and a series of geographically proximate sinkholes in greater Chaves County, New Mexico (NM). Sites in LC are not directly connected, though HEAD and KGS are separated by only ~0.3 km and are connected intermittently. Sites in TC are not connected and are separated by an average of 8.9 km. Sites in New Mexico included the Bitter Lake National Wildlife Refuge (BLNWR), a section of Bitter Creek north of the refuge (BC), and four sinkhole habitats (Sink7, Sink27, Sink31, Sink37), none of which are directly connected and which are separated by an average of 2.0 km (Figure 1, Supplementary Table 1). Tissues were also collected from G. affinis and G. geiseri when these species were encountered, as well as from additional sites outside of the distribution of G. nobilis (Supplementary Table 1). Additional tissues were acquired from four other species of Gambusia found in Texas (G. heterochir, G. clarkhubbsi, G. speciosa, G. gagei) and from other species in the family Poeciliidae (Supplementary Table 1). Metadata and voucher numbers can be found in Supplementary Data File 1. Samples were collected with permission from USFWS (TE814933) and Texas Parks and Wildlife Department (SPR-0614–111, SPR-1010–173).
Sequencing and Data Processing
DNA was extracted using Mag-Bind Tissue DNA kits (Omega Bio-Tek) and ~1,000 ng of high-quality genomic DNA was used in a modified version of the ddRAD genomic library preparation method. Libraries were sequenced on part of an Illumina NovaSeq X lane with technical replicates (duplicated individuals) sequenced across the libraries. In total, three libraries were sequenced with 356 unique individuals.
Raw reads were demultiplexed in the software Stacks. Read trimming, mapping, and SNP calling were performed using dDocent. After trimming with fastp v.0.23.2, overlapping reads were concatenated with pear v.0.9.6 before mapping both overlapping and nonoverlapping reads to the Gambusia affinis genome (GenBank # GCF_019740435). Individual SNPs were identified and compiled into a Variant Call Format (VCF) file using freebayes v.1.0.2, and variants were filtered using a combination of VCFtools v.0.1.17 and custom BASH and Perl scripts to remove artifacts. SNPs on the same RAD fragment were collapsed into microhaplotypes (SNP-containing loci) with rad_haplotyper v.1.1.9, and loci with more haplotypes than expected per individual were removed. One individual from each pair of technical replicates was removed. Admixed individuals between G. nobilis and G. geiseri and between G. nobilis and G. affinis were identified using the Bayesian framework implemented in NewHybrids and a combination of simulation and principal component analysis (PCA), and were subsequently removed from the data set.
Morphological examination
Select quantitative and qualitative morphological traits were assessed in individuals representative of each of the three regions. Upon collection and prior to tissue subsampling, representatives of each of the three groups (LC, TC and NM) were placed into a small field aquarium and photographed using a Nikon D850 to document life colours. Subsequent to tissue subsampling, photographed individuals were euthanized using a lethal dose of Eugenol and fixed in 10% neutral buffered formalin for a minimum of five days before transfer to 70% EtOH. The preserved specimens have been deposited within the Collection of Fishes at the Texas A&M University Biodiversity Research and Teaching Collections (TCWC). Approximately 10 preserved male and 10 preserved female individuals per geographic group (LC, TC and NM) were photographed in lateral view using a ZEISS SteReo Discovery V20 stereomicroscope equipped with a Zeiss Axiocam MRc5 digital camera. Fifteen measurements were obtained directly from digital photographs using Fiji. Counts of scales and fin rays listed in species descriptions generally follow Greenfield, except that the total number of caudal-fin rays (principal plus procurrent) is also reported. Terminology of the gonopodium follows Hubbs and Springer. Select specimens representative of each of the three regions (TC, LC and NM) were cleared and double stained (C&S). Preserved specimens housed at the Museum of Comparative Zoology, Harvard (MCZ), the Cornell Museum of Vertebrates (CU), and the University of Texas Biodiversity Center (TNHC) were also examined (see Supplementary Extended Methods).
To assess differences in body shape among the three geographic groups, we compared the position of ten homologous landmarks visible in lateral view using geometric morphometrics. Homologous landmarks were placed on images of preserved specimens (ca. 10 male/10 female per TC, LC and NM) using TpsDig. Following landmark placement, raw coordinate data was imported into R for analysis. To avoid issues relating to sexual dimorphism, two separate datasets were created, one for each sex (male/female). Procrustes superimposition was conducted on each coordinate dataset to translate, rotate, and scale landmarks using the geomorph package in R. To visualize differences in body shape between individuals of the three groups, a principal component analysis was run with the aligned coordinates for each data set (male/female) independently using the ‘rda’ function of the vegan package in R. PERMANOVA was conducted on the data returned for the first three principal components (PC1–3) for each of the two datasets (male/female) to assess whether significant differences in body shape existed between individuals from the three geographic groups. To assess the strength of the a priori hypothesis that G. nobilis comprises three groups based on genetic data, two subsets of the PCA data returned from the analysis of each dataset (male/female) were constructed and PERMANOVA was run on each separately. One subset of data divided individuals between three groups, representing the three regions (NM, LC and TC). PERMANOVA was conducted using the vegan package in R.
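The Procrustes step described above (translate, rotate, and scale each landmark configuration) can be illustrated for 2D landmarks in plain Python. This is a simplified sketch of generalized Procrustes alignment, not the geomorph implementation used in the study: it handles rotation only (no reflection), uses a fixed number of iterations, and all function names are hypothetical.

```python
import math


def center_and_scale(shape):
    """Move the centroid to the origin and scale to unit centroid size."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    centered = [(x - cx, y - cy) for x, y in shape]
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / size, y / size) for x, y in centered]


def rotate_onto(shape, reference):
    """Rotate a centered 2D shape to minimise squared distance to reference."""
    num = sum(x * v - y * u for (x, y), (u, v) in zip(shape, reference))
    den = sum(x * u + y * v for (x, y), (u, v) in zip(shape, reference))
    theta = math.atan2(num, den)  # closed-form optimal angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in shape]


def generalized_procrustes(shapes, iterations=10):
    """Align all shapes to an iteratively updated mean shape.

    `shapes` is a list of specimens, each a list of (x, y) landmarks
    given in the same homologous order.
    """
    aligned = [center_and_scale(s) for s in shapes]
    reference = aligned[0]  # start from the first specimen
    for _ in range(iterations):
        aligned = [rotate_onto(s, reference) for s in aligned]
        n = len(aligned)
        mean = [
            (sum(s[i][0] for s in aligned) / n, sum(s[i][1] for s in aligned) / n)
            for i in range(len(reference))
        ]
        reference = center_and_scale(mean)
    return aligned, reference
```

The aligned coordinates returned for each sex would then be the input to an ordination such as PCA; two specimens that differ only in position, orientation, and size end up with identical aligned coordinates.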
