Genomic data support the taxonomic validity of Middle American livebearers Poeciliopsis gracilis and Poeciliopsis pleurospilus (Cyprinodontiformes: Poeciliidae)

Cite this dataset

Ward, Sarah et al. (2022). Genomic data support the taxonomic validity of Middle American livebearers Poeciliopsis gracilis and Poeciliopsis pleurospilus (Cyprinodontiformes: Poeciliidae) [Dataset]. Dryad.


Poeciliopsis (Cyprinodontiformes: Poeciliidae) is a genus comprising 25 species of freshwater fishes. Several well-known taxonomic uncertainties exist within the genus, especially regarding the taxonomic status of Poeciliopsis pleurospilus and P. gracilis. To date, however, no study has specifically addressed the taxonomic status of these two species. The goal of this study was to examine the taxonomic validity of P. pleurospilus and P. gracilis using genomic data (ddRADseq) in phylogenetic, population genetic, and species delimitation frameworks. Multiple analyses support the recognition of both taxa as distinct species and also permit us to revise their respective distributions. A species delimitation analysis indicates that P. pleurospilus and P. gracilis are distinct from one another and that each consists of two geographically structured lineages. Phylogenetic and population genetic analyses provide clear evidence that individuals of P. gracilis are distributed north and west of the Isthmus of Tehuantepec in both Pacific and Atlantic river systems in Mexico, whereas individuals of P. pleurospilus are distributed in both Atlantic and Pacific river systems south and east of the Isthmus of Tehuantepec, from southern Mexico to Honduras.


Three double digest restriction-site associated DNA (ddRAD) libraries (batch information available in S3 Table) were prepared following a modified version (S1 Appendix) of the protocol from Peterson et al. 2012. Each sample was digested with the restriction enzymes MspI and PstI and ligated to common (5’- GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT - 3’) and unique oligos (S3 Table). We then used a BluePippin machine to size select for 300 – 500 bp fragments. Libraries were sent to the University of Oregon’s Genomic and Cell Characterization Core Facility (GC3F) for Illumina sequencing on the HiSeq 4000 with 100 bp single-end reads.
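
To illustrate what the double digest and size selection described above retain, the following is a minimal in-silico sketch in Python. It is not part of the authors' protocol (see S1 Appendix for that). It assumes the standard recognition sites C^CGG for MspI and CTGCA^G for PstI, treats the 300 to 500 bp window as applying to the bare genomic fragment (in the laboratory the selected size also includes ligated oligos), and uses hypothetical names (`ENZYMES`, `cut_positions`, `double_digest`).

```python
import re

# Recognition sequence and top-strand cut offset for the two enzymes named in the text.
# Assumed standard sites: MspI cuts C^CGG, PstI cuts CTGCA^G.
ENZYMES = {
    "MspI": ("CCGG", 1),
    "PstI": ("CTGCAG", 5),
}


def cut_positions(seq, site, offset):
    """Return the top-strand cut coordinate for every occurrence of `site` in `seq`."""
    # A lookahead pattern also finds overlapping occurrences.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, min_len=300, max_len=500):
    """Return (start, end) fragments that have one MspI end and one PstI end
    and whose length falls inside the size-selection window."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        cuts.extend((pos, name) for pos in cut_positions(seq, site, offset))
    cuts.sort()

    fragments = []
    # Adjacent cut sites delimit one fragment after complete digestion.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments
```

Fragments cut by the same enzyme at both ends are discarded in this sketch because only fragments with two different ends carry both the common and the unique oligo.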

The raw data files returned from GC3F were run through FastQC v0.11.3 to check the overall quality of the reads from the Illumina run. The FASTQ files were then passed to the ipyrad pipeline for assembly and initial filtering. Reads that contained more than 5 bases with a low-quality Phred score (<33) were excluded. Reads were then clustered based on an 85% similarity threshold, and reads with less than 6x coverage were filtered out. A maximum of 5 ambiguous base calls and 5 heterozygous sites per read were allowed during filtering. Additional filtering using VCFtools excluded individuals with more than 95% missing data and single nucleotide polymorphism (SNP) loci with a call rate of 60% or lower. We retained one dataset consisting of all specimens, which was used for phylogenetic inference (oneout.recode.vcf). A separate dataset consisting only of ingroup individuals was used for the population genetic analyses (ingroup.recode.vcf).
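
The missing-data thresholds applied with VCFtools can be sketched in plain Python to make their meaning concrete. This is an illustration only, not the authors' workflow (which is documented in S3 Appendix). It assumes an uncompressed VCF with the genotype as the first subfield of each sample column, assumes individuals are removed before site call rates are computed, and uses hypothetical function names (`parse_vcf`, `is_missing`, `filter_missing`).

```python
def parse_vcf(lines):
    """Return (sample_names, sites) from VCF text lines.

    Each site is a tuple (chrom, pos, genotypes), one genotype string per sample.
    """
    samples, sites = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        # Assumption: GT is the first colon-separated subfield of each sample column.
        genotypes = [f.split(":")[0] for f in fields[9:]]
        sites.append((fields[0], int(fields[1]), genotypes))
    return samples, sites


def is_missing(genotype):
    """A genotype such as './.' or '.' counts as missing."""
    return "." in genotype


def filter_missing(samples, sites, max_ind_missing=0.95, min_call_rate=0.60):
    """Drop individuals with more than `max_ind_missing` missing data, then
    drop sites whose call rate among the kept individuals is `min_call_rate` or lower."""
    if not sites:
        return list(samples), []

    keep = []
    for i in range(len(samples)):
        missing = sum(is_missing(gts[i]) for _, _, gts in sites) / len(sites)
        if missing <= max_ind_missing:
            keep.append(i)
    if not keep:
        return [], []

    kept_sites = []
    for chrom, pos, gts in sites:
        subset = [gts[i] for i in keep]
        call_rate = sum(not is_missing(g) for g in subset) / len(subset)
        if call_rate > min_call_rate:
            kept_sites.append((chrom, pos, subset))
    return [samples[i] for i in keep], kept_sites
```

With a file on disk, the two functions could be chained as `filter_missing(*parse_vcf(open(path)))`, where `path` points to a VCF such as ingroup.recode.vcf. Because that file has already been filtered, few or no additional records would be expected to drop.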

Usage notes

S1 Appendix. Laboratory Protocol.

S1 Fig. a-score optimization – spline interpolation.

S1 Table. Sampling localities of ingroup individuals.

S2 Appendix. ipyrad summary statistics.

S2 Table. Sampling localities of outgroup individuals.

S3 Appendix. Bioinformatic workflow.

S3 Table. Barcodes and batch information.

S4 Appendix. Treefile of phylogenetic inference.

S4 Table. Parameters used for assembly of concatenated dataset.

ingroup.recode - VCF file containing alignment of ingroup individuals. 

oneout.recode - VCF file containing alignment of ingroup individuals plus one outgroup specimen (Brachyrhaphis rhabdophora).