Data from: The Carpiodes conundrum: Molecular hypothesis testing informs conservation applications for Carpsuckers (Catostomidae: Carpiodes) in Texas and beyond
Data files
Nov 26, 2025 version files (2.81 GB)
- README.md (13.42 KB)
- Roberts_et_al_2025_data.zip (2.81 GB)
Abstract
Sufficient taxonomic understanding is critical for biodiversity conservation. This is particularly relevant among freshwater fishes, where cryptic undescribed species cause difficulties for promoting conservation efforts. Catostomidae (i.e., suckers) is a family of freshwater fishes with cryptic diversity and biological traits that make them difficult to classify taxonomically. Among suckers, the Carpsuckers (Carpiodes carpio, Carpiodes cyprinus, Carpiodes velifer) possess uncertain taxonomic classifications and cryptic diversity despite a rich history of research. Within Carpiodes, uniquely slender-bodied populations occurring in Western Gulf of Mexico drainages suggest potential for an undescribed species. Originally collected in the Llano River, a tributary of the Texas Colorado River, Llano River Carpsucker are morphologically similar to C. cyprinus. Our study explores how historical biogeographic scenarios may have led to lineage diversification of Llano River Carpsucker. We test competing molecular hypotheses (i.e., Native Endemic Species Hypothesis, Native Lineage Hypothesis) to explain the native origin of Llano River Carpsucker and further assess whether the taxon is non-native C. cyprinus (i.e., Species Introduction Hypothesis), each carrying vastly different conservation and management implications. Additionally, we assess phylogenetic relationships across the entire genus Carpiodes. Phylogenetic analyses recovered divergent lineages of C. cyprinus in Eastern Gulf of Mexico drainages, suggesting the presence of cryptic undescribed species. Llano River Carpsucker specimens were resolved in unique lineages relative to C. cyprinus, with mitochondrial haplotypes closely related to Mississippi C. cyprinus (p-distance < 0.005). Our study suggests Llano River Carpsucker represent native C. cyprinus, supporting our Native Lineage Hypothesis. We further provide evidence that C. cyprinus readily hybridizes with C. carpio, resulting in mitochondrial introgression across much of their distribution. Lastly, we provide recommendations to promote conservation efforts and discuss further research directions to understand deeper evolutionary and environmental mechanisms behind morphologically and genetically unique C. cyprinus inhabiting Western Gulf of Mexico drainages of Texas.
Dataset DOI: 10.5061/dryad.mw6m9069d
Description of the data and file structure
The data comprise mitochondrial (CYTB) and nuclear (IRBP2) sequences from catostomid tissue samples collected during fieldwork for this study and from voucher specimens held by several museums (see the paper's Acknowledgements).
Sequences were used to build phylogenetic trees and haplotype networks and to estimate genetic distances. All sequences were deposited in GenBank (PV579174-PV579834) together with locality metadata for each specimen; specimen metadata are also given in Supplementary Table S1 of the paper. The data also include spatial information on the distribution of Carpiodes taxa, Carpiodes clades, and the hypothesized biogeographic history.
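The GenBank accessions are given as a contiguous range, so a list for batch retrieval (e.g., through NCBI Batch Entrez) can be generated rather than typed. A minimal sketch, assuming every accession number between the two endpoints belongs to this dataset (the README states only the endpoints); the function name is hypothetical.

```python
import re

def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive accession range such as PV579174-PV579834 (hypothetical helper)."""
    pattern = re.compile(r"([A-Z]+)(\d+)$")
    m1, m2 = pattern.match(first), pattern.match(last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share the same alphabetic prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    # Zero-pad to the original digit width so leading zeros would be preserved.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]

accessions = expand_accession_range("PV579174", "PV579834")
print(len(accessions), accessions[0], accessions[-1])
```

Writing `"\n".join(accessions)` to a text file gives one accession per line, which is the usual input format for batch queries.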
Files and variables
Roberts_et_al_2025_data.zip
Haplotypes Folder
Note that opening the annotated .nex files in PopART typically rearranges the haplotypes, so the networks may look different from the corresponding figures in the publication (i.e., Figures 9 and 10). The annotated haplotype networks generated for this manuscript were further edited in Boxy SVG to produce the final figures.
To generate a haplotype network, open an alignment .phy file (e.g., "CYTB_haplo.phy") and the matching .csv traits file (e.g., "Taxa_River_for_haplo_CYTB.csv") in PopART. After loading these files, select Network > Median Joining Network and keep the Epsilon value at its default (i.e., zero).
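PopART builds the median-joining network itself, but the network starts from the set of distinct haplotypes and the pairwise differences between them. A minimal Python sketch of that first step, assuming a relaxed sequential PHYLIP layout (a header line with the sequence count and length, then one ID and sequence per row, as described for CYTB_haplo.phy). All function names are hypothetical, and gaps or ambiguity codes are simply skipped when counting differences, which may not match PopART's own handling.

```python
from collections import defaultdict
from itertools import combinations

def parse_phylip(lines):
    """Parse sequential PHYLIP: header 'n_seqs n_sites', then one 'ID sequence' per row."""
    lines = iter(lines)
    n_seqs, n_sites = map(int, next(lines).split())
    seqs = {}
    for line in lines:
        if not line.strip():
            continue
        seq_id, rest = line.split(maxsplit=1)
        seqs[seq_id] = "".join(rest.split()).upper()
    if len(seqs) != n_seqs or any(len(s) != n_sites for s in seqs.values()):
        raise ValueError("sequences do not match the PHYLIP header")
    return seqs

def collapse_haplotypes(seqs):
    """Group identical sequences: {haplotype sequence: [IDs carrying it]}."""
    groups = defaultdict(list)
    for seq_id, seq in seqs.items():
        groups[seq].append(seq_id)
    return dict(groups)

def mismatches(a, b):
    """Count sites where both bases are A/C/G/T and differ (gaps/ambiguities skipped)."""
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")

def haplotype_distances(haplotypes):
    """Pairwise mismatch counts between haplotypes, keyed by their index pairs."""
    haps = list(haplotypes)
    return {(i, j): mismatches(haps[i], haps[j])
            for i, j in combinations(range(len(haps)), 2)}
```

For example, `collapse_haplotypes(parse_phylip(open("CYTB_haplo.phy")))` should return one entry per distinct CYTB sequence; its length can be compared against the haplotype count PopART reports.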
Files included
- "CYTB_haplo.phy" (CYTB sequences for developing haplotype network)
- The first row gives the number of sequences (308) and the number of nucleotides per sequence (1140)
- Each following row contains a sequence ID in the first column and the associated nucleotide sequence in the second column
- "IRBP2_haplo.phy" (IRBP2 sequences for developing haplotype network)
- The first row gives the number of sequences (616) and the number of nucleotides per sequence (839)
- Each following row contains a sequence ID in the first column and the associated nucleotide sequence in the second column
- "CYTB_haplo_annotated.nex" (Annotated haplotype network)
- Contains the sequence IDs and associated nucleotide sequences from CYTB_haplo.phy
- These are followed by trait data indicating, in binary, which "TraitLabels" each sequence belongs to
- The trait data are followed by annotation information for the haplotype network:
- "ntax" - number of unique haplotypes
- "nvertices" - number of unique vertices in the haplotype network
- "nedges" - number of unique edges in the haplotype network
- "plotDim" - plot dimensions
- "Font" - font used for haplotype labels
- "LegendFont" - font used for the legend
- "IRBP2_haplo_annotated.nex" (Annotated haplotype network)
- Contains the sequence IDs and associated nucleotide sequences from IRBP2_haplo.phy
- These are followed by trait data indicating, in binary, which "TraitLabels" each sequence belongs to
- The trait data are followed by annotation information for the haplotype network:
- "ntax" - number of unique haplotypes
- "nvertices" - number of unique vertices in the haplotype network
- "nedges" - number of unique edges in the haplotype network
- "plotDim" - plot dimensions
- "Font" - font used for haplotype labels
- "LegendFont" - font used for the legend
- "Taxa_River_for_haplo_CYTB.csv" (File for developing CYTB haplotype network)
- First column is sequence IDs
- The remaining columns encode, in binary, each specimen's taxon code and the drainage where it was collected. Column headers are shifted one column to the left to match PopART's input format, so column one has no header.
- Carpiodes carpio from Sabine drainage (Cc_SR)
- Carpiodes carpio from Colorado drainage (Cc_CR)
- Carpiodes carpio from Brazos drainage (Cc_BR)
- Carpiodes carpio from Rio Grande drainage (CC_RG)
- Carpiodes carpio from Mississippi drainage (Cc_MS)
- Carpiodes cyprinus from Mississippi drainage (Ccy_MS)
- Llano River Carpsucker from Colorado drainage (LRCS_CR)
- Llano River Carpsucker from Guadalupe drainage (LRCS_GR)
- Llano River Carpsucker from San Bernard drainage (LRCS_SBR)
- "Taxa_River_for_haplo_IRBP2.csv" (File for developing IRBP2 haplotype network)
- First column is sequence IDs
- The remaining columns encode, in binary, each specimen's taxon code and the drainage where it was collected. Column headers are shifted one column to the left to match PopART's input format, so column one has no header.
- Carpiodes carpio from Sabine drainage (Cc_SR)
- Carpiodes carpio from Colorado drainage (Cc_CR)
- Carpiodes carpio from Brazos drainage (Cc_BR)
- Carpiodes carpio from Rio Grande drainage (CC_RG)
- Carpiodes carpio from Mississippi drainage (Cc_MS)
- Carpiodes cyprinus from Mississippi drainage (Ccy_MS)
- Llano River Carpsucker from Colorado drainage (LRCS_CR)
- Llano River Carpsucker from Guadalupe drainage (LRCS_GR)
- Llano River Carpsucker from San Bernard drainage (LRCS_SBR)
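Because the header row of these traits files is one field shorter than the data rows, a reader that assumes equal-length rows can misalign the trait columns. A minimal sketch using Python's standard csv module, assuming comma-separated values with 0/1 membership codes as described above; the function name is hypothetical.

```python
import csv

def read_popart_traits(lines):
    """Read a PopART-style traits table into {sequence_id: [trait labels]}.

    Assumes the header row lists only trait names (the ID column has no
    header), so each data row holds one more field than the header.
    """
    reader = csv.reader(lines)
    traits = [name.strip() for name in next(reader) if name.strip()]
    membership = {}
    for row in reader:
        if not row or not row[0].strip():
            continue
        seq_id, values = row[0].strip(), [v.strip() for v in row[1:]]
        if len(values) != len(traits):
            raise ValueError(f"{seq_id}: expected {len(traits)} trait values")
        # Binary coding: a nonzero value means the sequence belongs to that taxon/drainage group.
        membership[seq_id] = [t for t, v in zip(traits, values) if v and int(v) > 0]
    return membership
```

Opening the file with `open("Taxa_River_for_haplo_CYTB.csv", newline="")` and passing the handle in should give, for each sequence ID, the taxon/drainage codes listed above (e.g., `["LRCS_CR"]`).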
P_distance Folder
- "CYTB_full_clades.mdsx" (MEGA file used to determine Carpiodes clade genetic p-distances for CYTB)
- Includes a column of sequence IDs ("Name"); the remaining columns hold the nucleotides associated with each sequence ID.
- "CYTB_full_subclades.mdsx" (MEGA file used to determine Carpiodes subclade genetic p-distances for CYTB)
- Includes a column of sequence IDs ("Name"); the remaining columns hold the nucleotides associated with each sequence ID.
- "IRBP2.mdsx" (MEGA file used to determine individual-based IRBP2 genetic distances)
- Includes a column of sequence IDs ("Name"); the remaining columns hold the nucleotides associated with each sequence ID.
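The p-distance is the proportion of compared sites at which two sequences differ. MEGA reports these values directly; the sketch below is only a cross-check. It assumes gaps and ambiguity codes are dropped pair by pair (similar in spirit to MEGA's pairwise-deletion option, though the settings used in the paper may differ), and it takes the between-group value as the mean over all cross-group pairs. Function names are hypothetical.

```python
from itertools import product

VALID = set("ACGT")

def p_distance(a, b):
    """Proportion of differing sites among sites where both bases are A/C/G/T."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            diffs += x != y
    return diffs / compared if compared else float("nan")

def mean_between_groups(group1, group2):
    """Average p-distance over all pairs with one sequence from each group."""
    pairs = [p_distance(a, b) for a, b in product(group1, group2)]
    return sum(pairs) / len(pairs)
```

For scale, if all 1,140 CYTB sites are compared, p < 0.005 corresponds to fewer than 0.005 × 1140 = 5.7, i.e., at most 5 differing sites.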
Phylogenetics Folder
Includes four subfolders (Concatenated_Subset, CYTB_full, CYTB_subset, IRBP2_Subset) corresponding to the consensus trees developed in the paper. Sequences in the Nexus and PHYLIP files in this folder are formatted like those in the Haplotypes folder, with one column for the sequence ID and one for the associated nucleotides. Nexus files with "Bayes" in the name include the AIC results for the top model of nucleotide evolution and a MrBayes code block ("begin mrbayes"). Each subfolder contains the following:
- jModelTest results (.txt format)
- Model selection results for the best models of nucleotide evolution under the Akaike Information Criterion (AIC), AIC corrected for small sample size (AICc), Bayesian Information Criterion (BIC), and Decision Theory Performance-Based Selection (DT)
- Files used for jModelTest (.phy format)
- Files used for MrBayes (.nex format)
- MrBayes output (an additional folder containing the following)
- Output files (.t, .p, .trprobs, .tstat, .vstat, .parts files)
- .t files
- Tip-label names coded as integers
- Each "tree gen" line gives the topology of one sampled tree in Newick format
- .p files
- Parameter values for each sampled tree
- Column 1: sampled generation ("Gen")
- Column 2: log likelihood of the tree ("LnL")
- Column 3: log prior probability of the tree ("LnPr")
- Column 4: tree length ("TL")
- Remaining columns: nucleotide substitution rate parameters (e.g., "r(A<->C)"), the gamma shape parameter ("alpha"), and the proportion of invariable sites ("pinvar")
- Note that the parameters present differ among analyses, depending on the top model of nucleotide evolution selected through AIC
- .trprobs
- File showing the posterior probability ("P") of each tree
- .tstat
- file containing summaries of partition statistics
- .vstat
- file containing mean, median, variance, and 95% credible intervals for estimated branch lengths.
- .parts
- A key to the taxon bipartitions, listing the splits (branches) observed in the sampled trees; their posterior probabilities are summarized in the .tstat file
- Tip labels for tree annotation (.txt)
- Includes the tip labels from the .t files and their modified names ("Tip_Label")
- infile (.nex)
- output consensus tree (.tre)
- Annotated full and condensed trees (.tre)
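The .p columns can also be summarized outside MrBayes, for example to check posterior means against the `sump` output. A minimal sketch, assuming the usual .p layout (a bracketed ID line, then a whitespace-separated header row such as Gen, LnL, LnPr, TL, ...) and a 25% burn-in, which is MrBayes's default relative burn-in; the burn-in used in the paper is not stated here. The function name and the file name in the usage note are hypothetical.

```python
import statistics

def summarize_p_file(lines, burnin_fraction=0.25):
    """Posterior means of each sampled parameter in a MrBayes .p file.

    Skips the bracketed ID line, reads the header row, converts the samples
    to floats, and discards the first `burnin_fraction` of samples.
    """
    rows = [line.split() for line in lines
            if line.strip() and not line.lstrip().startswith("[")]
    header = rows[0]
    samples = [[float(x) for x in row] for row in rows[1:]]
    kept = samples[int(len(samples) * burnin_fraction):]
    # "Gen" is the generation counter, not a parameter, so it is not averaged.
    return {name: statistics.mean(row[i] for row in kept)
            for i, name in enumerate(header) if name != "Gen"}
```

With an output file such as `infile.nex.run1.p` (MrBayes typically appends `.runN.p` to the input name), `summarize_p_file(open("infile.nex.run1.p"))` returns a dictionary keyed by column name (e.g., "LnL", "TL", "alpha"). Each run is summarized separately here.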
R_script
- "infile.nex.con" (Output CYTB full consensus tree from MrBayes)
- "Roberts_et_al_25_master_.xlsx" (Excel file with data for R)
- Sheet 1 - datasheet containing the proportion of basins resolved in each clade for the CYTB full tree
- Column 1: "Clade" - Clade ID
- Column 2: "Region" - Basin name
- Column 3: "value" - proportion of specimens resolved in a given clade associated with the basin they were collected in
- Sheet 2 - datasheet containing the proportion of basins resolved in each clade for the CYTB subsetted and concatenated trees
- Column 1: "Clade" - Clade ID
- Column 2: "Region" - Basin name
- Column 3: "value" - proportion of specimens resolved in a given clade associated with the basin they were collected in
- Sheet 3 - datasheet used to determine Nei's D genetic distance estimates for Carpiodes major clades and subclades using specimens from the CYTB full tree
- Column 1: "Order" - Order used for organization
- Column 2: "Name" - Name of specimen
- Column 3: "Taxa" - Taxa code
- Column 4: "Phylocount" - order of specimens as they are listed in CYTB full tree
- Column 5: "Clade" - Major clade each specimen is resolved in
- Column 6: "SubClade" - Subclade each specimen is resolved in
- Column 7: "SubClade_Num" - Subclades coded as numbers
- Column 8: "Population" - Population each specimen was collected from
- Column 9: "Basin" - Coded basin each specimen was collected from
- Remaining columns - Columns with headers L1 - L201 correspond to mitochondrial CYTB sites where single nucleotide polymorphisms (SNPs) were observed. Each column contains categorical nucleotide codes ("A", "C", "G", "T")
- "Roberts_et_al_2025_Master.r" (R file with script)
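Nei's standard genetic distance compares nucleotide frequencies between two groups: D = -ln(Jxy / sqrt(Jx · Jy)), where Jx and Jy are the mean sums of squared frequencies within each group and Jxy is the mean sum of cross-products, averaged over sites. The R script uses packages such as adegenet and hierfstat for this. The Python sketch below only illustrates the formula and is not the authors' code. It assumes Sheet 3 rows are grouped by the "Clade" or "SubClade" column, and that each specimen's L1-L201 calls are joined into one string. Missing data are handled by dropping non-ACGT calls, which may differ from the R packages. Function names are hypothetical.

```python
import math
from collections import Counter

def allele_freqs(column):
    """Nucleotide frequencies at one SNP site, ignoring anything other than A/C/G/T."""
    counts = Counter(b for b in column if b in "ACGT")
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}

def nei_d(group_x, group_y):
    """Nei's (1972) standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    Each group is a list of equal-length strings of haploid nucleotide calls,
    one character per SNP column (e.g., L1..L201 joined into a string).
    """
    jx = jy = jxy = 0.0
    n_sites = 0
    for site in range(len(group_x[0])):
        fx = allele_freqs([s[site] for s in group_x])
        fy = allele_freqs([s[site] for s in group_y])
        if not fx or not fy:
            continue  # skip sites with no scorable calls in either group
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(b, 0.0) for b, p in fx.items())
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no comparable sites")
    # Jx, Jy, Jxy are means over sites; the 1/n_sites factors cancel in the ratio.
    # Clamp to 1.0 so rounding cannot yield a tiny negative distance.
    identity = min(1.0, jxy / math.sqrt(jx * jy))
    return -math.log(identity) if identity > 0 else math.inf
```

Groups that share no nucleotides at any compared site have identity 0, and D is then infinite. This is a known property of Nei's D and is not an error in the sketch.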
Spatial
All files in the shared map packages below were developed using ArcGIS Pro (v 3.4.0). Individual shapefiles can be downloaded and mapped in free software such as QGIS.
- "Carpiodes_conceptual_biogeography.ppkx" (Shared map package associated with input spatial files for Figure 5)
Figure 5 main text. Conceptual diagram illustrating ancestral Carpiodes hypothesized biogeographic patterns leading to colonization of the Western Gulf of Mexico (i.e., Western Gulf) basin. Western Gulf drainages illustrated include the Brazos, Colorado, Guadalupe, Neches, Rio Grande, Sabine, San Bernard, and Trinity. The Pecos drainage is also illustrated nested within the Rio Grande drainage. Colored arrows denote ancestral C. carpio and C. cyprinus hypothesized dispersal routes. Shaded basins were derived from the North American Atlas Basin Watersheds dataset.
- "Carpiodes_Distribution_Maps.ppkx" (Shared map package with input spatial files for Figures 2 and 4)
Figure 2 main text. Distribution of recognized species of Carpiodes. Occurrences (i.e., points) were derived from the Global Biodiversity Information Facility database and represent native distributions of each species. Shaded basins were modified from the North American Atlas Basin Watersheds dataset.
Figure 4 main text. Distribution of Llano River Carpsucker. Crosses display locations where the taxon has been collected across Colorado, Guadalupe, and San Bernard drainages. Observations stem from collections conducted in this study, confirmed records of the taxon housed at the Fishes of Texas Database, and specimens collected in the San Bernard drainage (Adam Cohen, Personal communication). The grey shaded region denotes the Edwards Plateau ecoregion where the taxon is predominantly found.
- "Carpiodes_Phylogeny_Maps.ppkx" (Shared map package with input spatial files for Figure 8)
Figure 8 main text. Distribution of Carpiodes major clades and subclades. Specimens are shape coded based on taxonomic assignment. (a) Distribution of Carpiodes cyprinus in the Eastern Gulf basin. (b) Distribution of C. cyprinus and Carpiodes velifer in Atlantic Slope and Eastern Gulf basins. (c) Distribution of Carpiodes carpio and C. velifer inhabiting Mississippi and Western Gulf basins. (d) Distribution of specimens resolved in Subclade 3i across Great Lakes, Hudson Bay, Mississippi, and Western Gulf basins. Shaded basins were modified from the North American Atlas Basin Watersheds dataset.
Code/software
R (v4.4.2), run in RStudio
Packages used (ggplot2, ggpubr, tidytree, adegenet, hierfstat, phangorn, LEA, tidyr, dplyr, vegan, readxl, treeio)
The script includes code to generate pie charts for the phylogenetic trees, the pruned consensus tree (Figure S3), and Nei's genetic distances
ArcGIS Pro (v 3.4.0)
PopART (v1.7)
MEGA (v11.0.13)
CIPRES (v3.3)
- jModelTest2 (v2.1.6)
- MrBayes (v3.2.7a)
