
Data from: Species limits and introgression in Pimelodus from the Magdalena-Cauca River basin

Cite this dataset

Martínez, José Gregorio et al. (2022). Data from: Species limits and introgression in Pimelodus from the Magdalena-Cauca River basin [Dataset]. Dryad.


Low morphological differentiation among taxa hampers their correct identification, basic biological studies, and the promotion of any conservation effort. Aiming to clarify the evolution and speciation of Pimelodus species from the Magdalena-Cauca River basin, this study tested the hypothesis that P. yuma, P. grosskopfii and P. crypticus represent three independently evolving species, and explored signals of interspecific hybridization. We also tested the long-standing hypothesis that the trans-Andean Pimelodus yuma and P. crypticus belong to the cis-Andean P. blochii species complex. Results based on mitochondrial (cox1) and nuclear [RADseq (Illumina HiSeq), microsatellites and rag2] markers, combined with coalescent-based and allele-frequency methods, confirm that each of the studied trans-Andean species represents an independently evolving unit. We used Stacks v.2.52 for de novo SNP genotyping. Contrary to expectations, P. yuma was recovered as the sister clade of P. blochii, while P. crypticus (long confused with P. blochii) was phylogenetically closer to P. grosskopfii. Additionally, we found strong evidence of historical introgression between the non-sibling species Pimelodus yuma and P. grosskopfii. This result breaks with the assumed absence of interbreeding and of independent evolutionary trajectories among trans-Andean Pimelodus during their diversification history, a prerequisite for defining species limits. However, estimates of contemporary gene flow between them were non-significant, supporting the hypothesis of full isolation.


A RAD-seq library was prepared for 19 individuals representing five putative morphospecies of Pimelodus, previously identified by DNA barcoding: three corresponded to P. grosskopfii, P. yuma and P. crypticus from the Magdalena-Cauca River basin, and two to the cis-Andean P. blochii stricto sensu and P. pictus. Samples were provided by: 1) Integral S.A., through two scientific cooperation agreements (19th September 2013; Grant CT-2013-002443); 2) the Coleção de Tecidos de Genética Animal (CTGA) of the Laboratório de Evolução e Genética Animal (LEGAL), Federal University of Amazonas, Brazil; and 3) Armando Ortega Lara from the Museo de Ciencias Naturales del Instituto para la Investigación y la Preservación del Patrimonio Cultural y Natural del Valle del Cauca (INCIVA).

Single-end sequencing was performed by Floragenex Inc. on one lane of an Illumina HiSeq 2000. The raw reads were demultiplexed by barcode and quality-filtered, and loci were assembled de novo using RADproc v3 (Nadukkalam Ravindran, Bentzen, Bradbury, & Beiko, 2019) in parameter-sweep mode (-a psweep), which optimizes the locus-assembly parameters m, M and n by building catalogs of RAD-seq tag loci, alleles and SNPs from the raw NGS reads across different values of each. RADproc was run with a Phred base-quality threshold > 20, a minimum coverage depth of 3 (-m), a maximum distance of 4 nucleotides allowed between stacks (-M), a maximum distance of 1 nucleotide allowed between putative loci during catalog construction (-n), a maximum of 2 stacks per locus (-x) and a minimum sample percentage of 0.9 (-S). We then used STACKS v2.53 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) to generate output files for population and phylogenomic analyses from the optimal catalogs obtained with RADproc.
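The Phred base-quality threshold can be illustrated with a minimal Python sketch. It assumes Phred+33 (Illumina 1.8+) quality encoding and, as a simplification, keeps a read when its mean Phred score exceeds 20; RADproc's actual filtering rule may differ (for example, a sliding-window average). The function names are hypothetical and not part of RADproc or Stacks.

```python
"""Sketch: decoding Phred+33 quality strings and applying a Q > 20 filter.

Assumption: Phred+33 encoding; the mean-quality rule is a simplification,
not necessarily the rule RADproc applies.
"""


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert an ASCII-encoded FASTQ quality string into Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_threshold(quality: str, threshold: int = 20) -> bool:
    """Keep a read only if its mean Phred score is above the threshold."""
    scores = phred_scores(quality)
    return bool(scores) and sum(scores) / len(scores) > threshold


if __name__ == "__main__":
    good = "IIIIIIIIII"  # 'I' is ASCII 73, i.e. Q40
    poor = "##########"  # '#' is ASCII 35, i.e. Q2
    print(passes_threshold(good), passes_threshold(poor))  # True False
    print(error_probability(20))  # Q20 corresponds to a 1% error probability
```

In other words, a threshold of Q20 means that, on average, about one base call in a hundred is expected to be wrong.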
We first ran the SSTACKS, TSV2BAM and GSTACKS modules in pipeline mode with default parameters, and then ran the POPULATIONS module with the settings given in "Population module parameteres.txt". Each individual was treated as a separate population in the STACKS population map. In the POPULATIONS module, the minimum number of populations in which a locus must be present to be processed was set to 100% (-p = 19), and the minimum percentage of individuals in a population required to process a locus for that population was also set to 100% (-r = 1); a minor allele frequency (MAF) threshold of 0.01 was applied. All other parameters were kept at their defaults. Finally, we obtained the VAR.PHYLIP (Pimelodus.var.phylip.txt) and VCF (Pimelodus.vcf.txt) output files, containing the datasets for SNP-based species delimitation and admixture tests, respectively.
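The POPULATIONS settings described above (one population per individual, -p 19, -r 1, MAF 0.01) can be expressed as a minimal Python sketch that writes a population map and assembles Stacks 2 command lines without running them. The sample IDs, directory and file names are hypothetical, and the flag names (-P, -M, -p, -r, -t, --min-maf, --fstats, --vcf, --phylip-var) follow the Stacks 2.x manual; they should be checked against the `--help` output of the installed version. The authoritative settings remain those in "Population module parameteres.txt".

```python
"""Sketch: one-population-per-individual popmap and Stacks 2 command lines.

Assumptions: hypothetical sample IDs and paths; flag names as documented
for Stacks 2.x. Commands are only built and printed, not executed.
"""
import shlex
from pathlib import Path


def write_popmap(samples: list[str], path: Path) -> None:
    """Write a population map in which every individual is its own population."""
    # Stacks popmaps are tab-separated: <sample ID> <TAB> <population ID>.
    path.write_text("".join(f"{s}\t{s}\n" for s in samples))


def stacks_commands(stacks_dir: str, popmap: str, n_pops: int,
                    threads: int = 4) -> list[list[str]]:
    """Build the command lines for the steps run after catalog construction."""
    return [
        # sstacks takes its thread count via -p; the other programs use -t.
        ["sstacks", "-P", stacks_dir, "-M", popmap, "-p", str(threads)],
        ["tsv2bam", "-P", stacks_dir, "-M", popmap, "-t", str(threads)],
        ["gstacks", "-P", stacks_dir, "-M", popmap, "-t", str(threads)],
        ["populations", "-P", stacks_dir, "-M", popmap,
         "-p", str(n_pops),    # locus present in every population (= individual)
         "-r", "1",            # and in 100% of the individuals of each population
         "--min-maf", "0.01",  # minor allele frequency threshold
         "--fstats", "--vcf", "--phylip-var",
         "-t", str(threads)],
    ]


if __name__ == "__main__":
    samples = [f"sample{i:02d}" for i in range(1, 20)]  # hypothetical IDs
    popmap = Path("popmap.tsv")                         # hypothetical file name
    write_popmap(samples, popmap)
    for cmd in stacks_commands("stacks_out", str(popmap), n_pops=len(samples)):
        print(shlex.join(cmd))  # inspect before running, e.g. with subprocess.run
```

Because every individual is its own population, -p equals the number of individuals and -r 1 requires each of them to be genotyped, so together the two settings yield a locus set with no missing individuals.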

Usage notes


José Gregorio Martínez, José David Rangel Medrano, Anny Johanna Yepes Acevedo, Natalia Restrepo Escobar, Edna Judith Márquez*

*Corresponding author: Edna J. Márquez, Grupo de Investigación Biotecnología Animal, Facultad de Ciencias, Universidad Nacional de Colombia, Carrera 65 Nro. 59A – 110, Medellín, Antioquia, Colombia. Email:

Population module parameteres.txt Contains the POPULATIONS module run parameters used in Stacks v.2.52, together with some basic statistics derived from the run. The analysis included Illumina HiSeq RADseq data from 19 individuals of several Pimelodus species.

Demultiplexing statistics.txt Contains the demultiplexing statistics (number of reads and their length) for the 19 Pimelodus sample libraries sequenced on the Illumina HiSeq.

De novo loci assembling statistics.txt Contains the de novo locus-assembly statistics (number of loci, number of polymorphic loci and number of SNPs) obtained with RADproc for the 19 Pimelodus sample libraries, using the parameters optimized in psweep mode (-a psweep).

RADseqs Raw data Contains the demultiplexed FASTQ files for the 19 Pimelodus sample libraries sequenced on the Illumina HiSeq.
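To check the raw files against the demultiplexing statistics (number of reads and their length), a minimal Python sketch can count reads and tabulate read lengths per FASTQ file. It assumes standard four-line FASTQ records, optionally gzip-compressed; the local directory name is hypothetical, and the function names are illustrative only.

```python
"""Sketch: per-file read counts and read-length distribution for FASTQ files.

Assumptions: four-line FASTQ records; plain or gzip-compressed files;
hypothetical local directory name.
"""
import gzip
from collections import Counter
from pathlib import Path


def open_text(path: Path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "rt")


def fastq_stats(path: Path) -> tuple[int, Counter]:
    """Return the number of reads and a Counter of read lengths for one file."""
    n_reads, lengths = 0, Counter()
    with open_text(path) as handle:
        while True:
            header = handle.readline()
            if not header:           # end of file
                break
            if not header.strip():   # tolerate blank trailing lines
                continue
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            # Basic sanity checks on the four-line record structure.
            if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
                raise ValueError(f"malformed FASTQ record in {path}")
            n_reads += 1
            lengths[len(seq)] += 1
    return n_reads, lengths


if __name__ == "__main__":
    raw_dir = Path("RADseqs Raw data")  # hypothetical local copy of the folder
    for fq in sorted(raw_dir.glob("*.f*q*")):
        n, lens = fastq_stats(fq)
        print(fq.name, n, dict(sorted(lens.items())))
```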

Pimelodus.sumstats.summary.txt Contains summary statistics of variant and fixed positions across loci for each of the 19 Pimelodus samples, produced by the Stacks v.2.52 POPULATIONS module (--fstats).

Pimelodus.var.phylip.txt Contains 999 variable sites for each of the 19 Pimelodus samples in PHYLIP format, with variable sites encoded using IUPAC notation, produced by the Stacks v.2.52 POPULATIONS module (--phylip-var). The file is suitable for SNAPP and SVDquartets species delimitation analyses. No sites with missing data were included.
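Since heterozygous sites in this file are written as IUPAC ambiguity codes, a minimal Python sketch can read it and compute the fraction of heterozygous sites per sample. It assumes a sequential PHYLIP layout (a "taxa sites" header, then one "name sequence" line per sample) and skips any comment lines starting with "#"; the function names are hypothetical. Because the file is described as having no missing data, an 'N', '?' or '-' raises a KeyError here rather than being silently ignored.

```python
"""Sketch: reading a sequential PHYLIP file of IUPAC-coded SNPs.

Assumptions: 'N S' header line, one 'name<whitespace>sequence' line per
taxon, optional '#' comment lines, and no missing data.
"""
from pathlib import Path

# Each IUPAC symbol mapped to the diploid genotype it encodes.
IUPAC_GENOTYPES = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def read_phylip(path: str) -> dict[str, str]:
    """Return {sample name: sequence}, checking the header dimensions."""
    rows = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):  # skip blanks and comments
                rows.append(line)
    n_taxa, n_sites = map(int, rows[0].split())
    seqs = {}
    for row in rows[1:]:
        name, seq = row.split(None, 1)
        seqs[name] = seq.replace(" ", "").upper()
    if len(seqs) != n_taxa or any(len(s) != n_sites for s in seqs.values()):
        raise ValueError("alignment does not match the PHYLIP header")
    return seqs


def heterozygosity(seq: str) -> float:
    """Fraction of sites coded with a two-base IUPAC code (heterozygous calls)."""
    # A KeyError flags missing data or an unexpected symbol.
    genotypes = [IUPAC_GENOTYPES[base] for base in seq]
    return sum(g[0] != g[1] for g in genotypes) / len(genotypes)


if __name__ == "__main__":
    path = Path("Pimelodus.var.phylip.txt")  # dataset file, downloaded locally
    if path.exists():
        for name, seq in read_phylip(str(path)).items():
            print(name, len(seq), round(heterozygosity(seq), 3))
```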

Pimelodus.vcf.txt Contains the output SNPs and haplotypes in Variant Call Format (VCF) for the 19 Pimelodus samples, produced by the Stacks v.2.52 POPULATIONS module (--vcf).
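The MAF filter applied in POPULATIONS can be re-checked on the VCF with a minimal Python sketch that counts called alleles from the GT field of each record. It assumes GT is the first FORMAT key (as the VCF specification requires when present), ignores missing calls ("./."), and reports the frequency of the rarest observed allele; the function name is hypothetical and this is not how Stacks itself computes MAF.

```python
"""Sketch: per-record minor allele frequency from VCF genotype calls.

Assumptions: tab-separated VCF body, GT as the first FORMAT key, missing
calls written as '.'; multi-allelic records report the rarest allele.
"""
import re
from pathlib import Path


def minor_allele_frequencies(lines):
    """Yield (CHROM, POS, frequency of the rarest observed allele) per record."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        gt_index = fields[8].split(":").index("GT")
        counts = {}
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            for allele in re.split(r"[/|]", gt):
                if allele != ".":  # ignore missing allele calls
                    counts[allele] = counts.get(allele, 0) + 1
        total = sum(counts.values())
        if total == 0:
            continue  # no called genotypes at this record
        maf = 0.0 if len(counts) == 1 else min(counts.values()) / total
        yield fields[0], int(fields[1]), maf


if __name__ == "__main__":
    path = Path("Pimelodus.vcf.txt")  # dataset file, downloaded locally
    if path.exists():
        with path.open() as fh:
            mafs = [maf for _, _, maf in minor_allele_frequencies(fh)]
        print(len(mafs), "records;", sum(m < 0.01 for m in mafs), "below MAF 0.01")
```

As a worked check on the 0.01 threshold: assuming diploid calls for all 19 individuals at a site, there are 2 × 19 = 38 allele copies, so a single heterozygous call already gives a minor allele frequency of 1/38 ≈ 0.026. Under that assumption the MAF filter removes only sites with no observed minor allele.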

rag2 alleles.txt Contains the 40 phased alleles inferred from rag2 sequences of 20 individuals of Pimelodus yuma and P. grosskopfii, using the 'PHASE' application of DnaSP v5.10.1.

Confidence intervals.R Contains the script developed by Machado et al. (2018), modified from the SPIDER software, to estimate confidence intervals for the GMYC, bGMYC and localMinima single-locus species discovery methods.


Empresas Públicas de Medellín, Award: CT-2019-000661

Universidad Nacional de Colombia, Sede Medellín, Award: Hermes 40096

Institución Universitaria Colegio Mayor de Antioquia, Award: FCSA-2017

Institución Universitaria Colegio Mayor de Antioquia, Award: FCSA20