Genomic analyses of the scorpion mud turtle Kinosternon scorpioides in continental and insular Colombia

Cite this dataset

Caballero, Susana (2022). Genomic analyses of the scorpion mud turtle Kinosternon scorpioides in continental and insular Colombia [Dataset]. Dryad.


The turtle genus Kinosternon is widespread, with at least 25 species distributed from Mexico to northern Argentina. The taxonomy of this genus is controversial and requires a full revision using both morphological and molecular approaches. In this study, we performed a genomic analysis of the species Kinosternon scorpioides, distributed in insular and continental Colombia, in order to define conservation units. Total DNA was extracted from 24 tissue samples and genotyped by RADseq. In addition, the intron R35 was amplified and sequenced for a subset of samples. A total of 35,507 SNPs combined with 1,047 bp of the intron were used to reconstruct spatiotemporal colonization patterns and for phylogenetic analyses. The SNPs were also used for population structure inference and allele frequency-based analyses. Long-term reciprocal monophyly, significant differences in allele frequencies (Fst = 0.37-0.8), and evidence of reproductive isolation (no admixture or gene flow) indicate long-term divergence between groups (2-8 MYA), possibly due to the effect of geographic barriers. Four Evolutionarily Significant Units (ESUs) were defined within our sample. One ESU was represented by the insular subspecies K. scorpioides albogulare, found on San Andrés Island, and three ESUs were defined for the subspecies K. s. scorpioides in continental Colombia: one trans-Andean, found in northwestern Colombia (Caribbean region), and two cis-Andean, found in eastern and southeastern Colombia in the Orinoco and Amazon regions, respectively. Colonization by this species proceeded from an ancestral area in the Central America/Caribbean region (~8.43 MYA), establishing current populations on San Andrés Island and then, in independent events, in the Colombian Caribbean, next in the Orinoco and, more recently, in the Amazon.
We hypothesize that the emergence of the Panamá Isthmus and the final uplift of the northeastern Andes and the Vaupés Arch were key events leading to the differentiation of these ESUs. For management and conservation purposes, each of these ESUs should be considered a separate management unit. A full revision of the taxonomy of the genus Kinosternon is warranted.


Sampling locations

 A total of 24 tissue samples were obtained from wild Kinosternon scorpioides turtles at six continental sampling locations in three Colombian regions and at one insular location: one in the Colombian Amazon Basin (Leticia), two on the Caribbean Coast or Caribe Basin (Cispatá Bay and Lorica swamp), three in the Orinoco Basin (Sierra de la Macarena, Guaviare sub-basin; Puerto Carreño, Orinoco River floodplain; and Tuparro National Park, Tomo sub-basin), and one on San Andrés Island (Figure 1 and Table 1).

 Turtles were captured at night. Then, following the protocol designed by the Instituto de Investigación de Recursos Biológicos “Alexander von Humboldt” in Colombia (Vargas-Ramírez, 2017), approximately 0.5 cm³ of tissue was cut from the hind foot using a scalpel and stored in 90% ethanol.

 DNA extraction, quality control, library preparation and sequencing

 Total DNA was extracted using the QIAamp DNA tissue mini kit (QIAGEN) and its quality was evaluated on 0.8% agarose gels. DNA was quantified using a NanoDrop 2000 spectrophotometer (Thermo Scientific) and diluted to a final concentration of 30 to 50 ng/µL.

 Genomic DNA was converted into nextRAD genotyping-by-sequencing libraries (SNPsaurus, LLC) as in Russello et al. (2015). DNA was first fragmented with Nextera DNA Flex reagent (Illumina, Inc), which also ligates short adapter sequences to the ends of the fragments. The Nextera reaction was scaled for fragmenting 50 ng of genomic DNA, although 75 ng of genomic DNA was used as input to compensate for the amount of degraded DNA in the samples and to increase fragment sizes. Fragmented DNA was then amplified for 27 cycles at 74 °C, with one of the primers matching the adapter and extending 10 nucleotides into the genomic DNA with the selective sequence GTGTAGAGCC. Thus, only fragments starting with a sequence that can be hybridized by the selective sequence of the primer were efficiently amplified. The nextRAD libraries were sequenced on a HiSeq 4000 with one lane of 150 bp reads (University of Oregon).
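 The selective amplification step can be illustrated with a minimal sketch. It assumes a strongly simplified model in which a fragment is "selected" when its first 10 bases match the selective sequence quoted above; the function name `is_selected` and the `max_mismatches` parameter are hypothetical and not part of the nextRAD protocol, and real primer hybridization is not an exact string match.

```python
# Simplified illustration only: real amplification efficiency depends on
# hybridization chemistry, not on exact string identity.
SELECTIVE_SEQUENCE = "GTGTAGAGCC"  # selective sequence quoted in the text


def is_selected(fragment_start: str, max_mismatches: int = 0) -> bool:
    """Return True if the fragment begins with the selective sequence.

    `max_mismatches` is a hypothetical tolerance added for illustration.
    """
    prefix = fragment_start[: len(SELECTIVE_SEQUENCE)].upper()
    if len(prefix) < len(SELECTIVE_SEQUENCE):
        return False  # fragment too short to compare
    mismatches = sum(1 for a, b in zip(prefix, SELECTIVE_SEQUENCE) if a != b)
    return mismatches <= max_mismatches


# Example fragments (made up for illustration).
fragments = ["GTGTAGAGCCTTAGC", "GTGTAGAGCATTAGC", "ACGTACGTACGTACG"]
selected = [f for f in fragments if is_selected(f)]
```

 Under this model only the first example fragment would be retained, which is the sense in which the selective primer reduces the genome to a repeatable subset of loci.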

 RADseq genotyping analysis

 The genotyping analysis used custom scripts (SNPsaurus, LLC) that trimmed the reads using bbduk (BBMap tools):

bbmap/ in=reads/run_2780/2780_CAAGTGTC-GTAAGGAG_S25_L003_R1_001_subset.fastq.gz out=reads/run_2780/2780_CAAGTGTC-GTAAGGAG_S25_L003_R1_001_t.fastq.gz ktrim=r k=17 hdist=1 mink=8 ref=bbmap/resources/nextera.fa.gz minlen=100 ow=t qtrim=r trimq=10

 Next, a de novo reference was created by collecting 10 million reads in total, drawn evenly from the samples, and excluding reads that had counts fewer than 7 or more than 700. The remaining loci were then aligned to each other to identify allelic loci and collapse allelic haplotypes to a single representative. All reads were mapped to the reference with an alignment identity threshold of 0.95 using bbmap (BBMap tools). Genotype calling was done using callvariants (BBMap tools) (list=ref_turtle_rm.txt.align_samples out=turtle_total.vcf ref=ref_turtle.fasta ploidy=2 multisample=t rarity=0.05 minallelefraction=0.05 usebias=f ow=t nopassdot=f minedistmax=5 minedist=5 minavgmapq=15 minreadmapq=15 minstrandratio=0.0 strandedcov=t). The Variant Call Format (VCF) file was filtered to remove alleles with a population frequency of less than 3%. The VCF file was used directly as input, or converted to the formats required by other programs, for the subsequent coalescence-based and allele frequency-based analyses.
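 The 3% allele-frequency filter can be sketched as follows. This is not the SNPsaurus script, which is not included here; it is a minimal Python illustration that assumes a standard tab-separated VCF data line with a GT field, and that interprets the filter as keeping a site only if at least one non-reference allele reaches the threshold among called allele copies. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict


def allele_frequencies(vcf_line: str) -> Dict[int, float]:
    """Allele frequencies among called genotypes on one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    counts: Counter = Counter()
    for sample in fields[9:]:
        parts = sample.split(":")
        if gt_index >= len(parts):
            continue
        # Split diploid genotypes such as 0/1 or 0|1; skip missing calls (".").
        for allele in re.split(r"[/|]", parts[gt_index]):
            if allele.isdigit():
                counts[int(allele)] += 1
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()} if total else {}


def passes_frequency_filter(vcf_line: str, min_freq: float = 0.03) -> bool:
    """Assumed interpretation: keep the site if any ALT allele has freq >= min_freq."""
    freqs = allele_frequencies(vcf_line)
    return any(f >= min_freq for allele, f in freqs.items() if allele != 0)


# Made-up example: one ALT copy among 24 diploid samples (1/48, about 2.1%).
header = ["locus1", "10", ".", "A", "G", ".", "PASS", ".", "GT"]
rare_line = "\t".join(header + ["0/1"] + ["0/0"] * 23)
keep_rare = passes_frequency_filter(rare_line)
```

 With 24 diploid samples and no missing calls, a single alternate copy falls below 3%, so under this interpretation singleton alleles would be removed, while an allele seen in two copies (2/48, about 4.2%) would be kept.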

In addition, phylip and nexus format files containing the concatenated loci for each sample were generated for phylogeographical reconstructions. Samples with more than 75% missing data were excluded.
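The missing-data exclusion can be sketched in Python as below. This is only an illustration under stated assumptions: the alignment is held as a dictionary of sample name to concatenated sequence, and the characters N, ? and - are treated as missing. The original workflow does not state which symbols were counted, and the sample names are made up.

```python
from typing import Dict

MISSING_CHARS = set("N?-")  # assumption: symbols counted as missing data


def missing_fraction(sequence: str) -> float:
    """Fraction of alignment positions that are missing for one sample."""
    if not sequence:
        return 1.0  # an empty sequence is treated as entirely missing
    n_missing = sum(1 for base in sequence.upper() if base in MISSING_CHARS)
    return n_missing / len(sequence)


def drop_sparse_samples(
    alignment: Dict[str, str], max_missing: float = 0.75
) -> Dict[str, str]:
    """Keep samples whose missing fraction does not exceed max_missing."""
    return {
        name: seq
        for name, seq in alignment.items()
        if missing_fraction(seq) <= max_missing
    }


# Made-up concatenated alignment with three hypothetical samples.
alignment = {
    "sample_A": "ACGTACGT",  # no missing data
    "sample_B": "NNNNNNNA",  # 7 of 8 positions missing
    "sample_C": "NNNNNNAC",  # 6 of 8 positions missing (exactly 75%)
}
filtered = drop_sparse_samples(alignment)
```

Because the text excludes samples with more than 75% missing data, a sample at exactly 75% is retained in this sketch.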

Usage notes

This is a VCF file.
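
As a starting point for working with the file, the sketch below reads a VCF with the Python standard library and reports the sample names and the number of variant records. It assumes an uncompressed, tab-separated VCF; the function name `summarize_vcf` and the in-memory example are hypothetical, and the path of the deposited file would need to be substituted.

```python
import io
from typing import List, TextIO, Tuple


def summarize_vcf(handle: TextIO) -> Tuple[List[str], int]:
    """Return (sample names, number of variant records) for an open VCF."""
    samples: List[str] = []
    n_records = 0
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Columns after the nine fixed VCF columns are sample names.
            samples = line.rstrip("\n").split("\t")[9:]
        elif line.strip():
            n_records += 1
    return samples, n_records


# Tiny made-up VCF held in memory; with a real file one would instead use
# `with open(path_to_vcf) as handle:` (path_to_vcf is a placeholder).
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n"
    "locus1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n"
    "locus2\t25\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t./.\n"
)
sample_names, record_count = summarize_vcf(example)
```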


Private donor to Universidad de Los Andes