Data from: Limited evidence for extensive genetic differentiation between X and Y chromosomes in Hybognathus amarus (Cypriniformes:Leuciscidae)
Data files
Jul 10, 2023 version files 333.90 MB
- female-referenced_SNP_dataset.vcf
- male-referenced_SNP_dataset.vcf
- popmap.txt
- README.md
Abstract
We used Nextera-tagmented reductively amplified DNA (nextRAD) sequencing data to discover SNPs in Rio Grande silvery minnow samples of known and unknown sex, and we produced two contig-level genomes from Nanopore long-read sequencing. Raw nextRAD reads were aligned to each of the genomes. Subsequent SNP calling and filtering were repeated independently to obtain two datasets, one using the female genome as reference (female-referenced dataset) and another using the male genome (male-referenced dataset). SNP calling identified 4.46 M raw variants in the female-referenced dataset, of which 16,714 biallelic SNPs were retained after all filtering steps. For this set of SNPs, the average depth of coverage was 35.81 for the retained 64 females (ranging from 8.94 to 86.88) and 33.14 for the retained 53 males (ranging from 18.93 to 65.56). When using the male genome as reference, we obtained 3.77 M raw variants and 17,920 biallelic SNPs. In this case, the average depth of coverage was 36.01 for the same 64 females (ranging from 8.99 to 87.34) and 33.36 for the same 53 males (ranging from 19.23 to 66.27).
Methods
We obtained Rio Grande silvery minnow fin clip samples from 190 individuals. Of these, 65 individuals identified as females and 63 identified as males were sampled from a captive population held at the Los Lunas Silvery Minnow Refugium (LLSMR; New Mexico). Sexes were identified by hatchery personnel based on external phenotypic traits during the spring, when females became ripe and produced eggs. Additionally, we obtained tissue samples from two full-sibling families, each composed of a female, a male and 30 larvae of unknown sex, from another captive population held at the Albuquerque Biological Park (New Mexico). NextRAD libraries were generated from these 190 individuals at SNPsaurus, LLC (University of Oregon, Oregon). Two nextRAD libraries were prepared, each containing half of the samples, as described in Russello et al. (2015). Libraries were pooled and sequenced for 150 bp reads on two lanes of a HiSeq 4000 (University of Oregon). Additionally, two other individuals (collected by U.S. Fish and Wildlife Service personnel) were used to produce a draft genome for each sex, using Nanopore long-read sequencing (see published article for details on genome sequencing and assembly). These were identified as a female and a male based on dissection and the presence of ovaries and testes, respectively.
To detect single-nucleotide polymorphisms (SNPs), trimmed reads from 190 individuals were aligned to both female and male Rio Grande silvery minnow draft genomes. SNP calling and filtering were repeated independently to obtain a female-referenced SNP dataset and a male-referenced SNP dataset. Reads were mapped against the reference genomes with Bowtie2 v. 2.3.1 (Langmead and Salzberg 2012) using 'local alignment' and default 'very sensitive' options, excluding alignments with mapping quality lower than 20. Variants were identified using FreeBayes v. 1.3.1 (Garrison and Marth 2012) using the parallel code implemented in the dDocent v. 2.7.8 pipeline (Puritz et al. 2014). We used VCFtools v. 0.1.16 (Danecek et al. 2011) to remove variants with a PHRED quality score of less than 12, a sequencing depth for a genotype call lower than five, and a minor allele count of less than three. Multinucleotide states were decomposed into single variants with vcflib ('vcfallelicprimitives' command; https://github.com/ekg/vcflib). Indels were filtered out and only SNPs were retained using VCFtools. We retained biallelic SNPs with a minor allele frequency of at least 0.05 that were present in at least 60% of the individuals in each dataset. This filtering stringency allowed for a good compromise between SNP quality and quantity.
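The filtering thresholds above can be illustrated with a minimal Python sketch. It is not the pipeline used to produce these datasets (that relied on VCFtools and vcflib, applied in several passes); it only restates the criteria as a single check on one simplified variant record. The `Site` class, the `passes_filters` function and the genotype encoding are hypothetical names introduced for illustration, and diploid genotypes are assumed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical, simplified variant record (not a real VCF parser).
# Each genotype is (allele_1, allele_2, depth) with alleles coded
# 0 = reference, 1 = alternate, or None when the call is missing.
Genotype = Optional[Tuple[int, int, int]]


@dataclass
class Site:
    ref: str
    alts: List[str]
    qual: float
    genotypes: List[Genotype]


def passes_filters(site: Site,
                   min_qual: float = 12,
                   min_depth: int = 5,
                   min_mac: int = 3,
                   min_maf: float = 0.05,
                   min_call_rate: float = 0.6) -> bool:
    """Return True if the site meets all thresholds described in the Methods.

    Assumption: all criteria are checked in one pass, whereas the original
    workflow applied them in successive filtering steps.
    """
    # Remove variants with a PHRED quality score below the threshold.
    if site.qual < min_qual:
        return False

    # Keep only biallelic SNPs (single-base REF, exactly one single-base ALT).
    if len(site.alts) != 1 or len(site.ref) != 1 or len(site.alts[0]) != 1:
        return False

    # Treat genotype calls with depth below the threshold as missing.
    kept = [g for g in site.genotypes if g is not None and g[2] >= min_depth]
    if not kept:
        return False

    # Require the site to be genotyped in at least 60% of individuals.
    if len(kept) < min_call_rate * len(site.genotypes):
        return False

    # Minor allele count and frequency among the retained diploid genotypes.
    alt_count = sum(g[0] + g[1] for g in kept)
    total_alleles = 2 * len(kept)
    minor_count = min(alt_count, total_alleles - alt_count)
    return minor_count >= min_mac and minor_count / total_alleles >= min_maf
```

For example, a site genotyped at depth 10 in ten individuals, three of them heterozygous, has a minor allele count of 3 and a minor allele frequency of 0.15, so it would be retained under these thresholds.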
Preliminary results (see published article) revealed that the phenotypic sex was potentially misidentified for two individuals identified as females and 11 identified as males. These individuals were excluded from the summary statistics analysis but are included in both VCF files containing the female- and male-referenced SNP datasets. They are identified as potential sex misidentifications, but the original sex ID was maintained. See the README.md file for additional details.