Data from: Time is of the essence: using archived samples to develop a GT-seq panel to preserve continuity of ongoing genetic monitoring
Data files
Feb 06, 2025 version files 104.02 MB
- GT-seq_283loc_72ind.gen (109.83 KB)
- nextRAD_complete_2983loc_373ind.gen (7.86 MB)
- nextRAD_complete_2983loc_373ind.vcf (96.04 MB)
- README.md (1.87 KB)
Abstract
For the past 25 years, genetic monitoring of the Rio Grande silvery minnow (Hybognathus amarus) has been conducted annually using nine microsatellite loci. Recently, a temporal genome-wide microhaplotype dataset was generated with nextRAD-seq (a reduced-representation sequencing approach) from archived samples spanning 20 years, which allowed results from the two datasets to be compared (Osborne et al. 2022). To develop a GT-seq panel able to track past genomic changes across the time series, and thereby ensure the continuity of the ongoing genetic monitoring, we first re-identified loci from those nextRAD-seq data using a new conspecific reference genome. The final dataset included 2,983 loci and 373 individuals (nextRAD_complete dataset). From those loci, we selected the 500 with the highest power to track the changes identified with the genome-wide data for GT-seq PCR multiplex optimization. We also included the sex-linked marker HAM06 from Caeiro-Dias et al. (2023) in the panel optimization. After four rounds of panel optimization, we retained 284 loci (283 loci plus the sex-linked marker). The optimized panel was used to genotype 118 samples from eight temporal collections (a subset of nextRAD_complete) to validate the 283 loci in the GT-seq panel; another 20 samples of known sex were used to assess the accuracy of sex assignment with the sex-linked marker genotyped with the GT-seq panel. The nextRAD_complete dataset is provided as a VCF file (single SNPs) and a GENEPOP file (microhaplotypes). The GT-seq_283 dataset is provided as a GENEPOP file (microhaplotypes).
https://doi.org/10.5061/dryad.4mw6m90mv
Description of the data and file structure
Filtered genotype data obtained from a Rio Grande silvery minnow nextRAD-seq time series, together with genotypes from an optimized GT-seq panel developed from those nextRAD-seq data.
Files and variables
File: GT-seq_283loc_72ind.gen
Description: GENEPOP file containing genotype (microhaplotypes) data from 283 loci for 72 Rio Grande silvery minnow samples from five temporal collections (1999, 2002, 2004, 2017, 2018; separated as populations in this order).
File: nextRAD_complete_2983loc_373ind.gen
Description: GENEPOP file containing genotype (microhaplotypes) data from 2983 loci for 373 Rio Grande silvery minnow samples from 12 temporal collections (1999, 2000, 2002, 2004, 2006, 2008, 2009, 2010, 2012, 2015, 2017, 2018; separated as populations in this order).
File: nextRAD_complete_2983loc_373ind.vcf
Description: VCF file containing biallelic SNP data from the 5315 SNPs retained after microhaplotype identification (microhaplotype data in the file nextRAD_complete_2983loc_373ind.gen) for 373 Rio Grande silvery minnow samples from 12 temporal collections (1999, 2000, 2002, 2004, 2006, 2008, 2009, 2010, 2012, 2015, 2017, 2018).
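As a quick sanity check, the counts stated in the file descriptions above can be compared against the files themselves. The following is a minimal sketch using only the Python standard library. It assumes the .gen files follow the common GENEPOP layout (a title line, locus names, then "Pop" lines each followed by "sample , genotype genotype ..." rows with even-length diploid codes) and that the VCF has the standard nine fixed columns before the sample columns. The function names `read_genepop` and `summarize_vcf` and the toy data are hypothetical and are not part of the dataset.

```python
def read_genepop(lines):
    """Return (loci, pops); pops is a list of {sample: [(allele1, allele2), ...]}."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    loci, pops = [], []
    i = 1  # rows[0] is the free-text title line
    # Locus names precede the first "Pop": one per line or comma-separated.
    while i < len(rows) and rows[i].lower() != "pop":
        loci.extend(n.strip() for n in rows[i].split(",") if n.strip())
        i += 1
    for row in rows[i:]:
        if row.lower() == "pop":
            pops.append({})  # each "Pop" line starts a new temporal collection
            continue
        name, _, genos = row.rpartition(",")
        # Split each diploid code in half, e.g. "001002" -> ("001", "002").
        pops[-1][name.strip()] = [
            (g[: len(g) // 2], g[len(g) // 2 :]) for g in genos.split()
        ]
    return loci, pops


def summarize_vcf(lines):
    """Return (sample_names, n_records) from VCF text lines."""
    samples, n_records = [], 0
    for ln in lines:
        ln = ln.rstrip("\n")
        if not ln or ln.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if ln.startswith("#CHROM"):
            samples = ln.split("\t")[9:]  # sample columns follow FORMAT
        else:
            n_records += 1  # one data line per variant record
    return samples, n_records


# Toy inputs (invented for illustration only).
genepop_demo = """toy title line
locA
locB
Pop
ind1 , 001002 003003
ind2 , 000000 001001
Pop
ind3 , 002002 003004
""".splitlines()

vcf_demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
    "chr1\t250\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.",
]

loci, pops = read_genepop(genepop_demo)
samples, n_records = summarize_vcf(vcf_demo)
print(len(loci), [len(p) for p in pops], len(samples), n_records)
```

To apply this to the real files, pass an open file handle instead of the toy lists, for example `read_genepop(open("GT-seq_283loc_72ind.gen"))`. If the files match the layouts assumed here, the descriptions above imply 283 loci, five populations, and 72 samples for the GT-seq file; 2983 loci, 12 populations, and 373 samples for the nextRAD GENEPOP file; and 373 samples with 5315 records for the VCF.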
Access information
Other publicly accessible locations of the data:
- n/a
Data was derived from the following sources:
- Raw reads from nextRAD-seq (BioSample accession numbers SAMN31170359 - SAMN311737) and GT-seq (BioSample accession numbers XXXXXXXX - XXXXXXXXXX) used to obtain these datasets are deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession number PRJNA887477.
Genome-wide SNP identification was performed using Nextera-tagmented reductively-amplified DNA sequencing (nextRAD-seq; Russello et al., 2015) data from 379 individuals reported in Osborne et al. (2023), comprising 12 temporal collections that spanned 20 years. Microhaplotypes were identified using the methods described in Osborne et al. (2023), with four modifications: first, nextRAD loci were identified using the draft genome sequenced for this study; second, no depth of coverage filter was applied to nextRAD loci before variant calling; third, loci were discarded if their mean depth of coverage was lower than 20; and fourth, only individuals with less than 25% missing data were retained. The microhaplotypes and individuals retained after all filtering steps are referred to as the nextRAD_complete dataset. The nextRAD_complete dataset is provided as a VCF file containing single SNPs and as a GENEPOP file containing the haplotyped SNPs (microhaplotypes).
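The two numeric filters described above (mean depth of at least 20 per locus, and less than 25% missing data per individual) can be sketched as follows. This is an illustration only, not the pipeline used to build the dataset. It assumes that mean depth is averaged across individuals, that the locus filter is applied before the individual filter, and that individual missingness is computed over the retained loci; none of these details is stated in the text. The function name and toy matrices are hypothetical.

```python
def filter_loci_and_individuals(depth, genotypes, min_mean_depth=20, max_missing=0.25):
    """Return (kept_locus_indices, kept_individual_indices).

    depth[i][j]     : read depth of individual i at locus j
    genotypes[i][j] : genotype call, or None when missing
    """
    n_inds = len(depth)
    n_loci = len(depth[0])
    # Keep loci whose mean depth across individuals is at least the threshold.
    keep_loci = [
        j for j in range(n_loci)
        if sum(row[j] for row in depth) / n_inds >= min_mean_depth
    ]
    # Keep individuals with strictly less than max_missing missing calls
    # (assumption: missingness is measured over the retained loci only).
    keep_inds = []
    for i, row in enumerate(genotypes):
        n_missing = sum(1 for j in keep_loci if row[j] is None)
        if keep_loci and n_missing / len(keep_loci) < max_missing:
            keep_inds.append(i)
    return keep_loci, keep_inds


# Toy data: 3 individuals x 5 loci (invented for illustration).
depth = [
    [30, 10, 25, 40, 22],
    [28, 12, 21, 35, 20],
    [32, 8, 23, 38, 24],
]
genotypes = [
    ["0/1", "0/0", "1/1", "0/0", "0/1"],
    ["0/0", "0/1", None, "0/1", "1/1"],
    ["1/1", None, "0/1", "0/0", "0/0"],
]

kept_loci, kept_inds = filter_loci_and_individuals(depth, genotypes)
print(kept_loci, kept_inds)
```

In the toy data, the second locus is dropped for low mean depth. The second individual then has exactly 25% missing calls among the four retained loci and is removed, because the text specifies "less than 25%".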
The optimized GT-seq panel excluding the sex-linked marker (GT-seq_283) was used to genotype 118 samples from eight temporal collections. Those samples are also included in the nextRAD_complete dataset. The GTscore pipeline v. 1.3 (https://github.com/gjmckinney/GTscore) was used to identify genotypes. In-silico probes were designed for each SNP to include eight flanking nucleotides and, where a probe overlapped other identified SNPs, the corresponding variants (see the manual for details on probe design: https://github.com/gjmckinney/GTscore/blob/master/GTScoreDocumentation%20V1.3.docx). The AmpliconReadCounter.pl script was used to count the number of unique reads per individual, to identify on-target reads, and to count the number of reads containing each SNP allele for every individual. The per-individual counts of reads containing each SNP allele were then used for microhaplotype genotyping with the maximum likelihood algorithm described by McKinney et al. (2018) and implemented in the GTscore.R script. Only individuals genotyped for at least 70% of the loci were kept in the dataset, resulting in 72 individuals from five temporal collections. Missing data across loci did not exceed 30%. The GT-seq_283 dataset is provided as a GENEPOP file (microhaplotypes).
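The idea of calling a genotype from allele read counts by maximum likelihood can be illustrated with a deliberately simplified model. The sketch below is not GTscore code and does not reproduce the algorithm of McKinney et al. (2018). It assumes a single biallelic diploid SNP rather than a microhaplotype, a binomial read-sampling model, and a hypothetical sequencing error rate of 1%. The function name `ml_genotype` is invented for illustration.

```python
from math import log


def ml_genotype(count_a, count_b, error=0.01):
    """Return the most likely genotype ("AA", "AB" or "BB"), or None if no reads.

    Assumes reads are independent draws, with the probability of observing
    allele A equal to 1 - error (AA), 0.5 (AB) or error (BB).
    """
    if count_a + count_b == 0:
        return None  # no reads: treat the genotype as missing
    p_read_a = {"AA": 1.0 - error, "AB": 0.5, "BB": error}
    # Binomial log-likelihood; the binomial coefficient is identical for all
    # three genotypes, so it is dropped from the comparison.
    loglik = {
        g: count_a * log(p) + count_b * log(1.0 - p)
        for g, p in p_read_a.items()
    }
    return max(loglik, key=loglik.get)


# Toy read counts (allele A reads, allele B reads), invented for illustration.
calls = [ml_genotype(a, b) for a, b in [(20, 0), (10, 9), (0, 15), (0, 0)]]
print(calls)
```

Under this toy model, balanced counts favor the heterozygote, strongly skewed counts favor a homozygote, and loci without reads stay missing. Missing calls of this kind are what the 70% per-individual genotyping threshold described above acts on.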
