Development of a panel of SNP loci in the emblematic southern damselfly (Coenagrion mercuriale) using a hybrid method: Pitfalls and recommendations for large-scale SNP genotyping in a non-model endangered species
Data files
Dec 16, 2024 version; files total 594.55 MB
- 135468_RADloci_sequence.fa (48.24 MB)
- ddRADg_genotypes_20individuals.vcf (545 MB)
- README.md (1.44 KB)
- SNP_genotypes_131indv.txt (1.30 MB)
Abstract
Genomic markers are essential tools for studying species of conservation concern, yet non-model species often lack a reference genome. Here we describe a methodology for identifying and genotyping thousands of SNP loci in the southern damselfly (Coenagrion mercuriale), a bioindicator of freshwater stream quality classified as near-threatened, with locally declining populations. We used a hybrid approach combining reduced representation sequencing and target enrichment. First, we identified putative SNP loci using ddRADseq and de novo assembly. Then, single primer enrichment technology targeted 6,000 of these SNPs across 1,920 individuals. Challenges encountered included sequence recapture failure, coverage depth discrepancies, and aberrant FIS values. We provide recommendations to address such issues. After multiple filtering steps, 2,092 SNPs were retained and used to analyse the genetic structure of 131 individuals belonging to 11 populations in France, comparing central and marginal populations. Genetic differentiation was lower among central populations, with no sign of inbreeding. Compared with microsatellite loci, SNPs offered greater resolution in detecting fine-scale genetic structure and identified putative hybrids in adjacent populations. In this study, we emphasise the difficulties of large-scale SNP genotyping in non-model species via a hybrid method that ultimately did not offer the expected cost and time savings compared to classical ddRAD approaches. However, SNPs showed greater power than previously available markers in identifying conservation units or admixture events, and the panel of reusable probes we describe here offers the potential to improve conservation efforts through future diachronic studies or finer estimations of key parameters like effective population size.
README: Development of a panel of SNP loci in the emblematic southern damselfly (Coenagrion mercuriale) using a hybrid method: Pitfalls and recommendations for large-scale SNP genotyping in a non-model endangered species
https://doi.org/10.5061/dryad.vq83bk446
Description of the data and file structure
Files and variables
File: SNP_genotypes_131indv.txt
Description: this file contains, for each sampled individual: the name of the geographical location where its population was sampled (Population), the name of the collected individual (Individual), the geographical coordinates of the sampled population (xcoord_WGS84.EPSG4326, ycoord_WGS84.EPSG4326), and the individual's genotypes at the 2,092 SNP loci in genind format. Each row represents one individual. Each biallelic locus is encoded by two columns, one per allele, holding integer counts of that allele; the two counts sum to the individual's ploidy (2). Missing genotypes are encoded as NA.
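The layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' workflow: it assumes the file is tab-delimited with a header row, and the locus column names in the demo (`loc1.A`, `loc1.G`) are hypothetical placeholders. Only the four metadata column names, the NA coding, and the ploidy of 2 come from the description.

```python
import csv
import io

# Metadata columns named in the file description; every other column is
# assumed to hold an allele count (genind style).
META_COLUMNS = {
    "Population",
    "Individual",
    "xcoord_WGS84.EPSG4326",
    "ycoord_WGS84.EPSG4326",
}
PLOIDY = 2  # stated in the description


def read_genind_table(handle, delimiter="\t"):
    """Return (metadata rows, allele-count rows); the tab delimiter is an assumption."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    meta, counts = [], []
    for row in reader:
        meta.append({k: v for k, v in row.items() if k in META_COLUMNS})
        counts.append({
            k: (None if v in ("NA", "") else int(v))  # NA marks a missing genotype
            for k, v in row.items()
            if k not in META_COLUMNS
        })
    return meta, counts


def allele_frequencies(counts):
    """Frequency per allele column, ignoring individuals with missing genotypes."""
    freqs = {}
    for column in counts[0]:
        observed = [row[column] for row in counts if row[column] is not None]
        freqs[column] = sum(observed) / (PLOIDY * len(observed)) if observed else None
    return freqs


# Tiny synthetic example (not real data) with one locus and one missing genotype.
demo = (
    "Population\tIndividual\txcoord_WGS84.EPSG4326\tycoord_WGS84.EPSG4326\tloc1.A\tloc1.G\n"
    "PopA\tind1\t1.0\t50.0\t2\t0\n"
    "PopA\tind2\t1.0\t50.0\t1\t1\n"
    "PopB\tind3\t2.0\t49.0\tNA\tNA\n"
)
meta, counts = read_genind_table(io.StringIO(demo))
freqs = allele_frequencies(counts)
```

For the real file, replace `io.StringIO(demo)` with `open("SNP_genotypes_131indv.txt", newline="")` and adjust `delimiter` if the file turns out not to be tab-separated.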
File: 135468_RADloci_sequence.fa
Description: a FASTA file containing the sequences of the 135,468 RAD loci built at the end of the ddRADseq data analysis with the STACKS software.
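A quick sanity check on this file is to count the records and summarise their lengths. The sketch below uses only the Python standard library and assumes a conventional FASTA layout (header lines starting with `>`, sequences possibly wrapped over several lines). The record names in the demo are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(records):
    """Return the number of records and basic sequence-length statistics."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        return {"n_loci": 0}
    return {
        "n_loci": len(lengths),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": sum(lengths) / len(lengths),
    }


# Synthetic two-record example (not real data); the first sequence is wrapped.
demo = ">locus_1\nACGT\nAC\n>locus_2\nGGG\n"
summary = summarize(read_fasta(io.StringIO(demo)))
```

Run on the real file with `summarize(read_fasta(open("135468_RADloci_sequence.fa")))`; if the file matches its description, `n_loci` should equal 135,468.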
File: ddRADg_genotypes_20individuals.vcf
Description: a VCF file containing the genotypes of 20 individuals at 758,786 SNPs, produced at the end of the ddRADseq data analysis with the STACKS software.
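Because this file is large (545 MB), it is easiest to inspect by streaming it line by line. The sketch below computes the proportion of missing genotype calls per individual. It assumes a standard tab-delimited VCF in which the sample columns start at the tenth field and GT is the first FORMAT sub-field. The sample names in the demo are hypothetical, and this is an illustration rather than a step from the original analysis.

```python
import io


def missing_rate_per_sample(handle):
    """Return {sample name: fraction of sites with a missing genotype call}."""
    samples, missing, n_sites = [], [], 0
    for line in handle:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed VCF columns
            missing = [0] * len(samples)
            continue
        n_sites += 1
        for i, call in enumerate(fields[9:]):
            genotype = call.split(":")[0]  # GT is assumed to be the first sub-field
            if "." in genotype:  # "./." or "." denotes a missing call
                missing[i] += 1
    if n_sites == 0:
        return {}
    return {name: count / n_sites for name, count in zip(samples, missing)}


# Synthetic two-site, two-sample example (not real data).
demo = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\n"
    "1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\n"
    "2\t15\t.\tC\tT\t.\tPASS\t.\tGT:DP\t1/1:12\t0/0:8\n"
)
rates = missing_rate_per_sample(io.StringIO(demo))
```

For the real file, pass `open("ddRADg_genotypes_20individuals.vcf")` instead of the in-memory demo; the result should then contain 20 entries, one per individual.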