Data for: SNPs detected in pool-seq data from resistant and susceptible Cimex lectularius populations
Data files
Mar 28, 2023 version files 306.22 MB
- data_snps_cimex.txt (306.22 MB)
- README.md (1.33 KB)
Abstract
In the last few years, the bed bug Cimex lectularius has become an increasing problem worldwide, mainly due to the development of resistance to pyrethroid insecticides. The characterization of resistance alleles is a prerequisite for improving surveillance and resistance management. To identify genomic variants associated with pyrethroid resistance in Cimex lectularius, we compared the genetic composition of two recent, resistant populations with that of two ancient susceptible strains using a genome-wide pool-seq design. We identified a large 6 Mb "superlocus" showing particularly high genetic differentiation and association with the resistance phenotype. This superlocus contained several clustered resistance genes, and was also characterized by a high density of structural variants (inversions, duplications). The possibility that this superlocus constitutes a resistance "supergene" that evolved after the clustering of alleles adapted to insecticide and after a reduction in recombination is discussed.
The four strains used in this study were provided by CimexStore Ltd (Chepstow, United Kingdom). Two of these strains were susceptible to pyrethroids (S), as they were collected before the massive use of these insecticides and have been maintained under laboratory conditions without insecticide exposure for more than 40 years: German Lab (GL, collected in Monheim, Germany) and London Lab (LL, collected in London, Great Britain). The other two populations were resistant (R): London Field (LF, collected in 2008 in London), moderately resistant to pyrethroids, and Sweden Field (SF, collected in 2015 in Malmö, Sweden), with a moderate-to-high resistance level.
For each strain, genomic DNA was extracted from 30 individual females (except for London Lab, which had only 28) using the NucleoSpin 96 Tissue Kit (Macherey Nagel, Hoerdt, France) and eluted in 100 μL of BE buffer. The DNA concentration of these samples was measured using the Quant-iT PicoGreen Kit (ThermoFisher, Waltham, MA, USA) according to the manufacturer’s instructions. Samples were then combined into pools, each sample contributing an equal quantity of DNA. DNA purification was performed for each pool with 1.8 times the sample volume of AMPure XP beads (Beckman Coulter, Fullerton, CA, USA). Purified DNA was retrieved in 100 μL of ultrapure water. Pool concentrations were measured with Qubit using the DNA HS Kit (Agilent, Santa Clara, CA, USA). Final pool concentrations were as follows: 38.5 ng/μL for London Lab, 41.6 ng/μL for London Field, 40.3 ng/μL for German Lab and 38 ng/μL for Sweden Field. Sequencing was performed by Genotoul (Castanet-Tolosan, France) using the TruSeq Nano Kit (Illumina, San Diego, CA, USA) to produce paired-end reads of 2 x 150 bp, with a coverage of 25X for London Lab, 32X for London Field, 39.5X for German Lab and 25.4X for Sweden Field.
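The equal-quantity pooling step comes down to simple arithmetic: each individual extract contributes the same DNA mass, so the volume taken is that mass divided by the measured concentration. The Python sketch below illustrates this under stated assumptions. The function name, the sample labels, the concentrations and the 100 ng target are hypothetical and are not taken from the study; the 100 μL cap simply reflects the elution volume mentioned above.

```python
def equal_mass_volumes(concentrations_ng_per_ul, mass_ng_per_sample, max_volume_ul=100.0):
    """Return the volume (in uL) to take from each extract so that every
    sample contributes the same DNA mass to the pool.

    All names and default values here are illustrative assumptions.
    """
    volumes = {}
    for sample, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volume = mass_ng_per_sample / conc  # volume = mass / concentration
        if volume > max_volume_ul:
            raise ValueError(f"{sample}: needs {volume:.1f} uL, more than is available")
        volumes[sample] = volume
    return volumes


# Hypothetical PicoGreen readings (ng/uL) for two individual extracts.
readings = {"female_01": 20.0, "female_02": 50.0}
print(equal_mass_volumes(readings, mass_ng_per_sample=100.0))
```

A more dilute extract therefore contributes a larger volume, which keeps each individual's share of the pool roughly equal.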
The whole pipeline, with details of the parameters used, is available on GitHub (https://github.com/chaberko-lbbe/clec-poolseq). Quality control of the reads obtained from each strain was performed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). The raw data have been submitted to the Sequence Read Archive (SRA) database of NCBI under BioProject PRJNA826750. Sequencing reads were filtered using Trimmomatic v0.39 (Bolger et al., 2014), which removes adaptors. FastUniq v1.1 was then used to remove PCR duplicates (Xu et al., 2012). Reads were mapped to the C. lectularius reference genome (Clec_2.1 assembly, Harlan strain), produced as part of the i5K project (Poelchau et al., 2015), with an estimated size of 510.83 Mb. Mapping was performed using BWA mem v0.7.4 (Li and Durbin, 2009). SAM files were converted to BAM format using samtools v1.9 (Li et al., 2009) and cleaned of unmapped reads. The 1573 nuclear scaffolds were kept in this analysis, while the mitochondrial scaffold was not considered.
BAM files corresponding to the four populations were converted into mpileup format with samtools v1.9. The mpileup file was then converted to sync format by PoPoolation2 version 1201 (Kofler et al., 2011). A total of 8.03 million (M) SNPs were detected in this sync file using the R package poolfstat v2.0.0 (Hivert et al., 2018) with the following parameter: coverage per pool between 10 and 50. Fixation indices (FST) were computed with poolfstat for each SNP and each pairwise population comparison. The global SNP set was then filtered on a minor allele frequency (MAF) threshold of 0.2 (computed as MAF = 0.5 − |p − 0.5|, with p being the average frequency across all four populations). This relatively high MAF value was chosen in order to remove loci for which we have very limited power to detect any association with the resistance phenotype in the BayPass analysis. The final dataset was thus reduced to 2.92M SNPs located on 990 scaffolds. BayPass v2.3 (Olazcuaga et al., 2020) was used with default parameters.
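The coverage and MAF filters described above can be illustrated with a short Python sketch. It is not the authors' pipeline, which relies on poolfstat and is documented in the GitHub repository. It assumes the standard PoPoolation2 sync layout (scaffold, position, reference base, then one A:T:C:G:N:del count column per pool). It further assumes that coverage is the sum of the A, T, C and G counts, that the coverage bounds are inclusive, that p is the mean reference-allele frequency across pools, and that SNPs with MAF below 0.2 are dropped. All function names are hypothetical.

```python
from statistics import mean

SYNC_ALLELES = ("A", "T", "C", "G")  # first four fields of each sync count column


def parse_sync_line(line):
    """Split one sync line into (scaffold, position, ref base, per-pool counts)."""
    fields = line.rstrip("\n").split("\t")
    scaffold, position, ref = fields[0], int(fields[1]), fields[2].upper()
    pools = []
    for column in fields[3:]:
        counts = [int(x) for x in column.split(":")]
        pools.append(dict(zip(SYNC_ALLELES, counts[:4])))  # ignore N and del
    return scaffold, position, ref, pools


def coverage_ok(pools, min_cov=10, max_cov=50):
    """True if every pool has coverage within [min_cov, max_cov] (assumed inclusive)."""
    return all(min_cov <= sum(pool.values()) <= max_cov for pool in pools)


def maf(p):
    """MAF = 0.5 - |p - 0.5|, as given in the text."""
    return 0.5 - abs(p - 0.5)


def keep_snp(line, maf_threshold=0.2):
    """Apply the coverage filter, then the MAF filter, to one sync line."""
    _, _, ref, pools = parse_sync_line(line)
    if ref not in SYNC_ALLELES or not coverage_ok(pools):
        return False
    # p: reference-allele frequency averaged over pools (an assumption).
    p = mean(pool[ref] / sum(pool.values()) for pool in pools)
    return maf(p) >= maf_threshold


# Made-up example line with four pools, each at 20X coverage.
example = "scaffold_1\t100\tA\t10:10:0:0:0:0\t12:8:0:0:0:0\t5:15:0:0:0:0\t20:0:0:0:0:0"
print(keep_snp(example))
```

Because the formula folds p around 0.5, the result does not depend on which of the two alleles of a biallelic SNP is used to compute p.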