IIb-RAD-seq coupled with random forest classification indicates regional population structuring and sex-specific differentiation in salmon lice (Lepeophtheirus salmonis)
Guragain, Prashanna et al. (2022), IIb-RAD-seq coupled with random forest classification indicates regional population structuring and sex-specific differentiation in salmon lice (Lepeophtheirus salmonis), Dryad, Dataset, https://doi.org/10.5061/dryad.p8cz8w9r9
Salmon lice pose a serious threat to salmonid farming, and the aquaculture industry has used several treatment approaches to control the parasite. To optimize treatment effectiveness, monitor the parasite, and enhance targeted control measures, the systematic genetic differences between sub-populations must be studied. We used IIb-RAD sequencing in tandem with a random forest classification algorithm to detect the regional genetic structure of Norwegian salmon lice and to identify markers important for sex differentiation in this species. We identified 19428 single nucleotide polymorphisms (SNPs) from 95 individual salmon lice. Taken together, however, these SNPs could not distinguish structure among the lice populations. Using the random forest algorithm, we selected 91 SNPs important for geographical classification and 14 SNPs important for sex classification. The geographically important SNPs substantially improved the genetic understanding of the population structure and identified regional demographic clusters along the Norwegian coast. We also uncovered SNP markers that could help determine the sex of a salmon louse. A large portion of the SNPs identified as being under directional selection were also ranked as highly important by random forest. Our findings indicate a regional population structure of salmon lice associated with geographical location along the Norwegian coastline.
The samples were collected along the Norwegian coast in 2019. DNA was extracted from the lice and sequenced using IIb-RAD sequencing. The raw data were processed through a pipeline to generate a single nucleotide polymorphism (SNP) dataset. For more details on data processing, see: https://doi.org/10.1002/ece3.8809
See the README file for more detailed information on the data.
SNP genotypes are encoded as 0, 1 or 2.
Missing values are coded as NA.
Geography is coded as: RA, VE, MD, NN.
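The coding rules above (genotypes as 0, 1 or 2, missing values as NA, and the four region codes) can be illustrated with a minimal Python sketch that reads such a table and computes per-region allele frequencies while skipping missing calls. It assumes a hypothetical CSV layout with one row per louse, a "sample" column, a "region" column and one column per SNP, and it assumes that the 0/1/2 value counts copies of one particular allele (commonly the alternate allele). None of these assumptions is confirmed here; the README describes the actual file layout, and the full names behind RA, VE, MD and NN are not given on this page.

```python
import csv
import io
from collections import defaultdict

# Region codes listed for this dataset (their full names are not stated here).
REGION_CODES = {"RA", "VE", "MD", "NN"}


def parse_genotype(value):
    """Convert one genotype cell to an int (0, 1, 2), or None if it is NA."""
    value = value.strip()
    if value == "NA":
        return None
    dosage = int(value)
    if dosage not in (0, 1, 2):
        raise ValueError(f"unexpected genotype code: {value!r}")
    return dosage


def load_genotypes(handle, sample_column="sample", region_column="region"):
    """Read a HYPOTHETICAL CSV layout: one row per louse, one column per SNP."""
    reader = csv.DictReader(handle)
    snp_names = [c for c in reader.fieldnames if c not in (sample_column, region_column)]
    samples = []
    for row in reader:
        region = row[region_column].strip()
        if region not in REGION_CODES:
            raise ValueError(f"unknown region code: {region!r}")
        genotypes = {snp: parse_genotype(row[snp]) for snp in snp_names}
        samples.append((row[sample_column], region, genotypes))
    return snp_names, samples


def allele_frequency_by_region(snp, samples):
    """Per-region frequency of the allele counted by the 0/1/2 code (assumed
    to be one fixed allele), using only non-missing calls of a diploid louse."""
    totals = defaultdict(lambda: [0, 0])  # region -> [allele copies, called alleles]
    for _, region, genotypes in samples:
        dosage = genotypes[snp]
        if dosage is None:  # NA: skip rather than impute
            continue
        totals[region][0] += dosage
        totals[region][1] += 2
    return {region: copies / called for region, (copies, called) in totals.items()}


# Toy input with made-up sample names, purely for illustration.
toy = io.StringIO(
    "sample,region,snp1,snp2\n"
    "L1,RA,0,2\n"
    "L2,RA,1,NA\n"
    "L3,NN,2,1\n"
)
snp_names, samples = load_genotypes(toy)
print(allele_frequency_by_region("snp1", samples))  # prints {'RA': 0.25, 'NN': 1.0}
```

Missing genotypes are skipped rather than imputed, so each regional frequency is computed only from the lice that were actually called at that SNP.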
Norwegian Seafood Research Fund, Award: 901241