This README.txt file was generated on 2022-04-06 by Prashanna Guragain.

GENERAL INFORMATION

Title of Dataset: IIb-RAD-seq coupled with random forest classification indicates regional population structuring and sex-specific differentiation in salmon lice (Lepeophtheirus salmonis)

Author Information
Prashanna Guragain (1,2), Anna Solvang Båtnes (2), John Zobolas (1), Yngvar Olsen (2), Atle M. Bones (1,2), Per Winge (1,2)*
1 Cell, Molecular Biology and Genomics Group, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
2 Taskforce Salmon Lice, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
*Corresponding author: Per Winge, e-mail: per.winge@ntnu.no

Date of data collection: 2019

Geographic location of data collection: Norwegian coastline, Norway

Funding: The salmon industry in Mid-Norway, the Norwegian Seafood Research Fund (project number 901241), and NTNU (https://www.ntnu.edu/oceans/taskforce).

Recommended citation for dataset: Guragain, Prashanna et al. (2021), IIb-RAD-seq coupled with random forest classification indicates regional population structuring and sex-specific differentiation in salmon lice (Lepeophtheirus salmonis), Dryad, Dataset, https://doi.org/10.5061/dryad.p8cz8w9r9

DATA & FILE OVERVIEW

Description of dataset:
These data were generated to study population structure along the Norwegian coast and sex determination in salmon lice. A total of 95 samples were collected from 12 sampling locations across four geographical regions. DNA was extracted and sequenced using IIb-RAD sequencing (CD Genomics, USA).

File List:
File 1 Name: Full_SNP_dataset_19428_loci.csv.zip
File 1 Description: Full SNP dataset (19428 loci) for salmon lice.
File 2 Name: RfGeo_91_loci.csv
File 2 Description: The 91 most important SNPs selected using random forest for population analysis.
File 3 Name: RfSex_14_loci.csv
File 3 Description: The 14 most important SNPs selected using random forest for sex analysis.
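Full_SNP_dataset_19428_loci.csv.zip is distributed as a ZIP archive. The Python sketch below shows one way to read it using only the standard library. It assumes, although this README does not state it, that the archive contains exactly one UTF-8 CSV file. The function name read_zipped_csv is hypothetical and is not part of the dataset.

```python
import csv
import io
import zipfile


def read_zipped_csv(archive):
    """Return (header, rows) for the single CSV file inside a ZIP archive.

    `archive` can be a filesystem path or a binary file-like object.
    Assumption (not stated in the README): the archive holds exactly one
    data CSV, encoded as UTF-8.
    """
    with zipfile.ZipFile(archive) as zf:
        # Ignore macOS resource-fork entries such as "__MACOSX/._file.csv".
        members = [
            name for name in zf.namelist()
            if name.lower().endswith(".csv") and not name.startswith("__MACOSX/")
        ]
        if len(members) != 1:
            raise ValueError(f"expected one CSV member, found {members}")
        # ZipFile.open() yields bytes; wrap it so the csv module receives text.
        with zf.open(members[0]) as raw, io.TextIOWrapper(
            raw, encoding="utf-8-sig", newline=""
        ) as text:
            rows = list(csv.reader(text))
    if not rows:
        raise ValueError("CSV file is empty")
    return rows[0], rows[1:]
```

Usage: read_zipped_csv("Full_SNP_dataset_19428_loci.csv.zip") returns the header and the data rows. If pandas is available, pandas.read_csv can also open the file directly. It infers ZIP compression from the .zip extension when the archive holds a single file.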
DATA-SPECIFIC INFORMATION: Full_SNP_dataset_19428_loci.csv.zip
Number of variables: 19432 (4 metadata columns + 19428 SNP loci)
Number of rows: 95 (one per sample)
SNP genotypes are encoded as 0, 1 or 2. Missing values are coded as NA.
Abbreviations used: Geography is coded as RA, VE, MD and NN.
The first four columns are Sample ID, Population area, Sex and Geographical location. The remaining columns are SNP features.

DATA-SPECIFIC INFORMATION: RfGeo_91_loci.csv
Number of variables: 95 (4 metadata columns + 91 SNP loci)
Number of rows: 95 (one per sample)
SNP genotypes are encoded as 0, 1 or 2. Missing values are coded as NA.
Abbreviations used: Geography is coded as RA, VE, MD and NN.
The first four columns are Sample ID, Population area, Sex and Geographical location. The remaining columns are SNP features.

DATA-SPECIFIC INFORMATION: RfSex_14_loci.csv
Number of variables: 18 (4 metadata columns + 14 SNP loci)
Number of rows: 95 (one per sample)
SNP genotypes are encoded as 0, 1 or 2. Missing values are coded as NA.
Abbreviations used: Geography is coded as RA, VE, MD and NN.
The first four columns are Sample ID, Population area, Sex and Geographical location. The remaining columns are SNP features.
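All three files share the same column layout: four metadata columns followed by SNP genotype columns coded 0, 1, 2 or NA. The minimal Python sketch below (standard library only) parses that layout. It is an illustration, not the authors' analysis code. The names split_genotypes, missing_rate, N_META and VALID_CODES are hypothetical. The sketch also assumes that genotype cells contain exactly the strings 0, 1, 2 or NA.

```python
import csv
import io

# Sample ID, Population area, Sex, Geographical location (per this README).
N_META = 4
# Assumption: genotype cells are written as plain integers or the string NA.
VALID_CODES = {"0": 0, "1": 1, "2": 2}


def split_genotypes(csv_text):
    """Split CSV text into (snp_ids, metadata, genotypes).

    metadata is a list of 4-tuples, one per sample. genotypes is a list of
    per-sample lists holding 0, 1, 2, or None where the file has NA.
    """
    # Drop blank lines, which csv.reader returns as empty lists.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        raise ValueError("no header row found")
    header, samples = rows[0], rows[1:]
    snp_ids = header[N_META:]
    metadata, genotypes = [], []
    for row in samples:
        if len(row) != len(header):
            raise ValueError(
                f"sample {row[0]!r}: {len(row)} fields, expected {len(header)}"
            )
        metadata.append(tuple(row[:N_META]))
        calls = []
        for cell in row[N_META:]:
            cell = cell.strip()
            if cell == "NA":
                calls.append(None)  # missing genotype
            elif cell in VALID_CODES:
                calls.append(VALID_CODES[cell])
            else:
                raise ValueError(f"unexpected genotype code {cell!r}")
        genotypes.append(calls)
    return snp_ids, metadata, genotypes


def missing_rate(genotypes, locus):
    """Fraction of samples whose call at SNP column index `locus` is NA."""
    if not genotypes:
        raise ValueError("no samples")
    return sum(row[locus] is None for row in genotypes) / len(genotypes)
```

As a sanity check against the counts listed for each file, RfSex_14_loci.csv should yield 14 SNP IDs, with 18 fields per row. In pandas, read_csv treats the string NA as missing by default, so the same split can be done with DataFrame.iloc[:, :4] and DataFrame.iloc[:, 4:].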