This repository contains a zipped folder with data and analyses unique to the manuscript "Local adaptation in shell shape traits of a brooding chiton with strong population genomic differentiation", in addition to datasets linked to previous publications that are also analysed here.

The datasets linked to previous publications are:
- Morphological data: https://doi.org/10.17608/k6.auckland.12291200.v1 (linked to Salloum et al. 2020, https://doi.org/10.1093/biolinnean/blaa073)
- Raw sequence files and metadata: https://doi.org/10.17608/k6.auckland.19388579.v1 (linked to Salloum et al. 2022, https://doi.org/10.1111/1365-2656.13692)
- De-multiplexed sequence files (fastq): https://doi.org/10.17608/k6.auckland.19365608 (linked to Salloum et al. 2022, https://doi.org/10.1111/1365-2656.13692)

The datasets and scripts unique to this publication are within the zipped folder named Data_scripts_FST-PST_O-negglectus. In this folder, find the following:

Project name: Local adaptation in shell shape traits of a brooding chiton with strong population genomic differentiation
FigShare DOI: 10.17608/k6.auckland.21077251

Folders:

Pst: Datasets and code used for assessing phenotypic differentiation and estimating PST
- all_pca_size_noCR: principal component scores and raw measurements of area (in millimetres), perimeter (in millimetres), circularity (in millimetres), length (in millimetres), and width (in millimetres) for the three valves of all samples (not regressed)
- regressed_data_noCR: dataset used for estimating phenotypic differentiation (PST), regressed on 'area' (for all samples)
- regressed_data_byGroupNCS: same as regressed_data_noCR, but using the information about genetic clades (North, Central, and South)
- regressed_data_byGroupNS: same as regressed_data_noCR, but using the information about shell shape groups (northern and southern)
- bootstrapValuesPst_PC1f: bootstrap values resulting from PST estimation for the PC1 score of the fourth valve
- PSTvalsANDcis_fPC1: estimated PST (c=h2) and confidence intervals for the PC1 score of the fourth valve (used to plot the heatmap below the diagonal in Figure 3)
- pairwisePopCombinations: pairwise combinations of populations, used to plot the PST heatmap in Figure 3
- geofile: file with geographical distances among populations, used to test for correlation between PST and geographical distance
- pst_return_VariationWIthin-BetweenPops.R: R script used to assess the variation within and between populations for each trait (results in Tables S5-S8 in the supplementary material)
- pca_distribution_shellTraits.R: R script used to plot the distribution of all traits (results in Figure S8 in the supplementary material)
- calculating_p-value-fromCIofPSTpairwise.R: R script used to calculate the significance of PST estimates of the PC1 score of the fourth valve, for plotting in Figure 3 (heatmap below the diagonal)
- pst_BayescanOutGenalexFst_Overall-Pairwise.R: R script used to calculate PST and compare it with FST (results in Table 2, Figure 3, and Figure S11)
- calc_save_pairPst.R: R script for calculating population pairwise PST and saving the results
- cor_geo-fst-pst: R script to test for correlation between PST and geographical distance (Mantel test)
- Pst_forthPC1_pairwise_forHeatMap.csv: pairwise PST of PC1 of the fourth valve, used to make the heatmap in Figure 3 (below the diagonal)
- Pst_forthPC1_pairwise_withSignfromCI.csv: significance of PST of PC1 of the fourth valve, based on the confidence intervals of PST and used to mark significant comparisons in Figure 3
- PST_fPC1_long_pairfst_forHetmap.csv: pairwise PST of PC1 of the fourth valve in long format, used to make the heatmap in Figure 3 (below the diagonal)
- heatmap_pairwisePst_PCfPwithDotsCInotZero_2021.R: script for generating a heatmap of pairwise PST (of the PC1 of the fourth valve) with dots marking significance, as in Figure 3 (below the diagonal)
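
The PST estimates in this folder are described as using c = h2. As a rough illustration only (this is not the code in pst_BayescanOutGenalexFst_Overall-Pairwise.R), the commonly used definition PST = (c/h2)*VarB / ((c/h2)*VarB + 2*VarW) can be sketched in Python as below. The function name pst, the c_over_h2 parameter, the one-way ANOVA (method-of-moments) variance components, and the example values are all assumptions made for this sketch; the R scripts in this folder may estimate the variance components differently.

```python
from statistics import mean


def pst(groups, c_over_h2=1.0):
    """Sketch of PST for one trait.

    groups: dict mapping population name -> list of trait values
            (hypothetical input layout, not the format of the csv files here).
    c_over_h2: the c/h2 ratio; 1.0 corresponds to the c = h2 assumption.
    """
    samples = [list(values) for values in groups.values()]
    k = len(samples)                      # number of populations
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    grand_mean = mean(x for s in samples for x in s)

    # One-way ANOVA sums of squares and mean squares.
    ss_between = sum(n * (mean(s) - grand_mean) ** 2 for s, n in zip(samples, sizes))
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective group size, which also handles unequal sample sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    # Method-of-moments variance components; negative estimates are set to 0.
    var_between = max((ms_between - ms_within) / n0, 0.0)
    var_within = ms_within

    numerator = c_over_h2 * var_between
    denominator = numerator + 2.0 * var_within
    return numerator / denominator if denominator > 0 else 0.0


# Made-up values for two populations, for illustration only.
example = {"popA": [1.0, 2.0, 3.0], "popB": [4.0, 5.0, 6.0]}
print(round(pst(example), 3))
```

Varying c_over_h2 around 1 is one simple way to check how sensitive a PST estimate is to the c = h2 assumption.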

Valves_raw_images: Subdivided into head, fourth and tail folders, containing all raw images obtained from the chiton valves prior to geometric morphometric analyses. Not all images were used; see the readme file in the folder for more details. A pdf file containing the SHAPE output contours for each principal component is included in each of the respective folders (head, fourth and tail valve).

vcf: Contains .vcf files after filtering, with the respective metadata file (information on populations and sample IDs)
- NAok_9275Snps_all.vcf: file with all SNPs (before removing outliers identified with Bayescan)
- allNeutral.out.recode.vcf: file with only non-outlier SNPs (after filtering out outliers detected with Bayescan)
- Id_pop_info: metadata (sample ID, population and group information)

Fst: Folder containing scripts and datasets used for estimating FST
- heatmap_pairwisePst_PCf_withDotsCInotZero_2021.R: R script for plotting the heatmap in Figure 3, with dots showing significance

Subfolder: all_loci
- stamppFst.R: script for estimating FST from all 9275 loci using hierfstat, and also pairwise (results of the pairwise estimations are included in this folder: pairwise_fst_stamppFst_15pops.csv and pairwise_PVALUES-of-fst_stamppFst_15pops.csv)

Subfolder: non_outlier
- COI_long_pairfst_forHeatmap.csv: input file in long format to generate the ΦST heatmap of Figure S7 (below the diagonal) based on COI (from Salloum et al., 2020)
- COI_pairwiseFST_withSignificance.pdf: output file of the ΦST heatmap of Figure S7 (below the diagonal)
- Coi_pair_fst_noCR.csv: pairwise ΦST based on COI (from Salloum et al., 2020)
- Coi_pair_fstSignificance_noCR.csv: significance of ΦST based on COI (from Salloum et al., 2020)
- heatmap_pairwisefst_COI-SNPs_2021.R: R script to generate the heatmap of Figure S7 in the supplementary material
- NeutralAmovaGenalexAllPopsNCS.xlsx: GenAlex AMOVA results
- NeutralToGenalex_csv: input to GenAlex in csv format (converted from allNeutral.out.recode.vcf)
- Pairwise_fst_neutralGenalex.csv: pairwise FST results (neutral dataset) in csv format
- Pairwise_fst_p-values_neutralGenalex.csv: pairwise FST p-values (neutral dataset) in csv format
- SNPs_heatmap_withSignificance.pdf: heatmap of pairwise FST with significance (black dots), used to make Figure 3 and Figure S7 (above the diagonal)
- Snps_pair_fst-neutralGenalex.csv: SNPs pairwise FST results (neutral dataset) in csv format, with three decimal places
- Snps_COI_heatmap_withSignificance_2021.pdf: Figure S7
- Snps-pair_fstSignificance_neutralGenalex.csv: p-values (significance) of pairwise FST (used to plot the dots in Figure 3 and Figure S7, above the diagonal)
- SummaryGenalexResults.xlsx: summary of results for the GenAlex analyses

Bayescan: Folder containing the output of analyses with Bayescan, the input file, and the slurm script used to run the analyses
- Slurm_script.sh: script used to run the Bayescan analyses on the cluster
- Fst_outliers_FDR005.tsv: list of outlier loci
- All pdf files named ‘fst_distrib_pop*.pdf’: plots of the distribution of the parameters (FST) for each population (1 to 16)
- LogL_distrib.pdf: plot of the posterior distribution of the log likelihood of the Bayescan run
- fstAllLociVsAlpha_qvalScale.pdf: plot of the q-values of all loci against the alpha parameter from the Bayescan run
- AllTObayescenv.txt: input file converted from vcf (NAok_9275Snps_all.vcf)
- All_notNeutral_Verif.txt, all_notNeutral_fst.txt, all_notNeutral_AccRte.txt, and all_notNeutral.sel: output files from the Bayescan run
- Plot_R.r: R script to plot the Bayescan results

Co-ancestry: Contains the R script and input files required to plot the co-ancestry matrices in Figure 2
- Co-ancestry.R: R script that takes the vcf input files, converts them to LEA input files while removing uninformative loci (these will crash LEA if left in), and builds the co-ancestry matrix based on snmf
- allNeutral.out.recode.vcf: vcf file containing the non-outlier SNPs for all samples (same as the one in the vcf folder)
- id_pop_info.txt: metadata for the file above
- central.out.recode.vcf: vcf file containing the non-outlier SNPs only for Central samples
- central.txt: list of sample IDs for the file above
- id_pop_infoC.tsv: metadata for the Central clade
- north.out.recode.vcf: vcf file containing the non-outlier SNPs only for North samples
- north.txt: list of sample IDs for the file above
- id_pop_infoN.tsv: metadata for the North clade
- southern.out.recode.vcf: vcf file containing the non-outlier SNPs only for the South samples
- southern.txt: list of sample IDs for the file above
- id_pop_infoS.tsv: metadata for the South clade

Subfolder COI: input files and metadata for performing co-ancestry analyses based on the COI data from Salloum et al., 2020
- C_radiator_data_20200716@1217.snmf: subfolder with results from the Radiator snmf run for the Central Clade with k = 2 populations (subfolders inside correspond to iterations and masking of the analyses, automatically generated by Radiator)
- COI.snmf: subfolder with results from the Radiator snmf run with k = 16 populations (subfolders inside correspond to iterations and masking of the analyses, automatically generated by Radiator)
- N_radiator_data_20200716@1209.snmf: subfolder with results from the Radiator snmf run for the North Clade with k = 4 (subfolders inside correspond to iterations and masking of the analyses, automatically generated by Radiator)
- S_radiator_data_20200716@1147.snmf: subfolder with results from the Radiator snmf run for the Southern Clade with k = 16 populations (subfolders inside correspond to iterations and masking of the analyses, automatically generated by Radiator)
- Radiator_data_20200716@1053.snmf: subfolder with results from the Radiator snmf run for all populations with k = 16 (subfolders inside correspond to iterations and masking of the analyses, automatically generated by Radiator)
- C_radiator_data_20200716@1217.*: different output file formats of the Radiator analysis run with k = 2 in the Central Clade (formats are: *.geno, *.removed, *.snmfProject, *.vcf, *.vcfsnp)
- COI.*: different output file formats of the Radiator analysis run with k = 2 in the Central Clade (formats are: *.geno, *.removed, *.snmfProject, *.vcf, *.vcfsnp)
  Note: COI_noMA.fasta contains the fasta sequences of the COI gene for all samples (not generated by Radiator)
- N_radiator_data_20200716@1209.*: different output file formats of the Radiator analysis run with k = 4 in the North Clade (formats are: *.geno, *.removed, *.snmfProject, *.vcf, *.vcfsnp)
- S_radiator_data_20200716@1147.*: different output file formats of the Radiator analysis run with k = 16 in the Southern Clade (formats are: *.geno, *.removed, *.snmfProject, *.vcf, *.vcfsnp)
- radiator_data_20200716@1053.*: different output file formats of the Radiator analysis run with k = 16 in all populations (formats are: *.geno, *.removed, *.snmfProject, *.vcf, *.vcfsnp)
- Info.csv: metadata for all samples, in csv format
- Info.tsv: metadata for all samples, in tsv format
- infoC.tsv: metadata for samples of the Central Clade, in tsv format
- infoN.tsv: metadata for samples of the North Clade, in tsv format
- infoS.tsv: metadata for samples of the Southern Clade, in tsv format

Subfolders 05_radiator*, 06_radiator* and 07_radiator*: contain the vcf files converted into LEA’s input format. The Co-ancestry.R script calls the files in these folders, but it is also possible to start from the vcf files and convert them using the R code provided; the only requirement is to change the folder/file names to those of the newly converted files when running the co-ancestry analyses with LEA. More specifically:
- 05_radiator_genomic_converter_20200713@1036: contains the results of running the genomic converter of Radiator on the populations of the Southern clade
  o radiator_data_20200713@1036.*: different file formats automatically generated by the genomic converter (*.geno, *.lfmm, *.rad, *.removed, *.snmfProject, *.vcfsnp, *.vcf)
  o radiator_data_20200713@1036.pca: subfolder with results from the Radiator pca run for the Southern clade (files inside correspond to results of the pca analyses: *.eigenvalues, *.eigenvectors, *.projections, and *.sdev)
  o radiator_data_20200713@1036.snmf: subfolder with results from the Radiator snmf run for the Southern clade, with k = 15 populations (subfolders inside correspond to iterations and masking of the analyses, automatically generated by Radiator)
  o Filters_parameters_20200713@1036.tsv: output file from the Radiator genomic converter, with information on the parameters used to convert files (only has headers, as no filter was used in the conversion)
  o Radiator_genomic_converter_args_20200713@1036.tsv: output file from the Radiator genomic converter, with information on the arguments used to convert files
- 06_radiator_genomic_converter_20200713@1046: contains the results of running the genomic converter of Radiator on the populations of the North clade
  o radiator_data_20200713@1046.*: different file formats automatically generated by the genomic converter (*.geno, *.lfmm, *.rad, *.removed, *.snmfProject, *.vcfsnp, *.vcf)
  o radiator_data_20200713@1046.pca: subfolder with results from the Radiator pca run for the North clade (files inside correspond to results of the pca analyses: *.eigenvalues, *.eigenvectors, *.projections, and *.sdev)
  o radiator_data_20200713@1046.snmf: subfolder with results from the Radiator snmf run for the North clade, with k = 9 populations (subfolders inside correspond to iterations and masking of the analyses, automatically generated by Radiator)
  o Filters_parameters_20200713@1046.tsv: output file from the Radiator genomic converter, with information on the parameters used to convert files (only has headers, as no filter was used in the conversion)
  o Radiator_genomic_converter_args_20200713@1046.tsv: output file from the Radiator genomic converter, with information on the arguments used to convert files
- 07_radiator_genomic_converter_20200713@1108: contains the results of running the genomic converter of Radiator on the populations of the Central clade
  o radiator_data_20200713@1108.*: different file formats automatically generated by the genomic converter (*.geno, *.lfmm, *.rad, *.removed, *.snmfProject, *.vcfsnp, *.vcf)
  o radiator_data_20200713@1108.pca: subfolder with results from the Radiator pca run for the Central clade (files inside correspond to results of the pca analyses: *.eigenvalues, *.eigenvectors, *.projections, and *.sdev)
  o radiator_data_20200713@1108.snmf: subfolder with results from the Radiator snmf run for the Central clade, with k = 6 populations (subfolders inside correspond to iterations and masking of the analyses, automatically generated by Radiator)
  o Filters_parameters_20200713@1108.tsv: output file from the Radiator genomic converter, with information on the parameters used to convert files (only has headers, as no filter was used in the conversion)
  o Radiator_genomic_converter_args_20200713@1108.tsv: output file from the Radiator genomic converter, with information on the arguments used to convert files
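
For anyone regenerating the LEA input files from the vcf files rather than using the converted files above, the "removing uninformative loci" step mentioned for Co-ancestry.R could look roughly like the Python sketch below. This is not the repository's R code. It assumes that "uninformative" means monomorphic (every non-missing genotype at the locus is identical) and that genotypes are laid out as in LEA's .geno format: one row per individual, one character per SNP (0, 1 or 2), with 9 for missing data. The function name drop_monomorphic and the example rows are hypothetical.

```python
def drop_monomorphic(rows, missing="9"):
    """Remove loci where all non-missing genotypes are identical.

    rows: list of equal-length strings, one per individual (.geno-style layout).
    Returns (filtered_rows, kept_indices), where kept_indices are the 0-based
    positions of the loci that were retained.
    """
    if not rows:
        return [], []
    n_loci = len(rows[0])
    if any(len(row) != n_loci for row in rows):
        raise ValueError("all rows must have the same number of loci")

    kept = []
    for j in range(n_loci):
        # Distinct genotype codes seen at locus j, ignoring missing data.
        observed = {row[j] for row in rows if row[j] != missing}
        if len(observed) > 1:  # keep only loci showing variation
            kept.append(j)

    filtered = ["".join(row[j] for j in kept) for row in rows]
    return filtered, kept


# Made-up genotypes for three individuals at four loci.
rows = ["0129", "0219", "0009"]
filtered, kept = drop_monomorphic(rows)
print(filtered, kept)
```

Returning the kept indices makes it possible to trace each retained column back to its SNP in the original vcf file.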