### README_R_scripts
## SM Schaal

This document describes the R scripts that were used to produce the results presented in the manuscript. There are four R scripts:

1) ProcessSingleSimFile_submission.R - the main script, which produces the figures and data needed from each individual simulation
2) ProcessAllSims.R - takes the results of the first R script and makes the summary plots used in the manuscript
3) muts.R - summarizes the distribution of QTN effect sizes and the percent additive genetic variance (VA)
4) LocalAdaptationCalc.R - summarizes the amount of local adaptation (LA) and percent VA in inverted versus collinear genomic regions across time for all simulations

# ProcessSingleSimFile_submission.R

This file takes only three inputs:

folderIn # This is the directory to the folder with all the outputs from the SLiM script
folderOut # This is the output folder for the figures and data that the R script produces
seed # This is the seed of the simulation that you are running the R script on

The code is currently set up to run on a cluster. If you are running on your local machine, you need to uncomment the "If running on a local machine" section (lines 33-35). In addition, for each png file you need to remove the type = "cairo" argument that is needed on a cluster. The easiest way to do this is to search for png and manually remove that argument from the code.

Outputs from this file vary depending on whether the inversion mutation rate is > 0. Each output file name is preceded by the seed of the simulation (denoted below with SEED). If the inversion mutation rate is > 0, then the outputs are as follows:

1) SEED_adaptInvCriteria.pdf - shows the distribution of empirical p-values for each of our three criteria for determining whether a QTN is an outlier.
Colors represent whether the QTN was found inside or outside of an inversion.
2) SEED_fitness.pdf - shows the average fitness values of each population +/- 1 SD
3) SEED_heatmapPop1alphaINV.png - genotype heatmap for population 1 that includes only inversion QTNs, with pixel color as the product of (QTN effect size) * (copies of derived allele), therefore representing the effect of that individual's genotype on its phenotype
4) SEED_heatmapPop2alphaINV.png - same as above but for population 2
5) SEED_heatmapPop1alphaINVFST.png - same as above but with (QTN effect size) * (copies of derived allele) * (FST of that QTN) for population 1
6) SEED_heatmapPop2alphaINVFST.png - same as above but for population 2
7) SEED_heatmapPop1alphaFSTnew.png - genotype heatmap for all QTN loci with pixel color as the product of (QTN effect size) * (copies of derived allele) for population 1
8) SEED_heatmapPop2alphaFSTnew.png - same as above but for population 2
9) SEED_heatmapPop1geno.png - genotype heatmap of population 1.
Each pixel in the heatmap is the number of alternate alleles (alleles that are different from the reference allele) that the individual (in rows, labeled with the individual ID in SLiM) had at that locus (in columns, labeled with the mutation ID in SLiM), with homozygous sites for the reference allele in light yellow, heterozygous sites in golden, and homozygous sites for the alternate allele in maroon.
10) SEED_heatmapPop2geno.png - same as above but for population 2
11) SEED_invAge.pdf - the age of inversions in generations plotted over time, split by adaptive inversions, nonadaptive inversions, and no-selection control inversions +/- 1 SD
12) SEED_invAgebox.pdf - boxplot of the distribution of inversion ages from all inversions in the final generation (60,000)
13) SEED_invLength.pdf - the length of inversions in cM plotted over time, split by adaptive inversions, nonadaptive inversions, and no-selection control inversions +/- 1 SD
14) SEED_invLengthbox.pdf - boxplot of the distribution of the inversion lengths in cM from all inversions in the final generation (60,000)
15) SEED_invOriginBarplot.pdf - the proportion of inversions in three categories, "capture and gain", "neutral and gain", and "neutral and no gain", for adaptive inversions, nonadaptive inversions, and no-selection control inversions
16) SEED_invOriginFST.pdf - the FST of each inversion found in the final generation (60,000) plotted through time.
The color of the inversion is determined by the population in which it is at highest frequency in the final generation (60,000).
17) SEED_invOriginNew.pdf - the mean inversion effect size plotted through time for each inversion found in the final generation, with the color of the inversion determined by the population in which it is at highest frequency in the final generation (60,000)
18) SEED_invQTNsLscaled.pdf - the number of QTNs per inversion scaled by the length of the inversion, plotted over time and split by adaptive inversions, nonadaptive inversions, and no-selection control inversions +/- 1 SD
19) SEED_invQTNsLscaledbox.pdf - boxplot of the distribution of the number of QTNs per inversion scaled by the length of the inversion from all inversions in the final generation (60,000)
20) SEED_LA.pdf - the amount of local adaptation in the metapopulation plotted through time as the sympatric versus allopatric contrast
21) SEED_manh.png - Manhattan plots of the FST value for all QTNs (linkage groups 1-20) and neutral loci (linkage group 21) that evolved in the metapopulation, with adaptive inversions plotted in the first Manhattan plot, nonadaptive inversions in the second Manhattan plot, and no-selection control inversions in the final Manhattan plot
22) SEED_manhFST.png - same as above but just for the adaptive inversion plot
23) SEED_manhFSTzoom.png - same as above but with the y-axis scaled for the largest FST value in that simulation
24) SEED_outflankFstHist.pdf - the distribution of QTN FST values to check for fit of the chi-squared distribution needed for OutFLANK
25) SEED_pcaLoadingsPos.pdf - loadings of each QTN on the principal component axes determined by pcadapt
26) SEED_pcaScores.pdf - PC1 versus PC2 calculated from pcadapt for all QTNs
27) SEED_pheno.pdf - the average individual phenotype plotted over time for populations 1 and 2
28) SEED_VA.pdf - the average percent of the additive genetic variance found in inversions (red) and in the collinear genome (blue)
29) SEEDnoSel_outflankFstHist.pdf - the distribution of QTN FST values to check for fit of the chi-squared distribution needed for OutFLANK for the no-selection control simulation
30) SEEDnoSel_pcaLoadingsPos.pdf - loadings of each QTN on the principal component axes determined by pcadapt for the no-selection control simulation
31) SEEDnoSel_pcaScoresPruned.pdf - PC1 versus PC2 calculated from pcadapt for all QTNs for the no-selection control simulation
32) SEED__criticalMigration.png - the absolute value of the QTN effect size plotted as a function of its FST value. The vertical dashed line is the critical migration rate for that simulation.
33) SEED_outlierTests.png - Manhattan plots of each QTN FST value, colored by chromosome in the top panel and by whether the QTN was called as an outlier (red) or not (black) by OutFLANK (middle panel) and pcadapt (bottom panel), with adaptive inversions plotted in the background
34) SEED_outlierTestsNA.png - same as above but with nonadaptive inversions plotted in the background

Finally, a row of data called "output.data" is appended to the dataframe called "outputSumData.txt". This file should contain summary data from all simulations, which can be used to make summary plots with the next script, "ProcessAllSims.R". This dataframe includes:

1) seed - simulation seed
2) Va_perc_In - average percent of the additive genetic variance found inside inversions at the end of the simulation
3) LA_final - amount of local adaptation at the end of the simulation
4) num_inv - number of adaptive inversions found at some frequency (MAF > 0.01) in the metapopulation at the end of the simulation
5) num_inv_NA - number of nonadaptive inversions found at some frequency (MAF > 0.01) in the metapopulation at the end of the simulation
6) num_inv_NS - number of no-selection control inversions found at some frequency (MAF > 0.01) in the metapopulation at the end of the simulation
7) capture_gain_p - the proportion of adaptive inversions in the final generation with a "capture and gain" evolutionary history
8) capture_no_gain_p - the proportion of adaptive inversions in the final generation with a "capture and no gain" evolutionary history
9) neutral_gain_p - the proportion of adaptive inversions in the final generation with a "neutral and gain" evolutionary history
10) neutral_no_gain_p - the proportion of adaptive inversions in the final generation with a "neutral and no gain" evolutionary history
11) capture_gain_p.NA - the proportion of nonadaptive inversions in the final generation with a "capture and gain" evolutionary history
12) capture_no_gain_p.NA - the proportion of nonadaptive inversions in the final generation with a "capture and no gain" evolutionary history
13) neutral_gain_p.NA - the proportion of nonadaptive inversions in the final generation with a "neutral and gain" evolutionary history
14) neutral_no_gain_p.NA - the proportion of nonadaptive inversions in the final generation with a "neutral and no gain" evolutionary history
15) capture_gain_p.NS - the proportion of no-selection control inversions in the final generation with a "capture and gain" evolutionary history
16) capture_no_gain_p.NS - the proportion of no-selection control inversions in the final generation with a "capture and no gain" evolutionary history
17) neutral_gain_p.NS - the proportion of no-selection control inversions in the final generation with a "neutral and gain" evolutionary history
18) neutral_no_gain_p.NS - the proportion of no-selection control inversions in the final generation with a "neutral and no gain" evolutionary history
19) ave_start_QTNs - the average number of QTNs at the start of the simulation in adaptive inversions
20) ave_start_QTNs_NA - the average number of QTNs at the start of the simulation in nonadaptive inversions
21) ave_start_QTNs_NS - the average number of QTNs at the start of the simulation in no-selection control inversions
22) ave_end_QTNs - the average number of QTNs at the end of the simulation in adaptive inversions
23) ave_end_QTNs_NA - the average number of QTNs at the end of the simulation in nonadaptive inversions
24) ave_end_QTNs_NS - the average number of QTNs at the end of the simulation in no-selection control inversions
25) ave_start_FST - average starting FST value for all QTNs in adaptive inversions
26) ave_start_FST_NA - average starting FST value for all QTNs in nonadaptive inversions
27) ave_start_FST_NS - average starting FST value for all QTNs in no-selection control inversions
28) ave_end_FST - average final FST value for all QTNs in adaptive inversions
29) ave_end_FST_NA - average final FST value for all QTNs in nonadaptive inversions
30) ave_end_FST_NS - average final FST value for all QTNs in no-selection control inversions
31) ave_abV_start_qtnSelCoef - average QTN selection coefficient for all QTNs found inside adaptive inversions when the inversion arose
32) ave_abV_start_qtnSelCoef_NA - average QTN selection coefficient for all QTNs found inside nonadaptive inversions when the inversion arose
33) ave_abV_start_qtnSelCoef_NS - average QTN selection coefficient for all QTNs found inside no-selection control inversions when the inversion arose
34) ave_abV_end_qtnSelCoef - average QTN selection coefficient for all QTNs found inside adaptive inversions at the end of the simulation
35) ave_abV_end_qtnSelCoef_NA - average QTN selection coefficient for all QTNs found inside nonadaptive inversions at the end of the simulation
36) ave_abV_end_qtnSelCoef_NS - average QTN selection coefficient for all QTNs found inside no-selection control inversions at the end of the simulation
37) true_pos_pcadapt - number of adaptive inversions in the selection simulations that were called as outliers by pcadapt
38) true_pos_outflank - number of adaptive inversions in the selection simulations that were called as outliers by OutFLANK
39) false_neg_pcadapt - number of adaptive inversions in the selection simulations that were not called as outliers by pcadapt
40) false_neg_outflank - number of adaptive inversions in the selection simulations that were not called as outliers by OutFLANK
41) true_neg_pcadapt - number of nonadaptive inversions in the selection simulations that were not called as outliers by pcadapt
42) true_neg_outflank - number of nonadaptive inversions in the selection simulations that were not called as outliers by OutFLANK
43) false_pos_pcadapt - number of nonadaptive inversions in the selection simulations that were called as outliers by pcadapt
44) false_pos_outflank - number of nonadaptive inversions in the selection simulations that were called as outliers by OutFLANK
45) true_neg_pcadapt_NS - number of nonadaptive inversions in the no-selection control simulations that were not called as outliers by pcadapt
46) false_pos_pcadapt_NS - number of nonadaptive inversions in the no-selection control simulations that were called as outliers by pcadapt
47) true_neg_outflank_NS - number of nonadaptive inversions in the no-selection control simulations that were not called as outliers by OutFLANK
48) false_pos_outflank_NS - number of nonadaptive inversions in the no-selection control simulations that were called as outliers by OutFLANK
49) av.effect - average effect size of all QTNs
50) av.perc.VA - average percent of the additive genetic variance explained by individual QTNs
51) num.adapt.overlap.invs - number of overlapping inversions on the genetic map
52) num.adapt.within.invs - number of inversions within the bounds of another inversion on the genetic map

In addition, a few other dataframes are output:

1) outputInvChar_finalGen.txt - appends rows for each inversion found in the final generation with five columns of data: the seed of the simulation, the inversion age, the inversion length, the number of QTNs scaled by its length, and whether it was called as adaptive, nonadaptive, or found in the no-selection control simulation.
2) outputInvChar_allData.txt - appends rows with the data found in "outputInvChar_finalGen.txt", but averaged at every 200th generation. Columns are split for inversions in each category (adaptive, nonadaptive, and no-selection control), with the mean in one column and the upper and lower SD in other columns.
3) outputAdaptInvCrit.txt - appends rows with the results of the criteria used to call an inversion as adaptive or not for all inversions in the final generation, along with each inversion's ID, length, starting base, final base, and number of QTNs found inside it.
4) outputInvGenome.txt - appends a row with the percent of the genome that is inverted, which is used as a null expectation in a summary plot in the next script

# ProcessAllSims.R

This R script takes the outputs of the previous R script and creates a number of summary plots. This file takes two inputs:

folderIn # This is the directory to the folder with all the outputs from the SLiM script and R scripts
folderOutFig # This is the output folder for the figures that this R script produces

In folderIn you need a number of files that were either provided or created by the previous R script: 1) "FullSet_dfparams.txt", 2) "outputSumData.txt", 3) "outputInvChar_finalGen.txt", 4) "outputInvChar_allData.txt", and 5) "outputInvGenome.txt".

The script produces the following figures:

1) Fig2_LAplots_envar.pdf - corresponds to Figure 2 in the manuscript and summarizes the local adaptation, percent additive genetic variance, and number of adaptive inversions for each parameter combination for all simulations that included environmental variance.
2) fig3_evoHist.pdf - corresponds to Figure 4A in the manuscript and is a boxplot showing the distribution of inversions with our three evolutionary history categories, "capture and gain", "capture and no-gain", and "neutral and gain", across every simulation
3) SFigX_characteristics.pdf - corresponds to Figure S10 in the manuscript and is a boxplot showing the distribution of inversion characteristics (i.e., age, length, and number of QTNs scaled by inversion length) split by each parameter combination
4) Fig3_characteristics_envar.pdf - corresponds to Figure 3 in the manuscript and is the same as the previous figure but not split by each parameter combination (i.e., summarized across all simulations)
5) SFigX_outliers_count_envar.pdf - corresponds to Figure S14 in the manuscript and is a stacked barplot of the average number of inversions found in the final generation of the selection simulation called correctly or incorrectly by the genome scan methods pcadapt and OutFLANK for each parameter combination
6) SFigX_outliersNS_count_envar.pdf - corresponds to Figure S14 in the manuscript and is a stacked barplot of the average number of inversions found in the final generation of the no-selection simulation called correctly or incorrectly by the genome scan methods pcadapt and OutFLANK for each parameter combination
7) SFig7_overlappingInversions.pdf - corresponds to Figure S7 and is a boxplot showing the distribution of the number of overlapping inversions and inversions within inversions that evolved across the different parameter combinations

# muts.R

This script creates a distribution of all the QTN effects on the phenotype and their percent additive genetic variance.
The input for this script includes three directories:

1) DATADIR - the directory to a folder containing all the "SEED_outputMutations.txt" files
2) PARAMSDIR - the directory to a parameter file called "FullSet_dfparams.txt", which lists the seed and the parameters that correspond to that seed
3) DATAOUT - where you want the output to be printed to

This script can be run one of two ways. The original run will create a file called "FullSet_muts.txt", which is written to the output directory on line 29. The rest of the script makes the plots summarizing the distributions of QTN effects on the phenotype and percent additive genetic variance, which correspond to Figure S3 in the manuscript. If you have already created "FullSet_muts.txt", you can comment out lines 17 to 29 and uncomment line 31 to just read in that file.

# LocalAdaptationCalc.R

This script is used in a similar manner to the muts.R script. It plots the average LA and percent VA inside inversions compared to the collinear genome over time for all parameter combinations. The input is the same as for muts.R, with three directories:

1) DATADIR - the directory to a folder containing all the "SEED_outputPopDynam.txt" files
2) PARAMSDIR - the directory to a parameter file called "FullSet_dfparams.txt", which lists the seed and the parameters that correspond to that seed
3) DATAOUT - where you want the output to be printed to

This script can also be run one of two ways. The original run will make two output files. The first is a file called "FullSet_popDyn.txt", which is a concatenation of all the SEED_outputPopDynam.txt files. The second is an average of those values across the five replicates per parameter combination. These files are written out on lines 36 and 44, respectively. If you have already created these files, you can comment out lines 24-44 and uncomment line 46 to just read in the second file for plotting.
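
The "run one of two ways" pattern described for muts.R and LocalAdaptationCalc.R can be sketched without commenting and uncommenting lines, by checking whether the combined file already exists. This is only an illustrative sketch and not the code in the scripts themselves: the function name build_or_read, the source_seed column, and the demo columns gen and LA are hypothetical, and the sketch assumes the per-seed files are whitespace-delimited with a header row.

```r
# Hypothetical helper (not part of the original scripts): concatenate all
# per-seed files once, then reuse the combined file on later runs.
build_or_read <- function(data_dir, out_file,
                          pattern = "_outputPopDynam\\.txt$") {
  if (file.exists(out_file)) {
    # Second way of running: the combined file already exists, so just read it.
    return(read.table(out_file, header = TRUE, stringsAsFactors = FALSE))
  }
  files <- list.files(data_dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) stop("No input files found in ", data_dir)
  tables <- lapply(files, function(f) {
    # Assumption: each file is whitespace-delimited with a header row.
    df <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
    # Assumption: the seed is the part of the file name before the first "_".
    df$source_seed <- sub("_.*$", "", basename(f))
    df
  })
  combined <- do.call(rbind, tables)
  write.table(combined, out_file, row.names = FALSE, quote = FALSE)
  combined
}

# Small demo with made-up data in a fresh temporary directory.
demo_dir <- tempfile("popdyn_demo_")
dir.create(demo_dir)
write.table(data.frame(gen = c(200, 400), LA = c(0.1, 0.2)),
            file.path(demo_dir, "1111_outputPopDynam.txt"),
            row.names = FALSE, quote = FALSE)
write.table(data.frame(gen = c(200, 400), LA = c(0.3, 0.4)),
            file.path(demo_dir, "2222_outputPopDynam.txt"),
            row.names = FALSE, quote = FALSE)

out_file   <- file.path(demo_dir, "FullSet_popDyn.txt")
first_run  <- build_or_read(demo_dir, out_file)  # builds and writes the file
second_run <- build_or_read(demo_dir, out_file)  # reads the cached file
```

The same idea would apply to "FullSet_muts.txt" by passing a pattern that matches the "SEED_outputMutations.txt" files instead.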