Title: Assessing biological factors affecting post-speciation introgression

Authors:
Jennafer A. P. Hamlin (jennahamlin@gmail.com)
Mark S. Hibbins
Leonie C. Moyle

Accepted at: Evolution Letters, Oct. 2019

This Dryad data directory contains three data files and one sub-directory called 'scripts', all associated with the above publication. The 'scripts' sub-directory is itself divided into further sub-directories.

The three data files:
- solanum.all.filtered.mvf (4.47 GB): This is the mvf file for all 33 mapped and filtered genomes.
- solanum.geo.combined.windows.csv (22.60 MB): This is a csv file calculated using mvftools for all of the geographic trios.
- solanum.mating.combined.windows.csv (9.41 MB): This is a csv file calculated using mvftools for all of the mating trios.

The 'scripts' sub-directory has three additional sub-directories:

- D_bootstrap: This contains the pbs scripts and a python script to bootstrap D for each trio. A brief description of these scripts is given below:
    tomato_Dstat.py: This script was used to estimate D and perform bootstrapping for a given trio.
    pbs files: These were used to call tomato_Dstat.py for each trio.

- D2_statistic: This contains shell scripts and python scripts for performing the D2 simulation. A brief description of these scripts is given below:
    get_genetrees.pbs: This script was used to call MVFtools to infer gene trees in 100kb windows for each trio.
    calc_empirical_D2.py: This script was used to estimate the D2 statistic from the set of gene trees estimated for a given trio.
    calc_empirical_D2.sh: This script was used to call calc_empirical_D2.py for each trio.
    calc_D2_sims.py: This script was used to calculate D2 given an output file from Seq-Gen and the corresponding tree output file from ms.
    calc_D2_pvals.py: This script was used to estimate the p-value of the empirical D2 statistic estimated for a given trio, using the null distribution simulated for that trio.
    Folder D2_sims/D2_sim_scripts: Contains shell scripts which were used to call ms and Seq-Gen to generate the simulated data used to estimate the null distribution of D2 for each trio. Each trio has its own folder containing 10 job scripts, each of which simulated 100 replicates.
    Folder D2_sims/calc_sim_D2_scripts: Contains shell scripts which were used to call calc_D2_sims.py and estimate the null distribution of D2 values for each trio. These are also split into 10 scripts per trio, each of which calculated the value of D2 for 100 replicate simulations.

- Dp_admix_prop: This contains pbs scripts and python scripts for calculating Dp, or the admixture proportion. A brief description of these scripts is given below:
    tomato_prop.py: This script was used to estimate the value of Dp for a given trio.
    calc_tomato_prop.pbs: This script was used to call tomato_prop.py for each trio.
    Folder Dp_sims: calc_newstat.py was used to calculate the value of Dp for each replicate simulation. The four subfolders in this folder are broken down by condition:
        basesims: C -> B introgression with a timing of 0.2
        otherdir_sims: B -> C introgression with a timing of 0.2
        timing_sims: C -> B introgression with a timing of 0.04
        otherdir_timing_sims: B -> C introgression with a timing of 0.04
      Each of these folders contains a set of shell scripts, which were used to call ms and Seq-Gen to simulate introgression across 10 different values of the admixture proportion.
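
Illustrative example (not part of the archive): for readers unfamiliar with the calculation that the D_bootstrap scripts perform, the following is a minimal Python sketch of Patterson's D, computed as (ABBA - BABA) / (ABBA + BABA), with a bootstrap over genomic windows. It is not the code in tomato_Dstat.py. The function names, the (ABBA, BABA) tuple input, the toy counts, and the percentile interval are all assumptions made for illustration; the column layout of the windows csv files is not assumed here.

```python
import random


def d_statistic(abba, baba):
    """Patterson's D from total ABBA and BABA site counts."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total


def bootstrap_d(windows, n_reps=1000, seed=0):
    """Resample windows with replacement and recompute D each time.

    `windows` is a list of (ABBA, BABA) counts, one pair per genomic
    window (a hypothetical input format chosen for this sketch).
    Returns the observed D and the list of bootstrap D values.
    """
    rng = random.Random(seed)
    observed = d_statistic(sum(a for a, _ in windows),
                           sum(b for _, b in windows))
    reps = []
    for _ in range(n_reps):
        sample = rng.choices(windows, k=len(windows))
        abba = sum(a for a, _ in sample)
        baba = sum(b for _, b in sample)
        if abba + baba > 0:  # skip resamples with no informative sites
            reps.append(d_statistic(abba, baba))
    return observed, reps


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from a list of bootstrap values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((alpha / 2) * last)], ordered[int((1 - alpha / 2) * last)]


# Toy per-window counts, invented purely for demonstration.
windows = [(12, 5), (8, 9), (15, 4), (10, 10), (7, 3)]
observed, reps = bootstrap_d(windows, n_reps=500, seed=1)
low, high = percentile_interval(reps)
print("D =", round(observed, 4), "bootstrap interval:", (round(low, 4), round(high, 4)))
```

Resampling whole windows rather than individual sites is a common way to respect linkage between nearby sites; whether the archived script resamples at the same scale should be checked against tomato_Dstat.py itself.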