When two diverging species begin hybridizing, selection against hybridization is likely driven not by single substitutions, but by interactions between incompatible mutations. To identify these incompatibilities in natural populations, researchers examine the extent of non-random associations between ancestry at physically unlinked loci in admixed populations. In this approach, which we call “AD scans”, locus-pairs with significantly positive “ancestry disequilibrium” (AD, i.e. locus-pairs that positively covary by ancestry) represent incompatible alleles. Past research has uniformly revealed an excess of locus-pairs with significantly positive AD, suggesting that dozens to hundreds of incompatibilities separate species. With forward simulations, we show that many realistic demographic scenarios, including recent and/or ongoing contact, generate a bias towards positive ancestry disequilibrium. We suggest steps that researchers can take to avoid pitfalls in interpreting AD scans, and present a novel measure of AD, which minimizes but does not fully eliminate bias in the AD distribution. We also show, by simulation, that the tail of the AD distribution is enriched for true incompatibilities. To illustrate the potential power and appropriate caution in interpretation of AD scans, we reanalyze previously published data from two admixed populations of Xiphophorus fishes. Our results imply that the prevalence of positive LD in admixed populations does not in itself support the idea that two-locus incompatibilities are widespread, but the co-enrichment of top AD hits across the two Xiphophorus populations supports the idea that AD scans can identify candidate interspecific incompatibilities.
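As a rough illustration of the kind of quantity an AD scan examines, the sketch below computes the correlation between ancestry at two loci across individuals. It assumes ancestry is coded as the number of alleles (0, 1, or 2) derived from one parental species. The function name `ancestry_correlation` and the toy data are hypothetical, and this is a plain Pearson correlation, not the novel AD measure described above.

```python
from math import sqrt

def ancestry_correlation(locus_a, locus_b):
    """Pearson correlation between ancestry calls at two loci.

    Each list holds one value per individual: the number of alleles
    (0, 1, or 2) inherited from parental species 1 (assumed coding).
    """
    if len(locus_a) != len(locus_b):
        raise ValueError("loci must be scored in the same individuals")
    n = len(locus_a)
    mean_a = sum(locus_a) / n
    mean_b = sum(locus_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(locus_a, locus_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in locus_a) / n
    var_b = sum((b - mean_b) ** 2 for b in locus_b) / n
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a locus is fixed for one ancestry
    return cov / sqrt(var_a * var_b)

# Toy data: six individuals scored at two unlinked loci (hypothetical values).
locus_1 = [0, 0, 1, 1, 2, 2]
locus_2 = [0, 1, 1, 1, 2, 2]
r = ancestry_correlation(locus_1, locus_2)
```

A positive value means that ancestry from the same parental species tends to co-occur at the two loci. As the abstract notes, a positive value alone is not evidence of an incompatibility.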
configuration files and admix'em input file for neutral simulations
Files required to run the neutral simulations: three admix'em configuration files, for the neutral hybrid swarm (admixsimul.cfg), the neutral hybrid swarm with a bottleneck (admixsimul_bottle.cfg), and the neutral hybrid swarm with migration (admixsimul_mig.cfg), plus the other admix'em input files called by these configuration files
neutral_simulations.tar
configuration files and admix'em input file for selection simulations
admix'em configuration files for simulations with epistatic selection under the hybrid swarm (admixsimul.cfg), bottleneck (admixsimul_bottle.cfg), and continuing migration (admixsimul_mig.cfg) models, as well as the admix'em input files called by these configuration files
selection_simulations.tar
simulation shell script
shell script to run admix'em simulations for a given configuration file, number of generations, and number of individuals. NOTE: assumes that admix'em is installed in the same directory and that its executable is called admixemp. Also assumes that the perl and sh scripts provided with this submission are available in the same directory. Usage: perl simulated_ld_shell_YB_MS.pl cfg_file out_folder_name num_iterations focal_gen num_indiv_sample
simulated_ld_shell_YB_MS.pl
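The argument order in the usage line can be sketched as follows. This is only an illustration: the helper `build_simulation_command`, the output folder name, and the numeric values are hypothetical, and the script is assumed to be invoked under its distributed file name, simulated_ld_shell_YB_MS.pl. The command is assembled and printed, not executed.

```python
import shlex

def build_simulation_command(cfg_file, out_folder, num_iterations,
                             focal_gen, num_indiv_sample,
                             script="simulated_ld_shell_YB_MS.pl"):
    """Return the argument list for one run of the simulation shell script.

    Positional order follows the usage line:
    cfg_file out_folder_name num_iterations focal_gen num_indiv_sample
    """
    return ["perl", script, cfg_file, out_folder,
            str(num_iterations), str(focal_gen), str(num_indiv_sample)]

# Hypothetical example values; substitute the settings for your own run.
cmd = build_simulation_command("admixsimul.cfg", "neutral_swarm_out", 100, 50, 200)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from the directory that holds admixemp and the accompanying perl and sh scripts.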
generate_msg_data_header_200locisim.pl
script to convert admix'em output to genotype format for the simulation parameters provided. This script is called by the simulation shell script: simulated_ld_shell_YB_MS.pl
hybrid_index.pl
script to calculate hybrid index from the genotype-format admix'em output for the 200 loci simulated. This script is called by the simulation shell script: simulated_ld_shell_YB_MS.pl
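For readers unfamiliar with the statistic, a minimal sketch of a hybrid index calculation follows. It is not a transcription of hybrid_index.pl. It assumes each genotype is coded as the number of alleles (0, 1, or 2) from parental species 1, with missing calls given as `None`, and it defines the hybrid index as the proportion of called alleles derived from that species.

```python
def hybrid_index(genotypes):
    """Proportion of an individual's called alleles that come from species 1.

    genotypes: one entry per locus, coded 0, 1, or 2 (assumed coding),
    or None where the genotype is missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")  # no informative loci for this individual
    # Each called diploid locus contributes two alleles to the denominator.
    return sum(called) / (2 * len(called))

# Hypothetical individual scored at five loci, one of them missing.
example = [2, 1, None, 0, 1]
index = hybrid_index(example)
```

Under this coding, an individual of pure species 1 ancestry has an index of 1, an individual of pure species 2 ancestry has an index of 0, and a first-generation hybrid has an index of 0.5.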
random_sample.sh
script to subsample the simulated individuals from the genotypes file to the desired number of individuals. This script is called by the simulation shell script: simulated_ld_shell_YB_MS.pl
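The subsampling step can be sketched as drawing individuals at random without replacement. The sketch below is an assumption about the general approach, not a transcription of random_sample.sh; the function name `subsample_individuals`, the one-row-per-individual layout, and the optional seed are hypothetical.

```python
import random

def subsample_individuals(rows, n, seed=None):
    """Draw n individuals without replacement from a list of genotype rows.

    rows: one entry per simulated individual (assumed layout).
    seed: optional integer that makes the draw repeatable.
    """
    if n > len(rows):
        raise ValueError("cannot sample more individuals than were simulated")
    rng = random.Random(seed)
    return rng.sample(rows, n)

# Hypothetical genotype rows: (individual id, genotypes at three loci).
simulated = [("ind%d" % i, [i % 3, (i + 1) % 3, (i + 2) % 3]) for i in range(10)]
subset = subsample_individuals(simulated, 4, seed=1)
```

Fixing the seed is a convenience for checking a pipeline; independent replicate simulations would normally use different draws.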
compressed neutral simulation files for reproducing figure 2
"neutral.zip" contains neutral simulation results files and scripts needed to reproduce figure 2 in the main text
neutral.zip
compressed selection simulation files for reproducing figure 3
"selection.zip" contains selection simulation results files and scripts needed to reproduce figure 3 in the main text
selection.zip