Speciation genomic studies aim to interpret patterns of genome-wide variation in light of the processes that give rise to new species. However, interpreting the genomic ‘landscape’ of speciation is difficult, because many evolutionary processes can impact levels of variation. Facilitated by the first chromosome-level assembly for the group, we use whole-genome sequencing and simulations to shed light on the processes that have shaped the genomic landscape during a radiation of monkeyflowers. After inferring the phylogenetic relationships among the nine taxa in this radiation, we show that highly similar diversity (π) and differentiation (FST) landscapes have emerged across the group. Variation in these landscapes was strongly predicted by the local density of functional elements and the recombination rate, suggesting that the landscapes have been shaped by widespread natural selection. Using the varying divergence times between pairs of taxa, we show that the correlations between FST and genome features arose almost immediately after a population split and have become stronger over time. Simulations of genomic landscape evolution suggest that background selection (i.e., selection against deleterious mutations) alone is too subtle to generate the observed patterns, but scenarios that involve positive selection and genetic incompatibilities are plausible alternative explanations. Finally, tests for introgression among these taxa reveal widespread evidence of heterogeneous selection against gene flow during this radiation. Combined with previous evidence for adaptation in this system, we conclude that the correlation in FST among these taxa informs us about the processes contributing to adaptation and speciation during a rapid radiation.
Genome-wide data in nonoverlapping 500kb windows
This file includes population genetic statistics, measures of genomic features, and estimates of phylogenetic concordance in 500kb nonoverlapping windows across the bush monkeyflower genome, which were used in analyses of genomic landscape evolution. Fst and dxy are included for 36 pairwise comparisons between taxa, and nucleotide diversity is included for all 9 taxa. These statistics were calculated using Python scripts downloaded from https://github.com/simonhmartin/genomics_general. PC1 Fst, PC1 dxy, and PC1 nucleotide diversity for each window were obtained by performing a PCA using the 36 comparisons (Fst or dxy) or 9 taxa (nucleotide diversity) as variables; each provides a summary of variation across taxa or taxon comparisons in the corresponding statistic. Gene count is obtained from the genome annotation, recombination rate (cM/Mb) is based on the genetic map, and tree concordance is obtained by taking the correlation coefficient between the window-based tree and the whole-genome ‘species tree.’
500kb_win_data_nonoverlap.txt
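As a rough illustration of how a PC1 summary of this kind can be derived, the sketch below runs a PCA on a windows-by-variables matrix (for example, 36 Fst columns) using NumPy. The standardisation step, the removal of windows with missing values, and the sign convention are assumptions made for this example and are not necessarily the procedure used to produce the PC1 columns in the file.

```python
import numpy as np

def pc1_scores(matrix):
    """Return (scores, explained, keep) for a windows x variables matrix.

    scores    : PC1 score for each retained window
    explained : fraction of total variance captured by PC1
    keep      : boolean mask of the windows that had no missing values
    """
    x = np.asarray(matrix, dtype=float)
    # Assumption: windows with any missing value are dropped.
    keep = ~np.isnan(x).any(axis=1)
    x = x[keep]
    # Assumption: each variable is centred and scaled to unit variance.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # PCA via singular value decomposition of the standardised matrix.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, 0] * s[0]
    explained = s[0] ** 2 / np.sum(s ** 2)
    # The sign of a principal component is arbitrary; orient PC1 so that
    # high scores correspond to high average values across variables.
    if np.corrcoef(scores, z.mean(axis=1))[0, 1] < 0:
        scores = -scores
    return scores, explained, keep

# Synthetic stand-in for 200 windows x 36 pairwise Fst comparisons.
rng = np.random.default_rng(0)
shared = rng.normal(size=200)
fst = 0.2 + 0.05 * shared[:, None] + rng.normal(scale=0.01, size=(200, 36))
scores, explained, keep = pc1_scores(fst)
```

The same function could be applied to the dxy or nucleotide diversity columns of either window file once they have been loaded into an array.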
Genome-wide data in nonoverlapping 100kb windows
This file includes population genetic statistics, measures of genomic features, and estimates of phylogenetic concordance in 100kb nonoverlapping windows across the bush monkeyflower genome, which were used in analyses of genomic landscape evolution. Fst and dxy are included for 36 pairwise comparisons between taxa, and nucleotide diversity is included for all 9 taxa. These statistics were calculated using Python scripts downloaded from https://github.com/simonhmartin/genomics_general. PC1 Fst, PC1 dxy, and PC1 nucleotide diversity for each window were obtained by performing a PCA using the 36 comparisons (Fst or dxy) or 9 taxa (nucleotide diversity) as variables; each provides a summary of variation across taxa or taxon comparisons in the corresponding statistic. Gene count is obtained from the genome annotation, recombination rate (cM/Mb) is based on the genetic map, and tree concordance is obtained by taking the correlation coefficient between the window-based tree and the whole-genome ‘species tree.’
100kb_win_data_nonoverlap.txt
Genome-wide fd statistic in 500kb windows
This file contains estimates of admixture (fd) calculated in 500kb non-overlapping windows across the genome, for 48 different four-taxon comparisons. Fd was calculated using a Python script downloaded from https://github.com/simonhmartin/genomics_general.
500kb_window.fd_statistic.txt
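For readers unfamiliar with fd, the sketch below shows the general form of the statistic for one window of a four-taxon comparison (P1, P2, P3, outgroup), computed from per-site derived allele frequencies. It is a minimal illustration of the usual definition, not the genomics_general script itself; the function names and the input layout (one frequency array per population) are assumptions made for this example.

```python
import numpy as np

def abba_minus_baba(p1, p2, p3, po):
    """Sum over sites of the ABBA minus BABA pattern probabilities."""
    abba = (1 - p1) * p2 * p3 * (1 - po)
    baba = p1 * (1 - p2) * p3 * (1 - po)
    return np.sum(abba - baba)

def fd_statistic(p1, p2, p3, po):
    """fd for one window, from per-site derived allele frequencies."""
    p1, p2, p3, po = (np.asarray(a, dtype=float) for a in (p1, p2, p3, po))
    # The denominator is the value expected under complete admixture:
    # at each site, P2 and P3 are both replaced by whichever of the two
    # has the higher derived allele frequency.
    p_donor = np.maximum(p2, p3)
    numerator = abba_minus_baba(p1, p2, p3, po)
    denominator = abba_minus_baba(p1, p_donor, p_donor, po)
    if denominator == 0:
        return float("nan")  # no informative sites in this window
    return numerator / denominator

# Two hypothetical sites; the outgroup is fixed for the ancestral allele.
example_fd = fd_statistic(p1=[0.0, 0.0], p2=[1.0, 0.5],
                          p3=[1.0, 1.0], po=[0.0, 0.0])
```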
Genotypes file for genetic map
This file is the input data file used for map construction, in JoinMap format (.loc), produced by the program Stacks 1.3.5 assuming a CP cross design. The first column gives the locus ID assigned to each marker by Stacks 1.3.5. The next column gives the segregation type code for each marker using the JoinMap 4 convention. Each subsequent column provides the genotypes for an individual for all 9029 markers. Missing data are coded as “--”. The ID for each individual is given as a list in the first column underneath the last locus ID.
batch_1.genotypes_250.loc
Genetic map
This file contains the full genetic map used to estimate recombination rates and scaffold the genome. ‘LG’ is the linkage group identifier, which ranges from 1 to 10. The stacks_id field contains the locus ID allocated to each marker by the program Stacks 1.3.5 (Catchen et al. 2013). bp is the base-pair position of the marker within the chromosome-scale M. aurantiacus assembly. Contig is the assembly contig (scaffold) that each marker is associated with. cM gives the sex-averaged map position estimated for each marker.
Genetic_map.txt
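One simple way to turn the bp and cM fields of a map like this into a windowed recombination rate (cM/Mb) is to interpolate the map position at each window boundary and divide the difference by the window length. The sketch below does this for a single linkage group. The interpolation approach, the function name, and the handling of out-of-order markers are assumptions for illustration and may differ from the method used to produce the recombination rates in the window files.

```python
import numpy as np

def window_recomb_rate(bp, cm, window_size=500_000):
    """Estimate cM/Mb in fixed windows for one linkage group.

    bp, cm : marker positions in base pairs and centimorgans.
    Returns (window_starts, rates).
    """
    bp = np.asarray(bp, dtype=float)
    cm = np.asarray(cm, dtype=float)
    order = np.argsort(bp)
    bp, cm = bp[order], cm[order]
    # Assumption: force the map to be non-decreasing along the chromosome
    # so that locally misordered markers cannot yield negative rates.
    cm = np.maximum.accumulate(cm)
    edges = np.arange(0, bp[-1] + window_size, window_size)
    # Linear interpolation of map position at window boundaries. Outside
    # the first and last markers np.interp holds the end value constant,
    # so rates there are underestimated.
    cm_at_edges = np.interp(edges, bp, cm)
    rates = np.diff(cm_at_edges) / (window_size / 1e6)
    return edges[:-1], rates

# Three hypothetical markers on one linkage group.
starts, rates = window_recomb_rate(bp=[0, 1_000_000, 2_000_000],
                                   cm=[0.0, 2.0, 10.0])
```

In practice the function would be called once per linkage group, after grouping the rows of the map by the LG field.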
Genomic location of mapped markers
This file, which is in standard SAM format, contains the genomic position of each of the mapped markers. The ID for each locus is the ID allocated to each marker by the program Stacks 1.3.5. The sequence in the SEQ field is the consensus tag sequence for each marker, exported from Stacks 1.3.5. Mapping was performed with Bowtie 2 (v2.2.6).
Mapped_markers_to_genome.sam
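Because SAM is a tab-delimited text format, marker positions can be pulled out without specialised tools. The following minimal sketch reads the standard mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ) and returns a dictionary keyed by locus ID. The function name and the optional mapping-quality filter are assumptions for this example, and it presumes each locus ID appears on a single alignment line.

```python
def read_marker_positions(lines, min_mapq=0):
    """Map locus ID -> (reference name, 1-based position, strand)."""
    positions = {}
    for line in lines:
        # Skip header lines (which start with '@') and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, mapq = int(fields[3]), int(fields[4])
        if flag & 0x4:          # 0x4 marks an unmapped read
            continue
        if mapq < min_mapq:     # hypothetical quality filter
            continue
        strand = "-" if flag & 0x10 else "+"   # 0x10 marks reverse strand
        positions[qname] = (rname, pos, strand)
    return positions

# Hypothetical records; real input would come from
# open("Mapped_markers_to_genome.sam").
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "101\t0\tLG1\t1500\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "102\t16\tLG2\t300\t10\t5M\t*\t0\t0\tTTGCA\tIIIII",
    "103\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII",
]
marker_positions = read_marker_positions(example_sam)
```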
Tree topologies in 500kb non-overlapping windows
This file contains ML trees estimated by RAxML using MVFtools in 500kb non-overlapping windows. These were used to calculate the estimate of tree concordance based on the correlation with the species tree topology.
500kb_win_trees.txt
Tree topologies in 100kb non-overlapping windows
This file contains ML trees estimated by RAxML using MVFtools in 100kb non-overlapping windows. These were used to calculate the estimate of tree concordance based on the correlation with the species tree topology.
100kb_win_trees.txt
Genome-wide variant calls
This file contains the genome-wide variant calls (SNPs) for all 37 individuals included in the study. Variants were called with GATK v3.8 using UnifiedGenotyper, following the Best Practices workflow.
all_9_taxa_G1_vars.vcf.gz
Genome-wide VCF including invariant sites
This VCF file includes genotype calls for all 37 individuals included in the study at both variant and invariant sites. This file was generated using GATK v3.8 UnifiedGenotyper by including the EMIT_ALL_SITES option, and was used to more accurately estimate dxy and nucleotide diversity in genomic windows.
all_9_taxa.postBQSR.all_sites.vcf.gz
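The reason invariant sites matter is that per-window diversity is an average over all sites with genotype data, not only the variable ones. The sketch below illustrates this for nucleotide diversity: it computes the per-site proportion of pairwise differences from the GT fields of VCF data lines and averages over every site with at least two called alleles. It is a simplified illustration rather than the script used in the study; the function names are hypothetical, and it assumes all samples on each line belong to a single taxon and that no additional site or genotype filters are needed.

```python
from collections import Counter

def site_pi(alleles):
    """Proportion of pairwise differences among called alleles at a site."""
    n = len(alleles)
    if n < 2:
        return None  # too little data to compare any pair
    counts = Counter(alleles)
    identical_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - identical_pairs / (n * (n - 1))

def window_pi(vcf_lines):
    """Return (mean per-site pi, number of sites with data)."""
    total, n_sites = 0.0, 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")
        alleles = []
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index].replace("|", "/")
            alleles.extend(a for a in genotype.split("/") if a != ".")
        pi = site_pi(alleles)
        if pi is None:
            continue  # site with missing data is excluded from the denominator
        total += pi
        n_sites += 1
    return (total / n_sites if n_sites else float("nan")), n_sites

# Four hypothetical sites for two diploid samples: one SNP, two callable
# invariant sites, and one site with no genotype calls.
example_vcf = [
    "chr1\t1\t.\tA\t.\t50\t.\t.\tGT\t0/0\t0/0",
    "chr1\t2\t.\tA\tG\t50\t.\t.\tGT\t0/1\t0/0",
    "chr1\t3\t.\tC\t.\t50\t.\t.\tGT\t./.\t./.",
    "chr1\t4\t.\tT\t.\t50\t.\t.\tGT\t0/0\t./.",
]
pi_estimate, sites_with_data = window_pi(example_vcf)
```

In this toy example the denominator is three sites rather than four, because the site without genotype calls contributes no information. A variants-only file cannot make that distinction between invariant and uncalled positions.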