The process of ecotype formation has been invoked as an important driver of postglacial biodiversity, because many species colonized heterogeneous habitats and subsequently experienced divergent selection. Ecotype formation has been predominantly studied in outcrossing taxa, while far less attention has been paid to the implications of mating system shifts. Here we studied the genomic footprint of ecotype formation in Arabidopsis lyrata subsp. lyrata. The species colonized both rocky and sandy substrates during its postglacial range expansion, while it also shifted the mating system from predominantly outcrossing to predominantly selfing in a number of regions. We performed an association study on pooled whole-genome re-sequence data of 20 populations, which suggested genes and gene ontology terms related to substrate adaptation. We validated results by comparing root growth between plants from the two substrates in a common environment and found that plants originating from sand – independent of mating system – grew roots faster and produced more side-roots, potentially as a response to water limitation in the wild. Furthermore, we found single nucleotide polymorphisms associated with substrate-related ecotypes to be more clustered among selfing populations, presumably due to higher genome-wide linkage disequilibrium. Overall we show that a shift to selfing could initially facilitate ecotype formation linked to substrate, likely because selfing reduces effective recombination.
LD estimate per population using LDx
Linkage disequilibrium (LD) estimates for 20 populations, calculated from pooled sequencing (Pool-seq) data with the program LDx (https://sourceforge.net/p/ldx/wiki/Home/).
The program is described here:
Feder AF, Petrov DA, Bergland AO (2012) LDx: Estimation of Linkage Disequilibrium from High-Throughput Pooled Resequencing Data. PLOS ONE 7(11): e48588. https://doi.org/10.1371/journal.pone.0048588
The output follows the LDx data format. The columns are:
1) Location of SNP1
2) Location of SNP2
3) Number of pairs observed with x_11
4) Number of pairs observed with x_12
5) Number of pairs observed with x_21
6) Number of pairs observed with x_22
7) Estimate for allele frequency of allele A
8) Estimate for allele frequency of allele B
9) Read depth for SNP1
10) Read depth for SNP2
11) Intersecting read depth
12) Approx MLE R2 (low end of interval)
13) Approx MLE R2 estimate
14) Approx MLE R2 (high end of interval)
15) Direct computation R2
16) allele A
17) allele a
18) allele B
19) allele b
LDx was run for each population separately with the following settings:
perl lds.pl -l 100 -h 500 -q 28 -i 5 -a 0.15
LD_estimates_per_population_using_ldx.zip
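The 19-column layout listed above can be read with a few lines of Python. The following is a minimal sketch, not part of the original pipeline: it assumes the output lines are whitespace-delimited without a header, the column labels in LDX_COLUMNS are names invented here for convenience, and x_11 is assumed to count read pairs carrying allele A at SNP1 together with allele B at SNP2. The helper r2_from_counts applies the textbook definition r2 = D^2 / (pA (1 - pA) pB (1 - pB)) to the four pair counts; it may differ in detail from the value LDx reports in column 15.

```python
# Labels chosen for this sketch; they follow the order of the 19 columns listed above.
LDX_COLUMNS = [
    "pos_snp1", "pos_snp2", "n_x11", "n_x12", "n_x21", "n_x22",
    "freq_A", "freq_B", "depth_snp1", "depth_snp2", "depth_intersect",
    "mle_r2_low", "mle_r2", "mle_r2_high", "direct_r2",
    "allele_A", "allele_a", "allele_B", "allele_b",
]


def to_number(text):
    """Return text as int or float; return None if it is not numeric."""
    for caster in (int, float):
        try:
            return caster(text)
        except ValueError:
            continue
    return None


def parse_ldx_line(line):
    """Turn one output line into a dict (whitespace delimiter is an assumption)."""
    fields = line.split()
    if len(fields) != len(LDX_COLUMNS):
        raise ValueError(f"expected {len(LDX_COLUMNS)} fields, got {len(fields)}")
    record = dict(zip(LDX_COLUMNS, fields))
    # Columns 1-15 are numeric; columns 16-19 are allele labels and stay strings.
    for name in LDX_COLUMNS[:15]:
        record[name] = to_number(record[name])
    return record


def r2_from_counts(n11, n12, n21, n22):
    """Textbook r2 from the four pair counts; None if either SNP is monomorphic."""
    total = n11 + n12 + n21 + n22
    if total == 0:
        return None
    p_a = (n11 + n12) / total  # frequency of allele A among the observed pairs
    p_b = (n11 + n21) / total  # frequency of allele B among the observed pairs
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = n11 / total - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / denom
```

For example, parse_ldx_line can be applied to every line of a per-population file, and r2_from_counts(record["n_x11"], record["n_x12"], record["n_x21"], record["n_x22"]) gives a quick sanity check against the reported estimates.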
SNP genotype (VCF) files
Contains three genotype (VCF) files, built from all twelve outcrossing populations, all eight selfing populations, or all twenty populations together. Each VCF file was filtered as described in the paper. All genotypes were used for the downstream GWAS analysis with the software BayPass.
Genotypes_VCFs.zip
Location of outlier SNPs for each GWAS analysis
Contains the locations of the top 1% outlier SNPs, ranked by Bayes factors averaged across ten independent GWAS analyses in the program BayPass. Five files are provided: i) outliers using only outcrossing populations (outcrossing_only_outliers.txt), ii) outliers using only selfing populations (selfing_only_outliers.txt), iii) outliers using all 20 populations together (selfing_and_outcrossing_combined_outliers_using_all_populations.txt), iv) outliers using the combined dataset but only analyzing outcrossing populations (selfing_and_outcrossing_combined_outliers_using_only_outcrossing_populations.txt), v) outliers using the combined dataset but only analyzing selfing populations (selfing_and_outcrossing_combined_outliers_using_only_selfing_populations.txt). Each file has two columns: the first is the scaffold, the second is the SNP position.
Outlier_SNPs.zip
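The outlier selection described above (Bayes factors averaged across runs, then the top 1% retained) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the arithmetic mean, the rounding of the 1% cutoff, the handling of ties, and the function name top_outliers are all choices made here. SNPs are identified by (scaffold, position) tuples to match the two-column layout of the outlier files.

```python
def top_outliers(bf_runs, snp_ids, fraction=0.01):
    """Average Bayes factors across runs and return the top fraction of SNPs.

    bf_runs: list of runs, each a list of Bayes factors in the same SNP order.
    snp_ids: list of (scaffold, position) tuples, one per SNP.
    Returns a list of ((scaffold, position), mean_bf), highest mean first.
    """
    n_snps = len(snp_ids)
    if not bf_runs:
        raise ValueError("at least one run is required")
    for run in bf_runs:
        if len(run) != n_snps:
            raise ValueError("every run must have one Bayes factor per SNP")
    # Arithmetic mean per SNP across runs (an assumption about the averaging).
    mean_bf = [sum(run[i] for run in bf_runs) / len(bf_runs) for i in range(n_snps)]
    # Keep at least one SNP; the rounding rule for the cutoff is an assumption.
    n_keep = max(1, round(fraction * n_snps))
    ranked = sorted(range(n_snps), key=lambda i: mean_bf[i], reverse=True)
    return [(snp_ids[i], mean_bf[i]) for i in ranked[:n_keep]]


def format_outliers(outliers):
    """Render outliers as 'scaffold<TAB>position' lines (delimiter is an assumption)."""
    return "\n".join(f"{scaffold}\t{position}" for (scaffold, position), _ in outliers)
```

With ten runs of BayPass output loaded into bf_runs, top_outliers(bf_runs, snp_ids) returns the SNPs whose mean Bayes factor falls in the top 1%, and format_outliers produces text in the same two-column shape as the files in the archive.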
Root Morphology Data
Excel sheet containing the experimental data from a growth experiment on agar plates, using individuals that originate from a rock or sand substrate. For each individual, root growth was measured and the number of primary side roots was counted through time. The columns describe the following:
1. Substrate where wild type individuals were collected (rock or sand)
2. ID for the replicate run [1 or 2]
3. Agar plate ID [A = Replicate 1, B = Replicate 2]
4. Seed Family
5. Population of origin
6. Relative position on agar plate [from left to right]
7. Consecutive number for each measurement for each individual
8. Individual seed ID [Population + Seed Family + Replicate]
9. Population & seed family
10. Mating system [O - Outcrossing, S - Selfing]
11. Phylogenetic Cluster [E - East; W - West]
12. Date measurement was taken in 2017
13. Days since germination
14. Days since germination
15. Number of primary side roots
Root_morphology_data.xlsx
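A simple way to summarize the repeated measurements is to take each individual's last observation and average the number of primary side roots by substrate. The sketch below assumes the sheet has already been loaded into a list of dictionaries; the keys "individual", "substrate", "days" and "side_roots" are hypothetical names standing in for columns 8, 1, 13 and 15. It is a descriptive summary only and is not the statistical analysis used in the paper.

```python
from collections import defaultdict


def mean_final_side_roots(rows):
    """Mean number of primary side roots at each individual's last time point.

    rows: iterable of dicts with the hypothetical keys
          "individual", "substrate", "days", "side_roots".
    Returns a dict mapping substrate to the mean final side-root count.
    """
    last = {}  # individual -> (days, substrate, side_roots) of latest measurement
    for row in rows:
        key = row["individual"]
        if key not in last or row["days"] > last[key][0]:
            last[key] = (row["days"], row["substrate"], row["side_roots"])

    by_substrate = defaultdict(list)
    for _, substrate, side_roots in last.values():
        by_substrate[substrate].append(side_roots)

    return {s: sum(counts) / len(counts) for s, counts in by_substrate.items()}
```

The same grouping could be extended to mating system (column 10) or phylogenetic cluster (column 11) by adding those fields to the grouping key.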
Population IDs
File providing the population IDs used in the paper (ID) and the IDs used for the data files (Data ID). The Data ID is consistent with the one used for the genomic data (BioProject: PRJEB19338).
population_IDs.txt