Data from: Postglacial ecotype formation under outcrossing and self-fertilization in Arabidopsis lyrata

Lucek, Kay; Hohmann, Nora; Willi, Yvonne

Published Jan 31, 2019 on Dryad. https://doi.org/10.5061/dryad.7501v9s

Data files

Jan 31, 2019 version files, 307.75 MB in total:

Genotypes_VCFs.zip (166.48 MB)
LD_estimates_per_population_using_ldx.zip (141.05 MB)
Outlier_SNPs.zip (57.76 KB)
population_IDs.txt (175 B)
Root_morphology_data.xlsx (153.41 KB)

Abstract

The process of ecotype formation has been invoked as an important driver of postglacial biodiversity, because many species colonized heterogeneous habitats and subsequently experienced divergent selection. Ecotype formation has been studied predominantly in outcrossing taxa, while far less attention has been paid to the implications of mating system shifts. Here we studied the genomic footprint of ecotype formation in Arabidopsis lyrata subsp. lyrata. During its postglacial range expansion, the species colonized both rocky and sandy substrates, and in a number of regions it also shifted its mating system from predominantly outcrossing to predominantly selfing. We performed an association study on pooled whole-genome resequencing data of 20 populations, which suggested genes and gene ontology terms related to substrate adaptation. We validated the results by comparing root growth between plants from the two substrates in a common environment and found that plants originating from sand, independent of mating system, grew roots faster and produced more side roots, potentially as a response to water limitation in the wild. Furthermore, we found single nucleotide polymorphisms associated with substrate-related ecotypes to be more clustered among selfing populations, presumably due to higher genome-wide linkage disequilibrium. Overall, we show that a shift to selfing could initially facilitate ecotype formation linked to substrate, likely because selfing reduces effective recombination.

LD estimates per population using LDx

LD estimates based on Pool-seq data for 20 populations, computed with the program LDx (https://sourceforge.net/p/ldx/wiki/Home/). The program is described in: Feder AF, Petrov DA, Bergland AO (2012) LDx: Estimation of Linkage Disequilibrium from High-Throughput Pooled Resequencing Data. PLOS ONE 7(11): e48588. https://doi.org/10.1371/journal.pone.0048588

The output follows the LDx data format, with the following columns:
1) Location of SNP1
2) Location of SNP2
3) Number of pairs observed with x_11
4) Number of pairs observed with x_12
5) Number of pairs observed with x_21
6) Number of pairs observed with x_22
7) Estimated frequency of allele A
8) Estimated frequency of allele B
9) Read depth for SNP1
10) Read depth for SNP2
11) Intersecting read depth
12) Approx. MLE of R2 (low end of interval)
13) Approx. MLE of R2 (estimate)
14) Approx. MLE of R2 (high end of interval)
15) Direct computation of R2
16) Allele A
17) Allele a
18) Allele B
19) Allele b

LDx was run separately for each population with the following settings:
perl lds.pl -l 100 -h 500 -q 28 -i 5 -a 0.15

LD_estimates_per_population_using_ldx.zip
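As an illustration of the 19-column layout above, the following is a minimal Python sketch for reading one population's LDx output. It assumes whitespace-delimited lines without a header and maps non-numeric fields (e.g. "NA") to NaN; neither assumption is stated in this description. `direct_r2` is the standard r² computed from a 2x2 table of haplotype counts (x_11, x_12, x_21, x_22). That it reproduces column 15 exactly is an assumption, not something verified against LDx. All function and class names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LDxPair:
    pos1: int          # column 1: location of SNP1
    pos2: int          # column 2: location of SNP2
    counts: tuple      # columns 3-6: x_11, x_12, x_21, x_22
    depth_both: float  # column 11: intersecting read depth
    r2_mle: float      # column 13: approximate MLE of R2


def _num(token: str) -> float:
    """Convert a field to float; non-numeric tokens (assumed, e.g. 'NA') become NaN."""
    try:
        return float(token)
    except ValueError:
        return math.nan


def parse_ldx_line(line: str) -> LDxPair:
    """Parse one whitespace-delimited line holding the 19 LDx columns."""
    f = line.split()
    if len(f) != 19:
        raise ValueError(f"expected 19 columns, got {len(f)}")
    counts = tuple(_num(x) for x in f[2:6])
    return LDxPair(int(f[0]), int(f[1]), counts, _num(f[10]), _num(f[12]))


def direct_r2(x11: float, x12: float, x21: float, x22: float) -> float:
    """Standard r^2 from a 2x2 table of haplotype counts."""
    denom = (x11 + x12) * (x21 + x22) * (x11 + x21) * (x12 + x22)
    if denom == 0:
        return math.nan  # undefined when one allele is absent from the spanning reads
    return (x11 * x22 - x12 * x21) ** 2 / denom


def mean_r2_within(pairs, max_distance: int) -> float:
    """Mean MLE R2 over SNP pairs at most max_distance bp apart, skipping NaNs."""
    vals = [p.r2_mle for p in pairs
            if abs(p.pos2 - p.pos1) <= max_distance and not math.isnan(p.r2_mle)]
    return sum(vals) / len(vals) if vals else math.nan
```

For example, `with open(path) as fh: pairs = [parse_ldx_line(l) for l in fh if l.strip()]`, where `path` points to one of the per-population files in the archive. The file names inside the zip are not listed in this description.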

SNP genotype (VCF) files

Contains three genotype (VCF) files, based on all twelve outcrossing populations, all eight selfing populations, or all twenty populations combined. Each VCF file was filtered as described in the paper. All genotypes were used for the downstream GWAS analyses in the software BayPass.

Genotypes_VCFs.zip
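A quick way to check the contents of one of these VCF files is to list its sample columns and count the SNPs per scaffold. The sketch below uses only the Python standard library and relies solely on the generic VCF layout (meta lines starting with `##`, a `#CHROM` header with samples from the tenth column on). It assumes, but does not confirm, that each pooled population appears as one sample column. The function names and the file name in the usage note are hypothetical.

```python
import gzip
from collections import Counter


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_vcf(lines):
    """Return (sample names, number of records per CHROM) from VCF lines."""
    samples = []
    per_chrom = Counter()
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Sample columns follow the nine fixed columns (CHROM ... FORMAT).
            samples = line.rstrip("\n").split("\t")[9:]
            continue
        if line.strip():
            per_chrom[line.split("\t", 1)[0]] += 1
    return samples, per_chrom
```

For example, `with open_vcf("all_populations.vcf") as fh: samples, counts = summarize_vcf(fh)`, where the file name stands in for whichever VCF is extracted from the archive.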

Location of outlier SNPs for each GWAS analysis

Contains the locations of the top 1% outlier SNPs, ranked by Bayes factors averaged across ten independent GWAS analyses in the program BayPass. Five files are provided:
i) outliers using only outcrossing populations (outcrossing_only_outliers.txt)
ii) outliers using only selfing populations (selfing_only_outliers.txt)
iii) outliers using all 20 populations together (selfing_and_outcrossing_combined_outliers_using_all_populations.txt)
iv) outliers using the combined dataset but analyzing only outcrossing populations (selfing_and_outcrossing_combined_outliers_using_only_outcrossing_populations.txt)
v) outliers using the combined dataset but analyzing only selfing populations (selfing_and_outcrossing_combined_outliers_using_only_selfing_populations.txt)
Each file has two columns: the first is the scaffold, the second is the SNP position.

Outlier_SNPs.zip
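The two-column outlier files can be compared directly as sets of (scaffold, position) pairs. The Python sketch below reads such a file, assuming whitespace-delimited columns and skipping any line whose position is not an integer, such as a possible header. It also illustrates the "top 1% by averaged Bayes factor" selection on a hypothetical input layout: a dictionary mapping each SNP to its Bayes factors from the independent runs. This is a generic re-implementation for illustration, not BayPass's own procedure, and the exact cutoff handling used in the study may differ.

```python
def read_outliers(lines):
    """Collect (scaffold, position) pairs from a two-column outlier file.

    Lines whose second field is not an integer (e.g. a header) are skipped.
    """
    snps = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        snps.add((fields[0], int(fields[1])))
    return snps


def top_fraction_by_mean_bf(bf_runs, fraction=0.01):
    """Rank SNPs by Bayes factor averaged over runs and keep the top fraction.

    bf_runs: hypothetical dict mapping (scaffold, position) -> list of Bayes
    factors, one value per independent run.
    """
    means = {snp: sum(v) / len(v) for snp, v in bf_runs.items()}
    n_keep = max(1, int(len(means) * fraction))  # keep at least one SNP
    ranked = sorted(means, key=means.get, reverse=True)
    return set(ranked[:n_keep])
```

For example, the outliers shared between the separate outcrossing and selfing analyses are the intersection `read_outliers(a) & read_outliers(b)`, where `a` and `b` are open handles to outcrossing_only_outliers.txt and selfing_only_outliers.txt.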

Root Morphology Data

Excel sheet containing the data from a growth experiment on agar plates, using individuals that originate from a rock or sand substrate. For each individual, root growth and the number of primary side roots were recorded over time. The columns are:
1. Substrate where wild-type individuals were collected (rock or sand)
2. ID of the replicate run [1 or 2]
3. Agar plate ID [A = Replicate 1, B = Replicate 2]
4. Seed family
5. Population of origin
6. Relative position on the agar plate [from left to right]
7. Consecutive measurement number for each individual
8. Individual seed ID [Population + Seed Family + Replicate]
9. Population & seed family
10. Mating system [O = Outcrossing, S = Selfing]
11. Phylogenetic cluster [E = East, W = West]
12. Date the measurement was taken (2017)
13. Days since germination
14. Days since germination
15. Number of primary side roots

Root_morphology_data.xlsx
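As an example of working with this layout, the sketch below computes the mean number of primary side roots at each individual's final measurement, grouped by substrate and mating system. It uses the column numbers listed above and assumes that rows are already loaded in that column order, for instance via `pandas.read_excel("Root_morphology_data.xlsx").itertuples(index=False)`. It further assumes that column 8 uniquely identifies an individual and that column 7 is numeric. The function and constant names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# 0-based indices corresponding to the 1-based column numbers listed above.
SUBSTRATE, MEASUREMENT, INDIVIDUAL, MATING, SIDE_ROOTS = 0, 6, 7, 9, 14


def final_side_roots_by_group(rows):
    """Mean side-root count at each individual's last measurement,
    grouped by (substrate, mating system)."""
    last = {}  # individual seed ID -> (measurement number, row)
    for row in rows:
        ind = row[INDIVIDUAL]
        if ind not in last or row[MEASUREMENT] > last[ind][0]:
            last[ind] = (row[MEASUREMENT], row)
    groups = defaultdict(list)
    for _, row in last.values():
        groups[(row[SUBSTRATE], row[MATING])].append(row[SIDE_ROOTS])
    return {key: mean(vals) for key, vals in groups.items()}
```

Taking only the last measurement per individual avoids counting the same plant several times. Analyzing growth over time would instead need the full series per individual.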

Population IDs

File listing the population IDs used in the paper (ID) and the corresponding IDs used in the data files (Data ID). The Data ID is consistent with the one used for the genomic data (BioProject: PRJEB19338).

population_IDs.txt
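To translate Data IDs in the genomic files back to the population IDs used in the paper, the mapping file can be read into a dictionary. This minimal sketch assumes a tab-delimited file with a header row, followed by rows of (paper ID, Data ID) in that order. The delimiter, header, and column order are not stated above, so adjust them to the actual file. The function name and the example IDs in the test are hypothetical.

```python
import csv


def read_population_ids(lines, delimiter="\t"):
    """Map Data ID (used in the data files) -> population ID used in the paper.

    Assumes a header row followed by rows of: paper ID, Data ID.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the assumed header row
    mapping = {}
    for row in reader:
        if len(row) >= 2:
            mapping[row[1].strip()] = row[0].strip()
    return mapping
```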