Soil and herbivory alters genomic adaptation to bumblebee pollination

Figueira, Tyler 1 ; Frachon, Lea2; Schiestl, Florian 1

Published Jul 10, 2025 on Dryad. https://doi.org/10.5061/dryad.280gb5n0x

Data files

Jul 10, 2025 version files 41.45 GB

archive.tar.gz

41.45 GB
README.md

11.42 KB

Abstract

Adaptation to environmental heterogeneity is a key driver of between-population genomic differentiation, yet we know very little about how genomic divergence is affected by adaptation to multiple ecological factors. Using an experimental evolution approach, we whole-genome resequenced ‘fast-cycling’ Brassica rapa plants which evolved in response to eight generations of selection from different combinations of soil type, herbivore presence (with aphid herbivory or without herbivory), and pollination mode (hand or bumblebee pollination). Our results show that bumblebee pollination was the strongest driver of genomic divergence and that the degree of genomic divergence was strongly associated with the number of SNP markers identified in genomic selection scans. Furthermore, we found that the number of SNPs under selection was affected by herbivory in a soil-dependent way. More specifically, aphid herbivory was associated with an increased number of selected SNPs for bee-pollinated plants that evolved in tuff soil, but was associated with a decreased number of selected SNPs for bee-pollinated plants that evolved in the more resource-limiting limestone soil. We also found that a higher number of selected SNPs was associated with higher rates of phenotypic evolution for 27 phenotypic traits including morphology and scent. Finally, we found that variation in pleiotropy between treatments was related to both the degree of genomic divergence and the number of SNPs under selection. Our results demonstrate that different soil types promote unique adaptive genomic architectures in response to biotic interactions, thus increasing genomic divergence between plant populations.

We used genomic and phenotypic data to better understand how ecological factors affect genomic divergence between three factors: soil, herbivory, and pollination.

The genomic sequences analyzed in this article are stored and accessible through the National Center for Biotechnology Information (NCBI) under https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1105729.

R scripts and metadata are stored on dryad: https://doi.org/10.5061/dryad.280gb5n0x

Description of the data and file structure

These scripts will recreate the analyses shown in the publication. The heading numbers are based on the material methods section. Within the folders, scripts are structured numerically. Thus, 1_ is first, 2_ is second, and so on. You may need to change the working directory to rerun the scripts.

The archive.tar.gz contains a combination of raw and metadata. Please see PLINK, EMMAX, and the manuscript or supplement for further documentation.

Treatment info

Soil: L = limestone soil
T = tuff soil
Herbivory: H = herbivory
NH = no-herbivory
Bee / Pollination: B = bee pollination
H = hand pollination

Headers:

2.4_pheno_evo - Creates phenotypic trait evolution analyses

file: emmax_pheno_data.tsv

Purpose: to run phenotypic genomic associations
comments on missing data: volatile data that is NA is because there was not enough volatile to register after the threshold, or because there were not enough flowers to measure volatiles. Other data that is missing is because plants did not produce traits on the day of measurement

id column description: ind_id = plant id,
morphological column descriptions: heightd22 and heightd34 = plant height at day 22 and 34 after sowing in cm,
leaf-size = (leaf width x leaf length)/2 in cm,
flowering time = days after sowing for first flower,
branches length = average length of branches in cm,
nbbranches = number of branches,
flowersperdaysincefloweringtime = number of flowers per day after starting flowering,
flowerdiameter = flower diameter in cm,
petal-length and petal-width = average measured in cm, sepale-length and style and stamen = measured in cm
floral volatiles column descriptions: all volatiles are measured in pg/flower/hour and natural log transformed

file: haldane_full_rep_soil_sep.tsv

purpose: to run model assessing haldane
column descriptions: trait = the trait for the haldane being calculated, haldane = the haldane calculated generation 10 - generation 1,
treatment = which treatment, rep = for which replicate

file: n_snps_clear_cmh_gwas_haldane_full_rep_soil_sep.ts

purpose: to run model assessing association between haldane and number of selected snps
column descriptions: trait = the trait for the haldane being calculated, avg_haldane = average haldane per treatment,
treatment = which treatment, n_snps = the number of selected snps identified in selection scans

file: shared_cmh_af_all_treatments_traits.tsv

purpose: shows the allele frequency, pvalue for snps, and associations with traits
column descriptions: chr = chromosome, pos = position on chromosome, cmh_fdr = fdr corrected pvalue of cmh tests,
G10a_af = the allele frequency of the last generation for the allele 1, G1a_af = the allele frequency of the first generation for the allele 1,
id = chromosome and pos combined or snp id, trait = phenotypic trait,
treatment = which treatment, af_diff = difference in allele frequency between generations 10 and 1,
trans_af_diff = transformed af_diff with arcsine square root

2.5.1_fst_pair - Fst files to run pairwise fst analyses

file: example : RA_G1_RB_G1_pop.windowed.weir.fst

File is pairwise fst between two populations. RA and RB are replicates (A or B). After the underscore is the population.
In this example, it is between both generation 1 populations.
column descriptions: CHROM = chromosome, BIN_START = start of 5000 bp windows, BIN_END = end of 5000 bp windows,
N_VARIANTS = number of snps within window, WEIGHTED_FST = mean fst weighted by number of variants,
MEAN_FST = mean fst calculated for window

file: example : RA_LLNHB_rep_pop

rep_pop files contain which plants are in the population and used for fst analyses

file: example : RA_fst_full_data.tsv

It is the aggregation of fst results for that replicate (A or B)

file: total_rep_sep_fst_full_data.tsv

It is the aggregation of the two full_data files above

file: fst_between_g1_and_all_treat.tsv

It has fst between g1 and that treatment for each replicate, and averaged over replicate (m column)
It is the aggregation of fst results for that replicate (A or B)

2.5.2 Create Fst model, and 1000 permutations

file: fst_rrpp_perm_boxcox_g10_only_trans.tsv

contains model results (F values) for each fixed effect and random effect

2.5.3 Runs selection scans for CMH tests and CLEAR

file: example : dr_RA_G1.frq

contains allele frequency for the population. Here is it is replicate RA (A) and population generation 1.
column descriptions: CHROM = chromosome, POS = bp position on chromosome,
N_ALLELES = number of alleles, N_CHR = number of chromosomes (copies essentially),
{ALLELE:FREQ} = The base and allele frequency of both alleles 1 and 2

file: clear_and_cmh_all_treat_pruned.tsv

contains cmh_fdr and CLEAR s to run model and assess significance for each SNP for each treatment
column descriptions: fdr = fdr adjusted pvalue, s = selection coefficient from CLEAR,
sig = if both fdr and s are above threshold used to assess significance

file: model_both_clear_cmh_num_snps_soil_sep.tsv

model results from above file with mean model estimate, lower and upper confidence intervals and actual number of snps. Treatment factors are listed
column descriptions: mean = mean model estimate, se = standard error,
full_group = numbers indicating pairwise significance between all treatments,
soil_group = numbers indicating pairwise significance between all treatments in same soil,
lcl and ucl = 95% confidence intervals

file: shared_snp_exact_pos_mat.csv

matrix indicating the number of shared selected snps between treatments

file: example : subsetG1_A.prune.in

contains the snps which after pruning are used for testing, done to account for linkage disequilibrium

files: example: subset_LHBcmh_fdr_sig_updated_ne_pruned_prior.tsv

Contains the allele frequencies and pvalues for that specific treatment for the cmh test
G10 is generation 10
G10a is generation 10 replicate A
af = allele frequency, base = nucleic base

files: avg_ne.csv

contains effective population size to run clear and cmh tests
column descriptions: V1 = effective population size, rounded_ne = ne rounded up

2.5.4 Creates genomic PCAs

We ran two PCAs 1) full data set and 2) on selected snps only
.eigenval suffix contains eigenvals for the PCA.
.eigenvec contains the PCA values for each individual plant all the way to PC10

2.5.5 Runs Genome wide assocation studies (GWAS)

.bed, .tfam, .tped, are all plink generated files to get genotype data, please see plink for more information
suffix .aIBS.kinf files are kinship matrices generated in emmax. Please see emmax for better documentation.

The data subdirectories (LHB, LHH) contain the gwas files for each treatment and for each phenotypic trait.
the files with the suffix: emmax.reml contain the emmax maximum likelihood estimation, please see emmax for better documentation.

files: example: LHB_Sepale-length_emmax_localscore_full.tsv

contains the phenotypic association for that chromosome and position and if it is a significant zone
column descriptions: beta = the effect size of the estimated association,
lindley and L and score, and cor and th = used in local score to assess significant zones,
mahattan = is the bp when plotting, col = color when plotting, sz_filter = is the snp in a significant zone

files: example: LHB_Petal-width_emmax_fdr_sig.tsv

contains the phenotypic association for that chromosome and position and only if it is a significant zone

files: example: LHB_Stamen_emmax_only_sigzones_1kb_range.tsv

contains the significant zone for the phenotypic association plus/minus 1kb
column descriptions: beg and end = start and end of significant zone, peak = how tall is the peak in the zone,
QTL_length = length of zone, minus_kb and plus_kb = start and end of zone plus minus 1000,
zone_id = chr and end pasted to get a zone name

files: example: LHB_Style_emmax_sig_snps_sigzones.tsv

contains all the significant snps within a significant zone

files: example: LHB_pleio_table_sigzone_only.tsv

contains the number of snps exhibiting pleiotropy for that treatment and level of pleiotropy

files: example: LHB_shared_modules_sigzone_only.tsv

contains the modules for that treatment
column descriptions: len = number of traits in module, trait_vector = which traits are in the module, treat = treatment

files: sup_mod_list_sigzone_only.csv

aggregation of file directly 1 above

files: example: LHB_three_modularity_data_frame_sigzone_only.tsv

contains the modules for that treatment assessed with three methods, please see manuscript for further details
Numbers tell which module number the trait is for that membership algorithm

files: example: LHBid_trait_pleio_table_sigzone_only.tsv

contains the zones which exhibit pleiotropy and for which traits

files: example: _phen_emmax.tsv

contains the phenotypes for that specific treatment, see above,

files: emmax_phenotypic_data_thomas.csv

contains the phenotypes for that specific treatment, see above,

files: trait_class.tsv

trait_class = whether trait is morphological or floral volatile,

files: emmax_estimates_all_treat_trait.tsv

has the variance calculated by emmax for each treatment and trait, please see emmax for better documentation
column descriptions: trait_class = whether trait is morphological or floral volatile,
va = additive variance, ve = environmental variance, va_ve = ratio of va/ve, pseudo_h = psuedo heritability

files: network_size_treatments_sigzone_only.tsv

contains the network size generated from pleiotropy

files: pleio_index_sigzone_only.csv

contains the pleiotropy index for each treatment, please see manuscript for calculations

files: shared_snps_between_traits_sigzone_only.tsv

for each treatment and trait pair, it contains the number of shared snps associated with both traits

files: treat_mod_mem_sigzone_only.tsv

contains matrix of which traits are in which modules

2.5.6_gene_ont

files: GCF_000309985.2_CAAS_Brap_v3.01_genomic_converted.txt

contains which candidate genes were annotated to the region of the genome
column descriptions: type = whether it is a gene, exon or mRNA among others, gene = gene name,
product and biotype = function of the gene

files: transcripts_go_exp6879.txt

contains go terms of genes
column descriptions: go = goterm,

files: shared_clear_cmh_gwas_all_treatments_traits.tsv

contains whether the genomic region is significant and associated with phenotypic traits

Sharing/Access information

The raw phenotypic and genomic reads data used in this publication is published in a previous study.
Genomic metadata from this publication is generated for this publication.

Soil and herbivory alters genomic adaptation to bumblebee pollination

Data files

Abstract

README: Data for: Soil and Herbivory Alters Genomic Adaptation to Bumblebee Pollination

Description of the data and file structure

Treatment info

Headers:

2.4_pheno_evo - Creates phenotypic trait evolution analyses

file: emmax_pheno_data.tsv

file: haldane_full_rep_soil_sep.tsv

file: n_snps_clear_cmh_gwas_haldane_full_rep_soil_sep.ts

file: shared_cmh_af_all_treatments_traits.tsv

2.5.1_fst_pair - Fst files to run pairwise fst analyses

file: example : RA_G1_RB_G1_pop.windowed.weir.fst

file: example : RA_LLNHB_rep_pop

file: example : RA_fst_full_data.tsv

file: total_rep_sep_fst_full_data.tsv

file: fst_between_g1_and_all_treat.tsv

2.5.2 Create Fst model, and 1000 permutations

file: fst_rrpp_perm_boxcox_g10_only_trans.tsv

2.5.3 Runs selection scans for CMH tests and CLEAR

file: example : dr_RA_G1.frq

file: clear_and_cmh_all_treat_pruned.tsv

file: model_both_clear_cmh_num_snps_soil_sep.tsv

file: shared_snp_exact_pos_mat.csv

file: example : subsetG1_A.prune.in

files: example: subset_LHBcmh_fdr_sig_updated_ne_pruned_prior.tsv

files: avg_ne.csv

2.5.4 Creates genomic PCAs

2.5.5 Runs Genome wide assocation studies (GWAS)

files: example: LHB_Sepale-length_emmax_localscore_full.tsv

files: example: LHB_Petal-width_emmax_fdr_sig.tsv

files: example: LHB_Stamen_emmax_only_sigzones_1kb_range.tsv

files: example: LHB_Style_emmax_sig_snps_sigzones.tsv

files: example: LHB_pleio_table_sigzone_only.tsv

files: example: LHB_shared_modules_sigzone_only.tsv

files: sup_mod_list_sigzone_only.csv

files: example: LHB_three_modularity_data_frame_sigzone_only.tsv

files: example: LHBid_trait_pleio_table_sigzone_only.tsv

files: example: _phen_emmax.tsv

files: emmax_phenotypic_data_thomas.csv

files: trait_class.tsv

files: emmax_estimates_all_treat_trait.tsv

files: network_size_treatments_sigzone_only.tsv

files: pleio_index_sigzone_only.csv

files: shared_snps_between_traits_sigzone_only.tsv

files: treat_mod_mem_sigzone_only.tsv

2.5.6_gene_ont

files: GCF_000309985.2_CAAS_Brap_v3.01_genomic_converted.txt

files: transcripts_go_exp6879.txt

files: shared_clear_cmh_gwas_all_treatments_traits.tsv

Sharing/Access information