Data from: Repeatability of adaptation in sunflowers reveals that genomic regions harbouring inversions also drive adaptation in species lacking an inversion
Data files (Nov 13, 2023 version): 1.24 GB
Abstract
Local adaptation commonly involves alleles of large effect, which experience fitness advantages when in positive linkage disequilibrium (LD). Because segregating inversions suppress recombination and facilitate the maintenance of LD between locally adapted loci, they are also commonly found to be associated with adaptive divergence. However, it is unclear what fraction of an adaptive response can be attributed to inversions and alleles of large effect, and whether the loci within an inversion could still drive adaptation in the absence of its recombination-suppressing effect. Here, we use genome-wide association studies to explore patterns of local adaptation in three species of sunflower: Helianthus annuus, H. argophyllus, and H. petiolaris, which each harbour a large number of species-specific inversions. We find evidence of significant genome-wide repeatability in signatures of association to phenotypes and environments, which are particularly enriched within regions of the genome harbouring an inversion in one species. This shows that while inversions may facilitate local adaptation, at least some of the loci can still harbour mutations that make substantial contributions without the benefit of recombination suppression in species lacking a segregating inversion. While a large number of genomic regions show evidence of repeated adaptation, most of the strongest signatures of association still tend to be species-specific, indicating substantial genotypic redundancy for local adaptation in these species.
README: Repeatability of adaptation in sunflowers reveals that genomic regions harbouring inversions also drive adaptation in species lacking an inversion
This archive contains the intermediate data files and scripts required to create the main and supplementary figures for the paper titled "Repeatability of adaptation in sunflowers: genomic regions harbouring inversions also drive adaptation in species lacking an inversion".
The folder called "scripts" contains the scripts.
The folder called "data_intermediate_files" contains the data necessary to run the scripts.
The folder called "results" includes some of the outputs from the different analyses, including lists of the top windows identified by PicMin.
Versions & Packages
All R scripts are run in R v4.1.2
Scripts use the following packages:
RColorBrewer
gplots
wesanderson
ggplot2
ggdist
data.table
dplyr
ggridges
poolr
pheatmap
GenomicRanges
magrittr
biomaRt
topGO
enrichplot
ggpubr
stringr
foreach
doParallel
Two Perl scripts for processing VCFs are also included (vcf2vertical_sunflower_modified.pl & snp_coverage.pl)
NA values in datasets:
- for all scripts running analyses on the matrices of p-values from the genome-wide association tests (with environment and phenotype), NA values indicate that no phenotype/environment was measured in that species.
- for all scripts running analyses on the matrices of top candidate scores, NAs occur either when the above condition applies or when there are not enough SNPs in a window to calculate a score.
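As an illustration of the NA convention above, the following is a minimal Python sketch (the archive's own scripts are written in R) that reads a tab-delimited table and converts the literal string NA to a missing value. The column name "window" and the two example rows are invented for illustration and are not taken from the archive.

```python
import csv
import io

# Invented two-row example; real files have many more rows and columns.
SAMPLE = "window\tAnnuus\tArgophyllus\nw1\t0.01\tNA\nw2\t0.20\t0.03\n"


def read_matrix(handle, id_column="window"):
    """Read a tab-delimited table, turning the literal string 'NA' into None."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row = {id_column: record[id_column]}
        for key, value in record.items():
            if key != id_column:
                row[key] = None if value == "NA" else float(value)
        rows.append(row)
    return rows


rows = read_matrix(io.StringIO(SAMPLE))

# Keep only windows with a value in every species column.
complete = [r for r in rows if all(v is not None for v in r.values())]
```

Whether incomplete windows should be dropped or kept depends on the analysis; the filter here is only one possible choice.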
SCRIPTS
The following scripts make the figures indicated in their names:
- figure3B_S15_plot_cscores_figure_6way_FIXED_NFFD.R
- figure3A_dbinom_clusters_NFFD15_bothdir.R
- figure4_S3_S13_S14_process_MAX_SIPEC_CRAs_FIGURE_PCA_pairwise.R
- figure5_plot_top_candidate_overlap_inversions_per_species.R
- figureS15CD_process_dbinom_clusters_chyper_find0_6way_FP_any_eref.R
- figureS2_plot_violins_sunflower.R
- figureS5_S6_process_phe_env_corr_topcand_introgression.R
- figureS12_introgression_SIPEC_CRAs.R
- figures4.R
- Figure_S7_A_B_twosided_barplot_convergent_clusters_overlapes_inversions.R
- Figure_S7_C_D_hetmaps_convergent_clusters_overlapes_inversions.R
- Figure_S9_10_GO_Enrichment.R
These scripts are used for pre-processing and must be run before the scripts above:
- process_dbinom_clusters_picmin_nopetfal_baypass_fixed.R: runs PicMin on the Baypass results, excluding petiolaris fallax and comparing the other 3 taxa
- process_dbinom_clusters_picmin_nopetfal_fixed.R: runs PicMin on the data, excluding petiolaris fallax and comparing the other 3 taxa
- process_dbinom_clusters_picmin_nopetpet_baypass_fixed.R: runs PicMin on the Baypass results, excluding petiolaris petiolaris and comparing the other 3 taxa
- process_dbinom_clusters_picmin_nopetpet_fixed.R: runs PicMin on the data, excluding petiolaris petiolaris and comparing the other 3 taxa
- process_picmin_results_by_genomic_region_across_vars.R: Runs clustering statistics on the PicMin results, collapsing nearby windows together for each region (across variables)
- process_picmin_results_per_variable.R: Runs clustering statistics on the PicMin results, collapsing nearby windows together for each variable
- process_top_candidate_overlap_inversions_per_species.R: Processes the inversion overlap, used in figure 5.
- pull_annotations_sunflower.R: This pulls annotations for genes near each of the significant PicMin results
Data
The following data files are included in the archive, with descriptions of their contents. Because the main authors were not available to refine the documentation, SY, who compiled these descriptions, could not determine the meaning of every variable.
FigureS4data.csv
- species: species
- recombination_bin: which recombination bin
- Values: local recombination rate
- logValues: log transform of column 3
GWAScorrected_nullw_result_recom_bin
- Taxa1: focal species in comparison
- Taxa2: alternate species in comparison
- Variable: environmental variable
- quantile: recombination quantile bin
- Window: window chromosome:position
- W: W statistic
- P: p value
- Mean_test: unknown
- N_samples: number of SNPs
- Mean_BG: unknown
- N_BG: number of SNPs in background window
- Count: unknown
- Pred_P: Z-score
- Emp_p: empirical p-value (does it fall in the top 5% of the tail?)
- window_name: window ID
H.annuus_corr_env_pheno_p1_91_cleaned.txt
- phenotype: phenotype
- environment: environment
- H.annuus: correlation between phenotype and environment
H.argophyllus_corr_env_pheno_output_p1_31_cleaned.txt
- phenotype: phenotype
- environment: environment
- H.argophyllus: correlation between phenotype and environment
H.pet.fall_corr_env_pheno_output_p1_71_cleaned.txt
- phenotype: phenotype
- environment: environment
- H.pet.fall: correlation between phenotype and environment
H.pet.pet_corr_env_pheno_output_p1_71_cleaned.txt
- phenotype: phenotype
- environment: environment
- H.pet.pet: correlation between phenotype and environment
HAN412_Eugene_curated_v1_1.gff3
Standard gff3 format with columns:
- Chromosome
- Eugene
- Gene/mRNA/CDS/exon
- start position
- end position
- .
- orientation
- .
- annotation information
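The nine columns listed above follow the standard GFF3 layout (sequence ID, source, feature type, start, end, score, strand, phase, attributes). A minimal Python sketch of parsing one line is shown below; the example line, including its chromosome and gene names, is invented for illustration and is not copied from HAN412_Eugene_curated_v1_1.gff3.

```python
from typing import Dict, NamedTuple, Optional


class GffRecord(NamedTuple):
    seqid: str        # chromosome
    source: str       # annotation source (column 2)
    type: str         # gene / mRNA / CDS / exon
    start: int
    end: int
    score: str        # "." when no score is given
    strand: str       # orientation: "+" or "-"
    phase: str        # "." for features without a phase
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Optional[GffRecord]:
    """Parse one GFF3 line; return None for comments and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = dict(
        item.split("=", 1) for item in attrs.split(";") if "=" in item
    )
    return GffRecord(seqid, source, ftype, int(start), int(end),
                     score, strand, phase, attributes)


# Invented example line.
example = "chr01\tEuGene\tgene\t1000\t2000\t.\t+\t.\tID=gene:example1;Name=example1"
record = parse_gff3_line(example)
```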
Ha412HO_inv.v3.inversions.regions.v1.txt
- group: grouping
- start: start position
- end: end position
- chr: chromosome
- spe: species
- species: species (alternate naming)
- dataset: which dataset identified the putative inversion
- mds: direction
Hannuus_Athaliana_shared_OG.rds
- Orthogroup: orthogroup
- OF_gene: gene ID
- gea_gene: position of gene
- Athal_genes: Arabidopsis thaliana annotations
SNP_count.csv
- window5: 5kb window chromosome/position
- window: unique identifier
- Annuus: number of SNPs in Annuus
- Argophyllus: number of SNPs in Argophyllus
- petpet: number of SNPs in H. pet pet
- petfal: number of SNPs in H. pet fal
- Annuus_Argophyllus: number of SNPs in both Annuus and Argophyllus
- Annuus_petpet: number of SNPs in both Annuus and petpet
- Annuus_petfal: number of SNPs in both Annuus and petfal
- Argophyllus_petpet: number of SNPs in both Argophyllus and petpet
- Argophyllus_petfal: number of SNPs in both Argophyllus and petfal
- petpet_petfal: number of SNPs in both petpet and petfal
all_res_cscore_FP_anygene.txt
- V1: code for comparison among species (1 = H.annuus-H.argophyllus, 2 = H.annuus-H.pet.fallax, 3 = H.annuus-H.pet.petiolaris, 4 = H.argophyllus-H.pet.fallax, 5 = H.argophyllus-H.pet.petiolaris, 6 = H.pet.fallax-H.pet.petiolaris)
- V2: Environmental variable
- V3: code for comparison among species (1 = H.annuus-H.argophyllus, 2 = H.annuus-H.pet.fallax, 3 = H.annuus-H.pet.petiolaris, 4 = H.argophyllus-H.pet.fallax, 5 = H.argophyllus-H.pet.petiolaris, 6 = H.pet.fallax-H.pet.petiolaris)
- V4:V20: the inferred number of genes that would give a c-score of 0 given observed overlap between the pair of species, assuming different false positive rates (each column with a false positive rate ranging from 0 to 0.8 in increments of 0.05).
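The pairwise codes and the false positive rate columns described above can be written out as lookup tables. The Python sketch below assumes that columns V4 to V20 map in order onto the rates 0, 0.05, ..., 0.8 (17 columns for 17 rates); that ordering is inferred from the description and has not been checked against the file.

```python
# Species pair for each pairwise comparison code, as listed in this README.
PAIR_CODES = {
    1: ("H.annuus", "H.argophyllus"),
    2: ("H.annuus", "H.pet.fallax"),
    3: ("H.annuus", "H.pet.petiolaris"),
    4: ("H.argophyllus", "H.pet.fallax"),
    5: ("H.argophyllus", "H.pet.petiolaris"),
    6: ("H.pet.fallax", "H.pet.petiolaris"),
}

# Assumed mapping of columns V4..V20 to false positive rates 0.00..0.80.
FP_RATE_BY_COLUMN = {f"V{4 + i}": round(0.05 * i, 2) for i in range(17)}


def describe_pair(code):
    """Return a readable label such as 'H.annuus-H.argophyllus'."""
    first, second = PAIR_CODES[code]
    return f"{first}-{second}"
```

For example, describe_pair(6) returns the label for the comparison between the two H. petiolaris subspecies.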
all_windows_annot.txt
- V1: top candidate index for annuus
- V2: top candidate index for argophyllus
- V3: top candidate index for petpet
- V4: top candidate index for petfal
- V5: chromosome
- V6: start position
- V7: end position
- V8: Window ID
- V9: Environmental variable
- V10: picmin p-value
- q: FDR adjusted picmin p-value
- inversion_overlap: whether the window overlaps an inversion (0 = no, 1 = yes)
- recomb_rate: recombination rate
- recomb_quantile: bin for recombination rate (1-5)
- merg1[,1]: chromosome and position
- annot: annotation associated with the window
- nearby_annot: nearest annotation
- nearby_annot_dist: distance in bp to nearest annotation
- nearby_gff_dist: distance to closest gff annotation
- relpos: integer (meaningless, used to check for bugs)
convergent_cluster_inversion_overlap_LD0.9_1cM_ranges_verbose
- data: which type of data is this variable
- analysis: is it structure corrected (baypass = yes, spearman = no, gwas_uncorrected = no, gwas_corrected = yes)
- variable: environmental variable/phenotype
- direction: focal/alternate species to call WRAs
- comparison: focal/alternate species to call WRAs
- chromosome: chromosome
- cluster_start: start position
- cluster_end: end position
- cluster_size: size of cluster
- N_convergent: Number of WRAs
- inversion_species: species in which the inversion was identified
- inversion_ID: ID for inversion
- inversion_start: start position of inversion
- inversion_end: end position of inversion
- inversion_size: size of inversion
- overlap_start: region of overlap start
- overlap_end: region of overlap end
- overlap_size: region of overlap size
- is_cluster_overlapping: yes/no
convergent_clustering_0.9_1_cM_summary
- data: input data type (climate, soil, phenotype)
- analysis: type of analysis (baypass, spearmans, gwas_corrected, gwas_uncorrected)
- variable: environmental variable
- direction: focal and alternate species
- chromosome: chromosome
- range: start:end position
- size: size of region
- N_convergent: number of WRAs in cluster
convergent_clustering_0.9_1cM_summary
NOTE: this file is used to collapse adjacent windows into clusters for C-scores analysis. Not used in other aspects of CRAs.
- V1: species
- V2: analysis (spearman, gwas-corrected)
- V3: environmental variable
- V4: chromosome
- V5: start:end position
- V6: window size
- V7: number of windows
dbinom_score_spearman_Eref
- Annuus: top candidate index for Annuus
- Argophyllus: top candidate index for Argophyllus
- petfal: top candidate index for H. pet fal
- petpet: top candidate index for H. pet pet
- chrom: chromosome
- start_window: window start
- end_window: window stop
- window_id: window ID
- variable: environmental variable
dbinom_score_spearman_NFFD
- Annuus: top candidate index for Annuus
- Argophyllus: top candidate index for Argophyllus
- petfal: top candidate index for H. pet fal
- petpet: top candidate index for H. pet pet
- chrom: chromosome
- start_window: window start
- end_window: window stop
- window_id: window ID
- variable: environmental variable
dbinom_scores
NOTE: this directory contains many files, all with the same column format; their filenames indicate the environmental variable and the kind of analysis used (baypass = structure corrected, spearman = non-corrected). The top candidate index values have been -1 * log10 transformed. NAs correspond to missing data due to lack of sequence information.
- Annuus: top candidate index for Annuus
- Argophyllus: top candidate index for Argophyllus
- petfal: top candidate index for H. pet fal
- petpet: top candidate index for H. pet pet
- chrom: chromosome
- start_window: window start
- end_window: window stop
- window_id: window ID
- variable: environmental variable
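Because the values in the dbinom_scores files are stored as -1 * log10 of the top candidate index, the untransformed index can be recovered as 10 raised to the negative of the stored value. A minimal Python sketch, assuming missing values appear as the literal string NA:

```python
def parse_score(token):
    """Convert a field from a dbinom_scores file to float, or None for 'NA'."""
    return None if token == "NA" else float(token)


def score_to_index(score):
    """Undo the -1 * log10 transform: index = 10 ** (-score)."""
    if score is None:
        return None
    return 10.0 ** (-score)


# A stored value of 3 corresponds to an untransformed index of 0.001.
index = score_to_index(parse_score("3"))
missing = score_to_index(parse_score("NA"))
```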
fixmin_tophits_alphatop1000_all_tested_windows.txt
NOTE: lists all of the windows that were tested in picmin
- V4: chromosome
- V5: start position
- V6: end position
- V7: window ID
fixmin_tophits_alphatop1000_recomb_FDR10_3way_pervar.txt
NOTE: This lists the clusters of adjacent windows with significant picmin hits
- cluster_chrom: chromosome of cluster of significant picmin hits
- cluster_start: cluster start position
- cluster_end: cluster end position
- cluster_var: environmental variable
- cluster_inv: is it overlapping an inversion
- cluster_numwin: how many windows are in the cluster
- clus1: mean empirical p for first species
- clus2: mean empirical p for second species
- clus3: mean empirical p for third species
- clus_recomb: average recombination rate for cluster
ha412_chrom_lengths.txt
- chrom: chromosome
- length: length in base pairs
introgression_standing_var_5window.txt
- window5: 5kb window chromosome/position
- window: unique identifier
- Annuus: number of SNPs in Annuus
- Argophyllus: number of SNPs in Argophyllus
- petpet: number of SNPs in H. pet pet
- petfal: number of SNPs in H. pet fal
- Annuus_Argophyllus: number of SNPs in both Annuus and Argophyllus
- Annuus_petpet: number of SNPs in both Annuus and petpet
- Annuus_petfal: number of SNPs in both Annuus and petfal
- Argophyllus_petpet: number of SNPs in both Argophyllus and petpet
- Argophyllus_petfal: number of SNPs in both Argophyllus and petfal
- petpet_petfal: number of SNPs in both petpet and petfal
- chrom: chromosome
- start: start position
- end: end position
- c_Annuus_Argophyllus: c-score for the overlap between these species in SNP counts
- c_Annuus_petfal: c-score for the overlap between these species in SNP counts
- c_Annuus_petpet: c-score for the overlap between these species in SNP counts
- c_Argophyllus_petfal: c-score for the overlap between these species in SNP counts
- c_Argophyllus_petpet: c-score for the overlap between these species in SNP counts
- c_petfal_petpe: c-score for the overlap between these species in SNP counts
overlap_heatmaps_nocluster_6way_0.995_include_inversion_FIXED0.txt
NOTE: includes inversion regions, no clustering of WRAs into CRAs
- variable: environmental variable
- pairwise_code: code for comparison among species (1 = H.annuus-H.argophyllus, 2 = H.annuus-H.pet.fallax, 3 = H.annuus-H.pet.petiolaris, 4 = H.argophyllus-H.pet.fallax, 5 = H.argophyllus-H.pet.petiolaris, 6 = H.pet.fallax-H.pet.petiolaris)
- number_rows: baseline estimate of overlap (add subsequent values to this)
- cut2: minimum cutoff for top candidate index to consider a window as "adapted" (min = 2)
- cut3: minimum cutoff for top candidate index to consider a window as "adapted" (min = 3)
- cut4: minimum cutoff for top candidate index to consider a window as "adapted" (min = 4)
- cut5: minimum cutoff for top candidate index to consider a window as "adapted" (min = 5)
- cut6: minimum cutoff for top candidate index to consider a window as "adapted" (min = 6)
- cut7: minimum cutoff for top candidate index to consider a window as "adapted" (min = 7)
- cut8: minimum cutoff for top candidate index to consider a window as "adapted" (min = 8)
- cut9: minimum cutoff for top candidate index to consider a window as "adapted" (min = 9)
- cut10: minimum cutoff for top candidate index to consider a window as "adapted" (min = 10)
overlap_heatmaps_nocluster_othercorrect_6way_0.995_include_inversion_FIXED0.txt
NOTE: includes inversion regions, run on WRAs without clustering, and includes population structure correction results on environment
- variable: environmental variable
- pairwise_code: code for comparison among species (1 = H.annuus-H.argophyllus, 2 = H.annuus-H.pet.fallax, 3 = H.annuus-H.pet.petiolaris, 4 = H.argophyllus-H.pet.fallax, 5 = H.argophyllus-H.pet.petiolaris, 6 = H.pet.fallax-H.pet.petiolaris)
- number_rows: baseline estimate of overlap (add subsequent values to this)
- cut2: minimum cutoff for top candidate index to consider a window as "adapted" (min = 2)
- cut3: minimum cutoff for top candidate index to consider a window as "adapted" (min = 3)
- cut4: minimum cutoff for top candidate index to consider a window as "adapted" (min = 4)
- cut5: minimum cutoff for top candidate index to consider a window as "adapted" (min = 5)
- cut6: minimum cutoff for top candidate index to consider a window as "adapted" (min = 6)
- cut7: minimum cutoff for top candidate index to consider a window as "adapted" (min = 7)
- cut8: minimum cutoff for top candidate index to consider a window as "adapted" (min = 8)
- cut9: minimum cutoff for top candidate index to consider a window as "adapted" (min = 9)
- cut10: minimum cutoff for top candidate index to consider a window as "adapted" (min = 10)
overlap_heatmaps_results_6way_0.995_include_inversion_FIXED0.txt
NOTE: includes inversion regions, collapses adjacent WRAs into clusters
- variable: environmental variable
- pairwise_code: code for comparison among species (1 = H.annuus-H.argophyllus, 2 = H.annuus-H.pet.fallax, 3 = H.annuus-H.pet.petiolaris, 4 = H.argophyllus-H.pet.fallax, 5 = H.argophyllus-H.pet.petiolaris, 6 = H.pet.fallax-H.pet.petiolaris)
- number_rows: baseline estimate of overlap (add subsequent values to this)
- cut2: minimum cutoff for top candidate index to consider a window as "adapted" (min = 2)
- cut3: minimum cutoff for top candidate index to consider a window as "adapted" (min = 3)
- cut4: minimum cutoff for top candidate index to consider a window as "adapted" (min = 4)
- cut5: minimum cutoff for top candidate index to consider a window as "adapted" (min = 5)
- cut6: minimum cutoff for top candidate index to consider a window as "adapted" (min = 6)
- cut7: minimum cutoff for top candidate index to consider a window as "adapted" (min = 7)
- cut8: minimum cutoff for top candidate index to consider a window as "adapted" (min = 8)
- cut9: minimum cutoff for top candidate index to consider a window as "adapted" (min = 9)
- cut10: minimum cutoff for top candidate index to consider a window as "adapted" (min = 10)
annuus_phenotypes.csv
argophyllus_phenotypes.csv
petioaris_fallax_phenotypes.csv
petiolaris_petiolaris_phenotypes.csv
NOTE: For all four of these files, the phenotypes are as shown below:
- Plant ID: plant unique ID
- Genotype ID: unique genotype ID
- TLN: Total leaf number
- LIR: Leaf initiation rate
- Days to budding: days to budding at anthesis
- DTF: Days to Flower
- Stem diamater at flowering: Primary stem diameter at anthesis
- Plant height at flowering: Total length of the main stem
- Internode length: Average length of an internode
- Stem diameter final before 1st node: Diameter of the stem base of fully developed plants
- Stem diameter final after 5th node: Diameter of the stem of fully developed plants
- Distance of first branching from ground: Distance between ground and first node
- Primary branches: Number of branches on the main stem
- SLA: Specific leaf area (mm^2)
- Leaf total N: Amount of nitrogen in leaf tissue
- Leaf total C: Amount of carbon in leaf tissue
- Leaf C N ratio: Ratio between carbon and nitrogen content in leaf tissue
- Disk diameter: Diameter of the inflorescence disk
- Ligule length: Length of individual ligules
- Ligule width: Maximum width of individual ligules
- Flower head diameter: Diameter of the inflorescence
- Ligule LW ratio: Ratio between ligule length and maximum width
- Flower FHDD ratio: Ratio between inflorescence and disk diameters
- Ligules number: Average number of ligules per inflorescence
- Stem colour: Presence and intensity of purple colour on main stem (0-4)
- Petiole main veins colour: Presence and intensity of purple colour on main leaf veins (0-4)
- Darker axillae: Presence of purple colour on leaf axillae (presence/absence)
- Leaf perimeter: Perimeter of an individual leaf
- Leaf area: Area of an individual leaf
- Leaf width mid-height: Width measured at ½ of the leaf’s length
- Leaf maximum width: Maximum width of the leaf
- Leaf height mid-width: Length measured at ½ of the leaf’s width
- Leaf maximum height: Maximum length of the leaf
- Leaf curved height: Length measured along a curved line through the leaf (equidistant from both leaf borders)
- Leaf curvedHeight maxWidth: Ratio between curved length and maximum leaf width
- Trichomes length: Length of non-glandular trichomes on abaxial side of leaves
- Trichomes density leaf edge flat area: Density of trichomes outside secondary veins near the leaf edge
- Trichomes density leaf edge secondary veins: Density of trichomes on secondary veins near the leaf edge
- Trichomes density leaf center flat area: Density of trichomes outside secondary veins near the leaf main vein
- Trichomes density leaf center secondary veins: Density of trichomes on secondary veins near the leaf main vein
- Trichomes density edge average: Density of trichomes near the leaf edge
- Trichomes density center average: Density of trichomes near the leaf main vein
- Trichomes density nonvein average: Density of trichomes outside secondary veins
- Trichomes density vein average: Density of trichomes on secondary veins
- Leaf shape index external I: The ratio of the Maximum Height to Maximum Width
- Leaf shape index external II: The ratio of Height Midwidth to Width Mid-height
- Leaf curved shape index: The ratio of Curved Height to the width of the leaf at mid-curved-height, as measured perpendicular to the curved height line
- Leaf ellipsoid: The ratio of the error resulting from a best-fit ellipse to the area of the leaf. Error is the average magnitude of residuals (Res) along the leaf’s perimeter, divided by the length of the major (longer) axis of the ellipse. Smaller values indicate that the leaf is more ellipsoid
- Leaf circular: The ratio of the error resulting from a best-fit circle to the area of the leaf. Error is the average magnitude of residuals (Res) along the leaf’s perimeter, divided by the radius of the circle. Smaller values indicate that the leaf is more circular
- Leaf rectangular: The ratio of the area of the rectangle bounding the leaf to the area of the rectangle bounded by the leaf
- Leaf obovoid: Obovoid is calculated from the maximum width (W), the height at which the maximum width occurs (y), the average width above that height (w1), and the average width below that height (w2), and a scaling function scale_ob as: Obovoid = 1/2 * scale_ob(y) * (1 – w1/W + w2/W). If Obovoid > 0, subtract 0.4. Otherwise, Obovoid is 0
- Leaf width widest pos: The ratio of the height at which the maximum width occurs to the Maximum Height
- Leaf eccentricity: The ratio of the height of the internal ellipse to the Maximum Height
- Leaf proximal eccentricity: The ratio of the height of the internal ellipse to the distance between the bottom of the ellipse and the top of the leaf
- Leaf Distal eccentricity: The ratio of the height of the internal ellipse to the distance between the top of the ellipse and the bottom of the leaf
- Leaf shape index internal: The ratio of the internal ellipse’s height to its width
- Leaf eccentricity area index: The ratio of the area of the leaf outside the ellipse to the total area of the leaf
- Total RGB: Sum of RGB values for leaf colour
- RGB proportion green: Proportion of green in average leaf colour
- RGB proportion red: Proportion of red in average leaf colour
- RGB proportion blue: Proportion of blue in average leaf colour
- Phyllaries diameter: Diameter of phyllaries whorl
- Phyllaries length: Length of individual phyllaries
- Phyllaries width: Maximum width of individual phyllaries
- Phyllaries LW ratio: Ratio between phyllaries length and maximum width
- Flowerhead to phyllaries diameter ratio: Ratio between flower head and phyllaries whorl diameters
- Disk to phyllaries diameter ratio: Ratio between flower disk and phyllaries whorl diameters
- Seed perimeter: Perimeter of an individual seed
- Seed area: Area of an individual seed
- Seed width mid height: Width measured at ½ of the seed’s length
- Seed maximum width: Maximum width of the seed
- Seed height mid width: Length measured at ½ of the seed’s width
- Seed maximum height: Maximum length of the seed
- Seed curved height: Length measured along a curved line through the seed (equidistant from both seed borders)
- Seed HW ratio: Ratio between curved length and maximum seed width
- Seed shape index external I: The ratio of the Maximum Height to Maximum Width
- Seed shape index external II: The ratio of Height Midwidth to Width Mid-height
- Seed curved shape index: The ratio of Curved Height to the width of the seed at mid-curved-height, as measured perpendicular to the curved height line
- Seed ellipsoid: The ratio of the error resulting from a best-fit ellipse to the area of the seed. Error is the average magnitude of residuals (Res) along the seed’s perimeter, divided by the length of the major (longer) axis of the ellipse. Smaller values indicate that the seed is more ellipsoid
- Seed circular: The ratio of the error resulting from a best-fit circle to the area of the seed. Error is the average magnitude of residuals (Res) along the seed’s perimeter, divided by the radius of the circle. Smaller values indicate that the seed is more circular
- Seed rectangular: The ratio of the area of the rectangle bounding the seed to the area of the rectangle bounded by the seed
- Seed ovoid: Ovoid is calculated from the maximum width (W), the height at which the maximum width occurs (y), the average width above that height (w1), and the average width below that height (w2), and a scaling function scale_ov as: Ovoid = 1/2 * scale_ov(y) * (1 – w2/W + w1/W). If Ovoid > 0, subtract 0.4. Otherwise, Ovoid is 0
- Seed width widest pos: The ratio of the height at which the maximum width occurs to the Maximum Height
- Seed eccentricity: The ratio of the height of the internal ellipse to the Maximum Height
- Seed proximal eccentricity: The ratio of the height of the internal ellipse to the distance between the bottom of the ellipse and the top of the seed
- Seed distal eccentricity: The ratio of the height of the internal ellipse to the distance between the top of the ellipse and the bottom of the seed
- Seed shape index internal: The ratio of the internal ellipse’s height to its width
- Seed eccentricity area index: The ratio of the area of the seed outside the ellipse to the total area of the seed
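The "Leaf obovoid" and "Seed ovoid" formulas given in the list above can be sketched in Python as follows. The scaling functions scale_ob and scale_ov are not defined in this README, so the sketch takes them as arguments; the constant function used in the example is a placeholder for illustration only, not the function used to produce the data.

```python
def _apply_offset(raw):
    # Rule stated in the README: positive values are reduced by 0.4,
    # anything else is set to 0.
    return raw - 0.4 if raw > 0 else 0.0


def obovoid(W, y, w1, w2, scale_ob):
    """Obovoid = 1/2 * scale_ob(y) * (1 - w1/W + w2/W), then the 0.4 rule."""
    return _apply_offset(0.5 * scale_ob(y) * (1 - w1 / W + w2 / W))


def ovoid(W, y, w1, w2, scale_ov):
    """Ovoid = 1/2 * scale_ov(y) * (1 - w2/W + w1/W), then the 0.4 rule."""
    return _apply_offset(0.5 * scale_ov(y) * (1 - w2 / W + w1 / W))


def placeholder_scale(y):
    # Hypothetical stand-in: the real scaling function is not documented here.
    return 1.0


# Invented measurements: W = 10, narrower above the widest point (w1 = 4)
# than below it (w2 = 6).
leaf_obovoid = obovoid(10.0, 0.3, 4.0, 6.0, placeholder_scale)
```

With these invented inputs the raw value is 0.5 * (1 - 0.4 + 0.6) = 0.6, which becomes approximately 0.2 after the offset is subtracted.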
recombination_bins.txt
- window: chromosome + position
- recomb_rate: estimated rate of recombination
- recomb_quantile: recombination bin (out of 5) with percentile range
results_WRAs_overlap_inversions_500reps_per_species.txt
- V1: species 1
- V2: species 2
- V3: environmental variable
- V4: how many windows that are non-WRAs are overlapping an inversion
- V5: how many windows that are non-WRAs are not overlapping an inversion
- V6: how many windows that are WRAs are overlapping an inversion
- V7: how many windows that are WRAs are not overlapping an inversion
- V8: what proportion of permuted nulls had a greater chi-square statistic than observed?
- V9: proportion of non-WRAs that overlap an inversion out of all non-WRAs
- V10: proportion of WRAs that overlap an inversion out of all WRAs
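Based on the column descriptions above, V9 and V10 appear to be derivable from the four counts in V4 to V7. The Python sketch below shows that arithmetic with invented counts; it has not been checked against the values in the file.

```python
def inversion_overlap_proportions(v4, v5, v6, v7):
    """Return (V9, V10) from the four window counts.

    v4: non-WRA windows overlapping an inversion
    v5: non-WRA windows not overlapping an inversion
    v6: WRA windows overlapping an inversion
    v7: WRA windows not overlapping an inversion
    """
    non_wra_total = v4 + v5
    wra_total = v6 + v7
    # None is returned when a category is empty, to avoid dividing by zero.
    v9 = v4 / non_wra_total if non_wra_total else None
    v10 = v6 / wra_total if wra_total else None
    return v9, v10


# Invented counts for illustration.
v9, v10 = inversion_overlap_proportions(10, 90, 5, 15)
```

With these counts, 10 of 100 non-WRA windows and 5 of 20 WRA windows overlap an inversion.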
spearman_nullw_result_recom_bin
- Taxa1: focal species in comparison
- Taxa2: alternate species in comparison
- Variable: environmental variable
- quantile: recombination quantile bin
- Window: window chromosome:position
- W: W statistic
- P: p value
- Mean_test: unknown
- N_samples: number of SNPs
- Mean_BG: unknown
- N_BG: number of SNPs in background window
- Count: unknown
- Pred_P: Z-score
- Emp_p: empirical p-value (does it fall in the top 5% of the tail?)
- window_name: window ID
sunflower_environments_clipped.csv
- Population ID: unique ID for each population
- Individuals: unique ID for each individual
- Taxon: species/subspecies
- Latitude: Latitude
- Longitude: Longitude
- Elevation: Elevation
- MAT: mean annual temperature (°C)
- MWMT: mean warmest month temperature (°C)
- MCMT: mean coldest month temperature (°C)
- TD: Continentality, temperature difference between MWMT and MCMT (°C)
- MAP: Mean annual precipitation (mm)
- MSP: May to September precipitation (mm)
- AHM: Annual heat-moisture index ((MAT+10)/(MAP/1000))
- SHM: Summer heat-moisture index ((MWMT)/(MSP/1000))
- DD_0: Degree-days below 0°C, chilling degree-days (°C)
- DD5: Degree-days above 5°C, growing degree-days
- DD_18: Degree-days below 18°C
- DD18: Degree-days above 18°C
- NFFD: Number of frost-free days
- bFFP: The day of the year on which FFP begins
- eFFP: The day of the year on which FFP ends
- FFP: Frost-free period (days)
- EMT: Extreme minimum temperature over 30 years (°C)
- EXT: Extreme maximum temperature over 30 years (°C)
- Eref: Hargreaves reference evaporation (mm)
- CMD: Hargreaves climatic moisture deficit (mm)
- RH: Relative humidity
- OM: Organic matter percentage
- P1: phosphorus, weak Bray (ppm)
- P2: phosphorus, strong Bray (ppm)
- BICARB: sodium bicarbonate (ppm)
- K: potassium (ppm)
- MG: magnesium (ppm)
- CA: calcium (ppm)
- NA: sodium (ppm)
- PH: soil pH
- CEC: cation exchange capacity (meq/100g)
- PERCENT_K: percent base saturation K (%)
- PERCENT_MG: percent base saturation Mg (%)
- PERCENT_CA: percent base saturation Ca (%)
- PERCENT_NA: percent base saturation Na (%)
- SOL_SALTS: soluble salts (mmhos/cm)
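The three derived climate variables in the list above (TD, AHM and SHM) are simple functions of other columns. A minimal Python sketch of those formulas, using invented input values, is:

```python
def continentality(mwmt, mcmt):
    """TD: difference between warmest- and coldest-month mean temperature (°C)."""
    return mwmt - mcmt


def annual_heat_moisture(mat, map_mm):
    """AHM = (MAT + 10) / (MAP / 1000), with MAT in °C and MAP in mm."""
    return (mat + 10.0) / (map_mm / 1000.0)


def summer_heat_moisture(mwmt, msp_mm):
    """SHM = MWMT / (MSP / 1000), with MWMT in °C and MSP in mm."""
    return mwmt / (msp_mm / 1000.0)


# Invented site: MAT 10 °C, MWMT 25 °C, MCMT -5 °C, MAP 500 mm, MSP 250 mm.
td = continentality(25.0, -5.0)            # 30.0
ahm = annual_heat_moisture(10.0, 500.0)    # 40.0
shm = summer_heat_moisture(25.0, 250.0)    # 100.0
```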
sunflower_picmin_allvars_top1000_recomb_fix2_long.txt
4-species picmin analysis
- V1: top candidate index for annuus
- V2: top candidate index for argophyllus
- V3: top candidate index for petpet
- V4: top candidate index for petfal
- V5: chromosome
- V6: start position
- V7: end position
- V8: Window ID
- V9: Environmental variable
- V10: picmin p-value
- V11: FDR adjusted picmin p-value
sunflower_picmin_allvars_top1000_recomb_nopetfal_baypass_fix2_long.txt
3-species picmin analysis excluding petfal and using baypass (structure-corrected) instead of raw spearman's correlation.
- V1: top candidate index for annuus
- V2: top candidate index for argophyllus
- V3: top candidate index for petpet
- V4: top candidate index for petfal
- V5: chromosome
- V6: start position
- V7: end position
- V8: Window ID
- V9: Environmental variable
- V10: picmin p-value
- V11: FDR adjusted picmin p-value
sunflower_picmin_allvars_top1000_recomb_nopetfal_fix2_long.txt
3-species picmin analysis excluding petfal and using raw spearman's correlation.
- V1: top candidate index for annuus
- V2: top candidate index for argophyllus
- V3: top candidate index for petpet
- V4: top candidate index for petfal
- V5: chromosome
- V6: start position
- V7: end position
- V8: Window ID
- V9: Environmental variable
- V10: picmin p-value
- V11: FDR adjusted picmin p-value
sunflower_picmin_allvars_top1000_recomb_nopetpet_baypass_fix2_long.txt
3-species picmin analysis excluding petpet and using baypass (structure-corrected) instead of raw spearman's correlation.
- V1: top candidate index for annuus
- V2: top candidate index for argophyllus
- V3: top candidate index for petpet
- V4: top candidate index for petfal
- V5: chromosome
- V6: start position
- V7: end position
- V8: Window ID
- V9: Environmental variable
- V10: picmin p-value
- V11: FDR adjusted picmin p-value
sunflower_picmin_allvars_top1000_recomb_nopetpet_fix2_long.txt
3-species picmin analysis excluding petpet and using raw spearman's correlation.
- V1: top candidate index for annuus
- V2: top candidate index for argophyllus
- V3: top candidate index for petpet
- V4: top candidate index for petfal
- V5: chromosome
- V6: start position
- V7: end position
- V8: Window ID
- V9: Environmental variable
- V10: picmin p-value
- V11: FDR adjusted picmin p-value
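The V1 to V11 layout shared by the picmin files can be read into named records. The following is a minimal sketch, assuming the files are tab-delimited with no header row and that positions are plain integers; the short column names and both function names are hypothetical labels derived from the descriptions above and do not appear in the files.

```python
import csv
import io

# Hypothetical short names for V1..V11, in the order documented above.
PICMIN_COLUMNS = [
    "top_annuus", "top_argophyllus", "top_petpet", "top_petfal",
    "chromosome", "start", "end", "window_id", "env_variable",
    "picmin_p", "picmin_fdr",
]


def read_picmin(handle, delimiter="\t"):
    """Yield one dict per row (assumes 11 delimited columns, no header)."""
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != len(PICMIN_COLUMNS):
            continue  # skip blank or malformed lines
        rec = dict(zip(PICMIN_COLUMNS, row))
        # Convert only the columns documented as positions or p-values;
        # the top candidate indices are left as strings.
        rec["start"] = int(rec["start"])
        rec["end"] = int(rec["end"])
        rec["picmin_p"] = float(rec["picmin_p"])
        rec["picmin_fdr"] = float(rec["picmin_fdr"])
        yield rec


def significant_windows(records, fdr_threshold=0.05):
    """Keep windows whose FDR adjusted picmin p-value is at or below the threshold."""
    return [r for r in records if r["picmin_fdr"] <= fdr_threshold]
```

With a real file, `read_picmin(open(path, newline=""))` would be the entry point; the 0.05 threshold is only an illustrative default, and the delimiter should be changed if the files turn out to be space-delimited.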
top_candidate_spearman_H.argophyllus
- V1: environmental variable
- V2: window
- V3: number of SNPs
- V4: number of outliers
- V5: proportion
- V6: quantile expectation for binomial with 1e-4
- V7: quantile expectation for binomial with 1e-8
- V8: species
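Columns V6 and V7 are described as binomial quantile expectations. The sketch below shows one way such a quantile could be computed and compared with the outlier count in V4. It assumes the quantile is the upper-tail cutoff at probability 1 - 1e-4 (or 1 - 1e-8) and that the success probability is a genome-wide outlier proportion; neither detail is stated in this README, and the function names and the `p_outlier` parameter are hypothetical.

```python
def binom_quantile(n, p, q):
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(n, p).

    Uses the pmf recurrence in plain floats, so it is only suitable for
    window-sized n where (1 - p) ** n does not underflow to zero.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    pmf = (1.0 - p) ** n  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < q and k < n:
        # P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k


def exceeds_expectation(n_snps, n_outliers, p_outlier, tail=1e-4):
    """True if the outlier count is above the assumed binomial cutoff."""
    return n_outliers > binom_quantile(n_snps, p_outlier, 1.0 - tail)
```

For example, `exceeds_expectation(100, 10, 0.01)` asks whether 10 outliers among 100 SNPs is more than expected when 1% of SNPs are outliers genome-wide; these numbers are illustrative and not taken from the data.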
convergent_cluster_inversion_overlap_LD0.9_1cM_ranges_verbose.table
- data: type of data (climate, soil and phenotype)
- analysis: type of analysis (Spearman, baypass, GWAS)
- variable : variable type
- direction : direction of the pair
- comparison : type of comparison
- chromosome : chromosome
- cluster_start : start position of the convergent cluster
- cluster_end : end position of the convergent cluster
- cluster_size : size of the convergent cluster (bp)
- N_convergent : number of convergent clusters
- inversion_species : inversion detected in which species
- inversion_ID : inversion ID
- inversion_start : start position of inversion
- inversion_end: end position of inversion
- inversion_size : size of inversion (bp)
- overlap_start : start of overlap between inversion and convergent cluster
- overlap_end : end of overlap between inversion and convergent cluster
- overlap_size : size of overlap between inversion and convergent cluster
- is_cluster_overlapping : whether convergent cluster overlaps with any inversion
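The overlap_start, overlap_end and overlap_size columns describe the intersection of a convergent cluster with an inversion. A minimal sketch of that interval arithmetic follows, assuming inclusive base-pair coordinates (so a size is end - start + 1); the README does not state the coordinate convention, and the function name is hypothetical.

```python
def interval_overlap(cluster_start, cluster_end, inversion_start, inversion_end):
    """Return (overlap_start, overlap_end, overlap_size), or None if disjoint.

    Assumes both intervals use inclusive coordinates on the same chromosome.
    """
    start = max(cluster_start, inversion_start)
    end = min(cluster_end, inversion_end)
    if start > end:
        return None  # the cluster does not overlap the inversion
    return start, end, end - start + 1
```

Under a half-open convention the size would instead be end - start, so the values in overlap_size are worth checking against one row before relying on either form.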
convergent_clusters_overlaps_with_inversions_pervariable.Spearman.table
- N_cluster : number of clusters overlapping
- porportion_number_cluster_overlap_inversion : proportion of clusters (by number) overlapping an inversion
- porportion_number_cluster_overlap_inversion_P : P-value for the proportion of clusters (by number) overlapping an inversion
- porportion_length_cluster_overlap_inversion : proportion of cluster length overlapping an inversion
- porportion_length_cluster_overlap_inversion_P : P-value for the proportion of cluster length overlapping an inversion
- var : variable
- comparison : pair type
- type : type of data (climate, soil and phenotype)
- analysis: type of analysis (Spearman, baypass, GWAS)
convergent_inversion_overlap_merged_P_LD0.9_1cM.table
- data : type of data (climate, soil and phenotype)
- analysis : type of analysis (Spearman, baypass, GWAS)
- variable: variable type
- direction : direction of the pair
- N_cluster : number of convergent clusters
- total_cluster_length : total length of the convergent clusters
- total_overlap_length : total length of the convergent clusters overlapping with an inversion
- N_cluster_overlap : number of convergent clusters overlapping with an inversion
- porportion_number_cluster_overlap_inversion : proportion of clusters (by number) overlapping with an inversion
- porportion_number_cluster_overlap_inversion_NULL_mean : mean of the null distribution for the proportion of clusters (by number) overlapping with an inversion
- porportion_number_cluster_overlap_inversion_P : P-value for the proportion of clusters (by number) overlapping with an inversion
- porportion_length_cluster_overlap_inversion : proportion of cluster length overlapping with an inversion
- porportion_length_cluster_overlap_inversion_NULL_mean : mean of the null distribution for the proportion of cluster length overlapping with an inversion
- porportion_length_cluster_overlap_inversion_P: P-value for the proportion of cluster length overlapping with an inversion
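The proportion, NULL_mean and P columns above can be related to one another with a short sketch. It assumes each cluster is summarised as a (cluster_length, overlap_length) pair and that the P-value is a one-sided empirical P-value against a list of null proportions; this README does not say how the null distribution was generated or how the P-values were calculated, so both function names and the +1 correction are assumptions for illustration only.

```python
def overlap_proportions(clusters):
    """clusters: list of (cluster_length, overlap_length) pairs.

    Returns (proportion by number, proportion by length) of clusters
    overlapping an inversion.
    """
    n_overlap = sum(1 for _, overlap in clusters if overlap > 0)
    total_length = sum(length for length, _ in clusters)
    total_overlap = sum(overlap for _, overlap in clusters)
    return n_overlap / len(clusters), total_overlap / total_length


def empirical_p(observed, null_values):
    """One-sided empirical P-value: share of null values >= observed.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    n_extreme = sum(1 for value in null_values if value >= observed)
    return (n_extreme + 1) / (len(null_values) + 1)
```

The mean of `null_values` would correspond to the NULL_mean columns under the same assumptions.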
out_res_GOEnrichment_TopGo_arabidopsis_homologs_convergent_windowos_ElimFisher_CC.csv
- GO.ID : GO ID
- Term : GO term
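- Annotated : number of genes annotated to the GO term
- Significant: number of significant genes annotated to the GO term
- Expected: expected number of significant genes for the GO term
- elimFisher: P-value from Fisher's exact test using the elim algorithm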
- p.adj: Adjusted P-value (BH)
- comparison: comparison type
- analysis: analysis type
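The p.adj column is described as a Benjamini-Hochberg (BH) adjusted P-value. A minimal sketch of the BH adjustment is given here for reference; it assumes the adjustment is applied to one list of P-values at a time, and which set of GO terms was adjusted together (for example per comparison and analysis) is not stated in this README. The function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

This follows the same step-up rule as R's `p.adjust(..., method = "BH")`, which caps adjusted values at 1 and keeps them monotone in the raw P-values.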
out_res_Inversion_Convergent_CLUSTERS_overlapping_Summed_Size_number_per_chromosome_Climate_Spearman_union_percomparison.table
- cluster_start : start position of the convergent cluster
- cluster_end : end position of the cluster
- total_cluster_length : length of the cluster (bp)
- total_N.cluster : total number of convergent clusters
- total_overlap_size : size of overlap (bp)
- total_N.overlapped : total number of convergent clusters overlapping an inversion
- direction : direction of the analysis
- chromosome: chromosome
out_res_Inversion_Convergent_CLUSTERS_overlapping_Summed_Size_number_per_chromosome_Phenotype_Spearman_union_percomparison.table
- cluster_start : start position of the convergent cluster
- cluster_end : end position of the cluster
- total_cluster_length : length of the cluster (bp)
- total_N.cluster : total number of convergent clusters
- total_overlap_size : size of overlap (bp)
- total_N.overlapped : total number of convergent clusters overlapping an inversion
- direction : direction of the analysis
- chromosome: chromosome
CONV_WIN_GENE_overlap_Ath_HanXRQr1-2_Ha412HOv2_NA.table
- data: data (env: environments, gwa: phenotypes)
- type: type of data (environment, soil, phenotype)
- analysis: analysis (spearman, GWAS corrected, GWAS uncorrected)
- Taxa1: first taxon of the pair
- Taxa2: second taxon of the pair
- comparison: type of comparison
- variable: variable type
- window_name: name of 5k window
- window5 : window ID
- Chr: chromosome
- window_start : start position of the window
- window_end : end position of the window
- Ha412HOv2.0_annot2.1_start : start position on Ha412HOv2.0_annot2.1 assembly
- Ha412HOv2.0_annot2.1_end : end position on Ha412HOv2.0_annot2.1 assembly
- Ha412HOv2.0_annot2.1 : Ha412HOv2.0_annot2.1 assembly
- arabidopsis : arabidopsis assembly
- HanXRQr1.0 : ortholog in HanXRQr1.0
- HanXRQv2 : ortholog in HanXRQv2