Data from: Inversions contribute disproportionately to parallel genomic divergence in dune sunflowers
Data files
Aug 28, 2024 version files 1.06 GB
- GSD_SS_GBS_NGM_GATK_total.vcf.gz
- Habitat_data_clean.csv
- haploblocks.petfal.txt
- inv_gt.WGS_RF.NEG1.txt
- inv_gt.WGS_RF.NEG2.txt
- inv_gt.WGS_RF.PET1.txt
- inv_gt.WGS_RF.PET2.txt
- MON_seed_size_data.csv
- MON_SS_GBS_NGM_GATK_total.vcf.gz
- NegPG_GBS_NGM_GATK_total.vcf.gz
- QTL_SS_data.csv
- README.md
Abstract
The probability of parallel genetic evolution is a function of the strength of selection and constraints imposed by genetic architecture. Inversions capture locally adapted alleles and suppress recombination between them, which limits the range of adaptive responses. Also, the combined phenotypic effect of alleles within inversions is likely to be greater than that of individual alleles, which should further increase the contributions of inversions to parallel evolution. We tested the hypothesis that inversions contribute disproportionately to parallel genetic evolution in independent dune ecotypes of Helianthus petiolaris. We analyzed habitat data and identified variables underlying parallel habitat shifts. Genotype-environment association analyses of these variables indicated parallel responses of inversions to shared selective pressures. We also confirmed larger seed size across the dunes and performed quantitative trait locus (QTL) mapping with multiple crosses. QTL shared between locations fell into inversions more than expected by chance. We used whole-genome sequencing data to identify selective sweeps in the dune ecotypes and found that the majority of shared swept regions were found within inversions. Phylogenetic analyses of shared regions indicate that within inversions the same allele typically was found in the dune habitat at both sites. These results confirm predictions that inversions drive parallel divergence in the dune ecotypes.
README: Data from: Inversions contribute disproportionately to parallel genomic divergence in dune sunflowers
https://doi.org/10.5061/dryad.bcc2fqznn
Description of the data and file structure
Files and variables
File: Habitat_data_clean.csv
Description: Habitat data collected from multiple locations where dune and non-dune populations of sunflowers grow.
New data was collected for the populations near Monahans Sandhills State Park in Texas. We established a transect through each population and picked 5 sites for habitat analysis at even intervals along those transects. At each site, we took a photo of a 0.65 m² quadrat and used ImageJ to determine the proportion of vegetative cover and grass cover. Soil samples were taken at a depth of 25 cm, dried at 60 °C for 24 h, and pooled across the 5 sites within each population. Available phosphorus and exchangeable potassium, magnesium, and calcium were measured at A&L Eastern Laboratories (Richmond, VA), and total nitrogen content was determined by Micro-Dumas Combustion (NA1500, Carlo Erba Strumentazione, Milan, Italy) at the University of Georgia Analytical Chemistry Laboratory.
The data for populations near Great Sand Dunes National Park and Preserve is from Andrew, R.L., Ostevik, K.L., Ebert, D.P. and Rieseberg, L.H., 2012. Adaptation with gene flow across the landscape in a dune sunflower. Molecular Ecology 21: 2078-2091.
Variables
- location: whether a population was near Monahans Sandhills State Park (MON) or Great Sand Dunes National Park and Preserve (GSD)
- pop: population name
- type: whether a population is dune (D) or non-dune (N)
- mean.percent.cover: proportion of each quadrat covered by vegetation (average of 5 sites)
- percent.grass: proportion of each quadrat covered by grass (average of 5 sites)
- total.N: total nitrogen (%*1000) in soil
- P: phosphorus (ppm) in soil
- K: potassium (ppm) in soil
- Mg: magnesium (ppm) in soil
- Ca: calcium (ppm) in soil
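As a rough illustration of how these variables might be summarized, the sketch below averages one habitat variable within each location and habitat type. It assumes the file is comma-delimited with a header row matching the variable names above; the function name `mean_by_group` and the example rows are hypothetical and are not values from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, value_col, group_cols=("location", "type")):
    """Average value_col within each combination of group_cols.

    Blank or "NA" entries are skipped (assumed missing-value codes).
    """
    groups = defaultdict(list)
    for row in rows:
        raw = (row.get(value_col) or "").strip()
        if raw in ("", "NA"):
            continue
        key = tuple(row[col] for col in group_cols)
        groups[key].append(float(raw))
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical rows for illustration only; NOT values from the dataset.
example_csv = io.StringIO(
    "location,pop,type,mean.percent.cover\n"
    "MON,popA,D,0.10\n"
    "MON,popB,D,0.20\n"
    "MON,popC,N,0.60\n"
    "GSD,popD,N,0.50\n"
)
cover_means = mean_by_group(csv.DictReader(example_csv), "mean.percent.cover")
```

With the real file, the same function could be called on `csv.DictReader(open("Habitat_data_clean.csv", newline=""))`, swapping in any of the soil variables (for example `"total.N"`) as `value_col`.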
File: haploblocks.petfal.txt
Description: The locations of haploblocks (inversions) in Helianthus petiolaris based on Todesco, M., Owens, G.L., Bercovich, N., Légaré, J.S., Soudi, S., Burge, D.O., Huang, K., Ostevik, K.L., Drummond, E.B., Imerovski, I. and Lande, K., 2020. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584: 602-607.
Variables
- name: haploblock name
- chr: chromosome location of haploblock
- start: base pair position of the start of the haploblock
- end: base pair position of the end of the haploblock
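One common use of a coordinate table like this is checking whether a genomic position (for example a QTL peak or a swept region) falls inside a haploblock. The sketch below is a minimal version of that lookup. It assumes the file is tab-delimited with a header row using the four column names above, and that `start` and `end` are inclusive; both are assumptions, and the haploblock and chromosome names in the example are hypothetical.

```python
import csv
import io


def load_haploblocks(handle, delimiter="\t"):
    """Read haploblock rows into (name, chr, start, end) tuples."""
    blocks = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        blocks.append((row["name"], row["chr"], int(row["start"]), int(row["end"])))
    return blocks


def haploblocks_at(blocks, chrom, pos):
    """Return names of haploblocks on chrom whose interval contains pos.

    Interval ends are treated as inclusive (an assumption).
    """
    return [
        name
        for name, block_chr, start, end in blocks
        if block_chr == chrom and start <= pos <= end
    ]


# Hypothetical coordinates for illustration only; NOT values from the dataset.
example_table = io.StringIO(
    "name\tchr\tstart\tend\n"
    "hb_demo1\tchrA\t1000\t5000\n"
    "hb_demo2\tchrB\t200\t900\n"
)
blocks = load_haploblocks(example_table)
hits = haploblocks_at(blocks, "chrA", 2500)
```

If the real file turns out to be space-delimited, passing `delimiter=" "` should be enough to adapt the reader.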
File: inv_gt.WGS_RF.NEG1.txt
Description: Inversion genotypes for mapping population MON1. For genotyping scripts, see https://github.com/hkchi/Dune_parallelism
Variables
- haploblock: name of the haploblock (inversion)
- name: mapping population individual ID
- WGS-FR-genotype: genotype of the inversion (0=homozygous for wildtype, 1=heterozygous, 2=homozygous for inversion)
File: inv_gt.WGS_RF.NEG2.txt
Description: Inversion genotypes for mapping population MON2. For genotyping scripts, see https://github.com/hkchi/Dune_parallelism
Variables
- haploblock: name of the haploblock (inversion)
- name: mapping population individual ID
- WGS-FR-genotype: genotype of the inversion (0=homozygous for wildtype, 1=heterozygous, 2=homozygous for inversion)
File: inv_gt.WGS_RF.PET1.txt
Description: Inversion genotypes for mapping population GSD2. For genotyping scripts, see https://github.com/hkchi/Dune_parallelism
Variables
- haploblock: name of the haploblock (inversion)
- name: mapping population individual ID
- WGS-FR-genotype: genotype of the inversion (0=homozygous for wildtype, 1=heterozygous, 2=homozygous for inversion)
File: inv_gt.WGS_RF.PET2.txt
Description: Inversion genotypes for mapping population GSD1. For genotyping scripts, see https://github.com/hkchi/Dune_parallelism
Variables
- haploblock: name of the haploblock (inversion)
- name: mapping population individual ID
- WGS-FR-genotype: genotype of the inversion (0=homozygous for wildtype, 1=heterozygous, 2=homozygous for inversion)
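Because the genotype code counts copies of the inverted arrangement (0, 1, or 2), the frequency of the inversion within a mapping population is the sum of the codes divided by twice the number of genotyped individuals. The sketch below applies this to any of the four `inv_gt.WGS_RF.*.txt` files. It assumes the files are tab-delimited with a header row using the column names listed above, and that anything other than 0, 1, or 2 is a missing call; the example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def inversion_allele_freq(rows, gt_col="WGS-FR-genotype"):
    """Frequency of the inverted arrangement per haploblock.

    Genotype codes: 0 = homozygous wildtype, 1 = heterozygous,
    2 = homozygous inversion. Other values are skipped as missing
    (an assumption about how missing calls are encoded).
    """
    totals = defaultdict(lambda: [0, 0])  # haploblock -> [inverted copies, total copies]
    for row in rows:
        raw = (row.get(gt_col) or "").strip()
        if raw not in {"0", "1", "2"}:
            continue
        counts = totals[row["haploblock"]]
        counts[0] += int(raw)
        counts[1] += 2
    return {hb: inverted / total for hb, (inverted, total) in totals.items()}


# Hypothetical genotypes for illustration only; NOT values from the dataset.
example_table = io.StringIO(
    "haploblock\tname\tWGS-FR-genotype\n"
    "hb_demo1\tind1\t0\n"
    "hb_demo1\tind2\t1\n"
    "hb_demo1\tind3\t2\n"
    "hb_demo2\tind1\t2\n"
    "hb_demo2\tind2\t2\n"
    "hb_demo2\tind3\tNA\n"
)
freqs = inversion_allele_freq(csv.DictReader(example_table, delimiter="\t"))
```

The `gt_col` default follows the variable name given in this README; if the header in the files differs (for example in the order of `RF` and `FR`), pass the actual column name instead.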
File: MON_seed_size_data.csv
Description: Seed size data collected from plants grown in the wild and in a common garden
Variables
- experiment: whether seeds were collected from plants grown in the wild or in a common garden
- ecotype: the ecotypes (dune or non-dune) of the plant phenotyped
- pop: the population from which the phenotyped plant was collected
- mom: the individual ID of the plant phenotyped
- weight: the average weight of a seed (mg) produced by the plant
File: GSD_SS_GBS_NGM_GATK_total.vcf.gz
Description: An unfiltered vcf that includes genotype calls for all the individuals sequenced as part of the GSD seed size QTL mapping. For the genotyping scripts, see https://github.com/hkchi/Dune_parallelism
File: QTL_SS_data.csv
Description: Seed size data collected from all the individuals from MON and GSD mapping populations grown in a common garden
Variables
- type: variable based on map_pop, generation, and cytoplasm (see below)
- map_pop: which of the four mapping populations (MON1, MON2, GSD1, GSD2) the individual belongs to
- generation: the generation (parent, F1, F2) that the individual belongs to
- cytoplasm: whether the individual has a dune or non-dune cytoplasm (this depends on the direction of cross made)
- weight: the weight of an average seed (mg) produced by this individual
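Before any QTL analysis it can help to check how many individuals each mapping population contributes per generation and to pull out the F2 seed weights. The sketch below does both. It assumes a comma-delimited file with a header row matching the variable names above and that the F2 generation is labeled exactly `F2`; the function names and example rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_individuals(rows, keys=("map_pop", "generation")):
    """Count rows for each combination of the given columns."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


def f2_weights(rows, map_pop, f2_label="F2"):
    """Seed weights (mg) of F2 individuals from one mapping population.

    Rows with a blank or "NA" weight are skipped (assumed missing-value codes).
    """
    weights = []
    for row in rows:
        raw = (row.get("weight") or "").strip()
        if row["map_pop"] == map_pop and row["generation"] == f2_label and raw not in ("", "NA"):
            weights.append(float(raw))
    return weights


# Hypothetical rows for illustration only; NOT values from the dataset.
example_csv = (
    "type,map_pop,generation,cytoplasm,weight\n"
    "t1,MON1,parent,dune,5.0\n"
    "t2,MON1,F2,dune,4.0\n"
    "t2,MON1,F2,dune,6.0\n"
    "t3,GSD1,F2,non-dune,3.0\n"
)
rows = list(csv.DictReader(io.StringIO(example_csv)))
counts = count_individuals(rows)
mon1_f2 = f2_weights(rows, "MON1")
```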
File: MON_SS_GBS_NGM_GATK_total.vcf.gz
Description: An unfiltered vcf that includes genotype calls for all the individuals sequenced as part of the MON seed size QTL mapping. For the genotyping scripts, see https://github.com/hkchi/Dune_parallelism
File: NegPG_GBS_NGM_GATK_total.vcf.gz
Description: An unfiltered vcf that includes genotype calls for all the individuals sequenced for the GEA analyses at MON. For the genotyping scripts, see https://github.com/hkchi/Dune_parallelism
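Since the three `.vcf.gz` files are unfiltered and large, a quick first check is to list the sample names in the header without loading any genotype records. The sketch below uses only the Python standard library and relies on the standard VCF layout, in which the `#CHROM` header line has nine fixed columns followed by one column per sample. The function name and the tiny in-memory VCF are hypothetical; the dedicated genotyping and filtering steps are in the linked GitHub repository, not here.

```python
import gzip
import io


def vcf_sample_names(path_or_file):
    """Return sample names from the #CHROM header line of a gzipped VCF.

    Reading stops at the header, so no genotype records are parsed.
    """
    with gzip.open(path_or_file, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are CHROM..FORMAT; samples start at column 10.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data records without finding a header line
    raise ValueError("no #CHROM header line found")


# Tiny hypothetical VCF for illustration only; NOT content from the dataset.
demo_vcf = gzip.compress(
    (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n"
        "chrA\t100\t.\tA\tG\t50\t.\t.\tGT\t0/1\t1/1\n"
    ).encode()
)
samples = vcf_sample_names(io.BytesIO(demo_vcf))
```

For the real data, pass a file path such as `"NegPG_GBS_NGM_GATK_total.vcf.gz"` in place of the in-memory buffer.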
Access information
Other publicly accessible locations of the data:
- Andrew, R.L., Ostevik, K.L., Ebert, D.P. and Rieseberg, L.H., 2012. Adaptation with gene flow across the landscape in a dune sunflower. Molecular Ecology 21: 2078-2091.
- Todesco, M., Owens, G.L., Bercovich, N., Légaré, J.S., Soudi, S., Burge, D.O., Huang, K., Ostevik, K.L., Drummond, E.B., Imerovski, I. and Lande, K., 2020. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584: 602-607.
Data was derived from the following sources:
- vcfs were derived from sequences uploaded to the SRA (PRJNA1145483 and PRJNA1145296) and scripts found at https://github.com/hkchi/Dune_parallelism
Methods
This dataset includes:
- Habitat data collected from areas where wild H. petiolaris sunflowers grow
- Seed size data collected from H. petiolaris growing in the wild and under common garden conditions
- Unfiltered vcfs used for QTL mapping of seed size and for genotype-environment association analyses
- Genomic locations of previously identified haploblocks (inversions) within H. petiolaris
- Inversion genotypes for the QTL mapping populations
Please see the README file and associated publication for more complete descriptions of each dataset.