This Gramlich_et al_README.txt file was generated on 2022-05-16 by Sophie Karrenberg


GENERAL INFORMATION

1. Title of Dataset: A polygenic architecture with habitat-dependent effects underlies ecological differentiation

2. Author Information

   Principal Investigator, corresponding investigator
      Name: Sophie Karrenberg
      Institution: Uppsala University
      Address: Norbyvägen 18D, 75236 Uppsala, Sweden
      Email: sophie.karrenberg@ebc.uu.se

   Co-investigator 1
      Name: Susanne Gramlich

   Co-investigator 2
      Name: Xiaodong Liu

   Co-investigator 3
      Name: Adrien Favre

   Co-investigator 4
      Name: C. Alex Buerkle

3. Date of data collection (single date, range, approximate date): 2011-2019

4. Geographic location of data collection:
   Valais, Switzerland (phenotype data)
   Uppsala, Sweden (genetic data)

5. Information about funding sources that supported the collection of the data:
   Swiss National Science Foundation (SNF, no. 3100A-118221)
   Swedish Research Council (Vetenskapsrådet, no. 2012-03622)
   Carl Tryggers foundation to SK (CTS 17:249)
   German Science Foundation (Deutsche Forschungsgemeinschaft, project no. FA1117/1-2)


SHARING/ACCESS INFORMATION

1. Links/relationships to ancillary data sets:
   Double-digest RAD (ddRAD) sequencing data are available on NCBI’s Sequence Read Archive, SRA (accession number SRP287913; BioProject PRJNA669447, https://www.ncbi.nlm.nih.gov/sra/PRJNA669447).

2. Was data derived from another source? No

3. Recommended citation for this dataset:
   Gramlich S, Liu X, Favre A, Buerkle CA, Karrenberg S (2017) Data from: A polygenic architecture with habitat-dependent effects underlies ecological differentiation in Silene. Dryad Digital Repository. https://doi.org/10.5061/dryad.4tmpg4fcn


DATA & FILE OVERVIEW

1.
File List:

   File 1 Name: SD_cum_flowering.txt
   File 1 Description: cumulative flowering of hybrids grown in the Silene dioica habitat, input file for association analysis in GEMMA

   File 2 Name: SL_cum_flowering.txt
   File 2 Description: cumulative flowering of hybrids grown in the Silene latifolia habitat, input file for association analysis in GEMMA

   File 3 Name: SDprunedGemma.txt
   File 3 Description: genotype probabilities of hybrids grown in the Silene dioica habitat, input file for association analysis in GEMMA

   File 4 Name: SLprunedGemma.txt
   File 4 Description: genotype probabilities of hybrids grown in the Silene latifolia habitat, input file for association analysis in GEMMA

   File 5 Name: SD_GEMMA_input.vcf
   File 5 Description: genetic sequence variation (variant call format) of hybrids grown in the Silene dioica habitat

   File 6 Name: SL_GEMMA_input.vcf
   File 6 Description: genetic sequence variation (variant call format) of hybrids grown in the Silene latifolia habitat

   File 7 Name: F0.vcf
   File 7 Description: genetic sequence variation (variant call format) of parental individuals of Silene dioica and Silene latifolia

   File 8 Name: F0_species.txt
   File 8 Description: identification of the species in F0.vcf (File 7)

2. Are there multiple versions of the dataset? no


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data:

Cumulative flowering was assessed over 4 years in recombinant second-generation interspecific hybrids transplanted into two different habitats (Silene dioica habitat and Silene latifolia habitat) in a randomised block design. Values were standardised within each of the two sites per habitat type. Hybrids were sequenced using double-digest restriction-site associated DNA sequencing (ddRADseq); methods are described in:

   Favre A, Widmer A, Karrenberg S. 2017. Differential adaptation drives ecological speciation in campions (Silene): evidence from a multi-site transplant experiment. New Phytologist 213(3): 1487-1499.

   Liu X, Karrenberg S. 2018.
   Genetic architecture of traits associated with reproductive barriers in Silene: Coupling, sex chromosomes and variation. Molecular Ecology 27(19): 3889-3904.

   Gramlich S, Liu X, Favre A, Buerkle CA, Karrenberg S. 2022. A polygenic architecture with habitat-dependent effects underlies ecological differentiation in Silene. New Phytologist, accepted; pre-print published on bioRxiv, doi: https://doi.org/10.1101/2021.07.06.451304

2. Methods for processing the data:

Cumulative flowering is the number of years a plant flowered, plus 1 if the plant survived to the end of the experiment (Favre et al. 2017, Gramlich et al. 2022).

Genotype probabilities were calculated from genotype likelihoods (script bcf2bbgeno.pl, https://github.com/visoca/popgenomworkshop-gwas_gemma/tree/master/scripts/bcf2bbgeno.pl, accessed 28 January 2019). Genotype probabilities of 0 and 2 denote homozygotes for the reference and alternative allele, respectively, while 1 indicates heterozygosity. The reference allele in our ddRAD-seq reference can be derived from either species (Gramlich et al. 2022).

We processed the ddRAD-sequence reads following the dDocent pipeline (Puritz et al., 2014). After de-multiplexing of raw reads using STACKS 2.0b (Catchen et al., 2013) and trimming with fastp (Chen et al., 2018), we used BWA MEM 0.7.17 with default parameters (Li, 2013) to map reads to ddRAD-seq-generated reference contigs. These contigs were previously assembled from eight deeply sequenced individuals of both species and hybrids and comprise 95,040,562 bp in total, approximately 3.4% of the S. latifolia genome (Liu & Karrenberg, 2018; Liu et al., 2020). Variants were called with FreeBayes 1.1.0 (Garrison & Marth, 2012) without population priors using the following parameters: minimum mapping quality 30, minimum base quality 20, maximum complex gap 3, minimum repeat entropy 1, binomial-obs-priors 1, and use-best-n-alleles 10.
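The definitions of cumulative flowering and of the genotype values given above can be illustrated with a minimal Python sketch. It is not the code used to produce the data files: the function names are hypothetical, and the conversion from likelihoods assumes phred-scaled likelihoods for the three genotypes and a uniform prior, which may differ from what bcf2bbgeno.pl does.

```python
from typing import Sequence


def cumulative_flowering(flowered: Sequence[bool], survived: bool) -> int:
    """Number of years with flowering, plus 1 if the plant survived
    to the end of the experiment (definition from this README)."""
    return sum(bool(f) for f in flowered) + (1 if survived else 0)


def mean_genotype(phred_likelihoods: Sequence[float]) -> float:
    """Expected number of alternative alleles (0 to 2).

    ASSUMPTION: the input holds phred-scaled likelihoods in the order
    (ref/ref, ref/alt, alt/alt) and a uniform genotype prior is used.
    """
    # Convert phred-scaled values back to relative likelihoods.
    raw = [10 ** (-pl / 10) for pl in phred_likelihoods]
    total = sum(raw)
    _p_ref, p_het, p_alt = (r / total for r in raw)
    # Heterozygotes carry one alternative allele, alt homozygotes two.
    return p_het + 2 * p_alt
```

For example, a plant that flowered in three of four years and survived to the end of the experiment has a cumulative flowering value of 4 before standardisation.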
References for data processing:

   Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA. 2013. Stacks: an analysis tool set for population genomics. Molecular Ecology 22: 3124-3140.

   Chen S, Zhou Y, Chen Y, Gu J. 2018. Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34(17): i884-i890.

   Favre A, Widmer A, Karrenberg S. 2017. Differential adaptation drives ecological speciation in campions (Silene): evidence from a multi-site transplant experiment. New Phytologist 213(3): 1487-1499.

   Garrison E, Marth G. 2012. Haplotype-based variant detection from short-read sequencing. http://arxiv.org/abs/1207.3907

   Gramlich S, Liu X, Favre A, Buerkle CA, Karrenberg S. 2022. A polygenic architecture with habitat-dependent effects underlies ecological differentiation in Silene. New Phytologist, accepted; pre-print published on bioRxiv, doi: https://doi.org/10.1101/2021.07.06.451304

   Liu X, Glémin S, Karrenberg S. 2020. Evolution of putative barrier loci at an intermediate stage of speciation with gene flow in campions (Silene). Molecular Ecology 29(18): 3511-3525.

   Liu X, Karrenberg S. 2018. Genetic architecture of traits associated with reproductive barriers in Silene: Coupling, sex chromosomes and variation. Molecular Ecology 27(19): 3889-3904.

   Puritz JB, Hollenbeck CM, Gold JR. 2014. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms. PeerJ 2: e431.


DATA-SPECIFIC INFORMATION FOR: SD_cum_flowering.txt

1. Number of variables: 1
2. Number of cases/rows: 134
3. Variable List:
   standardised cumulative flowering (see description above); individuals are in the same order as in SDprunedGemma.txt


DATA-SPECIFIC INFORMATION FOR: SL_cum_flowering.txt

1. Number of variables: 1
2. Number of cases/rows: 156
3. Variable List:
   standardised cumulative flowering (see description above); individuals are in the same order as in SLprunedGemma.txt


DATA-SPECIFIC INFORMATION FOR: SDprunedGemma.txt

1. Number of variables: 137
2.
Number of cases/rows: 42090
3. Variable List:
   Name of variant
   Reference allele
   Alternative allele
   134 variables corresponding to 134 individuals; individuals are in the same order as in SD_cum_flowering.txt


DATA-SPECIFIC INFORMATION FOR: SLprunedGemma.txt

1. Number of variables: 159
2. Number of cases/rows: 42090
3. Variable List:
   Name of variant
   Reference allele
   Alternative allele
   156 variables corresponding to 156 individuals; individuals are in the same order as in SL_cum_flowering.txt


DATA-SPECIFIC INFORMATION FOR: SD_GEMMA_input.vcf

Genetic sequence variation at 42090 loci (variant call format) for 134 hybrids grown in the Silene dioica habitat.


DATA-SPECIFIC INFORMATION FOR: SL_GEMMA_input.vcf

Genetic sequence variation at 42090 loci (variant call format) for 156 hybrids grown in the Silene latifolia habitat.


DATA-SPECIFIC INFORMATION FOR: F0.vcf

Genetic sequence variation at 42090 loci (variant call format) for 18 Silene dioica and 18 Silene latifolia individuals (see F0_species.txt).


DATA-SPECIFIC INFORMATION FOR: F0_species.txt

Species identity for the 36 individuals in F0.vcf; SD: Silene dioica, SL: Silene latifolia.
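

EXAMPLE: CHECKING THAT PHENOTYPE AND GENOTYPE FILES LINE UP

Because the phenotype and genotype files are matched only by the order of individuals, it can be useful to confirm the dimensions before running an association analysis. The following Python sketch is one possible way to do this. It rests on assumptions not stated in this README: that the genotype files have no header line, that fields are separated by commas or whitespace, and that missing values (if any) are coded as NA. All function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# (variant name, reference allele, alternative allele, per-individual values)
Variant = Tuple[str, str, str, List[float]]


def _to_float(token: str) -> float:
    # ASSUMPTION: missing values, if present, are written as "NA".
    return float("nan") if token.upper() == "NA" else float(token)


def parse_phenotypes(lines: Iterable[str]) -> List[float]:
    """One phenotype value per non-empty line."""
    return [_to_float(line.strip()) for line in lines if line.strip()]


def parse_mean_genotypes(lines: Iterable[str]) -> List[Variant]:
    """Parse rows of: variant name, ref allele, alt allele, then one
    value per individual. ASSUMPTION: comma- or whitespace-separated."""
    variants: List[Variant] = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if not fields:
            continue  # skip blank lines
        name, ref, alt = fields[:3]
        variants.append((name, ref, alt, [_to_float(f) for f in fields[3:]]))
    return variants


def check_alignment(phenotypes: List[float], variants: List[Variant]) -> None:
    """Raise ValueError if any variant row does not have exactly one
    value per phenotyped individual."""
    n = len(phenotypes)
    for name, _ref, _alt, values in variants:
        if len(values) != n:
            raise ValueError(
                f"{name}: {len(values)} genotype values, {n} phenotypes"
            )
```

To use it, pass open file handles, for example parse_phenotypes(open("SD_cum_flowering.txt")) and parse_mean_genotypes(open("SDprunedGemma.txt")). Under the assumptions above, these should give 134 phenotype values and 42090 variants with 134 values each.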