Genotype, phenotype and linkage data for Mimulus parishii x M. cardinalis hybrid incompatibility study
Data files
Aug 14, 2023 version files 3.01 MB
-
PCPZ_RILF2_Dataforarchiving.xlsx
3.01 MB
-
README_PCPZ_RILsF2.md
4.77 KB
Abstract
The evolution of genomic incompatibilities causing postzygotic barriers to hybridization is a key step in species divergence. Incompatibilities take two general forms – structural divergence between chromosomes leading to severe hybrid sterility in F1 hybrids and epistatic interactions between genes causing reduced fitness of hybrid gametes or zygotes (Dobzhansky-Muller incompatibilities). Despite substantial recent progress in understanding the molecular mechanisms and evolutionary origins of both types of incompatibility, how each behaves across multiple generations of hybridization remains relatively unexplored. Here, we use genetic mapping in F2 and RIL hybrid populations between the phenotypically divergent but naturally hybridizing monkeyflowers Mimulus cardinalis and M. parishii to characterize the genetic basis of hybrid incompatibility and examine its changing effects over multiple generations of experimental hybridization. In F2s, we found severe hybrid pollen inviability (< 50% reduction vs. parental genotypes) and pseudolinkage caused by a reciprocal translocation between Chromosomes 6 and 7 in the parental species. RILs retained excess heterozygosity around the translocation breakpoints, which caused substantial pollen inviability when interstitial crossovers had not created compatible heterokaryotypic configurations. Strong transmission ratio distortion and inter-chromosomal linkage disequilibrium in both F2s and RILs identified a novel two-locus genic incompatibility causing sex-independent gametophytic (haploid) lethality. The latter interaction eliminated three of the expected nine F2 genotypic classes via F1 gamete loss without detectable effects on the pollen number or viability of F2 double heterozygotes. Along with the mapping of numerous milder incompatibilities, these key findings illuminate the complex genetics of plant hybrid breakdown and are an important step toward understanding the genomic consequences of natural hybridization in this model system.
Methods
Study system and plant lines
The plants in this study were all derived from two highly (>10 generations) inbred lines of Sierran M. cardinalis (CE10) and M. parishii (PAR), which were also used in previous investigations of species barriers (Bradshaw et al. 1998; Schemske and Bradshaw 1999; Ramsey et al. 2003; Bradshaw and Schemske 2003; Fishman et al. 2013, 2015; Nelson et al. 2021a). We generated PAR x CE10 F1 hybrids by hand-pollination (with prior emasculation of the PAR seed parent in the bud) and F2 hybrids by self-pollination of F1 hybrids. The F2 hybrids were grown in two separate greenhouse common gardens at the University of Montana (UM-F2; total N = 524) and the University of Connecticut (UC-F2 N = 253), along with parental control lines, and were phenotyped for numerous floral and vegetative traits including the pollen fertility traits presented here. Recombinant inbred lines (RILs) were generated by single-seed-descent from additional F2 individuals grown at the University of Georgia and California State Polytechnic University, Pomona; a total of 167 RILs were formed through 3–6 generations of self-fertilization.
DNA extraction and sequencing
Genomic DNA was extracted from bud and leaf tissue of the greenhouse-grown F2 and RIL mapping populations using a CTAB chloroform protocol modified for 96-well plates (dx.doi.org/10.17504/protocols.io.bgv6jw9e). We used a double-digest restriction-site associated DNA sequencing (ddRADSeq) protocol to generate genome-wide sequence clusters (tags), following the BestRAD library preparation protocol (dx.doi.org/10.17504/protocols.io.6awhafe), using restriction enzymes PstI and BfaI (New England Biolabs, Ipswich, MA). Post-digestion, half plates of individual DNAs were labeled by ligation of 48 unique in-line barcoded adapters, then pooled for size selection. Libraries were prepared using NEBNext Ultra II library preparation kits for Illumina (New England BioLabs, Ipswich, MA). Each pool was indexed with a unique NEBNext i7 adapter and an i5 adapter containing a degenerate barcode and PCR amplified with 12 cycles. The F2 libraries were size-selected to 200-700bp using BluePippin 2% agarose cassettes (Sage Science, Beverly, MA) and sequenced (150-bp paired-end reads) in a partial lane of an Illumina HiSeq4000 sequencer at GC3F, the University of Oregon Genomics Core Facility. The RIL library was sequenced (150-bp paired-ends) without size-selection on an Illumina HiSeq4000 at Genewiz (South Plainfield, NJ).
Sequence processing and linkage mapping
After sequencing, two separate ddRAD datasets were analyzed: one with samples from both F2 populations (N = 283 UM-F2 hybrids with 3 M. parishii and 2 M. cardinalis controls, and 253 UC-F2 hybrids, with 3 each F1, M. parishii and M. cardinalis controls) and one with samples from the RIL population (N = 167). Samples from both datasets were demultiplexed using a custom Python script (dx.doi.org/10.17504/protocols.io.bjnbkman), trimmed using Trimmomatic (Bolger et al. 2014), mapped to the M. cardinalis CE10 v2.0 reference genome (http://mimubase.org/FTP/Genomes/CE10g_v2.0) using BWA MEM, and indexed using SAMtools (Li et al. 2009). The RIL dataset was also filtered in SAMtools using a mapping quality ≥ 29. We called SNPs in both datasets using HaplotypeCaller in GATK v3.3 in F2s, v4.1.8.1 in RILs (McKenna et al. 2010).
Next, we performed a series of filtering steps to generate sets of high-quality SNPs. In the F2 dataset, we filtered using vcftools (Danecek et al. 2011), retaining sites with read depth ≥ 5, mapping quality ≥ 10, and < 40% missing data. We also filtered out loci deviating from Hardy-Weinberg Equilibrium at a p-value < 0.00005. In the RIL dataset, we filtered a combined GVCF file using GATK, retaining sites with read depth ≥ 4*N (with N = number of RIL samples) and < the mean + 2 * the standard deviation, QD score < 2.0, FS score > 60, MQ < 40, MQ rank sum < -12.5, ReadPosRankSum < -8.0, and < 10% missing genotypes. For both datasets, we used custom scripts to remove sites that were not polymorphic in the parents and heterozygous in the F1 hybrids (F2: https://github.com/bergcolette/F2_genotype_processing). We excluded individuals from the F2 dataset with > 10% missing data and from the RIL dataset with low coverage, high missingness, or excessive heterozygosity (> 50%, indicating line contamination). These filtering steps produced an F2 dataset with 18,119 SNPs (N = 252 UM-F2 and 253 UC-F2) and a RIL dataset with 47,851 SNPs (N = 145).
To produce sets of high-quality marker genotypes for mapping, we binned each dataset into 18-SNP windows using custom Python and R scripts (provided at GitHub links above), requiring ≥ 8 sites to have SNP genotype calls to assign a windowed genotype. In the F2 binning script, M. cardinalis homozygotes were coded as 2, M. parishii homozygotes as 0, and heterozygotes as 1. We called windows with mean values < 0.2 as parishii homozygotes, > 1.8 as cardinalis homozygotes, and between 0.8 and 1.2 as heterozygotes. Windows with means outside of these ranges were coded as missing genotypes. For the RILs, we required ≥ 88% of SNP calls to match each other to assign each parental homozygous genotype (e.g., 16/18 sites = homozygous for M. cardinalis alleles; https://github.com/vasotola/GenomicsScripts).
We generated linkage maps for each dataset using Lep-MAP3 (Rastas 2017). First, we used the SeparateChromosomes2 module to assign markers to linkage groups (F2: LodLimit = 25, theta = 3, RIL: LodLimit = 28, theta = 0.2). In the RIL dataset, 10 markers were assigned to linkage groups inconsistent with the reference genome assembly; we manually re-assigned these markers to linkage groups corresponding to their reference assembly chromosomes. Next, we performed iterative ordering using the OrderMarkers2 module (Kosambi mapping function; 6 iterations/per linkage group in the F2s, 10 in the RILs); the order with the highest likelihood for each linkage group was chosen. This resulted in an F2 map with 997 markers in seven linkage groups and a RIL map with 2,535 markers in eight linkage groups. In the RIL dataset, the genotype matrix output by Lep-MAP3 differed in two important respects from the input file. First, due to stringent thresholds for calling windowed genotypes, our input file includes a high percentage of missing data (23% of genotypes are coded as ‘no call’), whereas the output file contains no missing data (Lep-MAP3 converts each ‘no call’ genotype to a called genotype). Second, the Lep-MAP3 output file contains more heterozygous genotype calls than the input file. The reason for this increase in heterozygosity is that Lep-MAP3 disproportionately converts ‘no call’ genotypes to heterozygotes: relative to the input file, the output genotype matrix includes 115% more heterozygotes, compared to only 18% more M. cardinalis homozygotes and 20% more M. parishii homozygotes. Notably, Lep-MAP3 frequently converted ‘no call’ genotypes to heterozygotes when they occur at single markers between recombination breakpoints. Because most recombinational switches in this RIL population are between alternative homozygotes, any window that contains an actual breakpoint will carry a mixture of M. cardinalis and M. parishii homozygotes at the 18 SNPs (and thus be coded as ‘no call’ in our windowed genotype matrix). To circumvent these problems, for all downstream analyses, we used a modified version of the genotype matrix output from Lep-MAP3 in which genotypes were recoded as ‘no call’ as in the input file.
QTL mapping of pollen traits
In the UM-F2 and RIL populations, we directly assessed male fertility by collecting all four anthers of the first flower from each plant into 50 ml of lactophenol-aniline blue dye. We counted viable (darkly stained) and inviable (unstained) pollen grains using a hemocytometer (≥ 100 grains/flower). We estimated total pollen grains per flower (count per ml x 50) and pollen viability as viable grains/total counted. For a handful of individuals with < 100 pollen grains counted, pollen viability was scored as missing data. Phenotyping in the RIL population was performed on siblings or selfed progeny of the individuals genotyped. In a few cases in the RILs (N = 12), the first flower had < 100 pollen grains, so both traits were instead collected from the second flower.
We mapped pollen QTLs in Windows QTL Cartographer 2.5 (Wang et al. 2012) using composite interval mapping (CIM; Zeng 1993, 1994), with forward-backward stepwise regression, a window size of 10 cM, five background markers, and a 1-cM walk speed. We used permutations (N = 1000) to set genome-wide significance thresholds for QTL peaks and calculated 1.5-LOD-drops to determine confidence intervals for QTL locations. Because pollen viability QTLs exhibited complex interactions in F2s (see Results), we directly estimated QTL effects (at each peak marker) and interactions using the Generalized Linear Model module in JMP16 (SAS Institute, Cary, NC), starting with a full factorial model of all QTLs and 2-way interactions and removing nonsignificant (P > 0.05) interactions. To test for pollen viability signatures of a two-locus gametic incompatibility (LG4-LG8) detected from TRD and LD patterns, we contrasted Least Squared Means for each extant F2 class from an ANOVA also including the large-effect translocation breakpoint marker at 60.55Mb on Chr 6.
QTL mapping of underdominant effects in RILs has low power due to (relatively) few heterozygotes, plus most RIL QTL-mapping algorithms (including those in WinQTLCart) exclude non-homozygous genotypes. To directly test for persistent underdominant effects of the Chr 6-7 translocation in RILs, we screened for genotype-pollen viability associations at uniquely positioned markers with a joint sample size >70 across the breakpoint region (57.07-61.08 Mb of Chr6 and 7.51- 13.05 of Chr 7; n = 42), using t-tests in the response screening module in JMP16 (SAS Institute, Cary, NC). However, we caution that because RIL hybrid male sterility was measured on siblings or descendants of the genotyped individuals, underdominant effects will likely be underestimated. That is, regions genotyped as heterozygous might actually be homozygous in phenotyped individuals, potentially explaining why a few of them are highly fertile. We conservatively controlled for multiple tests by using an FDR-corrected P-value of 0.05.
Usage notes
The data are supplied as a multi-sheet Excel workbook. They should be openable by text handlers such as BBedit or similar.