Rapid sperm length divergence in a polygynandrous passerine: a mechanism of cryptic speciation?
Data files
Aug 15, 2023 version files 414.90 MB
-
Dataset.S1.xlsx
-
Dataset.S2.R
-
Dataset.S2.txt
-
Dataset.S3.fa.gz
-
Dataset.S4.vcf
-
Dataset.S5.XML
-
Dataset.S6.tre
-
Dataset.S7.fas
-
Dataset.S8.tre
-
README._Dataset.S2.md.txt
-
README.md
Abstract
When populations become geographically isolated, they begin to diverge in various traits and at variable rates. The dynamics of such trait divergences are relevant for understanding evolutionary processes such as local adaptation and speciation. Here we examine divergences in sperm and body structures in a polygynandrous songbird, the alpine accentor (Prunella collaris) between two allopatric high-altitude populations, in Morocco and Spain. The populations diverged around 82,000 years ago, as estimated with a coalescence-based phylogenetic analysis of genome-wide single nucleotide polymorphisms. We found that birds in the two areas had non-overlapping sperm lengths. This suggests adaptation to divergent female reproductive tract environments. Sperm length also showed an exceptionally low coefficient of among-male variation, a signal of strong stabilizing selection imposed by sperm competition. The evolutionary rate of sperm length was almost twice the rates for the most divergent morphological traits, and more than three times higher than expected from literature data over a similar generational timescale. This rapid evolution of a key reproductive trait has implications for reproductive isolation and ultimately for speciation. Strong selection for different sperm length optima in allopatry predicts conspecific sperm precedence and disruptive selection in sympatry, hence a possible postcopulatory prezygotic barrier to gene flow.
Methods
Fieldwork was conducted in June-July 2012–2015 in the Picos de Europa mountain range in northern Spain, at altitudes above 2,000 m, and in June-July 2013–2015 in the High Atlas, central Morocco, at altitudes above 3,000 m. The data set on phenotypic traits (body size and sperm size components) contains measurements of 28 breeding males in the Spanish population and 9 males in the Moroccan population.
For each bird we measured the following body structures: right wing chord (straightened and flattened), length of the right primary wing feathers and tail length using a ruler (precision ± 0.5 mm); right tarsus length and bill length using a digital caliper (precision ± 0.1 mm); and body mass using a digital weighing scale (precision ± 0.1 g). The length of primaries were used to calculate three wing shape indices, isometric size, pointedness, and convexity, following the formulas in Lockwood et al. (1998), which are based on size-constrained component analysis (SCCA). These indices rely on the length of individual primaries (P1 – P8), numbered ascendingly from the distal edge of the wing while omitting the vestigial-outermost primary feather.
Sperm samples were collected by cloacal massage, fixed in 5% phosphate-buffered formaldehyde solution, smeared on a glass slide, photographed at 320x in a Leica DM6000B microscope with a digital camera (Leica DFC420), and the length of sperm components measured using Leica Application Suite software v4.13. We measured the length of the sperm head, midpiece, and the tail (i.e., the midpiece-free end of the flagellum). The sum of the three measurements gives the total sperm length, and the sum of the midpiece and tail equals the flagellum length. We measured 10 sperm cells per male and used the mean value for each segment. We also calculated the coefficient of variation of sperm total lengths among the 10 sperm cells from the same male (CVwm) and among males (CVam) using their mean values with the correction factor of (1+ 1/4N) for variable sample sizes.
Divergence in each phenotypic trait between the two study populations was quantified as Hedges’ g (Hedges, 1981). This index expresses the difference between means in units of the pooled standard deviation of the trait. The rate of divergence for each trait was expressed in haldanes following Gingerich (2001) using logged measurements (natural logarithms). This rate is calculated as the difference between the population means expressed in standard deviations (c.f. Hedges’ g), divided by the number of generations since their last common ancestor. We calculated the number of generations by dividing the estimated time of divergence (= 82,200 years, see methods below) by a generation length of 2.68 years for P. collaris (Bird et al., 2020).
The raw data and the calculations of wing shape indices, Hedges' g and haldanes are combined in an Excel file (Dataset.S1.xlsx). The file gives the underlying data and calculations for Table 1 and Figure 2 in the Evolution article.
Sperm length CVam is negatively correlated with the risk of sperm competition measured as the frequency of extrapair young (EPY) in socially monogamous passerines (Calhim et al., 2007; Kleven et al., 2008; Lifjeld et al., 2019; Lifjeld et al., 2010). It is generally assumed that the association reflects a causal relationship where sperm competition acts as a force of stabilizing selection that reduces the trait variance around an optimum value. For an estimation of sperm competition risk in P. collaris, we examined paternity data in two studies (Hartley et al., 1995; Heer, 2013) and assumed that the number of offspring not sired by the alpha male in the polygynandrous groups would be equivalent to offspring not sired by the pair male (EPY = extrapair young) in a socially monogamous pair. Hartley et al. (1995) found that 41 of 110 (37.3%) young were not sired by the alpha male, and Heer (2013) reported 44 of 72 (61.1%) young. We used the average percentage of the two studies (= 49.2%) as our estimate of sperm competition risk. The sperm length CVam measured in the two P. collaris populations were compared with those of 58 other Passerides species for which data on the frequency of EPY exist. Data for these 58 species were extracted from Lifjeld et al. (2019, Table S3). The data set for this comparison and the R-code for the generation of Figure 3 in the Evolution article can be found in Dataset.S2.txt and Dataset.S2.R.
We used a whole genome sequencing approach to estimate the time of divergence between the two P. collaris populations. We sequenced the whole genomes of seven individuals: three males from the Spanish population, three males from the Moroccan population, and a male P. modularis from Norway. The sequences for a male P. himalayana were downloaded from the short read archive of NCBI (https://www.ncbi.nlm.nih.gov/sra) with accession numbers SRR9946605, SRR9946606, and SRR9946608 (Feng et al., 2020).
DNA was extracted with the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer’s protocol. DNA library preparation and sequencing (150 bp, paired end) on an Illumina Novaseq SP flowcell were performed by the Norwegian Sequencing Centre at Oslo University Hospital. The sequencing output yielded between 76 and 132 million read pairs per individual, while the downloaded P. himalayana had 180 million read pairs, corresponding to greater than 20x sequencing depth for all individuals. Adapter removal and quality trimming were performed with cutadapt v.3.5 with a quality threshold of 20 and a minimum length threshold of 90 (Martin, 2011). Reads from one Moroccan individual were used to assemble a draft genome for P. collaris using MaSuRCA v. 4.0.8 (Zimin et al., 2013). This draft genome is uploaded as Dataset.S3.fa.gz.
Reads were mapped to the P. collaris genome using Bowtie2 v. 2.4.5 (Langmead & Salzberg, 2012) and converted to sorted binary files using SAMtools v. 1.15 (Danecek et al., 2021; Li et al., 2009). The mpileup function implemented in BCFtools v. 1.15.1 (Danecek et al., 2021) was used for SNP calling with a threshold of 10 for minimum mapping quality to obtain genotype likelihoods, and the call function with the consensus caller option (-c) was used to call the genotypes. Genotypes were further filtered by retaining only loci where: the locus was a bi-allelic SNP, the coverage depth was between 100 and 300 over all individuals, the SNP was more than 20 bases away from an indel, had a depth of 8 reads or greater for an individual, had a genotype quality greater than 10, and was present in all individuals. Both BCFtools and VCFtools v. 1.1.16 (Danecek et al., 2011) were used for filtering. This filtering retained 24,070,775 SNPs scored in all individuals. These SNPs were further pruned by keeping only one SNP (the first SNP) every 2,000 bases, leaving 473,945 SNPs. This filtered and pruned SNP file is uploaded as Dataset.S4.vcf.
We performed a Bayesian phylogenetic analysis on the SNP data using SNAPP v. 1.6.1 (Bryant et al., 2012), an add-on package for the program BEAST2 v.2.7.3 (Bouckaert et al., 2019). The program calculates the probability of a species tree by mathematically integrating over all possible gene trees from the individual SNP markers. Branch lengths can be converted to a time scale by specifying a time constraint on one or more divergence events in the tree following Stange et al. (2018). We used a divergence time of 7.3 mya as a calibration point for the split between P. modularis and P. collaris (Drovetski et al., 2013; Zang et al., 2023), and defined it through a lognormal prior distribution with a mean of 7.3 mya and a standard deviation of 0.138.
The script prep_snapp.rb of Stange et al. (2018) was used to prepare the XML format input file for SNAPP (Stange et al., 2018) with further pruning to keep only 100,000 SNPs out of the 473,945 SNPs. The Markov-chain Monte Carlo (MCMC) process was run in three replicates with a chain length of 100 million and results were stored every 25,000 iterations. Since the populations of P. collaris are restricted to disjunct high alpine areas while P. himalayana and P. modularis have much wider and continuous distributions, we assumed that the population sizes were not equal and thus allowed variable population sizes in the model for divergence by using the parameters. We did so by modifying the XML file, adding SNAPPs “GammaMover” and “RateMixer” operators to allow changes of each branch’s population size over the course of the MCMC. The input file for the SNAPP analysis is uploaded as Dataset.S5.XML.
The SNAPP log files were checked in Tracer v 1.7.2 (Rambaut et al., 2014), ensuring that the effective samples sizes (ESS) of all model parameters were >200. The log and trees files from the three replicate SNAPP runs were combined in LogCombiner, a utility program within BEAST2 (Bouckaert et al., 2019). Node ages (mean height) and credible intervals (95% highest posterior density [HPD]) were estimated together with the maximum clade credibility tree in TreeAnnotator v 1.7.4 (Drummond et al., 2012) and visualized in FigTree v 1.4.4 (Rambaut, 2017). The resulting input file for visualizing the maximum credibility tree with node ages and credible intervals in FigTree, as basis for Figure 4 in the Evolution article, is uploaded as Dataset.S6.tre.
We reconstructed the full mitogenomes from the short-read sequences using MitoBim (Hahn et al., 2013), and using the mitogenome of P. himalayana (Genbank GenBank acc. NC_053082.1) as a bait sequence. This sequence stems from the same voucher specimen as the one used in the SNP analysis (Feng et al., 2020). Sequences were aligned with MUSCLE (Edgar, 2004) in MEGA11 (Tamura et al., 2021) using default settings. Mitochondrial divergence times were then estimated with BEAST2 (Bouckaert et al., 2019) using the same constraints of 7.3 mya divergence time for the split between P. modularis and P. collaris. We applied the GTR model of sequence evolution, a strict-clock model, and a Yule tree prior, and ran two replicate analyses, each with an MCMC chain length of 10 million iterations. Posterior tree distributions were analyzed with Tracer and visualized in FigTree as described above. The alignment of the eight mitogenomes is available as Dataset.S7.fas, and the input file for the FigTree visualization of the mitochondrial gene tree (Figure 5 in the Evolution article) is uploaded as Dataset.S8.tre.