Genome-wide signatures of synergistic epistasis during parallel adaptation in a Baltic Sea copepod
Data files
Jun 15, 2022 version files 232.67 MB
Abstract
The role of epistasis in adaptive evolution has remained an unresolved problem dating back to the Evolutionary Synthesis. This role is now being revisited due to its relevance for polygenic adaptation. In the absence of epistasis, polygenic adaptation is predicted to result in non-parallel evolution, because repeated selection could act on subsets of effectively redundant alleles. However, positive epistatic interactions among adaptive alleles would make the alleles non-redundant and selection for particular allelic combinations could drive parallel evolution. The inability to address this fundamental question might arise from traditional approaches lacking the power to capture the genomic architecture and dynamics of polygenic adaptation. To address this problem, we employed a replicated and controlled evolution experiment using the copepod Eurytemora affinis to elucidate the evolutionary response architecture to rapid salinity decline, a predicted consequence of global climate change in higher latitudes. Based on time-resolved pooled whole-genome sequencing, we uncovered a remarkably parallel response, despite polygenic adaptation involving over 1000 loci across ten replicate selection lines. Interestingly, single-nucleotide polymorphism (SNP) frequencies converged during the experiment, far beyond expectations, resulting in replicate lines sharing 93.1% of selected alleles. Using simulations, we found that this polygenic parallelism was consistent with synergistic epistasis among alleles responding in concert across replicate lines, a phenomenon that may be common for selection on complex physiological traits. Furthermore, we found that the same SNPs with signatures of selection in the laboratory also exhibited signatures of selection across a natural salinity gradient in the Baltic Sea. Our study provides the first experimental evidence that polygenic adaptation can actually be highly repeatable at the genomic level, given the presence of synergistic epistasis among the loci under selection.
Methods
A draft genome for Eurytemora affinis complex was constructed from long read and long-range sequencing technology. To generate genomic data for assembly, an inbred line was generated from 30 generations of full-sibling mating of copepods derived from a saline population in Baie de L’Isle Verte, St. Lawrence marsh, Quebec, Canada (Atlantic clade, aka E. carolleae). Prior to DNA extraction, the culture was treated with a series of antibiotics to reduce bacterial contamination, including Primaxin (20mg/l), Voriconazole (0.5 mg/l for at least 2 weeks prior to DNA extraction), D-amino acids to reduce biofilm (10 mM D-methionine, D-tryptophan, D-leucine, and 5 mM D-tyrosine, for at least for 2 weeks prior to DNA extraction). DNA was extracted from pooled adult copepods and used to generate Pacific Biosciences (PacBio) long-read data (2.65 million subreads, 30.2 Gb, read N50 = 19.25 kb), Dovetail Chicago® (213 million 150 bp read pairs) and Hi-C (171 million 150 bp read pairs) Illumina HiSeq X data.
PacBio reads were assembled into contigs using wtdbg v2.5 and polished with Racon v1.4.3. This procedure resulted in an assembly of size 575 Mb in 3013 contigs with an N50 of 951 kb, an N90 of 106 kb, and maximum contig length of 7.7 Mb. Contigs were scaffolded with HiRiseTM (Dovetail Genomics) using Dovetail Chicago® and Hi-C data. This procedure resulted in a highly contiguous assembly with 90% of the genome present in four scaffolds (1706 total scaffolds, 0.02% gaps), likely corresponding to the four chromosomes of E. affinis observed in karyotype (data not shown). These 4 scaffolds consisted of 518 Mb of sequence. Gene annotations were generated by mapping annotations from the previously published, short-read-assembled genome of E. affinis complex using LiftOff v1.6.1.
European E. affinis populations, the subject of the laboratory selection experiment, are highly genetically divergent from North American populations of the E. affinis complex, from which the draft genome was derived. Therefore, to maximize the number and accuracy of SNP calls, a pseudo-reference genome was assembled for SNP calling using an approach used in a previous study. This approach uses a reference transcriptome as an anchor to assemble Pool-seq data around coding regions, generating a reference assembly that includes coding sequences and surrounding sequence with putative regulatory function.
To generate RNA-seq data for assembly of the reference transcriptome, approximately 100 individuals of all life stages were collected from the European E. affinis laboratory culture derived from Kiel, Germany. Animals were pooled and subjected to a Trizol/Qiagen RNeasy hybrid total RNA extraction protocol. An mRNA sequencing library was prepared using the Illumina TruSeq Stranded mRNA kit and sequenced on an Illumina HiSeq 4000 sequencer, generating approximately 186 million 100 bp paired-end reads. Raw RNA-seq reads were processed using BBDuk in the BBtools package (https://jgi.doe.gov/data-and-tools/bbtools) to filter adapter sequences, low complexity sequences, and low quality (Q < 10) bases using a sliding window (ktrim=r k=23 mink=11 hdist=1 qtrim=w trimq=10 minlen=36 entropy=0.01 entropywindow=50 entropyk=5 tbo).
To filter sequences that may have originated from microbial contamination prior to assembly, reads were mapped against a database of reference and representative bacterial, archaeal, and fungal genomes from the NCBI RefSeq using Bowtie 2 v2.3.5 and were removed if they mapped concordantly. Cleaned reads were assembled using Trinity v2.6.6 in strand-specific mode. mRNA expression estimates were made using RSEM v1.3.1 with cleaned reads mapped to the Trinity assembly with Bowtie 2. Only the most highly expressed Trinity ‘isoform’ per ‘gene’ was retained to obtain the most highly supported transcriptome assembly. Transcript sequences were clustered to 95% similarity using CD-HIT v4.7 to reduce redundancy associated with allelic variation and assembly errors. Transdecoder v5.5 was used to predict open reading frames and coding sequences. Predictions included blastp hits to annotated proteins of the E. affinis complex draft genome and HMMER v3.2.1 (http://hmmer.org) hits to the PFAM protein database.
The resulting non-redundant transcriptome assembly was used to anchor the de-novo assembly of Pool-seq data around coding regions to obtain a pseudo-reference genome. First, the ‘left’ pairs of Pool-seq reads from the starting laboratory population were mapped to the coding sequences of the transcriptome with BWA-MEM v0.7.17, retaining reads with a mapping quality > 20. The corresponding ‘right’ pairs of the mapped reads were extracted and the mapped reads were assembled using Trinity v. 2.6.6, as Trinity was developed to assemble highly variable sequences with potentially uneven coverage. The resulting assembly was clustered to 95% similarity using CD-HIT v4.7. The above mapping and assembly procedures were repeated one additional time, mapping reads to the Trinity-assembled Pool-seq data rather than to the transcriptome. In this way, the assembly was extended into genomic regions surround the transcriptome-derived coding sequences. Finally, only assembled contigs with a significant BLAST hit (E-value < 0.001) to the draft genome (see above) were retained to further eliminate assembly errors. The pseudo-reference genome ultimately spanned a greater portion of the genome with greater contiguity than the transcriptome (Supplementary Table 10). These pseudo-reference genome contigs were arranged and oriented into scaffolds by BLASTing (blastn E-value < 0.001) contigs to the long-read genome assembly and retaining the top hit per contig.