The contribution of ancient admixture to reproductive isolation between European sea bass lineages
Data files
Apr 08, 2020 version files 2.63 GB
- ALL_STATS_50Kb_Windows.txt
- HQ_68G_phased_LG.vcf.gz
- HQ_68G_phased_LG.vcf.idx
Abstract
Understanding how new species arise through the progressive establishment of reproductive isolation barriers between diverging populations is a major goal in Evolutionary Biology. An important result of speciation genomics studies is that genomic regions involved in reproductive isolation frequently harbor anciently diverged haplotypes that predate the reconstructed history of species divergence. The possible origins of these old alleles remain much debated, as they relate to contrasting mechanisms of speciation that are not yet fully understood. In the European sea bass (Dicentrarchus labrax), the genomic regions involved in reproductive isolation between Atlantic and Mediterranean lineages are enriched for anciently diverged alleles of unknown origin. Here, we used haplotype-resolved whole-genome sequences to test whether divergent haplotypes could have originated from a closely related species, the spotted sea bass (Dicentrarchus punctatus). We found that an ancient admixture event between D. labrax and D. punctatus is responsible for the presence of shared derived alleles that segregate at low frequencies in both lineages of D. labrax. An exception to this was found within regions involved in reproductive isolation between the two D. labrax lineages. In those regions, archaic tracts originating from D. punctatus locally reached high frequencies or even fixation in Atlantic genomes but were almost absent in the Mediterranean. We showed that the ancient admixture event most likely occurred between D. punctatus and the D. labrax Atlantic lineage, while Atlantic and Mediterranean D. labrax lineages were experiencing allopatric isolation. Our results suggest that local adaptive introgression and/or the resolution of genomic conflicts provoked by ancient admixture have probably contributed to the establishment of reproductive isolation between the two D. labrax lineages.
Methods
Whole-genome resequencing and haplotyping
We sequenced the whole genome of one Dicentrarchus punctatus individual from the Atlantic Ocean (Gulf of Cadiz, PUN) and of 59 new Dicentrarchus labrax individuals. Fifty-two of the D. labrax individuals were wild specimens captured from the Atlantic Ocean (English Channel, 10 males ♂AT), the western Mediterranean Sea (Gulf of Lion, 14 females ♀WME and 9 males ♂WME) and the eastern Mediterranean Sea (Turkey, 10 males ♂NEM and Egypt, 9 males ♂SEM). Some of these specimens were experimentally crossed to generate first-generation hybrids, which were used to phase the genomes of their parents with a phasing-by-transmission approach.
Whole-genome sequencing libraries were prepared separately for each individual using either the Illumina TruSeq DNA PCR-Free protocol (40 individuals) or the TruSeq DNA Nano protocol (20 individuals), depending on DNA concentration. Pools of 5 individually barcoded libraries were then sequenced on 12 separate lanes of an Illumina HiSeq3000 using 2x150 bp paired-end (PE) reads.
Individual PE reads were aligned to the sea bass reference genome using BWA-mem v0.7.5a with default parameters. Duplicate reads were marked using Picard version 1.112 and then removed, leaving a mean coverage depth of 33.8X per individual. We then followed GATK’s (version 3.3-0-g37228af) best practice pipeline, from individual variant calling (using HaplotypeCaller) to joint genotyping, genotype refinement and variant filtering (using Filter Expression: QD<10; MQ<50; FS>7; MQRankSum<-1.5; ReadPosRankSum<-1.5). A set of high-quality variants identified in a previous study (Duranton et al. 2018) was used both to recalibrate base quality scores with the BQSR algorithm and to perform variant quality score recalibration with the VQSR algorithm. Hard filtering was then applied to exclude low-quality genotypes with a GQ score < 30.
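The filtering thresholds above can be written out as a small predicate. This is a minimal sketch rather than the GATK implementation: it assumes that a site is flagged when any one of the listed conditions holds, that annotations absent from a record are simply not evaluated, and that annotations arrive as a plain dictionary (a hypothetical input format chosen for illustration).

```python
# Site-level thresholds copied from the filter expression in the text.
# Assumption: a site is flagged as soon as ANY single condition is met.
SITE_CHECKS = [
    ("QD", lambda v: v < 10),
    ("MQ", lambda v: v < 50),
    ("FS", lambda v: v > 7),
    ("MQRankSum", lambda v: v < -1.5),
    ("ReadPosRankSum", lambda v: v < -1.5),
]


def fails_site_filter(info):
    """Return True if the site should be flagged.

    `info` is a dict mapping annotation names to numeric values
    (hypothetical format). Annotations missing from the dict are
    skipped, which is an assumption of this sketch.
    """
    return any(name in info and check(info[name]) for name, check in SITE_CHECKS)


def passes_genotype_filter(gq, min_gq=30):
    """Genotype-level hard filter: genotypes with GQ < 30 are excluded."""
    return gq >= min_gq
```

For instance, a site with QD = 5 is flagged whatever its other annotations, while a genotype with GQ = 30 is retained because the text excludes only GQ < 30.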
Haplotype phasing in D. labrax was done after merging the 59 newly sequenced genomes with the 16 genomes already obtained in Duranton et al. (2018). Fifteen individuals that were involved in family crosses were subjected to phasing-by-transmission using the PhaseByTransmission algorithm in GATK with default parameters and a mutation rate prior of 10^-8 for de novo mutations. For all individuals, variants located on the same read pair were directly phased using physical phasing information. Unrelated D. labrax individuals were then statistically phased using Eagle2 (version 2.4). The 22 parents phased with the phasing-by-transmission approach were used to build a European sea bass reference haplotype library, which was used in Eagle2 to improve statistical phasing.
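The logic behind phasing by transmission can be illustrated on a single biallelic site of a mother-father-offspring trio. The sketch below is a deliberately simplified, deterministic illustration and not GATK's PhaseByTransmission, which works on genotype likelihoods and a de novo mutation prior; the function names and the tuple representation of genotypes are hypothetical.

```python
def phase_child(mother, father, child):
    """Order the child's alleles as (maternal, paternal).

    Genotypes are unordered pairs of allele codes, e.g. (0, 1).
    Returns None when the site is ambiguous (for example all three
    individuals heterozygous) or inconsistent with Mendelian inheritance.
    """
    a, b = child
    options = set()
    for maternal, paternal in ((a, b), (b, a)):
        if maternal in mother and paternal in father:
            options.add((maternal, paternal))
    return options.pop() if len(options) == 1 else None


def phase_parent(parent, transmitted):
    """Order a parent's alleles as (transmitted, untransmitted).

    Applying this at every informative site reconstructs the parental
    haplotype that was passed on to the offspring and its complement.
    """
    a, b = parent
    if transmitted == a:
        return (a, b)
    if transmitted == b:
        return (b, a)
    return None  # transmitted allele not carried by this parent
```

With a homozygous mother (0, 0), a heterozygous father (0, 1) and a heterozygous offspring (0, 1), the offspring must carry 0 from the mother and 1 from the father, so the father's transmitted haplotype carries allele 1 at this site.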
Finally, we filtered SNPs in VCFtools, using --max-missing-count 0 to remove sites with missing genotypes and the --phased option to exclude unphased variants. In this way, we generated a dataset of haplotype-resolved whole-genome sequences from 68 unrelated D. labrax individuals (14 AT, 31 WME, 11 SEM and 12 NEM), containing 5,074,249 phased SNPs without missing data.
The corresponding VCF file containing 5,074,249 phased SNPs is called:
HQ_68G_phased_LG.vcf.gz
The genome-wide statistics calculated in non-overlapping 50 kb windows are in:
ALL_STATS_50Kb_Windows.txt
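The two properties stated for the final dataset, no missing genotypes and no unphased variants, can be checked directly on VCF records. The following is a minimal sketch assuming standard tab-separated VCF lines in which GT is the first FORMAT subfield and all genotypes are diploid; the function names are hypothetical and the code is not the VCFtools implementation.

```python
def keep_site(vcf_line):
    """Return True if every sample genotype is phased and non-missing.

    Assumes a tab-separated VCF data line whose sample columns start
    at the tenth field and list GT first, with diploid genotypes.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    for sample in fields[9:]:
        gt = sample.split(":", 1)[0]
        # Phased diploid genotypes use "|" as separator, unphased use "/".
        if "/" in gt or "|" not in gt:
            return False
        # "." marks a missing allele call.
        if "." in gt.split("|"):
            return False
    return True


def kept_sites(handle):
    """Yield the data lines of an open VCF that pass keep_site."""
    for line in handle:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        if keep_site(line):
            yield line
```

To run the check on the compressed file, the handle can be opened in text mode with gzip.open("HQ_68G_phased_LG.vcf.gz", "rt") from the Python standard library; every data line is then expected to pass.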
Usage notes
The VCF file containing 5,074,249 phased SNPs from 68 Dicentrarchus labrax genomes without missing genotype:
HQ_68G_phased_LG.vcf.gz
The genome-wide statistics calculated in non-overlapping 50kb windows from 68 Dicentrarchus labrax genomes:
ALL_STATS_50Kb_Windows.txt
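To relate SNP positions in the VCF file to the 50 kb windows, each position can be assigned to its non-overlapping window. This is a sketch under stated assumptions: positions are 1-based as in VCF, windows are labelled by their 1-based start coordinate, and the first window of each linkage group starts at position 1. The column layout of ALL_STATS_50Kb_Windows.txt is not described here, so its header should be checked before matching windows.

```python
from collections import Counter

WINDOW_SIZE = 50_000  # non-overlapping 50 kb windows, as stated in the text


def window_start(pos, size=WINDOW_SIZE):
    """Map a 1-based position to the 1-based start of its window.

    Assumption: the first window covers positions 1..size.
    """
    return (pos - 1) // size * size + 1


def snps_per_window(sites, size=WINDOW_SIZE):
    """Count SNPs per (chromosome, window start).

    `sites` is an iterable of (chromosome, position) pairs.
    """
    counts = Counter()
    for chrom, pos in sites:
        counts[(chrom, window_start(pos, size))] += 1
    return counts
```

Under this convention, positions 1 and 50,000 fall in the window starting at 1, and position 50,001 opens the next window.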