Data from: Genomic footprints of hybridization in North Atlantic eels (Anguilla anguilla and A. rostrata)
Data files
Feb 06, 2025 version files (149.22 GB total)
- allSites.chr01-19.vcf.gz (93 GB)
- chr01-19.filtered.ann.mac3.max2.miss1.LD-thinned.recode.vcf.gz (168.10 MB)
- chr01-19.filtered.ann.mac3.max2.miss1.phased.vcf.gz (663.84 MB)
- chr01-19.filtered.ann.mac3.max2.miss1.recode.vcf.gz (5.07 GB)
- chr01-19.vcf.gz (50.32 GB)
- README.md (1.37 KB)
Abstract
Understanding interspecific introgressive hybridization and the biological significance of introgressed variation remains an important goal in population genomics. European (Anguilla anguilla) and American eel (A. rostrata) represent a remarkable case of hybridization. Both are panmictic and spawn in partial sympatry in the Sargasso Sea, occasionally producing viable, fertile hybrids, primarily found in Iceland. We studied introgressive hybridization from American into European eel using whole-genome sequences of 78 individuals, including European, American, and 21 putative hybrid eels. Previous studies using few genetic markers could not resolve whether hybridization involved simple unidirectional backcrossing or a more complex hybrid swarm. However, local ancestry inference along individual chromosomes revealed that Icelandic hybrids were primarily F1 or first-generation backcrosses toward European eel, with some showing more complex backcrossing. All European eels outside Iceland contained short chromosomal blocks from American eel, indicating a porous genome. We found no evidence for previously hypothesized geographical gradients of introgression in European eel outside Iceland. Several chromosomal regions showed high interspecific divergence, but haplotype blocks introgressed from American eel were identified both within and outside these regions. There was little correspondence between regions of high relative (FST) and absolute divergence (dXY), with the former reflecting selective sweeps within species or reduced recombination rather than barrier loci. A single genomic region showed evidence of repeated introgression from American into European eel under positive selection in both species. The study illustrates that species can maintain genetic integrity despite porous genomes, and that standing variation in one species can potentially be available for future adaptive responses in the other species.
https://doi.org/10.5061/dryad.x0k6djhvh
Description of the data and file structure
The data encompasses multiple VCF files generated from whole-genome resequencing of European eels (Anguilla anguilla), American eels (A. rostrata) and their hybrids.
Files and variables
1) chr01-19.vcf.gz
Raw variant calls.
2) chr01-19.filtered.ann.mac3.max2.miss1.recode.vcf.gz
Variant calls from file 1), filtered to retain only biallelic SNPs with a minimum variant quality of 20, minor allele count ≥3, no missing data, and pooled read depth between 900 and 1500 across all individuals.
3) chr01-19.filtered.ann.mac3.max2.miss1.phased.vcf.gz
Variant calls from file 2) with phase information obtained by statistical phasing with SHAPEIT.
4) chr01-19.filtered.ann.mac3.max2.miss1.LD-thinned.recode.vcf.gz
A thinned version of file 3) containing no more than one SNP per 1,000 bp.
5) allSites.chr01-19.vcf.gz
File containing both variant and invariant sites, filtered as described for file 2). No filtering for missing data was performed.
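Because the files are large, it can help to inspect only the header before loading any of them. The following is a minimal sketch, assuming the files are standard gzip- or bgzip-compressed VCFs with a `#CHROM` header line; the function name `read_sample_names` is hypothetical and not part of the published scripts.

```python
import gzip


def read_sample_names(vcf_path):
    """Return the sample IDs from the #CHROM header line of a gzip-compressed VCF."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are the fixed VCF fields; sample IDs start at column 10.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the data records without finding the header line
    raise ValueError("no #CHROM header line found in " + str(vcf_path))
```

Calling `len(read_sample_names("chr01-19.vcf.gz"))` should return the number of individuals in the raw call set, which the methods below give as 78. Only the header is read, so the call returns quickly even for the largest file.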
Code/software
Scripts are available at https://github.com/atengstedt/Eel_hybrids.
Sampling
We analyzed a total of 50 European eels (of which one later turned out to be a hybrid), 7 American eels, and 21 individuals of admixed ancestry. We note that the admixed individuals do not represent a random selection of hybrids; 10 individuals had previously been identified as F1 hybrids and 11 as different types of backcrosses: bAA (first-generation backcross to European eel, N = 3), bAAxAA (second-generation backcross to European eel, N = 3) and bARxAA (second-generation backcross, with backcrossing first to American eel and subsequently to European eel, N = 5). This initial hybrid identification was based on 68 species-diagnostic SNPs and analyses using Structure 2.3.4 (Falush et al., 2003; Pritchard et al., 2000) and NewHybrids 1.1 (Anderson & Thompson, 2002), as detailed in Pujolar et al. (2014a). As demonstrated in that paper using both simulated and empirical data, there is high power for distinguishing “pure” American and European eels from each other and from recently admixed individuals.
The analyzed individuals were collected between 2001 and 2017 using electrofishing or net fishing and encompassed both juvenile glass eels and adult eels within the continental ranges of American and European eel (see Fig. 1). Two eel larvae collected by ring net in the Sargasso Sea were also included. For individual sampling details, see Table S1. A total of 22 individuals, including 17 hybrids and 5 American eels, were sequenced for the present study. The remaining 56 individuals, primarily European eels, had previously been analyzed using whole-genome sequencing to study speciation between European and American eel and genome-wide methylation in European eel (Liu et al., 2022; Nikolic et al., 2020).
Mapping and variant calling
Genomic DNA was extracted using a standard phenol-chloroform extraction or E.Z.N.A. purification columns (Omega Bio-Tek, Norcross, Georgia, USA). Whole-genome sequencing was outsourced to BGI (Beijing Genomics Institute, Hong Kong, China) (nine individuals) and NOVOGENE (Hong Kong, China) (all other individuals). Sequencing libraries were constructed using the NEBNext® DNA Library Prep Kit (New England Biolabs, MA, USA). Genomic DNA was randomly fragmented to a size of 350 bp by shearing, and PCR amplification was conducted. Paired-end sequencing was conducted on the Illumina HiSeq 2500 platform with a read length of 150 bp and a target coverage of ~20x.
Using BWA MEM v.0.7.17 (Li, 2013; Li & Durbin, 2009a) with default parameter settings, the reads were mapped to a recent chromosome-level European eel genome assembly (Rhie et al., 2021) (GenBank accession: GCA_013347855.1). Mapped reads were converted to BAM files, sorted and indexed using SAMtools v.1.9 (Li et al., 2009b). A VCF file of SNPs encompassing all 78 individuals was generated from the BAM files with BCFtools v.1.9 (Li, 2011; Li et al., 2009b) using a minimum mapping quality threshold of 20. Initial filtering of the SNPs was performed using VCFutils.pl (Li et al., 2009b) and VCFtools v.0.1.16 (Danecek et al., 2011). Only biallelic SNPs with a minimum variant quality of 20, minor allele count ≥3, no missing data, and pooled read depth between 900 and 1500 across all individuals were kept. The thresholds were determined based on inspection of the SNP depth distribution (Fig. S1). SNPs on non-anchored scaffolds were discarded.
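The filtering criteria listed above can be written out as a single predicate per site. This is an illustrative sketch only, not the VCFutils.pl/VCFtools commands that were actually run; the function names, the argument layout, and the treatment of the depth bounds as inclusive are assumptions.

```python
def minor_allele_count(genotypes):
    """Count copies of the rarer allele across genotypes such as '0/1' or '1|1'."""
    alleles = [a for gt in genotypes for a in gt.replace("|", "/").split("/")]
    ref_copies = sum(1 for a in alleles if a == "0")
    alt_copies = sum(1 for a in alleles if a == "1")
    return min(ref_copies, alt_copies)


def passes_filters(ref, alt, qual, pooled_depth, genotypes,
                   min_qual=20.0, min_mac=3, min_depth=900, max_depth=1500):
    """Return True if a site meets the criteria described in the text."""
    # A biallelic SNP has one single-base REF and exactly one single-base ALT.
    is_biallelic_snp = (len(ref) == 1 and len(alt) == 1
                        and ref in "ACGT" and alt in "ACGT")
    # '.' in any genotype marks a missing call; none are allowed.
    has_missing = any("." in gt for gt in genotypes)
    return (is_biallelic_snp
            and qual >= min_qual
            and not has_missing
            and minor_allele_count(genotypes) >= min_mac
            and min_depth <= pooled_depth <= max_depth)  # inclusive bounds assumed
```

For example, `passes_filters("A", "G", 30.0, 1000, ["0/1", "0/1", "1/1", "0/0", "0/0"])` is true, whereas the same site with a pooled depth of 1600 or with one `./.` genotype is rejected.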
We furthermore produced an ‘all sites’ data set by running BCFtools call without the --variants-only option. As for the ‘variant sites’ data set, we filtered the VCF to discard indels and retain only sites with minimum mapping quality of 20 and pooled read depth between 900 and 1500 across all individuals. No filtering for missing data or minor allele frequency was performed.
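The difference between the ‘variant sites’ and ‘all sites’ call sets comes down to a single option at the calling step. The sketch below builds the two BCFtools command lines; the choice of the multiallelic caller (`-m`), the output options, the file names and the helper function names are assumptions, and the exact commands used are those in the linked repository.

```python
import subprocess


def build_calling_commands(reference_fasta, bam_files, output_vcf,
                           variants_only=True, min_mapq=20):
    """Return (mpileup_cmd, call_cmd) argument lists for a bcftools calling pipe."""
    mpileup_cmd = ["bcftools", "mpileup", "-f", reference_fasta,
                   "-q", str(min_mapq), "-Ou"] + list(bam_files)
    call_cmd = ["bcftools", "call", "-m"]  # caller model is an assumption
    if variants_only:
        # Omitting this option makes bcftools call emit invariant sites as well.
        call_cmd.append("--variants-only")
    call_cmd += ["-Oz", "-o", output_vcf]
    return mpileup_cmd, call_cmd


def run_calling(mpileup_cmd, call_cmd):
    """Pipe bcftools mpileup into bcftools call (requires bcftools on PATH)."""
    producer = subprocess.Popen(mpileup_cmd, stdout=subprocess.PIPE)
    subprocess.run(call_cmd, stdin=producer.stdout, check=True)
    producer.stdout.close()
    if producer.wait() != 0:
        raise RuntimeError("bcftools mpileup failed")
```

Calling `build_calling_commands(ref, bams, out, variants_only=False)` yields the ‘all sites’ variant of the pipeline; everything else is unchanged.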
LD-pruning
The ‘variant sites’ data set was thinned using VCFtools so that no two SNPs were within 1,000 bp of one another, a distance at which linkage disequilibrium (LD) is virtually absent in the species (Jacobsen et al., 2014a).
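Distance-based thinning of this kind can be sketched as a greedy pass along each chromosome. VCFtools provides a `--thin` option for this purpose; the code below is only an illustration of the idea, and both the function name and the decision to keep a SNP lying exactly 1,000 bp from the previously kept one are assumptions.

```python
def thin_positions(sites, min_distance=1000):
    """Greedily keep sites so that kept sites on a chromosome are >= min_distance apart.

    `sites` is an iterable of (chromosome, position) tuples, sorted by position
    within each chromosome.
    """
    kept = []
    last_kept = {}  # chromosome -> position of the most recently kept site
    for chrom, pos in sites:
        previous = last_kept.get(chrom)
        if previous is None or pos - previous >= min_distance:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept
```

Distances are measured from the last kept SNP rather than the last SNP seen, so a dense cluster of SNPs contributes a single site to the thinned set.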
Statistical phasing
For the purpose of inferring local ancestry along chromosomes and scanning for selective sweeps, we created a phased data set for the reference populations. The previously produced VCF file was split between hybrids and non-hybrids. The former file was not phased, as statistical phasing introduced a large number of switch errors (results not shown, but see e.g. Smeds et al. (2021)). Statistical phasing was performed using SHAPEIT v.2 (r900) (Delaneau et al., 2013). To increase accuracy, the number of conditioning states was increased from 100 states per SNP (default) to 200 states. Each chromosome was phased individually, and no recombination map was specified. The American and European eels were phased simultaneously and the phased VCF file was subsequently subset by species.
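The per-chromosome phasing runs can be sketched as one SHAPEIT command per chromosome. This is a hedged illustration rather than the commands actually used: the chromosome labels, input and output file names, thread count and helper function name are all hypothetical, and only the 200 conditioning states and the absence of a recombination map are taken from the text.

```python
def build_shapeit_command(input_vcf, output_prefix, states=200, threads=1):
    """Return a SHAPEIT v2 argument list for phasing one chromosome from a VCF."""
    return ["shapeit",
            "--input-vcf", input_vcf,
            "--output-max", output_prefix + ".haps", output_prefix + ".sample",
            "--states", str(states),   # SHAPEIT's default is 100 states
            "--thread", str(threads)]  # no genetic map argument is passed


# Hypothetical chromosome labels chr01..chr19, mirroring the data file names.
chromosomes = ["chr%02d" % i for i in range(1, 20)]
commands = {
    chrom: build_shapeit_command(f"nonhybrids.{chrom}.vcf", f"phased.{chrom}")
    for chrom in chromosomes
}
```

Each entry in `commands` could then be passed to `subprocess.run(..., check=True)`. The resulting haplotype files would still need to be converted back to VCF and subset by species, as described above.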
