Structural genomic variation in the inbred Scandinavian wolf population contributes to the realized genetic load but is positively affected by immigration
Cite this dataset
Smeds, Linnéa; Huson, Lars S A; Ellegren, Hans (2024). Structural genomic variation in the inbred Scandinavian wolf population contributes to the realized genetic load but is positively affected by immigration [Dataset]. Dryad. https://doi.org/10.5061/dryad.12jm63z57
Abstract
When populations decrease in size and may become isolated, genomic erosion by loss of diversity from genetic drift and accumulation of deleterious mutations is likely an inevitable consequence. In such cases, immigration (genetic rescue) is necessary to restore levels of genetic diversity and counteract inbreeding depression. Recent work in conservation genomics has studied these processes focusing on genetic diversity of single nucleotide polymorphisms. In contrast, our knowledge about structural genomic variation (insertions, deletions, duplications and inversions) in endangered species is limited. We analysed whole-genome, short-read sequences from 212 wolves from the inbred Scandinavian population, and from neighbouring populations in Finland and Russia, and detected >35,000 structural variants (SVs) after stringent quality and genotype frequency filtering; >26,000 high-confidence variants remained after manual curation. The majority of variants were shorter than 1 kb, with a distinct peak in the length distribution of deletions at 190 bp, corresponding to insertion events of SINE/tRNA-Lys elements. The site frequency spectrum of SVs in protein-coding regions was significantly shifted towards rare alleles compared to putatively neutral variants, consistent with purifying selection. The realized genetic load of SVs in protein-coding regions increased with inbreeding levels in the Scandinavian population, but immigration provided a genetic rescue effect by lowering the load and reintroducing ancestral alleles at loci fixed for derived SVs. Our study shows that structural variation comprises a common type of in part deleterious mutations in endangered species and that establishing gene flow is necessary to mitigate the negative consequences of loss of diversity.
README: Structural genomic variation in the inbred Scandinavian wolf population contributes to the realized genetic load but is positively affected by immigration
Link to paper: https://onlinelibrary.wiley.com/doi/10.1111/eva.13652
Citation: Smeds, L., Huson, L.S.A. & Ellegren, H. (2024). Evolutionary Applications, 17:e13652.
This dataset contains genotypes for structural variants in Scandinavian, Finnish and Russian wolves: the raw variants from the Smoove pipeline, and the final set of variants after quality filtering, genotype frequency filtering and manual curation.
Description of the data and file structure
The genotypes are stored in VCF format. The file structure and file names match those used in the GitHub code.
Genotypes for 212 wolves generated by the Smoove pipeline: data/smove/annotated/cohort.smoove.square.anno.vcf.gz
Final variants divided into types (DEL=deletions, DUP=duplications, INV=inversions), quality filtered and manually inspected by 1 curator: data/curated/1cur.strict..*vcf.gz
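As an illustration of how the genotype files can be read, the following is a minimal sketch that tallies genotype classes per variant using only the Python standard library. It assumes standard VCF conventions (tab-separated columns, a GT key in the FORMAT column, an SVTYPE key in the INFO column) and biallelic records; the function names `open_text` and `genotype_counts` are hypothetical and are not part of the published code.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def genotype_counts(vcf_lines):
    """Yield (chrom, pos, svtype, Counter of genotype classes) per record.

    Assumes biallelic records, so any non-zero allele is the variant allele.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information lines and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info, fmt = fields[0], int(fields[1]), fields[7], fields[8]
        # Pull the SVTYPE value out of the INFO column, if present
        svtype = next(
            (kv[len("SVTYPE="):] for kv in info.split(";") if kv.startswith("SVTYPE=")),
            None,
        )
        gt_index = fmt.split(":").index("GT")
        counts = Counter()
        for sample in fields[9:]:
            alleles = sample.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles:
                counts["missing"] += 1
            elif len(set(alleles)) > 1:
                counts["het"] += 1
            elif alleles[0] == "0":
                counts["hom_ref"] += 1
            else:
                counts["hom_alt"] += 1
        yield chrom, pos, svtype, counts


# Example use (path taken from the file list above):
# with open_text("data/smove/annotated/cohort.smoove.square.anno.vcf.gz") as handle:
#     for chrom, pos, svtype, counts in genotype_counts(handle):
#         print(chrom, pos, svtype, dict(counts))
```

A dedicated VCF library would be a more robust choice for real analyses; the plain-text parser is used here only to keep the sketch free of dependencies.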
Metadata for all 212 wolves are found in: helpfiles/metadata.txt
Description of metadata columns:
Column header | Description |
---|---|
UU_ID | Internal sample name, used in the vcf files |
ENA_ID | Sample name used in the raw data at ENA (European Nucleotide Archive) |
Category | Sample origin (country) for Finnish and Russian wolves, and either a sample year category or an immigrant category for Scandinavian wolves. Sample year categories ("1983-1990", "1991-1998", "1999-2006", "2007-2014") are taken from Kardos et al, 2018, with an extension for the last cohort, where I means "descendant to immigrants" and S means "descendant only to original Scandinavian population". Immigrant categories are "NR_immigrants" (non-reproducing immigrants), "R_immigrants" (reproducing immigrants) and "Founders" (the first three founders of the population) |
SRA_acc | SRA accession number in the ENA database |
RoHClass | Percentage of the genome that lies in Runs of Homozygosity (RoH), long stretches without variation that are a sign of inbreeding. This information was taken from Kardos et al, 2018, but was not explicitly used in this study. The individuals were divided into groups of 0-20%, 20-30%, 30-40% or >40% RoH. For individuals without an assigned RoH (samples not included in Kardos et al), the "Category" value is repeated. |
GenClass | Number of generations to the closest founder or reproducing immigrant. F1 means first-generation offspring of the first three founders. L1 means first-generation offspring of a later founder (=reproducing immigrant). For Finnish, Russian and immigrant wolves, where this category is not applicable, the "Category" value is repeated. |
sex | Sex of the sample |
Batch | Sequencing batch (sequencing was done on three different occasions between 2016 and 2019). |
Note that Founder males are not sampled but inferred from their offspring, and are not specifically used for SV analysis except in the comparison with SNPs from Smeds & Ellegren, 2023 (Figure 3 in this study). They are marked "NA" for all columns related to sequencing.
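The metadata table can be loaded along the following lines. This is a sketch under stated assumptions: it assumes that helpfiles/metadata.txt is tab-delimited and starts with a header line using the column names listed above (the delimiter is not stated in this README), and the helper names `read_metadata`, `sequenced_samples` and `count_by` are hypothetical.

```python
import csv
from collections import Counter


def read_metadata(lines, delimiter="\t"):
    """Return one dict per sample; 'NA' values are converted to None.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The tab delimiter is an assumption, not documented in the README.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    return [
        {key: (None if value == "NA" else value) for key, value in row.items()}
        for row in reader
    ]


def sequenced_samples(rows):
    """Drop inferred individuals, i.e. rows without an SRA accession."""
    return [row for row in rows if row["SRA_acc"] is not None]


def count_by(rows, column):
    """Count samples per value of the given metadata column."""
    return Counter(row[column] for row in rows)


# Example use:
# with open("helpfiles/metadata.txt", newline="") as handle:
#     rows = read_metadata(handle)
# print(count_by(sequenced_samples(rows), "Category"))
```

Filtering on `SRA_acc` is one way to exclude the inferred Founder males, which are marked "NA" in the sequencing-related columns.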
Information on outgroups is found in helpfiles/SRA_accession_outgroups.txt (SRA accession number, sample number, species and internal ID).
helpfiles/unrelated_individuals.txt includes internal sample names (UU_ID) for all unrelated individuals (taken from Smeds et al. 2021) used for the site frequency spectra (Figure 4).
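To show how a list of unrelated individuals can be combined with the genotypes, the sketch below tallies variants by their alternative-allele count among a chosen subset of samples. It is a simplified illustration and not the procedure used in the paper: it counts non-reference alleles without polarizing against outgroups, skips variants with any missing genotype in the subset, and drops variants that are not polymorphic within the subset. All function and parameter names are hypothetical.

```python
from collections import Counter


def alt_allele_count(genotypes):
    """Count non-reference alleles; return None if any genotype is missing."""
    total = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            return None
        total += sum(allele != "0" for allele in alleles)
    return total


def site_frequency_spectrum(header_samples, records, keep):
    """Tally variants by alternative-allele count among the samples in `keep`.

    header_samples: sample names in VCF column order
    records: iterable of per-variant genotype lists in the same order
    keep: set of sample names to include (e.g. the unrelated individuals)
    """
    columns = [i for i, name in enumerate(header_samples) if name in keep]
    spectrum = Counter()
    for genotypes in records:
        count = alt_allele_count([genotypes[i] for i in columns])
        # Keep only variants that are polymorphic within the subset
        if count is not None and 0 < count < 2 * len(columns):
            spectrum[count] += 1
    return spectrum


# Example use, assuming one sample name per line in the help file:
# with open("helpfiles/unrelated_individuals.txt") as handle:
#     keep = {line.strip() for line in handle if line.strip()}
```

Restricting the spectrum to unrelated individuals avoids counting the same inherited allele several times within a family group.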
helpfiles/bamlocations.json (paths to cram files) and helpfiles/contigs.json (name of autosomal chromosomes) were used by the snakemake Smoove pipeline.
Sharing/Access information
The data is based on raw data from four public studies (PRJEB20635, PRJEB28342, PRJEB38198 and PRJEB44869), which can be downloaded from https://www.ebi.ac.uk/ena/browser/home.
Code/Software
The code used to produce the data and the metadata, and the code used to produce all figures and results from the data, are found on GitHub: https://github.com/linneas/wolf-structural
Funding
Knut and Alice Wallenberg Foundation, Award: 2014-0044
Swedish Research Council, Award: 2013-08271