Data from: Evaluating inbreeding and assessing the risk of outbreeding depression in genetic rescue of the endangered marsh fritillary (Euphydryas aurinia)
Data files
Dec 26, 2025 version files 48.15 GB
- EupAur.fa: 735.41 MB
- Marsh_filtered.sansGlanville.minGQ30.minDP10.maxDP70.miss1.HWE14.LD-pruned.recode.vcf.gz: 299.32 MB
- Marsh_filtered.sansGlanville.minGQ30.minDP10.maxDP70.miss1.HWE14.MappabilityMask.RepeatMask.ann.vcf.gz: 1.74 GB
- Marsh_filtered.sansGlanville.minGQ30.minDP10.maxDP70.miss1.HWE14.phased.vcf.gz: 154.18 MB
- Marsh_filtered.sansGlanville.minGQ30.minDP10.maxDP70.miss1.HWE14.vcf.gz: 2.33 GB
- Marsh_raw.vcf.gz: 42.90 GB
- popmap.txt: 1.50 KB
- README.md: 2.86 KB
Abstract
Restoring genetic diversity through assisted migration is increasingly recognized as a crucial strategy to counteract inbreeding depression and boost genetic variation in small and fragmented populations, yet concerns about outbreeding depression often hinder its application. This dataset contains whole-genome variant data generated to evaluate inbreeding patterns and inform assessments of outbreeding depression risk in the marsh fritillary (Euphydryas aurinia). It includes multiple SNP datasets representing successive stages of processing: raw variant calls, a quality-filtered dataset, a phased dataset for selection scans, an LD-pruned dataset suitable for population-structure analyses, and a dataset specifically filtered for runs-of-homozygosity (ROH) inference. We identify substantial inbreeding in the investigated populations (FROH approaching 40%), historical gene flow, recent divergence times between populations, and a low likelihood of local adaptation. Together, these results provide an example where genetic rescue could be undertaken successfully by transplanting individuals across populations with minimal outbreeding depression risks.
Dataset DOI: 10.5061/dryad.8931zcs50
Description of the data and file structure
This dataset contains the processed genomic datasets, variant call files, sample metadata, and the curated reference genome used in the paper “Evaluating inbreeding and assessing the risk of outbreeding depression in genetic rescue using whole-genome sequence data” by Tengstedt et al. The study evaluates inbreeding patterns and informs assessments of outbreeding depression risk in the marsh fritillary (Euphydryas aurinia). All data derive from whole-genome resequencing of individuals sampled in Denmark under appropriate collection permits. The dataset is intended to enable full reproducibility of the analyses in the associated study.
Files and variables
Curated reference genome
EupAur.fa: The final curated reference genome assembly (NCBI BioProject PRJNA1026358) used for read mapping and variant calling.
Raw SNP calls
Marsh_raw.vcf.gz: Variant calls produced from mapped Illumina reads using SAMtools/BCFtools. This file represents the unfiltered SNP dataset prior to quality control. Note that it includes four Glanville fritillary individuals, which were removed in the subsequent filtering steps.
Quality-filtered SNP dataset
Marsh_filtered.sansGlanville.minGQ30.minDP10.maxDP70.miss1.HWE14.vcf.gz: A VCF file containing variants retained after filtering on genotype quality, depth, missingness, and Hardy-Weinberg equilibrium. This file is the primary dataset used for the analyses described in our paper.
LD-pruned dataset
Marsh_filtered.sansGlanville.minGQ30.minDP10.maxDP70.miss1.HWE14.LD-pruned.recode.vcf.gz: A reduced SNP set generated through linkage disequilibrium pruning with PLINK. This file is suitable for population structure inference and comparative analyses requiring unlinked variants.
Phased dataset
Marsh_filtered.sansGlanville.minGQ30.minDP10.maxDP70.miss1.HWE14.phased.vcf.gz: A statistically phased version of the quality-filtered SNP dataset. This file is used for selection scans.
ROH-filtered dataset
Marsh_filtered.sansGlanville.minGQ30.minDP10.maxDP70.miss1.HWE14.MappabilityMask.RepeatMask.ann.vcf.gz: A SNP dataset filtered specifically for detecting runs of homozygosity (ROH), with filtering parameters optimized to retain only the most reliable variant calls.
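For readers unfamiliar with how runs of homozygosity translate into an inbreeding estimate, the sketch below shows the commonly used definition of FROH: the summed length of ROH segments divided by the length of the genome that was analyzed. It is an illustration under stated assumptions, not the authors' pipeline. The segment coordinates, the genome length, and the `min_length_bp` threshold are hypothetical values chosen for the example.

```python
def froh(roh_segments, genome_length_bp, min_length_bp=0):
    """Fraction of the analyzed genome covered by runs of homozygosity.

    roh_segments: iterable of (start, end) positions in base pairs.
    genome_length_bp: length of the genome region that was scanned for ROH.
    min_length_bp: hypothetical minimum segment length to count (assumption).
    """
    total_roh = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length_bp
    )
    return total_roh / genome_length_bp

# Hypothetical segments for one individual on a 10 Mb toy genome.
segments = [(0, 2_000_000), (5_000_000, 6_500_000), (9_000_000, 9_500_000)]
value = froh(segments, genome_length_bp=10_000_000)
print(f"FROH = {value:.2f}")  # 4 Mb of ROH over 10 Mb analyzed
```

In practice the segments would come from an ROH-calling tool run on this VCF, and the denominator should match the portion of the genome that passed the same masks.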
Sample metadata
popmap.txt: A tab-delimited file listing individual sample IDs and their corresponding population.
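A minimal sketch of loading the population map in Python follows. It assumes two tab-separated columns with the sample ID first and the population second, and no header line; check popmap.txt before relying on that layout. The sample and population names in the example are hypothetical.

```python
import csv
import io
from collections import defaultdict

def read_popmap(handle):
    """Return {population: [sample_id, ...]} from a two-column, tab-delimited file."""
    groups = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and rows that lack both columns.
        if len(row) < 2 or not row[0].strip():
            continue
        sample_id, population = row[0].strip(), row[1].strip()
        groups[population].append(sample_id)
    return dict(groups)

# Hypothetical rows for illustration only; the real IDs are in popmap.txt.
example = io.StringIO("ind01\tPopA\nind02\tPopA\nind03\tPopB\n")
popmap = read_popmap(example)
for population, samples in sorted(popmap.items()):
    print(population, len(samples))
```

To read the real file, pass an open handle instead, for example `read_popmap(open("popmap.txt", newline=""))`.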
Code/software
The VCF files can be opened and analyzed with standard bioinformatics tools such as BCFtools and VCFtools.
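For a quick check without installing either tool, the sample names can also be read straight from the VCF header using only the Python standard library. The sketch below relies on the VCF convention that the first nine columns of the `#CHROM` line are fixed and the remaining columns are sample IDs. The toy VCF and its sample names are hypothetical and exist only to keep the example self-contained.

```python
import gzip
import io

def vcf_sample_names(source):
    """Return sample IDs from the #CHROM header line of a gzip-compressed VCF.

    source: a file path or a binary file object.
    """
    with gzip.open(source, mode="rt") as vcf:
        for line in vcf:
            if line.startswith("#CHROM"):
                # Columns 1-9 (CHROM through FORMAT) are fixed; the rest are samples.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # Reached variant records without finding a header line.
    return []

# Tiny in-memory VCF with hypothetical sample names, for illustration only.
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind01\tind02\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n"
)
compressed = io.BytesIO(gzip.compress(toy_vcf.encode()))
samples = vcf_sample_names(compressed)
print(samples)
```

Passing a path such as `"Marsh_raw.vcf.gz"` in place of the in-memory object should work the same way, since only the header is read. With BCFtools installed, `bcftools query -l <file>` lists the same sample names.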
