Data from: Demographic history and inbreeding in two declining sea duck species inferred from whole genome sequence data
Data files
Jul 20, 2024 version (total size 91.68 GB)
- LTD_allSites.max2.minDP10.minGQ15.HWE.vcf.gz (29.28 GB)
- LTD_raw.vcf.gz (5.51 GB)
- LTD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.mac2.MappabilityMask.100biggest.vcf (895.52 MB)
- LTD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.mac2.vcf (7.20 GB)
- LTD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.vcf (15.24 GB)
- README.md (2.64 KB)
- VSD_allSites.max2.minDP10.minGQ15.HWE.vcf.gz (27.22 GB)
- VSD_raw.vcf.gz (1.86 GB)
- VSD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.mac2.MappabilityMask.100biggest.vcf (700.44 MB)
- VSD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.mac2.vcf (1.59 GB)
- VSD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.vcf (2.19 GB)
Abstract
Anthropogenic impact has transitioned from threatening already rare species to causing significant declines in once numerous organisms. Long-tailed duck (Clangula hyemalis) and velvet scoter (Melanitta fusca) were once important quarry sea duck species in NW Europe, but recent declines have resulted in their reclassification as Vulnerable on the IUCN Red List. We sequenced and assembled genomes for both species and resequenced 15 individuals of each. Using analyses based on site frequency spectra and the sequentially Markovian coalescent, we found that long-tailed duck showed greater historical demographic stability, whereas velvet scoter was particularly affected by the Last (Weichselian) Glaciation. This likely reflects long-tailed duck breeding continuously across the Arctic, with glacial cycles primarily shifting breeding areas south or north without major population declines, whereas the more restricted southern range of velvet scoter would have led to significant range contraction during glaciations. Both species showed evidence of declines over the past thousand years, potentially reflecting anthropogenic pressures, with the recent decline indicating an accelerating process. Analysis of runs of homozygosity (ROH) showed low but non-trivial inbreeding, with FROH ranging from 0.012 to 0.063 in long-tailed duck and from 0 to 0.047 in velvet scoter. ROH lengths suggested that this reflects ongoing background inbreeding rather than the recent declines. Overall, although the declines are demographically important, they have not yet led to strong inbreeding and genetic erosion, and the most pressing conservation concern may be the risk of density-dependent (Allee) effects. We recommend monitoring inbreeding using ROH analysis as a cost-efficient method to track future developments and support effective conservation of these species.
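FROH is usually defined as the summed length of all ROH in an individual divided by the length of the autosomal genome analysed. A minimal sketch, assuming ROH have already been called by some tool and their lengths are available in base pairs; the function names, the example ROH lengths and the 1 Mb cut-off used to isolate "long" ROH are hypothetical illustrations, not values from this study. Splitting by length is what lets ROH separate recent inbreeding (long runs) from older background inbreeding (short runs).

```python
from typing import Iterable


def f_roh(roh_lengths_bp: Iterable[int], autosomal_length_bp: int) -> float:
    """Genomic inbreeding coefficient: fraction of the autosomal genome in ROH."""
    if autosomal_length_bp <= 0:
        raise ValueError("autosomal_length_bp must be positive")
    return sum(roh_lengths_bp) / autosomal_length_bp


def f_roh_min_length(roh_lengths_bp: Iterable[int], autosomal_length_bp: int,
                     min_length_bp: int) -> float:
    """FROH counting only ROH of at least min_length_bp (a hypothetical cut-off).

    Long ROH reflect recent common ancestry; short ROH reflect older inbreeding.
    """
    long_runs = [length for length in roh_lengths_bp if length >= min_length_bp]
    return f_roh(long_runs, autosomal_length_bp)


# Hypothetical individual: four ROH on ~1 Gb of analysed autosomal sequence.
rohs = [150_000, 400_000, 2_500_000, 300_000]
print(round(f_roh(rohs, 1_000_000_000), 5))                        # 0.00335
print(round(f_roh_min_length(rohs, 1_000_000_000, 1_000_000), 5))  # 0.0025
```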
https://doi.org/10.5061/dryad.w3r22810z
Description of the data and file structure
The data encompasses multiple VCF files generated from whole-genome resequencing of two sea duck species, the velvet scoter (Melanitta fusca) and long-tailed duck (Clangula hyemalis).
1A) LTD_raw.vcf.gz
Raw variant calls from whole-genome sequencing (WGS) data of long-tailed duck.
2A) LTD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.vcf
Variant calls from file 1A, filtered on quality, depth, missing data and HWE. Contains only biallelic SNPs with QUAL ≥30 on autosomal scaffolds that are in Hardy-Weinberg equilibrium, have <10% missing data and have a pooled read depth >250 and <470. Genotypes with read depth <10 or genotype quality <15 were set to missing.
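The genotype- and site-level criteria for file 2A can be written as two small checks. A minimal sketch, assuming each VCF record has already been parsed into the hypothetical `Site`/`Genotype` classes below (not part of any library). Genotypes with DP < 10 or GQ < 15 are masked as in the methods text; the defaults are the long-tailed duck thresholds, HWE filtering is left out, and computing the pooled depth as the sum of per-genotype depths is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Genotype:
    gt: Optional[str]  # VCF-style call such as "0/1"; None = missing
    dp: int            # per-genotype read depth (FORMAT/DP)
    gq: int            # genotype quality (FORMAT/GQ)


@dataclass
class Site:
    qual: float             # VCF QUAL
    is_biallelic_snp: bool
    on_autosome: bool       # False for sex-chromosome scaffolds
    genotypes: List[Genotype]


def mask_low_quality_genotypes(site: Site, min_dp: int = 10, min_gq: int = 15) -> None:
    """Set genotypes with DP < min_dp or GQ < min_gq to missing, in place."""
    for g in site.genotypes:
        if g.dp < min_dp or g.gq < min_gq:
            g.gt = None


def passes_site_filters(site: Site, min_qual: float = 30.0, max_missing: float = 0.1,
                        min_pooled_dp: int = 250, max_pooled_dp: int = 470) -> bool:
    """Site-level checks (HWE not included). Defaults: long-tailed duck thresholds."""
    if not (site.is_biallelic_snp and site.on_autosome and site.qual >= min_qual):
        return False
    # Assumption: pooled depth = sum of per-genotype depths across all individuals.
    pooled_dp = sum(g.dp for g in site.genotypes)
    if not (min_pooled_dp < pooled_dp < max_pooled_dp):
        return False
    missing = sum(g.gt is None for g in site.genotypes) / len(site.genotypes)
    return missing < max_missing


# Hypothetical site: 14 birds at DP 20 plus one low-depth genotype.
site = Site(qual=55.0, is_biallelic_snp=True, on_autosome=True,
            genotypes=[Genotype("0/1", 20, 40) for _ in range(14)] + [Genotype("0/0", 5, 12)])
mask_low_quality_genotypes(site)
print(passes_site_filters(site))  # True: pooled DP 285, 1 of 15 genotypes missing
```

For the velvet scoter file 2B, the same sketch would use pooled-depth bounds of 200 and 420.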
3A) LTD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.mac2.vcf
Variant calls from file 2A, further filtered to exclude singletons (sites with a minor allele count <2).
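The step from 2A to 3A ("mac2" in the file name) keeps only sites whose minor allele is observed at least twice, which removes singletons. A minimal sketch for biallelic, diploid genotypes given as VCF-style strings; the function names are hypothetical, and whether the authors used a dedicated tool option for this (for example VCFtools `--mac 2`) is an assumption.

```python
from typing import List, Optional


def minor_allele_count(genotypes: List[Optional[str]]) -> int:
    """Copies of the rarer allele among called diploid genotypes at a biallelic site.

    Genotypes are VCF-style strings ("0/1", "1|1"); None or "./." counts as missing.
    """
    counts = {"0": 0, "1": 0}
    for gt in genotypes:
        if gt is None:
            continue
        for allele in gt.replace("|", "/").split("/"):
            if allele in counts:  # "." (missing allele) is skipped
                counts[allele] += 1
    return min(counts.values())


def drop_singletons(sites: List[List[Optional[str]]],
                    min_mac: int = 2) -> List[List[Optional[str]]]:
    """Keep sites whose minor allele count is at least min_mac."""
    return [s for s in sites if minor_allele_count(s) >= min_mac]


sites = [
    ["0/0", "0/1", "0/0"],  # MAC 1: singleton, removed
    ["0/1", "0/1", None],   # MAC 2: kept
    ["1/1", "1|1", "0/1"],  # MAC 1 (one copy of allele 0): removed
]
print(len(drop_singletons(sites)))  # 1
```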
4A) LTD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.mac2.MappabilityMask.100biggest.vcf
Variant calls from file 3A, further filtered to retain only sites in high-mappability regions of the 100 largest scaffolds.
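The 4A subset combines two independent conditions: a site must lie on one of the 100 largest scaffolds and inside a high-mappability region. A sketch, assuming scaffold lengths are available (for instance from the reference's `.fai` index) and that the mask is supplied as BED-style 0-based, half-open, non-overlapping intervals; how the mappability mask itself was produced is not described here, and all names are hypothetical.

```python
import bisect
from typing import Dict, List, Set, Tuple


def largest_scaffolds(lengths: Dict[str, int], n: int = 100) -> Set[str]:
    """Names of the n longest scaffolds (ties broken by name so the result is stable)."""
    ranked = sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    return {name for name, _ in ranked[:n]}


class MappabilityMask:
    """High-mappability regions as BED-style (0-based, half-open) intervals."""

    def __init__(self, intervals: Dict[str, List[Tuple[int, int]]]):
        self._starts: Dict[str, List[int]] = {}
        self._ends: Dict[str, List[int]] = {}
        for scaffold, ivs in intervals.items():
            ivs = sorted(ivs)
            self._starts[scaffold] = [start for start, _ in ivs]
            self._ends[scaffold] = [end for _, end in ivs]

    def contains(self, scaffold: str, pos0: int) -> bool:
        """True if 0-based position pos0 falls inside a high-mappability interval."""
        starts = self._starts.get(scaffold)
        if not starts:
            return False
        i = bisect.bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
        return i >= 0 and pos0 < self._ends[scaffold][i]


def keep_site(scaffold: str, vcf_pos: int, top: Set[str], mask: MappabilityMask) -> bool:
    # VCF POS is 1-based, whereas the BED-style intervals are 0-based.
    return scaffold in top and mask.contains(scaffold, vcf_pos - 1)


# Hypothetical toy data (keeping the 2 largest scaffolds instead of 100).
top = largest_scaffolds({"s1": 500, "s2": 300, "s3": 1000}, n=2)
mask = MappabilityMask({"s1": [(100, 200)], "s3": [(0, 50), (900, 1000)]})
print(keep_site("s1", 101, top, mask), keep_site("s2", 10, top, mask))  # True False
```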
5A) LTD_allSites.max2.minDP10.minGQ15.HWE.vcf.gz
File containing both variant and invariant sites, filtered on quality, depth and HWE as described for file 2A. Not filtered for missing data.
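Invariant sites matter for analyses based on the site frequency spectrum (SFS), because a variant-only file cannot tell you how many monomorphic sites (the zero class) were callable. A minimal sketch of a folded SFS built from such a file, assuming genotypes have been parsed into per-site lists of VCF-style strings. Because this file is not filtered for missing data, the sketch simply drops incompletely genotyped sites, whereas dedicated SFS tools often down-project instead. The function is a hypothetical illustration, not the procedure used in the study.

```python
from typing import Iterable, List, Optional


def folded_sfs(sites: Iterable[List[Optional[str]]], n_individuals: int) -> List[int]:
    """Folded SFS from diploid biallelic genotypes, including invariant sites.

    Only sites called in all n_individuals are counted (no projection).
    Bin k holds sites whose minor allele appears k times, k = 0..n_individuals.
    """
    n_chrom = 2 * n_individuals
    sfs = [0] * (n_individuals + 1)
    for genotypes in sites:
        complete = len(genotypes) == n_individuals
        alt = 0
        for gt in genotypes:
            if gt is None:
                complete = False
                break
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                complete = False
                break
            alt += sum(a != "0" for a in alleles)
        if complete:
            sfs[min(alt, n_chrom - alt)] += 1
    return sfs


# Hypothetical sites for three individuals; the last one has a missing call.
sites = [
    ["0/0", "0/0", "0/0"],  # invariant: bin 0
    ["0/1", "0/0", "0/0"],  # 1 alt copy: bin 1
    ["1/1", "1/1", "1/0"],  # 5 alt copies of 6, folded: bin 1
    ["0/1", "1/1", "0/0"],  # 3 of 6: bin 3
    ["0/1", None, "0/0"],   # incomplete: skipped
]
print(folded_sfs(sites, 3))  # [1, 2, 0, 1]
```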
1B) VSD_raw.vcf.gz
Raw variant calls from whole-genome sequencing (WGS) data of velvet scoter.
2B) VSD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.vcf
Variant calls from file 1B, filtered on quality, depth, missing data and HWE. Contains only biallelic SNPs with QUAL ≥30 on autosomal scaffolds that are in Hardy-Weinberg equilibrium, have <10% missing data and have a pooled read depth >200 and <420. Genotypes with read depth <10 or genotype quality <15 were set to missing.
3B) VSD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.mac2.vcf
Variant calls from file 2B, further filtered to exclude singletons (sites with a minor allele count <2).
4B) VSD.filtered.max2.QUAL30.minDP10.minGQ15.miss0.9.variant.sansSexChr.HWE.mac2.MappabilityMask.100biggest.vcf
Variant calls from file 3B, further filtered to retain only sites in high-mappability regions of the 100 largest scaffolds.
5B) VSD_allSites.max2.minDP10.minGQ15.HWE.vcf.gz
File containing both variant and invariant sites, filtered on quality, depth and HWE as described for file 2B. Not filtered for missing data.
A. Sampling
A total of 16 individuals of each species (13 males and three females per species) were sampled in their overwintering areas in Denmark. The birds were obtained from Danish hunters during the open hunting seasons between December 2017 and January 2019, as part of a separate study on the distribution and feeding ecology of C. hyemalis and M. fusca run by the Danish Hunters’ Association and Aarhus University. Birds were kept frozen at -18°C until muscle tissue was sampled; tissue samples were then stored in 96% ethanol at -18°C until DNA extraction.
B. DNA extraction and whole genome resequencing
DNA was extracted from 15 individuals of each species using the E.Z.N.A.® Tissue DNA Kit (Omega Bio-tek, CA, USA) following the manufacturer's recommendations. Whole-genome resequencing was outsourced to BGI and comprised 150 bp paired-end sequencing on the DNBseq platform, targeting a sequencing depth of 20X.
C. Mapping and SNP calling
Raw reads were trimmed with Sickle v1.33 (Joshi & Fass 2011) and then mapped to the corresponding reference genome of each species (GenBank accessions: C. hyemalis GCA_029619115.1; M. fusca GCA_029620185.1) using BWA-MEM v0.7.17 (Li & Durbin 2009). Picard tools v2.26.3 (Broad Institute 2019) was then used to sort the alignments and to mark and remove duplicate reads prior to genotype calling.
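The read-processing steps above can be laid out as a command sequence. This sketch only assembles argument lists for Sickle, BWA-MEM and Picard and prints them; it does not run anything. The file names, thread count, `-t sanger` quality encoding and the `picard` wrapper command are assumptions, since the options actually used in the study are not stated beyond the tool versions.

```python
import shlex
from typing import List


def trim_cmd(r1: str, r2: str, prefix: str) -> List[str]:
    # Sickle paired-end mode; "-t sanger" assumes Phred+33 quality scores.
    return ["sickle", "pe", "-f", r1, "-r", r2, "-t", "sanger",
            "-o", f"{prefix}_R1.trim.fq", "-p", f"{prefix}_R2.trim.fq",
            "-s", f"{prefix}_singles.trim.fq"]


def map_cmd(reference: str, prefix: str, threads: int = 8) -> List[str]:
    # BWA-MEM on the trimmed pairs; the reference must be indexed with `bwa index`.
    # Output is SAM on stdout, so redirect it to a file when running.
    return ["bwa", "mem", "-t", str(threads), reference,
            f"{prefix}_R1.trim.fq", f"{prefix}_R2.trim.fq"]


def sort_cmd(sam: str, bam: str) -> List[str]:
    # Assumes a `picard` wrapper on PATH (otherwise: java -jar picard.jar ...).
    return ["picard", "SortSam", f"I={sam}", f"O={bam}", "SORT_ORDER=coordinate"]


def dedup_cmd(bam: str, out_bam: str, metrics: str) -> List[str]:
    # REMOVE_DUPLICATES=true drops duplicate reads instead of only flagging them.
    return ["picard", "MarkDuplicates", f"I={bam}", f"O={out_bam}",
            f"M={metrics}", "REMOVE_DUPLICATES=true"]


# Hypothetical sample and reference file names.
for cmd in (trim_cmd("ind01_1.fq.gz", "ind01_2.fq.gz", "ind01"),
            map_cmd("C_hyemalis.fa", "ind01"),
            sort_cmd("ind01.sam", "ind01.sorted.bam"),
            dedup_cmd("ind01.sorted.bam", "ind01.dedup.bam", "ind01.dup_metrics.txt")):
    print(shlex.join(cmd))
```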
SNP calling was conducted with the mpileup and call commands of BCFtools v1.13 (Danecek et al. 2021) to obtain per-sample SNP genotypes, using a minimum mapping quality of 30. The dataset was then filtered with vcfutils.pl and VCFtools v0.1.16 (Danecek et al. 2011) to remove indels, monomorphic sites and sites with extremely high or low depth (thresholds determined visually from the depth distribution of SNPs; Supplementary Material Fig. S1). Individual genotypes with low read depth (<10) or low genotype quality (<15) were set to missing, and only biallelic SNPs with a QUAL score ≥30 and <10% missing genotypes were retained. Finally, SNPs not in Hardy-Weinberg equilibrium were discarded using the R script VCF_HWF (https://github.com/shenglin-liu/VCF_HWF/blob/main/VCF_HWF.r).
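The HWE step used the VCF_HWF R script, whose internals are not described here. As a generic stand-in, not a reimplementation of that script, the sketch below computes the two-sided exact SNP test of Hardy-Weinberg equilibrium described by Wigginton, Cutler & Abecasis (2005) from the genotype counts at one site. The significance threshold applied in the study is not stated, so any cut-off used with this function is an assumption.

```python
def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided exact HWE p-value for one biallelic SNP (Wigginton et al. 2005)."""
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        return 1.0
    rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    # Relative probabilities of each possible heterozygote count, given n and allele counts.
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)  # start near the expected het count
    if mid % 2 != rare % 2:
        mid += 1  # het count must share the parity of the rare-allele count
    probs[mid] = 1.0
    # Walk downwards from mid using the recurrence P(h-2)/P(h).
    hom_r = (rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        hom_r += 1
        hom_c += 1
        het -= 2
    # Walk upwards from mid using the recurrence P(h+2)/P(h).
    hom_r = (rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        hom_r -= 1
        hom_c -= 1
        het += 2
    p_obs = probs[n_het]
    # p-value: total probability of outcomes no more likely than the observed one.
    p = sum(pr for pr in probs if pr <= p_obs) / sum(probs)
    return min(1.0, p)


print(hwe_exact_p(50, 25, 25))  # 1.0: observed het count is the most likely one
```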
Additionally, we ran SNP calling without the -v (variants-only) option of bcftools call to produce an ‘all sites’ data set containing both polymorphic and monomorphic sites. This data set was subsequently filtered to exclude indels, sites with extreme depth, genotypes with low genotype quality and SNPs out of HWE, as described above. No filtering for missing data was performed.
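The variant-only and 'all sites' calling runs differ only in the `-v` option of `bcftools call`. A sketch that assembles both invocations as argument lists and prints them as shell pipelines; the reference and BAM names, the requested annotation tags and the output options are assumptions rather than the study's exact settings. Genotype quality (GQ) is not written by `bcftools call` by default and would have to be requested separately for the GQ filter; that option is not shown here.

```python
import shlex
from typing import List


def mpileup_cmd(reference: str, bams: List[str], min_mapq: int = 30) -> List[str]:
    # -q: minimum mapping quality; -a: also write per-sample AD and DP tags.
    return ["bcftools", "mpileup", "-f", reference, "-q", str(min_mapq),
            "-a", "FORMAT/AD,FORMAT/DP", "-Ou", *bams]


def call_cmd(out_vcf: str, variants_only: bool) -> List[str]:
    # -m: multiallelic caller. With -v only variant sites are written; without it
    # monomorphic sites are kept as well (the 'all sites' data set).
    cmd = ["bcftools", "call", "-m"]
    if variants_only:
        cmd.append("-v")
    return cmd + ["-Oz", "-o", out_vcf]


bams = ["ind01.dedup.bam", "ind02.dedup.bam"]  # hypothetical inputs
for variants_only, out in ((True, "variants.vcf.gz"), (False, "all_sites.vcf.gz")):
    # The mpileup output is piped into bcftools call, which reads it from stdin.
    print(shlex.join(mpileup_cmd("C_hyemalis.fa", bams)), "|",
          shlex.join(call_cmd(out, variants_only)))
```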