Skip to main content

Weak population genetic structure in Eurasian spruce bark beetle over large regional scales in Sweden


Ellerstrand, Simon J. et al. (2022), Weak population genetic structure in Eurasian spruce bark beetle over large regional scales in Sweden, Dryad, Dataset,


The Eurasian spruce bark beetle, Ips typographus, is a major pest, capable of killing spruce forests during large population outbreaks. Recorded dispersal distances of individual beetles are typically within hundreds of meters or a few kilometres. However, the connectivity between populations at larger distances and longer time spans and how this is affected by the habitat is less studied, despite its importance for understanding at which distances local outbreaks may spread. Previous population genetic studies in I. typographus typically used low resolution markers. Here, we use genome-wide data to assess population structure and connectivity of I. typographus in Sweden. We used 152 individuals from 19 population samples, distributed over 830 km from Strömsund (63º 46' 8'' N) in the north to Nyteboda (56º 8' 50'' N) in the south, to capture processes at a large regional scale, and a transect sampling design adjacent to a recent outbreak to capture processes at a smaller scale (76 km). Using restriction site-associated DNA sequencing (RADseq) markers capturing 1409-1997 SNPs throughout the genome, we document a weak genetic structure over the large scale, potentially indicative of high connectivity with extensive gene flow. No differentiation was detected at the smaller scale. We find indications of isolation-by-distance both for relative (FST) and absolute divergence (Dxy). The two northernmost populations are most differentiated from the remaining populations, and diverge in parallel to the southern populations for a set of outlier loci. In conclusion, the population structure of I. typographus in Sweden is weak, suggesting a high capacity to disperse and establish outbreak populations in new territories.


Material and sampling

The I. typographus specimens were captured at local stations that were part of the Swedish Forest Agency and Södra (Sweden’s largest forest owner association) networks. The stations were distributed throughout Sweden, and samples were gathered during the peak activity period in the spring of 2019 (Figure 1A). The large-scale sampling spanned 19 main locations, ranging from Strömsund in the north to Nyteboda in the south (Table S1). These two locations are ca. 830 km apart. As new attacks by spruce bark beetles have typically been suggested to be within 500 meters of previous attacks (Wichmann and Ravn 2001), we additionally designed a transect to resolve population structure at a more local scale. Both transects started at a recent outbreak locality, Nyteboda forest, and stretched towards the north and the south, respectively. The north-south direction was chosen to reflect the main direction in the larger data set. The trap intervals started at 200 m and were doubled for each trap, resulting in a series of distances ranging between 200 m up to 13 km between each pair of traps in the south and 26 km in the north, with a cumulative maximum distance of 76 km between the southernmost and northernmost traps (Figure S1), thus an order of magnitude shorter than the larger scale.

Bark beetles were captured using Theysohn pheromone slit traps (Galko et al. 2016) with a ca. 100 m expected sampling range (Schlyter 1992), collected within a week during April 2019, and stored in 96% ethanol and frozen as they arrived to Lund University. Traps from the transect were emptied the day after the trap was set up and only living individuals were preserved at -80 °C. Traps from the larger scale swarming surveillance were emptied within a week, but differences in the time the individuals had been dead prior to being stored in ethanol implied that DNA quality could vary across stations. We sampled 10 individuals per population and transect location, but included fewer specimens in a few populations where the DNA was degraded (Table S1).

DNA extraction and library preparation

The DNA extraction protocol, the RADtag library protocol, and summaries of the RADtag library preparation have been deposited in a public repository ( DNA was extracted using the Qiagen blood and tissue kit, following the protocol suggested for insect DNA. The whole insect was homogenised with the TissueLyser for 4 min at 30 Hz. The final DNA elution was done in 60 µl buffer EB. We monitored DNA concentration with a NanoDrop, and performed Qubit analyses on a subset of the samples (14 out of 634 extracted samples; 4 of these samples were among the 320 that were sequenced) as nanodrop quantification may inflate the estimated concentration. To further assess DNA-quality, all extractions were run on agarose gels to estimate fragment size. Samples with concentrations below ~13 ng/µl or smeared profiles on the gel, indicative of decomposed DNA, were omitted from further analyses.

For the two transects, two single-digest RADtag libraries were created from 160 samples per library. All samples from each transect were pooled in the same library to avoid any cross-library batch effects. Of the 320 samples, 13 constituted individual duplicates originating from the same extract, and were split into two different tubes and treated as separate samples before library preparation to enable us to assess the repeatability of the method. From each sample, 500 ng DNA was used for restriction digest with the restriction enzyme SbfI from New England Biolabs. The DNA was sheared to allow for identification and removal of PCR duplicates. Individual barcoding was performed with 40 P1 inline barcodes of 8 bp differing by at least three sequence positions, combined with four P2 index barcodes of 8 bp differing by at least six sequence positions. Both libraries were amplified with a 16 cycle PCR. We performed size-selection using SPRI beads throughout the library preparation, with a final size selection following the PCR step. The bulk of the final library consisted of fragments in the size range of 380-730 bp. The RADtag libraries were sequenced on separate lanes on an SP flow cell as paired-end 150 bp reads on an Illumina NovaSeq 6000 platform by National Genomics Infrastructure – Stockholm, Sweden (NGI Stockholm,

RADseq data filtering and variant calling

All the code for RADseq data filtering, variant calling, and population genomic analyses have been deposited in a public repository ( The RADseq data were demultiplexed by the NGI sequencing facility to sublibraries based on the index barcodes on the P2 adapter. These libraries were then demultiplexed with the bioinformatics pipeline Stacks 2.53 (Catchen et al. 2013; Catchen et al. 2011; Rochette et al. 2019), rescuing barcodes with a maximum of one mismatch, and filtering adapter sequences allowing for two mismatches. PCR duplicates were then removed based on the randomly sheared paired-end reads, and adapters were removed using Trimmomatic 0.36 (Bolger et al. 2014), by providing custom adapter lists. Read quality was evaluated with FastQC 0.11.8 ( The reads were aligned to a concatenated reference genome consisting of the nuclear genome assembled by Powell et al. (2021) and the mitochondrion assembled by Lv et al. (2017) using BWA-MEM 0.7.17 (H. Li and Durbin 2009). After alignment, the bam files were sorted by coordinates with SAMtools 1.10 (H. Li et al. 2009) and only the single-end reads were extracted for variant calling.

Variants were called with  HaplotypeCaller from GATK (McKenna et al. 2010) using a minimum base quality of 20 from non-soft clipped bases. To correct for low levels of sequencing errors, batch effects and cross-contamination, we filtered out minor alleles representing less than 10% of the reads per individual and site. The resulting gVCF files were combined with CombineGVCFs and genotypes were called using GenotypeGVCFs on properly paired reads with a minimum phred-scaled confidence of 20, and then SNPs were extracted. The assembled reference genome contains a large portion of repetitive sequences and low complexity regions, which were hard masked in the assembly (Powell et al. 2021). VCFtools 0.1.16 (Danecek et al. 2011) was used to filter out the hard masked regions. Briefly, we filtered SNPs according to GATK hard filtering practice, and for a minimum quality of 30, minimum read depth of 5, as well as additional filters following O’Leary et al. (2018), which are specified in Table S3. Samples with low coverage, high missingness, or evidence of severe cross-contamination were removed from the dataset as follows: genotypes with a depth below 30, sites with a mean depth less than 48, a mean depth higher than 52, sites present in less than 80% of the individuals were filtered out.

The error rate of the dataset was evaluated for seven duplicates samples using BCFtools gtcheck with genotype likelihoods and used to examine the error rate, and we determined filtration criteria based on this rate. Duplicate samples were then removed from the dataset. To obtain balanced sample sizes, we sampled eight individuals per sampling location, with only seven individuals in two of the transect locations, as we did not have eight individuals for these. Variants present in less than five individuals in any sampling location were removed, as well as sites present in less than 90% of the remaining individuals.

We divided the samples into one dataset containing all the regional locations and two of the transect locations, and a second data set containing the transect locations only. We kept only alleles still present at least 3 times in each new dataset. Plink 1.90b4.9 (Purcell et al. 2007) was used to linkage prune the data using 50 kb windows, step sizes of 10 kb and a cut-off at a r-squared value higher than 0.1 (see Table S3 for number of SNPs following each filter step). Due to the absence of genetic structure in the shorter, local transect dataset, only the larger, regional dataset was used and reported for all subsequent analyses. To evaluate whether the conservative filtering approach, aimed to reduce the error rates between the duplicates included to check the quality of our calls, a less stringent filtering approach was applied to the regional dataset for comparison resulting in 3398 SNPs for the linkage pruned data set. We found no differences between the two data sets (Table S4, data not shown).

Usage notes

All code used for creating this dataset can be found on the following link:


Formas, Award: 2018-01061