Data from: Demographic history inferred from an inversion-rich spruce bark beetle genome

Zielinski, Piotr1; Morales-García, Julia1; Schebeck, Martin2; Duduman, Mihai Leonard3; Nadachowska-Brzyska, Krystyna1

Published Feb 25, 2026 on Dryad. https://doi.org/10.5061/dryad.3ffbg7b01

Data files

Feb 25, 2026 version files: 4.95 GB total

BED_files.zip - 768.43 KB
Fastsimcoal_files.zip - 735.13 KB
PSMC_pipeline - 22.33 KB
Raw_data_processing.zip - 20.07 KB
README.md - 9.59 KB
SFS_pipeline - 11.05 KB
VCF.tar - 4.95 GB

Abstract

The demographic history of species inferred from whole-genome data provides quantitative insights into key biological parameters such as population size changes and divergence times. Reliable estimates often require data that have not been affected by selection. Extensive research, however, indicates that many species harbour multiple polymorphic chromosomal inversions, which often evolve under different selective pressures. Consequently, inversions can influence genome-wide patterns of variation and subsequent evolutionary inferences. In this study, we used genome-wide data from over 300 spruce bark beetle (Ips typographus) individuals from 23 populations across Europe to reconstruct their demographic history and to investigate the impact of a complex polymorphic inversion landscape (covering approximately 28% of the beetle genome) on demographic inference. We used two complementary methods, Pairwise Sequentially Markovian Coalescent (PSMC) and Site Frequency Spectrum (SFS)-based modelling, and revealed a Late Pleistocene divergence (~79 kya) between populations from the southern and northern parts of the species’ European range, and a long-term effective population size of ~250,000. The southern group underwent significant population expansion after this divergence event, whereas the northern group expanded during the Holocene (~7 kya). Recent population size estimates suggest that the southern group is twice as large as the northern group. Neglecting the presence of chromosomal inversions did not significantly affect the model selection procedure and resulted in relatively small biases in the estimated demographic parameters. This study provides information on the historical population dynamics of the spruce bark beetle and improves our understanding of the influence of a complex genomic architecture on the inference of evolutionary history.

Dataset DOI: 10.5061/dryad.3ffbg7b01

Description of the data and file structure

Associated publication:
Demographic history inferred from an inversion-rich spruce bark beetle genome

Raw sequencing data availability:
Raw FASTQ files are deposited in the NCBI Sequence Read Archive (SRA) and are available under BioProject ID PRJNA1013983.

General information

This repository contains data files and analysis pipelines used for population genomic and demographic analyses of Ips typographus. The materials include workflows for raw sequencing data processing, variant calling, site frequency spectrum (SFS) construction, and demographic inference using both SFS-based (fastsimcoal2) and Pairwise Sequentially Markovian Coalescent (PSMC)-based approaches.

Processed variant call data (VCF format) and scripts are provided to facilitate transparency, reproducibility, and reuse of the analyses presented in the associated manuscript.

Reference genome information

Sequencing reads were mapped to the Ips typographus reference genome assembly:

Assembly accession: GCA_016097725.1
Genome size: 236.8 Mb

Analyses were restricted to the 25 autosomal contigs listed in ALL_PSMC_Contigs.txt (provided in BED_files.zip; see below).

Repository file inventory

This repository contains the following files and folders:

Raw_data_processing.zip

VCF.tar

BED_files.zip

SFS_pipeline/

Fastsimcoal_files.zip

PSMC_pipeline/

File/folder content description

Raw_data_processing.zip

Pipeline and scripts for parallelised preprocessing of raw sequencing data prior to downstream analyses (each subfolder contains a separate README file). The workflows are designed for execution in high-performance computing (HPC) environments. All files within the subfolders are plain text and can be viewed and edited with any plain text editor, e.g. Notepad++ or nano.

Directory contents:

Raw_data_processing_pipeline - main wrapper pipeline coordinating successive preprocessing steps
parallel_trimming/ - parallel adapter trimming and quality filtering of raw sequencing reads
parallel_sorting/ - parallel sorting and indexing of alignment files
parallel_duplicate_removal/ - parallel removal or marking of PCR duplicates
parallel_coverage/ - calculation of genome-wide or regional sequencing coverage
parallel_snp_calling/ - parallel SNP calling from processed alignment files
parallel_genotyping/ - genotyping of variants across individuals
parallel_combine_variants/ - merging and combining variant call files into final datasets

Input: Raw sequencing reads (FASTQ files from NCBI SRA).
Output: Processed BAM & VCF files.

Software requirements: FastQC, Trimmomatic, Bowtie2, samtools/bcftools, Picard, GATK.
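The parallel_coverage/ step computes sequencing coverage. As a minimal sketch, assuming coverage is summarised from the three-column, tab-separated output of `samtools depth -a` (contig, 1-based position, depth) rather than reproducing the scripts in this archive, mean depth per contig could be computed as follows. The function name and the contig names ctgA/ctgB are hypothetical.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable


def mean_depth_per_contig(depth_lines: Iterable[str]) -> Dict[str, float]:
    """Mean read depth per contig from `samtools depth -a` style lines.

    Each line is expected to hold: contig <TAB> position <TAB> depth.
    With `-a`, zero-depth positions are reported too, so the mean is taken
    over every listed position, not only the covered ones.
    """
    total = defaultdict(int)  # summed depth per contig
    sites = defaultdict(int)  # number of positions seen per contig
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, _pos, depth = line.split("\t")
        total[contig] += int(depth)
        sites[contig] += 1
    return {contig: total[contig] / sites[contig] for contig in total}


# Toy input standing in for real `samtools depth -a sample.bam` output.
toy = io.StringIO("ctgA\t1\t10\nctgA\t2\t0\nctgA\t3\t20\nctgB\t1\t5\n")
depth = mean_depth_per_contig(toy)
print(depth)  # {'ctgA': 10.0, 'ctgB': 5.0}
```

For whole BAM files the same logic can read `samtools depth -a` output from a pipe line by line, so memory use stays constant regardless of genome size.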

VCF.tar

Compressed Variant Call Format (VCF) file and its corresponding tabix index file (.tbi extension) used for population genomic and demographic analyses. The VCF file follows the VCF v4.2 specification. It contains variation for the whole-genome dataset, produced by FASTQ processing, read mapping and variant calling, and was used for SFS construction and population genetic analyses. It can be further filtered with the BED files (BED_files.zip) to reproduce the data subsets used in the publication for downstream analyses. The .tbi file is the tabix index of the compressed VCF: it allows software tools to rapidly access specific genomic regions without reading the entire file sequentially. The compressed VCF must be decompressed or streamed (e.g. with bcftools view) before its contents can be read in a plain text editor such as Notepad++ or nano; the .tbi index is a binary file and is not meant to be edited. For further bioinformatic analyses we recommend the following software: bedtools, bcftools, GATK, easySFS.
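For quick scripted access without extra dependencies, a bgzip-compressed VCF can also be streamed with Python's standard `gzip` module, because BGZF files are valid multi-member gzip files. The sketch below is illustrative only. The function names are hypothetical, the name of the VCF inside VCF.tar is not assumed (`path` is a placeholder), and the contigs ctgA/ctgB and samples ind1/ind2 are toy values. For region queries on the full 4.95 GB file, bcftools with the .tbi index will be far faster.

```python
import gzip
import os
import tempfile
from typing import Iterator, List, Optional, Set, Tuple

Record = Tuple[str, int, str, str, List[str]]  # (CHROM, POS, REF, ALT, genotypes)


def vcf_samples(path: str) -> List[str]:
    """Return sample names from the #CHROM header line of a (b)gzipped VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 10 onwards of the header line are sample IDs.
                return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def vcf_records(path: str, contigs: Optional[Set[str]] = None) -> Iterator[Record]:
    """Stream records, optionally keeping only the given contigs.

    Only the GT subfield of each sample is kept; the VCF specification
    requires GT to be the first FORMAT key whenever it is present.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
            if contigs is not None and chrom not in contigs:
                continue
            genotypes = [sample.split(":", 1)[0] for sample in fields[9:]]
            yield chrom, pos, ref, alt, genotypes


# Toy demonstration on a tiny, hypothetical VCF written to a temporary file.
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\n"
    "ctgA\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:9\n"
    "ctgB\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:15\t0/1:11\n"
)
tmp_path = os.path.join(tempfile.mkdtemp(), "toy.vcf.gz")
with gzip.open(tmp_path, "wt") as out:
    out.write(toy_vcf)

samples = vcf_samples(tmp_path)
kept = list(vcf_records(tmp_path, {"ctgA"}))
print(samples)  # ['ind1', 'ind2']
print(kept)     # [('ctgA', 100, 'A', 'G', ['0/1', '1/1'])]
```

The `contigs` argument could, for example, be filled from the contig list in ALL_PSMC_Contigs.txt, assuming that file holds one contig name per line.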

BED_files.zip

This archive contains files defining genomic regions used for filtering, masking and partitioning the genome, in particular for including or excluding inversions. All files are plain text and can be viewed and edited with any plain text editor, e.g. Notepad++ or nano. For further bioinformatic analyses we recommend the following software: bedtools, bcftools, GATK.

Contents:

ALL_PSMC_Contigs.txt - plain text file, contains list of contigs included in PSMC analyses
PSMC_INVERSIONS.bed - plain text file in bed format, contains genomic coordinates of inversion regions used for filtering
ALL_25_PSMC_contigs.bed - plain text file in bed format, file defining the 25 contigs used for the whole-genome dataset
ALL_25_PSMC_contigs_No_inversions.bed - plain text file in bed format, file defining genomic coordinates of the 25 contigs with inversions removed
ALL_25_PSMC_contigs_No_inversions_no_genes_no_repeats.bed - plain text file in bed format, file defining genomic coordinates of 25 contigs excluding inversions, genes, and repeats
ALL_25_PSMC_contigs_Inversions_no_genes_no_repeats.bed - plain text file in bed format, file defining genomic coordinates of 25 contigs including only inverted regions, excluding genes and repetitive elements
Small_colinear.bed - plain text file in bed format, file defining genomic coordinates of regions representing small colinear segments in the associated publication
Small_colinear_sorted.bed - plain text file in bed format, file defining genomic coordinates of small colinear segments in the associated publication, sorted version of Small_colinear.bed
ALL_25_PSMC_contigs_SMALL_no_genes_no_repeats.bed - plain text file in bed format, file defining genomic coordinates of small colinear segments in the associated publication, excluding genes and repeats
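Two details matter when these BED files are used outside bedtools or bcftools. First, BED coordinates are 0-based and half-open, whereas VCF POS values are 1-based. Second, overlapping intervals should be merged before their lengths are summed. The following minimal standard-library sketch illustrates both points. Its function names and toy contigs are hypothetical, and it is not part of the archived pipelines.

```python
import bisect
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # BED start (0-based, inclusive), end (exclusive)


def read_bed(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Parse the first three BED columns into sorted, merged intervals per contig."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged: Dict[str, List[Interval]] = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out: List[Interval] = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlapping or book-ended: extend previous
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def covered_bp(bed: Dict[str, List[Interval]]) -> int:
    """Total number of bases covered by merged intervals."""
    return sum(end - start for intervals in bed.values() for start, end in intervals)


def contains(bed: Dict[str, List[Interval]], chrom: str, pos_1based: int) -> bool:
    """True if a 1-based VCF POS falls inside a 0-based half-open BED interval."""
    intervals = bed.get(chrom, [])
    pos0 = pos_1based - 1  # convert VCF coordinate to the BED convention
    # Index of the last interval whose start is <= pos0.
    i = bisect.bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


toy_bed = io.StringIO("ctgA\t0\t100\nctgA\t50\t150\nctgA\t200\t300\nctgB\t10\t20\n")
bed = read_bed(toy_bed)
print(bed["ctgA"])                                     # [(0, 150), (200, 300)]
print(covered_bp(bed))                                 # 260
print(contains(bed, "ctgA", 150), contains(bed, "ctgA", 151))  # True False
```

Dividing `covered_bp` by a chosen denominator (for example the 236.8 Mb assembly size, or the summed length of the 25 contigs) gives the fraction of that sequence a BED file covers. Which denominator is appropriate depends on the comparison being made.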

SFS_pipeline

Pipeline for constructing observed site frequency spectra (SFS; plain text files with the .obs extension) from the variant data. The pipeline files are plain text and can be viewed and edited with any plain text editor, e.g. Notepad++ or nano. For further bioinformatic analyses we recommend using the following software: bedtools, bcftools, GATK, easySFS.

Observed SFS (.obs) files represent joint minor allele frequency spectra, where:

rows represent allele frequency bins in one analysed population and columns the allele frequency bins in the other
each matrix cell represents the number of SNPs observed in a given combination of frequency classes

Units: number of SNPs per frequency bin.
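The structure of a two-population joint minor allele SFS can be illustrated by a small sketch that tallies per-site allele counts into a matrix and writes that matrix in the tab-separated layout that, as we understand the fastsimcoal2 manual, is used for joint spectra. That layout has a header line giving the number of observations, a d0_* column header for pop0, and one d1_* row per frequency class of pop1. The sketch rests on several assumptions and does not reproduce SFS_pipeline. It assumes biallelic sites with no missing data. It folds each site on the allele that is rarer in the pooled sample, and it leaves sites with an exact 50:50 pooled count unfolded. Tools such as easySFS handle missing data by projection instead.

```python
from typing import Iterable, List, Tuple


def joint_maf_sfs(counts: Iterable[Tuple[int, int]], n0: int, n1: int) -> List[List[int]]:
    """Tally a folded two-population joint SFS.

    counts: per-site (i, j) = copies of one allele (e.g. ALT) among the n0
            haploid samples of pop0 and the n1 of pop1.
    Returns an (n1 + 1) x (n0 + 1) matrix: rows index pop1, columns pop0.
    """
    sfs = [[0] * (n0 + 1) for _ in range(n1 + 1)]
    total = n0 + n1
    for i, j in counts:
        if not (0 <= i <= n0 and 0 <= j <= n1):
            raise ValueError(f"allele count out of range: {(i, j)}")
        # Fold on the allele that is rarer in the pooled sample (ties kept as-is).
        if 2 * (i + j) > total:
            i, j = n0 - i, n1 - j
        sfs[j][i] += 1
    return sfs


def to_fsc_obs(sfs: List[List[int]]) -> str:
    """Format a joint SFS matrix in a fastsimcoal2-style joint .obs layout."""
    n_cols = len(sfs[0])
    lines = ["1 observations"]
    lines.append("\t" + "\t".join(f"d0_{c}" for c in range(n_cols)))
    for r, row in enumerate(sfs):
        lines.append(f"d1_{r}\t" + "\t".join(str(x) for x in row))
    return "\n".join(lines) + "\n"


# Toy data: 2 diploids per population, i.e. n0 = n1 = 4 gene copies.
sites = [(1, 0), (3, 4), (2, 1), (0, 2)]
sfs = joint_maf_sfs(sites, n0=4, n1=4)
print(to_fsc_obs(sfs))
```

Here the site (3, 4) carries the major allele in the pooled sample, so it is folded to (1, 0) and lands in the same cell as the first site.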

Two population groups were defined for demographic inference:

Northern population (Nor) – individuals originating from Sweden, Norway and Finland

Southern population (Sou) – individuals originating from Italy, Austria, Germany and Czech Republic
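When building the two-population SFS with easySFS, samples are assigned to groups through a population file. As we understand easySFS, this is a tab-separated file with a sample name and a population label on each line. The sketch below derives such lines from a sample-to-country table, using the country grouping given above. The sample IDs and the "Other" country are hypothetical placeholders; real sample IDs come from the VCF header.

```python
from typing import Dict, List

# Country-to-group assignment as described above.
GROUP_BY_COUNTRY: Dict[str, str] = {
    "Sweden": "Nor", "Norway": "Nor", "Finland": "Nor",
    "Italy": "Sou", "Austria": "Sou", "Germany": "Sou", "Czech Republic": "Sou",
}


def pops_file_lines(sample_country: Dict[str, str]) -> List[str]:
    """Build 'sample<TAB>group' lines; samples from unlisted countries are left out."""
    lines = []
    for sample, country in sorted(sample_country.items()):
        group = GROUP_BY_COUNTRY.get(country)
        if group is not None:
            lines.append(f"{sample}\t{group}")
    return lines


# Hypothetical sample IDs and countries, for illustration only.
demo = {"IT_01": "Italy", "SE_07": "Sweden", "XX_01": "Other"}
lines = pops_file_lines(demo)
print("\n".join(lines))
```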

Fastsimcoal_files.zip

Input files for demographic inference using fastsimcoal2, organized in folders by genomic partition. They were used to test demographic models under different genomic partitions in order to assess the effects of inversions and of genome subsets on the inference. All files are plain text and can be viewed and edited with any plain text editor, e.g. Notepad++ or nano. For further bioinformatic analyses we recommend using the following software: fastsimcoal2.

Top-level directories and contents:

whole-genome/ - contains files used for full-genome analyses
inversions-only/ - contains files used for analyses restricted to inverted regions
no-inversions/ - contains files used for analyses excluding inversions
no-inversions-small/ - contains files used for analyses of smaller subset of non-inverted regions
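Combining the four partitions above with the six model directories described below gives a grid of 24 partition-by-model combinations. Running several independent fastsimcoal2 replicates per model and keeping the best-likelihood run is common practice. The sketch below enumerates such a run plan. It assumes that Fastsimcoal_files.zip was extracted to a hypothetical `Fastsimcoal_files` directory, and the replicate count is arbitrary rather than the number used in the study.

```python
from itertools import product
from pathlib import Path
from typing import List, Tuple

PARTITIONS = ["whole-genome", "inversions-only", "no-inversions", "no-inversions-small"]
MODELS = ["IM", "IMDE", "ISO", "ISODE", "SC", "SCDE"]


def run_plan(root: Path, n_replicates: int) -> List[Tuple[Path, int]]:
    """(model directory, replicate index) pairs for every partition x model."""
    return [
        (root / partition / model, rep)
        for partition, model, rep in product(PARTITIONS, MODELS, range(n_replicates))
    ]


plan = run_plan(Path("Fastsimcoal_files"), n_replicates=3)
print(len(plan))                           # 72
print(plan[0][0].as_posix(), plan[0][1])   # Fastsimcoal_files/whole-genome/IM 0
```

On an HPC system, each entry of such a plan could map to one array-job task.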

Each folder contains:

Observed site frequency spectrum files (in two different file formats) required by fastsimcoal2 to run a given demographic model. The file names (for example NorSouC25INV_jointMAFpop1_0.obs and NorSouC25INV_MSFS.obs) follow this convention:
- Nor - northern population
- Sou - southern population
- C25 - dataset restricted to 25 contigs used in demographic analyses
- INV - genomic partition including just inversion regions
Demographic model directories corresponding to different demographic scenarios: IM/, IMDE/, ISO/, ISODE/, SC/, SCDE/.
- IM - isolation with constant migration model
- IMDE - isolation with constant migration model and single, instant demographic event (expansion or contraction)
- ISO - isolation model
- ISODE - isolation model and single, instant demographic event (expansion or contraction)
- SC - secondary contact model
- SCDE - secondary contact model and single, instant demographic event (expansion or contraction)
Each model subdirectory contains steering files required by fastsimcoal2 to run simulations:
- .tpl - plain text file with template defining the demographic model
- .est - plain text file with starting parameters for the model
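The naming convention above can be decoded programmatically. As we understand fastsimcoal2, it also looks for observed SFS files whose names start with the template's prefix, so a template called NorSouC25INV.tpl would pair with NorSouC25INV_jointMAFpop1_0.obs. The sketch below parses the documented tokens and assembles one maximum-likelihood command line using standard fastsimcoal2 options. The template and estimation file names, the executable name `fsc28`, and the values of `-n` and `-L` are hypothetical placeholders, not the settings used in the study. Only "INV" is documented as a partition tag, so any other tag is returned unchanged.

```python
import re
from pathlib import Path
from typing import Dict, List

# Tokens from the naming convention: populations, contig set, partition tag, SFS type.
OBS_NAME = re.compile(
    r"^(?P<pops>NorSou)(?P<contigs>C25)(?P<partition>[A-Za-z]*)_"
    r"(?P<kind>jointMAFpop1_0|MSFS)\.obs$"
)


def parse_obs_name(name: str) -> Dict[str, str]:
    """Split an observed-SFS file name into its convention tokens."""
    match = OBS_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected .obs file name: {name}")
    return match.groupdict()


def fsc_argv(tpl: Path, est: Path, n_sims: int = 100_000, n_loops: int = 40,
             exe: str = "fsc28") -> List[str]:
    """Command line for one maximum-likelihood run on a minor allele (folded) SFS.

    Intended to be run from the directory holding the .tpl, .est and .obs files.
    """
    return [
        exe,
        "-t", tpl.name,      # template file; .obs files must share its prefix
        "-e", est.name,      # parameter search ranges
        "-m",                # use the minor allele (folded) SFS
        "-M",                # estimate parameters by maximising the likelihood
        "-n", str(n_sims),   # simulations per likelihood estimate
        "-L", str(n_loops),  # number of ECM optimisation cycles
        "-q",                # quiet output
    ]


parsed = parse_obs_name("NorSouC25INV_jointMAFpop1_0.obs")
argv = fsc_argv(Path("IM/NorSouC25INV.tpl"), Path("IM/NorSouC25INV.est"))
print(parsed)
print(" ".join(argv))
```

Returning an argument list rather than a shell string lets the command be passed directly to `subprocess.run(argv, cwd=...)` without quoting issues.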

PSMC_pipeline

Description:
Pipeline for demographic inference using the Pairwise Sequentially Markovian Coalescent (PSMC). The pipeline files are plain text and can be viewed and edited with any plain text editor, e.g. Notepad++ or nano. For further bioinformatic analyses we recommend using the following software: bedtools, bcftools, GATK, psmc.
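PSMC reports time and population size in scaled units. As described in the PSMC documentation, they are converted to real units with N0 = theta0 / (4 mu s), T = 2 N0 t_k g (years) and Ne = lambda_k N0. Here theta0 comes from the TR line and t_k and lambda_k from the RS lines of the chosen iteration, s is the bin size, mu is the per-site, per-generation mutation rate, and g is the generation time. The sketch below applies these formulas. The mutation rate, generation time and input numbers are purely illustrative and are not the values used in the associated publication.

```python
from typing import List, Tuple


def scale_psmc(theta0: float, t_k: List[float], lambda_k: List[float],
               mu: float, gen_time: float, bin_size: int = 100,
               ) -> List[Tuple[float, float]]:
    """Return (time in years, effective size) for each PSMC interval k.

    theta0    -- scaled mutation rate from the TR line of the chosen iteration
    t_k       -- scaled interval start times (RS lines, column 3)
    lambda_k  -- relative population sizes (RS lines, column 4)
    mu        -- per-site, per-generation mutation rate (assumed, not from this dataset)
    gen_time  -- years per generation (assumed, not from this dataset)
    bin_size  -- bases per PSMC bin (fq2psmcfa default is 100)
    """
    n0 = theta0 / (4.0 * mu * bin_size)          # N0 = theta0 / (4 mu s)
    return [(2.0 * n0 * t * gen_time, lam * n0)  # T = 2 N0 t_k g, Ne = lambda_k N0
            for t, lam in zip(t_k, lambda_k)]


# Illustrative numbers only: theta0 = 0.04, mu = 1e-9, g = 1 year.
rows = scale_psmc(0.04, [0.0, 0.5], [2.0, 1.0], mu=1e-9, gen_time=1.0)
for years, ne in rows:
    print(f"{years:>10.0f} years  Ne ~ {ne:,.0f}")
```

Because N0 scales with 1/mu, and time scales with both 1/mu and g, the assumed mutation rate and generation time shift the whole PSMC curve along both axes without changing its shape.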

Notes on reuse

File paths in scripts require adaptation to local directory structures.

Contact

For questions regarding the dataset or analysis pipelines, please contact the corresponding author of the associated publication.

Access information

Publicly accessible locations of the raw sequencing data:

National Center for Biotechnology Information (NCBI) Sequence Read Archive, BioProject ID PRJNA1013983
