Data from: Combined evidence reveals the origin of a rapid range expansion despite retained genetic diversity and a weak founder effect
Data files
Sep 03, 2025 version files 95.48 MB
- 1_Migratory_direction_input_files.zip (10.36 KB)
- 2A_STACKS_pipeline_and_reference_alignment_input_files.zip (5.42 KB)
- 2B_Batch_effect_checks_input_files.zip (1.40 KB)
- 2C_Preparing_the_filtered_datasets_input_files.zip (289.78 KB)
- 2D_Principal_component_analysis_input_files.zip (2.17 KB)
- 2F_conStruct_input_files.zip (1.17 KB)
- 2G_StAMPP_input_files.zip (1.40 KB)
- 2H_RangeExpansion_input_files.zip (1.56 KB)
- 2I_Genetic_diversity_input_files.zip (5.94 KB)
- README.md (14.58 KB)
- RW_expansion_MolEcol_filtered_datasets.zip (95.14 MB)
Abstract
Many species are currently experiencing range shifts in response to changing environmental conditions, but with potentially serious genetic consequences. Repeated founder events and strong genetic drift are expected to erode genetic variation at the range front, reducing adaptive potential and slowing or even halting the expansion. However, the severity of these consequences for the more common and highly mobile species undergoing environment-driven range shifts (c.f. invasions) is less clear. Here we combined historical observations of the common reed warbler (Acrocephalus scirpaceus) with contemporary movement data from ringing re-encounters and genomic (RAD-seq) data from across its European breeding range to (1) infer the origin and (2) quantify the genetic consequences of a recent and rapid northward range expansion. While there were no reductions in levels of nucleotide diversity or allelic richness, nor a signal of founder effect in the directionality index (ψ), our combined dataset approach was able to infer an expansion origin from the southwest. Furthermore, we found that private allelic richness retained a slight but significant linear decline along the colonisation route. These results suggest that high dispersal capabilities can allow even philopatric species to avoid the loss of genetic diversity during rapid range expansions. Nevertheless, if multiple lines of evidence enable identification of an expansion pathway, we may still detect genetic signals of expansion.
Access this dataset on Dryad, DOI: 10.5061/dryad.bcc2fqzrw
Description of the data and file structure
This repository contains the following data used in "Combined evidence reveals the origin of a rapid range expansion despite retained genetic diversity and a weak founder effect" (Bergman et al. 2025, Molecular Ecology):
- Dataset of the reed warblers (Acrocephalus scirpaceus) ringed in Finland and re-encountered in other countries (years 1969-2023, n = 326). The data is provided by the databank of the Ringing Centre at the Finnish Museum of Natural History.
- The filtered genomic reed warbler RAD-seq datasets (DS1-4) that were used in the molecular analyses of the study, in the input formats required by each analysis they were used in. The details of these datasets can be found in the original paper and its supplementary material.
- Other input files and tables used in different parts of the genomic analysis pipeline that are not produced by running the associated scripts (available at https://github.com/norabergman/Reed-warbler-range-expansion). These files are organised under numbered pipeline steps or specific analyses as detailed below, matching the numbering of scripts on GitHub.
This repository does not contain the raw sequence data, which are available on NCBI SRA (BioProject PRJNA1217894).
The reference genome for Acrocephalus scirpaceus (bAcrSci1) used in this study was published in Sætre et al. 2021 and is available on the European Nucleotide Archive (BioProject PRJEB45715).
Files and variables
File: RW_expansion_MolEcol_filtered_datasets.zip
Description: Filtered datasets used in the molecular analyses of the study, in the input formats required by each analysis they were used in. Code for running the analyses is available at https://github.com/norabergman/Reed-warbler-range-expansion (sections 2D-2I). Additional input files for each analysis are available in the respective zipped folder here.
Contains files:
DS1_populations.snps.vcf.gz
- Dataset 1, bgzipped VCF (Variant Call Format) file created with STACKS populations program
DS2_populations.haps.radpainter
- Dataset 2, RADpainter input format file created with STACKS populations program
DS3_populations.snps.vcf.gz
- Dataset 3, bgzipped VCF (Variant Call Format) file created with STACKS populations program
DS3.bed
- Dataset 3, binary genotype file, part of PLINK’s binary genotype file format
DS3.bim
- Dataset 3, extended MAP file, part of PLINK’s binary genotype file format
DS3.fam
- Dataset 3, family information file, part of PLINK’s binary genotype file format
DS4_populations.structure
- Dataset 4, STRUCTURE input format file created with STACKS populations program
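As a quick sanity check after download, the bgzipped VCF files listed above can be inspected without any genomics library. The following is a minimal Python sketch, not part of the authors' pipeline (which is written in R and shell); the function names `summarise_vcf_lines` and `summarise_vcf` are hypothetical. It relies only on two general facts: bgzip output can be read as ordinary gzip, and in a VCF the sample IDs follow the nine fixed columns of the `#CHROM` header line.

```python
import gzip
from typing import Iterable, List, Tuple


def summarise_vcf_lines(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return (sample IDs, number of variant records) from VCF text lines."""
    samples: List[str] = []
    n_records = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # The first nine columns are fixed VCF fields; sample IDs follow.
            samples = line.rstrip("\n").split("\t")[9:]
        elif line.strip():
            n_records += 1  # every remaining non-empty line is one variant
    return samples, n_records


def summarise_vcf(path: str) -> Tuple[List[str], int]:
    """Open a (b)gzipped VCF and summarise it."""
    # bgzip writes concatenated gzip members, which gzip.open reads as one stream.
    with gzip.open(path, "rt") as handle:
        return summarise_vcf_lines(handle)
```

For example, `summarise_vcf("DS1_populations.snps.vcf.gz")` would return the list of sample IDs and the number of SNP records in Dataset 1, assuming the file has been extracted to the working directory.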
File: 1_Migratory_direction_input_files.zip
Description: Input files used in script "1 - Ringing data/Migratory_direction.R" (https://github.com/norabergman/Reed-warbler-range-expansion). Re-encounter data of reed warblers (Acrocephalus scirpaceus) ringed in Finland.
Contains file:
Acrsci_results_20240926.txt
- ID - row identification code
- EVENTDATE - date when recaptured (YYYY-MM-DD)
- EVENTDATE.RINGING - date when ringed (YYYY-MM-DD)
- WGS84DECIMMALLAT.RINGING - latitudinal coordinates of ringing in degrees and decimals in WGS84 (World Geodetic System 1984)
- WGS84DECIMMALLON.RINGING - longitudinal coordinates of ringing in degrees and decimals in WGS84 (World Geodetic System 1984)
- DIRECTIONTORINGINGINDEGREES - direction from ringing site to recovery site in degrees (North = 0)
- WGS84DEGREELAT - latitudinal coordinates of recovery in degrees, minutes and seconds (DDMMSS) in WGS84 (World Geodetic System 1984)
- WGS84DEGREELON - longitudinal coordinates of recovery in degrees, minutes and seconds (DDMMSS) in WGS84 (World Geodetic System 1984)
- WGS84DECIMALLAT - latitudinal coordinates of recovery in degrees and decimals in WGS84 (World Geodetic System 1984)
- WGS84DECIMALLON - longitudinal coordinates of recovery in degrees and decimals in WGS84 (World Geodetic System 1984)
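The recovery coordinates above are given both as DDMMSS and as decimal degrees, and the direction column is a compass angle with North = 0. The Python sketch below illustrates how these quantities relate. It rests on assumptions that are not confirmed by this README: that DDMMSS values are digit strings with an optional leading sign, and that a great-circle initial bearing is an acceptable stand-in for however the Ringing Centre databank computes direction. The function names are hypothetical.

```python
import math


def ddmmss_to_decimal(value: str) -> float:
    """Convert a DDMMSS (or DDDMMSS) string to decimal degrees.

    Assumption: digits only, optionally prefixed with '+' or '-'.
    """
    text = value.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    digits = text.lstrip("+-").zfill(5)  # pad so the slices below are never empty
    seconds = int(digits[-2:])
    minutes = int(digits[-4:-2])
    degrees = int(digits[:-4])
    return sign * (degrees + minutes / 60 + seconds / 3600)


def initial_bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle initial bearing from point 1 to point 2, in degrees (North = 0)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
```

Passing the ringing coordinates as point 1 and the recovery coordinates as point 2 gives a direction comparable in convention to DIRECTIONTORINGINGINDEGREES; small differences from the stored values would be expected if the databank uses a different method (for example a rhumb-line bearing).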
File: 2A_STACKS_pipeline_and_reference_alignment_input_files.zip
Description: Input files used in script "2 - Genomics/2A_STACKS_pipeline_and_reference_alignment" (https://github.com/norabergman/Reed-warbler-range-expansion). Contents / use of each file is described in code annotation.
Contains files:
2017_barcodes
2021_plate1_barcodes
2021_plate2_barcodes
- Column 1 - barcode sequence
- Column 2 - sample ID
2021+2017_ALL_popmap
- Column 1 - sample ID
- Column 2 - sampling site code
namelist_2021+2017
- Column 1 - sample ID
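The two-column files above (barcode files and population maps) share the same simple layout. As an illustration only, the Python sketch below reads a population map and counts samples per sampling site. It assumes tab-separated columns without a header row, which is the layout STACKS expects for population maps; the helper names `read_popmap` and `samples_per_site` are hypothetical and do not appear in the authors' scripts.

```python
from collections import Counter
from typing import Dict, Iterable


def read_popmap(lines: Iterable[str]) -> Dict[str, str]:
    """Map sample ID (column 1) to sampling site code (column 2)."""
    popmap: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and, as a precaution, '#' comment lines.
        if not line or line.startswith("#"):
            continue
        sample, site = line.split("\t")[:2]
        popmap[sample] = site
    return popmap


def samples_per_site(popmap: Dict[str, str]) -> Counter:
    """Count how many samples are assigned to each sampling site code."""
    return Counter(popmap.values())
```

A typical use would be `samples_per_site(read_popmap(open("2021+2017_ALL_popmap")))`, which gives a quick check that each site has the expected number of individuals before running the pipeline.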
File: 2B_Batch_effect_checks_input_files.zip
Description: Input files used in script "2 - Genomics/2B_Batch_effect_checks" (https://github.com/norabergman/Reed-warbler-range-expansion). Contents / use of each file is described in code annotation.
Contains files:
controls_all_popmap
- Column 1 - sample ID
- Column 2 - sequencing batch
controls_mislabeled_removed
- Column 1 - sample ID
- Column 2 - sequencing batch
controls_mislabeled_removed_popmap
- ind.names - sample ID
- pop - sequencing batch
File: 2C_Preparing_the_filtered_datasets_input_files.zip
Description: Input files used in script "2 - Genomics/2C_Preparing_the_filtered_datasets" (https://github.com/norabergman/Reed-warbler-range-expansion). Contents / use of each file is described in code annotation.
Contains files:
2021+2017_controls+Malta+SK15+RODD111007_rem_popmap
- Column 1 - sample ID
- Column 2 - sampling site code
2021+2017_controls+Malta_removed_popmap
- Column 1 - sample ID
- Column 2 - sampling site code
2021+2017_controls+Malta_removed_popmap_genlight
- ind.names - sample ID
- pop - sampling site code
25_2021+2017_max10indv_popmap
- Column 1 - sample ID
- Column 2 - sampling site code
25_2021+2017_max10indv_popmap_colnames
- ind.names - sample ID
- pop - sampling site code
25_2021_max10indv_popmap
- Column 1 - sample ID
- Column 2 - sampling site code
DS1_whitelist_thin3000_minmac3_random-snp
- SNP-specific whitelist for STACKS populations (Dataset 1), for formatting details see: https://catchenlab.life.illinois.edu/stacks/manual/
- Column 1 - catalog locus ID
- Column 2 - location of a specific SNP within the locus (column number)
DS2_whitelist_thin3000_minmac3_H
- Locus whitelist for STACKS populations (Dataset 2), for formatting details see: https://catchenlab.life.illinois.edu/stacks/manual/
- Column 1 - catalog locus ID
DS3_whitelist_thin3000_minmac3_random-snp
- SNP-specific whitelist for STACKS populations (Dataset 3), for formatting details see: https://catchenlab.life.illinois.edu/stacks/manual/
- Column 1 - catalog locus ID
- Column 2 - location of a specific SNP within the locus (column number)
FIE_coords.txt
- lat - latitude in decimal degrees
- lon - longitude in decimal degrees
FIW_coords.txt
- lat - latitude in decimal degrees
- lon - longitude in decimal degrees
FI_sample.list
- Column 1 - sample ID
FR_sample.list
- Column 1 - sample ID
IT_sample.list
- Column 1 - sample ID
PL_sample.list
- Column 1 - sample ID
SE_sample.list
- Column 1 - sample ID
TR_sample.list
- Column 1 - sample ID
multi.list
- Column 1 - path to the output file from the previous step
- Column 2 - sampling site code
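The whitelist files described above come in two shapes: locus-only (one column) and SNP-specific (locus ID plus a column position). The Python sketch below shows one way to read either shape into a single structure, assuming whitespace- or tab-separated fields as described in the STACKS manual. The function name `read_whitelist` and the choice to represent a whole-locus entry as `None` are illustrative assumptions, not part of the authors' workflow.

```python
from typing import Dict, Iterable, Optional, Set


def read_whitelist(lines: Iterable[str]) -> Dict[int, Optional[Set[int]]]:
    """Map catalog locus ID to the set of whitelisted SNP columns.

    A value of None means the whole locus is whitelisted (one-column entry).
    """
    whitelist: Dict[int, Optional[Set[int]]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        locus = int(fields[0])
        if len(fields) == 1:
            whitelist[locus] = None  # locus-only entry
        elif whitelist.get(locus, set()) is not None:
            # SNP-specific entry; ignored if the whole locus is already listed.
            whitelist.setdefault(locus, set()).add(int(fields[1]))
    return whitelist
```

Calling `len(read_whitelist(open("DS1_whitelist_thin3000_minmac3_random-snp")))` would give the number of distinct loci retained for Dataset 1.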
File: 2D_Principal_component_analysis_input_files.zip
Description: Input files used in script "2 - Genomics/2D_Principal_component_analysis" (https://github.com/norabergman/Reed-warbler-range-expansion). Contents / use of each file is described in code annotation.
Contains files:
25_2021+2017_max10indv_popmap_colnames
- ind.names - sample ID
- pop - sampling site code
25_2021+2017_pop_latitudes.txt
- Pop - sampling site code
- Latitude - latitude in decimal degrees
sampling_site_coords.txt
- Country - sampling country
- Site_code - sampling site code
- Longitude - longitude in decimal degrees
- Latitude - latitude in decimal degrees
File: 2F_conStruct_input_files.zip
Description: Input files used in script "2 - Genomics/2F_conStruct" (https://github.com/norabergman/Reed-warbler-range-expansion). Contents / use of each file is described in code annotation.
Contains files:
25_final_max10indv.csv
- Sample - sample ID
- Lon - longitude in decimal degrees
- Lat - latitude in decimal degrees
File: 2G_StAMPP_input_files.zip
Description: Input files used in script "2 - Genomics/2G_StAMPP_Fst" (https://github.com/norabergman/Reed-warbler-range-expansion). Contents / use of each file is described in code annotation.
Contains files:
25_2021+2017_max10indv_popmap_colnames
- ind.names - sample ID
- pop - sampling site code
25_2021+2017_pop_latitudes.txt
- Pop - sampling site code
- Latitude - latitude in decimal degrees
File: 2H_RangeExpansion_input_files.zip
Description: Input files used in script "2 - Genomics/2H_RangeExpansion" (https://github.com/norabergman/Reed-warbler-range-expansion). Contents / use of each file is described in code annotation.
Contains files:
25_2021_max10indv_no_regions.csv
- id - sample ID
- latitude - latitude in decimal degrees
- longitude - longitude in decimal degrees
- pop - sampling site code
25_2021_max10indv_popmap_colnames
- ind.names - sample ID
- pop - sampling site code
File: 2I_Genetic_diversity_input_files.zip
Description: Input files used in script "2 - Genomics/2I_Genetic_diversity" (https://github.com/norabergman/Reed-warbler-range-expansion). Contents / use of each file is described in code annotation.
Contains files:
2021_all_paramfile.txt
- Parameter file for ADZE, for parameter details see: https://rosenberglab.stanford.edu/software/ADZE_Manual.pdf
2021_expansion_eastcoast_paramfile.txt
- Parameter file for ADZE, for parameter details see: https://rosenberglab.stanford.edu/software/ADZE_Manual.pdf
2021_expansion_paramfile.txt
- Parameter file for ADZE, for parameter details see: https://rosenberglab.stanford.edu/software/ADZE_Manual.pdf
2021_expansion_thinned1_paramfile.txt
- Parameter file for ADZE, for parameter details see: https://rosenberglab.stanford.edu/software/ADZE_Manual.pdf
2021_expansion_thinned2_paramfile.txt
- Parameter file for ADZE, for parameter details see: https://rosenberglab.stanford.edu/software/ADZE_Manual.pdf
2021_expansion_westcoast_paramfile.txt
- Parameter file for ADZE, for parameter details see: https://rosenberglab.stanford.edu/software/ADZE_Manual.pdf
DS4_2021_stacks_popstats_coords.txt
- Contains a subset of the summary statistics calculated by STACKS populations in the populations.sumstats_summary.tsv output (Dataset 4), as well as latitude and longitude of the sampling sites
- Pop_ID - sampling site code
- Longitude - longitude in decimal degrees
- Latitude - latitude in decimal degrees
- Sites - the number of nucleotide sites (variant and invariant) in the population (i.e. sampling site) in this filtered dataset
- Variant_Sites - the number of variant sites in the population (i.e. sampling site) in this filtered dataset
- Polymorphic_Sites - the number of variant sites segregating within the population (i.e. sampling site) in this filtered dataset
- Percent_Polymorphic_Loci - the % of RAD loci segregating (containing variation) within the population (i.e. sampling site) in this filtered dataset
- Num_Indv - mean number of individuals per locus in the population
- P - mean frequency of the most frequent allele at each locus in the population
- Obs_Het - mean observed heterozygosity in the population
- Obs_Hom - mean observed homozygosity in the population
- Exp_Het - mean expected heterozygosity in the population
- Exp_Hom - mean expected homozygosity in the population
- Pi - mean value of π (nucleotide diversity) in the population
- Fis - mean measure of FIS (inbreeding coefficient) in the population
Code/software
The workflow and code are available at https://github.com/norabergman/Reed-warbler-range-expansion
The required software and packages are described in the scripts. To run the scripts, the following software and environment are required:
- R (version 4.3.0 or higher)
- RStudio (recommended)
- Unix-based environment (e.g. Linux or macOS); the scripts rely on Unix shell commands and may not run properly on Windows without a compatibility layer
Access information
Other publicly accessible locations of the data:
- Scripts and workflow: https://github.com/norabergman/Reed-warbler-range-expansion
- Raw sequence data for running the entire genomic pipeline: NCBI SRA (BioProject PRJNA1217894), https://www.ncbi.nlm.nih.gov/sra/
Data was derived from the following sources:
- Ringing re-encounter data: Databank of the Ringing Centre at the Finnish Museum of Natural History
