Data from: Rapid colonisation of synanthropic stone martens in a highly urbanised region: Insights from temporal and spatial analysis
Data files
May 08, 2025 version files 6.41 MB
-
filtered_snps.csv
6.39 MB
-
metadata_samples.csv
20.52 KB
-
README.md
1.02 KB
Abstract
Medium-sized carnivores, including the synanthropic stone marten (Martes foina Erxleben, 1777), have shown remarkable adaptability to urbanized and fragmented landscapes, facilitating their spread across mainland Europe. This study investigates the recolonisation of a highly urbanised region by stone martens within two decades, examining spatial and temporal genome-wide data (using genotyping by sequencing) to reveal colonisation dynamics, sources, and barriers influencing their expansion. Using genotypes from 5,536 SNPs across 376 stone martens collected between 1995 and 2013, our findings indicate that stone martens successfully expanded through urban environments, yet dispersal was neither entirely random nor strictly distance-dependent. The initial southeastern stronghold (E1) showed the lowest genetic diversity and limited spatial expansion, while other population sources contributed to recolonisation, highlighting a complex, multi-source expansion. Gene flow in the early stages was largely confined to E1, progressing northward and eventually enabling exchange with a second eastern lineage (E2). Meanwhile, the western lineage displayed higher connectivity, occasionally crossing barriers like motorways. Motorways, however, significantly shaped recolonization patterns, reducing gene flow, while other elements such as built-up areas, secondary roads or waterways showed an additional though very small effect. Over the study period, genetic patch size increased, indicating longer dispersal distances. Gene flow strengthened within both eastern (E1 and E2) and western populations. Still, the western population diverged in two subclusters (W1 and W2) of which one became more differentiated. This suggests limited genetic homogenization in the near future. This study provides insights into the genetic and ecological dynamics of carnivore recolonization in highly fragmented landscapes.
https://doi.org/10.5061/dryad.3xsj3txrn
Description of the data and file structure
See the Methods section for information on the collection of samples, GBS analysis, SNP genotyping and filtering steps. The resulting filtered SNP data set holds genotypes of 376 individuals of Martes foina using 5536 SNPs.
Files and variables
File: metadata_samples.csv
Description: Sample information with sample ID, group ID, sex (empty cells represent gender not know), collection date (or year), and coordinates (Belgian Lambert 72 coordinate system).
File: filtered_snps.csv
Description: Filtered SNP data file with 376 samples of stone marten and 5536 SNPs, with samples in rows and SNPs in columns. In the first two columns the sample ID and the group ID is provided. Missing genotypes are indicated with “NA”.
Traffic casualties of stone martens were collected from 1995 to 2013 in Flanders, the northern region of Belgium. A few samples were from the Brussels Capital region and one from the Ardennes in the Walloon region. DNA from 386 individuals, extracted using the DNeasy Blood & Tissue Kit (Qiagen), was used for Genotyping-by-sequencing (GBS). A PstI GBS library of all samples, 83 replicated samples with one to three replicates each (18 replicates starting from tissue), and 5 blanks was prepared according to Elshire et al. (2011), and the resulting library was sequenced on an Illumina HiSeq 2000 (Cornell University Genomics Core Laboratory).
Trimming of Illumina adaptors and filtering of reads with a minimal length of 50 bp were performed using Cutadapt 4.2. Stacks v2.53 process_radtags was used to demultiplex raw reads and remove low-quality reads, followed by the removal of the restriction enzyme cut site with Cutadapt. We checked read quality using FastQC v0.11.9 after each processing step and using MultiQC v1.14 after the final step. The software BWA v0.7.17 and more specifically the BWA-MEM algorithm was used for the alignment of the reads to the stone marten reference genome, the chromosome-length (2n =38) assembly available from the DNA Zoo Consortium (https://www.dnazoo.org/assemblies/Martes_foina, accessed on 8 September 2022; Dudchenko et al., 2017, 2018). After alignment, SAM files were converted to BAM files and sorted by position order using SAMtools v1.16.1.
Stacks was used to call SNPs with a reference genome (gstacks), allowing a maximum soft-clipping level of 10% of the read length. The dataset was converted to VCF format for further analysis in RStudio v2022.12.0+353 with R v4.2.0. Initially, we removed variants with a read depth of less than 3, a missingness of more than 30%, a minor allele frequency (MAF) of less than 0.01 or a minor allele count (MAC) of less than 3 using the R package vcfR v1.13.0. Paralogs were identified with the R script HDplot developed by McKinney et al. (2017) and removed. We further omitted potential paralogs or multicopy loci by removing variants with a read depth in the top 5% (i.e., with a read depth > 50). Next, samples with high missingness (≥ 80%) were removed.
Further filtering steps were performed using the R package snpR v1.2.5. Filtering thresholds were set to MAF > 0.01, a maximum observed heterozygosity of 0.55, and a missingness rate of less than 30% per locus. These filtering steps were repeated until all thresholds were reached. Subsequently, the same thresholds were applied, but with a missingness threshold per sample (< 50%), instead of per locus. SNPs were finally pruned for high levels of pairwise linkage disequilibrium (LD) when MAF > 0.01, missingness per sample < 30%, and a correlation coefficient existed of at least 0.5 between SNPs within a sliding window of a maximum 50 SNPs, using the R package SNPRelate 1.28.0.
In order to filter SNPs that deviated from Hardy-Weinberg equilibrium (HWE), we first defined genetic clusters using principal component analysis (PCA) with the R package adegenet v2.1.10, with missing data imputed as the mean value per SNP. We grouped samples according to the PCA results, which showed five major genetic clusters. Using the R package pegas v1.2, we tested which SNPs deviated from HWE (P < 0.05). Those SNPs that tested positive in all five clusters were removed. SNPs potentially under selection were detected with pcadapt v4.3.3 using five principal components and an FDR correction for multiple testing. The method implemented in the R package OutFLANK v0.2 was used as a second approach with groups clustered in the same five genetic clusters (see earlier). File conversions were performed with the R package radiator 1.2.2.
One sample failed to sequence. After alignment, skipping primary alignments with insufficient mapping qualities (1.2%) and excessively soft-clipped primary alignments (2.8%), 1493681 loci could be built, which held 144179 variant sites. After the first filtering steps using vcfR and HDplot, 5592 SNPs in 378 samples were retained. A further 56 SNPs were removed that showed deviations from HWE in all five genetic clusters. We found only four outliers using PC-Adapt, though no outliers with OutFLANK. Therefore, all SNPs were retained for further analysis. Two samples had uncertain metadata (location and year) and were, therefore, removed. In our final set, average missingness per SNP and sample was 13.63%, and consisted of 376 samples and 5536 SNPs.
References:
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. (2011) A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE 6(5): e19379. https://doi.org/10.1371/journal.pone.0019379
Dudchenko, O., Batra, S. S., Omer, A. D., Nyquist, S. K., Hoeger, M., Durand, N. C., Shamim, M. S., Machol, I., Lander, E. S., Aiden, A. P., & Aiden, E. L. (2017). De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. https://doi.org/aal3327
The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000
Olga Dudchenko, Muhammad S. Shamim, Sanjit S. Batra, Neva C. Durand, Nathaniel T. Musial, Ragib Mostofa, Melanie Pham, Brian Glenn St Hilaire, Weijie Yao, Elena Stamenova, Marie Hoeger, Sarah K. Nyquist, Valeriya Korchina, Kelcie Pletch, Joseph P. Flanagan, Ania Tomaszewicz, Denise McAloose, Cynthia Pérez Estrada, Ben J. Novak, Arina D. Omer, Erez Lieberman Aiden
bioRxiv 254797; doi: https://doi.org/10.1101/254797
McKinney, G. J., Waples, R. K., Seeb, L. W., & Seeb, J. E. (2017). Paralogs are revealed by proportion of heterozygotes and deviations in read ratios in genotyping-by-sequencing data from natural populations. Molecular Ecology Resources, 17(4), 656-669. https://doi.org/10.1111/1755-0998.12613
