The postglacial history of Euphrasia micrantha in Scotland: evidence from genome skimming
Data files
Aug 22, 2025 version files (22.92 MB)
· README.md (5.41 KB)
· SupportingDataCodes.zip (22.92 MB)
Abstract
The flora of northern Europe has been shaped by a complex history of glaciation, where species have recolonised from different refugia and spread via contrasting migration routes. Our understanding of plant phylogeography in northern Europe has been limited by the resolution of genetic markers that cannot detect subtle population structure arising from close or cryptic refugia. Here, we employ genome skimming to recover complete plastid genomes and partial nuclear ribosomal arrays in the widespread plant species Euphrasia micrantha. We focus on populations from Scotland, where cryptic and complex patterns of recent recolonisation may prove hard to resolve. Genome skimming from 145 individuals revealed a species-specific plastid DNA lineage in Scotland and England, while two other genetic clusters were highly intermixed with co-occurring taxa. Within the distinct plastid DNA lineage, we recover subtle genetic divergence corresponding to West-East differentiation. These findings suggest that E. micrantha has recolonised from multiple distinct refugia, potentially including cryptic northern refugia.
Dataset DOI: 10.5061/dryad.b8gtht7pm
Description of the data and file structure
This dataset includes complete plastid genomes and partial nuclear ribosomal DNA (nrDNA) sequences obtained through genome skimming of 145 Euphrasia individuals. Leaf tissue was collected from field populations across Scotland and England, and total genomic DNA was extracted and sequenced using Illumina short-read technology. Reads were quality-filtered and organellar genomes were assembled bioinformatically to extract complete plastid sequences and nrDNA arrays. The dataset includes aligned sequence files, associated sample metadata (species ID, geographic origin), and phylogenetic trees inferred using maximum likelihood methods. These data were used to explore population structure and postglacial history in Euphrasia micrantha, with broader relevance for phylogeographic inference in selfing, polyploid plant species.
Files and variables
File: SupportingDataCodes.zip
Description: This dataset is archived in a compressed folder titled SupportingDataCodes.zip, which contains two subfolders:
Folder 1: 1_SeqAlignment
This folder includes the raw (unaligned) sequences and the processed sequence alignments:
· 1_IndividualSequence.geneious:
Geneious project file containing individual DNA sequences (both cpDNA and nrDNA) for Euphrasia micrantha and related taxa.
Format: .geneious (proprietary Geneious format)
Content: Raw unaligned sequences labeled with individual sample codes.
· 2_Alignment_cp_nrDNA.geneious:
Geneious project file containing aligned plastid (cpDNA) and nuclear ribosomal (nrDNA) sequences used for downstream phylogenetic analyses.
Format: .geneious (proprietary Geneious format)
Content: Sequence alignments for all samples across both loci.
· 3_cpDNA_MIC.fasta:
FASTA file containing cpDNA sequences for E. micrantha individuals only.
Format: Standard FASTA; sequences labeled with sample codes (e.g., M1_1, M1_2, etc.).
· 4_nrDNA_MIC.fasta:
FASTA file containing nrDNA sequences for E. micrantha individuals only.
Format: Standard FASTA; same sample naming convention as above.
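As a quick sanity check, the two FASTA files can be read without Geneious. The following is a minimal sketch in Python (not the authors' R workflow) that parses FASTA records and reports how many sequences there are and how long they are. The function names and the toy sequences are illustrative assumptions; the path in the closing comment is inferred from the folder layout described above.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA text lines into {record ID: sequence}."""
    parts = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            parts[current_id] = []
        elif current_id is not None:
            parts[current_id].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


def summarise(records):
    """Return (number of sequences, shortest length, longest length)."""
    lengths = [len(seq) for seq in records.values()]
    if not lengths:
        return 0, 0, 0
    return len(lengths), min(lengths), max(lengths)


# Made-up toy records standing in for 3_cpDNA_MIC.fasta (not real data).
example = """>M1_1
ACGTAC
GT--
>M1_2
ACGTACGTAA
""".splitlines()

print(summarise(read_fasta(example)))  # (2, 10, 10)

# For the real file (path assumed from the folder structure described above):
# with open("1_SeqAlignment/3_cpDNA_MIC.fasta") as handle:
#     print(summarise(read_fasta(handle)))
```

In an aligned FASTA file the shortest and longest lengths should be equal, so a mismatch is an early sign that a file holds unaligned sequences.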
Folder 2: 2_AnalysisFilesCodes
This folder includes files used for phylogenetic inference and the associated R script for downstream analysis and figure generation:
· cpDNA.fasta.treefile:
Newick-format tree file for cpDNA sequences.
Format: .treefile output from IQ-TREE2, used for visualizing and comparing plastid phylogenies.
· nrDNA.fasta.treefile:
Newick-format tree file for nrDNA sequences.
Format: .treefile output from IQ-TREE2, used for visualizing nuclear phylogenies.
· Rscripts.R:
R script used to process alignment data, generate summary plots, build phylogenies, and compare the plastid and nuclear datasets using packages such as phytools and ggtree.
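The .treefile outputs are plain Newick text, so their tip labels can be checked against the alignment IDs before any plotting. The sketch below is a Python illustration, not part of Rscripts.R. It assumes simple unquoted labels such as M1_1 and would not handle quoted Newick labels that contain brackets, commas or colons. The function names and the example tree are hypothetical.

```python
import re

# A tip label directly follows "(" or ","; internal node labels and support
# values follow ")", and branch lengths follow ":", so neither is captured.
TIP_PATTERN = re.compile(r"[(,]\s*([^\s():,;]+)")


def tip_labels(newick):
    """Return the tip labels of a Newick string in order of appearance."""
    return TIP_PATTERN.findall(newick)


def label_mismatches(tips, alignment_ids):
    """Return (IDs missing from the tree, tips missing from the alignment)."""
    tip_set, id_set = set(tips), set(alignment_ids)
    return sorted(id_set - tip_set), sorted(tip_set - id_set)


# Made-up toy tree with one support value (95), not taken from the dataset.
tree = "(M1_1:0.001,(M1_2:0.002,M2_1:0.003)95:0.010);"

print(tip_labels(tree))  # ['M1_1', 'M1_2', 'M2_1']
print(label_mismatches(tip_labels(tree), ["M1_1", "M2_1", "M3_1"]))
# (['M3_1'], ['M1_2'])
```

For the archived trees, the Newick string would be read from cpDNA.fasta.treefile or nrDNA.fasta.treefile with an ordinary file read. A dedicated parser such as ape in R is the better choice for anything beyond this label check.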
Variables & Notation
· Sample identifiers are consistently abbreviated (e.g., M1_1–M59_2 for E. micrantha individuals).
· Sequence data are unscaled and unitless; genetic divergence in the tree output is expressed in substitutions per site.
· Missing data are denoted by - in the alignments and appear as gaps in the tree matrices. No explicit “NA” values are present.
· All metadata (species ID, sampling origin) is embedded in the supplementary table.
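The notation above can be handled programmatically. The sketch below (Python, illustrative only) splits sample identifiers and measures the share of - characters in an aligned sequence. Treating the first number as a population and the second as an individual within it is an assumption based on the M1_1–M59_2 pattern, not something the README states. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed ID layout: letters, a number, an underscore, a number (e.g. M59_2).
SAMPLE_ID = re.compile(r"^([A-Za-z]+)(\d+)_(\d+)$")


def parse_sample_id(sample_id):
    """Split an ID such as 'M59_2' into ('M', 59, 2)."""
    match = SAMPLE_ID.match(sample_id)
    if match is None:
        raise ValueError(f"unexpected sample ID format: {sample_id!r}")
    prefix, group, individual = match.groups()
    return prefix, int(group), int(individual)


def group_by_population(sample_ids):
    """Group IDs by prefix plus first number (assumed to be the population)."""
    groups = defaultdict(list)
    for sample_id in sample_ids:
        prefix, group, _ = parse_sample_id(sample_id)
        groups[f"{prefix}{group}"].append(sample_id)
    return dict(groups)


def gap_fraction(sequence):
    """Fraction of alignment positions coded as '-' (missing data or gap)."""
    if not sequence:
        return 0.0
    return sequence.count("-") / len(sequence)


print(group_by_population(["M1_1", "M1_2", "M59_2"]))
# {'M1': ['M1_1', 'M1_2'], 'M59': ['M59_2']}
print(gap_fraction("ACGT--AC"))  # 0.25
```

Raising an error on unmatched IDs is a deliberate choice here: labels for related taxa in the Geneious files may follow a different pattern, and it is safer to see them than to group them silently.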
Code/software
The dataset can be viewed and analyzed using the following software (free and open-source unless noted otherwise):
· Geneious Prime (for .geneious files)
While the .geneious files require Geneious Prime (commercial software), sequence alignments can also be exported and viewed in standard formats (e.g., FASTA) using tools like AliView or MEGA.
· IQ-TREE2 (v2.1.2 or later)
Used to perform maximum likelihood phylogenetic inference.
Relevant commands and parameters are documented within the R script.
· R (v4.2.0 or later)
Used for all downstream visualization and statistical analysis.
Required R packages include:
o ape
o phytools
o ggtree
o adegenet
o stats
o ips
o msa
o Biostrings
o haplotypes
o pegas
o reshape2
o factoextra
o cluster
o marmap
o RColorBrewer
o dartR
o tibble
o poppr
o ggplot2
o dplyr
o ggpubr
· R script: Rscripts.R
This script includes commands to:
o Import and manipulate tree files and sequence alignments.
o Generate figures (e.g., combined cpDNA/nrDNA phylogeny using cophylo()).
o Color sample clades and annotate results.
o Perform haplotype analysis and visualize regional patterns.
Each step is documented in-line in the script, allowing reproduction of all analytical workflows presented in the manuscript:
“Postglacial expansion and divergence in Euphrasia micrantha revealed by genome skimming”
(Plant Ecology & Diversity, https://doi.org/10.1080/17550874.2025.2519256)
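To illustrate the idea behind the haplotype step listed above, the following Python sketch collapses identical aligned sequences into haplotypes. It is not the method used in Rscripts.R, which relies on R packages such as haplotypes and pegas. It compares sequences as exact strings (ignoring case), so gaps and ambiguity codes count as differences. The function name, the H1, H2, ... labels and the toy data are assumptions.

```python
from collections import defaultdict


def collapse_haplotypes(records):
    """Group sample IDs whose aligned sequences are identical strings.

    `records` maps sample ID -> aligned sequence. Haplotypes are labelled
    H1, H2, ... from most to least frequent (ties broken by first sample ID).
    """
    by_sequence = defaultdict(list)
    for sample_id, sequence in records.items():
        by_sequence[sequence.upper()].append(sample_id)
    groups = sorted(
        (sorted(ids) for ids in by_sequence.values()),
        key=lambda ids: (-len(ids), ids[0]),
    )
    return {f"H{rank}": ids for rank, ids in enumerate(groups, start=1)}


# Made-up toy alignment, not real data.
toy = {"M1_1": "ACGT", "M1_2": "acgt", "M2_1": "ACGA"}
print(collapse_haplotypes(toy))
# {'H1': ['M1_1', 'M1_2'], 'H2': ['M2_1']}
```

Because exact string matching inflates haplotype counts when sequences differ only in missing data, results from this sketch should not be expected to match those from the R packages.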
