Molecular species delimitation of Sphagnum subgenus Subsecunda in Europe
Data files
Sep 17, 2025 version files 125.72 MB
popmap_SNP.tsv (1.11 KB)
Pruned_SNP_dataset.vcf (27.46 MB)
README.md (3.11 KB)
SNP_dataset.vcf (98.18 MB)
SSR_data.csv (37.62 KB)
Voucher_data.csv (38.93 KB)
Abstract
A taxonomic revision of Sphagnum subgenus Subsecunda in Europe is warranted as the taxonomic status and the relationships among the six species found in Europe are inconsistent. Here we provide molecular data from a large set of specimens of Sphagnum subgenus Subsecunda obtained from European herbaria, used to investigate taxonomic relationships among the European Subsecunda species. The molecular data include a dataset of 15 microsatellite markers from a total of 310 specimens and single-nucleotide polymorphism (SNP) data from 64 specimens obtained using double digest restriction-site associated DNA (ddRAD) sequencing. The species included in the datasets are S. contortum, S. platyphyllum, S. subsecundum, S. inundatum, and S. denticulatum, in addition to one deviating specimen of S. auriculatum from the Azores potentially representing an undescribed taxon of subgenus Subsecunda in Europe. The methods used to obtain the molecular data are described in detail in Meleshko et al.
Access this data on Dryad https://doi.org/10.5061/dryad.9w0vt4btt
These data were generated to investigate taxonomic issues among species in Sphagnum subgenus Subsecunda. The specimens were obtained from herbaria in Europe, and the datasets contain data from five species of Sphagnum subgenus Subsecunda. All specimens are from Europe. DNA was extracted from dried capitula of herbarium specimens. We obtained two kinds of molecular data: single nucleotide polymorphism (SNP) data and microsatellite (simple sequence repeat, SSR) data. The former were obtained using double digest RAD sequencing. Detailed methods used to obtain the molecular data are described in Meleshko et al. 2025 (Botanical Journal of the Linnean Society).
Description of the data and file structure
The data are divided into separate files, whose names and contents are described in detail below. File 1 contains the voucher data of all specimens and can be linked to the other files through the specimen IDs. The first column in the file has the ID of the specimens with SSR data and the second column has the ID of the specimens with SNP data. There are fewer specimens with SNP data than with SSR data, and specimens with no SNP ID are indicated with n/a. Likewise, outgroup specimens that have SNP data only are denoted with n/a in the SSR ID column.
File 1 Name: Voucher_data.csv
CSV file containing the voucher table with information about all samples: the ID used for the SSR markers, the ID used for SNPs, the species name, the ID of the herbarium sample (TRH (herbarium)), the country and site where the specimen was collected, the year of collection, and the longitude and latitude of the collection site. This file provides the background data for the specimens in files 2-5.
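The link between the voucher table and the SNP files can be sketched as follows. This is a minimal illustration, not the authors' workflow: it assumes a header row, comma delimiters, and the column order described above (SSR ID first, SNP ID second). The function name, the example column names and the example IDs are hypothetical.

```python
import csv
import io

MISSING_ID = "n/a"  # marker for "no ID in this dataset", as described above


def index_vouchers_by_snp_id(voucher_file):
    """Map each SNP ID to its voucher row, skipping specimens without SNP data.

    Assumption: header row present, SSR ID in column 1, SNP ID in column 2.
    """
    reader = csv.reader(voucher_file)
    header = next(reader)
    by_snp_id = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        snp_id = row[1].strip()
        if snp_id == MISSING_ID:
            continue  # specimen has SSR data only
        by_snp_id[snp_id] = dict(zip(header, row))
    return by_snp_id


# Made-up rows for illustration only; the real column names may differ.
example = io.StringIO(
    "ssr_id,snp_id,species\n"
    "S001,N001,contortum\n"
    "S002,n/a,platyphyllum\n"
    "n/a,N002,outgroup\n"
)
vouchers = index_vouchers_by_snp_id(example)
print(sorted(vouchers))
```

For the real file, the same function could be called on open("Voucher_data.csv", newline=""); the file encoding is not stated in the README and may need to be set explicitly.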
File 2 Name: SNP_dataset.vcf
Variant Call Format file containing the SNP data for 64 specimens (referred to as "SNP dataset" in Meleshko et al. 2025).
File 3 Name: Pruned_SNP_dataset.vcf
Variant Call Format file containing the pruned SNP data for 64 specimens (referred to as "pruned SNP dataset" in Meleshko et al. 2025).
File 4 Name: popmap_SNP.tsv
Assignment of individuals from the SNP dataset to species. The file should be combined with the data in files 2 and 3. The first column shows the sample ID, and the second column gives the species name (the genus name, Sphagnum, is omitted as all specimens belong to Sphagnum; auriculatum refers to the haploid sample of S. auriculatum).
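Before combining the popmap with either VCF file, it may be worth checking that the sample IDs agree. The sketch below assumes the popmap is tab-separated with no header row (this is not stated in the README) and relies on the standard VCF layout, in which sample names follow the nine fixed columns of the #CHROM header line. All function names and example IDs are hypothetical.

```python
import io


def read_popmap(popmap_file):
    """Return {sample_id: species} from a two-column, tab-separated popmap.

    Assumption: no header row.
    """
    popmap = {}
    for line in popmap_file:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        popmap[fields[0].strip()] = fields[1].strip()
    return popmap


def read_vcf_samples(vcf_file):
    """Return the sample names from the '#CHROM' header line of a VCF."""
    for line in vcf_file:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed VCF fields; samples follow.
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data records without seeing the header line
    raise ValueError("no #CHROM header line found")


def compare_samples(popmap, vcf_samples):
    """Return (IDs only in the popmap, IDs only in the VCF), both sorted."""
    vcf_set = set(vcf_samples)
    only_popmap = sorted(set(popmap) - vcf_set)
    only_vcf = sorted(vcf_set - set(popmap))
    return only_popmap, only_vcf


# Made-up data for illustration only.
popmap = read_popmap(io.StringIO("N001\tcontortum\nN002\tauriculatum\n"))
samples = read_vcf_samples(io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tN001\tN003\n"
))
print(compare_samples(popmap, samples))
```

Two empty lists would indicate that every sample in the VCF has a species assignment and vice versa.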
File 5 Name: SSR_data.csv
The SSR data include the SSR ID, the name of the species, and all SSR markers (15 in total), each identified by its SSR name. Allele values are numbers corresponding to the length of the DNA fragment, and marker names have an added _1 or _2 corresponding to allele 1 and allele 2 at the same marker. Each line in the table represents one specimen. As two of the species are diploid but three are haploid, all specimens are listed as diploid (two alleles per marker). The three haploid species have their alleles duplicated in the list, so they appear homozygous. Missing data are given as '0'.
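The coding described above can be read into per-marker allele pairs roughly as follows. This is a hedged sketch: it assumes a header row, comma delimiters, two leading identifier columns (SSR ID and species), and marker columns named with a trailing _1 or _2. The function names, marker names and values in the example are hypothetical.

```python
import csv
import io

MISSING = "0"  # missing-data code, as described above


def read_ssr_genotypes(ssr_file, n_id_columns=2):
    """Return {specimen_id: {marker: (allele_1, allele_2)}}.

    Alleles are kept as strings; missing calls ('0') become None.
    Assumption: the first n_id_columns columns are identifiers and the
    remaining columns are named '<marker>_1' and '<marker>_2'.
    """
    reader = csv.reader(ssr_file)
    header = next(reader)
    marker_columns = header[n_id_columns:]
    genotypes = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        calls = {}
        for name, value in zip(marker_columns, row[n_id_columns:]):
            marker = name.rpartition("_")[0]  # strip the _1 / _2 suffix
            value = value.strip()
            allele = None if value == MISSING else value
            calls.setdefault(marker, []).append(allele)
        genotypes[row[0]] = {m: tuple(a) for m, a in calls.items()}
    return genotypes


def all_homozygous(genotype):
    """True if every scored marker carries two identical alleles."""
    scored = [pair for pair in genotype.values() if None not in pair]
    return bool(scored) and all(pair[0] == pair[1] for pair in scored)


# Made-up rows for illustration only.
example = io.StringIO(
    "ssr_id,species,m1_1,m1_2,m2_1,m2_2\n"
    "S001,contortum,187,187,0,0\n"
    "S002,inundatum,187,190,210,210\n"
)
genotypes = read_ssr_genotypes(example)
print(genotypes["S001"], all_homozygous(genotypes["S001"]))
```

Note that all_homozygous cannot distinguish a haploid specimen with duplicated alleles from a diploid specimen that happens to be homozygous at every scored marker; ploidy should be taken from the species assignment.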
