Data from: Phylogenomic structure and speciation in an emerging model: The Sphagnum magellanicum complex (Bryophyta)

Cite this dataset

Shaw, A. Jonathan et al. (2022). Data from: Phylogenomic structure and speciation in an emerging model: The Sphagnum magellanicum complex (Bryophyta) [Dataset]. Dryad.


The moss genus Sphagnum has unparalleled ecological importance because some 30% of the total terrestrial carbon pool is bound up in Sphagnum-dominated peatlands. A major peat-former, S. magellanicum, is one of two species for which a reference-quality genome exists to facilitate research in ecological genomics, but recently published work indicated that S. magellanicum s. str. is restricted to South America and that two other species, S. divinum and S. medium, occur in North America and Europe. We report herein that there are four clades/species within the S. magellanicum complex in eastern North America, two in South America, and another in eastern Asia. The reference genome belongs to S. divinum. Phylogenetic analyses at the whole genome and chromosome levels, using genome resequencing and RADseq, resolve sister group relationships within the complex. Species are monophyletic in most analyses and exhibit tens of thousands (RADseq) to millions (resequencing) of fixed nucleotide differences, but two, referred to informally as S. diabolicum and S. magni because they have not been formally described, are differentiated by only hundreds (RADseq) to thousands (resequencing) of differences. Data from 14 of the 19 resequenced chromosomes (7 chromosomes for RADseq) resolve the reciprocal monophyly of S. magni and S. diabolicum. These two appear to be in the process of speciation, and because they differ in geographic ranges and the climate zones they occupy – S. diabolicum in boreal peatlands and S. magni in warm temperate to subtropical communities of the southern U.S. – they provide an exciting opportunity for comparative genomic analyses of climate niche evolution. Introgression among species in the complex is demonstrated using D-statistics and f4-ratios. One ecologically important functional trait that underlies peat (carbon) accumulation, tissue decomposability, does not differ between segregate North American species in the S. magellanicum complex, although previous research showed that many related Sphagnum species have evolved differences in decomposability/carbon sequestration. Phylogenetic resolution and more accurate species delimitation in the S. magellanicum complex substantially increase the value of this group for studying the early evolutionary stages of climate adaptation, and ecological evolution more broadly.


RADseq data: Genomic DNA was extracted from a single capitulum of each herbarium sample or new collection. RADseq libraries were prepared following a double-digest restriction site-associated DNA sequencing (ddRADseq) protocol. Each library was sequenced on a single lane of an Illumina NextSeq 500 with 150 bp single-end reads.

"RADseq-like" in silico digested genomic data: 43 genomic resequencing assemblies were digested in silico with EcoRI and MseI using the program “restrict” from the EMBOSS package. Custom scripts were used to filter for digested sequence fragments with an EcoRI cutsite at one end and an MseI cutsite at the other, to mimic the size-selection steps of a RADseq library preparation, to trim the fragments to match the length of our quality-filtered Illumina reads, and to write the sequences to a FASTQ formatted file. Each resulting “read” was given a quality score of all "E" (high enough to pass downstream quality filters and number of 10 copies (enough to pass downstream depth filters).

Chloroplast alignment: Plastid reads from 49 resequenced genomes were identified and assembled into contigs using NOVOPlasty. For each genome, contigs were manually aligned to the published Sphagnum palustre plastid genome (KU726621) and to each other to identify the Inverted Repeat boundaries and generate a single incomplete plastid genome sequence (with missing data represented by strings of Ns). Each sequence includes the Long Single Copy region, one copy of the Inverted Repeat, and the Small Single Copy region. Sequences were aligned with MAFFT.
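Because missing data in these plastid sequences are encoded as runs of Ns, it may be useful to quantify them per sequence before running analyses. The following Python sketch is illustrative only; the function name and the choice to exclude alignment gaps ("-") from the denominator are assumptions, not part of the original workflow.

```python
def missing_data_summary(alignment):
    """Return {name: fraction of N among non-gap characters}.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences are not aligned; lengths found: {sorted(lengths)}")
    summary = {}
    for name, seq in alignment.items():
        residues = seq.upper().replace("-", "")  # drop alignment gaps
        summary[name] = residues.count("N") / len(residues) if residues else 0.0
    return summary
```

A sequence with a high N fraction could then be flagged or excluded, depending on how tolerant the downstream analysis is of missing data.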

Further details on processing methods are available in the associated manuscript.

Usage notes

File chloroplast_alignment.mafft.gz: Gzip-compressed FASTA file with an alignment of 49 chloroplast genomes (the Long Single Copy region, Inverted Repeat, and Small Single Copy region; the second copy of the Inverted Repeat is omitted).

File demultiplexed_sample_data-radseq.tar.gz: Gzip-compressed tar archive containing 149 files of demultiplexed FASTQ-format Illumina reads from RADseq samples.

File demultiplexed_sample_data-insilico.tar.gz: Gzip-compressed tar archive containing 43 files of FASTQ-format "RADseq-like" reads from in silico digested genome resequencing samples.
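The files above can be inspected with the Python standard library alone. The sketch below is one possible approach, not a prescribed workflow: the function names are hypothetical, the internal layout of the archives is not documented here, and the read count assumes four-line FASTQ records. Archive members ending in ".gz" are assumed to be gzip-compressed and are decompressed on the fly.

```python
import gzip
import tarfile


def read_gzipped_fasta(path):
    """Parse a gzip-compressed FASTA file into {header: sequence}."""
    records, name = {}, None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def count_fastq_reads(tar_path):
    """Return {member_name: read_count} for each regular file in a .tar.gz archive."""
    counts = {}
    with tarfile.open(tar_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and links
            handle = archive.extractfile(member)
            if member.name.endswith(".gz"):
                handle = gzip.open(handle, "rb")  # member is itself compressed
            n_lines = sum(1 for _ in handle)
            counts[member.name] = n_lines // 4  # assumes 4 lines per FASTQ record
    return counts
```

For example, `len(read_gzipped_fasta("chloroplast_alignment.mafft.gz"))` would be expected to match the 49 genomes listed above, and `len(count_fastq_reads("demultiplexed_sample_data-radseq.tar.gz"))` the 149 sample files, provided the archives contain no additional regular files.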


National Science Foundation, Award: DEB-1737899

National Science Foundation, Award: DEB-1928514