Data from: Distribution and biogeography of Sanguina snow algae: fine-scale sequence analyses reveal previously unknown population structure

Cite this dataset

Brown, Shawn (2021). Data from: Distribution and biogeography of Sanguina snow algae: fine-scale sequence analyses reveal previously unknown population structure [Dataset]. Dryad.


It has previously been suggested that snow algal species within the genus Sanguina (S. nivaloides and S. aurantia) show no population structure despite being found globally (S. nivaloides) or throughout the Northern Hemisphere (S. aurantia). However, systematic biogeographic research into global distributions is lacking because few genetic and no genomic resources exist for these snow algae. Here, using all publicly available and previously unpublished Sanguina sequences of the Internal Transcribed Spacer 2 (ITS2) region, we investigate whether this purported lack of population structure within Sanguina species is supported by additional evidence. Using a minimum entropy decomposition (MED) approach to examine fine-scale genetic population structure, we find that these snow algae populations are largely distinct regionally and show biogeographic structuring. This contradicts the currently accepted idea that Sanguina species lack any observable population structure across their vast ranges, and it highlights the utility of fine-scale (sub-OTU) analytical tools for delineating geographic and genetic population structure. This work extends the known range of S. aurantia and emphasizes the need to develop genetic and genomic tools for additional studies on snow algae biogeography.


We analyzed all verified ITS2 sequences from Sanguina species that were available at the time of analysis from GenBank, SRA, and the supplemental information of associated publications. We chose to analyze the ITS2 region rather than 18S or other gene targets because ITS2 has the most available data and ITS regions have great potential for species-level population analysis in algae (An et al. 1999).

We gathered the following Sanger sequences: 56 sequences from Segawa et al. (2018) collected from Alaska (USA), Svalbard (Norway), and Antarctica; 48 sequences from Procházková et al. (2019) from Austria, Italy, Slovakia, Switzerland, Norway, Colorado (USA), Argentina, and Antarctica; and 29 sequences (Brown, unpublished; using ITS1-ITS4 primers) from Lyman Basin, Washington (USA; 48°10’21” N, 120°53’50” W; 1880 m asl) and Niwot Ridge, Colorado (USA; 40°02’56” N, 105°34’51” W; 3514 m asl). Further, we gathered locus-targeted Illumina MiSeq sequence data: 1,600 sequences (Brown et al. 2016) from Washington (USA) and Colorado (USA); 44,666 sequences (Brown and Jumpponen 2019) from Finland, Sweden, Norway, and Colorado (USA); and 59,130 sequences (Tucker & Brown, unpublished; using fITS7-ITS4 primers) from Lyman Basin, Washington (USA; 48°10’27” N, 120°53’26” W; 1818 m asl), Mt. Democrat, Colorado (USA; 39°20’38” N, 106°07’45” W; 3950 m asl), and Medicine Bow Peak, Wyoming (USA; 41°20’45” N, 106°19’50” W; 3549 m asl). In all, we gathered 105,529 ITS2 sequences.
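The stated total can be checked by summing the per-source counts listed above. The following is a minimal sketch; the dictionary keys are descriptive labels chosen here for illustration and are not identifiers from the dataset.

```python
# Per-source ITS2 sequence counts as reported in the text above.
# The keys are labels chosen for this sketch only (an assumption).
sanger_counts = {
    "Segawa et al. 2018": 56,
    "Prochazkova et al. 2019": 48,
    "Brown, unpublished": 29,
}
miseq_counts = {
    "Brown et al. 2016": 1_600,
    "Brown and Jumpponen 2019": 44_666,
    "Tucker & Brown, unpublished": 59_130,
}

sanger_total = sum(sanger_counts.values())
miseq_total = sum(miseq_counts.values())
grand_total = sanger_total + miseq_total

print(f"Sanger: {sanger_total}, MiSeq: {miseq_total}, total: {grand_total}")
```

The Sanger and MiSeq subtotals (133 and 105,396) add up to the 105,529 sequences reported.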

All sequences used were, to the best of our knowledge, from snow, generally from perennial snowfields. To confirm that these sequences were from Sanguina snow algae, we extracted the ITS2 region (removing the flanking 5.8S and LSU regions) from all sequences using the program ITSx (Bengtsson‐Palme et al. 2013) and aligned them with MAFFT (Katoh and Standley 2013) to create a multiple sequence alignment (MSA). As an initial check of Sanguina origin, all sequences were clustered into OTUs using VSEARCH at 3% dissimilarity (Rognes et al. 2016), and representative sequences for these OTUs (see Appendix A1) were queried against GenBank (BLASTn nr/nt) and against type sequences for both Sanguina species to confirm Sanguina identities.
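The idea behind clustering at 3% dissimilarity can be illustrated with a simplified greedy centroid scheme. This is a sketch under stated assumptions, not VSEARCH's implementation: it assumes the sequences are already aligned to equal length and measures identity column by column, whereas VSEARCH performs its own pairwise alignments and uses its own identity definition. The function names and toy sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching columns between two pre-aligned sequences.

    Columns where both sequences carry a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_cluster(seqs: dict, max_dissimilarity: float = 0.03) -> dict:
    """Assign each sequence to the first centroid within the threshold,
    otherwise start a new cluster. Returns centroid id -> member ids."""
    clusters = {}
    for seq_id, seq in seqs.items():
        for centroid_id in clusters:
            if 1.0 - identity(seq, seqs[centroid_id]) <= max_dissimilarity:
                clusters[centroid_id].append(seq_id)
                break
        else:
            # No existing centroid is close enough: this sequence seeds a new OTU.
            clusters[seq_id] = [seq_id]
    return clusters


# Toy data (hypothetical): 100-column sequences differing at 2 or 8 columns.
base = "ACGT" * 25
toy = {
    "seq_a": base,
    "seq_b": "TT" + base[2:],       # 2 mismatches -> 2% dissimilar
    "seq_c": "T" * 10 + base[10:],  # 8 mismatches -> 8% dissimilar
}
clusters = greedy_cluster(toy)
print(clusters)
```

In this toy input, `seq_b` falls within 3% of `seq_a` and joins its cluster, while `seq_c` exceeds the threshold and seeds a second cluster. Processing order matters in greedy schemes like this one, which is one reason real tools typically sort sequences (for example by length or abundance) before clustering.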

This resulted in two retained OTUs: the dominant OTU1 (best match to Sanguina aurantia; 96.63% match to accession MK728633.1; 38,012 total sequences; 95.65% match to the S. aurantia type specimen MK728634.1) and OTU2 (best match to Sanguina nivaloides; 99.59% match to accession GU117577.1; 22,065 total sequences; 99.01% match to the S. nivaloides type specimen MK728599.1). The remaining sequences were determined not to belong to Sanguina and were discarded. Discarded sequences were mainly assigned to the Trebouxiophyceae or to other non-Sanguina Chlorophyceae, or were poorly matched to any reference taxa. A few errant sequences not belonging to either target Sanguina species may have been included as part of the OTU clustering, but we have no evidence that casts doubt on the veracity of these sequences. These retained OTUs are hereafter referred to as S. aurantia and S. nivaloides. All associated retained sequences were collected (Table 1, Appendix A1) and coded by location for species-specific MED analyses (S. nivaloides and S. aurantia were analyzed separately).
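MED partitions aligned sequences according to the Shannon entropy of alignment columns. The sketch below illustrates a single decomposition step only: it finds the highest-entropy column and groups sequences by the character at that position. It is an illustration of the underlying idea, not the MED software or the analysis performed for this dataset; the function names and the toy alignment are hypothetical, and the iteration, entropy thresholds, and noise filtering of a full MED run are omitted.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def split_on_max_entropy(aligned: list) -> tuple:
    """Return the index of the highest-entropy column and the sequences
    grouped by the character they carry at that column."""
    entropies = [column_entropy("".join(col)) for col in zip(*aligned)]
    position = max(range(len(entropies)), key=entropies.__getitem__)
    groups = {}
    for seq in aligned:
        groups.setdefault(seq[position], []).append(seq)
    return position, groups


# Toy alignment (hypothetical): the third column is the most variable.
toy_alignment = ["ACGTA", "ACGTA", "ACTTA", "ACTTA", "ACGTC"]
position, groups = split_on_max_entropy(toy_alignment)
print(position, {base: len(members) for base, members in groups.items()})
```

In a full decomposition, each resulting group would be examined again in the same way until no column exceeds a chosen entropy threshold. Because the retained sequences were coded by location, the final groups can then be compared across sampling regions.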