Data from: Distribution and biogeography of Sanguina snow algae: fine-scale sequence analyses reveal previously unknown population structure
Data files
Aug 19, 2021 version files (1.73 MB)
Appendix_1.xlsx (1.73 MB)
Abstract
Methods
We analyzed all verified ITS2 sequences from Sanguina species that were available at the time of analysis from GenBank, SRA, and the supplemental information of associated publications. We chose the ITS2 region rather than 18S or other gene targets because ITS2 has the most available data and because ITS regions have great potential for species-level population analysis in algae (An et al. 1999).
We gathered the following Sanger sequences: 56 sequences from Segawa et al. (2018) collected from Alaska (USA), Svalbard (Norway), and Antarctica; 48 sequences from Procházková et al. (2019) from Austria, Italy, Slovakia, Switzerland, Norway, Colorado (USA), Argentina, and Antarctica; and 29 sequences (Brown, unpublished; using ITS1-ITS4 primers) from Lyman Basin, Washington (USA; 48°10′21″ N, 120°53′50″ W; 1880 m asl) and Niwot Ridge, Colorado (USA; 40°02′56″ N, 105°34′51″ W; 3514 m asl). Further, we gathered locus-targeted Illumina MiSeq sequence data: 1,600 sequences (Brown et al. 2016) from Washington (USA) and Colorado (USA); 44,666 sequences (Brown and Jumpponen 2019) from Finland, Sweden, Norway, and Colorado (USA); and 59,130 sequences (Tucker & Brown, unpublished; using fITS7-ITS4 primers) from Lyman Basin, Washington (USA; 48°10′27″ N, 120°53′26″ W; 1818 m asl), Mt. Democrat, Colorado (USA; 39°20′38″ N, 106°07′45″ W; 3950 m asl), and Medicine Bow Peak, Wyoming (USA; 41°20′45″ N, 106°19′50″ W; 3549 m asl). In all, we gathered 105,529 ITS2 sequences.
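The stated total can be checked by summing the per-source counts given above. The short sketch below does only that arithmetic; the dictionary keys are shorthand labels chosen for this illustration, not identifiers from the dataset.

```python
# Per-source sequence counts exactly as reported in the text.
# Keys are shorthand labels for this sketch only.
sanger = {
    "Segawa et al. 2018": 56,
    "Prochazkova et al. 2019": 48,
    "Brown, unpublished": 29,
}
miseq = {
    "Brown et al. 2016": 1_600,
    "Brown and Jumpponen 2019": 44_666,
    "Tucker & Brown, unpublished": 59_130,
}

sanger_total = sum(sanger.values())
miseq_total = sum(miseq.values())
grand_total = sanger_total + miseq_total

print(sanger_total, miseq_total, grand_total)  # 133 105396 105529
```

The sum matches the 105,529 ITS2 sequences reported, with the MiSeq data making up nearly all of the total.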
To the best of our knowledge, all sequences used were from snow, generally perennial snowfields. To confirm that these sequences were from Sanguina snow algae, we extracted the ITS2 region (removing the flanking 5.8S and LSU regions) from all sequences using the program ITSx (Bengtsson-Palme et al. 2013) and aligned them with MAFFT (Katoh and Standley 2013) to create a multiple sequence alignment (MSA). As an initial check of Sanguina origin, all sequences were clustered into OTUs using VSEARCH at 3% dissimilarity (Rognes et al. 2016), and representative sequences for these OTUs (see Appendix A1) were queried against GenBank (BLASTn nr/nt) and against type sequences for both Sanguina species.
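The extraction, alignment, and clustering steps described above could be assembled roughly as follows. This is a minimal sketch, not the authors' actual commands: the file names, the working directory, the choice of `--cluster_fast` among VSEARCH's clustering commands, MAFFT's `--auto` mode, and the decision to cluster the unaligned ITS2 extracts are all assumptions. The only value taken from the text is the 3% dissimilarity, which corresponds to a 0.97 identity threshold. The function only builds argument lists and does not run anything.

```python
from pathlib import Path


def build_pipeline(raw_fasta, workdir="work", dissimilarity=0.03):
    """Return ITSx, MAFFT, and VSEARCH command lines as argument lists.

    Nothing is executed here; all paths are hypothetical placeholders.
    """
    work = Path(workdir)
    # VSEARCH takes an identity threshold, so 3% dissimilarity becomes 0.97.
    identity = round(1.0 - dissimilarity, 4)

    its_prefix = work / "itsx_out"
    # ITSx writes each saved region to <prefix>.<region>.fasta.
    its2_fasta = f"{its_prefix}.ITS2.fasta"

    itsx = ["ITSx", "-i", raw_fasta, "-o", str(its_prefix),
            "--save_regions", "ITS2"]
    # MAFFT prints the alignment to stdout; redirect it when running.
    mafft = ["mafft", "--auto", its2_fasta]
    vsearch = ["vsearch", "--cluster_fast", its2_fasta,
               "--id", str(identity),
               "--centroids", str(work / "otu_representatives.fasta"),
               "--uc", str(work / "clusters.uc")]
    return {"itsx": itsx, "mafft": mafft, "vsearch": vsearch}


if __name__ == "__main__":
    for name, cmd in build_pipeline("all_sequences.fasta").items():
        print(name, "->", " ".join(cmd))
```

Under these assumptions, the centroid file would hold the representative OTU sequences to query with BLASTn, and each list could be passed to `subprocess.run` once the tools are installed.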
This resulted in two retained OTUs: the dominant OTU1 (38,012 total sequences; best match to Sanguina aurantia, 96.63% match to accession MK728633.1; 95.65% match to the S. aurantia type specimen MK728634.1) and OTU2 (22,065 total sequences; best match to Sanguina nivaloides, 99.59% match to accession GU117577.1; 99.01% match to the S. nivaloides type specimen MK728599.1). The remaining sequences were determined not to belong to Sanguina and were discarded. Discarded sequences were mainly assigned to the Trebouxiophyceae or to other non-Sanguina Chlorophyceae, or were poorly matched to any reference taxa. A few errant sequences not belonging to either target Sanguina species may have been included through the OTU clustering, but we have no evidence that casts doubt on the veracity of these sequences. These retained OTUs are hereafter referred to as S. aurantia and S. nivaloides. All associated retained sequences were collected (Table 1, Appendix A1) and coded by location for Sanguina species-specific Minimum Entropy Decomposition (MED) analyses (S. nivaloides and S. aurantia were analyzed separately).
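For orientation, the retained counts can be set against the 105,529 sequences gathered. The sketch below assumes that both OTU totals are drawn from that same pool, so the "not retained" figure is an inference rather than a number reported in the text, and it would also include any sequences lost before clustering.

```python
# Counts as reported in the text.
total_gathered = 105_529
retained = {
    "S. aurantia (OTU1)": 38_012,
    "S. nivaloides (OTU2)": 22_065,
}

retained_total = sum(retained.values())
# Assumption: everything not in the two retained OTUs was dropped.
not_retained = total_gathered - retained_total
retained_share = retained_total / total_gathered

print(retained_total, not_retained)  # 60077 45452
print(f"Retained share: {retained_share:.1%}")
```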