Data from: Environmental gradients shape genetic variation in the desert moss, Syntrichia caninervis Mitt. (Pottiaceae)
Data files
Jan 24, 2025 version, 6.28 MB total
- 123.cc147_population_metadata.txt (3.29 KB)
- 123.cc147.recode.vcf (4.82 MB)
- CC30.84_population_metadata.txt (2.25 KB)
- CC30.84.vcf (1.45 MB)
- README.md (2.81 KB)
Abstract
The moss Syntrichia caninervis Mitt. is distributed throughout drylands globally, and often anchors ecologically significant communities known as biological soil crusts (biocrusts). The species occupies a variety of dryland habitats with varying levels of drought and temperature stress, suggesting the potential for ecological specialization within S. caninervis. Here, we sampled S. caninervis from sites along two elevation gradients and used restriction site associated DNA sequencing to compare the relative impacts of environmental factors and geospatial distances on genetic differentiation in S. caninervis populations. While we found no evidence of isolation by distance in our data, one environmental variable, mean annual precipitation (MAP), was found to be a positive predictor of FST. An ecological association analysis identified 32 SNP alleles that covary significantly with MAP, 15 of which fall within the exonic regions of genes with annotations suggesting diverse roles in response to dehydration stress. Understanding the degree to which genetic variation in S. caninervis is associated with environmental factors is key to predicting its potential for persistence in the face of global climate change, which is predicted to be especially detrimental to desert organisms already living at their physiological limits.
README: Data from: Environmental gradients shape genetic variation in the desert moss, Syntrichia caninervis Mitt. (Pottiaceae)
https://doi.org/10.5061/dryad.kprr4xhfs
Description of the data and file structure
The data were collected in support of a study on population structure and environmental adaptation in a desert moss, Syntrichia caninervis. A total of 180 samples were genotyped by ddRAD-seq, and two assemblies were performed, one de novo and one reference-based. The VCF files generated after SNP calling and filtering for both assemblies are provided here, along with associated metadata.
Files and variables
File: CC30.84_population_metadata.txt
Description: This file contains information on the five populations represented in the SNP data set (CC30.84.vcf) derived from the reference-based assembly in Farah and Fisher (in press). Site data provided for each population include georeference (lat-long) coordinates, elevation in meters, and Mean Annual Precipitation (MAP) in millimeters. The NCBI SRA BioSample accession numbers for individuals from each population that contributed SNP genotype data to CC30.84.vcf are provided as a comma-separated list at the end of each population entry.
Variables
- POPULATION ID: A brief population identifier that is appended to the sample names associated with the NCBI SRA BioSamples in PRJNA1002376
File: CC30.84.vcf
Description: A vcf file generated by iPyrad_v.0.7.29 containing the SNP variant calls for 84 genotypes included in analyses of the reference-based assembly of RADseq loci in Farah and Fisher (in press). The #reference tag indicates the NCBI accession number for the S. caninervis reference genome, GCA_016097705.1.
File: 123.cc147_population_metadata.txt
Description: This file contains information on the six populations represented in the SNP data set (123.cc147.recode.vcf) derived from the de novo assembly in Farah and Fisher (in press). Site data provided for each population include georeference (lat-long) coordinates, elevation in meters, and Mean Annual Precipitation (MAP) in millimeters. The NCBI SRA BioSample accession numbers for individuals from each population that contributed SNP genotype data to 123.cc147.recode.vcf are provided as a comma-separated list at the end of each population entry.
Variables
- POPULATION ID: A brief population identifier that is appended to the sample names associated with the NCBI SRA BioSamples in PRJNA1002376
File: 123.cc147.recode.vcf
Description: A vcf file generated by Stacks v2.52 containing the SNP variant calls for 147 genotypes included in analyses of the de novo assembly of RADseq loci in Farah and Fisher (in press).
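As a minimal sketch of how either VCF file might be inspected, the Python below uses only the standard library and relies on the VCF convention that the `#CHROM` header line lists sample names from the tenth column onward. The function name `summarize_vcf` and the toy VCF text are illustrative assumptions and are not part of the dataset.

```python
import io


def summarize_vcf(handle):
    """Return (sample_names, n_variant_records) for an open VCF text handle."""
    samples = []
    n_records = 0
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Nine fixed columns (CHROM ... FORMAT) precede the sample names.
            samples = line.rstrip("\n").split("\t")[9:]
        elif line.strip():
            n_records += 1  # one variant record per non-empty data line
    return samples, n_records


# Toy VCF text (hypothetical, not taken from the dataset).
toy = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t1/1\n"
    "chr1\t5000\t.\tC\tT\t.\tPASS\t.\tGT\t./.\t0/0\n"
)
samples, n_records = summarize_vcf(io.StringIO(toy))
print(samples, n_records)
```

To check a real file, pass `open("CC30.84.vcf")` in place of the `io.StringIO` object. Based on the descriptions above, the sample list should contain 84 names for CC30.84.vcf and 147 for 123.cc147.recode.vcf.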
Methods
Samples underwent DNA extraction and double-digestion using the restriction enzymes MseI and PstI prior to RADseq. Libraries were multiplexed and sequenced in two lanes of a 1x150 Illumina NextSeq 500 run. Raw fastq sequence data for each sample were submitted to the NCBI Sequence Read Archive and are available under BioProject accession number PRJNA1002376.
De novo assembly and SNP filtering: Raw fastq sequence data were demultiplexed, trimmed to a consistent length of 60 bp, filtered for quality, and assembled de novo in Stacks v2.52. Initial filtering retained loci that were present in at least 60% of individuals across populations and had a minimum minor allele count (mac) of 6 (i.e., minor allele shared by at least three individuals). Identical multi-locus genotypes (MLGs) were identified in poppr, and the data set was subsequently filtered in VCFtools to retain one representative genotype per clone and to remove individual samples with more than 20% missing data.
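The locus and sample thresholds described above can be illustrated with a short Python sketch. It is not the authors' pipeline (they used Stacks and VCFtools). It assumes biallelic sites with diploid-style genotype strings such as `0/0` and `1/1`, which is why a minor allele count of 6 corresponds to three individuals, and it treats the 60% presence threshold as an overall call rate. All function names are hypothetical.

```python
def minor_allele_count(genotypes):
    """Count copies of the rarer allele at one biallelic site.

    genotypes: list of VCF GT strings such as '0/0', '1/1' or './.'.
    """
    counts = {}
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # skip missing genotypes
        for allele in alleles:
            counts[allele] = counts.get(allele, 0) + 1
    # A site with fewer than two observed alleles has no minor allele.
    return min(counts.values()) if len(counts) >= 2 else 0


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype at one site."""
    if not genotypes:
        return 0.0
    return sum("." not in gt for gt in genotypes) / len(genotypes)


def keep_locus(genotypes, min_call_rate=0.60, min_mac=6):
    """Apply the call-rate and minor-allele-count thresholds to one site."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_count(genotypes) >= min_mac)


def missing_fraction_per_sample(sites):
    """sites: list of per-site genotype lists, all in the same sample order."""
    n_sites = len(sites)
    n_samples = len(sites[0])
    return [sum("." in site[i] for site in sites) / n_sites
            for i in range(n_samples)]


# Ten hypothetical samples at two sites.
site_a = ["0/0"] * 7 + ["1/1"] * 3            # minor allele in three samples
site_b = ["0/0"] * 7 + ["1/1"] * 2 + ["./."]  # minor allele in two samples
print(keep_locus(site_a))  # True: mac is 6
print(keep_locus(site_b))  # False: mac is 4
print(missing_fraction_per_sample([site_a, site_b]))
```

Samples whose missing fraction exceeds 0.20 would then be dropped, mirroring the 20% missing-data cutoff in the text.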
Reference-based assembly and SNP filtering: Raw fastq sequence data were demultiplexed, trimmed, filtered for quality, and assembled to the S. caninervis reference genome using the iPyrad pipeline. iPyrad was also used to generate consensus loci across samples and identify SNPs. Parameters were set to retain only biallelic loci (max SNPs per locus = 2), and to enforce the expectation of haploidy in the samples (max heterozygotes per consensus sequence = 0; max heterozygous sites per locus = 0). VCFtools was used to iteratively filter SNP loci. First, loci missing from more than 33% of the samples, or with a mac less than 4 (i.e., not present in at least two individuals), were removed, as were individual samples with more than 50% missing data (122 samples, 18,067 sites remaining). Next, loci with more than 30% missing data in any of the populations were identified and filtered out. In poppr, identical multi-locus genotypes (MLGs) were identified, and identical genotypes were subsequently filtered from the .vcf file to leave a single representative individual per genotype. Individuals from the NV high elevation site were also filtered from the data set, as too few (< 5) individuals from this site remained for meaningful population genetic inference. Finally, the data set was filtered to remove sites on the sex chromosome (n=14), thinned so that all loci were separated by at least 2 Kb, and filtered to enforce a minimum minor allele frequency of 0.05 (i.e., minor alleles present in at least three individuals).
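Two of the steps above, keeping one representative per identical multi-locus genotype and thinning loci to a minimum spacing of 2 Kb, can be sketched in plain Python. This is an illustration under stated assumptions rather than the procedure actually run in poppr and VCFtools: clones are matched by exact genotype identity (poppr offers more nuanced handling, for example of missing data), and thinning is done greedily along each chromosome. The function names and toy inputs are hypothetical.

```python
def one_per_genotype(genotypes_by_sample):
    """Keep the first sample seen for each distinct multi-locus genotype.

    genotypes_by_sample: dict mapping sample name -> list of genotype calls,
    with loci in the same order for every sample.
    """
    seen = set()
    kept = []
    for name, calls in genotypes_by_sample.items():
        key = tuple(calls)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


def thin_sites(sites, min_distance=2000):
    """Greedily keep sites at least min_distance bp from the last kept site.

    sites: iterable of (chrom, pos) tuples, sorted by position within each
    chromosome. Spacing is tracked separately per chromosome.
    """
    kept = []
    last_kept = {}
    for chrom, pos in sites:
        prev = last_kept.get(chrom)
        if prev is None or pos - prev >= min_distance:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept


# Hypothetical haploid calls: samples A and C share a genotype.
calls = {
    "A": ["0", "1", "1"],
    "B": ["0", "0", "1"],
    "C": ["0", "1", "1"],
}
print(one_per_genotype(calls))

# Hypothetical site positions on two chromosomes.
positions = [("chr1", 100), ("chr1", 1500), ("chr1", 2100),
             ("chr1", 2500), ("chr2", 300)]
print(thin_sites(positions))
```

In the toy input, sample C is dropped as a duplicate of A, and the chr1 sites at 1500 and 2500 are dropped because they fall within 2 Kb of the previously kept site.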