Single nucleotide polymorphisms, environmental data and R scripts used in the work: A donor registry: Genomic analyses of Posidonia australis seagrass meadows identifies adaptive genotypes for future-proofing

Nimbs, Matt (1); Davis, Tom (1)

Research facility: National Marine Science Centre, Coffs Harbour, NSW

Published Nov 21, 2024 on Dryad. https://doi.org/10.5061/dryad.d2547d89s

Data files

Nov 21, 2024 version files: 2.25 MB

Environmental_Covariables_NSW_estuaries.csv: 37.06 KB
PosidoniafiltereddataGEASNPs.csv: 2.21 MB
README.md: 6.25 KB

Abstract

Globally, anthropogenic climate change has caused declines in seagrass ecosystems, necessitating proactive restoration approaches that ideally anticipate future conditions. In eastern Australia, estuaries with meadows of the endangered seagrass Posidonia australis have warmed and acidified over the past decade, and seagrass communities have declined in some estuaries. Securing these valuable habitats will require proactive conservation and restoration efforts, which could be augmented by restoration focussed on boosting resilience to future change. Understanding patterns of selection, and where seagrass meadows are adapted to particular environmental conditions, is key to identifying optimal donor material for restoration. We use single nucleotide polymorphisms and genotype-by-environment analyses to identify candidate loci under putative selection by environmental stressors and to assess genomic variation and allelic turnover along stressor gradients. The most important estuarine variables driving selection were associated with temperature, water turbidity and pH. We developed a preliminary ‘donor registry’ of pre-adapted Posidonia australis genotypes by mapping the distribution of alleles to visualise the allelic composition of each sampled seagrass meadow. The registry could be used as a first step in selecting source material for future-proofing restoration projects; however, manipulative experiments will be required to test whether pre-adapted genotypes confer increased resistance to multiple environmental stressors.


Description of the data and file structure

We used single nucleotide polymorphisms (SNP data provided here) and genotype-by-environment analyses (environmental data provided here) to identify candidate loci under putative selection by environmental stressors. We also used them to assess genomic variation and allelic turnover along stressor gradients for the seagrass Posidonia australis in estuaries along the coast of New South Wales, Australia.

Files and variables

File: PosidoniafiltereddataGEASNPs.csv

Description: DArTseq-generated single nucleotide polymorphisms (SNPs) for individual loci across multiple individuals

Variables

Individual: Samples from estuaries in New South Wales; see Environmental_Covariables_NSW_estuaries.csv for the population of origin of each individual
SNP sequence: Short-read sequence for individual SNPs. Cells containing "NA" indicate missing data.

File: Environmental_Covariables_NSW_estuaries.csv

Description: Corresponding environmental data for individuals and populations in the PosidoniafiltereddataGEASNPs.csv dataset

Variables

Ind: Name of individual samples
Pop: Name of population to which each individual belongs
AvTemp: Average water temperature (degrees Celsius) for the relevant population/individual averaged over the last 20 years
MinTemp: Minimum temperature (degrees Celsius) for the relevant population/individual over the last 20 years
MaxTemp: Maximum temperature (degrees Celsius) for the relevant population/individual over the last 20 years
TempRange: Difference between maximum and minimum temperatures (degrees Celsius) for the relevant population/individual over the last 20 years
AvpH: Average water pH (pH units) for the relevant population/individual over the last 20 years
MinpH: Minimum water pH (pH units) for the relevant population/individual over the last 20 years
MaxpH: Maximum water pH (pH units) for the relevant population/individual over the last 20 years
pHRange: Difference (pH units) between maximum and minimum pH value for the relevant population/individual over the last 20 years
AvSal: Average water salinity (practical salinity units) measure for the relevant population/individual over the last 20 years
MaxSal: Maximum water salinity (practical salinity units) measure for the relevant population/individual over the last 20 years
AvTurb: Average water turbidity (nephelometric turbidity units) measure for the relevant population/individual over the last 20 years
MinTurb: Minimum water turbidity (nephelometric turbidity units) measure for the relevant population/individual over the last 20 years
MaxTurb: Maximum water turbidity (nephelometric turbidity units) measure for the relevant population/individual over the last 20 years
TurbRange: Difference between maximum and minimum water turbidity (nephelometric turbidity units) value for the relevant population/individual over the last 20 years
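Each Range column is defined as the maximum minus the minimum of the same variable. After loading the file, these columns can therefore be checked against their Max and Min columns.

The Python sketch below is a minimal illustration and is not part of the published scripts. It rests on several assumptions:
- the CSV has a header row using the variable names listed above;
- missing values appear as "NA" or as empty cells;
- values may be rounded, which is why a hypothetical tolerance `tol` is used.

The function name `check_ranges` is also hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Each *Range column is described as (maximum - minimum) of the same variable.
RANGE_COLUMNS: Dict[str, Tuple[str, str]] = {
    "TempRange": ("MaxTemp", "MinTemp"),
    "pHRange": ("MaxpH", "MinpH"),
    "TurbRange": ("MaxTurb", "MinTurb"),
}

# Cell contents treated as missing (assumption).
MISSING = {"", "NA"}


def check_ranges(
    rows: Iterable[Dict[str, Optional[str]]], tol: float = 0.01
) -> List[Tuple[str, str]]:
    """Return (Ind, range column) pairs where Range differs from Max - Min by more than tol.

    Comparisons involving a missing or absent cell are skipped.
    """
    mismatches = []
    for row in rows:
        for range_col, (max_col, min_col) in RANGE_COLUMNS.items():
            # csv.DictReader fills short rows with None, so normalise to "".
            cells = [(row.get(col) or "") for col in (range_col, max_col, min_col)]
            if any(cell.strip() in MISSING for cell in cells):
                continue
            rng, hi, lo = (float(cell) for cell in cells)
            if abs((hi - lo) - rng) > tol:
                mismatches.append((row.get("Ind") or "", range_col))
    return mismatches
```

To run it on the dataset, open Environmental_Covariables_NSW_estuaries.csv with `csv.DictReader` (for example with `newline=""` and `encoding="utf-8-sig"`) and pass the reader to `check_ranges`. An empty result means the ranges agree with Max - Min within the tolerance.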

Code/software

Sequencing error was estimated by calculating the maximum proportion of allelic differences (bitwise distance) found between six pairs of technical replicates using bitwise.dist in the R package poppr.
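The replicate comparison can be illustrated with a simplified Python analogue of a bitwise distance. This is a sketch, not a reimplementation of poppr's bitwise.dist, and it makes three assumptions:
- genotypes are diploid and coded as 0, 1 or 2 copies of the alternate allele;
- missing calls are represented by None;
- loci missing in either replicate are skipped, which may differ from how poppr treats missing data.

The function names are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple

# Diploid genotype: number of alternate alleles (0, 1 or 2); None marks a missing call.
Genotype = Optional[int]


def allelic_difference(a: Sequence[Genotype], b: Sequence[Genotype], ploidy: int = 2) -> float:
    """Proportion of allelic differences between two samples genotyped at the same loci.

    Loci missing in either sample are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("both samples must be genotyped at the same loci")
    differences = 0
    compared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        differences += abs(x - y)  # e.g. 0 vs 2 differs at both allele copies
        compared += 1
    if compared == 0:
        raise ValueError("no loci were called in both samples")
    return differences / (ploidy * compared)


def sequencing_error(pairs: Iterable[Tuple[Sequence[Genotype], Sequence[Genotype]]]) -> float:
    """Maximum allelic difference observed across technical-replicate pairs."""
    return max(allelic_difference(a, b) for a, b in pairs)
```

Passing the six replicate pairs to `sequencing_error` returns the largest proportion of allelic differences among them. This corresponds to the maximum-based error estimate described above.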

Data were filtered using several functions in the R package dartR v2.7.2.
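The filtering thresholds are not given here. As an illustration only, the sketch below applies two filters commonly used for SNP data: a per-locus call rate and a minor allele frequency (MAF) cut-off. dartR provides functions for such filters (e.g. gl.filter.callrate and gl.filter.maf), but the functions and thresholds actually used are not stated.

The sketch makes these assumptions:
- MIN_CALL_RATE and MIN_MAF are hypothetical values;
- "NA" cells have already been converted to None;
- genotypes are coded as alternate-allele counts.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # alternate-allele count 0/1/2; None = missing ("NA")

# Hypothetical thresholds for illustration; not the values used in the study.
MIN_CALL_RATE = 0.95
MIN_MAF = 0.05


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing call at this locus."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """Frequency of the less common allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1.0 - alt)


def filter_loci(
    loci: Dict[str, List[Genotype]],
    min_call_rate: float = MIN_CALL_RATE,
    min_maf: float = MIN_MAF,
) -> Dict[str, List[Genotype]]:
    """Keep loci that pass both the call-rate and the MAF filter."""
    return {
        locus: gts
        for locus, gts in loci.items()
        if call_rate(gts) >= min_call_rate and minor_allele_frequency(gts) >= min_maf
    }
```

Loci with many missing calls, and monomorphic loci (MAF of 0), are both removed by this sketch.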

Genomic scans for adaptive divergence were carried out to identify candidate SNPs potentially under selection, using three methods: redundancy analysis (RDA), principal component analysis for outlier detection (PCAdapt) and latent factor mixed models (LFMM2).
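In this study, a candidate SNP had to be detected by at least two of the three methods. That retention rule can be sketched as a simple vote count. In this Python sketch, the method labels and SNP identifiers are hypothetical. Each method's output is assumed to be a collection of SNP IDs already judged significant by that method.

```python
from collections import Counter
from typing import Iterable, Mapping, Set


def consensus_candidates(hits: Mapping[str, Iterable[str]], min_methods: int = 2) -> Set[str]:
    """SNP IDs reported by at least `min_methods` of the methods in `hits`."""
    votes: Counter = Counter()
    for snp_ids in hits.values():
        votes.update(set(snp_ids))  # a SNP counts at most once per method
    return {snp for snp, n in votes.items() if n >= min_methods}
```

For example, `consensus_candidates({"RDA": [...], "PCAdapt": [...], "LFMM2": [...]})` returns the SNPs shared by at least two of the three result sets.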

To infer population structure, individual ancestry coefficients were estimated using sparse non-negative matrix factorisation (sNMF), implemented with the snmf function in the R package LEA v3.10.2. The optimal number of ancestral populations, K = 8, was used to inform the LFMM, which tested whether allele frequencies were correlated with any of the environmental variables. Statistical power was increased by imputing missing genotype data with the gl.impute function in the dartR package, using the nearest-neighbour option. The lfmm_ridge function was then used to compute regularised least-squares estimates with a ridge penalty. Associations between each SNP and each environmental variable were assessed with test statistics calibrated using the genomic inflation factor (function lfmm_test). Multiple comparisons were corrected with the Benjamini-Hochberg procedure at a false discovery rate (FDR) of 5%. Associations were considered significant at a threshold of 0.001, because the probability of a false positive result increases as the threshold is relaxed.

Candidate SNP loci were retained for downstream analysis when they were identified by at least two of the three methods.
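The calibration and multiple-testing steps can be illustrated with the Python sketch below. It is not LEA's code. It applies two standard procedures in sequence:
1. Genomic control: squared z-scores are divided by the genomic inflation factor (the median of z² over the median of a chi-square distribution with one degree of freedom) and then converted to p-values.
2. The Benjamini-Hochberg adjustment is applied to those p-values.

The text gives both a 5% FDR and a 0.001 threshold without specifying how they were combined. The cut-off `alpha` is therefore left as a parameter. All function names are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549):
# the square of the 75th percentile of the standard normal distribution.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(z: Sequence[float]) -> float:
    """Median of the squared z-scores divided by the chi-square(1) median."""
    return statistics.median([v * v for v in z]) / CHI2_1DF_MEDIAN


def calibrated_pvalues(z: Sequence[float], gif: Optional[float] = None) -> List[float]:
    """Two-sided p-values from z-scores after dividing z^2 by the inflation factor."""
    if gif is None:
        gif = genomic_inflation_factor(z)
    if gif <= 0:
        raise ValueError("genomic inflation factor must be positive")
    # For X ~ chi-square(1), P(X > x) = erfc(sqrt(x / 2)).
    return [math.erfc(math.sqrt(v * v / gif / 2.0)) for v in z]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(adjusted: Sequence[float], alpha: float) -> List[int]:
    """Indices of tests whose adjusted p-value falls below alpha."""
    return [i for i, q in enumerate(adjusted) if q < alpha]
```

Calibration matters when z-scores are inflated, for example by residual population structure. In that case the inflation factor exceeds 1, and dividing by it shrinks the test statistics before any threshold is applied.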

The gradient forest (GF) analysis was run in the R package gradientForest, using a regression-tree-based approach to model the relationship between genomic data and environmental variables. Turnover in adaptive genetic variation was modelled against the predictor variables. The candidate SNPs identified via the genotype-environment association (GEA) analyses served as the response variables. The machine learning algorithm partitioned allele frequencies at numerous split values along each environmental gradient and calculated the change in allele frequencies at each split. Split importance (i.e., the amount of genomic variation explained by each split value) was summed cumulatively along each environmental gradient and aggregated across alleles. This built a non-linear turnover function, which was used to identify loci significantly influenced by each predictor variable. The analysis used 500 regression trees for each of the nine environmental predictor variables, with all other parameters at default settings.
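The cumulative split-importance idea can be illustrated with a simplified Python sketch. It assumes that split values and their importances for a single predictor have already been pooled across trees and alleles. It then sums importance up to each point on an evaluation grid. gradientForest's own computation includes further weighting and normalisation steps, which are omitted here. The function name and inputs are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def cumulative_turnover(
    splits: Sequence[Tuple[float, float]], grid: Sequence[float]
) -> List[float]:
    """Cumulative split importance evaluated at each grid value.

    splits: (split_value, importance) pairs for one predictor, pooled across
            trees and alleles.
    Returns, for each grid value g, the summed importance of splits <= g.
    """
    ordered = sorted(splits)
    values = [v for v, _ in ordered]
    cumulative = []
    total = 0.0
    for _, importance in ordered:
        total += importance
        cumulative.append(total)
    result = []
    for g in grid:
        k = bisect_right(values, g)  # number of splits at or below g
        result.append(cumulative[k - 1] if k else 0.0)
    return result
```

Steep sections of the resulting step function mark parts of a gradient (for example, a temperature interval) where allele frequencies change most. This is the pattern the turnover functions are used to visualise.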

Methods