DArT-seq raw data of Eucalyptus spp. for the genetic assessment of the value of restoration planting within an endangered eucalypt woodland
Data files
Apr 25, 2023 version (files total 135.58 MB):
- README_for_Dryad.txt
- Rosser_et_al_2022_DRYAD_RAW_DATA.xlsx
Abstract
Assessment of woodland restoration often focusses on stand demographics, but genetic factors likely influence long-term stand viability. We examined the genetic composition of Yellow Box (Eucalyptus melliodora) trees in endangered Box-Gum Grassy Woodland in SE Australia, some 30 years after planting with seeds of reportedly local provenance. Using DArT sequencing of 1406 SNPs, we compared the genetic diversity and population structure of planted E. melliodora trees with those of remnant bushland trees, paddock trees, and natural recruits. Genetic patterns imply that natural stands and paddock trees historically had high gene flow (among-group pairwise FST = 0.04–0.10). Genetic diversity was highest among relictual paddock trees (He = 0.17), while the diversity of revegetated trees was identical to that of natural bushland trees (He = 0.14). Bayesian clustering placed the revegetated trees into six genetic groups, four of which corresponded to genotypes from paddock trees, indicating that revegetated stands are mainly of genetically diverse, local provenance. Natural recruits were largely derived from paddock trees, with some contribution from planted trees. A few trees have likely hybridised with other local eucalypt species, but these hybrids are unlikely to compromise stand integrity. We show that paddock trees have high genetic diversity, capture historic genetic variation, and provide important foci for the natural recruitment of genetically diverse, outcrossed seedlings.
Methods
Sample collection
We collected a total of 221 samples of Eucalyptus melliodora trees from 14 sites across the central valley and the uncleared valley walls of Warrumbungle National Park in NSW (31°17’S, 149°00’E). These comprised samples from six remnant populations on the edge of the valley (n = 60; termed “natural stands”); from all of the relictual paddock trees in the valley (n = 36; termed “paddock”); from two populations of recruits, each growing near a paddock tree (n = 48; termed “recruits”); and from five populations of trees planted in different years of the restoration project (n = 77; termed “planted”).
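As a quick consistency check on the sampling design, the group sizes can be tallied in a few lines of Python. The dictionary `SAMPLES` is a hypothetical structure for illustration, not a file in this dataset; it confirms that the four groups sum to the 221 trees sampled and shows how uneven the group sizes are, which bears on the choice of cluster-number estimator described under Data analysis.

```python
# Trees sampled per group, as stated in the text (hypothetical structure).
SAMPLES = {
    "natural stands": 60,  # six remnant populations on the valley edge
    "paddock": 36,         # all relictual paddock trees in the valley
    "recruits": 48,        # two recruit populations near paddock trees
    "planted": 77,         # five planted populations
}

total = sum(SAMPLES.values())
assert total == 221, total  # matches the reported total

# Print each group's share of the sample to show the uneven design.
for group, n in SAMPLES.items():
    print(f"{group:15s} n={n:3d} ({n / total:.1%})")
```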
Samples from remnant populations were taken on the uncleared slopes of the valley walls, or at sites separate from the central valley, more than 1 km from any planting. We preferred larger (likely older) trees to increase the probability that they pre-date the germination of the planted trees and therefore provide a better reference for remnant E. melliodora in the region. We identified relictual paddock trees and planted trees from aerial photos taken before the reserve was established. Relictual trees were easy to relocate in the cleared central valley because they are large (20–40 m tall) and often have associated woody debris, whereas planted trees were smaller (6–15 m tall) and were generally planted in rows, often with adjacent timber posts. The five planted populations were established in different years: population 14 in 1992, population 11 in 1995, and population 12 in 1998; the planting dates of populations 10 and 13 are unknown. Recruits were smaller still (0.5–5 m tall) and occurred in scattered locations, often in a ‘shadow’ surrounding adult paddock trees.
DArT genotyping
We used DArTseq services for both DNA extraction from leaf material and sequencing of SNPs (single nucleotide polymorphisms), using a high-density microarray developed for eucalypts (Sansaloni et al. 2010, Petroli et al. 2012). Genomic DNA was extracted following a modified CTAB protocol produced by Diversity Arrays Technology.
Complexity reduction was conducted with the PstI/TaqI-based method developed by Sansaloni et al. (2010), in which restriction-enzyme digestion (PstI is methylation-sensitive) selects for more active, hypomethylated genomic regions and, importantly, excludes repeat sequences, which are abundant in complex plant genomes such as those of eucalypts.
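To illustrate how a double restriction digest reduces a genome to a subset of fragments, the sketch below finds PstI (CTGCA^G) and TaqI (T^CGA) sites in a DNA string and splits it at the cut positions, labelling each fragment's ends by the enzyme that produced them. The helper `digest` is hypothetical. It models only the top strand (both recognition sites are palindromic, so no site is missed), ignores methylation, and does not reproduce the rules DArT uses to decide which fragments are retained and sequenced.

```python
import re
from typing import List, Tuple

# Recognition site and top-strand cut offset for each enzyme:
# PstI CTGCA^G cuts after the 5th base; TaqI T^CGA cuts after the 1st base.
ENZYMES = {"PstI": ("CTGCAG", 5), "TaqI": ("TCGA", 1)}

def digest(seq: str) -> List[Tuple[str, str, str]]:
    """Return (fragment, left_end, right_end) for a PstI/TaqI double digest.

    Ends are labelled with the enzyme that cut there, or "end" for the
    ends of the input sequence.
    """
    seq = seq.upper()
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        # Zero-width lookahead so overlapping sites are all reported.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    cuts.sort()

    fragments = []
    prev_pos, prev_end = 0, "end"
    for pos, name in cuts:
        fragments.append((seq[prev_pos:pos], prev_end, name))
        prev_pos, prev_end = pos, name
    fragments.append((seq[prev_pos:], prev_end, "end"))
    return fragments
```

For example, `digest("AAACTGCAGTTTTCGAAAA")` yields three fragments, the middle one (`GTTTT`) flanked by a PstI end and a TaqI end.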
Next-generation sequencing (NGS) was used to detect SNPs across the genome of each sample against a library developed for eucalypts (Sansaloni et al. 2010), identifying bi-allelic SNP loci. Polymorphic SNP loci were sequenced across the genomes of all leaf samples using the DArTseq microarray in a ‘high intensity’ run. These SNP loci are bi-allelic, and results record the presence or absence of the polymorphism (SNP) at each specific site in the genome. Sequencing was carried out on an Illumina HiSeq 2500 using 75-cycle single-end reads, and raw reads were processed with DArT's proprietary variant-calling pipeline, DArTsoft-14.
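Codominant analyses, such as the heterozygosity estimates described under Data analysis, need each genotype as a count of alleles. Assuming each SNP is reported as two presence/absence scores per sample, one for the reference allele and one for the alternative allele (our understanding of DArT's two-row SNP report), a minimal sketch of the conversion to 0/1/2 alternative-allele counts follows. The function names are hypothetical, and missing scores are represented here as `None`.

```python
from typing import List, Optional

def allele_count(ref_present: Optional[int], alt_present: Optional[int]) -> Optional[int]:
    """Convert presence/absence scores for the two alleles of a bi-allelic
    SNP into a count of the alternative allele; None means missing."""
    if ref_present is None or alt_present is None:
        return None
    if ref_present and not alt_present:
        return 0      # homozygous for the reference allele
    if ref_present and alt_present:
        return 1      # heterozygote: both alleles detected
    if alt_present:
        return 2      # homozygous for the alternative allele
    return None       # neither allele detected: treat as a failed call

def locus_genotypes(ref_row: List[Optional[int]],
                    alt_row: List[Optional[int]]) -> List[Optional[int]]:
    """Apply allele_count sample by sample for one locus (one row per allele)."""
    return [allele_count(r, a) for r, a in zip(ref_row, alt_row)]
```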
Data analysis
We filtered the primary dataset stringently so that only high-quality markers, and therefore accurate genotypes, were retained. Loci were filtered in dartR, and the final dataset included only loci with a read depth of ≥ 20, a call rate of ≥ 0.9, a minor allele frequency of ≥ 0.05, and a reproducibility rate of ≥ 0.99.
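The four locus-level thresholds can be expressed as a single predicate. This Python sketch uses a hypothetical `Locus` record holding 0/1/2 genotypes (alternative-allele counts, `None` for missing) plus per-locus read depth and reproducibility, which DArT supplies as locus metadata. It is not dartR's implementation: it evaluates every threshold on the same unfiltered data, whereas applying filters one after another can change call rates and allele frequencies between steps.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Locus:
    name: str
    genotypes: List[Optional[int]]  # alt-allele counts 0/1/2; None = missing call
    read_depth: float               # mean read depth (hypothetical field name)
    reproducibility: float          # proportion of replicate assays that agree

def call_rate(locus: Locus) -> float:
    """Proportion of samples with a non-missing genotype."""
    called = sum(g is not None for g in locus.genotypes)
    return called / len(locus.genotypes)

def minor_allele_freq(locus: Locus) -> float:
    """Frequency of the rarer allele among non-missing genotypes."""
    called = [g for g in locus.genotypes if g is not None]
    if not called:
        return 0.0
    p_alt = sum(called) / (2 * len(called))
    return min(p_alt, 1.0 - p_alt)

def passes_filters(locus: Locus, min_depth: float = 20, min_call_rate: float = 0.9,
                   min_maf: float = 0.05, min_reproducibility: float = 0.99) -> bool:
    """Apply the four thresholds stated in the text to a single locus."""
    return (locus.read_depth >= min_depth
            and call_rate(locus) >= min_call_rate
            and minor_allele_freq(locus) >= min_maf
            and locus.reproducibility >= min_reproducibility)
```

Given a list of loci, the retained set is then `[loc for loc in loci if passes_filters(loc)]`.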
To analyse population structure, we calculated pairwise FST in dartR (Gruber et al. 2018) and conducted a Bayesian cluster analysis in ParallelStructure 2.3.4 on the CIPRES portal (Miller et al. 2015) to identify the number of genetic clusters in the dataset. We used the admixture model of ancestry with correlated allele frequencies, a burn-in of 80,000 iterations followed by 120,000 MCMC repetitions, and five independent runs for each value of K from 1 to 10. To estimate the number of genetic clusters, we used the Puechmaille method (Puechmaille 2016) in StructureSelector (Li & Liu 2018), which calculates the MedMeaK, MaxMeaK, MedMedK, and MaxMedK estimators. StructureSelector also reports ΔK from the Evanno method (Evanno et al. 2005), but we discarded these results because ΔK is strongly affected by uneven sample sizes and should not be used to infer the number of subpopulations when sampling is uneven (Puechmaille 2016). In contrast, the four estimators of the Puechmaille method are much less sensitive to uneven sampling and consistently outperformed other methods for estimating the number of genetic clusters across all simulations (Puechmaille 2016).
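The four Puechmaille estimators differ in how membership is summarised within each sampling location (mean or median of the individual membership coefficients, Q) and in how the resulting cluster counts are combined across replicate runs (median or maximum). The sketch below computes them for a single value of K. The input layout (sampling location mapped to a list of Q vectors) and the membership threshold of 0.5 are assumptions for illustration, since the threshold used here is not stated, and StructureSelector's own implementation may differ in detail.

```python
from statistics import mean, median
from typing import Callable, Dict, List

# One STRUCTURE run at a fixed K: sampling location -> list of individuals'
# membership vectors, each of length K. Hypothetical layout for illustration.
Run = Dict[str, List[List[float]]]

def clusters_assigned(run: Run, summarise: Callable[[List[float]], float],
                      threshold: float = 0.5) -> int:
    """Count clusters whose summarised membership exceeds `threshold`
    in at least one sampling location."""
    assigned = set()
    for q_vectors in run.values():
        k = len(q_vectors[0])
        for cluster in range(k):
            if summarise([q[cluster] for q in q_vectors]) > threshold:
                assigned.add(cluster)
    return len(assigned)

def puechmaille_estimators(runs: List[Run], threshold: float = 0.5) -> Dict[str, float]:
    """MedMeaK, MaxMeaK, MedMedK and MaxMedK across replicate runs at one K."""
    by_mean = [clusters_assigned(r, mean, threshold) for r in runs]
    by_median = [clusters_assigned(r, median, threshold) for r in runs]
    return {
        "MedMeaK": median(by_mean),    # median over runs, mean within locations
        "MaxMeaK": max(by_mean),       # maximum over runs, mean within locations
        "MedMedK": median(by_median),  # median over runs, median within locations
        "MaxMedK": max(by_median),     # maximum over runs, median within locations
    }
```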
To estimate genetic diversity and infer the mating system, we calculated observed heterozygosity (Ho), expected heterozygosity under Hardy-Weinberg equilibrium (He), and the inbreeding coefficient (F) in GenAlEx 6.51b (Peakall & Smouse 2012).
To detect hybrids, we compared the genotypes of a subset of E. melliodora trees with those of other trees at the study site. This analysis included 61 E. melliodora samples selected randomly from all populations in the main dataset except recruits (i.e. natural stands, paddock trees, and planted trees), together with 42 samples of E. albens and 11 samples from a range of other species, including E. crebra and E. blakelyi (labelled “other spp”). Loci were filtered as described above, and a genetic cluster analysis was conducted in Structure using the same parameters as above.
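For bi-allelic SNPs, the per-locus diversity statistics are Ho = the proportion of heterozygous individuals, He = 1 − (p² + q²) = 2pq, and F = (He − Ho)/He = 1 − Ho/He, which are, to our understanding, the definitions GenAlEx uses before averaging over loci. A minimal Python sketch, assuming genotypes coded as 0/1/2 counts of the alternative allele with `None` for missing calls (the function names are hypothetical):

```python
from typing import Dict, List, Optional

def locus_stats(genotypes: List[Optional[int]]) -> Optional[Dict[str, Optional[float]]]:
    """Ho, He and F for one bi-allelic SNP in one population.

    Returns None when no genotype was called at the locus.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return None
    p = sum(called) / (2 * n)               # alternative-allele frequency
    he = 1.0 - (p ** 2 + (1.0 - p) ** 2)    # expected heterozygosity, 2pq
    ho = sum(g == 1 for g in called) / n    # observed heterozygosity
    f = 1.0 - ho / he if he > 0 else None   # undefined at monomorphic loci
    return {"Ho": ho, "He": he, "F": f}

def population_summary(loci: List[List[Optional[int]]]) -> Dict[str, float]:
    """Average Ho and He over loci, and F over loci where it is defined."""
    stats = [s for s in (locus_stats(g) for g in loci) if s is not None]
    f_values = [s["F"] for s in stats if s["F"] is not None]
    return {
        "Ho": sum(s["Ho"] for s in stats) / len(stats),
        "He": sum(s["He"] for s in stats) / len(stats),
        "F": sum(f_values) / len(f_values) if f_values else float("nan"),
    }
```

A positive mean F indicates fewer heterozygotes than expected under Hardy-Weinberg equilibrium (for example, from inbreeding or selfing), and a negative value indicates an excess of heterozygotes.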