Macrogenetic alignment in ecological strategies better interprets assembly processes than pre-determined functional groupings
Data files
Apr 15, 2026 version files (13.77 MB)
- Rainforest_Landscape_genetics_data.zip (13.76 MB)
- README.md (8.02 KB)
Apr 22, 2026 version files (27.37 MB)
- Rainforest_Landscape_genetics_data.zip (27.36 MB)
- README.md (8.78 KB)
Abstract
Understanding how species assemble across landscapes requires integration of data representing evolutionary, ecological, and biogeographic processes. We developed a comparative macrogenetic framework, applying it across 22 co-distributed rainforest trees, to identify replicated landscape-level genetic signatures. Diversity-migration analyses and genogeographic clustering identified shared spatial dynamics in relation to refugial areas and genetic turnover, but not simple functional trait combinations. Three broad patterns emerged: Higher Northern Diversity with southward migration, Higher Southern Diversity with northward migration, and Homogeneous Diversity with no directional migration. We also identified five (post hoc) species groups sharing gene flow and isolation-by-distance dynamics in relation to recognised biogeographic barriers. Replicated genetic signatures highlight how assembly processes emerge from interacting ecological and historical filters rather than single traits or biogeographic histories. We present a statistically replicable interpretational framework to identify shared evolutionary and ecological dynamics, offering scalable and management-relevant tools to support restoration planning and biodiversity conservation under environmental change across all types of vegetation.
Access this dataset on Dryad: https://doi.org/10.5061/dryad.fxpnvx131
Genomic and climate data and associated scripts for a comparative landscape genetic study of 22 rainforest species. Genomic analyses were conducted on DArTseq data and include estimates of genetic diversity, migration patterns and genogeographic clustering, used to identify genetic patterns replicated across landscapes and species. Climate modelling analyses model suitable habitat across three time periods: the LGM (Last Glacial Maximum), the present, and the future (2090, under climate scenarios SSP245 and SSP585). The results were used to validate landscape histories.
Description of the data and file structure
There are four main folders in this repository compressed into one .zip file: Rainforest_Landscape_genetics_data.zip.
This archive contains (1) R scripts used for analyses, (2) GIS-related raster files and shapefiles used for climate modelling and figure visualisation, (3) data tables used for the analyses in this study, and (4) filtered dms files for each species used for initial data generation.
R Scripts
R scripts were used to run and visualise the genetic analyses described in the study. The scripts are packaged into separate .R files, and each file title describes the purpose of the script.
The “R scripts” folder provides all scripts used in this study. R is required to read the data tables and run the analyses with the scripts provided. The scripts were created using R version 4.3.1. Annotations are provided where possible.
Each script has an introductory section outlining its purpose and the relevant input and output files.
There is a total of 9 R scripts provided:
1. read dms and run divmigrate analyses across all species
2. Read diversity metrics across species and generate a combined table
3. Compare and visualise FST and IBD patterns across species
4. Analyse divMigrate directionality and cluster groupings based on migration strength decay across species
5. Regional AMOVA and exp het summaries and plots for species groups
6. Genogeographic clustering and visualisation of normalised exp het groupings
7. Build grouped GIS maps across climate scenarios
8. Combine map panels for Figure 1
9. Combined plots for main text and supplementary figs
GIS Related Data
Species distribution model (SDM) data were generated for the 22 rainforest species. Analyses were conducted in RStudio. LGM refers to “Last Glacial Maximum”.
The “GIS” folder contains SDM raster files and shapefiles used for visualisation and analyses in the associated R scripts. The shapefiles and SDM rasters (TIFF files) in this folder can be opened and used in any GIS software (e.g. QGIS) and in R or Python. A shapefile consists of multiple files beyond the .shp (specifically, .cpg, .dbf, .prj, .sbn, and .sbx). The user only interacts directly with the .shp file, but the other files need to be in the same directory.
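As a quick check before loading a shapefile, the sketch below reports which companion files are missing next to a given .shp. It is a minimal Python illustration rather than part of the provided scripts: the function name `missing_sidecars` is hypothetical, and the extension list simply mirrors the one given above.

```python
from pathlib import Path

# Companion extensions as listed in this README. Whether every shapefile
# in the archive ships with all of them is an assumption.
SIDECAR_EXTENSIONS = (".cpg", ".dbf", ".prj", ".sbn", ".sbx")


def missing_sidecars(shp_path, extensions=SIDECAR_EXTENSIONS):
    """Return the companion extensions not found next to the .shp file."""
    shp = Path(shp_path)
    return [ext for ext in extensions if not shp.with_suffix(ext).exists()]
```

An empty list means every listed companion file sits in the same directory as the .shp.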
Within the “GIS” folder, a subfolder labelled “australia_polygon” contains polygons of eastern Australian states. These are used to help visualise outputs in the R script “7. Build grouped GIS maps across climate scenarios” (see R Scripts folder) and to produce the final output shown in Fig. 4 of the main text. The “GIS” folder also contains TIFF raster files representing the SDM outputs for each species. These files are used in the same R script and in producing Fig. 4.
These TIFF files follow the naming convention below:
_ _eastOZ_stable_areas_LGM_Current_equalSS.TIF
This file contains SDM outputs representing climatically stable areas from LGM to current conditions (intersection of LGM and current)
_ _mean_Current_eastOZ_threshold_equalSS.TIF
This file contains SDM outputs representing mean current conditions
_ _mean_LGM_eastOZ_threshold_equalSS.TIF
This file contains SDM outputs representing mean LGM conditions
_ _ssp245_2090_grandmean_eastOZ_threshold_equalSS.TIF
This file contains SDM outputs representing modelled future projections for the year 2090 using an SSP245 scenario (moderate mitigation)
_ _ssp585_2090_grandmean_eastOZ_threshold_equalSS.TIF
This file contains SDM outputs representing modelled future projections for the year 2090 using an SSP585 scenario (high emissions)
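When working through many rasters, the naming convention can be used to label each file by its trailing name component. The following is a minimal Python sketch, not taken from the provided scripts: the function name `describe_sdm_raster` and the short descriptions are illustrative, and the species prefix of each file name is deliberately left unparsed.

```python
# File-name endings copied from the naming convention in this README,
# mapped to short illustrative descriptions.
SDM_SUFFIXES = {
    "eastOZ_stable_areas_LGM_Current_equalSS": "climatically stable areas, LGM to current",
    "mean_Current_eastOZ_threshold_equalSS": "mean current conditions",
    "mean_LGM_eastOZ_threshold_equalSS": "mean LGM conditions",
    "ssp245_2090_grandmean_eastOZ_threshold_equalSS": "2090 projection, SSP245",
    "ssp585_2090_grandmean_eastOZ_threshold_equalSS": "2090 projection, SSP585",
}


def describe_sdm_raster(filename):
    """Return a description for an SDM raster name, or None if unrecognised."""
    stem, dot, ext = filename.rpartition(".")
    # Accept .TIF/.tif/.tiff; anything else is not treated as an SDM raster.
    if not dot or ext.lower() not in ("tif", "tiff"):
        return None
    for suffix, description in SDM_SUFFIXES.items():
        if stem.endswith(suffix):
            return description
    return None
```

Matching on the end of the name keeps the sketch independent of how the species prefix is written.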
Data Tables
The “Data tables used for analyses” folder provides all necessary summary data tables for downstream analyses used in the associated R Scripts. This includes:
SpeciesSummaryTable.csv
This table contains all filtering parameters and metadata columns (site/species columns) applied to each genotype matrix for each species. This table also outlines the number of filtered SNPs, number of samples and number of sites within each genotype matrix used for each species. The following columns and associated descriptions contain metrics used for downstream analyses in this study:
“Species” - full species name
“Species_Short” - short code version of each species used in R scripts
“spcol” - the metadata species column used for each genotype matrix
“sitecol” - the metadata site column used for each genotype matrix
“MAF” - the minor allele frequency applied to each genotype matrix
“Locus_Missing” - remove loci with missingness higher than this value
“sample_Missing” - remove samples with missingness higher than this value
“Filtered_SNPS” - the final number of filtered SNPs used for each genotype matrix
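To illustrate how thresholds of this kind act on a genotype matrix, here is a minimal Python sketch. It is not the filtering code used in the study. It assumes a samples-by-loci matrix of 0/1/2 alternate-allele counts with `None` for missing calls, an order of operations (loci, then samples, then MAF) chosen only for illustration, and that loci with MAF below the threshold are removed. All function names are hypothetical.

```python
def missing_fraction(values):
    """Fraction of entries that are None (missing); 0.0 for an empty list."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def minor_allele_frequency(calls):
    """MAF for one locus from 0/1/2 allele counts, ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def filter_genotypes(matrix, maf, locus_missing, sample_missing):
    """Return (kept_sample_indices, kept_locus_indices) after filtering."""
    n_loci = len(matrix[0])
    # 1) Remove loci with missingness higher than locus_missing.
    loci = [j for j in range(n_loci)
            if missing_fraction([row[j] for row in matrix]) <= locus_missing]
    # 2) Remove samples with missingness (over kept loci) higher than sample_missing.
    samples = [i for i, row in enumerate(matrix)
               if missing_fraction([row[j] for j in loci]) <= sample_missing]
    # 3) Remove loci whose MAF (over kept samples) is below maf (assumed rule).
    loci = [j for j in loci
            if minor_allele_frequency([matrix[i][j] for i in samples]) >= maf]
    return samples, loci
```

The number of kept loci and samples returned here corresponds in spirit to the SNP and sample counts reported in the table, under the stated assumptions.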
combined_species_all_divmigrate_values_noBoots_ALLSp.csv
This table contains all pairwise divMigrate data generated for each species (using R script “1. read dms and run divmigrate analyses across all species”). The following columns and associated descriptions contain metrics used for downstream analyses in this study:
“from” - the source site of the divmigrate pairwise comparison
“to” - the destination (sink) site of the divmigrate pairwise comparison
“d” - Jost’s D DivMigrate metric used for analyses
“lat_to” - average latitude of the destination site of the divmigrate pairwise comparison
“long_to” - average longitude of the destination site of the divmigrate pairwise comparison
“lat_from” - average latitude of the source site of the divmigrate pairwise comparison
“long_from” - average longitude of the source site of the divmigrate pairwise comparison
“species” - target species
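The latitude columns make it possible to label each pairwise comparison as northward or southward. The sketch below is a minimal Python illustration of that idea and is not the directionality analysis implemented in the R scripts: summing “d” per direction is an assumption made for demonstration, and the function names are hypothetical.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read the divMigrate table into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def directional_totals(rows):
    """Sum d per species, split by the latitudinal direction of each pair.

    A pair is northward when lat_to > lat_from; latitude increases towards
    the north in both hemispheres, so this also holds for negative values.
    """
    totals = defaultdict(lambda: {"northward": 0.0, "southward": 0.0})
    for row in rows:
        lat_from, lat_to = float(row["lat_from"]), float(row["lat_to"])
        if lat_to == lat_from:
            continue  # no latitudinal direction to assign
        direction = "northward" if lat_to > lat_from else "southward"
        totals[row["species"]][direction] += float(row["d"])
    return dict(totals)
```

Calling `directional_totals(load_rows(...))` on the table would give one pair of totals per species, which can then be compared however suits the question at hand.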
Divtab.csv
This table contains all relevant diversity metrics generated per species for this study (generated using script “2. Read diversity metrics across species and generate a combined table”). The following columns and associated descriptions contain metrics used for downstream analyses in this study:
“species” - target species (short code version)
“NSWRFGrouping” - target site
“exp_het” - expected heterozygosity metric obtained from species specific analysis (using the diveRsity R package)
“n” - number of samples per site
“lat” - average latitude of target site
“long” - average longitude of target site
“exp_het_norm” - normalised expected heterozygosity (He/max(He) within each species)
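The normalisation He/max(He) within each species can be reproduced from the “species” and “exp_het” columns. The following is a minimal Python sketch using only the standard library. It is not the code from the R scripts, and the function name `normalise_exp_het` is hypothetical.

```python
from collections import defaultdict


def normalise_exp_het(rows):
    """Return rows with exp_het_norm = exp_het / max(exp_het) per species."""
    # First pass: find the maximum expected heterozygosity for each species.
    max_by_species = defaultdict(float)
    for row in rows:
        he = float(row["exp_het"])
        max_by_species[row["species"]] = max(max_by_species[row["species"]], he)
    # Second pass: divide each site's value by its species maximum.
    out = []
    for row in rows:
        he_max = max_by_species[row["species"]]
        norm = float(row["exp_het"]) / he_max if he_max > 0 else float("nan")
        out.append({**row, "exp_het_norm": norm})
    return out
```

By construction, the site with the highest expected heterozygosity in each species receives a value of 1, which puts species with different absolute diversity on a common scale.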
RRSpeciesParams_271023.csv
This table lists the species included in the species distribution modelling and indicates the broad pattern or grouping each species belongs to based on the spatial genetic data. The columns and associated descriptions are:
“species” - the abbreviated species name
“fullsp” - full species name
“statusRgrp” - broad patterns based on the genetic analyses (genetic diversity and migration)
“status” - species groupings based on shared gene flow and IBD patterns
Filtered dms files
This folder contains all relevant filtered dms files (genotype matrix) for each species used for initial data generation.
These files were generated using the workflow outlined in the paper, and are used as input files for the R script “1. read dms and run divmigrate analyses across all species” (see R Scripts folder). Each file is named according to the following convention: _ _dms.Rdata
These .Rdata files can be opened and viewed in R or RStudio using the readRDS() function. Each file contains a list of two objects: 1) the filtered genotype matrix for the species, and 2) the associated metadata table for the species.
Changes after Apr 15, 2026:
Added a folder called "Filtered dms files" with filtered genotype matrices for each species as requested by the editor.
The README has been amended accordingly to describe these dms files and how they were generated. Instructions for reading the dms files into RStudio are also provided in the README.
