Data and code from: The Great Dividing Range as a driver of genetic divergence in a low-dispersing dragonfly
Data files
Jun 02, 2026 version files 3.54 GB
-
README.md
44.18 KB
-
Supplemental_Information.zip
3.54 GB
Abstract
Across complex landscapes, genetic structure can arise through a combination of geographic, ecological, and historical processes. In mountain systems, isolation by distance (IBD), environment (IBE), and resistance (IBR) represent three such mechanisms, but are often difficult to distinguish. The Great Dividing Range (GDR) of eastern Australia provides an ideal system for evaluating their relative contributions because of its varying elevation and climatic gradients. Here, we investigate the drivers of genetic structure in the Swamp Tigertail (Synthemis eustalacta) using an integrative framework combining population genomics, demographic analysis, morphology, and ecological data. Results showed overall low genetic differentiation and no evidence for discrete population structure. Linear mixed‐effect models provided no support for IBD but instead showed IBE and IBR, in which any observed genetic differentiation was correlated with environmental gradients, particularly precipitation and temperature extremes, vegetative cover, and landscape resistance. Morphological analyses revealed minimal variation among sampled sites, but several traits were sexually dimorphic, suggesting potential sex‐biased dispersal. Demographic reconstructions and paleo‐niche models suggest long‐term population stability, with shifts in suitable habitat following during past climatic oscillations. Together, these results highlight ecological and topographic isolation as key mechanisms shaping genetic variation in S. eustalacta and that montane dragonfly populations may maintain gene flow through climate cycles by tracking suitable conditions across elevation rather than persisting in long-isolated refugia.
We have submitted:
Genomic data files used for population genomic analyses, including raw and trimmed FASTQ files, BAM alignments and index files, VCF files, pseudogenome reference files, locus alignments, PLINK/ADMIXTURE input and output files, sample filtering files, and shell scripts used for dataset generation and filtering (Genomic_Data).
Environmental and spatial data used for linear modeling, including climatic variables, elevation rasters, vegetation rasters, resistance rasters, drainage basin and river shapefiles, lookup tables, figures, and R scripts used to test relationships between genomic, morphological, geographic, and environmental variables (Linear_Models).
Morphological analysis files, including wing measurement data, wing photographs, figures, and R scripts used to quantify and analyze morphological variation among specimens and populations (Morphology_Analysis).
Paleoclimate ecological niche modeling files, including current CHELSA climate data, Last Glacial Maximum, Last Interglacial, and Mid-Holocene bioclimatic raster layers, model prediction rasters, response curves, and R scripts used to evaluate historical and contemporary habitat suitability (Paleo_Niche_Model).
Principal component analysis and genetic differentiation files, including R scripts and figures used to evaluate population structure and FST-based genetic differentiation among sampling localities (PCA_Fst_Analyses).
Sampling and locality metadata, including locality information and population assignment files used across genomic, morphological, and environmental analyses (Spreadsheets).
Stairway Plot files used to infer historical changes in effective population size, including folded site-frequency spectrum input files, blueprint files, replicate output files, final demographic output files, figures, and program documentation (Stairway_Plot).
Tajima’s D analysis files, including VCF inputs, sliding-window outputs, site-level summaries, bootstrap outputs, statistical test results, figures, and scripts used to evaluate deviations from neutral expectations among populations (Tajima_D_Analysis).
Data repository description, folder summaries, and file organization information for this Dryad submission (READ_ME.md).
A zipped file of the data described above (Supplemental_Information.zip)
Data Formats
Spreadsheet and metadata files:
- .csv: Comma-separated files used for locality metadata, population assignments, wing measurements, environmental lookup tables, and summary files generated during downstream analyses (Locality_Information.csv, Wing_Measurements.csv, basin_name_lookup.csv, river_name_lookup.csv, vegedata_key.csv).
- .tsv: Tab-separated files used for pseudogenome or locus summary reports (pseudogenome_92loci_min10pct_tiesN.report.tsv).
Text files:
- .txt: Plain-text files containing sample lists, population files, filtering records, ADMIXTURE cross-validation summaries, BAM/FASTQ lists, and program output summaries (Population_File.txt, bad_samples.txt, bam_list.txt, fastq_list.txt, admixture_CV_summary.txt).
R and shell scripts:
- .R: R scripts used for PCA/FST analyses, linear models, morphology analyses, paleoclimate niche modeling, shapefile preparation, and site-frequency spectrum generation (PCA_Fst_Analyses.R, Linear_Models.R, Morphology_Analysis.R, Paleo_ENM.R, shapefile_making.R, sfs_code_create.R).
- .sh: Shell scripts used for consensus sequence generation, sequence length summaries, VCF filtering, and Stairway Plot execution (consensus.sh, length.sh, vcf_filter.sh, eustalacta.blueprint.sh, eustalacta.blueprint.plot.sh).
Cluster job and output files:
- .job: SLURM job scripts submitted to a computing cluster for batch processing (sort.job).
- .out: Standard output files generated by SLURM jobs run on a computing cluster (slurm-9586985_1.out).
Sequence, alignment, and reference genome files:
- .fas/.fasta: FASTA-format sequence files, including locus alignments and pseudogenome reference files used for genomic analyses (L6.fas, L18.fas, pseudogenome_92loci_min10pct_tiesN.fasta).
- .fai: FASTA index files generated for reference sequence files (pseudogenome_92loci_min10pct_tiesN.fasta.fai).
- .amb/.ann/.bwt/.pac/.sa: BWA index files generated for the pseudogenome reference and used for read mapping (pseudogenome_92loci_min10pct_tiesN.fasta.amb, .ann, .bwt, .pac, .sa).
Read and alignment files:
- .fastq.gz: Compressed raw and trimmed sequencing read files used for genomic analyses (P001_WG12_R1.fastq.gz, P001_WG12_R1_trimmed.fastq.gz).
- .bam: Binary alignment files containing mapped sequencing reads (P001_WG12.bam, P025_WA01.bam).
- .bai: BAM index files used for accessing mapped read alignments (P001_WG12.bam.bai).
- .flagstat.txt/.idxstats.txt: Alignment summary statistics generated from BAM files (P001_WG12.flagstat.txt, P001_WG12.idxstats.txt).
Variant and population genomic files:
- .vcf/.vcf.gz: Variant Call Format files containing SNP datasets used for filtering, PCA, FST, ADMIXTURE, Tajima’s D, and demographic analyses (eustalacta_filter.vcf, eustalacta_raw.vcf.gz, eustalacta_tajima.vcf.gz).
- .csi: Index files associated with compressed VCF files (eustalacta_filter.vcf.gz.csi).
- .bed/.bim/.fam: PLINK binary genotype files used for PCA, ADMIXTURE, and population genomic analyses (admixture_input.bed, admixture_input.bim, admixture_input.fam).
- .P/.Q: ADMIXTURE output files containing inferred allele frequencies and individual ancestry proportions across tested K values (admixture_input.fixed.2.P, admixture_input.fixed.2.Q).
- .prune.in/.prune.out: PLINK linkage-disequilibrium pruning output files listing retained and removed variants (eustalacta_ld.prune.in, eustalacta_ld.prune.out).
- .het/.imiss/.lmiss/.nosex: PLINK quality-control output files summarizing heterozygosity, individual missingness, locus missingness, and sample sex-status information where applicable (qc_raw.het, qc_raw.imiss, qc_raw.lmiss, qc_raw.nosex).
Spatial, raster, and ecological niche modeling files:
- .tif: Raster files used for climatic variables, elevation, vegetation, resistance surfaces, drainage basins, and ecological niche model predictions (bio_1.tif, elev_mod.tif, resistance_raster.tif, modern_prediction_cloglog.tif).
- .shp/.shx/.dbf/.prj/.cpg/.qmd: Shapefile components used for drainage basin and river spatial layers (Drainage_Basins.shp, Rivers.shp).
Image, figure, and report files:
- .HEIC: Wing photograph files used for morphological measurement and analysis (IMG_6073.HEIC, IMG_6075.HEIC).
- .svg: Vector graphics generated from ecological niche model response curves (bio_4_response.svg, bio_6_response.svg).
- .html/.json: Quality-control report files generated during read processing (P001_WG12.html, P001_WG12.json).
- .pdf/.docx: Documentation files associated with Stairway Plot analyses (READMEv2.1.pdf, READMEv2.1.docx).
Stairway Plot files:
- .blueprint: Stairway Plot blueprint file specifying demographic model parameters and input settings (eustalacta.blueprint).
- .addTheta: Stairway Plot output files containing theta-adjusted replicate results from demographic analyses (S_eustalacta_ind_filt_folded-1.22_0.67.addTheta).
- Files with no extension: Stairway Plot folded site-frequency spectrum input and replicate output files (S_eustalacta_ind_filt_folded, S_eustalacta_ind_filt_folded-1).
Log files:
- .log: Log files generated by genomic filtering, PLINK, ADMIXTURE, and related command-line analyses (admixture_input.log, admixture_K1.log, eustalacta_ld.log, qc_raw.log).
Additional repository files:
- .md: Markdown files used to describe the repository structure and contents (READ_ME.md).
Recommended Software for Data Analysis
Analyses in Genomic_Data require standard command-line bioinformatics software for read processing, read mapping, variant filtering, and population genomic file conversion. Recommended software includes SAMtools/BCFtools, BWA, VCFtools, PLINK, ADMIXTURE, and a Unix/Linux shell environment for running included shell scripts.
Analyses in PCA_Fst_Analyses require R and RStudio, along with R packages for population genomic summaries, ordination, data manipulation, and figure generation. These analyses use VCF- and PLINK-derived files generated from the genomic dataset.
Analyses in Tajima_D_Analysis require VCFtools or equivalent software for calculating windowed Tajima’s D from VCF files, along with R and RStudio for summarizing window-level estimates, conducting bootstrap analyses, running statistical tests, and generating figures.
Analyses in Stairway_Plot require Stairway Plot v2.1.3 and Java. Included files contain folded site-frequency spectrum inputs, blueprint files, shell scripts, replicate outputs, final demographic outputs, and figures.
Analyses in Linear_Models require R and RStudio, along with R packages for linear modeling, raster processing, vector spatial data processing, environmental data extraction, and figure generation. These analyses use climatic raster layers, elevation rasters, vegetation rasters, resistance surfaces, drainage basin shapefiles, and river shapefiles.
Analyses in Morphology_Analysis require R and RStudio for analyzing wing measurement data and generating figures. Image-viewing or measurement software may also be used to inspect the included wing photographs such as Fiji (https://imagej.net/software/fiji/downloads).
Analyses in Paleo_Niche_Model require R and RStudio, along with R packages for ecological niche modeling, raster processing, spatial data manipulation, and figure generation. These analyses use current CHELSA climate layers, paleoclimate bioclimatic rasters, ecological niche model prediction rasters, and response-curve outputs.
Spatial raster and shapefile data in Linear_Models and Paleo_Niche_Model can also be viewed and inspected using GIS software such as QGIS or ArcGIS.
Usage, Compatibility, and Accessibility
This dataset encompasses a wide breadth of file formats due to the span of analyses conducted within this manuscript, including population genomic, morphological, demographic, ecological niche modeling, and landscape/environmental analyses.
The majority of tabular files, scripts, logs, and summary outputs can be viewed using standard text-editing software such as BBEdit (https://www.barebones.com/products/bbedit/index.html). Raster and shapefile data can be viewed using GIS software such as QGIS or ArcGIS. Figures can be viewed using standard image or vector-graphics software. All programs, code, and data used in this repository are publicly available and free to use unless otherwise noted by the software developers.
Researchers are encouraged to use the original data formats, repeat analyses using our data to evaluate reliability and reproducibility, and email the corresponding author with any additional questions or points of confusion.
We are committed to providing support, clarity, and guidance to facilitate the effective use of these data, with the goal of supporting future analyses using this dataset and related phylogeographic, population genomic, and ecological datasets.
Descriptions
Spreadsheets
Locality_Information.csv
Specimen locality and metadata file used to organize sampling information for Synthemis eustalacta specimens included in downstream genomic, morphological, and environmental analyses.
- QUBIT: DNA concentration estimate for the specimen extraction.
- GEODE CODE: GEODE specimen code, including taxonomic information associated with the sequenced specimen.
- Field_code: Field or collection code assigned to the specimen.
- Order: Taxonomic rank of the specimen at the ordinal level.
- FAMILY: Taxonomic rank of the specimen at the family level.
- GENUS: Taxonomic rank of the specimen at the genus level.
- SPECIES: Taxonomic rank of the specimen at the species level.
- Scientific name: Full scientific name of the specimen.
- date: Collection date as recorded from specimen label data.
- Year: Calendar year of specimen collection.
- latin date: Collection date formatted using roman numeral month notation.
- Sex: Sex of the specimen, when available.
- Age: Life stage or age class of the specimen.
- Preservation: Preservation method of the specimen.
- Country: Country where the specimen was collected.
- State: State or territory where the specimen was collected.
- Park: Park, reserve, or broader named collection area where the specimen was collected.
- Locality: Descriptive locality where the specimen was collected.
- Elevation_ft: Elevation of the collection locality in feet.
- Elevation_m: Elevation of the collection locality in meters.
- Longitude: Longitude coordinate of the collection locality.
- Latitude: Latitude coordinate of the collection locality.
Population_File.txt
Population assignment and sample metadata file used to link individual sequencing samples to BAM files, locality coordinates, population assignments, and GEODE specimen identifiers for population genomic analyses.
- Sample_ID: Sample identifier used for sequencing and downstream genomic analyses.
- BAM_File: BAM alignment file associated with each sample.
- Row_in_Q_File: Plate row assignment for the individual during AHE sequencing.
- Family: Taxonomic rank of the specimen at the family level.
- Genus: Taxonomic rank of the specimen at the genus level.
- Species: Taxonomic rank of the specimen at the species level.
- CODE: Full specimen code containing specimen identifier and taxonomic information.
- putative_species: Putative species assignment used in downstream analyses.
- pop: Population or sampling locality assignment for each specimen.
- Long: Longitude coordinate for the specimen or sampling locality.
- Lat: Latitude coordinate for the specimen or sampling locality.
- GEODE_Code: GEODE specimen identifier associated with each sample.
Genomic_Data
92_loci_genome
Folder containing aligned AHE loci and files used to generate a pseudogenome reference from 92 anchored hybrid enrichment loci. The aligned AHE loci follow approaches used in Goodman et al. 2023, 2026.
- consensus.sh: Shell script containing an embedded Python script used to generate a pseudogenome consensus sequence from the aligned AHE locus files. The script reads all
L*.fasalignments, filters columns based on the minimum fraction of taxa with real nucleotide bases, assigns consensus bases by majority rule, assignsNto tied sites, and writes a pseudogenome FASTA file and summary report. - L.fas*: FASTA-format aligned AHE locus files used as input for pseudogenome construction. Each file represents one aligned locus. Example: L6.fas.
- length.sh: Shell script used to summarize or evaluate sequence/locus lengths during pseudogenome preparation.
- pseudogenome_92loci_min10pct_tiesN.fasta: Consensus pseudogenome FASTA file generated from the 92 aligned AHE loci. Columns were retained based on a minimum taxon-presence threshold, and tied consensus sites were coded as
N. - pseudogenome_92loci_min10pct_tiesN.fasta.amb: BWA index auxiliary file storing information about ambiguous bases in the pseudogenome reference sequence.
- pseudogenome_92loci_min10pct_tiesN.fasta.ann: BWA index annotation file storing reference sequence names, lengths, and related indexing information for the pseudogenome.
- pseudogenome_92loci_min10pct_tiesN.fasta.bwt: BWA Burrows-Wheeler Transform index file used during read alignment to the pseudogenome reference.
- pseudogenome_92loci_min10pct_tiesN.fasta.fai: FASTA index file generated for the pseudogenome reference, storing contig names, sequence lengths, and file offsets for rapid sequence retrieval.
- pseudogenome_92loci_min10pct_tiesN.fasta.pac: BWA packed-sequence file containing the pseudogenome reference sequence in a compressed binary format used during alignment.
- pseudogenome_92loci_min10pct_tiesN.fasta.sa: BWA suffix-array index file used with the Burrows-Wheeler Transform index for efficient read mapping to the pseudogenome.
- pseudogenome_92loci_min10pct_tiesN.report.tsv: Tab-separated report generated by
consensus.sh, with one row per AHE locus. Columns include locus name, number of taxa represented in the alignment, minimum number of taxa required to retain an alignment column, original aligned length, number of retained columns, number of dropped columns, and final consensus sequence length.
bam_files
Folder containing BAM alignment files, BAM index files, alignment summary statistics, VCF files, SNP filtering outputs, PLINK files, ADMIXTURE outputs, and scripts used for variant calling, genotype filtering, PCA/ADMIXTURE preparation, Tajima’s D input generation, and downstream population genomic analyses.
- vcf_filter.sh: Shell script used to call and filter SNPs from BAM files mapped to the 92-locus pseudogenome reference. The script uses
bcftools mpileupandbcftools callto generate a raw VCF, performs quality-control summaries with PLINK, identifies individuals with high missingness, extracts biallelic SNPs, adds INFO tags for missingness and minor allele frequency, creates separate VCF files for Tajima’s D/Stairway Plot and PCA/ADMIXTURE analyses, performs LD pruning, converts the LD-pruned VCF to PLINK binary format, and runs ADMIXTURE for K = 1–10. - bam_list.txt: Text file listing BAM files used as input for variant calling in
vcf_filter.sh. - bad_samples.txt: Text file listing individuals removed due to high missingness during SNP filtering. In
vcf_filter.sh, individuals with missingness greater than 0.70 are written to this file and excluded from downstream filtered VCFs. - P001_WG12.bam, P025_WA01.bam, etc.: BAM alignment files containing sequencing reads mapped to the 92-locus pseudogenome reference. Only representative files are listed here; the folder contains BAM files for each sequenced individual.
- P001_WG12.bam.bai, P025_WA01.bam.bai, etc.: BAM index files associated with each BAM alignment file. These files allow rapid access to mapped reads by genomic coordinate.
- P001_WG12.flagstat.txt, P025_WA01.flagstat.txt, etc.: SAMtools flagstat summary files reporting alignment statistics for each BAM file, including total reads, mapped reads, and related mapping-quality summaries.
- P001_WG12.idxstats.txt, P025_WA01.idxstats.txt, etc.: SAMtools idxstats summary files reporting mapped and unmapped reads across pseudogenome contigs for each BAM file.
- eustalacta_raw.vcf.gz: Raw compressed VCF file generated from BAM files using
bcftools mpileupandbcftools call. - eustalacta_raw.biallelic_snps.vcf.gz: Compressed VCF containing only biallelic SNPs extracted from the raw VCF.
- eustalacta_raw.biallelic_snps.tags.vcf.gz: Biallelic SNP VCF with INFO tags added for site-level missingness and minor allele frequency.
- eustalacta_ind_filt.vcf.gz: Individual-filtered VCF generated after removing low-quality individuals listed in
bad_samples.txt. - eustalacta_ind_filt.tags.vcf.gz: Individual-filtered VCF with INFO tags recalculated after sample removal.
- eustalacta_tajima.vcf and eustalacta_tajima.vcf.gz: VCF files filtered for Tajima’s D and Stairway Plot analyses. These files were filtered using missingness and depth thresholds but do not include a minor allele frequency filter, preserving rare variants for neutrality and demographic analyses.
- eustalacta_filter.vcf and eustalacta_filter.vcf.gz: VCF files filtered for PCA and ADMIXTURE analyses. These files include a minor allele frequency filter, missingness filter, and minimum depth filter.
- eustalacta_ldpruned.vcf.gz: LD-pruned VCF file generated from
eustalacta_filter.vcf.gzand used for PCA and ADMIXTURE analyses. - .csi: Index files associated with compressed VCF files. These allow rapid querying of VCF records by genomic coordinate.
- qc_raw.imiss: PLINK individual-missingness summary file generated from the raw VCF.
- qc_raw.lmiss: PLINK locus-missingness summary file generated from the raw VCF.
- qc_raw.het: PLINK heterozygosity summary file generated from the raw VCF.
- qc_raw.log: PLINK log file generated during raw VCF quality-control analyses.
- qc_raw.nosex: PLINK file listing samples with missing or unspecified sex information.
- eustalacta_ld.prune.in: PLINK output file listing SNPs retained after LD pruning.
- eustalacta_ld.prune.out: PLINK output file listing SNPs removed during LD pruning.
- eustalacta_ld.log and eustalacta_ldpruned.log: PLINK log files generated during LD pruning and LD-pruned VCF creation.
- eustalacta_ld.nosex and eustalacta_ldpruned.nosex: PLINK files listing samples with missing or unspecified sex information during LD-pruning steps.
- admixture_input.bed, admixture_input.bim, and admixture_input.fam: PLINK binary genotype files generated from the LD-pruned VCF and used as input for ADMIXTURE.
- admixture_input.fixed.bed, admixture_input.fixed.bim, and admixture_input.fixed.fam: Modified PLINK binary genotype files prepared for ADMIXTURE. Chromosome identifiers in the BIM file were recoded numerically because ADMIXTURE requires numeric chromosome labels.
- admixture_input.fixed.1.P through admixture_input.fixed.10.P: ADMIXTURE output files containing inferred allele frequencies for each tested number of genetic clusters, K = 1–10.
- admixture_input.fixed.1.Q through admixture_input.fixed.10.Q: ADMIXTURE output files containing individual ancestry proportions for each tested number of genetic clusters, K = 1–10.
- admixture_K1.log through admixture_K10.log: ADMIXTURE log files generated for each tested K value.
- admixture_CV_summary.txt: Text file summarizing ADMIXTURE cross-validation error values across K = 1–10.
- admixture_input.log and admixture_input.nosex: PLINK log and sample sex-status files generated during preparation of ADMIXTURE input files.
- out.log: General command-line output log associated with variant filtering or downstream genomic processing.
fastq_files
Folder containing raw and trimmed paired-end FASTQ files, unpaired read files, read-processing quality-control reports, SLURM output files, and the job script used to align trimmed reads to the 92-locus pseudogenome reference.
- sort.job: SLURM array job script used to align trimmed paired-end FASTQ files to the 92-locus pseudogenome reference. The script reads sample names from
fastq_list.txt, identifies the corresponding_R1_trimmed.fastq.gzand_R2_trimmed.fastq.gzfiles, aligns reads withbwa mem, sorts alignments withsamtools sort, indexes the resulting BAM file, and generatesflagstatandidxstatsalignment summaries for each sample. - fastq_list.txt: Text file listing sample identifiers used by
sort.jobto select samples for SLURM array-based read alignment. - P001_WG12_R1.fastq.gz, P025_WA01_R1.fastq.gz, etc.: Compressed raw forward-read FASTQ files for each sequenced individual. Only representative files are listed here; the folder contains one raw R1 FASTQ file for each sample.
- P001_WG12_R2.fastq.gz, P025_WA01_R2.fastq.gz, etc.: Compressed raw reverse-read FASTQ files for each sequenced individual. Only representative files are listed here; the folder contains one raw R2 FASTQ file for each sample.
- P001_WG12_R1_trimmed.fastq.gz, P025_WA01_R1_trimmed.fastq.gz, etc.: Compressed trimmed forward-read FASTQ files used as input for read alignment to the pseudogenome reference.
- P001_WG12_R2_trimmed.fastq.gz, P025_WA01_R2_trimmed.fastq.gz, etc.: Compressed trimmed reverse-read FASTQ files used as input for read alignment to the pseudogenome reference.
- P001_WG12_unpaired_R1.fastq.gz, P025_WA01_unpaired_R1.fastq.gz, etc.: Compressed unpaired forward reads retained after trimming when the corresponding reverse read was discarded.
- P001_WG12_unpaired_R2.fastq.gz, P025_WA01_unpaired_R2.fastq.gz, etc.: Compressed unpaired reverse reads retained after trimming when the corresponding forward read was discarded.
- P001_WG12.html, P025_WA01.html, etc.: HTML quality-control reports generated during read processing for each sample.
- P001_WG12.json, P025_WA01.json, etc.: JSON-format quality-control reports generated during read processing for each sample.
- slurm-9586985_1.out through slurm-9586985_54.out: Standard output files generated by the SLURM array job during read alignment. Each file corresponds to one array task/sample processed by
sort.job.
Genomic_Analysis.R
R script used to analyze population genomic structure from an LD-pruned SNP dataset. The script imports VCF and metadata files, performs genetic PCA and SNP clustering, tests population differentiation with ANOVA and AMOVA, calculates global and pairwise FST statistics, and exports summary tables and figures.
Figures
Folder containing summary tables generated from Fst Analyses.
Linear_Models
Folder containing environmental raster layers, elevation data, resistance rasters, resistance shapefiles, figures, and R scripts used to test relationships among genetic differentiation, geography, elevation, environmental variation, and landscape resistance.
Climatic_Variables
Folder containing raster layers used as environmental predictors in linear model and environmental distance analyses.
- bio_1.tif through bio_19.tif: Bioclimatic raster layers used to characterize environmental variation among sampling localities.
- evapotranspiration.tif: Raster layer representing evapotranspiration used as an environmental predictor.
- NDVI.tif: Normalized Difference Vegetation Index raster used as a vegetation/environmental predictor.
- Perc_NonTree.tif: Raster layer representing percent non-tree cover.
- Perc_NonVeg.tif: Raster layer representing percent non-vegetated cover.
- Perc_TreeCov.tif: Raster layer representing percent tree cover.
- vegedata.tif: Vegetation raster used as an environmental predictor in landscape and environmental analyses.
Elevation
Folder containing elevation data used to calculate elevation differences among population sampling localities.
- elev_mod.tif: Elevation raster used to extract population-level elevation values and calculate pairwise elevational distances among populations.
Figures
Folder containing figures and summary outputs generated from the linear model analyses.
Resistance_Rasters
Folder containing rasterized drainage basin, river, vegetation, and resistance-surface files used to calculate least-cost distances among populations.
- basin_name_lookup.csv: Lookup table linking numeric drainage basin raster IDs to drainage basin names.
- river_name_lookup.csv: Lookup table linking numeric river raster IDs to river names and river hierarchy classes.
- vegedata_key.csv: Lookup table describing vegetation raster classes used in resistance-surface construction.
- d8-9s.tif: Raster template used for generating drainage basin and river raster layers.
- Dbasin.tif: Rasterized drainage basin layer generated from drainage basin shapefiles.
- Dbasin_outline.tif: Rasterized drainage basin boundary layer representing watershed divides or basin outlines.
- major.tif: Rasterized major river layer generated from river shapefiles.
- minor.tif: Rasterized minor river layer generated from river shapefiles.
- vegedata.tif: Vegetation raster used in resistance-surface construction.
- resistance_raster.tif: Composite resistance raster generated from major rivers, minor rivers, drainage basin boundaries, and vegetation data. This raster was used to calculate least-cost distances among population sampling localities.
Resistance_Shapefiles
Folder containing vector spatial files and an R script used to generate raster layers for landscape resistance analyses.
- shapefile_making.R: R script used to convert drainage basin and river shapefiles into raster layers. The script reads drainage basin and river shapefiles, creates lookup tables for basin and river IDs, rasterizes drainage basins, separates major and minor rivers, and generates a drainage basin outline raster for resistance analyses.
- Drainage_Basins.shp: Main shapefile geometry file containing drainage basin polygons.
- Drainage_Basins.shx: Shapefile index file associated with the drainage basin shapefile.
- Drainage_Basins.dbf: Attribute table associated with the drainage basin shapefile.
- Drainage_Basins.prj: Projection information file associated with the drainage basin shapefile.
- Drainage_Basins.cpg: Character encoding file associated with the drainage basin shapefile.
- Drainage_Basins.qmd: Metadata file associated with the drainage basin shapefile.
- Rivers.shp: Main shapefile geometry file containing river line features.
- Rivers.shx: Shapefile index file associated with the river shapefile.
- Rivers.dbf: Attribute table associated with the river shapefile.
- Rivers.prj: Projection information file associated with the river shapefile.
- Rivers.cpg: Character encoding file associated with the river shapefile.
- Rivers.qmd: Metadata file associated with the river shapefile.
- d8-9s.tif: Raster template used for spatial alignment when rasterizing drainage basin and river shapefiles.
Linear_Models.R
R script used to perform genetic distance and landscape/environmental association analyses. The script reads the LD-pruned VCF and population metadata, calculates pairwise FST among populations, computes geographic distances, extracts environmental raster values, performs environmental PCA, constructs a composite resistance surface, calculates least-cost distances, fits isolation-by-distance, isolation-by-elevation, and mixed-effects models for environmental/resistance predictors, compares models with AICc and R², and generates figures and summary tables.
PCA_Fst_Analyses
Folder containing R scripts, figures, and summary outputs used to evaluate genetic population structure and genetic differentiation among sampled populations.
PCA_Fst_Analyses.R
R script used to conduct genetic PCA, population-level PCA summaries, clustering, AMOVA-style permutation tests, and FST analyses. The script reads the LD-pruned VCF from Genomic_Data/bam_files/eustalacta_ldpruned.vcf.gz and sample metadata from Spreadsheets/Population_File.txt, converts VCF data into genetic objects for downstream analysis, calculates principal components, identifies SNP-space clusters, summarizes PC scores by population, tests population-level differences in PC axes, estimates population structure using permutation-based distance analyses, calculates global and pairwise FST, and exports FST summary tables.
Figures
Folder containing figures and output tables generated by PCA_Fst_Analyses.R.
Tajima_D_Analysis
Folder containing scripts, per-population sample lists, per-population Tajima’s D output files, log files, figures, and summary outputs used to estimate and visualize Tajima’s D across sampled populations.
tajima_D_Code.sh
Shell script used to run vcftools --TajimaD for each population. The script uses Genomic_Data/bam_files/eustalacta_tajima.vcf.gz as input, reads per-population sample lists from tajimasD/*.txt, skips populations with fewer than two samples, and calculates Tajima’s D in 10,000 bp windows for each eligible population.
Tajima_D.R
R script used to prepare per-population sample lists, read and summarize Tajima’s D output files, filter finite Tajima’s D windows, calculate population-level mean Tajima’s D, perform block bootstrapping, run binomial sign tests, extract elevation values, test associations between Tajima’s D and elevation, and generate final figures. The script reads Genomic_Data/bam_files/eustalacta_tajima.vcf.gz, Spreadsheets/Population_File.txt, and Linear_Models/Elevation/elev_mod.tif, then writes outputs to Tajima_D_Analysis/tajimasD and Tajima_D_Analysis/Figures.
tajimasD
Folder containing per-population sample lists, Tajima’s D output files, and log files generated by vcftools.
- Armidale_Dist.txt, Boyd_River.txt, Cascades_Campground.txt, etc.: Per-population sample list files used with the
vcftools --keepoption. Each file contains sample IDs assigned to that population. - Cascades_Campground.Tajima.D, Gloucester_Tops.Tajima.D, Lake_Catalina.Tajima.D, etc.: Per-population Tajima’s D output files generated by VCFtools. These files contain window-level Tajima’s D estimates calculated in 10,000 bp windows.
- Cascades_Campground.log, Gloucester_Tops.log, Lake_Catalina.log, etc.: VCFtools log files generated during per-population Tajima’s D analyses.
Figures
Folder containing figures and output summaries generated by Tajima_D.R.
Stairway_Plot
Folder containing scripts, input files, output files, program documentation, and figures used to infer historical changes in effective population size from the folded site-frequency spectrum.
sfs_code_create.R
R script used to generate a folded site-frequency spectrum from Genomic_Data/bam_files/eustalacta_tajima.vcf.gz. The script reads the VCF, extracts genotypes, calculates allele counts per site, determines the number of sampled chromosomes, folds the site-frequency spectrum because no outgroup-polarized ancestral state was used, and reports callable variant sites for diagnostic purposes.
Stairway_plot.R
R script used to read Stairway Plot summary outputs from replicate runs, combine effective population size trajectories across random seeds, and generate the final demographic history plot. The script reads .summary files from stairway_plot_v2.1.3/S_eustalacta_ind_filt_folded, plots median effective population size with 12.5 % and 87.5 % intervals, marks key temporal reference points, and exports the figure to Stairway_Plot/Figures/p_stairway.svg.
READMEv2.1.docx and READMEv2.1.pdf
Stairway Plot documentation files included with the analysis directory.
Figures
- p_stairway.svg: Final Stairway Plot figure showing inferred historical changes in effective population size through time.
stairway_plot_v2.1.3
Folder containing Stairway Plot input files, shell scripts, replicate outputs, final summaries, documentation, and executable program files.
- eustalacta.blueprint: Stairway Plot blueprint file specifying input settings and demographic model parameters for the folded site-frequency spectrum analysis.
- eustalacta.blueprint.sh: Shell script generated from the Stairway Plot blueprint and used to run the Stairway Plot analysis.
- eustalacta.blueprint.plot.sh: Shell script used to summarize and plot Stairway Plot output files.
- READMEv2.1.docx and READMEv2.1.pdf: Stairway Plot documentation files included with the program directory.
stairway_plot_v2.1.3/S_eustalacta_ind_filt_folded
Folder containing folded site-frequency spectrum inputs, replicate outputs, final output files, and summary plots for the Stairway Plot analysis.
- input/S_eustalacta_ind_filt_folded-1 through input/S_eustalacta_ind_filt_folded-100: Stairway Plot input replicate files generated from the folded site-frequency spectrum.
- final/S_eustalacta_ind_filt_folded-1.22_0.67.addTheta through final/S_eustalacta_ind_filt_folded-100.22_0.67.addTheta: Final theta-adjusted output files generated from the Stairway Plot analysis.
- rand22/S_eustalacta_ind_filt_folded-1.22_0.67.addTheta through rand22/S_eustalacta_ind_filt_folded-100.22_0.67.addTheta: Replicate theta-adjusted output files generated from the random seed 22 analysis.
- rand43/S_eustalacta_ind_filt_folded-1.43_0.67.addTheta through rand43/S_eustalacta_ind_filt_folded-100.43_0.67.addTheta: Replicate theta-adjusted output files generated from the random seed 43 analysis.
- rand65/S_eustalacta_ind_filt_folded-1.65_0.67.addTheta through rand65/S_eustalacta_ind_filt_folded-100.65_0.67.addTheta: Replicate theta-adjusted output files generated from the random seed 65 analysis.
- rand86/S_eustalacta_ind_filt_folded-1.86_0.67.addTheta through rand86/S_eustalacta_ind_filt_folded-100.86_0.67.addTheta: Replicate theta-adjusted output files generated from the random seed 86 analysis.
- S_eustalacta_ind_filt_folded.final.summary: Final Stairway Plot summary file containing median effective population size estimates and confidence intervals through time.
- S_eustalacta_ind_filt_folded.final.summary.pdf and S_eustalacta_ind_filt_folded.final.summary.png: Summary plots generated from the final Stairway Plot output.
- S_eustalacta_ind_filt_folded.rand22.summary, S_eustalacta_ind_filt_folded.rand43.summary, S_eustalacta_ind_filt_folded.rand65.summary, and S_eustalacta_ind_filt_folded.rand86.summary: Stairway Plot summary files generated from replicate analyses using different random seeds.
- S_eustalacta_ind_filt_folded.rand22.summary.pdf/.png, S_eustalacta_ind_filt_folded.rand43.summary.pdf/.png, S_eustalacta_ind_filt_folded.rand65.summary.pdf/.png, and S_eustalacta_ind_filt_folded.rand86.summary.pdf/.png: Summary plots generated for each random-seed replicate analysis.
stairway_plot_v2.1.3/stairway_plot_es
Folder containing Java class files, JAR dependencies, and license files required to run Stairway Plot.
- .class: Compiled Java class files used by Stairway Plot for demographic inference, training/testing, and output plotting.
- .jar: Java archive dependencies required by Stairway Plot.
- _license.txt: License files associated with bundled Stairway Plot dependencies.
Paleo_Niche_Model
Folder containing current and paleoclimate raster layers, ecological niche modeling scripts, response curves, and prediction rasters used to model contemporary and historical habitat suitability for Synthemis eustalacta.
Paleo_ENM.R
R script used to run the ecological niche modeling pipeline. The script downloads and cleans GBIF occurrence records for Synthemis eustalacta, removes duplicate localities, spatially thins occurrence points, loads current CHELSA climate rasters, crops environmental layers to the study region, removes problematic or highly correlated predictors, samples background points, evaluates Maxnet models with ENMeval, selects a final model, generates response curves, creates modern habitat suitability predictions, applies a 10 % training-omission threshold, and projects the selected model onto paleoclimate layers for the Last Interglacial, Last Glacial Maximum, and Mid-Holocene.
CHELSA_cur_V1_2B_r2_5m
Folder containing current CHELSA climate raster data used as environmental predictors for ecological niche modeling.
- 2_5min/readme.txt: Documentation file associated with the current CHELSA climate raster dataset.
Last_Glacial_Maximum_20ka
Folder containing paleoclimate bioclimatic raster layers for the Last Glacial Maximum, approximately 20 ka, used for hindcasting habitat suitability.
- bio_1.tif through bio_19.tif: Last Glacial Maximum bioclimatic raster layers used as paleoenvironmental predictors.
Last_Interglacial_130
Folder containing paleoclimate bioclimatic raster layers for the Last Interglacial, approximately 130 ka, used for hindcasting habitat suitability.
- bio_1.tif through bio_19.tif: Last Interglacial bioclimatic raster layers used as paleoenvironmental predictors.
Mid_Holocene
Folder containing paleoclimate bioclimatic raster layers for the Mid-Holocene used for hindcasting habitat suitability.
- bio_1.tif through bio_19.tif: Mid-Holocene bioclimatic raster layers used as paleoenvironmental predictors.
Predictions
Folder containing response curves and raster predictions generated by Paleo_ENM.R.
- bio_4_response.svg: Response curve for temperature seasonality.
- bio_6_response.svg: Response curve for minimum temperature of the coldest month.
- bio_15_response.svg: Response curve for precipitation seasonality.
- bio_18_response.svg: Response curve for precipitation of the warmest quarter.
- bio_19_response.svg: Response curve for precipitation of the coldest quarter.
- modern_prediction_cloglog.tif: Continuous modern ecological niche model prediction raster on the cloglog scale.
- modern_prediction_10pct_omission.tif: Binary modern prediction raster generated using a 10 % training-omission threshold.
- Last_Glacial_Maximum_20_ka__continuous.tif: Continuous Last Glacial Maximum habitat suitability projection.
- Last_Glacial_Maximum_20_ka__binary_10pct.tif: Binary Last Glacial Maximum projection generated using the 10 % omission threshold.
- Last_Interglacial_130_ka__continuous.tif: Continuous Last Interglacial habitat suitability projection.
- Last_Interglacial_130_ka__binary_10pct.tif: Binary Last Interglacial projection generated using the 10 % omission threshold.
- Mid_Holocene_continuous.tif: Continuous Mid-Holocene habitat suitability projection.
- Mid_Holocene_binary_10pct.tif: Binary Mid-Holocene projection generated using the 10 % omission threshold.
Morphology_Analysis
Folder containing wing measurement data, specimen wing photographs, figures, and R scripts used to quantify and analyze morphological variation among sampled specimens and populations.
Morphology_Analysis.R
R script used to analyze wing morphology data from Wing_Measurements.csv and generate morphology-related figures and statistical summaries. The script reads specimen-level wing measurements, imputes missing numeric trait values, scales wing and leg measurements by thorax length or hind femur length, performs PCA on thorax-scaled morphology traits, generates PCA plots by population and sex, tests trait differences among populations with Kruskal-Wallis tests, tests sex-based trait differences with Wilcoxon tests, runs PERMANOVA on morphology traits, and exports morphology figures to Morphology_Analysis/Figures.
Wing_Measurements.csv
Comma-separated file containing specimen-level morphological measurements used for wing and body-size analyses. The file includes wing traits, thorax length, hind femur length, sex, population assignment, and associated sample identifiers used to evaluate morphological variation among individuals, populations, and sexes.
wing_photos
Folder containing specimen wing photographs used for morphological measurement and documentation.
- P6113143.JPG, P6113144.JPG, P6133145.JPG, etc.: JPG image files of specimen wings used for measurement or visual reference. Only representative files are listed here; the folder contains all wing photographs used in the morphology analysis.
Figures
Folder containing figures and graphical outputs generated from Morphology_Analysis.R.
- wing_pca_plot.svg: PCA figure showing morphology variation among specimens grouped by population.
- sex_pca_plot.svg: PCA figure showing morphology variation among specimens grouped by sex.
- thorax_scaled_trait_violins_by_sex.svg: Violin plot comparing thorax-scaled morphological trait distributions between female and male specimens.
Taxon Sampling
We acquired specimens of S. eustalacta from both freshly caught field-collected and museum specimens, covering the species’ range across Australia. We hand caught specimens using aerial nets within New South Wales and Victoria during the summer months of January and February 2023. We placed specimens in glassine envelopes, submerged them in acetone for preservation, and transported them to the American Museum of Natural History for molecular and morphological analysis. However, we sourced most specimens for this work from natural history collections. We sampled specimens from the Florida State Collection of Arthropods (FSCA), Naturalis Biodiversity Center, and the United States National Museum of Natural History (USNM). In total, we sampled 54 specimens from 21 distinct sites; the number of individuals per site ranged from 1 – 6 (Fig. 1 in associated manuscript).
DNA Extraction and Sequencing:
We removed the hind leg from each field-collected or museum acquired specimen using sterilized forceps, and extracted DNA using ZYMOBIOMICS DNA miniprep kits (Irvine, CA). We quantified DNA yield using a Qubit 4 fluorometer, and sent our samples to RAPID Genomics (Gainesville, Florida) for library preparation and sequencing. We amplified reads using modified Anchored Hybrid Enrichment (AHE) probes first implemented in Bybee et al. (2021); the probe set consists of 1,306 loci covering approximately 500 kilobases. We acquired reads from one individual using the full 1,306 probe set (500kb) from a previous dataset (Bybee et al. 2021), while a subset of 92 loci (20kb probe set) was used for the remaining specimens (Goodman et al. 2023).
Although the number of available odonate genome assemblies is increasing (Tolman et al. 2025a, Tolman et al. 2023), no reference genome yet exists for Synthemistidae. To avoid reference bias, we constructed a locus-based pseudogenome from our AHE alignments. Because AHE loci are sparse, non-contiguous, and flanked by variable regions, mapping to a heterospecific whole-genome reference can inflate linkage, increase mismapping, and reduce variant calls (Akopyan et al. 2024, 2025, Maurstad et al. 2025).
We assembled AHE reads, assigned orthology, and generated alignments following Goodman et al. (2023) and Goodman et al. (2025), and trimmed loci from our single 500kb specimen to match the number of loci retained across all other samples. For each of the 92 target loci, we filtered multiple sequence alignments to retain only alignment columns present in at least 10 % of sampled individuals, thereby excluding poorly aligned or sparsely represented sites. At each retained position, we called a consensus base using the most frequent nucleotide across samples to preserve shared variation among individuals. In the event of a tie, we masked that basepair (N) to avoid arbitrarily assigning equally supported states and introducing artificial signal (Schneider et al. 2002). We concatenated filtered consensus sequences across loci to construct a pseudogenome reference.
We mapped sequencing reads from all individuals to this pseudogenome using bwa (Li and Durbin 2009) and sorted alignments with samtools (Danecek et al. 2021). We called variants using bcftools (Danecek et al. 2021), applying minimum read mapping and base quality thresholds of 20. We calculated mapping and variant statistics including allele frequency, depth per individual and per site, site quality, individual- and site-level missingness, heterozygosity, and inbreeding coefficients. We removed nine samples with more than 70 % missing SNPs to minimize bias from uneven sequencing coverage. Additional filtering included retaining only biallelic SNPs with a minor allele frequency ≥ 0.05, excluded sites with missing genotypes in more than 30 % of individuals, and required a minimum read depth of three per genotype. To account for the potential genetic linkage bias (Bercovich et al. 2025), we performed LD pruning in PLINK (Purcell et al. 2007) using a sliding window of 50 SNPs, advancing by 5 SNPs, and removing one SNP from any pair with R2 > 0.2.
Genomic Analysis
Population Structure
We performed subsequent statistical analyses using the program R v4.3.0 (Team 2021). To explore patterns of genetic structure, we first performed a center scaled principal component analysis (PCA) on our filtered SNP dataset using the adegenet v2.1.10 (Jombart 2008). To identify genetic clusters without apriori population assignments, we performed Discriminant Analysis of Principal Components (DAPC) clustering on PCA-transformed SNPs, selecting the optimal number of clusters based on the lowest BIC, and assigned individuals to assigned clusters based on their maximum posterior probability. To further assess evidence for discrete clusters, we visualized ancestry proportions using admixture v1.3.0 (Skotte et al. 2013), assuming 2 – 10 clusters (K). We evaluated model support with cross-validation (CV) error and chose the optimal number of ancestry proportions based on the lowest mean CV error.
Because DAPC clustering supported K = 1, we interpreted genetic variation within S. eustalacta as continuous rather than discretely structured and did not assign individuals to inferred genetic clusters. Consequently, we performed all downstream population genetic analyses using sampling sites as population units. This finding was further supported by preliminary k-means clustering using geographic distances based on Vincenty’s ellipsoid supporting K = 1.
We quantified genetic differentiation by calculating pairwise Fst values among sites using the R package hierFstat v0.5-11 (Goudet 2005) based on the Weir and Cockerham (1984). In addition to the pairwise values, we computed both the global weighted Fst (accounting for sample size and allele frequency variance) and the unweighted Fst (simple average of pairwise comparisons) to summarize overall levels of genetic divergence. We also implemented a leave-one-out approach to assess the contribution of each site to the weighted Fst. We also explored the degree of genetic differentiation among sites using Permutational Analysis of Variance (PERMOVA) using distances between PC1, PC2, and PC3 from our SNP dataset with 999 permutations.
We calculated Tajima’s D using vcftools for each site to assess the degree of population expansion and contraction. We generated a new vcf file excluding rare variant filtering (MAF) and LD-pruning. We summarized values across 10k SNP sliding windows and compared distributions geographically. To account for non-independence among adjacent windows, we quantified uncertainty using block bootstrapping (10,000 replicates) of contiguous windows and evaluated directional consistency using binomial tests. To determine whether Tajima’s D varied with elevation, we fit a linear mixed-effects model using window-level Tajima’s D values as the response variable and elevation as a fixed effect, with site identity included as a random intercept to account for repeated window-level estimates within sites (Formula: Tajima’s D ~ elevation + (1|site)). We additionally fit a linear model using mean Tajima’s D per site.
Isolation by Distance, Environment, and Resistance (IBD & IBE)
We investigated the relationship between genetic differentiation and environmental, geographic, and resistance-based distances using linear mixed-effect modeling (LMM). We first calculated pairwise genetic distances (Fst) among S. eustalacta sites, selecting sites with > 2 individuals to avoid inflation of estimates due to low sample size. We calculated geographic distances among sites (in kilometers) using geosphere v1.5-18 (Hijmans et al. 2017), based on Vincenty’s ellipsoid. Using terra v1.7-65 (Hijmans et al. 2022), we derived environmental distances using 30 arc-second resolution rasters from the CHELSA v2.1 ‘Anthropocene’ dataset (1979-2013) (Karger et al. 2017) consisting of 19 bioclimatic variables which follow Worldclim v2 (Fick and Hijmans 2017). Additionally, we acquired remotely sensed variables from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS) using MODIStsp v2.0.5 (Busetto and Ranghetti 2016). Following Goodman et al. (2022), we included percentage tree cover, non-tree cover, and non-vegetated cover (percentage tree cover subtracted from percentage non-tree cover) (MOD44B 250m), annual evapotranspiration (MOD16A3 500m), and Normalised Difference Vegetation Index (NDVI; MOD13A2 1km). We averaged All MODIS variables across the period 2000–2020 and resampled them to match the spatial resolution of CHELSA (30 arc-seconds).
We omitted bioclimatic variables 8 and 9 after visual inspection of interpolation discontinuities within New South Wales Booth (2022). We cropped the environmental data to the latitudinal and longitudinal range of all our sampled sites, and calculated correlations between variables using the ‘vifcor’ and ‘vifstep’ functions in the usdm package (Naimi 2017). We filtered out variables with correlation coefficients higher than 0.7, a standard cutoff in such analyses (Goodman et al. 2022). We extracted uncorrelated environmental variables from each site to compute pairwise Euclidian distances.
To assess ecological connectivity, we constructed a resistance surface from rivers, drainage basins, and major vegetation groups from the Australian Government National Surface Water Information portal (https://tinyurl.com/4akte9ew), the Australian Government Department of Climate Change, Energy, and the Environment and Water (DCCEEW, https://tinyurl.com/4xenh9a9)
Using the package sf v1.0-15 (Pebesma 2018), we read polygon shapefiles for drainage basins and assigned unique numeric identifiers for each basin. We extracted basin boundaries by converting polygons to line features, producing an edge-defined shapefile. We also imported line shapefiles of rivers, which we filtered by hierarchy to distinguish between major and minor river systems. We then rasterized our drainage basin, major and minor river shapefiles to the resolution of our bioclimatic variables.
We reclassified our rasters according to their hypothesized contribution to movement resistance. Pixels representing drainage basin boundaries were assigned low resistance (cells = 0.1), while all other cells were assigned high resistance (1). Major and minor rivers were similarly reclassified as low-resistance corridors (cells = 0.1, background = 1). We classified all major vegetative groups as low-resistance corridors except for cleared, non-native vegetation/buildings, naturally bare (sand, rock, claypan, mudflat), seas and estuaries, and no data. We combined these layers into a single composite resistance surface by averaging the four reclassified rasters. The resulting raster represented cumulative resistance to movement across Australia. We converted our resistance raster into a transition matrix using gdistance v1.6-4 (van Etten 2017) with geo-correction applied. Finally, we calculated pairwise least-cost path distances between all sites in kilometers.
We center-scaled all variables prior to modeling. Using lme4 v1.1-35.1 (Bates et al. 2015),** we first fit a linear model of our pairwise geographic distances and our genetic distance to test for isolation by distance, to account for spatial autocorrelation in model building; we checked for IBD using raw and log-scaled geographic distances. We then fitted a linear mixed-effects model of our pairwise environmental, and resistance surface matrices with site identity as random intercepts (Formula: genetic distance ~ scaled uncorrelated predictor variables + least cost distance matrix + (1|site1) | (1|site2)). Using mumin v1.47.5. (Barton and Barton 2015), we evaluated model performance using marginal and conditional R2 values, and Akaike Information Criterion corrected for small sample size (AICc). We performed an additional linear model using only pairwise distances in raw and log-scaled elevational distances to determine if model fit improved, as climatic variables are correlated with elevation.
Demographic analysis and Paleoniche Modeling
To infer historical ranges in effective population size (Ne) through time, we used StairwayPlot2 (Liu and Fu 2020) which reconstructs historical demography using a multi-epoch model based on site frequency spectrum (SFS). We calculated SFS using the same vcf file as our Tajima’s D analysis which included LD-pruning, and calculated the number of callable sites from our pseudogenome after removing ambiguous consensus bases (N). We excluded singleton bins and retained folded SFS bins up to the maximum allele count permitted by the folded spectrum (n = 44), corresponding to the 44 diploid individuals in our dataset. We fixed mutation rate to 2 × 10 -9 substitutions per site per generation, as approximate average rate for insects (Liu et al. 2017) and performed 1,000 bootstrap replicates to assess uncertainty.
We paired our demographic analysis with palaeoecological niche models (PaleoENM) of S. eustalacta to determine the correlation between past climatic cycles and historical population size. Model construction is outlined in detail in Goodman et al. (2024). In brief, we sourced occurrences from citizen-science databases (GBIF, iNaturalist, Atlas of Living Australia), and filtered data to include only museum records, curated research-grade observations, and published surveys. We used the same climatic variables as our LMM, and used the presence-background algorithm Maxent (Phillips 2005). We implemented model building, parameterization, partition, evaluation, and creation of marginal response curves in ENMeval v2.0.0 (Kass et al. 2021). We projected models across three geologic intervals: Mid-Holocene (4.2–8.3 ka), Last Glacial Maximum (LGM) (21ka), Last Interglacial Period (LIG) (130ka) using the PaleoClim dataset (Brown et al. 2018).
Morphological Analysis
Wing Dimensional Analysis
We identified potential morphological differences of S. eustalacta between sites through dimensional analysis of their wings. We photographed each sequenced specimen of S. eustalacta using an Olympus Tough TG-6 4K camera on the microscope setting and secured level to a vertical rig. We imaged the ventral side of the right pair of wings for each specimen with a scale bar in frame. Using ImageJ (Schneider et al. 2012), we collected the following 12 measurements: maximum hindwing length and width, hindwing distal half (wing node to tip), length of hindwing proximal half (wing node to base), hindwing pterostigma length, maximum forewing length and width, forewing distal half, forewing proximal half, forewing pterostigma length, thorax length, femur length (Fig. 1). Such measurements of the odonate wing have been shown to capture dispersal potential (Idec et al. 2024).
To identify significant differences in wing dimension among our sites, we first imputed seven missing measurements using the package mice v3.16.0 (van Buuren et al. 2015), and scaled each measurement to thorax length and femur length to control for differences in body size (Kaspari and Weiser 1999, Teuscher et al. 2009). We performed permutational Multivariate Analysis of Variance (PERMOVA) to test for significance in morphology among sites and sex. We then performed a Kruskal–Wallis test on each variable, as this non-parametric method does not assume normality of the underlying distributions. Due to the unevenness of sampling among sexes of S. eustalacta, we also performed a Wilcoxon rank-sum test among all individuals regardless of site. We visualized trait values among each site using violin plots using ggplot2 v4.0.3 (Wickham et al. 2026). All data and code pertaining to our genomic, morphological, and ecological analyses can be found in the Supplemental Information.
REFERENCES:- Bybee, S.M., et al., Phylogeny and classification of Odonata using targeted genomics. Molecular phylogenetics and evolution, 2021. 160: p. 107115.
- Goodman, A., et al., Assessment of targeted enrichment locus capture across time and museums using odonate specimens. Insect Systematics and Diversity, 2023. 7(3): p. 5.
- Tolman, E.R., et al., A Chromosome-length Assembly of the Black Petaltail (Tanypteryx hageni) Dragonfly. Genome Biology and Evolution, 2023. 15(3): p. evad024.
- Tolman, E., et al., First Genomic Insights into an Aeshnidae Dragonfly: Unveiling the Genome of a Holarctic Species, Aeshna juncea. bioRxiv, 2025: p. 2025.07. 15.664918.
- Maurstad, M.F., et al., Reference genome bias in light of species-specific chromosomal reorganization and translocations. Genome Biology, 2025. 26(1): p. 1-26.
- Akopyan, M., et al., Divergent reference genomes compromise the reconstruction of demographic histories, selection scans, and population genetic summary statistics. bioRxiv, 2024: p. 2024.11. 26.625554.
- Akopyan, M., et al., Reference genome choice compromises population genetic analyses. Cell, 2025. 188(24): p. 6939-6952. e11.
- Goodman, A., et al., Systematic and taxonomic revision of emerald and tigertail dragonflies (Anisoptera: Synthemistidae and Corduliidae). Systematic Entomology, 2025: p. e70000.
- Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform. bioinformatics, 2009. 25(14): p. 1754-1760.
- Danecek, P., et al., Twelve years of SAMtools and BCFtools. Gigascience, 2021. 10(2): p. giab008.
- Bercovich, U., et al., Measuring linkage disequilibrium and improvement of pruning and clumping in structured populations. Genetics, 2025. 229(3): p. iyaf009.
- Purcell, S., et al., PLINK: a tool set for whole-genome association and population-based linkage analyses. The American journal of human genetics, 2007. 81(3): p. 559-575.
- Team, R.C., R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2012. 2021.
- Jombart, T., adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics, 2008. 24(11): p. 1403-1405.
- Skotte, L., T.S. Korneliussen, and A. Albrechtsen, Estimating individual admixture proportions from next generation sequencing data. Genetics, 2013. 195(3): p. 693-702.
- Goudet, J., Hierfstat, a package for R to compute and test hierarchical F‐statistics. Molecular ecology notes, 2005. 5(1): p. 184-186.
- Weir, B.S. and C.C. Cockerham, Estimating F-statistics for the analysis of population structure. evolution, 1984: p. 1358-1370.
- Hijmans, R.J., et al., Package ‘geosphere’. Spherical trigonometry, 2017. 1(7): p. 1-45.
- Hijmans, R.J., et al., Package ‘terra’. Maintainer: Vienna, Austria, 2022.
- Karger, D.N., et al., Climatologies at high resolution for the earth’s land surface areas. Scientific data, 2017. 4(1): p. 1-20.
- Fick, S.E. and R.J. Hijmans, WorldClim 2: new 1‐km spatial resolution climate surfaces for global land areas. International journal of climatology, 2017. 37(12): p. 4302-4315.
- Busetto, L. and L. Ranghetti, MODIStsp : An R package for automatic preprocessing of MODIS Land Products time series. Computers & Geosciences, 2016. 97: p. 40-48.
- Goodman, A.M., J.M. Kass, and J. Ware, Dynamic distribution modelling of the swamp tigertail dragonfly Synthemis eustalacta (Odonata: Anisoptera: Synthemistidae) over a 20‐year bushfire regime. Ecological Entomology, 2022.
- Booth, T.H., Checking bioclimatic variables that combine temperature and precipitation data before their use in species distribution models. Austral Ecology, 2022. 47(7): p. 1506-1514.
- Naimi, B., Package ‘usdm’. Uncertainty Analysis for Species Distribution Models. Wien: https://cran.r-project.org/, 2017.
- Pebesma, E., Simple features for R: standardized support for spatial vector data. 2018.
- van Etten, J., R package gdistance: Distances and routes on geographical grids. Journal of Statistical Software, 2017. 76: p. 1-21.
- Bates, D., et al., Package ‘lme4’. convergence, 2015. 12(1): p. 2.
- Barton, K. and M.K. Barton, Package ‘mumin’. Version, 2015. 1(18): p. 439.
- Liu, X. and Y.-X. Fu, Stairway Plot 2: demographic history inference with folded SNP frequency spectra. Genome biology, 2020. 21(1): p. 280.
- Zhu, D., et al., Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population. Genetics Selection Evolution, 2023. 55(1): p. 72.
- Liu, H., et al., Direct determination of the mutation rate in the bumblebee reveals evidence for weak recombination-associated mutation and an approximate rate constancy in insects. Molecular biology and evolution, 2017. 34(1): p. 119-130.
- Goodman, A.M., et al., Paleoecological niche modeling of Epiophlebia (Epiophlebioptera: Epiophlebiidae) reveals continuous distribution during the Last Glacial Maximum. International Journal of Odonatology, 2024. 27.
- Phillips, S.J., A brief tutorial on Maxent. AT&T Research, 2005. 190(4): p. 231-259.
- Kass, J.M., et al., ENMeval 2.0: Redesigned for customizable and reproducible modeling of species’ niches and distributions. Methods in Ecology and Evolution, 2021. 12(9): p. 1602-1608.
- Schneider, C.A., W.S. Rasband, and K.W. Eliceiri, NIH Image to ImageJ: 25 years of image analysis. Nature methods, 2012. 9(7): p. 671-675.
- Idec, J., et al., Interactions between sexual signaling and wing size drive ecology and evolution of wing colors in Odonata. Scientific Reports, 2024. 14(1): p. 25034.
