Data from: Diversification across the Australian Monsoonal Tropics: Comparing phylogeographic and demographic patterns within and between species of Cryptoblepharus skinks

Hayden Bofill, Sofía I.; Potter, Sally; Afonso Silva, Ana C.; Moritz, Craig; Blom, Mozes P. K.

Published May 15, 2026 on Dryad. https://doi.org/10.5061/dryad.x3ffbg801

Data files (May 15, 2026 version, 355.29 MB in total)

DILS.zip: 354.62 MB
mitogenomes.zip: 552.21 KB
README.md: 28.96 KB
Table_S1.csv: 86.36 KB

Abstract

Organisms vary in their ability to cope with environmental perturbations and even closely related species can differ in their resilience to climate change. For example, generalists may be better at accommodating environmental change than specialists with a narrow ecological niche. However, for many species, it may be difficult to classify them as ‘specialist’ or ‘generalist’, and they may be merely adapted to a distinct ecological niche rather than a niche that differs in breadth. In this study, we employ a multi-locus exon-capture approach and combine phylogeographic and population genetic approaches to compare the evolutionary histories of four species of Australian Cryptoblepharus skinks. These species co-occur in the Australian Monsoonal Tropics (AMT), have persisted despite major regional changes in Pleistocene climate, and have adapted to either arboreal or rock substrate (2 arboreal and 2 rock specialists). We find that the extent of phylogeographic structure is idiosyncratic between species and ecomorphs, likely shaped by the complex topography of the AMT. In contrast, preliminary single-population demographic models suggest that demographic history may be concordant across species, pointing to shared responses to past environmental change. These results show that ecological specialization is not a good predictor of demographic history and highlight the complex interplay of topography and past climatic change as drivers of diversification. This study supports the view that predictions based on ecological specialization and species-specific characteristics must also account for climatic history, biogeography, and evolutionary history when assessing the climate resilience of individual species.

Dataset DOI: 10.5061/dryad.x3ffbg801

Authors: Sofia I. Hayden Bofill*, Sally Potter, Ana C. Afonso Silva, Craig Moritz & Mozes P.K. Blom

*corresponding author: sofia.haydenbofill@gmail.com

Overview

This dataset supports a comprehensive comparative phylogeographic and demographic study of four co-occurring lizard species in the Australian Monsoonal Tropics (AMT). The research moves beyond the classic generalist-specialist paradigm by comparing how two species pairs with distinct ecological specializations differ in their responses to Pleistocene climate change.

The study integrated multiple data types:

Mitochondrial diversity survey: 390 individuals sequenced for the mitochondrial ND2 gene
Nuclear genetic variation: 3320 exon-capture loci sequenced in 125 individuals
Phylogeographic analyses: Maximum-likelihood phylogenies and clustering-based population structure
Demographic modeling: Approximate Bayesian Computation (ABC) using DILS to infer demographic histories

File Organization and Descriptions

1. Table_S1.csv

Description: Complete sample metadata and genetic library information for all 390 individuals surveyed for mitochondrial diversity and the subset of 125 individuals with exon-capture data.

File Format: Comma-separated values (.csv). Empty cells are marked with "NA".

Number of records: 390 individuals (one data row per individual, plus a header row)

Variables and columns:

Variable Name	Description	Data Type	Units/Notes
Catalog ID	Unique museum catalog identifier for the specimen	Text	Format varies by institution; NA if specimen number not cataloged
Tissue Number	Internal tissue/voucher number assigned by repository	Text	Format varies by collection; NA if not assigned
Field Number	Field collection number assigned in the field	Text	Unique identifier used during specimen collection
Study_ID	Study-specific individual identifier	Text	Abbreviated code combining specimen info (e.g., "Crub510")
Genus	Taxonomic genus	Text	Cryptoblepharus for all records
Species field identification	Species identification based on field morphology	Text	Species name or designation from field notes; may differ from nuclear assignment
Nuclear species identification	Species identification from nuclear DNA (exon-capture loci)	Text	Final species assignment based on genetic data; more reliable for problematic morphologies
Nearest	Nearest geographic feature or locality name	Text	Site or landmark name where specimen was collected
Latitude	Geographic collection latitude	Number	Decimal degrees; negative values = South
Longitude	Geographic collection longitude	Number	Decimal degrees; positive values = East
State	Australian state or territory	Text	WA (Western Australia), NT (Northern Territory), or other state code
Country	Country of collection	Text	Australia for all records
Collection year	Year specimen was collected	Number	YYYY format; range: 1986-2014
ND2_ID	Study-specific individual identifier for the mitochondrial ND2 gene	Text	Individual-specific ND2 identifier
ND2 Accesion	ND2 GenBank accession	Text	GenBank accession (e.g., "PQ155652"); NA if not sequenced
Library ID	Library identifier for exon-capture sequencing	Text	Unique identifier assigned to sequencing library (e.g., "MBCAP05_CCM1510"); NA if not sequenced with exon-capture
SRA ID	NCBI accession for the deposited raw reads	Text	BioSample-style accession (SAMN prefix); NA if raw reads not deposited
Skink Capture Version	Version of exon-capture array design used	Number	Indicates probe design version (1 or 2)
WMG ratio N (@10x cov)	Whole mitochondrial genome sequencing ratio (percentage)	Number	Percentage of mapped reads at 10x coverage aligned to mitochondrial genome; indicator of off-target recovery success
Mean Coverage	Mean sequencing coverage depth for nuclear loci	Number	Fold coverage (e.g., 10.57 = 10.57x) averaged across all recovered loci
Total length (bp)	Total number of base pairs recovered	Number	Summed aligned sequence length across the 3320 targeted exon loci, in base pairs (bp)
Heterozygosity number (bp)	Number of heterozygous base pairs	Number	Count of positions with two different alleles; measure of within-individual genetic variation
Individual heterozygosity	Proportion of heterozygous sites	Number	Ratio of heterozygous sites to total sequenced bases; measure of heterozygosity per individual
N number (bp)	Number of missing/ambiguous base pairs	Number	Count of positions coded as "N" (missing data or gaps) in alignment
N ratio	Proportion of missing data	Number	Ratio of N bases to total sequence length; high values indicate data quality issues
Loci recovered	Number of successfully sequenced nuclear loci	Number	Count of loci with recovered sequence data, out of 3320 targeted; inclusion threshold set at 1000 loci
ID Label	Sample identifier label for downstream analyses	Text	Formatted identifier combining species, location, and state; used in phylogenetic/demographic analyses
Included in ND2 phylogeny?	Included in mitochondrial phylogenetic analysis	Text	"yes" or "no"; indicates whether ND2 sequence was used in phylogenetic inference
whole mitochondrial genome recovered?	Complete mitochondrial genome recovered	Text	"yes" or "no"; indicates if complete or near-complete mtDNA genome was assembled from off-target reads
Included in exon analyses?	Included in nuclear exon-capture analyses	Text	"yes" or "no"; indicates whether individual was included in exon-capture sequencing and downstream analyses
included in phylogeographic structure? (relaxed datase)	Included in phylogeographic structure analysis (relaxed dataset)	Text	"yes" or "no"; indicates inclusion in population structure analyses using less stringent quality filters
included in demograhic analyses? (restricted dataset)	Included in demographic inference analysis (restricted dataset)	Text	"yes" or "no"; indicates inclusion in ABC demographic modeling; only individuals meeting strict quality criteria included
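
As a minimal sketch of how the table might be read, the Python snippet below maps the literal string "NA" to a missing value and keeps the rows flagged for the exon-capture analyses. The three column names come from the table above; the two demo rows are made up for illustration (the ID `Cmet001` is hypothetical, and pairing `Crub510` with `MBCAP05_CCM1510` is an assumption).

```python
import csv
import io


def load_samples(handle):
    """Read Table_S1-style CSV rows, mapping the literal string 'NA' to None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({key: (None if value == "NA" else value) for key, value in row.items()})
    return rows


def exon_subset(rows):
    """Keep rows flagged 'yes' in the 'Included in exon analyses?' column."""
    flag = "Included in exon analyses?"
    return [row for row in rows if (row[flag] or "").lower() == "yes"]


# Tiny made-up table using three real column names; values are illustrative only.
demo = io.StringIO(
    "Study_ID,Library ID,Included in exon analyses?\n"
    "Crub510,MBCAP05_CCM1510,yes\n"
    "Cmet001,NA,no\n"
)
rows = load_samples(demo)
kept = exon_subset(rows)
print(len(rows), len(kept), rows[1]["Library ID"])
```

For the real file, the same function could be called on `open("Table_S1.csv", newline="")`; the file's text encoding is not stated in this README and may need to be set explicitly.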

Lineage Codes and Ecological Classifications:

Population Code	Full Species Designation	Ecological Type	Region
jARP	Cryptoblepharus juno	Rock specialist	Kimberley
jBKR	Cryptoblepharus juno	Rock specialist	Northern Territory
jKimb	Cryptoblepharus juno	Rock specialist	Kimberley/Northern Territory
dae	Cryptoblepharus daedalos	Rock specialist	Northern Territory
rTE	Cryptoblepharus ruber	Arboreal specialist	Northern Territory
meg	Cryptoblepharus megastictus	Rock specialist	Kimberley
rKimb	Cryptoblepharus ruber	Arboreal specialist	Kimberley
met	Cryptoblepharus metallicus	Arboreal specialist	across AMT

2. mitogenomes.zip

Description: Complete and partial mitochondrial genome sequences recovered as off-target reads during the nuclear exon-capture sequencing.

File Format: ZIP archive containing FASTA sequence files

Contents:

Individual mitochondrial genome assemblies (one file per individual)
File naming convention: [Individual_ID]_mitogenome.fasta
Number of files: ~50–100 individuals (depending on sequencing success)

Data Specifications:

Assembly completeness: Ranges from partial (5,000–10,000 bp fragments) to near-complete (16,500–17,500 bp full genomes) depending on off-target read abundance and coverage depth
Annotation status: Raw assembled sequences without annotation; functional regions (coding genes, rRNA, tRNA, control region) can be identified by comparison to reference Cryptoblepharus mitogenomes or other reptile references

Important Note: This archive is supplementary to the main analyses. Primary demographic and phylogeographic conclusions reported in the manuscript are based on nuclear exon-capture data and the mitochondrial ND2 survey. Complete mitogenomes are used for phylogenetic inference to confirm the phylogeographic patterns observed with the ND2 fragments.

3. DILS.zip

Description: Complete ABC demographic modeling workflow, including input data, configuration files, analysis scripts, and final results from demographic inference using DILS (Fraïsse et al., 2021).

Size: ~350 MB (includes compressed result archives)

Contents Overview: This archive contains two main subdirectories corresponding to different demographic modeling scenarios, plus supporting files for data preparation.

3a. Directory: DILS/single_pop/

Single-population ABC models analyzing within-population demographic history and growth patterns.

Script: 01_prepareFastaDILS.r

Type: R script (source code)
Purpose: Prepares and formats aligned DNA sequence data (FASTA files) for DILS input; converts from raw alignments into DILS-compatible format with proper naming conventions
Input requirements:
- Phased FASTA alignment files (one per gene/locus) from 03_fasta_files.zip
- Sample-to-lineage mapping file (04_samples_lineage.txt)
- Template configuration file (02_template.yaml)
Output: Reformatted FASTA files whose sequence headers follow the structure [locus]|[lineage]|[individual]|[allele1/allele2]
Software versions and required packages:
- R ≥3.5.1
- Package ape (version 5.0+) — tools for sequence analysis and phylogenetics
- Package seqinr (version 3.6+) — bioinformatics functions for sequence I/O
- Package tidyverse (version 1.3+) — data manipulation and visualization
Working directory setup: Edit line 5: wd <- '/actual/path/to/working/directory'
Key operations:
1. Reads all FASTA files from specified directory
2. Renames sequences with lineage and haplotype information
3. Generates 5 independent replicate YAML configurations per lineage
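
The renaming step can be illustrated with a short Python sketch (the archived script itself is written in R). It builds a header of the form [locus]|[lineage]|[individual]|[allele1/allele2] from a phased sample identifier. Two things here are assumptions rather than documented behaviour: that the `_h0` suffix corresponds to `allele1` and `_h1` to `allele2`, and the locus name `locus0001`, which is hypothetical.

```python
def dils_header(locus, lineage, sample_id):
    """Build a '[locus]|[lineage]|[individual]|[allele]' header from a phased sample ID.

    Assumption: the '_h0' suffix maps to 'allele1' and '_h1' to 'allele2'.
    """
    individual, separator, haplotype = sample_id.rpartition("_h")
    if not separator or haplotype not in ("0", "1"):
        raise ValueError(f"expected an ID ending in _h0 or _h1, got {sample_id!r}")
    allele = "allele1" if haplotype == "0" else "allele2"
    return f"{locus}|{lineage}|{individual}|{allele}"


# 'locus0001' is a hypothetical locus name; the sample ID and lineage code
# are taken from the 04_samples_lineage.txt example in this README.
header = dils_header("locus0001", "rTE", "MBCAP05_CMWA68_h1")
print(header)
```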

Configuration file: 02_template.yaml

Type: YAML format configuration file (text)
Purpose: Defines all ABC prior distributions and model parameters for single-population analyses
Key parameters with interpretation:

Parameter	Value	Unit	Meaning
`nspecies`	1	—	Single population mode (vs. 2 for two-population comparisons)
`mu`	6.125 × 10⁻⁹	per site/generation	Generalized squamate mutation rate from Gemmell et al. 2020
`N_min`	4	haplotypes	Lower bound of prior for effective population size
`N_max`	NA	individuals	Upper bound of prior for effective population size
`Tsplit_min`	100	generations	Minimum divergence time in prior
`Tsplit_max`	10,000,000	generations	Maximum divergence time
`M_min`	0.4	migrants/generation	Minimum migration rate
`M_max`	20	migrants/generation	Maximum migration rate
`Lmin`	150	base pairs	Minimum locus length threshold for inclusion
`max_N_tolerated`	0.5	proportion	Maximum proportion of missing data (Ns) per alignment
`population_growth`	variable	—	Model allows variable population growth
`modeBarrier`	bimodal	—	Distribution used to model barriers to gene flow across loci (bimodal or beta in DILS)
`region`	coding	—	Sequence type (coding vs. non-coding)
`rho_over_theta`	0.1	—	Recombination-to-mutation ratio

Note: Template is automatically duplicated for each lineage with customized file paths
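
As a rough worked example of how `mu`, `Lmin`, and `rho_over_theta` relate, the sketch below computes a per-locus population mutation rate under the textbook diploid definition theta = 4 x N x mu x L, then scales it by `rho_over_theta` to obtain rho. The effective size of 100,000 is a hypothetical value chosen only for illustration, and whether DILS applies exactly this scaling internally is an assumption that should be checked against the DILS documentation.

```python
MU = 6.125e-9          # mutation rate per site per generation, from the template
RHO_OVER_THETA = 0.1   # recombination-to-mutation ratio, from the template
LOCUS_LENGTH = 150     # bp; equal to Lmin, the shortest locus retained


def theta_per_locus(n_e, mu=MU, length=LOCUS_LENGTH):
    """Population mutation rate for one locus, assuming theta = 4 * Ne * mu * L."""
    return 4 * n_e * mu * length


def rho_per_locus(n_e, ratio=RHO_OVER_THETA):
    """Population recombination rate for one locus, as a fixed fraction of theta."""
    return ratio * theta_per_locus(n_e)


hypothetical_ne = 100_000  # illustrative value, not taken from the dataset
theta = theta_per_locus(hypothetical_ne)
rho = rho_per_locus(hypothetical_ne)
print(f"theta = {theta:.4f}, rho = {rho:.5f}")
```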

Data file: 03_fasta_files.zip

Type: ZIP archive containing FASTA files
Contents: Raw aligned sequence data for all 3,320 nuclear exon-capture loci
Individual file format: Standard FASTA with multi-line sequences
File naming: [locus_name].fasta (one file per locus)
Sequence format: DNA sequences in IUPAC format (A, T, G, C, N for gaps/missing)
Total sequence count: 3,320 files (one per targeted exon region)

Metadata file: 04_samples_lineage.txt

Type: Tab-delimited text file (plain text)
Purpose: Maps sequence sample identifiers to phylogeographic lineages
Columns:
- Index (column 1): Sequence sample identifier, may include haplotype designation
- Lineage (column 2): Corresponding phylogeographic lineage code

Format example:

Index                    Lineage
SP03_indexing22_h0       rTE
MBCAP05_CMWA68_h0       rTE
MBCAP05_CMWA68_h1       rTE
MBCAP05_23808_h0        met
MBCAP05_23808_h1        met

Important notes:
- Each individual typically has TWO entries (diploid _h0 and _h1 alleles from phased sequence data)
- Must match sequence headers exactly in FASTA files
- Lineage codes must match entries in 05_ComparisonsOfInterest.txt
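
A quick consistency check on this file could look like the Python sketch below, which reads the two columns and lists individuals that do not have both an `_h0` and an `_h1` entry. It is an illustration only: the function names are hypothetical, and since individuals only typically have two entries, a flagged individual is not necessarily an error. The demo rows are the ones shown in the format example above.

```python
import io
from collections import defaultdict


def read_lineage_map(handle):
    """Return {sequence index: lineage code} from a two-column, whitespace-delimited file."""
    mapping = {}
    next(handle)  # skip the 'Index  Lineage' header row
    for line in handle:
        if not line.strip():
            continue
        index, lineage = line.split()
        mapping[index] = lineage
    return mapping


def incomplete_individuals(mapping):
    """List individuals that lack either the _h0 or the _h1 entry."""
    haplotypes = defaultdict(set)
    for index in mapping:
        individual, _, haplotype = index.rpartition("_")
        haplotypes[individual].add(haplotype)
    return sorted(ind for ind, seen in haplotypes.items() if seen != {"h0", "h1"})


demo = io.StringIO(
    "Index\tLineage\n"
    "SP03_indexing22_h0\trTE\n"
    "MBCAP05_CMWA68_h0\trTE\n"
    "MBCAP05_CMWA68_h1\trTE\n"
    "MBCAP05_23808_h0\tmet\n"
    "MBCAP05_23808_h1\tmet\n"
)
mapping = read_lineage_map(demo)
flagged = incomplete_individuals(mapping)
print(flagged)
```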

Analysis specification file: 05_ComparisonsOfInterest.txt

Type: Tab-delimited text file (plain text)
Purpose: Specifies which lineages to analyze and provides output directory names
Columns:
- lineage (column 1): Phylogeographic lineage code from 04_samples_lineage.txt
- analysis (column 2): Short name/identifier for output (used in YAML and result filenames)

Example contents:

lineage    analysis
jARP       jARP
jBKR       jBKR
jKimb      jKimb
rTE        rTE
met        met
meg        meg
rKimb      rKimb

Notes:
- One row per lineage to be analyzed independently
- Analysis names become prefixes in output files

Prepared data archive: 06_dils_fasta.tar.gz

Type: Compressed TAR archive (.tar.gz)
Contents: All sequences extracted from 03_fasta_files.zip and reformatted according to DILS specifications
Decompression command: tar -xzf 06_dils_fasta.tar.gz
Internal structure: One directory per lineage containing prepared FASTA files
File organization: [lineage]/[locus_name].fasta

Configuration archive: 07_dils_yaml.tar.gz

Type: Compressed TAR archive (.tar.gz)
Contents: All YAML configuration files generated for DILS runs
Decompression command: tar -xzf 07_dils_yaml.tar.gz
Naming convention: [lineage]_rep[1-5].yaml
Total files: 35 YAML files (7 lineages × 5 replicates)
Each file specifies: Input FASTA paths, output directory, priors, and all model parameters
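
To confirm that an extracted copy of this archive is complete, the expected file names can be generated from the lineage list and the replicate count. The Python sketch below assumes the naming convention expands literally to names such as `jARP_rep1.yaml`; the function names are hypothetical.

```python
from itertools import product

# Lineage codes listed in 05_ComparisonsOfInterest.txt for single_pop/
LINEAGES = ["jARP", "jBKR", "jKimb", "rTE", "met", "meg", "rKimb"]


def expected_yaml_names(lineages, n_reps=5):
    """Expand '[lineage]_rep[1-5].yaml' into the full list of file names."""
    return [f"{lineage}_rep{rep}.yaml" for lineage, rep in product(lineages, range(1, n_reps + 1))]


def missing_files(present, lineages, n_reps=5):
    """Return expected names that are absent from an iterable of file names."""
    return sorted(set(expected_yaml_names(lineages, n_reps)) - set(present))


names = expected_yaml_names(LINEAGES)
print(len(names), names[0], names[-1])
```

Passing the result of `os.listdir()` on the extracted folder to `missing_files` would list any configuration that failed to unpack.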

Execution script: 08_runDILS_SLURM.sh

Type: SLURM cluster job submission script (shell/bash)
Purpose: Submits ABC demographic inference jobs in parallel to a high-performance computing cluster
Dependencies:
- SLURM job scheduler (must be installed on cluster)
- DILS software v1.0 or compatible (available from https://github.com/popgenomics/DILS.git)
- Perl (often pre-installed on clusters)
Key modifications before running:
- Line defining analysis=: Change /path/to/yaml_folder to actual directory containing YAML files
- Ensure /path/to/DILS/DILS/bin/DILS_1pop.sh points to correct DILS executable
Execution command: sbatch 08_runDILS_SLURM.sh (for a non-SLURM scheduler, such as one using qsub, the submission directives in the script must be adapted first)
Cluster resource parameters:
- -p long — job partition (modify for your cluster)
- Adjust time limits, memory, and CPU cores as needed
What the script does:
1. Loops through each YAML file in the directory
2. Calls DILS executable: /path/to/DILS/DILS/bin/DILS_1pop.sh [config.yaml]
3. Moves completed .tar.gz results to 09_done/ directory
4. Logs job output to .out/.err files
Job monitoring: Check bDILS.*.out and bDILS.*.err files for execution logs

Results directory: 09_done/

Type: Directory containing DILS output archives
Contents: Compressed TAR files with ABC inference results for each lineage × replicate
File naming pattern: [lineage]_rep[1-5].tar.gz
Total files: 35 archives (7 lineages × 5 replicates each)
Example contents per .tar.gz (when extracted):
- general_infos.txt — Summary metadata: dataset name, date, parameters used
- modelComp/report_[lineage].txt — Full model comparison with AIC values for all evaluated models
- best_model/report_[lineage].txt — Detailed parameter estimates for best-supported model
- [lineage]_infos.txt — Lineage-specific values and convergence statistics
- Various auxiliary posterior distribution files (may be in subdirectories)
Typical file size: 10-100 MB per .tar.gz depending on simulation output verbosity
Example extraction: tar -xzf jARP_rep1.tar.gz
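
The report files inside an archive can also be listed without unpacking it to disk. The Python sketch below uses the standard `tarfile` module; because the real archives are not available here, it first builds a small stand-in archive with placeholder contents, so only the member names (taken from the list above) reflect the dataset.

```python
import io
import os
import tarfile
import tempfile


def list_report_files(archive_path):
    """Return the sorted names of .txt members in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(".txt")
        )


# Build a stand-in archive; the file contents are placeholders, not DILS output.
demo_path = os.path.join(tempfile.mkdtemp(), "jARP_rep1.tar.gz")
with tarfile.open(demo_path, "w:gz") as tar:
    for name in ("general_infos.txt", "modelComp/report_jARP.txt"):
        payload = b"placeholder\n"
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

found = list_report_files(demo_path)
print(found)
```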

Analysis and results compilation notebook: 010_DILS_plottingResults.Rmd

Type: R Markdown format notebook (combines code, output, narrative)
Purpose: Automates extraction of DILS results, applies quality filters, generates publication-quality plots and summary tables
Software requirements:
- R version ≥4.0
- Required packages and versions:
  - knitr (text processing and literate programming)
  - kableExtra (advanced table formatting)
  - flextable (flexible table layout for documents)
  - tidyverse (≥1.3; includes ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats)
  - hues (color utilities for plots)
  - ggpubr (ggplot2 publication-ready plots)
  - data.table (efficient data frame operations)
Key user-adjustable parameters:
- Line 33: folder_path <- '/path/to/folder' — set working directory
- Line 40: populations_to_exclude <- c("dae","jKimb","jARP") — choose which populations to exclude
Processing workflow:
1. Automatic detection: Finds all .tar.gz files in 09_done/
2. Extraction: Temporarily unpacks each archive to access result files
3. Sequential filtering:
  - Step 1a (optional): Strict p-value filtering to remove poorly fit runs
  - Step 1b (always): Selects best performing run per lineage based on model probability
  - Step 2 (optional): Removes excluded populations from all downstream visualization
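
The selection logic of Steps 1b and 2 can be sketched in a few lines of Python (the notebook itself does this in R). The field names `lineage`, `rep`, and `prob` and all probability values below are hypothetical and do not reflect the actual columns of the DILS output; the exclusion list mirrors the default shown for `populations_to_exclude`.

```python
def best_run_per_lineage(runs, exclude=()):
    """Keep the highest-probability run for each lineage, skipping excluded lineages."""
    best = {}
    for run in runs:
        if run["lineage"] in exclude:
            continue
        current = best.get(run["lineage"])
        if current is None or run["prob"] > current["prob"]:
            best[run["lineage"]] = run
    return best


# Made-up replicate summaries; 'prob' stands in for the model probability.
runs = [
    {"lineage": "met", "rep": 1, "prob": 0.71},
    {"lineage": "met", "rep": 2, "prob": 0.83},
    {"lineage": "rTE", "rep": 1, "prob": 0.64},
    {"lineage": "jARP", "rep": 3, "prob": 0.90},
]
selected = best_run_per_lineage(runs, exclude=("dae", "jKimb", "jARP"))
print({lineage: run["rep"] for lineage, run in selected.items()})
```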

Results file: 011_dilsOut.rds

Type: R binary object file (.rds format)
Purpose: Compiled output object containing extracted and processed DILS demographic inference results for all lineages and replicates
Data structure: R data frame with one row per DILS analysis (lineage × replicate combination) and multiple columns containing:
- Metadata columns:
  - analysis: Type of analysis
  - species1: Population/lineage name
  - rep: Replicate number (1-5)
- DILS output columns: Extracted from best-model and model-comparison result files including:
  - Demographic parameter estimates (effective population size, divergence times where applicable, migration rates)
  - Model comparison statistics
  - Posterior distribution summaries
  - Convergence and goodness-of-fit metrics
How to use: Load in R with dilsOut <- readRDS('011_dilsOut.rds') to access processed results without re-extracting from .tar.gz archives
Contents:
- Entries for single- and two-population analyses (one per lineage × replicate from 09_done directory)
- Best-performing replicate per lineage selected for downstream visualization and publication
Note: This file is an intermediate output generated for visualization purposes. Primary demographic parameter estimates are extracted from individual DILS output files in the 09_done/ directory

3b. Directory: DILS/two_pop/

Comparative two-population demographic models analyzing divergence and gene flow between population pairs.

Directory structure and workflow: Identical to single_pop/ in organization, but with critical differences in model specification and analysis design:

Modified configuration: 02_template.yaml

Key changes for two-population model:
- nspecies: 2 — specifies two-population analysis mode
- nameA and nameB — names identifying the two populations being compared (e.g., "species1" and "species2")
- nameOutgroup — optional outgroup designation (set to NA if not used)
- Parameters for inter-population processes:
  - Tsplit_min and Tsplit_max — divergence time between the two populations (in generations)
  - M_min and M_max — migration rates for gene flow between populations
  - useSFS — whether to use site frequency spectrum (0=no, 1=yes)
- All other parameters (Ne ranges, mutation rate, etc.) match single_pop values for consistency

Modified analysis specification: 05_ComparisonsOfInterest.txt

Contains population pair specifications:

lineage                analysis
dae_jARP              dae_jARP
jARP_jBKR             jARP_jBKR
jARP_jKimb            jARP_jKimb
jBKR_jKimb            jBKR_jKimb
meg_rKimb             meg_rKimb
rKimb_rTE             rKimb_rTE

Format: Unlike single_pop, each entry names both lineages being compared, joined by an underscore (e.g., jARP_jBKR)

Results in 09_done/ directory:

Additional result files per comparison:
- modelComp/ — Model comparison statistics for divergence/no-divergence/divergence-with-migration scenarios
- best_model/ — Parameter estimates including divergence time and bidirectional migration rates
- Posterior distributions for demographic parameters specific to two-population models
File naming: [pop1]_[pop2]_rep[1-5].tar.gz
Interpretation: Results show divergence time estimates, migration rate estimates, and support for different gene flow scenarios
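
When collecting two-population results, the archive name can be split back into its two lineage codes and replicate number. The Python sketch below assumes that lineage codes contain only letters, which holds for the codes listed in this README; the function name is hypothetical.

```python
import re

# Pattern for '[pop1]_[pop2]_rep[1-5].tar.gz', assuming letter-only lineage codes.
TWO_POP_PATTERN = re.compile(r"([A-Za-z]+)_([A-Za-z]+)_rep([1-5])\.tar\.gz")


def parse_two_pop_name(filename):
    """Split a two-population archive name into (pop1, pop2, replicate)."""
    match = TWO_POP_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    pop1, pop2, rep = match.groups()
    return pop1, pop2, int(rep)


parsed = parse_two_pop_name("meg_rKimb_rep3.tar.gz")
print(parsed)
```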

Scripts and outputs otherwise identical to single_pop/

Same DILS processing and formatting script (01_prepareFastaDILS.r)
Identical results visualization workflow (010_DILS_plottingResults.Rmd, 011_dilsOut.rds)
SLURM submission script structure identical (08_runDILS_SLURM.sh)

Data Access

Related Data in Public Repositories

Mitochondrial ND2 sequences:

Repository: NCBI GenBank
Accession numbers: PQ155520–PQ155900
Access: https://www.ncbi.nlm.nih.gov/nucleotide/?term=PQ155520:PQ155900
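
For batch retrieval it can help to expand the accession range into individual identifiers. The Python sketch below assumes the range is contiguous and inclusive, as the notation suggests; the function name is hypothetical.

```python
def expand_accessions(first, last):
    """Expand an inclusive accession range that shares a prefix and digit width."""
    prefix = first.rstrip("0123456789")
    if not last.startswith(prefix) or len(first) != len(last):
        raise ValueError("accessions must share a prefix and have equal length")
    width = len(first) - len(prefix)
    start, stop = int(first[len(prefix):]), int(last[len(prefix):])
    return [f"{prefix}{number:0{width}d}" for number in range(start, stop + 1)]


accessions = expand_accessions("PQ155520", "PQ155900")
print(len(accessions), accessions[0], accessions[-1])
```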

Nuclear exon-capture sequences (raw reads):

Repository: NCBI Sequence Read Archive (SRA)
BioProject: PRJNA1171859
Access: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1171859
Note: Raw fastq files; see manuscript for assembly/alignment procedures

How to Cite This Data

If you use this dataset, please cite:

Recommended citation:

Hayden Bofill, S. I., Potter, S., Afonso Silva, A. C., Moritz, C., & Blom, M. P. K. (2026). Data from: Diversification across the Australian Monsoonal Tropics: Comparing phylogeographic and demographic patterns within and between species of Cryptoblepharus skinks. Heredity, https://doi.org/[manusc DOI]. Dryad Digital Repository. https://doi.org/10.5061/dryad.x3ffbg801

Software citations (also include):

Fraïsse, C., Popovic, I., Mazoyer, C., Spataro, B., Delmotte, S., Romiguier, J., ... & Roux, C. (2021). DILS: Demographic inferences with linked selection by using ABC. Molecular Ecology Resources, 21(8), 2629-2644. https://doi.org/10.1111/1755-0998.13323
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/