Data from: Diversification across the Australian Monsoonal Tropics: Comparing phylogeographic and demographic patterns within and between species of Cryptoblepharus skinks
Data files
May 15, 2026 version files (355.29 MB total):
- DILS.zip (354.62 MB)
- mitogenomes.zip (552.21 KB)
- README.md (28.96 KB)
- Table_S1.csv (86.36 KB)
Abstract
Organisms vary in their ability to cope with environmental perturbations and even closely related species can differ in their resilience to climate change. For example, generalists may be better at accommodating environmental change than specialists with a narrow ecological niche. However, for many species, it may be difficult to classify them as ‘specialist’ or ‘generalist’, and they may be merely adapted to a distinct ecological niche rather than a niche that differs in breadth. In this study, we employ a multi-locus exon-capture approach and combine phylogeographic and population genetic approaches to compare the evolutionary history between four species of Australian Cryptoblepharus skinks. These species co-occur in the Australian Monsoonal Tropics (AMT), have persisted despite major regional changes in Pleistocene climate, and have adapted to either arboreal or rock substrate (2 arboreal and 2 rock specialists). We find that the extent of phylogeographic structure is idiosyncratic between species and ecomorphs, likely shaped by the complex topography of the AMT. In contrast, preliminary single-population demographic models suggest that demographic history may potentially be concordant across species and suggest shared responses to past environmental change. These results show that ecological specialisation is not a good predictor of demographic history, and highlight the complex interplay of topography and past climatic change as drivers of diversification. This study supports that predictions based on ecological specialization and species-specific characteristics need to also account for climatic history, biogeography, and evolutionary history when assessing climate resilience of individual species.
Dataset DOI: 10.5061/dryad.x3ffbg801
Authors: Sofia I. Hayden Bofill*, Sally Potter, Ana C. Afonso Silva, Craig Moritz & Mozes P.K. Blom
*corresponding author: sofia.haydenbofill@gmail.com
Overview
This dataset supports a comprehensive comparative phylogeographic and demographic study of four co-occurring lizard species in the Australian Monsoonal Tropics (AMT). The research moves beyond the classic generalist-specialist paradigm by comparing how two species pairs with distinct ecological specializations differ in their responses to Pleistocene climate change.
The study integrated multiple data types:
- Mitochondrial diversity survey: 390 individuals sequenced for the mitochondrial ND2 gene
- Nuclear genetic variation: 3320 exon-capture loci sequenced in 125 individuals
- Phylogeographic analyses: Maximum-likelihood phylogenies and clustering-based population structure
- Demographic modeling: Approximate Bayesian Computation (ABC) using DILS to infer demographic histories
File Organization and Descriptions
1. Table_S1.csv
Description: Complete sample metadata and genetic library information for all 390 individuals surveyed for mitochondrial diversity and the subset of 125 individuals with exon-capture data.
File Format: Comma-separated values (.csv). Empty cells are marked with "NA".
Number of records: 390 individuals (one row per individual, plus a header row)
Variables and columns:
| Variable Name | Description | Data Type | Units/Notes |
|---|---|---|---|
| Catalog ID | Unique museum catalog identifier for the specimen | Text | Format varies by institution; NA if specimen number not cataloged |
| Tissue Number | Internal tissue/voucher number assigned by repository | Text | Format varies by collection; NA if not assigned |
| Field Number | Field collection number assigned in the field | Text | Unique identifier used during specimen collection |
| Study_ID | Study-specific individual identifier | Text | Abbreviated code combining specimen info (e.g., "Crub510") |
| Genus | Taxonomic genus | Text | Cryptoblepharus for all records |
| Species field identification | Species identification based on field morphology | Text | Species name or designation from field notes; may differ from nuclear assignment |
| Nuclear species identification | Species identification from nuclear DNA (exon-capture loci) | Text | Final species assignment based on genetic data; more reliable for problematic morphologies |
| Nearest | Nearest geographic feature or locality name | Text | Site or landmark name where specimen was collected |
| Latitude | Geographic collection latitude | Number | Decimal degrees; negative values = South |
| Longitude | Geographic collection longitude | Number | Decimal degrees; positive values = East |
| State | Australian state or territory | Text | WA (Western Australia), NT (Northern Territory), or other state code |
| Country | Country of collection | Text | Australia for all records |
| Collection year | Year specimen was collected | Number | YYYY format; range: 1986-2014 |
| ND2_ID | Study-specific individual identifier for the mitochondrial ND2 gene | Text | Individual-specific ND2 identifier |
| ND2 Accesion | ND2 GenBank accession | Text | GenBank accession (e.g., "PQ155652"); NA if not sequenced |
| Library ID | Library identifier for exon-capture sequencing | Text | Unique identifier assigned to sequencing library (e.g., "MBCAP05_CCM1510"); NA if not sequenced with exon-capture |
| SRA ID | NCBI accession linking the individual to its deposited raw reads | Text | Accession in SAMN format (the NCBI BioSample prefix); NA if raw reads not deposited |
| Skink Capture Version | Version of exon-capture array design used | Number | Indicates probe design version (1 or 2) |
| WMG ratio N (@10x cov) | Whole mitochondrial genome sequencing ratio (percentage) | Number | Percentage of mapped reads at 10x coverage aligned to mitochondrial genome; indicator of off-target recovery success |
| Mean Coverage | Mean sequencing coverage depth for nuclear loci | Number | Fold coverage (e.g., 10.57 = 10.57x) averaged across all recovered loci |
| Total length (bp) | Total number of base pairs recovered | Number | Sum of all aligned sequence length across 3320 targeted exon loci in bases (bp) |
| Heterozygosity number (bp) | Number of heterozygous base pairs | Number | Count of positions with two different alleles; measure of within-individual genetic variation |
| Individual heterozygosity | Proportion of heterozygous sites | Number | Ratio of heterozygous sites to total sequenced bases; measure of heterozygosity per individual |
| N number (bp) | Number of missing/ambiguous base pairs | Number | Count of positions coded as "N" (missing data or gaps) in alignment |
| N ratio | Proportion of missing data | Number | Ratio of N bases to total sequence length; high values indicate data quality issues |
| Loci recovered | Number of successfully sequenced nuclear loci | Number | Count of loci with recovered sequence data; out of 3320 targeted loci, threshold for inclusion set at 1000 |
| ID Label | Sample identifier label for downstream analyses | Text | Formatted identifier combining species, location, and state; used in phylogenetic/demographic analyses |
| Included in ND2 phylogeny? | Included in mitochondrial phylogenetic analysis | Text | "yes" or "no"; indicates whether ND2 sequence was used in phylogenetic inference |
| whole mitochondrial genome recovered? | Complete mitochondrial genome recovered | Text | "yes" or "no"; indicates if complete or near-complete mtDNA genome was assembled from off-target reads |
| Included in exon analyses? | Included in nuclear exon-capture analyses | Text | "yes" or "no"; indicates whether individual was included in exon-capture sequencing and downstream analyses |
| included in phylogeographic structure? (relaxed datase) | Included in phylogeographic structure analysis (relaxed dataset) | Text | "yes" or "no"; indicates inclusion in population structure analyses using less stringent quality filters |
| included in demograhic analyses? (restricted dataset) | Included in demographic inference analysis (restricted dataset) | Text | "yes" or "no"; indicates inclusion in ABC demographic modeling; only individuals meeting strict quality criteria included |
Lineage Codes and Ecological Classifications:
| Population Code | Full Species Designation | Ecological Type | Region |
|---|---|---|---|
| jARP | Cryptoblepharus juno | Rock specialist | Kimberley |
| jBKR | Cryptoblepharus juno | Rock specialist | Northern Territory |
| jKimb | Cryptoblepharus juno | Rock specialist | Kimberley/Northern Territory |
| dae | Cryptoblepharus daedalos | Rock specialist | Northern Territory |
| rTE | Cryptoblepharus ruber | Arboreal specialist | Northern Territory |
| meg | Cryptoblepharus megastictus | Rock specialist | Kimberley |
| rKimb | Cryptoblepharus ruber | Arboreal specialist | Kimberley |
| met | Cryptoblepharus metallicus | Arboreal specialist | across AMT |
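As a quick orientation to the inclusion flags in Table_S1.csv, the following minimal Python sketch counts how many individuals are flagged for the exon analyses, grouped by nuclear species assignment. It assumes the column headers match the variable table above exactly and that flags are written as "yes" or "no". The function name `count_included` and the example rows are illustrative placeholders, not part of the deposited scripts or data.

```python
import csv
import io
from collections import Counter


def count_included(csv_text,
                   flag_column="Included in exon analyses?",
                   group_column="Nuclear species identification"):
    """Count rows flagged "yes" in flag_column, grouped by group_column."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[flag_column].strip().lower() == "yes":
            counts[row[group_column].strip()] += 1
    return dict(counts)


# Made-up rows for illustration only (not real records from Table_S1.csv).
example = (
    "Study_ID,Nuclear species identification,Included in exon analyses?\n"
    "ind1,ruber,yes\n"
    "ind2,ruber,no\n"
    "ind3,metallicus,yes\n"
)
print(count_included(example))
```

For the real file, the text can be read with `open("Table_S1.csv", newline="").read()` and passed to the same function; other flag columns can be selected through `flag_column`.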
2. mitogenomes.zip
Description: Complete and partial mitochondrial genome sequences recovered as off-target reads during the nuclear exon-capture sequencing.
File Format: ZIP archive containing FASTA sequence files
Contents:
- Individual mitochondrial genome assemblies (one file per individual)
- File naming convention: `[Individual_ID]_mitogenome.fasta`
- Number of files: ~50–100 individuals (depending on sequencing success)
Data Specifications:
- Assembly completeness: Ranges from partial (5,000–10,000 bp fragments) to near-complete (16,500–17,500 bp full genomes) depending on off-target read abundance and coverage depth
- Annotation status: Raw assembled sequences without annotation; functional regions (coding genes, rRNA, tRNA, control region) can be identified by comparison to reference Cryptoblepharus mitogenomes or other reptile references
Important Note: This archive is supplementary to the main analyses. Primary demographic and phylogeographic conclusions reported in the manuscript are based on nuclear exon-capture data and the mitochondrial ND2 survey. Complete mitogenomes are used for phylogenetic inference to confirm the phylogeographic patterns observed with the ND2 fragments.
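Because assembly completeness varies between individuals, it can help to screen the mitogenome files by length before use. The sketch below is a minimal Python example that parses FASTA text and labels each sequence using the lower bound for near-complete genomes quoted above (16,500 bp). Treating everything shorter than that as "partial" is an assumption of this sketch, as are the function names and the synthetic sequences; note that any N characters count toward the length here.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def completeness(sequence, near_complete_min=16500):
    """Label an assembly by length (cutoff assumed from the range given above)."""
    return "near-complete" if len(sequence) >= near_complete_min else "partial"


# Synthetic placeholder sequences of 17,000 bp and 6,000 bp.
example = (
    ">indA_mitogenome\n" + "ACGT" * 4250 + "\n"
    ">indB_mitogenome\n" + "ACGT" * 1500 + "\n"
)
labels = {name: completeness(seq) for name, seq in read_fasta(example).items()}
print(labels)
```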
3. DILS.zip
Description: Complete ABC demographic modeling workflow, including input data, configuration files, analysis scripts, and final results from demographic inference using DILS (Fraïsse et al., 2021).
Size: ~350 MB (includes compressed result archives)
Contents Overview: This archive contains two main subdirectories corresponding to different demographic modeling scenarios, plus supporting files for data preparation.
3a. Directory: DILS/single_pop/
Single-population ABC models analyzing within-population demographic history and growth patterns.
Script: 01_prepareFastaDILS.r
- Type: R script (source code)
- Purpose: Prepares and formats aligned DNA sequence data (FASTA files) for DILS input; converts from raw alignments into DILS-compatible format with proper naming conventions
- Input requirements:
  - Phased FASTA alignment files (one per gene/locus) from 03_fasta_files.zip
  - Sample-to-lineage mapping file (04_samples_lineage.txt)
  - Template configuration file (02_template.yaml)
- Output: Reformatted FASTA files with structure: `[locus]|[lineage]|[individual]|[allele1/allele2]`
- Software versions and required packages:
  - R ≥3.5.1
  - Package `ape` (version 5.0+): tools for sequence analysis and phylogenetics
  - Package `seqinr` (version 3.6+): bioinformatics functions for sequence I/O
  - Package `tidyverse` (version 1.3+): data manipulation and visualization
- Working directory setup: Edit line 5: `wd <- '/actual/path/to/working/directory'`
- Key operations:
  - Reads all FASTA files from specified directory
  - Renames sequences with lineage and haplotype information
  - Generates 5 independent replicate YAML configurations per lineage
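To make the renaming step concrete, here is a minimal Python sketch of how a phased sample identifier could be turned into a header of the form `[locus]|[lineage]|[individual]|[allele1/allele2]`. It is an illustration only and not the deposited R script: the function name `dils_header`, the locus name, and the mapping of the `_h0` suffix to `allele1` and `_h1` to `allele2` are assumptions.

```python
def dils_header(locus, sample_id, lineage_of):
    """Build a DILS-style header from a phased sample identifier.

    sample_id is expected to end in _h0 or _h1, as in 04_samples_lineage.txt.
    lineage_of maps each sample identifier to its lineage code.
    """
    individual, sep, hap = sample_id.rpartition("_h")
    if not sep or hap not in ("0", "1"):
        raise ValueError(f"no haplotype suffix in {sample_id!r}")
    # Assumed mapping: h0 -> allele1, h1 -> allele2.
    allele = "allele1" if hap == "0" else "allele2"
    return f"{locus}|{lineage_of[sample_id]}|{individual}|{allele}"


# Identifiers taken from the mapping-file example; "locus1" is a placeholder.
lineage_of = {"MBCAP05_CMWA68_h0": "rTE", "MBCAP05_CMWA68_h1": "rTE"}
header = dils_header("locus1", "MBCAP05_CMWA68_h1", lineage_of)
print(header)
```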
Configuration file: 02_template.yaml
- Type: YAML format configuration file (text)
- Purpose: Defines all ABC prior distributions and model parameters for single-population analyses
- Key parameters with interpretation:
| Parameter | Value | Unit | Meaning |
|---|---|---|---|
| `nspecies` | 1 | - | Single population mode (vs. 2 for two-population comparisons) |
| `mu` | 6.125 × 10⁻⁹ | per site/generation | Generalized squamate mutation rate from Gemmell et al. 2020 |
| `N_min` | 4 | haplotypes | Lower bound of prior for effective population size |
| `N_max` | NA | individuals | Upper bound of prior for effective population size |
| `Tsplit_min` | 100 | generations | Minimum divergence time in prior |
| `Tsplit_max` | 10,000,000 | generations | Maximum divergence time |
| `M_min` | 0.4 | migrants/generation | Minimum migration rate |
| `M_max` | 20 | migrants/generation | Maximum migration rate |
| `Lmin` | 150 | base pairs | Minimum locus length threshold for inclusion |
| `max_N_tolerated` | 0.5 | proportion | Maximum proportion of missing data (Ns) per alignment |
| `population_growth` | variable | - | Model allows variable population growth |
| `modeBarrier` | bimodal | - | Mode specification |
| `region` | coding | - | Sequence type (coding vs. non-coding) |
| `rho_over_theta` | 0.1 | - | Recombination-to-mutation ratio |
- Note: Template is automatically duplicated for each lineage with customized file paths
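One way to read the `Lmin` and `max_N_tolerated` settings is as a pair of thresholds on sequence length and on the proportion of N characters. The Python sketch below illustrates that reading with the values from the table; it is not DILS code, and DILS applies its own filtering internally, so details (for example how gaps are counted) may differ. The function name `passes_filters` is hypothetical.

```python
def passes_filters(sequence, lmin=150, max_n_tolerated=0.5):
    """Return True if a sequence meets the length and missing-data thresholds."""
    length = len(sequence)
    if length < lmin:
        return False
    # Proportion of positions coded as N (missing data).
    n_fraction = sequence.upper().count("N") / length
    return n_fraction <= max_n_tolerated


# Synthetic sequences for illustration.
print(passes_filters("A" * 200))              # long enough, no missing data
print(passes_filters("A" * 100))              # shorter than Lmin
print(passes_filters("A" * 90 + "N" * 110))   # more than half missing
```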
Data file: 03_fasta_files.zip
- Type: ZIP archive containing FASTA files
- Contents: Raw aligned sequence data for all 3,320 nuclear exon-capture loci
- Individual file format: Standard FASTA with multi-line sequences
- File naming: `[locus_name].fasta` (one file per locus)
- Sequence format: DNA sequences in IUPAC format (A, T, G, C, N for gaps/missing)
- Total sequence count: 3,320 files (one per targeted exon region)
Metadata file: 04_samples_lineage.txt
- Type: Tab-delimited text file (plain text)
- Purpose: Maps sequence sample identifiers to phylogeographic lineages
- Columns:
  - `Index` (column 1): Sequence sample identifier, may include haplotype designation
  - `Lineage` (column 2): Corresponding phylogeographic lineage code
- Format example (tab-delimited in the file, shown here as a table):

| Index | Lineage |
|---|---|
| SP03_indexing22_h0 | rTE |
| MBCAP05_CMWA68_h0 | rTE |
| MBCAP05_CMWA68_h1 | rTE |
| MBCAP05_23808_h0 | met |
| MBCAP05_23808_h1 | met |

- Important notes:
  - Each individual typically has TWO entries (diploid `_h0` and `_h1` alleles from phased sequence data)
  - Must match sequence headers exactly in FASTA files
  - Lineage codes must match entries in 05_ComparisonsOfInterest.txt
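Since each individual is expected to appear with both an `_h0` and an `_h1` entry, a short consistency check on the mapping file can be useful. The following Python sketch groups entries by individual and lists those without both haplotypes. It assumes a tab-delimited file with the header `Index` and `Lineage` as described above; the function name is illustrative.

```python
import csv
import io
from collections import defaultdict


def haplotypes_per_individual(tsv_text):
    """Map each individual to its lineage and the haplotype suffixes listed."""
    grouped = defaultdict(lambda: {"lineage": None, "haplotypes": []})
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Split "MBCAP05_CMWA68_h0" into ("MBCAP05_CMWA68", "h0").
        individual, _, hap = row["Index"].rpartition("_")
        grouped[individual]["lineage"] = row["Lineage"]
        grouped[individual]["haplotypes"].append(hap)
    return dict(grouped)


# The rows from the format example above.
example = (
    "Index\tLineage\n"
    "SP03_indexing22_h0\trTE\n"
    "MBCAP05_CMWA68_h0\trTE\n"
    "MBCAP05_CMWA68_h1\trTE\n"
    "MBCAP05_23808_h0\tmet\n"
    "MBCAP05_23808_h1\tmet\n"
)
grouped = haplotypes_per_individual(example)
incomplete = [ind for ind, info in grouped.items()
              if sorted(info["haplotypes"]) != ["h0", "h1"]]
print(incomplete)
```

In the excerpt shown above only one haplotype of `SP03_indexing22` is listed, so that individual is reported; on the full file the same check would flag any individual missing a haplotype entry.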
Analysis specification file: 05_ComparisonsOfInterest.txt
- Type: Tab-delimited text file (plain text)
- Purpose: Specifies which lineages to analyze and provides output directory names
- Columns:
  - `lineage` (column 1): Phylogeographic lineage code from 04_samples_lineage.txt
  - `analysis` (column 2): Short name/identifier for output (used in YAML and result filenames)
- Example contents (tab-delimited in the file, shown here as a table):

| lineage | analysis |
|---|---|
| jARP | jARP |
| jBKR | jBKR |
| jKimb | jKimb |
| rTE | rTE |
| met | met |
| meg | meg |
| rKimb | rKimb |

- Notes:
  - One row per lineage to be analyzed independently
  - Analysis names become prefixes in output files
Prepared data archive: 06_dils_fasta.tar.gz
- Type: Compressed TAR archive (.tar.gz)
- Contents: All sequences extracted from 03_fasta_files.zip and reformatted according to DILS specifications
- Decompression command: `tar -xzf 06_dils_fasta.tar.gz`
- Internal structure: One directory per lineage containing prepared FASTA files
- File organization: `[lineage]/[locus_name].fasta`
Configuration archive: 07_dils_yaml.tar.gz
- Type: Compressed TAR archive (.tar.gz)
- Contents: All YAML configuration files generated for DILS runs
- Decompression command: `tar -xzf 07_dils_yaml.tar.gz`
- Naming convention: `[lineage]_rep[1-5].yaml`
- Total files: 35 YAML files (7 lineages × 5 replicates)
- Each file specifies: Input FASTA paths, output directory, priors, and all model parameters
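The naming convention makes it easy to verify that all configurations are present after decompression. The Python sketch below enumerates the expected file names for the seven single-population lineages listed in 05_ComparisonsOfInterest.txt and reports any that are missing from a given listing. The helper names are hypothetical, and the same function can be reused for the result archives by passing `suffix=".tar.gz"`.

```python
def expected_names(lineages, n_reps=5, suffix=".yaml"):
    """List [lineage]_rep[1..n_reps] file names for every lineage."""
    return [f"{lineage}_rep{rep}{suffix}"
            for lineage in lineages
            for rep in range(1, n_reps + 1)]


def missing_names(expected, present):
    """Return expected names that do not occur in the present listing."""
    present = set(present)
    return [name for name in expected if name not in present]


lineages = ["jARP", "jBKR", "jKimb", "rTE", "met", "meg", "rKimb"]
expected = expected_names(lineages)
print(len(expected))  # 35 (7 lineages x 5 replicates)

# Simulated directory listing with one file absent.
print(missing_names(expected, expected[:-1]))  # ['rKimb_rep5.yaml']
```

With real data, the `present` argument could come from `os.listdir()` on the extracted folder.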
Execution script: 08_runDILS_SLURM.sh
- Type: SLURM cluster job submission script (shell/bash)
- Purpose: Submits ABC demographic inference jobs in parallel to a high-performance computing cluster
- Dependencies:
- SLURM job scheduler (must be installed on cluster)
- DILS software v1.0 or compatible (available from https://github.com/popgenomics/DILS.git)
- Perl (often pre-installed on clusters)
- Key modifications before running:
  - Line defining `analysis=`: Change `/path/to/yaml_folder` to actual directory containing YAML files
  - Ensure `/path/to/DILS/DILS/bin/DILS_1pop.sh` points to correct DILS executable
- Execution command: `sbatch 08_runDILS_SLURM.sh` (or `qsub` if using different scheduler)
- Cluster resource parameters:
  - `-p long`: job partition (modify for your cluster)
  - Adjust time limits, memory, and CPU cores as needed
- What the script does:
  - Loops through each YAML file in the directory
  - Calls DILS executable: `/path/to/DILS/DILS/bin/DILS_1pop.sh [config.yaml]`
  - Moves completed .tar.gz results to 09_done/ directory
  - Logs job output to .out/.err files
- Job monitoring: Check `bDILS.*.out` and `bDILS.*.err` files for execution logs
Results directory: 09_done/
- Type: Directory containing DILS output archives
- Contents: Compressed TAR files with ABC inference results for each lineage × replicate
- File naming pattern: `[lineage]_rep[1-5].tar.gz`
- Total files: 35 archives (7 lineages × 5 replicates each)
- Example contents per .tar.gz (when extracted):
  - `general_infos.txt`: Summary metadata (dataset name, date, parameters used)
  - `modelComp/report_[lineage].txt`: Full model comparison with AIC values for all evaluated models
  - `best_model/report_[lineage].txt`: Detailed parameter estimates for best-supported model
  - `[lineage]_infos.txt`: Lineage-specific values and convergence statistics
  - Various auxiliary posterior distribution files (may be in subdirectories)
- Typical file size: 10-100 MB per .tar.gz depending on simulation output verbosity
- Example extraction: `tar -xzf jARP_rep1.tar.gz`
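The report files can also be located without unpacking an archive to disk. The following minimal Python sketch uses the standard-library `tarfile` module to list members whose file name contains `report_`. The function name `list_reports` is illustrative, and the exact internal layout of each archive (for example a leading top-level directory) is not assumed; matching is done on the base name only.

```python
import tarfile


def list_reports(archive_path):
    """Return member paths in a result archive whose file name contains 'report_'."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and "report_" in member.name.rsplit("/", 1)[-1]
        )
```

A call such as `list_reports("09_done/jARP_rep1.tar.gz")` would then return the paths of the model-comparison and best-model reports, which can be read individually with `TarFile.extractfile`.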
Analysis and results compilation notebook: 010_DILS_plottingResults.Rmd
- Type: R Markdown format notebook (combines code, output, narrative)
- Purpose: Automates extraction of DILS results, applies quality filters, generates publication-quality plots and summary tables
- Software requirements:
  - R version ≥4.0
  - Required packages and versions:
    - `knitr` (text processing and literate programming)
    - `kableExtra` (advanced table formatting)
    - `flextable` (flexible table layout for documents)
    - `tidyverse` (≥1.3; includes ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats)
    - `hues` (color utilities for plots)
    - `ggpubr` (ggplot2 publication-ready plots)
    - `data.table` (efficient data frame operations)
- Key user-adjustable parameters:
  - Line 33: `folder_path <- '/path/to/folder'` (set working directory)
  - Line 40: `populations_to_exclude <- c("dae","jKimb","jARP")` (choose which populations to exclude)
- Processing workflow:
  - Automatic detection: Finds all .tar.gz files in 09_done/
  - Extraction: Temporarily unpacks each archive to access result files
  - Sequential filtering:
    - Step 1a (optional): Strict p-value filtering to remove poorly fit runs
    - Step 1b (always): Selects best performing run per lineage based on model probability
    - Step 2 (optional): Removes excluded populations from all downstream visualization
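The sequential filtering can be sketched in a few lines. The Python example below follows the order of the steps above but is not the code from the R Markdown notebook: the dictionary keys (`lineage`, `rep`, `model_prob`, `gof_pvalue`), the direction of the p-value filter (runs below the threshold are dropped), and all example values are assumptions made for illustration.

```python
def select_best_runs(runs, exclude=(), min_gof_pvalue=None):
    """Pick one run per lineage, following the filtering order described above.

    runs: list of dicts with the hypothetical keys
          "lineage", "rep", "model_prob" and "gof_pvalue".
    """
    best = {}
    for run in runs:
        # Step 1a (optional): drop runs with a poor goodness-of-fit p-value.
        if min_gof_pvalue is not None and run["gof_pvalue"] < min_gof_pvalue:
            continue
        # Step 1b: keep the run with the highest model probability per lineage.
        current = best.get(run["lineage"])
        if current is None or run["model_prob"] > current["model_prob"]:
            best[run["lineage"]] = run
    # Step 2 (optional): remove excluded populations.
    return {lin: run for lin, run in best.items() if lin not in exclude}


# Made-up values for illustration only.
runs = [
    {"lineage": "jARP", "rep": 1, "model_prob": 0.91, "gof_pvalue": 0.40},
    {"lineage": "met", "rep": 1, "model_prob": 0.70, "gof_pvalue": 0.30},
    {"lineage": "met", "rep": 2, "model_prob": 0.85, "gof_pvalue": 0.20},
    {"lineage": "met", "rep": 3, "model_prob": 0.95, "gof_pvalue": 0.01},
]
selected = select_best_runs(runs, exclude=("jARP",), min_gof_pvalue=0.05)
print({lineage: run["rep"] for lineage, run in selected.items()})
```

In this toy input, replicate 3 of `met` has the highest model probability but is removed by the optional p-value filter, so replicate 2 is retained.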
Results file: 011_dilsOut.rds
- Type: R binary object file (.rds format)
- Purpose: Compiled output object containing extracted and processed DILS demographic inference results for all lineages and replicates
- Data structure: R data frame with one row per DILS analysis (lineage × replicate combination) and multiple columns containing:
  - Metadata columns:
    - `analysis`: Type of analysis
    - `species1`: Population/lineage name
    - `rep`: Replicate number (1-5)
  - DILS output columns: Extracted from best-model and model-comparison result files including:
    - Demographic parameter estimates (effective population size, divergence times where applicable, migration rates)
    - Model comparison statistics
    - Posterior distribution summaries
    - Convergence and goodness-of-fit metrics
- How to use: Load in R with `dilsOut <- readRDS('011_dilsOut.rds')` to access processed results without re-extracting from .tar.gz archives
- Contents:
  - Entries for single- and two-population analyses (one per lineage × replicate from 09_done directory)
  - Best-performing replicate per lineage selected for downstream visualization and publication
- Note: This file is an intermediate output generated for visualization purposes. Primary demographic parameter estimates are extracted from individual DILS output files in the 09_done/ directory
3b. Directory: DILS/two_pop/
Comparative two-population demographic models analyzing divergence and gene flow between population pairs.
Directory structure and workflow: Identical to single_pop/ in organization, but with critical differences in model specification and analysis design:
Modified configuration: 02_template.yaml
- Key changes for two-population model:
  - `nspecies: 2` specifies two-population analysis mode
  - `nameA` and `nameB`: names identifying the two populations being compared (e.g., "species1" and "species2")
  - `nameOutgroup`: optional outgroup designation (set to NA if not used)
  - Parameters for inter-population processes:
    - `Tsplit_min` and `Tsplit_max`: divergence time between the two populations (in generations)
    - `M_min` and `M_max`: migration rates for gene flow between populations
    - `useSFS`: whether to use site frequency spectrum (0=no, 1=yes)
- All other parameters (Ne ranges, mutation rate, etc.) match single_pop values for consistency
Modified analysis specification: 05_ComparisonsOfInterest.txt
- Contains population pair specifications (tab-delimited in the file, shown here as a table):

| lineage | analysis |
|---|---|
| dae_jARP | dae_jARP |
| jARP_jBKR | jARP_jBKR |
| jARP_jKimb | jARP_jKimb |
| jBKR_jKimb | jBKR_jKimb |
| meg_rKimb | meg_rKimb |
| rKimb_rTE | rKimb_rTE |

- Format: Different from single_pop; entries list both lineages being compared
Results in 09_done/ directory:
- Additional result files per comparison:
  - `modelComp/`: Model comparison statistics for divergence/no-divergence/divergence-with-migration scenarios
  - `best_model/`: Parameter estimates including divergence time and bidirectional migration rates
  - Posterior distributions for demographic parameters specific to two-population models
- File naming: `[pop1]_[pop2]_rep[1-5].tar.gz`
- Interpretation: Results show divergence time estimates, migration rate estimates, and support for different gene flow scenarios
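When collecting two-population results, the archive name itself identifies the pair and the replicate. The Python sketch below parses names of the form `[pop1]_[pop2]_rep[1-5].tar.gz` with a regular expression. It assumes lineage codes contain no underscores, which holds for the codes listed in this README; the pattern and function name are illustrative.

```python
import re

# Assumes lineage codes contain no underscores (true for the codes listed here).
ARCHIVE_PATTERN = re.compile(
    r"^(?P<pop1>[^_]+)_(?P<pop2>[^_]+)_rep(?P<rep>[1-5])\.tar\.gz$"
)


def parse_two_pop_archive(filename):
    """Split a two-population archive name into (pop1, pop2, replicate)."""
    match = ARCHIVE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    return match["pop1"], match["pop2"], int(match["rep"])


print(parse_two_pop_archive("dae_jARP_rep3.tar.gz"))  # ('dae', 'jARP', 3)
```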
Scripts and outputs otherwise identical to single_pop/
- Same DILS processing and formatting script (01_prepareFastaDILS.r)
- Identical results visualization workflow (010_DILS_plottingResults.Rmd, 011_dilsOut.rds)
- SLURM submission script structure identical (08_runDILS_SLURM.sh)
Data Access
Related Data in Public Repositories
Mitochondrial ND2 sequences:
- Repository: NCBI GenBank
- Accession numbers: PQ155520–PQ155900
- Access: https://www.ncbi.nlm.nih.gov/nucleotide/?term=PQ155520:PQ155900
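For batch retrieval it can be convenient to expand the accession range into individual identifiers. The Python sketch below does this for the range given above; the function name is hypothetical, and it assumes both endpoints share the same letter prefix and digit width.

```python
def expand_accessions(first, last):
    """Expand an accession range such as PQ155520..PQ155900 into a list."""
    digits = "0123456789"
    prefix = first.rstrip(digits)
    if last.rstrip(digits) != prefix:
        raise ValueError("accessions must share the same prefix")
    width = len(first) - len(prefix)
    start, stop = int(first[len(prefix):]), int(last[len(prefix):])
    if stop < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{number:0{width}d}" for number in range(start, stop + 1)]


accessions = expand_accessions("PQ155520", "PQ155900")
print(len(accessions))                 # 381
print(accessions[0], accessions[-1])   # PQ155520 PQ155900
```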
Nuclear exon-capture sequences (raw reads):
- Repository: NCBI Sequence Read Archive (SRA)
- BioProject: PRJNA1171859
- Access: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1171859
- Note: Raw fastq files; see manuscript for assembly/alignment procedures
How to Cite This Data
If you use this dataset, please cite:
Recommended citation:
- Hayden Bofill, S. I., Potter, S., Afonso Silva, A. C., Moritz, C., & Blom, M. P. K. (2026). Data from: Diversification across the Australian Monsoonal Tropics: Comparing phylogeographic and demographic patterns within and between species of Cryptoblepharus skinks. Heredity, https://doi.org/[manusc DOI]. Dryad Digital Repository. https://doi.org/10.5061/dryad.x3ffbg801
Software citations (also include):
- Fraïsse, C., Popovic, I., Mazoyer, C., Spataro, B., Delmotte, S., Romiguier, J., ... & Roux, C. (2021). DILS: Demographic inferences with linked selection by using ABC. Molecular Ecology Resources, 21(8), 2629-2644. https://doi.org/10.1111/1755-0998.13323
- R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
