Comparative phylogeography of phrynosomatid lizards in Baja California: Asynchronous divergences and expansion of Callisaurus draconoides across the North American deserts
Data files
Oct 30, 2025 version, files: 164.42 MB
- 01_BCP_ddRAD_pyrad.zip (13.91 MB)
- 02_BCP_ddRAD_admixture.zip (336.47 KB)
- 03_BCP_ddRAD_raxml.zip (5.69 MB)
- 04_BCP_ddRAD_bpp.zip (1.18 MB)
- 05_BCP_TSC_phyluce.zip (7.97 MB)
- 06_BCP_TSC_beast2.zip (65.87 MB)
- 07_BCP_TSC_starbeast2.zip (41.53 MB)
- 08_BCP_ecoevolity.zip (7.78 MB)
- 09_callisaurus_rangewide_pyrad.zip (16.38 MB)
- 10_callisaurus_rangewide_admixture.zip (196.93 KB)
- 11_callisaurus_rangewide_raxml.zip (3.56 MB)
- README.md (20.32 KB)
Abstract
This dataset comprises eleven archives containing processed genomic data, analysis inputs/outputs, and supporting files from double-digest RAD sequencing (ddRAD) and target sequence capture (TSC). Data span four genera (Callisaurus, Petrosaurus, Sceloporus, and Urosaurus) and include pyRAD assemblies, Admixture clustering, RAxML phylogenies, BPP coalescent-with-migration models, phyluce TSC assemblies, BEAST2 and StarBEAST2 species trees, and ecoevolity divergence-time inferences across Baja California biogeographic breaks. Files are organized by analysis and genus, with configuration files, job scripts, summary statistics, and outputs in formats such as VCF, structure/geno, Phylip/Nexus alignments, and phased allele sequences. Geographic coordinates are provided at reduced precision (0.01°) to limit exact locality disclosure. These data are suitable for reuse in comparative phylogeography, population genomics, and phylogenetic method development, enabling replication of published analyses or testing of alternative models and pipelines. Collection and use of specimens were conducted under relevant permits and institutional animal care protocols (see publication for details), and downstream use should continue to respect applicable legal and ethical standards.
Dataset DOI: 10.5061/dryad.tqjq2bwcb
Summary
This dataset is associated with the article titled "Comparative phylogeography of phrynosomatid lizards in Baja California: asynchronous divergences and expansion of Callisaurus draconoides across the North American deserts", accepted to Journal of Biogeography on September 30, 2025 (DOI: 10.1111/jbi.70075). A preprint is also available here.
We did our best to thoroughly document every analysis presented in the paper to fully enable reproducibility of the key results, but not every intermediate file, log file, or job script is included. If you have any questions or concerns, please contact the corresponding author, Andrew Gottscho (gottschoa@si.edu or andrew.gottscho@gmail.com).
Thank you for your interest in this article and dataset!
Frequently used acronyms
Throughout this Data Dryad package and the associated paper, you will frequently encounter the following acronyms.
- BCP = Baja California Peninsula
- ddRAD = double-digest restriction-site-associated DNA sequencing
- TSC = Target Sequence Capture
Raw data availability
Raw sequence data (FASTQ format) have been deposited in the NCBI SRA (https://www.ncbi.nlm.nih.gov/sra/PRJNA1242740). These FASTQ data are key inputs to the pyRAD (ddRAD) and phyluce (TSC) pipelines.
Description of the data and file structure
Eleven .zip files are available to download, roughly following the order in which they are presented in the manuscript.
01_BCP_ddRAD_pyrad.zip
This package contains input files, output files, and statistics associated with the pyrad v3.0.6 pipeline, which was used to process the raw FASTQ data (BCP ddRAD) into a variety of output formats.
There are four top-level directories, corresponding to the four genera included in the study: Callisaurus/, Petrosaurus/, Urosaurus/, and Sceloporus/.
Within each of these top-level directories, there are the following files:
- [genus]_params_021017.txt: The input parameters file (the most important file for reproducibility)
- step2.job: Example jobscript file for step 2 of the pipeline
- step3_[genus].job: Example jobscript file for step 3 of the pipeline
- steps5_7_[genus].job: Example jobscript file for steps 5-7 of the pipeline
Within each top-level directory, there are also the following sub-directories:
- outfiles/: Please see the pyrad documentation for more details on file formats. Not all output files were used in the manuscript, but are provided here for maximum transparency and utility to anyone interested in reanalyzing this dataset.
  - [n] = the MinCov parameter (minimum samples in a final locus).
  - output_[genus]_021017_[n]_h5_p75.alleles: alleles file
  - output_[genus]_021017_[n]_h5_p75.excluded_loci: excluded loci
  - output_[genus]_021017_[n]_h5_p75.gphocs: GPHOCS format
  - output_[genus]_021017_[n]_h5_p75.loci: loci file
  - output_[genus]_021017_[n]_h5_p75.nex: nexus format
  - output_[genus]_021017_[n]_h5_p75.phy: phylip format
  - output_[genus]_021017_[n]_h5_p75.phy.partitions: partitioned phylip format
  - output_[genus]_021017_[n]_h5_p75.snps: single nucleotide polymorphisms (SNPs)
  - output_[genus]_021017_[n]_h5_p75.snps.geno: SNPs in geno format
  - output_[genus]_021017_[n]_h5_p75.str: structure format
  - output_[genus]_021017_[n]_h5_p75.unlinked_snps: unlinked SNPs
  - output_[genus]_021017_[n]_h5_p75.usnps.geno: unlinked SNPs in geno format
  - output_[genus]_021017_[n]_h5_p75.vcf: variant call file
- stats/: Refer to the pyrad documentation for more details.
  - output_[genus]_021017_[n]_h5_p75.stats: stats summary file (source for Supplementary Table 4)
  - s5.consens.txt: stats for step 5
  - s3.clusters.txt: stats for step 3
  - s2.rawedit.txt: stats for step 2
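After extracting the archive, a short script can confirm that an outfiles/ directory holds every format listed above. The following is a minimal sketch, not part of the deposited files: the function name `missing_formats` is hypothetical, and the suffix list is simply copied from the list above.

```python
from pathlib import Path

# Suffixes copied from the outfiles/ list in this README.
EXPECTED_SUFFIXES = [
    ".alleles", ".excluded_loci", ".gphocs", ".loci", ".nex", ".phy",
    ".phy.partitions", ".snps", ".snps.geno", ".str", ".unlinked_snps",
    ".usnps.geno", ".vcf",
]


def missing_formats(outfiles_dir):
    """Return the expected suffixes that have no matching file in outfiles_dir."""
    names = [p.name for p in Path(outfiles_dir).iterdir() if p.is_file()]
    missing = []
    for suffix in EXPECTED_SUFFIXES:
        # File names follow output_[genus]_021017_[n]_h5_p75<suffix>.
        if not any(name.endswith("_p75" + suffix) for name in names):
            missing.append(suffix)
    return missing
```

For example, `missing_formats("Callisaurus/outfiles")` should return an empty list if the directory was extracted completely.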
02_BCP_ddRAD_admixture.zip
This package contains the inputs, outputs, and other relevant files for the Admixture analysis of the BCP ddRAD data, presented in Figures 2-5 in the article. For more information, consult the Admixture website.
- There are four top-level directories, one for each genus: callisaurus/, petrosaurus/, sceloporus/, and urosaurus/. Each directory contains:
  - admixture_plot_[genus].R: R script used to plot results
  - CVE_values.txt: cross-validation errors, used to determine the optimal K value for each genus
  - output_[genus]_021017_[n]_h5_p75.usnps.[k].P: 9 allele frequencies files, one for each K value
  - output_[genus]_021017_[n]_h5_p75.usnps.[k].Q: 9 ancestry proportions files, one for each K value
  - output_[genus]_021017_[n]_h5_p75.usnps.geno: The input file in .geno format
- CVE_table.csv: A summary of the cross-validation errors for all genera
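Choosing K from cross-validation errors amounts to taking the K with the lowest error. The sketch below assumes the errors are available as lines in the form Admixture prints when run with cross-validation, `CV error (K=3): 0.52`; the layout of CVE_values.txt is not documented here, so treat the parsing step as an assumption. The numbers in `example` are invented for illustration.

```python
import re

# Matches the lines Admixture prints during cross-validation.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(text):
    """Map each K to its cross-validation error."""
    return {int(k): float(err) for k, err in CV_LINE.findall(text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


# Invented values, for illustration only.
example = """CV error (K=1): 0.61
CV error (K=2): 0.48
CV error (K=3): 0.52"""
errors = parse_cv_errors(example)
print(best_k(errors))
```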
03_BCP_ddRAD_raxml.zip
This package contains the inputs, outputs, and other files necessary to reproduce the RAxML analyses for the BCP ddRAD data, presented in Figures 2-5 in the article. To learn more, see the RAxML website.
- There are four top-level directories, one for each genus: callisaurus/, petrosaurus/, sceloporus/, and urosaurus/. Each directory contains:
  - output_[genus]_021017_[n]_h5_p75.phy: The input file from pyrad
  - RAxML_bipartitions.[genus].tre: The output .tre file presented in Figures 2-5
  - raxml_[genus].job: Jobscript file used to run the analysis
  - raxml_[genus].log: Log file from the analysis
04_BCP_ddRAD_bpp.zip
This package contains input files for Bayesian Phylogenetics and Phylogeography (BPP) analyses conducted under the Multispecies Coalescent with Migration model (MSC-M). The data include phased genomic data, population/species mapping files, and multiple BPP control (.ctl) files for five species complexes. These results are presented in Tables 4-5 and Supplemental Table 6 in the article. For more details, see the BPP repository on github.
Each folder corresponds to a focal species and contains:
- A data file (data.txt) with phased allele sequences formatted for BPP
- One or more imap files (imap.txt, imap2.txt, etc.) that map individuals to populations or species units used in the MSC-M analyses
- One or more BPP control files (bpp1.ctl, bpp2.ctl, etc.), each specifying a different migration scenario
These files can be used to replicate the analyses in the associated publication or adapted for additional analyses of gene flow using the MSC-M framework.
The package contains folders for each species complex:
- Petrosaurus/
- Sceloporus_magister/
- Sceloporus_orcutti/
- Urosaurus/
- Callisaurus/
Each species complex has the following files:
- data.txt
  - Phased ddRADseq data formatted for BPP input
  - Each data file contains two alleles per individual, and is ready for direct use in MSC-M analyses
- imap.txt, imap2.txt, etc.
  - These files define the mapping of individuals (allele pairs) to species or population units
  - Multiple imap files are provided where alternative grouping hypotheses were tested
- bpp1.ctl, bpp2.ctl, etc.
  - Control files for BPP, each specifying model parameters, file paths, and the migration scenario being analyzed
  - Each .ctl file corresponds to a specific analysis or migration model (e.g., different gene flow model, different population assignments)
Species folder details:
- Petrosaurus/:
  - data.txt: Phased alleles for all individuals
  - imap.txt, imap2.txt: Two different population/species groupings tested
  - bpp1.ctl, bpp2.ctl, bpp3.ctl: Three different migration scenarios
- Sceloporus_magister/:
  - data.txt: Input data for analysis
  - imap.txt: Single grouping
  - bpp1.ctl: One migration scenario
- Sceloporus_orcutti/:
  - data.txt: Input data for analysis
  - imap.txt, imap2.txt, imap3.txt: Three different groupings tested
  - bpp1.ctl, bpp2.ctl, bpp3.ctl: Three migration scenarios tested
- Urosaurus/:
  - data.txt: Input data for analysis
  - imap.txt, imap2.txt: Two groupings tested
  - bpp1.ctl, bpp2.ctl: Two migration scenarios
- Callisaurus/:
  - data.txt: Input data for analysis
  - imap.txt: Single grouping
  - bpp1.ctl through bpp9.ctl: Nine different migration models tested
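To compare alternative groupings (for example imap.txt against imap2.txt), it can help to list which individuals each imap file assigns to each population. The sketch below assumes the usual BPP Imap layout of two whitespace-separated columns, individual ID followed by population or species name; the IDs and population names in `example` are hypothetical and do not come from the deposited files.

```python
from collections import defaultdict


def read_imap(text):
    """Group individual IDs by the population they are assigned to."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        individual, population = fields[0], fields[1]
        groups[population].append(individual)
    return dict(groups)


# Hypothetical sample IDs and population labels.
example = """ind01 North
ind02 North
ind03 South
"""
for population, individuals in read_imap(example).items():
    print(population, len(individuals), individuals)
```

Running `read_imap` on two imap files from the same folder and comparing the resulting dictionaries shows which individuals change groups between hypotheses.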
05_BCP_TSC_phyluce.zip
This package contains input files, jobscripts, logs, and a complete set of final output files for the phyluce pipeline, which was used to process the TSC data. For more details, please see the phyluce documentation.
- assembly.conf: configuration file used for assembly
- illumiprocessor.conf: configuration file used for Illumiprocessor
- illumiprocessor.log: log file used for Illumiprocessor
- lizard_probes_edit.fasta: probe files used in the TSC workflow
- mafft-nexus-internal-trimmed-gblocks-clean-75p/: output files for 75% complete data in nexus format
  - 549 nexus files provided, one for each locus, following the format [locus_name].nexus
- mafft-nexus-internal-trimmed-gblocks-clean-75p-raxml/: output file for 75% complete data in phylip format
  - a single concatenated file is provided
- mafft-nexus-internal-trimmed-gblocks-clean-90p/: output files for 90% complete data in nexus format
  - 310 nexus files provided, one for each locus, following the format [locus_name].nexus
- mafft-nexus-internal-trimmed-gblocks-clean-90p-raxml/: output file for 90% complete data in phylip format
  - a single concatenated file is provided
- phyluce_assembly_assemblo_trinity.log: log file for phyluce assembly
- phyluce_assembly_get_match_counts.log: log file for phyluce assembly
- phyluce_assembly_match_contigs_to_probes.log: log file for phyluce assembly
- step2_illumiprocessor.job/.log: jobscript/log files for step 2 (Illumiprocessor)
- step3_trinity.job/.log: jobscript/log files for step 3 (Trinity)
- step4_fasta_lengths.job/.log: jobscript/log files for step 4
- step5_assembly_match_contigs_probes.job/.log: jobscript/log files for step 5
- step6_get_match_counts_baja.job/.log: jobscript/log files for step 6
- step6_get_match_counts.job/.log: jobscript/log files for step 6
- step7_get_fastas_from_match_counts.job/.log: jobscript/log files for step 7
- step8_explode_get_fastas_file.job/.log: jobscript/log files for step 8
- step9_get_fasta_lengths.job/.log: jobscript/log files for step 9
- step10_align_seqcap_align.job/.log: jobscript/log files for step 10
- step11_get_align_summary_data.job/.log: jobscript/log files for step 11
- step12_align_seqcap_align.job/.log: jobscript/log files for step 12
- step13_get_gblocks_trimmed_alignments_from_untrimmed.job/.log: jobscript/log files for step 13
- step14_get_align_summary_data.job/.log: jobscript/log files for step 14
- step15_remove_locus_name_from_nexus_lines.job/.log: jobscript/log files for step 15
- step16_get_only_loci_with_min_taxa.job/.log: jobscript/log files for step 16
- step17_format_nexus_files_for_raxml.job/.log: jobscript/log files for step 17
- taxon-set-baja.conf: taxon set file
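The locus counts stated above (549 files at 75% completeness, 310 at 90%) give a simple integrity check for the extracted archive. The following sketch counts the .nexus files in each alignment directory and pairs the result with the expected number; the function name `check_locus_counts` is hypothetical, and `root` is assumed to be the directory into which the archive was extracted.

```python
from pathlib import Path

# Directory names and locus counts as stated in this README.
EXPECTED_LOCI = {
    "mafft-nexus-internal-trimmed-gblocks-clean-75p": 549,
    "mafft-nexus-internal-trimmed-gblocks-clean-90p": 310,
}


def check_locus_counts(root):
    """Return {directory: (found, expected)} for each alignment directory."""
    report = {}
    for name, expected in EXPECTED_LOCI.items():
        directory = Path(root) / name
        found = len(list(directory.glob("*.nexus"))) if directory.is_dir() else 0
        report[name] = (found, expected)
    return report
```

A directory whose `found` value differs from `expected` suggests an incomplete download or extraction.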
06_BCP_TSC_beast2.zip
This package contains input files, selected output files, and jobscripts used to generate a phylogeny of the concatenated BCP TSC data, presented in Supplemental Figure 1 in the article. For more details, see the BEAST2 web page.
- mafft-nexus-internal-trimmed-gblocks-clean-75p.phylip: concatenated data used as input, directly from phyluce
- baja_TSC_75p.xml: Input file for the analysis
- baja_TSC_mafft-nexus-internal-trimmed-gblocks-clean-75p_run1.trees: Trees from the first run (run 1)
- baja_TSC_mafft-nexus-internal-trimmed-gblocks-clean-75p_run3.trees: Trees from the second run (run 3)
- BEAST_TSC.job: job file used to run BEAST
- combined_trees2.trees: combined trees across two runs, after discarding burn-in
- max_clade_cred.tre: The final maximum clade consensus tree used to generate Supplemental Figure 1
07_BCP_TSC_starbeast2.zip
This package contains the input files, selected output files, and jobscript used to run the StarBEAST analysis, presented in Figure 6 in the article. For more details, please see the StarBEAST tutorial.
- starbeast.job: jobscript file used to run the analysis
- combined_species.trees: combined species trees resulting from three independent runs, after discarding burn-in
- species_run1.trees: species trees resulting from the first run
- species_run3.trees: species trees resulting from the second run
- species_run4.trees: species trees resulting from the third run
- species.tree: maximum clade consensus tree, presented in Figure 6
- SpeciesTreeUCLN_26exons_HKY_500million_2.4.5.xml: input file used to run the analysis
08_BCP_ecoevolity.zip
This archive contains input files for ecoevolity analyses conducted on two types of genomic data: ddRAD and TSC. The data are organized to reflect two separate biogeographic tests across the Baja California peninsula: the La Paz and Vizcaíno biogeographic breaks. These results are presented in Figures 7 & 8 in the article. For more details, see the ecoevolity repository on github.
The base directory contains two main subdirectories:
- BCP_ddRAD/
- TSC/
Each of these directories contains two subfolders, representing the two biogeographic regions tested:
- lapaz/
- vizcaino/
Within each biogeographic subfolder (lapaz and vizcaino), there are three key directories:
- data/
  - Contains the sequence data in NEXUS format. Each file corresponds to a population pair used in the ecoevolity analysis. The filenames include species identifiers and dataset parameters (e.g., filtering thresholds and region).
- Independent_prior/
  - Contains a single file: configuration.yml
  - This YAML file specifies the ecoevolity run configuration using an independent prior for each divergence event across population pairs.
- Shared_prior/
  - Contains a single file: configuration.yml
  - This YAML file specifies the ecoevolity run configuration using a shared prior across divergence events.
Contents Overview:
Baja_phryno_ecoevolity/
├── ddRAD/
│   ├── lapaz/
│   │   ├── data/
│   │   ├── Independent_prior/
│   │   └── Shared_prior/
│   └── vizcaino/
│       ├── data/
│       ├── Independent_prior/
│       └── Shared_prior/
└── uce/
    ├── lapaz/
    │   ├── uce/
    │   ├── Independent_prior/
    │   └── Shared_prior/
    └── vizcaino/
        ├── uce/
        ├── independent_prior/
        └── shared_prior/
09_callisaurus_rangewide_pyrad.zip
This package contains input files, output files, and statistics associated with the pyrad v3.0.66 pipeline, which was used to process the raw FASTQ data (range-wide Callisaurus, ddRAD) into a variety of output formats.
- steps2-7.job: Example jobscript file for steps 2-7 of the pipeline
- callisaurus_params_051917.txt: The input parameters file (the most important file for reproducibility)
This package also contains the following directories:
- outfiles/: Please see the pyrad documentation for more details on file formats. Not all output files were used in the manuscript, but are provided here for maximum transparency and utility to anyone interested in reanalyzing this dataset.
  - output_callisaurus_051917_n102_h5_p75.alleles: alleles file
  - output_callisaurus_051917_n102_h5_p75.excluded_loci: excluded loci
  - output_callisaurus_051917_n102_h5_p75.gphocs: GPHOCS format
  - output_callisaurus_051917_n102_h5_p75.loci: loci file
  - output_callisaurus_051917_n102_h5_p75.nex: nexus format
  - output_callisaurus_051917_n102_h5_p75.phy: phylip format
  - output_callisaurus_051917_n102_h5_p75.phy.partitions: partitioned phylip format
  - output_callisaurus_051917_n102_h5_p75.snps: single nucleotide polymorphisms (SNPs)
  - output_callisaurus_051917_n102_h5_p75.snps.geno: SNPs in geno format
  - output_callisaurus_051917_n102_h5_p75.str: structure format
  - output_callisaurus_051917_n102_h5_p75.unlinked_snps: unlinked SNPs
  - output_callisaurus_051917_n102_h5_p75.usnps.geno: unlinked SNPs in geno format
  - output_callisaurus_051917_n102_h5_p75.vcf: variant call file
  - An identical set of files with the suffix _ex_outgr mirrors the files above, but excludes the outgroups (Holbrookia).
- stats/: Refer to the pyrad documentation for more details.
  - output_callisaurus_051917_n102_h5_p75_ex_outgr.stats: stats summary file
  - output_callisaurus_051917_n102_h5_p75.stats
  - s2.rawedit.txt: stats for step 2
  - s3.clusters.txt: stats for step 3
  - s5.consens.txt: stats for step 5
10_callisaurus_rangewide_admixture.zip
This package contains the inputs, outputs, and other relevant files for the Admixture analysis of the range-wide Callisaurus ddRAD data, presented in Figure 9 in the article. For more information, consult the Admixture website.
- data_conversion/: As described in the article, plink and PGDSpider were used to convert data from pyRAD into a format usable by the newer version of Admixture (v1.3.0).
  - output_callisaurus_051917_n102_h5_p75_ex_outgr_plink.log: log file from plink
  - output_callisaurus_051917_n102_h5_p75_ex_outgr_plink.nosex: a PLINK output that flags individuals with missing or ambiguous sex information
  - output_callisaurus_051917_n102_h5_p75_ex_outgr.map: SNP map information (chromosome, SNP ID, position)
  - output_callisaurus_051917_n102_h5_p75_ex_outgr.ped: genotype data in a large text table (individual IDs + genotypes)
  - plink-calli_020625.log: log file from plink
  - plink.job: jobscript file used to run plink
- inputs/
  - admix-calli_K5-12.log: log file from Admixture
  - admixture.job: jobscript used to run Admixture
  - output_callisaurus_051917_n102_h5_p75_ex_outgr_plink.bed: input file for Admixture; binary genotype data
  - output_callisaurus_051917_n102_h5_p75_ex_outgr_plink.bim: input file for Admixture; extended SNP map
  - output_callisaurus_051917_n102_h5_p75_ex_outgr_plink.fam: input file for Admixture; family/individual information (sample IDs, sex, phenotype)
- results_plotting/
  - admixture_on_map_callisaurus_020825.R: R script used to plot Admixture results on a map
  - admixture_plot_callisaurus_rangewide.R: R script used to generate barplots
  - callisaurus_K12_run2.data: output data for K=12
  - callisaurus_rangewide_gps.data: GPS coordinates for range-wide Callisaurus, rounded to two decimal places (0.01°) for the purposes of this archive
  - CVE.csv: cross-validation errors, used to determine optimal K
  - output_callisaurus_051917_n102_h5_p75_ex_outgr_plink.[k].P: allele frequencies output by Admixture
  - output_callisaurus_051917_n102_h5_p75_ex_outgr_plink.[k].Q: ancestry proportions estimated by Admixture
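The .Q files can be inspected without R. Admixture writes them as whitespace-delimited text with one row per individual and one column per cluster, with rows in the same order as the input samples. The sketch below reads such a table and reports the cluster with the largest ancestry proportion for each individual; the function names are hypothetical and the values in `example` are invented for illustration.

```python
def read_q(text):
    """Parse a .Q table into a list of per-individual ancestry proportions."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append([float(value) for value in line.split()])
    return rows


def dominant_cluster(rows):
    """Return the 0-based index of the largest proportion for each individual."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Invented ancestry proportions for three individuals at K=3.
example = """0.90 0.05 0.05
0.10 0.70 0.20
0.25 0.25 0.50
"""
rows = read_q(example)
print(dominant_cluster(rows))
```

Pairing these indices with the sample order in the .fam file gives a per-sample cluster assignment, which can then be joined to the rounded coordinates.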
11_callisaurus_rangewide_raxml.zip
This package contains the inputs, outputs, and other files necessary to reproduce the RAxML analyses for the range-wide Callisaurus ddRAD data, presented in Figure 9 in the article. To learn more, see the RAxML website.
- output_callisaurus_051917_n102_h5_p75.phy: The input file from pyrad
- RAxML_bipartitions.output_callisaurus_051917_n102_h5_p75_raxml.tre: The output .tre file presented in Figure 9
- raxml_callisaurus.log: Log file from the analysis
- raxml-jobscript.job: Jobscript file used to run the analysis
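RAxML_bipartitions files are Newick trees in which bootstrap support is written as an integer label directly after the closing parenthesis of each internal node. A minimal sketch for summarizing support without a phylogenetics library follows; it relies on a regular expression rather than a full Newick parser, which is an assumption that holds only when support labels are plain integers. The tree in `example` is invented.

```python
import re


def support_values(newick):
    """Extract integer support labels that follow a closing parenthesis."""
    return [int(value) for value in re.findall(r"\)(\d+)", newick)]


def fraction_supported(values, threshold=70):
    """Fraction of labeled nodes with support at or above the threshold."""
    if not values:
        return 0.0
    return sum(v >= threshold for v in values) / len(values)


# Invented five-taxon tree with two labeled internal nodes.
example = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.1)60:0.02,E:0.3);"
values = support_values(example)
print(values, fraction_supported(values))
```

The threshold of 70 is only a common rule of thumb and can be changed through the `threshold` argument.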
- Gottscho, Andrew D.; Hollingsworth, Bradford D.; Espinal, Julio Lemos et al. (2025). Comparative Phylogeography of Phrynosomatid Lizards in Baja California: Asynchronous Divergences and Expansion of Callisaurus draconoides Across the North American Deserts. Journal of Biogeography. https://doi.org/10.1111/jbi.70075
