Population genomics of flat-tailed horned lizards (Phrynosoma mcallii) informs conservation and management across a fragmented Colorado Desert landscape
Data files
Apr 02, 2024 version files 85.46 MB
- adegenet.zip (696.20 KB)
- admixture.zip (5.24 MB)
- EEMS.zip (1.06 MB)
- iPyrad.zip (34.35 MB)
- IQtree.zip (17.72 MB)
- Moments.zip (7.89 MB)
- NCBI.zip (2.63 KB)
- README.md (3.99 MB)
- splitstree4.zip (10.34 MB)
- vcf.zip (4.16 MB)
Abstract
Phrynosoma mcallii (flat-tailed horned lizards) is a species of conservation concern in the Colorado Desert of the United States and Mexico. We analyzed ddRADseq data from 45 lizards to estimate population structure, infer phylogeny, identify migration barriers, map genetic diversity hotspots, and model demography. We identified the Colorado River as the main geographic feature contributing to population structure, with the populations west of this barrier further subdivided by the Salton Sea. Phylogenetic analysis confirms that northwestern populations are nested within southeastern populations. The best-fit demographic model indicates Pleistocene divergence across the Colorado River, with significant bidirectional gene flow, and a severe Holocene population bottleneck. These patterns suggest that management strategies should focus on maintaining genetic diversity on both sides of the Colorado River and Salton Sea. We recommend additional lands in the U.S. and Mexico that should be considered for similar conservation goals as those in the Rangewide Management Strategy (RMS). We also recommend periodic rangewide genomic sampling to monitor ongoing attrition of diversity, hybridization, and changing structure due to habitat fragmentation, climate change and other long-term impacts.
https://doi.org/10.5061/dryad.5x69p8dbj
A ddRADseq dataset was collected for 45 lizards (including outgroups). Sequencing occurred on an Illumina NextSeq. FASTQ data were processed, including mapping to the P. platyrhinos reference genome, using iPyrad. After filtering with VCFtools, the data were analyzed with the software packages adegenet, Admixture, splitstree, IQTree2, EEMS, and moments. Inputs, outputs, intermediate files, jobscripts, and other metadata for these analyses are included in this data package. Methods are detailed in the paper. Questions are welcome; please contact the corresponding author at gottschoa@si.edu.
Description of the data and file structure
There are nine zipped files which unpack to the following directories.
adegenet.zip includes the DAPC analysis, run with the adegenet library, in R version 4.3.1.
- adegenet.R is the script used to run the analyses and generate plots.
- thin.recode.vcf contains the filtered input data.
- coords.csv contains GPS coordinates (rounded to two decimal places).
admixture.zip contains the results of the Admixture analysis.
- script.sh is the jobscript with the command line used to run Admixture.
- Directories named run1…run10 represent 10 replicate runs. Within each run directory, the files are structured as follows. There are eight files named final.1.P…final.8.P, eight files named final.1.Q…final.8.Q, and eight files named results1.txt…results8.txt, which collectively represent the output files under K=1-8. See the Admixture documentation (Alexander et al., 2009) for further details.
- admixture_CVE_results.numbers contains the cross-validation errors (CVE) used to determine the optimal K.
- Results of run5 are plotted as k2.png…k8.png (under K=2-8; K=1 not plotted). Results of run5 under K=2-4 are presented in the paper (Figure 4, Figure S2).
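The comparison of cross-validation errors across K and replicate runs can be sketched as follows. This is an illustration only, not the authors' script: it assumes each results file contains a line of the form `CV error (K=2): 0.52341`, which is what Admixture prints when run with `--cv`. The function names and the log snippets are hypothetical placeholders, not values from this dataset.

```python
import re
from collections import defaultdict
from statistics import mean

# Line format Admixture prints with --cv, e.g. "CV error (K=2): 0.52341".
# Assumption: such a line is present in each results<K>.txt file.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in one log's text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def mean_cv_by_k(logs):
    """Average the CV error per K across an iterable of log texts."""
    by_k = defaultdict(list)
    for text in logs:
        for k, err in parse_cv_errors(text).items():
            by_k[k].append(err)
    return {k: mean(errs) for k, errs in sorted(by_k.items())}


def best_k(mean_cv):
    """Return the K with the lowest mean cross-validation error."""
    return min(mean_cv, key=mean_cv.get)


# Placeholder log snippets (illustrative values, NOT results from this study).
example_logs = [
    "Summary...\nCV error (K=1): 0.80\n",
    "CV error (K=2): 0.50\n",
    "CV error (K=3): 0.60\n",
    "CV error (K=2): 0.54\n",
]
summary = mean_cv_by_k(example_logs)
print(summary, best_k(summary))
```

To apply this to the package, one would read the text of each results file under run1…run10 and pass the contents to `mean_cv_by_k`.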
EEMS.zip includes the results of the estimated effective migration surfaces (EEMS) analysis.
- how2run.txt contains instructions on how to convert the input .vcf data to the .diffs format necessary for EEMS. The files from this process are found in the vcf directory (thin.recode.vcf, mcallii.bed, mcallii.bim, mcallii.diffs, mcallii.fam, mcallii.log, mcallii.order).
- The data directory contains the inputs to EEMS. mcallii.coord contains the GPS coordinates, rounded to two decimal places after the analysis. mcallii.diffs contains the genotype data converted as described above. mcallii.outer contains the outer bounds of the geographic region analyzed.
- mcallii300-chain1 is the directory containing the EEMS analysis presented in Figure 5. It holds 31 text files that are beyond the scope of this readme; see Petkova et al. (2016) for extensive documentation.
- A plotting script in R (eems_plotting.R) is provided.
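For readers unfamiliar with the .diffs format, the sketch below shows the kind of quantity it holds: a symmetric matrix of average pairwise genetic dissimilarities between individuals. It is a toy illustration under stated assumptions, not the conversion used here (follow how2run.txt for that). It assumes genotypes coded as allele counts (0, 1, 2), averages squared differences over sites called in both individuals, and uses a hypothetical function name and made-up genotypes.

```python
def pairwise_diffs(genotypes):
    """Average squared genotype difference for every pair of individuals.

    genotypes: list of per-individual lists of allele counts (0, 1, 2),
    with None marking a missing call. Only sites called in both
    individuals contribute to that pair's average.
    """
    n = len(genotypes)
    diffs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = [
                (a - b) ** 2
                for a, b in zip(genotypes[i], genotypes[j])
                if a is not None and b is not None
            ]
            if not shared:
                raise ValueError(f"individuals {i} and {j} share no called sites")
            # The matrix is symmetric with zeros on the diagonal.
            diffs[i][j] = diffs[j][i] = sum(shared) / len(shared)
    return diffs


# Toy genotypes for three individuals at four sites (made-up values).
toy = [
    [0, 1, 2, None],
    [0, 1, 0, 2],
    [2, None, 0, 2],
]
matrix = pairwise_diffs(toy)
for row in matrix:
    print(" ".join(f"{v:.3f}" for v in row))
```

The rows of such a matrix must follow the same individual order as the coordinates file, which is why the conversion also writes an .order file.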
iPyrad.zip contains the results of iPyrad, the pipeline used to process raw FASTQ data to primary outputs.
- all_samples includes the hybrids and outgroup (n=45). This dataset was used for phylogenetic and network analyses. Input parameters to the pipeline are in params-mcallii.txt. Output files include mcallii_stats.txt, mcallii.nex, mcallii.phy, mcallii.snps, and mcallii.vcf. The recoded VCF and log are also provided (biallelic.recode.vcf, biallelic.log).
- ingroup excludes the hybrids and outgroup (n=42). This dataset was used for population structure, demographic modeling, and EEMS. Input parameters to the pipeline are in params-mcallii.txt. Output files include mcallii_stats.txt, mcallii.nex, mcallii.phy, mcallii.snps, and mcallii.vcf. The recoded VCF and log are also provided (biallelic.recode.vcf, biallelic.log).
IQtree.zip includes the results of the rooted phylogenetic analysis.
- The top-level files represent the analysis that was run to select the optimal nucleotide substitution model with ModelFinder. iqtree_MF.job is the jobscript used to run the program and iqtree_MF_011224.log is the resulting log. mcallii.phy is the input matrix. The rest of the files (mcallii.phy.ckp.gz, mcallii.phy.iqtree, mcallii.phy.log, mcallii.phy.model.gz, mcallii.phy.tre, and mcallii.phy.treefile) are outputs or intermediate files.
- The bootstrapped directory contains the files run with the best-fit model and bootstrap analysis, including the final tree files presented in the paper. iqtree_bb.job is the jobscript used to run the program and iqtree_bb_011224.log is the resulting log. mcallii.phy is the input matrix. Outputs and intermediate files include mcallii.phy.bionj, mcallii.phy.ckp.gz, mcallii.phy.contree, mcallii.phy.iqtree, mcallii.phy.log, mcallii.phy.mldist, mcallii.phy.splits.nex, 2024_mcallii.phy.tre, and mcallii.phy.treefile. The outgroup was pruned for visualization purposes in Figure 3; see 2024_mcallii_pruned.phy.tre.
Moments.zip contains the inputs, outputs, and intermediate files for the 16 demographic models tested in this study.
- summary_021623.xlsx summarizes the best-fit model and demographic_conversions_021623.xlsx contains the equations used to convert raw parameters to demographic values (individuals, years).
- There are 13 .R, .py and .txt files (Summarize_Outputs.py, Models_2D.py, moments_2D_00_projections.py, moments_Run_2D.py, moments_Run_Optimizations.py, Optimize_Functions.py, Optimize_Functions4plots.py, Optimizer_GOF.py, Plot_GOF.R, plot_model.py, Results_Summary_Extended.txt, Results_Summary_Short.txt, Simulate_and_Optimize.py) that were used to run the analyses and review results, adapted from Leaché et al. (2019) and Portik et al. (2017).
- There are 32 .txt files that constitute the log and optimized files for the 16 models. For example, west_east.anc_asym_mig_size.log.txt and west_east.anc_asym_mig_size.optimized.txt are the files for the anc_asym_mig_size model. The 16 models are presented in Figure S1 and are adapted from Leaché et al. (2019). To identify which model is which, compare the list of .txt files here with Figure S1.
- west-east.sfs represents the site frequency spectrum.
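The conversion from raw model parameters to demographic values can be illustrated with the scalings conventionally used for moments- and dadi-style models. This is a sketch under stated assumptions, not a copy of the equations in demographic_conversions_021623.xlsx, which remain the authoritative source. The function name, argument names, and all numeric values below are hypothetical placeholders.

```python
def convert_parameters(theta, nu, T, M, mu, L, gen_time):
    """Convert scaled parameters to demographic units (conventional scalings).

    theta    : optimized theta, assumed equal to 4 * N_ref * mu * L
    nu       : population size relative to N_ref
    T        : time in units of 2 * N_ref generations
    M        : scaled migration rate, assumed equal to 2 * N_ref * m
    mu       : mutation rate per site per generation
    L        : effective number of sites represented in the spectrum
    gen_time : generation time in years
    """
    n_ref = theta / (4 * mu * L)
    return {
        "N_ref": n_ref,                     # reference size (individuals)
        "N": nu * n_ref,                    # population size (individuals)
        "years": 2 * n_ref * T * gen_time,  # elapsed time (years)
        "m": M / (2 * n_ref),               # migrant fraction per generation
    }


# Placeholder inputs chosen for easy arithmetic (NOT estimates from this study).
demo = convert_parameters(theta=400, nu=0.5, T=0.1, M=1.0,
                          mu=1e-8, L=1e6, gen_time=2)
print(demo)
```

Because every converted value depends on the assumed mutation rate, sequence length, and generation time, the estimates in summary_021623.xlsx should be read together with the values the authors chose for those quantities.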
NCBI.zip documents the BioSample accession numbers and URLs for raw FASTQ files (BioSampleObjects.txt and Objects.txt).
splitstree4.zip includes the input (ingroup.phy) and output files (pmcalli-splitstree) from this unrooted analysis.
vcf.zip contains instructions and logs for how to convert output files from iPyrad into the thinned / filtered versions suitable for downstream analysis.
- ingroup.vcf is the input file.
- thin.recode.vcf is the output file.
- how to convert to bed.sh contains instructions on how to convert to a bed file.
- final.bed, final.bim, final.fam represent intermediate files.
- final.log and thin.log are the log files.
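The thinning step can be pictured with the short sketch below. It assumes distance-based thinning of the kind VCFtools performs with `--thin`, where a site is kept only if it lies at least a minimum distance from the previously kept site on the same chromosome or scaffold. The function name, scaffold names, positions, and the 1000 bp distance are hypothetical; the settings actually used are recorded in thin.log.

```python
def thin_sites(sites, min_distance):
    """Keep sites so that kept neighbours are at least min_distance bp apart.

    sites: iterable of (chrom, pos) tuples, sorted by position within
    each chromosome or scaffold.
    """
    kept = []
    last_kept = {}  # chrom -> position of the most recently kept site
    for chrom, pos in sites:
        prev = last_kept.get(chrom)
        # The first site on a chromosome is always kept; later sites are
        # kept only if far enough from the last kept site.
        if prev is None or pos - prev >= min_distance:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept


# Made-up site positions on two hypothetical scaffolds.
example_sites = [
    ("scaf1", 100), ("scaf1", 150), ("scaf1", 1200),
    ("scaf2", 120), ("scaf2", 130),
]
thinned = thin_sites(example_sites, min_distance=1000)
print(thinned)
```

Thinning of this kind reduces the number of tightly linked SNPs, which matters for methods that treat sites as approximately independent.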
Sharing/Access information
Raw sequence data in FASTQ format have been deposited in NCBI (BioProject: PRJNA817579; Biosample: SAMN26796821 - SAMN26796865; SRA Accession: SRX14486053 - SRX14486097).
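As a quick consistency check, the accession ranges above can be expanded into explicit lists. The sketch assumes each range is contiguous, as the range notation implies, and makes no claim about which BioSample pairs with which SRA experiment; the files in NCBI.zip remain the authoritative per-sample record. The function name is hypothetical.

```python
import re

ACCESSION = re.compile(r"([A-Za-z]+)(\d+)")


def expand_accessions(first, last):
    """Expand a contiguous accession range into a list of accession strings."""
    m_first = ACCESSION.fullmatch(first)
    m_last = ACCESSION.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # preserve any zero padding
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    if stop < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


biosamples = expand_accessions("SAMN26796821", "SAMN26796865")
experiments = expand_accessions("SRX14486053", "SRX14486097")
print(len(biosamples), len(experiments))
```

Both ranges expand to 45 accessions, matching the 45 lizards in the dataset.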