
Saguaro cactus within-species phylogenomics

Cite this dataset

Sanderson, Michael (2022). Saguaro cactus within-species phylogenomics [Dataset]. Dryad.


Reconstructing accurate historical relationships between populations within a species poses numerous challenges, not least in many plant groups in which gene flow can extend well beyond species boundaries. Nonetheless, the extent of tree-like history within a species is an empirical question on which it is now possible to bring large amounts of genome sequence to bear. We assess phylogenetic structure across the geographic range of the saguaro cactus, an emblematic member of Cactaceae, a clade known for extensive hybridization and porous species boundaries. Using 200 Gb of whole genome resequencing data from 20 individuals sampled from 10 localities, we assembled two data sets comprising 150,000 biallelic single nucleotide polymorphisms (SNPs) from protein coding sequences. From these we infer a population tree and evaluate its significance and robustness using five qualitatively different inference methods. Despite the low sequence diversity, large census population sizes, and presence of wide-ranging pollen and seed dispersal agents, population trees were well resolved and highly consistent across both data sets and all methods. We inferred that the most likely root, based on marginal likelihood comparisons, is to the east and south of the region of highest genetic diversity, which lies along the coast of the Gulf of California. Together with striking decreases in marginal likelihood found to the north, this supports hypotheses that saguaro's current range reflects post-glacial expansion from the far south following retreat to refugia there. We conclude with observations about practical and theoretical issues raised by phylogenomic data sets within species, in which SNP-based methods must be used rather than gene tree methods that are widely used when sequence divergence is higher. These include computational scalability, inference of gene flow, and proper assessment of statistical support in the presence of linkage effects.


This data set is a tarred and gzipped archive of all phylogenetic data sets, VCF files, scripts and analysis results used in our study of saguaro within-species phylogenomics. Also included is the Supplementary Information document from the published paper.

Usage notes

After uncompressing and unpacking the archive, the user will find a README file in each directory in the hierarchy.
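The step above is normally done with `tar -xzf` on the command line. As a hedged alternative, the minimal Python sketch below unpacks the archive with the standard-library `tarfile` module and lists every README it finds. The function name `unpack_and_list_readmes` and its arguments are hypothetical and are not part of the dataset. The code also assumes that README files have names beginning with "README" (for example `README`, `README.txt` or `README.md`). The archive's actual file name is not given here.

```python
import os
import tarfile


def unpack_and_list_readmes(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the sorted paths
    of every README-like file in the extracted directory tree.

    Hypothetical helper for illustration; not shipped with the dataset.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter when this Python provides it
        # (it rejects absolute paths and members that escape dest_dir).
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)

    readmes = []
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            # Assumption: README files are named README, README.txt, README.md, etc.
            if name.upper().startswith("README"):
                readmes.append(os.path.join(root, name))
    return sorted(readmes)
```

For example, `unpack_and_list_readmes("saguaro_archive.tar.gz", "saguaro_data")` uses a hypothetical archive name. It returns the README paths so that you can read each directory's documentation before opening the VCF files or analysis results.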


U.S. National Science Foundation, Award: 1735604