
A new approach using targeted sequence capture for phylogenomic studies across Cactaceae

Cite this dataset

Acha, Serena; Majure, Lucas (2022). A new approach using targeted sequence capture for phylogenomic studies across Cactaceae [Dataset]. Dryad.


Relationships within the major clades of Cactaceae are relatively well known from DNA sequence data, mostly from the chloroplast genome. Nevertheless, some nodes along the backbone of the phylogeny, and especially generic- and species-level relationships, remain poorly resolved and require more informative genetic markers. In this study, we propose a new approach to resolving relationships within Cactaceae using a targeted sequence capture pipeline. We designed a custom probe set for Cactaceae using MarkerMiner and complemented it with the Angiosperms353 probe set. We then tested both probe sets against 36 different transcriptomes using HybPiper, preferentially retaining phylogenetically informative loci, and reconstructed relationships using RAxML-NG and ASTRAL. Finally, we tested each probe set by sequencing 96 accessions representing 88 species across Cactaceae. Our preliminary analyses recovered a well-supported phylogeny of Cactaceae whose topology among the major clades is nearly identical to that recovered with plastome data. As expected, however, we found incongruences when comparing our nuclear probe set results with plastome datasets, especially at the generic level. Our results show great potential for combining the Cactaceae-specific and Angiosperms353 probe sets to improve phylogenetic resolution in Cactaceae and in other groups.
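The abstract states that phylogenetically informative loci were preferentially retained but does not say how informativeness was measured. As a minimal sketch, assuming one common proxy (the number of parsimony-informative sites per aligned locus), the ranking step could look like the code below. The function names and the treatment of missing data are hypothetical, they are not part of HybPiper, and the input is assumed to be loci that have already been aligned into equal-length strings.

```python
from collections import Counter
from typing import Dict, List

# Characters treated as missing data (gaps, unknowns). IUPAC ambiguity codes such as
# R or Y are counted as ordinary states here, which is a simplifying assumption.
MISSING = set("-?NX")


def parsimony_informative_sites(alignment: List[str]) -> int:
    """Count columns with at least two states that each occur in at least two sequences."""
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    count = 0
    for i in range(length):
        # Tally the non-missing states observed in column i.
        states = Counter(c for c in (seq[i].upper() for seq in alignment) if c not in MISSING)
        # Informative only if at least two states are each shared by at least two sequences.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def rank_loci(alignments: Dict[str, List[str]]) -> List[str]:
    """Return locus names ordered from most to fewest parsimony-informative sites."""
    return sorted(
        alignments,
        key=lambda name: parsimony_informative_sites(alignments[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy alignments with hypothetical locus names.
    loci = {
        "locus_low": ["AAAA", "AAAA", "AAAT", "AAAT"],
        "locus_high": ["ACGTA", "ACGTT", "ATGCA", "ATGCT"],
    }
    print(rank_loci(loci))  # ['locus_high', 'locus_low']
```

Other proxies, such as the number of variable sites or per-locus tree support, would slot into the same `key` function.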


We used MarkerMiner (MM) 1.0 as implemented on the University of Florida high-performance computing cluster, with 15 transcriptomes representing all of the main clades of Cactaceae as input. We selected Arabidopsis thaliana (TAIR10) as the annotated reference genome because, of the reference datasets implemented in MarkerMiner, this lineage is the closest to Cactaceae. We then inspected the MarkerMiner results and focused exclusively on strictly single-copy loci. We manually trimmed the locus sequences in Geneious Prime 2020.0.5, retaining only single-copy loci that contained at least one suitable exon longer than 120 bp and intronic regions of 100 bp or more. To avoid including any non-nuclear loci, we performed BLASTx searches of all target sequences against (1) the Beta vulgaris and A. thaliana mitochondrial genomes and (2) the A. thaliana and Cylindropuntia bigelovii chloroplast genomes. In addition, we performed a BLASTx search against the A. thaliana whole nuclear genome (Araport11) to confirm and update the MM annotations. At the same time, we explored potential annotations of the single-copy loci in GenBank and the TAIR database. Finally, after reciprocal BLAST searches between the MarkerMiner and Angiosperms353 probe sets, and within each probe set, we reduced any identical loci to a single copy. We then added the subset of Caryophyllales sequences recovered for Nepenthes mirabilis. This subset comprised the 296 genes reported for Nepenthes plus 26 additional genes retrieved from other Caryophyllales accessions in Johnson et al. (2019), for a total of 322 genes.
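The locus-screening criteria above can be sketched as a small filter followed by a duplicate-collapsing step. This is a hedged illustration, not the authors' Geneious or BLAST workflow: the `Locus` record and all function names are hypothetical. The sketch reads the length criteria as "at least one exon longer than 120 bp and at least one intron of 100 bp or more", which is an interpretation of the text. It also replaces the reciprocal BLAST comparison with exact sequence identity, a deliberate simplification that will miss partial or near-identical matches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Locus:
    """Hypothetical record for one candidate single-copy locus (field names are illustrative)."""
    name: str
    sequence: str
    exon_lengths: List[int] = field(default_factory=list)
    intron_lengths: List[int] = field(default_factory=list)


def passes_length_filter(locus: Locus, exon_min_exclusive: int = 120, intron_min: int = 100) -> bool:
    """Assumed reading: at least one exon > 120 bp and at least one intron >= 100 bp."""
    has_long_exon = any(length > exon_min_exclusive for length in locus.exon_lengths)
    has_long_intron = any(length >= intron_min for length in locus.intron_lengths)
    return has_long_exon and has_long_intron


def collapse_identical(loci: List[Locus]) -> List[Locus]:
    """Keep the first locus seen for each distinct sequence.

    Exact, case-insensitive string identity stands in for the reciprocal BLAST
    comparison described in the text; it does not detect partial or
    near-identical matches.
    """
    seen = set()
    kept = []
    for locus in loci:
        key = locus.sequence.upper()
        if key not in seen:
            seen.add(key)
            kept.append(locus)
    return kept


def select_loci(candidates: List[Locus]) -> List[Locus]:
    """Apply the length filter, then collapse identical sequences."""
    return collapse_identical([locus for locus in candidates if passes_length_filter(locus)])


if __name__ == "__main__":
    # Toy records; the short sequences are placeholders and do not match the feature lengths.
    candidates = [
        Locus("locus_a", "ATGGCTAAG", [150, 90], [110]),
        Locus("locus_b", "ATGCCCTTT", [120], [300]),  # no exon longer than 120 bp: dropped
        Locus("locus_c", "atggctaag", [200], [100]),  # same sequence as locus_a: collapsed
    ]
    print([locus.name for locus in select_loci(candidates)])  # ['locus_a']
```

Filtering before deduplication means that, among identical copies, the one kept is always a copy that already satisfies the length criteria.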

Usage notes

Locus names in the Cactaceae120 probe set begin with the short name of the corresponding transcriptome, followed by the locus name, which starts with "AT". Angiosperms353 loci follow the naming used in Johnson et al. (2019).
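Downstream scripts may need to split a Cactaceae120 locus name into its transcriptome and locus parts. The following is a minimal sketch under stated assumptions. The separator between the two parts is not documented here, so the pattern accepts "_", "-", or nothing. It assumes the "AT" locus name follows the Arabidopsis Genome Initiative layout (AT, a chromosome code 1 to 5, C or M, then G and five digits), optionally with a suffix such as ".1". The transcriptome short names in the example are hypothetical.

```python
import re
from typing import Tuple

# Assumed layout: transcriptome short name, an optional "_" or "-" separator, then an
# AGI-style identifier such as "AT1G01010", optionally followed by a suffix like ".1".
CACTACEAE120_NAME = re.compile(
    r"^(?P<transcriptome>.+?)[_-]?(?P<locus>AT[1-5CM]G\d{5}\S*)$"
)


def split_cactaceae120_name(name: str) -> Tuple[str, str]:
    """Split a Cactaceae120 locus name into (transcriptome short name, AT locus name)."""
    match = CACTACEAE120_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised Cactaceae120 locus name: {name!r}")
    return match.group("transcriptome"), match.group("locus")


if __name__ == "__main__":
    # Hypothetical transcriptome short names.
    print(split_cactaceae120_name("TXN01_AT3G12345.1"))  # ('TXN01', 'AT3G12345.1')
    print(split_cactaceae120_name("Pereskia-AT5G00010"))  # ('Pereskia', 'AT5G00010')
```

Anchoring on the full identifier pattern, rather than splitting at the first "AT", keeps the split correct when a transcriptome short name itself happens to contain the letters "AT".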


University of Florida

Florida Museum of Natural History