A chromosome-level genome assembly of the beavertail cactus, Opuntia basilaris
Data files
Jun 19, 2025 version files 800.25 KB
Abstract
Few genomic resources currently exist for the American endemic family Cactaceae, a group of around 1850 species that are world-renowned for their growth forms and succulent habits. These icons of arid landscapes across the Americas are threatened in many parts of their range, including in parts of California, and developing more comprehensive genomic data will aid efforts to better understand and preserve these plants. We sequenced and assembled the genome of the beavertail cactus, Opuntia basilaris, which is represented by three varieties in California, one of which is threatened and another endangered. The genome assembly has a BUSCO completeness score of 98.1%, a total scaffold length of 980 Mb, and a scaffold N50 of 83 Mb. The genome of diploid O. basilaris is markedly smaller than those of other diploid members of Cactaceae assembled to date. This is the first nuclear genome sequenced in subfamily Opuntioideae and the most complete nuclear genome for Cactaceae to date, and it will lay the foundation for future genomic work across the biologically and taxonomically complicated prickly pear cacti.
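For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper only; not part of the authors' assembly scripts.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk scaffolds from longest to shortest, accumulating length
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to at least
        # half of the total length defines the N50
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not real data
toy_lengths = [90, 80, 70, 40, 20]
toy_n50 = n50(toy_lengths)
```

With real data, the lengths would typically be read from the assembly FASTA index or a similar per-scaffold length table.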
Dataset DOI: 10.5061/dryad.mpg4f4rbp
Description of the data and file structure
The data are the supplemental figures and tables associated with the publication of a chromosome-level genome assembly of Opuntia basilaris var. basilaris (Cactaceae).
Files and variables
File: SupplementaryFigure1.dcOpuBasi1.hifi.readlength.distribution.png
Description: Read length distribution.
File: SupplementaryFigure2.IRplusOutput.dcOpuBasi1.chloroplast.pdf
Description: Map of inverted repeats in chloroplast genome.
File: SupplementaryFigure3.blobtools_dcOpuBasi1.NCBI.a_ctg.snail.png
Description: Snailplot of alternative genome assembly.
File: Supplementary_Tables.xlsx
Description: List and description of removed contaminants.
Tables 1 and 2 correspond to Blobtools output per haplotype. For each contig, they report coverage, length, GC content, and taxonomic assignment.
Table 1. Primary haplotype/haplotype 1.
Table 2. Alternate haplotype/haplotype 2.
Table 3. Summary of the removed contaminants based on contamination screening upon submission to NCBI. Output is the summary of the FCS:GX contamination screening pipeline.
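As a rough illustration of how per-contig tables of this kind might be summarized, the sketch below totals contig length per taxonomic assignment and computes a length-weighted GC content. The field names (`contig`, `length`, `gc`, `coverage`, `taxon`) and all values are hypothetical; the actual column headers in Supplementary_Tables.xlsx may differ and should be checked before adapting this.

```python
from collections import defaultdict

# Hypothetical rows standing in for a per-contig table;
# field names and values are illustrative only
rows = [
    {"contig": "ctg1", "length": 1000, "gc": 0.40, "coverage": 30.0, "taxon": "Streptophyta"},
    {"contig": "ctg2", "length": 3000, "gc": 0.36, "coverage": 28.5, "taxon": "Streptophyta"},
    {"contig": "ctg3", "length": 500, "gc": 0.60, "coverage": 4.2, "taxon": "Proteobacteria"},
]


def length_by_taxon(records):
    """Sum contig lengths for each taxonomic assignment."""
    totals = defaultdict(int)
    for record in records:
        totals[record["taxon"]] += record["length"]
    return dict(totals)


def length_weighted_gc(records):
    """Average GC content, weighting each contig by its length."""
    total_length = sum(record["length"] for record in records)
    weighted_sum = sum(record["gc"] * record["length"] for record in records)
    return weighted_sum / total_length


taxon_totals = length_by_taxon(rows)
overall_gc = length_weighted_gc(rows)
```

Contigs whose taxonomic assignment falls outside the expected plant lineage are the usual candidates for closer inspection in a contamination screen.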
Code/software
Assembly scripts and other data for the analyses presented can be found at the following GitHub repository: www.github.com/ccgproject/ccgp_assembly
Access information
Other publicly accessible locations of the data:
Data generated for this study are available under NCBI BioProject PRJNA777201. Raw sequencing data for sample Fawcett & Madeiros 1504 (NCBI BioSample SAMN41564419) are deposited in the NCBI Sequence Read Archive (SRA) under SRR30989401 for the PacBio HiFi sequencing data, and under SRR30989399 and SRR30989400 for the Omni-C Illumina sequencing data. GenBank accessions for the primary and alternate assemblies are GCA_043229145.1 and GCA_043229025.1, and for the genome sequences JBFLFP000000000 and JBFLFQ000000000. The GenBank accession for the chloroplast genome assembly is CM091650. The voucher specimen is deposited in the Jepson Herbarium (JEPS) at UC Berkeley.