Data from: A global blueberry phylogeny: Evolution, diversification, and biogeography of the tribe Vaccinieae (Ericaceae)
Data files
Oct 22, 2025 version files 103.88 MB
Abstract
Vaccinieae is a morphologically diverse and species-rich (~1440 species) tribe in Ericaceae, itself a clade of ~4600 species. Although the majority of diversity is tropical, Vaccinieae are best known for temperate crops (i.e., blueberries, cranberries, huckleberries, lingonberries) in Vaccinium. Vaccinium itself (~500 species) has been previously suggested as highly polyphyletic, and taxonomic boundaries among many of the other genera in the tribe remain uncertain. We assessed the evolutionary history of Vaccinieae with phylogenomic analyses based on a target-enrichment dataset containing 353 low-copy nuclear gene regions and over 200 taxa representing 30 of the 34 genera in the tribe, and 25 of the 29 sections of Vaccinium. A plastome dataset for a subset of these taxa was additionally constructed. We conducted time-calibrated biogeographic analyses and diversification analyses to explore the area of origin and global dispersal history of the tribe. The analysis recovered a temperate North American origin for Vaccinieae approximately 30 million years ago. Tropical diversity of Vaccinieae was inferred to result from multiple, independent movements into the tropics from north-temperate ancestors. Diversification rate increases corresponded to radiation into the Andes and SE Asia. The pseudo-10-locular ovary evolved once in the tribe from the five-locular state, coinciding with the diversification of a major clade that includes most Asian Vaccinium and the group from which commercial blueberries are derived (V. sect. Cyanococcus). A reconstruction from available chromosome counts suggests that a major polyploid event predated the evolution of nearly half the diversity of Vaccinieae. The extent of polyphyly in Vaccinium documented here supports the need for a generic reclassification of the tribe.
https://doi.org/10.5061/dryad.ksn02v7cr
Description of the data and file structure
These are alignment files (.fasta) of sequences that have already been filtered (see methods) and that we used to generate the main topologies in our study with IQ-TREE. This submission includes (1) a nuclear dataset of 262 samples for 353 nuclear loci and (2) a plastome dataset of 78 loci for 122 species.
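A quick way to check a downloaded alignment is to count its samples and confirm that all sequences share one length. The following is a minimal sketch using only the Python standard library; the sample names and sequences are made up for illustration and are not taken from the dataset, and a real file would be opened with open() on whatever path the archive unpacks to.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sample name -> sequence."""
    parts_by_name = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            parts_by_name[name] = []
        elif name is not None:
            parts_by_name[name].append(line)
    return {n: "".join(parts) for n, parts in parts_by_name.items()}


def summarize(records):
    """Report sample count and whether all sequences have the same length."""
    lengths = {len(seq) for seq in records.values()}
    return {
        "n_samples": len(records),
        "aligned": len(lengths) == 1,
        "length": max(lengths) if lengths else 0,
    }


# Hypothetical two-sample alignment; sequences may wrap across lines.
example = StringIO(">sample_A\nACGT-N\nAC\n>sample_B\nAC--TTAC\n")
records = read_fasta(example)
summary = summarize(records)
print(summary)
```

If "aligned" comes back False for a file, the sequences differ in length and the file is probably not an alignment (or was truncated during download).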
Sharing/Access information
The data, novel scripts, and output files that support the findings of this study are openly available in the supplementary materials of the related article. Raw sequence reads are available from NCBI under BioProjects PRJNA839108, PRJEB49299, and PRJEB51566.
DNA sequence data were derived from three sources. The DNA of most samples was extracted from herbarium specimens with the SLIMS method, a system connecting sampling and wet lab methodology (Folk et al., 2021). Specimens were sampled from the following herbaria (acronyms as in Thiers, updated continuously): FLAS, K, MICH, MO, and NY. Some DNAs (those from PWF) were from silica gel-dried leaf material. The DNA of these samples was extracted with DNeasy®. We quantified DNA amounts of these samples using a Qubit 2.0 fluorometer (Invitrogen, Carlsbad, California, USA) with the Qubit dsDNA Broad Range Assay Kit as per manufacturer recommendations. Finally, some sequence data were downloaded from the Kew DNA Bank.
Library preparation and target capture were performed at Rapid Genomics (Gainesville, FL, USA) with the Angiosperms353 v1 target capture kit (Johnson et al., 2019) to obtain 353 putatively single-copy nuclear loci from each sample. DNA sequencing was conducted on Illumina sequencing machines (Illumina, San Diego, California, USA) predominantly with 2x150-base pair (bp) chemistry, or occasionally 2x250-bp. Additionally, a plastome dataset was assembled from genome-skimming reads from non-enriched libraries.
For the nuclear loci, raw sequence data were quality-filtered and adapters removed with Cutadapt v2.6 (Martin, 2011) using a Phred quality score cutoff of 20 (-q 20), and read quality was checked with FastQC v0.11.9 (Andrews, 2010). Reads were assembled with HybPiper v1.3.1 (Johnson et al., 2016) under default settings. In addition to the standard Angiosperms353 targets, we included available Ericales sequences in the target reference file using the mega353 approach (McLay et al., 2021). The resulting supercontig sequences (introns and exons) were used for subsequent analyses. Putative paralogs were flagged with the paralog_investigator.py script in HybPiper, and all flagged loci were removed from the dataset.
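To make the quality cutoff concrete: a Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so a cutoff of 20 targets bases with an error probability above 1 in 100. The sketch below shows that conversion together with a simplified 3' quality-trimming routine that follows the procedure described in the Cutadapt documentation (subtract the cutoff, take partial sums from the read end, cut where the sum is smallest). It is an illustration only, not Cutadapt's own code, and the function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def quality_trim_index(quals, cutoff):
    """Return the index at which to cut a read when trimming its 3' end.

    Walks from the 3' end, accumulating (quality - cutoff), and cuts at the
    position where the running sum is lowest. If the sum never drops below
    zero, the read is left untrimmed.
    """
    running = 0
    lowest = 0
    cut = len(quals)
    for i in reversed(range(len(quals))):
        running += quals[i] - cutoff
        if running < lowest:
            lowest = running
            cut = i
    return cut


# Illustrative per-base qualities with a cutoff of 10.
quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
cut = quality_trim_index(quals, 10)
print(phred_to_error_prob(20), cut, quals[:cut])
```

Note that a single base above the cutoff (the 11 in the example) does not stop trimming when it is surrounded by low-quality bases, which is the point of using a running sum rather than a per-base threshold.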
For the plastome dataset, we constructed a target file from available Ericales sequences for 79 protein-coding plastid loci as reference. HybPiper was run as above to extract and assemble the dataset.
Individual gene alignments (371 initial samples), a concatenated alignment for the nuclear data, and a single concatenated alignment for the plastome data were constructed with MAFFT v7.245 (Katoh et al., 2013). To reduce potential issues with missing data and poorly aligned regions, we removed columns containing > 50% missing data from the individual gene alignments and samples containing > 90% missing data from the concatenated datasets. These quality filtering steps resulted in a final nuclear dataset of 260 samples and a final plastome dataset of 122 samples (Supplementary Table S2).
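The two missing-data thresholds above can be sketched as follows. This is a minimal illustration, not the script used in the study: it assumes that gaps ("-"), "?" and "N" all count as missing data, and the function names and toy alignment are hypothetical.

```python
# Assumption: these characters are treated as missing data.
MISSING = set("-?Nn")


def drop_sparse_columns(aln, max_missing=0.5):
    """Remove alignment columns whose missing fraction exceeds max_missing."""
    names = list(aln)
    if not names:
        return {}
    n_samples = len(names)
    length = len(aln[names[0]])
    keep = [
        j
        for j in range(length)
        if sum(aln[s][j] in MISSING for s in names) / n_samples <= max_missing
    ]
    return {s: "".join(aln[s][j] for j in keep) for s in names}


def drop_sparse_samples(aln, max_missing=0.9):
    """Remove samples whose missing fraction exceeds max_missing."""
    kept = {}
    for name, seq in aln.items():
        frac = sum(c in MISSING for c in seq) / len(seq) if seq else 1.0
        if frac <= max_missing:
            kept[name] = seq
    return kept


# Toy alignment: four samples, six columns.
gene = {
    "s1": "ACGT-A",
    "s2": "A-GT-A",
    "s3": "N-GTCA",
    "s4": "------",
}
filtered_cols = drop_sparse_columns(gene)       # columns 1 and 4 are removed
final = drop_sparse_samples(filtered_cols)      # s4 is entirely missing
print(filtered_cols, sorted(final))
```

In the study the column filter was applied to individual gene alignments and the sample filter to the concatenated datasets; the toy example applies both to one small alignment only to keep the sketch short.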
- Becker, Anna; Crowl, Andrew A.; Luteyn, James L. et al. (2024). A Global Blueberry Phylogeny: Evolution, Diversification, and Biogeography of Tribe Vaccinieae (Ericaceae) [Preprint]. Elsevier BV. https://doi.org/10.2139/ssrn.4837226
- Becker, Anna L.; Crowl, Andrew A.; Luteyn, James L. et al. (2024). A global blueberry phylogeny: Evolution, diversification, and biogeography of Vaccinieae (Ericaceae). Molecular Phylogenetics and Evolution. https://doi.org/10.1016/j.ympev.2024.108202
