Data from: Phylogeny and the evolution of flower symmetry in Posoqueria (Rubiaceae) using the universal Angiosperms353 probe set
Data files
Version of May 25, 2026; total size 52.34 MB
- appendix_1_vouchers.csv (3.51 KB)
- astral_tree.pdf (4.24 KB)
- astral.tree (6.86 KB)
- castles_tree.pdf (3.54 KB)
- castles.tree (7.24 KB)
- concatenated_tree.pdf (3.93 KB)
- concatenated.tree (6.79 KB)
- fig_S3.tif (19.16 MB)
- fig_S4.tif (11.31 MB)
- final_msa.zip (361.43 KB)
- paragone_stats.xlsx (15.37 KB)
- posoq.fasta (4.27 MB)
- README.md (8.06 KB)
- treeshrink_stats.xlsx (15.15 KB)
- video1_Persson_3963_Posoqueria_latifolia.mp4 (8.60 MB)
- video2_Posoqueria_latifolia.mp4 (8.56 MB)
Abstract
This dataset accompanies the manuscript “Phylogeny and the Evolution of Flower Symmetry in Posoqueria (Rubiaceae) Using the Universal Angiosperms353 Probe Set”. It includes voucher information; sequence alignments for 177 nuclear loci; three phylogenetic trees (the ASTRAL coalescent species tree, the CASTLES tree, and the concatenated maximum likelihood tree); the filtered target file (posoq.fasta) used for sequence assembly, comprising the standard Angiosperms353 set plus subsampled Rubiaceae sequences from the 1KP project; summary statistics from orthology inference (Paragone) and TreeShrink; supplementary HybPiper heatmaps of sequence recovery and paralog detection; and videos documenting the Pollen Catapult Mechanism (PCM) in Posoqueria latifolia. These data support analyses of phylogenetic relationships, flower symmetry evolution, and taxonomic patterns within the genus.
Dataset DOI: 10.5061/dryad.9w0vt4brk
Description of the data and file structure
Data were collected to investigate phylogenetic relationships and the evolution of flower symmetry in the Neotropical genus Posoqueria (Rubiaceae). Target sequence capture using the universal Angiosperms353 probe set was performed on 42 samples representing 15 Posoqueria species and two outgroup taxa. DNA was obtained from silica-dried material and herbarium specimens; sequences for additional samples came from previously published datasets. Sequencing data were used for sequence recovery, orthology inference, phylogenetic reconstruction, and ancestral state analyses focused on the evolution of the Pollen Catapult Mechanism (PCM) and floral symmetry within the genus.
Files and variables
File: appendix_1_vouchers.csv
Description: Spreadsheet containing voucher information, taxonomic identifications, collection details, herbarium information, and raw sequencing accession numbers for all samples included in the study.
Variables:
- N = sample number
- Species = taxonomic identification of the sample
- Author = taxonomic authority for the species name
- Collector = collector or collecting team
- Collector No. = collector's voucher number. One asterisk (*) marks sequences generated by Antonelli et al. (2021); two asterisks (**) mark tissue sourced from pressed herbarium samples. All other collections were sourced from silica-dried leaves.
- Herbarium = herbarium acronym following Index Herbariorum
- Country = country of collection
- Raw Read Accession Number = accession number for raw sequencing reads in ENA or SRA
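As an illustration of how the voucher table might be read programmatically, the following is a minimal Python sketch. It assumes the CSV header row uses the variable names listed above exactly as written (for example `Species`); that has not been checked against the file itself. The function name `count_samples_per_species` and the example rows are hypothetical placeholders, not data from this study.

```python
import csv
import io
from collections import Counter


def count_samples_per_species(handle):
    """Count table rows per value of the 'Species' column.

    Assumes the header row contains a column named 'Species',
    as listed in the variable description above.
    """
    reader = csv.DictReader(handle)
    return Counter(row["Species"].strip() for row in reader)


# Placeholder rows for illustration only; they are NOT taken from the dataset.
example = io.StringIO(
    "N,Species,Author,Collector,Collector No.,Herbarium,Country,"
    "Raw Read Accession Number\n"
    "1,Species A,Auth.,Collector X,101,XX,Country 1,ACC0001\n"
    "2,Species A,Auth.,Collector Y,202*,XX,Country 2,ACC0002\n"
    "3,Species B,Auth.,Collector Z,303**,XX,Country 1,ACC0003\n"
)
print(count_samples_per_species(example))
```

To run it on the real file, the same function can be given an open file handle, for example `open("appendix_1_vouchers.csv", newline="", encoding="utf-8")`; the encoding is an assumption.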
File: final_msa.zip
Description: Compressed archive containing the final multiple sequence alignments (MSA) for the 177 orthologous nuclear loci used in phylogenetic analyses after alignment cleaning, TAPER filtering, and TreeShrink filtering. Files are provided in FASTA format.
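The alignments can also be inspected without unpacking the archive. The sketch below, using only the Python standard library, reads every member of a zip file as FASTA and reports the number of sequences and whether all sequences share one length (as expected for an alignment). The internal file names and layout of final_msa.zip are not described here, so the code makes no assumption about them beyond treating each member as UTF-8 FASTA text; the function names are hypothetical.

```python
import io
import zipfile


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def alignment_summary(records):
    """Report sequence count and whether all sequences have equal length."""
    lengths = {len(seq) for seq in records.values()}
    return {
        "n_sequences": len(records),
        "aligned": len(lengths) == 1,
        "length": max(lengths, default=0),
    }


def summarize_archive(zip_source):
    """Summarize every file in a zip archive, treating each as FASTA."""
    summaries = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if member.endswith("/"):  # skip directory entries
                continue
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8")
                summaries[member] = alignment_summary(read_fasta(text))
    return summaries
```

Calling `summarize_archive("final_msa.zip")` should return one summary per archive member; a member reported with `"aligned": False` would be worth inspecting by hand.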
File: paragone_stats.xlsx
Description: Summary statistics generated during the Paragone orthology inference analysis. The spreadsheet contains two sheets (“pre_paragone” and “post_paragone”) summarizing the number of sequences recovered per gene before and after orthology inference and filtering.
Variables:
- Gene = locus/gene identifier
- N sequences per gene = number of sequences recovered for each locus
File: treeshrink_stats.xlsx
Description: Summary statistics from TreeShrink filtering. The spreadsheet contains two sheets (“pre_Treeshrink” and “post_Treeshrink”) giving the number of sequences per gene before and after running TreeShrink.
Variables:
- Gene = locus/gene identifier
- N sequences per gene = number of sequences recovered for each locus
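Both statistics spreadsheets pair a "pre" sheet with a "post" sheet, so a natural summary is the per-gene difference in sequence counts. The following Python sketch shows that comparison on plain dictionaries mapping gene identifier to sequence count. Loading the .xlsx sheets into such dictionaries is left out, because it requires a spreadsheet library; the function name and the example values are hypothetical.

```python
def sequences_removed(pre, post):
    """Return, per gene, how many sequences were dropped by a filtering step.

    `pre` and `post` map gene identifier -> number of sequences.
    A gene missing from `post` is treated as having zero sequences left.
    """
    return {gene: n_pre - post.get(gene, 0) for gene, n_pre in pre.items()}


# Placeholder counts for illustration only; NOT values from the spreadsheets.
pre_counts = {"gene_A": 42, "gene_B": 40, "gene_C": 12}
post_counts = {"gene_A": 41, "gene_B": 40}

removed = sequences_removed(pre_counts, post_counts)
print(removed)
```

Genes with a large difference lost the most sequences during that step.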
File: astral.tree
Description: Coalescent species tree inferred with ASTRAL-III from 177 nuclear gene trees. Branch support values are local posterior probabilities (LPP).
File: astral_tree.pdf
Description: PDF visualization of the ASTRAL species tree used in the manuscript figures and interpretation of phylogenetic relationships.
File: castles.tree
Description: Species tree generated using CASTLES (Coalescent-Aware Species Tree Length Estimation in Substitution Units), derived from the ASTRAL species tree and gene trees. Branch lengths are expressed in substitution units and were used for ancestral state reconstruction analyses.
File: castles_tree.pdf
Description: PDF visualization of the CASTLES species tree with branch lengths in substitution units.
File: concatenated.tree
Description: Maximum likelihood phylogenetic tree inferred from the concatenated alignment of the 177 nuclear loci using IQ-TREE.
File: concatenated_tree.pdf
Description: PDF visualization of the concatenated maximum likelihood phylogenetic tree.
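For a quick check of which samples appear in each of the three .tree files, tip labels can be pulled from a Newick string without a phylogenetics library. The sketch below is a deliberately simple regular-expression approach, not a full Newick parser: it assumes unquoted tip labels and no bracketed comments, which may not hold for these files. The function name `tip_labels` and the example tree are hypothetical.

```python
import re

# A tip label starts directly after "(" or "," and runs until the next
# structural character. Internal node labels and support values follow ")"
# and are therefore not matched.
_TIP = re.compile(r"(?<=[(,])\s*([^\s(),:;][^(),:;]*)")


def tip_labels(newick):
    """Return tip labels of a Newick string in left-to-right order."""
    return [label.strip() for label in _TIP.findall(newick)]


# Placeholder tree for illustration only; NOT one of the dataset trees.
example_tree = "((A:0.1,B:0.2)0.95:0.3,(C:0.1,D:0.4)1:0.2);"
print(tip_labels(example_tree))
```

Comparing `set(tip_labels(...))` across astral.tree, castles.tree, and concatenated.tree would show whether the three trees contain the same samples. For anything beyond label extraction, a dedicated tree library is the safer choice.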
File: posoq.fasta
Description: Filtered target reference file used for HybPiper sequence assembly and recovery. The file includes the standard Angiosperms353 target sequences plus additional Rubiaceae sequences subsampled from the 1KP project to improve locus recovery in Posoqueria.
File: fig_S3.tif
Description: Supplementary heatmap figure showing sequence recovery across loci and samples. Values represent the proportion of recovered sequence length relative to the target reference length.
File: fig_S4.tif
Description: Supplementary heatmap figure showing paralog detection across loci and samples generated from HybPiper analyses. Color intensity represents the number of paralog sequences detected per gene and sample.
File: video1_Persson_3963_Posoqueria_latifolia.mp4
Description: Video documenting the Pollen Catapult Mechanism (PCM) in Posoqueria latifolia specimen Persson et al. 3963 (Panama), showing explosive pollen release during floral triggering.
File: video2_Posoqueria_latifolia.mp4
Description: Video documenting the Pollen Catapult Mechanism (PCM) in Posoqueria latifolia, showing explosive pollen release during floral triggering.
Missing values:
Missing or unavailable values are indicated as NA or left as blank cells, depending on the original file format.
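Because missing values can appear either as the string NA or as an empty cell, it may help to map both to a single marker before analysis. The following Python sketch converts both forms to `None`. It matches the literal text "NA" only (other spellings such as "N/A" are not handled), and the function names are hypothetical.

```python
def normalize_missing(value):
    """Map 'NA' and blank cells to None; return other values stripped."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text == "NA":
        return None
    return text


def clean_row(row):
    """Apply normalize_missing to every value of a dict-like table row."""
    return {key: normalize_missing(value) for key, value in row.items()}


# Placeholder row for illustration only; NOT taken from the dataset.
print(clean_row({"Species": "Species A", "Herbarium": "NA", "Country": " "}))
```

`clean_row` can be applied to each row produced by `csv.DictReader` when reading the voucher table.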
Code/software
The dataset files can be viewed and analyzed using widely available software, much of it free or open source. Sequence alignments in FASTA format can be opened with any text editor or bioinformatics software such as AliView, Geneious Prime, MEGA, or BioEdit. Phylogenetic tree files in Newick format (.tree) can be visualized using FigTree v1.4.4, iTOL, or other tree-viewing software. Spreadsheet files (.csv and .xlsx) can be opened with Microsoft Excel or Google Sheets; the .csv file can also be opened in a plain-text editor such as Notepad. Video files (.mp4) can be viewed using VLC Media Player or other standard media players. TIFF image files (.tif) can be opened using ImageJ/Fiji, GIMP, Adobe Photoshop, or other image visualization software. PDF files can be opened using any standard PDF viewer.
The phylogenomic workflow used in this study involved the following software, with version numbers where available: FastQC for sequencing quality assessment, Fastp for read trimming and filtering, MultiQC for report summarization, HybPiper v2.1.2 for target sequence recovery, SPAdes v3.15.5 for de novo assembly, Exonerate v2.4.0 for coding sequence extraction, MAFFT v7.515 for sequence alignment, Phyutility v2.7.3 for alignment cleaning, TAPER v1.0.0 for misalignment filtering, IQ-TREE v2.2.2.2 for phylogenetic inference, TreeShrink v1.3.9 for long-branch filtering, ASTRAL-III v5.7.8 for species tree reconstruction, CASTLES for branch length estimation in substitution units, and the R package phytools v2.1.1 for ancestral state reconstruction analyses. Orthology inference was performed using the Paragone-NF pipeline.
Access information
Raw sequencing reads generated in this study are deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1457116 (https://www.ncbi.nlm.nih.gov/bioproject/1457116). Previously published Angiosperms353 datasets incorporated into this study were obtained from the European Nucleotide Archive (ENA) project PRJEB35285 (Antonelli et al., 2021). The expanded target reference file (posoq.fasta) includes sequences derived from the publicly available Angiosperms353 probe set and Rubiaceae sequences subsampled from the 1KP project using the “mega353” target file published by McLay et al. (2021).
Public resources used in this study include:
- Angiosperms353 target set: https://github.com/mossmatters/Angiosperms353
- Expanded “mega353” target resource and scripts: https://github.com/chrisjackson-pellicle/NewTargets
The original sequence resources and software tools retain their respective licenses and terms of use as provided by the original authors and repositories.
