Building a robust backbone for Astragalus (Fabaceae) using a clade-specific target enrichment bait set
Data files
Aug 13, 2025 version files 1.40 MB
- astragalean_bait_reference_file.zip (947.53 KB)
- astral_MO_ortho_publication_AJB.newik (6.55 KB)
- astral-pro_homologs_publication_AJB.tre (4.95 KB)
- bait_sequences_astra819.tar.gz (418.63 KB)
- IQtree2_Astragalus_concatenated_tree_publication_AJB.newik (6.61 KB)
- IQtree2_Astragalus_plastome_tree_publication_AJB.figtree (11.95 KB)
- README.md (2.22 KB)
Abstract
With over 3100 species, Astragalus (Fabaceae) has long fascinated botanists as the largest genus of flowering plants. With an origin in the Middle Miocene, Astragalus has one of the highest diversification rates known in flowering plants. Comprehensive taxonomic treatments exist, and the genus is currently subdivided into 136 sections in the Eastern Hemisphere and 93 sections in the Western Hemisphere based on morphological characters. Despite considerable efforts, a comprehensive and well-resolved phylogeny of the genus is still lacking.
Here, we reconstruct the backbone phylogeny of Astragalus using a custom bait set capturing 819 loci specifically designed for a target enrichment approach in the Astragalean clade. We carefully selected a set of 107 taxa representing all major clades currently recognized in Astragalus. Of those, 80 newly sequenced taxa were obtained from herbarium specimens as old as 110 years.
We retrieved all the targeted loci and additional off-target plastome sequences for all samples, including the 80 herbarium specimens. Our phylogenetic analysis reinforced the currently accepted backbone phylogeny of Astragalus with high support and novel details, additionally providing insights into cytonuclear phylogenetic conflicts in the genus. Evidence for potential reticulate evolution was found, providing a possible explanation for the conflicts observed.
This work represents an important milestone in obtaining a comprehensive, herbarium-based phylogeny of Astragalus, which will constitute the base to study a wealth of relevant biological questions, for example, the still unanswered question of what drove the rapid diversification of Astragalus, with important repercussions on our understanding of diversification in natural contexts.
Dataset DOI: 10.5061/dryad.79cnp5j7g
Description of the data and file structure
This repository contains the data and code necessary to replicate the results in Buono et al. (under review). The study aims to reconstruct the backbone phylogeny of Astragalus (Fabaceae) using target enrichment data obtained from herbarium material. The data cover 80 species sequenced from herbarium material (77 Astragalus and 3 outgroups), plus additional sequences obtained from previous studies. The data include 819 exons from 686 genes. Phylogenetic analyses recovered a well-supported phylogeny using both a concatenated maximum likelihood method (IQtree2) and coalescent-based methods (ASTRAL and ASTRAL-pro). Gene discordance analyses (PhyParts and Quartet Sampling) revealed discordance among gene trees, especially at shallow nodes of the phylogeny. Evidence of reticulate evolution was found using PhyloNet.
Files and variables
File: bait_sequences_astra819.tar.gz
Description: bait sequences for Astragalean819 bait set
File: astragalean_bait_reference_file.zip
Description: reference file for Astragalean819 bait set
File: astral_MO_ortho_publication_AJB.newik
Description: ASTRAL coalescent-based species tree produced with ortholog sequences using ASTRAL III
File: astral-pro_homologs_publication_AJB.tre
Description: ASTRAL-pro coalescent-based species tree produced with homolog sequences using ASTRAL-pro
File: IQtree2_Astragalus_concatenated_tree_publication_AJB.newik
Description: ML concatenated species tree produced with homolog sequences using IQtree2
File: IQtree2_Astragalus_plastome_tree_publication_AJB.figtree
Description: ML plastome tree produced with plastome sequences using IQtree2
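As a quick way to inspect the files listed above, the following is a minimal Python sketch that uses only the standard library. It rests on assumptions not confirmed by this README: that the `.newik` and `.tre` files contain plain Newick strings, and that tip labels contain no commas, colons, or parentheses. The function names `newick_tip_labels` and `list_archive_members` are hypothetical helpers, not part of the dataset. The `.figtree` file may use a different format and is not handled here.

```python
import re
import tarfile
import zipfile


def newick_tip_labels(newick: str) -> list[str]:
    """Return tip labels from a Newick string (simple labels only)."""
    # Drop bracketed comments/annotations such as [&label=...].
    text = re.sub(r"\[[^\]]*\]", "", newick)
    # A tip label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    # Internal node labels follow ")" and are therefore not captured.
    labels = re.findall(r"[(,]\s*([^(),:;]+)", text)
    return [label.strip().strip("'") for label in labels]


def list_archive_members(path: str) -> list[str]:
    """List member names of a .zip or .tar.gz archive without extracting it."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as archive:
            return archive.getnames()
    raise ValueError(f"Not a zip or tar archive: {path}")
```

For example, once the files are downloaded, `newick_tip_labels(open("astral_MO_ortho_publication_AJB.newik").read())` would give the taxon names in the ASTRAL tree, and `list_archive_members("bait_sequences_astra819.tar.gz")` would show what the bait archive contains before unpacking it. For anything beyond a quick look, a dedicated phylogenetics library is a better choice than this regular expression.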
Access information
Other publicly accessible locations of the data:
- Target enrichment data generated for this study can be found in NCBI BioProject PRJNA1242075.
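To locate the sequencing runs linked to this BioProject programmatically, one option is the NCBI E-utilities `esearch` endpoint. The sketch below only builds the query URL and does not contact NCBI. The helper name `sra_search_url`, the choice of the `sra` database, and the `retmax` value are illustrative assumptions, not something specified by the authors.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(bioproject: str, retmax: int = 200) -> str:
    """Build an esearch URL listing SRA records linked to a BioProject accession."""
    params = {
        "db": "sra",                          # assumption: reads are deposited in SRA
        "term": f"{bioproject}[BioProject]",  # restrict the search to this accession
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = sra_search_url("PRJNA1242075")
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return the matching record identifiers. The same accession can also be searched directly on the NCBI website.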
