Phylogenomic challenges in polyploid-rich lineages: Insights from paralog processing and reticulation methods using the complex genus Packera (Asteraceae: Senecioneae)
Data files
May 14, 2026 version files 24.62 MB
- alignment_files.zip (24.30 MB)
- PhyTop_results.zip (65.62 KB)
- README.md (5.72 KB)
- RelTime_trees.zip (224.05 KB)
- speciestree_files.zip (24.59 KB)
Abstract
Phylogenomic discordance is pervasive and cannot always be resolved by increasing the amount of sequencing data alone. Biological processes such as polyploidy, hybridization, and incomplete lineage sorting are major contributors to discordance and must be accounted for to avoid misleading evolutionary interpretations. To better understand how these processes influence phylogenetic reconstruction, we conducted a comprehensive phylogenomic study in the complex genus Packera. With over 90 species and varieties, 40% of which exhibit polyploidy, aneuploidy, or other cytological complexities, Packera presents significant challenges for phylogenetic reconstruction. Given these complexities, we assessed how different published paralog processing methods affect the inferred evolutionary relationships and phylogenetic support in this group. We then applied three of these methods to evaluate their impact on tree topology and our understanding of Packera's evolutionary history by constructing a time-calibrated phylogeny, reconstructing historical biogeography, and testing for ancient reticulation. Phylogenetic outcomes varied based on the paralog processing method used, with no method outperforming others. Our findings highlight the large impact of orthology inference and paralog processing on phylogenomic analyses, particularly in polyploid-rich groups such as Packera, and we offer guidance on methodological impacts along with practical recommendations. We note that gaining a robust understanding of Packera's evolutionary history requires more than computational approaches alone. While technological advancements have greatly expanded our ability to analyze genomic data, effective phylogenomic research still relies on strong taxon sampling and detailed species knowledge. Without careful attention to biological context, phylogenomic studies risk misinterpreting evolutionary history and processes.
By integrating genomic results with knowledge of the study system, we can begin to improve the accuracy of evolutionary reconstructions and gain deeper insights into the complex history of plant diversification.
Dataset DOI: 10.5061/dryad.bg79cnppn
Description of the data and file structure
Alignment and tree files for some of the analyses used in this study. Each file is described in detail below.
Files and variables
File: alignment_files.zip
Description: Folder containing all nuclear and plastid gene alignment fasta files used as input to generate species trees.
Nuclear:
- HybPiper-df_concat_best.fasta - Alignment file of the gene trees used in the HybPiper-df analysis. Contains 111 Senecioneae taxa and 1,051 genes.
- wASTRAL_concat_best.fasta - Alignment file of the gene trees used in the weighted ASTRAL (wASTRAL) analysis. Contains all Senecioneae taxa and 1,051 genes.
- ASTRALPro_concat_best.fasta - Alignment file of the gene trees used in the ASTRAL-Pro analysis, which contains all orthologs and paralogs (1,048 genes total) identified in the 111 Senecioneae taxa.
- concatenated_concat_best.fasta - Alignment file of the gene trees used in the Concatenated analysis. Contains 111 Senecioneae taxa and 1,051 genes.
- 1-to-1_concat_best.fasta - Alignment file of the gene trees used in the "one-to-one orthologs" (1-to-1) analysis from paragone-nf. Contains 111 Senecioneae taxa and 285 genes.
- MI_concat_best.fasta - Alignment file of the gene trees used in the "Maximum Inclusion" (MI) analysis from paragone-nf. Contains 111 Senecioneae taxa and 7,388 genes.
- MO_concat_best.fasta - Alignment file of the gene trees used in the "Monophyletic Outgroups" (MO) analysis from paragone-nf. Contains 111 Senecioneae taxa and 803 genes.
- RT_concat_best.fasta - Alignment file of the gene trees used in the "Rooted Ingroups" (RT) analysis from paragone-nf. Contains 111 Senecioneae taxa and 6,398 genes.
Plastid:
- 111mapping_Packera_plastomes.fasta - Alignment file of the partially complete plastomes of all 111 Senecioneae taxa, mapped to a reference using Bowtie.
- 18complete_Packera_plastomes.fasta - Alignment file of the 18 Senecioneae taxa with complete plastomes from GetOrganelle.
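As a quick sanity check on any of the alignment files listed above, the number of taxa and the aligned length can be read directly from the FASTA text. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `summarize_alignment` are hypothetical helpers, not part of the dataset, and the location of the files after unzipping is an assumption.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence name: sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)  # sequences may wrap over several lines
    return {key: "".join(parts) for key, parts in chunks.items()}


def summarize_alignment(records: Dict[str, str]) -> Dict[str, object]:
    """Report taxon count, whether all sequences share one length, and that length."""
    lengths = {len(seq) for seq in records.values()}
    return {
        "n_taxa": len(records),
        "aligned": len(lengths) <= 1,
        "length": max(lengths, default=0),
    }


# Example usage (path is an assumption about where the archive was extracted):
# with open("alignment_files/HybPiper-df_concat_best.fasta") as handle:
#     print(summarize_alignment(read_fasta(handle)))
```

For the nuclear alignments described as containing 111 Senecioneae taxa, `n_taxa` would be expected to match that count if each taxon appears as one FASTA record.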
File: speciestree_files.zip
Description: Folder containing all nuclear and plastid tree files used in this study.
Nuclear:
- HybPiper-df_Astral_lpp.tre - Tree file of the final species tree (n = 111) generated with ASTRAL-III from the HybPiper-df analysis.
- wASTRAL_Astral_lpp.tre - Tree file of the final species tree (n = 111) generated with wASTRAL from the weighted ASTRAL (wASTRAL) analysis.
- ASTRALPro_Astral_lpp.tre - Tree file of the final species tree (n = 111) generated with ASTRAL-Pro from the ASTRAL-Pro analysis.
- concatenated_raxml.tre - Tree file of the final species tree (n = 111) generated with RAxML from the Concatenated analysis.
- 1-to-1_Astral_lpp.tre - Tree file of the final species tree (n = 111) generated with ASTRAL-III from the "one-to-one orthologs" (1-to-1) analysis.
- MI_Astral_lpp.tre - Tree file of the final species tree (n = 111) generated with ASTRAL-III from the "Maximum Inclusion" (MI) analysis.
- MO_Astral_lpp.tre - Tree file of the final species tree (n = 111) generated with ASTRAL-III from the "Monophyletic Outgroups" (MO) analysis.
- RT_Astral_lpp.tre - Tree file of the final species tree (n = 111) generated with ASTRAL-III from the "Rooted Ingroups" (RT) analysis.
Plastid:
- 111mapping_Packera_plastomes.tre - Tree file of the final species tree (n = 111) generated with IQ-TREE using partially complete plastomes from Bowtie as input.
- 18complete_Packera_plastomes.tre - Tree file of the final species tree (n = 18) generated with IQ-TREE using complete plastomes from GetOrganelle as input.
File: PhyTop_results.zip
Description: Folder containing the tab-separated values (.tsv) files and tree (.tree) file output from PhyTop for each ASTRAL analysis: HybPiper-df, wASTRAL, ASTRAL-Pro, 1-to-1, MO, MI, and RT. Headers in the tsv files represent:
- node: node number, matching the node labels in the corresponding tree file
- n: the number of gene trees
- p_value: the p-value indicating whether topologies q2 and q3 are equal
- q1/q2/q3: proportion of gene trees that support the q1, q2, or q3 topology, respectively
- ILS_explain: the proportion of gene tree incongruence that can be explained by incomplete lineage sorting (ILS)
- IH_explain: the proportion of gene tree incongruence that can be explained by introgression (IH)
- ILS_index: the degree of ILS detected among lineages
- IH_index: the degree of IH detected among lineages
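The headers described above can be used to pull out nodes of interest from a PhyTop table. The sketch below is illustrative only: it assumes the .tsv files carry a header row with exactly the column names listed here, and the 0.05 threshold and the function name `nodes_with_unequal_q2_q3` are assumptions made for the example, not values or tools used in the study.

```python
import csv
import io
from typing import Dict, List


def nodes_with_unequal_q2_q3(tsv_text: str, alpha: float = 0.05) -> List[Dict[str, object]]:
    """Return rows whose p_value is below alpha, i.e. where q2 and q3 appear unequal."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        try:
            p_value = float(row["p_value"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric p-value
        if p_value < alpha:
            flagged.append(
                {
                    "node": row["node"],
                    "p_value": p_value,
                    "q1": float(row["q1"]),
                    "q2": float(row["q2"]),
                    "q3": float(row["q3"]),
                }
            )
    return flagged


# Example usage (file name is a placeholder, not a file guaranteed to be in the archive):
# with open("PhyTop_results/example.tsv") as handle:
#     for hit in nodes_with_unequal_q2_q3(handle.read()):
#         print(hit["node"], hit["q2"], hit["q3"])
```

Reading the text first and parsing it with `csv.DictReader` keeps the example free of third-party dependencies; a dataframe library would work equally well.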
File: RelTime_trees.zip
Description: Folder containing dated nuclear tree files of all tested scenarios (Scenarios 1-6) of the HybPiper-df, 1-to-1, and MO analyses used in this study. Dating was performed with RelTime using calibration points and scenarios as indicated in Supplementary Table 6. File names follow the pattern [Scenario#]-[Analysis]_[tree ending] (e.g., S2outTest-HybPiper-df_nexus.tre).
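Because analysis names such as HybPiper-df and 1-to-1 contain hyphens themselves, splitting these file names needs a little care. One possible approach, sketched below, is to split on the first hyphen for the scenario and on the last underscore for the tree ending. The function name `parse_reltime_name` is hypothetical, and the approach assumes that scenario labels contain no hyphen and tree endings contain no underscore, which holds for the example given above but is not confirmed for every file.

```python
from typing import Dict


def parse_reltime_name(filename: str) -> Dict[str, str]:
    """Split '[Scenario#]-[Analysis]_[tree ending]' into its three parts."""
    # The scenario is everything before the FIRST hyphen.
    scenario, rest = filename.split("-", 1)
    # The tree ending is everything after the LAST underscore.
    analysis, ending = rest.rsplit("_", 1)
    return {"scenario": scenario, "analysis": analysis, "ending": ending}


print(parse_reltime_name("S2outTest-HybPiper-df_nexus.tre"))
```

A name that lacks either separator raises a `ValueError`, which makes files that do not follow the pattern easy to spot.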
Access information
Other publicly accessible locations of the data:
- None
Data was derived from the following sources:
- Raw sequence data are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under BioProjects PRJNA907383, PRJNA978568, PRJNA540287, and PRJNA516161.
Please contact Erika Moore-Pollard (moore.erika.r@gmail.com) if you would like additional data.
