Data from: Complex but clear allopolyploid pattern of subtribe Tussilagininae (Asteraceae: Senecioneae) revealed by robust phylogenomic evidence, with development of a novel homeolog-sorting pipeline
Data files
Jul 18, 2024 version files: 1.58 GB
data_matrices_and_results.zip (1.58 GB)
README.md (9.54 KB)
Abstract
Polyploidy is a significant mechanism in eukaryotic evolution and is particularly prevalent in the plant kingdom. However, our knowledge about this phenomenon and its effects on evolution remains limited. A major obstacle to the study of polyploidy is the great difficulty in untangling the origins of allopolyploids. Due to the drastic genome changes and the erosion of allopolyploidy signals caused by the combined effects of hybridization and complex post-polyploid diploidization processes, resolving the origins of allopolyploids has long been a challenging task. Here we revisit this issue with the interesting case of subtribe Tussilagininae (Asteraceae: Senecioneae) and by developing HomeoSorter, a new pipeline for network inferences by phasing homeologs to parental subgenomes. The pipeline is based on the basic idea of a previous study but with major changes to address the scaling problem and implement some new functions. With simulated data, we demonstrate that HomeoSorter works efficiently on genome-scale data and has high accuracy in identifying polyploid patterns and assigning homeologs. Using HomeoSorter, the maximum pseudo-likelihood model of Phylonet, and genome-scale data, we further address the complex origin of Tussilagininae, a speciose group (ca. 45 genera and 710 species) characterized by having high base chromosome numbers (mainly x = 30, 40). In particular, the inferred patterns are strongly supported by the chromosomal evidence. Tussilagininae is revealed to comprise two large groups with successive allopolyploid origins: Tussilagininae s.s. (mainly x = 30) and the Gynoxyoid group (x = 40). Two allopolyploidy events first give rise to Tussilagininae s.s., with the first event occurring between the ancestor of subtribe Senecioninae (x = 10) and a lineage (highly probably with x = 10) related to the Brachyglottis alliance, and the resulting hybrid lineage crossing with the ancestor of Chersodoma (x = 10) and leading to Tussilagininae s.s. 
Then, after early diversification, the Central American group (mainly x = 30) of Tussilagininae s.s. is involved in a third allopolyploidy event, again with the Chersodoma lineage, and produces the Gynoxyoid group. Our study highlights the value of HomeoSorter and the homeolog-sorting approach in polyploid phylogenetics. With rich species diversity and clear evolutionary patterns, Tussilagininae s.s. and the Gynoxyoid group are also excellent models for future investigations of polyploidy.
Description of files
The data_matrices_and_results.zip archive contains all the data matrices used in this study as well as the corresponding results. The organization of the data is illustrated below:
├── 1_G727_hyb [# Phylogenetic analyses based on the HybPiper-selected sequences of 727 genes]
│ ├── 1_G727_hyb_individual_genes_and_trees [# Individual alignments of 727 genes and the corresponding maximum likelihood trees, with "xxxxxx" indicating gene codes]
│ │ ├── [xxxxxx].maincopy [# Individual alignments]
│ │ ├── RAxML_bipartitions.[xxxxxx] [# Maximum likelihood trees]
│ ├── 2_astral_analysis [# ASTRAL analysis based on the G727_hyb dataset]
│ │ ├── astral.tre [# ASTRAL result]
│ │ └── individual_genes.tre [# Individual gene trees of the G727_hyb dataset with nodes having support lower than 30% collapsed; these trees are used for the ASTRAL analysis]
│ └── 3_supermatrix_analysis [# Supermatrix analysis of the G727_hyb dataset]
│ ├── G727_hyb_partition.raxml [# Partition file of the concatenated alignment]
│ ├── G727_hyb_supermatrix.fasta [# Concatenated alignment of the G727_hyb dataset]
│ └── RAxML_bipartitions.G727_hyb_supermatrix [# The result of the supermatrix analysis]
├── 2_G727 [# Phylogenetic analyses based on the 727 genes with all potential paralogous sequences retrieved]
│ ├── 1_G727_individual_genes_and_trees [# Individual gene alignments and corresponding ML trees, with "xxxxxx" indicating gene codes]
│ │ ├── [xxxxxx].fas [# Individual alignments]
│ │ ├── RAxML_bipartitions.[xxxxxx] [# Maximum likelihood trees]
│ └── 2_rooted_and_collapsing_node_support_less_than_30 [# Rooted individual gene trees, with nodes having support lower than 30% collapsed, serving as the basis for the following Phylonet and HomeoSorter analyses]
│ └── 4_bs30_rooted
│ ├── RAxML_bipartitions.[xxxxxx].bs30.tre.reroot
├── 3_Phylonet_analyses [# A total of 17 Phylonet analyses based on the G727 dataset]
│ ├── Phylonet_A[x]
│ │ ├── Phylonet_A[x]_1 [# Files for Phylonet analyses, including gene trees and settings]
│ │ ├── Phylonet_A[x]_1.log [# Phylonet results]
├── 4_simulation_tests [# Tests of HomeoSorter, AllCoPol, Phylonet, and MPAllopp, using simulated data]
│ ├── 1_HybridSim_simulations [# Simulation of gene trees using HybridSim]
│ │ ├── 1_allopolyploidy_parental_contribution_50-50 [# Simulation for allopolyploidy with parental contributions of 50 : 50 and coalescence rates ranging from 0.1 to 200]
│ │ │ ├── cr[xx]_input.nex [# Settings for simulation, with "xx" indicating the values of coalescence rates]
│ │ │ ├── cr[xx]_output.txt [# Results of simulation]
│ │ ├── 2_allopolyploidy_parental_contribution_40-60 [# Simulation for allopolyploidy with parental contributions of 40 : 60 and coalescence rates ranging from 0.1 to 200]
│ │ ├── 3_allopolyploidy_parental_contribution_30-70 [# Simulation for allopolyploidy with parental contributions of 30 : 70 and coalescence rates ranging from 0.1 to 200]
│ │ ├── 4_allopolyploidy_parental_contribution_20-80 [# Simulation for allopolyploidy with parental contributions of 20 : 80 and coalescence rates ranging from 0.1 to 200]
│ │ ├── 5_allopolyploidy_parental_contribution_10-90 [# Simulation for allopolyploidy with parental contributions of 10 : 90 and coalescence rates ranging from 0.1 to 200]
│ │ ├── 6_autopolyploidy [# Simulation for autopolyploidy with coalescence rates ranging from 0.1 to 200]
│ ├── 2_HomeoSorter_tests [# HomeoSorter analyses based on the simulated data; for each scenario 100 replicates of HomeoSorter shuffling analyses are run]
│ │ ├── 1_allopolyploidy_parental_contribution_50-50 - 6_autopolyploidy [# Different scenarios for either allopolyploidy or autopolyploidy, different parental contributions and coalescence rates]
│ │ │ ├── cr0.1-cr200
│ │ │ │ ├── r00001-rxxxxx [# Results of 100 replicates of HomeoSorter shuffling analyses]
│ │ │ │ │ ├── final_200_genes.astraltree [# ASTRAL tree suggested by HomeoSorter]
│ │ │ │ │ └── final_200_genes.taxonmap [# Allele assignments suggested by HomeoSorter]
│ ├── 3_AllCoPol_tests [# Results of 100 replicates of AllCoPol analyses]
│ │ ├── 1_allopolyploidy_parental_contribution_50-50 - 6_autopolyploidy [# Different scenarios for either allopolyploidy or autopolyploidy, different parental contributions and coalescence rates]
│ │ │ ├── cr0.1-cr200
│ │ │ │ ├── r00001-rxxxxx
│ │ │ │ │ ├── **.nex
│ │ │ │ │ ├── **.txt
│ ├── 4_Phylonet_tests [# Results of Phylonet analyses]
│ │ ├── 1_allopolyploidy_parental_contribution_50-50 - 6_autopolyploidy [# Different scenarios for either allopolyploidy or autopolyploidy, different parental contributions and coalescence rates]
│ │ │ ├── 1_polyploid_not_specified [# Phylonet analyses with the polyploid unspecified]
│ │ │ ├── 2_polyploid_specified [# Phylonet analyses with the polyploid specified]
│ └── 5_MPAllopp_tests [# Results of MPAllopp analyses]
│ │ ├── 1_allopolyploidy_parental_contribution_50-50 - 6_autopolyploidy [# Different scenarios for either allopolyploidy or autopolyploidy, different parental contributions and coalescence rates]
│ │ │ ├── 1_polyploid_not_specified [# MPAllopp analyses with the polyploid unspecified]
│ │ │ ├── 2_polyploid_specified [# MPAllopp analyses with the polyploid specified]
├── 5_HomeoSorter_analyses [# HomeoSorter analyses based on the G727 dataset]
│ ├── bootstrapping [# Results of 50 bootstrapping replicates]
│ │ ├── majority concensus tree.nexus [# Majority consensus tree based on the results of 50 bootstrapping replicates]
│ │ ├── r[xxxx] [# Results of individual bootstrap replicates]
│ ├── genetrees.txt [# A list of gene trees for HomeoSorter analyses]
│ ├── samplelist.txt [# A list of polyploid samples to be investigated]
│ └── shuffling [# Results of 50 shuffling replicates]
│ ├── majority concensus tree.nexus [# Majority consensus tree based on the results of 50 shuffling replicates]
│ ├── r[xxxx] [# Results of individual shuffling replicates]
├── 6_Sorted_homeologs [# Phylogenetic analyses based on homeologs sorted by HomeoSorter, with and without further gene filtration]
│ ├── filtered [# Analyses based on sorted and filtered homeologs]
│ │ ├── 1_individual_genes_and_trees_50_replicates [# New gene alignments generated based on the allele assignments suggested by HomeoSorter with further gene filtration]
│ │ ├── 2_Astral_analysis [# ASTRAL analysis based on the sorted and filtered homeologs]
│ │ └── 3_supermatrix_analysis [# Supermatrix analysis based on the sorted and filtered homeologs]
│ └── unfiltered [# Analyses based on sorted and unfiltered homeologs]
│ ├── 1_individual_genes_and_trees_50_replicates [# New gene alignments generated based on the allele assignments suggested by HomeoSorter without further gene filtration]
│ ├── 2_Astral_analysis [# ASTRAL analysis based on the sorted and unfiltered homeologs]
│ │ ├── astral.tre [# ASTRAL result]
│ │ └── individual genes.tre [# Individual gene trees with nodes having support lower than 30% collapsed]
│ └── 3_supermatrix_analysis [# Supermatrix analysis based on the sorted and unfiltered homeologs]
│ ├── r[xxxx] [# Different replicates]
│ │ ├── RAxML_bipartitions.r[xxxx].unfiltered.tre [# ML tree]
│ │ ├── r[xxxx].unfiltered.fasta [# Concatenated alignment]
│ │ └── r[xxxx].unfiltered.partition.raxml [# Partition file]
│
├── Sample codes.xlsx [# A list of samples and corresponding codes]
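The gene-level folders above store each alignment next to its maximum likelihood tree, linked only by the gene code in the file name. The following is a minimal sketch, not part of the deposited data or of HomeoSorter, of how the two could be paired after extracting the archive. The function name `pair_alignments_with_trees` and the example gene code used in testing are hypothetical; the only thing taken from the listing is the naming pattern (`[xxxxxx].maincopy` or `[xxxxxx].fas` for alignments, `RAxML_bipartitions.[xxxxxx]` for trees).

```python
from pathlib import Path

# Tree file prefix as shown in the directory listing.
TREE_PREFIX = "RAxML_bipartitions."


def pair_alignments_with_trees(folder, alignment_suffix=".maincopy"):
    """Map each gene code to its (alignment, tree) paths inside one folder.

    Hypothetical helper: a gene is reported only when both the alignment
    and the matching RAxML tree file are present in the folder.
    """
    folder = Path(folder)
    pairs = {}
    for alignment in sorted(folder.glob(f"*{alignment_suffix}")):
        # Strip the suffix to recover the gene code, e.g. "000123".
        gene_code = alignment.name[: -len(alignment_suffix)]
        tree = folder / f"{TREE_PREFIX}{gene_code}"
        if tree.is_file():
            pairs[gene_code] = (alignment, tree)
    return pairs
```

Assuming the archive was extracted to a local folder, calling the function on `1_G727_hyb/1_G727_hyb_individual_genes_and_trees` with the default suffix, or on `2_G727/1_G727_individual_genes_and_trees` with `alignment_suffix=".fas"`, would return one entry per gene that has both files.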
Supplementary files in Zenodo
Supplementary_Table_S1.xls lists sample information, chromosome numbers, and assembly statistics for target nuclear genes and off-target chloroplast genes. See Supplementary_Table_S1_notes_and_references.docx for the accompanying notes and references.
Supplementary_Figures.docx contains all the supplementary figures supporting the paper.