Comprehensive phylogenomic time tree of bryophytes reveals deep relationships and uncovers gene incongruences in the last 500 million years of diversification
Data files
Nov 06, 2023 version files 383.38 MB
Abstract
Premise
Bryophytes, land plants defined by a free-living gametophyte and an unbranched sporophyte, form a major component of terrestrial plant biomass, structuring ecological communities in all biomes. Our understanding of the evolutionary history of hornworts, liverworts and mosses has been significantly reshaped by inferences from molecular data, highlighting extensive homoplasy in various traits and repeated bursts of diversification. However, the timing of key events in the phylogeny, and the degree to which the observed homoplasy represents error or biological processes, remain poorly resolved.
Methods
Using the GoFlag probe set, we sampled 405 exons representing 228 nuclear genes for 531 species from 51 of the 53 orders of bryophytes. We inferred the species phylogeny from gene tree analyses using concatenated and coalescence approaches, assessed gene conflict, and estimated the timing of divergences based on 29 fossil calibrations.
Results
The phylogeny resolves many relationships across the bryophytes, enabling us to resurrect five liverwort orders and recognize three more, and propose ten new orders of mosses. Most orders originated in the Jurassic or earlier and diversified in the Cretaceous or later. The phylogenomic data also highlight topological conflict in parts of the tree, suggesting complex processes of diversification that cannot be adequately captured in a single gene tree topology.
Conclusions
We sampled hundreds of homologous loci across a broad phylogenetic spectrum spanning at least 450 Ma of evolution, and these data resolved many of the critical nodes of the diversification of bryophytes. The data also highlight the need to explore the mechanisms underlying the phylogenetic ambiguity at specific nodes. The phylogenomic data provide an expandable framework toward reconstructing a comprehensive phylogeny of bryophytes and for investigating the transformations of traits in this important group of plants.
=====================Files correspond to
Comprehensive phylogenomic time tree of bryophytes reveals deep relationships and uncovers gene incongruences in the last 500 million years of diversification
Julia Bechteler*, Gabriel Peñaloza-Bojacá*, David Bell, Gordon Burleigh*, Stuart McDaniel, Christine Davis, Emily Sessa, Alexander Bippus, D. Christine Cargill, Sahut Chantanoarrapint, Isabel Draper, Lorena Endara, Laura L. Forrest, Ricardo Garilleti, Sean W. Graham, Sanna Huttunen, Javier Jauregui Lazo, Francisco Lara, Juan Larraín, Lily Lewis, David Long, Dietmar Quandt, Karen Renzaglia, Alfons Schäfer-Verwimp, Adriel Sierra Pinilla, Matt von Konrat, Charles Zartman, Marta Regina Pereira, Bernard Goffinet & Juan Carlos Villarreal A.
=====================Files in this archive
1. Scripts used in this study are listed in 1_Scripts_GoFlag_BryophytePhylogenomics.txt
2. Nucleotide (NT) dataset and corresponding analyses
2-1) Nucleotide-Alignments are stored in the folder "alignments_NT".
This folder contains two subfolders: "loci", which holds the single-locus files and the concatenated locus alignment, and "genes", which holds the single gene/exon files (see the explanation of the term "gene" below, quoted from the Methods section of the manuscript).
Explanation of the term "gene", quoted from the Methods section of our manuscript: "All loci in the GoFlag probe sets correspond to nuclear exons, some of which are part of the same gene (Breinholt et al., 2021). After our screening, we were left with alignments from 405 of the 408 loci (i.e., exons) covered by the GoFlag flagellate land plant probe set. We concatenated the loci found in the same gene in 228 alignments, each containing between one and nine exons."
Within the folder "loci" are three different and unique files and one folder:
- "Bryophytes.TargetOnly.Genus.nodups.28Jul2021_L.phy" is the concatenated alignment file based on single loci
- "Bryophytes.TargetOnly.Genus.nodups.28Jul2021.GeneBoundaries_L_RAxMLpart.txt" is the partition file used for RAxML analysis with the corresponding "loci" alignment file (see above)
- "Bryophytes.TargetOnly.Genus.nodups.28Jul2021.GeneBoundaries.txt" describes the boundaries of the single loci in the corresponding alignment file (see above) and was used to set the partitioning scheme (see above)
- Folder "single_locus_alignments" contains the PHYLIP files of each single locus used in the corresponding "loci" alignment (see above)
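The relationship between locus boundaries and a RAxML partition file can be sketched as follows. This is a minimal Python sketch and not the procedure used in this study: the layout of the GeneBoundaries file is not described here, so the input is assumed to be a list of hypothetical (name, length) pairs in alignment order. The output lines follow the standard RAxML partition syntax "DNA, name = start-end".

```python
def boundaries_from_lengths(loci):
    """Turn (name, length) pairs into (name, start, end) tuples.

    Coordinates are 1-based and inclusive, as RAxML partition files expect.
    The (name, length) input layout is an assumption made for illustration.
    """
    boundaries = []
    start = 1
    for name, length in loci:
        if length < 1:
            raise ValueError(f"locus {name!r} has non-positive length {length}")
        end = start + length - 1
        boundaries.append((name, start, end))
        start = end + 1
    return boundaries


def raxml_partition_lines(boundaries, data_type="DNA"):
    """Format (name, start, end) tuples as RAxML partition lines."""
    return [f"{data_type}, {name} = {start}-{end}" for name, start, end in boundaries]


if __name__ == "__main__":
    # Hypothetical locus names and lengths, for illustration only.
    example = [("L1", 300), ("L2", 150)]
    for line in raxml_partition_lines(boundaries_from_lengths(example)):
        print(line)
```

Each partition covers a contiguous block of columns, so the end of one locus is always one less than the start of the next.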
Within the folder "genes" are three different and unique files and one folder:
- "Bryophytes.TargetOnly.Genus.nodups.28Jul2021_G.phy" is the concatenated alignment file based on the "gene" alignments, i.e. the concatenated loci (exons) belonging to the same gene (see explanation of the term "gene" above)
- "Bryophytes.TargetOnly.Genus.nodups.28JulGeneBoundaries_G_RAxMLpart.txt" is the partition file used for RAxML analysis with the corresponding "genes" alignment file (see above)
- "concatenate_G.pl" is the script used to concatenate the single "genes" alignments (listed in folder "genesNT_alignments") leading to the "Bryophytes.TargetOnly.Genus.nodups.28Jul2021_G.phy" alignment
- Folder "genesNT_alignments" contains phylip files of each single "gene" that was used in the corresponding "gene" alignment (see above "Bryophytes.TargetOnly.Genus.nodups.28Jul2021_G.phy")
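The concatenation step can be illustrated with a short Python sketch. It is not a port of "concatenate_G.pl", whose behavior is not described here. The sketch assumes that each alignment is held as a dictionary mapping taxon name to sequence, and that a taxon missing from one alignment is padded with gap characters; both are assumptions made for illustration.

```python
def concatenate_alignments(alignments, missing="-"):
    """Concatenate alignments given as {taxon: sequence} dictionaries.

    Taxa absent from an alignment are padded with the `missing` character
    (an assumption; the study's own script may treat them differently).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for index, aln in enumerate(alignments, start=1):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment {index} is empty or has unequal sequence lengths")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    # Hypothetical toy alignments, for illustration only.
    gene_1 = {"A": "ACGT", "B": "AC-T"}
    gene_2 = {"A": "GG", "C": "TT"}
    for taxon, seq in concatenate_alignments([gene_1, gene_2]).items():
        print(taxon, seq)
```

Padding keeps every row the same length, which the PHYLIP format requires.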
2-2) Data related to ASTRAL analyses of the nucleotide data.
This folder contains a subfolder ("ML") with the nucleotide gene trees obtained with RAxML and the scripts used to generate them. These gene trees, combined in "NoDups_G_GeneTrees_combined_Jul2021.phy", were the input for the final nucleotide Astral analyses. Files for the scoring runs (-q) are included as well.
Within the folder "ML" there are scripts used to obtain gene trees with RAxML that were used as input for ASTRAL
- "raxml_wrapper_bryo_G_ML.pl" is the script used to infer the gene trees
- "config.raxml_G_ML" is the corresponding configuration file
- "raxml_gene_trees_G_ML.sh" was used to execute the "raxml_wrapper_bryo_G_ML.pl" and "config.raxml_G_ML" scripts
- Folder "RAxML_gene-trees" contains the output (gene trees) obtained via the scripts above
Within the folder "Astral" there are input and output files of the Astral analyses
- "NoDups_G_GeneTrees_combined_Jul2021.phy" is the input file for the Astral analysis, i.e. all gene trees (see Folder "ML"/"RAxML_gene-trees") combined in one file
- "Astral_qs_XX_Bryo_G.txt" are the scripts used to start the Astral analyses whereby "XX" in the name is a placeholder for t1, t2, t8, t16, t32 corresponding to the Astral parameters
- "Astral_NoDups_G_Jul2021_XX_log.txt" and "Astral_NoDups_G_Jul2021_XX.tre" are the Astral output files whereby "XX" in the name is a placeholder for qs_t1, t2, t8, t16, t32 corresponding to the Astral parameters used. "log.txt" are the log-output files and ".tre" are the tree files
- "freqQuad.csv" and "freqQuadVisualization.R" are further Astral output files used for visualization in R
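A combined gene-tree file of the kind used as Astral input holds one Newick tree per line. The Python sketch below shows one way to build such a file from individual tree strings. It is illustrative only and not the procedure used in this study; the function name and the example trees are hypothetical.

```python
def combine_newick(trees):
    """Join Newick strings into one text block, one tree per line.

    Empty entries are skipped. Every tree must end with ';'.
    """
    lines = []
    for index, tree in enumerate(trees, start=1):
        text = tree.strip()
        if not text:
            continue  # skip empty entries
        if not text.endswith(";"):
            raise ValueError(f"tree {index} does not end with ';'")
        lines.append(text)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical toy gene trees, for illustration only.
    gene_trees = ["(A,(B,C));\n", "", "((A,B),C);"]
    print(combine_newick(gene_trees), end="")
```

Checking for the terminating semicolon catches truncated tree files before they reach the species-tree analysis.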
2-3) Data related to the concordance factor analyses using the nuclear data and conducted in IQTree.
Within this folder are two folders (see below) and the script used in concordance factor analysis conducted in IQTree: "Scrip_Concordance_factors_Brio_NT.txt"
- Folder "ConcorfactNT_IQTree" contains 27 files. We inferred a species tree in IQTree from the concatenated nucleotide gene alignment in “2-1_alignments_NT/genes” and annotated the concordance factors on it. The standard IQTree output files are: the IQ-TREE report “Concat.iqtree”; the maximum-likelihood tree “Concat.treefile”; the likelihood distances “Concat.mldist”; the split support values from the ultrafast bootstrap approximation “Concat.splits.nex”; the consensus tree “Concat.contree”; the partition information “Concat.best_model.nex”; the checkpoint file “Concat.ckp.gz”, which indicates that a previous run finished successfully; and the log of the entire run “Concat.log”.
We constructed a set of gene trees from the nucleotide gene alignments in folder “2-1_alignments_NT/genes”. The output files of the gene tree analysis in IQTree are: the best partitioning scheme “FullGenes.best_scheme” and “FullGenes.best_scheme.nex”; the summary partition information “FullGenes.best_model.nex”; the IQ-TREE report “FullGenes.iqtree”; the maximum-likelihood trees “FullGenes.treefile”; the checkpoint file “FullGenes.ckp.gz”, which indicates that a previous run finished successfully; the model checkpoint file “FullGenes.model.gz”; and the log of the entire run “FullGenes.log”.
The results of the concordance factor analysis are: the tree with concordance factors “ConcordFullGen.cf.tree”; the annotated tree (best viewed in FigTree) “ConcordFullGen.cf.tree.nex”; the tree with branch IDs “ConcordFullGen.cf.branch”; the concordance factors per branch “ConcordFullGen.cf.stat”; the site concordance factors for quartets “ConcordFullGen.cf.quartet”; the concordance factors per branch and tree “ConcordFullGen.cf.stat_tree”; the concordance factors per branch and locus “ConcordFullGen.cf.stat_loci”; and the log of the entire run “ConcordFullGen.log”. The results of the test of the assumptions of an incomplete lineage sorting (ILS) model are in “Fulldata_A_ILS.csv” (see http://www.robertlanfear.com/blog/files/concordance_factors.html).
- Folder "ConcorfactNT_Astral" contains 11 files. “Astral_NoDups_G_Jul2021.tre” is the species tree output of the Astral analysis. “Astral_NoDups_G_Jul2021_rooted.tre” is the rooted Astral species tree, which was the input tree topology for the concordance factor (CF) analyses. “NoDups_G_Jul2021_GeneTrees_combined_rooted.tre” contains the rooted gene trees. The CF analysis in IQTree generated the following output files: the tree with concordance factors “ConFac_Astral_Rooted.cf.tree”; the annotated tree “ConFac_Astral_Rooted.cf.tree.nex”; the tree with branch IDs “ConFac_Astral_Rooted.cf.branch”; the concordance factors per branch “ConFac_Astral_Rooted.cf.stat”; the site concordance factors for quartets “ConFac_Astral_Rooted.cf.quartet”; the concordance factors per branch and tree “ConFac_Astral_Rooted.cf.stat_tree”; the concordance factors per branch and locus “ConFac_Astral_Rooted.cf.stat_loci”; and the log of the entire run “ConFac_Astral_Rooted.log”.
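The ILS test referred to for “Fulldata_A_ILS.csv” can be illustrated with a short Python sketch. As we understand the linked blog post, the test asks whether the two discordant topologies around a branch are supported by equal numbers of gene trees (or sites), which is what a simple ILS model predicts. The sketch below runs a chi-square goodness-of-fit test with one degree of freedom on two counts. The function name and the example counts are hypothetical, and this is not the script used in this study.

```python
import math


def equal_frequency_test(count_1, count_2):
    """Chi-square test (1 degree of freedom) that two counts are equally frequent.

    Returns (statistic, p_value). With expected counts of total/2 each,
    the statistic simplifies to (count_1 - count_2)**2 / total.
    """
    total = count_1 + count_2
    if count_1 < 0 or count_2 < 0 or total == 0:
        raise ValueError("counts must be non-negative and not both zero")
    statistic = (count_1 - count_2) ** 2 / total
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts of gene trees supporting each discordant topology.
    stat, p = equal_frequency_test(30, 10)
    print(f"chi-square = {stat:.2f}, p = {p:.4g}")
```

A small p-value indicates that the two discordant topologies are not equally frequent, so ILS alone is an unlikely explanation for the conflict at that branch.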
2-4) Data related to the treePL analyses.
Within this folder are two files and three folders:
- File "RAxML_bestTree.Bryo_Genus_nodups_28Jul2021_part_G_N100_rooted_v2.tre" is the input file for treePL. It was obtained by re-rooting "RAxML_bestTree.Bryo_Genus_nodups_28Jul2021_part_G_N100.result" in phyx (https://github.com/FePhyFoFum/phyx)
- File "RAxML_bestTree.Bryo_Genus_nodups_28Jul2021_part_G_N100.result" is the unrooted version of "RAxML_bestTree.Bryo_Genus_nodups_28Jul2021_part_G_N100_rooted_v2.tre"
- Folder "BS_trees_for_confidenceIntervals" contains four input/output files of the RAxML bootstrap (BS) replicate analysis:
- "Bryophytes.TargetOnly.Genus.nodups.28Jul2021_G.phy" was used as input alignment for RAxML
- "Bryophytes_G_RAxMLpart.txt" was used to partition the above mentioned alignment file
- "RAxML_bestTree.Bryo_Genus_nodups_28Jul2021_part_G_N100.result" is the RAxML output file
- "RAxML_bootstrap.boot.tre" contains the BS replicate trees used in the following treePL analysis including confidence intervals
- Folder "treePL_v8_no-nodebars" contains four files and four folders:
- "GoFlag_Bryo_July2021_treePL_config-v8.tre" is the input tree topology used in the treePL analyses
- "treePL_config_Goflag_July21_v8-X.txt" are the three scripts used in the treePL analysis, whereby "X" is a placeholder for 1, 2, 3, corresponding to script 1, script 2, and script 3 of the treePL analysis
- Folders "treePL_cross-validation_config-v8_results_X" each contain three treePL output files, whereby "X" identifies one of four independent treePL runs with the same input files and scripts. "GoFlag_Bryo_July2021_treePL.tre" and "GoFlag_Bryo_July2021_treePL.tre.r8s" are the tree files and "randomcv_PEN.txt" contains the cross-validation statistics of the respective run.
- Folder "treePL_v8_1000BS_nodebars" contains four files corresponding to the treePL analysis with nodebars
- "reePL_config_Goflag_July21_nodebars_v8-3.txt" contains the script used in treePL
- "GoFlag_Bryo_July2021_treePL_config-v8_nodebars.tre" is the output of treePL and the script above
- "RAxML_bootstrap.boot_rooted.tre" is the input of the treePL analysis
- "RAxML_bootstrap.boot.tre" is the unrooted version of "RAxML_bootstrap.boot_rooted.tre"
2-5) Data related to the r8s analysis
The IQTree results of the nuclear dataset were used to calculate absolute substitution rates in r8s. Input and output files are located in this folder.
The input files are the tree topology used in the r8s analyses, “Iqtree_Julia_brio_RAxML_bestTree_rooted.tre”, and the same tree in nexus format with the fossil calibration commands, “Iqtree_Julia_brio_RAxML_bestTree_rooted.nex”.
The "r8s_results" folder contains 6 output files with the results of the analysis: a tree whose branch lengths are the standard estimated numbers of substitutions (in the same units as the input tree), “Iqtree_Julia_brio_RAxML_bestTree_rooted_phylo_description_testV8-collapse_R8s.tree”; a tree whose branch lengths are scaled to correspond to temporal duration, “Iqtree_Julia_brio_RAxML_bestTree_rooted_chrono_description_testV8-collapse_R8s.tree”; a tree whose branch lengths are the estimated absolute rates in substitutions per site per unit time, “Iqtree_Julia_brio_RAxML_bestTree_rooted_rato_description_testV8-collapse_R8s.tree”; a table of information about each node in the tree, “Iqtree_Julia_brio_RAxML_bestTree_rooted_node_info_testV8-collapse_R8s”; a file with the estimated ages and substitution rates for tree tree_1, “Iqtree_Julia_brio_RAxML_bestTree_rooted_rates_testV8-collapse_R8s”; and the log of the entire run, “Brio_Iqtree_julia_test_v8”.
3. Amino-acid (AA) dataset and corresponding analyses
3-1) Amino-acid alignments of the translated nt "gene" alignments are stored in the folder "3-1genesAA_alignments". Within this folder a concatenated amino-acid alignment file is listed as well: "Bryophyte.AA.09Aug2022.phy".
3-2) Data related to ASTRAL analyses of the amino-acid dataset.
Within this folder are the amino-acid gene trees ("GenesAA.treefile"). The folder contains all the output files related to the species tree analysis in Astral using the AA dataset.
Within the folder "3-2_Astral_AA" there are 17 input and output files of the Astral analyses. “GenesAA.treefile” contains all gene trees used as input for the Astral analysis, combined in one file. “GenesAA_root.treefile” contains the same gene trees, rooted. The estimated species tree is in “SpeciesTree_AA.tre” and, rooted, in “SpeciesTree_AA_root.tre”. Alternative quartet topologies (outputs q1, q2, q3; these three values show the quartet support for the main topology): “GenesAA_AlterQuar.log”, “GenesAA_AlterQuar.tre” and the rooted file “GenesAAroot_AlterQuar.tre”. Alternative posteriors (the output includes three local posterior probabilities, pp1, pp2, pp3, for the main topology): “GenesAA_AlterLpp.log”, “GenesAA_AlterLpp.tre” and the rooted file “GenesAAroot_AlterLpp.tre”. Files from scoring an existing species tree to obtain its quartet score, branch lengths, and branch support values: “GenesAA_Scores.log”, “GenesAA_Scores.tre”, and the rooted file “GenesAAroot_Scores.tre”. Files with full annotations, in which each branch receives several different measurements (see https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md): “GenesAA_fullanot.log” and “GenesAA_fullanot.tre”. The files "t16anot_freqQuad.csv" and "t16anot_freqQuadVisualization.R" are further Astral output files used for visualization in R.
3-3) Data related to concordance factor analyses of the amino-acid dataset conducted in IQTree.
Folder "3-3_Concor_fact_AA" contains 33 files:
We inferred a species tree from the concatenated gene/exon alignment in “3-1_genesAA_alignments” and annotated the concordance factors on it. The output files are: the IQ-TREE report “SptreeAA.iqtree”; the maximum-likelihood tree “SptreeAA.treefile”; the likelihood distances “SptreeAA.mldist”; the split support values from the ultrafast bootstrap approximation “SptreeAA.splits.nex”; the consensus tree “SptreeAA.contree”; the partition information “SptreeAA.best_model.nex”; the checkpoint file “SptreeAA.ckp.gz”, which indicates that a previous run finished successfully; the model checkpoint files “SptreeAA.best_scheme”, “SptreeAA.best_scheme.nex” and “SptreeAA.model.gz”; and the log of the entire run “SptreeAA.log”.
We constructed a set of gene trees from the alignments of each gene/exon in “3-1_genesAA_alignments”. The output files are: the best partitioning scheme “GenesAA.best_scheme” and “GenesAA.best_scheme.nex”; the summary partition information “GenesAA.best_model.nex”; the IQ-TREE report “GenesAA.iqtree”; the maximum-likelihood trees “GenesAA.treefile”; the checkpoint file “GenesAA.ckp.gz”, which indicates that a previous run finished successfully; the model checkpoint file “GenesAA.model.gz”; and the log of the entire run “GenesAA.log”.
The results of the concordance factor analysis are: the tree with concordance factors “ConfacAA.cf.tree”; the annotated tree (best viewed in FigTree) “ConfacAA.cf.tree.nex”; the tree with branch IDs “ConfacAA.cf.branch”; the concordance factors per branch “ConfacAA.cf.stat”; the site concordance factors for quartets “ConfacAA.cf.quartet”; the concordance factors per branch and tree “ConfacAA.cf.stat_tree”; the concordance factors per branch and locus “ConfacAA.cf.stat_loci”; and the log of the entire run “ConfacAA.log”. The results of the test of the assumptions of an incomplete lineage sorting (ILS) model are in “BrioJul_AA_Iqtree_A_ILS.csv” and “BrioJul_AA_Iqtree_B_ILS.csv” (see http://www.robertlanfear.com/blog/files/concordance_factors.html). PDF files with the number of nodes in the main topology, “tree_BrioJul_AA_Iqtree.pdf”, and the rooted tree, “tree_BrioJul_AA_Iqtree_root.pdf”.
4. Supplemental figures S1-S9, including figure legends
In this folder are 9 figures in PDF format and a docx file listing the corresponding figure legends and descriptions.