Data from: Discordance Down Under: Combining phylogenomics & fungal symbioses to detangle difficult nodes in a diverse tribe of Australian terrestrial orchids
Data files
Dec 08, 2024 version files: 114.48 MB
- odonnell2024_sysbiol_supps.zip (114.48 MB)
- README.md (7.18 KB)
Abstract
Orchid mycorrhizal fungi (OMF) associations in the Orchidaceae are thought to have been a major driver of diversification in the family. In the terrestrial orchid tribe Diurideae, it has long been hypothesised that OMF symbiont associations may reflect evolutionary relationships among orchid hosts. Given that recent phylogenomic efforts have been unable to fully resolve relationships among subtribes in the Diurideae, we sought to ascertain whether orchid OMF preferences may lend support to certain phylogenetic hypotheses. First, we used phylogenomic methods and Bayesian divergence time estimation to produce a genus-level tree for the Diurideae. Next, we synthesised decades of published fungal sequences and morphological/germination data to identify dominant fungal partners at the genus scale and perform ancestral state reconstruction to estimate the evolutionary trajectory of fungal symbiont shifts. Across the tribe, we found phylogenomic discordance stemming from incomplete lineage sorting. However, our results also revealed unprecedented phylogenetic niche conservatism of fungal symbionts within the tribe: entire genera, subtribes, and even groups of related subtribes associate with only a single fungal family, suggesting that fungal symbiont preferences in the Diurideae do indeed reflect phylogenetic relationships among orchid hosts. Moreover, we show that these relationships have evolved directionally from generalist associations with multiple fungal families towards more specific partnerships with only one fungal family. Orchid symbiont preferences here provide new insights into the placement of several groups with longstanding phylogenetic uncertainty. In spite of complex evolutionary histories, host-symbiont relationships can be used to help detangle alternative phylogenetic hypotheses.
README: Discordance Down Under: Combining phylogenomics & fungal symbioses to detangle difficult nodes in a diverse tribe of Australian terrestrial orchids
https://doi.org/10.5061/dryad.b2rbnzsqp
Supporting Material
Supporting material for this study is provided as a single zip file (odonnell2024_sysbiol_supps.zip) containing folders and files named to match the supporting data captions below.
Data S1 Custom RefSeq Fungal ITS BLAST Database with additional ex-type sequences of Ceratobasidiaceae, Tulasnellaceae, Serendipitaceae, and Sebacinaceae. This folder contains the original raw RefSeq database (fungi.ITS.raw.fna), our modified RefSeq FASTA set containing the additional ex-type sequences (fungi.ITS.mod.fna), and all BLAST database files built from fungi.ITS.mod.fna for searching with blastn.
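Since fungi.ITS.mod.fna is described as the raw RefSeq set plus additional ex-type sequences, the added records can be listed by comparing sequence identifiers between the two FASTA files. The sketch below assumes that the first whitespace-delimited token of each header line is the sequence ID; it is an illustration for inspecting the files, not the procedure used to build the database.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first token after '>') from FASTA lines.

    Assumption: the ID is the first whitespace-delimited token of the header.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def added_sequences(raw_lines, mod_lines):
    """Sorted IDs present in the modified set but absent from the raw RefSeq set."""
    return sorted(fasta_ids(mod_lines) - fasta_ids(raw_lines))


def added_sequences_from_files(raw_path, mod_path):
    """Same comparison, reading the two FASTA files from disk."""
    with open(raw_path) as raw, open(mod_path) as mod:
        return added_sequences(raw, mod)
```

For example, `added_sequences_from_files("fungi.ITS.raw.fna", "fungi.ITS.mod.fna")` should return the identifiers of the added ex-type sequences, with paths adjusted to wherever the Data S1 folder was extracted.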
Data S2 Raw, unpruned cpDNA tree (with reference plastome tips displayed), together with the associated input alignments and IQ-TREE output. This folder contains two subfolders:
- alignments: Alignments for all retrieved chloroplast coding regions. Each file is named after its annotated chloroplast region.
- iqtree: Output files produced by IQ-TREE2 using the alignments folder as the input alignment directory. The final raw unpruned cpDNA tree is cp_HP_reannotated.treefile.
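Because the raw tree still contains the reference plastome tips, a quick way to see which tips are present (for example, before pruning) is to pull the tip labels out of the Newick string. The sketch below assumes unquoted labels without spaces, which is typical of IQ-TREE .treefile output but should be checked against cp_HP_reannotated.treefile; internal node labels such as support values are skipped because they follow a closing parenthesis.

```python
import re


def newick_tip_labels(newick):
    """Extract tip labels from a Newick string with unquoted labels.

    Tip labels follow '(' or ','; internal node labels (e.g. support values)
    follow ')' and are therefore not captured.
    """
    return re.findall(r"[(,]\s*([^\s(),:;\[\]]+)", newick)


def tips_from_file(path):
    """Read a Newick tree file and return its tip labels in file order."""
    with open(path) as handle:
        return newick_tip_labels(handle.read())
```

Usage: `tips_from_file("iqtree/cp_HP_reannotated.treefile")`, with the path adjusted to the local copy of Data S2.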
Data S3 Raw timetree with mean node ages and 95% HPD intervals and associated MCMCtree input and output files. This folder contains several subfolders and files:
- data: Input data used for MCMCtree including the .phy alignment file containing three partitions, and the guide topology with branch lengths removed.
- gH: MCMCtree control file (mcmctree-outBV.ctl) used to calculate the gradient and Hessian along with subsequent gradient/Hessian output files
- mcmc: MCMCtree run 1, along with a script to rename the tips on the tree
- mcmc2: MCMCtree run 2
- final_timetree.tre: Final timetree output by MCMCtree from the folder mcmc
- tip_order.csv: A CSV listing the tips in the order they should be displayed, used by the R scripts in subsequent Methods folders.
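MCMCtree reads its settings from a PAML-style control file made of `key = value` lines, where `*` starts a comment. In MCMCtree's approximate-likelihood workflow the gradient and Hessian are usually computed in a run with `usedata = 3` before the main MCMC runs use `usedata = 2`, so checking those keys in mcmctree-outBV.ctl and the run folders can be useful. The parser below is a minimal sketch for inspecting such files; it does not reproduce PAML's own parser.

```python
from pathlib import Path


def read_ctl(text):
    """Parse PAML/MCMCtree control-file text into a dict of strings.

    Anything after '*' on a line is treated as a comment; lines without
    '=' are ignored.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("*", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def read_ctl_file(path):
    """Read a control file from disk and parse it."""
    return read_ctl(Path(path).read_text())
```

For example, `read_ctl_file("gH/mcmctree-outBV.ctl").get("usedata")` returns the `usedata` setting as a string.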
Data S4 Unfiltered (blast_results_unfiltered.csv) and filtered (blast_host_joined.csv) BLAST results of fungal sequences from orchid hosts and fungal family classifications.
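The filtering criteria behind blast_host_joined.csv are not described here, so the following is only a sketch of one common approach: keep the highest-bitscore hit per query sequence and map the hit's genus to a fungal family. The column names (`query_id`, `subject_genus`, `bitscore`) and the genus-to-family lookup are hypothetical and need to be matched to the actual headers in blast_results_unfiltered.csv.

```python
import csv

# Hypothetical lookup from fungal genus to family; extend as needed.
GENUS_TO_FAMILY = {
    "Tulasnella": "Tulasnellaceae",
    "Ceratobasidium": "Ceratobasidiaceae",
    "Serendipita": "Serendipitaceae",
    "Sebacina": "Sebacinaceae",
}


def load_csv(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def best_hits(rows, query_col="query_id", score_col="bitscore"):
    """Keep the single highest-bitscore row for each query sequence."""
    best = {}
    for row in rows:
        query = row[query_col]
        if query not in best or float(row[score_col]) > float(best[query][score_col]):
            best[query] = row
    return list(best.values())


def classify(rows, genus_col="subject_genus"):
    """Add a 'family' field to each row; unknown genera become 'Unassigned'."""
    for row in rows:
        row["family"] = GENUS_TO_FAMILY.get(row[genus_col], "Unassigned")
    return rows
```

Usage: `classify(best_hits(load_csv("blast_results_unfiltered.csv")))`, after renaming the column arguments to match the file.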
Figure S1 Convergence plot comparing mean estimates between MCMCtree divergence time estimation runs.
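A convergence plot of this kind typically places the posterior mean age of each node from one run against the same node in the other run; when the runs have converged, the points fall close to the line y = x. A minimal numeric version of that check is sketched below, assuming the node means from the mcmc and mcmc2 runs have already been extracted into two lists in the same node order.

```python
import math


def compare_runs(means_run1, means_run2):
    """Summarise agreement between posterior mean node ages from two runs.

    Returns (Pearson correlation, largest absolute difference between runs).
    """
    if len(means_run1) != len(means_run2) or len(means_run1) < 2:
        raise ValueError("need two equal-length lists with at least two nodes")
    n = len(means_run1)
    m1 = sum(means_run1) / n
    m2 = sum(means_run2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(means_run1, means_run2))
    var1 = sum((a - m1) ** 2 for a in means_run1)
    var2 = sum((b - m2) ** 2 for b in means_run2)
    r = cov / math.sqrt(var1 * var2)
    max_diff = max(abs(a - b) for a, b in zip(means_run1, means_run2))
    return r, max_diff
```

A correlation near 1 combined with a small maximum difference, relative to the node ages, is what a well-converged pair of runs would show.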
Note: For all Methods folders with corresponding R/Rmd scripts, please update any input file paths specified in the scripts so that they point to where you have downloaded and stored the files locally.
Methods S1 R scripts and associated gCF/sCF files used to perform alternate topology frequency statistical tests. This folder contains two subfolders, esDNA and tsDNA, corresponding to the extended set analysis and the transcriptome set analysis, respectively. Each of these subfolders contains one further subfolder and two files:
- *_gcf_stats_binom.Rmd / *_gcf_stats_binom.html: R Markdown script used to analyse the gCF data, and its knitted HTML version
- *_stat: IQ-TREE2 output from the gCF/sCF calculation steps. The *.stat files contain the statistics themselves, while the *.branch files list the branch numbers to which these statistics correspond.
Users may work with the .Rmd files, which contain annotated code that reads in all files and produces the corresponding output.
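Under incomplete lineage sorting alone, the two discordant NNI topologies around a branch are expected to be equally frequent among gene trees. One way to test this, in the spirit of the `_binom` scripts above, is an exact two-sided binomial test of the gDF1 and gDF2 gene-tree counts against p = 0.5. The sketch below is not a transcription of the Rmd scripts; the column names `ID`, `gDF1_N` and `gDF2_N` follow IQ-TREE 2's concordance-factor output and the file is assumed to be tab-separated with `#` comment lines, so both should be checked against the provided *.stat files.

```python
import math


def binom_two_sided_half(k, n):
    """Exact two-sided binomial test p-value for k successes in n trials, p = 0.5."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    # P(X <= tail) under Binomial(n, 0.5); the distribution is symmetric.
    cdf = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * cdf)


def read_cf_stat(path):
    """Read a concordance-factor .stat file (assumed tab-separated, '#' comments)."""
    with open(path) as handle:
        lines = [ln.rstrip("\n") for ln in handle if ln.strip() and not ln.startswith("#")]
    header = lines[0].split("\t")
    return [dict(zip(header, ln.split("\t"))) for ln in lines[1:]]


def branch_pvalues(rows):
    """Yield (branch ID, gDF1_N, gDF2_N, p-value) for each branch."""
    for row in rows:
        d1 = int(float(row["gDF1_N"]))
        d2 = int(float(row["gDF2_N"]))
        yield row["ID"], d1, d2, binom_two_sided_half(d1, d1 + d2)
```

A small p-value flags a branch where one discordant topology is markedly more common than the other, which is harder to explain by incomplete lineage sorting alone.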
Methods S2 R script and associated ASTRAL output files to perform alternate topology frequency statistical tests. This folder contains three files:
- ES_astral_t32.csv: ASTRAL output statistics using the -t 32 argument from analysis of the extended set (esDNA)
- TS_astral_t32.csv: ASTRAL output statistics using the -t 32 argument from analysis of the transcriptome set (tsDNA)
- methods_S2.R: R script used to run statistical tests on the ASTRAL output statistics files in the same folder.
Users may work with the .R file, which contains annotated code that reads in all files and produces the corresponding output.
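With ASTRAL's `-t 32` option, support is reported for the main topology and the two alternative quartet topologies around each branch. A common check, shown here as an assumption rather than as the test implemented in methods_S2.R, is a chi-square test (1 degree of freedom) of whether the two alternative topologies are equally supported. The function takes the two counts directly, so the relevant columns of ES_astral_t32.csv or TS_astral_t32.csv need to be selected first.

```python
import math


def chi2_equal_alternatives(q2, q3):
    """Chi-square test (1 df) that two alternative-topology counts are equal.

    q2, q3: (effective) numbers of gene trees supporting the two alternative
    topologies around a branch. Returns (statistic, p-value).
    """
    total = q2 + q3
    if total == 0:
        return 0.0, 1.0
    # Expected count for each alternative is total / 2, which simplifies to:
    stat = (q2 - q3) ** 2 / total
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

Counts can be non-integer (ASTRAL reports effective numbers of gene trees); the statistic is computed the same way.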
Methods S3 Input alignment (alignment.fasta), partition file (partitions.txt), gene trees (ES_r1_sub50pc_sub500bp_loci_sub0.tre), ASTRAL species tree with branch lengths removed (astraltree_rooted_noBL_noSup.tre), genesortR analysis script (genesortR.R) and subsequent output produced by genesortR (all other files).
Users may work with the .R file, which contains annotated code that reads in all files and produces the corresponding output.
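genesortR ranks loci using several per-locus properties, among them the proportion of variable sites. As an illustration of how one such property can be computed from the alignment and partition file, the sketch below assumes RAxML-style partition lines (`DNA, name = start-end`, a single contiguous range per locus) and an alignment already loaded as a dict of equal-length sequences. Both the partition format and the handling of gaps and missing data are assumptions, and this is not genesortR's own code.

```python
import re


def parse_partitions(text):
    """Parse lines such as 'DNA, locus1 = 1-500' into name -> (start, end).

    Coordinates are 1-based and inclusive, as in the partition file.
    """
    parts = {}
    for line in text.splitlines():
        m = re.match(r"\s*\w+\s*,\s*([^\s=]+)\s*=\s*(\d+)\s*-\s*(\d+)\s*$", line)
        if m:
            parts[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return parts


def variable_site_proportion(alignment, start, end, missing="-?N"):
    """Fraction of columns in [start, end] with more than one observed character.

    alignment: dict of taxon -> aligned sequence (all the same length).
    Characters listed in `missing` (gaps, '?', 'N') are ignored.
    """
    seqs = [s.upper() for s in alignment.values()]
    variable = 0
    for col in range(start - 1, end):
        states = {s[col] for s in seqs if s[col] not in missing}
        if len(states) > 1:
            variable += 1
    return variable / (end - start + 1)
```

Usage: for each `name, (start, end)` in `parse_partitions(...)`, call `variable_site_proportion(alignment, start, end)`.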
Methods S4 R script (methods_S4.R) and associated files used to perform model fitting and ancestral state reconstruction, along with the resulting output files. In addition to the R script, this folder contains three subfolders and five files:
- final_models: Model scores output for each tested topology by methods_S4.R in csv and rds formats
- PP: Node character posterior probability scores for each tested topology output by methods_S4.R in csv format along with guide topologies with numbered nodes.
- transition_rate_figs: PDF figures of estimated transition rates for each model for each tested topology output by methods_S4.R
- dom_fungi_df: Percentage of fungal sequences identified for each orchid genus included in this study, along with the final dominant fungal partners coded for the subsequent ancestral state reconstruction analysis; used as input for methods_S4.R
- fungi_counts.csv: csv containing the aggregated number of sequences for each fungal genus used as input for methods_S4.R
- tip_order.csv: csv containing tip branches in the order they should be plotted which is used as input by methods_S4.R to rearrange branch tips for topologies 1 and 2 for visualisation purposes
- tip_order_topo3.csv: csv containing tip branches in the order they should be plotted which is used as input by methods_S4.R to rearrange branch tips for topology 3 for visualisation purposes
- mcmctree_scaled.tre: Timetree output by MCMCtree with branches renamed and node ages scaled by 100 used as input for methods_S4.R
Users may work with the .R file, which contains annotated code that reads in all files and produces the corresponding output. To use the saved .rds objects instead of fitting the models yourself, uncomment the lines that load the .rds files and run those instead.
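mcmctree_scaled.tre is described as the MCMCtree timetree with node ages scaled by 100. In an ultrametric tree, multiplying every branch length by a constant multiplies every node age by the same constant, so this scaling can be reproduced by rewriting the branch lengths in the Newick string. That the factor converts MCMCtree's time unit (often set to 100 Myr) into millions of years is an assumption here. The sketch only rescales numbers that follow a colon and leaves any bracketed annotations, such as HPD intervals, unchanged.

```python
import re


def scale_branch_lengths(newick, factor=100.0):
    """Multiply every branch length (the number after ':') in a Newick string."""

    def rescale(match):
        value = float(match.group(1)) * factor
        # Ten significant digits avoids printing floating-point noise.
        return ":" + format(value, ".10g")

    # Branch lengths may be written in plain or scientific notation.
    return re.sub(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)", rescale, newick)
```

For example, `scale_branch_lengths("((A:0.5,B:0.5):0.25,C:0.75);")` gives `((A:50,B:50):25,C:75);`.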
Table S1 Table of taxa and associated accessions used in this study.
Table S2 Fossil calibrations used for Bayesian divergence time estimation.
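For readers re-running the dating analysis, MCMCtree accepts soft-bounded calibrations written into the tree file as quoted node labels of the form `'B(tL, tU, pL, pU)'`, with ages expressed in the analysis' time unit and pL and pU defaulting to 0.025. The helper below converts bounds given in millions of years into that notation. The 100 Myr time unit is an assumed default, and the numbers in the example are illustrative only, not values from Table S2.

```python
def soft_bound(min_ma, max_ma, time_unit_myr=100.0, p_lower=0.025, p_upper=0.025):
    """Format an MCMCtree soft-bound calibration label, 'B(tL, tU, pL, pU)'.

    Ages are given in millions of years and divided by the time unit.
    """
    if not 0 <= min_ma < max_ma:
        raise ValueError("require 0 <= min_ma < max_ma")
    t_lower = min_ma / time_unit_myr
    t_upper = max_ma / time_unit_myr
    return "'B({:g}, {:g}, {:g}, {:g})'".format(t_lower, t_upper, p_lower, p_upper)


# Illustrative bounds only (not taken from Table S2).
print(soft_bound(10, 20))
```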
Table S3 Percentage of fungal sequences identified for each orchid genus included in this study, along with the final dominant fungal partners coded for the subsequent ancestral state reconstruction analysis.
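The percentages in Table S3 amount to a per-genus tally of fungal family assignments. A minimal sketch is given below, assuming a list of (orchid genus, fungal family) pairs such as could be derived from blast_host_joined.csv. The 50% threshold used to flag a family as dominant is a hypothetical placeholder, not the coding criterion used in the study.

```python
from collections import Counter, defaultdict


def family_percentages(pairs):
    """Map each orchid genus to {fungal family: percentage of its sequences}."""
    counts = defaultdict(Counter)
    for genus, family in pairs:
        counts[genus][family] += 1
    result = {}
    for genus, fam_counts in counts.items():
        total = sum(fam_counts.values())
        result[genus] = {fam: 100.0 * n / total for fam, n in fam_counts.items()}
    return result


def dominant_families(percentages, threshold=50.0):
    """Families whose share meets a (hypothetical) threshold, per genus."""
    return {
        genus: sorted(fam for fam, pct in shares.items() if pct >= threshold)
        for genus, shares in percentages.items()
    }
```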
Table S4 Summary tables comparing the performance of unordered and ordered character evolution models for each tested topology.
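In Mk-type models of discrete character evolution, an unordered model allows direct transitions between any pair of states, whereas an ordered model only allows transitions between states that are adjacent in a fixed order. The sketch below builds equal-rates versions of both instantaneous rate matrices so the structural difference is visible, and adds the small-sample corrected AIC as one common way of comparing fitted models. The state order, the single shared rate and the choice of AICc are assumptions for illustration; the models fitted in methods_S4.R and the scores reported in Table S4 may be parameterised differently.

```python
def rate_matrix(k, rate=1.0, ordered=False):
    """Build a k x k instantaneous rate matrix Q (rows sum to zero).

    Unordered: every off-diagonal transition has the given rate.
    Ordered: only transitions between neighbouring states i <-> i+1 are allowed.
    """
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j and (not ordered or abs(i - j) == 1):
                q[i][j] = rate
        # The diagonal makes each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in range(k) if j != i)
    return q


def aicc(log_lik, n_params, n_obs):
    """Small-sample corrected AIC; lower values indicate better support."""
    aic = 2 * n_params - 2 * log_lik
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
```

With three states, the ordered matrix has zeros between the first and third states, so a change between them must pass through the middle state.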
Table S5 Summary table of all genera included for fungal character state analyses outlining the number of species with fungal data available, the type of data available, and a sampling fraction for each genus (i.e. the total number of species per genus with fungal data divided by the number of accepted species per genus). Fungal data literature sources are provided as additional tabs.
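The sampling fraction in Table S5 is the number of species in a genus with fungal data divided by the number of accepted species in that genus. A short worked illustration with hypothetical counts:

```python
def sampling_fraction(species_with_fungal_data, accepted_species):
    """Fraction of a genus' accepted species that have fungal data."""
    if accepted_species <= 0:
        raise ValueError("accepted_species must be positive")
    if not 0 <= species_with_fungal_data <= accepted_species:
        raise ValueError("species_with_fungal_data must lie between 0 and accepted_species")
    return species_with_fungal_data / accepted_species


# Hypothetical genus: 12 of 40 accepted species have fungal data.
print(sampling_fraction(12, 40))  # 0.3
```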