Data from: Exploring the possible role of hybridization in the evolution of photosynthetic pathways in Flaveria (Asteraceae), the prime model of C4 photosynthesis evolution
Data files
Aug 01, 2023 version files 1.08 GB
- data.tar.gz
- README.txt
- supplemental_material.gz.tar
Abstract
Flaveria (Asteraceae) is the prime model for the study of C4 photosynthesis evolution and seems to support a stepwise acquisition of the pathway through C3-C4 intermediate phenotypes, still existing in Flaveria today. Molecular phylogenies of Flaveria based on concatenated data matrices are currently used to reconstruct the complex sequence of trait shifts during C4 evolution. To assess the possible role of hybridization in C4 evolution in Flaveria, we re-analyzed transcriptome data of 17 Flaveria species to infer the extent of gene tree discordance and possible reticulation events. We found massive gene tree discordance as well as reticulation along the backbone and within clades containing C3-C4 intermediate and C4-like species. An early hybridization event between two C3 species might have triggered C4 evolution in the genus. The clade containing all C4 species plus the C4-like species F. vaginata and F. palmeri is highly supported in our phylogenetic analyses, but it might be of hybrid origin involving F. angustifolia and F. sonorensis (both C3-C4 intermediate) as parental lineages. Hybridization seems to be a driver of C4 evolution in Flaveria and likely promoted the fast acquisition of C4 traits. This new insight can be used in further exploring C4 evolution and can inform C4 bioengineering efforts.
Usage notes
Data package from Morales-Briones and Kadereit
Exploring the possible role of hybridization in the evolution of photosynthetic pathways in Flaveria (Asteraceae), the prime model of C4 photosynthesis evolution - https://doi.org/10.18061/bssb.v2i3.8992
This package contains the supplemental material, the data, and the software outputs (e.g., fasta files, alignments, and trees).
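The archives listed under "Data files" can be unpacked with any tar utility. As a minimal sketch, assuming the archives have been downloaded to the working directory, the Python standard library can do the same; the function name `extract_archive` and the destination directory are hypothetical and not part of the package.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return its member names.

    Hypothetical helper for illustration; not shipped with the data package.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (gzip or none) itself,
    # so the unusual ".gz.tar" suffix does not matter.
    with tarfile.open(archive_path, mode="r:*") as tar:
        names = tar.getnames()
        kwargs = {}
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            kwargs["filter"] = "data"
        tar.extractall(path=dest, **kwargs)
    return names


if __name__ == "__main__":
    for archive in ("data.tar.gz", "supplemental_material.gz.tar"):
        if Path(archive).exists():
            members = extract_archive(archive, "flaveria_package")
            print(archive, len(members), "members")
```

Note that the full package is about 1.08 GB, so listing members with `tar.getnames()` before extracting may be preferable on small disks.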
- Supplemental material
Table S1. Taxon sampling and source of data
Table S2. HyDe significant hybridization tests for all species of Flaveria.
Table S3. HyDe significant hybridization tests for all species of Flaveria excluding F. pringlei (C3).
Fig. S1. a) Maximum likelihood phylogeny of Flaveria inferred with IQ-TREE from the concatenated 2124-nuclear gene supermatrix. Numbers above branches represent bootstrap support (BS). Branch lengths are in substitutions per site (scale bar on the bottom). b) ASTRAL tree of Flaveria inferred from the 2,127 nuclear gene trees. Local posterior probabilities (LPP) are shown next to nodes. Internal branch lengths are in coalescent units (scale bar on the bottom).
Fig. S2. a) Maximum likelihood cladogram of Flaveria inferred with IQ-TREE from the concatenated 2124-nuclear gene supermatrix. b) ASTRAL cladogram of Flaveria inferred from the 2,127 nuclear gene trees. Pie charts represent the proportion of gene trees that support that clade (blue), the main alternative bifurcation (green), the remaining alternatives (red), and conflict or support that have <50% bootstrap support (gray). Numbers above and below branches represent the number of concordant and discordant informative gene trees, respectively.
Fig. S3. a) Maximum likelihood cladogram of Flaveria inferred with IQ-TREE from the concatenated 2124-nuclear gene supermatrix. b) ASTRAL cladogram of Flaveria inferred from the 2,127 nuclear gene trees. Quartet Sampling (QS) scores are shown above branches. QS scores: Quartet concordance/Quartet differential/Quartet informativeness. Circles at nodes are colored by quartet concordance support.
Fig. S4. Distribution of tree-to-tree distances between empirical gene trees and the ASTRAL tree, compared to the distribution of tree-to-tree distances between simulated trees and the ASTRAL tree.
Fig. S5. Maximum pseudo-likelihood scores for species networks inferred with PhyloNet using the a) 18-taxon and b) 16-taxon data sets. The x-axis shows the maximum number of reticulations for each of the network searches, which allowed up to ten reticulation events.
Fig. S6. Maximum pseudo-likelihood species networks inferred with PhyloNet using the a) 18-taxon and b) 17-taxon data sets and allowing up to 12 reticulation events. Red and blue curved branches indicate the minor and major edges, respectively, of hybrid nodes. Numbers next to curved branches indicate inheritance probabilities for each hybrid node.
Fig. S7. Maximum likelihood cladogram of Flaveria inferred with IQ-TREE from the concatenated 2124-nuclear gene supermatrix. Numbers above branches are gene duplication counts and numbers below branches are gene duplication percentages.
Note: Supplemental files are compatible with the CC0 license and are not included in the main text of the article.
- Data
1_transcriptomes_and_genomes
final_filtered_transcriptomes:
- Coding sequence (CDS) fasta files (*.cds.fa)
- Protein translated (PEP) fasta files (*.pep.fa)
genomes:
- CDS and PEP fasta files (*.fasta)
original_transcriptome_assemblies:
- Fasta files (*.fasta) of original Trinity transcriptome assemblies
2_final_homologs:
- Tree files (*.subtre) of homologs in newick format inferred with RAxML after monophyletic clades and paraphyletic grades of the same species were masked and spurious tips were removed with TreeShrink
3_MO_orthologs_min_10_taxa:
- Tree files (*.tre) of final MO orthologs in newick format. Orthologs were obtained from final homologs (2_final_homologs)
4_MO_min_21_taxa_aln_fasta_files:
- Ortholog DNA alignments (*.aln) obtained with MAFFT in fasta format
- Clean ortholog DNA alignments (*aln-cln) from pxclsq in fasta format
5_concatenated_matrices
- flaveria_21tx_500bp_concat.fa - alignment in fasta format
- flaveria_21tx_500bp_concat.model - partition file in RAxML format
- flaveria_21tx_500bp_concat.nex - alignment in nexus format
- flaveria_21tx_500bp_concat.phy - alignment in phylip format
- flaveria_21tx_500bp_concat_taxon_occupancy_stats - taxon occupancy stats
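The partition file can be used to recover the coordinates of each gene within the concatenated alignment. The sketch below assumes the common RAxML layout of one partition per line, `MODEL, name = start-end`, with a single contiguous range per partition; the actual file in this package has not been checked against that assumption, and the function names and example partition names are hypothetical.

```python
import re

# One partition per line, e.g. "DNA, gene_a = 1-600".
# Assumption: a single contiguous range, no codon strides ("\3") and no
# comma-separated multi-range partitions.
PARTITION_RE = re.compile(
    r"^\s*(?P<model>[^,]+),\s*(?P<name>\S+)\s*=\s*(?P<start>\d+)\s*-\s*(?P<end>\d+)\s*$"
)


def parse_partitions(text):
    """Return a list of (name, model, start, end) tuples, 1-based inclusive."""
    partitions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = PARTITION_RE.match(line)
        if match is None:
            raise ValueError(f"Unrecognized partition line: {line!r}")
        partitions.append(
            (
                match["name"],
                match["model"].strip(),
                int(match["start"]),
                int(match["end"]),
            )
        )
    return partitions


def partition_lengths(partitions):
    """Map each partition name to its length in alignment columns."""
    return {name: end - start + 1 for name, _model, start, end in partitions}


if __name__ == "__main__":
    example = "DNA, gene_a = 1-600\nDNA, gene_b = 601-1350\n"
    print(partition_lengths(parse_partitions(example)))
```

For the real file, the text would be read with something like `open("flaveria_21tx_500bp_concat.model").read()` from inside 5_concatenated_matrices.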
6_phylogenetic_analyses
ASTRAL:
- ASTRAL_flaveria_2124_gene_trees_concat.tre - Species tree output from ASTRAL
- flaveria_2124_gene_trees_concat.tre - Input file containing all ortholog trees
IQtree
IQtree_concatenated - Input and output files from IQ-Tree for the concatenated alignment.
IQtree_invidivual_gene_trees - Input and output files from IQ-Tree for the individual MO ortholog alignments.
IQtree_invidivual_gene_trees_rooted - Rooted trees in newick format from IQtree_invidivual_gene_trees
QS
QS_ASTRAL_flaveria_2115_gene_trees_genetree_part - QuartetSampling output files from the analysis using the ASTRAL tree
QS_IQtree2_flaveria_21taxa_500bp_concat_genetree_part - QuartetSampling output files from the analysis using the IQtree tree
Note: "NA" values in Quartet Sampling scores mean that a node has full support (e.g., 1/NA/1) and there is no skew toward an alternative topology.
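When reading the Quartet Sampling scores programmatically, the "NA" field needs explicit handling. A minimal sketch, assuming each score is available as a string in the concordance/differential/informativeness order described for Fig. S3 (e.g., "1/NA/1"); the function name `parse_qs_score` and the dictionary keys are hypothetical, and how the strings are pulled out of the QS output files is not covered here.

```python
def parse_qs_score(label):
    """Split a 'QC/QD/QI' label into floats, mapping 'NA' to None.

    Hypothetical helper: the field order follows the Fig. S3 caption
    (quartet concordance / quartet differential / quartet informativeness).
    """
    fields = label.strip().split("/")
    if len(fields) != 3:
        raise ValueError(f"Expected three '/'-separated fields, got {label!r}")
    qc, qd, qi = (None if f == "NA" else float(f) for f in fields)
    return {"qc": qc, "qd": qd, "qi": qi}


if __name__ == "__main__":
    score = parse_qs_score("1/NA/1")
    # A missing differential on a fully concordant node means there is no
    # skew to measure, not that data are missing.
    print(score)
```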
coalescent_simulations
paup - input and output files from PAUP analyses enforcing a strict molecular clock
input_output_code - input and output trees from coalescent simulations, also includes R code to run simulations
hyde
- Input and output files from HyDe analyses using the 18-taxa dataset (all species) and the 17-taxa dataset (F. pringlei removed).
map_wgd
orthogroups_min_15_taxa - orthogroup tree files (*.ortho) pruned from 2_final_homologs
mapping - output files from orthogroup mapping
phylonet
reduced_trees - unrooted ortholog trees (*red) in newick format after pruning all outgroups except Helianthus
runs - Phylonet input (*.nex) and output (*.txt) files
phyparts
ASTRAL - output files from Phyparts from the analysis using the ASTRAL tree
IQtree - output files from Phyparts from the analysis using the IQtree tree
treePL
dated_tees - tree files (*.tre) in newick format of dated trees with treePL
fasta_files - fasta files (*.red) from reduced_trees
reduced_trees - unrooted ortholog trees (*red) in newick format after pruning all outgroups except Helianthus
reduced_trees_rr - rooted ortholog trees (*red) in newick format after pruning all outgroups except Helianthus
If you have any questions about the data, please do not hesitate to contact Diego F. Morales-Briones at dfmoralesb@gmail.com