Data from: Phylogenomic analysis of target enrichment and transcriptome data uncovers rapid radiation and extensive hybridization in slipper orchid genus Cypripedium L.
Data files
Sep 30, 2024 version files (763.73 MB)
- 1_nuclear_analyses.tar.gz (509.22 MB)
- 2_chloroplast_analyses.tar.gz (11.81 MB)
- 3_1_Ks_plots.tar.gz (238.26 MB)
- 4_references.tar.gz (4.43 MB)
- Dryad_folder_description_final_1.txt (10.54 KB)
- README.md (11.02 KB)
Apr 14, 2025 version files (1.21 GB)
- 1_nuclear_analyses.tar.gz (509.22 MB)
- 2_chloroplast_analyses.tar.gz (11.81 MB)
- 3_1_Ks_plots.tar.gz (238.26 MB)
- 4_references.tar.gz (4.43 MB)
- 5_SVG_Figures.tar.gz (447.61 MB)
- Dryad_folder_description_final_1.txt (10.54 KB)
- README.md (11.25 KB)
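Because the archives are large, it can help to inspect them before unpacking. The following is a minimal sketch, not part of the deposited data, that lists the top-level entries of a downloaded `.tar.gz` file using only the Python standard library. The function name `top_level_entries` is hypothetical, and the archive path in the usage comment assumes the file was saved to the current directory.

```python
import tarfile
from pathlib import PurePosixPath


def top_level_entries(archive_path):
    """Return the sorted top-level names stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # PurePosixPath drops a leading "./", so "./a/b" and "a/b" both yield "a".
        parts = (PurePosixPath(name).parts for name in tar.getnames())
        return sorted({p[0] for p in parts if p})


# Usage (assumes the archive sits in the current directory):
# print(top_level_entries("1_nuclear_analyses.tar.gz"))
```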
Abstract
Background and Aims: Cypripedium is the most widespread and morphologically diverse genus of slipper orchids. Despite several published phylogenies, the topology and monophyly of its infrageneric taxa remained uncertain. Here, we aimed to reconstruct a robust section-level phylogeny of Cypripedium and explore its evolutionary history using target capture data for the first time.
Methods: We used the orchid-specific bait set Orchidaceae963 in combination with transcriptomic data to reconstruct the phylogeny of Cypripedium based on 913 nuclear loci, covering all 13 sections. Subsequently, we investigated discordance among nuclear and chloroplast trees, estimated divergence times and ancestral ranges, searched for anomaly zones, polytomies, and diversification rate shifts, and identified potential gene (genome) duplication and hybridization events.
Key Results: All sections were recovered as monophyletic, contrary to the two subsections within sect. Cypripedium. The two subclades within this section did not correspond to its subsections but matched the geographic distribution of their species. Additionally, we discovered high levels of discordance in the short backbone branches of the genus and within sect. Cypripedium, which can be attributed to hybridization events detected based on phylogenetic network analyses, and incomplete lineage sorting caused by rapid radiation. Our biogeographic analysis suggested a Neotropical origin of the genus during the Oligocene (~30 Ma), with a lineage of potentially hybrid origin spreading to the Old World in the Early Miocene (~22 Ma). The rapid radiation at the backbone likely occurred in Southeast Asia around the Middle Miocene Climatic Transition (~15–13 Ma), followed by several independent dispersals back to the New World. Moreover, the Pliocene-Quaternary glacial cycles may have contributed to further speciation and reticulate evolution within Cypripedium.
Conclusions: Our study provided novel insights into the evolutionary history of Cypripedium based on high-throughput molecular data, shedding light on the dynamics of its distribution and diversity patterns from its origin to the present.
https://doi.org/10.5061/dryad.tmpg4f57d
Description of the data and file structure
1_nuclear_analyses:
1_original_nuclear_fasta_files: Unaligned fasta files (*.fna) of nuclear loci
2_raw_nuclear_homologs: Raw nuclear homolog trees. Cleaned alignments with Phyx (*.aln-cln) and ML trees with IQ-TREE 2 without bootstrap support (*.treefile)
3_final_nuclear_homologs:
1_all_taxa: Filtered homolog trees containing all taxa in the original dataset (*.treefile; monophyletic clades and paraphyletic grades of same species masked, spurious tips removed with TreeShrink)
2_reduced_taxa: Filtered homolog trees (*.tm) with taxa C_calceolus_50, C_himalaicum_66, C_calceolus_var_par_54, C_guttatum_64, C_calceolus_51, C_tibeticum_72, C_cordigerum_60, C_tibeticum_73 removed due to low loci coverage and a text file containing the taxon IDs of the removed taxa
4_nuclear_MO_orthologs:
1_fasta_to_trees:
1_fasta files_all_taxa: Ortholog fasta files containing all taxa in the original dataset (*.fa)
2_fasta_files_reduced: Ortholog fasta files (*.fa) with taxa C_calceolus_50, C_himalaicum_66, C_calceolus_var_par_54, C_guttatum_64, C_calceolus_51, C_tibeticum_72, C_cordigerum_60, C_tibeticum_73 removed due to low loci coverage and a text file containing the taxon IDs of the removed taxa
3_alignments: Cleaned alignments with Phyx (*.aln-cln) derived from aligning the reduced ortholog fasta files with OMM_MACSE
4_ortholog trees:
1_unrooted_gene_trees: Unrooted (*.treefile) ML ortholog gene trees inferred using IQ-TREE 2 with ultra-fast bootstrap support using the cleaned ortholog alignments
2_rooted_gene_trees: Rooted (with pxrr; *.rr) ML ortholog gene trees and a text file containing the taxon IDs of the outgroup taxa
5_concatenated_alns: Output files from the concatenation of the alignments of all 913 nuclear loci
2_analyses:
01_ASTRAL: Input file containing all 913 nuclear ortholog gene trees (gene_trees_1_1.tre) and output original and FigTree-rooted wASTRAL species trees (*.tre)
02_IQ-TREE: Input clean concatenated alignment (*.fa) and output files from the IQ-TREE phylogenetic reconstruction using the concatenated alignment
03_gene_duplication_mapping:
1_extracted_clades: Extracted clades and text file containing ingroup and outgroup information
2_bootstrap_filter: Output files from the subclade orthogroup tree topology method using the bootstrap filtering approach
3_topology_filter: Output files from the subclade orthogroup tree topology method using the topology filtering approach
04_Phyparts: Output files from the Phyparts piecharts analysis performed using the pxrr rooted ML ortholog gene trees (1_nuclear_analyses/4_nuclear_MO_orthologs/1_fasta_to_trees/4_ortholog trees/2_rooted_gene_trees) and the nuclear ASTRAL species tree (1_nuclear_analyses/4_nuclear_MO_orthologs/2_analyses/01_ASTRAL)
05_Quartet_Sampling: Output files from the Quartet Sampling analysis performed using the clean concatenated alignment (1_nuclear_analyses/4_nuclear_MO_orthologs/1_fasta_to_trees/5_concatenated_alns) and nuclear ASTRAL species tree (1_nuclear_analyses/4_nuclear_MO_orthologs/2_analyses/01_ASTRAL)
06_anomaly_zone: Python script and output text file from the anomaly zone test performed using the nuclear ASTRAL species tree (1_nuclear_analyses/4_nuclear_MO_orthologs/2_analyses/01_ASTRAL)
07_ASTRAL_polytomy_test: Output tree (*.tre) annotated with p-value labels resulting from the ASTRAL polytomy test performed using the nuclear ASTRAL species tree and a file containing all 913 nuclear ortholog gene trees (1_nuclear_analyses/4_nuclear_MO_orthologs/2_analyses/01_ASTRAL)
08_PhyloNet:
1_backbone:
1_reduced_gene_trees: Gene trees (*.tm) containing only the taxa selected to investigate backbone hybridization events, and a file containing the taxon IDs of the removed taxa
2_input_files: A file (*.tre) containing all the reduced gene trees and a PhyloNet configuration file (*.nex) used to test for one to ten hybridization events at the backbone
3_output_files: All output files (*.txt) from testing for one to ten hybridization events at the backbone, containing the ten best networks each
2_C_x_alaskanum:
1_reduced_gene_trees: Gene trees (*.tm) containing only the taxa selected to investigate hybridization events within the subclade including Cypripedium x alaskanum, and a file containing the taxon IDs of the removed taxa
2_input_files: A file (*.tre) containing all the reduced gene trees and a PhyloNet configuration file (*.nex) used to test for one hybridization event within the subclade
3_output_files: The output file (*.txt) from testing for one hybridization event within the subclade including C. x alaskanum, containing the ten best networks
3_C_x_columbianum:
1_reduced_gene_trees: Gene trees (*.tm) containing only the taxa selected to investigate hybridization events within the subclade including Cypripedium x columbianum, and a file containing the taxon IDs of the removed taxa
2_input_files: A file (*.tre) containing all the reduced gene trees and a PhyloNet configuration file (*.nex) used to test for one to ten hybridization events within the subclade
3_output_files: All output files (*.txt) from testing for one to ten hybridization events within the subclade including C. x columbianum, containing the ten best networks each
4_C_x_ventricosum:
1_reduced_gene_trees: Gene trees (*.tm) containing only the taxa selected to investigate hybridization events within the subclade including Cypripedium x ventricosum, and a file containing the taxon IDs of the removed taxa
2_input_files: A file (*.tre) containing all the reduced gene trees and a PhyloNet configuration file (*.nex) used to test for one to ten hybridization events within the subclade
3_output_files: All output files (*.txt) from testing for one to ten hybridization events within the subclade including C. x ventricosum, containing the ten best networks each
09_BEAST2:
1_SortaDate:
1_SortaDate_analysis: Output files (*.txt) from the SortaDate analysis performed using the pxrr rooted ML ortholog gene trees (1_nuclear_analyses/4_nuclear_MO_orthologs/1_fasta_to_trees/4_ortholog trees/2_rooted_gene_trees) and the concatenation-based nuclear tree (1_nuclear_analyses/4_nuclear_MO_orthologs/2_analyses/02_IQ-TREE)
best_genes_aln_files: The clean alignment files (*.aln-cln) of the 20 best genes selected by the SortaDate analysis
2_concatenated_20_best_genes: Output files for the concatenation of the 20 best genes
2_runs:
0_starting_tree: wASTRAL species tree in Newick format (*.tre), without LPP support labels, used as the starting tree
1_run_A: Input (*.xml) and output files for the first BEAST2 run (-seed 123456789)
2_run_B: Input (*.xml) and output files for the second BEAST2 run (-seed 2162134795911422075)
3_run_C: Input (*.xml) and output files for the third BEAST2 run (-seed 8503081007213733102)
4_run_D: Input (*.xml) and output files for the fourth BEAST2 run (-seed 4566870218956303049)
5_sample_prior_run: Input (*.xml) and output files for the BEAST2 run sampling from the prior (-seed 281228161096942447)
3_LogCombiner: Output file from LogCombiner (*.trees)
4_TreeAnnotator: Output file from TreeAnnotator (*.nex)
10_BAMM:
1_input_files:
1_input_tree: Input tree (*.newick) created by removing certain taxa listed in an included text file (namely: non-Cypripedium taxa; Cypripedium taxa not accepted by Frosch and Cribb, 2012; taxon duplicates; hybrids; varieties) from the dated phylogeny created with BEAST2 (1_nuclear_analyses/4_nuclear_MO_orthologs/2_analyses/09_BEAST2/4_TreeAnnotator)
2_BAMM_files: A file (*.txt) including the estimated section-specific sampling fractions based on the classification by Frosch and Cribb (2012) and the control file used to perform the BAMM analysis (*.txt)
2_output_files: Output files (*.txt and *.pdf) from the BAMM analysis
11_BioGeoBEARS:
0_input_tree: Input tree (*.newick) created by removing certain taxa listed in two included text files consecutively from the dated phylogeny created with BEAST2 (1_nuclear_analyses/4_nuclear_MO_orthologs/2_analyses/09_BEAST2/4_TreeAnnotator)
1_nine_areas:
1_input_files: The R script (*.R) used to perform the BioGeoBEARS analysis based on nine specified areas and the input text files containing the taxon distribution data and the distance matrix between the nine specified areas in kilometers
2_output_files: Output files (*.txt and *.pdf) from the BioGeoBEARS analysis based on nine specified areas
2_new_vs_old_world:
1_input_files: The R script (*.R) used to perform the BioGeoBEARS analysis based on two specified areas (i.e., the New and the Old World) and the input text file containing the taxon distribution data
2_output_files: Output files (*.txt and *.pdf) from the BioGeoBEARS analysis based on two specified areas
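The reduced datasets above (2_reduced_taxa and 2_fasta_files_reduced) were obtained by removing a fixed list of taxon IDs. As an illustration only, and not the authors' actual procedure, the sketch below shows how such a list could be applied to a FASTA file in plain Python. It assumes that each header starts with the taxon ID, optionally followed by `@` and a sequence identifier; that header convention, the function name `drop_taxa`, and the placeholder ID `taxon_A` are assumptions, not details taken from the dataset.

```python
def drop_taxa(fasta_text, removed_ids, sep="@"):
    """Return FASTA text without records whose taxon ID is in removed_ids.

    Assumption: headers look like '>TAXON' or '>TAXON@seqid'; change `sep`
    if the files use a different convention.
    """
    removed = set(removed_ids)
    kept_lines = []
    keep = True
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Taxon ID = first whitespace-delimited token, up to the separator.
            taxon = line[1:].split()[0].split(sep)[0] if line[1:].strip() else ""
            keep = taxon not in removed
        if keep:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


# 'taxon_A' is a placeholder ID used only for this example.
example = ">C_calceolus_50@1\nACGT\n>taxon_A@1\nAC\nGT\n"
print(drop_taxa(example, ["C_calceolus_50"]), end="")
```

The IDs to remove would come from the text file shipped alongside the reduced files; reading that file is left out because its exact layout is not described here.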
2_chloroplast_analyses:
1_original_chloroplast_fasta_files: Unaligned fasta files (*.fna) of chloroplast loci
2_alignments: Cleaned alignments with Phyx (*.aln-cln) derived from aligning the fasta files with MACSE
3_concatenated_alns: Output files from the concatenation of the alignments of all 80 chloroplast loci
4_IQ-TREE: Input clean concatenated alignment (*.phy), input model file (*.model), shell executable file (*.sh) and output files for the IQ-TREE phylogenetic reconstruction using the concatenated alignment, including unrooted (*.treefile) and rooted with pxrr (*.treefile.rr) phylogenies
5_Quartet_Sampling: Shell executable file (*.sh) and output files from the Quartet Sampling analysis performed using the clean concatenated alignment (2_chloroplast_analyses/3_concatenated_alns) and chloroplast IQ-TREE phylogeny (2_chloroplast_analyses/4_IQ-TREE)
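Both the nuclear and chloroplast analyses rely on concatenated alignments. The sketch below illustrates the general idea of concatenation (a supermatrix in which taxa missing from a locus are padded with gaps). It is a generic illustration in plain Python, not the tool used to produce the deposited files, and the function names `read_fasta` and `concatenate` are hypothetical.

```python
def read_fasta(text):
    """Parse aligned FASTA text into a dict of taxon -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate(alignments):
    """Join per-locus alignments; return (supermatrix, 1-based partitions)."""
    taxa = sorted({t for aln in alignments for t in aln})
    matrix = {t: [] for t in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment differ in length")
        length = lengths.pop()
        for t in taxa:
            # Taxa absent from this locus are filled with gap characters.
            matrix[t].append(aln.get(t, "-" * length))
        partitions.append((start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


locus1 = read_fasta(">A\nACG\n>B\nA-G\n")
locus2 = read_fasta(">A\nTT\n>C\nTA\n")
supermatrix, partitions = concatenate([locus1, locus2])
```

The returned partition coordinates are the kind of information a partitioned model file would need, although the format of the deposited *.model file is not reproduced here.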
3_1_Ks_plots:
1_Ks_values:
1_within_species:
1_wgd_dmd: Output files (*.tsv) of wgd dmd delineation of whole paranomes within single species
2_wgd_ksd: Output files (*.tsv) of wgd ksd construction of Ks age distributions within single species
2_between_species:
1_wgd_dmd: Output files (*.tsv) of wgd dmd delineation of reciprocal best hits (RBHs) between each corresponding species pair
2_wgd_ksd: Output files (*.tsv) of wgd ksd construction of Ks age distributions between each corresponding species pair
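To give a sense of how the Ks tables could be summarised into an age distribution, here is a minimal sketch that bins Ks values from tab-separated text into a histogram. The column name `Ks`, the bin width, and the upper Ks limit are assumptions chosen for illustration; check the header of the actual *.tsv files and adjust them as needed. This is not the plotting code used for the publication.

```python
import csv
import io


def ks_histogram(tsv_text, column="Ks", bin_width=0.1, max_ks=3.0):
    """Count Ks values per bin over [0, max_ks); unparsable rows are skipped."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            ks = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or non-numeric entry
        if 0 <= ks < max_ks:  # also excludes NaN
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return counts


# Toy table with a hypothetical 'pair' column and five rows.
toy = "pair\tKs\np1\t0.2\np2\t0.7\np3\t1.9\np4\t2.5\np5\tNA\n"
counts = ks_histogram(toy, bin_width=0.5, max_ks=2.0)
```

For a real file, the text can be read with `open(path).read()` and passed to the same function.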
4_Assembly_references:
Extended set of references from orchid genomes and transcriptomes (*.fa) available on the Sequence Read Archive (SRA) of NCBI used to improve gene extractions.
5_Main_Figures:
The published figures may appear distorted because they were modified during typesetting. Here, we provide the figures in SVG format.
If you have any questions about the data, please do not hesitate to contact Diego F. Morales-Briones at dfmoralesb@gmail.com
Change Log
April 2025: Added folder 5_Main_Figures.