Phylogenomic insights into species relationships, reticulate evolution, and biogeographic diversification of the ginseng genus Panax (Araliaceae), with an emphasis on the diversification in the Himalayan – Hengduan Mountains
Data files
Sep 17, 2024 version files 1.46 MB
- Panax_data_matrices.rar (1.46 MB)
- README.md (703 B)
Abstract
Panax (Araliaceae) is a small genus with several well-known medicinally important species. It has a disjunct distribution between eastern Asia and eastern North America, with most species in eastern Asia, especially the Himalayan – Hengduan Mountains (HHM). This study used genomic target enrichment to obtain 358 nuclear ortholog loci and complete plastome sequences from 59 accessions representing all 18 species of the genus. Divergence time estimation and biogeographic analyses suggest that Panax was likely widely distributed from North America to Asia during the middle Eocene. During the late Eocene to Oligocene, Panax may have experienced extensive extinctions during global climate cooling. It survived and diverged early in the mountains of southwest China and tropical Indochina, from where some taxa migrated northwestward to the HHM, eastward to central and eastern China, and then onward towards Japan and North America. Gene flow is identified as the main contributor to phylogenetic discordance (33.46%) within Panax. We hypothesize that the common ancestors of the medicinally important P. ginseng + P. japonicus + P. quinquefolius clade experienced allopolyploidization, which increased adaptability to cooler and drier environments. During the middle to late Miocene, several dispersals occurred from the HHM to contiguous areas, suggesting that the HHM acted as a refugium and also served as a secondary diversification center for Panax. Our findings highlight that the interplay of orographic uplift and climatic changes in the HHM greatly contributed to the species diversity of Panax.
README: Phylogenomic insights into species relationships, reticulate evolution, and biogeographic diversification of the ginseng genus Panax (Araliaceae), with an emphasis on the diversification in the Himalayan – Hengduan Mountains
https://doi.org/10.5061/dryad.931zcrjvm
Description of the data and file structure
The nuclear gene data matrix consisted of 358 ortholog genes with a total length of 303,894 bp. The plastome data matrix had a length of 156,100 bp.
The nuclear datasets were assembled with HybPiper. Plastome sequences were obtained from clean reads by reference-guided assembly with BWA, using Panax notoginseng (GenBank KP036468) as the reference.
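As a quick sanity check after unpacking the archive, the matrix lengths reported above can be compared against the alignments themselves. The sketch below assumes the matrices are stored as aligned FASTA; the archive contents are not listed here, so the file name in the trailing comment is hypothetical, and the helper names `read_fasta` and `alignment_length` are introduced only for illustration.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect them per record.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_length(records: Dict[str, str]) -> int:
    """Return the shared row length of an alignment, or raise if rows differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"not an alignment: row lengths {sorted(lengths)}")
    return lengths.pop()


# Matrix lengths (bp) as reported in this README.
EXPECTED_BP = {"nuclear": 303_894, "plastome": 156_100}

# Usage (hypothetical file name; adjust to the unpacked archive):
# with open("nuclear_matrix.fasta") as handle:
#     assert alignment_length(read_fasta(handle)) == EXPECTED_BP["nuclear"]
```

If the matrices turn out to be in another format (for example PHYLIP or NEXUS), only the parser would need to change; the length comparison stays the same.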
Methods
Raw sequencing reads were quality filtered with Trimmomatic v.0.39 (Bolger et al., 2014) to trim read ends and remove low-quality bases, using a minimum quality score of 20. The HybPiper pipeline v.1.3.1 (Johnson et al., 2016) was then used to assemble the nuclear loci. The process has three major steps: (1) using the nuclear sequences of the 936 genes used for bait design as references to capture the reads of each sequenced accession, via the BWA v.0.7.17 (Li & Durbin, 2009) option with default settings; (2) assembling the reads into contigs with SPAdes v.3.12.0 (Bankevich et al., 2012); and (3) running the Python and R scripts distributed with HybPiper to obtain the recovered sequences. Potential paralogs were detected with the "paralog investigator" tool included in HybPiper and removed before the phylogenetic analyses. This removed 578 loci, leaving 358 orthologous loci in the final dataset.
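The steps above can be sketched as command lines for a single accession. This is an illustration under stated assumptions, not the exact invocation used for this dataset: the sample prefix, all file names, and the particular Trimmomatic steps (`LEADING`, `TRAILING`, `SLIDINGWINDOW:4`, `MINLEN:36`) are hypothetical, and only the quality threshold of 20 and the locus counts come from the text. `reads_first.py` and `paralog_investigator.py` are the entry scripts shipped with HybPiper 1.x.

```python
import shlex
from typing import List

QUALITY = 20        # minimum quality score stated in the Methods
BAIT_LOCI = 936     # nuclear genes used for bait design
PARALOG_LOCI = 578  # loci flagged as potential paralogs and removed


def trimmomatic_cmd(sample: str) -> List[str]:
    """Paired-end Trimmomatic call; trimming steps are assumed, not documented."""
    return [
        "java", "-jar", "trimmomatic-0.39.jar", "PE",
        f"{sample}_R1.fastq", f"{sample}_R2.fastq",
        f"{sample}_R1.paired.fastq", f"{sample}_R1.unpaired.fastq",
        f"{sample}_R2.paired.fastq", f"{sample}_R2.unpaired.fastq",
        f"LEADING:{QUALITY}", f"TRAILING:{QUALITY}",
        f"SLIDINGWINDOW:4:{QUALITY}", "MINLEN:36",
    ]


def hybpiper_cmd(sample: str, baits: str = "baits.fasta") -> List[str]:
    """HybPiper 1.x assembly of one accession, mapping reads with BWA."""
    return [
        "reads_first.py", "-b", baits,
        "-r", f"{sample}_R1.paired.fastq", f"{sample}_R2.paired.fastq",
        "--prefix", sample, "--bwa",
    ]


def paralog_cmd(sample: str) -> List[str]:
    """Report loci with multiple long contigs (potential paralogs)."""
    return ["python", "paralog_investigator.py", sample]


# Locus bookkeeping from the text: 936 - 578 = 358 retained orthologs.
retained_loci = BAIT_LOCI - PARALOG_LOCI

if __name__ == "__main__":
    for cmd in (trimmomatic_cmd("S1"), hybpiper_cmd("S1"), paralog_cmd("S1")):
        print(shlex.join(cmd))
    print(f"retained loci: {retained_loci}")
```

Building the commands as argument lists keeps them easy to inspect or pass to `subprocess.run` once the tools and input files are in place.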
Plastid sequences were recovered as bycatch from the clean reads using a reference-guided assembly strategy: reads were assembled with BWA against the Panax notoginseng plastome (GenBank KP036468).
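A reference-guided mapping step of this kind might look like the sketch below. The text names BWA but not the subcommand or any downstream tools, so `bwa mem`, the `samtools sort` step, the thread count, and every file name are assumptions made for illustration. Consensus calling from the sorted alignments is not shown.

```python
import shlex
from typing import List

# Hypothetical local FASTA copy of the GenBank record KP036468.
REFERENCE = "KP036468.fasta"


def index_cmd(reference: str = REFERENCE) -> List[str]:
    """Build the BWA index for the plastome reference (run once)."""
    return ["bwa", "index", reference]


def mapping_pipeline(sample: str, reference: str = REFERENCE,
                     threads: int = 4) -> str:
    """Shell pipeline: map paired clean reads, then coordinate-sort to BAM."""
    mem = ["bwa", "mem", "-t", str(threads), reference,
           f"{sample}_R1.paired.fastq", f"{sample}_R2.paired.fastq"]
    # "-" tells samtools sort to read the SAM stream from standard input.
    sort = ["samtools", "sort", "-o", f"{sample}.plastome.bam", "-"]
    return f"{shlex.join(mem)} | {shlex.join(sort)}"


if __name__ == "__main__":
    print(shlex.join(index_cmd()))
    print(mapping_pipeline("S1"))
```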