Data from: Phylogenomics, reticulation, and biogeographical history of Elaeagnaceae
Data files
Version of Jul 29, 2024 (919 KB in total):
- areas_in_biogeography_analyses.csv (1.77 KB)
- Assembled_data.7z (630.48 KB)
- Astral_tree_of_Elaeagnaceae.tre (4.32 KB)
- BAYAREALIKE_J.txt (187.98 KB)
- Dated_ML_tree_of_Elaeagnaceae.tre (70.55 KB)
- RAxML_concatenated_tree_of_Elaeagnaceae_based_on_plastid_CDS.tre (11.88 KB)
- RAxML_concatenated_tree_of_Rosales_based_on_nuclear_genes.tre (7.63 KB)
- README.md (4.40 KB)
Abstract
The angiosperm family Elaeagnaceae comprises three genera and ca. 100 species distributed mainly in Eurasia and North America. Little family-wide phylogenetic and biogeographic research on Elaeagnaceae has been conducted, limiting the application and preservation of natural genetic resources. Here, we reconstructed a strongly supported phylogenetic framework of Elaeagnaceae to better understand inter- and intrageneric relationships, as well as the origin and biogeographical history of the family. For this purpose, we used both nuclear and plastid sequences from Hyb-Seq and genome skimming approaches to reconstruct a well-supported phylogeny and, along with current distributional data, infer historical biogeographical processes. Our phylogenetic analyses of both nuclear and plastid data strongly support the monophyly of Elaeagnaceae and each of the three genera. Elaeagnus was resolved as sister to the well-supported clade of Hippophae and Shepherdia. The intrageneric relationships of Elaeagnus and Hippophae were also well resolved. High levels of nuclear gene tree conflict and cytonuclear discordance were detected within Elaeagnus, and our analyses suggest putative ancient and recent hybridization. We inferred that Elaeagnaceae originated ca. 90.48 Ma (95% CI = 89.91–91.05 Ma), and long-distance dispersal likely played a major role in shaping its intercontinentally disjunct distribution. This work presents the most comprehensive phylogenetic framework for Elaeagnaceae to date, offers new insights into previously unresolved relationships in Elaeagnus, and provides a foundation for further studies on classification, evolution, biogeography, and conservation of Elaeagnaceae.
This archive contains the code and data used for the paper “Phylogenomics, reticulation, and biogeographical history of Elaeagnaceae” (Gu, Wei; Zhang, Ting; Liu, Shui-Yin; Tian, Qin; Yang, Chen-Xuan; Lu, Qing; Fu, Xiao-Gang; Kates, Heather; Stull, Gregory; Soltis, Pamela; Soltis, Douglas; Folk, Ryan; Guralnick, Robert; Li, De-Zhu; Yi, Ting-Shuang), Plant Diversity, 2024.
This work examines phylogenetic relationships within Elaeagnaceae, hybridization, and the origin and dispersal history of the family. It presents the most comprehensive phylogenetic framework for Elaeagnaceae to date, offers new insights into previously unresolved relationships in Elaeagnus, and provides a foundation for further studies on the classification, evolution, biogeography, and conservation of Elaeagnaceae.
You can reuse these scripts and data under the license terms of the Dryad repository that hosts these files. For further information, please contact Wei Gu (guwei1@mail.kib.ac.cn).
Folder structure
The archive Assembled_data.7z contains the following subfolders:
- “Nuclear capture”: 83 nuclear genes from Hyb-Seq and one concatenated matrix
- “Off-target plastid CDS”: off-target plastid CDS assembled from Hyb-Seq raw data
- “Plastid”: plastid data
- “All cds renamed”: renamed CDS alignments extracted from the assembled plastid data
- “Complete plastome”: plastid genomes assembled from genome skimming raw data
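The “Nuclear capture” folder pairs the per-gene data with one concatenated matrix. As a rough illustration of how such a supermatrix is typically assembled, the sketch below joins per-gene alignments, pads taxa missing from a gene with gap characters, and records each gene's column range. It assumes the alignments are already loaded into Python dictionaries; the function name `concatenate`, the gap-padding rule, and the partition format are illustrative choices, not necessarily what the authors did.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory representation: taxon name -> aligned sequence.
Alignment = Dict[str, str]
# (gene name, first column, last column), 1-based and inclusive.
Partition = Tuple[str, int, int]


def concatenate(genes: List[Tuple[str, Alignment]]) -> Tuple[Dict[str, str], List[Partition]]:
    """Join per-gene alignments into one supermatrix, padding absent taxa with gaps."""
    taxa = sorted({taxon for _, aln in genes for taxon in aln})
    chunks: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Partition] = []
    start = 1
    for name, aln in genes:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: empty or unequal-length alignment")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this gene gets a run of gap characters.
            chunks[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in chunks.items()}
    return matrix, partitions


if __name__ == "__main__":
    genes = [
        ("gene1", {"A": "ACGT", "B": "AC-T"}),
        ("gene2", {"A": "GG", "C": "GA"}),
    ]
    matrix, partitions = concatenate(genes)
    print(matrix)      # {'A': 'ACGTGG', 'B': 'AC-T--', 'C': '----GA'}
    print(partitions)  # [('gene1', 1, 4), ('gene2', 5, 6)]
```

Padding with gaps keeps every taxon in the matrix even when a locus failed to assemble for it; the partition list is the kind of information an ML program can use to model each gene separately.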
Other files
- Config_file_for_treePL.txt: the configuration file used in treePL for the dating analysis
- Phytools_cophylo20220516.R: the R script used to compare the topologies of two phylogenetic trees
- Astral_tree_of_Elaeagnaceae.tre: the coalescent tree inferred with ASTRAL from the 83 nuclear genes
- RAxML_concatenated_tree_of_Rosales_based_on_nuclear_genes.tre: the maximum-likelihood (ML) tree inferred from the concatenated nuclear gene matrix of all species used in this study
- areas_in_biogeography_analyses.csv: the area settings file used in RASP
- BAYAREALIKE_J.txt: the result of the biogeographical analysis (BAYAREALIKE+J model)
- Assembled_data.7z: the nuclear genes and all newly assembled chloroplast sequences used in this study
- RAxML_concatenated_tree_of_Elaeagnaceae_based_on_plastid_CDS.tre: the ML tree of Elaeagnaceae inferred from concatenated plastid CDS
- Dated_ML_tree_of_Elaeagnaceae.tre: the dated ML tree of Elaeagnaceae
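Several files listed above are trees (.tre). Before comparing two trees, for example with a tanglegram, it can help to check which taxa they share. The stdlib-only sketch below assumes each file holds a single tree in plain Newick format (not NEXUS). It extracts tip labels and compares the taxon sets of two trees. The helper names `tip_labels` and `compare_tip_sets` are hypothetical, and the parser is deliberately minimal: it does not build the tree structure.

```python
import re
from typing import List, Set, Tuple

# Quoted labels, [comments], single structural characters, or bare text (labels/numbers).
_TOKEN = re.compile(r"'(?:[^']|'')*'|\[[^\]]*\]|[(),:;]|[^(),:;\[\]'\s]+")
_STRUCTURAL = {"(", ")", ",", ";"}


def tip_labels(newick: str) -> List[str]:
    """Return the tip labels of a single Newick tree in order of appearance."""
    tips: List[str] = []
    prev = None          # last structural character seen
    after_colon = False  # True when the next token is a branch length
    for tok in _TOKEN.findall(newick):
        if tok.startswith("["):
            continue  # skip comments such as [&R]
        if tok == ":":
            after_colon = True
        elif tok in _STRUCTURAL:
            prev = tok
        elif after_colon:
            after_colon = False  # this token is a branch length, not a label
        elif prev in (None, "(", ","):
            # A label directly after "(" or "," names a tip.
            if tok.startswith("'"):
                tok = tok[1:-1].replace("''", "'")
            tips.append(tok)
        # Otherwise the label follows ")" and is an internal node label
        # (e.g. a support value), which is ignored.
    return tips


def compare_tip_sets(tree_a: str, tree_b: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (shared, only_in_a, only_in_b) tip labels, each sorted."""
    a: Set[str] = set(tip_labels(tree_a))
    b: Set[str] = set(tip_labels(tree_b))
    return sorted(a & b), sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    print(tip_labels("((A:0.1,B:0.2)95:0.05,'C d':0.3);"))  # ['A', 'B', 'C d']
    print(compare_tip_sets("((A,B),C);", "((A,C),D);"))     # (['A', 'C'], ['B'], ['D'])
```

Assuming the files are plain Newick, a call such as `compare_tip_sets(open(path_a).read(), open(path_b).read())` would list taxa present in only one of the two trees.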
Note
Bait sequences for the 100 nuclear genes will be uploaded with the agreement of all collaborators.
Acknowledgements
This research was supported by the National Natural Science Foundation of China, Key International (regional) Cooperative Research Project (no. 31720103903), the Science and Technology Basic Resources Investigation Program of China (no. 2019FY100900), the Strategic Priority Research Program of the Chinese Academy of Sciences (no. XDB31000000), the National Natural Science Foundation of China (no. 31270274), the Yunling International High-end Experts Program of Yunnan Province, China (no. YNQR-GDWG-2017-002 and no. YNQR-GDWG-2018-012), the China Scholarship Council (no. 202004910775), the Chinese Academy of Sciences (CAS) President’s International Fellowship Initiative (no. 2020PB0009), the China Postdoctoral Science Foundation (CPSF), and the United States Department of Energy (grant no. DE-SC0018247 to PSS, RPG, and DES). We are grateful to the following institutions for providing specimens or silica-dried materials: the herbarium of the California Academy of Sciences (CAS); the herbarium of Kunming Institute of Botany, Chinese Academy of Sciences (KUN); the Germplasm Bank of Wild Species and Molecular Biology Experiment Center, Kunming Institute of Botany, Chinese Academy of Sciences; the Missouri Botanical Garden Herbarium (MO); the New York Botanical Garden Herbarium (NY); the Ohio State University Herbarium (OS); and the University of Texas Herbarium (TEX). We are also grateful to Jiajin Wu for help with sampling and DNA extraction; to the Germplasm Bank of Wild Species (Kunming Institute of Botany, Chinese Academy of Sciences) for providing genome skimming data of Elaeagnaceae; to Tiantian Xue (Institute of Botany, Chinese Academy of Sciences) for his suggestions on the project; and to the iFlora High Performance Computing Center of the Germplasm Bank of Wild Species (iFlora HPC Center of GBOWS, KIB, CAS) for computing.
Data Collection:
Leaf material was collected from multiple herbaria and in the field. Total DNA was extracted using a modified CTAB method.
Sequencing and Processing:
Hybridization enrichment sequencing (Hyb-Seq) was used to capture 100 low-copy nuclear genes. Raw reads were cleaned and filtered with TRIMMOMATIC v0.32, which trimmed Illumina adapter sequence artifacts, discarded low-quality reads, and trimmed low-quality read ends. The processed nuclear reads were assembled with HybPiper v1.2, and 83 of the 100 targeted loci were retained for further analysis.
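To illustrate what trimming low-quality read ends and discarding low-quality reads involves, the sketch below applies a simplified sliding-window quality trim followed by a minimum-length filter to a single FASTQ read. It is not TRIMMOMATIC itself, whose sliding-window step may differ in detail. The Phred+33 encoding, window size of 4, mean-quality threshold of 20, and minimum length of 36 are all assumptions chosen for illustration, since the settings used in this study are not given here.

```python
from typing import List, Optional, Tuple


def phred33(quality: str) -> List[int]:
    """Decode a FASTQ quality string in Phred+33 encoding into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        threshold: float = 20.0) -> Tuple[str, str]:
    """Scan 5'->3' and cut the read at the first window whose mean quality < threshold.

    Simplified illustration only (assumed parameters, not TRIMMOMATIC's exact rules).
    """
    scores = phred33(quality)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], quality[:start]
    return seq, quality


def trim_read(seq: str, quality: str, window: int = 4, threshold: float = 20.0,
              min_length: int = 36) -> Optional[Tuple[str, str]]:
    """Trim a read, then drop it (return None) if it is shorter than min_length."""
    seq, quality = sliding_window_trim(seq, quality, window, threshold)
    return (seq, quality) if len(seq) >= min_length else None


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(trim_read("ACGTACGTAC", "IIIIII####", min_length=3))  # ('ACGTA', 'IIIII')
    print(trim_read("ACGTACGTAC", "IIIIII####"))                # None
```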
Genome skimming data were assembled into high-quality plastome sequences using GetOrganelle. The plastomes were then annotated with PGA, and their coding sequences were extracted using the Python script “get_annotated_regions_from_gb.py” from https://github.com/Kinggerm/PersonalUtilities.
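The extraction step relies on get_annotated_regions_from_gb.py, which this sketch does not reproduce. It only illustrates the underlying operation: given a plastome sequence and CDS coordinates, splice the segments together and reverse-complement genes on the minus strand. It assumes the coordinates have already been parsed from the GenBank annotation into a hypothetical `(name, strand, segments)` tuple with 1-based inclusive positions; `extract_cds` is an illustrative name, not part of that script.

```python
from typing import Dict, List, Tuple

# IUPAC-aware complement table (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn",
                            "TGCAYRMKVHDBNtgcayrmkvhdbn")

# Hypothetical feature record: (gene name, strand +1/-1, [(start, end), ...]),
# with 1-based inclusive coordinates as in GenBank feature tables.
Feature = Tuple[str, int, List[Tuple[int, int]]]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(genome: str, features: List[Feature]) -> Dict[str, str]:
    """Return gene name -> CDS sequence, with minus-strand genes read 5'->3'.

    Features spanning the origin of the circular plastome and trans-spliced
    genes (e.g. rps12) are not handled in this sketch.
    """
    cds: Dict[str, str] = {}
    for name, strand, segments in features:
        # Join segments in genomic order; for a complement(join(...)) feature,
        # the reverse complement of that joined string is the coding sequence.
        seq = "".join(genome[start - 1:end] for start, end in sorted(segments))
        if strand == -1:
            seq = reverse_complement(seq)
        cds[name] = seq
    return cds


if __name__ == "__main__":
    genome = "ATGCCCTAAGGTCAAAACAT"
    features = [("geneA", 1, [(1, 9)]),
                ("geneB", -1, [(12, 20)]),
                ("geneC", 1, [(7, 9), (1, 3)])]
    print(extract_cds(genome, features))
    # {'geneA': 'ATGCCCTAA', 'geneB': 'ATGTTTTGA', 'geneC': 'ATGTAA'}
```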
Sequence Alignment and Cleaning:
The sequences of the 83 nuclear genes were aligned using MAFFT. To reduce alignment errors, each original alignment was then cleaned with the KewHybSeqWorkshop pipeline to remove gap-heavy and ambiguously aligned sites.
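As a simplified picture of removing gap-heavy sites, the sketch below drops every alignment column in which the fraction of gap characters exceeds a cutoff. It is not the KewHybSeqWorkshop pipeline and does not attempt to detect ambiguously aligned sites. The cutoff of 0.5, the set of characters treated as gaps, and the function name `remove_gappy_columns` are assumptions for illustration.

```python
from typing import Dict

# Characters treated as missing data in this sketch (an assumption).
GAP_CHARS = frozenset("-?")


def remove_gappy_columns(alignment: Dict[str, str],
                         max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Drop alignment columns whose fraction of gap characters exceeds max_gap_fraction."""
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned (unequal lengths)")
    keep = [
        i for i in range(length)
        if sum(s[i] in GAP_CHARS for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"A": "AC-GT", "B": "A--GT", "C": "AC-G-"}
    print(remove_gappy_columns(aln))  # {'A': 'ACGT', 'B': 'A-GT', 'C': 'ACG-'}
```

A stricter cutoff removes more columns: with `max_gap_fraction=0.0`, only columns without any gaps survive.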