Phylogenetic relationships of Tilia (Malvaceae) inferred from multiple nuclear loci and plastid genomes
Data files
Oct 13, 2023 version: 23.49 MB
Abstract
Premise of research. Tilia is a Eurasian-eastern North American disjunct plant genus of great economic and ecological importance. However, a robust phylogeny based on comprehensive taxon sampling and a large amount of molecular data has not yet been established.
Methodology. We obtained DNA sequences of plastomes and multiple nuclear loci using the anchored hybrid enrichment technique and next-generation sequencing. Orthologous nuclear loci were inferred using tree-based methods. Phylogenetic analyses were performed using maximum parsimony, Bayesian inference, and coalescent-based species tree methods. The parentage of polyploids was inferred by comparing the nuclear and plastid phylogenies.
Pivotal results. Craigia yunnanensis is sister to Tilia. Within Tilia, T. endochrysea, endemic to southern China, is the earliest-diverging lineage, followed by the European species T. platyphyllos. The European T. cordata and the eastern Asian T. amurensis and T. kiusiana form a clade, while the North American T. americana is most closely related to an eastern and western Asian species complex including the eastern Asian T. chingiana, T. mandshurica, and T. oliveri and the western Asian T. tomentosa. Significant incongruence exists between the nuclear and plastid phylogenies.
Conclusions. Our phylogenetic results suggest that there are six distinct lineages or species complexes in the genus: 1) T. endochrysea; 2) T. platyphyllos; 3) T. kiusiana; 4) T. cordata and T. amurensis; 5) T. chingiana, T. miqueliana, and T. oliveri; and 6) T. tomentosa and T. americana. These may be divided into two sections (Trichophilyra and Tilia) and three subsections (Tilia, Lindnera, and Trabeculares). The prevalent incongruence between the nuclear and plastid phylogenies confirms the importance of ancient hybridization and introgression in the evolutionary history of the genus. Polyploid species in Eurasia and their corresponding diploid parental lineages may have formed a hybrid swarm in the region.
README: Phylogenetic Relationships of Tilia (Malvaceae) Inferred from Multiple Nuclear Loci and Plastid Genomes
https://doi.org/10.5061/dryad.hx3ffbgf6
There are two nuclear datasets and one plastome dataset. The nuclear and plastid datasets contain largely the same samples, so that conflicting cytonuclear signals can be compared. For each nuclear locus, we sorted the assembled sequences into alleles or haplotypes according to the ploidy level of the species. For example, two alleles (seq1 and seq2) are given for each diploid sample, and four alleles are given for each tetraploid sample.
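Because each sample contributes one sequence per allele, a quick sanity check is to group sequence labels by sample and compare the allele count with the expected ploidy. The sketch below assumes a hypothetical label format "<sample>_seq<k>" (e.g. "Tamericana01_seq2"); the actual taxon labels in the NEXUS files may be formatted differently, and the ploidy table must be supplied by the user.

```python
import re
from collections import defaultdict

# Hypothetical label format "<sample>_seq<k>"; adjust to the real taxon labels.
LABEL_RE = re.compile(r"^(?P<sample>.+)_seq(?P<k>\d+)$")


def group_alleles(labels):
    """Group allele labels by sample: returns {sample: sorted allele numbers}."""
    groups = defaultdict(list)
    for label in labels:
        m = LABEL_RE.match(label)
        if m is None:
            raise ValueError(f"unexpected label: {label!r}")
        groups[m.group("sample")].append(int(m.group("k")))
    return {sample: sorted(ks) for sample, ks in groups.items()}


def check_ploidy(groups, ploidy):
    """Return {sample: observed count} for samples whose allele count
    differs from the expected number (2 for diploids, 4 for tetraploids)."""
    return {s: len(ks) for s, ks in groups.items() if len(ks) != ploidy[s]}


labels = ["A_seq1", "A_seq2", "B_seq1", "B_seq2", "B_seq3", "B_seq4", "C_seq1"]
groups = group_alleles(labels)
mismatches = check_ploidy(groups, {"A": 2, "B": 4, "C": 2})
```

Here sample C, declared diploid but carrying a single allele, is the only mismatch reported.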
Description of the data and file structure
Dataset 1. Dryad.Data1.T581_134nuc83147Dip_and_Poly.nex
This dataset contains 134 nuclear genes and 83147 aligned sites. "-" denotes a gap and "?" denotes missing data. The 134 genes are most likely orthologous, because for each gene the sequences from the same sample or plant form a monophyletic group. The dataset contains both diploids and polyploids.
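The orthology criterion above can be illustrated with a minimal monophyly check on a single gene tree. This is a sketch of one reading of that criterion, not the tree-based pipeline actually used for the dataset; it assumes a rooted tree written as nested Python tuples (leaves are strings) and the same hypothetical "<sample>_seq<k>" labels. A sample passes if the set of its sequences equals the leaf set of some clade.

```python
from collections import defaultdict


def clades(tree):
    """Post-order list of the leaf set of every clade, single leaves included.
    The root's leaf set is appended last."""
    out = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def sample_of(label):
    # Hypothetical naming convention "<sample>_seq<k>".
    return label.rsplit("_seq", 1)[0]


def non_monophyletic_samples(tree):
    """Samples whose gene copies do NOT form a clade in this rooted gene tree."""
    all_clades = clades(tree)
    clade_set = set(all_clades)
    by_sample = defaultdict(set)
    for leaf in all_clades[-1]:  # root leaf set = every sequence in the tree
        by_sample[sample_of(leaf)].add(leaf)
    return sorted(s for s, leaves in by_sample.items()
                  if frozenset(leaves) not in clade_set)


good = ((("A_seq1", "A_seq2"), ("B_seq1", "B_seq2")), "C_seq1")
bad = (("A_seq1", "B_seq1"), ("A_seq2", "B_seq2"))
```

In `good` every sample's copies are grouped, so no sample is flagged; in `bad` the copies of A and B are interleaved, which under this criterion would suggest paralogy (or, in polyploids, copies from different parental lineages) rather than simple allelic variation.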
Dataset 2. Dryad.Data2.T581_134nuc83147Dip_and_Poly_merged.nex
This dataset is similar to Dataset 1, but the alleles from the same sample have been merged into a single consensus sequence. For example, for a diploid, if the two alleles carry different nucleotides at a site (e.g., A and T), the merged consensus site is coded with the corresponding ambiguity code (W for A or T). Like Dataset 1, Dataset 2 contains both diploid and polyploid species.
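The consensus rule can be sketched with the standard IUPAC nucleotide ambiguity codes. The handling of gaps and missing data is an assumption, since the README does not specify it: "-" and "?" are ignored whenever at least one allele has a nucleotide, an all-gap column stays "-", and any other column becomes "?". Input alleles may already contain ambiguity codes, which are expanded before merging.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Reverse map: code -> set of bases ("-" and "?" are absent, i.e. no bases).
EXPAND = {code: bases for bases, code in IUPAC.items()}


def merge_alleles(alleles):
    """Merge aligned alleles of one sample into an IUPAC consensus sequence.

    Gap/missing handling here is an assumption, not the dataset's documented rule.
    """
    if len({len(a) for a in alleles}) != 1:
        raise ValueError("alleles must be aligned to the same length")
    merged = []
    for column in zip(*alleles):
        bases = set()
        for ch in column:
            bases |= EXPAND.get(ch.upper(), frozenset())
        if bases:
            merged.append(IUPAC[frozenset(bases)])
        elif all(ch == "-" for ch in column):
            merged.append("-")
        else:
            merged.append("?")
    return "".join(merged)


diploid = merge_alleles(["ACGT-", "TCGA-"])
tetraploid = merge_alleles(["AC", "AC", "AT", "AT"])
```

For the diploid, the A/T sites become W and the shared gap is kept, giving "WCGW-"; the tetraploid's C/T site becomes Y, giving "AY".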
Dataset 3. Dryad.Data3.Xie_et_al.TiliaPlastomewithAllMissingRemoved.nex
Plastome sequences were obtained by skimming the off-target reads from the anchored hybrid enrichment sequencing, using the published plastome of Tilia oliveri (KT894774) as the reference in Sequencher 5.4.6. After sites with missing data in one or more samples were removed, the dataset contains 46 samples and 137592 aligned sites.
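Removing sites with missing data amounts to dropping every alignment column in which any sample carries a missing-data symbol. The sketch below assumes the alignment is held as a dict of equal-length sequences and that missing data is coded "?"; whether gaps were also treated as missing for this dataset is not stated, so the symbol set is a parameter.

```python
def drop_sites_with_missing(matrix, missing="?"):
    """Keep only columns where no sample has a symbol listed in `missing`.

    `matrix` maps taxon name -> aligned sequence (all the same length).
    Pass missing="?-" to treat gaps as missing as well (an assumption).
    """
    seqs = list(matrix.values())
    keep = [i for i, column in enumerate(zip(*seqs))
            if not any(ch in missing for ch in column)]
    return {name: "".join(seq[i] for i in keep) for name, seq in matrix.items()}


alignment = {"a": "AC?T", "b": "A-GT"}
complete = drop_sites_with_missing(alignment)
no_gaps = drop_sites_with_missing(alignment, missing="?-")
```

With the default, only the column containing "?" is dropped; adding "-" also drops the gapped column.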
Code/Software
The NEXUS files can be opened in any text editor or in standard phylogenetic software such as MEGA and PAUP.
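For a quick look without phylogenetic software, a short script can read the MATRIX block and summarize how much of each sequence is coded as gap ("-") or missing ("?"). This is a minimal sketch assuming a simple, non-interleaved matrix with one "taxon sequence" pair per line and taxon names without spaces; interleaved files or quoted names would need a full NEXUS parser.

```python
def read_nexus_matrix(text):
    """Read a simple, non-interleaved NEXUS MATRIX block into {taxon: sequence}."""
    matrix = {}
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_matrix:
            in_matrix = line.lower() == "matrix"
            continue
        if line.startswith(";"):
            break  # end of the MATRIX block
        if not line or line.startswith("["):
            continue  # skip blank lines and [comment] lines
        name, seq = line.split(None, 1)
        seq = "".join(seq.split())  # drop internal spacing between blocks
        matrix[name.strip("'")] = seq.rstrip(";")
        if seq.endswith(";"):
            break  # the terminating ';' was on the last data line
    return matrix


def symbol_summary(matrix):
    """Per taxon: (fraction of gap sites, fraction of missing sites)."""
    out = {}
    for name, seq in matrix.items():
        n = max(len(seq), 1)
        out[name] = (seq.count("-") / n, seq.count("?") / n)
    return out


example = "\n".join([
    "#NEXUS", "BEGIN DATA;", "DIMENSIONS NTAX=2 NCHAR=4;",
    "FORMAT DATATYPE=DNA MISSING=? GAP=-;", "MATRIX",
    "taxonA AC-T", "taxonB A??T", ";", "END;",
])
summary = symbol_summary(read_nexus_matrix(example))
```

For the toy matrix, taxonA has one gap in four sites and taxonB has two missing sites in four.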
Methods
The data were generated with the anchored hybrid enrichment method, and the datasets have been processed as described in the dataset descriptions above.