Waves of range dynamics and gene flow characterize the biogeographic history of Litsea elongata, a dominant tree in East Asian evergreen broadleaved forests
Data files
May 12, 2026 version files: 315.08 MB
- 1.all-elongata_thin1000.phy (76.37 MB)
- 10.MSC_elongata.txt (2.23 MB)
- 11.SNAPP_21.phy (238.57 KB)
- 12.leaf_traits.csv (25.80 KB)
- 13.SLA.csv (10.91 KB)
- 14.pairwise_PERMANOVA_results.csv (2.55 KB)
- 2.all-elongata_thin2000.phy (50.74 MB)
- 3.all-elongata_thin5000.phy (28.01 MB)
- 4.all-elongata_thin1000_remove-zm.phy (72.44 MB)
- 5.all-elongata_thin2000_remove-zm.phy (48.32 MB)
- 6.all-elongata_thin5000_remove-zm.phy (26.67 MB)
- 7.slidingwondows-BS70.trees (10 MB)
- 8.Ten_clade_BBAA.txt (9.80 KB)
- 9.Eleven_clade_BBAA.txt (13.49 KB)
- README.md (7.60 KB)
Abstract
This dataset supports a study of the biogeographic history of Litsea elongata, a dominant and widespread tree species in East Asian evergreen broadleaved forests. The dataset integrates genome-wide single-nucleotide variant data, phylogenomic and population genomic analyses, gene-flow and introgression analyses, divergence-time estimation, fossil-informed biogeographic evidence, and leaf morphological measurements.
These data were used to identify geographically structured clades within L. elongata, evaluate historical range expansion and contraction, investigate ancient and ongoing gene flow among clades, and assess the potential hybrid origin of one clade in the Himalaya-Hengduan Mountains. The results indicate that present-day patterns of genetic diversity in L. elongata have been shaped by repeated waves of range dynamics and introgression. The dataset also provides evidence that East Asian evergreen broadleaved forests may have had more extensive historical connections across subtropical montane regions than previously recognized.
Together, these data provide a basis for reanalyzing the evolutionary history, population structure, gene flow, divergence history, and morphological variation of L. elongata across its distribution range.
Access this dataset on Dryad: [doi:10.5061/dryad.866t1g22w](https://doi.org/10.5061/dryad.866t1g22w)
Corresponding author information:
Sheng-Yuan Qin
Email: qinshengyuan@mail.kib.ac.cn
Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
Department of Science and Education, Field Museum, Chicago, Illinois, USA
Descriptions of data
This dataset supports a study of the geogenomic and biogeographic history of Litsea elongata, a dominant and widespread tree species in East Asian evergreen broadleaved forests. The submission includes genome-wide single-nucleotide variant datasets, phylogenomic and population genomic analysis files, gene-flow and introgression analysis files, divergence-time inference data, and leaf morphological trait data.
The data were used to identify geographic clades within L. elongata, investigate historical range expansion and contraction, assess ancient and ongoing gene flow among clades, test possible hybrid origins of selected clades, estimate divergence times, and examine variation in leaf morphology across clades.
Details on data
General notes
- Species names follow the taxonomy used in the associated manuscript.
- Litsea elongata is abbreviated as L. elongata in some files.
- SNV refers to single-nucleotide variant.
- MSC-M refers to the multispecies coalescent with migration model.
- MSC-I refers to the multispecies coalescent with introgression model.
- Dtrios refers to the Dsuite command used to calculate D-statistics and related gene-flow statistics for all possible trios of clades.
- XZ-ZM, XZ-TYPE, LES, and other clade names correspond to the geographic or genetic clades defined in the associated manuscript.
- Sample identifiers are consistent across files unless otherwise noted.
File 1. [1.all-elongata_thin1000.phy]
This file contains a concatenated SNV dataset for phylogenomic analysis of L. elongata. The dataset was filtered using Strategy 1, in which neighboring SNVs were filtered using a 1,000 bp distance criterion across the genome.
File 2. [2.all-elongata_thin2000.phy]
This file contains a concatenated SNV dataset filtered using Strategy 2, in which neighboring SNVs were filtered using a 2,000 bp distance criterion across the genome.
File 3. [3.all-elongata_thin5000.phy]
This file contains a concatenated SNV dataset filtered using Strategy 3, in which neighboring SNVs were filtered using a 5,000 bp distance criterion across the genome.
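Strategies 1, 2, and 3 differ only in the minimum spacing enforced between retained SNVs. As an illustration, the Python sketch below shows one common way to thin SNVs by distance, similar in spirit to the `--thin` option of VCFtools. It assumes that SNV positions are available per chromosome (the .phy files do not store genomic coordinates) and applies a greedy left-to-right rule. The exact tool and tie-breaking rule used for these strategies are not stated here, so `thin_positions` is a hypothetical helper, not the authors' pipeline.

```python
from typing import Dict, List


def thin_positions(positions: Dict[str, List[int]], min_dist: int) -> Dict[str, List[int]]:
    """Greedy distance-based thinning (hypothetical helper, not the authors' code).

    On each chromosome, the first SNV is kept; every later SNV is kept only if it
    lies at least `min_dist` bp after the most recently kept SNV.
    """
    thinned: Dict[str, List[int]] = {}
    for chrom, pos_list in positions.items():
        kept: List[int] = []
        for pos in sorted(pos_list):
            # Drop SNVs closer than min_dist bp to the previously retained SNV.
            if not kept or pos - kept[-1] >= min_dist:
                kept.append(pos)
        thinned[chrom] = kept
    return thinned
```

With `min_dist` set to 1,000, 2,000, or 5,000, the same function mirrors the spacing of Strategies 1, 2, and 3. Larger distances retain fewer SNVs, which is consistent with the decreasing sizes of Files 1 to 3.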
File 4. [4.all-elongata_thin1000_remove-zm.phy]
This file contains a concatenated SNV dataset corresponding to Strategy 4. This dataset was generated using the same filtering scheme as Strategy 1, but individuals belonging to the XZ-ZM clade were removed.
File 5. [5.all-elongata_thin2000_remove-zm.phy]
This file contains a concatenated SNV dataset corresponding to Strategy 5. This dataset was generated using the same filtering scheme as Strategy 2, but individuals belonging to the XZ-ZM clade were removed.
File 6. [6.all-elongata_thin5000_remove-zm.phy]
This file contains a concatenated SNV dataset corresponding to Strategy 6. This dataset was generated using the same filtering scheme as Strategy 3, but individuals belonging to the XZ-ZM clade were removed.
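Strategies 4 to 6 remove the XZ-ZM individuals from the corresponding alignments. The sketch below shows how such a subset could be produced from a PHYLIP file, assuming a relaxed sequential layout (a header line with the numbers of taxa and sites, followed by one `name sequence` pair per line). Any sample name used with it, such as `ZM1`, is hypothetical; the actual XZ-ZM sample identifiers should be taken from the associated manuscript. The sketch does not re-filter sites that may become invariant after individuals are removed, and whether the authors did so is not stated here.

```python
from typing import Dict, Iterable, Tuple


def read_phylip(text: str) -> Tuple[Dict[str, str], int]:
    """Parse a relaxed, sequential PHYLIP alignment (assumed layout)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa, n_sites = (int(x) for x in lines[0].split()[:2])
    seqs: Dict[str, str] = {}
    for ln in lines[1:]:
        name, seq = ln.split(None, 1)
        seqs[name] = "".join(seq.split())  # remove any internal whitespace
    if len(seqs) != n_taxa or any(len(s) != n_sites for s in seqs.values()):
        raise ValueError("alignment does not match its PHYLIP header")
    return seqs, n_sites


def drop_taxa(seqs: Dict[str, str], to_remove: Iterable[str]) -> str:
    """Return a relaxed PHYLIP string without the listed (hypothetical) sample IDs."""
    remove = set(to_remove)
    kept = {name: s for name, s in seqs.items() if name not in remove}
    if not kept:
        raise ValueError("no samples left after removal")
    n_sites = len(next(iter(kept.values())))
    body = "\n".join(f"{name} {s}" for name, s in kept.items())
    return f"{len(kept)} {n_sites}\n{body}\n"
```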
File 7. [7.slidingwondows-BS70.trees]
This file contains putatively unlinked recombinational units used for coalescent-based species tree inference. The dataset consists of 1,168 windows. Each window contains 1,000 adjacent SNVs, and windows are separated by a minimum of 5,000 SNVs.
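The window layout can be illustrated with a short sketch that generates SNV index ranges, assuming windows are taken along a single ordered series of SNVs with exactly 5,000 SNVs skipped between consecutive windows (the text gives 5,000 only as a minimum, and windows may in practice be restricted to individual chromosomes). `snv_windows` is a hypothetical helper.

```python
from typing import List, Tuple


def snv_windows(n_snvs: int, window: int = 1000, gap: int = 5000) -> List[Tuple[int, int]]:
    """Return half-open [start, end) SNV-index ranges (hypothetical helper).

    Each window holds `window` adjacent SNVs, and `gap` SNVs are skipped
    between the end of one window and the start of the next.
    """
    windows: List[Tuple[int, int]] = []
    start = 0
    while start + window <= n_snvs:
        windows.append((start, start + window))
        start += window + gap
    return windows
```

Under this single-series assumption, 1,168 windows would require at least 1,168 × 1,000 + 1,167 × 5,000 = 7,003,000 SNVs; larger gaps or per-chromosome windowing would change that figure.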
File 8. [8.Ten_clade_BBAA.txt]
This file contains Dsuite Dtrios results for all possible combinations of the ten clades inferred from the concatenated SNV topology. The columns P1, P2, and P3 denote the three clades included in each trio. Dstatistic is Patterson’s D statistic, calculated from the imbalance between ABBA and BABA site patterns. Z-score and p-value report the significance of this imbalance, while f4-ratio estimates the proportion of introgressed ancestry. The columns BBAA, ABBA, and BABA represent the counts or weighted counts of the three allele-pattern categories used in the test.
File 9. [9.Eleven_clade_BBAA.txt]
This file contains Dsuite Dtrios results calculated for all possible trios of clades based on the eleven clades identified from the ASTRAL topology. The columns P1, P2, and P3 denote the three clades included in each trio. Dstatistic is Patterson’s D statistic, calculated from the imbalance between ABBA and BABA site patterns. Z-score and p-value report the significance of this imbalance, while f4-ratio estimates the proportion of introgressed ancestry. The columns BBAA, ABBA, and BABA represent the counts or weighted counts of the three allele-pattern categories used in the test.
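Patterson's D can be recomputed from the ABBA and BABA columns as D = (ABBA - BABA) / (ABBA + BABA). In Dsuite's `_BBAA.txt` output, each trio is arranged so that BBAA is the most frequent pattern, so P1 and P2 are the pair treated as sister clades. The pandas sketch below assumes Files 8 and 9 are tab-separated with the column headers listed above. It adds a recomputed D, which is expected to agree with `Dstatistic` up to rounding, and keeps trios that pass a Bonferroni-corrected threshold. Bonferroni correction is used here only as an example; the multiple-testing procedure applied in the study is not specified in this README.

```python
import pandas as pd


def read_dtrios(path: str) -> pd.DataFrame:
    """Read a tab-separated Dsuite Dtrios table such as 8.Ten_clade_BBAA.txt."""
    return pd.read_csv(path, sep="\t")


def add_recomputed_d(df: pd.DataFrame) -> pd.DataFrame:
    """Add D = (ABBA - BABA) / (ABBA + BABA) as a cross-check of Dstatistic."""
    out = df.copy()
    out["D_recomputed"] = (out["ABBA"] - out["BABA"]) / (out["ABBA"] + out["BABA"])
    return out


def significant_trios(df: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Keep trios below a Bonferroni threshold (example correction only)."""
    threshold = alpha / len(df)
    return df[df["p-value"] < threshold].sort_values("f4-ratio", ascending=False)
```

For example, `significant_trios(add_recomputed_d(read_dtrios("9.Eleven_clade_BBAA.txt")))` would list the trios with the strongest support for gene flow, ordered by f4-ratio.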
File 10. [10.MSC_elongata.txt]
This file contains the SNV-window dataset used for MSC-M and MSC-I analyses. The dataset includes 21 individuals representing all major clades of L. elongata and one outgroup, Litsea acutivena. The input data were limited to 100 SNV windows.
This file was used to estimate ongoing gene flow among major clades using the MSC-M model. The same dataset was also used for MSC-I analysis to infer the timing and extent of introgression across clades, especially to test the possible hybrid origin of the XZ-ZM clade from the XZ-TYPE and LES clades.
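MSC-M and MSC-I are implemented in programs such as BPP, whose sequence files store multiple loci one after another, each in PHYLIP format. Assuming `10.MSC_elongata.txt` follows that layout, the sketch below lists the per-locus headers so that the number of SNV windows (expected to be 100) and their dimensions can be checked. Treating any line that holds exactly two integers as a locus header is an assumption about the file, not a documented property.

```python
import re
from typing import List, Tuple

# A locus header is assumed to be a line containing exactly two integers.
HEADER = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")


def locus_headers(text: str) -> List[Tuple[int, int]]:
    """Return (n_sequences, n_sites) for each PHYLIP-style locus header found."""
    return [
        (int(m.group(1)), int(m.group(2)))
        for m in map(HEADER.match, text.splitlines())
        if m
    ]
```

Reading the file as text and checking `len(locus_headers(text)) == 100` would confirm the window count under this assumption.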
File 11. [11.SNAPP_21.phy]
This file contains the dataset used for SNAPP divergence-time estimation. The dataset includes 11,350 SNVs, with adjacent retained sites separated by more than 100,000 bp, and 21 individuals representing the major clades of L. elongata.
File 12. [12.leaf_traits.csv]
This file contains morphological trait measurements used to infer the evolution of leaf morphology in L. elongata. The columns Species and Samples indicate the clade name and sample identifier, respectively. leaf_vein_angles_1, leaf_vein_angles_2, and leaf_vein_angles_3 represent repeated measurements of leaf vein angle for each sample. Leaf_1_Area_cm², Leaf_2_Area_cm², and Leaf_3_Area_cm² give the areas of three measured leaves in square centimeters. Leaf_1_Weight_g, Leaf_2_Weight_g, and Leaf_3_Weight_g record the corresponding leaf weights in grams. Leaf_length_1, Leaf_width_1, Leaf_length_2, Leaf_width_2, Leaf_length_3, and Leaf_width_3 give the length and width of the same three leaves in millimeters.
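Specific leaf area is conventionally leaf area divided by leaf dry mass. The pandas sketch below assumes the column names listed above and averages the three repeated measurements per sample. It also computes an area-to-weight ratio for each leaf (cm² per g) before averaging. This README does not state whether the weights are fresh or dry, nor how the SLA values in File 13 were derived, so the `area_per_mass_cm2_g` column is illustrative and may not reproduce File 13.

```python
import pandas as pd

LEAVES = (1, 2, 3)


def per_sample_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Average the three repeated measurements per sample (illustrative sketch)."""
    out = df[["Species", "Samples"]].copy()
    out["vein_angle_mean"] = df[[f"leaf_vein_angles_{i}" for i in LEAVES]].mean(axis=1)
    out["length_mm_mean"] = df[[f"Leaf_length_{i}" for i in LEAVES]].mean(axis=1)
    out["width_mm_mean"] = df[[f"Leaf_width_{i}" for i in LEAVES]].mean(axis=1)
    # Area / weight for each leaf, then the mean of the three ratios.
    ratios = pd.concat(
        [df[f"Leaf_{i}_Area_cm²"] / df[f"Leaf_{i}_Weight_g"] for i in LEAVES], axis=1
    )
    out["area_per_mass_cm2_g"] = ratios.mean(axis=1)
    return out
```

A typical call is `per_sample_summary(pd.read_csv("12.leaf_traits.csv"))`; the superscript in the area column names assumes the CSV is read as UTF-8, which is pandas' default. Averaging per-leaf ratios gives a slightly different value from dividing summed area by summed weight, and which convention the study used is not stated.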
File 13. [13.SLA.csv]
This file contains specific leaf area (SLA) measurements for samples assigned to different clades of L. elongata. The column SLA records the specific leaf area value for each sample, and Clade indicates the clade to which each sample belongs.
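A per-clade summary of File 13 can be obtained with a pandas group-by, assuming the file has the `SLA` and `Clade` columns described above. The sketch below reports the sample count, mean, and standard deviation per clade. It is a descriptive summary only and does not reproduce any statistical test from the study.

```python
import pandas as pd


def sla_by_clade(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, and sample standard deviation of SLA per clade."""
    return (
        df.groupby("Clade")["SLA"]
        .agg(n="count", mean="mean", sd="std")
        .sort_values("mean", ascending=False)
    )
```

For example, `sla_by_clade(pd.read_csv("13.SLA.csv"))` lists clades from highest to lowest mean SLA; clades with a single sample get a missing (NaN) standard deviation.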
File 14. [14.pairwise_PERMANOVA_results.csv]
This file contains pairwise PERMANOVA results for multivariate variation in non-ratio morphological traits among clades of L. elongata. The column pairs indicates the pair of clades compared in each test. Df gives the degrees of freedom, SumsOfSqs represents the sum of squares, and F.Model is the PERMANOVA pseudo-F statistic. R2 indicates the proportion of multivariate variation explained by the clade comparison. p.value gives the raw permutation-based significance value, whereas p.adjusted reports the p-value after correction for multiple comparisons. The column sig summarizes the significance category based on the adjusted p-value.
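For a one-factor PERMANOVA with a groups and N samples, R2 = SS_among / SS_total and the pseudo-F is (R2 / (a - 1)) / ((1 - R2) / (N - a)). With a = 2 in each pairwise comparison, this rearranges to N = 2 + F × (1 - R2) / R2. The sketch below assumes each row of File 14 comes from such a two-clade, one-factor test. It flags pairs with p.adjusted below a chosen alpha and back-calculates the implied number of samples, which is only approximate if F.Model and R2 are rounded in the file.

```python
import pandas as pd


def summarize_pairwise(df: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Flag significant pairs and back-calculate the implied sample size.

    Assumes a one-factor PERMANOVA with two groups per comparison:
    F = R2 * (N - 2) / (1 - R2)  =>  N = 2 + F * (1 - R2) / R2.
    """
    out = df.copy()
    out["significant"] = out["p.adjusted"] < alpha
    implied = 2 + out["F.Model"] * (1 - out["R2"]) / out["R2"]
    out["implied_N"] = implied.round().astype(int)
    return out
```

Comparing `implied_N` with the clade sample sizes in File 12 offers a quick consistency check on which samples entered each comparison.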
