Host fragmentation reinforces dispersal limitation and lineage divergence in ectomycorrhizal fungi
Data files
Jan 16, 2026 version files: 723.68 KB
- clustered_F_rbcL.fasta (20.36 KB)
- clustered_ITS.fasta (396.36 KB)
- clustered_R_rbcL.fasta (20.90 KB)
- dModel10ECM.stan (1.12 KB)
- dModel1ECM.stan (1.35 KB)
- dModel1PSRE.stan (1.43 KB)
- dModel2ECM.stan (1.27 KB)
- dModel2PSRE.stan (1.35 KB)
- dModel3ECM.stan (1.27 KB)
- dModel3PSRE.stan (1.35 KB)
- dModel4ECM.stan (1.27 KB)
- dModel4PSRE.stan (1.35 KB)
- dModel5ECM.stan (1.27 KB)
- dModel5PSRE.stan (1.35 KB)
- dModel6ECM.stan (1.27 KB)
- dModel6PSRE.stan (1.35 KB)
- dModel7ECM.stan (1.20 KB)
- dModel7PSRE.stan (1.35 KB)
- dModel8ECM.stan (1.19 KB)
- dModel8PSRE.stan (1.27 KB)
- dModel9ECM.stan (1.19 KB)
- GeoDis_x_GenDis_ECM.csv (8.79 KB)
- GeoDis_x_GenDis_PSRE.csv (40.33 KB)
- Input_data97ECM.csv (15.28 KB)
- Input_data97ECMB.csv (15.28 KB)
- Input_data97PS.csv (16.68 KB)
- Input_data97PSRE.csv (16.68 KB)
- Input_data98ECM.csv (15.28 KB)
- Input_data98ECMB.csv (15.28 KB)
- Input_data98PS.csv (16.68 KB)
- Input_data98PSRE.csv (16.68 KB)
- Input_data99ECM.csv (15.28 KB)
- Input_data99ECMB.csv (15.28 KB)
- Input_data99PS.csv (16.68 KB)
- Input_data99PSRE.csv (16.68 KB)
- R-commands.R (16.87 KB)
- README.md (5.13 KB)
Abstract
Five types of files are provided: FASTA-formatted sequence data (xxx.fasta), model code written in the Stan language (dModelxxx.stan), input data used for modeling (Input_dataxxx.csv), tables giving, for every hit sequence, its geographical distance from the study site and its genetic distance from the corresponding query sequence (GeoDis_x_GenDis_xxx.csv), and the R commands used in this study (R-commands.R).
Raw Illumina MiSeq sequencing data were processed and clustered into operational taxonomic units (OTUs) using Claident ver. 0.9.2024.06.10. The model code and input data can be used to run the two-step binomial model, which estimates how many of the fungal OTUs detected at the study site are shared with each country or region.
Dataset DOI: 10.5061/dryad.z8w9ghxt1
Description of the data and file structure
Datasets used for the study entitled "Host fragmentation promotes genetic divergence and speciation processes in ectomycorrhizal fungi". The following abbreviations are used here: the ribosomal RNA internal transcribed spacer (ITS) region, operational taxonomic units (OTUs), ectomycorrhizal (ECM) fungi, ectomycorrhizal Basidiomycota (ECMB), plant saprotrophs/root endophytes (PS/RE), and plant saprotrophs (PS).
Files and variables
File: clustered_ITS.fasta
Description: FASTA file containing the ITS sequence of each fungal OTU.
File: clustered_F_rbcL.fasta
Description: FASTA file containing the forward rbcL sequence of each plant OTU.
File: clustered_R_rbcL.fasta
Description: FASTA file containing the reverse rbcL sequence of each plant OTU.
Files:
- dModel10ECM.stan
- dModel1ECM.stan
- dModel1PSRE.stan
- dModel2ECM.stan
- dModel2PSRE.stan
- dModel3ECM.stan
- dModel3PSRE.stan
- dModel4ECM.stan
- dModel4PSRE.stan
- dModel5ECM.stan
- dModel5PSRE.stan
- dModel6ECM.stan
- dModel6PSRE.stan
- dModel7ECM.stan
- dModel7PSRE.stan
- dModel8ECM.stan
- dModel8PSRE.stan
- dModel9ECM.stan
Description: Stan code for the two-step binomial model; each file specifies one candidate model. Four datasets are analyzed: ECM (ectomycorrhizal fungi), ECMB (ectomycorrhizal Basidiomycota), PS/RE (plant saprotrophs/root endophytes), and PS (plant saprotrophs). Ten models (Model1–Model10) are available for the ECM and ECMB datasets, and eight models (Model1–Model8) for the PS/RE and PS datasets. The Stan code can be run in R with the rstan package using the scripts in R-commands.R, which fit the appropriate input file (Input_data[threshold][dataset].csv) to the two-step binomial model.
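The model/dataset naming scheme can be made concrete with a short Python sketch that enumerates Stan-file/input-file pairs. It is not taken from R-commands.R, and `model_runs` is a hypothetical helper. It assumes (1) that the ECMB dataset reuses the dModel*ECM.stan files and the PS dataset reuses the dModel*PSRE.stan files, because no ECMB- or PS-suffixed Stan files are listed, and (2) that every model is fitted at all three identity thresholds.

```python
# Hypothetical helper, not part of the dataset.
# ASSUMPTION: ECMB reuses dModel*ECM.stan and PS reuses dModel*PSRE.stan,
# since only ECM- and PSRE-suffixed Stan files are listed.
STAN_SUFFIX = {"ECM": "ECM", "ECMB": "ECM", "PSRE": "PSRE", "PS": "PSRE"}
N_MODELS = {"ECM": 10, "ECMB": 10, "PSRE": 8, "PS": 8}
THRESHOLDS = (97, 98, 99)  # sequence-identity thresholds (%)


def model_runs(dataset):
    """Yield (stan_file, input_file) pairs for one dataset.

    ASSUMPTION: every model is run at every threshold.
    """
    if dataset not in STAN_SUFFIX:
        raise ValueError(f"unknown dataset: {dataset!r}")
    for threshold in THRESHOLDS:
        for k in range(1, N_MODELS[dataset] + 1):
            stan_file = f"dModel{k}{STAN_SUFFIX[dataset]}.stan"
            input_file = f"Input_data{threshold}{dataset}.csv"
            yield stan_file, input_file


if __name__ == "__main__":
    for stan_file, input_file in model_runs("PS"):
        print(stan_file, input_file)
```

Each printed pair could then be passed to rstan from R; the exact fitting call should be taken from R-commands.R rather than from this sketch.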
File: GeoDis_x_GenDis_ECM.csv
Description: Table giving, for every hit sequence, its geographical distance from the study site and its genetic distance from the corresponding ECM query sequence.
Variables
- Area: Geographic region where the fungal material corresponding to each hit sequence was collected
- Geo.dist: Geographical distance from the study site (km)
- Gen.dist: Genetic distance (p-distance) between the query and hit sequences
- query: Name of the query OTU
- Phylum: Phylum to which the query OTU belongs
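As a starting point for exploring this table, the following Python sketch computes the mean genetic distance of hit sequences per Area. It assumes a comma-separated file with a header row using the column names listed above; `mean_gen_dist_by_area` is a hypothetical helper, and the sample rows in the demo are made up, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_gen_dist_by_area(handle):
    """Return {Area: mean Gen.dist} for a GeoDis_x_GenDis_*.csv file handle.

    ASSUMPTION: comma-separated file with "Area" and "Gen.dist" header columns.
    """
    distances = defaultdict(list)
    for row in csv.DictReader(handle):
        distances[row["Area"]].append(float(row["Gen.dist"]))
    return {area: mean(values) for area, values in distances.items()}


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = io.StringIO(
        "Area,Geo.dist,Gen.dist,query,Phylum\n"
        "AreaA,0,0.01,OTU_1,Basidiomycota\n"
        "AreaA,0,0.03,OTU_2,Basidiomycota\n"
        "AreaB,1500,0.05,OTU_1,Basidiomycota\n"
    )
    print(mean_gen_dist_by_area(sample))
    # With the real file:
    # with open("GeoDis_x_GenDis_ECM.csv", newline="") as f:
    #     print(mean_gen_dist_by_area(f))
```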
File: GeoDis_x_GenDis_PSRE.csv
Description: Table giving, for every hit sequence, its geographical distance from the study site and its genetic distance from the corresponding PS/RE query sequence.
Variables
- Area: Geographic region where the fungal material corresponding to each hit sequence was collected
- Geo.dist: Geographical distance from the study site (km)
- Gen.dist: Genetic distance (p-distance) between the query and hit sequences
- query: Name of the query OTU
- Lifestyle: Lifestyle of the query OTU
- Phylum: Phylum to which the query OTU belongs
Files:
- Input_data97ECM.csv
- Input_data97ECMB.csv
- Input_data97PS.csv
- Input_data97PSRE.csv
- Input_data98ECM.csv
- Input_data98ECMB.csv
- Input_data98PS.csv
- Input_data98PSRE.csv
- Input_data99ECM.csv
- Input_data99ECMB.csv
- Input_data99PS.csv
- Input_data99PSRE.csv
Description: Input data for each combination of dataset and sequence-identity threshold. Four datasets are included: ECM (ectomycorrhizal fungi), ECMB (ectomycorrhizal Basidiomycota), PS/RE (plant saprotrophs/root endophytes), and PS (plant saprotrophs). OTUs were clustered at three sequence-identity thresholds (97%, 98%, and 99%), giving one file per dataset and threshold (Input_data[threshold][dataset].csv).
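When scripting over these files, the threshold and dataset can be recovered from each file name. The following Python sketch is a hypothetical parser (`parse_input_name` is not part of the dataset) and assumes names follow exactly the Input_data[threshold][dataset].csv pattern listed above.

```python
import re

# ASSUMPTION: names follow Input_data[threshold][dataset].csv exactly.
# Longer dataset codes come first in the alternation for readability;
# fullmatch() would backtrack correctly either way.
INPUT_NAME = re.compile(r"Input_data(97|98|99)(ECMB|ECM|PSRE|PS)\.csv")


def parse_input_name(filename):
    """Return (threshold_percent, dataset) for an Input_data*.csv file name."""
    match = INPUT_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"not an Input_data file name: {filename!r}")
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    for name in ["Input_data97ECM.csv", "Input_data98ECMB.csv", "Input_data99PSRE.csv"]:
        print(name, parse_input_name(name))
```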
Variables
- No: Row number
- Region: Focal geographical region
- Country: Focal country
- lat: Latitude
- lon: Longitude
- OTU: Total number of fungal ITS OTUs detected in the nucleotide sequence database
- Singleton: Number of singleton ITS OTUs detected in the nucleotide sequence database
- detected: Number of query OTUs for which highly similar hit sequences are available in the public database
- Host1: Log-transformed (log(x+1)) Fagaceae species richness
- Host2: Log-transformed (log(x+1)) Dipterocarpaceae species richness
- Dist: Log-transformed (log(x+1)) geographical distance from the study site
- temp: Annual mean temperature in the capital city (°C)
- prec: Annual precipitation in the capital city (m)
- Sds: Standard deviations, in this order, of Host1, Host2, Dist, temp, and prec across 152 geographical regions for the ECM and ECMB datasets, and of Host1, Host2, Host3, Dist, temp, and prec across the same 152 regions for the PS/RE and PS datasets
- N: Total number of query OTUs in each dataset
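The log(x+1) covariates and the Sds entries can be illustrated with the following Python sketch. It assumes the natural logarithm (the base is not stated) and the sample standard deviation with an n − 1 denominator (as R's `sd()` computes; the estimator actually used is not stated). The helper names and demo counts are hypothetical. For the ECM and ECMB datasets the documented column order would be ["Host1", "Host2", "Dist", "temp", "prec"].

```python
import math
from statistics import stdev


def log1p_transform(values):
    """Apply log(x + 1). ASSUMPTION: natural logarithm."""
    return [math.log1p(float(x)) for x in values]


def covariate_sds(rows, columns):
    """Sample SD (n - 1 denominator) of each column across rows (regions).

    ASSUMPTION: Sds use the n - 1 estimator, as R's sd() does.
    """
    return [stdev([float(row[col]) for row in rows]) for col in columns]


if __name__ == "__main__":
    # Made-up Fagaceae richness for three hypothetical regions.
    fagaceae_richness = [0, 4, 24]
    host1 = log1p_transform(fagaceae_richness)  # log(1), log(5), log(25)
    rows = [{"Host1": h} for h in host1]
    print(host1, covariate_sds(rows, ["Host1"]))
```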
Code/software
File: R-commands.R
Description: R script containing the commands used for the analyses.
