Data from: Host preference explains the high endemism of ectomycorrhizal fungi in a dipterocarp rainforest
Data files
Sep 13, 2024 version files 186.14 KB
- Input_data_Dataset1_97Per_102Con.csv (12.73 KB)
- Input_data_Dataset1_97Per_127Con.csv (15.14 KB)
- Input_data_Dataset1_98Per_102Con.csv (12.73 KB)
- Input_data_Dataset1_98Per_127Con.csv (15.14 KB)
- Input_data_Dataset2_97Per_102Con.csv (12.73 KB)
- Input_data_Dataset2_97Per_127Con.csv (15.14 KB)
- Input_data_Dataset2_98Per_102Con.csv (12.73 KB)
- Input_data_Dataset2_98Per_127Con.csv (16.22 KB)
- Input_data_Dataset3_97Per_102Con.csv (12.73 KB)
- Input_data_Dataset3_97Per_127Con.csv (15.14 KB)
- Input_data_Dataset3_98Per_102Con.csv (12.73 KB)
- Input_data_Dataset3_98Per_127Con.csv (15.14 KB)
- README.md (17.81 KB)
Abstract
In this study, we aimed to assess whether host preference could enhance the endemism of ectomycorrhizal (ECM) fungi that inhabit dipterocarp rainforests. Highly similar sequences of 175 operational taxonomic units (OTUs) for ECM fungi that were obtained from Lambir Hills National Park, Sarawak, Malaysia, were searched for in a nucleotide sequence database. Using a two-step binomial model, the probability of presence for the query OTUs and the registration rate of barcode sequences in each country were simultaneously estimated. The results revealed that the probability of presence in the respective countries increased clearly with increasing species richness of Dipterocarpaceae and decreasing geographical distance from Lambir. Furthermore, most ECM fungi in Lambir were shown to be endemic to Malaysia and neighboring countries. These findings suggest that dispersal limitation as well as host preference are responsible for the high endemism of ECM fungi in dipterocarp rainforests. Moreover, host preference likely determines the areas into which ECM fungi can potentially expand, and dispersal limitation creates distance–decay patterns within suitable habitats. Although host preference has received less attention than dispersal limitation, our findings indicate that host preference has a profound influence on the global distribution of ECM fungi.
README: Host preference explains the high endemism of ectomycorrhizal fungi in a dipterocarp rainforest
https://doi.org/10.5061/dryad.k3j9kd5gs
Description of the data and file structure
[Input_data_Dataset1_97Per_102Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 97% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 1. The number of focal countries for the estimation is 102 (127 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset1_97Per_127Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 97% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 1. The number of focal countries for the estimation is 127 (152 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset1_98Per_102Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 98% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 1. The number of focal countries for the estimation is 102 (127 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset1_98Per_127Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 98% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 1. The number of focal countries for the estimation is 127 (152 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset2_97Per_102Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 97% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 2. The number of focal countries for the estimation is 102 (127 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset2_97Per_127Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 97% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 2. The number of focal countries for the estimation is 127 (152 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset2_98Per_102Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 98% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 2. The number of focal countries for the estimation is 102 (127 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset2_98Per_127Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 98% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 2. The number of focal countries for the estimation is 127 (152 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset3_97Per_102Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 97% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 3. The number of focal countries for the estimation is 102 (127 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset3_97Per_127Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 97% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 3. The number of focal countries for the estimation is 127 (152 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset3_98Per_102Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 98% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 3. The number of focal countries for the estimation is 102 (127 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
[Input_data_Dataset3_98Per_127Con.csv]
Input data for the rstan analysis. For this input file, the operational taxonomic units (OTUs) of fungal barcode sequences were clustered using a 98% sequence similarity threshold. The focal OTUs for the estimation are those of Dataset 3. The number of focal countries for the estimation is 127 (152 regions). The file contains information on each country: the regional name (Region), the country name (Country), the latitude (lat) and longitude (lon) of the capital city, the total number of OTUs deposited in the NCBI database (OTU), the number of singleton OTUs in the NCBI database (Singleton), the number of OTUs that were detected in the study site (detected), the natural logarithm of one plus the species richness of Dipterocarpaceae trees (Host1), the natural logarithm of one plus the species richness of Lithocarpus trees (Host2), the geographical distance (in units of 1000 km) between the capital city and the study site (Dist), the annual mean temperature (°C) of the capital city (temp), and the annual cumulative precipitation (m) of the capital city (prec). Sds gives the standard deviations of the five variables (Host1, Host2, Dist, temp, and prec) across all countries. Sds2 gives the standard deviations, across all countries, of the absolute differences in the five variables between each focal country and the study site (Malaysia). N gives the total number of countries in this input file.
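The derived quantities described for each file (the log-transformed host richness and the two sets of standard deviations) can be illustrated with a minimal Python sketch. It is not taken from R_commands.R. The function names, the toy values, and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration, as is treating one row as the study-site country.

```python
import math
import statistics

# Variable names as listed in the file descriptions.
VARIABLES = ["Host1", "Host2", "Dist", "temp", "prec"]

def host_covariate(species_richness):
    """Natural logarithm of one plus species richness (the Host1/Host2 transform)."""
    return math.log1p(species_richness)

def column_sds(rows, variables=VARIABLES):
    """Standard deviation of each variable across all rows (the idea behind Sds).
    Assumption: sample standard deviation with an n - 1 denominator."""
    return {v: statistics.stdev([row[v] for row in rows]) for v in variables}

def abs_difference_sds(rows, site, variables=VARIABLES):
    """Standard deviation of |country value - study-site value| across all rows
    (the idea behind Sds2)."""
    return {v: statistics.stdev([abs(row[v] - site[v]) for row in rows])
            for v in variables}

# Invented toy rows; the real files hold 102 or 127 countries.
toy_rows = [
    {"Host1": host_covariate(0), "Host2": host_covariate(2), "Dist": 1.0, "temp": 26.0, "prec": 2.0},
    {"Host1": host_covariate(9), "Host2": host_covariate(0), "Dist": 2.0, "temp": 27.0, "prec": 3.0},
    {"Host1": host_covariate(99), "Host2": host_covariate(5), "Dist": 3.0, "temp": 10.0, "prec": 0.5},
]
site = toy_rows[1]  # assumption: the study-site country is one of the rows
sds = column_sds(toy_rows)
sds2 = abs_difference_sds(toy_rows, site)
```

Because the site row is included, its own absolute differences are zero and contribute to Sds2 like any other country; whether the original calculation does the same is not stated in this README.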
Missing data code: NA
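As a hedged illustration of how the file naming scheme and the NA code might be handled when loading the inputs outside R, the sketch below parses a file name into its dataset number, similarity threshold, and country count, and maps NA cells to `None`. The helper names and the two-row sample are hypothetical; only the file name pattern and the column names come from this README.

```python
import csv
import io
import re

# Pattern inferred from the twelve file names listed above.
FILENAME_PATTERN = re.compile(r"Input_data_Dataset(\d+)_(\d+)Per_(\d+)Con\.csv$")

def parse_filename(name):
    """Split an input file name into dataset number, threshold (%), and country count."""
    match = FILENAME_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    dataset, threshold, n_countries = (int(group) for group in match.groups())
    return {"dataset": dataset, "threshold_percent": threshold, "n_countries": n_countries}

def read_rows(handle, missing="NA"):
    """Read CSV rows as dictionaries, replacing the missing-data code with None.
    Other values are left as strings."""
    reader = csv.DictReader(handle)
    return [{key: (None if value == missing else value) for key, value in row.items()}
            for row in reader]

# Invented excerpt for demonstration; values do not come from the dataset.
sample = io.StringIO("Country,Host1,temp\nX,0.0,NA\nY,1.1,25.3\n")
rows = read_rows(sample)
info = parse_filename("Input_data_Dataset2_98Per_127Con.csv")
```

Numeric columns would still need converting with `float` or `int` after the NA cells have been handled.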
Code/Software
[R_commands.R]
R script containing the commands used for the analyses (R ver. 4.3.0).
[Stan_code.stan]
Stan code (rstan ver. 2.26.23) describing the two-step binomial model that was used to estimate the global distribution of ectomycorrhizal fungi found in the study site.
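For readers unfamiliar with the idea, one generic way to read a two-step binomial process is sketched below in Python. This is an assumption for illustration and not a transcription of Stan_code.stan, whose exact likelihood, priors, and covariate structure are not described in this README. The sketch assumes that a query OTU is recorded for a country only if it is present there (step 1) and its barcode has been registered in the database (step 2), so the per-OTU detection probability is the product of the two. All function names and numeric values are hypothetical, apart from the 175 query OTUs mentioned in the abstract.

```python
import math

def detection_probability(presence_prob, registration_rate):
    """Chance that a query OTU is both present in a country and registered
    in the sequence database (assumed independent steps)."""
    return presence_prob * registration_rate

def binomial_log_pmf(k, n, p):
    """Log probability of k successes in n independent trials, for 0 < p < 1."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)

# Illustrative numbers only: 175 query OTUs, invented probabilities and count.
p = detection_probability(presence_prob=0.4, registration_rate=0.25)
log_lik = binomial_log_pmf(k=12, n=175, p=p)
```

Under this reading, a low detected count can reflect either a low probability of presence or a low registration rate. Presumably this is why the two quantities are estimated simultaneously, with extra information such as the OTU and Singleton counts helping to separate them; how the actual model does this is not specified here.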