Anthropogenic activity and climate change exacerbate the spread of pathogenic bacteria in the environment

Geng, Yu¹; Liu, Ya¹; Li, Peng¹; Sun, Jingyu¹; Jiang, Yiru¹; Pan, Zhuo¹; Li, Yue-zhong¹; Zhang, Zheng¹

Published Jan 22, 2025 on Dryad. https://doi.org/10.5061/dryad.msbcc2g82

Data files

Jan 22, 2025 version files (52.58 MB):
- human-pathogenic-bacteria.zip (52.57 MB)
- README.md (7.22 KB)

Abstract

Climate change is profoundly affecting human health. Environmentally mediated infections by human pathogenic bacteria (HPB) are considered a significant cause of global health losses. However, the biogeography of HPB and their response to climate change remain largely unknown. Here, we constructed and analyzed a global atlas of potential HPB using 1,066,584 samples worldwide. HPB are widely present in the global environment, and their distribution follows a latitudinal diversity gradient. Climate and anthropogenic factors were identified as the major drivers of the global distribution of HPB. Our predictions indicate that by the end of this century the richness, abundance, and invasion risk of HPB will increase globally, and that this upward trend becomes more pronounced as the sustainability of development declines. The threat of environmentally mediated HPB infections to human health may therefore be more severe in a world of intensifying anthropogenic activity and a warming climate.

Introduction

We have submitted our raw data and all R code used for the machine learning and climate change analyses of human pathogenic bacteria (HPB). The files in this submission contain all the information required for the machine learning analysis.

Place these files in a folder named pathogen and set R's working directory to the parent directory of pathogen. For example, if the path to pathogen is a/b/c/pathogen, run setwd(dir = "a/b/c/") in R.
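Before running the R scripts, it can help to confirm that the pathogen folder sits directly under the intended working directory and that the main data files named in this README are present. The Python sketch below is a hypothetical helper, not part of the submission: the function name missing_files and the decision to search the whole folder tree (because the README does not state which subfolder each file lives in) are assumptions.

```python
from pathlib import Path

# Data files named in this README. Their exact location inside pathogen/
# is not stated, so the whole folder tree is searched (an assumption).
EXPECTED_FILES = [
    "variables.xlsx",
    "dat.csv",
    "dat_abun.csv",
    "df_412_terra_nohuman.csv",
    "df0_add_nohuman.csv",
]


def missing_files(working_dir: str) -> list[str]:
    """Return expected files absent from <working_dir>/pathogen and its subfolders."""
    root = Path(working_dir) / "pathogen"
    if not root.is_dir():
        # The working directory must be the parent of pathogen, as with setwd().
        raise FileNotFoundError(
            f"{root} not found: the working directory should be the parent of 'pathogen'"
        )
    present = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in EXPECTED_FILES if name not in present]
```

Calling missing_files("a/b/c/") with the same path passed to setwd() returns an empty list when everything is in place.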

Detailed description

variables.xlsx contains the definitions of the variables in each dataset.
dat.csv contains the data used to map the global distribution of HPB richness, and dat_abun.csv contains the data used to map the global distribution of HPB abundance. df_412_terra_nohuman.csv and df0_add_nohuman.csv are the data used for the climate change analysis of HPB richness and abundance.
Folders ending in spatial or spatial_abun: the training and test sets are split into spatially distinct sets using the blockCV package.
- map_spatial and map_spatial_abun are used to predict the global distribution of HPB richness and abundance, respectively. Each contains six code files: step1_select_sample.R preprocesses the data; step2_vif.R calculates the variance inflation factor (VIF); step3_feature_selection.R identifies the optimal feature set; step4_hyperparameter_tuning.R optimizes the model's hyperparameters; step5_test_model.R validates the model on the test set; and step6_modelling.R constructs the model.
- future_spatial and future_spatial_abun are used to analyze the impact of climate change on HPB. Each contains seven code files: step1_extract_variables.R extracts the climate variables for each location; step2_data_process.R preprocesses the data; step3_MESS.R calculates the multivariate environmental similarity surface (MESS); step4_feature_selection.R identifies the optimal feature set; step5_modelling.R constructs the model; step6_test_model.R validates the model on the test set; and step7_future_predict.R predicts the future global distribution of HPB richness and abundance.
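The VIF scripts (step2_vif.R here, step1_vif.R in the spatial_cv folders) screen out collinear predictors. The authors' R code is not reproduced here; the following Python/NumPy sketch shows the standard calculation, VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on all the others. The stepwise rule in drop_collinear (repeatedly removing the predictor with the largest VIF) and its default threshold of 10 are assumptions; the scripts may use a different cut-off or procedure.

```python
import numpy as np


def vif(X) -> np.ndarray:
    """Variance inflation factor for each column of X (n samples x p predictors).

    VIF_j = 1 / (1 - R_j^2), with R_j^2 from an ordinary least-squares
    regression (with intercept) of column j on all other columns.
    Constant columns are not supported (their total variance is zero).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_collinear(X, names, threshold=10.0) -> list:
    """Drop the highest-VIF predictor until every VIF is <= threshold (assumed rule)."""
    names = list(names)
    X = np.asarray(X, dtype=float)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names
```

Independent predictors give VIFs close to 1. A predictor that is nearly a linear combination of others gives a very large VIF, and one member of that group is removed.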
Folders ending in spatial_cv or spatial_cv_abun: spatial cross-validation is applied using the blockCV package.
- map_spatial_cv and map_spatial_cv_abun are used to predict the global distribution of HPB richness and abundance, respectively. Each contains five code files: step1_vif.R calculates the variance inflation factor (VIF); step2_feature_selection.R identifies the optimal feature set; step3_hyperparameter_tuning.R optimizes the model's hyperparameters; step4_test_model.R validates the model on the test set; and step5_modelling.R constructs the model.
- future_spatial_cv and future_spatial_cv_abun are used to analyze the impact of climate change on HPB. Each contains seven code files: step1_extract_variables.R extracts the climate variables for each location; step2_data_process.R preprocesses the data; step3_MESS.R calculates the multivariate environmental similarity surface (MESS); step4_feature_selection.R identifies the optimal feature set; step5_modelling.R constructs the model; step6_test_model.R validates the model on the test set; and step7_future_predict.R predicts the future global distribution of HPB richness and abundance.
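Both the spatial and spatial_cv folders rely on blockCV to keep training and test samples spatially separated. blockCV itself is an R package with its own blocking options. The Python sketch below is only a simplified analogue of the idea, not blockCV's algorithm. Samples are binned into square longitude/latitude cells, and whole cells are assigned to folds. The function name, the 10-degree cell size, and k = 5 are illustrative assumptions.

```python
import numpy as np


def spatial_block_folds(lon, lat, block_deg=10.0, k=5, seed=0) -> np.ndarray:
    """Assign each sample to one of k folds by whole spatial blocks.

    Samples are binned into square lon/lat cells of `block_deg` degrees.
    Every sample in a cell receives the same fold, so in any split the
    training and test points come from different cells.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    col = np.floor((lon + 180.0) / block_deg).astype(int)
    row = np.floor((lat + 90.0) / block_deg).astype(int)
    block_id = row * 10_000 + col  # one integer id per grid cell
    blocks, inverse = np.unique(block_id, return_inverse=True)
    rng = np.random.default_rng(seed)
    # Shuffle the blocks, then deal them out to folds 0..k-1 in turn.
    block_fold = rng.permutation(len(blocks)) % k
    return block_fold[inverse]
```

For fold f, samples with folds == f form the test set and the rest form the training set. Using a single fold as the test set gives a one-off, spatially separated train/test split. Assigning whole blocks rather than individual points reduces the optimism that spatial autocorrelation would otherwise introduce into the test scores.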
Folders ending in remove or remove_abun: commensals are excluded from the analysis.
- map_remove and map_remove_abun are used to predict the global distribution of HPB richness and abundance, respectively, excluding commensals. Each contains six code files: step1_select_sample.R preprocesses the data; step2_vif.R calculates the variance inflation factor (VIF); step3_feature_selection.R identifies the optimal feature set; step4_hyperparameter_tuning.R optimizes the model's hyperparameters; step5_test_model.R validates the model on the test set; and step6_modelling.R constructs the model. dat_remove.csv contains the data used to map HPB richness, and dat_remove_abun.csv contains the data used to map HPB abundance.
- future_remove and future_remove_abun are used to analyze the impact of climate change on HPB, excluding commensals. Each contains seven code files: step1_extract_variables.R extracts the climate variables for each location; step2_data_process.R preprocesses the data; step3_MESS.R calculates the multivariate environmental similarity surface (MESS); step4_feature_selection.R identifies the optimal feature set; step5_modelling.R constructs the model; step6_test_model.R validates the model on the test set; and step7_future_predict.R predicts the future global distribution of HPB richness and abundance. df_412_terra_nohuman.csv and df0_add_nohuman.csv are the data used for the climate change analysis of HPB richness and abundance.
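The remove folders repeat the analysis with commensals excluded. The README does not define how richness and abundance are computed, so the Python sketch below rests on assumptions. Richness is taken as the number of distinct HPB taxa detected in a sample, and abundance as their summed relative abundance. The input is assumed to be hypothetical (sample, taxon, relative abundance, is_commensal) records. The sketch only shows how dropping commensals changes both per-sample values.

```python
from typing import Iterable, Tuple

# Hypothetical record layout: (sample_id, taxon, relative_abundance, is_commensal)
Record = Tuple[str, str, float, bool]


def richness_and_abundance(records: Iterable[Record], exclude_commensal: bool = False):
    """Per-sample HPB richness (distinct taxa) and abundance (summed relative abundance)."""
    taxa: dict = {}
    abundance: dict = {}
    for sample, taxon, rel_abun, is_commensal in records:
        # Register every sample so one with only commensals still reports 0.
        taxa.setdefault(sample, set())
        abundance.setdefault(sample, 0.0)
        if exclude_commensal and is_commensal:
            continue
        if rel_abun > 0:
            taxa[sample].add(taxon)
            abundance[sample] += rel_abun
    richness = {s: len(t) for s, t in taxa.items()}
    return richness, abundance
```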
Folders ending in EMP: analyses based on data from the Earth Microbiome Project (EMP).
- map_EMP is used to predict the global distribution of HPB. It contains six code files: step1_select_sample.R preprocesses the data; step2_vif.R calculates the variance inflation factor (VIF); step3_feature_selection.R identifies the optimal feature set; step4_hyperparameter_tuning.R optimizes the model's hyperparameters; step5_test_model.R validates the model on the test set; and step6_modelling.R constructs the model. train_EMP.csv contains the data used to map HPB richness, and location.csv contains the location information.
- future_EMP is used to analyze the impact of climate change on HPB. It contains seven code files: step1_extract_variables.R extracts the climate variables for each location; step2_data_process.R preprocesses the data; step3_MESS.R calculates the multivariate environmental similarity surface (MESS); step4_feature_selection.R identifies the optimal feature set; step5_modelling.R constructs the model; step6_test_model.R validates the model on the test set; and step7_future_predict.R predicts the future global distribution of HPB richness and abundance. df_412_terra_nohuman.csv and df0_add_nohuman.csv are the data used for the climate change analysis of HPB richness and abundance.
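Every future_* folder includes step3_MESS.R, which computes the multivariate environmental similarity surface. The MESS flags prediction locations whose environmental conditions fall outside the range seen in the training data. The R script may rely on an existing implementation such as the mess() function in the dismo package; that is not confirmed here. The Python sketch below follows the MESS definition of Elith et al. (2010). For each variable i, let f_i be the percentage of reference values smaller than the point's value. The similarity is then 2·f_i (when 0 < f_i ≤ 50) or 2·(100 − f_i) (when 50 < f_i < 100). At the tails (f_i = 0 or 100) it becomes a signed distance to the reference minimum or maximum as a percentage of the range. The MESS is the minimum similarity over all variables. The function name and array layout are assumptions.

```python
import numpy as np


def mess(reference, points) -> np.ndarray:
    """Multivariate environmental similarity surface (Elith et al. 2010).

    reference: (n_ref, n_var) predictor values at the training locations.
    points:    (n_pts, n_var) predictor values where predictions are made.
    Returns one score per point; negative values mean at least one variable
    lies outside the reference range. Variables must not be constant.
    """
    ref = np.asarray(reference, dtype=float)
    pts = np.asarray(points, dtype=float)
    lo, hi = ref.min(axis=0), ref.max(axis=0)
    span = hi - lo
    sims = np.empty_like(pts)
    for i in range(ref.shape[1]):
        sorted_ref = np.sort(ref[:, i])
        # Percentage of reference values strictly smaller than each point value.
        f = 100.0 * np.searchsorted(sorted_ref, pts[:, i], side="left") / len(sorted_ref)
        s = np.where(f <= 50.0, 2.0 * f, 2.0 * (100.0 - f))
        # Tails: signed distance to the reference min/max, as % of the range.
        s = np.where(f == 0.0, 100.0 * (pts[:, i] - lo[i]) / span[i], s)
        s = np.where(f == 100.0, 100.0 * (hi[i] - pts[:, i]) / span[i], s)
        sims[:, i] = s
    return sims.min(axis=1)
```

Taking the minimum over variables means a single out-of-range variable is enough to mark a location as novel. That is why negative MESS cells are typically masked or interpreted with caution in future projections.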

Code/Software

Data analysis was conducted mainly in R (version 4.3.3).

Please note

Download the map data required for the analysis from http://bioinfo.qd.sdu.edu.cn/kegg/map_data/map_data.zip using a web browser, or from ftp://202.194.20.63/ using an FTP client such as FileZilla. On Windows, File Explorer can also be used to download the data from the FTP site.
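As an alternative to a browser or FTP client, the archive can be fetched from a script. The Python sketch below uses only the standard library and the HTTP URL given above. The function name and the choice to keep map_data.zip and extract it into a reader-chosen destination folder are assumptions. The README does not state the archive's internal layout or where the R scripts expect the extracted files.

```python
import urllib.request
import zipfile
from pathlib import Path

MAP_DATA_URL = "http://bioinfo.qd.sdu.edu.cn/kegg/map_data/map_data.zip"


def fetch_map_data(dest_dir: str, url: str = MAP_DATA_URL) -> Path:
    """Download map_data.zip into dest_dir (if not already there) and extract it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "map_data.zip"
    if not archive.exists():  # skip re-downloading a previously fetched archive
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```

Because urllib also accepts file:// URLs, the same function can unpack a copy of the archive that was downloaded manually.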