Anthropogenic activity and climate change exacerbate the spread of pathogenic bacteria in the environment
Data files
Jan 22, 2025 version, files 52.58 MB
- human-pathogenic-bacteria.zip (52.57 MB)
- README.md (7.22 KB)
Abstract
Climate change is profoundly impacting human health. Human pathogenic bacteria (HPB) infections mediated by the environment are considered a significant cause of global health losses. However, the biogeography of HPB and their response to climate change remain largely unknown. Here, we constructed and analyzed a global atlas of potential HPB using 1,066,584 samples worldwide. HPB are widely present in the global environment, and their distribution follows a latitudinal diversity gradient. Climate and anthropogenic factors are identified as major drivers of the global distribution of HPB. Our predictions indicated that by the end of this century, the richness, abundance, and invasion risk of HPB will increase globally, with this upward trend becoming more pronounced as development sustainability declines. Therefore, the threat of environmentally mediated HPB infections to human health may be more severe in a world where anthropogenic activities are intensifying and the global climate is warming.
Introduction
We have submitted our raw data and all R code for the machine learning and climate change analyses of human pathogenic bacteria (HPB). The submitted files contain all the information needed for the machine learning analysis.
Please place these files in a folder named pathogen and set R's working directory to the parent directory of that folder. For example, if the path of the folder is a/b/c/pathogen, run the command setwd(dir = "a/b/c/") in R.
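The setup step above can be sketched as follows. This is a minimal illustration, not part of the submitted scripts: the helper name set_project_wd and the existence check are assumptions added here, and a/b/c is the placeholder path from the example.

```r
# Hypothetical helper (not part of the submission): switch to the directory
# that contains the "pathogen" folder, but only if that folder exists.
set_project_wd <- function(parent_dir, project = "pathogen") {
  if (!dir.exists(file.path(parent_dir, project))) {
    message("No '", project, "' folder found under: ", parent_dir)
    return(invisible(FALSE))
  }
  setwd(parent_dir)
  invisible(TRUE)
}

# Example call; replace "a/b/c" with the real parent directory:
# set_project_wd("a/b/c")
```

Checking for the folder first avoids running the step scripts from the wrong location, where their relative paths would fail to resolve.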
Detailed description
- variables.xlsx contains the definitions of the variables in each dataset.
- dat.csv is the data used for the global distribution of HPB richness, and dat_abun.csv is the data used for the global distribution of HPB abundance.
- df_412_terra_nohuman.csv and df0_add_nohuman.csv are the data used for the climate change analysis of HPB richness and abundance.
- Folder ending in spatial or spatial_abun: Split the data into spatially distinct training and testing sets using the blockCV package.
  - map_spatial and map_spatial_abun are used to predict the global distribution of HPB richness and abundance. There are six code files:
    - step1_select_sample.R is used for data preprocessing;
    - step2_vif.R is used to calculate the variance inflation factor;
    - step3_feature_selection.R is used to identify the optimal feature set;
    - step4_hyperparameter_tuning.R is used to optimize the model's hyperparameters;
    - step5_test_model.R is used to validate the model on the test set; and
    - step6_modelling.R is used for model construction.
  - future_spatial and future_spatial_abun are used to analyze the impact of climate change on HPB. There are seven code files:
    - step1_extract_variables.R is used to extract the climate variables corresponding to each location;
    - step2_data_process.R is used for data preprocessing;
    - step3_MESS.R is used to calculate the multivariate environmental similarity surface;
    - step4_feature_selection.R is used to identify the optimal feature set;
    - step5_modelling.R is used for model construction;
    - step6_test_model.R is used to validate the model on the test set; and
    - step7_future_predict.R is used to predict the future global distribution of HPB richness and abundance.
- Folder ending in spatial_cv or spatial_cv_abun: Apply spatial cross-validation using the blockCV package.
  - map_spatial_cv and map_spatial_cv_abun are used to predict the global distribution of HPB richness and abundance. There are five code files:
    - step1_vif.R is used to calculate the variance inflation factor;
    - step2_feature_selection.R is used to identify the optimal feature set;
    - step3_hyperparameter_tuning.R is used to optimize the model's hyperparameters;
    - step4_test_model.R is used to validate the model on the test set; and
    - step5_modelling.R is used for model construction.
  - future_spatial_cv and future_spatial_cv_abun are used to analyze the impact of climate change on HPB. There are seven code files:
    - step1_extract_variables.R is used to extract the climate variables corresponding to each location;
    - step2_data_process.R is used for data preprocessing;
    - step3_MESS.R is used to calculate the multivariate environmental similarity surface;
    - step4_feature_selection.R is used to identify the optimal feature set;
    - step5_modelling.R is used for model construction;
    - step6_test_model.R is used to validate the model on the test set; and
    - step7_future_predict.R is used to predict the future global distribution of HPB richness and abundance.
- Folder ending in remove or remove_abun: Remove commensal bacteria.
  - map_remove and map_remove_abun are used to predict the global distribution of the richness and abundance of HPB excluding commensal bacteria. There are six code files:
    - step1_select_sample.R is used for data preprocessing;
    - step2_vif.R is used to calculate the variance inflation factor;
    - step3_feature_selection.R is used to identify the optimal feature set;
    - step4_hyperparameter_tuning.R is used to optimize the model's hyperparameters;
    - step5_test_model.R is used to validate the model on the test set; and
    - step6_modelling.R is used for model construction.
  - dat_remove.csv is the data used for the global distribution of HPB richness, and dat_remove_abun.csv is the data used for the global distribution of HPB abundance.
  - future_remove and future_remove_abun are used to analyze the impact of climate change on HPB excluding commensal bacteria. There are seven code files:
    - step1_extract_variables.R is used to extract the climate variables corresponding to each location;
    - step2_data_process.R is used for data preprocessing;
    - step3_MESS.R is used to calculate the multivariate environmental similarity surface;
    - step4_feature_selection.R is used to identify the optimal feature set;
    - step5_modelling.R is used for model construction;
    - step6_test_model.R is used to validate the model on the test set; and
    - step7_future_predict.R is used to predict the future global distribution of HPB richness and abundance.
  - df_412_terra_nohuman.csv and df0_add_nohuman.csv are the data used for the climate change analysis of HPB richness and abundance.
- Folder ending in EMP: Analyze using data from the Earth Microbiome Project (EMP).
  - map_EMP is used to predict the global distribution of HPB. There are six code files:
    - step1_select_sample.R is used for data preprocessing;
    - step2_vif.R is used to calculate the variance inflation factor;
    - step3_feature_selection.R is used to identify the optimal feature set;
    - step4_hyperparameter_tuning.R is used to optimize the model's hyperparameters;
    - step5_test_model.R is used to validate the model on the test set; and
    - step6_modelling.R is used for model construction.
  - train_EMP.csv is the data used for the global distribution of HPB richness, and location.csv contains the location information.
  - future_EMP is used to analyze the impact of climate change on HPB. There are seven code files:
    - step1_extract_variables.R is used to extract the climate variables corresponding to each location;
    - step2_data_process.R is used for data preprocessing;
    - step3_MESS.R is used to calculate the multivariate environmental similarity surface;
    - step4_feature_selection.R is used to identify the optimal feature set;
    - step5_modelling.R is used for model construction;
    - step6_test_model.R is used to validate the model on the test set; and
    - step7_future_predict.R is used to predict the future global distribution of HPB richness and abundance.
  - df_412_terra_nohuman.csv and df0_add_nohuman.csv are the data used for the climate change analysis of HPB richness and abundance.
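Each of the map folders listed above includes a step that calculates the variance inflation factor (VIF). As a rough illustration of what such a step computes, here is a minimal base-R sketch. It is not the code from the submitted vif scripts, whose contents are not shown here. The function name compute_vif and the toy data frame are hypothetical, and the data are synthetic rather than taken from the dataset.

```r
# Variance inflation factor for each column of a numeric predictor table:
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on all remaining predictors.
compute_vif <- function(predictors) {
  predictors <- as.data.frame(predictors)
  vapply(names(predictors), function(target) {
    others <- setdiff(names(predictors), target)
    fit <- lm(reformulate(others, response = target), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# Synthetic example data (not from the dataset): x3 is nearly a copy of x1,
# so x1 and x3 should receive large VIF values while x2 stays close to 1.
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$x3 <- toy$x1 + rnorm(200, sd = 0.1)

vif <- compute_vif(toy)
print(round(vif, 2))
```

Predictors with a high VIF are largely explained by the other predictors, which is why a collinearity check like this usually precedes feature selection. Any cut-off for dropping variables is a modelling choice and is not specified in this description.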
Code/Software
The data analysis was mainly conducted using R (version 4.3.3).
Please note
Please download the map data for the analysis from http://bioinfo.qd.sdu.edu.cn/kegg/map_data/map_data.zip using a browser, or from ftp://202.194.20.63/ using FTP client software such as FileZilla. On Windows, you can also use File Explorer to download the data.
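For users who prefer to stay inside R, the download can also be scripted. The following is a minimal sketch under stated assumptions: the helper fetch_map_data, the destination folder map_data, and the timeout value are all hypothetical choices made for illustration. This description does not say where the scripts expect the map data to be placed, so adjust dest_dir to match the paths used in the step scripts.

```r
# Hypothetical helper (not part of the submission): download a zip archive
# and unpack it into dest_dir.
fetch_map_data <- function(url, dest_dir = "map_data", timeout_sec = 3600) {
  # R's default download timeout (60 s) can be too short for large files.
  old <- options(timeout = timeout_sec)
  on.exit(options(old), add = TRUE)

  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, basename(url))

  # mode = "wb" keeps the archive intact on Windows.
  download.file(url, destfile = zip_path, mode = "wb")
  unzip(zip_path, exdir = dest_dir)
  invisible(zip_path)
}

# Example call (commented out because it needs network access):
# fetch_map_data("http://bioinfo.qd.sdu.edu.cn/kegg/map_data/map_data.zip")
```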
