Anthropogenic activity and climate change exacerbate the spread of pathogenic bacteria in the environment
Data files
Jan 22, 2025 version files 52.58 MB
- human-pathogenic-bacteria.zip (52.57 MB)
- README.md (7.22 KB)
Abstract
Climate change is profoundly impacting human health. Human pathogenic bacteria (HPB) infections mediated by the environment are considered a significant cause of global health losses. However, the biogeography of HPB and their response to climate change remain largely unknown. Here, we constructed and analyzed a global atlas of potential HPB using 1,066,584 samples worldwide. HPB are widely present in the global environment, and their distribution follows a latitudinal diversity gradient. Climate and anthropogenic factors are identified as major drivers of the global distribution of HPB. Our predictions indicated that by the end of this century, the richness, abundance, and invasion risk of HPB will increase globally, with this upward trend becoming more pronounced as development sustainability declines. Therefore, the threat of environmentally mediated HPB infections to human health may be more severe in a world where anthropogenic activities are intensifying and the global climate is warming.
README: Anthropogenic activity and climate change exacerbate the spread of pathogenic bacteria in the environment
Introduction
We have submitted our raw data and all R code for the machine learning and climate change analyses of human pathogenic bacteria (HPB). The files included in this submission contain all of the information needed for the machine learning analysis.
Please place these files in a folder named `pathogen` and set the default working directory of R to the parent directory of `pathogen`. For example, if the path of `pathogen` is `a/b/c/pathogen`, execute the command `setwd(dir = "a/b/c/")` in R.
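As a minimal sketch of the setup described above, the following R snippet checks that a `pathogen` folder is visible under the intended parent directory before changing the working directory. The path `a/b/c` is the placeholder from the example, and `check_pathogen_dir` is a hypothetical helper name, not part of the submitted code.

```r
# Hypothetical helper (not part of the submitted scripts):
# returns TRUE if a folder named "pathogen" exists directly under `parent`.
check_pathogen_dir <- function(parent) {
  dir.exists(file.path(parent, "pathogen"))
}

parent_dir <- "a/b/c"  # placeholder path; replace with your own location

if (check_pathogen_dir(parent_dir)) {
  setwd(dir = parent_dir)
  message("Working directory set to: ", getwd())
} else {
  warning("No 'pathogen' folder found under ", parent_dir)
}
```

Checking first avoids running the step scripts from a directory where their relative paths would not resolve.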
Detailed description
`variables.xlsx` contains the definitions of the variables in each dataset. `dat.csv` is the data used for the global distribution of HPB richness, and `dat_abun.csv` is the data used for the global distribution of HPB abundance. `df_412_terra_nohuman.csv` and `df0_add_nohuman.csv` are the data used for the climate change analysis of HPB richness and abundance.
- Folders ending in `spatial` or `spatial_abun`: the training and testing sets are split so that they are spatially distinct, using the `blockCV` package.
  - `map_spatial` and `map_spatial_abun` are used to predict the global distribution of HPB richness and abundance. There are six code files:
    - `step1_select_sample.R` is used for data preprocessing;
    - `step2_vif.R` is used to calculate the variance inflation factor;
    - `step3_feature_selection.R` is used to identify the optimal feature set;
    - `step4_hyperparameter_tuning.R` is used to optimize the model's hyperparameters;
    - `step5_test_model.R` is used to validate the model on the test set; and
    - `step6_modelling.R` is used for model construction.
  - `future_spatial` and `future_spatial_abun` are used to analyze the impact of climate change on HPB. There are seven code files:
    - `step1_extract_variables.R` is used to extract the climate variables corresponding to each location;
    - `step2_data_process.R` is used for data preprocessing;
    - `step3_MESS.R` is used to calculate the multivariate environmental similarity surface;
    - `step4_feature_selection.R` is used to identify the optimal feature set;
    - `step5_modelling.R` is used for model construction;
    - `step6_test_model.R` is used to validate the model on the test set; and
    - `step7_future_predict.R` is used to predict the future global distribution of HPB richness and abundance.
- Folders ending in `spatial_cv` or `spatial_cv_abun`: spatial cross-validation is applied using the `blockCV` package.
  - `map_spatial_cv` and `map_spatial_cv_abun` are used to predict the global distribution of HPB richness and abundance. There are five code files:
    - `step1_vif.R` is used to calculate the variance inflation factor;
    - `step2_feature_selection.R` is used to identify the optimal feature set;
    - `step3_hyperparameter_tuning.R` is used to optimize the model's hyperparameters;
    - `step4_test_model.R` is used to validate the model on the test set; and
    - `step5_modelling.R` is used for model construction.
  - `future_spatial_cv` and `future_spatial_cv_abun` are used to analyze the impact of climate change on HPB. There are seven code files:
    - `step1_extract_variables.R` is used to extract the climate variables corresponding to each location;
    - `step2_data_process.R` is used for data preprocessing;
    - `step3_MESS.R` is used to calculate the multivariate environmental similarity surface;
    - `step4_feature_selection.R` is used to identify the optimal feature set;
    - `step5_modelling.R` is used for model construction;
    - `step6_test_model.R` is used to validate the model on the test set; and
    - `step7_future_predict.R` is used to predict the future global distribution of HPB richness and abundance.
- Folders ending in `remove` or `remove_abun`: commensals are removed.
  - `map_remove` and `map_remove_abun` are used to predict the global distribution of the richness and abundance of HPB excluding commensals. There are six code files:
    - `step1_select_sample.R` is used for data preprocessing;
    - `step2_vif.R` is used to calculate the variance inflation factor;
    - `step3_feature_selection.R` is used to identify the optimal feature set;
    - `step4_hyperparameter_tuning.R` is used to optimize the model's hyperparameters;
    - `step5_test_model.R` is used to validate the model on the test set; and
    - `step6_modelling.R` is used for model construction.
  - `dat_remove.csv` is the data used for the global distribution of HPB richness, and `dat_remove_abun.csv` is the data used for the global distribution of HPB abundance.
  - `future_remove` and `future_remove_abun` are used to analyze the impact of climate change on HPB excluding commensals. There are seven code files:
    - `step1_extract_variables.R` is used to extract the climate variables corresponding to each location;
    - `step2_data_process.R` is used for data preprocessing;
    - `step3_MESS.R` is used to calculate the multivariate environmental similarity surface;
    - `step4_feature_selection.R` is used to identify the optimal feature set;
    - `step5_modelling.R` is used for model construction;
    - `step6_test_model.R` is used to validate the model on the test set; and
    - `step7_future_predict.R` is used to predict the future global distribution of HPB richness and abundance.
  - `df_412_terra_nohuman.csv` and `df0_add_nohuman.csv` are the data used for the climate change analysis of HPB richness and abundance.
- Folders ending in `EMP`: the analysis uses data from the Earth Microbiome Project (EMP).
  - `map_EMP` is used to predict the global distribution of HPB. There are six code files:
    - `step1_select_sample.R` is used for data preprocessing;
    - `step2_vif.R` is used to calculate the variance inflation factor;
    - `step3_feature_selection.R` is used to identify the optimal feature set;
    - `step4_hyperparameter_tuning.R` is used to optimize the model's hyperparameters;
    - `step5_test_model.R` is used to validate the model on the test set; and
    - `step6_modelling.R` is used for model construction.
  - `train_EMP.csv` is the data used for the global distribution of HPB richness, and `location.csv` contains the location information.
  - `future_EMP` is used to analyze the impact of climate change on HPB. There are seven code files:
    - `step1_extract_variables.R` is used to extract the climate variables corresponding to each location;
    - `step2_data_process.R` is used for data preprocessing;
    - `step3_MESS.R` is used to calculate the multivariate environmental similarity surface;
    - `step4_feature_selection.R` is used to identify the optimal feature set;
    - `step5_modelling.R` is used for model construction;
    - `step6_test_model.R` is used to validate the model on the test set; and
    - `step7_future_predict.R` is used to predict the future global distribution of HPB richness and abundance.
  - `df_412_terra_nohuman.csv` and `df0_add_nohuman.csv` are the data used for the climate change analysis of HPB richness and abundance.
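Several of the folders above include a script that calculates the variance inflation factor (VIF). The submitted scripts are not reproduced here. As a rough illustration of what a VIF measures, the base-R sketch below regresses each predictor on all the others and computes 1 / (1 - R²). The function name `vif_base` and the toy data are assumptions made for illustration only, and the sketch assumes syntactic column names.

```r
# Illustrative VIF calculation in base R (not the authors' implementation).
# For each column, fit a linear model on the remaining columns and
# convert the resulting R-squared into a variance inflation factor.
vif_base <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# Toy data: x3 is deliberately almost collinear with x1.
set.seed(1)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$x3 <- toy$x1 + rnorm(100, sd = 0.1)

print(vif_base(toy))
```

In this toy example `x1` and `x3` should show much larger values than `x2`, which is the pattern a VIF screen is meant to flag. The threshold the authors applied is not stated in this README.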
Code/Software
The data analysis was mainly conducted using R (version 4.3.3).
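The sketch below shows one possible way to list the numbered step scripts of a workflow folder in step order and report the running R version. It assumes the scripts are meant to be run in the order of their `stepN_` prefix and that the folder sits at `pathogen/map_spatial` relative to the working directory; both are assumptions, and `list_step_scripts` is a hypothetical helper, not part of the submitted code.

```r
# Hypothetical helper: return the stepN_*.R scripts in a folder,
# sorted by their numeric step prefix (so step10 comes after step2).
list_step_scripts <- function(folder) {
  scripts <- list.files(folder, pattern = "^step[0-9]+_.*\\.R$", full.names = TRUE)
  step_no <- as.integer(sub("^step([0-9]+)_.*$", "\\1", basename(scripts)))
  scripts[order(step_no)]
}

message("Running R ", getRversion(), " (the analysis used version 4.3.3)")

workflow_dir <- file.path("pathogen", "map_spatial")  # assumed folder location
for (script in list_step_scripts(workflow_dir)) {
  message("Would run: ", script)
  # source(script)  # uncomment to execute the step
}
```

The `source()` call is left commented out so the loop can be used as a dry run first.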
Please note
Please download the map data needed for the analysis from http://bioinfo.qd.sdu.edu.cn/kegg/map_data/map_data.zip using a browser, or from ftp://202.194.20.63/ using FTP download software such as FileZilla. On Windows, you can also use File Explorer to download the data.
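As an alternative to a browser, the HTTP download could also be scripted from R. This is a minimal sketch using base R's `download.file` and `unzip`. The helper name `fetch_map_data` and the destination folder `map_data` are assumptions, since the README does not state where the extracted files should be placed.

```r
map_url <- "http://bioinfo.qd.sdu.edu.cn/kegg/map_data/map_data.zip"

# Hypothetical helper: download the archive into `dest_dir` and extract it there.
# `dest_dir` is an assumed location; adjust it to wherever the scripts expect the data.
fetch_map_data <- function(url, dest_dir = "map_data") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, basename(url))
  # mode = "wb" writes the file in binary mode so the zip is not corrupted on Windows
  download.file(url, destfile = zip_path, mode = "wb")
  unzip(zip_path, exdir = dest_dir)
  invisible(zip_path)
}

# fetch_map_data(map_url)  # uncomment to download; archive size is not stated in the README
```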