Dengue incidence and climatic variables in Cali from 2015 to 2021
Data files
May 06, 2024 version files 92.57 KB
- Data_reporting.xlsx (90.94 KB)
- README.md (1.63 KB)
Abstract
In this work we studied the relationship between dengue incidence in Cali and the available climatic variables known to affect the mosquito vector (precipitation, relative humidity, and minimum, mean, and maximum temperature). Because the mosquito's natural processes mean that changes in climatic variables take time to show up in dengue incidence, a lagged correlation analysis was carried out to choose the predictor variables for the count regression models. A Principal Component Analysis was performed to reduce dimensionality and study the correlation among the climatic variables. Finally, to predict monthly dengue incidence, three different regression models were constructed and compared using the Akaike information criterion. The best model was the negative binomial regression model, with mean temperature at a 3-month lag, mean temperature at a 5-month lag, and their interaction as predictors. The other variables were not significant in the models. An interesting conclusion was that, according to the coefficients of the regression model, a 1°C increase in monthly mean temperature is reflected as a 45% increase in dengue incidence after 3 months. This rises to a 64% increase after 5 months.
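The lagged correlation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson` and `lagged_correlations`, the maximum lag of 6 months, and the synthetic series are all assumptions made for illustration. The idea is to correlate incidence in month t with a climatic variable in month t - lag, for several lags, and look for the lag with the strongest association.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(incidence, climate, max_lag=6):
    """Correlate incidence[t] with climate[t - lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        # Drop the first `lag` incidence values and the last `lag` climate
        # values so that each incidence month is paired with an earlier month.
        result[lag] = pearson(incidence[lag:], climate[: len(climate) - lag])
    return result


# Synthetic data (an assumption, NOT the Cali data): incidence responds to
# temperature three months earlier, plus a little noise.
random.seed(0)
true_lag = 3
n_months = 84
temp_full = [24 + random.gauss(0, 1) for _ in range(n_months + true_lag)]
incidence = [
    100 + 20 * temp_full[t - true_lag] + random.gauss(0, 1)
    for t in range(true_lag, n_months + true_lag)
]
temperature = temp_full[true_lag:]

correlations = lagged_correlations(incidence, temperature)
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
print(best_lag, round(correlations[best_lag], 3))
```

On this synthetic series the strongest correlation appears at the lag that was built into the data; with real data the pattern is usually less clear-cut and several lags may be worth carrying into the regression models.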
https://doi.org/10.5061/dryad.0zpc8675h
Data reporting contains monthly dengue incidence in Cali from 2015 to 2021 and measurements of climatic variables related to the vector's biological processes.
The following are the climatic variables present in the data set:
Precipitation (January 2015 - December 2021) in mm^3
Mean temperature (January 2005 - February 2023) in °C
Maximum temperature (January 2005 - February 2023) in °C
Minimum temperature (January 2005 - February 2023) in °C
Relative humidity (January 2010 - January 2021) in %
RandomForestRegression_Mean_temp is a Python notebook that was used for the data imputation.
Description of the data and file structure
Data reporting is an Excel file containing all the data. Each variable has its own sheet, with the recording date of each observation. Missing values are coded as #N/A. The last sheet, “imputed data”, is a data set in which the missing values were imputed using a random forest regression model.
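As a hedged sketch of how the workbook might be loaded, the snippet below reads every sheet into a pandas DataFrame. It assumes that `Data_reporting.xlsx` sits in the working directory and that pandas and an Excel engine such as openpyxl are installed; the helper name `load_sheets` is hypothetical. The column names inside each sheet are not documented here, so the example only prints each sheet's shape and its number of missing cells.

```python
from pathlib import Path

import pandas as pd


def load_sheets(path="Data_reporting.xlsx"):
    """Return a dict mapping sheet name -> DataFrame, reading #N/A as NaN."""
    # sheet_name=None asks pandas for all sheets at once.
    return pd.read_excel(path, sheet_name=None, na_values=["#N/A"])


# Only run the summary if the file is actually present.
if Path("Data_reporting.xlsx").exists():
    sheets = load_sheets()
    for name, frame in sheets.items():
        missing = int(frame.isna().sum().sum())
        print(f"{name}: shape={frame.shape}, missing cells={missing}")
```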
RandomTreeRegressor_Mean_temp is a Python notebook that was used for the data imputation. It contains comments to guide the user. It was designed to process one variable at a time, so to replicate the results, load a single variable and run the notebook.
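The notebook itself is the reference for how the imputation was done. Purely as an illustration of the general idea, the sketch below imputes one monthly series with scikit-learn's `RandomForestRegressor`. Everything specific here is an assumption: the choice of calendar year and month as predictors, the hyperparameters, the helper name `impute_with_random_forest`, and the synthetic temperature series. The notebook may use different features and settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def impute_with_random_forest(series, random_state=0):
    """Fill missing values of a monthly series with random forest predictions.

    Assumption: the predictors are the calendar year and month of each
    observation, taken from the series' DatetimeIndex.
    """
    features = pd.DataFrame(
        {
            "year": np.asarray(series.index.year),
            "month": np.asarray(series.index.month),
        },
        index=series.index,
    )
    observed = series.notna()
    model = RandomForestRegressor(n_estimators=200, random_state=random_state)
    # Fit only on months where the variable was actually recorded.
    model.fit(features[observed], series[observed])
    filled = series.copy()
    if (~observed).any():
        filled[~observed] = model.predict(features[~observed])
    return filled


# Synthetic monthly temperatures (NOT the Cali data) with three gaps.
months = pd.date_range("2005-01-01", periods=120, freq="MS")
rng = np.random.default_rng(1)
seasonal = 1.5 * np.sin(2 * np.pi * np.asarray(months.month) / 12)
temp = pd.Series(24 + seasonal + rng.normal(0, 0.2, size=120), index=months)
temp_missing = temp.copy()
temp_missing.iloc[[5, 40, 77]] = np.nan

temp_filled = impute_with_random_forest(temp_missing)
print(temp_filled.iloc[[5, 40, 77]])
```

Observed values are left untouched; only the missing months receive model predictions, which mirrors the one-variable-at-a-time workflow described above.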
Sharing/Access information
Climatic data was downloaded from the following sources:
Keep in mind that the web page is available only in Spanish.
Monthly dengue incidence data was provided by the Public Health Department of Cali. The climatic data was collected from the Hydrology, Meteorology and Environmental Studies Institute (IDEAM) webpage. Missing data was imputed using a random forest regression model.