Data from: Predicting disease risk areas through co-production of spatial models: the example of Kyasanur Forest Disease in India’s forest landscapes
Data files
Mar 17, 2020 version files 512.30 KB
Abstract
This data package includes spatial environmental and social layers for Shivamogga District, Karnataka, India, that were considered as potential predictors of patterns in human cases of Kyasanur Forest Disease (KFD). KFD is a fatal tick-borne viral haemorrhagic disease of humans that is spreading across degraded forest ecosystems in India. The layers encompass fifteen metrics of topography, land use and land use change, livestock and human population density, and public health resources for Shivamogga District across 1km and 2km study grids. These metrics are spatial proxies for risk factors for KFD that were jointly identified by cross-sectoral stakeholders and researchers through a co-production approach. Shivamogga District is the district longest affected by KFD in south India. The layers are distributed as 1km and 2km GeoTiffs in Albers equal area conic projection. For KFD, spatial models incorporating these layers identified characteristics of forest-plantation landscapes at higher risk for human KFD. These layers will be useful for modelling spatial patterns in other environmentally sensitive infectious diseases and biodiversity within the district.
Methods
Processing of environmental predictors of Kyasanur Forest Disease distribution
This file details the sources and processing of environmental predictors offered to the statistical analysis in the paper. All processing was performed in the raster package [1] of the R program [2] unless otherwise specified, with function names as specified below.
Topography predictors
Elevation data were extracted in tiles from Shuttle Radar Topography Mission data version 4 [3] at an original resolution of 0.000833 degrees latitude and longitude (approximately 90m by 90m grid cells). Tiles were mosaicked across the study region using the merge function. A slope value for each pixel was calculated (in degrees) using the terrain function of the raster package with a focal window of 3 by 3 cells. Both the resulting elevation and slope rasters were cropped to the administrative boundaries of the Shivamogga District (raster package: crop function) and re-projected to an equal area projection (Albers equal area conic projection) using the projectRaster function (method="bilinear"). Mean elevation and slope values were then calculated across the 1km and 2km study grid cells, using the aggregate function to average values across the appropriate number of ~90m grid cells and then the resample function to align the resulting grid to the study grids.
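The averaging step can be illustrated with a minimal sketch in Python (the processing itself was done with the raster package in R). The function name block_mean, the toy elevation values and the aggregation factor of 2 are illustrative assumptions, as is the choice to ignore missing values. The sketch only mimics averaging fine cells within each coarse block and does not reproduce the subsequent resample alignment.

```python
def block_mean(grid, factor):
    """Average a 2-D list of numbers over non-overlapping factor x factor blocks.

    None marks a missing value and is ignored (an assumption of this sketch);
    a block with no valid values yields None. Blocks at the right and bottom
    edges may be smaller than factor x factor.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, factor):
        row_out = []
        for c0 in range(0, n_cols, factor):
            vals = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if grid[r][c] is not None
            ]
            row_out.append(sum(vals) / len(vals) if vals else None)
        out.append(row_out)
    return out


# Hypothetical 4 x 4 elevation grid (metres) aggregated by a factor of 2.
elevation = [
    [600, 610, 700, 710],
    [620, 630, 720, None],
    [500, 500, 400, 400],
    [500, 500, 400, 400],
]
coarse = block_mean(elevation, 2)
print(coarse)
```

For the ~90m cells described above, a factor of about 11 would approximate a 1km cell, which is why a final alignment to the exact study grid is still needed.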
Landscape predictors
Metrics of the current availability (and fragmentation) of forest, agricultural and built-up land use types, as well as of water bodies, were extracted from the MonkeyFeverRisk Land Use Land Cover (LULC) map of Shimoga. The map was produced by classifying earth observation data from 2016 to 2018 using the methods described in the Supplementary Information S3 file of the paper linked to this dataset. The LULC map had an original resolution of 0.000269 degrees latitude and longitude (30m x 28m grid cells) and nine LULC classes. It was cropped to the administrative boundaries of the Shimoga District (raster package: crop function) and re-projected to the equal area projection (Albers equal area conic projection) using the projectRaster function (method="ngb" for categorical data). The agriculture and fallow land classes were combined before landscape analysis because they were difficult to separate accurately in the classification process.
An algorithm was developed in R to identify which of the pixels in the LULC map coincided with each 1km and 2km grid cell of the study area. The ClassStat function of the SDMTools package [4] was used to calculate the proportional area of each 1km or 2km grid cell landscape that was made up of a particular land class, as well as patch density and edge density metrics for the forest classes as indicators of fragmentation and forest-agriculture interface habitat respectively (Fig. S2B). The proportional area values (p_i) of the n different forest classes (wet evergreen forest, moist deciduous forest, dry deciduous forest and plantation) were used to calculate an index of forest type diversity per grid cell as follows, after Shannon & Weaver (1949) [5]:
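As a rough illustration of the pixel-to-grid-cell step and the proportional area calculation, the following Python sketch assigns pixel centres to grid cells and computes class proportions per cell. It is not the authors' R algorithm and does not reproduce ClassStat: the function name class_proportions, the toy coordinates and the grid origin are hypothetical, pixels are assumed to have equal area (so pixel counts stand in for area), and patch density and edge density are not computed.

```python
from collections import Counter, defaultdict


def class_proportions(pixels, xmin, ymax, cell_size):
    """Return {(row, col): {land_class: proportion}} for a regular grid.

    pixels    -- iterable of (x, y, land_class) pixel centres in projected metres
    xmin/ymax -- top-left corner of the study grid
    cell_size -- grid cell width in metres (e.g. 1000 or 2000)
    """
    counts = defaultdict(Counter)
    for x, y, land_class in pixels:
        col = int((x - xmin) // cell_size)  # columns count eastwards from xmin
        row = int((ymax - y) // cell_size)  # rows count southwards from ymax
        counts[(row, col)][land_class] += 1
    return {
        cell: {cls: n / sum(c.values()) for cls, n in c.items()}
        for cell, c in counts.items()
    }


# Hypothetical pixel centres on a 2 x 2 grid of 1km cells.
toy_pixels = [
    (100, 1900, "plantation"),
    (400, 1500, "plantation"),
    (700, 1200, "moist deciduous"),
    (900, 1100, "agriculture"),
    (1500, 500, "water"),
]
proportions = class_proportions(toy_pixels, xmin=0, ymax=2000, cell_size=1000)
print(proportions[(0, 0)])
```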
$$H' = -\sum_{i=1}^{n} p_i \log_n p_i$$
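A small Python sketch of this index follows. The function name forest_type_diversity and the example proportions are hypothetical. The logarithm base is exposed as a parameter and defaults to the number of classes n, which is how the formula reads here; passing math.e gives the conventional natural-log form. Classes with zero proportion are skipped, and the proportions are used as given without renormalising; both are assumptions of the sketch.

```python
import math


def forest_type_diversity(proportions, base=None):
    """Shannon-type diversity index for a list of class proportions.

    base defaults to the number of classes n (assumption, see lead-in).
    Zero proportions contribute nothing, following the convention 0*log(0) = 0.
    """
    n = len(proportions)
    if base is None:
        base = n
    if base <= 1:
        # A logarithm to base 1 (or less) is undefined; a single class has no diversity.
        return 0.0
    return -sum(p * math.log(p, base) for p in proportions if p > 0)


# Four forest classes covering a grid cell in equal shares (hypothetical values).
even_cell = forest_type_diversity([0.25, 0.25, 0.25, 0.25])
# A grid cell containing only one forest class.
single_class_cell = forest_type_diversity([1.0, 0.0, 0.0, 0.0])
print(even_cell, single_class_cell)
```

With base n, the index reaches its maximum when the n classes are present in equal proportions and is zero when only one class is present.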
Metrics of longer term forest changes in Shimoga since 2000 were derived from a global product by Hansen et al. (2013) [6], available at a spatial resolution of 1 arc-second per pixel (~30 metres per pixel at the equator). Forest loss during the period 2000–2014 is defined as a stand-replacement disturbance, or a change from a forest to non-forest state, encoded as either 1 (loss) or 0 (no loss). Forest gain during the period 2000–2012 is defined as a non-forest to forest change entirely within the study period, encoded as either 1 (gain) or 0 (no gain). These layers were again cropped to the administrative boundaries of the Shimoga District (raster package: crop function) and re-projected to an equal area projection (Albers equal area conic projection) using the projectRaster function (method="ngb") in R. An algorithm was developed in R to identify which of the pixels in the loss and gain rasters coincided with each 1km and 2km grid cell of the study area. The ClassStat function of the SDMTools package [4] was used to calculate the proportional area of each 1km or 2km grid cell that was made up of loss pixels or gain pixels. Forest gain and loss are very highly correlated (r=0.986) and occur in similar places in the landscape (Fig. S2C). Forest loss was a much more common transition than forest gain, affecting 1.2% of land pixels compared with 0.16% for forest gain.
To assess how forest loss or gain from a global product like Hansen et al. (2013) should be interpreted locally in south India, we analysed how the loss and gain pixels from Hansen et al. 2013 coincided with classes in the MonkeyFeverRisk LULC map (by extracting the value of the LULC map for the centroids of loss or gain pixels).
The distribution of loss and gain pixels across forest classes from the MonkeyFeverRisk LULC map is shown in Table 1. Locations categorised as a loss by Hansen et al. were most commonly classified currently as plantation, followed by moist evergreen forest, followed by moist or dry deciduous forest by the MonkeyFeverRisk LULC map. The pattern was similar for the gain pixels. Since not all forest loss pixels were non-forest in the current day and not all forest gain pixels were forest in the current day, the precise meaning of the Hansen et al. (2013) forest loss layer was unclear for south India, though we expect that it is at least indicative of areas where the forest has undergone a larger degree of change since 2000.
Table 1: Percentage of loss (n = 108398) and gain (n = 14646) land pixels from the global Hansen et al. (2013) product that fall into different forest classes according to the MonkeyFeverRisk LULC map
Land use class | Gain (%) | Loss (%)
moist evergreen | 30.4 | 26.1
moist deciduous | 6.5 | 16.2
dry deciduous | 3.0 | 9.7
plantation | 46.2 | 37.2
Non-forest classes | 14.0 | 10.9
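The tabulation behind Table 1 amounts to counting the LULC class found at each loss or gain pixel centroid and converting the counts to percentages. A minimal Python sketch is given here, assuming the class labels have already been extracted at the centroids; the function name class_percentages and the ten sample labels are hypothetical and are not the study data.

```python
from collections import Counter


def class_percentages(classes):
    """Percentage of pixels falling in each class, rounded to one decimal place."""
    counts = Counter(classes)
    total = sum(counts.values())
    return {cls: round(100 * n / total, 1) for cls, n in counts.items()}


# Hypothetical LULC labels sampled at the centroids of ten loss pixels.
sampled_classes = (
    ["plantation"] * 4
    + ["moist evergreen"] * 3
    + ["moist deciduous"] * 2
    + ["non-forest"]
)
percentages = class_percentages(sampled_classes)
print(percentages)
```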
Host and public health predictors
Livestock host density data, namely buffalo and indigenous cattle numbers in units of total head per village, were obtained at village level from the 2011 census of the Department of Animal Husbandry, Dairying and Fisheries, Government of India. These were linked to village boundaries from the Survey of India using the village census codes in R. The village areas were calculated from the spatial polygons dataframe of villages using the rgeos package [7] in R, so that the total head per village metrics could be converted into an areal density of buffalo and indigenous cattle per km² and then rasterized at 1km and 2km using the rasterize function of the raster package.
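The conversion from total head per village to an areal density can be sketched in Python as follows. The village codes, head counts and areas are made up for illustration, and the function name areal_density is hypothetical; villages without a matching boundary are simply skipped in this sketch, which may differ from how the original R workflow handled them.

```python
def areal_density(total_head, area_km2):
    """Animals per square kilometre for one village."""
    if area_km2 <= 0:
        raise ValueError("village area must be positive")
    return total_head / area_km2


# Hypothetical census counts and polygon areas, keyed by village census code.
buffalo_head = {"V001": 300, "V002": 45, "V003": 120}
village_area_km2 = {"V001": 12.0, "V002": 4.5}  # V003 has no matching boundary

buffalo_density = {
    code: areal_density(head, village_area_km2[code])
    for code, head in buffalo_head.items()
    if code in village_area_km2
}
print(buffalo_density)
```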
The human population size and public health metrics were obtained from the Government of India Population Census 2011. The human population size (census field TOT_P) was again linked to the spatial polygon village boundaries using the census village code (census field VCT_2011), converted to an areal metric of population density per km² and rasterized at 1km and 2km as above. The number of medics per head of population was derived by summing all doctors and para-medicals “in position” across all types of health centres, clinics and dispensaries per village and dividing by the total population of the village (TOT_P). It was then linked to village boundaries and rasterized as above. The proximity to health centres was a categorical variable derived from the “Primary.Health.Centre..Numbers” field, where 1 = Primary Health Centre (PHC) within village boundary, 2 = PHC within 5km of village, 3 = PHC within 5-10km of village, 4 = PHC further than 10km from village. It was linked to village boundaries and rasterized as above.
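The two public health metrics can be illustrated with a short Python sketch. The category labels are taken from the description above; the function name medics_per_head, the staff counts and the population figure are hypothetical, and returning None for a village with no recorded population is an assumption of the sketch rather than a documented rule.

```python
# Category codes for proximity to a Primary Health Centre (PHC), as described above.
PHC_PROXIMITY = {
    1: "PHC within village boundary",
    2: "PHC within 5km of village",
    3: "PHC within 5-10km of village",
    4: "PHC further than 10km from village",
}


def medics_per_head(staff_in_position, total_population):
    """Doctors plus para-medicals in position, divided by village population.

    staff_in_position -- counts per facility type (health centres, clinics,
                         dispensaries), summed here
    total_population  -- census field TOT_P for the village
    """
    if total_population <= 0:
        return None  # assumption: undefined for villages with no recorded population
    return sum(staff_in_position) / total_population


# Hypothetical village: 2 + 3 + 0 staff across three facility types, 2500 residents.
example_ratio = medics_per_head([2, 3, 0], 2500)
print(example_ratio, PHC_PROXIMITY[3])
```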
The resulting raster layers for all predictors were saved in GeoTiff format.
References
1. Hijmans, R. J. (2017). raster: Geographic Data Analysis and Modeling. R package version 2.6-7. https://CRAN.R-project.org/package=raster
2. R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
3. Jarvis, A., Reuter, I., Nelson, A., Guevara, E. (2008). Hole-filled SRTM for the globe Version 4.
4. VanDerWal, J., Falconi, L., Januchowski, S., Shoo, L., & Storlie, C. (2014). SDMTools: Species Distribution Modelling Tools: Tools for processing data associated with species distribution modelling exercises. R package version 1.1-221. https://CRAN.R-project.org/package=SDMTools
5. Shannon, C. E., & Weaver, W. (1949). The Mathematical Theory of Communication. Urbana: University of Illinois Press.
6. Hansen, M. C., P. V. Potapov, R. Moore, M. Hancher, S. A. Turubanova, A. Tyukavina, D. Thau, S. V. Stehman, S. J. Goetz, T. R. Loveland, A. Kommareddy, A. Egorov, L. Chini, C. O. Justice, and J. R. G. Townshend (2013). “High-Resolution Global Maps of 21st-Century Forest Cover Change.” Science 342 (15 November): 850–53. Data available online from: http://earthenginepartners.appspot.com/science-2013-global-forest, accessed November 2017.
7. Bivand, R. & Rundel, C. (2018). rgeos: Interface to Geometry Engine - Open Source ('GEOS'). R package version 0.3-28. https://CRAN.R-project.org/package=rgeos
Usage notes
The sixteen layers (15 environmental and social layers and 1 land sea mask) are provided at a 1km resolution within the 1km_layers.zip folder.
The sixteen layers (15 environmental and social layers and 1 land sea mask) are provided at a 2km resolution within the 2km_layers.zip folder.
Guides to the file names, types, descriptions and units of the layers are provided within these folders, named "Guide_to_layers_1km.xlsx" and "Guide_to_layers_2km.xlsx" respectively.
The projection of the layers is Albers equal area conic projection, for which the proj4 definition is: "+proj=aea +lat_1=7 +lat_2=-32 +lat_0=-15 +lon_0=125 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +towgs84=0,0,0".
The extents of the 1km and 2km grids are as follows:
2km grid:
class : Extent
xmin : -5674039
xmax : -5548039
ymin : 2734116
ymax : 2858116
1km grid:
class : Extent
xmin : -5674039
xmax : -5549039
ymin : 2734116
ymax : 2858116
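As a quick check on these extents, the number of rows and columns in each grid follows from dividing the extent width and height by the cell size. The Python sketch below does this arithmetic; the function name grid_shape is hypothetical, and the cell sizes of 1000m and 2000m are taken from the 1km and 2km grid descriptions (the projection units are metres).

```python
def grid_shape(xmin, xmax, ymin, ymax, cell_size):
    """Return (n_rows, n_cols) for a regular grid covering the given extent."""
    n_cols, x_rem = divmod(xmax - xmin, cell_size)
    n_rows, y_rem = divmod(ymax - ymin, cell_size)
    if x_rem or y_rem:
        raise ValueError("extent is not a whole number of grid cells")
    return n_rows, n_cols


shape_2km = grid_shape(-5674039, -5548039, 2734116, 2858116, 2000)
shape_1km = grid_shape(-5674039, -5549039, 2734116, 2858116, 1000)
print(shape_2km)  # (62, 63)
print(shape_1km)  # (124, 125)
```

Both grids share the same origin and height; the 2km grid extends 1km further east so that its width is a whole number of 2km cells.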