Standardized incidence ratio dataset of human West Nile Virus in Italy (2012-2024)
Data files (version of May 26, 2025; 1.52 MB in total)
- DryadWNV.zip (1.51 MB)
- README.md (5.98 KB)
Abstract
West Nile virus (WNV) is primarily transmitted by Culex pipiens mosquitoes feeding on infected birds, while humans and other mammals serve as dead-end hosts. Although first detected in horses in Italy in 1998, human cases only emerged after the implementation of a multispecies surveillance program in 2001, with a major outbreak occurring in 2008 in northern regions.
In this dataset, we computed the Standardized Incidence Ratio (SIR) for human West Nile virus (WNV) cases at the provincial level in Italy over the period 2012–2024.
The data are provided in a CSV file in which each row corresponds to a province that reported at least one confirmed case during the study period, and each yearly column contains that province's SIR for the given year. Provinces with no reported cases in a given year are marked as NA. For each province, we also calculated summary statistics of the available SIR values: the temporal gradient, mean, standard deviation, maximum, and minimum.
Dataset DOI: 10.5061/dryad.95x69p8x3
We computed the Standardized Incidence Ratio (SIR) of human West Nile Virus (WNV) cases at the Italian provincial level for the period 2012-2024.
The SIR is an epidemiological metric that compares the observed number of cases in a study population with the number expected on the basis of a reference population. It accounts for differences in population size and age structure, and is calculated as the ratio between observed and expected cases. An SIR greater than 1 indicates a higher-than-expected incidence, while a value below 1 indicates a lower-than-expected incidence.
Because WNV is absent or unrecorded in several provinces, we used as the reference population the set of all provinces that recorded at least one positive case, to avoid very large variations in the SIR.
Computation of Standardized Incidence Ratio (SIR) of West Nile Virus (WNV) on humans, at Italian provincial level (2012-2024).
SIR = Oₚ / Eₚ
where:
- Oₚ is the observed number of positive cases in the study population;
- Eₚ is the expected number of cases, estimated using age-specific incidence rates from a reference population.
Calculation of Eₚ:
Eₚ = ∑ (Rᵢʳ * nᵢ), for i = 1, ..., k
where:
- Rᵢʳ is the age-specific incidence rate in the reference population for age group i;
- nᵢ is the size of the study population in age group i;
- k is the total number of age groups considered.
The age-specific incidence rate Rᵢʳ is computed as:
Rᵢʳ = Pᵢ / Nᵢ
where:
- Pᵢ is the number of observed positive cases in age group i in the reference population;
- Nᵢ is the population size in age group i in the reference population.
Commonly used age groups (may vary depending on available data):
- 0–4 years
- 5–14 years
- 15–24 years
- 25–44 years
- 45–64 years
- 65–74 years
- 75+ years
Interpretation:
- SIR > 1: Higher-than-expected incidence in the study population;
- SIR < 1: Lower-than-expected incidence;
- SIR = 1: Observed incidence matches expected levels.
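The three formulas can be checked with a small worked example. The Python sketch below applies them to made-up counts for three hypothetical age groups (not the groups or data behind this dataset, which was produced with the authors' R code): the reference rates give Eₚ = 8 expected cases, so a province with Oₚ = 12 observed cases has an SIR of 1.5, i.e. 50% more cases than expected.

```python
# Sketch of the SIR formulas above (indirect standardization).
# The age-group labels and all numbers below are hypothetical.

def age_specific_rates(ref_cases, ref_population):
    """R_i^r = P_i / N_i for every age group i of the reference population."""
    return {g: ref_cases.get(g, 0) / ref_population[g] for g in ref_population}


def expected_cases(rates, study_population):
    """E_p = sum over i of R_i^r * n_i."""
    return sum(rate * study_population.get(g, 0) for g, rate in rates.items())


def sir(observed, rates, study_population):
    """SIR = O_p / E_p; None when no cases are expected (SIR undefined)."""
    expected = expected_cases(rates, study_population)
    return observed / expected if expected > 0 else None


# Hypothetical reference population (provinces with at least one case).
ref_cases = {"0-44": 10, "45-64": 20, "65+": 70}
ref_population = {"0-44": 1_000_000, "45-64": 500_000, "65+": 350_000}
# Hypothetical study province.
study_population = {"0-44": 100_000, "45-64": 50_000, "65+": 25_000}

rates = age_specific_rates(ref_cases, ref_population)
print(round(expected_cases(rates, study_population), 3))  # 8.0
print(round(sir(12, rates, study_population), 3))  # 1.5
```

Returning None when Eₚ is zero is a choice made for this sketch; the README does not say how such cases were handled.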
File: DryadWNV.zip
The archive "DryadWNV.zip" contains two items: a folder named "Auxiliary_data", which holds all the input data used to compute the SIR values, and the final dataset in .csv format, named "sir_tot.csv".
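The final table can be read without extracting the archive. The Python sketch below assumes that "sir_tot.csv" is comma-separated, UTF-8 encoded, and uses "NA" for missing values (as described in the Abstract); because the folder prefix of the file inside the archive is not documented, it matches the member by its base name. The function name load_sir_table is hypothetical.

```python
import csv
import io
import zipfile


def load_sir_table(zip_file, member_name="sir_tot.csv"):
    """Read sir_tot.csv directly from the archive, turning 'NA' cells into None.

    zip_file may be a path or a binary file object. The member is matched by its
    base name because its folder prefix inside the archive is an assumption.
    """
    with zipfile.ZipFile(zip_file) as archive:
        matches = [n for n in archive.namelist() if n.rsplit("/", 1)[-1] == member_name]
        if not matches:
            raise FileNotFoundError(f"{member_name} not found in archive")
        with archive.open(matches[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return [
                {key: (None if value == "NA" else value) for key, value in row.items()}
                for row in csv.DictReader(text)
            ]


# Usage (the archive path is an assumption about where it was downloaded):
# rows = load_sir_table("DryadWNV.zip")
# print(rows[0]["Province"])
```

Values other than NA are returned as strings; convert them with float() where numbers are needed.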
File: sir_tot.csv
This file contains the SIR values computed from the auxiliary data with the R code described in the Code/software section.
- Province: Name of the province, as reported on the ISTAT website. Only provinces that recorded at least one confirmed case during the study period are included.
- sir_20**: SIR value for the province in the given year (one column per year, 2012-2024); NA if no cases were reported that year.
- Gradient: Slope of the linear trend line fitted to all available annual SIR values of the province.
- Mean: Mean of the available SIR values of the province across all years.
- Sd: Standard deviation of the available SIR values of the province across all years.
- Max: Maximum of the available SIR values of the province across all years.
- Min: Minimum of the available SIR values of the province across all years.
Folder: Auxiliary_data
This folder contains a file called "Ni.xlsx" and thirteen .csv files named "wn-ita-provinces-human-surveillance-20**.csv", one for each year from 2012 to 2024.
File: Ni.xlsx
This file contains the resident population of each province by single year of age, for every year considered. The file is organized in sheets, one per year. The data for each year are available on the ISTAT (Istituto Nazionale di Statistica) website [1].
- Province: Name of each Italian province, as reported on the ISTAT website.
- *-year: Population of the given age, from 0 to 100+ years (one column per age).
- total: Total resident population of the province.
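To obtain the nᵢ and Nᵢ terms of the SIR formula, the single-year columns have to be collapsed into age groups. The Python sketch below assumes that one province's counts for one year have already been read from its sheet (for example with an Excel reader such as openpyxl, not shown) and mapped to integer ages, with the 100+ column stored under 100. It uses the "commonly used" age groups listed in the SIR section; whether these exact groups were used for this dataset is an assumption. Summing the grouped totals over the reference provinces gives Nᵢ.

```python
# Age groups from the "commonly used" list in the SIR section, as
# (label, youngest age, oldest age); None marks the open-ended last group.
# Whether these exact groups were used for this dataset is an assumption.
AGE_GROUPS = [
    ("0-4", 0, 4), ("5-14", 5, 14), ("15-24", 15, 24), ("25-44", 25, 44),
    ("45-64", 45, 64), ("65-74", 65, 74), ("75+", 75, None),
]


def group_population(pop_by_age):
    """Collapse single-year counts {age: residents} into the age groups n_i.

    The 100+ column is expected under the key 100; it falls in the open-ended group.
    """
    totals = {label: 0 for label, _, _ in AGE_GROUPS}
    for age, count in pop_by_age.items():
        for label, low, high in AGE_GROUPS:
            if age >= low and (high is None or age <= high):
                totals[label] += count
                break
    return totals


# Hypothetical province with exactly one resident at every age from 0 to 100+.
grouped = group_population({age: 1 for age in range(101)})
print(grouped)
# {'0-4': 5, '5-14': 10, '15-24': 10, '25-44': 20, '45-64': 20, '65-74': 10, '75+': 26}
```

The group boundaries must match the age groups reported in the surveillance files, otherwise Pᵢ and Nᵢ refer to different populations.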
Files: wn-ita-provinces-human-surveillance-20**.csv
These files contain the WNV case data published by the ISS (Istituto Superiore di Sanità), compiled by Mingione et al. [3] and available on GitHub [2]. All files share the same columns:
- url_bulletins: Web link that points to the specific bulletin published by ISS.
- data: Date of the latest bulletin of that year, used as reference.
- code_region: ID code for each Italian region, compliant with ISTAT system.
- name_region: Full name of each Italian region.
- code_province: ID code for each Italian province, compliant with ISTAT system.
- name_province: Full name of each Italian province.
- abbreviation_province: Abbreviation of each Italian province, compliant with ISTAT system.
- lat: Centroid latitude of the province.
- long: Centroid longitude of the province.
- age: Age group of the infected people.
- new_cases: New cases recorded for that specific province and specific year.
- total_cases: Cumulative number of positive cases for that specific province.
- type_infection: Type of infection, either neuroinvasive disease or fever.
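The observed counts needed for the SIR can be tallied from one yearly file as sketched below in Python. The sketch assumes that each row holds the integer count of one province/age/type_infection combination, so summing new_cases over rows gives yearly totals: per province this is Oₚ, and per age group across all provinces it is Pᵢ (provinces without cases contribute nothing, so this equals the total over the reference population). The function name and the file path in the usage comment are hypothetical.

```python
import csv
from collections import Counter


def observed_cases(lines):
    """Total new_cases per province (O_p) and per age group from one yearly file.

    lines is an open surveillance CSV file or any iterable of CSV lines. Assumes each
    row counts the cases of one province/age/type_infection combination, so summing
    new_cases over rows gives the yearly totals; blank counts are treated as 0.
    """
    by_province, by_age = Counter(), Counter()
    for row in csv.DictReader(lines):
        n = int(row["new_cases"] or 0)
        by_province[row["name_province"]] += n
        by_age[row["age"]] += n
    return by_province, by_age


# Usage (hypothetical path inside the extracted archive):
# with open("Auxiliary_data/wn-ita-provinces-human-surveillance-2018.csv",
#           newline="", encoding="utf-8") as f:
#     by_province, by_age = observed_cases(f)
```

Accepting any iterable of lines rather than a path keeps the function easy to test with in-memory data.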
Code/software
The analysis was performed in R (version 4.3.1) using the packages "readxl" and "dplyr".
First, a function called calculate_sir is defined; it takes as input the paths to the auxiliary data stored in the Dryad archive. The function is run for each year, and the results are merged into a single data frame. Summary statistics (gradient, mean, standard deviation, minimum, and maximum) are then computed and appended to the data frame. Finally, the dataset is saved in CSV format.
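The merging step can be sketched in Python as follows; this is not a translation of the authors' R code, which is not reproduced in this README. The sketch assumes that each yearly run returns a mapping from province name to SIR, and builds one wide table in which provinces absent from a year get "NA". The helper name write_sir_table and the input shape are hypothetical, and the summary columns are left out.

```python
import csv
import io


def write_sir_table(sir_by_year, out):
    """Merge {year: {province: SIR}} into one wide table and write it as CSV to out.

    Provinces absent from a year get "NA", as in sir_tot.csv; columns follow the
    sir_20** pattern. The summary columns (Gradient, Mean, ...) are not added here.
    """
    years = sorted(sir_by_year)
    provinces = sorted({p for per_year in sir_by_year.values() for p in per_year})
    fieldnames = ["Province"] + [f"sir_{year}" for year in years]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for province in provinces:
        row = {"Province": province}
        for year in years:
            value = sir_by_year[year].get(province)
            row[f"sir_{year}"] = "NA" if value is None else value
        writer.writerow(row)


# Hypothetical per-year results for two provinces.
buf = io.StringIO()
write_sir_table({2012: {"Alpha": 1.5}, 2013: {"Alpha": 0.8, "Beta": 2.0}}, buf)
print(buf.getvalue())
# Province,sir_2012,sir_2013
# Alpha,1.5,0.8
# Beta,NA,2.0
```

Writing to any text stream (a StringIO here, an open file in practice) keeps the merge logic separate from file handling.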
Data were derived from the following sources:
[1] ISTAT data: http://dati.istat.it/
[2] Auxiliary data: https://github.com/fbranda/west-nile/tree/main
References:
[3] Mingione, Marco, et al. "Monitoring the West Nile virus outbreaks in Italy using open access data." Scientific Data 10.1 (2023): 777.
