Data from: Early spread of COVID-19 in Romania


Hancean, Marian-Gabriel; Perc, Matjaz; Lerner, Juergen (2020), Data from: Early spread of COVID-19 in Romania, Dryad, Dataset,


This individual-level dataset describes (a) the early spread of the novel coronavirus (COVID-19) and (b) the first human-to-human transmission networks in Romania. Specifically, the first set of data (a) profiles the first 147 cases by: whether an individual is an index case, place of residence, sex, age, probable citizenship, probable country and place of infection, date of arrival in a Romanian county, COVID-19 confirmation date, and the sources of information. The second set of data (b) contains the first observed human-to-human COVID-19 transmission networks (attributes of the nodes and the direction of COVID-19 transmission, i.e. who infects whom). The networks comprise 159 nodes and 203 transmission ties. Indirect identifiers are masked / de-identified.


The dataset was collected from the Romanian Ministry of Health communiqués. Data from official statements were supplemented with information reported by Romanian local media. This strategy was deemed to improve the quality and accuracy of the information communicated by Romanian officials. Every reported individual case (microdata) is linked to online public sources that can be consulted for further details. The level of data granularity prevents any disclosure or tracking of infected persons. Additionally, indirect identifiers are masked / de-identified. The first confirmed COVID-19 case in the dataset dates from February 22, 2020, and the last from April 2, 2020.

We employed the following case selection method. First, we selected the first patients (index cases) for each Romanian county. We then selected all publicly available individual cases officially reported on the territory of Romania. We stopped data collection when the Romanian authorities restricted public access to information on COVID-19 infected patients. Human-to-human transmission networks were built by scanning the available official data for infection chains (from February 22, 2020 to March 20, 2020). The process was driven by the condition that both the source and the target of a chain are officially confirmed COVID-19 cases.
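The condition on infection chains can be illustrated with a minimal Python sketch. It is not the authors' procedure, only an illustration of the stated rule that a tie is kept when both its source and its target are confirmed cases. The function name `keep_confirmed_ties` and the case labels are hypothetical and do not come from the dataset.

```python
def keep_confirmed_ties(ties, confirmed):
    """Keep only (source, target) ties whose two endpoints are both confirmed cases."""
    confirmed = set(confirmed)
    return [(s, t) for s, t in ties if s in confirmed and t in confirmed]


# Toy data (hypothetical labels, not taken from the dataset).
toy_confirmed = {"case_1", "case_2", "case_3"}
toy_ties = [
    ("case_1", "case_2"),  # both endpoints confirmed: kept
    ("case_2", "case_9"),  # target not confirmed: dropped
    ("case_3", "case_1"),  # both endpoints confirmed: kept
]

kept = keep_confirmed_ties(toy_ties, toy_confirmed)
print(kept)
```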

The dataset is made up of three sets of information: (i) attributes of the first 147 COVID-19 confirmed cases in Romania; (ii) attributes of the nodes embedded in the COVID-19 human-to-human transmission networks; (iii) transmission ties (arrows) illustrating who infects whom. The first two sets of information are in a rectangular format (case by variable), while the third set of information is in a square matrix format (a binary adjacency matrix of 159 by 159 nodes). A description of the variables necessary for a potential re-use of the whole dataset is available as a ReadMe file (see "HanceanMG PercM LernerJ Early spread of COVID19 in Romania_Readme.txt") or in the "COVID-19_ROMANIA_variabile_description.xlsx" file. We did not apply any missing data imputation technique in the dataset.
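For re-use, a binary adjacency matrix of this kind can be turned into a list of directed ties. The Python sketch below uses a toy 4 by 4 matrix rather than the actual data. It assumes that rows are sources and columns are targets (a 1 in row i, column j meaning "i infects j"); this orientation is an assumption and should be checked against the ReadMe file. The function name `adjacency_to_edges` and the node labels are hypothetical.

```python
def adjacency_to_edges(matrix, labels):
    """Return (source, target) pairs for every 1 in a binary adjacency matrix.

    Assumption: rows are sources and columns are targets.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("adjacency matrix must be square")
    if len(labels) != n:
        raise ValueError("one label is required per node")
    return [
        (labels[i], labels[j])
        for i in range(n)
        for j in range(n)
        if matrix[i][j] == 1
    ]


# Toy network (hypothetical, not taken from the dataset).
toy_labels = ["A", "B", "C", "D"]
toy_matrix = [
    [0, 1, 1, 0],  # A infects B and C
    [0, 0, 0, 1],  # B infects D
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

edges = adjacency_to_edges(toy_matrix, toy_labels)
# Row sums give the number of people each node infected (out-degree).
out_degree = {label: sum(row) for label, row in zip(toy_labels, toy_matrix)}
print(edges)
print(out_degree)
```

Under the same assumption, the number of entries returned for the full 159 by 159 matrix should equal the number of transmission ties reported for the networks.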


Unitatea Executiva pentru Finantarea Invatamantului Superior, a Cercetarii, Dezvoltarii si Inovarii, Award: PN-III-P1-1.1-TE-2016-0362

Javna Agencija za Raziskovalno Dejavnost RS, Award: J4-9302

Javna Agencija za Raziskovalno Dejavnost RS, Award: J1-9112

Javna Agencija za Raziskovalno Dejavnost RS, Award: P1-0403

Deutsche Forschungsgemeinschaft, Award: LE 2237/2-1