Data from: Gaps and spatial trends in the accurate data available on mosquitoes (Diptera, Culicidae) in Brazil: Inventory completeness and priority areas
Data files (version of Mar 24, 2025; 22.21 MB in total):
- dataset_submission.zip (22.20 MB)
- README.md (13.83 KB)
Abstract
Gaps and trends in species distribution knowledge can negatively influence biodiversity studies, emphasizing the need to map these limitations and assess inventory completeness. This study analyzed spatial inventories of Culicidae, insects of high medical relevance, to identify priority research areas in Brazil. Records from 1900–2021 were collected from digital databases and the literature, excluding those without a scientific name, coordinates, or sampling year. Sampling effort and inventory completeness were assessed across ecoregions, states, and grid cells at 0.5° and 1° resolution. The metrics analyzed included record counts, the ratio of observed to expected richness expressed as a percentage (completeness index, Cc), and the slope of the accumulation curve (Cs). Units were classified as “well-surveyed” based on different thresholds, and priority zones were defined based on the last quartile of cells with the greatest distance and climatic uniqueness. A total of 9,899 records from 22 scientific collections and 356 articles highlight comprehensive datasets for the Southeast states and Amazonas, with limited data in the Northeast region. The Atlantic Forest contained the most complete information, yet well-surveyed areas covered less than 1% of Brazil. This scenario shows that Brazilian Culicidae inventories are still incomplete, as reflected by low spatial representativeness and sampling biases toward vector species, roads, and urban areas. Filling these gaps with new sampling designs will enhance predictions of epidemiological risks and Culicidae species loss, especially in the Acre, Pará, West-Amazon, Northeast-Atlantic, Brazilian Diagonal, and Araucaria-Pampean zones.
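The completeness metrics summarized above can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: it assumes expected richness (Sexp) is estimated with the bias-corrected Chao1 estimator and takes the final slope (Cs) as the last step of the individual-based rarefaction curve, which equals the number of singletons divided by the number of records. The well-surveyed thresholds (`min_cc`, `max_cs`, `min_ratio`) are hypothetical placeholders.

```python
from collections import Counter


def completeness_metrics(species_records):
    """N, Sobs, Sexp, Cc and Cs for one spatial unit (state, ecoregion or cell).

    species_records: one species name per occurrence record in the unit.
    Assumptions: Sexp = bias-corrected Chao1; Cs = final step of the
    individual-based rarefaction curve (singletons / N).
    """
    n = len(species_records)
    if n == 0:
        return {"N": 0, "Sobs": 0, "Sexp": 0.0, "Cc": float("nan"), "Cs": float("nan")}
    counts = Counter(species_records)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in counts.values() if c == 2)  # doubletons
    s_exp = s_obs + ((n - 1) / n) * f1 * (f1 - 1) / (2 * (f2 + 1))
    return {
        "N": n,
        "Sobs": s_obs,
        "Sexp": s_exp,
        "Cc": 100 * s_obs / s_exp,  # completeness index (%)
        "Cs": f1 / n,  # expected new species per additional record
    }


def is_well_surveyed(m, min_cc=80.0, max_cs=0.05, min_ratio=3.0):
    """Hypothetical thresholds; the study tested several different ones."""
    if m["N"] == 0:
        return False
    return m["Cc"] >= min_cc and m["Cs"] <= max_cs and m["N"] / m["Sobs"] >= min_ratio
```

For example, a unit with 20 records split evenly between two species has no singletons, so `completeness_metrics(["A"] * 10 + ["B"] * 10)` gives Cc = 100 and Cs = 0, and `is_well_surveyed` returns True under these placeholder thresholds.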
https://doi.org/10.5061/dryad.gb5mkkx0t
Description of the data and file structure:
Species occurrence records for Brazil from 1900 to 2021 were obtained from GBIF, speciesLink, and the literature. They were evaluated across Brazilian states, ecoregions, and grid cells (1° and 0.5°) to assess inventory completeness and sampling biases and to define priority areas for research. The data and code are organized into five folders: one for each of the four main assessments (obtaining and processing occurrence data; sampling effort and inventory completeness; sampling bias; and priority areas characterization) and one with supplementary material presenting the completeness values spatially (shapefiles).
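Assigning each record to a 1° or 0.5° grid cell amounts to dividing its coordinates by the cell size and rounding down. The Python sketch below illustrates this on a hypothetical row-major grid over an assumed bounding box for mainland Brazil; the real identifiers (new_id_G1, G05_new_id) in DAK_analyses.csv come from the authors' own GIS workflow and will not match these numbers.

```python
import math

# Hypothetical extent roughly covering mainland Brazil (decimal degrees).
WEST, EAST, SOUTH, NORTH = -74.0, -34.0, -34.0, 6.0


def grid_cell_id(lon, lat, res):
    """Row-major index of the res-degree cell containing (lon, lat).

    Rows are counted from the northern edge, columns from the western edge.
    This numbering is illustrative only.
    """
    if not (WEST <= lon < EAST and SOUTH < lat <= NORTH):
        raise ValueError("point outside the grid extent")
    n_cols = round((EAST - WEST) / res)
    col = math.floor((lon - WEST) / res)
    row = math.floor((NORTH - lat) / res)
    return row * n_cols + col


def cell_centroid(cell_id, res):
    """Centroid (lon, lat) of a cell, similar in spirit to long_arr/lat_arr."""
    n_cols = round((EAST - WEST) / res)
    row, col = divmod(cell_id, n_cols)
    return WEST + (col + 0.5) * res, NORTH - (row + 0.5) * res
```

For a record near Manaus at (-60.02, -3.1), `grid_cell_id(-60.02, -3.1, 1.0)` returns 373 and `grid_cell_id(-60.02, -3.1, 0.5)` returns 1467 on this hypothetical grid.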
Files and variables
File: dataset_submission.zip
Description: Culicidae_Brazil_completeness-results
Note: missing values are coded as “NA”.
File | Description |
---|---|
Primary biodiversity data (DAK) related to Culicidae in Brazil (Material and Methods: Obtaining and processing occurrence data) | |
data_cleaning_dak_analyses_code.txt | Script for data cleaning and DAK analyses (RStudio). The occurrence dataset used (Culicidae_Brazil_occurrence_.zip) is available in the Supplementary Material hosted on Zenodo. |
Completeness analysis (Material and Methods: Sampling effort and inventory completeness) | |
completeness_analyses_code.txt | Script for the sampling effort and inventory completeness analyses (R) |
DAK_analyses.csv | Final Culicidae dataset from Brazil, with each record assigned to a state (stateSIGL), ecoregion (IDProv), 1° grid cell (new_id_G1), and 0.5° grid cell (G05_new_id). |
Bias - linegraph and correlation test (Material and Methods: Sampling bias) | |
sampling_bias_code.txt | Script for the sampling bias analysis (RStudio) |
Bias_culicidae_analyses.csv | Final Culicidae dataset from Brazil with distances (m), measured in QGIS, between the records and roads, waterways, cities, and natural vegetation features: hid_id: waterway id; hid_dist: distance to waterways; rod_id: road id; rod_dist: distance to roads; city_dist: distance to the nearest urban area; veg_type: vegetation type (6: forest, 10: grassland); vg_dist_cr: distance to the nearest vegetation type |
kernel_dens_state.csv | Dataset with density values for records, access routes, and urban and natural vegetation areas, estimated through kernel interpolation in QGIS for Brazilian states: Id: 0.5° grid cell id; dens_occ: record density; Dens_rod: road density; Dens_hid: waterway density; Dens_urb: urban density; Dens_veg: natural vegetation density; State: state abbreviation |
kernel_dens_ecoreg.csv | Dataset with density values for records, access routes, and urban and natural vegetation areas, estimated through kernel interpolation in QGIS for Brazilian ecoregions: Id: 0.5° grid cell id; dens_occ: record density; Dens_rod: road density; Dens_hid: waterway density; Dens_urb: urban density; Dens_veg: natural vegetation density; ecoregion: ecoregion abbreviation |
kernel_dens_categ_reg_eco.csv | Dataset with the number and percentage of 0.5° grid cells in each of five kernel density categories (defined by Jenks’ natural breaks) for each state and ecoregion: Area: spatial unit (ecoregion or state); Region: Brazilian region; Unit: state or ecoregion abbreviation; Dens_cat: density category; N: number of 0.5° grid cells; Dens_perc: percentage of 0.5° grid cells in the unit |
Climatic coverage and priority areas (Material and Methods: Priority areas characterization) | |
climatic_coverage_priority_areas_code.txt | Script for the climatic coverage and priority areas analyses (RStudio) |
BR_UF_2020.shp | Shapefile used to obtain the extension of the climate raster files for the Brazilian territory |
grid_05_kolmogorovtest_new.csv | Dataset with georeferenced centroids of well-surveyed (known), sampled (sampled), and unknown (unknown) 0.5° grid cells: long_arr: centroid longitude; lat_arr: centroid latitude; grid_05_id: 0.5° grid cell id; knowledge: cell type |
result_05_knn.csv | Results for the Euclidean climatic distance between well-surveyed cells and all other cells: Id: cell id; long_arr: centroid longitude; lat_arr: centroid latitude; new_id: 0.5° grid cell id; dist: climatic distance; which: closest well-surveyed cell(s) |
Supplementary material | |
dak_estados_br_completeness.shp | Results of the completeness analysis for Brazilian states: N: number of occurrences; Sobs: observed number of species; Sexp: expected number of species; Cc: observed Culicidae species as a percentage of the expected number (Sobs/Sexp × 100); Cs: final slope of the species accumulation curve; N/Sobs: ratio between the number of records and observed species equal to one |
ecoregions_brasil_completeness.shp | Results of the completeness analysis for Brazilian ecoregions: N: number of occurrences; Sobs: observed number of species; Sexp: expected number of species; Cc: observed Culicidae species as a percentage of the expected number (Sobs/Sexp × 100); Cs: final slope of the species accumulation curve; N/Sobs: ratio between the number of records and observed species equal to one |
grid_1_completeness.shp | Results of the completeness analysis for 1° grid cells: N: number of occurrences; Sobs: observed number of species; Sexp: expected number of species; Cc: observed Culicidae species as a percentage of the expected number (Sobs/Sexp × 100); Cs: final slope of the species accumulation curve; N/Sobs: ratio between the number of records and observed species equal to one |
Supplemental Information: the Culicidae_Brazil_occurrence_ folder (Material and Methods: Obtaining and processing occurrence data) is available on Zenodo.
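The priority-area step behind result_05_knn.csv pairs each cell with its climatically closest well-surveyed cell. Below is a minimal Python sketch of that idea, assuming z-score standardized climatic variables, a single nearest neighbour, and that the “last quartile” means distances at or above the third quartile. The variables, standardization, and quartile method used in climatic_coverage_priority_areas_code.txt may differ.

```python
import math
import statistics


def standardize(rows):
    """Z-score each climatic variable (column) across all cells."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def nearest_known_distance(cells, known_ids):
    """cells: dict id -> climatic values; known_ids: well-surveyed cell ids.

    Returns id -> (distance, closest well-surveyed id) for every other cell,
    similar in spirit to the dist and which columns of result_05_knn.csv.
    """
    ids = list(cells)
    z = dict(zip(ids, standardize([cells[i] for i in ids])))
    result = {}
    for i in ids:
        if i in known_ids:
            continue
        best = min(known_ids, key=lambda k: math.dist(z[i], z[k]))
        result[i] = (math.dist(z[i], z[best]), best)
    return result


def priority_cells(distances):
    """Cells whose climatic distance lies in the upper (last) quartile."""
    values = [d for d, _ in distances.values()]
    q3 = statistics.quantiles(values, n=4, method="inclusive")[2]
    return {i for i, (d, _) in distances.items() if d >= q3}
```

Standardizing first keeps variables with large numeric ranges (for example, precipitation in mm) from dominating the Euclidean distance over temperature in °C.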
Code/software
All analyses were performed in R v. 3.6.2 and QGIS v. 3.28.8.
We compiled georeferenced Culicidae records for Brazil from 1900 to 2021 from the GBIF and speciesLink repositories and from published articles. These data were evaluated across Brazilian states, ecoregions, and grid cells (1° and 0.5°) to assess inventory completeness and sampling biases and to define priority areas for research.
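The record-filtering rule described in the abstract (drop records without a scientific name, coordinates, or sampling year, and keep 1900 to 2021) could look like this in Python. The Darwin Core-style column names and the treatment of empty strings and “NA” as missing are assumptions; the actual cleaning is in data_cleaning_dak_analyses_code.txt.

```python
# Assumed Darwin Core-style column names (not necessarily those in the scripts).
REQUIRED = ("scientificName", "decimalLongitude", "decimalLatitude", "year")
NA_VALUES = {"", "NA"}


def is_missing(value):
    """True for absent, empty or "NA" values."""
    return value is None or str(value).strip() in NA_VALUES


def clean_records(rows, first_year=1900, last_year=2021):
    """Keep records with a scientific name, coordinates and a year in range."""
    kept = []
    for row in rows:
        if any(is_missing(row.get(field)) for field in REQUIRED):
            continue  # no scientific name, coordinates or sampling year
        try:
            lon = float(row["decimalLongitude"])
            lat = float(row["decimalLatitude"])
            year = int(float(row["year"]))
        except (ValueError, OverflowError):
            continue  # unparseable coordinates or year
        if first_year <= year <= last_year:
            kept.append({**row, "decimalLongitude": lon,
                         "decimalLatitude": lat, "year": year})
    return kept
```

Rows are plain dictionaries, so the function works directly on the output of `csv.DictReader` for an occurrence export.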