Skip to main content

Data from: Integrating niche and occupancy models to infer the distribution of an endemic fossorial snake (Atractus lasallei)

Cite this dataset

Parra, Juan; Toro-Cardona, Felipe; Cruz-Arroyave, Camilo (2024). Data from: Integrating niche and occupancy models to infer the distribution of an endemic fossorial snake (Atractus lasallei) [Dataset]. Dryad.


Understanding species distribution and habitat preferences is crucial for effective conservation strategies. However, the lack of information about population responses to environmental change at different scales hinders effective conservation measures.  In this study, we estimate the potential and realized distribution of Atractus lasallei, a semi-fossorial snake endemic to the northwestern region of Colombia. We modelled the potential distribution of A. lasallei based on ecological niche theory (using maxent), and habitat use was characterized while accounting for imperfect detection using a single-season occupancy model. Our results suggest that A. lasallei selects areas characterized by slopes below 10°, with high average annual precipitation (>2500mm/year) and herbaceous and shrubby vegetation. Its potential distribution encompasses the northern Central Cordillera and two smaller centers along the Western Cordillera, but its habitat is heavily fragmented within this potential distribution. When the two models are combined, the species’ realized distribution sums up to 935 km2, highlighting its vulnerability. We recommend approaches that focus on variability at different spatio-temporal scales to better comprehend the variables that affect species’ ranges and identify threats to vulnerable species. Prompt actions are needed to protect herbaceous and shrub vegetation in this region, highly demanded for agriculture and cattle grazing.

README: Supporting data: Integrating niche and occupancy models to infer the distribution of an endemic fossorial snake (Atractus lasallei)

Description of the data and file structure

The data presented here include all input data to run the niche and occupancy models.

Niche Modelling

This includes the training (A_lasallei_train2.csv), testing (A_lasallei_test2.csv) and joint (A_lasallei_joint2.csv) data sets of georeferenced presence records for Atractus lasallei. All three files contain three columns each, the first refers to the species name, the second and third to Longitude and Latitude in decimal degrees.

In addition, the two sets of predictor variables in ascii format already masked for the area of accesibility used for modeling (set_1 and set_2).  


02convergenceFinal_set1.asc: Topographic convergence (unitless)

03ctiFinal_set1.asc: Compound Topographic Index (ln(squared meters/tan(slope in degrees))

04slopeFinal_set1.asc: Terrain slope (percent change in meters that occurs along a 100 m stretch)

05o_carbonFinal_set1.asc: Organic carbon in soil (dg/kg)

07evapotransFinal_set1.asc: Evapotranspiration (mm per month)

08precipitationFinal_set1.asc: Average annual precipitation (mm)

11t_seasonality_soilFinal_set1.asc: Temperature seasonality (Standard deviation of monthly mean Temperature (°C) * 100, ground-level)

12t_max_soilFinal_set1.asc: Maximum temperature of the warmest month (°C, ground-level)


02convergenceFinal_set2.asc: Topographic convergence (unitless)

03ctiFinal_set2.asc: Compound Topographic Index (ln(squared meters/tan(slope in degrees))

04slopeFinal_set2.asc: Terrain slope (percent change in meters that occurs along a 100 m stretch)

05o_carbonFinal_set2.asc: Organic carbon in soil (dg/kg)

07evapotransFinal_set2.asc: Evapotranspiration (mm per month)

08precipitationFinal_set2.asc: Average annual precipitation (mm)

17t_seasonalityFinal_set2.asc: Temperature seasonality (Standard deviation of monthly mean Temperature (°C) * 100)

18t_maxFinal_set2.asc: Maximum temperature (°C)

Occupancy Modelling

All files necessary to run occupancy analyses are included. The file named detection histories (detection_history.csv), the occupancy covariates (v_ocupacion_Final.csv) and the four detection covariates (v_detec_hour.csv, v_detec_N_obj.csv, v_detec-Soil_Moisture.csv, and v_detec-T_ground_.csv)

detection_history.csv: Each file contains the number assigned to each site (first column) and the detection history for 5 visits (v1,v2,...,v5).

v_ocupacion_Final.csv: Each row contains all the covariates for each site. The columns contain:

  •    sitio: sampling site
  •    site_owner: Name of person who owns land.
  •    soil_use: Whether land is protected (conservation) or used for cattle raising.
  •    cattle_raising: Same as previous column but in binary format (0:conservation, 1:cattle raising).
  •    plant_cover: Type of vegetation cover (BF: Forest fragment, VSB: Lower secondary vegetation, VSA: Taller secondary vegetation, BR: Riparian forest,       P: grassland)
  •    BF: Forest fragment in binary format.
  •    VSB: Lower secondary vegetation in binary format.
  •    VSA: Taller secondary vegetation in binary format.
  •    BR: Riparian Forest in binary format.
  •    P: Grasslands in binary format.
  •    slope: rate of change of elevation in the direction of the water flow line expressed as percentage of vertical displacement over 100m.
  •    convergence: The convergence index ranges from −100 for ridges to +100 for sink areas and 0 for planar or flat areas.
  •    itc: Compount topographic index, calculated as the logarithm of the cumulative upstream catchment area divided by the tangent of the local slope       angle.
  •    t_min: Minimum temperature of the coldest month (°C, ground_level). 
  •    t_prom: Annual mean soil temperature (°C, ground_level). 
  •    t_max: Maximum temperature of the warmest month (°C, ground_level). 
  •    D_water: Distance to water bodies (m).
  •    D_house: Distance to houses (m).
  •    D_forest: distance to forest patches.
  •    Veg_H: Vegetation height - 2019 (m).
  •    Leaf_Dep: Leaf litter depth (cm).
  •    Hori0: Depth of the zero horizon (cm).
  •    Veg_H_2019: Vegetation height (m). 
  •    Veg_H_2019_buffer50m: Mean Vegetation height at a 50m buffer (m).
  •    Veg_H_2019_buffer100m: Mean Vegetation height at a 100m buffer (m).
  •    Veg_H_2019_buffer200m: Mean Vegetation height at a 200m buffer (m).
  •    Veg_H_2019_buffer300m: Mean Vegetation height at a 300m buffer (m).
  •    Veg_H_2019_buffer400m: Mean Vegetation height at a 4000m buffer (m).
  •    Veg_H_2019_buffer500m: Mean Vegetation height at a 500m buffer (m).
  •    Veg_H_2019_buffer1000m: Mean Vegetation height at a 1000m buffer.
  •    ocu: Detected (1) or non-detected (0).

v_detec_hour.csv: Hour of detection for each site visit. When organism was not detected, a random hour was assigned. 

v_detec_N_obj.csv: Number of cover objects at each visit. 

v_detec-Soil_Moisture.csv: Soil moisture at each visit. 

v_detec-T_ground_.csv: Ground Temperature at each visit. 

The variable forest height (Forest_height_2019_SAM_M.tif) was used to determine mean canopy height at each site. and apply the model to the whole region of interest. 


Script with all the code in R to perform the niche models is provided:

A.lasallei.RMD: R script to run niche models using kuenm. All input files necessary to use the script are included and named the same way they appear in the script. 


Ecological Niche Model and Potential Distribution

Presence data were acquired from three sources: 1) specimens from biological collections, obtained from the Global Biodiversity Information Facility (accessed 22 March 2022) [35] and most of them revised in situ in the following collections: MHUA-Museo de Herpetología Universidad de Antioquia (Curator: J.M. Daza-Rojas), CSJ-h-Museo de Ciencias Naturales de La Salle (D.Z. Urrego), CBUCES-D-Colecciones Biológicas Universidad CES (J.C. Duque); 2) iNaturalist records obtained directly from them (accessed March-May 2022; we did not use iNaturalist records from GBIF) by searching Atractus records in the northwest of Colombia that included pictures that allowed verification through morphology (coloration patterns, and scale counts when possible), and 3) individuals encountered during the field phase for occupancy models. Identification of individuals was based on the original species description and taxonomic revisions of the genus [28, 33]. Further, a geographical filter was applied to presence records that were within 1 km of each other to reduce spatial autocorrelation [36, 37].

We used the final database to delimit the species accessible area or M [38] based on the intersection between the minimum convex polygon generated with 50 km buffers around each presence record using QGIS (v.3.10) [39], and the biogeographic regions of the northern Central Cordillera and the Western Cordillera of Colombia [40] (Fig 1). The environmental variables for the niche model included topographic variables, atmospheric climate variables including temperature corrected to ground level [41] and soil variables [42](S1). Climate variables represent long-term averages (1980-2010 in the case of atmospheric variables, and 2000-2020 in the case of ground-level temperature; see S1). These variables were selected based on previous research findings regarding the distribution of semi-fossorial reptiles [25, 43-45]. All variables were used in the models at a spatial resolution of 1 km. Variables with finer resolutions were resampled using the bilinear method, with the R-package “raster” v.3.5 [46] in R v.4.2.1 [47]. Subsequently, a Spearman correlation test (S2) was conducted to select non-correlated variables (< 0.8), using R-package “corrplot” v.0.92 [48]. Finally, two sets of variables were created, one that included two ground-level temperature variables estimated at five centimeters above the ground (S1) [41], and the second included the same two variables but measured at atmospheric level [49]. Models were calibrated with each data set independently, ensuring all variables used were not correlated.

The ecological niche model was generated using the maximum entropy algorithm [15] through the R-package “kuenm” [50]. This methodology allows the evaluation of different sets of environmental variables (set 1 and set 2, S1) and various model parameterizations to ultimately identify the best model according to a set of criteria. We allowed the regularization parameter to vary from 0.1 (very strict in relation to observed values) to 5 (more flexible in relation to observed values), where 1 is the default value. We also evaluated across linear (l), quadratic (q), and linear-quadratic (lq) responses. The models were trained using a 20% random partition of the occurrence data for model evaluation. The evaluation criteria included omission rate (<5%), partial receiver operating characteristic (partial ROC), area under the curve (AUC ratio>1), and the Akaike Information Criterion corrected for sample size (AICc) [51]. In case several models achieved the evaluation criteria, we performed a consensus model using the median of the selected models. Finally, to obtain the geographic projection, a cutoff threshold was applied using the 10 percentile training presence criteria from the best model(s) to generate a presence/absence map. 

Occupancy Models

To identify fine-scale factors influencing the occupancy of A. lasallei within its known distribution area, single-season occupancy models were employed [24]. The sampling design followed the recommendations of a previous study for semi-fossorial species [52], wherein 30 linear transects of 100 m x 2 m were established within the sampling area, spaced at a minimum of 200 m apart to ensure independence of detection histories across sites (Fig 1). Each transect was equipped with nine artificial cover objects (three roof tiles, three boards, and three plastic sheets), which were installed a minimum of two months prior to sampling for the organisms to habituate to their presence (S3). The transects were surveyed between October 2021 and January 2022 to ensure consistent occupancy status during the sampling period (closed-site assumption) between 8 AM and 4 PM. Surveys involved searching beneath leaf litter and under cover objects (both artificial and natural). Each transect was surveyed a minimum of four times, with visits spaced at least two weeks apart to satisfy the assumption of temporal independence. Animals were photographed and examined in the field to ensure correct identification (Approval Act No. 138, February 9, 2021, granted by the Committee on Ethics for Animal Experimentation, Universidad de Antioquia).

Occupancy models were constructed using the R-package “unmarked” (v.1.2.5) [53] implemented in the R software. All covariates were standardized (mean=0, units in standard deviations) prior to modelling. To identify the best models, we first established the best detection model assuming constant occupancy, and then we used this detection model in all occupancy models [54]. To model detectability, we included as covariates, the number of cover objects, both natural and artificial (N_obj); vegetation height (Veg_H) [55]; soil moisture (Soil_moisture); and soil temperature (T_ground), both measured using a HOBO proV2 datalogger beneath a roof tile or under the object where an individual of the species was located at the time of each visit. As covariates for occupancy, we used vegetation height (Veg_H) [55]; terrain slope (Slope); topographic convergence (Con); compound topographic index (CTI) [56]; annual mean soil temperature (Tprom), maximum temperature of the warmest month (Tmax), and minimum temperature of the coldest month (Tmin)[41]; depth of leaf litter (Leaf_Dep) and depth of the 0 horizon (Hori0), both measured in the field using a soil auger; euclidean distance to the nearest house (D_house), nearest forest (D_forest), and nearest water body (D_water). These distances were estimated in QGIS [39], identifying the nearest houses and forests to the centroid of each transect using satellite imagery from GoogleEarth ( To calculate the distance to water bodies, it was necessary to construct a detailed hydrographic network for the area using a 12.5 m resolution DEM obtained from Alaska vertex (, utilizing the hydrology toolbox in ArcGIS Pro (v.2.7) [57].

A total of 87 biologically plausible and simple models were evaluated, each including one or two variables (S4), 20 of the models were for the detection component with constant occupancy, and the remaining models were for the occupancy component. Finally, to evaluate model fit to our data, we performed a parametric bootstrap test on the chosen model, using the parboot function of R package “unmarked” v.1.4.1 [53]. This test generates multiple sets of data iteratively from the best model and then compares these sets with the detection histories obtained in the field. A chi-squared test was employed to evaluate the null hypothesis that the observations are consistent with the proposed model.

Integration of models

To estimate the species’ realized distribution area [58], we used the binary (presence-absence) geographic projection from the consensus niche model to identify the areas where the macro conditions were suitable and applied the best occupancy model within those areas at a higher spatial resolution (0.00025°  27 meters). Finally, the resulting map was transformed into a binary outcome using a threshold of 0.78, based on the Q3 (third quartile) of the occupancy distribution values of that map; this threshold corresponds to 4 m of vegetation height according to the best occupancy model (Fig 2), which is biologically justified if we consider that all presence records obtained in the field phase were found in places with vegetation below 4 m.


15. Phillips SJ, Anderson RP, Schapire RE. Maximum entropy modeling of species geographic distributions. Ecol. Modell. 2006;190: 231–259.
24. Mackenzie DI, Nichols JD, Lachman GB, Droege S, Royle JA, Langtimm CA. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecol. 2002;83: 2248–2255.[2248:ESORWD]2.0.CO;2
25. González-Fernández A, Manjarrez J, García-Vázquez U, D’Addario M, Sunny A. Present and future ecological niche modeling of garter snake species from the Trans-Mexican Volcanic Belt. PeerJ. 2018;6: 1–20.
28. Do Amaral A. Studies of neotropical Ophidia. XXVI. Ophidia of Colombia. Bull. Antivenin Inst. America. 1931;4: 87–88.
33. Passos P, Arredondo JC, Fernandes R, Lynch JD. Three new Atractus (Serpentes: Dipsadidae) from the Andes of Colombia. Copeia. 2009;3: 425–436. 
35. GBIF Occurrence Download. 22 March 2022. [Available from:]
36. Chapman AD. Principles and Methods of Data Cleaning – Primary Species and Species- Occurrence Data, version 1.0. Copenhagen, Report for the Global Biodiversity Information Facility. 2005.
37. Cobos ME, Jiménez L, Nuñez-Penichet C, Romero-Alvarez D, Simoes M. Sample data and training modules for cleaning biodiversity information. Biodivers Inform. 2018;14: 49–50.
38. Soberón J, Townsend Peterson A. Interpretation of models of fundamental ecological niches and species’ distributional areas. Biodivers Inform. 2005;2: 1–10.
39. QGIS 3.10 Geographic Information System. QGIS Association. 2022.
40. Hazzi NA, Moreno JS, Ortiz-Movliav C, Palacio RD. Biogeographic regions and events of isolation and diversification of the endemic biota of the tropical Andes. Proc Natl Acad Sci USA. 2018;115: 7985–7990.
41. Lembrechts JJ, van den Hoogen J, Aalto J, Ashcroft MB, De Frenne P, Kopecký M, et al. Global maps of soil temperature. Global Change Biol. 2021;28: 3110–3144.
42. Poggio L, de Sousa LM, Batjes NH, Heuvelink GBM, Kempen B, Ribeiro E, et al. SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty. SOIL. 2021;7:217–240.
43. Ferreira-Silva C, Ribeiro SC, De Alcantara EP, Ávila RW. Natural history of the rare and endangered snake Atractus ronnie (Serpentes: Colubridae) in northeastern Brazil. Phyllomedusa. 2019;18: 77–87.
44. Passos P, Martins A, Pinto-Coelho D. Population Morphological Variation and Natural History of Atractus potschi (Serpentes: Dipsadidae) in Northeastern Brazil. South Am J Herpetol. 2016;11: 188–211.
45. Sunny A, González-Fernández A, D’Addario M. Potential distribution of the endemic imbricate alligator lizard (Barisia imbricata imbricata) in highlands of central Mexico. Amphib-reptil. 2017;38: 225–231.
46. Hijmans RJ. Raster-R package (3.5-21). 2022.
47. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.; 2023.
48. Wei, T., Simko, V. corrplot-R package (0.92).; 2021.
49. Karger, DN, Lange, S, Hari, C, Reyer, CPO, Zimmermann, NE. CHELSA-W5E5 v1.0: W5E5 v1.0 downscaled with CHELSA v2.0. ISIMIP Repository.; 2022.
50. Cobos ME, Peterson AT, Barve N, Osorio-Olvera L. Kuenm: An R package for detailed development of ecological niche models using Maxent. PeerJ. 2019;7: 1–15.
51. Peterson, AT, Papeş, M, Soberón, J. Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol Model. 2008; 213: 63–72.
52. Ward RJ, Griffiths RA, Wilkinson JW, Cornish N. Optimizing monitoring efforts for secretive snakes: A comparison of occupancy and N-mixture models for assessment of population status. Sci Rep. 2017;7:1–12.
53. Fiske I, Chandler R. unmarked: An R Package for Fitting Hierarchical Models of Wildlife Occurrence and Abundance. J Stat Softw. 2011;43: 1–23.
54. Andrade-Ponce G, Cepeda-Duque JC, Mandujano S, Velásquez-C KL, Lizcano DJ, Gómez-Valencia B. Modelos de ocupación para datos de cámaras trampa. MaNo. 2021;7: 200.
55. Potapov P, Li X, Hernandez-Serna A, Tyukavina A, Hansen MC, Kommareddy A, et al. Mapping global forest canopy height through integration of GEDI and Landsat data. Remote Sens. Environ. 2020;253: 112165.
56. Amatulli G, Mclnerney D, Sethi T, Strobl P, Domisch S. Geomorpho 90m, empirical evaluation and accuracy assessment of global high- resolution geomorphometric layers. Sci Data. 2020;7: 1–18.
57. ESRI. ArcGIS pro (2.7). Hydrology Toolbox. Redlands, CA; 2022.
58. Jiménez-Valverde A, Lobo JM, Hortal J. Not as good as they seem: The importance of concepts in species distribution modelling. Divers. Distrib. 2008;14: 885–890.