# The effect of urbanisation and local environmental heterogeneity on phenotypic variability of a tropical treefrog

## Cite this dataset

Severgnini, Marcos; Provete, Diogo (2024). The effect of urbanisation and local environmental heterogeneity on phenotypic variability of a tropical treefrog [Dataset]. Dryad. https://doi.org/10.5061/dryad.3xsj3txp2

## Abstract

Urbanisation reduces species richness and change community composition. However, little is known on how the phenotype of organisms with low dispersal ability respond to environmental changes associated with urbanisation in fast urbanizing centres, such as those in the Global South.

Here, we tested how urbanisation rate, local environmental heterogeneity, land surface temperature, and spatial gradients affect phenotypic traits associated with dispersal, resource acquisition, and performance, namely: body size, head shape, and leg length of the Dwarf Treefrog (*Dendropsophus nanus*) using a space-for-time substitution approach.

We took linear measurements from 768 individuals in 21 ponds along an urban gradient in central Brazil. We also measured local environmental variables and summarized them using Hill-Smith Principal Component Analysis. The spatial arrangement of ponds at multiple scales was described using Moran Eigenvector Maps. Those variables were entered into a Structural Equation Model to test their direct and indirect effects on the mean and coefficient of variation (CV) of phenotypic traits. Additionally, we calculated the Scaled Mass Index as a proxy for fitness and estimated the adaptive landscape for body size, size-free leg length, and head shape. We also tested for spatial autocorrelation in traits.

Body size decreased from the periphery to the urban centre, whereas CV of body size and head shape had the opposite pattern. Body size increased, whereas CV of body size and head shape decreased in man-made ponds. The CV of leg length decreased with increasing land surface temperature. The remaining traits were not affected by any predictor variable. None of the traits were spatially autocorrelated. Both body size and head shape were under weak directional selection, but in opposite directions.

Our results suggest that the lack of a clear spatial variation in phenotypic traits can be due to a weak selection, due to a recent, although intense, urbanisation process. In conclusion, eco-evolutionary dynamics in tropical cities seem to have a different pace compared to temperate ones. Our results can contribute to building urban ecological theory that explicitly includes city age, their development, growth rate, and history.

## README: The effect of urbanization and local environmental heterogeneity on phenotypic variability of a tropical treefrog

https://doi.org/10.5061/dryad.3xsj3txp2

This dataset below contains a series of R code, R Markdown, datasets containing morphological measurements of the Dwarf Treefrog *Dendropsophus nanus*, and local environmental variables collected at the same ponds. Their contents and details are given below:

### Description of the data and file structure

#### data_envir.txt

This is a .txt file in tabular format containing both the local environmental variables measured at each pond, their geographical coordinates (`lat`

and `long`

), and the landscape-scale variables quantified from the satallite images.

This is a data.frame containing 58 columns by 22 rows. Columns with 0 values represents zero percentage of urbanisation. Below you'll find an explanation for the data in each column

`ponds`

represents the number of ponds sampled in the study;`lon`

and`lat`

are geographical coordinates in decimal degrees;`hydroperiod`

is a categorical variable with two levels: 'perm' for permanent or 'temp' for temporary;`margin`

is a categorical variable with three levels representing the shape of pond banks, i.e., 'excavated' (man-made ponds) with a well-delimited and upright margin, 'sloping' is shallow in the beginning and deeper to centre, and 'flat' means that the margin and the centre have similar depths;`predator`

is a categorical variable with two levels representing the presence of invertebrate and vertebrate predators. If a ponds has only invertebrate predators the word 'invert' is assigned to it. However, if a ponds has invertebrate (e.g., aquatic insects) and vertebrate predators (i.e., fish), then 'both' is assigned to it;`cattle`

is a categorical variable with two levels; 'yes' for presence or 'no' for absence of cattle around each pond;`float_veg_perc`

is numerical variable representing the percentage of floatting vegetation on the surface of each pond estimated visually;`temp_c`

is a numerical variable representing the temperature in Celcius degrees estimated using a thermometer;`area_m2`

is a numerical variable representing the area measured in squared meters using Google Earth;`depth_ponds_mean_cm`

is a numerical variable representing pond depth in centimeters. Each value is the average of five measurements taken at different places within ponds. Measurements were made using a measuring tape;`elev_mean_m`

is a numerical variable representing the elevation in meters using Google Earth tools;`tsince_urb1985_2021`

is a numerical variable representing the average of percentage of urbanisation from 1985 to 2021 in a 1-km buffer around ponds. Urbanization was quantified using sattelite images and represents the area occupied by roads and buildings;`urb1985-2021_rate`

is a numerical variable in representing urbanisation rate i.e., the percentage of urbanisation of 2021 subtracted from percentage of urbanisation in 1985 and divided by 36 years in a 1-km buffer;`Urb_1985`

to`Urb_2021`

is a numerical variable in representing the percentage of urbanisation of each year in a 1-km buffer;`Urb_1985_500`

is a numerical variable representing percentage of urbanisation of 1985 in a 500-m buffer;`Urb_2021_500`

is a numerical variable representing the percentage of urbanisation of 2021 in a 500-m buffer;`Urb_rate_500`

is a numerical variable representing rate i.e., the percentage of urbanisation of 2021 subtracted from percentage of urbanisation of 1985 and divided by 36 years in a 500-m buffer;`t_mean1985uhi_500`

is a numerical variable representing the mean of urban heat island temperature (see main text for more details) in a 500-m buffer for 1985;`t_mean1985uhi_1000`

is a numerical variable representing the mean of urban heat island temperature (see main text for more details) in a 1-km buffer for 1985.`t_mean2022uhi_500`

is a numerical variable representing the mean of urban heat island temperature (see main text for more details) in a 500-m buffer for 2022;`t_mean2022uhi_1000`

is a numerical variable representing the mean of urban heat island temperature (see main text for more details) in a 1-km buffer for 2022;

Please, refer to the main manuscript to obtain the full names of variables, their scales, and how they were measured in the field.

#### data_urb_anu_cg.txt

This is the .txt file in tabular form (21 columns by 952 rows) that contains all the linear measurements taken from *Dendropsophus nanus* in the field. Additional data for other species are provided, but not used in the current manuscript

`ponds`

is the ID of the pond in which frogs were collected;`ind_qte`

is a numerical data that represents the count of total number of individuals sampled in the study;`ind_seq`

is ID of individuals in a given pond, - 3rd column (ind_seq) is a numerical data that represents the count of total number of individuals sampled in each pond;`ind_planilha`

is a numerical variable that represents the number assined to each individual sampled in field;`species`

is a character variable that represents the scientific name of species sampled in the study;`acron`

is a character variable that represents the abreviation of scientific name of the species;`month`

is a character variable that represents months of field sampling. Monts are abreviated as: 'Jan' for January, 'Feb' for February, 'Mar' for March, 'Apr' for April, 'Nov' for November, 'Dec' for December;`date`

in DD/MM/YY format of sampling,`call`

is a categorical variable with two levels: 'yes' if calls were recorded using a recorder and a semi-direction microphone or 'no' if calls were not recorded;`total_weigth`

is a numerical variable representing the total weigth of a frog in field. Total weigth is a sum of frog weigh and bag weigh (bag in which frog was temporarily stored for weighing);`bag_weight`

is a numerical variable representing bag weigh;`real_weight`

is a numerical variable representing the real weigh of an individual frog, calculated by subtracting`total_weight`

(value given in the Pesola) from`bag_weight`

, which is the weight of the plastic bag used to weight frogs.;`NA`

in columns 10 to 12 means that individuals were not weighed in the field.`svl`

to`foot_length`

are numerical variables representing morphometric measurents taken using a digital caliper 150 mm, following Watters et al. 2016 Zootaxa.`svl`

is the snout-ventral length or body size in frogs.`total_leg_length`

is a numerical variable representing the sum`thigh_length`

,`tibia_length`

, and`tibia_length`

.`head_width`

and`head_length`

are numerical variables representing morphometric measurents taken using a digital caliper 150 mm following Watters et al. 2016 Zootaxa;`head_wid/len`

is a numerical variable representing the ratio between`head_width`

and`head_length`

;`marker_num`

is character variable that represents the number of individual alpha-tag (e.g., E77, E78) or the elastomer used to mark individuals (e.g., 'Elastomero' or 'E') for the mark-recapture technique..`NA`

represents individuals that were not marked

Please, refer to the main manuscript for details on how each measurement was taken.

#### marcos.RData

This is the saved workspace containing all objects created during the data analysis procedure. It's being made available here because it contains the custom-built neighbour network created to represent the connectivity hypothesis between our ponds. This is necessary to conduct spatial matrix selection and building the Moran

### Sharing/Access information

Parts of the data were derived from the following sources:

- Raster files containing data for land use, from which we calculated current urbanisation and urbanisation rate, were obtained from MapBiomas. http://brasil.mapbiomas.org
- LANDSAT Thermal satellite images used to calculate surface temperature and Urban Heat Island were extracted from Earth Explorer. To reproduce the results, one can access the following URL https://earthexplorer.usgs.gov/ and download the same image we used by searching for its ID
. - We used the shape file describing the urban boundaries ("Perímetro Urbano" in Portuguese) of Campo Grande, Mato Grosso do Sul, Brazil. This was obtained from the website of the municipal government here. One can access the files by clickling on the last button "SHAPEFILES(.SHP) clique para ver", extract the .zip, and then looking for the file named
`ANEXO_02_1_PERIMETRO_URBANO_SEDE.shp`

. This is the file that was renamed to`Perimetro_Urbano.shp`

in the associated R code.

### Code/Software

We made available the complete code in `R`

Markdown format that thoroughly describes how we conducted, data handling, exploratory data analysis, and modelling. It contains all our data analysis workflow. This is in the following file: `Complete code.Rmd`

. We also provide the compiled Markdown file in html format. The R packages and their versions are described in the beginning of the file. The code also includes multi-scale, eigenfunction-based spatial modelling (Moran's Eigenvector Maps, MEMs).

To plot the MEMs and spatial distribution of traits in the urban limits, we used the function `sr.value.R`

that is s slight modification of the original `ade4::s.value`

function, and is provided as a supplementary material by Borcard et al. 2020 Numerical Ecology with R, 2nd ed. This can be downloaded from the book website here.

Finally, we're providing the file `pond_buffers_extract.R`

that describes how we processed the time series of raster files obtained from MapBiomas to calculate the rate of urbanisation in a 500-m around each pond sampled. This companion R script is used within the main .Rmd file to produce the figure shown in the section "Urbanization percentage growth per pond over the years".

## Methods

**Study site and sampling design**

We conducted fieldwork in Campo Grande, Mato Grosso do Sul, central Brazil (Projection: UTM–21S; Coordinates: 751114, 7731659; DATUM=SIRGAS/2000; 529 m a.s.l.; Figure 2). Campo Grande has been founded 127 years ago (Arruda 2006; PLANURB 2019) and has about 897,938 citizens (IBGE, 2023). Although recent, urbanisation has been fast, with the city gaining ~ 13,000 ha of urban infrastructure from 1985 to 2021 (MRS, pers. obs. based on data from MapBiomas, 2023). The urban area has about 35,903 ha with most human population living in the urban centre (PLANURB, 2019). The climate is Equatorial Savanna or Köppen’s Aw, with dry winters and rainy summers (Kottek et al., 2006). Vegetation is composed mainly by *Cerradão*, *Cerrado*, *Campo Sujo*, and *Vereda* vegetation types (Ribeiro & Walter, 2008).

We sampled adults of the hylid frog *Dendropsophus nanus* in 21 ponds (Figure 2; Figure S1) along an urban gradient through surveys at breeding sites (Scott Jr. et al., 1994) and visual encounter surveys (Crump & Scott Jr., 1994). Field work was conducted between 17:30 and midnight from November 2021 to April 2022 and from November 2022 to January 2023 visiting two ponds per day. We did not visit all ponds in all months; instead, each pond was visited until we obtained 30 individuals to guarantee the same sampling effort.

**Phenotypic traits**

We took the following linear measurements in the field using hand digital calliper (MTX 150 mm) to the nearest 0.01 mm: body size (Snout-Vent Length – SVL), head width, head length, and leg length (foot, thigh, and tibiofibula) following Watters et al. (2016). Relative leg length is related to vulnerability to predation (Citadini et al., 2018; Emerson, 1985a) and dispersal ability (Phillips et al., 2006), while head length and width is related to size and variety of feeding resources (Emerson, 1985b; Parmelee, 1999). Body size is related to thermoregulation, dispersal, and desiccation resistance (Wells, 2007). After measurements, every frog was tagged with visible implant elastomer (VIE) and released back.

Prior to analysis, we transformed head width, head length, and leg length to remove the effect of body size by applying a linear regression on each trait as a function of body size and then retained the residuals. To obtain head shape, we took the log-shape ratio (Mosimann, 1970) by computing the geometric mean of head length and head width to obtain their sizes. Then, we divided each value by their geometric means and log-transformed the result (Claude, 2013). Finally, we performed a Principal Component Analysis and took the first axis to represent head shape. PC1 retained 92.5% of the variance and was negatively correlated with head length (-0.74) and width (-0.67). Negative scores along PC1 represent individuals with wider and longer heads, while positive ones represent those with shorter and narrower heads. Analysis was performed in R v. 4.2.3 (R Core Team, 2023).

**Fitness proxy**

We weighted all individuals using a Pesola (Lightline 10 g #10010) to calculate the Scaled Mass Index (SMI), following Peig & Green (2009). This index relates body mass and size using a Major Axis Regression to obtain a measure of body condition, which was used here as a fitness proxy. Analysis was conducted in lmodel2 R package (Legendre, 2018). The SMI assumes a non-linear growth relationship that removes the effect of body size. This index has been used to access body condition of amphibians, especially in urbanized environments (Franco-Belussi et al., 2024), since it provides information about animal health status and is positively related with lipid and protein content (MacCracken & Stebbings, 2012). Lipid content in turn is correlated with survival in amphibians (Scott et al. 2007) but body condition also seems directly correlated with survival in a wild toad population (Reading 2007). Field measurements of fitness components are challenging for many taxa, including frogs. We acknowledge that using SMI as a fitness proxy can be problematic (Wilder et al., 2016). However, evidence exists linking lipidic storage to energetic reserves and ultimately to physiological status in frogs. In this context, SMI can be considered a performance index (but see Franklin & Morrissey, 2017), since animals with good body condition that have energetic reserves provided by a past successful feeding are more likely to survive (Jakob et al., 1996; Wells, 2007).

**Urbanisation and local environmental heterogeneity**

We measured urbanisation as the percentage of building area and roads (Szulkin, Garroway, et al., 2020) in a 500-m buffer around each pond (Figure S2). We also extracted this same variable from raster layers from 1985 to 2021 provided by MapBiomas (2023). To obtain urbanisation rate, we subtracted the current (2021) urban area from that of 1985 and divide it by 36 (number of years in the period). We repeated all analyses using a 1-km buffer and results did not differ. Spatial data handling was conducted in the R packages terra (Hijmans, 2023) and landscapemetrics (Hesselbarth et al., 2019).

To quantify local environmental heterogeneity, we measured the following variables at pond level: pond area (m^{2}; using Google Earth®), pond depth (mean of five points per pond; in m), pond hydroperiod (permanent or temporary – binary variable), presence or absence of cattle (binary variable), aquatic predators (vertebrate or both invertebrate and vertebrate – categorical variable), margin profile (excavated, flat or sloping – categorical variable), water temperature (using a thermometer submerged at 1 m for 1 min; in ºC), and percentage of floating vegetation (visual estimation %). Ponds with excavated margins were artificial, man-made ponds. Likewise, ponds with vertebrate predators and cattle were more associated with rural environments (Figure S1). The type of pond margins limits the amount and kinds of perches available for frogs, influencing how they use the environment (Vasconcelos et al. 2009), likely constraining the range of phenotypic traits associated with them.** **Those were later summarized using a Hill-Smith Principal Component Analysis (Hill & Smith, 1976), which allows combining discrete and quantitative variables. The PC1 retained ~ 63% of the variance in the data. The environmental variables that most contributed to PC1 (correlation > 0.6) were: excavated margin (positive), temporary hydroperiod (negative), invertebrate predators (negative), and cattle presence (negative; Table S1, Figure S3 and S4). Analysis was conducted in the ade4 R package (Dray & Dufour, 2007).

Lastly, we calculated land surface temperature (LST) and urban heat island (UHI; Figure S5). We used the following satellite images: LANDSAT 8 thermal band 10 for 2022 (07 November 2022) and LANDSAT 5 thermal band 6 for 1985 (17 November 1985). All images were obtained from Earth Explorer (United States Geological Survey – USGS: https://earthexplorer.usgs.gov/). The pixels of these images contain solar light variables, such as reflectance and radiance. To extract land surface temperature, we used the equation proposed by Coelho (2013) and Coelho & Correa (2013), which converts digital numbers to solar radiance, radiance to brightness temperature, and then to temperature in Kelvin or Celsius. Then, we used land surface temperature to calculate urban heat island (UHI) (see Ahmed et al., 2013; Faisal et al., 2021) for the whole urban area, as follows:

UHI=(Ts-Tm)/SD

where, *Ts* is land surface temperature; *tm* is the mean of land surface temperature, and SD is its standard deviation. Finally, we built an urban heat island map and calculated maximum, minimum, variance, and mean temperature in 500-m buffers around each pond. Mean land surface temperature was later used as predictor variable in the path analysis. Data handling and extraction were done in QGIS v. 3.22.1 (QGIS.org, 2020).

**Building spatial predictors and spatial autocorrelation**

To describe the spatial arrangement of ponds at multiple scales, we used Moran’s Eigenvector Maps (MEMs). Firstly, we used the geographical coordinates of ponds to build a spatial neighbourhood network. We tested different types of networks that represent different hypotheses of connections between ponds (i.e., Delaunay triangulation, Gabriel graphs, Minimum spanning trees, and a custom-built connectivity network). Then, we built a spatial weighting matrix in which weighting was a linear function of the inverse of distance. Afterwards, we generated a set of MEMs for each spatial neighbourhood network. Finally, we performed a selection of spatial matrices in the adespatial R package (Dray et al., 2023). We retained the best subset of MEMs that represents the autocorrelation of body size, built from the best spatial neighbourhood, which was selected based on the Akaike's Information Criterion corrected for small sample sizes (AICc). Then, we computed the corresponding Moran’s *I *statistic of each MEM and calculated its significance using a Monte Carlo permutation test to retain only eigenvectors with positive and significant autocorrelation.

The best spatial model was the custom neighbour network with six positive and significant MEMs (Figure S6). MEM3 had the lowest AICc and was used to represent pond spatial arrangement in the path analysis. Positive scores along MEM3 were arranged in a Northeast-Southwest direction and comprised most of the urbanized ponds (Figure S2 and S6). Finally, to test for spatial autocorrelation in traits and environmental variables, we used Moran's *I *correlograms (Legendre & Legendre, 2012) using the same custom-built spatial neighbourhood network.

**Space-for-time substitution approach**

We use this approach to make inference about microevolutionary processes. By using space as a substitute for time, we can use present environmental conditions to understand patterns produced by a past process (Wogan & Wang, 2018). Here, we sampled ponds along a rural-urban gradient (space) and calculated their rate of change in urbanisation from 1985 to 2021 (36 years) (time). Urbanisation rate varied from 0% to 2.11% per year. Ponds in the core urban had high urbanisation rates and had already undergone a moderate building process before 1985, providing evidence that ponds in urban core experienced a long lasting change. Thus, we assume frogs living in highly urbanised ponds have experiencing the effect of urbanisation for more time than those in peri-urban or rural sites. Provided tropical treefrogs live approximately 12 years (Stark & Meiri, 2018), at last three generations have passed since 1985.

**Data analysis**

We checked for multicollinearity of all environmental variables using Variance Inflation Factor (VIF) in the R package usdm (Naimi et al., 2014), and standardized the data to zero mean and unit variance in the R package vegan (Oksanen et al., 2018). All variables had VIF < 6 and were retained for further analysis. Finally, the PC1 of Hill-Smith PCA (local environmental variables), MEM3 (spatial arrangement), urbanisation rate, and land surface temperature were entered into a Structural Equation Model to test for their direct and indirect effects on the mean and coefficient of variation of each phenotypic trait.

We calculate the mean as a sum of each phenotypic trait divided by the number of individuals in each pond. Also, we calculated the Kvålseth coefficient of variation (KCV; see Kvålseth, 2017) for each phenotypic trait as the standard deviation divided by the square root of the mean of squared values, with an R function provided by Lobry et al., (2023). Path analysis was performed in the R package lavaan (Rosseel, 2012). Figures were made in the R package semPlot (Epskamp, 2022).

To estimate the adaptive landscape, we separately regressed the Scaled Mass Index as a function of body size, size-free leg length, and head shape. We used a non-parametric regression technique called projection-pursuit regression implemented in the gsg R package (Morrissey, 2014). This approach assumes that the partial regression coefficient represents the selection gradient as a way to estimate the adaptive landscape (Lande & Arnold, 1983). The data and associated R code are available in Dryad (Severgnini & Provete 2024).

## Funding

Coordenação de Aperfeicoamento de Pessoal de Nível Superior, Award: 001, Bolsa Demanda Social

National Council for Scientific and Technological Development, Award: 407318/2021-6, Projeto Universal

Alexander von Humboldt Foundation, CAPES-Humboldt fellowship for experienced researchers