Data from: Soil macrofauna communities vary by land use type and environmental conditions in the Serengeti-Mara ecosystem
Data files
Dec 16, 2025 version files 180.81 KB
- ANTS2.csv: 24.67 KB
- DECOMP2.csv: 72.96 KB
- RDA_env.csv: 6.66 KB
- RDA_obs.csv: 3 KB
- README.md: 21.16 KB
- scaled_logcoefs.csv: 1.20 KB
- scaled_nbcoefs.csv: 1.04 KB
- soildat.csv: 1.37 KB
- TERMS2.csv: 24.01 KB
- WORMS2.csv: 24.75 KB
Abstract
Soil macrofauna are useful indicators of soil health, given their low resistance to environmental stressors. The magnitude of such stressors varies by land use type and environmental conditions. Despite their ecological importance, soil macrofauna remain understudied in the Eastern Afrotropics. The greater Serengeti-Mara ecosystem (GSME) holds high conservation value and is experiencing acute environmental strain. Our study surveyed soil macrofauna communities across four habitat types: bush/forest, grassland, human use, and wetlands, following the Tropical Soil Biology & Fertility (TSBF) sampling protocol. We discuss the community structure and dynamics of ants (Hymenoptera: Formicidae), termites (Insecta: Isoptera), and earthworms (Annelida: Oligochaeta) due to their relative abundance and biomass in soil communities and for their role as ecosystem service providers. Redundancy analysis (RDA) revealed the partitioning of habitat types by relative water resource availability as quantified by distance to water (m), litter water content (%), and litter mass (g/m2). Water limitation increased from bush/forest to grassland to human use habitats. The spatial patterning of habitat diversity and that of soil macrofauna communities are both linked to local moisture availability in the study region. Ants were observed at higher abundances than termites or earthworms throughout the study system and especially within relatively water resource-limited grassland and human-use study areas. By contrast, earthworms were observed most frequently and at higher relative abundances in bush/forest and wetland habitats. Termite abundances were low for nearly every study site. These patterns emphasize the degree to which landscape-scale heterogeneity plays a role in the spatial patterning of soil macrofauna communities in a semi-arid tropical landscape.
Dataset DOI: 10.5061/dryad.pzgmsbcv8
Description of the data and file structure
Presence and abundance data were collected for 14 soil macrofauna invertebrate taxa using the Tropical Soil Biology & Fertility (TSBF) hand-sorting method in Kenya's Maasai Mara.
Note: Including location data will not pose any risk to endangered or vulnerable species in the ecosystem.
Files and variables
File: ANTS2.csv
Description: Filtered (ant observations only), cleaned dataset in long format with all environmental and site variables. Raw and transformed presence/abundance values and community metrics (i.e., site-level diversity and richness indices) are included.
Variables
- Area: Study area name abbreviation (Enonkishu Conservancy [ENO], Emarti township [EMA], Ol Choro Conservancy [OLC], Lemek Conservancy [LEM], Naretoi Wildlife Estates [NAR], and Mbokishi Conservancy [MBO]).
- Protected: Study area protection status. Binary variable (Y = 1, N = 0).
- SiteID: Unique Site ID (A-R). Each site comprises 5 points, at which replicate monoliths were extracted.
- Point: Unique point ID. Each site has 5 points, numbered arbitrarily (1-5).
- PointID: Unique Point ID (e.g., A1, A2, A3, A4, A5) corresponding to each of the sampling points within a particular site.
- SoilType: Soil type classification based on USDA Soil Texture Triangle classifications. Designation based on calculated volumetric proportions of clay, sand, and silt for each soil monolith extracted.
- PctClay: % clay of soil sample extracted at a sampling point.
- Lat: Sample site latitude (taken from center of sampling site, +/- 5m accuracy)
- Long: Sample site longitude (taken from center of sampling site, +/- 5m accuracy)
- DistWater: Distance to nearest permanent/semi-permanent water source (i.e., creek, river, pond, wetland) in meters from the center of the sample site.
- Elev: Sample site elevation in meters.
- TreeCov%: % tree cover in 1 km2 surrounding sample site using satellite data. Not used in final analysis.
- Floodplain: Binary; is the sample site within a floodplain area? (Y = 1, N = 0).
- CatGrazing: Categorical, semi-qualitative variable classifying cattle grazing intensity (NONE, LOW, MODERATE, HIGH) based on researcher field observations, conservancy management plans, and discussions with herders and managers.
- Fire: Fire history at sample site within last 2 years (Y = 1, N = 0).
- hab_type: grassland (open, dominated by grasses and/or shrubs), wetland (frequently inundated, low-lying areas of landscape), human-use (areas under intensive daily use by humans and/or livestock), bush/forest (closed canopy areas dominated by woody species).
- FreshWgtCor: Fresh litter weight collected at sample point in grams (g).
- PctWaterCt: % of fresh litter weight as water (calculated as % difference between fresh and dry litter weights in g)
- ShanDiv: Shannon Diversity (H) index calculated (ants, termites, earthworms) at the sample point.
- SpecRich: Species richness calculated (includes all 14 taxa for which data were originally collected) at sample point.
- taxa: Ants
- num_ind: Number of individuals observed at a sampling point (integer)
- logBiomass: log-transformed local biomass value (m3 km-2 year-1)
- logRainprev_30: log-transformed precipitation value (in cm) over the 30-day period prior to soil monolith excavation. Not used in final analyses.
- logRainprev_7: log-transformed precipitation value (in cm) over the 7-day period prior to soil monolith excavation. Not used in final analyses.
- logRainprev_2: log-transformed precipitation value (in cm) over the 2-day period prior to soil monolith excavation. Not used in final analyses.
- logSpecAbun: Log-transformed species abundance at sampling point (calculated using observations from all 14 taxa for which data were originally collected).
- logNum_ind: Log-transformed number of individuals observed at a sampling point by taxa.
- presence: Binary indicator of taxon presence at a sampling point within a site (Y = 1, N = 0).
- proportion: Proportion of ecosystem engineering soil macroinvertebrate (ants, termites, earthworms) community at a particular sampling point (e.g., if 10 ants, 5 termites, and 5 earthworms are observed at a sampling point, their proportions will be 0.50, 0.25, and 0.25, respectively).
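The worked example in the proportion entry can be reproduced with a few lines of code. The sketch below is in Python for illustration only (the dataset's own workflow is in R), and the function name `taxon_proportions` and the handling of points with no individuals are assumptions, not taken from the authors' code.

```python
def taxon_proportions(counts):
    """Return each taxon's share of the total count at one sampling point.

    `counts` maps a taxon name to its number of individuals (num_ind).
    Assumption: when nothing was observed, the proportion is undefined
    and is returned as None (the dataset itself uses NA for missing data).
    """
    total = sum(counts.values())
    if total == 0:
        return {taxon: None for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Worked example from the variable description: 10 ants, 5 termites, 5 earthworms
point_counts = {"Ants": 10, "Termites": 5, "Earthworms": 5}
proportions = taxon_proportions(point_counts)
print(proportions)  # {'Ants': 0.5, 'Termites': 0.25, 'Earthworms': 0.25}
```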
File: DECOMP2.csv
Description: Filtered (ants, termites, and earthworms), cleaned dataset in long format with all environmental and site variables. Raw and transformed presence/abundance values and community metrics (i.e., site-level diversity and richness indices) are included.
Variables
- Area: Study area name abbreviation (Enonkishu Conservancy [ENO], Emarti township [EMA], Ol Choro Conservancy [OLC], Lemek Conservancy [LEM], Naretoi Wildlife Estates [NAR], and Mbokishi Conservancy [MBO]).
- Protected: Study area protection status. Binary variable (Y = 1, N = 0).
- SiteID: Unique Site ID (A-R). Each site comprises 5 points, at which replicate monoliths were extracted.
- Point: Unique point ID. Each site has 5 points, numbered arbitrarily (1-5).
- PointID: Unique Point ID (e.g., A1, A2, A3, A4, A5) corresponding to each of the sampling points within a particular site.
- SoilType: Soil type classification based on USDA Soil Texture Triangle classifications. Designation based on calculated volumetric proportions of clay, sand, and silt for each soil monolith extracted.
- PctClay: % clay of soil sample extracted at a sampling point.
- Lat: Sample site latitude (taken from center of sampling site, +/- 5m accuracy)
- Long: Sample site longitude (taken from center of sampling site, +/- 5m accuracy)
- DistWater: Distance to nearest permanent/semi-permanent water source (i.e., creek, river, pond, wetland) in meters from the center of the sample site.
- Elev: Sample site elevation in meters.
- TreeCov%: % tree cover in 1 km2 surrounding sample site using satellite data. Not used in final analysis.
- Floodplain: Binary; is the sample site within a floodplain area? (Y = 1, N = 0).
- CatGrazing: Categorical, semi-qualitative variable classifying cattle grazing intensity (NONE, LOW, MODERATE, HIGH) based on researcher field observations, conservancy management plans, and discussions with herders and managers.
- Fire: Fire history at sample site within last 2 years (Y = 1, N = 0).
- hab_type: grassland (open, dominated by grasses and/or shrubs), wetland (frequently inundated, low-lying areas of landscape), human-use (areas under intensive daily use by humans and/or livestock), bush/forest (closed canopy areas dominated by woody species).
- FreshWgtCor: Fresh litter weight collected at sample point in grams (g).
- PctWaterCt: % of fresh litter weight as water (calculated as % difference between fresh and dry litter weights in g)
- ShanDiv: Shannon Diversity (H) index calculated (ants, termites, earthworms) at the sample point.
- SpecRich: Species richness calculated (includes all 14 taxa for which data were originally collected) at sample point.
- taxa: Taxon for which raw abundance values and transformed values are calculated (ants, termites, and earthworms).
- num_ind: Number of individuals observed at a sampling point (integer)
- logBiomass: log-transformed local biomass value (m3 km-2 year-1)
- logRainprev_30: log-transformed precipitation value (in cm) over the 30-day period prior to soil monolith excavation. Not used in final analyses.
- logRainprev_7: log-transformed precipitation value (in cm) over the 7-day period prior to soil monolith excavation. Not used in final analyses.
- logRainprev_2: log-transformed precipitation value (in cm) over the 2-day period prior to soil monolith excavation. Not used in final analyses.
- logSpecAbun: Log-transformed species abundance at sampling point (calculated using observations from all 14 taxa for which data were originally collected).
- logNum_ind: Log-transformed number of individuals observed at a sampling point by taxa.
- presence: Binary indicator of taxon presence at a sampling point within a site (Y = 1, N = 0).
- proportion: Proportion of ecosystem engineering soil macroinvertebrate (ants, termites, earthworms) community at a particular sampling point (e.g., if 10 ants, 5 termites, and 5 earthworms are observed at a sampling point, their proportions will be 0.50, 0.25, and 0.25, respectively).
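For readers unfamiliar with the ShanDiv variable listed above, the following Python sketch shows the standard Shannon index, H = -Σ p_i ln(p_i), computed from the counts of the three taxa at one point. It is an illustration rather than the authors' R code; the use of the natural logarithm and the function name `shannon_index` are assumptions.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    Assumptions: natural logarithm, and H = 0 when no individuals were observed.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    # Taxa with zero individuals contribute nothing to the sum
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Same hypothetical point used in the proportion entry: 10 ants, 5 termites, 5 earthworms
h = shannon_index([10, 5, 5])
print(round(h, 4))  # 1.0397
```

A point where only one taxon is present yields H = 0, the minimum possible value.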
File: TERMS2.csv
Description: Filtered (termite observations only), cleaned dataset in long format with all environmental and site variables. Raw and transformed presence/abundance values and community metrics (i.e., site-level diversity and richness indices) are included.
Variables
- Area: Study area name abbreviation (Enonkishu Conservancy [ENO], Emarti township [EMA], Ol Choro Conservancy [OLC], Lemek Conservancy [LEM], Naretoi Wildlife Estates [NAR], and Mbokishi Conservancy [MBO]).
- Protected: Study area protection status. Binary variable (Y = 1, N = 0).
- SiteID: Unique Site ID (A-R). Each site comprises 5 points, at which replicate monoliths were extracted.
- Point: Unique point ID. Each site has 5 points, numbered arbitrarily (1-5).
- PointID: Unique Point ID (e.g., A1, A2, A3, A4, A5) corresponding to each of the sampling points within a particular site.
- SoilType: Soil type classification based on USDA Soil Texture Triangle classifications. Designation based on calculated volumetric proportions of clay, sand, and silt for each soil monolith extracted.
- PctClay: % clay of soil sample extracted at a sampling point.
- Lat: Sample site latitude (taken from center of sampling site, +/- 5m accuracy)
- Long: Sample site longitude (taken from center of sampling site, +/- 5m accuracy)
- DistWater: Distance to nearest permanent/semi-permanent water source (i.e., creek, river, pond, wetland) in meters from the center of the sample site.
- Elev: Sample site elevation in meters.
- TreeCov%: % tree cover in 1 km2 surrounding sample site using satellite data. Not used in final analysis.
- Floodplain: Binary; is the sample site within a floodplain area? (Y = 1, N = 0).
- CatGrazing: Categorical, semi-qualitative variable classifying cattle grazing intensity (NONE, LOW, MODERATE, HIGH) based on researcher field observations, conservancy management plans, and discussions with herders and managers.
- Fire: Fire history at sample site within last 2 years (Y = 1, N = 0).
- hab_type: grassland (open, dominated by grasses and/or shrubs), wetland (frequently inundated, low-lying areas of landscape), human-use (areas under intensive daily use by humans and/or livestock), bush/forest (closed canopy areas dominated by woody species).
- FreshWgtCor: Fresh litter weight collected at sample point in grams (g).
- PctWaterCt: % of fresh litter weight as water (calculated as % difference between fresh and dry litter weights in g)
- ShanDiv: Shannon Diversity (H) index calculated (ants, termites, earthworms) at the sample point.
- SpecRich: Species richness calculated (includes all 14 taxa for which data were originally collected) at sample point.
- taxa: Termites
- num_ind: Number of individuals observed at a sampling point (integer)
- logBiomass: log-transformed local biomass value (m3 km-2 year-1)
- logRainprev_30: log-transformed precipitation value (in cm) over the 30-day period prior to soil monolith excavation. Not used in final analyses.
- logRainprev_7: log-transformed precipitation value (in cm) over the 7-day period prior to soil monolith excavation. Not used in final analyses.
- logRainprev_2: log-transformed precipitation value (in cm) over the 2-day period prior to soil monolith excavation. Not used in final analyses.
- logSpecAbun: Log-transformed species abundance at sampling point (calculated using observations from all 14 taxa for which data were originally collected).
- logNum_ind: Log-transformed number of individuals observed at a sampling point by taxa.
- presence: Binary indicator of taxon presence at a sampling point within a site (Y = 1, N = 0).
- proportion: Proportion of ecosystem engineering soil macroinvertebrate (ants, termites, earthworms) community at a particular sampling point (e.g., if 10 ants, 5 termites, and 5 earthworms are observed at a sampling point, their proportions will be 0.50, 0.25, and 0.25, respectively).
File: soildat.csv
Description: Intra-site soil composition and litter mass variability (average and standard error values) and corresponding habitat types.
Variables
- SiteID: Unique Site ID (A-R). Each site comprises 5 points, at which replicate monoliths were extracted.
- variable: clay (% clay at sampling point) and litter (average litter weight in [g] at sampling point)
- avg: average value (% clay or litter weight in grams) calculated across 5 replicate sampling points within a given site.
- sterr: Standard error associated with the average value (% clay or litter weight in grams) calculated across 5 replicate sampling points within a given site.
- hab_type: grassland (open, dominated by grasses and/or shrubs), wetland (frequently inundated, low-lying areas of landscape), human-use (areas under intensive daily use by humans and/or livestock), bush/forest (closed canopy areas dominated by woody species).
File: WORMS2.csv
Description: Filtered (earthworm observations only), cleaned dataset in long format with all environmental and site variables. Raw and transformed presence/abundance values and community metrics (i.e., site-level diversity and richness indices) are included.
Variables
- Area: Study area name abbreviation (Enonkishu Conservancy [ENO], Emarti township [EMA], Ol Choro Conservancy [OLC], Lemek Conservancy [LEM], Naretoi Wildlife Estates [NAR], and Mbokishi Conservancy [MBO]).
- Protected: Study area protection status. Binary variable (Y = 1, N = 0).
- SiteID: Unique Site ID (A-R). Each site comprises 5 points, at which replicate monoliths were extracted.
- Point: Unique point ID. Each site has 5 points, numbered arbitrarily (1-5).
- PointID: Unique Point ID (e.g., A1, A2, A3, A4, A5) corresponding to each of the sampling points within a particular site.
- SoilType: Soil type classification based on USDA Soil Texture Triangle classifications. Designation based on calculated volumetric proportions of clay, sand, and silt for each soil monolith extracted.
- PctClay: % clay of soil sample extracted at a sampling point.
- Lat: Sample site latitude (taken from center of sampling site, +/- 5m accuracy)
- Long: Sample site longitude (taken from center of the sampling site, +/- 5m accuracy)
- DistWater: Distance to nearest permanent/semi-permanent water source (i.e., creek, river, pond, wetland) in meters from the center of the sample site.
- Elev: Sample site elevation in meters.
- TreeCov%: % tree cover in 1 km2 surrounding sample site using satellite data. Not used in final analysis.
- Floodplain: Binary; is the sample site within a floodplain area? (Y = 1, N = 0).
- CatGrazing: Categorical, semi-qualitative variable classifying cattle grazing intensity (NONE, LOW, MODERATE, HIGH) based on researcher field observations, conservancy management plans, and discussions with herders and managers.
- Fire: Fire history at sample site within last 2 years (Y = 1, N = 0).
- hab_type: grassland (open, dominated by grasses and/or shrubs), wetland (frequently inundated, low-lying areas of landscape), human-use (areas under intensive daily use by humans and/or livestock), bush/forest (closed canopy areas dominated by woody species).
- FreshWgtCor: Fresh litter weight collected at sample point in grams (g).
- PctWaterCt: % of fresh litter weight as water (calculated as % difference between fresh and dry litter weights in g)
- ShanDiv: Shannon Diversity (H) index calculated (ants, termites, earthworms) at the sample point.
- SpecRich: Species richness calculated (includes all 14 taxa for which data were originally collected) at sample point.
- taxa: Earthworms
- num_ind: Number of individuals observed at a sampling point (integer)
- logBiomass: log-transformed local biomass value (m3 km-2 year-1)
- logRainprev_30: log-transformed precipitation value (in cm) over the 30-day period prior to soil monolith excavation. Not used in final analyses.
- logRainprev_7: log-transformed precipitation value (in cm) over the 7-day period prior to soil monolith excavation. Not used in final analyses.
- logRainprev_2: log-transformed precipitation value (in cm) over the 2-day period prior to soil monolith excavation. Not used in final analyses.
- logSpecAbun: Log-transformed species abundance at sampling point (calculated using observations from all 14 taxa for which data were originally collected).
- logNum_ind: Log-transformed number of individuals observed at a sampling point by taxa.
- presence: Binary indicator of taxon presence at a sampling point within a site (Y = 1, N = 0).
- proportion: Proportion of ecosystem engineering soil macroinvertebrate (ants, termites, earthworms) community at a particular sampling point (e.g., if 10 ants, 5 termites, and 5 earthworms are observed at a sampling point, their proportions will be 0.50, 0.25, and 0.25, respectively).
File: scaled_logcoefs.csv
Description: Environmental variable coefficients scaled using a logarithmic distribution
Variables
- taxa: Taxon for which scaled coefficients are calculated (Ants, termites, and earthworms).
- variable: Environmental site variable for which estimates, standard errors, CIs, and p-values are calculated. logDistWater_scaled (site distance to permanent water source in meters, log-transformed and scaled), logBiomass_scaled (local biomass production in m3 km-2 year-1, log-transformed and scaled), logDistBoma_scaled (distance to nearest boma in meters, log-transformed and scaled), PctWaterCt_scaled (% litter water content by weight in g, scaled), PctClay_scaled (% clay in soil sample, scaled), FreshWgtCor_scaled (fresh litter weight in g, scaled).
- estimate: Site-level coefficient estimate for each given variable.
- se: Standard error value
- lower_ci: Lower 95% CI value
- upper_ci: Upper 95% CI value
- p: Associated p-value (α = 0.05)
File: RDA_obs.csv
Description:
Variables
- hell_Earthworms: Hellinger-transformed earthworm abundance values by sampling point.
- hell_Ants: Hellinger-transformed ant abundance values by sampling point.
- hell_Termites: Hellinger-transformed termite abundance values by sampling point.
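The Hellinger transformation referenced in these variables is conventionally the square root of each taxon's relative abundance within a sampling point. A minimal Python sketch follows, assuming that standard definition was used; the function name `hellinger` and the treatment of empty points are illustrative choices, not taken from the authors' R workflow.

```python
import math


def hellinger(row):
    """Hellinger-transform one sampling point's abundances.

    Each count is divided by the row total and square-rooted.
    Assumption: a point with no individuals is returned as all zeros.
    """
    total = sum(row)
    if total == 0:
        return [0.0 for _ in row]
    return [math.sqrt(n / total) for n in row]


# Hypothetical point: 10 ants, 5 termites, 5 earthworms
transformed = hellinger([10, 5, 5])
print([round(v, 4) for v in transformed])  # [0.7071, 0.5, 0.5]
```

The squared transformed values for a non-empty point sum to 1, which is a quick check that a file has been transformed row-wise.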
File: RDA_env.csv
Description:
Variables
- PctClay: % clay of soil sample extracted at a sampling point.
- Elev: Site elevation in meters (m)
- hab_type: grassland (open, dominated by grasses and/or shrubs), wetland (frequently inundated, low-lying areas of landscape), human-use (areas under intensive daily use by humans and/or livestock), bush/forest (closed canopy areas dominated by woody species).
- FreshWgtCor: Fresh litter weight collected at sample point in grams (g).
- PctWaterCt: % of fresh litter weight as water (calculated as % difference between fresh and dry litter weights in g)
- logDistWater: Log-transformed distance to nearest permanent/semi-permanent water source (i.e., creek, river, pond, wetland) in meters from center of sample site.
- logBiomass: log-transformed local biomass value (m3 km-2 year-1)
File: scaled_nbcoefs.csv
Description: Environmental variable coefficients scaled using a negative binomial distribution
Variables
- taxa: Taxon for which scaled coefficients are calculated (Ants, termites, and earthworms).
- variable: Environmental site variable for which estimates, standard errors, CIs, and p-values are calculated. logDistWater_scaled (site distance to permanent water source in meters, log-transformed and scaled), logBiomass_scaled (local biomass production in m3 km-2 year-1, log-transformed and scaled), logDistBoma_scaled (distance to nearest boma in meters, log-transformed and scaled), PctWaterCt_scaled (% litter water content by weight in g, scaled), PctClay_scaled (% clay in soil sample, scaled), FreshWgtCor_scaled (fresh litter weight in g, scaled).
- estimate: Site-level coefficient estimate for each given variable.
- se: Standard error value
- lower_ci: Lower 95% CI value
- upper_ci: Upper 95% CI value
- p: Associated p-value (α = 0.05)
All missing data are represented as NA.
Code/software
R/RStudio. See .RMD file (Invert_Report.rmd) for versions, packages, and workflow. It aggregates observations by grouping and calculates relative proportions, then prepares datasets for RDA analysis and plotting.
This code loads and cleans multiple ecological datasets, derives transformed and aggregated variables, and then conducts exploratory analyses, statistical tests, and multivariate ordinations to examine taxon abundance, habitat differences, and environmental relationships.
Access information
Other publicly accessible locations of the data:
Study system
The Greater Serengeti-Mara ecosystem (GSME) straddles the Kenya-Tanzania border of East Africa (1°15′-3°30′S, 34–36°E) (Fig. 1A) and hosts one of the most abundant ungulate populations in the world. The protected areas of the GSME form a large swath of unfenced land which facilitates the annual migration of 1.6 million ungulates (McNaughton, 1985). The edge conservancies of the GSME therefore represent the final stronghold buffering wildlife populations from the negative effects of rapid land-use intensification on the fringes of the core protected area (Newmark, 2008; Veldhuis et al., 2019). The GSME, approximately 25,000 km2 in area, is dominated by vast expanses of acacia savannah. Within our study area (1°01′-1°11′S, 35°08′-35°20′E) (Fig. 1B), the landscape is dominated by bush/forest and open grassland habitat types. The spatial ecology of the GSME is due in large part to its prevailing temperature and precipitation patterns. The study region has distinct wet and dry seasons resulting from a bimodal annual rainfall pattern (Bartzke et al., 2018; Norton-Griffiths et al., 1975). Daily temperatures range as low as 7.3 °C and as high as 28.5 °C annually (Mukeka et al., 2019), with daily dry season high temperatures ranging from 23.3 °C to 26.6 °C. Average regional temperatures are increasing in response to land-use intensification-driven vegetation changes (Nduati et al., 2013; Ogutu et al., 2008).
While humans and their livestock have been present in the GSME for millennia (Asiema and Situma, 1994), the last few decades have brought about dramatic shifts in land and livestock management practices resulting in marked environmental changes at the landscape level. Traditional Maasai pastoralism, mainly through the keeping of cattle, holds significant sociocultural and economic importance (Asiema and Situma, 1994). Although often considered more compatible with conservation than other land use types (Western, 1982), the influence of pastoralism on the ecology of protected areas varies greatly by context. While the overstocking of livestock (i.e., cattle, sheep, and goats) may harm local plant and animal assemblages (Lamprey and Reid, 2004), low to moderate livestock grazing intensity may promote grazing succession for herbivorous wildlife species (Herrik et al., 2023).
Within conservancies, the herds of cattle which are ubiquitous on the landscape by day are kept in ‘bomas’ overnight. Bomas are mobile livestock enclosures that create feces-rich patches on the landscape, ultimately altering the physical and chemical characteristics of the soil (Reid and Ellis, 1995; Stelfox, 1986). Each conservancy maintains and implements its own livestock grazing plan; human and livestock use therefore differs between conservancies and across seasons in accordance with annual rainfall patterns. Cattle management strategies vary spatiotemporally, ranging from the full exclusion of cattle in core areas to year-round rotational grazing within structured grazing blocks.
Our sampling areas primarily consisted of four conservancies which make up the northernmost end of the GSME (Table 1; Fig. 1B). It is here, between the fenceless conservancies to the south and densely populated human settlement areas to the north, that the human-livestock-wildlife interface is most vulnerable to the effects of land use change. To investigate the extent to which permanent livestock and/or human settlement impact soil macrofauna communities, we also sampled in the neighboring Emarti village (EMA) and neighboring private wildlife estate, Naretoi Wildlife Estate (NAR). Emarti is a village densely populated by humans and livestock alike. Naretoi, by contrast, is a ‘re-wilded’ human estate development made up of privately-owned plots.
Macrofaunal sampling design
For the purposes of our study, we chose to focus on three soil invertebrate groups: ants (Hymenoptera: Formicidae), termites (Insecta: Isoptera), and earthworms (Annelida: Oligochaeta) due to their relative abundance in soil communities and for their ecological role as ecosystem engineers (Anderson, 1995; Beare et al., 1997). However, data were recorded for 17 groups of soil macroinvertebrates sampled at our sites (Supplementary Table S1), defined as any soil animal which exceeds 2 mm in length (Swift et al., 1979). Specimens were identified using a dichotomous key published by the Natural History Museum (Potts and Holt, n.d.). Relevant keys for the identification of East African soil macroinvertebrates to the species level were not available and thus species-level observations are not reported. Fortunately, changes in group composition at a higher taxonomic scale (i.e., class, order) have been posited to better indicate the effects of changes in soil biodiversity on ecosystem function than changes at finer taxonomic resolutions (Beare et al., 1997).
Following the Tropical Soil Biology & Fertility (TSBF) sampling protocol (Anderson and Ingram, 1994), we sampled soil macrofauna by excavating 25 x 25 x 20 cm (length x width x depth) soil monoliths at five replicate points at each randomly selected site location within each nested habitat and study area stratum. A monolith is an intact vertical section of soil carefully extracted from the ground to preserve its physical characteristics (Taylor, 1960). GPS coordinates (accuracy within five meters) were taken for the central point at each site, and replicate points were each placed 15 m from the central point perpendicular to one another (Supplementary Fig. S1). Surface-dwelling macroinvertebrates were collected prior to monolith extraction to minimize sampling bias during the collection process. After the extraction at each sample point within a given site, soil samples were hand sorted without replacement to remove all soil invertebrate animals >2 mm in body length. Furthermore, the dry, clay-laden soils observed throughout the study region minimized concern of subterranean soil animal escape during the soil monolith excavation process. After identification, extracted organisms were fixed in test tubes containing 70% alcohol. We sampled at 18 sites across four habitat types. Each site had five nested sample points, resulting in the excavation of 90 monoliths across bush/forest (n = 15), grassland (n = 40), human use (n = 15), and wetland (n = 20) habitat types (Table 2) from June–July during the years 2022 and 2023. This period marks the annual transition between the region’s late wet and early dry seasons (Norton-Griffiths et al., 1975). The hand-sorting method was implemented in accordance with the TSBF sampling protocol (Anderson and Ingram, 1994) to minimize any potential undersampling of earthworms in the study system which may be in seasonal diapause.
With respect to earthworms, we employed a visual approach to distinguish between aestivated, juvenile, and adult earthworms (Supplementary Fig. S2). Juveniles lack the clearly visible clitellum possessed by adult earthworms (Edwards, 2004; Sims and Gerard, 1985). Aestivated individuals, possessing tightly-wrapped, shrunken bodies bound by mucus and pebble-coated cocoons, were extracted from aestivation chambers in the soil (Storey and Storey, 2012). This quiescent state may cause external features, including reproductive organs, to become less prominent or obscured due to the worm’s contracted posture and protective mucous layers (Bayley et al., 2010; Juan et al., 2000). Most earthworms collected were juveniles, aestivated, or adults whose growth was stunted by drought conditions, thereby failing to meet the assumptions of existing species-level dichotomous keys. Such low numbers of reproductively mature earthworms in combination with a paucity of keys for the identification of East African soil macroinvertebrates therefore rendered the identification of earthworms to species level infeasible for inclusion in this analysis.
Characterization of environmental features
Habitat types were selected to include those which characterize the local landscape: bush/forest, grassland, human use, and wetland areas. A stratified site sampling approach was taken to ensure accurate habitat-level representation of the local landscape during data collection. Bush/forest areas are characterized by closed canopy cover dominated by woody species (Sites O, P, and R). Grasslands are open expanses with plant communities made up of grasses and forbs (Sites A, B, E, H, K, L, M, and Q) which dominate the local landscape. Human-use areas experience direct day-to-day effects of human settlement (Sites D, F, and G). Our human-use study sites included the town center which experiences high vehicle and foot traffic in addition to high livestock grazing intensity (Site D), a schoolyard subject to intermittent cropping and livestock grazing (Site F), and a family garden (Site G). Our wetland habitat zones experience frequent soil water inundation, often at elevational minima within the local landscape (Sites C, I, J, and N). Note that floodplains and wetland areas are not intrinsically mutually inclusive. Low-lying wetland areas experience predictable soil water inundation while floodplains are subject to catastrophic flooding at infrequent intervals (McClain et al., 2014).
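The site assignments in this paragraph can be cross-checked against the monolith counts reported in the sampling design (18 sites, five points each, 90 monoliths). The Python sketch below encodes the site letters exactly as listed here; the variable names are illustrative and not part of the dataset.

```python
# Site letters per habitat type, as listed in the text
SITES_BY_HABITAT = {
    "bush/forest": ["O", "P", "R"],
    "grassland": ["A", "B", "E", "H", "K", "L", "M", "Q"],
    "human use": ["D", "F", "G"],
    "wetland": ["C", "I", "J", "N"],
}
POINTS_PER_SITE = 5  # five replicate monoliths per site

# Monoliths per habitat = number of sites x replicate points
monoliths = {hab: len(sites) * POINTS_PER_SITE for hab, sites in SITES_BY_HABITAT.items()}
all_sites = sorted(s for sites in SITES_BY_HABITAT.values() for s in sites)
total = sum(monoliths.values())

print(monoliths)  # {'bush/forest': 15, 'grassland': 40, 'human use': 15, 'wetland': 20}
print(len(all_sites), total)  # 18 90
```

The same check can be run against the SiteID and hab_type columns of the CSV files to confirm that each site carries exactly five points.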
Site-level habitat type was characterized in situ in combination with satellite imagery. Environmental and management information was recorded at each point of soil fauna extraction. The following metrics were recorded to characterize environmental variation: distance to water (m), distance to boma (m), litter water content (%), fresh litter mass (g m⁻²), vegetation biomass (m³ km⁻² year⁻¹) (Food and Agriculture Organization, 2002), and soil clay content (%).
Geospatial data quantifying distances to water and bomas were analyzed using QGIS (version 3.28, 2023). Upon placement of the 25 × 25 × 20 cm metal frame used to designate the location of each replicate sample point, all surface organic litter within the frame was collected. Litter samples were then weighed twice: once immediately following collection and once after drying completely. The difference between the two measurements was used to calculate the proportion of water by mass in each litter sample at the time of collection.
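The litter calculations described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes water content is expressed relative to fresh mass (as "at the time of collection" suggests) and that litter mass is scaled by the 25 × 25 cm frame footprint. The function names and example masses are hypothetical.

```python
# Assumption: the frame footprint is 25 cm x 25 cm = 0.0625 m^2 (from the text).
FRAME_SIDE_M = 0.25


def litter_water_content_pct(fresh_g, dry_g):
    """Percent water by mass, relative to fresh litter mass (assumed basis)."""
    if fresh_g <= 0:
        raise ValueError("fresh mass must be positive")
    if dry_g < 0 or dry_g > fresh_g:
        raise ValueError("dry mass must lie between 0 and the fresh mass")
    return 100.0 * (fresh_g - dry_g) / fresh_g


def litter_mass_per_m2(mass_g, side_m=FRAME_SIDE_M):
    """Scale the litter mass collected inside the frame to g per square metre."""
    return mass_g / (side_m * side_m)


# Hypothetical sample: 10.0 g fresh litter that dries down to 7.5 g.
water_pct = litter_water_content_pct(10.0, 7.5)
fresh_mass_g_m2 = litter_mass_per_m2(10.0)
```

If water content were instead expressed on a dry-mass basis, the denominator in the first function would be the dry mass; the source does not state which convention was used.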
Soil samples were collected by hand at each sample point following litter collection. Percent clay content was determined via soil particle size analysis (Gee and Bauder, 1986), which assumes that soil particle types fall into distinct size classes. Sand particles are largest (2000–50 μm), followed by silt particles (50–2.0 μm) and clay particles (<2.0 μm). Following the protocol of Gee and Bauder (1986), we created a homogeneous slurry of soil and water, which was then left to settle. After soil particles had stratified into distinct horizons by particle size, relative proportions of soil particle size classes were calculated to determine percent soil clay particle content.
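As a small illustration of the size classes and the final proportion step, the sketch below classifies a particle diameter and computes percent clay from three fraction amounts. The handling of boundary values (for example, whether exactly 50 μm counts as sand or silt) and the unit of the fraction amounts are assumptions; the source does not specify how the stratified layers were quantified.

```python
def classify_particle(diameter_um):
    """Assign a particle diameter (micrometres) to a size class.

    Assumed boundaries: clay < 2, 2 <= silt < 50, 50 <= sand <= 2000.
    """
    if diameter_um < 0:
        raise ValueError("diameter must be non-negative")
    if diameter_um < 2.0:
        return "clay"
    if diameter_um < 50.0:
        return "silt"
    if diameter_um <= 2000.0:
        return "sand"
    return "coarse fragment"


def percent_clay(sand, silt, clay):
    """Percent clay from three fraction amounts measured in the same unit."""
    total = sand + silt + clay
    if total <= 0:
        raise ValueError("total of the three fractions must be positive")
    return 100.0 * clay / total


# Hypothetical fractions, in any consistent unit.
clay_pct = percent_clay(sand=60.0, silt=25.0, clay=15.0)
```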
Statistical analysis
Abundance data for each taxon were subjected to a log (n + 1) transformation prior to statistical analyses to normalize distributions while accommodating zero counts. Log (n) transformations were carried out for continuous variables that had high variances or did not fit a normal distribution and for which no zeroes were present (i.e., distance to water and distance to boma). Quantiles, means, and standard error (SE) values were calculated on transformed abundance data for each taxon across habitat types. Analysis of variance (ANOVA) and Tukey tests were carried out to assess the magnitude of the true difference in mean taxon abundances and to calculate pairwise confidence intervals between taxa. All p-values reported in the text are derived from these Tukey tests.
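The transformation and summary step might look like the sketch below. It assumes the natural logarithm (the base is not stated in the text) and a standard error computed from the sample standard deviation; the counts are hypothetical, not values from the data files.

```python
import math
import statistics


def log_n_plus_1(counts):
    """Apply a natural-log (n + 1) transformation to raw abundance counts."""
    return [math.log1p(n) for n in counts]


def mean_and_se(values):
    """Return (mean, standard error), with SE = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a standard error")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical ant counts at five replicate points; zeros stay at zero.
ant_counts = [0, 3, 12, 0, 7]
transformed = log_n_plus_1(ant_counts)
mean_log, se_log = mean_and_se(transformed)
```

Because log(0 + 1) = 0, points where a taxon was absent remain at zero after transformation, which is why the n + 1 offset is needed for count data but not for the strictly positive distance variables.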
Our point-presence proportion metric captured the proportion of replicate points within a sample site at which a particular taxon was present. These point-presence proportion quantile values, means, and SE values were calculated for all taxa across habitat types. Again, ANOVA and Tukey tests were employed to assess the true difference in mean point-presence proportions between taxa.
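A minimal sketch of the point-presence proportion for one taxon at one site is shown below. The function name and the example counts are hypothetical.

```python
def point_presence_proportion(counts_by_point):
    """Fraction of replicate points at a site where the taxon was present."""
    if not counts_by_point:
        raise ValueError("a site needs at least one replicate point")
    present = sum(1 for count in counts_by_point if count > 0)
    return present / len(counts_by_point)


# Hypothetical earthworm counts at five replicate points within one site.
proportion = point_presence_proportion([0, 3, 0, 1, 5])  # present at 3 of 5 points
```

The metric ignores how many individuals were found at a point and records only whether any were found, so it complements the abundance data rather than duplicating them.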
We Hellinger-transformed our raw taxon abundance data prior to carrying out a redundancy analysis (RDA). Upon ordination, Euclidean values for axes 1 and 2 (RD1 and RD2, respectively) were extracted and plotted separately from the initial RDA biplot. Convex hulls were constructed around Euclidean values for each habitat type using the stat_chull() function from R’s ‘ggpubr’ package (Kassambara, 2022) to model the proportion of variance in taxon abundances explained by the environmental predictors included in the model. Mean Euclidean values were calculated for each habitat type and included in the plot to characterize differences in environmental characteristics within and between habitat types.
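For readers unfamiliar with the Hellinger transformation, the sketch below shows the standard definition: each count is divided by its row (sampling point) total and then square-rooted. It is an illustration only. In R this is commonly done with vegan's decostand(method = "hellinger"), but the source does not say which function was used, and returning zeros for an empty row is an assumption made here.

```python
import math


def hellinger_transform(abundance_rows):
    """Hellinger-transform a site-by-taxon table given as a list of rows.

    Each value becomes sqrt(count / row total). Rows with no individuals
    are returned as all zeros (an assumption for this sketch).
    """
    transformed = []
    for row in abundance_rows:
        total = sum(row)
        if total == 0:
            transformed.append([0.0] * len(row))
        else:
            transformed.append([math.sqrt(x / total) for x in row])
    return transformed


# Hypothetical table: two sampling points, two taxa.
table = hellinger_transform([[1, 3], [0, 0]])
```

After transformation, the squared values in each non-empty row sum to 1, which reduces the weight of very abundant taxa before the ordination.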
We constructed logistic and negative binomial regression models to explain patterns of soil invertebrate presence and abundance (Supplementary Table S2). Coefficient estimates and 95% confidence intervals (CIs) were extracted from these models to construct forest plots. Effect sizes for all predictors included in the model were scaled and centered for improved model convergence and interpretability.
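Scaling and centering can be sketched as a z-score standardization of each predictor. This assumes the sample standard deviation (n - 1 denominator), which is also the default behavior of R's scale(); the source does not state how the scaling was performed, and the example values are hypothetical.

```python
import statistics


def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot scale a constant predictor")
    return [(v - mean) / sd for v in values]


# Hypothetical predictor values, e.g. litter water content (%) at three points.
scaled = center_and_scale([1.0, 2.0, 3.0])
```

With predictors on a common scale, each coefficient describes the change associated with a one standard deviation shift, which is what makes the coefficients comparable in a forest plot.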
Shannon diversity (H) was calculated by applying the diversity() function with the argument index = “shannon” from the ‘vegan’ R package (Dixon, 2003) to the raw abundance data for all 17 taxa recorded at each sampling point. Taxonomic density at a sampling point was calculated as the total number of taxonomic groups for which one or more individuals were sampled. Point-level evenness was calculated by dividing Shannon diversity by the natural log (ln) of the taxonomic density at each sampling point. Pairwise differences between mean values of Shannon diversity, taxonomic density, and evenness indices across habitat type were calculated using ANOVA and Tukey tests.
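The three point-level indices can be sketched as below, using the standard formula H = -Σ p ln p. Evenness is undefined when fewer than two taxa are present (ln 1 = 0), so this sketch returns None in that case; how the authors handled such points is not stated, and the counts are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty sample has zero diversity
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def taxonomic_density(counts):
    """Number of taxonomic groups with one or more individuals."""
    return sum(1 for c in counts if c > 0)


def evenness(counts):
    """H / ln(taxonomic density); None when fewer than two taxa are present."""
    s = taxonomic_density(counts)
    if s < 2:
        return None
    return shannon_diversity(counts) / math.log(s)


# Hypothetical point with two equally abundant taxa and one absent taxon.
point_counts = [5, 5, 0]
h = shannon_diversity(point_counts)
j = evenness(point_counts)
```

Two equally abundant taxa give H = ln 2 and an evenness of 1, the maximum possible value.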
Statistical tests were also carried out to test for significant within- and between-site differences in community composition (i.e., Shannon diversity, taxonomic density, and evenness indices). Kruskal-Wallis tests and Dunn’s multiple comparisons tests using rank sums were carried out to test for significant within-site differences. Kruskal-Wallis chi-squared and adjusted p-values were calculated by applying the dunn.test() function with the argument method = “bonferroni” from the ‘dunn.test’ R package (Dinno, 2024). Pairwise differences between mean values of Shannon diversity, taxonomic density, and evenness indices between sites were calculated using ANOVA and Tukey tests.
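The Bonferroni adjustment applied to the pairwise comparisons can be illustrated with the textbook rule: multiply each p-value by the number of comparisons and cap the result at 1. This sketch shows the general technique only and is not a reimplementation of the ‘dunn.test’ package, whose output conventions may differ in detail. The p-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values from three pairwise comparisons.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
```

The adjustment is conservative: with three comparisons, an unadjusted p-value must fall below roughly 0.0167 to remain below 0.05 after correction.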
- North, Gretchen; Frelich, Lee E.; Guthmann, Abby E. (2025). Data from: Soil macrofauna communities vary by land use type and environmental conditions in the Serengeti-Mara ecosystem. Zenodo. https://doi.org/10.5281/zenodo.10578437
- North, Gretchen; Frelich, Lee E.; Guthmann, Abby E. (2025). Data from: Soil macrofauna communities vary by land use type and environmental conditions in the Serengeti-Mara ecosystem. Zenodo. https://doi.org/10.5281/zenodo.10578438
- North, Gretchen C.; Frelich, Lee E.; Guthmann, Abby E. (2025). Soil macrofauna communities vary by land use type and environmental conditions in the Serengeti-Mara ecosystem. Applied Soil Ecology. https://doi.org/10.1016/j.apsoil.2025.105897
