# Data from: Megafauna decline have reduced pathogen dispersal which may have increased emergent infectious diseases

## Cite this dataset

Doughty, Chris et al. (2020). Data from: Megafauna decline have reduced pathogen dispersal which may have increased emergent infectious diseases [Dataset]. Dryad. https://doi.org/10.5061/dryad.dfn2z34xk

## Abstract

The Late Quaternary extinctions of megafauna (defined as animal species > 44.5 kg) reduced the dispersal of seeds and nutrients, and likely also microbes and parasites. Here we use body-mass based scaling and range maps for extinct and extant mammal species to show that these extinctions led to an almost seven-fold reduction in the movement of gut-transported microbes, such as Escherichia coli (3.3–0.5 km 2 d − 1 ). Similarly, the extinctions led to a seven-fold reduction in the mean home ranges of vector-borne pathogens (7.8–1.1km 2 ). To understand the impact of this, we created an individualbased model where an order of magnitude decrease in home range increased maximum aggregated microbial mutations 4-fold after 20 000 yr. We hypothesize that pathogen speciation and hence endemism increased with isolation, as global dispersal distances decreased through a mechanism similar to the theory of island biogeography. To investigate if such an effect could be found, we analysed where 145 zoonotic diseases have emerged in human populations and found quantitative estimates of reduced dispersal of ectoparasites and fecal pathogens significantly improved our ability to predict the locations of outbreaks (increasing variance explained by 8%). There are limitations to this analysis which we discuss in detail, but if further studies support these results, they broadly suggest that reduced pathogen dispersal following megafauna extinctions may have increased the emergence of zoonotic pathogens moving into human populations.

## Methods

### Materials and Methods

#### Impact of megafauna extinctions on microbial and blood parasite movement

We estimated current global ecto-parasite and fecal pathogen dispersal patterns using the IUCN mammal species range maps for all extant species (removing all bats because mass scaling of dispersal for bats is inaccurate) (N=5,487). To create maps of dispersal patters for a world without the Pleistocene megafauna extinctions, we added species range maps (N=274) of the now extinct megafauna (within 130,000 years) created in Faurby & Svenning (2015a) to the current IUCN based dispersal maps. These ranges estimate the natural range as the area that a given species would occupy under the present climate, without anthropogenic interference. In cases of evident anthropogenic range reductions for extant mammals, like the Asian elephant (*Elephas maximus*), the current ranges encompass only the IUCN defined ranges but the world without extinctions includes the ranges on these extant animals prior to anthropogenic range reductions. The taxonomy of recent species followed IUCN while the taxonomy of extinct species (which were included if there are dates records less than 130,000 years old) followed Faurby & Svenning (2015b). For each living and extinct animal species, we use body mass estimates (Faurby & Svenning, 2016), with the few species lacking data assigned masses based on the mass of their relatives. We used the following mass based (M: average body mass per species (kg)) scaling equations (recalculated from primary data in Figure S1) to estimate home range (Kelt and Van Vuren, 2001), day range (Carbone et al 2005), and gut retention time (Demment and Van Soest, 1985, Demment 1983):

** Equation 1** - Home Range

HR (km^{2}) = 0.04*M^{1.09}

This dataset, originally compiled by Kelt and Van Vuren (2001) (N=113 mammalian herbivores), used the convex hull approach to calculate home range and found the mass-based scaling to be highly size dependent (with mass scaling exponent of >1)

** Equation 2** –Mean home range for all mammals per pixel (MHR) or ectoparasite dispersal

MHR (km^{2}) = Σ HR_{i} / n

(i: per pixel species number; n: = number of mammal species per pixel)

We define the mean ectoparasite dispersal per pixel as the average distance a pathogen could travel across all mammals present in the pixel and assuming an equal change of colonizing any mammal species.

Next, we estimate fecal pathogen diffusivity with the following equations. We start with day range (daily distance travelled) originally from Carbone et al 2005 (N=171 mammalian herbivores) but recalculated from primary data in Figure S1.

** Equation 3** - Day Range (DR)

DR (km/day) = 0.45*M^{0.37}

Next, to estimate the minimum time a generalist microbe might stay in the body of a mammalian herbivore, we use passage time:

** Equation 4** - Passage time (PT) from (Demment and Van Soest, 1985, Demment 1983)

PT (days) = 0.589*D*M^{0.28}

Where D is digestibility, which we set to 0.5 as a parsimonious assumption because the actual value is unknown for many extant and extinct animals.

Distance between consumption and defecation or straight line fecal transmission distance is simply multiplying equation 3 by equation 4:

** Equation 5** – Straight-line fecal transmission distance (FTD)

FTD (km) = DR * PT

However, animals rarely move in a straight line, and without any additional information, we can assume a random walk pattern with a probability density function governed by a random walk as:

** Equation 6** – Random walk transmission (RWT) per species

RWT (km^{2}/day) = (FTD)^{2}/(2*PT)

Here we define the mean fecal diffusivity as the mean range in any pixel a generalist microbe could travel during its lifetime assuming an equal chance of colonizing any mammal species.

** Equation 7** – Mean Fecal Diffusivity (FD)

FD (km^{2}/day) = ∑RWT_{i}/ n

(i: per pixel species number; n: = number of mammal species per pixel)

This equation represents the average distance a fecal pathogen would travel in an ecosystem if it had an equal chance of being picked up by any nearby species walking in a random walk.

#### EID modelling

We then tested whether these changes in pathogen dispersal distance could help explain the location of 145 new zoonotic diseases that emerged over the past 64 years (Jones et al., 2008). Jones et al 2008 searched the literature to find biological, temporal and spatial data on 335 human EID ‘events’ between 1940 and 2004 of which 145 were defined as zoonotic. We also divide our analysis into vector driven (Table S3), non-vector driven (Table S4) and all diseases (Table 2). To control for spatial reporting bias, they estimated the mean annual per country publication rate of the Journal of Infectious Disease (JID). However, this is not a perfect control for reporting bias as it may bias towards first world countries. In their paper, they used predictor variables of log(JID), log(human population density), human population growth rates, mean monthly rainfall, mammal biodiversity, and latitude. We repeat this study but add our six data layers shown in Figure 1 of animal function, as well as other variables such as rainfall seasonality, total biodiversity (species richness including the now extinct megafauna), biomass weighted species richness and the change in biomass weighted species richness. In total, we tested 16 variables against the EID outbreaks (explained in Table S1). In addition to the 145 known EID outbreaks, we randomly generated ~five times more random points (>600 points) to compare them (all results in the paper are the average of three separate runs where the control points vary randomly) (see Figure S2 as an example distribution).

We then used the Ordinary Least Squares (OLS) multiple regression models to predict the EID events. We used Akaike’s Information Criterion (AIC) for model inter-comparison, corrected for small sample size. Whenever spatial data are used there is a risk of autocorrelation because points closer to each other will have more similar signals than points far from each other. We therefore used Simultaneous Auto-Regressive (SAR^{err}) models (Table 2) to account for spatial autocorrelation (Dormann et al., 2007) using the R library ‘spdep’ (Bivand, Hauke, & Kossowski, 2013). SAR-err reduces the sample size by assuming that all outbreaks within the same neighbourhood are the same. We examined possible neighbourhood sizes to determine how effective each was at removing residual autocorrelation from model predictions. We defined neighbourhoods by distance to the sample site. We tried distances from 5 km to 300 km and found that AIC was minimized at 200 km (Figure S3 – the average of 16 simulations). Following this reduction of our dataset, our correlogram (Figure S4) indicates vastly reduced spatial autocorrelation. We estimated the overall SAR model performance by calculating the square of the correlation between the predicted (only the predictor and not the spatial parts) and the raw values. We will refer to this as pseudo-R^{2} in the paper even though we are aware that several different estimates of model fit are frequently referred to as pseudo-R^{2}. We also did a VIF analysis using the R package usdm (Naimi et al., 2014) to control for multicollinearity and all VIFs of the predictor variables are below 1.5 showing little multicollinearity.

#### Individual based model

To establish whether the loss of terrestrial megafauna increased microbe heterogeneity, we used Matlab (Mathworks) to create an individual based model (IBM) with two randomly distributed animal species carrying a generalist microbe. We varied our model assumptions (parentheses below) in sensitivity studies (Tables S7). The IBM consisted of a 500x500 cell grid (300x300 and 1000x1000 – in our sensitivity study, we tried big and small grids) with species A in 10% (5 and 20%) of randomly selected cells and species B also in 10% (5 and 20%) of cells. 10% (5 and 20%) of animals contained the generalist microbe. We then created a 9 by 9 grid around each of species A. This was considered the home range of the species and the group of animals would interact with all other groups of animals within that home range. We assumed the home range of species B to be one grid cell. We make a simple assumption that mutations in this generalist microbe increase linearly with time until two animals interact, at which point the microbe was assumed to have been shared and the accumulated difference between host microbiomes is reset to zero. Later, we reduced the home range of species A from a 9x9 to a 3x3 grid, mimicking the decline in dispersal following the extinctions. We then, at each time step, identified the microbe with the highest accumulated mutation rate within the 500 by 500 grid for the megafauna world (9 by 9 simulation) and the post extinction world (3 by 3 simulations) (Figure 2). In order to parameterize the model with real world values, we assigned a single time step an arbitrary value of a single year (see justification in supplementary methods). The model was run for 20,000 years, putting the range reduction of species A at around 10,000 years ago, an approximate date for a large part of the Late Pleistocene extinctions.

## Usage notes

#### Figure map

**Equation 1 and 3** - To determine the initial allometric scaling relationships run the program codeHR.m. This requires the .csv data called MergeTraitsMSW_slim. This will produce graphs with the coefficients for Eq 1 and 3. Also, data and stats are in spreadsheet HRandDR.xls.

#### Figure 1 and table 1

To create the dispersal maps, run the program diseasezfirst.m. The top part of the code will create the past dispersal maps. These include not just extinct mammals, but former range maps of extant species, like elephants that have a much reduced range. To avoid double counting, I use column 3 of the extinct variable to identify these species. I then use the code spnotextinct.m to find the current maps to subtract from these. This is done in lines 21-31. From lines 68, I calculate the same variables for the current IUCN dataset. Line 72 removes bats from the analysis.

To create table 1 – run createtable1.m – output is on line 110

To create maps from Figure 1, change the input on line 118

**Figure 2 - **For Figure 2 – to create the figure, the data is in 20000all.mat. The code to create that data is IBMdisease2.m. To get tables S7 and S8, we vary parameters in this same code.

**Table 2 -**Input datasets are created in diseasezfirst.m as explained above. Use the sortcountriez.m code to create the country level JID and population datasets. The saved maps from these datasets go into the datasets for the code finaldiseasecode.m.

Modify q in line 6 to choose the dataset to analyze. q=1=vector, q=2=notvector, q=3=all of the EIDs from Jones et al.

Code description - Line 50 eliminates points too close to each other to reduce spatial autocorrelation. Line 122 calculates the key megafauna variables. From line 158, this finds and aggregates all the data for the disease pixels. From line 270, finds random points on the land surface and generates data for these pixels. From line 360, saves the data to analyze with the R code. Dataz3.xls is the dataset used as input into the R code

Then run the r code SAR.r. In line 7, choose one of three variables from the matlab code (dataz1, dataz2, dataz3). Choose the variables by modifying the dz variable. To produce the data in Table 2 run SAR.r– uncomment the dz term for the three different models. For instance, to get model FD uncomment: dz = cbind(y1, y2, y6, y11)

Table 2 and Table S2 takes the pseudo r2 the aic values the VIF and the model coefficints from these data.

**Table S3** - To produce the data in Table S3 use SARnovector.r and input dataz2.xls . uncomment the dz term for the three different models. For instance, to get model FD uncomment: dz = cbind(y1, y2, y6, y11)

**Table S4** - To produce the data in Table S4 use SARvector.r and input dataz1.xls. uncomment the dz term for the three different models. For instance, to get model FD uncomment: dz = cbind(y1, y2, y6, y11)

**Figure 3** – This figure was produced by Victor Leshyk and the data come from Table 1.

**Figure 4** - For figure 4, we use the equations listed in Table 2 model FD to estimate EID likelihood. We use the code predictor.m. To produce the numbers listed in the final results paragraph, for model FD on line 37 choose 0, and for model HR, choose 1. Use lines 80 and 82 to create maps 4a and b.