Data and scripts from: Most mammals do not wander: few species escape continental endemism
Data files
Jul 03, 2025 version files (76.40 KB)
- README.md (4.12 KB)
- Supplementary_Files.zip (72.28 KB)
Abstract
Terrestrial mammals are found nearly everywhere on earth. Yet, most taxa are endemic to a single continent; geological, evolutionary, ecological or physiological filters constrain geographic distributions. Here, we synthesize data on geography, taxonomy, lineage age, dispersal, body size, and diet for >4,000 terrestrial mammals prior to detectable human-mediated biodiversity losses and quantify factors correlated with the likelihood of dispersal between continents. We confirm the uniqueness of being on multiple continents: excluding humans and commensals, only 260 mammals are found on two continents, while six span three or more continents (the red deer, red fox, brown bear, least weasel, and common bent-wing bat), and just a single species—the lion—once had a geographic range that included four continents. Clearly, the challenges of colonizing and persisting on multiple continents are severe. No single characteristic enables taxa to be on more than one continent. Rather, a suite of prerequisite conditions under some circumstances lead to distributions spanning multiple continents. Interestingly, the suite of factors facilitating the occupation of two continents, like being volant, is distinct from those that lead to the occupation of three or more, which are primarily faunivores. Other than humans and our commensals, very few species have become truly cosmopolitan over evolutionary time and geographic space.
https://doi.org/10.5061/dryad.1g1jwsv69
Description of the data and file structure
Files and variables
File: familyOrigin_Oct2024.xlsx
Description: A dataset of mammalian families and the continent of first fossil occurrence as of Oct. 2024. Sheet 1 has the data; Sheet 2 "origin of place reference" contains the references used to determine the continent of origin for each family; Sheet 3 "origin of date reference" contains the references for the first occurrence of each family. Empty cells indicate that no data were available at the time of the dataset's creation.
Variables for Sheet 1
- family: Accepted mammalian family based on Wilson & Reeder 2005 or the literature.
- continent: Continent where the family originated, which can be Eurasia, Africa, Australia, North America (North.America), or South America (South.America).
- continent_reference: A number that corresponds to a reference in Sheet 2 "origin of place reference".
- first origin time mya: Time of the first occurrence of a species of that family, in millions of years ago, given as a single number or an interval.
- time_reference: A number that corresponds to a reference in Sheet 3 "origin of date reference".
- notes: An open field for comments.
File: familyOrigin_Oct2024.csv
Description: A .csv version of only the data values (i.e., not references) of familyOrigin_Oct2024.xlsx.
Variables
- see above (Sheet 1 of familyOrigin_Oct2024.xlsx)
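A minimal sketch of loading the CSV in R and checking the continent column against the five values listed above. The helper name `check_continents` and the example rows are hypothetical; only the column name `continent` and the allowed values come from the variable descriptions, and the sketch assumes the CSV keeps the same column names as Sheet 1.

```r
# Allowed values for the `continent` column, as listed in the variable description.
allowed_continents <- c("Eurasia", "Africa", "Australia",
                        "North.America", "South.America")

# Hypothetical helper: returns the rows whose continent is neither
# empty/NA nor one of the allowed values.
check_continents <- function(df) {
  value <- trimws(as.character(df$continent))
  missing <- is.na(value) | value == ""
  df[!missing & !(value %in% allowed_continents), , drop = FALSE]
}

# Typical use, assuming the file sits in the working directory:
# family_origin <- read.csv("familyOrigin_Oct2024.csv",
#                           check.names = FALSE, stringsAsFactors = FALSE)
# check_continents(family_origin)

# Made-up rows for illustration only (not taken from the dataset).
example <- data.frame(
  family = c("FamilyA", "FamilyB", "FamilyC"),
  continent = c("Africa", "North America", NA),
  stringsAsFactors = FALSE
)
bad_rows <- check_continents(example)
print(bad_rows)
```

Setting `check.names = FALSE` keeps column names that contain spaces, such as "first origin time mya", unchanged on import.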
Code/software
Setup
Follow these steps to set up the project:
Download associated data
- Download Supplementary_Files.zip and unzip it.
- Download the remaining data and convert them to the appropriate file types following the instructions in "Analysis.R". These are commented in the script.
Download data
- Supplementary_Files.zip also contains the file "familyOrigin_Oct2024.xlsx".
- Convert the .xlsx file to .csv [CSV UTF-8 (Comma delimited) (.csv)]. This file is also provided.
Set up the environment
- Open R. Set your working directory to the location of your files.
- Be sure to install the following packages:
- dplyr (1.1.4)
- purrrlyr (0.0.8)
- tidyverse (2.0.0)
- tidyr (1.3.1)
- reshape2 (1.4.4)
- ggplot2 (3.5.1)
- stringr (1.5.1)
- gcookbook (2.0)
- scales (1.3.0)
- rpart (4.1.23)
- rpart.plot (3.1.2)
- randomForest (4.7.1.1)
- caret (6.0.94)
- MASS (7.3.60.2)
- mfx (1.2.2)
- stargazer (5.2.3)
- raster (3.6.30)
- data.table (1.15.4)
- tiff (0.1.12)
- paleobioDB (1.0.0)
- ape (5.8)
- doParallel (1.0.17)
- Run the scripts.
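As a convenience, the package list above can be checked from within R before running the scripts. This is only a sketch: it reports which packages are missing from the current library and does not pin the listed versions; the variable names are illustrative.

```r
# Packages listed above (doParallel is the CRAN package name).
required <- c("dplyr", "purrrlyr", "tidyverse", "tidyr", "reshape2", "ggplot2",
              "stringr", "gcookbook", "scales", "rpart", "rpart.plot",
              "randomForest", "caret", "MASS", "mfx", "stargazer", "raster",
              "data.table", "tiff", "paleobioDB", "ape", "doParallel")

# Names of required packages not found in the current library paths.
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install the current CRAN releases, which may be newer
  # than the versions listed above:
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```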
Access information
Data was derived from the following sources:
- https://github.com/SmithLabUNM/MOM-Database/tree/MOMv11.1
- Smith, F.A., Lyons, S.K., Ernest, S.M., Jones, K.E., Kaufman, D.M., Dayan, T., Marquet, P.A., Brown, J.H. and Haskell, J.P., 2003. Body mass of late quaternary mammals: ecological archives E084‐094. Ecology, 84(12), 3403-3403. https://doi.org/10.1890/02-9003
- Faurby, S., Davis, M., Pedersen, R.Ø., Schowanek, S.D., Antonelli, A. and Svenning, J.C., 2018. PHYLACINE 1.2: the phylogenetic atlas of mammal macroecology. Ecology, 99(11), 2626. https://zenodo.org/records/1250504
- Pacifici, M., Santini, L., Di Marco, M., Baisero, D., Francucci, L., Marasini, G.G., Visconti, P. and Rondinini, C., 2013. Generation length for mammals. Nature Conservation, 5, 89-94. https://doi.org/10.3897/natureconservation.5.5734
- Jones, K.E., Bielby, J., Cardillo, M., Fritz, S.A., O'Dell, J., Orme, C.D.L., Safi, K., Sechrest, W., Boakes, E.H., Carbone, C. and Connolly, C., 2009. PanTHERIA: a species‐level database of life history, ecology, and geography of extant and recently extinct mammals: Ecological Archives E090‐184. Ecology, 90(9), 2648-2648. https://doi.org/10.1890/08-1494.1
Data Collection
We used the Body Mass of Late Quaternary Mammals dataset (Smith et al. 2003), updated to version 11.1. See the supplemental information of the manuscript for details. We additionally collected the continent of family origin ("familyOrigin_Oct2024.csv"). We also added genus-level first appearances from the PaleobioDB (see Analysis.R) and Faurby et al. 2018 (PHYLACINE). We also combined data on geographic range, home range, and age of dispersal from Jones et al. 2009 (PanTHERIA), natural ranges from Faurby et al. 2018 (PHYLACINE), and generation length from Pacifici et al. 2013. We do not republish existing datasets here.
Data cleaning
Data for Analysis
We removed all species records not on a continent (i.e., insular and marine species). We also removed non-native species, including introduced and domesticated species. This is in "Analysis.R" under "TRIM DATA".
Since we do not include previously published data, the script “Analysis.R” includes instructions for finding and uploading other datasets used.
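The trimming step described above amounts to a row filter. The sketch below shows one way to express it in base R; the column names (`continent`, `status`), the codes ("Insular", "Marine", "introduced", "domesticated"), and the rows are all hypothetical, and the actual definitions are in the "TRIM DATA" section of "Analysis.R".

```r
# Made-up records with hypothetical column names and codes.
records <- data.frame(
  species   = c("sp1", "sp2", "sp3", "sp4"),
  continent = c("Africa", "Insular", "Marine", "Eurasia"),
  status    = c("native", "native", "native", "introduced"),
  stringsAsFactors = FALSE
)

# Flag records that are not on a continent, and records that are not native.
not_continental <- records$continent %in% c("Insular", "Marine")
non_native <- records$status %in% c("introduced", "domesticated")

# Keep only native species recorded on a continent.
trimmed <- records[!not_continental & !non_native, , drop = FALSE]
print(trimmed)
```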
Data Manipulation
All data manipulation can be found in "Analysis.R".
We first standardized diet categories under "FIX DIET". We then filled in missing diet values by creating genus-level and higher-level (e.g., family) diet categories under "MAKE GENERIC & FAMILY AVERAGES".
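To illustrate the idea of filling gaps from higher taxonomic levels, here is a sketch that assigns a missing diet from the most common diet in the same genus and, failing that, the same family. This is an assumed stand-in, not the procedure in "Analysis.R": the function names, column names, and rows are hypothetical, and the script may compute its genus and family values differently.

```r
# Most frequent non-missing value in x, or NA if there is none.
modal_value <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  names(sort(table(x), decreasing = TRUE))[1]
}

# Fill missing `diet` values with the modal diet of the same group
# (e.g., group = "genus" or group = "family").
fill_from_group <- function(df, group) {
  group_mode <- tapply(df$diet, df[[group]], modal_value)
  gaps <- is.na(df$diet)
  df$diet[gaps] <- as.vector(group_mode[as.character(df[[group]][gaps])])
  df
}

# Made-up rows for illustration only.
mammals <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  genus   = c("GenusA", "GenusA", "GenusB", "GenusC", "GenusC"),
  family  = c("FamA", "FamA", "FamA", "FamB", "FamB"),
  diet    = c("faunivore", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Genus level first, then family level for anything still missing.
filled <- fill_from_group(mammals, "genus")
filled <- fill_from_group(filled, "family")
print(filled)
```

Species in a family with no recorded diet at all (sp4 and sp5 here) stay missing rather than receiving a guess.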
Analysis
The script for all analyses is "Balk_etal_GlobeTrotters.R". It includes the sections "NUM SP PER CONTINENT", "CONNECTIVITY", "FAMILY ORIGIN", "DIVERSITY OF CLADE", "BODY SIZE", "DIET", "HABITAT MODE", "DECISION TREE", "LOG ODDS", and "GEOGRAPHIC RANGE SIZE".
