Data from: Coevolution of cooperative lifestyles and reduced cancer prevalence in mammals
Data files
Oct 30, 2025 version files 11.83 MB
- consensus_phylogeny.tre (110.20 KB)
- Dataset1.csv (22.63 KB)
- Dataset2.csv (5.66 KB)
- Dataset3.csv (12.96 KB)
- Dataset4.csv (323 B)
- Math_Models_Simulation_Results.ipynb (11.62 MB)
- README.md (47.10 KB)
- SCRIPT_Coevolution_of_cooperative_lifestyles_and_reduced_cancer_prevalence_in_mammals.R (11.41 KB)
- tree_for_database2.txt (547 B)
Abstract
Why cancer is so prevalent among mammals, despite the fact that some species evolved resistance mechanisms, remains an open question. We hypothesized that cancer prevalence and mortality risk might have been fine-tuned by evolution. Using public databases, we show that species with cooperative habits have lower cancer prevalence and mortality risk. By developing a mathematical model, we provide a mechanistic explanation: an oncogenic variant that elicits higher cancer mortality in older and less reproductive individuals is detrimental to cooperative mammalian societies but can lead to a counterintuitive overcompensation in population size and fitness within competitive contexts. The phenomenon of a population increasing in response to a decrease in its per capita survival rate is called the hydra effect, a process never explored in the field of cancer before. Therefore, cancer can be considered as a selected mechanism of biological obsolescence in competitive species.
Dataset DOI: 10.5061/dryad.xgxd254vh
Description of the data and file structure
Dataset1.csv
Cancer mortality risk (CMR) was calculated for each species as the proportion of cancer-related deaths among the total number of records, based on post-mortem pathological records (n = 11,840; Vincze et al., 2022). This information was sourced from Species360 and the Zoological Information Management System (ZIMS). The dataset initially included 191 species, but Dasyuroides byrnei was removed because its extremely high CMR was considered an outlier. These CMR data were gathered from mammals in zoos worldwide, providing high-resolution cause-of-death data. CMR was estimated from neoplastic samples that substantially contributed to the animal's death, as confirmed by necropsy. The CMR estimate for every species in this dataset is based on more than 20 necropsies per species (mean = 62).
Dataset2.csv
Prevalence of neoplasia was estimated as the prevalence of any neoplasm in mammalian species from San Diego's zoos (Boddy et al., 2020). The dataset initially included 37 species, but Loxodonta africana was removed because of incongruences with other publications reporting lower cancer rates. The prevalence of neoplasia estimated for the species in this dataset is based on an average of 23 necropsies per species. Vulpes zerda, Puma concolor, Canis mesomelas, Lama glama, Lycaon pictus, Tarsius syrichta, Macropus rufus, and Equus asinus are the only species with fewer than 10 necropsies analyzed.
Dataset3.csv
We used a recently curated and standardized dataset of malignancy prevalence across mammalian species that is based on more than 20 necropsies per species (Compton et al., 2025). This resource includes additional species not considered in the other datasets. In this analysis, a list of archetypal species with very high or very low malignancy prevalence was constructed: All species were ranked according to their malignancy prevalence, and three subsets were defined using different cutoffs: rank10, rank15, and rank20, each including the 10, 15, or 20 species with the highest and lowest malignancy prevalence, respectively. These ranked groups consisting of 20, 30 and 40 species, respectively, were then used for downstream comparative analyses. The total dataset comprised 102 mammalian species.
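The ranking step described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the function name `rank_subsets`, the placeholder species names, and the tie-breaking behavior (ties keep their input order) are all assumptions made here.

```python
def rank_subsets(prevalence, cutoffs=(10, 15, 20)):
    """For each cutoff k, return the k lowest- and k highest-prevalence species.

    `prevalence` maps species name -> malignancy prevalence.
    Ties keep their input order (an assumption; the source does not say).
    """
    ranked = sorted(prevalence, key=prevalence.get)  # ascending prevalence
    subsets = {}
    for k in cutoffs:
        if 2 * k > len(ranked):
            raise ValueError(f"cutoff {k} needs at least {2 * k} species")
        subsets[f"rank{k}"] = {"lowest": ranked[:k], "highest": ranked[-k:]}
    return subsets


# Hypothetical data: 102 placeholder species with made-up prevalence values.
toy = {f"species_{i:03d}": i / 1000 for i in range(102)}
subsets = rank_subsets(toy)
sizes = {name: len(g["lowest"]) + len(g["highest"]) for name, g in subsets.items()}
print(sizes)  # {'rank10': 20, 'rank15': 30, 'rank20': 40}
```

The group sizes of 20, 30, and 40 species follow directly from taking k species from each end of the ranking.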
Dataset4.csv
This is an accessory dataset that summarizes the main content of Dataset1.csv. It was used for an order-level analysis: the mean CMR was calculated across all species belonging to each order with at least 15 species (Artiodactyla, Carnivora, Primates, and Rodentia).
Files and variables
File: Dataset1.csv
| Variable | Definition | Data type | Reference |
| species | Binomial scientific name of the species. | Character | https://doi.org/10.1038/s41586-021-04224-5 |
| order | Order name of the species. | Character | https://doi.org/10.1038/s41586-021-04224-5 |
| CMR | Adult cancer mortality risk. Takes values between 0 and 1. | Numeric | https://doi.org/10.1038/s41586-021-04224-5 |
| n_dead_ind | Total number of dead individuals per species in the database. | Numeric | https://doi.org/10.1038/s41586-021-04224-5 |
| known_deaths | Total number of dead individuals whose pathological records were identified. | Numeric | https://doi.org/10.1038/s41586-021-04224-5 |
| n_neoplasia | Total number of neoplasia cases recorded in each species, that were considered to be significant contributors to the death of the animals. | Numeric | https://doi.org/10.1038/s41586-021-04224-5 |
| n_no_neoplasia | Total number of dead individuals with pathological records ("known_deaths") minus the total number of neoplasia cases ("n_neoplasia"). | Numeric | https://doi.org/10.1038/s41586-021-04224-5 |
| life_expectancy_d | Average number of days lived after sexual maturity was reached, i.e. remaining life expectancy. | Numeric | https://doi.org/10.1038/s41586-021-04224-5 |
| body_mass_kg | Average female and male body mass in kilograms. | Numeric | https://doi.org/10.1093/nar/gkx1042 |
| metabolic_rate | Metabolic rate of the species in Watts (W). | Numeric | https://doi.org/10.1002/ecy.3344 |
| litter_size_n | Number of offspring born per litter per female. | Numeric | https://doi.org/10.1002/ecy.3344 |
| gestation_length_d | Duration of fetal growth in days. | Numeric | - |
| total_litters | Total Litters born per female in a lifetime. Calculated as the number of litters per year multiplied by litter size and the difference between the maximum longevity and female sexual maturity for each species. | Numeric | - |
| litters | A classification variable that distinguishes species based on their litter size. It categorizes species as either "monotocous" (typically producing one offspring per litter) or "polytocous" (producing more than one offspring per litter), with a litter size threshold of 1.5. | Binary (monotocous, polytocous) | https://doi.org/10.1093/beheco/araa039 |
| breeding_system | A variable indicating whether females occupy a shared territory or separate territories during the breeding season. It is categorized as either SingularBreeder (females in separate territories) or PluralBreeder (females in a common territory). | Binary (SingularBreeder, PluralBreeder) | https://doi.org/10.1093/beheco/araa039 |
| paternal_care | A variable indicating whether males provide care or assistance to offspring. It is categorized as either "Yes" (males provide paternal care) or "No" (males do not provide paternal care). | Binary (Yes, No) | https://doi.org/10.1111/j.1558-5646.2007.00229.x / https://doi.org/10.1126/science.1238677 |
| group_living | A variable indicating whether a species lives in social groups with regular interactions among individuals. It is categorized as either "yes" (the species engages in group living) or "no" (the species does not). | Binary (yes, no) | |
Note: NA indicates that data are not available.
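Several Dataset 1 columns are derived from others. The sketch below shows one plausible way to recompute them from a single row. It rests on assumptions not confirmed by the source: that CMR equals `n_neoplasia / known_deaths`, and that a litter size of exactly 1.5 counts as monotocous. The function name `derive_columns` and the example row are hypothetical.

```python
LITTER_THRESHOLD = 1.5  # threshold stated in the variable table


def derive_columns(row):
    """Recompute derived Dataset 1 columns from one row (a dict).

    Assumptions: CMR = n_neoplasia / known_deaths, and a litter size
    exactly at the threshold is treated as monotocous.
    """
    known = row["known_deaths"]
    cases = row["n_neoplasia"]
    litters = "polytocous" if row["litter_size_n"] > LITTER_THRESHOLD else "monotocous"
    return {
        "n_no_neoplasia": known - cases,
        "CMR": cases / known,
        "litters": litters,
    }


# Hypothetical row, not taken from the dataset.
example = {"known_deaths": 50, "n_neoplasia": 5, "litter_size_n": 2.3}
derived = derive_columns(example)
print(derived)
```

Comparing such recomputed values against the stored columns is a quick consistency check when loading the file.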
File: Dataset2.csv
| Variable | Definition | Data type | Reference |
| species | Binomial scientific name of the species. | Character | https://doi.org/10.1093/emph/eoaa015 |
| common_name | Widely recognized or vernacular name of a species, used to identify it in everyday language, as opposed to its scientific name. | Character | https://doi.org/10.1093/emph/eoaa015 |
| order | Order name of the species. | Character | https://doi.org/10.1093/emph/eoaa015 |
| family | Family name of the species. | Character | https://doi.org/10.1093/emph/eoaa015 |
| genus | Genus name of the species. | Character | https://doi.org/10.1093/emph/eoaa015 |
| total_necropsies | Total number of individuals necropsied. | Numeric | https://doi.org/10.1093/emph/eoaa015 |
| any_neoplasia | Total number of neoplasia diagnosed (malignant and benign). | Numeric | https://doi.org/10.1093/emph/eoaa015 |
| any_no_neoplasia | Total number of individuals necropsied ("total_necropsies") minus the total number of neoplasia diagnosed ("any_neoplasia"). | Numeric | - |
| any_malignant | Number of malignant neoplasia diagnosed. | Numeric | https://doi.org/10.1093/emph/eoaa015 |
| prop_neoplasia | Ratio of the number of neoplasia diagnosed (any_neoplasia) to the total number of necropsies (total_necropsies). | Numeric | https://doi.org/10.1093/emph/eoaa015 |
| prop_malignant | Ratio of the number of malignant neoplasia diagnosed (any_malignant) to the total number of necropsies (total_necropsies). | Numeric | https://doi.org/10.1093/emph/eoaa015 |
| adult_mass_kg | Body mass of an adult individual in kilograms. | Numeric | https://doi.org/10.1002/ecy.3344 |
| metabolic_rate | Metabolic rate of the species in Watts (W). | Numeric | https://doi.org/10.1093/nar/gkx1042 |
| max_lifespan_yr | Maximum reported age at death for the species in years. | Numeric | https://doi.org/10.1002/ecy.3344 |
| gestation_length_d | Duration of fetal growth in days. | Numeric | https://doi.org/10.1002/ecy.3344 |
| litter_size_n | Number of offspring born per litter per female. | Numeric | https://doi.org/10.1002/ecy.3344 |
| litters | A classification variable that distinguishes species based on their litter size. It categorizes species as either "monotocous" (typically producing one offspring per litter) or "polytocous" (producing more than one offspring per litter), with a litter size threshold of 1.5. | Binary (monotocous, polytocous) | - |
| total_litters | Total Litters born per female in a lifetime. Calculated as the number of litters per year multiplied by litter size and the difference between the maximum longevity and female sexual maturity for each species. | Numeric | - |
| breeding_system | A variable indicating whether females occupy a shared territory or separate territories during the breeding season. It is categorized as either SingularBreeder (females in separate territories) or PluralBreeder (females in a common territory). | Binary (SingularBreeder, PluralBreeder) | https://doi.org/10.1093/beheco/araa039 |
| paternal_care | A variable indicating whether males provide care or assistance to offspring. It is categorized as either "Yes" (males provide paternal care) or "No" (males do not provide paternal care). | Binary (Yes, No) | https://doi.org/10.1093/beheco/araa039 |
| group_living | A variable indicating whether a species lives in social groups with regular interactions among individuals. It is categorized as either "yes" (the species engages in group living) or "no" (the species does not). | Binary (yes, no) | https://doi.org/10.1111/j.1558-5646.2007.00229.x / https://doi.org/10.1126/science.1238677 |
Note: NA indicates that data are not available.
File: Dataset3.csv
| Variable | Definition | Data type |
| common_name | Widely recognized or vernacular name of a species, used to identify it in everyday language, as opposed to its scientific name. | Character |
| species | Binomial scientific name of the species. | Character |
| malignancy_prevalence | A variable indicating whether a species has reported low or high cancer incidence. This classification is based on a review of available literature and reflects the frequency of cancer occurrences within that species. | Numeric |
| malignancy_prevalence_reference | A reference indicating the source of information regarding malignancy prevalence. | Character |
| adult_mass_kg | Body mass of an adult individual in kilograms. | Numeric |
| log_adult_mass_kg | The natural logarithm of the adult mass in kilograms. | Numeric |
| log_adult_mass_kg_category | Categorization of adult mass based on logarithmic values: <1 (small) or >1 (large). | Binary (<1, >1) |
| max_lifespan_d | Maximum reported age at death for the species in days. | Numeric |
| log_max_lifespan_d | The natural logarithm of the maximum lifespan in days. | Numeric |
| log_max_lifespan_d_category | Categorization of lifespan based on logarithmic values: <9 (short-lived) or >9 (long-lived). | Binary (<9, >9) |
| litter_size | Number of offspring born per litter per female. | Numeric |
| metabolic_rate_W | Metabolic rate of the species in Watts (W). | Numeric |
| log_metabolic_rate_W | The natural logarithm of the metabolic rate in Watts. | Numeric |
| log_metabolic_rate_W_category | Categorization of metabolic rate based on logarithmic values: <1 (low) or >1 (high). | Binary (<1, >1) |
| metabolic_rate_reference | A reference indicating the source of information regarding metabolic rate. | Character |
| breeding_system | A variable indicating whether females occupy a shared territory or separate territories during the breeding season. It is categorized as either SingularBreeder (females in separate territories) or PluralBreeder (females in a common territory). | Binary (SingularBreeder, PluralBreeder) |
| litters | A classification variable that distinguishes species based on their litter size. It categorizes species as either "monotocous" (typically producing one offspring per litter) or "polytocous" (producing more than one offspring per litter), with a litter size threshold of 1.5. | Binary (monotocous, polytocous) |
| group_living | A variable indicating whether a species lives in social groups with regular interactions among individuals. It is categorized as either "yes" (the species engages in group living) or "no" (the species does not). | Binary (yes, no) |
| group_living_reference | A reference indicating the source of information regarding group living behavior. | Character |
Note: NA indicates that data are not available.
File: Dataset4.csv
| Variable | Definition | Data type |
| order | Order name of the species grouped for the analysis. Only orders with at least 15 species were included. | Character |
| CMR_mean | Mean of the Adult cancer mortality risk. | Numeric |
| CMR_median | Median of the Adult cancer mortality risk. | Numeric |
| index_gl | Ratio between the species with and without Group Living within each order. | Numeric |
| index_litters | Ratio between monotocous and polytocous species within each order. | Numeric |
| index_BS | Ratio between plural and singular breeding species within each order. | Numeric |
| n_cancer | Total number of neoplasia cases recorded in each order, that were considered to be significant contributors to the death of the animals. | Numeric |
| n_no_cancer | Total number of dead individuals with pathological records ("n_total") minus the total number of neoplasia cases ("n_cancer"). | Numeric |
| n_total | Total number of dead individuals in the order whose pathological records were identified. | Numeric |
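The three index columns are ratios of species counts within an order. A minimal sketch of that calculation follows, assuming that species with missing trait values are skipped and that the ratio is undefined when the denominator group is empty. The helper name `trait_index` and the example values are hypothetical.

```python
from collections import Counter


def trait_index(values, numerator, denominator):
    """Ratio of species counts carrying two levels of a binary trait.

    `values` holds one trait value per species in an order; None marks
    missing data and is skipped (an assumption about NA handling).
    """
    counts = Counter(v for v in values if v is not None)
    if counts[denominator] == 0:
        return float("nan")  # ratio undefined without a denominator group
    return counts[numerator] / counts[denominator]


# Hypothetical group-living values for the species of a single order.
group_living = ["yes", "yes", "no", "yes", None, "no"]
index_gl = trait_index(group_living, numerator="yes", denominator="no")
print(index_gl)  # 1.5
```

The same helper would cover `index_litters` (monotocous over polytocous) and `index_BS` (PluralBreeder over SingularBreeder) by changing the two level names.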
- SCRIPT_Coevolution_of_cooperative_lifestyles_and_reduced_cancer_prevalence_in_mammals.R
Statistical analysis code for correlations between cancer prevalence and mortality risk with different phenotypic traits
Correlations of CMR and neoplasia with the different traits were assessed with phylogeny-corrected generalized linear mixed models (phylGLMMs) using phyr in the R Statistical and Programming Environment, version 4.2.3. Previous investigations with the species included in these datasets showed that there is a phylogenetic signal for CMR and neoplasia among mammal species (Vincze et al. 2022). To control for phylogenetic relatedness among species, we fitted phylGLMMs using the original robust phylogeny by Vincze et al. (2022) (consensus_phylogeny.tre). The phylGLMMs used a binomial error distribution and a logit link function, with a random variable added at the level of observations to avoid overdispersion problems. This random variable, called “species,” was constructed from the identity of each species analyzed. Not all analyses using CMR data were performed with the full set of species, since information on some of the traits was not available for all species. All models were evaluated for overdispersion and zero inflation using the DHARMa package. All model tests showed P > 0.05, which indicates that no fit problems were detected; therefore, unlike previous investigations, we chose to perform the analyses using species with both zero and nonzero CMR.
Models used:
For Dataset1.csv
1) An additive phylGLMM was performed with CMR as the response variable and log-transformed continuous covariates: body mass, litter size, life expectancy, and gestation length. The log-transformed variables were used as fixed effects, with species as a random variable at the observation level (n = 190). The physiological trait metabolic rate was also log-transformed and analyzed in a separate model to avoid collinearity problems with log body mass (n = 52).
2) A simple model for the dichotomous variable litters was performed (n = 190) to test for CMR differences between monotocous and polytocous species. This dichotomous variable was also tested in a model with the continuous variables log life expectancy and log body mass to evaluate their interaction (n = 190).
3) For the lifestyle dichotomous variables group living, breeding system, and paternal care, we performed separate analyses to avoid collinearity problems, in all cases with CMR as the response variable and species as a random variable at the level of observations. For the group living and breeding system variables, we also performed models with the continuous variables (log body mass, log litter size, and log life expectancy) and tested the interaction with log body mass (n = 146).
4) Animal diet as a dichotomous variable was analyzed using CMR as the response variable, as well as species as a random variable at the level of observations. The association between animal diet and CMR was also assessed in relation to the other life history and lifestyle traits using four different models that include species of animal diet and group living (gregarious/solitary) and breeding system (singular/plural).
For Dataset2.csv
Neoplasia data on 36 species from the second dataset were analyzed using the same phylGLMM simple models with one variable per model as before but with a different phylogeny of the 36 mammal species constructed from the updated mammalian supertree (tree_for_database2). The same data for the different morphophysiological, life history, and lifestyle traits as before were used, with the exception of log body mass and log maximum lifespan where the analyses were performed with adult mass (in kilograms) and maximum lifespan (in days) from Boddy et al. (2020).
For Dataset4.csv
To perform an order-level analysis, the mean CMR was calculated across all species belonging to each order with at least 15 species (i.e., Artiodactyla, Carnivora, Primates, and Rodentia). We also built indexes for each trait of interest: (i) litter index, the ratio between monotocous and polytocous species within each order; (ii) group living index, the ratio between species with and without group living within each order; and (iii) breeding system index, the ratio between plural and singular breeding species within each order. The analysis was performed with GLMs using a binomial error distribution and a logit link function, using the glmmTMB package. The total set of P values derived from analysis using CMR data was corrected for multiple testing using FDR correction.
The total set of P values derived from analysis using each dataset was corrected for multiple testing using FDR correction.
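The text does not say which FDR procedure was applied. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up adjustment, which is the procedure R's `p.adjust` applies for `method = "fdr"`. The function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values, not results from the paper.
pvals = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(pvals)
print([round(p, 4) for p in adjusted])
```

Each adjusted value is the smallest FDR level at which the corresponding test would be declared significant across the whole set of P values.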
For Math_Models_Simulation_Results.ipynb
All code used to simulate the population dynamics, including the chosen parameters and the generation of plots, is in this Python notebook file. Reading and executing this code reproduces the results shown in the paper. No further statistical analysis was included in the paper for this section.
References
O. Vincze, F. Colchero, J.-F. Lemaître, D. A. Conde, S. Pavard, M. Bieuville, A. O. Urrutia, B. Ujvari, A. M. Boddy, C. C. Maley, Cancer risk across mammals. Nature 601, 263–267 (2022).
A. M. Boddy, L. M. Abegglen, A. P. Pessier, A. Aktipis, J. D. Schiffman, C. C. Maley, C. Witte, Lifetime cancer prevalence and life history traits in mammals. Evol. Med. Public Health 2020, 187–195 (2020).
Z. T. Compton, W. Mellon, V. K. Harris, S. Rupp, D. Mallo, S. E. Kapsetaki, M. Wilmot, R. Kennington, K. Noble, C. Baciu, et al., Cancer prevalence across vertebrates. Cancer Discov. 15, 227–244 (2025).
Phylogenetic Trees
consensus_phylogeny.tre
tree_for_database2.txt
See the following DOI: 10.1126/sciadv.adw0685 (available on November 12, 2025)
CMR, neoplasia and malignancy prevalence in mammalian species
First dataset: Cancer Mortality Risk (CMR) was calculated for each species as the proportion of cancer-related deaths out of the total number of records, based on post-mortem pathological records (n=11,840). This information was sourced from Species360 and the Zoological Information Management System (ZIMS). The dataset initially included 191 species, but D. byrnei was removed because its extremely high CMR was considered an outlier. These CMR data were gathered from mammals in zoos worldwide, providing high-resolution cause-of-death data. CMR was estimated from neoplastic samples that substantially contributed to the animal's death, as confirmed by necropsy. The CMR estimate for every species in this dataset is based on more than 20 necropsies per species (mean = 62).
Second dataset: Prevalence of neoplasia was estimated as the prevalence of any neoplasm in mammalian species from San Diego's zoos. The dataset initially included 37 species, but L. africana was removed because of incongruences with other publications reporting lower cancer rates. The prevalence of neoplasia estimated for the species in this dataset is based on an average of 23 necropsies per species. Vulpes zerda, Puma concolor, Canis mesomelas, Lama glama, Lycaon pictus, Tarsius syrichta, Macropus rufus and Equus asinus are the only species with fewer than 10 necropsies analyzed.
Third dataset: We used a recently curated and standardized dataset of malignancy prevalence across mammalian species that is based on more than 20 necropsies per species. This resource includes additional species not considered in the other datasets. In this analysis, a list of archetypal species with very high or very low malignancy prevalence was constructed: all species were ranked according to their malignancy prevalence, and three subsets were defined using different cut-offs: Rank10, Rank15, and Rank20, each including the 10, 15, or 20 species with the highest and lowest malignancy prevalence, respectively. These ranked groups consisting of 20, 30 and 40 species, respectively, were then used for downstream comparative analyses. The total dataset comprised 102 mammalian species.
Morpho-physiological, life history and lifestyle traits
Data on Body Mass (kg) and Life Expectancy (days) used for the first dataset were extracted from Vincze et al. (n=190 species). Data on Adult Mass (kg) and Maximum Lifespan (days) used for the second (n=32 and n=36 species, respectively) and third datasets (n=94 species in both cases) were obtained from the COMBINE database. Data on Metabolic Rate (n=52 for the first dataset, n=31 for the second dataset and n=52 for the third dataset) were obtained from the AnAge database and expressed in Watts (W). For the third dataset, the variables Adult Mass, Metabolic Rate and Maximum Lifespan were each split into two categories, with the threshold chosen so that the two groups contain a comparable number of species.
We defined life history traits (Litter Size, Litters, Gestation Length, Life Expectancy and Maximum Lifespan) as those that depend on the history of the individual but are not clearly behavioral like lifestyle traits (Group Living, Breeding System). We chose litter size, gestation time and life expectancy as three classic life history traits. In particular, life expectancy is a well-determined variable in many species, which helps to have a larger sample size.
Data on Litter Size (mean number of descendants per female, n=190 species for the first dataset, n=32 for the second dataset and n=94 for the third dataset) and Gestation Length (days, n=190 for the first dataset and n=32 for the second dataset) was obtained from the COMBINE database. The variable "Litters" was used to classify species as either monotocous or polytocous, using a litter size of 1.5 as threshold. Transforming Litter Size into a dichotomous variable allowed us to statistically test its interaction with body mass, similarly to what we did with dichotomous variables such as Group Living. Total Litters was calculated as the number of litters per year multiplied by litter size and the difference between the maximum longevity and female sexual maturity for each species. All the data for the calculations were obtained from the COMBINE database. Group Living (n=144 species for the first dataset, n=24 for the second dataset and n=77 for the third dataset) was determined by integrating data from two sources: Pérez-Barberia et al. and Lukas & Clutton-Brock. The variable is dichotomous, indicating whether a species engages in group living based on regular associations among individuals. A species was classified as Group Living if it showed sociality or was listed as group living by either source. Conversely, it was classified as not having Group Living if it exhibited no sociality or was listed as solitary or socially monogamous by either source. When data from both sources were available, a species was included only if both sources agreed, otherwise it was either excluded, or a choice was made based on available literature. Data on Breeding System (singular breeders or plural breeders, n=147 species for the first dataset, n=28 for the second dataset and n=79 for the third dataset) was gathered from Lukas & Clutton-Brock. 
The category of singular or plural breeders was assigned if the females occupy a separate or common territory or range during the breeding season, respectively. Data on the dichotomous variable Paternal Care (n=157 species for first dataset and n=29 for second dataset) was also obtained from Lukas & Clutton-Brock.
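The rule for combining the two Group Living sources can be written as a small decision function. This is a sketch under stated assumptions: the label strings (`"social"`, `"non-social"`, `"group living"`, `"solitary"`, `"socially monogamous"`) are hypothetical stand-ins for whatever coding the two sources use, and disagreements are returned as `"unresolved"` because the text says they were either excluded or settled from the literature case by case.

```python
def source_vote(label, yes_labels, no_labels):
    """Map one source's label to True (group living), False, or None (no data)."""
    if label is None:
        return None
    if label in yes_labels:
        return True
    if label in no_labels:
        return False
    raise ValueError(f"unrecognized label: {label!r}")


def classify_group_living(sociality, social_system):
    """Combine two sources into 'yes', 'no', 'NA', or 'unresolved'.

    `sociality` stands for the first source and `social_system` for the
    second; both label vocabularies are hypothetical.
    """
    votes = [
        source_vote(sociality, {"social"}, {"non-social"}),
        source_vote(social_system, {"group living"}, {"solitary", "socially monogamous"}),
    ]
    votes = [v for v in votes if v is not None]
    if not votes:
        return "NA"  # neither source covers the species
    if all(votes):
        return "yes"
    if not any(votes):
        return "no"
    return "unresolved"  # sources disagree: exclude or check the literature


print(classify_group_living("social", None))                   # yes
print(classify_group_living(None, "solitary"))                 # no
print(classify_group_living("social", "socially monogamous"))  # unresolved
```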
Data on Animal Diet (consumption of animals, including vertebrates and invertebrates) was sourced from Vincze et al., who compiled the information from a global mammalian diet database. This dataset categorizes dietary components into four hierarchical levels: never consumed, occasionally consumed, secondary food item, and primary food item. For our analysis, we focused solely on whether animal matter was present in the diet, without differentiating between specific types. Since the intermediate categories (occasional and secondary consumption) included relatively few species, Vincze et al. consolidated the dietary classifications into two broader levels: rarely/never consumed and regularly consumed (i.e., as a primary or secondary food source). Diet information was included only for the first dataset, due to the strength of the analysis and the sample size available.
Statistical analysis
Correlations of CMR and neoplasia with the different traits were performed employing phylogeny-corrected generalized linear mixed models (phylGLMM) using phyr in R Statistical and Programming Environment, version 4.2.3. Previous investigations with the species included in these datasets showed there is a phylogenetic signal for CMR and neoplasia among mammal species. To control for phylogenetic relatedness among species we performed phylGLMM models using the original robust phylogeny by Vincze et al. phylGLMMs used a binomial error distribution and a logit link function, adding a random variable at the level of observations to avoid overdispersion problems. This random variable, called “Species”, was constructed with the identity of each species analyzed. Not all analyses using CMR data were performed with the full set of species, since the information for some of the traits analyzed was not available for all species. All models performed were evaluated for overdispersion and zero-inflation using DHARMa package. All model tests showed p-values > 0.05, which indicates that no fit problems were detected and therefore, unlike previous investigations, we chose to perform the analyses using species with both zero and non-zero CMR.
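As a reading aid, the model structure described in this paragraph can be written out as below. This is a sketch and not the authors' exact specification. All symbols are notation introduced here: $y_i$ is the number of cancer deaths among $n_i$ records for species $i$, $x_{ik}$ are the trait covariates, $a_i$ is a phylogenetic random effect, $e_i$ is the observation-level ("Species") random effect, and $\mathbf{C}$ is the covariance matrix implied by the phylogeny. Including both random effects in this form is an assumption.

```latex
\begin{aligned}
y_i &\sim \mathrm{Binomial}(n_i,\, p_i), \\
\operatorname{logit}(p_i) &= \beta_0 + \sum_{k} \beta_k x_{ik} + a_i + e_i, \\
\mathbf{a} &\sim \mathcal{N}(\mathbf{0},\, \sigma_a^2 \mathbf{C}), \qquad
e_i \sim \mathcal{N}(0,\, \sigma_e^2).
\end{aligned}
```

Under this reading, the observation-level term $e_i$ absorbs extra-binomial variation, which is the role the text assigns to it.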
Models used:
(1) An additive phylGLMM was performed with CMR as response variable and log transformed continuous variables of covariate traits Body Mass, Litter Size, Life Expectancy and Gestation Length. Log transformed variables were used as fixed effects, and Species as a random variable at the observation level. The physiological trait Metabolic Rate was also log transformed and analyzed in a separate model to avoid collinearity problems with Log Body Mass.
(2) A simple model for dichotomous variable Litters was performed to test for CMR differences in monotocous or polytocous species. This dichotomous variable was tested in a model with continuous variables Log Life Expectancy and Log Body Mass to evaluate interaction.
(3) For lifestyle dichotomous variables Group Living, Breeding System and Paternal Care we performed separate analyses to avoid collinearity problems, in all cases with CMR as the response variable, and Species as a random variable at the level of observations. For Group Living and Breeding System variables we also performed models with the continuous variables (Log Body Mass, Log Litter Size, and Log Life Expectancy) and tested the interaction with Log Body Mass.
(4) Animal Diet as a dichotomous variable was analyzed using CMR as the response variable, and Species as a random variable at the level of observations. The association between Animal Diet and CMR was also assessed in relation to the other life history and lifestyle traits using four different models that include species of Animal Diet and Group Living (gregarious/solitary) and Breeding System (singular/plural).
(5) To perform an order level analysis, mean CMR for all the species belonging to each order with at least 15 species (i.e. Artiodactyla, Carnivora, Primates, Rodentia) was calculated. We also built indexes for each trait of interest: (a) Litters Index: ratio between monotocous and polytocous species within each order, (b) Group Living Index: ratio between the species with and without Group Living within each order, and (c) Breeding System Index: ratio between plural and singular breeding species within each order. The analysis was performed with GLMs employing a binomial error distribution and a logit link function, using the glmmTMB package (table S4). The total set of p-values derived from analysis using CMR data was corrected for multiple testing using FDR correction.
(6) Neoplasia data on 36 species from the second dataset was analyzed using the same phylGLMM simple models with one variable per model as before, but with a different phylogeny of the 36 mammal species constructed from the updated mammalian super-tree. The same data for the different morpho-physiological, life history and lifestyle traits as before was used, with the exception of Log Body Mass and Log Maximum Lifespan where the analyses were performed with Adult Mass (kg) and Maximum Lifespan (days) from Boddy et al. The total set of p-values derived from analysis using this dataset was corrected for multiple testing using FDR correction. Statistical analyses for dichotomous variables were not performed on this data set because the power of the model is not strong enough to test small samples.
The analyses of the archetypal species with the highest or lowest levels of malignancy prevalence from the third database were performed qualitatively. For each dichotomous variable, a group was judged to be more enriched in species with a high prevalence of malignancies if we observed differences greater than 50% in each and every one of the three ranks (cut-offs 10, 15, and 20) and only if these differences became larger as we narrowed the rank (which is expected to occur if there is a direct relationship between both variables).
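The qualitative decision rule can be expressed as a two-part check. The sketch below takes the per-cutoff differences as given, because the text does not specify how a "difference" is computed. Treating it as a fraction (0.5 for 50%) and requiring strictly larger differences at narrower ranks are assumptions, and the function name `is_enriched` and the example values are hypothetical.

```python
def is_enriched(differences, threshold=0.5):
    """Apply the two-part rule to differences keyed by rank cutoff.

    `differences` maps cutoff (10, 15, 20) -> difference between groups,
    expressed as a fraction. Both conditions must hold:
      1. every difference exceeds the threshold;
      2. differences grow as the rank narrows (20 -> 15 -> 10).
    """
    cutoffs = sorted(differences)  # narrowest rank first
    values = [differences[k] for k in cutoffs]
    exceeds = all(v > threshold for v in values)
    widening = all(a > b for a, b in zip(values, values[1:]))
    return exceeds and widening


# Hypothetical differences, not results from the paper.
print(is_enriched({10: 0.9, 15: 0.7, 20: 0.6}))  # True
print(is_enriched({10: 0.6, 15: 0.7, 20: 0.8}))  # False: shrinks as rank narrows
print(is_enriched({10: 0.9, 15: 0.7, 20: 0.4}))  # False: rank20 below threshold
```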
Mathematical modeling and simulation
We developed a system of ordinary differential equations (ODEs) representing a consumer population of any mammal species depending on its resources for subsistence.
The population is stage-structured by age: pre-reproductive juveniles (J), reproductive adults (A), and senior post-reproductive adults (S). This structure emphasizes how the reproductive capacity of individuals depends on age. Resources (R) are kept unstructured, and their dynamics are governed by a production function and by consumer foraging f. The production function does not increase with the resource density R (it either decreases with R or is independent of it), which allows the system to reach a steady state of non-negative values. Furthermore, consumer foraging is organized in stage-specific functions fJ, fA and fS (all of which depend on R), one for each age class of the consumer. The resulting resource intake is translated into physiological processes with an efficiency given by the non-negative and non-decreasing functions gA (fertility rate) and gJ (rate of juvenile sexual maturation into reproductive adults). The senescence process, given by σ > 0, is a fixed rate at which adults age into senior individuals. Likewise, all per-capita mortality rates (μJ, μA, μS) are positive constants. Changes in cancer mortality were modeled through the value of the μS parameter. Dependence on resources (R) for vital processes (e.g., transition rates between life stages) conveys non-social intraspecific competition between life stages. The effects of non-social competition act on the stage density distribution, as a lack of resources limits population growth by preventing juveniles from reaching the adult stage and adults from having further offspring. Such life-history processes govern transition rates between life stages. In this model, α represents the strength of social intraspecific cooperation. Positive values of α are interpreted as supportive or caring interactions of older individuals towards juveniles. Increasing α decreases juvenile mortality, while α = 0 means that these interactions are absent (α cannot take negative values).
Similarly, ω > 0 stands for the strength of social intraspecific competition between seniors and other individuals. Increasing ω values result in higher juvenile mortality (Eq. 1b). In our model, when α adopts positive values, we set ω to zero, and vice versa. Intraspecific competition may also be associated with resource density, in which case ρ > 0. Abundance of resources (high values of ρ) diminishes the effect of direct competition. The social parameters α and ω control the non-transitional processes of cooperative and competitive interaction between life stages of the consumer population, respectively. Alternative cooperative or competitive interactions have also been modeled, in which the parameter α/ω is the ability of senior individuals (S) to influence juveniles' (J) access to resources, juveniles' development into adults (A), or adults' reproductive output.
By mathematical analysis of the ODE system, we found necessary and sufficient conditions for the model to display a hydra effect when α = ω = 0, that is, conditions ensuring an increase in the carrying capacity of the population (N = J + A + S, the population density at dynamical equilibrium) as a result of increasing the value of the parameter μS (interpreted as higher CMR). If the senior stage has the largest consumption rate at equilibrium, then, for this model, the specific increase in mortality among seniors leads to the hydra effect. Conversely, if the senior consumption rate is the smallest one, the intuitive effect ensues: the population equilibrium decreases as the mortality of part of its individuals increases. If the per-capita consumption rate of senior adults is lower than that of reproductive adults but higher than that of juveniles (fA > fS > fJ) due to age-related size differences, then the hydra effect depends on the life-history parameter values (at equilibrium).
Expressed in this way, we can see that a larger fertility rate at the system equilibrium, gA(fA(R)), is positively associated with the existence of a hydra effect (all else being equal).
For the simulations, we employed a linearized version of the model and obtained time courses by numerical integration in Python (using SciPy, NumPy, pandas and Matplotlib). To this end, we chose a semi-chemostat model for the resource dynamics, in which the resource production function remains constant, p(R) = π. We also considered linear consumption relations, weighted by different constants associated with the characteristic size and age of the consuming stage, that is, fJ(R) = κJ R, fA(R) = κA R, fS(R) = κS R. The resulting resource intake is converted linearly into the physiological processes of reproduction and maturation, with efficiencies given by the parameters βA (i.e., gA(x) = βA x) and γA (i.e., gJ(x) = γA x), respectively (γA refers to the maturation of juveniles originating from adults, to differentiate them from those originating from seniors, γS).
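To make the linear model concrete, the sketch below integrates one possible reading of it with SciPy for α = ω = 0. The exact right-hand sides are an assumption assembled from the verbal description (constant production π, linear consumption κ R, linear fertility and maturation, senescence σ, constant mortalities); they are not copied from the deposited notebook, and all parameter values and function names are illustrative only.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values (assumptions, not taken from the notebook).
PARAMS = dict(pi=1.0, kappa_J=0.5, kappa_A=1.0, kappa_S=2.0,
              beta_A=1.0, gamma_A=1.0, sigma=0.1,
              mu_J=0.1, mu_A=0.1, mu_S=0.1)


def rhs(t, y, p):
    """Assumed stage-structured consumer-resource ODEs with alpha = omega = 0."""
    R, J, A, S = y
    # Total resource intake: fJ(R) J + fA(R) A + fS(R) S with f(R) = kappa * R.
    intake = R * (p["kappa_J"] * J + p["kappa_A"] * A + p["kappa_S"] * S)
    births = p["beta_A"] * p["kappa_A"] * R * A       # gA(fA(R)) * A
    maturation = p["gamma_A"] * p["kappa_J"] * R * J  # gJ(fJ(R)) * J
    dR = p["pi"] - intake                             # constant production p(R) = pi
    dJ = births - maturation - p["mu_J"] * J
    dA = maturation - (p["sigma"] + p["mu_A"]) * A
    dS = p["sigma"] * A - p["mu_S"] * S
    return [dR, dJ, dA, dS]


def simulate(mu_S, t_end=2000.0, y0=(1.0, 1.0, 1.0, 1.0)):
    """Integrate the model for a given senior mortality rate mu_S."""
    p = dict(PARAMS, mu_S=mu_S)
    return solve_ivp(rhs, (0.0, t_end), y0, args=(p,),
                     method="LSODA", rtol=1e-8, atol=1e-10)


def final_density(mu_S):
    """Total consumer density N = J + A + S at the end of the run."""
    R, J, A, S = simulate(mu_S).y[:, -1]
    return J + A + S


N_low = final_density(0.1)   # baseline senior mortality
N_high = final_density(0.3)  # higher senior mortality (interpreted as higher CMR)
print(N_low, N_high)
```

With these illustrative values the senior consumption constant is the largest of the three, so under the assumptions above the long-run total density should be higher for the larger μS, which is the pattern the text calls the hydra effect. Lowering `kappa_S` below the other two constants would be one way to check the opposite, intuitive case.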
Extended versions of these models that allow other processes to occur, such as reproduction of senior adults, distinct survival rates for senior-born juveniles, and the spread of a higher-CMR genetic variant in the population, were also implemented.
For the model with mixed populations, we started with a population in equilibrium before the introduction of the oncogenic variant (both subpopulations had the same senior mortality rates, μS1 = μS2). We performed two types of tests from this initial state. A) We introduced the oncogenic variant by increasing μS2 in half of the population in equilibrium (a process of migration or subpopulation mixing) and let the system evolve to its equilibrium frequencies. B) We introduced the variant as a mutation (low initial frequency) by transferring 5% of the juveniles from subpopulation 1 to subpopulation 2 (higher μS2), and likewise monitored the time evolution of the relative frequencies.
The direct fitness of a genotype Gx was calculated as NGx(t)/NGx(0), where NGx(t) is the density over time of the subpopulation with genotype Gx, and NGx(0) is its density at time t = 0. The indirect fitness of the metapopulation was calculated as (NG1(t)+NG2(t))/(NG1(0)+NG2(0)). The relative frequency of each gene variant over time was calculated as NGx(t)/(NG1(t)+NG2(t)).
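The three quantities defined above can be computed directly from the simulated density trajectories. The following minimal sketch uses NumPy; the function name, the dictionary keys and the example trajectories are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def fitness_summary(n_g1, n_g2):
    """Direct fitness, metapopulation fitness and relative frequencies over time.

    n_g1, n_g2: total densities of the two subpopulations sampled at the same
    time points, with the first entry corresponding to t = 0.
    """
    n_g1 = np.asarray(n_g1, dtype=float)
    n_g2 = np.asarray(n_g2, dtype=float)
    total = n_g1 + n_g2
    return {
        "direct_fitness_G1": n_g1 / n_g1[0],        # NG1(t) / NG1(0)
        "direct_fitness_G2": n_g2 / n_g2[0],        # NG2(t) / NG2(0)
        "metapopulation_fitness": total / total[0],
        "frequency_G1": n_g1 / total,               # NG1(t) / (NG1(t) + NG2(t))
        "frequency_G2": n_g2 / total,
    }


# Hypothetical trajectories (placeholders, not simulation output).
summary = fitness_summary([2.0, 1.5, 1.0], [2.0, 2.5, 3.0])
print(summary["frequency_G2"])
```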
References
O. Vincze, F. Colchero, J.-F. Lemaître, D. A. Conde, S. Pavard, M. Bieuville, A. O. Urrutia, B. Ujvari, A. M. Boddy, C. C. Maley, Cancer risk across mammals. Nature 601, 263–267 (2022).
A. M. Boddy, L. M. Abegglen, A. P. Pessier, A. Aktipis, J. D. Schiffman, C. C. Maley, C. Witte, Lifetime cancer prevalence and life history traits in mammals. Evolution, Medicine, and Public Health 2020, 187–195 (2020).
Z. T. Compton, W. Mellon, V. K. Harris, S. Rupp, D. Mallo, S. E. Kapsetaki, M. Wilmot, R. Kennington, K. Noble, C. Baciu, Cancer prevalence across vertebrates. Cancer Discovery 15, 227–244 (2025).
D. Lukas, T. Clutton-Brock, Monotocy and the evolution of plural breeding in mammals. Behav Ecol 31, 943–949 (2020).
F. J. Pérez‐Barbería, S. Shultz, R. I. Dunbar, Evidence for coevolution of sociality and relative brain size in three orders of mammals. Evolution 61, 2811–2821 (2007).
D. Lukas, T. H. Clutton-Brock, The evolution of social monogamy in mammals. Science 341, 526–530 (2013).
W. D. Kissling, L. Dalby, C. Fløjgaard, J. Lenoir, B. Sandel, C. Sandom, K. Trøjelsgaard, J. Svenning, Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide. Ecology and Evolution 4, 2913–2930 (2014).
D. Li, R. Dinnage, L. A. Nell, M. R. Helmus, A. R. Ives, phyr: an R package for phylogenetic species‐distribution modelling in ecological communities. Methods in Ecology and Evolution 11, 1455–1463 (2020).
R Core Team, R: A language and environment for statistical computing (R Foundation for Statistical Computing, 2013).
A. F. Zuur, E. N. Ieno, N. J. Walker, A. A. Saveliev, G. M. Smith, Mixed Effects Models and Extensions in Ecology with R (Springer, 2009), vol. 574.
X. A. Harrison, Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ 2, e616 (2014).
F. Hartig, L. Lohse, DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models. R package version 0.4.6 (2022).
M. E. Brooks, K. Kristensen, K. J. Van Benthem, A. Magnusson, C. W. Berg, A. Nielsen, H. J. Skaug, M. Machler, B. M. Bolker, glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. The R Journal 9, 378–400 (2017).
Y. Benjamini, Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57, 289–300 (1995).
O. R. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. MacPhee, R. M. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman, A. Purvis, The delayed rise of present-day mammals. Nature 446, 507–512 (2007).
A. M. de Roos, When individual life history matters: conditions for juvenile-adult stage structure effects on population dynamics. Theoretical Ecology 11, 397–416 (2018).
P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 261–272 (2020).
C. R. Harris, K. J. Millman, S. J. Van Der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, Array programming with NumPy. Nature 585, 357–362 (2020).
J. D. Hunter, Matplotlib: A 2D graphics environment. Computing in Science & Engineering 9, 90–95 (2007).
G. Van Rossum, F. L. Drake, Python/C Api Manual-Python 3 (CreateSpace, 2009).
W. McKinney, “Data structures for statistical computing in Python.” (2010), vol. 445, pp. 51–56.
