
Avoiding growing pains in reproductive trait databases: the curse of dimensionality

Cite this dataset

Ginther, Samuel; Cameron, Hayley; White, Craig; Marshall, Dustin (2022). Avoiding growing pains in reproductive trait databases: the curse of dimensionality [Dataset]. Dryad.


Aim: Reproductive output features prominently in many trait databases, but the metrics describing it vary and are often untethered from temporal and volumetric dimensions (e.g., fecundity-per-bout). Using such ambiguous reproductive measures to make broadscale comparisons across taxonomic groups will only be meaningful if they show a 1:1 relationship with a reproductive measure that explicitly includes both a volumetric and temporal component (i.e., reproductive mass-per-year). We sought to map the prevalence of ambiguous and explicit reproductive measures across taxa, and explore their relationships with one another to determine the cross-compatibility and utility of reproductive metrics in trait databases.

Location: Global.

Time period: 1990-2021.

Major taxa studied: We searched for reproductive measures across all Metazoa, and identified 19,785 Chordata species, along with 440 species of Arthropoda, Cnidaria, or Mollusca.

Methods: We included 37 databases from which we summarised the commonality of reproductive metrics across taxonomic groups. We also quantified scaling relationships between ambiguous reproductive traits (fecundity-per-bout, fecundity-per-year and reproductive mass-per-bout) and an explicit measure (reproductive mass-per-year) to assess their cross-compatibility.

Results: Most species were missing at least one temporal or volumetric dimension of reproductive output, such that reproductive mass-per-year could be reconstructed for only 4,786 vertebrate species. Ambiguous reproductive measures were poor predictors of reproductive mass-per-year – in no instance did these measures scale at 1:1.

Main Conclusions: Ambiguous measures systematically misestimate reproductive mass-per-year. Until more data are collected, we suggest authors use the clade-specific scaling relationships provided here to convert ambiguous reproductive measures to reproductive mass-per-year.  
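
As an illustration of what a scaling relationship between an ambiguous and an explicit measure looks like in practice, the sketch below fits a straight line on log10-log10 axes and then uses the fitted coefficients to convert one measure into the other. It is a minimal sketch under stated assumptions: ordinary least squares is used only for simplicity (the fitting method used in the paper is not described on this page), the data are synthetic, and the function names `fit_loglog` and `convert` are hypothetical. A slope of 1 on these axes corresponds to proportional (1:1) scaling.

```python
import math

def fit_loglog(x, y):
    """Ordinary least squares fit of log10(y) = intercept + slope * log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def convert(x, intercept, slope):
    """Back-transform the fitted line: y = 10**intercept * x**slope."""
    return (10 ** intercept) * (x ** slope)

# Synthetic, purely illustrative data (NOT values from the dataset):
# an ambiguous measure and an explicit measure related by y = 2 * x**0.8.
ambiguous = [1.0, 10.0, 100.0, 1000.0]
explicit = [2.0 * v ** 0.8 for v in ambiguous]

intercept, slope = fit_loglog(ambiguous, explicit)
print(f"slope = {slope:.3f} (a slope of 1 would indicate 1:1 scaling)")
print(f"converted value for x = 50: {convert(50.0, intercept, slope):.3f}")
```

For real use, the clade-specific coefficients reported by the authors should be substituted for the fitted values above.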



Methodology overview (see paper for full methods):

We followed the guidelines of the Systematic Mapping Methodology (James et al., 2016) to determine what reproductive traits are provided in animal databases, and to quantify their relationships with one another.

We searched the literature from 2020 to 2021 – we conducted the final search in October 2021. First, to identify literature to screen, we trialled a combination of search terms in the publication database, Web of Science. After trialling 16 search terms, we selected the search term ‘(phylum) AND (life history* OR trait) AND (database* OR compil*) NOT (plant*)’, where ‘phylum’ was replaced in turn with the name of each metazoan phylum. These search terms resulted in a total of 28,028 hits.
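
The phylum substitution described above can be sketched as a simple template expansion. This is only an illustration: the list `PHYLA` is a hypothetical subset (the four phyla named in this record), not the full list of metazoan phyla searched, and `build_queries` is a name introduced here.

```python
# Hypothetical subset of phyla; the full list searched is not given on this page.
PHYLA = ["Chordata", "Arthropoda", "Cnidaria", "Mollusca"]

# Search term quoted in the text, with a placeholder for the phylum name.
TEMPLATE = (
    "({phylum}) AND (life history* OR trait) "
    "AND (database* OR compil*) NOT (plant*)"
)

def build_queries(phyla):
    """Return a dict mapping each phylum name to its search string."""
    return {p: TEMPLATE.format(phylum=p) for p in phyla}

queries = build_queries(PHYLA)
for query in queries.values():
    print(query)
```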

To identify the eligible databases returned by our literature searches (as previously described), we sorted the titles by ‘relevance’, and screened titles and abstracts of the first 500 hits of each phylum-specific search term. If studies appeared to match the inclusion criteria, they were marked and uploaded into the Rayyan Systematic Reviews web application (Ouzzani et al., 2016), where each study and its associated database was fully reviewed. In total, 3,410 titles and abstracts were reviewed in Web of Science, and 240 studies were fully reviewed. Studies were sorted into two groups, ‘include’ or ‘exclude’, after fully assessing them; a list of assessed articles can be found in supplementary material. Screening and eligibility assessments reduced the number of eligible databases to 42.

We coded 42 databases into Microsoft Excel (version 16.46), noting the 1) reference information, 2) species information, and 3) trait data. For each species in each online database, we coded the species name and the following reproductive traits (when available): adult body size (mass or length), fecundity measure (fecundity.bout-1, fecundity.time-1, reproductive mass.bout-1, or reproductive mass.time-1), offspring size (mass or length), and reproductive frequency (number of reproductive events.time-1). We also extracted the numeric values for each trait to explore relationships between ambiguous and explicit reproductive measures. When a reproductive trait was not reported directly, we calculated it from combinations of the traits that were reported. For example, fecundity as a rate (i.e., fecundity.time-1) was calculated by multiplying fecundity.bout-1 by reproductive frequency. Note that ‘time-1’ can refer to ‘year-1’ or ‘day-1’, but either can be easily converted into a common currency. We converted all measures of fecundity.time-1 and reproductive mass.time-1 to a yearly rate, and refer to these hereafter as fecundity.year-1 and reproductive mass.year-1, respectively, unless otherwise specified. Additionally, only offspring mass (and not offspring length) was used to calculate reproductive mass (bout-1 and year-1), as we wished to minimize the error that can arise from using length-to-mass conversions that are not species-specific.
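
The trait calculations described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the factor of 365.25 days per year is an assumption (the conversion factor is not stated here), and reproductive mass is assumed to be the product of fecundity and offspring mass.

```python
DAYS_PER_YEAR = 365.25  # assumed conversion factor; not stated in the source

def to_per_year(value, unit):
    """Convert a rate expressed per 'day' or per 'year' to a per-year rate."""
    if unit == "year":
        return value
    if unit == "day":
        return value * DAYS_PER_YEAR
    raise ValueError(f"unsupported time unit: {unit!r}")

def fecundity_per_year(fecundity_per_bout, bouts_per_year):
    """fecundity.year-1 = fecundity.bout-1 x reproductive frequency (bouts per year)."""
    return fecundity_per_bout * bouts_per_year

def reproductive_mass_per_bout(fecundity_per_bout, offspring_mass):
    """Assumed here: reproductive mass.bout-1 = fecundity.bout-1 x offspring mass."""
    return fecundity_per_bout * offspring_mass

def reproductive_mass_per_year(fecundity_per_bout, offspring_mass, bouts_per_year):
    """Assumed here: reproductive mass.year-1 = reproductive mass.bout-1 x bouts per year."""
    return reproductive_mass_per_bout(fecundity_per_bout, offspring_mass) * bouts_per_year

# Illustrative values only (not taken from the dataset):
# 4 offspring per bout, 1.5 g per offspring, 2 bouts per year.
print(fecundity_per_year(4, 2))               # offspring per year
print(reproductive_mass_per_year(4, 1.5, 2))  # grams per year
print(to_per_year(2, "day"))                  # a per-day rate expressed per year
```

Offspring length is deliberately not accepted as an input, mirroring the decision above to avoid length-to-mass conversions that are not species-specific.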

After recording the observations contained within all eligible databases into a single Excel file, we used RStudio (version 1.4.1106) (R Core Team 2018) and ‘tidyverse’ packages (version 1.3.1) (Wickham et al., 2019) to summarise and combine duplicate species observations. In some instances, multiple sources are listed for a species because a trait may have been reported in one database but not another – that is, traits reported with different dimensions (e.g., fecundity.bout-1 and fecundity.year-1) for the same species could have originated from different sources. When the same trait was reported for the same species in more than one database (e.g., fecundity.bout-1 was reported twice for the same species), we defaulted to the oldest database and removed the duplicate observations from the more recent database(s). We omitted duplicate observations to avoid biases in our assessment of the commonality of reproductive measures, given that species from well-represented taxa were found across multiple databases. However, if users are interested in combining multiple trait estimates (e.g., to obtain mean trait values for these species), they can refer to the original databases in Appendix 1 and our supplemental material. After duplicate species were removed, our final database included observations of reproductive traits for 20,225 species from 37 studies.
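
The duplicate-handling rule above (keep the observation from the oldest database for each species-trait pair) can be sketched as below. The authors worked in R with the tidyverse; this Python version is only an illustration of the rule. The record fields (`species`, `trait`, `value`, `database`, `db_year`) and the use of publication year to define "oldest" are assumptions made for this example.

```python
def deduplicate(observations):
    """Keep, for each (species, trait) pair, the record from the oldest database.

    'Oldest' is taken here to mean the smallest db_year (an assumption).
    If two records share the same year, the first one encountered is kept.
    """
    kept = {}
    for obs in observations:
        key = (obs["species"], obs["trait"])
        if key not in kept or obs["db_year"] < kept[key]["db_year"]:
            kept[key] = obs
    return list(kept.values())

# Illustrative records only (not taken from the dataset).
observations = [
    {"species": "Species a", "trait": "fecundity_per_bout", "value": 4.0,
     "database": "DB_2015", "db_year": 2015},
    {"species": "Species a", "trait": "fecundity_per_bout", "value": 5.0,
     "database": "DB_2005", "db_year": 2005},
    {"species": "Species a", "trait": "fecundity_per_year", "value": 8.0,
     "database": "DB_2015", "db_year": 2015},
]

result = deduplicate(observations)
for obs in result:
    print(obs["species"], obs["trait"], obs["value"], obs["database"])
```

Note that different traits for the same species can still come from different databases after deduplication, which is why a species may list more than one source.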


James, K.L., Randall, N.P. & Haddaway, N.R. (2016) A methodology for systematic mapping in environmental sciences. Environmental Evidence, 5, 7.

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T., Miller, E., Bache, S., Müller, K., Ooms, J., Robinson, D., Seidel, D., Spinu, V. & Yutani, H. (2019) Welcome to the Tidyverse. Journal of Open Source Software, 4, 1686.

Usage notes

Data are provided in .csv, a non-proprietary format.


Centre for Geometric Biology, Monash University