The relationship between neutral genetic diversity and performance in wild arthropod populations
Data files
Dec 17, 2024 version files (48.52 KB)
- README.md (6.25 KB)
- supp_tables.xlsx (42.27 KB)
Abstract
Larger effective populations (Ne) are characterised by higher genetic diversity, which is expected to predict population performance (average individual performance that influences fitness). Empirical studies of the relationship between neutral diversity and performance mostly represent species with small Ne, while there are limited data from the species-rich and ecologically important arthropods that are assumed to have large Ne but are threatened by massive declines. We performed a systematic literature search and used meta-analytical models to test the prediction of a positive association between neutral genetic diversity and performance in wild arthropods. From 14 relevant studies of 286 populations, we detected a weak (r = 0.15) but non-significant positive association both in the full data set (121 effect sizes) and in a reduced data set accounting for dependency (14 effect sizes). Theory predicts that traits closely associated with fitness show a relatively stronger correlation with neutral diversity; this relationship was upheld for longevity and marginally for reproduction. Our analyses point to major knowledge gaps in our understanding of relationships between neutral diversity and performance. Future studies using genome-wide data sets across populations could guide more powerful designs to evaluate relationships between adaptive, deleterious and neutral diversity and performance.
README: The relationship between neutral genetic diversity and performance in wild arthropod populations
File name: supp_R_code.R
This R script contains all the code needed to replicate the analyses (including packages and functions).
Models described in the methods section of the manuscript are located in this code.
The code is organised into five main parts:
- Data manipulation (lines 50 to 65)
- Publication bias tests (lines 70 to 89)
- Differences between genetic diversity measures (lines 95 to 210)
- Analysis of all effect sizes (lines 216 to 390)
- Analysis of reduced dataset (lines 395 to 548)
#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#
File name: supp_tables.xlsx
This Excel document contains the following four sheets:
- 'Table_S1'
  - the data are organised in long format (134 rows)
  - column descriptions:
    - A. study = study reference from which relevant estimates were extracted
    - B. category = performance trait category
    - C. performance = fitness trait
    - D. genetic diversity = the metric used to estimate genetic diversity in the study
    - E. genetic diversity minimum = lowest observed level of genetic diversity for the respective metric
    - F. genetic diversity maximum = highest observed level of genetic diversity for the respective metric
    - G. theoretical minimum = lowest theoretical level of genetic diversity for the respective metric. NAs are used when there is no theoretical minimum
    - H. theoretical maximum = highest theoretical level of genetic diversity for the respective metric. NAs are used when there is no theoretical maximum
    - I. Fisher's z-transformation of r = effect size for study. NAs refer to undefined numbers that result from calculations where one or more input values were 0
    - J. expected impact on fitness = expected direction of correlation between the respective genetic diversity and fitness trait measured. + refers to positive expected impact, - refers to negative expected impact
    - K. adjusted Fisher's z-transformation of r = z-statistic corrected according to the expected impact on fitness
    - L. sampling variance = sampling variance of Fisher's z-transformation of r
    - M. latin binomial = scientific name of each species
    - N. family = taxonomic family of each species
    - O. order = taxonomic order of each species
    - P. data type = data analysis method
    - Q. source = where the relevant estimates were extracted from in the study
    - R. genetic type = genetic marker type used to estimate genetic diversity in the study
    - S. marker number = number of genetic markers used to estimate genetic diversity in the study
- 'all_data for R'
  - subset of Table S1
  - the data are organised in long format (122 rows)
  - export as a .csv file and read into R (some of the column headings will need changing to match the R code)
  - column descriptions:
    - A. study = study reference from which relevant estimates were extracted
    - B. category = performance trait group
    - C. genetic diversity category = the type of metric used to estimate genetic diversity in the study
    - D. genetic diversity minimum = lowest observed level of genetic diversity for the respective metric
    - E. genetic diversity maximum = highest observed level of genetic diversity for the respective metric
    - F. theoretical minimum = lowest theoretical level of genetic diversity for the respective metric. NAs are used when there is no theoretical minimum
    - G. theoretical maximum = highest theoretical level of genetic diversity for the respective metric. NAs are used when there is no theoretical maximum
    - H. Fisher's z-transformation of r = effect size for study
    - I. expected impact on fitness = expected direction of correlation between the respective genetic diversity and fitness trait measured
    - J. adjusted Fisher's z-transformation of r = z-statistic corrected according to the expected impact on fitness
    - K. sampling variance = sampling variance of Fisher's z-transformation of r
    - L. latin binomial = scientific name of each species
- 'reduced_data for R'
  - the reduced data to account for within-study dependencies
  - see 'supp_data_B.txt' for a description of how the data were reduced
  - the data are organised in long format (15 rows)
  - export as a .csv file and read into R (some of the column headings will need changing to match the R code)
  - column descriptions:
    - A. study = study reference from which relevant estimates were extracted
    - B. category = performance trait group
    - C. genetic diversity = the metric used to estimate genetic diversity in the study
    - D. genetic diversity category = the type of metric used to estimate genetic diversity in the study
    - E. genetic diversity minimum = lowest observed level of genetic diversity for the respective metric
    - F. genetic diversity maximum = highest observed level of genetic diversity for the respective metric
    - G. theoretical minimum = lowest theoretical level of genetic diversity for the respective metric. NAs are used when there is no theoretical minimum
    - H. theoretical maximum = highest theoretical level of genetic diversity for the respective metric. NAs are used when there is no theoretical maximum
    - I. Fisher's z-transformation of r = effect size for study
    - J. expected impact on fitness = expected direction of correlation between the respective genetic diversity and fitness trait measured
    - K. adjusted Fisher's z-transformation of r = z-statistic corrected according to the expected impact on fitness
    - L. sampling variance = sampling variance of Fisher's z-transformation of r
    - M. latin binomial = scientific name of each species
#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#
File name: supp_data_A.txt
This plain text document contains details of the figures, tables, and text fragments from which data were obtained, and the calculations used to pool data, organised alphabetically by Latin name.
#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#
File name: supp_data_B.txt
This plain text document contains details of how the data in supp_data_A.txt were collapsed to have one effect size per species.
#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#
File name: tree.nex
Phylogenetic tree (NEXUS format) to read into R.
Methods
Study eligibility criteria
To review studies on the association between neutral genetic diversity and performance in natural arthropod populations, we included only studies that presented data on both population-level neutral genetic diversity and measure(s) of performance in at least two populations of an arthropod species. Performance traits were selected as proxies for fitness and included reproductive, life-history, immunity, morphological, and behavioural traits, listed in Table 1 (adapted from Mousseau & Roff, 1987). The suitability of these traits as proxies for fitness in natural populations is supported by associations between offspring production and longevity, morphology (body size) or behaviour (number of mates) in wild crickets (Rodríguez-Muñoz et al., 2010) but otherwise relies on extrapolations from experimental or comparative studies (e.g., Fabian et al., 2018; Pekkala et al., 2011). Performance traits were measured either in the natural setting of the population or under controlled conditions in the lab or common gardens. We adopted very broad inclusion criteria for estimates of neutral genetic diversity by accepting low-coverage markers that are traditionally used for this purpose (e.g., microsatellites and allozymes) as well as estimates based on high genome coverage (e.g., ddRAD sequencing). Some of these markers may not be strictly neutral, just as ddRAD sequences, for example, may include a small subset of loci under selection. However, when averaged over several or many loci, these estimates are here considered to reflect neutral genetic diversity. We rejected studies that sampled non-representatively and non-randomly (e.g., using isofemale lines or supersisters to generate populations, or using juvenile clutches or colonies without specifying how many populations they were sampled from).
We removed studies that artificially manipulated the genetic diversity of populations (‘experimental’ in Figure 1), studies that used too few markers to infer genetic diversity, i.e., fewer than three loci except when used to determine parentage (‘insufficient genetic data’ in Figure 1), and studies based on exactly the same data (‘repeated data’ in Figure 1).
Selection of search terms
To find eligible studies, we developed four parallel search strings: a string for relevant Web of Science categories, a genetic diversity string, a performance trait string, and a taxa group string (see Appendix 1). The genetic diversity string keywords were adapted from Hughes et al. (2008). Search terms for performance traits were designed to capture traits within the nine categories described in Table 1. Behavioural traits were included when a direct effect on survival or reproduction could be assumed. Morphological traits were assumed to mostly reflect body size and hence be proxies for fecundity or offspring production (Marshall & Gittleman, 1994; Rodríguez-Muñoz et al., 2010; Walker et al., 2003). Traits whose impact on fitness is not straightforward to interpret were excluded from the meta-analysis. These include learning, memory retention, nest mate recognition, sex ratio and colony aggression.
Given the large number of taxonomic groups among arthropods, we optimised the taxa group search string to remove redundant taxa keywords so that the search string did not surpass the keyword limit of Web of Science. We selected scientific names for arthropod subphyla, classes and commonly used sub-classes and orders using BIOSIS Previews (2020), Marshall (2006) and Wheeler et al. (2001) (Appendix 2 Table 1). Frequently used common names for these groups were then collated using Marshall (2006) and Wikipedia contributors (2023) as supporting information; these included the genera ‘Drosophila’ and ‘Daphnia’. Each common name was individually searched with the genetic diversity string and the Web of Science categories string, against an exclusionary string with the respective scientific names for class (excluding ‘Insecta’), sub-class and order (e.g., Beetle NOT Coleoptera). The same process was repeated vice versa for the scientific names against their respective common names. Keywords that yielded no relevant unique hits were dropped. As a last step, the selected keywords were searched with the genetic diversity string and the Web of Science categories string against the sub-phyla and against the class ‘Insecta’. The full list of scientific and common names that were optimised is presented in Appendix 2 (Table 1). All optimisation searches were conducted in the ‘All Fields’ field on Web of Science using abbreviations that would concisely capture relevant unique hits. For example, ‘Metaboli*’ was used instead of ‘Metabolism’ and ‘Metabolic rate’ separately.
Literature search and study selection
We performed the literature search in Web of Science (accessed through XX Library) covering the period 1900 to December 2022. We first performed a search with the Web of Science categories, genetic diversity, performance trait and taxa group strings in the ‘Abstract’ field. We then added results from a second search with the Web of Science categories, genetic diversity and performance trait strings still in the ‘Abstract’ field, but with the taxa group string in the ‘Title’ field plus an exclusionary taxa group string in the ‘Abstract’ field. This was to capture articles that only used the taxa names in their titles. Lastly, we searched the reference lists of papers that met our study selection criteria and of relevant review papers for additional articles that our Web of Science searches failed to capture (Charlesworth & Charlesworth, 1987; Crnokrak & Roff, 1999; DeWoody et al., 2021; Gibson & Nguyen, 2021; Hughes et al., 2008; Keller & Waller, 2002; King & Lively, 2012; Reed & Frankham, 2003; Soper et al., 2021).
Following the PRISMA 2020 guidelines for systematic reviews (Page et al., 2021), we obtained a total of 6,698 studies published in English from Web of Science and reference list searches (Figure 1). Based on the search criteria explained above, more than 6,400 studies were deemed ‘irrelevant’, and of the 235 records retained, 219 were experimental and therefore also excluded (Figure 1). In total, we acquired 14 unique studies with both neutral genetic diversity and performance data in natural arthropod populations.
Data extraction and effect size calculation
We extracted relevant information from each study in order to calculate the correlation (Pearson's r) between genetic diversity and performance, which we normalised using Fisher’s Z transform: Zr = (1/2) × ln((1 + r) / (1 - r)). The sampling variance for this effect size is 1 / (N - 3), where N is the number of populations from which each effect size is calculated, and is used to weight our statistical models. The correlation coefficient was calculated from data provided in the study, from data in the supplementary information, from reported test statistics and figures (e.g., Spearman’s rank correlations, t values, F-ratio from one-way ANOVA), or from our own calculations based on genetic data kindly provided by the authors (Pearson correlations presented in Appendix 3, Arteaga et al., 2019; Dobelmann et al., 2019; Freilij et al., 2022; Koch et al., 2020). Full details of the calculations and conversions underlying each effect size are provided in the supplementary data extraction file (supplementary data file A). To standardise the representation of data, we reversed the direction of the statistical association for traits that have a negative expected impact on fitness (e.g., parasite prevalence, extinction probability), making it comparable to traits with a positive expected impact on fitness (e.g., number of eggs produced). In total we calculated 121 effect sizes from the 14 studies involving 14 arthropod species (Table S1).
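The effect size calculation described above can be illustrated with a minimal Python sketch. This is not the authors' code (the analyses themselves are in the supplementary R script); the function names and the example values are assumptions chosen for illustration only.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a Pearson correlation r, with -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def sampling_variance(n_populations):
    """Sampling variance of Zr, 1 / (N - 3), where N is the number of populations."""
    if n_populations <= 3:
        raise ValueError("N must be greater than 3")
    return 1.0 / (n_populations - 3)


def adjusted_z(zr, expected_impact):
    """Reverse the sign of Zr when the expected impact on fitness is negative ('-')."""
    if expected_impact not in ("+", "-"):
        raise ValueError("expected_impact must be '+' or '-'")
    return -zr if expected_impact == "-" else zr


# Hypothetical values, not taken from the data set
r, n = 0.30, 10
zr = fisher_z(r)
print(zr, sampling_variance(n), adjusted_z(zr, "-"))
```

The transformation is undefined for r = 1 or r = -1, so such inputs would need to be handled separately.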
We extracted the following information from each study: i) how performance and genetic diversity were measured, ii) the observed range of genetic diversity, and iii) taxonomic information. In our sample of studies, performance traits were in the reproduction, longevity, immunity or morphology categories (see Table 1). We divided genetic diversity measures into four categories: heterozygosity (Ho, He), inbreeding coefficients (GIS, FIS, homozygosity), lineage-level diversity (polyandry), and molecular-level diversity (% polymorphic loci).
Non-independence
There are four levels of non-independence in our dataset: i) phylogenetic non-independence due to shared evolutionary history, ii) species-level non-independence due to multiple effect sizes per species (12 of the 14 species), iii) study-level non-independence due to multiple effect sizes per study (12 of the 14 studies), and iv) within-study non-independence due to multiple effect sizes being calculated from the same individuals. To address phylogenetic non-independence, we downloaded a phylogenetic tree of the 14 species in our dataset using the rotl R package (Michonneau et al., 2016). From this tree we derived a phylogenetic correlation matrix, using functions in the ape R package (Paradis & Schliep, 2019), to use as a random effect in our statistical models. The study and species levels of non-independence are confounded with each other because most studies are on unique species (10 of the 14 studies), the exceptions being two studies on two species each and two studies on the same species (Table S1). To account for study and species levels of non-independence, we included species ID as a random effect in our statistical models. Finally, to explore the impact of within-study non-independence on our results, we constructed a reduced dataset with one observation per species. This dataset avoids non-independence due to multiple measurements made on the same individuals and multiple measurements per study / species, since N_study = N_species = N_effect sizes = 14, and only phylogeny needs to be accounted for. To do this, we averaged or removed as many effect sizes as possible from each study while keeping unique levels of how performance and genetic diversity were measured (supplementary data file B). For example, if a study measured longevity and reproduction on the same individuals, both effect sizes were kept. From this, we sampled one effect size at random from each species.
We report the results of the analyses described below on the full- and species-level datasets side by side.
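The final step of the data reduction, drawing one effect size at random per species, might look like the following Python sketch. The row structure, the species labels and the seed are hypothetical. The actual reduction is documented in supplementary data file B and carried out in the R code.

```python
import random
from collections import defaultdict


def one_effect_per_species(rows, seed=1):
    """Return one randomly chosen row per species (assumed key: 'species')."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_species = defaultdict(list)
    for row in rows:
        by_species[row["species"]].append(row)
    # Iterate over species in sorted order so results do not depend on input order
    return [rng.choice(by_species[sp]) for sp in sorted(by_species)]


# Hypothetical rows, not taken from the data set
rows = [
    {"species": "Species A", "category": "longevity", "zr": 0.21},
    {"species": "Species A", "category": "reproduction", "zr": 0.05},
    {"species": "Species B", "category": "immunity", "zr": -0.10},
    {"species": "Species C", "category": "morphology", "zr": 0.33},
    {"species": "Species C", "category": "reproduction", "zr": 0.12},
]
reduced = one_effect_per_species(rows)
print(len(reduced))
```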
Publication bias and the measurement of genetic diversity
The tendency not to publish non-significant findings (publication bias) can distort the distribution of effect sizes in meta-analyses (Koricheva et al., 2013). We detected little evidence of publication bias using three different methods: the funnel plot (standard error vs effect size) was symmetrical, zero studies were estimated to be missing from our sample of studies based on a trim and fill analysis, and the intercept from Egger’s regression was non-significant (estimate = 0.34, se = 0.54, p = 0.54). These tests were conducted on the species-level dataset to avoid issues of non-independence.
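For readers unfamiliar with Egger’s regression, the sketch below shows its classical form in Python: the standardised effect (effect / SE) is regressed on precision (1 / SE) by ordinary least squares, and an intercept far from zero suggests funnel plot asymmetry. Whether the supplementary R code uses exactly this formulation is an assumption. The input values are hypothetical, and the p-value is omitted because it requires a t distribution that the Python standard library does not provide.

```python
import math


def egger_intercept(effects, variances):
    """Classical Egger regression: OLS of effect/SE on 1/SE.

    Returns (intercept, standard error of the intercept, residual degrees of freedom).
    """
    ses = [math.sqrt(v) for v in variances]
    x = [1.0 / s for s in ses]                    # precision
    y = [e / s for e, s in zip(effects, ses)]     # standardised effect
    k = len(x)
    if k < 3:
        raise ValueError("at least three effect sizes are needed")
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = residual_ss / (k - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / k + mean_x ** 2 / sxx))
    return intercept, se_intercept, k - 2


# Hypothetical effect sizes (Zr) and sampling variances, not taken from the data set
effects = [0.10, 0.25, -0.05, 0.40, 0.15]
variances = [1 / 7, 1 / 5, 1 / 12, 1 / 4, 1 / 9]
print(egger_intercept(effects, variances))
```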
The association between performance and genetic diversity was not strongly affected by how genetic diversity was measured. We examined this by modelling Zr as a function of genetic diversity measure category (four levels, see above) with the global intercept suppressed to estimate the mean effect size for each level. The only statistically significant association between performance and genetic diversity was when diversity was measured at the molecular level (% polymorphic loci) and only in the full data set (full data molecular level diversity Zr = 0.68, 95% CI = 0.16 to 1.06; species data molecular level diversity Zr = 0.36, 95% CI = -0.53 to 1.05, Table S2).
We calculated a scaled genetic diversity range that is comparable across species to investigate whether the strength of the correlation between genetic diversity and performance depends on the range of genetic diversity sampled. We did this by taking the difference between the maximum and minimum values of genetic diversity sampled for each effect size and dividing this difference by the maximum possible difference for a given measure of genetic diversity: (max_observed - min_observed) / (max_possible - min_possible). This expresses the observed difference as a fraction of the maximum possible difference, and ranges between 0 and 1, with 1 indicating that the sampled range is equal to the maximum possible range of genetic diversity. For example, for Bombus jonellus, the measure was He, which ranges between 0 and 1, and the minimum and maximum observed values in the study population were 0.696 and 0.766, respectively. This gives a scaled genetic diversity range of 0.070 = (0.766 - 0.696) / (1 - 0). It was not possible to calculate a theoretically possible range of genetic values for effect sizes based on polyandry and allelic richness (Ar). This reduced our sample size for this analysis to 85 effect sizes from 12 studies on 12 species.
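The scaling can be written as a short Python function. This is an illustrative sketch; the function name and the use of None to stand in for the NAs in the theoretical minimum and maximum columns are assumptions. The Bombus jonellus values are those given in the text.

```python
def scaled_diversity_range(obs_min, obs_max, theo_min, theo_max):
    """Observed range of genetic diversity as a fraction of the theoretical range.

    Returns None when a theoretical bound is missing (e.g. polyandry, allelic richness).
    """
    if theo_min is None or theo_max is None:
        return None
    possible = theo_max - theo_min
    if possible <= 0:
        raise ValueError("theoretical maximum must exceed theoretical minimum")
    return (obs_max - obs_min) / possible


# Bombus jonellus example from the text: He ranges between 0 and 1
bombus = scaled_diversity_range(0.696, 0.766, 0.0, 1.0)
print(round(bombus, 3))  # 0.07

# A metric without a theoretical range yields no scaled value
print(scaled_diversity_range(2.1, 3.4, None, None))  # None
```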
Data analysis
We constructed three statistical models in the R environment (R Core Team, 2023) to answer our questions (see the supplementary R code). To estimate the mean effect size for the correlation between genetic diversity and performance, we constructed an intercept-only model with Zr as the response variable. To determine whether traits closely associated with fitness show a stronger positive correlation with genetic diversity, we modelled Zr as a function of performance category (four levels: immunity, reproduction, longevity, morphology). We suppressed the global intercept in this model to estimate the mean effect size for each level. Finally, to determine whether we can explain variation in the association between genetic diversity and performance based on the genetic diversity range sampled, we modelled Zr as a function of the scaled genetic diversity range (continuous). Note that it was not possible to fit interactions between how performance was measured and the observed range of genetic diversity because of limited replication in the different performance categories.
In all models, Zr was weighted by its inverse sampling variance and the phylogenetic correlation matrix was included as a random effect. When analysing the full dataset, but not the species-level data, species ID was included as an additional random effect. We report I² estimates (Nakagawa & Santos, 2012) for these variance components in Tables S3-S5. Models were run in the MCMCglmm (Hadfield, 2010) and metafor (Viechtbauer, 2010) R packages to compare Bayesian vs. restricted maximum likelihood approaches to parameter estimation. We report the parameter estimates (posterior mode ± 95% credible intervals) from MCMCglmm in the main text and both sets of estimates in Tables S3-S5. In all models, the parameter estimates from MCMCglmm and metafor were consistent.
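To make the idea of inverse-variance weighting concrete, the Python sketch below computes a simple weighted mean of Zr and converts it back to r. It is a deliberately simplified fixed-effect calculation with a normal-approximation confidence interval. It omits the phylogenetic and species random effects of the models described above, so it does not reproduce the MCMCglmm or metafor estimates. The function names and input values are hypothetical.

```python
import math


def weighted_mean_effect(zs, variances):
    """Inverse-variance weighted mean of Zr, with its standard error and 95% CI."""
    weights = [1.0 / v for v in variances]
    weight_sum = sum(weights)
    mean_z = sum(w * z for w, z in zip(weights, zs)) / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    ci = (mean_z - 1.96 * se, mean_z + 1.96 * se)  # normal approximation
    return mean_z, se, ci


def z_to_r(z):
    """Back-transform Fisher's z to the correlation scale."""
    return math.tanh(z)


# Hypothetical effect sizes (Zr) and sampling variances, not taken from the data set
zs = [0.10, 0.25, -0.05, 0.40, 0.15]
variances = [1 / 7, 1 / 5, 1 / 12, 1 / 4, 1 / 9]
mean_z, se, ci = weighted_mean_effect(zs, variances)
print(z_to_r(mean_z), [z_to_r(bound) for bound in ci])
```

Effect sizes based on more populations have smaller sampling variances and therefore contribute more to the weighted mean.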