# Data used in: Heritability and variance components of seed size in wild species: influences of breeding design and the number of genotypes tested

## Cite this dataset

Larios, Eugenio; Ramirez-Parada, Tadeo; Mazer, Susan (2023). Data used in: Heritability and variance components of seed size in wild species: influences of breeding design and the number of genotypes tested [Dataset]. Dryad. https://doi.org/10.25349/D9660G

## Abstract

Seed size affects individual fitness in wild plant populations, but its capacity to evolve may be limited by low narrow-sense heritability (h2). Because h2 is estimated as the proportion of total phenotypic variance (s2P) attributable to additive genetic variance (s2A), low values of h2 may reflect low s2A (potentially eroded by natural selection) or high values of the other components that contribute to s2P, such as extranuclear maternal effects (m2) and environmental variance (e2). Here, we reviewed the published literature and performed a meta-analysis to determine whether h2 of seed size is routinely low in wild populations and, if so, which components of s2P contribute most strongly to total phenotypic variance. We analyzed available estimates of h2 of seed size, as well as the variance components from which these estimates were derived. Maternal and environmental components of s2P were significantly greater than the additive (s2A), dominance, paternal, and epistatic components. These results suggest that the low h2 of seed size in wild populations (the mean value observed in this study was 0.13) is due both to high maternal and environmental (residual) variance and to low s2A. The breeding design used to estimate h2 and m2 also influenced their values: studies using diallel designs generated lower variance ratios than those using nested and other designs, whereas e2 was not influenced by breeding design. For some breeding designs, the number of genotypes included in a study also influenced the resulting h2 and e2 estimates, but not m2. Our data support the view that a diallel design is better suited than the alternatives for the accurate estimation of s2A in seed size, because its factorial structure and inclusion of reciprocal crosses allow additive and non-additive components of variance to be estimated independently.

## Methods

### Data search

We performed a literature search in Web of Science to identify published estimates of variance components for seed size in wild plant species, using a query containing the following keywords: (“heritab*” or “variance component*”) and (“seed size” or “seed mass” or “seed weight”). We included only studies that examined the quantitative genetics of seed size in wild plant populations and excluded those that investigated agricultural or commercial species. We also excluded studies that pooled genotypes from multiple populations to estimate species-level heritabilities. Included studies reported narrow- and/or broad-sense heritabilities for individual populations, the variance components (as raw values) used to estimate these parameters, or both.

### Data extraction

From each selected study, we extracted the following estimated parameters: 1) raw values of the variance components of seed size (s2A, s2M, s2P, s2E, s2D, s2K; Table 1); and 2) narrow-sense heritability (h2 = s2A:s2P), when available. Whenever a raw variance component was reported for the same population in different years, we averaged the yearly values and retained a single estimate. When parameters were reported for more than one population, we included each population's estimate in our data set. When seed size was measured in multiple ways (e.g., seed weight, seed length, or seed area), we used the seed-weight value and discarded the others.
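The year-averaging rule above can be sketched as follows. This is a minimal illustration only: the record layout, field names, and values are hypothetical and are not taken from the archived data files. It shows that estimates are grouped by publication, population, and component before averaging, so different populations remain separate estimates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (publication, population, year, component, raw variance).
# Field layout and values are illustrative only, not taken from the dataset.
records = [
    ("pubA", "pop1", 2001, "s2A", 0.010),
    ("pubA", "pop1", 2002, "s2A", 0.014),
    ("pubA", "pop1", 2001, "s2E", 0.060),
    ("pubA", "pop2", 2001, "s2A", 0.020),
]


def collapse_years(rows):
    """Average a raw variance component reported for the same publication,
    population, and component in different years, keeping one value each."""
    groups = defaultdict(list)
    for pub, pop, _year, comp, value in rows:
        groups[(pub, pop, comp)].append(value)
    # Each (publication, population) pair stays a separate estimate.
    return {key: mean(values) for key, values in groups.items()}


collapsed = collapse_years(records)
for key, value in sorted(collapsed.items()):
    print(key, round(value, 4))
```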

Additionally, for each published parameter we recorded the following: publication identity, breeding design, and the number of maternal and paternal genotypes used in the breeding design that generated the parameter estimate. Breeding design refers to the pattern by which controlled pollinations were performed or by which natural pollinations occurred in each study. Categories examined here include: diallel and nested designs, clonal replication (hereafter “clones”), and autogamously self-fertilizing genotypes (hereafter “selfing”). Studies that reported heritability estimates but no breeding design (e.g., naturally pollinated maternal lines) were also excluded because such estimates were derived from open-pollinated genotypes and were likely to be confounded by environmentally induced maternal effects.

### Data analysis

Standardization of variance components. To compare the relative contributions of each type of variance component to seed size across the published studies, we standardized the raw variance components by calculating variance ratios: each raw variance component was divided by the total phenotypic variance in seed size (i.e., the sum of all reported variance components, including the focal component). h2, for instance, is equivalent to additive genetic variance divided by total phenotypic variance (s2A:s2P). In the same manner, we defined m2 as the maternal variance component divided by total phenotypic variance (s2M:s2P), and e2 as the environmental (residual) variance component divided by total phenotypic variance (s2E:s2P). The same procedure was applied to the remaining variance components: paternal variance (Pat2 or s2Pat:s2P), dominance variance (d2 or s2D:s2P), and epistatic variance (k2 or s2K:s2P). Because standardized components are unitless, they are well suited for comparisons among species, among independently conducted studies, and among different units of measurement (e.g., mass vs. linear measures).
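The standardization step can be sketched in a few lines. This assumes, as stated above, that total phenotypic variance is the sum of all reported components for a study; the component values are hypothetical. The example also illustrates why the ratios are unitless: rescaling every raw component by the same factor (e.g., a change of mass units) leaves the ratios unchanged.

```python
def variance_ratios(components):
    """Divide each raw variance component by total phenotypic variance,
    taken here as the sum of all reported components (focal one included)."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return {name: value / total for name, value in components.items()}


# Hypothetical raw components from a single study.
raw = {"s2A": 0.02, "s2M": 0.05, "s2E": 0.13}
ratios = variance_ratios(raw)
h2, m2, e2 = ratios["s2A"], ratios["s2M"], ratios["s2E"]
print(round(h2, 3), round(m2, 3), round(e2, 3))

# Same components in different (hypothetical) units: ratios are identical.
rescaled = variance_ratios({k: v * 1e6 for k, v in raw.items()})
print(all(abs(rescaled[k] - ratios[k]) < 1e-12 for k in raw))
```

By construction, the ratios computed for one study sum to one, so a large maternal or environmental share necessarily leaves a smaller share for h2.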

Model construction. Some publications reported multiple estimates of the same variance component, so the variance ratios analyzed here were not fully independent, which violates an assumption of ordinary least-squares methods. To account for this non-independence and for variation among publications in the variance ratios of seed size, we analyzed the data with generalized linear mixed-effects models (GLMMs) that included publication identity as a random effect. We included only publication identity as a random effect because most studies estimated variance components for a single species; publication identity and species identity therefore carry nearly the same information, and including both as random effects would make them nearly confounded.

The number of maternal and paternal genotypes used to estimate each variance component varied greatly among published studies (range = 2–170 genotypes; Table S1), and we reasoned that the number of genotypes contributing to a genetically determined parameter estimate might influence its estimated value. Accordingly, we controlled for variation among studies in the number of genotypes sampled by including a weighting factor in the GLMMs. Weights were set equal to the number of genotypes (maternal or paternal, depending on the variance component) used in each study. Specifically, we used the number of paternal genotypes as weights in models of h2 or Pat2 because these parameters are usually estimated from trait variation among paternal lines. We used the number of maternal genotypes as weights in models of m2 and e2 because maternal and environmental sources of variance in seed size are more likely to be influenced by maternal than by paternal genotypes. We also used the number of maternal genotypes as weights in models of d2 and k2. These parameters may be influenced by the number of either maternal or paternal genotypes, and among the studies analyzed here the number of maternal genotypes equaled or exceeded the number of paternal genotypes (Table S1), so using the larger (maternal) count takes this into account.
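The weighting rules above amount to a simple lookup from ratio type to genotype count. The sketch below encodes that lookup; the function and dictionary names are hypothetical and are not taken from the archived Rmd code.

```python
# Which genotype count serves as the GLMM weight for each variance-ratio type,
# following the rules described above. Labels match the ratio names in the text.
WEIGHT_SOURCE = {
    "h2": "paternal",
    "Pat2": "paternal",
    "m2": "maternal",
    "e2": "maternal",
    "d2": "maternal",
    "k2": "maternal",
}


def genotype_weight(ratio_type: str, n_maternal: int, n_paternal: int) -> int:
    """Return the weight for one estimate; raises KeyError for unknown ratio types."""
    if WEIGHT_SOURCE[ratio_type] == "paternal":
        return n_paternal
    return n_maternal


# Hypothetical study with 40 maternal and 10 paternal genotypes.
for ratio in ("h2", "m2", "d2"):
    print(ratio, genotype_weight(ratio, n_maternal=40, n_paternal=10))
```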

GLMMs are robust tools for analyzing variables that vary across multiple levels, and they can be used with alternative distributional assumptions for the residuals. The variance ratios used as response variables take values in the closed interval from zero to one, and many are exactly zero. Distributions commonly employed to model proportion data, such as the beta distribution, are defined only on the open interval (0, 1) and therefore could not accommodate these values. Instead, we used a quasi-likelihood approach in which only the relationship between the mean and the variance of the observations and the range of the response (but not its precise distribution) are assumed. Specifically, we used a quasi-binomial GLMM with a logit link function for all models, which uses the variance structure of a binomial distribution while allowing continuous values in the [0, 1] range. We fitted all GLMMs with the ‘glmmPQL’ function of the R package ‘MASS’ version 7.3-58.1, which uses penalized quasi-likelihood (PQL) for parameter estimation.
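Written out as equations, our reading of the weighted quasi-binomial GLMM described above is sketched below. The notation is ours rather than the authors': y_ij is the i-th variance ratio from publication j, μ_ij its conditional mean, x_ij the fixed-effect predictors (ratio type, or breeding design, genotype number, and their interaction), b_j the publication random intercept, w_ij the genotype-count weight, and φ a dispersion parameter estimated from the data. The sketch assumes the genotype counts enter as prior weights in the usual quasi-binomial way.

```latex
\operatorname{logit}(\mu_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_j,
\qquad b_j \sim \mathcal{N}(0, \sigma_b^{2})

\operatorname{Var}(y_{ij} \mid b_j) = \phi \, \frac{\mu_{ij}\,(1 - \mu_{ij})}{w_{ij}}
```

Only the mean (through the logit link) and the variance function are specified, so an observed ratio of exactly zero poses no problem, whereas a beta log-likelihood would require evaluating log(y) at y = 0. Larger w_ij shrinks the assumed variance of an estimate, giving studies with more genotypes more influence.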

Comparison of variance components of seed size. To compare the means of the distinct variance ratio types (h2, d2, Pat2, m2, e2, and k2), we used a quasi-binomial GLMM with the observed value of each variance ratio estimate as the response and the type of variance ratio as a categorical predictor. Publication identity was included as a random effect, and each observation was weighted by the number of paternal or maternal genotypes used to estimate the reported ratio; as described above, the choice between maternal and paternal counts depended on the type of variance ratio. Marginal means of each variance ratio type and pairwise comparisons between ratio types were obtained with the ‘emmeans’ function of the ‘emmeans’ R package version 1.7.5, using pairwise (Tukey) contrasts with Tukey-adjusted p-values to account for multiple hypothesis testing.
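Because the model uses a logit link, marginal means are estimated on the logit scale and back-transformed to proportions, and a difference between two logit-scale means is a log odds ratio. The sketch below illustrates that arithmetic and the number of pairwise comparisons among six ratio types. The logit-scale means are hypothetical, and the Tukey p-value adjustment itself (handled by ‘emmeans’ in the original analysis) is not reproduced here.

```python
import math
from itertools import combinations


def inv_logit(x):
    """Back-transform a logit-scale value to the proportion scale."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical marginal means on the logit (link) scale, one per ratio type.
logit_means = {"h2": -1.9, "d2": -3.0, "Pat2": -3.2, "m2": -1.2, "e2": -0.6, "k2": -3.5}

# Marginal means on the proportion scale.
response_means = {k: inv_logit(v) for k, v in logit_means.items()}

# All pairwise contrasts; a difference of logits is a log odds ratio.
contrasts = {
    (a, b): logit_means[a] - logit_means[b]
    for a, b in combinations(logit_means, 2)
}

print(len(contrasts), "pairwise comparisons")
for (a, b), diff in contrasts.items():
    print(f"{a} vs {b}: odds ratio = {math.exp(diff):.2f}")
```

With six ratio types there are 6 × 5 / 2 = 15 pairwise comparisons, which is why a multiplicity adjustment such as Tukey's is needed.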

Heritability and variance components of seed mass in relation to breeding design and number of genotypes. To determine whether breeding design and number of genotypes influenced h2, m2, and e2 of seed size, we fitted a separate quasi-binomial GLMM with each of these variance ratio types as the response variable. Each model included breeding design as the main explanatory variable, the number of genotypes (maternal or paternal, as assigned above) and the interaction between breeding design and number of genotypes as control variables, and publication identity as a random effect. The number of genotypes was also used as a weight so that estimates obtained from a greater number of genotypes carried more information. In each model, we used the ‘emmeans’ package to estimate the marginal mean for each breeding design and tested pairwise differences between designs with Tukey multiple comparisons, adjusting p-values for multiple hypothesis testing with the Tukey method. The significance of the effects of breeding design and genotype number was assessed using Type III ANOVA as implemented in the ‘car’ package version 3.1-0 in R.
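Including the interaction between breeding design and number of genotypes lets the effect of genotype number differ among designs. The sketch below shows how such a model's fixed effects would combine into a predicted variance ratio on the logit scale. All coefficient values are hypothetical, treatment coding with "diallel" as the reference level is an assumption made for illustration, and predictions set the publication random effect to zero.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical fixed-effect coefficients on the logit scale (treatment coding,
# "diallel" as reference design). Values are illustrative only.
coef = {
    "intercept": -2.5,
    "design": {"diallel": 0.0, "nested": 0.8, "clones": 0.6, "selfing": 0.4},
    "n": 0.005,  # genotype-number slope for the reference design
    "design:n": {"diallel": 0.0, "nested": -0.004, "clones": 0.0, "selfing": 0.002},
}


def predicted_ratio(design, n_genotypes):
    """Predicted variance ratio with the publication random effect set to zero.
    Each design has its own genotype-number slope: n + design:n."""
    slope = coef["n"] + coef["design:n"][design]
    eta = coef["intercept"] + coef["design"][design] + slope * n_genotypes
    return inv_logit(eta)


for design in coef["design"]:
    print(design, round(predicted_ratio(design, 20), 3), round(predicted_ratio(design, 100), 3))
```

Under this coding, a non-zero "design:n" coefficient means that the change in the predicted ratio with additional genotypes depends on the breeding design, which is the pattern the interaction term is meant to capture.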

## Usage notes

RStudio was used to create the R Markdown (Rmd) file containing the code that generates the output reported in the Results section of the manuscript, including the figures and tables in the main text and in the Supplemental Materials.

## Funding

University of California, Riverside