# Reproducible analysis for the Dalechampia artificial selection paper

## Table 1: Genetic and Phenotypic matrices of variance-covariance

Two traits, log Gland Area (GA) and log Upper Bract Area (UBA), for two populations, Tovar and Tulum. The data come from a diallel experiment reported elsewhere (Bolstad et al. 2014). The table reports the G and P matrices estimated by MCMCglmm from the diallel data. P matrices are the sum of three covariance matrices: genetic (level "animal" in the MCMC model), residual (level "unit" in the model), and individual (level "ind"), as several measurements were provided for each individual. Batch and date effects were not included in the calculation of the P matrix.

MCMC chains are stored in data/POP_logUBA_logGA_times100_x.RData, POP being either Tovar or Tulum, and x being 1 or 2 (two parallel chains for each model). G and P matrices are computed in scripts/Gmatrices.R.

**Status: not totally reproducible**. The covariance matrices were built from the MCMC posteriors from an earlier analysis (Bolstad et al. 2014).

## Table 2

Prediction uncertainties, corresponding to the shaded areas in figures 3 and 4.

**Status: reproducible**

## Table 3: Model selection.

The Selection Response Analysis procedure (sra) was applied to the selection response data for both populations, Tovar and Tulum, independently. Only summary statistics (means, variances, and covariances for Up and Down selected lines each generation) were considered for the selection response. Model selection was based on comparing the AICc scores of four models: identical genetic variances in Up and Down lines (symmetric models) or independent genetic variances in Up and Down lines (asymmetric models), each with and without the Bulmer effect (depletion of genetic variance due to the linkage disequilibrium generated by the truncation selection procedure).
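
As a rough illustration of the AICc comparison described above, the following Python sketch ranks four candidate models. It is not the sra routine used in this repository: the function names are hypothetical, and the log-likelihoods, parameter counts, and sample size are made-up placeholders, not results from the paper.

```python
def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n data points."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def rank_models(fits, n):
    """Return (name, AICc, delta AICc) tuples sorted from best to worst."""
    scores = {name: aicc(log_lik, k, n) for name, (log_lik, k) in fits.items()}
    best = min(scores.values())
    ranked = [(name, score, score - best) for name, score in scores.items()]
    return sorted(ranked, key=lambda row: row[1])


# Placeholder values only: (maximised log-likelihood, number of parameters)
fits = {
    "symmetric": (-50.0, 4),
    "symmetric + Bulmer": (-49.5, 4),
    "asymmetric": (-48.0, 5),
    "asymmetric + Bulmer": (-47.8, 5),
}

for name, score, delta in rank_models(fits, n=30):
    print(f"{name:22s} AICc = {score:7.2f}  delta = {delta:5.2f}")
```
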
Model selection was applied independently to the raw phenotypic data and to the same data centered on the control line. AICc scores must not be compared between models fitted on different datasets, so the procedure cannot tell whether centering the data on the control line improves the fit.

**Status: reproducible**. The routine calls the summarypop procedure, which produces summary statistics (as in Appendices S1 and S2) from the raw datasets.

## Table 4: Regression and correlations between GA and UBA.

The table presents, for each generation, in all three lines (Up, Down, Control) and for both populations, (1) the slope of the regression of log(UBA) (response variable) on log(GA) (predictor variable), and (2) the correlation coefficient between the individual means of UBA and GA (not log-transformed). In practice, the regression (1) is a mixed-effect model including the individual level as a random effect (up to three replicate measurements within each individual). The standard error of the regression slope is provided along with the slope, and the approximate 95% confidence interval of the correlation coefficient (based on Fisher's Z transform) is indicated.

**Status: reproducible.**

## Figures 3 and 4

Observed vs predicted selection response for focal (top) and correlated (bottom) traits, and for Tovar (left) and Tulum (right) populations. Figure 3 is up-down centered, while figure 4 is control-centered. Centering affects mean phenotype, phenotypic standard errors, and prediction errors. The formula for prediction error is described in table S3 below.

**Status: reproducible.**

## Table S1 and Table S2

Summary statistics of the results of the selection experiment (table S1 for Tovar, table S2 for Tulum).
Summary statistics include: Replicate (Up, Down, Control), Generation (from 1 to 5), average log GA (trait x = selected trait), phenotypic variance in log GA, selection differential for GA, difference in phenotypic variance between breeders and the whole population, average log UBA (trait y = correlated trait), phenotypic variance in log UBA, phenotypic covariance between log GA and log UBA, sample size N, and standard errors of the means of traits x and y.

**Status: reproducible.**

## Figure S3

Selection response and predictions on the original (log) scale. The figure is built in the same way as figures 3 and 4.

**Status: reproducible.**

### General setting

Equation (7) in the main text features three sources of uncertainty:

* Environmental uncertainty: every generation, there is an independent environmental variance Ve on each individual measurement, leading to an error variance on the mean of V1 = Ve/N
* Genetic drift: as detailed in Appendix S4, in the current setting, genetic drift adds V2 = t Va (1/Np - 1/2N - var(W)/N) to the error variance (where t is the number of generations after the initial population, i.e. t=0 in the first generation).
* Uncertainty on Va: the estimate of Va from the diallel is affected by a variance Var(Va). This does not affect the error variance in the control line, assuming that the error is symmetrical, but it adds a term V3 = Var(Va) t^2 beta^2 to the error variance in the selected lines.

**Note 1**: for error variance calculations, the selection gradient beta was averaged (in absolute value) over the Up and Down lines and across generations. N=64 and Np=12 were assumed in all lines and populations.

**Note 2**: the theory predicts an effect of the selection strength on genetic drift, through the term var(W)/N. var(W) cannot be calculated simply for artificial selection, but it can be approximated, as detailed in appendix S6.
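
The three variance components listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's code: the function name and the numerical inputs are hypothetical, `1/2N` is read as 1/(2N), and the defaults N=64 and Np=12 are taken from Note 1.

```python
def error_components(t, Ve, Va, var_Va, beta, N=64, Np=12, var_W=0.0):
    """Return (V1, V2, V3) at t generations after the initial population.

    V1: environmental sampling error on the mean
    V2: accumulated genetic drift (assumes 1/2N means 1/(2N))
    V3: error due to the uncertainty on Va (selected lines only)
    Setting var_W=0 drops the var(W)/N term.
    """
    V1 = Ve / N
    V2 = t * Va * (1.0 / Np - 1.0 / (2.0 * N) - var_W / N)
    V3 = var_Va * t**2 * beta**2
    return V1, V2, V3


# Made-up inputs, for illustration only
for t in range(5):
    V1, V2, V3 = error_components(t, Ve=0.64, Va=0.1, var_Va=0.01, beta=0.5)
    print(f"t={t}: V1={V1:.4f}  V2={V2:.4f}  V3={V3:.4f}")
```

Only V2 and V3 grow with t, linearly and quadratically respectively, so both are zero in the initial generation.
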
In practice, heritabilities for the selected trait are about 0.3, which is in the range where drift is barely affected by selection. For the sake of correctness, different drift variances (with and without the var(W)/N term) were thus calculated for selected and control lines, but the resulting estimates were identical up to the fourth digit. Therefore, only one column for the drift uncertainty is provided in the table.

### Raw data

* Control line: the error at generation t (t=0 is the starting generation) is Vc(t) = V1 + V2
* Selected lines: the error is Vs(t) = V1 + V2 + V3

### Control-centered data

By definition, there is no variance on the control line. All the error is reported in the selected lines:

* Control line: error = 0
* Selected lines: error = 2 V1 + 2 V2 + V3

### Up-down centered data

Here the variance is redistributed among all lines. The calculation is not straightforward: drift is independent in the two selected lines, whereas the error on the additive variance (V3) is the same in both selected lines.

* Control line: error = 3/2 V1 + V2 + 1/2 V3
* Selected lines: error = 1/2 V1 + 1/2 V2 + V3

**Note**: a similar calculation had to be made to distribute the phenotypic standard errors across the three lines.

## Figure S6

Variance-covariance ellipses (assuming a multivariate normal distribution, with ellipses representing the 95% region) of G matrix estimates from the diallel experiment. Light lines correspond to 1000 iterations of the MCMC chain; thick lines represent the posterior mean.

**Status: not totally reproducible**, as for table 1.
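
For readers who want to see how such an ellipse can be obtained from a 2x2 covariance matrix, here is a minimal Python sketch. It assumes that the 95% region is the bivariate-normal contour given by the chi-square quantile with 2 degrees of freedom; the plotting code in this repository may proceed differently, and the function name and the example matrix are hypothetical.

```python
import math


def ellipse_points(cov, mean=(0.0, 0.0), level=0.95, n_points=100):
    """Points on the `level` probability contour of a bivariate normal."""
    (a, b), (b2, d) = cov
    if abs(b - b2) > 1e-12:
        raise ValueError("covariance matrix must be symmetric")
    # Eigenvalues of a symmetric 2x2 matrix (closed form)
    half_trace = (a + d) / 2.0
    root = math.sqrt(((a - d) / 2.0) ** 2 + b**2)
    lam1, lam2 = half_trace + root, max(half_trace - root, 0.0)
    # Orientation of the major axis
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    # Chi-square quantile with 2 df has the closed form -2 * ln(1 - level)
    scale = math.sqrt(-2.0 * math.log(1.0 - level))
    r1, r2 = scale * math.sqrt(lam1), scale * math.sqrt(lam2)
    points = []
    for i in range(n_points):
        phi = 2.0 * math.pi * i / n_points
        x, y = r1 * math.cos(phi), r2 * math.sin(phi)
        # Rotate from the eigenvector basis to the trait axes
        points.append((
            mean[0] + x * math.cos(theta) - y * math.sin(theta),
            mean[1] + x * math.sin(theta) + y * math.cos(theta),
        ))
    return points


# Made-up covariance matrix, for illustration only
pts = ellipse_points([[2.0, 0.5], [0.5, 1.0]])
print(len(pts), "points on the 95% ellipse")
```

Applying the function to each sampled G matrix and to the posterior mean matrix would give the light and thick outlines, respectively.
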