Results of quantitative genetic sensitivity analysis performed on reconstructed pedigrees based on large-scale genealogies

Published Feb 18, 2025 on Dryad. https://doi.org/10.5061/dryad.fn2z34v5q

Data files

Feb 18, 2025 version files 293.48 MB

bayestest.R (28.16 KB)
genlib.R (5.15 KB)
pedgiree.R (28.29 KB)
README.md (3.62 KB)
scenario_1_data_increased_error_rate.zip (454.23 KB)
scenario_1_data.zip (453.21 KB)
scenario_2_data.zip (291.95 MB)
scenario_3_data.zip (553.83 KB)

Abstract

Investigating the evolution of complex traits in nature requires accurate assessment of their genetic basis. Quantitative genetic (QG) modeling is frequently applied to estimate the additive genetic variance (VA) in traits, combining phenotypic and pedigree data from a sample of individuals. Whether reconstructed from social links or molecular markers, empirical pedigrees differ in completeness, genealogical error rates and other attributes that can impact QG estimation. Here we investigate this impact using human genealogical data for six French-Canadian (FC) populations originating from the same genetic founding source but differing in their pedigrees’ attributes. First, we simulated phenotypic values along pedigrees under different trait architectures and ‘true’ parameter values (e.g. VA). Then we fitted mixed effects ‘animal’ models to these simulated data to assess how QG estimation was impacted by pedigree attributes. Our results show that pedigree size and depth were important determinants of the precision, but not the accuracy, of genetic parameter estimates. In contrast, pedigree completeness and entropy, two attributes related to the density of genealogical links, were not clearly associated with the performance of parameter estimation. Notably, a slight increase in the genealogical error rate was sufficient to cause a detectable underestimation of VA. Including maternal genetic effects in the simulations led to a slight underestimation of VA with pedigrees of smaller size and depth. Despite originating from the same genetic source, the six pedigrees yielded wide variations in QG estimates under identical conditions. These findings highlight the importance of sensitivity analyses in pedigree-based genetic studies on natural populations.

We have submitted the fitted model output data needed to recreate the published figures (scenario_1_data.zip, scenario_1_data_increased_error_rate.zip, scenario_2_data.zip, scenario_3_data.zip) and the R scripts (pedgiree.R, bayestest.R, genlib.R).

Descriptions

scenario_1_data.zip

This zip file contains 12 RDS files (6 pedigrees x 2 tested values) for Scenario 1 outlined in the published paper, which corresponds to a univariate phenotype simulation with additive and environmental variance components. Each RDS file should contain 5 columns; each column contains the 1,000 saved MCMC samples from the posterior distribution of the estimated additive variance of the simulated phenotype, corresponding to a single phenotype simulation + MCMCglmm run.
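As a minimal sketch of how one of these tables could be inspected in base R, the example below summarises each column (one simulation + MCMCglmm run) by its posterior mean, median and a 95% quantile interval. The helper name summarise_va and the stand-in file are assumptions made for illustration; the stand-in values are synthetic and are not results from the archive. Replace the temporary path with a file extracted from scenario_1_data.zip.

```r
# Summarise each column of a posterior-sample table (hypothetical helper).
# Rows are saved MCMC samples; columns are independent simulation + model runs.
summarise_va <- function(samples) {
  samples <- as.matrix(samples)
  t(apply(samples, 2, function(x) {
    c(mean   = mean(x),
      median = median(x),
      lower  = unname(quantile(x, 0.025)),
      upper  = unname(quantile(x, 0.975)))
  }))
}

# Synthetic stand-in with the documented shape (1,000 rows x 5 columns),
# so the sketch runs without the archive. These numbers are NOT real results.
set.seed(1)
fake <- matrix(rgamma(5000, shape = 20, rate = 40), nrow = 1000, ncol = 5)
path <- tempfile(fileext = ".rds")  # assumption: replace with an extracted RDS file
saveRDS(fake, path)

va <- readRDS(path)
stopifnot(nrow(va) == 1000, ncol(va) == 5)  # check the documented dimensions

va_summary <- summarise_va(va)
print(va_summary)
```

The result has one row per simulation run, which makes it easy to compare runs against the 'true' VA used in the simulation.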

scenario_1_data_increased_error_rate.zip

This zip file contains 12 RDS files (6 pedigrees x 2 tested values) for Scenario 1.1, which is exactly the same as Scenario 1 above except that the simulations included a slightly higher genealogical error rate. Each RDS file should contain 5 columns; each column contains the 1,000 saved MCMC samples from the posterior distribution of the estimated additive variance of the simulated phenotype, corresponding to a single phenotype simulation + MCMCglmm run. Each RDS file should be read in as a table (1,000 rows x 5 columns).
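Once an archive has been extracted, all of its RDS tables can be read in one pass. The sketch below assumes the files sit in a single directory; the helper name read_rds_dir, the directory and the stand-in file names are hypothetical, and the stand-in values are synthetic rather than taken from the archive.

```r
# Read every .rds file in a directory into a named list (hypothetical helper).
read_rds_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.rds$", full.names = TRUE,
                      ignore.case = TRUE)
  out <- lapply(files, readRDS)
  names(out) <- basename(files)
  out
}

# Synthetic stand-in directory so the sketch runs without the archive.
# Assumption: replace `dir` with the folder the zip file was extracted to.
dir <- tempfile("scenario_1_1_")
dir.create(dir)
set.seed(3)
for (i in 1:3) {
  saveRDS(matrix(rgamma(5000, shape = 20, rate = 40), nrow = 1000, ncol = 5),
          file.path(dir, sprintf("standin_%d.rds", i)))
}

tables <- read_rds_dir(dir)

# Check that every table has the documented 1,000 x 5 shape.
dims_ok <- vapply(tables, function(x) all(dim(x) == c(1000, 5)), logical(1))

# Posterior mean of the additive variance, pooled over the 5 runs in each file.
pooled_mean <- vapply(tables, function(x) mean(as.matrix(x)), numeric(1))
print(pooled_mean)
```

Running the same steps on the extracted scenario_1_data.zip folder would give matching per-file summaries for a side-by-side comparison of the two error rates.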

scenario_2_data.zip

This zip file contains 30 RDA files (6 pedigrees x 2 tested values x 5 simulations) for Scenario 2 outlined in the published paper, which corresponds to a univariate phenotype simulation with additive, maternal genetic and environmental variance components. This scenario required careful analysis; therefore, for each pedigree and each tested value of additive variance, we saved the complete MCMCglmm model as an RDA file.

Each RDA file loads as a fitted model object that can be analyzed using base R or MCMCglmm-specific functions. The relevant component is model$VCV, from which we take the first 3 columns, each with 1,000 rows representing the 1,000 saved MCMC samples:

Column 1: additive variance
Column 2: maternal genetic variance
Column 3: residual (or environmental) variance
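The column layout above can be applied as in the following sketch. Because load() restores objects under the names they were saved with, the example loads the file into a separate environment instead of assuming the object is called model. The helper load_model, the labels VA, VM and VR, and the stand-in object are assumptions made for illustration; the stand-in is a plain list with synthetic values, not a real MCMCglmm fit.

```r
# Load an .rda file into its own environment and return the single object
# it contains (hypothetical helper; assumes one object per file).
load_model <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)
  stopifnot(length(obj_names) == 1)
  get(obj_names[1], envir = env)
}

# Synthetic stand-in: a list with a VCV matrix of 1,000 samples x 3 components.
# These numbers are NOT real results from the archive.
set.seed(2)
model <- list(VCV = cbind(rgamma(1000, shape = 20, rate = 40),
                          rgamma(1000, shape = 10, rate = 40),
                          rgamma(1000, shape = 30, rate = 40)))
path <- tempfile(fileext = ".rda")  # assumption: replace with an extracted RDA file
save(model, file = path)

fit <- load_model(path)

# Keep the first three columns, in the order documented above.
vcv <- as.matrix(fit$VCV)[, 1:3]
colnames(vcv) <- c("VA", "VM", "VR")  # labels chosen here for readability

# Posterior mean of each variance component.
post_means <- colMeans(vcv)
print(post_means)
```

With the real files, loading the MCMCglmm package beforehand should make its summary and plotting methods available for the full model object.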

scenario_3_data.zip

This zip file contains 24 RDS files (6 pedigrees x 2 tested additive genetic values x 2 additive covariance values) for Scenario 3 outlined in the published paper, which corresponds to a bivariate phenotype simulation (two correlated simulated traits) with additive and environmental variance components and a positive genetic correlation. Each RDS file should contain 3 columns; each column holds the 1,000 saved MCMC samples from a single phenotype simulation + MCMCglmm run:

Column 1: additive variance of trait 1
Column 2: additive covariance between traits 1 and 2
Column 3: additive variance of trait 2
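Given these three columns, the posterior distribution of the genetic correlation can be derived sample by sample as the covariance divided by the square root of the product of the two variances. The sketch below assumes the column order listed above; the helper name genetic_correlation and the stand-in data are hypothetical, and the stand-in values are synthetic rather than results from the archive.

```r
# Genetic correlation for each MCMC sample: cov(1,2) / sqrt(VA1 * VA2).
# Assumes columns are ordered as VA trait 1, covariance, VA trait 2.
genetic_correlation <- function(samples) {
  samples <- as.matrix(samples)
  samples[, 2] / sqrt(samples[, 1] * samples[, 3])
}

# Synthetic stand-in with the documented shape (1,000 rows x 3 columns).
# The covariance is built from a known correlation so the result can be checked.
set.seed(4)
v1 <- rgamma(1000, shape = 20, rate = 40)
v2 <- rgamma(1000, shape = 20, rate = 40)
r  <- runif(1000, min = 0.2, max = 0.8)
fake <- unname(cbind(v1, r * sqrt(v1 * v2), v2))
path <- tempfile(fileext = ".rds")  # assumption: replace with an extracted RDS file
saveRDS(fake, path)

samples <- readRDS(path)
r_g <- genetic_correlation(samples)

# Posterior mean and 95% quantile interval of the genetic correlation.
r_g_summary <- c(mean = mean(r_g), quantile(r_g, c(0.025, 0.975)))
print(r_g_summary)
```

Computing the ratio per sample, and only then summarising, carries the joint uncertainty in the variances and covariance through to the correlation.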

Key Information Sources

The pedigree data are derived from genealogical data that are currently private and accessible only by request to the BALSAC project.

Sharing/Access information

Links to other publicly accessible locations of the data:

Partly accessible through BALSAC genealogical project (https://balsac.uqac.ca/en/)

Code/Software

R is required to run pedgiree.R, bayestest.R and genlib.R; the scripts were created using R version 3.6.3.
Annotations are provided throughout the scripts, covering 1) library loading, 2) dataset loading and cleaning, 3) analyses, and 4) figure creation.