Results of quantitative genetic sensitivity analysis performed on reconstructed pedigrees based on large-scale genealogies
Data files
Feb 18, 2025 version; files total 293.48 MB
bayestest.R (28.16 KB)
genlib.R (5.15 KB)
pedgiree.R (28.29 KB)
README.md (3.62 KB)
scenario_1_data_increased_error_rate.zip (454.23 KB)
scenario_1_data.zip (453.21 KB)
scenario_2_data.zip (291.95 MB)
scenario_3_data.zip (553.83 KB)
Abstract
Investigating the evolution of complex traits in nature requires accurate assessment of their genetic basis. Quantitative genetic (QG) modeling is frequently applied to estimate the additive genetic variance (VA) in traits, combining phenotypic and pedigree data from a sample of individuals. Whether reconstructed from social links or molecular markers, empirical pedigrees differ in completeness, genealogical error rates and other attributes that can impact QG estimation. Here we investigate this impact using human genealogical data for six French-Canadian (FC) populations originating from the same genetic founding source but differing in their pedigrees’ attributes. First, we simulated phenotypic values along pedigrees and under different trait architectures and ‘true’ parameter values (e.g. VA). Then we fitted mixed effects ‘animal’ models to these simulated data, to assess how QG estimation was impacted by pedigree attributes. Our results show that pedigree size and depth were important determinants of the precision, but not accuracy, of genetic parameter estimates. In contrast, pedigree completeness and entropy, two attributes related to the density of genealogical links, were not clearly associated with the performance of parameter estimation. Notably, a slight increase in the genealogical error rate was sufficient to cause a detectable underestimation of VA. Including maternal genetic effects in the simulations led to a slight underestimation of VA with pedigrees of smaller size and depth. Despite originating from the same genetic source, the six pedigrees yielded wide variations in QG estimates under identical conditions. These findings highlight the importance of sensitivity analyses in pedigree-based genetic studies on natural populations.
https://doi.org/10.5061/dryad.fn2z34v5q
We have submitted the fitted model output data needed to recreate the published figures (scenario_1_data.zip, scenario_1_data_increased_error_rate.zip, scenario_2_data.zip, scenario_3_data.zip) and the R scripts (pedgiree.R, bayestest.R, genlib.R).
Descriptions
scenario_1_data.zip
This zip file contains 12 RDS files (6 pedigrees x 2 tested values) for Scenario 1 outlined in the published paper, corresponding to a univariate phenotype simulation with additive and environmental variance components. Each RDS file should contain 5 columns; each column holds the 1,000 saved MCMC samples from the posterior distribution of the estimated additive variance of the simulated phenotype, corresponding to a single phenotype simulation + MCMCglmm run.
scenario_1_data_increased_error_rate.zip
This zip file contains 12 RDS files (6 pedigrees x 2 tested values) for Scenario 1.1, which is exactly the same as Scenario 1 except that the simulations included a slightly higher genealogical error rate. Each RDS file should contain 5 columns; each column holds the 1,000 saved MCMC samples from the posterior distribution of the estimated additive variance of the simulated phenotype, corresponding to a single phenotype simulation + MCMCglmm run. Each RDS file should be read in as a table (1,000 rows x 5 columns).
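As an illustration of how the Scenario 1 and Scenario 1.1 tables might be read and summarized, here is a minimal base R sketch. It does not reproduce the authors' scripts: the file name is hypothetical, and the table is a synthetic stand-in with the documented shape (1,000 rows x 5 columns), so the printed numbers carry no meaning. Whether the real files hold a data frame or a matrix is an assumption; `as.matrix()` accepts both.

```r
# Synthetic stand-in for one Scenario 1 RDS file (1,000 MCMC samples x 5 runs).
# The real files come from the zip archives; this file name is hypothetical.
set.seed(1)
fake <- as.data.frame(matrix(rgamma(1000 * 5, shape = 20, rate = 40),
                             nrow = 1000, ncol = 5))
path <- file.path(tempdir(), "example_scenario_1.rds")
saveRDS(fake, path)

# Read the table back: rows are saved MCMC samples, columns are simulation runs.
va_samples <- as.matrix(readRDS(path))
stopifnot(nrow(va_samples) == 1000, ncol(va_samples) == 5)

# Posterior mean, median and 95% equal-tailed interval of VA for each run.
va_summary <- data.frame(
  run    = seq_len(ncol(va_samples)),
  mean   = colMeans(va_samples),
  median = apply(va_samples, 2, median),
  lower  = apply(va_samples, 2, quantile, probs = 0.025),
  upper  = apply(va_samples, 2, quantile, probs = 0.975)
)
print(va_summary)
```

With a real file, only `path` would change; the summary step is the same for both scenarios.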
scenario_2_data.zip
This zip file contains 30 RDA files (6 pedigrees x 2 tested values x 5 simulations) for Scenario 2 outlined in the published paper, corresponding to a univariate phenotype simulation with additive, maternal genetic and environmental variance components. This scenario required careful analysis; therefore, for each pedigree and each tested value of additive variance, we saved the complete MCMCglmm model as an RDA file.
Each RDA file is loaded as a model object and can be analyzed using base R or MCMCglmm-specific functions. The important part of the object is model$VCV, from which we take the first 3 columns, each with 1,000 rows representing the 1,000 saved MCMC samples:
Column 1: additive variance
Column 2: maternal genetic variance
Column 3: residual (or environmental) variance
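The column layout above can be sketched in base R as follows. This is an illustration under stated assumptions, not the authors' code: the file name is hypothetical, and a plain list with a `VCV` matrix stands in for a saved MCMCglmm fit. The labels `VA`, `VM` and `VR` are introduced here for readability; the real column names may differ, so the columns are selected by position as described above.

```r
# Mock object standing in for a saved MCMCglmm fit; only the VCV element is
# mimicked. In a real fit, VCV is a coda 'mcmc' object, which as.matrix()
# also accepts.
set.seed(2)
model <- list(VCV = cbind(
  additive = rgamma(1000, shape = 20, rate = 40),
  maternal = rgamma(1000, shape = 10, rate = 50),
  residual = rgamma(1000, shape = 30, rate = 60)
))
path <- file.path(tempdir(), "example_scenario_2.rda")  # hypothetical name
save(model, file = path)
rm(model)

# load() returns the names of the restored objects, so the name under which
# the model was saved does not need to be known in advance.
env <- new.env()
obj_names <- load(path, envir = env)
fit <- get(obj_names[1], envir = env)

# First three columns by position: additive, maternal genetic, residual.
vcv <- as.matrix(fit$VCV)[, 1:3]
colnames(vcv) <- c("VA", "VM", "VR")  # labels chosen for this example

# Posterior 2.5%, 50% and 97.5% quantiles for each variance component.
vcv_summary <- t(apply(vcv, 2, quantile, probs = c(0.025, 0.5, 0.975)))
print(vcv_summary)
```

Loading into a separate environment keeps the restored object from overwriting anything in the workspace, which helps when looping over many RDA files.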
scenario_3_data.zip
This zip file contains 24 RDS files (6 pedigrees x 2 tested additive genetic values x 2 additive covariance values) for Scenario 3 outlined in the published paper, corresponding to a bivariate phenotype simulation (two correlated simulated traits) with additive and environmental variance components and a positive genetic correlation. Each RDS file should contain 3 columns; each column holds the 1,000 saved MCMC samples from a single phenotype simulation + MCMCglmm run:
Column 1: additive variance of trait 1
Column 2: additive covariance between traits 1 and 2
Column 3: additive variance of trait 2
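From these three columns, the posterior distribution of the additive genetic correlation can be derived sample by sample with the standard definition r_A = COV_A(1,2) / sqrt(VA_1 x VA_2). The base R sketch below illustrates this on a synthetic stand-in table; the file name and all values are hypothetical, and whether the published figures use this exact quantity is not stated here.

```r
# Synthetic stand-in for one Scenario 3 RDS file (1,000 MCMC samples x 3
# columns: VA of trait 1, additive covariance, VA of trait 2).
set.seed(3)
va1    <- rgamma(1000, shape = 40, rate = 80)
va2    <- rgamma(1000, shape = 40, rate = 80)
r_true <- runif(1000, min = 0.2, max = 0.6)   # made-up correlations
cov12  <- r_true * sqrt(va1 * va2)
path <- file.path(tempdir(), "example_scenario_3.rds")  # hypothetical name
saveRDS(cbind(va1, cov12, va2), path)

# Read the table and select columns by position, as documented above.
s <- as.matrix(readRDS(path))
stopifnot(ncol(s) == 3)

# Additive genetic correlation, computed for every saved MCMC sample.
r_a <- s[, 2] / sqrt(s[, 1] * s[, 3])

# Posterior median and 95% equal-tailed interval of the correlation.
print(quantile(r_a, probs = c(0.025, 0.5, 0.975)))
```

Computing the ratio per sample, rather than from the column means, keeps the uncertainty in all three components in the resulting interval.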
Key Information Sources
Pedigree data are derived from genealogical data that are currently private and accessible only by request to the BALSAC project.
Sharing/Access information
Links to other publicly accessible locations of the data:
Partly accessible through BALSAC genealogical project (https://balsac.uqac.ca/en/)
Code/Software
R is required to run pedgiree.R, bayestest.R and genlib.R; the scripts were created using R version 3.6.3.
Annotations are provided throughout the scripts, covering 1) library loading, 2) dataset loading and cleaning, 3) analyses, and 4) figure creation.
The dataset contains the output data from analyses performed on a set of reconstructed pedigrees from the French-Canadian genealogies provided by the BALSAC group. The raw data are private and are currently available only upon request. The pedigree data are therefore not provided; only the output data and the related analysis code can be found in this dataset.
