Data from: Forest tree breeding using genomic Markov causal models: A new approach to genomic tree breeding improvement
Data files
Mar 11, 2025 version files 44.19 MB
- Phenotypic_and_pedigree_file.txt (138.58 KB)
- README.md (3.73 KB)
- SNP_file.fwf (44.05 MB)
Abstract
Traditionally, a pedigree-based individual-tree mixed model (ABLUP) has been used in forest genetic evaluations to identify individuals with the highest breeding values (BVs). ABLUP is a Markovian causal model, as any individual BV can be expressed as a linear regression on its parental BVs. The regression coefficients are based on the genealogical parent-offspring relationship and are equal to one-half. This study aimed to develop and apply two new causal models that replace these fixed coefficients with ones calculated using genomic information, specifically derived from the genomic-based relationship matrix. We compared the performance of these genomic-based causal models with ABLUP and non-causal GBLUP models. To do so, we evaluated a four-generation population of Eucalyptus grandis, consisting of 3,082 genotyped trees with 14,033 single nucleotide polymorphism markers. Six traits were assessed in 1,219 trees across the first three breeding cycles. The heritability and genetic means estimates were higher in the causal pedigree- and genomic-based models compared to GBLUP. Realized genetic gains were similar across all models, but the causal models more closely matched the predicted gains than GBLUP. In turn, GBLUP demonstrated better predictive performance, albeit with lower precision. The causal models developed in this study enable discerning intra-familial variations in the predictions of BVs at a lower computational burden and offer a potential alternative to the GBLUP model.
https://doi.org/10.5061/dryad.pzgmsbczh
Description of the data and file structure
GENERAL INFORMATION
1. Title of Dataset: Forest tree breeding using genomic Markov causal models: A new approach to genomic tree breeding improvement
2. Author Information
A. Principal Investigator Contact Information
Name: Esteban Javier Jurcic
Institution: Instituto Nacional de Tecnología Agropecuaria (INTA)
Address: De Los Reseros y Dr. Nicolás Repetto s/n, 1686, Hurlingham, Buenos Aires, Argentina.
Email: jurcic.esteban@inta.gob.ar
B. Associate or Co-investigator Contact Information
Name: Eduardo Pablo Cappa
Institution: Instituto Nacional de Tecnología Agropecuaria (INTA) - CONICET
Address: De Los Reseros y Dr. Nicolás Repetto s/n, 1686, Hurlingham, Buenos Aires, Argentina.
Email: cappa.eduardo@inta.gob.ar
Information about funding sources that supported the collection of the data: This research was supported by UPM-Forestal Oriental S.A.
DATA & FILE OVERVIEW
1. File List:
- Phenotypic_and_pedigree_file.txt: tree information
- SNP_file.fwf: marker information
2. Relationship between files, if important: the self column (tree ID) links the records in Phenotypic_and_pedigree_file.txt to those in SNP_file.fwf
3. Additional related data collected that was not included in the current data package: -
4. Are there multiple versions of the dataset? no
People involved with sample collection, processing, analysis and/or submission: Joaquín Dotour, Alexandra Simonov, Robert Silvestre, Esteban J. Jurcic, Eduardo P. Cappa
Files and variables
DATA-SPECIFIC INFORMATION FOR: Phenotypic_and_pedigree_file.txt
1. Number of variables: 11
2. Number of cases/rows: 3082
3. Variable List:
self: tree ID
DAD: dad ID
MUM: mum ID
Trial: trial in which the tree was evaluated: Tres bocas (TB), Pandule (PA), Young (YO), Gallinal (GA) or Greenhouse (GH)
DBH (cm): diameter at breast height measured in centimeters
HT (m): total tree height measured in meters
PY (%): pulp yield expressed as a percentage
LIG (%): lignin expressed as a percentage
CEL (%): cellulose expressed as a percentage
WD (kg/m3): wood density measured in kilograms per cubic meter
4. Missing data codes: 0
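A minimal Python sketch of loading this file with the missing-data code handled explicitly. It assumes, without confirmation from the README, that the file is whitespace-delimited with one header row and that trait columns are named without units (for example `DBH` rather than `DBH (cm)`). The function name `read_phenotypes`, the `TRAITS` tuple, and the demo rows are illustrative only and are not taken from the dataset.

```python
import io

# Hypothetical trait column names; check them against the real file header.
TRAITS = ("DBH", "HT", "PY", "LIG", "CEL", "WD")

def read_phenotypes(handle, traits=TRAITS, missing="0"):
    """Parse a whitespace-delimited table with a header row.

    Trait values equal to the missing-data code become None; all other
    trait values are converted to float. Non-trait columns stay as strings.
    """
    header = handle.readline().split()
    records = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        for name in traits:
            if name in row:
                value = float(row[name])
                row[name] = None if value == float(missing) else value
        records.append(row)
    return records

# Tiny made-up example, not real data from the dataset.
demo = io.StringIO(
    "self DAD MUM Trial DBH HT\n"
    "101 0 0 TB 21.5 18.2\n"
    "102 101 0 PA 0 16.0\n"
)
rows = read_phenotypes(demo)
```

Only trait columns are recoded here, so a 0 in DAD or MUM is left as the string "0"; whether 0 also marks an unknown parent in those columns should be checked against the data.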
DATA-SPECIFIC INFORMATION FOR: SNP_file.fwf
1. Number of variables: 2
2. Number of cases/rows: 3082
3. Variable List:
Column 1: self (tree ID)
Column 2: genotypes for 14286 SNP markers
4. Missing data codes: 5
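A minimal Python sketch of reading the marker file, under assumptions the README does not state: each line holds the tree ID, then whitespace, then one single-digit genotype code per marker with no separators between markers. The function name `read_snp_fwf` and the demo lines are illustrative. The only detail taken from the README is that code 5 marks a missing genotype.

```python
import io

def read_snp_fwf(handle, missing="5"):
    """Return {tree_id: [genotype codes]} with the missing code mapped to None.

    Assumes each line is '<tree ID><spaces><one digit per marker>'.
    """
    genotypes = {}
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        tree_id, calls = parts
        genotypes[tree_id] = [None if c == missing else int(c) for c in calls]
    return genotypes

# Tiny made-up example with four markers, not real data from the dataset.
demo = io.StringIO("101   0125\n102   2210\n")
snps = read_snp_fwf(demo)
```

The dictionary keys are tree IDs, so they can be matched against the self column of Phenotypic_and_pedigree_file.txt.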
Code/software
Title: Relationship matrices (and their inverses) corresponding to the new genomic causal PARBLUP and PARBLUP_sm models: an example based on Figure 1 of the manuscript.
Author: Esteban J. Jurcic (jurcic.esteban@inta.gob.ar)
Date: “February 19th, 2025”
NOTE: This R script is valid only for populations whose parents are unrelated by pedigree, which is the most common situation in forest improvement populations.
Data files: To build the relationship matrix corresponding to the PARBLUP and PARBLUP_sm models (and their inverses), pedigree information (data) is required. The data should include the columns suc, dad, mum, and var, which indicate the individual ID, father, mother, and type of variable (exogenous or endogenous), respectively. Additionally, the genomic relationship matrix (G) is required, and it should be in the same order as the data file. The files used in this analysis are based on the pedigree shown in Figure 1 (U=1, V=2, W=3, X=4, Y=5, and Z=6).
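For intuition, the pedigree-based (ABLUP) case described in the abstract can be sketched in Python; this is not the authors' R script. Each breeding value is written as one-half of each known parent's breeding value plus a Mendelian sampling term. Stacking these regressions as a = P a + m gives A = (I - P)^-1 D (I - P)^-T and A^-1 = (I - P)^T D^-1 (I - P), where D is diagonal. According to the abstract, the genomic causal models replace the one-half coefficients with values derived from G; that derivation is not described on this page, so the sketch keeps one-half. The pedigree below is hypothetical (it is not the Figure 1 pedigree), the function name `additive_relationship` is illustrative, and all parents are assumed non-inbred.

```python
def additive_relationship(pedigree):
    """Build A and A^-1 from a pedigree whose parents are non-inbred.

    pedigree: list of (id, dad, mum) with parents listed before their
    offspring and 0 marking an unknown parent.
    """
    ids = [ind for ind, _, _ in pedigree]
    pos = {ind: k for k, ind in enumerate(ids)}
    n = len(ids)

    # T = (I - P)^-1, filled row by row: an individual's row is its own
    # unit vector plus one-half of each known parent's row.
    T = [[0.0] * n for _ in range(n)]
    d = [0.0] * n  # Mendelian sampling variances (diagonal of D)
    parent_pos = []
    for i, (_, dad, mum) in enumerate(pedigree):
        parents = [pos[p] for p in (dad, mum) if p != 0]
        parent_pos.append(parents)
        T[i][i] = 1.0
        for p in parents:
            for j in range(n):
                T[i][j] += 0.5 * T[p][j]
        d[i] = 1.0 - 0.25 * len(parents)  # 1 for founders, 0.5 if both known

    # A = T D T'
    A = [[sum(T[i][k] * d[k] * T[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]

    # A^-1 = (I - P)' D^-1 (I - P), accumulated one individual at a time.
    Ainv = [[0.0] * n for _ in range(n)]
    for i, parents in enumerate(parent_pos):
        w = 1.0 / d[i]
        Ainv[i][i] += w
        for p in parents:
            Ainv[i][p] -= 0.5 * w
            Ainv[p][i] -= 0.5 * w
            for q in parents:
                Ainv[p][q] += 0.25 * w
    return ids, A, Ainv

# Hypothetical pedigree: 1, 2, 3 are founders; 4 = 1 x 2; 5 = 4 x 3.
ids, A, Ainv = additive_relationship(
    [(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 1, 2), (5, 4, 3)]
)
```

Because (I - P) is sparse and D is diagonal, the inverse is assembled directly without inverting A, which is the kind of computational saving the abstract attributes to the causal models.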