Efficient genomics-based ‘end-to-end’ selective tree breeding framework

El-Kassaby, Yousry A.¹; Cappa, Eduardo P.²; Chen, Charles³; Ratcliffe, Blaise¹; Porth, Ilga M.⁴

Published Oct 27, 2023; Updated Dec 18, 2023 on Dryad. https://doi.org/10.5061/dryad.7h44j101d

Data files

Oct 27, 2023 version files (31.80 MB):
Phenotype_and_SNPs.xlsx (31.80 MB)
README.md (2.85 KB)

Dec 18, 2023 version files (32.20 MB)

Abstract

Since their initiation in the 1950s, selective tree breeding programs worldwide have followed the recurrent selection scheme of repeated cycles of selection, breeding (mating), and testing, and have remained essentially unchanged, without modifications to accelerate this process or to address environmental contingencies and concerns. Here, we introduce an “end-to-end” selective tree breeding framework that: 1) leverages strategically preselected GWAS-based sequence data capturing trait architecture information, 2) provides unprecedented resolution of genealogical relationships among tested individuals, and 3) eliminates the breeding phase by using readily available wind-pollinated (OP) families. Individuals’ breeding values generated from a multi-trait multi-site analysis were also used in an optimum contribution selection protocol to effectively manage the trade-off between genetic gain and co-ancestry, as well as the traits’ correlated responses to selection. The proof-of-concept study involved a 40-year-old spruce OP testing population growing on three sites in British Columbia, Canada, and clearly demonstrated our method's superiority in capturing most of the available genetic gain in a substantially shorter timeline than the traditional approach. The proposed framework is expected to increase the efficiency of existing selective breeding programs, accelerate the start of new programs for ecologically and environmentally important tree species, and more effectively address biotic and abiotic stresses caused by climate change.

This El-Kassaby_README.md file was generated on 2023-10-22 by Yousry A. El-Kassaby
Revised: 2023-12-15

GENERAL INFORMATION

Title of Dataset: Efficient genomics-based ‘end-to-end’ selective tree breeding framework
Author Information
Name: Yousry A. El-Kassaby
Institution: Department of Forest and Conservation Sciences | Faculty of Forestry | The University of British Columbia
Address: 4603-2424 Main Mall | Vancouver, BC Canada V6T 1Z4
Email: y.el-kassaby@ubc.ca
Date of data collection (single date, range, approximate date): See Materials and Methods section in the manuscript.
Geographic location of data collection (latitude, longitude, or city/region, state, country): See Materials and Methods section in the manuscript.
Information about funding sources that supported the collection of the data: NSERC Discovery Grant to Yousry A. El-Kassaby.

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: -
Links to publications that cite or use the data: -
Links to other publicly accessible locations of the data: -
Links/relationships to ancillary data sets: -
Was data derived from another source? Yes
A. If yes, list source(s): Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.m4vh4 and http://doi.org/10.5061/dryad.8kb37.
Recommended citation for this dataset: -

DATA & FILE OVERVIEW

File List:
Phenotype and SNPs.xlsx: Contains information on tree phenotype and the SNP markers used in this study.
SNP_effects.txt: Provides SNP effects corresponding to each trait.
SNPs selected by GWAS.txt: Lists the 5,628 SNP IDs associated with the three studied traits, selected based on their absolute GWAS effects.
Genomic-based relationship matrices.R: R script used to calculate the genomic relationship matrices for the different study scenarios.
Single-trait multi-site ABLUP and GBLUP models.R: R-script employed for fitting the single-trait multi-site ABLUP, GBLUP-ALL, and GBLUP-ADE models.
Multi-trait multi-site ABLUP and GBLUP models.R: R-script utilized for fitting the multi-trait multi-site ABLUP, GBLUP-ALL, and GBLUP-GWAS models.
Relationship between files, if important: -
Additional related data collected that was not included in the current data package: Raw marker data and adjusted phenotypes.
Are there multiple versions of the dataset? no
A. If yes, name of file(s) that was updated:
i. Why was the file updated?
ii. When was the file updated?

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data: See the Methods section in the manuscript.
Methods for processing the data: -

Instrument- or software-specific information needed to interpret the data: -
Standards and calibration information, if appropriate: -
Environmental/experimental conditions: -
Describe any quality-assurance procedures performed on the data: -
People involved with sample collection, processing, analysis and/or submission: Yousry A. El-Kassaby, Eduardo P. Cappa, and Blaise Ratcliffe.

DATA-SPECIFIC INFORMATION FOR:

Phenotype and SNPs.xlsx

Number of variables: 8,775
Number of cases/rows: 1,101
Variable List:
ID_TREE: tree ID
MUM: mother (female parent) ID
DAD: father (male parent) ID
SITE: site name
REP: replicate (design effect) within each site
HT40: total height, in meters (m)
DBH40: diameter at breast height (DBH, 1.3 m), in centimeters (cm)
AvgXray: average wood density determined by X-ray densitometry, in g·cm⁻³
TP244 to TP1478967: SNP marker genotypes (each column header is a marker ID)
Missing data codes: NA
Specialized formats or other abbreviations used: -
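The variable list implies a simple column layout: five identifier/design columns, three trait columns, and one column per SNP marker. The sketch below separates them from a header row. It is a hypothetical helper, not part of the dataset: it assumes the sheet has been exported to a plain text table (no Excel reader is used), and it assumes every column other than the eight named ones is a marker.

```python
# Hypothetical helpers for the columns of "Phenotype and SNPs.xlsx".
# Assumption: the sheet has been exported to a CSV/text table, so only the
# header row and cell strings are handled here (no Excel reader is used).

DESIGN_COLS = ["ID_TREE", "MUM", "DAD", "SITE", "REP"]  # identifiers and design
TRAIT_COLS = ["HT40", "DBH40", "AvgXray"]               # phenotypes


def split_columns(header):
    """Split a header row into (design, traits, markers), preserving column order."""
    design = [c for c in header if c in DESIGN_COLS]
    traits = [c for c in header if c in TRAIT_COLS]
    # Assumption: every other column is a SNP marker (TP244 ... TP1478967).
    markers = [c for c in header if c not in DESIGN_COLS and c not in TRAIT_COLS]
    return design, traits, markers


def to_float(cell):
    """Parse a numeric cell, mapping the README's missing-data code NA to None."""
    cell = cell.strip()
    return None if cell in ("NA", "") else float(cell)


if __name__ == "__main__":
    header = ["ID_TREE", "MUM", "DAD", "SITE", "REP",
              "HT40", "DBH40", "AvgXray", "TP244", "TP1478967"]
    design, traits, markers = split_columns(header)
    print(len(design), len(traits), len(markers))  # 5 3 2
    print(to_float("NA"), to_float("23.4"))        # None 23.4
```

Applied to the full header, the 8,775 variables minus the 8 named columns leave 8,767 marker columns, which matches the 8,767 rows listed for SNP_effects.txt.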

SNP_effects.txt

Number of variables: 4
Number of cases/rows: 8,767
Variable List:
snp: SNP marker ID
HT40: SNP marker effects for total height
DBH40: SNP marker effects for diameter at breast height (DBH, 1.3 m)
AvgXray: SNP marker effects for wood density (X-ray densitometry)
Missing data codes: NA
Specialized formats or other abbreviations used: -
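Per-trait SNP effects are typically used by summing each tree's genotype codes weighted by the marker effects. The sketch below parses a file shaped like SNP_effects.txt and computes that sum for one tree and one trait. Several details are assumptions not stated in this README: the columns are whitespace-delimited, values may be wrapped in double quotes, and the genotype coding matches the scale on which the effects were estimated. The helper names are hypothetical.

```python
# Hypothetical parser and predictor for SNP_effects.txt.
# Assumptions (not stated in the README): whitespace-delimited columns with the
# header "snp HT40 DBH40 AvgXray", optional double quotes around values, and
# genotype codes on the same scale used when the effects were estimated.


def read_snp_effects(lines):
    """Return {snp_id: {trait: effect or None}} from the lines of the file."""
    rows = iter(lines)
    header = [h.strip('"') for h in next(rows).split()]
    effects = {}
    for line in rows:
        if not line.strip():
            continue  # skip blank lines
        record = dict(zip(header, (f.strip('"') for f in line.split())))
        snp = record.pop("snp")
        effects[snp] = {t: None if v == "NA" else float(v) for t, v in record.items()}
    return effects


def predict(genotypes, effects, trait):
    """Genomic value of one tree for one trait: sum of genotype code x SNP effect.

    genotypes: {snp_id: code or None}. SNPs with a missing genotype, a missing
    effect, or no entry in `effects` are skipped.
    """
    total = 0.0
    for snp, code in genotypes.items():
        effect = effects.get(snp, {}).get(trait)
        if code is None or effect is None:
            continue
        total += code * effect
    return total


if __name__ == "__main__":
    text = ["snp HT40 DBH40 AvgXray",
            "TP244 0.01 0.02 -0.001",
            "TP1478967 -0.03 0.00 NA"]
    effects = read_snp_effects(text)
    tree = {"TP244": 2, "TP1478967": 1}  # made-up genotype codes
    for trait in ("HT40", "DBH40", "AvgXray"):
        print(trait, round(predict(tree, effects, trait), 6))
```

Skipping missing values keeps the example simple; a real analysis might instead impute missing genotypes before prediction.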

SNPs selected by GWAS.txt

Number of variables: 1
Number of cases/rows: 5,628
Variable List:
snp: SNP marker ID
Missing data codes: NA
Specialized formats or other abbreviations used: -
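The file list describes these 5,628 SNPs as selected on absolute GWAS effects for the three traits. One simple way such a list could be built is to keep, for each trait, the markers with the largest absolute effects and take the union across traits. This is a hypothetical sketch: the per-trait cutoff `k` is an assumption, and the actual selection criterion is the one described in the manuscript.

```python
# Hypothetical sketch of choosing SNPs by absolute effect size for several traits.
# The per-trait cutoff k is an assumption; the manuscript describes the actual
# GWAS-based criterion that produced the 5,628 SNP IDs.


def select_by_abs_effect(effects, traits, k):
    """Union over traits of the k SNPs with the largest |effect|.

    effects: {snp_id: {trait: effect or None}}; missing effects are ignored.
    Returns the selected SNP IDs sorted alphabetically.
    """
    selected = set()
    for trait in traits:
        scored = [(abs(e[trait]), snp) for snp, e in effects.items()
                  if e.get(trait) is not None]
        # Largest absolute effect first; ties broken by SNP ID for reproducibility.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        selected.update(snp for _, snp in scored[:k])
    return sorted(selected)


if __name__ == "__main__":
    effects = {
        "TP1": {"HT40": 0.5, "DBH40": 0.1, "AvgXray": None},
        "TP2": {"HT40": -0.9, "DBH40": 0.0, "AvgXray": 0.002},
        "TP3": {"HT40": 0.2, "DBH40": -0.7, "AvgXray": -0.004},
    }
    print(select_by_abs_effect(effects, ["HT40", "DBH40", "AvgXray"], k=1))  # ['TP2', 'TP3']
```

Because markers can rank highly for more than one trait, the union usually contains fewer than `k` times the number of traits.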

Genomic-based relationship matrices.R

R-script code with 159 lines.
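GBLUP models replace the pedigree relationship matrix with a genomic one computed from marker genotypes. As an illustration only, the sketch below computes VanRaden's (2008) method-1 matrix, G = ZZ′ / (2 Σ pⱼ(1 − pⱼ)) with Z = M − 2p. The R script may use a different estimator; the 0/1/2 coding, the absence of missing values, and the function name are assumptions.

```python
# Hypothetical sketch of a genomic relationship matrix (VanRaden 2008, method 1).
# Assumptions: genotypes coded 0/1/2 as counts of one allele, no missing values,
# and at least one polymorphic marker. The R script may use another estimator.


def vanraden_g(M):
    """Return G = Z Z' / (2 * sum p_j (1 - p_j)) with Z = M - 2p.

    M: list of rows (trees), each a list of 0/1/2 genotype codes.
    """
    n, m = len(M), len(M[0])
    # Observed frequency of the counted allele at each marker.
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    # Center each genotype by twice the allele frequency.
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if denom == 0.0:
        raise ValueError("all markers are monomorphic")
    return [[sum(Z[i][k] * Z[l][k] for k in range(m)) / denom for l in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    M = [[0, 1, 2],
         [2, 1, 0],
         [1, 1, 1]]
    for row in vanraden_g(M):
        print([round(x, 3) for x in row])
```

Restricting the columns of M to the IDs in SNPs selected by GWAS.txt would give a GWAS-based G; whether the scripts build the GBLUP-GWAS matrix exactly this way is an assumption.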

Single-trait multi-site ABLUP and GBLUP models.R
R-script code with 436 lines.

Multi-trait multi-site ABLUP and GBLUP models.R
R-script code with 269 lines.
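The ABLUP models rely on a pedigree-based numerator relationship matrix A, which can be built from the ID_TREE, MUM, and DAD columns. The sketch below uses Henderson's tabular method as one standard way to do this; it is not taken from the R scripts. It assumes unknown parents are given as None (NA), every known parent has its own row, and parents are listed before their offspring. The function name is hypothetical.

```python
# Hypothetical sketch of the pedigree-based numerator relationship matrix A
# (Henderson's tabular method), the kind of matrix an ABLUP model relies on.
# Assumptions: rows follow ID_TREE/MUM/DAD, unknown parents are None (NA),
# every known parent has its own row, and parents are listed before offspring.


def numerator_relationship(pedigree):
    """pedigree: list of (tree_id, mum, dad). Returns A as a list of lists."""
    index = {tree: k for k, (tree, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (tree, mum, dad) in enumerate(pedigree):
        # None when the parent is unknown or absent from the pedigree.
        s, d = index.get(mum), index.get(dad)
        for parent in (s, d):
            if parent is not None and parent >= i:
                raise ValueError(f"parent of {tree} must be listed before it")
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents.
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
        # Off-diagonal: half of j's relationship with each known parent, summed.
        for j in range(i):
            value = 0.0
            if s is not None:
                value += 0.5 * A[j][s]
            if d is not None:
                value += 0.5 * A[j][d]
            A[i][j] = A[j][i] = value
    return A


if __name__ == "__main__":
    ped = [("F1", None, None), ("M1", None, None),
           ("T1", "F1", "M1"), ("T2", "F1", None), ("T3", "F1", "M1")]
    A = numerator_relationship(ped)
    print(A[2][4], A[2][3])  # 0.5 0.25 (full sibs, half sibs)
```

When DAD is unknown, as for trees known only through their OP mother, the tabular method treats offspring of the same mother as half sibs (relationship 0.25).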
