Efficient genomics-based ‘end-to-end’ selective tree breeding framework

Cite this dataset

El-Kassaby, Yousry A. et al. (2023). Efficient genomics based ‘end-to-end’ selective tree breeding framework [Dataset]. Dryad. https://doi.org/10.5061/dryad.7h44j101d

Abstract

Since their initiation in the 1950s, selective tree breeding programs worldwide have followed the recurrent selection scheme of repeated cycles of selection, breeding (mating), and testing, and have remained essentially unchanged, with little modification to accelerate the process or to address environmental contingencies and concerns. Here, we introduce an “end-to-end” selective tree breeding framework that: 1) leverages strategically preselected GWAS-based sequence data capturing trait architecture information, 2) generates unprecedented resolution of genealogical relationships among tested individuals, and 3) eliminates the breeding phase by using readily available wind-pollinated (open-pollinated, OP) families. Individuals’ breeding values generated from multi-trait multi-site analysis were also used in an optimum contribution selection protocol to effectively manage genetic gain/co-ancestry trade-offs and the traits’ correlated responses to selection. The proof-of-concept study involved a 40-year-old spruce OP testing population growing on three sites in British Columbia, Canada, and clearly demonstrated our method's superiority in capturing most of the available genetic gains in a substantially shorter timeline than the traditional approach. The proposed framework is expected to increase the efficiency of existing selective breeding programs, accelerate the start of new programs for ecologically and environmentally important tree species, and address climate change-induced biotic and abiotic stress concerns more effectively.

README

This El-Kassaby_README.md file was generated on 2023-10-22 by Yousry A. El-Kassaby
Revised: 2023-12-15

GENERAL INFORMATION

  1. Title of Dataset: Efficient genomics-based ‘end-to-end’ selective tree breeding framework

  2. Author Information
    Name: Yousry A. El-Kassaby
    Institution: Department of Forest and Conservation Sciences | Faculty of Forestry | The University of British Columbia
    Address: 4603-2424 Main Mall | Vancouver, BC Canada V6T 1Z4
    Email: y.el-kassaby@ubc.ca

  3. Date of data collection (single date, range, approximate date): See Materials and Methods section in the manuscript.

  4. Geographic location of data collection: See Materials and Methods section in the manuscript.

  5. Information about funding sources that supported the collection of the data: NSERC Discovery Grant to Yousry A. El-Kassaby.

SHARING/ACCESS INFORMATION

  1. Licenses/restrictions placed on the data: -

  2. Links to publications that cite or use the data: -

  3. Links to other publicly accessible locations of the data: -

  4. Links/relationships to ancillary data sets: -

  5. Was data derived from another source? Yes
    A. If yes, list source(s): Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.m4vh4 and http://doi.org/10.5061/dryad.8kb37.

  6. Recommended citation for this dataset: -

DATA & FILE OVERVIEW

  1. File List:
    Phenotype and SNPs.xlsx: Contains the tree phenotypes and the SNP markers used in this study.
    SNP_effects.txt: Provides SNP effects corresponding to each trait.
    SNPs selected by GWAS.txt: Lists 5,628 SNP IDs selected based on GWAS absolute effects, associated with the three studied traits.
    Genomic-based relationship matrices.R: R-script used for calculating various genomic relationship matrices in different study scenarios.
    Single-trait multi-site ABLUP and GBLUP models.R: R-script employed for fitting the single-trait multi-site ABLUP, GBLUP-ALL, and GBLUP-ADE models.
    Multi-trait multi-site ABLUP and GBLUP models.R: R-script utilized for fitting the multi-trait multi-site ABLUP, GBLUP-ALL, and GBLUP-GWAS models.

  2. Relationship between files, if important: -

  3. Additional related data collected that was not included in the current data package: Raw marker data and adjusted phenotypes.

  4. Are there multiple versions of the dataset? No
    A. If yes, name of file(s) that was updated:
    i. Why was the file updated?
    ii. When was the file updated?

METHODOLOGICAL INFORMATION

  1. Description of methods used for collection/generation of data: See the Methods section in the manuscript.

  2. Methods for processing the data: -

  3. Instrument- or software-specific information needed to interpret the data: -

  4. Standards and calibration information, if appropriate: -

  5. Environmental/experimental conditions: -

  6. Describe any quality-assurance procedures performed on the data: -

  7. People involved with sample collection, processing, analysis and/or submission: Yousry A. El-Kassaby, Eduardo P. Cappa, and Blaise Ratcliffe.

DATA-SPECIFIC INFORMATION FOR:

Phenotype and SNPs.xlsx

  1. Number of variables: 8,775

  2. Number of cases/rows: 1,101

  3. Variable List:
    ID_TREE: tree ID
    MUM: mother (maternal parent) ID
    DAD: father (paternal parent) ID
    SITE: Site name
    REP: Replication design effect for each site
    HT40: total height measured in meters
    DBH40: diameter at breast height (1.3 m, DBH) measured in centimeters
    AvgXray: wood density determined by X-ray densitometry, measured in g·cm⁻³
    TP244 - TP1478967: marker ID

  4. Missing data codes: NA

  5. Specialized formats or other abbreviations used: -
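
The variable counts above can be cross-checked with a little arithmetic: 8 phenotype/design columns plus the marker columns should give the 8,775 variables reported, which leaves 8,767 markers (the same number as the rows in SNP_effects.txt, assuming every marker has one effect row). The following is a minimal Python sketch of that check, not part of the data package. It assumes the sheet has been exported to CSV with the column names listed above; the toy rows and the helper names `split_columns` and `parse_value` are invented for illustration.

```python
import csv
import io

# The eight non-marker columns listed in the README variable list.
PHENO_COLUMNS = ["ID_TREE", "MUM", "DAD", "SITE", "REP", "HT40", "DBH40", "AvgXray"]


def split_columns(header):
    """Separate the phenotype/design columns from the marker columns."""
    missing = [c for c in PHENO_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    markers = [c for c in header if c not in PHENO_COLUMNS]
    return PHENO_COLUMNS, markers


def parse_value(text):
    """Map the README's missing-data code 'NA' to None; keep other text as-is."""
    return None if text == "NA" else text


# Toy stand-in for a CSV export of the sheet (all values are invented).
toy_csv = io.StringIO(
    "ID_TREE,MUM,DAD,SITE,REP,HT40,DBH40,AvgXray,TP244,TP1478967\n"
    "t1,m1,NA,siteA,1,15.2,20.1,0.41,0,2\n"
    "t2,m2,NA,siteB,2,NA,18.7,0.39,1,NA\n"
)
reader = csv.reader(toy_csv)
header = next(reader)
pheno_cols, marker_cols = split_columns(header)
rows = [[parse_value(v) for v in row] for row in reader]

# README arithmetic: 8,775 variables minus 8 phenotype/design columns.
expected_markers = 8775 - len(PHENO_COLUMNS)
```

On the real file, `len(marker_cols)` would be expected to equal `expected_markers` if the README counts hold.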

SNP_effects.txt

  1. Number of variables: 4

  2. Number of cases/rows: 8,767

  3. Variable List:
    snp: SNP marker ID
    HT40: SNP marker effects for total height
    DBH40: SNP marker effects for diameter at breast height (1.3 m, DBH)
    AvgXray: SNP marker effects for wood density (X-ray densitometry)

  4. Missing data codes: NA

  5. Specialized formats or other abbreviations used: -
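
To illustrate how a table like this can be used to rank markers by absolute effect, here is a minimal Python sketch. It is not the authors' selection procedure, which this README does not spell out. It assumes the file is tab-delimited with the four columns listed above; the rule of taking the union of each trait's top-k markers, the toy values, and the function names are all hypothetical.

```python
import csv
import io

TRAITS = ["HT40", "DBH40", "AvgXray"]


def read_effects(handle, delimiter="\t"):
    """Return {snp_id: {trait: float or None}}, treating 'NA' as missing."""
    effects = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        effects[row["snp"]] = {
            t: (None if row[t] == "NA" else float(row[t])) for t in TRAITS
        }
    return effects


def top_by_abs_effect(effects, trait, k):
    """IDs of the k SNPs with the largest absolute effect for one trait."""
    scored = [
        (abs(v[trait]), snp) for snp, v in effects.items() if v[trait] is not None
    ]
    # Largest absolute effect first; ties broken by SNP ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [snp for _, snp in scored[:k]]


def select_union(effects, k):
    """Union over traits of each trait's top-k SNPs (hypothetical rule)."""
    chosen = set()
    for trait in TRAITS:
        chosen.update(top_by_abs_effect(effects, trait, k))
    return sorted(chosen)


# Toy effects table (values are invented for illustration).
toy = io.StringIO(
    "snp\tHT40\tDBH40\tAvgXray\n"
    "TP1\t0.50\t-0.10\t0.001\n"
    "TP2\t-0.90\t0.05\tNA\n"
    "TP3\t0.10\t0.40\t-0.020\n"
    "TP4\t0.02\t0.01\t0.003\n"
)
effects = read_effects(toy)
selected = select_union(effects, k=1)
```

Ranking uses the absolute value so that markers with large negative effects are treated the same as those with large positive effects.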

SNPs selected by GWAS.txt

  1. Number of variables: 1

  2. Number of cases/rows: 5,628

  3. Variable List:
    snp: SNP marker ID

  4. Missing data codes: NA

  5. Specialized formats or other abbreviations used: -
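
The 5,628 listed IDs are roughly 64% of the 8,767 markers with estimated effects. A list like this is typically used to subset the genotype columns before building a marker-based relationship matrix. The sketch below shows one way to do that in plain Python; the function name `subset_markers` and the toy data are hypothetical, and it is assumed that the IDs in this file match the marker column names in Phenotype and SNPs.xlsx.

```python
def subset_markers(marker_ids, genotypes, selected_ids):
    """Keep only the genotype columns whose marker ID is in selected_ids.

    marker_ids:   list of marker IDs, in column order
    genotypes:    list of rows, each a list aligned with marker_ids
    selected_ids: iterable of marker IDs to keep
    """
    wanted = set(selected_ids)
    unknown = wanted - set(marker_ids)
    if unknown:
        raise KeyError(f"selected IDs not found among markers: {sorted(unknown)}")
    # Preserve the original column order of the genotype table.
    keep = [i for i, m in enumerate(marker_ids) if m in wanted]
    kept_ids = [marker_ids[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in genotypes]
    return kept_ids, kept_rows


# Toy data (invented): two trees, four markers, None marks a missing call.
marker_ids = ["TP1", "TP2", "TP3", "TP4"]
genotypes = [[0, 1, 2, None], [2, 0, 1, 1]]
kept_ids, kept_rows = subset_markers(marker_ids, genotypes, ["TP3", "TP1"])
```

Raising an error on unknown IDs, rather than silently skipping them, makes a mismatch between the two files visible early.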

Genomic-based relationship matrices.R

R-script code with 159 lines.
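
For readers unfamiliar with genomic relationship matrices, the sketch below shows one widely used construction, VanRaden's (2008) first method, in plain Python. This is an assumption chosen for illustration: the README does not state which estimator or estimators the R script implements, and the script may differ. The sketch assumes genotypes coded as 0/1/2 allele counts with no missing values (NA calls would need to be imputed or filtered first); the function name and toy data are hypothetical.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts (VanRaden method 1).

    genotypes: list of individuals, each a list of allele counts
               (same markers in the same order, no missing values).
    Returns an n x n matrix as a list of lists.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    # Scaling factor: 2 * sum over markers of p * (1 - p).
    denom = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if denom == 0.0:
        raise ValueError("all markers are monomorphic")
    # Centre each genotype by twice the allele frequency.
    z = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    # G = Z Z' / denom
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / denom for b in range(n)]
        for a in range(n)
    ]


# Toy genotypes (invented): three individuals, three markers.
toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
g = vanraden_g(toy)
```

Because the genotypes are centred on allele frequencies estimated from the same individuals, each row of the resulting matrix sums to zero, which is a quick sanity check on any implementation.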

Single-trait multi-site ABLUP and GBLUP models.R

R-script code with 436 lines.

Multi-trait multi-site ABLUP and GBLUP models.R

R-script code with 269 lines.

Methods

See the README file.

Funding

Natural Sciences and Engineering Research Council, Award: Discovery Grant

Johnson’s Family Forest Biotechnology Endowment