Growth traits of a tropical timber species at Southeast Asia, Shorea macrophylla, and scripts for genome wide association study and genomic prediction
Cite this dataset
Tani, Naoki et al. (2023). Growth traits of a tropical timber species at Southeast Asia, Shorea macrophylla, and scripts for genome wide association study and genomic prediction [Dataset]. Dryad. https://doi.org/10.5061/dryad.kkwh70s8d
Abstract
Shorea macrophylla is a commercially important tropical tree species grown for timber and oil. It is amenable to plantation forestry due to its fast initial growth. Genomic selection (GS) has been used in tree breeding studies to shorten long breeding cycles but has not previously been applied to S. macrophylla. To build genomic prediction models for GS, leaves and growth trait data were collected from a half-sib progeny population of S. macrophylla in Sari Bumi Kusuma forest concession, central Kalimantan, Indonesia. A total of 18,037 SNP markers were identified in two ddRAD-seq libraries. Genomic prediction models based on these SNPs were then generated for diameter at breast height and total height in the 7th year from planting (D7 and H7). These traits were chosen because of their relatively high narrow-sense genomic heritability and because seven years was considered long enough to assess initial growth. Genomic prediction models were built using 12 methods with the full set of identified SNPs and subsets of 48, 96, and 192 SNPs selected based on the results of a genome-wide association study (GWAS). The GBLUP and RKHS methods gave the highest predictive ability (PA) for D7 and H7 and showed that D7 has an additive genetic architecture while H7 has an epistatic genetic architecture. LightGBM and CNN1D also achieved high PA for D7 with 48 and 96 selected SNPs, and for H7 with 96 and 192 selected SNPs, showing that gradient boosting decision trees and deep learning can be useful in genomic prediction. For almost all methods and both traits, PA was higher when SNPs were selected based on their GWAS P-values than when using the full set of SNPs. These results suggest that GS with GWAS-based SNP selection could be used in S. macrophylla breeding to improve initial growth and reduce genotyping costs for next generation seedlings.
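The GWAS-based SNP selection mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the deposited scripts: it assumes the subsets are formed simply by ranking markers by ascending GWAS P-value and keeping the first 48, 96, or 192, and the function name, marker names, and P-values below are made up for illustration.

```python
def select_top_snps(p_values, n_snps):
    """Return the names of the n_snps markers with the smallest P-values.

    p_values maps a marker name to its GWAS P-value. Ties are broken by
    marker name so that the result is deterministic.
    """
    if n_snps < 0:
        raise ValueError("n_snps must be non-negative")
    ranked = sorted(p_values.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _ in ranked[:n_snps]]


# Hypothetical P-values; a real run would use one value per identified SNP.
gwas_p = {
    "snp_a": 0.20,
    "snp_b": 0.0004,
    "snp_c": 0.03,
    "snp_d": 0.0004,
    "snp_e": 0.9,
}

top3 = select_top_snps(gwas_p, 3)
print(top3)
```

With real data, n_snps would be 48, 96, or 192. The script name 0703_gwas_training_population.R suggests that the P-values come from a GWAS on the training population, but this sketch does not reproduce that step.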
README: Growth traits of a tropical timber species at Southeast Asia, Shorea macrophylla, and scripts for genome wide association study (GWAS) and genomic prediction (GP)
Description of the data and file structure
CSV file named "Pheno_raw.csv"
This CSV file contains the measured diameter at breast height (D1 to D12.5) and tree height (H1 to H12.5) of the 940 Shorea macrophylla progenies that were initially planted in the field.
CSV file named "Pheno_adujusted_rm.csv"
This CSV file contains the corrected diameter at breast height (D1 to D12.5) and tree height (H1 to H12.5) of 360 individuals of the Shorea macrophylla progeny trial, from 1 to 12.5 years after transplanting. Diameter at breast height is given in centimeters and tree height in meters. These data have already been corrected by spatial structure analysis.
Missing values are shown as "n/a". Neither of the two CSV files contains blank cells.
Missing values are not used by the scripts in 04: Spatial analysis and data correction, exclusion of outlier data.
In the 04 analysis, the phenotypic data are adjusted for the subsequent analyses in 05-07 (see the Code/Software section).
If the scripts in 04 do not run properly, please replace "n/a" with "NA".
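The replacement of "n/a" by "NA" can be done without editing the CSV files by hand. The following is a minimal Python sketch, not part of the deposited scripts. The function name and the small inline table (including its ID column and values) are assumptions made for illustration, and only cells that are exactly equal to "n/a" are rewritten.

```python
import csv
import io


def replace_missing_markers(csv_text, old="n/a", new="NA"):
    """Return csv_text with every cell exactly equal to `old` replaced by `new`."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([new if cell == old else cell for cell in row])
    return out.getvalue()


# Hypothetical excerpt; the real files hold columns D1..D12.5 and H1..H12.5.
example = "ID,D7,H7\nT001,12.3,n/a\nT002,n/a,9.8\n"
converted = replace_missing_markers(example)
print(converted)
```

To convert a real file, read its contents with open(path, newline="") and write the returned text to a new file, so that the original CSV is kept unchanged.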
Code/Software
The following scripts, written in bash, R, and Python (as indicated by the file extension), were used for the analyses. The analyses are divided into the components below, and each script is assigned to a component by the first two digits of its file name.
01: Filtering
02: Principal Coordinate Analysis (PCA)
03: Linkage disequilibrium (LD)
04: Spatial analysis and data correction, exclusion of outlier data
05: Narrow-sense genomic heritability
06: Genome-wide association study (GWAS)
07: Genomic prediction model
0101_dDocent_filters_v2.9.4_mod.sh
0102_remove_alone_snps_on_each_scaffold.R
0103_extract_estimated_nes.R
0201_thin_vcf.sh
0202_convert_vcf_to_csv.py
0203_pca.R
0301_calculate_ld.sh
0302_subset_ld_file.R
0303_optimize_loess_span.R
0304_calculate_ld_trend_line.R
0305_plot_ld_decay.R
0401_spatial_analysis.R
0402_remove_outlier.R
0501_genomic_heritability.R
0601_convert_vcf_to_hmpdpl.sh
0602_gwas.R
0603_qq_manhattan_plot.R
0701_requirements.txt
0702_split_data.py
0703_gwas_training_population.R
0704_gblup.R
0705_bglr.R
0706_rf.py
0707_xgb.py
0708_lgb.py
0709_cnn1d.py
0710_convert_vcf_to_netCDF.py
0711_cnn2d.py
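The two-digit prefix convention described above can be used to group the scripts by analysis component. The following Python sketch is only an illustration and is not part of the deposited scripts. The function name is hypothetical, the step labels are abbreviated from the component list above, and only a few of the file names are included.

```python
from collections import defaultdict

# Step labels abbreviated from the list of analysis components.
STEPS = {
    "01": "Filtering",
    "02": "PCA",
    "03": "Linkage disequilibrium",
    "04": "Spatial analysis and outlier exclusion",
    "05": "Genomic heritability",
    "06": "GWAS",
    "07": "Genomic prediction",
}


def group_scripts_by_step(filenames):
    """Group file names by their first two characters (the step prefix)."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name[:2]].append(name)
    # Sort prefixes and file names so the output follows the analysis order.
    return {prefix: sorted(groups[prefix]) for prefix in sorted(groups)}


# A few of the script names listed above, deliberately out of order.
scripts = [
    "0602_gwas.R",
    "0101_dDocent_filters_v2.9.4_mod.sh",
    "0601_convert_vcf_to_hmpdpl.sh",
    "0704_gblup.R",
]

grouped = group_scripts_by_step(scripts)
for prefix, names in grouped.items():
    print(prefix, STEPS.get(prefix, "unknown step"), names)
```

Within each component, the last two digits of the four-digit prefix give the order of the scripts, so sorting the file names reproduces the order of the list above.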
Funding
Japan International Research Center for Agricultural Sciences, Award: a1A401b
Japan Science and Technology Agency, Award: JPMJSA2101