
Data and scripts for: Genetic dissection of seasonal vegetation index dynamics in maize through aerial based high-throughput phenotyping

Citation

Wang, Jinyu et al. (2022), Data and scripts for: Genetic dissection of seasonal vegetation index dynamics in maize through aerial based high-throughput phenotyping, Dryad, Dataset, https://doi.org/10.5061/dryad.44j0zpcf0

Abstract

Plant phenotyping under field conditions plays an important role in agricultural research. Efficient and accurate high-throughput phenotyping strategies enable a better connection between genotype and phenotype. Unmanned aerial vehicle-based high-throughput phenotyping platforms (UAV-HTPPs) provide novel opportunities for large-scale proximal measurement of plant traits with high efficiency, high resolution, and low cost. The objective of this study was to use time series normalized difference vegetation index (NDVI) extracted from UAV-based multispectral imagery to characterize its pattern across development and conduct genetic dissection of NDVI in a large maize population. The time series NDVI data from the multispectral sensor were obtained at 5 time points across the growing season for 1,752 diverse maize accessions with a UAV-HTPP. Cluster analysis of the acquired measurements classified the 1,752 maize accessions into 2 groups with distinct NDVI developmental trends. To capture the dynamics underlying these static observations, a penalized-splines (P-splines) model was used to obtain genotype-specific curve parameters. Genome-wide association study (GWAS) using static NDVI values and curve parameters as phenotypic traits detected signals significantly associated with the traits. Additionally, GWAS using the projected NDVI values from the P-splines models revealed the dynamic change of genetic effects, indicating the role of gene-environment interplay in controlling NDVI across the growing season. Our results demonstrated the utility of ultra-high spatial resolution multispectral imagery, such as that acquired using UAV-based remote sensing, for genetic dissection of NDVI.

Methods

The UAV system contains a DJI S900 UAV and an NIR-converted multispectral Canon Rebel SL1 DSLR camera with an intervalometer and GPS. We conducted 5 UAV overflights across the growing season in 2017. Overflights were scheduled around 5 growth stages (V4, V8, V12, VT, and R5). The following image processing steps were applied to obtain high-quality data from the raw UAV images: image pre-processing, orthomosaic generation, vegetation index (VI) calculation, and plot-level data extraction. The plot-level NDVI mean was calculated from the reflectance measurements in the red and NIR portions of the spectrum from the transect area of each plot. NDVI values generated on the -1 to 1 scale were rescaled by adding 1 and then multiplying by 128 to convert them into the [0 – 255] range. Time series NDVI values were obtained from the five overflights for 1,752 diverse maize accessions. GWAS was conducted for the static NDVI values and the curve parameters derived across stages.
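
The NDVI calculation and rescaling described above can be sketched as follows. This is a minimal illustration, not the project's processing script: the function names are hypothetical, NDVI is taken as the standard (NIR - Red) / (NIR + Red) ratio, and averaging per-pixel NDVI over the transect is an assumption about how the plot-level mean was formed.

```python
def ndvi(nir: float, red: float) -> float:
    """Standard NDVI: (NIR - Red) / (NIR + Red), on the -1 to 1 scale."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat pixels with no signal as NDVI 0
    return (nir - red) / total


def rescale_ndvi(value: float) -> float:
    """Rescaling stated in the Methods: add 1, then multiply by 128."""
    return (value + 1.0) * 128.0


def plot_mean_ndvi(nir_pixels, red_pixels) -> float:
    """Mean rescaled NDVI over the pixels of one plot transect (assumed per-pixel averaging)."""
    scaled = [rescale_ndvi(ndvi(n, r)) for n, r in zip(nir_pixels, red_pixels)]
    return sum(scaled) / len(scaled)
```

With this formula an NDVI of 0 maps to 128 and an NDVI of -1 maps to 0. An NDVI of exactly 1 would map to 256, so the upper end of the stated range is only reached approximately.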

Usage Notes

## Dataset repository for the "Genetic Dissection of Seasonal Vegetation Index Dynamics in Maize through Aerial Based High-throughput Phenotyping" Project

### Outline of the repository
This section briefly outlines the repository to help visitors navigate it.

### NDVI Distribution Dataset
#### 1. Dataset for NDVI distribution analysis
##### A. Name of the data file: AmesDP_NDVI_hand_measured_trait_with_group_infor.txt
##### B. File Overview

-Number of variables/columns: 12

-Number of rows: 1752

-Variable List:

    Group: Population information for each accession
    New_Group: Population information for each accession. Unlike the 'Group' column, sweet corn and popcorn are assigned using kernel-based criteria (regardless of genetic background, only whether the kernel type is sweet or pop)
    Order: Manually defined accession order number
    Taxa: Accession name
    NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for Days after planting
    FT_2017: flowering time measurement (DAP)
    PH_2017: plant height measurement (cm)
    EH_2017: ear height measurement (cm)
   

### NDVI Clustering And Population Structure Dataset
#### 1. Dataset for NDVI distribution analysis
##### A. Name of the data file: AmesDP_NDVI_hand_measured_trait_with_group_infor.txt
##### B. File Overview

-Same dataset as described above, so the file overview is omitted here

#### 2. Dataset for K-means clustering result
##### A. Name of the data file: AmesPanel_NDVI_data_with_clustering_infor_2cluster
##### B. File Overview
-Number of variables/columns: 6

-Number of rows: 1752

-Variable List:

    Taxa: Accession name
    NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for Days after planting
    cluster: cluster category information for each accession, obtained from K-means clustering analysis
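
The `cluster` column comes from K-means clustering of the five NDVI measurements into 2 groups. The sketch below shows the general technique in plain Python. It is an illustration under assumptions: the initialization, iteration limit, seed, and the absence of any scaling step are choices made here, and the function name `kmeans` is hypothetical. The software and settings actually used are not stated in this README.

```python
import random


def kmeans(points, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers).

    points: list of equal-length numeric tuples, e.g. the five NDVI values
    of each accession. Initial centers are k randomly sampled points
    (an assumption, not the authors' documented setting).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers
```

Called as `labels, centers = kmeans(ndvi_rows, k=2)`, where `ndvi_rows` holds one tuple of five NDVI values per accession, `labels` plays the role of the `cluster` column. Cluster numbering is arbitrary and may differ from the numbering in the file.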

#### 3. Dataset for tSNE analysis
##### A. Name of the data file: AmesDP_genome_hmp_m1m20_s10
##### B. File Overview
-Number of variables/columns: 1753

-Number of rows: 31674

-Variable List:

    rs#: SNP ID
    remaining columns: one column per accession, headed by the accession name (taxa)

-SNP set used for the tSNE analysis

#### 4. Dataset for plotting growth curve, clustering result and tSNE result
##### A. Name of the data file: AmesDP_2017_NDVI_cluster_tSNE_PCA_synthetic
##### B. File Overview
-Number of variables/columns: 17

-Number of rows: 1752

-Variable List:

    Group: Population information for each accession
    New_Group: Population information for each accession. Unlike the 'Group' column, sweet corn and popcorn are assigned using kernel-based criteria (regardless of genetic background, only whether the kernel type is sweet or pop)
    Order: Manually defined accession order number
    Taxa: Accession name
    NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for Days after planting
    cluster: cluster category information for each accession, obtained from K-means clustering analysis
    pheno_tsne_Y1: value on first dimension of tSNE result with phenotype data
    pheno_tsne_Y2: value on second dimension of tSNE result with phenotype data
    geno_tsne_Y1: value on first dimension of tSNE result with genotype data
    geno_tsne_Y2: value on second dimension of tSNE result with genotype data
    PC1 - PC3 (3 variables): first, second, and third dimensions from PCA

### P-Spline Modeling NDVI growth Dataset
#### 1. Dataset for P-spline modeling
##### A. Name of the data file: AmesDP_NDVI_hand_measured_trait
##### B. File Overview

-Number of variables/columns: 9

-Number of rows: 1752

-Variable List: 

    Taxa, NDVI_37DAP_2017 - NDVI_115DAP_2017, FT_2017, PH_2017, EH_2017 are the same as in the dataset above named AmesDP_NDVI_hand_measured_trait_with_group_infor.txt

#### 2. Dataset for P-spline modeling
##### A. Name of the data file: AmesDP2017_NDVI_long
-An intermediate file: the file named 'AmesDP_NDVI_hand_measured_trait' reshaped from wide format to long format
##### B. File Overview
-Number of variables/columns: 4

-Number of rows: 1752

-Variable List: 

    Gen: Accession name
    Genor: Manually assigned accession order
    Date: Days After Planting
    NDVI: NDVI measurement
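
The wide-to-long reshaping can be sketched as below. This is a hedged illustration rather than the script used to build the file: `wide_to_long` is a hypothetical helper, rows are assumed to be dictionaries keyed by the wide-format column names, and `Genor` is filled with the row order as a stand-in for the manually assigned order.

```python
import re

# Matches the wide-format NDVI columns, e.g. "NDVI_37DAP_2017".
NDVI_COLUMN = re.compile(r"^NDVI_(\d+)DAP_2017$")


def wide_to_long(rows):
    """Turn one wide row per accession into one long record per time point.

    rows: list of dicts keyed by wide-format column name.
    Genor is set from row order here (assumption; the real file uses a
    manually assigned accession order).
    """
    long_rows = []
    for order, row in enumerate(rows, start=1):
        for column, value in row.items():
            match = NDVI_COLUMN.match(column)
            if match:
                long_rows.append(
                    {
                        "Gen": row["Taxa"],
                        "Genor": order,
                        "Date": int(match.group(1)),  # days after planting
                        "NDVI": float(value),
                    }
                )
    return long_rows
```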

#### 3. Dataset for Psplines modeling parameter
##### A. Name of the data file: AmesDP_NDVI_Psplines_modeling_parameter.csv
##### B. File Overview
-Number of variables/columns: 4

-Number of rows: 1752

-Variable List: 

    Geno: Accession name
    asymptote: model estimated maximum NDVI value
    max_rate: model estimated maximum growth rate of NDVI
    inflection_point: point in time with maximum growth rate of NDVI
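
To make the three parameter definitions concrete, the sketch below derives them from a fitted curve evaluated on a grid of days, using finite differences. It only illustrates the definitions listed above; the values in the file were estimated from the P-splines model itself, and `curve_parameters` is a hypothetical helper.

```python
def curve_parameters(days, fitted):
    """Approximate curve parameters from a fitted curve on a grid of days.

    days: increasing days after planting; fitted: fitted NDVI at those days.
    Growth rate is approximated by the slope between neighbouring grid
    points (assumption: a finite-difference stand-in for the model derivative).
    """
    rates = [
        (fitted[i + 1] - fitted[i]) / (days[i + 1] - days[i])
        for i in range(len(days) - 1)
    ]
    steepest = max(range(len(rates)), key=rates.__getitem__)
    return {
        "asymptote": max(fitted),  # maximum fitted NDVI
        "max_rate": rates[steepest],  # maximum growth rate
        # midpoint of the interval with the steepest increase
        "inflection_point": (days[steepest] + days[steepest + 1]) / 2,
    }
```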

#### 4. Dataset for Psplines predicted NDVI value
##### A. Name of the data file: AmesDP_NDVI_Pspline_NDVI_prediction_by_1day.csv
##### B. File Overview

-Number of variables/columns: 3

-Number of rows: 1752

-Variable List: 

    geno: Accession name
    biomassspline: predicted biomass from 37-115 DAP at 1-day intervals; the predicted values are joined with '_'
    growthratespline: predicted growth rate from 37-115 DAP at 1-day intervals; the predicted values are joined with '_'
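
Because each cell in these two columns packs a whole series into one '_'-joined string, it has to be split before use. The sketch below is a minimal example; `parse_series` is a hypothetical helper, and it assumes one value per day for every day from 37 to 115 DAP inclusive (79 values).

```python
def parse_series(joined, first_dap=37, last_dap=115):
    """Split a '_'-joined series into a {DAP: value} mapping.

    Assumes one value per day from first_dap to last_dap inclusive;
    raises ValueError if the number of values does not match.
    """
    values = [float(v) for v in joined.split("_")]
    days = list(range(first_dap, last_dap + 1))
    if len(values) != len(days):
        raise ValueError(
            f"expected {len(days)} values, found {len(values)}"
        )
    return dict(zip(days, values))
```

For a row read from the file, `parse_series(row["biomassspline"])[60]` would then give the predicted value at 60 DAP.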

#### 5. Dataset for Psplines predicted NDVI value
##### A. Name of the data file: AmesDP2017_observed_Psplines_model_fitted_value
##### B. File Overview
-Number of variables/columns: 21

-Number of rows: 1752

-Variable List: 

    geno: Accession name
    NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP.  DAP stands for Days after planting
    Pspline_37DAP - Pspline_115DAP (15 variables in total): predicted NDVI value at 37, 44, 51, 58, 65, 72, 79, 86, 93, 100, 107, 114, 60, 73, 115 DAP

#### 6. Dataset for plotting growth curve, and correlation between observed and predicted NDVI value
##### A. Name of the data file: AmesPanel2017_observed_Psplines_model_fitted_with_pop_structure
##### B. File Overview
-Number of variables/columns: 14

-Number of rows: 1752

-Variable List: 

    Group: Population information for each accession
    New_Group: Population information for each accession. Unlike the 'Group' column, sweet corn and popcorn are assigned using kernel-based criteria (regardless of genetic background, only whether the kernel type is sweet or pop)
    Order: Manually defined accession order number
    Taxa: Accession name
    NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for Days after planting
    Pspline_37DAP - Pspline_115DAP (5 variables in total): predicted NDVI value at 37, 44, 60, 73, and 115 DAP. 
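
The agreement between observed and predicted NDVI at a given time point can be summarized with a Pearson correlation. The sketch below is one way to compute it in plain Python; `pearson` is a hypothetical helper, and the column names in the usage note are inferred from the naming pattern above rather than confirmed against the file.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

For example, passing the `NDVI_60DAP_2017` column as `x` and the assumed `Pspline_60DAP` column as `y` gives the observed-versus-predicted correlation at 60 DAP.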

### GWAS of NDVI and Pspline Curve Parameters Dataset

#### 1. Dataset for plotting Manhattan plot for NDVI and Pspline Curve Parameters
##### A. Name of the data file: GWAS-21M_SNPs.txt
##### B. File Overview 

-Number of variables/columns: 2

-Number of rows: 21129389

-Variable List: 

    CHROM: Chromosome number
    POS: Position on corresponding chromosome

#### 2. Dataset for plotting Manhattan plot for NDVI and Pspline Curve Parameters
##### A. Name of the data file: NDVI_candidate_gene_list_FDR
##### B. File Overview 

-Number of variables/columns: 13

-Number of rows: 93

-Variable List: 

    Trait: trait name
    Gene_name: Gene name
    Gene_ID_v3: Gene ID from maize B73 genome Version 3
    Gene_Chr_V3: chromosome number based on B73 genome Version 3
    Gene_S: start position on chromosome
    Gene_E: end position on chromosome
    Tag_SNP: tagged SNP position on chromosome
    Distance: the distance from tagged SNP to gene
    Abt_distance: the absolute distance from tagged SNP to gene
    Alias: gene alias name
    Significant: whether the tagged SNP is significantly associated with the trait
    AtID: gene ID in Arabidopsis
    OsID: gene ID in rice

#### 3. Dataset for plotting Manhattan plot for NDVI and Pspline Curve Parameters
##### A. Name of the data file: NDVI_GWAS_FDR_threshold
##### B. File Overview

-Number of variables/columns: 5

-Number of rows: 25

-Variable List: 

    Trait: Trait name
    p_value: p-value obtained from GWAS
    Threshold: calculated FDR threshold
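
One common way to turn a set of GWAS p-values into an FDR-based significance threshold is the Benjamini-Hochberg procedure, sketched below. Whether this exact procedure and an FDR level of 0.05 were used for the `Threshold` column is an assumption; this README does not say. `bh_threshold` is a hypothetical helper.

```python
def bh_threshold(p_values, alpha=0.05):
    """Benjamini-Hochberg p-value cutoff at FDR level alpha.

    Returns the largest p-value p(k) satisfying p(k) <= (k / m) * alpha,
    where k is the rank among the m sorted p-values, or None when no
    p-value passes. SNPs with p-values at or below the returned cutoff
    are declared significant.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * alpha:
            threshold = p  # keep the largest p-value that passes
    return threshold
```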

#### 4. Dataset for plotting Manhattan plot for NDVI and Pspline Curve Parameters
##### A. Name of the data file: AmesDP_NDVI_73DAP_2017_genome_wide_gwas_results_sorted_log2_wh, mesDP_NDVI_115DAP_2017_genome_wide_gwas_results_sorted_log2_wh, AmesDP_max_rate_genome_wide_gwas_results_sorted_log2_wh, AmesDP_asymptote_genome_wide_gwas_results_sorted_log2_wh
##### B. File Overview
- GWAS output from GAPIT for 4 different traits: NDVI_73DAP, NDVI_115DAP, asymptote, and max rate. These are standard GAPIT output files, so the file overview is omitted here

### Dynamic Changes of Allelic Effect Dataset
##### A. Name of the data file: Psplines_SNP_effect
##### B. File Overview

-Number of variables/columns: 9

-Number of rows: 45

-Variable List: 

    SNP: SNP ID
    Position: SNP position
    P.value: P-value
    maf: minor allele frequency
    Rsquare.of.Model.without.SNP: Rsquare of model when not including the SNP
    Rsquare.of.Model.with.SNP: Rsquare of model when including the SNP
    FDR_Adjusted_P.values: FDR adjusted p_values 
    effect: SNP effect
    Chro: Chromosome number

Funding

National Institute of Food and Agriculture, Award: 2017-67007-25942