Crop performance, aerial, and satellite data from multistate maize yield trials
Data files
May 09, 2024 version files 3.20 GB
- DataPublication_final.zip
- README.md
Abstract
Accurate genotype-specific early yield estimates at the field and plot level offer potential benefits to farmers in optimizing their agronomic practices, breeders in screening hundreds or thousands of varieties, and policymakers in decisions contributing to the overall improvement of agriculture and food production systems. Effective, generalizable approaches to track plant growth and predict yield at the individual plot level require large matched datasets of remote sensing and ground truth data collected across multiple environments. Low-altitude drone flights are increasingly being used to collect data from field evaluations of new crop varieties, while satellite imagery is being explored to track yield and management practices at the regional and field scales. Despite their lower spatial resolution, satellite platforms exhibit multiple logistical and technical advantages in scalability and accessibility, and could facilitate plot-level predictions, especially with steadily improving spatial resolution. However, genotype-specific, plot-level, high-resolution satellite images from multiple environments integrated with the ground truth measurements are not yet publicly available. Here we generated, described, and evaluated a set of more than 20,000 plot-level images of over 80 hybrid maize (Zea mays) varieties grown in six locations across the US corn belt under various management practices, collected from near-simultaneous satellite and drone image acquisitions and integrated with ground truth measurements of crop yield. Of the six baseline models examined, models employing data collected from satellite images often matched or exceeded the performance of models employing data collected from drones for both within-environment and cross-environment yield prediction.
Large, multimodal, multi-environment, genetically diverse training datasets such as those generated in this study, along with more complex models, could help unlock the power of satellite imagery as an important new addition to the toolkit of farmers, plant geneticists, crop breeders, and policymakers.
README: Crop performance, aerial, and satellite data from multistate maize yield trials
https://doi.org/10.5061/dryad.905qftttm
Maize (Zea mays) field experiments were conducted at five locations in 2022: Scottsbluff, NE; Lincoln, NE; Missouri Valley, IA; Ames, IA (with two-thirds of the experiment in one field and one-third in another field); and Crawfordsville, IA.
This file describes the organization and format of the satellite, UAV, and ground truth data published in Shrestha et al. In addition, we provide example code for searching and retrieving images of individual hybrid corn research plots as well as calculating the wavelength intensity values and indices employed in our study.
Directory Layout:
- Satellite
- Five directories correspond to five of the six locations from which satellite imagery was collected (North Platte, NE was excluded because of issues with plot segmentation).
- Six directories numbered sequentially "TP1", "TP2", "TP3", "TP4", "TP5", and "TP6" corresponding to the order of satellite images collected at each location. Note that the timing of time points with the same number is not identical across locations, and is not guaranteed to be similar. The time of acquisition for each time point at each location is provided in the file DateofCollection.xlsx in the folder "GroundTruth".
- .TIF formatted images segmented for each plot at a given location at a given time point. Rows often were at orientations other than 0 or 90 degrees in satellite images. Segmented images consist of the minimum bounding box encompassing the plot of interest, with real pixel data for all plot pixels and zero values entered for all pixels within the minimum bounding box but not part of the plot of interest.
- Plot image names follow the format location-time-experiment_range_row.tif (Lincoln-TP1-hybirds_2_2.TIF). An example implementation of how to map between image names, plot IDs, and ground truth information is provided later in this document. The necessary information to map between ground truth, plot ID, and images is provided in the file HYBRID_HIPS_V3.5_ALLPLOTS.csv located in the "GroundTruth" folder.
- Each image contains data on six bands per pixel: near-infrared, red edge, red, green, blue, and deep blue.
- FieldLevelImages contains field-level images before cropping/segmenting to produce plot-level images, with copyright attribution © Airbus DS (2022). .PNG formatted image names follow the format location-time-experiment.PNG (Lincoln-TP1-hybrids.PNG).
- UAV
- Five directories corresponding to five of the six locations from which UAV imagery was collected (as with the satellite images, North Platte, NE UAV images were excluded because of issues with plot segmentation).
- Three directories numbered sequentially "TP1", "TP2", and "TP3" corresponding to the order of UAV flights conducted at each location. Note that the timing of time points with the same number is not identical, and is not guaranteed to be similar across locations. The time of acquisition for each time point at each location is provided in the file DateofCollection.xlsx in folder "GroundTruth".
- .PNG formatted images segmented for each plot at a given location at a given time point. Rows often were at orientations other than 0 or 90 degrees in UAV image mosaics. Segmented images consist of the minimum bounding box encompassing the plot of interest, with real pixel data for all plot pixels and zero values entered for all pixels within the minimum bounding box but not part of the plot of interest.
- Plot image names follow the format location-time-experiment_range_row.png (Crawfordsville-TP1-4351_3_15.PNG). An example implementation of how to map between image names, plot IDs, and ground truth information is provided later in this document. The necessary information to map between ground truth, plot ID, and images is provided in the file HYBRID_HIPS_V3.5_ALLPLOTS.csv located in the "GroundTruth" folder.
- Each image contains data on three bands per pixel: red, green, and blue.
- GroundTruth
- HYBRID_HIPS_V3.5_ALLPLOTS.csv contains one record per hybrid maize plot grown at each of the six locations in this study. Each record includes information on the field, location within the field (row and column), the hybrid genotype planted in that plot, and a set of ground truth data collected from that plot. The individual ground truth measurements are defined below:
- plantingDate: the date the plot was planted.
- totalStandCount: the number of living plants observed in the middle two rows of the four-row plot. Note that plot length varies between locations, so this value must be corrected for plot size to calculate the density of plants per unit area.
- daysToAnthesis: The difference between the planting date and the first date where at least 50% of living plants in the plot had visible anthers present on their tassels (syn. "male flowering"). Not present for all locations.
- GDDToAnthesis: flowering time expressed in growing degree days (GDD), a measure that corrects for the fact that plants grow faster on warm days and slower on cold days. The number of growing degree days per day was calculated using temperatures in Fahrenheit with a crop base temperature of 50 degrees and a crop maximum temperature of 86 degrees.
- yieldPerAcre: estimated grain yield per plot. This measurement starts with the direct measurement of grain mass per plot and grain moisture percentage, then corrects for variation in moisture content and differences in plot size across locations to calculate yield in bushels per acre at a standardized 15.5% moisture content.
- DateofCollection.xlsx translates location + TP1/2/3/etc into the specific date of image collection for both UAV and Satellite images collected at all locations included in this study.
- Documentation
- Documentation.ipynb: example code for extracting images, calculating different indices, and linking images/indices to the ground truth data.
- Readme.md: explanation of data types and layout.
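The naming convention and the two derived ground truth measurements described in the layout above can be sketched in Python as follows. This is a minimal illustration, not the code shipped in Documentation.ipynb. The function and field names (`parse_plot_image_name`, `daily_gdd_f`, `bushels_per_acre`, `plot_range`) are hypothetical. The GDD function assumes the common approach of clamping both the daily maximum and minimum temperature to the 50 to 86 degree window before averaging; the README states only the base and maximum temperatures. The yield function assumes raw inputs in pounds and square feet and uses the standard 56 lb bushel weight for shelled corn; the units of the raw harvest measurements are not stated in this document.

```python
import re
from typing import NamedTuple


class PlotImageName(NamedTuple):
    location: str
    time_point: str
    experiment: str
    plot_range: int
    row: int


# location-time-experiment_range_row.ext, e.g. Crawfordsville-TP1-4351_3_15.PNG
_NAME_RE = re.compile(
    r"(?P<location>.+?)-(?P<time_point>TP\d+)-(?P<experiment>.+)"
    r"_(?P<plot_range>\d+)_(?P<row>\d+)\.(?:tif|png)",
    re.IGNORECASE,
)


def parse_plot_image_name(file_name: str) -> PlotImageName:
    """Split a plot image file name into its documented components."""
    match = _NAME_RE.fullmatch(file_name)
    if match is None:
        raise ValueError(f"unexpected plot image name: {file_name!r}")
    return PlotImageName(
        location=match["location"],
        time_point=match["time_point"],
        experiment=match["experiment"],
        plot_range=int(match["plot_range"]),
        row=int(match["row"]),
    )


def daily_gdd_f(t_max_f: float, t_min_f: float,
                base_f: float = 50.0, cap_f: float = 86.0) -> float:
    """Growing degree days for one day, temperatures in Fahrenheit.

    ASSUMPTION: both temperatures are clamped to [base_f, cap_f]
    before averaging.
    """
    t_max = min(max(t_max_f, base_f), cap_f)
    t_min = min(max(t_min_f, base_f), cap_f)
    return (t_max + t_min) / 2.0 - base_f


def bushels_per_acre(wet_grain_lb: float, moisture_pct: float,
                     plot_area_sqft: float,
                     standard_moisture_pct: float = 15.5,
                     lb_per_bushel: float = 56.0) -> float:
    """Yield in bushels per acre at a standardized moisture content.

    ASSUMPTION: grain mass is in pounds and plot area in square feet.
    """
    # Remove the water to get dry matter, then add back the standard amount.
    dry_matter_lb = wet_grain_lb * (1.0 - moisture_pct / 100.0)
    standardized_lb = dry_matter_lb / (1.0 - standard_moisture_pct / 100.0)
    acres = plot_area_sqft / 43560.0  # square feet per acre
    return standardized_lb / lb_per_bushel / acres


if __name__ == "__main__":
    print(parse_plot_image_name("Crawfordsville-TP1-4351_3_15.PNG"))
    print(daily_gdd_f(90.0, 60.0))
    print(bushels_per_acre(28.0, 20.0, 435.6))
```

Summing `daily_gdd_f` over the days from planting to anthesis would give a value comparable in spirit to GDDToAnthesis, provided the weather record and clamping rule match those used by the authors.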
Methods
UAV Image Acquisition and Processing
UAV visible-spectrum (RGB) imagery was collected at three time points per location, with the goal of capturing maize during the vegetative, reproductive, and post-flowering growth stages (Supplemental Data Set S1). In Scottsbluff, NE, images were acquired with a DJI Matrice 600 Pro carrying a DJI Zenmuse X3 12-megapixel (MP) RGB (red, green, blue) camera, at an altitude of 100 ft (30.48 m) with a front overlap of 90% and a side overlap of 65%. In North Platte, images were acquired with a DJI Inspire 2 carrying a Sentera Double 4K AG+ RGB camera, at an altitude of 50 ft (15.24 m) with front and side overlap of 70%. In Lincoln, images were acquired with a DJI Phantom 4 RTK with a DJI Zenmuse P1 45 MP RGB camera, at an altitude of 115 ft (35 m) with front and side overlap of 80%. In Missouri Valley, Ames, and Crawfordsville, IA, images were acquired with a DJI Phantom 4 Pro V2.0 with a DJI 20 MP RGB camera, at an altitude of 100 ft (30.48 m) with front and side overlap of 80%. The UAV images were processed and stitched into RGB orthomosaics with the photogrammetric software packages Pix4D Mapper 4.8.4 (Pix4D 2024) and AgiSoft Metashape 1.8.4 (Agisoft Metashape 2024), using default parameters.
Satellite Image Acquisition
Pléiades Neo was used to capture images at all locations at six different time points (approximately two weeks apart), with the first three time points close to the dates of the three UAV image acquisitions at each location. Table 1 shows the specifications of this satellite constellation. The spectral ranges of the six bands in the satellite multispectral images are as follows: Red (620 – 690 nm), Green (530 – 590 nm), Blue (450 – 520 nm), Near-infrared (NIR, 770 – 880 nm), Red Edge (700 – 750 nm), and Deep Blue (400 – 450 nm). Along with the multispectral images, a single-band panchromatic raster file with a wide band of approximately 450 – 800 nm was generated. Each image captured a total area of 100 km × 100 km per location, covering the entire experimental field at each location simultaneously. Final 16-bit GeoTIFF satellite images with 30-cm resolution were generated and provided to us by Pléiades Neo after pan-sharpening with the panchromatic band image files, manual ortho-rectification, and atmospheric correction.
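As an illustration of working with the six-band, 16-bit plot images, the sketch below computes per-band means and a mean NDVI for one plot while ignoring the zero-filled padding pixels described in the directory layout. It assumes the image has already been loaded into a NumPy array of shape (bands, rows, columns) by a GeoTIFF reader of your choice. The band order in `BAND_INDEX` follows the order in which the bands are listed in the directory layout and is an assumption that should be checked against the files. NDVI is used only as a familiar example index; this document does not list which indices the study employed, and the function names are hypothetical.

```python
import numpy as np

# ASSUMPTION: band order along axis 0 matches the order listed in the README.
BAND_INDEX = {
    "nir": 0,
    "red_edge": 1,
    "red": 2,
    "green": 3,
    "blue": 4,
    "deep_blue": 5,
}


def plot_mask(image: np.ndarray) -> np.ndarray:
    """Boolean (rows, cols) mask: True where any band is non-zero.

    Pixels inside the bounding box but outside the plot are stored as zeros
    in every band, so they come out False here.
    """
    return np.any(image != 0, axis=0)


def masked_band_means(image: np.ndarray) -> dict:
    """Mean value of each band over plot pixels only."""
    mask = plot_mask(image)
    if not mask.any():
        raise ValueError("image contains no plot pixels")
    # Cast to float so 16-bit integers cannot overflow in later arithmetic.
    pixels = image[:, mask].astype(np.float64)  # shape (bands, n_plot_pixels)
    return {name: float(pixels[i].mean()) for name, i in BAND_INDEX.items()}


def mean_ndvi(image: np.ndarray) -> float:
    """Mean of (NIR - Red) / (NIR + Red) over plot pixels."""
    mask = plot_mask(image)
    nir = image[BAND_INDEX["nir"]][mask].astype(np.float64)
    red = image[BAND_INDEX["red"]][mask].astype(np.float64)
    total = nir + red
    valid = total > 0  # skip pixels where the ratio is undefined
    if not valid.any():
        raise ValueError("no pixels with a defined NDVI")
    return float(np.mean((nir[valid] - red[valid]) / total[valid]))


if __name__ == "__main__":
    # Synthetic 2 x 2 plot image: two plot pixels, two zero-padded pixels.
    demo = np.zeros((6, 2, 2), dtype=np.uint16)
    demo[:, 0, 0] = [3000, 2000, 1000, 1500, 800, 600]
    demo[:, 1, 1] = [3000, 2000, 1000, 1500, 800, 600]
    print(masked_band_means(demo))
    print(mean_ndvi(demo))
```

Averaging the index per pixel and then over the plot, as done here, can differ slightly from computing the index on band means; which convention the authors used is not stated in this document.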