Data from: High-throughput phenotyping for the prediction and quantification of flower-related traits in sugarcane
Data files
Feb 23, 2026 version files 668.91 MB
- IAC_Flowering_Field_Data.csv (21.15 KB)
- IAC_FloweringCount_byCNN.csv (8.90 KB)
- Orthomosaic_Metashape_Reports.zip (666.88 MB)
- Raw_CC_Data.csv (156.28 KB)
- Raw_EXG_Data.csv (911.24 KB)
- Raw_PH_Data.csv (499.39 KB)
- Raw_PV_Data.csv (422.90 KB)
- README.md (8.19 KB)
Abstract
This dataset provides the primary data and metadata supporting the research article "High-Throughput Phenotyping for the Prediction and Quantification of Flower-Related Traits in Sugarcane." It integrates traditional agronomic field assessments with digital metrics derived from high-throughput phenotyping (HTP) using Unmanned Aerial Vehicles (UAVs). The data are structured to facilitate the development and validation of Machine Learning (ML) models for both classification and regression tasks in plant breeding. The dataset includes (1) HTP features: raw values for all vegetation indices (e.g., ExG) and structural metrics (Canopy Cover, Plant Height, and Plant Volume) extracted from RGB orthomosaics across multiple time points; and (2) ground truth field data: complete manual measurements of flower-related traits, including Days to Flag Leaf (DTFL), Days to Flowering (DTF), and flowering intensity.
Paulo H. da Silva Santos¹˒²˒⁶, João R. Vieira Manechini²˒⁶, Carlos Kantack², Mauro Xavier²˒⁶, Samira Domingues Carlin²˒⁶, Dilermando Perecin¹, Thiago G. Marconi⁴, Leandro G. Marconi⁴, Elisson Romanel³, Steve Jackson⁵, Marcos G.A. Landell²˒⁶, Luciana R. Pinto¹˒²˒⁶.
¹ São Paulo State University (FCAV/UNESP); ² Agronomic Institute of Campinas (IAC/APTA), Advanced Center for Sugarcane Development and Research; ³ University of São Paulo (EEL-USP); ⁴ Digital Phenotyping, Votuporanga/SP; ⁵ University of Warwick, UK; ⁶ Center for Plant Molecular Breeding (CeM²P), UNICAMP, Campinas, São Paulo, Brazil
Project Overview
- Data Collection: May 2023 – August 2024
- Dataset Compiled: 2025
- Location: Serra Grande, Bahia, Brazil (14°28'25.084" S, 39° 4'39.021" W)
- Experimental Design: The augmented block design consists of 13 blocks arranged in 60 metres of rows, encompassing a total of 195 plots.
This dataset integrates traditional field inspections with High-Throughput Phenotyping (HTP) using a UAV DJI MAVIC 3E to evaluate flowering time and intensity in sugarcane across two crop seasons: Plant Cane (2023) and First Ratoon (2024).
Data Description
1. Visual Field Assessments
Manual inspections conducted to determine the onset of reproductive stages:
- DTFL: Days to Flag Leaf.
- DTF: Days to Flowering.
- Stalk Count: Total culms at maturity.
- Flowering Intensity: Quantitative assessment of blooms per plot.
2. HTP Metrics (Drone-based)
Data derived from RGB sensors and orthomosaics (129 data points):
- Vegetation Index: Excess Greenness (ExG).
- Structural Metrics: Canopy Cover (CC), Plant Height (PH), and Plant Volume (PV).
- Indirect Observation: Days to Flowering via orthomosaic database (DTF_Ortho).
- In the data files, "av" denotes the average and "sd" denotes the standard deviation of the corresponding metric.

> [!NOTE]
> Accessing Imagery: Raw imagery (~4GB per data point) is available upon request. Orthomosaic data can be explored at iacflor.fenotipagemdigital.com. Additional information on sensitive details of the IAC sugarcane breeding program may be provided upon reasonable request.
File Dictionary
These files support analysis for ANOVA, Heritability, Repeatability, Feature Selection, and Machine Learning (Classification/Regression).
| File Name | Description |
|---|---|
| IAC_Flowering_Field_Data.csv | Contains all raw field observations and manual measurements. |
| Orthomosaic_Metashape_Reports.zip | Contains all reports from orthomosaics processed in Agisoft Metashape. |
| IAC_FloweringCount_byCNN.csv | Contains the seven data points at which CNN-predicted flowering counts are compared with visual assessments at the plant cane stage. |
| Raw_CC_Data.csv | Contains all raw model metrics calculated at all data points for canopy cover. |
| Raw_EXG_Data.csv | Contains all raw vegetation index values calculated at all data points for excess greenness. |
| Raw_PH_Data.csv | Contains all raw model metrics calculated at all data points for plant height. |
| Raw_PV_Data.csv | Contains all raw model metrics calculated at all data points for plant volume. |
| Design_Flowering_Experiment_Trial.pdf | Contains an image of the experimental design. Genotypes (treatments) highlighted in yellow are late flowering, those in blue are early flowering, and those in green are the checks repeated in all blocks, considered mid-flowering. |
Variable Abbreviations
| Variable | Description | Stage |
|---|---|---|
| DTFL / DTFL2 | Days to Flag Leaf | Plant Cane / 1st Ratoon |
| DTF / DTF2 | Days to Flowering | Plant Cane / 1st Ratoon |
| StalkCount / 2 | Stalk count at maturity | Plant Cane / 1st Ratoon |
| intflor | Flowering intensity | Plant Cane |
| DTF_ortho / 2 | Days to Flowering (via Orthomosaic) | Plant Cane / 1st Ratoon |
Dataset Description: IAC Flowering & HTP Data
This dataset contains ground truth observations and High-Throughput Phenotyping (HTP) data for sugarcane flowering trials. The experiment follows an Augmented Block Design.
1. Experimental Design & Ground Truth
- File: IAC_Flowering_Field_Data.csv
- Structure: This file contains the raw ground truth data.
- Block: Refers to the 13 experimental blocks.
- Treatment: Denotes the genotypes tested.
- Checks: Three specific genotypes (125-IAC0752, 127-IAC07395, and 128-IACSP018158) were replicated across all 13 blocks to facilitate the augmented block design. This structure applies to both field and HTP data.
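As a minimal sketch of how the check structure described above could be verified, the following separates check plots from test entries and counts the blocks in which each check appears. The column names `Block` and `Treatment` are taken from the list above, but the exact spelling of the labels inside the CSV is an assumption, and the rows shown are made-up illustrations rather than values from the dataset.

```python
# Check genotypes as named in this README; the exact labels used in the
# Treatment column of the CSV are an assumption.
CHECKS = {"125-IAC0752", "127-IAC07395", "128-IACSP018158"}


def split_checks(rows, treatment_col="Treatment"):
    """Split plot records into (check plots, test-entry plots)."""
    checks = [r for r in rows if r[treatment_col] in CHECKS]
    entries = [r for r in rows if r[treatment_col] not in CHECKS]
    return checks, entries


def blocks_per_check(rows, block_col="Block", treatment_col="Treatment"):
    """Count the distinct blocks in which each check genotype appears."""
    seen = {}
    for r in rows:
        if r[treatment_col] in CHECKS:
            seen.setdefault(r[treatment_col], set()).add(r[block_col])
    return {genotype: len(blocks) for genotype, blocks in seen.items()}


# Illustrative rows only (hypothetical entry name, not real data).
toy_rows = [
    {"Block": "1", "Treatment": "125-IAC0752"},
    {"Block": "1", "Treatment": "hypothetical-entry-A"},
    {"Block": "2", "Treatment": "125-IAC0752"},
    {"Block": "2", "Treatment": "127-IAC07395"},
]
toy_checks, toy_entries = split_checks(toy_rows)
toy_block_counts = blocks_per_check(toy_rows)
```

With the real file, rows read via `csv.DictReader` could be passed to the same functions; each check would be expected to appear in all 13 blocks.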
2. Flowering Counts (Visual vs. CNN)
- File: IAC_FloweringCount_byCNN.csv
- Details: This file compares manual observations with deep learning outputs.
- Date Format: Day/Month/Year (DD/MM/YYYY).
- Visual: Manual flower counts performed in the field (Ground Truth).
- CNN: Automated flower counts generated by a Convolutional Neural Network (Deep Learning model).
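Because the dates are day-first, a parser that assumes month-first order would silently misread them. The sketch below parses DD/MM/YYYY dates explicitly and computes one possible agreement measure (mean absolute error) between the two count columns. The header names `Date`, `Visual`, and `CNN` are assumptions about the CSV layout, and the sample rows are invented for illustration.

```python
import csv
import io
from datetime import datetime


def load_counts(handle, date_col="Date", visual_col="Visual", cnn_col="CNN"):
    """Read count records, parsing dates as day/month/year."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "date": datetime.strptime(record[date_col], "%d/%m/%Y").date(),
            "visual": float(record[visual_col]),
            "cnn": float(record[cnn_col]),
        })
    return rows


def mean_absolute_error(rows):
    """Average absolute difference between CNN and visual counts."""
    return sum(abs(r["cnn"] - r["visual"]) for r in rows) / len(rows)


# Invented sample in the assumed layout (not real data).
sample = io.StringIO("Date,Visual,CNN\n05/06/2023,10,12\n19/06/2023,20,17\n")
count_rows = load_counts(sample)
count_mae = mean_absolute_error(count_rows)
```

To use the real file, replace `sample` with `open("IAC_FloweringCount_byCNN.csv", newline="")` and adjust the column names to match its header.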
3. HTP Metrics (CC, PH, PV, and ExG)
- Files: Raw_CC_Data.csv, Raw_PH_Data.csv, Raw_PV_Data.csv, and Raw_EXG_Data.csv.
- Column Guide:
  - Ignored Columns: id, Experiment, and vi (internal database identifiers).
  - row: Equivalent to the Block in the augmented design (1–13).
  - col: Refers to the 15 plots/genotypes per block.
  - type: The flowering response classification (EF = Early Flowering; LF = Late Flowering).
  - Metric Columns: Named by the metric acronym followed by the flight date in YYMMDD format (e.g., PH230825 = Plant Height measured on August 25, 2023).
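The metric column naming convention lends itself to a small parser that turns a wide row into one record per flight. This is a sketch under assumptions: the pattern treats any run of letters before the six-digit date as the metric prefix (so an "av" or "sd" marker, if it is part of the name, stays in the prefix), and the example row uses invented values.

```python
import re
from datetime import date, datetime

# Letters (metric prefix) followed by a six-digit YYMMDD flight date.
METRIC_COLUMN = re.compile(r"^([A-Za-z]+)(\d{6})$")


def parse_metric_column(name):
    """Return (prefix, flight_date) for names like 'PH230825', else None."""
    match = METRIC_COLUMN.match(name)
    if match is None:
        return None
    prefix, yymmdd = match.groups()
    try:
        flight_date = datetime.strptime(yymmdd, "%y%m%d").date()
    except ValueError:  # six digits that do not form a valid date
        return None
    return prefix, flight_date


def wide_to_long(row, id_columns=("row", "col", "type")):
    """Turn one wide plot row into a list of per-flight records."""
    ids = {key: row[key] for key in id_columns}
    records = []
    for name, value in row.items():
        parsed = parse_metric_column(name)
        if parsed is None or value in ("", None):
            continue  # skips id, Experiment, vi, and empty cells
        prefix, flight_date = parsed
        records.append(
            {**ids, "metric": prefix, "date": flight_date, "value": float(value)}
        )
    return records


# Invented row in the documented layout (values are not real data).
toy_row = {"id": "1", "Experiment": "x", "vi": "PH", "row": "1", "col": "3",
           "type": "EF", "PH230825": "2.1", "PH230910": "2.4"}
long_records = wide_to_long(toy_row)
```

The long format makes it straightforward to sort observations by flight date or to join the four metric files on `row`, `col`, and date.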
4. Vegetation Indices & Metric Definitions
All metrics were derived from RGB (Red, Green, Blue) orthomosaics and digital surface/terrain models. To replicate the orthomosaic processing settings, refer to Orthomosaic_Metashape_Reports.zip, which contains a detailed PDF report for every flight conducted during the experiment, including the full software configuration used in Agisoft Metashape.
| Metric | Description | Scale/Range |
|---|---|---|
| Canopy Cover (CC) | The percentage of area covered by vegetation within a plot. | 0 – 100% |
| Excess Green (ExG) | A proxy for plant vigor and chlorophyll content. | -1 to 1 |
| Plant Height (PH) | Estimated plant height using Digital Surface Models (DSM). | Meters |
| Plant Volume (PV) | Estimated volume occupied by the plant within the plot. | Cubic Meters |
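To illustrate how a height metric of this kind can be obtained from elevation models, the sketch below subtracts a terrain model from a surface model pixel by pixel and summarises the plot with an average and a standard deviation (the "av" and "sd" statistics mentioned earlier). This is a common general approach, not a description of the authors' pipeline: the clipping of negative heights, the use of the population standard deviation, and the toy elevation grids are all assumptions.

```python
from statistics import mean, pstdev


def canopy_heights(dsm, dtm):
    """Per-pixel height: surface elevation minus terrain elevation, floored at 0."""
    return [
        max(surface - terrain, 0.0)
        for surface_row, terrain_row in zip(dsm, dtm)
        for surface, terrain in zip(surface_row, terrain_row)
    ]


def summarise_plot(heights):
    """Return (average, standard deviation) of the heights within one plot."""
    # Population standard deviation is an assumption; the dataset may use
    # the sample standard deviation instead.
    return mean(heights), pstdev(heights)


# Toy 2x2 elevation grids in metres (illustrative, not real data).
toy_dsm = [[102.0, 103.0], [101.0, 104.0]]
toy_dtm = [[100.0, 100.0], [100.0, 100.0]]
toy_heights = canopy_heights(toy_dsm, toy_dtm)
ph_av, ph_sd = summarise_plot(toy_heights)
```

The processing settings actually used for each flight are documented in the Metashape reports in Orthomosaic_Metashape_Reports.zip.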
Code Availability
All statistical analyses and machine learning scripts used in this study are available at:
