Acute pseudo-landmarking and Constellation homologies: A generalized workflow to identify and track segmented structures in plant time series images
Hodge, John (2021), Acute pseudo-landmarking and Constellation homologies: A generalized workflow to identify and track segmented structures in plant time series images, Dryad, Dataset, https://doi.org/10.5061/dryad.sxksn033w
Assessing plant phenotypes throughout the lifecycle is integral to exploring the development, genetics, and evolution of morphology, and can be critical for agronomic and basic research studies. Although various automated or semi-automated phenomic approaches have been developed, it has been challenging to analyze differential growth because of difficulties in segmenting and annotating specific structures or positions in the plant body and maintaining their identities throughout time-series data. To address this gap, we have developed a generalized workflow linking our previously published function, Acute, with a companion homology workflow, Constellation, in the PlantCV environment. Acute identifies acute shapes (pseudo-landmarks) in the plant body, most often corresponding to leaf tips and ligular regions. Constellation uses a strategy of dimensionality reduction via starscape followed by hierarchical clustering through constella to identify ‘constellations’ of segments in eigenspace that represent the same landmark in consecutive images of a time-series. We devised a quality control function, constellaQC, to test the accuracy of the clustering approach, and use it to show that the approach appropriately clusters the pseudo-landmarks derived from Acute, with 80-90% accuracy. We discuss the reasons for and consequences of this lack of 100% accuracy in automated workflows and suggest how to develop these functions for other phenomics datasets that may vary in dimensional complexity.
Images were collected in an imaging cabinet built from a series of Raspberry Pi 3 Model B boards, using the native raspistill command found in Raspbian, a Debian-based OS. These images were subsequently used to generate the pseudo-landmark datasets available here. Pseudo-landmarks were then grouped into conserved identities through time via the Constellation homology workflow, and the resulting groups can be used for downstream morphometric analysis. Ground-truthing measures were derived in FIJI from the same images used for pseudo-landmarking and homology analysis.
This dataset includes the two directories described below (one containing output data and the other containing the supplementary image data used to generate it), in addition to an R script, 'acute_heights_QC.R', which was used for ground-truthing the accuracy of the pseudo-landmarks against manual measures of ligular plant height (the corresponding data files for this script can be found in 'constella_homology_groups.QC_height_testing contents').
Homology group files for each genotype surveyed contain 10 columns which can be interpreted as follows:
 'group' - de novo homology group assignment, either as a serial number (constella output) or reannotated as a structure of biological interest. Note that '-' entries are treated as NA group values: they mark rogue points that are considered uninformative for downstream morphometric analysis.
 'plmname' - serial name identifier given to each pseudo-landmark within an image frame.
 'filename' - name of the image frame from which the plmnames are derived.
 'x' and 'y' - the x and y pixel coordinates of the pseudo-landmark in the image frame.
 'SS_x' and 'SS_y' - the x and y pixel coordinates of the bounding Start Site of an acute island along a contour, which acute uses to define a specific plm coordinate location.
 'TS_x' and 'TS_y' - the x and y pixel coordinates of the bounding Termination Site of an acute island along a contour, which acute uses to define a specific plm coordinate location.
 'cc_ratio' - the Convexity-Concavity Ratio, defined by the average pixel intensity of the binary mask within the acute island. It can serve as a useful metadata dimension in downstream homology grouping.
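As a rough illustration of how the ten columns above might be consumed, the following Python sketch reads a homology group table, drops rows whose 'group' is '-' (the NA value for rogue points), and collects the remaining pseudo-landmarks by group. The comma delimiter, the presence of a header row, the sample values, and the helper name load_homology_groups are all assumptions made for this example; the actual files may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a homology group file. The comma delimiter, the
# header row, and every value below are assumptions for illustration only.
SAMPLE = """group,plmname,filename,x,y,SS_x,SS_y,TS_x,TS_y,cc_ratio
1,plm1,frame_001.jpg,120,340,118,345,123,344,35.2
2,plm2,frame_001.jpg,200,150,197,156,204,155,210.7
-,plm3,frame_001.jpg,310,90,308,94,313,95,128.0
1,plm1,frame_002.jpg,121,330,119,336,124,334,33.9
"""

# Columns that hold pixel coordinates or the convexity-concavity ratio.
NUMERIC = ("x", "y", "SS_x", "SS_y", "TS_x", "TS_y", "cc_ratio")


def load_homology_groups(handle):
    """Return {group: [rows]} and skip rows whose group is '-' (NA)."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        if row["group"].strip() == "-":
            continue  # rogue point, uninformative for morphometrics
        for key in NUMERIC:
            row[key] = float(row[key])
        groups[row["group"]].append(row)
    return dict(groups)


groups = load_homology_groups(io.StringIO(SAMPLE))
for name, rows in sorted(groups.items()):
    # Each group traces one putative landmark across image frames.
    print(name, [(r["filename"], r["x"], r["y"]) for r in rows])
```

To read a real file, the io.StringIO(SAMPLE) argument would be replaced with an opened file handle, adjusting the delimiter if the files turn out to be tab-separated.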
The ground-truthing height data provide the age, leaf number, and ligular heights of the plants surveyed under this pseudo-landmarking and homology grouping strategy.
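One simple way to compare pseudo-landmark-derived ligular heights against manual measures is to compute the mean absolute error and the Pearson correlation between paired values. The Python sketch below shows this under stated assumptions: it is not a port of 'acute_heights_QC.R', the function names are hypothetical, and the height values are invented purely to make the example runnable.

```python
import math


def mean_absolute_error(estimated, manual):
    """Average absolute difference between paired height measurements."""
    if len(estimated) != len(manual) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - m) for e, m in zip(estimated, manual)) / len(estimated)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented paired ligular heights (arbitrary units), for illustration only.
manual_heights = [10.0, 14.0, 19.0, 25.0]
estimated_heights = [11.0, 13.0, 20.0, 24.0]

mae = mean_absolute_error(estimated_heights, manual_heights)
r = pearson_r(estimated_heights, manual_heights)
print(f"MAE = {mae:.2f}, Pearson r = {r:.3f}")
```

The mean absolute error is reported in the units of the heights themselves, while the correlation indicates whether the automated estimates track the manual measures across plant ages.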
Three different image series directories for the genotypes and reps sampled for pseudo-landmarking are stored within this folder. These include:
 The raw image files used as input data for this workflow.
 The initial batch binary threshold masks, which were generated under a generalized pipeline in the PlantCV environment.
 The corrected masks, which situationally alleviated segmentation issues in how the binary mask captured the plant's shape, especially around rolling leaves where the mask could bifurcate. Additionally, corrections may have involved secondary image thresholding in cases where the generalized color thresholds were insufficient to properly capture the plant shape.
National Science Foundation, Award: 1339332