Measuring hidden phenotype: quantifying the shape of barley seeds using the Euler characteristic transform
Cite this dataset
Amézquita, Erik et al. (2023). Measuring hidden phenotype: quantifying the shape of barley seeds using the Euler characteristic transform [Dataset]. Dryad. https://doi.org/10.5061/dryad.rxwdbrv93
Shape plays a fundamental role in biology. Traditional phenotypic analysis methods measure some features but fail to measure the information embedded in shape comprehensively. To extract, compare and analyse this information in a robust and concise way, we turn to topological data analysis (TDA), specifically the Euler characteristic transform. TDA measures shape comprehensively using mathematical representations based on features from algebraic topology. To study its use, we compute both traditional and topological shape descriptors to quantify the morphology of 3121 barley seeds scanned with X-ray computed tomography (CT) technology at 127 μm resolution. The Euler characteristic transform measures shape by analysing topological features of an object at thresholds across a number of directional axes. A Kruskal–Wallis analysis of the information encoded by the topological signature reveals that the Euler characteristic transform successfully picks up the shape of the crease and bottom of the seeds. Moreover, while traditional shape descriptors can cluster the seeds based on their accession, topological shape descriptors can cluster them further based on their panicle. We then successfully train a support vector machine to classify 28 different accessions of barley based exclusively on the shape of their grains. We observe that combining traditional and topological descriptors classifies barley seeds better than using traditional descriptors alone. This improvement suggests that TDA is a powerful complement to traditional morphometrics for comprehensively describing a multitude of ‘hidden’ shape nuances that are otherwise not detected.
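To make the idea of "thresholds across a number of directional axes" concrete, here is a minimal sketch of an Euler characteristic curve and its concatenation over several directions. It is an illustration under stated assumptions, not the code shipped with this dataset: each occupied voxel is treated as a vertex, with edges, squares and cubes added wherever all of their corners are occupied, and the function names `euler_characteristic_curve` and `euler_characteristic_transform` are hypothetical.

```python
from itertools import combinations, product


def euler_characteristic_curve(voxels, direction, thresholds):
    """Euler characteristic (V - E + F - C) of the sublevel sets along one direction.

    Assumption: occupied voxels are vertices of a cubical complex; a cell is
    present when all of its corners are occupied, and it enters the filtration
    at the largest height (dot product with `direction`) among its corners.
    """
    occupied = set(map(tuple, voxels))
    # Every subset of the three axes spans one cell type:
    # () vertex, (i,) edge, (i, j) square, (0, 1, 2) cube.
    axis_subsets = [s for k in range(4) for s in combinations(range(3), k)]
    events = []  # (entry height, sign) with sign = (-1) ** cell dimension
    for v in occupied:
        for axes in axis_subsets:
            corners = []
            for bits in product((0, 1), repeat=len(axes)):
                corner = list(v)
                for axis, bit in zip(axes, bits):
                    corner[axis] += bit
                corners.append(tuple(corner))
            if all(c in occupied for c in corners):
                height = max(
                    sum(ci * di for ci, di in zip(c, direction)) for c in corners
                )
                events.append((height, (-1) ** len(axes)))
    return [sum(sign for h, sign in events if h <= t) for t in thresholds]


def euler_characteristic_transform(voxels, directions, thresholds):
    """Concatenate the Euler characteristic curves of all directions."""
    signature = []
    for direction in directions:
        signature.extend(euler_characteristic_curve(voxels, direction, thresholds))
    return signature


# A solid 2x2x2 block: 8 vertices, 12 edges, 6 squares, 1 cube.
block = list(product((0, 1), repeat=3))
print(euler_characteristic_curve(block, (1, 0, 0), [-1, 0, 1]))  # [0, 1, 1]
```

The resulting vector of integers is the kind of topological signature that can then be fed to a classifier. The choice of directions and thresholds here is arbitrary and only for illustration.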
We selected 28 barley accessions with diverse spike morphologies and geographical origins for our analysis (Harlan and Martini 1929, 1936, 1940). In November of 2016, seeds from each accession were stratified at 4 °C on wet paper towels for a week, and germinated on the bench at room temperature. Four-day-old seedlings were transferred into pots in triplicate and arranged in a completely randomized design in a greenhouse. Day length was extended throughout the experiment using artificial lighting (minimum 16 h light / 8 h dark). After the plants reached maturity and dried, a single spike was collected from each replicate for scanning at Michigan State University.
The scans were produced using the North Star Imaging X3000 system and the included efX software, with 720 projections per scan and 3 frames averaged per projection. The data were obtained in continuous mode. The X-ray source was set to a voltage of 75 kV, a current of 100 μA, and a focal spot size of 7.5 μm. The 3D reconstruction of the spikes was computed with the efX-CT software, obtaining a final voxel size of 127 μm. The intensity values for all raw reconstructions were standardized as a first step to guarantee that the air and the barley material had the same density values across all scans. Next, the air and debris were thresholded out, and the awns were digitally removed.
Finally, the seed coat of the caryopses was digitally removed, leaving only the embryo and endosperm due to their high water content. We did not have enough resolution in the raw scans to clearly distinguish the endosperm from the embryo. Hereafter, we will refer to these embryo-endosperm unions simply as seeds. Due to the large volume of data, we used an in-house SciPy-based Python script to automate the image processing pipeline for all panicles and grains.
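The standardization and thresholding steps described above can be sketched roughly as follows. This is a hedged illustration, not the in-house script: the linear rescaling, the target intensity values, the size cutoff for debris, and the function names `standardize` and `remove_air_and_debris` are all assumptions made for the example. Only `numpy` and `scipy.ndimage.label` are used.

```python
import numpy as np
from scipy import ndimage


def standardize(volume, air_level, material_level,
                target_air=0.0, target_material=200.0):
    """Linearly rescale intensities so that air and plant material land on the
    same values in every scan (the target values are hypothetical)."""
    scale = (target_material - target_air) / (material_level - air_level)
    rescaled = (volume.astype(float) - air_level) * scale + target_air
    return np.clip(rescaled, 0, 255).astype(np.uint8)


def remove_air_and_debris(volume, air_threshold, min_voxels):
    """Zero out air (voxels at or below the threshold) and small connected
    components, assumed here to be debris."""
    mask = volume > air_threshold
    # Default structuring element: face-connected neighbours (6 in 3D).
    labels, _ = ndimage.label(mask)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_voxels
    keep[0] = False  # label 0 is the background (air)
    return np.where(keep[labels], volume, 0).astype(volume.dtype)
```

In practice the air and material levels would have to be estimated per scan, for instance from the two main peaks of the intensity histogram. That estimation step is not shown here.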
X-ray CT scan images are provided for 774 individual barley panicles and their corresponding 37 881 clean, individual seeds. All the scans are provided as single 3D 8-bit TIFF files. Three raw X-ray CT scans containing 4 barley panicles each are also included, along with the code developed to process these scans and segment out the panicles and seeds. The code is written as commented Jupyter notebooks, using both Python and R.
Please read the README files included with the data for more details.
National Institute of Food and Agriculture
Michigan State University
National Science Foundation, Award: CCF-1907591
National Science Foundation, Award: CCF-2106578
National Science Foundation, Award: 1711807: Plant Genome Postdoctoral Fellowship
National Science Foundation, Award: IOS-2046256: Plant Genome Research Program