
PyTorch geometric datasets for morphVQ models

Cite this dataset

Thomas, Oshane et al. (2022). PyTorch geometric datasets for morphVQ models [Dataset]. Dryad.


The methods of geometric morphometrics are commonly used to quantify morphology in a broad range of biological sciences. The application of these methods to large datasets is constrained by manual landmark placement, which limits the number of landmarks and introduces observer bias. To move the field forward, we need to automate morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias. Here, we present Morphological Variation Quantifier (morphVQ), a shape analysis pipeline for quantifying, analyzing, and exploring shape variation in the functional domain. morphVQ uses descriptor learning to estimate the functional correspondence between whole triangular meshes in lieu of landmark configurations. With functional maps between pairs of specimens in a dataset, we can analyze and explore shape variation. morphVQ uses Consistent ZoomOut refinement to improve these functional maps and produce a new representation of shape variation: area-based and conformal (angular) latent shape space differences (LSSDs). We compare this new representation of shape variation to shape variables obtained via manual digitization and auto3DGM, an existing approach to automated morphological phenotyping. We find that LSSDs compare favorably to modern 3DGM and auto3DGM while being more computationally efficient. By characterizing whole surfaces, our method incorporates more morphological detail in shape analysis. We can classify known biological groupings, such as genus affiliation, with comparable accuracy. The shape spaces produced by our method are similar to those produced by modern 3DGM and auto3DGM, and distinctiveness functions derived from LSSDs show us how shape variation differs between groups. morphVQ can capture shape in an automated fashion while avoiding the limitations of manually digitized landmarks and thus represents a novel and computationally efficient addition to the geometric morphometrics toolkit.


The main dataset consists of 102 triangular meshes from laser surface scans of hominoid cuboid bones. These cuboids were from wild-collected individuals housed in the American Museum of Natural History, the National Museum of Natural History, the Harvard Museum of Comparative Biology, and the Field Museum. Hylobates, Pongo, Gorilla, Pan, and Homo are all well represented.

Each triangular mesh is denoised, remeshed, and cleaned using Geomagic Studio Wrap software. The resulting meshes vary in resolution from 2,000 to 390,000 vertices. Each mesh is then upsampled or decimated to a uniform 12,000 vertices using the recursive subdivision and quadric decimation algorithms implemented in the Python bindings for VTK.
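The arithmetic behind this resampling step can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' script: it assumes each mesh is a closed, genus-0 triangle surface (so Euler's formula gives E = 3V - 6 and F = 2V - 4), that each subdivision pass splits every triangle into four, and that a low-resolution mesh is subdivided past the target and then decimated back down. The function names are hypothetical.

```python
def face_count(n_vertices: int) -> int:
    """Triangles in a closed, genus-0 triangle mesh: F = 2V - 4 (Euler)."""
    return 2 * n_vertices - 4


def plan_resampling(n_vertices: int, target_vertices: int = 12_000):
    """Return (subdivision_passes, target_reduction) for one mesh.

    Assumption: one 1-to-4 subdivision pass adds a vertex on every edge,
    so V' = V + E = 4V - 6. target_reduction is the fraction of triangles
    a decimation step would then have to remove to land on the target.
    """
    if n_vertices < 4:
        raise ValueError("a closed triangle mesh needs at least 4 vertices")
    passes = 0
    current = n_vertices
    # Subdivide until the mesh is at least as dense as the target.
    while current < target_vertices:
        current = 4 * current - 6
        passes += 1
    # Decimate the surplus triangles back down to the target density.
    reduction = 1.0 - face_count(target_vertices) / face_count(current)
    return passes, reduction


if __name__ == "__main__":
    for v in (2_000, 12_000, 390_000):
        passes, reduction = plan_resampling(v)
        print(f"{v} vertices: {passes} pass(es), remove {reduction:.1%} of triangles")
```

Under these assumptions, the coarsest mesh (2,000 vertices) needs two subdivision passes followed by removal of roughly 62% of its triangles, while the densest (390,000 vertices) needs no subdivision and roughly 97% decimation. A fraction of this kind is what a quadric decimation filter such as VTK's `vtkQuadricDecimation` accepts through `SetTargetReduction`.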

The first of the two smaller datasets comprises 26 hominoid medial cuneiform meshes isolated from laser surface scans obtained from the same museum collections listed above. The second comprises 33 mouse humerus meshes derived from micro-CT data (34.5 μm resolution, acquired with a Skyscan 1172). Both datasets were processed identically to the 102 hominoid cuboid meshes introduced above.

Usage notes

These datasets are custom PyTorch Geometric datasets that contain the raw .off polygon meshes as well as the preprocessed .pt files needed for training morphVQ models. morphVQ can be found at
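For readers who want to inspect a raw mesh without installing the full pipeline, the following is a minimal sketch of a reader for the plain-text OFF format. It assumes the standard layout (an `OFF` header line, a line of vertex/face/edge counts, then vertex and face records) and is not the loader used by morphVQ. The function name `parse_off` and the tetrahedron sample are hypothetical.

```python
def parse_off(text: str):
    """Parse OFF-format text into (vertices, faces).

    vertices: list of (x, y, z) float tuples
    faces:    list of tuples of vertex indices
    """
    # Drop '#' comments and blank lines before parsing.
    lines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)
    if len(lines) < 2 or lines[0] != "OFF":
        raise ValueError("not an OFF file: missing 'OFF' header")

    # Second line holds the vertex, face, and edge counts.
    n_vertices, n_faces = (int(tok) for tok in lines[1].split()[:2])

    start = 2
    vertices = [
        tuple(float(tok) for tok in line.split()[:3])
        for line in lines[start:start + n_vertices]
    ]

    faces = []
    for line in lines[start + n_vertices:start + n_vertices + n_faces]:
        tokens = line.split()
        k = int(tokens[0])  # number of vertices in this face
        faces.append(tuple(int(tok) for tok in tokens[1:1 + k]))

    if len(vertices) != n_vertices or len(faces) != n_faces:
        raise ValueError("OFF file is truncated")
    return vertices, faces


# Hypothetical sample: a tetrahedron with four triangular faces.
TETRAHEDRON = """OFF
4 4 6
0 0 0
1 0 0
0 1 0
0 0 1
3 0 2 1
3 0 1 3
3 0 3 2
3 1 2 3
"""

if __name__ == "__main__":
    verts, tris = parse_off(TETRAHEDRON)
    print(len(verts), "vertices,", len(tris), "faces")
```

To read a file from the dataset, pass its contents to the same function, for example `parse_off(open(path).read())`, where `path` points at one of the raw .off files.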


National Science Foundation, Award: BCS-0925734