Codes and data for: Clustering optimisation method for highly connected biological data
Cite this dataset
Tjörnhammar, Richard (2022). Codes and data for: Clustering optimisation method for highly connected biological data [Dataset]. Dryad. https://doi.org/10.5061/dryad.q2bvq83p7
Currently, data-driven discovery in the biological sciences relies on finding segmentation strategies for multivariate data that produce sensible descriptions of the data. Clustering is only one of several approaches, and it sometimes falls short because it is difficult to choose reasonable cutoffs or the number of clusters to form, or because the approach fails to preserve topological properties of the original system in its clustered form. In this work, we show how a simple metric for evaluating connectivity clustering leads to an optimised segmentation of biological data.
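The cutoff problem mentioned above can be illustrated with a minimal sketch of connectivity clustering, in which two points share a cluster whenever a chain of neighbours closer than a distance cutoff links them. This is not the code deposited with this dataset and it does not implement the evaluation metric of the work; the function name `connectivity_clusters`, the toy points and the cutoff values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def connectivity_clusters(points, cutoff):
    """Return one cluster label per point.

    Two points get the same label if they are linked by a chain of
    points whose consecutive distances are all <= cutoff.
    """
    parent = list(range(len(points)))  # union-find forest, one tree per cluster

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cutoff:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two clusters

    roots = [find(i) for i in range(len(points))]
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return [relabel[root] for root in roots]


# Hypothetical toy data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (9.0, 0.0)]

# The number of clusters depends entirely on the chosen cutoff.
counts = {c: len(set(connectivity_clusters(points, c))) for c in (0.5, 1.5, 7.0)}
print(counts)
```

On this toy data the same points form six, three or one cluster depending on the cutoff, which is why a criterion for choosing the cutoff is needed.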
The novelty of the work lies in a simple optimisation method for clustering crowded data. The resulting clustering approach relies only on metrics derived from the inherent properties of the clustering itself. The method is easy to implement and shows how to obtain an optimised clustering.
We discuss how the clustering optimisation strategy corresponds to the viable information content yielded by the final segmentation. We further elaborate on how the clustering results, in the optimal solution, correspond to prior knowledge of three different data sets.
This deposit contains the dataset and the code required to conduct the analysis described above.
Knut and Alice Wallenberg Foundation, Award: WASPDDLS21:092