Data from: Comparing regression-based approaches for identifying microbial functional groups
Data files
May 06, 2025 version, files total 11.82 KB
- data_figure2.mat (2.44 KB)
- data_figure3.mat (3.06 KB)
- data_figureS2.mat (1.19 KB)
- data_figureS3.mat (2.82 KB)
- README.md (2.31 KB)
Abstract
Microbial communities are composed of functionally integrated taxa, and identifying which taxa contribute to a given ecosystem function is essential for predicting community behaviors. This study compares the effectiveness of a previously proposed method for identifying "functional taxa," Ensemble Quotient Optimization (EQO), to a potentially simpler approach based on the Least Absolute Shrinkage and Selection Operator (LASSO). In contrast to LASSO, EQO uses a binary prior on coefficients, assuming uniform contribution strength across taxa. Using synthetic datasets with increasingly realistic structure, we demonstrate that EQO's strong prior enables it to perform better in the low-data regime. However, LASSO's flexibility and efficiency can make it preferable as data complexity increases. Our results detail the conditions that favor EQO and emphasize LASSO as a viable alternative.
https://doi.org/10.5061/dryad.n8pk0p366
Description of the data and file structure
The data were generated by the RUN_ME.m script in the accompanying code and are used to plot the figures shown in the article.
Files and variables
File: data_figure2.mat
Description: Data used to generate figure 2.
Variables
- data: A data structure with fields:
  - LASSOMapA: A structure with field 'accuracies' -- a matrix representing the accuracy heatmap for LASSO under binary ground truth
  - EQOMapA: A structure with field 'accuracies' -- a matrix representing the accuracy heatmap for EQO under binary ground truth
  - LASSOMapB: A structure with field 'accuracies' -- a matrix representing the accuracy heatmap for LASSO under non-binary ground truth
  - EQOMapB: A structure with field 'accuracies' -- a matrix representing the accuracy heatmap for EQO under non-binary ground truth
  - accuracyGapChange: A structure with field 'accuracies' -- a matrix representing the heatmap of the accuracy gap between EQO and LASSO under non-binary ground truth
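As a minimal sketch of how these fields can be viewed, assuming data_figure2.mat sits in the current MATLAB working directory, the snippet below loads the file and draws one of the heatmaps. The plot title is an illustrative choice, and the axes are left unlabeled because this README does not state what the matrix rows and columns represent.

```matlab
% Assumption: data_figure2.mat is in the current working directory.
S = load('data_figure2.mat');   % load returns a struct; the variable 'data' becomes S.data
data = S.data;

% Show the LASSO accuracy heatmap for the binary ground-truth case.
% Row and column meanings are not documented here, so no axis labels are set.
figure;
imagesc(data.LASSOMapA.accuracies);
colorbar;
title('LASSO accuracy, binary ground truth');   % illustrative title only
```

Swapping LASSOMapA for EQOMapA, LASSOMapB, EQOMapB, or accuracyGapChange displays the other matrices in the same way.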
File: data_figure3.mat
Description: Data used to generate figure 3.
Variables
- data: A data structure with fields:
  - varyFracReal: Accuracies of LASSO and EQO under a varying fraction of real taxa
  - varyPhylo: Accuracies of LASSO and EQO under varying strength of phylo-dependency
File: data_figureS2.mat
Description: Data used to generate figure S2.
Variables
- data: A data structure with fields:
  - LASSO: LASSO accuracies under different noise levels
  - EQO: EQO accuracies under different noise levels
File: data_figureS3.mat
Description: Data used to generate figure S3.
Variables
- data: A data structure with fields:
  - LASSO: LASSO accuracies and standard deviations under different cutoff values
  - EQO: EQO accuracies and standard deviations under different cutoff values
Code/software
To view the data, import the .mat files into MATLAB. To regenerate the figures, download the scripts (code files hosted on Zenodo), navigate to the directory containing RUN_ME.m, and run RUN_ME.m in MATLAB. The script produces the resulting figures (also uploaded to Zenodo as supplemental files) either from the precomputed data or from scratch.
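For a quick look at the contents without running RUN_ME.m, something like the following could be used. It is only a sketch, assuming all four .mat files are in the current working directory; it lists the top-level fields of each `data` structure rather than assuming any nested field names that this README does not document.

```matlab
% Assumption: the four .mat files are in the current working directory.
files = {'data_figure2.mat', 'data_figure3.mat', ...
         'data_figureS2.mat', 'data_figureS3.mat'};

for k = 1:numel(files)
    S = load(files{k});          % each file stores a single variable named 'data'
    fprintf('%s\n', files{k});
    disp(fieldnames(S.data));    % print the top-level field names of 'data'
end
```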
