From microbiome to sperm motility traits: An inside out perspective
Data files
Feb 02, 2026 version files 12.14 MB
- 1_phyloseq.R (1.59 KB)
- 2_alphadiv_group2.R (4.59 KB)
- 3_PCoA-Permanova.R (1.55 KB)
- 4_heatmap.R (3.14 KB)
- 5_corrNetwork.qmd (2.93 KB)
- 6_modularity_estimation.qmd (590 B)
- ASV_group2_filtered.txt (6.22 MB)
- bundle_edge_165.txt (6.92 KB)
- chao1_R.csv (678 B)
- Chao1.tsv (360 B)
- faith_R.csv (717 B)
- Genus_ab_165.txt (60.54 KB)
- group2_meta.txt (573 B)
- group2_taxonomy.txt (5.48 MB)
- guppymicro_skinbundles_electronic_supplementary.R (224.52 KB)
- MOTREP.xlsx (13.15 KB)
- primers (59 B)
- qiime_pipeline (1.16 KB)
- README.md (14.32 KB)
- shannon_R.csv (1.07 KB)
- shannon.tsv (737 B)
- skin_bundles_comp_R.xlsx (65.58 KB)
- skin_edge_165.txt (13.78 KB)
- w-unifrac.tsv (17.93 KB)
Abstract
Growing interest in the relationship between microbiome composition and host biology has revealed the many ways host-associated microbes influence physiology, ecology, and evolution. However, microbial communities associated with reproductive organs, and their roles in reproduction, remain poorly understood. Here, we characterized the skin- and ejaculate-associated microbiomes in an internally fertilizing fish and tested whether microbial diversity and specific bacterial taxa correlate with sperm motility traits key for reproductive success. We used the guppy (Poecilia reticulata), a well-established model in ecology and evolutionary biology with well-characterised reproductive physiology. In guppies, sperm velocity is a validated predictor of male reproductive performance, making them a powerful system for exploring microbiome-fertility interactions. Our analyses reveal a correlation between skin microbiome diversity and sperm performance. Notably, increased skin microbiome total richness is associated with reduced sperm velocity, whereas no significant associations were detected for ejaculate-associated microbiomes. We also identified bacterial taxa across both tissues that were positively or negatively linked with sperm performance. These findings suggest that, while the ejaculate-associated microbiome may directly influence sperm traits, the skin microbiome could serve as a proxy for reproductive potential by reflecting systemic physiological and immunological states associated with fertility.
List of necessary files:
Files
1. group2_meta.txt
Description:
Metadata file containing information about the samples used in the study, including their unique identifiers, sex, and sampled tissue type.
Glossary:
| Variable | Description |
|---|---|
| sample-id | Unique identifier for each sample |
| SEX | Sex of the individual (M = male) |
| TISSUE_ORGAN | Sampled tissue type (e.g., skin, sperm_bundles) |
2. skin_edge_165.txt
Description:
Microbial co-occurrence network edges for samples derived from skin tissue. Each row represents a pairwise association between two microbial taxa, with a weight indicating the strength and direction of their interaction.
Glossary:
| Variable | Description |
|---|---|
| id | Unique row identifier |
| x | First microbial taxon in the association |
| y | Second microbial taxon in the association |
| weight | Association strength between taxa x and y (positive = co-occurrence; negative = mutual exclusion) |
3. bundle_edge_165.txt
Description:
Microbial co-occurrence network edges for samples derived from sperm bundle tissue. Structure is identical to skin_edge_165.txt but represents a separate network for comparative tissue-specific analyses.
Glossary:
| Variable | Description |
|---|---|
| id | Unique row identifier |
| x | First microbial taxon in the association |
| y | Second microbial taxon in the association |
| weight | Association strength between taxa x and y (positive = co-occurrence; negative = mutual exclusion) |
4. Genus_ab_165.txt
Description:
This file contains genus-level microbial relative abundance data across all samples included in the study. Each row corresponds to a bacterial genus (or ASV label), and each column to a sample (either from skin or sperm bundles).
Glossary:
| Variable | Description |
|---|---|
| Row names | Genus or taxonomic labels of microbial taxa (e.g., Bacteroides, Lactobacillus) |
| Column headers | Sample IDs (e.g., SS_2, B_1) corresponding to individual samples from skin (SS_) or sperm bundles (B_) |
| Cell values | Relative abundance of each microbial genus in a given sample (numeric values) |
5. group2_taxonomy.txt
Description:
This file provides the taxonomic classification of microbial sequence variants identified in the dataset. Each row corresponds to a unique taxonomic feature ID, and columns provide taxonomic resolution from domain to genus.
Glossary:
| Variable | Description |
|---|---|
| Row ID | Unique taxonomic feature identifier |
| Domain | Highest taxonomic level, typically "Bacteria" |
| Phylum | Second-highest taxonomic level |
| Class | Taxonomic class of the organism |
| Order | Taxonomic order of the organism |
| Family | Taxonomic family of the organism |
| Genus | Genus name, or "Unclassified Genus" if not assigned |
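As an illustration of how rank-prefixed labels could be turned into the columns above, here is a minimal Python sketch. It is not the authors' R code: the function name `clean_taxonomy`, the semicolon-delimited input string, and the example lineage are all hypothetical, and only the "Unclassified" fill for Family and Genus follows the description of `1_phyloseq.R`.

```python
import re

RANKS = ["Domain", "Phylum", "Class", "Order", "Family", "Genus"]
FILL_RANKS = {"Family", "Genus"}  # ranks filled with "Unclassified ..." when empty


def clean_taxonomy(taxon_string):
    """Split a semicolon-delimited lineage (assumed input format) into rank labels."""
    parts = [p.strip() for p in taxon_string.split(";")]
    labels = {}
    for i, rank in enumerate(RANKS):
        raw = parts[i] if i < len(parts) else ""
        # Drop a leading single-letter rank prefix such as "d__" or "g__".
        name = re.sub(r"^[a-z]__", "", raw)
        if not name and rank in FILL_RANKS:
            name = f"Unclassified {rank}"
        labels[rank] = name
    return labels


# Hypothetical lineage that stops at class level.
example = clean_taxonomy("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria")
print(example)
```

Ranks other than Family and Genus are left empty when missing; that choice is an assumption of this sketch.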
6. ASV_group2_filtered.txt
Description:
This file contains the filtered ASV (Amplicon Sequence Variant) count data across all samples. Each row corresponds to a unique ASV identifier (hash string), and each column to a sample ID. Values indicate the observed counts of each ASV in each sample.
Glossary:
| Variable | Description |
|---|---|
| ID | Unique ASV identifier (matching those in group2_taxonomy.txt) |
| Columns | Sample IDs (e.g., B_1, SS_2), referring to sperm bundle (B_) and skin (SS_) samples |
| Cell values | Raw or normalized count of the corresponding ASV in the given sample |
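For readers who want to move from a count table of this shape to per-sample relative abundances, the following Python sketch shows one way to do it. It assumes raw counts, uses a hypothetical nested-dictionary layout and made-up ASV labels, and does not claim to reproduce how the relative abundances in Genus_ab_165.txt were derived.

```python
def to_relative_abundance(counts):
    """Convert {asv_id: {sample_id: count}} into per-sample proportions."""
    samples = sorted({s for row in counts.values() for s in row})
    # Total reads per sample (column sums).
    totals = {s: sum(row.get(s, 0) for row in counts.values()) for s in samples}
    rel = {}
    for asv, row in counts.items():
        rel[asv] = {
            s: (row.get(s, 0) / totals[s] if totals[s] else 0.0) for s in samples
        }
    return rel


# Toy table: two hypothetical ASVs, one sperm bundle and one skin sample.
toy_counts = {
    "asv_a": {"B_1": 30, "SS_2": 10},
    "asv_b": {"B_1": 10, "SS_2": 10},
}
rel = to_relative_abundance(toy_counts)
print(rel["asv_a"])
```

Each sample's proportions sum to 1, so samples with different sequencing depths can be compared on the same scale.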
7. MOTREP.xlsx
Description:
This file contains sperm motility metrics collected across three technical replicates (A, B, or C) for each individual sperm sample. These measurements were used to assess sperm performance traits at fine resolution.
Glossary of Variables:
| Variable | Description |
|---|---|
| ID | Individual unique identifier |
| VAP | Sperm average path velocity (µm/s) |
| VSL | Sperm straight line velocity (µm/s) |
| VCL | Sperm curvilinear velocity (µm/s) |
| ALH | Amplitude of lateral head displacement (µm) |
| BCF | Sperm beat-cross frequency (Hz) |
| STR | Sperm straightness (ratio of VSL to VAP) |
| LIN | Sperm linearity (%) |
| MOT | Percentage of motile sperm cells (%) |
| REPLICATE | Technical replicate identifier for each sperm measurement (A, B, or C) |
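Because each individual has up to three technical replicates, the per-replicate rows are typically collapsed before being related to other data. The Python sketch below averages replicates per individual; the use of an arithmetic mean, the function name `average_replicates`, and the toy IDs and values are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def average_replicates(rows, traits):
    """Average the given trait columns over technical replicates of each ID."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["ID"]].append(row)
    return {
        ind: {t: mean(r[t] for r in reps) for t in traits}
        for ind, reps in grouped.items()
    }


# Toy rows using the column names from the glossary; the numbers are invented.
toy_rows = [
    {"ID": "m1", "REPLICATE": "A", "VAP": 100.0, "VSL": 80.0},
    {"ID": "m1", "REPLICATE": "B", "VAP": 110.0, "VSL": 90.0},
    {"ID": "m1", "REPLICATE": "C", "VAP": 120.0, "VSL": 100.0},
    {"ID": "m2", "REPLICATE": "A", "VAP": 90.0, "VSL": 45.0},
]
averaged = average_replicates(toy_rows, ["VAP", "VSL"])
print(averaged)
```

Individuals with fewer than three replicates are averaged over whatever rows are present.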
8. skin_bundles_comp_R.xlsx
Description:
This file includes comparative data between skin and sperm bundle samples. It contains sperm motility metrics averaged by individual, microbial diversity indices, and relative abundances of bacterial taxa.
Glossary of Variables:
| Variable | Description |
|---|---|
| ID | Individual unique identifier |
| VAP | Sperm average path velocity (µm/s) |
| VSL | Sperm straight line velocity (µm/s) |
| VCL | Sperm curvilinear velocity (µm/s) |
| ALH | Amplitude of lateral head displacement (µm) |
| BCF | Sperm beat-cross frequency (Hz) |
| STR | Sperm straightness (ratio of VSL to VAP) |
| LIN | Sperm linearity (%) |
| MOT | Percentage of motile sperm cells (%) |
| chaoB | Chao1 diversity index for sperm bundles |
| chaoS | Chao1 diversity index for skin |
| evennessB | Evenness index for sperm bundles |
| evennessS | Evenness index for skin |
| entropyB | Shannon entropy (diversity) for sperm bundles |
| entropyS | Shannon entropy (diversity) for skin |
Bacterial taxa abundances are reported as relative abundances for each sample. To distinguish the tissue origin of each taxon, variable (column) names are prefixed as follows:
- B_: Taxa detected in sperm bundle samples
- S_: Taxa detected in skin samples
Each column named B_<TaxonName> or S_<TaxonName> refers to the abundance of a specific bacterial taxon in that tissue type. For example:
- B_Acinetobacter refers to the abundance of Acinetobacter in sperm bundles
- S_Massilia refers to the abundance of Massilia in skin samples
A complete list of bacterial taxa (i.e., column names) can be found in the dataset file itself as column headers.
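The prefix convention makes it straightforward to separate taxon columns by tissue programmatically. A minimal Python sketch follows; the function name `split_by_tissue` and the short column list are illustrative, not the full header of the file.

```python
def split_by_tissue(column_names):
    """Group taxon columns by tissue based on the B_ / S_ prefix."""
    prefixes = {"B_": "sperm_bundles", "S_": "skin"}
    out = {"sperm_bundles": [], "skin": []}
    for col in column_names:
        for prefix, tissue in prefixes.items():
            if col.startswith(prefix):
                # Keep only the taxon name, without the tissue prefix.
                out[tissue].append(col[len(prefix):])
                break
    return out


# Illustrative subset of column headers named in this README.
cols = ["ID", "VAP", "BCF", "STR", "chaoB", "chaoS", "B_Acinetobacter", "S_Massilia"]
taxa = split_by_tissue(cols)
print(taxa)
```

Matching on the full prefix including the underscore keeps motility and diversity columns such as BCF, STR, and chaoB out of the taxon lists.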
9. chao1_R.csv
Description:
This file contains Chao1 species richness estimates for each microbial community sample, along with metadata for sex and tissue type. It is used to assess alpha diversity (microbial richness) across individuals and tissue types (skin vs. sperm bundles).
Glossary of Variables:
| Variable | Description |
|---|---|
| id | Unique sample identifier (e.g., B_1, SS_2) |
| chao1 | Chao1 richness estimate (number of species/ASVs) |
| SEX | Sex of the host individual (M = male) |
| TISSUE_ORGAN | Tissue where sample was collected (skin or sperm_bundles) |
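To make the meaning of the chao1 column concrete, the sketch below computes the bias-corrected Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of ASVs observed exactly once and exactly twice. Which Chao1 variant produced the values in this file is not stated here, so the choice of the bias-corrected form is an assumption, and the count vector is invented.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness for one sample's ASV counts."""
    s_obs = sum(1 for c in counts if c > 0)   # observed ASVs
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy sample: 6 observed ASVs, 3 singletons, 1 doubleton.
toy_sample = [10, 5, 1, 1, 1, 2, 0]
print(chao1(toy_sample))  # 6 + 3*2 / (2*2) = 7.5
```

With no singletons the estimate equals the observed richness, which is why rarefied or denoised data often show Chao1 close to the observed ASV count.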
10. Chao1.tsv
Description:
Same content and structure as chao1_R.csv, but stored in a tab-separated format for compatibility with bioinformatics pipelines.
Glossary:
Same as for chao1_R.csv.
11. shannon_R.csv
Description:
This file contains Shannon entropy values for each microbial community sample, used to assess alpha diversity in terms of both richness and evenness. It also includes metadata such as sample ID, tissue type, and host sex.
Glossary of Variables:
| Variable | Description |
|---|---|
| id | Unique sample identifier (e.g., B_1, SS_10) |
| shannon_entropy | Shannon diversity index (higher = more diverse community) |
| SEX | Sex of the host individual (M = male) |
| TISSUE_ORGAN | Tissue source of the sample (sperm_bundles or skin) |
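As a reminder of what the shannon_entropy column measures, here is a short Python sketch of the Shannon index, H = -Σ p_i log(p_i), computed from one sample's counts. The logarithm base is an assumption (base 2 is used as the default below); the base used for the values in this file is not stated here.

```python
import math


def shannon_entropy(counts, base=2.0):
    """Shannon entropy of one sample; zero counts are ignored."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) / math.log(base) for p in props)


# Four equally abundant ASVs: entropy is log2(4) = 2 bits.
even_sample = [1, 1, 1, 1]
# One dominant ASV lowers the entropy even though richness is the same.
skewed_sample = [97, 1, 1, 1]
print(shannon_entropy(even_sample), shannon_entropy(skewed_sample))
```

The two toy samples have identical richness but different entropy, which is the sense in which the index combines richness and evenness.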
12. shannon.tsv
Description:
Same content and structure as shannon_R.csv, but stored in a tab-separated format. This version is suitable for compatibility with bioinformatics tools and pipelines that require TSV input.
Glossary:
Same as for shannon_R.csv.
13. faith_R.csv
Description:
This file contains Faith's Phylogenetic Diversity (Faith PD) values for microbial communities in each sample. Faith PD is an alpha diversity metric that incorporates phylogenetic relationships among taxa, offering a richness estimate based on branch lengths in a phylogenetic tree. This file is comma-separated and formatted for R-based analyses or spreadsheets.
Glossary of Variables:
| Variable | Description |
|---|---|
| id | Unique sample identifier (e.g., B_1, SS_10) |
| faith_pd | Faith’s Phylogenetic Diversity index (higher = more phylogenetic richness) |
14. w-unifrac.tsv
Description:
This file contains weighted UniFrac distance values between all pairs of samples. Weighted UniFrac is a beta diversity metric that measures differences in microbial community composition between samples, incorporating both taxonomic abundance and phylogenetic distances.
Glossary of Variables:
| Variable | Description |
|---|---|
| Rows/Columns | Sample IDs (e.g., B_1, SS_2) representing pairwise comparisons |
| Cell values | Weighted UniFrac distance between the corresponding sample pair (0–1 range; 0 = identical communities, 1 = completely different) |
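A square distance matrix like this one can be summarised by comparing within-tissue and between-tissue distances. The Python sketch below does only that descriptive step; it is not PERMANOVA and does not reproduce the adonis2 analysis. The nested-dictionary layout, the function name `within_between`, and the toy distances are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def within_between(dist, groups):
    """Mean pairwise distance within groups and between groups."""
    within, between = [], []
    for a, b in combinations(sorted(groups), 2):
        target = within if groups[a] == groups[b] else between
        target.append(dist[a][b])
    return mean(within), mean(between)


# Toy symmetric matrix with invented values for two samples per tissue.
toy_dist = {
    "B_1": {"B_1": 0.0, "B_2": 0.2, "SS_1": 0.8, "SS_2": 0.6},
    "B_2": {"B_1": 0.2, "B_2": 0.0, "SS_1": 0.7, "SS_2": 0.9},
    "SS_1": {"B_1": 0.8, "B_2": 0.7, "SS_1": 0.0, "SS_2": 0.4},
    "SS_2": {"B_1": 0.6, "B_2": 0.9, "SS_1": 0.4, "SS_2": 0.0},
}
toy_groups = {"B_1": "sperm_bundles", "B_2": "sperm_bundles",
              "SS_1": "skin", "SS_2": "skin"}
mean_within, mean_between = within_between(toy_dist, toy_groups)
print(mean_within, mean_between)
```

A between-group mean well above the within-group mean is the pattern a permutation test would then evaluate formally.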
15. primers/
Description:
This folder contains the primer sequences used for 16S rRNA gene amplicon sequencing.
These primer sequences correspond to those reported in the Methods section of the manuscript and were used to amplify microbial DNA from guppy skin and sperm samples.
Contents:
primers/*: FASTA or text files listing the forward and reverse primers used in PCR amplification.
16. qiime_pipeline/
Description:
This folder contains the QIIME2 pipeline including the rarefaction step.
R scripts to run all the analyses using the data files described above:
- guppymicro_skinbundles_electronic_supplementary.R: Reshapes genus-level abundance data into long format for plotting or tables.
- 1_phyloseq.R: Creates a phyloseq object from ASV counts, taxonomy, and sample metadata. It also cleans taxonomy labels (removes d__, p__, etc.) and fills missing Family/Genus with “Unclassified”.
- 2_alphadiv_group2.R: Analyzes alpha diversity (Chao1, Shannon, Faith’s PD, Observed ASVs). Generates boxplots comparing diversity between tissue/organ groups (e.g. skin vs sperm bundles).
- 3_PCoA-Permanova.R: Uses beta diversity (weighted UniFrac) to run PERMANOVA (adonis2) testing group differences and to prepare data for PCoA ordination.
- 4_heatmap.R: Filters taxa by prevalence and abundance, aggregates to top genera, and builds a heatmap.
- 5_corrNetwork.qmd: Builds a correlation network of bacterial genera: filters genera, transposes the abundance table, and prepares data for correlation and network visualization.
- 6_modularity_estimation.qmd: Takes network edge lists (from Cytoscape), builds graphs with igraph, and estimates network modularity.
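To clarify what a modularity estimate from an edge list involves, the following Python sketch computes Newman's modularity, Q = Σ_c [L_c / m - (d_c / 2m)²], for a given assignment of taxa to modules, where L_c is the number of edges inside module c, d_c the summed degree of its nodes, and m the total number of edges. It is not the igraph workflow used in the scripts: edges are treated as unweighted (the weight column is ignored), the module assignment is supplied by hand rather than detected, and the taxon labels are placeholders.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity of an undirected, unweighted graph for a fixed partition."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both ends in the same module
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
        if membership[x] == membership[y]:
            internal[membership[x]] += 1
    degree_sum = defaultdict(int)  # summed degree per module
    for node, d in degree.items():
        degree_sum[membership[node]] += d
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy network: two triangles joined by one bridge edge (x, y pairs as in the edge files).
toy_edges = [
    ("taxon_a", "taxon_b"), ("taxon_b", "taxon_c"), ("taxon_a", "taxon_c"),
    ("taxon_d", "taxon_e"), ("taxon_e", "taxon_f"), ("taxon_d", "taxon_f"),
    ("taxon_c", "taxon_d"),
]
toy_modules = {"taxon_a": 1, "taxon_b": 1, "taxon_c": 1,
               "taxon_d": 2, "taxon_e": 2, "taxon_f": 2}
q = modularity(toy_edges, toy_modules)
print(q)
```

Putting every node into a single module gives Q = 0, so positive values indicate more within-module edges than expected from node degrees alone.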
