Data and code for: Relative testis size is associated with vagina length but not sperm storage traits in Galliformes
Data files
Oct 04, 2025 version files 71.93 KB
- Analysis.R (33.24 KB)
- README.md (13.32 KB)
- RepeatDF.csv (17.97 KB)
- SummaryBothFolds.csv (3.51 KB)
- SummarySpeciesAvgs.csv (3.90 KB)
Abstract
This dataset contains the data and code to replicate the analyses in “Relative testis size is associated with vagina length but not sperm storage traits in Galliformes.” The study examined relationships among sperm storage capacity, sperm storage tubule (SST) morphology, relative testis mass (a proxy for post-copulatory sexual selection), and sperm traits across 26 species of Galliformes. The dataset includes four files: RepeatDF.csv (data for repeatability calculations); SummaryBothFolds.csv (raw species and fold-level data); SummarySpeciesAvgs.csv (species-level averages across both folds); and Analysis.R (code to replicate analyses). Data files contain measurements from female reproductive tracts, male testis mass, and sperm samples.
Dataset DOI: 10.5061/dryad.z08kprrrf
Description of the data and file structure
We dissected and measured the reproductive tract of 26 species of Galliformes, extracting and imaging the utero-vaginal junction (UVJ) and sperm storage tubules (using fluorescence microscopy). We also collected and measured testes and sperm of 20 species, which, combined with data from Liao et al. (2019), resulted in testes and sperm length measurements from 22 species of Galliformes (see methods in paper for further details).
The following three datasets and R script contain the data and code required to replicate analyses investigating the relationships between female reproductive tract morphology, sperm competition intensity, and sperm traits, whilst accounting for phylogenetic and allometric relationships. The phylogeny is from Stein et al. (2015).
Files and variables
File: RepeatDF.csv
Description:
- This dataset contains measurement data for one randomly chosen image per sample (from the pool of images containing SSTs within each sample): utero-vaginal junction (UVJ) tissue area, tubule tissue area, sperm storage tubule length measurements, and morphological categorisations. These data were used to calculate measurement and categorisation repeatability between 2 observers.
- Repeatability was calculated for the measurement of tubule tissue area within each image, and for individual tubule length measurements chosen haphazardly by observer 1.
- To test whether haphazard choice of SSTs was a robust indication of true average SST length within the sample, observer 2 haphazardly chose 5 additional tubules to compare against the original subset.
- Repeatability was also calculated for the morphological categorisation of tubules, using the categorisation criteria provided in Assersohn et al., 2024.
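As a rough illustration of what inter-observer repeatability measures, the sketch below computes a one-way ANOVA intraclass correlation by hand. It is not the estimator used in Analysis.R (which relies on the rptR package); the function name `icc_oneway` and all measurement values are hypothetical.

```python
from statistics import mean

def icc_oneway(rows):
    """One-way ANOVA intraclass correlation for k repeated measures per subject."""
    n = len(rows)        # number of subjects (here: images)
    k = len(rows[0])     # number of measures per subject (here: observers)
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    # Between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(rows, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical tubule tissue areas (um^2): (observer 1, observer 2) for the same images
measurements = [(1200.0, 1180.0), (860.0, 905.0), (1530.0, 1499.0), (640.0, 655.0)]
icc = icc_oneway(measurements)
print(round(icc, 3))
```

Values close to 1 indicate that most of the variance lies between images rather than between observers.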
Abbreviations:
SST - Sperm Storage Tubule
UVJ - Utero-Vaginal Junction
NA values:
NA values in this file represent missing data values.
- NA values in the SST length measurement columns for Observer 1 represent cases where either there was no SST tissue in the sample, there was SST tissue but it was not measurable (e.g., SSTs overlapped, or were not clearly defined), or there were fewer than 5 measurable SSTs.
- NA values in the SST length measurement columns for Observer 2 represent the cases above, but additionally, NA values are present because only 1 individual per species was measured by Observer 2.
Variables
- tip.label: Latin name corresponding to the tip labels in the phylogeny
- SpeciesCommon: Species common name
- Sample: Unique sample ID given in the format SpeciesCodeSampleNumber_FoldNumber e.g. BF02_F1 is BlackFrancolin2, fold 1.
- Measurerer: Either observer 1 (primary observer) or observer 2 (second observer measurements for repeatability purposes).
- ImageNumber: Which image in the fold the data pertains to.
- EmptySpace_um2: The amount of space in the image not taken up by tissue. Calculated using particle analysis in ImageJ.
- ImageSize_um2: The total size of the image.
- TotalUVJTissue_um2: Image size minus empty space, to give the total amount of UVJ tissue within the image.
- TotalSSTTissue_um2: Total amount of tubule tissue within the image. Calculated using particle analysis.
- Observer1SST1Length: The first SST chosen by observer 1, and measured by both observer 1 and observer 2 (to test measurement repeatability)
- Observer1SST2Length: The second SST chosen by observer 1, and measured by both observer 1 and observer 2 (to test measurement repeatability)
- Observer1SST3Length: The third SST chosen by observer 1, and measured by both observer 1 and observer 2 (to test measurement repeatability)
- Observer1SST4Length: The fourth SST chosen by observer 1, and measured by both observer 1 and observer 2 (to test measurement repeatability)
- Observer1SST5Length: The fifth SST chosen by observer 1, and measured by both observer 1 and observer 2 (to test measurement repeatability)
- Observer1.AverageSSTLength: The average length of observer 1 SSTs 1:5 (to test SST length measurement repeatability)
- Observer2SST1Length: The first SST chosen by observer 2 haphazardly (to test whether haphazard choice of SSTs influences repeatability of average SST length).
- Observer2SST2Length: The second SST chosen by observer 2 haphazardly (to test whether haphazard choice of SSTs influences repeatability of average SST length).
- Observer2SST3Length: The third SST chosen by observer 2 haphazardly (to test whether haphazard choice of SSTs influences repeatability of average SST length).
- Observer2SST4Length: The fourth SST chosen by observer 2 haphazardly (to test whether haphazard choice of SSTs influences repeatability of average SST length).
- Observer2SST5Length: The fifth SST chosen by observer 2 haphazardly (to test whether haphazard choice of SSTs influences repeatability of average SST length).
- Observer2.AverageSSTLength: The average length of SSTs chosen by observer 1 or observer 2 (to test whether haphazard choice of SSTs influences repeatability of average SST length).
- Complex: Whether the sample contains SSTs categorised as 'complex' (either branched, coiled or agglomerate), by both observer 1 and 2.
- StraightUnbranched: Whether the sample contains SSTs categorised as 'straight unbranched'.
- StraightBranched: Whether the sample contains SSTs categorised as 'straight branched', by both observer 1 and 2.
- Coiled: Whether the sample contains SSTs categorised as 'coiled', by both observer 1 and 2.
- Agglomerate: Whether the sample contains SSTs categorised as 'agglomerate', by both observer 1 and 2.
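The relationships between some of these columns can be checked directly. The following is a minimal sketch using made-up rows with the column names listed above. It treats the literal string `NA` as missing, recomputes TotalUVJTissue_um2 as image size minus empty space, and averages the five observer 1 lengths. Returning no average when any length is missing is an assumption of this sketch, not something the dataset documentation states.

```python
import csv
import io
from statistics import mean

# Made-up rows using RepeatDF.csv column names; real values will differ.
sample_csv = """Sample,ImageSize_um2,EmptySpace_um2,TotalUVJTissue_um2,Observer1SST1Length,Observer1SST2Length,Observer1SST3Length,Observer1SST4Length,Observer1SST5Length
BF02_F1,1000000,250000,750000,200.0,210.0,220.0,230.0,240.0
BF02_F2,1000000,400000,600000,NA,NA,NA,NA,NA
"""

def to_float(value):
    """Convert a CSV field to float, mapping the literal 'NA' to None."""
    return None if value == "NA" else float(value)

derived = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    # TotalUVJTissue_um2 is described as image size minus empty space.
    uvj = to_float(row["ImageSize_um2"]) - to_float(row["EmptySpace_um2"])
    lengths = [to_float(row[f"Observer1SST{i}Length"]) for i in range(1, 6)]
    # Assumption: no average is reported unless all five tubules were measured.
    avg = mean(lengths) if all(x is not None for x in lengths) else None
    derived.append((row["Sample"], uvj, avg))

print(derived)
```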
File: SummaryBothFolds.csv
Description:
- This dataset contains species information and measurement data for sperm storage tubule length and density, from the image of highest tubule density within each sample. Each individual (1 individual per species) is associated with 2 fold measurements.
- Within species, repeatability was calculated for SST tissue and length measurements.
Abbreviations:
SST - Sperm Storage Tubule
NA values:
NA values in this file represent missing data values.
- For species for which we lack male testis mass and sperm length measurements, NA values appear in 'TotalTestisMassMean_g' and 'MeanTotalSpermLength_um'. For these species, the columns 'n_sperm_sampled', 'n_males_sampled', 'SpermLengthDataSource', and 'TestisMassDataFrom' are not applicable.
- For several species, testis mass and sperm length measurements were obtained from a separate study (Liao et al., 2019), and in these cases NA values in the 'n_sperm_sampled' and 'n_males_sampled' columns represent information missing/not imported from this study.
Variables
- Species: Species common name
- tip.label: Latin name corresponding to the tip labels in the phylogeny.
- Sample: Unique sample ID given in the format SpeciesCodeSampleNumber_FoldNumber e.g. BF02_F1 is BlackFrancolin2, fold 1.
- SSTTissue.DensestImageOnly: Total amount of tubule tissue within the image of highest tubule density.
- SSTLength.DensestImageOnly: Average length of tubules within the image of highest tubule density.
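The Sample ID format can be split into its parts with a regular expression. This sketch assumes that the species code contains only letters, that the sample number contains only digits, and that the fold is written as `F` followed by digits, as in the `BF02_F1` example. The helper name `parse_sample_id` is hypothetical.

```python
import re

# Assumed pattern: letters (species code), digits (sample number), "_F", digits (fold)
SAMPLE_ID = re.compile(r"^(?P<species_code>[A-Za-z]+)(?P<sample_number>\d+)_F(?P<fold>\d+)$")

def parse_sample_id(sample_id):
    """Split a sample ID such as 'BF02_F1' into (species code, sample number, fold)."""
    match = SAMPLE_ID.match(sample_id)
    if match is None:
        raise ValueError(f"Unexpected sample ID format: {sample_id!r}")
    return match["species_code"], int(match["sample_number"]), int(match["fold"])

parsed = parse_sample_id("BF02_F1")
print(parsed)  # ('BF', 2, 1)
```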
File: SummarySpeciesAvgs.csv
Description:
- This dataset contains species average measurement data from Galliformes reproductive tract, testis mass and sperm samples.
- Mean values for each individual (across both folds) are given for SST tissue area and SST length, for both the entire sample and for the image of highest tubule density. Female body mass and vagina length measurements are also given per individual.
- Mean total sperm length is provided, including information about the number of individuals (and number of sperm) sampled. For 2 individuals, some data has been taken from Liao et al., 2019 (as indicated in the datasheet).
- Information is also given about the morphological categorisation of tubules within the samples (as determined by Assersohn et al., 2024. See methods therein).
Abbreviations:
SST - Sperm Storage Tubule
NA values:
NA values in this file represent missing data values.
- For species for which we lack male testis mass and sperm length measurements, NA values appear in 'TotalTestisMassMean_g' and 'MeanTotalSpermLength_um'. For these species, the columns 'n_sperm_sampled', 'n_males_sampled', 'SpermLengthDataSource', and 'TestisMassDataFrom' are not applicable.
- For several species, testis mass and sperm length measurements were obtained from a separate study (Liao et al., 2019), and in these cases NA values in the 'n_sperm_sampled' and 'n_males_sampled' columns represent information missing/not imported from this study.
Variables
- tip.label: Latin name corresponding to the tip labels in the phylogeny.
- SpeciesCommon: Species common name.
- SSTTissue.DensestImageOnly: Total amount of tubule tissue within the image of highest tubule density (species average).
- SSTLength.DensestImageOnly: Average length of tubules within the image of highest tubule density (species average).
- SpeciesAverageSSTTissue_um2: Total amount of tubule tissue across the entire sample (species average).
- SpeciesAverageTubuleLength_um: Average length of tubules across the entire sample (species average).
- FemaleBodyMass_g: Body mass of females.
- VaginaLengthMean_mm: Mean vagina length.
- TotalTestisMassMean_g: Mean total testis mass (right + left testis).
- MeanTotalSpermLength_um: Average total sperm length for males.
- n_sperm_sampled: Number of sperm sampled per species.
- n_males_sampled: Number of males sampled per species.
- n_females_sampled: Number of females sampled per species.
- n_folds_per_female_sampled: Number of folds per female sampled.
- SpermLengthDataSource: Where sperm length data was obtained from. UoS refers to the University of Sheffield (i.e., measurements and samples obtained from this study).
- TestisMassDataFrom: Where testis mass data was obtained from. UoS refers to the University of Sheffield (i.e., measurements and samples obtained from this study).
- Complexity: Whether the sample contains SSTs categorised as 'complex' (either branched, coiled or agglomerate).
- StraightUnbranched: Whether the sample contains SSTs categorised as 'straight unbranched'.
- StraightBranched: Whether the sample contains SSTs categorised as 'straight branched'.
Code/software
File: Analysis.R
Description:
- Code to replicate analyses and figure production is provided as an R script.
- All analyses were run in R version 4.3.1.
- We conduct the majority of these analyses using the measure of SST area and length taken from only the image (in each fold) with the highest density of tubule tissue (as opposed to taking the absolute values for the entire fold). We expect the highest-density region of the UVJ to be the most functional, and this approach allows us to account for between-species variation in UVJ length, the spread of SSTs, and SST functionality along the length of the UVJ.
- Note that we are missing testis mass and sperm length data for 4 species. Consequently, any models including these variables use separate data with these species removed.
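The idea of keeping a full dataset alongside a subset restricted to species with male data can be sketched as follows. This is an independent illustration, not the code in Analysis.R; the species labels and values are made up, and only the two column names are taken from the data files.

```python
# Made-up species-level rows; "NA" marks missing male data as in the CSV files.
species_rows = [
    {"tip.label": "Species_a", "TotalTestisMassMean_g": "1.82", "MeanTotalSpermLength_um": "95.4"},
    {"tip.label": "Species_b", "TotalTestisMassMean_g": "NA", "MeanTotalSpermLength_um": "NA"},
    {"tip.label": "Species_c", "TotalTestisMassMean_g": "0.47", "MeanTotalSpermLength_um": "NA"},
]

MALE_COLUMNS = ("TotalTestisMassMean_g", "MeanTotalSpermLength_um")

def has_male_data(row):
    """True when both testis mass and sperm length are present for a species."""
    return all(row[column] != "NA" for column in MALE_COLUMNS)

# Full dataset for models without male traits; complete cases for models with them.
all_species = species_rows
male_subset = [row for row in species_rows if has_male_data(row)]

print([row["tip.label"] for row in male_subset])  # ['Species_a']
```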
R packages required:
rptR; dplyr; ape; caper; geiger; treeplyr; ggtree; ggplot2; viridis; phylolm
Script layout:
This script is split into 7 main sections:
1) Repeatability
- Measurement repeatability.
- Within-individual repeatability (how repeatable the 2 folds taken per individual are).
- We found the folds to be highly repeatable. Consequently, the average of both folds was taken to give a single value per species.
2) Prepare data and tree
- Creating the comparative.data class and objects (tree and data combined). Because we are missing testis mass and sperm length data for 4 species, we create 2 separate comparative data objects: one with all species included (for any model not using testis mass or sperm length variables), and one where the species with missing data are removed. We create a separate data object for the binary analyses (which use phylolm instead of caper).
3) Phylogeny plotting
4) – 7) Main analyses, testing each hypothesis in turn.
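The fold-averaging step described in section 1 (one value per species from the two folds) can be sketched as below. The rows are made up and the sketch is a plain-Python illustration rather than the code in Analysis.R; only the column names are taken from SummaryBothFolds.csv.

```python
from collections import defaultdict
from statistics import mean

# Made-up fold-level rows: one individual per species, two folds each.
fold_rows = [
    {"Species": "Species A", "Sample": "AA01_F1", "SSTLength.DensestImageOnly": 210.0},
    {"Species": "Species A", "Sample": "AA01_F2", "SSTLength.DensestImageOnly": 230.0},
    {"Species": "Species B", "Sample": "BB01_F1", "SSTLength.DensestImageOnly": 150.0},
    {"Species": "Species B", "Sample": "BB01_F2", "SSTLength.DensestImageOnly": 170.0},
]

# Group fold measurements by species, then average within each group.
by_species = defaultdict(list)
for row in fold_rows:
    by_species[row["Species"]].append(row["SSTLength.DensestImageOnly"])

species_means = {species: mean(values) for species, values in by_species.items()}
print(species_means)  # {'Species A': 220.0, 'Species B': 160.0}
```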
References
Assersohn, K., Richards, J. P., & Hemmings, N. (2024). The surprising complexity and diversity of sperm storage structures across Galliformes. Ecology and Evolution, 14(6), e11585. https://doi.org/10.1002/ece3.11585
Liao, W. B., Zhong, M. J., & Lüpold, S. (2019). Sperm quality and quantity evolve through different selective processes in the Phasianidae. Scientific Reports, 9(1), 19278. https://doi.org/10.1038/s41598-019-55822-3
Stein, R. W., Brown, J. W., & Mooers, A. Ø. (2015). A molecular genetic time scale demonstrates Cretaceous origins and multiple diversification rate shifts within the order Galliformes (Aves). Molecular Phylogenetics and Evolution, 92, 155–164. https://doi.org/10.1016/j.ympev.2015.06.005
For each species, one female was dissected; from each, two utero-vaginal junction (UVJ) folds (the SST-bearing region of the vagina) were analysed. Measurements from each individual include SST area and average tubule length measurements (from each UVJ fold), body mass, testis mass, vagina length, and sperm length (from 1–3 males per species; see dataset for details). A phylogenetic tree from Stein et al. (2015) is also required and available via Dryad (https://doi.org/10.5061/dryad.p2pn8).
