Code and datasets associated with: A sex-linked supergene with large effects on sperm traits has little impact on reproductive traits in female zebra finches
Abstract
Despite constituting an essential component of fitness, reproductive success can vary remarkably between individuals, and the causes of such variation are not well understood across taxa. In the zebra finch, a model songbird, almost all the variation in sperm morphology and swimming speed is maintained by a large polymorphic inversion (commonly known as a supergene) on the Z chromosome. The relationship between this polymorphism and reproductive success is not fully understood, particularly for females. Here, we explore the effects of female haplotype, and the combination of male and female genotype, on several primary reproductive traits in a captive population of zebra finches. Despite the inversion polymorphism’s known effects on sperm traits, we find no evidence that inversion haplotype influences egg production by females or survival of embryos through to hatching. However, our findings do reinforce existing evidence that the inversion polymorphism is maintained by a heterozygote advantage for male fitness. This work provides an important step in understanding the causes of variation in reproductive success in this model species.
README: Code and datasets associated with: A sex-linked supergene with large effects on sperm traits has little impact on reproductive traits in female zebra finches
Dataset structure
Five separate sheets of data are provided, corresponding to the five primary analyses: 1) Egg production, 2) Egg fertility and early embryo development, 3) Hatching success of developing eggs, 4) Offspring sex ratio and 5) Offspring genotype. An additional data sheet (6) provides the diagnostic SNP information and the raw cluster analysis output.
Datasets
1) EggProductionBCBoth: Dataset for the egg production analysis, formatted by clutch (each row is a separate clutch).
2) DevelopedBoth: Dataset for the fertility/early development analysis, formatted by pair (each row is a separate pair).
3) HatchedBoth: Dataset for the hatching success of developed eggs analysis, formatted by pair (each row is a separate pair).
4) Sex_data: Dataset for the offspring sex ratio analysis, formatted by clutch (each row is a separate clutch).
5) OffspringGenotypes: Dataset for the offspring genotype analysis (each row is a separate individual/offspring).
6) GenotypeCalling.SNPInfo: Diagnostic SNP info, used for calling genotypes, and the raw output from the cluster analysis (for the data collected in the current study, excluding the additional Kim et al. (2017) data).
Variables common to all datasets:
NewPairNo A unique pair ID
MotherRing Unique ID for the female of the pair
FatherRing Unique ID for the male of the pair
AllelesShared Number of supergene haplotypes shared between the male and female in a pair
Combination The combination of male and female Z chromosome supergene genotypes
FatherGenotype Z chromosome supergene genotype for the father
MotherGenotype Z chromosome supergene genotype for the mother
FatherAgeFirstLay Father age (in years) at the first egg laid for that pair
MotherAgeFirstLay Mother age (in years) at the first egg laid for that pair
FatherAgeAtLay_Z Father age at the first egg laid for that pair, Z-scored (scaled and mean centered)
MotherAgeAtLay_Z Mother age at the first egg laid for that pair, Z-scored (scaled and mean centered)
MBirthYear Year the mother was born
NA No data available. In GenotypeCalling.SNPInfo, NAs are due to ambiguous genotype calling during cluster analysis for that SNP.
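The Z-scored age columns can, in principle, be reproduced from the raw ages. The following is a minimal sketch in Python (not the R code used for the analyses). It assumes that scaling used the sample standard deviation, as R's scale() does by default; the function name and the example ages are made up for illustration.

```python
import statistics


def z_score(values):
    """Mean-centre the values and divide by the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption about how the columns were scaled
    return [(v - mean) / sd for v in values]


# Made-up ages in years, for illustration only (not values from the dataset).
example_ages = [1.0, 2.0, 3.0]
z_ages = z_score(example_ages)
```

Whether the published columns were scaled across pairs or across rows is not stated here, so treat any mismatch with the deposited values as a prompt to check the original R scripts.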
Variables specific to 'EggProductionBCBoth':
TotalEggs The total number of eggs laid per pair across all clutches
TotalClutches The total number of clutches/attempts per pair
ClutchNumber Which clutch in the series the row corresponds to
Eggs The number of eggs laid within a clutch
EggsP The number of eggs laid within a clutch plus 1 (to remove zeros for statistical analysis)
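The relationships between these columns can be checked directly. The sketch below, in Python rather than the R used for the analyses, verifies that EggsP equals Eggs + 1 and that the per-pair totals match the clutch rows. The rows are made up, and the check assumes that every clutch of a pair appears as its own row and that none of these columns contains NA.

```python
import csv
import io
from collections import defaultdict

# Made-up rows using the column names described above; not real data.
example_csv = """NewPairNo,ClutchNumber,Eggs,EggsP,TotalEggs,TotalClutches
1,1,5,6,9,2
1,2,4,5,9,2
2,1,0,1,0,1
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))

# EggsP should equal Eggs + 1 on every row.
eggsp_ok = all(int(r["EggsP"]) == int(r["Eggs"]) + 1 for r in rows)

# Sum eggs and count clutch rows for each pair.
egg_sums = defaultdict(int)
clutch_counts = defaultdict(int)
for r in rows:
    egg_sums[r["NewPairNo"]] += int(r["Eggs"])
    clutch_counts[r["NewPairNo"]] += 1

# Compare the per-pair sums and counts with TotalEggs and TotalClutches.
totals_ok = all(
    egg_sums[r["NewPairNo"]] == int(r["TotalEggs"])
    and clutch_counts[r["NewPairNo"]] == int(r["TotalClutches"])
    for r in rows
)
```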
Variables specific to 'DevelopedBoth':
Developed The number of eggs laid by that pair that had a visible embryo at day 3 of incubation
Undeveloped The number of eggs laid by that pair that did not have a visible embryo at day 3 of incubation
Hatched The number of eggs laid by that pair that successfully hatched
NoClutches The number of clutches laid by that pair
Variables specific to 'HatchedBoth':
NoClutches The number of clutches laid by that pair
Hatched The number of developed eggs (eggs with a visible embryo at 3 days of development) that also hatched
Failed The number of developed eggs (eggs with a visible embryo at 3 days of development) that failed to hatch
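Hatched and Failed together give the number of developed eggs for a pair, so a per-pair hatching proportion follows directly. This is a minimal Python sketch with made-up counts; the function name is hypothetical, and the models in the accompanying R code may treat these counts differently (for example as binomial successes and failures rather than as a proportion).

```python
def hatching_success(hatched, failed):
    """Proportion of developed eggs that hatched, or None if the pair had no developed eggs."""
    developed = hatched + failed
    if developed == 0:
        return None  # avoid dividing by zero for pairs with no developed eggs
    return hatched / developed


# Made-up (Hatched, Failed) counts for three pairs, for illustration only.
example_pairs = [(6, 2), (0, 3), (0, 0)]
proportions = [hatching_success(h, f) for h, f in example_pairs]
```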
Variables specific to 'Sex_data':
AttemptNo Which clutch in the series the row corresponds to
Eggs The number of eggs laid in the clutch
FemaleOffspring The number of eggs in that clutch that were female
MaleOffspring The number of eggs in that clutch that were male
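A per-clutch summary can be derived from these three counts. The Python sketch below uses made-up numbers and a hypothetical function name. Expressing the sex ratio as the proportion of sexed eggs that were male is an assumption made for illustration, not necessarily the definition used in the analysis.

```python
def clutch_sex_summary(eggs, female, male):
    """Summarise one clutch: sexed eggs, unsexed eggs, and proportion male among sexed eggs."""
    sexed = female + male
    if sexed > eggs:
        # Sanity check: sexed eggs cannot outnumber the eggs laid in the clutch.
        raise ValueError("FemaleOffspring + MaleOffspring exceeds Eggs")
    return {
        "sexed": sexed,
        "unsexed": eggs - sexed,
        "prop_male": (male / sexed) if sexed > 0 else None,
    }


# Made-up clutch: 5 eggs, of which 2 were female and 2 were male.
summary = clutch_sex_summary(eggs=5, female=2, male=2)
```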
Variables specific to 'OffspringGenotypes':
HatchDate Offspring hatch date
Ring Unique offspring ID
OffspringGenotype Offspring genotype
Methods
The dataset has been curated primarily from SNP genotyping data collected as part of this study in 2019 and 2022, with some additional SNP data obtained from a separate study (see Kim et al., 2017). Breeding data originates from a breeding database for a population of domesticated zebra finches maintained at The University of Sheffield between 1985 and 2016.
Birds from this population were bred in single pairs without access to other individuals, where no natural mate choice was permitted, and where the paternity was conclusively known for every egg. Data was collected on parental identity (ring number), parental age, number of breeding attempts per pair, number of eggs laid per breeding attempt (clutch), the fertility/development status of the eggs at 3 days of incubation (by candling), the outcome of every egg (hatching success), and offspring sex.
Genotyping data comprised the Z chromosome supergene genotypes of males and females, determined by KASP genotyping of diagnostic SNPs on an LGC SNPLine system using tissue samples from frozen stored birds.
Usage notes
Code created in R (v 4.2.1).
Data files are all in .csv format.
Packages required: brms, Matrix, tidyverse, dplyr, tidybayes, bayestestR, bayesplot, posterior, ggpubr, cowplot, gghalves, modelr
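For readers who want to inspect the .csv files outside R, the sketch below shows one way to read a sheet in Python while turning the NA marker into a missing value. The helper name read_dataset is hypothetical, the example row is made up, and the file name in the trailing comment is assumed from the dataset name plus .csv.

```python
import csv
import io


def read_dataset(handle):
    """Read one data sheet into a list of dicts, turning the string 'NA' into None."""
    return [
        {key: (None if value == "NA" else value) for key, value in row.items()}
        for row in csv.DictReader(handle)
    ]


# Made-up one-row sheet, for illustration only.
example = io.StringIO("NewPairNo,MotherGenotype,MBirthYear\n1,NA,2010\n")
rows = read_dataset(example)

# With a real file (file name assumed, not confirmed by this README):
# with open("DevelopedBoth.csv", newline="") as f:
#     rows = read_dataset(f)
```

All values are returned as strings, so numeric columns need converting before use.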