Female reed warblers in social pairs with low MHC dissimilarity achieve higher MHC dissimilarity through random extra-pair mating
Data files
Jan 09, 2026 version files 1.55 MB
- README.md (8.11 KB)
- Supplementary_file_1.fasta (78.51 KB)
- Supplementary_file_2.R (42.79 KB)
- Table_S1.csv (219.75 KB)
- Table_S10.tsv (2.25 KB)
- Table_S11.csv (7.67 KB)
- Table_S2.csv (10.02 KB)
- Table_S3.csv (10.44 KB)
- Table_S5.csv (87.38 KB)
- Table_S6.csv (243.29 KB)
- Table_S7.csv (22.55 KB)
- Table_S8.csv (12.64 KB)
- Table_S9.tsv (799.76 KB)
Abstract
Major Histocompatibility Complex (MHC) polymorphism is maintained by balancing selection through host-pathogen interactions and mate choice. MHC-based mate choice has been investigated across a wide range of vertebrates, and an established concept is that females should choose a mate with an MHC genotype that is dissimilar to her own to ensure high MHC divergence in her offspring. Here we present evidence from a population of reed warblers, Acrocephalus scirpaceus, that social pairs with extra-pair young in their nest have significantly lower MHC dissimilarity than expected by random MHC-based mate choice. Moreover, social pairs with extra-pair young in their nest have lower MHC dissimilarity than the potential pairs females could form with other males surrounding the social nest. Therefore, females in pairs with low MHC dissimilarity could improve the MHC divergence of their offspring through extra-pair mating. We propose that when the MHC dissimilarity in the social pair is low, any alternative male represents a better genetic prospect for the female in terms of MHC dissimilarity. This scenario generates a pattern of MHC-disassortative extra-pair mating without requiring active MHC-based mate choice.
Dataset DOI: 10.5061/dryad.bzkh189r2
Description of the data and file structure
Data to accompany the manuscript "Female reed warblers in social pairs with low MHC dissimilarity achieve higher MHC dissimilarity through random extra-pair mating."
Files and variables
Supplementary_file_1.fasta
Description of Supplementary_file_1.fasta FASTA file containing the nucleotide sequences of the 281 MHC-I sequences found in the study. It can be opened with a standard text editor or any program capable of reading FASTA files. These sequences are publicly available in the NCBI database (https://www.ncbi.nlm.nih.gov/) under the accession numbers KU169387, KU169375, KU169386, KU169388, KU169376, KU169393 and OR053210 to OR053484.
Supplementary_file_2.R
Description of Supplementary_file_2.R Annotated R script for all analyses used in the study. Input files required to run all analyses can be found as supplementary tables (Tables S1, S3, S7-10).
Table_S1.csv
Description of Table_S1.csv Table of all MHC-I genotyped individuals in the study and the MHC-I sequences identified in each. "Seq_id" = sequence name allocated in the study; "Sequence" = nucleotide sequence. Columns C to IW contain bird identification numbers. Female IDs start with “F”. Male IDs start with “M”.
Table_S2.csv
Description of Table_S2.csv Table of data used for statistical analyses containing MHC-I dissimilarity data for all pairs. “Category” = pair type (1 = SP-EPY [where EGP male is known]; 2 = EGP; 3 = SP-WPY; 4 = SP-EPY [where EGP male is not known]); “year” = year the pair was observed; “nest_ID” = ID of nest; “lay_date” = date of first egg recorded; “female” = female ID; “F_allelic_n” = number of MHC-I alleles in female; “male” = male ID; “M_allelic_n” = number of MHC-I alleles in male; “Pair” = combination of female and male ID; “mhc_dist” = MHC-I dissimilarity.
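The numeric “Category” codes are easier to work with once mapped to their labels. The following is a minimal Python sketch of such a lookup, not part of the supplied R script; the names `CATEGORY_LABELS` and `label_category` are hypothetical, and the labels are taken from the column description above.

```python
# Hypothetical helper: map the numeric "Category" codes described above
# to readable pair-type labels.
CATEGORY_LABELS = {
    1: "SP-EPY (EGP male known)",
    2: "EGP",
    3: "SP-WPY",
    4: "SP-EPY (EGP male not known)",
}


def label_category(code):
    """Return the pair-type label for a code given as an int or a string."""
    return CATEGORY_LABELS[int(code)]


# Values read from a CSV file arrive as strings, so both forms are accepted.
labels = [label_category(c) for c in ["1", 2, "3", 4]]
```

The same lookup applies to the “Category” column of Table_S11.csv, which uses the same four codes.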
Table_S3.csv
Description of Table_S3.csv Key to sequence names used in study and the corresponding sequence names on GenBank. "Sequence" = sequence name allocated in the study; "Accession" = GenBank accession number; "NCBI_name" = name used on GenBank.
Table_S5.csv
Description of Table_S5.csv Table containing details of the amino acid sequence of the PBR for each sequence and which sequences were found in which individuals in the study. “PBR_name” = sequence name; “PBR_sequence” = amino acid sequence; Columns C to IW contain bird identification numbers. Female IDs start with “F”. Male IDs start with “M”.
Table_S6.csv
Description of Table_S6.csv Matrix of MHC-I dissimilarity scores (functional distances) between all MHC-I genotyped males and females.
Table_S7.csv
Description of Table_S7.csv Table of data used for statistical analyses on the putatively available males surrounding each nest. “OBS” = running number for each social pair; “Female” = female ID; “year” = year; “s_M” = social male ID; “EPP_M” = EGP male ID; “putative.M” = ID of putatively available male; “dist.put.M..m” = physical distance of male from female's nest in meters; “Combo” = combination of female & social male ID; “put_pair” = combination of female & putative male ID; “extra_pair” = combination of female & EGP male ID; “sp_mhc_dist” = MHC-I dissimilarity between social pair (SP-EPY); “pp_mhc_dist” = MHC-I dissimilarity between female and putatively available male; “ep_mhc_dist” = MHC-I dissimilarity between genetic pair (EGP).
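As an illustration of how these columns relate, the sketch below compares each social pair's MHC-I dissimilarity with the mean dissimilarity between the same female and her putatively available males. It is a hedged Python sketch, not the authors' analysis (which is in Supplementary_file_2.R); the function name `summarize_pairs`, the IDs, and all numeric values are invented for illustration, and only the column names come from the description above.

```python
from collections import defaultdict
from statistics import mean


def summarize_pairs(rows):
    """For each social pair ("Combo"), return a tuple of
    (sp_mhc_dist, mean pp_mhc_dist, mean pp_mhc_dist - sp_mhc_dist)."""
    sp = {}
    pp = defaultdict(list)
    for row in rows:
        combo = row["Combo"]
        sp[combo] = float(row["sp_mhc_dist"])
        pp[combo].append(float(row["pp_mhc_dist"]))
    return {c: (sp[c], mean(pp[c]), mean(pp[c]) - sp[c]) for c in sp}


# Toy rows with invented IDs and values; one row per putatively available male.
toy_rows = [
    {"Combo": "F1_M1", "sp_mhc_dist": "0.25", "pp_mhc_dist": "0.5"},
    {"Combo": "F1_M1", "sp_mhc_dist": "0.25", "pp_mhc_dist": "0.75"},
    {"Combo": "F2_M2", "sp_mhc_dist": "0.5", "pp_mhc_dist": "0.25"},
    {"Combo": "F2_M2", "sp_mhc_dist": "0.5", "pp_mhc_dist": "0.25"},
]

summary = summarize_pairs(toy_rows)
```

A positive third value means the surrounding males are, on average, more MHC-dissimilar to the female than her social mate is.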
Table_S8.csv
Description of Table_S8.csv Table containing microsatellite genotypes for all 214 individuals in the social pairs (35 SP-EPY & 98 SP-WPY) analysed in this study. "full_id" = individual ID; "sex" = sex (male or female) of individual; "Ase18a" = length of Ase18a allele in individual; "Ase18b" = length of Ase18b allele in individual; "Ase25a" = length of Ase25a allele in individual; "Ase25b" = length of Ase25b allele in individual; "Ase37a" = length of Ase37a allele in individual; "Ase37b" = length of Ase37b allele in individual; "Ase48a" = length of Ase48a allele in individual; "Ase48b" = length of Ase48b allele in individual; "Ppi2a" = length of Ppi2a allele in individual; "Ppi2b" = length of Ppi2b allele in individual; "Ase58a" = length of Ase58a allele in individual; "Ase58b" = length of Ase58b allele in individual.
Table_S9.tsv
Description of Table_S9.tsv Table of data used for statistical analyses containing the MHC-I dissimilarity scores for all possible combinations of 128 females and 127 males genotyped in the population (16256 theoretical pairs). "Bird_nr" = female ID; "Male" = male ID; "value" = MHC-I dissimilarity score; "Pair" = combination of female and male IDs.
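The stated number of theoretical pairs follows from crossing every genotyped female with every genotyped male (128 × 127 = 16256). The Python sketch below reproduces that count; the ID format and the underscore used to join IDs into a pair label are assumptions made for illustration, not the format used in the "Pair" column.

```python
from itertools import product

N_FEMALES = 128
N_MALES = 127

# Hypothetical placeholder IDs; the real IDs are in Table_S9.tsv.
females = [f"F{i:03d}" for i in range(1, N_FEMALES + 1)]
males = [f"M{i:03d}" for i in range(1, N_MALES + 1)]

# Every female crossed with every male; the "_" separator is an assumption.
pairs = [f"{female}_{male}" for female, male in product(females, males)]

n_pairs = len(pairs)
```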
Table_S10.tsv
Description of Table_S10.tsv Table of data used for statistical analyses containing the number of MHC-I alleles in each of the 127 genotyped males. "ID" = male ID; "Nr_Alleles" = number of MHC-I alleles found.
Table_S11.csv
Description of Table_S11.csv Table of data on the number of hatched and unhatched eggs observed at nests. “Category” = pair type (1 = SP-EPY [where EGP male is known]; 2 = EGP; 3 = SP-WPY; 4 = SP-EPY [where EGP male is not known]); “year” = year the pair was observed; “nest_ID” = ID of nest; "Original clutch" = total number of eggs laid in nest; "unhatched eggs" = number of eggs not hatched; "all eggs at hatching" = number of eggs at day of first egg hatching; “lay_date” = date of first egg recorded; “female” = female ID; “male” = male ID; “Pair” = combination of female and male ID; "missing_eggs_0_1" = number of eggs that disappeared from nest; "notes" = notable observations.
Note that Tables S4 and S12 to S16 (referred to in the manuscript) can be found in the main Supplementary Information document that accompanies the manuscript.
Code/software
The following is a list of file types contained within this repository, with brief descriptions of how to work with them:
.R: R script that can be viewed, edited and run using the statistical software R (https://www.r-project.org/). R runs on Windows, macOS, and a wide variety of UNIX platforms. The R packages loaded by the script are forcats_1.0.1, cowplot_1.2.0, ggplot2_4.0.0, Hmisc_5.2-3, lme4_1.1-37, Matrix_1.7-4, reshape2_1.4.4, plyr_1.8.9, tidyr_1.3.1 and dplyr_1.1.4. The script was run in R version 4.4.0.
.csv: Plain text files that use a delimiter symbol (in this case semicolons) to separate values and new lines to separate rows. These files can be viewed and edited with any plain text editor (e.g., Linux less command, Nano, Vim, Text Editor). They can also be opened in Excel by specifying the delimiter symbol (in this case semicolons).
.tsv: Tab-separated values files, which are plain text files with tab-delimited data fields. These files can be viewed and edited with any plain text editor or read into R using the read.table command with the sep="\t" argument. They can also be opened in Excel by specifying tab as the delimiter symbol.
.fasta: Plain text DNA sequence information in FASTA format. These files can be viewed and edited in any of the many free sequence editing programs (e.g., BioEdit, AliView, Jalview) but also any plain text editor (e.g., Linux less command, Nano, Vim, Text Editor).
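For readers working outside R, the three plain-text formats listed above can also be parsed with Python's standard library. The sketch below is a minimal illustration under stated assumptions: the helper names `read_delimited` and `read_fasta` are hypothetical, and the toy inputs (IDs, counts, sequences) are invented rather than taken from the data files.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>')
    to its sequence, joining sequences that wrap over several lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Toy inputs with invented values, mimicking the three formats.
csv_text = "ID;Nr_Alleles\nM001;5\nM002;7\n"      # semicolon-delimited .csv
tsv_text = "ID\tNr_Alleles\nM001\t5\nM002\t7\n"   # tab-delimited .tsv
fasta_text = ">seq_a\nACGT\nTTGA\n>seq_b\nGGCC\n"

csv_rows = read_delimited(csv_text, ";")
tsv_rows = read_delimited(tsv_text, "\t")
fasta_records = read_fasta(fasta_text)
```

To read one of the actual files, the same `csv.DictReader` call can be given a file handle, for example `open("Table_S2.csv", newline="")` with `delimiter=";"`. All values are returned as strings and need converting before numeric use.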
Access information
Other publicly accessible locations of the data:
- The sequences in Supplementary_file_1.fasta are publicly available in the NCBI database (https://www.ncbi.nlm.nih.gov/) under the accession numbers KU169387, KU169375, KU169386, KU169388, KU169376, KU169393 and OR053210 to OR053484.
