
Data from: Only rare classical MHC-I alleles are highly expressed in the European house sparrow

Cite this dataset

Watson, Hannah et al. (2024). Data from: Only rare classical MHC-I alleles are highly expressed in the European house sparrow [Dataset]. Dryad. https://doi.org/10.5061/dryad.4xgxd25hf

Abstract

The exceptional polymorphism observed within genes of the major histocompatibility complex (MHC), a core component of the vertebrate immune system, has long fascinated biologists. The highly polymorphic classical MHC class-I (MHC-I) genes are maintained by pathogen-mediated balancing selection (PMBS), as shown by many sites subject to positive selection, while the more monomorphic MHC-I genes show signatures of purifying selection. In line with PMBS, at any point in time, rare classical MHC alleles are more likely than common classical MHC alleles to confer a selective advantage in host-pathogen interactions. Combining genomic and expression data from the blood of wild house sparrows Passer domesticus, we found that only rare classical MHC-I alleles were highly expressed, while common classical MHC-I alleles were lowly expressed or not expressed. Moreover, highly expressed rare classical MHC-I alleles had more positively selected sites, indicating exposure to stronger PMBS, compared with lowly expressed classical alleles. As predicted, the level of expression was unrelated to allele frequency in the monomorphic non-classical MHC-I alleles. Going beyond previous studies, we offer a fine-scale view of selection on classical MHC-I genes in a wild population by revealing differences in the strength of PMBS according to allele frequency and expression level.

README: Data from: Only rare classical MHC-I alleles are highly expressed in the European house sparrow

https://doi.org/10.5061/dryad.4xgxd25hf

The datasets contain data on the genomic and expressed diversity of major histocompatibility complex (MHC) class I in European house sparrows (Passer domesticus). The data derive from amplicon sequencing of the exon 3 region of MHC-I in gDNA and cDNA from the blood of 28 house sparrows sampled at four sites across Europe. MHC-I alleles are classified as classical or non-classical, with non-classical alleles identified by a 6-bp deletion. The analyses investigated variation in genomic and expressed MHC-I diversity, variation in MHC-I expression (probability and level of expression) in relation to allele frequency (i.e. the frequency of the allele in the sampled population), and variation in MHC-I expression level according to the rank order of expression level within each individual's genotype. Analyses were conducted at the level of both nucleotide alleles and amino acid alleles.

Titles of datasets:

MHCI_cDNA_gDNA_total_alleles_pheno.csv

MHCI_cDNA_gDNA_nt_geno_pheno.csv

MHCI_cDNA_gDNA_aa_geno_pheno.csv

Description of the data and file structure

Dataset: MHCI_cDNA_gDNA_nt_geno_pheno.csv
Dataset: MHCI_cDNA_gDNA_aa_geno_pheno.csv

Data description:
The genomic (gDNA) and expressed (cDNA) genotypes, along with read counts, are given for each MHC-I allele and for each bird. Genotype is indicated by the presence (1) or absence (0) of alleles in gDNA or cDNA. Allele read counts from cDNA provide a measure of expression level. Total read counts for each sample are also given. All MHC-I alleles identified in the population (the sample of 28 birds) are included. The dataset named MHCI_cDNA_gDNA_nt_geno_pheno.csv contains data at the level of nucleotide alleles (hence "nt" in the file name). The 133 unique nucleotide sequences detected in gDNA were translated to amino acid sequences, yielding 101 putative amino acid alleles. Data at the level of amino acid alleles are in the dataset named MHCI_cDNA_gDNA_aa_geno_pheno.csv (hence "aa" in the file name). The two datasets allow separate analysis of nucleotide alleles and amino acid alleles.

Data structure:
Tabular data are organised in long format, with one row for each combination of allele (n = 133 nucleotide alleles or n = 101 amino acid alleles) and bird (n = 28). The structure and variable names are almost identical in the two datasets; the difference is that one contains data for nucleotide alleles and the other for amino acid alleles.

Data definitions:
* bird_ID = unique identifier for each bird/blood sample
* Site = capture and sampling location (country)
* Season = season of capture and sampling ("Spring" or "Autumn")
* nt_allele_ID = unique identifier for each nucleotide allele, according to GenBank
* GenBank_Accession_number = GenBank accession number for the corresponding nucleotide sequence
* aa_allele_ID = unique identifier for the corresponding amino acid allele (the dataset MHCI_cDNA_gDNA_nt_geno_pheno.csv thus shows which nucleotide allele(s) translate to each amino acid allele since several nucleotide alleles can yield the same amino acid allele)
* Allele_type = "Classical" or "Non-classical"
* Allele_frequency = frequency of the corresponding nucleotide/amino acid allele in the population (i.e. sample of 28 birds)
* gDNA_genotype = presence (1) or absence (0) of the corresponding nucleotide/amino acid allele in gDNA from the corresponding bird/blood sample
* gDNA_read_count = number of (trimmed and filtered) gDNA reads corresponding to the nucleotide/amino acid allele from the corresponding bird/blood sample
* gDNA_total_reads = total gDNA read count for the corresponding bird/blood sample
* gDNA_NoAlleles = total number of genomic alleles for the corresponding bird/blood sample
* cDNA_genotype = presence (1) or absence (0) of the corresponding nucleotide/amino acid allele in cDNA from the corresponding bird/blood sample
* cDNA_read_count = number of (trimmed and filtered) cDNA reads corresponding to the nucleotide/amino acid allele from the corresponding bird/blood sample
* cDNA_total_reads = total cDNA read count for the corresponding bird/blood sample
* cDNA_NoAlleles = total number of expressed alleles for the corresponding bird/blood sample
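
As a minimal sketch of how the long-format columns defined above might be used, the following Python example parses a few rows, expresses each allele's cDNA read count as a proportion of the bird's total cDNA reads, and ranks expressed alleles within each bird. The column names come from the definitions above; the bird and allele identifiers and all values are invented for illustration, and normalising by cDNA_total_reads is an assumption, not necessarily the approach taken in the provided R code.

```python
import csv
import io
from collections import defaultdict

# Toy rows using the column names defined above; every value is invented.
TOY_CSV = """bird_ID,nt_allele_ID,Allele_type,gDNA_genotype,cDNA_genotype,cDNA_read_count,cDNA_total_reads
B01,allele_A,Classical,1,1,600,1000
B01,allele_B,Classical,1,1,300,1000
B01,allele_C,Non-classical,1,1,100,1000
B01,allele_D,Classical,0,0,0,1000
B02,allele_A,Classical,1,0,0,500
B02,allele_D,Classical,1,1,500,500
"""

INT_COLUMNS = ("gDNA_genotype", "cDNA_genotype", "cDNA_read_count", "cDNA_total_reads")


def load_rows(handle):
    """Read long-format rows and convert the numeric columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def relative_expression(row):
    """Proportion of a bird's cDNA reads assigned to this allele (assumed measure)."""
    total = row["cDNA_total_reads"]
    return row["cDNA_read_count"] / total if total else 0.0


def expression_ranks(rows):
    """Rank expressed alleles within each bird (1 = most cDNA reads).

    Ties keep their input order; a real analysis may need a tie rule.
    """
    by_bird = defaultdict(list)
    for row in rows:
        if row["cDNA_genotype"] == 1:  # only alleles detected in cDNA
            by_bird[row["bird_ID"]].append(row)
    ranks = {}
    for bird, bird_rows in by_bird.items():
        ordered = sorted(bird_rows, key=lambda r: r["cDNA_read_count"], reverse=True)
        for rank, r in enumerate(ordered, start=1):
            ranks[(bird, r["nt_allele_ID"])] = rank
    return ranks


rows = load_rows(io.StringIO(TOY_CSV))
ranks = expression_ranks(rows)
```

To run this against the real files, replace the `io.StringIO(TOY_CSV)` handle with an opened copy of MHCI_cDNA_gDNA_nt_geno_pheno.csv; for the amino acid dataset, the allele key would be aa_allele_ID instead.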

Dataset: MHCI_cDNA_gDNA_total_alleles_pheno.csv

Data description: Dataset summarising the total number of genomic and expressed MHC-I alleles (both nucleotide and amino acid alleles) carried by each bird, given separately for classical and non-classical alleles.

Data structure: Tabular data are organised in long format with one row per bird (n = 28) and allele type (n = 2; i.e. classical or non-classical).

Data definitions:
* bird_ID = unique identifier for each bird/blood sample
* Site = capture and sampling location (country)
* Season = season of capture and sampling ("Spring" or "Autumn")
* Allele_type = "Classical" or "Non-classical"
* gDNA_NoAlleles_nt = total number of genomic nucleotide alleles for the corresponding bird/blood sample
* gDNA_NoAlleles_aa = total number of genomic amino acid alleles for the corresponding bird/blood sample
* cDNA_NoAlleles_nt = total number of expressed nucleotide alleles for the corresponding bird/blood sample
* cDNA_NoAlleles_aa = total number of expressed amino acid alleles for the corresponding bird/blood sample
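
The per-bird totals in this dataset should, in principle, be recoverable from the long-format genotype datasets by summing the presence/absence indicators per bird and allele type. The Python sketch below shows that cross-check for nucleotide alleles. It assumes that gDNA_NoAlleles_nt and cDNA_NoAlleles_nt are simple sums of gDNA_genotype and cDNA_genotype, which the README does not state explicitly; the records are invented.

```python
from collections import Counter

# Invented long-format records:
# (bird_ID, Allele_type, gDNA_genotype, cDNA_genotype)
toy_records = [
    ("B01", "Classical", 1, 1),
    ("B01", "Classical", 1, 0),
    ("B01", "Non-classical", 1, 1),
    ("B02", "Classical", 1, 1),
    ("B02", "Classical", 0, 0),
    ("B02", "Non-classical", 1, 0),
]


def summarise(records):
    """Sum presence/absence indicators per bird and allele type."""
    gdna = Counter()
    cdna = Counter()
    for bird, allele_type, g, c in records:
        gdna[(bird, allele_type)] += g
        cdna[(bird, allele_type)] += c
    keys = sorted(set(gdna) | set(cdna))
    return [
        {
            "bird_ID": bird,
            "Allele_type": allele_type,
            "gDNA_NoAlleles_nt": gdna[(bird, allele_type)],
            "cDNA_NoAlleles_nt": cdna[(bird, allele_type)],
        }
        for bird, allele_type in keys
    ]


summary = summarise(toy_records)
```

Each element of `summary` has the same shape as one row of MHCI_cDNA_gDNA_total_alleles_pheno.csv (minus Site, Season and the amino acid columns), so the derived counts can be compared against the published ones row by row.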

Code/Software

R code for performing the analyses and obtaining the summary statistics that are presented in the article is provided.

Methods

Exon 3 of MHC class I was amplified and sequenced on the Illumina MiSeq platform from gDNA and cDNA originating from blood samples collected from 28 house sparrows (Passer domesticus) at four sites across Europe. Reads were trimmed and filtered to identify unique allele sequences in gDNA and cDNA. Alleles were classified as classical or non-classical, with non-classical alleles identified by a 6-bp deletion. Filtered read counts were converted into allele presence/absence (1/0) for gDNA and cDNA, and cDNA read counts were used as the measure of expression level for (i) nucleotide alleles (MHCI_cDNA_gDNA_nt_geno_pheno.csv) and (ii) amino acid alleles (MHCI_cDNA_gDNA_aa_geno_pheno.csv). Allele frequencies (i.e. the number of individuals carrying each allele) were calculated. The numbers of unique genomic and expressed alleles were summed for each bird (MHCI_cDNA_gDNA_total_alleles_pheno.csv). Accompanying phenotypic data include country of origin (Bulgaria, Poland, Spain or Sweden) and season of sample collection (spring/autumn). For full methods, refer to the published article.
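
The two derived quantities described above, presence/absence and allele frequency, can be illustrated with a short Python sketch. It assumes that any allele with at least one read remaining after filtering is scored as present, and that frequency is counted from gDNA genotypes; the actual filtering criteria are given in the published article, and the read counts and identifiers here are invented.

```python
from collections import defaultdict

# Invented filtered gDNA read counts: {bird_ID: {allele_ID: reads}}
toy_counts = {
    "B01": {"allele_A": 120, "allele_B": 45, "allele_C": 0},
    "B02": {"allele_A": 80, "allele_B": 0, "allele_C": 0},
    "B03": {"allele_A": 0, "allele_B": 30, "allele_C": 10},
}


def to_presence(counts):
    """Convert filtered read counts to presence (1) / absence (0).

    Assumes reads > 0 after filtering means the allele is present.
    """
    return {
        bird: {allele: int(reads > 0) for allele, reads in alleles.items()}
        for bird, alleles in counts.items()
    }


def allele_frequency(presence):
    """Number of birds carrying each allele."""
    freq = defaultdict(int)
    for alleles in presence.values():
        for allele, present in alleles.items():
            freq[allele] += present
    return dict(freq)


presence = to_presence(toy_counts)
frequency = allele_frequency(presence)
# Proportion of sampled birds carrying each allele.
proportion = {allele: n / len(toy_counts) for allele, n in frequency.items()}
```

Whether Allele_frequency in the datasets is stored as a count of birds or as a proportion of the 28 birds should be checked against the files themselves; the sketch computes both.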

Funding

European Research Council, Award: 679799