The genomic architecture of the passerine MHC region: high repeat content and contrasting evolutionary histories of single copy and tandemly duplicated MHC genes
Data files
Apr 06, 2022 version files 218.41 MB
- Annotations_webapollo_grw_20211026.gff3 (215.77 MB)
- genome.HC.S_Up_H32.FALCON042.v1.0_mhc_I_II.modified.contignames_EditEstelle.gff (17.67 KB)
- genome.JD.S_Up_J01.FALCON042.v1.0.mhc_I_II.modified.edited_contignames_EditEstelle.gff (72.75 KB)
- HC.genes.core_tap_genes.webapollo.20220223.gff3 (105.82 KB)
- Intergenic_repeat_content_MHCIIB.xlsx (15.50 KB)
- Intragenicrepeats_rev.xlsx (15.71 KB)
- JD.genes.core_tap_genes.webapollo.20220308.gff3 (137.63 KB)
- JD.mhc_region.fasta (1.75 MB)
- MHCI_full_length_nucleotide_manuedit.fasta (30.56 KB)
- MHCIIB_full_length_nucleotide_manuedit.fasta (99.32 KB)
- MUGN00000000_zf_primary_mhc_I_II.sorted_modified.contignames_EditEstelle.gff (74.18 KB)
- README_explanationfile.txt (6.04 KB)
- ZF.genes.core_tap_genes.webapollo.20220308.gff3 (310.32 KB)
Abstract
The Major Histocompatibility Complex (MHC) is of central importance to the immune system, and an optimal MHC diversity is believed to maximize pathogen elimination. Birds show substantial variation in MHC diversity, ranging from few genes in most bird orders to very many genes in passerines. Our understanding of the evolutionary trajectories of the MHC in passerines is hampered by a lack of data on genomic organization. Therefore, we assemble and annotate the MHC genomic region of the great reed warbler (Acrocephalus arundinaceus), using long-read sequencing and optical mapping. The MHC region is large (>5.5 Mb), characterized by structural changes compared to hitherto investigated bird orders, and shows higher repeat content than the genome average. These features were supported by analyses in three additional passerines. MHC genes in passerines are found in two different chromosomal arrangements, either as single copy MHC genes located among non-MHC genes, or as tandemly duplicated, tightly linked MHC genes. Some single copy MHC genes are old and putative orthologs among species. In contrast, tandemly duplicated MHC genes are monophyletic within species and have evolved by simultaneous gene duplication of several MHC genes. Structural differences in the MHC genomic region among bird orders seem substantial compared to mammals and have possibly been fuelled by clade-specific immune system adaptations. Our study provides methodological guidance in characterizing complex genomic regions, constitutes a resource for MHC research in birds, and calls for a revision of the general belief that avian MHC has a conserved gene order and small size compared to mammals.
Methods
We used PacBio long-read sequencing, 10x Genomics linked-read sequencing and BioNano optical mapping to create a de novo genome assembly from a female great reed warbler (Acrocephalus arundinaceus). Eighteen scaffolds with MHC genes (MHC-I and/or MHC-IIB genes) and MHC-related genes, i.e. genes expected to be found in the MHC region such as TAP1 and TAP2, were identified in the genome. The great reed warbler scaffold Aaru_Scaffold 18 shares large-scale homology with chicken chromosome 16, i.e. the chromosome holding the ‘core MHC region’, but the gene content and gene order partly differ. To investigate whether this genomic reorganization is common among passerines, we characterized MHC genes and MHC-related genes in three additional passerine species with long-read sequenced assemblies: hooded crow (Corvus cornix), jackdaw (Coloeus monedula) and zebra finch (Taeniopygia guttata). In these species, contigs with MHC genes and MHC-related genes, defined as above, were identified in each genome. We analysed the total repeat content in the MHC scaffolds/contigs, the repeat content within MHC genes, and the repeat content between tandemly duplicated MHC genes.
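To make the three repeat-content measures concrete, the following is a minimal Python sketch of how the fraction of a region covered by repeats could be computed from coordinate intervals. It is an illustration only and not the pipeline used in the study: the function names (merge_intervals, repeat_fraction, intergenic_gaps) are hypothetical, coordinates are assumed to be 1-based and inclusive (as in GFF files), and the numbers in the demo are made up.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def repeat_fraction(region, repeats):
    """Fraction of bases in `region` covered by at least one repeat.

    Assumes 1-based inclusive coordinates; overlapping repeat
    annotations are merged first so no base is counted twice.
    """
    r_start, r_end = region
    covered = 0
    for start, end in merge_intervals(repeats):
        lo, hi = max(start, r_start), min(end, r_end)
        if lo <= hi:
            covered += hi - lo + 1
    return covered / (r_end - r_start + 1)


def intergenic_gaps(genes):
    """Intervals lying between consecutive genes on one scaffold/contig."""
    gaps = []
    prev_end = None
    for start, end in sorted(genes):
        if prev_end is not None and start > prev_end + 1:
            gaps.append((prev_end + 1, start - 1))
        prev_end = end if prev_end is None else max(prev_end, end)
    return gaps


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    genes = [(100, 400), (900, 1200)]
    repeats = [(350, 500), (450, 700), (1150, 1300)]
    for gene in genes:
        print("intragenic", gene, repeat_fraction(gene, repeats))
    for gap in intergenic_gaps(genes):
        print("intergenic", gap, repeat_fraction(gap, repeats))
```

The same repeat_fraction call covers all three measures; only the region changes (a whole scaffold/contig, a single gene, or the gap between two neighbouring tandemly duplicated genes).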
Usage notes
The dataset covers the great reed warbler, hooded crow, jackdaw and zebra finch genomes and contains: MHC-I and MHC-IIB genes in open reading frame after manual curation, as fasta files; repeat content within MHC genes (intragenic) and between tandemly duplicated MHC-IIB genes (intergenic), as xlsx files; and gene order in MHC and MHC-related scaffolds/contigs, as gff/gff3 files.
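As a starting point for working with the fasta and gff files, the following minimal Python sketch (standard library only) reads sequences from a FASTA file and lists annotated features in positional order per scaffold/contig. The helper names read_fasta and gene_order are hypothetical, and the sketch assumes standard tab-separated nine-column GFF records. The feature type "gene" in column 3 is also an assumption; check README_explanationfile.txt and the files themselves for the feature types actually used.

```python
import io
from collections import defaultdict


def read_fasta(handle):
    """Return {record name: sequence} from an open FASTA file handle."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def gene_order(handle, feature_type="gene"):
    """Return {seqid: [(start, end, strand, attributes), ...]} sorted by start.

    `feature_type` is matched against GFF column 3; "gene" is an assumed
    default and may need changing for a given file.
    """
    by_seqid = defaultdict(list)
    for line in handle:
        if line.startswith("##FASTA"):
            break  # optional embedded sequence section; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        by_seqid[cols[0]].append((int(cols[3]), int(cols[4]), cols[6], cols[8]))
    return {seqid: sorted(features) for seqid, features in by_seqid.items()}


if __name__ == "__main__":
    # Tiny made-up inputs so the sketch runs without the dataset present.
    demo_fasta = io.StringIO(">seq1 example\nATGGCC\nTAA\n")
    demo_gff = io.StringIO(
        "##gff-version 3\n"
        "contigA\tdemo\tgene\t500\t900\t.\t-\t.\tID=g2\n"
        "contigA\tdemo\tgene\t100\t300\t.\t+\t.\tID=g1\n"
    )
    for name, seq in read_fasta(demo_fasta).items():
        print(name, len(seq))
    for seqid, features in gene_order(demo_gff).items():
        print(seqid, [attrs for _, _, _, attrs in features])
```

To use the deposited files, replace the demo handles with, for example, open("MHCI_full_length_nucleotide_manuedit.fasta") and open("HC.genes.core_tap_genes.webapollo.20220223.gff3"). The xlsx files are not handled here; they can be opened in a spreadsheet program.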