Template-specific optimization of NGS genotyping pipelines reveals allele-specific variation in MHC gene expression

Efstratiou, Artemis1; Gaigher, Arnaud1; Künzel, Sven2; Teles, Ana1; Lenz, Tobias L.1

Published Jan 29, 2024 on Dryad. https://doi.org/10.5061/dryad.qfttdz0qb

Data files

Jan 29, 2024 version files (70.10 KB total)

Data_MHC-I.xlsx (40.36 KB)
Data_MHC-II.xlsx (28.32 KB)
README.md (1.42 KB)

Abstract

Using high-throughput sequencing for precise genotyping of multi-locus gene families, such as the Major Histocompatibility Complex (MHC), remains challenging due to the complexity of the data and difficulties in distinguishing genuine from erroneous variants. Several dedicated genotyping pipelines for data from high-throughput sequencing, such as next-generation sequencing (NGS), have been developed to tackle the ensuing risk of artificially inflated diversity. Here, we thoroughly assess three such multi-locus genotyping pipelines for NGS data, the DOC method, AmpliSAS and ACACIA, using MHC class IIβ datasets of three-spined stickleback gDNA, cDNA, and “artificial” plasmid samples with known allelic diversity. We show that genotyping of gDNA and plasmid samples at optimal pipeline parameters was highly accurate and reproducible across methods. However, for cDNA data, gDNA-optimal parameter configuration yielded decreased overall genotyping precision and consistency between pipelines. Further adjustments of key clustering parameters were required to account for higher error rates and larger variation in sequencing depth per allele, highlighting the importance of template-specific pipeline optimization for reliable genotyping of multi-locus gene families. Through accurate paired gDNA-cDNA typing and MHC-II haplotype inference, we show that MHC-II allele-specific expression levels correlate negatively with allele number across haplotypes. Lastly, sibship-assisted cDNA-typing of MHC-I revealed novel variants linked in haplotype blocks and a higher-than-previously-reported individual MHC-I allelic diversity. In conclusion, we provide novel genotyping protocols for the three-spined stickleback MHC-I and -II genes and evaluate the performance of popular NGS-genotyping pipelines. We also show that fine-tuned genotyping of paired gDNA-cDNA samples facilitates amplification bias-corrected MHC allele expression analysis.
