Dryad

Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture

Cite this dataset

Schurz, Haiko et al. (2022). Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture [Dataset]. Dryad. https://doi.org/10.5061/dryad.6wwpzgn2s

Abstract

The heritability of susceptibility to tuberculosis disease (TB) has been well recognized. Over one hundred genes have been studied as candidates for TB susceptibility, and several variants have been identified by genome-wide association studies (GWAS), but few have been replicated. We established the International Tuberculosis Host Genetics Consortium (ITHGC) to perform a multi-ancestry meta-analysis of GWAS including 14,153 cases and 19,536 controls of African, Asian, and European ancestry. Our analyses demonstrate a substantial degree of heritability (pooled polygenic h² = 26.3%, 95% CI 23.7–29.0%) for susceptibility to TB that is shared across ancestries, highlighting an important host genetic influence on disease. We identified one global host genetic correlate for TB at genome-wide significance (p < 5×10⁻⁸) in the human leukocyte antigen (HLA)-II region (rs28383206, p-value = 5.2×10⁻⁹). These data demonstrate the complex shared genetic architecture of susceptibility to TB and the importance of large-scale GWAS analysis across multiple ancestries experiencing different levels of infection pressure.

Methods

This analysis includes 12 of the 17 published (and unpublished) GWAS of TB (with HIV-negative cohorts) prior to 2022. It excludes data from Iceland and Vietnam, as the investigators declined to share data. It excludes data from China, Korea, Peru, and Japan, as data-sharing agreements could not be finalized in time for this analysis. The Indonesian data were not suitable for reliable imputation, and the Moroccan data were family-based and thus also not suitable for this meta-analysis. Finally, genotyped TB cases and controls are also available in the UK Biobank, but these data were excluded to avoid biasing the results, because genetic association studies on such highly selected datasets need to be undertaken with caution.

Included individuals were genotyped on a variety of genotyping arrays. Raw genotyping data were available for eight datasets; for the remaining four, association-testing summary statistics were obtained for use in the meta-analysis. Quality control (QC) of the datasets with raw genotyping data was done using PLINK (v1.9), followed by pre-phasing with SHAPEIT and imputation with IMPUTE2 using the 1000 Genomes Phase 3 reference panel. QC and imputation were done as described previously; briefly, we used a minor allele frequency filter of 0.025 and an individual and SNP missingness filter of 0.1. The Hardy–Weinberg equilibrium threshold was set at a Bonferroni-corrected p-value according to the number of SNPs tested (0.05/number of SNPs), and samples whose sex could not be determined from genotyping were also removed. Imputed data were filtered at a quality score of 0.3, prior to the individual and genotype filtration steps. Prior to QC and imputation, allele orientation was corrected using Genotype Harmonizer version 1.4.15, and the genome build of all datasets was checked for consistency (GRCh37) and updated if necessary using the liftOver software from the UCSC Genome Browser. The four datasets with only summary statistics available were imputed and quality-controlled during the original investigations, but the marker names and allele orientation were checked for concordance between the summary statistics and the rest of the consortium's imputed data.
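The variant-level thresholds described above can be illustrated with a short Python sketch. This is not the consortium's pipeline, which used PLINK; it only shows how the stated cut-offs (minor allele frequency 0.025, missingness 0.1, and a Hardy–Weinberg p-value threshold of 0.05 divided by the number of SNPs tested) combine into one pass/fail decision. The `VariantStats` class, its field names, and the choice to treat values exactly at a threshold as passing are assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds quoted in the Methods text.
MAF_MIN = 0.025          # minor allele frequency filter
MISSINGNESS_MAX = 0.1    # SNP missingness filter
ALPHA = 0.05             # family-wise alpha for the Bonferroni correction


@dataclass
class VariantStats:
    """Per-SNP summary values (hypothetical container, not a file format)."""
    snp_id: str
    maf: float           # minor allele frequency
    missingness: float   # fraction of missing genotype calls
    hwe_p: float         # Hardy-Weinberg equilibrium test p-value


def hwe_threshold(n_snps_tested: int, alpha: float = ALPHA) -> float:
    """Bonferroni-corrected HWE p-value threshold: alpha / number of SNPs."""
    if n_snps_tested <= 0:
        raise ValueError("n_snps_tested must be positive")
    return alpha / n_snps_tested


def passes_qc(variant: VariantStats, hwe_cutoff: float) -> bool:
    """Return True if the variant survives all three filters.

    Assumption: values exactly at a threshold are kept.
    """
    return (
        variant.maf >= MAF_MIN
        and variant.missingness <= MISSINGNESS_MAX
        and variant.hwe_p >= hwe_cutoff
    )


if __name__ == "__main__":
    # Made-up values, purely to exercise the functions.
    cutoff = hwe_threshold(500_000)
    example = VariantStats("example_snp", maf=0.10, missingness=0.02, hwe_p=0.4)
    print(cutoff, passes_qc(example, cutoff))
```

The Bonferroni step makes the Hardy–Weinberg filter stricter as more SNPs are tested, so only variants with very strong departures from equilibrium are removed.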

Usage notes

Summary statistics are stored as compressed .zip files.
Unzipped files can be opened with any text editor or Excel, or read into the R programming environment (or equivalent data analysis software). Note that the files are space-separated and have a header row. Each row in the files contains information for a single SNP.
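As one example of "equivalent data analysis software", the following Python sketch reads a summary statistics file directly from its .zip archive using only the standard library. It relies solely on what is stated above: the files are space-separated, have a header row, and hold one SNP per row. The function name `iter_summary_rows`, the archive path, and the assumption that the archive's first member is the data file are hypothetical. Column names are taken from the file's own header rather than assumed, and UTF-8 encoding is also an assumption.

```python
import io
import zipfile
from typing import Dict, Iterator, Optional


def iter_summary_rows(zip_path: str, member: Optional[str] = None) -> Iterator[Dict[str, str]]:
    """Yield one dict per SNP row, keyed by the column names in the header.

    zip_path: path to a downloaded archive (hypothetical name in the example).
    member:   file inside the archive; defaults to the first one listed.
    """
    with zipfile.ZipFile(zip_path) as archive:
        name = member if member is not None else archive.namelist()[0]
        with archive.open(name) as raw:
            lines = io.TextIOWrapper(raw, encoding="utf-8")
            header_line = next(lines, None)
            if header_line is None:
                return  # empty file: nothing to yield
            header = header_line.split()
            for line in lines:
                fields = line.split()  # splits on runs of whitespace
                if fields:             # skip blank lines
                    yield dict(zip(header, fields))


if __name__ == "__main__":
    import itertools
    # "summary_statistics.zip" is a placeholder, not an actual file name.
    for row in itertools.islice(iter_summary_rows("summary_statistics.zip"), 5):
        print(row)
```

All values are returned as strings; numeric columns need to be converted (for example with `float`) once the actual column names have been inspected. Streaming row by row avoids loading a genome-wide file into memory at once.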

All authors have approved sharing the summary statistics of the analysis. The original individual datasets are available only on request from the original authors.