Signatures of hybridization in Trypanosoma brucei
Data files
Apr 01, 2025 version (2.10 GB in total)
assembled_genomes.zip: 2.08 GB
hybrid_and_parent_maxicircle_whole_coding_region.zip: 91.03 KB
hybrid_pooled_minicircles.zip: 1.19 MB
hybrid_VSG_pool.zip: 1.35 MB
maxicircle_sequences.zip: 152.88 KB
MES_loci.zip: 758.33 KB
MES_promoter.zip: 13.15 KB
README.md: 2.73 KB
sistrom_pooled_minicircles.zip: 5.93 MB
sistrom_VSG_pool.zip: 8.12 MB
Tbb_minicircles_hmm.zip: 150.87 KB
Trypan_glycop_C_hmm.zip: 50.66 KB
Abstract
Genetic exchange among disease-causing micro-organisms can generate progeny that combine different pathogenic traits. Though sexual reproduction has been described in trypanosomes, its impact on the epidemiology of Human African Trypanosomiasis (HAT) remains controversial. However, human-infective and non-human-infective strains of Trypanosoma brucei circulate in the same transmission cycles in HAT-endemic areas in sub-Saharan Africa, providing the opportunity for mating during the developmental cycle in the tsetse fly vector. Here we investigated inheritance among progeny from a laboratory cross of T. brucei and then applied these insights to genomic analysis of field-collected isolates to identify signatures of past genetic exchange. Genomes of two parental and four hybrid progeny clones with a range of DNA contents were assembled and analysed by k-mer and single-nucleotide polymorphism (SNP) frequencies to determine heterozygosity and chromosomal inheritance. Variant surface glycoprotein (VSG) genes and kinetoplast (mitochondrial) DNA maxi- and minicircles were extracted from each genome to examine how each of these components was inherited in the hybrid progeny. The same bioinformatic approaches were applied to an additional 37 genomes representing the diversity of T. brucei in sub-Saharan Africa and T. evansi. SNP analysis provided evidence of crossover events affecting all 11 pairs of megabase chromosomes and demonstrated that polyploid hybrids were formed post-meiotically and not by fusion of the parental diploid cells. VSGs and kinetoplast DNA minicircles were inherited biparentally, with approximately equal numbers from each parent, whereas maxicircles were inherited uniparentally. Extrapolation of these findings to field isolates allowed us to distinguish clonal descent from hybridization by comparing maxicircle genotype to VSG and minicircle repertoires. 
Discordance between maxicircle genotype and VSG and minicircle repertoires indicated inter-lineage hybridization. Significantly, some of the hybridization events we identified involved human-infective and non-human-infective trypanosomes circulating in the same geographic areas.
Dataset DOI: 10.5061/dryad.xd2547djb
Description of the data and file structure
Read data for the hybrid clones from the experimental cross are available from the NCBI Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) under project no. PRJNA795331.
All other data have been deposited here. The project contains the folders listed below. Each folder contains a “Readme” file with further details of the contents.
assembled_genomes
SPAdes genome assemblies used to predict open reading frames (ORFs) and, from these, VSGs. Contigs carry standard SPAdes headers (node number, contig length, coverage), for example:
>NODE_166886_length_56_cov_1939465.000000
GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGG
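To work with the assemblies programmatically, each header can be split into its parts. The Python sketch below is a minimal example, assuming only the NODE_<n>_length_<bp>_cov_<coverage> naming shown above; `parse_spades_header` and `parse_fasta` are hypothetical helper names, not part of the deposited data.

```python
import re
from typing import Iterable, Iterator, Tuple

# SPAdes contig names look like NODE_<number>_length_<bp>_cov_<coverage>.
SPADES_HEADER = re.compile(
    r"^>?NODE_(?P<node>\d+)_length_(?P<length>\d+)_cov_(?P<cov>[0-9.]+)"
)


def parse_spades_header(header: str) -> dict:
    """Split a SPAdes contig name into node number, length (bp) and coverage."""
    match = SPADES_HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"not a SPAdes contig header: {header!r}")
    return {
        "node": int(match["node"]),
        "length": int(match["length"]),
        "cov": float(match["cov"]),
    }


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Example use on an unzipped assembly (the path is a placeholder):
# with open("assembled_genomes/example_contigs.fasta") as handle:
#     for header, seq in parse_fasta(handle):
#         print(parse_spades_header(header)["length"], len(seq))
```

For the example contig above, the length field (56) matches the 56 bp of sequence shown.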
introgression_maps
Contains the VCF files used to construct the introgression maps, the script to derive them, and breakdowns of SNP types/density.
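The deposited script is the reference for how the maps were built. Purely to illustrate what a per-window breakdown of SNP types involves, the sketch below counts heterozygous and homozygous SNP genotypes in fixed-size windows for one sample column of a VCF. It is not the deposited script: the 10 kb window, the single-base SNP filter and the function names are assumptions chosen for the example.

```python
from collections import Counter
from typing import Iterable


def genotype_class(gt: str) -> str:
    """Classify a VCF GT value such as '0/1', '1|1' or '0/0/1' (any ploidy)."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def snp_density(vcf_lines: Iterable[str], window: int = 10_000, sample: int = 0) -> Counter:
    """Count genotype classes of single-base SNPs keyed by (chrom, window start, class).

    window is in bp; sample is the 0-based index of the sample column.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4].split(",")
        # Keep single-base substitutions only (drops indels, '.' and '*' alleles).
        if len(ref) != 1 or any(len(a) != 1 or a in (".", "*") for a in alts):
            continue
        keys = fields[8].split(":")
        values = fields[9 + sample].split(":")
        gt = values[keys.index("GT")]
        window_start = (pos - 1) // window * window + 1  # 1-based window start
        counts[(chrom, window_start, genotype_class(gt))] += 1
    return counts
```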
Hybrid_and_parent_maxicircle_whole_coding_region
FASTA file containing the coding regions of the maxicircle contigs, trimmed and placed in a common orientation.
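If you add further maxicircle contigs and want them in the same orientation as these sequences, one simple approach (an assumption, not necessarily the procedure used to build this file) is to count k-mers shared with a reference in both orientations and keep the better-matching strand. The function names and the default k = 15 below are hypothetical; trimming is not covered.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved; N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set:
    """All k-mers of seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_to_reference(seq: str, reference: str, k: int = 15) -> str:
    """Return seq or its reverse complement, whichever shares more k-mers with reference.

    Ties keep the input orientation.
    """
    ref_kmers = kmer_set(reference, k)
    forward_hits = len(kmer_set(seq, k) & ref_kmers)
    reverse_hits = len(kmer_set(reverse_complement(seq), k) & ref_kmers)
    return seq if forward_hits >= reverse_hits else reverse_complement(seq)
```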
Hybrid_pooled_minicircles
Contains the contigs identified from the selective subassembly of minicircles. Headers are #-separated: the isolate, followed by fields inherited from the SPAdes contig header (contig number, contig length, coverage depth), for example:
>Tbb_F1G2#59#1030#370##
GGGCGTGCAGATTTCACCATACACAAATACCGTGCTATTTTCGGGCATTTTTGAGGGCCGTGGTACTTCGAAAGGGG
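A minimal sketch for splitting these headers in Python, assuming the field order described above (isolate, contig number, contig length, coverage depth) and ignoring the trailing empty fields; `parse_minicircle_header` is a hypothetical name.

```python
from typing import NamedTuple


class MinicircleHeader(NamedTuple):
    isolate: str
    contig: int       # SPAdes contig number
    length: int       # contig length (bp)
    coverage: float   # coverage depth carried over from the SPAdes header


def parse_minicircle_header(header: str) -> MinicircleHeader:
    """Parse '>isolate#contig#length#coverage##'; trailing empty fields are ignored."""
    fields = header.strip().lstrip(">").split("#")
    if len(fields) < 4:
        raise ValueError(f"unexpected minicircle header: {header!r}")
    isolate, contig, length, coverage = fields[:4]
    return MinicircleHeader(isolate, int(contig), int(length), float(coverage))
```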
Hybrid_VSG_pool
Contains a FASTA file of the identified VSGs, whose entry headers record their origin. The #-separated fields give the isolate, SPAdes contig, contig length, coverage depth, position/orientation of the ORF, the HMM used, and the HMM e-value, for example:
>Tbb_F1G2_k-121#1576#5315#45#1836-3419(+)#Trypan_glycop_C#8e-2
MLKKLALTAIVLAFSNGRKATGAALNDGDNAKYFKPLCGIIRAASAAPEAVPEQPVLDDLEATALLINLSYASPKAMSELTA
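The seven header fields can be unpacked as in the Python sketch below. It assumes exactly the layout of the example above, with the ORF position written as start-end(strand); the coordinate convention (0- or 1-based) is not stated here, so the numbers are kept as written. `VsgHeader` and `parse_vsg_header` are hypothetical names.

```python
import re
from dataclasses import dataclass

# ORF position field, e.g. "1836-3419(+)"
ORF_POSITION = re.compile(r"^(\d+)-(\d+)\(([+-])\)$")


@dataclass
class VsgHeader:
    isolate: str
    contig: int
    contig_length: int
    coverage: float
    orf_start: int   # as written in the header; coordinate convention not stated
    orf_end: int
    strand: str      # '+' or '-'
    hmm: str
    evalue: float


def parse_vsg_header(header: str) -> VsgHeader:
    """Parse the seven #-separated fields of a VSG pool FASTA header."""
    fields = header.strip().lstrip(">").split("#")
    if len(fields) != 7:
        raise ValueError(f"expected 7 '#'-separated fields, got {len(fields)}: {header!r}")
    isolate, contig, length, coverage, position, hmm, evalue = fields
    match = ORF_POSITION.match(position)
    if match is None:
        raise ValueError(f"unrecognised ORF position: {position!r}")
    start, end, strand = int(match.group(1)), int(match.group(2)), match.group(3)
    return VsgHeader(isolate, int(contig), int(length), float(coverage),
                     start, end, strand, hmm, float(evalue))
```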
MES_promoter
The HMM used to identify MES contigs.
MES_loci
Presented as a FASTA file, with headers in the form:
>Cluster_1#Tbb_FIG2#114#31951#R
CCAGACACCCTTAGAGACAGAGGGGGTATGCAATCAACAAGCAACAGAAACAGAAAGAGGGGAAGAGAATAATA
where the #-separated fields give the contig (cluster) name, isolate, contig number in the genomic SPAdes assembly, contig length, and orientation relative to the assembly (R indicates that the sequence has been reversed).
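A minimal Python sketch for reading these headers. Two assumptions are made and should be checked against the folder's own Readme: that any orientation value other than R means the sequence is in assembly orientation, and that "reversed" means reverse-complemented. The helper names are hypothetical.

```python
from typing import NamedTuple


class MesHeader(NamedTuple):
    cluster: str
    isolate: str
    contig: int        # contig number in the genomic SPAdes assembly
    length: int        # contig length
    is_reversed: bool  # True when the orientation field is "R"


def parse_mes_header(header: str) -> MesHeader:
    """Parse '>cluster#isolate#contig#length#orientation'."""
    fields = header.strip().lstrip(">").split("#")
    if len(fields) != 5:
        raise ValueError(f"expected 5 '#'-separated fields: {header!r}")
    cluster, isolate, contig, length, orientation = fields
    return MesHeader(cluster, isolate, int(contig), int(length), orientation == "R")


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def to_assembly_orientation(seq: str, header: MesHeader) -> str:
    """Undo the reversal for 'R' entries, ASSUMING 'reversed' means reverse-complemented."""
    return seq.translate(_COMPLEMENT)[::-1] if header.is_reversed else seq
```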
Maxicircle_sequences
Identified maxicircle contigs from the field isolates.
Sistrom_pooled_minicircles
As for the hybrid minicircle FASTA above, but covering the other isolates and excluding the hybrids.
Sistrom_VSG_pool
As for the hybrid VSG FASTA above, but covering the other isolates and excluding the hybrids.
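Because the first #-separated header field names the isolate in both pooled files, the size of each isolate's VSG or minicircle repertoire can be tallied directly from the headers. The Python sketch below assumes that layout; note that the field is taken verbatim, so a name carrying a suffix (such as "_k-121" in the hybrid VSG example) is counted under that exact string. The function name and file path are hypothetical.

```python
from collections import Counter
from typing import Iterable


def entries_per_isolate(fasta_lines: Iterable[str]) -> Counter:
    """Count FASTA entries per isolate, taking the first '#'-separated header field."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            isolate = line[1:].split("#", 1)[0].strip()
            counts[isolate] += 1
    return counts


# Example (the path inside the unzipped folder is a placeholder):
# with open("sistrom_VSG_pool/vsg_pool.fasta") as handle:
#     for isolate, n in entries_per_isolate(handle).most_common():
#         print(isolate, n)
```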
Tbb_minicircles
The HMM derived from our assembled minicircles.
Trypan_glycop_C
The Pfam HMM for the VSG C-terminal domain, as available at the time of submission.
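Pfam models are distributed in HMMER3 format. Assuming HMMER3 is installed, predicted ORF translations could be searched against the VSG C-terminal domain model with `hmmsearch --tblout hits.tbl Trypan_glycop_C.hmm orfs.faa` (both file names are placeholders), and the per-target table filtered in Python as sketched below. The column positions follow HMMER's per-target tabular format; the e-value cut-off is left to the caller because no threshold is stated here. `parse_tblout` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def parse_tblout(lines: Iterable[str], max_evalue: float) -> List[Tuple[str, str, float, float]]:
    """Return (target, query, full-sequence E-value, bit score) for hits at or below max_evalue.

    Reads HMMER3 --tblout output: whitespace-separated columns, '#' comment lines,
    and a free-text description in the last column (which may contain spaces).
    """
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split(maxsplit=18)  # 18 fixed columns + description
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, query, evalue, score))
    return hits
```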
Code/software
Any text editor; the sequence files are plain-text FASTA.