Reconstructing NOD-like receptor alleles with high internal conservation in Podospora anserina using long-read sequencing
Data files
Jan 28, 2025 version files 217.76 MB
- FixingHetDEdata_v2.zip (217.75 MB)
- README.md (4.55 KB)
Feb 05, 2025 version files 217.76 MB
- FixingHetDEdata_v2.zip (217.75 MB)
- README.md (4.73 KB)
Abstract
NOD-like receptors (NLRs) are intracellular immune receptors that detect pathogen-associated cues and trigger defense mechanisms, including regulated cell death. In filamentous fungi, some NLRs mediate heterokaryon incompatibility, a self/non-self recognition process that prevents the vegetative fusion of genetically distinct individuals, reducing the risk of parasitism. The het-d and het-e NLRs in Podospora anserina are highly polymorphic incompatibility genes (het genes) whose products recognize different alleles of the het-c gene via a sensor domain composed of WD40 repeats. These repeats display unusually high sequence identity, maintained by concerted evolution. However, some sites within individual repeats are hypervariable and under diversifying selection. Despite extensive genetic studies, inconsistencies in the reported WD40 domain sequence have hindered functional and evolutionary analyses. Here we demonstrate that the WD40 domain can be accurately reconstructed from long-read sequencing (Oxford Nanopore and PacBio) data, but not from Illumina-based assemblies. Functional alleles are usually formed by 11 highly conserved repeats, with different repeat combinations underlying the same phenotypic het-d and het-e incompatibility reactions. Protein structure models suggest that their WD40 domain folds into two 7-blade β-propellers composed of the highly conserved repeats, as well as three cryptic divergent repeats at the C-terminus. We additionally show that one particular het-e allele does not have an incompatibility reaction with common het-c alleles, despite being 11 repeats long. Our findings provide a robust foundation for future research into the molecular mechanisms and evolutionary dynamics of het NLRs, while also highlighting both the fragility and the flexibility of β-propellers as immune sensor domains.
README: Reconstructing NOD-like receptor alleles with high internal conservation in Podospora anserina using long-read sequencing
https://doi.org/10.5061/dryad.h18931zww
Principal Investigator Contact Information
Name: S. Lorena Ament-Velásquez
Institution: Stockholm University
ORCID: https://orcid.org/0000-0003-3371-9292
Dataset Overview
This dataset contains the whole genome assemblies required to replicate analyses in Ament-Velásquez et al. (2025), where we show that the WD40 repetitive domain of the NOD-like receptors het-e, het-d, and het-r from the fungus Podospora anserina can be reconstructed from long-read sequencing (Oxford Nanopore Technology and PacBio) data, but not from Illumina-based assemblies.
We sequenced the whole genomes of three wildtype strains (Y+, Z+, and Wa63+) and nine lab strains with Oxford Nanopore Technology (ONT; R10 flow cells). The lab strains were the product of backcrossing different het-e, het-d, and het-c alleles into the genomic background of strain s ("little s"). The backcrossed strains are designated by their reactive genotypes. For example, the strain CmEm- contains the het-c and het-e alleles of the French strain M, while having the non-reactive het-d allele of strain s. Likewise, the strain ChEhDa+ has the het-c and het-e alleles of the H strain and the het-d allele of the A strain. The exceptions are strains with a null het-c allele, here termed Co (CoEc+ and CoEf+).
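The naming scheme for the backcrossed strains can be read mechanically. The following is a minimal sketch, not part of the dataset or its pipeline, that assumes each uppercase letter (C, E, D) names a locus and the lowercase letter after it names the donor strain (with "o" marking the null het-c allele). The function name `parse_backcrossed_name` and the `sign` key are hypothetical, and the README does not spell out what the trailing + or - denotes, so it is only carried through unchanged.

```python
import re

# Locus letters used in the strain names: C = het-c, E = het-e, D = het-d.
LOCI = {"C": "het-c", "E": "het-e", "D": "het-d"}

# One or more (locus letter, allele letter) pairs followed by the trailing sign.
NAME_RE = re.compile(r"^((?:[CED][a-z])+)([+-])$")


def parse_backcrossed_name(name):
    """Split a backcrossed strain name such as 'ChEhDa+' into its parts.

    Returns a dict mapping each named locus to its allele letter, plus the
    trailing sign under the key 'sign'. Raises ValueError for names that do
    not follow the locus/allele pattern (e.g. the wildtype strain 'Wa63+').
    """
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a backcrossed genotype name: {name!r}")
    pairs, sign = match.groups()
    # Walk the string two characters at a time: locus letter, allele letter.
    genotype = {LOCI[pairs[i]]: pairs[i + 1] for i in range(0, len(pairs), 2)}
    genotype["sign"] = sign
    return genotype


if __name__ == "__main__":
    for strain in ("CmEm-", "ChEhDa+", "CoEc+"):
        print(strain, parse_backcrossed_name(strain))
```

Loci that are absent from a name (het-d in CmEm-, for instance) are simply missing from the returned dict; following the description above, they would correspond to the allele of the strain s background.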
In addition, we re-assembled the Illumina reads of strains Wa63+, Y+, Z+, and Wa137-, which were retrieved from NCBI’s Sequence Read Archive (accession numbers SRX5458088, SRX5458091, SRX11405146, and SRX8537866). There are two versions of SPAdes assemblies (varying the k-mers used) for most strains, and only one for Wa137-.
The nucleotide sequence alignment of het-e, het-d, and het-r from all assemblies is also provided, in fasta format, in the file 2024.09.10_hnwd_master_het_reAl_Illu_noGuides_noemptycols.fa. The pipeline used to process this fasta file is available here: https://github.com/SLAment/FixingHetDE
All genome assemblies are provided in fasta format.
Funding
This work was supported by the Swedish Research Council (grant 2022-00341) and the Stiftelsen Anna-Greta och Holger Crafoords fond (CR2023-0039) to S.L.A.-V.
Sharing/Access information
This work is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license.
Related Data Sources
We also analysed the whole genome assemblies from the following studies:
- Vogan et al. (2019) "Combinations of Spok genes create multiple meiotic drivers in Podospora", eLife 8:e46454 https://doi.org/10.7554/eLife.46454, data available here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.vm1192g
- Vogan et al. (2021) "The Enterprise, a massive transposon carrying Spok meiotic drive genes", Genome Res. 31:789-798 https://genome.cshlp.org/content/31/5/789, data available here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.4tmpg4f8m
- Ament-Velásquez (2024) "High-Quality Genome Assemblies of 4 Members of the Podospora anserina Species Complex", Genome Biology and Evolution, 16(3):evae034 https://doi.org/10.1093/gbe/evae034, data available here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.1vhhmgr0j
Files and variables
File: FixingHetDEdata_v2.zip
Description: This zip file contains all the genome assemblies from Illumina (folder Illu_assemblies) and ONT (folder ONT_assemblies) data, as well as the alignment of the het genes (2024.09.10_hnwd_master_het_reAl_Illu_noGuides_noemptycols.fa). All files are in fasta format.
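Because every file in the archive is plain fasta, the contents can be inspected without specialized software. The sketch below is a minimal illustration, not part of the authors' pipeline: `read_fasta` and `alignment_lengths` are hypothetical helper names, and the location of the alignment file inside the unzipped archive is an assumption that may need adjusting.

```python
def read_fasta(path):
    """Return a dict of {header: sequence} from a fasta file.

    Sequences split over several lines are joined. If a header occurs
    twice, the later record replaces the earlier one.
    """
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


def alignment_lengths(records):
    """Return the set of sequence lengths; an alignment should give one value."""
    return {len(seq) for seq in records.values()}


if __name__ == "__main__":
    # Assumed path: adjust to wherever the zip file was extracted.
    path = "2024.09.10_hnwd_master_het_reAl_Illu_noGuides_noemptycols.fa"
    records = read_fasta(path)
    print(len(records), "sequences; lengths:", alignment_lengths(records))
```

For the genome assemblies the same reader returns one entry per contig, although whole genomes held in a dict of strings will use a fair amount of memory.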
Code/software
Code associated with this study can be found at https://github.com/SLAment/FixingHetDE. Since all files are in fasta format, they can be opened in any standard text editor, or in specialized software such as AliView.
Access information
Illumina assemblies were derived from reads available at NCBI’s Sequence Read Archive (accession numbers SRX5458088, SRX5458091, SRX11405146, and SRX8537866). New sequencing data was deposited in NCBI’s Sequence Read Archive Bioproject PRJNA1216259.
Methods
This dataset consists of genome assemblies of three wildtype strains (Y+, Z+, and Wa63+) and nine lab strains that were the product of backcrossing different het-e, het-d, and het-c alleles into the genomic background of strain s ("little s"). Whole-genome DNA of most strains was extracted with the Zymo Quick-DNA Fungal/Bacterial Miniprep Kit D6005 (Zymo Research; https://zymoresearch.eu/). For the strain CmEm-, ~800 mg of mycelia were used for high-molecular-weight DNA extraction using the QIAGEN Genomic-tip 100/G kit (Qiagen). Oxford Nanopore Technology (ONT) sequencing was performed in-house using a Native Barcoding Kit 24 V14 SQK-NBD114.24 and a MinION Mk1C machine following the standard protocol. In total, 12 strains were barcoded into two pools (pool1: CmEm-, CoEc+, CoEc-, Y+, Z+, and Wa63+ with barcodes 1 to 6, and pool2: CoEf+, ChEhDa+, ChEhDa-, CaDa-, CsDf+, and CsDf- with barcodes 7 to 12). Each pool was sequenced in two separate R10.4.1 flow cells (FLO-MIN114).
Basecalling was performed using Dorado v. 0.5.3 (https://github.com/nanoporetech/dorado/) with the dna_r10.4.1_e8.2_400bps_sup@v4.3.0 model. The resulting BAM files were converted into fastq files with the bam2fq program of SAMtools v. 1.19.2 (Danecek et al. 2021). Reads corresponding to the DNA Control Sample (DNA CS) introduced during library preparation were removed using chopper v. 0.7.0 (De Coster and Rademakers 2023). For each sample, we removed reads that contained perfect matches to ONT native barcodes assigned to other samples. We then removed barcodes and performed minimal quality control with fastplong v. 0.2.2 (Chen 2023) and parameters --trimming_extension 20 -l 50 -q 15 -d 0.1 (hereafter, cleaned ONT reads). The cleaned ONT reads of each sample were used as input for Flye v. 2.9.3 (Kolmogorov et al. 2019), with parameters --nano-hq --iterations 2.
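The pooling layout and the cross-barcode filtering step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation (see the repository for that): it assumes the strains are listed in barcode order within each pool, it checks both the barcode and its reverse complement (the Methods only say "perfect matches"), and the function names are hypothetical. The real ONT native barcode sequences are not included and would have to be supplied by the user.

```python
# Pool layout as described in the Methods: barcodes 1-6 in pool 1, 7-12 in pool 2.
POOLS = {
    "pool1": ["CmEm-", "CoEc+", "CoEc-", "Y+", "Z+", "Wa63+"],
    "pool2": ["CoEf+", "ChEhDa+", "ChEhDa-", "CaDa-", "CsDf+", "CsDf-"],
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def barcode_numbers(pools):
    """Number strains consecutively across pools, starting at barcode 1.

    Assumes the listing order within each pool is the barcode order.
    """
    numbers = {}
    next_barcode = 1
    for pool in ("pool1", "pool2"):
        for strain in pools[pool]:
            numbers[strain] = next_barcode
            next_barcode += 1
    return numbers


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_foreign_barcode_reads(reads, own_barcode, barcode_seqs):
    """Keep reads with no perfect match to a barcode of another sample.

    reads: iterable of (name, sequence) tuples.
    own_barcode: barcode number assigned to this sample.
    barcode_seqs: dict {barcode number: sequence}, supplied by the user.
    """
    foreign = []
    for number, seq in barcode_seqs.items():
        if number != own_barcode:
            # Assumption: search both orientations of each foreign barcode.
            foreign.extend([seq.upper(), reverse_complement(seq.upper())])
    kept = []
    for name, sequence in reads:
        upper = sequence.upper()
        if not any(barcode in upper for barcode in foreign):
            kept.append((name, sequence))
    return kept
```

A call such as `drop_foreign_barcode_reads(reads, barcode_numbers(POOLS)["Y+"], barcode_seqs)` would then keep only the reads of strain Y+ that are free of barcodes from the other samples.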
In addition, the paired-end Illumina reads of the strains Wa63+, Y+, Z+, and Wa137- were retrieved from NCBI’s Sequence Read Archive (accession numbers SRX5458088, SRX5458091, SRX11405146, and SRX8537866) and assembled with SPAdes v. 4.0.0 (Prjibelski et al. 2020) using the --careful parameter and either the default k-mer setting (Wa63+, Z+, and Y+) or the k-mers 21, 33, 55, and 77 (all strains, "allkmers").
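The two SPAdes configurations differ only in whether a k-mer list is passed. A minimal sketch of how the two command lines could be assembled is given below. The helper `spades_command` and the file names in the usage note are hypothetical; `--careful`, `-k`, `-1`, `-2`, and `-o` are standard SPAdes options, but the authors' exact invocation (threads, memory limits, read file names) is not given here.

```python
def spades_command(read1, read2, out_dir, kmers=None):
    """Build a SPAdes argument list for paired-end reads with --careful.

    kmers: optional iterable of odd k-mer sizes; when omitted, SPAdes
    chooses its default k-mer values.
    """
    cmd = ["spades.py", "--careful", "-1", read1, "-2", read2, "-o", out_dir]
    if kmers:
        cmd += ["-k", ",".join(str(k) for k in kmers)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    default_run = spades_command("Y_R1.fq.gz", "Y_R2.fq.gz", "Y_default")
    allkmers_run = spades_command(
        "Y_R1.fq.gz", "Y_R2.fq.gz", "Y_allkmers", kmers=(21, 33, 55, 77)
    )
    print(" ".join(default_run))
    print(" ".join(allkmers_run))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` on a machine where SPAdes is installed.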
From all these assemblies, the nucleotide sequences of the het-e, het-d, and het-r genes were extracted and aligned manually (file 2024.09.10_hnwd_master_het_reAl_Illu_noGuides_noemptycols.fa).
Associated code can be found in the repository https://github.com/SLAment/FixingHetDE. New sequencing data was deposited in NCBI’s Sequence Read Archive Bioproject PRJNA1216259.
References
Chen S. 2023. Ultrafast one‐pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107.
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. 2021. Twelve years of SAMtools and BCFtools. GigaScience 10: giab008.
De Coster W, Rademakers R. 2023. NanoPack2: population-scale evaluation of long-read sequencing data. Bioinformatics 39: btad311.
Kolmogorov M, Yuan J, Lin Y, Pevzner PA. 2019. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37: 540–546.
Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. 2020. Using SPAdes De Novo Assembler. CP in Bioinformatics 70: e102.