Data from: Repeated loss of function at HD mating-type genes and of recombination suppression without mating-type locus linkage in anther-smut fungi
Data files
Mar 14, 2025 version: 326.69 MB of files
- featurecounts_noLinsert_A_on_MPbarcode50flye (2.23 MB)
- featurecounts_noLinsert_a1a2_mated_48-1_on_Mv-lyc-1064-A2_A1impMAT (2.29 MB)
- featurecounts_noLinsert_a1a2_mated_48-2_on_Mv-lyc-1064-A2_A1impMAT (2.30 MB)
- featurecounts_noLinsert_B_on_MPbarcode50flye (2.23 MB)
- featurecounts_noLinsert_C_on_MPbarcode50flye (2.23 MB)
- gene_trees (59.49 KB)
- MPbarcode50_tsebra.gtf (26.53 MB)
- MPbarcode50.fa (42.49 MB)
- Mscorzo-A1_renamed.fa (27.18 MB)
- Mscorzo-A2_renamed.fa (28.08 MB)
- Mv-cat-C212-A1.fa (44.46 MB)
- Mv-cat-C212-A2.fa (44.57 MB)
- Mv-lyc-1064_impuntedMAT_region_gene.gtf (4.40 MB)
- Mv-sup-1065-A1.fa (48.79 MB)
- Mv-sup-1065-A2.fa (48.83 MB)
- README.md (6.43 KB)
- speciestree.tre (1.62 KB)
Abstract
A wide diversity of mating systems occur in nature, with frequent evolutionary transitions in mating-compatibility mechanisms. Basidiomycete fungi typically have two mating-type loci controlling mating compatibility, HD and PR, usually residing on different chromosomes. In Microbotryum anther-smut fungi, there have been repeated events of linkage between the two mating-type loci through chromosome fusions, leading to large non-recombining regions. By generating high-quality genome assemblies, we found that two sister Microbotryum species parasitizing Dianthus plants, M. superbum and M. shykoffianum, as well as the distantly related M. scorzonarae, have their HD and PR mating-type loci on different chromosomes, but with the PR mating-type chromosome fused with part of the ancestral HD chromosome. Furthermore, progressive extensions of recombination suppression have generated evolutionary strata. In all three species, rearrangements suggest the existence of a transient stage of HD-PR linkage by whole chromosome fusion, and, unexpectedly, the HD genes lost their function. In M. superbum, multiple natural diploid strains were homozygous, and the disrupted HD2 gene was hardly expressed. Mating tests confirmed that a single genetic factor controlled mating compatibility (i.e. PR) and that haploid strains with identical HD alleles could mate and produce infectious hyphae. The HD genes have therefore lost their function in the control of mating compatibility in these Microbotryum species. While the loss of function of PR genes in mating compatibility has been reported in a few basidiomycete fungi, these are the first documented cases for the loss of mating-type determination by HD genes in heterothallic fungi. The control of mating compatibility by a single genetic factor is beneficial under selfing and can thus be achieved repeatedly, through evolutionary convergence in distant lineages, involving different genomic or similar pathways.
Data description
This repository includes three types of data associated with Lucotte et al., 2025.
- Genome assemblies and annotation.
- RNAseq quantification to explore HD expression.
- Gene trees to explore trans-specific polymorphism in non-recombining regions.
Files and variables
FILES
File: MPbarcode50.fa
Description: Whole genome assembly in fasta format obtained with flye from ONT reads of a haploid sample.
File: Mscorzo-A1_renamed.fa
Description: Whole genome assembly in fasta format obtained with canu from ONT reads of a haploid sample.
File: Mscorzo-A2_renamed.fa
Description: Whole genome assembly in fasta format obtained with canu from ONT reads of a haploid sample.
File: Mv-cat-C212-A1.fa
Description: Whole genome assembly in fasta format obtained with hifiasm from HiFi reads of a diploid sample (haplotype 2).
File: Mv-cat-C212-A2.fa
Description: Whole genome assembly in fasta format obtained with hifiasm from HiFi reads of a diploid sample (haplotype 1).
File: Mv-sup-1065-A1.fa
Description: Whole genome assembly in fasta format obtained with canu from HiFi reads of a haploid sample.
File: Mv-sup-1065-A2.fa
Description: Whole genome assembly in fasta format obtained with canu from HiFi reads of a haploid sample.
File: Mv-lyc-1064_impuntedMAT_region_gene.gtf
Description: General Transfer Format (GTF); nine tab-separated columns.
File: MPbarcode50_tsebra.gtf
Description: General Transfer Format (GTF); nine tab-separated columns.
File: featurecounts_noLinsert_a1a2_mated_48-1_on_Mv-lyc-1064-A2_A1impMAT
Description: Fractional read counts per gene feature; seven tab-separated columns.
File: featurecounts_noLinsert_a1a2_mated_48-2_on_Mv-lyc-1064-A2_A1impMAT
Description: Fractional read counts per gene feature; seven tab-separated columns.
File: featurecounts_noLinsert_A_on_MPbarcode50flye
Description: Fractional read counts per gene feature; seven tab-separated columns.
File: featurecounts_noLinsert_B_on_MPbarcode50flye
Description: Fractional read counts per gene feature; seven tab-separated columns.
File: featurecounts_noLinsert_C_on_MPbarcode50flye
Description: Fractional read counts per gene feature; seven tab-separated columns.
File: gene_trees
Description: Two-column file giving, for each gene, the imputed evolutionary stratum and the gene tree in Newick format.
File: speciestree.tre
Description: Species tree in Newick format.
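The gene_trees file can be loaded with a few lines of Python. This is a minimal sketch that assumes each row holds the stratum label and the Newick string separated by whitespace (tab or space) and that there is no header row; the separator, the absence of a header and the stratum labels used in the example (`stratum_1`, `stratum_2`) are assumptions, not taken from the file itself.

```python
from collections import defaultdict


def read_gene_trees(lines):
    """Group Newick gene trees by the stratum given in the first column.

    Assumes whitespace-separated columns and no header row.
    """
    by_stratum = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stratum, newick = line.split(maxsplit=1)
        by_stratum[stratum].append(newick)
    return dict(by_stratum)


# Hypothetical rows; the real file would be read with open("gene_trees").
example = [
    "stratum_1\t((a,b),c);",
    "stratum_2\t(a,(b,c));",
    "stratum_1\t(a,b,c);",
]
trees = read_gene_trees(example)
print({stratum: len(newicks) for stratum, newicks in trees.items()})
# prints {'stratum_1': 2, 'stratum_2': 1}
```

With the real file, `with open("gene_trees") as fh: trees = read_gene_trees(fh)` gives one list of Newick strings per stratum.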
DATA DESCRIPTION
Each *gtf file has nine tab-separated columns, as follows (see the GFF/GTF specification):
- seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix.
- source - name of the program that generated this feature, or the data source (database or project name)
- feature - feature type name, e.g. Gene, Variation, Similarity
- start - Start position of the feature, with sequence numbering starting at 1.
- end - End position of the feature, with sequence numbering starting at 1.
- score - A floating point value, or '.' if absent.
- strand - defined as + (forward) or - (reverse).
- frame - One of '0', '1' or '2' ('.' for features that are not coding). '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
- attribute - A semicolon-separated list of tag-value pairs, providing additional information about each feature.
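As a sketch of how the nine columns map onto a record, the dependency-free Python parser below reads a GTF file line by line and splits the attribute column into a dictionary. It assumes `key "value";` attribute pairs; lines whose attribute column is a bare identifier (as AUGUSTUS-style gene and transcript lines in BRAKER/TSEBRA output may be) are stored under `_id`, a hypothetical key chosen here. The example records are made up.

```python
from dataclasses import dataclass, field


@dataclass
class GtfRecord:
    seqname: str
    source: str
    feature: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str   # kept as text: a float or "."
    strand: str  # "+" or "-"
    frame: str   # "0", "1", "2" or "."
    attributes: dict = field(default_factory=dict)

    @property
    def length(self):
        # Both coordinates are inclusive, hence the +1.
        return self.end - self.start + 1


def parse_attributes(text):
    """Split 'key "value"; key "value";' into a dict."""
    attrs = {}
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, sep, value = chunk.partition(" ")
        if sep:
            attrs[key] = value.strip().strip('"')
        else:
            attrs["_id"] = chunk  # hypothetical key for bare identifiers
    return attrs


def parse_gtf(lines):
    """Yield one GtfRecord per non-empty, non-comment line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        seqname, source, feature, start, end, score, strand, frame, attr = cols
        yield GtfRecord(seqname, source, feature, int(start), int(end),
                        score, strand, frame, parse_attributes(attr))


# Made-up records for illustration.
example = [
    'contig_1\tAUGUSTUS\tgene\t101\t400\t.\t+\t.\tg1',
    'contig_1\tAUGUSTUS\tCDS\t101\t250\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
]
records = list(parse_gtf(example))
```

Passing an open file handle, e.g. `parse_gtf(open("MPbarcode50_tsebra.gtf"))`, streams records without loading the whole file.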
Each featurecounts* file has seven tab-separated columns, as follows. The command used is recorded in the first row, column names are defined in the second row, and data start at the third row.
- Geneid - Gene/transcript identifier, as in the attribute column of the gtf file
- Chr - Assembled pseudomolecule identifier, as in the seqname column of the gtf file
- Start - Semicolon-separated list of CDS feature start positions
- End - Semicolon-separated list of CDS feature end positions
- Strand - defined as + (forward) or - (reverse)
- Length - Summed length of the CDS features associated with the same Geneid
- <string> - Total fractional read counts across the feature
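The layout above can be read with Python's csv module. The sketch below skips the command row, uses the header row, and returns one dictionary per gene. It also computes transcripts per million (TPM) from the count and Length columns; TPM is a normalization chosen here for illustration (for example, to compare the expression of HD2 with other genes) and is not stated to be the authors' method. Taking the first element of Chr and Strand is a precaution in case they are repeated once per CDS. The example table is made up.

```python
import csv
import io


def read_featurecounts(handle):
    """Parse a featureCounts table: row 1 = command, row 2 = header, then data."""
    command = handle.readline().rstrip("\n")
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = []
    for cols in reader:
        if not cols:
            continue
        rec = dict(zip(header, cols))
        rows.append({
            "Geneid": rec["Geneid"],
            # Chr/Strand may be repeated once per CDS; keep the first value.
            "Chr": rec["Chr"].split(";")[0],
            "starts": [int(x) for x in rec["Start"].split(";")],
            "ends": [int(x) for x in rec["End"].split(";")],
            "Strand": rec["Strand"].split(";")[0],
            "Length": int(rec["Length"]),
            "count": float(cols[6]),  # 7th column (header varies per file)
        })
    return command, rows


def tpm(rows):
    """Transcripts per million from fractional counts and summed CDS length."""
    rates = [r["count"] / r["Length"] for r in rows]
    total = sum(rates)
    if total == 0:
        return {r["Geneid"]: 0.0 for r in rows}
    return {r["Geneid"]: 1e6 * rate / total for r, rate in zip(rows, rates)}


# Made-up table in the layout described above.
example = io.StringIO(
    "# made-up command line\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "g1\tctg1;ctg1\t100;300\t200;400\t+;+\t202\t50.5\n"
    "g2\tctg2\t10\t109\t-\t100\t0\n"
)
command, rows = read_featurecounts(example)
print(tpm(rows))  # prints {'g1': 1000000.0, 'g2': 0.0}
```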
Code/software
Genome assembly
- Mscorzo-A[12]_renamed.fa: canu 1.7b with default parameters on ONT reads
- Mv-cat-C212-A[12].fa: hifiasm 0.16.1-r375 with default parameters on HiFi reads
- Mv-sup-1065-A[12].fa: canu 2.2 with the -pacbio-hifi parameter on HiFi reads
- MPbarcode50.fa: flye 2.9.4-b1799 with the -g 48.7m -i 3 --nano-raw parameters on ONT reads
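To check an assembly against these settings, for instance against the 48.7 Mb genome size passed to flye with `-g 48.7m`, the sequence lengths in each FASTA file can be summarized. A minimal sketch in plain Python; the N50 statistic is a common contiguity measure added here for illustration and is not reported in this README. The example sequences are made up.

```python
def read_fasta_lengths(lines):
    """Return {sequence_id: length}; the ID is the first word after '>'."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


example = [">ctg1 len=8", "ACGTACGT", ">ctg2", "ACG", "TA", ">ctg3", "A"]
lengths = read_fasta_lengths(example)
total = sum(lengths.values())
```

For MPbarcode50.fa, `with open("MPbarcode50.fa") as fh: lengths = read_fasta_lengths(fh)` followed by `sum(lengths.values()) / 48.7e6` compares the assembled size with the flye `-g` estimate.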
Genome annotation
Whole genome shotgun assemblies were annotated with BRAKER2 using RNAseq or conserved-protein evidence, and the resulting predictions were combined with TSEBRA. The genome annotation workflow corresponds to Step I described here.
RNAseq quantification
Fractional read counts per transcript feature were summarized with featureCounts from the Subread package, based on STAR 2.7.11a read alignments. The specific commands are recorded in the first row of each featurecounts* file.
Phylogenetic trees
The species tree (from concatenated alignments of single-copy genes) and individual gene trees were obtained with IQ-TREE2 2.2.6 and processed with gotree and newick_utils, as described in the methods section of this repository.
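Trans-specific polymorphism in non-recombining regions is expected to show up in gene trees as alleles grouping by mating type (A1 with A1, A2 with A2) across species rather than by species. The check below is a dependency-free sketch of that idea, not the gotree/newick_utils processing used for this dataset. It assumes leaf names carry the mating type as a substring (here `A1`, as in the assembly file names); the leaf names in the example are hypothetical. Because the trees are midpoint-rooted, the test accepts a split on either side of the root.

```python
def parse_newick(text):
    """Parse Newick into nested lists of leaf names (minimal sketch).

    Branch lengths and internal labels (e.g. support values) are discarded;
    quoted labels and [comments] are not supported.
    """
    text = "".join(text.split())  # drop whitespace between tokens
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        return text[start:pos].split(":")[0]  # strip ":branch_length"

    def read_node():
        nonlocal pos
        if text[pos] != "(":
            return read_label()  # leaf
        pos += 1  # consume "("
        children = [read_node()]
        while text[pos] == ",":
            pos += 1
            children.append(read_node())
        if text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1  # consume ")"
        read_label()  # discard this node's support value / branch length
        return children

    return read_node()


def collect_clades(node, found):
    """Append the leaf set of every node to `found`; return this node's leaves."""
    if isinstance(node, str):
        leaf_set = frozenset([node])
    else:
        leaf_set = frozenset().union(*(collect_clades(c, found) for c in node))
    found.append(leaf_set)
    return leaf_set


def splits_by_tag(newick, tag="A1"):
    """True if leaves containing `tag` are separated from all other leaves
    by one branch, i.e. they form a clade on either side of the root."""
    found = []
    all_leaves = collect_clades(parse_newick(newick), found)
    target = frozenset(leaf for leaf in all_leaves if tag in leaf)
    if not target or target == all_leaves:
        return False
    return target in found or (all_leaves - target) in found


by_mating_type = "((Msup_A1:0.1,Mshy_A1:0.1)95:0.02,(Msup_A2:0.1,Mshy_A2:0.1)98:0.02);"
by_species = "((Msup_A1,Msup_A2),(Mshy_A1,Mshy_A2));"
```

Here `splits_by_tag(by_mating_type)` is true and `splits_by_tag(by_species)` is false; applied to each row of gene_trees, it would give a per-stratum tally of trees that group alleles by mating type.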
Access information
Other publicly accessible locations of the data:
- Additional custom scripts are available at https://github.com/eliselucotte/MicrobotryumOnDianthus
- Additional data, including SRA records of the WGS and RNAseq reads, can be found at NCBI (GenBank/SRA) under BioProjects PRJNA1169769, PRJNA1206992 and PRJNA1206994
Data was derived from the following sources:
- Previously obtained RNAseq data is available at BioProject PRJNA246470
Read mapping and quantification
Reads matching the reference genome were obtained with bbduk from BBMap [Ref:https://sourceforge.net/projects/bbmap/] with default options for quality and adapter trimming. Matching reads were then mapped back to the reference genome with STAR 2.7.11a [Ref:https://doi.org/10.1093/bioinformatics/bts635] using the default options for paired-end reads. Reads with inserts larger than 5 kbp were filtered out. Read counts per transcript feature were obtained with featureCounts from the Subread 2.0.2 suite [Ref:https://doi.org/10.1093/bioinformatics/btt656].
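The 5 kbp insert filter can be illustrated on SAM records. This sketch assumes that "insert" refers to the absolute template length in the SAM TLEN field (column 9); the exact criterion and the tool used for this step are not stated here. STAR's `--alignMatesGapMax` and `--alignIntronMax` options are another way to cap mate distances at the alignment stage. The records in the example are made up.

```python
MAX_INSERT = 5_000  # 5 kbp, the threshold given in the text


def keep_alignment(sam_line, max_insert=MAX_INSERT):
    """Keep header lines and alignments whose |TLEN| is within the limit."""
    if sam_line.startswith("@"):
        return True
    fields = sam_line.rstrip("\n").split("\t")
    tlen = int(fields[8])  # signed observed template length; 0 if undefined
    return abs(tlen) <= max_insert


def filter_sam(lines, max_insert=MAX_INSERT):
    """Return the SAM lines that pass keep_alignment."""
    return [line for line in lines if keep_alignment(line, max_insert)]


header = "@HD\tVN:1.6"
near = "r1\t99\tctg1\t100\t255\t50M\t=\t300\t250\t*\t*"
far = "r2\t99\tctg1\t100\t255\t50M\t=\t9000\t8950\t*\t*"
kept = filter_sam([header, near, far])
```

With spliced RNAseq alignments, TLEN spans any introns lying between the mates, so a TLEN-based filter also removes pairs separated by long introns.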
Trans-specific polymorphism
Gene trees were obtained with IQ-TREE 2.2.6 based on MACSE v2.05 codon-based alignments. Gene trees were midpoint-rooted with gotree [Ref:https://doi.org/10.1093/
- Lucotte, Elise A; Jay, Paul; Rougemont, Quentin et al. (2024). Repeated loss of function at HD mating-type genes and of recombination suppression without mating-type locus linkage in anther-smut fungi [Preprint]. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2024.03.03.583181
