Dryad

Large-scale fungal strain sequencing unravels the molecular diversity in mating loci maintained by long-term balancing selection

Cite this dataset

Peris, David et al. (2024). Large-scale fungal strain sequencing unravels the molecular diversity in mating loci maintained by long-term balancing selection [Dataset]. Dryad. https://doi.org/10.5061/dryad.fxpnvx0t4

Abstract

Balancing selection, an evolutionary force that retains genetic diversity, has been detected in multiple genes and organisms, such as the sexual mating loci in fungi. However, quantifying the strength of balancing selection and defining the mating-related genes requires a large number of strains. In tetrapolar basidiomycete fungi, sexual type is determined by two unlinked loci, MATA and MATB. Genes in both loci define mating type identity and control successful mating and completion of the life cycle. These loci are usually highly diverse. Previous studies have speculated, based on culture crosses, that species of the non-model genus Trichaptum (Hymenochaetales, Basidiomycota) possess a tetrapolar mating system with multiple alleles. Here, we sequenced a hundred and eighty strains of three Trichaptum species. We characterized the chromosomal location of MATA and MATB, the molecular structure of MAT regions and their allelic richness. The sequencing effort was sufficient to molecularly characterize multiple MAT alleles segregating before the speciation event of Trichaptum species. Analyses suggested that long-term balancing selection has generated trans-species polymorphisms. Mating sequences were classified into different allelic classes based on an amino acid identity (AAI) threshold supported by phylogenetics. A total of 17,550 mating types were predicted based on the allelic classes. In vitro crosses allowed us to support the degree of allelic divergence needed for successful mating. Even with the high amount of divergence, key amino acids in functional domains are conserved. We conclude that the genetic diversity of mating loci in Trichaptum is due to long-term balancing selection, with limited recombination and duplication activity. The large number of sequenced strains highlighted the importance of sequencing multiple individuals from different species to detect the mating-related genes, the mechanisms generating diversity and the evolutionary forces maintaining them.

README: Large-scale fungal strain sequencing unravels the molecular diversity in mating loci maintained by long-term balancing selection (published in PLOS Genetics, 2022)

https://doi.org/10.5061/dryad.fxpnvx0t4

Additional information is described on the dedicated GitHub page.

Information about the data in the Dryad repository

iWGS_SPAdes_Assemblies.tar.gz:

Compressed file with genome assemblies for individuals sequenced by Illumina technology. Additional information about these assemblies can be found in Supplementary Table 1 of the manuscript.

CrossingPictures.rar:

Compressed picture files related to the experimental crosses.

IndividualGeneAlignments_trimmed.zip:

Compressed file with trimmed alignments of the coding sequences (CDS) and amino acid sequences (aa).

MATA.zip:

Assembled MATA regions (.fas) and annotations (.gff) for each specimen.

MATB.zip:

Assembled MATB regions (.fas) and annotations (.gff) for each specimen.
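
As a starting point for working with the MATA.zip and MATB.zip archives, the following minimal Python sketch lists the length of every sequence stored in the .fas files of an archive without unpacking it. It assumes the .fas files are plain-text FASTA; the internal folder layout of the archives is not described here, so the code simply scans every member whose name ends in .fas. The function names `read_fasta` and `fasta_lengths_in_zip` are illustrative and are not part of the dataset or of the authors' pipeline.

```python
import io
import zipfile


def read_fasta(handle):
    """Yield (header, sequence) pairs from a text handle in FASTA format."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_lengths_in_zip(zip_source):
    """Map 'member::header' to sequence length for each .fas member of a zip.

    zip_source may be a path or a binary file-like object.
    Assumption: .fas members are UTF-8/ASCII text in FASTA format.
    """
    lengths = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".fas"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8")
                for header, seq in read_fasta(text):
                    lengths[f"{name}::{header}"] = len(seq)
    return lengths


# Example call (requires the archive to be downloaded first):
# for key, length in sorted(fasta_lengths_in_zip("MATA.zip").items()):
#     print(key, length)
```

The same function can be pointed at MATB.zip; the .gff members are skipped because only names ending in .fas are read.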

SourceData.rar:

Compressed file with the raw data used to generate the figures and tables in the manuscript:

  • IQTree_logFiles.tar.gz: IQTree log files with information to replicate the phylogenetic reconstruction represented in iTOL.
  • AllvsAll_distances.meg: Converted Average Nucleotide Identity (ANI) values used for reconstructing a Neighbour-Joining tree.
  • BUSCO_MAT_info.csv: BUSCO annotation statistics and location on TA10106M1 genome
  • dxy.csv: Absolute divergence statistic for BUSCO and MAT genes
  • Fst.csv: Relative divergence statistic for BUSCO and MAT genes
  • MKT.csv: Multilocus Hudson–Kreitman–Aguadé (HKA) test performed with HKAdirect 0.7b
  • paml.csv: Average number of synonymous substitutions per synonymous sites (dS) and non-synonymous substitutions per non-synonymous sites (dN) for BUSCO and MAT genes
  • pi.csv: Nucleotide diversity values for BUSCO and MAT genes
  • tajimaD.csv: Tajima’s D values for BUSCO and MAT genes
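
To make the statistics in pi.csv and dxy.csv more concrete, the sketch below computes nucleotide diversity (pi, the mean pairwise difference per site within a group) and absolute divergence (dxy, the mean pairwise difference per site between two groups) on a toy alignment. This is only an illustration of the textbook definitions, not the software or settings used for the manuscript. Two assumptions are made here: sequences are already aligned to equal length, and sites with a gap or N in either sequence of a pair are skipped (pairwise deletion). The toy sequences and function names are hypothetical.

```python
from itertools import combinations, product


def pairwise_diff(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or 'N' are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def nucleotide_diversity(seqs):
    """pi: mean per-site difference over all pairs within one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def dxy(group_x, group_y):
    """dxy: mean per-site difference over all between-group pairs."""
    pairs = list(product(group_x, group_y))
    if not pairs:
        raise ValueError("both groups need at least one sequence")
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical data, 8 sites per sequence).
group_a = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
group_b = ["ACGAACGT", "ACGAACGT"]

pi_a = nucleotide_diversity(group_a)   # (1/8 + 0 + 1/8) / 3
dxy_ab = dxy(group_a, group_b)         # (1/8 + 2/8 + 1/8) / 3
```

Different programs treat gaps and missing data differently, so values computed this way may not match the values in the CSV files exactly.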

Annotation_Tabietinum_10106M1.zip:

Compressed file with annotation files for the TA10106M1 individual.

  • TA10106M1_BUSCO.gff: annotation file with the coordinates of BUSCO genes for the genome TA10106M1.
  • TA10106M1_nuclearV2.gff: MAKER pipeline annotation file with the coordinates of genes, CDS and other features for the genome TA10106M1. It also includes InterProScan, BLASTP and KEGG (GenomeMaple KAAS) annotations.
  • TA10106M1_RepeatMasker.gff: annotation file with the coordinates of features annotated by RepeatMasker.
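
A quick way to inspect any of the .gff files listed above is to count how many features of each type they contain. The sketch below is a minimal example that assumes the files follow the usual tab-delimited, nine-column GFF layout with `key=value` attributes separated by semicolons; lines that do not have nine columns (comments, directives, or an embedded sequence section) are ignored. The function names are illustrative, and the feature types actually present in each file are not documented here.

```python
from collections import Counter


def parse_gff_line(line):
    """Return a dict for one GFF feature line, or None if it is not a feature."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        # Not a feature line (e.g. sequence data after a ##FASTA directive).
        return None
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for field in attrs.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def count_feature_types(lines):
    """Count features per type (gene, CDS, ...) in an iterable of GFF lines."""
    counts = Counter()
    for line in lines:
        feature = parse_gff_line(line)
        if feature is not None:
            counts[feature["type"]] += 1
    return counts


# Example call (requires the file to be extracted from the archive first):
# with open("TA10106M1_nuclearV2.gff", encoding="utf-8") as handle:
#     print(count_feature_types(handle))
```

Because the parser keeps the attribute column as a dictionary, the same function can be extended to filter features by ID or Name if those keys are present in a given file.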

Methods

Description of methods included in the manuscript

Usage notes

Additional information can be found in the dedicated GitHub webpage: https://perisd.github.io/TriMAT/

Funding

The Research Council of Norway, Award: RCN 274337

The Research Council of Norway, Award: RCN 324253

Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital, Award: CIDEGENT/2021/039