Reticulate evolution and rapid development of reproductive barriers upon secondary contact in a forest fungus
Data files
Oct 07, 2025 version files 4.49 GB
Abstract
Reproductive barriers between sister species of the mushroom-forming fungi tend to be stronger in sympatry, leading to speculation on whether they are being reinforced by selection against hybrids. In this study, we use population genomic analyses together with in vitro crosses of a global sample of the wood decay fungus Trichaptum abietinum to investigate reproductive barriers within this species complex and the processes that have shaped them. Our phylogeographic analyses show that T. abietinum is delimited into six major genetic groups, one in Asia, two in Europe, and three in North America. The two groups present in Europe are interfertile and admixed, whereas our crosses show that the North American groups are reproductively isolated. In Asia, a more complex pattern appears, with partial intersterility between multiple sub-groups that likely originated independently and more recently than the reproductive barriers in North America. We found pre-mating barriers in T. abietinum to be moderately correlated with genomic divergence, whereas the mean growth reduction of the mated hybrids showed a strong correlation with increasing genomic divergence. Genome-wide association analyses identified candidate genes with programmed cell death annotations, which are known to be involved in intersterility in distantly related fungi, although their link here remains unproven. Our demographic modelling and phylogenetic network analyses fit a scenario where reproductive barriers in Trichaptum abietinum could have been reinforced upon secondary contact between groups that diverged in allopatry during the Pleistocene glacial cycles. Our combination of experimental and genomic approaches demonstrates how T. abietinum is a tractable system for studying speciation mechanisms.
https://doi.org/10.5061/dryad.z8w9ghxnw
VCF-file containing whole genome genetic variation of a global dataset of the wood decay fungus Trichaptum abietinum.
Description of the data and file structure
illumina_1_2_3_4_hf_excl_DP3_GQ20_nomono_noindel_nodik_miss20.vcf.gz is a gzipped VCF-file obtained from GATK4's pipeline of joint variant discovery by running HaplotypeCaller in GVCF mode with ploidy = 1, followed by GenomicsDBImport and GenotypeGVCFs to produce a VCF-file containing the joint variants in the data. This joint VCF-file was quality filtered using BCFtools 1.10.2 following an approach outlined in Barth et al. 2019 (https://doi.org/10.1111/mec.15010) to obtain the current VCF-file. The filtering thresholds were set to exclude sites that have: FS > 40, MQRankSum < −5.0 || > 5.0, ReadPosRankSum < −4, QD < 2, or MQ < 40. The remaining settings were SnpGap = 10, number of indels = 0, number of alleles = 2, INFO/DP > meanDP/2, missing data < 20%, and no monomorphic SNPs. All other VCF files used in the paper can be generated from this one by filtering for missing sites, specific populations, LD, retaining only biallelic SNPs, etc.
Raw reads were quality filtered at q30 with Trim Galore 0.6.5, and mapping was performed with BWA 0.7.17.
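The site-level exclusion thresholds listed above can be illustrated with a short Python sketch. This is not the BCFtools command used to produce the file; it only restates the thresholds as a predicate. The function name `fails_hard_filter` is hypothetical, the MQ test assumes that low mapping quality (MQ < 40) is what gets excluded, and skipping annotations that are absent from a site is a choice made for this sketch.

```python
# Illustrative restatement of the site-level exclusion thresholds.
# Assumption: MQ is a lower bound, i.e. sites with MQ < 40 are excluded.
EXCLUSION_TESTS = {
    "FS": lambda v: v > 40,
    "MQRankSum": lambda v: v < -5.0 or v > 5.0,
    "ReadPosRankSum": lambda v: v < -4,
    "QD": lambda v: v < 2,
    "MQ": lambda v: v < 40,
}


def fails_hard_filter(info):
    """Return True if any annotation in `info` trips an exclusion threshold.

    `info` maps INFO annotation names to floats. Annotations that are
    missing from the dict are skipped rather than counted as failures.
    """
    return any(
        name in info and test(info[name])
        for name, test in EXCLUSION_TESTS.items()
    )


# Hypothetical annotation values for two sites.
good_site = {"FS": 1.2, "MQRankSum": 0.0, "ReadPosRankSum": 0.3, "QD": 25.0, "MQ": 60.0}
bad_site = {"FS": 45.0, "MQ": 60.0}
print(fails_hard_filter(good_site), fails_hard_filter(bad_site))
```

The other settings (SnpGap, indel removal, biallelic sites, depth, missingness, monomorphic sites) depend on neighbouring sites or on the genotype columns and are not covered by this per-site predicate.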
A simple explanation of the VCF-file format:
Variable/Column List: (as standard VCF-file: https://samtools.github.io/hts-specs/VCFv4.2.pdf)
There are 8 fixed fields per record. All data lines are tab-delimited. In all cases, missing values are specified with a dot (‘.’). Fixed fields are:
1. CHROM - chromosome: An identifier from the reference genome or an angle-bracketed ID String (“<ID>”) pointing to a contig in the assembly file (cf. the ##assembly line in the header). All entries for a specific CHROM should form a contiguous block within the VCF file. (String, no whitespace permitted, Required).
2. POS - position: The reference position, with the 1st base having position 1. Positions are sorted numerically, in increasing order, within each reference sequence CHROM. It is permitted to have multiple records with the same POS. Telomeres are indicated by using positions 0 or N+1, where N is the length of the corresponding chromosome or contig. (Integer, Required)
3. ID - identifier: Semicolon-separated list of unique identifiers where available. If this is a dbSNP variant, it is encouraged to use the rs number(s). No identifier should be present in more than one data record. If there is no identifier available, then the missing value should be used. (String, no whitespace or semicolons permitted)
4. REF - reference base(s): Each base must be one of A, C, G, T, N (case insensitive). Multiple bases are permitted. The value in the POS field refers to the position of the first base in the String. For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig, in which case it must include the base after the event. This padding base is not required (although it is permitted) for, e.g., complex substitutions or other events where all alleles have at least one base represented in their Strings. If any of the ALT alleles is a symbolic allele (an angle-bracketed ID String “<ID>”), then the padding base is required and POS denotes the coordinate of the base preceding the polymorphism. Tools processing VCF files are not required to preserve case in the allele Strings. (String, Required).
5. ALT - alternate base(s): Comma-separated list of alternate non-reference alleles. These alleles do not have to be called in any of the samples. Options are base Strings made up of the bases A, C, G, T, N, * (case insensitive) or an angle-bracketed ID String (“<ID>”) or a breakend replacement string as described in the section on breakends. The ‘*’ allele is reserved to indicate that the allele is missing due to an upstream deletion. If there are no alternative alleles, then the missing value should be used. Tools processing VCF files are not required to preserve case in the allele String, except for IDs, which are case sensitive. (String; no whitespace, commas, or angle-brackets are permitted in the ID String itself)
6. QUAL - quality: Phred-scaled quality score for the assertion made in ALT. i.e. −10log10 prob(call in ALT is wrong). If ALT is ‘.’ (no variant), then this is −10log10 prob(variant), and if ALT is not ‘.’, this is −10log10 prob(no variant). If unknown, the missing value should be specified. (Numeric)
7. FILTER - filter status: PASS if this position has passed all filters, i.e., a call is made at this position. Otherwise, if the site has not passed all filters, a semicolon-separated list of codes for filters that fail, e.g., “q10;s50” might indicate that at this site the quality is below 10 and the number of samples with data is below 50% of the total number of samples. ‘0’ is reserved and should not be used as a filter String. If filters have not been applied, then this field should be set to the missing value. (String, no whitespace or semicolons permitted)
8. INFO - additional information: (String, no whitespace, semicolons, or equals-signs permitted; commas are permitted only as delimiters for lists of values) INFO fields are encoded as a semicolon-separated series of short keys with optional values in the format: <key>=<data>[,data]. If no keys are present, the missing value must be used. Arbitrary keys are permitted, although the following sub-fields are reserved (albeit optional):
Missing data codes:
.
./.
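The fixed fields and missing-data codes described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not a replacement for a dedicated VCF library: all function names are hypothetical, the example record (contig name, position, annotation values) is made up for illustration, and the code assumes that GT is the first FORMAT key, as the VCF specification requires when GT is present.

```python
import gzip

FIXED_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
MISSING_GENOTYPES = {".", "./."}  # missing-data codes listed above


def parse_info(text):
    """Split an INFO string into a dict; keys without '=' are flags (True)."""
    if text == ".":
        return {}
    info = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value.split(",") if sep else True
    return info


def parse_record(line):
    """Parse one tab-delimited VCF data line into fixed fields and genotypes."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_FIELDS, cols[:8]))
    record["POS"] = int(record["POS"])
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    record["ALT"] = [] if record["ALT"] == "." else record["ALT"].split(",")
    record["INFO"] = parse_info(record["INFO"])
    # Column 9 is FORMAT; sample columns follow. Keep only the GT sub-field.
    record["genotypes"] = [sample.split(":")[0] for sample in cols[9:]]
    return record


def phred_to_error_prob(qual):
    """Invert QUAL = -10 * log10(p) to recover the probability p."""
    return 10 ** (-qual / 10)


def missing_fraction(genotypes):
    """Fraction of samples whose genotype is a missing-data code."""
    if not genotypes:
        return 0.0
    return sum(g in MISSING_GENOTYPES for g in genotypes) / len(genotypes)


def iter_records(path):
    """Yield parsed records from a gzipped VCF, skipping header lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                yield parse_record(line)


# Made-up haploid record with five samples, one of them missing.
example = "contig_1\t1042\t.\tA\tG\t50\tPASS\tDP=120;AF=0.25\tGT:DP\t0:30\t1:28\t.:.\t0:31\t1:31"
rec = parse_record(example)
print(rec["CHROM"], rec["POS"], rec["ALT"], missing_fraction(rec["genotypes"]))
```

To scan the deposited file, `iter_records` can be pointed at the path of the downloaded `.vcf.gz`. Because the file is several gigabytes, the generator reads one line at a time instead of loading everything into memory.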
Sharing/Access information
Raw reads deposited to: NCBI SRA #PRJNA679164 and NCBI SRA #PRJNA679164 (described in Peris et al. 2022: https://doi.org/10.1371/journal.pgen.1010097 and Lu et al. 2024: https://doi.org/10.1016/j.cub.2024.08.046)
Reference genome used for mapping: ENA #PRJEB45061 (described in Peris et al. 2022: https://doi.org/10.1371/journal.pgen.1010097)
