Genomic resources of the Podospora anserina species complex
Abstract
The filamentous fungus Podospora anserina is a model organism used extensively in the study of molecular biology, senescence, prion biology, meiotic drive, mating-type chromosome evolution, and plant biomass degradation. It has recently been established that P. anserina is a member of a complex of seven closely related species. In addition to P. anserina, high-quality genomic resources are available for two of these taxa. Here we provide chromosome-level annotated assemblies of the four remaining species of the complex, as well as a comprehensive dataset of annotated assemblies from a total of 28 Podospora genomes.
README: Genomic resources of the Podospora anserina species complex
Here you'll find all the genome assemblies annotated for:
Ament-Velásquez et al. (2023) "High-quality genome assemblies of four members of the Podospora anserina species complex"
The study includes 29 strains (28 Podospora spp. and 1 Cercophora samala). Some of the assemblies were made with long reads (either PacBio or Oxford Nanopore) and polished with Illumina reads, usually reaching chromosome-size contigs. Others are Illumina-only SPAdes assemblies (1000-2000 contigs). I annotated the genes and repeat elements for all assemblies, but I put extra effort into the assembly of four strains (CBS 124.78+, CBS 411.78-, CBS 415.72-, and CBS 112042+) and annotated their mitochondrial genomes as well.
There are seven Podospora species: P. anserina, P. comata, P. pauciseta, P. pseudoanserina, P. pseudocomata, P. pseudopauciseta, and P. bellae-mahoneyi. Most species are represented by one or a few strains. P. anserina is the model organism.
Description of the data and file structure
You'll find the following directories:
- Alignments: Contains the individual alignments of the mitochondrial genes and the concatenated alignment used, as well as the alignments of all nuclear orthologs used for the phylogenomic analysis of Figure 2.
- Assemblies: Mostly assemblies I made (XXXX.nice.fa), but I also included the second version of the JGI assembly of the reference strain S+ of Podospora anserina (Podan2_AssemblyScaffoldsmt.fa), or "Podan2", and the NCBI assembly of the strain Td+ of Podospora comata (PODCO_genomic.fas), or "PODCO".
- Annotations:
  - Annotations of the assemblies I produced (XXXX.nice-3.00.gff3)
  - The mitochondrial annotation, if available (XXXX_mt.gff3)
  - The gene annotations of the JGI Podan2 assembly and the NCBI annotation of PODCO, for completeness.
- Metadata: Supplementary Table 1 of the article, with strain information, sequencing data, and genome statistics. It contains the following fields:
- Strain: Name of the strain (derived from the dikaryon) and its mating type (+ or -)
- Assembly ID: Name used to distinguish this assembly
- Species: The different Podospora species (or Cercophora samala)
- Locality: Place or country where the strain was isolated
- Year: Year of isolation
- Substrate (Herbivore dung): Animal names indicate the herbivore that produced the dung from which the fungus was isolated. One strain (CBS 415.72) was isolated from soil instead.
- Assembly: Quality of the assembly: "High quality" indicates chromosome-level contigs, "Reference" marks the reference genome for that species, and "Fragmented" means the assembly has more than 20 contigs.
- Technology: Sequencing technologies used to produce the input data for genome assembly
- Assembler: Program used to produce the main assembly (before polishing, when applicable)
- Basecaller: Basecaller used for the Oxford Nanopore data, when applicable.
- Filtering: Whether the long reads were filtered before assembly, and how
- Contigs: Number of contigs in the assembly (excluding the mitochondrial contig in the long-read assemblies). For the long-read assemblies (Reference and High-quality), this indicates how many chromosomes were assembled end to end (the genome has seven chromosomes in total).
- N50: N50 of the final assembly
- GC content: GC content of the final assembly
- GC long reads: GC content of the long reads themselves
- Size (bp): Size of the final assembly in bp, as an indication of genome size
- Mean Depth (x) long: Mean depth of coverage based on the mapping of the long reads
- Mean Depth (x) short: Mean depth of coverage based on the mapping of the short reads
- No. Long reads: Number of long reads before filtering
- Mean Read Length (bp): Mean length of the long reads
- BUSCO (n:3817): BUSCO statistics based on the Sordariomycetes_odb10 database, given as percentages of its 3817 conserved orthologs. C: complete; S: complete and single-copy; D: complete and duplicated; F: fragmented; M: missing
- Repeat Content (%): Percentage of base pairs in the assembly annotated as repeats using the "PodoTE-1.00" library. For long-read assemblies, only scaffolds larger than 50 kb are considered, excluding the mitochondrial and rDNA scaffolds.
- No. Proteins: Number of proteins predicted by our pipeline, unless otherwise marked
- Source: Reference publication of the genome assembly
- Bioproject: NCBI accession number of the BioProject
- Biosample: NCBI accession number of the Biosample
- Illumina SRA: NCBI accession number of the raw short reads
- Long reads SRA: NCBI accession number(s) of the raw long reads
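The Contigs, N50, GC content, and Size (bp) fields are standard summary statistics, so they can be recomputed from the FASTA files in Assemblies as a sanity check. The Python sketch below is not the pipeline used for the paper. It assumes that GC content is computed over unambiguous A/C/G/T bases only (Ns excluded), and the contig name used to drop the mitochondrion ("mt" in the example) is hypothetical, so check the actual headers of each XXXX.nice.fa file first.

```python
"""Minimal assembly statistics for a FASTA file (illustrative sketch only)."""
import gzip
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (contig name, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    name, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the contig name.
                name, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(seqs: Dict[str, str], exclude: Iterable[str] = ()) -> Dict[str, float]:
    """Contig count, total size, N50 and GC% (over A/C/G/T only), skipping excluded contigs."""
    skip = set(exclude)
    kept = [s.upper() for name, s in seqs.items() if name not in skip]
    lengths = [len(s) for s in kept]
    gc = sum(s.count("G") + s.count("C") for s in kept)
    acgt = sum(s.count("A") + s.count("C") + s.count("G") + s.count("T") for s in kept)
    return {
        "contigs": len(lengths),
        "size_bp": sum(lengths),
        "n50": n50(lengths),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }
```

For a real file, something like assembly_stats(dict(read_fasta("XXXX.nice.fa")), exclude=["<mitochondrial contig name>"]) would apply, with XXXX replaced by an Assembly ID. Excluding the mitochondrial contig mirrors how the Contigs field is defined for the long-read assemblies.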
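BUSCO reports its results as a one-line summary such as C:98.5%[S:98.2%,D:0.3%],F:0.4%,M:1.1%,n:3817 (these numbers are made up for illustration and are not taken from the table). Assuming the BUSCO column uses that format, a small Python parser can extract the fields, check their internal consistency (S + D = C and C + F + M = 100, within rounding), and convert percentages back into approximate ortholog counts out of 3817. The function names are hypothetical.

```python
"""Parse a BUSCO one-line summary (illustrative sketch; assumes the standard format)."""
import re
from typing import Dict, Optional

# Matches "KEY:VALUE" pairs such as "C:98.5" or "n:3817"; the trailing "%" is ignored.
_FIELD = re.compile(r"([A-Za-z]+):(\d+(?:\.\d+)?)")
_REQUIRED = {"C", "S", "D", "F", "M"}


def parse_busco(summary: str) -> Dict[str, float]:
    """Return the numeric BUSCO fields (C, S, D, F, M and, if present, n)."""
    fields = {key: float(value) for key, value in _FIELD.findall(summary)}
    missing = _REQUIRED - set(fields)
    if missing:
        raise ValueError(f"missing BUSCO fields: {sorted(missing)}")
    return fields


def check_busco(fields: Dict[str, float], tol: float = 0.2) -> bool:
    """True if S + D matches C and C + F + M matches 100%, allowing for rounding."""
    single_plus_dup = abs(fields["S"] + fields["D"] - fields["C"]) <= tol
    adds_to_100 = abs(fields["C"] + fields["F"] + fields["M"] - 100.0) <= tol
    return single_plus_dup and adds_to_100


def busco_counts(fields: Dict[str, float], n: Optional[int] = None) -> Dict[str, int]:
    """Approximate ortholog counts; n defaults to the summary's n, else 3817 (Sordariomycetes_odb10)."""
    total = int(n if n is not None else fields.get("n", 3817))
    return {key: round(fields[key] / 100.0 * total) for key in ("C", "S", "D", "F", "M")}
```

Because the table reports percentages, the recovered counts are only approximate (each percentage is rounded to one decimal place).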
Sharing/Access information
Data for the previously published genomes came from the following studies:
- Espagne, E., et al. 2008. The genome sequence of the model ascomycete fungus Podospora anserina. Genome Biology 9, R77. https://doi.org/10.1186/gb-2008-9-5-r77
- Silar, P., et al. 2018. A gene graveyard in the genome of the fungus Podospora comata. Molecular Genetics and Genomics 294, 177–190. https://doi.org/10.1007/s00438-018-1497-3
- Vogan, A.A., et al. 2019. Combinations of Spok genes create multiple meiotic drivers in Podospora. eLife 8, e46454. https://doi.org/10.7554/eLife.46454
- Vogan, A.A., et al. 2021. The Enterprise, a massive transposon carrying Spok meiotic drive genes. Genome Research 31, 789–798. https://doi.org/10.1101/gr.267609.120
- Ament-Velásquez, S.L., et al. 2022. Allorecognition genes drive reproductive isolation in Podospora anserina. Nature Ecology & Evolution 6, 910–923. https://doi.org/10.1038/s41559-022-01734-x
Code/Software
The code for the analyses in the paper is available in the GitHub repository.
Methods
Here we present annotated whole-genome assemblies of 29 strains (28 Podospora spp. and 1 Cercophora samala), most of which were produced in previous studies. We sequenced four strains with both Oxford Nanopore MinION and Illumina, and four more strains with Illumina only. We kept MinION reads with a mean Phred quality (QV) above 9 and a length above 1 kb, assembled them with Minimap2 v. 2.11 and Miniasm v. 0.2, polished the result with Racon v. 1.3.1 using the raw long reads, and then ran five rounds of Pilon v. 1.22 with the Illumina reads. The genomes of samples with only Illumina data were assembled with SPAdes v. 3.12.0. Gene models were produced with MAKER v. 3.01.04, and functional annotation was inferred with Funannotate v. 1.8.15. Mitochondrial genomes were annotated with MFannot (accessed during the second half of July 2023) with the genetic code set to 4. See the manuscript for details.
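The long-read filter described above (mean QV above 9, length above 1 kb) can be sketched in Python. The Methods do not say which tool applied it or how the mean quality was computed, so this sketch makes its own assumptions: FASTQ qualities are Phred+33, 1 kb means 1,000 bp, and the mean is taken over per-base error probabilities before converting back to a Phred score (a convention used by some Nanopore read-filtering tools), which penalizes low-quality stretches more than a plain arithmetic mean of the scores. The function names are hypothetical.

```python
"""Filter FASTQ reads by mean quality and length (illustrative sketch, not the authors' code)."""
import math
from typing import Iterator, TextIO, Tuple


def mean_qv(quality: str, offset: int = 33) -> float:
    """Mean read quality computed in error-probability space, returned as a Phred score."""
    if not quality:
        return 0.0  # treat empty reads as worthless
    errors = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(errors) / len(errors))


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def keep_read(seq: str, qual: str, min_qv: float = 9.0, min_len: int = 1000) -> bool:
    """Strict thresholds, matching 'above 9' and 'longer than 1 kb'."""
    return len(seq) > min_len and mean_qv(qual) > min_qv
```

For example, a read whose bases alternate between Q20 and Q0 has an arithmetic mean score of 10 but a probability-space mean of about 3, so this filter would drop it; an arithmetic-mean filter would keep it. Which convention the original filtering used is not stated in this README.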