Genome sequence assembly and annotation of MATA and MATB strains of Yarrowia lipolytica
Data files
Oct 12, 2025 version files 186.63 MB
- README.md (5.32 KB)
- repeatLib.fungi.fa (7.39 MB)
- yali_id.system_name.orthologs.csv (9.31 MB)
- yarrowia_lipolytica-22301-5_core_2_87_1.repeat_report (3.31 KB)
- yarrowia_lipolytica-22301-5_repeat_feature.gene.overlap.txt (38.10 KB)
- yarrowia_lipolytica-22301-5.AED_filter_1.gff (4.48 MB)
- yarrowia_lipolytica-22301-5.evd.cdna.fasta (11.99 MB)
- yarrowia_lipolytica-22301-5.evd.cds.fasta (10.29 MB)
- yarrowia_lipolytica-22301-5.evd.protein.fasta (3.53 MB)
- yarrowia_lipolytica-22301-5.evd.protein.interproscan.tsv (6.20 MB)
- yarrowia_lipolytica-22301-5.fasta (21.09 MB)
- yarrowia_lipolytica-22301-5.fasta.mod.EDTA.intact.gff3 (37.56 KB)
- yarrowia_lipolytica-22301-5.fasta.mod.EDTA.TEanno.gff3 (262.82 KB)
- yarrowia_lipolytica-22301-5.fasta.mod.EDTA.TEanno.sum (7.92 KB)
- yarrowia_lipolytica-22301-5.fasta.mod.EDTA.TElib.fa (221.81 KB)
- yarrowia_lipolytica-22301-5.fasta.mod.MAKER.masked (21.09 MB)
- yarrowia_lipolytica-22301-5.gff (5.61 MB)
- yarrowia_lipolytica-E122_core_2_87_1.repeat_report (3.17 KB)
- yarrowia_lipolytica-E122_repeat_feature.gene.overlap.txt (40.16 KB)
- yarrowia_lipolytica-E122.AED_filter_1.gff (4.59 MB)
- yarrowia_lipolytica-E122.evd.cdna.fasta (12.26 MB)
- yarrowia_lipolytica-E122.evd.cds.fasta (10.23 MB)
- yarrowia_lipolytica-E122.evd.protein.fasta (3.51 MB)
- yarrowia_lipolytica-E122.evd.protein.interproscan.tsv (6.19 MB)
- yarrowia_lipolytica-E122.fasta (21.02 MB)
- yarrowia_lipolytica-E122.fasta.mod.EDTA.intact.gff3 (37.56 KB)
- yarrowia_lipolytica-E122.fasta.mod.EDTA.TEanno.gff3 (257.77 KB)
- yarrowia_lipolytica-E122.fasta.mod.EDTA.TEanno.sum (8.09 KB)
- yarrowia_lipolytica-E122.fasta.mod.EDTA.TElib.fa (236.15 KB)
- yarrowia_lipolytica-E122.fasta.mod.MAKER.masked (21.02 MB)
- yarrowia_lipolytica-E122.gff (5.68 MB)
Abstract
Yeasts are widely used in molecular and cell biology research, and Yarrowia lipolytica is favored by bioengineers for its ability to produce copious amounts of lipids, chemicals, and enzymes for industrial applications. Y. lipolytica is a dimorphic yeast that can proliferate in aerobic and hydrophobic environments conducive to industrial use. However, little is known about the basic molecular biology of this yeast, including how the genome is duplicated and how gene silencing occurs. Genome sequences of Y. lipolytica strains have offered insights into the genetic basis of this yeast species and have facilitated the development of new industrial applications. Although previous studies have reported the genome sequences of a few Y. lipolytica strains, more precise sequences and annotations are valuable, particularly for studies of the biology of this yeast. To further study and characterize the molecular biology of this microorganism, high-quality reference genome assemblies and annotations have been produced for two related Y. lipolytica strains of opposite mating type, MATA (E122) and MATB (22301-5). Combining short-read and long-read sequencing of genomic DNA with short-read and long-read sequencing of transcript cDNAs enabled the genome assembly and a comparison with a distantly related Yarrowia strain.
Two related Yarrowia lipolytica strains of opposite mating type were obtained from Richard A. Rachubinski, University of Alberta, Canada, and single clones were isolated and used for both genome and transcript sequencing. The strains were 22301-5 (MATB) and E122 (MATA, alternatively called CLIB120) (5, 28, 29) (Figure 1). E122 is MATA, ura3-302, leu2-270, lys8-11, and is related to the MATB strain E150 (CLIB122), which is the current DNA sequence reference strain (6, 23). Strain 22301-5 is MATB, his-1, ura3-302, leu2-270.
Files and variables
File: yarrowia_lipolytica-22301-5.fasta
Description: Strain 22301-5 Genome assembly
File: yarrowia_lipolytica-E122.fasta
Description: Strain E122 Genome assembly
File: yarrowia_lipolytica-E122.gff
Description: Strain E122 Transcriptome assembly and annotation. Full structural gene annotation in GFF3 format (genes, exons, CDS)
File: yarrowia_lipolytica-22301-5.gff
Description: Strain 22301-5 Transcriptome assembly and annotation. Full structural gene annotation in GFF3 format (genes, exons, CDS)
File: yarrowia_lipolytica-22301-5.AED_filter_1.gff
Description: Strain 22301-5 Filtered GFF with high-confidence gene models (AED ≤ 1).
File: yarrowia_lipolytica-E122.AED_filter_1.gff
Description: Strain E122 Filtered GFF with high-confidence gene models (AED ≤ 1).
File: yarrowia_lipolytica-22301-5.evd.cdna.fasta
Description: Strain 22301-5 FASTA of spliced cDNA (transcripts including UTRs).
File: yarrowia_lipolytica-E122.evd.cdna.fasta
Description: Strain E122 FASTA of spliced cDNA (transcripts including UTRs).
File: yarrowia_lipolytica-22301-5.evd.cds.fasta
Description: Strain 22301-5 FASTA of coding sequences (CDS only).
File: yarrowia_lipolytica-E122.evd.cds.fasta
Description: Strain E122 FASTA of coding sequences (CDS only).
File: yarrowia_lipolytica-22301-5.evd.protein.fasta
Description: Strain 22301-5 FASTA of translated protein sequences.
File: yarrowia_lipolytica-E122.evd.protein.fasta
Description: Strain E122 FASTA of translated protein sequences.
File: yarrowia_lipolytica-22301-5.evd.protein.interproscan.tsv
Description: Strain 22301-5 InterProScan results: protein domain/function annotations.
File: yarrowia_lipolytica-E122.evd.protein.interproscan.tsv
Description: Strain E122 InterProScan results: protein domain/function annotations.
File: yarrowia_lipolytica-22301-5.fasta.mod.EDTA.intact.gff3
Description: Strain 22301-5 GFF3 of intact transposable elements (TEs) identified by EDTA.
File: yarrowia_lipolytica-E122.fasta.mod.EDTA.intact.gff3
Description: Strain E122 GFF3 of intact transposable elements (TEs) identified by EDTA.
File: yarrowia_lipolytica-22301-5.fasta.mod.EDTA.TEanno.gff3
Description: Strain 22301-5 GFF3 of all annotated TE features (both intact and fragmented).
File: yarrowia_lipolytica-E122.fasta.mod.EDTA.TEanno.gff3
Description: Strain E122 GFF3 of all annotated TE features (both intact and fragmented).
File: yarrowia_lipolytica-22301-5.fasta.mod.EDTA.TEanno.sum
Description: Strain 22301-5 Summary statistics of TE annotations (counts, lengths, classes).
File: yarrowia_lipolytica-E122.fasta.mod.EDTA.TEanno.sum
Description: Strain E122 Summary statistics of TE annotations (counts, lengths, classes).
File: yarrowia_lipolytica-22301-5.fasta.mod.EDTA.TElib.fa
Description: Strain 22301-5 Custom TE library (consensus sequences of identified repeats).
File: yarrowia_lipolytica-E122.fasta.mod.EDTA.TElib.fa
Description: Strain E122 Custom TE library (consensus sequences of identified repeats).
File: yarrowia_lipolytica-22301-5.fasta.mod.MAKER.masked
Description: Strain 22301-5 Genome sequence with repeats masked (soft-masked by MAKER using EDTA TE annotations).
File: yarrowia_lipolytica-E122.fasta.mod.MAKER.masked
Description: Strain E122 Genome sequence with repeats masked (soft-masked by MAKER using EDTA TE annotations).
File: repeatLib.fungi.fa
Description: Custom fungal repeat library in FASTA format (used for repeat masking).
File: yarrowia_lipolytica-22301-5_core_2_87_1.repeat_report
Description: Strain 22301-5 RepeatMasker output report summarizing repeat content (type, count, coverage).
File: yarrowia_lipolytica-E122_core_2_87_1.repeat_report
Description: Strain E122 RepeatMasker output report summarizing repeat content (type, count, coverage).
File: yarrowia_lipolytica-22301-5_repeat_feature.gene.overlap.txt
Description: Strain 22301-5 TE-related genes overlapping repeat elements.
File: yarrowia_lipolytica-E122_repeat_feature.gene.overlap.txt
Description: Strain E122 TE-related genes overlapping repeat elements.
File: yali_id.system_name.orthologs.csv
Description: CSV file containing ortholog relationships between Yarrowia lipolytica genes and genes from other species or strains; includes gene IDs and matched orthologs.
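For readers who prefer to inspect the annotation files programmatically, the following is a minimal Python sketch of reading a GFF3 file, counting feature types, and selecting transcripts by Annotation Edit Distance. It assumes the standard nine-column, tab-separated GFF3 layout and assumes that mRNA features carry an `_AED` attribute, as MAKER output typically does; whether these particular files include that attribute has not been verified here. The function names and the 0.5 threshold are illustrative only and are not part of the dataset.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def summarize_gff(lines, max_aed=0.5):
    """Count feature types and collect IDs of mRNAs with _AED <= max_aed.

    max_aed=0.5 is an arbitrary illustrative threshold, not the one
    used to produce the AED_filter_1 files.
    """
    counts = Counter()
    kept = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section, if present, ends the features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[cols[2]] += 1
        if cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            aed = attrs.get("_AED")  # assumed MAKER-style attribute
            if aed is not None and float(aed) <= max_aed:
                kept.append(attrs.get("ID", ""))
    return counts, kept


if __name__ == "__main__":
    # Made-up records for demonstration; not taken from the dataset.
    demo = [
        "##gff-version 3",
        "chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
        "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-RA;Parent=g1;_AED=0.10",
        "chr1\tmaker\tgene\t1500\t2400\t.\t-\t.\tID=g2",
        "chr1\tmaker\tmRNA\t1500\t2400\t.\t-\t.\tID=g2-RA;Parent=g2;_AED=0.80",
    ]
    counts, kept = summarize_gff(demo)
    print(dict(counts), kept)
```

To apply this to a real file, an open file handle can be passed in place of the demo list, for example `summarize_gff(open("yarrowia_lipolytica-E122.gff"))`.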
Code/software
Genome and transcriptome data, as well as GFF3 annotation files, can all be viewed in a genome browser (e.g., IGV). The remaining files can be opened in a text editor/viewer or Excel.
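As an alternative to a text editor, the FASTA files can also be summarized with a short script. The sketch below uses only the Python standard library to report the number of sequences, total length, and N50 of an assembly, plus the fraction of lowercase bases, which would correspond to the repeat-masked portion if the `.MAKER.masked` files are soft-masked as described. The function names are hypothetical and the demo sequences are made up.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(records):
    """Return sequence count, total length, and N50 for (name, seq) records."""
    lengths = sorted((len(seq) for _, seq in records), reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # first length at which half the assembly is covered
            n50 = length
            break
    return {"sequences": len(lengths), "total_bp": total, "n50": n50}


def lowercase_fraction(records):
    """Fraction of bases written in lowercase (soft-masked, if so encoded)."""
    total = masked = 0
    for _, seq in records:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


if __name__ == "__main__":
    # Made-up sequences for demonstration; not taken from the dataset.
    demo = [">contig1 example", "ACGTacgt", "AC", ">contig2", "acgt"]
    records = list(read_fasta(demo))
    print(assembly_stats(records))
    print(round(lowercase_fraction(records), 3))
```

For a real file, the records can be loaded with `records = list(read_fasta(open("yarrowia_lipolytica-E122.fasta")))`; the list is built once so that both summary functions can reuse it.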
