A chromosome-level genome assembly of the snow leopard, Panthera uncia
Data files
Jul 11, 2025 version files 12.93 GB
- Irbis_yahs_gapclosed.scaff_seqs_updated_TEs_hm_SR_sm.fasta (2.51 GB)
- GeMoMa_results.zip (434.68 MB)
- hardmasked_all.zip (3.15 GB)
- hardmasked_TEs_softmasked_SR.zip (588.24 MB)
- hardmasked_TEs.zip (3.05 GB)
- InterProScan.zip (712.43 MB)
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta (2.48 GB)
- README.md (6.40 KB)
- SwissProt_results.zip (1.69 MB)
Abstract
The snow leopard (Panthera uncia), a vulnerable big cat native to Central Asia, faces an ongoing population decline due to habitat loss and human activities. Despite its conservation importance, genomic resources for this species remain limited. High-quality reference genomes are essential for assessing genetic diversity, structural variation, and evolutionary history. To address this gap, we have generated a long-read-based and proximity-ligation scaffolded de novo genome assembly of a male snow leopard. The final assembly has a total length of 2.46 Gb in 280 scaffolds, of which the 19 largest correspond to the 18 autosomes and the X chromosome. The scaffold N50 is 145.76 Mb, and the L50 is seven scaffolds. BUSCO and compleasm scores are 98.7 % and 98.9 % of identified Carnivora orthologs. Telomeric sequences were identified on at least one end of 18 out of 19 chromosomes. Scaffolds corresponding to the Y chromosome were identified and mapped. Additionally, the assembly's annotation identified a repeat content of 42.27 % and 25,391 genes. We produced a high-quality, long-read-based chromosome-level assembly of a male snow leopard, as evidenced by the data above. As a first assembly of a male genome, it can serve as a suitable reference genome for the species. The Y chromosome scaffolds provide a glimpse into the chromosome organization and interspecies differences.
Dataset DOI: 10.5061/dryad.5x69p8dgv
Description of the data and file structure
The dataset presented here contains the annotation results accompanying a Genome Report of the same title published in the Journal of Heredity.
A de novo assembly for the snow leopard (Panthera uncia) was generated from PacBio HiFi reads using hifiasm v.0.19.7. The pseudohaploid assembly was then scaffolded with Hi-C data available from the DNAZoo (www.dnazoo.org), using the Arima Hi-C mapping pipeline as used by the Vertebrate Genomes Project and YaHS v.1.1. Gaps in the scaffolds were filled with TGS-GapCloser v.1.1.1 using the PacBio HiFi reads.
Repeats in the assembly were annotated using a de novo repeat library generated with RepeatModeler v.2.0.1 together with a Felidae-specific repeat dataset from Dfam_3.1 and RepBase release 20181026, and were masked using RepeatMasker v.4.1.0. In addition to hardmasking all repeats, we generated a masked assembly with hardmasked transposable elements (TEs) and softmasked simple repeats (SR) as input for gene annotation with the GeMoMa pipeline v.1.7.1. The following eight assemblies and their corresponding annotations served as references for gene prediction: Mus musculus (GCF_000001635.27), Homo sapiens (GCF_000001405.40), Canis lupus familiaris (GCF_011100685.1), Felis catus (GCF_018350175.1), Panthera onca (GCF_028533385.1), Panthera uncia (GCF_023721935.1), Panthera tigris (GCF_018350195.1), and Panthera leo (GCF_018350215.1). The predicted proteins were then functionally annotated with a BLASTP search against the Swiss-Prot database and with InterProScan.
Files and variables
Usage Note: compressed directories (.zip) can be decompressed using built-in tools on Windows (right-click > “Extract All”), macOS (double-click), and Linux (using the unzip command in the terminal or an archive manager). The remaining files (.fasta), as well as all files within the compressed directories except for .cat.gz (.out, .gff, .fasta, .fai, .masked, .tbl, .log), can be read with any text editor; however, some are large and might require a command-line tool such as the Linux “less” command to be read properly. Files ending in “.cat.gz” can be read with Linux command-line tools such as “less” or “zcat”, or decompressed using “gunzip” and subsequently read with any text editor.
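Where no Linux command line is at hand, the “.cat.gz” files can also be previewed without decompressing them to disk. The following is a minimal Python sketch using only the standard library; the function name `head_gzip` is hypothetical and not part of the dataset.

```python
import gzip
from itertools import islice


def head_gzip(path, n=10):
    """Return the first n lines of a gzip-compressed text file.

    The file is streamed, so even a large .cat.gz is not loaded
    into memory as a whole.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Example call (adjust the path to wherever the archive was extracted):
# for line in head_gzip("Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.cat.gz"):
#     print(line)
```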
File: Irbis_yahs_gapclosed.scaff_seqs_updated.fasta
Description: final genome assembly file after scaffolding and gap-closing
File: hardmasked_all.zip
Description: RepeatMasker output, hardmasking all repeats, including full repeat table
- Irbis_repeatmasker_onesteponly.log #RepeatMasker logfile for the run hardmasking all repeats
- Irbis_masked_db-families.fa #consensus sequences of repeat families
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.tbl #summary table of the repeat content of the assembly by repeat class
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out.gff #gff annotation file of repeats in the assembly
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out #list of all repeats in the assembly
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked #assembly fasta with all repeats hardmasked (Ns)
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.fai #assembly fai index file
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.cat.gz #list of repeat family consensus sequences used by RepeatMasker for masking
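The .fai index listed above is enough to recompute contiguity statistics such as the scaffold N50 and L50 quoted in the abstract, without reading the 2.48 GB FASTA. The sketch below assumes the standard samtools faidx layout (tab-separated, sequence name in column 1, length in column 2); the function names are hypothetical.

```python
def read_fai_lengths(path):
    """Read sequence lengths from a .fai index (column 2 of each line)."""
    lengths = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                lengths.append(int(fields[1]))
    return lengths


def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the sequence at which the cumulative sum of
    lengths, sorted from longest to shortest, first reaches half of the
    total; L50 is the number of sequences needed to get there.
    """
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            return length, count
    return 0, 0  # empty input


# Example call (path relative to the extracted hardmasked_all directory):
# print(n50_l50(read_fai_lengths("Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.fai")))
```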
File: Irbis_yahs_gapclosed.scaff_seqs_updated_TEs_hm_SR_sm.fasta
Description: final assembly with hardmasked TEs and softmasked simple repeats
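To illustrate the two masking conventions in this file, the following Python sketch tallies hardmasked bases (N), softmasked bases (lowercase) and unmasked bases (uppercase). It is a plain line-by-line reader written for clarity, not speed, and `mask_composition` is a hypothetical helper name. Note that N also marks any remaining assembly gaps, so the hardmasked count is an upper bound on masked TE sequence.

```python
from collections import Counter


def mask_composition(fasta_path):
    """Count hardmasked (N/n), softmasked (other lowercase) and unmasked bases."""
    counts = Counter(hardmasked=0, softmasked=0, unmasked=0)
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip FASTA header lines
            seq = line.strip()
            n_bases = seq.count("N") + seq.count("n")
            # lowercase letters other than 'n' are softmasked sequence
            lower = sum(1 for base in seq if base.islower()) - seq.count("n")
            counts["hardmasked"] += n_bases
            counts["softmasked"] += lower
            counts["unmasked"] += len(seq) - n_bases - lower
    return counts


# Example call:
# print(mask_composition("Irbis_yahs_gapclosed.scaff_seqs_updated_TEs_hm_SR_sm.fasta"))
```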
File: GeMoMa_results.zip
Description: GeMoMa gene prediction results
- final_annotation.cds.bed #bedfile containing positions of coding sequences in the assembly
- final_annotation.gene.bed #bedfile containing positions of predicted genes in the assembly
- final_annotation.gff #annotation gff file
- final_annotation.gff_summary #summary of annotation results
- final_annotation.mrna.bed #bedfile containing positions of predicted mRNA transcripts in the assembly
- predicted_cds.fasta #fasta file containing coding sequences of predicted proteins
- predicted_genomic.fasta #fasta file containing the complete sequence of predicted genes including introns, exons, etc.
- predicted_proteins.fasta #fasta file containing translated protein sequences
- protocol_GeMoMaPipeline.txt #GeMoMa logfile
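A quick way to get an overview of final_annotation.gff is to count records per feature type (column 3 of the GFF format). The sketch below makes no assumption about which type names GeMoMa wrote; it simply reports whatever is present. If gene records are typed `gene`, their count could be compared with the 25,391 genes reported in the abstract. `count_gff_features` is a hypothetical name.

```python
from collections import Counter


def count_gff_features(path):
    """Count GFF records per feature type (third tab-separated column)."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments and directives
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 9:  # a valid GFF record has nine columns
                counts[fields[2]] += 1
    return counts


# Example call:
# for feature_type, n in count_gff_features("final_annotation.gff").most_common():
#     print(feature_type, n)
```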
File: hardmasked_TEs.zip
Description: RepeatMasker output, hardmasking all TEs
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.tbl #summary table of the hardmasked repeat content of the assembly by repeat class
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out.gff #gff annotation file of hardmasked repeats in the assembly
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out #list of hardmasked repeats in the assembly
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked #assembly fasta with TEs hardmasked (Ns)
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.cat.gz #list of repeat family consensus sequences used by RepeatMasker for masking
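The .out files can be summarised per repeat class with a few lines of Python. The sketch below assumes the usual RepeatMasker .out layout: a short header followed by whitespace-separated records in which column 6 and column 7 hold the begin and end position in the query and column 11 holds the repeat class/family. Overlapping annotations are counted once each, so the totals may differ somewhat from the .tbl summary. `masked_bp_by_class` is a hypothetical name.

```python
from collections import defaultdict


def masked_bp_by_class(out_path):
    """Sum annotated base pairs per repeat class from a RepeatMasker .out file."""
    totals = defaultdict(int)
    with open(out_path) as handle:
        for line in handle:
            fields = line.split()
            # Data records start with an integer score; header lines do not.
            if len(fields) < 15 or not fields[0].isdigit():
                continue
            begin, end = int(fields[5]), int(fields[6])
            repeat_class = fields[10].split("/")[0]  # e.g. "LINE/L1" -> "LINE"
            totals[repeat_class] += end - begin + 1
    return dict(totals)


# Example call:
# print(masked_bp_by_class("Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out"))
```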
File: SwissProt_results.zip
Description: BLASTP hits of the predicted proteins against Swiss-Prot
- Irbis_annotation_swiss-prot_blast.out #BLASTP results
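The output format of the BLASTP results file is not stated here. Assuming it is the common 12-column tabular format (BLAST+ `-outfmt 6`: query ID, subject ID, percent identity, ..., e-value, bit score), the best Swiss-Prot hit per predicted protein could be extracted as sketched below. If the file uses a different format, the column indices need to be adapted. `best_hits` is a hypothetical name.

```python
def best_hits(blast_path):
    """Return {query_id: (subject_id, bitscore)} keeping the highest bit score.

    ASSUMPTION: tab-separated BLAST tabular output with the 12 standard
    columns, where column 1 is the query, column 2 the subject and
    column 12 the bit score.
    """
    best = {}
    with open(blast_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # not a standard tabular record
            query, subject = fields[0], fields[1]
            bitscore = float(fields[11])
            if query not in best or bitscore > best[query][1]:
                best[query] = (subject, bitscore)
    return best


# Example call:
# hits = best_hits("Irbis_annotation_swiss-prot_blast.out")
```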
File: InterProScan.zip
Description: Results of functional annotation with InterProScan
- predicted_proteins_nostop.fasta.tsv #InterProScan output file with annotation of GO terms, motifs, etc. of the proteins predicted by GeMoMa
- functional_annotation_results.txt #results summary generated by custom scripts
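The InterProScan TSV can be condensed into one set of GO terms per protein. The sketch below assumes the standard InterProScan TSV column order, in which column 1 is the protein accession and column 14 holds GO annotations separated by “|” (or “-” when there are none). Some InterProScan versions append a source tag such as “(InterPro)” to each term; the sketch strips it. `go_terms_by_protein` is a hypothetical name.

```python
from collections import defaultdict


def go_terms_by_protein(tsv_path):
    """Collect the unique GO terms annotated for each protein."""
    terms = defaultdict(set)
    with open(tsv_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Rows without a GO column, or with "-" in it, carry no GO terms.
            if len(fields) < 14 or fields[13] in ("", "-"):
                continue
            for term in fields[13].split("|"):
                terms[fields[0]].add(term.split("(")[0])
    return {protein: sorted(found) for protein, found in terms.items()}


# Example call:
# go = go_terms_by_protein("predicted_proteins_nostop.fasta.tsv")
```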
File: hardmasked_TEs_softmasked_SR.zip
Description: RepeatMasker output, softmasking simple repeats (following the hardmasking of TEs)
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.tbl #summary table of the softmasked repeat content of the assembly by repeat class
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.out.gff #gff annotation file of softmasked repeats in the assembly
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.out #list of softmasked repeats in the assembly
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.masked #assembly fasta with TEs hardmasked (Ns) and simple repeats softmasked (lowercase)
- Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.cat.gz #list of repeat family consensus sequences used by RepeatMasker for masking