A chromosome-level genome assembly of the snow leopard, Panthera uncia

Plasil, Martin 1; Winter, Sven 2,3; Stejskalova, Karla 1; Vychodilova, Leona 1; Jelinek, April 1; Futas, Jan 1; Burger, Pamela A. 3; Horin, Petr 1

Published Jul 11, 2025 on Dryad. https://doi.org/10.5061/dryad.5x69p8dgv

Data files

Jul 11, 2025 version files 12.93 GB

GeMoMa_results.zip (434.68 MB)
hardmasked_all.zip (3.15 GB)
hardmasked_TEs_softmasked_SR.zip (588.24 MB)
hardmasked_TEs.zip (3.05 GB)
InterProScan.zip (712.43 MB)
Irbis_yahs_gapclosed.scaff_seqs_updated_TEs_hm_SR_sm.fasta (2.51 GB)
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta (2.48 GB)
README.md (6.40 KB)
SwissProt_results.zip (1.69 MB)

Abstract

The snow leopard (Panthera uncia), a vulnerable big cat native to Central Asia, faces an ongoing population decline due to habitat loss and human activities. Despite its conservation importance, genomic resources for this species remain limited. High-quality reference genomes are essential for assessing genetic diversity, structural variation, and evolutionary history. To address this gap, we generated a long-read-based, proximity-ligation-scaffolded de novo genome assembly of a male snow leopard. The final assembly has a total length of 2.46 Gb in 280 scaffolds, of which the 19 largest correspond to the 18 autosomes and the X chromosome. The scaffold N50 is 145.76 Mb, and the L50 is seven scaffolds. BUSCO and compleasm scores are 98.7 % and 98.9 %, respectively, of identified Carnivora orthologs. Telomeric sequences were identified on at least one end of 18 out of 19 chromosomes. Scaffolds corresponding to the Y chromosome were identified and mapped. Additionally, annotation of the assembly identified a repeat content of 42.27 % and 25,391 genes. As these metrics show, we produced a high-quality, long-read-based, chromosome-level assembly of a male snow leopard. As the first assembly of a male genome, it can serve as a suitable reference genome for the species. The Y chromosome scaffolds provide a glimpse into the chromosome's organization and interspecies differences.

Dataset DOI: 10.5061/dryad.5x69p8dgv

Description of the data and file structure

This dataset contains the annotation results accompanying a Genome Report of the same title published in the Journal of Heredity.

A de novo assembly for the snow leopard (Panthera uncia) was generated from PacBio HiFi reads using hifiasm v.0.19.7. The pseudohaploid assembly was then scaffolded with available Hi-C data from the DNAZoo (www.dnazoo.org), using the Arima Hi-C mapping pipeline employed by the Vertebrate Genomes Project to map the reads and YaHS v.1.1 to build the scaffolds. Gaps in the scaffolds were filled with TGS-GapCloser v.1.1.1 using the PacBio HiFi reads.

Repeats in the assembly were annotated using a de novo repeat library generated with RepeatModeler v.2.0.1 together with a Felidae-specific repeat dataset from Dfam_3.1 and RepBase release 20181026, and were masked using RepeatMasker v.4.1.0. In addition to hardmasking all repeats, we generated a masked assembly with hardmasked transposable elements (TEs) and softmasked simple repeats as input for gene annotation with the GeMoMa pipeline v.1.7.1. For gene annotation, we used the following eight assemblies and their corresponding annotations as references: Mus musculus (GCF_000001635.27), Homo sapiens (GCF_000001405.40), Canis lupus familiaris (GCF_011100685.1), Felis catus (GCF_018350175.1), Panthera onca (GCF_028533385.1), Panthera uncia (GCF_023721935.1), Panthera tigris (GCF_018350195.1), and Panthera leo (GCF_018350215.1). The predicted proteins were then functionally annotated with a BLASTP search against the Swiss-Prot database and with InterProScan.

Files and variables

Usage Note: compressed directories (.zip) can be decompressed using built-in tools on Windows (right-click > "Extract All"), macOS (double-click), and Linux (using the unzip command in the terminal or an archive manager). The remaining files (.fasta) and all files within the compressed directories (.out, .gff, .fasta, .fai, .masked, .tbl, .log), except .cat.gz files, can be read with any text editor; however, some are large and may require a command-line tool such as the Linux "less" command to be read properly. Files ending in ".cat.gz" can be read with Linux command-line tools such as "less" or "zcat", or decompressed using "gunzip" and then read with any text editor.
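As a complement to the command-line tools mentioned above, the following is a minimal Python sketch for peeking at the first lines of a large plain-text or gzip-compressed file without loading it into memory. The function names `open_text` and `head` are hypothetical helpers introduced here for illustration; they are not part of the dataset.

```python
import gzip
from itertools import islice


def open_text(path):
    """Open a plain or gzip-compressed text file for reading (hypothetical helper)."""
    if str(path).endswith(".gz"):
        # "rt" makes gzip decode the bytes to text while decompressing.
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def head(path, n=10):
    """Return the first n lines of a file; only those lines are read."""
    with open_text(path) as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, `head("Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.cat.gz", 20)` would return the first 20 lines of that file after the archive has been extracted, assuming the file sits in the working directory.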

File: Irbis_yahs_gapclosed.scaff_seqs_updated.fasta

Description: final genome assembly file after scaffolding and gap-closing

File: hardmasked_all.zip

Description: RepeatMasker output, hardmasking all repeats, including full repeat table

Irbis_repeatmasker_onesteponly.log #RepeatMasker logfile for the run hardmasking all repeats
Irbis_masked_db-families.fa #consensus sequences of repeat families
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.tbl #summary table of the repeat content of the assembly by repeat class
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out.gff #gff annotation file of repeats in the assembly
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out #list of all repeats in the assembly
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked #assembly fasta with all repeats hardmasked (Ns)
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.fai #assembly fai index file
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.cat.gz #list of repeat family consensus sequences used by RepeatMasker for masking
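
The .fai index lists one scaffold per line with its length in the second tab-separated column, so contiguity statistics such as the scaffold N50 and L50 can be recomputed from it. The sketch below assumes the standard samtools faidx layout; the function names are hypothetical and this is not necessarily how the published values were calculated.

```python
def read_fai_lengths(fai_lines):
    """Collect sequence lengths (column 2) from the lines of a .fai index."""
    lengths = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths.append(int(fields[1]))
    return lengths


def n50_l50(lengths):
    """Return (N50, L50): the length of the scaffold at which the running
    total of the longest scaffolds first reaches half of the assembly size,
    and the number of scaffolds needed to get there."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no sequence lengths given")
```

With the index file extracted, `n50_l50(read_fai_lengths(open("Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.fai")))` would return the two values for comparison with those reported in the abstract.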

File: Irbis_yahs_gapclosed.scaff_seqs_updated_TEs_hm_SR_sm.fasta

Description: final assembly with hardmasked TEs and softmasked simple repeats
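
To illustrate the difference between the two masking styles in this file, here is a minimal sketch that tallies hardmasked bases (N), softmasked bases (lowercase), and unmasked bases (uppercase) in a FASTA file. The function name `masking_stats` is hypothetical. Note one assumption: any Ns that remain in unclosed assembly gaps are indistinguishable from hardmasked repeats and are counted together with them.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII lowercase letter.
_DROP_LOWER = str.maketrans("", "", string.ascii_lowercase)


def masking_stats(fasta_lines):
    """Count N, softmasked (lowercase), and unmasked bases in FASTA lines."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):  # skip sequence headers
            continue
        seq = line.strip()
        n_bases = seq.count("N") + seq.count("n")
        lowercase = len(seq) - len(seq.translate(_DROP_LOWER))
        soft = lowercase - seq.count("n")  # lowercase bases other than n
        counts["N"] += n_bases
        counts["softmasked"] += soft
        counts["unmasked"] += len(seq) - n_bases - soft
    return counts
```

Passing an open file handle, for example `masking_stats(open("Irbis_yahs_gapclosed.scaff_seqs_updated_TEs_hm_SR_sm.fasta"))`, streams the file line by line, so the 2.51 GB assembly does not need to fit in memory.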

File: GeMoMa_results.zip

Description: GeMoMa gene prediction results

final_annotation.cds.bed #bedfile containing positions of coding sequences in the assembly
final_annotation.gene.bed #bedfile containing positions of predicted genes in the assembly
final_annotation.gff #annotation gff file
final_annotation.gff_summary #summary of annotation results
final_annotation.mrna.bed #bedfile containing positions of predicted mRNA transcripts in the assembly
predicted_cds.fasta #fasta file containing coding sequences of predicted proteins
predicted_genomic.fasta #fasta file containing the complete sequence of predicted genes including introns, exons, etc.
predicted_proteins.fasta #fasta file containing translated protein sequences
protocol_GeMoMaPipeline.txt #GeMoMa logfile
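
As a quick sanity check on the annotation, the feature types in final_annotation.gff can be tallied with a few lines of Python. This sketch assumes the file follows the usual nine-column, tab-separated GFF layout with the feature type in the third column; the function name `count_feature_types` is hypothetical.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count how often each feature type (GFF column 3) occurs."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts
```

If the file contains features of type "gene", `count_feature_types(open("final_annotation.gff"))["gene"]` could then be compared with the gene count reported in the abstract.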

File: hardmasked_TEs.zip

Description: RepeatMasker output, hardmasking all TEs

Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.tbl #summary table of the repeat content of the assembly by repeat class that was hardmasked
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out.gff #gff annotation file of hardmasked repeats in the assembly
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.out #list of hardmasked repeats in the assembly
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked #assembly fasta with TEs hardmasked (Ns)
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.cat.gz #list of repeat family consensus sequences used by RepeatMasker for masking
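
The .out files can also be summarised directly. The following sketch assumes the standard whitespace-separated RepeatMasker .out layout, with the query begin and end in columns 6 and 7 and the repeat class/family in column 11, and sums the annotated span per repeat class. The function name `bp_by_repeat_class` is hypothetical. Overlapping annotations are counted twice here, so the totals may exceed those in the corresponding .tbl summary, which remains the reference.

```python
from collections import defaultdict


def bp_by_repeat_class(out_lines):
    """Sum annotated base pairs per repeat class from RepeatMasker .out lines."""
    totals = defaultdict(int)
    for line in out_lines:
        fields = line.split()
        # Header and blank lines do not start with a numeric score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        repeat_class = fields[10].split("/")[0]  # "LINE/L1" -> "LINE"
        totals[repeat_class] += end - begin + 1
    return dict(totals)
```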

File: SwissProt_results.zip

Description: BLASTP hits of the predicted proteins against Swiss-Prot

Irbis_annotation_swiss-prot_blast.out #BLASTP results
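
The output format of the BLASTP results file is not stated here. If it is the 12-column tabular format (BLAST+ `-outfmt 6` with default columns: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), which is an assumption, the best Swiss-Prot hit per predicted protein could be extracted as sketched below. The function name `best_hits` is hypothetical.

```python
def best_hits(blast_lines):
    """Keep the highest-bit-score hit per query.

    Assumes 12-column tabular BLAST output (an assumption about this file);
    lines with fewer columns are skipped.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

If the file turns out to use a different layout (for example the default pairwise report), this parser will simply return an empty dictionary, which is a useful signal that the assumption does not hold.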

File: InterProScan.zip

Description: Results of functional annotation with InterProScan

predicted_proteins_nostop.fasta.tsv #interproscan output file with annotation of GO-terms, motifs etc. of the proteins predicted by GeMoMa
functional_annotation_results.txt #results summary generated by custom scripts
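
To show how the InterProScan table might be used, the sketch below collects GO terms per predicted protein. It assumes the standard InterProScan 5 TSV layout, in which the protein accession is the first column and GO annotations, when present, are in the 14th column; the function name `go_terms_by_protein` is hypothetical and this is not the custom script mentioned above.

```python
import re
from collections import defaultdict

GO_ID = re.compile(r"GO:\d{7}")


def go_terms_by_protein(tsv_lines):
    """Map each protein accession to its sorted, de-duplicated GO terms."""
    terms = defaultdict(set)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14:
            continue  # row without a GO column
        terms[fields[0]].update(GO_ID.findall(fields[13]))
    # Drop proteins whose GO column held no identifiers (e.g. "-").
    return {protein: sorted(ids) for protein, ids in terms.items() if ids}
```

Using a regular expression rather than splitting on a fixed separator keeps the sketch tolerant of small differences between InterProScan versions in how the GO column is written.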

File: hardmasked_TEs_softmasked_SR.zip

Description: RepeatMasker output, softmasking simple repeats (following the hardmasking of TEs)

Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.tbl #summary table of the repeat content of the assembly by repeat class that was softmasked
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.out.gff #gff annotation file of softmasked repeats in the assembly
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.out #list of softmasked repeats in the assembly
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.masked #assembly fasta with TEs hardmasked (Ns) and simple repeats softmasked (lowercase)
Irbis_yahs_gapclosed.scaff_seqs_updated.fasta.masked.cat.gz #list of repeat family consensus sequences used by RepeatMasker for masking