Data from: A prelude to conservation genomics: First chromosome-level genome assembly of a flying squirrel (Pteromyini: Pteromys volans)
Data files
Aug 13, 2025 version files 1.47 GB
- Pteromys_volans_PV54_diamond_blastp_annotation_swiss-prot.out.gz (1.42 MB)
- Pteromys_volans_PV54_final_annotation.gff.gz (8.11 MB)
- Pteromys_volans_PV54_functional_annotated_predicted_cds.fasta.gz (100.37 MB)
- Pteromys_volans_PV54_functional_annotated_predicted_proteins.fasta.gz (8.02 MB)
- Pteromys_volans_PV54_hardmasked_genome_assembly.fasta.gz (587.54 MB)
- Pteromys_volans_PV54_interproscan_output.tsv.gz (655.93 MB)
- Pteromys_volans_PV54_interproscan_results.txt.gz (481 B)
- Pteromys_volans_PV54_predicted_cds.fasta.gz (100.05 MB)
- Pteromys_volans_PV54_predicted_proteins.fasta.gz (7.97 MB)
- README.md (8.82 KB)
Abstract
The Siberian flying squirrel (Pteromys volans) represents the only European Pteromyini species. Thus, it is biogeographically unique due to its specialised anatomy and biology as a volant rodent. As a result of habitat fragmentation and destruction, Siberian flying squirrels experience severe and ongoing population declines throughout most of their distribution. While considered Least Concern throughout their immense Eurasian distribution, this species is red-listed as Vulnerable and even Critically Endangered in parts of its range. More knowledge about the population structure and overall biology is needed to improve conservation efforts for this umbrella and flagship species of old-growth boreal forests. Here, we present the first chromosome-level genome assembly of any Pteromyini, represented by P. volans (Uoulu_pteVol_1.0). The final assembly has a total length of 2.85 Gbp in 19 chromosome-scale scaffolds with only minor differences in the chromosomal structure compared to other Sciuridae. All chromosome-scale scaffolds show indications of telomeres at both ends. The N50 value (157.39 Mbp) and the BUSCO and k-mer completeness scores (97–99%) are high, indicating chromosome-level quality of the assembly. Based on whole-genome data from 17 rodent species, P. volans clusters according to known evolutionary relationships. Additionally, we present a new 16,511 bp long mitogenome unveiling differences from known conspecific mitogenomes. We propose the utility of the new reference genome for further research and development of conservation-applied genetic methods.
Dataset DOI: 10.5061/dryad.3xsj3txth
Description of the data and file structure
The presented data is related to the publication by Wehrenberg et al.: "A Prelude to Conservation Genomics: First Chromosome-Level Genome Assembly of a Flying Squirrel (Pteromyini: Pteromys volans)" available as a preprint at biorxiv.org (DOI: https://doi.org/10.1101/2025.03.20.644356).
Any questions regarding this dataset or the publication can be addressed to the corresponding authors, Gerrit Wehrenberg (gerrit.wehrenberg@oulu.fi) and Stefan Prost (stefan.prost@oulu.fi).
Files and variables
File: Pteromys_volans_PV54_hardmasked_genome_assembly.fasta.gz
Description: Hard-masked genome assembly of Pteromys volans volans
File: Pteromys_volans_PV54_interproscan_output.tsv.gz
Description: Functional annotation results from InterProScan v5.73-104.0 in tab-separated values (TSV) format
Column descriptions of the TSV file:
Column# 1: Protein Accession
Column# 2: Sequence MD5 digest
Column# 3: Sequence Length
Column# 4: Analysis
Column# 5: Signature Accession
Column# 6: Signature Description
Column# 7: Start Location
Column# 8: End Location
Column# 9: Score
Column# 10: Status
Column# 11: Date
Column# 12: InterPro Accession
Column# 13: InterPro Description
Column# 14: GO Terms
Column# 15: Pathways
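The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the original analysis: the dictionary keys, the helper names `read_interproscan_tsv` and `open_gzipped_text`, and the sample rows are all illustrative assumptions. It also assumes that rows may carry fewer than 15 fields when the optional trailing columns are absent, and it leaves the score as text because some analyses report `-` instead of a number.

```python
import csv
import gzip
import io

# Illustrative keys for the 15 columns listed above (names are assumptions).
IPRSCAN_COLUMNS = [
    "protein_accession", "sequence", "sequence_length", "analysis",
    "signature_accession", "signature_description", "start", "end",
    "score", "status", "date", "interpro_accession",
    "interpro_description", "go_terms", "pathways",
]
INT_COLUMNS = ("sequence_length", "start", "end")


def read_interproscan_tsv(handle):
    """Yield one dict per non-empty row of an InterProScan TSV text stream."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        # Pad rows that lack the optional trailing columns.
        row = row + [""] * (len(IPRSCAN_COLUMNS) - len(row))
        record = dict(zip(IPRSCAN_COLUMNS, row))
        for key in INT_COLUMNS:
            record[key] = int(record[key])
        yield record


def open_gzipped_text(path):
    """Open a gzipped text file for streaming, e.g. the *.tsv.gz file."""
    return gzip.open(path, "rt", encoding="utf-8", newline="")


# Made-up rows for demonstration only; they are not taken from the dataset.
SAMPLE = (
    "gene1_R0\tseqfield\t250\tPfam\tPF00069\tProtein kinase domain\t10\t240"
    "\t1.2E-50\tT\t01-01-2025\tIPR000719\tProtein kinase domain\tGO:0004672\t-\n"
    "gene2_R0\tseqfield\t90\tCoils\tCoil\tCoil\t5\t30\t-\tT\t01-01-2025\n"
)
demo_rows = list(read_interproscan_tsv(io.StringIO(SAMPLE)))
```

For the real file, `read_interproscan_tsv(open_gzipped_text("Pteromys_volans_PV54_interproscan_output.tsv.gz"))` streams the records without loading the whole table into memory.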
File: Pteromys_volans_PV54_interproscan_results.txt.gz
Description: Functional annotation results TXT output file by InterProScan v5.73-104.0
File: Pteromys_volans_PV54_diamond_blastp_annotation_swiss-prot.out.gz
Description: Functional annotation results output file by diamond (blastp mode) v2.1.6 utilising the database ‘Swiss-Prot’ (release: 04-2024; The UniProt Consortium, 2019)
Column descriptions of the OUT file:
Column# 1: Query ID
Column# 2: Subject ID
Column# 3: % Identity
Column# 4: Alignment length
Column# 5: Mismatches
Column# 6: Gap opens
Column# 7: Query start
Column# 8: Query end
Column# 9: Subject start
Column# 10: Subject end
Column# 11: e-value
Column# 12: Bit score
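As a rough illustration of how the 12 columns above might be used, the sketch below parses the tabular output and keeps the highest-scoring hit per query. The `Hit` type, the function names, and the sample lines are hypothetical. The default threshold of 1e-6 mirrors the e-value cutoff stated in the gene annotation notes, but whether the deposited file already contains only hits below that cutoff is not confirmed here.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment row, fields in the column order listed above."""
    query_id: str
    subject_id: str
    pident: float
    length: int
    mismatches: int
    gap_opens: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_hits(handle):
    """Yield Hit records from a 12-column tab-separated text stream."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 12:
            continue  # skip blank or truncated lines
        yield Hit(row[0], row[1], float(row[2]), *map(int, row[3:10]),
                  float(row[10]), float(row[11]))


def best_hits(hits, max_evalue=1e-6):
    """Return {query_id: Hit} with the highest bit score per query."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query_id] = hit
    return best


# Made-up alignment rows for demonstration only.
SAMPLE = (
    "geneA\tsp|P1|X_HUMAN\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200.0\n"
    "geneA\tsp|P2|Y_MOUSE\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150.0\n"
    "geneB\tsp|P3|Z_HUMAN\t30.0\t50\t35\t0\t1\t50\t1\t50\t1e-3\t40.0\n"
)
demo_best = best_hits(parse_hits(io.StringIO(SAMPLE)))
```

To run this on the deposited file, pass `gzip.open(path, "rt")` from the standard library as the handle.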
File: Pteromys_volans_PV54_final_annotation.gff.gz
Description: GFF3 annotation of genes by GeMoMa v.1.9
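A quick way to get an overview of a GFF3 file is to count its features by type (column 3 of the nine tab-separated GFF3 columns). The sketch below assumes nothing about which feature types or attributes GeMoMa writes; the function name and the sample records are made up for illustration.

```python
import io
from collections import Counter


def count_feature_types(handle):
    """Count GFF3 records per feature type (third column)."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an optional embedded sequence section follows; stop here
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[fields[2]] += 1
    return counts


# Made-up GFF3 records for demonstration only.
SAMPLE = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t1\t100\t.\t+\t.\tID=g1\n"
    "chr1\texample\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1\n"
    "chr1\texample\tCDS\t1\t50\t.\t+\t0\tID=c1;Parent=t1\n"
    "chr1\texample\tCDS\t60\t100\t.\t+\t1\tID=c2;Parent=t1\n"
)
demo_counts = count_feature_types(io.StringIO(SAMPLE))
```

For the gzipped annotation, open it with `gzip.open(path, "rt")` and pass the resulting handle to `count_feature_types`.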
File: Pteromys_volans_PV54_predicted_cds.fasta.gz
Description: Nucleotide coding sequences (CDS) by GeMoMa v.1.9
File: Pteromys_volans_PV54_predicted_proteins.fasta.gz
Description: Predicted amino acid sequences (proteins) by GeMoMa v.1.9
File: Pteromys_volans_PV54_functional_annotated_predicted_cds.fasta.gz
Description: Functionally annotated nucleotide coding sequences (CDS)
File: Pteromys_volans_PV54_functional_annotated_predicted_proteins.fasta.gz
Description: Functionally annotated predicted amino acid sequences (proteins)
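The FASTA files listed above can be streamed with the Python standard library alone. This is a minimal sketch under the assumption that the files are plain multi-line FASTA; the function name `read_fasta` and the sample records are illustrative, and no particular header layout is assumed.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for demonstration only.
SAMPLE = ">seq1 some description\nATGGCC\nTAA\n>seq2\nATGTAA\n"
demo_records = list(read_fasta(io.StringIO(SAMPLE)))
demo_lengths = {header.split()[0]: len(seq) for header, seq in demo_records}
```

Because all deposited files are gzipped, a real call would wrap the path in `gzip.open(path, "rt")`. For the genome assembly, iterating record by record keeps only one scaffold in memory at a time.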
Code/software
RepeatMasker v4.1.6 (Smit et al., 2023)
GeMoMa v.1.9 (Keilwagen et al., 2016, 2018)
diamond (blastp mode) v2.1.6 (Buchfink et al., 2015)
InterProScan v5.73-104.0 (Quevillon et al., 2005; Jones et al., 2014)
MAKER v3.01.03 (Cantarel et al., 2008)
database: ‘Swiss-Prot’ database (release: 04-2024; The UniProt Consortium, 2019)
Access information
The full chromosome-level genome assembly, the mitochondrial genome assembly and the assigned biosample information are available in BioProject PRJNA1141127.
All files provided here were gzipped.
Repeat annotation:
Hard-masking was executed with RepeatMasker v4.1.6 (Smit et al., 2023).
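In a hard-masked assembly, repeat regions are replaced by N characters, so the share of N bases gives a rough idea of how much sequence was masked. The sketch below is an illustration only: the function name and sample are made up, and because assembly gaps are also written as N, the result should be read as an upper bound on the masked fraction rather than an exact repeat content.

```python
import io


def masked_fraction(handle):
    """Return the fraction of N/n bases across all sequences in a FASTA stream."""
    total = masked = 0
    for line in handle:
        if line.startswith(">"):
            continue  # header line, carries no bases
        seq = line.strip()
        total += len(seq)
        masked += seq.count("N") + seq.count("n")
    return masked / total if total else 0.0


# Made-up sequences for demonstration only: 6 of 16 bases are N.
SAMPLE = ">chr1\nACGTNNNN\nNNAC\n>chr2\nACGT\n"
demo_fraction = masked_fraction(io.StringIO(SAMPLE))
```

The function reads line by line, so it can be applied to the gzipped assembly through `gzip.open(path, "rt")` without holding whole chromosomes in memory.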
Gene annotation:
Genes in the masked assembly were predicted based on homology with Gene Model Mapper (GeMoMa) v.1.9 (Keilwagen et al., 2016, 2018) using the following publicly available assemblies at NCBI GenBank: human (Homo sapiens Linnaeus, 1758; GCA_000001405.29), house mouse (Mus musculus Linnaeus, 1758; GCA_000001635.9), eastern grey squirrel (Sciurus carolinensis Gmelin, 1788; GCA_902686445.2 (Mead et al., 2020a)), thirteen-lined ground squirrel (Ictidomys tridecemlineatus (Mitchill, 1821); GCA_016881025.1 (Fu et al., 2021)), Arctic ground squirrel (Urocitellus parryii Richardson, 1825; GCA_003426925.1), Alpine marmot (Marmota marmota marmota (Linnaeus, 1758); GCA_001458135.2), and groundhog or woodchuck (Marmota monax (Linnaeus, 1758); GCA_021218885.2 (Clarke and Bader, 2024)). The completeness of the transcript annotation was assessed with BUSCO v5.4.7 utilising the lineage dataset mammalia_odb10.
Functional annotation of the proteins predicted by GeMoMa was conducted with diamond (blastp mode) v2.1.6 (Buchfink et al., 2015) with an e-value significance cutoff of ≤ 10^-6 against the ‘Swiss-Prot’ database (release: 04-2024; The UniProt Consortium, 2019). Furthermore, we annotated gene ontology (GO) terms, domains, and motifs using InterProScan v5.73-104.0 (Quevillon et al., 2005; Jones et al., 2014). The gene annotation results were combined utilising modified functional annotation scripts from MAKER v3.01.03 (Cantarel et al., 2008; https://github.com/schellt/maker-functional).
References:
Buchfink, B., Xie, C., Huson, D.H., 2015. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60. https://doi.org/10.1038/nmeth.3176
Cantarel, B.L., Korf, I., Robb, S.M.C., Parra, G., Ross, E., Moore, B., Holt, C., Sánchez Alvarado, A., Yandell, M., 2008. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196. https://doi.org/10.1101/gr.6743907
Clarke, Z., Bader, G., 2024. Healthy woodchuck genome with viral sequences appended used for single-cell RNA-seq analysis. https://doi.org/10.5281/ZENODO.10855128
Fu, R., Gillen, A.E., Grabek, K.R., Riemondy, K.A., Epperson, L.E., Bustamante, C.D., Hesselberth, J.R., Martin, S.L., 2021. Dynamic RNA Regulation in the Brain Underlies Physiological Plasticity in a Hibernating Mammal. Front. Physiol. 11, 624677. https://doi.org/10.3389/fphys.2020.624677
Jones, P., Binns, D., Chang, H.-Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., Pesseat, S., Quinn, A.F., Sangrador-Vegas, A., Scheremetjew, M., Yong, S.-Y., Lopez, R., Hunter, S., 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240. https://doi.org/10.1093/bioinformatics/btu031
Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S.O., Grau, J., 2018. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinformatics 19, 189. https://doi.org/10.1186/s12859-018-2203-5
Keilwagen, J., Wenk, M., Erickson, J.L., Schattat, M.H., Grau, J., Hartung, F., 2016. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res 44, e89–e89. https://doi.org/10.1093/nar/gkw092
Mead, D., Fingland, K., Cripps, R., Portela Miguez, R., Smith, M., Corton, C., Oliver, K., Skelton, J., Betteridge, E., Doulcan, J., Quail, M.A., McCarthy, S.A., Howe, K., Sims, Y., Torrance, J., Tracey, A., Challis, R., Durbin, R., Blaxter, M., 2020a. The genome sequence of the eastern grey squirrel, Sciurus carolinensis Gmelin, 1788. Wellcome Open Res 5, 27. https://doi.org/10.12688/wellcomeopenres.15721.1
Mead, D., Fingland, K., Cripps, R., Portela Miguez, R., Smith, M., Corton, C., Oliver, K., Skelton, J., Betteridge, E., Doulcan, J., Dudchenko, O., Omer, A.D., Weisz, D., Lieberman Aiden, E., Fedrigo, O., Mountcastle, J., Jarvis, E., McCarthy, S.A., Sims, Y., Torrance, J., Tracey, A., Howe, K., Challis, R., Durbin, R., Blaxter, M., 2020b. The genome sequence of the Eurasian red squirrel, Sciurus vulgaris Linnaeus 1758. Wellcome Open Res 5, 18. https://doi.org/10.12688/wellcomeopenres.15679.1
Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., Lopez, R., 2005. InterProScan: protein domains identifier. Nucleic Acids Research 33, W116–W120. https://doi.org/10.1093/nar/gki442
Smit, A.F.A., Hubley, R., Green, P., 2023. RepeatMasker.
The UniProt Consortium, 2019. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research 47, D506–D515. https://doi.org/10.1093/nar/gky1049
