Chromosome-level genome assembly and transcriptome analysis of the Ural owl, Strix uralensis Pallas, 1771
Data files
Version of Jun 27, 2025; files total 6.31 GB
- final_annotation.gff_summary (273 B)
- functional_annotation_results.txt (745 B)
- README.md (5.07 KB)
- S3B_Suralensis_transcriptome_ncbi_final3.fasta (348.25 MB)
- S3B_Suralensis_v1.3.3_GeMoMa_all.fun.gff (3.08 GB)
- S3B_Suralensis_v1.3.3_GeMoMa_proteins.fun.fasta (52.20 MB)
- S3B_Suralensis_v1.3.3.fasta (1.27 GB)
- S3B_Suralensis_v1.3.3.fasta.masked.masked.fasta (1.28 GB)
- S3B_Suralensis_v1.3.3.fasta.out (101.53 MB)
- S3B_Suralensis_v1.3.3.fasta.out.gff (59.51 MB)
- S3B_Suralensis_v1.3.3.fasta.tbl (2.46 KB)
- S3B_v1.3.3_GeMoMa_CDS.fun.fasta (111.39 MB)
- Suralensis_assembly_commands.txt (17.98 KB)
- uralowl_masked_db-families.fa (272.89 KB)
Abstract
The Ural owl (Strix uralensis) is a large member of the Strigidae family and inhabits Eurasian forests ranging from Germany to Japan. However, it faces increasing range reduction, particularly at its southwestern distribution edges. Despite being considered ‘Least Concern’ by the IUCN, local populations have become threatened in Central Europe due to severe habitat loss. Reintroduction programs aim to restore these populations by closing distribution gaps and facilitating natural recolonization of suitable habitats. To support these efforts, genomic resources have become an established tool to assess genetic diversity, geographic structure, and potential inbreeding, all of which are crucial for maintaining the genetic health and adaptability of newly established populations. Here, we present a de novo genome assembly and transcriptome of the Ural owl based on ONT long reads, Omni-C Illumina short reads, and RNA-seq data. The final assembly has a total length of 1.26 Gb, of which 96.42% is anchored into the 42 largest scaffolds. The scaffold and contig N50 values of 88.65 Mb and 21.74 Mb, respectively, a BUSCO/compleasm completeness of 97.5%/99.65%, and a k-mer completeness of 95.18% underline the high quality of this assembly. Furthermore, annotation of the assembly identified 17,650 genes and a repeat content of 12.48%. This new, highly contiguous, chromosome-level assembly will greatly benefit Ural owl conservation management by informing reintroduction programs about the species’ genetic health and by contributing a valuable resource to study genetic function in greater detail across the whole Strigidae family.
Dataset DOI: 10.5061/dryad.qjq2bvqs3
Description of the data and file structure
Assembly:
We assembled a high-quality reference genome and transcriptome for the Ural owl (Strix uralensis) from a female bird found dead in the municipality of Kuhmo near Lentiira, Finland.
The assembly was generated from three Oxford Nanopore long-read libraries sequenced on either the MinION Mk1c or PromethION 2Solo. The resulting reads were assembled with Flye v.2.9.3, including one iteration of long-read polishing. Subsequently, the contigs of the polished assembly were anchored into chromosome-scale scaffolds with YaHS v.1.1 using Dovetail Omni-C data generated from the same individual and prepared following the Arima Hi-C mapping pipeline (https://github.com/VGP/vgp-assembly/blob/master/pipeline/salsa/arima_mapping_pipeline.sh). One iteration of gap closing was performed with TGS-GapCloser v.1.1.1 using the initial ONT reads. To further improve the assembly, we performed a second round of both scaffolding and gap closing.
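As a quick sanity check on a downloaded assembly FASTA, the N50 statistic can be recomputed from the sequence lengths. N50 is the length L such that sequences of length at least L together contain at least half of all assembled bases. The following is a minimal Python sketch, not the procedure used for the manuscript; the function names `fasta_lengths` and `n50` are hypothetical. Applied to the scaffold-level FASTA it yields a scaffold N50; a contig N50 would additionally require splitting scaffolds at gaps, which this sketch does not do.

```python
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence record in a FASTA stream."""
    length = None  # None until the first header line has been seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

To use it, open the FASTA file and pass the handle directly, for example `n50(fasta_lengths(open("S3B_Suralensis_v1.3.3.fasta")))`. The file is read line by line, so memory use stays small even for a 1.27 GB assembly.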
Annotation:
Repeat Annotation
Repeats in the genome assembly were masked in a three-step process. First, we masked known repeats for birds (‘-species aves’) based on the Repbase (release 20181026) (Bao et al., 2015) and Dfam (release 3.1-rb20181026) (Storer et al., 2021) databases with RepeatMasker v.4.1.0 (Smit et al., 2015a). Next, we identified the remaining repeats in the assembly de novo using RepeatModeler v.2.0.1 (Smit et al., 2015b). The resulting de novo repeat library was used in a second iteration of repeat masking to mask the remaining repeats in the assembly.
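The effect of the masking can be checked roughly by comparing the number of `N` bases in the unmasked and hard-masked FASTA files. The sketch below is only an illustration under two assumptions: masked bases are written as `N` (hard masking), and both files contain the same sequences with the same lengths. Gap `N`s already present in the unmasked assembly are subtracted so they are not counted as repeats. The function names are hypothetical, and the RepeatMasker `.tbl` file remains the authoritative summary.

```python
from typing import Iterable, Tuple


def count_bases(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total bases, number of N/n bases) for a FASTA stream."""
    total = 0
    n_count = 0
    for line in lines:
        if line.startswith(">"):
            continue  # skip header lines
        seq = line.strip()
        total += len(seq)
        n_count += seq.count("N") + seq.count("n")
    return total, n_count


def masked_fraction(unmasked: Iterable[str], masked: Iterable[str]) -> float:
    """Estimate the fraction of the assembly that was hard-masked.

    Assumes both streams describe the same assembly, so the total
    number of bases must agree between them.
    """
    total, gap_n = count_bases(unmasked)
    masked_total, masked_n = count_bases(masked)
    if total == 0 or total != masked_total:
        raise ValueError("inputs are empty or differ in total length")
    return (masked_n - gap_n) / total
```

For the files in this dataset, the two inputs would be open handles to `S3B_Suralensis_v1.3.3.fasta` and `S3B_Suralensis_v1.3.3.fasta.masked.masked.fasta`.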
Gene annotation
Genes in the masked assembly were predicted based on homology with GeMoMa v.1.9 (Keilwagen et al., 2018) using the following eight annotated assemblies as evidence: Chicken (Gallus gallus) GCF_016699485.2, Japanese quail (Coturnix japonica) GCF_001577835.2, Burrowing owl (Athene cunicularia) GCF_003259725.1 (Mueller et al., 2018), Common barn owl (Tyto alba) GCF_018691265.1 (Cumer et al., 2022), Speckled mousebird (Colius striatus) GCF_028858725.1, California condor (Gymnogyps californianus) GCF_018139145.2 (Robinson et al., 2021), Red-fronted tinkerbird (Pogoniulus pusillus) GCF_015220805.1, and Downy woodpecker (Dryobates pubescens) GCF_014839835.1. In addition, the corrected and trimmed RNA-seq data from five different tissues were mapped against the masked reference with STAR v.2.7.9a (Dobin et al., 2013) and used as extrinsic evidence during the annotation.
We functionally annotated the predicted proteins using InterProScan v.5.64.96 and a BLASTP v.2.15.0 search against the Swiss-Prot database (release 2024-04).
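The annotation is distributed as a GFF file of about 3 GB, so it is usually more practical to stream it than to load it at once. The sketch below, with hypothetical function names, counts features per type (column 3) and parses the `key=value` pairs of the attribute column (column 9) as defined by the GFF3 format. Which feature types and attribute keys actually occur in the GeMoMa output is not stated here and should be checked against the file itself.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import unquote


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            # GFF3 percent-encodes reserved characters in values
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 records per feature type, skipping comments and blanks."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 record
        counts[cols[2]] += 1
    return counts
```

Passing an open handle to `S3B_Suralensis_v1.3.3_GeMoMa_all.fun.gff` to `count_feature_types` gives a per-type tally that can be compared with `final_annotation.gff_summary`.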
For more details on the assembly quality assessment, please see the original manuscript and the list of commands included in this dataset.
Files and variables
File: S3B_Suralensis_v1.3.3.fasta.tbl
Description: RepeatMasker repeat table for the fully hard-masked assembly
File: S3B_Suralensis_v1.3.3.fasta.out.gff
Description: RepeatMasker repeat annotation in gff for the fully hard-masked assembly
File: S3B_Suralensis_v1.3.3.fasta.out
Description: RepeatMasker output for the fully hard-masked assembly
File: S3B_Suralensis_v1.3.3.fasta
Description: final Strix uralensis assembly (same as available under NCBI BioProject PRJNA1140424)
File: final_annotation.gff_summary
Description: Summary of GeMoMa gene prediction results
File: S3B_Suralensis_v1.3.3_GeMoMa_proteins.fun.fasta
Description: protein sequences of predicted proteins with functional annotation results in fasta format
File: S3B_v1.3.3_GeMoMa_CDS.fun.fasta
Description: Coding sequences of predicted and functionally annotated proteins
File: S3B_Suralensis_v1.3.3_GeMoMa_all.fun.gff
Description: GeMoMa annotation with functional annotation by InterProScan and Swiss-Prot in gff format
File: S3B_Suralensis_transcriptome_ncbi_final3.fasta
Description: final transcriptome assembly (same as available under NCBI BioProject PRJNA1140424)
File: Suralensis_assembly_commands.txt
Description: detailed list of bioinformatics commands used to generate the assemblies and accompanying analyses
File: uralowl_masked_db-families.fa
Description: de novo repeat library generated by RepeatModeler
File: functional_annotation_results.txt
Description: Summary of the functional annotation results
File: S3B_Suralensis_v1.3.3.fasta.masked.masked.fasta
Description: genome assembly with hard-masked repeats
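The RepeatMasker `.out` file listed above can be summarised per repeat class with a few lines of Python. The sketch below assumes the standard whitespace-separated RepeatMasker `.out` layout, in which columns 6 and 7 hold the begin and end positions in the query and column 11 holds the repeat class/family; the function name is hypothetical. Overlapping hits are counted more than once, so the totals can differ slightly from the figures in the `.tbl` file.

```python
from collections import defaultdict
from typing import Dict, Iterable


def bases_per_repeat_class(lines: Iterable[str]) -> Dict[str, int]:
    """Sum annotated bases per top-level repeat class from a RepeatMasker .out stream."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        cols = line.split()
        # Header and blank lines do not start with an integer score
        if len(cols) < 11 or not cols[0].isdigit():
            continue
        begin, end = int(cols[5]), int(cols[6])
        # "LINE/CR1" -> "LINE"; classes without a family stay unchanged
        repeat_class = cols[10].split("/")[0]
        totals[repeat_class] += end - begin + 1
    return dict(totals)
```

Dividing each total by the assembly length gives an approximate per-class share of the genome.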
Access information
Other publicly accessible locations of the data:
- The underlying raw data and the two assemblies (genome and transcriptome) can also be found under NCBI GenBank BioProject PRJNA1140424