Dryad

Three haplotype-resolved pentaploid Rosa assemblies with assembled and extracted single copy orthologue (SCO) sequences from Rosa canina genome, diploid Rosa species, and sect. Caninae pollen

Data files

Jul 17, 2024 version files 171.54 MB
May 05, 2025 version files 2.30 GB
Jun 03, 2025 version files 2.30 GB

Abstract

This dataset was created to analyze nuclear single copy regions in the Rosa canina genome assembly based on single copy orthologue (SCO) tags (Debray et al. 2019). We used 24 diploid rose species, pollen data from sect. Caninae roses, Rosa canina subgenome-specific SCO sequences, and outgroup species from the Rosaceae family. The target capture baits, designed by Agilent Technologies, covered exons, UTRs, and small introns.

The raw reads from target capture sequencing were processed with GATK to obtain a sample-specific reference for each SCO locus in each sample. Additionally, single copy loci were extracted from all whole genome assemblies (Rosa canina, Rubus idaeus, and Fragaria species) and concatenated into subgenome-specific sequences, which were used together with the sample-specific SCO references for alignment analysis.

The dataset also includes haplotype-resolved genome assemblies of Rosa canina S27, Rosa canina (DToL), and Rosa agrestis (DToL). All three Rosa samples are pentaploid (2n=5x=35), and all three assemblies are chromosome-level. Note that the assemblies contain only pseudochromosome sequences, with no unplaced contigs. The PacBio HiFi and Hi-C data of Rosa canina S27 were sequenced in-house and are available from NCBI under BioProject PRJNA1111045, while the sequencing data of Rosa canina (DToL) and Rosa agrestis (DToL) come from the Darwin Tree of Life (DToL) project. The NCBI BioProject accessions for the two DToL Rosa datasets are PRJEB79802 and PRJEB79880, respectively. The chromosomes of Rosa canina are named "Rca#_Subgenome", in which # denotes the chromosome number (possible values: 1-7) and "Subgenome" is one of S1_h1, S1_h2, S2, R3, and R4. Similarly, the chromosomes of Rosa agrestis are named "Rag#_Subgenome", where # also denotes the chromosome number and "Subgenome" is one of S1, S2, R3, R4_h1, and R4_h2. Please see our publication for details on how the haplotypes were resolved.
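
The chromosome naming scheme described above can be checked programmatically when working with the assembly files. The following is a minimal sketch in Python, not part of the dataset itself. It assumes that sequence names follow the "Rca#_Subgenome" and "Rag#_Subgenome" patterns literally (for example "Rca3_S1_h2") and that Rosa agrestis chromosome numbers also run from 1 to 7, which is inferred from 2n=5x=35 rather than stated explicitly. The function name parse_chromosome_name and the returned field names are hypothetical.

```python
import re

# Allowed subgenome labels per species prefix, taken from the description above.
SUBGENOMES = {
    "Rca": {"S1_h1", "S1_h2", "S2", "R3", "R4"},   # Rosa canina
    "Rag": {"S1", "S2", "R3", "R4_h1", "R4_h2"},   # Rosa agrestis
}

# Prefix, chromosome number (assumed 1-7 for both species), then the subgenome label.
NAME_RE = re.compile(r"^(Rca|Rag)([1-7])_(\w+)$")


def parse_chromosome_name(name):
    """Split a chromosome name into prefix, number, subgenome, and haplotype.

    Raises ValueError if the name does not match the documented scheme.
    """
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized chromosome name: {name!r}")
    prefix, number, label = match.groups()
    if label not in SUBGENOMES[prefix]:
        raise ValueError(f"subgenome {label!r} is not valid for prefix {prefix!r}")
    # Labels such as "S1_h1" carry a haplotype suffix; "S2" or "R3" do not.
    subgenome, _, haplotype = label.partition("_")
    return {
        "prefix": prefix,
        "chromosome": int(number),
        "subgenome": subgenome,
        "haplotype": haplotype or None,
    }


if __name__ == "__main__":
    print(parse_chromosome_name("Rca3_S1_h2"))
    print(parse_chromosome_name("Rag7_R3"))
```

A name such as "Rca3_R4_h1" is rejected, because R4 is split into haplotypes only in Rosa agrestis according to the scheme above.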