Three haplotype-resolved pentaploid Rosa assemblies with assembled and extracted single copy orthologue (SCO) sequences from Rosa canina genome, diploid Rosa species, and sect. Caninae pollen
Data files
Jul 17, 2024 version files 171.54 MB
- 2xphyl.align7.fasta (35.75 MB)
- C1_pollen_canina.ref.fasta (2.34 MB)
- CAN2pollen.ref.fasta (2.30 MB)
- CRW_pollen_canina.ref.fasta (2.34 MB)
- Fragaria_iinumae.fasta (822.13 KB)
- Fragaria_nilgerrensis.fasta (821.88 KB)
- Fvesca.fasta (828.45 KB)
- LA1_pollen_rubiginosa.ref.fasta (2.37 MB)
- Leuba2_pollen_sherardii.ref.fasta (2.38 MB)
- loci.list (46.81 KB)
- per1_persica.ref.fasta (2.31 MB)
- pollen_leaf_vcfs.zip (31.97 MB)
- R4.subgenome.fasta (1.48 MB)
- R5.subgenome.fasta (1.48 MB)
- README.md (2.37 KB)
- Rub_pollen.ref.fasta (2.30 MB)
- Rubus_idaeus.fasta (811.48 KB)
- ruW1_pollen_rubiginosa.ref.fasta (2.37 MB)
- S1.subgenome.fasta (1.48 MB)
- S13_pollen_corymbifera.ref.fasta (2.37 MB)
- S2_pollen_corymbifera.ref.fasta (2.37 MB)
- S2.subgenome.fasta (1.48 MB)
- s3.subgenome.fasta (1.49 MB)
- SA1_ecae.ref.fasta (2.29 MB)
- sa10_xanthina.ref.fasta (2.32 MB)
- SA106_carolina.ref.fasta (2.33 MB)
- SA11_omeiensis.ref.fasta (2.33 MB)
- SA114_multiflora.ref.fasta (2.34 MB)
- sa115_filipes.ref.fasta (2.32 MB)
- SA119_maximowicziana.ref.fasta (2.36 MB)
- SA12_majalis.ref.fasta (2.32 MB)
- SA120_sempervirens.ref.fasta (2.35 MB)
- SA122_brunonii.ref.fasta (2.33 MB)
- SA13_arvensis.ref.fasta (2.32 MB)
- sa14_filipes.ref.fasta (2.32 MB)
- SA2_stellata.ref.fasta (2.32 MB)
- SA20_multiflora.ref.fasta (2.33 MB)
- SA22_setigera.ref.fasta (2.32 MB)
- SA27_cf_longicuspis.ref.fasta (2.36 MB)
- SA32_rugosa.ref.fasta (2.31 MB)
- SA4_palustris.ref.fasta (2.32 MB)
- SA40_multiflora.ref.fasta (2.33 MB)
- SA47_cf_caudata.ref.fasta (2.33 MB)
- SA70_davurica.ref.fasta (2.32 MB)
- SA73_chinensis.ref.fasta (2.33 MB)
- SA82_xanthina.ref.fasta (2.32 MB)
- SA85_moschata.ref.fasta (2.32 MB)
- SA86_luciae.ref.fasta (2.31 MB)
- sa88_blanda.ref.fasta (2.31 MB)
- SA89_transmorrisonensis.ref.fasta (2.38 MB)
- sa98_luciae.ref.fasta (2.31 MB)
- sample.tab (5.25 KB)
- Supplementary_Dataset_2.fasta (2.15 MB)
- WZ19_majalis.ref.fasta (2.32 MB)
May 05, 2025 version files 2.30 GB
- 2xphyl.align7.fasta (35.75 MB)
- C1_pollen_canina.ref.fasta (2.34 MB)
- CAN2pollen.ref.fasta (2.30 MB)
- CRW_pollen_canina.ref.fasta (2.34 MB)
- Fragaria_iinumae.fasta (822.13 KB)
- Fragaria_nilgerrensis.fasta (821.88 KB)
- Fvesca.fasta (828.45 KB)
- LA1_pollen_rubiginosa.ref.fasta (2.37 MB)
- Leuba2_pollen_sherardii.ref.fasta (2.38 MB)
- loci.list (46.81 KB)
- per1_persica.ref.fasta (2.31 MB)
- pollen_leaf_vcfs.zip (31.97 MB)
- R4.subgenome.fasta (1.48 MB)
- R5.subgenome.fasta (1.48 MB)
- README.md (4.27 KB)
- rosArg_DToL_5n_chrsOnly.fasta.gz (700.96 MB)
- rosCan_DTOL_5n_chrsOnly.fasta.gz (717.13 MB)
- rosCan_S27_v1.fasta.gz (708.16 MB)
- Rub_pollen.ref.fasta (2.30 MB)
- Rubus_idaeus.fasta (811.48 KB)
- ruW1_pollen_rubiginosa.ref.fasta (2.37 MB)
- S1.subgenome.fasta (1.48 MB)
- S13_pollen_corymbifera.ref.fasta (2.37 MB)
- S2_pollen_corymbifera.ref.fasta (2.37 MB)
- S2.subgenome.fasta (1.48 MB)
- s3.subgenome.fasta (1.49 MB)
- SA1_ecae.ref.fasta (2.29 MB)
- sa10_xanthina.ref.fasta (2.32 MB)
- SA106_carolina.ref.fasta (2.33 MB)
- SA11_omeiensis.ref.fasta (2.33 MB)
- SA114_multiflora.ref.fasta (2.34 MB)
- sa115_filipes.ref.fasta (2.32 MB)
- SA119_maximowicziana.ref.fasta (2.36 MB)
- SA12_majalis.ref.fasta (2.32 MB)
- SA120_sempervirens.ref.fasta (2.35 MB)
- SA122_brunonii.ref.fasta (2.33 MB)
- SA13_arvensis.ref.fasta (2.32 MB)
- sa14_filipes.ref.fasta (2.32 MB)
- SA2_stellata.ref.fasta (2.32 MB)
- SA20_multiflora.ref.fasta (2.33 MB)
- SA22_setigera.ref.fasta (2.32 MB)
- SA27_cf_longicuspis.ref.fasta (2.36 MB)
- SA32_rugosa.ref.fasta (2.31 MB)
- SA4_palustris.ref.fasta (2.32 MB)
- SA40_multiflora.ref.fasta (2.33 MB)
- SA47_cf_caudata.ref.fasta (2.33 MB)
- SA70_davurica.ref.fasta (2.32 MB)
- SA73_chinensis.ref.fasta (2.33 MB)
- SA82_xanthina.ref.fasta (2.32 MB)
- SA85_moschata.ref.fasta (2.32 MB)
- SA86_luciae.ref.fasta (2.31 MB)
- sa88_blanda.ref.fasta (2.31 MB)
- SA89_transmorrisonensis.ref.fasta (2.38 MB)
- sa98_luciae.ref.fasta (2.31 MB)
- sample.tab (5.25 KB)
- Supplementary_Dataset_2.fasta (2.15 MB)
- WZ19_majalis.ref.fasta (2.32 MB)
Jun 03, 2025 version files 2.30 GB
- 2xphyl.align7.fasta (35.75 MB)
- C1_pollen_canina.ref.fasta (2.34 MB)
- CAN2pollen.ref.fasta (2.30 MB)
- CRW_pollen_canina.ref.fasta (2.34 MB)
- Fragaria_iinumae.fasta (822.13 KB)
- Fragaria_nilgerrensis.fasta (821.88 KB)
- Fvesca.fasta (828.45 KB)
- LA1_pollen_rubiginosa.ref.fasta (2.37 MB)
- Leuba2_pollen_sherardii.ref.fasta (2.38 MB)
- loci.list (46.81 KB)
- per1_persica.ref.fasta (2.31 MB)
- pollen_leaf_vcfs.zip (31.97 MB)
- R4.subgenome.fasta (1.48 MB)
- R5.subgenome.fasta (1.48 MB)
- README.md (4.41 KB)
- rosArg_DToL_5n_chrsOnly.fasta.gz (700.96 MB)
- rosCan_DTOL_5n_chrsOnly.fasta.gz (717.13 MB)
- rosCan_S27_v1.fasta.gz (708.16 MB)
- Rub_pollen.ref.fasta (2.30 MB)
- Rubus_idaeus.fasta (811.48 KB)
- ruW1_pollen_rubiginosa.ref.fasta (2.37 MB)
- S1.subgenome.fasta (1.48 MB)
- S13_pollen_corymbifera.ref.fasta (2.37 MB)
- S2_pollen_corymbifera.ref.fasta (2.37 MB)
- S2.subgenome.fasta (1.48 MB)
- s3.subgenome.fasta (1.49 MB)
- SA1_ecae.ref.fasta (2.29 MB)
- sa10_xanthina.ref.fasta (2.32 MB)
- SA106_carolina.ref.fasta (2.33 MB)
- SA11_omeiensis.ref.fasta (2.33 MB)
- SA114_multiflora.ref.fasta (2.34 MB)
- sa115_filipes.ref.fasta (2.32 MB)
- SA119_maximowicziana.ref.fasta (2.36 MB)
- SA12_majalis.ref.fasta (2.32 MB)
- SA120_sempervirens.ref.fasta (2.35 MB)
- SA122_brunonii.ref.fasta (2.33 MB)
- SA13_arvensis.ref.fasta (2.32 MB)
- sa14_filipes.ref.fasta (2.32 MB)
- SA2_stellata.ref.fasta (2.32 MB)
- SA20_multiflora.ref.fasta (2.33 MB)
- SA22_setigera.ref.fasta (2.32 MB)
- SA27_cf_longicuspis.ref.fasta (2.36 MB)
- SA32_rugosa.ref.fasta (2.31 MB)
- SA4_palustris.ref.fasta (2.32 MB)
- SA40_multiflora.ref.fasta (2.33 MB)
- SA47_cf_caudata.ref.fasta (2.33 MB)
- SA70_davurica.ref.fasta (2.32 MB)
- SA73_chinensis.ref.fasta (2.33 MB)
- SA82_xanthina.ref.fasta (2.32 MB)
- SA85_moschata.ref.fasta (2.32 MB)
- SA86_luciae.ref.fasta (2.31 MB)
- sa88_blanda.ref.fasta (2.31 MB)
- SA89_transmorrisonensis.ref.fasta (2.38 MB)
- sa98_luciae.ref.fasta (2.31 MB)
- sample.tab (5.25 KB)
- Supplementary_Dataset_12.fasta (2.15 MB)
- WZ19_majalis.ref.fasta (2.32 MB)
Abstract
This dataset was created to analyze nuclear single-copy regions in the Rosa canina genome assembly based on single copy orthologue (SCO) tags (Debray et al. 2019). We used 24 diploid rose species, pollen data from sect. Caninae roses, Rosa canina subgenome-specific SCO sequences, and outgroup species from the Rosaceae family. The target capture baits, designed by Agilent Technologies, covered exons, UTRs, and small introns.
The raw reads from target capture sequencing were processed with GATK to obtain a sample-specific reference for each SCO locus and each sample. Additionally, single-copy loci from all whole-genome assemblies (Rosa canina, Rubus idaeus, and Fragaria species) were extracted and concatenated into subgenome-specific sequences, which were used together with the sample-specific SCO references for alignment analysis.
The dataset also includes haplotype-resolved genome assemblies of Rosa canina S27, Rosa canina (DToL), and Rosa agrestis (DToL). All three Rosa samples are pentaploid (2n=5x=35), and all three assemblies are chromosome-level. Note that the assemblies contain only pseudochromosomal sequences and no unplaced contigs. The PacBio HiFi and Hi-C data of Rosa canina S27 were sequenced in-house and can be downloaded from NCBI under BioProject PRJNA1111045, while the sequencing data of Rosa canina (DToL) and Rosa agrestis (DToL) are from the Darwin Tree of Life (DToL) project. The NCBI BioProject accessions for the two DToL Rosa datasets are PRJEB79802 and PRJEB79880, respectively. The chromosomes of Rosa canina are named "Rca#_Subgenome", in which # denotes the chromosome number (possible values: 1-7) and 'Subgenome' is one of S1_h1, S1_h2, S2, R3, and R4. Similarly, the chromosomes of Rosa agrestis are named "Rag#_Subgenome", where # also denotes the chromosome number and 'Subgenome' is one of S1, S2, R3, R4_h1, and R4_h2. Please see our publication for details on how we resolved the haplotypes.
https://doi.org/10.5061/dryad.cc2fqz6fh
This dataset was created to analyze nuclear single-copy regions in the Rosa canina genome assembly based on single copy orthologue (SCO) tags (Debray et al. 2019). We used 24 diploid rose species, pollen data from sect. Caninae roses, Rosa canina subgenome-specific SCO sequences, and outgroup species from the Rosaceae family. The target capture baits, designed by Agilent Technologies, covered exons, UTRs, and small introns.
The raw reads from target capture sequencing were processed with GATK to obtain a sample-specific reference for each SCO locus and each sample. Additionally, single-copy loci from all whole-genome assemblies (Rosa canina, Rubus idaeus, and Fragaria species) were extracted and concatenated into subgenome-specific sequences, which were used together with the sample-specific SCO references for alignment analysis.
Description of the data and file structure
The dataset includes a file called sample.tab containing relevant information about the .fasta files and the sample sources.
Sequence data are provided as .fasta files containing the assembled SCO loci.
A file called loci.list lists all loci that were used for concatenation, alignment, and phylogenetic analysis because of their single-copy characteristics. Locus names come from the Rosa chinensis haploid line genome v.1.0 (Hibrand Saint-Oyant et al., 2018, https://doi.org/10.1038/s41477-018-0166-1). These names describe the gene annotation and the region where the SCO loci are located, e.g. RC7G0579500_0_602.
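As a minimal sketch of how such locus names could be split into their parts, the Python snippet below assumes that a name has the form gene identifier, underscore, and two integers describing the region (as in RC7G0579500_0_602). The interpretation of the two numbers as start and end, and the names Locus and parse_locus, are assumptions made for illustration and are not part of the dataset.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    gene: str   # gene annotation ID, e.g. "RC7G0579500"
    start: int  # first number of the region (assumed to be the start)
    end: int    # second number of the region (assumed to be the end)


def parse_locus(name: str) -> Locus:
    """Split a locus name such as 'RC7G0579500_0_602' into its parts.

    Assumption: the last two underscore-separated fields are integers.
    """
    parts = name.strip().rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected locus name: {name!r}")
    gene, start, end = parts
    return Locus(gene, int(start), int(end))


if __name__ == "__main__":
    print(parse_locus("RC7G0579500_0_602"))
```

Applied to every line of loci.list, this would give a table of gene IDs and regions that can be matched against the FASTA records.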
The alignment file is also included as 2xphyl.align7.fasta.
Additionally, the dataset contains variant call format (.vcf) files within a zip archive (pollen_leaf_vcfs.zip) for SNP count analysis between Rosa section Caninae somatic leaf tissue and pollen DNA samples. The original target based on the Rosa chinensis haploid line genome v.1.0 is also included as Supplementary_Dataset_12.fasta.
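The SNP counts mentioned above could be obtained along the following lines. This is only a sketch under assumptions, not the procedure used for the article: it treats a record as a SNP when REF and every ALT allele are single nucleotides, and the function name count_snps is hypothetical. The layout of the files inside pollen_leaf_vcfs.zip is not described here, so the example works on any iterable of VCF text lines.

```python
from typing import Iterable

BASES = set("ACGT")


def count_snps(vcf_lines: Iterable[str]) -> int:
    """Count VCF records whose REF and all ALT alleles are single bases."""
    n = 0
    for line in vcf_lines:
        # Skip blank lines, meta-information lines and the column header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]              # column 4: REF
        alts = fields[4].split(",")  # column 5: ALT (comma-separated)
        if ref in BASES and all(alt in BASES for alt in alts):
            n += 1
    return n


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "locusA\t10\t.\tA\tG\t50\tPASS\t.\n",   # SNP
        "locusA\t25\t.\tAT\tA\t50\tPASS\t.\n",  # deletion, not counted
    ]
    print(count_snps(example))
```

The lines can come from a file opened directly out of the archive with the standard zipfile module, wrapped in io.TextIOWrapper for text decoding.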
The dataset also includes haplotype-resolved genome assemblies of Rosa canina S27 (rosCan_S27_v1.fasta.gz), Rosa canina DToL (rosCan_DTOL_5n_chrsOnly.fasta.gz), and Rosa agrestis DToL (rosArg_DToL_5n_chrsOnly.fasta.gz). All three Rosa samples are pentaploid (2n=5x=35), and all three assemblies are chromosome-level. Note that the assemblies contain only pseudochromosomal sequences and no unplaced contigs. The PacBio HiFi and Hi-C data of Rosa canina S27 were sequenced in-house and can be downloaded from NCBI under BioProject PRJNA1111045, while the sequencing data of Rosa canina (DToL) and Rosa agrestis (DToL) are from the Darwin Tree of Life (DToL) project. The NCBI BioProject accessions for the two DToL Rosa datasets are PRJEB79802 and PRJEB79880, respectively. The chromosomes of Rosa canina are named “Rca#_Subgenome”, in which # denotes the chromosome number (possible values: 1-7) and ‘Subgenome’ is one of S1_h1, S1_h2, S2, R3, and R4. Similarly, the chromosomes of Rosa agrestis are named “Rag#_Subgenome”, where # also denotes the chromosome number and ‘Subgenome’ is one of S1, S2, R3, R4_h1, and R4_h2. Please see our publication for details on how we assembled the genomes and resolved the haplotypes.
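The chromosome naming scheme described above can be checked programmatically. The following Python sketch parses a name such as Rca3_S1_h1 and validates the subgenome label against the lists given in this README. The names Chromosome and parse_chromosome are hypothetical, and the sketch assumes that the chromosome name is the first whitespace-delimited token of the FASTA header and that Rosa agrestis chromosomes are also numbered 1-7.

```python
import re
from typing import NamedTuple

# Subgenome labels per species prefix, as listed in this README.
ALLOWED = {
    "Rca": {"S1_h1", "S1_h2", "S2", "R3", "R4"},
    "Rag": {"S1", "S2", "R3", "R4_h1", "R4_h2"},
}
NAME_RE = re.compile(r"^(Rca|Rag)([1-7])_(\w+)$")


class Chromosome(NamedTuple):
    species: str    # "Rca" (Rosa canina) or "Rag" (Rosa agrestis)
    number: int     # chromosome number, 1-7
    subgenome: str  # e.g. "S1_h1" or "R4"


def parse_chromosome(header: str) -> Chromosome:
    """Parse 'Rca#_Subgenome' / 'Rag#_Subgenome', with or without '>'."""
    tokens = header.lstrip(">").split()
    match = NAME_RE.match(tokens[0]) if tokens else None
    if match is None:
        raise ValueError(f"not a chromosome name: {header!r}")
    species, number, subgenome = match.groups()
    if subgenome not in ALLOWED[species]:
        raise ValueError(f"unexpected subgenome {subgenome!r} for {species}")
    return Chromosome(species, int(number), subgenome)


if __name__ == "__main__":
    print(parse_chromosome(">Rca3_S1_h1"))
    print(parse_chromosome("Rag7_R4_h2"))
```

Grouping the parsed records by subgenome label gives the five chromosome sets of each pentaploid assembly.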
Sharing/Access information
NCBI BioProject: PRJNA1111045
Code/Software
For any code and bioinformatic information, please contact the author: veit.herklotz@senckenberg.de
Version changes
03.05.2025: The haplotype-resolved genome assemblies of Rosa canina S27 (rosCan_S27_v1.fasta.gz), Rosa canina DToL (rosCan_DTOL_5n_chrsOnly.fasta.gz), and Rosa agrestis DToL (rosArg_DToL_5n_chrsOnly.fasta.gz) were added to the dataset. Volker Wissemann was added as an author, the author order was changed, and the journal was specified. The title was changed to indicate the three genome assemblies.
03.06.2025: The file Supplementary_Dataset_2.fasta was renamed to Supplementary_Dataset_12.fasta to match the article.