Data from: Chromosome-scale genome assemblies of two allopolyploid Cuscuta species uncover genomic signatures of parasitic lifestyle and polyploid evolution
Data files
Feb 06, 2026 version, files totaling 205.38 MB
- Calystegia_soldanella_braker3.tar.gz (24.70 MB)
- Cuscuta_pentagona_trinity.tar.gz (41.20 MB)
- divergence_time.tar.gz (6.48 MB)
- ds_analysis.tar.gz (59.88 MB)
- README.md (5.16 KB)
- repeat_data.tar.gz (73.12 MB)
Abstract
Dodders (Cuscuta spp.) are obligate parasitic plants that have lost a large portion of photosynthetic genes but gained host genes through parasitism-mediated horizontal gene transfer. Their migration across the world has contributed to the complexity of speciation via geographic isolation. Here, we report the de novo genome assemblies of two phylogenetically distinct dodders: Cuscuta campestris (2n = 4x = 60) and Cuscuta chinensis (2n = 4x = 60), which are classified into Clade B and Clade H of subgenus Grammica, respectively. Relatively low completeness of Benchmarking Universal Single-Copy Orthologs genes (ca. 87%) indicated progressive gene loss after evolution of the parasitic lifestyle due to release from functional constraints such as photosynthesis and organ development. Comparative genomics analyses revealed that the genome size of each species differs significantly, despite having the same cytotype and allopolyploidy, through independent hybridization involving different ancient parents. Various genomic rearrangements have likely contributed to the genomic diversity and sexual isolation of the two lineages, which partly share habitats, including (1) gene gain and loss events, (2) homoeologous recombination between two subgenomes, and (3) lineage-specific proliferation of transposable elements. Our findings not only provide a genomic basis for surveying parental species for allopolyploidization but also enhance understanding of the unique speciation of parasitic dodders through these chromosomal events.
Name: Tenta Segawa
Institution: Research Institute, Suntory Global Innovation Center Ltd.
Email: Tenta_Segawa@suntory.co.jp
Name: Eiichiro Ono
Institution: Research Institute, Suntory Global Innovation Center Ltd.
Email: Eiichiro_Ono@suntory.co.jp
Dataset Overview
This dataset comprises the raw and key intermediate data supporting our study on the genome assembly of two allopolyploid Cuscuta species. It contains:
- Repeat data for the Cuscuta genomes
- Divergence time estimation inputs
- Gene prediction results for Calystegia soldanella
- ds calculations
- Trinity-based transcriptome assembly results for Cuscuta pentagona
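Each of the items above is shipped as a gzip-compressed tar archive. As a minimal sketch (the helper name `list_members` is hypothetical and not part of the dataset), the contents of an archive can be inspected with Python's standard `tarfile` module before unpacking it:

```python
import tarfile


def list_members(archive_path):
    """Return (name, size_in_bytes) for every regular file in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Directories and links are skipped; only regular files are reported.
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]
```

For example, `list_members("repeat_data.tar.gz")` would return the file names and sizes stored in the repeat archive, assuming it has been downloaded to the working directory.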
Files and Folders
Repeat data for the Cuscuta genomes
The files are contained in repeat_data.tar.gz. Details are provided below.
Cuscuta_campestris_Kyotango.TE-families.fa
RepeatModeler results for Cuscuta campestris.
Consensus sequences of TE families are provided in FASTA format.
Cuscuta_campestris_Kyotango.TE.v1.gff
RepeatMasker results for Cuscuta campestris.
Genomic locations of TEs are provided in GFF format.
Cuscuta_chinensis_Kaifu.TE-families.fa
RepeatModeler results for Cuscuta chinensis.
Consensus sequences of TE families are provided in FASTA format.
Cuscuta_chinensis_Kaifu.TE.v1.gff
RepeatMasker results for Cuscuta chinensis.
Genomic locations of TEs are provided in GFF format.
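The GFF files can be summarized with a few lines of Python. The sketch below assumes only the generic tab-separated GFF layout (sequence ID in column 1, 1-based inclusive start and end in columns 4 and 5); it does not rely on the attribute column, whose exact content in these files is not described here. The function name `te_bp_per_sequence` is hypothetical, and overlapping annotations are counted more than once.

```python
from collections import defaultdict


def te_bp_per_sequence(lines):
    """Sum annotated TE base pairs per sequence from GFF lines.

    Overlapping features are not merged, so totals can exceed the
    number of distinct masked bases.
    """
    totals = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        totals[seqid] += end - start + 1  # GFF coordinates are inclusive
    return dict(totals)
```

A typical call would pass an open file handle, e.g. `te_bp_per_sequence(open("Cuscuta_campestris_Kyotango.TE.v1.gff"))`.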
Divergence time estimation inputs
The files are contained in divergence_time.tar.gz. Details are provided below.
Orthogroups.txt
OrthoFinder results.
A table listing each OrthoID and the gene IDs belonging to that group.
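As a hedged illustration of how this table can be read, the sketch below assumes the usual OrthoFinder `Orthogroups.txt` layout, in which each line holds an orthogroup ID, a colon, and a whitespace-separated list of gene IDs. The function name `parse_orthogroups` is hypothetical.

```python
def parse_orthogroups(lines):
    """Map each orthogroup ID to its list of gene IDs.

    Assumes lines of the form 'OG0000000: geneA geneB geneC'.
    """
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        og_id, _, members = line.partition(":")
        groups[og_id.strip()] = members.split()
    return groups
```

The returned keys should match the OrthoID subfolder names inside the alignment folder.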
alignment folder
This folder has the following structure.
alignment---OrthoID---prot.fa
         |         |-aligned_prot.fa
         |         |-aligned_cds.fa
         :
Inside the alignment folder, there are subfolders named after the OrthoIDs listed in Orthogroups.txt.
prot.fa contains the amino acid sequences from multiple species corresponding to that OrthoID.
aligned_prot.fa contains the multiple sequence alignment of prot.fa produced with MAFFT.
aligned_cds.fa contains the codon-aware alignment obtained by converting aligned_prot.fa with pal2nal.
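A codon-aware alignment should have all sequences at the same length, and that length should be a multiple of three. The sketch below is a quick sanity check along those lines, not part of the original pipeline; `read_fasta` and `is_codon_alignment` are hypothetical helper names.

```python
def read_fasta(lines):
    """Read FASTA lines into a dict of {sequence_id: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def is_codon_alignment(records):
    """True if all sequences share one length that is divisible by three."""
    lengths = {len(seq) for seq in records.values()}
    return len(lengths) == 1 and lengths.pop() % 3 == 0
```

For one orthogroup this could be used as `is_codon_alignment(read_fasta(open("alignment/<OrthoID>/aligned_cds.fa")))`, where `<OrthoID>` stands for a subfolder name.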
4Dtv.fa
The concatenated nucleotides from 4Dtv sites are provided in FASTA format.
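To illustrate what fourfold-degenerate (4D) sites are, the sketch below pulls third codon positions from a codon alignment under the standard genetic code. It uses one common, conservative definition as an assumption: a column is kept only when every sequence shares the same first two codon bases and that prefix belongs to a fourfold-degenerate codon family. This is not necessarily the procedure used to build 4Dtv.fa, and the names `FOURFOLD_PREFIXES` and `fourfold_sites` are hypothetical.

```python
# Codon prefixes whose third base never changes the amino acid
# (standard code): Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(aligned_cds):
    """Concatenate third positions of conserved fourfold-degenerate codons.

    aligned_cds maps sequence IDs to equal-length, in-frame aligned CDS strings.
    """
    names = list(aligned_cds)
    length = len(aligned_cds[names[0]])
    out = {name: [] for name in names}
    for i in range(0, length - length % 3, 3):
        codons = [aligned_cds[name][i:i + 3].upper() for name in names]
        prefixes = {codon[:2] for codon in codons}
        # Keep the column only if all sequences share one fourfold prefix
        # and no sequence has a gap or ambiguous base at the third position.
        if (len(prefixes) == 1 and prefixes <= FOURFOLD_PREFIXES
                and all(codon[2] in "ACGT" for codon in codons)):
            for name, codon in zip(names, codons):
                out[name].append(codon[2])
    return {name: "".join(bases) for name, bases in out.items()}
```

Applied to each aligned_cds.fa and concatenated across orthogroups, a procedure of this kind would yield one 4D-site sequence per species.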
Untitled.xml
BEAUti-generated XML file containing the prior settings for divergence time estimation.
Gene prediction results for Calystegia soldanella
The files are contained in Calystegia_soldanella_braker3.tar.gz. Details are provided below.
Csol_braker.gtf
Gene prediction results generated by BRAKER3 in GTF format.
Csol_cds.fa
Predicted CDS sequences generated by BRAKER3, provided in FASTA format.
Csol_prot.fa
Predicted amino acid sequences generated by BRAKER3, provided in FASTA format.
ds calculations
The files are contained in ds_analysis.tar.gz. Details are provided below.
yn00.ctl
yn00 control (.ctl) file used for the ds calculations.
ds_Ccampestris_Caustralis folder
Raw data used for ds calculations between C. campestris and C. australis.
This folder has the following structure.
ds_Ccampestris_Caustralis---blast.txt
                         |-calc_dS---gene_id_vs_geneid---aligned_cds.fa
                                  |                   |-aligned_prot.fa
                                  :
Inside the ds_Ccampestris_Caustralis folder, the BLAST results linking C. campestris and C. australis genes are provided in blast.txt.
The calc_dS subfolder contains per-ortholog subdirectories named with the best-hit gene IDs from the BLAST search.
aligned_prot.fa stores the MAFFT multiple alignment of the corresponding amino acid sequences.
aligned_cds.fa stores the codon-aware alignment obtained by converting aligned_prot.fa with pal2nal.
ds_Ccampestris_Cpentagona folder
Raw data used for ds calculations between C. campestris and C. pentagona. The folder structure and file contents are the same as in ds_Ccampestris_Caustralis.
ds_Cchinensis_HI_HII folder
Raw data used for ds calculations between the HI and HII subgenomes of C. chinensis. The folder structure and file contents are the same as in ds_Ccampestris_Caustralis.
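The per-pair subdirectories are named after best-hit gene IDs from the BLAST search. As a sketch of how best hits could be recovered from blast.txt, the code below assumes the file is in BLAST tabular format (`-outfmt 6`, with query in column 1, subject in column 2 and bit score in column 12). That layout is an assumption, since the format of blast.txt is not stated here, and `best_hits` is a hypothetical helper name.

```python
def best_hits(lines):
    """Return {query_id: (subject_id, bitscore)} keeping the top-scoring hit.

    Assumes 12-column BLAST tabular output; ties keep the first hit seen.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a standard tabular hit line
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

Each resulting query and subject pair would then correspond to one subdirectory under calc_dS, if the assumption about the file layout holds.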
Trinity-based transcriptome assembly results for Cuscuta pentagona
The files are contained in Cuscuta_pentagona_trinity.tar.gz. Details are provided below.
trinity_out.Trinity.fasta
Trinity-assembled transcriptome of C. pentagona, provided in FASTA format.
trinity_out.Trinity.fasta.transdecoder.cds
Predicted CDS sequences from TransDecoder run on trinity_out.Trinity.fasta, provided in FASTA format.
trinity_out.Trinity.fasta.transdecoder.pep
Predicted amino acid sequences from TransDecoder run on trinity_out.Trinity.fasta, provided in FASTA format.
