Evolution of Rosaceae chloroplast genomes highlights unique Cerasus diversification and independent origins of fruit cherry

Cite this dataset

Zhang, Jing et al. (2021). Evolution of Rosaceae chloroplast genomes highlights unique Cerasus diversification and independent origins of fruit cherry [Dataset]. Dryad.


Rosaceae comprises numerous fruit crops of great economic value. The lack of genomic characterization has largely hindered our understanding of gene and plastome evolution in Rosaceae. Here, we analyzed 121 Rosaceae chloroplast (cp) genomes of 51 taxa from 19 genera, predominantly Cerasus plants and their relatives. To our knowledge, we generated the first comprehensive map of genomic variation across Rosaceae plastomes. Protein-coding genes of Rosaceae plastomes were characterized by a high proportion (over 50%) of synonymous variants and by InDels spanning multiple triplets. Four photosynthesis-related genes were under Darwinian selection, a pattern unique to woody fruit trees of Rosaceae. We detected considerable variation in genome size among Rosaceae plastomes and observed both minor and pronounced structural variation in the examined cp genomes of tribes Pyrodae and Amygdaleae. Phylogenomic analyses and molecular dating highlighted the independent evolution of true cherry, dwarf cherry and their relatives. Our findings strongly support treating the monophyletic true cherry group taxonomically as a separate genus that excludes dwarf cherry. High levels of genomic differentiation and distinct phylogenetic relationships implied independent origins and domestication of the fruit cherries, particularly of cultivated Cerasus pseudocerasus and Cerasus avium. We further propose an evolutionary model to elucidate multiple genomic introgression events among true cherries occurring since ~15 Mya. The well-resolved maternal phylogeny suggests that cultivated C. pseudocerasus might have originated from the Longmenshan Fault zone, at the eastern edge of the Himalaya-Hengduan Mountains, where it has been subjected to frequent genomic introgression between its presumed wild ancestors and other close relatives. In conclusion, comparative analyses of plastomes and chloroplast genes detected diverse evolutionary behaviors and divergent adaptive selection in Rosaceae. We provide robust evidence for the independent origins and domestication of fruit cherries.


A total of 126 chloroplast genomes were preliminarily selected, of which 91 were newly assembled and 35 were downloaded from the NCBI database. Our samples covered 19 genera of subfamilies Spiraeoideae (tribes Amygdaleae, Exochordeae, Spiraeeae and Pyrodae) and Rosoideae of family Rosaceae. Three species, Morus mongolica (Moraceae), Ziziphus jujuba (Rhamnaceae) and Elaeagnus macrophylla (Elaeagnaceae), were used as outgroups. Based on our previous studies, 91 representative Cerasus accessions were selected for whole-genome re-sequencing, consisting of 34 C. pseudocerasus accessions, 6 accessions representing 4 European cherry species (C. avium, C. fruticosa, C. vulgaris and C. mahaleb), 5 dwarf cherry accessions (C. glandulosa, C. tomentosa and C. tianshanica) and 46 accessions covering 20 other Cerasus taxa. Cerasus vulgaris is presumed to be a natural hybrid between C. fruticosa and C. avium, as confirmed by cytogenetic studies. The 34 C. pseudocerasus accessions included 11 landraces and 23 wild individuals with diverse genotypes, phenotypes and geographical distributions. We also obtained the paired-end genomic reads of Prunus cerasifera (SRR4036106) from the GenBank database to assemble its chloroplast genome. Laurocerasus undulata and Cerasus glandulosa were excluded because of incomplete genome sequences and unreasonable branch lengths in the phylogenetic tree (data not shown); hence, a total of 124 chloroplast genomes were used for the subsequent analyses.

I. We constructed the WCGD (whole chloroplast genome dataset, n=124) and PCGD (Cerasus and its relatives chloroplast genome dataset, n=107) datasets using the complete chloroplast genome sequences generated or collected in this study. For both datasets, we removed all missing data (N) and all long insertions (> 50 bp) that were detected in only one individual.
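The filtering in step I can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the genomes are already aligned (equal-length strings with `-` for gaps), and the function names and the dictionary layout are hypothetical.

```python
def drop_singleton_insertions(aln, min_len=50):
    """Drop runs longer than min_len columns where only one sequence has bases.

    aln maps a sequence name to its aligned sequence (assumed layout).
    """
    names = list(aln)
    seqs = [aln[n] for n in names]
    length = len(seqs[0])

    # For each column, record which single sequence (if any) is non-gap.
    owners = []
    for i in range(length):
        non_gap = [j for j, s in enumerate(seqs) if s[i] != "-"]
        owners.append(non_gap[0] if len(non_gap) == 1 else None)

    # Scan runs of consecutive columns owned by the same single sequence.
    keep = [True] * length
    start = 0
    while start < length:
        end = start + 1
        while end < length and owners[end] == owners[start]:
            end += 1
        if owners[start] is not None and end - start > min_len:
            for i in range(start, end):
                keep[i] = False
        start = end

    return {n: "".join(c for c, k in zip(s, keep) if k)
            for n, s in zip(names, seqs)}


def drop_n_columns(aln):
    """Drop every column in which at least one sequence has an N."""
    names = list(aln)
    seqs = [aln[n].upper() for n in names]
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "N" for s in seqs)]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


example = {
    "acc1": "ACGT" + "A" * 60 + "ACNT",
    "acc2": "ACGT" + "-" * 60 + "ACGT",
    "acc3": "ACGT" + "-" * 60 + "ACGT",
}
cleaned = drop_n_columns(drop_singleton_insertions(example))
```

In this toy alignment the 60-column insertion carried only by `acc1` is removed first, then the single column containing an N.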

II. We kept the large single-copy (LSC) region, the small single-copy (SSC) region and one copy of the conserved inverted repeat (IR) to construct the WOID (whole one inverted-repeat dataset, n=124) and POID (Cerasus and its relatives one inverted-repeat dataset, n=107) datasets. All missing data were removed in both datasets.
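As a rough illustration of step II, the sketch below keeps the LSC, one IR copy and the SSC from a single plastome. The region coordinates are assumed to be known in advance (for example from an annotation), are given as 0-based half-open pairs, and the function name and toy sequence are hypothetical.

```python
def one_ir_sequence(genome, lsc, ir, ssc):
    """Concatenate LSC, one IR copy and SSC, dropping the second IR copy.

    Each region is a (start, end) pair, 0-based and half-open (assumed).
    """
    return "".join(genome[start:end] for start, end in (lsc, ir, ssc))


# Toy plastome: LSC = AAAAA, IRb = CCC, SSC = TT, IRa = GGG (reverse complement of IRb).
toy_genome = "AAAAA" + "CCC" + "TT" + "GGG"
reduced = one_ir_sequence(toy_genome, lsc=(0, 5), ir=(5, 8), ssc=(8, 10))
```

Keeping only one IR copy avoids counting the same repeated sites twice in downstream analyses.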

III. Based on the WCGD and PCGD datasets, the VSWD (Variant sites of whole chloroplast genome, n=124) and VSPD (Variant sites of Cerasus and its relatives chloroplast genome, n=107) datasets were constructed with a custom bash script; neither dataset contains InDels.
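The custom bash script is not reproduced here. As a stand-in, the following Python sketch shows one way to extract InDel-free variant sites from an alignment; the input layout (name to aligned sequence, `-` for gaps) and the function name are assumptions made for illustration.

```python
def variant_sites(aln):
    """Return the positions and per-sequence bases of gap-free variable columns."""
    names = list(aln)
    seqs = [aln[n].upper() for n in names]
    positions = []
    for i in range(len(seqs[0])):
        column = {s[i] for s in seqs}
        if "-" in column:
            continue  # skip InDel columns entirely
        if len(column) > 1:
            positions.append(i)  # more than one state: a variant site
    sites = {n: "".join(s[i] for i in positions) for n, s in zip(names, seqs)}
    return positions, sites


toy = {"acc1": "ACGT-A", "acc2": "ACCTTA", "acc3": "ATGTTA"}
positions, sites = variant_sites(toy)
```

Positions are 0-based alignment columns; the gapped column is ignored even though its non-gap bases agree.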

IV. WGSD (whole gene sequence dataset, n=124) and PGSD (Cerasus and its relatives gene sequence dataset, n=107) datasets were constructed with 102 unique genes.

V. PCWD (Protein-coding sequence of whole chloroplast genome, n=124) and PCPD (Protein-coding sequence of Cerasus and its relatives chloroplast genome, n=107) were constructed with the exons of 72 unique protein-coding genes.

VI. Based on the WCGD and PCGD datasets, we removed all ambiguous sites to construct the PWGD (Pruned whole chloroplast genome dataset, n=124) and PPGD (Pruned Cerasus and its relatives chloroplast genome dataset, n=107) datasets using Gblocks v0.91b with the following parameters: minimum number of sequences for a conserved position, 65; minimum number of sequences for a flank position, 110 (PWGD) / 100 (PPGD); maximum number of contiguous non-conserved positions, 8; minimum block length, 10; allowed gap positions, none.
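The parameters in step VI correspond, as far as the Gblocks command-line documentation describes them, to the options `-b1` to `-b5`, with `-t=d` selecting DNA input. The sketch below only assembles the argument list; the input file names and the executable name `Gblocks` are assumptions, and the exact invocation used for the deposited datasets is not stated in this record.

```python
def gblocks_command(alignment, b1, b2, b3=8, b4=10, gaps="n", executable="Gblocks"):
    """Build a Gblocks argument list for a DNA alignment.

    b1: minimum sequences for a conserved position
    b2: minimum sequences for a flank position
    b3: maximum contiguous non-conserved positions
    b4: minimum block length
    gaps: allowed gap positions ("n" = none)
    """
    return [
        executable,
        alignment,
        "-t=d",
        f"-b1={b1}",
        f"-b2={b2}",
        f"-b3={b3}",
        f"-b4={b4}",
        f"-b5={gaps}",
    ]


# Hypothetical file names; only the numeric settings come from the text above.
pwgd_cmd = gblocks_command("WCGD.fasta", b1=65, b2=110)
ppgd_cmd = gblocks_command("PCGD.fasta", b1=65, b2=100)
```

Each list can be passed to `subprocess.run` on a machine where Gblocks is installed.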


National Natural Science Foundation of China, Award: 31672114

Sichuan Province Science and Technology Support Program, Award: 2019JDTD0010