Genome-structural analyses support an allotetraploid origin of the walnut family from within Myricaceae and shared genome duplications reveal substitution rate variation

Citation

Bai, Weining (2022), Genome-structural analyses support an allotetraploid origin of the walnut family from within Myricaceae and shared genome duplications reveal substitution rate variation, Dryad, Dataset, https://doi.org/10.5061/dryad.sbcc2fr8r

Abstract

In lineages of allopolyploid origin, entire parental subgenomes may coexist, with two or more sets of homoeologous chromosomes that differ in gene content and syntenic structure. Presence or absence of genes, and microsynteny along chromosomal blocks, can be used to differentiate subgenomes and can be coded as phylogenetic data. We assembled chromosome-level genomes of representative species across an ancient allopolyploid lineage, the walnut family (Juglandaceae), with Myrica and other Fagales as outgroups, and used genome-structural data to infer a phylogeny. Microsynteny (with various collinear block sizes) and gene content analyses, using the dominant or recessive progenitor subgenomes or both, all yielded identical topologies that place Engelhardia (a SE Asian and Central American clade) with Platycarya, an enigmatic monospecific taxon endemic to East Asia but well represented in the Paleocene-Eocene of North America and Europe. Morphological studies that included fossils also recovered the Platycarya/Engelhardia clade on the basis of leaf architecture, floral morphology, and nut walls without lacunae, but DNA-alignment-based phylogenetics carried out here and in previous studies never detected this uniformly wind-dispersed clade, instead grouping Platycarya with Carya and Juglans. The novel analyses further reveal the family’s hybrid origin from extinct or unsampled progenitors nested within Myricaceae and show that Rhoiptelea chiliantha, the Chinese sister species to all other Juglandaceae, contains proportionally more genes related to DNA repair and evolved at a rate 2.6 to 3.5 times slower than that of the remaining species. Our results have implications for the molecular clock hypothesis and suggest that genomic structure contains so-far undervalued phylogenetic signal.

Methods

This dataset contains the input genome data for the microsynteny- and gene-content-based analyses, which infer parental lineages and subgenome relationships in the walnut family (Juglandaceae).
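
As an illustration of how gene presence or absence can be coded as phylogenetic data, the following is a minimal sketch in Python. It is not the authors' pipeline: the function name `gene_content_matrix`, the gene-family identifiers (`OG1` to `OG4`), and the toy presence sets are all hypothetical and chosen only to show the idea of a binary character matrix.

```python
def gene_content_matrix(gene_families):
    """Code gene-family presence/absence as binary characters.

    gene_families: dict mapping taxon name -> set of gene-family IDs present.
    Returns (sorted list of family IDs, dict taxon -> string of '0'/'1').
    """
    # Every gene family seen in any taxon becomes one character (column).
    families = sorted(set().union(*gene_families.values()))
    matrix = {
        taxon: "".join("1" if fam in present else "0" for fam in families)
        for taxon, present in gene_families.items()
    }
    return families, matrix


# Hypothetical toy input; these are not values from the dataset.
toy = {
    "Juglans": {"OG1", "OG2", "OG4"},
    "Platycarya": {"OG1", "OG3"},
    "Myrica": {"OG2", "OG3", "OG4"},
}

families, matrix = gene_content_matrix(toy)
for taxon, row in matrix.items():
    print(f"{taxon:<12}{row}")
```

Each row is one taxon and each column one gene family, so the resulting strings could be passed to any phylogenetic program that accepts binary characters. Microsynteny could be coded the same way by treating each collinear block as a character, under the same assumptions.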

Funding

National Natural Science Foundation of China