The Rhododendron genome and chromosomal organization provide insight into shared whole-genome duplications across the heath family (Ericaceae)
Data files
Oct 09, 2020 version: 446.97 MB
Abstract
The genus Rhododendron (Ericaceae), which includes horticulturally important plants such as azaleas, is a highly diverse and widely distributed genus of >1,000 species. Here, we report the chromosome-scale de novo assembly and genome annotation of Rhododendron williamsianum as a basis for continued study of this large genus. We created multiple short-fragment genomic libraries, which were assembled using ALLPATHS-LG. This was followed by contiguity preserving transposase sequencing (CPT-seq) and fragScaff scaffolding of a large-fragment library, which improved the assembly by decreasing the number of scaffolds and increasing scaffold length. Chromosome-scale scaffolding was performed by proximity-guided assembly (LACHESIS) using chromatin conformation capture (Hi-C) data. Chromosome-scale scaffolding was further refined and linkage groups defined by restriction-site associated DNA (RAD) sequencing of the parents and progeny of a genetic cross. The resulting linkage map confirmed the LACHESIS clustering and ordering of scaffolds onto chromosomes and rectified large-scale inversions. Assessments of the R. williamsianum genome assembly and gene annotation estimate them to be 89% and 79% complete, respectively. Predicted coding sequences from the genome annotation were used in syntenic analyses and to generate age distributions of synonymous substitutions per site (Ks) between paralogous gene pairs, which identified whole-genome duplications (WGDs) in R. williamsianum. We then analyzed other publicly available Ericaceae genomes for shared WGDs. Based on our spatial and temporal analyses of paralogous gene pairs, we find evidence for two shared, ancient WGDs in Rhododendron and Vaccinium (cranberry/blueberry) members that predate the Ericaceae family and, in one case, the Ericales order.
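How a Ks age distribution is built from paralog pairs can be sketched in Python. This minimal sketch assumes that the proportion of synonymous differences (pS) for each pair has already been estimated by a codon-aware counting method (not shown). The Jukes-Cantor correction, the 0.1 bin width, and the exclusion of Ks = 0 and Ks > 2 are illustrative choices, not parameters reported by the study. The local-maximum peak call is a simple stand-in for the more formal peak fitting usually used to delimit WGD signals, and all function names are hypothetical.

```python
import math
from collections import Counter


def jukes_cantor(p: float) -> float:
    """Convert an observed proportion of differing synonymous sites (pS)
    into an estimated number of synonymous substitutions per site (Ks)."""
    if p >= 0.75:
        raise ValueError("pS >= 0.75 is too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Bin paralog-pair Ks values into an age distribution.

    Pairs with Ks == 0 (identical copies) or Ks > max_ks (likely saturated)
    are dropped; both cutoffs are assumptions for illustration."""
    n_bins = math.ceil(max_ks / bin_width)
    counts = Counter()
    for ks in ks_values:
        if 0.0 < ks <= max_ks:
            # Clamp so that Ks == max_ks falls into the last bin
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return [counts.get(i, 0) for i in range(n_bins)]


def local_peaks(hist):
    """Indices of bins higher than both neighbours: candidate WGD peaks."""
    return [i for i in range(1, len(hist) - 1)
            if hist[i] > hist[i - 1] and hist[i] > hist[i + 1]]
```

For example, `local_peaks(ks_histogram(values))` returns bin indices; multiplying an index by the bin width gives the lower Ks bound of a candidate peak, with older duplications appearing at higher Ks.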
Methods
For the genome assembly of R. williamsianum, we created multiple short-fragment genomic libraries, which were assembled using ALLPATHS-LG. This was followed by contiguity preserving transposase sequencing (CPT-seq) and fragScaff scaffolding of a large-fragment library, which improved the assembly by decreasing the number of scaffolds and increasing scaffold length. Chromosome-scale scaffolding was performed by proximity-guided assembly (LACHESIS) using chromatin conformation capture (Hi-C) data. Chromosome-scale scaffolding was further refined and linkage groups defined by restriction-site associated DNA (RAD) sequencing of the parents and progeny of a genetic cross. The resulting linkage map confirmed the LACHESIS clustering and ordering of scaffolds onto chromosomes and rectified large-scale inversions. We then performed repeat annotation and masking with MAKER v2.31.9 (Cantarel et al. 2008; Holt & Yandell 2011), using a de novo species-specific repeat database together with the homology-based methods within MAKER, followed by gene annotation, also in MAKER v2.31.9.
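How a linkage map can reveal an inversion in a Hi-C-based scaffold order can be sketched in Python. The sketch assumes that each RAD marker has a base-pair position in the LACHESIS-ordered sequence and a centimorgan (cM) position on its linkage group, and that each linkage group is oriented to match its pseudochromosome. Under that assumption, markers on a correctly oriented scaffold should increase in cM along the sequence, so a strongly negative Spearman rank correlation flags a scaffold placed in reverse. The function names, the input layout, the three-marker minimum, and the -0.5 threshold are all hypothetical; this is not the study's documented procedure.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # no variation, so no evidence for either orientation
    return cov / math.sqrt(vx * vy)


def flag_inverted_scaffolds(markers, min_markers=3, threshold=-0.5):
    """markers: {scaffold: [(bp_position, cM), ...]} (hypothetical layout).

    Returns scaffolds whose genetic positions run against their physical
    order strongly enough to suggest an inversion."""
    flagged = []
    for scaffold, points in markers.items():
        if len(points) < min_markers:
            continue  # too few markers to judge orientation
        bp = [p for p, _ in points]
        cm = [c for _, c in points]
        if spearman(bp, cm) <= threshold:
            flagged.append(scaffold)
    return sorted(flagged)
```

Flagged scaffolds would then be reversed in the ordering and re-checked; a rank correlation is used rather than a linear fit because recombination rates vary along chromosomes, so only the order of cM values is informative.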
Usage notes
README
The files deposited in Dryad are listed below with their descriptions.
The files include the whole-genome assembly and annotations for Rhododendron williamsianum, as well as the pseudochromosome files used in the syntenic analyses of the following study:
Soza VL, et al. 2019. The Rhododendron genome and chromosomal organization provide insight into shared whole-genome duplications across the heath family (Ericaceae). Genome Biol. Evol. 11:3353–3371. doi: 10.1093/gbe/evz245.
Rwill.assembly.renamed.fixed042418.fasta.gz - entire genome assembly of Rhododendron williamsianum in FASTA format.
Rwill10.masked.pseudochromos.fasta.gz - FASTA sequences for the pseudochromosomes of R. williamsianum. These were generated by stitching the ordered scaffolds assigned to each linkage group (LG) together with 100-N spacers.
Rwill10.pseudochromos.maker.sorted.renamed.blast.function.iprdomains.gff - GFF file of gene annotations on the R. williamsianum pseudochromosomes.
Rwill10.repeatmasker.runner.match.sorted.gff - RepeatMasker and RepeatRunner annotations from MAKER for the entire genome assembly of R. williamsianum.
Rwill10standard2.maker.proteins.renamed.fasta - predicted protein sequences from MAKER for the entire genome assembly of R. williamsianum; gene IDs are not sorted.
Rwill10standard2.maker.sorted.renamed.gff - GFF file of gene annotations for the entire genome assembly of R. williamsianum.
Rwill10standard2.maker.transcripts.renamed.fasta - predicted transcript sequences from MAKER for the entire genome assembly of R. williamsianum; gene IDs are not sorted.
V.macrocarpon.allLGs.sorted.offset.gff3.2 - GFF file of gene annotations on the Vaccinium macrocarpon pseudochromosomes. These annotations were extracted from Cranberry_Gene_Models.gff at http://cyanophora.rutgers.edu/cranberry/ based on the LGs identified below.
V.macrocarpon.masked.pseudochromos.fasta.gz - FASTA sequences for the pseudochromosomes of V. macrocarpon. These were generated from the anchored scaffolds within LGs listed in FileS2.csv of Schlautman et al. (2017) and from Cranberry_masked_assembly.fa (Polashock et al. 2014), available at http://cyanophora.rutgers.edu/cranberry/, by stitching the scaffolds within each LG together with 100-N spacers.
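The 100-N stitching used for both species, and the shift of GFF coordinates it implies (presumably what the "offset" in the V. macrocarpon GFF file name refers to), can be sketched in Python. This minimal sketch assumes that every scaffold is placed in forward orientation and that the LG order (for example, parsed from FileS2.csv) is already available as a list. Reverse-oriented scaffolds would also need reverse-complementing and strand and coordinate flipping, which is not shown. The function names `stitch` and `offset_gff_line` are hypothetical.

```python
SPACER_LEN = 100  # length of the N spacer placed between scaffolds


def stitch(scaffolds, order, spacer_len=SPACER_LEN):
    """Concatenate ordered scaffolds into one pseudochromosome.

    scaffolds: {name: sequence}; order: scaffold names in LG order.
    Returns (sequence, offsets), where offsets[name] is the 0-based start
    of that scaffold within the pseudochromosome."""
    parts, offsets, pos = [], {}, 0
    for k, name in enumerate(order):
        if k:  # a spacer goes between scaffolds, not before the first
            parts.append("N" * spacer_len)
            pos += spacer_len
        offsets[name] = pos
        parts.append(scaffolds[name])
        pos += len(scaffolds[name])
    return "".join(parts), offsets


def offset_gff_line(line, offsets, new_seqid):
    """Shift one GFF feature from scaffold to pseudochromosome coordinates.

    GFF positions are 1-based, so adding the scaffold's 0-based start gives
    the 1-based position on the pseudochromosome. Features on scaffolds not
    anchored to this LG return None (i.e., they are not extracted)."""
    if line.startswith("#") or not line.strip():
        return line.rstrip("\n")
    cols = line.rstrip("\n").split("\t")
    if cols[0] not in offsets:
        return None
    shift = offsets[cols[0]]
    cols[0] = new_seqid
    cols[3] = str(int(cols[3]) + shift)  # start
    cols[4] = str(int(cols[4]) + shift)  # end
    return "\t".join(cols)
```

For instance, with `spacer_len=3`, stitching `GG` and then `ACGT` yields `GGNNNACGT`, and a feature at positions 2-3 of the second scaffold moves to positions 7-8, which still covers `CG`.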
References:
Polashock J, et al. 2014. The American cranberry: first insights into the whole genome of a species adapted to bog habitat. BMC Plant Biol. 14(1):165.
Schlautman B, et al. 2017. Construction of a high-density American cranberry (Vaccinium macrocarpon Ait.) composite map using genotyping-by-sequencing for multi-pedigree linkage mapping. G3 (Bethesda) 7:1177–1189.