A genome for Bidens hawaiensis: a member of a hexaploid Hawaiian plant adaptive radiation
Data files
Feb 04, 2022 version files 596.08 MB
- alt.organelle.contigs.n2580.fasta.gz
- consensi.fa
- consensi.fa.classified
- families-classified.stk
- families.stk
- primary.organelle.contigs.n54.fasta.gz
- README.txt
Abstract
The plant genus Bidens (Asteraceae or Compositae; Coreopsideae) is a species-rich and circumglobally distributed taxon. The 19 hexaploid species endemic to the Hawaiian Islands are considered an iconic example of adaptive radiation, and many of them are imperiled and of high conservation concern. Until now, no genomic resources were available for this genus, which may serve as a model system for understanding the evolutionary genomics of explosive plant diversification. Here, we present a high-quality reference genome for the Hawai‘i Island endemic species B. hawaiensis A. Gray, reconstructed from long-read, high-fidelity sequences generated on a Pacific Biosciences Sequel II System. The haplotype-aware draft genome assembly consisted of ~6.67 gigabases (Gb), close to the holoploid genome size estimate of 7.56 Gb (± 0.44 SD) determined by flow cytometry. After removal of alternate haplotigs and contaminant filtering, the consensus haploid reference genome comprised 15,904 contigs containing ~3.48 Gb, with a contig N50 of 422,594 bp. The high interspersed repeat content of the genome (approximately 74%), along with its hexaploid status, contributed to assembly fragmentation. Both the haplotype-aware and consensus haploid assemblies recovered >96% of Benchmarking Universal Single-Copy Orthologs. However, the removal of alternate haplotigs did not substantially reduce the proportion of duplicated benchmarking genes (~79% versus ~68%). This reference genome will support future work on the speciation process during adaptive radiation, including resolving evolutionary relationships, determining the genomic basis of trait evolution, and supporting ongoing conservation efforts.
Methods
Bidens hawaiensis primary and alternate contigs were assembled from long-read, high-fidelity sequences generated on a Pacific Biosciences Sequel II System using two 8M SMRT cells. The organelle contigs (chloroplast and mitochondrial) in this repository were identified separately for the primary and the alternate contig datasets, using a combination of MitoFinder v1.4, nucmer as implemented in the MUMmer package (mummer-4.0.0beta2), and read-depth coverage output from PurgeDups. For the MitoFinder and nucmer alignments, we used reference sequences from Helianthus annuus (mitochondrion, NCBI accession NC_023337.1) and B. hawaiensis (chloroplast, NCBI accession NC_047259).
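As a quick check on the two organelle FASTA files, the sketch below counts contigs and sums their lengths. It is a minimal illustration and not part of the authors' pipeline: the function name `fasta_lengths` is hypothetical, and reading the `n54` and `n2580` in the file names as contig counts is an assumption that the script can help verify.

```python
import gzip


def fasta_lengths(path):
    """Return {contig_name: sequence_length} for a FASTA file.

    Files ending in ".gz" are opened with gzip; anything else is
    read as plain text. The contig name is the first
    whitespace-delimited token of each header line.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                tokens = line[1:].split()
                name = tokens[0] if tokens else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


# Example usage (assumes the files were downloaded to the working directory):
# for fname in ("primary.organelle.contigs.n54.fasta.gz",
#               "alt.organelle.contigs.n2580.fasta.gz"):
#     lengths = fasta_lengths(fname)
#     print(fname, len(lengths), "contigs,", sum(lengths.values()), "bp")
```

The function relies only on the Python standard library, so it can be run without installing a sequence-analysis package.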
The RepeatMasker files were generated using RepeatModeler v2, RECON v1.08, RepeatMasker version open-4.1.1, and Dfam v3.3 (download date 2020-11-09). Searches were implemented using RMBlastN 2.10.0.
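To summarize the repeat library, one option is to tally the classifications recorded in the headers of consensi.fa.classified. The sketch below assumes the headers follow the `>name#Class/Family` convention commonly written by RepeatModeler; this was not checked against the deposited file. The function name `tally_repeat_classes` and the "Unclassified" fallback label are hypothetical.

```python
from collections import Counter


def tally_repeat_classes(path):
    """Count consensus sequences per top-level repeat class.

    Assumes headers of the form ">name#Class/Family ...". Headers
    without a "#" are counted under the label "Unclassified".
    """
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # only header lines carry the classification
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            _, sep, classification = header.partition("#")
            # Keep only the part before "/" (e.g. "LTR" from "LTR/Gypsy").
            repeat_class = classification.split("/")[0] if sep else ""
            counts[repeat_class or "Unclassified"] += 1
    return counts


# Example usage (assumes the file is in the working directory):
# for repeat_class, n in tally_repeat_classes("consensi.fa.classified").most_common():
#     print(repeat_class, n)
```

These are counts of library families, not genome coverage, so they are not directly comparable to the ~74% interspersed repeat content reported in the abstract.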
Additional details are available in association with the manuscript.