
A turn in species conservation for hairpin banksias: Demonstration of oversplitting leads to a better management of diversity

Cite this dataset

Wilson, Trevor et al. (2022). A turn in species conservation for hairpin banksias: Demonstration of oversplitting leads to a better management of diversity [Dataset]. Dryad.


We generated SNP genotype data and chloroplast genomic data to test the current taxonomy and infer a population-scale evolutionary scenario for the Hairpin Banksias (B. collina, B. cunninghamii, B. neoanglica, B. spinulosa and B. vincentia) and outgroups. The sample set comprehensively represents the morphological diversity of the group across its two-and-a-half thousand kilometer distribution. Here, we provide an archive of the SNP genotype data and the chloroplast sequence alignment.


Leaf material was sampled from silica-dried material or herbarium specimens lodged at the National Herbarium NSW. Each sample is identified by its institutional database number.

For nDNA analysis, DNA was extracted from each sample using the Plant DNA Extraction Protocol for DArT available from the Diversity Arrays Technology Pty Ltd (DArT PL) website (Rossetto et al. 2019). Samples were sent to DArT PL (Canberra, Australia) for the DArT PL genotype by sequencing analysis, according to the documented in-house procedure. All specimens were co-analysed first to create a total dataset. Following this, two additional datasets were created through separate reanalysis of the raw data for two separate subclades identified from network analysis of the total dataset.

For cpDNA analysis, leaf material was sent to the Deakin Genomics Research and Discovery Facility at Deakin University (Geelong, Australia) for DNA extraction, preparation of paired-end genomic libraries (2 x 150 bp) and sequencing on the Illumina NextSeq 500 platform.

In silico assembly of chloroplast genomic DNA and SNP detection were performed consistently across all Illumina NextSeq libraries to generate a comparable dataset from the paired-end libraries. Library-specific chloroplast genome sequences were constructed by de novo assembly of the NGS libraries relevant to each species using Organelle Assembler (Coissac et al., 2016). From each library-specific chloroplast genomic sequence, a consensus sequence was then created from the relevant paired-end library using CLC Bio Genomics Workbench 8.0 (CLC). This first involved trimming the raw paired-end reads with the Quality Trimming Tool, then mapping the trimmed reads to the library-specific chloroplast genomic sequence using default settings. The same quality-trimmed reads were then remapped to the resulting library-specific consensus sequence using more stringent mapping parameters (0.8 for similarity and 0.9 for length fraction) to maintain a high-quality library-specific consensus sequence with read coverage greater than 5x. Each sequence was annotated using the gene prediction tool GeSeq (Tillich et al., 2017). The annotations were inspected manually in Geneious to confirm that the positions of the start and stop codons were correct.
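The coverage criterion above (read coverage greater than 5x) can be illustrated with a minimal Python sketch. It is not the CLC workflow used to build this dataset. The function name `mask_low_coverage`, the per-base depth list and the choice to mask low-coverage bases with `N` are all assumptions made for illustration.

```python
def mask_low_coverage(consensus, depth, threshold=5, mask_char="N"):
    """Mask consensus bases whose read depth is not greater than `threshold`.

    consensus: consensus sequence as a string.
    depth:     per-base read depth, one integer per consensus position.
    Masking with "N" is an illustrative assumption, not the authors' method.
    """
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d > threshold else mask_char
        for base, d in zip(consensus, depth)
    )


# Hypothetical toy data: only positions with depth > 5 are kept.
masked = mask_low_coverage("ACGTAC", [10, 5, 6, 0, 7, 5])
print(masked)
```

In practice the per-base depth would come from the read mapping itself. The sketch only shows how a strict "greater than 5x" cut-off behaves at the boundary, where a depth of exactly 5 is masked.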


An alignment of the relevant library-specific consensus sequences was generated using MAUVE (Darling et al., 2004) in Geneious Pro 9.1.8. The alignment was then curated by removing areas of low coverage and regions containing dubious SNPs (i.e., SNPs in the first or second codons and in repetitive regions).
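The curation step described above amounts to deleting selected columns from every sequence in the alignment. The following Python sketch shows that operation under stated assumptions: the alignment is held as a dictionary of equal-length strings, and the regions to remove are given as 0-based, half-open (start, end) intervals. The function name `drop_columns` and the sample names are hypothetical. The regions removed from the archived alignment were chosen by manual inspection and are not reproduced here.

```python
def drop_columns(alignment, excluded):
    """Remove the alignment columns that fall inside any excluded interval.

    alignment: dict mapping sample name -> aligned sequence (equal lengths).
    excluded:  list of (start, end) intervals, 0-based and half-open.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop() if lengths else 0
    # Keep every column index that is outside all excluded intervals.
    keep = [
        i for i in range(length)
        if not any(start <= i < end for start, end in excluded)
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment: columns 2 and 3 are treated as unreliable.
toy = {"sample_1": "ACGTACGT", "sample_2": "ACGAACGT"}
print(drop_columns(toy, [(2, 4)]))
```

Removing whole columns, rather than editing individual sequences, keeps the remaining positions aligned across all samples.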


Coissac, E., P. M. Hollingsworth, S. Lavergne and P. Taberlet. 2016. From barcodes to genomes: Extending the concept of DNA barcoding. Molecular Ecology 25: 1423–1428.

Darling, A. C., B. Mau, F. R. Blattner and N. T. Perna. 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Research 14: 1394–1403. doi:10.1101/gr.2289704

Tillich, M., P. Lehwark, T. Pellizzer, E. S. Ulbricht-Jones, A. Fischer, R. Bock and S. Greiner. 2017. GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Research 45: W6–W11.

Usage notes

All data files are plain text and may be opened with text editing software such as WordPad or Microsoft Word. The resulting reads can be read by the open access DArT proprietary analysis pipeline (KDDart), which is available on GitHub. The chloroplast dataset is in NEXUS file format and can be opened with sequence editing programs.
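Because NEXUS is a plain-text format, the chloroplast alignment can also be read programmatically. The sketch below is a minimal, standard-library reader for the MATRIX block of a simple NEXUS file. It assumes taxon names are unquoted and contain no spaces, and it does not cover the full NEXUS specification. The function name `read_nexus_matrix` and the example content are hypothetical, and a dedicated library would be a more robust choice for real analyses.

```python
def read_nexus_matrix(text):
    """Return {taxon: sequence} from the MATRIX block of a NEXUS string.

    Assumes unquoted taxon names without spaces. Interleaved blocks are
    handled by concatenating repeated taxon rows in order of appearance.
    """
    sequences = {}
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_matrix:
            # The matrix starts after a line containing only "MATRIX".
            if line.lower() == "matrix":
                in_matrix = True
            continue
        if line.startswith(";"):
            break  # a lone semicolon ends the matrix
        if not line or line.startswith("["):
            continue  # skip blank lines and bracketed comments
        ended = line.endswith(";")
        parts = line.rstrip(";").split(None, 1)
        if len(parts) == 2:
            name, seq = parts
            sequences[name] = sequences.get(name, "") + seq.replace(" ", "")
        if ended:
            break
    return sequences


# Hypothetical miniature NEXUS content for illustration only.
example = """#NEXUS
begin data;
dimensions ntax=2 nchar=8;
format datatype=dna missing=? gap=-;
matrix
sample_1 ACGT-CGT
sample_2 ACGA-CGT
;
end;
"""
alignment = read_nexus_matrix(example)
print(alignment)
```

To read the archived file, its contents would be passed in as a string, for example after opening the file in text mode with `open(path).read()`.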