
Dryococelus australis genome assembly supplementary data

Cite this dataset

Stuart, Oliver Patrick; Cleave, Rohan; Magrath, Michael; Mikheyev, Alexander (2023). Dryococelus australis genome assembly supplementary data [Dataset]. Dryad. https://doi.org/10.5061/dryad.905qfttqc

Abstract

We present a chromosome-scale genome assembly for the critically endangered Lord Howe Island stick insect Dryococelus australis. Contained in this repository are the original unfiltered annotation .gff file, the .fasta file of repeat families identified by RepeatScout, and the .gff file of repetitive elements throughout the genome assembly.

Methods

Repeat families were identified de novo and classified using RepeatModeler v2.0.1. We further filtered the repeat families with a BLAST search against the `nr` database, removing any unclassified family whose best hit was a known protein not originating from a transposable element, including mitochondrial proteins. The remaining repeat families were then annotated in the genome assembly using RepeatMasker v2.0.1.

Coding sequences from Clitarchus hookeri, Medauroidea extradentata, and Timema cristinae were used to train the ab initio models for D. australis with both AUGUSTUS v2.5.5 and SNAP version 2006-07-28. We extracted RNA from the ventral ganglion, leg muscle, gut lumen, Malpighian tubules, and gonads of both sexes with RNeasy spin-column kits (Qiagen); libraries were prepared and sequenced at BGI Hong Kong on a BGISEQ-500 instrument in paired-end mode with 150 bp reads. Reads were mapped onto the assembly using STAR v2.7, and intron hints were generated with the bam2hints tool distributed with AUGUSTUS. SNAP and AUGUSTUS (with intron-exon boundary hints derived from the RNA-Seq data) were then used to predict genes in the repeat-masked assembly. Only gene models predicted by both SNAP and AUGUSTUS were retained. Putative functions were assigned to the retained genes by a BLAST search of their peptide sequences against a set of protein sequences from UniProt.

All raw sequence data have been deposited in NCBI's SRA database under BioProject PRJNA930028.
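The rule of retaining only gene models predicted by both SNAP and AUGUSTUS can be illustrated with a minimal Python sketch. The Methods do not state how agreement between the two predictors was decided, so everything specific here is an assumption made for illustration: both prediction sets are taken to be available as GFF with `gene` features, and two models count as the same gene when they lie on the same sequence and strand and overlap by at least half the length of the shorter one. The names `Gene`, `parse_gff_genes`, `overlap_fraction`, `supported_by_both`, and the `min_fraction` threshold are hypothetical and are not part of the deposited data or of either tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive (GFF convention)
    end: int     # 1-based, inclusive
    strand: str


def parse_gff_genes(lines):
    """Yield a Gene for every GFF line whose feature type is 'gene'."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        yield Gene(cols[0], int(cols[3]), int(cols[4]), cols[6])


def overlap_fraction(a, b):
    """Overlap length divided by the length of the shorter gene."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return shared / shorter


def supported_by_both(augustus_genes, snap_genes, min_fraction=0.5):
    """Keep AUGUSTUS genes matched by a SNAP gene on the same sequence and strand.

    ASSUMPTION: the overlap criterion and threshold are illustrative only;
    the source does not describe the matching rule that was actually used.
    """
    snap_by_key = defaultdict(list)
    for gene in snap_genes:
        snap_by_key[(gene.seqid, gene.strand)].append(gene)
    return [
        gene
        for gene in augustus_genes
        if any(
            overlap_fraction(gene, other) >= min_fraction
            for other in snap_by_key[(gene.seqid, gene.strand)]
        )
    ]


# Toy input (made-up coordinates, not taken from the D. australis annotation).
augustus_lines = [
    "scaf1\tAUGUSTUS\tgene\t100\t500\t.\t+\t.\tID=g1",
    "scaf1\tAUGUSTUS\tgene\t1000\t1500\t.\t-\t.\tID=g2",
]
snap_lines = [
    "scaf1\tSNAP\tgene\t150\t450\t.\t+\t.\tID=s1",
    "scaf1\tSNAP\tgene\t1000\t1500\t.\t+\t.\tID=s2",
]
retained = supported_by_both(
    list(parse_gff_genes(augustus_lines)),
    list(parse_gff_genes(snap_lines)),
)
```

In this toy input the first AUGUSTUS gene is retained because a SNAP gene overlaps it on the same strand, while the second is dropped because its only SNAP counterpart is on the opposite strand. A stricter rule, such as requiring identical exon structure, would be an equally plausible reading of the Methods.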

Funding

Dovetail Genomics

Australian Research Council, Award: LP210200654

Equity Trustees

Zoos Victoria