Annotated genome assemblies for Geoscapheus dilatatus, Panesthia cribrata and Neogeoscapheus hanni
Data files
Feb 06, 2024 version files 5.75 GB
- Geoscapheus-dilatatus_fgenesh-annotation_cds.fa
- Geoscapheus-dilatatus_fgenesh-annotation_mrna.fa
- Geoscapheus-dilatatus_fgenesh-annotation_proteins.fa
- Geoscapheus-dilatatus_fgenesh-annotation.gff3
- Geoscapheus-dilatatus_genome-assembly.fa
- Neogeoscapheus-hanni_fgenesh-annotation_cds.fa
- Neogeoscapheus-hanni_fgenesh-annotation_mrna.fa
- Neogeoscapheus-hanni_fgenesh-annotation_proteins.fa
- Neogeoscapheus-hanni_fgenesh-annotation.gff3
- Neogeoscapheus-hanni_genome-assembly.fa
- Panesthia-cribrata_fgenesh-annotation_cds.fa
- Panesthia-cribrata_fgenesh-annotation_mrna.fa
- Panesthia-cribrata_fgenesh-annotation_proteins.fa
- Panesthia-cribrata_fgenesh-annotation.gff3
- Panesthia-cribrata_genome-assembly.fa
- README.md
Abstract
Genetic changes that enabled the evolution of eusociality have long captivated biologists. More recently, attention has focussed on the consequences of eusociality on genome evolution. Studies have reported higher molecular evolutionary rates in eusocial hymenopteran insects compared with their solitary relatives. To investigate the genomic consequences of eusociality in termites, we analysed nine genomes, including newly sequenced genomes from three non-eusocial cockroaches. Using a phylogenomic approach, we found that termite genomes have experienced lower rates of synonymous substitutions than those of cockroaches, possibly as a result of longer generation times. We identified higher rates of non-synonymous substitutions in termite genomes than in cockroach genomes, and identified pervasive relaxed selection in the former (24–31% of the genes analysed) compared with the latter (2–4%). We infer that this is due to reductions in effective population size, rather than gene-specific effects (e.g. indirect selection of caste-biased genes). We found no obvious signature of increased genetic load in termites, and postulate efficient purging of deleterious alleles at the colony level. Additionally, we identified genomic adaptations that may underpin caste differentiation, such as genes involved in post-translational modifications. Our results provide insights into the evolution of termites and the genomic consequences of eusociality more broadly.
README: De novo assemblies of Blaberidae genomes
https://doi.org/10.5061/dryad.sqv9s4n9t
This repository contains genome assemblies and associated annotations for three Blaberidae species: Geoscapheus dilatatus, Neogeoscapheus hanni and Panesthia cribrata.
The genomes were assembled using a combination of linked-read, long-read and Hi-C data. For example, an initial Geoscapheus dilatatus genome assembly was generated using linked-read data, then gaps were filled using low-coverage long-read data, and the assembly was subsequently scaffolded using Hi-C reads.
Description of the data and file structure
Five files are available for each of the three species:
- Full genome assembly (fasta)
  - Geoscapheus dilatatus genome assembly: 'Geoscapheus-dilatatus_genome-assembly.fa'
  - Neogeoscapheus hanni genome assembly: 'Neogeoscapheus-hanni_genome-assembly.fa'
  - Panesthia cribrata genome assembly: 'Panesthia-cribrata_genome-assembly.fa'
- Genome annotation (gff3)
  - Geoscapheus dilatatus genome annotation: 'Geoscapheus-dilatatus_fgenesh-annotation.gff3'
  - Neogeoscapheus hanni genome annotation: 'Neogeoscapheus-hanni_fgenesh-annotation.gff3'
  - Panesthia cribrata genome annotation: 'Panesthia-cribrata_fgenesh-annotation.gff3'
- Protein sequences derived from the annotation (fasta)
  - Geoscapheus dilatatus protein sequences: 'Geoscapheus-dilatatus_fgenesh-annotation_proteins.fa'
  - Neogeoscapheus hanni protein sequences: 'Neogeoscapheus-hanni_fgenesh-annotation_proteins.fa'
  - Panesthia cribrata protein sequences: 'Panesthia-cribrata_fgenesh-annotation_proteins.fa'
- mRNA sequences derived from the annotation (fasta)
  - Geoscapheus dilatatus mRNA sequences: 'Geoscapheus-dilatatus_fgenesh-annotation_mrna.fa'
  - Neogeoscapheus hanni mRNA sequences: 'Neogeoscapheus-hanni_fgenesh-annotation_mrna.fa'
  - Panesthia cribrata mRNA sequences: 'Panesthia-cribrata_fgenesh-annotation_mrna.fa'
- Coding sequences (CDS) derived from the annotation (fasta)
  - Geoscapheus dilatatus CDS sequences: 'Geoscapheus-dilatatus_fgenesh-annotation_cds.fa'
  - Neogeoscapheus hanni CDS sequences: 'Neogeoscapheus-hanni_fgenesh-annotation_cds.fa'
  - Panesthia cribrata CDS sequences: 'Panesthia-cribrata_fgenesh-annotation_cds.fa'
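The file names in the list above follow a regular pattern: a species prefix followed by one of five suffixes. As a minimal sketch (not part of the deposited dataset), the Python below rebuilds the fifteen expected names and reports any that are absent from a local download directory. The function names `expected_files` and `missing_files` and the `data_dir` argument are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Species prefixes exactly as they appear in the deposited file names.
SPECIES = ["Geoscapheus-dilatatus", "Neogeoscapheus-hanni", "Panesthia-cribrata"]

# Suffixes of the five per-species files described in this README.
SUFFIXES = {
    "assembly": "_genome-assembly.fa",
    "annotation": "_fgenesh-annotation.gff3",
    "proteins": "_fgenesh-annotation_proteins.fa",
    "mrna": "_fgenesh-annotation_mrna.fa",
    "cds": "_fgenesh-annotation_cds.fa",
}


def expected_files(species):
    """Map each file kind to its expected file name for one species prefix."""
    return {kind: species + suffix for kind, suffix in SUFFIXES.items()}


def missing_files(data_dir):
    """Return the expected file names not found in data_dir (illustrative helper)."""
    data_dir = Path(data_dir)
    return [
        name
        for species in SPECIES
        for name in expected_files(species).values()
        if not (data_dir / name).is_file()
    ]
```

Calling `missing_files(".")` from the directory holding the downloaded files should return an empty list if all fifteen data files are present.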
The sequence names in the genome assembly files were produced by the assembly software used to generate the assemblies. The final assembly step for each genome (in which the final sequence names were produced) was as follows: the Geoscapheus dilatatus genome was scaffolded using the program SALSA2 with Hi-C reads; the Neogeoscapheus hanni genome was produced using the program Supernova with linked reads; the Panesthia cribrata linked-read de novo assembly (produced using the program Supernova) and long-read de novo assembly (produced using the program IPA) were merged into a single assembly using the program QuickMerge.
The annotation files were produced using FGENESH++ v7.2.2. The tab-delimited 'General Feature Format' annotation files (i.e. the 'gff3' files) use a standard file format for genome annotations; they contain information for every annotated feature in the associated genome assembly. The contig/scaffold sequence names in the gff3 files correspond to the sequence names in the associated genome assembly files. The protein, mRNA and CDS fasta files are based on the annotations detailed in the gff3 files.
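The correspondence between gff3 sequence names and assembly sequence names can be checked with a short script. The sketch below is illustrative only and makes two assumptions: that a fasta sequence name is the text between '>' and the first whitespace on the header line, and that each gff3 feature line has the nine tab-separated columns defined by the GFF3 format, with the sequence name in column 1. It does not decode percent-escaped characters in gff3 sequence names. All function names are hypothetical.

```python
def fasta_names(path):
    """Collect sequence names: the text after '>' up to the first whitespace."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gff3_seqids(path):
    """Collect column 1 (seqid) from every nine-column feature line of a gff3 file."""
    seqids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded sequence section, if present, ends the features
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:
                seqids.add(columns[0])
    return seqids


def unmatched_seqids(gff3_path, fasta_path):
    """Return gff3 sequence names that have no matching fasta header."""
    return gff3_seqids(gff3_path) - fasta_names(fasta_path)
```

For a matching pair such as 'Panesthia-cribrata_fgenesh-annotation.gff3' and 'Panesthia-cribrata_genome-assembly.fa', `unmatched_seqids` would be expected to return an empty set. The assembly may still contain sequences with no annotated features, so the check is deliberately one-directional.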
Sharing/Access information
All sequence data used to generate these three genomes are available on NCBI under BioProject PRJNA1065107, with BioSample accessions SAMN39450771, SAMN39450772 and SAMN39450773 for Geoscapheus dilatatus, Neogeoscapheus hanni and Panesthia cribrata, respectively.
Methods
Three Blaberidae genomes (Geoscapheus dilatatus, Panesthia cribrata and Neogeoscapheus hanni) were sequenced and assembled to investigate the evolution of this group, and to provide genomic resources for studies on Blattodea. These genomes were assembled using a combination of linked-read, long-read and Hi-C data (the raw sequence data are available on the SRA database under BioSample accessions SAMN39450771, SAMN39450772 and SAMN39450773 for Geoscapheus dilatatus, Neogeoscapheus hanni and Panesthia cribrata, respectively). Assembly methods are outlined in the associated manuscript. Briefly, an initial de novo Geoscapheus dilatatus genome assembly was generated using the program Supernova with linked-read stLFR data. Gaps were filled using the program TGS-GapCloser with low-coverage long-read PacBio HiFi data, and the assembly was subsequently scaffolded using the program SALSA2 with Hi-C reads.
An initial Panesthia cribrata genome assembly was generated using the program Supernova with linked-read TELL-Seq data. A second de novo assembly was generated using the program IPA with long-read PacBio HiFi data. The two assemblies were then merged using the program QuickMerge.
The Neogeoscapheus hanni genome assembly was generated using the program Supernova with linked-read stLFR data.
All three genomes were annotated using FGENESH++.