Long read genome assembly of Automeris io (Lepidoptera: Saturniidae) an emerging model for the evolution of deimatic displays
Data files
Feb 23, 2024 version files 65.77 MB
- Automeris_blob_result.blobDB.json
- Automeris_io_blob_result.table.txt
- Automeris_io_braker2_augustus_pep.aa
- Automeris_io_braker2_augustus.gtf
- full_table.tsv
- missing_busco_list.tsv
- README.md
- short_summary.txt
Abstract
Automeris moths are a morphologically diverse group of 145 described species whose geographic range spans from the New World temperate zone to the Neotropics. Many Automeris have hindwing eyespots that are thought to deter or disrupt the attack of potential predators, allowing the moth time to escape. Some species in the genus have vestigial eyespots or lack them completely, suggesting that this trait may provide a selective benefit. The Io moth (Automeris io), known for its striking eyespots, is the most widely studied species within the genus and is an emerging model system to study the evolution of deimatism, an antipredator defense that combines visual stimuli and movement. Here we present a high-quality, PacBio HiFi genome assembly for the Io moth to aid existing research on the molecular development of eyespots. Genomic research is needed to address questions involving antipredatory defenses and eyespot pattern development. BUSCO analysis for this genome shows a completeness of 98.4%, and N50 of 15.
README: Data and supporting information for "Genome assembly for Automeris io (Lepidoptera: Saturniidae)"
This readme file contains the list of supporting files.
Authors: Chelsea Skojec, R. Keating Godfrey, Akito Y. Kawahara
FILES
README.md
This file describes the files included in this supplemental material.
Files related to genome assembly
- full_table.tsv BUSCO results from curated assembly using lepidoptera_odb10 database.
Column headings
Busco ID: orthologous group ID
Status: BUSCO status of the gene (e.g., Missing or Complete)
Sequence: sequence id
Gene Start: beginning location of the gene
Gene End: end location of the gene
Strand: forward or reverse
Score: scores are derived from comparison of input sequences against a database of known orthologs
Length: length of sequence
OrthoDB url Description: URL link to the corresponding OrthoDB orthologous group
- short_summary.txt Summary table of BUSCO results in txt format
- missing_busco_list.tsv List of BUSCO IDs not found in the assembly, from the lineage dataset lepidoptera_odb10
Column headings
Busco ID: ID of a BUSCO not found in the assembly
- Automeris_blob_result.blobDB.json The output of blobplot analysis from blobtools v1.0 in JSON format
- Automeris_io_blob_result.table.txt The output of blobplot analysis from blobtools v1.0
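As an illustration of how the BUSCO table described above might be summarized, here is a minimal Python sketch. It assumes the file is tab-separated, that header and comment lines start with `#`, and that the first two columns are the BUSCO ID and the status, as in the column list above. The function name `count_busco_status` and the sample rows are invented for this example and are not taken from the dataset.

```python
from collections import Counter


def count_busco_status(lines):
    """Count BUSCO IDs per status from the lines of a full_table.tsv file.

    Assumes tab-separated rows with the BUSCO ID in column 1 and the
    status in column 2; lines starting with '#' are treated as comments.
    """
    counts = Counter()
    seen_ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows
        busco_id, status = fields[0], fields[1]
        # A BUSCO found in several copies occupies several rows;
        # count each ID only once.
        if busco_id in seen_ids:
            continue
        seen_ids.add(busco_id)
        counts[status] += 1
    return counts


# Invented sample rows, for illustration only (not real results).
SAMPLE_ROWS = [
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength",
    "id_0001\tComplete\tcontig_1\t100\t900\t+\t512.3\t250",
    "id_0002\tMissing",
    "id_0003\tDuplicated\tcontig_2\t50\t700\t-\t301.0\t210",
    "id_0003\tDuplicated\tcontig_3\t80\t730\t+\t299.5\t209",
]

busco_counts = count_busco_status(SAMPLE_ROWS)
for status, n in sorted(busco_counts.items()):
    print(status, n)
```

To apply this to the deposited file, the same function could be called on an open file handle, for example `count_busco_status(open("full_table.tsv"))`, since it only needs an iterable of lines.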
Files related to structural annotation
- Automeris_io_braker2_augustus_pep.aa A file containing predicted peptide sequences from the structural annotation
- augustus.hints.gtf Structural annotation of the genome from BRAKER2
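The predicted peptide file can be inspected with a few lines of Python. The sketch below assumes the `.aa` file is in plain FASTA format (header lines starting with `>`, followed by one or more sequence lines); the helper name `read_fasta_lengths` and the sample records are hypothetical and only illustrate the approach.

```python
def read_fasta_lengths(lines):
    """Return a dict mapping each FASTA record ID to its sequence length.

    Assumes standard FASTA: a '>' header line, then sequence lines.
    The record ID is taken as the first whitespace-delimited token
    of the header.
    """
    lengths = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            lengths[current_id] = 0
        elif current_id is not None:
            lengths[current_id] += len(line)
    return lengths


# Invented sample records, for illustration only.
SAMPLE_FASTA = [
    ">g1.t1",
    "MKT",
    "LLV",
    ">g2.t1",
    "MA",
]

peptide_lengths = read_fasta_lengths(SAMPLE_FASTA)
print(len(peptide_lengths), "peptides")
print("longest:", max(peptide_lengths, key=peptide_lengths.get))
```

For the real data, the function could be called as `read_fasta_lengths(open("Automeris_io_braker2_augustus_pep.aa"))` to count predicted peptides and summarize their lengths.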
Methods
PacBio consensus reads were used to estimate genome size and heterozygosity using the program K-Mer Counter (KMC) v.3.2.1 (RRID:SCR_001245). A k-mer length of 23 (-m 23) was used to create a histogram of k-mer frequencies, which was visualized using GenomeScope 2.0 (RRID:SCR_017014). PacBio consensus reads were assembled using the de novo assembler HiFiasm v.0.16.1 r307 (RRID:SCR_021069), resulting in a 500 Mbp primary assembly. The assembly_stat.py script was used to assess assembly contiguity. BUSCO v.5.2.0 (RRID:SCR_015008) was used to assess completeness with 5,286 putative single-copy genes from the lepidoptera_odb10.2019-11-20 database.
We attempted to further collapse allelic variation and purge additional duplicates using the Purge Haplotigs pipeline (purge_haplotigs v.1.1.2). A coverage histogram, produced by mapping raw reads to the primary assembly using minimap2 v.2.21 (RRID:SCR_018550), was used to choose minimum, median, and maximum read-depth cut-off values for purging. Contigs were assigned as haplotigs based on an 80% threshold for diploid-level coverage (-s 80) and discarded if 80% or more of the contig had coverage above or below the read-depth cut-offs (-j 80).
We used blobtools v1.0 (RRID:SCR_017618) to investigate potential contamination in the assembly. RepeatMasker was used to mask repeat regions of the assembly, creating a soft-masked genome to be used for all downstream analyses. The BRAKER2 pipeline (v2.1.5; RRID:SCR_018964) with protein sequences from the NCBI Bombyx mori Annotation Release 101 was used for structural annotation. This pipeline uses the programs Bamtools (RRID:SCR_015987), GeneMark-EP+ (RRID:SCR_011930), DIAMOND (RRID:SCR_016071), and Augustus (RRID:SCR_008417). Annotation statistics were summarized using gFACs (RRID:SCR_022017). BUSCO v.5.2.0 was used to assess completeness of this annotation. Refer to the original publication for further details.
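Assembly contiguity is commonly summarized by N50, the statistic reported in the abstract. The following Python sketch shows the standard definition of N50 (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly). It is not a reproduction of the assembly_stat.py script, whose implementation is not described here; the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total assembly length;
    the length of the contig that crosses that point is the N50.
    Returns 0 for an empty input.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy example with invented contig lengths (total = 20).
# Sorted: 6, 5, 4, 3, 2. The running sum reaches 11 >= 10 at the
# second contig, so the N50 is 5.
example_n50 = n50([2, 3, 4, 5, 6])
print(example_n50)
```

In practice the contig lengths would be read from the assembly FASTA file, which is not among the files listed in this dataset.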
Usage notes
Files can be opened in a text editor or spreadsheet software.