Reference genome and annotation for Teleopsis dalmanni
Data files
Version of Aug 28, 2025; total size 35.11 GB
m64157e_210730_141553.hifi_reads.bam (8.41 GB)
m64157e_210730_141553.hifi_reads.fasta.gz (4.43 GB)
m64157e_210730_141553.hifi_reads.fastq.gz (8.90 GB)
m64157e_211024_013127.hifi_reads.bam (4.93 GB)
m64157e_211024_013127.hifi_reads.fasta.gz (2.74 GB)
m64157e_211024_013127.hifi_reads.fastq.gz (5.24 GB)
README.md (2.24 KB)
ST_FINAL.fa (401.38 MB)
ST_FINAL.gff (51.53 MB)
Abstract
This dataset provides a reference genome assembly and sequencing data for Teleopsis dalmanni (stalk-eyed fly). T. dalmanni is an important model organism for the study of sexual selection, sexual conflict, and selfish genetic elements; however, high-quality genomes for understanding the genetic basis of these processes have only recently become readily available. Here, we present a whole-genome assembly of three chromosomes with a GFF annotation. The assembly was generated from PacBio HiFi reads from two sequencing runs (provided in BAM, FASTQ, and FASTA formats), and Iso-Seq transcript data were used for annotation. The genome was assembled with HiFiasm, haplotigs were removed with purge_dups, and scaffolds were generated using publicly available chromatin conformation capture data. The dataset also showcases the novel annotation method OMAnnotator, which uses the OMA algorithm to exploit the evolutionary relationships among genes across species.
https://doi.org/10.5061/dryad.j6q573nqw
Description of the data and file structure
Reference genome and sequencing data for Teleopsis dalmanni
ST_FINAL.fa
FASTA file containing the three chromosomes: 1, 2, and X.
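As a quick integrity check after download, the sequence names and lengths in the assembly can be listed. The following is a minimal sketch using only the Python standard library; the function name `fasta_lengths` is illustrative, and the exact header names used in ST_FINAL.fa are not stated here, so inspect the output rather than assuming particular names.

```python
import gzip


def fasta_lengths(path):
    """Return {sequence_name: length} for a plain or gzipped FASTA file."""
    # Choose the opener from the file extension (assumption: .gz means gzip).
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the first whitespace-delimited token of the header.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

For example, `fasta_lengths("ST_FINAL.fa")` would be expected to return three entries, one per chromosome.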
ST_FINAL.gff
Genome annotation in GFF format.
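To get an overview of the annotation, the feature types in the third column can be tallied. This sketch assumes the file follows the usual nine-column, tab-delimited GFF layout; the feature types actually present in ST_FINAL.gff are not listed here, and `count_feature_types` is an illustrative name.

```python
from collections import Counter


def count_feature_types(path):
    """Count features by type (column 3) in a tab-delimited GFF file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            # An embedded FASTA section, if any, ends the feature table.
            if line.startswith("##FASTA"):
                break
            # Skip comments, directives, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # ignore malformed lines instead of failing
            counts[fields[2]] += 1
    return counts
```

Calling `count_feature_types("ST_FINAL.gff").most_common()` lists the feature types from most to least frequent.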
m64157e_210730_141553.hifi_reads.bam (8.41 GB)
HiFi reads from sequencing run 210730_141553 in BAM format.
m64157e_210730_141553.hifi_reads.fastq.gz (8.90 GB)
HiFi reads from sequencing run 210730_141553 in FASTQ format.
m64157e_210730_141553.hifi_reads.fasta.gz (4.43 GB)
FASTA version of the sequencing run 210730_141553.
m64157e_211024_013127.hifi_reads.bam (4.93 GB)
HiFi reads from sequencing run 211024_013127 in BAM format.
m64157e_211024_013127.hifi_reads.fastq.gz (5.24 GB)
HiFi reads from sequencing run 211024_013127 in FASTQ format.
m64157e_211024_013127.hifi_reads.fasta.gz (2.74 GB)
FASTA version of the sequencing run 211024_013127.
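The read files listed above can be summarised without extra dependencies. The sketch below counts reads and bases in a gzipped FASTQ file; it assumes four-line records (unwrapped sequence and quality lines), which is typical for HiFi output but not confirmed here, and `fastq_read_stats` is an illustrative name.

```python
import gzip


def fastq_read_stats(path):
    """Return (number_of_reads, total_bases) for a gzipped FASTQ file.

    Assumes each record occupies exactly four lines.
    """
    n_reads = 0
    n_bases = 0
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            handle.readline()  # quality string
            n_reads += 1
            n_bases += len(seq)
    return n_reads, n_bases
```

The mean read length is then `n_bases / n_reads`. Streaming line by line keeps memory use low, although a full pass over a multi-gigabyte file will still take some time.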
Methods: Genome sequence
PacBio HiFi reads were generated from pooled female larvae. Reads were assembled with HiFiasm (Cheng et al., 2021), and haplotigs were then removed with the purge_dups pipeline (Guan et al., 2020). Contigs were scaffolded using chromatin conformation capture data previously produced by Reinhardt et al. (2023; NCBI accession SRX9103577), processed with the Arima mapping pipeline (https://github.com/ArimaGenomics/mapping_pipeline).
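For readers who want to repeat the first assembly step, the sketch below shows how a basic hifiasm run could be set up and how its GFA output can be converted to FASTA for downstream tools. The parameters actually used for this assembly are not stated here: the output prefix, the thread count, and both function names are assumptions for illustration only.

```python
def hifiasm_command(reads, prefix="asm", threads=16):
    """Build a basic hifiasm command line (prefix and threads are assumed)."""
    # hifiasm takes an output prefix (-o), a thread count (-t), and read files.
    return ["hifiasm", "-o", prefix, "-t", str(threads), *reads]


def gfa_to_fasta(gfa_lines):
    """Yield FASTA records from the segment (S) lines of a GFA file."""
    for line in gfa_lines:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            # Field 2 is the segment name, field 3 its sequence.
            yield f">{fields[1]}\n{fields[2]}\n"
```

The command list can be passed to `subprocess.run(..., check=True)`. hifiasm writes its primary contigs to `<prefix>.bp.p_ctg.gfa`, and the lines of that file can be fed to `gfa_to_fasta` to produce the contig FASTA.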
Methods: Annotation
PacBio Iso-Seq data were generated from a mixture of adults and larvae. Reads were processed using the IsoSeq3 pipeline (https://github.com/ylipacbio/IsoSeq3). The OMAnnotator pipeline was then used to build a consensus annotation based on the homology of the transcripts and features identified (Bates et al., 2024; https://doi.org/10.1101/2024.12.04.626846).
