
A chromosome-level genome assembly of the highly heterozygous sea urchin Echinometra sp. EZ reveals adaptation in the regulatory regions of stress response genes

Cite this dataset

Ketchum, Remi et al. (2022). A chromosome-level genome assembly of the highly heterozygous sea urchin Echinometra sp. EZ reveals adaptation in the regulatory regions of stress response genes [Dataset]. Dryad. https://doi.org/10.5061/dryad.mw6m90602

Abstract

Echinometra is the most widespread genus of sea urchin and has been the focus of a wide range of studies in ecology, speciation, and reproduction. However, available genetic data for this genus are generally limited to a few select loci. Here, we present a chromosome-level genome assembly based on 10x Genomics, PacBio, and Hi-C sequencing for Echinometra sp. EZ from the Persian/Arabian Gulf. The genome is assembled into 210 scaffolds totaling 817.8 Mb with an N50 of 39.5 Mb. From this assembly we determined that the E. sp. EZ genome consists of 2n = 42 chromosomes. BUSCO analysis showed that 95.3% of BUSCO genes were complete. Ab initio and transcript-informed gene modeling and annotation identified 29,405 genes, including a conserved Hox cluster. E. sp. EZ can be found in high-temperature and high-salinity environments, and we therefore compared gene families and transcription factors associated with environmental stress response (“defensome”) with other echinoid species with similar high-quality genomic resources. While the number of defensome genes was broadly similar for all species, we identified strong signatures of positive selection in non-coding elements near genes involved in environmental response pathways as well as losses of transcription factors important for environmental response. These data provide key insights into the biology of E. sp. EZ as well as the diversification of Echinometra more widely and will serve as a useful tool for the community to explore questions in this taxonomic group and beyond.

Usage notes

spez_chrom_genome_final.fasta -- Chromosome-level genome assembly

spez_gene_annotations.txt -- Gene annotation file

spez_annotations.gff -- Gene annotation file in GFF format

spez_proteins.fa -- Protein file

spez_only_sig_sites.bed -- File generated from the positive selection analysis (adaptiphy); includes sites that are under selection in E. sp. EZ

PFAM_SPEZ -- File generated by running hmmscan on the spez protein files against the PFAM database (includes all results)

For formatting information, please see page 45 of: http://eddylab.org/software/hmmer3/3.1b2/Userguide.pdf

PFAMs_Queried -- List of PFAMs that we extracted from the PFAM_SPEZ file in order to look at 'defensome' PFAMs

spez_chrom_genome_final.fasta.mod.EDTA.TEanno.gff3 -- Whole-genome TE annotation file which contains structurally intact and fragmented TE annotations (generated through EDTA)
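
As a starting point for working with the files listed above, the following is a minimal Python sketch, not part of the dataset itself. It assumes the FASTA file uses standard ">" header lines and that the BED file has at least three whitespace-separated columns (sequence name, start, end). The function names fasta_lengths, n50, and read_bed are hypothetical helpers introduced here for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return the length of each FASTA record, keyed by the first word of its header."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence lines are summed onto the most recent header.
            lengths[name] += len(line)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the sequence at which the running total, taken from the
    longest sequence downward, first reaches half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def read_bed(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Parse BED-style lines into (sequence name, start, end) tuples.

    Assumption: columns are whitespace-separated and any columns after
    the third can be ignored for this purpose.
    """
    intervals: List[Tuple[str, int, int]] = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional comment/track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals
```

For example, passing an open handle of spez_chrom_genome_final.fasta to fasta_lengths and the resulting lengths to n50 gives a scaffold N50 that can be compared with the value reported in the abstract, and read_bed can be applied to an open handle of spez_only_sig_sites.bed to list the intervals under selection.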

Funding

Nick Simons Foundation, Award: 1924498

National Science Foundation, Award: GRFP

Tamkeen, Award: CGSB5