A chromosome-level genome assembly of the highly heterozygous sea urchin Echinometra sp. EZ reveals adaptation in the regulatory regions of stress response genes
Data files
Sep 02, 2022 version files 890.76 MB
- README.rtf
- spez_annotations.gff
- spez_annotations.txt
- spez_chrom_genome_final.fasta
- spez_only_sig_sites.bed
- spez_proteins.fa
Sep 14, 2022 version files 1.14 GB
- PFAM_SPEZ.txt
- PFAMs_Queried
- README.rtf
- spez_annotations.gff
- spez_annotations.txt
- spez_chrom_genome_final.fasta
- spez_chrom_genome_final.fasta.mod.EDTA.TEanno.gff3
- spez_only_sig_sites.bed
- spez_proteins.fa
Abstract
Echinometra is the most widespread genus of sea urchin and has been the focus of a wide range of studies in ecology, speciation, and reproduction. However, available genetic data for this genus are generally limited to a few select loci. Here, we present a chromosome-level genome assembly based on 10x Genomics, PacBio, and Hi-C sequencing for Echinometra sp. EZ from the Persian/Arabian Gulf. The genome is assembled into 210 scaffolds totaling 817.8 Mb with an N50 of 39.5 Mb. From this assembly we determined that the E. sp. EZ genome consists of 2n = 42 chromosomes. BUSCO analysis showed that 95.3% of BUSCO genes were complete. Ab initio and transcript-informed gene modeling and annotation identified 29,405 genes, including a conserved Hox cluster. E. sp. EZ can be found in high-temperature and high-salinity environments, and we therefore compared gene families and transcription factors associated with environmental stress response (“defensome”) with those of other echinoid species with similar high-quality genomic resources. While the number of defensome genes was broadly similar for all species, we identified strong signatures of positive selection in non-coding elements near genes involved in environmental response pathways as well as losses of transcription factors important for environmental response. These data provide key insights into the biology of E. sp. EZ as well as the diversification of Echinometra more widely and will serve as a useful tool for the community to explore questions in this taxonomic group and beyond.
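The scaffold count, total length, and N50 quoted above can be recomputed from the assembly FASTA. The following is a minimal Python sketch, not the authors' pipeline: the function names fasta_lengths and n50 are illustrative, and N50 is taken in its usual sense (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly).

```python
import io


def fasta_lengths(handle):
    """Return the sequence length of each record in a FASTA file handle."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


# Small in-memory example (made-up sequences, not taken from the assembly).
toy = io.StringIO(">scaffold_a\nACGT\nAC\n>scaffold_b\nGG\n")
toy_lengths = fasta_lengths(toy)
print(len(toy_lengths), sum(toy_lengths), n50(toy_lengths))

# For the real assembly, assuming the file is in the working directory:
# with open("spez_chrom_genome_final.fasta") as fh:
#     lengths = fasta_lengths(fh)
#     print(len(lengths), sum(lengths), n50(lengths))
```

Reading line by line keeps memory use low for a genome-sized file, because only the per-record lengths are stored and not the sequences themselves.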
Usage notes
spez_chrom_genome_final.fasta -- Chromosome-level genome assembly
spez_annotations.txt -- Gene annotation file
spez_annotations.gff -- Gene annotation file in GFF format
spez_proteins.fa -- Protein file
spez_only_sig_sites.bed -- file generated from the positive selection analysis (adaptiPhy); includes sites that are under selection in E. sp. EZ
PFAM_SPEZ.txt -- file generated by running hmmscan on the spez protein file against the PFAM database (includes all results)
Please see page 45 of http://eddylab.org/software/hmmer3/3.1b2/Userguide.pdf for formatting information
PFAMs_Queried -- list of the PFAMs that we extracted from the PFAM_SPEZ.txt file in order to examine 'defensome' PFAMs
spez_chrom_genome_final.fasta.mod.EDTA.TEanno.gff3 -- Whole-genome transposable element (TE) annotation file that contains structurally intact and fragmented TE annotations (generated with EDTA)
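The extraction of 'defensome' PFAMs from the hmmscan results described above could be reproduced roughly as follows. This Python sketch rests on several assumptions that are not stated in these notes: that PFAM_SPEZ.txt uses the HMMER per-sequence table layout (hmmscan --tblout, in which the first three columns are the PFAM name, the PFAM accession, and the protein identifier, and the fifth is the full-sequence E-value), that PFAMs_Queried holds one PFAM name or unversioned accession per line, and that an E-value cutoff of 1e-5 is appropriate. The function names, the cutoff, and the example rows are hypothetical.

```python
from collections import defaultdict


def parse_tblout(lines):
    """Yield (pfam_name, pfam_accession, protein_id, evalue) per hit.

    Assumes the hmmscan --tblout layout; comment lines start with '#'.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 whitespace-delimited columns, then a free-text description.
        fields = line.split(maxsplit=18)
        if len(fields) < 5:
            continue
        name, accession, protein = fields[0], fields[1], fields[2]
        evalue = float(fields[4])
        # Drop the version suffix, e.g. "PF00067.25" -> "PF00067".
        yield name, accession.split(".")[0], protein, evalue


def proteins_by_pfam(lines, queried, max_evalue=1e-5):
    """Map each queried PFAM (name or accession) to the proteins that hit it.

    max_evalue is an illustrative cutoff, not one taken from the dataset.
    """
    hits = defaultdict(set)
    for name, accession, protein, evalue in parse_tblout(lines):
        if evalue > max_evalue:
            continue
        if name in queried:
            hits[name].add(protein)
        elif accession in queried:
            hits[accession].add(protein)
    return dict(hits)


# Made-up rows in --tblout layout; protein identifiers and scores are invented.
example = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "p450 PF00067.25 SPEZ_0001 - 1.2e-50 170.1 0.0 1.5e-50 169.8 0.0 1.0 1 0 0 1 1 1 1 Cytochrome P450",
    "ABC_tran PF00005.30 SPEZ_0002 - 3.0e-20 75.0 0.0 4.0e-20 74.6 0.0 1.1 1 0 0 1 1 1 1 ABC transporter",
    "p450 PF00067.25 SPEZ_0003 - 0.5 5.0 0.0 0.7 4.6 0.0 1.0 1 0 0 1 1 0 0 Cytochrome P450",
]
print(proteins_by_pfam(example, {"PF00067"}))

# With the real files, assuming they are in the working directory:
# with open("PFAMs_Queried") as fh:
#     queried = {line.strip() for line in fh if line.strip()}
# with open("PFAM_SPEZ.txt") as fh:
#     defensome = proteins_by_pfam(fh, queried)
```

If the file turns out to use the per-domain layout (--domtblout) instead, the column positions differ and the indices in parse_tblout would need to be adjusted against the HMMER user guide.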