Comparative genomic analysis of chemosensory-related gene families in gastropods
Data files
Version of May 02, 2023: files total 7.50 MB
Abstract
Chemoreception is critical for the survival and reproduction of animals. Except for a small group of insects and spiders, the molecular identity of chemosensory proteins is poorly understood in invertebrates. Gastropoda is the extant mollusk class with the greatest species richness, including marine, freshwater, and terrestrial lineages, and likely has highly diverse chemoreception systems. Here, we performed a comprehensive comparative genome analysis taking advantage of the chromosome-level information of two Gastropoda species, one of which belongs to a lineage that underwent a whole genome duplication event. We identified thousands of previously uncharacterized chemosensory-related genes, the majority of them encoding G protein-coupled receptors (GPCR), mostly organized into clusters distributed across all chromosomes. We also detected gene families encoding degenerin epithelial sodium channels (DEG-ENaC), ionotropic receptors (IR), sensory neuron membrane proteins (SNMP), Niemann–Pick type C2 (NPC2) proteins, and lipocalins, although these families are much smaller. Our phylogenetic analysis of the GPCR gene family across protostomes revealed: (i) large gene family expansions in Gastropoda; (ii) clades including members from all protostomes; and (iii) species-specific clades with a very large number of receptors. We provide, for the first time, valuable insights into the evolution of chemosensory gene families in invertebrates other than arthropods.
Methods
Please see the README document. The starting datasets were the genome and proteome sequence files of three gastropod species, together with their respective annotation files. These data are available from the National Center for Biotechnology Information (NCBI); they were not generated in this work, as indicated in the Materials and Methods. We then used BITACORA v.1.2.1 to identify genes from chemosensory families; its output consists of files with the protein sequences it identified. These sequences were aligned with MAFFT v.7.453, and phylogenetic trees were built with IQ-TREE v.2.1.2. In addition, we ran custom scripts to identify gene clusters by measuring the physical distance between genes. Finally, the genetic distances among genes were estimated with MEGA-CC v.11.0.11.
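The cluster identification step can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes that a cluster is a run of genes on the same chromosome in which each gene starts within a fixed distance of the end of the previous ones. The `Gene` record, the `find_clusters` function, the 100 kb `max_gap` threshold, and the example coordinates are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are in base pairs."""
    name: str
    chrom: str
    start: int
    end: int


def find_clusters(genes, max_gap=100_000, min_size=2):
    """Group genes into clusters by physical distance.

    Genes on the same chromosome join the current cluster when the gap
    between their start and the furthest end seen so far is at most
    max_gap. The threshold is an assumption, not a value from the study.
    """
    by_chrom = defaultdict(list)
    for gene in genes:
        by_chrom[gene.chrom].append(gene)

    clusters = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda g: (g.start, g.end))
        current = [ordered[0]]
        current_end = ordered[0].end
        for gene in ordered[1:]:
            if gene.start - current_end <= max_gap:
                # Close enough to the running cluster: extend it.
                current.append(gene)
                current_end = max(current_end, gene.end)
            else:
                # Gap too large: close the cluster and start a new one.
                if len(current) >= min_size:
                    clusters.append(current)
                current = [gene]
                current_end = gene.end
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


# Made-up coordinates, for illustration only.
example_genes = [
    Gene("geneA", "chr1", 0, 1_000),
    Gene("geneB", "chr1", 50_000, 51_000),
    Gene("geneC", "chr1", 500_000, 501_000),
    Gene("geneD", "chr2", 10, 500),
]
example_clusters = find_clusters(example_genes)
```

With these made-up coordinates, geneA and geneB fall into one cluster, while geneC and geneD remain singletons and are dropped by the `min_size` filter. A different distance threshold or cluster definition would change the grouping.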
Usage notes
Please see the README document. The sequences, alignments, distance matrices, and phylogenetic trees can be viewed in any text editor.
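Because the files are plain text, they can also be read programmatically. The following is a minimal sketch that assumes the sequence files are in FASTA format, which is not stated here and should be checked against the README. The `read_fasta` function and the example records are hypothetical.

```python
import io


def read_fasta(handle):
    """Parse FASTA-formatted text into a dict of identifier -> sequence.

    Assumes each record starts with a '>' header line; the identifier is
    the first whitespace-delimited token of that header.
    """
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# In-memory example with made-up records; a real file would be opened
# with open(path) using a file name taken from the README.
example_text = ">seq1 example protein\nMKT\nLLA\n>seq2\nGG\n"
example_records = read_fasta(io.StringIO(example_text))
```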