Dryad

Integrating venom peptide libraries into a phylogenetic and broader biological framework

Cite this dataset

Chase, Kevin; Watkins, Maren (2022). Integrating venom peptide libraries into a phylogenetic and broader biological framework [Dataset]. Dryad. https://doi.org/10.5061/dryad.ksn02v751

Abstract

The venomous marine snails are conventionally divided into three groups: the cone snails (family Conidae), the auger snails (family Terebridae), and the turrids (formerly all assigned to a single family, Turridae). In this study, a library of venom peptides from species conventionally assigned to the genus Turris was correlated to a phylogenetic analysis. Nucleotide sequences of multiple genes from transcriptomes were used to assess the phylogenetic relationships across a diverse set of species. The resulting tree shows that the Conoidean genus Turris, as conventionally defined, is polyphyletic. We describe a new genus, Purpuraturris gen. nov., that comprises the outlier species. In addition to morphological distinctions, molecular data reveal that this group is more closely related to Unedogemmula and Turridrupa than to Turris sensu stricto. The correlation between phylogenetic information and multiple peptide sequences from the library of venom peptides was used to highlight those peptides most likely to be unique and intimately associated with biological diversity. The plethora of peptide sequences available requires two prioritization decisions: which subset of peptides to characterize initially and, after these are characterized, which to investigate comprehensively for potential biomedical applications such as drug development.

Methods

Adapter clipping and quality trimming of raw reads were performed using fqtrim (Version 0.9.4) and PRINSEQ (Version 0.20.4). After processing, sequences shorter than 70 bp and those containing more than 5% ambiguous bases (Ns) were discarded. De novo transcriptome assembly was performed using Trinity Version 2.0.5 with a k-mer size of 31 for building de Bruijn graphs, a minimum k-mer coverage of 10, and a minimum glue of 10. Assembled transcripts were annotated using Blastx (NCBI-Blast-2.2.28+) against conotoxin sequences extracted from the ConoServer and UniProt databases. Genes shared among all datasets were identified using the blast identities of assembled contigs (e < 10^-4).
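The read-filtering thresholds described above can be illustrated with a short Python sketch. It is not the fqtrim or PRINSEQ implementation; the function name `passes_filter` and its parameter names are hypothetical, and the sketch assumes that "shorter than 70" means a length strictly below 70 and that "more than 5%" means an N fraction strictly above 0.05.

```python
def passes_filter(seq, min_len=70, max_n_frac=0.05):
    """Return True if a read would be kept under the stated thresholds.

    Assumptions (not taken from the tools themselves): reads with
    len < min_len are dropped, and reads whose fraction of 'N' bases
    is strictly greater than max_n_frac are dropped.
    """
    if not seq or len(seq) < min_len:
        return False
    n_count = seq.upper().count("N")
    return n_count / len(seq) <= max_n_frac


def filter_reads(reads):
    """Keep only the reads that pass both the length and ambiguity checks."""
    return [read for read in reads if passes_filter(read)]
```

A read of exactly 70 bases with no Ns is kept under these assumptions, while a 100-base read with 6 Ns is discarded.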
FASTA contig sequences were aligned using MAFFT v7.222 (with the “auto” flag). For each alignment file, we used RAxML (Stamatakis 2006) with an unpartitioned GTRGAMMA model to determine the “best” tree.
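As a rough sketch of how the alignment and tree-building step might be scripted per gene, the Python below builds a MAFFT command with `--auto` and a RAxML command with `-m GTRGAMMA` and no partition file. The exact commands used for this dataset are not given in the description, so the executable name `raxmlHPC`, the random seed passed to `-p`, the output file naming, and all function names here are assumptions.

```python
import subprocess
from pathlib import Path


def mafft_command(fasta_path):
    """MAFFT call with the --auto flag; the alignment is written to stdout."""
    return ["mafft", "--auto", str(fasta_path)]


def raxml_command(alignment_path, run_name, seed=12345):
    """RAxML call with an unpartitioned GTRGAMMA model.

    Assumptions: the binary is named 'raxmlHPC' and the parsimony
    seed (-p) is arbitrary; neither is stated in the dataset description.
    """
    return [
        "raxmlHPC",
        "-s", str(alignment_path),  # input alignment
        "-n", run_name,             # suffix for RAxML output files
        "-m", "GTRGAMMA",           # substitution model
        "-p", str(seed),            # random seed for the starting tree
    ]


def align_and_build_tree(fasta_path, out_dir):
    """Align one gene's contigs, then infer a tree from the alignment."""
    fasta_path = Path(fasta_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    aligned = out_dir / (fasta_path.stem + ".aligned.fasta")
    with open(aligned, "w") as handle:
        subprocess.run(mafft_command(fasta_path), stdout=handle, check=True)
    # RAxML writes its output files into the working directory.
    subprocess.run(
        raxml_command(aligned.resolve(), fasta_path.stem),
        cwd=out_dir,
        check=True,
    )
    return out_dir / ("RAxML_bestTree." + fasta_path.stem)
```

Calling `align_and_build_tree` requires both programs to be installed and on the PATH. RAxML refuses to overwrite output files from an earlier run that used the same run name, so a fresh output directory per run is the simplest choice.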

Each zip file is a compressed folder.
fasta.zip has the contig sequences from each sample, grouped by blast identity gene name. Each file is named according to the Swiss-Prot identifier.
aligned.zip has the MAFFT-aligned sequences from all samples in FASTA format.
bestTree.zip has the “best” RAxML trees for all genes.
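For readers who want to inspect the archives programmatically, the following Python sketch reads every file in a zip archive and parses it as FASTA using only the standard library. The internal folder layout and file extensions of the archives are not documented here, so the sketch simply treats every non-directory member as a FASTA file; the function names are hypothetical.

```python
import io
import zipfile


def parse_fasta(handle):
    """Parse FASTA text into a dict of {sequence id: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_fasta_zip(zip_source):
    """Return {member name: {sequence id: sequence}} for a zip archive.

    zip_source may be a path or a binary file-like object.
    Assumption: every non-directory member is a FASTA file.
    """
    per_file = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                per_file[member] = parse_fasta(text)
    return per_file
```

For example, `read_fasta_zip("fasta.zip")` would return one dictionary of sequences per gene file, keyed by the member name inside the archive.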

Funding

National Institute of General Medical Sciences, Award: GM048677

United States Department of Defense, Award: W81XWH-17-1-0413