Structural models of Cheiracanthium punctorium spider toxins with putative defensive function (CSTX-type and phospholipase A2)
Data files
Oct 06, 2025 version files 517 KB
- fold_cptx_1a_model_0.cif (94.42 KB)
- fold_cptx_2d_model_0.cif (46.14 KB)
- fold_cptx_5a_model_0.cif (46.98 KB)
- fold_cptx_5b_model_0.cif (46.98 KB)
- fold_cptx14a_model_0.cif (96.70 KB)
- fold_P00630_PA2_model_0.cif (93.12 KB)
- fold_Q6T178_PA2_model_0.cif (90.74 KB)
- README.md (1.90 KB)
Abstract
Spider venom is an important but evolutionarily understudied functional trait. In this study, we sequenced multiple venom gland transcriptomes from various spiders and analyzed their venom profiles. Libraries were prepared with the Illumina TruSeq RNA Sample Prep Kit v2 or the TruSeq Stranded mRNA Library Prep Kit (paired-end, 151-bp read length), sequenced by Macrogen (Korea) using different Illumina chemistries, and assembled de novo using a pipeline incorporating Trinity v2.13.2/2.15.1 as well as rnaSPAdes v3.15.5. Identified toxin precursors were annotated using InterProScan v5.61-93.0/5.69-101.0 and a DIAMOND v2.0.15/2.1.9 blastp search against the publicly available databases VenomZone, UniProtKB/Swiss-Prot Tox-Prot, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL v2022_05/2024_04. Our analysis revealed novel double-domain toxins from the CSTX family and some phospholipase A2 toxins whose sequences resemble homologs from honeybees and scorpions, suggesting functional similarity. To explore the architecture of the CSTX toxins and to test whether the observed phospholipase A2 sequence similarities are reflected in structure, we used structure prediction. AlphaFold 3 on the Galaxy platform was used to generate 3D models of the proteins. Sequences were submitted online and structures were predicted using default settings. The resulting models were downloaded and visualized using ChimeraX. The results suggest a high structural similarity among phospholipases from spider, honeybee and scorpion venom, which makes a related function likely, presumably in defense, and they reveal a complex structure for the novel CSTX toxins. This dataset contains the models of the selected spider toxins from the Nurse's thorn finger (Cheiracanthium punctorium), as well as the most similar ones from honeybees and scorpions.
Dataset DOI: 10.5061/dryad.fn2z34v7t
Description of the data and file structure
To understand the architecture of double-domain CSTX toxins, and to test whether the sequence similarity between phospholipase A2 toxins from the Nurse's thorn finger (Cheiracanthium punctorium) and their homologs is mirrored by similarities in folding, structure and, hence, potentially function, we modelled the structures of proteins encoded by the sequenced transcripts. We modelled selected CSTX toxins (CPTX1a, CPTX 2d, CPTX 5a and CPTX 5b) as well as the identified phospholipase A2 toxins from C. punctorium (CPTX14a) and their most similar homologs from bees and scorpions (P00630 and Q6T178). The models (used to generate Figures 3 and 5 of the linked publication/preprint) were created with AlphaFold 3 from sequence data under default parameters.
Files and variables
File: fold_cptx_1a_model_0.cif
Description: Model of CPTX1a
File: fold_cptx_2d_model_0.cif
Description: Model of CPTX 2d
File: fold_cptx_5a_model_0.cif
Description: Model of CPTX 5a
File: fold_cptx_5b_model_0.cif
Description: Model of CPTX 5b
File: fold_cptx14a_model_0.cif
Description: Model of Chepu-PLA2-1a
File: fold_P00630_PA2_model_0.cif
Description: Model of P00630
File: fold_Q6T178_PA2_model_0.cif
Description: Model of Q6T178
Code/software
The data were visualized using ChimeraX (https://www.cgl.ucsf.edu/chimerax/) but can be viewed in any commonly used protein structure viewer.
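The model files can also be read programmatically. The following is a minimal sketch, not part of the dataset, that extracts the `_atom_site` loop from mmCIF text using only the Python standard library. It assumes that each atom record sits on one line with whitespace-separated values, and that per-atom model confidence (pLDDT) is stored in the `B_iso_or_equiv` column, as is typical for AlphaFold output; both assumptions should be checked against the actual files. The function names and the `SAMPLE` text are hypothetical.

```python
from statistics import mean

# Tiny hand-written mmCIF fragment used only to demonstrate the parser.
SAMPLE = """
data_example
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.B_iso_or_equiv
ATOM 1 N  MET A 1 81.50
ATOM 2 CA MET A 1 83.25
ATOM 3 CA GLY A 2 90.75
#
"""


def read_atom_site(cif_text):
    """Return the rows of the _atom_site loop as a list of dicts.

    Assumption: one atom record per line, no values containing whitespace.
    """
    columns, rows = [], []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            # Column header, e.g. "_atom_site.label_atom_id" -> "label_atom_id"
            columns.append(line.split(".", 1)[1])
        elif columns:
            # The first non-record line after the headers ends the loop.
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break
            rows.append(dict(zip(columns, line.split())))
    return rows


def plddt_per_residue(rows):
    """Map (chain, residue number) to the value stored on the C-alpha atom."""
    return {
        (r["label_asym_id"], int(r["label_seq_id"])): float(r["B_iso_or_equiv"])
        for r in rows
        if r["label_atom_id"] == "CA"
    }


def mean_plddt(rows):
    """Average the per-residue values as a rough whole-model summary."""
    return mean(plddt_per_residue(rows).values())
```

To apply it to one of the deposited models, read the file first, for example `read_atom_site(open("fold_cptx_1a_model_0.cif").read())`. For anything beyond a quick check, a dedicated mmCIF library is the safer choice.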
Access information
Other publicly accessible locations of the data:
- Not applicable.
Data was derived from the following sources:
- Not applicable.
Methods
RNA extraction and sequencing were outsourced to Macrogen. Following RNA extraction, libraries were constructed using the Illumina TruSeq RNA Sample Prep Kit v2 or the TruSeq Stranded mRNA Library Prep Kit (paired-end, 151-bp read length). Library quality was checked by verifying PCR-enriched fragment sizes on an Agilent Technologies 2100 Bioanalyzer with a DNA 1000 chip. Library quantity was determined by qPCR using the rapid library standard quantification solution and calculator (Roche).
Transcriptome data were processed using a modified version of our in-house assembly and annotation pipeline [62]. All input sequences were inspected using FastQC v0.11.9/0.12.1 (www.bioinformatics.babraham.ac.uk) before trimming with cutadapt v4.2/4.9. The trimmed reads were corrected using Rcorrector v1.0.5/1.0.7 [79] and assembled de novo using a pipeline incorporating Trinity v2.13.2/2.15.1 (minimum contig size of 30 bp, maximum read normalization of 50) and rnaSPAdes v3.15.5, with and without error-corrected reads. All contigs were combined into a single assembly, in which identical transcripts from different assemblers were merged. Reads were remapped to the assembly using HISAT2 v2.2.1, and expression values (transcripts per million, TPM) were calculated using StringTie v2.2.1/2.2.2. SAM and BAM files were converted using samtools v1.16.1/1.20. Open reading frames (ORFs) were then predicted with TransDecoder v5.5.0/5.7.1 (github.com/TransDecoder/TransDecoder) with a minimum length of 10 amino acids and provided for proteome analysis.
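Two steps of this paragraph can be illustrated with a short sketch. It is not the in-house pipeline: merging is assumed here to mean exact, case-insensitive sequence identity (reverse complements are not considered), and `tpm` implements the textbook definition of transcripts per million rather than StringTie's internal calculation. All function and variable names are hypothetical.

```python
def merge_identical(assemblies):
    """Collapse contigs with identical sequences across assemblers.

    assemblies: {assembler name: {contig id: sequence}}
    Returns {sequence: ["assembler:contig id", ...]} so that the origin of
    every merged transcript stays traceable.
    Assumption: "identical" means the same string after upper-casing.
    """
    merged = {}
    for assembler, contigs in assemblies.items():
        for contig_id, sequence in contigs.items():
            merged.setdefault(sequence.upper(), []).append(
                f"{assembler}:{contig_id}"
            )
    return merged


def tpm(read_counts, lengths):
    """Textbook TPM: length-normalized rates scaled to sum to one million.

    read_counts: {transcript id: number of mapped reads}
    lengths:     {transcript id: transcript length in bases}
    """
    rates = {t: read_counts[t] / lengths[t] for t in read_counts}
    total = sum(rates.values())
    if total == 0:
        # No reads mapped anywhere: report zero expression for everything.
        return {t: 0.0 for t in rates}
    return {t: rate / total * 1_000_000 for t, rate in rates.items()}


# Toy input: two assemblers produce one shared and one unique contig each.
example = merge_identical(
    {
        "trinity": {"c1": "ATGGCC", "c2": "ATGTTT"},
        "rnaspades": {"n1": "atggcc", "n2": "ATGAAA"},
    }
)
```

In the toy input, `c1` and `n1` collapse into a single entry, while `c2` and `n2` remain separate.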
Identified toxin precursors were annotated using InterProScan v5.61-93.0/5.69-101.0 and a DIAMOND v2.0.15/2.1.9 blastp search. The search was run in ultra-sensitive mode with a maximum E-value of 1 × 10^-3 and all target sequences reported (--max-target-seqs 0). For each hit, we then calculated the coverage of query and subject, and the similarity with the BLOSUM62 matrix, using BioPython v1.81/1.83. Sorting the hits of each toxin candidate by similarity, query coverage and subject coverage yielded the hit used in the final analysis. Precursors for which SignalP v6.0g/h (slow-sequential mode, eukarya) predicted no signal peptide were removed from our dataset. To verify our assignments, annotated precursors were aligned to known venom components from the same putative family using ClustalW in Geneious v10.2.6 (www.geneious.com).
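The hit selection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the sort is assumed to be descending and lexicographic (similarity first, then query coverage, then subject coverage), coverage is assumed to be the aligned span divided by the full sequence length, and the BLOSUM62 similarity is taken as a precomputed number. The `Hit` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    evalue: float
    qstart: int       # 1-based, inclusive alignment coordinates
    qend: int
    qlen: int         # full length of the query sequence
    sstart: int
    send: int
    slen: int         # full length of the subject sequence
    similarity: float  # BLOSUM62-based similarity, computed elsewhere


def coverage(start, end, length):
    """Fraction of a sequence spanned by the alignment."""
    return (abs(end - start) + 1) / length


def best_hits(hits, max_evalue=1e-3):
    """Pick one hit per query after applying the E-value cutoff.

    Assumed ranking: highest similarity, ties broken by query coverage,
    then by subject coverage.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        key = (
            hit.similarity,
            coverage(hit.qstart, hit.qend, hit.qlen),
            coverage(hit.sstart, hit.send, hit.slen),
        )
        if hit.query not in best or key > best[hit.query][0]:
            best[hit.query] = (key, hit)
    return {query: hit for query, (_, hit) in best.items()}


# Toy data: the third hit is the most similar but fails the E-value cutoff.
example_hits = [
    Hit("t1", "dbA", 1e-20, 1, 90, 100, 1, 90, 120, 0.80),
    Hit("t1", "dbB", 1e-10, 1, 50, 100, 1, 50, 60, 0.90),
    Hit("t1", "dbC", 1e-2, 1, 100, 100, 1, 100, 100, 0.95),
]
selected = best_hits(example_hits)
```

With the toy data, `dbB` is selected for `t1`, because `dbC` exceeds the E-value threshold and `dbB` has a higher similarity than `dbA`.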
AlphaFold 3 on the Galaxy platform was used to generate 3D models of the proteins. Sequences were submitted online and structures were predicted using default settings. The resulting models were downloaded and visualized using ChimeraX [93]. The retrieved models are provided in this dataset.
