Skip to main content

Ecological and biogeographic processes drive the proteome evolution of snake venom

Cite this dataset

Siqueira-Silva, Tuany et al. (2021). Ecological and biogeographic processes drive the proteome evolution of snake venom [Dataset]. Dryad.


The emergence of venom is an evolutionary innovation that favored the diversification and survival of snakes. The composition of snake venoms is known in detail from venom gland proteomic data. However, there is still a gap of knowledge about the forces that lead to the expression of different toxins in different proportions in the venom cocktail across space and time.






Elapidae and Viperidae.

We integrated proteomic data with phylogenetic comparative methods to understand how ecological and biogeographical processes drive the evolution of snake venom. We observed that more productive environments favor a more complex venom, which presents a higher diversity and similarity on the toxin proportions in its composition. We found that the taxa that live in islands, where there is lower variability of resources, tended to present  less complex venom dominated by few toxins. In this case, the island's isolation seems to be a relevant element for a faster fixation of specific venom compositions. We show that ecological and biogeographic processes, which can act differentially over time and space, affect the gene expression of toxins in snake venoms.


We obtained proteomic data of snake venom from Tasoulis and Isbister (2017). Proteomic data was available for 112 taxa, 74 of which belonging to the Viperidae family, and 38 to the Elapidae family. The data includes information on the proteomes of some subspecies and isolated populations on islands (Table S1). In addition, this dataset corresponds to ten types of toxins found in these species: phopolispases A2 (PLA2), serine proteases (SVSP), metalloproteases (SVMP), Kunitz peptides (KUN), 3-finger toxins (3FT), L-amino acid oxidases (LAAO), cysteine-rich secretory proteases (CRiSP), C-type lectins (CTL), disintegrins (DIS), and natriuretic peptides (NP).  The defensins (DEF), metalloprotease inhibitors (Mpi), vascular endothelial growth factors (VEGF), and cystatins (CYS) toxins were not considered for analysis due to their low frequency in proteomes. To understand how snake venom compositions vary, we performed a Pearson correlation analysis between the toxins from the venoms in R platform v.4.0.2 (R Core Team, 2020). Additionally, we estimated the complexity of the species’ venom using the standardized Levin index (Winemiller & Pianka, 1990). The Levin index is a measure that calculates the niche trophic breadth of an organism, where the number of items and the proportion of each item are used in the calculation. In this case, species that feed on a greater variety of items and with more equitable proportions have a broader trophic niche. However, this measure can be used to quantify the complexity of the venom, since we expect that venoms with a greater diversity of toxins and in more equitable proportions will have more diverse pathways of action against their prey. The Levin index is calculated as follows:

Where Bi is the Levin’s standardized index of the snake species i, Pij is the proportion of the toxin j in the species i; and n is the total number of toxins in the venom. Thus, Levin's index values oscillate between 0 and 1, where a value close to 0 represents a less complex venom, while values close to 1 are a highly complex venom.

We collected georeferenced occurrences of each species from the GBIF platform (, associated references are in the Supporting Information), with the help of the “spocc” package (Chamberlain, 2020) in R v.4.0.2 (R Core Team, 2020). We also collected occurrence data of holotypes of all species that are not present in the GBIF database, using the Reptile Database ( All spatial coordinates were validated with the proposed distributions on the IUCN Red List to eliminate possible erroneous points.

From the geographic coordinates, we obtained data of Average Annual Temperature (Bio1) and Precipitation Seasonality (Bio15) from WorldClim (Booth et al., 2014;, Net Primary Productivity (NPP) from the FAO database (, and the Speed of Temperature Change in the last 21 thousand years (Velocity) by Sandel et al. (2011). For each of the variables (Bio1, Bio15, NPP, and Velocity), we estimate the average for each species. We used the variables Bio1 and NPP as proxies for the diversity of prey available in the environment (Food availability hypothesis) (Waide et al., 1999; Allen et al., 2002). We didn't include dietary data of Viperidae and Elapidae due to the lack of data for more than 50% of the analyzed taxa (for the majority of species there are few records of the dietary resources consumed by them) (Grundler, 2020). We used the variables Bio15 and Velocity as proxies for the resource variation over the year and historical time (Brito, 2004; Brown & Shine, 2007; Morales-Barbero et al., 2021), respectively (Food seasonality hypothesis). To assess the effect of body size on snake venoms (Body size hypothesis), we obtained the body mass estimate in grams available from Feldman et al. (2016). Additionally, to assess the biogeographic effect on snake venoms (Biogeography hypothesis), we collected for each proteome data point the locality of occurrence of the taxon to classify whether it occurs on islands or the mainland (Table S2).

Phylogenetic Reconstruction

We used the genetic alignments of Figueroa et al. (2016), which is based on five mitochondrial genes and five nuclear genes. Given that some species for which we had proteome data were not included in the study by Figueroa et al. (2016), we supplemented our data with CytB, ND2, ND4, 12S, and 16S sequences available in the GenBank database (Table S2). We used the species Rena humilis, Boa constrictor, and Iguana iguana as external groups. The final alignment of the sequences was performed with the MAFFT software ( We conducted a Bayesian phylogenetic reconstruction using the BEAST v 2.5 software (Bouckaert et al., 2019), using the evolutionary model of GTR + G + I, a relaxed lognormal molecular clock, and the Yule process prior.  For the calibration of the tree, we used five fossils associated with the first register of Elapidae, Viperinae, Crotalinae, Scolecophidia, and Boidae (see Table S3 for prior distribution and fossil sources).  Furthermore, we used 50 million generations of MCMC chains, and the chain convergence was evaluated using Tracer software v 1.7.1 (Rambaut et al., 2018), using a 10% burn-in and an ESS> 200. Finally, we used TreeAnnotator v.2.6.3 ( to obtain the Major Credibility Tree. Taxa that lacked genetic data (Table S2) were added randomly within the genus, with the help of the “phytools” package (Revell, 2012) in the R v.4.0.2 environment (R Core Team, 2020).

Evolutionary Models

To evaluate the mode of evolution of venom complexity in the snake families Elapidae and Viperidae, we applied an evolutionary model-fitting approach to both families, together, as well as to each of the lineages. We evaluated different evolutionary models derived from the stochastic processes of Brownian motion (BM) (Felsenstein, 1985) and Ornstein-Uhlenbeck (OU) (Hansen & Martins, 1996) with the help of the “mvMORPH” package (Clavel et al., 2015). The BM model establishes that phenotypic differences are the result of a stochastic accumulation of changes, which occur independently and at a specific rate (σ2) along the lineages, increasing the trait variance through time (Felsenstein, 1985). The OU model is a modified version of the BM model that incorporates a selection process. Thus, the OU model presents adaptive zones that function as attractors of biological characteristics, representing evolution through adaptive processes (Butler & King, 2004). Thus, two parameters are incorporated in the model, one that describes the strength of the selection (α) while another describes an adaptive optimal (θ) (Hansen & Martins, 1996). When α=0, the OU model is reduced to a BM. The four evolutionary models that we evaluated were: (i) BM1 (one rate), where there is a single drift rate (σ2); (ii) BM2 (two rates), where there are different σ2 for species that inhabit islands and continents; (iii) OU1 (one optimal), where there is a single adaptive optimal (θ); and (iv) OU2 (two optima), where there are different adaptive peaks for species that inhabit islands and continents. Therefore, BM2 and OU2 models allowed us to validate the existence of different rates and optima, respectively, for the complexity of the venoms in islands and continents. To perform the BM (two rates) and OU (two optima) models, we estimate the ancestral states of the binary characteristic island/continent (1000 simulations) using the make.simmap in the “phytools” package (Revell, 2012). To take phylogenetic uncertainty into account, we generated 100 trees where we included the missing species (n = 7) randomly within the genus with the “phytools” package (Revell, 2012) in the R v.4.0.2 environment (R Core Team, 2020), as recommended by Martinez et al. (2015). Subsequently, we ran the model with these 100 trees and estimated the average of AIC values. We selected the model using the lowest AIC value (average AIC) and the highest AIC weight (wi) (average wi) (Wagenmakers & Farrell, 2004; Johnson & Omland, 2004).

Phylogenetic Path Analysis

To assess the hypotheses Food availability, Food seasonality, Body size, and Biogeography, we performed a Phylogenetic Path Analysis (PPA) (von Hardenberg & Gonzalez-Voyer, 2013). We built nine possible causal models, represented by acyclic graphs, which illustrate the relationship between our variables (Figure S1). The causal models were developed to consider the separated or combined effects of the predictors on toxins and on the venom complexity. These causal models illustrate different paths to describe the Food availability hypothesis (Figure S1a, b, f, h), the Food seasonality hypothesis (Fig. S1a, b, d, e, f, g, h, i), the Body size hypothesis (Fig. S1a, b, c) and the Biogeography hypothesis (Fig. S1a, b, i) (more details of the causal models in the legend of Fig. S1).  For each response variable, we select the best causal model based on the C statistic's Information Criterion (CICc), which is a modified version of the Akaike Information Criterion, AIC (von Hardenberg & Gonzalez-Voyer, 2013). Models with ΔCICc lower than two were considered to be the best fitting ones. The PPAs were built and tested with the support of the “phylopath” package (van der Bijl, 2018) in R v.4.0.2 (R Core Team, 2020). For the analysis, our venom complexity index was used on a logarithmic scale to comply with the assumption of normality. To take into account phylogenetic uncertainty, PPAs were performed using 100 phylogenetic trees. Subsequently, we ran the PPAs with these 100 trees and estimated the average, minimum, and maximum of the standardized coefficients and the 95% confidence intervals.