A computational neuroscience framework for quantifying warning signals
Cite this dataset
Penacchio, Olivier et al. (2023). A computational neuroscience framework for quantifying warning signals [Dataset]. Dryad. https://doi.org/10.5061/dryad.x3ffbg7kd
Abstract
Animal warning signals show remarkable diversity, yet subjectively appear to share certain visual features that make defended prey stand out and look different from more cryptic palatable species. For example, many (but far from all) warning signals involve high contrast elements, such as stripes and spots, and often involve the colours yellow and red. How exactly do aposematic species differ from non-aposematic ones in the eyes (and brains) of their predators?
Here we develop a novel computational modelling approach to quantify prey warning signals and establish what visual features they share. First, we develop a model visual system, made of artificial neurons with realistic receptive fields, to provide a quantitative estimate of the neural activity in the first stages of the visual system of a predator in response to a pattern. The system can be tailored to specific species. Second, we build a novel model that defines a ‘neural signature’, comprising quantitative metrics that measure the strength of stimulation of the population of neurons in response to patterns. This framework allows us to test how individual patterns stimulate the model predator visual system.
For the predator-prey system of birds foraging on lepidopteran prey, we compared the strength of stimulation of a modelled avian visual system in response to a novel database of hyperspectral images of aposematic and undefended butterflies and moths. Warning signals generate significantly stronger activity in the model visual system, setting them apart from the patterns of undefended species. The activity was also very different from that seen in response to natural scenes. Therefore, to their predators, lepidopteran warning patterns are distinct from their non-defended counterparts, and stand out against a range of natural backgrounds.
For the first time, we present an objective and quantitative definition of warning signals based on how the pattern generates population activity in a neural model of the brain of the receiver. This opens new perspectives for understanding and testing how warning signals have evolved, and, more generally, how sensory systems constrain signal design.
README: A computational neuroscience framework for quantifying warning signals
by O.Penacchio, C.G.Halpin, I.C.Cuthill, P.G.Lovell, M.Wheelwright, J.Skelhorn, C.Rowe, J.M.Harris, May 2023
This repository contains all the material needed to understand the construction and content of the St Andrews hyperspectral database of Lepidoptera and to reproduce the analysis of the paper 'A computational neuroscience framework for quantifying warning signals'.
The associated Zenodo files contain Matlab and R code to tune and run the model of the predator visual system and reproduce the analysis in the paper.
*** Please note that, to limit file size, this repository provides only a single example hyperspectral image. The full St Andrews hyperspectral database of Lepidoptera is available at https://arts.st-andrews.ac.uk/lepidoptera/index.html ***
Description of the Data and file structure
Summary:
- File count: 10
- Total file size: 433.6 MB
- Range of individual file sizes: 3 kB - 433.3 MB
- File formats: .md/.bil/.bil.hdr/.xlsx/.mat
File naming:
- The hyperspectral .bil file is named according to the scientific name of the species (here, Arctia caja), the specimen number among all the specimens for the species (here, 1, the first specimen for the species Arctia caja), the side of the specimen scanned (here, D, for dorsal [V is used for the ventral side]), and the date the specimen was scanned (here, 030717, in day/month/year format). Each .bil file is accompanied by a .bil.hdr file that contains all the information for reading the hyperspectral image. The .bil.hdr files are named after their corresponding .bil files.
- The Excel files (.xlsx) contain all the relevant information on the database: a full description of the included species, with literature references to justify their classification into the aposematic/non-aposematic categories (Table_S1_List_of_Species.xlsx); the full list of the scans with their museum reference, the three main metrics associated with each scan (i.e., luminance contrast, ODD and colour contrast) and the category of the corresponding species (aposematic/non-aposematic; Table_S2_List_of_Specimens.xlsx); and the values of the alternative metrics for all the scans (Table_S3_List_of_Supplementary_metrics.xlsx).
- The Matlab .mat files provide a convenient structure for handling computations with the database (allSpecies_APvsnAP_database.mat), a simple numerical description of the wavelengths considered in the scans (all_wavelengths_HypScan.mat), and simple files to keep track of the computations (track_success_stored.mat and track_successColour_stored.mat).
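The naming convention described above can be unpacked programmatically. The following Python sketch is not part of the distributed code; the helper name parse_scan_name and the ScanName record are hypothetical, and it assumes that the genus and species may be separated by either a space or an underscore, since both spellings appear in this README.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class ScanName(NamedTuple):
    species: str    # scientific name, e.g. "Arctia caja"
    specimen: int   # specimen number within the species
    side: str       # "D" (dorsal) or "V" (ventral)
    scanned: date   # date of scanning


# [species]_[specimen number][D/V]_[ddmmyy]
_SCAN_RE = re.compile(
    r"^(?P<species>.+)_(?P<specimen>\d+)(?P<side>[DV])_(?P<date>\d{6})$"
)


def parse_scan_name(filename: str) -> ScanName:
    """Split a scan name into its components (hypothetical helper)."""
    stem = filename
    # Drop the file extension, if any, before matching the pattern.
    for ext in (".bil.hdr", ".bil"):
        if stem.endswith(ext):
            stem = stem[: -len(ext)]
            break
    match = _SCAN_RE.match(stem)
    if match is None:
        raise ValueError(f"unrecognised scan name: {filename!r}")
    return ScanName(
        # Assumption: an underscore inside the species part stands for a space.
        species=match["species"].replace("_", " "),
        specimen=int(match["specimen"]),
        side=match["side"],
        # The date is stored as day, month, two-digit year.
        scanned=datetime.strptime(match["date"], "%d%m%y").date(),
    )
```

For the scan provided here, parse_scan_name("Arctia_caja_1D_030717.bil") gives the species "Arctia caja", specimen 1, side "D" and a scanning date of 3 July 2017.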
Details on the tabular files:
- Table_S1_List_of_Species.xlsx
*Description: a table giving the list of the 125 species in the database and used in the study. It includes information on the species' "category", i.e., whether the species is aposematic or not. Each row corresponds to one species.
*Format: *.xlsx
*Size: 16 KB
*Dimensions: 125 rows x 5 columns
*Variables:
- Species number: a simple numerical reference for each species
- Species: the (scientific) name of the species
- Family: the family of the species in column 2
- Aposematic/non-aposematic: the category (aposematic or non-aposematic) of the species
- Evidence of palatability, representative paper: a reference that gives evidence for the category (column 4) of the species
- Table_S2_List_of_Specimens.xlsx
*Description: a table that gives the exhaustive list of scans of specimens in the database and used in the study. Each row corresponds to one scan, i.e., one side of one specimen. The table includes information on the specimen's species, genus, and family, its category (aposematic/non-aposematic), the date of the scan, its museum collection of origin and reference, whether it was included in the analysis (two scans were excluded, see below at 'Reason for exclusion'), as well as the three main metrics (i.e., luminance contrast, ODD, colour contrast).
*Format: *.xlsx
*Size: 81 KB
*Dimensions: 662 rows x 15 columns
*Variables:
- Scan number: a simple numerical reference for each scan
- Species number: the species number as per Table_S1_List_of_Species.xlsx for the scanned specimen
- Scan name: reference name of the file of the scan in the database, built as [species name]_[specimen number][D/V for dorsal side or ventral side]_[date of scanning]. For example, the first two rows give the scan names Abraxas fulvobasalis_1D_040917 and Abraxas fulvobasalis_1V_040917 to specify that they correspond to the species Abraxas fulvobasalis, first specimen for this species (hence, the "1"), dorsal side ("D") for the first row and ventral side ("V") for the second row, and that these scans were acquired on the 4th of September 2017.
- Species: species name for the scanned specimen
- Genus: genus of the species
- Family: family of the genus
- Date scanned: date the specimen was scanned
- Aposematic/non-aposematic: the category (aposematic or non-aposematic) of the species (see also Table_S1_List_of_Species.xlsx)
- Collection: the museum collection the specimen belongs to (see paper for details).
- NHMUK collection number: The specimen reference in the collection, if available.
- Included in analysis: a simple flag to specify whether the scan was included in the analysis. We excluded two specimens because one of their sides was missing from the database due to a software problem during acquisition (see next column and Supplementary Material for details)
- Reason for exclusion: a brief description of the reason for exclusion
- Luminance contrast: the first metric based on luminance information (see manuscript for details)
- ODD: the second metric based on luminance information (see manuscript for details)
- Colour contrast: metric based on chromatic information (see manuscript for details)
N.B.: Blank cells in column 10 ("NHMUK collection number") correspond to specimens with no collection numbers; blank cells in column 12 ("Reason for exclusion") correspond to scans that were ***not*** excluded from the analysis (i.e., all scans but 2).
- Table_S3_List_of_Supplementary_metrics.xlsx
*Description: a table that gives alternative metrics for the scanned specimens. Each row corresponds to one scan (i.e., one side of one specimen).
*Format: *.xlsx
*Size: 128 KB
*Dimensions: 662 rows x 15 columns
*Variables:
- Species number: the species number as per Table_S1_List_of_Species.xlsx for the scanned specimen
- Columns 2 to 15: Main metrics of the manuscript (columns 2 to 4) and alternative metrics (columns 5 to 15) as described in the Supplementary Material document. The full names of these metrics are:
- Column 2: Luminance contrast (main metric, see Table_S2_List_of_Specimens.xlsx)
- Column 3: ODD (main metric, see Table_S2_List_of_Specimens.xlsx)
- Column 4: Colour contrast (main metric, see Table_S2_List_of_Specimens.xlsx)
- Column 5, "scinvL1": scale-invariant L1-norm
- Column 6, "scinvL2": scale-invariant L2-norm
- Column 7, "max2_5": average response of the 2.5% of the units with the highest response
- Column 8, "max10": average response of the 10% of the units with the highest response
- Column 9, "kurtosis": sparsity of the population response to the pattern measured as the kurtosis
- Column 10, "rms": root mean square contrast of the luminance image associated with each pattern
- Column 11, "ELplusMS_std": contrast energy of the (L + M) - S opponent channel
- Column 12, "ESU_std": contrast energy of the S - U opponent channel
- Column 13, "GiniL_M": the statistical dispersion of the population response of the ‘red-green’ channel, L - M, measured using the Gini index
- Column 14, "GiniLM_S": the statistical dispersion of the population response of the (L + M) - S opponent channel, measured using the Gini index
- Column 15, "GiniS_U": the statistical dispersion of the population response of the S - U opponent channel, measured using the Gini index
N.B.: The two blank rows in this table correspond to the specimens/sides for which we experienced a computer problem and which were not properly recorded.
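To make some of the column descriptions above concrete, here is a minimal Python sketch of three of the summary statistics using their textbook definitions: the mean response of the most active fraction of units (cf. "max2_5" and "max10"), kurtosis, and the Gini index. It is an illustration only, not the authors' Matlab implementation; the function names are hypothetical, and details such as rounding of the unit count, raw versus excess kurtosis, or any normalisation are assumptions that should be checked against the Supplementary Material.

```python
from typing import Sequence


def top_fraction_mean(responses: Sequence[float], fraction: float) -> float:
    """Mean response of the `fraction` most strongly responding units."""
    ranked = sorted(responses, reverse=True)
    # Assumption: round to the nearest whole unit, and keep at least one unit.
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / k


def kurtosis(responses: Sequence[float]) -> float:
    """Raw (non-excess) kurtosis: fourth central moment over squared variance."""
    n = len(responses)
    mean = sum(responses) / n
    m2 = sum((r - mean) ** 2 for r in responses) / n
    m4 = sum((r - mean) ** 4 for r in responses) / n
    return m4 / m2 ** 2


def gini(responses: Sequence[float]) -> float:
    """Gini index of non-negative responses.

    0 means all units respond equally; values near 1 mean the activity is
    concentrated in a few units.
    """
    ranked = sorted(responses)
    n = len(ranked)
    total = sum(ranked)
    weighted = sum(rank * r for rank, r in enumerate(ranked, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

With these definitions, a population in which a single unit out of four is active, [0, 0, 0, 1], has a Gini index of 0.75, whereas a uniform population has a Gini index of 0.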
Table of contents:
README.md
Arctia_caja_1D_030717.bil
Arctia_caja_1D_030717.bil.hdr
Table_S1_List_of_Species.xlsx
Table_S2_List_of_Specimens.xlsx
Table_S3_List_of_Supplementary_metrics.xlsx
allSpecies_APvsnAP_database.mat
all_wavelengths_HypScan.mat
track_success_stored.mat
track_successColour_stored.mat
Related files:
Please find below a description of the files contained in the Zenodo folder related to this repository.
Description of the Matlab code
The Matlab files in this code have been reconfigured to be user-friendly, and to be adaptable to other projects using hyperspectral images and/or luminance/colour images already processed. They are equivalent to the files that were used in the different stages of the analysis in the paper. The code consists of files to implement the three following steps:
[1.] read hyperspectral images in *.bil format;
[2.] transform them to luminance (e.g., response of double cones) or colour channels (e.g., response of colour cones);
[3.] compute the summary statistics of the paper from the luminance and colour channels.
We describe these three steps below.
Important note:
As hyperspectral images are usually very large files, we only provide one file here (see below); the full set of hyperspectral images used in the paper is freely available for download at the webpage of the University of St Andrews database of Lepidoptera at https://arts.st-andrews.ac.uk/lepidoptera/index.html
[1.] ===== Read hyperspectral images =====
[readIMLhyp.m]
This file reads *.bil files specified by 'names' in the folder specified by 'address'; note that the hyperspectral files *.bil go with associated *.bil.hdr files which specify their format (number of lines, pixels, wavelengths, settings of the imaging system etc.) as explained above, section 'File naming'.
[example_readIMLhyp.m]
A simple example of the above routine; reads the hyperspectral image provided (Arctia caja_1D_030717.bil, using the associated information file Arctia caja_1D_030717.bil.hdr) and outputs the hyperspectral image (sometimes called 'hypercube') 'im', the wavelength resolution 'wvls', the spatial and spectral size of the image 'scan' and the gain of the imaging system during acquisition 'gain'.
[2.] ===== Transform hyperspectral images to luminance or colour information =====
The code to transform the hyperspectral images into the response of visual receptors consists of three functions, namely get_double_cone_responses.m, get_mask_inner_body.m and get_colour_cone_responses.m. These functions should be run in this order, as the input of get_mask_inner_body.m depends on the output of get_double_cone_responses.m, and get_colour_cone_responses.m uses the output of get_mask_inner_body.m.
[get_double_cone_responses.m]
This file reads the *.bil files specified by 'names' in the folder specified by 'address' and converts the corresponding hyperspectral images to the response of double cones for the species 'species'. By default, the code uses the sensitivity functions of the double cones of 'chicken' (Gallus gallus domesticus). Other species can be added easily.
Example of use using the scan provided:
address = pwd;
name = 'Arctia caja_1D_030717.bil';
get_double_cone_responses(address, name)
When running the three lines of code above, the code will read the specified scan of Arctia caja, create a subfolder 'extracted_luminance' (if it does not already exist), and export the responses of double cones as a Matlab *.mat file called 'Arctia caja_1D_030717_gain_corrected.mat'; in this file, the body of the imaged animal has been segmented, i.e., the 'pixels' of the background all have the same value '0' ('black'). The code will also export another *.mat file that contains the coordinates of the region of the background used for segmentation, 'Arctia caja_1D_030717_bckgInfo.mat', as well as a non-calibrated png image of the specimen, 'Arctia caja_1D_030717_luminance_non_calibrated.png'. This last image is created for convenience and illustration and should not be used for analysis!
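Conceptually, converting a hyperspectral image to a receptor response amounts to weighting each pixel's spectrum by the receptor's spectral sensitivity and summing over wavelengths. The Python sketch below illustrates that idea only. It is not the code in get_double_cone_responses.m, which also applies a gain correction and its own segmentation; the function names, the nested-list image layout and the toy sensitivity values are all assumptions made for illustration.

```python
from typing import List, Sequence


def receptor_response(spectrum: Sequence[float],
                      sensitivity: Sequence[float]) -> float:
    """Response of one receptor class at one pixel.

    Both sequences are sampled at the same wavelengths; the response is the
    sensitivity-weighted sum of the measured signal.
    """
    if len(spectrum) != len(sensitivity):
        raise ValueError("spectrum and sensitivity must share wavelengths")
    return sum(s * w for s, w in zip(spectrum, sensitivity))


def receptor_image(cube: List[List[Sequence[float]]],
                   sensitivity: Sequence[float],
                   background: List[List[bool]]) -> List[List[float]]:
    """Per-pixel receptor responses, with background pixels set to 0.

    cube[row][col] is the spectrum at that pixel; background[row][col] is
    True where the pixel does not belong to the specimen.
    """
    return [
        [
            0.0 if background[r][c] else receptor_response(pixel, sensitivity)
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(cube)
    ]


# Toy data: one row of two pixels, three wavelengths.
# The sensitivity values are made up and are NOT those of chicken double cones.
toy_sensitivity = [0.25, 0.5, 0.25]
toy_cube = [[[4.0, 2.0, 8.0], [1.0, 1.0, 1.0]]]
toy_background = [[False, True]]
toy_luminance = receptor_image(toy_cube, toy_sensitivity, toy_background)
```

In this toy case the specimen pixel receives a response of 4.0 and the background pixel is set to 0, mirroring the 'black' background of the exported luminance file.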
[get_mask_inner_body.m]
This file reads the luminance *.mat file corresponding to the *.bil files specified by 'names' in the folder specified by 'address', namely the output of get_double_cone_responses(address, name), and creates a binary mask that singles out the location of the body on the image by creating a png image ('the mask') with a value of 255 for the pixels corresponding to the inner part of the body and a value of 0 elsewhere (i.e., outline and background). These masks are used to discard the response of the model to the outline of the body.
Example of use:
address = fullfile(pwd, 'extracted_luminance');
name = 'Arctia caja_1D_030717.png';
get_mask_inner_body(address, name)
This code will provide a png image 'inner_mask_Arctia caja_1D_030717.png' that singles out the inner part of the imaged Lepidoptera. Please note that this routine needs the output of get_double_cone_responses.m.
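How such a mask can be used to discard responses to the outline and background is sketched below in Python. This is an illustration under assumptions, not the authors' code: the helper name is hypothetical, and the response map and mask are represented as nested lists of equal shape.

```python
from typing import List


def inner_body_values(response: List[List[float]],
                      mask: List[List[int]]) -> List[float]:
    """Keep only the responses at pixels where the mask equals 255.

    Pixels with a mask value of 0 (outline and background) are discarded.
    """
    return [
        value
        for response_row, mask_row in zip(response, mask)
        for value, flag in zip(response_row, mask_row)
        if flag == 255
    ]


# Toy 2 x 2 example: only two pixels belong to the inner part of the body.
toy_response = [[0.5, 2.0], [3.0, 4.0]]
toy_mask = [[0, 255], [255, 0]]
kept = inner_body_values(toy_response, toy_mask)
```

Summary statistics would then be computed on the retained values only, so that strong responses to the edge between body and background do not contribute.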
[get_colour_cone_responses.m]
This file reads the *.bil files specified by 'names' in the folder specified by 'address' and converts the corresponding hyperspectral images to the response of colour cones for the species 'species'. By default, the code uses the sensitivity functions of the colour cones of 'chicken' (Gallus gallus domesticus), but other species can be added easily.
Example of use:
address = pwd;
name = 'Arctia caja_1D_030717.bil';
get_colour_cone_responses(address, name)
When running the above lines of code, the code will read the specified scan of Arctia caja, create a subfolder 'extracted_colour' (if it does not already exist), and export the responses of colour cones as Matlab *.mat files called 'Arctia caja_1D_030717_ll.mat' (the response of the L cones), 'Arctia caja_1D_030717_mm.mat' (the response of the M cones), 'Arctia caja_1D_030717_ss.mat' (the response of the S cones), and 'Arctia caja_1D_030717_uu.mat' (the response of the UV-sensitive cones). All the values outside the inner part of the body of the imaged Lepidoptera are set to 0 using the mask image created using get_mask_inner_body.m. The code get_colour_cone_responses.m should therefore be run after get_mask_inner_body.m.
[3.] ===== Compute the summary statistics =====
[APvsnAP_luminance_contrast_and_ODD_metrics.m]
This file computes the metrics based on luminance information in the main manuscript and Supplementary Material. It uses as input the output of the double cones extracted (using get_double_cone_responses.m, see above) to the subfolder 'extracted_luminance' of the working directory (pwd). The code also uses already computed and provided Matlab *.mat files and Matlab structures that contain information on the full database and outputs the metrics in the same format. Please see the head of the file for more information.
N.B.: Please note that APvsnAP_luminance_contrast_and_ODD_metrics.m computes the responses of Gabor filters using functions included in the zipped folder called 'gabor_cells.zip'. This folder must be unzipped before running APvsnAP_luminance_contrast_and_ODD_metrics.m.
[APvsnAP_colour_metrics.m]
This file computes the metrics based on colour information in the main manuscript and Supplementary Material. It uses as input the output of the colour cones extracted (using get_colour_cone_responses.m, see above) to the subfolder 'extracted_colour' of the working directory (pwd). The code also uses already computed and provided Matlab *.mat files and Matlab structures that contain information on the full database and outputs the metrics in the same format. Please see the head of the file for more information.
The functions [APvsnAP_luminance_contrast_and_ODD_metrics.m] and [APvsnAP_colour_metrics.m] are specifically set to compute all the metrics presented in the paper for all the images of the St Andrews hyperspectral database of Lepidoptera. The code also provides more versatile versions for computing the metrics for a single image or a set of images within a folder.
[APvsnAP_luminance_contrast_and_ODD_metrics_generic.m] is the counterpart of [APvsnAP_luminance_contrast_and_ODD_metrics.m]. These two functions are similar in all but the loop over the images. Whereas this loop is specific to the database in [APvsnAP_luminance_contrast_and_ODD_metrics.m], [APvsnAP_luminance_contrast_and_ODD_metrics_generic.m] can handle a set of luminance images (in Matlab's *.mat or *.png format; other image formats can be added easily) and compute all the luminance diagnostics (in particular, luminance contrast and ODD) for each image using APvsnAP_luminance_contrast_and_ODD_metrics_generic(luminance_folder, im_ext), where 'luminance_folder' is the address of the folder containing the images and 'im_ext' is the image format (or, more generally and for convenience, any 'suffix' description of the names of the files to be analysed; see function for details).
[APvsnAP_colour_metrics_generic.m] is the counterpart of [APvsnAP_colour_metrics.m]. These two functions are similar in all but the loop over the images. Whereas this loop is specific to the database in [APvsnAP_colour_metrics.m], [APvsnAP_colour_metrics_generic.m] can analyse a set of cone outputs saved in Matlab's format *.mat. See function for details.
An 'example of use' is provided for both [APvsnAP_luminance_contrast_and_ODD_metrics_generic.m] and [APvsnAP_colour_metrics_generic.m].
Description of the R code
The R code implements the statistical analysis of the paper and Supplementary Material. It consists of three parts. The first and third parts respectively provide the statistical analysis of the summary statistics in the main manuscript and in the Supplementary Material. These parts are implemented in the file
'statistical_analysis A computational neuroscience framework for quantifying warning signals Penacchio et al 2023.R'.
The second part implements a bootstrap procedure to compare pairs of aposematic and non-aposematic species within the same family and corresponds to a second file named 'bootstrap_pairs A computational neuroscience framework for quantifying warning signals Penacchio et al May 23.R'.
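For readers unfamiliar with bootstrapping, the general idea of comparing two groups by resampling can be sketched as follows. This Python example is a generic percentile bootstrap of a difference in means and is not a transcription of the R script, whose exact resampling scheme is described in the paper and Supplementary Material. The function name, the default settings and the metric values used in the example are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_mean_difference(
    aposematic: Sequence[float],
    non_aposematic: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Observed difference in means and an approximate percentile interval.

    Each group is resampled with replacement, independently of the other.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        a = rng.choices(aposematic, k=len(aposematic))
        b = rng.choices(non_aposematic, k=len(non_aposematic))
        diffs.append(mean(a) - mean(b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    observed = mean(aposematic) - mean(non_aposematic)
    return observed, lower, upper


# Made-up metric values for one aposematic and one non-aposematic species.
observed, lower, upper = bootstrap_mean_difference(
    [5.0, 5.5, 6.0, 6.5], [1.0, 1.2, 0.8, 1.1]
)
```

An interval that excludes 0 suggests that the two species differ on the metric; with the made-up values above, every resampled difference is positive.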
Running the code should be straightforward provided the paths to the tables Table_S2_List_of_Specimens.xlsx and Table_S3_List_of_Supplementary_metrics.xlsx are set correctly:
- The path to Table_S2_List_of_Specimens.xlsx should be set on line 21 of 'statistical_analysis A computational neuroscience framework for quantifying warning signals Penacchio et al May 23.R'
rawData <- read_excel("*** specify path to table Table_S2_List_of_Specimens.xlsx***")
and on line 16 of 'bootstrap_pairs A computational neuroscience framework for quantifying warning signals Penacchio et al May 23.R'
rawData <- read_excel("*** specify path to table Table_S2_List_of_Specimens.xlsx***")
- The path to Table_S3_List_of_Supplementary_metrics.xlsx should be set on line 183 of 'statistical_analysis A computational neuroscience framework for quantifying warning signals Penacchio et al May 23.R'
supplrawData <- read_excel("*** specify path to table Table_S3_List_of_Supplementary_metrics.xlsx***")
Alternatives to the Matlab code
All the Matlab functions of the code except the dependent function multibandread.m can be run in the open-source language Octave (using GNU Octave 8.3.0, see https://octave.org/) provided the right packages are loaded (Octave will automatically indicate which packages are missing and how to load them). The function multibandread.m, which is crucial for reading the hyperspectral scans, is not available in Octave. Reading the scans can instead be done with the open-source language Python using the module Spectral Python (SPy; see http://www.spectralpython.net/index.html). Once read, the hyperspectral images (sometimes called "hypercubes") can be exported to *.mat files using Python functions that save Python arrays as Matlab matrices (e.g., scipy.io.savemat). The resulting *.mat files can be read in Octave.
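A minimal Python sketch of this route is given below. It assumes that the .bil.hdr file is an ENVI-style header that SPy can parse and that the packages numpy, spectral and scipy are installed. The function names are hypothetical, and the variable names 'im' and 'wvls' simply echo the outputs of example_readIMLhyp.m; whether the downstream Matlab functions accept this exact layout has not been verified.

```python
from pathlib import Path


def mat_path_for(bil_path: str) -> Path:
    """Output path: same folder and name as the scan, with a .mat extension."""
    return Path(bil_path).with_suffix(".mat")


def export_scan_to_mat(bil_path: str) -> Path:
    """Read one .bil scan with SPy and save it as a Matlab .mat file."""
    # Imported inside the function so that the path helper above can be used
    # even when these third-party packages are not installed.
    import numpy as np
    import spectral.io.envi as envi
    from scipy.io import savemat

    # Assumption: the header sits next to the scan and is named <scan>.bil.hdr.
    image = envi.open(bil_path + ".hdr", bil_path)
    cube = np.asarray(image.load())  # shape: (lines, samples, bands)
    # Band centres are available only if the header lists the wavelengths.
    wavelengths = np.asarray(image.bands.centers, dtype=float)

    out_path = mat_path_for(bil_path)
    savemat(str(out_path), {"im": cube, "wvls": wavelengths})
    return out_path
```

Calling export_scan_to_mat("Arctia_caja_1D_030717.bil") in the folder containing the scan would then write Arctia_caja_1D_030717.mat, which can be opened in Octave with load.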
Methods
Database construction
The novel database of lepidopteran patterns of aposematic and non-aposematic species consists of a representative set made of 125 species of Lepidoptera across 12 families (96 aposematic and 29 non-aposematic species, with a total of 676 hyperspectral images; see paper’s Supplementary Material 1 for details). Samples of each species were located in museum collections (the Natural History Museum (BMNH), London, UK, the Manchester Museum (MMUE), Manchester, UK, and the American Museum of Natural History (AMNH), New York, USA). Their dorsal and ventral sides were photographed using an ultraviolet hyperspectral camera (Resonon Pika NUV, Resonon Inc., MT USA) covering the 350 nm – 800 nm spectral range, with a spectral resolution of 1 nm. The camera was fitted with a near ultraviolet 17 mm focal length objective lens. To maximize the homogeneity of the light field, the specimens were illuminated by four blue-enhanced halogen lamps (SoLux, 35W, 12V-MR16 GU5.3 4700K, EiKO Global, KS USA) placed 22 cm apart on a square light fixture and oriented vertically toward the horizontal scanning plane. See the paper’s Supplementary Methods 1 for details on the spatial and spectral calibration of the imaging system.
The database is freely accessible at https://arts.st-andrews.ac.uk/lepidoptera/index.html
Image analysis – neural model of predator vision – computation of metrics (summary statistics)
The neural model of a predator visual system and the computation of the metrics of the modelled neural activity were coded in Matlab (MATLAB and Statistics Toolbox Release 2019b, 9.7.0.1190202 (R2019b). Natick, Massachusetts, The MathWorks Inc.). Please see details in the accompanying README.md file and Supplementary Methods 2 and 3.
Statistical analysis
The statistical analysis was done in R (R Development Core Team 2020) using generalized linear models (function glm) for the logistic regressions and the function glmer in the package lme4 (Bates et al. 2014) for fitting generalized linear mixed models. See README.md and Supplementary Method 4 for details.
Usage notes
Neural model and computation of metrics
The software required for using the model, extracting the metrics from the model response, and generating the figures is Matlab (proprietary; MATLAB and Statistics Toolbox Release 2019b, 9.7.0.1190202 (R2019b). Natick, Massachusetts, The MathWorks Inc.). An open-source alternative to run the Matlab routines is Octave (https://octave.org/).
Statistical analysis
The software required for the statistical analysis is R (free, open-source; R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria).
Funding
Maria Zambrano Fellowship for attraction of international talent for the requalification of the Spanish university system—NextGeneration EU (ALRC)
Biotechnology and Biological Sciences Research Council, Award: BB/N006569/1
Biotechnology and Biological Sciences Research Council, Award: BB/N00602X/1
Biotechnology and Biological Sciences Research Council, Award: BB/N005945/1
Biotechnology and Biological Sciences Research Council, Award: BB/N007239/1