Sexual selection in wild populations of seed bugs: the role of size in pre-copulatory mate choice by females and males
Data files
Sep 22, 2025 version (files total 21.78 GB):
R_Scripts.zip – 15.57 KB
Raw_Data_Photos.zip – 21.78 GB
README.md – 6.92 KB
Spreadsheets.zip – 103.33 KB
Abstract
This dataset comprises all data, R scripts, and raw photographs for the research paper titled 'Sexual selection in wild populations of seed bugs: the role of size in pre-copulatory mate choice by females and males'. It contains data spreadsheets and R scripts for both Spilostethus pandurus and Lygaeus creticus, the raw photos used to obtain the size measurements, and a README file.
Dataset DOI: 10.5061/dryad.sf7m0cgkr
Description of the data and file structure
This README file is for all the data related to the research paper ‘Sexual selection in wild populations of seed bugs: the role of size in pre-copulatory mate choice by females and males’.
Authors: Ophelia S Fritsch and David M Shuker
In this study, we sampled mating and non-mating individuals of both Spilostethus pandurus and Lygaeus creticus in Sicily, directly in their habitat. We then took three body size measurements from each individual (body length, thorax width, and abdomen width) and compared them between individuals collected while mating and individuals collected while not mating, to detect patterns of non-random mating with respect to body size.
Throughout the R scripts and in document titles, the names ‘Spilo’ or ‘spilo’ are short for Spilostethus pandurus and ‘Creticus’ or ‘creticus’ are short for Lygaeus creticus.
Files and variables
Spreadsheets can be found in the folder 'Spreadsheets' inside the zip file 'Spreadsheets.zip'.
Seven data spreadsheets are provided:
First, the spreadsheets titled ‘spilo.field.dataset.csv’ and ‘creticus.field.dataset.csv’ are the two raw datasets with the original variables, and are the ones used for the analyses (see the R script for each species).
Second, the spreadsheets titled ‘Spilostethus_pandurus_wild_caught_final_dataset.csv’ and ‘Lygaeus_creticus_wild_caught_final_dataset.csv’ are the final datasets, which include all new variables created during the analysis, including the individual principal component scores.
Third, the spreadsheet titled 'Spilo_field_repeatability_measures.csv' contains the morphological measures from 32 individuals that were re-measured to test repeatability using intraclass correlation coefficients.
Finally, the two spreadsheets titled 'Spilo_photo_IDs_individual_IDs_and_measurements.csv' and 'Creticus_photo_IDs_individual_IDs_and_measurements.csv' link the raw photo IDs to the individual IDs and to the dates the photos were taken.
Brief description of variables used in the seven spreadsheets:
date – date the individuals were measured
individual_ID – individual identifier, assigned pseudo-randomly
sex – individual sex
body_length – measure of body length in millimetres (mm)
thorax_width – measure of thorax width in millimetres (mm)
abdomen_width – measure of abdomen width in millimetres (mm)
pair_ID – pair identification for individuals that mated
status – status at the time of collection (collected mating or collected not mating)
collected_mating_Y_N – status at collection with Y (Yes) = 1 and N (No) = 0
body_length_sq – value of body length measure multiplied by itself (squared)
thorax_width_sq – value of thorax width measure multiplied by itself (squared)
abdomen_width_sq – value of abdomen width measure multiplied by itself (squared)
PC1_times_minus_1 – individual scores for principal component 1 with the sign opposite to that of PC1 (see PC1 below)
PC2 – individual scores for principal component 2
PC3 – individual scores for principal component 3
PC1 – individual scores for principal component 1 with the correct sign (used for analysis)
PC_and_4 – individual scores for principal component 1 with the value 4 added
PC_and_4sq – the above multiplied by itself (squared)
vstd_body_length – variance standardised measure of body length (scaled)
vstd_thorax_width – variance standardised measure of thorax width (scaled)
vstd_adbomen_width – variance standardised measure of abdomen width (scaled)
vstd_body_length_sq – variance standardised measure of body length squared
vstd_thorax_width_sq – variance standardised measure of thorax width squared
vstd_abdomen_width_sq – variance standardised measure of abdomen width squared
PC1_sq – individual score for principal component 1 squared
BL_1 – body length measures from the original dataset
TW_1 – thorax width measures from the original dataset
AW_1 – abdomen width measures from the original dataset
BL_2 – body length re-measurements for repeatability
TW_2 – thorax width re-measurements for repeatability
AW_2 – abdomen width re-measurements for repeatability
date_photo_taken – the date the photo was taken
photo_ID – the name of each raw photograph
measure_type – brief description of the body size measure type
measure_mm – the body size measures in millimetres (mm), in a single column as output by ImageJ
Notes or notes – the reason why body size measurements are missing for an individual. Only individuals for which an error occurred during photography, or which flew away, have a note in this column. For all other individuals the column is not relevant and is left blank (see below).
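The derived size variables can be illustrated with a short sketch. The authors' analyses are in R; the Python below is only an illustration, and it assumes that 'variance standardised (scaled)' means centring on the mean and dividing by the sample standard deviation, as R's scale() does by default. The function name add_derived_columns and the example rows are hypothetical and are not taken from the dataset.

```python
import statistics


def add_derived_columns(rows, column="body_length"):
    """Return copies of rows with <column>_sq and vstd_<column> added.

    Assumption: 'variance standardised' = (x - mean) / sample SD.
    Missing measurements (None) stay missing in the derived columns.
    """
    values = [r[column] for r in rows if r[column] is not None]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    out = []
    for r in rows:
        new = dict(r)
        x = r[column]
        new[f"{column}_sq"] = None if x is None else x * x
        new[f"vstd_{column}"] = None if x is None else (x - mean) / sd
        out.append(new)
    return out


# Hypothetical example rows, for illustration only.
example = [
    {"individual_ID": "A", "body_length": 12.0},
    {"individual_ID": "B", "body_length": 14.0},
    {"individual_ID": "C", "body_length": None},
]
derived = add_derived_columns(example)
```

The same function can be applied to thorax_width and abdomen_width by changing the column argument.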
NAs and blanks explained
In the variables containing body size measures, 'NA' represents 'not available', i.e. data missing because of an error during photography: either the ventral side of the bug was not photographed or the bug flew away before being photographed. The reason for each case is given in the notes variable of the final datasets ‘Spilostethus_pandurus_wild_caught_final_dataset.csv’ and ‘Lygaeus_creticus_wild_caught_final_dataset.csv’.
In other variables such as 'pair_ID', 'NA' represents 'not applicable', as these individuals were not paired.
In the notes variable, a blank entry means 'not relevant'.
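A reader loading the spreadsheets outside R may want to apply these conventions explicitly. The sketch below is a minimal illustration in Python (the authors' own scripts are in R). The sample rows are invented for the example, and the helper names parse_row and read_rows are hypothetical; only the column names and the meaning of 'NA' and blank entries come from the description above.

```python
import csv
import io

# Hypothetical rows for illustration only; these are not values from the dataset.
SAMPLE = """individual_ID,sex,body_length,pair_ID,notes
S001,F,12.4,P01,
S002,M,NA,NA,flew away before being photographed
"""

MEASUREMENT_COLUMNS = ("body_length", "thorax_width", "abdomen_width")


def parse_row(row):
    """Convert 'NA' and blank entries to None following the README conventions."""
    parsed = dict(row)
    for col in MEASUREMENT_COLUMNS:
        if col in parsed:
            # 'NA' in a measurement column means 'not available'.
            parsed[col] = None if parsed[col] == "NA" else float(parsed[col])
    if parsed.get("pair_ID") == "NA":
        # 'NA' in pair_ID means 'not applicable' (individual was not paired).
        parsed["pair_ID"] = None
    if "notes" in parsed and parsed["notes"].strip() == "":
        # A blank note means 'not relevant'.
        parsed["notes"] = None
    return parsed


def read_rows(handle):
    """Read a CSV file object and return a list of cleaned row dictionaries."""
    return [parse_row(r) for r in csv.DictReader(handle)]


rows = read_rows(io.StringIO(SAMPLE))
```

To read a real file, pass an opened file object (for example one of the final dataset CSVs) to read_rows in place of the in-memory sample.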
Raw data:
All raw photographs are provided in the folder named 'Raw_Data_Photos' inside the zip file 'Raw_Data_Photos.zip'. Within this folder, the photos are organised in subfolders: they are first separated by species ('Spilostethus' and 'Creticus') and then by the date the photos were taken, such as '11-05-24 Spilo' or '31-05-24 Creticus'.
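The date folder names can be parsed programmatically when matching photos to the date_photo_taken variable. The following Python sketch is an illustration only. It assumes the folder names follow a day-month-year pattern with a two-digit year (consistent with '31-05-24', where 31 can only be a day) and that the year refers to the 2000s; the function name parse_date_folder is hypothetical.

```python
from datetime import date, datetime

# Short labels used in folder names, as explained in this README.
SPECIES_BY_LABEL = {
    "Spilo": "Spilostethus pandurus",
    "Creticus": "Lygaeus creticus",
}


def parse_date_folder(name):
    """Split a folder name such as '11-05-24 Spilo' into (date, species).

    Assumption: the date part is DD-MM-YY.
    """
    date_part, label = name.split(" ", 1)
    taken = datetime.strptime(date_part, "%d-%m-%y").date()
    return taken, SPECIES_BY_LABEL[label.strip()]
```

For example, parse_date_folder('31-05-24 Creticus') returns the date 31 May 2024 together with the full species name Lygaeus creticus.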
Code/software
To view and extract data from the .arw images (raw images), the Mica Toolbox plugin for ImageJ can be used, or ImageJ by itself by selecting import ARW image. Alternatively, to view the images only, RawTherapee, or GIMP with a RawTherapee plugin, can be used.
R scripts for the data analyses can be found in the folder 'R_Scripts' in the zip file 'R_Scripts.zip'. All scripts were produced in RStudio (2024) using R version 4.3.3 (R Core Team, 2024).
The three scripts provided are as follows: ‘Spilo R analyses for manuscript’ for the S. pandurus data analyses, ‘Creticus R analyses for manuscript’ for the L. creticus data analyses, and 'Spilo_field_repeatability' for the repeatability analysis of the body size measures.
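To give a sense of what a repeatability analysis on paired columns such as BL_1 and BL_2 involves, here is a minimal Python sketch of a one-way random-effects intraclass correlation coefficient, ICC(1,1), for two measurements per individual. This is an assumption for illustration: the README does not state which ICC form the authors' R script uses, so the result may differ from theirs. The function name icc_oneway and the example numbers are hypothetical.

```python
def icc_oneway(first, second):
    """One-way random-effects ICC(1,1) for two measurements per individual.

    first, second: equal-length sequences (e.g. BL_1 and BL_2) without
    missing values. This ICC form is an assumption, not necessarily the
    one used in the authors' script.
    """
    pairs = list(zip(first, second))
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    means = [(a + b) / 2 for a, b in pairs]
    # Mean square between individuals.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # Mean square within individuals (measurement error).
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical measurements in mm, for illustration only.
icc_example = icc_oneway([10.0, 12.0, 14.0], [11.0, 11.0, 15.0])
```

Values close to 1 indicate that re-measurements of the same individual agree closely relative to the variation among individuals.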
Access information
Other publicly accessible locations of the data, excluding the raw photos:
