Data and source code used for: Diversification of pre-mating behaviors through temporal reordering of components
Data files
Jan 15, 2026 version files 1.07 GB
-
analysis.zip
1.07 GB
-
README.md
22.11 KB
Abstract
This repository provides access to the data and source code used for the manuscript: "Diversification of pre-mating behaviors through temporal reordering of components". We formatted and analyzed two separate datasets based on the trajectory data we obtained from Social LEAP (SLEAP) and UMATracker. One dataset was formatted to compare the speeds and tandem metrics of Microcerotermes nervosus before and after wing-shed (df_output.rda). The second formatted dataset was used to compare speed and tandem metrics of M. nervosus to those of seven sympatric species (df_comp.rda).
Article Information
This repository provides access to the data and source code used for the manuscript:
"Diversification of pre-mating behaviors through temporal reordering of components"
Elijah Carroll, Taisuke Kanao, Nathan Lo, Nobuaki Mizumoto
Contact, Elijah Carroll: epc0015@auburn.edu; Nobuaki Mizumoto: nzm0095@auburn.edu
This study investigates the tandem running behavior of Microcerotermes nervosus and the change in its pre-mating sequence. The videos were analyzed using deep-learning posture-tracking software (SLEAP), except when the pairs were winged. Winged individuals were tracked using UMATracker, software that tracks centroids using background subtraction.
We formatted and analyzed two separate datasets based on the trajectory data we obtained from SLEAP (v1.3.3) and UMATracker. One dataset was formatted to compare the speeds and tandem metrics of Microcerotermes nervosus before and after wingshed. The second formatted dataset was used to compare the speed and tandem metrics of M. nervosus to those of seven sympatric species: Amitermes darwini, A. parvus, Coptotermes lacteus, Macrognathotermes errator, M. sunteri, Nasutitermes graveolus, and Tumulitermes hastilis. This repository includes the data and the Python/R scripts.
Definitions
Treatments for winged vs. wingless comparison:
- FwM is the treatment corresponding to a winged female in the presence of a male.
- FM is the treatment corresponding to a wingless female in the presence of a male.
- F is the treatment corresponding to a winged female in the absence of a male.
- wing_shed_frame: Frame of wingshed out of 15-minute video at 30 frames per second.
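As a quick check of the numbers in the wing_shed_frame definition, a 15-minute video at 30 frames per second contains 27000 frames, and a frame index converts to elapsed time by dividing by the frame rate. The sketch below is only an illustration; the helper name `frame_to_seconds` and the example frame value are assumptions and are not part of the repository scripts.

```python
FPS = 30             # frames per second, as stated in the definition above
VIDEO_MINUTES = 15   # video length in minutes, as stated in the definition above

def frame_to_seconds(frame, fps=FPS):
    """Convert a frame index into elapsed time in seconds."""
    return frame / fps

# Total number of frames in one video
total_frames = FPS * 60 * VIDEO_MINUTES
print(total_frames)              # 27000

# Hypothetical wing_shed_frame value, for illustration only
print(frame_to_seconds(13500))   # 450.0
```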
Folder names and treatments associated with species comparison:
- Ami_dar or Ad: Amitermes darwini
- Ami_par or Ap: Amitermes parvus
- Cop_lac or Cl: Coptotermes lacteus
- Mac_err or Me: Macrognathotermes errator
- Mac_sun or Ms: Macrognathotermes sunteri
- Nas_gra or Ng: Nasutitermes graveolus
- Tum_has or Th: Tumulitermes hastilis
- Mic_ner or Mn: Microcerotermes nervosus
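The two-letter codes follow directly from the folder names: the first letter of the genus part (capitalized) followed by the first letter of the species part (lowercase). A minimal sketch of that rule, assuming a hypothetical helper named `species_code`:

```python
def species_code(folder_name):
    """Build the two-letter code from a folder name such as 'Mic_ner'."""
    genus, species = folder_name.split("_")
    return genus[0].upper() + species[0].lower()

folders = ["Ami_dar", "Ami_par", "Cop_lac", "Mac_err",
           "Mac_sun", "Nas_gra", "Tum_has", "Mic_ner"]

# Map every folder name to its abbreviation
codes = {name: species_code(name) for name in folders}
print(codes["Mic_ner"])   # Mn
```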
File types and usage:
.h5: Hierarchical Data Format version 5 (HDF5) files are hierarchical binary containers used to store large arrays of data together with their metadata. They can be opened using Python.
How to read .h5 in Python (using the Python command line):
import h5py
with h5py.File("Mic_ner_A_FwM_02.h5", "r") as f:
    # Read the full "tracks" dataset into memory as a NumPy array
    tracks = f["tracks"][()]
print(type(tracks))
print(tracks.shape)
print(tracks.dtype)
- Note that the shape of the data corresponds to (tracks, xy, nodes, frames). In this example, it is (2, 2, 7, 27000), meaning 2 tracks, 2 coordinates (x and y), 7 nodes, and 27000 frames.
- Node: A body part that is tracked in SLEAP. We tracked 7 nodes (two antennae, head tip, pronotum, body center, marker, and abdomen tip) for this project.
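To make the axis order (tracks, xy, nodes, frames) concrete, the sketch below looks up the x and y coordinates of one node of one individual at one frame. It uses a small nested list with made-up values as a stand-in so that it runs without a data file; the same indexing should work on the array read from an .h5 file. The helper name `get_xy` is an assumption, and which track corresponds to the male or the female is not specified here.

```python
def get_xy(tracks, track, node, frame):
    """Return (x, y) for one individual, body part, and frame.

    Assumes the axis order (tracks, xy, nodes, frames) described above.
    """
    x = tracks[track][0][node][frame]
    y = tracks[track][1][node][frame]
    return (x, y)

# Tiny stand-in with shape (2 tracks, 2 coordinates, 1 node, 3 frames);
# the values are made up for illustration only.
demo = [
    [[[1.0, 2.0, 3.0]], [[10.0, 20.0, 30.0]]],   # track 0: x values, y values
    [[[4.0, 5.0, 6.0]], [[40.0, 50.0, 60.0]]],   # track 1: x values, y values
]
print(get_xy(demo, track=1, node=0, frame=2))   # (6.0, 60.0)
```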
.feather: A columnar data storage format that can be opened and handled in both R and Python.
How to read .feather in R:
# Install dependency
install.packages("arrow")
library(arrow)
# Read the Feather file
df <- read_feather("your_file.feather")
# View data
View(df)
.rda: An R data file that is used to store one or more R objects into a compressed, binary format. rda files can be opened and used in R.
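How to load a .rda file in R:
# Load the RDA file; load() restores the saved objects into the workspace
# and returns their names (not the data itself)
obj_names <- load("your_file.rda")
print(obj_names)
# View the first loaded object
View(get(obj_names[1]))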
.csv: A CSV (comma-separated values) file is a plain text file used to store tabular data, such as spreadsheets or databases, with each row representing a record and each value within a row separated by a comma. Can be opened with software such as LibreOffice Calc, OpenOffice Calc, Microsoft Excel or imported into R using the read.csv function.
How to load a .csv file in R:
# Load dependency (install once with install.packages("data.table") if needed)
library(data.table)
# Load data
df <- fread("your_file.csv")
# View data
View(df)
Tracking software:
SLEAP (Social LEAP Estimates Animal Poses): Open-source deep-learning animal tracking software that performs markerless pose estimation from images or video. It allows users to define a body “skeleton” (landmarks and connections), label a subset of frames, train neural-network models, and then generate time-series trajectories of body-point coordinates for one or multiple animals.
UMATracker: Open-source video-tracking software designed for the markerless tracking of animal movement, used for tracking centroids (body center) of one or multiple animals in videos. It uses image-processing–based methods (e.g., background subtraction and object detection) to extract trajectories, and outputs a time series of body center coordinates that can be used for analyses of movement.
Table of contents
This repository includes tracking data, R code to analyze it, and Python code for video analysis, SLEAP models, and UMATracker RDA files. All files are contained within the analysis.zip file.
- README.md - this file
- analysis
- code
- sleap_processing.py - Python script to extract and process inferred tracks from SLEAP files with linear interpolation of missing data points.
- data_preprep.R - R script to format and combine metadata from h5 and RDA files from SLEAP and UMATracker, respectively, to make one master dataset named 'df_all'.
- format_trajectories.R - R script for manual threshold setting and formatting datasets to compare tandem speed, duration in tandem, and duration separated for M. nervosus before and after shedding their wings.
- output.R - R script for analysis of comparison of tandem speed, tandem duration, and separation duration before and after shedding their wings.
- Format_comparisons.R - R script to combine and format metadata from M. nervosus after wingshed and all other compared species to form a master dataset for species comparisons of tandem speed, tandem duration, and separation duration.
- Output_comparisons.R - R script for comparative analysis between M. nervosus and all other tested species (see article information for species). We analyzed differences in tandem speed, tandem duration, and separation duration between M. nervosus and seven sympatric species.
- Check_NoMoveTandem.R - R script that provides a visual validation of our manually chosen threshold for defining tandem running.
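The linear interpolation of missing data points mentioned for sleap_processing.py can be illustrated with a small, self-contained sketch. This is not the code from the repository: the function name `interpolate_gaps`, the use of `None` for missing points, and the handling of gaps at the start or end of a track are all assumptions made for illustration.

```python
def interpolate_gaps(values):
    """Fill None entries by linear interpolation between valid neighbours.

    Gaps at the start or end are filled with the nearest valid value
    (an assumption made for this sketch).
    """
    filled = list(values)
    valid = [i for i, v in enumerate(filled) if v is not None]
    if not valid:
        return filled  # nothing to interpolate from
    # Pad the edges with the nearest valid value
    for i in range(valid[0]):
        filled[i] = filled[valid[0]]
    for i in range(valid[-1] + 1, len(filled)):
        filled[i] = filled[valid[-1]]
    # Interpolate each interior gap between consecutive valid points
    for left, right in zip(valid, valid[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled

# Made-up x coordinates with missing points
print(interpolate_gaps([None, 1.0, None, 3.0, None]))
# [1.0, 1.0, 2.0, 3.0, 3.0]
```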
- data_fmt - Folder containing temporal datasets generated from raw data
- Ami_dar_df.feather, Ami_par_df.feather, Cop_lac_df.feather, Mac_err_df.feather, Mac_sun_df.feather, Nas_gra_df.feather, Tum_bas_df.feather, sleap_FwM_df.feather, sleap_FM_df.feather: Trajectory data generated by sleap_processing.py. Missing points in data_raw were interpolated and filled, and the data were converted to a format that can be read in R.
- Ami_dar_bodysize.csv, Ami_par_bodysize.csv, Cop_lac_bodysize.csv, Mac_err_bodysize.csv, Mac_sun_bodysize.csv, Nas_gra_bodysize.csv, Tum_bas_bodysize.csv, sleap_FM_bodysize.csv, sleap_FwM_bodysize.csv: Raw data for body size generated by sleap_processing.py. Used to provide species-specific distance thresholds to define tandem running. These datasets contain the body size of males and females for each video in pixels.
- Definition of columns:
- video: Metadata for identification of specific replicates.
- male: body length of male in pixels.
- female: Body length of female in pixels.
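A body size file with the three columns listed above can also be read in Python with the standard library. The sketch below uses an in-memory stand-in with made-up values so that it runs without the data; the header names are taken from the column definitions above, while the column order, the replicate names, and the helper name `read_bodysize` are assumptions.

```python
import csv
import io

# Stand-in for one of the *_bodysize.csv files; the values are made up.
example_csv = io.StringIO(
    "video,male,female\n"
    "Mic_ner_A_FM_01,52.0,60.0\n"
    "Mic_ner_A_FM_02,50.0,58.0\n"
)

def read_bodysize(handle):
    """Return {video: (male_px, female_px)} from an open CSV file."""
    sizes = {}
    for row in csv.DictReader(handle):
        sizes[row["video"]] = (float(row["male"]), float(row["female"]))
    return sizes

sizes = read_bodysize(example_csv)
print(sizes["Mic_ner_A_FM_01"])   # (52.0, 60.0)
```

For a real file, the same function could be called on a handle opened with `open("Ami_dar_bodysize.csv", newline="")`.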
- df_all_mn.rda: rda file containing combined data from FwM, FM, and F, generated using data_preprep.R. Frames are downsampled and converted into seconds, and pixels are converted into millimeters. This dataset was used to generate df_output.rda for comparisons between winged and wingless states of M. nervosus (see definitions).
- Definition of columns:
- Frame: Time in seconds
- Video: Metadata for identification of specific replicates.
- fx, fy, mx, my: x and y coordinates in millimeters of where each male (mx, my) and female (fx, fy) are at each time point.
- software: The tracking program used to obtain the trajectory data (SLEAP or UMATracker).
- treat: Treatment group used for comparisons, winged female with male (FwM), non-winged female with male (FM), or single female (F).
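With coordinates in millimeters and time in seconds, speed can be derived from the displacement between consecutive rows. The following is a minimal sketch rather than the calculation used in the R scripts; the helper name `step_speeds`, the made-up coordinates, and the 0.2-second time step (5 data points per second) are assumptions for illustration.

```python
import math

TIME_STEP = 0.2  # seconds between rows, assuming 5 data points per second

def step_speeds(xs, ys, dt=TIME_STEP):
    """Return speed (mm/s) between each pair of consecutive positions."""
    speeds = []
    for i in range(1, len(xs)):
        distance = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
        speeds.append(distance / dt)
    return speeds

# Made-up female coordinates (fx, fy) in millimeters
fx = [0.0, 3.0, 3.0]
fy = [0.0, 4.0, 4.0]
print(step_speeds(fx, fy))   # [25.0, 0.0]
```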
- df_body_mn.rda: rda file generated using data_preprep.R containing the body size in millimeters of the male and female in each replicate. Body length was used to generate a distance threshold for determining the tandem running behavioral state.
- Definition of columns:
- Video: Metadata for identification of specific replicates.
- female: Female body length.
- male: Male body length.
- software: tracking software used to track individuals (SLEAP or UMATracker).
- treat: Treatment group used for comparisons, winged female with male (FwM) or non-winged female with male (FM).
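The idea of a body-size-based distance threshold can be sketched as follows. The actual threshold is set in the R scripts and is not reproduced here: the rule used below (a factor times the mean of the two body lengths), the `factor` parameter, and the helper name `is_tandem` are placeholder assumptions chosen only to show the comparison of inter-individual distance against a threshold.

```python
import math

def is_tandem(fx, fy, mx, my, female_len, male_len, factor=1.0):
    """Return True when the pair is closer than a body-size-based threshold.

    The threshold here (factor * mean body length) is a placeholder
    assumption, not the rule used in the repository scripts.
    """
    threshold = factor * (female_len + male_len) / 2
    return math.hypot(fx - mx, fy - my) <= threshold

# Made-up positions and body lengths in millimeters
print(is_tandem(0.0, 0.0, 3.0, 4.0, female_len=6.0, male_len=5.0))     # True
print(is_tandem(0.0, 0.0, 30.0, 40.0, female_len=6.0, male_len=5.0))   # False
```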
- df_output.rda: rda data file generated using format_trajectories.R, which compiles three data matrices containing the average tandem speeds of each video (df_sum), the tandem events and the length of each event (df_tandem), and the separation events and the length of each separation event (df_separation) for winged pairs, wingless pairs, and single females of M. nervosus. This was used for analysis and for generating figures for the comparison between winged and wingless pairs of M. nervosus.
Definitions of datasets compressed in df_output.rda:
- df_sum:
- name: Metadata for identification of specific replicates.
- f_speed: Average speed of female during the 30-minute video.
- f_speed_tandem: Average speed of female only when in tandem.
- m_speed: Average speed of male during the 30-minute video.
- m_speed_tandem: Average speed of male only when in tandem.
- software: tracking software used to track individuals (SLEAP or UMATracker).
- treat: Treatment group used for comparisons, winged female with male (FwM), non-winged female with male (FM), or single female (F).
- df_tandem:
- name: Metadata for identification of specific replicates.
- tan_duration: Duration of each tandem event before separation.
- tan_cens: Indication of right censored data. FALSE if tandem ended within observation window, TRUE if event did not stop before observation stopped.
- tan_end: Data point where tandem ended for each tandem event. Note that 9000 is the number of observations within the 30-minute period (5 frames per second * 60 seconds * 30 minutes = 9000 data points).
- software: tracking software used to track individuals (SLEAP or UMATracker).
- treat: Treatment group used for comparisons, winged female with male (FwM) or non-winged female with male (FM).
- df_separation:
- name: Metadata for identification of specific replicates.
- sep_duration: Duration of each separation event before individuals reunite in tandem.
- sep_cens: Indication of right censored data. FALSE if separation ended within observation window, TRUE if event did not stop before observation stopped.
- software: tracking software used to track individuals (SLEAP or UMATracker).
- treat: Treatment group used for comparisons, winged female with male (FwM) or non-winged female with male (FM).
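The relationship between event durations and the right-censoring flags (tan_cens, sep_cens) can be illustrated with a short sketch. It takes a sequence of per-data-point tandem states and returns each event with its length and a flag that is True when the event was still ongoing at the end of the observation. This is an illustration under stated assumptions, not the code in format_trajectories.R; the helper name `tandem_events` and the example sequence are hypothetical.

```python
def tandem_events(states):
    """Return (duration, censored) for each run of True values.

    duration is counted in data points; censored is True when the run
    is still ongoing at the last observation (right-censored).
    """
    events = []
    start = None
    for i, in_tandem in enumerate(states):
        if in_tandem and start is None:
            start = i                               # a tandem run begins
        elif not in_tandem and start is not None:
            events.append((i - start, False))       # run ended inside the window
            start = None
    if start is not None:
        events.append((len(states) - start, True))  # run cut off by the end
    return events

# Made-up sequence of tandem states, one value per data point
states = [False, True, True, False, True, True, True]
print(tandem_events(states))   # [(2, False), (3, True)]
```

Separation events could be extracted the same way by passing the inverted states. If each data point spans 0.2 seconds, multiplying a duration by 0.2 gives seconds.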
- df_comp.rda: rda file containing the combined dataset of trajectories for all species, generated using Format_comparisons.R. This dataset was used to generate df_sum_speed_comparison.rda, tandem_comparison.rda, and separation_comparison.rda.
- Definition of columns:
- Species: Species denoted by genus (first, capitalized letter) and species name (second, lowercase letter). Example: Mn is denoting Microcerotermes nervosus.
- time: Time in seconds, every 0.2 seconds.
- Video: Contains metadata for replicate identification.
- fx, fy, mx, my: x and y coordinates in millimeters of where each male (mx, my) and female (fx, fy) are at each time point.
- colony: Denotes which colony used (Only M. nervosus has more than 1 colony, other species only sampled from one colony).
- df_sum_speed_comparison.rda, tandem_comparison.rda, separation_comparison.rda: rda files generated from df_comp.rda that contain the average tandem speed, the tandem events and the length of each event, and the separation events and the length of each event for all videos of each species used for comparative analysis.
- Definition of columns:
- df_sum_speed_comparison.rda:
- species: Species denoted by genus (first, capitalized letter) and species name (second, lowercase letter). Example: Mn is denoting Microcerotermes nervosus.
- name: Metadata for identification of specific replicates.
- colony: Denotes which colony used (Only M. nervosus has more than 1 colony, other species only sampled from one colony).
- sex: Identification of male and female for each video.
- speed: Average speed during the 30 minute video.
- tandem_speed: Average speed only during tandem.
- tandem_comparison.rda:
- species: Species denoted by genus (first, capitalized letter) and species name (second, lowercase letter). Example: Mn is denoting Microcerotermes nervosus.
- name: Metadata for identification of specific replicates.
- colony: Denotes which colony used (Only M. nervosus has more than 1 colony, other species only sampled from one colony).
- tan_duration: Duration of each tandem event before separation.
- tan_cens: Indication of right censored data. FALSE if tandem ended within observation window, TRUE if event did not stop before observation stopped.
- separation_comparison.rda:
- species: Species denoted by genus (first, capitalized letter) and species name (second, lowercase letter). Example: Mn is denoting Microcerotermes nervosus.
- name: Metadata for identification of specific replicates.
- colony: Denotes which colony used (Only M. nervosus has more than 1 colony, other species only sampled from one colony).
- sep_duration: Duration of each separation event before individuals reunite in tandem.
- sep_cens: Indication of right censored data. FALSE if separation ended within observation window, TRUE if event did not stop before observation stopped.
- combined_bodysize.rda: rda file containing the male and female body size in millimeters for all videos of each species used for comparative analysis.
- Definition of columns:
- video: Metadata for identification of specific replicates. Metadata is formatted as genus_species_treatment_replicate.
- female: Female body length.
- male: Male body length.
- software: tracking software used to track individuals (SLEAP or UMATracker). Only used to combine FwM into species comparisons to compare FM and FwM to show that winged M. nervosus was better at tandem than most other species which were not winged (see supplementary files of manuscript).
- treat: Treatment group used for comparisons, winged female with male (FwM) or non-winged female with male (FM). Just a remnant from combining this file with FwM body size data. Not used.
- data_raw - Folder containing all raw data
- File name format: Genus_species_colony_treatment_replicate.h5 for SLEAP results and Genus_species_colony_treatment_replicate.csv for UMATracker results.
- For .h5 files, each file stores the coordinates in a "tracks" dataset. Transposing it gives a "locations" array with the shape (frame, node, xy, individual), representing x–y coordinates of each body part over time.
- Nodes (0–6) correspond to: "Head", "Pronotum", "Tip", "AntennaR", "AntennaL", "BodyCenter", "Marker".
- sleap_FM - Folder containing raw SLEAP tracking results for Microcerotermes nervosus female-male pairs that had already shed their wings before the video started. Files are in .h5 format.
- sleap_FwM - Folder containing raw SLEAP tracking results for Microcerotermes nervosus female-male pairs in .h5 format. Data before wingshed were cut off, and only data after wingshed were processed for analysis. UMATracker was used to obtain trajectory data before wingshed (see trajectories_UMA_FwM folder).
- trajectories_UMA_F - Folder containing raw UMATracker tracking results for Microcerotermes nervosus single females with wings in .csv format.
- trajectories_UMA_FwM - Folder containing raw UMATracker tracking results for Microcerotermes nervosus female-male pairs with wings in .csv format. Data after wingshed were cut off, and SLEAP was used to obtain trajectory data after the wings were shed.
- Frames_UMATracker.csv - Frames when females with a male partner shed their wings. This dataset was used to compare female wingshed in the presence of a conspecific or alone and to determine the point at which UMATracker data or SLEAP data would be used for tracking.
- species_comp - Folder containing raw data from sympatric species used for comparative analysis. All raw data in these folders are in .h5 format.
- File name format: SLEAPlabels_SLEAPpredictions_Genus_species_colony_treatment_replicate.h5
- Ami_dar: Folder containing raw data for Amitermes darwini
- Ami_par: Folder containing raw data for Amitermes parvus
- Cop_lac: Folder containing raw data for Coptotermes lacteus
- Mac_err: Folder containing raw data for Macrognathotermes errator
- Mac_sun: Folder containing raw data for Macrognathotermes sunteri
- Nas_gra: Folder containing raw data for Nasutitermes graveolus
- Tum_has: Folder containing raw data for Tumulitermes hastilis
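Because the raw file names encode the replicate metadata, they can be split into their parts programmatically. The sketch below handles the Genus_species_colony_treatment_replicate pattern, using the file name from the .h5 reading example; the helper name `parse_filename` and the dictionary keys are assumptions.

```python
from pathlib import Path

def parse_filename(filename):
    """Split 'Genus_species_colony_treatment_replicate.ext' into its parts."""
    stem = Path(filename).stem   # drop the .h5 or .csv extension
    genus, species, colony, treatment, replicate = stem.split("_")
    return {
        "genus": genus,
        "species": species,
        "colony": colony,
        "treatment": treatment,
        "replicate": replicate,
    }

info = parse_filename("Mic_ner_A_FwM_02.h5")
print(info["treatment"], info["replicate"])   # FwM 02
```

Files in species_comp carry an additional prefix in their names, which would need to be removed before applying this function.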
Session information
Python (data processing)
python version 3.9.9
h5py: 3.11.0
numpy: 1.24.4
pandas: 2.0.3
pip: 24.2
pyarrow: 17.0.0
python-dateutil: 2.9.0.post0
pytz: 2024.2
scipy: 1.10.1
setuptools: 75.1.0
six: 1.17.0
tzdata: 2024.2
wheel: 0.44.0
R
library(devtools)
session_info()
Session info ──────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31 ucrt)
os Windows 11 x64 (build 26100)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.utf8
ctype English_United States.utf8
tz America/Chicago
date 2025-10-21
rstudio 2024.12.0+467 Kousa Dogwood (desktop)
─ Packages ──────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-8 2024-09-12 [1] CRAN (R 4.4.1)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
broom 1.0.8 2025-03-28 [1] CRAN (R 4.4.2)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.2)
car 3.1-3 2024-09-27 [1] CRAN (R 4.4.2)
carData 3.0-5 2022-01-06 [1] CRAN (R 4.4.2)
cli 3.6.5 2025-04-23 [1] CRAN (R 4.4.3)
colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.4.2)
data.table 1.17.0 2025-02-22 [1] CRAN (R 4.4.3)
devtools * 2.4.6 2025-10-03 [1] CRAN (R 4.4.3)
dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.4.2)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.3)
evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.3)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.2)
Formula 1.2-5 2023-02-24 [1] CRAN (R 4.4.0)
fs 1.6.6 2025-04-12 [1] CRAN (R 4.4.3)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.2)
ggplot2 3.5.1 2024-04-23 [1] CRAN (R 4.4.2)
ggpubr 0.6.0 2023-02-10 [1] CRAN (R 4.4.2)
ggsignif 0.6.4 2022-10-13 [1] CRAN (R 4.4.2)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.4.2)
gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.2)
km.ci 0.5-6 2022-04-06 [1] CRAN (R 4.4.2)
KMsurv 0.1-5 2012-12-03 [1] CRAN (R 4.4.0)
knitr 1.50 2025-03-16 [1] CRAN (R 4.4.3)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.2)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.2)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.2)
Matrix 1.7-1 2024-10-18 [2] CRAN (R 4.4.2)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.2)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.2)
pillar 1.10.1 2025-01-07 [1] CRAN (R 4.4.3)
pkgbuild 1.4.8 2025-05-26 [1] CRAN (R 4.4.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.2)
pkgload 1.4.1 2025-09-23 [1] CRAN (R 4.4.3)
purrr 1.0.4 2025-02-05 [1] CRAN (R 4.4.3)
R6 2.6.1 2025-02-15 [1] CRAN (R 4.4.3)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.3)
rlang 1.1.6 2025-04-11 [1] CRAN (R 4.4.3)
rstatix 0.7.2 2023-02-01 [1] CRAN (R 4.4.2)
rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.4.2)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.2)
sessioninfo 1.2.3 2025-02-05 [1] CRAN (R 4.4.3)
survival 3.7-0 2024-06-05 [2] CRAN (R 4.4.2)
survminer 0.5.0 2024-10-30 [1] CRAN (R 4.4.2)
survMisc 0.5.6 2022-04-07 [1] CRAN (R 4.4.2)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.2)
tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.4.2)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.2)
usethis * 3.2.1 2025-09-06 [1] CRAN (R 4.4.3)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.2)
xfun 0.51 2025-02-19 [1] CRAN (R 4.4.3)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.2)
zoo 1.8-13 2025-02-22 [1] CRAN (R 4.4.3)
