Data from: Loss of pair formation predates the evolution of male-less society in termites
Abstract
Asexual lineages are rare in social animals with biparental care, where successful reproduction typically requires coordinated behavior between two individuals of opposite sex. Male-less lineages of the termite Glyptotermes nakajimai provide a unique opportunity to unravel how sexual reproduction can be lost in such animals. Here, we show that modification of the mate-pairing process predated the evolution of the asexual lineage. Termite colonies are typically initiated by a mating pair that searches for a nest site through a tandem courtship behavior. Our comparative analysis of tandem running in Glyptotermes termites revealed that two related species, G. fuscus and G. satsumensis, exhibited both female-leader and male-leader tandem runs. However, tandem running was rare and ephemeral in both sexual and asexual lineages of G. nakajimai. Furthermore, our comparative studies throughout termites’ diversity showed that a typical monogamous pairing was uniquely lost in G. nakajimai, while G. fuscus and G. satsumensis initiated nests in pairs. Our study evidenced that a clear disruption of the reproductive behavioral sequence, coupled with an alternative mode of colony foundation, seems to be a precondition for asexuality in biparental species.
Article Information
This provides access to the data and source code used for the manuscript: "Loss of pair formation predates the evolution of male-less society in termites"
Nobuaki Mizumoto, Toshihisa Yashiro, Simon Hellemans
This study investigates the tandem running behavior of three Glyptotermes termite species: G. nakajimai (both sexual and asexual populations), G. fuscus, and G. satsumensis. The videos were analyzed using the deep-learning posture tracking software SLEAP (v1.4.0) to quantify tandem running behavior and compare it across species. Additionally, this study conducts phylogenetic comparative analyses of tandem running behavior and mating systems across termite diversity. This repository includes the data and the Python/R scripts.
Table of Contents
This repository includes tracking data, R code to analyze them, Python code for video analysis, and SLEAP models. All files are contained in the data.zip file.
- README - this file
- analysis
  - code
    - `data_prep.py`: Python script to process and clean data from SLEAP tracking outputs.
    - `plot.R`: R script for conducting statistical analysis and generating figures.
    - `phylogeny.R`: R script for phylogenetic comparative analysis.
    - `sleap_metric_error.py`: Python script for SLEAP model evaluation.
  - data_raw - folder containing raw data
    - Gly_fus, Gly_nak_asexual, Gly_nak_sexual, Gly_sat: These four folders include raw SLEAP tracking results in `.h5` format.
      - File name format: `Genus_Species_Colony_rep.h5`
      - Each file contains a "locations" array with the shape (frame, node, xy, individual), representing the x–y coordinates of each body part over time for 2 individuals.
      - Nodes (0–6) correspond to: "Head", "Pronotum", "Tip", "AntennaR", "AntennaL", "BodyCenter", "Marker".
    - tree: this folder contains the data for phylogenetic comparative analysis
      - `mating_system.csv`, `mating_system_table.xlsx`: the results of the literature survey.
        - Definition of columns
          - `Group`: Basal: other groups, Kalo: Kalotermitidae, Neo: Neoisoptera
          - `Genus`, `Species`: Genus and species
          - `Tandem`, `Parthenogenesis`: Presence of these traits.
          - `Incipient`, `Mature`: Reproductive composition in incipient or mature colonies. Either 2 (monogamous) or >2 (multiple)
          - `Source`: type of information source. field: field observation; behavior: behavioral evidence; genetics: genetic structure of colonies
          - `Ref`: Reference
          - `Note1`, `Note2`
      - `run5_400M_burnin20_mcc_median.tree`: phylogenetic tree generated in this study
      - `tandem_info_Mizumoto-etal-2022-PNAS.csv`: information on tandem running downloaded from Mizumoto et al., 2022, PNAS
      - `Glyptotermes_reproductive_number.csv`: Data on the number of reproductives in Glyptotermes spp.
        - Definition of columns
          - `genus`, `species`: Genus and species
          - `development`: incipient or mature colonies
          - `pq`, `pk`, `sq`, `sk`: The number of primary queens, primary kings, secondary queens, and secondary kings. Primary indicates alate-derived reproductives, while secondary indicates neotenics. Only alate-derived individuals are considered in the analysis.
          - `p_total`: total number of reproductives
  - data_fmt - folder containing intermediate data generated from the raw data. See the code for how they were generated.
    - `Gly_fus_df.feather`, `Gly_nak_asexual_df.feather`, `Gly_nak_sexual_df.feather`, `Gly_sat_df.feather`: created by `data_prep.py`, which interpolates, fills, and smooths the data in data_raw and converts them to a format readable in R.
    - `df_bodylength.rda`, `df_relative.rda`, `df_tandem_analysis.rda`: `.feather` data converted for plotting and statistical analysis.
    - `HRT_data.rda`, `ace_tandem.rda`: intermediate data for ancestral state reconstruction
  - output - folder containing outputs. Empty before running the code.
- SLEAP - This folder contains the trained SLEAP models used for pose estimation in the termite species. Within each species subfolder, the `models` directory includes two models for a top-down tracking approach: one for locating individuals (centroid; e.g., `Gly_sat_general_221229_092738.centroid.n=342`) and one for identifying body parts (centered instance; e.g., `Gly_sat_general_221229_094744.centered_instance.n=342`). The contents (e.g., `.h5`, `.json`, `.slp`, `.npz`, `.csv` files) follow the SLEAP project structure (v1.4.0) and include model weights, training logs, configurations, and ground-truth labels. We recommend using either (`Gly_sat_general_221229_092738.centroid.n=342` and `Gly_sat_general_221229_094744.centered_instance.n=342`) or (`230630_110308.centroid.n=472` and `230630_121831.centered_instance.n=472`) for future analysis of unanalyzed videos. For details on file structure and usage, refer to the official SLEAP documentation: https://sleap.ai
  - Gly_fus - folder for Glyptotermes fuscus. The general model was created with n=47, and then the models were trained for each colony separately (Gly-fus_G05, Gly-fus_21A).
  - Gly_nak - folder for Glyptotermes nakajimai. Using the model of G. fuscus as a starting point, four models were developed for G. nakajimai, corresponding to each of the original colonies: colony JP2107 (asexual), colony 356, colony 357, and colony NM2344.
  - Gly_sat - folder for Glyptotermes satsumensis. The general model was created (Gly_sat_general). Most videos were analyzed using this general model, but in cases of low tracking accuracy, we further labeled 3-13 frames and developed a specific model (340-2, 340-3, 340-B1, 347-2, B-10, JP2106-3, JP2106-5).
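As an illustration of the tracking array layout described in the data_raw entry, the following minimal sketch indexes a "locations" array of shape (frame, node, xy, individual) by node name and computes the per-frame distance between the body centers of the two individuals. It runs on a small synthetic array so that it is self-contained. The helper names (`NODES`, `node_xy`, `body_center_distance`) are hypothetical and are not taken from `data_prep.py`, and the distance is only an example quantity, not necessarily a metric used in the study. For a real file, the array would first be loaded with `h5py`; the dataset key and axis order inside the `.h5` files are not stated here and should be checked against the files themselves.

```python
import numpy as np

# Node order as listed in this README (indices 0-6).
NODES = ["Head", "Pronotum", "Tip", "AntennaR", "AntennaL", "BodyCenter", "Marker"]


def node_xy(locations, node, individual):
    """Return the (frame, 2) x-y trajectory of one node of one individual."""
    return locations[:, NODES.index(node), :, individual]


def body_center_distance(locations):
    """Per-frame Euclidean distance between the two individuals' body centers."""
    a = node_xy(locations, "BodyCenter", 0)
    b = node_xy(locations, "BodyCenter", 1)
    return np.linalg.norm(a - b, axis=1)


# Synthetic stand-in for a tracking array (NOT real data):
# 3 frames, 7 nodes, x-y, 2 individuals.
locations = np.zeros((3, len(NODES), 2, 2))
center = NODES.index("BodyCenter")
locations[:, center, 0, 1] = [3.0, 6.0, 9.0]   # x of individual 1
locations[:, center, 1, 1] = [4.0, 8.0, 12.0]  # y of individual 1

distances = body_center_distance(locations)
print(distances)
```

Looking nodes up by name rather than by a hard-coded index keeps the code readable and guards against off-by-one mistakes when a different body part is needed.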
Setup & Dependencies
The scripts in this project are written in R and Python and were tested on Windows 11 (64-bit). The environments are as follows.
R Session Info
R version 4.4.1 (2024-06-14)
Packages: scales, phytools, maps, ape, multcomp, TH.data, MASS, mvtnorm, lme4, Matrix, coxme, bdsmatrix, car, carData, survminer, ggpubr, survival, CircMLE, NPCirc, circular, ggridges, viridis, viridisLite, ggplot2, forcats, tidyr, dplyr, data.table, stringr, arrow
packages <- c(scales="1.3.0", phytools="2.4-4", maps="3.4.2.1", ape="5.8-1", multcomp="1.4-28", TH.data="1.1-3", MASS="7.3-60.2", mvtnorm="1.3-3", lme4="1.1-36", Matrix="1.7-0", coxme="2.2-22", bdsmatrix="1.3-7", car="3.1-3", carData="3.0-5", survminer="0.5.0", ggpubr="0.6.0", survival="3.6-4", CircMLE="0.3.0", NPCirc="3.1.1", circular="0.5-1", ggridges="0.5.6", viridis="0.6.5", viridisLite="0.4.2", ggplot2="3.5.1", forcats="1.0.0", tidyr="1.3.1", dplyr="1.1.4", data.table="1.17.0", stringr="1.5.1", arrow="19.0.1")
for (pkg in names(packages)) remotes::install_version(pkg, version = packages[pkg])
Python Environment
Python 3.11.4
pip install \
h5py==3.13.0 \
numpy==1.25.0 \
pandas==2.2.3 \
scipy==1.15.2 \
feather-format==0.4.1 \
pillow==11.2.0
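Before running the Python scripts, it may help to confirm that the installed packages match the pinned versions. The following is a minimal sketch using only the standard library. The helper names (`PINNED`, `installed_version`, `find_mismatches`) are hypothetical and are not part of this repository; the version numbers are copied from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command in this README.
PINNED = {
    "h5py": "3.13.0",
    "numpy": "1.25.0",
    "pandas": "2.2.3",
    "scipy": "1.15.2",
    "feather-format": "0.4.1",
    "pillow": "11.2.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (pinned, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: pinned {wanted}, found {found or 'not installed'}")
```

The `lookup` parameter exists only so the comparison logic can be exercised without touching the real environment. A mismatch does not necessarily mean the scripts will fail, only that the environment differs from the one they were tested in.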
