Data from: Automated profiling of social behaviors to assess the genetic basis of evolution of aggressive behaviors in Astyanax mexicanus
Data files
Mar 10, 2026 version (35.36 GB total):
- models.zip (3.58 GB)
- README.md (5.66 KB)
- SLEAP_data.zip (3.58 GB)
- Videos.zip (28.20 GB)
Abstract
Across the animal kingdom, social behaviors such as aggression are critical for survival and reproductive success. While there is significant variation in social behaviors within and between species, the genetic mechanisms underlying natural variation in social behaviors are poorly understood. A central challenge in investigating the mechanisms contributing to the evolution of social behaviors is that these behaviors are typically complex, making them difficult to quantify. The Mexican tetra, Astyanax mexicanus, is a powerful model for investigating the evolution of traits, as it is a single species that exists as populations of eyed, river-dwelling surface fish and blind, cave-dwelling fish. The blind cavefish have evolved morphological and behavioral differences compared to surface fish, including reduced aggression. Here, we developed and validated an automated machine learning pipeline that integrates pose estimation and supervised behavioral classification to track and quantify aggression-associated behaviors: striking, following, and circling. Using this pipeline, we established that these behaviors are quantitatively different between surface fish and cavefish during juvenile stages in A. mexicanus, similar to what was observed previously in adults. Moreover, assessment of these aggressive behaviors in surface-cave F2 hybrid fish revealed that striking and following are strongly positively correlated, while striking and circling are negatively correlated, suggesting that these behaviors evolved through some shared genetic mechanisms. These findings demonstrate the power of automated tracking and behavioral phenotyping of multiple fish in A. mexicanus and establish a foundation for future studies investigating the genetic basis of the evolution of social behaviors.
Dataset DOI: 10.5061/dryad.ksn02v7hd
Description of the data and file structure
Aggression was elicited using a resident-intruder assay. Videos were recorded from above on a Basler ace acA1300-60gm or acA1280-60gm GigE Mono camera attached to a 16-mm C Series Lens, VIS-NIR (Edmund Optics) at 30 fps. Videos were recorded using PylonViewer software (Basler).
Multi-animal pose estimation
For automated multi-animal pose tracking, we trained the machine learning model Social LEAP Estimates Animal Poses (SLEAP) (Pereira et al., 2022) on an Ubuntu 20.04.6 LTS desktop equipped with an 11th Gen Intel® Core™ i7-11700F processor (2.50 GHz), 66.0 GB of RAM, and an NVIDIA GeForce RTX 3060 Ti Lite Hash Rate (LHR) graphics card to identify and track both resident and intruder fish during the behavioral assays. Nine nodes were assigned to key body parts of each fish to comprise the project skeleton: nose, head, left eye, right eye, upper body, center, lower body, tail, and fin. Body parts were manually labeled on a subset of frames from the training videos, which were imported as grayscale. Training was initially performed using a multi-animal top-down pipeline, with the center of the fish as the anchor. The trained model was then run on 20 random frames, and model performance was assessed for accuracy by manually comparing computer-generated pose estimations to true body parts. Additional frames were manually annotated for the training dataset until performance was satisfactory based on manual inspection of the output of the training videos. The final model was trained on a total of 526 manually annotated frames across 1 surface-resident/Pachón-intruder, 2 surface-resident/surface-intruder, and 3 Pachón-resident/Pachón-intruder videos. All videos were run through the final trained model with the following parameters: tracking_pipeline = multi-animal top-down, max_instances = 2, batch_size = 4, tracking.tracker = flow, tracking.max_tracking = 2, tracking.similarity = instance, and tracking.match = greedy. The tracks were exported as .h5 files for analysis.
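For downstream analysis, the exported SLEAP analysis files (provided in SLEAP_data.zip) can be read with standard HDF5 tools. The sketch below is a minimal, illustrative loader in Python; it assumes SLEAP's standard analysis HDF5 layout (datasets named "tracks", "node_names", and "track_names") and uses a placeholder file name, so adjust names to match the files in the archive.

```python
import h5py

# Placeholder file name; substitute any .h5 file from SLEAP_data.zip.
with h5py.File("example_assay.h5", "r") as f:
    # SLEAP analysis files typically store tracks as
    # (tracks, x/y, nodes, frames); transpose to (frames, nodes, x/y, tracks).
    locations = f["tracks"][:].T
    node_names = [n.decode() for n in f["node_names"][:]]
    track_names = [t.decode() for t in f["track_names"][:]]

print("frames, nodes, xy, tracks:", locations.shape)
print("nodes:", node_names)    # expected: the 9 skeleton nodes (nose, head, ...)
print("tracks:", track_names)  # expected: 2 tracks (resident and intruder)

# Example: x/y trajectory of one node for the first track; the exact node label
# must match whatever appears in node_names (assumed here to be "center").
center_xy = locations[:, node_names.index("center"), :, 0]
print("center node, first 5 frames:\n", center_xy[:5])
```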
Automated quantification of behavior
Classifiers were built using Simple Behavioral Analysis (SimBA) algorithms for automated quantification of behaviors (Goodwin et al., 2024). SimBA project configurations were created and executed on a Windows 11 Enterprise (64-bit) computer equipped with a 13th Gen Intel® Core™ i7-13700 processor (2.10 GHz), 16.0 GB of RAM, and an Intel® UHD Graphics 770 GPU. Because following behavior may occur simultaneously with striking behavior, a separate project configuration file was created for following, while the striking and circling models were built within a single project. Project configuration files specified the number of behavioral classifiers per project, and a custom multi-animal body-part configuration was created to mirror the SLEAP pose estimation, with 9 body parts per fish. Videos (.mp4), SLEAP tracking files (.h5), and BORIS training files (.csv) were imported for each training video. Video parameters were specified, and pixels per millimeter was calibrated using the length of the tank (120 mm). Outlier correction was skipped, and features were extracted for each video. We developed behavior classifiers using a random forest model, implemented with 2,000 random forest estimators, a minimum sample leaf node of 1, RF_criterion set to "gini," RF_max_features set to "sqrt," and a test size of 20%. Minimum bout durations were determined based on manual annotations, with thresholds set at 67 ms for striking, 200 ms for following, and 300 ms for circling.
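SimBA fits these classifiers internally through its GUI, but for readers who want a concrete picture of the stated hyperparameters, the following is an illustrative scikit-learn equivalent (not the SimBA source); the feature matrix and labels are randomly generated placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for SimBA's per-frame features (X) and
# frame-wise 0/1 manual annotations of a behavior (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = rng.integers(0, 2, size=1000)

# Test size of 20%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

clf = RandomForestClassifier(
    n_estimators=2000,    # 2,000 random forest estimators
    min_samples_leaf=1,   # minimum sample leaf node of 1
    criterion="gini",     # RF_criterion
    max_features="sqrt",  # RF_max_features
    n_jobs=-1,
)
clf.fit(X_train, y_train)

# Minimum bout durations expressed in frames at 30 fps:
# striking 67 ms ~ 2 frames, following 200 ms = 6 frames, circling 300 ms = 9 frames.
```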
A precision-recall curve was generated for each classifier during training to assess model performance. The harmonic mean of precision and recall was reported as the F1 curve, where F1max corresponds to the optimal discrimination threshold, i.e., the probability cutoff for calling the behavior present. The discrimination threshold was then manually adjusted, based on inspection of videos scored at a range of thresholds, to yield optimal performance: striking = 0.375, following = 0.275, and circling = 0.174. Classifier models were run on the unseen testing dataset and compared to the manual annotations to assess performance. All videos were run through the pipeline, which predicted bout number for striking and circling and total bout duration (s) for following.
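As an illustration of how F1max identifies a discrimination threshold (again using placeholder predictions rather than the actual classifier outputs), one can compute F1 along the precision-recall curve and take its maximum:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Placeholder frame-wise labels and classifier probabilities.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
y_prob = np.clip(0.4 * y_true + 0.6 * rng.random(500), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
f1 = 2 * precision * recall / (precision + recall + 1e-12)

# The last precision/recall pair has no associated threshold, so exclude it.
best = int(np.argmax(f1[:-1]))
print(f"F1max = {f1[best]:.3f} at discrimination threshold = {thresholds[best]:.3f}")
```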
Files and variables
File: Videos.zip
Description: This folder contains the training and test videos used to train and evaluate the SimBA models.
File: models.zip
Description: This folder contains the trained SimBA automated behavioral classification models for striking, following, and circling behaviors. It also contains the trained SLEAP model for automated pose tracking of juvenile A. mexicanus during the resident-intruder assay.
File: SLEAP_data.zip
Description: The SLEAP pose-estimation output files for each of the aggression assays.
Code/software
SLEAP software: This is the multi-animal pose-estimation software needed to run the trained pose-estimation model and generate tracks for the juvenile resident-intruder assays.
SimBA software: This is the automated behavioral classification software that will be needed to run the trained behavioral models.
