Data from: Reverse engineering the control law for schooling in zebrafish using virtual reality
Abstract
Revealing the evolved mechanisms that give rise to collective behavior is a central objective in the study of cellular and organismal systems. Additionally, understanding the algorithmic basis of social interactions in a causal and quantitative way offers an important foundation for subsequently quantifying social deficits. Here, with Virtual Reality (VR) technology, we employ virtual robot fish to reverse-engineer the sensory-motor control of social response during schooling in a vertebrate model: juvenile zebrafish (Danio rerio). In addition to providing a highly controlled means to understand how zebrafish translate visual input to movement decisions, networking our systems allows real fish to swim and interact together in the same virtual world. Together, these capabilities allow us to directly test models of social interactions in situ. A key feature of social response is shown to be single- and multi-target-oriented pursuit. This is based on an egocentric representation of the positional information of conspecifics, and is highly robust to incomplete sensory input. We demonstrate, including with a Turing test and a scalability test for pursuit behavior, that all key features of this behavior are accounted for by individuals following a simple experimentally derived proportional-derivative control law, which we term ‘BioPD’. Since target pursuit is key to effective control of autonomous vehicles, we evaluate, as a proof of principle, the potential utility of this simple evolved control law for human-engineered systems. In doing so, we find close-to-optimal pursuit performance in autonomous vehicles (terrestrial, airborne, and watercraft), with limited system-specific tuning or optimization.
This Dryad package contains the data and analysis code supporting the study “Reverse engineering the control law for schooling in zebrafish using virtual reality.” It includes (i) biological (VR zebrafish) datasets, (ii) robotics datasets, and precomputed analysis outputs used to reproduce key results and figures in the associated Science Robotics article.
Description of the data and file structure
Directory tree
The compressed master directory contains code notebooks/scripts, precomputed analysis outputs (.npy, .pkl), and data folders (including platform-specific subfolders and an aggregated HDF5 file). The tree below lists the data files and folders in data.zip; the notebooks and scripts are described under Code/Software.
data.zip
├── 1V_MultiFixedSpeed4Plot.npy
├── 1VF_ExtractForceCaracters_BackTrack_IntervalDisappear4Plot.npy
├── 1VF_Heatmap_temp_Cframe30_AngDiff30
├── 1VF_LowDisplayFrequency.npy
├── 1VF_ShortDisappear_02_s_SwimFaster4Plot.npy
├── 2RF_3D_Heatmap_Katz_2022-01-02.pkl
├── 2RF_Matrix_Heatmap_Katz_2022-01-02.pkl
├── 2VF_BifurcationRefine_Result_temp_Cframe30_AngDiff30
├── Bifurcation_modeling_2023-04-10.pkl
├── scalability_data.h5
├── Speed_vs_Dis_VF_diff_speed_ori_data_Cframe100_2023-04-10.pkl
├── data_boat
│ ├── biopd_force_20220831.xlsx
│ ├── biopd_pose_20220831.xlsx
│ ├── mpc3_force_20220908.xlsx
│ ├── mpc3_pose_20220908.xlsx
│ ├── reference_20220831.xlsx
│ └── reference_20220902.xlsx
├── data_car
│ ├── BioPD.csv
│ └── MPC.csv
├── data_drone
│ ├── BioPD.csv
│ └── MPC.csv
└── scalability_robots
├── log_angular_vel_20230610162658.npy
├── log_angular_vel_20230721133933.npy
├── log_angular_vel_20230723121938.npy
├── log_angular_vel_20230723162112.npy
├── log_robot_ctrl_20230610162658.npy
├── log_robot_ctrl_20230721133933.npy
├── log_robot_ctrl_20230723121938.npy
├── log_robot_ctrl_20230723162112.npy
├── log_robot_states_20230610162658.npy
├── log_robot_states_20230610162658.png
├── log_robot_states_20230721133933.npy
├── log_robot_states_20230721133933.png
├── log_robot_states_20230723121938.npy
├── log_robot_states_20230723121938.png
├── log_robot_states_20230723162112.npy
├── state_sim_20230610162658.npy
├── state_sim_20230721133933.npy
├── state_sim_20230723121938.npy
├── state_sim_20230723153223.npy
└── state_sim_20230723162112.npy
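
The precomputed outputs are NumPy (.npy) and pickle (.pkl) files, and the files in scalability_robots/ share 14-digit timestamp suffixes. The sketch below loads either file type and groups the robot logs by suffix. It assumes the suffix is a run identifier in YYYYMMDDhhmmss form (for example, 20230610162658 read as 2023-06-10 16:26:58); that reading, the filename pattern, and the helper names load_output, group_runs, and stamp_to_datetime are illustrative assumptions, not part of the package. Loading a pickle can execute code, and pickled pandas objects may need a compatible pandas version, so load only files you trust.

```python
import pickle
import re
from collections import defaultdict
from datetime import datetime
from pathlib import Path

import numpy as np

# Hypothetical naming scheme: "<kind>_<14-digit stamp>.<npy|png>",
# e.g. log_robot_ctrl_20230610162658.npy
LOG_PATTERN = re.compile(r"^(?P<kind>.+)_(?P<stamp>\d{14})\.(?P<ext>npy|png)$")


def load_output(path):
    """Load a precomputed .npy or .pkl output (trusted files only)."""
    path = Path(path)
    if path.suffix == ".npy":
        # allow_pickle=True is needed when the array stores Python objects (e.g. dicts)
        return np.load(path, allow_pickle=True)
    if path.suffix == ".pkl":
        with path.open("rb") as fh:
            return pickle.load(fh)
    raise ValueError(f"unsupported file type: {path.suffix}")


def group_runs(filenames):
    """Group log filenames by timestamp suffix: {stamp: {"kind.ext": filename}}."""
    runs = defaultdict(dict)
    for name in filenames:
        match = LOG_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        key = f"{match['kind']}.{match['ext']}"
        runs[match["stamp"]][key] = name
    return dict(runs)


def stamp_to_datetime(stamp):
    """Interpret a 14-digit suffix as YYYYMMDDhhmmss (an assumption)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")
```

For example, group_runs(p.name for p in Path(folder).iterdir()) groups the logs, where folder is wherever scalability_robots/ was extracted. Grouping this way also makes it easy to see which runs lack a file of a given kind. If a .npy file stores a dictionary, np.load returns a 0-d object array; call .item() on it to recover the dictionary.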
Column-wise data dictionary (applies to raw/processed logs in data_boat/, data_car/, data_drone/)
Columns in data_car/ files (BioPD.csv and MPC.csv)

| Column | Meaning |
|---|---|
| desired_x | Desired (reference) position x |
| desired_y | Desired (reference) position y |
| desired_ori | Desired (reference) orientation |
| x | Robot position x |
| y | Robot position y |
| ori | Robot orientation |
| u1 | Speed control input |
| u2 | Angular-speed control input |

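
As a minimal sketch, assuming each data_car/ CSV has a header row with exactly the column names above and that orientations are in radians (units are not stated here), the tracking error of a run can be summarized as below. The function names and the data_car folder path are illustrative.

```python
import numpy as np
import pandas as pd


def wrap_angle(angle):
    """Wrap angles (radians assumed) to the interval [-pi, pi]."""
    return np.arctan2(np.sin(angle), np.cos(angle))


def car_tracking_errors(df):
    """Return per-sample position error, wrapped heading error, and position RMSE."""
    dx = df["desired_x"].to_numpy(dtype=float) - df["x"].to_numpy(dtype=float)
    dy = df["desired_y"].to_numpy(dtype=float) - df["y"].to_numpy(dtype=float)
    pos_err = np.hypot(dx, dy)
    ori_err = wrap_angle(
        df["desired_ori"].to_numpy(dtype=float) - df["ori"].to_numpy(dtype=float)
    )
    rmse = float(np.sqrt(np.mean(pos_err ** 2)))
    return pos_err, ori_err, rmse


def compare_car_runs(folder="data_car"):
    """Print the position RMSE of BioPD.csv and MPC.csv in a (hypothetical) folder."""
    for name in ("BioPD.csv", "MPC.csv"):
        _, _, rmse = car_tracking_errors(pd.read_csv(f"{folder}/{name}"))
        print(f"{name}: position RMSE = {rmse:.3f}")
```

Wrapping the heading difference avoids reporting a near-full-turn error when the desired and actual orientations sit on opposite sides of the ±pi boundary.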
Columns in data_drone/ files (BioPD.csv and MPC.csv)

| Column | Meaning |
|---|---|
| desired_x | Desired (reference) position x |
| desired_y | Desired (reference) position y |
| current_x | Current position x |
| current_y | Current position y |
| xe_in | Deviation in x, as controller input |
| ye_in | Deviation in y, as controller input |
| ang_in | Orientation in the original global frame |
| xe_o | Deviation in x in the local frame |
| ye_o | Deviation in y in the local frame |
| xedt_o | First time derivative of the deviation in x in the local frame |
| yedt_o | First time derivative of the deviation in y in the local frame |
| vx | Speed control in x in the global frame |
| vy | Speed control in y in the global frame |
| vx_o | Speed control in x in the local frame |
| vy_o | Speed control in y in the local frame |

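
The drone columns pair global and local quantities. As a sketch for checking that relationship, assuming xe_in/ye_in are expressed in the global frame, ang_in is a heading in radians, and the local frame is the global frame rotated by ang_in (none of which is specified above), deviations can be rotated into the local frame, differentiated numerically, and passed through a generic proportional-derivative (PD) law. The gains kp and kd and the sample period dt are hypothetical placeholders (no time column is listed). This is a generic PD law, not the exact BioPD formulation or gains, which are defined in the article and analysis code.

```python
import numpy as np


def global_to_local(xe, ye, ang):
    """Rotate global-frame deviations into a frame rotated by heading `ang` (radians)."""
    c, s = np.cos(ang), np.sin(ang)
    return c * xe + s * ye, -s * xe + c * ye


def local_to_global(vx_o, vy_o, ang):
    """Inverse rotation: local-frame commands back to the global frame."""
    c, s = np.cos(ang), np.sin(ang)
    return c * vx_o - s * vy_o, s * vx_o + c * vy_o


def finite_difference(values, dt):
    """Central-difference derivative for an assumed uniform sample period dt."""
    return np.gradient(np.asarray(values, dtype=float), dt)


def pd_command(e, edt, kp, kd):
    """Generic PD law: command = kp * error + kd * d(error)/dt (hypothetical gains)."""
    return kp * np.asarray(e, dtype=float) + kd * np.asarray(edt, dtype=float)
```

Applied to a data_drone/ file loaded as df, comparing global_to_local(df["xe_in"], df["ye_in"], df["ang_in"]) with the stored xe_o and ye_o columns shows whether this convention matches; if it does not, the recorded data follow a different convention (for example, degrees or the opposite rotation sense).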
Columns in data_boat/ pose files (MPC and BioPD)
Applies to: data_boat/mpc3_pose_*.xlsx and data_boat/biopd_pose_*.xlsx (loaded with header=None). The provided processing code uses only columns 0–2 (time, x, y).

| Column | Meaning |
|---|---|
| 0 | Time stamp (s) |
| 1 | Position x in the global/world frame (used as xs in the code) |
| 2 | Position y in the global/world frame (used as ys in the code) |
| 3 | Heading/orientation angle in the global frame (not used in the provided code) |
| 4 | Additional state/kinematic variable (not used in the provided code) |
| 5 | Additional state/kinematic variable (not used in the provided code) |
| 6 | Additional state/kinematic variable (not used in the provided code) |

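
A minimal loading sketch for the pose files, following the column layout in the pose table (no header row; columns 0–2 are time, x, y). Reading .xlsx files with pandas.read_excel also requires the openpyxl package. The helper names and the path-length summary are illustrative additions, not part of the provided code.

```python
import numpy as np
import pandas as pd


def split_pose(df):
    """Return (ts, xs, ys) from a header-less pose table (columns 0, 1, 2)."""
    ts = df.iloc[:, 0].to_numpy(dtype=float)
    xs = df.iloc[:, 1].to_numpy(dtype=float)
    ys = df.iloc[:, 2].to_numpy(dtype=float)
    return ts, xs, ys


def load_pose(path):
    """Read a pose file such as data_boat/biopd_pose_20220831.xlsx (needs openpyxl)."""
    return split_pose(pd.read_excel(path, header=None))


def path_length(xs, ys):
    """Total distance travelled along the sampled trajectory."""
    return float(np.sum(np.hypot(np.diff(xs), np.diff(ys))))
```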
Applies to: data_boat/reference_*.xlsx (loaded with header=None).

| Column | Meaning |
|---|---|
| 0 | Time stamp (s), if present |
| 1 | Reference/desired position x (used as dxs in the code) |
| 2 | Reference/desired position y (used as dys in the code) |
| 3+ | Other reference variables (not used in the provided code) |

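
Pose and reference files need not share sampling times. A sketch, assuming the reference file has a strictly increasing time stamp in column 0 (the table marks it as "if present"), interpolates the reference position (dxs, dys) onto the pose time stamps with numpy.interp and computes the per-sample position error. Which reference file belongs to which pose file is not stated here, and the helper name is illustrative; if a reference file has no time column, this interpolation does not apply.

```python
import numpy as np


def boat_tracking_error(ts, xs, ys, ref_t, dxs, dys):
    """Distance between the pose (xs, ys) and the reference interpolated at times ts."""
    ref_t = np.asarray(ref_t, dtype=float)
    if np.any(np.diff(ref_t) <= 0):
        # numpy.interp silently misbehaves on non-increasing sample points
        raise ValueError("reference time stamps must be strictly increasing")
    dx_at_t = np.interp(ts, ref_t, dxs)  # clamps to the end values outside ref_t
    dy_at_t = np.interp(ts, ref_t, dys)
    return np.hypot(
        np.asarray(xs, dtype=float) - dx_at_t,
        np.asarray(ys, dtype=float) - dy_at_t,
    )
```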
HDF5 contents: scalability_data.h5
The file scalability_data.h5 is an HDF5 container used for scalability analyses.
Abbreviations used in group names

| Abbreviation | Meaning |
|---|---|
| s | Speed (scalar or component) |
| l | Leader |
| f | Follower |
| x, y | Cartesian components (arena/world frame) |
| xs, ys | Relative distance (follower position relative to leader) in x/y |
| _new | “Processed/standardized” version used in the final analyses (as defined in the codebase) |
| _scale | Scaled/normalized by body length |
| _av | Average (typically across time, or across individuals within a run; see the analysis notebooks for the exact averaging operation) |

Top-level groups and their meaning

| Group | Meaning |
|---|---|
| all_sl_av | Average speed of the leader (typically per trial/run) |
| all_slx_new | Leader speed component in x, processed (“_new”) version used in the final analyses |
| all_sly_new | Leader speed component in y, processed (“_new”) version used in the final analyses |
| all_sfx_new | Follower speed component in x, processed (“_new”) version used in the final analyses |
| all_sfy_new | Follower speed component in y, processed (“_new”) version used in the final analyses |
| all_xs_new | Relative distance in x (typically follower x minus leader x), processed (“_new”) version |
| all_ys_new | Relative distance in y (typically follower y minus leader y), processed (“_new”) version; sometimes shown/used as all_sly_new in directory listings, so confirm the spelling in the HDF5 file |
| all_xs_scale | Relative distance in x scaled by body length (dimensionless) |
| all_ys_scale | Relative distance in y scaled by body length (dimensionless) |

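
To inspect scalability_data.h5 before use (for example, to confirm whether the group is spelled all_ys_new), the sketch below walks the HDF5 tree with h5py and lists every dataset with its shape and dtype, including datasets nested inside groups. It also illustrates the body-length scaling described for the _scale entries; the body length is a hypothetical input that must come from the study, and treating xs/ys as follower-minus-leader offsets follows the "typically" wording in the group table. The function names are illustrative, and h5py is imported inside describe_h5 so the scaling helper works without it.

```python
import numpy as np


def describe_h5(path):
    """Return {dataset_path: (shape, dtype)} for every dataset in an HDF5 file."""
    import h5py  # imported here so scale_by_body_length does not require h5py

    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, obj.dtype)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


def scale_by_body_length(xs, ys, body_length):
    """Scale relative offsets by body length; return (xs_scaled, ys_scaled, distance)."""
    xs_scaled = np.asarray(xs, dtype=float) / body_length
    ys_scaled = np.asarray(ys, dtype=float) / body_length
    return xs_scaled, ys_scaled, np.hypot(xs_scaled, ys_scaled)
```

If describe_h5 reports a top-level entry such as all_xs_new as a dataset, it can be read in full with f["all_xs_new"][()] inside an open h5py.File; if it is a group, its member datasets appear under paths like all_xs_new/<member>.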
Code/Software
All code in this submission is custom-written in Python 3.8. The repository provides a set of Jupyter notebooks (.ipynb) and a small utility script for plotting.
Main notebooks and scripts (as provided)
Data preprocessing notebooks

| Notebook | Description |
|---|---|
| dataprocess_bars.ipynb | Preprocesses/cleans the relevant dataset(s) for bar/summary analyses (e.g., aggregations used in bar plots) |
| dataprocess_car.ipynb | Preprocesses car-platform datasets (reads raw logs, produces standardized arrays/tables, and/or cached intermediates) |
| dataprocess_drone.ipynb | Preprocesses drone-platform datasets (reads raw logs, produces standardized arrays/tables, and/or cached intermediates) |
| dataprocess_roboboat_exp_kinematic.ipynb | Preprocesses boat-platform experimental kinematics (trajectory/state processing and derived kinematic variables) |

Analysis / figure-generation notebooks

| Notebook | Description |
|---|---|
| 1VF_ExtractForceCaracters_BackTrack.ipynb | Extracts force/interaction “characteristics” (as defined in the study) for the 1VF configuration and performs the back-tracking/reconstruction steps used in the analysis |
| 2VF_Backforth_RefineBifurcationPoints_MotionModel.ipynb | Performs back-and-forth refinement of bifurcation points using the motion model for the 2VF configuration |
| ReadDataAndPlot_RF_Scalability.ipynb | Reads scalability datasets (including scalability_data.h5) and generates scalability plots/metrics for the RF condition(s) |
| SwarmRobotsAna_final.ipynb | Main analysis notebook for swarm-robot datasets; computes summary metrics and produces the final analysis outputs used in the manuscript |
| VerifyFBC_3Ddata_Matrix.ipynb | Verification notebook for the FBC condition; operates on 3D data and matrix-form summaries to validate results |
| Plot4mainfig3v2.ipynb | Generates plots for the main figure(s), including the version labeled “fig3 v2” in the filename |

Utility script
clsplot.py
Python helper utilities for consistent plotting (e.g., style/formatting helpers) used by one or more notebooks.
Dependencies
The notebooks typically rely on the standard scientific Python stack: numpy, scipy, pandas, and matplotlib, plus h5py (for reading scalability_data.h5) and jupyter (to run the notebooks).
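
A quick environment check can confirm that these packages are importable before opening the notebooks. This sketch checks module names rather than distribution names; it uses jupyter_core (installed with jupyter) as a stand-in for jupyter, and adds openpyxl as an assumption because pandas needs it to read the .xlsx files in data_boat/. The function name is illustrative.

```python
import importlib.util

# Module names to check; jupyter_core stands in for "jupyter", and openpyxl is an
# added assumption (pandas needs it for .xlsx files).
REQUIRED = ("numpy", "scipy", "pandas", "matplotlib", "h5py", "jupyter_core", "openpyxl")


def check_modules(names=REQUIRED):
    """Return {module_name: True/False} depending on whether it can be found."""
    # find_spec locates a top-level module without importing it
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```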
Data were collected by observing zebrafish interacting with virtual robotic 'conspecifics' under various controls and treatments. Robotic data were gathered using different types of robots controlled by rules derived from the biological system.
