Vocal communication is seasonal in social groups of wild, free-living house mice
Data files (version of May 15, 2025; 11.79 GB total)
- README.md (20.48 KB)
- wild-mus-vocal-ecology-data.zip (11.79 GB)
Abstract
House mice (Mus musculus domesticus) are among the most widely studied laboratory models of mammalian social behavior, yet we know relatively little about the ecology of their behaviors in natural environments. Here, we address this gap using radiotelemetry to track social interactions in a population of wild mice over 10 years, from 2013 to 2023, and interpret these interactions in the context of passive acoustic monitoring data collected from August 2022 to November 2023. Using automated vocal detection, we identify 1.3 million individual vocalizations and align them in time with continuously collected telemetry data recording social interactions between individually identifiable mice. We find that vocalization is seasonal and correlated with long-term dynamics in features of social groups. In addition, we find that vocalization is closely associated in time with entrances to and exits from those groups, occurs most often in the presence of pups, and is correlated with how much time pairs of mice spend together. This work identifies seasonal patterns in the vocalizations of wild mice and lays a foundation to investigate the social role of acoustic communication in wild populations of an important laboratory model organism.
This repository contains data needed to reproduce figures and analyses from "Vocal communication is seasonal in social groups of wild, free-living house mice" (Jourjine et al. 2025). It is organized into three directories:
1. data
- Comprises four subdirectories, each holding raw data for analyses:
phenotypes
- Holds the following CSV files that describe barn population checks.
- sexes.csv is a table mapping the transponder ID of each mouse to its morphological sex, as determined by an expert from genital morphology when the mouse was first transpondered. All transponder IDs have been anonymized.
  - Columns are the transponder ID of the mouse the first time it was caught (transponder_id) and its sex, determined as above (Sex).
- All_popul_checks_update2023_transponder-and-popchecks.csv records the size of the barn population at each population check between 2003 and 2023.
  - Columns are the date of the check (Date) and the total number of mice caught (Total mice).
- pups.csv describes the pups found in the barn between 2006 and 2023.
  - Columns are the year of the check (year), the season of the check (season: autumn, winter, spring, or summer), and the total number of pups found during that season in that year (total_pups).
- audio_recorded_pups.csv contains information about pups discovered when boxes were opened for each audio recording.
  - Columns are moth, the audiomoth that recorded the box; deployment, the deployment during which this audiomoth recorded, in yyyymmdd-yyyymmdd format; pups_count_dropoff, the number of pups observed at a quick glance when setting the audiomoth on the RFID box; and pups_count_pickup, the number of pups counted by hand at the end of the recording, after removing the audiomoth from the RFID box.
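These four files are plain CSV tables. As a minimal loading sketch (file paths assume the directory layout described in this README; date parsing and the summary calls are illustrative assumptions, not part of the dataset):

```python
import pandas as pd

# Map transponder IDs to sexes.
sexes = pd.read_csv("data/phenotypes/sexes.csv")

# Population size at each check; let pandas parse the Date column.
pop_checks = pd.read_csv(
    "data/phenotypes/All_popul_checks_update2023_transponder-and-popchecks.csv",
    parse_dates=["Date"],
)

# Pups found per season per year.
pups = pd.read_csv("data/phenotypes/pups.csv")

print(sexes["Sex"].value_counts())
print(pop_checks.sort_values("Date").tail())
print(pups.groupby("season")["total_pups"].sum())
```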
rfid
- Data from the barn RFID system. See Supplementary Information S1 of the accompanying manuscript for more detail about these tables. Note that any empty cells correspond to information that is either not applicable or was not collected. A short loading sketch follows the subdirectory descriptions below.
- Subdirectories:
- box_events
- Data tables (.feather format) with one row for each box event (entrance or exit).
- Analyzed columns are event time (event_time), number of mice post-event (num_partners_after_event), box location (box), and event type (event_type: 1 for entrance, 2 for exit). Other columns are internal database IDs used to detect accidental row duplication.
- File naming convention: files start with two dates in yyyymmdd format indicating the time range of the data.
- mouse_meets
- Data tables (.feather format) with one row for each meeting between unique pairs of mice.
- Analyzed columns are start time (overlap_start_time), end time (overlap_end_time), duration (time_in_secs), mouse IDs (id1, id2), and box location (box). Other columns are internal database IDs used to detect accidental row duplication.
- File naming convention: files start with two dates in yyyymmdd format indicating the time range of the data.
- mouse_stays
- Data tables (.feather format) with one row for each stay by a mouse in a box.
- Analyzed columns are transponder ID (
transponder_id), box location (box), duration (time_in_secs), start (entry_time) and end timestamps (exit_time). Other columns are internal database IDs used to detect accidental row duplication. - File naming convention: Files start with two dates in
yyyymmddformat indicating the time range of the data.
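As a sketch of how these tables might be read and combined (reading .feather files requires pyarrow; paths and the analyses shown are illustrative assumptions):

```python
import glob
import pandas as pd

# Concatenate all box_events tables; file names encode yyyymmdd-yyyymmdd ranges.
event_files = sorted(glob.glob("data/rfid/box_events/*.feather"))
events = pd.concat((pd.read_feather(f) for f in event_files), ignore_index=True)

# Decode the event_type column (1 = entrance, 2 = exit).
events["event_name"] = events["event_type"].map({1: "entrance", 2: "exit"})

# From mouse_meets, the total time (in seconds) each pair of mice spent together.
meet_files = sorted(glob.glob("data/rfid/mouse_meets/*.feather"))
meets = pd.concat((pd.read_feather(f) for f in meet_files), ignore_index=True)
pair_time = (
    meets.groupby(["id1", "id2"])["time_in_secs"].sum().sort_values(ascending=False)
)

print(events["event_name"].value_counts())
print(pair_time.head())
```

The internal database ID columns mentioned above can be used with drop_duplicates to guard against accidental row duplication when concatenating across files.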
segments
- Data tables containing information about detected vocalizations from recorded boxes. Note that any empty cells correspond to information that is either not applicable or was not collected.
- Subdirectories:
- vocal_counts
- Data tables (.csv format) with one row for each consecutive recorded 55-second interval, for each audiomoth.
- Columns include the timestamp of the start of the interval in yyyymmdd_hhmmss format (minute), the same timestamp in yyyy-mm-dd hh:mm:ss format (audiomoth_timestamp), the number of USVs (ultrasonic vocalizations) detected in that interval (USV_count), the number of squeaks detected in that interval (squeak_count), whether the sun was up during the interval (sunup: 1 for yes, 0 for no), the dates of the audiomoth deployment in which these vocalizations were detected (deployment), the audiomoth that recorded the vocalizations (moth), and the box they were recorded from (box).
- File naming convention: yyyymmdd-yyyymmdd_box#_counts, where yyyymmdd-yyyymmdd is the date range of the deployment and # is the number of the box recorded during that deployment.
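For example, daily vocalization totals for one box can be tallied from such a file (a sketch; the file name below is a placeholder following the naming convention above):

```python
import pandas as pd

# Placeholder file name following the yyyymmdd-yyyymmdd_box#_counts convention.
counts = pd.read_csv(
    "data/segments/vocal_counts/20220706-20220708_box22_counts.csv",
    parse_dates=["audiomoth_timestamp"],
)

# Daily totals of USVs and squeaks for this box.
daily = (
    counts.set_index("audiomoth_timestamp")[["USV_count", "squeak_count"]]
    .resample("D")
    .sum()
)
print(daily)
```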
- vocal_events
- Data tables (.csv format) with one row for each detected vocalization.
- Columns include the deployment dates when the vocalization was detected (deployment), the audiomoth that recorded it (moth), the box it was recorded in (box), the timestamp of the wav file it was detected in (audiomoth_timestamp), the start and end of the vocalization relative to the beginning of that wav file (start_seconds and stop_seconds, respectively), its duration (duration), the path to the deep audio segmenter model that assigned its label at the time of the analysis (model), the location of the wav file containing the vocalization when inference was performed (source_file), the human-interpretable label of the vocalization (label), the value of the audiomoth_timestamp column in datetime format (audiomoth_timestamp_datetime), and the absolute start and stop times of the vocalization on the audiomoth internal clock (audiomoth_start_seconds and audiomoth_stop_seconds). Note that when inference was performed, "squeak" vocalizations were assigned the label "cry"; all "cry" labels were replaced with "squeak" prior to analyses for consistency.
- File naming convention: yyyymmdd-yyyymmdd_box#_segments or yyyymmdd-yyyymmdd_box#_time-adjusted, where yyyymmdd-yyyymmdd is the date range of the deployment and # is the number of the box recorded during that deployment. Files ending in 'time-adjusted' contain the following additional columns, with timestamps adjusted so that they are aligned to the RFID system (see methods for details):
audiomoth_start_seconds_adjusted: start of the vocalization, adjusted to match the RFID system clock
audiomoth_stop_seconds_adjusted: end of the vocalization, adjusted to match the RFID system clock
audiomoth_timestamp_datetime_adjusted: start of the audiomoth wav file, adjusted to match the RFID system clock
deployment_correction_seconds: time difference in seconds between the RFID and audiomoth clocks at the start of the deployment
recovery_correction_seconds: time difference in seconds between the RFID and audiomoth clocks at the end of the deployment
estimated_or_actual_time_correction: whether the rate of clock drift was calculated directly from this audiomoth during this deployment ('actual') or from an average ('estimated') rate of clock drift based on deployments in which it could be measured directly from this audiomoth (i.e., when an acoustic chime was used at both the start and end of the recording)
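As a sketch of why the adjusted timestamps matter: once vocalization times are on the RFID clock, they can be matched to nearby box events with an as-of merge. This is not the authors' pipeline; file names are placeholders and the 60 s tolerance window is an arbitrary illustration:

```python
import pandas as pd

# Placeholder file names following the conventions described above.
vocs = pd.read_csv("data/segments/vocal_events/20220706-20220708_box22_time-adjusted.csv")
events = pd.read_feather("data/rfid/box_events/20220701-20220801.feather")

# Absolute vocalization time on the RFID clock: the adjusted wav-file start
# plus the vocalization's offset within that wav file.
vocs["rfid_time"] = pd.to_datetime(
    vocs["audiomoth_timestamp_datetime_adjusted"]
) + pd.to_timedelta(vocs["start_seconds"], unit="s")

# Match each vocalization to the most recent preceding box event within an
# arbitrary 60 s window (merge_asof requires both tables sorted on the key;
# to match within a box, add by="box" after harmonizing dtypes).
events["event_time"] = pd.to_datetime(events["event_time"])
aligned = pd.merge_asof(
    vocs.sort_values("rfid_time"),
    events.sort_values("event_time")[["event_time", "event_type", "box"]],
    left_on="rfid_time",
    right_on="event_time",
    direction="backward",
    tolerance=pd.Timedelta("60s"),
)
print(aligned[["rfid_time", "event_time", "event_type"]].head())
```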
umap
- Contains parameters and output needed to reproduce the UMAP embedding plotted in Figure 4A. See the notebook Figure 4.ipynb for details.
- The spectrograms directory contains example spectrograms from Figure 4B in npy format.
- 20230707_111132_params_for_UMAPembedding.json contains parameters for the UMAP embedding in Figure 4A.
- 20230707_111132_UMAPembedding.feather is a data table with one row per spectrogram in Figure 4A and columns corresponding to intensity values in each pixel of the 128x128 spectrogram image (numbered from 0 through (128*128)-1 = 16383), UMAP coordinates for that spectrogram in the embedding (umap1 and umap2), and the following information about the vocalization:
moth: the audiomoth that recorded the vocalization
clip_name: an identifier for the vocalization in the format audiomoth##_yyyymmdd_hhmmss_clip#
box: the box where the vocalization was recorded
deployment: the audiomoth deployment during which the vocalization was recorded, in yyyymmdd_yyyymmdd format (i.e., start-date_end-date)
label: whether the vocalization was labeled as a USV or a squeak by our deep audio segmenter model
sound.files: the name of the raw wav file where the vocalization was detected
selec: the clip number of the vocalization, as in clip_name
start: the start time of the vocalization in seconds relative to the start of the raw wav file
end: the end time of the vocalization in seconds relative to the start of the raw wav file
date: the minute the vocalization was recorded in yyyy-mm-dd hh:mm:ss format
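A minimal sketch of reusing this table, assuming only the column layout described above: reconstruct one spectrogram from its pixel columns and plot the embedding.

```python
import matplotlib.pyplot as plt
import pandas as pd

emb = pd.read_feather("data/umap/20230707_111132_UMAPembedding.feather")

# Pixel columns are numbered 0..16383; depending on how the feather file was
# written they may be string or integer column names. Assumes row-major order.
pixel_cols = [c for c in emb.columns if str(c).isdigit()]
spec = emb.loc[0, pixel_cols].to_numpy(dtype=float).reshape(128, 128)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.imshow(spec, origin="lower", aspect="auto")
ax1.set_title(f"clip: {emb.loc[0, 'clip_name']}")

# The embedding itself, colored by vocalization type (USV vs. squeak).
for label, grp in emb.groupby("label"):
    ax2.scatter(grp["umap1"], grp["umap2"], s=1, label=label)
ax2.set_xlabel("umap1")
ax2.set_ylabel("umap2")
ax2.legend(markerscale=5)
plt.show()
```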
2. models
- Contains files related to the DAS model, generated by training with the DAS package:
  - Final trained model: 20230219_120047_model.h5
  - Training parameters: 20230219_120047_params.yaml
  - Evaluation file generated by the deep audio segmenter (used by Supplemental Figure 3.ipynb): 20230219_120047_results.h5
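A hedged sketch for inspecting these files: the params file is plain YAML (readable with PyYAML), and the model is a Keras HDF5 file. Loading the model with plain Keras may fail if the architecture uses DAS's custom layers, in which case the loader provided by the DAS package (see the DAS documentation) is the appropriate route.

```python
import yaml  # PyYAML, assumed available in the das environment

# Training parameters are plain YAML.
with open("models/20230219_120047_params.yaml") as f:
    params = yaml.safe_load(f)
print(sorted(params.keys()))

# The trained network is a Keras HDF5 file. Plain Keras loading may fail if
# the model uses DAS's custom layers; fall back to the das package's loader.
import tensorflow as tf

try:
    model = tf.keras.models.load_model("models/20230219_120047_model.h5", compile=False)
    model.summary()
except Exception as err:
    print("Plain Keras load failed; load via the das package instead:", err)
```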
3. annotations
- Contains hand-annotated vocalizations and their acoustic features, used for model training and evaluation. See the notebook Supplemental Figure 3.ipynb for details.
- counts_test contains data files with hand counts of vocalizations from a single recorded box (box22_20220706-20220708_hand_counts) and corresponding model predictions (box22_20220706-20220708_predictions).
  - 20220706-20220708_box22_moth00_hand_counts.csv is a data table where each row corresponds to one recorded 55-second interval in a recording by audiomoth00, which was deployed between July 6 and July 8 of 2022 (these data are not included in the full dataset analyzed in the manuscript)
- Columns are:
vocs?: Whether a human expert (NJ) saw any vocalizations in this recording when viewing the spectrogram in the deep audio segmenter's graphical user interface using default spectrogram settings
actual squeak count: The number of squeaks found in the recording by an expert using the method described above for the vocs? column
actual USV count: The number of ultrasonic vocalizations (USVs) found in the recording by an expert using the method described above for the vocs? column
minute: The timestamp of the minute in yyyy-mm-dd hh:mm:ss format
file_name: The name of the raw wav file
  - 20220706-20220708_predicted_counts.csv is a data table where each row corresponds to one recorded 55-second interval in a recording by audiomoth00, which was deployed between July 6 and July 8 of 2022 (these data are not included in the full dataset analyzed in the manuscript)
- Columns are:
deployment: The deployment dates (all are '20220706-20220708')
moth: The audiomoth (all are 'audiomoth00')
box: The box recorded (all are 22)
minute: The timestamp of the recorded minute in yyyy-mm-dd hh:mm:ss format
predicted_squeak_count: The number of squeaks predicted to be in this minute by our trained deep audio segmenter model
predicted_USV_count: The number of ultrasonic vocalizations (USVs) predicted to be in this minute by our trained deep audio segmenter model
  - 20220706-20220708_predicted_segments.csv is a data table where each row corresponds to one segment of sound predicted and labeled by our trained deep audio segmenter model
- Columns are:
deployment: The deployment dates (all are '20220706-20220708')
moth: The audiomoth (all are 'audiomoth00')
box: The box recorded (all are 22)
minute: The timestamp of the recorded minute in yyyymmdd_hhmmss format
start_seconds: The start of the segment relative to the start of the raw wav file in seconds
stop_seconds: The end of the segment relative to the start of the raw wav file in seconds
duration: The duration of the segment in seconds
label: The label assigned by the model (cry, which we originally used to label squeaks; USV, for ultrasonic vocalization; and noise, for non-vocal segments)
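As a consistency check (a sketch, not the authors' pipeline), per-minute counts can be recomputed from the segments file and compared with the counts file:

```python
import pandas as pd

segs = pd.read_csv("annotations/counts_test/20220706-20220708_predicted_segments.csv")
counts = pd.read_csv("annotations/counts_test/20220706-20220708_predicted_counts.csv")

# Count labeled vocal segments (dropping "noise") per recorded minute.
recount = (
    segs[segs["label"] != "noise"]
    .groupby(["minute", "label"])
    .size()
    .unstack(fill_value=0)
    .reset_index()
)

# The two files use different minute formats (yyyymmdd_hhmmss vs.
# yyyy-mm-dd hh:mm:ss); normalize before joining.
recount["minute"] = pd.to_datetime(recount["minute"], format="%Y%m%d_%H%M%S")
counts["minute"] = pd.to_datetime(counts["minute"])
merged = counts.merge(recount, on="minute", how="left")
print(merged.head())
```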
- segment_annotations contains the complete annotations used to train our deep audio segmenter model, with two files per annotated raw wav file: a csv that contains the annotations themselves, and an npz file that packages the annotated wav file and annotations, which can be loaded by the deep audio segmenter graphical user interface.
- Files are named using the following convention: yyyymmdd_box##_moth##_hhmmss_annotations.csv, where yyyymmdd is the year (y), month (m), and day (d) of the recording, box## is the box ID, moth## is the audiomoth ID, and hhmmss is the start time of the wav file: hour (h), minute (m), and second (s).
- Columns in the annotations are:
name: The label given by the expert human annotator (NJ)
start_seconds: The start of the segment relative to the start of the raw wav file in seconds
stop_seconds: The end of the segment relative to the start of the raw wav file in seconds
  - Please note that if a label was not annotated, it is represented by the last row of the csv, where start_seconds is empty (NaN) and stop_seconds is 0.0. This formatting is recognized by the deep audio segmenter software.
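When reading these annotation CSVs outside of DAS, that placeholder row can simply be dropped, e.g. (the file name below is a placeholder following the naming convention above):

```python
import pandas as pd

# Placeholder file name following the yyyymmdd_box##_moth##_hhmmss convention.
ann = pd.read_csv(
    "annotations/segment_annotations/20220706_box22_moth00_120000_annotations.csv"
)

# Drop the placeholder row DAS writes for labels that were never annotated
# (start_seconds empty/NaN, stop_seconds 0.0).
ann = ann.dropna(subset=["start_seconds"])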
- segment_annotations_acoustic_features contains data files of acoustic features calculated for each vocalization using the R package warbleR.
- Files are named using the following convention: yyyymmdd_box##_moth##_hhmmss_features.csv, where yyyymmdd is the year (y), month (m), and day (d) of the recording, box## is the box ID, moth## is the audiomoth ID, and hhmmss is the start time of the wav file: hour (h), minute (m), and second (s). There is one file here for each file in segment_annotations, with the same name except that the _annotations.csv suffix is replaced by _features.csv. Note that any empty cells correspond to information that is either not applicable or features that could not be calculated by warbleR for a given vocalization.
- Columns are:
sound.files: The name of the raw wav file
selec: The count of the annotated vocalization within each raw wav file
label: The label assigned by the annotator ('cry' for squeaks, 'USV' for ultrasonic vocalizations)
start: The start of the segment relative to the start of the raw wav file in seconds
end: The end of the segment relative to the start of the raw wav file in seconds
deployment: The deployment dates in yyyymmdd-yyyymmdd format
moth: The audiomoth
box: The box recorded
full.path: The full path to the location of the raw data at the time of the analysis
wavs.dir: The directory containing the raw data at the time of the analysis
duration: The duration of the segment in seconds
SPL: Relative, uncalibrated sound pressure level of the segment, as measured by the sound_pressure_level function of warbleR using default settings (https://marce10.github.io/warbleR/reference/sound_pressure_level.html)
SNR: Signal to noise ratio of the segment, as measured by the sig2noise function of warbleR using default settings (https://marce10.github.io/warbleR/reference/sig2noise.html)
prop.clipped: The percent of the frames in the audio segment that contain clipping, as measured by the find_clipping function of warbleR using default settings (https://marce10.github.io/warbleR/reference/find_clipping.html)
param.wl: Window length parameter used by warbleR when generating spectrograms used for acoustic features
param.ovlp: Window overlap parameter used by warbleR when generating spectrograms used for acoustic features
param.bp_low: Minimum frequency in kHz used by warbleR when generating spectrograms for acoustic features
param.bp_high: Maximum frequency in kHz used by warbleR when generating spectrograms for acoustic features
param.mar: Margin before and after each segment in seconds used to calculate SNR
All other columns correspond to output of the warbleR spectro_analysis function with harmonicity set to False and fast set to True. Detailed definitions can be found here, starting on line 41: https://github.com/maRce10/warbleR/blob/master/R/spectro_analysis.R
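A sketch of one common use of these columns, filtering out low-quality segments before acoustic analysis (the thresholds are arbitrary illustrations, not values from the paper):

```python
import glob
import pandas as pd

files = sorted(glob.glob("annotations/segment_annotations_acoustic_features/*.csv"))
feats = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Drop clipped or low-SNR segments; warbleR leaves uncomputable features empty.
clean = feats[(feats["SNR"] > 5) & (feats["prop.clipped"] < 0.01)]
print(f"{len(feats)} segments, {len(clean)} after quality filtering")
```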
- segments_test contains predictions for the wav file in Supplemental Figure 3J,K, in the same format as the files in segment_annotations.
- segments_test_wav_file contains the wav file used for prediction in Supplemental Figure 3J,K. These predictions are in segments_test.
How to use
The dataset is intended to be analyzed with code at the GitHub repository here.
To combine code and data:
1. Clone or download the repository here. You should get a folder called wild-mus-vocal-ecology.
2. Download the data folder for this repository, then unzip it by clicking on it, or by running:
   - MacOS Terminal: unzip path/to/wild-mus-vocal-ecology-data.zip
   - Windows Powershell: Expand-Archive -Path path\to\wild-mus-vocal-ecology-data.zip -DestinationPath path\to\output-folder
You should end up with a folder called wild-mus-vocal-ecology-data containing three directories: "data", "models", and "annotations".
3. Copy or move the contents of the wild-mus-vocal-ecology-data folder (not the folder itself) to the wild-mus-vocal-ecology folder you cloned or downloaded from GitHub.
   - To copy:
     - MacOS Terminal: rsync -ahP /path/to/wild-mus-vocal-ecology-data/ /path/to/wild-mus-vocal-ecology/
     - Windows Powershell: Copy-Item -Path "C:\path\to\wild-mus-vocal-ecology-data\*" -Destination "C:\path\to\wild-mus-vocal-ecology" -Recurse
   - To move:
     - MacOS Terminal: mv /path/to/wild-mus-vocal-ecology-data/* /path/to/wild-mus-vocal-ecology/
     - Windows Powershell: Move-Item -Path "C:\path\to\wild-mus-vocal-ecology-data\*" -Destination "C:\path\to\wild-mus-vocal-ecology"
4. Set up the necessary virtual environments and access the analysis notebooks using the steps below.
Download and install Anaconda following the instructions here if you haven't already done so:
https://docs.anaconda.com/getting-started/
Then run the following in your terminal (Powershell on Windows, Terminal app on Mac/Linux) to create the virtual environments:
conda env create -f audiomoth_environment.yml -n audiomoth -v
conda env create -f das_environment.yml -n das -v
Move to the wild-mus-vocal-ecology directory:
Mac/Linux: cd path/to/wild-mus-vocal-ecology
Windows Powershell: cd C:\path\to\wild-mus-vocal-ecology
Then install the necessary helper functions and set up Jupyter kernels by running:
conda activate audiomoth
python -m ipykernel install --user --name audiomoth --display-name "audiomoth"
pip install -e .
conda deactivate
conda activate das
pip install -e .
python -m ipykernel install --user --name das --display-name "DAS"
conda deactivate
This ensures that the helper functions are accessible in the notebooks and creates dedicated Jupyter kernels for each environment, allowing you to switch between them within a single notebook.
Then run the following
conda activate audiomoth
jupyter notebook
to launch Jupyter. A browser window should open, but if it doesn't, you can copy/paste the link that appears in the terminal window following these commands. You should now be able to navigate to the notebooks directory and select the notebook you would like to use.
If you have trouble completing any of these steps, please let me know by raising an issue at the GitHub repository!
