Dryad

Data from: Two pup vocalization types are genetically and functionally separable in deer mice

Cite this dataset

Jourjine, Nicholas et al. (2024). Data from: Two pup vocalization types are genetically and functionally separable in deer mice [Dataset]. Dryad. https://doi.org/10.5061/dryad.g79cnp5ts

Abstract

Vocalization is a widespread social behavior in vertebrates that can affect fitness in the wild. Although many vocal behaviors are highly conserved, heritable features of specific vocalization types can vary both within and between species, raising questions of why and how some vocal behaviors evolve. Here, using new computational tools to automatically detect and cluster vocalizations into distinct acoustic categories, we compare pup isolation calls across neonatal development in eight taxa of deer mice (genus Peromyscus) and compare them with laboratory mice (C57BL6/J strain) and free-living, wild house mice (Mus musculus domesticus). Whereas both Peromyscus and Mus pups produce ultrasonic vocalizations (USVs), Peromyscus pups also produce a second call type with acoustic features, temporal rhythms, and developmental trajectories that are distinct from those of USVs. In deer mice, these lower frequency “cries” are predominantly emitted in postnatal days one through nine, whereas USVs are primarily made after day 9. Using playback assays, we show that cries result in a more rapid approach by Peromyscus mothers than USVs, suggesting a role for cries in eliciting parental care early in neonatal development. Using a genetic cross between two sister species of deer mice exhibiting large, innate differences in the acoustic structure of cries and USVs, we find that variation in vocalization rate, duration, and pitch displays different degrees of genetic dominance and that cry and USV features can be uncoupled in second-generation hybrids. Taken together, this work shows that vocal behavior can evolve quickly between closely related rodent species in which vocalization types, likely serving distinct functions in communication, are controlled by distinct genetic loci.

README

Peromyscus pup vocal evolution Dataset


Versions

  • April 2024: file developmentMU.tar.gz reuploaded to resolve issue with original file

How to use

This dataset contains the raw audio recordings and processed data used to perform the analyses and generate the figures in Jourjine et al., Current Biology, 2023.

The files are stored as compressed archives (.tar.gz). To uncompress one, double-click it or run the following on the command line: tar -xvf full/path/to/file.tar.gz
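The same extraction can also be scripted. The following is a minimal sketch using only the Python standard library; the function name extract_archive and the example paths are illustrative assumptions and are not part of the dataset or the repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data" is the safe one.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names


# Example call (hypothetical paths):
# extract_archive("full/path/to/developmentBW.tar.gz", "full/path/to/extracted")
```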

The contents of these directories are described below. Please see the GitHub repository https://github.com/nickjourjine/peromyscus-pup-vocal-evolution for code and instructions on how to use them to reproduce analyses and figures.

Two-letter codes

We use the following two-letter codes as shorthand to refer to each taxon we analyze:

| code | taxon |
| --- | --- |
| BW | P. maniculatus bairdii |
| BK | P. maniculatus gambelli |
| SW | P. maniculatus rubidus |
| NB | P. maniculatus nubiterrae |
| PO | P. polionotus subgriseus |
| LO | P. polionotus leucocephalus |
| GO | P. gossypinus |
| LL | P. leucopus |
| MU | Mus musculus domesticus (C57BL6/J) |
| MZ | Mus musculus domesticus (wild) |
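For scripted analyses, the codes above can be kept in a small lookup table. This is only a convenience sketch; the names TAXON_CODES and taxon_for_code are illustrative assumptions, and the taxon strings are copied from the table as written.

```python
# Two-letter codes copied from the table above.
TAXON_CODES = {
    "BW": "P. maniculatus bairdii",
    "BK": "P. maniculatus gambelli",
    "SW": "P. maniculatus rubidus",
    "NB": "P. maniculatus nubiterrae",
    "PO": "P. polionotus subgriseus",
    "LO": "P. polionotus leucocephalus",
    "GO": "P. gossypinus",
    "LL": "P. leucopus",
    "MU": "Mus musculus domesticus (C57BL6/J)",
    "MZ": "Mus musculus domesticus (wild)",
}


def taxon_for_code(code):
    """Return the taxon name for a two-letter code (case-insensitive)."""
    return TAXON_CODES[code.upper()]
```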

Audio datasets

There are four sets of recordings that constitute the raw data: development, cross foster, F1, and F2.
Because of file upload limitations, the development and F2 datasets are split into parts.
All of the directories with the prefix "bwpof2" belong to the F2 dataset (six directories, split into approximately 100 recordings per directory, one .wav file per recorded pup).
All of the directories with the prefix "development" belong to the development dataset (ten directories, one directory per taxon, one .wav file per recorded pup).
The easiest way to use the development and F2 datasets is to uncompress each of these directories and collect all of the .wav files for each dataset into its own directory (i.e., one for all of the development wav files and one for all of the F2 wav files).
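Collecting the wav files can be sketched as follows. This assumes that each archive has already been extracted into a directory whose name starts with the dataset prefix ("development" or "bwpof2"); the extracted directory names, the function name collect_wavs, and the example paths are assumptions, not something the dataset guarantees.

```python
import shutil
from pathlib import Path


def collect_wavs(extracted_root, dest_dir, prefix):
    """Copy every .wav found under directories starting with prefix into dest_dir."""
    root = Path(extracted_root)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for subdir in sorted(root.glob(f"{prefix}*")):
        # Skip plain files and the destination itself if it shares the prefix.
        if not subdir.is_dir() or subdir.resolve() == dest.resolve():
            continue
        for wav in sorted(subdir.rglob("*.wav")):
            target = dest / wav.name
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            shutil.copy2(wav, target)
            copied.append(target)
    return copied


# Example calls (hypothetical paths):
# collect_wavs("extracted", "extracted/all_development_wavs", "development")
# collect_wavs("extracted", "extracted/all_f2_wavs", "bwpof2")
```

Copying rather than moving leaves the extracted directories intact, at the cost of extra disk space.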

All of the directories containing unprocessed raw audio are described in the table below:

| file | dataset | file type(s) | number of files | associated main figure(s) | description |
| --- | --- | --- | --- | --- | --- |
| developmentBK.tar.gz | development | .wav | 98 | 1,2 | Audio recordings of isolation-induced P. maniculatus gambelli pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentBW.tar.gz | development | .wav | 80 | 1,2 | Audio recordings of isolation-induced P. maniculatus bairdii pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentNB.tar.gz | development | .wav | 72 | 1,2 | Audio recordings of isolation-induced P. maniculatus nubiterrae pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentSW.tar.gz | development | .wav | 73 | 1,2 | Audio recordings of isolation-induced P. maniculatus rubidus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentPO.tar.gz | development | .wav | 76 | 1,2 | Audio recordings of isolation-induced P. polionotus subgriseus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentLO.tar.gz | development | .wav | 66 | 1,2 | Audio recordings of isolation-induced P. polionotus leucocephalus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentGO.tar.gz | development | .wav | 68 | 1,2 | Audio recordings of isolation-induced P. gossypinus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentLL.tar.gz | development | .wav | 63 | 1,2 | Audio recordings of isolation-induced P. leucopus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentMU.tar.gz | development | .wav | 116 | 1,2 | Audio recordings of isolation-induced Mus musculus domesticus (C57BL6/J) pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| developmentMZ.tar.gz | development | .wav | 111 | 1,2 | Audio recordings of isolation-induced Mus musculus domesticus (wild) pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
| bw_po_cf.tar.gz | cross foster | .wav | 58 | 4 | Audio recordings of isolation-induced P. maniculatus bairdii and P. polionotus subgriseus pup vocalizations at postnatal day 9, from pups either raised by their own parents or cross fostered |
| bw_po_f1.tar.gz | F1 recordings | .wav | 119 | 5 | Audio recordings of isolation-induced pup vocalizations from first generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
| bwpof2-1.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
| bwpof2-2.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
| bwpof2-3.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
| bwpof2-4.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
| bwpof2-5.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
| bwpof2-6.tar.gz | F2 recordings | .wav | 117 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |

File name conventions

Each wav file in the above directories is named by joining specific pieces of information about the pup whose vocalizations it contains with underscores ('_').
These conventions are described in the tables below, where index refers to the position in the list generated by splitting the file name on the '_' character (e.g., using the split() method in Python).

Developmental data set file naming conventions

| index | description |
| --- | --- |
| 0 | species (using 2-letter code described in table above) |
| 1 | ID of the litter's dam and sire in the format 'damIDxsireID' |
| 2 | litter number from this dam and sire (this value is approximate and may not be accurate for every pup) |
| 3 | pup number from the litter (order in which pups were removed from their home cage for recording, using the convention that the first pup removed is pup1) |
| 4 | microphone channel used to record the pup |
| 5 | weight of the pup in milligrams |
| 6 | sex of the pup determined by anogenital distance (m=male, f=female) |
| 7 | temperature of the pup in degrees C immediately before recording (multiplied by 10 to avoid introducing '.') |
| 8 | temperature of the pup in degrees C immediately after recording (multiplied by 10 to avoid introducing '.') |
| 9 | whether or not the pup had to be removed from the dam by the experimenter (fr0=no, fr1=yes; fr stands for 'forcibly removed' while suckling) |
| 10 | age of the pup in days in the format 'p#' where # is the age counting day of birth as day 0 |
| 11 | date of the recording in the format yyyy-mm-dd |
| 12 | time of the recording in the format hh-mm-ss |

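As an illustration of the development naming convention, the sketch below splits a file name on '_' and decodes the fields whose encoding is stated above (temperatures, the fr flag, and age). The example file name in the comment is hypothetical and only follows the stated convention; the exact form of the litter, pup, and channel fields is not specified here, so they are kept as strings. The names DEVELOPMENT_FIELDS and parse_development_name are illustrative assumptions.

```python
from pathlib import Path

# Field names (illustrative) for indices 0-12 of the development convention.
DEVELOPMENT_FIELDS = [
    "species", "parents", "litter", "pup", "channel", "weight_mg", "sex",
    "temp_before_x10", "temp_after_x10", "forcibly_removed", "age",
    "date", "time",
]


def parse_development_name(file_name):
    """Split a development wav file name into a dictionary of fields."""
    parts = Path(file_name).stem.split("_")
    if len(parts) != len(DEVELOPMENT_FIELDS):
        raise ValueError(f"expected {len(DEVELOPMENT_FIELDS)} fields, got {len(parts)}")
    info = dict(zip(DEVELOPMENT_FIELDS, parts))
    # Temperatures are stored multiplied by 10, so divide to recover degrees C.
    info["temp_before_c"] = int(info.pop("temp_before_x10")) / 10
    info["temp_after_c"] = int(info.pop("temp_after_x10")) / 10
    # fr1 means the pup was forcibly removed from the dam, fr0 means it was not.
    info["forcibly_removed"] = info["forcibly_removed"] == "fr1"
    # Age is written as 'p#', with day of birth counted as day 0.
    info["age_days"] = int(info.pop("age")[1:])
    return info


# Hypothetical file name that follows the convention:
# parse_development_name(
#     "BW_24224x23686_ltr1_pup1_ch1_4700_m_337_295_fr1_p9_2021-10-02_12-35-01.wav")
```

The length check matters because the other datasets use different numbers of fields, so a file from the wrong dataset fails loudly instead of being silently mislabeled.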
Cross foster data set file naming conventions

| index | description |
| --- | --- |
| 0 | species (using 2-letter code described in table above). CF-BW indicates a BW pup fostered by PO. CF-PO indicates a PO pup fostered by BW. |
| 1 | if the pup was not cross fostered, this is the ID of the litter's dam and sire in the format 'damIDxsireID'. If cross fostered, it is the ID of the litter's dam and sire in the format 'damIDxsireID' followed by a '-' then the ID of the foster dam and sire in the format 'cfdamID-cfsireID' |
| 2 | pup number from the litter (order in which pups were removed from their home cage for recording, using the convention that the first pup removed is pup1) |
| 3 | age of the pup in days in the format 'p##' where ## is the age counting day of birth as day 0 (all are p09, i.e. postnatal day 9) |
| 4 | weight of the pup in milligrams |
| 5 | sex of the pup determined by anogenital distance (m=male, f=female) |
| 6 | number of pups in the litter the pup came from |
| 7 | date of the recording in the format yyyy-mm-dd |
| 8 | time of the recording in the format hh-mm-ss |

F1 data set file naming conventions

| index | description |
| --- | --- |
| 0 | species where BW-PO-cross-F1 indicates a first generation hybrid, cross-BW indicates P. maniculatus bairdii and cross-PO indicates P. polionotus subgriseus |
| 1 | ID of the litter's dam and sire. If not F1, in the format 'damIDxsireID'. If F1, the format is 'damIDxsireID-family-N', where N is one of A, B, C, or D and indicates which of four independent crosses between P. maniculatus bairdii and P. polionotus subgriseus the F1 pup came from |
| 2 | litter number from this dam and sire (this value is approximate and may not be accurate for every pup) |
| 3 | pup number from the litter (order in which pups were removed from their home cage for recording, using the convention that the first pup removed is pup1) |
| 4 | microphone channel used to record the pup |
| 5 | weight of the pup in milligrams |
| 6 | sex of the pup determined by anogenital distance (m=male, f=female) |
| 7 | temperature of the pup in degrees C immediately before recording (multiplied by 10 to avoid introducing '.') |
| 8 | temperature of the pup in degrees C immediately after recording (multiplied by 10 to avoid introducing '.') |
| 9 | whether or not the pup had to be removed from the dam by the experimenter (fr0=no, fr1=yes; fr stands for 'forcibly removed' while suckling) |
| 10 | age of the pup in days in the format 'p#' where # is the age counting day of birth as day 0 (all are p9, i.e. postnatal day 9) |
| 11 | date of the recording in the format yyyy-mm-dd |
| 12 | time of the recording in the format hh-mm-ss |

F2 data set file naming conventions

| index | description |
| --- | --- |
| 0 | microphone channel used to record the pup. This is a copy of the information in index 6 added automatically by Avisoft recording software. |
| 1 | species where BW-PO-cross-F2 indicates a second generation hybrid (all files in this dataset are the same at this index since they are all F2 pups) |
| 2 | ID of the litter's dam and sire in the format 'damIDxsireID'. |
| 3 | family the pup came from in the format 'fam-N#' where N is the founder family its parents came from and # is the F1 breeding pair number from that family (e.g., third F1 pair generated from founder pair A is fam-A3) |
| 4 | litter number from the dam and sire (this value is approximate and may not be accurate for every pup) |
| 5 | pup number from the litter (order in which pups were removed from their home cage for recording, using the convention that the first pup removed is pup1) |
| 6 | microphone channel used to record the pup |
| 7 | weight of the pup in milligrams |
| 8 | sex of the pup determined by anogenital distance (m=male, f=female) |
| 9 | temperature of the pup in degrees C immediately before recording (multiplied by 10 to avoid introducing '.') |
| 10 | temperature of the pup in degrees C immediately after recording (multiplied by 10 to avoid introducing '.') |
| 11 | whether or not the pup had to be removed from the dam by the experimenter (fr0=no, fr1=yes; fr stands for 'forcibly removed' while suckling) |
| 12 | age of the pup in days in the format 'p#' where # is the age counting day of birth as day 0 (all are p9, i.e. postnatal day 9) |
| 13 | date of the recording in the format yyyy-mm-dd |
| 14 | time of the recording in the format hh-mm-ss |
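F2 file names have two extra fields compared with the development convention: the leading channel copy at index 0 and the family at index 3. The sketch below shows one way to account for that offset and to decode the 'fam-N#' field. The example file name in the comment is hypothetical, the names F2_FIELDS and parse_f2_name are illustrative, and the code assumes a single-letter founder family, as in the A-D crosses described for the F1 data set.

```python
from pathlib import Path

# Field names (illustrative) for indices 0-14 of the F2 convention.
F2_FIELDS = [
    "channel_avisoft", "species", "parents", "family", "litter", "pup",
    "channel", "weight_mg", "sex", "temp_before_x10", "temp_after_x10",
    "forcibly_removed", "age", "date", "time",
]


def parse_f2_name(file_name):
    """Split an F2 wav file name into a dictionary of fields."""
    parts = Path(file_name).stem.split("_")
    if len(parts) != len(F2_FIELDS):
        raise ValueError(f"expected {len(F2_FIELDS)} fields, got {len(parts)}")
    info = dict(zip(F2_FIELDS, parts))
    family = info["family"]
    if not family.startswith("fam-"):
        raise ValueError(f"unexpected family field: {family!r}")
    tag = family[len("fam-"):]
    # Assumes one founder letter followed by the F1 breeding pair number.
    info["founder_family"] = tag[0]
    info["f1_pair"] = int(tag[1:])
    # Index 0 should repeat the channel recorded at index 6.
    info["channel_matches"] = info["channel_avisoft"] == info["channel"]
    info["age_days"] = int(info.pop("age")[1:])
    return info


# Hypothetical file name that follows the convention:
# parse_f2_name("ch1_BW-PO-cross-F2_26345x26344_fam-A3_ltr2_pup4_ch1_5100_f_"
#               "341_301_fr0_p9_2021-11-20_09-15-42.wav")
```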

Processed data

processed_data.tar.gz contains processed data (data tables and machine learning models) used to make figures.
These files were generated using the code in the GitHub repository https://github.com/nickjourjine/peromyscus-pup-vocal-evolution and are organized into sub-directories, one for each main and supplemental figure.
Each of these directories is described in the table below along with a reference to the related notebook and markdown section at https://github.com/nickjourjine/peromyscus-pup-vocal-evolution that uses the data.
Please refer to the README.md file in that repository for additional details about how to generate and use the data in these files.

| directory | file(s) | description | related notebooks (and markdown section) |
| --- | --- | --- | --- |
| figure_1/umap_embeddings | all_species_HDBSCAN_labels.csv | data table of umap embedding cluster labels where each row is a vocalization and columns correspond to vocalization wav file name ('source_file'), HDBSCAN cluster ('label'), and species (using 2-letter codes above) - used to make figure 1 panel C | Analyze Vocalizations.ipynb (sections 2.1 and 2.2) |
| figure_1/umap_embeddings | NN_embedding_coordinates.feather (10 files, one per taxon) | where NN is one of the 10 2-letter species codes above - these are tables where each row is a linearized spectrogram (one per vocalization) and columns are pixel numbers and umap embedding coordinates - used to make figure 1 panel C | Analyze Vocalizations.ipynb (sections 2.3, 2.4, and 2.5) |
| figure_1/acoustic_features | all_species_warbler_features.csv | data table of acoustic features used to generate Figures 1D, E, and F where each row corresponds to a vocalization and columns are features | Analyze Vocalizations.ipynb (sections 2.3, 2.4, and 2.5) |
| figure_1/acoustic_features | all_noise_floors.csv | data table of spectrogram pixel values defining threshold for background noise for each vocalization in the development dataset, generated by Spectrogramming and UMAP.ipynb notebook | Analyze Vocalizations.ipynb (sections 2.2 and 3.4) |
| figure_1/acoustic_features | <data_set>_recording_lengths.json | where data_set is one of bw_po_cf (cross foster dataset), bw_po_f1 (F1 dataset), bw_po_f2 (F2 dataset), or development (development dataset) - dictionaries of recording lengths for each recording, used to determine vocalization rates without recalculating recording lengths each time | Analyze Vocalizations.ipynb (section 3) |
| figure_2 | annotated_vocalizations.csv | data table of the vocalizations annotated in the Annotate from UMAP.ipynb notebook where each row is a vocalization and columns are vocalization wav file name ('source_file'), umap embedding coordinates ('umap1' and 'umap2'), hdbscan label ('hdbscan_label'), annotated label ('human_label'), and species | Train Models on Features.ipynb (sections 2, 3, and 4); Analyze Vocalizations.ipynb (section 3.6) |
| figure_2 | development_vocalizations_clipping_levels.csv | data table of clipping levels for the development dataset calculated in Segmentation and UMAP.ipynb - each row is a vocalization and columns are vocalization wav file name ('source_file'), percent audio that is clipped ('percent_clipped'), and clipping threshold ('clipping_threshold') | Analyze Vocalizations.ipynb (section 3) |
| figure_2 | figure_2A_data.csv | data table where each row is a vocalization and columns are acoustic features and annotated labels (cry or USV) - used to train the models evaluated in figure 2A | Train Models on Features.ipynb (sections 3.1, 3.2, and 3.3) |
| figure_2 | figure_2B_data.csv | data table of performance metrics for random forest models trained on varying numbers of vocalizations from each taxon - used to train the models evaluated in figure 2B | Train Models on Features.ipynb (sections 3.5 and 3.6) |
| figure_2 | random_forest_model_cry.pkl | random forest model evaluated in Figure 2A, left | Train Models on Features.ipynb (section 3.4) |
| figure_2 | random_forest_model_USV.pkl | random forest model evaluated in Figure 2A, right | Train Models on Features.ipynb (section 3.4) |
| figure_2 | figure2CD_pups_data.csv | data table where each row is a pup and columns are aggregate acoustic features - used to generate the vocalization rate panels in figure 2 panels C and D | Analyze Vocalizations.ipynb (section 3.7) |
| figure_2 | figure2CD_vocs_data.csv | data table where each row is a vocalization and columns are acoustic features - used to generate the duration and mean frequency panels in figure 2 panels C and D | Analyze Vocalizations.ipynb (section 3.7) |
| figure_3 | playback_data.csv | data table where each row is a dam and columns are descriptive statistics of dam behavior during audio playback of cries and USVs | Analyze Playback.ipynb (sections 3 and 4) |
| figure_4 | all_bw_po_cf_clipping.csv | data table of clipping levels for the cross foster dataset calculated in Segmentation and UMAP.ipynb - each row is a vocalization and columns are vocalization wav file name ('source_file'), percent audio that is clipped ('percent_clipped'), and clipping threshold ('clipping_threshold') | Analyze Vocalizations.ipynb (section 4) |
| figure_4 | figure4_pups_data.csv | data table of acoustic features aggregated by pup where each row is a pup and columns are features - used to generate figure 4 panels B and D | Analyze Vocalizations.ipynb (section 4) |
| figure_4 | figure4_pups_cry_pca.csv | data table of cry acoustic features and PCA coordinates - used to generate figure 4 panel C | Analyze Vocalizations.ipynb (section 4) |
| figure_4 | figure4_pups_USV_pca.csv | data table of USV acoustic features and PCA coordinates - used to generate figure 4 panel E | Analyze Vocalizations.ipynb (section 4) |
| figure_5 | all_bw_po_f1_clipping.csv | data table of clipping levels for vocalizations in the F1 dataset calculated in Segmentation and UMAP.ipynb - each row is a vocalization and columns are vocalization wav file name ('source_file'), percent audio that is clipped ('percent_clipped'), and clipping threshold ('clipping_threshold') | Analyze Vocalizations.ipynb (section 5) |
| figure_5 | all_bw_po_f2_clipping.csv | data table of clipping levels for vocalizations in the F2 dataset calculated in Segmentation and UMAP.ipynb - each row is a vocalization and columns are vocalization wav file name ('source_file'), percent audio that is clipped ('percent_clipped'), and clipping threshold ('clipping_threshold') | Analyze Vocalizations.ipynb (section 5) |
| figure_5 | figure5_pups_data.csv | data table of acoustic features aggregated by pup where each row is a pup and columns are features - used to generate figure 5 panels B, D, F, G, and H | Analyze Vocalizations.ipynb (sections 5.4 and 5.5) |
| figure_5 | figure5_pups_cry_pca.csv | data table of cry acoustic features and PCA coordinates - used to generate figure 5 panel C | Analyze Vocalizations.ipynb (section 5.4) |
| figure_5 | figure5_pups_USV_pca.csv | data table of USV acoustic features and PCA coordinates - used to generate figure 5 panel E | Analyze Vocalizations.ipynb (section 5.4) |
| supplemental_figure_1/umap_embeddings | NN_embedding_coordinates_labeled.feather (10 files, one per taxon) | where NN is one of the 10 2-letter species codes above - these are copies of the files in figure_1/umap_embeddings but with a column for the label given by hdbscan in the Annotate from UMAP.ipynb notebook - used to generate supplemental figure 1 | Analyze Vocalizations.ipynb (section 6) |
| supplemental_figure_1/acoustic_features | NNwarbler_features.csv (10 files, one per taxon) | where NN is one of the 10 2-letter species codes above - these are copies of the data in figure_1/acoustic_features/all_species_warbler_features.csv but split up into one csv per species - used to generate supplemental figure 1 | Analyze Vocalizations.ipynb (section 6) |
| supplemental_figure_1/acoustic_features | all_development_SPL.csv | data table where each row is a vocalization and columns are vocalization wav file name ('source_file'), species, and sound pressure level (SPL) calculated with warbleR - used to generate supplemental figure 1 | Analyze Vocalizations.ipynb (section 6) |
| supplemental_figure_2 | supplement_figure2_cry_vocs_data.csv | data table of cry acoustic features and PCA coordinates - used to generate supplemental figure 2 panels B and D | Analyze Vocalizations.ipynb (section 3.8) |
| supplemental_figure_2 | supplement_figure2_USV_vocs_data.csv | data table of USV acoustic features and PCA coordinates - used to generate supplemental figure 2 panels C and E | Analyze Vocalizations.ipynb (section 3.8) |
| supplemental_figure_3 | nonvocal_acoustic_features.csv | data table of nonvocal sounds (one per row) and acoustic features (columns) used to generate supplemental_figure_3_data.csv | Train Models on Features.ipynb (section 4); Analyze Vocalizations.ipynb (section 3.4) |
| supplemental_figure_3 | supplemental_figure_3_data.csv | data table of vocalizations (one per row) and acoustic features (columns) used to train random forest model for predicting 'cry' and 'USV' labels in figures 2, 4, and 5 | Train Models on Features.ipynb (section 4); Analyze Vocalizations.ipynb (section 3.4) |
| supplemental_figure_3 | random_forest_voc_type_model.pkl | random forest model trained on the data in supplemental_figure_3_data.csv | Train Models on Features.ipynb (section 4); Analyze Vocalizations.ipynb (section 3.4) |
| supplemental_figure_4 | figure2CD_pups_data.csv | this is a copy of the data table described above (figure_2/figure2CD_pups_data.csv) - used to generate supplemental figure 4 | Analyze Vocalizations.ipynb (section 7) |
| supplemental_figure_4 | figure2CD_vocs_data.csv | this is a copy of the data table described above (figure_2/figure2CD_vocs_data.csv) - used to generate supplemental figure 4 | Analyze Vocalizations.ipynb (section 7) |
| supplemental_figure_5 | all_development_vocs_with_predictions.csv | data table of vocalizations in the development dataset where each row is a vocalization and columns are acoustic features and the label predicted by the model supplemental_figure_3/random_forest_voc_type_model.pkl - used to generate supplemental_figure_5_data.csv | Analyze Vocalizations.ipynb (section 8) |
| supplemental_figure_5 | all_development_vocs_with_start_stop_times.csv | data table of vocalizations in the development dataset where each row is a vocalization and columns are wav file the vocalization came from ('source_file'), its start and stop time in that wav file ('start_seconds' and 'stop_seconds'), and species - used to generate supplemental_figure_5_data.csv | Analyze Vocalizations.ipynb (section 8) |
| supplemental_figure_5 | supplemental_figure_5_data.csv | data table of interonset intervals for vocalizations in the development dataset - used to generate supplemental figure 5 panels B and C | Analyze Vocalizations.ipynb (section 8) |
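Two of the derived quantities mentioned above, vocalization rate and interonset interval, can be sketched from the per-vocalization tables. This is not the code used in the repository notebooks. It assumes that each recording-lengths JSON file is a flat mapping from wav file name to recording length in seconds, that its keys match the 'source_file' values in the start/stop table, and that rows are read with csv.DictReader. All function names are illustrative.

```python
import csv
import json
from collections import defaultdict


def load_rows(csv_path):
    """Read a csv data table into a list of dictionaries, one per row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def load_recording_lengths(json_path):
    """Read a recording-lengths file (assumed: wav file name -> seconds)."""
    with open(json_path) as handle:
        return json.load(handle)


def vocalization_rates(rows, recording_lengths):
    """Vocalizations per second for every recording with a known length."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["source_file"]] += 1
    # Recordings with no detected vocalizations get a rate of 0.
    return {name: counts[name] / seconds
            for name, seconds in recording_lengths.items() if seconds > 0}


def interonset_intervals(rows):
    """Differences between successive start times within each recording."""
    onsets = defaultdict(list)
    for row in rows:
        onsets[row["source_file"]].append(float(row["start_seconds"]))
    intervals = {}
    for name, starts in onsets.items():
        starts.sort()
        intervals[name] = [later - earlier
                           for earlier, later in zip(starts, starts[1:])]
    return intervals
```

Iterating over the recording lengths rather than over the vocalization counts keeps silent recordings in the result, which matters when comparing rates across pups.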

Funding

Howard Hughes Medical Institute

Jane Coffin Childs Memorial Fund for Medical Research

International Human Frontier Science Program Organization