Data from: Two pup vocalization types are genetically and functionally separable in deer mice
Data files
Three versions are listed: Mar 03, 2023 (191.39 GB), Mar 08, 2023 (191.40 GB), and Apr 19, 2024 (197.51 GB). Each version contains the same files:
- bw_po_cf.tar.gz
- bw_po_f1.tar.gz
- bwpof2-1.tar.gz
- bwpof2-2.tar.gz
- bwpof2-3.tar.gz
- bwpof2-4.tar.gz
- bwpof2-5.tar.gz
- bwpof2-6.tar.gz
- developmentBK.tar.gz
- developmentBW.tar.gz
- developmentGO.tar.gz
- developmentLL.tar.gz
- developmentLO.tar.gz
- developmentMU.tar.gz
- developmentMZ.tar.gz
- developmentNB.tar.gz
- developmentPO.tar.gz
- developmentSW.tar.gz
- processed_data.tar.gz
- README.md
Abstract
Vocalization is a widespread social behavior in vertebrates that can affect fitness in the wild. Although many vocal behaviors are highly conserved, heritable features of specific vocalization types can vary both within and between species, raising questions of why and how some vocal behaviors evolve. Here, using new computational tools to automatically detect and cluster vocalizations into distinct acoustic categories, we compare pup isolation calls across neonatal development in eight taxa of deer mice (genus Peromyscus) and compare them with laboratory mice (C57BL6/J strain) and free-living, wild house mice (Mus musculus domesticus). Whereas both Peromyscus and Mus pups produce ultrasonic vocalizations (USVs), Peromyscus pups also produce a second call type with acoustic features, temporal rhythms, and developmental trajectories that are distinct from those of USVs. In deer mice, these lower frequency “cries” are predominantly emitted in postnatal days one through nine, whereas USVs are primarily made after day 9. Using playback assays, we show that cries result in a more rapid approach by Peromyscus mothers than USVs, suggesting a role for cries in eliciting parental care early in neonatal development. Using a genetic cross between two sister species of deer mice exhibiting large, innate differences in the acoustic structure of cries and USVs, we find that variation in vocalization rate, duration, and pitch displays different degrees of genetic dominance and that cry and USV features can be uncoupled in second-generation hybrids. Taken together, this work shows that vocal behavior can evolve quickly between closely related rodent species in which vocalization types, likely serving distinct functions in communication, are controlled by distinct genetic loci.
README
Peromyscus pup vocal evolution Dataset
Versions
- April 2024: file developmentMU.tar.gz reuploaded to resolve issue with original file
How to use
This dataset contains raw audio recordings and processed data used to perform analyses and generate figures from Jourjine et al., Current Biology 2023.
The files are stored as compressed directories. To uncompress them, double-click them or run the following on the command line: `tar -xvf full/path/to/file.tar.gz`
The contents of these directories are described below. Please see the GitHub repository https://github.com/nickjourjine/peromyscus-pup-vocal-evolution for code and instructions on how to use them to reproduce analyses and figures.
Two-letter codes
We use the following two-letter codes as shorthand for each taxon we analyze:
code | taxon |
---|---|
BW | P. maniculatus bairdii |
BK | P. maniculatus gambelii |
SW | P. maniculatus rubidus |
NB | P. maniculatus nubiterrae |
PO | P. polionotus subgriseus |
LO | P. polionotus leucocephalus |
GO | P. gossypinus |
LL | P. leucopus |
MU | Mus musculus domesticus (C57BL6/J) |
MZ | Mus musculus domesticus (wild) |
Audio datasets
There are four sets of recordings that constitute the raw data: development, cross foster, F1, and F2.
Because of file upload limitations, the development and F2 datasets are split into parts.
All of the directories with the prefix "bwpof2" belong to the F2 dataset (six directories, split into approximately 100 recordings per directory, one .wav file per recorded pup).
All of the directories with the prefix "development" belong to the development dataset (ten directories, one directory per taxon, one .wav file per recorded pup).
The easiest way to use the development and F2 datasets is to extract each of these archives and collect all of the .wav files for each dataset into its own directory (i.e., one for all of the development .wav files and one for all of the F2 .wav files).
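The extract-and-collect step described above can be sketched in Python using only the standard library. This is a minimal illustration, not part of the authors' code: the function name `extract_and_collect` and the directory names in the usage note are hypothetical, and it assumes that .wav file names are unique across archives.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_and_collect(archive_dir, output_dir, prefix):
    """Extract every '<prefix>*.tar.gz' archive and move its .wav files into output_dir."""
    archive_dir = Path(archive_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    collected = []
    for archive in sorted(archive_dir.glob(f"{prefix}*.tar.gz")):
        # Extract into a scratch directory that is deleted once the .wav files are moved.
        with tempfile.TemporaryDirectory(dir=output_dir) as scratch:
            with tarfile.open(archive, "r:gz") as tar:
                tar.extractall(scratch)
            for wav in sorted(Path(scratch).rglob("*.wav")):
                destination = output_dir / wav.name
                shutil.move(str(wav), str(destination))
                collected.append(destination)
    return collected
```

For example, `extract_and_collect("downloads", "f2_wavs", "bwpof2")` would gather the F2 recordings, and `extract_and_collect("downloads", "development_wavs", "development")` the development recordings, assuming the archives were downloaded into a directory named `downloads`.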
All of the directories containing unprocessed raw audio are described in the table below:
file | dataset | file type(s) | number of files | associated main figure(s) | description |
---|---|---|---|---|---|
developmentBK.tar.gz | development | .wav | 98 | 1,2 | Audio recordings of isolation-induced P. maniculatus gambelii pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentBW.tar.gz | development | .wav | 80 | 1,2 | Audio recordings of isolation-induced P. maniculatus bairdii pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentNB.tar.gz | development | .wav | 72 | 1,2 | Audio recordings of isolation-induced P. maniculatus nubiterrae pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentSW.tar.gz | development | .wav | 73 | 1,2 | Audio recordings of isolation-induced P. maniculatus rubidus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentPO.tar.gz | development | .wav | 76 | 1,2 | Audio recordings of isolation-induced P. polionotus subgriseus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentLO.tar.gz | development | .wav | 66 | 1,2 | Audio recordings of isolation-induced P. polionotus leucocephalus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentGO.tar.gz | development | .wav | 68 | 1,2 | Audio recordings of isolation-induced P. gossypinus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentLL.tar.gz | development | .wav | 63 | 1,2 | Audio recordings of isolation-induced P. leucopus pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentMU.tar.gz | development | .wav | 116 | 1,2 | Audio recordings of isolation-induced Mus musculus domesticus (C57BL6/J) pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
developmentMZ.tar.gz | development | .wav | 111 | 1,2 | Audio recordings of isolation-induced Mus musculus domesticus (wild) pup vocalizations between postnatal days 1 and 13 (day of birth = day 0) |
bw_po_cf.tar.gz | cross foster | .wav | 58 | 4 | Audio recordings of isolation-induced P. maniculatus bairdii and P. polionotus subgriseus pup vocalizations at postnatal day 9, either raised by their own parents or cross fostered |
bw_po_f1.tar.gz | F1 recordings | .wav | 119 | 5 | Audio recordings of isolation-induced pup vocalizations from first generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
bwpof2-1.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
bwpof2-2.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
bwpof2-3.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
bwpof2-4.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
bwpof2-5.tar.gz | F2 recordings | .wav | 100 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
bwpof2-6.tar.gz | F2 recordings | .wav | 117 | 5 | Audio recordings of isolation-induced pup vocalizations from second generation hybrids between P. maniculatus bairdii and P. polionotus subgriseus |
File name conventions
Each .wav file in the above directories is named by joining specific pieces of information about the recorded pup with underscores ('_').
These conventions are described in the tables below, where index refers to the position in the list generated by splitting the file name on the '_' character (e.g., using the split() method in Python).
Developmental data set file naming conventions
index | description |
---|---|
0 | species (using 2-letter code described in table above) |
1 | ID of the litter's dam and sire in the format 'damIDxsireID' |
2 | litter number from this dam and sire (this value is approximate and may not be accurate for every pup) |
3 | pup number from the litter (order in which pups were removed from their home cage for recording, using the convention that the first pup removed is pup1) |
4 | microphone channel used to record the pup |
5 | weight of the pup in milligrams |
6 | sex of the pup determined by anogenital distance (m=male, f=female) |
7 | temperature of the pup in degrees C immediately before recording (multiplied by 10 to avoid introducing '.') |
8 | temperature of the pup in degrees C immediately after recording (multiplied by 10 to avoid introducing '.') |
9 | whether or not the pup had to be removed from the dam by the experimenter (fr0=no, fr1=yes; fr stands for 'forcibly removed' while suckling) |
10 | age of the pup in days in the format 'p#' where # is the age counting day of birth as day 0 |
11 | date of the recording in the format yyyy-mm-dd |
12 | time of the recording in the format hh-mm-ss |
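As an illustration of the development naming convention above, the following sketch splits a file name on underscores and converts the numeric fields. The function name, the dictionary keys, and the example file name (including its IDs and the 'ltr1', 'pup1', and 'ch1' tokens) are hypothetical. It assumes every numeric field is present as an integer, which may not hold for every recording.

```python
from datetime import datetime
from pathlib import Path

# Labels chosen for this sketch; their order follows the index table above.
DEVELOPMENT_FIELDS = (
    "species", "parents", "litter", "pup", "channel", "weight_mg", "sex",
    "temp_before", "temp_after", "removed", "age", "date", "time",
)


def parse_development_name(file_name):
    """Split a development-dataset file name into a dictionary of typed fields."""
    parts = Path(file_name).stem.split("_")
    if len(parts) != len(DEVELOPMENT_FIELDS):
        raise ValueError(f"expected {len(DEVELOPMENT_FIELDS)} fields, got {len(parts)}: {file_name}")
    raw = dict(zip(DEVELOPMENT_FIELDS, parts))
    dam_id, _, sire_id = raw["parents"].partition("x")  # 'damIDxsireID'
    return {
        "species": raw["species"],
        "dam_id": dam_id,
        "sire_id": sire_id,
        "litter": raw["litter"],
        "pup": raw["pup"],
        "channel": raw["channel"],
        "weight_mg": int(raw["weight_mg"]),
        "sex": raw["sex"],
        # Temperatures are stored multiplied by 10, so divide to recover degrees C.
        "temp_before_c": int(raw["temp_before"]) / 10,
        "temp_after_c": int(raw["temp_after"]) / 10,
        "forcibly_removed": raw["removed"] == "fr1",
        "age_days": int(raw["age"].lstrip("p")),
        "recorded_at": datetime.strptime(f"{raw['date']}_{raw['time']}", "%Y-%m-%d_%H-%M-%S"),
    }


# Hypothetical file name built from the convention, not taken from the dataset.
example = "BW_1111x2222_ltr1_pup1_ch1_3100_m_337_299_fr1_p5_2021-10-02_12-35-01.wav"
info = parse_development_name(example)
print(info["species"], info["age_days"], info["temp_before_c"], info["recorded_at"])
```

The F1 file names share the same 13-field layout, so the same approach should apply to them, with index 0 holding values such as BW-PO-cross-F1.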
Cross foster data set file naming conventions
index | description |
---|---|
0 | species (using 2-letter code described in table above). CF-BW indicates a BW pup fostered by PO. CF-PO indicates a PO pup fostered by BW. |
1 | if pup was not cross fostered, this is the ID of the litter's dam and sire in the format 'damIDxsireID'. If cross fostered, it is the ID of the litter's dam and sire in the format 'damIDxsireID' followed by a '-' then the ID of the foster dam and sire in the format 'cfdamID-cfsireID' |
2 | pup number from the litter (order in which pups were removed from their home cage for recording, using the convention that the first pup removed is pup1) |
3 | age of the pup in days in the format 'p##' where ## is the age counting day of birth as day 0 (all are p09, i.e. postnatal day 9) |
4 | weight of the pup in milligrams |
5 | sex of the pup determined by anogenital distance (m=male, f=female) |
6 | number of pups in the litter the pup came from |
7 | date of the recording in the format yyyy-mm-dd |
8 | time of the recording in the format hh-mm-ss |
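The cross foster layout differs from the development layout, so a separate sketch may help. It derives the species and fostering status from index 0 and leaves the parent IDs at index 1 as a raw string. The function name, dictionary keys, and both example file names are hypothetical, and the exact form of the litter-size field is an assumption.

```python
from pathlib import Path


def parse_cross_foster_name(file_name):
    """Split a cross-foster file name into fields following the index table above."""
    parts = Path(file_name).stem.split("_")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}: {file_name}")
    group, parents, pup, age, weight, sex, litter_size, date, time = parts
    # 'CF-BW' marks a BW pup fostered by PO; 'CF-PO' marks a PO pup fostered by BW.
    cross_fostered = group.startswith("CF-")
    return {
        "species": group[3:] if cross_fostered else group,
        "cross_fostered": cross_fostered,
        "parents": parents,  # kept raw: includes foster parent IDs when cross fostered
        "pup": pup,
        "age_days": int(age.lstrip("p")),
        "weight_mg": int(weight),
        "sex": sex,
        "litter_size": litter_size,  # kept raw: exact token format is not assumed
        "date": date,
        "time": time,
    }


# Hypothetical file names built from the convention, not taken from the dataset.
fostered = parse_cross_foster_name("CF-BW_1111x2222-3333-4444_pup2_p09_4200_f_5_2022-03-14_10-20-30.wav")
own = parse_cross_foster_name("PO_5555x6666_pup1_p09_3900_m_4_2022-03-15_11-00-00.wav")
print(fostered["species"], fostered["cross_fostered"], own["species"], own["cross_fostered"])
```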
F1 data set file naming conventions
index | description |
---|---|
0 | species where BW-PO-cross-F1 indicates a first generation hybrid, cross-BW indicates P. maniculatus bairdii and cross-PO indicates P. polionotus subgriseus |
1 | ID of the litter's dam and sire. If not F1, in the format 'damIDxsireID'. If F1, the format is 'damIDxsireID-family-N', where N is one of A, B, C, or D and indicates which of four independent crosses between P. maniculatus bairdii and P. polionotus subgriseus the F1 pup came from |
2 | litter number from this dam and sire (this value is approximate and may not be accurate for every pup) |
3 | pup number from the litter (order in which pups were removed from their home cage for recording, using the convention that the first pup removed is pup1) |
4 | microphone channel used to record the pup |
5 | weight of the pup in milligrams |
6 | sex of the pup determined by anogenital distance (m=male, f=female) |
7 | temperature of the pup in degrees C immediately before recording (multiplied by 10 to avoid introducing '.') |
8 | temperature of the pup in degrees C immediately after recording (multiplied by 10 to avoid introducing '.') |
9 | whether or not the pup had to be removed from the dam by the experimenter (fr0=no, fr1=yes; fr stands for 'forcibly removed' while suckling) |
10 | age of the pup in days in the format 'p#' where # is the age counting day of birth as day 0 (all are p9, i.e. postnatal day 9) |
11 | date of the recording in the format yyyy-mm-dd |
12 | time of the recording in the format hh-mm-ss |
F2 data set file naming conventions
index | description |
---|---|
0 | microphone channel used to record the pup. This is a copy of the information in index 6 added automatically by Avisoft recording software. |
1 | species where BW-PO-cross-F2 indicates a second generation hybrid (all files in this dataset are the same at this index since they are all F2 pups) |
2 | ID of the litter's dam and sire in the format 'damIDxsireID'. |
3 | family the pup came from in the format 'fam-N#' where N is the founder family its parents came from and # is the F1 breeding pair number from that family (e.g., the third F1 pair generated from founder pair A is fam-A3) |
4 | litter number from the dam and sire (this value is approximate and may not be accurate for every pup) |
5 | pup number from the litter (order in which pups were removed from their home cage for recording, using the convention that the first pup removed is pup1) |
6 | microphone channel used to record the pup |
7 | weight of the pup in milligrams |
8 | sex of the pup determined by anogenital distance (m=male, f=female) |
9 | temperature of the pup in degrees C immediately before recording (multiplied by 10 to avoid introducing '.') |
10 | temperature of the pup in degrees C immediately after recording (multiplied by 10 to avoid introducing '.') |
11 | whether or not the pup had to be removed from the dam by the experimenter (fr0=no, fr1=yes; fr stands for 'forcibly removed' while suckling) |
12 | age of the pup in days in the format 'p#' where # is the age counting day of birth as day 0 (all are p9, i.e. postnatal day 9) |
13 | date of the recording in the format yyyy-mm-dd |
14 | time of the recording in the format hh-mm-ss |
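Because the F2 file names carry two extra leading fields, a short sketch of how the family field at index 3 could be decoded may be useful. The function name, the regular expression, the dictionary keys, and the example file name are assumptions made for illustration. The check that index 0 repeats index 6 simply follows the table above.

```python
import re
from pathlib import Path

# 'fam-N#': N is the founder family, # is the F1 breeding pair number.
FAMILY_PATTERN = re.compile(r"fam-([A-Za-z]+)(\d+)")


def parse_f2_family(file_name):
    """Extract founder family, F1 pair number, and channel fields from an F2 file name."""
    parts = Path(file_name).stem.split("_")
    if len(parts) != 15:
        raise ValueError(f"expected 15 fields, got {len(parts)}: {file_name}")
    match = FAMILY_PATTERN.fullmatch(parts[3])
    if match is None:
        raise ValueError(f"unexpected family field: {parts[3]}")
    return {
        "parents": parts[2],
        "founder_family": match.group(1),
        "f1_pair": int(match.group(2)),
        "channel": parts[6],
        # Index 0 is described as a copy of index 6 added by the recording software.
        "channel_matches_prefix": parts[0] == parts[6],
    }


# Hypothetical file name built from the convention, not taken from the dataset.
example = "ch1_BW-PO-cross-F2_1111x2222_fam-A3_ltr1_pup4_ch1_5100_m_340_310_fr0_p9_2022-05-01_09-15-00.wav"
print(parse_f2_family(example))
```

Grouping pups by `founder_family` and `f1_pair` is one way to keep track of relatedness when summarizing F2 recordings.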
Processed data
processed_data.tar.gz contains processed data (data tables and machine learning models) used to make figures.
These files were generated using the code in the GitHub repository https://github.com/nickjourjine/peromyscus-pup-vocal-evolution and are organized into subdirectories, one for each main and supplemental figure.
Each of these directories is described in the table below, along with a reference to the related notebook and markdown section at https://github.com/nickjourjine/peromyscus-pup-vocal-evolution that uses the data.
Please refer to the README.md file in that repository for additional details about how to generate and use the data in these files.
directory | file(s) | description | related notebooks (and markdown section) |
---|---|---|---|
figure_1/umap_embeddings | all_species_HDBSCAN_labels.csv | data table of umap embedding cluster labels where each row is a vocalization and columns correspond to vocalization wav file name ('source_file'), HDBSCAN cluster ('label'), and species (using 2-letter codes above) - used to make figure 1 panel C | Analyze Vocalizations.ipynb (sections 2.1 and 2.2) |
figure_1/umap_embeddings | NN_embedding_coordinates.feather (10 files, one per taxon) | where NN is one of the 10 2-letter species codes above - these are tables where each row is a linearized spectrogram (one per vocalization) and columns are pixel numbers and umap embedding coordinates - used to make figure 1 panel C | Analyze Vocalizations.ipynb (sections 2.3, 2.4, and 2.5) |
figure_1/acoustic_features | all_species_warbler_features.csv | data table of acoustic features used to generate Figures 1D, E, and F where each row corresponds to a vocalization and columns are features | Analyze Vocalizations.ipynb (sections 2.3, 2.4, and 2.5) |
figure_1/acoustic_features | all_noise_floors.csv | data table of spectrogram pixel values defining threshold for background noise for each vocalization in the development dataset, generated by Spectrogramming and UMAP.ipynb notebook | Analyze Vocalizations.ipynb (sections 2.2 and 3.4) |
figure_1/acoustic_features | <data_set>_recording_lengths.json | where data_set is one of bw_po_cf (cross foster dataset), bw_po_f1 (F1 dataset), bw_po_f2 (F2 dataset), or development (development dataset) - dictionaries of recording lengths for each recording, used to determine vocalization rates without recalculating recording lengths each time | Analyze Vocalizations.ipynb (section 3) |
figure_2 | annotated_vocalizations.csv | data table of the vocalizations annotated in the Annotate from UMAP.ipynb notebook where each row is a vocalization and columns are vocalization wav file name ('source_file'), umap embedding coordinates ('umap1' and 'umap2'), hdbscan label ('hdbscan_label'), annotated label ('human_label'), and species | Train Models on Features.ipynb (sections 2, 3, and 4); Analyze Vocalizations.ipynb (section 3.6) |
figure_2 | development_vocalizations_clipping_levels.csv | data table of clipping levels for the development dataset calculated in Segmentation and UMAP.ipynb - each row is a vocalization and columns are vocalization wav file name ('source_file'), percent audio that is clipped ('percent_clipped'), and clipping threshold ('clipping_threshold') | Analyze Vocalizations.ipynb (section 3) |
figure_2 | figure_2A_data.csv | data table where each row is a vocalization and columns are acoustic features and annotated labels (cry or USV) - used to train the models evaluated in figure 2A | Train Models on Features.ipynb (sections 3.1, 3.2, and 3.3) |
figure_2 | figure_2B_data.csv | data table of performance metrics for random forest models trained on varying numbers of vocalizations from each taxon - used to train the models evaluated in figure 2B | Train Models on Features.ipynb (sections 3.5 and 3.6) |
figure_2 | random_forest_model_cry.pkl | random forest model evaluated in Figure 2A, left | Train Models on Features.ipynb (section 3.4) |
figure_2 | random_forest_model_USV.pkl | random forest model evaluated in Figure 2A, right | Train Models on Features.ipynb (section 3.4) |
figure_2 | figure2CD_pups_data.csv | data table where each row is a pup and columns are aggregate acoustic features - used to generate the vocalization rate panels in figure 2 panels C and D | Analyze Vocalizations.ipynb (section 3.7) |
figure_2 | figure2CD_vocs_data.csv | data table where each row is a vocalization and columns are acoustic features - used to generate the duration and mean frequency panels in figure 2 panels C and D | Analyze Vocalizations.ipynb (section 3.7) |
figure_3 | playback_data.csv | data table where each row is a dam and columns are descriptive statistics of dam behavior during audio playback of cries and USVs | Analyze Playback.ipynb (sections 3 and 4) |
figure_4 | all_bw_po_cf_clipping.csv | data table of clipping levels for the cross foster dataset calculated in Segmentation and UMAP.ipynb - each row is a vocalization and columns are vocalization wav file name ('source_file'), percent audio that is clipped ('percent_clipped'), and clipping threshold ('clipping_threshold') | Analyze Vocalizations.ipynb (section 4) |
figure_4 | figure4_pups_data.csv | data table of acoustic features aggregated by pup where each row is a pup and columns are features - used to generate figure 4 panels B and D | Analyze Vocalizations.ipynb (section 4) |
figure_4 | figure4_pups_cry_pca.csv | data table of cry acoustic features and PCA coordinates - used to generate figure 4 panel C | Analyze Vocalizations.ipynb (section 4) |
figure_4 | figure4_pups_USV_pca.csv | data table of USV acoustic features and PCA coordinates - used to generate figure 4 panel E | Analyze Vocalizations.ipynb (section 4) |
figure_5 | all_bw_po_f1_clipping.csv | data table of clipping levels for vocalizations in the F1 dataset calculated in Segmentation and UMAP.ipynb - each row is a vocalization and columns are vocalization wav file name ('source_file'), percent audio that is clipped ('percent_clipped'), and clipping threshold ('clipping_threshold') | Analyze Vocalizations.ipynb (section 5) |
figure_5 | all_bw_po_f2_clipping.csv | data table of clipping levels for vocalizations in the F2 dataset calculated in Segmentation and UMAP.ipynb - each row is a vocalization and columns are vocalization wav file name ('source_file'), percent audio that is clipped ('percent_clipped'), and clipping threshold ('clipping_threshold') | Analyze Vocalizations.ipynb (section 5) |
figure_5 | figure5_pups_data.csv | data table of acoustic features aggregated by pup where each row is a pup and columns are features - used to generate figure 5 panels B, D, F, G, and H | Analyze Vocalizations.ipynb (sections 5.4 and 5.5) |
figure_5 | figure5_pups_cry_pca.csv | data table of cry acoustic features and PCA coordinates - used to generate figure 5 panel C | Analyze Vocalizations.ipynb (section 5.4) |
figure_5 | figure5_pups_USV_pca.csv | data table of USV acoustic features and PCA coordinates - used to generate figure 5 panel E | Analyze Vocalizations.ipynb (section 5.4) |
supplemental_figure_1/umap_embeddings | NN_embedding_coordinates_labeled.feather (10 files, one per taxon) | where NN is one of the 10 2-letter species codes above - these are copies of the files in figure_1/umap_embeddings but with a column for the label given by hdbscan in the Annotate from UMAP.ipynb notebook - used to generate supplemental figure 1 | Analyze Vocalizations.ipynb (section 6) |
supplemental_figure_1/acoustic_features | NNwarbler_features.csv (10 files, one per taxon) | where NN is one of the 10 2-letter species codes above - these are copies of the data in figure_1/acoustic_features/all_species_warbler_features.csv but split up into one csv per species - used to generate supplemental figure 1 | Analyze Vocalizations.ipynb (section 6) |
supplemental_figure_1/acoustic_features | all_development_SPL.csv | data table where each row is a vocalization and columns are vocalization wav file name ('source_file'), species, and sound pressure level (SPL) calculated with warbleR - used to generate supplemental figure 1 | Analyze Vocalizations.ipynb (section 6) |
supplemental_figure_2 | supplement_figure2_cry_vocs_data.csv | data table of cry acoustic features and PCA coordinates - used to generate supplemental figure 2 panels B and D | Analyze Vocalizations.ipynb (section 3.8) |
supplemental_figure_2 | supplement_figure2_USV_vocs_data.csv | data table of USV acoustic features and PCA coordinates - used to generate supplemental figure 2 panels C and E | Analyze Vocalizations.ipynb (section 3.8) |
supplemental_figure_3 | nonvocal_acoustic_features.csv | data table of nonvocal sounds (one per row) and acoustic features (columns) used to generate supplemental_figure_3_data.csv | Train Models on Features.ipynb (section 4); Analyze Vocalizations.ipynb (section 3.4) |
supplemental_figure_3 | supplemental_figure_3_data.csv | data table of vocalizations (one per row) and acoustic features (columns) used to train random forest model for predicting 'cry' and 'USV' labels in figures 2, 4, and 5 | Train Models on Features.ipynb (section 4); Analyze Vocalizations.ipynb (section 3.4) |
supplemental_figure_3 | random_forest_voc_type_model.pkl | random forest model trained on the data in supplemental_figure_3_data.csv | Train Models on Features.ipynb (section 4); Analyze Vocalizations.ipynb (section 3.4) |
supplemental_figure_4 | figure2CD_pups_data.csv | this is a copy of the data table described above (figure_2/figure2CD_pups_data.csv) - used to generate supplemental figure 4 | Analyze Vocalizations.ipynb (section 7) |
supplemental_figure_4 | figure2CD_vocs_data.csv | this is a copy of the data table described above (figure_2/figure2CD_vocs_data.csv) - used to generate supplemental figure 4 | Analyze Vocalizations.ipynb (section 7) |
supplemental_figure_5 | all_development_vocs_with_predictions.csv | data table of vocalizations in the development dataset where each row is a vocalization and columns are acoustic features and the label predicted by the model supplemental_figure_3/random_forest_voc_type_model.pkl - used to generate supplemental_figure_5_data.csv | Analyze Vocalizations.ipynb (section 8) |
supplemental_figure_5 | all_development_vocs_with_start_stop_times.csv | data table of vocalizations in the development dataset where each row is a vocalization and columns are wav file the vocalization came from ('source_file'), its start and stop time in that wav file ('start_seconds' and 'stop_seconds'), and species - used to generate supplemental_figure_5_data.csv | Analyze Vocalizations.ipynb (section 8) |
supplemental_figure_5 | supplemental_figure_5_data.csv | data table of interonset intervals for vocalizations in the development dataset - used to generate supplemental figure 5 panels B and C | Analyze Vocalizations.ipynb (section 8) |
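To show how a start/stop table such as all_development_vocs_with_start_stop_times.csv relates to interonset intervals, here is a minimal standard-library sketch. It uses the usual definition (time between the onsets of consecutive vocalizations within one recording), which may differ in detail from the authors' notebook. The function name, the 'species' column header, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def interonset_intervals(rows):
    """Return {source_file: [seconds between consecutive vocalization onsets]}."""
    onsets = defaultdict(list)
    for row in rows:
        onsets[row["source_file"]].append(float(row["start_seconds"]))
    intervals = {}
    for source_file, starts in onsets.items():
        starts.sort()  # do not rely on the row order within a recording
        intervals[source_file] = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return intervals


# Hypothetical rows using the column names described in the table above.
sample_csv = """source_file,start_seconds,stop_seconds,species
pupA.wav,0.5,0.6,BW
pupA.wav,3.0,3.2,BW
pupA.wav,1.25,1.4,BW
pupB.wav,2.0,2.1,PO
"""
result = interonset_intervals(csv.DictReader(io.StringIO(sample_csv)))
print(result)
```

To run this on the real table, replace the in-memory sample with `csv.DictReader(open(path, newline=""))`, where `path` points to the extracted CSV file. A recording with a single vocalization yields an empty list.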