Data and code supporting acoustic and environmental factors driving digging behavior in the early life of a freshwater turtle
Data files
Jun 19, 2025 version files 79.93 MB
- auto_model_data_missingfiles.May29.2023.csv (117.91 KB)
- auto_model_data.May29.2023.csv (69.03 MB)
- behaviour.temp.summary.May24.2022.csv (131.76 KB)
- DailyPrecipitation.May9.2023.csv (8.36 KB)
- emerge.events.data.csv (11.42 KB)
- emerge.interval.data.csv (30.74 KB)
- file.metadata.Apr14.2022.csv (1.62 MB)
- hatch.emergence.data.csv (2.12 KB)
- manual_model_data.Apr13.2023.csv (184.52 KB)
- NestTemperature.May30.2023.csv (5.14 MB)
- README.md (48.56 KB)
- TurtleInNests.May17.2023.csv (286 B)
- validation_subset_data_labelled.May29.2023.csv (3.60 MB)
Abstract
Acoustic communication is widespread in the animal kingdom and often fundamental to early life survival. Extant archosaurs, birds and crocodilians, display complex vocalizations in early life that function in parental care and embryo communication. Turtles are a sister group to archosaurs, and vocalize, but lack parental care, providing an opportunity to decouple parental care from other factors driving the evolution of neonatal communication. Turtle hatchlings produce vocalizations in the subterranean nest before emergence, but their function remains unclear. We hypothesized that acoustic cues may facilitate emergence from the nest by cuing hatchlings to begin digging out of the subterranean nest cavity. We test whether hatchling vocalizations are positively associated with digging activity in hatchling snapping turtles (Chelydra serpentina). In two successive years, we monitored subterranean hatchling behaviour in situ with acoustic recorders in semi-natural turtle nests. Structural Equation Modelling revealed that vocalization was causally associated with hatchling movement. Our findings support a hypothesis that early-life acoustic communication coordinates a social activity in the absence of parental care, a function with relatively few described parallels in the animal kingdom.
Authors: Claudia Lacroix, Christina M. Davy, Njal Rollinson
Last Updated: 2025-04-17
Abstract: Acoustic communication is vital for early life survival in animals. Extant archosaurs, birds and crocodilians, display vocalisations that function in parental care and neonatal communication. Turtles also vocalize in early life and are a sister group to archosaurs. However, turtles lack parental care, providing an opportunity to decouple factors related to parental care and social learning. We tested the hypothesis that hatchling turtle vocalizations coordinate digging in the nest. We monitored snapping turtle hatchlings over two summers with acoustic recorders and wildlife cameras. Machine learning-based acoustic signal processing and structural equation modeling showed a strong positive association between hatchling vocalization and nest movement. We also found evidence of emergence synchrony within nests. This supports the hypothesis that hatchling vocalizations aid in coordinating subterranean movements during nest emergence, a social behavior among juveniles that occurs in the absence of parental care, and one that has few parallels in the animal kingdom.
Description of the data, file structure, and associated code.
The R code is published on Zenodo. See the Related Work link for Software.
Part 1 - Nest Emergence
Description: the R script NestEmergence.R uses data from camera trap recordings in 2021 and 2022. The script uses 3 formats of post-processed data to summarise different nest emergence behaviours.
Dataset 1: the dataframe hatch.emergence.data.csv describes the total time it took for hatchlings to climb out of the nest. This was calculated, for each nest, by subtracting the time of the first detected movement in the nest (from the audio recordings, a proxy for hatching) from the time of the first emergence event (from the camera traps).
| Column Name | Type | Unit | Description |
|---|---|---|---|
| year | ordinal | date | Year of sampling |
| clutch_ID | nominal | NA | unique identifier for each nest. |
| Group | nominal | NA | unique identifier for each nest pair. |
| Channel | nominal | NA | audio channel (left, right) |
| emergence | ordinal | date & time | date and time of the first emergence event inferred from monitoring the camera trap data. |
| hatching | ordinal | date & time | date and time of the first detected movement in the nest (proxy for hatching event) inferred from manually scanning the audio recordings. |
| duration | ordinal | days | time between hatching and the first emergence event. |
| ID | nominal | NA | unique identifier for each nest and year. |
| NestID | nominal | NA | unique identifier for each nest. |
| first_date | ordinal | date & time | date and hour of the first emergence event inferred from monitoring the camera trap data |
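The duration calculation described for this dataset can be sketched as follows. This is a minimal illustration in Python rather than the authors' R script (NestEmergence.R); the timestamps and the timestamp format are assumptions made for the example and should be checked against the CSV.

```python
from datetime import datetime

# Assumed timestamp format; verify against the `hatching` and `emergence`
# columns of hatch.emergence.data.csv before reusing this.
FMT = "%Y-%m-%d %H:%M"


def duration_days(hatching: str, emergence: str, fmt: str = FMT) -> float:
    """Days elapsed between first detected movement and first emergence."""
    delta = datetime.strptime(emergence, fmt) - datetime.strptime(hatching, fmt)
    return delta.total_seconds() / 86400  # 86400 seconds per day


# Hypothetical nest: movement first heard on 1 Sept, first emergence on 3 Sept.
example = duration_days("2021-09-01 06:00", "2021-09-03 18:00")
print(example)  # 2.5
```

The subtraction order matters: emergence minus hatching gives a positive duration.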
Dataset 2: The dataframe emerge.events.data.csv records each emergence event observed in the camera trap footage, summarised per hour.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| Group | nominal | NA | unique identifier for each nest pair. |
| clutch_ID | nominal | NA | unique identifier for each nest. |
| DateHour | ordinal | date & time | Date and time of the emergence event summarised per hour. |
| n | discrete | hatchlings | number of hatchlings that emerged within the hour. |
| year | ordinal | date | Year of sampling |
| first_emerge | ordinal | date & time | date and hour of the first emergence event for the nest inferred from monitoring the camera trap data. |
| time_since_emergence | continuous | seconds | time since the first emergence event of that nest. |
| ID | nominal | NA | unique identifier for each nest and year. |
| NestID | nominal | NA | unique identifier for each nest. |
| first_date | ordinal | date | date of the first emergence event for the nest inferred from monitoring the camera trap data. |
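As a rough illustration of how the hourly columns in this table relate to one another, the sketch below summarises individual emergence times into hourly counts (`n`) and seconds since the first emergence hour (`time_since_emergence`). It is written in Python with made-up timestamps. Measuring elapsed time from the hour-truncated first emergence is an assumption based on the column descriptions, not a confirmed detail of the authors' script.

```python
from collections import Counter
from datetime import datetime

# Hypothetical emergence times for a single nest (not taken from the data).
events = ["2021-09-03 18:05", "2021-09-03 18:40", "2021-09-03 20:15"]
parsed = [datetime.strptime(e, "%Y-%m-%d %H:%M") for e in events]

# Truncate each event to the hour and count hatchlings per hour.
hourly = Counter(t.replace(minute=0, second=0, microsecond=0) for t in parsed)

# Assumption: elapsed time is measured from the first emergence hour.
first_emerge = min(hourly)
rows = [
    {
        "DateHour": hour,
        "n": n,
        "time_since_emergence": (hour - first_emerge).total_seconds(),
    }
    for hour, n in sorted(hourly.items())
]

for row in rows:
    print(row["DateHour"], row["n"], row["time_since_emergence"])
```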
Dataset 3: The dataframe emerge.interval.data.csv describes the time between emergence events per nest.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| Group | nominal | NA | unique identifier for each nest pair. |
| Side | nominal | NA | position of the nest cage (left or right) in which the observation was recorded, based on the camera trap video |
| year | ordinal | date | Year of sampling |
| clutch_ID | nominal | NA | unique identifier for each nest. |
| Num | ordinal | integer | sequence of emergence events per nest, such that 1 is the first emergence event and 1+n is the subsequent emergence event for that nest. |
| DateTime | ordinal | date & time | Date and time of the observation in hours. |
| diff.minutes | continuous | minutes | time elapsed since the previous emergence event in the same nest |
| ID | nominal | NA | unique identifier for each nest and year. |
| NestID | nominal | NA | unique identifier for each nest. |
| first_date | ordinal | date | date of the first emergence event for the nest inferred from monitoring the camera trap data. |
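The relationship between `Num`, `DateTime`, and `diff.minutes` can be sketched as below. The timestamps are hypothetical, and how the file encodes `diff.minutes` for the first event of a nest (which has no predecessor) is not stated here, so `None` is used as a placeholder assumption.

```python
from datetime import datetime

# Hypothetical emergence times for one nest (not taken from the data).
times = ["2021-09-03 18:05", "2021-09-03 18:40", "2021-09-03 20:15"]
parsed = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M") for t in times)

intervals = []
for num, current in enumerate(parsed, start=1):
    if num == 1:
        diff = None  # assumption: no interval is defined for the first event
    else:
        previous = parsed[num - 2]
        diff = (current - previous).total_seconds() / 60
    intervals.append({"Num": num, "DateTime": current, "diff.minutes": diff})

print([row["diff.minutes"] for row in intervals])  # [None, 35.0, 95.0]
```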
Part 2 - Structural Equation Modeling (manually labelled, 2021 dataset)
Description: the R script manual_SEM.R fits structural equation models to the 2021 data, which contain manually labelled acoustic detections.
Dataset 1: the dataframe manual_model_data.Apr13.2023.csv describes the number of vocalisations and movement observed per hour between the first sign of hatching and the first detected date of emergence. Movement and Vocalisations were quantified using cluster analysis via Kaleidoscope Pro and manually labelled.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| Date.Hour | ordinal | date & time | Date and time of the observation in hours. |
| AirTemp | continuous | degrees Celsius | Average air temperature per hour recorded from a nearby temperature logger. |
| nestID | nominal | NA | unique identifier for each nest. |
| NestTemp | continuous | degrees Celsius | Average nest temperature per hour recorded from a buried temperature logger. Nest temperature is unique for each nest pair. |
| group | nominal | NA | unique identifier for each nest pair. |
| BehavPeriod | nominal | NA | 'Pre-Hatching', 'Hatching', 'Post-Hatching' and 'Emergence' behavioural periods during recording. Behavioural periods were delimited by using the first acoustic sign of movement, and the first emergence event recorded in the camera traps. |
| Vocalisation | discrete | unit per hour | Total number of vocalisations detected per hour. vocalisations were quantified using a cluster-based algorithm via Kaleidoscope Pro and manually labelled detections. |
| Movement | discrete | unit per hour | Total number of bouts of movement detected per hour. Movement was quantified using a cluster-based algorithm via Kaleidoscope Pro and manually labelled detections. |
| Hour | ordinal | hour | hour of the day |
| ClutchSize | discrete | number of eggs | Number of live eggs/hatchlings in the nest. |
| Date | ordinal | date | Date (yyyy-mm-dd) |
| ordinal | ordinal | date | Ordinal day of the year |
| Year | ordinal | date | Year of sampling (2021) |
| Month | ordinal | date | Month of the year as numeric |
| Day | ordinal | date | Day of the month |
| precip_mean | continuous | millimeters | mean total hourly precipitation per day |
| precip_tot | continuous | millimeters | total precipitation per day |
Part 3 - Accuracy calculations of the automatically labelled model
Description: the R script ModelAccuracy.R uses the validation dataset to summarise accuracy metrics of the cluster model's performance.
Dataset 1: the dataframe file.metadata.Apr14.2022.csv describes all the metadata for the audio files recorded in 2021.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| File Name | nominal | NA | original WAV file name |
| Group | nominal | NA | unique identifier for each nest pair. |
| Channel | discrete | NA | audio channel (left= 0, right = 1) |
| Subset | nominal | NA | Indicator whether data from the file was included in the analysis or not. (Y=included, N=excluded) |
| BehavPeriod | nominal | NA | 'Pre-Hatching', 'Hatching', 'Post-Hatching' and 'Emergence' behavioural periods during recording. Behavioural periods were delimited by using the first acoustic sign of movement, and the first emergence event recorded in the camera traps. "NO DATA" means that the nest did not emerge and therefore the behavioural periods were not defined. |
| DateTime.Modified | ordinal | date & time | Date and time that the file was created or last modified. |
| Size(GB) | continuous | gigabytes | size of the audio file |
| NestID | nominal | NA | unique identifier for each nest. |
| DateTime.file | ordinal | date & time | Date and time that the audio file started recording. |
Dataset 2: the dataframe validation_subset_data_labelled.May29.2023.csv contains the results from the validation dataset. It contains both manually labelled detections and automatically labelled detections, which are then used to produce accuracy measurement and the probability matrix of the cluster model's performance. This dataset is also used in part 4 (see below) to generate 1000 simulated datasets.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| INDIR | nominal | NA | output from Kaleidoscope Pro. directory that the audio file is contained in. |
| FOLDER | nominal | NA | output from Kaleidoscope Pro. folder the audio file is contained in. |
| IN FILE | nominal | NA | output from Kaleidoscope Pro. Original WAV file name. |
| CHANNEL | discrete | NA | output from Kaleidoscope Pro. Audio channel (left= 0, right = 1). |
| OFFSET | continuous | seconds | output from Kaleidoscope Pro. Time since the start of the audio file the detection originates from. |
| DURATION | continuous | seconds | output from Kaleidoscope Pro. The total duration of the detection. |
| Fmin | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the minimum frequency of the detection. |
| Fmean | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the mean frequency of the detection. |
| Fmax | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the maximum frequency of the detection. |
| DATE | ordinal | date | Date the detection was recorded. |
| TIME | ordinal | time | Time of day the detection was recorded. |
| HOUR | ordinal | hour | Hour of day the detection was recorded. |
| TOP1MATCH* | nominal | NA | output from Kaleidoscope Pro. The label that the cluster model assigned to the detection. |
| TOP1DIST | continuous | NA | output from Kaleidoscope Pro. Distance from the cluster center. |
| TOP2MATCH | nominal | NA | output from Kaleidoscope Pro. The second most likely label that the cluster model assigned to the detection. |
| TOP2DIST | continuous | NA | output from Kaleidoscope Pro. Distance from the cluster center. |
| TOP3MATCH | nominal | NA | output from Kaleidoscope Pro. The third most likely label that the cluster model assigned to the detection. |
| VOCALIZATIONS | discrete | NA | output from Kaleidoscope Pro. Value of 1 assigned to each row to enable calculations. |
| MANUAL ID | nominal | NA | The label that we manually labelled and assigned to the detection. |
| INPATHMD5 | nominal | NA | output from Kaleidoscope Pro. |
| Group | nominal | NA | unique identifier for each nest pair. |
| Subset | nominal | NA | Indicator whether data from the file was included in the analysis or not. (Y=included, N=excluded) |
| BehavPeriod | nominal | NA | 'Pre-Hatching', 'Hatching', 'Post-Hatching' and 'Emergence' behavioural periods during recording. Behavioural periods were delimited by using the first acoustic sign of movement, and the first emergence event recorded in the camera traps. |
| Size(GB) | continuous | gigabytes | size of the audio file |
| NestID | nominal | NA | unique identifier for each nest. |
| DateHour | ordinal | date & time | Date and time of the observation in hours. |
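To make the accuracy and probability-matrix idea concrete, the sketch below compares automatic labels (`TOP1MATCH*`) with manual labels (`MANUAL ID`) for a handful of made-up detections. It is a Python illustration, not the authors' ModelAccuracy.R. The label names are hypothetical, and reading the probability matrix as the proportion of each manual label within each automatic label is an assumption about its definition.

```python
from collections import Counter, defaultdict

# Hypothetical (automatic label, manual label) pairs; real label names may differ.
pairs = [
    ("vocalisation", "vocalisation"),
    ("vocalisation", "vocalisation"),
    ("vocalisation", "noise"),
    ("noise", "noise"),
    ("noise", "noise"),
    ("noise", "vocalisation"),
    ("noise", "noise"),
    ("vocalisation", "vocalisation"),
]

# Overall accuracy: share of detections where both labels agree.
accuracy = sum(auto == manual for auto, manual in pairs) / len(pairs)

# Confusion counts, indexed as counts[automatic][manual].
counts = defaultdict(Counter)
for auto, manual in pairs:
    counts[auto][manual] += 1

# Assumed definition: row-normalised counts, i.e. P(manual label | automatic label).
prob_matrix = {
    auto: {manual: n / sum(row.values()) for manual, n in row.items()}
    for auto, row in counts.items()
}

print(accuracy)  # 0.75
print(prob_matrix["vocalisation"])
```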
Dataset 3: the dataframe behaviour.temp.summary.May24.2022.csv describes the number of bouts of movement recorded per hour. Bouts of movement were quantified by running a separate cluster analysis across a wide acoustic window and manually labelling the detections. In this script, the dataset is used to validate the total duration of noise as a proxy for bouts of movement.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| Date.Hour | ordinal | date & time | Date and time of the observation in hours. |
| AirTemp | continuous | degrees celsius | Air temperature measure from a HOBO data logger hung on a nearby bush. |
| nestID | nominal | NA | unique identifier for each nest. |
| NestTemp | continuous | degrees celsius | Hourly nest temperature measured by a HOBO or ibutton data logger. |
| group | nominal | NA | unique identifier for each nest pair. |
| BehavPeriod | nominal | NA | 'Pre-Hatching', 'Hatching', 'Post-Hatching' and 'Emergence' behavioural periods during recording. Behavioural periods were delimited by using the first acoustic sign of movement, and the first emergence event recorded in the camera traps. "NO DATA" means that the nest did not emerge and therefore the behavioural periods were not defined. |
| Vocalisation | discrete | number of vocalisations | Number of vocalisations detected per hour. Vocalisations were calculated from manually labelled detections obtained from a cluster model in Kaleidoscope Pro. |
| Movement | discrete | number of bouts of movement | Number of bouts of movement detected per hour. Bouts of movement were calculated from manually labelled detections obtained from a cluster model in Kaleidoscope Pro. |
| Hour | ordinal | time | hour of the day. |
| ClutchSize | discrete | number of eggs | number of viable eggs in the nest. |
Part 4 - Structural Equation Modeling using a Probability Matrix (automatically labelled, 2022 dataset)
Description: the R script auto_SEM.R fits structural equation models to the 2022 data, which contain automatically labelled acoustic detections.
Dataset 1: the dataframe validation_subset_data_labelled.May29.2023.csv contains the results from the validation dataset. It contains both manually labelled detections and automatically labelled detections, which are then used to produce accuracy measurement and the probability matrix of the cluster model's performance.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| INDIR | nominal | NA | output from Kaleidoscope Pro. directory that the audio file is contained in. |
| FOLDER | nominal | NA | output from Kaleidoscope Pro. folder the audio file is contained in. |
| IN FILE | nominal | NA | output from Kaleidoscope Pro. Original WAV file name. |
| CHANNEL | discrete | NA | output from Kaleidoscope Pro. Audio channel (left= 0, right = 1). |
| OFFSET | continuous | seconds | output from Kaleidoscope Pro. Time since the start of the audio file the detection originates from. |
| DURATION | continuous | seconds | output from Kaleidoscope Pro. The total duration of the detection. |
| Fmin | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the minimum frequency of the detection. |
| Fmean | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the mean frequency of the detection. |
| Fmax | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the maximum frequency of the detection. |
| DATE | ordinal | date | Date the detection was recorded. |
| TIME | ordinal | time | Time of day the detection was recorded. |
| HOUR | ordinal | hour | Hour of day the detection was recorded. |
| TOP1MATCH* | nominal | NA | output from Kaleidoscope Pro. The label that the cluster model assigned to the detection. |
| TOP1DIST | continuous | NA | output from Kaleidoscope Pro. Distance from the cluster center. |
| VOCALIZATIONS | discrete | NA | output from Kaleidoscope Pro. Value of 1 assigned to each row to enable calculations. |
| MANUAL ID | nominal | NA | The label that we manually labelled and assigned to the detection. |
| Group | nominal | NA | unique identifier for each nest pair. |
| Subset | nominal | NA | Indicator whether data from the file was included in the analysis or not. (Y=included, N=excluded) |
| BehavPeriod | nominal | NA | 'Pre-Hatching', 'Hatching', 'Post-Hatching' and 'Emergence' behavioural periods during recording. Behavioural periods were delimited by using the first acoustic sign of movement, and the first emergence event recorded in the camera traps. |
| Size(GB) | continuous | gigabytes | size of the audio file |
| NestID | nominal | NA | unique identifier for each nest. |
| DateHour | ordinal | date & time | Date and time of the observation in hours. |
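This README notes that the validation dataset is used in Part 4 to generate 1000 simulated datasets. One plausible way to do that with a probability matrix is sketched below in Python: each automatic label is replaced by a label drawn according to the validation proportions. The matrix values, the label names, and the resampling scheme itself are all assumptions made for illustration; the procedure actually used is in auto_SEM.R on Zenodo.

```python
import random

# Hypothetical probability matrix, read as P(true label | automatic label).
prob_matrix = {
    "vocalisation": {"vocalisation": 0.75, "noise": 0.25},
    "noise": {"noise": 0.75, "vocalisation": 0.25},
}

# Hypothetical automatic labels for five detections.
auto_labels = ["vocalisation", "noise", "noise", "vocalisation", "noise"]


def simulate(labels, matrix, rng):
    """Draw one plausible 'true' label per detection from the matrix row."""
    simulated = []
    for label in labels:
        options = list(matrix[label])
        weights = [matrix[label][option] for option in options]
        simulated.append(rng.choices(options, weights=weights, k=1)[0])
    return simulated


rng = random.Random(42)  # fixed seed so the sketch is reproducible
simulated_sets = [simulate(auto_labels, prob_matrix, rng) for _ in range(1000)]
print(len(simulated_sets))  # 1000
```

Each simulated dataset could then be summarised per hour and passed to the model, so that uncertainty in the automatic labels carries through to the estimates.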
Dataset 2: the dataframe auto_model_data.May29.2023.csv contains the detections that were automatically labelled by the cluster model.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| INDIR | nominal | NA | output from Kaleidoscope Pro. directory that the audio file is contained in. |
| FOLDER | nominal | NA | output from Kaleidoscope Pro. folder the audio file is contained in. |
| year | ordinal | date | Year of sampling. |
| group | nominal | NA | unique identifier for each nest pair. |
| File | nominal | NA | Original WAV file name. |
| channel | discrete | NA | audio channel (left= 0, right = 1) |
| OFFSET | continuous | seconds | output from Kaleidoscope Pro. Time since the start of the audio file the detection originates from. |
| DURATION | continuous | seconds | output from Kaleidoscope Pro. The total duration of the detection. |
| Fmin | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the minimum frequency of the detection. |
| Fmean | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the mean frequency of the detection. |
| Fmax | continuous | Hertz (Hz) | output from Kaleidoscope Pro. the maximum frequency of the detection. |
| TOP1MATCH* | nominal | NA | output from Kaleidoscope Pro. The label that the cluster model assigned to the detection. |
| TOP1DIST | continuous | NA | output from Kaleidoscope Pro. Distance from the cluster center. |
| VOCALIZATIONS | discrete | NA | output from Kaleidoscope Pro. Value of 1 assigned to each row to enable calculations. |
| MANUAL ID | nominal | NA | The label that we manually labelled and assigned to the detection. |
| FilePath2 | nominal | NA | full path of the directory that the audio file is contained in. |
| FilePath | nominal | NA | alternative path of the directory that the audio file is contained in. |
| size | continuous | gigabytes | size of the audio file |
| clutch_id | nominal | NA | unique identifier for each nest. |
| DateTime.file | ordinal | date & time | date and time that the audio file containing the detection started recording. |
| date | ordinal | date | Date the detection was recorded. |
| month | ordinal | month | Month of the year of the detection. |
| day | ordinal | day | Day of the month of the detection. |
| time | ordinal | time | Time of day the detection was recorded. |
| hour | ordinal | hour | Hour of day the detection was recorded. |
| date_hour | ordinal | date & time | date and hour of the day of the detection |
Dataset 3: the dataframe auto_model_data_missingfiles.May29.2023.csv describes the metadata of all acoustic files, irrespective of whether there are acoustic detections present.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| year | ordinal | date | Year of sampling. |
| group | nominal | NA | unique identifier for each nest pair. |
| channel | discrete | NA | audio channel (left= 0, right = 1) |
| clutch_id | nominal | NA | unique identifier for each nest. |
| DateTime.file | ordinal | date & time | date and time that the audio file started recording. |
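Because this file lists every recording regardless of whether it produced detections, it can be used to distinguish "no detections" from "no recording". The Python sketch below shows one way to do that, matching on `clutch_id` and `DateTime.file`. The rows are made up, and the choice of join keys is an assumption for illustration.

```python
from collections import Counter

# Hypothetical (clutch_id, DateTime.file) keys for every recorded file.
all_files = [
    ("N1", "2022-09-01 00:00"),
    ("N1", "2022-09-01 01:00"),
    ("N1", "2022-09-01 02:00"),
]

# Hypothetical detections, one key per detection row in the labelled dataset.
detections = [
    ("N1", "2022-09-01 00:00"),
    ("N1", "2022-09-01 00:00"),
    ("N1", "2022-09-01 02:00"),
]

detections_per_file = Counter(detections)

# Files with no detections get an explicit zero instead of being dropped.
complete = [
    {
        "clutch_id": clutch,
        "DateTime.file": started,
        "n_detections": detections_per_file.get((clutch, started), 0),
    }
    for clutch, started in all_files
]

print([row["n_detections"] for row in complete])  # [2, 0, 1]
```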
Dataset 4: the dataframe NestTemperature.May30.2023.csv describes the hourly average nest temperature per nest, per year.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| nestID | nominal | NA | unique identifier for each nest. |
| Date.Hour | ordinal | date & time | date and hour of the day of nest temperature. |
| Date | ordinal | date | Date temperature was recorded and summarised. |
| Time | ordinal | time | Time of day temperature was recorded. |
| NestTemp_C | continuous | degrees Celsius | Average nest temperature per hour recorded from a buried temperature logger. Nest temperature is unique for each nest. |
Dataset 5: the dataframe DailyPrecipitation.May9.2023.csv describes daily precipitation, acquired from a nearby weather station.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| Year | ordinal | date | Year precipitation was recorded. |
| Month | ordinal | month | Month of the year precipitation was recorded. |
| Day | ordinal | day | Day of the month precipitation was recorded. |
| precip_mean | continuous | millimeters | mean total hourly precipitation per day. |
| precip_tot | continuous | millimeters | total precipitation per day. |
| Date | ordinal | date | Date precipitation was recorded and summarised. |
Dataset 6: the dataframe TurtleInNests.May17.2023.csv describes the number of viable eggs contained in each nest, each year.
| Column Name | Type | Unit | Description |
|---|---|---|---|
| Year | ordinal | date | Year of sampling. |
| NestID | nominal | NA | unique identifier for each nest. |
| ClutchSize | discrete | number of eggs | number of viable eggs in the nest. |
Part 5 - Supplemental Figure S1
Description: the R script ValidatingTemperature.R validates the iButton and HOBO temperature loggers between adjacent nests and over time.
- Lacroix, Claudia; Davy, Christina; Rollinson, Njal (2025). Data and code supporting acoustic and environmental factors driving digging behavior in the early life of a freshwater turtle. Zenodo. https://doi.org/10.5281/zenodo.10900821
- Lacroix, Claudia; Davy, Christina; Rollinson, Njal (2025). Data and code supporting acoustic and environmental factors driving digging behavior in the early life of a freshwater turtle. Zenodo. https://doi.org/10.5281/zenodo.10900822
- Lacroix, Claudia; Davy, Christina M.; Rollinson, Njal (2025). Acoustic communication and environmental factors drive coordinated digging behaviour during an early life stage transition of a freshwater turtle. Animal Behaviour. https://doi.org/10.1016/j.anbehav.2025.123270
