Data and code from: Dialect formation in ghost bats: Genetic, geographic, and morphological drivers of social and echolocation call divergence

Hanrahan, Nicola 1,2; Armstrong, Kyle 3,4; Turbill, Christopher 1; Dalziell, Anastasia 1; Welbergen, Justin 1

Published Apr 22, 2026 on Dryad. https://doi.org/10.5061/dryad.kprr4xhjf

Data files

Apr 22, 2026 version files 15.68 MB

all_ct.csv: 70.56 KB
all_e.csv: 42.63 KB
all_s.csv: 58.09 KB
all_u.csv: 59.86 KB
covariates_corrected_lat_lon_generalised_TandC_swapped.csv: 2.98 KB
README.md: 8.81 KB
Report_DBat17-3211_SNP_singlerow_2_NT.csv: 15.43 MB

Abstract

Geographical patterns of vocal dialects are poorly understood in bats, despite growing evidence that they possess complex vocal communication systems.

We investigated patterns and drivers of variation in vocalisations recorded at five ghost bat colonies in the Northern Territory, Australia. We calculated the genetic and morphological distances among individuals and investigated correlations with geographic distance. We then determined variation within three ghost bat social vocalisations (“Chirp-trill”, “Squabble”, “Ultrasonic Social”) and their “Echolocation” call using seven spectrographic measurements. Finally, we tested whether acoustic distance could be explained by genetic, geographic, or morphological distance.

Geographic and genetic distance were highly correlated, suggesting the occurrence of isolation by distance. All measures of morphological distance were consistent with Bergmann’s Rule, except noseleaf shape, which is likely constrained by its role in echolocation. Geographic variation was evident within each of the three social vocalisations and the echolocation call, with the patterns of geographic variation differing among the four vocalisation types. The degree of difference was surprising, given the ghost bat’s long-range seasonal dispersal. Acoustic distance in Chirp-trill and Squabble calls was marginally significantly correlated with genetic (and geographic) distance. In contrast, Ultrasonic Social and Echolocation calls varied among colonies but showed no significant associations with other metrics, apart from a weak correlation between Ultrasonic Social distance and forearm length. This supports the view that these ultrasonic calls are under stabilising selection due to their role in foraging and orientation.

This study provides the first evidence of dialect formation in megadermatid bats. It highlights the importance of considering multiple vocalisation types and investigating multiple processes in signal evolution. Overall, we found genetic, geographic, and morphological distances accounted for some of the variation in acoustic differences among colonies, but further work is needed to investigate other processes that may also contribute to dialect formation in ghost bats.

Dataset DOI: 10.5061/dryad.kprr4xhjf

Description of the data and file structure

To investigate dialect formation in the vocal repertoire of the ghost bat (Macroderma gigas), we collected genetic, geographic, morphological, and acoustic data from five colonies in the Northern Territory, Australia. Distances among colonies were calculated for each data type, and relationships between acoustic distance and genetic, geographic, and morphological distance were assessed using R.

This repository contains acoustic measurement datasets, genetic datasets, and sample covariates associated with those analyses.

Files and variables

Files: all_ct.csv, all_u.csv, all_s.csv, all_e.csv

Description: These files contain acoustic measurements extracted from individual ghost bat vocalisations. Each row represents one selected vocalisation measured from a sound recording using the software Raven Pro. The same variable definitions apply across all four files.

Variables:

selec: selection number assigned to the vocalisation within the annotation software.
Units: none
start: start time of the selected vocalisation within the source audio file.
Units: seconds
end: end time of the selected vocalisation within the source audio file.
Units: seconds
Begin.Path: original file path of the source audio recording on the authors’ computer system.
Units: none
Note: retained as original metadata identifying the source recording used for measurement.
Begin.File: filename of the source audio recording from which the vocalisation was measured.
Units: none
bw90: 90% bandwidth of the vocalisation.
Units: Hz
Interpretation: frequency bandwidth containing 90% of the signal energy.
iqrbw: interquartile range bandwidth of the vocalisation.
Units: Hz
Interpretation: the difference between the 75th and 25th percentile of the frequency distribution.
maxentropy: maximum entropy value measured for the vocalisation.
Units: unitless
minentropy: minimum entropy value measured for the vocalisation.
Units: unitless
aggentropy: aggregate entropy value measured for the vocalisation.
Units: unitless
peakfreq: peak frequency of the vocalisation.
Units: Hz
Interpretation: the frequency at which the call was loudest.
length: duration of the selected vocalisation.
Units: milliseconds
pfcminfreq: minimum frequency of the peak frequency contour.
Units: Hz
pfcmaxfreq: maximum frequency of the peak frequency contour.
Units: Hz
peaktime: time from the start of the vocalisation to the point of maximum amplitude or peak energy.
Units: seconds
Interpretation: measured relative to the start of the selection.
rownames: row identifier carried over from the original analysis workflow.
Units: none
sound.files: source audio filename associated with the vocalisation.
Units: none
sel.name: filename assigned to the exported or labelled vocalisation selection.
Units: none
site: sampling locality code for the vocalisation.
Units: none
Interpretation key:
- PIN = Pine Creek
- CLA = Claravale
- PUN = Pungalina
- KAK = Kakadu
- TOL = Tolmer
name: vocalisation type.
Units: none
Interpretation key:
- CT = chirp–trill
- U = ultrasonic social call
- S = squabble
- E = echolocation call
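
As an illustration of how the site and call-type keys above might be applied when reading these files outside R, the following minimal Python sketch attaches readable labels to each row and counts vocalisations per site and call type. It is not part of the authors' workflow. The function names read_calls and count_by_site_and_type are hypothetical, the two-row sample is made up, and the sketch assumes the CSV column names match the variable names listed above.

```python
import csv
import io

# Keys copied from the interpretation keys in this README.
SITE_NAMES = {
    "PIN": "Pine Creek",
    "CLA": "Claravale",
    "PUN": "Pungalina",
    "KAK": "Kakadu",
    "TOL": "Tolmer",
}
CALL_TYPES = {
    "CT": "chirp-trill",
    "U": "ultrasonic social call",
    "S": "squabble",
    "E": "echolocation call",
}


def read_calls(handle):
    """Read one acoustic CSV and attach readable site and call-type labels."""
    rows = []
    for row in csv.DictReader(handle):
        # Unrecognised codes are labelled "unknown" rather than raising.
        row["site_name"] = SITE_NAMES.get(row["site"], "unknown")
        row["call_type"] = CALL_TYPES.get(row["name"], "unknown")
        rows.append(row)
    return rows


def count_by_site_and_type(rows):
    """Count vocalisations per (site code, call-type code) pair."""
    counts = {}
    for row in rows:
        key = (row["site"], row["name"])
        counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up two-row sample standing in for a real file such as all_ct.csv.
sample = io.StringIO(
    "selec,peakfreq,site,name\n"
    "1,12000,PIN,CT\n"
    "2,12500,KAK,CT\n"
)
calls = read_calls(sample)
print(count_by_site_and_type(calls))
```

To read a real file, the in-memory sample can be replaced with an open file handle, for example open("all_ct.csv", newline="").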

File: covariates_corrected_lat_lon_generalised_TandC_swapped.csv

Description: Sample metadata for the individuals included in the DArTseq analysis. This file includes sample identifiers, colony/population assignments, and generalised geographic coordinates for the sampling locations.

Variables:

id: unique sample identifier for each individual.
Units: none
colony: colony or sampling locality from which the individual was sampled.
Units: none
pop: population grouping used in the analyses.
Units: none
Note: population labels correspond to colony names in this file.
lat: latitude of the sampling location.
Units: decimal degrees
Note: coordinates have been generalised to reduce disclosure of sensitive ghost bat roost locations.
lon: longitude of the sampling location.
Units: decimal degrees
Note: coordinates have been generalised to reduce disclosure of sensitive ghost bat roost locations.
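
The lat and lon columns can be turned into pairwise geographic distances among colonies. The sketch below uses the haversine great-circle formula in Python; this is one common choice and is not confirmed to be the method used in the authors' R scripts. The function names, the Earth radius constant, and the example coordinates are assumptions for illustration only. Because the published coordinates are generalised, distances computed from them are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairwise_distances(coords):
    """Map each unordered pair of colony names to its distance in km."""
    names = sorted(coords)
    return {
        (a, b): haversine_km(*coords[a], *coords[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Placeholder coordinates, NOT taken from the covariates file.
example = {"colony_a": (-13.0, 131.0), "colony_b": (-14.0, 132.0)}
for pair, km in pairwise_distances(example).items():
    print(pair, round(km, 1))
```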

File: Report_DBat17-3211_SNP_singlerow_2_NT.csv

Description: Raw DArTseq SNP report for ghost bat samples used in the population genetic analyses. This file is in standard DArT export format and includes several preliminary header rows containing batch, plate, and sample metadata, followed by a main header row and the SNP genotype data table.

File structure:

The first several rows contain DArT export metadata and sample layout information.
The main header row begins with the locus-level metadata fields, followed by one column per individual sample.
Each row after the header represents one SNP locus.
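
The layout described above can be separated programmatically. The following Python sketch splits a DArT-style CSV into preliminary rows, the main header, and the locus rows. It assumes the main header is the first row whose first cell is AlleleID, which follows from the field list in this README but has not been checked against the authors' scripts. The function name split_dart_report, the miniature report, and the sample IDs in it are made up.

```python
import csv
import io


def split_dart_report(handle, first_field="AlleleID"):
    """Return (preliminary_rows, header, data_rows) from a DArT-style CSV.

    The header is taken to be the first row whose first cell equals
    first_field; this is an assumption of the sketch.
    """
    preliminary, header, data = [], None, []
    for row in csv.reader(handle):
        if header is None:
            if row and row[0] == first_field:
                header = row
            else:
                preliminary.append(row)
        else:
            data.append(row)
    if header is None:
        raise ValueError(f"no header row starting with {first_field!r}")
    return preliminary, header, data


# Made-up miniature report: two metadata rows, a header, two loci.
sample = io.StringIO(
    "*,*,plate1,plate1\n"
    "*,*,A1,A2\n"
    "AlleleID,CallRate,Mg_X1,Mg_X2\n"
    "locus1,1.0,0,2\n"
    "locus2,0.5,-,1\n"
)
pre, header, loci = split_dart_report(sample)
print(len(pre), header, len(loci))
```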

Locus-level variables:

AlleleID: unique identifier for the SNP locus and allele
AlleleSequence: full sequence associated with the SNP locus
TrimmedSequence: trimmed sequence used in the DArT report
Chrom_Pteropus_vampyrus_v01: reference chromosome/scaffold in the alignment reference genome
ChromPos_Pteropus_vampyrus_v01: genomic position in the alignment reference genome
AlnCnt_Pteropus_vampyrus_v01: number of alignments to the reference genome
AlnEvalue_Pteropus_vampyrus_v01: alignment expectation value (E-value) for the reference genome match
SNP: SNP identity and base change at the focal position
SnpPosition: position of the SNP within the sequence
CallRate: proportion of individuals successfully genotyped at that locus
OneRatioRef: proportion of one allele state for the reference allele in the dataset, as reported by DArT
OneRatioSnp: proportion of one allele state for the SNP allele in the dataset, as reported by DArT
FreqHomRef: frequency of homozygotes for the reference allele
FreqHomSnp: frequency of homozygotes for the SNP allele
FreqHets: frequency of heterozygotes
PICRef: polymorphism information content for the reference allele
PICSnp: polymorphism information content for the SNP allele
AvgPIC: average polymorphism information content
AvgCountRef: average count for the reference allele
AvgCountSnp: average count for the SNP allele
RepAvg: repeatability/reproducibility metric reported by DArT

Sample genotype columns:
Columns named by sample ID (e.g., Mg_PC1, Mg_K6, Mg_C5) give the genotype call for each individual at each SNP locus.

Genotype coding used in the raw DArT report:

0 = reference allele homozygote
1 = SNP allele homozygote
2 = heterozygote
- = missing call / null allele state, as reported in the raw DArT export
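
Because heterozygotes are coded 2 rather than 1, the raw codes are not counts of the SNP allele and usually need recoding before analysis. The Python sketch below shows one way to decode a row of genotype cells, convert them to SNP-allele counts, and compute the proportion of non-missing calls. The function names and the example locus are hypothetical, and the call rate computed here is a plain recalculation that may not match the CallRate column if DArT computed it over a different sample set.

```python
# Labels follow the genotype coding listed in this README.
GENOTYPE_LABELS = {
    "0": "reference homozygote",
    "1": "SNP homozygote",
    "2": "heterozygote",
    "-": "missing",
}

# Number of SNP alleles carried under the same coding; None marks missing.
SNP_ALLELE_COUNTS = {"0": 0, "1": 2, "2": 1, "-": None}


def decode_genotype(code):
    """Translate one raw genotype cell into a readable label."""
    code = code.strip()
    if code not in GENOTYPE_LABELS:
        raise ValueError(f"unexpected genotype code: {code!r}")
    return GENOTYPE_LABELS[code]


def snp_allele_count(code):
    """Number of SNP alleles carried (0, 1 or 2), or None if the call is missing."""
    return SNP_ALLELE_COUNTS[code.strip()]


def call_rate(codes):
    """Proportion of non-missing calls across one locus."""
    codes = [c.strip() for c in codes]
    if not codes:
        raise ValueError("no genotype calls supplied")
    return sum(c != "-" for c in codes) / len(codes)


locus = ["0", "2", "-", "1"]  # made-up calls for four individuals
print([decode_genotype(c) for c in locus])
print([snp_allele_count(c) for c in locus])
print(call_rate(locus))
```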

Notes:

Sample IDs correspond to individual bats and are linked to sample metadata in the covariates file.
Some individuals were later excluded during processing and filtering steps; the raw report retains the original exported sample set.
Preliminary rows above the main header are part of the native DArT export structure and contain plate/batch/sample metadata rather than SNP observations.

Code/software

The analysis code associated with this dataset is provided via the linked Zenodo software record rather than being hosted directly in Dryad. This allows the code to be released under a software licence appropriate for code.

Two scripts are provided:

Acoustic_analysis_lost_in_translation_updated28Nov25.R: R code for analyses of acoustic distance among colonies for four vocalisation types, and for tests of correlation with genetic, geographic, and morphological distance.
dartseq_NT_readdata.R: R code for population genetic analyses using DArTseq single-nucleotide polymorphism (SNP) data.

Notes on file paths

The scripts use relative file paths and assume that the user is running them from the relevant project directory.

dartseq_NT_readdata.R assumes the working directory contains the subfolder rawdata/.
Acoustic_analysis_lost_in_translation_updated28Nov25.R assumes the working directory contains the input CSV files all_ct.csv, all_s.csv, all_u.csv, and all_e.csv.

Users should place the input files in the expected folder structure or modify the path settings to match their local environment.
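
Before running either script, it can help to confirm that the expected inputs are in place. The Python sketch below checks a directory for the four acoustic CSV files and the rawdata/ subfolder named above. It is a convenience check only and not part of the authors' code. The function name missing_inputs is hypothetical, and checking both scripts' requirements against a single directory is an assumption, since the two scripts may be run from different project directories.

```python
from pathlib import Path

# File and folder names as listed in this README.
ACOUSTIC_INPUTS = ["all_ct.csv", "all_s.csv", "all_u.csv", "all_e.csv"]
GENETIC_SUBFOLDER = "rawdata"


def missing_inputs(workdir):
    """Return the expected files or folders that are absent from workdir."""
    base = Path(workdir)
    missing = [name for name in ACOUSTIC_INPUTS if not (base / name).is_file()]
    if not (base / GENETIC_SUBFOLDER).is_dir():
        missing.append(GENETIC_SUBFOLDER + "/")
    return missing


# Check the current working directory and report the result.
absent = missing_inputs(".")
print("All expected inputs found." if not absent else f"Missing: {absent}")
```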